In Part 1 and Part 2, I set the background. In Part 3, I attacked a castle… it is now an infested ruins. We’re now building a better version of it (meeting the specification). Our new castle with relational fortifications. Who told database design is no fun?
I’m using a full fledged Microsoft stack/tool on a Linux machine. I’m so happy to be a developer in 2021, it is such a great experience with all the good and quite easy to install softwares and tools. For this exercise, I finally made a final move to Azure Data Studio and SQL Server (Developer Edition) on Docker. It should be easier to tell a story with all the pieces in one place: code blocks, outputs and messages all packed together. No more copy/paste. It may well help my SQL syntax (I add to correct the Part 3 on the foreign key, a
ADD was missing)
Or so I though. SqlServer on docker, great: itjustworkstm. With Azure Data Studio, I put the bar too high. I expected to easily export the Notebook as HTML or Markdown into WordPress… not much (I was earlier in my career spoiled by the great RStudio … again so grateful to be in 2021). GitHub get it almost right… just that the error messages are lost 😦
On the good side:
- this post should be shorter, and focus on the major achievements in comparison with the old design.
- you can run the
.ipynbstep by step with the SQL kernel on your own machine.
Our new castle
We went from that
## graphviz/dot required java -jar schemaspy-6.1.0.jar -o './alt' -u '******' -db alt -p '*****' -t mssql08 -host 'localhost:1400' -schemas alt -dp ./sqljdbc_18.104.22.168_enu/sqljdbc_9.2\\enu/mssql-jdbc-9.2.1.jre11.jar
No more implied relationship, I’m still struggling with finding appropriate names but they are much more explicit. We don’t have any more
'id'. The chance that somebody join on icdo_location_code = icdo_location_desc is way lower… and even if they try, they now have totally different domain: ‘C33.2′ = ‘Some description’ ? It returns an empty set and implies that something bad happened with the query, It needs to be corrected.
Under the hood
Database posters are good to communicate, but they can not represent all they’re is to know, it is just a nice bird’s-eye view. Let’s see 2 interesting enough to be mentioned improvements
User-Defined Data Types
CREATE TYPE alt._icdo_version FROM int NOT NULL; CREATE TYPE alt._icdo_location_code FROM varchar(8) NOT NULL;
I was quite excited to see their potential in action. I expected some strong type definition… and was a bit disappointed. It is not enough to protect against ill-formed values (RULEs could have helped here, but Microsoft warns they may well be deprecated, so sad) and values from different TYPEs can be compared without throwing an error like this one:
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the varchar value 'heaeu' to data type int.
Buuuuut, I should and will keep them. You may remember that the ICDO Location Codes were different in 2 tables: varchar(8) vs varchar(12). We now have a tool to deal with the issue. The length is defined once for all in the TYPE. It’s a block we can reuse, it ensures the length are the same, makes the domain explicit and brings the next point … roullement de tambour (drum roll)
A crone for diagnose
The types are finally ensured to be the same (_icdo_location_code and no more varchar… or is it char… and what is the length…)
ALTER TABLE alt.diagnose ADD FOREIGN KEY (icdo_version, icdo_location_code) REFERENCES alt.icdo_location_version_description (icdo_version, icdo_location_code);
9:06:10 PM Started executing query at Line 67 Commands completed successfully. Total execution time: 00:00:00.033
What a relief! The diagnose is finally connected to the rest, and is ensured to be consistent with our domain definition.
And the winner is the castle of Part 4. It stands so proudly. Do not take my words for granted, go on and attack it yourself. Here is the Notebook: icdo_refact. You will surely find some angle to defeat it or some pitfalls (I have at least one in mind). If so, leave me a comment!
Now we can rest and contemplate what we gained from this new design on the reporting side. The query is now only 1 JOIN with 2 conditions where in Part 2 we had to deal with 3 JOINs with 4 conditions.
select d.patient_id, d.icd10_code, d.icdo_location_code, ilvd.icdo_location_desc from alt.diagnose d join alt.icdo_location_version_description as ilvd on ilvd.icdo_version = d.icdo_version and ilvd.icdo_location_code = d.icdo_location_code
We have gain in robustness and ease of reporting… not that bad!
In Part 5, I shall… naaa just kidding. I had a hard time publishing this one. Blogging is more time consuming than expected. Knowing I had to publish the fourth part was really a challenge. Lesson learned: no multi-part blog and surrogate key are no panacea!
too late to edit, ship it! SEE YOU SQL COWBOY…