This weekend, I listened to the SDU Show 81 podcast, and the guest was Louis Davidson, the author of my favorite SQL Server book. In this podcast, a common SQL anti-pattern came up: not covering natural keys with UNIQUE constraints, leaving only the surrogate key unique. I could only nod in agreement. I take the time to review whether a column should be UNIQUE or not, and even to name and comment such constraints appropriately. But I had never really been burned by the consequences of such a “design”. Only once, and then by a JOIN gone cartesian in our reports.
Now back to practice, back to the office and our legacy database… and you may smell what’s coming.
Ooops
Today I inserted new medical protocols into a table with something like protocol(guid, code, name). Your usual code table. One commit later, run on test… OK, push to prod… fine. One ticket closed; what a good start to a Monday. I’m done and move on to the next task.
In front of me, the application developer gets a call, emails pop into the inbox, and I can hear it’s about the application crashing: you click and it’s gone. I don’t make the link with my changes until I hear the word “therapy”.
I inform him of the change I just made. By the time I scan what could have gone wrong, my colleague has spotted a System.ArgumentException: “An item with the same key has already been added.” (quite similar to this one). I had inserted a duplicate “name” under a different “code”. From that point, the quick fix was to rename one of the duplicated protocols with “old” and restore unique names as soon as possible. It stopped the application from crashing. Damage control done, I pushed a UNIQUE constraint onto the “name” column and was reminded that the key of such an index is limited to 1700 bytes. For good measure, the column was also altered to an appropriate nvarchar(200) instead of the outrageous nvarchar(4000). For comparison, the content of this entire blog post would fit in the original nvarchar(4000) column. Crazy.
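For the record, the fix boiled down to something like the sketch below. The table and column names follow the protocol(guid, code, name) shape mentioned above and are illustrative, not the actual legacy schema.

    -- Shrink the column first: nvarchar(200) is at most 400 bytes, comfortably under the
    -- 1700-byte key limit of the nonclustered index backing the UNIQUE constraint.
    -- Assumes the column holds no NULLs and no value longer than 200 characters.
    ALTER TABLE dbo.protocol
        ALTER COLUMN name nvarchar(200) NOT NULL;

    -- Then declare the natural key so the database itself rejects duplicates.
    ALTER TABLE dbo.protocol
        ADD CONSTRAINT UQ_protocol_name UNIQUE (name);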
Aftermath
A few things bother me about this incident, but let’s just consider the main one: we could have avoided it by implementing the constraint directly in the database. The application was assuming a contract the database was free to ignore.
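Concretely, that contract could have been declared up front, something like the sketch below: a surrogate primary key for the application, plus named UNIQUE constraints on the natural keys. Again, the names and types are assumptions for illustration, not the real schema.

    -- Illustrative only: surrogate key for the application, natural keys enforced by the database.
    CREATE TABLE dbo.protocol
    (
        guid uniqueidentifier NOT NULL
            CONSTRAINT PK_protocol PRIMARY KEY,
        code nvarchar(50)     NOT NULL
            CONSTRAINT UQ_protocol_code UNIQUE,
        name nvarchar(200)    NOT NULL
            CONSTRAINT UQ_protocol_name UNIQUE    -- the contract the application was relying on
    );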
Forgetting integrity constraints in the database is literally leaving traps open for yourself and your colleagues to get caught in. I’ve stumbled quite often on the assumption that all changes to the data will go through the application layer, so why bother with the database. Today it did bother me, in a new and stressful form. I learned the hard way that it can cost downtime on the application your users rely on.
How do I value the cost of downtime? High: it can disturb my sleep! Is it less than the upfront time to review a column definition? Nope, definitely not to me. Can I stop time and go through the database to remove all those traps? Surely not, but one at a time, we’re heading in the right direction!