IDENTITY crisis — 2. Death by Surrogate

Nothing looks more alike than 2 INT. I wish a warning could come if I ever try to join on 2 columns defined with IDENTITY.


Surrogate: 1 – Developer: 0

We left with ... on v.id = c.id ... and the query was:

select 
    date, patient_id, icd10_code, icdo_code, icdo_value
from diagnose d
    join icdo c on c.code = d.icdo_code
    -- Part 2, now!
    join icdo_value v on v.id = c.id
group by 
    -- same list as SELECT treated in Part 1
    date, patient_id, icd10_code, icdo_code, icdo_value

In our database on v.id = c.id should be something strange, create a black hole somewhere, a glitch in the matrix. Compare that with matching the numbers on each jersey in a team with the numbers on a bingo card. The numbers do match, but it is nonsensical. The query is incorrect, and the results dangerously misleading because “good enough” looking.

Moving on

Correcting the query is only a part of the job, let be constructive, let’s ask question! What could be done to avoid this nonsense and protect us, the poor (hum) developers? The posts to come should develop some of the following elements.

  1. Intellisense: It would have been nice to have an error in the way Msg: Can not JOIN on two IDENTITY columns: icdo_value.id / icdo.id … but I can not directly implement that in SSMS, kind of out of my league here.
  2. Naming: With something like … icdo_code_id = icdo_description_id … you know you’re in the wrong. Finding good enough names is an art and a necessity!
  3. Natural Keys: The icdo_codes do not need surrogates, it is a pretty fine primary key … in a code table! They look like ‘C50.9’ and make much sense than ‘234’ in our domain. As for the icdo_value, it is a function of code * version… so let it be, go for composite!
  4. User-defined data types (UDDT): icdo_code and icdo_value are 2 different types, you should get a nice error with a join! Point 2. and 3. are treated quite often in blogs/articles, but UDDT is not getting the attention I think it deserves. Outside of SQL we enjoy load of types. In SQL we abuse int, varchar, date and bit to fit all purposes? I don’t yet see the reason. If you know, leave a comment please! What have been your experience with it. Awesome? ORM don’t get it? Bug? Performance? It cannot always be ignorance or laziness.
  5. Code review: That saved us on this one and on many others. We were 2 persons with different approaches. Is your query important enough? Let somebody else have an eye on it!

Correctness is not all

Before we dive any deeper and unveil our tormentor in Part 3, here is a look at what our failed wannabe DISTINCT with id mismatch should have been. See how icdo is joined by c.code for the table diagnose but by the surrogate c.id for the table icdo_code_has_value. We were send to a minefield.

select 
    d.date, d.patient_id, 
    d.icd10_code, d.icdo_code, v.icdo_value
from diagnose d
    join icdo c on c.code = d.icdo_code
    join icdo_code_has_value ichv
        on c.id = ichv.code_id 
        and d.icdo_version = ichv.version
    join icdo_value v on v.id = ichv.value_id -- c.id
-- group by 
--     date, patient_id, icd10_code, icdo_code, icdo_value

too late to edit, ship it! SEE YOU SQL COWBOY…

2 thoughts on “IDENTITY crisis — 2. Death by Surrogate

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: