Nothing looks more alike than 2 INT
. I wish a warning could come if I ever try to join on 2 columns defined with IDENTITY.
Surrogate: 1 – Developer: 0
We left with ... on v.id = c.id ...
and the query was:
select
date, patient_id, icd10_code, icdo_code, icdo_value
from diagnose d
join icdo c on c.code = d.icdo_code
-- Part 2, now!
join icdo_value v on v.id = c.id
group by
-- same list as SELECT treated in Part 1
date, patient_id, icd10_code, icdo_code, icdo_value
In our database on v.id = c.id
should be something strange, create a black hole somewhere, a glitch in the matrix. Compare that with matching the numbers on each jersey in a team with the numbers on a bingo card. The numbers do match, but it is nonsensical. The query is incorrect, and the results dangerously misleading because “good enough” looking.
Moving on
Correcting the query is only a part of the job, let be constructive, let’s ask question! What could be done to avoid this nonsense and protect us, the poor (hum) developers? The posts to come should develop some of the following elements.
- Intellisense: It would have been nice to have an error in the way
Msg: Can not JOIN on two IDENTITY columns: icdo_value.id / icdo.id
… but I can not directly implement that in SSMS, kind of out of my league here. - Naming: With something like … icdo_code_id = icdo_description_id … you know you’re in the wrong. Finding good enough names is an art and a necessity!
- Natural Keys: The icdo_codes do not need surrogates, it is a pretty fine primary key … in a code table! They look like ‘C50.9’ and make much sense than ‘234’ in our domain. As for the icdo_value, it is a function of code * version… so let it be, go for composite!
- User-defined data types (UDDT): icdo_code and icdo_value are 2 different types, you should get a nice error with a join! Point 2. and 3. are treated quite often in blogs/articles, but UDDT is not getting the attention I think it deserves. Outside of SQL we enjoy load of types. In SQL we abuse int, varchar, date and bit to fit all purposes? I don’t yet see the reason. If you know, leave a comment please! What have been your experience with it. Awesome? ORM don’t get it? Bug? Performance? It cannot always be ignorance or laziness.
- Code review: That saved us on this one and on many others. We were 2 persons with different approaches. Is your query important enough? Let somebody else have an eye on it!
Correctness is not all
Before we dive any deeper and unveil our tormentor in Part 3, here is a look at what our failed wannabe DISTINCT with id mismatch should have been. See how icdo
is joined by c.code
for the table diagnose
but by the surrogate c.id
for the table icdo_code_has_value
. We were send to a minefield.
select
d.date, d.patient_id,
d.icd10_code, d.icdo_code, v.icdo_value
from diagnose d
join icdo c on c.code = d.icdo_code
join icdo_code_has_value ichv
on c.id = ichv.code_id
and d.icdo_version = ichv.version
join icdo_value v on v.id = ichv.value_id -- c.id
-- group by
-- date, patient_id, icd10_code, icdo_code, icdo_value
too late to edit, ship it! SEE YOU SQL COWBOY…
2 thoughts on “IDENTITY crisis — 2. Death by Surrogate”