Nothing looks more alike than 2 INT
. I wish a warning could come if I ever try to join on 2 columns defined with IDENTITY.
This started as a single post. But as I’m redacting, it gets longer. I plan/need to split it in 3 parts. That being said:
If you want to make God laugh, tell him about your plans.
Woody Allen?
I recently required some help from a colleague. We receive export request like:
Please give us a list of all patient with XXX, YYY, ZZZ tumors of type TTT
Dr. med. Dr. Prof.
I had interpreted the 3 tumors description in the Disease catalog (ICD10), but they (singular neutral)* rightly redirected to the Location catalog (ICD-O). The 2 really are correlated.
In doing so, the colleague nicely provided a SQL script. I was more than happy, I knew the way the ICD* codes are build in the database is not trivial and the results was now close to the expectation. The reason why I asked a review was that my solution had 7 elements, expected were 70.
But one things was worrying: the uniqueness was achieved with a GROUP BY on all selected attributes:
select
date, patient_id, icd10_code, icdo_code, icdo_value
from diagnose d
join icdo c on c.code = d.icdo_code
join icdo_value v on v.id = c.id
group by
-- same list as SELECT ... code RED :( ALARM
date, patient_id, icd10_code, icdo_code, icdo_value
A simple SELECT would have created an explosion: for a given code, the value is dependent on the version (definition_text ~ code * version). If you short cut it, you multiple and get size = code * (number of version)
. They chooses to treat it with this GROUP BY… And I’m still left today wondering why…
But okay, I have to move on. (- OK Fred? you can not move on that, do you? -Tssst… I said move on. – But… – MOVE ON). So let’s get this version to achieve uniqueness. Considering the query, I go look at the relation icdo_value
for the FOREIGN KEY pointing in some way to the version
… but none to be found. This table just stares at me with a single primary key named id of type IDENTITY. Huuuu?
Back to the query: ... on v.id = c.id ...
Can you see what’s going on here? It took me some times. More on that in Part 2, R.I.P. GROUP BY (sorrow).
* just trying to get use to it.
2 thoughts on “IDENTITY crisis — Part 1: a DISTINCT in disguise”