IDENTITY crisis — Part 1: a DISTINCT in disguise

Nothing looks more alike than 2 INT. I wish a warning could come if I ever try to join on 2 columns defined with IDENTITY.

This started as a single post. But as I’m redacting, it gets longer. I plan/need to split it in 3 parts. That being said:

If you want to make God laugh, tell him about your plans.

Woody Allen?

I recently required some help from a colleague. We receive export request like:

Please give us a list of all patient with XXX, YYY, ZZZ tumors of type TTT

Dr. med. Dr. Prof.

I had interpreted the 3 tumors description in the Disease catalog (ICD10), but they (singular neutral)* rightly redirected to the Location catalog (ICD-O). The 2 really are correlated.

In doing so, the colleague nicely provided a SQL script. I was more than happy, I knew the way the ICD* codes are build in the database is not trivial and the results was now close to the expectation. The reason why I asked a review was that my solution had 7 elements, expected were 70.

But one things was worrying: the uniqueness was achieved with a GROUP BY on all selected attributes:

select 
    date, patient_id, icd10_code, icdo_code, icdo_value
from diagnose d
    join icdo c on c.code = d.icdo_code
    join icdo_value v on v.id = c.id
group by 
    -- same list as SELECT ... code RED :( ALARM
    date, patient_id, icd10_code, icdo_code, icdo_value

A simple SELECT would have created an explosion: for a given code, the value is dependent on the version (definition_text ~ code * version). If you short cut it, you multiple and get size = code * (number of version). They chooses to treat it with this GROUP BY… And I’m still left today wondering why…

But okay, I have to move on. (- OK Fred? you can not move on that, do you? -Tssst… I said move on. – But… – MOVE ON). So let’s get this version to achieve uniqueness. Considering the query, I go look at the relation icdo_value for the FOREIGN KEY pointing in some way to the version… but none to be found. This table just stares at me with a single primary key named id of type IDENTITY. Huuuu?

Back to the query: ... on v.id = c.id ... Can you see what’s going on here? It took me some times. More on that in Part 2, R.I.P. GROUP BY (sorrow).


* just trying to get use to it.

2 thoughts on “IDENTITY crisis — Part 1: a DISTINCT in disguise

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: