IDENTITY crisis — 3. Relational Distancing

After Part 1 and Part 2. I want to embark on a code review. Playing the pathologist, running a small diagnostic on the diagnose table and its friends.


Relationaaaaaaah

By the all mighty, we shall invoke SchemaSpy, and the picture should speak:

a lonely diagnose Table

I now feel so sad for diagnose. Quarantine measures? The only relation is “implied”. The FOREIGN KEY do not exists. Just the neat SchemaSpy telling us what ought to be.

We now dive deeper and zoom on the structure. Along the trip I will shoot at the tables with assumption and business rules … and see how this proto-database holds.

Load, Aim…

Relational Atoms

One positive point: the codes/static data are present in the database. It’s not that an evidence. That being said … FIRE!

CREATE TABLE old.icdo_version (
        version  INT NOT NULL PRIMARY KEY CLUSTERED
        );
        ---------------------------------------------
        insert into old.icdo_version values (2011), (2017);
            /*
            -- PASS: unique version
            insert into old.icdo_version values (2011);
            -- PASS: no null
            insert into old.icdo_version values (NULL);
            -- FAIL: version named after years
            insert into old.icdo_version values (42);
            */
 

    CREATE TABLE old.icdo_value (
        id    INT IDENTITY    NOT NULL PRIMARY KEY CLUSTERED,
        text  NVARCHAR (400) -- NULL and DUPLICATE allowed?
        );
        --------------------------------------------
        insert into old.icdo_value(text)
            values ('C11/2011'), ('C22.2/2011'), ('C11/2017'), ('C99.4/2017');
            /*
            -- FAIL: unique meaning, else what is the point of such a value catalog
            insert into old.icdo_value(text) select top 1 text from old.icdo_value order by id asc;
            -- FAIL: a code as a meaning
            insert into old.icdo_value(text) select NULL;
            */

    CREATE TABLE old.icdo (
        id   INT IDENTITY  NOT NULL PRIMARY KEY CLUSTERED,
        code VARCHAR (12)  NOT NULL UNIQUE
            -- later to be used in a varchar(8)
            -- but here it contains 2 codes and one of them
        );
        -------------------------------------------------
        insert into old.icdo(code)
            values ('C11'), ('C22.2'), ('C99.4');
            /*
            -- PASS: Unique
            insert into old.icdo(code) select top 1 code from old.icdo order by id asc;
            -- PASS: NON NULL
            insert into old.icdo(code) select NULL;
            -- FAIL: Well formatted '[CD][0-9][0-9]%'
            insert into old.icdo(code) select 'bad format';
            */

Round 1: 4 out of 8 target. Not that reliable for a building block. I particularly don’t enjoy that duplicated are allowed where value should be unique. Also previously mentioned, the unfortunate naming and choice for surrogate over nice natural keys.

Loaded, Attack!

Molecules

Next level: not all combinations out of code * version * text are relevant. But let’s see how it behave on the field.

   CREATE TABLE old.icdo_code_has_value (
        code_id  INT NOT NULL
            FOREIGN KEY REFERENCES old.icdo(id) 
            ON DELETE CASCADE ON UPDATE CASCADE,
                -- on delete cascade. not playing safe here
        value_id INT NOT NULL
            FOREIGN KEY REFERENCES old.icdo_value(id) ON DELETE CASCADE ON UPDATE CASCADE,
        version  INT NOT NULL
            FOREIGN KEY REFERENCES old.icdo_version(version)
                ON DELETE CASCADE ON UPDATE CASCADE,    

        PRIMARY KEY CLUSTERED (code_id, version, value_id),
            -- This was a good start: but this PK allow different value
            -- to be added to the code * version
        );
        -----------------------------------------
        with prep as (
            select val.id,
                LEFT(val.text, CHARINDEX('/', val.text) - 1) as code,
                cast( SUBSTRING(val.text, charindex('/', val.text) +1, len(val.text)) as int) as version
            from old.icdo_value as val
            )
        insert into old.icdo_code_has_value(code_id, value_id, version)
        select icdo.id, prep.id, icdo_version.version
        from prep
            join old.icdo on icdo.code = prep.code
            join old.icdo_version on icdo_version.version = prep.version
        ;
        select * from old.icdo_code_has_value;
            /*
            -- FAIL: code * version => text --
            insert into old.icdo_code_has_value(code_id, value_id, version)
            select 1, 2, 2011;
            */

Positive point: no null allowed. We could nonetheless score 1 major hit due to the zealous Primary Key. This could lead to undetermined text: some code*version may be assigned 2 meaning… try to hold your JOIN.

Cells

Last to document the different cases:

    CREATE TABLE old.diagnose (
        diagnose_id    INT IDENTITY   NOT NULL PRIMARY KEY CLUSTERED,
        patient_id     INT            NOT NULL, -- FK/Ref Table removed for clarity
        icd10_code     VARCHAR (8)    ,
        icdo_code     VARCHAR (8)    ,
            -- lol... our codes are varchar(12) because mixing 2 types of codes
            -- no luck to get a FK on that
        icdo_version   INT           
            -- hey... anyone ... I'm looking for FK here... no?
        );
        --------------------------------------------------
        insert into old.diagnose(patient_id, icd10_code, icdo_code, icdo_version)
        select 99, 'd44', icdo.code, ichv.version
        from old.icdo_code_has_value ichv
        join old.icdo on icdo.id = ichv.code_id;
        select * from old.diagnose;        
            /*
            -- FAIL * 3
            -- Non allowed combination with non allowed code or value
                -- update without where... do not do this at home :)
            update old.diagnose
            set icdo_code = 'hell', icdo_version = 666;
            select * from old.diagnose;
            */

… oh, poor diagnose table. On top of the lack of data integrity, please do ponder the varchar(8) for a varchar(12) associated code. .. and for the fun, throw our last grenade šŸ™‚

ALTER TABLE old.diagnose
    ADD FOREIGN KEY (icdo_code) REFERENCES old.icdo(code);
Msg 1753, Level 16, State 0, Line 131
Column 'old.icdo.code' is not the same length or scale as referencing column 'diagnose.icdo_code' in foreign key 'FK__diagnose__icdo_c__4222D4EF'. Columns participating in a foreign key relationship must be defined with the same length and scale.
Msg 1750, Level 16, State 1, Line 131
Could not create constraint or index. See previous errors.

Ah if only the application was the only point of access, the data quality would be fine… sure… yeah sure…


Now rest! We’ve actually played the naughty virus, and infected the database with erroneous data.

That was not for free! We’ve put some light on the part to improve. Bringing more integrity to the data should help us write better reports, trust our data and last but not least, give some hint to the SQL Server engine to optimize the performance.

Now that it’s done let’s wait the Part 4. I should practice some operation, try to heal this design… and confront it to your evaluations and comments.

too late to edit, ship it! SEE YOU SQL COWBOY…

One thought on “IDENTITY crisis — 3. Relational Distancing

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: