After Part 1 and Part 2. I want to embark on a code review. Playing the pathologist, running a small diagnostic on the diagnose
table and its friends.
Relationaaaaaaah
By the all mighty, we shall invoke SchemaSpy, and the picture should speak:

diagnose
TableI now feel so sad for diagnose
. Quarantine measures? The only relation is “implied”. The FOREIGN KEY do not exists. Just the neat SchemaSpy telling us what ought to be.
We now dive deeper and zoom on the structure. Along the trip I will shoot at the tables with assumption and business rules … and see how this proto-database holds.
Load, Aim…
Relational Atoms
One positive point: the codes/static data are present in the database. It’s not that an evidence. That being said … FIRE!
CREATE TABLE old.icdo_version (
version INT NOT NULL PRIMARY KEY CLUSTERED
);
---------------------------------------------
insert into old.icdo_version values (2011), (2017);
/*
-- PASS: unique version
insert into old.icdo_version values (2011);
-- PASS: no null
insert into old.icdo_version values (NULL);
-- FAIL: version named after years
insert into old.icdo_version values (42);
*/
CREATE TABLE old.icdo_value (
id INT IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
text NVARCHAR (400) -- NULL and DUPLICATE allowed?
);
--------------------------------------------
insert into old.icdo_value(text)
values ('C11/2011'), ('C22.2/2011'), ('C11/2017'), ('C99.4/2017');
/*
-- FAIL: unique meaning, else what is the point of such a value catalog
insert into old.icdo_value(text) select top 1 text from old.icdo_value order by id asc;
-- FAIL: a code as a meaning
insert into old.icdo_value(text) select NULL;
*/
CREATE TABLE old.icdo (
id INT IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
code VARCHAR (12) NOT NULL UNIQUE
-- later to be used in a varchar(8)
-- but here it contains 2 codes and one of them
);
-------------------------------------------------
insert into old.icdo(code)
values ('C11'), ('C22.2'), ('C99.4');
/*
-- PASS: Unique
insert into old.icdo(code) select top 1 code from old.icdo order by id asc;
-- PASS: NON NULL
insert into old.icdo(code) select NULL;
-- FAIL: Well formatted '[CD][0-9][0-9]%'
insert into old.icdo(code) select 'bad format';
*/
Round 1: 4 out of 8 target. Not that reliable for a building block. I particularly don’t enjoy that duplicated are allowed where value should be unique. Also previously mentioned, the unfortunate naming and choice for surrogate over nice natural keys.
Loaded, Attack!
Molecules
Next level: not all combinations out of code * version * text are relevant. But let’s see how it behave on the field.
CREATE TABLE old.icdo_code_has_value (
code_id INT NOT NULL
FOREIGN KEY REFERENCES old.icdo(id)
ON DELETE CASCADE ON UPDATE CASCADE,
-- on delete cascade. not playing safe here
value_id INT NOT NULL
FOREIGN KEY REFERENCES old.icdo_value(id) ON DELETE CASCADE ON UPDATE CASCADE,
version INT NOT NULL
FOREIGN KEY REFERENCES old.icdo_version(version)
ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY CLUSTERED (code_id, version, value_id),
-- This was a good start: but this PK allow different value
-- to be added to the code * version
);
-----------------------------------------
with prep as (
select val.id,
LEFT(val.text, CHARINDEX('/', val.text) - 1) as code,
cast( SUBSTRING(val.text, charindex('/', val.text) +1, len(val.text)) as int) as version
from old.icdo_value as val
)
insert into old.icdo_code_has_value(code_id, value_id, version)
select icdo.id, prep.id, icdo_version.version
from prep
join old.icdo on icdo.code = prep.code
join old.icdo_version on icdo_version.version = prep.version
;
select * from old.icdo_code_has_value;
/*
-- FAIL: code * version => text --
insert into old.icdo_code_has_value(code_id, value_id, version)
select 1, 2, 2011;
*/
Positive point: no null allowed. We could nonetheless score 1 major hit due to the zealous Primary Key. This could lead to undetermined text: some code*version may be assigned 2 meaning… try to hold your JOIN.
Cells
Last to document the different cases:
CREATE TABLE old.diagnose (
diagnose_id INT IDENTITY NOT NULL PRIMARY KEY CLUSTERED,
patient_id INT NOT NULL, -- FK/Ref Table removed for clarity
icd10_code VARCHAR (8) ,
icdo_code VARCHAR (8) ,
-- lol... our codes are varchar(12) because mixing 2 types of codes
-- no luck to get a FK on that
icdo_version INT
-- hey... anyone ... I'm looking for FK here... no?
);
--------------------------------------------------
insert into old.diagnose(patient_id, icd10_code, icdo_code, icdo_version)
select 99, 'd44', icdo.code, ichv.version
from old.icdo_code_has_value ichv
join old.icdo on icdo.id = ichv.code_id;
select * from old.diagnose;
/*
-- FAIL * 3
-- Non allowed combination with non allowed code or value
-- update without where... do not do this at home :)
update old.diagnose
set icdo_code = 'hell', icdo_version = 666;
select * from old.diagnose;
*/
… oh, poor diagnose table. On top of the lack of data integrity, please do ponder the varchar(8)
for a varchar(12)
associated code. .. and for the fun, throw our last grenade š
ALTER TABLE old.diagnose
ADD FOREIGN KEY (icdo_code) REFERENCES old.icdo(code);
Msg 1753, Level 16, State 0, Line 131
Column 'old.icdo.code' is not the same length or scale as referencing column 'diagnose.icdo_code' in foreign key 'FK__diagnose__icdo_c__4222D4EF'. Columns participating in a foreign key relationship must be defined with the same length and scale.
Msg 1750, Level 16, State 1, Line 131
Could not create constraint or index. See previous errors.
Ah if only the application was the only point of access, the data quality would be fine… sure… yeah sure…
Now rest! We’ve actually played the naughty virus, and infected the database with erroneous data.
That was not for free! We’ve put some light on the part to improve. Bringing more integrity to the data should help us write better reports, trust our data and last but not least, give some hint to the SQL Server engine to optimize the performance.
Now that it’s done let’s wait the Part 4. I should practice some operation, try to heal this design… and confront it to your evaluations and comments.
too late to edit, ship it! SEE YOU SQL COWBOY…
One thought on “IDENTITY crisis — 3. Relational Distancing”