Putting the pieces together - Integrating clinical data with domain knowledge for advanced knowledge generation and prediction.

Author: Gehrmann, Julia
Publisher: Zenodo
DOI: 10.5281/zenodo.17702059
Source: https://zenodo.org/records/17702059/files/abstract-aikg-sd-2025-gehrmann.pdf
AIKG-SD 2025 Summer School co-located with the NFDI4DS Conference 2025
November 25-26, 2025, Berlin, Germany
Putting the pieces together - Integrating clinical data with domain knowledge for
advanced knowledge generation and prediction.
Julia Gehrmann
(ORCiD 0000-0002-4101-5458)
Institute for Biomedical Informatics, University of Cologne, Faculty of Medicine and University
Hospital Cologne
Abstract
Data capturing software like REDCap enables clinicians to document patient-level clinical
data in a structured and well-annotated way [1]. However, the data export from REDCap is
tabular, not covering potential semantic connections between features. These connections can
be relevant for downstream tasks generating new knowledge or predictions from the REDCap
data [2]. Therefore, we aim at developing an automated and reusable pipeline transforming
REDCap exports into patient-level vector representations, enriched with domain knowledge
from the clinical terminology SNOMED CT and insights from scholarly literature via
Knowledge Graph (KG) modeling. The enhanced vector representations can eventually serve
as input to downstream tasks such as classification and clustering.
The transformation process begins with the mapping of variable names to SNOMED CT
concepts, leveraging the clinical terminology to establish a semantic foundation for the KG.
Based on the resulting mappings and the REDCap metadata, we construct a KG in which
nodes represent patients, clinical events, and individual measurements. Each measurement is
connected to the respective patient and clinical event via a labeled edge. Moreover, we
introduce labeled edges between measurement nodes connected to the same patient and event
if their associated SNOMED CT concepts have a defined semantic relationship. This step
embeds medical hierarchies and domain knowledge into the graph structure [3].
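
The construction step described above can be illustrated with a minimal sketch. It is not the author's implementation: the variable names, the concept identifiers (placeholders, not real SNOMED CT codes), the relation table, and the edge labels are all assumptions chosen for illustration, and the long-format input stands in for a parsed REDCap export.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mapping of REDCap variable names to concepts.
# The identifiers are placeholders, NOT real SNOMED CT codes.
VAR_TO_CONCEPT = {
    "sys_bp": "C_SYSTOLIC_BP",
    "dia_bp": "C_DIASTOLIC_BP",
    "hba1c": "C_HBA1C",
}

# Hypothetical stand-in for a terminology lookup: unordered concept pairs
# that have a defined semantic relationship, with the relationship label.
CONCEPT_RELATIONS = {
    frozenset({"C_SYSTOLIC_BP", "C_DIASTOLIC_BP"}): "shares_parent",
}


def build_kg(rows):
    """Build nodes and labeled edges from long-format measurement records."""
    nodes, edges = {}, []
    groups = defaultdict(list)  # (patient, event) -> measurement node ids
    for row in rows:
        patient = f"patient:{row['patient']}"
        event = f"event:{row['event']}"
        meas = f"measurement:{row['patient']}:{row['event']}:{row['variable']}"
        if meas in nodes:
            continue  # skip duplicate records of the same measurement
        nodes[patient] = {"type": "patient"}
        nodes[event] = {"type": "event"}
        nodes[meas] = {
            "type": "measurement",
            "value": row["value"],
            "concept": VAR_TO_CONCEPT.get(row["variable"]),
        }
        # Each measurement is linked to its patient and its clinical event.
        edges.append((meas, patient, "measured_in"))
        edges.append((meas, event, "recorded_at"))
        groups[(patient, event)].append(meas)
    # Link measurements of the same patient and event whose concepts are related.
    for members in groups.values():
        for a, b in combinations(members, 2):
            pair = frozenset({nodes[a]["concept"], nodes[b]["concept"]})
            label = CONCEPT_RELATIONS.get(pair)
            if label is not None:
                edges.append((a, b, label))
    return nodes, edges


rows = [
    {"patient": "p1", "event": "baseline", "variable": "sys_bp", "value": 120},
    {"patient": "p1", "event": "baseline", "variable": "dia_bp", "value": 80},
    {"patient": "p1", "event": "baseline", "variable": "hba1c", "value": 5.6},
    {"patient": "p2", "event": "baseline", "variable": "sys_bp", "value": 135},
]
nodes, edges = build_kg(rows)
```

In this toy input only the two blood pressure measurements of patient p1 receive a semantic edge, because the hypothetical relation table contains only that concept pair.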
To further enrich the graph, we incorporate scholarly data by using a large language model
(LLM) fine-tuned on manually selected, domain-specific literature. The fine-tuned LLM
assesses the contextual strength of relationships between clinical concepts based on
literature-derived relevance and assigns respective weights to edges. The final KG consists of
nodes whose attributes comprise the original measurement values from the REDCap export and
semantic labels from SNOMED CT, while edge weights reflect both ontology-defined and
literature-informed concept relationships.
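
How the ontology-defined and the literature-informed signals are combined into one edge weight is not specified in the abstract. The sketch below assumes a simple convex combination controlled by a hypothetical parameter `alpha`; the function `literature_relevance` is a lookup-table stand-in for the fine-tuned LLM, and its scores and concept identifiers are invented placeholders.

```python
def literature_relevance(concept_a, concept_b):
    """Stand-in for the fine-tuned LLM: returns a relevance score in [0, 1].

    The scores below are invented placeholders, not model output.
    """
    scores = {frozenset({"C_SYSTOLIC_BP", "C_DIASTOLIC_BP"}): 0.9}
    return scores.get(frozenset({concept_a, concept_b}), 0.0)


def weight_edges(edges, alpha=0.5):
    """Blend an ontology weight with a literature score (assumed rule).

    edges: iterable of (concept_a, concept_b, ontology_weight) with weights
    in [0, 1]; alpha is the share given to the ontology-defined weight.
    """
    weighted = []
    for concept_a, concept_b, ontology_weight in edges:
        lit = literature_relevance(concept_a, concept_b)
        weight = alpha * ontology_weight + (1 - alpha) * lit
        weighted.append((concept_a, concept_b, weight))
    return weighted


weighted = weight_edges([
    ("C_SYSTOLIC_BP", "C_DIASTOLIC_BP", 1.0),
    ("C_SYSTOLIC_BP", "C_HBA1C", 0.0),
])
```

Keeping both inputs in [0, 1] keeps the blended weight in the same range, which is convenient if the weights are later reconstructed by a bounded decoder.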
In a second transformation process, we use a graph autoencoder to learn low-dimensional
embeddings of this enriched KG. By jointly considering the graph structure, edge weights,
and node attributes, the autoencoder generates vector representations that retain both the
clinical data from REDCap and the qualitative context derived from ontologies and scholarly
resources.
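
The abstract does not fix a specific autoencoder architecture. As one possible formalization, assuming a two-layer graph convolutional encoder, edge weights scaled to [0, 1], and an additional linear attribute decoder (all assumptions, including the symbols $W$ for the weighted adjacency matrix, $X$ for the node attribute matrix, $\Theta_0, \Theta_1, \Theta_2$ for learned parameters, and $\lambda$ for a trade-off factor), the embedding step could be written as:

```latex
\begin{align}
  \tilde{A} &= D^{-1/2} (W + I)\, D^{-1/2},
    \qquad D_{ii} = \sum_j (W + I)_{ij} \\
  Z &= \tilde{A}\, \mathrm{ReLU}\!\left(\tilde{A} X \Theta_0\right) \Theta_1 \\
  \hat{W} &= \sigma\!\left(Z Z^{\top}\right),
    \qquad \hat{X} = Z \Theta_2 \\
  \mathcal{L} &= \lVert W - \hat{W} \rVert_F^2
    + \lambda\, \lVert X - \hat{X} \rVert_F^2
\end{align}
```

Here each row of $Z$ is the low-dimensional embedding of one node, $\sigma$ is the logistic sigmoid, and the two loss terms penalize the reconstruction error of the edge weights and of the node attributes, respectively. Under this reading, the rows of $Z$ belonging to patient nodes would serve as the patient-level vector representations.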
Compared to the original REDCap export, the resulting embeddings offer several key
advantages: (1) They incorporate hierarchical relationships between clinical variables as
defined in SNOMED CT. (2) They comprise current biomedical knowledge through
literature-informed edge weighting. (3) They provide a compact, semantically aware
representation suitable for machine learning tasks in medical research. Furthermore, our KG-
based approach fully supports heterogeneity by allowing for further data modalities to be
integrated if available. The automated nature of our proposed pipeline, moreover, supports
typical heterogeneity between medical centers such as different REDCap database structures.
Above all, however, our approach aims to bridge the gap between structured clinical data and
established biomedical knowledge, enabling a more informed and context-aware application
of machine learning in healthcare.
References
[1] Harris, P. A., Taylor, R., Thielke, R., Payne, J., Gonzalez, N., & Conde, J. G. (2009).
Research electronic data capture (REDCap) - a metadata-driven methodology and
workflow process for providing translational research informatics support. Journal of
Biomedical Informatics, 42(2), 377-381.
[2] Oh, S. (2019). Feature interaction in terms of prediction performance. Applied Sciences,
9(23), 5191.
[3] Chang, E., & Mostafa, J. (2021). The use of SNOMED CT, 2013-2020: a literature
review. Journal of the American Medical Informatics Association, 28(9), 2017-2026.