Privacy-Preserving Linkage of Distributed Pseudonymised Datasets in a Virtual European Rare Disease Platform

Author: Hayn, Dieter,Sandner, Emanuel,Vengadeswaran, Abishaa,Tãtaru, Elena Alexandra,Wilkinson, Mark D.,Hanauer, Marc,Kreiner, Karl,Schreier, Guenter
Publisher: IOS Press
DOI: http://dx.doi.org/10.13039/501100000780
Source: https://digital.csic.es/bitstream/10261/369324/1/Privacy_Preserving_Linkage.pdf
Privacy-Preserving Linkage of Distributed Pseudonymised Datasets in a Virtual European Rare Disease Platform
Dieter HAYN a,1, Emanuel SANDNER a, Abishaa VENGADESWARAN b, Elena-Alexandra TÃTARU c, Mark WILKINSON d, Marc HANAUER c, Karl KREINER a and Guenter SCHREIER a
a AIT Austrian Institute of Technology GmbH, Graz, Austria
b Goethe University Frankfurt, University Hospital, Institute of Medical Informatics (IMI), Frankfurt am Main, Germany
c French National Institute of Health and Medical Research (INSERM), Paris, France
d Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas UPM-INIA, Universidad Politécnica de Madrid (UPM), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA/CSIC), Madrid, Spain
ORCiD ID: Dieter Hayn https://orcid.org/0000-0003-1822-9033, Marc Hanauer https://orcid.org/0000-0002-6758-2506, Karl Kreiner https://orcid.org/0000-0001-6066-9708, Guenter Schreier https://orcid.org/0000-0003-3724-4255, Elena-Alexandra Tãtaru https://orcid.org/0009-0007-7339-7175
Abstract. Secondary use of data for research purposes is especially important in rare diseases (RD), since, per definition, data are sparse. The European Joint Programme on Rare Diseases (EJP RD) aims at developing an RD infrastructure which supports the secondary use of data. Significant amounts of RD data are a) distributed and b) available only in pseudonymised format. Privacy-Preserving Record Linkage (PPRL) concerns the linking of such distributed datasets without disclosing the participants’ identities. We present a concept for linking a PPRL Service to the EJP RD Virtual Platform (VP). Level 1 (resource discovery) connection is provided by running an FDP within the PPRL Service. On Level 2 (data discoverability), the PPRL Service can represent both an individual and a catalog endpoint. Our solution can count patients in PPRL-supporting resources, count duplicates only once, and count only patients registered to multiple resources. Currently, we are preparing the deployment within the EJP RD VP.
Keywords. Rare diseases, privacy-preserving record linkage (PPRL), research infrastructure, secondary use, European health data space (EHDS)
1 Corresponding Author: Dieter Hayn, Reininghausstr. 13/1, Graz, Austria; E-mail: [email protected].
Digital Health and Informatics Innovations for Sustainable Health Care Systems
J. Mantas et al. (Eds.)
© 2024 The Authors.
This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/SHTI240683
1. Introduction
Secondary use of data for research purposes is especially important in rare diseases (RD), since, per definition, data are sparse. Support of secondary use of health data is one of the main aims of the European Health Data Space (EHDS, [1]). For RD, the European Joint Programme on RD (EJP RD) and its successor, the European RD Research Alliance (ERDERA), aim at developing a European RD infrastructure, which links to the EHDS, based on the Findable, Accessible, Interoperable and Reusable (FAIR, [2]) principles.
In the absence of clinical guidelines, patients are often treated according to the most recent trial protocol (see e.g. [3]). In addition to electronic trial data capture systems and registries, RD patients’ biological samples (blood, tumour tissue, urine, bone marrow, etc.) and genomic profiles are typically stored in biobanks. Therefore, significant amounts of quality-controlled RD data are a) distributed and b) available only in pseudonymised format, with different pseudonyms for different contexts, as demanded in the General Data Protection Regulation (GDPR) [4].
Privacy-Preserving Record Linkage (PPRL) concerns the linking of different datasets without disclosing the participants’ identity information. PPRL can be applied to personalized data (e.g., between different hospitals) or between pseudonymised datasets (e.g., between different registries). PPRL covers a huge variety of scenarios. This paper focusses on a small subset of three dedicated use cases related to the EJP RD:
1. Count patients in PPRL-supporting resources: As an RD researcher, I would like to count patients registered in European RD resources supporting PPRL so that I can quantify the current state of PPRL support in Europe.
2. Count duplicates only once: As an RD researcher, I would like to count patients registered in European RD resources supporting PPRL and count patients registered to more than one resource only once so that I can avoid biased results due to duplicates.
3. Count patients in multiple resources: As an RD researcher, I would like to count patients registered in a minimum number of European RD resources so that I can analyse the overlap of patients between resources.
Currently, these use cases are not supported by any European RD infrastructure. The present paper describes a concept for how these use cases could be supported in the EJP RD.
2. Materials and Methods
We have developed a concept for linking PPRL Services to the EJP RD infrastructure based on pre-existing components and specifications as described in the following.
2.1. European Joint Programme on Rare Diseases Virtual Platform (VP)
2.1.1. Overall architecture
The EJP RD Virtual Platform (VP) is a service-oriented ecosystem of inter-linked web services that provides researchers with a unified way to access resources such as registries, biobanks, data repositories and catalogues. The VP is distributed and federated by design, meaning that most services are provided from multiple remote European locations rather than a central location. Researchers can enter the VP either via the VP’s interfaces (see below), or via specific VP portals, which provide graphical user interfaces to specify requests in a user-friendly way.
2.1.2. Metadata model
Within the VP, a standardized metadata model is being used to express metadata provided by resources. The ‘Data Catalog Vocabulary’ (DCAT), a widely used vocabulary recommended by the World Wide Web Consortium (W3C) for describing resources, is used as the base model of the EJP RD metadata. The current version 1.0 of the EJP RD metadata model (Figure 1) provides schemas to describe the following types of resources [5]: Patient Registry, Biobank, Dataset, Data Service, Guideline.
Figure 1. European Joint Programme on Rare Diseases DCAT based metadata model [5]
2.1.3. Level 1 – Resource discovery
Any resource linked to the VP must provide its metadata automatically (rather than manually) by a standard mechanism. Therefore, the FAIR Data Point (FDP) specification is applied [6]. Depending on the resource type (patient registry, biobank, or guideline resource), different metadata elements are mandatory, recommended, and optional. Any EJP RD resource’s FDP is linked to the EJP RD central FDP index, which is used to discover RD resources by using the portal of the VP (“Level 1”).
2.1.4. Level 2 – Content discovery
Level 1 connected resources are encouraged to also support interrogation of anonymous / aggregated data by supporting queries within the data records (“Level 2”). A Level 2 query request will typically return information as either yes/no, or counts. Level 2 queries allow researchers to determine if the resource is likely to contain relevant information to investigate their research question according to their respective study design and methodology. While single resources can be queried via the “individual” endpoint, the “catalog” endpoint can be used to query bundles of resources that are linked to a certain catalog provider in a single request. At Level 2, resource discovery queries are carried out programmatically using the GA4GH Beacon 2 framework standard [7], which has been slightly adapted to the requirements of the VP. The Beacon 2 framework provides an API and a JSON exchange format for query and results.
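To illustrate the JSON exchange format, the following sketch builds a count-type request body. The field names (`meta`, `apiVersion`, `query`, `filters`, `requestedGranularity`) follow our reading of the generic Beacon 2 framework; the helper `build_count_query` and the example filter identifier are hypothetical, and the VP-adapted specification [9] may differ in detail.

```python
import json


def build_count_query(filter_ids):
    """Build a Beacon-2-style request body asking for counts only.

    `filter_ids` is a list of filter identifiers (e.g. ontology terms).
    The layout is an assumption based on the generic Beacon 2 framework,
    not the exact VP Level 2 specification.
    """
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "filters": [{"id": filter_id} for filter_id in filter_ids],
            # "count" asks for a number instead of a yes/no answer
            "requestedGranularity": "count",
        },
    }


# Illustrative filter identifier; real identifiers depend on the resource.
body = build_count_query(["ordo:Orphanet_558"])
payload = json.dumps(body, indent=2)  # JSON text to be sent to an endpoint
```

The same body could be sent to an “individual” or a “catalog” endpoint; only the target URL would change.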
2.2. EUPID Services
The EUPID Services are a PPRL Service which was developed within the EU FP7 project European Network for Cancer research in Children and Adolescents (ENCCA) [8]. The EUPID Services support generation of different pseudonyms for one and the same patient in different EUPID Contexts (e.g., clinical trials, registries, biobanks), while keeping the possibility to link the pseudonymised datasets in a privacy-preserving way, without the need to involve all the primary sites that hold the patients’ identity data. To this end, cryptographic algorithms and Argon2 hashes are applied. While the EUPID Services store encrypted / hashed identity data and pseudonyms of registered patients, all clinical / sensitive data are stored separately in the respective resources.
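The principle of context-specific pseudonyms that remain linkable by a central service can be sketched as follows. This is a toy illustration only and not the EUPID algorithm: a keyed HMAC-SHA256 stands in for the cryptographic algorithms and Argon2 hashes mentioned above, and the class `ToyPseudonymService` with its methods is hypothetical.

```python
import hashlib
import hmac
import secrets


class ToyPseudonymService:
    """Toy PPRL service: one random pseudonym per patient and context."""

    def __init__(self):
        self._secret = secrets.token_bytes(32)  # key known to the service only
        self._by_identity = {}   # identity hash -> {context: pseudonym}
        self._by_pseudonym = {}  # (context, pseudonym) -> identity hash

    def _identity_hash(self, first, last, birth_date):
        # Normalise, then hash, so that no plain identity data is stored.
        normalized = "|".join(p.strip().lower() for p in (first, last, birth_date))
        return hmac.new(self._secret, normalized.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def register(self, context, first, last, birth_date):
        """Return the patient's pseudonym in `context`, creating it if needed."""
        identity = self._identity_hash(first, last, birth_date)
        contexts = self._by_identity.setdefault(identity, {})
        if context not in contexts:
            pseudonym = secrets.token_hex(8)
            contexts[context] = pseudonym
            self._by_pseudonym[(context, pseudonym)] = identity
        return contexts[context]

    def link(self, context, pseudonym, target_context):
        """Map a pseudonym from one context to another, or return None."""
        identity = self._by_pseudonym.get((context, pseudonym))
        if identity is None:
            return None
        return self._by_identity[identity].get(target_context)


service = ToyPseudonymService()
in_trial = service.register("trial", "Ada", "Lovelace", "1815-12-10")
in_registry = service.register("registry", " ada", "LOVELACE ", "1815-12-10")
```

In this sketch, the trial and the registry each see only their own pseudonym, while `service.link("trial", in_trial, "registry")` recovers the registry pseudonym without contacting either primary site.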
3. Results
3.1. Overall architecture
The proposed overall architecture for linking PPRL Services such as the EUPID Services to the EJP RD VP is shown in Figure 2.
Figure 2. Overall architecture for connecting Privacy-Preserving Record Linkage (PPRL) Services such as the EUPID Services to the European Joint Programme on Rare Diseases Virtual Platform
3.2. Level 1 – Resource discovery
Level 1 (resource discovery) connection between the VP and the EUPID Services is provided by running an FDP within the EUPID Services infrastructure. Within the FDP, basic metadata concerning the service are provided, together with links to the two Level 2 endpoints described below.
3.3. Level 2 – Content discovery
On Level 2, the EUPID Services can represent both an individual and a catalog endpoint. As an individual endpoint, the EUPID Services provide information concerning the overall number of patients registered within their database. As a catalog, these data are provided a) for each EUPID Context separately (e.g., number of patients in Context A) and b) for all identified combinations of EUPID Contexts (e.g., number of patients in Context A & B). The subsequent requirements apply to both endpoints.
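The catalog-style counts can be sketched as below, assuming that the linkage result is available as a mapping from a linked patient to the set of contexts in which that patient is registered. The function `catalog_counts`, the patient keys and the context names are hypothetical, and a combination is interpreted here as "registered in at least these contexts", which is our assumption.

```python
from collections import Counter
from itertools import combinations


def catalog_counts(registrations):
    """Count patients per context and per combination of contexts.

    `registrations` maps a linked patient (hypothetical internal key)
    to the set of contexts in which that patient is registered.
    """
    per_context = Counter()
    per_combination = Counter()
    for contexts in registrations.values():
        ordered = sorted(contexts)
        for context in ordered:
            per_context[context] += 1
        # Every subset of two or more contexts the patient belongs to.
        for size in range(2, len(ordered) + 1):
            for combo in combinations(ordered, size):
                per_combination[combo] += 1
    return per_context, per_combination


# Toy linkage result: three patients, three contexts.
example = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"A", "B", "C"}}
per_context, per_combination = catalog_counts(example)
```

For the toy data, Context A holds three patients and the combination A & B holds two.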
For Level 2 querying, we specified filter criteria for Beacon-2 queries, which need to be added to the current specification of EJP RD Beacon endpoints (as specified in [9]):
• F1 “Count duplicates only once” (boolean, default: FALSE)
• F2 “Ignore resources not supporting duplicate detection” (boolean, default: FALSE)
• N1 Minimum number of resources a patient must be registered to (integer, default: 1)
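A possible reading of F1 and N1 on the side of a PPRL-supporting resource is sketched below. The semantics are our assumption: with F1 = FALSE every registration is counted, with F1 = TRUE every linked patient is counted once, and N1 restricts the count to patients registered in at least N1 resources. The function and parameter names are hypothetical.

```python
def count_patients(registrations, count_duplicates_once=False, min_resources=1):
    """Count patients under the assumed semantics of F1 and N1.

    `registrations` maps a linked patient to the set of resources in
    which that patient is registered.
    """
    eligible = [
        resources
        for resources in registrations.values()
        if len(resources) >= min_resources  # N1
    ]
    if count_duplicates_once:  # F1 = TRUE: one count per linked patient
        return len(eligible)
    # F1 = FALSE: one count per registration, duplicates included
    return sum(len(resources) for resources in eligible)


# Toy linkage result: three patients registered in up to three resources.
example = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"A", "B", "C"}}
registrations_total = count_patients(example)
unique_patients = count_patients(example, count_duplicates_once=True)
in_two_or_more = count_patients(example, count_duplicates_once=True, min_resources=2)
```

Under these assumptions, the three calls correspond to the three use cases of Section 1: all registrations, unique patients, and patients found in at least two resources.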
Portals of the VP need to update their GUI and their Beacon-2 requests to support F1, F2, and N1. Additionally, they need to either send queries only to suitable resources or filter responses based on PPRL support prior to visualization. Finally, they are recommended to visualize the number of resources that do not support the selected flags in addition to the count. Resources supporting duplicate detection should count accordingly when replying to requests with F1, F2 or N1, and indicate in their reply that duplicate detection was applied. Resources not supporting duplicate detection, which are already connected to the VP, are not required to adapt their interfaces.
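The portal-side handling of F2 could look as follows. This is a minimal sketch under the assumption that each reply carries a count and an indication of whether duplicate detection was applied; the class `ResourceReply`, its fields and the function `aggregate` are hypothetical and not part of the specification in [9].

```python
from dataclasses import dataclass


@dataclass
class ResourceReply:
    name: str
    count: int
    duplicate_detection_applied: bool  # indicated by the resource in its reply


def aggregate(replies, ignore_unsupported=False):
    """Sum counts and report how many resources lack duplicate detection.

    With `ignore_unsupported=True` (F2), replies from resources that did
    not apply duplicate detection are filtered out before summing.
    """
    unsupported = [r for r in replies if not r.duplicate_detection_applied]
    used = [
        r for r in replies
        if r.duplicate_detection_applied or not ignore_unsupported
    ]
    return {
        "total": sum(r.count for r in used),
        "unsupported_resources": len(unsupported),
    }


replies = [
    ResourceReply("PPRL service", 120, True),
    ResourceReply("Registry X", 40, False),
]
with_all = aggregate(replies)
pprl_only = aggregate(replies, ignore_unsupported=True)
```

Reporting `unsupported_resources` next to the total corresponds to the recommendation to visualize how many resources do not support the selected flags.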
4. Discussion and Conclusions
We have developed a concept for linking PPRL-supporting resources to the EJP RD VP based on the EUPID Services. Our solution is capable of addressing the three use cases presented in Section 1, i.e., counting patients in PPRL-supporting resources, counting duplicates only once, and counting patients registered to multiple resources. By linking the PPRL Service to the VP, all resources connected to the respective PPRL Service are connected, without the need to adapt the interfaces of these resources or of other resources already connected to the VP. With our approach, querying is only possible based on those data that are available within the PPRL Service. In the case of the EUPID Services, this means that counting can only be done based on patients and on metadata available on a resource level. No filtering based on patient-level data (e.g., age, sex, etc.) is supported. So far, our concept only supports resource discovery and content discovery. Future versions of the VP will also support federated analyses of data from multiple RD resources in Europe (Level 3). However, our concept focusses on Level 1 and 2 only. Future work includes the deployment of our concept within the VP a) based on test data and b) with real-world data. In addition, PPRL will be integrated into further use cases, partly in the course of the European Rare Diseases Research Alliance (ERDERA) project, which will start in September 2024.
Acknowledgements
This work was supported by the EJP RD (grant number 825575).
References
[1] European Commission. Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on the European Health Data Space. Strasbourg; 2022.
[2] Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018.
[3] De Wilde B, Barry E, Fox E, Karres D, Kieran M, Manlay J, et al. The Critical Role of Academic Clinical Trials in Pediatric Cancer Drug Approvals: Design, Conduct, and Fit for Purpose Data for Positive Regulatory Decisions. J Clin Oncol. 2022;40(29):3456.
[4] THE EUROPEAN PARLIAMENT AND OF THE COUNCIL. REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). 2016.
[5] European Joint Programme on Rare Diseases. Metadata for EJP rare disease patient registries, biobanks and catalogs.
[6] Olavo Bonino L, Burger K, Kaliyaperumal R. FAIR Data Point. https://specs.fairdatapoint.org/fdp-specs-v1.2.html. 2023.
[7] Unified repository for Beacon 2 Code & Documentation. https://github.com/ga4gh-beacon/beacon-v2/
[8] Nitzlnader M, Schreier G. Patient identity management for secondary use of biomedical research data in a distributed computing environment. Stud Health Technol Inform. 2014;198:211-8.
[9] European Joint Programme on Rare Diseases consortium. REST API specification for querying RD resources (Virtual Platform Level 2). https://github.com/ejp-rd-vp/vp-api-specs