Page 1 of 11
DataTools4Heart
A European Health Data Toolbox for Enhancing Cardiology Data Interoperability, Reusability and Privacy
Milestone MS10
3 FL innovations implemented, optimised and tested across the network, i.e. Centre Dropout, Unbiased Aggregation and Uncertainty-Awareness
Reference: MS10_DataTools4Heart_UB_30092025
Lead Beneficiary: University of Barcelona
Author(s): Grzegorz Skorupko, Jorge Fabila, Cristian Izquierdo, Xenia Puig
Dissemination level: Public
Type: Milestone
Official Delivery Date: 30/09/2025
Date of validation of the WP leader: 15/09/2025
Date of validation by the Project Coordinator: 26/09/2025
Project Coordinator Signature
DataTools4Heart is funded by the European Union’s Horizon Europe Framework under Grant Agreement No. 101057849.
Version Log
Issue Date | Version | Involved | Comments
First draft | V1.0 | Jorge Fabila, Grzegorz Skorupko, Cristian Izquierdo | 1st Draft
30/09/2025 | V2.0 | Cristian Izquierdo, Xenia Puig, Karim Lekadir | Final Version
Executive Summary
This milestone serves as proof that the three different FL implementations (Centre Dropout, Unbiased Aggregation and Uncertainty Awareness) have been designed and tested internally and are ready for deployment.
Table of Contents
Version Log
Executive Summary
Acronyms
Centre Dropout: Experiments and Evaluation
    Method Overview
    Experimental Setup
    Experimental Results
    Training Time and Efficiency
    Experiments Summary
Weight Smoothing: Experiments and Evaluation
    Method Overview
    Experimental Setup
    Experimental Results
    Averaged Performance Across Centres
    Experiments Summary
Uncertainty-aware Federated Learning
Conclusions
Bibliography
Acronyms
FL: Federated Learning
CVD: Cardiovascular Disease
Centre Dropout: Experiments and Evaluation
Method Overview
Centre Dropout, introduced by [1], is a methodological extension of federated learning aimed at improving both efficiency and fairness in collaborative model training. Instead of requiring all participating institutions to contribute updates in every round, Centre Dropout randomly or selectively excludes a fraction of centres while proportionally scaling the training load of the remaining ones. This reduces communication overhead, accelerates training by avoiding synchronization with the slowest sites, and ensures that smaller centres are not consistently overshadowed by large institutions in the aggregation process. Over multiple rounds, the approach balances institutional contributions while preserving the overall training effort. The method is designed to provide a flexible middle ground between standard federated averaging and more sequential training schemes, which makes it well suited to heterogeneous medical datasets.
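The participant selection and load rescaling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [1]: "training load" is assumed to mean the number of local epochs, the “every odd round” variants are assumed to drop centres only on odd-numbered rounds, and the helper names `select_centres` and `scaled_local_epochs` and the `strategy` values are hypothetical.

```python
import math
import random


def select_centres(centres, round_idx, dropout, strategy="random", rng=None):
    """Return the centres taking part in one round (hypothetical helper).

    strategy="random": drop a fraction `dropout` of centres in every round.
    strategy="odd_rounds": assumed reading of "every odd round"; all centres
    participate on even rounds and a fraction is dropped on odd rounds.
    """
    if strategy not in ("random", "odd_rounds"):
        raise ValueError(f"unknown strategy: {strategy}")
    if rng is None:
        rng = random.Random()
    centres = list(centres)
    if strategy == "odd_rounds" and round_idx % 2 == 0:
        return centres
    # Keep at least one centre so the round can still be aggregated.
    n_keep = max(1, round(len(centres) * (1 - dropout)))
    return rng.sample(centres, n_keep)


def scaled_local_epochs(base_epochs, n_total, n_active):
    """Scale the local load (assumed: epochs) so total effort per round is preserved."""
    return math.ceil(base_epochs * n_total / n_active)


# Usage: 12 centres, dropout fraction 0.7 applied on odd rounds only.
rng = random.Random(0)
centres = [f"centre_{i}" for i in range(12)]
for r in range(4):
    active = select_centres(centres, r, dropout=0.7, strategy="odd_rounds", rng=rng)
    print(r, len(active), scaled_local_epochs(1, len(centres), len(active)))
```

Rescaling by the ratio of total to active centres is one simple way to keep the number of local updates per round roughly constant; other schemes (for example, scaling by sample counts) would fit the description equally well.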
Experimental Setup
To assess the effectiveness of Centre Dropout in practice, we conducted experiments using the UK Biobank dataset. Due to delays in data access from partnering institutions, this large-scale publicly available dataset was selected to provide a representative benchmark for federated learning experiments.
● Dataset: 225,355 samples originating from 12 acquisition centres were considered in their original distribution.
● Prediction task: From this cohort, 3,350 patients with cardiovascular disease (CVD) as the main cause of death were selected for the classification task.
● Model: A Logistic Regression classifier trained on 12 clinical features was used as the baseline model.
● Evaluation metric: Balanced Accuracy was chosen to account for the significant label imbalance present in the training data.
● Validation strategy: Data from the Stoke centre was entirely held out for external validation.
● Reproducibility: Each run was repeated eight times with randomized splits to ensure robustness of results.
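Balanced Accuracy is the mean of the per-class recalls, so a classifier that always predicts the majority class scores 0.5 on a binary task instead of profiting from the imbalance. A small self-contained sketch follows; in practice scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same quantity, and the function name and toy labels here are illustrative only.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy imbalanced example: 9 negatives, 1 positive, always predicting 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5 (plain accuracy would be 0.9)
```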
Experimental Results
Three sets of analyses were performed:
1. Sample distribution across centres
Fig. 1 illustrates the distribution of available samples per acquisition centre. The dataset was highly imbalanced, with some centres contributing a disproportionately large number of cases (Newcastle) while others remained relatively underrepresented (Wrexham, Swansea).
Figure 1: Distribution of samples across 12 acquisition centres in the UK Biobank dataset.
2. Federated vs. local training performance
Fig. 2 compares local models (trained independently per centre) with the federated model. Results demonstrate a consistent performance improvement for the federated approach, particularly for smaller centres that otherwise suffer from insufficient training data. This confirms the expected benefit of FL in enhancing generalization across heterogeneous clinical sites.
Figure 2: Comparison of Balanced Accuracy between local and federated training across centres. A trend of improved performance for federated training, especially in smaller centres, is observed.
3. Benchmarking Centre Dropout methods
Fig. 3 presents results from experiments with different Centre Dropout variants, including the “Less participants every odd round” strategy at dropout fractions of 0.3, 0.7, and 0.8. Overall, performance remained comparable to the baseline federated model (all centres participating in every round). Interestingly, the smallest centre (Wrexham) achieved a mild performance increase under the “Less participants every odd” configuration, as it contributed proportionally more often to the aggregation process.
Figure 3: Benchmark of Centre Dropout methods (“Less participants every odd round”) with varying dropout fractions (0.3, 0.7, 0.8). Performance remains comparable to the baseline, with mild improvement for the smallest centre (Wrexham).
Training Time and Efficiency
To quantify the efficiency gains, we compared training times across Centre Dropout configurations. The results are summarized in Table 1. All dropout variants reduced training time compared to standard FL. The “Less participants every odd (0.7)” setup achieved the largest efficiency improvement, reducing training time by approximately 20% (from 72.1 s to 57.3 s) while Balanced Accuracy decreased by only about 0.6% in relative terms (from 0.723 to 0.719).
Table 1: Training time reduction and Balanced Accuracy performance under different Centre Dropout configurations.
Method | Balanced Accuracy | Training time [s]
Locally trained | 0.675 ± 0.018 | -
Federated (baseline) | 0.723 ± 0.013 | 72.103 ± 1.121
Centre dropout:
  Random dropout (0.5) | 0.723 ± 0.012 | 68.979 ± 2.332
  Fast at odd rounds (0.3) | 0.721 ± 0.012 | 64.479 ± 1.824
  Less participants every odd (0.3) | 0.722 ± 0.012 | 64.601 ± 0.951
  Less participants every odd (0.7) | 0.719 ± 0.011 | 57.256 ± 1.972
  Less participants every odd (0.8) | 0.711 ± 0.014 | 59.676 ± 1.905
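The relative changes quoted for Table 1 can be recomputed directly from the reported means. The sketch below reuses only the numbers in the table; the dictionary layout and the helper name `relative_change` are illustrative.

```python
# Mean values copied from Table 1: (Balanced Accuracy, training time in seconds).
baseline = (0.723, 72.103)
configs = {
    "Random dropout (0.5)": (0.723, 68.979),
    "Fast at odd rounds (0.3)": (0.721, 64.479),
    "Less participants every odd (0.3)": (0.722, 64.601),
    "Less participants every odd (0.7)": (0.719, 57.256),
    "Less participants every odd (0.8)": (0.711, 59.676),
}


def relative_change(new, ref):
    """Relative reduction of `new` with respect to `ref` (positive = smaller)."""
    return (ref - new) / ref


for name, (bacc, time_s) in configs.items():
    time_red = relative_change(time_s, baseline[1])
    bacc_drop = relative_change(bacc, baseline[0])
    print(f"{name}: time -{time_red:.1%}, Balanced Accuracy -{bacc_drop:.1%}")
```

For the 0.7 setting this gives about 20.6% shorter training and a relative Balanced Accuracy drop of about 0.55% (0.004 absolute); the 0.8 setting saves less time (about 17.2%) than 0.7.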
Experiments Summary
These findings highlight the potential of Centre Dropout to accelerate federated training without compromising predictive performance. A dropout fraction of 0.7 delivered the largest efficiency gain (about 20% shorter training) while keeping Balanced Accuracy nearly unchanged (0.719 vs. 0.723), whereas increasing the fraction further to 0.8 reduced both the time saving and the accuracy. This supports the method’s value as a practical mechanism to scale federated learning across heterogeneous, multi-centre healthcare datasets.
As a next step, the consortium will extend these experiments to real-world partner datasets as soon as access is available, to validate Centre Dropout under the specific conditions of cross-institutional medical data integration.
Weight Smoothing: Experiments and Evaluation
Method Overview
Weight Smoothing is a federated learning strategy designed to reduce bias towards data-rich centres during the aggregation phase, proposed by [1]. In conventional federated averaging [2], client updates are weighted proportionally to dataset size, which can lead to overrepresentation of large centres and underrepresentation of smaller ones. Weight Smoothing mitigates this issue by adjusting the aggregation weights along a spectrum between equal centre contributions (balanced voting) and sample-size-based contributions. By controlling the balance between these two extremes, the method aims to promote fairness while preserving the statistical robustness of the global model.
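A minimal sketch of smoothed aggregation weights, assuming a linear interpolation between equal centre weights (1/K) and sample-size weights (n_k/N) controlled by a hypothetical parameter `alpha`. Under this assumption alpha = 0 recovers conventional federated averaging [2], alpha = 1 gives equal centre weights, and alpha = 0.25 matches the lower-quartile hybrid (0.25 centre-balanced, 0.75 sample-size-based) used in the experiments. The exact parameterisation in [1] may differ, and the centre sizes below are illustrative, not UK Biobank counts.

```python
def smoothed_weights(sample_counts, alpha):
    """Hypothetical smoothing: w_k = alpha / K + (1 - alpha) * n_k / N.

    alpha = 0 -> w_k = n_k / N  (federated averaging)
    alpha = 1 -> w_k = 1 / K    (equal centre weights)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total = sum(sample_counts)
    k = len(sample_counts)
    return [alpha / k + (1 - alpha) * n / total for n in sample_counts]


def aggregate(centre_params, weights):
    """Weighted average of per-centre parameter vectors (lists of floats)."""
    dim = len(centre_params[0])
    return [sum(w * p[i] for w, p in zip(weights, centre_params)) for i in range(dim)]


# Illustrative centre sizes; compare the three strategies.
counts = [100, 300, 600]
for alpha in (0.0, 1.0, 0.25):
    print(alpha, [round(w, 4) for w in smoothed_weights(counts, alpha)])
```

With these illustrative sizes, alpha = 0.25 lowers the largest centre's weight from 0.60 to about 0.53 and raises the smallest centre's weight from 0.10 to about 0.16, while the weights still sum to one.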
Experimental Setup
To evaluate Weight Smoothing within the project, we conducted experiments using the UK Biobank dataset in the same setup as for the Centre Dropout evaluation. Due to delays in data access from partnering institutions, this dataset provided a representative large-scale benchmark.
● Dataset: 225,355 samples from 12 acquisition centres in the original distribution.
● Prediction task: Selection of 3,350 patients with cardiovascular disease (CVD) as the main cause of death for binary classification.
● Model: Logistic Regression trained on 12 input features.
● Evaluation metric: Balanced Accuracy, reflecting the strong class imbalance in the dataset.
● Validation strategy: Data from the Stoke centre was held out entirely for external validation.
● Reproducibility: Each experiment was repeated eight times with randomized splits.
Experimental Results
Three Weight Smoothing strategies were compared:
● Equal sample weights (federated baseline): aggregation proportional to the number of samples per centre.
● Equal centre weights: all centres contribute equally, independent of dataset size.
● Lower quartile smoothing: hybrid scheme weighting 0.25 centre-balanced and 0.75 sample-size-based contributions.
Fig. 4 presents Balanced Accuracy across all participating centres for the three methods. In this case, no substantial performance differences were observed between strategies. This outcome may be explained by the relatively homogeneous population distribution across UK Biobank centres, which limits cross-site variability. Additionally, since Logistic Regression involves far fewer trainable parameters than deep learning models, the potential benefit of weight smoothing in mitigating aggregation bias is likely reduced compared to prior studies [1].
Figure 4: Balanced Accuracy across centres for three Weight Smoothing strategies: Equal sample weights, Equal centre weights, and Lower quartile smoothing.
Averaged Performance Across Centres
To further assess the effects, results were averaged across all centres (Table 2). The Equal sample weights baseline and the Lower Quartile configuration achieved the highest averaged Balanced Accuracy (0.723 ± 0.013), with Equal centre weights marginally lower (0.721 ± 0.014). All Weight Smoothing configurations slightly outperformed locally trained models (0.718 ± 0.012), confirming the advantage of federated training even under simple linear modelling assumptions.
Table 2: Averaged Balanced Accuracy across centres under different Weight Smoothing configurations.
Method | Balanced Accuracy | Training time [s]
Locally trained | 0.718 ± 0.012 | -
Weight Smoothing:
  Equal sample weights (baseline) | 0.723 ± 0.013 | 72.103 ± 1.121
  Equal centre weights | 0.721 ± 0.014 | 73.818 ± 3.774
  Lower Quartile | 0.723 ± 0.013 | 71.611 ± 1.527
Experiments Summary
The results suggest that Weight Smoothing did not substantially affect model performance in the current setup, likely due to the relatively homogeneous population across UK Biobank centres and the simplicity of the Logistic Regression model. Nonetheless, the experiments confirm that federated approaches consistently outperform local training, and more pronounced effects of Weight Smoothing are expected in future experiments with more heterogeneous partner datasets from the DT4H consortium.
Uncertainty-aware Federated Learning
In many machine learning applications, especially in AI for healthcare, it is important not only to predict labels accurately but also to estimate the uncertainty of those predictions. Traditional Bayesian Neural Networks (BNNs) explicitly model the posterior distribution over weights but are often computationally expensive and difficult to scale.
Monte Carlo (MC) Dropout offers a practical approximation to Bayesian inference. It keeps dropout (originally designed as a regularization technique) active at inference time and interprets the resulting stochastic forward passes as samples from an approximate posterior [3].
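A minimal sketch of MC Dropout for a binary logistic model, chosen here only because it matches the Logistic Regression baseline used above. Dropout with inverted scaling is applied to the input features in every forward pass, and the T stochastic outputs are averaged into one predictive probability. The dropout rate, the number of passes T, the weights and all function names are illustrative assumptions.

```python
import math
import random


def forward(weights, bias, x, p_drop, rng):
    """One stochastic forward pass: dropout on the inputs, then a sigmoid."""
    keep = 1.0 - p_drop
    z = bias
    for w, xi in zip(weights, x):
        mask = 1.0 if rng.random() < keep else 0.0
        z += w * xi * mask / keep  # inverted dropout keeps the expected input unchanged
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(weights, bias, x, T=50, p_drop=0.2, seed=0):
    """Average of T stochastic forward passes, plus the individual samples."""
    rng = random.Random(seed)
    samples = [forward(weights, bias, x, p_drop, rng) for _ in range(T)]
    return sum(samples) / T, samples


# Illustrative weights and input; the spread shows how much the passes disagree.
mean_p, samples = mc_dropout_predict([0.8, -1.2, 0.5], 0.1, [1.0, 0.3, 2.0])
spread = max(samples) - min(samples)
print(f"mean p = {mean_p:.3f}, spread across passes = {spread:.3f}")
```

Setting `p_drop=0.0` makes every pass identical, which is a quick way to check that the averaging reduces to an ordinary deterministic prediction.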
A model whose predictions have high variance is not confident about them. One way to include this information in the federated setting is to weight each model by this measure during the aggregation process.
Federated averaging usually weights model contributions by sample size, so that each centre influences the aggregation in proportion to its amount of data. In this case, we also include entropy as a measure of the model’s predictive uncertainty: models with higher entropy (i.e., greater uncertainty) contribute less than those with lower entropy. Entropy (H, as denoted by Shannon) of a predicted class distribution $\mathbf{p}$ over C classes is defined as:
$$H(\mathbf{p}) = -\sum_{c=1}^{C} p_c \log p_c$$
where each prediction $\mathbf{p}$ is the average of the T stochastic forward passes:
$$\mathbf{p} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{p}_t$$
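and each forward pass $\mathbf{p}_t$ is the output of the network f for input x, evaluated with the weights $\hat{\theta}_t$ obtained by sampling the t-th dropout mask:
$$\mathbf{p}_t = f(x; \hat{\theta}_t)$$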
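Starting from the aggregation based on the number of samples, where centre k holds $n_k$ of the $N = \sum_{k=1}^{K} n_k$ samples and trains local weights $\theta_k$, we have:
$$\theta = \sum_{k=1}^{K} \frac{n_k}{N}\,\theta_k$$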
The aggregation weights are defined based on the number of samples and the entropy (H); ϵ is included only to avoid division by zero:
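A minimal sketch of entropy-aware weighting, assuming the weight of centre k is proportional to n_k / (H_k + ϵ) and normalised over the K centres, where H_k is the mean predictive entropy of centre k's model (for example, averaged over MC Dropout predictions on held-out samples). This functional form, the natural logarithm, the value ϵ = 1e-8 and the function names are assumptions consistent with the description above, not necessarily the exact formula of the original work.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_binary_entropy(pred_probs):
    """Average entropy of binary predictions given as P(y=1) values."""
    return sum(entropy([p, 1.0 - p]) for p in pred_probs) / len(pred_probs)


def uncertainty_weights(sample_counts, entropies, eps=1e-8):
    """Assumed form: weights proportional to n_k / (H_k + eps), summing to 1."""
    raw = [n / (h + eps) for n, h in zip(sample_counts, entropies)]
    total = sum(raw)
    return [r / total for r in raw]


# Two centres with equal data: the more confident model gets more weight.
h_confident = mean_binary_entropy([0.95, 0.05, 0.9])
h_uncertain = mean_binary_entropy([0.55, 0.45, 0.6])
print(uncertainty_weights([500, 500], [h_confident, h_uncertain]))
```

The resulting weights would replace n_k/N in the sample-size aggregation above; dividing by entropy is one simple choice, and other monotone decreasing functions of H would serve the same purpose.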