The CrowdGleason dataset: Learning the Gleason grade from crowds and experts

Author: López-Pérez, Miguel,Morquecho, Alba,Schmidt, Arne,Pérez Bueno, Fernando,Martín-Castro, Aurelio,Mateos, Javier,Molina, Rafael

Publisher: ELSEVIER

Year: 2024

DOI: 10.1016/j.cmpb.2024.108472

Source: https://addi.ehu.eus/bitstream/10810/70375/1/The%20CrowdGleason%20dataset2024.pdf

Contents lists available at ScienceDirect
Computer Methods and Programs in Biomedicine
journal homepage: https://www.sciencedirect.com/journal/computer-methods-and-programs-in-biomedicine
The CrowdGleason dataset: Learning the Gleason grade from crowds and experts✩
Miguel López-Pérez a,∗, Alba Morquecho b, Arne Schmidt b, Fernando Pérez-Bueno d, Aurelio Martín-Castro c, Javier Mateos b, Rafael Molina b
a Instituto Universitario de Investigación en Tecnología Centrada en el Ser Humano, Universitat Politècnica de València, Spain
b Department of Computer Science and Artificial Intelligence, Universidad de Granada, Granada, Spain
c Department of Pathology, Virgen de las Nieves University Hospital, 18014 Granada, Spain
d Basque Center on Cognition, Brain and Language, Donostia - San Sebastián, Spain
ARTICLE INFO
Keywords:
Computational pathology
Crowdsourcing
Prostate cancer
Gleason grade
Gaussian processes
Medical image analysis
ABSTRACT
Background: Currently, prostate cancer (PCa) diagnosis relies on the human analysis of prostate biopsy Whole Slide Images (WSIs) using the Gleason score. Since this process is error-prone and time-consuming, recent advances in machine learning have promoted the use of automated systems to assist pathologists. Unfortunately, labeled datasets for training and validation are scarce due to the need for expert pathologists to provide ground-truth labels.
Methods: This work introduces a new prostate histopathological dataset named CrowdGleason, which consists of 19,077 patches from 1045 WSIs with various Gleason grades. The dataset was annotated using a crowdsourcing protocol involving seven pathologists-in-training to distribute the labeling effort. To provide a baseline analysis, two crowdsourcing methods based on Gaussian Processes (GPs) were evaluated for Gleason grade prediction: SVGPCR, which learns a model from the CrowdGleason dataset, and SVGPMIX, which combines data from the public dataset SICAPv2 and the CrowdGleason dataset. The performance of these methods was compared with other crowdsourcing and expert label-based methods through comprehensive experiments.
Results: The results demonstrate that our GP-based crowdsourcing approach outperforms other methods for aggregating crowdsourced labels (κ = 0.7048 ± 0.0207 for SVGPCR vs. κ = 0.6576 ± 0.0086 for SVGP with majority voting). SVGPCR trained with crowdsourced labels performs better than GP trained with expert labels from SICAPv2 (κ = 0.6583 ± 0.0220) and outperforms most individual pathologists-in-training (mean κ = 0.5432). Additionally, SVGPMIX trained with a combination of SICAPv2 and CrowdGleason achieves the highest performance on both datasets (κ = 0.7814 ± 0.0083 and κ = 0.7276 ± 0.0260).
Conclusion: The experiments show that the CrowdGleason dataset can be successfully used for training and validating supervised and crowdsourcing methods. Furthermore, the crowdsourcing methods trained on this dataset obtain competitive results against those using expert labels. Interestingly, the combination of expert and non-expert labels opens the door to a future of massive labeling by incorporating both expert and non-expert pathologist annotators.
1. Introduction
Prostate cancer is a prevalent cancer and the fifth leading cause of cancer-related deaths worldwide [1]. Timely and precise diagnosis is crucial for effective treatment and reducing mortality rates [2]. Currently, the gold standard for diagnosis and prognosis is to analyze a biopsy of prostate tissue by the Gleason grading (GG) system, which assesses the cancer stage and aggressiveness based on gland morphology. However, the assessment of GG is inherently subjective, with high intra- and inter-observer variability [3,4].
✩ This work was supported in part by FEDER/Junta de Andalucía under project P20_00286, grant PID2022-140189OB-C22 funded by MICIU/AEI/10.13039/501100011033 and by "ERDF/EU". The work by Miguel López-Pérez and Fernando Pérez-Bueno was supported by the grants JDC2022-048318-I and JDC2022-048784-I, respectively, funded by MICIU/AEI/10.13039/501100011033 and the European Union "NextGenerationEU"/PRTR.
∗ Corresponding author.
https://doi.org/10.1016/j.cmpb.2024.108472
Received 2 May 2024; Received in revised form 30 September 2024; Accepted 20 October 2024
Computer Methods and Programs in Biomedicine 257 (2024) 108472
Available online 28 October 2024
0169-2607/© 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
M. López-Pérez et al.
Computer-Aided Diagnosis (CAD) systems assist pathologists and aim to minimize human variability in decision-making. These systems utilize WSIs and computer vision and machine learning (ML) algorithms to detect and grade cancerous regions. The main bottleneck in training and validating ML methods for GG prediction is the scarcity of large-scale public datasets [5]. Creating these datasets is costly and time-consuming, which, together with the scarcity of expert pathologists, explains why there are few annotated datasets and even fewer public ones.
Crowdsourcing has emerged as a cost-effective and efficient method for labeling histopathological datasets by leveraging a large pool of annotators with varying levels of expertise [6,7]. While crowdsourcing has shown success in tasks like nuclei detection [8] and cancer cell identification [9], the labels generated are frequently noisy, limiting their direct application to complex tasks such as GG. To address this challenge, probabilistic models like GPs have become popular [10,11]. GPs for crowdsourcing have demonstrated excellent performance in various tasks [12–14] and have been successfully applied in histopathological image classification studies, including breast cancer [15,16] and skin cancer detection [17]. These methods offer competitive performance compared to methods trained with expert labels, indicating that crowdsourced labeling of histopathological images could be a feasible option for cancer classification with minimal reliance on expert pathologists. Regarding GG classification, there are no previous studies involving non-expert annotators. However, it has been shown that learning from the opinion of multiple expert pathologists, despite high inter- and intra-observer variability, results in strong performance when this variability is effectively modeled [18,19].
The objective of this work is twofold. First, we present and make publicly available the first prostate dataset labeled by non-experts for GG prediction. Second, we explore the learning from crowds framework with this novel dataset, assessing and analyzing two state-of-the-art methods based on GPs for crowdsourcing. This paper also demonstrates the viability of integrating this new dataset with existing datasets containing expert labels, to create a larger and more diverse dataset. Our experiments indicate that the noisy non-expert labels from the presented dataset can improve previous models in the literature. Below, we outline our contributions in detail:
•Introduction of a new crowdsourcing protocol for the annotation of patches from WSIs, outlined in Fig. 1, which cheapens and speeds up the labeling process by dramatically reducing the intervention of expert pathologists.
•Creation of a new dataset, called CrowdGleason, comprising 19,077 patches from 1045 WSIs of PCa with different GG. This dataset was annotated by seven pathologists-in-training without expert supervision. Note that not all annotators labeled all patches. To the best of our knowledge, this is the first PCa dataset annotated by non-expert pathologists.
•Development of a curated test set annotated by each pathologist-in-training and two expert PCa pathologists to evaluate automated ML methods and assess bias, expertise, and discrepancies between participants.
•Comprehensive experiments to evaluate two GP-based methods for GG prediction: SVGPCR [7] and SVGPMIX [14]. SVGPCR learns from the CrowdGleason dataset, while SVGPMIX combines expert labels from the public SICAPv2 [20] dataset with the CrowdGleason dataset. Results demonstrate that these GP-based crowdsourcing methods outperform popular techniques for label aggregation, with SVGPMIX achieving the best performance on both datasets.
The remainder of the work is organized as follows. Section 2 describes related work. Section 3 presents the CrowdGleason dataset and its annotation protocol. Section 4 describes the experimental setup and the methods evaluated. The experimental results are shown in Section 5, and Section 6 discusses them. Finally, Section 7 presents the conclusions and future work. For further information on the code and dataset, see https://github.com/vipgugr/CrowdGleason.
2. Related work
Public datasets are essential to develop precise ML methods for GG prediction. Hence, Section 2.1 delves into the currently publicly available PCa datasets and Section 2.2 provides an overview of the core work on crowdsourcing and its applications in the context of histopathology.
2.1. Public PCa histopathological datasets
The currently publicly available PCa histopathological datasets have typically been created by staining tissue biopsies with hematoxylin and eosin (H&E) and scanning them as WSIs for histopathological examination. In clinical practice, a WSI usually contains one or a few tissue samples. The use of Tissue Micro Arrays (TMAs) allows many tissue samples to be arranged on a grid and processed simultaneously to obtain a single slide. These datasets are labeled at the pixel, patch, or WSI level. The labeling process at pixel level consists of manually delineating tumor areas and assigning GG classes. This meticulous procedure provides comprehensive tumor information, but it is time-consuming. In WSI-level labeling, pathologists assign a label to the entire image without specific tumor location information. Patch-level labeling divides WSIs into small regions, named patches, and a label is assigned to a selected set of patches, thus reducing the need to examine the entire WSI.
We briefly examine popular public datasets for GG prediction, including Arvaniti, SICAP, GLEASON2019, and PANDA. Table 1 provides an overview of these datasets and our proposed CrowdGleason. The Arvaniti et al. [21] dataset comprises TMAs annotated at pixel level by an expert pathologist, while the SICAPv1 [11] and SICAPv2 [20] datasets offer pixel-level annotations on WSIs. The WSIs were downsampled to 10× magnification and divided into patches of 512² pixels with 50% overlap, obtaining patch-level annotations. To our knowledge, SICAPv2 is the largest fully annotated dataset at patch level in the literature.
Challenges, such as Gleason2019 and PANDA, have been a popular way of promoting research in GG prediction by providing benchmark datasets for evaluating ML algorithms. The Gleason2019 challenge [18] dataset provides TMA images annotated by a panel of 5 expert pathologists, and the PANDA Challenge [22] dataset includes WSIs annotated at the WSI level by consensus among a large panel of highly experienced expert pathologists, with some samples annotated at pixel level.
2.2. Crowdsourcing
To the best of our knowledge, all works on GG prediction from patches addressed the problem with ground-truth labels provided by either a single expert or a panel of expert pathologists. Crowdsourcing presents an opportunity to scale datasets by engaging non-expert annotators in computational pathology-related tasks [8]. Various studies explored the use of labels from non-expert annotators for tasks like mitosis detection [23] or histopathological image classification [24]. Previous works [25,26] have demonstrated promising results in the field of histopathology using crowdsourcing, but they required strong supervision from senior pathologists to review the annotations provided by the crowd. To reduce the need for expert supervision, label aggregation techniques [27] have been developed to automatically curate crowdsourcing labels, enabling the creation of datasets suitable for ML without expert supervision. Various label aggregation methods have been proposed, including majority voting (MV) and more elaborate methods that consider the biases of the different annotators, yielding a better-calibrated set of training labels [7]. They include the Dawid-Skene (DS) [28], GLAD [29], and MACE [30] models.
Recent studies show that jointly learning the ground-truth labels, the annotator expertise, and the latent classifier leads to superior performance [31]. Models like SVGPCR [7] have successfully combined sparse GPs with a crowdsourcing probabilistic framework, demonstrating performance competitive with that of GPs trained with expert labels in breast cancer detection from histopathological images [32]. Moreover, the SVGPMIX method [14] is the first probabilistic approach based on GPs for using expert and non-expert labels, leveraging the confidence offered by expert labels and the large volume of data provided by non-expert annotators. To the best of our knowledge, this model has not yet been applied in the biomedical domain.

Fig. 1. Dataset creation and annotation protocol. We collect 1045 WSIs, of which 783 are used exclusively for crowd labeling and 262 for crowd and expert labeling. We divide all WSIs into patches and distribute them among the non-expert annotators to obtain the training set. We create a curated test set with patches where the experts and the majority of the non-experts agree.

Table 1
Publicly available datasets for GG prediction. MA refers to multiple annotators.
Dataset                   Biopsy  # Samples  Annotations          Experts  MA
Arvaniti [21]             TMA     895        Pixel-level          Yes      No
SICAPv1 [11]              WSI     79         Pixel-level          Yes      No
SICAPv2 [20]              WSI     182        Patch-/pixel-level   Yes      No
GLEASON19 [18]            TMA     331        Pixel-level          Yes      Yes
PANDA [22]                WSI     12,625     WSI-/pixel-level     Yes      No
CrowdGleason (proposed)   WSI     1,045      Patch-level          No       Yes
3. CrowdGleason dataset
The dataset presented in this paper, named CrowdGleason, has been partially annotated by different pathologists-in-training with varying degrees of expertise. A subset of the dataset was also annotated by expert pathologists, which helped to obtain a test set.
3.1. Data acquisition and annotation by expert pathologists
To create CrowdGleason, 1045 WSIs of H&E-stained prostate tissue samples from different patients were collected by medical experts from the archive of the Hospital Universitario San Cecilio (HUSC) in Granada. All WSIs were digitally scanned at a 40× magnification factor. Two expert pathologists exhaustively annotated 262 of those 1045 WSIs at the pixel level. Each image was annotated independently by only one of the pathologists, using the online application described in [20]. The experts thoroughly marked all pathological areas with their GG and delineated artifacts.
3.2. Patch extraction
All WSIs were divided into non-overlapping patches of size 2048 × 2048 pixels at a magnification of 40×. This size and magnification were selected in agreement with expert pathologists to provide sufficient context and detail to facilitate the identification of cancerous lesions. Patches containing less than 20% of tissue were discarded, as they do not contain enough tissue to make an accurate diagnosis. Tissue presence was detected by thresholding the magenta channel with the Otsu method [33]. From images with pathological areas marked by the experts, we selected patches containing at least 15% of pathological tissue, labeled with the GG of the area marked by the expert: Gleason grade 3 (G3), Gleason grade 4 (G4), or Gleason grade 5 (G5). Patches containing more than one pathological area were discarded, since it was not possible to assign a single label to the patch. From images labeled as non-cancerous (NC) by the expert pathologist, on the other hand, we could use all tissue to extract patches. To reduce the number of candidate patches, we discarded patches having less than 30% of tissue. A total of 4573 patches, which form the so-called expert-labeled set, were obtained from the 262 images annotated by experts.
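The tissue filter described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the magenta channel is taken from a CMY view of the RGB image (M = 255 − G), a single Otsu threshold is assumed to be computed per slide (for example on a low-resolution thumbnail) and reused for every patch, and the helper names `slide_threshold`, `tissue_fraction`, and `keep_patch` are hypothetical.

```python
import numpy as np

def magenta_channel(rgb):
    """Magenta component of a CMY view of an 8-bit RGB image (assumption: M = 255 - G)."""
    return 255 - rgb[..., 1].astype(np.int32)

def otsu_threshold(values, n_bins=256):
    """Otsu's method: choose the split that maximises the between-class variance."""
    hist, edges = np.histogram(values, bins=n_bins, range=(0, 256))
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(np.float64)  # pixel count at or below each candidate split
    w1 = w0[-1] - w0                         # pixel count above it
    cum_mass = np.cumsum(hist * centers)
    mu0 = cum_mass / np.maximum(w0, 1)
    mu1 = (cum_mass[-1] - cum_mass) / np.maximum(w1, 1)
    between_var = w0 * w1 * (mu0 - mu1) ** 2
    return edges[np.argmax(between_var) + 1]  # values >= threshold count as tissue

def slide_threshold(slide_rgb):
    """Hypothetical helper: one Otsu threshold per slide."""
    return otsu_threshold(magenta_channel(slide_rgb))

def tissue_fraction(patch_rgb, threshold):
    return float(np.mean(magenta_channel(patch_rgb) >= threshold))

def keep_patch(patch_rgb, threshold, min_tissue=0.20):
    """Discard patches with less than 20% tissue, as stated in the text."""
    return tissue_fraction(patch_rgb, threshold) >= min_tissue

# Toy slide: left half white background, right half pink H&E-like tissue.
slide = np.full((100, 200, 3), 255, dtype=np.uint8)
slide[:, 100:] = (230, 140, 200)
t = slide_threshold(slide)
print(keep_patch(slide[:, :100], t), keep_patch(slide[:, 100:], t))  # False True
```

Computing the threshold once per slide, rather than per patch, avoids the failure mode where Otsu splits a patch that is entirely tissue into two artificial classes.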
For the remaining 783 images not annotated by expert pathologists, a large number of patches without ground-truth labels were extracted to be annotated by non-expert pathologists at a later stage. To expedite the labeling process, we reduced the number of patches. As gland structure is crucial in PCa diagnosis, we chose patches with a substantial presence of nuclei as representative of tissue with glands. Since nuclei stain with hematoxylin, which is prominent in the cyan component, we extracted patches rich in nuclei by selecting those patches where at least 40% of the tissue's pixels had a high cyan value. Still, the number of patches was overwhelming. Due to the huge class imbalance in histopathological data, with large areas of non-pathological tissue and only sparsely represented cancerous tissue, we proceeded as follows to select a set of patches representing the different PCa grades.
We trained the classification algorithm in [34], which combines semi-supervised and multiple instance learning, on the public PANDA dataset for PCa classification, following the successful setting in [34]. Note that the PANDA dataset is labeled at the WSI level; hence, standard supervised learning techniques cannot be used. Although segmentation masks are provided for some images, they can only be used to develop strategies for selecting the most significant subsamples of the images [35]. The algorithm in [34] uses an EfficientNet-B5 neural network architecture [36] that was trained with a learning rate of 0.01 for 10 epochs on the classes NC, G3, G4, and G5. To validate this approach, we classified a small set of patches extracted from the expert-labeled set. Note that these patches were only used to obtain the metrics shown in Table 2 and were not used in model training. These figures of merit show that the method is good enough to distinguish patches from the different classes. Using the learned model, the patches from non-annotated WSIs were assigned to the class with the highest probability, selecting a total of 16,151 patches. This collection constitutes a roughly balanced set that serves as the training set for our study. Since this set was designed for annotation by non-expert pathologists, the labels provided by the classification algorithm were discarded after patch selection and not used further in this study.

Fig. 2. Examples of patches and annotations by each participant.

Table 2
Metrics of the patch selection algorithm on a small set of patches extracted from the expert-labeled set.
Accuracy  F1 score  Kappa
0.732     0.648     0.606
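The nuclei-rich criterion described earlier in this subsection (at least 40% of the tissue pixels with a high cyan value) could be implemented as in the sketch below. The text does not say what counts as a "high" cyan value, so `cyan_threshold=100` is a purely hypothetical choice; the cyan channel is assumed to be C = 255 − R, and the boolean tissue mask is assumed to come from a tissue filter such as the Otsu step mentioned in the text.

```python
import numpy as np

def cyan_channel(rgb):
    """Cyan component of a CMY view of an 8-bit RGB image (assumption: C = 255 - R)."""
    return 255 - rgb[..., 0].astype(np.int32)

def is_nuclei_rich(patch_rgb, tissue_mask, cyan_threshold=100, min_fraction=0.40):
    """Keep a patch if enough of its tissue pixels are strongly cyan (hematoxylin-rich).

    cyan_threshold is a hypothetical value; the text only says "high cyan value".
    """
    n_tissue = int(tissue_mask.sum())
    if n_tissue == 0:
        return False
    high_cyan = (cyan_channel(patch_rgb) >= cyan_threshold) & tissue_mask
    return bool(high_cyan.sum() / n_tissue >= min_fraction)

# Toy 10x10 patch, all tissue: half nuclei-like purple pixels, half pink stroma.
patch = np.zeros((10, 10, 3), dtype=np.uint8)
patch[:5] = (120, 60, 160)   # cyan = 135
patch[5:] = (230, 140, 200)  # cyan = 25
mask = np.ones((10, 10), dtype=bool)
print(is_nuclei_rich(patch, mask))  # True (50% of tissue pixels are strongly cyan)
```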
3.3. Annotation by non-expert pathologists
Seven pathologists-in-training with different expertise levels participated in the annotation of the dataset. Two of them were in their fourth year of medical residency, two in their third, two in their second, and one was a first-year medical resident. Following the Spanish Official Specialist Training Program in Anatomic Pathology, third- and fourth-year pathologists-in-training have completed specific training in the subspecialty of Uropathology, which includes the study of the prostate and lasts approximately 2–3 months. First- and second-year residents have no specific training in this subspecialty. In collaboration with expert pathologists, we designed an annotation protocol based on the well-known Gleason Score for PCa grading, originally introduced by Donald F. Gleason [37] for grading prostatic carcinoma based solely on the architectural pattern of the tumor. Non-expert pathologists were instructed to label the patches in the expert-labeled and training sets, described in the previous section, as NC, G3, G4, or G5, rather than thoroughly examine the WSI and exhaustively delineate the tumor areas. Patches that could not be labeled due to the presence of artifacts, blurriness, tissue from other organs, folded tissue, etc. were labeled as “Other”. Patches with more than one GG, which do not have a clear label, were also labeled as “Other”.
All pathologists-in-training labeled the 4573 patches in the expert-labeled set. Table 3 presents a summary of the distribution of the labels in this set. Note that some patches were labeled as “Other” by some residents and, hence, the total of each column may not add up to the total number of patches. An example of patches and the annotations provided by the crowd and the experts is shown in Fig. 2. To minimize the influence of inherent pathologist variability in labeling, we created a curated test set where the ground-truth label of each sample was established by consensus between the majority of pathologists-in-training and the expert pathologist. Using this curated set, whose distribution of samples per class is shown in Table 4, we will estimate the degree of reliability of each resident as well as evaluate ML methods. Recall that expert intervention has only been necessary for the creation of the curated test set, not for the training set.
The 16,151 patches in the training set were labeled, on average, by more than two resident pathologists, with each pathologist-in-training labeling approximately 5000 patches. Table 5 summarizes the training set. Patches were provided to the residents in 4 batches of approximately equal size over a 6-month period.
Finally, as a post-processing step, the patches of both the training set and the curated test set were downsampled using bicubic interpolation to a size of 512 × 512 pixels. This is equivalent to obtaining the patches at a magnification factor of 10×, and it was necessary to fit the patches into GPU memory.
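The equivalence with a 10× magnification follows directly from the ratio of patch sizes, and the same ratio gives the reduction in pixels per patch, consistent with the GPU-memory motivation stated above:

```latex
\text{effective magnification} = 40\times \cdot \frac{512}{2048} = 10\times,
\qquad
\frac{512 \times 512}{2048 \times 2048} = \frac{1}{16}
```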
In summary, CrowdGleason consists of a crowdsourcing-annotated training set with 16,151 patches of size 512 × 512 extracted from 783 WSIs, annotated by one or more of the seven pathologists-in-training, and a curated test set with 2926 patches of size 512 × 512 extracted from the other 262 WSIs, annotated by expert pathologists and all the pathologists-in-training. Ground-truth labels of the curated test set were obtained by consensus between the expert pathologists and the majority of the pathologists-in-training.
3.4. Ethical consent and data availability
The Research Ethics Committee of the Universidad de Granada approved the study with code 4096/CEIH/2024 as part of the project P20_00286, funded by FEDER/Junta de Andalucía, following the principles established in international and national legislation in the fields of biomedicine and bioethics, as well as the rights derived from the protection of personal data.
The complete CrowdGleason dataset will be available in Figshare upon acceptance of the paper.
4. Materials and methods
Datasets. We present and utilize the dataset described in Section 3.1, and combine it with the public dataset SICAPv2 [20]. We normalize both datasets via the BKSVD method [38], and use them for training and evaluation to demonstrate the utility of the proposed CrowdGleason dataset with respect to another popular dataset in the literature. Our approach also allows us to evaluate the generalization ability of the classifiers on an external cohort. The task is to learn the GG of each patch, i.e., to classify the patches as 'NC', 'G3', 'G4', and 'G5'.

Table 3
Distribution of labels in the expert-labeled set labeled by each resident pathologist and the expert. We refer to each annotator as A#, where the number is an anonymized ID.
Class  A1    A2    A3    A4    A5    A6    A7    Expert  Total
NC     1891  2290  2968  2983  2941  2868  389   2438    14,439
G3     1024  1267  507   601   763   693   1402  1498    5233
G4     1155  674   808   663   413   666   1513  449     4737
G5     503   341   281   308   431   317   1074  188     2752
Total  4573  4572  4564  4555  4548  4544  4378  4573    27,161

Table 4
Distribution of patches in each class of the curated test set.
NC    G3   G4   G5  Total
2157  548  164  57  2926
Feature extraction. A reduced set of features is extracted and used as input for the GP-based methods. For this, we utilize the 18-layer variant of ResNet [39], i.e., ResNet18, pre-trained on ImageNet and fine-tuned on SICAPv2. We use the output of the last convolutional layer as a feature extractor. Since SICAPv2 is the largest publicly available PCa dataset with patch labels, we assume that the learned feature extractor generalizes well to other datasets. The experimental results will validate this assumption. We utilize these 512-dimensional feature vectors as input for the GP classifiers and also report the results of end-to-end training of ResNet18 for comparison. We perform five independent runs for all the presented experiments, reporting the mean performance and the 95% confidence interval (CI). Note that for each run, we also run the feature extractor to obtain a different set of features and ensure the robustness of the whole pipeline proposed in this work. To train the network, we use the SGD optimizer with a learning rate of 10⁻³, a momentum of 0.9, and a batch size of 32 patches. Common data augmentation transformations, such as horizontal and vertical flips, blur, and brightness, contrast, hue, and saturation variations, are applied to the training dataset. The CNN is implemented using PyTorch 2.0.1 and is run on an NVIDIA GeForce RTX 3090 GPU.
Supervised Learning: Gaussian Processes. We use the features extracted by ResNet18 and the ground-truth labels as inputs to a stochastic variational Gaussian process (SVGP) model to learn the GG from the SICAPv2 dataset. SVGPs are scalable GP models that use variational inference to approximate the posterior distribution. See a detailed description in [40] and an intuitive review in [32]. We utilize a squared exponential kernel to compute the correlation matrix. We initialize the kernel hyperparameters, lengthscale and variance, to 2. We train the SVGP for 50 epochs and save the parameters that obtain the best Cohen's Quadratic Kappa (κ) on the validation set. In all cases, we use the Adam optimizer with a learning rate of 10⁻² and a batch size of 128. Based on the experimental results (see Section 5.1), we fixed a value of 512 inducing points for all GP-based methods, which provides a good trade-off between the generalization and complexity of the model. SVGP is implemented using GPflow 1.2.0 and is run on an NVIDIA GeForce RTX 3090 GPU. The code will be released on GitHub upon acceptance of the paper.
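To make the kernel and the role of the inducing points concrete, the NumPy sketch below computes the squared exponential kernel with lengthscale and variance set to 2 and the low-rank approximation Q = K_xz K_zz⁻¹ K_zx that sparse GPs such as SVGP are built around. It is an illustration only, not GPflow code: the random "features" stand in for the 512-dimensional ResNet18 outputs, and 64 inducing points are used instead of 512 to keep the demo small.

```python
import numpy as np

def se_kernel(X1, X2, lengthscale=2.0, variance=2.0):
    """Squared exponential kernel: variance * exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values from round-off
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def inducing_point_approx(X, Z, jitter=1e-6, **kernel_args):
    """Low-rank approximation Q = K_xz K_zz^{-1} K_zx from M inducing points Z.

    Costs O(N M^2) instead of the O(N^3) needed to work with the full N x N kernel.
    """
    K_xz = se_kernel(X, Z, **kernel_args)
    K_zz = se_kernel(Z, Z, **kernel_args) + jitter * np.eye(len(Z))
    L = np.linalg.cholesky(K_zz)
    A = np.linalg.solve(L, K_xz.T)  # A = L^{-1} K_zx, shape (M, N)
    return A.T @ A                  # = K_xz K_zz^{-1} K_zx, shape (N, N)

rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(1000, 512))        # stand-in for 512-d ResNet18 features
Z = X[rng.choice(len(X), size=64, replace=False)]  # 64 inducing points for the demo
Q = inducing_point_approx(X, Z)
print(Q.shape)  # (1000, 1000)
```

In an SVGP the inducing locations are optimized together with the kernel hyperparameters rather than sampled from the data as done here; the sketch only shows why a few hundred inducing points can summarize a large training set.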
Label aggregation. We utilize and compare the Majority Voting (MV), DS [28], MACE [30], and GLAD [29] aggregation methods to curate the multiple noisy labels available in the CrowdGleason dataset. The aggregated labels can be used as the single ground-truth label and, therefore, by supervised learning methods. All methods, implemented in crowd-kit [41], a popular Python library for crowdsourcing tasks, are run with the default hyperparameters. The aggregated labels and the features extracted by ResNet18 are fed to SVGP to learn a GG classifier.
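crowd-kit [41] provides these aggregators ready to use. To make concrete what MV and DS compute, the following is a from-scratch NumPy sketch, not crowd-kit's implementation. It assumes labels are encoded as 0–3 for NC, G3, G4, and G5, that a missing or "Other" label is stored as −1, and that every patch has at least one label; `majority_vote` and `dawid_skene` are illustrative names.

```python
import numpy as np

def vote_counts(labels, n_classes):
    """labels: (n_items, n_annotators) ints in [0, n_classes), -1 where no label was given."""
    counts = np.zeros((labels.shape[0], n_classes))
    for a in range(labels.shape[1]):
        rows = np.where(labels[:, a] >= 0)[0]
        counts[rows, labels[rows, a]] += 1
    return counts

def majority_vote(labels, n_classes=4):
    # argmax breaks ties towards the lowest class index.
    return vote_counts(labels, n_classes).argmax(axis=1)

def dawid_skene(labels, n_classes=4, n_iter=50, smoothing=0.01):
    """EM for the Dawid-Skene model: one (K x K) confusion matrix per annotator."""
    n_items, n_annot = labels.shape
    post = vote_counts(labels, n_classes) + 1e-9  # initialise from vote proportions
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors and conf[a, c, l] = p(annotator a says l | true class c).
        prior = (post.sum(axis=0) + smoothing) / (n_items + n_classes * smoothing)
        conf = np.full((n_annot, n_classes, n_classes), smoothing)
        for a in range(n_annot):
            rows = np.where(labels[:, a] >= 0)[0]
            conf[a] += post[rows].T @ np.eye(n_classes)[labels[rows, a]]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over the true class of every item.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for a in range(n_annot):
            rows = np.where(labels[:, a] >= 0)[0]
            log_post[rows] += np.log(conf[a][:, labels[rows, a]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1), post, conf

truth = np.array([0, 1, 2, 3, 0, 1])
crowd = np.stack([truth, truth, np.zeros(6, dtype=int)], axis=1)  # third annotator always says NC
crowd[0, 1] = -1                                                  # second annotator skipped patch 0
print(majority_vote(crowd))   # [0 1 2 3 0 1]
print(dawid_skene(crowd)[0])  # [0 1 2 3 0 1]
```

MV treats every annotator equally, whereas DS learns from the data that the third annotator always answers NC and down-weights that annotator's votes accordingly, which is the kind of annotator bias the more elaborate aggregation methods are designed to capture.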
Crowdsourcing models. As an enhancement of the label aggregation methods, we utilize SVGPCR [7], the learning from crowds model based on GPs. This model extends GPs to the crowdsourcing scenario and jointly learns the expertise of the annotators and the GP classifier. The main assumption is that multiple annotators provide noisy labels that are corrupted observations of the ground-truth label. This corruption is modeled with a confusion matrix for each annotator, which reflects the probability of providing a given label for each ground-truth class (as in the DS model [28]). Once trained, the model can predict ground-truth labels for unseen instances using the GP classifier.
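The annotator noise model can be illustrated by combining a classifier's class probabilities with per-annotator confusion matrices to obtain the posterior over the true grade of one patch. This is a hedged sketch of the observation model only: SVGPCR learns the GP and the confusion matrices jointly with variational inference, which is not reproduced here, and the probabilities, annotator names, and matrices below are made up for the example.

```python
import numpy as np

CLASSES = ["NC", "G3", "G4", "G5"]

def crowd_label_posterior(class_probs, crowd_labels, confusions):
    """p(z | x, crowd labels) proportional to p(z | x) * prod_a R_a[z, y_a].

    class_probs:  (K,) classifier probabilities p(z = c | x)
    crowd_labels: dict annotator -> observed label index
    confusions:   dict annotator -> (K, K) matrix with R[c, l] = p(label l | true class c)
    """
    log_post = np.log(class_probs)
    for annotator, label in crowd_labels.items():
        log_post = log_post + np.log(confusions[annotator][:, label])
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

def predicted_label_dist(class_probs, confusion):
    """p(y_a = l | x) = sum_c p(z = c | x) R_a[c, l] for a single annotator."""
    return class_probs @ confusion

# Hypothetical annotators: A1 is reliable; A2 often calls true G3 patches G4.
reliable = 0.85 * np.eye(4) + 0.05 * (1 - np.eye(4))  # rows sum to 1
biased = reliable.copy()
biased[1] = [0.05, 0.35, 0.55, 0.05]
confusions = {"A1": reliable, "A2": biased}

prior = np.array([0.1, 0.5, 0.3, 0.1])  # made-up GP class probabilities
post = crowd_label_posterior(prior, {"A1": 1, "A2": 2}, confusions)
print(CLASSES[int(post.argmax())])      # G3: A2's "G4" vote is explained by its bias
```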
Furthermore, we analyze how crowdsourcing-labeled datasets can be used in conjunction with expert-labeled datasets to learn a classifier. For this purpose, we use the SVGPMIX model [14]. This model, used here for the first time in medical imaging, generalizes SVGPCR to cases where labels have been provided either by a noisy annotator or by an expert. We utilize this model to study the combination of CrowdGleason with SICAPv2. SVGPCR and SVGPMIX use the same training procedure and software framework as supervised GPs.
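One simple way to picture how expert and crowd labels can share a single training objective is sketched below: an expert label is scored directly by the classifier, while a crowdsourced patch marginalizes the unknown true class through the annotators' confusion matrices. This is an assumption made for illustration and not necessarily the exact formulation of SVGPMIX [14]; the function names and toy numbers are hypothetical.

```python
import numpy as np

def patch_log_likelihood(class_probs, labels, confusions=None):
    """Log-likelihood of the labels of one patch.

    Expert patch (confusions is None): labels is an int y, scored as log p(z = y | x).
    Crowd patch: labels is a dict annotator -> label, and the unknown true class z is
    marginalised: log sum_c p(z = c | x) * prod_a R_a[c, y_a].
    """
    if confusions is None:
        return float(np.log(class_probs[labels]))
    per_class = np.array(class_probs, dtype=float)
    for annotator, label in labels.items():
        per_class = per_class * confusions[annotator][:, label]
    return float(np.log(per_class.sum()))

def mixed_log_likelihood(batch):
    """Sum over a batch that mixes expert-labeled and crowdsourced patches."""
    return sum(patch_log_likelihood(*item) for item in batch)

probs = np.array([0.1, 0.5, 0.3, 0.1])
noisy = 0.7 * np.eye(4) + 0.1 * (1 - np.eye(4))  # rows sum to 1
batch = [
    (probs, 1, None),                                         # expert-labeled patch
    (probs, {"A1": 1, "A2": 2}, {"A1": noisy, "A2": noisy}),  # crowdsourced patch
]
total = mixed_log_likelihood(batch)

# Under this view, an expert behaves like an annotator with an identity confusion matrix.
as_crowd = patch_log_likelihood(probs, {"expert": 1}, {"expert": np.eye(4)})
print(np.isclose(as_crowd, patch_log_likelihood(probs, 1)))  # True
```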
Evaluation Metrics. To assess the quality of the learned models, we report the numerical results of three different metrics: Accuracy, Cohen's Quadratic Kappa (κ), and the F1-score. The accuracy is the success rate of the classifier. The F1-score can be defined per class as the harmonic mean of precision and recall. We only report the multiclass F1, defined as the average of the class-wise F1-scores. Note that this score is of special importance in imbalanced scenarios, which are common in medical imaging. Finally, the κ score is increasingly popular for GG assessment [20,21,42,43]. It measures the level of agreement between the output of the classifier and the ground-truth label [44]. We can also use it to measure the agreement between annotators. The κ metric ranges from −1 to 1 and increases with the level of agreement between observers (−1: complete disagreement, 0: no agreement beyond what would be expected by chance, 1: total agreement). It is commonly argued that a moderate agreement is achieved if κ is higher than 0.6, whereas a strong agreement is attained when κ is higher than 0.8. Furthermore, this metric penalizes disagreements quadratically in the distance between classes. For example, a disagreement between classes NC and G5 implies a stronger penalization than one between classes NC and G3.
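The three metrics can be computed from a confusion matrix as in the NumPy sketch below. It is an illustrative implementation, not the authors' evaluation code; scikit-learn offers equivalents (for instance `cohen_kappa_score` with `weights="quadratic"`). Labels are assumed to be encoded as 0–3 for NC, G3, G4, and G5; with this encoding the quadratic weight matrix makes an NC/G5 disagreement cost nine times as much as an NC/G3 one.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes=4):
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # rows: truth, cols: prediction
    return cm

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def macro_f1(y_true, y_pred, n_classes=4):
    """Mean of per-class F1 = 2TP / (2TP + FP + FN); a class never seen nor predicted scores 0."""
    cm = confusion(y_true, y_pred, n_classes)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())

def quadratic_weights(n_classes=4):
    idx = np.arange(n_classes)
    return (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2

def quadratic_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic disagreement weights."""
    observed = confusion(y_true, y_pred, n_classes)
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    w = quadratic_weights(n_classes)
    return float(1.0 - (w * observed).sum() / (w * expected).sum())

y_true = [0, 0, 1, 2, 3, 3]  # 0 = NC, 1 = G3, 2 = G4, 3 = G5
y_pred = [0, 1, 1, 2, 3, 2]
print(round(accuracy(y_true, y_pred), 4))         # 0.6667
print(round(macro_f1(y_true, y_pred), 4))         # 0.6667
print(round(quadratic_kappa(y_true, y_pred), 4))  # 0.8667
print(quadratic_weights()[0, 3], round(quadratic_weights()[0, 1], 4))  # 1.0 0.1111
```

Both errors in this toy example are between adjacent grades, so κ stays high (13/15) even though accuracy is only 4/6; a single NC/G5 confusion would lower it much more.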
5. Experimental results
In this section, we report the results of a set of experiments. They compare different GP-based approaches that learn from expert labels, crowdsourcing labels, and a combination of both.
5.1. Experiment 1: Expert labels
In this experiment, we present the results of models trained on expert SICAPv2 labels. The model is validated using the validation set of SICAPv2. For comparison purposes, we also train the CNN-based method ResNet18 with the same data.
To select the number of inducing points for the SVGP method, we run the method with several values: 64, 128, 256, 512, and 1024. Fig. 3 shows that the F1 and Kappa scores are stable on the SICAPv2 validation set across different numbers of inducing points. This result means that the information can be summarized in a few points of the feature space, and adding more flexibility does not improve the performance. Furthermore, we can see that a larger number of inducing points does not lead to overfitting. Hence, we fix a value of 512 inducing points for all experiments as a trade-off between complexity and generalization capability.

Table 5
Distribution of patches in the training set labeled by each resident pathologist. We refer to each annotator as A#, where the number is an anonymized ID.
Class  A1    A2    A3    A4    A5    A6    A7    Total
NC     2165  2747  3659  4186  3476  3452  866   20,551
G3     1462  1618  604   149   1024  667   682   6206
G4     995   510   544   580   405   641   2580  6255
G5     400   205   140   111   250   337   677   2120
Total  5022  5080  4947  5026  5155  5097  4805  35,132

Table 6
Results of the methods trained on SICAPv2 (expert labels) when tested on the SICAPv2 and CrowdGleason test sets. Best results per column are marked with *.
                SICAPv2                                                 CrowdGleason
Method          Accuracy          F1 score          Kappa              Accuracy          F1 score          Kappa
ResNet18        0.7648 ± 0.0102*  0.7145 ± 0.0143*  0.6611 ± 0.0149    0.8839 ± 0.0122*  0.6698 ± 0.0317*  0.7095 ± 0.0426*
SVGP-SICAP      0.7515 ± 0.0048   0.6912 ± 0.0119   0.7736 ± 0.0139*   0.8736 ± 0.0075   0.6628 ± 0.0061   0.6583 ± 0.0220

Table 7
Results of the methods trained on CrowdGleason (crowdsourcing labels) when tested on the SICAPv2 and CrowdGleason test sets. Best results per column are marked with *.
                SICAPv2                                                 CrowdGleason
Method          Accuracy          F1 score          Kappa              Accuracy          F1 score          Kappa
ResNet18-MV     0.5910 ± 0.0568   0.5251 ± 0.0694   0.4139 ± 0.0763    0.8815 ± 0.0149   0.6791 ± 0.0316   0.6958 ± 0.0415
SVGP-DS         0.4960 ± 0.0233   0.4345 ± 0.0282   0.4965 ± 0.0283    0.8402 ± 0.0121   0.5499 ± 0.0360   0.6152 ± 0.0236
SVGP-MACE       0.4980 ± 0.0212   0.4345 ± 0.0263   0.4759 ± 0.0347    0.8486 ± 0.0070   0.5363 ± 0.0300   0.5574 ± 0.0325
SVGP-GLAD       0.4909 ± 0.0256   0.4342 ± 0.0344   0.5052 ± 0.0533    0.8539 ± 0.0091   0.5410 ± 0.0260   0.5776 ± 0.0338
SVGP-MV         0.6861 ± 0.0138   0.6331 ± 0.0169   0.6242 ± 0.0277    0.8649 ± 0.0016   0.6287 ± 0.0123   0.6576 ± 0.0086
SVGPCR          0.7123 ± 0.0072*  0.6850 ± 0.0075*  0.6953 ± 0.0176*   0.9023 ± 0.0037*  0.7068 ± 0.0142*  0.7048 ± 0.0207*

Fig. 3. Variation of the F1 and Kappa scores with the number of inducing points for the SVGP model trained on the SICAPv2 dataset. These results are reported on the validation set.
Table 6 shows the results of the SVGP method trained on SICAPv2 (denoted as SVGP-SICAP) tested on the SICAPv2 and CrowdGleason curated test sets. Additionally, Table 6 includes the test results of the end-to-end trained ResNet18 network for comparative analysis. The SVGP-SICAP classifier achieves a better Kappa score on SICAPv2 than ResNet18 and is competitive in the rest of the metrics.
5.2. Experiment 2: Crowdsourcing labels
In this experiment, we train the methods with the CrowdGleason dataset. The dataset is split into 13824 training samples and 2327 validation samples. For validation, we use the MV strategy to aggregate the labels.
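Majority voting (MV) can be sketched in a few lines. The sketch assumes the crowd labels are stored as a patches-by-annotators integer matrix with NC, G3, G4 and G5 encoded as 0 to 3 and -1 marking a patch that an annotator did not label; this layout, the function name and the tie-breaking rule (lowest class index wins) are illustrative assumptions, not details given in this paper.

```python
import numpy as np

MISSING = -1  # hypothetical marker: annotator did not label this patch

def majority_vote(labels, n_classes):
    """labels: (n_patches, n_annotators) int array; returns one class per patch."""
    labels = np.asarray(labels)
    # votes[n, c] = number of annotators who assigned class c to patch n;
    # MISSING never equals a class index, so it casts no vote
    votes = np.stack([(labels == c).sum(axis=1) for c in range(n_classes)], axis=1)
    # argmax returns the first maximum, so ties go to the lowest class index
    return votes.argmax(axis=1)
```

For example, `majority_vote([[0, 0, 1], [2, 3, 3]], 4)` yields classes 0 and 3.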
We first use different label aggregation strategies (DS, MACE, GLAD, and MV) to train the SVGP classifier. Note that the input features, as we have already indicated, are extracted using ResNet18 trained on SICAPv2. Results in Table 7 show that MV produces the best result among the label aggregation strategies, followed by DS. For comparison purposes, we also report the results of ResNet18 trained end-to-end with the MV labels. Finally, we report the results of the SVGPCR method trained with the crowdsourcing labels of CrowdGleason. From Table 7, it is clear that SVGPCR outperforms the rest of the methods from the literature on both datasets. The MV aggregation strategy can reduce the bias of the annotations, but the remaining noisy labels hinder the classifier's performance.
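DS refers to the Dawid-Skene model, which jointly estimates one confusion matrix per annotator and a posterior over the true class of each patch by expectation-maximization. The sketch below is a compact textbook version, initialized with vote proportions and using a hypothetical patches-by-annotators label matrix with -1 for missing labels; it is not the implementation behind Table 7, and MACE and GLAD are not shown.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50, eps=1e-6):
    """labels: (N, R) int array, -1 = missing. Returns posteriors (N, K) and confusions (R, K, K)."""
    labels = np.asarray(labels)
    N, R = labels.shape
    K = n_classes
    # obs[n, r, l] = 1 if annotator r gave class l to patch n
    obs = np.zeros((N, R, K))
    n_idx, r_idx = np.nonzero(labels >= 0)
    obs[n_idx, r_idx, labels[n_idx, r_idx]] = 1.0
    # initialise the posteriors with (smoothed) vote proportions
    T = obs.sum(axis=1) + eps
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors and confusion matrices pi[r, true, observed]
        prior = T.mean(axis=0)
        pi = np.einsum("nk,nrl->rkl", T, obs) + eps
        pi /= pi.sum(axis=2, keepdims=True)
        # E-step: log p(z_n = k | labels) = log prior_k + sum_r log pi[r, k, label_nr] + const
        log_post = np.log(prior) + np.einsum("nrl,rkl->nk", obs, np.log(pi))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        T = np.exp(log_post)
        T /= T.sum(axis=1, keepdims=True)
    return T, pi
```

Missing annotations contribute nothing to either step, since their one-hot rows are all zero. The hard DS labels used to train a classifier would be `T.argmax(axis=1)`.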
5.3. Experiment 3: Combining expert and crowdsourcing labels
Until now, information from experts and crowds was not used simultaneously. In this experiment, we explore the possibility of enhancing expert-labeled datasets with crowdsourcing-labeled ones. For this purpose, we add the CrowdGleason training set to the SICAPv2 training set. For supervised methods (SVGP and ResNet18), we consider MV aggregation, since it achieved the best results in Experiment 2. All methods use SICAPv2 as the validation set since it already provides ground-truth labels.
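For the supervised baselines of this experiment, the combined training set can be pictured as the expert-labeled SICAPv2 patches stacked with the CrowdGleason patches carrying their majority-vote label. This is a minimal sketch assuming precomputed feature arrays, classes encoded as 0 to 3 and -1 for missing crowd labels; the function name and the `is_expert` flag are hypothetical.

```python
import numpy as np

def build_mixed_training_set(x_expert, y_expert, x_crowd, crowd_labels, n_classes):
    """Stack expert-labeled patches with crowd patches relabeled by majority vote."""
    crowd_labels = np.asarray(crowd_labels)
    # -1 (missing) never equals a class index, so it casts no vote
    votes = np.stack([(crowd_labels == c).sum(axis=1) for c in range(n_classes)], axis=1)
    y_crowd = votes.argmax(axis=1)  # ties go to the lowest class index
    x = np.concatenate([x_expert, x_crowd], axis=0)
    y = np.concatenate([np.asarray(y_expert), y_crowd])
    # record where each label came from (ignored by the MV-based baselines)
    is_expert = np.concatenate([np.ones(len(y_expert), dtype=bool),
                                np.zeros(len(y_crowd), dtype=bool)])
    return x, y, is_expert
```

The MV baselines treat every row of `y` alike, which is where the noise from non-expert labels enters; a model such as SVGPMIX instead keeps the expert and crowd sources apart.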
Results are shown in Table 8. SVGPMIX outperforms the competing methods in all metrics on the CrowdGleason test set and in Kappa on SICAPv2, showing that the combination of expert and crowdsourcing labels is feasible and beneficial. Although ResNet18-MV achieves slightly higher accuracy and F1 score than SVGPMIX on SICAPv2 (F1 = 0.7216 ± 0.0152 vs. F1 = 0.7137 ± 0.0119), its Kappa value is much lower (κ = 0.6748 ± 0.0085) than that of SVGPMIX (κ = 0.7814 ± 0.0083). We observe a similar behavior in the SVGP-MV performance. We believe that this is due to the presence of noisy labels in the combined dataset. In contrast, SVGPMIX achieves a satisfactory Kappa value on the SICAPv2 (κ = 0.7814 ± 0.0083) and CrowdGleason (κ = 0.7276 ± 0.0260) datasets, demonstrating its robustness.
Computer Methods and Programs in Biomedicine 257 (2024) 108472
M. López-Pérez et al.
Table 8
Results of the methods trained on the combination of SICAPv2 and CrowdGleason (expert and crowdsourcing labels, respectively) when tested on the SICAPv2 and CrowdGleason test sets.
Method | SICAPv2 Accuracy | SICAPv2 F1 score | SICAPv2 Kappa | CrowdGleason Accuracy | CrowdGleason F1 score | CrowdGleason Kappa
ResNet18-MV | 𝟎.𝟕𝟕𝟒𝟑 ± 𝟎.𝟎𝟎𝟓𝟔 | 𝟎.𝟕𝟐𝟏𝟔 ± 𝟎.𝟎𝟏𝟓𝟐 | 0.6748 ± 0.0085 | 0.8804 ± 0.0192 | 0.6843 ± 0.0355 | 0.7042 ± 0.0381
SVGP-MV | 0.6861 ± 0.0138 | 0.6331 ± 0.0169 | 0.6242 ± 0.0277 | 0.8649 ± 0.0016 | 0.6287 ± 0.0123 | 0.6576 ± 0.0086
SVGPMIX | 0.7660 ± 0.0056 | 0.7137 ± 0.0119 | 𝟎.𝟕𝟖𝟏𝟒 ± 𝟎.𝟎𝟎𝟖𝟑 | 𝟎.𝟗𝟎𝟐𝟕 ± 𝟎.𝟎𝟎𝟗𝟔 | 𝟎.𝟕𝟏𝟕𝟔 ± 𝟎.𝟎𝟐𝟕𝟎 | 𝟎.𝟕𝟐𝟕𝟔 ± 𝟎.𝟎𝟐𝟔𝟎
To assess the statistical significance of our results, we apply the Almost Stochastic Order (ASO) test [45,46] (implemented in the deep-significance library¹) to the five random runs of both SVGP-SICAP and the proposed SVGPMIX models. The test was performed on the F1 score, since this metric takes into account the class imbalance present in this paper. The ASO test outputs a value ε_min between 0 and 1 indicating the degree of violation of stochastic order, where a value below 0.5 indicates that the SVGPMIX model performs statistically better than SVGP-SICAP. Using ASO at a significance level α = 0.05, we found the score distribution of SVGPMIX to be stochastically dominant over that of SVGP-SICAP (ε_min = 0.0615 on SICAPv2 and ε_min = 0.0 on CrowdGleason). In conclusion, adding the proposed CrowdGleason dataset to SICAPv2 through SVGPMIX yields a model that outperforms SVGP-SICAP, which is trained on SICAPv2 alone.
¹ https://deep-significance.readthedocs.io/en/latest/
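The statistic behind ASO is a violation ratio: the share of the squared distance between the two quantile functions that falls where the candidate model's quantiles lie below the baseline's. The sketch below computes only that point estimate from two lists of per-run scores, evaluated on a grid of quantile levels. The full ASO test additionally bootstraps this quantity to obtain the upper bound ε_min reported above, which this sketch does not do, so it is an illustration rather than a replacement for the deep-significance implementation. The function name and grid size are assumptions.

```python
import numpy as np

def violation_ratio(scores_a, scores_b, n_grid=1000):
    """Point estimate of the violation ratio of 'A dominates B'. 0 = no violation."""
    t = (np.arange(n_grid) + 0.5) / n_grid  # quantile levels inside (0, 1)
    qa = np.quantile(np.asarray(scores_a, dtype=float), t)
    qb = np.quantile(np.asarray(scores_b, dtype=float), t)
    sq = (qa - qb) ** 2
    total = sq.sum()
    if total == 0.0:
        return 0.5  # identical distributions: neither dominates (assumed convention)
    # fraction of the squared quantile distance where A is worse than B
    return sq[qa < qb].sum() / total
```

With per-run F1 scores of SVGPMIX as `scores_a` and of SVGP-SICAP as `scores_b`, a value near 0 means the SVGPMIX quantiles lie above the baseline's almost everywhere; five runs per model give a very coarse empirical distribution, which is why ASO adds a bootstrap bound.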
5.4. Ablation studies on the quality of the labels and the number of annotators
We have seen how well the models perform on the test set, but we have not analyzed in depth the role of the non-expert annotators in the crowdsourcing model. In this subsection, we provide an ablation study on the crowdsourcing models, highlighting the impact of crowdsourced annotations on the final performance. First, we assess the effect of experience among pathologists in training by dividing them into two groups: junior (residents in their first or second year) and senior (third- or fourth-year residents). We train the SVGPCR model using the same configuration as in previous experiments, but in two different settings: (i) using only samples labeled by junior participants and (ii) using only samples labeled by senior participants. This experiment is conducted over five independent runs. The results are shown in Fig. 4, which includes the mean performance and the 95% CI. The performance is comparable across both datasets; however, the model trained with junior-labeled samples performs better on the CrowdGleason dataset, while the model trained with senior-labeled samples excels on the SICAPv2 dataset. Since the junior residents were specifically trained using the CrowdGleason dataset, the model trained with their annotations tends to overfit to it. In contrast, the senior participants, with greater experience, are able to recognize a broader range of patterns, allowing the model to generalize more effectively to the SICAPv2 dataset.
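This section does not state how the 95% CI over five runs is computed. One common choice, assumed in this sketch, is a Student-t interval around the mean, mean ± t(0.975, n - 1) · s / √n, where s is the sample standard deviation; the hard-coded t quantiles cover up to ten runs, and the function name is hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values t(0.975, df) for small degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def mean_ci95(values):
    """Return (mean, lower, upper) of a t-based 95% CI; supports 2 to 10 runs."""
    n = len(values)
    m = statistics.mean(values)
    half_width = T_975[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return m, m - half_width, m + half_width
```

With only five runs the t quantile (2.776) is much larger than the normal 1.96, so a normal-approximation interval would look noticeably narrower for the same data.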
Secondly, in the ablation study on the number of annotators, we investigate how many annotators are sufficient to achieve a satisfactory crowdsourcing model. For this, we trained the SVGPCR model with varying numbers of annotators. For each number of annotators, we performed eight independent runs, randomly sampling different subsets of annotators. For each subset, we ran the model five times to ensure stability and consistency. Fig. 5 illustrates the results for the SICAPv2 and CrowdGleason datasets, showing the mean performance, the 95% CI, and the SVGPCR performance with all annotators. As we increase the number of annotators, the CI narrows, indicating that the models become more stable and less dependent on the specific annotators selected. On both test datasets, the performance of the models trained with subsets of annotators overlaps with that of the model trained with all annotators. This suggests that fewer annotators can achieve comparable performance. Overall, in this experiment, about five annotators appear sufficient to achieve satisfactory results, although increasing the number of annotators leads to more stable performance, since with few annotators the results are highly influenced by the expertise of the selected ones.
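The subset-sampling protocol (eight random annotator subsets per subset size, each then trained five times) can be sketched as below. The patches-by-annotators label layout with -1 for missing labels, the rule of dropping patches that no selected annotator labeled, and the function name are assumptions for illustration; the SVGPCR training step itself is not shown.

```python
import numpy as np

MISSING = -1  # hypothetical marker for "not labeled by this annotator"

def sample_annotator_subsets(labels, k, n_subsets=8, seed=0):
    """Yield (subset, masked_labels, keep) for n_subsets random subsets of k annotators."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    n_annotators = labels.shape[1]
    for _ in range(n_subsets):
        subset = np.sort(rng.choice(n_annotators, size=k, replace=False))
        masked = labels[:, subset]
        # drop patches that none of the selected annotators labeled (assumed rule)
        keep = (masked != MISSING).any(axis=1)
        yield subset, masked[keep], keep
```

Each yielded `masked_labels` matrix would feed five SVGPCR runs, and the mean and 95% CI would then be taken over all 8 × 5 results for that value of `k`.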
5.5. Ablation study on the impact of expert label datasets in the crowdsourcing scheme
This section analyzes the impact of the expert labels from SICAPv2 on the SVGPMIX model. We investigate how many expert-labeled samples are necessary to achieve a robust SVGPMIX model. For this, we trained the SVGPMIX model using CrowdGleason and different proportions of expert-labeled samples from SICAPv2. For each proportion, we performed eight independent runs, randomly sampling different subsets of expert-labeled data. For each sampled dataset, we ran the model five times to ensure stability and consistency. Fig. 6 illustrates the results for each dataset, showing the mean performance, the 95% CI, and the performance of both SVGPMIX and SVGP-SICAP with all expert-labeled samples.
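Drawing the expert-labeled subsets for a given proportion can be sketched as eight random draws without replacement from the SICAPv2 training indices. The function name, the rounding rule and the pool size used in the usage note are hypothetical placeholders; the actual SICAPv2 training size is not given in this section.

```python
import numpy as np

def sample_expert_fraction(n_expert, fraction, n_draws=8, seed=0):
    """Return n_draws sorted index arrays, each holding round(fraction * n_expert) expert patches."""
    rng = np.random.default_rng(seed)
    size = max(1, int(round(fraction * n_expert)))
    return [np.sort(rng.choice(n_expert, size=size, replace=False)) for _ in range(n_draws)]
```

For the 10% setting, `sample_expert_fraction(n_expert, 0.10)` returns eight index sets; each would be combined with the full CrowdGleason training set and used for five SVGPMIX runs.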
Unlike with crowdsourced-labeled samples, increasing the number of expert-labeled samples does not significantly narrow the CI, as expert labels tend to have less variability and are inherently more robust. Notably, when 10% of the expert samples are used, the model stabilizes (and also surpasses the results of SVGP-SICAP on the SICAPv2 dataset), indicating that these samples are highly informative for training the crowdsourcing model. Beyond this point, adding more samples does not provide additional benefits, demonstrating that the model can perform effectively with a relatively small amount of expert data.
5.6. Analysis of annotator behavior
We measure the performance of each non-expert annotator by means of the Kappa score. The figures-of-merit, shown in Table 9, indicate the degree of agreement between each annotator and the curated test set. The best-performing annotator is A4 (κ = 0.7765), while A7 presents the lowest agreement (κ = 0.0899). The disparity of performance among annotators highlights the heterogeneity of the crowd and the complexity of the task.
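Table 9 reports Cohen's quadratic kappa between each annotator and the curated test labels. A minimal sketch, assuming the grades are ordinal and encoded NC = 0, G3 = 1, G4 = 2, G5 = 3, and that each annotator is scored only on the test patches it labeled (both assumptions), uses quadratic disagreement weights w_ij = (i - j)² / (K - 1)². The function names are illustrative.

```python
import numpy as np

def quadratic_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights: 1 - sum(w * O) / sum(w * E)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (y_true, y_pred), 1)
    observed /= observed.sum()
    # expected joint distribution if the two raters were independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_classes, n_classes))
    weights = (i - j) ** 2 / (n_classes - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

def per_annotator_kappa(y_true, labels, n_classes, missing=-1):
    """Kappa of each annotator column against y_true, ignoring missing entries."""
    y_true, labels = np.asarray(y_true), np.asarray(labels)
    scores = []
    for r in range(labels.shape[1]):
        seen = labels[:, r] != missing
        scores.append(quadratic_kappa(y_true[seen], labels[seen, r], n_classes))
    return np.array(scores)
```

Quadratic weights penalize confusing NC with G5 nine times more than confusing adjacent grades, which suits an ordinal scale such as the Gleason grade.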
We further depict the per-class behavior of the annotators in Fig. 7. The confusion matrices are normalized row-wise for better visualization and comparison. These matrices can be understood as an estimate (on the test set) of the annotators' expertise. The crowdsourcing methods aim to estimate these confusion matrices from the noisy labeled training set. Recall that the ground-truth labels are not observed by these models. Figs. 8 and 9 show the confusion matrices estimated by SVGPCR and SVGPMIX, respectively. These matrices closely approximate the annotators' behavior, emphasizing the excellent performance of the crowdsourcing methods.
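A row-normalized confusion matrix like those in Fig. 7 can be computed as below: entry [i, j] is the fraction of patches of true class i that the annotator labeled as j, so the diagonal holds the per-class sensitivity. The function name and the 0-to-3 class encoding are assumptions; rows for classes absent from `y_true` are left as zeros.

```python
import numpy as np

def row_normalized_confusion(y_true, y_annot, n_classes):
    """Entry [i, j] estimates p(annotator says j | true class i)."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_annot)), 1)
    row_sums = cm.sum(axis=1, keepdims=True)
    # rows with no patches of that true class stay at zero instead of dividing by 0
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
```

Because each row sums to 1, annotators who labeled very different numbers of patches per class can be compared directly, which is the purpose of the row-wise normalization mentioned above.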
6. Discussion
Our experiments have shown (see Tables 6 and 7) that SVGP-based models improve on ResNet18 when tested on SICAPv2 (in Kappa when trained with expert labels, and in all metrics when trained with crowdsourcing labels) and are competitive with or outperform ResNet18 when tested on the new CrowdGleason dataset. These results confirm the potential of GPs to perform GG classification. The SVGPCR classifier, used in the learning-from-crowds framework, achieved κ = 0.7048 ± 0.0207 and κ = 0.6953 ± 0.0176 on the CrowdGleason and SICAPv2 test sets, respectively (see Table 7), outperforming label aggregation strategies such as MV, DS, MACE, and GLAD. The best label aggregation model (i.e., MV) obtains κ = 0.6576 ± 0.0086 and κ = 0.6242 ± 0.0277 (see Table 7) on the CrowdGleason and SICAPv2 test sets, respectively. This large difference highlights the enormous impact of noisy labels provided by non-expert annotators on the models' performance and the need to use a suitable model to learn from crowds. Furthermore, the SVGPCR results are competitive with those of SVGP trained on SICAPv2 with expert labels, which obtains κ = 0.6583 ± 0.0220 and κ = 0.7736 ± 0.0139 on CrowdGleason and SICAPv2, respectively (see Table 6). Regarding the F1 metric, SVGPCR even improves on SVGP trained with expert labels on the CrowdGleason test set (0.7068 ± 0.0142 vs. 0.6628 ± 0.0061) and comes close to it on SICAPv2 (0.6850 ± 0.0075 vs. 0.6912 ± 0.0119). These results align with previous works in crowdsourcing [12,15,16,18] and validate the use of the proposed CrowdGleason dataset for further studies on crowdsourcing and GG.
Fig. 4. Results of the SVGPCR model trained with labels provided by junior participants (first and second year) or senior participants (third and fourth year).
Fig. 5. Results of the SVGPCR model varying the number of annotators.
Fig. 6. Results of the SVGPMIX model varying the proportion of expert-labeled samples from SICAPv2.
We have also explored the combination of the SICAPv2 dataset with our dataset. Recall that learning a model with samples from two different centers is difficult due to the heterogeneity between samples and labels. Additionally, labels from non-expert annotators introduce noise into the dataset, which worsens the classifier performance. See, for instance, the decrease on SICAPv2 from κ = 0.7736 ± 0.0139 for SVGP-SICAP in Table 6 to κ = 0.6242 ± 0.0277 for SVGP-MV in Table 8. In this work, we propose using SVGPMIX to address this issue. SVGPMIX extends SVGPCR to the scenario where some labels are given by one expert and the rest are given by multiple non-experts. In this case, SVGPMIX improves the results of both SVGP-SICAP and SVGPCR on both datasets (compare Table 8 with Tables 6 and 7). Specifically, it achieves κ = 0.7276 ± 0.0260 and κ = 0.7814 ± 0.0083 on CrowdGleason and SICAPv2, respectively. Remarkably, SVGPMIX achieves stable results with only 10% of the SICAPv2 samples labeled by expert pathologists (see Fig. 6). The results obtained by both SVGPCR and SVGPMIX are within the range of results reported in the literature for GG classification. For example, Marrón-Esquivel et al. [42] reported κ = 0.826, Xiang et al. [47] reported κ = 0.81, and Arvaniti et al. [21] reported κ = 0.49 and κ = 0.53 for two different pathologists.
Fig. 7. Normalized confusion matrices of the seven annotators on the CrowdGleason curated test set.
Table 9
Cohen's Quadratic Kappa (κ) coefficient of non-expert annotators on the CrowdGleason curated test set.
Annotator | A1 | A2 | A3 | A4 | A5 | A6 | A7
κ | 0.4120 | 0.6283 | 0.5394 | 0.7765 | 0.7040 | 0.6520 | 0.0899
During the study, we have observed great variability between annotators, which is even more accentuated when they have little experience in the area. Table 9 shows that the results obtained by non-experts on the CrowdGleason curated test set are very dissimilar, ranging from κ = 0.09 to κ = 0.78. In general, we observe a lower mean agreement with the test set (κ = 0.5432) than that observed in other works involving only expert pathologists. For example, in [42] the authors reported κ = 0.6946 among expert pathologists, and in [21] two expert pathologists scored κ = 0.71. The annotators classified non-cancerous patches relatively well, but showed more confusion between classes G3 and G4 (see Fig. 7). SVGPCR and SVGPMIX automatically estimate these confusion matrices from the noisy training data. The results in Figs. 8 and 9 show that the estimated matrices capture the behavior of the noisy annotators. For instance, both models capture the higher sensitivity of annotator 7 for grades G4 and G5. Furthermore, these models also capture the behavior of the annotators when labeling samples as 'NC': they correctly estimate that annotators 1 and 7 have the lowest sensitivity for this class (as seen in Fig. 7). Note also that SVGPCR achieves better concordance (Kappa value) on the CrowdGleason test set than most pathologists in training, as seen in Tables 7 and 9: its κ = 0.7048 exceeds that of every individual annotator except A4 (κ = 0.7765). As an