Computers in Biology and Medicine 159 (2023) 106856
Available online 6 April 2023
0010-4825/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
A comparative study of the inter-observer variability on Gleason grading against Deep Learning-based approaches for prostate cancer
José M. Marrón-Esquivel a,b,c,∗, L. Duran-Lopez a,b,c,d, A. Linares-Barranco a,b,c,d, Juan P. Dominguez-Morales a,b,c,d
a Robotics and Tech. of Computers Lab., Universidad de Sevilla, 41012 Seville, Spain
b Escuela Técnica Superior de Ingeniería Informática (ETSII), Avenida de Reina Mercedes s/n, Universidad de Sevilla, 41012 Seville, Spain
c Escuela Politécnica Superior (EPS), Universidad de Sevilla, 41011 Seville, Spain
d Smart Computer Systems Research and Engineering Lab (SCORE), Research Institute of Computer Engineering (I3US), Universidad de Sevilla, 41012 Seville, Spain
ARTICLE INFO
Keywords:
Prostate cancer
Computational pathology
Deep Learning
Convolutional neural networks
Inter-observer variability
Medical image analysis
ABSTRACT
Background: Among all the cancers known today, prostate cancer is one of the most commonly diagnosed in men. With modern advances in medicine, its mortality has been considerably reduced. However, it is still a leading type of cancer in terms of deaths. The diagnosis of prostate cancer is mainly conducted by biopsy test. From this test, Whole Slide Images are obtained, from which pathologists diagnose the cancer according to the Gleason scale. Within this scale from 1 to 5, grade 3 and above is considered malignant tissue. Several studies have shown an inter-observer discrepancy between pathologists in assigning the value of the Gleason scale. Due to the recent advances in artificial intelligence, its application to the computational pathology field with the aim of supporting and providing a second opinion to the professional is of great interest.
Method: In this work, the inter-observer variability of a local dataset of 80 whole-slide images annotated by a team of 5 pathologists from the same group was analyzed at both area and label level. Four approaches were followed to train six different Convolutional Neural Network architectures, which were evaluated on the same dataset on which the inter-observer variability was analyzed.
Results: An inter-observer variability of 0.6946 κ was obtained, with 46% discrepancy in terms of area size of the annotations performed by the pathologists. The best trained models achieved 0.826 ± 0.014 κ on the test set when trained with data from the same source.
Conclusions: The obtained results show that deep learning-based automatic diagnosis systems could help reduce the widely-known inter-observer variability that is present among pathologists and support them in their decision, serving as a second opinion or as a triage tool for medical centers.
1. Introduction
Among all the pathologies that affect society, cancer is one of those in which the number of cases has increased the most. In 2020, 1.41 million new cases of prostate cancer were diagnosed, and the disease caused around 375,000 deaths worldwide. It is known that prostate cancer is one of the most aggressive types of cancer that can be diagnosed [1]. According to GLOBOCAN [1], in countries with higher Human Development Index (HDI), life expectancy is higher and, consequently, higher incidences of cancer are recorded. This explains why Europe, with 9% of the world's population, accounts for 23% of the world's cancer cases [1]. Late diagnosis is a negative factor for the patient's prognosis, while early diagnosis greatly favors recovery and overcoming the pathology. The stages of cancer depend on the size of the tumor and how far it has spread throughout the rest of the body.
∗ Corresponding author at: Robotics and Tech. of Computers Lab., Universidad de Sevilla, 41012 Seville, Spain.
E-mail address: [email protected] (J.M. Marrón-Esquivel).
Cancer can be diagnosed on the basis of different medical tests performed on the patient. The following tests may be requested to establish this diagnosis: Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Nuclear Tomography, Bone Scan, Positron Emission Tomography (PET), Ultrasound, X-ray or Biopsy.
Histology is the branch of biology that studies the composition, structure and characteristics of the organic tissues of living beings. From a biopsy extracted from a patient, a pathologist can perform a histological study of the tissue and, based on its structure, report the details of the diagnosis. Prostate biopsy consists in obtaining samples of prostate tissue by means of a needle that punctures a region determined by a transrectal ultrasound process. These tissue samples are then processed in a laboratory and scanned, resulting in very high resolution Whole-Slide Images (WSIs), which are subsequently analyzed and inspected by pathologists.
https://doi.org/10.1016/j.compbiomed.2023.106856
Received 30 November 2022; Received in revised form 7 February 2023; Accepted 30 March 2023
The aggressiveness of prostate cancer can be determined by a scoring system called Gleason Grading System (GGS) [2]. GGS scores a prostate cancer based on its histological appearance considering five different malignant cell patterns called Gleason patterns (GPs), which range from 1 to 5. Pathologists examine the structure of the cells in WSIs and assign a lower or higher GP to different malignant areas depending on how much they differ from healthy or normal tissue. The two most predominant patterns are summed up to assign the Gleason score (GS), which ranges from 2 to 10. However, scores of 2-5 are almost never used, since Gleason patterns 1 and 2 are very uncommon [3]. This score is subsequently used by the physician to determine the most appropriate treatment for the patient. However, many studies have reported inter-observer variability between pathologists in the process of labeling cancerous sections of tissue (more than 30% of discrepancy in Gleason scoring) [4–6].
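As a minimal illustration of how a Gleason score is composed from the two most predominant patterns, the following Python sketch sums the primary and secondary patterns found in a list of labels. The function name and the input representation (one label per annotated unit of tissue) are hypothetical simplifications introduced for this example; clinical grading involves additional rules that are not modeled here.

```python
from collections import Counter


def gleason_score(patterns):
    """Return (primary, secondary, score) from a non-empty list of Gleason patterns.

    `patterns` is a hypothetical input: one GP label per annotated unit of tissue.
    """
    ranked = Counter(patterns).most_common(2)
    primary = ranked[0][0]
    # If a single pattern is present, it is counted twice (e.g. 3 + 3 = 6).
    secondary = ranked[1][0] if len(ranked) > 1 else primary
    return primary, secondary, primary + secondary


print(gleason_score([3, 3, 3, 4]))  # primary 3, secondary 4
print(gleason_score([4, 4, 3]))     # primary 4, secondary 3
```

Keeping the primary and secondary patterns separate distinguishes 3+4 from 4+3, which both yield GS7.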
Numerous research centers and hospitals have studied different approaches with the purpose of reducing the inter-observer variability among pathologists. In this regard, artificial intelligence has demonstrated potential to be used as a supporting service in diagnostic imaging tasks, such as radiology, dermatology and histopathology [7–9], among others. These systems, called Computer-Aided Diagnosis (CAD) systems, are automatic or semi-automatic algorithms with the purpose of supporting the professional when making a diagnosis or interpreting an image.
Artificial Neural Networks (ANNs) are among the most widely used algorithms in artificial intelligence. ANNs are inspired by the operations performed by the human brain. These networks, like the human brain, receive information from the environment through a training process, where the synaptic weights store the acquired knowledge. Different types of ANNs can be found based on the type and degree of connections, as well as on the number of layers. Convolutional Neural Networks (CNNs) are a type of ANN that has become popular in recent years, since they are very effective for machine vision tasks, such as image classification and segmentation, among other applications. Recently, many researchers have studied the application of CNNs in the diagnosis of numerous types of diseases that involve image interpretation. Some works, such as [10–14], have demonstrated the potential of this kind of deep learning algorithm in many different visual pattern classification problems involving medical imaging.
Campanella et al. [15] developed a deep learning-based system to distinguish between cancer and non-cancer slides using more than 44000 WSIs from breast, skin and prostate tissue without pixel-wise annotations. In that paper, the authors presented a novel framework based on the multiple instance learning approach, which generates a semantically rich feature representation. A Recurrent Neural Network (RNN) is used to integrate the extracted information in order to report the final classification result, reporting an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.986 for prostate cancer detection [15]. In [16], a CNN architecture is presented to distinguish between low (GS6-GS7) and high (GS8 to GS10) Gleason scores using 895 Tissue Microarrays (TMA) images. A total of 641 TMAs were used for training the CNN, which was then evaluated on an independent test set consisting of 245 TMAs annotated by two different pathologists. The authors report agreements of 0.75 and 0.71 (in terms of Cohen's quadratic kappa statistic) between the system and each pathologist, respectively, which were comparable with the agreement obtained between the pathologists (0.71). Strom et al. [17] presented two CNN ensembles (each consisting of 30 InceptionV3 [18] models pre-trained on ImageNet [19]) to perform binary classification (benign or tumor) and GPs prediction. The authors used 6682 WSIs for training the system, 1631 WSIs for an independent test and 330 WSIs for an external validation. The system achieved an AUC of 0.997 (independent test) and 0.986 (external validation) for the binary task. Regarding the Gleason grading, the authors obtained a mean pairwise kappa of 0.62, which was within the range of the inter-observer variation between 23 pathologists (0.60-0.73).
In this work, the inter-observer variability of a group of 5 pathologists that annotated a dataset containing slides from Clinic Hospital in Barcelona was analyzed at different levels. Different deep learning architectures were trained using Prostate cANcer graDe Assessment (PANDA) [20], the largest publicly available dataset, and the Clinic dataset in four different approaches, comparing the aforementioned variability with the performance of the models.
The main contributions of this work include the following:
• A study of the inter-observer variability of a team of pathologists from the same hospital where the WSIs were sourced was performed.
• A set of deep learning architectures were trained with four different methods including the largest publicly-available prostate cancer dataset and evaluated on the same dataset for which the inter-observer variability was analyzed.
• A total of 240 CNN models were evaluated and compared, including a broad discussion of the inter-observer variability analysis and the performance obtained by the neural networks.
• The best results were obtained with DenseNet121 models, which achieve a higher quadratic Cohen's kappa score (0.826 ± 0.014) than the inter-observer variability (0.6946), proving the viability of deep learning-based systems for supporting pathologists in the diagnosis.
The rest of the paper is structured as follows: Section 2 presents the materials and methods used, including the dataset (Section 2.1), the pre-processing applied to the images (Section 2.2), an introduction to the different CNN models used (Section 2.4), a brief description of the experiments performed (Section 2.5) and the metrics that were considered to evaluate the trained models (Section 2.6). Then, in Section 3, the results are presented, dividing them into those related to the inter-observer variability analysis (Section 3.1) and those related to the CNN models (Section 3.2). In Section 4, the results obtained are discussed, and, finally, the conclusions of this work are presented in Section 5.
2. Materials and methods
2.1. Dataset
In this work, a local dataset of pathological biopsy images obtained from prostate cancer patients from Clinic Hospital (Barcelona, Spain) was used. These cases consisted of different samples obtained by means of needle core biopsy and prepared with haematoxylin and eosin (H&E) stain in the laboratory. The samples were then digitized using a VENTANA iScan HT (Roche Diagnostics) scanner at 40× magnification (0.25 μm per pixel). A total of 80 different WSIs were obtained, which were then pixel-wise annotated by a team of five pathologists from the same hospital from which the samples were acquired. Since annotating WSIs with that level of detail is very time consuming, each of the pathologists annotated only around a third of the images (each WSI was annotated by at least two pathologists). Pixel-wise annotations (also called strong annotations) were performed using digital graphic tablets and the QuPath software [21], labeling malignant tissue regions with Gleason patterns 3, 4 and 5. Although the five pathologists that annotated the WSIs work in the same team, they neither consulted each other nor shared any information regarding the annotation process, in order not to bias the inter-observer variability study performed in this work. Thus, this dataset was first used to analyze this aspect in a quantitative and statistical manner, and then for training different deep learning architectures in order to compare the results.
Since deep learning algorithms require a highly heterogeneous training dataset in order to generalize well on unseen data, another dataset was used in combination with the one obtained from Clinic Hospital for this purpose. This was the Prostate cANcer graDe Assessment (PANDA) Challenge dataset [20], which is public and contains 11000 WSIs of digitized H&E-stained biopsies from Radboud University Medical Center (Nijmegen, Netherlands) and Karolinska Institute (Stockholm, Sweden), of which 5060 images are pixel-wise annotated. This makes PANDA the largest publicly-available prostate cancer digital pathology dataset at present.
Table 1 summarizes the WSIs used from each dataset, specifying their corresponding ground-truth Gleason score.
Table 1
Distribution of the WSIs used from Clinic Hospital and PANDA challenge datasets.
Dataset            GS6   GS7 = 3+4   GS7 = 4+3   GS8   GS9-10   Total
Clinic Hospital     42          12          10     7        9      80
PANDA challenge    802         673         909   764      964    4112
2.2. Image pre-processing
WSIs are gigapixel-resolution images whose size can be greater than 1 GB. Current GPUs and neural networks are unable to process these images due to their memory limitations. Therefore, a pre-processing step was applied. A widely-known solution to overcome this problem is patch-sampling the WSIs, which consists in extracting smaller subimages, called patches, from the source images, thus allowing them to be used as input to neural networks. This process is the current and most widely-used method to work with WSIs in deep learning, and has been used in previous publications, such as [15,16,22–24], among many others. In this work, the size of the extracted patches was set to 750 × 750 pixels at 40× magnification, since it was previously used in other studies [16]. The patches were densely extracted, which means that no overlapping between them was set. Then, these were subsampled to 224 × 224 pixels in order to reduce computation and also due to the fact that it is the default input size in the pre-trained CNN models that were used (see Section 2.4).
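The dense, non-overlapping sampling described above can be sketched as a simple grid of tile corners. This is an illustrative sketch only: the function name and the example slide dimensions are hypothetical, and the actual reading and resizing of the pixels (which would require a WSI library) is left out.

```python
def patch_grid(width, height, patch=750):
    """Top-left corners of the non-overlapping patch x patch tiles that fit in the image."""
    return [
        (x, y)
        for y in range(0, height - patch + 1, patch)
        for x in range(0, width - patch + 1, patch)
    ]


# Hypothetical 2000 x 1600 pixel region: only complete tiles are kept.
corners = patch_grid(2000, 1600)
print(corners)

# Resizing a 750-pixel tile to 224 pixels changes the effective pixel size.
# Assumes the 0.25 um per pixel stated for the 40x scans.
effective_um_per_pixel = 0.25 * 750 / 224
print(round(effective_um_per_pixel, 3))
```

Using a stride equal to the patch size is what makes the tiles non-overlapping; a smaller stride would produce overlapping patches.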
Pathologists' annotations were used to delimit malignant areas within WSIs. Patches were only extracted from these areas, since they contained labeled regions of tissue corresponding to the three Gleason patterns considered in this work (GP 3-5). An 80% overlapping threshold with the annotations was set when extracting the patches from the WSIs, meaning that a patch had to overlap at least by that amount with an annotation in order to be considered for the dataset, discarding those with high background content and avoiding the addition of noisy information in the dataset. As a result, 17632 patches were obtained from the Clinic dataset, and 87824 from the PANDA dataset after applying the patch-sampling process.
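One possible way to apply the 80% threshold is to rasterize the annotation into a binary mask and measure the fraction of annotated pixels inside each candidate patch. The sketch below assumes such a mask is already available as a NumPy boolean array; the function names are hypothetical and a small patch size is used only to keep the example readable.

```python
import numpy as np


def annotated_fraction(mask, x, y, patch=750):
    """Fraction of the patch with top-left corner (x, y) covered by annotated pixels."""
    tile = mask[y:y + patch, x:x + patch]
    return float(tile.mean())


def keep_patch(mask, x, y, patch=750, threshold=0.8):
    """True if the patch overlaps the annotation by at least `threshold`."""
    return annotated_fraction(mask, x, y, patch) >= threshold


# Toy 8 x 8 mask with a 4 x 4 annotated square in the top-left corner.
mask = np.zeros((8, 8), dtype=bool)
mask[0:4, 0:4] = True

print(keep_patch(mask, 0, 0, patch=4))  # fully inside the annotation
print(keep_patch(mask, 2, 0, patch=4))  # only half of the patch is annotated
```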
The patches obtained were used to train and validate different CNN models, leaving part of the patches from Clinic to test them and compare the results with the inter-observer variability measured among the pathologists. Tables 2 and 3 show the training, validation and testing partitions used for the Clinic and the PANDA dataset, respectively, with their corresponding GP distribution. The partitions were carried out taking into account that all the patches obtained from the same patient were only involved in a single set.
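A patient-wise partition of this kind can be sketched by splitting the list of patients rather than the list of patches. The split fractions, the seed and the function name below are assumptions made for illustration; the text does not state how the patients were assigned to each subset.

```python
import random


def split_by_patient(patient_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign patch indices to train/val/test so that no patient spans two subsets."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    return {
        name: [i for i, p in enumerate(patient_ids) if p in members]
        for name, members in groups.items()
    }


# Hypothetical patient identifier for each of eight patches.
ids = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p5"]
splits = split_by_patient(ids)
print(splits)
```

Splitting at the patient level avoids leaking near-identical tissue from the same biopsy into both the training and the test subsets.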
The lack of standardization in the H&E staining process leads to color variations not only between images from different medical centers or digitized with different scanners, but also from the same source due to possible variations that may occur in the image preparation process [25,26]. Therefore, there is a tendency to alleviate this problem by means of stain normalization and color augmentation techniques. These techniques help deep learning algorithms focus on the relevant features of the images during the training step, while also homogenizing color variations that may be present among them. This is particularly important when working with images from different centers and scanners where a different H&E staining process was followed. Different techniques, such as Histogram Equalization (HE), Color Space Transformation (CST) and Color Deconvolution (CD) can be found in the literature.
Table 2
Patch distribution used in the training, validation and test subsets with their corresponding GP from the Clinic dataset.
              GP3    GP4    GP5   Total
Train        3794   5424   2047   11265
Validation   1309   1718    728    3755
Test          657    833   1122    2612
Total        5760   7975   3897   17632
Table 3
Patch distribution used in the training and validation subsets with their corresponding GP from the PANDA challenge dataset.
               GP3     GP4    GP5   Total
Train        19954   39895   6019   65868
Validation    6604   13357   1995   21956
Total        26558   53252   8014   87824
2.2.1. Histogram equalization
The basic idea behind HE is to transform the intensity values of the pixels in an image so that the resulting image has a uniform distribution of intensities. The transformation is achieved by computing the cumulative distribution function (CDF) of the pixel intensities in the image and using it to map the original intensity values to new ones. The new intensity values are chosen such that the CDF of the new intensities is a linear function [27].
The result of this transformation is an image where the intensity values are spread out over a wide range, increasing the contrast of the image. However, histogram equalization can also result in the over-amplification of noise in the image, so it is important to use the technique with caution. To solve this problem, there are several adaptations of the method as seen in [28], where the use of Adaptive Histogram Equalization (AHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) is proposed.
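The CDF-based mapping can be illustrated with a short NumPy sketch for 8-bit grayscale images. This is the plain global variant under simplifying assumptions (a non-empty `uint8` input and a lookup table rescaled to 0-255); it is not the adaptive AHE or CLAHE variants mentioned above, and the function name is hypothetical.

```python
import numpy as np


def equalize_histogram(gray):
    """Globally histogram-equalize a non-empty 8-bit grayscale image via its CDF."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest intensity present
    # Lookup table: each original level is mapped to its rescaled CDF value.
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


# Low-contrast toy image: intensities only span 50..150.
img = np.array([[50, 50], [100, 150]], dtype=np.uint8)
out = equalize_histogram(img)
print(out)
```

After the mapping, the darkest level present is sent to 0 and the brightest to 255, which is the contrast stretch described in the text.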
2.2.2. Color space transformation
CST is based on changing the color space of an image, such as RGB to grayscale or HSV. After that, filters are applied to these color spaces and transformed back to RGB [29]. In [30], the authors developed a method to transform a source image into a target image in the Lab color space. This was achieved by calculating the mean and standard deviation of each channel. After completing the transformation, the normalized image was then converted back to the original RGB color space.
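A per-channel mean and standard deviation transfer of the kind described above can be sketched as follows. The sketch assumes that both images have already been converted to a Lab-like floating-point representation (the color space conversion itself is omitted), and the function name and the random test data are hypothetical; it should be read as a generic statistics-matching example rather than the exact procedure of [30].

```python
import numpy as np


def match_channel_stats(source, target, eps=1e-8):
    """Shift and scale each channel of `source` to the mean and std of `target`.

    Both inputs are H x W x C float arrays, assumed to be in the same color space.
    """
    src_mean, src_std = source.mean(axis=(0, 1)), source.std(axis=(0, 1))
    tgt_mean, tgt_std = target.mean(axis=(0, 1)), target.std(axis=(0, 1))
    return (source - src_mean) / (src_std + eps) * tgt_std + tgt_mean


rng = np.random.default_rng(0)
source = rng.normal(60.0, 10.0, size=(32, 32, 3))  # hypothetical source image
target = rng.normal(40.0, 5.0, size=(32, 32, 3))   # hypothetical template image
normalized = match_channel_stats(source, target)
```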
2.2.3. Color deconvolution
CD is a technique that separates the contributions of different dyes or stains used in histological images. The goal of color deconvolution is to isolate the individual color channels in an image so that each component can be analyzed and processed independently.
Histological images are often stained with multiple dyes in order to highlight different structures within the tissue. For example, one dye may be used to stain the nuclei, while another is used to stain the cytoplasm. By separating the contributions of the different dyes, it is possible to better visualize and analyze the tissue.
Stain Color Adaptive Normalization Algorithm (SCAN) [31] is based on CD. It has been proposed as a solution to enhance the contrast between the histological tissue and the background while preserving the local structures in the image. This is achieved without altering the color of the lumen and the background.
In [32], an Adaptive Color Deconvolution algorithm is proposed for stain separation and color normalization of H&E-stained samples. The process of normalization is accomplished using a uniform color transformation that maps pixels from the source image to the template image. This approach does not require the classification of stains. Instead, the parameters of color normalization are determined through an integrated optimization process that takes into account the distribution of pixel values. This results in the preservation of the structural information present in histological images.
On the other hand, it is possible to find models that use more than one technique, as in the case of [33], where a retinex model is designed that applies first the color space transformation and then the color deconvolution.
2.3. Data augmentation
Increasing the number of images and the heterogeneity of the dataset is a very relevant aspect to consider when training a CNN, since it makes the system more robust, improves its generalization over unseen data and prevents overfitting [34]. The relevance of applying data augmentation in computational pathology has been studied and proved in the literature [35]. Thus, data augmentation techniques have been applied to increase the number of images and the heterogeneity of the dataset during the training process. Different transformations were performed on the original patches: for each training patch, horizontal and vertical flips were applied, along with 90-degree rotations.
In order to tackle the problem of image and stain variability, training patches were augmented in color in their HSV (hue, saturation, value) representation within a specific limited range ([−15, 8] for the hue, [−20, 10] for the saturation and [−8, 8] for the value). Soft color augmentation has proven to be one of the most optimal approaches for tackling the stain variability problem [22,36]. Rotations, flips and color augmentation were performed automatically at training time with 50% probability for each of the mentioned processes. To this end, the open-source Albumentations library [37] was used.
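The geometric part of this augmentation scheme can be sketched with plain NumPy, applying each transformation independently with 50% probability. This is not the Albumentations pipeline used in the study: the function name is hypothetical, the HSV color shifts are deliberately omitted, and choosing the rotation among 90, 180 and 270 degrees is an assumption of this example.

```python
import numpy as np


def augment_geometry(patch, rng, p=0.5):
    """Randomly flip and rotate an H x W x C patch; each step is applied with probability p."""
    if rng.random() < p:
        patch = np.fliplr(patch)  # horizontal flip
    if rng.random() < p:
        patch = np.flipud(patch)  # vertical flip
    if rng.random() < p:
        # Rotate by a random multiple of 90 degrees in the image plane.
        patch = np.rot90(patch, k=int(rng.integers(1, 4)))
    return patch


rng = np.random.default_rng(0)
patch = np.arange(4 * 4 * 3).reshape(4, 4, 3)  # toy square patch
augmented = augment_geometry(patch, rng)
```

Because flips and 90-degree rotations only permute pixels, the label of the patch and its pixel content are preserved.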
2.4. Convolutional neural network models
Among all the existing types of neural networks, CNNs have proven to be one of the most accurate and successful algorithms for image analysis [38]. By means of convolution layers, the network is able to extract the main features of the images, which are then fed to a set of fully-connected layers in order to perform the classification. In addition to convolutions, CNNs consist of other types of layers that improve and speed up the learning and inference steps by reducing the amount of processed information.
In this work, different CNN models were trained and evaluated in order to compare their results with the inter-observer variability between pathologists in the Clinic dataset. Among them, the widely-known VGG16 [39], DenseNet121 [40] and InceptionV3 [18] were used. All these models were trained based on pre-trained weights from the ImageNet dataset [19]. Along with these, a Grid Search algorithm [41] was performed, in which different custom models containing from one convolution stage (convolution + pooling + activation layers) and a fully-connected layer up to a total of 10 convolutional stages were explored and evaluated, varying the number and size of the convolution filters. Custom models have shown improved performance for some specific cases compared to pre-trained ones in the literature [42].
The Adam optimizer [43] was used when training all of the models, considering different learning rates ranging from 1 × 10^-3 to 1 × 10^-6, which varied depending on the model. This optimizer was selected based on the evolution of the training and validation losses.
In this work, TensorFlow¹ [44] version 2.2.0, which is a well-known Deep Learning Python library that allows designing, training and evaluating deep neural networks, was used for that purpose.
¹ https://www.tensorflow.org
As can be observed in Tables 2 and 3, the patch distribution between the three different GPs is not balanced. This could make the training process focus on classes with a larger number of images when updating the weights of the network. This potential problem was avoided by using the class_weight parameter in TensorFlow, which makes the backpropagation algorithm compensate for the imbalance during the training step by weighting each class according to its number of occurrences.
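To make the compensation concrete, the sketch below derives one weight per class from the training row of Table 2 using a common inverse-frequency heuristic. The exact weighting formula used in the study is not stated, so this formula and the function name are assumptions; the resulting dictionary has the form accepted by the `class_weight` argument of the Keras `Model.fit` method.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Training patches per class from Table 2 (0: GP3, 1: GP4, 2: GP5).
train_counts = {0: 3794, 1: 5424, 2: 2047}
weights = inverse_frequency_weights(train_counts)
for label, weight in weights.items():
    print(label, round(weight, 3))
```

With this heuristic, the least frequent class (GP5) receives the largest weight and the most frequent one (GP4) the smallest, while the weighted number of samples stays equal to the original total.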
2.5. Training strategy and experiments
Different experiments were carried out in order to, firstly, analyze the inter-observer variability on the Clinic dataset and, then, compare the results with the performance of different CNN models.
Regarding the inter-observer variability analysis, two different experiments were carried out. The first, called Overlapping Annotated Area Analysis (OAAA), consisted in measuring the overlapping area of annotations by different pathologists corresponding to the same region of the slide. To this end, only those annotations that overlap on the same WSI were analyzed. The second experiment, called Labeling Discrepancy Analysis (LDA), follows an approach that is similar to that performed in the previous one. However, instead of measuring the inter-observer variability in terms of the area size of the annotated malignant tissue regions, an analysis regarding the label that was set to each of the annotations was performed.
Regarding the CNN experiments that were proposed and performed, the following different training approaches were considered:
• Experiment 1: Training and validating the models using Clinic dataset only.
• Experiment 2: Training and validating the models using PANDA dataset only.
• Experiment 3: Training and validating the models with PANDA dataset, and then fine-tuning the models using Clinic dataset (after applying transfer learning).
• Experiment 4: Training and validating using both PANDA and Clinic datasets combined.
The test partition in each of the experiments consisted of patches extracted from the Clinic dataset. For each of the experiments, six different CNN architectures were considered: three of them were obtained using Grid Search (in order to achieve faster and less complex models [42]), while the other three correspond to widely-known pre-trained networks, including VGG16, DenseNet121 and InceptionV3. Fig. 1 shows a block diagram explaining the experiments performed.
Transfer learning [45] is a well-known Deep Learning technique in which the feature-extraction layers (convolutional layers) of a previously-trained model are frozen, and the weights of the last layers are updated by training them with a different dataset (fine-tuning), allowing the network to be adapted to a new dataset [46].
The partitions used in all the experiments are presented in Tables 2 and 3. As was previously mentioned, the test of the models was performed using the corresponding partition of Clinic dataset. Since PANDA was only considered to train and validate the models due to its size and heterogeneity, no testing partition can be seen in Table 3.
2.6. Evaluation metrics
Different evaluation metrics can be used to determine the effectiveness of neural networks. Among them, accuracy, specificity and sensitivity are some of the most used ones. The first gives a global idea of how the network performs, although it has a main drawback: it treats all misclassifications as equal. This means that, in terms of accuracy, there is no difference between classifying a cheetah as a cat or as a dolphin (both would be considered as a misclassification of the network). Sensitivity and specificity are widely used in medical applications, but they are mainly useful for binary classification problems.
Fig. 1. Block diagram of the experiments performed. The upper part (blue box) shows the analysis of the annotations provided by 5 pathologists from the Clinic dataset, distinguishing between the overlapped annotated area analysis and the labeling discrepancy analysis. On the bottom part (orange box), four different training strategies performed with deep learning algorithms are depicted, using patches extracted from the annotated regions of the WSIs, obtained both from PANDA and Clinic datasets.
Therefore, to evaluate the performance of the models trained in this work, Cohen's kappa coefficient (κ) [47] was used. This metric measures the agreement or disagreement between two raters (which, in our case, are the annotations from pathologists and the predicted class from the network, or even the annotations that two pathologists assigned to the same tissue region). A value of 1 means a complete agreement of both raters. On the other hand, a score of 0 represents a random agreement. The quadratic version of the score was used (see Eq. (1)), which penalizes disagreements between the two raters depending on the class difference in a quadratic manner (instead of linearly, as in the linearly weighted κ). This score has been extensively used in previous works in the field of computational pathology [16,17,22,48,49]. That way, a disagreement between GP3 and GP5 would result in a stronger penalization in κ than that of a disagreement between GP3 and GP4.
$$\kappa = 1 - \frac{\sum_{i,j}^{k} w_{i,j}\, O_{i,j}}{\sum_{i,j}^{k} w_{i,j}\, E_{i,j}}, \qquad w_{i,j} = \frac{(i-j)^2}{(N-1)^2} \tag{1}$$
In Eq. (1), $i$ and $j$ are the CNN output classes, ranging from 0 to 2 for GP classification (0: GP 3, 1: GP 4, 2: GP 5; $N = 3$). $O_{i,j}$ is the multiclass confusion matrix, which is an $N \times N$ histogram representing the number of images that were classified with a specific pattern $i$ by the first evaluator and $j$ by the second. $E_{i,j}$ is an $N \times N$ matrix of expected results, i.e., a histogram with the expected number of images classified as $i$ by the first evaluator and as $j$ by the second. The weighted matrix, $w_{i,j}$, is calculated as a function of the difference between the true and predicted class, and it is used to penalize predictions more strongly the more different they are from the true value. In this matrix, the main diagonal is always 0, while the outer values of the anti-diagonal are 1. More information regarding quadratic weighted kappa can be found in scikit-learn's cohen_kappa_score function² and in the Data Science Bowl 2019 Evaluation page³.
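Eq. (1) can be written directly in NumPy. The sketch below builds the observed matrix, the expected matrix from the marginal histograms of both raters, and the quadratic weights; the function name and the toy label lists are assumptions for illustration, and the degenerate case in which both raters use a single class (zero denominator) is not handled.

```python
import numpy as np


def quadratic_weighted_kappa(rater_a, rater_b, n_classes=3):
    """Quadratic weighted Cohen's kappa for integer labels in 0..n_classes-1."""
    observed = np.zeros((n_classes, n_classes))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1
    # Expected counts if both raters labeled independently (outer product of marginals).
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Toy labels (0: GP3, 1: GP4, 2: GP5).
same = [0, 1, 2, 0, 1, 2]
print(quadratic_weighted_kappa(same, same))                  # complete agreement
print(quadratic_weighted_kappa([0, 0, 2, 2], [2, 2, 0, 0]))  # systematic GP3/GP5 swap
```

For $N = 3$ the weight matrix has 0 on the main diagonal, 0.25 for adjacent patterns and 1 for a GP3 versus GP5 disagreement, which is why the second call is penalized as heavily as possible.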
3. Results
The results of the experiments performed are divided into two main subsections: firstly, the inter-observer variability among pathologists of the Clinic dataset is studied and evaluated in Section 3.1; then, different deep learning models were trained and evaluated on the same dataset used in Section 3.1, and the results obtained using the aforementioned metrics (see Section 2.6) are presented in Section 3.2.
3.1. Inter-observer variability analysis
The inter-observer variability of the Clinic dataset was analyzed at two different levels. Firstly, the overlapping area of the annotated regions from the different pathologists was measured. Then, a comparison of the label of intersecting annotations from different pathologists was performed.
² https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html
³ https://www.kaggle.com/c/data-science-bowl-2019/overview/evaluation
Fig. 2. Overlapping Annotated Area Analysis procedure. Annotations from two different pathologists (P1 and P2) that intersect at a minimum of 15% between them are used for the calculation (see Eq. (2)).
3.1.1. Overlapping annotated area analysis (OAAA)
As mentioned in Section 2.5, this experiment consisted in measuring the overlapping area of annotations by different pathologists corresponding to the same region of the slide. To this end, only those annotations that overlap on the same WSI were analyzed.
Some problematic cases can appear, such as those in which the same malignant region is annotated by two pathologists in completely different ways (e.g., one of them selects a large tissue region as GP3, while the other annotates smaller subregions inside the larger one with the same label). In order to prevent these cases from affecting the results, the areas of all the small subregions were summed up and then compared to the larger one, instead of performing a simple pairwise comparison. Moreover, a minimum overlap of 15% between annotation areas was set to avoid outliers.
The overlapping area ratio between two annotations was obtained by means of the intersection over union, taking into account the aforementioned (see Eq. (2)).
$$OAAA = \left(1 - \frac{A_{acum}}{A_{1} + A_{acum} - A_{intersec}}\right) \times 100 \tag{2}$$
where $A_{acum}$ is the sum of the areas annotated by pathologist P1 that intersect at a minimum of 15% with $A_{1}$ (a larger annotation performed by pathologist P2), and $A_{intersec}$ is the area of the intersection between the annotations. A total of 145 pairs of intersecting annotations were analyzed. Among these, only 3 pairs did not exceed the 15% overlapping threshold set. For each pair of annotations, expression (2) was applied. Fig. 2 shows the whole procedure followed for the OAAA calculation. As a result, a mean pairwise area discrepancy of 46% was obtained on overlapping annotations by different pathologists.
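Expression (2) can be evaluated with a few lines of Python once the three areas are known. The function and argument names are hypothetical, the areas are assumed to be given in the same unit (for instance pixels), and the 15% minimum-overlap filter is not included in this sketch.

```python
def oaaa(area_large, area_accum, area_intersection):
    """Area discrepancy (in %) between a large annotation and accumulated sub-annotations.

    Implements Eq. (2): (1 - A_acum / (A_1 + A_acum - A_intersec)) * 100.
    """
    union = area_large + area_accum - area_intersection
    return (1.0 - area_accum / union) * 100.0


# Toy case: sub-annotations covering half of the large one and lying fully inside it.
print(oaaa(area_large=100.0, area_accum=50.0, area_intersection=50.0))
# Toy case: both pathologists annotate exactly the same area.
print(oaaa(area_large=100.0, area_accum=100.0, area_intersection=100.0))
```

When the accumulated sub-annotations lie entirely inside the large one, the intersection equals $A_{acum}$ and the expression reduces to one minus the intersection over union, expressed as a percentage.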
3.1.2. Labeling discrepancy analysis (LDA)
As mentioned in Section 2.5, an analysis regarding the label that
was set for each of the annotations was performed. To this end, Cohen’s
kappa score was used (see Section 2.6). Two lists of GPs were defined
to obtain the result, where index i refers to a tissue area annotated as
GP1[i] by one pathologist and as GP2[i] by another pathologist.
The number of pairs of annotations from the same tissue region
that share the same label was calculated, which resulted in 74.34%
agreement (25.66% discrepancy). The ground-truth confusion matrix
obtained from the two aforementioned lists of annotations can be seen
in Fig. 3. A total of 2116 pairs of annotations were analyzed, of which
543 (25.66%) did not share the same pattern. Consequently, a quadratic
Cohen’s kappa of 0.6946 was obtained.
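As an illustration of the metric used here, the sketch below computes the raw agreement and a quadratic-weighted Cohen's kappa from a confusion matrix in plain Python, using the standard quadratic weights (i - j)^2 / (k - 1)^2. The 3x3 matrix is invented for the example and is not the matrix of Fig. 3; only the last two lines use figures quoted in the text.

```python
def quadratic_kappa(cm):
    """Quadratic-weighted Cohen's kappa for a square confusion matrix."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed += weight * cm[i][j]                       # weighted disagreement
            expected += weight * row_tot[i] * col_tot[j] / total  # chance disagreement
    return 1.0 - observed / expected


def agreement(cm):
    """Fraction of pairs on the diagonal (identical labels)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical counts, rows = pathologist A, columns = pathologist B.
toy = [[50, 10, 0],
       [12, 40, 8],
       [1, 9, 30]]
kappa = quadratic_kappa(toy)
raw = agreement(toy)

# Consistency check of the percentages quoted in the text.
discrepancy_pct = round(100 * 543 / 2116, 2)
agreement_pct = round(100 * (2116 - 543) / 2116, 2)
```

Because the weights grow with the squared distance between grades, a GP3/GP5 disagreement is penalized more heavily than a GP3/GP4 one.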
Fig. 3. Confusion matrix and kappa score of the ground-truth annotations provided by
the group of pathologists from Clinic Hospital that annotated the slides.
Table 4
Test accuracy (test_acc) and Cohen’s quadratic kappa score (κ) of the 10 models
designed by means of the Grid Search algorithm using the Clinic dataset for training,
validating and testing.
Iteration  test_acc  κ
IT 1       0.445     0.367
IT 2       0.501     0.429
IT 3       0.570     0.617
IT 4       0.487     0.609
IT 5       0.522     0.604
IT 6       0.636     0.727
IT 7       0.605     0.713
IT 8       0.658     0.739
IT 9       0.319     0
IT 10      0.319     0
3.2. CNN experiments results
As introduced in Section 2.5, different CNN models were trained
using both datasets described in Section 2.1 in order to compare the
results with the discrepancy of the group of pathologists that was
evaluated in the previous experiment.
In order to report robust results, each of the networks was trained
10 times. The results from each network architecture are reported
as the mean and the standard deviation of the Cohen’s quadratic kappa
statistic (see Section 2.6).
Fig. 4. Diagram of the architecture of the three custom models (Custom 1, Custom 2 and Custom 3) used. CS stands for Convolutional Stage, which consists of a convolution
layer, a max-pooling layer and an activation layer.
Table 5
Kappa score achieved on the Clinic test set for the 10 models trained for each CNN architecture with the Clinic dataset. The mean and standard
deviation are also reported for each architecture.
Model     VGG16          InceptionV3    DenseNet121    Custom 1       Custom 2       Custom 3
Model 1   0.680          0.779          0.845          0.727          0.669          0.735
Model 2   0.708          0.772          0.838          0.675          0.632          0.702
Model 3   0.692          0.808          0.833          0.779          0.733          0.749
Model 4   0.698          0.767          0.817          0.710          0.658          0.753
Model 5   0.653          0.813          0.811          0.698          0.751          0.714
Model 6   0.586          0.805          0.830          0.747          0.717          0.664
Model 7   0.795          0.818          0.796          0.644          0.626          0.741
Model 8   0.773          0.800          0.841          0.708          0.717          0.705
Model 9   0.680          0.794          0.832          0.746          0.694          0.736
Model 10  0.732          0.810          0.821          0.662          0.752          0.747
Mean      0.700 ± 0.056  0.797 ± 0.017  0.826 ± 0.014  0.709 ± 0.040  0.695 ± 0.044  0.725 ± 0.027
3.2.1. Clinic training
As was explained in Section 2.5, the first experiment regarding the
use of CNNs consisted in training different architectures with the Clinic
dataset and then evaluating them with an external partition of the same
dataset. Table 4 presents the results obtained with the Grid Search
algorithm. In each iteration, a convolution layer was added together
with a batch normalization layer and a 2D max pooling layer. The
best results were obtained in iterations 6 (0.727 κ), 7 (0.713 κ) and 8
(0.739 κ), which, from now on, will be referred to as Custom 1, Custom
2 and Custom 3 models, respectively. Fig. 4 shows a representation of
the architecture of these three custom models.
These three models, together with VGG16, InceptionV3 and
DenseNet121, were trained ten times each, and the results of each of
them can be seen in Table 5.
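The "mean ± standard deviation" summary can be reproduced with the Python standard library. The sketch below uses the ten VGG16 values from Table 5. For this column, the population form of the standard deviation matches the reported 0.056, while the sample form gives a slightly larger value. This is an observation from the numbers only, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev, stdev

# Kappa of the ten VGG16 runs, copied from Table 5.
vgg16_kappa = [0.680, 0.708, 0.692, 0.698, 0.653,
               0.586, 0.795, 0.773, 0.680, 0.732]

avg = mean(vgg16_kappa)
spread_population = pstdev(vgg16_kappa)  # divides by N
spread_sample = stdev(vgg16_kappa)       # divides by N - 1

summary = f"{avg:.3f} ± {spread_population:.3f}"
```

With only ten runs per architecture the two estimators differ in the third decimal, so it is worth stating which one is meant when comparing spreads across studies.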
Fig. 5 presents the confusion matrices of the best models of each
of the trained architectures. As can be observed, all the models achieve
very high accuracy on GP3 and GP5, which is not the case for GP4, as
it is very often confounded with GP3.
3.2.2. PANDA training
The second experiment consisted in training each of the architectures
proposed with the PANDA dataset, which were then tested on the
Clinic dataset. Table 6 presents the results of each of the ten models
trained for each architecture, along with the mean κ and standard
deviation. As expected, the models achieved a lower performance with
respect to the previous experiment, since the training and validation of
the networks was performed on a dataset (PANDA) different from the
one used for the final evaluation with which the metrics were obtained
(Clinic).
Fig. 6 presents the confusion matrices of the model that achieved
the best results for each of the architectures considered. As can be
seen, the models tend to classify GP3 as GP4 in most of the cases. It
should also be mentioned that these matrices are obtained from the best
models, which correspond to extreme positive outliers not representing
the average case, as can be seen in Table 6.
3.2.3. Transfer learning
Since the results obtained in the previous experiment were not as
good as expected, the same models were used in a transfer learning
experiment in order to improve the results. To this end, the weights of
the feature extraction layers of the trained models (trained only with
the PANDA dataset) were frozen, and the fully-connected layers were
fine-tuned with the train and validation partitions of the Clinic dataset.
The results obtained after testing the fine-tuned models with the test
partition of the Clinic dataset are presented in Table 7. The best overall
result was achieved by the VGG16 models, which obtained an average
of 0.746 ± 0.030 κ. Among them, the best model was able to achieve a κ
of 0.789. A clear improvement can be observed in the results obtained
for each of the architectures compared to the previous experiment.
Fig. 7 presents the confusion matrices obtained with the best model
of each of the evaluated architectures. A behavior similar to that of the
first experiment is observed, in which the models tend to classify GP4
as GP3, even for those cases with the highest κ.
3.2.4. Clinic and PANDA datasets combined
In this subsection, the models have been trained with both datasets
in order to reduce the marked confusion between GP3 and GP4
found in the predictions of the models. In this case, no layers were
frozen, and all of them were trained from scratch with both datasets
at the same time (see Fig. 8 and Table 8).
4. Discussion
The inter-pathologist variability is widely known in the computational
pathology field and, particularly, in prostate cancer classification.
The high heterogeneity of the digitized tissue samples, the lack of
very precise rules to follow when choosing the specific GP to assign to
a tissue region (the pattern is assigned based on the extent to which the
tissue resembles native tissue) and the subjectivity of the pathologists
that perform the annotations are some of the main factors that increase
Fig. 5. Confusion matrices of the best models trained with the Clinic dataset.
the aforementioned variability. This, together with the recent advances
in artificial intelligence, suggests that deep learning algorithms could
be used as a support system for pathologists in order to help them in
the analysis task and to reduce the inter-observer variability.
In this work, different experiments were performed in order to analyze
the inter-observer variability of a local dataset from Clinic Hospital
(Barcelona, Spain) that was annotated by a group of 5 pathologists,
which was then compared to four different deep learning-based training
approaches including 6 different CNN architectures.
In Section 3.1.1, the size of the tissue areas annotated by the
pathologists was analyzed. To this end, a pairwise comparison between
coincident annotated regions was performed, resulting in an average
discrepancy of 46% in size. Such a high discrepancy can be attributed
to the subjectivity of annotating WSIs by hand,
in which one pathologist may have been very strict in making the
annotations while others may have annotated in a more general way
without being too specific. This also includes cases where a pathologist
may have marked a whole tissue region as a region of interest, while
others may have annotated a set of smaller tissue areas within that
same region. These cases are considered in the OAAA calculation, since
smaller annotations from a pathologist that overlap with a larger one
from a different pathologist are summed up before calculating the
Table 6
Kappa score achieved on the Clinic test set for the 10 models trained for each CNN architecture with the PANDA dataset. The mean and
standard deviation are also reported for each architecture.
Model     VGG16          InceptionV3    DenseNet121    Custom 1       Custom 2       Custom 3
Model 1   0.454          0.526          0.389          0.499          0.427          0.082
Model 2   0.450          0.552          0.516          0.195          0.021          0.084
Model 3   0.501          0.483          0.483          0.162          0.101          0.186
Model 4   0.512          0.385          0.446          0.424          0.213          0.046
Model 5   0.539          0.432          0.563          0.142          0.236          0.409
Model 6   0.584          0.456          0.420          0.062          0.372          0.225
Model 7   0.484          0.204          0.469          0.020          0.158          0.018
Model 8   0.470          0.576          0.530          0.333          0.288          0.381
Model 9   0.529          0.579          0.420          0.389          0.358          0.336
Model 10  0.634          0.560          0.529          0.298          0.276          0.083
Mean      0.516 ± 0.055  0.475 ± 0.109  0.476 ± 0.055  0.252 ± 0.152  0.245 ± 0.120  0.185 ± 0.139
Table 7
Kappa score achieved on the Clinic test set for the 10 models trained for each CNN architecture with the PANDA dataset and then fine-tuned
with Clinic. The mean and standard deviation are also reported for each architecture.
Model     VGG16          InceptionV3    DenseNet121    Custom 1       Custom 2       Custom 3
Model 1   0.755          0.737          0.725          0.584          0.516          0.523
Model 2   0.763          0.739          0.729          0.539          0.632          0.586
Model 3   0.770          0.742          0.727          0.579          0.558          0.492
Model 4   0.789          0.706          0.734          0.547          0.603          0.626
Model 5   0.713          0.723          0.739          0.542          0.632          0.603
Model 6   0.727          0.712          0.735          0.580          0.641          0.558
Model 7   0.690          0.752          0.732          0.564          0.564          0.591
Model 8   0.747          0.691          0.730          0.583          0.532          0.373
Model 9   0.724          0.727          0.713          0.606          0.570          0.584
Model 10  0.783          0.703          0.740          0.640          0.560          0.354
Mean      0.746 ± 0.030  0.723 ± 0.019  0.731 ± 0.007  0.579 ± 0.028  0.581 ± 0.042  0.529 ± 0.091
Table 8
Kappa score achieved on the Clinic test set for the 10 models trained for each CNN architecture with the PANDA and Clinic datasets combined.
The mean and standard deviation are also reported for each architecture.
Model     VGG16          InceptionV3    DenseNet121    Custom 1       Custom 2       Custom 3
Model 1   0.685          0.666          0.739          0.783          0.763          0.761
Model 2   0.562          0.729          0.807          0.802          0.728          0.603
Model 3   0.837          0.767          0.722          0.601          0.794          0.694
Model 4   0.610          0.742          0.751          0.754          0.798          0.738
Model 5   0.775          0.708          0.751          0.760          0.668          0.607
Model 6   0.851          0.758          0.770          0.822          0.724          0.631
Model 7   0.814          0.719          0.743          0.762          0.756          0.627
Model 8   0.798          0.674          0.729          0.606          0.805          0.753
Model 9   0.810          0.754          0.686          0.780          0.736          0.724
Model 10  0.773          0.764          0.762          0.781          0.610          0.721
Mean      0.751 ± 0.094  0.728 ± 0.034  0.746 ± 0.030  0.745 ± 0.074  0.738 ± 0.058  0.686 ± 0.059
discrepancy between them, although all the stroma and benign cells
included in the larger annotation would still be counted, representing a
considerable increase in that discrepancy.
On the other hand, with respect to the inter-observer variability
analysis regarding the labels of the annotations performed in
Section 3.1.2, the result obtained (0.6946 κ between pathologists) is
consistent with that presented in [16], which reports a similar κ value
analyzed on TMAs instead of WSIs.
Regarding the CNN experiments presented in Section 3.2, four
different training approaches were considered in order to compare
the performance obtained with the inter-observer variability that was
previously analyzed. These experiments were aimed at comparing the
different training methods in terms of patch-level results on the test set.
Firstly, in Section 3.2.1, six different neural network models were
trained with strongly-annotated patches extracted from the Clinic
dataset. Among them, three widely-known CNN architectures (VGG16,
DenseNet121 and InceptionV3) were used, together with the three best
custom models obtained by means of a Grid Search algorithm, which
automatically explored different architectures from one convolution
stage up to ten with different hyperparameters and filter sizes. In
Table 4, it can be observed that the models after iteration 8 (iterations
9 and 10, which contain 9 and 10 convolution stages) are not functional,
as both report κ = 0. Each convolution stage reduces the size of the
feature maps, so that they become very small and lose relevant
information after 8 consecutive convolution stages, which explains why
the fully-connected layers are no longer
able to classify the extracted features correctly. On the other hand,
iterations 6, 7 and 8 report the best results. These three architectures
together with VGG16, DenseNet121 and InceptionV3 were trained ten
times each, reporting the mean and the standard deviation of the
results. This approach is commonly followed to reduce undesired effects
introduced by the stochastic gradient descent optimizer adopted during
the model optimization. From the results reported in Table 5, it can be
seen that the best overall results were obtained by the DenseNet121
models with κ = 0.826 ± 0.014, and the best single result was also obtained
by one of the DenseNet121 models, reporting κ = 0.845. Although
these results were already higher than the inter-observer variability
analyzed on the Clinic dataset (κ = 0.6946), other training methods
were explored in order to avoid biased results obtained from models
that were trained and tested on the same dataset. To this end, the
largest publicly-available prostate cancer dataset was used to train the
models in different ways, which allows for a higher generalization of
the model due to the heterogeneity of the dataset.
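The shrinking of the feature maps mentioned in this paragraph can be illustrated with a short calculation. The sketch assumes that every convolution stage preserves the spatial size and ends with a 2x2 max pooling of stride 2, and it uses a hypothetical input side of 1024 pixels. Neither the pooling configuration nor the input size is taken from this section, and the function name is illustrative.

```python
def feature_map_sides(input_side, stages, pool=2):
    """Side length of the feature map after each stage.

    Assumes 'same'-padded convolutions, so only the pooling step
    (integer division by `pool`) changes the spatial size.
    """
    sides = []
    side = input_side
    for _ in range(stages):
        side //= pool
        sides.append(side)
    return sides


# Hypothetical 1024-pixel input, ten stacked stages as in the grid search.
sides = feature_map_sides(1024, 10)
```

Under these assumptions the side halves at every stage, so only a handful of spatial positions remain once eight or more stages are stacked, which is in line with the loss of information described above.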
The results of the experiment in which the networks were trained
with PANDA and tested on Clinic (see Section 3.2.2) show a decrease