AUTOMATING LICENSE-AWARE FULL-TEXT RETRIEVAL
FOR SYSTEMATIC REVIEWS: AN END-TO-END SCALABLE SYSTEM TO
REDUCE REVIEWER WORKLOAD
D. Zhuk*,†,¹, E. Sandner†,²,⁴, I. Jakovljevic², A. Simniceanu³, L. Fontana³, A. Henriques², A. Wagner², C. Gütl⁴
¹University of Vienna, 1010 Vienna, Austria
²CERN, 1211 Geneva, Switzerland
³WHO, 1211 Geneva, Switzerland
⁴Graz University of Technology, 8010 Graz, Austria
Abstract
Systematic reviews are widely regarded as the most rigorous method for synthesizing scientific evidence, yet they remain highly labour-intensive. Full-text retrieval is a monotonous, repetitive, and time-consuming task that requires reviewers to locate and validate large numbers of articles. Existing tools only partially address this step, with limited support for automated, open-source, and legally compliant retrieval across heterogeneous repositories. To address this gap, a license-aware, open-source system was developed to automate full-text retrieval, extraction, and validation as part of the NeutrinoReview project. The system integrates open APIs (Unpaywall, PubMed, EuropePMC, Crossref) with a prioritized lookup strategy, browser-based PDF downloading, text extraction, and metadata-based validation. Performance was evaluated across 500 articles from five major scholarly repositories (PubMed, PMC, EuropePMC, IEEE Xplore, ACM Digital Library). Results show consistently high combined extraction rates (CER ≥ 0.800) and average processing times of 7–9 seconds per article. In realistic review scenarios, the system achieves a PDF retrieval rate of 82.68% and reduces manual retrieval workload by approximately 80%, corresponding to time savings of more than 3 hours in median-sized SRs. These findings demonstrate the feasibility of automating a critical step in SR workflows, improving reproducibility and scalability while freeing researchers to focus on evidence synthesis.
INTRODUCTION
To gain a comprehensive understanding of a subject area, fragmented knowledge must be organized into structured information. A systematic review (SR) is a synthesis of identified and critically assessed evidence for topic understanding. This process is considered more rigorous and robust than a literature review as it follows a strict methodological framework, typically accompanied by predefined inclusion criteria [1]. Generally, an SR consists of several phases that may vary depending on the methodology applied:
• Project Initiation and Data Retrieval – defining the research question, setting inclusion and exclusion criteria, and retrieving bibliographic metadata from relevant scholarly repositories.
• Screening – applying eligibility criteria to titles, abstracts, and subsequently full-text articles to ensure only relevant studies are included.
• Data Extraction – gathering relevant methodological details, outcomes, and contextual information from the included articles for further analysis and synthesis.
Full-text retrieval is the key part of the screening stage of an SR, where reviewers must examine the complete text of articles to determine their eligibility [2]. Without access to the full-text data, important methodological details, outcomes, or context may remain hidden, leading to biased evidence synthesis. In particular, this paper addresses the following research question: How can an open-source, legally compliant solution be developed to automate full-text retrieval, extraction, and validation across multiple scholarly repositories, reducing reviewer workload and improving reproducibility in SRs?
BACKGROUND AND RELATED WORK
Performing a high-quality SR requires a lot of manual work and remains time-consuming, especially when the review must follow specific reporting guidelines (e.g., Preferred Reporting Items for SRs and Meta-Analyses) [3]. Conducting an SR may take from 6 to 18 months [4]. In particular, full-text retrieval often represents a critical bottleneck – while bibliographic records are typically retrievable through application programming interfaces (APIs) or structured search interfaces, access to the corresponding full-text articles can be fragmented, license-restricted, or entirely unavailable without manual intervention. The most common problems include, but are not limited to:
• Publisher paywalls and subscription barriers, as many articles remain inaccessible without institutional access or individual payments.
• Licensing and copyright restrictions, as even when access is granted, text-mining or bulk retrieval may be legally constrained.
• Heterogeneous platforms and formats, as full texts are dispersed across multiple sources with inconsistent metadata, formats, and access protocols.
• Incomplete or unreliable linking, as bibliographic metadata often lacks stable IDs or direct links to full-text sources, requiring manual searches.
• Limited API support, as only some repositories (e.g., PubMed) provide open APIs, while others restrict programmatic access.
As a result, reviewers often spend substantial time locating, downloading, and verifying eligible resources and corresponding articles – an effort that detracts from the analytical phase of the review process. It is reported that a resource-intensive task such as full-text retrieval can consume a minimum of 8,000 minutes of researcher time, depending on the screening approach used [5]. Despite recognition of the challenges described, existing tools (e.g., ASReview, Cadmus, Covidence) provide only partial solutions, with limited support for automated, open-source, and legally compliant full-text retrieval [6]. The burden of effort in SRs is therefore still skewed toward labour-intensive steps, where retrieval, extraction, and validation of full texts are performed manually, even when upstream bibliographic searches are automated. Overall, this imbalance not only slows down review production and consumes valuable time, but also risks inconsistencies across research projects [7]. Consequently, these persistent limitations highlight the need for new approaches that integrate metadata search with transparent full-text access. Such approaches will not only reduce manual workload but also promote consistency and reproducibility across SR teams.
Figure 1: Conceptual architecture of the full-text retrieval module for systematic review automation tools; Blue represents the Open-Source API Querying Component, Green represents PDF Full-text Data Extraction, and Red represents PDF Full-text Validation.
___________________________________________
* a1244642[email protected]ie.ac.at
† Both authors contributed equally to this work
https://doi.org/10.5281/zenodo.17220241
CONCEPTUAL DESIGN
The NeutrinoReview prototype¹ already supports automated bibliographic metadata retrieval from major sources such as PubMed, MEDLINE, and EuropePMC, as well as from user-supplied datasets. The metadata is stored in a structured database, providing a foundation for scalable and reproducible review workflows. However, the current implementation stops short of delivering full-text retrieval capabilities, leaving reviewers to perform this step manually.
Figure 1 illustrates the proposed system for full-text retrieval, featuring an end-to-end, license-aware architecture designed for seamless integration into the NeutrinoReview project. It is organized into three key components, each addressing a critical step in the retrieval process, which are described below.
Open-Source API Querying
The first component takes bibliographic inputs (DOI, PMID/PMCID, title, authors) and attempts to discover legally retrievable full-text artefacts (i.e., PDF URLs, XML structure) and the accompanying license. It follows a prioritized lookup strategy designed to maximize accuracy and reproducibility:
• If a DOI is present, the system queries Unpaywall [8] to obtain candidate open-access PDF URLs and license information. Unpaywall is preferred because it aggregates open-access locations and returns explicit license metadata when available.
• If only a PMID/PMCID is supplied (or the DOI lookup fails), the system queries PubMed/PubMed Central (PMC)/BioC endpoints to recover structured XML and any license statements embedded in repository metadata. When the result contains a DOI, the DOI is rechecked against Unpaywall as a secondary source.
• As a final lookup, a metadata-to-DOI lookup against Crossref is attempted using title and author strings (also when the DOI/PMID/PMCID lookups fail); any discovered DOI is then checked with Unpaywall.
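The prioritized lookup order described above can be sketched as follows. This is a minimal illustration, not the NeutrinoReview implementation: the function name `resolve_full_text`, the record keys, and the three injected callables (`unpaywall`, `pmc`, `crossref`) are hypothetical stand-ins for real service clients, and the stub functions exist only so the example runs without network access.

```python
from typing import Callable, List, Optional, Tuple


def resolve_full_text(
    record: dict,
    unpaywall: Callable[[str], Optional[dict]],
    pmc: Callable[[str], Optional[dict]],
    crossref: Callable[[str, str], Optional[str]],
) -> Tuple[Optional[dict], List[str]]:
    """Try the lookup tiers in priority order; return (hit, tiers_tried)."""
    tried: List[str] = []

    # Tier 1: a DOI is sent straight to Unpaywall.
    if record.get("doi"):
        tried.append("unpaywall")
        hit = unpaywall(record["doi"])
        if hit:
            return hit, tried

    # Tier 2: PMID/PMCID (or a failed DOI lookup) falls through to PMC.
    pmc_id = record.get("pmcid") or record.get("pmid")
    if pmc_id:
        tried.append("pmc")
        hit = pmc(pmc_id)
        if hit:
            # A DOI found in repository metadata is re-checked with Unpaywall.
            if hit.get("doi"):
                tried.append("unpaywall")
                extra = unpaywall(hit["doi"]) or {}
                hit = {**hit, **extra}
            return hit, tried

    # Tier 3: title/author strings are resolved to a DOI via Crossref.
    if record.get("title"):
        tried.append("crossref")
        doi = crossref(record["title"], record.get("authors", ""))
        if doi:
            tried.append("unpaywall")
            hit = unpaywall(doi)
            if hit:
                return hit, tried

    return None, tried


# Stub services standing in for real HTTP clients (illustration only).
def fake_unpaywall(doi):
    if doi == "10.1000/known":
        return {"pdf_url": "https://example.org/a.pdf", "license": "cc-by"}
    return None


def fake_pmc(pmc_id):
    if pmc_id == "PMC1":
        return {"xml": "<article/>", "doi": "10.1000/known"}
    return None


def fake_crossref(title, authors):
    return None


hit, tried = resolve_full_text({"pmcid": "PMC1"}, fake_unpaywall, fake_pmc, fake_crossref)
print(tried, hit)
```

Passing the service clients in as arguments keeps the ordering logic testable on its own, and the `tried` list gives a simple trace of which tiers were consulted for each record.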
All license strings returned by external services are normalized into a compact decision set used by downstream logic: permissive for text mining and storing (open) or not permissive (unavailable). The component logs each service query, timestamps, raw service responses, and the normalization rationale to create an auditable tracking record.
___________________________________________
1) https://gitlab.cern.ch/caimira/caimira-wp4/neutrinoreview
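One way to picture the normalization of license strings into the two-value decision set (open / unavailable) is the sketch below. The allow-list `OPEN_LICENSES` and the function name `normalize_license` are assumptions made for illustration; the paper does not enumerate which license strings are treated as permissive.

```python
from typing import Optional, Tuple

# Hypothetical allow-list: the source does not state which licenses count as open.
OPEN_LICENSES = {"cc0", "cc-by", "cc-by-sa", "public-domain"}


def normalize_license(raw: Optional[str]) -> Tuple[str, str]:
    """Return (decision, rationale), where decision is 'open' or 'unavailable'."""
    if raw is None or not raw.strip():
        return "unavailable", "no license string returned"
    # Collapse case, whitespace and underscores so 'CC BY' and 'cc_by' compare equal.
    key = "-".join(raw.strip().lower().replace("_", " ").split())
    if key in OPEN_LICENSES:
        return "open", f"'{key}' is on the allow-list"
    return "unavailable", f"'{key}' is not on the allow-list"


for raw in ("CC BY", "cc-by-nc-nd", None):
    print(raw, "->", normalize_license(raw))
```

Returning the rationale alongside the decision mirrors the logging behaviour described in the text: every normalization can be stored with the reason it was made. Defaulting unknown strings to unavailable is the conservative choice for a license-aware pipeline.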
PDF Full-text Data Extraction
The second component is responsible for acquiring the canonical article file (i.e., PDF structure) from PDF URLs when available, converting that file into extractable text, and returning a normalized textual representation suitable for downstream parsing, validation, and screening. It is implemented by two cooperating routines: a primary PDF retrieval and text-extraction function, and a browser-based downloader used as a fallback. In operation, the extractor first prepares conservative browser-like HTTP headers and prefers structured or direct access. If that fails, it performs an HTTP GET and validates the response Content-Type before opening the bytes. When publishers serve PDF files dynamically or require JavaScript, the extractor falls back to a headless Chromium downloader that polls a temporary download directory for a completed .pdf file. Once a valid PDF stream is obtained, the extractor iterates over the pages to collect page-level text and returns a single concatenated text (with page breaks preserved). Failures such as non-PDF responses, network timeouts, corrupted files, or images result in a None return and are recorded with standardized diagnostics.
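The Content-Type check that precedes opening the downloaded bytes might look like the sketch below. The function name `check_pdf_response` and the diagnostic strings are hypothetical, and the additional check for the `%PDF-` file header is an assumption added here as a cheap guard against corrupted or mislabelled files; the paper only states that the Content-Type is validated.

```python
from typing import Optional, Tuple


def check_pdf_response(content_type: Optional[str], body: bytes) -> Tuple[bool, str]:
    """Decide whether an HTTP response is worth handing to a PDF parser."""
    # Drop parameters such as '; charset=binary' before comparing media types.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type != "application/pdf":
        return False, f"non-PDF response (Content-Type: {media_type or 'missing'})"
    # Assumed extra guard: PDF files begin with the '%PDF-' header.
    if not body.startswith(b"%PDF-"):
        return False, "corrupted file (missing %PDF- header)"
    return True, "ok"


def extract_or_none(content_type, body, extract_text):
    """Return extracted text, or None plus a standardized diagnostic on failure."""
    ok, diagnostic = check_pdf_response(content_type, body)
    if not ok:
        return None, diagnostic
    return extract_text(body), diagnostic


# 'extract_text' is a placeholder for whatever PDF-to-text routine is in use.
text, diag = extract_or_none("text/html", b"<html>", lambda b: "unused")
print(text, diag)
```

A failed check is the point at which a pipeline of this kind would hand over to the browser-based fallback rather than give up on the article.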
PDF Full-text Validation
The third component verifies that the full text extracted from a retrieved PDF corresponds to the expected bibliographic metadata and meets minimum quality criteria before the document is further processed. This validation is performed by two routines: a TF-IDF/cosine similarity scoring routine and a validator that applies heuristic thresholds.
The validator lowercases the extracted full-text data and uses the first 10,000 characters as the primary search window since titles, authors, and abstracts typically appear near the start. A conservative regex attempts to extract an “abstract” block from the text; pairwise similarities are then computed between the supplied title and the document start, the supplied abstract and the extracted abstract, and the supplied authors and the document start. The validator then returns a compact diagnostic object containing the three similarity scores and a boolean flag for validity; by default, a record is accepted if abstract similarity is greater than 0.20, or authors similarity is greater than 0.40, or title similarity is greater than 0.30. These thresholds were set empirically by testing different ranges for abstract, author, and title similarities, including 0.2 to 0.4, 0.3 to 0.5, 0.6 to 0.8, and 0.7 to 1.0. Given the low dimensionality of abstracts and article metadata, the selected thresholds yielded the best results and are sufficient for robust validation. If no text is available or the checks fail decisively, the function signals invalidity (i.e., False); all similarity scores are logged for tracking and threshold tuning.
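The validation rule above can be illustrated with a small, dependency-free sketch. It is not the authors' code: the TF-IDF weights are fitted on just the two texts being compared, using one common smoothed IDF variant, the abstract-matching regex is a hypothetical stand-in for the "conservative regex" mentioned in the text, and the names `tfidf_cosine` and `validate` are illustrative. Only the 10,000-character window and the three default thresholds are taken from the description.

```python
import math
import re
from collections import Counter

# Default acceptance thresholds quoted in the text.
THRESHOLDS = {"abstract": 0.20, "authors": 0.40, "title": 0.30}


def tfidf_cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts under TF-IDF weights fitted on the pair."""
    docs = [Counter(re.findall(r"\w+", text.lower())) for text in (a, b)]
    vocab = set(docs[0]) | set(docs[1])
    # Smoothed IDF over the two-document corpus: ln((1 + n) / (1 + df)) + 1, n = 2.
    idf = {t: math.log(3 / (1 + sum(t in d for d in docs))) + 1 for t in vocab}
    vecs = [{t: count * idf[t] for t, count in d.items()} for d in docs]
    dot = sum(w * vecs[1].get(t, 0.0) for t, w in vecs[0].items())
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vecs]
    return dot / (norms[0] * norms[1]) if norms[0] and norms[1] else 0.0


def validate(title: str, authors: str, abstract: str, full_text: str) -> dict:
    """Return the three similarity scores and a boolean validity flag."""
    if not full_text:
        return {"title": 0.0, "authors": 0.0, "abstract": 0.0, "valid": False}
    head = full_text.lower()[:10_000]  # primary search window
    # Hypothetical pattern: text after 'abstract' up to a blank line or next heading.
    match = re.search(
        r"abstract\s*[:.\-]?\s*(.+?)(?:\n\s*\n|introduction|keywords)", head, re.S
    )
    extracted_abstract = match.group(1) if match else ""
    scores = {
        "title": tfidf_cosine(title, head),
        "authors": tfidf_cosine(authors, head),
        "abstract": tfidf_cosine(abstract, extracted_abstract),
    }
    # A record is accepted if any one score exceeds its threshold.
    scores["valid"] = any(scores[k] > THRESHOLDS[k] for k in THRESHOLDS)
    return scores


paper = (
    "A Study of Aerosols\nJane Doe\n"
    "Abstract: Aerosols spread indoors quickly.\n\nIntroduction\nBody text."
)
report = validate("A Study of Aerosols", "Jane Doe", "Aerosols spread indoors quickly.", paper)
print(report)
```

Because the three conditions are combined with a logical OR, a strong abstract match is enough to accept a record even when author strings are formatted differently in the PDF and in the metadata.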
Solution Outcome
The outcomes of the system are machine-readable tables that encode for each article: its licensing status, the canonical PDF link, PDF structure, XML structure, and PDF validation outcomes (if any). These outputs can be directly consumed by SR pipelines for automated or manual full-text screening, data extraction, or critical appraisal. Moreover, the solution is extensible, ensuring that new retrieval methods, content sources, or document validation formats can be incorporated without substantial redesign. Crucially, the approach adheres to applicable legal restrictions and does not depend on paid content providers.
EVALUATION METHODOLOGY
For evaluation, the proposed system was applied to multiple scholarly sources – PubMed, PMC, EuropePMC, IEEE Xplore, and ACM Digital Library – with a focus on medical literature. These tests were conducted to demonstrate the feasibility of significantly reducing reviewer workload while maintaining reproducibility and scalability. Each source was queried with domain-specific search strings designed to capture representative subsets of research articles relevant to airborne transmission, respiratory particle dynamics, or open-access retrieval architectures. From each source, the first 100 articles returned by the selected queries were taken, resulting in 500 articles in total. The query configurations are as follows:
• PubMed – sorted by Best Match, bibliographic metadata is PMID, in Summary (text) format.
• PMC – sorted by Default order, bibliographic metadata is PMCID, in PMCID list format.
• EuropePMC – sorted by Relevance, bibliographic metadata is PMCID, in ID list format.
• IEEE Xplore – sorted by Relevance, bibliographic metadata is DOI, in Plain text format.
• ACM Digital Library – sorted by Recency, bibliographic metadata is DOI, in ACM Ref format.
Search strings for each of the sources can be found in Appendix 1.
The system performance is quantified using extraction and validation outcomes per 100-article sample from each source. Table 1 lists the considered metrics along with their definitions.
Table 1: Metrics Overview
Metric name and description                                                                      Abbreviation and equation
Open Articles – Count of articles determined to have an open license                             $OA = N_{OA}$
PDF Retrieval Rate – Fraction of open articles with a canonical PDF link successfully retrieved  $PRR = \frac{N_{PDF}}{N_{OA}}$
PDF Extraction Rate – Fraction of open articles from which PDF structure is extracted            $PER = \frac{N_{PER}}{N_{OA}}$
XML Extraction Rate – Fraction of open articles with XML structure available                     $XER = \frac{N_{XER}}{N_{OA}}$
Combined Extraction Rate – Fraction of open articles with either PDF or XML structure available  $CER = \frac{N_{PER} + N_{XER} - (N_{PER} \cap N_{XER})}{N_{OA}}$
Total Processing Time – Wall-clock time statistics, total runtime (in seconds)                   $TPT = T_{proc}$
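As an illustration of how the ratio metrics in Table 1 relate to per-article outcomes, the sketch below computes them from a list of records. The `ArticleOutcome` fields and the function `compute_metrics` are hypothetical names chosen for this example; the formulas follow the table, with the overlap term counting articles for which both PDF and XML structure are available.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ArticleOutcome:
    open_license: bool       # counted in N_OA
    pdf_link: bool = False   # canonical PDF link retrieved (N_PDF)
    pdf_text: bool = False   # PDF structure extracted (N_PER)
    xml: bool = False        # XML structure available (N_XER)


def compute_metrics(outcomes: List[ArticleOutcome]) -> Dict[str, float]:
    """Compute OA, PRR, PER, XER and CER over the open articles only."""
    open_articles = [o for o in outcomes if o.open_license]
    n_oa = len(open_articles)
    if n_oa == 0:
        return {"OA": 0, "PRR": 0.0, "PER": 0.0, "XER": 0.0, "CER": 0.0}
    n_pdf = sum(o.pdf_link for o in open_articles)
    n_per = sum(o.pdf_text for o in open_articles)
    n_xer = sum(o.xml for o in open_articles)
    n_both = sum(o.pdf_text and o.xml for o in open_articles)
    return {
        "OA": n_oa,
        "PRR": n_pdf / n_oa,
        "PER": n_per / n_oa,
        "XER": n_xer / n_oa,
        # Inclusion-exclusion: articles with PDF or XML, counted once.
        "CER": (n_per + n_xer - n_both) / n_oa,
    }


sample = [
    ArticleOutcome(True, pdf_link=True, pdf_text=True, xml=True),
    ArticleOutcome(True, pdf_link=True, pdf_text=False, xml=True),
    ArticleOutcome(True),
    ArticleOutcome(True, pdf_link=True, pdf_text=True, xml=False),
    ArticleOutcome(False),
]
metrics = compute_metrics(sample)
print(metrics)
```

Closed articles are excluded from every denominator, which is why a source with few open articles can still report a perfect retrieval rate.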
EVALUATION RESULTS
The system’s ability to retrieve and extract full text varied significantly across repositories, reflecting differences in openness, metadata availability, and source infrastructure, as detailed in Table 2.
Table 2: Benchmark Evaluation Results
Name     OA  PRR    PER    XER    CER    TPT (s)
PubMed   90  0.589  0.478  1.000  1.000  707.971
PMC     100  0.620  0.410  1.000  1.000  926.470
EPMC     93  0.925  0.871  1.000  1.000  912.000
IEEE     10  1.000  0.800  0.000  0.800  246.39
ACM      57  1.000  0.860  0.000  0.860  841.220
In terms of availability, coverage of open articles is the highest for PMC (100/100) and EuropePMC (93/100), reflecting their open mandates. The proposed solution also performs well on PubMed (90/100), while coverage on ACM Digital Library (57/100) and especially IEEE Xplore (10/100) is much more restricted.
In terms of retrieval, the system achieves its strongest performance on EuropePMC, with PRR (0.925) and PER (0.871), complemented by perfect XML coverage. On ACM and IEEE, a perfect PRR (1.000) is reached, but the absence of an XML fallback limits CER to 0.860 and 0.800, respectively. PubMed and PMC provide complete coverage through XML, though their respective PER scores (0.478 and 0.410) are considerably lower.
Processing times are generally consistent, averaging 7–9 seconds per article. IEEE Xplore shows the fastest total runtime due to its small OA sample, whereas PMC required slightly longer because of additional fallback operations.
Overall, the system demonstrates strong performance across all sources, with CER never falling below 0.800, ensuring that full-text data is consistently available in either PDF or XML format.
DISCUSSION
The impact of automated full-text retrieval in systematic reviews becomes evident when considering its potential to reduce reviewer workload in real-world scenarios.
An analysis of 195 systematic reviews showed that between 0 and 4,385 studies (mean = 63) were included at the title and abstract screening stage and therefore had to be retrieved in full text [9]. When automation is not available, reviewers must perform this step manually, and retrieving a single full text is estimated to take an average of 4 minutes [5]. Consequently, manually retrieving full texts for a systematic review with the mean number of included studies (63) requires about 4 h 12 min, whereas the most exhaustive case (4,385 studies) would demand approximately 292 h 20 min.
By contrast, the proposed solution achieves an average PRR of 82.68%, implying that only 17.32% of articles require manual retrieval. The average processing time for 100 studies is 726.81 seconds, corresponding to 7.3 seconds per article.
Applying this solution to a systematic review requiring 63 full texts, about 52 can be retrieved automatically, while 11 must be retrieved manually. The system’s processing time amounts to 6 min 20 s, with an additional 44 min of manual work, yielding a total retrieval time of 50 min 20 s. This corresponds to a workload reduction of 3 h 21 min 40 s for a mean-sized systematic review.
Based on the same assumptions, for a systematic review with 4,385 records, the system’s processing time would be 7 h 21 min 10 s without any parallelization of the retrieval mechanism, and the roughly 759 articles left for manual retrieval would require about 50 h 36 min. Against the 292 h 20 min needed for fully manual retrieval, the system reduces the workload by about 234 h 22 min 50 s in this extremely large case.
Consequently, the system can reduce the time required for full-text retrieval by approximately 80%.
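The arithmetic behind the mean-sized example (63 full texts) can be reproduced with a short calculation. The defaults below are the figures quoted in this section (PRR of 82.68%, 7.3 seconds per automatically retrieved article, 4 minutes per manual retrieval); the function names `workload` and `hms` are illustrative, and rounding the number of automatically retrieved articles to the nearest integer is an assumption that matches the 52/11 split given in the text.

```python
def workload(n_studies: int, prr: float = 0.8268,
             sec_per_auto: float = 7.3, min_per_manual: float = 4.0) -> dict:
    """Estimate retrieval effort with and without the automated system."""
    n_auto = round(n_studies * prr)             # retrieved by the system
    n_manual = n_studies - n_auto               # left for the reviewer
    system_s = round(n_auto * sec_per_auto)     # machine processing time
    manual_s = round(n_manual * min_per_manual * 60)
    baseline_s = round(n_studies * min_per_manual * 60)  # fully manual
    total_s = system_s + manual_s
    return {
        "auto": n_auto,
        "manual": n_manual,
        "system_s": system_s,
        "total_s": total_s,
        "saved_s": baseline_s - total_s,
        "saved_fraction": (baseline_s - total_s) / baseline_s,
    }


def hms(seconds: int) -> str:
    """Format a duration in seconds as 'H h M min S s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} h {minutes} min {secs} s"


r = workload(63)
print(r["auto"], r["manual"], hms(r["total_s"]), hms(r["saved_s"]))
```

For 63 studies this gives 52 automatic and 11 manual retrievals, a total of 50 min 20 s, and a saving of 3 h 21 min 40 s, which is about 80% of the fully manual baseline of 4 h 12 min.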
LIMITATIONS
Each component of the proposed solution has practical constraints that may influence performance. Firstly, coverage depends heavily on source policies: repositories with restrictive access models (e.g., IEEE Xplore, ACM Digital Library) yield fewer open articles, which reduces overall retrieval opportunities despite high PDF success rates when links are available. Secondly, PDF extraction remains fragile in cases of scanned documents, image-only pages, or publisher-specific encodings, where structured XML is not available as a fallback. Thirdly, metadata inconsistencies (e.g., variant author strings, missing abstracts) can lower validation scores and may exclude potentially usable texts. Moreover, processing speed, while generally acceptable, is influenced by network conditions and the need for browser-based fallback routines, which may not scale well at very large volumes. Finally, the system is designed to operate within the legal boundaries of open-access content – paywalled or license-restricted materials remain inaccessible by design, which can limit completeness for certain research domains.
FUTURE WORK
Future development of the system will focus on three main directions. The first is improving the robustness of PDF extraction by integrating optical character recognition (OCR) pipelines for scanned or image-only documents, and experimenting with hybrid approaches that combine parsing with machine learning-based text recovery. The second is expanding source coverage by incorporating additional APIs and institutional repositories, thereby improving completeness in restricted domains. The third is refining validation by training domain-adaptive similarity models that go beyond heuristics, enabling more accurate alignment of metadata and full-text data. Additionally, efforts will be made to optimize processing speed and resource efficiency, ensuring the system remains scalable for large SR projects. Continuous feedback and more real-world testing will guide these iterative improvements.
CONCLUSIONS
In this paper, a license-aware, open-source solution for automated full-text retrieval, extraction, and validation across multiple major scholarly repositories is presented. Benchmarking against PubMed, PMC, EuropePMC, IEEE Xplore, and ACM Digital Library demonstrates that the system consistently achieves high combined extraction rates (CER of at least 0.800), ensuring reliable availability of either PDF or XML structures. The results confirm both the feasibility and scalability of automating a critical bottleneck in systematic reviews, reducing manual reviewer workload while maintaining reproducibility. At the same time, differences across repositories highlight the continued challenges of restricted access and heterogeneous infrastructures. By providing extensible components and transparent diagnostics, the system lays a foundation for future improvements, including expanded coverage, more robust extraction methods, and tighter integration with SR pipelines.
ACKNOWLEDGEMENTS
This research was conducted as part of the joint CERN–WHO ARIA² project, which funds Elias Sandner’s PhD studies and within which this paper was prepared. We also gratefully acknowledge the OpenWebSearch.EU³ project and its members for their valuable support with this publication.
REFERENCES
[1] R. Randles and A. Finnegan, “Guidelines for writing a systematic review”, Nurse Education Today, vol. 125, p. 105803, June 2023. doi:10.1016/j.nedt.2023.105803
[2] L. Schmidt et al., “Data extraction methods for systematic review (semi)automation: Update of a living systematic review”, F1000Research, vol. 10, article 401 (version 3), Apr. 2025. doi:10.12688/f1000research.51117.3
[3] F.M. Delgado-Chaves et al., “Transforming literature screening: The emerging role of large language models in systematic reviews”, Proc. Natl. Acad. Sci. U.S.A., vol. 122, e2411962122, Jan. 2025. doi:10.1073/pnas.2411962122
[4] V. Phillips and E. Barker, “Systematic reviews: Structure, form and content”, Journal of Perioperative Practice, vol. 31, p. 349-353, Jan. 2025. doi:10.1177/1750458921994693
[5] I. Shemilt et al., “Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews”, Syst. Rev., vol. 5, article 140, Aug. 2016. doi:10.1186/s13643-016-0315-4
[6] L. Affengruber et al., “An exploration of available methods and tools to improve the efficiency of systematic review production: a scoping review”, BMC Med Res Methodol, vol. 24, article 210, Sept. 2024. doi:10.1186/s12874-024-02320-4
[7] K.E.K. Chai et al., “Research Screener: a machine learning tool to semi-automate abstract screening for systematic reviews”, Syst. Rev., vol. 10, article 93, Apr. 2021. doi:10.1186/s13643-021-01635-3
[8] Unpaywall API, https://unpaywall.org/products/api
[9] R. Borah et al., “Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry”, BMJ Open 2017, vol. 7, article e012545, Oct. 2016. doi:10.1136/bmjopen-2016-012545
___________________________________________
2) https://partnersplatform.who.int/tools/aria
3) https://openwebsearch.eu
APPENDIX
Source Name: Search String
PubMed: (airborne[tiab] OR aerosol*[tiab] OR "airborne transmission"[tiab] OR "air transmission"[tiab] OR inhalation[tiab]) AND (risk[tiab] OR "risk assessment"[tiab] OR exposure[tiab] OR hazard*[tiab]) AND (model[tiab] OR models[tiab] OR modelling[tiab] OR modeling[tiab] OR "mathematical model"[tiab] OR "computational model"[tiab] OR simulation[tiab] OR simulations[tiab])
PMC: (droplet* OR particle* OR aerosol*) AND (size OR diameter OR "particle size" OR "droplet size" OR volume* OR cm OR centimeter OR centimetre OR µm OR micron OR "micrometer" OR "micro-meter") AND ("expiratory activity" OR "expiratory activities" OR "respiratory activity" OR "respiratory activities" OR breath* OR speak* OR talk* OR shout* OR sing* OR cough* OR sneez*)
EuropePMC: ((droplet* OR particle* OR aerosol*) AND (size OR diameter OR "particle size" OR "droplet size" OR volume* OR cm OR centimeter OR centimetre OR µm OR micron OR micrometer) AND ("expiratory activity" OR "expiratory activities" OR "respiratory activity" OR "respiratory activities" OR breath* OR speak* OR talk* OR shout* OR sing* OR cough* OR sneez*))
IEEE Xplore: (open OR "open-source" OR "open access") AND (search* OR retrieval OR discovery OR "information retrieval") AND (architecture* OR framework* OR system* OR platform* OR infrastructure* OR toolkit*)
ACM Digital Library: (open OR "open-source" OR "open access") AND (search* OR retrieval OR discovery OR "information retrieval") AND (architecture* OR framework* OR system* OR platform* OR infrastructure* OR toolkit*)
Appendix 1: Search strings for benchmark evaluation