Restoring Reliability in Vaccine Safety Surveillance:
Correcting Structural Bias in Medical Cohort
Construction
Marco Roccetti1*&
1 Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
*Corresponding author
E-mail: ma co. occe [email protected] (MR)
&These authors also contributed equally to this work.
Abstract
The modern era of medical research, characterized by the availability of big data and sophisticated statistical methodologies, is paradoxically vulnerable to fundamental structural flaws in experimental cohort construction, despite adherence to rigorous reporting guidelines. Errors often arise from a failure to apply simple base-rate checks, violating core statistical principles derived from the work of the pioneers of the field and leading to contradictory results. This methodological challenge was recently amplified in large-scale COVID-19 vaccine safety surveillance, where a failure to ensure a balanced (non-asymmetric) distribution of high-risk elderly or vulnerable individuals across the compared subgroups (Vaccinated vs. Non-Vaccinated) leads to a systemic and reproducible failure in cohort construction. This generates predictable and spurious Hazard Ratios (HRs). The scenario is powerfully anticipated by historical failures that remind us of the importance of primary checks. For example, the Beta-Agonist Paradox showed the world that structural bias (Confounding by Indication) could mistake a marker of severity for a causal risk, teaching us once again that external validity is lost when the core cohort structure is compromised. We address this specific issue within two exemplar studies that have alarmed the international scientific community by associating COVID-19 vaccination with increased risks of diseases, including cancer and autoimmune disorders. These studies share a single data source and, critically, this fundamentally flawed methodology. Our analysis aims to reconstruct the cohort data and quantify the exact effect of a demographic asymmetry on the reportedly increased risks of cancer and of a common autoimmune disease (vitiligo). The applied methodology involves the deconstruction and re-analysis of reported HRs through weighted incidence analysis to produce corrected and plausible risk estimates, thus re-establishing methodological integrity and providing clinicians and the public with reliable information free from unwarranted alarm.
Keywords: Vaccine Safety Surveillance; Data Representativeness; Structural Flaw; Flawed Methodology; Cohort Bias; Hazard Ratio Decomposition
Introduction
The modern era of medical research is defined by the unprecedented availability of large-scale datasets, encompassing vast quantities of critical patient information. This wealth of big data allows researchers to employ sophisticated analytical techniques, ranging from fundamental descriptive statistics to complex inferential methods and cutting-edge artificial intelligence (AI). Furthermore, the epidemiological community adheres to robust reporting guidelines, such as STROBE [1] and PRISMA [2], specifically designed to ensure the transparency, reproducibility, and ultimate reliability of experimental results.
Moreover, the foundation of modern biostatistics rests on the pivotal contributions of figures such as Ronald Fisher (developer of ANOVA, Maximum Likelihood, and randomized design principles), Karl Pearson (pioneer of correlation and the chi-squared test), Jerzy Neyman (formulator of confidence intervals and hypothesis testing), and David Roxbee Cox (the inventor of the proportional hazards model), just to cite a few [3-6]. The methodologies built on their work, including Regression Analysis, Survival Analysis (e.g., Kaplan-Meier), and Multivariable Modeling (e.g., Logistic and Poisson Regression) among others, have enabled breakthroughs ranging from establishing the link between smoking and lung cancer to calculating vaccine effectiveness and identifying complex genetic risk factors.
Despite this arsenal of advanced tools, statistical principles, and established protocols, incongruencies still frequently plague the analysis of large, retrospective medical datasets, the most abundant form of evidence in the scientific world. The reliance on these powerful inferential techniques, which are often poorly understood, can sometimes lead to results that violate the very statistical and epidemiological fundamentals they were designed to uphold. Errors originating from fundamental flaws in the organization or structure of this data and the resulting experimental cohorts, particularly when prepared for inferential analysis, can violate core statistical principles and render us unable to extract truthful signals from the data, often leading to contradictory or even self-contradictory findings.
At this juncture, the solution is not merely the utilization of more data or superior analytical techniques, like machine/deep learning algorithms [7]; it demands instead a return to the basic principles of statistics, including simple descriptive analysis, and the foundational teachings of the pioneers cited before. This requires recovering the ability, diligence, and most of all objectivity to execute fundamental base-rate checks that ensure a non-contradictory and undistorted use of the data. Such an approach prevents the results from being unduly influenced by the predisposition to always reject the null hypothesis, driven by the conviction that vast data quantities must always conceal a sensational new finding, a notion that is by no means guaranteed.
This persistent methodological vulnerability is extremely critical when evaluating research concerning public health outcomes. In particular, the necessity of rigorous methodological review of medical data and related outcomes has been recently amplified by the sheer scale of the COVID-19 vaccination campaigns. Globally, more than 13 billion vaccine doses have been administered since late 2020, involving billions of people across nearly every demographic and healthcare system. This unprecedented scale means that even rare events, when extrapolated across the entire vaccinated population, necessitate meticulous monitoring and analysis [8].
The resulting observational research, utilizing vast administrative and electronic health record databases, has yielded a highly complex and often contradictory picture regarding potential Serious Adverse Events (SAE) [9]. Many large-scale studies have convincingly found no association between COVID-19 vaccination and numerous serious outcomes, for instance, dispelling early concerns regarding generalized increased risks of stroke, pulmonary embolism, or specific neurological disorders across wide cohorts [10-12]. Conversely, there is a minimal but growing body of recent literature asserting the discovery of associations between vaccination and specific SAEs, such as myocarditis, pericarditis, and also increased risks of certain cancers and autoimmune conditions [13-17].
Importantly, many of these affirmative findings, particularly those published in the last two years, remain at the center of intense and unresolved scientific and methodological debate. Often, the reported signals are inconsistent across different geographic regions or cohort definitions, or they fail to replicate robustly. This ambiguity underscores the current challenge: distinguishing a true, consistent medical signal from a statistical artifact or methodological error inherent in the retrospective analysis of massive, often unbalanced, datasets. The prevalence of such unconvincing or conflicting results mandates that researchers scrutinize the very foundation of the data, that is, the construction and representativeness of the resulting experimental cohort, before accepting any purported clinical association.
Background and Historical Precedents
To fully contextualize the gravity of these methodological lapses and demonstrate that the issue of flawed cohort construction is not unique to modern big data medical analysis, it is essential to return to medical history. Before proceeding further, we deem it necessary to provide a historical background illustrating how structural data errors can fundamentally mislead clinical science.
Over the modern era of medicine, several landmark observational studies have shaken public confidence with contradictory or ultimately debunked results. In nearly every instance, the underlying cause of the erroneous conclusions was the incorrect or improper use of supporting data, typically through severe selection or confounding bias. Therefore, we cite three prominent cases where methodological failures, rather than true biological effects, produced misleading signals of risk or benefit. Specifically, the challenge of ensuring external validity (as recognized by the STROBE protocol, principle 21 [1]), that is the degree to which a study cohort accurately represents the target population and its baseline risk, is a recurrent and critical issue in observational epidemiology. When a cohort's baseline risk is systematically skewed due to unmeasured or structural selection differences between exposure groups, it leads to a biased estimate of the effect, often creating a spurious association. This phenomenon, where cohort construction errors lead a study to derive misleading clinical conclusions, is a long-standing concern in medicine, as demonstrated by the following precedents.
First, we cite the case of Hormonal Replacement Therapy (HRT) and Cardiovascular Risk (The WHI vs. Observational Studies). For over two decades, highly influential observational studies (such as the Nurses' Health Study) concluded that Hormone Replacement Therapy (HRT) drastically reduced the risk of myocardial infarction (by up to 40 to 50%) in menopausal women [18]. These seemingly robust findings pushed HRT to become a global standard treatment for millions. The error seems to lie in a severe healthy user bias: women who chose to take HRT were inherently healthier, wealthier, better educated, maintained healthier lifestyles, and were more likely to seek proactive medical care. This systemic non-random selection artificially lowered their cardiovascular risk before starting the therapy, making the HRT group look protected. The definitive randomized controlled trial (RCT), the Women's Health Initiative (WHI, 2002), yielded the opposite results: HRT actually increased cardiovascular and thrombotic risk [19]. This case remains the most famous example in medicine of how a structural cohort bias created a protective effect that was entirely non-existent, leading to years of suboptimal clinical practice. We presented this case first because it is one of those that has afflicted evidence-based, data-driven medicine for several decades, with the arguments in favor of hormonal therapy recently reasserting themselves, albeit in different forms and aimed at specific patient sectors, in a cycle that seems never to end [20].
The second and perhaps best-known case is the Wakefield Case of 1998 (also known as MMR Vaccine and Autism). In 1998, Andrew Wakefield published a study, later retracted and proven fraudulent, that suggested an association between the MMR (Measles, Mumps, and Rubella) vaccine and the onset of autism. While the main issues were data manipulation and conflicts of interest, the central methodological failure was the absence of a valid control group and the use of a tiny, selectively chosen sample (n=12), constituting extreme selection bias. This design flaw ensured that any correlation found was highly vulnerable to confounding. Subsequent large-scale epidemiological studies, involving millions of children across multiple countries, confirmed the total lack of association, reporting relative risks statistically indistinguishable from 1. The MMR-autism case has been decisive in clinical medicine for understanding how incorrect sample construction and selection can produce devastating public harm by eroding trust in established preventive measures [21].
The third and final case (Asthma Mortality - Beta-agonist Paradox) is an equally famous scenario in which fraud plays no part; here the issue emerges almost inadvertently, through insufficient medico-procedural knowledge about using data according to the correct protocols and in the due order of precedence. In the 1980s and 1990s, observational studies had suggested that frequent use of inhaled beta-agonists was associated with an increase in asthma mortality. Those findings raised significant alarms regarding the safety of a first-line treatment. In reality, this was a classic example of confounding by indication: patients who were prescribed, and thus received, the most intensive and frequent doses of the inhaler were simply those with the most severe, life-threatening asthma. The drug therefore appeared to cause the severity, whereas it was only a marker of pre-existing disease severity and high risk. This structural bias in the cohort produced a spurious Hazard Ratio (HR) that falsely suggested the life-saving medication was harmful, leading to widespread misinterpretation of clinical risk [22, 23].
This case serves as a powerful reminder that procedural errors in establishing the priority of causes and effects (indication vs. outcome) can generate highly misleading signals, even without any deliberate manipulation of data. More importantly, this scenario is profoundly instructive as it directly anticipates the central thesis of our study: a problem of generalizability (external validity) is completely obscured by the intense focus on controlling internal validity. In the frantic effort to balance lesser confounding elements, researchers lose sight of the broader meaning of the data, failing to recognize that the core cohort structure itself is fundamentally compromised.
Our Contribution
These precedents illustrate that when a study's reference group is systematically unrepresentative of the baseline risk, the resulting HRs or risk associations become fundamentally unreliable. In this context, our aim is, as already anticipated, to address problems stemming from improperly handled medical data in the detection of Serious Adverse Events (SAE) within large cohorts of vaccinated (V) and non-vaccinated (NV) individuals, noting a recent tendency to amplify such signals to the detriment of the Vaccinated (V) group. Specifically, the interpretation of safety/alarm signals derived from observational cohorts and related data relies fundamentally on the common support assumption that the reference (NV) and exposed (V) groups are comparable in underlying health and risk behavior. Unfortunately, what we have frequently observed, and what forms the central focus of the present analysis, is the failure to apply a rigorous quantitative pre-analysis to the data to ensure a balanced (non-asymmetric) presence of high-risk elderly (or otherwise vulnerable) individuals across the two subgroups (V and NV). This omission directly leads to a systemic and reproducible failure in cohort construction that generates predictable, spurious HRs.
In this context, a series of studies have recently emerged that have alarmed the medical world and the international community by associating COVID-19 vaccination with an increased risk of a range of diseases, spanning from psychiatric conditions to autoimmune disorders and cardiovascular-circulatory issues, but most notably numerous types of cancers [14-17]. What these studies, which have found acceptance in many respected journals from different editorial groups, have in common is not only their origin from a specific nation or their development by an almost singular research group (factors that hold no relevance to our analysis). Rather, the critical commonality is that all these studies share the same initial data source: a single, specific national database (i.e., the South Korean National Health Insurance Service, NHIS, database). More importantly, all these studies share a fundamentally flawed construction of the V and NV groups. As we will demonstrate, these groups are systematically unbalanced, favoring the NV group by including a significantly lower number of elderly individuals (who have a naturally higher incidence of the reported diseases), thereby leading to a suppressed baseline risk for the NV group, independent of COVID-19 vaccination status. This structural imbalance ensures that any subsequent analytical finding is highly prone to producing spurious HRs.
Our present study thus aims to reconstruct the cohort data, quantify the effect of demographic asymmetry on reported HRs, and produce corrected estimates that account for these biases for two of the several cases raised by these studies [14, 15]: specifically, vitiligo and cancer. We focus on vitiligo due to the peculiarity of the finding, as this common autoimmune disorder is not readily associated with COVID-19 vaccination in the public or common medical imagination, making it a compelling test case for spurious association. Conversely, we focus on cancer for the opposite reason: it is one of the leading causes of death worldwide, and thus, even conceptually, is frequently and widely suspected of being associated with major global events and epidemiological factors, making the reported association particularly resonant and potentially alarming to public health. By doing so, we aim to clarify whether the observed associations reflect true biological risk or are in the end simply statistical mirages.
To this aim, we have applied a rigorous quantitative analysis to two cohorts of those high-impact studies, employing a methodology that consists in the deconstruction and re-analysis of reported HRs through weighted incidence analysis, focused on correcting the severe demographic bias (age and beyond) in the non-vaccinated subgroup, to reconstruct more plausible and non-distorted risk estimates. This methodology, building on standard survival analysis, develops a specific Poisson-like regression decomposition technique for hazard rate models, allowing for a flexible representation of the baseline hazard [24-26]. In essence, we use it to dissect the originally observed HRs into components driven by structural factors and uncorrected methodological bias, showing that the latter is often the dominant force. Our analysis of this data has demonstrated, in the end, that both the observed signals of increased vitiligo and cancer risk are highly likely to be a manifestation of a similar structural selection bias, involving the asymmetric underrepresentation of high-risk elderly individuals in the non-vaccinated group, teaching us once again that such critical flaws in cohort construction must be addressed before any conclusion regarding biological causality can be entertained.
In closing this Section, it must be underscored that our primary focus is not on the veracity of these findings per se, which remain subject to ongoing demonstration despite the significant attention they have garnered. Rather, our critique centers on the fact that both reported associations, potentially originating from the same research groups, exhibit similar and concerning common patterns in the handling of fundamental medical data. Specifically, they demonstrate a pervasive failure to construct balanced patient cohorts capable of adequately controlling the signals that the data appear to emit, even in retrospective analysis. This methodological weakness persists regardless of the sophistication of the adjustment algorithms employed, such as Propensity Score Matching (PSM) used for cancer, or in cases where such complex balancing techniques were not utilized at all (as observed with the vitiligo endpoint) [27]. This structural inadequacy strongly suggests that the reported signals are not robust. A close biostatistical inspection of the underlying data has revealed pronounced deviations from national demographic norms, particularly an underrepresentation of high-risk elderly individuals in the non-vaccinated comparator group. This structural imbalance has mathematically generated spurious HRs without implying any causal or even associative relationship.
Materials and Methods
In this Section, we first provide the reference data for vitiligo and cancer, drawn from the international literature, as the starting point of our analysis. These data are based, respectively, on the following two sets of South Korean studies, [28-30] and [31-35], which include the essential reference points for the present analysis. We will cross-reference this data, which serves as the national benchmark for those two diseases in terms of disease incidence and age distribution, with the data coming from the scrutinized studies [14, 15], presenting it in clear tabular formats. After this, we inform the reader about the main metrics used to calculate and interpret the meaning of risk (i.e., HRs) in the context of retrospective analysis of medical data, also illustrating in mathematical detail the methodology used, starting from the aforementioned reference data, to correct the risk values and obtain those consistent with the starting data.
Input Data Summary and Sources
Our quantitative comparative analysis starts from the key epidemiological parameters extracted from the two exemplar cohorts under investigation, utilizing the South Korean National Health Insurance Service (NHIS) database [14, 15]. The core parameters extracted are the observed Hazard Ratio, HR(Observed), the observed cumulative Incidence Rate in the Vaccinated group, IR(V, Observed), and the corresponding rate in the Non-Vaccinated group, IR(NV, Observed).
Importantly, these observed rates are contrasted with independently calculated expected National Incidence rates, IR(Baseline), which serve as the gold standard baseline for validating a demographically representative South Korean cohort [29, 33]. We already anticipate here that, in misconstructed cohorts, IR(NV, Observed) often plays the role of the failing denominator due to the structural and selection biases inherent in the cohort preparation. We begin by providing the input data summary (HR and IR) for both vitiligo and cancer cases studied in [14, 15] in Table 1.
national baseline (a -45% deficit in the high-risk >= 65 age group), indicating a massive selection bias. The V group, although also depressed, was less severely biased (-26.9%). This asymmetry in the bias magnitude is the direct source of the spurious HR(Residual): the cohorts are already clearly non-comparable on underlying health status, necessitating a statistical correction of the baseline risk, which we will calculate later in the form of an HR(Corrected). This critical adjustment will neutralize, as shown in the Results Section, the denominator failure caused by the asymmetric selection bias suffered by the whole cohort [15], and will provide a non-distorted estimate of the association, directly testing the null hypothesis against the corrected baseline.
Results
We now provide our final results in terms of the real risk that the data from the database employed in [14, 15] should have revealed if properly managed. The first two Subsections show, respectively for vitiligo and cancer, how the correctly calculated risk values can be obtained and discuss them; the third Subsection provides a concise summary of the HR results.
Vitiligo Results: Failure and Correction
This Section presents the results documenting how we arrive at the conclusion that either the increased vitiligo risk disappears for the V group, adopting the study's logical framework discussed earlier, or the reasoning developed in [14] can be falsified.
We start by making the critical assumptions that the observed cumulative incidence rates of 2.22 (V) and 0.67 (NV) per 10,000 were implicitly annualized and harmonized and are therefore directly comparable to the annual expected incidence rate of [29], which can be calculated as equal to 2.2232 based on the data of Table 2 and Equation 2. We also assume that the impact of excluding the under-20 age group is negligible for this primary finding, while noting that excluding from a vitiligo study the very group (< 20 years) that experiences the first peak of the disease is in itself quite unusual.
As already anticipated (see Table 2 and Equation 2), the expected incidence rates per 10,000 for both groups (V, NV) of the investigated cohort can be calculated as the two components of the ratio that leads to the HR(Structural), amounting respectively to 2.6804 (V) and 2.2146 (NV). The ratio of these two components yields the approximate value HR(Structural) = 1.21. This first result shows a first failure, caused by the relevant demographic imbalance between the Vaccinated (V) and Non-Vaccinated (NV) groups, which in turn stems from the use of an insufficient Random Selection matching technique to create the cohort. In other words, the risk increases by 21% only by virtue of an age disparity between the two sub-groups. Simply said, this 21% increase in observed risk is mathematically attributable solely to the difference in the age structure, specifically the higher proportion of older, high-risk individuals in the V cohort.
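
The structural component can be illustrated with a minimal Python sketch. It assumes that the expected incidence of each group is an age-share-weighted sum of age-specific national rates, which is the usual form of a weighted incidence analysis; the function name, age bands, rates, and shares below are hypothetical placeholders and are not the values of Table 2.

```python
def expected_incidence(age_shares, age_rates):
    """Age-share-weighted expected incidence, in the same unit as age_rates."""
    total = sum(age_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(age_shares[band] * age_rates[band] for band in age_shares)


# Hypothetical age-specific national rates per 10,000 (placeholders only).
rates = {"20-39": 1.5, "40-64": 2.5, ">=65": 4.0}

# Hypothetical age composition of each group: the NV group has fewer elderly.
shares_v = {"20-39": 0.30, "40-64": 0.45, ">=65": 0.25}
shares_nv = {"20-39": 0.50, "40-64": 0.40, ">=65": 0.10}

exp_v = expected_incidence(shares_v, rates)
exp_nv = expected_incidence(shares_nv, rates)

# The structural ratio exceeds 1 purely because of the different age mix.
hr_structural = exp_v / exp_nv
print(f"expected V = {exp_v:.4f}, expected NV = {exp_nv:.4f}, "
      f"HR(Structural) = {hr_structural:.2f}")
```

With identical age-specific rates in both groups, any departure of this ratio from 1 reflects the age composition alone, which is the sense in which the component is called structural.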
By applying Equation (1), we can now derive the value of the HR(Residual), that is the association that remains after mathematically eliminating the age bias. We obtain a value of 2.24, that is HR(Observed) / HR(Structural) = 2.714 / 1.21 = 2.24. The resulting residual risk of 2.24 confirms a profound failure of the simple random matching technique. Even after adjusting for age, a substantial unexplained difference remained, proving the cohorts were not truly comparable and confirming a qualitative failure in the initial study design.
To find the true, non-biased association, we perform the definitive correction. This involves discarding the faulty observed incidence rate of the Non-Vaccinated group and replacing it with the expected National Incidence IR(Baseline), derived from a demographically sound gold standard baseline. This IR(Baseline), however, is not drawn from Table 1 (2.473) but adjusted for the age restrictions imposed by the original study cohort of [14], which excluded individuals under 20. The demographically weighted expected incidence for the NV cohort is the value already calculated, 2.2146 per 10,000. The corrected Hazard Ratio HR(Corrected) is then calculated by comparing the observed incidence in the Vaccinated group, IR(V, Observed) = 2.22 per 10,000, against the verified demographically expected incidence of the control cohort, IR(NV, Expected) = 2.2146 per 10,000, yielding 2.22 / 2.2146 = 1.0024, which is very near to unity. We also emphasize that the value of 2.2146 used for the correction represents the demographically weighted incidence, that is the IR(NV, Expected), specifically restricted to the >= 20 age cohort analyzed in the original study, thus differing from the overall national IR(Baseline) of 2.473 reported in Table 1.
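
The arithmetic of this decomposition can be reproduced with a short Python sketch. The function names are illustrative assumptions, not part of the original analysis; the numerical inputs are the ones quoted in the text for the vitiligo cohort.

```python
def decompose_hr(hr_observed, expected_v, expected_nv):
    """Split an observed HR into a structural and a residual component."""
    hr_structural = expected_v / expected_nv
    hr_residual = hr_observed / hr_structural
    return hr_structural, hr_residual


def corrected_hr(ir_v_observed, ir_nv_expected):
    """Replace the observed NV denominator with the expected baseline."""
    return ir_v_observed / ir_nv_expected


# Values quoted in the text (incidence rates per 10,000).
hr_structural, hr_residual = decompose_hr(
    hr_observed=2.714, expected_v=2.6804, expected_nv=2.2146
)
hr_corr = corrected_hr(ir_v_observed=2.22, ir_nv_expected=2.2146)

print(f"HR(Structural) = {hr_structural:.2f}")
print(f"HR(Residual)   = {hr_residual:.2f}")
print(f"HR(Corrected)  = {hr_corr:.4f}")
```

The residual is computed here from the unrounded structural ratio; to two decimals it agrees with the value obtained in the text from the rounded 1.21.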
This decomposition shows that the widely reported HR of 2.714 was almost entirely the product of statistical bias. Once corrected for the structural differences, the HR(Corrected) is approximately 1, indicating the complete elimination of the alarm signal and confirming that vaccination was not associated with an increased risk of vitiligo in this population when compared to a proper demographic baseline. In simple words, since the corrected HR is practically 1, it cannot be asserted, given the available data, that either group (NV or V) runs a greater risk than the other of developing vitiligo.
This vitiligo case of [14] has further complexities, however. We must ask what happens if we relax the assumption that the data in Table 1 were actually annualized and, above all, if we account for the contribution of those who contract the disease before the age of 20, as any serious discussion of the epidemiology of vitiligo requires. Let us start with the issue of the < 20 years group. We know from [29] that the annual South Korean incidence rate for this age bracket is 3.4241 (also shown in Table 2), and we also know from [36] that individuals < 20 years are approx. 15% of the total population. This leads to a gold standard contribution to the incidence of vitiligo from the < 20 years group equal to 3.4241 x 0.15 = 0.5136. Consequently, the national incidence rate drawn from [29] for the remaining portion of the population is 2.473 - 0.5136 = 1.9594. If we now annualize the quarterly observed incidence rates in [14] of 2.22 (V) and 0.67 (NV) per 10,000, we obtain respectively 8.88 and 2.68, to be contrasted against 1.9594 per 10,000 as in Table 6 below.
Table 6: Inconsistency and Violation of Epidemiological Principles (Vitiligo)
| Group | Observed Rate (Annual, per 10,000) | Expected Rate (Gold Standard, per 10,000) | Ratio (HR) | Inconsistency with Baseline |
|---|---|---|---|---|
| Non-Vaccinated (NV) | 2.68 | 1.9594 | 1.37 | 37% Higher |
| Vaccinated (V) | 8.88 | 1.9594 | 4.53 | 353% Higher |
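
The annualized consistency check can be sketched in Python as follows. The variable names are illustrative assumptions; the inputs are the figures quoted in the text, and the factor of 4 reflects the assumption that the rates observed in [14] are quarterly.

```python
QUARTERS_PER_YEAR = 4  # assumption: the observed rates in [14] are quarterly

# Figures quoted in the text (rates per 10,000).
rate_under_20 = 3.4241   # annual incidence in the < 20 years bracket [29]
share_under_20 = 0.15    # approximate population share of < 20 years [36]
national_rate = 2.473    # overall national IR(Baseline), Table 1

# Contribution of the < 20 years group, rounded to 4 decimals as in the text.
under_20_contribution = round(rate_under_20 * share_under_20, 4)
baseline_20_plus = round(national_rate - under_20_contribution, 4)

annual_v = round(2.22 * QUARTERS_PER_YEAR, 2)
annual_nv = round(0.67 * QUARTERS_PER_YEAR, 2)

ratio_nv = annual_nv / baseline_20_plus
ratio_v = annual_v / baseline_20_plus

print(f"baseline (>= 20 years) = {baseline_20_plus}")
print(f"NV: {annual_nv} / {baseline_20_plus} = {ratio_nv:.2f}")
print(f"V:  {annual_v} / {baseline_20_plus} = {ratio_v:.2f}")
```

A ratio above 1 for the NV group already signals that the control group is inconsistent with the national baseline, before any comparison with the V group is made.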
In essence, the comparison of the annualized observed rates with the national gold standard reveals a profound inconsistency, rendering the outcome of [14] unreliable. It produces two results that are both disastrous for the reliability of the investigated study. On the one hand, it invalidates the control group, whose incidence of 2.68 is remarkably higher than the national standard (+37% over 1.9594); on the other hand, it leads to a corrected HR of 4.53, a result that is clinically and epidemiologically unsustainable.
Ultimately, then, if the calculations are performed while ignoring a portion of the disease's effect on the South Korean population and taking the observed incidences as reported, a disappointing result is reached: once the effect of age imbalance in the cohorts is eliminated, the true, correctly recalculated clinical risk is non-existent. Conversely, if incidences are considered on an annualized basis and the substantial portion of the population contracting the disease under the age of 20 is not ignored, paradoxical effects arise: the control group is invalidated because its incidence rate exceeds the national average, and the vaccinated group is left with a risk too high to be taken seriously on clinical grounds, thereby effectively invalidating the entire study.
Cancer Results: Failure and Correction
The cancer analysis presented a different methodological failure, as the Propensity Score Matching (PSM) successfully eliminated the structural demographic bias, thus resulting in an HR(Structural) near unity without any further calculation being needed (see Table 3). Nonetheless, in this case we are in the presence of a double failure: an internal failure (i.e., residual bias within matched strata, Table 4) and an external failure (non-representativeness of the entire cohort compared to the national population, see the -32.2% deficit in the previous Section).
Consequently, the entire observed risk is attributed to the residual component, as can be derived from the application of Equation 1: HR(Residual) = HR(Observed) / HR(Structural) = 1.27 / 1.00 = 1.27. This result confirms that the observed HR represents the direct quantified effect of an asymmetric selection bias, already documented in Table 5, where the NV group exhibits a significantly deeper deficit from the National Incidence baseline (-45.1% vs. -26.9% for the >= 65 group). This asymmetry proves the cohorts are non-comparable on underlying health status, despite the application of the PSM procedure, due to the well-known interference of the healthy cohort phenomenon [37, 38].
In this case we must also address a problem of external validity, where the external non-representativeness of the overall cohort mandates the replacement of the observed incidence rate IR(NV, Observed) = 33.43 with the demographic baseline IR(NV, Baseline) = 55.02 (Table 1). Applying Equation (3) and data from Table 1 provides the corrected Hazard Ratio: HR(Corrected) = IR(V, Observed) / IR(NV, Baseline) = 42.62 / 55.02 = 0.77.
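
The cancer correction follows the same arithmetic and can be checked with a few lines of Python. The variable names are illustrative assumptions; the inputs are the values quoted in the text and attributed to Table 1.

```python
# Values quoted in the text for the cancer cohort.
ir_v_observed = 42.62    # IR(V, Observed)
ir_nv_observed = 33.43   # IR(NV, Observed)
ir_nv_baseline = 55.02   # IR(NV, Baseline), national demographic baseline
hr_observed = 1.27
hr_structural = 1.00     # PSM leaves no structural age component

# Equation 1: the whole observed HR falls into the residual component.
hr_residual = hr_observed / hr_structural

# Crude ratio of the observed rates, shown only as a consistency check.
crude_ratio = ir_v_observed / ir_nv_observed

# Corrected HR: the observed NV denominator is replaced by the baseline.
hr_corrected = ir_v_observed / ir_nv_baseline

print(f"HR(Residual)  = {hr_residual:.2f}")
print(f"crude V/NV    = {crude_ratio:.2f}")
print(f"HR(Corrected) = {hr_corrected:.2f}")
```

The crude ratio of the two observed rates reproduces the published 1.27, which makes explicit that the whole signal depends on the NV denominator of 33.43.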
In the end, the HR(Corrected) is reduced from the original 1.27 to 0.77. This result not only eliminates the reported positive association but even suggests a protective association after adjusting for the structural deficit of high-risk individuals in the study's overall population compared to the national demographic baseline. This is a resounding confirmation of the fact that an unreasoned use of data leads to results that are not only needlessly alarming but often border on the nonsensical.
Results Summary
Table 7 summarizes the outcomes of the decomposition analysis and the subsequent re-calculation of the corrected risk, contrasting the HR published in [14, 15] with the HR re-calculated using the baseline national incidence as the denominator, for both vitiligo and cancer.
Table 7: Summary of results
| Case Study | HR Observed | HR Structural | HR Residual | HR Corrected | Correction Effect |
|---|---|---|---|---|---|
| Vitiligo (Case I) | 2.714 | 1.21 | 2.24 | Approx. 1 | Elimination of signal |
| Vitiligo (Case II) | 2.714 x 4 | N/A | N/A | IR(NV, Observed) = 2.68 -> 1.37 / IR(V, Observed) = 8.88 -> 4.53 | Clinical Nonsense / Violation of STROBE (21) |
| Cancer | 1.27 | Approx. 1 | 1.27 | 0.77 | Reversal to protective signal |
Discussion
The present work has put forth a targeted and quantitative methodological critique on the crucial subject of the reliability of safety signals emerging from post-COVID-19 vaccine surveillance, particularly those derived from retrospective studies on large national databases. Our analysis did not focus on the veracity of the biological findings but on the structural validity of the experimental cohorts from which these results were extracted.
The fundamental strength of our argument lies in a return to basic statistical principles and foundational epidemiology, often obscured by the uncritical use of advanced inferential methodologies. We demonstrated how a systemic structural flaw, typically the asymmetric distribution of high-risk individuals (elderly or vulnerable) between the Vaccinated (V) and Non-Vaccinated (NV) groups, can generate predictable and spurious HRs. This phenomenon is not new, but a direct echo of historical precedents, such as the Beta-Agonist Paradox and the Hormonal Replacement Therapy case, which warned the scientific community about how confounding by indication or structural selection bias can transform a marker of severity into an apparent risk signal.
Our analysis addressed this vulnerability by examining two exemplary studies, focusing on vitiligo and
cancers, which share the same data source and, critically, similar methodological defects. Through a
rigorous decomposition of the Hazard Ratio (HR), we isolated and quantified the exact contribution of
this demographic bias, confirming the established framework of bias quantification [24-26]. In the case
of vitiligo [14], we clearly identified that the high observed risk was primarily an artifact of structural
flaws. This was twofold: first, much of the risk was mathematically attributable solely to the unbalanced
age structure of the cohort; second, a deeper analysis revealed a profound inconsistency in the baseline
risk, showing that the non-vaccinated (NV) group's annualized incidence rate was already 37% higher
than the national gold standard baseline. This inherent contamination of the control group proved the
failure of External Validity (STROBE principle 21). Once these dysfunctions were corrected, in either
case the excess risk expressed by the HR was neutralized. Similarly, in the case of cancer, even with the
use of a complex balancing technique (Propensity Score Matching, or PSM), we revealed an even more
insidious failure: the PSM successfully balanced the mean age but failed to balance the baseline risk.
The structural underrepresentation of high-risk elderly individuals in the NV group compared to the
national benchmark artificially suppressed the baseline risk, creating a defect in the denominator which,
once corrected with the expected national incidence, brought the HR back to values close to unity. In
both examples, the alarm signal was essentially neutralized when the cohort was corrected to reflect
adequate demographic representativeness.
The profound strength of our methodology lies in its adherence to the long-established principles of
descriptive/inferential statistics and medical protocol design, principles that have guided the proper
construction of experimental cohorts for decades. Our approach rigorously confirms that data integrity
precedes analytical complexity. We assert that the foundational task of ensuring cohort comparability,
a prerequisite for mitigating all forms of internal and external bias, must be satisfied before complex
inferential algorithms are deployed. Specifically, by using nationally recognized incidence rates as a
gold standard baseline, we were able to diagnose and correct not only internal bias (age asymmetry in
the vitiligo study) but also the more subtle external bias, like in the cancer study, where the entire study
population was non-representative of the national risk profile [37, 38]. Our work is a compelling
demonstration that statistical rigor, rooted in fundamental checks on data representativeness, provides
the necessary corrective mechanism to re-establish methodological integrity when structural flaws
compromise sophisticated analyses.
Nonetheless, despite the robustness of our reconstruction and quantification, the limitations of our study
must be clearly recognized and emphasized. Our analysis is based entirely on the re-edition and
re-analysis of the aggregated data and epidemiological parameters published in the original studies [14,
15], cross-referenced with known national incidence data [28-36]. Without direct access to the original
South Korean database at the individual level (disaggregated data), our speculations, however
rigorously correct and quantified, can never push beyond the level imposed by the initial aggregation.
Consequently, we cannot diagnose or exclude the existence of additional residual biases, such as
insufficient correction for comorbidities or the absence of other unmeasured confounders, which could
only emerge from the analysis of individual records. The impossibility of accessing non-aggregated data
remains, by definition, the insurmountable limit of any secondary and retrospective analysis of this type.
It is also essential to reiterate that our analysis does not constitute a critique of the biological findings
of the studies in question, nor does it intend to challenge the seriousness or integrity of the authors or
the journals that published them. Our intent is purely methodological and public health-oriented. We
caution the scientific community and the public about the correct and reasoned use of data and the
necessity of executing rigorous baseline checks on cohort construction before accepting any signal [39].
Our study unequivocally demonstrates that the analyzed data, once subjected to correction for structural
bias, do not provide a reasonable basis for any alarm regarding an increased risk of vitiligo or cancer
following COVID-19 vaccination, rendering the original conclusions, although obtained with sophisticated
tools, needlessly alarming and bordering on statistical artifact.
Conclusion
Our study performed a crucial methodological intervention to reassess the reliability of specific adverse
event signals reported in COVID-19 vaccine safety surveillance, particularly concerning vitiligo and
cancer risks derived from a specific national database. We have demonstrated that the observed
associations, which generated significant public alarm, are highly likely to be the result of systemic
structural flaws in the construction of the comparison cohorts, rather than true biological signals. The
core of the issue lies in the failure to ensure full comparability between the vaccinated and
non-vaccinated groups, leading to an asymmetric underrepresentation of high-risk elderly individuals in
the non-vaccinated baseline. Through rigorous quantitative deconstruction of the reported Hazard Ratios,
our analysis successfully isolated and quantified this structural bias. In both exemplar cases, our
corrected risk estimates, derived by utilizing national incidence data as a gold standard baseline,
effectively neutralized the alarming signals. Specifically, the observed HRs, originally suggesting
increased risk, were reduced to values hovering around unity, demonstrating that the risk disparity
disappears when methodological integrity is re-established. This present work serves as a powerful
reminder that foundational statistical checks on data representativeness must precede the deployment
of complex inferential techniques. The integrity of the study's external validity is compromised when the
reference group does not accurately reflect the national risk profile, a failure that even sophisticated
algorithms like Propensity Score Matching cannot overcome if the core cohort structure is fundamentally
deficient. In summary, our findings strongly suggest that the reported associations between COVID-19
vaccination and increased risks of vitiligo and cancer are statistical artifacts born from methodological
bias. While acknowledging the inherent limitation of working exclusively with aggregated data, we
caution the scientific community and the public against drawing clinical conclusions or issuing public
health alarms based on analyses that fail to adhere to these essential principles of robust cohort
construction. Our primary conclusion is that, based on the corrected evidence, no reasonable alarm is
warranted from the data under scrutiny.
Author Information
Marco Roccetti (MR): Department of Computer Science and Engineering, University of Bologna, 40126
Bologna, Italy, marco.roccett[email protected]. ORCID: 0000-0003-1264-8595, sole and corresponding author
Author Contributions
MR conceived and designed the study, carried out all data collection and analysis, interpreted the
quantitative results, and was the sole author responsible for writing and revising the manuscript. The
author affirms full responsibility for the integrity of the data and the accuracy of the data analysis
presented.
Ethics approval and consent to participate
This study uses publicly available, aggregated data that contains no private information. Therefore,
ethical approval is not required.
Data Availability Statement
The data presented here is either included directly or was extracted from the referenced documents
and cited literature. All calculations are easily reproducible based on the definitions provided.
Funding
This research received no specific grant from any funding agency in the public, commercial, or
not-for-profit sectors. This study was conducted entirely independently by the author using personal resources.
Conflict of Interest
The author declares that there is no conflict of interest, financial, personal, or otherwise, that could be
construed as influencing the results or the conclusions presented in this paper.
Generative AI statement
The author declares that no Gen AI was used in the creation of this manuscript.
References
1. Elm E, Altman DG, Egger M, et al. (2007) The Strengthening the Reporting of Observational
Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies.
PLoS Med., 4(10):e296. DOI: 10.1016/j.jclinepi.2007.11.008
2. Page MJ, Moher D, Bossuyt PM, et al. (2021) PRISMA 2020 explanation and elaboration:
updated guidance and exemplars for reporting systematic reviews. BMJ, 2021(372):160. DOI:
10.1136/bmj.n160
3. Fisher RA, (1921) Studies in Crop Variation (I). An Examination of the Yield of Dressed Grain
from Broadbalk. J. Agric. Sci. 11(2):107–135. DOI: 10.1017/S0021859600003750
4. Pearson K, (1900) On the criterion that a given system of deviations from the probable in the
case of a correlated system of variables is such that it can be reasonably supposed to have
arisen from random sampling. Philosophical Magazine. Series 5. 50(302):157–175. DOI:
10.1080/14786440009463897
5. Neyman J, Pearson ES, (1933) On the problem of the most efficient tests of statistical
hypotheses. Phil. Trans. R. Soc. Lond. A. 231(694–706):289–337. DOI:
10.1098/rsta.1933.0009