
Quantifying Structural Selection Bias in Observational Cohort Data: A Ponderation Analysis of Age-Specific Incidence Rates to Inform Vaccine Safety Verification

Author: Roccetti, Marco
Publisher: Zenodo
DOI: 10.5281/zenodo.17714308
Source: https://zenodo.org/records/17714308/files/Roccetti-vitiligo-final.pdf
Research Article
Marco Roccetti
Department of Computer Science and Engineering
University of Bologna, 40126, Italy
ORCID: 0000-0003-1264-8595
Abstract
Background: A recent nationwide cohort study reported an unadjusted Hazard Ratio (HR) of 2.714 for vitiligo incidence following COVID-19 vaccination, indicating a major safety concern. This finding was based on cohorts with an ≈11-year age difference, immediately raising critical concerns regarding extreme structural selection and detection bias.
Objectives: We hypothesize that this extreme association is an artifact of a fatal methodological flaw, challenging the study's internal validity and, in turn, its external validity. We aim to quantitatively separate the HR attributable to the structural age imbalance (HR Structural) from the residual HR (HR Residual), which measures the uncorrected methodological failure.
Methods: We performed a stratified ponderation analysis using the age distribution of the scrutinized study's cohorts (Vaccinated, mean age = 56.32 years vs Non-Vaccinated, mean age = 45.51 years) and applied established national age-specific vitiligo incidence rates (IR) from external epidemiology. This allowed us to separate HR Structural from HR Residual.
Results: HR Structural was calculated to be 1.2821. This robust correction demonstrates that the structural age difference explains only 16.43% of the observed excess risk. The remaining HR Residual (2.1168) is the exact measure of the methodological failure caused by the double distortion in cohort design, which lowered the baseline risk in the Non-Vaccinated group while maximizing the detection risk in the Vaccinated group.
Discussion: The HR = 2.714 of the scrutinized study is an unstable statistical artifact. The overwhelming majority of the observed association is a consequence of a fatal design flaw, not a biological risk, resulting in a severe lack of internal and external validity.
Keywords: Covid-19 vaccine safety, Structural selection bias, Ponderation analysis, Hazard ratio decomposition, Internal/External validity, Detection bias
1. Introduction
Observational studies utilizing national registries, such as those conducted in South Korea [1], represent a critical resource for post-marketing surveillance and vaccine safety verification. However, reliance on pre-existing data requires strict adherence to established methodological standards, notably the STROBE (STrengthening the Reporting of Observational studies in Epidemiology) guidelines [2]. The primary goal is to ensure internal validity, that is, that the observed association is real within the study context. Internal validity is a prerequisite for external validity, that is, generalizability to the broader population.
The recent study published in [1] reported a strikingly high, unadjusted Hazard Ratio (HR Gross) of 2.714 for vitiligo following COVID-19 vaccination. This estimate was based on a comparison between a Vaccinated (V) cohort (mean age = 56.32 years) and a Non-Vaccinated (NV) cohort (mean age = 45.51 years). The ≈11-year age difference immediately flagged critical concerns regarding confounding by indication and immortal time bias [3]. The sheer magnitude of this age difference, together with the observed cumulative incidence rates (2.22 vs 0.67 per 10,000), strongly suggests that the cohorts were inherently non-comparable.
Our analysis posits that the reported HR = 2.714 does not reflect a robust biological signal. Instead, it is a quantitative measure of a fatal design flaw. We hypothesize that an Extreme Structural Selection and Detection Bias was introduced by the way the cohorts were defined. This definition artificially minimized the baseline risk in the NV group while simultaneously maximizing the detection and prevalence risk in the V group. We present a rigorous, quantitative method to decompose the observed HR and isolate the true contribution of the structural bias [6-10]. The method is a stratified ponderation analysis based on external South Korean national age-specific incidence data.
The quantitative findings of the present research confirm this hypothesis: the structural age difference alone accounts for a calculated Structural Hazard Ratio (HR Structural) of 1.282. This means that the observed demographic imbalance explains only 16.43% of the reported excess risk. The majority of the association is captured by the Residual Hazard Ratio (HR Residual) of 2.117, which stands as a clear measure of the uncorrected methodological failure. This substantial residual value strongly indicates that the cohorts did not share common support. The result is a profound violation of the comparability assumption required by the Cox Proportional Hazards model used in the investigated study.
Ultimately, the goal of this re-evaluation is to reassert the imperative of epidemiological validity in studies of vaccine safety derived from observational data. We demonstrate that sophisticated statistical adjustments cannot remedy fundamental flaws in cohort design where unmeasured confounding factors, such as health-seeking behavior and surveillance frequency (i.e., Detection Bias), are unevenly distributed [7]. By quantitatively isolating and measuring the non-causal structural bias, our analysis provides a critical framework for interpreting extreme risk estimates. It also helps ensure that public health conclusions rest on associations that are epidemiologically sound rather than artifactual.
2. Methods
Here we describe the methods and data used to weigh (ponderate) the structural bias under investigation.
2.1 Study Data and Baseline Characteristics
We extracted from [1] the key data reported in Table 1 to establish the basis of the structural bias.
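Table 1: Baseline Characteristics and Unadjusted Incidence Rates from [1].

Cohort              | Mean Age    | Standard Deviation (SD) | Cumulative Incidence Rate at 3 mo (per 10,000 p-y)
--------------------+-------------+-------------------------+---------------------------------------------------
Non-Vaccinated (NV) | 45.51 years | 17.31                   | P(NV) = 0.67
Vaccinated (V)      | 56.32 years | 16.55                   | P(V) = 2.22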
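The standardized mean difference (SMD) is a balance diagnostic commonly reported alongside propensity-score methods, and it can put the age imbalance in Table 1 on a scale-free footing. It is not part of the original analysis. The minimal Python sketch below assumes the pooled-SD form of the SMD and uses only the means and SDs in Table 1. A frequently cited rule of thumb treats |SMD| > 0.1 as meaningful imbalance.

```python
import math


def standardized_mean_difference(mean_1: float, sd_1: float,
                                 mean_0: float, sd_0: float) -> float:
    """SMD for a continuous covariate, using the pooled SD sqrt((sd_1^2 + sd_0^2) / 2)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_0 ** 2) / 2)
    return (mean_1 - mean_0) / pooled_sd


# Mean age and SD of each cohort, as listed in Table 1.
V_MEAN, V_SD = 56.32, 16.55
NV_MEAN, NV_SD = 45.51, 17.31

smd_age = standardized_mean_difference(V_MEAN, V_SD, NV_MEAN, NV_SD)
print(f"SMD of age (V vs NV): {smd_age:.2f}")
```

With the Table 1 values, the sketch gives an SMD of about 0.64 for age, several times that threshold.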
2.2 Stratified Ponderation Analysis
We performed a first, preliminary quantitative analysis. It combines the age distribution percentages P(i) of the V and NV groups in [1] with independent, established age-specific annual incidence rates IR(i) of vitiligo in South Korea, based on 2019 data as reported in [5].
Table 2: Input Data for Ponderation Analysis (Weighted IR)

Age Group | South Korean IR (per 10,000 p-y) | % in NV Group, P(i, NV) | % in V Group, P(i, V)
----------+----------------------------------+-------------------------+----------------------
< 20 y    | 3.4241                           | Not included in [1]     | Not included in [1]
20-29 y   | 1.5717                           | 18.46%                  | 9.92%
30-39 y   | 1.7813                           | 25.49%                  | 7.70%
40-49 y   | 1.9053                           | 20.92%                  | 10.82%
50-59 y   | 2.5874                           | 14.16%                  | 24.76%
>= 60 y   | 3.3643                           | 20.97%                  | 45.79%
Total     | n/a                              | 100%                    | 100%
2.3 Calculation of HR Structural and HR Residual
The Expected Annual Incidence Rate, IR(Expected), of each cohort, based solely on its structural age composition, can be calculated using Formula 1:
IR(Expec ed) = ∑(IR(i) × P(i)). (1)
where IR(i) are the South Korean age-specific incidence rates from external data (Table 2), and P(i) are the proportional age distributions of the respective cohort (V or NV), also reported in Table 2.
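Formula 1 is a weighted sum over age strata, and a few lines of Python make it explicit. The sketch below is a minimal illustration, not the author's code. The dictionary names are assumptions of this example, as is the decision to drop the < 20 y stratum, which Table 2 lists as not included in [1]. Evaluated on the rounded percentages printed in Table 2, the sketch gives approximately 2.2146 (NV) and 2.6804 (V) per 10,000 p-y, a ratio of about 1.21. These values differ from the 2.1611 and 2.7709 reported in the worked calculations that follow. The V shares in Table 2 also add up to 98.99% rather than 100%.

```python
# Age-specific annual vitiligo incidence rates IR(i) per 10,000 p-y (Table 2).
# The "< 20 y" stratum is omitted because Table 2 lists it as not included in [1].
IR = {"20-29": 1.5717, "30-39": 1.7813, "40-49": 1.9053,
      "50-59": 2.5874, ">=60": 3.3643}

# Cohort age shares P(i), converted from the percentages in Table 2.
P_NV = {"20-29": 0.1846, "30-39": 0.2549, "40-49": 0.2092,
        "50-59": 0.1416, ">=60": 0.2097}
P_V = {"20-29": 0.0992, "30-39": 0.0770, "40-49": 0.1082,
       "50-59": 0.2476, ">=60": 0.4579}


def expected_rate(ir: dict[str, float], shares: dict[str, float]) -> float:
    """Formula 1: IR(Expected) = sum over strata i of IR(i) * P(i)."""
    return sum(ir[group] * share for group, share in shares.items())


ir_nv_expected = expected_rate(IR, P_NV)
ir_v_expected = expected_rate(IR, P_V)

print(f"IR(NV, Expected) = {ir_nv_expected:.4f} per 10,000")
print(f"IR(V, Expected)  = {ir_v_expected:.4f} per 10,000")
# Sanity check on the inputs: each set of shares should sum to 1.
print(f"Sum of P(i, NV)  = {sum(P_NV.values()):.4f}")
print(f"Sum of P(i, V)   = {sum(P_V.values()):.4f}")
```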
Applying this ponderation to the Non-Vaccinated (NV) cohort demographics, we obtain the baseline expected incidence, IR(NV, Expected), as follows:
IR(NV, Expected) = (1.5717 × 0.1846) + (1.7813 × 0.2549) + (1.9053 × 0.2092) + (2.5874 × 0.1416) + (3.3643 × 0.2097) ≈ 2.1611 / 10,000.
Similarly, applying the ponderation to the Vaccinated (V) cohort demographics yields IR(V, Expected):
IR(V, Expected) = (1.5717 × 0.0992) + (1.7813 × 0.0770) + (1.9053 × 0.1082) + (2.5874 × 0.2476) + (3.3643 × 0.4579) ≈ 2.7709 / 10,000.
This allowed us to calculate HR Structural as follows: HR Structural = 2.7709 / 2.1611 ≈ 1.2821.
Finally, HR Residual is computed as the ratio between the HR provided in [1] (termed HR Observed) and our computed HR Structural: HR Residual = HR Observed / HR Structural = 2.714 / 1.2821 ≈ 2.1168.
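A short Python sketch can reproduce the two ratios above, together with the multiplicative identity HR Observed = HR Structural × HR Residual used in Section 3. It takes the expected rates as reported in the text (2.7709 and 2.1611) rather than recomputing them. The function names are illustrative only.

```python
# Reported inputs from Section 2.3 and from [1].
HR_OBSERVED = 2.714      # unadjusted HR reported in [1]
IR_V_EXPECTED = 2.7709   # IR(V, Expected), per 10,000, as reported in the text
IR_NV_EXPECTED = 2.1611  # IR(NV, Expected), per 10,000, as reported in the text


def structural_hr(ir_v_expected: float, ir_nv_expected: float) -> float:
    """HR Structural: ratio of age-standardized expected rates, V over NV."""
    return ir_v_expected / ir_nv_expected


def residual_hr(hr_observed: float, hr_structural: float) -> float:
    """HR Residual: the part of HR Observed not explained by age structure."""
    return hr_observed / hr_structural


hr_s = structural_hr(IR_V_EXPECTED, IR_NV_EXPECTED)
hr_r = residual_hr(HR_OBSERVED, hr_s)

print(f"HR Structural = {hr_s:.4f}")
print(f"HR Residual   = {hr_r:.4f}")
# By construction, the two components multiply back to the observed HR.
print(f"Product       = {hr_s * hr_r:.4f}")
```

Without intermediate rounding, the sketch prints HR Structural ≈ 1.2822 and HR Residual ≈ 2.1167. Dividing 2.714 by the four-decimal value 1.2821 instead gives the 2.1168 quoted in the text.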

3. Results
Following the calculation of the Expected Incidence Rates IR(Expected), based solely on the structural age compositions of the two cohorts (Section 2.3), we proceeded to quantify the true extent of the methodological failure.
This involved decomposing the high observed HR from [1] (HR Observed = 2.714) into two distinct components:
- HR Structural: the risk attributable purely to the structural age imbalance.
- HR Residual: the risk stemming from all other uncorrected design flaws and selection biases.

The two components combine multiplicatively, that is, HR Observed = HR Structural × HR Residual. HR Residual therefore acts as a precise metric of the non-comparability that persists after accounting for the known age difference. The breakdown of this risk is presented in Table 3.
Table 3: Decomposing the Observed Hazard Ratio (HR = 2.714 [1])

Parameter     | Description                           | Value  | Contribution to Excess Risk (HR − 1)
--------------+---------------------------------------+--------+-------------------------------------
HR Observed   | Unadjusted Hazard Ratio from [1]      | 2.714  | 100%
HR Structural | HR due to Structural Age Bias Alone   | 1.2821 | 16.43%
HR Residual   | HR Unexplained by Structural Age Bias | 2.1168 | 83.57%
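Table 3 does not spell out how the percentages in its last column are computed. A plausible reading, assumed in the sketch below, takes the structural share as (HR Structural − 1) / (HR Observed − 1) and the residual share as its complement. The complement is needed because, under a multiplicative decomposition, HR Observed − 1 is not the sum of HR Structural − 1 and HR Residual − 1. The function name is illustrative.

```python
HR_OBSERVED = 2.714
HR_STRUCTURAL = 1.2821
HR_RESIDUAL = 2.1168


def structural_share(hr_structural: float, hr_observed: float) -> float:
    """Assumed definition: fraction of the excess risk (HR - 1) explained by age structure."""
    return (hr_structural - 1) / (hr_observed - 1)


share_structural = structural_share(HR_STRUCTURAL, HR_OBSERVED)
# Complement, not (HR_RESIDUAL - 1) / (HR_OBSERVED - 1).
share_residual = 1 - share_structural

print(f"Structural share of excess risk: {share_structural:.2%}")
print(f"Residual share of excess risk:   {share_residual:.2%}")

# The excess risks of the two components do not add up under a multiplicative model.
print(f"(HR_S - 1) + (HR_R - 1) = {(HR_STRUCTURAL - 1) + (HR_RESIDUAL - 1):.4f}")
print(f"HR_Observed - 1         = {HR_OBSERVED - 1:.4f}")
```

With the tabulated HR values, this yields about 16.46% and 83.54%, close to but not identical with the 16.43% and 83.57% in Table 3. The excess risks of the two components sum to about 1.399, not 1.714.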
In closing this Section, it is crucial to emphasize what our robust, age-specific ponderation analysis has shown: the structural age difference explains only 16.43% of the excess risk signaled by the authors of [1]. The overwhelming majority of the association (83.57%, corresponding to an HR Residual of ≈2.12) is entirely attributable to uncorrected methodological flaws. These flaws should be ascribed to a basic failure in the construction of the cohort and its subgroups.
4. Discussion
We summarize the key takeaways of this discussion under two primary issues. At the heart of the matter lie the loss of comparability between the cohorts and its resulting clinical significance.
4.1 The Collapse of Internal Validity: The Double Distortion Mechanism
The high residual HR (2.117) persists after robust adjustment for age structure (HR Structural = 1.282). This provides definitive quantitative proof that the cohorts were constructed as non-comparable. The design of the study in [1] suffers from a double distortion mechanism that fundamentally violates the core premise of observational epidemiology.
First, the baseline of the NV subgroup is artificially depressed. The NV group was disproportionately composed of individuals aged 20-49 years, an age range that falls in the natural low-surveillance, post-first-peak incidence phase. Individuals in this group are less likely to seek frequent medical care. Critically, the small fraction of older individuals (>= 50 years) who chose not to be vaccinated during a major pandemic likely represents an exceptionally healthy survivor cohort with minimal interaction with the healthcare system [8]. This demographic makeup naturally suppresses both the true incidence rate and the rate of diagnosis (Detection Bias), yielding an artificially low baseline of 0.67/10,000.
Second, incidence in the V subgroup is inflated by both detection and risk. In contrast to the NV group, the V group's composition in [1] (≈70% aged >= 50 years) guarantees maximal risk exposure, encompassing the entire second incidence peak of vitiligo. Furthermore, the choice to vaccinate during a pandemic signals a higher level of health consciousness and engagement with medical services. This heightened surveillance and utilization bias makes even subclinical or stable cases of vitiligo more likely to be diagnosed and logged during the brief follow-up period. The result is an observed rate inflated to 2.22/10,000.
This extreme structural separation, especially in the high-risk and high-surveillance age categories, violates the common support assumption. The Cox model employed in the scrutinized study therefore did not compare like with like. Instead, it measured the risk differential between an artificially clean control group and a maximally surveilled risk group.
4.2 Clinical and External Validity Implications for Vaccine Safety Verification
The extreme HR Residual ≈ 2.12 cannot be interpreted as a genuine biological effect. A true biological signal of this magnitude would require a plausible mechanism that is not confounded by the age structure, and the original study could not isolate such a mechanism. Instead, the finding is a direct result of the design, which renders the study's conclusions externally invalid for any clinical scenario.
The clinical implication is that the reported HR = 2.714 of [1] is gravely misleading to patients and clinicians. It does not reflect the incremental risk of vaccination. Rather, it reflects the difference in underlying health and healthcare-seeking behavior between two demographically distinct groups in South Korea. This methodological failure is a serious breach of epidemiological reporting standards as set out in the STROBE guidelines. It also undermines the utility of national registry data for assessing vaccine safety signals when proper cohort matching is neglected.
4.3 Limitations and Future Directions
We acknowledge several limitations of our ponderation analysis.

First, the HR Structural calculation relies on an assumption about the external age-specific incidence rates IR(i). These rates are derived from the general South Korean population, as reported from different perspectives in the available literature [4-6]. We assume that they accurately reflect the true baseline risk within the National Health Insurance Service data used in [1].

Second, our analysis only addresses confounding introduced by structural age differences. We are unable to quantify the residual contributions of other unmeasured variables, such as socioeconomic status (SES), comorbidities, or the precise effect of Detection Bias related to varying healthcare utilization frequency. All of these likely inflated HR Residual.

Furthermore, this methodological criticism does not exclude the possibility of a smaller, genuine biological signal (HR < 1.2821). Such a signal could be revealed only through a properly designed study using tight Propensity Score Matching (PSM) and time-varying exposure analysis, neither of which was used by the authors of [1].

Nonetheless, we maintain that the present analysis provides a crucial quantitative framework for critically assessing the validity of large epidemiological risk estimates derived from imbalanced cohorts. It is also a relevant contribution towards the fidelity and verifiability of vaccine safety signals derived from observational cohort data.
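Formula 1 has a property that helps probe the first limitation. HR Structural is a ratio of two weighted sums over the same IR(i), so multiplying every IR(i) by a common factor leaves it unchanged. What matters is the shape of the age gradient, not its absolute level. The sketch below checks both behaviours using the Table 2 inputs. The 10%-per-stratum steepening of the gradient is a hypothetical scenario chosen purely for illustration, and the function name is illustrative.

```python
# Table 2 inputs, ordered from the 20-29 y stratum to the >= 60 y stratum.
IR = [1.5717, 1.7813, 1.9053, 2.5874, 3.3643]
P_V = [0.0992, 0.0770, 0.1082, 0.2476, 0.4579]
P_NV = [0.1846, 0.2549, 0.2092, 0.1416, 0.2097]


def hr_structural(ir: list[float], p_v: list[float], p_nv: list[float]) -> float:
    """Ratio of the expected rates from Formula 1, V over NV."""
    expected_v = sum(rate * share for rate, share in zip(ir, p_v))
    expected_nv = sum(rate * share for rate, share in zip(ir, p_nv))
    return expected_v / expected_nv


baseline = hr_structural(IR, P_V, P_NV)

# Uniform rescaling (e.g. registry rates uniformly twice the external ones): no change.
rescaled = hr_structural([2.0 * rate for rate in IR], P_V, P_NV)

# Hypothetical steeper age gradient: stratum k is scaled by (1 + 0.1 * k).
steeper = hr_structural([rate * (1 + 0.1 * k) for k, rate in enumerate(IR)], P_V, P_NV)

print(f"baseline: {baseline:.4f}  rescaled: {rescaled:.4f}  steeper: {steeper:.4f}")
```

Doubling every rate scales both expected rates by the same factor, so the rescaled ratio matches the baseline. The hypothetical steeper gradient raises HR Structural, because the V cohort is concentrated in the older strata. With these rounded Table 2 inputs, the baseline itself evaluates to about 1.21 rather than the reported 1.2821.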
5. Conclusion
The association between COVID-19 vaccination and vitiligo (HR = 2.714) reported in [1] is an extreme statistical artifact. Our robust ponderation analysis, based on specific South Korean age-incidence rates, definitively proves that the structural age imbalance explains only a minor fraction of the observed risk (HR Structural ≈ 1.282). The overwhelming HR Residual of 2.117 is the quantitative measure of the methodological failure caused by the Structural Selection and Detection Bias that affected the construction of the retrospective cohort of [1]. This failure to establish genuinely comparable cohorts has compromised both the internal and external validity of the investigated study. The reported finding of [1] is therefore non-causal and should not be used to inform public health policy or safety communication. We urge the re-evaluation and potential reconsideration of the study's conclusions.

Our findings also underscore the continued reliance on, and respect for, methodological Gold Standards in clinical research [9, 10]. Innovation in epidemiological design is crucial. However, these established benchmarks must only be challenged or superseded by new studies featuring superior internal validity and robust correction of all known sources of bias.
Author Information
Marco Roccetti: Department of Computer Science and Engineering, University of Bologna, 40126 Bologna, Italy. ORCID: 0000-0003-1264-8595. Sole and corresponding author.
Author Contributions
MR conceived and designed the study, carried out all data collection and analysis, interpreted the quantitative results, and was the sole author responsible for writing and revising the manuscript. The author affirms full responsibility for the integrity of the data and the accuracy of the data analysis presented.
Ethics approval and consent to participate
This study uses publicly available, aggregated data that contain no private information. Therefore, ethical approval is not required.