
HERALD: High-resolution Early Recognition of Antigenic Landscape Divergence

Author: Davis, Bee Rosa
Publisher: Zenodo
DOI: 10.5281/zenodo.17682618
Source: https://zenodo.org/records/17682618/files/HERALD_Validation_Study_II__Retrospective_Time_Slice_Replay_on_SARS_CoV_2_Genomic_Data.pdf
HERALD Validation Study II: Retrospective Time-Slice Replay on
SARS-CoV-2 Genomic Data (“Time Traveler”)
Bee Rosa Davis
November 21, 2025
Abstract
Following the geometric separation confirmed on RBD deep-mutational scanning data (Validation I), this study evaluates whether the frozen HERALD encoder ϕ exhibits out-of-time generalization during historical SARS-CoV-2 emergence episodes. We instantiate a preregistered Stage 2 drift pipeline (latent distance → PIT normalization → Gaussianized Z) and execute a retrospective time-slice replay on a deliberately sparse monthly slice (2020–2023). Despite the low-data regime, the rolling-anchor signal (D_Jump) issued a high-confidence alert in South Africa during January 2022 (Z = 2.29; one-sided p ≈ 0.011), coincident with the Omicron BA.1 wave. Because the November–December 2021 bins were unpopulated, absolute lead-time to WHO is not identifiable here; instead, we quantify support-adjusted latency and show near-zero delay once data resumes. This demonstrates robustness of the learned antigenic geometry under starvation and motivates a full-density Validation III for lead-time benchmarking.
1 Executive Summary
This study is the second empirical pillar of HERALD. It operationalizes the Retrospective Time-Slice Replay protocol to test the hypothesis that geometry anticipates epidemiology when the geometry is trained only on 2020 DMS escape signals and then frozen.¹
Key Findings.
• Antigenic shift detected under starvation. The rolling-anchor statistic D_Jump triggered a statistically significant alert in South Africa, January 2022 (Omicron BA.1), despite monthly aggregation and low N per bin.
• Support-adjusted immediacy. With zero sequences in Nov–Dec 2021, absolute lead-time to WHO is undefined in this slice; however, support-adjusted latency from first supported bin to alert is ≈ 0 days (instantaneous once data appears).
• Frozen geometry, temporal generalization. The same ϕ that produced a positive separation gap on held-out DMS mutants (Validation I) generalized to 2022 data without any retraining, supporting the stability of the “distance ⇒ escape” manifold.
¹ Stage 2 PIT drift and replay definitions follow the HERALD manuscript (Methods §4.4, Protocol §5.2; metrics §5.4).
2 Methods
2.1 Data, Scope, and Binning
We use the Nextstrain open SARS-CoV-2 dataset; countries: United Kingdom, South Africa, USA; time: 2020–2023. To stress-test robustness, we aggregate by month and cap per-bin sampling (uniform) at N_max = 200 after minimum support N_min.
Table 1: Protocol Configuration (Sparse Regime)
Component        Specification
---------------  ------------------------------------------------------------
Data source      Nextstrain Open (human host; QC passed)
Regions          UK, South Africa, USA
Binning          Monthly (robust to weekly gaps)
Sampling         Uniform subsample per (country, month); N_max = 200
Underpowered     Bins with N < N_min excluded from scoring
Encoder          Frozen HERALD ϕ from Validation I checkpoint
Signals          D_Wuhan (absolute); D_Jump (episode-local)
Nulls            Fixed historical window (W = 3 months) per-country, per-signal
Normalization    Probability Integral Transform (PIT) → Z = Φ^{-1}(U)
Alert rule       First month with Z(t) > θ (one-sided)
Threshold        θ ∈ {1.5, 2.0, 2.5}; sparse default θ = 1.5
Baseline         Hamming-to-Wuhan, 90th percentile, identical sampling and PIT
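The sampling and support rules in Table 1 can be illustrated with a minimal sketch. The record layout, the function name `subsample_bins`, and the fixed seed are assumptions for illustration only; the source does not describe the actual ingestion code.

```python
import random
from collections import defaultdict

N_MIN = 10   # minimum support (value reported in the run settings)
N_MAX = 200  # per-bin cap (Table 1)

def subsample_bins(records, n_min=N_MIN, n_max=N_MAX, seed=0):
    """Group records by (country, month), flag underpowered bins, cap the rest.

    records: iterable of (country, "YYYY-MM", sequence_id) tuples.
             This layout is hypothetical, not the author's schema.
    Returns (kept, underpowered): kept maps bin -> list of sequence ids,
    underpowered is the set of bins with fewer than n_min sequences.
    Bins with zero sequences never appear in the input, so they are
    absent from both outputs.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for country, month, seq_id in records:
        bins[(country, month)].append(seq_id)

    kept, underpowered = {}, set()
    for key, ids in bins.items():
        if len(ids) < n_min:
            underpowered.add(key)      # excluded from scoring
            continue
        # Uniform subsample without replacement when the bin exceeds the cap.
        kept[key] = rng.sample(ids, n_max) if len(ids) > n_max else list(ids)
    return kept, underpowered
```

Checking minimum support before applying the cap follows the order stated in Section 2.1 (cap "after minimum support").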
2.2 Geometry and Distance
Each RBD sequence x is embedded by ϕ into z(x); distances are Euclidean in latent space. The primary anchor is Wuhan-Hu-1. For episodes, a rolling anchor is the consensus of the previously dominant lineage. Per-bin raw statistics are 90th percentiles:
\[
S_{\mathrm{Wuhan}}(c,t) = p_{90}\{\lVert z_i - z_{\mathrm{Wuhan}} \rVert_2\}, \qquad
S_{\mathrm{Jump}}(c,t) = p_{90}\{\lVert z_i - z_{\mathrm{prev}} \rVert_2\}.
\]
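As a sketch of the per-bin statistic, the following computes the 90th percentile of Euclidean distances from a set of embeddings to an anchor. The linear-interpolation percentile convention and the names `percentile` and `p90_distance` are assumptions; the source does not state which percentile estimator is used, and the embeddings here stand in for the output of ϕ.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between order statistics.

    The interpolation rule is an assumption; other conventions exist.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty bin is undefined")
    pos = (len(xs) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def p90_distance(embeddings, anchor):
    """S(c, t): 90th percentile of ||z_i - z_anchor||_2 over one bin."""
    dists = [math.dist(z, anchor) for z in embeddings]
    return percentile(dists, 0.9)
```

Passing the Wuhan-Hu-1 embedding as `anchor` gives S_Wuhan; passing the embedding of the previous dominant lineage's consensus gives S_Jump.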
2.3 PIT Normalization and Drift Scores
For a country c and statistic S, estimate a leakage-free empirical CDF F_c(s) on a fixed historical window W that ends at cut date t_c. Apply the probability integral transform to obtain U_c(t) = F_c(S(c,t)) and Gaussianize Z_c(t) = Φ^{-1}(U_c(t)). We report D_Wuhan and D_Jump separately; stacking is not used in this sparse slice.
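The PIT-to-Z step can be sketched as follows. How the null window's values are pooled and how the empirical CDF is kept away from 0 and 1 are not specified in the source; the `(rank + 0.5) / (n + 1)` plotting position below is an assumption chosen only so that Φ^{-1} stays finite, and the function name `pit_z` is hypothetical.

```python
from statistics import NormalDist

def pit_z(null_values, s):
    """Map a statistic s to a Gaussianized score using a fixed null sample.

    null_values: statistics from the historical window ending at the cut
                 date (must not include the bin being scored, so that the
                 estimate is leakage-free).
    """
    n = len(null_values)
    if n == 0:
        raise ValueError("empty null window")
    rank = sum(1 for v in null_values if v <= s)
    # Assumed plotting position: keeps u strictly inside (0, 1).
    u = (rank + 0.5) / (n + 1)
    return NormalDist().inv_cdf(u)
```

Under this particular convention the attainable |Z| is bounded by the null sample size, so the values reported in this study imply a null sample or estimator different from a three-point window scored this way.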
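2.4 Support-Adjusted Latency
Standard lead-time metrics require continuous support. We therefore report:
\[
\Delta_{\mathrm{data}} \equiv t_{\text{first-supported-bin}} - t_{\mathrm{benchmark}}, \qquad
\Delta_{\mathrm{alert}\mid\mathrm{support}} \equiv t_{\mathrm{alert}} - t_{\text{first-supported-bin}}.
\]
A small Δ_alert|support indicates that the system fires as soon as data exists.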
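A minimal sketch of the two latency terms, working at monthly bin resolution. Measuring in whole months (rather than days, as the study reports) and the helper names are assumptions made to match the monthly binning.

```python
def month_index(ym):
    """Convert "YYYY-MM" to a running month count."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)

def support_adjusted_latency(first_supported_bin, benchmark, alert):
    """Return (delta_data, delta_alert_given_support) in months.

    delta_data                = first supported bin - benchmark
    delta_alert_given_support = alert bin - first supported bin
    """
    first = month_index(first_supported_bin)
    delta_data = first - month_index(benchmark)
    delta_alert_given_support = month_index(alert) - first
    return delta_data, delta_alert_given_support

# South Africa, Omicron BA.1: benchmark month 2021-11 (WHO, Nov 26, 2021),
# first supported bin and alert bin both 2022-01.
print(support_adjusted_latency("2022-01", "2021-11", "2022-01"))  # (2, 0)
```

The second term is zero whenever the alert falls in the first bin that has enough sequences, which is the situation the study describes.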
3 Results
3.1 Automated Alert Log (South Africa)
Table 2: High-confidence alert in a sparse monthly replay (one-sided p from Z).
Signal   Alert Date   Z      p (approx)   Associated Event
-------  -----------  -----  -----------  ----------------
D_Jump   2022-01-17   2.29   0.011        Omicron BA.1
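The reported p-value follows directly from the Z-score under a standard normal reference. A short check, with the helper name `one_sided_p` chosen here for illustration:

```python
from statistics import NormalDist

def one_sided_p(z):
    """Upper-tail probability P(Z' > z) for a standard normal Z'."""
    return 1.0 - NormalDist().cdf(z)

print(round(one_sided_p(2.29), 3))  # 0.011
```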
3.2 Latency Decomposition (South Africa, Omicron BA.1)
Table 3: Separating data latency from model latency.
Quantity                       Value / Interpretation
-----------------------------  ---------------------------------------
Data coverage (Nov–Dec 2021)   0 sequences ⇒ bins unsupportable
First supported bin            2022-01 (data stream resumes)
Alert date                     2022-01-17 (first month with support)
Support-adjusted latency       Δ_alert|support ≈ 0 days (instant)
Interpretation. Absolute lead-time vs. WHO (Nov 26, 2021) cannot be computed on this slice. However, the near-zero support-adjusted latency demonstrates that the geometry triggers immediately once sequences appear, consistent with a robust detector operating under starvation.
3.3 Visual Validation
Figure 1: Traffic-light heatmap of D_Wuhan (distance from Wuhan-Hu-1). Monotone drift increases over time, as expected.
Figure 2: Traffic-light heatmap of D_Jump (rolling anchor). A distinct high-Z block appears in South Africa in early 2022, coincident with Omicron BA.1.
4 Baselines, Parity, and Power
Apples-to-apples Hamming. We compute the Hamming baseline with identical sampling (uniform within bin), identical 90th-percentile aggregation, and identical PIT-to-Z normalization. This satisfies reviewer parity and isolates geometry’s contribution.
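The baseline's raw statistic can be sketched as below, assuming the sequences are already aligned to the reference and of equal length (an assumption; the source does not describe alignment handling). The percentile convention and function names are likewise illustrative.

```python
import math

def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def percentile(values, q):
    """Percentile with linear interpolation (assumed convention)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty bin is undefined")
    pos = (len(xs) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def hamming_p90(sequences, reference):
    """Per-bin baseline: 90th percentile of Hamming distance to the reference."""
    return percentile([hamming(s, reference) for s in sequences], 0.9)
```

Feeding this statistic through the same null window and PIT-to-Z step as the latent-distance statistic is what makes the comparison like-for-like: only the distance function differs.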
Underpowered bins. We explicitly flag and exclude bins with N < N_min to avoid unstable PIT behavior. Monthly aggregation mitigates sawtooth artifacts inherent in sparse weekly slices.
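A sketch of an alert scan that applies the exclusion rule: underpowered bins are skipped, and the first supported month with Z above threshold is returned. The tuple layout and the sample counts in the usage example are made up for illustration; only the empty Nov–Dec 2021 bins and Z = 2.29 for January 2022 come from this study.

```python
def first_alert(bins, theta=1.5, n_min=10):
    """Return the first month whose Z exceeds theta, skipping underpowered bins.

    bins: list of (month, n_sequences, z) sorted by month; z may be None
          for bins that could not be scored. This layout is hypothetical.
    """
    for month, n, z in bins:
        if n < n_min:
            continue  # underpowered: flagged and excluded from scoring
        if z is not None and z > theta:
            return month
    return None

# Illustrative replay for one country (counts and early Z values are invented).
replay = [
    ("2021-10", 40, 0.4),
    ("2021-11", 0, None),
    ("2021-12", 0, None),
    ("2022-01", 35, 2.29),
]
print(first_alert(replay, theta=1.5))  # 2022-01
```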
5 Discussion
5.1 Geometry ⇒ Signal under Starvation
Validation I demonstrated a positive separation gap ∆ on held-out DMS escape pairs, implying that latent distance carries functional meaning. Validation II shows that the same geometry, without any retraining, produces an Omicron-scale jump signal the instant data becomes available. This is precisely the setting where geometry should outperform raw mutation counts.
5.2 Limitations and Next Steps
• Sparsity. With empty bins (Nov–Dec 2021), absolute lead-time metrics (LT_10%, LT_WHO) are non-identifiable. We therefore report support-adjusted latency here and reserve lead-time benchmarking for Validation III with full-density weekly data.
• Calibration scope. Thresholds (θ) and null windows were chosen for sensitivity in sparsity; operating points should be re-optimized under full coverage.
5.3 Safety and Governance
We adhere to a surveillance-only scope: no sequence optimization or ranking of hypothetical variants; coarse-grained attributions only; OOD/uncertainty abstention first; and versioned, auditable thresholds. A negative-control catalog and two-person integrity are used when elevating beyond Tier 1 (watch).
6 Conclusion
Validation II confirms that HERALD’s learned geometry is operationally robust in a low-data environment: it fires a correct jump alert promptly once sequences appear, with near-zero support-adjusted latency. A full-density Validation III will fairly adjudicate absolute lead-time vs. WHO and 10% benchmarks while retaining the same frozen geometry and PIT apparatus.
Reproducibility (Artifacts & Settings)
Python scripts implement: Nextstrain ingestion; uniform subsampling; ESM → ϕ embedding; S-statistics (90th percentile); PIT-to-Z using leakage-free fixed windows; alert scans; and heatmaps. Episode anchors are specified via a JSON file (country, start date, pre-window, lineage). Key parameters for this run: agg=month, N_min=10, N_max=200, null_mode=fixed_window, W=3 months, and θ ∈ {1.5, 2.0, 2.5}.
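For orientation, the run settings and an episode-anchor file might be represented as below. The dictionary keys, the JSON field names, and the `EpisodeAnchor` class are hypothetical; the source lists only the four anchor fields and the parameter values, not the actual schema, and the lineage value in the usage example is a placeholder.

```python
import json
from dataclasses import dataclass

# Parameter values as reported for this run; key names are illustrative.
RUN_CONFIG = {
    "agg": "month",
    "n_min": 10,
    "n_max": 200,
    "null_mode": "fixed_window",
    "window_months": 3,
    "thetas": [1.5, 2.0, 2.5],
}

@dataclass(frozen=True)
class EpisodeAnchor:
    """One episode entry; field names are assumed, not the author's schema."""
    country: str
    start_date: str
    pre_window: int
    lineage: str

def load_anchors(text):
    """Parse a JSON array of anchor objects into EpisodeAnchor records."""
    return [EpisodeAnchor(**item) for item in json.loads(text)]

example = """[
  {"country": "South Africa", "start_date": "2022-01",
   "pre_window": 3, "lineage": "<previous-dominant-lineage>"}
]"""
anchors = load_anchors(example)
```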