
Uncertainty Quantification of Prediction Models for Differential Expression Analysis

Author: Seiler, Christof
Publisher: Zenodo
DOI: 10.5281/zenodo.17541260
Source: https://zenodo.org/records/17541260/files/2025_ZueKoSt.pdf
Christof Seiler, https://christofseiler.github.io
Department of Rheumatology, USZ/UZH, and
Department of Advanced Computing Sciences, Maastricht University
ZueKoSt: Seminar on Applied Statistics, Zurich, November 6, 2025
My Plan for Today
• Part 1: Modeling
• Part 2: Prediction
Part 1: Modeling
Differential Protein Marker Expression
[Figure: data table with cells as rows and features as columns. The feature columns are marker counts (about 10 non-gating markers, used to compare cell types between conditions) and cell info (cell type: T, B, NK; donor: 1, 2; condition: stim, unstim).]
Are there any differentially expressed protein markers between experimental conditions?
Common Analysis Workflows
1. Response variable: mean counts and cell type abundance
• R package diffcyt (Weber et al. 2019):
High-resolution clustering and empirical Bayes moderated tests adapted from transcriptomics and univariate marginal analyses
• F1000 CyTOF Workflow (Nowicka et al. 2017):
Manual gating and univariate marginal analyses
2. Response variable: experimental condition
• R package Citrus (Bruggner et al. 2014):
Hierarchical clustering and regularized regression to select predictive features
• Python package CellCnn (Arvaniti and Claassen 2017):
Convolutional neural networks to detect rare cell populations
• Limitations: Univariate marginal analyses and/or no statistical guarantees

Marginal vs. Conditional
• Common to analyze biological data with marginal regression (as in diffcyt (Weber et al. 2019))
• Why?
• Can produce p-values for each protein marker, then use BH (Benjamini and Hochberg 1995) and BY (Benjamini and Yekutieli 2001) procedures to control false discovery rate (FDR)
• Easy to interpret
• Fast
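The FDR step mentioned above can be sketched in a few lines. This is a minimal illustration, not the diffcyt implementation: the function name `fdr_step_up`, the `dependence` flag, and the example p-values are all hypothetical. With `dependence=True` the thresholds are divided by the harmonic sum, which is the BY correction.

```python
import numpy as np

def fdr_step_up(p_values, q=0.05, dependence=False):
    """Boolean mask of discoveries under BH (or BY if dependence=True)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    if dependence:
        # BY: shrink thresholds by the harmonic sum 1 + 1/2 + ... + 1/m
        thresholds = thresholds / np.sum(1.0 / np.arange(1, m + 1))
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True     # reject everything up to that rank
    return reject

# Hypothetical p-values, one per protein marker
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.9]
bh = fdr_step_up(p_vals, q=0.05)
by = fdr_step_up(p_vals, q=0.05, dependence=True)
```

BY is more conservative than BH on the same p-values, which is the price for validity under arbitrary dependence between markers.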
Marginal vs. Conditional
• But marginal fitted coefficients might give the wrong answers
• Example (Wakefield 2013): Suppose the "true" model is
$E(Y \mid X = x, Z = z) = \beta_0 + \beta_1 x + \beta_2 z$
$E(Z \mid X = x) = a + b x$
• Then what if we regress $Y$ on $x$?
$E(Y \mid X = x) = \beta_0 + a \beta_2 + (\beta_1 + b \beta_2) x$
• Biased if $\mathrm{Cov}(X, Z) \neq 0$ and $Y$ conditionally dependent on $Z$
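A small simulation makes the bias in the Wakefield example concrete. It is only a sketch: the coefficient values, the Gaussian noise, and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0  # hypothetical "true" coefficients
a, b = 0.5, -0.4                     # hypothetical E(Z | X = x) = a + b x

x = rng.normal(size=n)
z = a + b * x + rng.normal(size=n)
y = beta0 + beta1 * x + beta2 * z + rng.normal(size=n)

# Marginal regression: Y on x only (polyfit returns slope first)
slope_marg, intercept_marg = np.polyfit(x, y, deg=1)

# Conditional regression: Y on x and z
design = np.column_stack([np.ones(n), x, z])
coef_cond, *_ = np.linalg.lstsq(design, y, rcond=None)

print("marginal slope   :", slope_marg, "expected", beta1 + b * beta2)
print("conditional slope:", coef_cond[1], "expected", beta1)
```

The marginal slope estimates $\beta_1 + b \beta_2$ rather than $\beta_1$, so with these values it lands near 0.8 instead of 2.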
Our R Package cytoeffect
Multivariate Poisson Log-Normal Model with Zero Inflation
• Model for protein markers $Y_1, \dots, Y_K$ given experimental condition $X = x$
• Zero inflation:
  • $Y_{ijk}$ are counts in cell $i$, patient $j$, and protein marker $k$
  • Flip a biased coin which lands Heads with probability $\pi_{jk}$
  • If it comes up Heads, then set $Y_{ijk} = 0$, otherwise $Y_{ijk} \sim \mathrm{Poisson}(\lambda_{ijk})$
• Multivariate Poisson (Chib and Winkelmann 2001):
$\log(\lambda_{ijk}) = \beta_{\mathrm{cond}[i]k} + b_{ik} + u_{jk}$
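To see what data from such a model look like, the generative story above can be simulated directly. This is a sketch under assumptions, not cytoeffect code: the sizes, the values of `beta` and `pi`, and the choice of multivariate normal distributions with covariances `Sigma_b` (cell level) and `Sigma_u` (patient level) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_cells, K = 4, 2000, 3   # hypothetical sizes

beta = np.array([[1.0, 2.0, 0.5],     # condition 0: log-mean per marker
                 [1.5, 2.0, 0.5]])    # condition 1
Sigma_b = 0.3 * np.eye(K) + 0.1       # cell-level covariance (assumed)
Sigma_u = 0.2 * np.eye(K)             # patient-level covariance (assumed)
pi = np.full((n_patients, K), 0.1)    # zero-inflation probabilities pi_jk

patient = rng.integers(0, n_patients, size=n_cells)  # patient j of cell i
cond = rng.integers(0, 2, size=n_cells)              # condition of cell i

u = rng.multivariate_normal(np.zeros(K), Sigma_u, size=n_patients)  # u_j
b = rng.multivariate_normal(np.zeros(K), Sigma_b, size=n_cells)     # b_i

# log(lambda_ijk) = beta_cond[i]k + b_ik + u_jk
log_lam = beta[cond] + b + u[patient]
counts = rng.poisson(np.exp(log_lam))

# Biased coin: Heads (probability pi_jk) forces a zero
heads = rng.random(counts.shape) < pi[patient]
y = np.where(heads, 0, counts)
```

The off-diagonal entries of `Sigma_b` are what make the markers correlated within a cell, which is the feature a marginal, marker-by-marker analysis ignores.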
Low-Dimensional Summary of Posterior Samples
Step 1: Compute distance matrix for each posterior draw of λ:
[Figure: schematic of Step 1. Recoverable labels: median, log, patients, protein markers, Euclidean distance, distance matrix D.]
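Step 1 can be sketched as one Euclidean distance matrix per posterior draw. What exactly is compared is an assumption here: the sketch takes each row to be one point (for example a patient-condition summary such as a median of log λ) and each column to be a protein marker, and it uses random numbers as a stand-in for real posterior draws.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws, n_points, n_markers = 50, 6, 10  # hypothetical sizes

# Stand-in for posterior draws: draws[s] holds one row per point
# and one column per protein marker.
draws = rng.normal(size=(n_draws, n_points, n_markers))

def pairwise_euclidean(points):
    """Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# One distance matrix per posterior draw: shape (n_draws, n_points, n_points)
D = np.stack([pairwise_euclidean(d) for d in draws])
```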

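Low-Dimensional Summary of Posterior Samples
Step 2: DiSTATIS (Abdi et al. 2005) on posterior distance matrices $D^{(1)}, \dots, D^{(n)}$:
[Figure: schematic of Step 2. DiSTATIS combines the distance matrices into one low-dimensional map with points for patient A, patient B, and patient C.]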
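The core of DiSTATIS can be written compactly with NumPy. This is a simplified sketch of the steps described by Abdi et al. (2005), with equal masses for all points and no projection of the individual matrices onto the compromise; the function name `distatis` and the toy data are hypothetical, and this is not the code behind the plots in this talk.

```python
import numpy as np

def distatis(D):
    """D: (T, n, n) Euclidean distance matrices -> weights, compromise, scores."""
    T, n, _ = D.shape
    C = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    S = np.empty_like(D, dtype=float)
    for t in range(T):
        St = -0.5 * C @ (D[t] ** 2) @ C          # cross-product matrix
        S[t] = St / np.linalg.eigvalsh(St)[-1]   # scale by largest eigenvalue
    flat = S.reshape(T, -1)
    inner = flat @ flat.T                        # trace(S_t S_t') for all pairs
    norms = np.sqrt(np.diag(inner))
    rv = inner / np.outer(norms, norms)          # RV coefficients
    first = np.abs(np.linalg.eigh(rv)[1][:, -1]) # leading eigenvector
    alpha = first / first.sum()                  # weights summing to one
    compromise = np.tensordot(alpha, S, axes=1)  # weighted average of S_t
    evals, evecs = np.linalg.eigh(compromise)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1e-10
    scores = evecs[:, keep] * np.sqrt(evals[keep])  # factor scores
    return alpha, compromise, scores

def pairwise(points):
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Toy input: 20 noisy versions of the same 6-point configuration
rng = np.random.default_rng(3)
base = rng.normal(size=(6, 2))
D = np.stack([pairwise(base + 0.1 * rng.normal(size=base.shape))
              for _ in range(20)])
alpha, compromise, scores = distatis(D)
```

The first two columns of `scores` play the role of Factor 1 and Factor 2 in a DiSTATIS map.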
Real Dataset: Pregnancy Study
[Figure: Posterior DiSTATIS of Latent Variable λ. Axes: Factor 1 and Factor 2. Labeled points: protein markers pCREB, pSTAT5, pP38, pSTAT1, pSTAT3, prpS6, pMAPKAPK, IkB, pNFkB, pERK1_2, and donor labels A to P. Legend: term (1st trimester, 3rd trimester) and donor (A = PTLG001, B = PTLG002, C = PTLG003, D = PTLG004, E = PTLG005, F = PTLG007, G = PTLG008, H = PTLG009, I = PTLG010, J = PTLG012, K = PTLG018, L = PTLG019, M = PTLG020, N = PTLG022, O = PTLG024, P = PTLG029).]
• Possible problem: MCMC sampler (as in Stan) will take too long
Parametric Bootstrap DiSTATIS
1) Fit Poisson log-normal model for each patient-condition combination independently using pairwise composite likelihood (Lindsay 1988; Fieuws and Verbeke 2006; Molenberghs and Verbeke 2006) to obtain $\hat{\beta}$ and $\hat{\Sigma}$
2) Generate parametric bootstrap samples from the model:
$Y_{ik} \mid \lambda_{ik} \sim \mathrm{Poisson}(\lambda_{ik})$,
$\log(\lambda_{ik}) = \hat{\beta}_k + b_{ik}$,
$b_i \sim \mathrm{Normal}(0, \hat{\Sigma})$.
3) Replace posterior samples with bootstrap samples to create DiSTATIS plot
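Step 2 of this recipe is straightforward to sketch once estimates are available. The values of `beta_hat` and `Sigma_hat` below are hypothetical placeholders for one patient-condition combination; the composite likelihood fit of step 1 is not shown.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical fitted values for one patient-condition combination
beta_hat = np.array([1.0, 2.0, 0.5])
Sigma_hat = np.array([[0.4, 0.1, 0.0],
                      [0.1, 0.3, 0.1],
                      [0.0, 0.1, 0.5]])
n_cells, n_boot = 500, 100

def bootstrap_sample(beta_hat, Sigma_hat, n_cells, rng):
    """Draw one parametric bootstrap dataset of shape (n_cells, K)."""
    K = beta_hat.size
    b = rng.multivariate_normal(np.zeros(K), Sigma_hat, size=n_cells)  # b_i
    lam = np.exp(beta_hat + b)          # log(lambda_ik) = beta_hat_k + b_ik
    return rng.poisson(lam)             # Y_ik | lambda_ik ~ Poisson(lambda_ik)

boot = np.stack([bootstrap_sample(beta_hat, Sigma_hat, n_cells, rng)
                 for _ in range(n_boot)])
```

Each bootstrap dataset would then be summarized in the same way as a posterior draw before being passed to DiSTATIS. Because every patient-condition combination is handled independently, the loop parallelizes trivially, which is the appeal over a long MCMC run.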
Comparison: Posterior vs. Parametric Bootstrap DiSTATIS
[Figure, two panels with the same layout: axes Factor 1 and Factor 2; labeled points for the protein markers (pCREB, pSTAT5, pP38, pSTAT1, pSTAT3, prpS6, pMAPKAPK, IkB, pNFkB, pERK1_2) and donor labels A to P; legend for term (1st trimester, 3rd trimester) and donor (A to P, PTLG001 to PTLG029). First panel: Posterior DiSTATIS of Latent Variable λ. Second panel: Parametric Bootstrap DiSTATIS of Latent Variable λ.]
Conclusions
• Multivariate models:
Describe marker correlations to avoid biases
• Mixed models:
Describe individual patient effects to avoid reporting overconfident results
• R package cytoeffect with vignettes:
https://christofseiler.github.io/cytoeffect/

Part 2: Prediction
Joint Work with
Justine Leclerc (PhD Candidate)
Counterfactual Prediction
[Figure: science table with columns cell, treated T, control C, and cell treatment effect. For cell 1 the treated outcome is observed and the control outcome is predicted; for cell 2 the treated outcome is predicted and the control outcome is observed.]
Step 4: Download Model with Calibrated Prediction Intervals
[Figure: two plots of target Y against predictor X. Calibrating intervals on the dataset turns the model into a model with prediction intervals; points outside the intervals are marked as not covered.]

RAI Platform: Reliable AI for Biomedicine
1) Upload prediction model
2) Upload dataset
3) Calibrate model on dataset
4) Download model with prediction intervals
[Figure: four panels, one per step, each with axes labeled target variable and features.]
Funded by the Digital Society Initiative Infrastructure and Lab Program in Zurich
Split Conformal Prediction
Split the data into two sets:
• $D_1$, called the proper training set, of size $n_1 = |D_1|$
• $D_2$, called the calibration set, of size $n_2 = |D_2|$
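Procedure
1. Fit prediction model on $D_1$ to obtain $\hat{f}_{n_1}$
2. Compute calibration set residuals
$R_i = |Y_i - \hat{f}_{n_1}(X_i)|, \quad i \in D_2$
3. Compute conformal quantile
$\hat{q}_{n_2} = \lceil (1-\alpha)(n_2+1) \rceil$-th smallest of $R_i$, $i \in D_2$
4. Compute conformal set
$\hat{C}_n(x) = [\hat{f}_{n_1}(x) - \hat{q}_{n_2},\ \hat{f}_{n_1}(x) + \hat{q}_{n_2}]$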
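The four steps of the procedure translate almost line by line into code. The sketch below assumes a simple least-squares line as the prediction model and simulated data; any other model could take its place, and none of the names come from conformeR.

```python
import math
import numpy as np

rng = np.random.default_rng(5)
n, alpha = 1000, 0.1

# Simulated data (hypothetical): linear signal plus Gaussian noise
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Split into proper training set D1 and calibration set D2
idx = rng.permutation(n)
d1, d2 = idx[: n // 2], idx[n // 2:]

# 1. Fit prediction model on D1 (polyfit returns slope first)
slope, intercept = np.polyfit(x[d1], y[d1], deg=1)

def f_hat(x_new):
    return intercept + slope * np.asarray(x_new)

# 2. Calibration set residuals
residuals = np.abs(y[d2] - f_hat(x[d2]))

# 3. Conformal quantile: the k-th smallest residual
n2 = d2.size
k = math.ceil((1 - alpha) * (n2 + 1))
q_hat = np.sort(residuals)[k - 1] if k <= n2 else np.inf

# 4. Conformal set for new inputs
def conformal_interval(x_new):
    center = f_hat(x_new)
    return center - q_hat, center + q_hat

# Empirical check on fresh data from the same distribution
x_test = rng.uniform(-2, 2, size=5000)
y_test = 1.0 + 2.0 * x_test + rng.normal(size=5000)
lower, upper = conformal_interval(x_test)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print("empirical coverage:", coverage)
```

The interval has the same width at every `x`, which reflects the use of absolute residuals as the score; other score functions give intervals that adapt to the input.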
Finite-Sample Guarantees
This procedure guarantees (Vovk, Gammerman, and Shafer 2005):
$P(Y_{n+1} \in \hat{C}(X_{n+1}) \mid (X_i, Y_i), i \in D_1) \ge 1 - \alpha$.
Remarks:
• Replace the residuals with other score functions (lower meaning better)
• Only marginal coverage (Foygel Barber et al. 2021)
Our R Package conformeR
Our Idea
[Figure: schematic of the idea. Recoverable labels: samples, treated, control, observed, predicted, p-values, "matrix constructed with conformal prediction" (marked NOVELTY).]
• R package conformeR under development:
https://github.com/juslecl/conformeR
Open Problems
• Post-treatment variables
• Distribution shift
• Conditional coverage
Center of Experimental Rheumatology at USZ/UZH
Alexand a Khmele skaya Alexand e Meie
Alissa Weibel Amela Huka a
And ea Laimbache And ea Nüesch
Anna-Ma ia Ho mann-Vold Asimina Kakale
As id Ho man Bojana Mülle Du o ic
Camino Cal o Ca oline Ospel
Celina Geiss Danilo Menghini
Ellen Kossmann Elena Pache a
Ezen Ege Gab iela Kania
Geo gina Ma his-Pai o Gino Bonazza
Ie geniia Koche o a Jan De an
Jus ine Lecle c Ka e ina Apos olopoulou
Lau a Much Leyi Zhang
Lumeng Li Ma ija Luga
Ma yam Asadiko ayem Masoume Halsada Mi a
Mu iel Elhai Nicole Schneide
Oli e Dis le Pamela Bi e li
Pe e Künzle Phelipe Ha
Pha hamon Laphanuwa Philip S au e
Pie o Bea zi P zemek Blyszczuk
Raphael Miche oli Sey am Duphey
Shao Thing Teoh Silja Malkewi z
S e an Dudli Tama a Mengis
Thanks for Listening!
RAI Platform: Reliable AI for Biomedicine:
https://rai.uzh.ch
Funded by the Digital Society Initiative of UZH
Please contact us if you'd like to conformalize your models or help us solve the open problems!
Abdi, Hervé, Alice J O'Toole, Dominique Valentin, and Betty Edelman. 2005. "DISTATIS: The Analysis of Multiple Distance Matrices." In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)-Workshops, 42–42. IEEE.
Arvaniti, Eirini, and Manfred Claassen. 2017. "Sensitive Detection of Rare Disease-Associated Cell Subsets via Representation Learning." Nature Communications 8: 14825.
Benjamini, Yoav, and Yosef Hochberg. 1995. "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300.
Benjamini, Yoav, and Daniel Yekutieli. 2001. "The Control of the False Discovery Rate in Multiple Testing Under Dependency." Annals of Statistics, 1165–88.
Bruggner, Robert V, Bernd Bodenmiller, David L Dill, Robert J Tibshirani, and Garry P Nolan. 2014. "Automated Identification of Stratifying Signatures in Cellular Subpopulations." Proceedings of the National Academy of Sciences 111 (26): E2770–77.
Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. "Stan: A Probabilistic Programming Language." Journal of Statistical Software 76 (1).
Chib, Siddhartha, and Rainer Winkelmann. 2001. "Markov Chain Monte Carlo Analysis of Correlated Count Data." Journal of Business & Economic Statistics 19 (4): 428–35.
Eaton, Morris L. 1989. "Chapter 7: Random Orthogonal Matrices." In Group Invariance in Applications in Statistics, Volume 1:100–107. Regional Conference Series in Probability and Statistics. Hayward CA; Alexandria VA: Institute of Mathematical Statistics; American Statistical Association. https://projecteuclid.org/euclid.cbms/1462061037.
Fieuws, Steffen, and Geert Verbeke. 2006. "Pairwise Fitting of Mixed Models for the Joint Modeling of Multivariate Longitudinal Profiles." Biometrics 62 (2): 424–31.
Foygel Barber, Rina, Emmanuel J Candes, Aaditya Ramdas, and Ryan J Tibshirani. 2021. "The Limits of Distribution-Free Conditional Predictive Inference." Information and Inference: A Journal of the IMA 10 (2): 455–82.
Jauch, Michael, Peter D Hoff, and David B Dunson. 2019. "Monte Carlo Simulation on the Stiefel Manifold via Polar Expansion." arXiv Preprint arXiv:1906.07684.
Lewandowski, Daniel, Dorota Kurowicka, and Harry Joe. 2009. "Generating Random Correlation Matrices Based on Vines and Extended Onion Method." Journal of Multivariate Analysis 100 (9): 1989–2001.
Lindsay, Bruce G. 1988. "Composite Likelihood Methods." Contemporary Mathematics 80 (1): 221–39.
Molenberghs, Geert, and Geert Verbeke. 2006. Models for Discrete Longitudinal Data. Springer-Verlag New York.
Nowicka, M, C Krieg, LM Weber, FJ Hartmann, S Guglietta, B Becher, MP Levesque, and MD Robinson. 2017. "CyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry Datasets [Version 2; Referees: 2 Approved]." F1000Research 6 (748). https://doi.org/10.12688/f1000research.11622.2.
Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. 2005. Algorithmic Learning in a Random World. Vol. 29. Springer.
Wakefield, Jon. 2013. Bayesian and Frequentist Regression Methods. Springer Series in Statistics. Springer, New York. https://doi.org/10.1007/978-1-4419-0925-1.
Weber, Lukas M, Malgorzata Nowicka, Charlotte Soneson, and Mark D Robinson. 2019. "Diffcyt: Differential Discovery in High-Dimensional Cytometry via High-Resolution Clustering." Communications Biology 2 (1): 183.