CARLA 2025
LUAD-SynthNet: Generative Adversarial Networks for Synthetic
Single-Cell Transcriptomics in Lung Adenocarcinoma
Motivation:
Data:
Data collection was conducted using a highly reliable, publicly accessible source: the National Library of Medicine
(NLM). Specifically, the GSE154826 dataset was accessed, which is available through the Gene Expression Omnibus
(GEO) database of the National Center for Biotechnology Information (NCBI). This catalog is a global reference for
gene expression profiling studies, ensuring that the data used in this research come from a reputable repository with
rigorous validation.
The scarcity of biological samples and the high variability of tumor microenvironments present significant challenges to
building robust models in cancer research. In response, this study proposes LUAD-SynthNet, a generative framework for
producing synthetic single-cell RNA-seq (scRNA-seq) gene expression profiles representative of lung adenocarcinoma
(LUAD) cells.
Goal:
The goal is to expand existing datasets with high-fidelity
synthetic data using a Generative Adversarial Network (GAN),
to facilitate the training of new prediction models and
hypothesis generation in cancer research.
Conclusion
Methodology
Joaquín Araya Bustos¹, Welinton Barrera Mondaca¹, Renato Álvarez Ramos¹, Claudia Cancino Quiroz¹,
Jorge Vergara-Quezada², Ana Moya-Beltrán²
¹ Escuela de Informática, Facultad de Ingeniería, Universidad Tecnológica Metropolitana, Santiago, Chile.
² Departamento de Informática y Computación, Facultad de Ingeniería, Universidad Tecnológica Metropolitana, Santiago, Chile.
Methodological Limitation: The Instability of GAN Architectures
The inherent instability of generative models forced us to
scale up the complexity of the methodology.
Failure of Standard Models: A standard GAN failed
completely due to mode collapse. This forced us to
discard simple approaches and invest time and
computational resources into implementing a more
advanced and stable architecture, the WGAN-GP.
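What distinguishes WGAN-GP from a standard GAN is its gradient penalty, which keeps the critic's gradient norm close to 1 on points interpolated between real and synthetic samples. The sketch below shows one common PyTorch formulation. The function names, the toy critic, and the penalty weight `lambda_gp=10.0` are assumptions made for illustration and are not taken from the poster.

```python
import torch
from torch import nn


def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on interpolated samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)  # one mixing weight per sample
    mixed = (eps * real.detach() + (1.0 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(mixed)
    grads, = torch.autograd.grad(
        outputs=scores,
        inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty itself can be backpropagated
    )
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()


def critic_loss(critic, real, fake, lambda_gp=10.0):
    """Wasserstein critic loss plus the gradient penalty (lambda_gp is an assumed weight)."""
    fake = fake.detach()  # the critic update must not backpropagate into the generator
    wasserstein = critic(fake).mean() - critic(real).mean()
    return wasserstein + lambda_gp * gradient_penalty(critic, real, fake)


torch.manual_seed(0)
# Toy critic and toy batches, used only to exercise the loss
toy_critic = nn.Sequential(nn.Linear(16, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
real = torch.randn(8, 16)
fake = torch.randn(8, 16)
loss = critic_loss(toy_critic, real, fake)
loss.backward()
```

Because the penalty constrains the critic's gradients directly, this formulation is usually trained without weight clipping.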
Preprocessing Limitation: The Extreme Nature of
Genomic Data
The main technical barrier was not the model itself, but
the distribution of the input data.
Extremely Skewed Data: The high dispersion of gene
expression data (with most values being zero) created
a "learning landscape" that was nearly impossible for
the GAN to navigate.
Need for Advanced Techniques: Standard
normalization techniques were not enough. The
project's success depended on implementing a non-trivial
and computationally more expensive solution.
The Quantile Transformer was applied to transform the
data into a distribution that facilitates effective model
learning.
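As an illustration of the quantile transform step, the sketch below applies scikit-learn's `QuantileTransformer` to a toy zero-inflated matrix. The toy data, `n_quantiles=200`, and the choice of a normal output distribution are assumptions; the poster does not state which settings were used.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# Toy stand-in for a cells x genes matrix: mostly zeros with a long right tail
counts = rng.poisson(lam=0.3, size=(500, 20)) * rng.exponential(5.0, size=(500, 20))
expr = np.log1p(counts)

# Map each gene (column) onto a smooth target distribution, one column at a time
qt = QuantileTransformer(n_quantiles=200, output_distribution="normal", random_state=0)
expr_t = qt.fit_transform(expr)

# The fitted transformer can map values (e.g. generated samples) back to expression space
expr_back = qt.inverse_transform(expr_t)
```

Keeping the fitted transformer around matters in practice: synthetic profiles are produced in the transformed space and need `inverse_transform` before they can be compared with real expression values.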
Limitation in Evaluation and Validation
Finally, a limitation inherent to the entire field of
generative models is the lack of a single "success" metric.
Multifaceted Validation: It is necessary to combine
multiple analyses (PCA, KDE, correlation heatmaps,
error metrics, etc.) to build a convincing case for the
model's fidelity.
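A few of these checks can be condensed into numbers. The sketch below computes the per-gene mean error, the per-gene variance ratio, and the average gap between correlation matrices. The function name, the chosen summaries, and the toy data are illustrative assumptions, not the poster's actual evaluation code.

```python
import numpy as np


def fidelity_report(real, synth):
    """Summarise per-gene agreement between real and synthetic cells x genes matrices."""
    mean_abs_err = np.abs(real.mean(axis=0) - synth.mean(axis=0)).mean()
    # Ratio below 1 means the synthetic data under-represents that gene's variance
    var_ratio = synth.var(axis=0) / (real.var(axis=0) + 1e-12)
    corr_gap = np.abs(
        np.corrcoef(real, rowvar=False) - np.corrcoef(synth, rowvar=False)
    ).mean()
    return {
        "mean_abs_error": float(mean_abs_err),
        "median_variance_ratio": float(np.median(var_ratio)),
        "mean_corr_gap": float(corr_gap),
    }


rng = np.random.default_rng(0)
real = rng.normal(size=(300, 5))
synth = 0.5 * rng.normal(size=(300, 5))  # deliberately under-dispersed toy "synthetic" data
report = fidelity_report(real, synth)
```

A median variance ratio well below 1, as in this toy case, is the numerical signature of a generator that underestimates cellular heterogeneity even when means and correlations look acceptable.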
Limitations of the Study
Contact:
Jarayabu@utem.cl
amoya@utem.cl
Variance is not simply statistical "noise"; it reflects the diversity of functional
states, cell cycles, and phenotypic responses that exist in a population of real
cells.
The graph not only exhibits
a numerical difference; it
also exposes a fundamental
and systematic limitation
of the generative model: its
inability to replicate cellular
heterogeneity.
This graph describes a
model that has successfully
learned the "who's who" of
the transcriptome, but still
hasn't mastered the
"quantum" of its
expression, revealing
quantitative biases that limit
its direct application.
1.
Acknowledgement:
Departamento de Informática y Computación, UTEM; Escuela de Informática, UTEM; Laboratorio de Investigación Aplicada,
Departamento de Informática y Computación, UTEM. This work was supported in part by “Competition for Research Assistant Funding
UTEM”, year 2024, code AI25-11, and in part by the “Scientific and Technological Equipment Projects Competition”, year 2024, code
LE24-03.
2.
The graph shows the
distribution of expression
values (logarithmic scale)
for several relevant genes,
comparing real data
(green) with synthetic data
(blue). Each point
represents an individual
cell.
Side-by-side visualization allows for visual assessment of the fidelity of
synthetic data with respect to real patterns. The high concordance in the
shape, range, and dispersion of the distributions suggests that the synthetic
data correctly capture the key statistical characteristics of the original data,
which is essential for applications in computational analysis and model
validation.
This work successfully establishes a robust pipeline for generating LUAD
genomic data using a WGAN-GP and advanced preprocessing. The results
demonstrate high visual fidelity, replicating the global structure, individual
distributions, and complex interdependencies between genes (Correlation
Heatmaps).
However, quantitative metrics reveal subtle overfitting and underestimation
of variance in certain genes, limitations expected given the size of the
training dataset. These findings, while indicating that the model is "fair" in
terms of statistical perfection, are crucial and directly guide our future work.
The samples used in this study were classified according
to their origin and the patient's diagnosis:
a) patients with lung adenocarcinoma (LUAD) and
b) patients with lung squamous cell carcinoma (LUSC).
Two types of samples were obtained per patient: primary
tumor tissue and tissue from a different anatomical site.
The study was limited to the analysis of
37 lung adenocarcinoma (LUAD)
samples, selected from a total set of 49
samples.
Generator
Input: A random noise vector (100 dims) that acts as a seed.
Function: An expander network that transforms the noise into a synthetic genetic
profile (2,404 dims). It uses Linear, BatchNorm1d (for stability), and LeakyReLU
layers. The final Tanh layer scales the output to the range [-1, 1].
Critic
Input: A genetic profile (2,404 dims), either real or synthetic.
Function: A reducer network that evaluates the sample and compresses it into a
single realism score. It uses Linear, LeakyReLU, and Dropout (for regularization)
layers. Crucially, the output layer has no activation function.
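The two networks described above can be sketched in PyTorch as follows. The input and output sizes (100 and 2,404) and the layer types come from the poster. The hidden widths, the LeakyReLU slope of 0.2, and the dropout rate of 0.3 are assumptions, since the poster does not report them.

```python
import torch
from torch import nn

NOISE_DIM = 100      # noise vector size stated in the poster
N_GENES = 2404       # profile size stated in the poster
HIDDEN = (256, 512)  # assumption: hidden widths are not given in the poster


class Generator(nn.Module):
    """Expands a noise vector into a synthetic expression profile in [-1, 1]."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, HIDDEN[0]),
            nn.BatchNorm1d(HIDDEN[0]),
            nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN[0], HIDDEN[1]),
            nn.BatchNorm1d(HIDDEN[1]),
            nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN[1], N_GENES),
            nn.Tanh(),  # bounds every output value to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)


class Critic(nn.Module):
    """Compresses a profile into one unbounded realism score."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_GENES, HIDDEN[1]),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(HIDDEN[1], HIDDEN[0]),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(HIDDEN[0], 1),  # no activation on the output layer
        )

    def forward(self, x):
        return self.net(x)


generator, critic = Generator(), Critic()
z = torch.randn(8, NOISE_DIM)
fake = generator(z)    # shape: (8, 2404)
scores = critic(fake)  # shape: (8, 1)
```

Leaving the final layer without an activation lets the same network emit raw logits for a logit-based loss or unbounded scores for a Wasserstein-style objective.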
Both networks were trained
competitively, using the Adam
optimizer and the
BCEWithLogitsLoss loss
function, until the Generator
became so good that the
Discriminator could no longer
distinguish fake data from real
data.
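One alternating update of this adversarial scheme might look like the sketch below. It uses deliberately small stand-in networks, and the learning rate, Adam betas, and layer sizes are assumptions rather than the poster's settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-in networks; sizes are illustrative, not the poster's 100 -> 2,404 layout
G = nn.Sequential(nn.Linear(10, 32), nn.LeakyReLU(0.2), nn.Linear(32, 20), nn.Tanh())
D = nn.Sequential(nn.Linear(20, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()  # expects raw logits, so D has no final sigmoid


def train_step(real):
    """Run one discriminator update followed by one generator update."""
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: label real samples 1 and generated samples 0
    fake = G(torch.randn(batch, 10)).detach()
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator label fresh fakes as real
    loss_g = bce(D(G(torch.randn(batch, 10))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()


real_batch = torch.rand(16, 20) * 2 - 1  # toy "real" data already scaled to [-1, 1]
loss_d, loss_g = train_step(real_batch)
```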
Metric Conflict: Heatmaps showed resounding
success, while the more stringent quantitative test
revealed the problem of overfitting. This demonstrates
the limitation of relying solely on visual inspection and
the need for more rigorous testing.
Direct Comparison of Mean Expression
Variance Comparison per Gene
Step 1: Initial Setup. Objective: Generate cancer data with a GAN
Failure 1: Data loading error 'ValueError: num_samples=0'
Diagnosis: dropna() removes all rows
Solution: Implement a robust data_loader
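The failure mode in Step 1 is easy to reproduce: when every row contains at least one missing value, a bare `dropna()` returns an empty table. The sketch below contrasts that with a more tolerant loader. The function name `load_expression`, the gene names, and the decision to fill remaining gaps with zero are hypothetical choices for illustration, not the authors' actual data_loader.

```python
import numpy as np
import pandas as pd


def load_expression(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical loader: clean a cells x genes table without discarding every row."""
    numeric = df.apply(pd.to_numeric, errors="coerce")  # non-numeric entries become NaN
    numeric = numeric.dropna(axis=1, how="all")  # drop genes with no usable values
    numeric = numeric.dropna(axis=0, how="all")  # drop cells with no usable values
    cleaned = numeric.fillna(0.0)  # assumption: a missing value means "not detected"
    if cleaned.empty:
        raise ValueError("no usable samples after cleaning")
    return cleaned


raw = pd.DataFrame(
    {
        "GENE_A": [1.0, np.nan, 3.0],
        "GENE_B": [np.nan, 2.0, np.nan],
        "GENE_C": [np.nan, np.nan, np.nan],  # entirely empty column
    }
)

naive = raw.dropna()            # every row has at least one NaN, so nothing survives
cleaned = load_expression(raw)  # keeps all three cells and the two informative genes
```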
Step 2: Train Standard GAN
Failure 2: Mode collapse. PCA/KDE show separate clusters
Diagnosis: The standard GAN is too unstable
Step 3: Implement WGAN-GP
Step 4: Train WGAN-GP
Failure 3: The failure persists. Same visual results as in Failure 2
Diagnosis: The problem is not the model, but the data distribution
Step 5: Hypothesis: The data distribution must be 'smoothed'
Solution: Use QuantileTransformer to normalize the distribution
Step 6: Train WGAN-GP with transformed data
Success! PCA, KDE, and Heatmap show excellent overlap and correlation