GenomeDefender: Validated High-Precision Detection of Data Poisoning Attacks in
Single-Cell RNA-seq Data using a Multi-Model Ensemble
Authors: Renato Álvarez Ramos, Welinton Barrera Mondaca1, Joaquín Araya Bustos1,
Claudia Cancino Quiroz1, Victor Escobar Jeria2, Ana Moya-Beltrán2.
Affiliations: 1Escuela de Informática, Facultad de Ingeniería, Universidad Tecnológica
Metropolitana, Santiago, Chile. 2Departamento de Informática y Computación,
Facultad de Ingeniería, Universidad Tecnológica Metropolitana, Santiago, Chile.
As omics technologies generate increasingly large and complex datasets, cybersecurity
techniques are becoming essential to ensure data integrity in biomedical research.
Single-cell RNA sequencing (scRNA-seq), a key tool in modern genomics, is particularly
vulnerable to data poisoning attacks that manipulate gene expression matrices to degrade
model performance and distort biological interpretation. Common threats include label flips,
synthetic cell injection, gene scaling, and structured noise, many of which mimic natural
biological variation, making them difficult to detect with conventional QC methods. To
address this critical vulnerability, we present GenomeDefender, a high-precision anomaly
detection tool specifically designed for scRNA-seq data. GenomeDefender integrates a
multi-model ensemble architecture comprising four optimized deep learning components:
Variational Autoencoders (VAEs), Graph Neural Network Autoencoders (GNN-AEs),
Contrastive Autoencoders (CAEs), and Denoising Diffusion Probabilistic Models (DDPMs).
Each is tailored to detect specific poisoning vectors, with model pairings strategically
selected for complementary strengths. The pipeline includes preprocessing, parallelized
model execution, and a decision fusion module for robust classification of clean vs. poisoned
data.
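To make the four poisoning vectors named above concrete, the following is a minimal sketch of how each could be simulated on a toy expression matrix. It is an illustration under stated assumptions, not the attack generators used in the study: the function names, the toy matrix, the averaging rule for synthetic cells, and the alternating noise pattern are all hypothetical.

```python
import random

def make_toy_matrix(n_cells, n_genes, seed=0):
    """Toy count matrix (cells x genes) with binary labels; purely illustrative."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 20) for _ in range(n_genes)] for _ in range(n_cells)]
    labels = [i % 2 for i in range(n_cells)]
    return X, labels

def label_flip(labels, fraction, rng):
    """Flip the binary label of a randomly chosen fraction of cells."""
    flipped = list(labels)
    k = int(len(labels) * fraction)
    idx = rng.sample(range(len(labels)), k)
    for i in idx:
        flipped[i] = 1 - flipped[i]
    return flipped, sorted(idx)

def gene_scaling(X, gene_idx, factor):
    """Multiply the expression of the selected genes by a constant in every cell."""
    genes = set(gene_idx)
    return [[v * factor if j in genes else v for j, v in enumerate(row)] for row in X]

def inject_synthetic_cells(X, n_new, rng):
    """Append synthetic cells, each the average of two randomly chosen real cells
    (an assumed construction; real attacks may use other generators)."""
    new_rows = []
    for _ in range(n_new):
        a, b = rng.sample(X, 2)
        new_rows.append([(x + y) / 2 for x, y in zip(a, b)])
    return X + new_rows

def structured_noise(X, gene_idx, amplitude):
    """Add a fixed alternating +/- offset to the selected genes, clipped at zero."""
    pattern = {j: amplitude * (1 if k % 2 == 0 else -1) for k, j in enumerate(gene_idx)}
    return [[max(0, v + pattern.get(j, 0)) for j, v in enumerate(row)] for row in X]

# Example usage on a hypothetical 10-cell, 5-gene matrix.
rng = random.Random(42)
X, labels = make_toy_matrix(n_cells=10, n_genes=5)
poisoned_labels, flipped_idx = label_flip(labels, 0.2, rng)
X_scaled = gene_scaling(X, [0, 3], 2.0)
X_injected = inject_synthetic_cells(X, 3, rng)
X_noisy = structured_noise(X, [1, 2], 1.5)
```

Each function returns a new object and leaves its input untouched, so clean and poisoned versions can be compared side by side when testing a detector.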
GenomeDefender consistently achieves precision above 95% across all evaluated attack
types, validated through extensive benchmarking using real-world single-cell RNA-seq
datasets (GSE154826). This level of accuracy is enabled by the synergistic design of
lightweight, three-layer detection architectures and strategic ensemble fusion. The tool
supports incremental training, making it adaptable to evolving data distributions and new
attack variants.
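The abstract does not specify the fusion rule, so the following is only a sketch of one common option: min-max normalising each model's anomaly scores, averaging them with optional weights, and thresholding the result. Precision is then computed as TP / (TP + FP). The model keys, the scores, the ground-truth labels, and the 0.5 threshold are made-up values for illustration and are not results from the study.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(model_scores, weights=None):
    """Weighted average of min-max normalised per-model anomaly scores.
    This averaging rule is an assumption, not the documented fusion module."""
    names = list(model_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    normed = {name: min_max(model_scores[name]) for name in names}
    n_cells = len(normed[names[0]])
    return [
        sum(weights[name] * normed[name][i] for name in names) / total
        for i in range(n_cells)
    ]

def precision(flags, truth):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is flagged."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical anomaly scores for six cells from the four model families.
scores = {
    "vae":    [0.1, 0.2, 0.1, 0.9, 0.8, 0.3],
    "gnn_ae": [0.2, 0.1, 0.3, 0.7, 0.9, 0.2],
    "cae":    [0.0, 0.1, 0.2, 1.0, 0.6, 0.1],
    "ddpm":   [0.3, 0.2, 0.1, 0.8, 0.9, 0.4],
}
truth = [0, 0, 0, 1, 1, 0]               # 1 marks a poisoned cell (made up)
fused = fuse_scores(scores)
flags = [s > 0.5 for s in fused]         # 0.5 is an arbitrary threshold
print(flags, precision(flags, truth))
```

Normalising before averaging keeps a model with a wider score range from dominating the fused score; per-model weights could be used to favour the detector best suited to a given attack type.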
GenomeDefender brings cybersecurity-driven robustness to scRNA-seq pipelines, delivering
a scalable and reproducible solution within HPC frameworks and reinforcing the
trustworthiness of genomic discoveries and precision medicine initiatives.
Acknowledgement: Departamento de Informática y Computación, UTEM; Escuela de Informática, UTEM;
Laboratorio de Investigación Aplicada, Departamento de Informática y Computación, UTEM. This work was
supported in part by “Competition for Research Assistant Funding UTEM”, year 2024, code AI24-11, and in part
by the “Scientific and Technological Equipment Projects Competition, year 2024, code LE24-03”.