Supplementary Material for "Hyperbolic Nature of Differential Expression Signatures"

Author: Pogány, Domonkos
Publisher: Zenodo
DOI: 10.1109/TCBBIO.2025.3612275/mm1
Source: https://zenodo.org/records/17742098/files/Supplementary%20Material.pdf
Supplementary Material for paper:
Hyperbolic Nature of Differential Expression Signatures
Domonkos Pogány, Péter Antal
OVERVIEW OF SUPPLEMENTARY MATERIALS
This supplementary material provides additional details and extended results supporting our main study.
• Section S.I expands on the theoretical connection between differentially expressed gene (DEG) signatures and hyperbolic geometry, complementing Section IV of the main paper:
– S.I.A: Analyzing the scale-free nature of DEG signatures across different datasets and distance metrics.
• Section S.II provides additional details of the dimensionality reduction experiment in Section V of the main manuscript:
– S.II.A: Detailing the hyperparameter optimization process.
– S.II.B: Replicating the results of the comparative analysis using the SigCom LINCS dataset.
– S.II.C: Providing supplementary 2D visualizations.
– S.II.D: Investigating the connection between the embeddings and the scale-free nature of DEG signatures.
S.I. SCALE-FREE NATURE OF DEG SIGNATURES
A. Results over Datasets and Distance Metrics
This subsection provides additional analyses supporting the findings presented in Section IV of the main paper. To verify the robustness of the observed scale-free properties of differentially expressed gene (DEG) signatures, we repeated the soft thresholding analysis on the L1000FWD landmark dataset [S1] using alternative distance metrics. In addition to cosine similarity, rescaled to [0,1] as $S_{ij} = (S_{ij} + 1)/2$, we applied Euclidean and Canberra distances, rescaled as $S_{ij} = 1 - D_{ij}/\max(D)$. Figure S1 illustrates the results, demonstrating that for both metrics, DEG signatures reached an $R^2$ above 0.9, reinforcing their scale-free nature.
Additionally, to assess the generalizability of our findings across datasets, we conducted the same analysis on the full L1000FWD dataset as well as the SigCom LINCS A375, NPC, NEU, and HELA datasets [S2] (using cosine similarity). As shown in Figure S2, our observation holds for all the datasets.
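The rescaling and soft thresholding steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the histogram binning of the weighted degrees, and the exclusion of self-similarities are assumptions made for the example. Only the rescaling formulas and the idea of fitting a power law to the degree distribution of the powered similarity matrix come from the text.

```python
import numpy as np

def cosine_similarity_01(X):
    """Cosine similarity between rows of X, rescaled from [-1, 1] to [0, 1]."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (Xn @ Xn.T + 1.0) / 2.0

def distance_to_similarity(D):
    """Rescale a distance matrix to a similarity: S = 1 - D / max(D)."""
    return 1.0 - D / D.max()

def scale_free_r2(S, beta, n_bins=10):
    """R^2 of a linear fit of log10 p(k) against log10 k for A = S ** beta.

    The binning scheme (n_bins equal-width bins) and the removal of the
    diagonal are assumptions of this sketch, not details from the paper.
    """
    A = S ** beta                      # soft thresholding (new array, S is untouched)
    np.fill_diagonal(A, 0.0)           # ignore self-similarity
    k = A.sum(axis=1)                  # weighted degree of each signature
    counts, edges = np.histogram(k, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = (counts > 0) & (centers > 0)  # logs are undefined for empty bins
    if keep.sum() < 3:
        return float("nan")            # too few points for a meaningful fit
    x = np.log10(centers[keep])
    y = np.log10(counts[keep] / counts.sum())
    r = np.corrcoef(x, y)[0, 1]        # for a simple linear fit, R^2 = r^2
    return float(r ** 2)
```

Calling `scale_free_r2(S, beta)` for a range of `beta` values would give a curve comparable in spirit to those plotted in Figures S1 and S2.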
Fig. S1. Analysis of the scale-free nature of L1000FWD landmark DEG signatures across different distance metrics. The $R^2$ values of the power-law fit are plotted against different powers of the similarity matrices.
Fig. S2. Analysis of the scale-free nature of DEG signatures across different datasets. The $R^2$ values of the power-law fit are plotted against different powers of the similarity matrices.
D. Pogány and P. Antal are with the Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics, Budapest, Hungary. E-mail: {pogany, antal}@mit.bme.hu.
S.II. DIMENSIONALITY REDUCTION ON DEG SIGNATURES
A. Hyperparameters
This subsection provides additional details of the hyperparameter optimization presented in Section V of the main paper. To ensure the robustness of our evaluations, we conducted a comprehensive hyperparameter search across multiple settings. To assess local structure preservation, we performed K-Nearest Neighbor (KNN) classification on several tasks, using 5-fold repeated cross-validation, reporting the mean and standard deviation of the F1 score for each metric. A similar repeated procedure was applied to global structure evaluations using Spearman correlation (SC) and random triplet accuracy (RTA), where we sampled 5,000 pairs and triplets per evaluation. In addition, each experiment using manifold learning methods (Uniform Manifold Approximation and Projection (UMAP) [S3]; its hyperbolic version, HUMAP; and the Poincaré maps (PMAP) [S4]) was repeated five times to account for their non-deterministic nature.
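The two global structure metrics can be illustrated with the sketch below, which compares a distance matrix of the original signatures (`D_high`) with one computed in the embedding (`D_low`). The function names, the sampling with replacement, and the simplified rank computation (ordinal ranks instead of tie-averaged ranks) are assumptions of this example; only the use of Spearman correlation, random triplets, and 5,000 samples per evaluation comes from the text.

```python
import numpy as np

def _ranks(v):
    """Ordinal ranks; a stable sort keeps tied values in the same order."""
    r = np.empty(len(v))
    r[np.argsort(v, kind="stable")] = np.arange(len(v))
    return r

def spearman_on_pairs(D_high, D_low, n_pairs=5000, seed=None):
    """Spearman correlation of original vs. embedded distances on random pairs."""
    rng = np.random.default_rng(seed)
    n = D_high.shape[0]
    i = rng.integers(0, n, n_pairs)
    j = rng.integers(0, n, n_pairs)
    keep = i != j                          # drop degenerate (i, i) pairs
    a = D_high[i[keep], j[keep]]
    b = D_low[i[keep], j[keep]]
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])

def random_triplet_accuracy(D_high, D_low, n_triplets=5000, seed=None):
    """Fraction of random triplets (i, j, k) whose ordering of d(i, j) and
    d(i, k) is the same in the original space and in the embedding."""
    rng = np.random.default_rng(seed)
    n = D_high.shape[0]
    t = rng.integers(0, n, (n_triplets, 3))
    i, j, k = t[:, 0], t[:, 1], t[:, 2]
    keep = (i != j) & (i != k) & (j != k)  # require three distinct points
    i, j, k = i[keep], j[keep], k[keep]
    high = D_high[i, j] < D_high[i, k]
    low = D_low[i, j] < D_low[i, k]
    return float(np.mean(high == low))
```

Repeating both functions with different seeds and reporting the mean and standard deviation would mirror the repeated evaluation procedure described above.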
We evaluated different distance metrics for DEG signatures, testing Euclidean, cosine, and Canberra distances, both with and without prior standardization. Figures S3 to S5 present the KNN classification results on DEG signatures across the L1000FWD landmark, L1000FWD full, and SigCom LINCS A375 datasets. Standardization did not improve performance for any metric, and cosine consistently outperformed Euclidean with the same execution time. Canberra distance, as previously reported [S5], achieved slightly better KNN classification results, particularly for cell type and time point prediction. However, it did not improve MOA classification or later global evaluation metrics. Regarding the computational cost of the KNN classification, even in the relatively small L1000FWD landmark dataset, it had 10 times longer execution time than the cosine and Euclidean distances. Moreover, unlike the other metrics, its time depends on the number of features, not just the samples. As a consequence, it took more than 12 hours to calculate the Canberra distance matrix on the large datasets. Considering the only marginal improvement and its vast computational cost, we excluded it from further analysis. Figure S6 compares manifold learning methods using Euclidean and cosine metrics as inputs, showing that cosine consistently outperformed Euclidean across all methods and hyperparameter settings. Based on these findings, we used cosine similarity without prior standardization throughout our study for KNN classification on DEG signatures and as the input metric for dimensionality reduction. This choice aligns with standard practices in analyzing DEG signatures derived using the Characteristic Direction (CD) method [S6].
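For reference, the three compared distances and the optional preprocessing can be written out as below. This is an illustrative sketch only: interpreting "standardization" as a per-gene z-score and treating 0/0 terms of the Canberra distance as zero are assumptions of the example, not details confirmed by the text.

```python
import numpy as np

def standardize(X):
    """Z-score each feature (gene) across signatures.

    Assumption: 'standardization' means per-feature z-scoring.
    Constant features are mapped to zero instead of dividing by zero.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std

def euclidean(x, y):
    """Euclidean distance between two signatures."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cosine_distance(x, y):
    """One minus the cosine similarity of two signatures."""
    return float(1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def canberra(x, y):
    """Canberra distance: sum over features of |x_i - y_i| / (|x_i| + |y_i|).

    Terms with a zero denominator contribute zero (a common convention).
    """
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    terms = np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)
    return float(terms.sum())
```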
We also examined the hyperparameters of the KNN classifier, namely the number of neighbors and whether to use distance-weighted predictions. Our experiments with k = 5, 10, and 50 neighbors showed no significant performance differences, leading us to select k = 5 with weighted distances for consistency. Figures S3 to S5 present the KNN classification results across multiple datasets, further supporting this choice. We tested various combinations to investigate potential interactions between the KNN classification neighbor number and the number of neighbors used for constructing KNN graphs at the initial step of manifold learning methods. As shown in Figure S7, we found no connection between the two parameters. While the performance of manifold learning methods is sensitive to the KNN graph neighbor number, the KNN classification neighbor number had little to no impact.
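The selected KNN configuration (k = 5, distance-weighted votes, cosine metric) with repeated 5-fold cross-validation could be evaluated along the following lines using scikit-learn. The helper name `knn_f1`, the number of repeats, the stratified splitting, and the macro averaging of the F1 score are assumptions of this sketch; the text does not specify them.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn_f1(X, y, k=5, metric="cosine", n_repeats=3, seed=0):
    """Mean and standard deviation of the F1 score over repeated 5-fold CV.

    n_repeats, stratification, and macro averaging are illustrative choices.
    """
    clf = KNeighborsClassifier(n_neighbors=k, weights="distance", metric=metric)
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
    return float(scores.mean()), float(scores.std())
```

Passing either the DEG signatures or a low-dimensional embedding as `X`, with labels such as cell type or MOA as `y`, gives one local structure score per classification task.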
We then optimized the hyperparameters of the manifold learning methods. The results of local and global structure preservation across multiple datasets and latent dimensions are presented in Figures S8 to S10 for UMAP, Figures S11 to S13 for HUMAP, and Figures S14 to S17 for PMAP. Additionally, we report execution times, but here, they refer to the dimensionality reduction step alone, excluding KNN classification and global evaluation time. We focused on optimizing the number of neighbors in the KNN graph for all methods, the minimum distance for UMAP and HUMAP, and the learning rate for PMAP. Other hyperparameters were also tested but showed minimal impact, so we retained their default values. For instance, the sigma and gamma parameters of PMAP were kept at their default values of 1.0 and 2.0, respectively, as shown in Figure S15. For UMAP and HUMAP, we tested minimum distance values of 0.001, 0.01, 0.05, 0.1, and 0.5. We observed minimal variation in local and global metrics and no substantial effect on execution time. To balance local and global structure preservation, we selected a minimum distance of 0.1. For PMAP, we explored learning rates between 0.01 and 1.0. Learning rates below 0.1 resulted in suboptimal performance, while those above 0.1 showed no significant improvements. However, since higher learning rates led to faster convergence, we selected 1.0 for efficiency. Across all methods, the number of KNN graph neighbors was the most influential hyperparameter, determining the trade-off between local and global structure preservation. To capture both extremes, we used two configurations: k = 5 for enhanced local structure and k = 250 for improved global structure. The only exception was UMAP and HUMAP on the SigCom LINCS dataset, where k = 50 (the default setting for UMAP visualization in this dataset [S2]) yielded the best overall performance.
Regarding execution time, most hyperparameters had little impact. Since all methods operate on the distance matrix of DEG signatures, they are independent of the number of features (genes) in the signatures. Still, they are affected by the number of samples (signatures). UMAP and HUMAP showed minor sensitivity to embedding dimensions, whereas PMAP required much more time in higher dimensions. As we mentioned before, there was a strong negative correlation between learning rate and the execution time of PMAP. The number of KNN graph neighbors also strongly affected the execution time of all methods. Interestingly, while UMAP and HUMAP ran faster with fewer KNN graph neighbors, the opposite is true for PMAP, which was significantly slower when using a small number of neighbors. We discovered experimentally that PMAP converges faster with large neighbor counts, reaching its early stopping criterion sooner.
Fig. S3. Hyperparameter search on the L1000FWD dataset for the landmark DEG signatures, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks and the execution times.
Fig. S4. Hyperparameter search on the L1000FWD dataset for the full DEG signatures, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks.
Fig. S5. Hyperparameter search on the SigCom LINCS A375 cell line for the DEG signatures, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks.
Fig. S6. KNN classification results on the L1000FWD landmark dataset across various signature distance metrics, dimensionality reduction methods, and hyperparameters. The figure shows the mean and standard deviations of F1 scores for different classification tasks.
Fig. S7. KNN classification results on the L1000FWD landmark dataset across dimensionality reduction methods, dimensionality reduction graph neighbor counts (neighbors), and KNN classification neighbor counts (KNN neighbors). The figure shows the mean and standard deviations of F1 scores for different classification tasks.
Fig. S8. Hyperparameter search on the L1000FWD dataset for the UMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks, the global structure preservation metrics, and the execution times.
Fig. S9. Hyperparameter search on the L1000FWD dataset for the UMAP method in 64D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks, the global structure preservation metrics, and the execution times.

Fig. S10. Hyperparameter search on the SigCom LINCS A375 cell line for the UMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks and the global structure preservation metrics.
Fig. S11. Hyperparameter search on the L1000FWD dataset for the HUMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks, the global structure preservation metrics, and the execution times.
Fig. S12. Hyperparameter search on the L1000FWD dataset for the HUMAP method in 64D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks, the global structure preservation metrics, and the execution times.
Fig. S13. Hyperparameter search on the SigCom LINCS A375 cell line for the HUMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks and the global structure preservation metrics.
Fig. S14. Hyperparameter search on the L1000FWD dataset for the PMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks, the global structure preservation metrics, and the execution times.
Fig. S15. Hyperparameter search on the L1000FWD dataset for the PMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks and the global structure preservation metrics.
Fig. S16. Hyperparameter search on the L1000FWD dataset for the PMAP method in 64D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks, the global structure preservation metrics, and the execution times.
Fig. S17. Hyperparameter search on the SigCom LINCS A375 cell line for the PMAP method in 2D, illustrating mean and standard deviations of the F1 scores for various KNN classification tasks and the global structure preservation metrics.
Fig. S27. Two-dimensional signature visualization on the SigCom LINCS HELA cell line, comparing different dimensionality reduction methods and neighbor counts. Embeddings are colored according to the most frequent mechanisms of action.
Fig. S28. Two-dimensional signature visualization on the SigCom LINCS HELA cell line, comparing different dimensionality reduction methods and neighbor counts. Embeddings are colored according to the most frequent perturbation times.

Fig. S29. Two-dimensional signature visualization on the SigCom LINCS NPC cell line, comparing different dimensionality reduction methods and neighbor counts. Embeddings are colored according to the most frequent mechanisms of action.
Fig. S30. Two-dimensional signature visualization on the SigCom LINCS NPC cell line, comparing different dimensionality reduction methods and neighbor counts. Embeddings are colored according to the most frequent perturbation times.
Fig. S31. Two-dimensional signature visualization on the SigCom LINCS NEU cell line, comparing different dimensionality reduction methods and neighbor counts. Embeddings are colored according to the most frequent mechanisms of action.
Fig. S32. Two-dimensional signature visualization on the SigCom LINCS NEU cell line, comparing different dimensionality reduction methods and neighbor counts. Embeddings are colored according to the most frequent perturbation times.
D. Connection with the Scale-Free Nature
In the following, we provide details of the connection between the scale-free nature of DEG signatures and the obtained embeddings. Specifically, we examined how well these embeddings capture the degree distribution of the DEG signatures. As outlined in Section IV of the main paper, we defined node degrees in two ways: (i) a discrete degree derived from a hard threshold of 0.65 applied to the cosine similarity matrix $S_{ij}$, and (ii) a continuous weighted degree, computed as the row sum of the weighted adjacency matrix $A_{ij} = S_{ij}^{\beta}$, with $\beta = 7$ chosen to maintain a scale-free structure.
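The two degree definitions can be sketched in a few lines. The function names are hypothetical, and two details are assumptions of this example rather than statements from the text: self-similarities on the diagonal are excluded, and similarities exactly equal to the threshold count as edges.

```python
import numpy as np

def discrete_degree(S, threshold=0.65):
    """Number of neighbors whose similarity reaches the hard threshold."""
    adj = S >= threshold              # boolean adjacency (new array)
    np.fill_diagonal(adj, False)      # assumption: no self-loops
    return adj.sum(axis=1)

def weighted_degree(S, beta=7):
    """Row sums of the soft-thresholded adjacency A_ij = S_ij ** beta."""
    A = S ** beta                     # new array, S itself is not modified
    np.fill_diagonal(A, 0.0)          # assumption: no self-loops
    return A.sum(axis=1)
```

For example, with similarities 0.9, 0.2, and 0.7 between three signatures, only the pairs with 0.9 and 0.7 form edges under the 0.65 threshold, while all three pairs contribute (very unequally) to the weighted degree.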
We first analyzed a subset of the L1000FWD dataset containing only 2,000 signatures, the same subset used in Figure 1 of the main paper. Figure S33 visualizes the network obtained using the 0.65 threshold, where node size is proportional to discrete degree and color reflects weighted degree. To quantitatively assess the relationship between degree and embeddings, we plotted weighted degree against embedding norms and computed signed $R^2$ values of a linear fit. Results for various dimensionality reduction methods, KNN neighbor numbers, and embedding dimensionality are detailed in Figures S34 to S37. We then extended this analysis to the entire dataset with 16,848 signatures. Figure S38 presents a similar visualization as before. To get consistent results, we translated the signature with the highest weighted degree to the origin, reflecting our expectation that highly connected nodes should be in the center of the embedding space. For HUMAP and PMAP, this was performed using Poincaré translations [S4]. Figure S39 illustrates a two-dimensional example of the Poincaré translation. The results of the analysis are shown in Figures S40 to S44. Notably, after translation, embedding norms were equal to distances from the highest-degree node.
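One common way to realize such a translation in the Poincaré ball is Möbius addition, sketched below for curvature -1. This is an assumption made for illustration: the exact operation used in [S4] may be formulated differently, and the function names are hypothetical. The sketch also shows why norms after translation encode distances from the chosen node: the translation is an isometry, and the hyperbolic distance of a point x from the origin is 2 artanh(||x||).

```python
import numpy as np

def mobius_add(x, y):
    """Möbius addition x (+) y in the Poincaré ball with curvature -1."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den

def translate_to_origin(Z, center):
    """Apply the hyperbolic translation that moves `center` to the origin
    to every row of the embedding Z (shape: n_points x n_dims)."""
    return mobius_add(-center[None, :], Z)

def poincare_distance(x, y):
    """Hyperbolic distance between points of the Poincaré ball."""
    diff2 = np.sum((x - y) ** 2, axis=-1)
    den = (1 - np.sum(x * x, axis=-1)) * (1 - np.sum(y * y, axis=-1))
    return np.arccosh(1 + 2 * diff2 / den)
```

For Euclidean embeddings such as PCA or UMAP, the analogous step is simply subtracting the coordinates of the highest-degree signature from all points.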
On the small subset, we observed a strong negative correlation between PMAP embedding norm and degree with $|R^2| > 0.8$. However, this effect weakened in lower dimensions and small KNN neighbor numbers. In contrast, none of the embeddings showed a strong alignment with degree distribution on the full dataset. Methods with strong global structure preservation, such as PCA and PMAP with high KNN neighbor numbers, captured degree information slightly better than other approaches but only within the range of $0.3 < |R^2| < 0.4$. This may be because all tested manifold learning methods rely on KNN graph construction, which differs topologically from the scale-free network obtained via hard or soft thresholding. These findings suggest that future research could explore manifold learning techniques specifically designed to embed the thresholded graphs into hyperbolic space.
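The signed $R^2$ values referred to above can be computed as in the following sketch. The definition used here, the squared Pearson correlation of a simple linear fit carrying the sign of the slope, is an assumption about what "signed" means; the function name is hypothetical.

```python
import numpy as np

def signed_r2(weighted_degree, norm):
    """R^2 of a simple linear fit between degree and embedding norm,
    multiplied by the sign of the slope (negative when hubs sit near
    the center of the embedding)."""
    r = np.corrcoef(weighted_degree, norm)[0, 1]
    return float(np.sign(r) * r ** 2)
```

Under this convention, a value close to -1 would indicate that high-degree signatures lie close to the origin, which is the pattern expected for a hyperbolic embedding of a scale-free network.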
Fig. S33. Two-dimensional network visualization on the L1000FWD subset with the first 2,000 signatures. The network was constructed using a similarity threshold of 0.65. Node coordinates were obtained from different dimensionality reduction methods and neighbor counts. Node sizes are proportional to their degree, and colors represent their weighted degree.
Fig. S34. Relationship between weighted degree and PCA embedding norm on the L1000FWD subset with the first 2,000 signatures. Embedding norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and norms are assessed across different dimensions.
Fig. S35. Relationship between weighted degree and UMAP embedding norm on the L1000FWD subset with the first 2,000 signatures. Embedding norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and norms are assessed across different dimensions and neighbor counts.
Fig. S36. Relationship between weighted degree and HUMAP embedding norm on the L1000FWD subset with the first 2,000 signatures. Embedding norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and norms are assessed across different dimensions and neighbor counts.

Fig. S37. Relationship between weighted degree and PMAP embedding norm on the L1000FWD subset with the first 2,000 signatures. Embedding norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and norms are assessed across different dimensions and neighbor counts.
Fig. S38. Two-dimensional network visualization on the L1000FWD dataset. The network was constructed using a similarity threshold of 0.65. Node coordinates were obtained from different dimensionality reduction methods and neighbor counts. Node sizes are proportional to their degree, and colors represent their weighted degree.
Fig. S39. Hyperbolic translation operation in the 2D PMAP space. The same L1000FWD network is visualized, but on the right, the node with the highest weighted degree is shifted to the center.
Fig. S40. Relationship between weighted degree and adjusted 2D FWD embedding norm on the L1000FWD dataset. Adjusted norms are plotted against weighted degrees and colored by degree.
Fig. S41. Relationship between weighted degree and adjusted PCA embedding norm on the L1000FWD dataset. Adjusted norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and adjusted norms are assessed across different dimensions.
Fig. S42. Relationship between weighted degree and adjusted UMAP embedding norm on the L1000FWD dataset. Adjusted norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and adjusted norms are assessed across different dimensions and neighbor counts.
Fig. S43. Relationship between weighted degree and adjusted HUMAP embedding norm on the L1000FWD dataset. Adjusted norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and adjusted norms are assessed across different dimensions and neighbor counts.
Fig. S44. Relationship between weighted degree and adjusted PMAP embedding norm on the L1000FWD dataset. Adjusted norms are plotted against weighted degrees and colored by degree. R-squared values of a linear fit between weighted degrees and adjusted norms are assessed across different dimensions and neighbor counts.