Pre-registered confirmatory test of the ψ-GIE Universal Law of Structural Evolution across biological, linguistic, cultural, and artificial systems
Authors:
Giovanni Esposito (ORCID: https://orcid.org/0009-0002-5834-030X)
Date of preregistration: 24 November 2025
Zenodo DOI (reserved): 10.5281/zenodo.17694192
This preregistration is frozen. No hypotheses, metrics, datasets, thresholds, or analysis steps will be modified after this date.
1. Hypotheses (Locked)
H1: Decrystallization regime (systems under increased subsidy)
Primary: ΔRCcomp −25% relative to the matched conserved baseline
Secondary: ΔRCkmer −15%, Hrel +12%, Massoc +20%
H2: Crystallization regime (constrained interfaces)
Primary: ΔRCcomp +15%
Secondary: ΔRCkmer +80%, Massoc 60
H3: Conservation regime
Primary: ΔRCcomp +45%
Secondary: ΔRCkmer +150%
H4: Universality
Thresholds classify 85% of pre-specified datasets correctly across DNA, RNA, natural language, source code, bioacoustics, and cultural texts without domain-specific tuning.
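The locked thresholds of H1 to H3 and the H4 criterion can be written down as a small read-only lookup. This is a minimal sketch, not the archived analysis code. The names `LOCKED_THRESHOLDS`, `meets` and `h4_universality` are hypothetical. The comparison direction for each signed threshold is an assumption, because the text gives values but not operators. The H2 secondary entry "Massoc 60" is omitted because its comparator is not stated, and "85%" is read as "at least 85%".

```python
from types import MappingProxyType

# Locked thresholds from Section 1 as fractional changes (-25% -> -0.25).
# ASSUMPTION: a negative threshold is met at or below its value, a positive one
# at or above it. H2's "Massoc 60" is left out because its comparator is unclear.
LOCKED_THRESHOLDS = MappingProxyType({
    "H1_decrystallization": {"dRCcomp": -0.25, "dRCkmer": -0.15, "Hrel": 0.12, "Massoc": 0.20},
    "H2_crystallization": {"dRCcomp": 0.15, "dRCkmer": 0.80},
    "H3_conservation": {"dRCcomp": 0.45, "dRCkmer": 1.50},
})


def meets(observed: float, threshold: float) -> bool:
    """Direction-aware check of one observed change against one locked threshold."""
    return observed <= threshold if threshold < 0 else observed >= threshold


def h4_universality(correct, required: float = 0.85) -> bool:
    """H4: the share of pre-specified datasets classified correctly is at least `required`."""
    return sum(correct) / len(correct) >= required
```

`MappingProxyType` makes the outer table read-only at runtime, which mirrors the "locked" status of the thresholds without claiming to be how the frozen scripts store them.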
2. Locked Metrics
1. ΔRCcomp: gzip-based relative compression crystallinity
2. ΔRCkmer: k-mer rigidity change (k=6 biological, k=3 linguistic/audio)
3. Hrel: relative Shannon entropy
4. Massoc: associative motif multiplicity
5. ψ-index: weighted composite (0.4/0.2/0.2/0.2)
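The metric list names the ingredients but not the exact formulas. The sketch below shows one plausible reading of the first three metrics and of the ψ-index, under these assumptions:
- RCcomp = 1 − (gzip size / raw size).
- k-mer rigidity = 1 − (distinct k-mers / total k-mers).
- "Relative" Shannon entropy = entropy normalised by log2 of the observed alphabet size. It could instead mean a divergence from a baseline.
- Δ = relative change against the matched baseline.
- The 0.4 weight attaches to ΔRCcomp.

Massoc is not sketched because "associative motif multiplicity" is not defined further. All function names are hypothetical.

```python
import gzip
import math
from collections import Counter


def compression_crystallinity(text: str) -> float:
    """Hypothetical RCcomp: 1 - gzip_size / raw_size (higher = more compressible)."""
    raw = text.encode("utf-8")
    return 1.0 - len(gzip.compress(raw, compresslevel=9)) / len(raw)


def kmer_rigidity(seq: str, k: int) -> float:
    """Hypothetical RCkmer: 1 - distinct k-mers / total k-mers (share of repeats)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence shorter than k")
    return 1.0 - len(set(kmers)) / len(kmers)


def relative_entropy(seq: str) -> float:
    """Hypothetical Hrel: Shannon entropy divided by log2(alphabet size)."""
    counts = Counter(seq)
    n = len(seq)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts)) if len(counts) > 1 else 0.0


def relative_change(value: float, baseline: float) -> float:
    """Delta expressed as a relative change against the matched conserved baseline."""
    return (value - baseline) / baseline


def psi_index(d_comp, d_kmer, h_rel, m_assoc, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted composite; ASSUMES the 0.4 weight belongs to delta-RCcomp."""
    return sum(w * x for w, x in zip(weights, (d_comp, d_kmer, h_rel, m_assoc)))
```

For H1, for instance, `relative_change(compression_crystallinity(test_text), compression_crystallinity(baseline_text))` would be compared against −0.25. Here `test_text` and `baseline_text` are placeholders. Note that gzip's fixed header makes very short inputs look incompressible, which is one practical reason a minimum sequence length matters.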
Implementation: All analysis code and generated data will be archived directly on Zenodo under DOI 10.5281/zenodo.17694192.
3. Confirmatory Datasets (Exact)
Biological:
HAR1 region (hg38 chr7:5,500,000–5,502,000)
Lactase enhancer ENH0008452
HOXD70 regulatory region
GENCODE v44 coding/noncoding sets
Linguistic & Cultural:
Victorian corpus (Project Gutenberg, pre-1900)
Modernist corpus (Ulysses, The Waves)
Digital-era corpus (Common Crawl 2023)
Ju|hoan click-language corpus
Hawaiian historical phonology corpus
Animal Communication:
NightingaleDB v3.1
Humpback whale (1971 + 2023)
Vervet alarm calls
Meerkat vocalizations
Artificial Systems:
GPT-2 → GPT-4 continuations (50k tokens each)
Human+AI mixed corpora (The Pile subsets)
AI-only generations
4. Analysis Plan
1. Compute all metrics using frozen scripts archived on Zenodo.
2. Generate 1,000 matched null sequences (uniform, Markov-1/2, PCFG).
3. Correct classification requires 3/5 metrics within the predicted thresholds AND the real data exceeding the 99th percentile of the nulls.
4. H1–H4 are confirmed only if 85% of datasets are correctly classified (binomial exact test, α = 0.01).
5. Robustness: multiscale stability (128–1024 tokens), temporal monotonicity, cross-modality consistency.
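Steps 2 to 4 can be sketched with the standard library. The sketch makes these assumptions:
- Sequences are character-level.
- Only the uniform and first-order Markov nulls are shown. The Markov-2 and PCFG nulls are omitted.
- "Exceeding the 99th percentile" means the observed value is strictly greater than at least 99% of null values.
- The binomial exact test is one-sided (upper tail) against a null success rate `p0`. The preregistration does not specify `p0`.

All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def uniform_null(seq: str, rng: random.Random) -> str:
    """Uniform null: i.i.d. symbols drawn uniformly from the observed alphabet."""
    alphabet = sorted(set(seq))
    return "".join(rng.choice(alphabet) for _ in seq)


def markov1_null(seq: str, rng: random.Random) -> str:
    """First-order Markov null: resample from the empirical successor lists."""
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    out = [seq[0]]
    while len(out) < len(seq):
        # A symbol with no observed successor falls back to the unigram pool.
        pool = successors.get(out[-1]) or seq
        out.append(rng.choice(pool))
    return "".join(out)


def exceeds_null(observed: float, null_values, pct: int = 99) -> bool:
    """True if observed is strictly above at least pct% of the null values."""
    below = sum(v < observed for v in null_values)
    return below * 100 >= pct * len(null_values)


def correctly_classified(metric_hits, beats_null: bool, min_hits: int = 3) -> bool:
    """Step 3: at least 3 of the 5 metrics meet their threshold AND data beat the nulls."""
    return sum(metric_hits) >= min_hits and beats_null


def binom_upper_tail(k: int, n: int, p0: float) -> float:
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Usage: 1,000 matched Markov-1 nulls for one toy sequence.
rng = random.Random(2025)
seq = "ACGTTGCAACGTACGTTTGACA" * 10
nulls = [markov1_null(seq, rng) for _ in range(1000)]
```

As an illustration, if 20 datasets were all correctly classified and `p0` were a hypothetical 0.5, `binom_upper_tail(20, 20, 0.5)` equals 0.5**20, far below α = 0.01. The choice of `p0` drives the test and would need to be fixed in the archived scripts.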
5. Exclusion Criteria
Sequences <1000 tokens
Genomic gaps >5%
Human corpora with >20% AI-generated content
Audio <3 seconds duration
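The exclusion criteria translate into a simple filter. This sketch assumes a hypothetical `Record` type whose fields hold precomputed token counts, gap fractions, AI-content estimates and audio durations. The preregistration does not say how those quantities are measured, for example how AI-generated content is estimated. Here the token-length rule is applied to every dataset type, which is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical per-dataset record; field names are illustrative only."""
    kind: str                              # "genomic", "human_text", "audio", ...
    n_tokens: int
    gap_fraction: float = 0.0              # share of gap positions (genomic)
    ai_fraction: float = 0.0               # estimated AI-generated share (human corpora)
    duration_s: Optional[float] = None     # recording length in seconds (audio)


def exclusion_reasons(r: Record) -> List[str]:
    """Return every exclusion criterion the record violates (empty list = keep)."""
    reasons = []
    if r.n_tokens < 1000:
        reasons.append("sequence < 1000 tokens")
    if r.kind == "genomic" and r.gap_fraction > 0.05:
        reasons.append("genomic gaps > 5%")
    if r.kind == "human_text" and r.ai_fraction > 0.20:
        reasons.append("human corpus with > 20% AI-generated content")
    if r.kind == "audio" and r.duration_s is not None and r.duration_s < 3.0:
        reasons.append("audio < 3 seconds")
    return reasons
```

Returning all violated reasons, rather than a single boolean, makes the exclusion log auditable against the frozen criteria.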
6. Blinding
Dataset identifiers will be encrypted; all metric computations run blind.
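The blinding step can be sketched as keyed pseudonymisation. Note that this swaps in HMAC-SHA256, a one-way keyed hash, for the encryption the text mentions. The key holder unblinds by keeping the name-to-code table rather than by decrypting. The key custody arrangement and the `blind_ids` helper are assumptions, not part of the preregistration.

```python
import hashlib
import hmac
import secrets


def blind_ids(dataset_ids, key: bytes, length: int = 12) -> dict:
    """Map each dataset identifier to an opaque, reproducible code (HMAC-SHA256)."""
    return {d: hmac.new(key, d.encode("utf-8"), hashlib.sha256).hexdigest()[:length]
            for d in dataset_ids}


# The key is generated and held by someone outside the analysis team;
# analysts only ever see the codes, and the key holder unblinds afterwards.
key = secrets.token_bytes(32)
codes = blind_ids(["HAR1", "Victorian corpus", "NightingaleDB"], key)
unblind = {code: name for name, code in codes.items()}
```

Because the mapping is deterministic for a given key, the same dataset always receives the same code across reruns of the frozen scripts.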
7. Falsification Criteria (any one triggers rejection)
<70% of decrystallizing substrates show ΔRCcomp −25%
<70% of constrained interfaces show ΔRCcomp +15%
<70% of conservation systems show ΔRCcomp +45%
Null distributions overlap in >2 domains (Mahalanobis D² < 4)
Multiscale regression reverses the predicted direction
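The falsification rules can be checked mechanically. This sketch rests on three assumptions:
- Each 70% rule is evaluated on the share of systems whose ΔRCcomp meets the signed threshold, with the direction assumed as in H1 to H3.
- "Null distributions overlap in >2 domains (Mahalanobis D² < 4)" means: count the domains whose observed metric vector lies at D² < 4 from that domain's null distribution, and reject if more than two do.
- The multiscale-regression reversal is passed in as a precomputed flag.

The code uses the standard library only, and all names are hypothetical.

```python
import statistics


def share_meeting(deltas, threshold: float) -> float:
    """Share of systems whose delta-RCcomp meets a signed threshold (direction assumed)."""
    hits = [d <= threshold if threshold < 0 else d >= threshold for d in deltas]
    return sum(hits) / len(hits)


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(x, null_samples) -> float:
    """D² of x from the mean of null_samples, using their sample covariance."""
    mu = [statistics.fmean(col) for col in zip(*null_samples)]
    n, d = len(null_samples), len(mu)
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in null_samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * si for di, si in zip(diff, _solve(cov, diff)))


def falsified(regime_shares, domains_within_null: int, direction_reversed: bool = False,
              min_share: float = 0.70, max_overlap: int = 2) -> bool:
    """Any single criterion triggers rejection."""
    return (any(s < min_share for s in regime_shares.values())
            or domains_within_null > max_overlap
            or direction_reversed)
```

In use, `domains_within_null` would be the number of domains for which `mahalanobis_sq(observed_vector, null_vectors) < 4`.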
8. Data & Code Availability
All data and analysis scripts will be archived under Zenodo DOI: 10.5281/zenodo.17694192
No additional datasets, metrics, or exploratory analyses will be added.
9. License and Access
CC-BY 4.0
Open access