AI-Generated Song Detection via Lyrics Transcripts

Author: Markus Frohmann; Elena Epure; Gabriel Meseguer Brocal; Markus Schedl; Romain Hennequin

Publisher: Zenodo

DOI: 10.5281/zenodo.17706345

Source: https://zenodo.org/records/17706345/files/000013.pdf

AI-GENERATED SONG DETECTION VIA LYRICS TRANSCRIPTS
Markus Frohmann1,2 Elena V. Epure1 Gabriel Meseguer-Brocal1
Markus Schedl2,3 Romain Hennequin1
1 Deezer Research, Paris, France 2 Johannes Kepler University Linz, Austria
3 Linz Institute of Technology, AI Lab, Austria
[email protected], {markus.frohmann, markus.schedl}@jku.at
ABSTRACT
The recent rise in capabilities of AI-based music generation tools has created an upheaval in the music industry, necessitating the creation of accurate methods to detect such AI-generated content. This can be done using audio-based detectors; however, it has been shown that they struggle to generalize to unseen generators or when the audio is perturbed. Furthermore, recent work used accurate and cleanly formatted lyrics sourced from a lyrics provider database to detect AI-generated music. However, in practice, such perfect lyrics are not available (only the audio is); this leaves a substantial gap in applicability in real-life use cases. In this work, we instead propose solving this gap by transcribing songs using general automatic speech recognition (ASR) models. Once transcribed, lyrics are again available in a text representation, and established AI-generated text detection methods can be applied. We do this using several detectors. The results on diverse, multi-genre, and multi-lingual lyrics show generally strong detection performance across languages and genres, particularly for our best-performing model using Whisper large-v2 and LLM2Vec embeddings. In addition, we show that our method is more robust than state-of-the-art audio-based ones when the audio is perturbed in different ways and when evaluated on different music generators. 1
1. INTRODUCTION
In recent years, the full generation of musical audio with artificial intelligence systems has matured [1–3] and is now widely deployed in commercial systems such as Suno, Udio, Stable Audio, and Riffusion.
The generation of this content presents new challenges to the music industry: revenue dilution for real artists due to AI-generated tracks leading to profits [4], copyright infringement issues [5], lack of transparency for the end user, or catalog flooding for music streaming services. For instance, Deezer reported that 10% of the tracks delivered to them are generated by Udio or Suno, which is more than 10,000 tracks daily [6]. Therefore, there is a pressing and growing need to identify this AI-generated content. While signed metadata [7] and watermarking [8] standards have been proposed to monitor and certify the source and history of content, they have not yet been widely adopted by the music industry. The only largely deployable solution currently remains automatic detection of this content.

1 Our code is available at https://github.com/deezer/robust-AI-lyrics-detection.

© M. Frohmann, E. Epure, G. Meseguer-Brocal, M. Schedl, R. Hennequin. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: M. Frohmann, E. Epure, G. Meseguer-Brocal, M. Schedl, R. Hennequin, “AI-Generated Song Detection via Lyrics Transcripts”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.

Some recent methods [9,10] based on small CNN models have reported a very high detection accuracy (more than 99%) for detecting AI-generated music audio files. However, these methods, primarily based on leveraging low-level artifacts of audio neural decoders, do not generalize to unseen generation models. In addition, they are very sensitive to audio manipulations such as playback speed modifications and can therefore be easily attacked [9,10].
Conversely, in [11], the authors propose a method for detecting AI-generated lyrics, which shows promising results. Thus, lyrics derived from the audio signal could be leveraged to detect AI-generated content. As lyrics are mainly independent of the audio generation models, lyrics information should be mostly invariant under audio manipulation and audio generation model changes. Leveraging lyrics should lead to detection models that are more robust and generalize better. However, the solution proposed in [11] relies on the existence of properly formatted lyrics, which, in practice, are not available (only the audio is). 2 This leaves a substantial gap in its applicability in real-life use cases.
In this paper, we confirm our hypothesis (that leveraging lyrics could lead to more robust and generalizable detection models) by proposing a new AI-generated music detection method that leverages lyrics directly from audio via transcribing, as depicted in Figure 1. We show that this method maintains the state-of-the-art performance of [11] across a diverse multi-genre corpus. Crucially, it does so despite the extra difficulty of transcription. Also, we show that it is robust to audio manipulation and exhibits promising signs of generalization to unseen models.
Finally, it should be noted that AI-generated audio and AI-generated lyrics are not perfectly correlated. Many commercial models support inputting human-written lyrics; thus, it is possible to get AI-generated music with sung human-written lyrics. Moreover, it is possible to generate lyrics with AI and have them performed by real singers and orchestras. Thus, the tasks of AI-generated lyrics detection and AI-generated music audio detection differ. However, the second case (real music with AI-generated lyrics) is less probable because investing significant resources in a professional recording with lyrics of limited perceived value is unlikely. Overall, we believe detecting AI-generated lyrics can help with monitoring AI-generated content, as tracks with entirely AI-generated lyrics are likely to be fully AI-generated. The task of AI-generated lyrics detection from audio would also have straightforward applications for publishers and copyright collecting societies to avoid providing royalties for AI-generated lyrics, as, for instance in the US, this content cannot be protected by copyright [12].

2 Lyrics are normally not provided as metadata when ingesting music in an industrial setting, which particularly affects new songs.

[Figure 1: pipeline diagram. Song → Transcribe → Transcript → Feature Extraction → Features → MLP → Fake / Real]
Figure 1. Overview of our pipeline to detect AI-generated songs using lyrics. Using only its waveform, we get a song’s lyrics using a transcriber (e.g., Whisper). We then extract a feature vector from the transcript (e.g., with LLM2Vec), which is subsequently fed into an MLP-based detector to classify the song as real or fake. Only the MLP classifier is trained while the other components are used as-is without further training.

The paper is organized as follows: In Section 2, we provide an overview of relevant domains: AI-generated music, and the detection of AI-generated music and text. In Section 3, we describe our method. The experimental setup in Section 4 focuses on the presentation of the data, baselines, and evaluation metric used in this work. Then, we show results in Section 5 and conclude in Section 6.
2. RELATED WORK
AI-generated music generation. Current AI music generation models typically rely on two main components working in sequence. The first is an autoencoder (AE) trained to compress raw audio into a more manageable representation, which can then be reconstructed into an audio signal. Today, advanced neural audio codecs such as Encodec [13], DAC [14], SoundStream [15], Music2Latent [16], and MusicLM [1] are commonly used, enabling higher-quality generation. The second key component involves training a model to predict and generate the compressed sequence over time, often based on text prompts. Two principal approaches dominate this stage: large language models (LLMs), as explored in works such as [1, 2, 15], and latent diffusion models, as seen in systems like Stable Audio [17, 18] and MusicLDM [19]. In simple terms, the AE is responsible for waveform synthesis, while the LLM or diffusion model ensures the generation of a coherent musical sequence over time. The commercial AI music-generation platforms, such as Suno, Udio, and Riffusion, generate entire songs, including lyrics conditioning. However, little is publicly known about the architectures of their underlying models.
AI-generated music detection. The first attempts to detect AI-generated content focused on detecting voice cloning [20, 21]. As these technologies continue to advance, it is increasingly difficult to distinguish cloned voices from real human performances. Consequently, researchers are focusing on identifying their presence in songs. A broader effort to detect AI-generated music has primarily focused on the AE component of music generators. Research such as [9, 10] aims to determine if a music sample has been synthesized by an artificial decoder, independent of its musical content, by identifying artifacts introduced during the encoding-decoding process. More recently, the work of [22] has sought to identify whether the music and/or lyrics have been AI-generated and to determine which specific component was generated. However, audio-based methods have been shown to be prone to perturbations in audio, such as time stretching or adding noise [9], making their practical usage difficult.
AI-generated text detection. Detecting whether a text is generated by AI has been widely researched. It is often framed as a supervised, binary classification challenge that consists of separating human-written content from machine-produced text [23, 24]. Most classifiers employ textual encoders such as RoBERTa or Longformer [23, 25–28] or LLMs [29–32]. However, this approach depends on having a sufficiently large training dataset, which is not always available, and it risks overfitting when faced with unfamiliar authorial styles or newly created generative models [33, 34]. A different strand of research aims to differentiate AI-generated text from human-written content by analyzing variations in generative model metrics or stylistic characteristics [35–38]. Although these methods have shown effectiveness, their performance may vary compared to supervised approaches, influenced by the generative model and dataset employed [28]. Moreover, some researchers have investigated watermark-based detection techniques [25–27]. While these approaches yield promising results, most existing watermarking insertion schemes still require direct access to model logits, which external users of API-restricted models such as GPT-4 typically lack [39]. Furthermore, [40] benchmark multiple AI-generated text detectors, showing that most of them exhibit high false positive rates, fail to generalize out-of-domain, and are vulnerable to different types of adversarial attacks, with each of the detectors showing different behavior.
While a plethora of research has explored AI-generated text detection across various domains, only [11] have explored detecting AI-generated lyrics, introducing a corpus of synthetic lyrics generated using several LLMs with human lyrics seeds spanning nine languages from musically diverse genres. In contrast, our aim is to detect AI-generated music via lyrics in a realistic scenario where only the audio is accessible.

Text encoder      Base model      Dimension
Retrieval-optimized conventional text encoders
MINILMV2          XLM-ROBERTA     1024
BGE-M3            XLM-ROBERTA     1024
Stylistic text encoders
UAR-CRUD          DISTILROBERTA   768
UAR-MUD           DISTILROBERTA   768
LLM-based text encoders
BGE-ML-GEMMA      GEMMA-2-9B      3584
LLM2VEC-LLAMA     LLAMA-3-8B      4096
Table 1. Overview of the text encoders used.

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025

3. METHOD
The proposed pipeline to identify directly from audio if lyrics are AI-generated or human-written is illustrated in Figure 1. First, the audio is processed by a transcription model to generate a lyrics transcript. Building upon previous research showing their effectiveness with lyrics [41, 42] and robustness to audio modifications [43], we use pre-trained Whisper models, specifically Whisper-large-v2 [44] using the faster-whisper library [45]. The transcriptions are used as-is without any correction. We also experimented with various types of post-processing, such as text normalization, removing special characters, and stripping punctuation, but none of these improved detection performance. Moreover, as shown in Section 5, the performance gap between the best detector on ground-truth clean lyrics and transcribed noisy lyrics is small (less than 4%). This suggests that even raw transcripts are promising for realistic scenarios where only audio is available.
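The post-processing variants mentioned above (text normalization, removing special characters, stripping punctuation) could look roughly like the following sketch. The function names, the exact normalization rules, and the sample line are assumptions made for illustration; the text does not specify how these steps were implemented, and it reports that none of them improved detection.

```python
import re
import unicodedata


def remove_special_characters(text: str) -> str:
    """Keep letters, marks, numbers, punctuation and whitespace; drop symbols etc."""
    keep = ("L", "M", "N", "P")
    return "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch)[0] in keep
    )


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode category is punctuation."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def normalize_text(text: str) -> str:
    """Unicode-normalize, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def postprocess(text: str) -> str:
    """Hypothetical combination of the three steps (order is an assumption)."""
    return normalize_text(strip_punctuation(remove_special_characters(text)))


# Made-up transcript line, not taken from the dataset.
raw = "Hello  darkness ♪ my old friend,\nI came to talk!"
clean = postprocess(raw)
```

Since the pipeline described here feeds raw transcripts to the encoder, such a function would only be used for an ablation, not for the final system.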
We then input the complete lyrics transcript into a pre-trained multilingual text embedding model to capture semantic, syntactic, and stylistic properties. For a fair comparison, the context window is set to 512 tokens for all models. Most lyrics fit within this limit, but in rare cases where the token count exceeds 512, we truncate the input. This step yields a single, contextualized vector-based representation of the lyrics. Following past research on synthetic lyrics detection [11], we test multiple types of text embedding models: (1) conventional text encoders optimized for retrieval [46–48]; (2) LLM-based encoders [48, 49]; and (3) text encoders designed to capture stylistic characteristics of text [50]. While these models could be further specialized to lyrics, we leave this as future work; although domain adaptation appears to help with the detection task [11], the overall performance gains remain modest, making it a lower priority for our work. We provide a summary of the different models in Table 1.
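The 512-token limit described above amounts to cutting off the tail of over-long transcripts before encoding. The sketch below illustrates this with a whitespace split standing in for each model's own subword tokenizer, which is an assumption made only to keep the example self-contained; the helper names are hypothetical.

```python
MAX_TOKENS = 512  # context window stated in the text


def truncate(tokens, max_tokens=MAX_TOKENS):
    """Keep at most max_tokens tokens, dropping the tail."""
    return tokens[:max_tokens]


def prepare_transcript(transcript, max_tokens=MAX_TOKENS):
    """Return the (possibly truncated) transcript and a truncation flag.

    Assumption: whitespace tokens stand in for the encoder's real tokens.
    """
    tokens = transcript.split()
    kept = truncate(tokens, max_tokens)
    return " ".join(kept), len(tokens) > max_tokens


long_text, long_flag = prepare_transcript("la " * 600)
short_text, short_flag = prepare_transcript("hello world")
```

With a real encoder, the same effect is usually obtained by asking its tokenizer to truncate to the maximum length rather than by splitting text manually.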
Retrieval-optimized conventional text encoders. The first category, retrieval-optimized conventional text encoders, builds upon foundation models such as BERT [51] and MPNet [52] and addresses one of their key limitations: high computational costs for tasks requiring the capturing of semantic similarity between pairs of textual inputs [46]. In practice, the retrieval-optimized text encoder initializes a Siamese network with the weights of a foundation model such as BERT, which is then fine-tuned using a contrastive learning approach on pairs of similar texts. The output is a fixed-size vector.
Although we tested multiple models from the SENTENCE-TRANSFORMERS [46] library, we report only the best-performing one for the detection task: a model based on MINILM [53], a distilled version of XLM-ROBERTA [54], fine-tuned on over one billion similar sentence pairs. 3 The second encoder, BGE-M3 [48], is built on the same foundation model, XLM-ROBERTA. BGE-M3 introduces several innovations with regard to data curation (for both the self-supervised pre-training and the supervised fine-tuning for sentence similarity) and the training strategy, which relies on a self-knowledge distillation framework where multiple retrieval functions (embedding-based, aka dense retrieval; keyword-based, aka sparse retrieval) are learned together.
LLM-based text encoders. The second category entails LLM-based text encoders. LLMs are primarily designed for text generation, not text encoding, as they are decoder-based and trained autoregressively. However, recent work, such as LLM2Vec [49], has proposed using LLMs for text encoding, specifically via a three-step method to convert these models into text encoders. First, the causal attention mask is modified to allow bidirectional attention. Then, the model is pre-trained to adapt to this new attention mechanism with a Masked Next-Token Prediction (MNTP) objective. Optionally, the model is further trained with contrastive learning in an unsupervised way using SimCSE [47] to enhance sequence representation for downstream tasks. Specifically, the same input is subjected to multiple dropout masks to generate a pair of textual inputs used for sentence similarity fine-tuning. This approach has shown strong results, making autoregressive LLMs very effective for text encoding but with a higher computational cost and inference time.
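The first of the three steps, replacing the causal attention mask with a bidirectional one, can be illustrated on a toy sequence. This is only a sketch with hypothetical helper names: it shows which positions may attend to which, not how LLM2Vec changes the mask inside an actual model.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """Indices that position i can attend to under the given mask."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


causal = causal_mask(3)
full = bidirectional_mask(3)
first_token_causal = visible_positions(causal, 0)
first_token_full = visible_positions(full, 0)
```

Under the causal mask the first token sees only itself, whereas under the bidirectional mask it sees the whole sequence, which is why the model then needs an adaptation phase such as MNTP.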
In our experiments, we used LLM2Vec based on LLAMA-3-8B and tested both the MNTP and SIMCSE versions. Since their performance was similar in our task, consistent with the findings in [11] on clean lyrics, we report results only for the MNTP model. Additionally, we included another LLM-based model in our experiments, similar to BGE-M3, but built on Gemma [55] and optimized for multilingual text similarity and retrieval tasks: BGE-ML-GEMMA [48]. 4

3 For more details, see https://www.sbert.net.

                        en   de   tr   fr   pt   es   it   ar   ja
Human-written   Train   30   30   30   30   30   30   30   30   30
                Test   450  392  339  450  450  450  211  354  338
AI-generated    Train   28   30   22   30   29   30   27   30   29
                Test   450  450  241  450  450  450  150  407  255
Table 2. Distribution of human-written and AI-generated lyrics by language used as the generation seed (ISO 639 codes).

fr   alternative, french, hip-hop, pop, r&b, rock
it   alternative, electronic, hip-hop, jazz, pop, rock
es   alternative, electronic, hip-hop, latin-american, pop, rock
tr   alternative, electronic, folk, hip-hop, pop, rock
en   alternative, electronic, hip-hop, pop, r&b, rock
de   alternative, edm, electronic, hip-hop, pop, rock
pt   christian, hip-hop, música popular brasileira, pop, samba-pagode, sertanejo
ja   alternative, asian, electronic, pop, rock, soundtrack
ar   alternative, arabic, electronic, hip-hop, pop, rock
Table 3. Genres used for each language (ISO 639 codes). Selected genres are the most often streamed ones per language.

Stylistic text encoders. The third category consists of stylistic text encoders, which have recently proven highly effective in identifying whether a text is human-written or generated by both open-source and closed-source LLMs [56]. Universal Authorship Attribution (UAR) models are trained to capture the author’s writing style, complementing the usual syntactic and semantic cues in embeddings [50]. In practice, a contrastive learning strategy is used to train a retrieval-optimized conventional text encoder to separate style from topic. A positive example consists of an input from the same author on a different topic, while a negative example is from another author but on the same topic. UAR exists in multiple variants, UAR-MUD and UAR-CRUD, trained on input from 1 or 5 million Reddit users, respectively.
Finally, on top of the already extracted features with frozen pre-trained models, we train a multi-layer perceptron (MLP) with two hidden layers of sizes 256 and 128, using the ReLU activation function. We optimize the model with AdamW [57], setting the learning rate to 1e−3 and reducing it by a factor of 0.1 if the training loss does not improve for 5 consecutive epochs. For all implementations, we use pytorch-lightning [58].
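The learning-rate rule described above (start at 1e-3, multiply by 0.1 after 5 consecutive epochs without improvement in training loss) can be traced with a small standalone sketch. The function name and the exact counting convention are assumptions; library schedulers such as PyTorch's reduce-on-plateau scheduler may count patience slightly differently, and the loss values below are invented.

```python
def plateau_schedule(losses, lr=1e-3, factor=0.1, patience=5):
    """Return the learning rate used in each epoch.

    After `patience` consecutive epochs whose training loss does not beat
    the best loss seen so far, the rate is multiplied by `factor`.
    """
    best = float("inf")
    stale = 0
    schedule = []
    for loss in losses:
        schedule.append(lr)      # rate in effect during this epoch
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor     # takes effect from the next epoch on
                stale = 0
    return schedule


# Invented loss curve: improvement, then five stagnant epochs, then one more epoch.
losses = [1.0, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9, 0.7]
schedule = plateau_schedule(losses)
```

In this trace the first seven epochs run at 1e-3 and the eighth at 1e-4, since the reduction is triggered once the fifth stagnant epoch has been observed.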
4. EXPERIMENTAL SETUP
We next present how we created the dataset, starting from the lyrics-only corpus proposed by [11]. Then, we briefly describe the baselines and the evaluation metrics used.

4 As of the time of writing, the detailed adaptation of Gemma in BGE-ML-GEMMA for semantic text similarity has not been fully disclosed.

4.1 Data
While several datasets with AI-generated audio exist, only the one by [11] contains AI-generated lyrics. It provides 3,704 real and 3,558 AI-generated lyrics using three LLM generators (Mistral, TinyLlama, and WizardLM2) and human lyrics spanning nine languages and the six most popular genres per language as seeds. Table 2 presents the distribution of lyrics by language used as the generation seed, as well as by source (AI-generated or human-written) and train/test split. The genres in Table 3 correspond to the six most present music genres in each language, covering the majority of streams per language according to the statistics provided by the Deezer music streaming service.
However, the dataset proposed by [11] only provides lyrics; no audio is available. To enable realistic experiments representative of fully AI-generated music, we generate accompanying audio using Suno v3.5, conditioned on both lyrics and genre. It is capable of generating realistic songs of up to 4 minutes and represents the most recent stable Suno model. Crucially, we utilize the provided lyrics from [11] to ensure control over the content of the lyrics in the audio and to ensure diversity in terms of the generative model (LLM) used for the text modality.
For songs with human-generated lyrics, we use the original audio. In experiments, we follow the same train/test split as introduced by [11]. Thus, our dataset contains diverse songs with (i) fake lyrics (LLM-generated) / fake audio (Suno-generated with lyrics conditioned from external LLMs) songs and (ii) real audio / real lyrics songs.
Moreover, to assess generalization abilities to unseen audio and its artifacts, we want to test our already trained model on a new, previously unseen audio generation model. To this end, we turn toward Udio, another popular music generation tool, and generate 260 songs with AI-generated lyrics from the test set of the same lyrics dataset. Here, we use the most recent udio-130 v1.5 model, setting lyrics strength to 100% to ensure that the provided lyrics are used as-is without major changes and the seed to 42. We leave the other settings at their default value. We then sample 260 real songs to maintain balance across languages and genres. We then evaluate the models trained on the Suno dataset on this out-of-domain scenario with both real and AI-generated songs.

Model                  en    de    tr    fr    pt    es    it    ar    ja    Macro Avg.
BASELINES
GT LYRICS_LLM2Vec †    91.3  97.4  95.3  99.4  97.5  95.7  94.3  91.5  85.9  94.3
CNN_Spectrogram ‡      97.5  96.3  97.5  98.7  98.8  97.0  98.0  94.4  98.4  97.4
TEXT-BASED DETECTORS VIA LYRICS TRANSCRIPTIONS
Retrieval-optimized conventional text encoders
MINILMV2               80.8  92.6  93.4  94.4  91.5  90.1  92.1  83.0  67.7  87.3
BGE-M3                 84.7  91.3  92.1  93.5  91.4  89.0  91.0  86.6  70.1  87.7
Stylistic text encoders
UAR-CRUD               81.9  92.1  92.4  94.3  92.2  87.4  90.8  85.9  76.4  88.2
UAR-MUD                85.2  92.9  93.0  95.0  92.8  91.3  91.5  87.2  77.8  89.6
LLM-based text encoders
BGE-ML-GEMMA           84.4  94.4  92.9  96.6  93.0  91.8  92.9  88.3  78.0  90.2
LLM2VEC-LLAMA          90.6  94.6  93.5  96.5  92.6  91.2  92.9  87.8  77.0  90.7
Table 4. Recall scores for each language used as generation seed. We report the average over languages and the best lyrics-based model in bold per language. † denotes the best-performing baseline by [11], using non-transcribed ground truth (GT) lyrics with LLM2VEC-LLAMA. ‡ uses the amplitude spectrogram to train a CNN on the task, following [9].

4.2 Baselines
Our method considers a realistic scenario where only audio is available by leveraging transcribed lyrics to detect AI-generated music. We also include two strong baselines considering different scenarios. The first, GT LYRICS_LLM2Vec, uses ground truth (i.e., non-transcribed; as found in a lyrics booklet) lyrics with the text encoding method LLM2VEC and Llama-3-8B as its underlying feature extractor since this combination performed best in [11]. However, this assumes the availability of perfectly formatted lyrics. Like our models, an MLP is trained on the output embeddings, which are frozen. Second, we follow [10] and train a lightweight CNN on the audio’s amplitude spectrogram directly on the task, aiming to detect audio artifacts. 5
4.3 Evaluation
We evaluate our model’s performance using macro-recall, following [11, 28, 59], as it provides a realistic evaluation across the two classes and is suitable for balanced datasets like ours. Our focus is thus on minimizing false negatives (misclassifying real lyrics) and maximizing true positives (correctly identifying AI-generated lyrics).
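Macro-recall is the unweighted mean of the per-class recalls, so the "real" and "fake" classes count equally regardless of their sizes. The following minimal sketch computes it from scratch; the label strings and the toy predictions are invented for illustration.

```python
def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy example: 4 real and 4 fake songs.
y_true = ["real"] * 4 + ["fake"] * 4
y_pred = ["real", "real", "real", "fake", "fake", "fake", "real", "real"]
score = macro_recall(y_true, y_pred)  # recall(real) = 3/4, recall(fake) = 2/4
```

Here the score is (0.75 + 0.5) / 2 = 0.625.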
5. RESULTS
We present the main experimental results in Table 4. As a reminder, the GT LYRICS_LLM2Vec baseline refers to the method proposed by [11], which takes ground-truth lyrics as input and can be considered an upper baseline. The CNN method is a reimplementation of [10], serving as a baseline that does not rely on lyrics information.

5 Such models could also be trained on other input representations, but the findings of [10] are consistent across them. Hence, we resort to the best-performing one.

We observe only a slight performance decrease between the GT LYRICS_LLM2Vec baseline and our proposed method using lyrics transcription when the text detection model is an LLM-based text encoder (BGE-ML-Gemma and LLM2Vec-LLaMa). This indicates that the lyrics transcription block effectively captures lyrical information, which is further exploited by our proposed method in Figure 1 to classify whether a song is AI-generated or human-written when only audio is available.
Performance is largely consistent across the languages used as generation seeds, though we observe notable drops for Japanese and, to some extent, for Arabic and English as well. We analyze these errors and their sources in more detail in Section 5.3. The other types of text encoders show more modest performance, confirming that the LLM2Vec encoder is a better feature extractor for AI-generated lyrics detection on both ground-truth and transcribed lyrics. The CNN baseline outperforms the lyrics-based model on in-domain data. However, its performance declines on out-of-domain data, as discussed in the next section (§5.1).
5.1 Out-of-distribution Audio Generalization
We report the result of the out-of-distribution experiment in Table 5: Under basic audio manipulations, the CNN method shows a considerable performance drop for all transformations except time-stretching, similar to findings in [10]. In contrast, performance remains stable for lyrics-based detectors. Moreover, lyrics-based models generalize well to unseen audio generators. Specifically, when trained only on Suno-generated audio, they still detect Udio-generated audio with only a small performance drop. In contrast, the artifact-based CNN model experiences a significant decline, performing only slightly better than chance on Udio-generated content. This confirms the hypothesis that lyrics are largely unaffected by audio manipulation and can serve as a crucial extra cue for detecting fully generated content.

               AUDIO ATTACKS
Model          Stretch  Pitch  EQ    Noise  Reverb   UDIO
CNN            98.1     59.0   79.4  77.4   80.7     56.9
UAR-MUD        86.7     88.8   88.8  88.6   88.5     85.6
BGE-GEMMA      91.0     89.8   89.9  89.7   90.0     86.1
L2V-LLAMA      90.0     89.7   89.6  89.3   89.6     85.9
Table 5. Recall scores on out-of-distribution data (Udio) and when fake songs are perturbed (attacked) in different ways. We report average scores over languages.

5.2 Out-of-distribution Text Generalization
Next, we assess generalization abilities of each model w.r.t. text generators. For this, we keep music with lyrics generated by one of the models unseen from the training set and use it only at test time, while still using songs with lyrics from the other two models for training. We report the results of this experiment in Table 6, where columns indicate the models that are not used during training but only at test.
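The leave-one-generator-out protocol above can be sketched as a simple filter over song records. The record fields (`generator`, `split`) and the handling of real songs (kept in their original train/test split) are assumptions made for this illustration; the text only specifies how the AI-generated songs are partitioned.

```python
GENERATORS = ("Mistral", "TinyLlama", "WizardLM2")


def leave_one_generator_out(songs, held_out):
    """Split songs so that fake songs from `held_out` appear only at test time.

    Each song is a dict with hypothetical keys:
      'generator': LLM name for fake songs, None for real songs
      'split':     'train' or 'test' (original split of the lyrics corpus)
    """
    train, test = [], []
    for song in songs:
        is_fake = song["generator"] is not None
        if song["split"] == "train":
            # Train on real songs and on fake songs from the other generators.
            if not is_fake or song["generator"] != held_out:
                train.append(song)
        else:
            # Test on real songs and on fake songs from the held-out generator.
            if not is_fake or song["generator"] == held_out:
                test.append(song)
    return train, test


# Tiny invented corpus: one real and one fake song per generator in each split.
songs = [
    {"generator": g, "split": s}
    for s in ("train", "test")
    for g in (None,) + GENERATORS
]
train, test = leave_one_generator_out(songs, held_out="Mistral")
```

Repeating this for each entry of `GENERATORS` yields one train/test configuration per column of a table such as Table 6.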
First, the artifact-based CNN model maintains a strong performance when the lyrics generation model is changed (95.3% on average). This was expected, as the lyrics generation model should not significantly affect the audio artifacts that the detection model relies on. However, performance drops on music with lyrics generated by TinyLlama and WizardLM2, which is somewhat surprising.
Considering lyrics-based models, their performance is more impacted by the set of lyrics generation models used in training. Yet, we still observe good generalization when the detection model is trained on data with lyrics from TinyLlama and WizardLM2 and tested on data with lyrics from Mistral (88.9% for LLM2VEC-LLAMA). A larger performance drop occurs when testing on TinyLlama or WizardLM2 after training with the other two text generators. Still, the performance remains decent and well above chance (unlike the audio-based CNN when tested on data from an unseen generator), indicating some generalizability to unseen generation models. Additionally, this suggests that training on diverse data from multiple text generation models (i.e., LLMs) is essential to maintain good performance on out-of-domain data.
5.3 Qualitative Analysis
In this section, we provide insights into the detectors’ performance in identifying AI-generated lyrics when various pairs of genres and languages are used as seeds in generation. As a reminder, the most listened-to music genres per language are listed in Table 3.
As shown in Table 4, the best-performing model, LLM2VEC-LLAMA, effectively distinguishes AI-generated from human-written music overall. However, when examining the results in detail, performance varies across genre-language pairs. In some cases, the model fails to detect a large portion (or even all) of the AI-generated content, particularly for Italian Jazz and Turkish Folk (both 0% recall), as well as Japanese Electronic (32%), Japanese Alternative (33%), and Japanese Soundtrack (35%) seeds. In contrast, some English genres, such as Alternative, Electronic, Pop, R&B, and Rock, exhibit higher false positive rates. This suggests that the model is more likely to misclassify music with human-written lyrics as AI-generated in these cases.

               TEXT GENERATION MODELS (LLMS) USED DURING TESTING
Model          Mistral  TinyLlama  WizardLM2   MACRO AVG.
CNN            99.4     92.6       94.0        95.3
UAR-MUD        85.2     71.3       78.1        78.2
BGE-GEMMA      86.5     74.8       79.5        80.3
L2V-LLAMA      88.9     75.7       80.2        81.6
Table 6. Recall scores of detectors (rows) when tested on lyrics from different generator LLMs (columns), following a leave-one-generator-out approach. Each model is trained on lyrics from the generators not shown in the respective column. We macro-average scores over languages.

The low recall for certain genre-language pair seeds, such as Italian Jazz and Turkish Folk, can be largely attributed to severe class imbalance. Specifically, these pairs have very few AI-generated examples in the dataset, suggesting that insufficient training data is the determining factor in the model’s poor performance. Yet, some English genres’ high false positive rate may stem from stylistic similarities between human-written and AI-generated lyrics. A potential solution could be adjusting the detector’s classification threshold to be less aggressive in these cases, though we leave this for future work.
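The threshold adjustment suggested above, which the text leaves as future work, can be illustrated as follows. The scores are invented, and the function names and the 0.5 default are assumptions: the sketch only shows how raising the decision threshold on the predicted "fake" probability lowers the false positive rate on human-written songs.

```python
def classify(fake_probability, threshold=0.5):
    """Label a song 'fake' when its predicted probability reaches the threshold."""
    return "fake" if fake_probability >= threshold else "real"


def false_positive_rate(real_scores, threshold=0.5):
    """Share of human-written songs that would be flagged as AI-generated."""
    flagged = sum(1 for s in real_scores if classify(s, threshold) == "fake")
    return flagged / len(real_scores)


# Invented detector outputs for four human-written songs of one genre.
real_scores = [0.2, 0.55, 0.6, 0.9]
fpr_default = false_positive_rate(real_scores, threshold=0.5)
fpr_stricter = false_positive_rate(real_scores, threshold=0.7)
```

A stricter threshold trades fewer false positives for more missed AI-generated songs, so in practice it would have to be tuned on held-out data per use case.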
6. CONCLUSION
In this work, we proposed a robust and practical method to detect AI-generated music focused on lyrics, but using only audio. To achieve this, we first transcribe the songs, overcoming the reliance on perfect ground-truth lyrics. Features are then extracted using various text encoders, and a lightweight MLP classifier is trained on top of these features to identify AI-generated music. Experiments show that the proposed method works effectively, is more resilient to audio distortions than past audio-only detection methods, maintaining performance where others degrade, and generalizes well to unseen AI music generation models, as well as fairly well to unseen LLMs for lyrics generation. Future work should explore robustness to more complex audio manipulations and attacks, broader generalization to other AI music models once available, integrating alternative text encoders, including some fine-tuned on lyrics, and fusion with audio-based features for improved detection. By enabling reliable detection directly from audio, our work offers a practical tool to address copyright concerns and ensure transparency. Its robustness and ease of deployment make it a practical solution for AI music management in the industry.
7. ETHICS STATEMENT
Although designed for beneficial purposes such as safeguarding copyright and promoting transparency, the development and disclosure of AI detection systems pose complex ethical challenges. We acknowledge the dynamic landscape of this field and the need to catch up; as generative models evolve, their outputs may become statistically indistinguishable from human-created content. Detectors may be learning transient artifacts of current models, and their long-term utility is an open question. Furthermore, our method addresses fully AI-generated content, but the growing prevalence of hybrid human-AI collaborative workflows presents a more nuanced scenario that current detectors do not yet address. In general, misapplication of such tools could lead to unwarranted removals of content, disproportionately harming creators. Moreover, biases embedded in training data may also skew detection outcomes across different musical genres or languages. To address these immediate and long-term challenges, we urge ethical development and implementation of AI-generated content detectors, prioritizing transparency about their capabilities and limitations, equitable outcomes, and human-in-the-loop approaches to balance innovation with protections for artists, creators, and the integrity of the music industry.
8. ACKNOWLEDGEMENTS
This research was funded in whole or in part by the Austrian Science Fund (FWF): https://doi.org/10.55776/COE12, https://doi.org/10.55776/DFH23, https://doi.org/10.55776/P36413. The authors would like to thank Aurelien Herault, Manuel Moussallam, Yanis Labrak, and Gaspard Michel for their invaluable feedback on this work.
9. REFERENCES
[1] A. Agostinelli, T. I. Denk, Z. Borsos, J. Engel,
M. Verzetti, A. Caillon, Q. Huang, A. Jansen,
A. Roberts, M. Tagliasacchi, M. Sharifi, N. Zeghidour,
and C. H. Frank, “Musiclm: Generating music from
text,” ArXiv, vol. abs/2301.11325, 2023. [Online].
Available: https://api.semanticscholar.org/CorpusID:256274504
[2] J. Copet, F. Kreuk, I. Gat, T. Remez, D. Kant, G. Synnaeve,
Y. Adi, and A. Défossez, “Simple and controllable
music generation,” in Thirty-seventh Conference
on Neural Information Processing Systems, 2023.
[3] N. Ziqian, C. Huakang, J. Yuepeng, H. Chunbo,
M. Guobin, W. Shuai, Y. Jixun, and X. Lei,
“DiffRhythm: Blazingly fast and embarrassingly simple
end-to-end full-length song generation with latent
diffusion,” arXiv preprint arXiv:2503.01183, 2025.
[4] Joe Sparrow, “AI-generated song charts in Germany,
amid controversy,”
https://musically.com/2024/08/13/ai-generated-song-charts-in-germany-amid-controversy/,
2024, [Online; accessed 26-March-2025].
[5] RIAA Press statements, “Record Companies
Bring Landmark Cases for Responsible AI
Against Suno and Udio in Boston and New
York Federal Courts, Respectively,”
https://www.riaa.com/record-companies-bring-landmark-cases-for-responsible-ai-againstsuno-and-udio-in-boston-and-new-york-federal-courts-respectively/,
2024, [Online; accessed 26-March-2025].
[6] Daniel Tencer, “10,000 AI tracks uploaded daily
to Deezer, platform reveals, as it files two
patents for new AI detection tool,”
https://www.musicbusinessworldwide.com/10000-ai-tracks-are-uploaded-daily-to-deezer-platform-reveals-as-it-files-two-patents-for-new-ai-detection-tool/,
2025, [Online; accessed 26-March-2025].
[7] T. C. for Content Provenance and A. (C2PA), “C2pa
specifications,” 2024. [Online]. Available:
https://c2pa.org/specifications/specifications/1.3/index.html
[8] A. T. S. Committee, “Atsc standard: Audio watermark
emission,” 2024.
[9] D. Afchar, G. Meseguer-Brocal, and R. Hennequin,
“Detecting music deepfakes is easy but actually hard,”
ArXiv, vol. abs/2405.04181, 2024. [Online]. Available:
https://api.semanticscholar.org/CorpusID:269614314
[10] D. Afchar, G. Meseguer-Brocal, and R. Hennequin,
“Ai-generated music detection and its challenges,”
ICASSP 2025 - 2025 IEEE International Conference
on Acoustics, Speech and Signal Processing
(ICASSP), 2025.
[11] Y. Labrak, M. Frohmann, G. Meseguer-Brocal,
and E. V. Epure, “Synthetic lyrics detection across
languages and genres,” in Proceedings of the 5th Workshop
on Trustworthy NLP (TrustNLP 2025), T. Cao,
A. Das, T. Kumarage, Y. Wan, S. Krishna, N. Mehrabi,
J. Dhamala, A. Ramakrishna, A. Galystan, A. Kumar,
R. Gupta, and K.-W. Chang, Eds. Albuquerque,
New Mexico: Association for Computational Linguistics,
May 2025, pp. 524–541. [Online]. Available:
https://aclanthology.org/2025.trustnlp-main.34/
[12] “Copyright act of 1976,” 1976.
[13] A. Défossez, J. Copet, G. Synnaeve, and Y. Adi,
“High fidelity neural audio compression,” ArXiv, vol.
abs/2210.13438, 2022. [Online]. Available:
https://api.semanticscholar.org/CorpusID:253097788
[14] R. Kumar, P. Seetharaman, A. Luebs, I. Kumar,
and K. Kumar, “High-fidelity audio compression with
improved rvqgan,” in Advances in Neural Information
Processing Systems, vol. 36. Curran Associates,
Inc., 2023, pp. 27 980–27 993. [Online]. Available:
https://arxiv.org/abs/2306.06546
[15] N. Zeghidour, A. Luebs, A. Omran, J. Skoglund,
and M. Tagliasacchi, “Soundstream: An end-to-end
neural audio codec,” IEEE/ACM Transactions on
Audio, Speech, and Language Processing, vol. 30,
pp. 495–507, 2021. [Online]. Available:
https://api.semanticscholar.org/CorpusID:236149944
[16] M. Pasini, S. Lattner, and G. Fazekas, “Music2latent:
Consistency autoencoders for latent audio compression,”
in Proceedings of the 25th International Society
for Music Information Retrieval Conference. ISMIR,
Nov. 2024, pp. 111–119. [Online]. Available:
https://doi.org/10.5281/zenodo.14877289
[17] Z. Evans, C. Carr, J. Taylor, S. H. Hawley,
and J. Pons, “Fast timing-conditioned latent audio
diffusion,” ArXiv, vol. abs/2402.04825, 2024. [Online].
Available: https://api.semanticscholar.org/CorpusID:267523339
[18] Z. Evans, J. Parker, C. Carr, Z. Zukowski, J. Taylor,
and J. Pons, “Stable audio open,” ArXiv, vol.
abs/2407.14358, 2024. [Online]. Available:
https://api.semanticscholar.org/CorpusID:271310050
[19] K. Chen, Y. Wu, H. Liu, M. Nezhurina,
T. Berg-Kirkpatrick, and S. Dubnov, “Musicldm:
Enhancing novelty in text-to-music generation
using beat-synchronous mixup strategies,”
ICASSP 2024 - 2024 IEEE International Conference
on Acoustics, Speech and Signal Processing
(ICASSP), pp. 1206–1210, 2023. [Online]. Available:
https://api.semanticscholar.org/CorpusID:260438807
[20] Y. Zang, Y. Zhang, M. Heydari, and Z. Duan,
“Singfake: Singing voice deepfake detection,” in Proc.
IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP). IEEE, 2024, pp.
1–5.
[21] D. Desblancs, G. Meseguer-Brocal, R. Hennequin, and
M. Moussallam, “From real to cloned singer identification,”
in Proceedings of the 25th International Society
for Music Information Retrieval Conference. ISMIR,
Nov. 2024.
[22] M. A. Rahman, Z. I. A. Hakim, N. H. Sarker, B. Paul,
and S. A. Fattah, “Sonics: Synthetic or not - identifying
counterfeit songs,” in International Conference on
Learning Representations (ICLR), 2025.
[23] X. Liu, Z. Zhang, Y. Wang, H. Pu, Y. Lan, and C. Shen,
“Coco: Coherence-enhanced machine-generated text
detection under low resource with contrastive learning,”
in Proceedings of the 2023 Conference on Empirical
Methods in Natural Language Processing, 2023,
pp. 16 167–16 188.
[24] F. Huang, H. Kwak, and J. An, “Toblend: Token-level
blending with an ensemble of llms to attack
ai-generated text detection,” arXiv preprint
arXiv:2402.11167, 2024.
[25] S. Abdelnabi and M. Fritz, “Adversarial watermarking
transformer: Towards tracing text provenance with
data hiding,” in 2021 IEEE Symposium on Security and
Privacy (SP). IEEE, 2021, pp. 121–140.
[26] M. Chakraborty, S. T. I. Tonmoy, S. M. Zaman, S. Gautam,
T. Kumar, K. Sharma, N. Barman, C. Gupta,
V. Jain, A. Chadha et al., “Counter turing test (ct2):
Ai-generated text detection is not as easy as you may
think-introducing ai detectability index (adi),” in Proceedings
of the 2023 Conference on Empirical Methods
in Natural Language Processing, 2023, pp. 2206–2239.
[27] J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers,
and T. Goldstein, “A watermark for large language
models,” in International Conference on Machine
Learning. PMLR, 2023, pp. 17 061–17 084.
[28] Y. Li, Q. Li, L. Cui, W. Bi, Z. Wang, L. Wang,
L. Yang, S. Shi, and Y. Zhang, “MAGE: Machine-generated
text detection in the wild,” in Proceedings
of the 62nd Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers),
L.-W. Ku, A. Martins, and V. Srikumar, Eds.
Bangkok, Thailand: Association for Computational
Linguistics, Aug. 2024, pp. 36–53. [Online]. Available:
https://aclanthology.org/2024.acl-long.3
[29] D. Macko, R. Moro, A. Uchendu, J. S. Lucas, M. Yamashita,
M. Pikuliak, I. Srba, T. Le, D. Lee, J. Simko
et al., “Multitude: Large-scale multilingual machine-generated
text detection benchmark,” in 2023 Conference
on Empirical Methods in Natural Language Processing,
EMNLP 2023. Association for Computational
Linguistics (ACL), 2023, pp. 9960–9987.
[30] W. Antoun, D. Seddah, and B. Sagot, “From text to
source: Results in detecting large language model-generated
content,” in The 2024 Joint International
Conference on Computational Linguistics, Language
Resources and Evaluation (LREC-COLING 2024),
2024.
[31] Y. Chen, H. Kang, V. Zhai, L. Li, R. Singh, and B. Raj,
“Token prediction as implicit classification to identify
llm-generated text,” in Proceedings of the 2023 Conference
on Empirical Methods in Natural Language Processing,
2023, pp. 13 112–13 120.
[32] T. S. Kumarage, P. Sheth, R. Moraffah, J. Garland
et al., “How reliable are ai-generated-text detectors? an
assessment framework using evasive soft prompts,” in
The 2023 Conference on Empirical Methods in Natural
Language Processing, 2023.
[33] A. Uchendu, T. Le, K. Shu, and D. Lee, “Authorship
attribution for neural text generation,” in Proceedings
of the 2020 conference on empirical methods in natural
language processing (EMNLP), 2020, pp. 8384–8395.
[34] A. Bakhtin, S. Gross, M. Ott, Y. Deng, M. Ranzato,
and A. Szlam, “Real or fake? learning to discriminate
machine from human generated text,” arXiv preprint
arXiv:1906.03351, 2019.
[35] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and
C. Finn, “Detectgpt: Zero-shot machine-generated text
detection using probability curvature,” in International
Conference on Machine Learning, 2023. [Online].
Available: https://api.semanticscholar.org/CorpusID:256274849
[36] J. Su, T. Zhuo, D. Wang, and P. Nakov, “Detectllm:
Leveraging log rank information for zero-shot detection
of machine-generated text,” in Findings of the
Association for Computational Linguistics: EMNLP
2023, 2023, pp. 12 395–12 412.
[37] B. Zhu, L. Yuan, G. Cui, Y. Chen, C. Fu, B. He,
Y. Deng, Z. Liu, M. Sun, and M. Gu, “Beat llms at their
own game: Zero-shot llm-generated text detection via
querying chatgpt,” in Proceedings of the 2023 Conference
on Empirical Methods in Natural Language Processing,
2023, pp. 7470–7483.
[38] V. S. Sadasivan, A. Kumar, S. Balasubramanian,
W. Wang, and S. Feizi, “Can ai-generated text be reliably
detected?” arXiv preprint arXiv:2303.11156,
2023.
[39] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya,
F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman,
S. Anadkat et al., “Gpt-4 technical report,” arXiv
preprint arXiv:2303.08774, 2023.
[40] L. Dugan, A. Hwang, F. Trhlík, A. Zhu, J. M.
Ludan, H. Xu, D. Ippolito, and C. Callison-Burch,
“RAID: A shared benchmark for robust evaluation
of machine-generated text detectors,” in Proceedings
of the 62nd Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers),
L.-W. Ku, A. Martins, and V. Srikumar, Eds.
Bangkok, Thailand: Association for Computational
Linguistics, Aug. 2024, pp. 12 463–12 492. [Online].
Available: https://aclanthology.org/2024.acl-long.674/
[41] L. Zhuo, R. Yuan, J. Pan, Y. Ma, Y. Li, G. Zhang,
S. Liu, R. B. Dannenberg, J. Fu, C. Lin, E. Benetos,
W. Chen, W. Xue, and Y.-T. Guo, “Lyricwhiz:
Robust multilingual zero-shot lyrics transcription by
whispering to chatgpt,” ArXiv, vol. abs/2306.17103,
2023. [Online]. Available:
https://api.semanticscholar.org/CorpusID:259287024
[42] O. Cífka, H. Schreiber, L. Miner, and F.-R. Stöter,
“Lyrics transcription for humans: A readability-aware
benchmark,” arXiv preprint arXiv:2408.06370, 2024.
[43] S. Katkov, A. Liotta, and A. Vietti, “Benchmarking
whisper under diverse audio transformations and
real-time constraints,” in International Conference on
Speech and Computer. Springer, 2024, pp. 82–91.
[44] A. Radford, J. W. Kim, T. Xu, G. Brockman,
C. McLeavey, and I. Sutskever, “Robust speech
recognition via large-scale weak supervision,” ArXiv,
vol. abs/2212.04356, 2022. [Online]. Available:
https://api.semanticscholar.org/CorpusID:252923993
[45] G. Klein, J. W. Kim, Y. Kim, and C. Delangue,
“faster-whisper: A reimplementation of openai’s whisper
model using ctranslate2,” nov 2023. [Online].
Available: https://github.com/SYSTRAN/faster-whisper
[46] N. Reimers and I. Gurevych, “Sentence-BERT:
Sentence embeddings using Siamese BERT-networks,”
in Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), K. Inui, J. Jiang,
V. Ng, and X. Wan, Eds. Hong Kong, China:
Association for Computational Linguistics, Nov.
2019, pp. 3982–3992. [Online]. Available:
https://aclanthology.org/D19-1410
[47] T. Gao, X. Yao, and D. Chen, “SimCSE: Simple
contrastive learning of sentence embeddings,” in
Proceedings of the 2021 Conference on Empirical
Methods in Natural Language Processing, M.-F.
Moens, X. Huang, L. Specia, and S. W.-t. Yih,
Eds. Online and Punta Cana, Dominican Republic:
Association for Computational Linguistics, Nov.
2021, pp. 6894–6910. [Online]. Available:
https://aclanthology.org/2021.emnlp-main.552
[48] J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu,
“M3-embedding: Multi-linguality, multi-functionality,
multi-granularity text embeddings through
self-knowledge distillation,” in Findings of the Association
for Computational Linguistics: ACL 2024, L.-W.
Ku, A. Martins, and V. Srikumar, Eds. Bangkok,
Thailand: Association for Computational Linguistics,
Aug. 2024, pp. 2318–2335. [Online]. Available:
https://aclanthology.org/2024.findings-acl.137/
[49] P. BehnamGhader, V. Adlakha, M. Mosbach,
D. Bahdanau, N. Chapados, and S. Reddy,
“Llm2vec: Large language models are secretly
powerful text encoders,” 2024. [Online]. Available:
https://arxiv.org/abs/2404.05961
[50] R. A. Rivera-Soto, O. E. Miano, J. Ordonez, B. Y.
Chen, A. Khan, M. Bishop, and N. Andrews,
“Learning universal authorship representations,” in
Proceedings of the 2021 Conference on Empirical
Methods in Natural Language Processing, M.-F.
Moens, X. Huang, L. Specia, and S. W.-t. Yih,
Eds. Online and Punta Cana, Dominican Republic:
Association for Computational Linguistics, Nov.
2021, pp. 913–919. [Online]. Available:
https://aclanthology.org/2021.emnlp-main.70
[51] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova,
“BERT: Pre-training of deep bidirectional transformers