Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

Author: António Pinto (INESC TEC; University of Porto, Faculty of Engineering)

Publisher: Zenodo

DOI: 10.5281/zenodo.17706403

Source: https://zenodo.org/records/17706403/files/000037.pdf

TOWARDS HUMAN-IN-THE-LOOP ONSET DETECTION: A TRANSFER
LEARNING APPROACH FOR MARACATU
António Sá Pinto
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
INESC TEC, Porto, Portugal
[email protected]
ABSTRACT
We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition, which features complex rhythmic patterns that challenge conventional models. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments. Our results demonstrate significant improvements over baseline performance, with F1 scores reaching up to 0.998 in the intra-task setting and improvements of over 50 percentage points in best-case scenarios. The cross-task adaptation proves particularly effective for time-keeping instruments, where onsets naturally align with beat positions. The optimal fine-tuning configuration varies by instrument, highlighting the importance of instrument-specific adaptation strategies. This approach addresses the challenges of underrepresented musical traditions, offering an efficient human-in-the-loop methodology that minimizes annotation effort while maximizing performance. Our findings contribute to more inclusive music information retrieval tools applicable beyond Western musical contexts.
1. INTRODUCTION
Accurately identifying the precise moment when a musical note begins remains one of the fundamental challenges in audio signal processing. This task, known as musical onset detection, serves as a cornerstone for numerous Music Information Retrieval (MIR) applications. Onset detection has historically been essential for rhythmic analysis, notably in beat tracking systems [1–3]. While end-to-end learning models have recently bypassed this explicit step in some contexts, onset detection continues to be critical for diverse applications such as score following [4], music segmentation [5], and polyphonic music transcription [6].
The methodological evolution of onset detection mirrors broader trends in MIR research. Early approaches relied on signal processing techniques to identify significant changes in audio properties [7, 8], followed by the introduction of feature-based machine learning methods [9, 10]. The field then shifted toward neural network architectures, beginning with Recurrent Neural Networks (RNNs) [11] and advancing to Convolutional Neural Networks (CNNs) [12], which extract relevant features directly from raw audio or spectral representations. Despite impressive advances in performance metrics (with top models achieving F1 scores approaching 90% in recent evaluations; see footnote 1), significant challenges persist in onset detection. In particular, accurately detecting soft onsets remains difficult even for advanced models [13]. Moreover, these data-driven approaches introduce additional challenges related to training data requirements and generalizability.
© A. S. Pinto. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: A. S. Pinto, “Towards Human-in-the-loop Onset Detection: A Transfer Learning Approach for Maracatu”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
The effectiveness of supervised learning models hinges on the quality and diversity of training data [14]. Current systems experience performance drops when analysing non-Western musical traditions or rare instruments, primarily due to insufficient representation in existing datasets. Addressing these gaps requires costly annotation efforts that demand both domain-specific and culturally-informed expertise [15], further complicating dataset curation. Furthermore, the annotation process itself reveals limitations: manual labelling of onsets is prone to human error and inconsistencies [16], with even isolated percussive signals proving difficult to label precisely [17]. These constraints restrict the practical deployment of state-of-the-art systems in diverse musical contexts, pointing to the need for more adaptable strategies.
Moving beyond the specific challenges of onset detection, MIR research has employed several adaptive strategies within rhythm analysis tasks. Informed methods leverage a priori knowledge about rhythmic content for tasks such as beat tracking [18] and meter determination [19], which, while effective in specific genres, lack generalizability. Transfer learning leverages knowledge across domains, with examples including adaptations of mainstream beat-tracking models for Greek folk music [20] and facilitating adaptive rhythm microtiming generation [21]. Additionally, user-centric approaches like Active Learning and Few-Shot Learning optimize learning through strategic sample selection, enhancing adaptability in polyphonic drum transcription [22, 23] and enabling interactive refinement of onset detection [24] and beat tracking [25]. This shift toward user involvement exemplifies the current human-centred landscape of MIR, recognizing users’ essential role in data-driven systems [26]. The integration of human expertise into computational frameworks provides a promising avenue when existing solutions prove insufficient.
1 MIREX 2018, at https://nema.lis.illinois.edu/nema_out/mirex2018/results/aod/
Recent research has explored incorporating user-provided information to enhance beat tracking performance. Techniques such as high-level model parameterization [27] and integrating user-annotated data snippets in a fine-tuning cycle [28] have shown promise for improving state-of-the-art accuracy. These methods are particularly effective in addressing challenges in underrepresented musical contexts, where conventional MIR techniques underperform. Such approaches have proven instrumental in the creation of the Maracatu onset dataset [17], meter determination in Latin-American music [29], and beat tracking in highly challenging music signals [30]. The implementation of transfer learning for these tasks varies considerably: while some approaches retrain only final layers to leverage basic rhythmic representations [20, 31], others target input and output layers for instrument-specific adaptation [17], and some retrain entire networks [28]. Despite these varied strategies, no studies have empirically evaluated the impact of layer-wise retraining on model performance, leaving this critical question unexplored.
Building on this foundation, this paper explores a user-driven transfer learning approach for onset detection, focusing on the Afro-Brazilian tradition of Maracatu. We use the eponymous dataset [17], which features complex rhythms and unique instrumental acoustic characteristics that cause leading models to struggle with achieving satisfactory performance.
Our methodology involves adapting a deep neural network to each instrument in the “terno”, the percussion ensemble central to Maracatu’s rhythm, based on a short annotated snippet per instrument. Through these instrument-specific adaptations, we demonstrate an effective and straightforward method to enhance state-of-the-art performance. We investigate two distinct transfer learning scenarios: one with a model initially trained for onset detection, and another novel approach adapting a beat-tracking model for onset detection. This extends previous research [17, 32] by exploring cross-task feature transferability and leveraging more complex models trained on larger datasets. Furthermore, we systematically evaluate layer-wise retraining strategies, examining the effectiveness of freezing different layer groups to identify optimal configurations for Maracatu onset detection.
2. METHODOLOGY
Our approach addresses the limitations of existing models in non-mainstream signals by integrating user-provided short annotated snippets. We adapt the human-in-the-loop method proposed by Pinto et al. for beat tracking [27, 28, 30] to the task of onset detection, leveraging state-of-the-art models through in-situ fine-tuning. This user-centred methodology eliminates the need for extensive training from scratch, enabling end-users to swiftly obtain high-quality onset estimates that align with their judgments.
For onset detection in monotimbral signals, we adapt neural networks to each instrument’s unique acoustic characteristics using just a single 5-second annotated snippet per instrument as the fine-tuning target. This approach demonstrates both minimal annotation effort and rapid adaptation cycles, yielding instrument-specific networks optimized for their corresponding acoustic properties while remaining computationally feasible for standard resources. While our method is applicable to various DNN architectures, this study employs Temporal Convolutional Network (TCN)-based models for their efficient retraining capabilities. The TCN’s performance in onset detection tasks is comparable to state-of-the-art models, as demonstrated in Section 3.1, making it suitable for our investigation.
We explore two transfer learning scenarios: an intra-task setting using a TCN onset detection model [32] and an inter-task setting that adapts a TCN beat tracking model [33] for onset detection. This inter-task approach can be framed as a domain adaptation problem, where a model trained for beat tracking is repurposed for onset detection. Given the inherent relationship between beats and onsets, this adaptation may benefit from the typically broader training data available for beat tracking models. To the best of our knowledge, this is the first study to explore domain adaptation from beat tracking to onset detection.
Furthermore, onset detection’s unambiguous objective, when contrasted with the multifaceted nature of beat tracking, allows for clearer adaptation targets and, consequently, more straightforward interpretation of results. This motivated us to extend previous research by examining layer-wise retraining strategies. We systematically freeze different segments of the 15-layer TCN architectures, from the initial convolutional layers with small receptive fields to the deeper layers with large dilation rates and wide receptive fields. In total, our experimental cycle comprises 150 fine-tuning cycles (15 layer configurations × 5 instruments × 2 models). Through this comprehensive evaluation, we aim to investigate feature transferability between related rhythm analysis tasks and systematically assess the impact of different layer freezing configurations.
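As an illustration of the size of this experimental cycle, the following minimal sketch enumerates the grid of fine-tuning runs. The layer labels follow those shown in the paper's figures; the label "full" for the fully trainable configuration and the function name `experiment_grid` are hypothetical names introduced here, not taken from the released code.

```python
from itertools import product

# Layer labels as they appear in the paper's figures. "full" is a
# hypothetical name for the fully trainable configuration.
CONV_LAYERS = ["Conv1", "Conv2", "Conv3"]
TCN_LAYERS = [f"Tcn{2 ** k}" for k in range(11)]  # Tcn1, Tcn2, ..., Tcn1024
CONFIGURATIONS = ["full"] + CONV_LAYERS + TCN_LAYERS

INSTRUMENTS = ["cuica", "gonge-lo", "tarol", "mineiro", "tambor-hi"]
BASE_MODELS = ["TCN 1", "TCN 2"]


def experiment_grid():
    """Return every (model, instrument, configuration) fine-tuning run."""
    return list(product(BASE_MODELS, INSTRUMENTS, CONFIGURATIONS))


runs = experiment_grid()
print(len(CONFIGURATIONS), len(runs))  # prints: 15 150
```

Each tuple in `runs` corresponds to one fine-tuning cycle, so the count matches the 15 × 5 × 2 product stated above.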
In line with open science principles [34], we provide a GitHub repository with our code and detailed results, including per-file evaluation metrics for all configurations and higher-resolution figures for detailed analysis (see footnote 2). The remainder of this section outlines the Maracatu dataset composition, experimental settings, base models’ description, and fine-tuning and evaluation details.
2 https://github.com/asapsmc/HIILOnsetDetection
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
2.1 Dataset
Maracatu de baque solto (see footnote 3), also known as Maracatu “rural”, is a vibrant carnival performance from Pernambuco, Northeast Brazil, combining music, poetry, and dance [36]. The rhythmic nucleus of Maracatu, known as the “terno” ensemble, consists of five percussionists playing traditional handmade instruments: cuica, gonge-lo, tarol, mineiro, and tambor-hi. The Maracatu dataset [17] captures these instruments using contact microphones for largely isolated per-instrument tracks, recorded during a fixed location performance and comprising 34 individual pieces totalling approximately 33 minutes (see footnote 4).
Maracatu features two main rhythmic patterns: “marcha” and “samba”, characterized by fast tempi of approximately 165 and 180 beats-per-minute (bpm), respectively. This rapid pace creates a complex timing profile across the ensemble. Time-keeping instruments (cuica and gonge-lo) maintain rhythmic stability despite their sporadic use, with a mean onset count of around 4,700 (2.5 annotations per second). In contrast, the “voicing” instruments (tarol, mineiro, and tambor-hi) play more expressive roles, resulting in a higher mean onset count of approximately 16,600 (8.9 annotations per second).
[Figure 1: onset-annotated waveform panels for Cuica, Gonge-Lo, Mineiro, Tambor-Hi, and Tarol; x-axis: time (seconds); y-axis: Amplitude.]
Figure 1. Onset-annotated waveforms of the Maracatu instruments. Left: 5-second fine-tuning snippet; Right: Zoomed-in waveform, from the second onset to the sample before the third onset (in blue).
The intricacy of these rhythms and distinct waveform shapes, as illustrated in Figure 1, complicates onset detection and annotation. The mineiro exemplifies this challenge with its unusual waveform characteristics, which led to its exclusion from microtiming analysis in the original dataset creation study due to annotation difficulties [17]. Combined with the under-representation of these instruments in available model training data, these factors create substantial obstacles for both human annotators and automated systems. The Maracatu dataset thus provides an ideal testbed for our human-in-the-loop strategy, extending the approach previously employed in the dataset’s creation.
3 Hereafter referred to as Maracatu, this genre should be distinguished from Maracatu de baque virado (or “Nação”). Both share African origins and certain musical similarities, but differ significantly in instrumentation, practice, and narrative [35].
4 While the original dataset contains 34 files per instrument, we excluded Instrument_34 files across all sub-datasets due to a corrupted Mineiro_34 file.
2.2 Base Models
This study employs two pre-trained models, both derived from the TCN architecture proposed by Davies and Böck [37]. For the intra-task setting, we use a modified version of the original TCN model with an additional 11th dilation rate level [32], trained from scratch on the OnsetDB dataset [4] for onset detection. In the inter-task scenario, we utilize an adaptation of the [33] multitask network, modified by masking its tempo and downbeat loss to function as a single-task (beat) network, trained on various beat-tracking datasets. Hereafter, we refer to these models as TCN 1 and TCN 2, respectively.
[Figure 2: layer sequence Input, Conv1, Conv2, Conv3, Tcn1, Tcn2, Tcn4, Tcn8, Tcn16, Tcn32, Tcn64, Tcn128, Tcn256, Tcn512, Tcn1024, Output.]
Figure 2. High-level architecture shared by the TCN 1 and TCN 2 models. Both follow the same layer sequence and depth, but differ in convolutional filter configuration, resulting in distinct receptive fields and overall model sizes.
As illustrated in Figure 2, both models share the same high-level architecture and signal conditioning stages, but their implementations differ significantly. TCN 1 consists of three convolutional layers with 16 filters and filter shapes of 3×3, 3×3, and 1×8, with max pooling over three frequency bins after the first two layers. In contrast, TCN 2 employs three convolutional layers with 20 filters and filter shapes of 3×3, 1×10, and 3×3, each followed by max pooling over three frequency bins. Both architectures use dropout after each convolutional stage. The ensuing TCN block operates non-causally and consists of 11 dilation levels, 16 filters, and a kernel size of 5. The TCN 1 model comprises 21,890 parameters, while the TCN 2 model has 116,302 parameters. The original training procedures also differed slightly in optimization techniques: TCN 1 employed a standard Adam optimizer, whereas TCN 2 used a Rectified Adam plus Lookahead approach.
2.3 Fine-tuning
For both intra-task and inter-task transfer learning settings, we adopt the fine-tuning strategy described in [28], using a 5-second annotated sample per instrument to demonstrate minimal annotation effort. Each base model is fine-tuned for 50 epochs with the learning rate reduced to one-quarter of the original value, maintaining the original optimizers for seamless training continuation. Early stopping and learning rate reduction mechanisms were not implemented as these parameters proved sufficient for convergence given the short training duration and small dataset size. Given our systematic layer-wise analysis comprising 150 fine-tuning cycles, we omitted data augmentation and additional hyperparameter optimization to maintain experimental tractability and support isolated analysis of how layer-wise freezing strategies relate to each instrument’s acoustic characteristics.
We evaluate all possible fine-tuning configurations, denoted as A-B, where A and B indicate the starting and ending layers of the frozen section, respectively. The output layer is always updated and thus excluded from this notation. We explore configurations from Conv1...3 to Tcn1...1024, including the fully trainable configuration (no frozen layers). These are compared with the intra-task baseline bsl and the inter-task baseline bsl*.
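The A-B notation can be made concrete with a small sketch that splits the layer sequence into a frozen section and the layers left trainable. This is only an illustration under stated assumptions: the function name `split_layers` and the "Output" label are hypothetical, and the sketch operates on layer names rather than on an actual network.

```python
# Layer labels in network order, as shown in the paper's architecture figure.
LAYERS = ["Conv1", "Conv2", "Conv3"] + [f"Tcn{2 ** k}" for k in range(11)]


def split_layers(start=None, end=None):
    """Split layers into (frozen, trainable) for a frozen section start..end.

    Calling with no bounds gives the fully trainable configuration.
    The output layer is always updated, so it is always in `trainable`.
    """
    if start is None and end is None:
        frozen = []
    else:
        a, b = LAYERS.index(start), LAYERS.index(end)
        if a > b:
            raise ValueError("start layer must not come after end layer")
        frozen = LAYERS[a:b + 1]
    trainable = [name for name in LAYERS if name not in frozen] + ["Output"]
    return frozen, trainable


frozen, trainable = split_layers("Conv1", "Conv3")
print(frozen)        # the three convolutional layers
print(trainable[0])  # first layer that is still updated
```

In a real framework the frozen names would be used to disable gradient updates for the matching layers; that step depends on the library and is left out here.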
2.4 Evaluation
The network output is an onset activation function with a 10-millisecond (ms) temporal resolution. We apply the standard madmom peak-picking algorithm to obtain onset estimates. Performance is evaluated using the F1 metric with the default 25 ms tolerance window [4]. We implement a holdout validation approach where, for each instrument, we extract a 5-second segment from the first file (Instrument_01) for fine-tuning and then exclude this entire file from the evaluation set to prevent data leakage. This ensures unbiased assessment of the instrument-adapted models by evaluating performance on the remaining 32 files per instrument.
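The holdout split and the tolerance-window F1 described above can be sketched as follows. This is a simplified illustration, not the madmom evaluation code: the one-to-one matching rule, the function names, and the file naming pattern (instrument name plus a two-digit index) are assumptions made for this example.

```python
def holdout_files(instrument, n_files=34, corrupted=(34,), tuning_index=1):
    """Return (fine-tuning file, evaluation files) for one instrument."""
    names = {i: f"{instrument}_{i:02d}"
             for i in range(1, n_files + 1) if i not in corrupted}
    tuning = names.pop(tuning_index)  # whole file leaves the evaluation set
    return tuning, list(names.values())


def onset_f1(detections, annotations, window=0.025):
    """F1 for onset times in seconds, matched one-to-one within +/- window."""
    det, ann = sorted(detections), sorted(annotations)
    i = j = tp = 0
    while i < len(det) and j < len(ann):
        if abs(det[i] - ann[j]) <= window:
            tp += 1          # matched pair: consume both onsets
            i += 1
            j += 1
        elif det[i] < ann[j]:
            i += 1           # unmatched detection (false positive)
        else:
            j += 1           # unmatched annotation (false negative)
    fp, fn = len(det) - tp, len(ann) - tp
    total = 2 * tp + fp + fn
    return 2 * tp / total if total else 0.0


tuning, evaluation = holdout_files("Cuica")
print(tuning, len(evaluation))  # prints: Cuica_01 32
print(onset_f1([0.10, 0.52, 1.00], [0.11, 0.50, 0.80]))
```

In the usage example two of three detections fall within 25 ms of an annotation, giving two true positives, one false positive, and one false negative.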
3. EXPERIMENTS AND RESULTS
3.1 Preliminary Model Analysis
To contextualize our approach, we first compare the performance of our base TCN models with previous state-of-the-art methods on the OnsetDB dataset [4]. Our base models, TCN 1 and TCN 2, achieve F1 scores of 0.907 and 0.340, respectively. The lower performance of TCN 2 is expected, as it was originally trained for beat tracking rather than onset detection. The madmomRNN and madmomCNN models, pre-trained and provided as inference-ready models in the madmom package [38], achieve F1 scores of 0.849 and 0.913, respectively. However, it is important to note that these evaluations were conducted without knowledge of the original training/test splits used for these pre-trained models, creating potential data leakage that may lead to an overestimation of their performance. The 2nd generation onset CNN [39] remains the established benchmark, with a reported F1 score of 0.903, verified through k-fold cross-validation. Unlike the madmom models, our TCN models were evaluated under the same validation conditions as the 2nd gen CNN, ensuring comparability. These results indicate that TCN 1 is competitive with the current state of the art in onset detection.
Table 1. Representative configurations demonstrating improvements across transfer learning settings.
            Onset-to-Onset                             Beat-to-Onset
Instrument  Adapted (best) config       F1     bsl     Adapted (best) config   F1     bsl*
Cuica       Tcn16                       0.985  0.477                           0.955  0.429
Gonge-Lo    Tcn2/4/16                   0.998  0.508                           0.956  0.892
Mineiro     Tcn16                       0.972  0.946   Tcn8                    0.790  0.193
Tambor-Hi   (fully trainable)/Tcn1024   0.978  0.965   Tcn1                    0.723  0.443
Tarol       Conv3                       0.997  0.993                           0.884  0.139
3.2 Onset-to-Onset Transfer Learning Results
Figure 3 (top) presents the F1 scores obtained for each fine-tuning configuration in comparison to the baseline. The results can be grouped based on the rhythmic role of the instruments: time-keeping (cuica and gonge-lo) vs. voicing (tarol, mineiro, and tambor-hi).
For time-keeping instruments, the baseline performance is moderate (F1 ≈ 0.5), but fine-tuning yields significant improvements, with scores reaching the 0.8–1.0 range. In contrast, expressive instruments exhibit higher initial F1 scores (≈ 0.9–1.0), which limits the relative improvement. This disparity can be attributed to the conventional nature of tarol and tambor-hi, which are more aligned with the training data, whereas cuica and gonge-lo diverge more in terms of acoustic characteristics. An exception is mineiro, which achieves a relatively high baseline score despite its distinct waveform characteristics. However, the reported lower precision of these ground-truth annotations [17] complicates direct performance comparisons.
Table 1 presents high-performing configurations to demonstrate the achievable improvements across instruments. The Tcn16 model achieves the highest accuracy for cuica and mineiro (0.985 and 0.972, respectively), while Tcn2, Tcn4, and Tcn16 all achieve the highest F1 score for gonge-lo (F1 = 0.998). For tambor-hi, the best performance is obtained with both Tcn1024 and the fully trainable configuration (F1 = 0.978). For tarol, the highest F1 score (0.997) is achieved with Conv3, though many configurations show comparable performance with marginal differences. These configurations consistently outperform the baseline, with the most notable gains observed in cuica and gonge-lo, where F1 improvements exceed 50 percentage points (p.p.).
In summary, all instruments benefit from adaptation, as most fine-tuned configurations (and in particular, the best for each instrument) consistently outperform the baseline. The improvement is especially pronounced for time-keeping instruments (cuica and gonge-lo), likely due to their lower baseline accuracy, which allows more room for improvement, and the relative ease of detecting sparse onsets compared to those that are closely clustered in time, even though the spacing between onsets remains well above the network’s temporal resolution of 10 ms. The optimal freeze configuration varies by instrument, with no clear global trend. However, some patterns emerge: for voicing instruments, full-network fine-tuning ranks among the top-performing configurations, whereas it degrades performance for time-keeping instruments.
[Figure 3: top and bottom rows of panels for Cuica, Gonge-Lo, Mineiro, Tambor-Hi, and Tarol; y-axis: F1 (0.0 to 1.0); x-axis: configurations bsl*/bsl, Conv1 to Conv3, Tcn1 to Tcn1024.]
Figure 3. Distribution of F1 scores per layer-wise configuration under two transfer learning settings: Onset-to-Onset (top), where fine-tuned models are compared against their baseline, and Beat-to-Onset (bottom), where we assess cross-task versus within-task transfer learning, with comparable performance observed for time-keeping instruments.
3.3 Beat-to-Onset Transfer Learning Results
In this section, we focus on a domain adaptation, where a model pre-trained for beat tracking is adapted to onset detection. Unlike the previous setting, the goal here is not to compare fine-tuned models to their baseline, as this originates from a different task. We also refrain from an in-depth analysis of mean F1 scores across datasets, given their limited interpretative value. Instead, we assess whether models fine-tuned in this setting achieve results comparable to those in the onset-to-onset transfer learning scenario. Figure 3 (bottom) provides an overview of the results.
Time-keeping instruments, such as cuica and gonge-lo, achieve relatively high baseline (bsl*) accuracies, likely due to the alignment between their onsets and beat locations. Adaptation improves accuracy across all instruments, confirming the feasibility of beat-to-onset transfer learning. However, while the fine-tuned models consistently outperform the beat-tracking baseline, direct comparisons to the onset-to-onset setting reveal performance disparities that vary by instrument. Specifically, for time-keeping instruments, performance remains nearly identical across both transfer learning scenarios, with differences of only 1.6 p.p. for cuica and 3.7 p.p. for gonge-lo. In contrast, voicing instruments exhibit progressively larger discrepancies, with F1-score differences of 11.3 p.p. for tarol, 27.5 p.p. for mineiro, and the largest gap of 32.2 p.p. for tambor-hi.
Close inspection of the layer-wise results reveals additional patterns. The accuracy generally increases as more layers are fine-tuned up to the 3rd or 4th dilation level, beyond which no further gains are observed. However, this trend does not hold for tarol, where deeper fine-tuning leads to additional performance improvements. These observations highlight that, while fine-tuning is beneficial across all cases, the optimal retraining depth remains instrument-dependent.
Altogether, the results indicate that feature transferability from beat tracking to onset detection is more effective for time-keeping instruments than for voicing instruments. Specifically, gonge-lo exhibits a clearly higher baseline F1 accuracy in the beat-to-onset setting compared to its onset-to-onset counterpart (0.892 vs. 0.508), while cuica achieves a comparable performance (0.429 vs. 0.477), as reported in Table 1. This enhanced cross-task adaptability arises from the metrical function of time-keeping instruments: their onsets inherently coincide with beat positions, making them natural targets for the pre-trained model’s rhythmic representations. Examining these results more closely, we verify that Maracatu’s tempo range of 165–180 BPM corresponds to inter-beat intervals of 333–363 ms. These durations approximately match the waveform spans of cuica and gonge-lo, but not those of the other instruments (see footnote 5).
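The tempo-to-interval conversion behind this comparison is a one-line formula (60,000 ms divided by the tempo in BPM). The sketch below applies it and measures how far each instrument's informally inspected waveform span, as listed in the paper's footnote 5, lies from the resulting inter-beat range. The function names and the distance measure `gap_ms` are illustrative choices, not part of the paper's method.

```python
def inter_beat_interval_ms(bpm):
    """Duration of one beat in milliseconds at the given tempo."""
    return 60_000.0 / bpm


def gap_ms(span, reference):
    """Distance between two (low, high) ranges in ms; 0.0 if they overlap."""
    return max(0.0, span[0] - reference[1], reference[0] - span[1])


# Inter-beat range for 180 and 165 BPM (fastest tempo gives the shortest beat).
ibi_range = (inter_beat_interval_ms(180), inter_beat_interval_ms(165))

# Waveform spans in ms, copied from the paper's footnote 5.
waveform_spans_ms = {
    "cuica": (384, 428),
    "gonge-lo": (376, 400),
    "tarol": (77, 107),
    "mineiro": (90, 180),
    "tambor-hi": (120, 230),
}

print(f"inter-beat range: {ibi_range[0]:.1f} to {ibi_range[1]:.1f} ms")
for name, span in waveform_spans_ms.items():
    print(f"{name}: {gap_ms(span, ibi_range):.1f} ms from the inter-beat range")
```

Under this measure the two time-keeping instruments sit within a few tens of milliseconds of the inter-beat range, while the voicing instruments are more than 100 ms away.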
This temporal alignment, where the instruments’ acoustic profiles align with the genre’s inter-beat intervals, explains the high baseline accuracies. Additionally, the larger capacity of TCN 2 (116,302 parameters vs. 21,890 in TCN 1) and its exposure to a broader training set may further contribute to this advantage. This suggests that model expressivity and pre-training diversity can compensate for task differences in certain transfer learning scenarios.
5 According to an informal inspection of waveform spans: cuica 384-428 ms, gonge-lo 376-400 ms, tarol 77-107 ms, mineiro 90-180 ms, and tambor-hi 120-230 ms.
3.4 Discussion
Our investigation of two contrasting transfer learning scenarios reveals that adaptation outperforms baseline approaches across all instruments, with varying degrees of improvement.
In the within-domain setting, adaptation yielded high accuracies with F1 scores from 0.972 (mineiro) to 0.998 (gonge-lo) and 0.997 (tarol). Improvement was most pronounced for time-keeping instruments with lower baseline accuracies (≈ 0.500), with cuica showing a 52 p.p. gain. For the cross-domain adaptation, while improvements over the beat-tracking baseline (bsl*) were evident, comparison against the onset-tracking baseline (bsl) revealed instrument-dependent patterns. Voicing instruments’ best F1 scores remained below the onset-tracking baseline by 11-24 p.p., indicating limited benefits from domain adaptation. However, for time-keeping instruments whose onsets align with the pre-trained model’s rhythmic priors, cross-task adaptation yielded improvements of 45-48 percentage points.
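The cross-task comparison against the onset-tracking baseline can be recomputed directly from the Table 1 values. The sketch below does only that arithmetic; the dictionary layout and function names are hypothetical conveniences for this example.

```python
# F1 values copied from Table 1:
# instrument: (onset-to-onset best, bsl, beat-to-onset best, bsl*)
TABLE_1 = {
    "cuica": (0.985, 0.477, 0.955, 0.429),
    "gonge-lo": (0.998, 0.508, 0.956, 0.892),
    "mineiro": (0.972, 0.946, 0.790, 0.193),
    "tambor-hi": (0.978, 0.965, 0.723, 0.443),
    "tarol": (0.997, 0.993, 0.884, 0.139),
}


def percentage_points(a, b):
    """Difference a - b between two F1 scores, in percentage points."""
    return round((a - b) * 100, 1)


def cross_task_vs_onset_baseline():
    """Best beat-to-onset F1 minus the onset-tracking baseline (bsl)."""
    return {name: percentage_points(b2o_best, bsl)
            for name, (_, bsl, b2o_best, _) in TABLE_1.items()}


for name, diff in cross_task_vs_onset_baseline().items():
    print(f"{name}: {diff:+.1f} p.p.")
```

Positive values mean the adapted beat-tracking model beats the unadapted onset model; negative values mean it stays below it, which is the case for the three voicing instruments.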
These findings provide key insights: i) Fine-tuning consistently enhances performance in both settings, making it valuable for achieving high accuracy in underrepresented music genres; ii) Models trained on beat-tracking can be effectively adapted to onset detection, leveraging model scale to compensate for task divergence and addressing limited data availability for non-mainstream instruments. However, effectiveness varies by instrument type: beat-to-onset adaptation benefits time-keeping instruments, while onset-to-onset adaptation consistently improves performance across all instruments. These improvements are naturally more substantial when baseline accuracy is lower, as observed in voicing instruments.
Our results also demonstrate that optimal fine-tuning configurations vary by instrument, necessitating tailored strategies for selecting which layer weights to update during fine-tuning. This challenges the assumption that only layers closest to the musical surface and the output layer would require recalibration to optimize a network for a specific instrument [17].
Finally, several limitations warrant consideration. Our results represent a single experimental cycle, and despite prior research suggesting relative stability across runs [28, 30], the stochastic nature of the (re)training process (due to convolutional dropout) implies that results may vary. While unlikely to affect general trends, multiple cycles would be needed to investigate specific aspects such as receptive field size impact and its relation to optimal layer freeze selection for instrument waveform profiles. Note that, as previously discussed, corresponding layers across the two models differ in their temporal receptive fields despite having the same labels. For instance, while Conv3 corresponds to approximately 50 ms in both models, the layer Tcn2 spans 170 ms in TCN 1 vs. 410 ms in TCN 2. This discrepancy must be considered when interpreting results, limiting direct comparison between specific freeze configurations across scenarios. The lower annotation precision of mineiro further limits some result interpretation, potentially explaining its anomalous performance (e.g. lowest fine-tuned and baseline accuracy on each setting).
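The receptive-field figures quoted for TCN 1 can be approximated with the usual rule that a stride-1 layer widens the field by (kernel − 1) × dilation frames. The sketch below rests on several assumptions that are not stated in the paper: the first number of each filter shape is read as the time axis, all layers have stride 1, pooling acts on frequency only, and each TCN level is a single dilated convolution with dilation 2^k. It does not model TCN 2, whose 410 ms figure depends on that model's own layer details.

```python
FRAME_MS = 10  # temporal resolution of the network output

# Assumed time-axis kernel lengths of TCN 1's convolutional layers,
# reading the first number of each filter shape (3x3, 3x3, 1x8) as time.
CONV_TIME_KERNELS = [("Conv1", 3), ("Conv2", 3), ("Conv3", 1)]
TCN_KERNEL = 5
TCN_DILATIONS = [2 ** k for k in range(11)]  # 1, 2, 4, ..., 1024


def receptive_field_ms(last_layer):
    """Receptive field (ms) of TCN 1 up to and including `last_layer`."""
    layers = [(name, kernel, 1) for name, kernel in CONV_TIME_KERNELS]
    layers += [(f"Tcn{d}", TCN_KERNEL, d) for d in TCN_DILATIONS]
    frames = 1
    for name, kernel, dilation in layers:
        # A stride-1 layer widens the field by (kernel - 1) * dilation frames.
        frames += (kernel - 1) * dilation
        if name == last_layer:
            return frames * FRAME_MS
    raise ValueError(f"unknown layer: {last_layer}")


print(receptive_field_ms("Conv3"), receptive_field_ms("Tcn2"))  # prints: 50 170
```

Under these assumptions the sketch reproduces the approximately 50 ms (Conv3) and 170 ms (Tcn2) values given for TCN 1.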
Notably, our current results were achieved with minimal adjustment to the experimental pipeline to maintain fair comparison with baselines. This conservative approach suggests greater improvements might be possible through hyperparameter optimization; for example, cross-task adaptation may require more epochs to converge than within-task adaptation. While such optimization exceeded this study’s scope, it represents a promising direction for extending the clear performance gains demonstrated here.
4. CONCLUSION
This study investigated onset detection in Maracatu de baque solto through two transfer learning strategies: onset-to-onset adaptation and beat-to-onset adaptation. Both approaches yielded notable improvements over baseline models, underlining the advantages of fine-tuning for enhancing accuracy.
We demonstrated that cross-task adaptation of models is viable for less-represented tasks such as onset detection when structural alignment exists between source and target domains. Transfer learning effectively addresses limited data availability and circumvents extensive manual annotation or costly training from scratch, a finding with important implications for music information retrieval, particularly when facing data scarcity challenges.
Future work should address this study’s limitations while exploring in greater detail the factors influencing transfer learning effectiveness. Multiple-run experiments would confirm observed trends and investigate specific aspects, such as optimal freeze segment selection and its relation with network receptive field and instrument waveform profiles, alongside potential improvements through hyperparameter optimization. Additional research directions include extending the analysis to other datasets and underrepresented instruments, and refining training protocols. Evaluating our adaptive approach using stricter tolerance windows would provide deeper insights into temporal precision, particularly for expressive instruments and microtiming analysis applications where fine-scale temporal variations are significant.
In summary, this study demonstrates the effectiveness of transfer learning in improving musical onset detection for diverse traditions beyond the Western canon. By adapting existing models, we can improve accuracy and robustness for underrepresented sounds. These methods and insights contribute to developing more inclusive tools for music analysis, with applications extending beyond the specific genres and tasks studied here to benefit the broader field of Music Information Retrieval.
5. REFERENCES
[1] M. Go o and Y. Mu aoka, “A bea acking sys em o
acous ic signals o music,” in P oceedings o he 2nd
ACM In e na ional Con e ence on Mul imedia (MUL-
TIMEDIA ’94). ACM P ess, 1994, pp. 365–372.
[2] S. Dixon, “Au oma ic Ex ac ion o Tempo and Bea
F om Exp essi e Pe o mances,” Jou nal o New Mu-
sic Resea ch, ol. 30, no. 1, pp. 39–58, 2001.
[3] R. B. Dannenbe g, “Towa d au oma ed holis ic bea
acking, music analysis, and unde s anding,” in P o-
ceedings o he 6 h In e na ional Con e ence on Music
In o ma ion Re ie al (ISMIR), 2005, pp. 366–373.
[4] S. Böck, F. K ebs, and M. Schedl, “E alua ing he on-
line capabili ies o onse de ec ion me hods,” in P o-
ceedings o he 13 h In e na ional Socie y o Music
In o ma ion Re ie al Con e ence (ISMIR), 2012, pp.
49–54.
[5] J. Pons, R. Gong, and X. Se a, “Sco e-in o med sylla-
ble segmen a ion o a cappella singing oice wi h con-
olu ional neu al ne wo ks,” in P oceedings o he 18 h
In e na ional Socie y o Music In o ma ion Re ie al
Con e ence (ISMIR), 2017, pp. 383–389.
[6] R. Vogl, M. Do e , G. Widme , and P. Knees, “D um
T ansc ip ion ia Join Bea and D um Modeling us-
ing Con olu ional Recu en Neu al Ne wo ks,” in P o-
ceedings o he In e na ional Socie y o Music In o -
ma ion Re ie al (ISMIR) Con e ence, 2017, pp. 150–
157.
[7] J. Bello, L. Daude , S. Abdallah, C. Duxbu y,
M. Da ies, and M. Sandle , “A u o ial on onse de ec-
ion in music signals,” IEEE T ansac ions on Speech
and Audio P ocessing, ol. 13, no. 5, pp. 1035–1047,
2005.
[8] S. Dixon, “Onse de ec ion e isi ed,” in P oceedings
o he 9 h In e na ional Con e ence on Digi al Audio
E ec s (DAFx), 2006, pp. 133–137.
[9] M. Ma ol , A. Ka cic, and M. P i osnik, “Neu al ne -
wo ks o no e onse de ec ion in piano music,” in P o-
ceedings o he In e na ional Compu e Music Con e -
ence (ICMC), 2002.
[10] A. Lacos e and D. Eck, “A Supe ised Classi ica ion
Algo i hm o No e Onse De ec ion,” EURASIP Jou -
nal on Ad ances in Signal P ocessing, ol. 2007, no. 1,
p. 043745, 2006.
[11] F. Eyben, S. Böck, B. Schulle , and A. G a es, “Uni-
e sal onse de ec ion wi h bidi ec ional long sho -
e m memo y neu al ne wo ks,” P oceedings o he
11 h In e na ional Socie y o Music In o ma ion Re-
ie al Con e ence, ISMIR 2010, no. Janua y, pp. 589–
594, 2010.
[12] J. Schlü e and S. Böck, “Musical onse de ec ion wi h
Con olu ional Neu al Ne wo ks,” in 6 h in e na ional
wo kshop on machine lea ning and music (MML),
2013.
[13] M. Tomczak and J. Hockman, “Onse De ec ion o
S ing Ins umen s Using Bidi ec ional Tempo al and
Con olu ional Recu en Ne wo ks,” in P oceedings
o he 18 h In e na ional Audio Mos ly Con e ence.
ACM, 2023, pp. 136–142.
[14] G. Pee e s, “The Deep Lea ning Re olu ion in MIR:
The P os and Cons, he Needs and he Challenges,”
in Pe cep ion, Rep esen a ions, Image, Sound, Mu-
sic - 14 h In e na ional Symposium, CMMR 2019,
Ma seille, F ance, Oc obe 14-18, 2019, Re ised Se-
lec ed Pape s, se . Lec u e No es in Compu e Science,
R. K onland-Ma ine , S. Ys ad, and M. A amaki, Eds.,
ol. 12631. Sp inge , 2021, pp. 3–30.
[15] A. S ini asamu hy, A. Holzap el, and X. Se a, “In
Sea ch o Au oma ic Rhy hm Analysis Me hods o
Tu kish and Indian A Music,” Jou nal o New Music
Resea ch, ol. 43, no. 1, pp. 94–114, 2014.
[16] J. Bol and G. Fazekas, “Supe ised Con as i e Lea n-
ing Fo Musical Onse De ec ion,” in P oceedings
o he 18 h In e na ional Audio Mos ly Con e ence.
ACM, 2023, pp. 130–135.
[17] M. E. P. Da ies, M. Fuen es, J. Fonseca, L. Aly,
M. Je ónimo, and F. B. Ba aldi, “Mo ing in Time:
Compu a ional Analysis o Mic o iming in Ma aca u
de Baque Sol o,” in P oceedings o he 21 h In e na-
ional Socie y o Music In o ma ion Re ie al Con e -
ence (ISMIR), 2020, pp. 795–802.
[18] M. Fuen es, B. McFee, H. C. C ayencou , S. Essid, and
J. P. Bello, “A Music S uc u e In o med Downbea
T acking Sys em Using Skip-chain Condi ional Ran-
dom Fields and Deep Lea ning,” in IEEE In e na ional
Con e ence on Acous ics, Speech and Signal P ocess-
ing (ICASSP), ol. 2019-May. IEEE, 2019, pp. 481–
485.
[19] A. S ini asamu hy, A. Holzap el, and X. Se a, “In-
o med au oma ic me e analysis o music eco d-
ings,” in P oceedings o he 18 h In e na ional Socie y
o Music In o ma ion Re ie al Con e ence (ISMIR),
2017, pp. 679–685.
[20] D. Fiocchi, M. Buccoli, M. Zanoni, F. An onacci,
and A. Sa i, “Bea T acking using Recu en Neu-
al Ne wo k: A T ans e Lea ning App oach,” in 26 h
Eu opean Signal P ocessing Con e ence (EUSIPCO).
IEEE, 2018, pp. 1915–1919.
[21] G. Bu loiu, “In e ac i e Lea ning o Mic o iming in an
Exp essi e D um Machine,” in The Join Con e ence
on AI Music C ea i i y, 2020.
P oceedings o he 26 h ISMIR Con e ence, Daejeon, Ko ea, Sep embe 21-25, 2025
326
[22] Y. Wang, J. Salamon, M. Ca w igh , N. J. B yan, and
J. P. Bello, “Few-Sho D um T ansc ip ion in Poly-
phonic Music,” in P oceedings o he 21s In e na-
ional Socie y o Music In o ma ion Re ie al Con e -
ence (ISMIR), 2020, pp. 117–124.
[23] Y. Wang, N. J. B yan, M. Ca w igh , J. Pablo Bello,
and J. Salamon, “Few-Sho Con inual Lea ning o Au-
dio Classi ica ion,” in IEEE In e na ional Con e ence
on Acous ics, Speech and Signal P ocessing (ICASSP).
IEEE, 2021, pp. 321–325.
[24] J. J. Vale o-Mas and J. M. Iñes a, “In e ac i e use co -
ec ion o au oma ically de ec ed onse s: app oach and
e alua ion,” EURASIP Jou nal on Audio, Speech, and
Music P ocessing, ol. 2017, no. 1, p. 15, 2017.
[25] K. Yamamo o, “Human-in- he-Loop Adap a ion o In-
e ac i e Musical Bea T acking,” in P oceedings o he
22nd In e na ional Socie y o Music In o ma ion Re-
ie al Con e ence (ISMIR), 2021, pp. 794–801.
[26] M. Schedl, E. Gómez, and J. U bano, “Music In o -
ma ion Re ie al: Recen De elopmen s and Applica-
ions,” Founda ions and T ends® in In o ma ion Re-
ie al, ol. 8, no. 2-3, pp. 127–261, 2014.
[27] A. S. Pin o and M. E. P. Da ies, “Towa ds use -
in o med bea acking o musical audio,” in 14 h In-
e na ional Symposium on Compu e Music Mul idis-
ciplina y Resea ch (CMMR), 2019, pp. 577–588.
[28] A. Pin o, S. Böck, J. Ca doso, and M. Da ies, “Use -
D i en Fine-Tuning o Bea T acking,” Elec onics,
ol. 10, no. 13, p. 1518, 2021.
[29] L. S. Maia, M. Rocamo a, and M. Fuen es, “Adap -
ing me e acking models o La in ame ican music,” in
P oceedings o he 23 h In e na ional Socie y o Mu-
sic In o ma ion Re ie al Con e ence (ISMIR), 2022,
pp. 361–368.
[30] A. S. Pin o and G. Be na des, “B idging he Rhy h-
mic Gap : A Use -Cen ic App oach o Bea T acking
in Challenging Music Signals,” in 16 h In e na ional
Symposium on Compu e Music Mul idisciplina y Re-
sea ch (CMMR), 2023, pp. 1–12.
[31] K. Choi, G. Fazekas, M. Sandle , and K. Cho, “T ans-
e lea ning o music classi ica ion and eg ession
asks,” in P oceedings o he 18 h In e na ional Con-
e ence on Music In o ma ion Re ie al (ISMIR), 2017,
pp. 141–149.
[32] J. Fonseca, M. Fuen es, F. Bonini Ba aldi, and M. E.
Da ies, “On he Use o Au oma ic Onse De ec ion
o he Analysis o Ma aca u de Baque Sol o,” in Pe -
spec i es on Music, Sound and Musicology.Cu en Re-
sea ch in Sys ema ic Musicology, ol. 10., L. Co eia
Cas ilho, R. Dias, and J. Pinho, Eds. Sp inge Cham,
2021, pp. 209–225.
[33] S. Böck and M. E. P. Da ies, “Decons uc , Anal-
yse, Recons uc : How To Imp o e Tempo, Bea , and
Downbea Es ima ion,” in P oceedings o he 21s In-
e na ional Socie y o Music In o ma ion Re ie al
Con e ence (ISMIR), 2020, pp. 574–582.
[34] B. McFee, J. W. Kim, M. Ca w igh , J. Salamon,
R. Bi ne , J. P. Bello, and O.-s. P ac ices, “Open-
Sou ce P ac ices o Music Signal P ocessing Re-
sea ch: Recommenda ions o T anspa en , Sus ain-
able, and Rep oducible Audio Resea ch,” IEEE Signal
P ocessing Magazine, ol. 36, no. Janua y, pp. 128–
137, 2019.
[35] C. d. O. San os, T. S. Resende, and P. M. Keays,
Ba uque Book: Ma aca u Baque Vi ado e Baque Sol o.
Au ho ’s edi ion., 2009.
[36] G. P. Bessoni e Sil a, “Ma aca u de Baque Sol o: de
b incadei a a pa imônio cul u al,” Cade no Vi ual de
Tu ismo, ol. 21, no. 2, p. 113, 2021.
[37] M. E. P. Da ies and S. Böck, “Tempo al con olu ional
ne wo ks o musical audio bea acking,” in P oceed-
ings o he 27 h Eu opean Signal P ocessing Con e -
ence (EUSIPCO), 2019.
[38] S. Böck, F. Ko zeniowski, J. Schlü e , F. K ebs, and
G. Widme , “madmom: A New Py hon Audio and
Music Signal P ocessing Lib a y,” in P oceedings o
he 24 h ACM In e na ional Con e ence on Mul imedia
(MM ’16). ACM, 2016, pp. 1174–1178.
[39] J. Schlü e and S. Böck, “Imp o ed musical onse de-
ec ion wi h Con olu ional Neu al Ne wo ks,” in IEEE
In e na ional Con e ence on Acous ics, Speech and
Signal P ocessing (ICASSP). IEEE, 2014, pp. 6979–
6983.
P oceedings o he 26 h ISMIR Con e ence, Daejeon, Ko ea, Sep embe 21-25, 2025
327
