dPLP: A Differentiable Version of Predominant Local Pulse Estimation

Author: Ching-Yu Chiu; Sebastian Strahl; Meinard Müller

Publisher: Zenodo

DOI: 10.5281/zenodo.17706373

Source: https://zenodo.org/records/17706373/files/000024.pdf

dPLP: A DIFFERENTIABLE VERSION OF PREDOMINANT LOCAL
PULSE ESTIMATION
Ching-Yu Chiu, Sebastian Strahl, and Meinard Müller
International Audio Laboratories Erlangen, Germany
{ching-yu.chiu, sebastian.strahl, meinard.mueller}@audiolabs-erlangen.de
ABSTRACT
Predominant Local Pulse (PLP) estimation is a key technique in rhythmic analysis of music recordings, designed to identify the most salient pulse in an audio signal while adapting to local tempo variations. Unlike global tempo estimation, which assumes a fixed tempo, PLP dynamically adjusts to changes in tempo and rhythm, making it particularly effective as a post-processing strategy to enhance the locally periodic structure of a given input novelty or activity function. Traditional PLP estimation relies on a max operation to select the most prominent periodicity, limiting its use in differentiable learning frameworks. In this paper, we introduce dPLP, a differentiable version of PLP estimation that replaces the max operation when selecting a locally optimal periodicity kernel with a softmax-based weighting scheme. This modification ensures good gradient flow, allowing PLP to be seamlessly integrated into deep learning pipelines as an intermediate layer or as part of the loss function. We provide technical insights into its differentiable formulation and present experiments comparing it to the original non-differentiable PLP approach. Additionally, case studies in beat tracking highlight the advantages of dPLP in improving periodicity-aware representations within neural network architectures.
1. INTRODUCTION
Rhythm, a fundamental component of music, is shaped by beats (regular pulses), tempo (the rate at which those beats occur), and meter (the grouping of beats into measures). As rhythm involves the organization of elements across multiple hierarchical levels, its analysis remains a challenging task in MIR [1,2]. Predominant Local Pulse (PLP) estimation, designed to analyze and enhance the local periodicity of musical novelty functions [3, 4], serves as an effective tool for rhythm analysis [5–7] and beat tracking [8–10]. Relying on the idea of the Fourier tempogram, the method of PLP analyzes an input novelty function and derives for each time position an optimal sinusoidal kernel that best represents the local peak structure of the novelty function. By overlap-adding these derived sinusoids for all time positions and applying rectification, a PLP function which represents the periodicity enhancement of the original novelty function can be derived. However, the process of determining at each time position the optimal sinusoidal kernel representing predominant periodicity relies on a non-differentiable max operation, restricting PLP’s integration with modern deep-learning frameworks. Consequently, existing studies employ PLP as a post-processing technique, isolated from the system’s training process. For example, in beat tracking, current neural networks often lack an explicit mechanism to learn and produce periodic outputs, thus depending on a separate post-processor like PLP [8,9] or a dynamic Bayesian Network (DBN) [11–13], which enforces periodicity through stronger tempo assumptions. This two-stage architecture not only reveals the limitations of what existing neural networks can learn but also necessitates manual adjustments to post-processing settings when their tempo assumptions are violated.¹

© C.-Y. Chiu, S. Strahl, and M. Müller. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: C.-Y. Chiu, S. Strahl, and M. Müller, “dPLP: A Differentiable Version of Predominant Local Pulse Estimation”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
With the growing demand for interpretable, efficient, and controllable models, researchers are increasingly developing differentiable variants of model-based approaches. For instance, by replacing the minimal-cost alignment in dynamic time warping (DTW) with a soft-minimum calculation, Cuturi and Blondel [14] introduced soft-DTW, enabling its use as a differentiable loss function for training neural networks on weakly aligned data [15, 16]. Similarly, differentiable digital signal processing (DDSP) methods [17–21] have emerged following this trend. Building on these advancements and addressing existing limitations, we introduce dPLP, a differentiable variant of PLP estimation. Designed for seamless integration into deep learning pipelines, dPLP replaces the non-differentiable max operation with a softmax-based weighting scheme, enabling smooth optimization. To evaluate its benefits, we conduct a proof-of-concept beat tracking experiment on a small dataset of popular music. We introduce a lightweight, differentiable spectral flux variant as a trainable activity estimator. By integrating this module with dPLP, we establish a model-based, interpretable framework that enhances the model’s ability to capture periodicity.

¹ The DBN, for instance, requires hyperparameters to define a tempo change distribution, affecting the model’s flexibility in handling tempo variations. Likewise, the PLP requires a predefined kernel size to estimate local periodicity. If the selected kernel size is too short, it may fail to capture periodicity from the input novelty function; if too long, it may introduce noise by capturing inconsistent periodicity from different regions.

Figure 1. Comparison of the original PLP (left) and dPLP (right) calculation pipelines. (Top) Input novelty function, duplicated in (b) and (c) for reference. (a) Fourier magnitude tempogram (left) and its frame-wise softmax-transformed variant (right). (b) Optimal (left) vs. weighted-summed (right) sinusoidal kernels at four time positions. (c) Kernel accumulation. (d) Derived PLP/dPLP functions (black curves) with peak positions identified by a peak picker. Annotated beat positions are marked by vertical red dashed lines. The yellow region highlights differences between PLP and dPLP.
The remainder of this paper is structured as follows. Section 2 introduces the formulation, computation, and key parameters of dPLP. Section 3 presents a beat tracking case study, outlining the research questions and baseline architectures. Section 4 analyzes the experimental results, providing both quantitative and qualitative evaluations. Finally, Section 5 concludes the study.
2. MATHEMATICAL FORMULATION OF dPLP
In this section, we introduce the mathematical notation and formulas of both the classical PLP and the differentiable PLP.
2.1 Original PLP
Figure 1 (left) illustrates the computation of the original PLP function [3]. Given a novelty function $\Delta : \mathbb{Z} \to \mathbb{R}$ (Figure 1, top), representing the onset envelope or beat likelihood, PLP estimates a periodicity-enhanced version of $\Delta$ (Figure 1d, left). The process applies a discrete STFT to $\Delta$ using a window function $W : \mathbb{Z} \to \mathbb{R}$. This window, for example a Hann window, is of length $K \in \mathbb{N}$, centered at $n = 0$, and zero outside. For frequency $\omega \in \mathbb{R}_{\geq 0}$ and time $n \in \mathbb{Z}$, the Fourier coefficient $F(n, \omega)$ is defined as

$$F(n, \omega) = \sum_{m \in \mathbb{Z}} \Delta(m)\, W(m - n)\, e^{-2\pi i \omega m}. \tag{1}$$

Let $\Theta \subset \mathbb{R}_{>0}$ be a finite set of tempi, specified in beats per minute (BPM). The discrete Fourier tempogram $T : \mathbb{Z} \times \Theta \to \mathbb{R}_{\geq 0}$ is defined as the magnitude of the Fourier coefficient, given by

$$T(n, \tau) = |F(n, \tau/60)|. \tag{2}$$
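As a rough illustration of Equations 1 and 2, the following NumPy sketch computes a Fourier tempogram for a toy novelty function. It is not the authors' implementation. The function name `fourier_tempogram`, the zero-padding at the signal edges, and the division by `frame_rate` (which turns the frequency $\tau/60$ in Hz into cycles per frame for a novelty function sampled at `frame_rate` Hz) are assumptions made for this example.

```python
import numpy as np

def fourier_tempogram(novelty, frame_rate, tempi_bpm, win_len, hop=1):
    """Complex coefficients F[frame, tempo] in the spirit of Eq. 1 (illustrative sketch)."""
    novelty = np.asarray(novelty, dtype=float)
    # Assumption: tau/60 is a frequency in Hz, so dividing by the frame rate
    # gives cycles per frame for use with integer frame indices m.
    omega = np.asarray(tempi_bpm, dtype=float) / 60.0 / frame_rate
    window = np.hanning(win_len)                 # Hann window W of length K
    half = win_len // 2
    padded = np.pad(novelty, half)               # zero-pad so windows can be centered
    centers = np.arange(0, len(novelty), hop)    # time positions n
    frames = np.stack([padded[n:n + win_len] * window for n in centers])
    offsets = np.arange(win_len)
    # Sum over m of Delta(m) W(m - n) exp(-2 pi i omega m), split into a
    # window-relative part and a shift to the absolute position n - half.
    coeffs = frames @ np.exp(-2j * np.pi * np.outer(offsets, omega))
    coeffs = coeffs * np.exp(-2j * np.pi * np.outer(centers - half, omega))
    return coeffs, centers

# Toy novelty function: one pulse every 50 frames at 100 Hz, i.e. 120 BPM.
frame_rate = 100
m = np.arange(1000)
novelty = 0.5 * (1 + np.cos(2 * np.pi * m / 50))
tempi = np.arange(60, 201)                       # hypothetical tempo set in BPM
F, centers = fourier_tempogram(novelty, frame_rate, tempi, win_len=300, hop=10)
T = np.abs(F)                                    # Eq. 2: magnitude tempogram
mid = len(centers) // 2
print("dominant tempo at the middle frame:", tempi[np.argmax(T[mid])])
```

The tempo set here starts at 60 BPM only to keep the toy example away from the leakage of the constant offset in the novelty function; the paper's own tempo ranges are given in Section 2.3.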
Let $\varphi(n, \tau)$ denote the phase of $F(n, \tau/60)$. The corresponding windowed sinusoidal kernel at time $n$ with tempo $\tau$ is

$$\kappa_{n,\tau}(m) := W(m - n) \cos\bigl(2\pi m \cdot \tau/60 - \varphi(n, \tau)\bigr), \tag{3}$$

where $W$ is the same window function as in the STFT computation. The original (argmax-based) PLP estimation finds for each $n$ the tempo $\tau_n \in \Theta$ that maximizes the magnitude tempogram $T(n, \tau)$ (Figure 1a, left):

$$\tau_n := \operatorname*{argmax}_{\tau \in \Theta} T(n, \tau). \tag{4}$$

Using $\tau_n$ and the corresponding phase $\varphi(n, \tau_n)$, the optimal sinusoidal kernel $\kappa_{n,\tau_n}(m)$ (Figure 1b, left) can be derived by Equation 3. The derived sinusoids are accumulated over time by overlap-adding (Figure 1c, left), preserving periodicity while allowing local tempo variations. Finally, half-wave rectification (omitting negative values) yields the original PLP function $\Gamma : \mathbb{Z} \to \mathbb{R}_{\geq 0}$ (Figure 1d, left):

$$\Gamma(m) = \Bigl| \sum_{n \in \mathbb{Z}} \kappa_{n,\tau_n}(m) \Bigr|_{\geq 0}. \tag{5}$$
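Equations 3 to 5 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the name `plp_argmax` is hypothetical, the frequency is converted to cycles per frame via an assumed `frame_rate`, and the phase is taken as the negative angle of the complex coefficient, a sign convention chosen here so that the kernel peaks line up with the peaks of the novelty function.

```python
import numpy as np

def plp_argmax(novelty, frame_rate, tempi_bpm, win_len, hop=10):
    """Argmax-based PLP curve (illustrative sketch of Eqs. 1, 3, 4, 5)."""
    novelty = np.asarray(novelty, dtype=float)
    omega = np.asarray(tempi_bpm, dtype=float) / 60.0 / frame_rate  # cycles per frame
    window = np.hanning(win_len)
    half = win_len // 2
    padded = np.pad(novelty, half)
    acc = np.zeros(len(padded))                   # overlap-add buffer
    offsets = np.arange(win_len)
    for n in range(0, len(novelty), hop):
        m = n - half + offsets                    # absolute positions under the window
        segment = padded[n:n + win_len] * window
        F = segment @ np.exp(-2j * np.pi * np.outer(m, omega))     # Eq. 1, all tempi
        best = np.argmax(np.abs(F))               # Eq. 4: tempo with maximal magnitude
        phi = -np.angle(F[best])                  # assumed sign convention for the phase
        kernel = window * np.cos(2 * np.pi * omega[best] * m - phi)  # Eq. 3
        acc[n:n + win_len] += kernel              # overlap-add
    return np.maximum(acc[half:half + len(novelty)], 0.0)          # Eq. 5

# Toy input: pulses every 50 frames at a 100 Hz frame rate (120 BPM).
frames = np.arange(1000)
nov = 0.5 * (1 + np.cos(2 * np.pi * frames / 50))
plp = plp_argmax(nov, 100, np.arange(60, 201), win_len=300, hop=10)
peaks = [k for k in range(310, 690) if plp[k] > plp[k - 1] and plp[k] >= plp[k + 1]]
print(peaks)
```

On this clean periodic input, the local maxima of the PLP curve in the interior should fall on the pulse positions of the toy novelty function.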
In summary, PLP estimation employs Fourier coefficients and the Fourier tempogram to extract local sinusoidal kernels that model periodicity. Overlap-adding these kernels reconstructs a periodicity-enhanced function. The resulting PLP function depends on the quality of $\Delta$, the window size $K$, and the tempo set $\Theta$, requiring careful parameter selection.

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
2.2 Differentiable PLP
One step that makes the PLP computation non-differentiable is the argmax operation in Equation 4, only retaining the windowed sinusoid that fits best. To obtain a soft and differentiable approximation of the optimal windowed sinusoid, we instead apply the softmax function, replacing the optimal windowed sinusoid in Equation 3 by a weighted sum of all windowed sinusoids.
To this end, we compute weight factors for all windowed sinusoids using the softmax function

$$\sigma_n^{\gamma}(\tau) := \frac{\exp\bigl(T(n, \tau)/\gamma\bigr)}{\sum_{\tau' \in \Theta} \exp\bigl(T(n, \tau')/\gamma\bigr)}, \tag{6}$$

where $\gamma > 0$ is a temperature hyperparameter that controls the softness of the distribution.
For $\gamma \to 0$, $\sigma_n^{\gamma}(\tau)$ approximates the argmax operation, meaning the largest value of $T(n, \tau)$ dominates, resulting in a one-hot distribution. Conversely, for large $\gamma$, the softmax output becomes more uniform, with all values of $\sigma_n^{\gamma}(\tau)$ tending toward $1/|\Theta|$.
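The two limiting cases of Equation 6 are easy to check numerically. The sketch below uses a made-up tempogram row with four tempo classes; the helper name `softmax_weights` and the example values are purely illustrative.

```python
import numpy as np

def softmax_weights(tempogram_row, gamma):
    """Softmax over the tempo axis with temperature gamma (Eq. 6)."""
    z = np.asarray(tempogram_row, dtype=float) / gamma
    z = z - z.max()          # numerical stabilisation; leaves the result unchanged
    w = np.exp(z)
    return w / w.sum()

T_row = np.array([0.2, 1.0, 0.5, 0.1])   # hypothetical magnitudes T(n, tau)
sharp = softmax_weights(T_row, gamma=0.01)    # close to one-hot at the maximum
medium = softmax_weights(T_row, gamma=1.0)
flat = softmax_weights(T_row, gamma=100.0)    # close to uniform, 1/|Theta| = 0.25
for name, w in (("gamma=0.01", sharp), ("gamma=1", medium), ("gamma=100", flat)):
    print(name, np.round(w, 3))
```

Because the exponent is $T(n, \tau)/\gamma$, rescaling the tempogram by a constant has the same effect as rescaling $\gamma$ by its inverse, so a given temperature should be read relative to the magnitude range of the tempogram.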
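Using these weights, we compute a soft approximation of the optimal windowed sinusoid as

$$\kappa_n^{\gamma}(m) := \sum_{\tau \in \Theta} \sigma_n^{\gamma}(\tau) \cdot \kappa_{n,\tau}(m), \tag{7}$$

where $\sigma_n^{\gamma}(\tau)$ is the softmax output, representing weights for all windowed sinusoids.
Since the softmax function is differentiable, $\kappa_n^{\gamma}$ is differentiable with respect to $T(n, \tau)$. The dPLP function is then:

$$\Gamma^{\gamma}(m) = \Bigl| \sum_{n \in \mathbb{Z}} \kappa_n^{\gamma}(m) \Bigr|_{\geq 0}. \tag{8}$$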
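A forward pass of Equations 6 to 8 can be sketched in NumPy as below. NumPy provides no gradients, so this only illustrates the computation; the same operations would have to be expressed in an autodiff framework to backpropagate through them. The name `dplp`, the frame-rate normalisation, and the phase sign convention are assumptions of this example rather than details confirmed by the paper.

```python
import numpy as np

def dplp(novelty, frame_rate, tempi_bpm, win_len, hop=10, gamma=1.0):
    """Softmax-weighted PLP curve (illustrative forward pass of Eqs. 6-8)."""
    novelty = np.asarray(novelty, dtype=float)
    omega = np.asarray(tempi_bpm, dtype=float) / 60.0 / frame_rate  # cycles per frame
    window = np.hanning(win_len)
    half = win_len // 2
    padded = np.pad(novelty, half)
    acc = np.zeros(len(padded))
    offsets = np.arange(win_len)
    for n in range(0, len(novelty), hop):
        m = n - half + offsets
        arg = 2 * np.pi * np.outer(m, omega)                 # shape (win_len, n_tempi)
        F = (padded[n:n + win_len] * window) @ np.exp(-1j * arg)   # Eq. 1
        T = np.abs(F)                                        # Eq. 2
        w = np.exp((T - T.max()) / gamma)
        w /= w.sum()                                         # Eq. 6: softmax weights
        phi = -np.angle(F)                                   # assumed phase convention
        kernels = window[:, None] * np.cos(arg - phi[None, :])  # Eq. 3 for every tempo
        acc[n:n + win_len] += kernels @ w                    # Eq. 7, then overlap-add
    return np.maximum(acc[half:half + len(novelty)], 0.0)    # Eq. 8

frames = np.arange(1000)
nov = 0.5 * (1 + np.cos(2 * np.pi * frames / 50))            # 120 BPM at 100 Hz
# A very small temperature makes the weights nearly one-hot (argmax-like behaviour).
soft = dplp(nov, 100, np.arange(60, 201), win_len=300, hop=10, gamma=1e-3)
peaks = [k for k in range(310, 690) if soft[k] > soft[k - 1] and soft[k] >= soft[k + 1]]
print(peaks)
```

Larger values of `gamma` spread the weight over neighbouring tempi, which is where the interference effects discussed for Figure 1 come from.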
Figure 1 (right) illustrates the dPLP computation. Given the softmax-normalized tempogram (Figure 1a, right), the weighted-summed sinusoidal kernel at time $n$ is a weighted sum of the kernels of all tempi. Constructive or destructive interference modifies kernel shapes compared to the argmax case (Figure 1b, left). For time positions with ambiguous tempo (e.g., orange and green dots), $\kappa_n^{\gamma}$ preserves fewer peaks due to destructive interference. For positions with a dominant tempo (e.g., red dot), the softmax and argmax kernels are nearly identical. As shown in Figure 1d (yellow regions), these kernel differences affect beat estimates when PLP/dPLP functions serve as beat novelty functions. Overall, dPLP behaves as an intermediary between the original novelty function and the original PLP, offering a differentiable module for periodicity enhancement, with the softness adjustable via the softmax temperature parameter $\gamma$.
2.3 Hyperparameters
As indicated in Sections 2.1 and 2.2, the properties of the original PLP and dPLP depend largely on the hyperparameters of the window (kernel) length $K$ and the tempo range $\Theta$.² In this study, we experiment with kernel sizes of 3, 5, and 10 seconds, corresponding to $K \in \{300, 500, 1000\}$ frames at a frame rate of 100 Hz, to cover varying lengths of local temporal context. For $\Theta$, we consider tempi ranging from 20 to 320 BPM, using two types of scales: linear (LN) and logarithmic (LG). In the LN scale, $\Theta$ is defined as $\{\tau \in \mathbb{N} \mid 20 \leq \tau \leq 320\}$, resulting in a total of 301 tempo classes. In the LG scale, $\Theta$ consists of 81 values, also ranging from 20 to 320 BPM, spaced evenly on a logarithmic scale. Since humans are sensitive to relative changes in tempo rather than absolute differences [22], the LG scale aligns better with human perception. It achieves good coverage of the tempo range with fewer tempo classes than the LN scale, reducing the computational cost of tempogram and dPLP computation.
Additionally, since our focus is to explore the properties and potential benefits of incorporating dPLP rather than optimizing hyperparameters for a specific case, we fix the softmax temperature at $\gamma = 1$ in this study. The effectiveness of these choices is evaluated in Section 3.
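The two tempo grids and the kernel lengths can be written down in a few lines. The sketch assumes that the logarithmic grid includes both endpoints (20 and 320 BPM), which the text suggests but does not state explicitly; under that assumption the 81 values span four octaves in 80 steps, i.e. 20 steps per octave.

```python
import numpy as np

frame_rate = 100                                   # Hz, frame rate of the novelty function
kernel_frames = [int(sec * frame_rate) for sec in (3, 5, 10)]   # K in frames

theta_ln = np.arange(20, 321)                      # LN scale: integer tempi 20..320 BPM
theta_lg = np.geomspace(20, 320, 81)               # LG scale; endpoints included (assumption)

octaves = np.log2(320 / 20)                        # the range 20..320 BPM spans 4 octaves
steps_per_octave = (len(theta_lg) - 1) / octaves   # 80 steps over 4 octaves
ratio = theta_lg[1] / theta_lg[0]                  # constant ratio between neighbours

print(kernel_frames, len(theta_ln), len(theta_lg), steps_per_octave, round(ratio, 4))
```

With this grid, neighbouring tempo classes differ by a constant factor of $2^{1/20}$, so the relative resolution is the same at slow and fast tempi.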
3. CASE STUDY IN BEAT TRACKING
We conduct a case study on beat tracking, following the conventional architecture, which consists of an activity estimator and a post-processor [23, 24]. The activity estimator converts audio features (e.g., spectrograms) into real-valued novelty curves, indicating the likelihood of each time frame containing a beat. The post-processor then refines these curves into final binary beat estimates. This experiment aims to illustrate the advantages of the dPLP method, which enables backpropagation. Rather than advancing the state of the art in beat tracking, it serves as a controlled demonstration. To ensure efficient training and controlled analysis, we use a small toy dataset (Section 3.1), keep all network components minimal, integrate dPLP in various ways (Section 3.2), and employ a peak-picking-based post-processing method (Section 3.3). The resulting beat estimates are evaluated in Section 3.4 to assess dPLP’s impact and functionality.
3.1 Datasets
The GTZAN dataset [25, 26] is a widely used benchmark for music genre classification and various audio analysis tasks, including beat tracking [13, 24, 27, 28]. It comprises 1,000 audio tracks, each 30 seconds long, spanning ten genres, offering a diverse collection of musical styles. In this study, we specifically focus on the 100 tracks of popular music, providing a simplified scenario to examine the behaviors and effects of the proposed ideas. For the following beat tracking experiments, we randomly split these 100 tracks into 60 for training, 20 for validation, and 20 for testing, reporting results for the test data.

² Note that when calculating the Fourier tempogram and the corresponding PLP function, the hop size is also a hyperparameter that affects temporal resolution and computational cost. For simplicity, we empirically fix the hop size to 10 frames without further discussion.
3.2 Beat Activity Estimators
Given a 44.1 kHz audio recording, we compute STFT-based spectrograms using librosa [29] with an FFT size of 2048, a window length of 1024, and a hop size of 441, resulting in spectrograms with a 100 Hz temporal resolution. As shown in Figure 2, these spectrograms serve as the primary input feature for subsequent experiments.
3.2.1 Spectral Flux
Spectral flux [3, 4, 30] is a widely used model-based technique for onset detection. Given an STFT spectrogram, it applies logarithmic compression, discrete differentiation, half-wave rectification, and accumulation to generate a novelty curve. To further refine its quality, baseline subtraction, Gaussian smoothing, and normalization are often incorporated. To evaluate the impact of dPLP’s differentiability, we use bandwise spectral flux as an example of novelty function computation. As a minor contribution of this study, we implement a lightweight, trainable version of spectral flux (SFX) by formulating it as a PyTorch module. The SFX module processes an STFT spectrogram by dividing it into eight frequency bands, computing spectral flux independently for each band [3, 4, 30], applying a weighted sum, and performing Gaussian smoothing, rectification, and max-normalization. To introduce learnability, we make the differentiation convolution kernels, log compression parameters, and bandwise weighting parameters trainable, resulting in a total of 64 trainable parameters.
We train SFX for beat tracking using 60 training tracks from GTZAN (Section 3.1), with a batch size of 8, a learning rate of 0.1, the Adam optimizer, and weighted binary cross-entropy (BCE) loss.³ The module is initialized with first-order differentiation convolution kernels, a log compression parameter of 10, and average-weighted summation, referred to as SFX-I. After training, the resulting model is denoted as SFX-T.
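For orientation, the classical (non-trainable, single-band) spectral flux pipeline described above can be sketched as follows. This is not the SFX module: it omits the eight-band split, the trainable kernels and weights, and the Gaussian smoothing. The compression form $\log(1 + c \cdot |X|)$ with $c = 10$ is an assumption about how the log compression parameter enters, and `spectral_flux` is a hypothetical name.

```python
import numpy as np

def spectral_flux(mag_spec, log_c=10.0):
    """Basic spectral flux from a magnitude spectrogram of shape (n_bins, n_frames)."""
    mag_spec = np.asarray(mag_spec, dtype=float)
    compressed = np.log1p(log_c * mag_spec)        # logarithmic compression (assumed form)
    # First-order difference along time; the first frame gets a difference of zero.
    diff = np.diff(compressed, axis=1, prepend=compressed[:, :1])
    flux = np.maximum(diff, 0.0).sum(axis=0)       # half-wave rectify, sum over frequency
    peak = flux.max()
    return flux / peak if peak > 0 else flux       # max-normalisation

# Toy spectrogram: silence followed by a sustained sound starting at frame 5.
spec = np.zeros((4, 10))
spec[:, 5:] = 1.0
flux = spectral_flux(spec)
print(np.round(flux, 3))
```

Only the onset frame produces a non-zero value here, since the sustained part of the toy sound has no positive spectral change.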
3.2.2 Argmax PLP and Softmax PLP
To compare the properties and behavior of the original argmax-based PLP (A-*) and the proposed differentiable softmax-based PLP (S-*), we process the beat novelty generated by the trained SFX-T using both methods separately. Following Section 2.3, we experiment with three PLP kernel sizes $K$ (3, 5, and 10 seconds) and two tempo scales: linear (LN) and logarithmic (LG). The resulting PLP functions are peak-picked (Section 3.3) and evaluated (Section 3.4) for comparison (Section 4.1).
³ To address class imbalance, we assign a weight of 3 to the beat class in the BCE loss, as non-beat frames dominate.

3.2.3 dPLP Incorporated Architecture
Figure 2 illustrates our proposed dPLP-incorporated architecture, consisting of a spectral flux module (S, identical to SFX), a dPLP module, and a fuser (F). This design integrates an onset-based activity estimator (S), a periodicity analyzer (dPLP), and a fuser (F) that learns to combine information from both components. Given an STFT spectrogram, the S module generates a beat novelty function $\Delta_S$. The dPLP module processes $\Delta_S$ with three kernel sizes ($K \in \{3, 5, 10\}$ seconds), producing three dPLP curves. The fuser (F) concatenates $\Delta_S$ with these dPLP curves ($\Gamma^{\gamma}$-K*), applies weighted summation and smoothing, and outputs the final beat novelty function $\Delta_F$. The fuser (F) comprises a linear layer, a convolutional layer, and a sigmoid activation, totaling 21 trainable parameters.
We refer to this architecture, where both S and F are trainable, as M1. During training, the beat novelty function $\Delta_F$ generated by the fuser (denoted as M1-F) is compared with reference beat annotations, and the BCE loss guides the learning process. Since dPLP enables gradient backpropagation from M1-F through the dPLP to the S module (M1-S), M1-S is optimized by a combined loss function incorporating dPLP, the fuser, and BCE loss. We are particularly interested in whether M1-S behaves differently from the standalone-trained SFX-T.
To assess the complementarity between $\Delta_S$ and the dPLP curves ($\Gamma^{\gamma}$-K*), we modify region A (see Figure 2) and implement an ablation model, M2. In M2, the S module (denoted as M2-S) is initialized with SFX-T parameters and kept fixed, allowing only the fuser (M2-F) to be trained. This setup evaluates whether M2-F can effectively integrate information from M2-S and dPLP.
Finally, to evaluate the dPLP module’s ability to provide periodicity-based information and enhance beat tracking, we introduce M3 by modifying region B (also shown in Figure 2). In M3, the three dPLP curves are replaced with three duplicated M3-S beat novelty functions $\Delta_S$, keeping the model size identical to M1. Since M1 and M3 share the same trainable components (S and F) and model size, their only difference, the presence or absence of dPLP, allows us to directly assess dPLP’s effectiveness.
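The BCE objective that supervises these architectures, with the beat class weighted by 3 as stated for the SFX training, could look like the following in plain NumPy. The exact reduction (a mean over frames) and the clipping constant are assumptions, and `weighted_bce` is a hypothetical name; in a training setup the same formula would be evaluated in an autodiff framework.

```python
import numpy as np

def weighted_bce(pred, target, beat_weight=3.0, eps=1e-7):
    """Binary cross-entropy with a larger weight on beat frames (illustrative)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    target = np.asarray(target, dtype=float)
    per_frame = -(beat_weight * target * np.log(pred)
                  + (1.0 - target) * np.log(1.0 - pred))
    return per_frame.mean()                       # assumed reduction: mean over frames

# A missed beat costs three times as much as an equally confident false alarm.
loss_missed_beat = weighted_bce([0.5], [1.0])
loss_false_alarm = weighted_bce([0.5], [0.0])
print(loss_missed_beat / loss_false_alarm)
```

Weighting the positive class this way compensates for the fact that, at a 100 Hz frame rate, only a small fraction of frames carry a beat label.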
3.3 Peak-Picking-based Post-processing
For all the derived beat novelty functions, we follow [4] and apply one-dimensional Gaussian smoothing (with $\sigma = 3$ frames), max-normalization, and local average-based peak picking (using a 20-second averaging window) to obtain the beat estimations. Compared to conventional post-processors such as DBN [11, 24, 31], this peak picking method avoids strong tempo assumptions and better reflects the properties of the novelty functions, as in [10,13,32].
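One plausible reading of this post-processing chain is sketched below: Gaussian smoothing, max-normalisation, and selection of local maxima that exceed a local average. The precise peak criterion used in [4] may differ (for example in how ties or thresholds are handled), so the function `pick_peaks`, the 4-sigma kernel truncation, and the "local maximum above local mean" rule should be treated as assumptions.

```python
import numpy as np

def pick_peaks(novelty, frame_rate=100, sigma=3.0, avg_sec=20.0):
    """Return frame indices of peaks above a local average (illustrative sketch)."""
    x = np.asarray(novelty, dtype=float)
    # Gaussian smoothing with sigma given in frames, kernel truncated at 4 sigma.
    radius = int(4 * sigma)
    t = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    x = np.convolve(x, g / g.sum(), mode="same")
    if x.max() > 0:
        x = x / x.max()                                   # max-normalisation
    # Local average over a window of avg_sec seconds, shortened at the edges.
    half = int(avg_sec * frame_rate) // 2
    csum = np.concatenate(([0.0], np.cumsum(x)))
    idx = np.arange(len(x))
    lo = np.maximum(idx - half, 0)
    hi = np.minimum(idx + half + 1, len(x))
    local_avg = (csum[hi] - csum[lo]) / (hi - lo)
    # A peak is a local maximum that lies above the local average.
    is_max = np.zeros(len(x), dtype=bool)
    is_max[1:-1] = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])
    return np.flatnonzero(is_max & (x > local_avg))

# Toy novelty function with impulses every 50 frames (120 BPM at 100 Hz).
toy = np.zeros(1000)
toy[100:901:50] = 1.0
beats = pick_peaks(toy)
print(beats / 100.0)   # beat times in seconds
```

Because the threshold is a local average rather than a tempo model, the method makes no assumption about how regularly the peaks are spaced.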
[Figure 2: block diagram with the labels STFT Spectrogram, Spectral Flux (S), dPLP, Fuser (F), Beat Novelty, Beat Target, and BCE Loss, and the marked regions A and B.]
Figure 2. The proposed architecture (M1) and the regions (dotted squares) to modify for the two ablations (M2, M3).

3.4 Evaluation
We evaluate beat estimates using the F1-score (F1), precision (P), and recall (R) as implemented in mir_eval [33], with a tolerance window of ±70 ms. Additionally, since PLP is incorporated to enhance the model’s ability to handle longer musical context, we report the L-correct metric [3]. This metric requires at least $L$ consecutive reference beats to be correctly detected rather than considering reference beats individually. For simplicity, we use $L = 2$ and report F-measure (F-L2), precision (P-L2), and recall (R-L2).
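To make the tolerance-based scores concrete, here is a small self-contained sketch of precision, recall, and F1 with a ±70 ms window. It is not the mir_eval implementation, which has its own matching and preprocessing; the greedy one-to-one matching of sorted beat times and the name `beat_f_measure` are assumptions of this example.

```python
import numpy as np

def beat_f_measure(reference, estimated, tol=0.07):
    """F1, precision, recall with a +/- tol (seconds) window; greedy one-to-one matching."""
    ref = np.sort(np.asarray(reference, dtype=float))
    est = np.sort(np.asarray(estimated, dtype=float))
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tol:
            hits += 1          # each reference beat and each estimate is used once
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1             # estimate too early to match any remaining reference
        else:
            i += 1             # reference beat was missed
    precision = hits / len(est) if len(est) else 0.0
    recall = hits / len(ref) if len(ref) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return f1, precision, recall

reference_beats = [0.5, 1.0, 1.5, 2.0]
estimated_beats = [0.52, 0.75, 1.0, 1.6, 2.05]   # one extra beat, one estimate 100 ms late
f1, p, r = beat_f_measure(reference_beats, estimated_beats)
print(round(f1, 3), round(p, 3), round(r, 3))
```

In this toy case three of the five estimates match three of the four reference beats, which illustrates how extra onsets lower precision while leaving recall comparatively high.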
4. EXPERIMENT RESULTS
In the following, we analyze the beat tracking results both quantitatively and qualitatively, based on beat estimates derived from the aforementioned novelty functions (Section 3.2) and a simple peak-picking method (Section 3.3).
4.1 Comparison of PLP Settings
Table 1 (top) presents the F-measure and L-correct evaluation results for the standalone spectral flux module (SFX-*), comparing the original argmax PLP (A-*) with the differentiable softmax PLP (S-*). The results indicate that SFX-* performs as expected. As an onset-based module, SFX-I achieves high recall (0.950) but low precision (0.311), resulting in an unsatisfactory F1-score (0.464) and low L-correct values (all below 0.100). After training, SFX-T improves the precision-recall balance, increasing precision from 0.311 to 0.436 and decreasing recall from 0.950 to 0.888. This leads to a higher F1-score (0.576) and improved L-correct values (all above 0.100). Using the SFX-T beat novelty as input, the derived argmax PLP functions (A-*) and softmax PLP functions (S-*) generally improve beat tracking performance.⁴ Specifically, since the test tracks consist of popular music with stable tempi, the dPLP functions help filter out non-beat onsets that do not align with the locally detected periodicity while also enhancing weak onsets at beat positions. This results in improved precision (all above 0.520) and recall (all above 0.900) compared to SFX-T. Moreover, the similar F1-scores (around 0.664) indicate that there is little difference between the linear tempo scale (LN) and the logarithmic tempo scale (LG), as well as between the argmax PLP (A) and the softmax PLP (S). Therefore, we use LG, which is computationally more efficient, for the subsequent experiments involving dPLP-incorporated architectures.

⁴ For each setting (e.g., A-LG), we apply three kernel settings ($K \in \{3, 5, 10\}$ seconds), derive three sets of PLP curves, evaluate them separately, and report the averaged scores in Table 1.
Act.     F-Measure                L-Correct
         F1      P       R        F-L2    P-L2    R-L2
-------------------------------------------------------
SFX-I    0.464   0.311   0.950    0.031   0.022   0.055
SFX-T    0.576   0.436   0.888    0.153   0.122   0.213
A-LG     0.663   0.531   0.925    0.169   0.152   0.196
A-LN     0.662   0.528   0.929    0.158   0.142   0.183
S-LG     0.671   0.550   0.908    0.196   0.182   0.220
S-LN     0.664   0.528   0.933    0.165   0.147   0.193
-------------------------------------------------------
M1-F     0.707   0.660   0.809    0.470   0.445   0.519
M2-F     0.684   0.615   0.817    0.385   0.360   0.439
M3-F     0.664   0.576   0.819    0.417   0.375   0.489
-------------------------------------------------------
M1-S     0.561   0.412   0.921    0.182   0.140   0.267
M2-S     0.576   0.436   0.888    0.153   0.122   0.213
M3-S     0.529   0.371   0.952    0.083   0.060   0.138

Table 1. Beat tracking results. A and S denote argmax and softmax PLP. LG and LN indicate log-scale and linear-scale tempo spaces. F and S represent the fuser and spectral flux module of the dPLP incorporated architecture in Figure 2.
4.2 Effectiveness of Differentiability
Table 1 (middle) presents the results for the fusers (F) of the dPLP-incorporated architectures (M1, M2, M3). Notably, compared to other fusers and baselines, M1-F achieves substantial improvements across all evaluation metrics except recall. Specifically, its superior F1-score (0.707), precision (0.660), and F-L2 (0.470) suggest that M1-F has learned a more effective beat-tracking mechanism. Alternatively, the observed improvements may indicate that the proposed architecture (M1) has a greater capacity to fit the relatively simple structure of popular music in GTZAN. The results from the ablation models further support this observation. Comparing M2-F with SFX-T and the softmax PLPs (S-LG), we find clear evidence of complementarity between the M2 dPLP curves and the M2-S beat novelty function, which M2-F effectively leverages. Specifically, M2-F aligns more closely with the consensus across all input curves, significantly improving precision (from at most 0.550 to 0.615) and L-correct metrics (from at most 0.220 to at least 0.360). However, since M2-S is fixed and non-trainable, M2-F does not benefit from the differentiability of dPLP, resulting in lower capacity compared to M1-F.
The results from M3-F are also noteworthy. Compared to SFX-T, M3-F, which has a larger model size but lacks a dPLP module, adopts a different precision-recall trade-off: it suppresses peaks from M3-S, leading to higher precision (0.576 vs. 0.436), lower recall (0.819 vs. 0.888), and significantly improved L-correct metrics (above 0.370 vs. below 0.220). This suggests that the performance gains observed in M1-F, M2-F, and M3-F over SFX-T may also be influenced by model size. Finally, when comparing M3-F to M1-F, the slightly lower F1-scores of M3-F suggest that, without the dPLP module, M3-F may have lower capacity than both M1-F and M2-F.⁵
Lastly, as shown in Table 1 (bottom), spectral flux modules trained with different loss functions exhibit distinct behaviors, with F1-score differences ranging from 0.03 to 0.05 and variations in other metrics between 0.02 and 0.10.

⁵ At first glance, it may seem contradictory that M3-F achieves higher L-correct metrics than M2-F despite having lower precision (0.576 vs. 0.615). This can be attributed to two factors: (1) The lower L-correct of M2-F is partly due to dPLP’s bias toward faster tempi, which can cause taps to align with tempo harmonics (e.g., double tempo), disrupting the continuity required by L-correct. (2) The lower precision of M3-F results from non-beat onsets clustering around specific beats, introducing false positives. Unlike the evenly distributed octave errors in M2-F, these false positives are more localized, allowing M3-F to achieve higher L-correct values.

Figure 3. Novelty functions. (a) Beat novelty functions from the spectral flux modules (SFX-*). (b) dPLP curves from M1, where K* denotes the dPLP function computed with a kernel size of * seconds. (c) Beat novelty functions from fusers (F). (d) Beat novelty functions from the spectral flux heads (S). Black dashed lines indicate annotated reference beats.

4.3 Comparison of the Novelty Functions
Figure 3 compares the output novelty functions calculated from a test track in the GTZAN dataset, summarizing our previous discussions. In Figure 3a, the alignment between the reference annotated beats (black vertical dashed lines) and the SFX-* novelty functions confirms the high recall and low precision of SFX-*, as observed in Table 1. Moreover, compared to SFX-I, the trained SFX-T learns to suppress several non-beat peaks (purple regions).
Figure 3b shows that M1 dPLP functions computed with different kernel sizes ($K$) exhibit varying peak distributions, yet they largely agree at beat positions. Figure 3c visualizes the distinct behaviors of the three fusers (M1-F, M2-F, M3-F). Notably, each fuser exhibits different false-positive errors (e.g., purple or green regions). These differences can be attributed to the presence of the dPLP module (M1 and M2 vs. M3) and to whether the module S is further optimized using the gradients backpropagated through the dPLP module (M1 vs. M2). Specifically, the false-positive error shared by all fusers around the 920th frame (yellow region) reveals that all fusers attempt to produce peaks at positions where the input novelty functions agree. In contrast, when the novelty functions from the S modules (M*-S) do not align with the dPLP curves (e.g., green region), M1-F and M2-F, which have access to the dPLP outputs, avoid making a false-positive error. Lastly, Figure 3d illustrates the different behaviors of the S modules when supervised by different loss functions. Specifically, since the M3 architecture lacks periodicity information, the M3-S head is trained to be more sensitive, generating more and stronger false-positive peaks at non-beat positions (purple regions) compared to M1-S and M2-S. In contrast, with the additional benefit of gradient backpropagation through dPLP, M1-S behaves differently from M2-S and M3-S, suppressing many non-beat onsets (e.g., around the 730th, 790th, and 860th frames).
5. CONCLUSION
In this paper, we presented a differentiable variant of Predominant Local Pulse (dPLP) estimation, replacing the non-differentiable selection of an optimal windowed sinusoid with a softmax-based weighted summation. While dPLP behaves similarly to the original algorithm in terms of enhancing periodicity in the input signal, its differentiability enables seamless integration into deep learning pipelines and supports end-to-end training.
The main contribution of this work lies on a conceptual level, namely in the formulation of dPLP as an interpretable, flexible, and differentiable module for periodicity enhancement. To illustrate the behavior and potential benefits of dPLP in a controlled setting, we conducted a proof-of-concept experiment on beat tracking. As part of this setup, we also introduced a lightweight differentiable variant of the spectral flux method, which serves as a simple but trainable activity estimator. While this differentiable spectral flux is a minor contribution, it demonstrates how model-based components can be incorporated into learning frameworks.
Our experimental results highlight the potential of combining differentiable modules like dPLP with trainable feature extractors in an end-to-end fashion. In future work, we plan to integrate dPLP into more advanced architectures and further investigate its interaction with other system components. Overall, we believe that dPLP can serve as a valuable building block for improving the transparency, controllability, and interpretability of rhythm analysis and beat tracking systems.
6. ACKNOWLEDGEMENTS
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Grant No. 500643750 (MU 2686/15-1). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer Institute for Integrated Circuits IIS.
7. REFERENCES
[1] P. Grosche, M. Müller, and F. Kurth, “Cyclic tempogram – a mid-level tempo representation for music signals,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, Texas, USA, 2010, pp. 5522–5525.
[2] G. T. Toussaint, “The geometry of musical rhythm,” in Proceedings of the Japanese Conference on Discrete and Computational Geometry (JCDCG), Tokyo, Japan, 2004, pp. 198–212.
[3] P. Grosche and M. Müller, “Extracting predominant local pulse information from music recordings,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1688–1701, 2011.
[4] M. Müller and C.-Y. Chiu, “A basic tutorial on novelty and activation functions for music signal processing,” Transactions of the International Society for Music Information Retrieval (TISMIR), vol. 7, no. 1, pp. 179–194, 2024.
[5] P. Grosche and M. Müller, “A mid-level representation for capturing dominant tempo and pulse information in music recordings,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Kobe, Japan, Oct. 2009, pp. 189–194.
[6] S. P. Bhatta, S. Nagaraj Bharadwaj, S. Shadakshari, and A. Bhat, “Laya estimation for Hindustani classical vocals, devoid of rhythmic indicators,” in Proceedings of the International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, India, 2024.
[7] P. Meie , S. Schwä , and M. Mülle , “A eal- ime ap-
p oach o es ima ing pulse acking pa ame e s o
bea -synch onous audio e ec s,” in P oceedings o
he In e na ional Con e ence on Digi al Audio E ec s
(DAFx), Guild o d, Su ey, UK, 2024, pp. 314–321.
[8] P. G osche, M. Mülle , and C. S. Sapp, “Wha makes
bea acking di icul ? A case s udy on Chopin
Mazu kas,” in P oceedings o he In e na ional Socie y
o Music In o ma ion Re ie al Con e ence (ISMIR),
U ech , The Ne he lands, 2010, pp. 649–654.
[9] P. Meie , C.-Y. Chiu, and M. Mülle , “A eal- ime
bea acking sys em wi h ze o la ency and enhanced
con ollabili y,” T ansac ions o he In e na ional Soci-
e y o Music In o ma ion Re ie al (TISMIR), ol. 7,
no. 1, pp. 213–227, 2024.
[10] C.-Y. Chiu, M. Mülle , M. E. P. Da ies, A. W.-Y. Su,
and Y.-H. Yang, “Local pe iodici y-based bea ack-
ing o exp essi e classical piano music,” IEEE/ACM
T ansac ions on Audio, Speech, and Language P o-
cessing, ol. 31, pp. 2824–2835, 2023.
[11] F. K ebs, S. Böck, and G. Widme , “An e icien s a e-
space model o join empo and me e acking,” in
P oceedings o he In e na ional Socie y o Music
In o ma ion Re ie al Con e ence (ISMIR), Malaga,
Spain, 2015, pp. 72–78.
[12] S. Böck and M. E. P. Da ies, “Decons uc , analyse, e-
cons uc : How o imp o e empo, bea , and downbea
es ima ion,” in P oceedings o he In e na ional Soci-
e y o Music In o ma ion Re ie al Con e ence (IS-
MIR), Mon eal, Canada, 2020, pp. 574–582.
[13] F. Fosca in, J. Schlü e , and G. Widme , “Bea his!
Accu a e bea acking wi hou DBN pos p ocessing,”
in P oceedings o he In e na ional Socie y o Music
In o ma ion Re ie al Con e ence (ISMIR), San F an-
cisco, CA, Uni ed S a es, 2024, pp. 962–969.
[14] M. Cuturi and M. Blondel, “Soft-DTW: a
differentiable loss function for time-series,” in Proceedings
of the International Conference on Machine Learning
(ICML), Sydney, NSW, Australia, 2017, pp. 894–903.
[15] M. Krause, C. Weiß, and M. Müller, “Soft dynamic
time warping for multi-pitch estimation and beyond,”
in Proceedings of the IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP), Rhodes Island, Greece, 2023.
[16] J. Zeitler, S. Deniffel, M. Krause, and M. Müller,
“Stabilizing training with soft dynamic time warping:
A case study for pitch class estimation with weakly
aligned targets,” in Proceedings of the International
Society for Music Information Retrieval Conference
(ISMIR), 2023, pp. 433–439.
[17] J. Engel, R. Swavely, L. H. Hantrakul, A. Roberts,
and C. Hawthorne, “Self-supervised pitch detection
by inverse audio synthesis,” in International Conference
on Machine Learning (ICML), Workshop on
Self-Supervision in Audio and Speech, Vienna, Austria,
2020.
[18] J. Engel, L. Hantrakul, C. Gu, and A. Roberts, “DDSP:
Differentiable digital signal processing,” in Proceedings
of the International Conference on Learning
Representations (ICLR), Virtual, 2020.
[19] Y. Yang, M. Hira, Z. Ni, A. Astafurov, C. Chen,
C. Puhrsch, D. Pollack, D. Genzel, D. Greenberg,
E. Z. Yang, J. Lian, J. Hwang, J. Chen, P. Goldsborough,
S. Narenthiran, S. Watanabe, S. Chintala, and
V. Quenneville-Bélair, “Torchaudio: Building blocks
for audio and speech processing,” in Proceedings of the
IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP), Virtual and
Singapore, 2022, pp. 6982–6986.
[20] M. Leiber, Y. Marnissi, A. Barrau, and M. E. Badaoui,
“Differentiable adaptive short-time Fourier transform
with respect to the window length,” in Proceedings
of the IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), Rhodes
Island, Greece, 2023, pp. 1–5.
[21] ——, “Differentiable short-time Fourier transform with
respect to the hop length,” in IEEE Statistical Signal
Processing Workshop (SSP), Hanoi, Vietnam, 2023,
pp. 230–234.
[22] K. Thomas, “Just noticeable difference and tempo
change,” Journal of Scientific Psychology, vol. 2, pp.
14–20, 2007.
[23] M. Fuentes, B. McFee, H. C. Crayencour, S. Essid,
and J. P. Bello, “Analysis of common design choices in
deep learning systems for downbeat tracking,” in
Proceedings of the International Society for Music
Information Retrieval Conference (ISMIR), Paris, France,
2018, pp. 106–112.
[24] S. Böck, F. Krebs, and G. Widmer, “Joint beat and
downbeat tracking with recurrent neural networks,” in
Proceedings of the International Society for Music
Information Retrieval Conference (ISMIR), New York
City, New York, USA, 2016, pp. 255–261.
[25] G. Tzanetakis and P. Cook, “Musical genre
classification of audio signals,” IEEE Transactions on Speech
and Audio Processing, vol. 10, no. 5, pp. 293–302,
2002.
[26] U. Marchand and G. Peeters, “Swing ratio estimation,”
in Proceedings of the International Conference on
Digital Audio Effects (DAFx), Trondheim, Norway, 2015,
pp. 423–428.
[27] D. Desblancs, V. Lostanlen, and R. Hennequin,
“Zero-note samba: Self-supervised beat tracking,”
IEEE/ACM Transactions on Audio, Speech, and
Language Processing, vol. 31, pp. 2922–2934, 2023.
[28] Y. Hung, J. Wang, X. Song, W. T. Lu, and M. Won,
“Modeling beats and downbeats with a time-frequency
transformer,” in Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal
Processing (ICASSP), Virtual and Singapore, 2022,
pp. 401–405.
[29] B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar,
E. Battenberg, and O. Nieto, “Librosa: Audio and
music signal analysis in Python,” in Proceedings of the
Python Science Conference, Austin, Texas, USA,
2015, pp. 18–25.
[30] M. Müller and F. Zalkow, “libfmp: A Python package
for fundamentals of music processing,” Journal
of Open Source Software (JOSS), vol. 6, no. 63, pp.
3326:1–5, 2021.
[31] S. Böck, F. Korzeniowski, J. Schlüter, F. Krebs, and
G. Widmer, “madmom: A new Python audio and music
signal processing library,” in Proceedings of the ACM
International Conference on Multimedia (ACM-MM),
Amsterdam, The Netherlands, 2016, pp. 1174–1178.
[32] C.-Y. Chiu, L. Liu, C. Weiß, and M. Müller,
“Cross-modal approaches to beat tracking: A case study on
Chopin Mazurkas,” Transactions of the International
Society for Music Information Retrieval (TISMIR),
vol. 8, no. 1, pp. 55–69, 2025.
[33] C. Raffel, B. McFee, E. J. Humphrey, J. Salamon,
O. Nieto, D. Liang, and D. P. W. Ellis, “MIR_EVAL: A
transparent implementation of common MIR metrics,”
in Proceedings of the International Society for Music
Information Retrieval Conference (ISMIR), Taipei,
Taiwan, 2014, pp. 367–372.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
