High-Resolution Sustain Pedal Depth Estimation From Piano Audio Across Room Acoustics

Author: Hanwen Zhang; Kun Fang; Ziyu Wang; Ichiro Fujinaga

Publisher: Zenodo

DOI: 10.5281/zenodo.17706525

Source: https://zenodo.org/records/17706525/files/000068.pdf

HIGH-RESOLUTION SUSTAIN PEDAL DEPTH ESTIMATION FROM
PIANO AUDIO ACROSS ROOM ACOUSTICS
Kun Fang (1,2,*), Hanwen Zhang (1,2,*), Ziyu Wang (3,4), Ichiro Fujinaga (1,2)
(1) McGill University; (2) The Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT); (3) New York University; (4) Music X Lab, MBZUAI
ABSTRACT
Piano sustain pedal detection has previously been approached as a binary on/off classification task, limiting its application in real-world piano performance scenarios where pedal depth significantly influences musical expression. This paper presents a novel approach to high-resolution estimation that predicts continuous pedal depth values. We introduce a Transformer-based architecture that not only matches state-of-the-art performance on the traditional binary classification task but also achieves high accuracy in continuous pedal depth estimation. Furthermore, by estimating continuous values, our model provides musically meaningful predictions of sustain pedal usage, whereas baseline models struggle to capture such nuanced expressions with their binary detection approach. Additionally, this paper investigates the influence of room acoustics on sustain pedal estimation using a synthetic dataset that includes varied acoustic conditions. We train our model with different combinations of room settings and test it in an unseen new environment using a “leave-one-out” approach. Our findings show that the two baseline models and ours are not robust to unseen room conditions. Statistical analysis further confirms that reverberation influences model predictions and introduces an overestimation bias.
1. INTRODUCTION
The use of the sustain pedal is a highly personal and indispensable aspect of piano performance. In their treatises [1, 2], the legendary piano teachers Karl Leimer (1858–1944) and Heinrich Neuhaus (1888–1964) both placed great emphasis on recognizing the sound shaped by the sustain pedal, where small variations in timing and depth can create dramatic changes in tone. The sound effect created by the sustain pedal is also highly sensitive to environmental acoustics. Professional pianists adjust their pedaling meticulously in real time to achieve the desired tonal quality, and it is the interplay between pedaling and external acoustic factors that shapes the listening experience. In fact, in earlier days it was a suggested practice to simulate sustain pedaling effects using artificial reverberation algorithms, as noted in [3].

*Authors with equal contribution

© K. Fang, H. Zhang, Z. Wang and I. Fujinaga. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: K. Fang, H. Zhang, Z. Wang and I. Fujinaga, “High-Resolution Sustain Pedal Depth Estimation from Piano Audio Across Room Acoustics”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
However, in music information retrieval (MIR), many studies have adopted the term “sustain pedal detection” and approached this problem as a binary (on/off) classification [4–8], overlooking the subtleties of intermediate states. Recent research [6–8] also relies primarily on the MAESTRO dataset [9], currently one of the largest available collections of paired audio and MIDI files recorded on Yamaha Disklaviers (acoustic grand pianos with high-precision MIDI capture and playback systems). Unfortunately, the room conditions of MAESTRO’s audio recordings are entirely unknown and potentially unpredictable. Beyond the oversimplification inherent in binary classification, this raises questions about how well models trained solely on MAESTRO generalize across different acoustic environments.

To this end, we redefine the task as continuous sustain pedal depth estimation, framing it as a regression problem. We propose a Transformer-based model with a conventional structure that includes a convolutional layer for feature extraction, followed by an n-layer Transformer Encoder. Additionally, we adopt a mixed evaluation strategy, combining F1 for classification across various bin thresholds with continuous mean absolute error (MAE). Our results demonstrate that, when baseline binary predictions are treated as continuous values, our continuous approach outperforms binary models both quantitatively and qualitatively, while maintaining comparable performance in strict binary classification scenarios.
Moreover, we use MAESTRO’s aligned MIDI performances to synthesize a dataset with diverse acoustic environments and evaluate the robustness of pedal detection algorithms. In our experiments, we observe that both our model and the baseline models experience a performance drop when trained on real recordings and tested on synthetic audio rendered in different room acoustics. To further investigate this issue, we conduct a leave-one-out experiment using synthetic data from multiple acoustic environments, which reveals that the model struggles to generalize to unseen acoustics. In addition, statistical analysis shows that models trained solely on real recordings tend to produce higher pedal predictions as reverberation increases. This bias is reduced when the model is trained on a dataset that is more diverse in terms of room acoustics. These findings demonstrate the influence of room acoustics on model predictions of sustain pedal depth values. Our contributions are summarized as follows:
1. Continuous high-resolution pedal depth estimation. We reformulate sustain pedal detection as a continuous-valued regression task and propose a Transformer-based model. Our method matches state-of-the-art performance in the binary classification setting and outperforms baselines in multi-class and regression metrics.
2. Robustness analysis under varied room acoustics. We conduct controlled leave-one-out experiments showing that unseen room conditions degrade model performance. Statistical analysis reveals a consistent overestimation bias in reverberant settings, which is reduced through acoustic diversity in training.
3. A synthetic dataset for controlled acoustic generalization research. We introduce a new dataset rendered from MAESTRO’s MIDI files with multiple room configurations, supporting systematic studies of model robustness and the effects of room acoustics on pedal detection.
2. RELATED WORK
Two important early studies from an acoustic perspective examined how the sustain pedal affects sound. The first [3] found that pressing the pedal increases decay times in mid-range frequencies and alters the tone through sympathetic resonance. A later study [10] investigated part-pedaling and identified three phases: initial free vibration, damper-string interaction, and final free vibration. This provides scientific evidence that varying the pedal depth allows for gradual and continuous control over the sound rather than a simple on/off function.
In the MIR field, Liang et al. implemented an optical sensor [11] and explored the relationship between pedaling techniques and physical gestures. Their subsequent work includes classifying isolated notes with different sustain pedal timings [12], classifying short excerpts from a few recordings into four levels of pedal depth [13], and detecting legato-pedal onsets by measuring sympathetic resonance [14]. These studies have shown potential but have not yet fully solved the pedal depth estimation problem in real-world scenarios. Moreover, when moving towards deep-learning approaches, Liang et al. shifted their focus to binary pedal detection in [4, 5] with a large collection of synthetic data. Kong et al. applied the best CNN from [5] to real recordings from the MAESTRO dataset, achieving an F1 score of 0.791 [6].
Some recent comprehensive piano transcription systems have also integrated sustain pedal detection and achieved state-of-the-art binary pedal detection accuracy [6–8]. During training, these approaches employed the same binary labeling of the sustain pedal as [4, 5], splitting pedal on/off between MIDI CC64 values 63 and 64. The current state of the art in binary pedal detection reaches an activation-level F1 score of 0.954 [8] on MAESTRO.
3. METHODOLOGY
This section introduces the data representation and model architecture for frame-wise continuous pedal depth estimation. We first describe how input features and training targets are derived from audio and aligned MIDI data. Then, we present the Transformer-based model, including a convolutional block and multi-task prediction output layers. We detail the multi-objective loss function for learning both fine-grained pedal depth and discrete change events. Finally, we describe a synthetic dataset rendered under multiple acoustic environments to enable systematic evaluation of model generalization and robustness.
3.1 Data Representation
Each music piece is segmented into 500-frame clips using a sliding window, with each segment corresponding to approximately 5 seconds of audio. The input audio is resampled to 16 kHz, which, as noted by Kong et al. [6], is sufficient to capture the highest piano note (C8 at 4186 Hz), because the resulting Nyquist frequency of 8 kHz lies above its fundamental. A log-mel spectrogram with 229 frequency bands is computed using a short-time Fourier transform with a Hann window of 2048 samples and a hop length of 160 samples, resulting in a frame rate of 100 frames per second. In addition, we extract 20 MFCCs. All feature extraction is performed using librosa¹.
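The settings above fix the time resolution and the input shape. The plain-Python sketch below only checks that arithmetic. It assumes the 249 input features are the 229 log-mel bands concatenated with the 20 MFCCs (which agrees with F = 249), and the window hop of the sliding segmentation (`hop_frames=250`) is a hypothetical value, since the paper does not state it.

```python
SAMPLE_RATE = 16_000  # Hz, after resampling
HOP_LENGTH = 160      # samples between successive STFT frames
N_MELS = 229          # log-mel bands
N_MFCC = 20           # MFCC coefficients
CLIP_FRAMES = 500     # frames per segment (T)

frame_rate = SAMPLE_RATE / HOP_LENGTH    # frames per second
clip_seconds = CLIP_FRAMES / frame_rate  # duration covered by one segment
nyquist_hz = SAMPLE_RATE / 2             # highest representable frequency (> 4186 Hz for C8)
feature_dim = N_MELS + N_MFCC            # F, assuming mel bands and MFCCs are concatenated


def segment_starts(n_frames, clip_frames=CLIP_FRAMES, hop_frames=250):
    """Start frames of sliding-window segments over a piece of n_frames frames.

    hop_frames=250 (50% overlap) is a hypothetical choice, not a value from the paper.
    A final segment is aligned to the end of the piece so the tail is not dropped;
    pieces shorter than one clip would need padding (not shown).
    """
    if n_frames <= clip_frames:
        return [0]
    starts = list(range(0, n_frames - clip_frames + 1, hop_frames))
    if starts[-1] + clip_frames < n_frames:
        starts.append(n_frames - clip_frames)
    return starts


print(frame_rate, clip_seconds, nyquist_hz, feature_dim)  # 100.0 5.0 8000.0 249
print(segment_starts(1200))                               # [0, 250, 500, 700]
```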
Each segment passed to the model has shape [T, F], where T = 500 is the number of frames and F = 249 is the feature dimension. As shown in Figure 1, the model predicts and learns from four types of targets derived from aligned MIDI data:
• Frame-Wise Pedal Depth: A continuous sequence of length T, with each value normalized to the range [0, 1] by dividing the MIDI CC64 values (range 0–127) by 127.
• Frame-Wise Pedal Onset: A binary sequence of length T, where a value of 1 indicates a pedal activation event (pedal pressed down) in the current frame.
• Frame-Wise Pedal Offset: A binary sequence of length T, where a value of 1 indicates a pedal release event (pedal lifted) in the current frame.
• Global Pedal Depth: A single scalar representing the average pedal level within the segment. It summarizes the overall pedaling behavior and provides a global supervision target.
We follow the event-based definition of pedal onsets and offsets in [6], where onsets are triggered by a rising edge in the CC64 curve and offsets by a falling edge. To improve learning and reduce sensitivity to label noise, we adopt the same strategy as [6] and generate soft labels around the onset and offset events, as shown in Figure 1. This approach improves convergence during training and enables the model to better localize pedal change boundaries. This representation design allows the model to capture both the continuous control of the sustain pedal and the discrete pedal changes.

¹librosa 0.10.1, https://doi.org/10.5281/zenodo.8252662.

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025

[Figure 1 diagram: Input (Mel Spectrogram + MFCC, frequency × frames) → CNN Block (Conv2D, BatchNorm, ReLU, Dropout; ×3) → Positional Encoding → Transformer Encoder Layer (×8) → MLP heads for Frame-wise Pedal Depth, Frame-wise Pedal Onset, and Frame-wise Pedal Offset (each a 0–1 curve over frames), plus Temporal Pooling and an MLP for Global Pedal Depth.]

Figure 1. Overview of the model architecture. Input features are first processed by a convolutional block, followed by a Transformer Encoder that captures temporal dependencies. Four prediction heads (MLPs) output frame-wise pedal depth, onset, offset, and a global pedal depth value.
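The target construction described in Section 3.1 (CC64 normalization, edge-based onsets and offsets, soft labels, and a segment average) can be sketched as follows. Two details are assumptions rather than statements from the paper: edges are detected by thresholding CC64 at 64, mirroring the binary labels of [4, 5], and the soft labels use a triangular kernel whose half-width (`soft_width`, a hypothetical parameter) is arbitrary here; the exact kernel in [6] may differ.

```python
PEDAL_ON_THRESHOLD = 64  # assumption: the CC64 63/64 on/off split used for binary labels in [4, 5]


def _add_soft_event(target, center, width):
    """Triangular soft label: 1.0 at the event frame, falling linearly to 0 at +/- width frames."""
    for t in range(max(0, center - width + 1), min(len(target), center + width)):
        target[t] = max(target[t], 1.0 - abs(t - center) / width)


def pedal_targets(cc64, soft_width=2):
    """Build the four training targets from a non-empty frame-wise CC64 sequence (ints 0..127)."""
    depth = [v / 127.0 for v in cc64]              # frame-wise pedal depth in [0, 1]
    is_on = [v >= PEDAL_ON_THRESHOLD for v in cc64]
    onset = [0.0] * len(cc64)
    offset = [0.0] * len(cc64)
    for t in range(1, len(cc64)):
        if is_on[t] and not is_on[t - 1]:          # rising edge -> pedal onset
            _add_soft_event(onset, t, soft_width)
        elif is_on[t - 1] and not is_on[t]:        # falling edge -> pedal offset
            _add_soft_event(offset, t, soft_width)
    global_depth = sum(depth) / len(depth)         # segment-level average pedal depth
    return depth, onset, offset, global_depth


depth, onset, offset, global_depth = pedal_targets([0, 0, 0, 127, 127, 127, 127, 0, 0, 0])
print(onset)         # [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(offset)        # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
print(global_depth)  # 0.4
```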
3.2 Model Architecture
The overall model architecture is illustrated in Figure 1. We follow a common design used in temporal prediction tasks such as beat tracking [15–17], consisting of a convolutional block, a Transformer encoder, and multiple prediction layers. The convolutional block includes three 2D convolutional layers with batch normalization, ReLU activation, and max pooling along the frequency axis, which compress the spectral information and extract high-level representations from the log-mel input. The resulting sequence is then fed into an 8-layer Transformer encoder with a hidden size of 256, 8 attention heads, and a feed-forward dimension of 1024. The Transformer models temporal dependencies across frames. On top of the encoder, we apply four separate MLPs to predict frame-wise pedal depth, pedal onset, pedal offset, and a global pedal depth value. The frame-wise outputs have the same length as the input sequence (500 frames), while the global prediction is computed by applying mean pooling over the encoded sequence. A dropout rate of 0.15 is used during training.
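As a rough check on model size, the sketch below counts the parameters of the encoder alone. It assumes standard encoder layers laid out like PyTorch's `nn.TransformerEncoderLayer` (Q/K/V and output projections, two feed-forward linears, two LayerNorms, all with biases); the number of attention heads does not change the count. Under these assumptions the eight layers hold about 6.3 million of the roughly 7.2 million parameters reported in Section 4.1, leaving under a million for the convolutional block, any projection between blocks, and the prediction heads. This is an estimate, not the authors' breakdown.

```python
D_MODEL = 256  # hidden size
D_FF = 1024    # feed-forward dimension
N_LAYERS = 8   # encoder layers


def linear_params(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def encoder_layer_params(d_model=D_MODEL, d_ff=D_FF):
    # Self-attention: Q, K, V projections plus the output projection (head count does not matter).
    attention = 4 * linear_params(d_model, d_model)
    # Position-wise feed-forward network: d_model -> d_ff -> d_model.
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    return attention + feed_forward + layer_norms


per_layer = encoder_layer_params()
encoder_total = N_LAYERS * per_layer
print(per_layer, encoder_total)  # 789760 6318080
```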
3.3 Loss Functions
Our model is trained using a multi-task objective that combines continuous regression and binary classification losses. This design reflects the structure of the prediction targets, which include both a continuous-valued pedal depth curve and discrete pedal change events. The total loss function is defined as:

$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{pedal}} + \lambda_2 \mathcal{L}_{\text{global}} + \lambda_3 \mathcal{L}_{\text{onset}} + \lambda_4 \mathcal{L}_{\text{offset}} \qquad (1)$$

where $\mathcal{L}_{\text{pedal}}$ is the frame-wise mean squared error (MSE) between the predicted and ground-truth continuous pedal depth, $\mathcal{L}_{\text{global}}$ is the MSE loss for predicting the global average pedal depth across the segment, and $\mathcal{L}_{\text{onset}}$ and $\mathcal{L}_{\text{offset}}$ are binary cross-entropy (BCE) losses applied to the predicted onset and offset event sequences, respectively. MSE is chosen for $\mathcal{L}_{\text{pedal}}$ and $\mathcal{L}_{\text{global}}$ because the prediction targets are continuous values, and minimizing squared error encourages the model to match the fine-grained temporal structure of pedaling gestures. In contrast, BCE is used for $\mathcal{L}_{\text{onset}}$ and $\mathcal{L}_{\text{offset}}$ because they represent binary event classification tasks at the frame level. This multi-objective setup allows the model to simultaneously learn low-level control signals (pedal depth curves) and high-level timing events (onset and offset boundaries). The scalar weights $\lambda_1$ through $\lambda_4$ are manually tuned to balance the contribution of each objective.
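Equation (1) can be sketched in plain Python for a single segment. The weights are the values reported in Section 4.1; the element-wise losses are written out by hand rather than taken from a deep-learning framework, the BCE accepts the soft onset/offset labels described in Section 3.1, and the dictionary layout of predictions and targets is a hypothetical convention for this example.

```python
import math

# Loss weights reported in Section 4.1.
WEIGHTS = {"pedal": 0.6, "global": 0.2, "onset": 0.1, "offset": 0.1}


def mse(pred, target):
    """Mean squared error over a frame sequence."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy; targets may be soft labels in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip probabilities to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def total_loss(pred, target, weights=WEIGHTS):
    """Equation (1) for one segment.

    pred / target: dicts with frame-wise lists under "pedal", "onset", "offset"
    and a scalar under "global" (a hypothetical layout for this sketch).
    """
    return (weights["pedal"] * mse(pred["pedal"], target["pedal"])
            + weights["global"] * (pred["global"] - target["global"]) ** 2  # MSE of one scalar
            + weights["onset"] * bce(pred["onset"], target["onset"])
            + weights["offset"] * bce(pred["offset"], target["offset"]))


pred = {"pedal": [0.5, 0.5], "global": 0.5, "onset": [0.5, 0.5], "offset": [0.5, 0.5]}
target = {"pedal": [0.0, 1.0], "global": 0.5, "onset": [0.0, 1.0], "offset": [0.0, 1.0]}
print(round(total_loss(pred, target), 4))  # 0.2886
```

In a training framework, the two MSE terms and the two BCE terms would normally come from built-in loss functions averaged over a batch; the hand-written versions here only make the weighting explicit.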
3.4 Dataset Synthesis Across Environments
To explore how the same pedaling (and playing) might sound under different acoustic conditions, we used MAESTRO v3 [9] and synthesized its recorded MIDI performances using Pianoteq 8 Stage.² We selected four combinations of reverb parameters to simulate distinct acoustic environments. The specific configurations are summarized in Table 1. These parameters are mostly default room presets in Pianoteq. We output audio with the standard stereo microphone setup at 44.1 kHz, 16-bit.
4. EXPERIMENTS
This section evaluates our model’s performance in pedal depth estimation and compares it with baseline models. We show that our model has advantages in both quantitative and qualitative analysis.
4.1 Training
Our model contains approximately 7.2 million parameters. Training is conducted on a single Tesla V100-SXM2-32GB GPU with a batch size of 16. We use the AdamW optimizer [18] with a learning rate of 5×10⁻⁴. Each training epoch takes around 5 hours and 45 minutes to complete. We trained the model for a total of 48 hours, observing the best checkpoint near 192,000 steps in epoch 9, around which the metrics appeared to indicate convergence. The total loss in Equation 1 is optimized with the following weights: $\lambda_1 = 0.6$ for frame-wise pedal depth, $\lambda_2 = 0.2$ for global pedal depth, $\lambda_3 = 0.1$ for pedal onset, and $\lambda_4 = 0.1$ for pedal offset.

²Pianoteq is a complete physical modeling software developed at the Institute of Mathematics of Toulouse, France. https://www.modartt.com/

Room  Name          Mix    Dur   Size  PreD   D.Mix  D.Amt  D.FB  Piano Model
1     Dry Room      -      -     -     -      -      -      -     NY Steinway D Classical
2     Clean Studio  +10dB  0.4s  12m   0s     6%     60ms   0%    NY Steinway D Classical
3     Concert Hall  +50dB  4s    50m   0.01s  15%    60ms   5%    NY Steinway D Classical
4     Church        +10dB  2.5s  18m   0s     25%    30ms   0%    Bösendorfer 280VC Classical

Table 1. Summary of room acoustic settings and piano models used for the synthesized audio. Each room configuration specifies the reverb mix level (Mix), reverb duration (Dur), room size (Size), pre-delay (PreD), delay mix (D.Mix), delay amount (D.Amt), delay feedback (D.FB), and the piano model used for synthesis. No reverberation is applied in the dry room. In the rest of this paper, we refer to these rooms by their index (e.g., Room 1 for Dry Room).

                   Binary                    4-Class
Model              P↑      R↑      F1↑      P↑      R↑      F1↑      MSE↓    MAE↓
Kong et al. [6]    0.9374  0.9374  0.9373   0.4752  0.6786  0.5588   0.0762  0.1579
Yan and Duan [8]   0.9446  0.9445  0.9443   0.4790  0.6836  0.5631   0.0710  0.1528
Ours (BCE)         0.8907  0.8911  0.8906   0.6039  0.6507  0.6180   0.0584  0.1537
Ours               0.8975  0.8971  0.8973   0.6849  0.6971  0.6863   0.0425  0.1339

Table 2. Comparison of model performance on binary and 4-class classification, along with regression metrics. The baselines support only binary classification, but we computed their 4-class scores for comparison. Our model achieves lower MSE and MAE, along with substantially higher 4-class scores. For our model, 4-class precision, recall, and F1 remain close to one another, whereas the baselines show a large gap between precision and recall. This is achieved by predicting continuous pedal depth values beyond binary classification.
4.2 Baselines
We use the models from [6, 8] as our baselines, as they are, to the best of our knowledge, the state-of-the-art piano transcription models that include sustain pedal detection. Both models take log-mel spectrograms as input. Kong et al. [6] trained a separate CRNN model for sustain pedal detection using a weighted sum of binary cross-entropy (BCE) losses for sustain pedal onset, offset, and value, while Yan and Duan [8] modeled sustain pedal events jointly with note events and performed transcription as a whole using a Transformer Encoder and a semi-CRF.

We also include an additional version of our model, denoted as “Ours (BCE)” in Tables 2 and 3, for comparison and ablation purposes. It uses the same architecture as our main model but adopts BCE loss for all four objectives.
4.3 Quantitative Evaluation
Comparing our model with baseline methods poses a challenge because the baselines treat sustain pedal prediction as a binary classification task and are trained using binary cross-entropy loss. In contrast, our model is explicitly designed to estimate a continuous pedal depth curve by regressing values within the range [0, 1]. We acknowledge that this fundamental difference in approach limits direct comparability between methods.

To enable a fair comparison, we first discretize our continuous predictions to match the frame-wise binary labels used by the baselines. We also compute frame-wise mean squared error (MSE) and mean absolute error (MAE) for the baselines, treating their binary outputs as approximations of continuous values, though we recognize this may not reflect their intended use case. Finally, to assess the resolution of continuous predictions, we introduce an accuracy-with-tolerance metric that evaluates prediction correctness under varying error thresholds. The evaluation metrics are summarized as follows:
• Frame-Wise F1 Score: Evaluates classification performance by discretizing predictions into binary and multi-class labels. Although the baseline models are trained only for binary output, we compute multi-class results for all models.
• MSE and MAE: Measure the accuracy of predicted vs. ground-truth continuous pedal values.
• Accuracy with Tolerance: Evaluates the model’s precision and resolution along the continuous pedal depth axis. A prediction is considered correct if its absolute error falls within a specified threshold.
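The binary F1 and error metrics can be sketched as follows. How continuous predictions are discretized is not spelled out in the text, so the threshold of 64/127 (mirroring the 63/64 CC64 split of the binary labels) is an assumption, and `binary_f1` scores only the pedal-on class, whereas the averaging scheme behind Table 2 may differ. The toy frames show how a binary output can tie on F1 yet lose on MAE.

```python
BINARY_THRESHOLD = 64 / 127  # assumption: mirrors the 63/64 CC64 on/off split


def to_binary(values, threshold=BINARY_THRESHOLD):
    """Discretize pedal depths in [0, 1] into off (0) / on (1) labels."""
    return [1 if v >= threshold else 0 for v in values]


def binary_f1(pred_labels, true_labels):
    """F1 of the pedal-on class."""
    pairs = list(zip(pred_labels, true_labels))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


truth = [0.0, 0.3, 0.6, 1.0]
continuous_pred = [0.1, 0.35, 0.55, 0.9]
binary_pred = [0.0, 0.0, 1.0, 1.0]  # a binary model's on/off output read as 0.0 / 1.0

for name, pred in [("continuous", continuous_pred), ("binary", binary_pred)]:
    f1 = binary_f1(to_binary(pred), to_binary(truth))
    print(name, f1, round(mae(pred, truth), 3))
# continuous 1.0 0.075
# binary 1.0 0.175
```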
The quantitative results in Table 2 show that our model performs comparably to the baselines in binary classification (F1 of 0.8973, versus 0.9373 and 0.9443 for the baselines), while achieving better results in continuous pedal prediction, with lower MSE and MAE. Furthermore, in piano pedagogy and performance studies, the basic discrete categorization of sustain pedal depth usually involves quarter, half, three-quarter, and full pedal, as described in [19] and [20]. When evaluated on this musically meaningful 4-class classification task, our model achieves an F1 score of 0.6863, while both baselines score around 0.56, suggesting the potential of our continuous approach.
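A minimal sketch of the 4-class evaluation follows. The bin edges at 0.25, 0.5, and 0.75 (one class each for quarter, half, three-quarter, and full pedal) and the unweighted macro average of per-class F1 are assumptions; the text does not give the exact edges or averaging used for Table 2. The example illustrates why binary outputs score poorly here: values of 0 and 1 can only land in the two extreme classes.

```python
QUARTER_EDGES = (0.25, 0.5, 0.75)  # assumed class boundaries on the [0, 1] depth axis


def to_four_class(values, edges=QUARTER_EDGES):
    """Map pedal depths to 0 (quarter), 1 (half), 2 (three-quarter), 3 (full)."""
    return [sum(v >= e for e in edges) for v in values]


def macro_f1(pred_labels, true_labels, n_classes=4):
    """Unweighted mean of per-class F1, skipping classes absent from both sequences."""
    pairs = list(zip(pred_labels, true_labels))
    scores = []
    for c in range(n_classes):
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        if tp + fp + fn:
            scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


truth = to_four_class([0.1, 0.4, 0.6, 0.9])          # [0, 1, 2, 3]
continuous = to_four_class([0.15, 0.45, 0.55, 0.95])  # [0, 1, 2, 3]
binary = to_four_class([0.0, 0.0, 1.0, 1.0])          # [0, 0, 3, 3]
print(macro_f1(continuous, truth), round(macro_f1(binary, truth), 3))  # 1.0 0.333
```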
We introduce balanced accuracy with tolerance to evaluate model performance at different resolution levels while accounting for the heavy data imbalance toward extreme pedal values that is common in piano playing. This metric is particularly useful in real-world scenarios where small prediction errors are acceptable. We calculate the balanced accuracy with tolerance as

$$A_{\text{weighted}} = \frac{\sum_{i=0}^{N-1} w_i \, a_i}{\sum_{i=0}^{N-1} w_i},$$

where $a_i$ is the accuracy with tolerance over the frames whose ground-truth value falls in bin $i$, and the weight $w_i$ is determined by the inverse square root of the frequency of each bin, normalized by the sum of the inverse-square-root frequencies of all bins. We set the number of value bins to N = 128, matching the discrete 0–127 ground-truth CC64 values. As shown in Figure 2, our model achieves higher balanced accuracy than the baselines, starting from tight thresholds below 0.05 and improving further with looser tolerances. Simple accuracy is more sensitive to imbalance, but our model surpasses the baselines from a threshold of 0.15 onward, indicating that most prediction errors are relatively small.

Figure 2. Balanced accuracy and simple average accuracy of our model compared to the baselines (Kong et al. [6]; Yan and Duan [8]) across thresholds ranging from 0.01 to 0.4.
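The balanced accuracy with tolerance can be sketched as below, with one bin per ground-truth CC64 value. Two details are assumptions rather than statements from the paper: $a_i$ is taken as the fraction of frames in bin $i$ whose absolute error is within the tolerance, and bins with no frames are skipped, because their inverse-square-root weight would be undefined. In the toy example a rare mid-depth frame is missed, which barely moves the simple accuracy but sharply lowers the balanced one.

```python
import math
from collections import Counter

N_BINS = 128  # one bin per ground-truth CC64 value (0..127)


def accuracy_with_tolerance(pred, truth_cc64, tol):
    """Return (simple, balanced) accuracy-with-tolerance.

    pred: predicted pedal depths in [0, 1]; truth_cc64: ground-truth CC64 integers.
    """
    counts, hits = Counter(), Counter()
    for p, cc in zip(pred, truth_cc64):
        if not 0 <= cc < N_BINS:
            raise ValueError(f"CC64 value out of range: {cc}")
        counts[cc] += 1
        if abs(p - cc / 127.0) <= tol:
            hits[cc] += 1
    simple = sum(hits.values()) / sum(counts.values())
    # w_i is proportional to 1/sqrt(frequency of bin i), normalized over the non-empty bins.
    weights = {cc: 1.0 / math.sqrt(n) for cc, n in counts.items()}
    total_weight = sum(weights.values())
    balanced = sum(weights[cc] / total_weight * hits[cc] / counts[cc] for cc in counts)
    return simple, balanced


truth = [0] * 9 + [64]    # imbalanced: mostly pedal-off frames
pred = [0.0] * 9 + [0.9]  # the single mid-depth frame is badly overestimated
simple, balanced = accuracy_with_tolerance(pred, truth, tol=0.05)
print(round(simple, 3), round(balanced, 3))  # 0.9 0.25
```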
4.4 Qualitative Analysis
In this section, we present example segments from the test set featuring various expressive sustain pedal gestures, shown in Figure 3, including gradual half-pedaling, flutter-pedaling, and rapid pedal changes within very short durations. These gestures are common in, and representative of, the classical piano repertoire.

Example (a) shows legato pedaling with rapid full pedal changes, where the baseline either misses changes or predicts excessive oscillations. Example (b) is a half-pedaling instance where a binary classification algorithm fails to capture the intermediate pedal depth. In the flutter-pedaling region in example (c), our model identifies repeated partial presses and releases, whereas the baseline reduces these expressive variations to abrupt binary changes. These examples indicate that our method can model real pedaling behavior more effectively and generate musically meaningful pedal curves, providing a more faithful representation of pianists’ precise control over pedal depth than previous binary prediction methods.
5. PEDAL PREDICTION ROBUSTNESS
We observe that both baseline models, as well as our own, struggle to generalize from real recordings to synthetic data. To address this, we conduct a leave-one-out experiment showing that training on a more diverse set of room conditions improves our model’s ability to generalize to unseen acoustic environments. Further analysis supports our claim that current sustain pedal detection models are not robust to changes in room acoustics; moreover, their predictions are significantly influenced by reverberation, revealing a strong sensitivity to acoustic environments.
5.1 Robustness Analysis
To evaluate the model’s robustness to acoustic variations, we follow the original MAESTRO dataset split for our synthetic data: if a piece belongs to the training set, all of its synthetic versions are also in the training set. This ensures that the model never encounters the musical content of the test data during training, regardless of room conditions. We first evaluate the two baselines and our model on the synthetic test sets and report the mean absolute error (MAE) in Table 3. Although all models perform similarly on in-domain data, the baselines degrade notably on out-of-domain (OOD) sets, indicating limited generalization. Our model shows the same trend. To investigate further, we conduct leave-one-room-out experiments by training on various combinations of synthetic data. As shown in Table 4, for each of R1, R2, and R3 the MAE is highest in the configuration that excludes that room from training, confirming the importance of acoustic diversity for model generalization.
                   Test Data
Model              O       R1      R2      R3      R4
Kong et al. [6]    0.1579  0.1773  0.1780  0.2458  0.2247
Yan and Duan [8]   0.1528  0.2217  0.2195  0.2449  0.2365
Ours (BCE)         0.1537  0.1654  0.1681  0.2091  0.1894
Ours               0.1339  0.1889  0.1841  0.2039  0.1905

Table 3. Mean absolute error (MAE)↓ of the baseline models and our models tested on the original MAESTRO recordings (O) and on synthetic audio with settings R1, R2, R3, and R4, respectively.

                   Test Data
Training Data      R1        R2        R3        R4
R2+R3              [0.1125]  0.1119    0.1453    [0.1654]
R1+R3              0.1107    [0.1145]  0.1433    [0.1713]
R1+R2              0.0979    0.0981    [0.1863]  [0.1779]
R1+R2+R3           0.1062    0.1065    0.1439    [0.1619]

Table 4. MAE↓ of our model trained with the original MAESTRO audio and synthetic audio with settings (rooms) R2+R3, R1+R3, R1+R2, and R1+R2+R3, respectively, and tested on rooms R1, R2, R3, and R4. R4 is never included in any training configuration and serves as a held-out condition for evaluating generalization to unseen acoustic environments. Out-of-domain (OOD) results are shown in brackets.
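The split protocol of Section 5.1 and the training configurations of Table 4 can be sketched as follows. Piece identifiers, the `"original"` tag for real MAESTRO audio, and the function names are hypothetical; the only logic taken from the text is that every rendering of a piece inherits the piece's MAESTRO split, that R4 never appears in training, and that out-of-domain rooms are those excluded from a configuration.

```python
from itertools import combinations

TRAINABLE_ROOMS = ["R1", "R2", "R3"]  # R4 is always held out
ALL_ROOMS = TRAINABLE_ROOMS + ["R4"]


def training_configs():
    """Room combinations used for training in Table 4: each pair of R1-R3, then all three."""
    return [list(pair) for pair in combinations(TRAINABLE_ROOMS, 2)] + [list(TRAINABLE_ROOMS)]


def build_split(piece_split, train_rooms):
    """piece_split maps a piece id to its MAESTRO split ('train', 'validation', or 'test').

    Training uses the original recording plus the selected synthetic rooms; testing uses
    every synthetic room, so no test piece is heard during training in any room.
    """
    train = [(pid, "original") for pid, split in piece_split.items() if split == "train"]
    train += [(pid, room) for pid, split in piece_split.items() if split == "train"
              for room in train_rooms]
    test = [(pid, room) for pid, split in piece_split.items() if split == "test"
            for room in ALL_ROOMS]
    return train, test


def ood_rooms(train_rooms):
    """Rooms the model has not been trained on (out-of-domain for that configuration)."""
    return [room for room in ALL_ROOMS if room not in train_rooms]


for config in training_configs():
    print("+".join(config), "-> OOD:", ood_rooms(config))
# R1+R2 -> OOD: ['R3', 'R4']
# R1+R3 -> OOD: ['R2', 'R4']
# R2+R3 -> OOD: ['R1', 'R4']
# R1+R2+R3 -> OOD: ['R4']
```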
5.2 Effect of Room Acoustics on Pedal Prediction
To further understand how room acoustics influence frame-level pedal prediction, we analyze the distribution of predicted pedal depth values across the four synthetic room conditions, ordered by a non-linear progression of reverb levels. Figure 4 presents violin plots comparing ground-truth pedal values, predictions from our model trained only on real recordings, and predictions from our model trained on both real and synthetic audio rendered under R1, R2, and R3. The ground-truth distributions remain stable across rooms, with consistent ranges, medians, and symmetric shapes, which is unsurprising since all samples are synthesized from the same MIDI CC64 values. In contrast, the real-data-only model shows a noticeable upward shift in both median and distribution shape as reverberation increases, suggesting that the model interprets increasing reverberant energy as greater use of the sustain pedal. The mixed-data model (right plot) produces distributions that more closely match the ground truth, with mean values and overall shapes remaining consistent across rooms and no clear upward trend as reverb increases.

[Figure 3 panels: (a) Rapid Change, (b) Half Pedal, (c) Fluttering. Axes: Time (seconds) against Pedal Depth, and Time (seconds) against Frequency (Hz) for the mel-spectrograms.]

Figure 3. Comparison of our predicted pedal curves and the baseline binary model [6] against ground-truth pedal depth values, plotted alongside aligned mel-spectrograms and their respective scores: (a) F. Chopin (1810–1849) Polonaise-Fantaisie Op. 61, mm. 6–7; (b) F. Schubert (1797–1828) Impromptu in F Minor, Op. 142 No. 4, mm. 45–46; and (c) J. S. Bach (1685–1750) Prelude and Fugue in E Minor, BWV 855, mm. 11–12.

[Figure 4 panels: (a) Ground Truth; (b) Predictions, Model “real”; (c) Predictions, Model “mix”.]

Figure 4. Violin plots showing the distribution of predicted pedal depth values (b and c) and ground truth (a) across four room settings: Dry Room (R1), Clean Studio (R2), Church (R4), and Concert Hall (R3). (b) is our model trained solely on real data, while (c) learns from real data plus training data from R1, R2, and R3. Orange dashed lines indicate the mean pedal depth across rooms, and pink dashed lines mark the median.
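One simple way to quantify the upward shift visible in Figure 4 is to compare each room's prediction mean and median with the ground truth. The sketch below illustrates that idea under this assumption; it is not the statistical analysis used in the paper, and the room keys and toy values are hypothetical.

```python
from statistics import mean, median


def bias_by_room(predictions, ground_truth):
    """Return room -> (mean shift, median shift) of predicted vs. ground-truth pedal depth.

    predictions, ground_truth: dicts mapping a room name to frame-wise depths in [0, 1].
    Positive shifts indicate overestimation of pedal depth.
    """
    return {
        room: (mean(preds) - mean(ground_truth[room]),
               median(preds) - median(ground_truth[room]))
        for room, preds in predictions.items()
    }


truth = {"R1": [0.0, 0.5, 1.0], "R3": [0.0, 0.5, 1.0]}
preds = {"R1": [0.0, 0.5, 1.0], "R3": [0.2, 0.7, 1.0]}  # the reverberant room is overestimated
for room, (mean_shift, median_shift) in bias_by_room(preds, truth).items():
    print(room, round(mean_shift, 3), round(median_shift, 3))
# R1 0.0 0.0
# R3 0.133 0.2
```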
These findings reveal that generalization is a key challenge in sustain pedal detection. Models trained on a single acoustic setting tend to produce biased predictions when tested in new environments. Our analysis confirms that room acoustics can significantly affect model behavior. However, we also show that incorporating training data from multiple acoustic settings helps mitigate this issue, suggesting that acoustic diversity is a promising direction for improving model robustness.
6. CONCLUSION
This paper presents a high-resolution sustain pedal estimation model capable of predicting continuous pedal depth values beyond binary on/off states.³ Through quantitative and qualitative evaluations, we show that our model enables more detailed analysis of expressive piano performance by capturing finer-grained continuous pedal depth values, while maintaining performance comparable to SOTA baselines on previous binary detection tasks. We acknowledge the inherent limitations of comparing continuous and binary classification approaches. We further evaluate our model’s robustness across varying acoustic conditions using “leave-one-out” tests with different acoustic settings. Additionally, the robustness analysis reveals that reverberation systematically increases prediction error and introduces an overestimation bias in pedal depth predictions. These findings highlight the importance of modeling continuous pedal depth values and accounting for acoustic variability in sustain pedal detection tasks.

³Available at https://github.com/kunfang98927/PedalDetection.
7. REFERENCES
[1] W. Gieseking and K. Leimer, Piano Technique. Courier Corporation, 2013.
[2] H. Neuhaus, The Art of Piano Playing. Kahn and Averill, 2008.
[3] H.-M. Lehtonen, H. Penttinen, J. Rauhala, and V. Välimäki, “Analysis and modeling of piano sustain-pedal effects,” The Journal of the Acoustical Society of America, vol. 122, no. 3, pp. 1787–1797, September 2007.
[4] B. Liang, G. Fazekas, and M. B. Sandler, “Transfer learning for piano sustain-pedal detection,” in 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary: IEEE, 2019, pp. 1–6.
[5] B. Liang, G. Fazekas, and M. B. Sandler, “Piano sustain-pedal detection using convolutional neural networks,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 241–245.
[6] Q. Kong, B. Li, X. Song, Y. Wan, and Y. Wang, “High-resolution piano transcription with pedals by regressing onset and offset times,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 29, pp. 3707–3717, October 2021.
[7] Y. Yan, F. Cwitkowitz, and Z. Duan, “Skipping the frame-level: Event-based piano transcription with neural semi-CRFs,” in Advances in Neural Information Processing Systems, vol. 34, Virtual, 2021, pp. 20583–20595.
[8] Y. Yan and Z. Duan, “Scoring time intervals using non-hierarchical transformer for automatic piano transcription,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), San Francisco, United States, 2024.
[9] C. Hawthorne, A. Stasyuk, A. Roberts, I. Simon, C.-Z. A. Huang, S. Dieleman, E. Elsen, J. Engel, and D. Eck, “Enabling factorized piano music modeling and generation with the MAESTRO dataset,” in Proceedings of the 7th International Conference on Learning Representations, New Orleans, Louisiana, United States, 2019.
[10] H.-M. Lehtonen, A. Askenfelt, and V. Välimäki, “Analysis of the part-pedaling effect in the piano,” The Journal of the Acoustical Society of America, vol. 126, no. 2, pp. EL49–EL54, July 2009.
[11] B. Liang, G. Fazekas, M. B. Sandler, and A. McPherson, “Piano pedaller: A measurement system for classification and visualisation of piano pedalling techniques,” in Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), Copenhagen, Denmark, 2017, pp. 325–329.
[12] B. Liang, G. Fazekas, and M. B. Sandler, “Detection of piano pedaling techniques on the sustain pedal,” in Proceedings of Audio Engineering Society Convention 143, New York, United States, 2017.
[13] B. Liang, G. Fazekas, and M. B. Sandler, “Measurement, recognition, and visualization of piano pedaling gestures and techniques,” Journal of the Audio Engineering Society, vol. 66, no. 6, pp. 448–456, June 2018.
[14] B. Liang, G. Fazekas, and M. B. Sandler, “Piano legato-pedal onset detection based on a sympathetic resonance measure,” in Proceedings of the 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 2018.
[15] J. Zhao, G. Xia, and Y. Wang, “Beat transformer: Demixed beat and downbeat tracking with dilated self-attention,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bengaluru, India, 2022.
[16] Y. Hung, J. Wang, X. Song, W. T. Lu, and M. Won, “Modeling beats and downbeats with a time-frequency transformer,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 401–405.
[17] F. Foscarin, J. Schlüter, and G. Widmer, “Beat this! Accurate beat tracking without DBN postprocessing,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), San Francisco, United States, 2024.
[18] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, Louisiana, United States, 2019. [Online]. Available: https://openreview.net/forum?id=BkgQ1j09K7
[19] K. U. Schnabel, Modern Technique of the Pedal: A Piano Pedal Study. New York, United States: Mills Music, 1954.
[20] J. Banowetz, The Pianist’s Guide to Pedaling. Bloomington, IN: Indiana University Press, 1985.