HIGH-RESOLUTION SUSTAIN PEDAL DEPTH ESTIMATION FROM
PIANO AUDIO ACROSS ROOM ACOUSTICS
Kun Fang 1,2,∗  Hanwen Zhang 1,2,∗  Ziyu Wang 3,4  Ichiro Fujinaga 1,2
1 McGill University  3 New York University  4 Music X Lab, MBZUAI
2 The Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT)
ABSTRACT
Piano sustain pedal detection has previously been approached as a binary on/off
classification task, limiting its application in real-world piano performance scenarios
where pedal depth significantly influences musical expression. This paper presents a novel
approach to high-resolution estimation that predicts continuous pedal depth values. We
introduce a Transformer-based architecture that not only matches state-of-the-art
performance on the traditional binary classification task but also achieves high accuracy
in continuous pedal depth estimation. Furthermore, by estimating continuous values, our
model provides musically meaningful predictions for sustain pedal usage, whereas baseline
models struggle to capture such nuanced expressions with their binary detection approach.
Additionally, this paper investigates the influence of room acoustics on sustain pedal
estimation using a synthetic dataset that includes varied acoustic conditions. We train
our model with different combinations of room settings and test it in an unseen new
environment using a “leave-one-out” approach. Our findings show that the two baseline
models and ours are not robust to unseen room conditions. Statistical analysis further
confirms that reverberation influences model predictions and introduces an overestimation
bias.
1. INTRODUCTION
The use of the sustain pedal is a highly personal and indispensable aspect of piano
performance. In treatises [1, 2], legendary piano teachers Karl Leimer (1858–1944) and
Heinrich Neuhaus (1888–1964) both placed great emphasis on recognizing the sound shaped by
the sustain pedal, where small variations in timing and depth can create dramatic effects
in tones. The sound effect created by the sustain pedal is also highly sensitive to
environmental acoustics. Professional pianists adjust pedaling meticulously in real time
to achieve their desired tonal quality, and it is the interplay between pedaling and
external acoustic factors that shapes the listening experience. In fact, in earlier days,
it was a suggested practice to simulate sustain pedaling effects using artificial
reverberation algorithms as noted in [3].

∗ Authors with equal contribution
© K. Fang, H. Zhang, Z. Wang and I. Fujinaga. Licensed under a Creative Commons
Attribution 4.0 International License (CC BY 4.0). Attribution: K. Fang, H. Zhang,
Z. Wang and I. Fujinaga, “High-Resolution Sustain Pedal Depth Estimation from Piano Audio
Across Room Acoustics”, in Proc. of the 26th Int. Society for Music Information Retrieval
Conf., Daejeon, South Korea, 2025.
However, in music information retrieval (MIR), many studies have adopted the term
“sustain pedal detection” and approached this problem as a binary (on/off) classification
[4–8], overlooking the subtleties of intermediate states. Recent research [6–8] also
primarily uses the MAESTRO dataset [9], which is currently one of the largest available
collections of paired audio and MIDI files recorded on Yamaha Disklaviers (acoustic grand
pianos with high-precision MIDI capture and playback systems). Unfortunately, the room
conditions of MAESTRO’s audio recordings are entirely unknown and also potentially
unpredictable. This raises further questions about the generalizability of these models
across different acoustic environments when trained solely on MAESTRO, in addition to the
oversimplification inherent in binary classification.
To this end, we redefine the task as continuous sustain pedal depth estimation, framing it
as a regression problem. We propose a Transformer-based model with a conventional
structure that includes a convolutional layer for feature extraction, followed by an
n-layer Transformer Encoder. Additionally, we incorporate a mixed evaluation strategy,
combining F1 for classification across various bin thresholds with continuous mean
absolute error (MAE). Our results demonstrate that our continuous approach outperforms
binary models both quantitatively and qualitatively when treating baseline binary
predictions as continuous values while maintaining comparable performance in strict binary
classification scenarios.
Moreover, we use MAESTRO’s aligned MIDI performances and synthesize a dataset with diverse
acoustic environments to evaluate the robustness of pedal detection algorithms. In our
experiments, we observe that both our model and the baseline models experience a
performance drop when trained on real recordings and tested on synthetic audio rendered in
different room acoustics. To further investigate this issue, we conduct a leave-one-out
experiment using synthetic data from multiple acoustic environments, which reveals that
the model struggles to generalize to unseen acoustics. In addition, statistical analysis
shows that models trained solely on real recordings tend to produce higher pedal
predictions as reverberation increases. This bias is reduced when the model is trained on
a more diverse dataset in terms of room acoustics. These findings demonstrate the
influence of room acoustics on model predictions of the sustain pedal depth values. Our
contributions are summarized as follows:
1. Continuous high-resolution pedal depth estimation. We reformulate sustain pedal
detection as a continuous-valued regression task and propose a Transformer-based model.
Our method matches state-of-the-art performance in the binary classification setting and
outperforms baselines in multi-class and regression metrics.
2. Robustness analysis under varied room acoustics. We conduct controlled leave-one-out
experiments showing that unseen room conditions degrade model performance. Statistical
analysis reveals a consistent overestimation bias in reverberant settings, which is
reduced through acoustic diversity in training.
3. A synthetic dataset for controlled acoustic generalization research. We introduce a new
dataset rendered from MAESTRO’s MIDI files with multiple room configurations, supporting
systematic studies of model robustness and the effects of room acoustics on pedal
detection.
2. RELATED WORK
Two important early studies from an acoustic perspective examined how the sustain pedal
affects sound. The first [3] found that pressing the pedal increases decay times in
mid-range frequencies and alters the tone through sympathetic resonance. The second [10]
investigated part-pedaling and identified three phases: initial free vibration,
damper-string interaction, and final free vibration. This provides scientific evidence
that varying the pedal depth allows for gradual and continuous control over the sound
rather than a simple on/off function.
In the MIR field, Liang et al. implemented an optical sensor [11] and explored the
relationship between pedaling techniques and physical gestures. Their subsequent work
includes classifying isolated notes with different sustain pedal timings [12], classifying
short excerpts from a few recordings with four levels of pedal depth [13], and detecting
legato-pedal onsets by measuring sympathetic resonance [14]. These studies have shown
potential but have not yet fully solved the pedal depth estimation problem in real-world
scenarios. However, while moving towards a deep-learning approach, Liang et al. shifted
their focus to binary pedal detection in [4, 5] with a large collection of synthetic data.
Kong et al. applied the best CNN in [5] to real recordings from the MAESTRO dataset,
achieving a 0.791 F1 score [6].
Some recent comprehensive piano transcription systems have also integrated sustain pedal
detection and achieved state-of-the-art binary pedal detection accuracy [6–8]. These
approaches employed the same binary labeling of sustain pedals following [4, 5] (split
pedal on/off at 63/64 MIDI CC64 values) during training. The current state-of-the-art
binary pedal detection has an activation-level F1 score of 0.954 [8] on MAESTRO.
3. METHODOLOGY
This section introduces the data representation and model architecture for frame-wise
continuous pedal depth estimation. We first describe how input features and training
targets are derived from audio and aligned MIDI data. Then, we present the
Transformer-based model, including a convolutional block and multi-task prediction output
layers. We detail the multi-objective loss function for learning both fine-grained pedal
depth and discrete change events. Finally, we describe a synthetic dataset rendered under
multiple acoustic environments to enable systematic evaluation of model generalization and
robustness.
3.1 Data Representation
Each music piece is segmented into 500-frame clips using a sliding window, with each
segment corresponding to approximately 5 seconds of audio. The input audio is resampled to
16 kHz, which is sufficient to capture the highest piano note (C8 at 4186 Hz), as noted by
Kong et al. [6]. A log-mel spectrogram with 229 frequency bands is computed using a
short-time Fourier transform with a Hann window length of 2048 samples and a hop length of
160 samples, resulting in a frame rate of 100 frames per second. In addition, we extract
20 MFCC coefficients. All feature extraction is performed using librosa.¹
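The front-end numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not the authors' extraction code; the constant names are illustrative, and the feature dimension assumes that the log-mel bands and MFCCs are simply concatenated per frame.

```python
# Front-end settings as stated in the text (constant names are illustrative).
SAMPLE_RATE = 16_000   # Hz, after resampling
N_FFT = 2048           # Hann window length in samples
HOP_LENGTH = 160       # samples between consecutive frames
N_MELS = 229           # log-mel frequency bands
N_MFCC = 20            # MFCC coefficients
CLIP_FRAMES = 500      # frames per segment
C8_HZ = 4186           # fundamental of the highest piano note

frame_rate = SAMPLE_RATE / HOP_LENGTH      # frames per second
clip_seconds = CLIP_FRAMES / frame_rate    # duration of one 500-frame clip
window_ms = 1000 * N_FFT / SAMPLE_RATE     # length of one analysis window
nyquist_hz = SAMPLE_RATE / 2               # highest representable frequency
feature_dim = N_MELS + N_MFCC              # assumption: per-frame concatenation

print(frame_rate, clip_seconds, window_ms, nyquist_hz, feature_dim)
```

Under these settings each analysis window spans 128 ms while frames advance every 10 ms, so consecutive frames overlap heavily, and the Nyquist frequency of 8 kHz sits well above the C8 fundamental.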
Each segment passed to the model has shape [T, F], where T = 500 is the number of frames
and F = 249 is the feature dimension. As shown in Figure 1, the model predicts and learns
from four types of targets derived from aligned MIDI data:
• Frame-Wise Pedal Depth: A continuous sequence of length T, with each value normalized to
the range [0, 1] by dividing the MIDI CC64 values (range 0–127) by 127.
• Frame-Wise Pedal Onset: A binary sequence of length T, where a value of 1 indicates a
pedal activation event (pedal pressed down) in the current frame.
• Frame-Wise Pedal Offset: A binary sequence of length T, where a value of 1 indicates a
pedal release event (pedal lifted) in the current frame.
• Global Pedal Depth: A single scalar representing the average pedal level within the
segment. It summarizes the overall pedaling behavior and provides a global supervision
target.
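The four targets can be sketched from a frame-aligned CC64 sequence as follows. This is a minimal illustration rather than the authors' preprocessing: the function name `derive_targets` is hypothetical, the on/off boundary of 64 is an assumption borrowed from the 63/64 split used for binary labels in prior work, and the soft labels applied around onset and offset events are omitted.

```python
from typing import Dict, List

PEDAL_ON_THRESHOLD = 64  # assumption: CC64 >= 64 counts as "pedal down"


def derive_targets(cc64: List[int]) -> Dict[str, object]:
    """Build the four training targets from one clip of frame-aligned CC64 values."""
    depth = [value / 127 for value in cc64]             # normalized to [0, 1]
    down = [value >= PEDAL_ON_THRESHOLD for value in cc64]

    onset = [0] * len(cc64)
    offset = [0] * len(cc64)
    for t in range(1, len(cc64)):
        if down[t] and not down[t - 1]:
            onset[t] = 1                                # rising edge: pedal pressed
        elif down[t - 1] and not down[t]:
            offset[t] = 1                               # falling edge: pedal released

    global_depth = sum(depth) / len(depth)              # average level over the clip
    return {
        "depth": depth,
        "onset": onset,
        "offset": offset,
        "global_depth": global_depth,
    }


# Toy clip of 8 frames (real clips have 500 frames).
targets = derive_targets([0, 0, 90, 127, 127, 40, 0, 0])
print(targets["onset"], targets["offset"], round(targets["global_depth"], 4))
```

Note that the frame with CC64 = 40 still has a non-zero depth target even though the thresholded state is already "off", which is exactly the information a binary label discards.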
We follow the event-based definition of pedal onsets and offsets in [6], where onsets are
triggered by a rising edge in the CC64 curve and offsets by a falling edge. To improve
learning and reduce sensitivity to label noise, we adopt the same strategy as [6] by
generating soft labels around the onset and offset events, shown in Figure 1. This
approach improves convergence during training and enables the model to better localize
pedal change boundaries. This representation design allows the model to capture both the
continuous control of the sustain pedal and the discrete pedal changes.

[Figure 1: block diagram. Input (mel spectrogram + MFCC, frames × features) → CNN block ×3
(Conv2D, batch norm, ReLU, dropout) → positional encoding → Transformer encoder layer ×8 →
MLP heads for frame-wise pedal depth, onset, and offset; temporal pooling + MLP for global
pedal depth.]
Figure 1. Overview of the model architecture. Input features are first processed by a
convolutional block, followed by a Transformer Encoder that captures temporal
dependencies. Four prediction heads (MLPs) output frame-wise pedal depth, onset, offset,
and a global pedal depth value.

¹ librosa 0.10.1, https://doi.org/10.5281/zenodo.8252662.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
3.2 Model Architecture
The overall model architecture is illustrated in Figure 1. We follow a common design used
in temporal prediction tasks such as beat tracking [15–17], consisting of a convolutional
block, a Transformer encoder, and multiple prediction layers. The convolutional block
includes three 2D convolutional layers with batch normalization, ReLU activation, and max
pooling along the frequency axis, which compress the spectral information and extract
high-level representations from the log-mel input. The resulting sequence is then fed into
an 8-layer Transformer encoder with a hidden size of 256, 8 attention heads, and a
feed-forward dimension of 1024. The Transformer models temporal dependencies across
frames. On top of the encoder, we apply four separate MLPs to predict frame-wise pedal
depth, pedal onset, pedal offset, and a global pedal depth value. The frame-wise outputs
have the same length as the input sequence (500 frames), while the global prediction is
computed by applying mean pooling over the encoded sequence. A dropout rate of 0.15 is
used during training.
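As a rough plausibility check on the encoder size, the stated hyperparameters can be turned into a parameter count. The sketch below assumes a standard encoder layer layout (query, key, value, and output projections with biases, a two-layer feed-forward network, and two layer normalizations); the paper does not spell out these details, so the figures are an estimate, and `encoder_layer_params` is a hypothetical helper.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Trainable parameters in one standard Transformer encoder layer (assumed layout)."""
    attention = 4 * d_model * d_model + 4 * d_model      # Q, K, V, output projections + biases
    feed_forward = 2 * d_model * d_ff + d_ff + d_model   # two linear layers + biases
    layer_norms = 2 * 2 * d_model                        # scale and shift for two norms
    return attention + feed_forward + layer_norms


D_MODEL, N_HEADS, D_FF, N_LAYERS = 256, 8, 1024, 8

head_dim = D_MODEL // N_HEADS            # width of each attention head
per_layer = encoder_layer_params(D_MODEL, D_FF)
encoder_total = N_LAYERS * per_layer

print(head_dim, per_layer, encoder_total)
```

Under these assumptions the encoder alone accounts for roughly 6.3 million parameters, so most of the model's capacity sits in the Transformer rather than in the convolutional block or the MLP heads. The number of attention heads changes the per-head width but not the parameter count.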
3.3 Loss Functions
Our model is trained using a multi-task objective that combines continuous regression and
binary classification losses. This design reflects the structure of the prediction
targets, which include both a continuous-valued pedal depth curve and discrete pedal
change events. The total loss function is defined as:
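$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{pedal}} + \lambda_2 \mathcal{L}_{\text{global}} + \lambda_3 \mathcal{L}_{\text{onset}} + \lambda_4 \mathcal{L}_{\text{offset}} \tag{1}$$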
where $\mathcal{L}_{\text{pedal}}$ is the frame-wise mean squared error (MSE) between the
predicted and ground truth continuous pedal depth, $\mathcal{L}_{\text{global}}$ is the
MSE loss for predicting the global average pedal depth across the segment, and
$\mathcal{L}_{\text{onset}}$ and $\mathcal{L}_{\text{offset}}$ are binary cross-entropy
(BCE) losses applied to the predicted onset and offset event sequences, respectively. MSE
is chosen for $\mathcal{L}_{\text{pedal}}$ and $\mathcal{L}_{\text{global}}$ because the
prediction targets are continuous values, and minimizing squared error encourages the
model to match the fine-grained temporal structure of pedaling gestures. In contrast, BCE
is used for $\mathcal{L}_{\text{onset}}$ and $\mathcal{L}_{\text{offset}}$ because they
represent binary event classification tasks at the frame level. This multi-objective setup
allows the model to simultaneously learn both low-level control signals (pedal depth
curves) and high-level timing events (onset and offset boundaries). The scalar weights
$\lambda_1$ through $\lambda_4$ are manually tuned to balance the contribution of each
objective.
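The weighted objective in Equation 1 can be illustrated with a dependency-free sketch. It is not the training code: the function names are hypothetical, the losses are averaged over a single clip rather than a batch, and the default weights are the values reported later in the training setup (0.6, 0.2, 0.1, 0.1).

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over one sequence."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(prob: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; targets may be soft labels in [0, 1]."""
    total = 0.0
    for p, t in zip(prob, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(prob)


def total_loss(depth_pred, depth_true, global_pred, global_true,
               onset_prob, onset_true, offset_prob, offset_true,
               weights=(0.6, 0.2, 0.1, 0.1)) -> float:
    """Weighted sum of the four objectives, mirroring Equation 1."""
    w_pedal, w_global, w_onset, w_offset = weights
    return (w_pedal * mse(depth_pred, depth_true)
            + w_global * mse([global_pred], [global_true])
            + w_onset * bce(onset_prob, onset_true)
            + w_offset * bce(offset_prob, offset_true))


# Toy two-frame clip with deliberately uninformative predictions.
loss = total_loss([0.5, 0.5], [0.0, 1.0], 0.5, 0.5,
                  [0.5, 0.5], [0, 1], [0.5, 0.5], [0, 1])
print(round(loss, 6))
```

With these weights the frame-wise depth term dominates the objective, so the event heads act as auxiliary supervision rather than as the primary training signal.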
3.4 Dataset Synthesis Across Environments
To explore how the same pedaling (and playing) might sound given different acoustic
conditions, we used MAESTRO v3 [9] and synthesized its recorded MIDI performances using
Pianoteq 8 Stage.² We selected four combinations of varying reverb parameters to simulate
distinct acoustic environments. The specific configurations are summarized in Table 1.
These parameters are mostly default room presets in Pianoteq. We output audio with the
standard stereo microphone setup at 44.1 kHz, 16-bit.
4. EXPERIMENTS
This section evaluates our model’s performance in pedal depth estimation and compares it
with baseline models. We show that our model has advantages in both quantitative and
qualitative analysis.
4.1 Training
Our model contains approximately 7.2 million parameters. Training is conducted on a single
Tesla V100-SXM2-32GB GPU with a batch size of 16. We use the AdamW optimizer [18] with a
learning rate of $5 \times 10^{-4}$. Each training epoch takes around 5 hours and 45
minutes to complete. We trained the model for a total of 48 hours, observing the best
checkpoint near 192,000 steps in epoch 9, around which the metrics appeared to indicate
convergence. The total loss in Equation 1 is optimized with the following weights:
$\lambda_1 = 0.6$ for frame-wise pedal depth, $\lambda_2 = 0.2$ for global pedal depth,
$\lambda_3 = 0.1$ for pedal onset, and $\lambda_4 = 0.1$ for pedal offset.

² Pianoteq is a complete physical modeling software developed at the Institute of
Mathematics of Toulouse, France. https://www.modartt.com/

#  Room Name     Mix    Dur   Size  PreD   D.Mix  D.Amt  D.FB  Piano Model
1  Dry Room      -      -     -     -      -      -      -     NY Steinway D Classical
2  Clean Studio  +10dB  0.4s  12m   0s     6%     60ms   0%    NY Steinway D Classical
3  Concert Hall  +50dB  4s    50m   0.01s  15%    60ms   5%    NY Steinway D Classical
4  Church        +10dB  2.5s  18m   0s     25%    30ms   0%    Bösendorfer 280VC Classical
Table 1. Summary of room acoustic settings and piano models used for synthesized audio.
Each room configuration specifies the reverb mix level (Mix), reverb duration (Dur), room
size (Size), pre-delay (PreD), delay mix (D.Mix), delay amount (D.Amt), delay feedback
(D.FB), and the piano model used for synthesis. No reverberation is applied in the dry
room. In the rest of this paper, we refer to these rooms using their index (e.g., Room 1
for Dry Room).

                  Binary                    4-Class
Model             P↑      R↑      F1↑      P↑      R↑      F1↑      MSE↓    MAE↓
Kong et al. [6]   0.9374  0.9374  0.9373   0.4752  0.6786  0.5588   0.0762  0.1579
Yan and Duan [8]  0.9446  0.9445  0.9443   0.4790  0.6836  0.5631   0.0710  0.1528
Ours (BCE)        0.8907  0.8911  0.8906   0.6039  0.6507  0.6180   0.0584  0.1537
Ours              0.8975  0.8971  0.8973   0.6849  0.6971  0.6863   0.0425  0.1339
Table 2. Comparison of model performance on binary and 4-class classification, along with
regression metrics. Baselines support only binary classification, but we computed their
4-class scores for comparison. Our model demonstrates lower MSE and MAE, along with
significantly higher 4-class scores. Precision, recall, and F1 scores remain consistent
across the 4-class results. This is achieved by predicting continuous pedal depth values
beyond binary classification.
4.2 Baselines
We use the models from [6, 8] as our baselines, as they are, to the best of our knowledge,
the state-of-the-art piano transcription models that include sustain pedal detection. Both
models take log-mel spectrograms as input. Kong et al. [6] trained a separate CRNN model
for sustain pedal detection using the weighted sum of binary cross-entropy (BCE) losses of
sustain pedal onset, offset and value, while Yan and Duan [8] put sustain pedal events
along with note events and performed transcription as a whole using a Transformer Encoder
and semi-CRF.
We also include an additional version of our model, denoted as “Ours (BCE)” in Tables 2
and 3, which is designed for comparison and ablation purposes. It uses the same
architecture as our main model but adopts BCE loss for all four objectives.
4.3 Quantitative Evaluation
Comparing our model with baseline methods poses a challenge because the baselines treat
sustain pedal prediction as a binary classification task and are trained using binary
cross-entropy loss. In contrast, our model is explicitly designed to estimate a continuous
pedal depth curve by regressing values within the range [0, 1]. We acknowledge that this
fundamental difference in approach limits direct comparability between methods.
To enable fair comparison, we first discretize our continuous predictions to match the
frame-wise binary labels used by the baselines. We also compute frame-wise mean squared
error (MSE) and mean absolute error (MAE) for the baselines, treating their binary outputs
as approximations of continuous values, though we recognize this may not reflect their
intended use case. Finally, to assess the resolution of continuous predictions, we
introduce an accuracy-with-tolerance metric that evaluates prediction correctness under
varying error thresholds. The evaluation metrics are summarized as follows:
• Frame-Wise F1 Score: Evaluates classification performance by discretizing predictions
into binary and multi-class labels. While baseline models are trained only for binary
output, we compute multi-class results for all models.
• MSE and MAE: Measure the accuracy of predicted vs. ground truth continuous pedal values.
• Accuracy with Tolerance: Evaluates the model’s precision and resolution along the
continuous pedal depth axis. A prediction is considered correct if its absolute error
falls within a specified threshold.
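The metrics listed above can be sketched for a single sequence of frame-wise depths. This is an illustrative reading rather than the evaluation code: all function names are hypothetical, the binary boundary uses the 63/64 CC64 split mapped onto [0, 1], and the 4-class labels assume four equal-width bins, which the text does not specify.

```python
from typing import List, Sequence

BINARY_THRESHOLD = 64 / 127  # CC64 split at 63/64, expressed on the [0, 1] scale


def to_binary(depth: Sequence[float]) -> List[int]:
    return [1 if d >= BINARY_THRESHOLD else 0 for d in depth]


def to_four_class(depth: Sequence[float]) -> List[int]:
    # Assumption: four equal-width bins over [0, 1]; a depth of 1.0 falls in the top bin.
    return [min(int(d * 4), 3) for d in depth]


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def mean_squared_error(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def accuracy_with_tolerance(pred: Sequence[float], true: Sequence[float], tol: float) -> float:
    """Fraction of frames whose absolute error is at most `tol`."""
    return sum(abs(p - t) <= tol for p, t in zip(pred, true)) / len(pred)


def binary_f1(pred_labels: Sequence[int], true_labels: Sequence[int]) -> float:
    tp = sum(p == 1 and t == 1 for p, t in zip(pred_labels, true_labels))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred_labels, true_labels))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred_labels, true_labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


pred = [0.1, 0.6, 0.9, 0.3]   # toy predictions
true = [0.0, 0.5, 1.0, 0.5]   # toy ground truth
print(to_binary(pred), to_four_class(pred))
print(mean_absolute_error(pred, true), accuracy_with_tolerance(pred, true, 0.15))
```

In the toy example the half-pedal frames at 0.5 are labelled "off" by the binary split, so a prediction of 0.6 counts as a false positive despite being only 0.1 away from the true depth, which is the kind of mismatch the continuous metrics are meant to expose.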
The quantitative results in Table 2 show that our model performs comparably to the
baselines in binary classification, while achieving better results in a continuous pedal
prediction scenario with lower MSE and MAE. Furthermore, in piano pedagogy and performance
studies, the basic discrete categorization of sustain pedal depth usually involves
quarter, half, three-quarters, and full pedal as described in [19] and [20]. When
evaluated on this musically meaningful 4-class classification task, our model achieves a
0.6863 F1 score while both baselines score around 0.56, suggesting potential for our
continuous approach.
Figure 2. Balanced accuracy and simple average accuracy of our model compared to baselines
(Kong et al. [6]; Yan and Duan [8]) across thresholds ranging from 0.01 to 0.4.

We introduce balanced accuracy with tolerance to evaluate model performance at different
resolution levels while accounting for the heavy data imbalance toward extreme pedal
values, common in piano playing. This metric is particularly useful in real-world
scenarios when small prediction errors are acceptable. We calculate the balanced accuracy
with tolerance as

$$A_{\text{weighted}} = \frac{\sum_{i=0}^{N-1} w_i \cdot a_i}{\sum_{i=0}^{N-1} w_i}$$

where $a_i$ is the accuracy with tolerance within bin $i$ and the weight $w_i$ is
determined by the inverse square root of the frequency of each bin, normalized by the
total sum of inverse square root frequency of all bins. We set the number of frequency
bins as $N = 128$, matching the discrete 0–127 ground truth CC64 values. As in Figure 2,
our model achieves higher balanced accuracy than the baselines, starting from tight
thresholds below 0.05 and improving further with looser tolerances. Simple accuracy is
more sensitive to imbalance, but our model surpasses the baselines from threshold 0.15
onward, indicating that most prediction errors are relatively small.
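A minimal sketch of the balanced accuracy with tolerance is given below, assuming that each bin corresponds to one ground truth CC64 value, that the per-bin accuracy is the fraction of that bin's frames predicted within the tolerance, and that bins with no frames are skipped. The function name and the handling of empty bins are assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Sequence


def balanced_accuracy_with_tolerance(pred: Sequence[float],
                                     true_cc64: Sequence[int],
                                     tol: float,
                                     n_bins: int = 128) -> float:
    """Weighted mean of per-bin accuracies, with weights 1 / sqrt(bin frequency)."""
    counts = Counter(true_cc64)   # frames per ground truth CC64 value
    hits = Counter()              # frames within tolerance, per CC64 value
    for p, value in zip(pred, true_cc64):
        if abs(p - value / 127) <= tol:
            hits[value] += 1

    numerator = 0.0
    denominator = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue              # assumption: empty bins carry no weight
        weight = 1 / math.sqrt(counts[b])
        accuracy = hits[b] / counts[b]
        numerator += weight * accuracy
        denominator += weight
    return numerator / denominator


# Toy data: four "pedal up" frames and one "full pedal" frame.
truth = [0, 0, 0, 0, 127]
preds = [0.0, 0.0, 0.0, 0.5, 0.5]
balanced = balanced_accuracy_with_tolerance(preds, truth, tol=0.05)
simple = sum(abs(p - v / 127) <= 0.05 for p, v in zip(preds, truth)) / len(truth)
print(balanced, simple)
```

Dividing by the sum of the weights is equivalent to normalizing them first. In the toy data the rare full-pedal frame is missed, which pulls the balanced score well below the simple accuracy, illustrating why the two curves in Figure 2 can diverge.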
4.4 Qualitative Analysis
In this section, we present example segments from the test set featuring various
expressive sustain pedal gestures, shown in Figure 3, including gradual half-pedaling,
flutter-pedaling, and rapid pedal changes within very short durations. These examples are
common and representative in the classical piano repertoire.
Example (a) shows legato pedaling with rapid full pedal changes, where the baseline either
misses changes or predicts excessive oscillations. Example (b) is a half-pedaling instance
where a binary classification algorithm fails to capture the intermediate pedal depth. In
the flutter-pedaling region in example (c), our model identifies repeated partial presses
and releases, whereas the baseline reduces these expressive variations to abrupt binary
changes. These examples indicate that our method can model real pedaling behavior more
effectively and generate musically meaningful pedal curves, providing a more faithful
representation of pianists’ precise control over the pedal depth compared to previous
binary prediction methods.
5. PEDAL PREDICTION ROBUSTNESS
We observe that both baseline models, as well as our own, struggle to generalize from real
recordings to synthetic data. To address this, we conduct a leave-one-out experiment
showing that training on a more diverse set of room conditions improves our model’s
ability to generalize to unseen acoustic environments. Further analysis supports our claim
that current sustain pedal detection models are not robust to changes in room acoustics;
moreover, their predictions are significantly influenced by reverberation, revealing a
strong sensitivity to acoustic environments.
5.1 Robustness Analysis
To evaluate the model’s robustness to acoustic variations, we follow the original MAESTRO
dataset split for our synthetic data. If a piece belongs to the training set, all its
synthetic versions are also in training. This ensures that the model never encounters the
musical content of test data during training regardless of room conditions. We first
evaluate two baselines and our model on synthetic test sets and report mean absolute error
(MAE) in Table 3. Although all models perform similarly on in-domain data, the baselines
degrade notably on out-of-domain (OOD) sets, indicating limited generalization. Our model
shows the same trend. To investigate further, we conduct leave-one-room-out experiments by
training on various combinations of synthetic data. As shown in Table 4, the model
performs worst on any room it was not trained on, confirming the importance of acoustic
diversity for model generalization.
                  Test Data
Model             O       R1      R2      R3      R4
Kong et al. [6]   0.1579  0.1773  0.1780  0.2458  0.2247
Yan and Duan [8]  0.1528  0.2217  0.2195  0.2449  0.2365
Ours (BCE)        0.1537  0.1654  0.1681  0.2091  0.1894
Ours              0.1339  0.1889  0.1841  0.2039  0.1905
Table 3. Mean absolute error (MAE) ↓ of the baseline models and ours tested on the
original MAESTRO recordings (O) and synthetic audio with settings R1, R2, R3, and R4,
respectively.
               Test Data
Training Data  R1        R2        R3        R4
R2+R3          [0.1125]  0.1119    0.1453    [0.1654]
R1+R3          0.1107    [0.1145]  0.1433    [0.1713]
R1+R2          0.0979    0.0981    [0.1863]  [0.1779]
R1+R2+R3       0.1062    0.1065    0.1439    [0.1619]
Table 4. MAE ↓ of our model trained with original MAESTRO audio and synthetic audio with
settings (rooms) R2+R3, R1+R3, R1+R2, and R1+R2+R3, respectively, tested on rooms R1, R2,
R3, and R4. R4 is never included in any training configuration and serves as a held-out
condition for evaluating generalization to unseen acoustic environments. Out-of-domain
(OOD) results are boxed (shown here in square brackets).
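The leave-one-room-out reading of Table 4 can be checked mechanically. The sketch below copies the MAE values from the table and tests, for each of R1 to R3, whether the training configuration with the highest error is the one that excluded that room; the names `TABLE4`, `is_ood`, `worst_config`, and `best_config` are illustrative.

```python
ROOMS = ["R1", "R2", "R3", "R4"]

# MAE values copied from Table 4, keyed by the synthetic rooms used in training.
TABLE4 = {
    ("R2", "R3"): [0.1125, 0.1119, 0.1453, 0.1654],
    ("R1", "R3"): [0.1107, 0.1145, 0.1433, 0.1713],
    ("R1", "R2"): [0.0979, 0.0981, 0.1863, 0.1779],
    ("R1", "R2", "R3"): [0.1062, 0.1065, 0.1439, 0.1619],
}


def is_ood(train_rooms, test_room) -> bool:
    """A test room is out-of-domain if it was not rendered for training."""
    return test_room not in train_rooms


def worst_config(test_room):
    """Training configuration with the highest MAE on the given test room."""
    col = ROOMS.index(test_room)
    return max(TABLE4, key=lambda cfg: TABLE4[cfg][col])


def best_config(test_room):
    """Training configuration with the lowest MAE on the given test room."""
    col = ROOMS.index(test_room)
    return min(TABLE4, key=lambda cfg: TABLE4[cfg][col])


for room in ["R1", "R2", "R3"]:
    cfg = worst_config(room)
    print(room, "worst with", "+".join(cfg), "| held out:", is_ood(cfg, room))

print("R4 best with", "+".join(best_config("R4")))
```

For each of R1, R2, and R3, the highest error comes from the configuration that held that room out, and the always-unseen R4 is handled best by the configuration trained on all three synthetic rooms.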
5.2 Effect of Room Acoustics on Pedal Prediction
To further understand how room acoustics influence frame-level pedal prediction, we
analyze the distribution of predicted pedal depth values across four synthetic room
conditions ordered by a non-linear progression of reverb levels. Figure 4 presents violin
plots comparing ground truth pedal values, predictions from our model trained only on real
recordings, and predictions from our model trained on both real and synthetic audio
rendered under R1, R2, and R3. The ground truth distributions remain stable across rooms,
with consistent ranges, medians, and symmetric shapes, which is unsurprising since all
samples are synthesized from the same MIDI CC64 values. In contrast, the real-data-only
model shows a noticeable upward shift in both median and distribution shape as
reverberation increases, suggesting that the model interprets increasing reverberant
energy as more use of the sustain pedal. The mixed-data model (right plot) produces
distributions that more closely match the ground truth, with mean values and overall
shapes remaining consistent across rooms and no clear upward trend as reverb increases.

[Figure 3: three panels, (a) Rapid Change, (b) Half Pedal, (c) Fluttering; each shows a
mel-spectrogram (Frequency in Hz) and pedal depth curves over Time (seconds).]
Figure 3. Comparison of our predicted pedal curves and the baseline binary model [6]
against ground truth pedal depth values, plotted alongside aligned mel-spectrograms and
their respective scores: (a) F. Chopin (1810–1849) Polonaise-Fantaisie Op. 61, mm. 6–7;
(b) F. Schubert (1797–1828) Impromptu in F Minor, Op. 142 No. 4, mm. 45–46; and (c) J. S.
Bach (1685–1750) Prelude and Fugue in E Minor, BWV 855, mm. 11–12.

[Figure 4: three panels, (a) Ground Truth, (b) Predictions, Model “real”, (c) Predictions,
Model “mix”.]
Figure 4. Violin plots showing the distribution of predicted pedal depth values (b and c)
and ground truth (a) across four room settings: Dry Room (R1), Clean Studio (R2), Church
(R4), and Concert Hall (R3). (b) is our model trained solely on real data, while (c)
learns from real data + training data from R1, R2, and R3. Orange dashed lines indicate
the mean pedal depth across rooms, and pink dashed lines mark the median.
These findings reveal that generalization is a key challenge in sustain pedal detection.
Models trained on a single acoustic setting tend to produce biased predictions when tested
in new environments. Our analysis confirms that room acoustics can significantly affect
model behavior. However, we also show that incorporating training data from multiple
acoustic settings helps mitigate this issue, suggesting that acoustic diversity is a
promising direction for improving model robustness.
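One simple way to quantify the overestimation bias discussed in this section is the mean signed error (prediction minus ground truth) per room, where a positive value indicates that the model predicts more pedal than was played. The sketch below is only an illustration of that measure: the paper does not state which statistic it uses, and the numbers are toy values, not results from the experiments.

```python
from statistics import mean
from typing import Dict, Sequence


def mean_signed_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average of (prediction - ground truth); positive means overestimation."""
    return mean(p - t for p, t in zip(pred, true))


# Toy values for illustration only; the same ground truth is rendered in each room.
ground_truth = [0.0, 0.5, 1.0, 0.2]
predictions_by_room: Dict[str, Sequence[float]] = {
    "dry": [0.0, 0.5, 0.9, 0.2],
    "reverberant": [0.1, 0.7, 1.0, 0.4],
}

bias = {room: mean_signed_error(pred, ground_truth)
        for room, pred in predictions_by_room.items()}
for room, value in bias.items():
    print(f"{room}: {value:+.3f}")
```

Because the ground truth is identical across rooms, any systematic change in this signed error between rooms can be attributed to the acoustic rendering rather than to the musical content.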
6. CONCLUSION
This paper presents a high-resolution sustain pedal estimation model capable of predicting
continuous pedal depth values beyond binary on/off states.³ Through quantitative and
qualitative evaluations, we show that our model enables more detailed analysis of
expressive piano performance by capturing finer-grained continuous pedaling depth values,
while maintaining performance comparable to SOTA baselines on previous binary detection
tasks. We acknowledge the inherent limitations when comparing continuous and binary
classification approaches. We further evaluate our model’s robustness across varying
acoustic conditions using “leave-one-out” tests with different acoustic settings.
Additionally, the robustness analysis reveals that reverberation systematically increases
prediction error and introduces an overestimation bias in pedal depth value predictions.
These findings highlight the importance of modeling continuous pedal depth values and
accounting for acoustic variability in sustain pedal detection tasks.
³ Available at https://github.com/kunfang98927/PedalDetection.
7. REFERENCES
[1] W. Gieseking and K. Leimer, Piano Technique. Courier Corporation, 2013.
[2] H. Neuhaus, The Art of Piano Playing. Kahn and Averill, 2008.
[3] H.-M. Lehtonen, H. Penttinen, J. Rauhala, and V. Välimäki, “Analysis and modeling of
piano sustain-pedal effects,” The Journal of the Acoustical Society of America, vol. 122,
no. 3, pp. 1787–1797, September 2007.
[4] B. Liang, G. Fazekas, and M. B. Sandler, “Transfer learning for piano sustain-pedal
detection,” in 2019 International Joint Conference on Neural Networks (IJCNN). Budapest,
Hungary: IEEE, 2019, pp. 1–6.
[5] ——, “Piano sustain-pedal detection using convolutional neural networks,” in
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), Brighton, United Kingdom, 2019, pp. 241–245.
[6] Q. Kong, B. Li, X. Song, Y. Wan, and Y. Wang, “High-resolution piano transcription
with pedals by regressing onset and offset times,” IEEE/ACM Transactions on Audio, Speech
and Language Processing, vol. 29, pp. 3707–3717, October 2021.
[7] Y. Yan, F. Cwitkowitz, and Z. Duan, “Skipping the frame-level: Event-based piano
transcription with neural semi-CRFs,” in Advances in Neural Information Processing
Systems, vol. 34, Virtual, 2021, pp. 20583–20595.
[8] Y. Yan and Z. Duan, “Scoring time intervals using non-hierarchical transformer for
automatic piano transcription,” in Proceedings of the International Society for Music
Information Retrieval Conference (ISMIR), San Francisco, United States, 2024.
[9] C. Hawthorne, A. Stasyuk, A. Roberts, I. Simon, C.-Z. A. Huang, S. Dieleman, E. Elsen,
J. Engel, and D. Eck, “Enabling factorized piano music modeling and generation with the
MAESTRO dataset,” in Proceedings of the 7th International Conference on Learning
Representations, New Orleans, Louisiana, United States, 2019.
[10] H.-M. Lehtonen, A. Askenfelt, and V. Välimäki, “Analysis of the part-pedaling effect
in the piano,” The Journal of the Acoustical Society of America, vol. 126, no. 2,
pp. EL49–EL54, July 2009.
[11] B. Liang, G. Fazekas, M. B. Sandler, and A. McPherson, “Piano pedaller: A measurement
system for classification and visualisation of piano pedalling techniques,” in
Proceedings of The International Conference on New Interfaces for Musical Expression
(NIME), Copenhagen, Denmark, 2017, pp. 325–329.
[12] B. Liang, G. Fazekas, and M. B. Sandler, “Detection of piano pedaling techniques on
the sustain pedal,” in Proceedings of Audio Engineering Society Convention 143, New York,
United States, 2017.
[13] ——, “Measurement, recognition, and visualization of piano pedaling gestures and
techniques,” Journal of the Audio Engineering Society, vol. 66, no. 6, pp. 448–456,
June 2018.
[14] ——, “Piano legato-pedal onset detection based on a sympathetic resonance measure,” in
Proceedings of the 26th European Signal Processing Conference (EUSIPCO), Rome, Italy,
2018.
[15] J. Zhao, G. Xia, and Y. Wang, “Beat transformer: Demixed beat and downbeat tracking
with dilated self-attention,” in Proceedings of the International Society for Music
Information Retrieval Conference (ISMIR), Bengaluru, India, 2022.
[16] Y. Hung, J. Wang, X. Song, W. T. Lu, and M. Won, “Modeling beats and downbeats with a
time-frequency transformer,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 401–405.
[17] F. Foscarin, J. Schlüter, and G. Widmer, “Beat this! accurate beat tracking without
dbn postprocessing,” in Proceedings of the International Society for Music Information
Retrieval Conference (ISMIR), San Francisco, United States, 2024.
[18] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proceedings
of the International Conference on Learning Representations (ICLR), New Orleans,
Louisiana, United States, 2019. [Online]. Available:
https://openreview.net/forum?id=BkgQ1j09K7
[19] K. U. Schnabel, Modern Technique of the Pedal: A Piano Pedal Study. New York, United
States: Mills Music, 1954.
[20] J. Banowetz, The Pianist’s Guide to Pedaling. Bloomington, IN: Indiana University
Press, 1985.