
Modeling the Difficulty of Saxophone Music

Author: Šimon Libřický; Jan Hajič jr.
Publisher: Zenodo
DOI: 10.5281/zenodo.17706579
Source: https://zenodo.org/records/17706579/files/000087.pdf
MODELING THE DIFFICULTY OF SAXOPHONE MUSIC
Šimon Libřický, Jan Hajič jr.
Institute of Formal and Applied Linguistics, Charles University
[email protected], [email protected]
ABSTRACT
In learning music, difficulty is an important factor in the choice of repertoire, choice of tempo, and structure of practice. These choices are typically made with the guidance of a teacher; however, not all learners have access to one. While piano and strings have had some attention devoted to automated difficulty estimation, wind instruments have so far been under-served. In this paper, we propose a method for estimating the difficulty of pieces for winds and implement it for the tenor saxophone. We take the cost-of-traversal approach, modelling the part as a sequence of transitions, i.e., note pairs. We estimate transition costs from newly collected recordings of trill speeds, comparing representations of saxophone fingerings at various levels of expert input. We then compute and visualise the cost of the optimal path through the part, at a given tempo. While we present this model for the tenor saxophone, the same pipeline can be applied to other woodwinds, and our experiments show that with appropriate feature design, only a small proportion of possible trills is needed to estimate the costs well. Thus, we present a practical way of diversifying the capabilities of MIR in music education to the wind family of instruments.
1. INTRODUCTION AND SYSTEM OVERVIEW
One essential element of studying music is choosing appropriate repertoire. Out of the factors that make a piece appropriate for a learner, its difficulty is among the most important: too much of a challenge, or not enough of one, can seriously damage the learning process [1,2]. A major factor governing the difficulty of executing an instrumental part is also its target tempo, which must be determined. In addition to estimating the difficulty of entire pieces, an overview of which parts of a piece are difficult (and hence will likely require the most practice) might be useful for any learner. While teachers are of course proficient in assigning appropriate repertoire, tempo, and practice plans, not every learner has access to a qualified teacher. Hence, there is cause to try to model the difficulty of a piece and its parts automatically.
© Š. Libřický and J. Hajič jr. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Š. Libřický and J. Hajič jr., “Modeling the Difficulty of Saxophone Music”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
Just as there are under-served students, there are under-served instruments. The topic of difficulty estimation, together with fingering estimation, has been explored for the piano [3,4] and to some extent also for string instruments [5,6]; not, however, for wind instruments.
In this paper, we propose a model of difficulty for the tenor saxophone. We take the “optimum path” paradigm [6], where control of musical instruments is fundamentally structured into “play states” or “play actions” meant to sound the desired tone at the desired time. Playing a part can then be modelled as traversing a path through the corresponding play states in time, and the difficulty of a part is the cost of this traversal.
In order for this model to generalise to unseen parts, we can factorise the cost of a path into the aggregated costs of sub-sequences for which we have cost estimates, similarly to how n-gram language models truncate history. These sub-sequences can be as short as individual notes (unigrams); however, playing one note with no context is nearly meaningless. Together with [6], we settle on transitions between two fingerings as the units for which cost is defined.¹ We retain at least some meaningful musical context, while keeping the state space manageable (approx. 750 possible fingering pairs on the tenor saxophone). Traversal through a woodwind part is then the traversal through a finite-state automaton with tones as nodes and transition costs on the edges, which is easily tractable.
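As a worked form of this factorisation, here is a sketch that assumes the “aggregated” cost is a plain sum of bigram costs (the text does not fix the aggregation); f_t is the fingering used for the t-th of T notes, and c is a transition-cost function introduced here for illustration:

```latex
C(f_1, \dots, f_T) \;\approx\; \sum_{t=2}^{T} c\left(f_{t-1}, f_t\right)
```

This is the same move a bigram language model makes when it approximates \log P(w_1, \dots, w_T) by \sum_t \log P(w_t \mid w_{t-1}).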
Two things thus must be done: defining the play states for the saxophone, and estimating transition costs. For the saxophone, a play state that produces a desired tone is a combination of fingering, breath, muscle voicing, and articulation. Breath, however, is hard to observe and formalise, and in any case is not entirely independent from fingering, though some characterisation of breath parameters from woodwind audio has been done [7, 8]. Articulation and muscle voicing for saxophones have been studied [9,10], but not to a level that they can be easily modelled computationally. Hence, we focus on encoding play states via fingering.
The final missing piece of the puzzle is then to estimate the costs of transitions. This is, however, non-trivial for the saxophone. For instance, the major second from (written) C to D on a tenor saxophone varies wildly in breath support, voicing requirements, and motor movements in the three octaves where this interval appears. Thus, in order to estimate the transition costs, a combination of expert knowledge and real-world data is necessary. Direct data-driven difficulty estimation [11] trains a model on extrinsic annotations of difficulty, instead of intrinsic properties of the notes that make up the performance, requiring more data with difficulty ratings. Instead, we propose creating fingering and difficulty prediction models that rely on data that are easier to acquire, using trill speed as a proxy for fingering transition difficulty, taking inspiration from [12], who evaluated theoretical fingering designs for digital instruments.
¹ There is only a small number of scenarios on a saxophone where a performer would alter fingerings based on anything other than the preceding and following fingerings.
Besides designing and implementing the application of the optimum path difficulty model to the saxophone using trill speeds, which is general enough to apply to any wind instrument without having to adjust important assumptions (such as single-voice parts²), we make three further contributions: we contribute a dataset of tenor saxophone trills and extracted trill speeds for all its playable transitions; we show that expert knowledge in representation design for the inherently limited data leads to more accurate transition cost estimation, as well as to minimising the data acquisition costs for more instrument models; and we provide the implementation of the difficulty model itself, as well as the pipeline for data collection and trill speed extraction.³
2. RELATED WORK
Existing music difficulty estimation methods rely on musician-made annotations of optimal fingerings [3], or on setting (or learning) weights based on expert observations [6]. For woodwinds and brass instruments, however, where large leaps can be accomplished with minimal motor movement in the fingers and arms, the rules-based approach of [6] makes creating expert rules that determine optimal fingerings difficult.
Data-driven difficulty estimation methods for the piano are available, based on symbolic representations [13–15], sheet music images [16], or recently audio [17]; rearrangement by difficulty has also been done [18]. For the violin, optimal fingering estimations exist: the original optimal path approach [6], though focus has been more on player skill level rather than part difficulty, both for fingering estimation [19] and in the context of visual perception of violinists [20]. In [21], symbolic score difficulty was estimated for violin, piano, and guitar.
For the woodwind instrument family, however, focus has been more on their acoustical properties [7,22]. Saxophone acoustics have been modeled [23] and reed parameters have been estimated from audio [8]. The kinetics of fingerings and note transitions were studied for the flute [24]. In [25], recorder audio is classified by typical mistake, but not by difficulty. Sight-reading skills were described for the clarinet (and violin) as early as 1953 [26], and for the flute they have been empirically characterised [27], together with the difference in breath control between beginners and experts [28].
² Multiphonics would rarely apply to likely users of such a system.
³ See Section 12 for a link to the implementation.
Figure 1. An example of a prompt presented to performers. In this case, the performer is meant to trill between Bb4 and B4 (written pitch).
Figure 2. Visualisation of the per-player/session differences in trill speed for transitions that were recorded in every session (“anchor” transitions).
3. DATA COLLECTION
The process of collecting data consists of recording tenor saxophone (henceforth just saxophone) players trilling. The musicians are instructed to trill (1) as fast as possible, (2) as cleanly as possible,⁴ and (3) as legato as possible.⁵ The data was recorded in sessions of approx. 65 trills, each taking an hour to record (including a short break). For each session, a set of music notation prompts (see Figure 1) was typeset for the players to play from, communicating which notes were to be trilled between and which fingerings were to be used for each, along with emphasising the idea of speed.
To avoid player fatigue in a recording session, a very simplified prior estimate of difficulty was made using only the size of the jump and primitive information about how many fingers move, in order to mix likely easy and difficult trills. Each session was subsequently manually split into recordings of individual trills.
From the fingerings chosen, there are 741 possible fingering pairs.⁶ In total, 817 trills were recorded, with 5 different, similarly advanced conservatory-level performers participating for a total of 13 sessions.⁷
⁴ Additional clarification was given to performers to play in such a way as to minimise over/underblowing that would result in hearing notes in the harmonic series of the fundamental note being played.
⁵ On a saxophone, this would mean that the tongue is not used to separate the two notes. For large jumps such as an octave, or when needing to play a low note that is easier to produce when tongued, this instruction was de-emphasised.
⁶ Some of these were recorded but not used in later sections, as they were trills between alternate fingerings of the same note. The trill extraction methodology in Section 4 cannot work with these for now.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
Figure 3. Example of how clusters and trills are detected, visualised on a CREPE [29] salience plot. The dashed yellow line loosely represents the cluster boundary.
Six “anchor” transitions were recorded in every session to explore the nature of trill speed variations across performers and sessions. The speeds of these transitions are shown in Figure 2. Note that Player 5 is among the fastest in 3 of them, and among the slowest in the other 3.
4. TRILL SPEED EXTRACTION WORKFLOW
After recording all necessary trills, an automatic trill speed extraction step is run to extract the highest trill speed from the raw audio of each fingering transition.
We use CREPE [29] to predict f0 from the raw audio. To determine which notes participate in the trill, we perform k-means clustering on the f0 values with k = 2. The MIDI values are then derived from the median value of each cluster using librosa [30], at A = 440 Hz.
The trill speed of each segment was then determined by the number of complete trills within that segment (see Figure 3). A complete trill is defined as starting on note 1, transitioning to note 2, and returning to note 1. In practice, performers vary their speed over each trill. To find the fastest stable trill speed, we take the first half, the middle two quarters, and the second half of each trill (each segment is about 1.5–3 seconds long, sufficient to count as a sustained trill), and take the highest average trill speed from these three segments.
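The segment logic can be sketched as follows. The sketch assumes per-frame note labels (0/1) from the clustering step at 100 frames/s (CREPE's default 10 ms hop). Two choices are illustrative rather than taken from the text: note 1 is the first note heard in a segment, and complete trills are counted as (runs - 1) // 2 after collapsing repeated labels. It also ignores the spurious one-frame runs that a real pipeline would need to smooth out.

```python
def collapse_runs(labels):
    """Collapse consecutive repeated labels: [0, 0, 1, 1, 0] -> [0, 1, 0]."""
    runs = []
    for lab in labels:
        if not runs or runs[-1] != lab:
            runs.append(lab)
    return runs


def complete_trills(labels):
    """Count note1 -> note2 -> note1 cycles, taking note 1 as the first note heard."""
    runs = collapse_runs(labels)
    return max(len(runs) - 1, 0) // 2


def trill_speed(labels, frame_rate=100.0):
    """Highest complete-trill rate (trills/s) over three overlapping segments."""
    n = len(labels)
    # First half, middle two quarters, second half.
    segments = [(0, n // 2), (n // 4, (3 * n) // 4), (n // 2, n)]
    speeds = []
    for start, end in segments:
        seg = labels[start:end]
        if seg:
            speeds.append(complete_trills(seg) / (len(seg) / frame_rate))
    return max(speeds) if speeds else 0.0


# A synthetic trill that changes note every 5 frames (50 ms) for 2 s.
labels = [(i // 5) % 2 for i in range(200)]
```

Counting only complete cycles inside a window slightly underestimates the rate at the edges. The synthetic signal above alternates every 50 ms (10 trills/s), but each 1 s window contains only 9 complete cycles, so `trill_speed(labels)` returns 9.0.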
On more technically difficult transitions at speed, under- and overblowing of the notes happens. To correct for possible misclusterings, where such mistakes are detected as their own cluster,⁸ if one cluster was less than 20% the size of the other, we discarded it and re-ran k-means with k = 2.
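Here is a minimal sketch of this correction. It assumes that the frames in the discarded cluster are dropped before k-means (k = 2) is run again. So that it runs on its own, it repeats a small 1-D k-means. The 0.2 threshold is the 20% from the text.

```python
import statistics


def two_means_1d(values, iterations=50):
    """Minimal 1-D k-means with k = 2; returns (labels, centroids)."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        new_labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                      for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, new_labels) if lab == k]
            if members:
                centroids[k] = statistics.fmean(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def robust_two_notes(f0_hz, min_ratio=0.2):
    """Drop a cluster smaller than min_ratio times the other one, then re-cluster."""
    values = list(f0_hz)
    labels, centroids = two_means_1d(values)
    sizes = [labels.count(0), labels.count(1)]
    small = 0 if sizes[0] < sizes[1] else 1
    if sizes[small] < min_ratio * sizes[1 - small]:
        # Assumed handling: the small cluster (e.g. an overblown partial) is
        # removed and k-means with k = 2 is run again on the remaining frames.
        values = [v for v, lab in zip(values, labels) if lab != small]
        labels, centroids = two_means_1d(values)
    return values, labels, centroids
```

For example, if a Bb/B trill contains a couple of frames that jumped an octave, those frames first form their own tiny cluster and absorb one of the two centroids. After they are dropped, the re-run separates the two intended notes.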
To check for possible errors in trill speed extraction, the detected notes are matched against the expected interval from the recording session. If the notes detected by the trill speed extraction algorithm do not match the expected notes, we checked the trill recording manually to make sure the extracted trill speed does match the recorded trill.
⁷ Alternate fingerings of F#, Bb with/without octave key, and F on F/F# were used. These were settled on following discussion with the participating saxophonists as to which alternate fingerings they used regularly.
⁸ Especially on sub-fifth transitions, where an octave distance from the true notes might “tempt” k-means to put both target notes in one cluster.
Figure 4. A diagram explaining the encoding format as it relates to a fingering chart. Keys 1 (octave key) to 13 are left-hand keys; keys 14–23 are right-hand keys.
5. TRANSITION FEATURES
To estimate costs of unrecorded transitions,⁹ a representation of transitions is needed. Each transition is a pair of play states; we represent play states as fingerings. The saxophone has a finite number of keys (23, see Figure 4) that can be pressed, so we can straightforwardly represent a transition as two binary fingering vectors.¹⁰ However, while this representation does in principle contain all information about a transition, more recorded data points may be needed to learn how the keys interact, i.e., which combinations are difficult and which are easy. Because we want to minimise the amount of recording necessary to reach a given cost estimation accuracy, we design two more representations: one that focuses on the player's fingers rather than the instrument's keys, and expert features that incorporate factors of difficulty from saxophone pedagogy.
In total, three representations are used for transitions (shown in Figure 5):
• “Raw” (abbr. R) features: raw encodings are simply concatenated.
• “Finger” (abbr. F) features: each finger gets a 1 or 0 based on whether it has to move during the transition, along with a penalty if any finger has to move from one pressed key to a different pressed key (presence of a same-finger transition).
• “Expert” features: further divided into “Hand-Based” (E-HB) and “Finger-Based” (E-FB).
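The Raw representation is simple enough to sketch directly. Keys are numbered 1–23 as in Figure 4, with key 1 as the octave key. The text does not list the keys used by each fingering, so the key sets passed in below are hypothetical.

```python
N_KEYS = 23  # keys numbered 1-23 as in Figure 4; key 1 is the octave key


def fingering_vector(pressed_keys):
    """Binary vector of length 23: position i - 1 is 1 if key i is pressed."""
    vec = [0] * N_KEYS
    for key in pressed_keys:
        if not 1 <= key <= N_KEYS:
            raise ValueError(f"unknown key number: {key}")
        vec[key - 1] = 1
    return vec


def raw_features(keys_from, keys_to):
    """Raw transition features: both fingering vectors concatenated (46 values)."""
    return fingering_vector(keys_from) + fingering_vector(keys_to)


# Hypothetical transition: octave key + key 2  ->  High D palm key (13) alone.
example = raw_features({1, 2}, {13})
```

Every pair of keys is a separate input dimension here, which is why learning their interactions from this representation may take more recorded transitions than the more structured feature sets need.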
⁹ Unrecorded transitions occur if we subsample as per Section 7.
¹⁰ For instruments where a large amount of adjustment can be made via the embouchure or other non-finger motor movements that are known for each pitch, these would also be incorporated.
5.1 Finger features
Each combination of covered keys has a sufficiently unique map of which fingers are active, so the “Finger” features can be derived automatically from the “Raw” features.
The only complication is dealing with palm keys (played by the side of the hand or fingers). Two methods of mapping keys to fingers were used for the Finger feature set: a ‘palm-as-a-finger’ (PAF) approach, where palm keys are treated as being played by a ‘sixth finger’ on each hand representing the palm, and a ‘palm-key-to-finger-mapping’ (P2FM) approach, where each palm key is mapped onto the finger that plays it using its side. For example, the High D key (key 13 in Figure 4) is played by the side of the index finger.
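The derivation of Finger features from Raw ones can be sketched as below. The key-to-finger tables are HYPOTHETICAL three-key toys. From the text, only the fact that the High D key (13) is played by the side of the index finger is taken. The exact form of the same-finger penalty flag is also an assumption. The example shows that PAF and P2FM treat the same transition differently.

```python
# HYPOTHETICAL toy tables covering three keys only; the real 23-key mapping is
# not given in the text. Key 13 (High D) is the palm key mentioned above.
PAF = {2: "L-index", 3: "L-middle", 13: "L-palm"}    # palm as a sixth "finger"
P2FM = {2: "L-index", 3: "L-middle", 13: "L-index"}  # palm key -> finger whose side plays it
FINGERS_PAF = ["L-index", "L-middle", "L-palm"]
FINGERS_P2FM = ["L-index", "L-middle"]


def finger_features(keys_from, keys_to, key_to_finger, fingers):
    """One 0/1 'has to move' flag per finger, plus a same-finger penalty flag."""
    moved = []
    same_finger = 0
    for finger in fingers:
        before = {k for k in keys_from if key_to_finger[k] == finger}
        after = {k for k in keys_to if key_to_finger[k] == finger}
        moved.append(int(before != after))
        # Assumed definition: the finger presses a key in both fingerings,
        # but not the same key(s).
        if before and after and before != after:
            same_finger = 1
    return moved + [same_finger]
```

For the hypothetical transition {2} to {13}, PAF gives `[1, 0, 1, 0]`: the index finger lifts and the “palm finger” presses, with no penalty. P2FM gives `[1, 0, 1]`: the index finger slides from key 2 to its side on key 13, which triggers the same-finger flag.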
5.2 Expert features
The “Expert” features, selected in consultation with the recording saxophonists, contain:
• MIDI values of both fingerings (written pitch).
• Number of fingers that have to move from a pressed key to a different pressed position (so-called same-finger transitions). These are big indicators of difficulty, as they are impossible to play legato and are generally cumbersome.¹¹
• Presence of any palm key transition on the left and right hand, respectively.
• Whether or not the octave key has to change state.
• Presence of a fingering that is below low C#. These notes require additional embouchure and air support.
To this shared base, the Hand-Based Expert feature set (E-HB) adds the number of fingers that have to change on the left and right hand, respectively. A transition requiring the left pointer, middle and ring fingers to change states would receive a value of 3 for this feature (for the left hand). The Finger-Based Expert features (E-FB) instead add a feature for every finger, with value 1 if it has to move and 0 otherwise.¹² We also experimented with omitting the MIDI values when training a model (labelled “NoM”).
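The E-HB feature vector can be assembled as in the sketch below. Apart from the feature list above, everything in it is an assumption: the `Fingering` class, the reduced key-to-finger and palm-key maps, the MIDI threshold for “below low C#” (written C#4 = 61), and the optional element-wise expert weights. The Bis-key exception for same-finger transitions is not implemented. The toy fingerings are loosely modelled on Figure 5's E-with-octave-key to high D example, but their key sets are hypothetical.

```python
from dataclasses import dataclass

OCTAVE_KEY = 1     # key 1 is the octave key (Figure 4)
LOW_C_SHARP = 61   # written C#4 as a MIDI number (middle C = 60); assumption


@dataclass(frozen=True)
class Fingering:
    midi: int        # written pitch
    keys: frozenset  # pressed key numbers, 1-23


def expert_hand_based(a, b, key_to_finger, finger_hand, palm_keys,
                      use_midi=True, weights=None):
    """E-HB features for a -> b (NoM: use_midi=False; NoEW: weights=None)."""
    changed = {"L": 0, "R": 0}
    same_finger = 0
    for finger, hand in finger_hand.items():
        before = {k for k in a.keys if key_to_finger[k] == finger}
        after = {k for k in b.keys if key_to_finger[k] == finger}
        if before != after:
            changed[hand] += 1
        if before and after and before != after:
            same_finger += 1
    switched = a.keys ^ b.keys  # keys whose state changes
    features = [a.midi, b.midi] if use_midi else []
    features += [
        same_finger,                                            # same-finger transitions
        int(bool(switched & palm_keys["L"])),                   # left palm key transition
        int(bool(switched & palm_keys["R"])),                   # right palm key transition
        int((OCTAVE_KEY in a.keys) != (OCTAVE_KEY in b.keys)),  # octave key changes state
        int(min(a.midi, b.midi) < LOW_C_SHARP),                 # a fingering below low C#
        changed["L"],                                           # fingers changing, left hand
        changed["R"],                                           # fingers changing, right hand
    ]
    if weights is not None:  # hypothetical expert weights, applied element-wise
        features = [f * w for f, w in zip(features, weights)]
    return features


# HYPOTHETICAL reduced maps, P2FM style (High D key 13 on the side of the index finger).
KEY_TO_FINGER = {1: "L-thumb", 2: "L-index", 3: "L-middle", 13: "L-index"}
FINGER_HAND = {"L-thumb": "L", "L-index": "L", "L-middle": "L", "R-index": "R"}
PALM_KEYS = {"L": frozenset({13}), "R": frozenset()}

toy_e = Fingering(midi=76, keys=frozenset({1, 2, 3}))
toy_high_d = Fingering(midi=86, keys=frozenset({1, 13}))
```

With these toy inputs, the transition yields `[76, 86, 1, 1, 0, 0, 0, 2, 0]`: one same-finger move, a left palm key change, no octave key change, and two left-hand fingers changing.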
In addition, Expert Weights (EW) were chosen to help indicate how severely a given feature is likely to impact the trill speed of a given transition. These weights were manually adjusted until k-means clustering with k = n/5,¹³ where n is the number of transitions, resulted in clusters of approximately the same difficulty, as judged by the expert. We experimented with disabling these as well (labelled “NoEW”).
6. TRANSITION DIFFICULTY ESTIMATION
To estimate the difficulty of transitions (intervals) that are not recorded, we train regression models. This step can significantly lower the cost of parametrising the entire difficulty model, because it can decrease the number of recording sessions needed. There are thus two goals: (1) to obtain a model that is as accurate as possible in predicting transition trill speeds, with (2) as few of the recorded transitions as possible used as training data.
¹¹ The Bis key is ignored for this, as it is a special case of an intended same-finger transition.
¹² This part is almost identical to the Finger-based features.
¹³ Chosen as a number of transitions that can be intuitively compared.
Figure 5. Encoding a transition (E with octave key to high D) with the Raw and Expert feature sets. The Expert feature sets are shown in the version that includes MIDI and with expert weights enabled.
Experiments were conducted to compare the proposed feature sets and two model classes: a linear regression model (LM) and a multi-layer perceptron (MLP), both using sklearn [31]. The MLP had a hidden layer size of 50 and used the lbfgs solver. Both of these models additionally clamped any prediction below 0.5 trills/s to 0.5.¹⁴
A stratified k-fold with shuffle was used to divide the data into folds of 150 test trills, with stratification classes dictated by binning transitions by their trill speed (bins 0–1.5, 1.5–3, 3–4.5, 4.5+).
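Using sklearn, the setup described above might look like the following sketch. From the text, it takes `MLPRegressor(hidden_layer_sizes=(50,), solver="lbfgs")`, a shuffled `StratifiedKFold` over speed bins, and the clamp at 0.5. The following are assumptions: the number of folds (`len(y) // 150`), `max_iter`, the seeds, which bin a value exactly on an edge falls into, and the synthetic data in the test.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPRegressor

SPEED_BIN_EDGES = [1.5, 3.0, 4.5]  # strata 0-1.5, 1.5-3, 3-4.5, 4.5+ trills/s
MIN_SPEED = 0.5                    # predictions below this are clamped to it


def speed_bins(y):
    """Stratum index per transition; a value on an edge goes to the upper bin (assumption)."""
    return np.digitize(y, SPEED_BIN_EDGES)


def cross_validate_mlp(X, y, test_size=150, seed=0):
    """Stratified, shuffled k-fold CV of an MLP trill-speed regressor.

    Returns (mean MAPE over folds, smallest clamped prediction seen).
    """
    n_splits = max(2, len(y) // test_size)  # folds of roughly test_size test trills
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_mape, min_pred = [], float("inf")
    for train_idx, test_idx in skf.split(X, speed_bins(y)):
        model = MLPRegressor(hidden_layer_sizes=(50,), solver="lbfgs",
                             max_iter=1000, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = np.maximum(model.predict(X[test_idx]), MIN_SPEED)  # clamp at 0.5
        fold_mape.append(np.mean(np.abs(y[test_idx] - pred) / np.abs(y[test_idx])))
        min_pred = min(min_pred, float(pred.min()))
    return float(np.mean(fold_mape)), min_pred
```

For the LM baseline, `sklearn.linear_model.LinearRegression()` can be swapped in for the MLP. Stratifying on speed bins keeps the rare very slow (hardest) transitions represented in every test fold.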
We report mean squared error (MSE), and also weighted mean squared error (wMSE), weighted by the relative frequency of intervals in the Weimar Jazz Database (WJD) [32], to measure how the model performs with respect to what one in fact encounters in repertoire, rather than with respect to what the instrument can play (although we are more interested in the latter).
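For reference, here is one common way to write the two errors. The normalisation of wMSE is not stated in the text, so this sketch assumes it is the frequency-weighted average of squared errors. Here y_i is the recorded trill speed of test transition i, \hat{y}_i is the prediction, and w_i is the relative WJD frequency of the interval that transition i realises:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{wMSE} = \frac{\sum_{i=1}^{N} w_i \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} w_i}
```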
Besides mean squared error, we also evaluate using mean absolute percentage error (MAPE). For the use case of difficulty estimation, a predicted and actual trill speed of 1 and 2 is a much larger issue than the same absolute difference between 5 and 6. Additionally, the easier a transition is (and thus the higher its recorded trill speed), the more variance there may be between performers, and even between recordings of similar transitions by the same performer, as at the higher end of the speed scale, fatigue and other factors play a much larger role; this phenomenon can be seen in Figure 2.
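The arithmetic behind this argument can be shown with a small sketch of the two metrics in plain Python. As in Table 1, MAPE is expressed as a fraction.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.2 means 20% off on average)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Same absolute miss of 1 trill/s, very different relative error:
slow = mape([2.0], [1.0])  # actual 2, predicted 1
fast = mape([6.0], [5.0])  # actual 6, predicted 5
```

Both pairs have an MSE of 1.0, but MAPE is 0.5 for the slow pair and about 0.17 for the fast one. So MAPE penalises errors on difficult (slow) transitions more heavily, which is the intended emphasis.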
Results of the trill speed estimation experiments are in Table 1. As evidenced by the mean squared error (MSE) values found during the experiment, the LM stops improving at roughly 0.97 MSE and 0.39 MAPE. The MLP performed much better, getting down to roughly 0.4 MSE and just under 0.2 MAPE using the Expert feature sets. The Finger-based feature set performed better than the Raw feature set in MSE but not in MAPE, suggesting that the improvement in MSE comes in faster trills, which is congruent with these features being more closely related to motor limitations.
¹⁴ 0.5 is still lower than any actual recorded trill speed.
Feature set    LM(MSE)  LM(wMSE)  LM(MAPE)  MLP(MSE)  MLP(wMSE)  MLP(MAPE)
R              1.06     1.95      0.39      0.65      1.41       0.25
F(PAF)         0.97     2.02      0.39      0.52      0.96       0.25
F(P2FM)        0.98     2.10      0.39      0.53      1.06       0.25
E-HB           0.94     1.93      0.39      0.42      0.56       0.22
E-HB(NoM)      0.94     1.91      0.39      0.34      0.53       0.20
E-HB(NoEW)     0.94     1.94      0.39      0.36      0.79       0.18
E-HB(NoM&EW)   0.94     1.91      0.39      0.35      0.67       0.19
E-FB           0.94     1.92      0.38      0.63      1.16       0.26
E-FB(NoM)      0.94     1.90      0.39      0.47      0.90       0.22
E-FB(NoEW)     0.94     1.93      0.38      0.45      1.23       0.19
E-FB(NoM&EW)   0.94     1.90      0.39      0.33      0.75       0.18
Table 1. The average MSE, average WJD-weighted MSE (wMSE), and average MAPE over all folds for a given model and feature extraction method (best per column in bold; runners-up in bold italics). Note that a MAPE of e.g. 0.2 corresponds to the estimate being 20% off on average.
Figure 6. Differences between true and predicted trill speeds for the MLP model using the E-HB(NoEW) feature set. Transitions are sorted in ascending order by their true trill speed.
Errors in wMSE are larger than in equally weighted MSE; easier (faster) transitions are more common, so the more heavily weighted errors are on transitions with more noise.
The main takeaway, however, is that expert-designed features clearly perform best, probably as a function of the small maximum available dataset size. At the same time, the relationships between individual features are non-trivial in any representation, as the MLP clearly performs better than the linear model. We estimate a MAPE lower bound from the “anchor” transitions (Figure 2), with the median speed of each transition as its ‘true’ value. The average anchor MAPE was 0.10 (max. 0.14, min. 0.06). The best model (MAPE 0.18) is thus only 8 percentage points worse than this attainable bound.
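The lower-bound estimate can be sketched as follows. The sketch assumes that the reported mean, max and min are taken over per-transition MAPEs. The speeds in the example are made up and are not the recorded anchor data.

```python
import statistics


def anchor_mape(recordings):
    """MAPE of repeated anchor recordings against each transition's median speed.

    recordings maps an anchor transition to its trill speeds across sessions.
    Returns (average, max, min) of the per-transition MAPEs (assumed aggregation).
    """
    per_transition = []
    for speeds in recordings.values():
        true_speed = statistics.median(speeds)  # the 'true' value
        per_transition.append(
            statistics.fmean(abs(s - true_speed) / true_speed for s in speeds))
    return statistics.fmean(per_transition), max(per_transition), min(per_transition)


# Made-up speeds in trills/s, not the recorded anchor data.
example = {"anchor-1": [4.0, 5.0, 6.0], "anchor-2": [2.0, 2.0, 2.0]}
avg, worst, best = anchor_mape(example)
```

Because even repeated recordings of the same transition disagree by this much, no model trained on such data can be expected to score much below this MAPE.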
7. SAMPLING METHODS
Not all recorded transitions are equally informative, so a good sampling strategy might reduce the number of recordings necessary to achieve a target prediction error.
For the following tests, the Expert Hand-Based (No Expert Weights) features were used.¹⁵ For each sampling method, the stratified k-fold cross-validation from Figure 6 was used. For every fold, the training set was then downsampled to the target n using the given sampling method. The MLP from Figure 6 was trained on this downsampled training set and evaluated on the test set. For each n, 3 independent, differently seeded samples were tested, with the graphed value being the mean of the MAPEs of each sampling attempt. We compared three sampling methods:
Uniform sampling of recorded transitions.
Cluster-based. For every data point, its Hand-based Expert features (with Expert weights¹⁶ and MIDI enabled) were extracted. Then, k-means clustering was run with k = n, so that we would get one cluster per sample point we want to obtain, and from each cluster a random data point was uniformly selected into the training sample.
Empirical. We extracted note bigram probabilities from the Weimar Jazz Database [32] (WJD), with Laplace smoothing at α = 0.1, and sampled the transitions according to these probabilities. For notes with multiple fingerings, possible transitions were sampled uniformly. We expect this sampling to perform worse, but it roughly speaks to the data efficiency of just recording players playing repertoire and extracting transition costs from that.
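The three sampling strategies can be sketched as below. As described above, the cluster-based sampler uses sklearn's `KMeans` with k = n. The empirical sampler involves two choices of ours that are not details from the text. First, it assumes joint bigram probabilities restricted to the candidate (recorded) pairs, Laplace-smoothed with α = 0.1. Second, it draws without replacement using Efraimidis-Spirakis random keys. The step that picks uniformly among a note's alternative fingerings is omitted.

```python
import random
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def uniform_sample(n_items, n, seed=0):
    """Uniform: n distinct indices of recorded transitions."""
    return sorted(random.Random(seed).sample(range(n_items), n))


def cluster_sample(features, n, seed=0):
    """Cluster-based: k-means with k = n, then one random member per cluster."""
    labels = KMeans(n_clusters=n, n_init=10, random_state=seed).fit(features).labels_
    rng = random.Random(seed)
    chosen = []
    for c in range(n):
        members = np.flatnonzero(labels == c).tolist()
        if members:  # guard; sklearn normally avoids empty clusters
            chosen.append(rng.choice(members))
    return sorted(chosen)


def smoothed_bigram_probs(melodies, candidates, alpha=0.1):
    """Laplace-smoothed joint note-bigram probabilities over the candidate pairs."""
    counts = Counter()
    for notes in melodies:
        counts.update(zip(notes, notes[1:]))
    total = sum(counts[c] for c in candidates) + alpha * len(candidates)
    return {c: (counts[c] + alpha) / total for c in candidates}


def empirical_sample(probs, n, seed=0):
    """Weighted sampling without replacement via Efraimidis-Spirakis keys u ** (1 / p)."""
    rng = random.Random(seed)
    keys = {c: rng.random() ** (1.0 / p) for c, p in probs.items()}
    return sorted(keys, key=keys.get, reverse=True)[:n]
```

Smoothing keeps transitions that never occur in the corpus selectable, but the draw still favours common, typically easy transitions. This matches the expectation above that empirical sampling will be the least informative.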
As seen in Figure 7, cluster-based sampling performs best, with uniform sampling coming in a close second. Cluster-based sampling is slightly better at lower n, as it likely creates a more informative set of training data. However, the differences are minimal in practice. Empirical sampling performs much worse; frequently occurring transitions are not more informative.
While the uniform and cluster sampling never diverge too much, the shallow slope means that around the region of MAPE 0.25, the difference between these two sampling methods is about one recording session's worth of recordings: 2 sessions for cluster-based vs. 3 for uniform sampling. Again, expert features show their usefulness.
¹⁵ Chosen as it is the slightly better-performing expert feature set, in the form that achieved the best MAPE.
¹⁶ The expert weights were, in theory, specifically designed for this clustering, not for the model.
Figure 7. Graph showing MAPE as training sample size increases, across sampling methods. The coloured lines are the mean MAPEs over all folds at a given sample size. A mean MAPE of 0.25 is shown as a possible target MAPE at which to compare sampling strategies.
8. APPLICATION EXAMPLE
Estimating the difficulty of a saxophone part is finally done by finding the optimal path through a finite-state automaton between our first and last set of states. The edge weights are the maximum attainable trill speeds estimated by the MLP model with E-HB(NoEW) features. We do not use the measured trill speeds directly because we want our system to also be usable for other instruments, for which the estimation methodology described above allows proceeding without recording all the transitions.
Given that note values are known from the part, we can account for a given target tempo. The “fastest” (maximal) path through the part in terms of fingerings is found with Viterbi decoding over the estimated trill speeds (yielding optimal fingerings). Then, for every fingering bigram in the part, the transition difficulty is the ratio of the required “half trill” speed at the target tempo to the estimated maximum trill speed for that bigram.
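Here is a compact sketch of both steps. It rests on three assumptions. The “fastest” path is taken to maximise the sum of estimated trill speeds along the fingering sequence, since the text does not state the objective. `speed(f, g)` stands in for the MLP estimate. The required “half trill” speed is derived from the inter-onset interval between two notes: a trill at s trills/s changes note every 1/(2s) seconds, so a gap of Δ seconds needs s = 1/(2Δ).

```python
def viterbi_fingerings(candidates, speed):
    """candidates[t]: fingering options for note t; speed(f, g): estimated max trill speed.

    Returns the fingering sequence with the largest summed edge speed (assumed objective).
    """
    score = {f: 0.0 for f in candidates[0]}
    back = []
    for options in candidates[1:]:
        new_score, pointers = {}, {}
        for g in options:
            best_f = max(score, key=lambda f: score[f] + speed(f, g))
            new_score[g] = score[best_f] + speed(best_f, g)
            pointers[g] = best_f
        score = new_score
        back.append(pointers)
    # Trace back from the best final fingering.
    path = [max(score, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def transition_difficulties(path, durations_beats, bpm, speed):
    """Required half-trill speed at the tempo divided by the estimated max trill speed."""
    out = []
    for t in range(1, len(path)):
        gap_s = durations_beats[t - 1] * 60.0 / bpm  # onset-to-onset time in seconds
        required = 1.0 / (2.0 * gap_s)               # trill speed implied by that gap
        out.append(required / speed(path[t - 1], path[t]))
    return out


# Toy example: the middle note has two alternative fingerings, B1 and B2.
table = {("A", "B1"): 4.0, ("A", "B2"): 6.0, ("B1", "C"): 5.0, ("B2", "C"): 2.0}
speed = lambda f, g: table[(f, g)]
path = viterbi_fingerings([["A"], ["B1", "B2"], ["C"]], speed)
diffs = transition_difficulties(path, [0.5, 0.5, 1.0], bpm=120, speed=speed)
```

In the toy example, the greedy first step would pick B2 (speed 6). Viterbi instead returns `["A", "B1", "C"]` (summed speed 9 vs. 8). At 120 BPM, eighth notes are 0.25 s apart and require 2 trills/s, giving difficulties of 0.5 and 0.4. A value above 1 would mark a transition that the estimated trill speed says cannot be played at that tempo.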
The outcome of one such model pass is visualised in Figure 8.¹⁷ The brightness of each notehead is given by the average of its incoming and outgoing transition difficulties. This excerpt shows the non-trivial difficulty structure of the saxophone: the low notes are harder than the mid-range ones, especially due to complex pinky finger movement at the end of the phrase. At the same time, the model is sensitive to a major factor of technical difficulty, speed, with the triplets standing out. We created a Musescore 3.6 plugin implementing this visualisation (see Section 12).
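Per-note colouring then reduces to averaging the difficulties of adjacent transitions, as in the sketch below. It is an assumption that the first and last notes, which have only one neighbouring transition, take that single value.

```python
def note_difficulties(transition_difficulties):
    """Average of incoming and outgoing transition difficulty for each note.

    Assumption: the first and last notes take their single neighbouring value;
    a lone note with no transitions gets 0.0.
    """
    n_notes = len(transition_difficulties) + 1
    out = []
    for i in range(n_notes):
        adjacent = []
        if i > 0:
            adjacent.append(transition_difficulties[i - 1])  # incoming
        if i < n_notes - 1:
            adjacent.append(transition_difficulties[i])      # outgoing
        out.append(sum(adjacent) / len(adjacent) if adjacent else 0.0)
    return out


per_note = note_difficulties([0.5, 0.4])  # three notes, two transitions
```

Averaging spreads a single hard transition across both notes it connects. This suits a visual cue for which segment needs practice rather than which single note is at fault.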
¹⁷ Two full *.mxl examples are available in the GitHub repository (see Section 12).
Figure 8. Example output of the difficulty visualisation tool. The redder a note is, the more difficult the segment around it is deemed to be. Tempo for this visualisation was set to 160 BPM, which is not particularly fast for jazz [33].
9. DISCUSSION & CONCLUSION
We have proposed a method for estimating the difficulty of woodwind parts based on the “optimal path” paradigm [6] and implemented it for the tenor saxophone, providing data and methods for transition weight estimation, and showing how expert knowledge can help minimise the cost of creating such models for other instruments. It is an example of participatory, practice-based research.
However, there are also significant limitations. Most importantly, we would prefer to empirically evaluate the overall estimation, but we lack digitally encoded saxophone parts with authoritative ground-truth difficulty. The ideal source would be piece grading from a musical examination board such as the ABRSM (Associated Board of the Royal Schools of Music), but digital encodings of the ABRSM-graded pieces are not available and copyright issues would prevent sharing an evaluation corpus, so difficulty evaluation data must be gathered.¹⁸
Trill speed does not directly capture the intricacies of voicing, articulation, and breath support, which are all major contributors to woodwind difficulty. To maintain discrete play states, knowledge of these mechanics could be encoded in additional expert features. Also, trill speeds can be asymmetrical: on the saxophone, going up an octave quickly is much easier than going down an octave; this is not modeled.
This methodology still requires initial access to a technically proficient instrumentalist (and domain expert) to make the recordings and design the expert features and instrument encodings. Especially when using cluster-based sampling, many of the recorded transitions may be large, technical leaps that may be difficult for beginners to play. As can be seen in Figure 2, even for the same transition, the variance of the recorded trill speed can be high, so some normalisation procedure is needed. However, as Figure 2 also shows, a performer may be the fastest for one transition but the slowest for another, so a single coefficient per player is unlikely to be effective. At the same time, this presents an opportunity: a normalisation function that can easily personalise the transition costs for a specific player based solely on recording a few anchor transitions.
Despite these limitations, we believe this work can serve at least as a first step towards including the woodwind family more in MIR for music pedagogy, and we look forward to seeing how others might take inspiration and devote more attention to these instruments.
¹⁸ This also speaks to the under-representation of woodwinds in MIR, although an audio-to-score system for the saxophone has been tried [34].
10. ACKNOWLEDGMENT STATEMENT
This work is supported by the project “Human-centred AI for a Sustainable and Adaptive Society” (reg. no.: CZ.02.01.01/00/23_025/0008691), co-funded by the European Union. Computing infrastructure was provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).
11. ETHICS STATEMENT
All players signed an informed consent form. All data is pseudonymised. Video recordings of sessions, taken as a backup source for checking against unexpected errors in extraction, were deleted as soon as the trill and trill speed extraction steps were finished and checked. No generative AI tools were used in producing this text; code generation was used for hints in visualisations that are used as figures.
12. DATA AND CODE ACCESSIBILITY
The raw audio data in the form of recorded trills is available here: http://hdl.handle.net/11234/1-5942. This also includes the output of the trill speed extraction pipeline, as rerunning the pipeline for all the data is time-consuming. All code and other implementation details can be found at www.github.com/Vobludalib/SaxophoneDifficultyModel/tree/ISMIR2025. The Musescore 3.6 plugin implementing this model can be found at www.github.com/Vobludalib/SaxophoneDifficultyModel/tree/main/plugin.
13. REFERENCES
[1] M. A. Guadagnoli and T. D. Lee, “Challenge point: a framework for conceptualizing the effects of various practice conditions in motor learning,” Journal of Motor Behavior, vol. 36, no. 2, pp. 212–224, 2004.
[2] J. Bugos and W. Lee, “Perceptions of challenge: the role of catastrophe theory in piano learning,” Music Education Research, vol. 17, no. 3, pp. 312–326, May 2014. [Online]. Available: http://dx.doi.org/10.1080/14613808.2014.899334
[3] E. Nakamura, Y. Saito, and K. Yoshii, “Statistical learning and estimation of piano fingering,” Information Sciences, vol. 517, pp. 68–85, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0020025519311879
[4] N. Srivatsan and T. Berg-Kirkpatrick, “Checklist models for improved output fluency in piano fingering prediction,” in Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022, Bengaluru, India, December 4-8, 2022, P. Rao, H. A. Murthy, A. Srinivasamurthy, R. M. Bittner, R. C. Repetto, M. Goto, X. Serra, and M. Miron, Eds., 2022, pp. 525–531. [Online]. Available: https://archives.ismir.net/ismir2022/paper/000063.pdf
[5] D. Radicioni, L. Anselma, and V. Lombardo, “A segmentation-based prototype to compute string instruments fingering,” in Proceedings of the Conference on Interdisciplinary Musicology, vol. 17, 05 2004, pp. 15–18.
[6] S. I. Sayegh, “Fingering for string instruments with the optimum path paradigm,” Computer Music Journal, vol. 13, no. 3, pp. 76–84, 1989. [Online]. Available: http://www.jstor.org/stable/3680014
[7] J. Kuroda and G. Koutaki, “Sensing control parameters of flute from microphone sound based on machine learning from robotic performer,” Sensors, vol. 22, no. 5, p. 2074, 2022.
[8] A. Muñoz Arancón, B. Gazengel, J.-P. Dalmont, and E. Conan, “Estimation of saxophone reed parameters during playing,” The Journal of the Acoustical Society of America, vol. 139, no. 5, pp. 2754–2765, 2016.
[9] M. Weikert and J. Schlömicher-Thier, “Laryngeal movements in saxophone playing: Video-endoscopic investigations with saxophone players: A pilot study,” Journal of Voice, vol. 13, no. 2, pp. 265–273, Jun 1999. [Online]. Available: https://doi.org/10.1016/S0892-1997(99)80031-9
[10] G. P. Scavone, A. Lefebvre, and A. R. da Silva, “Measurement of vocal-tract influence during saxophone performance,” The Journal of the Acoustical Society of America, vol. 123, no. 4, pp. 2391–2400, 2008.
[11] P. Ramoneda, N. C. Tamer, V. Eremenko, X. Serra, and M. Miron, “Score difficulty analysis for piano performance education based on fingering,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 201–205.
[12] T. Wes , “Pi ch Finge ing Sys ems and he Sea ch o
Pe ec ion,” in 22nd In e na ional Con e ence on New
In e aces o Musical Exp ession, NIME 2022, online,
June 28 - July 1, 2022. nime.o g, jun 16 2022.
[13] V. Sébas ien, H. Ralambond ainy, O. Sébas ien, and
N. Con uy , “Sco e analyze : Au oma ically de e min-
ing sco es di icul y le el o ins umen al e-lea ning,”
in 13 h In e na ional Socie y o Music In o ma ion Re-
ie al Con e ence (ISMIR 2012), 2012, pp. 571–576.
[14] S.-C. Chiu and M.-S. Chen, “A s udy on di icul y le el
ecogni ion o piano shee music,” in 2012 IEEE In e -
na ional Symposium on Mul imedia, 2012, pp. 17–23.
[15] Y. Ghatas, M. Fayek, and M. Hadhoud, “A hybrid deep learning approach for musical difficulty estimation of piano symbolic music,” Alexandria Engineering Journal, vol. 61, no. 12, pp. 10183–10196, 2022.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
[16] P. Ramoneda, J. J. Valero-Mas, D. Jeong, and X. Serra, “Predicting performance difficulty from piano sheet music images,” in Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023, Milan, Italy, November 5-9, 2023, 2023, pp. 708–715.
[17] P. Ramoneda, M. Lee, D. Jeong, J. J. Valero-Mas, and X. Serra, “Can audio reveal music performance difficulty? Insights from the piano syllabus dataset,” IEEE Transactions on Audio, Speech and Language Processing, pp. 1129–1141, 2025.
[18] M. Suzuki, “Piano score rearrangement into multiple difficulty levels via notation-to-notation approach,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2023, no. 1, p. 52, 2023.
[19] W. Nagata, S. Sako, and T. Kitamura, “Violin fingering estimation according to skill level based on hidden Markov model,” in Music Technology meets Philosophy - From Digital Echos to Virtual Ethos: Joint Proceedings of the 40th International Computer Music Conference, ICMC 2014, and the 11th Sound and Music Computing Conference, SMC 2014, Athens, Greece, September 14-20, 2014. Michigan Publishing, 2014. [Online]. Available: https://hdl.handle.net/2027/spo.bbp2372.2014.189
[20] V. D’Amato, E. Volta, L. Oneto, G. Volpe, A. Camurri, and D. Anguita, “Understanding violin players’ skill level based on motion capture: a data-driven perspective,” Cognitive Computation, vol. 12, pp. 1356–1369, 2020.
[21] D. S. Deconto, E. L. F. Valenga, and C. N. Silla, “Automatic music score difficulty classification,” in 2023 30th International Conference on Systems, Signals and Image Processing (IWSSIP), 2023, pp. 1–5.
[22] J. Wolfe, J. Smith, J. Tann, and N. H. Fletcher, “Acoustic impedance spectra of classical and modern flutes,” Journal of Sound and Vibration, vol. 243, no. 1, pp. 127–144, 2001.
[23] T. Smyth and M. Rouhipour, “Saxophone modelling and system identification,” in Proceedings of Meetings on Acoustics. ASA, 2013. [Online]. Available: http://dx.doi.org/10.1121/1.4799622
[24] A. Almeida, R. Chow, J. Smith, and J. Wolfe, “The kinetics and acoustics of fingering and note transitions on the flute,” The Journal of the Acoustical Society of America, vol. 126, no. 3, pp. 1521–1529, 2009.
[25] Y. Han and K. Lee, “Hierarchical approach to detect common mistakes of beginner flute players,” in ISMIR, 2014, pp. 77–82.
[26] A. G. Thomson, An analysis of difficulties in sight reading music for violin and clarinet. University of Cincinnati, 1953.
[27] W. B. Thompson, “Music sight-reading skill in flute players,” The Journal of General Psychology, vol. 114, no. 4, pp. 345–352, 1987.
[28] P. de la Cuadra, B. Fabre, N. Montgermont, and C. Chafe, “Analysis of flute control parameters: A comparison between a novice and an experienced flautist,” Acta Acustica united with Acustica, vol. 94, no. 5, pp. 740–749, 2008.
[29] J. W. Kim, J. Salamon, P. Li, and J. P. Bello, “Crepe: A convolutional representation for pitch estimation,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 161–165.
[30] B. McFee, M. McVicar, D. Faronbi, I. Roman, M. Gover, S. Balke, S. Seyfarth, A. Malek, C. Raffel, V. Lostanlen, B. van Niekirk, D. Lee, F. Cwitkowitz, F. Zalkow, O. Nieto, D. Ellis, J. Mason, K. Lee, B. Steers, E. Halvachs, C. Thomé, F. Robert-Stöter, R. Bittner, Z. Wei, A. Weiss, E. Battenberg, K. Choi, R. Yamamoto, C. Carr, A. Metsai, S. Sullivan, P. Friesch, A. Krishnakumar, S. Hidaka, S. Kowalik, F. Keller, D. Mazur, A. Chabot-Leclerc, C. Hawthorne, C. Ramaprasad, M. Keum, J. Gomez, W. Monroe, V. A. Morozov, K. Eliasi, nullmightybofo, P. Biberstein, N. D. Sergin, R. Hennequin, R. Naktinis, beantowel, T. Kim, J. P. Åsen, J. Lim, A. Malins, D. Hereñú, S. van der Struijk, L. Nickel, J. Wu, Z. Wang, T. Gates, M. Vollrath, A. Sarroff, Xiao-Ming, A. Porter, S. Kranzler, Voodoohop, M. D. Gangi, H. Jinoz, C. Guerrero, A. Mazhar, toddrme2178, Z. Baratz, A. Kostin, X. Zhuang, C. T. Lo, P. Campr, E. Semeniuc, M. Biswal, S. Moura, P. Brossier, H. Lee, and W. Pimenta, “librosa/librosa: 0.10.2.post1,” May 2024. [Online]. Available: https://doi.org/10.5281/zenodo.11192913
[31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[32] M. Pfleiderer, K. Frieler, J. Abeßer, W.-G. Zaddach, and B. Burkhart, Eds., Inside the Jazzomat - New Perspectives for Jazz Research. Schott Campus, 2017.
[33] G. L. Collier and J. L. Collier, “An exploration of the use of tempo in jazz,” Music Perception, vol. 11, no. 3, pp. 219–242, 1994. [Online]. Available: http://dx.doi.org/10.2307/40285621
[34] J. C. Martínez-Sevilla, M. Alfaro-Contreras, J. J. Valero-Mas, and J. Calvo-Zaragoza, “Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works,” in INTERSPEECH 2023, ser. interspeech_2023. ISCA, Aug. 2023, pp. 2793–2797.