Modeling the Difficulty of Saxophone Music

Author: Šimon Libřický; Jan Hajič jr.

Publisher: Zenodo

DOI: 10.5281/zenodo.17706579

Source: https://zenodo.org/records/17706579/files/000087.pdf

MODELING THE DIFFICULTY OF SAXOPHONE MUSIC
Šimon Libřický, Jan Hajič jr.
Institute of Formal and Applied Linguistics, Charles University
ABSTRACT
In learning music, difficulty is an important factor in choice of repertoire, choice of tempo, and structure of practice. These choices are typically made with the guidance of a teacher; however, not all learners have access to one. While piano and strings have had some attention devoted to automated difficulty estimation, wind instruments have so far been under-served. In this paper, we propose a method for estimating the difficulty of pieces for winds and implement it for the tenor saxophone. We take the cost-of-traversal approach, modelling the part as a sequence of transitions – note pairs. We estimate transition costs from newly collected recordings of trill speeds, comparing representations of saxophone fingerings at various levels of expert input. We then compute and visualise the cost of the optimal path through the part, at a given tempo. While we present this model for the tenor saxophone, the same pipeline can be applied to other woodwinds, and our experiments show that with appropriate feature design, only a small proportion of possible trills is needed to estimate the costs well. Thus, we present a practical way of diversifying the capabilities of MIR in music education to the wind family of instruments.
1. INTRODUCTION AND SYSTEM OVERVIEW
One essential element of studying music is choosing appropriate repertoire. Out of the factors that make a piece appropriate for a learner, its difficulty is among the most important: too much of a challenge, or not enough of one, can seriously damage the learning process [1,2]. A major factor governing the difficulty of executing an instrumental part is also its target tempo, which must be determined. In addition to estimating the difficulty of entire pieces, an overview of which parts of a piece are difficult (and hence will likely require the most practice) might be useful to any learner. While teachers are of course proficient in assigning appropriate repertoire, tempo, and practice plans, not every learner has access to a qualified teacher. Hence, there is cause to try and model the difficulty of a piece and its parts automatically.
© Š. Libřický, and J. Hajič jr. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Š. Libřický, and J. Hajič jr., “Modeling the Difficulty of Saxophone Music”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
Just as there are under-served students, there are under-served instruments. The topic of difficulty estimation, together with fingering estimation, has been explored on the piano [3,4] and to some extent also on string instruments [5,6]; not, however, for wind instruments.
In this paper, we propose a model of difficulty for the tenor saxophone. We take the “optimum path” paradigm [6], where control of musical instruments is fundamentally structured into “play states” or “play actions” meant to sound the desired tone at the desired time. Playing a part can then be modelled as traversing a path through the corresponding play states in time, and the difficulty of a part is the cost of this traversal.
In order for this model to generalise to unseen parts, we can factorize the cost of a path into the aggregated costs of sub-sequences for which we have cost estimates, similarly to how n-gram language models truncate history. These sub-sequences can be as short as individual notes (unigrams); however, playing one note with no context is nearly meaningless. Together with [6] we settle on transitions between two fingerings as the units for which cost is defined.¹ We retain at least some meaningful musical context, while keeping the state space manageable (approx. 750 possible fingering pairs on the tenor saxophone). Traversal through a woodwind part is then the traversal through a finite-state automaton with tones as nodes and transition costs on the edges, which is easily tractable.
Two things thus must be done: defining the play states of the saxophone, and estimating transition costs. For the saxophone, a play state that produces a desired tone is a combination of fingering, breath, muscle voicing, and articulation. Breath, however, is hard to observe and formalise, and in any case is not entirely independent from fingering, though some characterisation of breath parameters from woodwind audio has been done [7, 8]. Articulation and muscle voicing of saxophones have been studied [9,10], but not to a level that they can be easily modelled computationally. Hence, we focus on encoding play states via fingering.
The final missing piece of the puzzle is then to estimate the costs of transitions. This is, however, non-trivial for the saxophone. For instance, the major second from (written) C to D on a tenor saxophone varies wildly in breath support, voicing requirements, and motor movements in the three octaves where this interval appears. Thus, in order to estimate the transition costs, a combination of expert knowledge and real-world data is necessary. Direct data-driven difficulty estimation [11] trains a model on extrinsic annotations of difficulty, instead of intrinsic properties of the notes that make up the performance, requiring more data with difficulty ratings. Instead, we propose creating fingering and difficulty prediction models that rely on data that are easier to acquire, using trill speed as a proxy for fingering transition difficulty, taking inspiration from [12], who evaluated theoretical fingering designs for digital instruments.
¹ There is only a small number of scenarios on a saxophone where a performer would alter fingerings based on anything other than the preceding and following fingerings.
Our contributions are as follows. We design and implement the application of the optimum path difficulty model to the saxophone using trill speeds, which is general enough to apply to any wind instrument without having to adjust important assumptions (such as single-voice parts²). We also contribute a dataset of tenor saxophone trills and extracted trill speeds for all its playable transitions. We show that expert knowledge in representation design for the inherently limited data leads to more accurate transition cost estimation as well as to minimising the data acquisition costs for more instrument models. Finally, we provide the implementation of the difficulty model itself as well as the pipeline for data collection and trill speed extraction.³
2. RELATED WORK
Existing music difficulty estimation methods rely on using musician-made annotations of optimal fingerings [3], or on setting (or learning) weights based on expert observations [6]. For woodwinds and brass instruments, however, large leaps can be accomplished with minimal motor movement in the fingers and arms, which makes it difficult to create expert rules that determine optimal fingerings, as the rules-based approach of [6] requires.
Data-driven difficulty estimation methods for the piano are available, based on symbolic representations [13–15], sheet music images [16], or recently audio [17]; re-arrangement by difficulty has also been done [18]. For the violin, optimal fingering estimations exist: the original optimal path approach [6], though focus has been more on player skill level rather than part difficulty, both for fingering estimation [19] and in context of visual perception of violinists [20]. In [21], symbolic score difficulty was estimated for violin, piano, and guitar.
For the woodwind instrument family, however, focus has been more on their acoustical properties [7,22]. Saxophone acoustics have been modeled [23] and reed parameters have been estimated from audio [8]. The kinetics of fingerings and note transitions were studied for the flute [24]. In [25], recorder audio is classified by typical mistake, but not difficulty. Sight-reading skills have been described for the clarinet (and violin) already in 1953 [26], and for the flute they have been empirically characterised [27], together with the difference in breath control between beginners and experts [28].
² Multiphonics would rarely apply to likely users of such a system.
³ See Section 12 for link to implementation.
Figure 1. An example of a prompt presented to performers. In this case, the performer is meant to trill between Bb4 and B4 (written pitch).
Figure 2. Visualisation of the per-player/session differences in trill speed for transitions that were recorded in every session (“anchor” transitions).
3. DATA COLLECTION
The process of collecting data consists of recording tenor saxophone (henceforth just saxophone) players trilling. The musicians are instructed to trill (1) as fast as possible, (2) as cleanly as possible,⁴ and (3) as legato as possible.⁵ The data was recorded in sessions of approx. 65 trills, taking an hour to record (including a short break). For each session, a set of music notation prompts – see Figure 1 – was typeset for the players to play from, communicating which notes were to be trilled between and which fingerings were to be used for each, along with emphasising the idea of speed.
To avoid player fatigue in a recording session, a very simplified prior estimate of difficulty was done using only the size of the jump and primitive information about how many fingers move, to mix likely easy and difficult trills. Each session was subsequently manually split into recordings of individual trills.
From the fingerings chosen, there are 741 possible fingering pairs.⁶ In total, 817 trills were recorded, with 5 different, similarly advanced conservatory-level performers participating for a total of 13 sessions.⁷
Six “anchor” transitions were recorded in every session to explore the nature of trill speed variations across performers and sessions. The speeds of these transitions are shown in Figure 2. Note that Player 5 is among the fastest in 3 of them, and among the slowest in the other 3.
⁴ Additional clarification was given to performers to play in such a way as to minimise over/underblowing that would result in hearing notes in the harmonic series of the fundamental note being played.
⁵ On a saxophone, this would mean that the tongue is not used to separate the two notes. For large jumps such as an octave, or when needing to play a low note that is easier to produce when tongued, this instruction was de-emphasised.
⁶ Some of these were recorded but not used in later sections, as they were trills between alternate fingerings of the same note. The trill extraction methodology in Section 4 cannot work with this for now.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
Figure 3. Example of how clusters and trills are detected, visualised on a CREPE [29] salience plot. Dashed yellow line loosely represents the cluster boundary.
4. TRILL SPEED EXTRACTION WORKFLOW
After recording all necessary trills, an automatic trill speed extraction step is run to extract the highest trill speed from the raw audio for each fingering transition.
We use CREPE [29] to predict f0 from the raw audio. To determine what notes participate in the trill, we perform k-means clustering on the f0 values with k = 2. The MIDI values are then derived from the median value of each cluster using librosa [30], at A=440 Hz.
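The note-detection step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hand-written 1-D two-means routine and a direct Hz-to-MIDI formula in place of CREPE, scikit-learn, and librosa, and the f0 values are invented for the example. The function names `two_means_1d` and `trill_notes` are hypothetical.

```python
import math
from statistics import median


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def two_means_1d(values, iterations=50):
    """Plain 1-D k-means with k = 2; returns (low_cluster, high_cluster)."""
    centres = [min(values), max(values)]  # simple deterministic initialisation
    low, high = [], []
    for _ in range(iterations):
        low = [v for v in values if abs(v - centres[0]) <= abs(v - centres[1])]
        high = [v for v in values if abs(v - centres[0]) > abs(v - centres[1])]
        if not low or not high:
            break
        new_centres = [sum(low) / len(low), sum(high) / len(high)]
        if new_centres == centres:
            break
        centres = new_centres
    return low, high


def trill_notes(f0_hz):
    """Estimate the two trilled notes as rounded MIDI numbers (cluster medians)."""
    low, high = two_means_1d(f0_hz)
    return tuple(round(hz_to_midi(median(cluster))) for cluster in (low, high))


# Invented f0 track (Hz) alternating around A4 and B4; assumes two distinct pitches.
f0_track = [439.0, 441.0, 440.5, 494.0, 493.0, 495.0, 440.0, 494.5]
notes = trill_notes(f0_track)
```

Taking the median of each cluster, as the text describes, keeps the note estimate robust to a few stray frames at the note boundaries.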
The trill speed of each segment was then determined by the number of complete trills completed within that segment (see Figure 3). A complete trill is defined as starting on note 1, transitioning to note 2, and returning to note 1. In practice, performers vary speed over each trill. To find the fastest stable trill speed, we take the first half, middle two quarters, and second half of each trill (each segment is about 1.5–3 seconds long – sufficient to count as a sustained trill), and take the highest average trill speed from these three segments.
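The segment-based speed estimate described above could look roughly like the following sketch. It assumes the recording has already been reduced to a frame-wise sequence of labels (1 for note 1, 2 for note 2) at a known frame rate; that representation, the function names, and the example data are illustrative assumptions rather than details from the paper.

```python
def count_complete_trills(labels):
    """Count note 1 -> note 2 -> note 1 cycles in a sequence of 1/2 labels."""
    collapsed = [labels[0]] if labels else []
    for label in labels[1:]:
        if label != collapsed[-1]:
            collapsed.append(label)
    # A complete trill appears as the pattern 1, 2, 1 in the collapsed sequence.
    return sum(
        1
        for i in range(len(collapsed) - 2)
        if collapsed[i] == 1 and collapsed[i + 1] == 2 and collapsed[i + 2] == 1
    )


def fastest_stable_trill_speed(labels, frame_rate_hz):
    """Highest average speed (trills/s) over first half, middle half, second half."""
    n = len(labels)
    windows = [(0, n // 2), (n // 4, 3 * n // 4), (n // 2, n)]
    speeds = []
    for start, end in windows:
        duration_s = (end - start) / frame_rate_hz
        speeds.append(count_complete_trills(labels[start:end]) / duration_s)
    return max(speeds)


# Invented 4-second example at 100 frames/s: a slow first half, a fast second half.
slow_half = ([1] * 25 + [2] * 25) * 4
fast_half = ([1] * 10 + [2] * 10) * 10
fastest = fastest_stable_trill_speed(slow_half + fast_half, frame_rate_hz=100)
```

Counting only whole 1, 2, 1 patterns inside a window slightly undercounts at the window edges, which is acceptable for a sketch but worth keeping in mind when comparing against hand-counted trills.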
On more technically difficult transitions at speed, under- and overblowing of the note happens. To correct for possible misclusterings, where such mistakes are detected as their own cluster,⁸ if one cluster was less than 20% the size of the other, we discarded it and re-ran k-means with k = 2 again.
To check for possible errors in trill speed extraction, the detected notes are matched against the expected interval from the recording session. If the notes detected by the trill speed extraction algorithm do not match the expected notes, we check the trill recording manually to make sure the extracted trill speed does match the recorded trill.
⁷ Alternate fingerings for F#, Bb with/without octave key, and F on F/F# were used. These were settled on following discussion with the participating saxophonists as to which alternate fingerings they used regularly.
⁸ Especially on sub-fifth transitions, where an octave distance from the true notes might “tempt” k-means to put both target notes in one cluster.
Figure 4. A diagram explaining the encoding format as it relates to a fingering chart. 1 (octave key) – 13 are left hand keys, 14–23 are right hand keys.
5. TRANSITION FEATURES
To estimate costs of unrecorded transitions,⁹ a representation of transitions is needed. Each transition is a pair of play states; we represent play states as fingerings. The saxophone has a finite number of keys (23, see Figure 4) that can be pressed, so we can straightforwardly represent a transition as two binary fingering vectors.¹⁰ However, while this representation does in principle contain all information about a transition, more recorded data points may be needed to learn how the keys interact – which combinations are difficult and which are easy. Because we want to minimise the amount of recording necessary to reach a given cost estimation accuracy, we design two more representations: one that focuses on the player’s fingers rather than the instrument’s keys, and expert features that incorporate factors of difficulty from saxophone pedagogy.
In total, three representations are used for transitions (shown in Figure 5):
• “Raw” (abbr. R) features – raw encodings are simply concatenated.
• “Finger” (abbr. F) features – each finger gets a 1 or 0 based on whether it has to move during the transition, along with a penalty if any finger has to move from one pressed key to a different pressed key (presence of a same-finger transition).
• “Expert” features – further divided into “Hand-Based” (E-HB) and “Finger-Based” (E-FB).
⁹ Which occur if we subsample as per Section 7.
¹⁰ For instruments where a large amount of adjustments can be made via the embouchure or other non-finger motor movements that are known for each pitch, these would also be incorporated.
5.1 Finger features
Each combination of covered keys has a sufficiently unique map of which fingers are active, so the “Finger” features can be derived automatically from the “Raw” features.
The only complication is dealing with palm keys (played by the side of the hand or fingers). Two methods of mapping keys to fingers were used for the Finger feature set: a ’palm-as-a-finger’ (PAF) approach, where palm keys are treated as being played by a ’sixth finger’ on each hand representing the palm, and a ’palm-key-to-finger-mapping’ (P2FM) approach, where each palm key is mapped onto the finger that plays it using its side. For example, the High D key (key 13 in Figure 4) is played by the side of the index finger.
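To make the “Raw” and “Finger” representations concrete, here is a small sketch on a toy instrument with five keys and three fingers. The key-to-finger map, the vector sizes, and the encoding of the same-finger penalty as a single 0/1 flag are all assumptions made for illustration; the real saxophone encoding uses 23 keys, and the PAF and P2FM variants would differ only in how palm keys are entered into the map.

```python
def raw_features(fingering_a, fingering_b):
    """'Raw' features: the two binary key vectors, simply concatenated."""
    return list(fingering_a) + list(fingering_b)


def finger_features(fingering_a, fingering_b, key_to_finger, n_fingers):
    """'Finger' features: one move flag per finger plus a same-finger flag."""
    moves = [0] * n_fingers
    same_finger = 0
    for finger in range(n_fingers):
        keys = [k for k, f in enumerate(key_to_finger) if f == finger]
        before = {k for k in keys if fingering_a[k]}
        after = {k for k in keys if fingering_b[k]}
        if before != after:
            moves[finger] = 1
            # The finger leaves one pressed key for a different pressed key.
            if before and after:
                same_finger = 1
    return moves + [same_finger]


# Hypothetical toy map: key index -> finger index.
toy_key_to_finger = [0, 1, 1, 2, 2]
fingering_a = [1, 1, 0, 0, 0]
fingering_b = [1, 0, 1, 1, 0]

raw = raw_features(fingering_a, fingering_b)
finger = finger_features(fingering_a, fingering_b, toy_key_to_finger, n_fingers=3)
```

In this toy transition, finger 0 stays put, finger 1 slides from key 1 to key 2 (a same-finger transition), and finger 2 presses a new key.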
5.2 Expert features
The “Expert” features, selected in consultation with the recording saxophonists, contain:
• MIDI values of both fingerings (written pitch).
• Number of fingers that have to move from a pressed key to a different pressed position (so-called same-finger transitions). These are big indicators of difficulty, as they are impossible to play legato and are generally cumbersome.¹¹
• Presence of any palm key transition on the left and right hand respectively.
• Whether or not the octave key has to change state.
• Presence of a fingering that is below a low C#. These notes require additional embouchure and air support.
To this shared base, the Hand-Based Expert feature set (E-HB) adds the number of fingers that have to change on the left and right hand respectively. A transition requiring the left pointer, middle and ring finger to change states would receive a value of 3 for this feature (for the left hand). The Finger-Based Expert features (E-FB) instead add a feature for every finger, with value 1 if it has to move and 0 otherwise.¹² We experimented also with omitting the MIDI values when training a model (labelled “NoM”).
In addition, Expert Weights (EW) were chosen to help indicate how severely a given feature is likely to impact the trill speed of a given transition. These weights were manually adjusted until k-means clustering with k = n/5 ¹³ resulted in clusters of approximately the same difficulty as judged by the expert. We experimented with disabling these as well (labelled “NoEW”).
6. TRANSITION DIFFICULTY ESTIMATION
To estimate the difficulty of transitions (intervals) that are not recorded, we train regression models. This step can significantly lower the cost of parametrising the entire difficulty model, because it can decrease the number of recording sessions needed. There are thus two goals: (1) to obtain a model that is as accurate in predicting transition trill speeds as possible, with (2) as few of the recorded transitions as possible used as training data.
¹¹ The Bis key is ignored for this, as it is a special case of an intended same-finger transition.
¹² This part is almost identical to the finger-based features.
¹³ Chosen as an amount of transitions comparable intuitively.
Figure 5. Encoding a transition (E with octave key to high D) with the Raw and Expert feature sets. The Expert feature sets are shown in the version that includes MIDI values and has expert weights enabled.
Experiments were conducted to compare the proposed feature sets and two model classes: a linear regression model (LM), and a multi-layer perceptron (MLP), both using sklearn [31]. The MLP had a hidden layer size of 50, and used the lbfgs solver. Both of these models additionally clamped any prediction below 0.5 trills/s to 0.5.¹⁴
A stratified k-fold with shuffle was used to divide the data into folds of 150 test trills, with stratification classes dictated by binning transitions by their trill speed (bins 0–1.5, 1.5–3, 3–4.5, 4.5+).
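Two small pieces of this setup, the stratification bins and the prediction floor, can be written down directly. The sketch below is an illustration only: which bin a value exactly on an edge (e.g. 1.5) falls into is not stated in the text, so the half-open convention used here is an assumption, as are the function names.

```python
def speed_bin(trill_speed):
    """Stratification class for a trill speed in trills/s.

    Bins follow the text: 0-1.5, 1.5-3, 3-4.5, 4.5+.
    Assumption: each bin includes its lower edge and excludes its upper edge.
    """
    edges = [1.5, 3.0, 4.5]
    for index, edge in enumerate(edges):
        if trill_speed < edge:
            return index
    return len(edges)


def clamp_prediction(predicted_speed, floor=0.5):
    """Raise any prediction below the floor (0.5 trills/s) up to the floor."""
    return max(predicted_speed, floor)


bins = [speed_bin(s) for s in (1.0, 1.5, 3.2, 5.2)]
clamped = [clamp_prediction(p) for p in (0.2, 3.1)]
```

In scikit-learn terms, the described models would plausibly correspond to `MLPRegressor(hidden_layer_sizes=(50,), solver="lbfgs")` and `LinearRegression()`, with the bin labels passed as the `y` argument of `StratifiedKFold(shuffle=True).split`; the exact calls used by the authors are not given in the text.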
We report mean squared error (MSE), and also weighted mean squared error (wMSE) weighted by the relative frequency of intervals in the Weimar Jazz Database (WJD) [32], to measure how the model performs with respect to what one in fact encounters in repertoire rather than with respect to what the instrument can play (although we are more interested in the latter).
Besides mean squared error, we also evaluate using mean absolute percentage error (MAPE). For the use case of difficulty estimation, a predicted and actual trill speed of 1 and 2 is a much larger issue than a similar absolute difference of 5 and 6. Additionally, the easier a transition is (and thus the higher its recorded trill speed), the more variance there may be between performers and even between recordings of similar transitions by the same performer, as, at the higher ends of the speed scale, fatigue and other factors play a much larger role; such a phenomenon can be seen in Figure 2.
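The argument for MAPE can be checked with the two cases mentioned above. The sketch below uses the standard definitions of MSE and MAPE; the weighted variant normalises by the sum of the weights, which is an assumption about how wMSE is computed, and the weights in the example are invented.

```python
def mse(true_values, predicted_values):
    """Mean squared error."""
    pairs = list(zip(true_values, predicted_values))
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def weighted_mse(true_values, predicted_values, weights):
    """Squared errors weighted by e.g. relative interval frequency (assumed form)."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(true_values, predicted_values, weights))
    return total / sum(weights)


def mape(true_values, predicted_values):
    """Mean absolute percentage error, as a fraction (0.2 means 20% off)."""
    pairs = list(zip(true_values, predicted_values))
    return sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Predicted 1 vs. actual 2 trills/s, and predicted 5 vs. actual 6 trills/s.
slow_mse, slow_mape = mse([2.0], [1.0]), mape([2.0], [1.0])
fast_mse, fast_mape = mse([6.0], [5.0]), mape([6.0], [5.0])
example_wmse = weighted_mse([2.0, 6.0], [1.0, 4.0], weights=[3.0, 1.0])
```

Both cases have the same squared error of 1.0, but the slow transition is off by 50% while the fast one is off by about 17%, which is the distinction the text is after.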
Results of trill speed estimation experiments are in Table 1. As evidenced by the mean squared error (MSE) values found during the experiment, the LM stops improving at approx. 0.97 MSE and 0.39 MAPE. The MLP performed much better, getting down to approx. 0.4 MSE and just under 0.2 MAPE using the Expert feature sets. The Finger-based feature set performed better than the Raw feature set in MSE but not in MAPE, suggesting that the improvement in MSE comes in faster trills – which is congruent with these features being more closely related to motor limitations.
¹⁴ 0.5 is still lower than any actual recorded trill speed.
Feature set     LM(MSE)  LM(wMSE)  LM(MAPE)  MLP(MSE)  MLP(wMSE)  MLP(MAPE)
R               1.06     1.95      0.39      0.65      1.41       0.25
F(PAF)          0.97     2.02      0.39      0.52      0.96       0.25
F(P2FM)         0.98     2.10      0.39      0.53      1.06       0.25
E-HB            0.94     1.93      0.39      0.42      0.56       0.22
E-HB(NoM)       0.94     1.91      0.39      0.34      0.53       0.20
E-HB(NoEW)      0.94     1.94      0.39      0.36      0.79       0.18
E-HB(NoM&EW)    0.94     1.91      0.39      0.35      0.67       0.19
E-FB            0.94     1.92      0.38      0.63      1.16       0.26
E-FB(NoM)       0.94     1.90      0.39      0.47      0.90       0.22
E-FB(NoEW)      0.94     1.93      0.38      0.45      1.23       0.19
E-FB(NoM&EW)    0.94     1.90      0.39      0.33      0.75       0.18
Table 1. Table showing the average MSE, average WJD-weighted MSE (wMSE), and average MAPE over all folds for a given model and feature extraction method (Best per column in bold; runners-up in bold italics). Note that MAPE of e.g. 0.2 corresponds to the estimate being 20% off on average.
Figure 6. Differences between true and predicted trill speeds of the MLP model using the E-HB(NoEW) feature set. Transitions are sorted in ascending order by their true trill speed.
Errors in wMSE are larger than equally weighted MSE; easier (faster) transitions are more common, so the more heavily weighted errors are on transitions with more noise.
The main takeaway, however, is that expert-designed features clearly perform best, probably as a function of the small maximum available dataset size. At the same time, however, the relationships between individual features are non-trivial in any representation, as the MLP clearly performs better than the linear model. We estimate a MAPE lower bound from the “anchor” transitions (Figure 2), with the median speed of each transition as its ‘true’ value. The average anchor MAPE was 0.10 (max. 0.14, min. 0.06). The best model (MAPE 0.18) is thus only 8 percentage points worse than the attainable optimum.
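One plausible way to compute such an anchor-based lower bound is sketched below: for each anchor transition, every recorded speed is compared against the median of that transition's recordings, and the per-anchor MAPEs are then averaged. The exact procedure is not spelled out in the text, so treat the details as assumptions; the recorded speeds and the names `anchor_mape` and `anchor_recordings` are invented for the example.

```python
from statistics import median


def anchor_mape(speeds):
    """MAPE of repeated recordings of one transition against their median."""
    reference = median(speeds)
    return sum(abs(s - reference) / reference for s in speeds) / len(speeds)


# Invented trill speeds (trills/s) for two anchor transitions across sessions.
anchor_recordings = {
    "anchor_1": [4.0, 5.0, 6.0],
    "anchor_2": [2.0, 2.0, 2.5, 3.0],
}
per_anchor = {name: anchor_mape(v) for name, v in anchor_recordings.items()}
lower_bound = sum(per_anchor.values()) / len(per_anchor)
```

A model cannot reasonably be expected to predict a transition's speed more precisely than repeated recordings of that same transition agree with each other, which is why this spread serves as a floor for MAPE.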
7. SAMPLING METHODS
Not all recorded transitions are equally informative, so a good sampling strategy might reduce the number of recordings necessary to achieve a target prediction error.
For the following tests, the Expert Hand-Based (No Expert Weights) features were used.¹⁵ For each sampling method, the stratified k-fold cross-validation from Section 6 was used. For every fold, the training set was then downsampled to the target n using the given sampling method. On this downsampled training set the MLP from Figure 6 was trained, and evaluated on the test set. For each n, 3 independent, differently seeded samples were tested, with the graphed value being the mean of the MAPEs of each sampling attempt. We compared three sampling methods:
Uniform sampling of recorded transitions.
Cluster-based. For every data point, its Hand-based Expert features (with Expert weights¹⁶ and MIDI enabled) were extracted. Then, a k-means clustering was run, with k = n, so that we would get one cluster per sample point we want to get, and from each cluster a random data point was uniformly selected into the training sample.
Empirical. We extracted note bigram probabilities from the Weimar Jazz Database [32] (WJD), with Laplace smoothing at α = 0.1, and sampled the transitions according to these probabilities. For notes with multiple fingerings, possible transitions were sampled uniformly. We expect this sampling to perform worse – but it roughly speaks to the data efficiency of just recording players play repertoire and extracting transition costs from that.
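The empirical sampling scheme can be illustrated with additive (Laplace) smoothing over note bigrams. The sketch below makes several assumptions not stated in the text: probabilities are normalised jointly over all ordered pairs of distinct notes, sampling is done with replacement, and the tiny corpus stands in for the Weimar Jazz Database. Function names are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def smoothed_bigram_probs(note_sequences, notes, alpha=0.1):
    """Additively smoothed probabilities over ordered pairs of distinct notes."""
    counts = Counter()
    for seq in note_sequences:
        counts.update(zip(seq, seq[1:]))
    pairs = list(permutations(notes, 2))
    total = sum(counts[p] for p in pairs) + alpha * len(pairs)
    return {p: (counts[p] + alpha) / total for p in pairs}


def sample_transitions(probs, n, seed=0):
    """Draw n transitions according to the smoothed probabilities (with replacement)."""
    rng = random.Random(seed)
    pairs = list(probs)
    return rng.choices(pairs, weights=[probs[p] for p in pairs], k=n)


# Invented one-melody corpus over three notes.
melodies = [["C", "D", "C", "E"]]
probs = smoothed_bigram_probs(melodies, ["C", "D", "E"])
sample = sample_transitions(probs, n=4)
```

With α = 0.1, unseen pairs such as E to D keep a small non-zero probability, so rare transitions can still enter the training sample, just far less often than common ones.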
As seen in Figure 7, cluster-based sampling performs best, with uniform sampling coming in a close second. Cluster-based sampling is slightly better at lower values of n, as it likely creates a more informative set of training data. However, the differences are in practice minimal. Empirical sampling performs much worse; frequently occurring transitions are not more informative.
While the uniform and cluster sampling never diverge too much, the shallow slope means that around the region of MAPE 0.25, the difference between these two sampling methods is about one recording session worth of recordings – 2 for clusters vs. 3 for uniform. Again, expert features show their usefulness.
¹⁵ Chosen because it is the variant of the slightly better-performing expert feature set that achieved the best MAPE.
¹⁶ The expert weights were specifically designed in theory for this clustering, not for the model.
Figure 7. Graph showing MAPE as training sample size increases across sampling methods. The coloured lines are the mean MAPEs over all folds at a given sample size. 0.25 mean MAPE is shown as a possible target MAPE choice at which to compare sampling strategies.
8. APPLICATION EXAMPLE
Estimating the difficulty of a saxophone part is finally done by finding the optimal path through a finite-state automaton between our first and last set of states. The edge weights are the maximum attainable trill speeds estimated by the MLP model with E-HB(NoEW) features. We do not use the measured trill speeds directly because we want our system to also be usable for other instruments for which the estimation methodology described above allowed proceeding without recording all the transitions.
Given that note values are known from the part, we can account for a given target tempo. The “fastest” (maximal) path through the part in terms of fingerings is found with Viterbi decoding over the estimated trill speeds (yielding optimal fingerings). Then, for every fingering bigram in the part, the transition difficulty is the proportion of the required “half trill” speed at the target tempo to the estimated maximum trill speed of that bigram.
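The decoding and scoring steps can be sketched as follows. Several details are assumptions made for the sake of a runnable example: the path objective is taken to be the sum of estimated trill speeds, the required “half trill” speed is derived from the duration of the note preceding each transition (one transition per note duration, so a full trill takes twice as long), and the fingering labels and speeds are invented.

```python
def best_fingering_path(candidates, max_speed):
    """Viterbi-style search for the fingering sequence with the highest total speed.

    candidates: one list of fingering ids per note in the part.
    max_speed(a, b): estimated maximum trill speed between fingerings a and b.
    """
    best = {f: (0.0, [f]) for f in candidates[0]}
    for options in candidates[1:]:
        new_best = {}
        for cur in options:
            prev = max(best, key=lambda p: best[p][0] + max_speed(p, cur))
            new_best[cur] = (best[prev][0] + max_speed(prev, cur), best[prev][1] + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


def required_trill_speed(tempo_bpm, beats):
    """Trill speed (trills/s) whose half trill lasts as long as the note."""
    note_seconds = 60.0 * beats / tempo_bpm
    return 1.0 / (2.0 * note_seconds)


def transition_difficulties(path, beats_per_transition, tempo_bpm, max_speed):
    """Required speed divided by estimated maximum speed, per fingering bigram."""
    return [
        required_trill_speed(tempo_bpm, beats) / max_speed(a, b)
        for a, b, beats in zip(path, path[1:], beats_per_transition)
    ]


# Invented estimates (trills/s) for a three-note figure with two Bb fingerings.
speeds = {
    ("A", "Bb_bis"): 6.0, ("A", "Bb_side"): 5.0,
    ("Bb_bis", "C"): 3.0, ("Bb_side", "C"): 5.5,
}


def lookup(a, b):
    return speeds[(a, b)]


path = best_fingering_path([["A"], ["Bb_bis", "Bb_side"], ["C"]], lookup)
difficulties = transition_difficulties(path, [0.5, 0.5], tempo_bpm=160, max_speed=lookup)
```

In this toy case the locally faster first transition (A to the "bis" fingering) is rejected because it leads into a slow second transition; a difficulty value approaching or exceeding 1 would indicate that the target tempo is at or beyond the estimated physical limit for that bigram.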
The outcome of one such model pass is visualised in Figure 8.¹⁷ The brightness of each notehead is given by the average of its incoming and outgoing transition difficulties. This excerpt shows the non-trivial difficulty structure of the saxophone. The low notes are harder than mid-range, especially due to complex pinky finger movement at the end of the phrase. At the same time it is sensitive to a major factor of technical difficulty – speed, with the triplets standing out. We created a Musescore 3.6 plugin implementing this visualisation (see Section 12).
¹⁷ Two full *.mxl examples are available in the Github repository (see Section 12).
Figure 8. Example difficulty visualisation tool output. The redder a note is, the more difficult the segment around it is deemed to be. Tempo for this visualisation was set to 160 BPM, which is not particularly fast for jazz [33].
9. DISCUSSION & CONCLUSION
We have proposed a method for estimating the difficulty of woodwind parts based on the “optimal path” paradigm [6] and implemented it for the tenor saxophone, providing data, methods for transition weight estimation, and showing how expert knowledge can help minimise the cost of creating such models for other instruments. It is an example of participatory, practice-based research.
However, there are also significant limitations. Most importantly, we would prefer to empirically evaluate the overall estimation, but we lack digitally encoded saxophone parts with authoritative ground truth difficulty. The ideal source would be piece grading from a musical examination board such as the ABRSM (Associated Board of the Royal Schools of Music), but digital encodings of the ABRSM-graded pieces are not available and copyright issues would prevent sharing an evaluation corpus, so difficulty evaluation data must be gathered.¹⁸
Trill speed does not directly capture the intricacies of voicing, articulation, and breath support, which are all major contributors to woodwind difficulty. To maintain discrete play states, knowledge of these mechanics could be encoded in additional expert features. Also, trill speeds can be asymmetrical. On the saxophone, going up an octave quickly is much easier than down an octave; this is not modeled.
This methodology still requires initial access to a technically proficient instrumentalist (and domain expert) to do the recordings and design the expert features and instrument encodings. Especially when using cluster-based sampling, many of the recorded transitions may be large, technical leaps that may be difficult for beginners to play.
As can be seen in Figure 2, even for the same transition, the variance of the recorded trill speed can be high. Some normalisation procedure is needed. However, Figure 2 also shows that a performer may be the fastest for one transition but the slowest in another, so a single coefficient per player is unlikely to be effective. The opportunity here is to arrive at a normalisation function that can at the same time easily personalise the transition costs to a specific player, based solely on recording a few anchor transitions.
Despite these limitations, we believe this work can serve at least as a first step towards including the woodwind family more into MIR for music pedagogy, and we look forward to how others might take inspiration and devote some more attention to these instruments.
¹⁸ This also speaks to the under-representation of woodwinds in MIR, although an audio-to-score system for the saxophone has been tried [34].
10. ACKNOWLEDGMENT STATEMENT
This work is supported by project “Human-centred AI for a Sustainable and Adaptive Society” (reg. no.: CZ.02.01.01/00/23_025/0008691), co-funded by the European Union. Computing infrastructure was provided by the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).
11. ETHICS STATEMENT
All players signed an informed consent form. All data is pseudonymised. Video recordings of sessions, taken as a backup source for checking against unexpected errors in extraction, were deleted as soon as the trill and trill speed extraction steps were finished and checked. No generative AI tools were used in producing this text; code generation was used for hints in visualisations that are used as figures.
12. DATA AND CODE ACCESSIBILITY
The raw audio data in the form of recorded trills is here: http://hdl.handle.net/11234/1-5942. This also includes the output of the trill speed extraction pipeline, as re-running the pipeline for all the data is time-consuming. All code and other implementation details can be found at www.github.com/Vobludalib/SaxophoneDifficultyModel/tree/ISMIR2025. The Musescore 3.6 plugin implementing this model can be found at www.github.com/Vobludalib/SaxophoneDifficultyModel/tree/main/plugin.
13. REFERENCES
[1] M. A. Guadagnoli and T. D. Lee, “Challenge point: a framework for conceptualizing the effects of various practice conditions in motor learning,” Journal of motor behavior, vol. 36, no. 2, pp. 212–224, 2004.
[2] J. Bugos and W. Lee, “Perceptions of challenge: the role of catastrophe theory in piano learning,” Music Education Research, vol. 17, no. 3, p. 312–326, May 2014. [Online]. Available: http://dx.doi.org/10.1080/14613808.2014.899334
[3] E. Nakamura, Y. Saito, and K. Yoshii, “Statistical learning and estimation of piano fingering,” Information Sciences, vol. 517, pp. 68–85, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0020025519311879
[4] N. Srivatsan and T. Berg-Kirkpatrick, “Checklist models for improved output fluency in piano fingering prediction,” in Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022, Bengaluru, India, December 4-8, 2022, P. Rao, H. A. Murthy, A. Srinivasamurthy, R. M. Bittner, R. C. Repetto, M. Goto, X. Serra, and M. Miron, Eds., 2022, pp. 525–531. [Online]. Available: https://archives.ismir.net/ismir2022/paper/000063.pdf
[5] D. Radicioni, L. Anselma, and V. Lombardo, “A segmentation-based prototype to compute string instruments fingering,” in Proceedings of the Conference on Interdisciplinary Musicology, vol. 17, 05 2004, pp. 15–18.
[6] S. I. Sayegh, “Fingering for string instruments with the optimum path paradigm,” Computer Music Journal, vol. 13, no. 3, pp. 76–84, 1989. [Online]. Available: http://www.jstor.org/stable/3680014
[7] J. Kuroda and G. Koutaki, “Sensing control parameters of flute from microphone sound based on machine learning from robotic performer,” Sensors, vol. 22, no. 5, p. 2074, 2022.
[8] A. Muñoz Arancón, B. Gazengel, J.-P. Dalmont, and E. Conan, “Estimation of saxophone reed parameters during playing,” The Journal of the Acoustical Society of America, vol. 139, no. 5, pp. 2754–2765, 2016.
[9] M. Weikert and J. Schlömicher-Thier, “Laryngeal movements in saxophone playing: Video-endoscopic investigations with saxophone players: A pilot study,” Journal of Voice, vol. 13, no. 2, pp. 265–273, Jun 1999. [Online]. Available: https://doi.org/10.1016/S0892-1997(99)80031-9
[10] G. P. Scavone, A. Lefebvre, and A. R. da Silva, “Measurement of vocal-tract influence during saxophone performance,” The Journal of the Acoustical Society of America, vol. 123, no. 4, pp. 2391–2400, 2008.
[11] P. Ramoneda, N. C. Tamer, V. Eremenko, X. Serra, and M. Miron, “Score difficulty analysis for piano performance education based on fingering,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 201–205.
[12] T. West, “Pitch Fingering Systems and the Search for Perfection,” in 22nd International Conference on New Interfaces for Musical Expression, NIME 2022, online, June 28 - July 1, 2022. nime.org, jun 16 2022.
[13] V. Sébastien, H. Ralambondrainy, O. Sébastien, and N. Conruyt, “Score analyzer: Automatically determining scores difficulty level for instrumental e-learning,” in 13th International Society for Music Information Retrieval Conference (ISMIR 2012), 2012, pp. 571–576.
[14] S.-C. Chiu and M.-S. Chen, “A study on difficulty level recognition of piano sheet music,” in 2012 IEEE International Symposium on Multimedia, 2012, pp. 17–23.
[15] Y. Ghatas, M. Fayek, and M. Hadhoud, “A hybrid deep learning approach for musical difficulty estimation of piano symbolic music,” Alexandria Engineering Journal, vol. 61, no. 12, pp. 10 183–10 196, 2022.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
[16] P. Ramoneda, J. J. Valero-Mas, D. Jeong, and X. Serra,
“Predicting performance difficulty from piano sheet
music images,” in Proceedings of the 24th International
Society for Music Information Retrieval Conference,
ISMIR 2023, Milan, Italy, November 5-9, 2023,
2023, pp. 708–715.
[17] P. Ramoneda, M. Lee, D. Jeong, J. J. Valero-Mas, and
X. Serra, “Can audio reveal music performance difficulty?
insights from the piano syllabus dataset,” IEEE
Transactions on Audio, Speech and Language Processing,
pp. 1129–1141, 2025.
[18] M. Suzuki, “Piano score rearrangement into multiple
difficulty levels via notation-to-notation approach,”
EURASIP Journal on Audio, Speech, and Music Processing,
vol. 2023, no. 1, p. 52, 2023.
[19] W. Nagata, S. Sako, and T. Kitamura, “Violin
fingering estimation according to skill level based
on hidden Markov model,” in Music Technology
meets Philosophy - From Digital Echos to Virtual
Ethos: Joint Proceedings of the 40th International
Computer Music Conference, ICMC 2014, and the
11th Sound and Music Computing Conference, SMC
2014, Athens, Greece, September 14-20, 2014.
Michigan Publishing, 2014. [Online]. Available:
https://hdl.handle.net/2027/spo.bbp2372.2014.189
[20] V. D’Amato, E. Volta, L. Oneto, G. Volpe, A. Camurri,
and D. Anguita, “Understanding violin players’ skill
level based on motion capture: a data-driven
perspective,” Cognitive Computation, vol. 12, pp. 1356–1369,
2020.
[21] D. S. Deconto, E. L. F. Valenga, and C. N. Silla,
“Automatic music score difficulty classification,” in 2023
30th International Conference on Systems, Signals and
Image Processing (IWSSIP), 2023, pp. 1–5.
[22] J. Wolfe, J. Smith, J. Tann, and N. H. Fletcher,
“Acoustic impedance spectra of classical and modern flutes,”
Journal of Sound and Vibration, vol. 243, no. 1, pp.
127–144, 2001.
[23] T. Smyth and M. Rouhipour, “Saxophone modelling
and system identification,” in Proceedings of Meetings
on Acoustics. ASA, 2013. [Online]. Available:
http://dx.doi.org/10.1121/1.4799622
[24] A. Almeida, R. Chow, J. Smith, and J. Wolfe, “The
kinetics and acoustics of fingering and note transitions
on the flute,” The Journal of the Acoustical Society of
America, vol. 126, no. 3, pp. 1521–1529, 2009.
[25] Y. Han and K. Lee, “Hierarchical approach to detect
common mistakes of beginner flute players,” in ISMIR,
2014, pp. 77–82.
[26] A. G. Thomson, An analysis of difficulties in sight
reading music for violin and clarinet. University of
Cincinnati, 1953.
[27] W. B. Thompson, “Music sight-reading skill in flute
players,” The Journal of General Psychology, vol. 114,
no. 4, pp. 345–352, 1987.
[28] P. de la Cuadra, B. Fabre, N. Montgermont, and
C. Chafe, “Analysis of flute control parameters: A
comparison between a novice and an experienced
flautist,” Acta Acustica united with Acustica, vol. 94,
no. 5, pp. 740–749, 2008.
[29] J. W. Kim, J. Salamon, P. Li, and J. P. Bello, “Crepe:
A convolutional representation for pitch estimation,”
in 2018 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), 2018, pp.
161–165.
[30] B. McFee, M. McVicar, D. Faronbi, I. Roman,
M. Gover, S. Balke, S. Seyfarth, A. Malek, C. Raffel,
V. Lostanlen, B. van Niekirk, D. Lee, F. Cwitkowitz,
F. Zalkow, O. Nieto, D. Ellis, J. Mason, K. Lee,
B. Steers, E. Halvachs, C. Thomé, F. Robert-Stöter,
R. Bittner, Z. Wei, A. Weiss, E. Battenberg, K. Choi,
R. Yamamoto, C. Carr, A. Metsai, S. Sullivan,
P. Friesch, A. Krishnakumar, S. Hidaka, S. Kowalik,
F. Keller, D. Mazur, A. Chabot-Leclerc, C. Hawthorne,
C. Ramaprasad, M. Keum, J. Gomez, W. Monroe, V. A.
Morozov, K. Eliasi, nullmightybofo, P. Biberstein,
N. D. Sergin, R. Hennequin, R. Naktinis, beantowel,
T. Kim, J. P. Åsen, J. Lim, A. Malins, D. Hereñú,
S. van der Struijk, L. Nickel, J. Wu, Z. Wang,
T. Gates, M. Vollrath, A. Sarroff, Xiao-Ming,
A. Porter, S. Kranzler, Voodoohop, M. D. Gangi,
H. Jinoz, C. Guerrero, A. Mazhar, toddrme2178,
Z. Baratz, A. Kostin, X. Zhuang, C. T. Lo,
P. Campr, E. Semeniuc, M. Biswal, S. Moura,
P. Brossier, H. Lee, and W. Pimenta, “librosa/librosa:
0.10.2.post1,” May 2024. [Online]. Available:
https://doi.org/10.5281/zenodo.11192913
[31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,
B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg et al., “Scikit-learn: Machine
learning in Python,” Journal of Machine Learning
Research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[32] M. Pfleiderer, K. Frieler, J. Abeßer, W.-G. Zaddach,
and B. Burkhart, Eds., Inside the Jazzomat - New
Perspectives for Jazz Research. Schott Campus, 2017.
[33] G. L. Collier and J. L. Collier, “An exploration
of the use of tempo in jazz,” Music Perception,
vol. 11, no. 3, pp. 219–242, 1994. [Online]. Available:
http://dx.doi.org/10.2307/40285621
[34] J. C. Martínez-Sevilla, M. Alfaro-Contreras, J. J.
Valero-Mas, and J. Calvo-Zaragoza, “Insights into
end-to-end audio-to-score transcription with real
recordings: A case study with saxophone works,” in
INTERSPEECH 2023, ser. interspeech_2023. ISCA, Aug.
2023, pp. 2793–2797.
