A Systematic Evaluation of Real-Time Audio Score Following for Piano Performance

Author: Jiyun Park; Carlos Eduardo Cancino-Chacón; Suhit Chiruthapudi; Juhan Nam
Publisher: Zenodo
DOI: 10.5281/zenodo.17706341
Source: https://zenodo.org/records/17706341/files/000011.pdf
MATCHMAKER: AN OPEN-SOURCE LIBRARY FOR REAL-TIME PIANO
SCORE FOLLOWING AND SYSTEMATIC EVALUATION
Jiyun Park¹∗ Carlos Cancino-Chacón²∗
Suhit Chiruthapudi² Juhan Nam¹
¹ Graduate School of Culture Technology, KAIST, South Korea
² Institute of Computational Perception, Johannes Kepler University Linz, Austria
{june,juhan.nam}@kaist.ac.kr,
{carlos.cancino_chacon,suhit.chiruthapudi}@jku.at
ABSTRACT
Real-time music alignment, also known as score following, is a fundamental MIR task with a long history and is essential to many interactive applications. Despite its importance, there has been no unified open framework for comparing models, largely due to the inherent complexity of real-time processing and to language- or system-dependent implementations. In addition, low compatibility with the existing MIR environment has made it difficult to develop benchmarks using the large datasets that have become available in recent years. While new studies based on established methods (e.g., dynamic programming, probabilistic models) have emerged, most evaluations compare models only within the same family or on small sets of test data. This paper introduces Matchmaker, an open-source Python library for real-time music alignment that is easy to use and compatible with modern MIR libraries. Using it, we systematically compare methods along two dimensions: music representations and alignment methods. We evaluate our approach on a large test set of solo piano music from the (n)ASAP, Batik, and Vienna4x22 datasets with a comprehensive set of metrics to ensure robust assessment. Our work aims to establish a benchmark framework for score-following research while providing a practical tool that developers can easily integrate into their applications.
1. INTRODUCTION
Real-time music alignment, also known as score following, is the task of aligning performance data to the corresponding position in the musical score in real time. Ever since it was first introduced independently by Roger Dannenberg [1] and Barry Vercoe [2] over 40 years ago, music alignment has become one of the fundamental MIR tasks.
* Equal contribution.
© J. Park, C. Cancino-Chacón, S. Chiruthapudi and J. Nam. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: J. Park, C. Cancino-Chacón, S. Chiruthapudi and J. Nam, “Matchmaker: An Open-Source Library for Real-Time Piano Score Following and Systematic Evaluation”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
Score following is a necessary component of many interactive applications (e.g., automatic accompaniment systems [3–6], automatic page turning [7, 8], lyrics alignment or singing-voice tracking [9–11], audio-visual/multimodal systems [6, 12], and visualizations [13]). Music alignment began as real-time score following [1, 2, 14–17] but, by the mid-90s, had diverged into online and offline methods (see, e.g., early offline work by Desain et al. [18]).
From its early use on monophonic sources like voice [17] and wind instruments, score following has grown to support polyphonic instruments such as piano, ensembles, and even full orchestral performances [17, 19–21]. Research has also expanded across input modalities of the performance, with systems operating on audio or MIDI, and across score representations, including string format, symbolic score, and sheet image [22].
The score following challenge [23] in MIREX laid the foundation for formalizing the evaluation framework, introducing important metrics that account for real-time considerations. However, many subsequent studies have been developed in different environments, ranging from system-dependent [24, 25] to language-dependent [26, 27] implementations, often tailored to specific use cases and without publicly shared source code. As a result, implementations became fragmented across platforms, making it difficult to extend, reproduce, or compare methods in a unified setting. This has hindered the development of a unified evaluation framework, and comparisons of methods or features on shared datasets remain rare, limiting generalizability and reproducibility.
In this paper, we address these challenges by proposing a unified, open framework for the evaluation and benchmarking of real-time audio-based score following. Considering public datasets that offer a range of difficulty levels, multiple renditions, and precise beat-level annotations, we base our evaluation on three representative piano performance datasets. We implement this framework as an open-source Python package called Matchmaker,¹ which allows real-time execution of representative baseline score-following algorithms. In addition to benchmarking, it supports audio device input and has been validated in application contexts through a standalone demo system.
¹ https://github.com/pymatchmaker/matchmaker
2. A CONCEPTUAL FRAMEWORK FOR SCORE
FOLLOWING
As a way to organize and compare the components of score-following systems, we follow the structure proposed by Müller [28]. This framework consists of three core components: (1) input music representations, (2) features, and (3) online alignment algorithms.
2.1 Music Representation
Score following aligns a fixed reference derived from musical scores with a time-evolving input from a performance. The score can take various symbolic formats (e.g., MIDI, MusicXML) or sheet images, and is typically converted into an intermediate representation such as synthesized audio or event sequences. The performance input may be given as either audio or MIDI, each with distinct representational and computational characteristics. Audio input is continuous and latency-sensitive, while MIDI is discrete and event-based. Instrumental factors also affect alignment design: polyphonic or discrete-pitch instruments (e.g., piano) differ from continuous-pitch sources (e.g., violin, voice). Multi-instrument recordings pose further challenges due to timbral overlap and source ambiguity.
2.2 Features
Chroma features are the most commonly used features in music synchronization, with many variants of their computation [29–32]. Other works use various spectral features such as constant-Q transforms (CQT) [27, 33], non-negative matrix factorization (NMF)-based features [34], or spectral templates [35] for improved polyphonic alignment. Beyond spectral representations, context-aware features such as onset-based features [36] or beat-synchronous frames have been introduced to capture temporally salient events useful for alignment. Later work explored learned features, including feedforward mappings [27], semi-supervised decompositions like NMF, and more recent neural approaches [37]. While these offer richer contextual information, they often rely on fixed-length inputs and introduce latency, making real-time usage more challenging.
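To make the chroma idea concrete, the sketch below folds a single STFT magnitude frame into a 12-bin chroma vector. It is a simplified, hypothetical illustration (nearest-semitone binning of each FFT bin, squared magnitudes, L2 normalization, A4 = 440 Hz); the variants cited above [29–32] and the feature extraction in Matchmaker may differ in filter banks, compression, and normalization.

```python
import math


def chroma_from_spectrum(magnitudes, sample_rate, n_fft, f_ref=440.0):
    """Fold one STFT magnitude frame into an L2-normalized 12-bin chroma vector.

    Bin k has center frequency k * sample_rate / n_fft. Its energy (squared
    magnitude) is added to the pitch class of the nearest equal-tempered MIDI
    pitch, with MIDI 69 = f_ref (A4). Index 0 is C, index 9 is A.
    """
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        freq = k * sample_rate / n_fft
        if freq < 27.5:  # skip DC and bins below A0 (also avoids log2(0))
            continue
        midi = round(69 + 12 * math.log2(freq / f_ref))
        chroma[midi % 12] += mag * mag
    norm = math.sqrt(sum(c * c for c in chroma))
    return [c / norm for c in chroma] if norm > 0 else chroma
```

For example, with `sample_rate=44100` and `n_fft=4410` (10 Hz per bin), a single peak at bin 44 (440 Hz) puts all of the energy into pitch class 9 (A).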
2.3 Alignment Algorithms
Two major families of alignment algorithms have been used in score following: dynamic programming and probabilistic models.
The dynamic programming approach, especially dynamic time warping (DTW), aligns two sequences by minimizing a cumulative cost. Its online variant, On-Line Time Warping (OLTW) [38], enables causal alignment within a fixed-size window. Variants include windowed [39], parallel [40], and constrained DTW [40, 41], as well as tempo-aware extensions [21, 42].
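The DTW recurrence and the idea of restricting it to a window around the current estimate can be illustrated with the following minimal sketch. It is not Dixon's OLTW [38] nor Matchmaker's OLTWDixon or OLTWArzt, which grow the cost matrix adaptively and use their own step weights. As simplifying assumptions, the window is centered on the current estimate, cosine distance is the local cost, and the cumulative cost is normalized by t + j + 1 before the position is chosen.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


class WindowedOnlineDTW:
    """Minimal windowed online DTW (a simplified sketch, not Dixon's full OLTW).

    For each incoming performance frame t, the cumulative cost is updated only
    for score frames j inside a window around the current position, using
        D[t, j] = c(t, j) + min(D[t-1, j], D[t-1, j-1], D[t, j-1]).
    """

    def __init__(self, reference, window=10):
        self.ref = reference   # score feature sequence (list of vectors)
        self.window = window   # half-width of the search window, in frames
        self.prev = None       # D[t-1, :] from the previous step
        self.t = 0             # index of the current performance frame
        self.position = 0      # current estimated score frame

    def step(self, frame):
        n = len(self.ref)
        lo = max(0, self.position - self.window)
        hi = min(n, self.position + self.window + 1)
        cur = [math.inf] * n
        for j in range(lo, hi):
            left = cur[j - 1] if j > 0 else math.inf  # score advances alone
            if self.prev is None:
                # first performance frame: the path starts at score frame 0
                best = 0.0 if j == 0 else left
            else:
                diag = self.prev[j - 1] if j > 0 else math.inf
                best = min(self.prev[j], diag, left)
            cur[j] = cosine_distance(frame, self.ref[j]) + best
        # heuristic normalization so that longer paths are not penalized
        self.position = min(range(lo, hi), key=lambda j: cur[j] / (self.t + j + 1))
        self.prev = cur
        self.t += 1
        return self.position
```

Feeding a performance that repeats each reference frame twice (half the tempo of the reference) yields the positions 0, 0, 1, 1, 2, 2, ..., showing how the warping path absorbs the tempo difference.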
Probabilistic state-space models offer an alternative by treating alignment as latent state inference under uncertainty [24, 29, 43]. HMM-based systems model each note as a sequence of states (e.g., attack–steady–release), with extensions including semi-Markov [44], hybrid [19], and Bayesian variants [45]. Kalman filter models and switching state-space systems [46, 47] further incorporate tempo dynamics, while particle filters [12, 29] handle multimodal uncertainty in real time.
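A probabilistic follower of this kind can be sketched as forward filtering in a left-to-right HMM with one state per score event. This is a deliberately reduced, hypothetical example: a single state per event (no attack–steady–release sub-states), a fixed stay probability `p_stay`, no tempo model, and an emission likelihood `exp(sharpness * cosine similarity)` chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def forward_step(belief, frame, templates, p_stay=0.8, sharpness=10.0):
    """One forward-filtering step of a left-to-right HMM (simplified sketch)."""
    n = len(belief)
    # Predict: each state either stays (p_stay) or advances to the next state.
    predicted = [0.0] * n
    for i, b in enumerate(belief):
        if i + 1 < n:
            predicted[i] += b * p_stay
            predicted[i + 1] += b * (1.0 - p_stay)
        else:
            predicted[i] += b  # the final state is absorbing
    # Update: weight each state by a likelihood based on template similarity.
    weighted = [p * math.exp(sharpness * cosine_similarity(frame, tpl))
                for p, tpl in zip(predicted, templates)]
    total = sum(weighted)
    return [w / total for w in weighted]


def follow(frames, templates):
    """Return the most probable score state after each frame."""
    belief = [1.0] + [0.0] * (len(templates) - 1)  # start at the first event
    positions = []
    for frame in frames:
        belief = forward_step(belief, frame, templates)
        positions.append(max(range(len(belief)), key=belief.__getitem__))
    return positions
```

Because the state can only stay or move forward by one event per frame, a single frame that resembles a distant event cannot make the estimate jump there, which is one source of the "sticky" behavior discussed later in the paper.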
Other paradigms include early string-matching algorithms [1] and reinforcement learning-based approaches for multimodal or visual score alignment [48].
3. IMPLEMENTATION
3.1 Python Package Structure
Matchmaker is an open-source Python package that implements representative real-time music alignment algorithms within a modular, extensible framework. Figure 1 gives an overview of the package and the whole pipeline. The current version of Matchmaker provides two types of algorithms: (1) online time warping, with two variants: OLTWDixon, based on the methods proposed in [38, 49], and OLTWArzt, based on [21, 50]; and (2) an HMM-based algorithm, similar to the one used in [3, 47]. A full description of the algorithms and their parameters can be found in the supplementary Appendix.²
Matchmaker supports two main usage scenarios: (1) live streaming mode using the audio device and (2) simulation mode, which processes a performance file as input. Figure 2 shows an example of running live streaming mode with the default settings. The AudioStream object handles the input stream by chunking the audio with overlapping windows to avoid padding artifacts. Both the synthesized score audio and the performance audio are passed to a Processor object that performs feature extraction. The extracted features are pushed into a queue and consumed by the OnlineAlignment object, which runs the alignment methods in real time. Matchmaker accepts musical scores in any symbolic format supported by partitura³ (MusicXML, MIDI, MEI, etc.). The returned output is the current position in the score, expressed in beats as the musical unit according to the time signature of the piece. A more detailed description and the API documentation of the package are available online.⁴
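The producer/consumer flow described above (overlapping chunks, feature extraction, a queue, and an alignment consumer) can be sketched with the standard library's `threading` and `queue` modules. All function names here are hypothetical placeholders rather than Matchmaker's AudioStream, Processor, or OnlineAlignment classes, and the input is an in-memory list rather than a live audio device.

```python
import queue
import threading


def frames_with_overlap(samples, frame_size, hop_size):
    """Yield analysis frames; consecutive frames share frame_size - hop_size samples."""
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        yield samples[start:start + frame_size]


def run_pipeline(samples, frame_size, hop_size, feature_fn, align_fn):
    """Producer thread extracts features into a queue; the caller consumes them."""
    feature_queue = queue.Queue()

    def producer():
        for frame in frames_with_overlap(samples, frame_size, hop_size):
            feature_queue.put(feature_fn(frame))
        feature_queue.put(None)  # sentinel: end of stream

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    positions = []
    while True:
        feature = feature_queue.get()  # blocks until a feature is available
        if feature is None:
            break
        positions.append(align_fn(feature))
    worker.join()
    return positions
```

Decoupling the two sides through a queue lets feature extraction keep pace with the audio input even when an individual alignment step is momentarily slower.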
3.2 Design and Implementation Details
We provide a simple and user-friendly interface to run score following with minimal setup. As shown in Figure 2, users can instantiate a Matchmaker object with a score file and call run(), which iterates over the estimated score position at each step. To streamline real-time processing, the AudioStream class is implemented as a context manager that automatically handles stream initialization and teardown. Furthermore, the alignment process is designed as a generator, enabling users to receive score positions concurrently while the alignment is still in progress.
² https://pymatchmaker.github.io/ismir2025_supplementary_materials/
³ https://github.com/CPJKU/partitura
⁴ https://pymatchmaker.readthedocs.io/
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
Figure 1. Overview of the score following package.
    from matchmaker import Matchmaker

    mm = Matchmaker(
        score_file="path/to/score.musicxml",
        input_type="audio",
    )
    for current_position in mm.run():
        print(current_position)
Figure 2. A code example of running Matchmaker in live streaming mode.
This design allows for efficient real-time integration without requiring users to manage multiple threads, buffers, or callbacks explicitly.
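The combination of a context-managed stream and a generator-based run loop can be illustrated as follows. `ToyStream` and `run` are hypothetical stand-ins, not Matchmaker's classes. The point of the sketch is that placing the `with` block inside the generator guarantees teardown both when iteration finishes and when the caller stops early, because closing a generator raises `GeneratorExit` inside the `with` block.

```python
class ToyStream:
    """Hypothetical audio-stream stand-in that opens on enter and closes on exit."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.is_open = False

    def __enter__(self):
        self.is_open = True   # a real stream would open the audio device here
        return self

    def __exit__(self, exc_type, exc, tb):
        self.is_open = False  # always released, even on errors or early exit
        return False          # do not suppress exceptions

    def __iter__(self):
        return iter(self.chunks)


def run(stream, align_fn):
    """Generator: yield one score position per chunk while the stream is open."""
    with stream:
        for chunk in stream:
            yield align_fn(chunk)
```

A caller can write `for position in run(stream, align): ...` and stop at any point; the stream is released when the generator is closed, either explicitly with `close()` or, in CPython, when the generator object is garbage-collected.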
While the online mode uses a multi-threaded queue for asynchronous audio buffering, the simulation mode processes audio chunks in advance within a single-threaded setup. Decoupling real-time I/O concerns from core alignment evaluation in this way is intended to avoid variability due to the Python version, OS-level threading, or queuing delays, ensuring a consistent and reproducible benchmarking environment. In addition, for efficiency, OLTWArzt is implemented in Cython [51], a superset of Python designed to achieve C-like performance by incorporating C data types and optimizing the execution of Python code.
4. EXPERIMENTS
4.1 Datasets
We use three public piano performance datasets: (n)ASAP [52], Batik [53], and Vienna4x22 [54], each offering complementary characteristics for benchmarking score following. (n)ASAP, a subset of the MAESTRO dataset with note-level score alignments, includes expressive performances of technically demanding solo piano pieces, offering high difficulty and stylistic diversity. We use only the pieces in the MAESTRO 2 test split. Vienna4x22 provides 22 distinct renditions of each of four relatively easy pieces, which makes it suitable for testing robustness to interpretive variation. The Batik dataset contains recordings of 12 Mozart sonatas by a single pianist and has the longest average piece duration among the three datasets, enabling evaluation on long-form classical repertoire.
We use the ground-truth beat-level annotations provided with the (n)ASAP dataset and extract equivalent annotations for Batik and Vienna4x22 from the .match files [55], which contain note-wise score–performance alignments. In addition, we incorporate the difficulty level of each piece according to G. Henle Publishers,⁵ which provides a 1-to-9 grading scale. The pieces used in our experiments span levels 4 through 9, representing a diverse set of works above the intermediate level. Table 1 provides detailed statistics of the datasets.

Dataset      #Pieces  #Perf.  #Beats   #Notes    Dur. (h)  Difficulty
(n)ASAP      43       59      26,329   100,958   2.65      6.53
Batik        30       30      18,789   102,421   2.85      5.67
Vienna4x22   4        88      13,728   43,656    2.24      4.88
Total        77       177     58,846   247,035   7.74      6.11
Table 1. Datasets used in the evaluation.
We only included performances in the experiment that achieved an MAE of less than 100 ms in the offline test, using the synctoolbox⁶ with Chroma & DLNCO features. The evaluation was conducted on 184 performances across 93 pieces, totaling over 58,000 beats and 247,000 notes, with an overall duration of 7.74 hours of performances and a piece-wise average difficulty of 6.11.
4.2 Experiment Settings
We conducted all evaluations under simulation-based conditions to ensure reproducibility. Live testing was avoided due to variability introduced by room acoustics and hardware setup, which complicates fair comparison across systems. The accuracy tests were carried out on an Intel i9-9900K CPU (8 cores/16 threads @ 3.6 GHz) with Python 3.9, a sample rate of 44.1 kHz, and a frame rate of 30 frames per second, chosen to balance latency and alignment accuracy. We tested chromagram, mel-spectrogram, constant-Q transform (CQT), mel-frequency cepstral coefficients (MFCCs) [56], and a simple STFT-based onset-sensitive representation similar to the one used by Dixon [38], which we name log-spectral energy (LSE). While all features were evaluated, we report detailed latency and accuracy metrics for the best-performing configuration of each model. To account for hardware variability, latency was measured on multiple setups (an Intel i9-9900K, an Apple M4 Mac mini, and an Apple M2 Pro MacBook), and the reported latency values are averaged across these devices.

⁵ https://www.henle.de/Levels-of-Difficulty/
⁶ https://github.com/meinardmueller/synctoolbox

Figure 3. Two examples of error calculation using the mapping function. (a) shows a one-to-many alignment at the evaluation point, while (b) illustrates a skipped alignment.
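The experiment settings imply a hop of 44,100 / 30 = 1,470 samples, i.e., about 33.3 ms between feature frames. The sketch below computes this and shows one plausible form of an onset-sensitive log-energy feature: the half-wave rectified difference of log band energies between consecutive frames. This feature definition is an assumption for illustration; the exact LSE computation in Matchmaker and in Dixon's system [38] (e.g., its band layout) may differ.

```python
import math

SAMPLE_RATE = 44_100  # Hz, as in the experiment settings
FRAME_RATE = 30       # feature frames per second
HOP_SIZE = SAMPLE_RATE // FRAME_RATE       # 1470 samples between frames
HOP_MS = 1000.0 * HOP_SIZE / SAMPLE_RATE   # about 33.3 ms per frame


def onset_sensitive_frame(prev_bands, cur_bands, eps=1e-6):
    """Hypothetical onset-sensitive feature: keep only increases in log band
    energy between consecutive frames (half-wave rectified difference)."""
    return [max(0.0, math.log(c + eps) - math.log(p + eps))
            for p, c in zip(prev_bands, cur_bands)]
```

Keeping only energy increases emphasizes note onsets and suppresses decaying sustained notes, which is one intuition for why an onset-sensitive feature suits piano, whose notes decay after the attack.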
4.3 Preprocessing
In the preprocessing step (see Fig. 1), the symbolic scores are synthesized to audio using FluidSynth, provided through partitura. Since MusicXML files often lack tempo markings, we set the synthesis tempo to each performance's average tempo, rounded to the nearest 20 BPM, assuming that performers follow approximate tempo indications.
To generate beat annotations for the synthesized score audio, we computed beat positions from the synthesis tempo and the score's time signature. For compound meters (e.g., 6/8, 9/8, and 12/8), we adopted (n)ASAP's beat annotation rules, counting them as two, three, and four beats per measure, respectively, across all datasets so that score-side annotations align with performance annotations. Based on the synthesized audio, we then extract features using the same Processor as in the online phase, but precompute them offline for the entire score sequence.
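The tempo and beat-grid steps of the preprocessing can be sketched as below. The average tempo is estimated here from the first and last annotated beats, and exact halves are rounded up to the next multiple of 20 BPM; both are assumptions, since the text does not specify them. The meter rule encodes only the compound meters listed above (6/8, 9/8, 12/8) and otherwise uses the numerator, which is likewise an assumption.

```python
import math


def average_tempo(beat_times_s):
    """Mean tempo in BPM from annotated beat times (first to last beat)."""
    span = beat_times_s[-1] - beat_times_s[0]
    return 60.0 * (len(beat_times_s) - 1) / span


def round_tempo(bpm, step=20):
    """Round to the nearest multiple of `step` BPM; exact halves round up."""
    return step * math.floor(bpm / step + 0.5)


def beats_per_measure(numerator, denominator):
    """Compound meters 6/8, 9/8, 12/8 count 2, 3, 4 beats; otherwise use the numerator."""
    if denominator == 8 and numerator in (6, 9, 12):
        return numerator // 3
    return numerator


def synthesized_beat_times(n_beats, bpm):
    """Beat onsets (s) in score audio rendered at a constant tempo."""
    return [k * 60.0 / bpm for k in range(n_beats)]
```

For instance, a performance averaging 113 BPM would be rendered at 120 BPM, so the synthesized beats of a 6/8 piece fall every 0.5 s, with two beats per measure.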
5. EVALUATION
Evaluating score following is challenging due to causality, timing precision, and output latency. Since the MIREX challenge [23] provided foundational metrics, later studies have introduced alternative evaluation strategies, including beat-level evaluations of asynchrony [3], reflecting the task's frequent integration with automatic accompaniment systems.
In this work, we adopt two complementary evaluation perspectives. First, we evaluate in the performance domain, where errors are measured in milliseconds based on ground-truth annotations aligned to the audio. This approach is commonly used in audio-to-score alignment research and enables precise, frame-level evaluation, since the annotations directly reflect the actual timing of the performance. Second, we evaluate in the score domain, with errors measured in beat units as suggested in [29, 57], which better reflects the nature of score following as the task of predicting the corresponding score position at each moment of the performance.
Figure 4. Defined delay types of the system. Only system delay is considered in the experiments.
5.1 Evaluation Metrics
We select evaluation metrics mostly adapted from the MIREX score-following benchmark [23] and audio-to-score alignment (ASA) metrics [57]. We use the Alignment Rate (AR) within a tolerance range |θe|, varying from 50 ms to 2000 ms. We also compute Absolute Errors (AE), both in milliseconds and in beats, from which we derive the Average Absolute Error (AAE) and the Median Absolute Error (MAE), along with the standard deviation σe. To further characterize the distribution of errors, we report kurtosis and skewness, which capture the peakedness and asymmetry of the signed (non-absolute) error distribution, respectively. In addition, we report the average latency µlat, defined as the system delay from the detection time to the end of inference. Unlike total latency, it excludes hardware latency and consists of two parts: (i) feature processing and (ii) execution of the online alignment algorithm for each frame step (see Fig. 4). Errors exceeding 2 seconds (or 2 beats in the score domain) are excluded from the AE calculations, including both AAE and MAE, to avoid distortion from unbounded tracking failures. We report AR in two ways. The averaged piece-wise AR is a common measure, while the total AR reflects the proportion of successfully aligned beat events across the entire dataset. The latter avoids overrepresenting shorter pieces and provides a more balanced view of overall performance.
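The metric definitions above can be summarized in a short, standard-library sketch. Assumptions not fixed by the text: σ is computed over the kept absolute errors, kurtosis is reported as excess (Fisher) kurtosis, population moments are used, and errors beyond the 2000 ms cutoff still count as misses in AR while being dropped from the error statistics.

```python
import statistics


def alignment_rate(errors, tol):
    """Fraction of events whose absolute error is within the tolerance."""
    return sum(abs(e) <= tol for e in errors) / len(errors)


def shape_stats(values):
    """Population skewness and excess kurtosis of signed errors."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def summarize(errors, cutoff=2000.0):
    """AAE, MAE (median), sigma, skewness, kurtosis over errors with |e| <= cutoff.

    Assumes at least one error lies within the cutoff.
    """
    kept = [e for e in errors if abs(e) <= cutoff]
    abs_kept = [abs(e) for e in kept]
    skew, kurt = shape_stats(kept)
    return {
        "AAE": statistics.fmean(abs_kept),
        "MAE": statistics.median(abs_kept),
        "sigma": statistics.pstdev(abs_kept),
        "skew": skew,
        "kurt": kurt,
    }


def piecewise_and_total_ar(errors_per_piece, tol):
    """Average of per-piece AR versus AR pooled over all events."""
    piecewise = statistics.fmean(alignment_rate(errs, tol) for errs in errors_per_piece)
    pooled = [e for errs in errors_per_piece for e in errs]
    return piecewise, alignment_rate(pooled, tol)
```

With one piece of four beats (three within 50 ms) and one of two beats (one within 50 ms), the piece-wise AR is (0.75 + 0.5) / 2 = 0.625, while the total AR is 4 / 6 ≈ 0.667. The short piece weighs as much as the long one in the first number but not in the second.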
To evaluate runtime latency under simulation, we measure two components: the average duration of extracting features from incoming audio frames, and the time taken by the alignment process to consume features and predict score positions. Specifically, latency was computed from the moment audio was read to the time the score position was predicted, excluding hardware I/O delays. This two-step measurement allows for standardized latency reporting independent of the hardware setup.
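The two-component latency measurement can be illustrated with `time.perf_counter`, a monotonic high-resolution clock. `extract` and `align` are hypothetical callables standing in for feature processing and one alignment step. Because the frames are already in memory, hardware I/O is excluded, mirroring the simulation setup.

```python
import statistics
import time


def measure_latency(frames, extract, align):
    """Mean per-frame latency (ms) of feature extraction and of one alignment step.

    Frames are already in memory, so hardware I/O is not included.
    """
    feature_ms, align_ms = [], []
    for frame in frames:
        t0 = time.perf_counter()
        feature = extract(frame)
        t1 = time.perf_counter()
        align(feature)
        t2 = time.perf_counter()
        feature_ms.append((t1 - t0) * 1000.0)
        align_ms.append((t2 - t1) * 1000.0)
    return statistics.fmean(feature_ms), statistics.fmean(align_ms)
```

Under this sketch, µlat for a given configuration would be the sum of the two returned averages.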
5.2 Alignment Mapping Function
Given the alignment path, the alignment mapping function is applied to transfer the beat positions on one axis (either performance or score) to the other axis in order to compute the alignment error. Due to the local, stepwise nature of real-time alignment, the resulting path is not necessarily monotonic and may contain multiple correspondences or skipped positions, depending on the implementation and purpose of each method. Unlike the linear interpolation methods commonly used in offline audio-to-score alignment, which assume continuous mappings, our evaluation relies only on predictions made prior to or at each evaluation time point. To reflect this, we define the mapping function as follows:

$$\hat{u}_k = \min\left\{ u_i \;\middle|\; (u_i, v_i) \in W,\; v_i = \max\{ v_j \mid (u_j, v_j) \in W,\; v_j \le k \} \right\},$$

where $W = \{(u_i, v_i)\}$ is the warping path expressed in frame indices: $u_i$ is the score-rendered-audio frame index and $v_i$ is the performance-audio frame index. The inner max finds the latest performance frame $v_i$ not exceeding the current frame $k$, and the outer min selects the smallest score frame $u_i$ among those alignments. This mapping relies solely on past or current frames to maintain causality. It handles skipped or one-to-many mappings and avoids any interpolation that depends on future frames.

Dataset     Method      AAE (ms)↓ ± σ    MAE (ms)↓  Skew.  Kurt.   Piece-wise AR (%)↑                         Total AR (%)↑
                                                                   ≤50ms  ≤100ms  ≤500ms  ≤1000ms  ≤2000ms    (≤2000ms)
(n)ASAP     OLTWDixon   189.55 ± 281.55  97.09      3.20   17.97   40.3   58.5    82.5    88.3     92.0       89.4
            OLTWArzt    183.56 ± 263.95  91.18      0.75   11.79   44.1   58.3    84.8    92.0     95.1       92.8
            HMM         487.73 ± 423.27  346.01     0.18   3.33    15.6   22.2    37.5    43.8     43.8       43.8
Batik       OLTWDixon   186.97 ± 262.55  104.40     3.75   24.70   28.2   51.7    82.1    85.2     87.6       89.4
            OLTWArzt    193.36 ± 269.13  107.15     1.00   12.63   35.9   53.0    82.2    87.4     90.3       89.7
            HMM         693.63 ± 376.58  641.77     0.11   0.98    4.5    10.8    34.0    46.2     64.2       61.9
Vienna4x22  OLTWDixon   285.43 ± 390.82  132.73     1.57   5.90    26.6   43.2    72.4    80.0     85.5       82.5
            OLTWArzt    300.41 ± 368.70  152.51     0.50   3.93    33.2   44.5    73.3    84.3     86.7       86.7
            HMM         439.64 ± 427.02  319.13     0.15   3.79    23.5   33.3    51.1    57.1     63.0       75.9
Table 2. Evaluation results on three datasets using different score-following methods. The piece-wise alignment rate (AR) is measured as the average over pieces, while the total AR indicates the global proportion of aligned beat events across the entire dataset. All tests were conducted with STFT-based Chroma as features.
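The causal mapping function of Section 5.2 translates directly into code. In this sketch, `path` is a hypothetical list of `(u, v)` pairs (score-audio frame, performance frame). The function returns the smallest score frame aligned to the latest performance frame not exceeding `k`, or `None` if the path has not yet reached any such frame.

```python
def map_to_score(path, k):
    """Causal mapping u_hat_k: smallest score frame aligned to the latest
    performance frame v <= k, using only past and current path entries.

    path: list of (u, v) pairs; u = score-audio frame, v = performance frame.
    Returns None if no path entry with v <= k exists yet.
    """
    past = [v for _, v in path if v <= k]
    if not past:
        return None
    latest = max(past)                                # inner max over v_j <= k
    return min(u for u, v in path if v == latest)     # outer min over u_i
```

For the path [(0, 0), (1, 1), (1, 2), (2, 2), (3, 2), (5, 4)], frame k = 3 maps to score frame 1, because the latest performance frame is 2, which has a one-to-many alignment to score frames 1, 2, and 3. Frame k = 4 maps to score frame 5, a skipped alignment. At 30 frames per second, one frame of offset corresponds to about 33.3 ms.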
6. RESULTS
Table 2 presents a comparison of alignment methods based on performance-domain evaluation, measured in milliseconds. All methods exhibit positive skewness in the error distribution, reflecting the expected lag of the beat estimates in real-time alignment. The overall results show that the OLTW-based methods outperform the HMM baseline across all datasets in both alignment accuracy and coverage. While OLTWDixon and OLTWArzt show comparable MAE depending on the dataset, OLTWArzt consistently achieves higher coverage (Total AR), suggesting that it is more robust against overall failures. The difference likely stems from OLTWDixon skipping uncertain regions, while OLTWArzt's “backward-forward” strategy corrects early misalignments and enhances coverage. Despite having the lowest AR, the HMM shows the lowest skewness and kurtosis, primarily because significant errors (>2 s) are excluded from the summary statistics and because its “sticky” behavior of lingering in the same state in local regions tends to narrow the error distribution.
Table 3 presents an evaluation in beat units, offering a tempo-normalized perspective. The overall trend mirrors the performance-domain results in milliseconds, but these results are standardized across tempi. For the OLTW methods, AAE remains around 0.3 beats, with median values typically below 0.2. Total AR is lower than the 2000 ms-based metric in every case except the HMM on (n)ASAP, reflecting that most pieces have tempi above 60 BPM, where two beats span less than two seconds.

Dataset     Method      AAE (beats)↓ ± σ   MAE (beats)↓   AR (%)↑
(n)ASAP     OLTWDixon   0.22 ± 0.27        0.13           83.4
            OLTWArzt    0.27 ± 0.30        0.16           85.2
            HMM         0.80 ± 0.54        0.66           76.9
Batik       OLTWDixon   0.20 ± 0.27        0.11           88.9
            OLTWArzt    0.29 ± 0.34        0.18           88.8
            HMM         0.80 ± 0.38        0.67           59.3
Vienna4x22  OLTWDixon   0.31 ± 0.33        0.19           78.3
            OLTWArzt    0.37 ± 0.38        0.24           84.0
            HMM         0.76 ± 0.78        0.51           70.3
Table 3. Beat-level evaluation results, including total alignment rate (AR, %).

Feature processing                     Online alignment
Type     MAE (ms)   Latency (ms)       Method      Latency (ms)
Chroma   265.50     3.05               OLTWDixon   1.22
mel      297.92     3.40               OLTWArzt    0.07
CQT      341.25     42.58              HMM         3.59
LSE      241.85     0.91
MFCC     931.81     2.58
Table 4. Comparison of feature types and alignment methods in terms of alignment error (MAE) and latency. LSE is the log-spectral energy feature adopted in [38]. Latency values are averaged over the hardware setups evaluated in Section 4.
In addition, Table 4 reports a comparison of the feature types and of the latencies of the alignment methods. Among the features, log-spectral energy (LSE) shows the lowest MAE (241.85 ms) and delay (0.91 ms), indicating strong performance with minimal overhead. In contrast, CQT and MFCC yield higher MAE, with CQT also requiring considerable extraction time (42.58 ms), which limits its real-time suitability. For alignment methods, OLTWArzt achieves the lowest latency (0.07 ms), whereas HMM shows noticeably higher delay (3.59 ms) due to its computational complexity. These results highlight a trade-off between alignment accuracy and runtime efficiency, with LSE and OLTWArzt providing a favorable balance for low-latency use.

Figure 5. A scatter plot of median absolute error (MAE) and Henle's difficulty level in the (n)ASAP and Batik datasets. The MAE results are from OLTWArzt.
The results also reveal characteristics of the datasets. While the overall alignment performance on (n)ASAP and Batik is comparable, Vienna4x22 shows noticeably larger errors and error spread for the OLTW methods (higher AAE, MAE, and σ in Table 2). This reflects the dataset's unique structure, with 22 diverse renditions of each of only four pieces, which leads to substantial variability in expressive timing, articulation, and interpretation. These variations present additional challenges to score following and result in wider error distributions, as seen in the larger standard deviations.
7. DISCUSSIONS
Figure 5 further illustrates the relationship between musical difficulty and alignment accuracy for (n)ASAP and Batik. We observe a moderate positive correlation (r = 0.24, p = 0.022) between MAE and the annotated difficulty levels, indicating that technically more demanding pieces tend to produce larger alignment errors. Vienna4x22 was excluded from this analysis due to its use of short excerpts, which makes consistent difficulty grading unreliable.
To further understand how alignment behavior differs across methods, Figure 6 shows example alignment results comparing OLTWArzt (left) and HMM (right). While OLTWArzt smoothly follows the beat events, the HMM warping path shows frequent horizontal segments, indicating a “sticky” tendency to stay near note onsets; this reflects its state-based formulation, which emphasizes onset transitions. This leads to cases where it lingers on sustained notes and becomes locally stuck, showing limited forward momentum. The corresponding region (highlighted in yellow) exhibits changes in harmony, note density, and dynamics compared to the preceding passage, which provide sufficient contrast for the score follower to recover.
Lastly, we found that not only the choice of evaluation metrics but also the way alignment errors are computed (Section 5.2) can affect accuracy results to a meaningful extent. Small differences in error calculation sometimes led to noticeable shifts in reported accuracy.
Figure 6. Two examples of alignment paths with beat positions: (left) OLTWArzt, (right) HMM.
8. USE CASES AND APPLICATIONS
To demonstrate the practicality of our package, we built a lightweight web application that runs locally with real-time audio input or pre-recorded files. Built on websocket-based communication, the system responds quickly enough to keep perceptual delay minimal. Our companion website includes a video demonstration and a link to the source code. This application aims to help researchers test their own score following models in an interactive setting. Beyond the web demo, our package is also used as the score following module in the ACCompanion [3], a real-time accompaniment system. These applications demonstrate the versatility of our framework and validate its utility in interactive music scenarios.
9. CONCLUSIONS AND FUTURE WORK
We presented a systematic framework for real-time audio-based score following in the form of the open-source Python package Matchmaker. It supports live and simulation-based evaluation with baseline models, enabling reproducible benchmarking across datasets and features. Experiments on three public piano datasets show that the OLTWArzt variant achieves the highest performance and that the onset-sensitive spectral feature (LSE) outperforms chroma in both accuracy and latency. However, the current framework is limited in its support of the tempo models commonly integrated with HMM-based score followers, which may partly explain the limited performance of the HMM baseline. Also, recent works often include learned features or multimodal input, which pose a new challenge for evaluation. Although our evaluation was limited to classical piano, extending Matchmaker to other instruments and genres requires only adapting the appropriate datasets and feature extraction modules. Future work will extend the framework to support a wider variety of instruments and musical styles, and include additional feature representations, advanced tempo modeling, and multimodal inputs.
10. ACKNOWLEDGMENTS
This work has been supported by the Austrian Science Fund (FWF), grant agreement PAT 8820923 (“Rach3: A Computational Approach to Study Piano Rehearsals”). Additionally, this work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) under Grant RS-2023-NR077289.
11. REFERENCES
[1] R. B. Dannenbe g, “An On-Line Algo i hm o Real-
Time Accompanimen ,” in P oceedings o he In e na-
ional Compu e Music Con e ence (ICMC ’84), Pa is,
F ance, 1984.
[2] B. Ve coe, “The Syn he ic Pe o me in he Con ex
o Li e Pe o mance,” in P oceedings o he In e na-
ional Compu e Music Con e ence (ICMC ’84), Pa is,
F ance, 1984.
[3] C. Cancino-Chacon, S. Pe e , P. Hu, E. Ka ys inaios,
F. Henkel, F. Fosca in, N. Va ga, and G. Widme , “The
ACCompanion: Combining Reac i i y, Robus ness,
and Musical Exp essi i y in an Au oma ic Piano
Accompanis ,” in P oceedings o he In e na ional
Join Con e ence on A i icial In elligence (IJCAI-23),
Macao, SAR, China, May 2023, a Xi :2304.12939
[cs, eess]. [Online]. A ailable: h p://a xi .o g/abs/
2304.12939
[4] K. A ms ong, T.-C. Hung, J.-X. Huang, and Y.-W.
Liu, “Real- ime piano accompanimen model ained
on and e alua ed acco ding o human ensemble cha ac-
e is ics,” in P oceedings o he Sound and Music Com-
pu ing (SMC), Po o, Po ugal, 2024.
[5] C. Raphael, “Music Plus One and Machine Lea ning,”
in P oceedings o he 27 h In e na ional Con e ence on
Machine Lea ning (ICML 2010), Hai a, Is ael, 2010.
[6] A. Maezawa, “I Go Rhy hm, so Follow Me Mo e:
Modeling Sco e-Dependen Timing Synch oniza ion
in a Piano Due ,” in P oceedings o he Sound and Mu-
sic Compu ing Con e ence (SMC 2024), Po o, Po u-
gal, 2024.
[7] A. A z , G. Widme , and S. Dixon, “Au oma ic Page
Tu ning o Musicians ia Real-Time Machine Lis en-
ing,” in P oceedings o he Eu opean Con e ence on
A i icial In elligence (ECAI), 2008.
[8] F. Henkel, S. Schwaige , and G. Widme , “Fully
Au oma ic Page Tu ning on Real Sco es,” in Ex ended
Abs acs o he La e-B eaking Demo Session o he
22nd In e na ional Socie y o Music In o ma ion
Re ie al Con e ence (ISMIR 2021), Online, 2021,
a Xi :2111.06643 [cs]. [Online]. A ailable: h p:
//a xi .o g/abs/2111.06643
[9] C. B azie and G. Widme , “Towa ds Reliable Real-
ime Ope a T acking: Combining Alignmen wi h
Audio E en De ec o s o Inc ease Robus ness,” in
P oceedings o he Sound and Music Compu ing
Con e ence, Online, 2020, a Xi :2006.11033 [cs,
eess]. [Online]. A ailable: h p://a xi .o g/abs/2006.
11033
[10] J. Pa k, S. Yong, T. Kwon, and J. Nam, “A Real-
Time Ly ics Alignmen Sys em Using Ch oma and
Phone ic Fea u es o Classical Vocal Pe o mance,” in
ICASSP 2024 - 2024 IEEE In e na ional Con e ence
on Acous ics, Speech and Signal P ocessing (ICASSP).
Seoul, Ko ea, Republic o : IEEE, Ap . 2024, pp.
1371–1375. [Online]. A ailable: h ps://ieeexplo e.
ieee.o g/documen /10445926/
[11] R. Gong, P. Cu illie , N. Obin, and A. Con , “Real-
ime audio- o-sco e alignmen o singing oice based
on melody and ly ic in o ma ion,” in In e speech
2015. ISCA, Sep. 2015, pp. 3312–3316. [Online].
A ailable: h ps://www.isca-a chi e.o g/in e speech_
2015/gong15_in e speech.h ml
[12] T. O suka, K. Nakadai, T. Takahashi, T. Oga a, and
H. G. Okuno, “Real-Time Audio- o-Sco e Alignmen
Using Pa icle Fil e o Coplaye Music Robo s,”
EURASIP Jou nal on Ad ances in Signal P ocessing,
ol. 2011, no. 1, p. 384651, Dec. 2011. [Online].
A ailable: h ps://asp-eu asipjou nals.sp inge open.
com/a icles/10.1155/2011/384651
[13] O. La illo , C. Cancino-Chacon, and C. B azie ,
“Real-Time Visualisa ion o Fugue Played by a S ing
Qua e ,” in P oceedings o he Sound and Music Com-
pu ing Con e ence (SMC 2020), Online, 2020.
[14] R. Dannenbe g and B. Mon -Reynaud, “Following
an Imp o isa ion in Real Time,” in P oceedings o
he In e na ional Compu e Music Con e ence, Cham-
paign/U bana, Illinois, USA, 1987.
[15] R. B. Dannenbe g and H. Mukaino, “New Techniques
o Enhanced Quali y o Compu e Accompanimen ,”
in P oceedings o he In e na ional Compu e Music
Con e ence, Cologne, Ge many, 1988.
[16] B. Ve coe and M. Pucke e, “Syn he ic Rehea sal:
T aining he Syn he ic Pe o me ,” in P oceedings o
he In e na ional Compu e Music Con e ence (ICMC
’85), Vancou e , BC, Canada, 1985.
[17] M. Pucke e, “Sco e ollowing using he sung oice,”
in P oceedings o he In e na ional Compu e Music
Con e ence (ICMC ’95), Ban , AB, Canada, 1995.
[18] P. Desain, H. Honing, and H. Heijink, "Robust Score-Performance Matching: Taking Advantage of Structural Information," in Proceedings of the International Computer Music Conference (ICMC), Thessaloniki, Greece, 1997.
[19] C. Raphael and Y. Gu, "Orchestral Accompaniment for a Reproducing Piano," in Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada, 2009.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
[20] M. Prockup, D. Grunberg, A. Hrybyk, and Y. E. Kim, "Orchestral Performance Companion: Using Real-Time Audio to Score Alignment," IEEE MultiMedia, vol. 20, no. 2, pp. 52–60, Apr. 2013. [Online]. Available: http://ieeexplore.ieee.org/document/6530591/
[21] A. Arzt and G. Widmer, "Real-Time Music Tracking Using Multiple Performances as a Reference," in Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain, 2015.
[22] F. Henkel, S. Balke, M. Dorfer, and G. Widmer, "Score Following as a Multi-Modal Reinforcement Learning Problem," Transactions of the International Society for Music Information Retrieval, vol. 2, no. 1, pp. 67–81, Nov. 2019. [Online]. Available: http://transactions.ismir.net/articles/10.5334/tismir.31/
[23] A. Cont, D. Schwarz, N. Schnell, and C. Raphael, "Evaluation of Real-Time Audio-to-Score Alignment," in Music Information Retrieval Evaluation eXchange (MIREX 2007), Vienna, Austria, 2007.
[24] A. Cont, "Antescofo: Anticipatory Synchronization and Control of Interactive Parameters in Computer Music," in Proceedings of the International Computer Music Conference (ICMC '08), Belfast, Ireland, 2008.
[25] J. Echeveste, P. Cuvillier, and A. Cont, "Improved Synchronization of a Pre-Recorded Music Accompaniment on a User's Music Playing," U.S. Patent US 2023/0 082 086 A1, Mar., 2023.
[26] S. Dixon and G. Widmer, "MATCH: A Music Alignment Tool Chest," in Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK, 2005.
[27] C. Joder, S. Essid, and G. Richard, "Learning Optimal Features for Polyphonic Audio-to-Score Alignment," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2118–2128, Oct. 2013. [Online]. Available: http://ieeexplore.ieee.org/document/6525340/
[28] M. Muller, F. Kurth, and T. Roder, "Towards an Efficient Algorithm for Automatic Score-to-Audio Synchronization," in Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, 2004.
[29] Z. Duan and B. Pardo, "A state space model for online polyphonic audio-score alignment," in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Prague, Czech Republic: IEEE, May 2011, pp. 197–200. [Online]. Available: http://ieeexplore.ieee.org/document/5946374/
[30] P.-W. Chou, F.-N. Lin, K.-N. Chang, and H.-Y. Chen, "A Simple Score Following System for Music Ensembles Using Chroma and Dynamic Time Warping," in Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. Yokohama, Japan: ACM, Jun. 2018, pp. 529–532. [Online]. Available: https://dl.acm.org/doi/10.1145/3206025.3206090
[31] M. Muller, "Music Synchronization," in Fundamentals of Music Processing. Cham: Springer International Publishing, 2021, pp. 119–170. [Online]. Available: https://link.springer.com/10.1007/978-3-030-69808-9_3
[32] M. Pérez Fernández, H. Kirchhoff, and X. Serra, "A comparison of pitch chroma extraction algorithms," in Proceedings of the 19th Sound and Music Computing Conference (SMC/JIM/IFC 2022). Saint-Étienne, France: SMC Network, 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6573082
[33] C.-T. Chen, J.-S. R. Jang, W.-S. Liu, and C.-Y. Weng, "An efficient method for polyphonic audio-to-score alignment using onset detection and constant Q transform," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai: IEEE, Mar. 2016, pp. 2802–2806. [Online]. Available: http://ieeexplore.ieee.org/document/7472188/
[34] J. J. Carabias-Orti, F. J. Rodriguez-Serrano, P. Vera-Candeas, and N. Ruiz-Reyes, "An Audio to Score Alignment Framework Using Spectral Factorization and Dynamic Time Warping," in Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain, 2015.
[35] F. Korzeniowski and G. Widmer, "Refined Spectral Template Models for Score Following," in Proceedings of the Sound and Music Computing Conference (SMC 2013), Stockholm, Sweden, 2013.
[36] S. Ewert, M. Muller, and P. Grosche, "High resolution audio synchronization using chroma onset features," in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009, pp. 1869–1872.
[37] A. Pillay, "A Neural Score Follower for Computer Accompaniment of Polyphonic Musical Instruments," Master's thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2024.
[38] S. Dixon, "An On-Line Time Warping Algorithm for Tracking Musical Performances," in Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 05), Edinburgh, Scotland, 2005.
[39] R. Macrae and S. Dixon, "Polyphonic Score Following Using on-Line Time Warping," in Music Information Retrieval Evaluation eXchange (MIREX 2008), Philadelphia, USA, 2008.
[40] F. J. Rodriguez-Serrano, P. Vera-Candeas, and J. J. Carabias-Orti, "A Real-Time Score Follower for MIREX 2015," in Music Information Retrieval Evaluation eXchange (MIREX 2015), Malaga, Spain, 2015.
[41] J. J. Carabias, F. J. Rodriguez, and P. Vera, "A Real-Time NMF-Based Score Follower for MIREX 2012," in Music Information Retrieval Evaluation eXchange (MIREX 2012), Porto, Portugal, 2012.
[42] A. Arzt, "Real-Time Music Tracking Using Tempo-Aware on-Line Dynamic Time Warping," in Proceedings of the Vienna Talk on Musical Acoustics (VITA), Vienna, Austria, 2010.
[43] P. Cano, A. Loscos, and J. Bonada, "Score-Performance Matching using HMMs," in Proceedings of the International Computer Music Conference, Beijing, China, 1999.
[44] E. Nakamura, P. Cuvillier, and A. Cont, "Autoregressive Hidden Semi-Markov Model of Symbolic Music Performance for Score Following," in Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), Malaga, Spain, 2015. [Online]. Available: https://archives.ismir.net/ismir2015/paper/000015.pdf
[45] C. Raphael, "A Bayesian Network for Real-Time Musical Accompaniment," in Proceedings of the 14th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2001, pp. 1433–1439. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2001/file/2b0 658cb d284984 b11d90254081 -Paper.pdf
[46] R. Yamamoto, S. Sako, and T. Kitamura, "Real-Time Audio to Score Alignment Using Segmental Conditional Random Fields and Linear Dynamical System," in Music Information Retrieval Evaluation eXchange (MIREX 2012), Porto, Portugal, 2012.
[47] Y. Jiang and C. Raphael, "Score Following with Hidden Tempo Using a Switching State-Space Model," in Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), Online, 2020.
[48] M. Dorfer, F. Henkel, and G. Widmer, "Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game," in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018), Paris, France, 2018.
[49] S. Dixon, "Live Tracking of Musical Performances using On-Line Time Warping," in Proceedings of the 8th International Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, 2005.
[50] A. Arzt and G. Widmer, "Towards Effective Any-Time Music Tracking," in Proceedings of the Starting AI Researchers Symposium (STAIRS), held at ECAI 2010, Lisbon, Portugal, 2010.
[51] S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn, and K. Smith, "Cython: The best of both worlds," Computing in Science & Engineering, vol. 13, no. 2, pp. 31–39, 2011.
[52] S. D. Peter, C. E. Cancino-Chacón, F. Foscarin, A. P. McLeod, F. Henkel, E. Karystinaios, and G. Widmer, "Automatic Note-Level Score-to-Performance Alignments in the ASAP Dataset," Transactions of the International Society for Music Information Retrieval, vol. 6, no. 1, pp. 27–42, Jun. 2023. [Online]. Available: http://transactions.ismir.net/articles/10.5334/tismir.149/
[53] P. Hu and G. Widmer, "The Batik-plays-Mozart Corpus: Linking Performance to Score to Musicological Annotations," in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.
[54] W. Goebl, "The Vienna 4x22 piano corpus," 1999. [Online]. Available: http://dx.doi.org/10.21939/4X22
[55] F. Foscarin, E. Karystinaios, S. D. Peter, C. Cancino-Chacón, M. Grachten, and G. Widmer, "The match file format: Encoding alignments between scores and performances," in Proceedings of the Music Encoding Conference (MEC 2022), Halifax, Canada.
[56] C. Brazier and G. Widmer, "Addressing the Recitative Problem in Real-time Opera Tracking," in Proceedings of Frontiers of Research in Speech and Music FRSM 2020, Online, Oct. 2020, arXiv:2010.11013 [eess].
[57] A. Morsi and X. Serra, "Bottlenecks and solutions for audio to score alignment research," in Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), 2022, pp. 272–279.