
Quantifying Regularity in Music Structure Analysis

Author: Brian McFee
Publisher: Zenodo
DOI: 10.5281/zenodo.17706327
Source: https://zenodo.org/records/17706327/files/000004.pdf
QUANTIFYING REGULARITY IN MUSIC STRUCTURE ANALYSIS
Brian McFee
Music and Audio Research Lab, New York University
[email protected]
ABSTRACT
This article describes objective measures of segment regularity for use in evaluating musical structure annotations. The core idea derives from identifying simple ratio relationships between segment durations (e.g., 2:1 or 3:4), and can be implemented in either musical time (beats) or absolute time (seconds). Extensions are proposed to further quantify regularity within labeled segment groups, across hierarchical levels, and evaluate balance or uniformity of segment durations. The efficacy of the proposed methods is demonstrated through an empirical study of several standard datasets for music structure analysis.

The results indicate: 1) under reasonable assumptions of tempo stability, regularity can be reliably measured in absolute time, 2) most existing datasets exhibit regularity, 3) regularity interacts meaningfully with segment labelling, 4) regularity and balance are distinct concepts, and 5) multi-level segmentations exhibit cross-level regularity.
1. INTRODUCTION
Automatic music structure analysis can be thought of as being driven by four fundamental principles: homogeneity, novelty, repetition, and regularity. The first three principles have been fruitfully exploited in algorithm design, e.g., the design of self-similarity matrices as an intermediate representation for boundary detection and section labeling. Similarly, these principles have led to the design of evaluation criteria which quantify the agreement between two annotations under one or more principles (e.g., boundary detection metrics quantify agreement in novelty). The regularity principle, however, has proven to be somewhat trickier to integrate into algorithm design and evaluation. While a few methods have been proposed to promote or enforce regularity among automatically generated segmentations, there is at present no systematic method for quantifying the regularity of a temporal segmentation.

This paper describes a family of quantitative metrics to assess the regularity of temporal segmentations. The proposed metrics include formulations for unlabeled, labeled, and hierarchical segmentations, and do not depend on beat or downbeat estimations. Using these metrics, we analyze the reference annotations provided in several commonly used datasets for music structure analysis. The goal of this work is not to propose new algorithms for structure analysis, but rather to gain insights about how "regularity" manifests in existing structural annotations.

© B. McFee. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: B. McFee, "Quantifying regularity in music structure analysis", in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
2. BACKGROUND AND RELATED WORK
Within the music information retrieval field, there is a well-established literature on music structure analysis, and multiple studies have proposed taxonomies of perceptual and musical properties used to inform the design of algorithms for the task [1–3]. In this work, we follow the most recent survey by Nieto et al. [3], which extends the earlier taxonomy of Paulus et al. [2] to include four governing principles of music structure analysis: homogeneity (segments tend toward self-similarity), novelty (segment boundaries coincide with perceptible changes), repetition (segments consist of and may be identified by repeating sequences), and most relevant to the current study: regularity, which broadly concerns the distribution of segment durations.

Regularity has been invoked in various forms by algorithm developers, though it is relatively under-explored compared to the other governing principles. Sargent et al. proposed an explicit objective to penalize deviation from expected segment durations (measured in beats) [4, 5]. While Sargent's penalty is a monotonic function of difference from the expected duration, Marmoret et al. proposed a penalty that promotes durations of specific multiples of bar lengths, e.g., preferring segment durations to align with specific integer multiples of bars (8, 4, or 2) [6].

Other authors have proposed implicit models of segment regularity. McFee and Ellis proposed a clustering-based segmentation algorithm in which segments are penalized in proportion to their duration, and the influence of this penalty was optimized over training data [7]. Maezawa proposed a recurrent neural network model to learn the distribution of segment durations from training data [8]. In both cases, the notion of regularity is data-driven and implicit, rather than deriving from explicitly coded domain knowledge. The end result is qualitatively similar, in that the algorithms are incentivized to produce some segmentations over others according to the distribution of durations.
Outside of algorithm design, relatively little focus has been placed on identifying or quantifying regularity in music structure analysis. Most closely related is the work of Smith and Goto, who characterized the distribution of segment durations in the SALAMI dataset [9, 10]. Their study yielded several findings: most notably from the perspective of regularity is the observation that the durations of adjacent segments tend to exhibit simple integer ratios. Smith and Goto exploited this and related observations derived from estimated segment duration to inform the selection of segmentation algorithms in an ensemble method.

Figure 1. Examples of regular and irregular segmentations determined by $d_1$ (blue) and $d_2$ (orange). The patterned regions illustrate the largest unit which divides both $d_1$ and $d_2$, and multiples of this unit are marked by dashed lines. Panel labels, top to bottom: $\rho(1, 1) = 1$, $\rho(1, 2) = 1$, $\rho(2, 3) = 1/2$, $\rho(3, 4) = 1/3$, $\rho(3, 5) = 1/3$.
This work synthesizes and extends the above notions of regularity. The core idea is a direct extension of the "simple integer ratio" observation of Smith and Goto [9]. It generalizes and formalizes the definitions of regularity proposed by Sargent [5] and Marmoret [6], while also supporting analysis in absolute time rather than relying on potentially inaccurate beat and downbeat estimation.
3. METHODS
Although regularity may seem like a straightforward concept, most prior work stops short of providing a formal definition. For example, Sargent et al. define regularity as "segments of comparable size" or "conforming to a specific segment model" (i.e., close to an expected value), and translate this high-level description into a penalty term that scales by divergence from an expected duration. This intuition captures situations where segments have uniform durations (fig. 1, top row), but does not include situations where one duration divides another (fig. 1, second row).
3.1 Temporal divisibility
The proposed notion of regularity derives from the question: what do two segment durations have in common? When two segments have equal duration (the most "regular" configuration possible), the answer is everything. The same is true, in a sense, when one duration divides the other: the longer duration consists of "regular" repetitions of the shorter duration. More interesting cases arise when the two durations are not integer multiples of each other, e.g., the pair (2, 3). In such cases, we can divide the segments into smaller pieces until a common unit is found that fits evenly into both. The less division is needed to achieve this, the more "regular" the segments appear.
This intuition can be formalized in terms of the greatest common divisor (gcd): in the case above, $\gcd\{2, 3\} = 1$ (see footnote 1). Normalizing by the smaller of the two durations yields a simple expression that formalizes the question above:

$$\rho(d_1, d_2) := \frac{\gcd\{d_1, d_2\}}{\min\{d_1, d_2\}}. \tag{1}$$

Regularity of two segments is thus defined as the largest unit of relative time that divides both durations.

Footnote 1: We will assume for now that durations are integer-valued; the real-valued extension is described in section 3.2.
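As a minimal sketch of eq. (1), assuming integer-valued durations (e.g., measured in beats), the pairwise score can be computed with the Python standard library. The function name `rho` is chosen here for illustration only and is not taken from any released implementation.

```python
from math import gcd


def rho(d1: int, d2: int) -> float:
    """Pairwise regularity, eq. (1): gcd{d1, d2} / min{d1, d2}."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("durations must be positive integers")
    return gcd(d1, d2) / min(d1, d2)


# The five duration pairs used as examples in Figure 1.
for pair in [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5)]:
    print(pair, rho(*pair))
```

Equal durations and exact multiples both score 1, while the pair (2, 3) needs to be halved once to find a common unit and therefore scores 1/2.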
Because gcd and min are both associative operators, eq. (1) would directly generalize to support more than two segments under comparison (see footnote 2). However, both operators are sensitive to single input elements: a prime number may dominate the gcd calculation, while a small number would dominate the min calculation. This could in turn lead to an overly sensitive metric if applied naively to an entire segmentation. Instead, we may aggregate over pairwise comparisons between distinct segments:

$$R(S) := \frac{2}{|S| \cdot (|S| - 1)} \sum_{d_1 \neq d_2 \in S} \rho(d_1, d_2), \tag{2}$$

where $S$ denotes the set of segment durations. This approach is more robust and lends itself naturally to useful extensions by restricting the pairs under comparison, as demonstrated in sections 3.4 and 3.5.
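The aggregate in eq. (2) can be sketched as a mean over unordered pairs of distinct segments. This is an illustrative implementation under the assumption of integer durations; the name `regularity` is hypothetical, and the single-segment convention (score of 1) follows footnote 2.

```python
from itertools import combinations
from math import gcd


def rho(d1: int, d2: int) -> float:
    """Pairwise regularity, eq. (1)."""
    return gcd(d1, d2) / min(d1, d2)


def regularity(durations: list[int]) -> float:
    """Aggregate regularity R(S), eq. (2): mean rho over unordered segment pairs."""
    if not durations:
        raise ValueError("at least one segment is required")
    if len(durations) == 1:
        return 1.0  # single-segment convention (footnote 2)
    # combinations() yields |S|(|S|-1)/2 pairs, so the mean matches eq. (2).
    pairs = list(combinations(durations, 2))
    return sum(rho(a, b) for a, b in pairs) / len(pairs)


print(regularity([8, 8, 16, 4]))  # every pair divides evenly
print(regularity([8, 8, 16, 7]))  # one prime-length segment lowers three of six pairs
```

Averaging over pairs limits the influence of the single 7-unit segment to the three pairs it participates in, which is the robustness argument made above.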
3.2 Musical time and absolute time
In eq. (1), durations $d_1$ and $d_2$ are assumed to be integer-valued so that gcd is well-defined. This is reasonable when time is measured in musical time (beats), but it is not directly applicable to durations measured in absolute time (seconds).

To resolve this, a two-stage pre-processing of durations is implemented. First, as in most segmentation evaluation metrics [11], durations are quantized with respect to a fixed frame rate $f_r$ (e.g., 10 Hz), so that $d \mapsto \lfloor d / f_r \rfloor$. This produces integer-valued durations measured in frames, though it is still possible that approximately commensurate durations would achieve a small $\rho$-value due to sampling and rounding in the floor operation. To combat this, in the second stage, $\rho$ is computed for values $d_1 + \delta$ and $d_2 + \epsilon$ within a tolerance window $-w \le \delta, \epsilon \le w$. As in boundary detection metrics [11], the default $w$ corresponds to 0.5 seconds, or equivalently, one beat at 120 BPM (see footnote 3). The offsets $\delta, \epsilon$ are chosen to maximize the score as follows:

$$\tilde\rho(d_1, d_2) := \max_{-w \le \delta, \epsilon \le w} \rho\left( \left\lfloor \frac{d_1 + \delta}{f_r} \right\rfloor, \left\lfloor \frac{d_2 + \epsilon}{f_r} \right\rfloor \right). \tag{3}$$

This modification allows eq. (3) to gracefully support tempo variations and small deviations of boundary positions, while still capturing the core principle of eq. (1). $\tilde R(S)$ is analogously defined as the average over all pairwise duration comparisons. The maximization in eq. (3) is performed by computing $\rho$ over a grid of $(2w/f_r + 1) \times (2w/f_r + 1)$ sample points, which under the default values described below, is $11 \times 11$ and efficient in practice.
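The two-stage procedure can be sketched as follows. This is only an illustration under stated assumptions: durations in seconds are converted to whole frames by multiplying by a frame rate given in Hz (equivalently, dividing by the frame period of 0.1 s at 10 Hz), the offsets are sampled at whole-frame steps, and offset durations shorter than one frame are skipped in the spirit of footnote 3. The names `rho_tilde`, `frame_rate`, and `window` are hypothetical.

```python
from math import floor, gcd


def rho(d1: int, d2: int) -> float:
    """Pairwise regularity, eq. (1)."""
    return gcd(d1, d2) / min(d1, d2)


def rho_tilde(d1: float, d2: float, frame_rate: float = 10.0, window: float = 0.5) -> float:
    """Tolerance-window regularity for durations in seconds (sketch of eq. (3))."""
    steps = int(round(window * frame_rate))  # whole-frame offsets on each side of zero
    q1 = floor(d1 * frame_rate)              # stage 1: quantize to frames
    q2 = floor(d2 * frame_rate)
    best = 0.0
    # Stage 2: search a (2*steps + 1) x (2*steps + 1) grid of offsets.
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            a, b = q1 + i, q2 + j
            if a < 1 or b < 1:
                continue  # keep offset durations at least one frame long
            best = max(best, rho(a, b))
    return best


print(rho_tilde(16.25, 31.75))              # close to a 1:2 ratio
print(rho_tilde(16.25, 31.75, window=0.0))  # no tolerance: quantization only
```

With the default values assumed here the grid has 11 x 11 points, and the pair (16.25 s, 31.75 s) is matched to the nearby frame counts (160, 320). Without the tolerance window the quantized durations 162 and 317 are relatively prime, so the score collapses to 1/162.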
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
Table 1. Empirical mean $\tilde\rho$ for uniformly sampled segment durations $d_1 \le d_2$ in the range [4, 60] seconds, for different frame rates $f_r$ (Hz) and tolerance window $w$ (seconds).

            w = 0    w = 0.25    w = 0.5
f_r = 40    0.009    0.349       0.477
f_r = 20    0.016    0.343       0.477
f_r = 10    0.029    0.261       0.477
f_r = 2     0.111    0.111       0.427
3.3 Properties
Before going into extensions and applications, it is worth pausing to take note of a few properties of $\rho$, $\tilde\rho$, and $R$.

Boundedness: Since $\gcd\{d_1, d_2\} \le \min\{d_1, d_2\}$, eqs. (1) and (3) are bounded at 1, with equality when $d_2$ is an integer multiple of $d_1$ (or vice versa). The minimal value $1/\min\{d_1, d_2\}$ is achieved by relatively prime $(d_1, d_2)$.

Scale-invariance: For any positive rational $c$ such that $c \cdot d_1$ and $c \cdot d_2$ are integers, $\rho(c \cdot d_1, c \cdot d_2) = \rho(d_1, d_2)$.

Attainable values: $\rho(d_1, d_2) = 1/N$ for some positive integer $N$. This is because $c = 1/\gcd\{d_1, d_2\}$, which satisfies the scale-invariance property above, implies $\rho(d_1, d_2) = \rho(c \cdot d_1, c \cdot d_2) = 1/\min\{c \cdot d_1, c \cdot d_2\}$.

Expected value: Table 1 reports the empirical mean $\tilde\rho$ for uniformly sampled duration pairs over the range [4, 60] seconds. For the proposed default tolerance of $w = 0.5$, the mean $\tilde\rho \approx 0.477$ is stable for frame rates $f_r$ of 10 Hz and above.
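The first three properties can be checked numerically on small integers. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that the "1/N" property can be tested without floating-point tolerance; the exhaustive range and the scale factor 3 are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import gcd


def rho(d1: int, d2: int) -> Fraction:
    """Pairwise regularity, eq. (1), as an exact fraction."""
    return Fraction(gcd(d1, d2), min(d1, d2))


for d1 in range(1, 40):
    for d2 in range(1, 40):
        r = rho(d1, d2)
        assert 0 < r <= 1                   # boundedness
        assert r.numerator == 1             # attainable values are 1/N
        assert rho(3 * d1, 3 * d2) == r     # scale-invariance (c = 3)
        if max(d1, d2) % min(d1, d2) == 0:
            assert r == 1                   # equality for integer multiples
        if gcd(d1, d2) == 1:
            assert r == Fraction(1, min(d1, d2))  # minimum for coprime pairs

print(rho(6, 9), rho(2, 3))  # both reduce to the same ratio
```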
3.4 Extension 1: Section labels
Equation (2) averages over all unordered pairs of distinct segments. It often occurs that not all segments are relevant to include in this comparison: for example, introductory silences or crowd noise may exist outside of musical time and therefore not participate meaningfully in regularity. Similarly, sections with significant deviations in tempo from the remainder of the recording may result in low scores under eq. (3), and a case could be made that these should be treated separately.

More generally, one may consider a notion of restricted regularity that only compares segments with the same section label (e.g., verse or chorus). Under suitable labeling conventions, this view encapsulates the examples listed above, and provides a simple mechanism to exclude segments with sporadically occurring labels. This idea can be implemented with a straightforward modification of eq. (2) where a collection of distinct segment pairs $P \subset S \times S$ is provided rather than the entire segmentation $S$:

$$R_L(P) = \frac{1}{|P|} \sum_{(d_1, d_2) \in P} \rho(d_1, d_2). \tag{4}$$
Footnote 2: The associative property of gcd and min also implies that the edge case of a segmentation consisting of only one segment should produce a score of 1. This convention is adopted here.

Footnote 3: $\delta$ is constrained to $d + \delta \ge f_r$ so that eq. (3) is well-defined.
Label agreement is a simple way to generate the pair set $P$, though the definition above supports other schemes, e.g., automatic hierarchy expansion (for approximate agreement) [12]. Relatedly, the temporal proximity observation of Smith and Goto [9] can be implemented here by generating pairs of sequentially adjacent durations:

$$P = \{(d_i, d_{i+1}) \mid 0 \le i < |S| - 1\}.$$
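Both pair-generation schemes can be sketched in a few lines. The helper names (`mean_rho`, `label_pairs`, `adjacent_pairs`) and the example segmentation are hypothetical, and returning 1 for an empty pair set is an assumption made here by analogy with the single-segment convention, not something stated in the text.

```python
from itertools import combinations
from math import gcd


def rho(d1: int, d2: int) -> float:
    """Pairwise regularity, eq. (1)."""
    return gcd(d1, d2) / min(d1, d2)


def mean_rho(pairs) -> float:
    """Restricted regularity R_L(P), eq. (4): mean rho over the given pairs."""
    pairs = list(pairs)
    if not pairs:
        return 1.0  # assumption: treat "nothing to compare" like a single segment
    return sum(rho(a, b) for a, b in pairs) / len(pairs)


def label_pairs(durations, labels):
    """Pairs of distinct segments that share a section label."""
    segments = list(zip(durations, labels))
    return [(d1, d2) for (d1, l1), (d2, l2) in combinations(segments, 2) if l1 == l2]


def adjacent_pairs(durations):
    """Sequentially adjacent pairs (d_i, d_{i+1})."""
    return list(zip(durations[:-1], durations[1:]))


durations = [4, 16, 32, 16, 32, 6]  # e.g., in beats
labels = ["intro", "verse", "chorus", "verse", "chorus", "outro"]
print(mean_rho(label_pairs(durations, labels)))  # same-label pairs only
print(mean_rho(adjacent_pairs(durations)))       # neighbours only
```

In this toy example the labeled score ignores the intro and outro entirely, since neither label repeats, whereas the adjacent-pair score is lowered by the (32, 6) transition into the outro.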
3.5 Extension 2: Hierarchical regularity
Equation (2) can be modified to evaluate the regularity of hierarchical segmentations. Note that eq. (2) operates on pairs of durations, but it does not require that the segments under comparison are disjoint in time or form a valid segmentation. If $H = (S_0, S_1, \dots)$ denotes a multi-level segmentation (with each $S_i$ denoting now the collection of intervals at the $i$th segmentation level), a pair set $P$ can be generated by matching each segment at level $i$ to its maximally overlapping segment at each level $j < i$. The simplified case of a two-level hierarchy $H = (S_0, S_1)$ yields

$$P = \left\{ (|s|, |t|) \;\middle|\; t \in S_1 \wedge s = \arg\max_{s \in S_0} |s \cap t| \right\},$$

where $|s|$ denotes the duration of interval $s$, and $|s \cap t|$ denotes the overlap duration between intervals $s$ and $t$. Evaluating $\rho$ on each such pair captures how evenly the hierarchy divides segments from one level to the next.
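For the two-level case, the pair set can be sketched by matching every lower-level interval to the upper-level interval it overlaps most. Intervals are represented here as `(start, end)` tuples in integer units; this representation, the helper names, and the example hierarchy are assumptions made for illustration.

```python
from math import gcd


def rho(d1: int, d2: int) -> float:
    """Pairwise regularity, eq. (1)."""
    return gcd(d1, d2) / min(d1, d2)


def overlap(s, t) -> int:
    """Duration of the intersection of two (start, end) intervals."""
    return max(0, min(s[1], t[1]) - max(s[0], t[0]))


def hierarchy_pairs(upper, lower):
    """Duration pairs (|s|, |t|) with s the upper interval that best overlaps t."""
    pairs = []
    for t in lower:
        s = max(upper, key=lambda u: overlap(u, t))
        pairs.append((s[1] - s[0], t[1] - t[0]))
    return pairs


upper = [(0, 32), (32, 64)]
lower = [(0, 12), (12, 32), (32, 48), (48, 64)]
pairs = hierarchy_pairs(upper, lower)
scores = [rho(a, b) for a, b in pairs]
print(pairs)
print(sum(scores) / len(scores))
```

The second upper segment is split into two equal halves and scores 1 for both children, while the uneven 12 + 20 split of the first upper segment scores 1/3 and 1/5.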
3.6 Extension 3: Balance
Equation (1) captures a form of regularity where durations are related by simple ratios. This differs from previous notions of regularity, which were designed to favor segments of equal duration [5]. This notion can be recovered by replacing the min normalization in eq. (1) by max:

$$\beta(d_1, d_2) := \frac{\gcd\{d_1, d_2\}}{\max\{d_1, d_2\}}. \tag{5}$$

Equation (5) thus captures the balance of $d_1$ and $d_2$: a score of 1 is only achieved when $d_1 = d_2$, a score of 1/2 is achieved when they are related by a factor of 2, and so on. In general, $\beta(d_1, d_2) \le \rho(d_1, d_2)$, and it otherwise inherits the boundedness, scale-invariance, and integer reciprocal properties noted above. Repeating the calculations behind table 1 for $\tilde\beta$ results in an expected value of 0.216 for uniformly random durations and $w = 0.5$.

As above, this also gives rise to an aggregate pairwise score $B(S)$, sampled versions $\tilde\beta$ and $\tilde B$, and labeled and hierarchical variations.
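A short side-by-side comparison makes the difference between eq. (1) and eq. (5) concrete. As before, the function names are illustrative and integer durations are assumed.

```python
from math import gcd


def rho(d1: int, d2: int) -> float:
    """Regularity, eq. (1): normalize the gcd by the shorter duration."""
    return gcd(d1, d2) / min(d1, d2)


def beta(d1: int, d2: int) -> float:
    """Balance, eq. (5): normalize the gcd by the longer duration."""
    return gcd(d1, d2) / max(d1, d2)


for pair in [(4, 4), (4, 8), (4, 16), (8, 12)]:
    print(pair, "rho =", rho(*pair), "beta =", beta(*pair))
```

The pairs (4, 8) and (4, 16) are perfectly regular but increasingly unbalanced, which is the distinction that the two scores are designed to separate.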
4. EXPERIMENTS
The proposed metrics are evaluated on the reference annotations provided by a variety of commonly used structure analysis datasets spanning multiple genres:

Beatles (TUT): 174 Beatles songs using the TUT segmentations [13] and Isophonics beat annotations [14].

HarmonixSet: 912 popular songs with segment and beat annotations [15].

Jazz Structure Dataset (JSD): 340 tracks [16]. For the labeled metrics, the chorus and theme counter fields are discarded from segment label strings, and segments labeled silence are treated as mutually distinct.

Jazz Audio-aligned Harmony (JAAH): 113 tracks with labels derived from the parts annotations [17].

Real-world Computing (RWC): 211 tracks (100 popular, 61 classical, 50 jazz) [18]. For labeled metrics, segments labeled as "nothing" are treated as mutually distinct, and labels are simplified by discarding parenthetical variations (e.g., "chorus A (+1)" ↦ "chorus A").

SALAMI: 1359 tracks from the publicly available dataset [10], consisting of 4486 annotations (2243 upper, 2243 lower). Sections labeled as "Z" or "silence" are treated as mutually distinct for labeled metrics, and variation markers are discarded (e.g., A′ ↦ A).

Figure 2 illustrates the distribution of segment durations for each dataset.

Figure 2. Segment durations for each dataset.
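The label conventions listed above can be sketched as small string transformations. The exact label formats in each dataset are not specified here, so the patterns below are assumptions: a trailing parenthetical for RWC variations, trailing prime or apostrophe characters for SALAMI variation markers, and an index suffix to make non-musical segments mutually distinct. All function names are hypothetical.

```python
import re


def simplify_rwc(label: str) -> str:
    """Drop a trailing parenthetical variation, e.g. 'chorus A (+1)' -> 'chorus A'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", label).strip()


def simplify_salami(label: str) -> str:
    """Drop trailing variation markers (apostrophes or primes), e.g. "A'" -> 'A'."""
    return label.rstrip("'\u2032")


def make_distinct(labels, non_musical=("silence", "Z", "nothing")):
    """Give each non-musical segment a unique label so it never pairs with another."""
    return [f"{lab}#{i}" if lab in non_musical else lab for i, lab in enumerate(labels)]


print(simplify_rwc("chorus A (+1)"))
print(simplify_salami("A'"))
print(make_distinct(["Z", "A", "B", "A", "Z"]))
```

Because the labeled score only compares segments with identical labels, suffixing each silence segment with its index removes those segments from the pair set without deleting them from the annotation.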
The evaluation seeks to explore the following questions:
1. How do the absolute time metrics ($\tilde\rho$, $\tilde\beta$) differ from the musical time metrics ($\rho$, $\beta$)?
2. Do structure annotations exhibit regularity and/or balance? Does this vary with genre?
3. Are multi-level segmentations regular across levels?
In service of the first question, we compared scores derived from absolute time (using the approach described in section 3.2) to the simpler forms derived from integer-valued durations measured in beats. This analysis is restricted to the datasets with reference beat annotations: Beatles, HarmonixSet, JAAH, and RWC. Each segment boundary is mapped to its nearest beat, and segment durations $d$ are measured in beats between the start and end boundaries. A preliminary study revealed sensitivities to rounding error in beat position identification, which were resolved by including a maximization over $\{d - 1, d, d + 1\}$. The (Pearson) correlation was then computed between the musical-time and absolute-time metrics for each dataset.
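The beat-based measurement can be sketched as below. Two points are assumptions rather than details given in the text: ties between two equally near beats are resolved toward the earlier beat, and the maximization over $\{d - 1, d, d + 1\}$ is applied independently to each of the two durations in a pair. The helper names are hypothetical.

```python
from bisect import bisect_left
from math import gcd


def nearest_beat(time: float, beats: list[float]) -> int:
    """Index of the beat closest to `time`; `beats` must be sorted ascending."""
    i = bisect_left(beats, time)
    if i == 0:
        return 0
    if i == len(beats):
        return len(beats) - 1
    return i if beats[i] - time < time - beats[i - 1] else i - 1


def beat_durations(boundaries: list[float], beats: list[float]) -> list[int]:
    """Segment durations in beats after snapping each boundary to its nearest beat."""
    idx = [nearest_beat(b, beats) for b in boundaries]
    return [j - i for i, j in zip(idx[:-1], idx[1:])]


def rho_beats(d1: int, d2: int) -> float:
    """rho maximized over one-beat adjustments of each duration."""
    best = 0.0
    for a in (d1 - 1, d1, d1 + 1):
        for b in (d2 - 1, d2, d2 + 1):
            if a >= 1 and b >= 1:
                best = max(best, gcd(a, b) / min(a, b))
    return best


beats = [0.5 * k for k in range(129)]  # steady 120 BPM beat grid
print(beat_durations([0.02, 8.1, 15.9, 32.2], beats))
print(rho_beats(15, 32))  # an off-by-one beat count still matches 16:32
```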
For the second question, unlabeled and labeled forms of the absolute time metrics were computed. As a point of comparison, metrics were also computed under restriction to adjacent segments [9], denoted here as $R_S$, $B_S$, etc.
Table 2. Mean regularity and balance scores using musical time, both unlabeled ($R$, $B$) and labeled ($R_L$, $B_L$).

                  R        R_L      B        B_L
Beatles (TUT)     0.681    0.847    0.459    0.834
Harmonix          0.728    0.799    0.524    0.731
JAAH              0.741    0.878    0.488    0.869
RWC Classical     0.599    0.789    0.391    0.765
RWC Jazz          0.914    0.949    0.789    0.945
RWC Popular       0.820    0.958    0.587    0.945
Table 3. Mean regularity and balance scores using absolute time, both unlabeled ($\tilde R$, $\tilde B$) and labeled ($\tilde R_L$, $\tilde B_L$).

                  R~       R~_L     B~       B~_L
Beatles (TUT)     0.704    0.820    0.394    0.805
Harmonix          0.730    0.789    0.498    0.719
JSD               0.732    0.606    0.344    0.591
JAAH              0.646    0.793    0.411    0.784
RWC Classical     0.506    0.709    0.298    0.673
RWC Jazz          0.818    0.856    0.720    0.838
RWC Popular       0.791    0.941    0.560    0.925
SALAMI (upper)    0.776    0.719    0.373    0.619
SALAMI (lower)    0.875    0.889    0.684    0.840
For the third question, we restrict attention to the SALAMI dataset, and evaluate hierarchical regularity and balance using the paired upper- and lower-level annotations for each track.
5. RESULTS
5.1 Musical time vs. absolute time
Table 2 reports the average value of the regularity and balance metrics on each of the datasets listed above for which segment durations can be reliably measured in beats. As should be expected, the labeled forms are generally substantially higher than the unlabeled forms. Each dataset exhibits high labeled regularity (significantly above 0.5), as well as high labeled balance, indicating that similarly labeled segments do consistently span equivalent durations.

Table 3 summarizes the absolute-time metrics across all datasets, and Figure 3 illustrates the correlation between these and the musical time data reported in table 2. The correlations are generally high (above 0.6), with a few notable exceptions in the jazz and classical datasets. These exceptions may be explained by the tempo distributions of each dataset, illustrated in fig. 4. Recall that the absolute time metric uses a tolerance window of 0.5 seconds, equivalent to one beat at 120 BPM. If a track is much slower (e.g., RWC Classical with median tempo of 87.1 BPM, or RWC Jazz with median tempo of 89.4 BPM), the maximization in eq. (3) will not cover a full beat, so a larger window may be warranted. However, note that if the tempo is stable, this becomes less of an issue because absolute- and musical-time are approximately proportional, which is exploited by the scale-invariance property of $\rho$.
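The relationship between tempo and the tolerance window is simple arithmetic: one beat lasts 60/BPM seconds. The sketch below compares the default 0.5-second window against the median tempi quoted above; the function name is illustrative.

```python
def beat_period(bpm: float) -> float:
    """Duration of one beat in seconds at the given tempo."""
    return 60.0 / bpm


WINDOW = 0.5  # default tolerance window in seconds

for name, bpm in [
    ("reference tempo", 120.0),
    ("RWC Classical (median)", 87.1),
    ("RWC Jazz (median)", 89.4),
]:
    period = beat_period(bpm)
    print(f"{name}: {period:.3f} s per beat, covered by window: {period <= WINDOW}")
```

At 87.1 BPM and 89.4 BPM a beat lasts roughly 0.69 s and 0.67 s respectively, so the default window spans only about three quarters of a beat for such tracks.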
Figure 3. Pearson correlation between musical- and absolute-time metrics for each dataset. Cell values, with rows and columns in the order of the figure's axis labels:

                  Regularity     Regularity    Balance        Balance
                  (unlabeled)    (labeled)     (unlabeled)    (labeled)
Beatles (TUT)     0.73           0.80          0.85           0.81
Harmonix          0.91           0.93          0.95           0.95
JAAH              0.59           0.66          0.85           0.70
RWC Classical     0.55           0.70          0.59           0.74
RWC Jazz          0.48           0.38          0.82           0.36
RWC Popular       0.74           0.80          0.86           0.84

Figure 4. Tempo derived from reference beat annotations. Each point corresponds to the mean tempo of one recording. 120 BPM is marked in red as a reference point.
Figure 5 illustrates the distributions of tempo stability, measured as the standard deviation of inter-beat-interval. Datasets with high tempo stability tend to exhibit high correlation in fig. 3 even when they contain many low-tempo tracks (e.g., Beatles, Harmonix, and RWC Pop).
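A tempo-stability measure of this kind can be sketched from a list of beat times. Following the caption of Figure 5, the version below converts each inter-beat interval to a local tempo (60 divided by the interval) and takes the standard deviation; using the population rather than the sample standard deviation is an assumption made here, as is the function name.

```python
from statistics import pstdev


def tempo_stability(beat_times: list[float]) -> float:
    """Standard deviation of local tempo (BPM) derived from inter-beat intervals."""
    if len(beat_times) < 2:
        raise ValueError("at least two beats are required")
    intervals = [b - a for a, b in zip(beat_times[:-1], beat_times[1:])]
    local_tempo = [60.0 / ibi for ibi in intervals if ibi > 0]
    return pstdev(local_tempo)


print(tempo_stability([0.0, 0.5, 1.0, 1.5]))  # perfectly steady 120 BPM
print(tempo_stability([0.0, 0.5, 1.1, 1.6]))  # one stretched beat
```

Lower values indicate a steadier tempo, which is the condition under which absolute-time and beat-based durations stay approximately proportional.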
5.2 Unlabeled and labeled regularity
Figure 6 illustrates the relationship between labeled and unlabeled regularity metrics. Consistent with the summary in table 3, the unlabeled regularity scores are generally quite dispersed, while the labeled scores skew higher, confirming that segments belonging to differently labeled sections may not conform to regular duration relationships.

Two exceptions to this observation are JSD and SALAMI (upper). In both cases, labeled regularity decreases from the unlabeled scores. These cases may be explained by the use of short silence segments, which divide evenly into most other segments, contributing many large values to the average in eq. (2). In the labeled regularity calculation, each silence segment is treated as distinct, eliminating this source of inflation. Segments of this nature are less prevalent in the other datasets (e.g., RWC or JAAH).

Figure 5. Tempo stability for each dataset, as measured by the standard deviation of local tempo derived from inter-beat intervals in the reference annotations. Each point represents the standard deviation of tempo for one recording.

Figure 6. Labeled vs. unlabeled regularity metrics for each annotation in each dataset. Panels: Beatles (TUT), Harmonix, JSD, JAAH, RWC Classical, RWC Jazz, RWC Popular, SALAMI (upper), SALAMI (lower).
Table 4 summarizes the results for regularity and balance when computed on adjacent segment pairs. While there are clear regularity trends, confirming the prior work of Smith and Goto, the effect is not generally as prevalent as the label-agreement results reported in table 3.
5.3 Balance vs. Regularity
Figure 7 illustrates the distribution of the difference between labeled regularity and labeled balance in each dataset. The balance scores cannot exceed the regularity scores (so the difference is non-negative), though each dataset does exhibit very high correlation between regularity and balance: all correlation coefficients exceed 0.95. While some datasets generally tend to match balance and regularity (Beatles, JSD, JAAH, RWC Jazz and Pop), others diverge substantially (Harmonix, RWC Classical, SALAMI). This demonstrates that regularity and balance are indeed distinct qualities of segmentation.

Table 4. Sequential regularity and balance metrics in both musical time ($R_S$, $B_S$) and absolute time ($\tilde R_S$, $\tilde B_S$). Entries marked n/a are not reported.

                  R_S      R~_S     B_S      B~_S
Beatles (TUT)     0.666    0.651    0.399    0.398
Harmonix          0.720    0.696    0.500    0.483
JSD               n/a      0.729    n/a      0.479
JAAH              0.786    0.699    0.563    0.495
RWC Classical     0.604    0.521    0.380    0.311
RWC Jazz          0.935    0.872    0.818    0.780
RWC Popular       0.839    0.806    0.592    0.562
SALAMI (upper)    n/a      0.746    n/a      0.420
SALAMI (lower)    n/a      0.882    n/a      0.753

Figure 7. The distributions of difference between labeled regularity and balance: $\Delta = \tilde R_L - \tilde B_L$ for each dataset.

Figure 8. Hierarchical scores on SALAMI.
5.4 Hierarchical regularity
Figure 8 illustrates the distribution of hierarchical regularity and balance scores on the SALAMI dataset. As expected, the balance scores tend to be low due to the shorter duration of segments in the lower level annotations.

Interestingly, the regularity scores are generally quite high, with a median value of 0.969. This can be interpreted broadly as confirming that upper-level segments are comprised of whole repetitions of lower-level segment durations. While this may be intuitively expected given the annotation rules, it is not an obvious conclusion from the single-level analyses in the previous section. Figure 6 illustrates that lower-level segmentations tend to be highly regular ($\tilde R_L \approx 0.889$) and highly balanced ($\tilde B_L \approx 0.840$), while upper-level segmentations are slightly less regular ($\tilde R_L \approx 0.719$) and often less balanced ($\tilde B_L \approx 0.619$).
6. DISCUSSION
From the findings above, we can draw some conclusions about the role of regularity in music structure analysis.

First, because these analyses are conducted on reference annotations (not model outputs), the results reflect the behavior of human annotators, and not algorithms. The distribution plots in fig. 6 indicate that although the mean regularity scores are generally high across datasets, there is considerable variability across individual tracks. While these results derive from the absolute time metrics, the high correlation with the musical time metrics suggests that this is generally not explained by tempo variation, and rather reflects widespread and meaningful structural irregularity in many datasets. This suggests that regularity, if taken as a design principle in segmentation algorithms, should be treated with some care to allow for irregular segmentations when warranted by the track in question.

Second, the discrepancy between labeled and unlabeled metrics can be quite large (Beatles, Harmonix, RWC Classical and Pop). This corresponds to non-trivial interactions between the regularity and repetition principles (as related to segment label agreement), which had not been identified in previous studies. Modeling and fruitfully exploiting these interactions would be an interesting direction for future work in structure analysis algorithms.

Third, some datasets exhibit significant discrepancies between regularity and balance (Harmonix, SALAMI). This demonstrates that segment durations in fact exhibit more complex patterns than simple equivalence.
7. LIMITATIONS
The proposed methods are applicable to quantitative evaluation of segmentations, but they do exhibit some limitations. First, the absolute time definition does appear to exhibit sensitivity to tempo variation, in particular as it relates to the choice of tolerance window. In situations where high tempo variation may be expected, it may be preferable to either apply the musical time formulation using estimated beat positions (if they are reliable), or adapt the tolerance window to fit the (estimated) tempo of the track.

Second, short segments may artificially inflate scores by being easily divisible into long segments. This is partially addressed by the labeled extension, as short segments tend to be sporadic and unrelated to the majority of a track, e.g., a short silence segment at the beginning or end.

Finally, the proposed metrics do not easily lend themselves to differentiable formulations which may be integrated as learning objectives or penalties in current gradient-based learning frameworks. While it may be possible to do so, e.g., by pre-computing a look-up table of pairwise duration comparisons, other difficulties may arise in adapting the ideas into practical segmentation algorithms. Still, the proposed metrics may be more easily integrated as post-processing steps, e.g., to identify meaningful levels to include in a multi-level segmentation, or to select among a collection of proposed segmentations generated by an ensemble of methods.
8. ACKNOWLEDGMENTS
The author thanks Qingyang (Tom) Xi and Meinard Müller for helpful discussions and early feedback.
9. REFERENCES
[1] G. Peeters, "Deriving musical structures from signal analysis for music audio summary generation: "sequence" and "state" approach," in Computer Music Modeling and Retrieval, International Symposium, CMMR 2003, Montpellier, France, May 26-27, 2003, Revised Papers, ser. Lecture Notes in Computer Science, U. K. Wiil, Ed., vol. 2771. Springer, 2003, pp. 143–166. [Online]. Available: https://doi.org/10.1007/978-3-540-39900-1_14

[2] J. Paulus, M. Müller, and A. Klapuri, "State of the art report: Audio-based music structure analysis," in Proceedings of the 11th International Society for Music Information Retrieval Conference. ISMIR, Aug. 2010, pp. 625–636. [Online]. Available: https://doi.org/10.5281/zenodo.1417289

[3] O. Nieto, G. J. Mysore, C.-i. Wang, J. B. L. Smith, J. Schlüter, T. Grill, and B. McFee, "Audio-based music structure analysis: Current trends, open challenges, and applications," Transactions of the International Society for Music Information Retrieval, Dec 2020.

[4] G. Sargent, F. Bimbot, and E. Vincent, "A regularity-constrained Viterbi algorithm and its application to the structural segmentation of songs," in Proceedings of the 12th International Society for Music Information Retrieval Conference. ISMIR, Oct. 2011, pp. 483–488. [Online]. Available: https://doi.org/10.5281/zenodo.1415950

[5] G. Sargent, F. Bimbot, and E. Vincent, "Estimating the structural segmentation of popular music pieces under regularity constraints," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 2, pp. 344–358, 2017.

[6] A. Marmoret, J. E. Cohen, and F. Bimbot, "Barwise music structure analysis with the correlation block-matching segmentation algorithm," Transactions of the International Society for Music Information Retrieval, Nov 2023.

[7] B. McFee and D. P. W. Ellis, "Learning to segment songs with ordinal linear discriminant analysis," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 5197–5201.
[8] A. Maezawa, "Music boundary detection based on a hybrid deep model of novelty, homogeneity, repetition and duration," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 206–210.

[9] J. B. L. Smith and M. Goto, "Using priors to improve estimates of music structure," in Proceedings of the 17th International Society for Music Information Retrieval Conference. ISMIR, Aug. 2016, pp. 554–560. [Online]. Available: https://doi.org/10.5281/zenodo.1416916

[10] J. B. L. Smith, J. A. Burgoyne, I. Fujinaga, D. D. Roure, and J. S. Downie, "Design and creation of a large-scale database of structural annotations," in Proceedings of the 12th International Society for Music Information Retrieval Conference. ISMIR, Oct. 2011, pp. 555–560. [Online]. Available: https://doi.org/10.5281/zenodo.1416884

[11] C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, and D. P. W. Ellis, "mir_eval: A transparent implementation of common MIR metrics," in Proceedings of the 15th International Society for Music Information Retrieval Conference. ISMIR, Oct. 2014, pp. 367–372. [Online]. Available: https://doi.org/10.5281/zenodo.1416528

[12] B. McFee and K. Kinnaird, "Improving structure evaluation through automatic hierarchy expansion," in Proceedings of the 20th International Society for Music Information Retrieval Conference. ISMIR, Nov. 2019, pp. 152–158. [Online]. Available: https://doi.org/10.5281/zenodo.3527764

[13] J. Paulus, "Improving markov model based music piece structure labelling with acoustic information," in Proceedings of the 11th International Society for Music Information Retrieval Conference. ISMIR, Aug. 2010, pp. 303–308. [Online]. Available: https://doi.org/10.5281/zenodo.1416732
[14] C. Harte, "Towards automatic extraction of harmony information from music signals," Ph.D. dissertation, Department of Electronic Engineering, Queen Mary, University of London, 2010.

[15] O. Nieto, M. McCallum, M. Davies, A. Robertson, A. Stark, and E. Egozy, "The Harmonix Set: Beats, downbeats, and functional segment annotations of western popular music," in Proceedings of the 20th International Society for Music Information Retrieval Conference. ISMIR, Nov. 2019, pp. 565–572. [Online]. Available: https://doi.org/10.5281/zenodo.3527870

[16] S. Balke, J. Reck, C. Weiß, J. Abeßer, and M. Müller, "JSD: A dataset for structure analysis in jazz music," Transactions of the International Society for Music Information Retrieval, Nov 2022.

[17] V. Eremenko, E. Demirel, B. Bozkurt, and X. Serra, "Audio-aligned jazz harmony dataset for automatic chord transcription and corpus-based research," in Proceedings of the 19th International Society for Music Information Retrieval Conference. ISMIR, Sep. 2018, pp. 483–490. [Online]. Available: https://doi.org/10.5281/zenodo.1492457

[18] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, "RWC music database: Popular, classical and jazz music databases," in Proceedings of the 3rd International Conference on Music Information Retrieval. ISMIR, Oct. 2002. [Online]. Available: https://doi.org/10.5281/zenodo.1416474