
Real-time Generation of Percussive Rhythms Using Descriptors

Author: Vilanova, Alexandre
Publisher: Zenodo
DOI: 10.5281/zenodo.17302293
Source: https://zenodo.org/records/17302293/files/Alexandre_Vilanova_SMC_2025_Master_Thesis.pdf
Master in Sound and Music Computing
Universitat Pompeu Fabra
Real-time Generation of Percussive Rhythms Using Descriptors
Alexandre Vilanova
Supervisor: Daniel Gómez
Co-Supervisor: Sergi Jordà
August 2025
Contents
1 Introduction
2 State of the Art
2.1 Rhythm Perception
2.2 Rhythm Spaces
2.3 Variational Autoencoders in Generative Music
3 Methods
3.1 Problem Setup
3.2 Variational Autoencoder Model
3.2.1 Model Architecture
3.2.2 Training Procedure
3.2.3 Latent Space Sampling
3.3 Descriptor-based Model
3.3.1 Rhythm Descriptors
3.3.2 Discrete descriptor precision
3.3.3 Neural Network Implementation
3.3.4 Dimensionality Reduction
3.3.5 Model Evaluation
3.4 Comparing VAE and descriptor-based approaches
3.5 Smoothness Experiment
3.5.1 Experiment Design
3.5.2 Implementation
3.6 User Experience Experiment
3.6.1 Implementation
3.6.2 Structure
3.6.3 Data Collection and Storage
4 Results
4.1 Dimensionality Reduction
4.2 Smoothness Experiment
4.3 User Experience Experiment
5 Discussion
5.1 VAE vs. Descriptor-Based Approaches
5.2 Descriptor Selection
5.3 Smoothness Analysis
5.4 User Experience Experiment
5.4.1 Validation of the Descriptor-Based Approach
5.4.2 User Background and Performance
5.4.3 Descriptor Interpretability
5.5 Qualitative User Feedback Analysis
5.6 Methodological Considerations and Limitations
6 Conclusions
7 Future Work
List of Figures
List of Tables
A Source code and demo
Acknowledgement
I would like to express my sincere gratitude to my supervisor, Daniel Gómez, for his invaluable guidance, unwavering support, and contagious enthusiasm throughout this work. His thoughtful insights and continuous encouragement have played a crucial role in guiding the direction and enhancing the quality of this research, and his mentorship has made this journey both rewarding and inspiring.

Abstract
A fundamental challenge in computational music generation lies in developing control interfaces that provide intuitive, musically meaningful interactions with generative systems. This thesis addresses this challenge specifically for rhythmic generation, focusing on the development of a system capable of generating 16-step monophonic rhythmic patterns in real time using musically intuitive controls.

Our method uses perceptually grounded rhythmic descriptors as an expressive, intuitive control space. A neural network is trained on all possible binary 16-step monophonic patterns, learning to map from descriptor space back to rhythmic patterns. We compare this descriptor-based approach to a variational autoencoder model and find the former more effective for usability and expressive control. An interactive interface is developed for exploration and testing, followed by quantitative and qualitative experiments evaluating the smoothness and user intuitiveness of the system.

Findings show that the descriptor-based model aligns well with listener perception, balancing usability with expressive flexibility. While limited to monophonic rhythms, the system establishes descriptors as a strong foundation for extending interactive rhythm generation to polyphonic and more complex domains.

Keywords: rhythm generation, descriptor engineering, variational autoencoders, generative music, symbolic music, real-time interaction.
Chapter 1
Introduction
One of the central challenges in computational music generation is designing control interfaces that allow users to interact with generative systems in ways that feel both intuitive and musically meaningful. This thesis focuses on this challenge for rhythm, developing a system capable of producing 16-step monophonic rhythmic patterns in real time, guided by controls that are easy for musicians to understand and use.

Our main objective is to develop a musically meaningful method for generating 16-step monophonic rhythms in real time. To achieve this goal, we investigate two contrasting approaches: one based on abstract latent space representations learned through variational autoencoders (VAEs), and another grounded in explicit rhythm descriptors corresponding to established concepts in music perception theory.

The first approach employs a VAE architecture that learns compact latent representations from rhythmic patterns without incorporating explicit musical knowledge. Following established work in VAE-based symbolic music generation (Brunner et al. 2018; Roberts et al. 2018; Vigliensoni et al. 2022), this method provides smooth interpolation capabilities through continuous latent spaces.

The second approach directly maps rhythm descriptors to monophonic rhythm patterns using a feedforward neural network. These descriptors, grounded in music
Chapter 3
Methods
3.1 Problem Setup
We frame the rhythm generation task as a mapping from a low-dimensional control space to 16-step binary monophonic rhythmic patterns. Each pattern consists of a sequence of binary values indicating silence (0) or onset (1), resulting in a total of 2^16 = 65,536 possible combinations.

Figure 1: Visualization of a 16-step rhythm pattern.

Figure 1 shows an example of a 16-step monophonic binary rhythm pattern, with onsets represented as circles at steps 1, 5, 7, 9, and 13.
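
As a minimal sketch of this representation (not the thesis code), a pattern can be held as a list of 16 binary values, and every pattern corresponds to one integer in the range 0 to 65,535. The helper names and the bit ordering (step 1 as the most significant bit) are assumptions made for illustration.

```python
N_STEPS = 16

def pattern_from_onsets(onset_steps):
    """Build a 16-step binary pattern from 1-based onset step numbers."""
    return [1 if step in onset_steps else 0 for step in range(1, N_STEPS + 1)]

def pattern_from_index(index):
    """Decode an integer in [0, 2**16) into a pattern.

    Assumption: step 1 is stored in the most significant bit.
    """
    return [(index >> (N_STEPS - 1 - i)) & 1 for i in range(N_STEPS)]

# The example pattern of Figure 1: onsets at steps 1, 5, 7, 9, and 13.
figure_1 = pattern_from_onsets({1, 5, 7, 9, 13})

# Size of the full pattern space.
total_patterns = 2 ** N_STEPS
```

Enumerating `pattern_from_index(i)` for every `i` below `total_patterns` yields each possible pattern exactly once.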
At the core of our approach lies the question: how can we design a control space that is both compact and musically meaningful? To explore this, we investigate two contrasting strategies.

The first strategy takes a purely data-driven approach: a VAE learns an abstract latent space from rhythmic patterns without incorporating any explicit musical priors. This latent space is compact and supports smooth interpolation between patterns, making it attractive for generative applications. However, because its dimensions are not explicitly tied to perceptual or musical concepts, it is initially unclear whether this space will support intuitive or controllable user interaction.

The second strategy leverages a set of perceptually grounded rhythm descriptors (such as syncopation, onset density, and balance) as an interpretable control space. Informed by rhythm perception literature, these descriptors allow users to directly influence musically relevant attributes during generation.

By comparing the data-driven latent space learned by the VAE with the perceptually grounded descriptor-based control space, we aim to evaluate the trade-offs between generative expressiveness, interpretability, and user control in interactive rhythm generation.
3.2 Variational Autoencoder Model
We train a VAE model to learn a compact latent space from binary rhythm patterns without relying on predefined musical features. The model aims to discover a low-dimensional, continuous representation that enables smooth interpolation and generation of rhythm patterns.

Figure 2: Overview of the Variational Autoencoder model pipeline.

3.2.1 Model Architecture
The autoencoder consists of two main components: an encoder, which maps the 16-step binary rhythm patterns into a compact latent space that captures their essential features, and a decoder, which reconstructs the original patterns from this latent representation while preserving the key rhythmic structure. The two components are configured as follows:

Encoder
• Fully connected layer with 64 neurons and ReLU activation.
• Fully connected layer with d_latent neurons (no activation).

Decoder
• Fully connected layer with 64 neurons and ReLU activation.
• Fully connected layer with 16 output neurons followed by a sigmoid activation to map outputs into the [0,1] range.

For our experiments, we set the latent space dimensionality to d = 5. This value is chosen with a knob-based interface in mind, as five knobs are considered an appropriate number to provide sufficient expressive control while maintaining a minimal and intuitive interface.
3.2.2 Training Procedure
We train the autoencoder using all possible 16-step binary rhythm patterns (2^16 = 65,536 patterns), split into training (70%) and test (30%) sets. The model is optimized using the Binary Cross-Entropy (BCE) loss function, which is suitable for binary data reconstruction. The Adam optimizer is employed with a learning rate of 0.001.
• Loss function: Binary Cross-Entropy Loss.
• Optimizer: Adam.
• Epochs: 400.
• Batch size: 32.

We monitor both reconstruction loss and binary reconstruction accuracy, where accuracy is computed as the proportion of correctly reconstructed onset positions after applying a 0.5 threshold on the decoder outputs.
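
The data preparation described above can be sketched as follows. The text only states the 70/30 proportions, so the shuffling, the fixed seed, and the rounding of the split point are assumptions, and the function names are illustrative.

```python
import random

N_STEPS = 16

def all_patterns():
    """Enumerate every 16-step binary pattern as a list of 0/1 values."""
    return [[(i >> s) & 1 for s in range(N_STEPS)] for i in range(2 ** N_STEPS)]

def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into training and test sets.

    Assumption: a seeded random shuffle; the thesis does not say how
    the split is drawn.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = train_test_split(all_patterns())
```

Because the dataset is the complete pattern space, the test set here measures how well the model reconstructs patterns it has not been optimized on, rather than generalization to a different distribution.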
3.2.3 Latent Space Sampling
To assess the generative capability of the model, we sample random points from the latent space and decode them into rhythm patterns. While binary patterns are obtained by applying a threshold of 0.5 to the decoder outputs, we can also experiment with directly using the continuous outputs in the range [0,1] to represent note velocities. This approach allows the model to generate rhythms with dynamic intensity, capturing expressive variations beyond simple binary events and providing a richer representation of rhythmic nuances.
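
The two readings of the decoder output can be illustrated with a short sketch. The inclusive comparison at the threshold and the scaling to a 0 to 127 velocity range are assumptions; the text only specifies a 0.5 threshold and outputs in [0,1].

```python
def to_binary(probabilities, threshold=0.5):
    """Reading 1: threshold each output to get a binary onset pattern.

    Assumption: values equal to the threshold count as onsets.
    """
    return [1 if p >= threshold else 0 for p in probabilities]

def to_velocities(probabilities, max_velocity=127):
    """Reading 2: scale each output to a note velocity.

    Assumption: a MIDI-style 0..127 velocity range.
    """
    return [round(p * max_velocity) for p in probabilities]

decoder_output = [0.1, 0.9, 0.49, 0.51]
binary_pattern = to_binary(decoder_output)
velocities = to_velocities(decoder_output)
```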
3.3 Descriptor-based Model
We train a neural network model that maps perceptual rhythm descriptors to 16-step monophonic rhythm patterns. Each descriptor represents a specific structural or perceptual property of rhythm, based on findings from rhythm perception literature.

Figure 3: Overview of the Descriptor-based model pipeline.

3.3.1 Rhythm Descriptors
A rhythm descriptor is a function that takes a monophonic rhythm pattern as input and returns a numerical value quantifying a specific structural or perceptual property of that rhythm. Examples include measures of onset density, syncopation, evenness, and balance. To ensure consistency and comparability across features, all descriptors are normalized to lie in the range [0,1], with 0 representing the minimum expression of a feature and 1 the maximum. This normalization allows the descriptors to be directly used as inputs to computational models, such as neural networks, without the need for additional scaling and ensures that each feature contributes proportionally during learning.

Onset count
Number of active steps in a rhythmic pattern, represented by the number of onsets in a binary sequence. It provides a basic sense of rhythmic density or how many beats are played within the fixed number of steps.

$$\text{onsets} = \{\, i \in \{1, 2, \ldots, 16\} \mid \text{pattern}_i > 0 \,\}, \qquad \text{nOnsets} = |\text{onsets}|$$

$$d_{\text{OnsetCount}} = \frac{1}{16} \cdot \text{nOnsets}$$

Start
Position of the first onset in the rhythm. It serves as a reference point for timing and can influence the perceived groove or alignment of the rhythm within a measure.

$$d_{\text{Start}} = \frac{1}{16} \cdot \min\{\, i \mid \text{pattern}_i > 0 \,\}$$

Center
Center of mass of the rhythm across 16 steps. It shows how the weight of the rhythm is distributed in time, helping to identify whether the rhythm feels front-heavy, back-heavy, or balanced.

$$d_{\text{Center}} = \frac{1}{16} \cdot \frac{1}{\text{nOnsets}} \cdot \sum_{i=1}^{16} i \cdot (\text{pattern}_i > 0)$$
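
The three positional descriptors above can be sketched directly from their formulas. Step numbers are 1-based, as in the definitions. Returning 0.0 for a pattern with no onsets is an assumption, since the formulas leave that case undefined, and the function names are illustrative.

```python
def onsets(pattern):
    """1-based step numbers at which the pattern has an onset."""
    return [i for i, value in enumerate(pattern, start=1) if value > 0]

def d_onset_count(pattern):
    """Fraction of the 16 steps that carry an onset."""
    return len(onsets(pattern)) / 16

def d_start(pattern):
    """Normalized position of the first onset (assumed 0.0 if empty)."""
    steps = onsets(pattern)
    return min(steps) / 16 if steps else 0.0

def d_center(pattern):
    """Normalized mean onset position (assumed 0.0 if empty)."""
    steps = onsets(pattern)
    return sum(steps) / len(steps) / 16 if steps else 0.0

# Pattern of Figure 1: onsets at steps 1, 5, 7, 9, and 13.
figure_1 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
values = (d_onset_count(figure_1), d_start(figure_1), d_center(figure_1))
```

For the Figure 1 pattern this gives 5/16 for the onset count, 1/16 for the start, and (1 + 5 + 7 + 9 + 13) / 5 / 16 = 7/16 for the center.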

Syncopation
Measures how much a rhythm deviates from a regular metrical pattern. It quantifies the displacement of accents to weaker beats, providing insight into rhythmic complexity and tension (Huron 2006).

$$w = [5, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1]$$

$$s_i = \max\left(0,\ (\text{pattern}_i - \text{pattern}_{(i+1) \bmod 16}) \cdot (w_{(i+1) \bmod 16} - w_i)\right)$$

$$d_{\text{Syncopation}} = \frac{1}{30} \left( \sum_{i=0}^{15} s_i + 15 \right)$$

This formula builds on the metrical hierarchy weights proposed by Lerdahl and Jackendoff (1983), later summarized in Toussaint (2013). In this hierarchy, beats at different positions within a 16-step pattern are assigned weights: strong downbeats (5), mid-points (4), half-beats (3), quarter subdivisions (2), and off-beats (1). This hierarchy is reflected in the weight vector w above.

The syncopation score s_i follows the principle that rhythmic tension arises when an onset precedes silence or a weaker onset, but leads into a metrically stronger position. This is conceptually related to Longuet-Higgins and Lee (1984) and its later formalization by Fitch and Rosenfeld (2007), where syncopation is quantified as the difference in metrical weight between an onset and a subsequent rest at a stronger beat. The normalization factor (1/30) and offset (+15) ensure that syncopation values fall within a consistent range across 16-step rhythmic patterns.
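
The syncopation formula can be transcribed literally as a short sketch. It rests on assumptions: the pattern is taken as a 0-indexed list of 16 values (matching the sum from i = 0 to 15), the max(0, ...) is applied to the whole product as the formula is printed, and the function names are illustrative rather than taken from the thesis code.

```python
# Metrical weights w from the text, one per step.
W = [5, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1]

def syncopation_scores(pattern):
    """Per-step scores s_i, with the next step taken cyclically (mod 16)."""
    n = len(W)
    return [
        max(0, (pattern[i] - pattern[(i + 1) % n]) * (W[(i + 1) % n] - W[i]))
        for i in range(n)
    ]

def d_syncopation(pattern):
    """Sum of the scores, offset by 15 and divided by 30."""
    return (sum(syncopation_scores(pattern)) + 15) / 30

# Pattern of Figure 1: every onset sits on a stronger position than the
# step that follows it, so no step contributes a positive score.
figure_1 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
on_beat_value = d_syncopation(figure_1)

# A single onset on a weak position (0-based index 3).
off_beat = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
off_beat_value = d_syncopation(off_beat)
```

Under this reading, the on-beat example evaluates to 15/30 and the off-beat example to a larger value, which matches the intent that displaced onsets raise the descriptor.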

Syncopation awareness
Refines the basic syncopation metric by factoring in perceptual salience. Each onset is weighted according to its perceived importance or noticeability to human listeners, yielding a measure that aligns more closely with musical perception (Gómez-Marín, Jordà, and Herrera 2015).

$$a = [8, 8, 8, 8, 1, 1, 1, 1, 4, 4, 4, 4, 2, 2, 2, 2]$$

$$d_{\text{SyncopationAwareness}} = \frac{1}{115} \left( \sum_{i=0}^{15} s_i \cdot a_i + 65 \right)$$

Evenness
As described in Milne and Herff (2020), and related to the concept of density, the evenness of a rhythm reflects the regularity of inter-onset intervals¹. Rhythms with lower variance in inter-onset spacing are considered more even, while higher variance indicates irregularity. Our formulation adapts the geometric approach to rhythmic evenness introduced by Milne and Dean (2016), projecting onset positions onto the unit circle and comparing them against an ideal uniform distribution.

$$d_{\text{Evenness}} = \frac{1}{\text{nOnsets}} \sum_{k=0}^{\text{nOnsets}-1} \cos\left( \frac{2\pi k}{\text{nOnsets}} - \frac{2\pi \cdot \text{onsets}_k}{16} + \frac{2\pi \cdot \text{onsets}_0}{16} \right)$$
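
A sketch of the evenness formula follows, under the assumption that onsets_k denotes the k-th onset position in ascending order and that an empty pattern is assigned 0.0 (the formula is undefined there). The function name is illustrative.

```python
import math

def d_evenness(pattern):
    """Mean cosine of the angular gap between each onset and the
    position it would take in a perfectly even rhythm with the same
    number of onsets, measured relative to the first onset."""
    steps = [i for i, value in enumerate(pattern, start=1) if value > 0]
    n = len(steps)
    if n == 0:
        return 0.0  # assumption: the formula does not cover this case
    first = steps[0]
    total = sum(
        math.cos(2 * math.pi * k / n - 2 * math.pi * (step - first) / 16)
        for k, step in enumerate(steps)
    )
    return total / n

# Onsets every four steps are perfectly even.
four_on_the_floor = [1, 0, 0, 0] * 4
even_value = d_evenness(four_on_the_floor)

# Pattern of Figure 1 is close to even but not exactly so.
figure_1 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
figure_1_value = d_evenness(figure_1)
```

Each cosine term equals 1 when an onset sits exactly on its ideal position, so a perfectly regular pattern reaches the maximum of 1.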

Balance
Complementary to evenness, balance measures the symmetry of onset distribution. It is defined as the proximity of the rhythm's center of mass to the center of the unit circle². A rhythm with high balance is spread more symmetrically around the cycle, contributing to greater perceptual stability. Together, evenness and balance describe complementary aspects of rhythmic distribution: one quantifies uniformity of spacing, the other symmetry around the cycle.

$$d_{\text{Balance}} = 1 - \frac{1}{\text{nOnsets}} \sqrt{ \left( \sum_{k=0}^{\text{nOnsets}-1} \cos \frac{2\pi \cdot \text{onsets}_k}{16} \right)^2 + \left( \sum_{k=0}^{\text{nOnsets}-1} \sin \frac{2\pi \cdot \text{onsets}_k}{16} \right)^2 }$$
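
The balance formula can be sketched in the same way: each onset is placed on the unit circle, and the descriptor is one minus the length of the mean onset vector. Returning 0.0 for an empty pattern is an assumption, and the function name is illustrative.

```python
import math

def d_balance(pattern):
    """One minus the distance of the onsets' center of mass from the
    center of the unit circle."""
    steps = [i for i, value in enumerate(pattern, start=1) if value > 0]
    n = len(steps)
    if n == 0:
        return 0.0  # assumption: the formula does not cover this case
    cos_sum = sum(math.cos(2 * math.pi * step / 16) for step in steps)
    sin_sum = sum(math.sin(2 * math.pi * step / 16) for step in steps)
    return 1 - math.hypot(cos_sum, sin_sum) / n

# Four equally spaced onsets cancel out: center of mass at the origin.
balanced_value = d_balance([1, 0, 0, 0] * 4)

# A single onset puts the center of mass on the circle itself.
single_onset_value = d_balance([1] + [0] * 15)
```

The two examples mark the extremes of the descriptor: a fully symmetric pattern scores 1, while a single onset scores 0.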

Syness
Combined metric that incorporates both syncopation and the number of onsets. It captures the interplay between rhythmic complexity and density, offering a nuanced view of groove and structure.

$$d_{\text{Syness}} = \frac{1}{0.633} \cdot \frac{d_{\text{SyncopationAwareness}}}{\text{nOnsets}}$$

The constant 0.633 corresponds to the maximum attainable value of the unnormalized measure across all 16-step patterns, so dividing by it ensures that $d_{\text{Syness}} \in [0,1]$.

¹ An inter-onset interval is the time between two consecutive pulses (Milne and Herff 2020).
² The unit circle is a visualization of a rhythmic cycle where steps are placed around a circle, like positions on a clock (Milne and Herff 2020).

3.3.2 Discrete descriptor precision
When using analog controls such as knobs, it is possible to obtain highly continuous input values. However, in the case of a digital user interface, we cannot assume access to continuous input signals. Moreover, in the context of our project, maintaining MIDI compatibility is an important requirement.

To account for this, when generating the descriptor dataset we restricted values to the precision of the MIDI range (0–127). This inevitably introduces some redundancy in the data (e.g., multiple descriptor inputs may correspond to the same pattern).
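
The effect of restricting descriptors to MIDI resolution can be illustrated with a small sketch. The rounding scheme (scale by 127 and round to the nearest integer) is an assumption, as the text only states that values are restricted to the 0–127 range; the function names are illustrative.

```python
def to_midi(value):
    """Quantize a descriptor in [0, 1] to an integer MIDI step (0..127).

    Assumption: nearest-integer rounding after scaling by 127.
    """
    return round(value * 127)

def quantize(descriptors):
    """Quantize a whole descriptor vector to MIDI resolution."""
    return tuple(to_midi(value) for value in descriptors)

# Two descriptor vectors that differ slightly as floats...
a = [0.501, 0.250, 0.750, 0.100, 0.900]
b = [0.503, 0.251, 0.749, 0.101, 0.899]

# ...can land on the same MIDI vector once quantized.
same_after_quantization = quantize(a) == quantize(b)
```

Each MIDI step covers a span of about 1/127 of the descriptor range, so descriptor vectors that differ by less than that in every dimension may become indistinguishable, which is the source of the repeated entries counted in Table 1.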

Input type  Dimensions  Repeated patterns  Unique (%)
MIDI        8           292                99.6
MIDI        5           9742               85.1
Float       8           41                 99.9
Float       5           3855               94.1

Table 1: Precision using MIDI and floats to define descriptors.

Table 1 summarizes the impact of using discrete MIDI values versus continuous floating-point values for defining descriptors. When restricting inputs to the MIDI resolution (0–127), a higher number of repeated patterns emerges, particularly in the 5-dimensional case, where over 9,000 repetitions occur and only 85.1% of patterns are uniquely identified. In contrast, using floating-point values greatly reduces redundancy, with 94.1% of patterns uniquely identified in the 5-dimensional case and 99.9% in the 8-dimensional case. These results highlight the trade-off between maintaining MIDI compatibility and achieving fine precision in the descriptor space. In the context of our experiment, we use the dataset in the MIDI range.
3.3.3 Neural Network Implementation
The model is a feedforward neural network trained to regress from rhythm descriptors to rhythm patterns. The input layer takes n descriptor values, and the output layer produces a 16-dimensional vector of logits, which are passed through a sigmoid function to obtain onset probabilities for each of the 16 steps in the rhythm pattern. Again, these probabilities can be interpreted in two ways: (1) by applying a threshold (e.g. 0.5) to produce binary onset predictions indicating the presence or absence of an onset, or (2) by directly mapping the probabilities to velocity values, yielding a continuous representation of onset strength at each step.

Architecture
• Input layer: n descriptor values.
• Hidden layers: Four fully connected layers with 16, 32, 64, and 32 neurons respectively, each followed by a ReLU activation function.
• Output layer: 16 neurons (no activation; sigmoid is applied externally to interpret the outputs as probabilities).

The model is trained using the BCEWithLogitsLoss loss function, and optimized with the Adam optimizer. Training is performed for 200 epochs with mini-batches of size 32.

Evaluation Methods
To assess the quality of the model's output, we employ two distinct evaluation strategies that reflect different aspects of rhythmic similarity:

• Pattern-based accuracy: the predicted output is thresholded at 0.5 to obtain a binary pattern, which is then directly compared to the ground-truth binary rhythm pattern. Accuracy is computed as the proportion of correctly predicted onset positions across all steps and examples.

$$\text{Score} = \frac{1}{16} \sum_{i=1}^{16} \mathbb{1}[\hat{p}_i = p_i]$$

• Descriptor-based accuracy: the predicted binary rhythm patterns are post-processed to compute their descriptors (using the same feature extraction method as the input). Accuracy is computed based on the similarity (e.g., inverse normalized error) between the predicted descriptors and the original input descriptors. This version prioritizes perceptual or structural similarity over exact pattern match.

$$\text{Score} = 1 - \frac{1}{8} \sum_{j=1}^{8} |\hat{d}_j - d_j|$$

Further experiments will be conducted to determine which of the two evaluation approaches is more suitable for our use case.
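
Both scores can be sketched for a single example as follows. The function names are illustrative, and averaging over a dataset (as the text describes for the pattern-based accuracy) would be done on top of these per-example values.

```python
def pattern_score(predicted, target):
    """Fraction of steps where the predicted binary pattern matches
    the ground truth."""
    matches = sum(1 for p_hat, p in zip(predicted, target) if p_hat == p)
    return matches / len(target)

def descriptor_score(predicted_descriptors, target_descriptors):
    """One minus the mean absolute error between descriptor vectors
    (each descriptor is assumed to lie in [0, 1])."""
    errors = [abs(d_hat - d) for d_hat, d in
              zip(predicted_descriptors, target_descriptors)]
    return 1 - sum(errors) / len(errors)

target = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
predicted = list(target)
predicted[5] = 1  # one wrong step out of sixteen

one_step_off = pattern_score(predicted, target)
```

A pattern can differ in a few steps and still have nearly the same descriptors, which is why the descriptor-based score tends to be more forgiving than the pattern-based one.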
3.3.4 Dimensionality Reduction
To investigate the effectiveness of a reduced descriptor space for rhythm pattern generation, we conduct experiments using dimensionality reduction by systematically removing features from the full descriptor set. The motivation behind this process is to enable a more usable and compact control space, which is particularly relevant in interactive or hardware-based systems, such as knob-based interfaces, where having too many dimensions can hinder intuitive control. A smaller, well-chosen set of descriptors would allow for more expressive yet manageable manipulation of rhythm generation, enhancing both user experience and creative flexibility. This experiment is conducted using only the pattern-based accuracy.

Specifically, we employ a leave-k-out approach, where k ∈ {1,2,3}. In each case, k descriptors are excluded from the input feature set, and the model is retrained and evaluated using the remaining n − k descriptors. This process allows us to evaluate how much each descriptor (or group of descriptors) contributes to the performance of the model, and whether a smaller subset of descriptors can still preserve predictive
Figure 6: Visualization of a movement in the descriptor space.
3.5.2 Implementation
The experiment is implemented in a Python notebook that automatically generates a large number of random movements across the three distance categories, computes the above metrics, and saves the generated pattern sequences.

We extend the Pure Data patch to support the simulation of these movements in the descriptor space. As shown in Figure 4, the two orange sliders allow users to navigate through the set of pre-generated movements and explore the interpolations interactively. The index slider selects the specific movement to be simulated, while the step slider controls the interpolation position along the movement. As the step slider is adjusted, the corresponding descriptor values smoothly interpolate from the start to the end point, and users can listen in real time to the gradual transformation of the generated rhythmic pattern.
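
A movement of this kind can be sketched as an interpolation between two descriptor vectors. Linear interpolation and the function names are assumptions; the text only states that the descriptor values interpolate smoothly from the start to the end point.

```python
def interpolate(start, end, t):
    """Descriptor vector at position t in [0, 1] along the movement.

    Assumption: straight-line (linear) interpolation.
    """
    return [a + (b - a) * t for a, b in zip(start, end)]

def movement(start, end, n_steps):
    """All descriptor vectors visited by a step slider with n_steps
    positions, from the start point to the end point inclusive."""
    return [interpolate(start, end, k / (n_steps - 1)) for k in range(n_steps)]

start_point = [0.0, 1.0]
end_point = [1.0, 0.0]
path = movement(start_point, end_point, 5)
```

Feeding each vector in `path` to the descriptor-based model, one after another, would give the sequence of patterns heard while moving the step slider.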
3.6 User Experience Experiment
To evaluate the usability and effectiveness of the system, we develop a fully web-based study where participants attempt to replicate reference rhythmic patterns using descriptor sliders.

3.6.1 Implementation
A web-based format is chosen to maximize accessibility and scalability, allowing a broader and more diverse pool of participants to take part in the study remotely. This approach also simplifies deployment and reduces setup time, enabling efficient collection of large amounts of data across different devices and environments.

Since the original interactive system is developed as a Pure Data patch, we re-implement it for the web. The webapp is built using the following stack:
• React: for building the interactive user interface.³
• Tone.js: for sequencing and playing samples.⁴
• ONNX Runtime: to execute our final PyTorch model within the browser.⁵

This architecture ensures that pattern generation and audio playback are executed locally within the browser, minimizing latency and ensuring a smooth user experience.

³ https://react.dev
⁴ https://tonejs.github.io
⁵ https://onnxruntime.ai

3.6.2 Structure
The experiment consists of the following stages:
• Background questionnaire: participants provide demographic information, including age range and musical experience (years of study, performance, and percussion-specific experience).
• Eight exercises: in each exercise, participants listen to a target rhythmic pattern (Pattern A) and attempt to replicate it using the provided sliders, which control rhythmic descriptors such as onset_count, start, center, syncopation, and balance. After submitting their solution (Pattern B), they are asked to rate the perceived similarity between the two patterns. The two patterns are randomized but they are always within a medium distance (as defined in the smoothness experiment).
• Final feedback: participants offer their overall impressions regarding the user interface, the clarity of the task, and the difficulty of controlling each descriptor, along with open-ended comments about confusing aspects, liked features, and suggestions for improvement.
3.6.3 Data Collection and Storage
Throughout the study, a range of objective and subjective data points are collected for each participant and stored privately in Google Sheets using the App Scripts API. Some of these fields are directly submitted by the participants, while others derive from processing the experimental data (e.g., objective and parametric similarity metrics or elapsed time). No personal or identifiable information is collected, and all stored data is processed with the informed consent of the participants.

The collected data includes:
• Background information:
– Participant ID
– Age range (18-25, 26-35, 36-45, 46-55, 56+)
– Years of musical study (0-4+ scale)
– Years of musical performance (0-4+ scale)
– Years spent performing percussion (0-4+ scale)
• Exercise data:
– Exercise number (1-8)
– Target pattern (16-step pattern)
– Submitted pattern (16-step pattern)
– Target descriptors (5 descriptor values)
– Initial descriptors (5 descriptor values)
– Final descriptors (5 descriptor values)
– Elapsed time (trial duration in seconds)
– Subjective similarity (0-5 scale)
– Objective similarity (Euclidean distance between patterns)⁶
– Parametric similarity (Euclidean distance between descriptors)⁷
• Final feedback:
– Ratings of interface intuitiveness and difficulty per descriptor (0–5 scale)
– Confusing aspects of the interface (free text)
– Liked aspects of the interface (free text)
– Feedback, suggestions, or ideas (free text)

Figures 7 to 11 illustrate key stages of the user study, including the introduction, background questionnaire, exercise interface, and feedback questions.

⁶ Defined by the Euclidean distance between the presented pattern and the one submitted by the user.
⁷ Computed with the Euclidean distance between the initial set of descriptors and the final one defined by the user.
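
Both similarity metrics reduce to the same distance computation, sketched below. The function name and the example values are illustrative and are not taken from the study data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Objective similarity: distance between two 16-step binary patterns.
target_pattern = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
submitted_pattern = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
objective = euclidean(target_pattern, submitted_pattern)

# Parametric similarity: distance between two 5-value descriptor vectors.
initial_descriptors = [0.0, 0.0, 0.0, 0.0, 0.0]
final_descriptors = [0.3, 0.4, 0.0, 0.0, 0.0]
parametric = euclidean(initial_descriptors, final_descriptors)
```

For binary patterns the distance is the square root of the number of differing steps, so lower values mean more similar patterns.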

Figure 7: User experience experiment: intro and background questionnaire.
Figure 8: User experience experiment: familiarity step.
Figure 9: User experience experiment: exercise interaction interface.
Figure 10: User experience experiment: exercise feedback form.
Figure 11: User experience experiment: final feedback questionnaire.

Chapter 4
Results
4.1 Dimensionality Reduction
We systematically performed a series of leave-k-out experiments, where we removed one, two, or three descriptors at a time and measured the classification accuracy. Tables 2, 3, and 4 summarize the results for the leave-one-out (L1O), leave-two-out (L2O), and leave-three-out (L3O) tests, respectively. Each row in these tables corresponds to one experimental configuration, listing the descriptors used and the resulting classification accuracy.
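
The configurations in these experiments can be enumerated with a short sketch. The descriptor names are those used in the tables; the function name is illustrative, and generating the subsets with itertools.combinations is an assumption about how the configurations were produced.

```python
from itertools import combinations

DESCRIPTORS = [
    "onset_count", "start", "center", "syncopation",
    "syncopation_awareness", "evenness", "balance", "syness",
]

def leave_k_out(descriptors, k):
    """Every subset obtained by dropping exactly k descriptors."""
    return list(combinations(descriptors, len(descriptors) - k))

# Number of configurations for k = 1, 2, 3.
counts = [len(leave_k_out(DESCRIPTORS, k)) for k in (1, 2, 3)]

# One of the five-descriptor subsets from the leave-three-out case.
five_descriptor_subset = leave_k_out(DESCRIPTORS, 3)[2]
```

The counts (8, 28, and 56) match the number of rows in Tables 2, 3, and 4, and the lexicographic order produced here appears to coincide with the row order of the tables for the rows checked.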

Id  Acc. (%)  Descriptors
 1  79.13     onset_count, start, center, syncopation, syncopation_awareness, evenness, balance
 2  77.68     onset_count, start, center, syncopation, syncopation_awareness, evenness, syness
 3  78.83     onset_count, start, center, syncopation, syncopation_awareness, balance, syness
 4  78.38     onset_count, start, center, syncopation, evenness, balance, syness
 5  74.92     onset_count, start, center, syncopation_awareness, evenness, balance, syness
 6  76.22     onset_count, start, syncopation, syncopation_awareness, evenness, balance, syness
 7  76.06     onset_count, center, syncopation, syncopation_awareness, evenness, balance, syness
 8  78.94     start, center, syncopation, syncopation_awareness, evenness, balance, syness

Table 2: Leave-one-out (L1O) experiment results.

Id  Acc. (%)  Descriptors
 1  77.66     onset_count, start, center, syncopation, syncopation_awareness, evenness
 2  79.16     onset_count, start, center, syncopation, syncopation_awareness, balance
 3  77.14     onset_count, start, center, syncopation, syncopation_awareness, syness
 4  74.29     onset_count, start, center, syncopation, evenness, balance
 5  77.05     onset_count, start, center, syncopation, evenness, syness
 6  78.20     onset_count, start, center, syncopation, balance, syness
 7  74.68     onset_count, start, center, syncopation_awareness, evenness, balance
 8  73.59     onset_count, start, center, syncopation_awareness, evenness, syness
 9  74.66     onset_count, start, center, syncopation_awareness, balance, syness
10  74.67     onset_count, start, center, evenness, balance, syness
11  76.11     onset_count, start, syncopation, syncopation_awareness, evenness, balance
12  74.87     onset_count, start, syncopation, syncopation_awareness, evenness, syness
13  74.42     onset_count, start, syncopation, syncopation_awareness, balance, syness
14  75.54     onset_count, start, syncopation, evenness, balance, syness
15  71.90     onset_count, start, syncopation_awareness, evenness, balance, syness
16  76.55     onset_count, center, syncopation, syncopation_awareness, evenness, balance
17  74.31     onset_count, center, syncopation, syncopation_awareness, evenness, syness
18  73.03     onset_count, center, syncopation, syncopation_awareness, balance, syness
19  75.48     onset_count, center, syncopation, evenness, balance, syness
20  72.46     onset_count, center, syncopation_awareness, evenness, balance, syness
21  70.21     onset_count, syncopation, syncopation_awareness, evenness, balance, syness
22  77.04     start, center, syncopation, syncopation_awareness, evenness, balance
23  77.68     start, center, syncopation, syncopation_awareness, evenness, syness
24  78.47     start, center, syncopation, syncopation_awareness, balance, syness
25  75.00     start, center, syncopation, evenness, balance, syness
26  74.79     start, center, syncopation_awareness, evenness, balance, syness
27  76.27     start, syncopation, syncopation_awareness, evenness, balance, syness
28  76.15     center, syncopation, syncopation_awareness, evenness, balance, syness

Table 3: Leave-two-out (L2O) experiment results.
Id Acc. (%) Desc ip o s
1 77.32 onse _coun , s a , cen e , syncopa ion, syncopa ion_awa eness
2 73.02 onse _coun , s a , cen e , syncopa ion, e enness
3 74.05 onse _coun , s a , cen e , syncopa ion, balance
4 76.58 onse _coun , s a , cen e , syncopa ion, syness
5 73.56 onse _coun , s a , cen e , syncopa ion_awa eness, e enness
6 74.58 onse _coun , s a , cen e , syncopa ion_awa eness, balance
7 72.91 onse _coun , s a , cen e , syncopa ion_awa eness, syness
8 71.16 onse _coun , s a , cen e , e enness, balance
9 73.58 onse _coun , s a , cen e , e enness, syness
10 74.47 onse _coun , s a , cen e , balance, syness
11 74.91 onse _coun , s a , syncopa ion, syncopa ion_awa eness, e enness
12 74.31 onse _coun , s a , syncopa ion, syncopa ion_awa eness, balance
13 73.65 onse _coun , s a , syncopa ion, syncopa ion_awa eness, syness
14 70.78 onse _coun , s a , syncopa ion, e enness, balance
15 74.30 onse _coun , s a , syncopa ion, e enness, syness
16 73.62 onse _coun , s a , syncopa ion, balance, syness
17 71.92 onse _coun , s a , syncopa ion_awa eness, e enness, balance
18 70.57 onse _coun , s a , syncopa ion_awa eness, e enness, syness
19 69.94 onse _coun , s a , syncopa ion_awa eness, balance, syness
20 71.77 onse _coun , s a , e enness, balance, syness
21 74.77 onse _coun , cen e , syncopa ion, syncopa ion_awa eness, e enness
22 73.06 onse _coun , cen e , syncopa ion, syncopa ion_awa eness, balance
23 71.99 onset_count, center, syncopation, syncopation_awareness, syness
24 71.77 onset_count, center, syncopation, evenness, balance
25 74.19 onset_count, center, syncopation, evenness, syness
26 72.63 onset_count, center, syncopation, balance, syness
27 72.59 onset_count, center, syncopation_awareness, evenness, balance
28 70.39 onset_count, center, syncopation_awareness, evenness, syness
29 68.58 onset_count, center, syncopation_awareness, balance, syness
30 72.28 onset_count, center, evenness, balance, syness
31 70.11 onset_count, syncopation, syncopation_awareness, evenness, balance
32 69.34 onset_count, syncopation, syncopation_awareness, evenness, syness
33 68.40 onset_count, syncopation, syncopation_awareness, balance, syness
34 69.74 onset_count, syncopation, evenness, balance, syness
35 65.77 onset_count, syncopation_awareness, evenness, balance, syness
36 75.22 start, center, syncopation, syncopation_awareness, evenness
37 75.36 start, center, syncopation, syncopation_awareness, balance
38 77.15 start, center, syncopation, syncopation_awareness, syness
39 72.73 start, center, syncopation, evenness, balance
40 73.57 start, center, syncopation, evenness, syness
41 74.08 start, center, syncopation, balance, syness
42 73.14 start, center, syncopation_awareness, evenness, balance
43 73.56 start, center, syncopation_awareness, evenness, syness
44 74.48 start, center, syncopation_awareness, balance, syness
45 72.84 start, center, evenness, balance, syness
46 72.86 start, syncopation, syncopation_awareness, evenness, balance
47 74.77 start, syncopation, syncopation_awareness, evenness, syness
48 74.26 start, syncopation, syncopation_awareness, balance, syness
49 71.08 start, syncopation, evenness, balance, syness
50 71.99 start, syncopation_awareness, evenness, balance, syness
51 73.75 center, syncopation, syncopation_awareness, evenness, balance
52 74.48 center, syncopation, syncopation_awareness, evenness, syness
53 72.84 center, syncopation, syncopation_awareness, balance, syness
54 72.33 center, syncopation, evenness, balance, syness
55 72.25 center, syncopation_awareness, evenness, balance, syness
56 70.04 syncopation, syncopation_awareness, evenness, balance, syness
Table 4: Leave-three-out (L3O) experiment results.
Chapter 5
Discussion
In this chapter, we analyze and interpret the data collected in the results,
examining the performance, usability, and perceptual relevance of the different
rhythm generation approaches. By discussing both quantitative and qualitative
findings, we aim to highlight the strengths, limitations, and practical
implications of our descriptor-based rhythm generation approach.
5.1 VAE vs. Descriptor-Based Approaches
The preliminary comparison between VAE and descriptor-based models validates
the central hypothesis of this research. While the VAE approach demonstrated
smooth latent space interpolation, it failed to provide meaningful user control
due to abstract latent dimensions that made it impossible to predict or
intentionally influence specific rhythmic properties.
The descriptor-based model, by contrast, enabled purposeful exploration and
predictable outcomes. In the trade-off between perfect smoothness and
interpretable control, interactive applications that prioritize user agency are
better served by interpretable control. The success of the descriptor-based
model reinforces the value of incorporating domain knowledge from music
perception research rather than relying solely on data-driven feature learning.
5.2 Descriptor Selection
The leave-k-out experiments (Tables 2, 3, and 4) revealed important insights
about descriptor redundancy and the minimal feature set required for effective
rhythm generation. The results demonstrate that a reduced set of five
descriptors (onset_count, start, center, syncopation and balance) maintains
strong predictive performance while significantly simplifying the control space.
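As a rough illustration of how the candidate descriptor subsets in a
leave-k-out experiment can be enumerated, the sketch below lists every
five-descriptor subset of the eight descriptors. It assumes that each row of
Table 4 lists the descriptors that were retained, and it infers the descriptor
order from the row order of that table; the function name is hypothetical, and
training and scoring a model per subset is not shown.

```python
from itertools import combinations

# Descriptor order inferred from the row order of Table 4 (an assumption).
DESCRIPTORS = [
    "onset_count", "start", "center", "syncopation",
    "syncopation_awareness", "evenness", "balance", "syness",
]

def leave_k_out_subsets(descriptors, k):
    """Return every subset obtained by dropping exactly k descriptors."""
    keep = len(descriptors) - k
    return list(combinations(descriptors, keep))

# Leave-three-out: all subsets that keep five of the eight descriptors.
l3o = leave_k_out_subsets(DESCRIPTORS, 3)
print(len(l3o))   # number of five-descriptor subsets
print(l3o[22])    # zero-based index 22 lines up with row 23 of Table 4
```

Under these assumptions, the lexicographic order produced by
`itertools.combinations` matches the row numbering of Table 4, so a row number
can be mapped back to its descriptor subset by index.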
The transition from eight to five descriptors resulted in only a modest
decrease in pattern-based accuracy (from 78.35% to 74.05%, as seen in Table 4)
while maintaining high descriptor-based accuracy (92.24%). This finding is
particularly significant for practical applications, as it suggests that three
descriptors (syncopation awareness, evenness, and syness) contribute relatively
little unique information beyond what is captured by the core five features.
The higher descriptor-based accuracy (92.24%) compared with pattern-based
accuracy (74.05%) indicates that the model successfully learns to preserve the
perceptual and structural properties of rhythms even when exact pattern
reconstruction is imperfect. This aligns with music cognition research
suggesting that human rhythm perception is more tolerant of surface-level
variations when underlying structural relationships are maintained.
5.3 Smoothness Analysis
The smoothness experiment (Figure 12) provided evidence for model selection
between the pattern-based and descriptor-based training approaches. The ANOVA
results in Table 5 confirm that the descriptor-based error metric produces
significantly smoother transitions across all movement categories. The post-hoc
Tukey’s HSD analysis results in Table 6 reveal a clear hierarchy in smoothness
across movement magnitudes. Small movements consistently show the smoothest
behavior, followed by medium movements, with large movements exhibiting the
most variability. This hierarchy suggests that users can expect more
predictable rhythmic transitions when making subtle descriptor adjustments
compared to dramatic changes.
The choice of KL divergence and Euclidean distance as complementary smoothness
metrics proves valuable. KL divergence captures probabilistic differences in
the model’s output distributions, reflecting the uncertainty and gradation in
rhythm generation, while Euclidean distance in descriptor space measures how
well the model preserves the intended control relationships. The concordance
between these metrics strengthens confidence in the smoothness findings.
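The two quantities can be sketched as follows. This is a minimal illustration,
not the thesis implementation: it assumes the model outputs one independent
onset probability per step (so the KL divergence is a sum of per-step Bernoulli
terms), and the function names, the clipping constant, and all example values
are hypothetical.

```python
import math

def bernoulli_kl(p, q, eps=1e-9):
    """KL(P || Q) for two vectors of per-step onset probabilities,
    treating every step as an independent Bernoulli variable (assumption)."""
    total = 0.0
    for pi, qi in zip(p, q):
        # Clip to avoid log(0) when a probability is exactly 0 or 1.
        pi = min(max(pi, eps), 1.0 - eps)
        qi = min(max(qi, eps), 1.0 - eps)
        total += pi * math.log(pi / qi)
        total += (1.0 - pi) * math.log((1.0 - pi) / (1.0 - qi))
    return total

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical model outputs before and after a small slider movement.
p_before = [0.9, 0.1, 0.1, 0.1] * 4
p_after = [0.9, 0.1, 0.2, 0.1] * 4
kl = bernoulli_kl(p_before, p_after)

# Hypothetical normalized descriptor vectors: requested vs. generated.
requested = [0.50, 0.25, 0.40, 0.30, 0.60]
generated = [0.50, 0.25, 0.45, 0.30, 0.60]
dist = euclidean(requested, generated)
print(kl, dist)
```

A smooth model, in this reading, yields small values of both quantities for
small descriptor movements.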
5.4 User Experience Experiment
5.4.1 Validation of the Descriptor-Based Approach
The correlation between objective and subjective similarity measures (Figures
14 and 15) validates the use of Euclidean distance for rhythm pattern
comparison and confirms that our computational measures align with human
judgment. This alignment demonstrates that participants’ perceptual assessments
of rhythmic similarity correspond meaningfully with algorithmic distance
calculations, supporting the fundamental assumption that geometric
relationships in descriptor space reflect musical relationships as perceived by
listeners.
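For readers who want to reproduce this kind of check on their own data, the
sketch below computes a Pearson correlation coefficient between an objective
distance and a subjective rating. The choice of Pearson's r is an assumption
(the statistic used for Figures 14 and 15 may differ), the function name is
hypothetical, and the numbers are placeholders rather than data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Placeholder values for illustration only; these are NOT data from the study.
objective_distance = [0.10, 0.35, 0.50, 0.80, 1.20]  # distance to the target
subjective_rating = [5, 4, 4, 2, 1]                  # hypothetical similarity rating

r = pearson_r(objective_distance, subjective_rating)
print(r)  # negative here: larger distance goes with lower rated similarity
```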
Beyond validating our similarity metric, this correlation provides evidence for
the effectiveness of our chosen descriptors as a control space. The fact that
users’ subjective evaluations consistently relate to computational measures
suggests that the five-dimensional descriptor space captures perceptually
relevant aspects of rhythmic structure. This perceptual grounding distinguishes
our approach from abstract latent representations, where such alignment between
computational and human similarity judgments cannot be assumed.
The statistical significance of this relationship establishes a foundation for
automated evaluation of rhythm generation quality, enabling future systems to
optimize for human-perceived similarity. Moreover, it validates the
descriptor-based methodology as a bridge between computational representation
and musical cognition, supporting the integration of music perception research
with machine learning approaches to interactive music systems.
5.4.2 User Background and Performance
Table 7 summarizes the correlations between participant background, elapsed
time doing the exercises, and the three types of similarity. The results reveal
a weak but statistically significant negative correlation between elapsed time
and objective similarity, suggesting that participants who spent more time on
each exercise tended to perform slightly worse. This may indicate that longer
interaction did not consistently support learning, possibly due to ineffective
exploration, task difficulty, or frustration.
Musical study experience also showed a weak negative correlation with
subjective similarity ratings, revealing a paradox: more musically trained
participants rated their reproductions as less similar to the targets, despite
no significant difference in objective performance. This may reflect higher
critical standards among experienced musicians or suggest that musical training
shapes expectations about rhythm control that differ from those supported by
the descriptor-based interface.
By contrast, no significant correlations were found between age range, music
performance experience, percussion performance experience, and any of the
similarity measures. This suggests that the descriptor-based interface may be
equally accessible to both specialists and non-specialists across age groups
and experiential backgrounds. Nonetheless, a larger and more diverse sample of
participants is warranted to obtain more robust insights.
5.4.3 Descriptor Interpretability
User feedback revealed varying levels of descriptor intuitiveness (Figures 16
and 17): start and onset_count were most accessible, while syncopation and
balance proved challenging. Despite syncopation’s theoretical grounding in
music theory, users found it difficult to control in practice, indicating a gap
between theoretical validity and practical usability.
Balance, defined as symmetry of onset distribution around the unit circle,
received low intuitiveness ratings. While mathematically well-defined, this
geometric measure may be too abstract for immediate musical comprehension
without additional interface support or training.
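To make the geometric idea concrete, the sketch below places each onset of a
cyclic pattern on the unit circle and measures how far the centroid of those
points lies from the circle's center. This is one common formulation of rhythmic
balance and is offered as an assumption; the exact definition used for the
balance descriptor earlier in the thesis may differ, and the function name and
the convention for the empty pattern are hypothetical.

```python
import cmath
import math

def balance(pattern):
    """Balance of a cyclic binary pattern: 1 minus the distance of the
    onset centroid from the center of the unit circle (assumed definition)."""
    onsets = [k for k, hit in enumerate(pattern) if hit]
    if not onsets:
        return 0.0  # convention chosen here for the empty pattern (assumption)
    n_steps = len(pattern)
    # Each onset at step k becomes the point exp(2*pi*i*k / n_steps).
    centroid = sum(cmath.exp(2j * math.pi * k / n_steps) for k in onsets)
    centroid /= len(onsets)
    return 1.0 - abs(centroid)

four_on_the_floor = [1, 0, 0, 0] * 4        # onsets evenly spread around the circle
front_loaded = [1, 1, 1, 1] + [0] * 12      # onsets bunched at the start

print(balance(four_on_the_floor))  # close to 1: centroid sits at the center
print(balance(front_loaded))       # much lower: centroid pulled to one side
```

Seen this way, a slider labeled "balance" moves onsets so that they pull less
(or more) toward one side of the cycle, which may be a more approachable
explanation for users than the formula itself.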
5.5 Qualitative User Feedback Analysis
In addition to quantitative measures, a qualitative analysis of user feedback
provided rich insights into participants’ experiences with the descriptor-based
rhythm interface. Tables 8, 9, and 10 summarize these findings.
Table 8 highlights aspects of the system that users appreciated, such as the
clarity of the interface, immediate auditory and visual feedback, and the
enjoyment of exploratory learning. Participants generally found the start and
onset_count parameters intuitive. Conversely, they reported difficulties with
less tangible descriptors like syncopation and balance, reflecting a gap
between theoretical relevance and practical interpretability. Participants also
expressed a need for explanatory text or guidance, suggesting that some aspects
of the interface were initially opaque.
Table 9 distills recurring patterns in usability. Notably, users valued
immediate feedback and hands-on experimentation, which facilitated engagement
and learning. However, the complexity of certain sliders and parameter
interactions sometimes hindered understanding, highlighting the importance of
interface design that balances expressive control with accessibility.
Finally, Table 10 presents specific feature requests. Users proposed contextual
tooltips, tutorial systems, and visual aids to better map slider movements to
rhythmic outcomes. Additional suggestions included adaptive interfaces tailored
to user expertise and expanded control over musical parameters such as swing,
instrument selection, and sound textures. These requests indicate directions
for improving both learnability and creative flexibility in future systems.
5.6 Methodological Considerations and Limitations
This research operates within several methodological constraints that shape
both its contributions and applicability. The focus on 16-step monophonic
patterns provides computational tractability and enables systematic analysis of
the complete pattern space, but necessarily limits direct application to
multiple genres of music that feature variable time signatures. However, this
constraint serves the research goals of establishing fundamental principles of
descriptor-based control that can inform more complex systems.
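The tractability claim is easy to check: a 16-step binary pattern space has
2^16 = 65,536 members, small enough to enumerate exhaustively. The sketch below
does so and tallies patterns by onset count. It assumes that onset_count is
simply the number of active steps and that the empty pattern is included; the
helper names are hypothetical.

```python
from collections import Counter

N_STEPS = 16

def pattern_from_index(i, n_steps=N_STEPS):
    """Decode integer i into a tuple of 0/1 steps, with step 0 first."""
    return tuple((i >> (n_steps - 1 - s)) & 1 for s in range(n_steps))

def onset_count(pattern):
    """Number of active steps in the pattern (assumed descriptor definition)."""
    return sum(pattern)

# Enumerate the complete pattern space and count patterns per onset count.
histogram = Counter(
    onset_count(pattern_from_index(i)) for i in range(2 ** N_STEPS)
)
total_patterns = sum(histogram.values())
print(total_patterns)
print(sorted(histogram.items()))
```

Because the whole space fits comfortably in memory, every descriptor can be
precomputed for every pattern, which is what makes systematic analysis feasible
at this pattern length.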
The user study with 12 participants provides initial insights into system
usability, but the sample size and participant composition limit the
generalizability of findings. Notably, few participants had extensive
percussion experience (Figure 13), which may have influenced interpretability
ratings of rhythm-specific descriptors.
The pattern replication task enables quantitative evaluation and objective
performance metrics. While such a task may not fully reveal the system’s
creative potential, it establishes baseline performance characteristics that
inform system design, prioritizing rigorous evaluation of core principles over
immediate practical deployment.

Chapter 6
Conclusions
The central goal of this research was to create a rhythm generation system
where users can intuitively shape rhythmic patterns by controlling musically
relevant properties. This chapter summarizes how we achieved this objective
through systematic comparison of generation approaches, optimization of the
control interface, and validation of the system’s effectiveness for musical
interaction.
We approached this problem by comparing two generation methods: variational
autoencoders (VAEs) and descriptor-based neural networks. While the VAE
approach is mathematically elegant and offers smooth latent space
interpolation, it proved unsuitable for musical interaction due to its
abstract, unintuitive control dimensions. In contrast, the descriptor-based
approach, which leverages rhythm features grounded in music perception
research, provided a more musically meaningful and engaging interface for users.
Initially, we trained the neural network using eight rhythm descriptors. To
optimize the descriptor-based system for live use, we then systematically
reduced the control space from eight to five descriptors through leave-k-out
experiments. This minimal set (onset_count, start, center, syncopation, and
balance) maintained a pattern-based accuracy of 74.05%, down from 78.35%, while
significantly simplifying the interface.
We defined a smoothness metric to evaluate model performance, quantifying how
gradual changes in descriptor values correspond to smooth rhythmic transitions.
The metric combined KL divergence and Euclidean distance in descriptor space to
capture both probabilistic and geometric aspects of rhythm change. To
systematically assess behavior, we introduced the concept of a movement,
categorized into three types (small, medium, and large) based on the magnitude
of descriptor adjustments. Using this metric across movement types, we
identified the model that produced the most continuous and predictable rhythmic
transformations.
The user experience experiment, conducted with 12 participants, validated the
system’s usability and effectiveness. Participants were able to successfully
manipulate rhythmic properties using the descriptor sliders, demonstrating
intuitive control over the generated rhythms. Moreover, the observed
correlation between objective similarity metrics and participants’ subjective
similarity ratings confirmed that our computational approach aligns closely
with human rhythm perception, supporting the perceptual relevance of the chosen
descriptors.
While constrained to 16-step monophonic patterns and limited by our participant
sample, this work demonstrates that perceptually grounded descriptors can
bridge computational rhythm generation with intuitive musical control,
establishing a foundation for real-time interactive rhythm systems.
Chapter 7
Future Work
A major direction for future research is the development of a pipeline that
extends descriptor-controlled monophonic generation into polyphonic drum
patterns. This next step would build on previous research, such as the tapping
studies by Clark (2023), which demonstrated how monophonic rhythmic input can
be mapped to polyphonic outputs while preserving perceptual structure, and the
dualized drum patterns dataset by Haki et al. (2023). By integrating our
descriptor-based monophonic generator with polyphonic expansion techniques,
users could design simple rhythmic skeletons through intuitive descriptor
control and transform them into rich, multi-voice drum arrangements in real
time. This approach would require further investigation into how descriptor
relationships, such as syncopation and balance, translate to polyphonic
structures and whether additional descriptors would be needed.
The experiments in this thesis relied on the full 16-step binary pattern
dataset, which, while comprehensive, can introduce redundancy and noise in the
resulting patterns. Future work could explore intelligent dataset reduction
strategies to improve efficiency while preserving musical diversity, such as
clustering rhythmically similar patterns or identifying archetypal rhythm
families. Additionally, it would be interesting to explore extending the system
to generate variable-length patterns beyond the 16-step constraint, which would
require developing descriptor formulations that generalize across different
time signatures and pattern lengths.
Finally, it would be valuable to replicate the user experience experiment with
a larger and more diverse participant pool to obtain more robust insights into
usability and perceptual alignment. Future studies could also consider an
audio-only version of the experiment, removing the visual representation of
patterns, to assess how well participants can manipulate and perceive rhythmic
structures based purely on sound. Additionally, given that users in the current
study were generally able to match target patterns effectively, presenting
exercises with larger descriptor distances could help evaluate the system’s
behavior over a broader range of rhythmic variations and further challenge
participants’ control and perceptual sensitivity.