
SurpriseLSTM: Neural Modeling of Musical Expectation and Surprise in Monophonic Melodies

Author: Lissenko, Tanguy
Publisher: Zenodo
DOI: 10.5281/zenodo.17304927
Source: https://zenodo.org/records/17304927/files/Tanguy-Lissenko_SMC_2025_Master_Thesis.pdf
Master in Sound and Music Computing
Universitat Pompeu Fabra
SurpriseLSTM: Neural Modeling of Musical Expectation and Surprise in Monophonic Melodies
Tanguy Lissenko
Supervisors: Martin Rocamora, Manuel Anglada-Tort
August 2025
Contents
1 Introduction
1.1 Motivation
1.2 Background and context
1.3 Problem statement
1.4 Research objectives
1.5 Methodological overview
1.6 Contributions
1.7 Thesis Organization
2 State of the Art
2.1 Musical Expectation, Surprise, and Uncertainty
2.2 Musical Style as Probabilistic Grammar
2.3 Rule-Based Expectation: Implication-Realization
2.4 Probabilistic Rule-Based Approaches
2.5 IDyOM: Information Dynamics of Music
2.5.1 Variable-order PPM* inference
2.5.2 Multiple Viewpoint Integration
2.5.3 Short-term and Long-term Memory
2.6 Deep Learning Approaches
2.6.1 Information-Content Curve Matching (IIC)
2.6.2 AudioIC: Surprisal from Audio
2.6.3 Diffusion-Based Surprisal Estimation
3 Dataset, Preprocessing, and Feature Representation
3.1 Monophonic MIDI Corpora
3.2 Preprocessing of MIDI Stimuli
3.3 Feature Representation
4 Model Architecture and Training
4.1 Why Next-Pitch Prediction
4.2 Desired Model Behavior
4.3 Architectural Motivation
4.4 Data Input, Masking, and Packing
4.5 SurpriseLSTM Architecture
4.6 Training Objective and Loss
5 Experimental Validation
5.1 Experimental Methodology
5.2 Experiment 1: Incremental Feature Selection
5.3 Experiment 2: Model-Model Correlations
5.4 Experiment 3: Surprise vs. Pleasantness
5.5 Experiment 4: Chorale Entropy Profiles
5.6 Experiment 5: Wundt-Effect Modeling
5.7 Summary and Discussion
6 Neural vs. Statistical Musical Expectation
6.1 Chapter Overview
6.2 Data and Experimental Design
6.3 Statistical Analysis
6.4 Results and Analysis
6.4.1 Training Data Dependencies
6.4.2 Cadential Context Analysis
6.4.3 Architectural and Cognitive Implications
6.5 Discussion
7 Discussion and Future Work
7.1 Summary of Contributions
7.2 Implications for Musical Cognition
7.3 Extensions to Polyphonic and Audio Domains
7.4 Cross-Cultural and Individual Differences
7.5 Conclusion
List of Figures
List of Tables
Bibliography
Abstract
We introduce SurpriseLSTM, an LSTM network that predicts note-by-note surprisal in strictly monophonic, symbolic melodies. We conduct the first systematic comparisons of a neural expectancy model against the symbolic IDyOM model and the audio-based AudioIC algorithm across five complementary validation paradigms: large-scale correlations on Western melody corpora, two-note pleasantness ratings, Bach-chorale surprise profiles, and the musical Wundt effect. SurpriseLSTM matches or exceeds the performance of IDyOM in aligning with human surprise judgments. A detailed context analysis reveals distinct differences in how neural and statistical models respond to authentic cadences versus non-cadential situations, with SurpriseLSTM showing superior containment rates and distribution fitting in predicting human scale degree expectations. These findings demonstrate that recurrent neural networks, trained on basic note-level features, can accurately capture the cognitive principles of statistical learning and probabilistic prediction in music perception. Code and a pre-trained model are available at https://github.com/lissenko/surprise-lstm.
Keywords: Musical surprise; Information content; LSTM; IDyOM

Chapter 1
Introduction
1.1 Motivation
Music has the ability to create expectations in listeners and then surprise them by violating those expectations. When we hear the opening notes of a familiar melody, we unconsciously anticipate what comes next. Sometimes our predictions are confirmed, creating a sense of satisfaction. Other times, the music takes an unexpected turn. We might encounter a surprising chord change, an unusual melodic leap, or an unexpected rhythm, creating moments of tension, excitement, or delight.

Take Beethoven's Fifth Symphony: the opening da-da-da-DUM motif creates a clear pattern that listeners learn quickly. When this pattern returns later, sometimes unchanged and sometimes varied, it confirms or surprises our expectations. In jazz standards like Autumn Leaves, musicians first play the melody as written, setting up what listeners expect to hear. Then they improvise variations that change the melody while keeping the basic chord structure.

This interplay between expectation and surprise plays an important part in our musical aesthetic experience [1]. It contributes to our emotional response and our engagement with the music. Moreover, if a piece is entirely predictable, it may feel boring, and if it is completely unpredictable, it may feel chaotic and unpleasant. The music we like often strikes a balance between the violation and the confirmation of expectations.
Understanding how these expectations form and how surprise affects our musical experience is therefore crucial for several reasons. First, it reveals fundamental aspects of human cognition: how we learn patterns, make predictions, and adapt to new information in real time. Second, it helps explain why certain musical structures are found across cultures and historical periods, suggesting the psychological principles that govern musical aesthetics. Third, as artificial intelligence systems increasingly generate music, the ability to model and control musical surprise becomes essential for creating compositions that feel both coherent and engaging to human listeners.
1.2 Background and context
The scientific study of musical expectation has evolved through several phases. In the 1950s, music theorist Leonard Meyer proposed that musical meaning emerges from the creation and resolution of expectations: when music confirms what we anticipate, we feel satisfaction; when it violates our predictions, we experience tension or surprise [2]. Building on this foundation, researchers developed rule-based theories to explain how these expectations form, drawing on principles from music theory and perceptual psychology [3, 4, 5, 1]. Since the 1990s, however, a different approach has emerged based on implicit statistical learning. This theory proposes that listeners unconsciously absorb the statistical regularities of musical styles through mere exposure, gradually building internal models that allow them to predict likely melodic and harmonic continuations [6, 7, 8]. This learning process operates automatically, without conscious effort or formal musical training, and follows principles similar to those observed in language acquisition and other cognitive domains [9, 10, 11].
1.3 Problem statement
Recent decades have seen the development of computational models that attempt to capture these processes. The most prominent is IDyOM (Information Dynamics of Music), which uses statistical techniques to predict the next note in a melody and quantifies surprise as the information content of unexpected events. Such models have successfully predicted aspects of human musical perception and have found applications in music analysis, composition, and even neuroscience research [7, 8, 12, 13, 14, 15].
However, computational modeling has evolved with the rise of deep learning. Neural networks have demonstrated important abilities to learn complex patterns in sequential data, from natural language [16, 17, 18] to music [19, 20]. This raises fundamental questions: Can modern neural architectures capture the same principles of musical expectation that govern human cognition? How do their predictions compare to established statistical models? And perhaps most importantly, do they align with human judgments of musical surprise?

This thesis addresses these questions by focusing on the case of monophonic melodies: single lines of music without harmonic accompaniment. While simpler than full musical textures, monophonic melodies contain the essential ingredients of musical expectation: patterns of pitch and rhythm that create predictable structures and opportunities for surprise.
1.4 Research objectives
To address these gaps, we pursue three interrelated goals:
1. Neural instantiation of SLH and PPH: Develop SurpriseLSTM, a recurrent network that instantiates the statistical learning hypothesis (SLH) and the probabilistic prediction hypothesis (PPH): it learns transition statistics and outputs pitch-prediction distributions suitable for information-content computation.
2. Model-to-model comparison: Conduct a systematic evaluation of SurpriseLSTM against two benchmarks, IDyOM and the audio-based AudioIC model. We use four complementary paradigms: large-scale correlations on monophonic Western melodies, two-note interval pleasantness ratings, surprise profile fits on Bach chorales, and a replication of the musical Wundt effect.
3. Musical context analysis: Investigate how SurpriseLSTM and IDyOM respond differently to authentic cadences versus non-cadence contexts. By comparing scale degree distributions, we identify where neural and statistical approaches diverge in their understanding of tonal expectations.
1.5 Methodological overview
Our approach involves three main components: developing a neural network model of musical surprise, comparing it with existing models, and testing all models against human data.

We create SurpriseLSTM, a neural network that learns to predict the next note in a melody. The model uses basic musical information from each note (pitch, timing, melodic intervals, etc.) and learns patterns from large collections of melodies. We compare this neural approach with two existing models: IDyOM (a statistical model) and AudioIC (which works with audio recordings).

To evaluate how well these models capture human musical expectations, we conduct five different tests:
• Finding the most important musical features for predicting surprise
• Comparing model predictions across large collections of Western melodies
• Testing whether models match human ratings of how pleasant different musical intervals sound
• Examining detailed surprise patterns in two Bach chorales that have human annotations
• Testing the Wundt effect, the idea that moderately surprising music is most pleasant
1.6 Contributions
1. We introduce SurpriseLSTM, a neural network model that applies statistical learning and probabilistic prediction principles to melodic expectation,
where $H(P) = -\sum_{e} P(e)\,\log_2 P(e)$ and $H_{\max}(v) = \log_2 |E_v|$ is the maximum entropy over viewpoint $v$'s alphabet $E_v$.
2.5.3 Short-term and Long-term Memory
To reflect both lifelong enculturation and piece-specific context, IDyOM maintains a long-term model (LTM) trained on a large corpus of melodies, and a short-term model (STM) built dynamically on the current piece. Their predictions are interpolated, again by entropy weighting, to return the final P(e|c).
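
As a rough illustration of entropy-weighted interpolation, the sketch below combines a long-term and a short-term distribution using weights derived from the relative entropy $H / H_{\max}$. The weighted geometric mean and the exponent `b` follow a formulation commonly given in the IDyOM literature and are assumptions here, not a transcription of IDyOM's implementation; all function names are hypothetical.

```python
import math

def entropy_bits(dist):
    """Shannon entropy H(P) = -sum_e P(e) log2 P(e), in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def relative_entropy(dist):
    """H(P) / H_max, where H_max = log2 |alphabet|."""
    h_max = math.log2(len(dist))
    return entropy_bits(dist) / h_max if h_max > 0 else 1.0

def combine(ltm, stm, b=1.0, eps=1e-9):
    """Weighted geometric mean of two distributions over the same alphabet.

    Assumption: each model's weight is relative_entropy ** (-b), so the more
    confident (lower-entropy) model has more influence. All probabilities
    must be strictly positive.
    """
    models = (ltm, stm)
    weights = [max(relative_entropy(d), eps) ** (-b) for d in models]
    total = sum(weights)
    combined = {
        e: math.exp(sum(w * math.log(d[e]) for w, d in zip(weights, models)) / total)
        for e in ltm
    }
    norm = sum(combined.values())
    return {e: p / norm for e, p in combined.items()}

# Toy distributions over four pitch names (made-up values).
ltm = {"C": 0.25, "D": 0.25, "E": 0.25, "G": 0.25}   # uncertain long-term model
stm = {"C": 0.85, "D": 0.05, "E": 0.05, "G": 0.05}   # confident short-term model
mixed = combine(ltm, stm)
```

In this toy case the confident short-term model dominates, so the combined distribution puts most of its mass on "C".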
2.6 Deep Learning Approaches
2.6.1 Information-Content Curve Matching (IIC)
Bjare et al. [25] propose a method for steering symbolic music generation by matching an Instantaneous Information Content (IIC) curve to a user-specified target. Given any autoregressive critic model $p$, they compute each token's surprisal

$$\mathrm{IC}(x_i \mid x_{<i}) = -\log p(x_i \mid x_{<i})$$

and then project these discrete IC values onto real time via a temporal-localization function $\tau(i, x)$ and a smooth window $\lambda$, yielding the continuous IIC curve

$$\mathrm{IIC}(t, x) = \sum_{\tau(i, x) < t} \lambda\bigl(t - \tau(i, x),\, i\bigr)\, \mathrm{IC}(x_i \mid x_{<i}).$$

To guide generation, they define the $L_1$ deviation between the generated IIC and a target curve $\mathrm{IIC}^*(t)$,

$$\lVert \mathrm{IIC} - \mathrm{IIC}^* \rVert_1 = \int_0^T \bigl|\mathrm{IIC}(t, x) - \mathrm{IIC}^*(t)\bigr|\, dt,$$

and employ beam search over continuations, selecting at each step the candidate whose IIC best matches the target.
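
To make the curve-matching idea concrete, the following sketch computes per-token information content, smooths it into a continuous curve, and measures the L1 deviation from a target. It is a simplified illustration: the exponential window, the onset times, the constant target, and the Riemann-sum integration are all assumptions chosen for the example, and the window here depends only on elapsed time rather than also on the token index.

```python
import math

def information_content(probs):
    """IC(x_i | x_<i) = -log p(x_i | x_<i), here in nats."""
    return [-math.log(p) for p in probs]

def exp_window(dt, decay=0.5):
    """Hypothetical smoothing window: exponential decay with elapsed time."""
    return math.exp(-dt / decay)

def iic(t, onsets, ics, window=exp_window):
    """Sum of window(t - onset) * IC over all tokens whose onset precedes t."""
    return sum(window(t - tau) * ic for tau, ic in zip(onsets, ics) if tau < t)

def l1_deviation(onsets, ics, target, duration, step=0.01):
    """Riemann-sum approximation of the integral of |IIC(t) - target(t)| on [0, duration]."""
    n = int(round(duration / step))
    return sum(abs(iic(k * step, onsets, ics) - target(k * step)) for k in range(n)) * step

# Toy example: probabilities a critic assigned to the tokens actually generated.
token_probs = [0.5, 0.9, 0.05, 0.6]       # made-up values
onsets = [0.0, 0.5, 1.0, 1.5]             # token onset times in seconds

def flat_target(t):
    """Hypothetical target curve: constant surprisal level."""
    return 1.0

ics = information_content(token_probs)
deviation = l1_deviation(onsets, ics, flat_target, duration=2.0)
```

In a beam search, a candidate continuation with a smaller `deviation` would be preferred over one with a larger value.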
In their experiments, they instantiate $p$ as a causal Transformer PIA model [26] built on the Perceiver IO architecture, pretrained on a large corpus of expressive piano performances. MIDI events are tokenized into Pitch, Velocity, Duration, and Timeshift tokens; the Transformer predicts the next token's distribution at each step, yielding the conditional probabilities used for IC. They demonstrate that (1) IIC correlates strongly with harmonic and rhythmic complexity, (2) in listener studies, the target IIC curve that was used can be reliably identified, and (3) matching IIC enables control of surprisal in symbolic music generation.

Figure 1: High-level architecture of IDyOM. Multiple viewpoint streams feed PPM* engines in both long-term and short-term modules. Their weighted outputs combine to produce the final conditional distribution. [24]
Importantly, the underlying PIA model is not publicly released, and key implementation details are not sufficiently documented for independent reimplementation. As a result, while the IIC framework itself is general, we cannot directly reproduce or compare its performance with our SurpriseLSTM in open-source settings.
2.6.2 AudioIC: Surprisal from Audio
Bjare et al. [27] extend surprisal estimation to raw music recordings by training a 12-layer causal Transformer to autoregressively predict 64-dimensional latent audio frames obtained from the Music2Latent [28] consistency autoencoder [29]. Instead of a softmax over a discrete vocabulary, they model the next-frame distribution as a 32-component Gaussian mixture whose parameters (means, variances, weights) are produced by the Transformer's final linear layer. The frame-wise information content

$$\mathrm{IC}(x_t \mid x_{<t}) = -\log p_{\mathrm{GMM}}(x_t \mid x_{<t})$$

is thus unbounded and reflects surprise over continuous audio representations.

They train on a large multi-stem popular-music corpus, fine-tuning on vocal stems when modeling EEG responses. Positional context is captured via rotary embeddings, and attention is computed with FlashAttention. Audio is preprocessed into mono MP3 at 22050 Hz, encoded by Music2Latent at a frame rate of approximately 11 Hz, and concatenated into sequences of up to 4600 frames (about 7 min).

In evaluation, AudioIC's mean IC decreases reliably on repeated segments and rises on contrasting segments later in a piece. Moreover, including AudioIC estimates in a cortical-tracking regression significantly improves EEG prediction over an energy-only baseline, demonstrating that AudioIC captures neural surprisal signatures. The authors provide an open-source implementation and pretrained weights, enabling reproduction of key results.
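
The frame-wise information content under a Gaussian mixture can be sketched as follows. This is a toy illustration with diagonal covariances, two components, and three dimensions instead of the 32 components and 64 dimensions described above; the diagonal-covariance assumption, the parameter values, and the function names are ours, not taken from the AudioIC code.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_information_content(x, weights, means, variances):
    """IC(x) = -log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return -(top + math.log(sum(math.exp(value - top) for value in logs)))

# Made-up mixture parameters; in the model they would come from the network.
weights = [0.7, 0.3]
means = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
variances = [[0.01, 0.01, 0.01], [1.0, 1.0, 1.0]]

ic_expected = gmm_information_content([0.0, 0.0, 0.0], weights, means, variances)
ic_surprising = gmm_information_content([5.0, -5.0, 5.0], weights, means, variances)
```

Because a continuous density can exceed 1, `ic_expected` is negative here, which is one way to see why this IC has no lower bound, unlike the surprisal of a discrete softmax.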
2.6.3 Diffusion-Based Surprisal Estimation
Two distinct approaches have emerged for estimating musical surprisal using diffusion models, each addressing different aspects of temporal musical prediction.

Masclef and Keller [30] propose using a denoising diffusion probabilistic model (DDPM) as a deep generative measure of musical expectation. They treat a diffusion model's variational bound on data likelihood as an approximate surprisal:

$$-\log p_\theta(x_0) \le \underbrace{L_T}_{\approx 0} + \sum_{t=1}^{T} L_{t-1} - L_0,$$

where each $L_t$ is expressed in closed form via the model's predicted noise $\hat{\varepsilon}_\theta(x_t, t)$. They apply a pretrained audio-diffusion-256 model trained on mel-spectrograms, computing total surprisal by summing likelihood bounds across non-overlapping 5-second blocks. Using the Gold et al. corpus, they demonstrate that diffusion-based surprisal exhibits the expected Wundt effect, with significant quadratic relationships between information content and listener preference.
On the other hand, Bjare et al. [31] extend surprisal estimation to causal, autoregressive scenarios using autoregressive diffusion models (ADMs). Unlike approaches that assume Gaussian mixture distributions for next-step predictions, their method estimates information content through the instantaneous change of variables formula applied to probability flow ODEs. They employ two diffusion processes, EDM and Rectified Flow, to estimate surprisal in the Music2Latent continuous audio representation space.

An important insight from their work is that diffusion models can compute likelihood estimates at different noise levels during the denoising process, potentially capturing musical features at varying temporal and spectral granularities. They demonstrate that intermediate noise levels correlate better with symbolic pitch surprisal (as measured by IDyOM) than fully denoised representations, suggesting that controlled noise filtering preserves pitch-related expectation while removing timbral variations irrelevant to melodic surprise.
Chapter 3
Dataset, Preprocessing, and Feature Representation
This chapter surveys the monophonic MIDI corpora employed in our study, details the preprocessing steps applied to each collection, and outlines the full set of melodic features, along with their domains and encoding schemes, used by our LSTM model.
3.1 Monophonic MIDI Corpora
Table 1 lists the MIDI datasets we used, each serving a different role (training, validation or testing). Although all collections are meant to be monophonic, a few files may contain overlapping notes; these are removed during our preprocessing step. We will explain how and when we use each set in Chapters 4 and 5. Each dataset is labeled with an integer ID, which will be used for all subsequent references.
The Clean Melodies collection [32] is a hand-picked subset of the larger Los Angeles MIDI dataset. From that master corpus, which itself merges the Lakh MIDI dataset [33], the MetaMIDI Dataset [34], Reddit MIDI and other public scrapes, it uses only the files containing a single, strictly monophonic melody track or channel. Each melody was extracted directly from its MIDI file (no audio source), and no further transformations or post-processing were applied beyond the monophony filter. This clean subset provides a compact, high-quality set of MIDI melodies for initial model development and feature testing.

Table 1: Overview of Datasets (E/M = events per melody)
ID  Description                             Melodies  Events   E/M   Pitches
1   Clean melodies (Tegridy-MIDI-Dataset)   117659    7530176  64.0  79
2   Western melodies                        1110      50936    45.9  37
      Chorale melodies                      338       16912    50.0  24
      German folk songs                     213       8393     39.4  27
      Canadian folk ballads                 152       8552     56.3  26
      Yugoslavian folk songs                119       2691     22.6  25
      Austrian folk songs                   104       5306     51.0  35
      Swiss folk songs                      93        4586     49.3  34
      Alsatian folk songs                   91        4496     49.4  32
3   Naturalistic stimuli                    57        4590     80.5  48
4   Interval pleasantness                   31        62       2.0   31
5   Manzara experiment stimuli              2         86       43.0  12
6   AC/NC stimuli                           90        754      8.4   28
The Western melodies corpus combines several public-domain collections originally distributed in **kern format by the Music Cognition Laboratory at Ohio State University (see https://kern.humdrum.org/cgi-bin/browse?l=essen/europa) and the Center for Computer Assisted Research in the Humanities at Stanford University (see https://kern.ccarh.org/cgi-bin/ksbrowse?s=nova). It was assembled to capture the broad Occidental melodic tradition, spanning repertoires from Europe and a small selection from North America.
The Naturalistic Stimuli corpus [35] was developed to probe listeners' aesthetic responses by presenting excerpts drawn from a broad array of musical genres, periods, composers, tonalities, and meters. Participants heard each excerpt and provided ratings of perceived pleasantness/liking.
The interval pleasantness corpus is derived from a dense rating experiment by Anglada-Tort et al. [36], in which 415 U.S. participants rated the pleasantness of pitch intervals, derived from 15,000 stimuli sampled randomly and uniformly from −15 to +15 semitones. They used a pleasantness scale (1 = not at all pleasant to 7 = very much pleasant). We retained only integer-valued semitone intervals, i.e. those whose size is an exact whole number of semitones (no fractional values). We then generated 31 two-note MIDI files, each beginning on the reference pitch C4 and spanning all integer semitone intervals from −15 to +15.
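
The construction of the 31 two-note stimuli can be sketched as below. The sketch works on MIDI pitch numbers only and does not write MIDI files; it assumes the common convention that C4 is MIDI note 60, and the function names and example ratings are hypothetical.

```python
C4 = 60  # MIDI number of C4 under the convention where middle C = 60 (assumption)

def two_note_stimuli(reference=C4, max_interval=15):
    """Return (first_pitch, second_pitch) pairs for every integer interval."""
    return [(reference, reference + k) for k in range(-max_interval, max_interval + 1)]

def keep_integer_intervals(ratings):
    """Keep only ratings whose interval is a whole number of semitones.

    `ratings` is a list of (interval_in_semitones, rating) pairs.
    """
    return [(i, r) for i, r in ratings if float(i).is_integer()]

stimuli = two_note_stimuli()

# Made-up ratings, only to show the filter; not data from the experiment.
filtered = keep_integer_intervals([(7.0, 5.8), (6.4, 3.1), (-12.0, 5.2)])
```

The range −15 to +15 inclusive yields 31 intervals, which matches the 31 melodies and 62 events listed for Dataset 4 in Table 1.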
The Manzara corpus [37] comprises two monophonic lines extracted from J.S. Bach chorales (see Figure 3). In the original behavioral study, participants were asked to place monetary wagers on the identity of each upcoming pitch. From their trial-by-trial wagering accuracy, the information content (IC) of each note was estimated, yielding entropy profiles of the melodies. Further details of the entropy computation and our analyses are presented in Chapter 5.
The AC/NC stimuli corpus is derived from the melodic cloze experiment by Morgan et al. [38], in which participants heard monophonic melodic openings and were asked to sing the note they expected to come next. The corpus comprises 45 pairs of melodic stems, where each pair consists of an Authentic Cadence (AC) version and a Non-Cadence (NC) version that differ by only a small number of notes. AC condition melodies end with an implied V harmony (dominant chord) that creates a strong expectation of resolution to the tonic (scale degree 1), while NC condition melodies do not end on a V harmony and were designed to avoid creating strong expectations for any particular continuation note. Although monophonic, these melodies reliably generate implicit harmonic structure for Western listeners. The corpus provides a controlled test case for examining how computational models handle cadential versus non-cadential melodic contexts, particularly their ability to recognize one of the most foundational harmonic progressions in Western music.
3.2 Preprocessing of MIDI Stimuli
Before fitting any models or computing surprise metrics, every MIDI file is processed through the following pipeline.

Parsing and feature extraction. Each file is loaded with pretty_midi and converted into a sequence of per-note feature vectors (see Table 2).

Fingerprinting and deduplication. We compute a fingerprint by concatenating the pitch-duration pairs of the first ten notes of each melody. Any melody whose fingerprint matches a previously seen one is discarded.

Monophony and length filtering. Melodies that are not strictly monophonic (i.e. contain overlapping notes) or whose total note count falls below a dataset-specific minimum are removed. This threshold is chosen depending on the nature of the data.

Shuffling. The remaining melodies are randomly permuted to eliminate ordering biases during training and evaluation.

IDyOM-only subset. In parallel, we assemble a secondary corpus for IDyOM evaluation by reapplying all steps above and additionally excluding any melody containing notes shorter than a sixty-fourth duration (to avoid a specific runtime error with IDyOM).
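
The deduplication, filtering, and shuffling steps above can be sketched in a few lines. This is a minimal illustration on plain note tuples rather than on parsed MIDI files; the `Note` type, the rounding of durations, the `min_notes` threshold, and the fixed seed are assumptions made for the example.

```python
import random
from collections import namedtuple

Note = namedtuple("Note", "pitch start end")  # times in seconds (hypothetical type)

def fingerprint(melody, n=10):
    """Concatenate the (pitch, duration) pairs of the first n notes."""
    return tuple((note.pitch, round(note.end - note.start, 3)) for note in melody[:n])

def is_monophonic(melody):
    """True if no note starts before the previous one has ended."""
    ordered = sorted(melody, key=lambda note: note.start)
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))

def preprocess(melodies, min_notes=4, seed=0):
    """Deduplicate, filter, and shuffle a list of melodies (lists of Note)."""
    seen, kept = set(), []
    for melody in melodies:
        fp = fingerprint(melody)
        if fp in seen:
            continue                      # duplicate of an earlier melody
        seen.add(fp)
        if len(melody) < min_notes or not is_monophonic(melody):
            continue                      # too short or has overlapping notes
        kept.append(melody)
    random.Random(seed).shuffle(kept)     # remove ordering bias
    return kept

scale = [Note(60 + i, i * 0.5, i * 0.5 + 0.5) for i in range(5)]
overlap = [Note(60, 0.0, 1.0), Note(64, 0.5, 1.5), Note(67, 1.5, 2.0), Note(72, 2.0, 2.5)]
clean = preprocess([scale, list(scale), overlap, scale[:2]])
```

Here the exact copy of `scale` is dropped as a duplicate, `overlap` fails the monophony check, and the two-note fragment is below the length threshold, so only `scale` survives.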
3.3 Feature Representation
In order to feed each melody into the neural sequence model, every note is converted into a fixed-length vector whose components encode melodic, rhythmic, and tonal attributes. Most of the features implemented here are drawn from the cognitively motivated set used in the IDyOM model [7], which were designed to capture perceptually salient aspects of melodic structure.

Causality constraint. One crucial requirement is that no feature may use information from notes that occur later than the one being encoded. Since our goal is to model human listeners' surprise in real time, each feature of the t-th note is computed solely from the first t notes (and any global metadata such as inferred key); the model never peeks ahead at future events it has not yet heard. This causal restriction ensures that computed surprise truly reflects what a listener could know at each moment in the unfolding melody.

Table 2 summarizes the full set of features, their mathematical domains, and the form of their encoding.

The most fundamental attribute is the pitch of each note, represented as an integer MIDI value in {0,...,127} and embedded via a 128-dimensional one-hot vector. To capture octave-invariant patterns, we also record pitch_class, i.e. the pitch modulo 12, similarly encoded as a 12-dimensional one-hot vector. Melodic motion is described both by the signed interval from the preceding note (an integer in {−36,...,36}) and by its contour (the sign of that interval in {−1,0,1}), each discretized into a one-hot representation. A complementary feature, cpintfip, measures the signed distance from the first note of the melody, again as a one-hot vector over the same 73-value range.
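
A minimal sketch of the pitch-based encodings described above, written so that note t is encoded from notes 0 to t only. The treatment of the first note (interval 0) and the clipping of out-of-range intervals are assumptions of this example, and the helper names are hypothetical.

```python
def one_hot(index, size):
    """Return a list of zeros of length `size` with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def clip(value, low=-36, high=36):
    """Clamp an interval to the encodable range (assumption: clip, not discard)."""
    return max(low, min(high, value))

def encode_note(pitches, t):
    """Encode note t (0-based) using only pitches[0..t], never later notes."""
    pitch = pitches[t]
    interval = clip(pitch - pitches[t - 1]) if t > 0 else 0
    contour = (interval > 0) - (interval < 0)
    from_first = clip(pitch - pitches[0])
    return (
        one_hot(pitch, 128)               # pitch
        + one_hot(pitch % 12, 12)         # pitch_class
        + one_hot(interval + 36, 73)      # interval in {-36, ..., 36}
        + one_hot(contour + 1, 3)         # contour in {-1, 0, 1}
        + one_hot(from_first + 36, 73)    # cpintfip
    )

melody = [60, 62, 64, 60, 67]
features = [encode_note(melody, t) for t in range(len(melody))]
```

Changing any later note leaves the encoding of an earlier note untouched, which is the causality property the text requires.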
4.6 Training Objective and Loss
the true pitch at t + 1. By minimizing this loss, the model learns to assign high probability to the actual continuation at every step, thereby capturing the statistical structure of melodic sequences.
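
As a small illustration of this objective, the sketch below assumes the loss is the usual cross-entropy between the predicted next-pitch distribution and the pitch that actually follows, and shows how the same distribution gives a note's information content. The four-pitch vocabulary and the logits are toy values, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_pitch_loss(logits_per_step, true_next_pitches):
    """Mean cross-entropy (nats) between predictions and the pitches that follow."""
    losses = [
        -math.log(softmax(logits)[target])
        for logits, target in zip(logits_per_step, true_next_pitches)
    ]
    return sum(losses) / len(losses)

def information_content_bits(logits, target):
    """Surprisal of the observed pitch under the predicted distribution, in bits."""
    return -math.log2(softmax(logits)[target])

# Toy 4-pitch vocabulary; the model described here outputs 128 pitch logits.
logits_per_step = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
loss_good = next_pitch_loss(logits_per_step, [0, 2])   # likely continuations
loss_bad = next_pitch_loss(logits_per_step, [1, 3])    # unlikely continuations
```

A uniform prediction over four pitches gives exactly 2 bits of information content for any outcome, a convenient sanity check.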

Chapter 5
Experimental Validation
5.1 Experimental Methodology
In this chapter, we evaluate our SurpriseLSTM model of melodic surprise against two benchmarks: the statistical IDyOM model [7] and the audio-based AudioIC model [39]. We present five complementary analyses: (1) a greedy forward-backward feature-selection procedure to identify the most informative input representations; (2) a large-scale correlation study on a Western-melody corpus; (3) a two-note interval paradigm relating model-derived surprise to human pleasantness ratings; (4) detailed entropy profiles of two Bach chorales with human surprise data; and (5) a replication of the classic Wundt effect via mixed-effects modeling of surprise and liking.

Ideally, one would validate a surprise model by comparing its note-by-note predictions to human listeners' surprise judgments. However, only two Bach chorales with such annotations are available [37], which is insufficient for broad generalization. To address this, we employ the different evaluation paradigms described above to build a comprehensive assessment of model performance.
5.2 Experiment 1: Incremental Feature Selection
To identify the subset of note-level features that best support the prediction of melodic surprise, we employed an alternating forward-backward greedy search. We begin with no features selected. In the forward pass, each candidate feature not yet in the set is added one at a time, the model is retrained, and the addition is accepted as soon as it yields any improvement in performance; the pass then restarts from this enlarged set. If no single addition improves performance, we enter the backward pass: each feature in the current set is removed in turn, and any removal that boosts performance is accepted, after which we return to the forward pass. This alternating process continues until neither adding nor removing any single feature yields further gains.
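
The alternating search can be sketched as follows. The `score` callable stands in for the expensive step of retraining the model and measuring performance; the toy scoring function, its integer gains, and the rule of never scoring an empty feature set are assumptions made only so the example runs.

```python
def greedy_forward_backward(candidates, score):
    """Alternating greedy search; `score` maps a feature tuple to a number (higher is better)."""
    selected = []
    best = float("-inf")
    improved = True
    while improved:
        improved = False
        # Forward pass: accept the first addition that improves the score.
        for feat in candidates:
            if feat in selected:
                continue
            trial = score(tuple(selected + [feat]))
            if trial > best:
                selected, best, improved = selected + [feat], trial, True
                break
        if improved:
            continue                      # restart the forward pass
        # Backward pass: accept the first removal that improves the score.
        for feat in list(selected):
            reduced = [f for f in selected if f != feat]
            if not reduced:
                continue                  # assumption: never evaluate an empty set
            trial = score(tuple(reduced))
            if trial > best:
                selected, best, improved = reduced, trial, True
                break
    return selected, best

def toy_score(features):
    """Hypothetical score in thousandths; interval and contour are partly redundant."""
    gains = {"pitch": 600, "interval": 50, "contour": 80, "noise": -50}
    total = sum(gains[f] for f in features)
    if "interval" in features and "contour" in features:
        total -= 60
    return total

selected, best = greedy_forward_backward(["pitch", "interval", "contour", "noise"], toy_score)
```

In this toy run, "interval" is first added, then removed in a backward pass once "contour" makes it redundant, which is the situation the backward pass is designed to catch.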
All evaluations use the two chorale melodies from the Manzara experiment (Dataset 5, Figure 3), with human-rated information-content profiles [37]. Each candidate feature set is used to train three independent runs of the SurpriseLSTM model on our clean melodies training corpus (Dataset 1), and is tested only on these two chorales to measure generalization to unseen material.
Figure 3: Score excerpts of the two Bach chorales used in the Manzara experiment (Dataset 5). (a) BWV 379: Meinen Jesum laß ich nicht, Jesus. (b) BWV 159: Jesu Leiden, Pein und Tod.
In every run, the SurpriseLSTM consists of a stack of two LSTM layers, each with hidden size H = 1024, followed by dropout (p = 0.5) and a final linear projection to 128 pitch logits (see Chapter 4). Models are trained for 8 epochs with batch size 128, using the Adam optimizer (learning rate 0.001), with gradients clipped to norm 5. These values were selected after exploring a range of hidden sizes, dropout rates, learning rates, and training durations: they provided the best trade-off between model capacity and generalization, yielding stable convergence across random seeds without evident overfitting.
Table 3 gives metrics at each stage of the search, including Spearman's correlation ρ_s, mean information content H, adjusted R², regression slope b, and training/test losses. We performed the greedy forward-backward selection once, targeting maximization of ρ_s. We chose Spearman's correlation over Pearson's as we expect monotonic rather than linear relations in the IC pairs.
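
The difference between the two correlation measures can be seen in a short sketch: Spearman's coefficient is the Pearson correlation of the rank-transformed data, so any strictly monotonic relation yields 1 even when it is far from linear. The values below are made up for illustration and are not results from the thesis.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

model_ic = [0.5, 1.0, 2.0, 4.0, 8.0]     # made-up model surprisal values
human_ic = [0.1, 0.2, 0.25, 0.3, 5.0]    # made-up human values, monotone in model_ic

rho_s = spearman(model_ic, human_ic)
r = pearson(model_ic, human_ic)
```

Here `rho_s` equals 1 because the ordering is preserved, while `r` is lower because the relation is not linear.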
The inverse relationship, where mean IC H falls as Spearman ρ_s rises, aligns with theories of efficient cognitive encoding: listeners may favor features that boost the expectedness of events (raising ρ_s) while simultaneously compressing redundant information (lowering H) [40, 41].
Table 3: Incremental Feature Selection Results for SurpriseLSTM Model
Features Added       ρ_s    H      R²_adj  b      Train Loss  Test Loss
pitch (baseline)     0.628  2.065  0.468   0.572  1.712       1.450
+duration            0.637  2.003  0.490   0.629  1.664       1.401
+symbolic duration   0.638  2.030  0.472   0.647  1.659       1.429
+onset               0.650  2.007  0.464   0.598  1.652       1.407
+interval            0.650  1.996  0.454   0.630  1.609       1.397
+contour             0.665  1.976  0.473   0.628  1.601       1.391
+ioi                 0.667  1.987  0.469   0.606  1.597       1.396
The results in Table 3 show improvements in the model's ability to predict human surprise as successive features are incorporated. Starting with only the pitch feature as baseline, SurpriseLSTM achieves a Spearman correlation of ρ_s = 0.628 and adjusted R² = 0.468, with a mean entropy of 2.065 bits. Adding duration increases correlation to ρ_s = 0.637 and R²_adj = 0.490 while reducing mean entropy to 2.003 bits, indicating that rhythmic information enhances both predictive accuracy and model efficiency.

Incorporating symbolic duration (nearest notated value) maintains the correlation gain (ρ_s = 0.638) but shows a slight decrease in R²_adj to 0.472, suggesting some redundancy with the continuous duration feature. The addition of onset timing produces a more substantial improvement, boosting correlation to ρ_s = 0.650 while maintaining mean entropy around 2.007 bits. Adding interval information maintains this correlation level while further reducing entropy to 1.996 bits, demonstrating more efficient encoding of melodic relationships.

The inclusion of contour features yields the largest single improvement in correlation (ρ_s = 0.665) and continues the trend of reduced mean entropy (1.976 bits), indicating that melodic shape information captures important aspects of human expectation. Finally, incorporating ioi (inter-onset interval) provides a modest additional gain to ρ_s = 0.667, with R²_adj = 0.469 and mean entropy stabilizing at 1.987 bits.

Overall, the incremental addition of these six additional features results in modest improvements over the pitch-only baseline (Δρ_s = +0.039, ΔR²_adj = +0.001) while achieving a more efficient representation (ΔH = −0.078 bits). This suggests that pitch information alone captures much of the structure underlying human musical expectation, with additional temporal and melodic features providing incremental refinements. The overall reduction in both training and test loss (from 1.712 to 1.597 and from 1.450 to 1.396, respectively) confirms that while these supplementary features do enhance model performance, the core predictive power derives primarily from pitch relationships.
The dominance of pitch information in predicting human melodic surprise requires careful interpretation. While the baseline pitch feature alone captures substantial variance, this finding encompasses not merely raw pitch values but the rich array of pitch-derived relationships that the neural model learns to encode in its distributed representations. The model weights implicitly capture complex interactions between absolute pitch, relative intervals, scale degrees, contour patterns, and their temporal dependencies: relationships that extend far beyond simple note identification. In essence, the pitch baseline represents a compressed encoding of melodic structure that encompasses many of the derived features we explicitly tested.
This dominance of pitch-related information raises intriguing questions about the
universality of these findings. While pitch relationships capture the majority of
variance in our Western musical context, this pattern might differ substantially in other
cultural traditions where rhythmic complexity plays a more central role in musical
structure and expectation. For instance, in many African, Indian, or Middle Eastern
musical traditions, rhythmic patterns and their variations constitute primary
sources of musical tension and resolution. A cross-cultural extension of this feature
selection analysis could reveal whether the relative importance of pitch versus
temporal features varies systematically across musical cultures, potentially uncovering
important differences in how different societies organize musical expectation.
5.3 Experiment 2: Model-Model Correlations
To situate SurpriseLSTM among existing approaches, we computed three sets of
note-by-note surprise correlations on the same 1,110-melody Western corpus (Dataset
2): (1) SurpriseLSTM vs. IDyOM, (2) SurpriseLSTM vs. AudioIC, and (3) IDyOM
vs. AudioIC. IDyOM was retrained on our primary MIDI collection (Dataset 1) using
the cpintfip viewpoint together with the linked cpintfref ⊗ dur-ratio combination
shown by Pearce [7] to best predict pitch expectancy. SurpriseLSTM employed
the feature subset selected in Experiment 1. AudioIC, which operates on
continuous audio rather than symbolic sequences, was applied to WAV renders of
each MIDI file (synthesized using the Upright Piano KW SoundFont available at
https://freepats.zenvoid.org/Piano/acoustic-grand-piano.html). Because
AudioIC's output is a continuous time-series of surprise values, we sampled its curve
at the onset times of each MIDI note to obtain discrete note-level estimates. Moreover,
AudioIC incorporates timbral features since it uses audio, absent from both
SurpriseLSTM and IDyOM, so some systematic differences are to be expected.
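One simple way to turn a frame-level surprise curve into note-level values is a zero-order hold: for each note onset, take the most recent frame at or before that time. The sketch below assumes this scheme; the text does not state whether nearest-frame, held, or interpolated sampling was used, and the function name `sample_at_onsets` is hypothetical.

```python
from bisect import bisect_right


def sample_at_onsets(frame_times, frame_values, onsets):
    """Return, for each onset, the value of the latest frame at or before it.

    frame_times must be sorted in ascending order (seconds). Onsets that
    precede the first frame fall back to the first frame's value.
    """
    if not frame_times or len(frame_times) != len(frame_values):
        raise ValueError("frame_times and frame_values must be non-empty and equal length")
    sampled = []
    for onset in onsets:
        # bisect_right counts the frames with time <= onset.
        index = max(bisect_right(frame_times, onset) - 1, 0)
        sampled.append(frame_values[index])
    return sampled


# Illustrative curve: one surprise value every 0.5 s.
curve_times = [0.0, 0.5, 1.0, 1.5]
curve_values = [1.0, 2.0, 3.0, 4.0]
note_level = sample_at_onsets(curve_times, curve_values, [0.0, 0.6, 1.5, 2.0])
```

Linear interpolation between neighbouring frames would be an equally plausible alternative if the curve is smooth.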

For each melody, we extracted surprise time-series from all three models and
computed Spearman's rank correlation $\rho_s$, and fitted a simple linear regression to obtain
the slope $b$ and the adjusted coefficient of determination $R^2_{\mathrm{adj}}$. We then averaged
each metric across melodies and report the mean ± SE in Table 4.
Table 4: Inter-Model Correlation Statistics Across 1,110 Western Melodies
| Model Pair | $\rho_s$ ± SE | $b$ ± SE | $R^2_{\mathrm{adj}}$ ± SE |
|---|---|---|---|
| IDyOM vs SurpriseLSTM | 0.639 ± 0.135 | 0.517 ± 0.187 | 0.411 ± 0.166 |
| AudioIC vs SurpriseLSTM | 0.300 ± 0.163 | 0.035 ± 0.017 | 0.131 ± 0.121 |
| AudioIC vs IDyOM | 0.480 ± 0.151 | 4.704 ± 1.298 | 0.323 ± 0.150 |
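The per-melody statistics behind Table 4 (rank correlation, regression slope, adjusted $R^2$, then a mean and standard error across melodies) can be sketched in plain Python as below. This is an illustrative reimplementation, not the thesis code: which series plays the role of predictor in the regression is an assumption, and all function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def slope_and_adj_r2(x, y):
    """Least-squares fit y = a + b*x; returns (b, adjusted R^2) for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    ss_res = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    r2 = 1 - ss_res / ss_tot
    return b, 1 - (1 - r2) * (n - 1) / (n - 2)


def mean_and_se(values):
    """Mean and standard error (sample SD divided by sqrt(n))."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, sd / sqrt(n)
```

In use, `spearman` and `slope_and_adj_r2` would be applied to the two surprise series of each melody, and `mean_and_se` to the resulting list of per-melody values.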
SurpriseLSTM shows the strongest correspondence with IDyOM ($\rho_s = 0.639 \pm 0.135$,
$R^2_{\mathrm{adj}} = 0.411 \pm 0.166$), indicating that our neural model effectively captures the
conditional pitch expectations characteristic of symbolic, information-theoretic
prediction. This substantial correlation suggests both models tap into similar aspects
of musical structure despite their totally different computational architectures.
In contrast, AudioIC exhibits weaker correlations with both symbolic models:
$\rho_s = 0.300 \pm 0.163$ with SurpriseLSTM and $\rho_s = 0.480 \pm 0.151$ with IDyOM. These
lower correlations likely reflect AudioIC's design for arbitrary audio content, including
polyphonic textures and timbral nuances, whereas IDyOM and SurpriseLSTM
operate strictly on monophonic, symbolic representations. The moderate AudioIC–IDyOM
correlation ($R^2_{\mathrm{adj}} = 0.323 \pm 0.150$) suggests some shared sensitivity to melodic
patterns, though filtered through AudioIC's distinct spectro-temporal processing.
5.4 Experiment 3: Surprise vs. Pleasantness
For a direct behavioral validation of our model, we compared SurpriseLSTM's
information-content estimates against human pleasantness ratings of isolated pitch
intervals (Dataset 4). These ratings were collected in a dense rating paradigm [36],
in which 415 US participants judged 15,000 instances of two-note stimuli spanning
the integer interval range [−15, +15] semitones. Participants rated pleasantness
on a scale from 1 (not at all) to 7 (very much). We extracted the smoothed
mean and standard error values for each integer interval.
We applied the same SurpriseLSTM model (features, training data, viewpoints and
hyperparameters) as in Experiment 2. For each integer interval $\Delta \in [-15, 15]$, we
synthesized a two-note MIDI file in which a fixed reference pitch (C5) is followed
by C5 + $\Delta$ after 0.5 s. We then predicted the information content (IC) of the
second note from SurpriseLSTM, from IDyOM (using the same viewpoints and
training set), and from AudioIC (after rendering each MIDI to WAV and sampling
AudioIC's continuous surprise curve at the note onset times). Finally, we computed
Spearman's correlations between model IC and human pleasantness across the 31
integer intervals.
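The stimulus set described above can be enumerated as simple note events before being written to MIDI. The sketch below makes two assumptions that are not stated in the text: C5 is mapped to MIDI note 72 (the C4 = 60 convention), and information content is measured in bits as $-\log_2 p$. The names `two_note_stimuli` and `information_content` are hypothetical.

```python
from math import log2

REFERENCE_PITCH = 72   # C5, assuming the C4 = 60 MIDI numbering convention
SECOND_ONSET = 0.5     # seconds between the two note onsets, as in the text


def two_note_stimuli(max_interval=15):
    """Map each integer interval to its (reference note, target note) events."""
    stimuli = {}
    for delta in range(-max_interval, max_interval + 1):
        stimuli[delta] = [
            {"pitch": REFERENCE_PITCH, "onset": 0.0},
            {"pitch": REFERENCE_PITCH + delta, "onset": SECOND_ONSET},
        ]
    return stimuli


def information_content(probability):
    """Surprisal in bits of an event to which a model assigned `probability`."""
    return -log2(probability)


stimuli = two_note_stimuli()
```

Each model would then supply the probability of the second note given the first, and the resulting 31 IC values would be rank-correlated with the mean pleasantness ratings.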
Table 5: Correlation between Model Information Content and Human Pleasantness Ratings
| Model | Correlation ($\rho$) | Significance |
|---|---|---|
| SurpriseLSTM | −0.821 | p < .001*** |
| IDyOM | −0.802 | p < .001*** |
| AudioIC | +0.240 | p = .19 |
The correlations in Table 5 show that both SurpriseLSTM ($\rho = -0.821$, $p < .001$)
and IDyOM ($\rho = -0.802$, $p < .001$) exhibit strong negative correlations with
pleasantness ratings, indicating that higher model-predicted surprise corresponds to lower
listener preferences. Both symbolic models capture the human tendency to find
highly unexpected musical events less pleasant.
AudioIC shows no reliable relationship with pleasantness ratings ($\rho = +0.240$,
$p = .19$). This lack of correlation may reflect AudioIC's focus on spectro-temporal
features that do not align with the mechanisms underlying aesthetic preference in
melodic contexts.
These results indicate that symbolic, information-theoretic approaches to modeling
melodic expectation capture the relationship between predictability and musical
preference, where excessive surprise reduces listener satisfaction. The similar
findings across SurpriseLSTM and IDyOM support expectation-based theories of
musical preference.
5.5 Experiment 4: Chorale Entropy Profiles
In this analysis we examine how well each model captures the surprise trajectory in
two Bach chorales from the Manzara corpus (Dataset 5). These melodies, Meinen
Jesum laß ich nicht, Jesus (BWV 379) and Jesu Leiden, Pein und Tod (BWV 159),
are labeled with human IC profiles obtained in a behavioral study [37]. We apply
the same SurpriseLSTM and the two benchmarks, IDyOM and AudioIC, to these
chorales with the same parameters as in Experiments 2 and 3.
Table 6 reports Spearman's correlation $\rho_s$, mean IC, regression slope $b$, adjusted $R^2$
and test loss where available. The accompanying surprise-over-time trajectories are
shown in Figure 4.
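Table 6: Model Performance Metrics by Melody
| Melody | Model | $\rho_s$ | Mean IC | $b$ | $R^2_{\mathrm{adj}}$ | Test Loss |
|---|---|---|---|---|---|---|
| BWV 379 | SurpriseLSTM | 0.698 | 2.070 | 0.635 | 0.541 | 1.435 |
| BWV 379 | IDyOM | 0.581 | 2.710 | 0.551 | 0.439 | N/A |
| BWV 379 | AudioIC | 0.384 | 24.263 | 0.033 | 0.120 | N/A |
| BWV 159 | SurpriseLSTM | 0.663 | 2.049 | 0.504 | 0.360 | 1.421 |
| BWV 159 | IDyOM | 0.534 | 2.471 | 0.510 | 0.306 | N/A |
| BWV 159 | AudioIC | 0.283 | 25.221 | 0.030 | 0.096 | N/A |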
The results in Table 6 show that SurpriseLSTM achieves the strongest alignment
with human information content ratings on both chorales, with Spearman correlations
of $\rho_s = 0.698$ (BWV 379) and $\rho_s = 0.663$ (BWV 159), and corresponding $R^2_{\mathrm{adj}}$
values of 0.541 and 0.360 respectively. IDyOM follows closely, with correlations
of $\rho_s = 0.581$ and $\rho_s = 0.534$ for the two chorales, reflecting its effectiveness at
symbolic pitch prediction. AudioIC shows weaker agreement with human ratings,
achieving correlations of only $\rho_s = 0.384$ (BWV 379) and $\rho_s = 0.283$ (BWV 159).
Both symbolic models demonstrate consistent performance across the two chorales,
with SurpriseLSTM maintaining a slight advantage in capturing human expectation
patterns. The lower regression slopes for AudioIC ($b = 0.033$ and $b = 0.030$)
indicate a weaker linear relationship with human ratings compared to the symbolic
approaches, although slope magnitude also depends on the scale of each model's IC
values (AudioIC's mean IC is roughly ten times larger).
[Figure 4 panels: "Entropy Profile of Chorale BWV 379" and "Entropy Profile of Chorale BWV 159"; x-axis: Note; y-axis: Information Content; series: Human, LSTM.]
Figure 4: Comparison of human-derived Information Content (blue) and SurpriseLSTM
predictions (red) for the two Manzara chorale stimuli (Dataset 5).
To evaluate model performance comprehensively, we compute several complementary
metrics: containment rates measuring the fraction of cases where the human
top choice falls within the model's top-k predictions, Mean Reciprocal Rank (MRR)
assessing the typical ranking position of human preferences in model predictions,
and expected log-probability measuring how well model distributions match human
response patterns.
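The three metrics just listed can be sketched for distributions stored as dictionaries from candidate note to probability. Several details here are assumptions rather than facts from the text: the human top choice is taken as the mode of the human distribution, the expected log-probability is the human-weighted average of the model's natural-log probabilities, and a small floor avoids taking the log of zero. The names `rank_of` and `evaluate` are hypothetical.

```python
from math import log


def rank_of(option, model_probs):
    """1-based rank of `option` when model options are sorted by probability."""
    ordered = sorted(model_probs, key=model_probs.get, reverse=True)
    return ordered.index(option) + 1


def evaluate(trials, k=3, floor=1e-12):
    """trials: list of (human_probs, model_probs) dictionaries, one per stimulus.

    Every human top choice is assumed to be present in the model's options.
    """
    hits = 0
    reciprocal_ranks = 0.0
    expected_logprob = 0.0
    for human, model in trials:
        top_choice = max(human, key=human.get)
        rank = rank_of(top_choice, model)
        hits += rank <= k                 # containment within the model's top-k
        reciprocal_ranks += 1.0 / rank    # contribution to MRR
        expected_logprob += sum(
            p * log(max(model.get(option, 0.0), floor)) for option, p in human.items()
        )
    n = len(trials)
    return {
        "containment": hits / n,
        "mrr": reciprocal_ranks / n,
        "expected_logprob": expected_logprob / n,
    }


# Two toy stimuli with MIDI pitches as options (made-up probabilities).
trials = [
    ({60: 0.7, 62: 0.3}, {60: 0.5, 62: 0.3, 64: 0.2}),
    ({67: 0.6, 65: 0.4}, {65: 0.5, 67: 0.3, 64: 0.2}),
]
top1 = evaluate(trials, k=1)
```

With pitch-class variants (PC@Top-k, PC MRR), the same functions would apply after folding pitches modulo 12 and summing probabilities per class.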
We also analyze the average of melody-level error rates, defined as half the sum
of absolute differences between human and model probability distributions for each
stimulus (the total variation distance). This metric provides an intuitive measure of
distribution mismatch, where values near 0 indicate perfect alignment and values
near 1 indicate distributions with almost no overlap.
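The melody-level error defined above is straightforward to compute once both distributions are expressed over the same candidate set. The following sketch assumes dictionaries from option to probability, with missing options treated as probability zero; the function names are illustrative.

```python
def distribution_error(human, model):
    """Half the sum of absolute probability differences (total variation distance)."""
    options = set(human) | set(model)
    return 0.5 * sum(abs(human.get(o, 0.0) - model.get(o, 0.0)) for o in options)


def average_error(pairs):
    """Mean melody-level error over (human, model) distribution pairs."""
    return sum(distribution_error(h, m) for h, m in pairs) / len(pairs)


identical = distribution_error({"C": 0.5, "G": 0.5}, {"C": 0.5, "G": 0.5})
disjoint = distribution_error({"C": 1.0}, {"G": 1.0})
partial = distribution_error({"C": 0.5, "G": 0.5}, {"C": 1.0})
```

Identical distributions give 0, distributions with disjoint support give 1, and the partial case above gives 0.5.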
6.4 Results and Analysis
6.4.1 Training Data Dependencies
The mlogit regression results reveal important insights about how the models handle
large-scale, diverse training data. Table 8 shows that when IDyOM is trained on
the smaller Western dataset while SurpriseLSTM uses the diverse Tegridy corpus,
both models remain significant predictors of human behavior (IDyOM: coefficient =
0.429, $p < 2.2 \times 10^{-16}$; LSTM: coefficient = 1.022, $p < 2.2 \times 10^{-16}$). SurpriseLSTM
shows a coefficient more than twice as large as IDyOM's, suggesting substantially
stronger predictive power.
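Table 8: IDyOM (Western Data) + LSTM (Tegridy Data): Statistical Results
| Predictor | Est. (AC + NC) | SE (AC + NC) | z (AC + NC) | p-value (AC + NC) | Est. (NC) | SE (NC) | z (NC) | p-value (NC) |
|---|---|---|---|---|---|---|---|---|
| IDyOM | 0.429 | 0.032 | 13.29 | < 2.2e-16*** | 0.253 | 0.040 | 6.31 | 2.77e-10*** |
| LSTM | 1.022 | 0.041 | 25.08 | < 2.2e-16*** | 0.771 | 0.055 | 13.89 | < 2.2e-16*** |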
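To make the role of these coefficients concrete, the sketch below shows how a conditional (multinomial) logit turns per-alternative predictors into choice probabilities. It is only an illustration of the model family: the exact predictor coding used in the mlogit analysis is not given in this section, so treating each model's log-probability for a candidate note as its predictor is an assumption, and the function names are hypothetical.

```python
from math import exp, log


def choice_probabilities(betas, alternatives):
    """Softmax over utilities u_j = sum_k beta_k * x_jk for each alternative j.

    alternatives: list of predictor tuples, e.g. (idyom_logp, lstm_logp).
    """
    utilities = [sum(b * x for b, x in zip(betas, feats)) for feats in alternatives]
    shift = max(utilities)  # subtract the maximum for numerical stability
    weights = [exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(betas, trials):
    """trials: list of (alternatives, index_of_chosen_alternative)."""
    return sum(
        log(choice_probabilities(betas, alternatives)[chosen])
        for alternatives, chosen in trials
    )


# Coefficients from Table 8 (AC + NC); predictor values are made up.
table8_betas = (0.429, 1.022)
example = choice_probabilities(table8_betas, [(-1.0, -0.5), (-2.0, -3.0)])
```

Fitting would maximise `log_likelihood` over the coefficients; a larger coefficient means a unit change in that predictor shifts choice odds more strongly, holding the other predictor fixed.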
Table 9 reveals a different pattern when both models are trained on the identical
large-scale Tegridy dataset. Here, IDyOM's coefficient becomes non-significant
(AC+NC: coefficient = −0.017, $p = 0.749$; NC only: coefficient = −0.015, $p = 0.822$),
while SurpriseLSTM maintains strong performance (coefficient = 1.249 and
coefficient = 0.906 respectively, both $p < 2.2 \times 10^{-16}$).
Table 9: IDyOM (Tegridy Data) + LSTM (Tegridy Data): Statistical Results
| Predictor | Est. (AC + NC) | SE (AC + NC) | z (AC + NC) | p-value (AC + NC) | Est. (NC) | SE (NC) | z (NC) | p-value (NC) |
|---|---|---|---|---|---|---|---|---|
| IDyOM | -0.017 | 0.052 | -0.32 | 0.749 | -0.015 | 0.067 | -0.22 | 0.822 |
| LSTM | 1.249 | 0.042 | 30.09 | < 2.2e-16*** | 0.906 | 0.059 | 15.42 | < 2.2e-16*** |
This pattern does not indicate that IDyOM trained on Tegridy performs poorly in
isolation. When evaluated alone, it achieves a significant coefficient of 0.762
($p < 2 \times 10^{-16}$). Rather, the results suggest that SurpriseLSTM's predictions subsume the
variance explained by IDyOM when both are trained on the same diverse dataset.
The neural model's distributed representations appear to capture all the patterns
that IDyOM's explicit statistical mechanisms detect, plus additional regularities
that IDyOM cannot access.
This finding has important implications for understanding the relationship between
neural and statistical approaches. When trained on diverse, large-scale data
(conditions that better approximate real-world musical exposure), the neural model's
learned representations encompass the predictive capacity of explicit statistical
modeling while extending beyond its limitations. IDyOM's handcrafted viewpoint
combinations may represent a subset of the patterns that emerge naturally from
SurpriseLSTM's end-to-end learning process.
The average melody-level error analysis in Table 10 reinforces this interpretation.
SurpriseLSTM achieves consistently lower error rates than the Tegridy-trained IDyOM
in both conditions (AC: 0.581 vs. 0.686; NC: 0.546 vs. 0.579), and also lower than
the Western-trained IDyOM (AC: 0.627; NC: 0.612).
Table 10: Average melody-level error for each model
| Training data | Model | AC | NC |
|---|---|---|---|
| Tegridy | LSTM | 0.581 | 0.546 |
| Tegridy | IDyOM | 0.686 | 0.579 |
| Western | IDyOM | 0.627 | 0.612 |
6.4.2 Cadential Context Analysis
Figure 5 reveals systematic differences in how models handle cadential versus
non-cadential contexts. In AC contexts, human listeners show strong convergence on
the tonic (73% of responses on scale degree 1), reflecting the powerful expectation
created by implied dominant harmony. SurpriseLSTM captures this tendency with
moderate concentration (30%), while IDyOM's performance varies dramatically by
training dataset: the Western-trained version achieves better tonic prediction (29%)
than the Tegridy-trained version (19%).
The performance metrics in Table 11 show SurpriseLSTM achieving good accuracy
in cadential contexts (Note@Top-1: 57.8%, against 62.2% for the Western-trained
IDyOM) with the best distribution fit (log-probability: −5.079). Both the
Western-trained IDyOM and SurpriseLSTM perform reasonably well when harmonic
expectations are clear and strong.
NC contexts present a more challenging test of model capabilities. Human
responses become distributed across multiple scale degrees, reflecting genuine
uncertainty about continuation. Here, SurpriseLSTM demonstrates superior performance
on all containment and ranking metrics in Table 12, achieving 40% accuracy at
Note@Top-1 compared to IDyOM's 15.6% (Western) and 26.7% (Tegridy). The one
exception is distribution fit, where the Tegridy-trained IDyOM attains a slightly
higher log-probability (−5.467 vs. −5.610). Overall, the neural model's distributed
representations appear better suited to capturing the uncertainty that characterizes
human cognition in ambiguous musical contexts.
Table 11: Model performance for AC (cadence) cases. Containment: fraction where
the human top choice is in the model's Top-k (Note = exact pitch, PC = pitch class).
Ranking: MRR of the human top choice. Distribution fit: expected log-probability
(higher is better).
| Category | Metric | LSTM (Tegridy) | IDyOM (Western) | IDyOM (Tegridy) |
|---|---|---|---|---|
| Containment | Note@Top-1 | 0.578 | 0.622 | 0.267 |
| Containment | PC@Top-1 | 0.622 | 0.556 | 0.178 |
| Containment | Note@Top-3 | 0.867 | 0.778 | 0.689 |
| Containment | PC@Top-3 | 0.889 | 0.778 | 0.733 |
| Ranking | Note MRR | 0.740 | 0.724 | 0.518 |
| Ranking | PC MRR | 0.773 | 0.705 | 0.478 |
| Distribution fit | Log-prob | -5.079 | -5.115 | -5.254 |
Table 12: Model performance for NC (non-cadence) cases. Metrics as in Table 11.
| Category | Metric | LSTM (Tegridy) | IDyOM (Western) | IDyOM (Tegridy) |
|---|---|---|---|---|
| Containment | Note@Top-1 | 0.400 | 0.156 | 0.267 |
| Containment | PC@Top-1 | 0.511 | 0.244 | 0.267 |
| Containment | Note@Top-3 | 0.667 | 0.622 | 0.578 |
| Containment | PC@Top-3 | 0.822 | 0.622 | 0.556 |
| Ranking | Note MRR | 0.546 | 0.416 | 0.466 |
| Ranking | PC MRR | 0.663 | 0.485 | 0.473 |
| Distribution fit | Log-prob | -5.610 | -6.076 | -5.467 |
6.4.3 Architectural and Cognitive Implications
Despite architectural differences, SurpriseLSTM and IDyOM show substantial
correlation ($\rho_s = 0.639$ from Chapter 5), suggesting architecture-independent properties
of melodic expectation. Both models demonstrate sensitivity to statistical
regularities in musical sequences, whether captured through explicit n-gram modeling or
learned distributed representations.
However, the differential performance across training conditions and musical
contexts reveals important mechanistic distinctions. SurpriseLSTM's robustness to
diverse training data and superior handling of ambiguous contexts suggest that
distributed neural representations may more faithfully capture the context-sensitive
nature of human musical cognition. The neural architecture's ability to learn
hierarchical features and maintain uncertainty distributions may align better with
human cognitive processes than explicit statistical tabulation.
The dominance of tonic responses across both human data and model predictions
indicates a bias toward tonal stability that transcends specific architectural
implementations. However, this bias appears stronger in human listeners than in either
computational model, suggesting that human musical cognition incorporates explicit
knowledge about tonal hierarchy beyond what emerges from pure statistical
learning.
6.5 Discussion
These findings illuminate several aspects of computational approaches to musical
expectation. The relationship between SurpriseLSTM and IDyOM when trained on
large-scale datasets reveals not that IDyOM fails, but rather that neural models may
represent a more comprehensive computational framework. When both models are
trained on diverse musical data that better approximates real-world listening
conditions, SurpriseLSTM's distributed representations subsume the predictive patterns
captured by IDyOM's explicit statistical mechanisms while extending beyond their
scope.
This subsumption suggests that handcrafted viewpoint combinations in statistical
models may represent a subset of the patterns that emerge naturally from neural
learning processes. Rather than indicating fundamental flaws in statistical
approaches, this finding points to the possibility that neural architectures discover and
integrate the same statistical regularities that inform explicit models, while
simultaneously learning additional relationships that explicit programming cannot easily
capture.
The superior performance of SurpriseLSTM in non-cadential contexts supports this
interpretation. Human musical expectation in ambiguous contexts likely involves
parallel constraint satisfaction where multiple information sources contribute
simultaneously to expectation formation. Distributed representations may naturally
approximate this uncertainty-aware processing, whereas explicit statistical models
require careful engineering to achieve similar flexibility.
The architectural convergence observed between neural and statistical approaches
(their substantial correlation despite fundamental differences) suggests both
capture important aspects of statistical learning underlying human musical expectation.
However, their divergences reveal that computational mechanisms matter
significantly for modeling cognitive processes. Neural models' ability to learn
hierarchical feature representations and maintain uncertainty may align more closely
with human cognitive architecture than statistical systems.
These results also highlight the importance of training data diversity in evaluating
cognitive models. Real-world listeners encounter enormous stylistic heterogeneity
throughout their lives, and computational models must demonstrate robustness
under these conditions to claim ecological validity. The finding that neural approaches
better handle diverse training conditions suggests they may provide more realistic
approximations of human learning processes.
However, both modeling approaches leave substantial variance unexplained,
particularly in complex musical contexts. This limitation indicates that complete models
of musical cognition will require integration of hierarchical harmonic understanding,
long-term structural dependencies, and cultural conditioning factors that shape
individual listening strategies. The current results suggest that while neural
approaches may provide a more comprehensive foundation, significant theoretical and
computational advances are still needed to fully capture human musical cognition.
[Figure 5 panels: rows Human, IDyOM (Western and Tegridy variants), LSTM; columns Authentic Cadence, Non-Cadence; x-axis: Scale Degree (1 to 7); y-axis: Proportion (0 to 1).]
Figure 5: Distribution of scale degrees for Human, LSTM, and IDyOM models.
Authentic Cadence (left column) and Non-Cadence (right column) are compared,
with IDyOM shown in two variants (trained on the Tegridy dataset and the Western
dataset).
Chapter 7
Discussion and Future Work
7.1 Summary of Contributions
This thesis introduced SurpriseLSTM, a neural network model that applies statistical
learning and probabilistic prediction principles to melodic expectation,
demonstrating that recurrent architectures can effectively model musical surprise from
basic note-level features. Through systematic comparison with established benchmarks
across multiple evaluation paradigms, we have provided what is, to our knowledge,
the first comprehensive assessment of neural versus statistical approaches to modeling
musical expectation in the monophonic domain.
Empirically, we demonstrated that SurpriseLSTM matches or exceeds IDyOM's
performance in predicting human surprise judgments across five complementary
validation tasks. The model successfully captures the inverse relationship between surprise
and pleasantness, reproduces the classical Wundt effect, and provides superior fits
to human entropy profiles in Bach chorales.
Theoretically, our findings reveal important distinctions between neural and
statistical approaches. When trained on large-scale, diverse datasets that better
approximate real-world musical exposure, SurpriseLSTM's distributed representations
subsume the predictive patterns captured by IDyOM's explicit statistical mechanisms
while extending beyond their scope. This suggests that neural architectures may
naturally discover and integrate the same regularities that inform statistical
models, while simultaneously learning additional relationships that explicit programming
cannot easily capture.
7.2 Implications for Musical Cognition
Our results provide several insights into the computational principles underlying
human musical expectation. The dominance of pitch-related features in driving
predictive accuracy suggests that melodic contour and interval relationships
constitute the primary feature of musical expectation in Western tonal music. However,
this finding requires careful interpretation: the neural model's pitch representation
encompasses complex interactions between absolute pitch, relative intervals, scale
degrees, and temporal dependencies that extend far beyond simple note
identification.
The superior performance of SurpriseLSTM in ambiguous musical contexts
(particularly non-cadential passages where human expectations are distributed rather
than focused) suggests that human musical cognition relies on uncertainty-aware
processing mechanisms. Distributed neural representations may better approximate
the parallel constraint satisfaction processes that characterize human expectation
formation, where multiple information sources contribute simultaneously to
probabilistic predictions.
The consistent alignment between model predictions and human aesthetic judgments
across multiple paradigms supports theories linking predictive processing to musical
enjoyment. The replication of the Wundt effect demonstrates that computational
models of surprise can capture fundamental relationships between complexity and
preference, suggesting that aesthetic experience may emerge from optimal
calibration of predictive mechanisms.
However, our findings also highlight significant limitations in current computational
approaches. Both neural and statistical models leave substantial variance in human
behavior unexplained, particularly in complex musical contexts. This suggests that
Bibliography
[19] Huang, C.-Z. A. et al. Music transformer (2018). URL https://arxiv.org/abs/1809.04281.
[20] Roberts, A., Engel, J., Raffel, C., Hawthorne, C. & Eck, D. A hierarchical
latent vector model for learning long-term structure in music (2019). URL
https://arxiv.org/abs/1803.05428.
[21] Shannon, C. E. Prediction and entropy of printed english. The Bell System
Technical Journal 30, 50–64 (1951).
[22] Temperley, D. A probabilistic model of melody perception. Cognitive Science
32, 418–444 (2008). URL
https://onlinelibrary.wiley.com/doi/abs/10.1080/03640210701864089.
[23] Conklin, D. & Witten, I. Multiple viewpoint systems for music prediction. J.
New Music Res 24 (2003).
[24] Pearce, M. T. Learning to Listen, Listening to Learn: Music Perception and
the Psychology of Enculturation (Oxford University Press, Oxford, 2025). URL
https://doi.org/10.1093/oso/9780198848004.001.0001.
[25] Bjare, M. R., Lattner, S. & Widmer, G. Controlling surprisal in music
generation via information content curve matching (2024). URL
https://arxiv.org/abs/2408.06022.
[26] Hadjeres, G. & Crestel, L. The piano inpainting application. CoRR
abs/2107.05944 (2021). URL https://arxiv.org/abs/2107.05944.
[27] Bjare, M. R., Cantisani, G., Lattner, S. & Widmer, G. Estimating musical
surprisal in audio (2025). URL https://arxiv.org/abs/2501.07474.
[28] Pasini, M., Lattner, S. & Fazekas, G. Music2latent: Consistency autoencoders
for latent audio compression (2024). URL https://arxiv.org/abs/2408.06500.

[29] Song, Y., Dhariwal, P., Chen, M. & Sutskever, I. Consistency models (2023).
URL https://arxiv.org/abs/2303.01469.
[30] Masclef, N. L. & Keller, T. A. Deep generative models of music expectation
(2023). URL https://arxiv.org/abs/2310.03500.
[31] Bjare, M. R., Lattner, S. & Widmer, G. Estimating musical surprisal from
audio in autoregressive diffusion model noise spaces (2025). URL
https://arxiv.org/abs/2508.05306.
[32] Lev, A. Los angeles midi dataset: Sota kilo-scale midi dataset for mir and
music ai purposes. In GitHub (2024).
[33] Raffel, C. Learning-Based Methods for Comparing Sequences, with Applications
to Audio-to-MIDI Alignment and Matching. Ph.D. thesis, Columbia University,
USA (2016). URL https://doi.org/10.7916/D8N58MHV.
[34] Ens, J. & Pasquier, P. Building the metamidi dataset: Linking symbolic and
audio musical data. In Lee, J. H. et al. (eds.) Proceedings of the 22nd International
Society for Music Information Retrieval Conference, ISMIR 2021, Online,
November 7-12, 2021, 182–188 (2021). URL
https://archives.ismir.net/ismir2021/paper/000022.pdf.
[35] Gold, B. et al. Auditory and reward structures reflect the pleasure of musical
expectancies during naturalistic listening. Frontiers in Neuroscience 17 (2023).
[36] Anglada-Tort, M., Harrison, P. M., Lee, H. & Jacoby, N. Large-scale iterated
singing experiments reveal oral transmission mechanisms underlying
music evolution. Current Biology 33, 1472–1486.e12 (2023). URL
https://www.sciencedirect.com/science/article/pii/S0960982223002439.
[37] Manzara, L. C., Witten, I. H. & James, M. On the entropy of music: An
experiment with bach chorale melodies. Leonardo Music Journal 2, 81–88
(1992).
[38] Morgan, E., Fogel, A., Nair, A. & Patel, A. D. Statistical learning and
gestalt-like principles predict melodic expectations. Cognition 189, 23–34
(2019). URL https://www.sciencedirect.com/science/article/pii/S0010027718303317.
[39] Bjare, M. R., Cantisani, G., Lattner, S. & Widmer, G. Estimating musical
surprisal in audio (2025). URL https://arxiv.org/abs/2501.07474.
[40] Chater, N. Reconciling simplicity and likelihood principles in perceptual
organization. Psychological Review 103, 566–581 (1996).
[41] Chater, N. The search for simplicity: A fundamental cognitive principle? The
Quarterly Journal of Experimental Psychology Section A 52, 273–302 (1999).
[42] Berlyne, D. E. Studies in the new experimental aesthetics: Steps toward an
objective psychology of aesthetic appreciation. Journal of Aesthetics and Art
Criticism 34, 86–87 (1975).