
GlobalMood: A Cross-Cultural Benchmark for Music Emotion Recognition

Author: Harin Lee; Elif Celen; Peter Harrison; Manuel Anglada-Tort; Pol van Rijn; Minsu Park; Marc Schönwiesner; Nori Jacoby
Publisher: Zenodo
DOI: 10.5281/zenodo.17706316
Source: https://zenodo.org/records/17706316/files/000001.pdf
GLOBALMOOD: A CROSS-CULTURAL BENCHMARK FOR MUSIC
EMOTION RECOGNITION
Harin Lee 1,2,3, Elif Çelen 1, Peter Harrison 4, Manuel Anglada-Tort 5,
Pol van Rijn 1, Minsu Park 6, Marc Schönwiesner 3, Nori Jacoby 1,7
1 MPI Empirical Aesthetics; 2 MPI for Human Cognitive and Brain Sciences;
3 Leipzig University; 4 University of Cambridge; 5 Goldsmiths, University of London;
6 New York University Abu Dhabi; 7 Cornell University
ABSTRACT
Human annotations of mood in music are essential for music generation and recommender systems. However, existing datasets predominantly focus on Western songs with terms derived from English, which may limit generalizability across diverse linguistic and cultural backgrounds. We introduce ‘GlobalMood’, a novel cross-cultural benchmark dataset comprising 1,180 songs sampled from 59 countries, with large-scale annotations collected from 2,519 individuals across five culturally and linguistically distinct locations: U.S., France, Mexico, S. Korea, and Egypt. Rather than imposing predefined emotion and mood categories, we implement a bottom-up, participant-driven approach to organically elicit culturally specific music-related emotion terms. We then recruit another pool of human participants to collect 988,925 ratings for these culture-specific descriptors. Our analysis confirms the presence of a valence-arousal structure shared across cultures, yet also reveals significant divergences in how certain emotion terms (despite being dictionary equivalents) are perceived cross-culturally. State-of-the-art multimodal models benefit substantially from fine-tuning on our cross-culturally balanced dataset, particularly in non-English contexts. Broadly, our findings inform the ongoing debate on the universality versus cultural specificity of emotional descriptors, and our methodology can contribute to other multimodal and cross-lingual research.
1. INTRODUCTION
Music evokes diverse emotional responses in listeners, spanning a wide spectrum beyond basic emotional categories [1, 2]. A central challenge in Music Information Retrieval (MIR) is designing algorithms that can replicate this emotional sensitivity. This is crucial for building recommendation systems that align with listeners’ mood and context [3–5], and for generating music that resonates with individual preferences [6]. More broadly, understanding how music conveys emotion is a core question in the science of music [7–9]. To date, however, most algorithms have been trained on datasets derived from Western listeners and Western music, using taxonomies primarily based on English language (e.g., MIREX [10]).

© H. Lee, E. Çelen, P. Harrison, M. Anglada-Tort, P. van Rijn, M. Park, M. Schönwiesner, and N. Jacoby. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: H. Lee, E. Çelen, P. Harrison, M. Anglada-Tort, P. van Rijn, M. Park, M. Schönwiesner, and N. Jacoby, “GlobalMood: A cross-cultural benchmark for music emotion recognition”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.

A significant challenge is creating cross-cultural models capable of handling non-Western music and emotion vocabularies beyond English. Addressing this challenge is essential for developing algorithms that accurately reflect global users’ preferences, including those whose musical tastes extend beyond the limited range of styles currently represented in training datasets. Moreover, without capturing culturally specific nuances of emotion, especially those difficult to translate, key aspects of musical meaning may be missed entirely. Direct dictionary translations of English terms may be insufficient, as terms describing emotions are deeply cultural and may lack exact equivalents [11–14].
To address these issues, we introduce ‘GlobalMood’,¹ a new benchmark dataset designed to support culturally inclusive and linguistically diverse emotion and mood recognition in music. Our contribution innovates along three key dimensions: (i) the diversity of musical stimuli, drawn from 59 countries; (ii) the diversity of annotators, spanning five distinct regions (with plans to extend to over 20 languages and locations in the future); (iii) a data-driven approach to collecting descriptors, generated organically by participants in their own language during the annotation process.
Data were collected through two stages involving a total of 2,519 participants and 1,180 songs balanced evenly across 59 countries: In the first stage (Section 4.1; Figure 1), using a smaller subset of 200 songs, we employed our recently developed iterative task that combines open-ended elicitation with collective refinement [13, 15, 16]. Rather than asking listeners to choose from a fixed list of pre-defined emotion terms, we asked them to describe the perceived emotion conveyed in the music using free-text tags in their native language, and at the same time, rate the tags provided by previous listeners. This approach was key to uncovering emotion terms that would otherwise be overlooked by predefined, English-based taxonomies (such as ‘appeal/plead’ that appears in Korean only).

¹ All code and data: https://github.com/harin-git/GlobalMood

Figure 1. Elicitation and refinement of music emotion terms through iterative participant chains. (A) Schematic illustration of the collaborative tagging process within a participant chain. Participants contribute new emotion-related word tags to each song, rate the relevance of existing tags, and can also flag irrelevant content, creating a dynamic refinement system. (B) Twenty most reliable emotion tags in each language, ranked by their tag scores. Y-axis labels display tags in their original language (left) and English translations (right).

In the second stage (Section 4.2; Figure 2), we selected the top 20 elicited terms per language and crowdsourced ratings for each tag across the entire set of 1,180 songs. This resulted in a total of 988,925 ratings, creating the most comprehensive open-source cross-cultural emotion annotation dataset in Music Emotion Recognition (MER) to date.
We leveraged GlobalMood to test several recent multimodal and multilingual models (Gemini, CLAP) by evaluating their performance under zero-shot, few-shot, and fine-tuned scenarios (Section 4.3; Figure 3). Models trained only on English data performed poorly in some cultural contexts, but fine-tuning with our cross-cultural data greatly improved their performance in non-English settings. This highlights the critical importance of cross-cultural data in both training MER models and establishing appropriate benchmarks for their evaluation.
2. RELATED WORKS
2.1 Music Emotion and Mood Annotation Datasets
Several datasets have been developed for MER systems with varying annotation approaches.² Early examples include the widely used MIREX 2007 mood dataset [10] with 240-250 Western songs in five mood clusters derived from AllMusic’s English tags (e.g., ‘passionate–rousing’, ‘wistful–bittersweet’), and CAL500 [17] with 500 Western pop/rock songs annotated using 18 English mood terms by U.S. undergraduate listeners. Over time, larger datasets appeared: the DEAM corpus (MediaEval ‘Emotion in Music’ dataset [18]) containing 2,058 song excerpts with continuous valence/arousal annotations; mood tags mined from large corpora of Spotify music playlists [19]; and the MTG-Jamendo dataset [20], which provides mood/theme tags for 18,486 songs. Notably, Jamendo’s tags were freely crowdsourced (56 unique mood labels), which introduced more label variety but still almost entirely in English.
A common limitation across these datasets is their reliance on predefined English descriptors, many of which stem from Western music psychology (for an exception, see Strauss et al. [21]). For instance, the Geneva Emotional Music Scale (GEMS) defines 45 emotion descriptors (e.g., ‘joyful activation’) based on studies with European listeners [1], and this taxonomy has been used to annotate datasets like Emotify [22]. Similarly, the mood categories in MIREX and CAL500 were fixed in advance (drawn from AllMusic or prior literature) and presented to annotators as a closed set of options. Consequently, these top-down approaches restrict annotators to the moods the researchers envisioned, leaving any unlisted mood nuances uncaptured and undocumented.

² Note that databases often extend the concept of emotion to include related constructs such as mood or feeling. Here we adopt this broader perspective, while acknowledging that subtle distinctions between them do exist.

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025

Figure 2. Association between emotion terms across languages. (A) MDS visualization of the emotion terms based on mean ratings across the full song set. Terms positioned close together exhibit similar rating patterns across songs, suggesting similar interpretations across languages. (B) Comparison of terms with direct translation equivalents across languages. The area size indicates the degree of semantic divergence despite apparent translation equivalence.

Acknowledging these limitations, recent research has begun exploring MER beyond the Western-centric scope. Hu et al. [23] examined mood annotations of K-pop songs provided by both Korean and American listeners. Their approach involved translating the original MIREX mood categories into Korean for local annotators. Although this allowed direct comparisons of mood classification between Korean and American listeners, it inherently restricted Korean annotations to terms originally defined within Western contexts.
More recently, we compiled a balanced set of American, Brazilian, and Korean songs and gathered mood annotations across nine categories, where annotators rated songs both from their own and the other two countries [12]. We showed that certain mood terms like ‘energetic’ and ‘sad’ are highly consistent across cultures, while more abstract concepts like ‘love’ and ‘dreamy’ diverge considerably. Similar findings have been reported by other studies [13, 14], highlighting that when mood descriptors are imposed from one language onto another, important meanings can simply be ‘lost in translation’.
In summary, while existing MER datasets and research have laid a solid groundwork, they remain limited by insufficient linguistic and cultural diversity. Because many are predominantly English-based and rely on top-down annotation strategies, they may overlook how people in other cultural contexts perceive emotion and mood in music.
2.2 Audio LLMs: the New Frontier in Music Tagging
Recent advances in multimodal large language models (LLMs) have opened promising avenues for downstream MIR tasks, including emotion recognition. These models combine the reasoning capabilities of LLMs with audio perception systems (audio LLMs), enabling more flexible and nuanced music understanding than traditional classification approaches [24–26].
Models like MuLan [27] and MERT [24] have demonstrated potential for zero-shot music emotion and mood classification by embedding audio and natural language descriptions in a shared semantic space. However, comprehensive benchmarks such as the MuChoMusic [25] highlight a crucial limitation: these models rely heavily on language modality and do not attend sufficiently to audio, often failing with more nuanced audio examples or downstream MIR tasks. This limitation could be particularly critical for non-Western music and non-English emotion descriptors, given that their training data are largely from Western contexts.
Similarly, closed-source models (e.g., Gemini) have shown promise in psychological textual analysis in multilingual contexts [28], while evaluations in specialized domains such as MIR remain scarce. The proprietary nature of their training data complicates thorough assessment of cross-lingual or cross-cultural performance. We aim to address these fundamental gaps by providing a large set of diverse, multilingual descriptors and annotations to support broader cross-cultural generalizability of audio LLMs.
3. METHOD
3.1 Participants
We recruited two independent sets of participants across the two stages of our data collection: Stage 1 for emotion term elicitation (N = 778; see Section 4.1) and Stage 2 for subsequent ratings on top 20 terms (N = 1,741; see Section 4.2). Participants had to be at least 18 years old, reside in the target country, and speak the target language as their primary language. Participants from the US were recruited through Prolific, while participants from the other four countries (France, Mexico, S. Korea, and Egypt) were recruited through the CINT platform. All participants provided informed consent under an approved protocol (see Section 7). Participants were instructed to wear headphones and had to pass a headphone screening task [29], and a language proficiency test [30] before being eligible for the main experimental task. Experiments were conducted in each participant’s native language (English, French, Spanish, Korean, and Egyptian Arabic), with instructions translated using GPT-4o. Code to replicate the experiment through the PsyNet framework [31] and all data are available at https://github.com/harin-git/GlobalMood
3.2 Globally Representative Song Selection
To create a globally representative music dataset, we used weekly YouTube top 100 music charts (year 2017-2023) from 59 countries, spanning six continents. To ensure each country’s charts reflected its distinct popular music, we excluded any track appearing in more than one country’s chart. This left us with a country-exclusive pool of songs. From this pool, we sampled 20 songs per country, yielding 1,180 songs in total. This diverse set is designed to capture a wide range of musical traditions and serve as a robust testbed for cross-cultural emotion recognition. Each 15-second audio excerpt was trimmed from a random starting point in the full track, and normalized at -5dB loudness.
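
The selection rule above (drop any track that charts in more than one country, then draw a fixed number per country) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `country_exclusive_sample`, the dictionary input format, the seed, and the toy track ids are all hypothetical.

```python
import random
from collections import defaultdict


def country_exclusive_sample(charts, per_country=20, seed=0):
    """charts: dict mapping a country code to the track ids seen in its charts.

    Hypothetical helper: keeps tracks that appear in exactly one country's
    charts, then samples `per_country` of them for each country.
    """
    # Record the set of countries in which each track charted.
    countries_per_track = defaultdict(set)
    for country, tracks in charts.items():
        for track in tracks:
            countries_per_track[track].add(country)

    rng = random.Random(seed)
    sample = {}
    for country, tracks in charts.items():
        # Country-exclusive pool: tracks that charted here and nowhere else.
        pool = sorted({t for t in tracks if len(countries_per_track[t]) == 1})
        if len(pool) < per_country:
            raise ValueError(f"{country}: only {len(pool)} exclusive tracks")
        sample[country] = rng.sample(pool, per_country)
    return sample


# Toy data (invented): two countries that share one track, "shared".
toy_charts = {
    "AA": ["a1", "a2", "a3", "shared"],
    "BB": ["b1", "b2", "b3", "shared"],
}
toy_sample = country_exclusive_sample(toy_charts, per_country=2)
```

With 59 countries and `per_country=20`, the same routine would return 59 × 20 = 1,180 tracks, matching the total stated in the text.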
3.3 Model Evaluation
We used the resulting GlobalMood dataset to evaluate several recent multimodal and multilingual models capable of music understanding. Specifically, we assessed Google’s Gemini models (1.5 Flash, 2.0 Flash, and the latest 2.5 Pro), a family of multimodal large language models capable of processing and reasoning across text and audio (but also image and video).³ We compared zero-shot and few-shot approaches, where the latter included 10 human-rated emotion terms as examples.

³ Preliminary tests with other recent multimodal models showed performance issues (Flamingo 2 [32] struggled with rating consistency and GPT-4o [33] failed to generate musical descriptions or ratings from audio alone); thus we excluded them from further analysis.

Given that Gemini is closed-source, we also included CLAP (Contrastive Language-Audio Pretraining) [34] as an alternative, open-source model that learns joint audio-text embeddings. CLAP has demonstrated promise in MIR applications [35] and serves as the foundation for music-specific models like CLaMP [36]. Here, we conducted zero-shot evaluations through: (1) extracting audio embeddings from CLAP, (2) computing cosine similarities with text embeddings of emotion terms, and (3) comparing these scores to human ratings.
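
Steps (2) and (3) of the zero-shot CLAP evaluation can be illustrated with a small sketch. It assumes the audio and text embeddings have already been extracted; the two-dimensional vectors, the ratings, and the function names below are invented placeholders, and no CLAP API is called.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def zero_shot_agreement(audio_embs, term_emb, human_means):
    """Score every song against one term, then correlate with human ratings."""
    scores = [cosine(a, term_emb) for a in audio_embs]
    return pearson(scores, human_means)


# Placeholder data: three songs, one emotion term, mean human ratings (1-5).
toy_audio = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
toy_term = [1.0, 0.0]
toy_human = [4.5, 3.0, 1.5]
toy_r = zero_shot_agreement(toy_audio, toy_term, toy_human)
```

Pearson correlation is used here only for illustration; the text does not state which correlation coefficient was used for the model comparison.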
We also fine-tuned CLAP on GlobalMood (train–test split = 1,000:180) to assess potential performance improvements. To preserve the continuous nature of our ratings, we represented each term in proportion to its mean rating (e.g., the term ‘calm’, with a mean rating of 3.0, appeared three times in the text). This method retained the nuanced information in our soft labels rather than reducing them to binary categories. To improve generalizability, we created 10 augmented variations of each song through pitch shifting (range of ±3 semitones), loudness adjustment (range of ±15dB), and the addition of Gaussian noise (amplitude of 0.005). Each augmented variant randomly included one or two of these modifications.
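
A minimal sketch of the two ideas in this paragraph: repeating a term in the training text in proportion to its mean rating, and drawing a random augmentation plan with one or two modifications. Several details are assumptions rather than facts from the paper: rounding non-integer means to the nearest integer, drawing parameters uniformly within the stated ranges, and all names (`rating_weighted_text`, `AUGMENTATIONS`, `sample_augmentation_plan`). The sketch only samples parameters; it does not process audio.

```python
import random


def rating_weighted_text(mean_ratings):
    """Repeat each term round(mean rating) times, e.g. 3.0 -> three copies.

    Rounding is an assumption; the paper only gives the 3.0 example.
    """
    words = []
    for term, rating in mean_ratings.items():
        words.extend([term] * round(rating))
    return " ".join(words)


# Parameter ranges taken from the text; uniform sampling is an assumption.
AUGMENTATIONS = {
    "pitch_shift_semitones": (-3.0, 3.0),
    "gain_db": (-15.0, 15.0),
    "gaussian_noise_amplitude": (0.005, 0.005),
}


def sample_augmentation_plan(rng):
    """Pick one or two modifications and draw a parameter for each."""
    chosen = rng.sample(sorted(AUGMENTATIONS), rng.choice([1, 2]))
    return {name: rng.uniform(*AUGMENTATIONS[name]) for name in chosen}


toy_text = rating_weighted_text({"calm": 3.0, "sad": 1.2})
# Ten hypothetical plans for one song, one per augmented variant.
toy_plans = [sample_augmentation_plan(random.Random(i)) for i in range(10)]
```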
4. RESULTS
4.1 Bottom-up Term Elicitation Across Languages
4.1.1 Tagging pipeline
Many existing studies on music emotions rely on predefined taxonomies or web-scraped data that offer limited linguistic diversity [10, 19]. To overcome this limitation, we employed a bottom-up, participant-driven tagging method [13, 15, 16]. Specifically, we asked participants in each country to complete independent ‘chains’ of iterative annotations. A subsample of 200 songs from the 1,180 entire set was used as stimuli. This subsample consisted of 180 balanced songs across countries, with an additional 20 local songs drawn from the participating country’s pool. This was to ensure that local participants encounter enough music strongly tied to their background, allowing them to elicit culturally specific emotion descriptors.
Figure 1A illustrates one such chain: (i) The first participant annotates the song using single-word emotion tags in their native language; (ii) The second participant (from the same country) rates the relevance of these tags (1–5 scale), flags irrelevant tags (e.g., genre- or lyrics-related rather than emotion), and adds new tags as necessary; (iii) The third participant sees all tags from earlier participants and repeats these steps; (iv) This iterative process continues through ten participants per chain, systematically refining and validating emotion terms. In each country, we ran the entire elicitation experiment twice and aggregated the results to increase the diversity of responses from a larger pool of participants.
4.1.2 Top emerging terms
Following the removal of tags flagged by more than two participants in a chain, our STEP-Tag process yielded an extensive, culturally specific lexicon of emotion terms across languages (N unique terms: English = 644; French = 528; Spanish = 870; Korean = 629; Arabic = 283). To identify the most salient terms in each language, we calculated a composite score for every term by multiplying its frequency of occurrences across chains by its mean relevance rating. Higher scores indicate terms frequently mentioned and consistently rated as highly relevant.
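
The filtering and scoring described above can be sketched as follows. This is an illustrative reading, not the authors' code: "frequency of occurrences across chains" is interpreted here as the number of chains in which a term survives, and the record layout, the function name `composite_scores`, and the toy data are hypothetical.

```python
from collections import defaultdict


def composite_scores(records, max_flags=2):
    """records: (chain_id, term, relevance_ratings, n_flags) tuples.

    Drops a tag within a chain if more than `max_flags` participants flagged
    it, then scores each term as frequency x mean relevance rating.
    """
    chains = defaultdict(set)
    ratings = defaultdict(list)
    for chain_id, term, relevance, n_flags in records:
        if n_flags > max_flags:  # flagged by more than two participants
            continue
        chains[term].add(chain_id)
        ratings[term].extend(relevance)

    scores = {}
    for term, chain_ids in chains.items():
        if not ratings[term]:  # never rated (e.g. added by the last participant)
            continue
        frequency = len(chain_ids)
        mean_relevance = sum(ratings[term]) / len(ratings[term])
        scores[term] = frequency * mean_relevance
    # Highest-scoring terms first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy records (invented): "rock" is flagged three times and is removed.
toy_records = [
    (1, "happy", [5, 4], 0),
    (2, "happy", [4, 5], 0),
    (1, "rock", [1], 3),
    (2, "calm", [3, 3], 1),
]
toy_scores = composite_scores(toy_records)
```

Taking the first 20 keys of the returned dictionary would give the kind of per-language ranking that the second stage starts from.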
We consolidated closely related morphological variants (e.g., ‘happy’ and ‘happiness’ in English; gendered forms such as ‘joyeux’ and ‘joyeuse’ in French) manually with native speakers. Figure 1B presents the resulting 20 highest-ranking tags per language, displaying both the original word and English translations to facilitate cross-cultural comparisons.
Despite being explicitly asked to provide emotion terms, participants often gave broader affective descriptors like moods or feelings (e.g., ‘soft’ and ‘festive’). This aligns with prior MIR literature that often includes both emotion and mood, and given its relevance for practical use, we did not enforce a strict distinction.
4.2 Large-scale Diverse Human Ratings
4.2.1 Cross-cultural ratings across the entire set
Having identified the top 20 terms for each language, we next gathered exhaustive ratings for the entire 1,180 songs of GlobalMood. We recruited 1,741 new participants (see Section 3.1) who listened to the 15-second excerpts and rated how effectively each excerpt conveyed a given emotion and mood term (1–5 scale). For each stimulus, participants evaluated seven randomly selected terms from the relevant language set. This systematic approach ensured that, on average, each song in each language received 8.38 (SD = 2.40) unique participant ratings, resulting in an extensive collection of 988,925 ratings spanning across five languages.
4.2.2 Is ‘happy’ in my language the same ‘happy’ in your language?
To investigate differences in how each culture interprets these terms, we constructed 100 rating vectors (5 languages × 20 terms per language). Each vector was 1,180-dimensional, capturing the mean rating per term across the 1,180 song set. We then performed non-metric multidimensional scaling (MDS) using correlation as the distance metric, projecting these vectors in a two-dimensional space. In this emotion ‘space,’ terms that position close to one another (even those from different languages) reflect similar rating patterns across the musical examples, suggesting comparable emotional interpretations across cultures.
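
The input to such an MDS is a pairwise distance matrix between term rating vectors. The sketch below builds one using 1 − r (Pearson) as the correlation distance, which is a common convention and an assumption here, since the text does not give the exact formula. The term labels and four-song vectors are invented; the real vectors have 1,180 entries, and the MDS step itself is left to a library of choice.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance_matrix(vectors):
    """vectors: dict mapping a term label to its per-song mean ratings.

    Returns the term order and a square matrix of 1 - r distances
    (0 = identical rating pattern, 2 = perfectly opposite pattern).
    """
    terms = list(vectors)
    matrix = [
        [1.0 - pearson(vectors[a], vectors[b]) for b in terms] for a in terms
    ]
    return terms, matrix


# Toy mean ratings over four songs (invented values).
toy_vectors = {
    "happy_en": [5.0, 4.0, 1.0, 2.0],
    "feliz_es": [4.5, 4.0, 1.5, 2.0],
    "sad_en": [1.0, 2.0, 5.0, 4.0],
}
toy_terms, toy_dist = correlation_distance_matrix(toy_vectors)
```

In this toy example the two translation equivalents end up close together while the opposite-valence term sits at the maximum distance, which is the pattern the two-dimensional projection is meant to reveal.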
Figure 2A visualizes this emotion space. The terms cluster into two main regions: one region of high arousal and high valence (e.g., happy, energetic, and lively; upper region of the figure) and a second region of low arousal that spans positive valence (e.g., peaceful; bottom left) to negative (e.g., sad; bottom right). Notably, many translated ‘equivalents’ appear close together, which might suggest a general cross-cultural consensus on what music evokes what emotions.
However, examining six commonly shared terms that have direct translations in at least four of the five languages (fun, happy, rhythmic, love, sad, and calm) revealed varying degrees of cross-cultural agreement (see Figure 2B). For each of these terms, between-country agreement (r_between) was computed as the average of pairwise correlation coefficients, while within-country agreement (r_within) was calculated using split-half reliability with Spearman-Brown formula. Effectively, r_within serves as measurement error to compare as baselines when evaluating r_between.
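
The two agreement measures can be sketched as below. The Spearman-Brown correction for two halves is the standard 2r / (1 + r); everything else (function names, a single random split rather than an average over many splits, the toy ratings) is an assumption made for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def between_country_agreement(country_means):
    """Average pairwise correlation of per-song mean ratings across countries."""
    pairs = combinations(sorted(country_means), 2)
    rs = [pearson(country_means[a], country_means[b]) for a, b in pairs]
    return sum(rs) / len(rs)


def spearman_brown(r_half):
    """Step a half-length correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(ratings_per_song, rng):
    """ratings_per_song: one list of individual ratings per song (>= 2 each).

    Randomly splits each song's raters into two halves, correlates the
    half-means across songs, and applies the Spearman-Brown correction.
    """
    half_a, half_b = [], []
    for ratings in ratings_per_song:
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half_a.append(sum(shuffled[:mid]) / mid)
        half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    return spearman_brown(pearson(half_a, half_b))


# Toy data (invented): three countries rating four songs on one term.
toy_means = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
toy_between = between_country_agreement(toy_means)
toy_within = split_half_reliability([[1, 1], [3, 3], [5, 5]], random.Random(0))
```

Comparing the two numbers for the same term shows whether disagreement between countries exceeds what rater noise within a country would already produce.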
The term calm showed the highest average agreement (r_between = 0.52 [0.49, 0.55]; r_within = 0.49 [0.38, 0.57]), followed by fun (r_between = 0.46 [0.39, 0.53]; r_within = 0.44 [0.27, 0.54]), love (r_between = 0.44 [0.41, 0.47]; r_within = 0.47 [0.33, 0.66]), sad (r_between = 0.41 [0.37, 0.45]; r_within = 0.43 [0.32, 0.58]), rhythmic (r_between = 0.38 [0.30, 0.45]; r_within = 0.43 [0.25, 0.53]), and notably happy (r_between = 0.37 [0.28, 0.45]; r_within = 0.45 [0.24, 0.59]).
Overall, considering within-country agreement (mean r_within = 0.43–0.48), most of these terms were comparable in their between-country agreement (mean r_between = 0.39–0.52). However, despite being considered a basic universal human emotion [37], happy exhibited a considerable gap between between- and within-country agreement. This emphasizes the necessity of incorporating diverse cultural perspectives when modeling nuanced musical emotional responses. Reliance on either dictionary translation or LLM-based translation alone could overlook important, context-specific nuances in emotion and mood perception, which is particularly relevant when building models for global audiences.
Figure 3. Correlations between human ratings and multimodal model predictions. (A) Gemini models with zero-shot prompting showing increase in performance with newer models. (B) CLAP models in zero-shot and fine-tuned scenarios showing how the use of multilingual annotations can substantially increase performance. Gray dashed lines represent split-half reliability of human ratings using the Spearman-Brown formula as baseline reference of correlations achieved between humans. Error bars indicate 95% CI of mean correlation across songs.
4.3 Human vs. Multimodal Models
Recent benchmarks have evaluated the capabilities of audio LLMs across various downstream MIR tasks, but these evaluations have also been restricted to English [25]. We evaluated both closed-source (Gemini) and open-source (CLAP) models against our GlobalMood multilingual human ratings (see Section 3.3 for model details).
For Gemini (Figure 3A), we replicated our human study protocol by prompting the model with ‘Rate from a scale of 1 to 5 how well this song expresses or conveys the emotion [...]’ independently for the five native languages. We then evaluated how well the model’s outputs predict human judgments. We observed consistent improvement with each new model version. The earliest model, 1.5 Flash, demonstrated modest alignment with human judgments (mean correlation across countries: r = 0.34, 95% CI = [0.27, 0.42]). This model struggled particularly with Korean language ratings (r = 0.27 [0.21, 0.33]). The subsequent 2.0 Flash version bridged this gap in Korean (r = 0.42 [0.36, 0.47]), and achieved a substantially higher mean correlation of r = 0.42 [0.36, 0.48]. The latest 2.5 Pro model demonstrated another leap with a mean correlation of r = 0.50 [0.45, 0.55]. A few-shot approach with 10 human-rated examples using 2.5 Pro did not improve the results (r = 0.47 [0.42, 0.52]). This consistent upward trajectory across model iterations provides compelling evidence that these general-purpose systems are progressively developing more sophisticated understanding of musical emotions across diverse linguistic and cultural contexts.
This level of correlation between Gemini and human ratings is on par with algorithms specifically designed for MER, such as Spotify’s mood estimation [12, 38]. Moreover, given the subjective nature of emotion and mood perception in music (where even humans often disagree), the latest Gemini model already reaches human-level performance, matching the theoretical upper bound defined by inter-rater human agreement (gray dashed lines in Figure 3).
For CLAP, an open-source alternative (Figure 3B), we evaluated both a zero-shot approach and a fine-tuned version trained on our GlobalMood dataset (see Section 3.3 for fine-tuning details). The zero-shot approach measured the similarity between the emotion term’s text embedding and the song’s audio embedding. This zero-shot CLAP performed poorly (mean r = 0.08 [0.03, 0.13]), while fine-tuning with GlobalMood substantially improved the performance (mean r = 0.31 [0.19, 0.44]). As a control experiment, we also fine-tuned CLAP using a dataset where all non-English emotion terms were first translated into English using an LLM without any musical context (instead of original emotion terms collected from native speakers). This translation-based approach failed to improve performance (mean r = 0.13 [-0.05, 0.32]), showing that the performance gains from fine-tuning come from culturally-specific information rather than from mere increase in data volume.
Importantly, improvements through fine-tuning were most pronounced for non-English languages. Based on Fisher’s z test for correlation comparisons, Arabic showed the largest increase from r = 0.11 to 0.46 (z = 0.39), followed by Spanish (from 0.04 to 0.32; z = 0.29) and Korean (from 0.06 to 0.32; z = 0.27). The least substantial increase was observed for French (from 0.07 to 0.17; z = 0.10). These observations demonstrate the promising potential of how native-language terms and ratings from our GlobalMood dataset can be used to generalize to new cultures and languages, when modeling MER systems.
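
The z values quoted above are numerically consistent with the difference between Fisher-transformed correlations, z = atanh(r_after) − atanh(r_before). This reading is an inference from the reported numbers rather than something the text states, and the function name `fisher_z_gain` is hypothetical; the sketch simply reproduces the arithmetic.

```python
import math


def fisher_z_gain(r_before, r_after):
    """Difference of Fisher z-transformed correlations (atanh of each r)."""
    return math.atanh(r_after) - math.atanh(r_before)


# Zero-shot and fine-tuned correlations as reported in the text.
reported = {
    "Arabic": (0.11, 0.46),
    "Spanish": (0.04, 0.32),
    "Korean": (0.06, 0.32),
    "French": (0.07, 0.17),
}
gains = {
    lang: round(fisher_z_gain(before, after), 2)
    for lang, (before, after) in reported.items()
}
# gains == {"Arabic": 0.39, "Spanish": 0.29, "Korean": 0.27, "French": 0.1}
```

A formal significance test would additionally divide this difference by a standard error that depends on the sample sizes, which are not restated here.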
5. DISCUSSION
We introduce GlobalMood, a novel cross-cultural benchmark dataset comprising emotion and mood descriptors and ratings collected from a large and globally diverse participant pool through a data-driven approach. Consistent with previous research in music consumption and perception [39–44] and music emotion [12–14, 45, 46], our findings highlight both cross-cultural similarities and differences. Specifically, we demonstrate that music emotion descriptors across languages are broadly organized around clusters relating to valence and arousal [8, 37].
Relying on dictionary translations of music emotion terms may face limitations (e.g., [45]), where cross-lingual agreement varied across terms. For instance, despite happy being a basic emotion, it exhibited a relatively large gap between cross- and within-country agreement. This highlights the need for incorporating multilingual descriptors from diverse annotators. Our future work aims to expand GlobalMood to reach over 20 languages [30, 47] and a broader range of musical styles, including works from different historical periods, which will be critical for developing more robust tools that can generalize across diverse cultural contexts.
Notably, fine-tuning CLAP on cross-cultural data significantly boosted performance in non-English contexts, highlighting the value of GlobalMood. These findings point to the potential for culturally-sensitive MER systems, moving beyond one-size-fits-all models. Future work should investigate more varied prompting and data augmentation techniques, and alternative LLM architectures.
Our tagging pipeline is quick to deploy across cultures and uses a bottom-up approach that helps minimize researcher bias. This offers an advantage over predefined term lists, which often fail to capture the cultural subtleties of emotion terms [48]. However, it has drawbacks: early participants can bias the tag pool and there’s no theoretical guarantee that all tags reflect clear emotion concepts (e.g., ‘étranger’ in French). Unlike structured systems for crowdsourcing tags (e.g., TagATune [49]), our approach trades control for flexibility and cultural responsiveness.
Beyond MIR, the GlobalMood and associated pipeline offer timely implications for broader multimodal and cross-lingual research, particularly in Natural Language Processing (NLP) communities [50]. As NLP increasingly tackles multimodal tasks involving audio–text modeling and cross-cultural applications, our dataset may provide a useful benchmark for evaluating language–audio models across diverse contexts. The iterative annotation pipeline can also serve as an effective framework for collecting representative samples and annotations in other domains [15, 16, 31, 51].
6. ACKNOWLEDGMENTS
H.L. was funded by the Max Planck Society. M.P. was partially supported by the NYUAD Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001.
7. ETHICS STATEMENT
We conducted our human experiment according to ethical best practices. All participants recruited via Prolific or CINT provided informed consent based on an approved protocol (Max Planck Ethics Council #202142). Participant data was collected anonymously (except for Prolific or CINT IDs), and all published data are fully anonymized. Models were accessed through commercial APIs, with fine-tuning performed locally. Conducting cross-cultural research requires sensitivity to diverse ethical considerations [52], and this study was designed accordingly. We acknowledge that machine learning models and training datasets may contain biases originating from participants, data selection processes, and the models themselves. Despite these potential biases, our study aims to address critical limitations and societal risks within current MIR technologies, which predominantly focus on English-language music and raters despite serving diverse global populations [48, 53, 54].
8. REFERENCES
[1] M. Zentner, D. Grandjean, and K. R. Scherer, “Emotions evoked by the sound of music: Characterization, classification, and measurement,” Emotion, vol. 8, no. 4, pp. 494–521, 2008.
[2] A. S. Cowen, X. Fang, D. Sauter, and D. Keltner, “What music makes us feel: At least 13 dimensions organize subjective experiences associated with music across different cultures,” Proceedings of the National Academy of Sciences, vol. 117, no. 4, pp. 1924–1934, Jan 2020.
[3] T. Eerola and P. Saari, “What emotions does music express? structure of affect terms in music using iterative crowdsourcing paradigm,” PLOS ONE, vol. 20, no. 1, p. e0313502, Jan 2025.
[4] H. Tran, T. Le, A. Do, T. Vu, S. Bogaerts, and B. Howard, “Emotion-aware music recommendation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 13, 2023, p. 13.
[5] J. Kang and D. Herremans, “Are we there yet? a brief survey of music emotion prediction datasets, models and outstanding challenges,” 2024. [Online]. Available: https://arxiv.org/abs/2406.08809
[6] Y. Sun, M. Kuo, X. Wang, W. Li, and Q. Bai, “Emotion-conditioned MusicLM: Enhancing emotional resonance in music generation,” in 2024 IEEE Congress on Evolutionary Computation (CEC), Jun 2024, pp. 1–8.
[7] P. N. Juslin and D. Västfjäll, “Emotional responses to music: The need to consider underlying mechanisms,” Behavioral and brain sciences, vol. 31, no. 5, pp. 559–575, 2008.
[8] T. Eerola and J. K. Vuoskoski, “A review of music and emotion studies: Approaches, emotion models, and stimuli,” Music Perception: An Interdisciplinary Journal, vol. 30, no. 3, pp. 307–340, 2012.
[9] M. Park, J. Thom, S. Mennicken, H. Cramer, and M. Macy, “Global music streaming data reveal diurnal and seasonal patterns of affective preference,” Nature human behaviour, vol. 3, no. 3, pp. 230–236, 2019.
[10] X. Hu and J. S. Downie, “Exploring mood metadata: Relationships with genre, artist and usage metadata,” in International Society for Music Information Retrieval Conference, Dec 2007.
[11] J. C. Jackson, J. Watts, T. R. Henry, J.-M. List,
R. Forkel, P. J. Mucha, S. J. Greenhill, R. D. Gray, and
K. A. Lindquist, “Emotion semantics show both cultural
variation and universal structure,” Science, vol.
366, no. 6472, pp. 1517–1522, 2019.
[12] H. Lee, F. Höger, M. Schönwiesner, M. Park, and
N. Jacoby, “Cross-cultural mood perception in pop
songs and its alignment with mood detection algorithms,”
in Proceedings of the International Society for
Music Information Retrieval Conference, Nov 2021,
pp. 366–373.
[13] E. Celen, P. van Rijn, H. Lee, and N. Jacoby,
“Are expressions for music emotions the same
across cultures?” 2025. [Online].
Available: https://arxiv.org/abs/2502.08744
[14] J. S. Gómez-Cañón, E. Cano, P. Herrera, and
E. Gómez, “Joyful for you and tender for us: the
influence of individual characteristics and language on
emotion labeling and classification,” in Proceedings
of the International Society for Music Information
Retrieval Conference, Oct 2020.
[15] R. Marjieh, P. van Rijn, I. Sucholutsky, T. R. Sumers,
H. Lee, T. L. Griffiths, and N. Jacoby, “Words are all
you need? language as an approximation for human
similarity judgments,” in The Eleventh International
Conference on Learning Representations, 2023.
[16] P. van Rijn, S. Mertes, K. Janowski, K. Weitz,
N. Jacoby, and E. André, “Giving robots a voice:
Human-in-the-loop voice creation and open-ended labeling,” in
Proceedings of the 2024 CHI Conference on Human
Factors in Computing Systems, 2024, pp. 1–34.
[17] D. Turnbull, L. Barrington, D. Torres, and G. Lanckriet,
“Towards musical query-by-semantic-description
using the CAL500 data set,” in Proceedings of the 30th
annual international ACM SIGIR conference on
Research and development in information retrieval, ser.
SIGIR ’07. New York, NY, USA: Association for
Computing Machinery, Jul 2007, pp. 439–446.
[18] A. Aljanaki, Y.-H. Yang, and M. Soleymani,
“Developing a benchmark for emotional analysis of music,”
PLOS ONE, vol. 12, no. 3, p. e0173392, Mar 2017.
[19] J. Affolter and M. A. Rohrmeier, “Utilizing
listener-provided tags for music emotion recognition: A
data-driven approach,” in Proceedings of the International
Society for Music Information Retrieval Conference,
Nov 2024, pp. 547–554.
[20] D. Bogdanov, M. Won, P. Tovstogan, A. Porter,
and X. Serra, “The MTG-Jamendo dataset for
automatic music tagging,” in Machine Learning for
Music Discovery Workshop, International Conference
on Machine Learning (ICML 2019), Long Beach,
CA, United States, 2019. [Online].
Available: http://hdl.handle.net/10230/42015
[21] H. Strauss, J. Vigl, P.-O. Jacobsen, M. Bayer,
F. Talamini, W. Vigl, E. Zangerle, and M. Zentner, “The
Emotion-to-Music mapping atlas (EMMA): A
systematically organized online database of emotionally
evocative music excerpts,” Behav. Res. Methods,
vol. 56, no. 4, pp. 3560–3577, Apr. 2024.
[22] A. Aljanaki, F. Wiering, and R. C. Veltkamp,
“Studying emotion induced by music through a crowdsourcing
game,” Information Processing & Management,
vol. 52, no. 1, pp. 115–128, Jan 2016.
[23] X. Hu, J. H. Lee, K. Choi, and J. S. Downie, “A
cross-cultural study on the mood of K-POP songs,” in
Proceedings of the International Society for Music
Information Retrieval Conference, Oct 2014, pp. 385–390.
[24] Y. Li, R. Yuan, G. Zhang, Y. Ma, X. Chen, H. Yin,
C. Xiao, C. Lin, A. Ragni, E. Benetos, N. Gyenge,
R. Dannenberg, R. Liu, W. Chen, G. Xia, Y. Shi,
W. Huang, Z. Wang, Y. Guo, and J. Fu, “MERT:
Acoustic music understanding model with large-scale
self-supervised training,” in The Twelfth International
Conference on Learning Representations, 2024.
[25] B. Weck, I. Manco, E. Benetos, E. Quinton,
G. Fazekas, and D. Bogdanov, “MuChoMusic:
Evaluating music understanding in multimodal
audio-language models,” in Proceedings of the International
Society for Music Information Retrieval Conference,
Nov 2024.
[26] S. Doh, K. Choi, J. Lee, and J. Nam, “LP-MusicCaps:
LLM-based pseudo music captioning,” in Proceedings
of the International Society for Music Information
Retrieval Conference, 2023.
[27] Q. Huang, A. Jansen, J. Lee, R. Ganti, J. Y. Li, and
D. P. W. Ellis, “MuLan: A joint embedding of music
audio and natural language,” in Proceedings of the
International Society for Music Information Retrieval
Conference, 2022.
[28] S. Rathje, D.-M. Mirea, I. Sucholutsky, R. Marjieh,
C. E. Robertson, and J. J. Van Bavel, “GPT is an effective
tool for multilingual psychological text analysis,”
Proceedings of the National Academy of Sciences, vol.
121, no. 34, p. e2308950121, Aug. 2024.
[29] A. E. Milne, R. Bianco, K. C. Poole, S. Zhao, A. J.
Oxenham, A. J. Billig, and M. Chait, “An online headphone
screening test based on dichotic pitch,” Behavior
Research Methods, vol. 53, no. 4, pp. 1551–1562,
2021.
[30] P. van Rijn, Y. Sun, H. Lee, R. Marjieh, I. Sucholutsky,
F. Lanzarini, E. André, and N. Jacoby, “Around
the world in 60 words: A generative vocabulary test for
online research,” in Proceedings of the Annual Meeting
of the Cognitive Science Society, 2023.
[31] P. M. C. Harrison, R. Marjieh, F. Adolfi, P. van
Rijn, M. Anglada-Tort, O. Tchernichovski, P. Larrouy-Maestri,
and N. Jacoby, “Gibbs sampling with people,”
in Advances in Neural Information Processing
Systems, 2020, pp. 10 659–10 671.
[32] S. Ghosh, Z. Kong, S. Kumar, S. Sakshi, J. Kim,
W. Ping, R. Valle, D. Manocha, and B. Catanzaro,
“Audio Flamingo 2: An audio-language model
with long-audio understanding and expert reasoning
abilities,” 2025. [Online].
Available: https://arxiv.org/abs/2503.03983
[33] A. Hurst, A. Lerer, A. P. Goucher, A. Perelman,
A. Ramesh, A. Clark, A. Ostrow, A. Welihinda,
A. Hayes, A. Radford et al., “GPT-4o system card,”
arXiv preprint arXiv:2410.21276, 2024.
[34] Y. Wu, K. Chen, T. Zhang, Y. Hui, T. Berg-Kirkpatrick,
and S. Dubnov, “Large-scale contrastive language-audio
pretraining with feature fusion and keyword-to-caption
augmentation,” in ICASSP 2023 - 2023 IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP). IEEE, Jun. 2023, pp.
1–5.
[35] J. Barnett, H. F. Garcia, and B. Pardo, “Exploring
musical roots: Applying audio embeddings to empower
influence attribution for a generative music model,” in
Proceedings of the International Society for Music
Information Retrieval Conference, Nov 2024.
[36] S. Wu, Z. Guo, R. Yuan, J. Jiang, S. Doh, G. Xia,
J. Nam, X. Li, F. Yu, and M. Sun, “CLaMP 3:
Universal music information retrieval across unaligned
modalities and unseen languages,” 2025. [Online].
Available: https://arxiv.org/abs/2502.10362
[37] P. Ekman, “Are there basic emotions?” Psychological
Review, vol. 99, no. 3, pp. 550–553, 1992.
[38] D. Duman, P. Neto, A. Mavrolampados, P. Toiviainen,
and G. Luck, “Music we move to: Spotify audio
features and reasons for listening,” PLoS One, vol. 17,
no. 9, p. e0275228, Sep. 2022.
[39] N. Jacoby, E. A. Undurraga, M. J. McPherson,
J. Valdés, T. Ossandón, and J. H. McDermott,
“Universal and non-universal features of musical pitch
perception revealed by sung reproduction,” Current Biology,
vol. 29, no. 19, pp. 3229–3243, 2019.
[40] S. A. Mehr, M. Singh, D. Knox, D. M. Ketter,
D. Pickens-Jones, S. Atwood, C. Lucas, A. A. Egner,
N. Jacoby, E. J. Hopkins et al., “Universality and
diversity in human song,” Science, vol. 366, no. 6468, p.
eaax0868, 2019.
[41] N. Jacoby, R. Polak, J. A. Grahn, D. J. Cameron, K. M.
Lee, R. Godoy, E. A. Undurraga, T. Huanca, T. Thalwitzer,
N. Doumbia et al., “Commonality and variation
in mental representations of music revealed by a
cross-cultural comparison of rhythm priors in 15 countries,”
Nature Human Behaviour, vol. 8, no. 5, pp. 846–877,
2024.
[42] P. E. Savage, S. Brown, E. Sakai, and T. E. Currie,
“Statistical universals reveal the structures and functions of
human music,” Proceedings of the National Academy
of Sciences, vol. 112, no. 29, pp. 8987–8992, 2015.
[43] J. H. McDermott, A. F. Schultz, E. A. Undurraga, and
R. A. Godoy, “Indifference to dissonance in native
amazonians reveals cultural variation in music
perception,” Nature, vol. 535, no. 7613, pp. 547–550, 2016.
[44] H. Lee, N. Jacoby, R. Hennequin, and M. Moussallam,
“Mechanisms of cultural diversity in urban
populations,” Nature Communications, vol. 16, no. 1, p.
5192, Jun. 2025.
[45] A. S. Cowen, X. Fang, D. Sauter, and D. Keltner,
“What music makes us feel: At least 13 dimensions
organize subjective experiences associated with music
across different cultures,” Proceedings of the National
Academy of Sciences, vol. 117, no. 4, pp. 1924–1934,
2020.
[46] T. Fritz, S. Jentschke, N. Gosselin, D. Sammler,
I. Peretz, R. Turner, A. D. Friederici, and S. Koelsch,
“Universal recognition of three basic emotions in
music,” Current Biology, vol. 19, no. 7, pp. 573–576,
2009.
[47] J. P. Niedermann, I. Sucholutsky, R. Marjieh, E. Celen,
T. Griffiths, N. Jacoby, and P. van Rijn, “Studying the
effect of globalization on color perception using
multilingual online recruitment and large language models,”
in Proceedings of the Annual Meeting of the Cognitive
Science Society, vol. 46, 2023.
[48] D. E. Blasi, J. Henrich, E. Adamou, D. Kemmerer, and
A. Majid, “Over-reliance on English hinders cognitive
science,” Trends in Cognitive Sciences, vol. 26, no. 12,
pp. 1153–1170, 2022.
[49] E. L. M. Law, L. von Ahn, R. B. Dannenberg, and
M. Crawford, “TagATune: A game for music and
sound annotation,” in Proceedings of the International
Conference on Music Information Retrieval, 2007, pp.
361–364.
[50] D. Hershcovich, S. Frank, H. Lent, M. de Lhoneux,
M. Abdou, S. Brandl, E. Bugliarello, L. Cabello Piqueras,
I. Chalkidis, R. Cui, C. Fierro, K. Margatina,
P. Rust, and A. Søgaard, “Challenges and strategies in
cross-cultural NLP,” in Proceedings of the 60th Annual
Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers), S. Muresan, P. Nakov,
and A. Villavicencio, Eds. Dublin, Ireland:
Association for Computational Linguistics, May 2022, pp.
6997–7013.
[51] D.-M. Huang, P. Van Rijn, I. Sucholutsky, R. Marjieh,
and N. Jacoby, “Characterizing similarities and
divergences in conversational tones in humans and
LLMs by sampling with people,” in Proceedings
of the 62nd Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers),
L.-W. Ku, A. Martins, and V. Srikumar, Eds.
Bangkok, Thailand: Association for Computational
Linguistics, Aug. 2024, pp. 10 486–10 512. [Online].
Available: https://aclanthology.org/2024.acl-long.565/
[52] N. Jacoby, E. H. Margulis, M. Clayton, E. Hannon,
H. Honing, J. Iversen, T. R. Klein, S. A. Mehr, L. Pearson,
I. Peretz et al., “Cross-cultural work in music
cognition: Challenges, insights, and recommendations,”
Music Perception, vol. 37, no. 3, pp. 185–195, 2020.
[53] J. Henrich, S. J. Heine, and A. Norenzayan, “Most
people are not WEIRD,” Nature, vol. 466, no. 7302, pp.
29–29, 2010.
[54] B. Amiri, N. Shahverdi, A. Haddadi, and Y. Ghahremani,
“Beyond the trends: Evolution and future
directions in music recommender systems research,” IEEE
Access, 2024.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025