ENHANCING MUSIC RECOMMENDER SYSTEMS WITH MULTIMEDIA
CONTENT: A CONTEXT-AWARE APPROACH
Oleg Lesota¹, Veronica Clavijo², A ia Rizwani²,
Markus Schedl¹, Bruce Ferwerda²
¹ Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria
² Department of Computer Science and Informatics, Jönköping University, Jönköping, Sweden
[email protected], [email protected]
ABSTRACT
The evolution of the music industry has introduced multimedia elements, such as video, text, and images, into music consumption. However, current Music Recommender Systems (MRSs) remain predominantly audio-focused, requiring explicit user interaction to access additional media. This study explores the integration of multimedia content into MRSs, considering the role of contextual activities and the Uses and Gratifications (U&G) framework in enhancing personalization and engagement. A diary study with 26 participants over one week identified nine key activities, with Household Chores, Workout, and Focusing being the most relevant. These activities revealed novel U&Gs such as “For Preference,” “For Convenience,” “For Discovery,” and “To Get Distracted.” A subsequent user study compared a Basic Music App (audio-only) with a Modified Music App (multimedia-enhanced). Results showed that participants preferred the Modified Music App across five constructs: novelty, ease of use, usefulness, satisfaction, and intention to use. These findings suggest that multimedia-enhanced recommendations can improve user experience by aligning with activity-specific preferences. The study contributes to research on personalized MRSs and offers insights for developing context-aware, multimedia-driven recommendations.
1. INTRODUCTION
Music, in its simplest form, combines sound, both vocal and instrumental, to create beauty in form or expression [1]. Music permeates our daily lives, accompanying us throughout routines and experiences. DeNora [2] emphasizes how music fundamentally influences our moods, shapes our memories, and connects us to the world around us.
The way people consume music has evolved significantly over time. Since the launch of MTV in 1981, which transformed the music industry into a more visual experience [3], music consumption has expanded beyond just listening. Katz [4] highlights how visual elements, such as the Illustrated Song Machine and the rise of stereophony, have enhanced musical experiences. Today, music-related media includes not only audio but also text (e.g., lyrics), video (e.g., official music videos), and images (e.g., album covers) [5, 6]. Platforms like Spotify, Amazon Music, and YouTube Music offer users complementary media, including videos and lyrics. However, accessing this media typically requires active interaction, which is not always ideal.

© O. Lesota, V. Clavijo, A. Rizwani, M. Schedl and B. Ferwerda. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: O. Lesota, V. Clavijo, A. Rizwani, M. Schedl and B. Ferwerda, “Enhancing Music Recommender Systems with Multimedia Content: A Context-Aware Approach”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
As the nature of music consumption has shifted, researchers have increasingly sought to understand not only why people listen to music but also how they engage with the broader spectrum of music-related media. The Uses and Gratifications Theory (UGT) has emerged as a prominent framework for exploring these motivations, emphasizing how individuals actively seek media to satisfy specific needs and achieve goals [7]. Lonsdale and North [8] applied UGT to investigate motivations for music listening, including comparisons to other leisure activities and age-related differences. Similarly, Krause and Brown [9] examined music format preferences through the UGT lens, while De la Rosa Herrera and Publiese [10] explored reasons for music consumption among emerging adults.
Research motivations in this field are diverse, ranging from understanding the role of music in everyday life [8, 11] and examining its impact on human existence [12–17] to exploring its social and psychological functions [18]. Recently, the focus has expanded to include Music Recommender Systems (MRSs), which leverage user preferences to predict and recommend songs [19–22]. The rise of streaming services like Spotify, Apple Music, and YouTube Music has amplified the importance of MRSs in shaping the music consumption experience.
Despite their widespread use, many MRSs fail to account for contextual factors, such as user activity, which significantly influence music consumption [21–23]. This oversight often results in unsatisfactory recommendations that do not align with the dynamic needs of users. Schedl et al. [22] propose that incorporating contextual aspects could significantly enhance the performance of MRSs, allowing them to deliver more relevant and satisfying recommendations.
Garcia-Gathright et al. [24] found that user satisfaction with MRSs increases when the recommendations align with music taste, meet user needs, and support goal achievement. These latter two functions align closely with UGT, suggesting that effective MRSs should not only recommend music that matches user preferences but also address deeper needs and goals. This alignment highlights the critical role of understanding both user needs and contextual factors in designing next-generation MRSs.
The intersection of evolving media consumption habits, contextual activities, and the uses and gratifications associated with music-related media presents a unique opportunity to enhance MRSs. By applying the UGT framework, researchers can develop more personalized and context-aware systems. This research aims to explore these dimensions, contributing to the creation of user-centered MRSs that not only align with musical preferences but also address broader media consumption habits and contextual activity needs. Hence, the present study addresses the following research questions:
RQ1: What are the uses and gratifications of engaging with music-related media (audio, text, video, image) based on the user’s contextual activities?
RQ2: How does a personalized recommendation of music-related media influence user experience when considering diverse uses and gratifications, and contextual activities?
This study aims to fill the gap in current research by examining the motivations behind engaging with different types of music-related media (e.g., text, video, image) in addition to audio alone, and by understanding how contextual activities influence these choices. Therefore, this study focuses on contextual activities as a critical factor in improving music recommendations. By integrating insights from these areas, this research seeks to provide more accurate and relevant recommendations that enhance user experiences based on real-time activities and media preferences.
2. RELATED WORK
Research on MRSs has increasingly emphasized the impact of users’ contexts, motivations, and emotional states on music selection. While these systems often rely on data such as play counts, skips, and likes, several studies highlight the influence of situational factors, such as location, activity, or mood, on a user’s immediate listening choices [21, 23, 25–27]. Scholars argue that these contextual cues reflect underlying “gratifications” users seek, an idea originating from UGT. If a system ignores such context, it can misinterpret user behavior and deliver recommendations that lack immediate relevance [22].
Although widely used techniques like collaborative and content-based filtering effectively identify patterns in large user populations, they rarely capture the nuances of when and why users listen. For instance, a user’s preference for high-energy music while exercising may get overshadowed if the system only looks at the aggregate listening history and not at the time or location associated with each playback. Studies suggest integrating contextual signals, potentially gathered from user-device interactions or even external sensors, to align recommendations more closely with users’ day-to-day realities [22, 24, 28].
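The overshadowing effect described above can be illustrated with a minimal sketch. It is not part of the study: the play log, the activity and genre labels, and the function names are all hypothetical, chosen only to show how an aggregate profile can hide an activity-specific preference that a per-context profile keeps visible.

```python
from collections import Counter, defaultdict

def aggregate_top(plays):
    """Most-played genre over the whole history, ignoring context."""
    return Counter(genre for _, genre in plays).most_common(1)[0][0]

def contextual_top(plays):
    """Most-played genre per activity (the contextual signal)."""
    by_activity = defaultdict(Counter)
    for activity, genre in plays:
        by_activity[activity][genre] += 1
    return {a: c.most_common(1)[0][0] for a, c in by_activity.items()}

# Hypothetical play log: (activity, genre) pairs for one listener.
plays = [("studying", "ambient")] * 6 + [("exercising", "high-energy")] * 3

overall = aggregate_top(plays)       # dominated by the frequent activity
per_context = contextual_top(plays)  # keeps the exercise preference visible
```

In this toy log the aggregate view returns only the study-time genre, while the per-activity view still recovers the high-energy preference for exercising.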
Meanwhile, leading streaming services have begun to expand beyond traditional audio consumption. For instance, Spotify not only streams music but also provides built-in lyrics and official music videos to premium users. Amazon Music leverages its X-Ray feature (adapted from Amazon Video), displaying real-time trivia or text annotations about the current track. YouTube Music allows users to switch between audio and video seamlessly. Despite these added functionalities, platforms generally leave it to listeners to initiate any deeper engagement, such as clicking on the lyrics tab or opting to watch a video. Consequently, the service does not dynamically recommend multimedia formats that might enhance an individual’s specific context or motivation at the moment.
Researchers studying multiple music-related media formats underscore their potential to satisfy broader user needs. Deldjoo et al. [6] and Afchar et al. [5] show that listeners may watch videos for more immersive experiences, read lyrics for clarity, or browse artwork to appreciate a song’s visual identity. Linking these motivations to UGT’s notion of active media selection could extend the impact of MRSs. By recognizing that a listener might, for instance, prefer soothing instrumentals while studying or an immersive music video when actively browsing content, an MRS could shift from merely suggesting music aligned with their general preference to recommending the combination of formats most relevant in the given context. Incorporating the principles of UGT can help MRSs move beyond simple preference matching, offering more nuanced suggestions that address diverse listening scenarios.
3. METHODS
To investigate how contextual factors affect music consumption and explore whether contextual, activity-aware recommendations can enhance user experience, this research comprised two complementary studies: a Diary Study and a User Study. The Diary Study examined how individuals’ uses and gratifications (U&G) of music-related media align with their day-to-day activities (RQ1). Building on these insights, the User Study evaluated how recommending additional media formats (video, text, or images) based on contextual activities and U&G impacts user experience (RQ2).
By combining the contextual insights from the Diary Study with an experimental User Study, these methods offer a comprehensive approach to understanding how real-time activities and user motivations can shape recommendations of music-related media.¹
3.1 Diary Study
A diary study approach was selected to capture participants’ routines and motivations in situ, thus minimizing recall bias. Each participant completed a diary for seven consecutive days, covering weekdays and weekends. Participants recorded four main items for each music consumption episode: (1) date and time, (2) current activity, (3) type of music-related media (audio, video, text, or image), and (4) the reasons for choosing that media. This incidental diary design asked participants to submit entries when they naturally engaged with music-related media, thereby reducing irrelevant or forced data.

¹ We share the consent forms, participant briefs, administrator scripts and additional analysis in the repository: https://github.com/hcai-mms/mrs_context-activity_media

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
Inclusion criteria specified that participants must consume music-related media daily, use multiple streaming services, and provide written consent for collecting diary entries and demographic details. A short pilot with five participants over three days tested the clarity of instructions and the feasibility of the collection process, leading to minor refinements in question phrasing.
3.1.1 Procedures
A total of 26 participants (19 female, 7 male) completed the diary study, yielding 185 diary entries. Entry counts per participant ranged from three to fourteen, and three participants extended their participation by two additional days. No participants withdrew or provided unusable data. The diary data served to identify a spectrum of U&G that participants sought in different contexts. Each reported activity was categorized. For the User Study, the final categories were narrowed to four key activities: “Social Gathering,” “Household Chores,” “Workout,” and “Relaxing.” These were grouped into two sets, Group 1 (“Social Gathering,” “Household Chores”) and Group 2 (“Workout,” “Relaxing”), each offering a diverse combination of music-related media recommendations.
3.2 User Study
Building on the Diary Study results, the User Study investigated whether contextual awareness, paired with additional media recommendations, improves user experience (RQ2). Two music application prototypes were developed:
• Basic Music App Prototype: A simplified system recommending only audio.
• Modified Music App Prototype: A system that provides audio plus complementary media (i.e., video, text, or image), guided by the contextual activities and motivations identified in the Diary Study.
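The difference between the two prototypes can be sketched as a simple selection rule. The paper does not specify how complementary formats were chosen, so everything below is an assumption made for illustration: the function name, the rule of picking the most frequently reported complementary format, and the 25% threshold are hypothetical. Only the percentages are taken from the paper (Table 1, for the four activities used in the User Study).

```python
# Share (%) of diary entries per complementary format, copied from Table 1
# for the four User Study activities.
TABLE1_SHARES = {
    "Household Chores": {"video": 57.69, "text": 0.00, "image": 3.85},
    "Workout": {"video": 15.38, "text": 3.85, "image": 0.00},
    "Social Gathering": {"video": 52.63, "text": 5.26, "image": 0.00},
    "Relaxing": {"video": 29.41, "text": 0.00, "image": 17.65},
}

def recommend_formats(activity, multimedia=True, min_share=25.0):
    """Return the media formats to show for an activity.

    multimedia=False mimics the Basic app (audio only). With multimedia=True
    the most frequently reported complementary format is added, but only if
    its diary share reaches min_share (a hypothetical threshold).
    """
    formats = ["audio"]
    if not multimedia:
        return formats
    shares = TABLE1_SHARES.get(activity, {})
    if shares:
        best = max(shares, key=shares.get)
        if shares[best] >= min_share:
            formats.append(best)
    return formats

basic = recommend_formats("Household Chores", multimedia=False)
chores = recommend_formats("Household Chores")
workout = recommend_formats("Workout")
```

Under this assumed rule, video accompanies audio for Household Chores, while Workout stays audio-only because no complementary format reaches the threshold. A real system would also need the user-facing toggle that participants asked for.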
Both prototypes were evaluated through two user experience frameworks:
• ResQue [29], measuring recommendation accuracy, novelty, ease of use, perceived usefulness, overall satisfaction, and intention to use.
• UEQ-S (User Experience Questionnaire Short Version) [30], assessing broader usability and user experience attributes.
To ensure that participants focused on the recommended media formats (rather than song preference), music tracks were selected from Spotify’s trending charts and activity-based playlists. Participants were informed that their task was to assess how each prototype’s recommended content aligned with a given scenario, rather than to evaluate the song recommendations.
3.2.1 Procedures
A total of 63 participants were recruited, mainly through WhatsApp groups in student accommodations in Jönköping, Sweden, and were offered small incentives (e.g., soda and candy). Each participant underwent two stages:
1. Basic Music App: Participants chose from the activities in their assigned group (e.g., “Household Chores”), interacted with the prototype (audio only), and then completed both ResQue and UEQ-S questionnaires.
2. Modified Music App: Participants revisited the same activities, but this time the prototype displayed complementary media (i.e., video, text, or images) selected according to Diary Study insights. They again completed ResQue and UEQ-S.
Sessions lasted approximately 10 minutes, occasionally extending for further discussion when participants gave notable feedback. Following data collection, four participant entries were removed due to disengaged or repetitive responses, leaving 59 valid data points for analysis.
4. RESULTS
This section presents findings from two complementary studies. The first part covers the Diary Study, which explores how different day-to-day activities shape participants’ choices of music-related media and the uses and gratifications (U&G) they seek (RQ1). The second part describes the User Study, examining whether contextual, multi-format recommendations improve user experience (RQ2).
4.1 Diary Study Results
Over a seven-day period, participants recorded episodes of music-related media use, yielding a total of 185 diary entries. We discarded 30 invalid entries for issues such as non-music content or insufficient activity context (e.g., “At a kiosk”). This left 155 valid entries, each describing (1) the activity, (2) the media type(s) used (i.e., only audio or audio complemented with video, text, or image), and (3) the motivation behind selecting these media. We categorized the entries using Affinity Diagramming, creating two sets of clusters representing activities and U&G:
• Activities: Nine distinct themes emerged: Household Chores (25 entries), Workout (25), Focusing (20), Social Gathering (19), Commuting (18), Relaxing (16), Passing the Time (13), Meditating (10), and Driving (9).
• Uses and Gratifications (U&G): Thirteen categories were identified. The most common included: For Preference (34 entries), As a Background (19), To Pass the Time / Enjoy (13), For Active Interaction (11), and For Discovery (11). Other recurring motivations were For Convenience, To Relax, To Get Distracted, To Concentrate, To Set the Environment, Playlist Usage, To Match Mood, and To Help Sleep / Fall Asleep.

Figure 1: Interfaces with audio and different music-related media. Panels: (a) Only audio, (b) Audio + image, (c) Audio + lyrics, (d) Audio + video.
4.1.1 Statistical Analysis of Media Choice
To investigate how different activities influenced the choice of media, we transformed the data to indicate presence (1) or absence (0) of audio, video, text, or image in each diary entry. A Multivariate Analysis of Variance (MANOVA) showed a statistically significant effect of activity on these media selections (p < 0.001)². For instance, Driving was associated exclusively with audio in all cases, whereas Passing the Time had a higher proportion of video or text usage (see Table 1).
These results highlight the central role of contextual activity in shaping the media formats users prefer and the gratifications they seek, such as using audio-only content while driving or combining video and text when more engaged or relaxed.
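The presence/absence transformation described in this subsection can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis script: the sample entries and function names are hypothetical, and the sketch stops at the indicator coding and per-activity shares, which is the form of data a MANOVA would then be run on.

```python
from collections import defaultdict

MEDIA = ("audio", "video", "text", "image")

def encode(entry_media):
    """Code one diary entry as presence (1) / absence (0) per media type."""
    return {m: int(m in entry_media) for m in MEDIA}

def shares_by_activity(entries):
    """Fraction of entries per activity in which each media type is present."""
    totals = defaultdict(int)
    sums = defaultdict(lambda: dict.fromkeys(MEDIA, 0))
    for activity, media in entries:
        totals[activity] += 1
        for m, flag in encode(media).items():
            sums[activity][m] += flag
    return {a: {m: sums[a][m] / totals[a] for m in MEDIA} for a in totals}

# Hypothetical diary entries: (activity, set of media types used).
entries = [
    ("Driving", {"audio"}),
    ("Driving", {"audio"}),
    ("Passing the Time", {"audio", "video"}),
    ("Passing the Time", {"audio", "text"}),
]
shares = shares_by_activity(entries)
```

With real diary data, the resulting indicator columns would serve as the dependent variables and activity as the factor.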
4.2 User Study Results
The user study (N=63 initially; 59 valid after excluding disengaged responses) examined whether a Modified Music App offering context-based recommendations of music-related media improves user experience (RQ2).
4.2.1 User Experience (UEQ-S) Findings
We measured three dimensions in UEQ-S: pragmatic quality, hedonic quality, and overall user experience. Across all 59 participants, the Modified Music App consistently achieved higher average scores than the Basic Music App, suggesting that providing extra media formats can enrich user engagement. Breaking this down by group:
1. “Social Gathering,” “Household Chores”: Participants reported higher hedonic quality in the Modified App (x̄ = 1.917 vs. x̄ = 1.028), finding it more entertaining and novel, particularly for social contexts or idle chores (x̄ = 2.148 vs. x̄ = 1.509).
2. “Workout,” “Relaxing”: While participants still favored the Modified App overall, some pointed out that extra media (e.g., lyrics or video) during Workout could be distracting, leading them to wish for better toggling or customization (hedonic: x̄ = 0.836 vs. x̄ = 1.953 and pragmatic: x̄ = 1.609 vs. x̄ = 2.117).
Although the Modified App improved overall impression, contextual nuances (e.g., a physically active setting) could affect how much people appreciated the added media.

² A detailed report can be found in the companion repository.
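For readers unfamiliar with UEQ-S scoring, the three dimensions reported above are conventionally derived from eight 7-point items. The sketch below assumes the standard scheme (items 1 to 4 form pragmatic quality, items 5 to 8 hedonic quality, and responses are rescaled from 1..7 to -3..+3); the function name and the example responses are hypothetical, and the paper itself does not describe its scoring procedure.

```python
from statistics import mean

def ueq_s_scores(responses):
    """Score one UEQ-S questionnaire (eight items, each answered 1..7).

    Assumes the standard layout: items 1-4 pragmatic, items 5-8 hedonic,
    with 7 always the positive end of the scale.
    """
    if len(responses) != 8 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("expected eight responses in the range 1..7")
    scaled = [r - 4 for r in responses]  # map 1..7 onto -3..+3
    return {
        "pragmatic": mean(scaled[:4]),
        "hedonic": mean(scaled[4:]),
        "overall": mean(scaled),
    }

# Hypothetical single participant.
scores = ueq_s_scores([7, 6, 6, 5, 4, 4, 5, 3])
```

Group-level values such as the means reported above would then be averages of these per-participant scores.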
4.2.2 ResQue Analysis
We used six constructs from the ResQue framework to evaluate participants’ perceptions regarding:
1. Accuracy (ACC): How well the app’s recommendations fit users’ expectations.
2. Novelty (NOV): Whether the app introduced interesting new content.
3. Perceived Ease of Use (PEU): How simple it was to navigate or interact with the app.
4. Perceived Usefulness (PU): The extent to which participants felt the recommendations enhanced their experience.
5. User Satisfaction (US): Overall satisfaction with the system.
6. Intention to Use (IU): Likelihood of adopting the app in the future.

Table 1: Media formats reported across different activities.
Activity            Audio     Video     Text      Image
Household Chores    38.46%    57.69%    0.00%     3.85%
Workout             80.77%    15.38%    3.85%     0.00%
Focusing            70.83%    12.50%    8.33%     8.33%
Social Gathering    42.11%    52.63%    5.26%     0.00%
Commuting           78.95%    15.79%    5.26%     0.00%
Relaxing            52.94%    29.41%    0.00%     17.65%
Passing the Time    30.77%    46.15%    15.38%    7.69%
Meditating          100.00%   0.00%     0.00%     0.00%
Driving             100.00%   0.00%     0.00%     0.00%

Table 2: Paired-samples t-test of ResQue constructs. Negative means indicate preference for the Modified App.
Pair   Mean Diff.   SD      t        p (1-sided)   p (2-sided)
ACC    -0.34        1.060   -2.455   0.009         0.017
NOV    -0.54        0.916   -4.549   <0.001        <0.001
PEU    -0.37        0.717   -3.996   <0.001        <0.001
PU     -0.51        0.838   -4.660   <0.001        <0.001
US     -0.39        0.695   -4.307   <0.001        <0.001
IU     -0.49        0.751   -5.025   <0.001        <0.001
Descriptive statistics showed that, on average, participants rated the Modified App more favorably on each construct (e.g., higher ACC, higher NOV, etc.). A paired-samples t-test indicated these differences were statistically significant at the 0.01 level (p < 0.01) for five of the six constructs; the exception was Accuracy (ACC), which was significant only at the 0.05 level (two-sided p = 0.017). This aligns with the open-ended feedback, where some users felt the added media did not always align with their exact activity or preferences, especially in physically active scenarios (see Table 2).
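As a plausibility check on Table 2, the paired-samples t statistic can be recomputed from the reported mean difference and standard deviation as t = mean / (SD / sqrt(n)). The sketch below assumes n = 59 valid participants for every construct (the paper reports 59 valid data points overall); small deviations from the published t values are expected because the means are rounded to two decimals.

```python
from math import sqrt

N = 59  # valid participants; assumed to apply to every construct

# construct: (mean difference, SD of differences, reported t) from Table 2
TABLE2 = {
    "ACC": (-0.34, 1.060, -2.455),
    "NOV": (-0.54, 0.916, -4.549),
    "PEU": (-0.37, 0.717, -3.996),
    "PU": (-0.51, 0.838, -4.660),
    "US": (-0.39, 0.695, -4.307),
    "IU": (-0.49, 0.751, -5.025),
}

def paired_t(mean_diff, sd_diff, n):
    """t statistic of a paired-samples t-test from summary statistics."""
    return mean_diff / (sd_diff / sqrt(n))

recomputed = {k: paired_t(m, sd, N) for k, (m, sd, _) in TABLE2.items()}
max_gap = max(abs(recomputed[k] - TABLE2[k][2]) for k in TABLE2)
```

The recomputed values agree with the reported column to within rounding, which supports reading the unlabeled third numeric column of the extracted table as the t statistic with 58 degrees of freedom.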
4.2.3 Open-Ended Reflections
To gain deeper insights, we invited participants to comment on their experiences in free-form responses. Common remarks included:
• “When I’m with friends, it’s great to have videos. But if I’m alone, I usually just lock the screen.”
• “For household chores, I don’t need lyrics or images. Audio is enough.”
• “In the gym, I don’t want to look at text or a video all the time. It can be annoying.”
• “Relaxing sometimes means turning everything off. But if I’m bored, I’ll watch the video.”
Several participants indicated a strong desire to toggle or customize the extra media features to fit their immediate situation. While they generally found the Modified App more entertaining or novel, the usefulness of having lyrics, video, or images remained highly context-dependent.
5. DISCUSSION
5.1 RQ1: Uses and Gratifications in Contextual Activities
This study set out to identify the uses and gratifications (U&G) driving individuals to consume different music-related media (audio, text, video, image) depending on their contextual activities (RQ1). The Diary Study led to a list of nine distinct activities among 26 participants. Our analysis showed that audio was consumed most frequently, followed closely by the addition of video, while images were the least used. Statistical testing indicated that activity type significantly influenced media usage patterns, pointing to the importance of contextual factors.
The Diary Study also produced thirteen different U&G of music-related media consumption. Several of these motivations corresponded to those found in previous music-listening research, including “As a Background” [11, 12], “To Pass the Time / Enjoy” [8, 31], “For Active Interaction” [8], “To Relax” [8, 11, 31], and “To Concentrate” [8, 11]. However, additional motivations such as “For Preference,” “For Convenience,” “For Discovery,” and “To Get Distracted” appeared to be more specific to multi-format consumption and are not commonly mentioned in studies of audio-only music listening. This finding suggests that extending beyond audio can elicit new motivations for media usage.
5.2 RQ2: Influence of Personalized, Context-Aware Recommendations
The second research question (RQ2) investigated how personalized recommendations of music-related media affect user experience when accounting for diverse U&G and contextual activities. The User Study employed two prototypes: a Basic Music App, which only recommended audio, and a Modified Music App, which incorporated additional formats (i.e., video, text, or images) aligned with the activities identified in the Diary Study. Following established evaluation frameworks, we used the ResQue questionnaire [29] and the short version of the User Experience Questionnaire (UEQ-S; [30]) to gain deeper insights. Results consistently favored the Modified App in terms of both pragmatic (ease of use) and hedonic (enjoyment, novelty) qualities. Nevertheless, the difference in participant ratings of Accuracy was statistically significant at the 0.05 significance level (p = 0.017) but not at the 0.01 level, indicating that the perceived alignment between recommended media and user preferences varied slightly. Additionally, perceived ease of use showed one of the smallest differences between the two prototypes (a mean difference of -0.37, second only to Accuracy at -0.34 in Table 2), possibly because both shared the same interface layout, colors, and navigation scheme.
Open-ended responses illustrated the desire for control over media formats, with participants expressing interest in turning off extra content if it became distracting or did not fit their ongoing activity. These responses reinforce the notion that while additional media can be helpful or entertaining, contextual appropriateness remains vital, and users often want the freedom to choose how much visual or textual content accompanies the audio.
6. CONCLUSION
This research aimed to enhance MRSs by integrating contextual awareness (e.g., driving, working out, or socializing) and introducing music-related media types beyond audio. The Diary Study revealed the range of day-to-day activities during which media are consumed, along with a set of novel U&G (for instance, “For Preference” and “To Get Distracted”), suggesting that multi-format music services can activate motivations not often observed in audio-only environments. The User Study demonstrated that an MRS prototype aligning with these contextual insights generally fosters higher pragmatic and hedonic quality perceptions.
Overall, these findings suggest a strong potential for context-aware, multimedia-based recommendations to improve the user experience, provided they remain flexible and allow listeners to control or disable additional media as needed.
7. IMPLICATIONS & LIMITATIONS
The findings from this study offer guidance for enhancing MRSs by integrating contextual activity awareness [22] and user-centric content selection. Introducing music-related media, such as video, text, and images, alongside traditional audio formats appears to deliver a more engaging user experience by addressing varied needs across different activities. This approach could be adapted to other media domains, including podcasts, educational videos, or news, where real-time contextual information shapes user engagement. Music streaming services could gain a competitive edge by incorporating these features, thereby increasing user satisfaction.
Furthermore, the study broadens the intersection between Uses and Gratifications Theory and Recommender Systems by showcasing how contextual activity and multimedia options affect user preferences. The integration of qualitative insights from diary studies with quantitative measures in a user study highlights the value of hybrid research methods. These methods emphasize the significance of contextual relevance and multimedia flexibility in recommendation algorithms, shifting the focus beyond audio-centric research toward multimedia-enhanced MRSs. The resulting insights support the development of adaptive, activity-aware recommenders that can cater to a wide array of user motivations and behaviors.
Several limitations should be noted. The sample size and demographic diversity were relatively constrained in both the Diary Study and the User Study. This constraint may have influenced perceptions of accuracy in the Modified Music App. Specifically, the assumption that users who share similar traits would also share music-related media preferences rested on diary data from only 26 participants, which is perhaps insufficient for accurately capturing all preference variations. Moreover, the Diary Study sample was skewed toward female participants, potentially shaping which activities and gratifications emerged most prominently. Because this phase was explicitly exploratory, we do not claim that these patterns generalize across all listener populations; instead, they highlight promising avenues for follow-up. Recruiting a larger, more demographically balanced cohort, or explicitly testing for gender interactions, will help verify whether these insights hold across diverse groups and ensure equitable applicability of context-aware recommendations.
In addition, our exclusive focus on contextual activities rather than a broader range of signals (e.g., location, time of day, or device usage) [22] may have omitted other drivers of media-format preferences. Incorporating such passive or lightweight sensing in future work could reveal new motivational factors. Further, the User Study employed a fixed song set drawn from public playlists rather than personalized content, a decision made for logistical reasons; while our emphasis remained on media-type recommendations, this choice may nevertheless have influenced overall user ratings.
Finally, time constraints imposed a narrower window for collecting diary entries and recruiting participants for the user study. Longer collection intervals or repeated sessions might capture more nuanced behaviors and yield stronger evidence on whether multimedia-enhanced recommendations maintain their appeal over time.
8. FUTURE RESEARCH
Further research could investigate the long-term impact of multimedia-enhanced recommendations on user engagement and retention, employing a longitudinal study to determine whether added media elements sustain user satisfaction over extended periods. Examining demographic factors, such as age, gender, or cultural background, would also deepen understanding of how media type preferences vary among different user segments. Finally, investigating different media on a more granular level could help improve recommendation accuracy (for instance, one could prefer music videos for browsing along and cover videos for learning a song on an instrument).
Exploring personalization strategies that consider more granular activity detection, emotional states, time-of-day patterns, or location-based data alongside contextual activities could improve the accuracy and relevance of recommendations. Addressing the constraints noted here, for example through larger, more diverse samples and prolonged observation periods, would likely yield more generalizable findings. By incorporating these directions, future research can refine the design of context-aware, multimedia-driven MRSs to offer more responsive and gratifying user experiences.
9. ETHICAL CONSIDERATIONS
This student-led study was reviewed and approved at the institutional level by Jönköping University and judged to pose minimal risk under Swedish student-research regulations (no Etikprövningsmyndigheten submission was required). All participants received written information about both the diary-logging and prototype-evaluation phases, and provided informed consent before participation. Data collection and storage procedures ensured anonymity and voluntary withdrawal rights.
To protect participant welfare and data privacy, we implemented several safeguards throughout both studies. First, informed consent procedures emphasized voluntariness: participants were briefed on their freedom to skip any diary entry or withdraw from the app evaluation without consequences. Second, we anonymized all behavioral and survey data at the point of collection. Hence, no direct identifiers were stored. Third, we conducted a debrief immediately following the app-based evaluation, allowing participants to ask questions and request removal of their data if desired. Although the study involved only non-sensitive information (music-listening behaviors and media-format preferences), these steps ensured transparency and respect for participant autonomy.
Finally, in reflecting on future real-world deployments of context-aware recommendations, where adaptive features might leverage signals such as time of day or self-reported activities, we acknowledge the inherent tension between personalization and surveillance. We recommend that any production system include clear opt-in prompts, fine-grained user controls for disabling context-driven features, and a strict policy of data minimization to uphold user agency and privacy.
10. ACKNOWLEDGMENTS
This research was funded in whole or in part by the Austrian Science Fund (FWF): https://doi.org/10.55776/COE12, https://doi.org/10.55776/DFH23, https://doi.org/10.55776/P36413.
11. REFERENCES
[1] Epperson and Gordon, “Music | Art Form, Styles, Rhythm, History,” Feb. 2025. [Online]. Available: https://www.britannica.com/art/music
[2] T. DeNora, Music in everyday life. Cambridge University Press, 2000.
[3] T. Doherty, “MTV and the music video: Promo and product,” Southern Journal of Communication, vol. 52, no. 4, pp. 349–361, 1987.
[4] M. Katz, Capturing sound: How technology has changed music. Univ of California Press, 2010.
[5] D. Afchar, A. Melchiorre, M. Schedl, R. Hennequin, E. Epure, and M. Moussallam, “Explainability in music recommender systems,” AI Magazine, vol. 43, no. 2, pp. 190–208, 2022.
[6] Y. Deldjoo, M. Schedl, P. Cremonesi, G. Pasi et al., “Content-based multimedia recommendation systems: definition and application domains,” in Italian Information Retrieval Workshop, 2018, pp. 1–4.
[7] T. Schäfer, P. Sedlmeier, C. Städtler, and D. Huron, “The psychological functions of music listening,” Frontiers in Psychology, vol. 4, p. 511, 2013.
[8] A. J. Lonsdale and A. C. North, “Why do we listen
to music? a uses and gratifications analysis,” British
journal of psychology, vol. 102, no. 1, pp. 108–134,
2011.
[9] A. E. Krause and S. C. Brown, “A uses and gratifications
approach to considering the music formats that
people use most often,” Psychology of Music, vol. 49,
no. 3, pp. 547–566, 2021.
[10] K. De la Rosa Herrera and R. Publiese, “The uses and
gratifications of music among emerging adults,”
International Journal of Arts & Sciences, vol. 10, no. 1, pp.
351–364, 2017.
[11] T. Chamorro-Premuzic and A. Furnham, “Personality
and music: Can traits explain how people use music in
everyday life?” British journal of psychology, vol. 98,
no. 2, pp. 175–185, 2007.
[12] D. Boer, R. Fischer, H. G. Tekman, A. Abubakar,
J. Njenga, and M. Zenger, “Young people’s topography
of musical functions: Personal, social and cultural
experiences with music across genders and six societies,”
International Journal of Psychology, vol. 47, no. 5, pp.
355–369, 2012.
[13] B. Ferwerda and M. Schedl, “Investigating the relationship
between diversity in music consumption behavior
and cultural dimensions: A cross-country analysis.” in
UMAP (Extended Proceedings), 2016.
[14] B. Ferwerda, A. Vall, M. Tkalcic, and M. Schedl,
“Exploring music diversity needs across countries,” in
Proceedings of the 2016 Conference on User Modeling
Adaptation and Personalization, 2016, pp. 287–288.
[15] T. Hays and V. Minichiello, “The meaning of music in
the lives of older people: A qualitative study,”
Psychology of music, vol. 33, no. 4, pp. 437–451, 2005.
[16] M. Schedl, F. Lemmerich, B. Ferwerda, M. Skowron,
and P. Knees, “Indicators of country similarity in terms
of music taste, cultural, and socio-economic factors,”
in 2017 IEEE International Symposium on Multimedia
(ISM). IEEE, 2017, pp. 308–311.
[17] M. Skowron, F. Lemmerich, B. Ferwerda, and
M. Schedl, “Predicting genre preferences from cultural
and socio-economic factors for music retrieval,”
in European Conference on Information Retrieval.
Springer, 2017, pp. 561–567.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
[18] D. J. Hargreaves and A. C. North, “The functions of
music in everyday life: Redefining the social in music
psychology,” Psychology of music, vol. 27, no. 1, pp.
71–83, 1999.
[19] B. Ferwerda, “Improving the user experience of music
recommender systems through personality and cultural
information,” Ph.D. dissertation, B. Ferwerda, 2016.
[20] B. Ferwerda and M. Schedl, “Enhancing music
recommender systems with personality information and
emotional states: A proposal.” in Umap workshops, 2014.
[21] P. Knees, M. Schedl, B. Ferwerda, and A. Laplante,
“User awareness in music recommender systems,”
Personalized human-computer interaction, pp. 223–252,
2019.
[22] M. Schedl, H. Zamani, C.-W. Chen, Y. Deldjoo, and
M. Elahi, “Current challenges and visions in music
recommender systems research,” International Journal of
Multimedia Information Retrieval, vol. 7, pp. 95–116,
2018.
[23] Y. Jin, N. N. Htun, N. Tintarev, and K. Verbert,
“Contextplay: Evaluating user control for context-aware
music recommendation,” in Proceedings of the 27th
ACM conference on user modeling, adaptation and
personalization, 2019, pp. 294–302.
[24] J. Garcia-Gathright, B. St. Thomas, C. Hosey,
Z. Nazari, and F. Diaz, “Understanding and evaluating
user satisfaction with music discovery,” in The 41st
International ACM SIGIR Conference on Research &
Development in Information Retrieval, 2018, pp. 55–64.
[25] Y. Hu and M. Ogihara, “Nextone player: A music
recommendation system based on user behavior,” in
Proceedings of the 12th International Society for Music
Information Retrieval Conference, ISMIR, 2011, pp.
103–108.
[26] M. Schedl and A. Flexer, “Putting the user in the
center of music information retrieval,” in Proceedings of
the 13th International Society for Music Information
Retrieval Conference, ISMIR, 2012, pp. 385–390.
[27] G. Vigliensoni and I. Fujinaga, “Automatic music
recommendation systems: Do demographic, profiling,
and contextual features improve their performance?”
in Proceedings of the 17th International Society for
Music Information Retrieval Conference, ISMIR, 2016,
pp. 94–100.
[28] B. Ferwerda, “The soundtrack of my life: adjusting
the emotion of music,” in Proceedings of the 1st workshop
collaborating with intelligent machines, Seoul, South
Korea, 2015.
[29] P. Pu, L. Chen, and R. Hu, “A user-centric evaluation
framework for recommender systems,” in Proceedings
of the fifth ACM conference on Recommender systems,
2011, pp. 157–164.
[30] M. Schrepp, “User experience questionnaire
handbook,” All you need to know to apply the UEQ
successfully in your project, vol. 10, 2015.
[31] A. E. Greasley and A. Lamont, “Exploring engagement
with music in everyday life using experience sampling
methodology,” Musicae Scientiae, vol. 15, no. 1, pp.
45–71, 2011.