InScoreAI: Collaborative Score Inpainting with Anticipatory Transformers

Author: Lallana Babiloni, Manuel
Publisher: Zenodo
DOI: 10.5281/zenodo.17304499
Source: https://zenodo.org/records/17304499/files/Manuel-Lallana_SMC_2025_Master_Thesis.pdf
Master thesis on Sound and Music Computing
Universitat Pompeu Fabra
InScoreAI: Collaborative Score Inpainting
with Anticipatory Transformers
Manuel Lallana Babiloni
Supervisor: Rafael Ramírez-Melendez
July 2025
Contents
1 Introduction
1.1 Significance and Motivation
1.2 Contributions
1.3 Objectives
1.4 Structure of the Report
2 State of the art
2.1 Introduction
2.2 Human-human collaborative composition
2.2.1 From co-creation to networks
2.2.2 Collaborative Physical Interfaces
2.2.3 Collaborative Digital Interfaces
2.2.4 Software Ecosystem Analysis
2.3 Human-AI Music-Making
2.3.1 AI Music Generation
2.3.2 Human-AI collaboration
2.4 Bridging the gap: Human collaboration assisted by AI or AISCW
3 Methods
3.1 Preliminary survey
3.2 InScoreAI Architecture
3.2.1 Web architecture
3.2.2 API
3.2.3 Deployment
3.3 Experiments
3.3.1 Evaluation Survey
3.3.2 Experiment 1. Individual composition without AI tools
3.3.3 Experiment 2. Individual composition with AI tools
3.3.4 Experiment 3. Collective composition with AI tools
4 Results
4.1 Quantitative Analysis
4.1.1 Individual No-AI use findings
4.1.2 AI use findings
4.1.3 Collaboration findings
4.2 Qualitative Analysis
4.2.1 Effect of Years of Experience
4.3 Tables and graphics
5 Discussion
5.1 Discussion
5.1.1 Contributions
5.1.2 Future steps
5.1.3 Limitations
5.2 Conclusions
Bibliography
Acknowledgement
I would like to thank my supervisor, Rafael Ramírez-Melendez, for the effort and
time he dedicated to this project. He believed in my ideas and proposals and always
encouraged me to explore new solutions.
I would like to thank Anna Xambó and Álvaro Barbosa for their guidance and Xavier
Serra for putting us in contact. The conversations we had opened new paths and
insights, and they generously dedicated their time and knowledge.
I am thankful to my Master and Bachelor colleagues for all their support and
feedback, as well as to Valerio Velardo for providing us with valuable knowledge on the
most recent model architectures and sharing the Anticipatory Transformers with
me.
This work was made possible thanks to all the musicians and composers that filled in
the preliminary survey and participated in the experiments; it has been a pleasure
to create something for you.
Finally, I would like to thank my partner Laia, my parents and sister for all their
love and care throughout an intense and fulfilling year.
Abstract
Collaborative music composition is the process by which multiple individuals
contribute to the creation of a musical work. In this project, an interface is created to
help composers create collaboratively and reach consensus through flexible symbolic
music generation using Anticipatory Transformers. This model generates MIDI
inside a specific fragment of music, taking into account the previous and following
tokens. Musicians and composers are the central and most important part of the
project; they have been the focus from the start of the development and evaluation.
A preliminary survey of experienced musicians and composers was developed to
extract important features of the interface. The evaluation consisted of three tasks:
one where the interface was used without the AI tools, one with the AI tools and the
last one using the collaborative mode with AI tools. Results show that AI assistance
significantly reduces the compositional effort and improves self-efficacy in individual
workflows, but diminishes perceived ownership. In collaborative settings, the system
effectively resolves interpersonal friction through AI-mediated "idea bridging,"
fostering consensus and mutual understanding. Sense of ownership is higher compared
to an individual AI setting, though it is still a concern to some participants based
on long responses. Professional composers showed concerns about AI making
composers "lazy" and the works simple and predictable, while nonprofessionals outlined
educational potentials of music collaborative settings.
Keywords: Human computer interaction (HCI); Symbolic Music Generation;
Transformer model; Collaborative music composition
Chapter 2. State of the art
presented to give a bird's-eye view of the topic.
A well documented example is the collaboration between the experimental artists
John Cage and David Tudor, who worked with indeterminate scores. The performer
had the freedom to define some musical parameters; Glover [1] considers this to be
an example of collaborative musical practice.
Later on, The League of Automatic Music Composers, founded by Jim Horton, put
emphasis on connections between musicians and the use of computers to define new
social contexts for music, while the collective known as The Hub (1980s) pioneered
computer-mediated social interactions in music through real-time data exchange
between performers, creating emergent structures from decentralized decisions [2].
Barbosa [3], who delivered the first systematic taxonomy of computer-supported
collaborative music systems, analyzed this and other early manifestations of networked
music and introduced the "Shared Sonic Environment" concept via the
latency-adaptive Public Sound Objects (PSOs) platform.
2.2.2 Collaborative Physical Interfaces
Tangible interfaces prioritized egalitarian participation through spatial interaction
paradigms. Jam-O-Drum [4] used circular multitouch surfaces to enable non-hierarchical
rhythmic collaboration, while the famous reacTable [5] mapped physical object
positions to sound parameters, allowing simultaneous manipulation by multiple users.
Multi-surface devices and smartphones have been used as tools for music
collaboration, showing that responsiveness and visual feedback are important to the user
[6], and that real-time shaping of musical ideas fosters unexpected and
experimental results [7]. These systems traded precision for accessibility, fostering "low-floor"
engagement at the cost of limited expressive depth.
2.2.3 Collaborative Digital Interfaces
Regarding collaboration using technological interfaces for music composition, a
system for collaborative music composition over the web was introduced in [8]. This
was one of the first collaborative tools for music and included logs of versions and
voting. The development of shared virtual environments [9] and cloud-based file
sharing [10] for music collaboration culminated in projects like DISSCO [11], a
platform that integrates algorithmic composition, sound synthesis, and user input in
real-time, enabling multiple participants, whether co-located or remote, to
collaboratively create and modify musical compositions.
Another important line of research has been the notion of mutual engagement in
creative collaborations [12] [13]. This area explores how engagement in music
collaboration in a digital tool can be measured. The authors found that adding instructions
that emphasize collaboration can lead to more engagement, which was identified
through several factors (like proximity of contributions, mutual modification, and
acknowledgment, mirroring, or transformation of others' inputs). Moreover, they
found a critical challenge: participants reported a clash of ideas and a lack of
communication channels. In this work, we will tackle this issue by introducing AI tools
to create inpainting proposals in order to reach consensus.
Digital interfaces for music collaboration have also been tackled by the field of music
education, with platforms enabling "a powerful route into compositional thinking"
[14]. Studies of online composition [15] revealed that some asynchronous workflows
improve organization and time management.
Several studies have shown successful evaluation metrics for collaborative
environments, especially the Creativity Support Index (CSI) [16], a psychometrically
validated survey tool designed to evaluate how effectively a technology (e.g., software,
digital tool) supports users in creative tasks, and a project by Burkhard et al. [17],
an evaluation of collaboration in tech-driven design (e.g., virtual environments) that
emphasizes mixed-method frameworks blending quantitative metrics with
qualitative insights.
2.2.4 Software Ecosystem Analysis
Sometimes, music software incorporates collaboration in the form of cloud syncing
and file sharing, but it is rare to find real-time editing as an essential part of
it. The most notable music editing software, Musescore, Sibelius and Dorico, offer
professional editing but lack real-time collaboration, although some plugins have
been developed for this purpose [18].
Noteflight, Flat.io and Soundslice are web applications with collaboration support to some
extent but, like the editing software already mentioned, they lack AI tools for
symbolic music generation. An exception to this is AIVA, an AI music generation
assistant that invites the user to co-create tracks. Unfortunately, its input and
visualization have the style of a DAW with piano roll; it is not a score editing tool.
In general, this tool seems to better fit music producers, not composers.
Table 1 shows how current musical software supports diverse collaboration
capabilities, but often lacks AI assistance and real-time collaboration, and when it offers
them, it lacks score editing.
Software | Collab Type | Notation Support | AI Features
Musescore | Cloud sync (MuseLab plugin for collaboration) | Professional | None
Dorico | Async file sharing | Professional | None
Sibelius | Cloud sync | Professional | None
Noteflight | Real-time web | Basic | None
Flat.io | Real-time web | Intermediate | None
Soundslice | Sharing by links | Basic | None
Staffpad | Async cloud | Handwriting | None
AIVA | None | DAW integration | Full AI generation
Table 1: Comparative Analysis of Collaborative Music Software
2.3 Human-AI Music-Making
2.3.1 AI Music Generation
The taxonomy proposed by Zhu et al. [19] shows that AI-driven music generation
systems can be grouped into three methodological families, each defined by the input
representation and learning paradigm they employ.
1. Parameter based models: These systems treat music as an ordered
sequence of symbolic parameters (notes, durations, dynamics, etc.) and
generate new sequences by modeling the statistical or heuristic relationships among
them.
• Markov chains (e.g. Markov Melody Generator): transition
matrices store the probability of one musical event following another;
generation proceeds by sampling from the learned matrix.
• Rule based systems (e.g. MusicScore): expert defined harmonic and
rhythmic constraints that delimit a search space where valid sequences
are assembled.
• Evolutionary algorithms (e.g. GenJam): candidate sequences go through
mutation and crossover, and a fitness function guides selection across
multiple generations.
• Neural networks (e.g. Magenta, Jukebox, MuseNet): recurrent or
Transformer architectures that learn long-range dependencies in large
MIDI or raw audio corpora and autoregressively predict the next token.
2. Prompt based (conditional) models: A natural language or audio prompt
conditions the model, providing music that is, ideally, aligned with the
description (e.g. Riffusion, MusicLM, MusicGen).
3. Visual based models: These approaches convert video or visual embeddings
into an intermediate representation that conditions the music generator (e.g.
CMT, Foley Music).
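To make the Markov chain family above concrete, the following is a minimal sketch of learning a first-order transition table from one melody and sampling a new one from it. The function names and the training melody are illustrative assumptions, not part of any system cited above.

```python
import random


def learn_transitions(melody):
    """Count how often each pitch is followed by each other pitch."""
    counts = {}
    for current, following in zip(melody, melody[1:]):
        counts.setdefault(current, {})
        counts[current][following] = counts[current].get(following, 0) + 1
    return counts


def sample_melody(counts, start, length, rng=None):
    """Walk the chain, drawing each next pitch in proportion to its count."""
    rng = rng or random.Random()
    melody = [start]
    while len(melody) < length:
        followers = counts.get(melody[-1])
        if not followers:  # dead end: no continuation was ever observed
            break
        pitches = list(followers)
        weights = [followers[p] for p in pitches]
        melody.append(rng.choices(pitches, weights=weights, k=1)[0])
    return melody


# Hypothetical training melody as MIDI pitch numbers (C major fragment).
training = [60, 62, 64, 62, 60, 64, 65, 67, 65, 64, 62, 60]
table = learn_transitions(training)
generated = sample_melody(table, start=60, length=8, rng=random.Random(0))
```

Every generated step is a transition that occurred in the training melody, which illustrates both the appeal of the approach (local plausibility) and its limit (no long-range structure).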
They identified the following limitations in most of the AI music generation models:
• Limited Control: Some models, such as MuseNet, may not adhere strictly
to user-specified instruments, leading to unexpected results.
• Data Dependency: Tools like Jukebox are limited by the diversity of their
training datasets, affecting their ability to generate specific styles.
• Computational Complexity: Models like Jukebox can be resource-intensive,
making them less accessible to users without high-performance computing
resources or for live applications.
• Quality of Output: Generated music often requires manual fine-tuning, as the
initial output may not meet user expectations.
These limitations reveal the challenges of achieving fully autonomous and creatively
rich AI music generation.
AI Symbolic Music Generation
Recent research on AI-driven symbolic music generation has highlighted the
complexity of automating the composition process. Early deep learning approaches
have excelled at capturing short-range dependencies to produce realistic local
musical patterns, yet have often struggled with long-range structure, style alignment,
and limited creative potential [20]. These limitations could also arise from the lack
of unified methods for evaluating model performance, making it difficult to assess
whether generated music meets professional standards of originality and coherence.
Multiple studies advocate for collaborative approaches that foreground
expressiveness and human-machine co-creation [21]; these approaches often propose "music
inpainting" solutions.
Music inpainting
Music inpainting addresses the completion of missing segments within a musical
work, aligning closely with iterative human creative workflows. The task is to
generate a musically coherent sequence that bridges the given past context with the
future context.
Early methodologies proposed probabilistic sampling. Gibbs sampling, a Markov
Chain Monte Carlo (MCMC) technique, has been widely applied, notably in
DeepBach for resampling notes/voices in Bach chorales. Similarly, Coconet (CNN-based)
was developed to complete partial scores, framing generation as iterative "rewriting".
These piano-roll models, however, were limited to fixed-length spans [21]. Extending
this, Ippolito et al. [22] used Transformers for infilling missing MIDI performances
via Gibbs sampling, enabling composers to iteratively rewrite or morph sections.
Learning expressive latent spaces has also enabled greater diversity and user
interaction. Pati et al. [23] introduced InpaintNet (VAE-based), generating
connecting bars between clips via learned latent trajectories, allowing contextual
manipulation. Chen et al. [24] proposed Music SketchNet, incorporating user-specified
pitch/rhythm constraints ("sketches") to fill monophonic gaps. A key limitation of
both bar-level VAEs was rigid alignment to measure boundaries.
Recent work overcomes fixed-span limitations. Chang et al. [25] leveraged XLNet for
variable-length completion, introducing relative bar encoding for nuanced positional
awareness. Guo et al. [26] developed a Transformer framework enhancing stylistic
fidelity to original material through control tokens governing key, track density, track
polyphony, tensile strain, bar cloud diameter, and track occupation. This granular
control facilitates interactive collaboration.
Important technical advances in controlling symbolic outputs can be seen in works
such as the Anticipatory Music Transformer, where this inpainting paradigm helps
generate specific passages around user-defined anchors [27], and MIDI-GPT, which
offers track-based conditional generation for comprehensive musical arrangements
[28]. Finally, a recent model called NotaGen emphasizes training paradigms from
large language models to improve musicality and user-controllable features [29],
showing how domain-specific adaptation can refine structure and coherence in
classical compositions.
2.3.2 Human-AI collaboration
Zhu et al. [19] mention the interdisciplinary potential of collaboration between
humans and machines. Research on this Human-AI collaboration mostly
focuses on interactivity, steerability and control.
Huang et al. [30] emphasize the importance of developing AI tools with creative
workflows in mind and incorporating existing musical practices. Creators see AI as
a partner for ideation and exploration but emphasize maintaining control over the
final product [31]. Along the same lines, AI-steering tools were found to improve the
sense of ownership in the users [32].
One of the most notable research efforts on AI-steering tools for Human-AI music
creation is presented in [33]. The study introduces "Cococo", a web-based music
editor that integrates AI-steering tools to support iterative music composition and
enhance collaboration. Their claim is that few studies have paid attention
to the potential and limitations of co-creation with AI tools. Their approach is
to provide a web interface where you can compose SATB chorales using a Deep
Neural Network trained on Bach pieces. You can select which voice to generate,
and generated content appears as multiple alternatives you can audition and choose
from, improving controllability over the final output.
In their user study with music novices, the tools helped participants feel more
empowered and connected to the compositions, while fostering a positive perception
of AI's role in creative tasks. This user study was evaluated by asking the users to
compose a piece using a non-steerable AI tool like "Bach Doodle" and then compose
a piece using "Cococo". The users then filled in a survey with 7-point Likert scale
items regarding efficacy, engagement, effort, controllability and other useful items
for Human-AI collaboration.
Based on the results extracted from analyzing and comparing those items from both
tasks, we can observe that users benefited from:
• Increased sense of control, trust, and ownership of compositions.
• Improved ability to solve problematic areas, learn new musical structures, and
explore AI's limits.
• Enhanced self-efficacy, creativity, and engagement in the co-creation process.
Users saw the AI as a collaborative partner when steering tools were available. Some
participants envisioned using AI with different roles for different tasks. Therefore,
future interfaces could let users define their creative goal, adapting the AI's role
accordingly. For fuzzy goals, the AI might automatically explore ideas; for clear
goals, it could respond to specific requests. Steering tools thus offer not only control
over direction but also the deliberate option to invite creativity and new possibilities
by ceding control.
2.4 Bridging the gap: Human collaboration assisted
by AI or AISCW
Later, "Cococo" was tested in a collaborative human-human environment as a
possible way to ease social friction during creative tasks [34]. Users were much more
comfortable judging and debating AI proposals than human proposals, allowing them
to establish common ground and progress efficiently. The study revealed that the
non-human identity of the AI acted as a psychological safety net: participants freely
rejected AI ideas without fear of offending a collaborator, reducing interpersonal
tension. In addition, AI offered multiple alternatives during creative disagreements,
allowing teams to move away from a dead end toward a compromise.
The limitations of this study show that there is a lot of room for improvement:
1. Although "Cococo" facilitated collaboration, it was not originally designed for
multi-human workflows.
2. Instead of making "Cococo" collaborative, they used the Google Remote
Desktop extension, allowing one of the participants to access the screen of the other
user. This lets them share the same view of the platform but lacks
essential collaborative features like independent actions for each user and visual
feedback of these actions.
3. Some of the participants indicated that the system turned humans from
"composers" to "advisors" or "curators" of AI output.
4. Its Bach-inspired training data constrained stylistic diversity.
Despite these limitations, the study uncovers the potential of AI as social glue: by
sharing starting points, accelerating progress through auto-completion, and
reframing criticism as collaborative "problem solving with a third party". They evaluated
the experiments by conducting semi-structured interviews from which high-level
categories and themes were extracted. In our view, this approach could
benefit from the 7-point Likert scale items proposed in [33].
A recent study carried out by Fu et al. [35] used already developed AI audio tools
(e.g. Suno and MusicGen) to carry out collaborative compositions with novice creators.
Their findings indicate that AI in collaborative environments can be perceived as:
1. A "team member", with creative responsibilities (e.g., lyric generation,
melody composition).
2. A technical assistant with mixing, mastering, or genre-specific expertise
where novices lacked proficiency.
3. A mechanical output generator, criticized for producing content lacking
emotional depth.
Although these tools lowered entry barriers (enabling novices to create full songs in
minutes), they introduced critical limitations:
1. AI outputs were often inflexible (e.g., uneditable WAV/MP3 files) and hard
to refine and integrate with human-generated elements.
2. Participants emphasized that over-reliance on AI risked creative identity. Some of
the teams consciously limited the role of AI to preserve artistic agency.
3. Despite accelerating ideation, AI compressed the preparatory stage, causing
"decision fatigue" during idea validation. Novices were overwhelmed by
abundant low-quality options and stuck to "satisficing choices" due to limited
domain knowledge.
The study identifies seven different stages of co-creation in their "Human-AI
Co-Creation Stage Model": Problem identification, Preparation, Idea generation, Idea
selection and validation, Collaging, Refining and integration, and Outcome. In the
Collaging stage, participants manually merged fragmented AI outputs into cohesive
works, a process that lacks controllability and was described by the participants as
time consuming.
To evaluate the study, they combined artifact analysis, semi-structured
interviews, and embedded ethnography. Over a 10-week undergraduate capstone course,
researchers systematically analyzed teams' weekly process logs (e.g., ideation
iterations, role assignments), creative outputs (draft lyrics, Spotify-published EPs), and
collaborative platforms (Figma/Miro boards). In-depth interviews with 9
participants captured insights on workflow dynamics, perceived AI roles, and ownership
challenges; song playback was used to stimulate reflection. All data were analyzed
by thematic analysis, first coding emergent themes and then mapping the findings
to the stage model, revealing critical deviations from it such as a compressed or
nonexistent preparation phase. This design captured behavioral dimensions (e.g., time
spent editing AI outputs) and perceptual dimensions (e.g., self-efficacy shifts).
To the best of our knowledge, these are the most notable and recent studies on how
AI can affect collaborative musical composition practices. This thesis is an invitation
to explore this gap, which could be called "Humans-AI Collaboration" or "AI Supported
Collaborative Work (AISCW)". By joining the findings of both Human-Human and
AI-Human music composition, we can answer some of the identified limitations of
AI models (lack of control and creativity) as well as evolve the field of collaborative
music composition by utilizing AI as an assistant for collaborative processes.
Collaborative tools lack mechanisms to resolve the "clash of ideas" problem
identified in [12], where participants struggle to reconcile divergent concepts without
mediation. AI could act as a decentralized mediator through two key strategies:
Idea Bridging: A Transformer model trained on past and future context
(inpainting) samples could generate transitional material between conflicting contributions,
Chapter 3. Methods
Figure 2: Web Architecture
Figure 3: Web GUI
    tk.getPageCount() // Handles pagination
    tk.getTimeForElement() // Measures timing
    tk.renderToMIDI() // MIDI generation
Measures were rendered as interactive SVG elements. Click handling leveraged
Verovio's xml:id to track selections (e.g., selectedMeasureIds). The start and end times
of the selected measures to re-generate were extracted using Verovio's time-mapping
(tk.renderToTimemap()).
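As an illustration of that last step, the sketch below derives a start and end time in seconds from a time map and a set of selected measure ids. The entry layout (a "tstamp" in milliseconds and a "measureOn" id on entries that open a measure) is an assumption made for this sketch, as are the function name and the explicit score_end_ms argument; the thesis does not show its own version of this logic.

```python
def selection_time_range(timemap, selected_ids, score_end_ms):
    """Return (start_s, end_s) spanned by the selected measures."""
    # Keep only the entries that mark the start of a measure, in time order.
    starts = sorted(
        (entry["tstamp"], entry["measureOn"])
        for entry in timemap
        if "measureOn" in entry
    )
    begin, end = None, None
    for index, (tstamp, measure_id) in enumerate(starts):
        if measure_id not in selected_ids:
            continue
        # A measure ends where the next one starts, or at the end of the score.
        if index + 1 < len(starts):
            next_start = starts[index + 1][0]
        else:
            next_start = score_end_ms
        begin = tstamp if begin is None else min(begin, tstamp)
        end = next_start if end is None else max(end, next_start)
    if begin is None:
        raise ValueError("no selected measure found in the time map")
    return begin / 1000.0, end / 1000.0


# Hypothetical three-measure score, two seconds per measure.
example_timemap = [
    {"tstamp": 0, "measureOn": "m1"},
    {"tstamp": 500},
    {"tstamp": 2000, "measureOn": "m2"},
    {"tstamp": 4000, "measureOn": "m3"},
]
start_s, end_s = selection_time_range(example_timemap, {"m2", "m3"}, 6000)
```

The resulting pair plays the role of the start_time and end_time values that are later sent to the generation API.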
Editing Workflow
MEI DOM manipulation was done by directly modifying the MEI XML using
DOMParser to insert/delete notes/rests based on user/MIDI input:
    const meiDoc = parser.parseFromString(originalMEI, "text/xml");
    const noteElement = meiDoc.createElementNS(MEI_NS, 'note');
    layerElement.appendChild(noteElement);
Verovio validated edits against musical rules (e.g., measure duration checks using
time signature data from scoreDef) so users could not insert durations over the
maximum duration of each measure.
The workflow is as follows:
1. User toggles edit mode with a button
2. User selects a measure to input new notes
3. User selects a note duration with the computer keys (1 for whole note, 2 for
half note, 3 for quarter note, etc.)
4. User plays a note on their MIDI keyboard (received through the Web MIDI
API)
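The duration keys in step 3 and the measure-capacity check described earlier can be sketched together as follows. Keys 1 to 3 follow the text; keys 4 and 5 (eighth and sixteenth notes) are an assumed continuation of the "etc.", and all names are hypothetical. Exact fractions avoid rounding problems when summing note values.

```python
from fractions import Fraction

# Key -> MEI-style duration value (1 = whole, 2 = half, 4 = quarter, ...).
# Keys "4" and "5" are assumptions extending the pattern given in the text.
KEY_TO_DUR = {"1": 1, "2": 2, "3": 4, "4": 8, "5": 16}


def measure_capacity(beats, beat_unit):
    """Capacity of one measure in whole notes, e.g. 3/4 time -> 3/4."""
    return Fraction(beats, beat_unit)


def fits_in_measure(existing_durs, new_dur, beats, beat_unit):
    """True if adding a note of value new_dur does not overflow the measure."""
    used = sum((Fraction(1, d) for d in existing_durs), Fraction(0))
    return used + Fraction(1, new_dur) <= measure_capacity(beats, beat_unit)


def handle_key(key, existing_durs, beats=4, beat_unit=4):
    """Return the duration to insert, or None if the key is unknown
    or the note would exceed the measure's maximum duration."""
    dur = KEY_TO_DUR.get(key)
    if dur is None or not fits_in_measure(existing_durs, dur, beats, beat_unit):
        return None
    return dur
```

For example, in 4/4 a measure that already holds three quarter notes accepts one more quarter note (key 3) but rejects a half note (key 2).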
Playback System
MIDI was generated via tk.renderToMIDI() and synchronized with Tone.js for audio
playback. Visual feedback during playback used Verovio's tk.getElementsAtTime()
to highlight active notes. Verovio-generated MIDI was decoded and scheduled
using Tone.Part, with timing derived from Verovio's temporal data (playbackOffset).
Users can play or stop playback of a selection of measures or the whole score.
Collaboration (PeerJS)
As stated in the official documentation, "PeerJS wraps the browser's WebRTC
implementation to provide a complete, configurable, and easy-to-use peer-to-peer
connection API. A peer can have a data or media stream connection with a remote
peer." Verovio's MEI state was serialized and synced in real-time between peers.
Full-state broadcasts included MEI data and edit history. In our application, User A
connects to User B via a Peer ID; full-state synchronization is then active and both users
see real-time updates of:
• Score changes
• AI proposals (red highlight of the range generated)
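A full-state broadcast of this kind can be sketched as a serialized snapshot that the receiving peer adopts only when it is newer than its own state. The message fields and the integer version counter are assumptions made for this illustration; the thesis only states that MEI data and edit history are included.

```python
import json


def make_state_message(mei_xml, history, version):
    """Serialize a full snapshot of the score for sending to the other peer."""
    return json.dumps(
        {"type": "full_state", "version": version, "mei": mei_xml, "history": history}
    )


def apply_state_message(local_state, raw_message):
    """Return the state to use after receiving raw_message.

    The incoming snapshot replaces the local one only if its (assumed)
    version counter is higher; otherwise the local state is kept unchanged.
    """
    message = json.loads(raw_message)
    if message.get("type") != "full_state":
        return local_state
    if message["version"] <= local_state["version"]:
        return local_state  # stale or duplicate snapshot
    return {
        "version": message["version"],
        "mei": message["mei"],
        "history": message["history"],
    }
```

Sending the whole state keeps the protocol simple at the cost of bandwidth, which is a reasonable trade-off for two peers editing a short score.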
AI Proposal System
AI-generated MEI fragments (received from the Flask backend) were integrated
using Verovio's measure-replacement functions (replaceMeasureInMEI()).
The AI generation workflow is as follows:
1. User selects a group of measures
2. Clicks on harmonize, inpaint or change melody (harmonize will not change the
soprano, change melody will not change alto, baritone and bass, and inpainting
will change all)
3. A "Generating proposal 1/2..." text is displayed
4. Once the first proposal is generated it is displayed in the score, the notes are
updated and the users can listen to it
5. A "Generating proposal 2/2..." text is displayed; once the second proposal
is generated, a few buttons appear with which the users can independently
accept, reject and listen to proposals
6. Once one user accepts (or rejects) a proposal, the score is updated for both
users
The relationship with the API is as follows:
1. The web sends the MIDI of the selected region, the start_time and end_time
of generation and the top_p (nucleus sampling parameter covered in the API
section)
2. The API returns a new MusicXML file that gets rendered
Challenges and Solutions
Table 5 shows the different challenges faced during the development
of the environment and their solutions.
Area | Issue | Solution
SVG Re-rendering | Loss of user state (e.g., selected measures) during SVG regeneration | Persist measure IDs in selectedMeasureIds and re-apply CSS classes (.highlighted) after rendering
Real-Time Collaboration | Edit conflicts during concurrent operations | Operational transforms using versioned history stacks (historyStack, redoStack) with MEI diffing before broadcast
Performance Optimization | Latency from full-score re-rendering on edits | Selective SVG updates using measure-level targeting and Verovio's partial layout methods
Table 5: Technical Challenges and Solutions in Web Implementation
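The history stacks named in Table 5 can be illustrated with a small undo/redo structure. This is only a sketch of the general pattern: the class and method names are hypothetical, and a plain equality check stands in for the MEI diffing mentioned in the table.

```python
class EditHistory:
    """Undo/redo over successive MEI snapshots (sketch, not the thesis code)."""

    def __init__(self, initial_mei):
        self.current = initial_mei
        self.history_stack = []  # older states, most recent last
        self.redo_stack = []     # states undone and available for redo

    def apply(self, new_mei):
        """Record an edit. Returns True if something changed and
        therefore needs to be broadcast to the other peer."""
        if new_mei == self.current:  # crude stand-in for an MEI diff
            return False
        self.history_stack.append(self.current)
        self.current = new_mei
        self.redo_stack.clear()  # a fresh edit invalidates the redo branch
        return True

    def undo(self):
        if not self.history_stack:
            return False
        self.redo_stack.append(self.current)
        self.current = self.history_stack.pop()
        return True

    def redo(self):
        if not self.redo_stack:
            return False
        self.history_stack.append(self.current)
        self.current = self.redo_stack.pop()
        return True
```

Skipping the broadcast when nothing changed avoids echoing identical states back and forth between peers.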
3.2.2 API
Symbolic Generation
The first iteration of our AI symbolic generation used LLMs (Llama and GPT4o)
in different environments (Autogen and CrewAI) to generate abc notation displayed
on a web page using abcjs. The quality of outputs varied considerably and the length
of the generations was not consistent enough (even using pydantic models). It also
lacked future context, which is why we moved to an inpainting model recommended
by the researcher Valerio Velardo, the Anticipatory Transformer.
Figure 4: API Architecture
Anticipatory Transformers: The Anticipatory Transformer [27] leverages a decoder-
only transformer with a context length of 1,024 tokens. We used the tokenization
with the best results shown in the original paper: the arrival-time encoding, which
represents events as (time, duration, note) triplets (vocab size: 27,512).
This arrival-time encoding (3 tokens/event) enables context-free reordering for
better sequence modeling.
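A minimal sketch of such a triplet encoding is given below. The way the vocabulary is split here (10,000 time values, 1,000 duration values at 10 ms resolution, and 129 instruments times 128 pitches for the note token) is an assumption chosen so that the total matches the vocabulary size quoted above; the function names are hypothetical.

```python
TIME_RESOLUTION = 100    # ticks per second, i.e. 10 ms per tick
MAX_TIME_TICKS = 10_000  # assumed: 100 s of distinct time values
MAX_DUR_TICKS = 1_000    # 10 s of distinct duration values
NUM_INSTRUMENTS = 129    # assumed: 128 programs plus one drum slot
NUM_PITCHES = 128

TIME_OFFSET = 0
DUR_OFFSET = TIME_OFFSET + MAX_TIME_TICKS
NOTE_OFFSET = DUR_OFFSET + MAX_DUR_TICKS
VOCAB_SIZE = NOTE_OFFSET + NUM_INSTRUMENTS * NUM_PITCHES  # 27,512


def encode_event(onset_s, duration_s, instrument, pitch):
    """Encode one note as a (time, duration, note) token triplet."""
    time_tick = round(onset_s * TIME_RESOLUTION)
    dur_tick = min(round(duration_s * TIME_RESOLUTION), MAX_DUR_TICKS - 1)
    if not 0 <= time_tick < MAX_TIME_TICKS:
        raise ValueError("onset outside the representable time range")
    if not (0 <= instrument < NUM_INSTRUMENTS and 0 <= pitch < NUM_PITCHES):
        raise ValueError("instrument or pitch out of range")
    return [
        TIME_OFFSET + time_tick,
        DUR_OFFSET + dur_tick,
        NOTE_OFFSET + instrument * NUM_PITCHES + pitch,
    ]


def decode_event(tokens):
    """Invert encode_event: returns (onset_s, duration_s, instrument, pitch)."""
    time_token, dur_token, note_token = tokens
    instrument, pitch = divmod(note_token - NOTE_OFFSET, NUM_PITCHES)
    return (
        (time_token - TIME_OFFSET) / TIME_RESOLUTION,
        (dur_token - DUR_OFFSET) / TIME_RESOLUTION,
        instrument,
        pitch,
    )
```

Because every triplet carries its own absolute onset time, events can be reordered without changing their meaning, which is the property the text refers to as context-free reordering.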
The model's key innovation is its anticipation mechanism, which interleaves:
• Events: Note onsets/offsets
• Controls: User-defined constraints (e.g., future melody)
During training, controls appear δ seconds ahead of events (δ = 5 s in the
implementation), enabling the model to learn dependencies between current events and
future constraints. The model was trained on the Lakh MIDI dataset (178,561 files)
with time quantized to 10 ms resolution, pitch (0–127), and duration (0–10 s). Three
model sizes were evaluated: Small (128M parameters), Medium (360M), and Large
(780M). During inference, nucleus sampling (top-p) is used for generation, sampling
from tokens where cumulative probability ≤ p (default: p = 0.95).
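The interleaving idea can be sketched in a few lines: each control is moved a fixed interval earlier in the sequence than the moment it describes, so the model sees a constraint before it generates the events around it. This is a simplified illustration that sorts by shifted time; it is not the exact procedure of the original paper, and the names and the tie-breaking rule are assumptions.

```python
DELTA_S = 5.0  # anticipation interval in seconds, as stated in the text


def interleave(events, controls, delta=DELTA_S):
    """Merge events and controls into one sequence.

    events and controls are lists of (time_s, label). A control at time t
    is placed where an event at time t - delta would go; on ties the event
    comes first (an arbitrary choice made for this sketch).
    """
    tagged = [(t, 0, "event", label) for t, label in events]
    tagged += [(t - delta, 1, "control", label) for t, label in controls]
    tagged.sort(key=lambda item: (item[0], item[1]))
    return [(kind, label) for _, _, kind, label in tagged]


sequence = interleave(
    events=[(0.0, "e0"), (2.0, "e2"), (6.0, "e6")],
    controls=[(7.0, "c7"), (12.0, "c12")],
)
```

In the example, the control that applies at 7 s already appears right after the event at 2 s, five seconds before it takes effect.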
Why use Anticipatory Transformers:
Anticipatory transformers offer robust pre-trained models with high quality outputs.
You can select the exact range to generate and the conditioning tokens. The model also
exposes a nucleus sampling top_p hyperparameter that users can interact with.
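For readers unfamiliar with the top_p parameter, the following is a generic sketch of nucleus sampling over a small probability vector, using the common formulation in which the most probable tokens are kept until their cumulative probability reaches top_p. It is a standalone illustration with hypothetical names, not the sampling code of the model used in this work.

```python
import random


def nucleus_tokens(probs, top_p=0.95):
    """Return the token ids kept by nucleus (top-p) filtering:
    the most probable tokens whose cumulative probability first reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= top_p:
            break
    return kept


def nucleus_sample(probs, top_p=0.95, rng=None):
    """Draw one token id from the nucleus, weighted by its probability."""
    rng = rng or random.Random()
    kept = nucleus_tokens(probs, top_p)
    weights = [probs[token] for token in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


distribution = [0.5, 0.3, 0.15, 0.05]
kept_tokens = nucleus_tokens(distribution, top_p=0.9)
choice = nucleus_sample(distribution, top_p=0.9, rng=random.Random(0))
```

Lower values of top_p restrict sampling to the most likely continuations (more predictable proposals), while values close to 1 admit rarer tokens (more varied proposals).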
Implementation Enhancements: Two critical improvements were made to the
original implementation:
• Compound Instrument Tokens: The initial code merged multi-track MIDIs
(e.g., two piano tracks) into a single track. We introduced compound tokens
combining (program, channel) using bit-shifting:
    compound_instr = (instr << 4) | channel
This preserves instrument identity during tokenization and enables four-track
separation during generation (e.g., piano, drums, bass, strings).
• Metadata Recovery: Time signatures and tempo data (previously
discarded) are now extracted during preprocessing and reinjected into generated
MIDI files. This maintains rhythmic/metrical integrity:
    meta_track.append(mido.MetaMessage('set_tempo', tempo=original_tempo))
    meta_track.append(mido.MetaMessage('time_signature', ...))
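The bit-shifting expression above packs a program number and a 4-bit MIDI channel into one integer, and the two values can be recovered with the inverse shift and mask. A small sketch, with hypothetical function names and range checks added for illustration:

```python
def pack_instrument(program, channel):
    """Combine a MIDI program (0-127) and channel (0-15) into one id."""
    if not 0 <= program <= 127:
        raise ValueError("program must be in 0..127")
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    return (program << 4) | channel


def unpack_instrument(compound):
    """Recover (program, channel) from a compound id."""
    return compound >> 4, compound & 0x0F


# Two piano tracks (program 0) on different channels stay distinguishable.
first_piano = pack_instrument(0, 0)
second_piano = pack_instrument(0, 1)
```

Four bits are enough for the channel because MIDI defines 16 channels, so the shift by 4 never lets the two fields overlap.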
Key utilities include:
• midi_to_compound(): Now preserves multi-track structure via compound
tokens
• events_to_midi(): Reconstructs MIDI with metadata recovery
• Velocity/note validation: Ensures MIDI compliance (velocity ∈ [0,127], note ∈
[0,127])
The compound token fix turns the transformer from a single-track to a multi-track
model, demonstrating that architectural adjustments can unlock new creative
applications without retraining.
Music Transformation Agents: Three agents were coded to take advantage of
the transformer for user-driven transformations. All agents follow a unified workflow:
1. Preprocessing:
• Convert input MIDI → tokenized events via midi_to_events()
• Clip events to the user-selected time range [start_time, end_time]
2. Generation:
• Conditioned on history (events before start_time) and controls
(constraints)
• Generate tokens via nucleus sampling (top_p)
3. Postprocessing:
• Merge generated/anticipated events → MIDI via events_to_midi()
• Inject recovered metadata and clamp invalid values (e.g., MIDI pitches
∈ [0,127])
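The clipping and clamping steps of the workflow above can be sketched in plain Python. The Event tuple, the half-open interval convention, and the helper names are assumptions made for illustration; the thesis code operates on the tokenized events produced by midi_to_events().

```python
from typing import NamedTuple

class Event(NamedTuple):
    time: float    # onset in seconds (hypothetical unit)
    pitch: int     # MIDI note number
    velocity: int  # MIDI velocity

def split_by_range(events, start_time, end_time):
    """Split events into history, selected region, and future context."""
    history = [e for e in events if e.time < start_time]
    region = [e for e in events if start_time <= e.time < end_time]
    future = [e for e in events if e.time >= end_time]
    return history, region, future

def clamp_midi(value):
    """Clamp a value into the valid MIDI data range [0, 127]."""
    return max(0, min(127, value))

def postprocess(events):
    """Clamp pitch and velocity, then sort events by onset time."""
    cleaned = [
        e._replace(pitch=clamp_midi(e.pitch), velocity=clamp_midi(e.velocity))
        for e in events
    ]
    return sorted(cleaned, key=lambda e: e.time)
```

In this sketch the history and future lists are what a generation step would be conditioned on, while the region is the part to be regenerated.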
Harmonization Agent: Generates accompaniment conditioned on a melody. Given
input MIDI:
1. Extract melody (instrument 0) as controls
2. Generate accompaniment tokens:
accompaniment = generate(model, controls=melody)
3. Combine new accompaniment with original melody
Infilling Agent: Inpaints missing segments using future context.
1. Extract events after end_time as anticipated controls
2. Generate infill tokens conditioned on future:
infilling = generate(model, controls=anticipated)
3. Merge infill with original events before start_time and after end_time
Melody-Changing Agent: Regenerates a melody based on (and preserving)
accompaniment:
1. Treat the accompaniment as controls
2. Generate new melody conditioned on accompaniment:
new_melody = generate(model, controls=accompaniment)
3. Combine new melody with original accompaniment
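The three agents differ mainly in which events they pass as controls. A minimal sketch of that selection step is shown below, under stated assumptions: events are hypothetical (time, instrument, pitch) tuples, the melody is instrument 0 as in the Harmonization Agent, and the accompaniment is taken to be every other instrument. The function select_controls is illustrative and not part of the thesis code.

```python
def select_controls(agent, events, end_time=0.0, melody_instrument=0):
    """Pick the conditioning events ("controls") for each agent.

    events: list of (time, instrument, pitch) tuples (hypothetical layout).
    """
    if agent == "harmonize":
        # The melody conditions the generated accompaniment.
        return [e for e in events if e[1] == melody_instrument]
    if agent == "infill":
        # Events after the gap act as anticipated (future) context.
        return [e for e in events if e[0] >= end_time]
    if agent == "change_melody":
        # The accompaniment conditions the regenerated melody.
        return [e for e in events if e[1] != melody_instrument]
    raise ValueError(f"unknown agent: {agent}")
```

The selected controls would then be handed to the generation call, and the result merged with the untouched part of the score.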
API Workflow
The Flask API exposes three endpoints: /upload, /uploadinfill, and /uploadchangemelody.
Each endpoint:
• Accepts a MIDI file and time range (start_time, end_time)
• Routes to the corresponding agent (harmonizer, infiller, change_melody)
• Converts output MIDI → MusicXML for web rendering:
musicxml_str = midi_to_musicxml(temp_midi_path)
return jsonify({"musicxml": musicxml_str})
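The endpoint-to-agent routing can be sketched without a web framework. This is not the Flask code from the thesis: the agents below are placeholders, the mapping of /upload to the harmonizer is inferred from the order in which endpoints and agents are listed, and handle and to_musicxml are hypothetical names.

```python
import json

# Placeholder agents: each takes (midi_path, start_time, end_time) and returns
# the path of the transformed MIDI file. Real agents would call the model.
def harmonizer(midi_path, start_time, end_time):
    return midi_path

def infiller(midi_path, start_time, end_time):
    return midi_path

def change_melody(midi_path, start_time, end_time):
    return midi_path

# Assumed mapping between endpoints and agents.
ROUTES = {
    "/upload": harmonizer,
    "/uploadinfill": infiller,
    "/uploadchangemelody": change_melody,
}

def handle(path, midi_path, start_time, end_time, to_musicxml):
    """Dispatch a request to its agent and wrap the result as JSON text."""
    agent = ROUTES.get(path)
    if agent is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    if start_time >= end_time:
        return 400, json.dumps({"error": "start_time must precede end_time"})
    out_path = agent(midi_path, start_time, end_time)
    return 200, json.dumps({"musicxml": to_musicxml(out_path)})
```

Passing the MIDI-to-MusicXML converter in as an argument keeps the sketch independent of any particular conversion library.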
3.2.3 Deployment
We first tried deploying the web app and the API on the Skynote server, but there were
several space and memory issues. The API used the (resource-heavy) Transformers
and torch Python libraries. We tried fixing this by:
• Clearing pip caches
• Using the lightest anticipatory model
• Installing only the CPU version of the torch library
We solved this issue by:
• Deploying the web part using Netlify
• Deploying the API on Hugging Face Spaces as a Docker container.
The API is currently running on a 32 GB RAM server ($0.03 per hour) and
can be upgraded to a 15 GB RAM GPU server ($0.40 per hour).
The app can be accessed at https://inscoreai.netlify.app/
Generation now takes 7 seconds per score measure, and all the experiments
were conducted using this deployment.
3.3 Experiments
Three experiments were designed to evaluate the collaborative AI-assisted composition
app. Sixteen composers participated across eight 90-minute sessions,
conducted via Google Meet. The author was present during each entire session,
annotating feedback and issues; this represents approximately 24 hours of work time.
Participants were paired according to availability (coordinated via Doodle), and
feedback was collected through observation and a post-experiment survey (Google
Forms). Musical material (starting and ending segments of 5 measures each) was
distributed via WeTransfer. Participants without physical MIDI keyboards used
virtual alternatives (MidiKeys for macOS and vmpk/loopMIDI for Windows). Technical
problems arose when participants used VPNs, non-Chrome browsers, or mobile
4G network sharing, causing system instability.
3.3.1 Evaluation Survey
The post-experiment survey employed a 7-point Likert scale (1: strongly disagree;
7: strongly agree) to measure user experience, drawing metrics from established
frameworks (e.g., the Creative Expression metric: rate from 1 to 7 your agreement with
“I was able to express my creative goals in the composition made”). Core metrics for
experiments 1, 2 and 3 included:
• Musical Coherence and Control from [32]
• Creative Expression, Self-Efficacy, Engagement, Completeness, Uniqueness,
Ownership, Collaboration with the system, Comprehensibility and Effort from
[33]
The specific metrics from established frameworks for evaluating collaboration in
experiment 3 were:
• Collaboration, Easiness of idea sharing and Idea tracking from [16]
• Fluidity of collaboration, Mutual understanding and Consensus reaching from
[17]
3.3.2 Experiment 1. Individual composition without AI tools
Participants began by loading a starting 5-measure MusicXML file, adding four
empty measures, and adding a 5-measure ending MusicXML file. The task was to
manually fill the 4 empty measures via MIDI input without AI assistance. This
baseline experiment evaluated individual composition without AI.
Chapter 4. Results
Table 7: Raw p-values for Experimental Comparisons (Uncorrected), Q = 0.05

                           Exp1 vs Exp2        Exp2 vs Exp3
Metric                     p-value   Sig.      p-value   Sig.
Creative Expression        0.383               0.206
Self-Efficacy              0.007               0.774
Engagement                 0.003               0.751
Completeness               0.708               0.034
Uniqueness                 0.580               0.052
Ownership                  0.001               0.227
Collaboration w/ System    0.718               0.126
Musical Coherence          0.652               0.055
Control                    0.083               0.606
Comprehensibility          0.211               0.299
Effort                     0.001               0.027
Table 8: Inferential Statistics (Benjamini-Hochberg Corrected), Q = 0.05

                           Exp1 vs Exp2        Exp2 vs Exp3
Metric                     p-value   Sig.      p-value   Sig.
Creative Expression        0.602               0.378
Self-Efficacy              0.027               0.774
Engagement                 0.089               0.826
Completeness               0.779               0.186
Uniqueness                 0.798               0.190
Ownership                  0.006               0.357
Collaboration w/ System    0.718               0.278
Musical Coherence          0.797               0.153
Control                    0.181               0.740
Comprehensibility          0.386               0.412
Effort                     0.004               0.293
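
For readers unfamiliar with the Benjamini-Hochberg correction named in Table 8, the standard adjustment can be sketched as below. This is a generic illustration, not the analysis script used for the thesis; the function name is hypothetical, and the rounded values printed in Table 7 may not reproduce Table 8 exactly.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A metric is then reported as significant when its adjusted p-value falls below the chosen false discovery rate Q.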
Table 9: Collaboration Metrics in Exp 3 (N=16)

Metric                  Mean    Median
Collaboration           6.19    7
Fluidity                6.13    6.5
Idea Sharing            6.31    7
Idea Tracking           5.25    5
Mutual Understanding    6.19    6
Consensus               6.38    6

Chapter 5
Discussion
5.1 Discussion
Results confirm that AI integration distinctly reshapes creative processes across
individual and collaborative environments, directly addressing our core research
questions:
1. Measured metrics (Individual vs. Collaborative):
• Individual Use: AI drastically reduced effort (µ = 2.56 vs. µ = 4.94,
p = 0.004) and boosted self-efficacy (µ = 5.31 vs. µ = 4.69, p = 0.027),
corroborating studies on AI-steering tools [32],[33]. However, ownership
declined significantly (µ = 3.31 vs. µ = 5.62, p = 0.006), echoing fears of
"soulless" outputs and over-reliance [31],[35].
• Collaborative Use: Ownership remained low (µ = 3.88) and statistically
indistinguishable from individual AI use (p = 0.357), suggesting AI’s role
as a generator rather than co-author dilutes ownership regardless of
context. Yet, collaboration metrics (e.g., Consensus, Med = 6) exceeded
expectations, indicating AI mitigates creative bottlenecks without resolving
ownership trade-offs.
2. Resolving Interpersonal Friction:
High fluidity (µ = 6.13, Med = 6.5) and consensus scores (µ = 6.38, Med = 6) could
suggest the efficacy of AI as "social glue", though qualitative critiques noted that AI
predictability could constrain experimentation (P10, P11). Qualitative responses include
“We agreed fast and easy on which part to do” (P14). The author noted a varied
set of approaches from the participants: dividing the composition by voices or
by measures, using the AI to generate something while they were working on
another part, and re-harmonizing sections once they were finished.
3. Composer Requirements: Surveyed composers demanded real-time
collaboration (90% voted >= 5) and AI-aided technical tasks (e.g., harmonization:
65%), and "idea bridging" was mentioned in the qualitative responses. Our
implementation addressed these via anticipatory transformers for multi-track
inpainting and measure-level regeneration (features absent in the state-of-the-art
software analyzed in Table 1).
Our findings extend the "steerable AI" paradigm to collaborative settings, confirming
that AI enhances co-creation efficacy but complicates ownership. Unlike Cococo
[33], which elevated novice ownership via constrained AI roles, our professional
cohort perceived AI as a substitutive force, urging designs that preserve human agency.
5.1.1 Contributions
The contributions of this work are as follows:
Anticipatory Transformers can now:
• Generate multi-track MIDIs.
• Be used in a new manner (melody change, conditioned on accompaniment).
• Be visualized in a web interface.
• Be visualized collaboratively in real time.
A fully functional, full-stack web app for collaborative score editing with AI was designed
and deployed. Composers now have a new tool to resolve:
• Clash of ideas. (A problem identified in the state of the art of Human-Human
Collaboration)
• Lack of control with AI. (A problem identified in the state of the art of Human-
AI Collaboration)
Finally, insightful feedback from 20 professional composers on ideal features for
future collaborative applications was extracted. Also, long answers from the final
survey reveal some possible educational potential of collaborative apps.
5.1.2 Future steps
Web future steps
As mentioned above, the preliminary survey revealed several features that can be
implemented in the future (they were not implemented due to time constraints). These
features are: a commenting/annotation system, contextual music theory guidance,
version comparison analytics (or full version control) and contribution tracking
metrics. Some of these would help improve the Idea tracking
metric, which, as we mentioned, was the lowest. Other features we suggest are user
log-in and note-level generation inside the interface.
Ideally, a new open-source score editing library for the web should be developed
but, if this is not possible due to time and resources, the author suggests (for a
production-ready application) either contacting the Noteflight team about the use
of their notation editor or developing a plugin for MuseScore and keeping the web
for visualization, AI, and version control.
API future steps
Future steps in the API could focus on self-hosting the API on a powerful
server.
The anticipatory transformer could be fine-tuned to fit the desired genre of the
composers for richer details. One (not yet tested) way to integrate LLMs into
this application would be generating abc notation through natural language queries,
turning this notation into MIDI and conditioning the anticipatory transformer on this
MIDI.
The researcher Dimitry Bogdanov proposed using embeddings from both composers'
corpus of works (these could be imported in the app, though privacy issues should
be taken into account) so the inpainting would be more aligned with a mix of their
styles, resulting in a possible improvement of consensus.
5.1.3 Limitations
Ideally, this application should be tested on more participants (now N=16), which
was not possible due to the nature of the requirements set:
• Professional composers
• Proficient in score reading and editing
• Access to a MIDI keyboard (physical or digital)
This sample size is why some metric values should not be taken as a trend, and
generalization of the results cannot be taken for granted. The reason behind this was to
focus on a specific population and perform long and exhaustive experiments where
the author could be present the whole time, annotating feedback to extract
developed insights rather than short Yes/No answers. Thus, due to time constraints, this
could not be done with a higher number of participants.
Time constraints also led us to discard other quantitative analyses, such as analyzing
the MIDI files generated by the users (repetitiveness, complexity, etc.)
and then comparing AI and no-AI files in a focus group to see if there are perceptual
differences.
The limited computational resources of the API also led to using the small version
of the Anticipatory Transformer model in the experiments, limiting the quality of
the output (which could be higher with a larger model).
5.2 Conclusions
This thesis addressed a critical gap in the intersection of artificial intelligence and
collaborative creative practices by designing, implementing, and evaluating InScoreAI,
a web-based platform integrating anticipatory transformers for real-time, AI-assisted
collaborative music composition. Our work bridges the domains of Computer
Supported Collaborative Work (CSCW) and Human-AI co-creation, proposing a novel
paradigm: AI-Supported Collaborative Work (AISCW) for music. Through empirical
studies with professional composers, we address our core research questions and
show the potential (and challenges) of AI mediation in creative workflows.
AI significantly reduced compositional effort and enhanced self-efficacy, validating
AI’s role as a powerful technical assistant that accelerates workflow and overcomes
creative blocks. However, this comes at the cost of a significant reduction in
perceived ownership, raising concerns about over-reliance and “soulless” outputs among
professional composers.
AI serves as an effective "social glue", demonstrably mitigating interpersonal friction
during clashes of ideas. High median scores for Fluidity, Consensus, and Collaboration
confirm that AI-generated "idea bridges" facilitate mutual understanding and
efficient compromise. Although ownership remained low and comparable to individual
AI use, the collaborative context fostered a marginally higher sense of shared
ownership compared to solitary AI assistance, suggesting that collaboration partially
mitigates the reduced sense of ownership.
We successfully extended anticipatory transformers with multi-track capabilities and
embedded them within a collaborative web editor. Key technical contributions include
compound instrument tokens preserving multi-track structure during generation,
metadata recovery (tempo, time signatures) ensuring rhythmic/metrical integrity,
and three “steering” agents (Harmonization, Inpainting and Melody-Changing)
offering flexible, user-controllable AI interventions conditioned on melody,
accompaniment or future context. This provides a robust solution to the "lack of control"
problem prevalent in autonomous AI music generators.

InScoreAI effectively addresses the “lack of consensus” challenge identified in CSCW
music research by generating multiple transitional options (“idea bridges”) between
conflicting human contributions and allowing real-time, independent audition,
selection, and rejection of proposals, with AI acting as a neutral mediator. This transforms
creative disagreements into collaborative problem-solving exercises focused on
evaluating AI suggestions, thereby reducing interpersonal tension and accelerating progress
toward shared goals.
The system directly addresses the high-priority features identified by professional
composers: real-time collaboration (90% high interest), AI assistance for technical
tasks (Harmonization and Melodic Development, 65%), and the requested "idea
bridging" functionality. However, qualitative feedback from professional composers
raised concerns about AI potentially fostering creative laziness, simplifying musical
output, and diminishing the perceived uniqueness and experimental nature of
compositions. Qualitative insights highlight the significant educational value of
AI-assisted collaborative platforms, particularly for novice composers and students.
Benefits include overcoming initial technical hurdles (e.g., harmonization), visualizing
compositional possibilities, and motivating engagement by transforming exercises
into fuller musical pieces within a supportive collaborative environment.
This thesis demonstrates that anticipatory transformers, integrated as steerable
partners within a real-time collaborative environment, offer a powerful paradigm
(AISCW) for transforming music composition. InScoreAI successfully reduces effort,
enhances self-efficacy, and resolves interpersonal friction, making collaborative
composition more fluid and accessible. However, the core tension between AI’s
efficiency gains and the dilution of perceived ownership/creative identity remains a
fundamental challenge for the field. Future work must prioritize interface and AI
designs that actively preserve human agency, foster stylistic diversity, and enhance
transparency to address these concerns, particularly for professional users. The
deployment of InScoreAI (https://inscoreai.netlify.app/) and the planned
open-sourcing of its code provide a foundation for further exploration of AI’s evolving
role in human collaboration within music and other fields.
Bibliography
[1] Glover, R. & Redhead, L. Collaborative and Distributed Processes in
Contemporary Music-Making (Cambridge Scholars Publishing, 2020).
[2] Gresham-Lancaster, S. The aesthetics and history of the hub: The effects of
changing technology on network computer music. Leonardo Music Journal 8,
39–44 (1998).
[3] Barbosa, A. Computer-supported cooperative work for music applications. Ph.D.
thesis, Universitat Pompeu Fabra (2006).
[4] Blaine, T. & Perkis, T. The Jam-O-Drum interactive music system: A study in
interaction design. In Proceedings of the 3rd Conference on Designing Interactive
Systems: Processes, Practices, Methods, and Techniques, 165–173 (ACM,
New York City New York USA, 2000).
[5] Kaltenbrunner, M., Jorda, S., Geiger, G. & Alonso, M. The reacTable*: A
Collaborative Musical Instrument. In 15th IEEE International Workshops
on Enabling Technologies: Infrastructure for Collaborative Enterprises
(WETICE’06), 406–411 (IEEE, Manchester, UK, 2006).
[6] Laney, R. et al. Issues and techniques for collaborative music making on
multi-touch surfaces. In 7th Sound and Music Computing Conference (Barcelona,
2010).
[7] Xambó, A., Laney, R. & Dobbyn, C. TOUCHtr4ck: Democratic collaborative
music. In TEI ’11 Fifth International Conference on Tangible, Embedded and
Embodied Interaction (Funchal, Portugal, 2011).
[8] Jorda, S. & Wust, O. A system for collaborative music composition over the
web. In 12th International Workshop on Database and Expert Systems
Applications, 537–542 (2001).
[9] Men, L. Exploring Collaborative Music Making Experience in Shared Virtual
Environments. Thesis, Queen Mary University of London (2020).
[10] Vlachakis, G., Kalaentzis, A. & Akoumianakis, D. Collaborative music
composition as virtual work across boundaries. In 2014 International Conference on
Telecommunications and Multimedia (TEMU), 202–207 (2014).
[11] Tipei, S., Craig, A. B. & Rodriguez, P. F. Using High-Performance Computers
to Enable Collaborative and Interactive Composition with DISSCO. Multimodal
Technologies and Interaction 5, 24 (2021).
[12] Bryan-Kinns, N., Healey, P. G. T. & Leach, J. Exploring mutual engagement
in creative collaborations. In Proceedings of the 6th ACM SIGCHI Conference
on Creativity & Cognition, C&C ’07, 223–232 (Association for Computing
Machinery, New York, NY, USA, 2007).
[13] Bryan-Kinns, N. Mutual engagement and collocation with shared
representations. International Journal of Human-Computer Studies 71, 76–90 (2013).
[14] Hart, A. & Williams, A. A Space for Making: Collaborative composition as
social participation. Organised Sound 26, 240–254 (2021).
[15] Biasutti, M. Assessing a collaborative online environment for music
composition. Educational Technology & Society 18, 49–64 (2015).
[16] Cherry, E. & Latulipe, C. The creativity support index. 4009–4014 (2009).
[17] Burkhardt, J.-M. et al. An approach to assess the quality of collaboration in
technology-mediated design situations. In European Conference on Cognitive
Ergonomics: Designing beyond the Product - Understanding Activity and User
Experience in Ubiquitous Environments, ECCE ’09 (VTT Technical Research
Centre of Finland, FI-02044 VTT, FIN, 2009).
[18] MuseScore. Muselab plugin. https://musescore.org/en/project/
muselab-real-time-collaboration [Accessed: 31-3-25].
[19] Zhu, Y., Baca, J., Rekabdar, B. & Rawassizadeh, R. A Survey of AI Music
Generation Tools and Models (2023). 2308.12982.
[20] Wang, L. et al. A review of intelligent music generation systems. Neural
Computing and Applications 36, 6381–6401 (2024).
[21] Ji, S., Yang, X. & Luo, J. A Survey on Deep Learning for Symbolic Music
Generation: Representations, Algorithms, Evaluations, and Challenges. ACM
Comput. Surv. 56, 7:1–7:39 (2023).
[22] Ippolito, D., Huang, A., Hawthorne, C. & Eck, D. Infilling piano
performances. In NIPS Workshop on Machine Learning for Creativity and Design,
vol. 2 (2018).
[23] Pati, A., Lerch, A. & Hadjeres, G. Learning to traverse latent spaces for musical
score inpainting. CoRR abs/1907.01164 (2019). URL http://arxiv.org/
abs/1907.01164. 1907.01164.
[24] Chen, K., Wang, C.-i., Berg-Kirkpatrick, T. & Dubnov, S. Music sketchnet:
Controllable music generation via factorized representations of pitch and
rhythm. arXiv preprint arXiv:2008.01291 (2020).
[25] Chang, C.-J., Lee, C.-Y. & Yang, Y.-H. Variable-length music score infilling
via xlnet and musically specialized positional encoding. arXiv preprint
arXiv:2108.05064 (2021).
[26] Guo, R., Simpson, I., Kiefer, C., Magnusson, T. & Herremans, D. Musiac: An
extensible generative framework for music infilling applications with multi-level
control. In International Conference on Computational Intelligence in Music,
Sound, Art and Design (Part of EvoStar), 341–356 (Springer, 2022).
[27] Thickstun, J., Hall, D., Donahue, C. & Liang, P. Anticipatory Music
Transformer (2024). 2306.08620.