Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025),
Brussels, Belgium, September 10th-12th
PATIENCE X: Extended artistic expression in AI-assisted music composition
Lois Macdonald
School of Digital Arts (SODA)
Manchester Metropolitan University
[email protected]
Abstract
PATIENCE X is a collection of seven audio compositions exploring the
integration of machine learning tools into compositional and performative
workflows for alt-pop. This practice-led research examines real-world AI-assisted
music composition. Through qualitative experimentation with selected AI tools
and original datasets of female vocals collected by the researcher, PRiSM
SampleRNN emerged as most artistically effective for generating unique samples
for selection and manipulation. The findings emphasise the need for a balance
between artistic control and algorithmic unpredictability, ensuring AI functions
as a creative catalyst rather than a tool merely for extensive replication.
Additionally, accessibility remains critical. For AI tools to be practically useful to
musicians, they must accommodate non-coding artists without compromising
usability and creative flow. Composed between 2022 and 2023, PATIENCE X
serves as a document of accessible systems during this period as well as
demonstrating their sonic aesthetics. Through the process of discovery the project
led to the formation of a new artist persona, ‘PATIENCE’, a genuine
representation of extended musical expression through AI use.
1 Introduction
This 10-minute performance presents an arrangement of compositions from PATIENCE X, a body
of work created using unique vocal samples generated largely via PRiSM SampleRNN. These
samples are derived from original datasets featuring female vocals, and the resulting tracks
embody an exploration of the authentic integration of AI systems into a compositional workflow
for electronic alt-pop music. Through this practice-led research a new artist persona,
‘PATIENCE’, came into existence, the personification of the human/machine vocalisations
generated throughout the project. As a musician with little prior knowledge of coding, I aim to
contribute to the discourse around the implementation of AI tools into creative music composition
workflows in terms of accessibility, authorship, and artistic control. Here, I outline the creative
process, contextual influences, and the methodological framework behind this work.
2 Context
The work relates to the theme of AIMC 2025 through exploration of original datasets that uncover
new sonic materials and drive extended creative expression in music composition (and related
artistic fields). The project also addresses the issue of accessibility of AI tools to musicians and
how these tools can be integrated authentically into existing practices, for both studio and live
workflows.
There is a growing body of research exploring the gap between technological innovation and its
practical adoption by artists. Sturm and Ben-Tal (2017) address this issue by inviting musicians to
engage critically with CharRNN and FolkRNN. Similarly, Ma, Sargen, De Roure, and Howard
(2024) examine the application of the PRiSM SampleRNN model by conservatoire-based artists
who possess a degree of prior familiarity with the technology. However, access to and
autonomous engagement with such tools remains limited for musicians operating outside of
specialist contexts. This performance offers an alternative perspective by reflecting on the
application of PRiSM SampleRNN from an artist situated within the alt-pop genre. As a creative
project built from raw vocal audio datasets processed through neural networks, PATIENCE X is in
dialogue with works such as Second Self (2019) by Dadabots, Reeps One and Bell Labs (2019),
PROTO (2019) by Holly Herndon, and Future Chorus (2023), curated by Eleni Ikoniadou.
3 Methodology
Taking an iterative approach, I initially experimented with 17 AI tools, each with various methods
of audio generation. These were: AIVA, Beatbot.ai, ChatGPT, Elf Tech (Grimes, 2023), Genny,
Holly+, Kits.ai, Lovo, Magenta Studio, MaxMSP (ml.star), PRiSM SampleRNN, RAVE, Soundful,
These Lyrics Do Not Exist, TTS Maker, Wekinator, and Word2Wave. Tools were assessed based
on three criteria: accessibility (technical skill and cost), creative potential, and their practical ability
to support and sustain flow (Csikszentmihalyi, 1990). I found MIDI and data-driven tools
introduced a mechanical or predictable quality to the outputs whilst producing extensive variations
with little usable content. In contrast, working with a more selective amount of raw audio clips
produced by neural networks was more productive. I was able to access new sonic materials that
introduced me to a timbral space that is both emotionally resonant and uncanny. This was
characterised by microtonal warbles, glitches, and vocal anomalies that evoke a primal quality. This
is where I was able to connect creatively with these tools.
PRiSM SampleRNN, Elf Tech, TTS Maker and ChatGPT were used for the final compositions.
Most significantly, PRiSM SampleRNN was used to create a library of 249 unique samples.
Five original datasets of 30 minutes were used to generate these samples.
Table 1: Datasets used
Dataset   Description
1         Electric guitar (written and performed by artist)
2         Synth improvisation (MicroKorg, Korg MS20 and Roland TB3)
3         Spoken storytelling (lead artist telling personal anecdotes in conversational tone)
4         "Vocal dérive" (improvised vocalisations by lead artist)
5         Mixed vocal set (10 minutes each from lead artist and two additional female vocalists performing original melodies)
Datasets 1 and 2 yielded limited musical interest as the outputs were similar to the original audio.
The datasets using female vocals (3, 4, 5) provided varied and anomalous results that initiated a
creative impulse, with outputs from dataset 5 the most musically engaging. Created from three
different female voices in a similar range, the audio files used were stylistically coherent enough to
generate fluid sounds, whilst small details such as accent, phrasing and articulation allowed the
model to form a more diverse output than with a single vocal. In total, over 100 minutes of audio
files were generated from these datasets. Musically interesting fragments of these sonic outputs
were then edited into short samples and categorised by theme (e.g. ‘Kick’, ‘Hits’, ‘Rhythm’,
‘Singing’) to create a unique sample library to compose with. The limitation of working with these
selected samples was essential to the creative development of the project.
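For readers who would rather script the cutting and filing of fragments than do it by hand, the following is a minimal sketch of one possible approach. It is not the process used in this project. It assumes the generated audio is stored as uncompressed WAV files, and the folder layout, file names and function names are hypothetical; only the category names come from the text above.

```python
import wave
from pathlib import Path


def extract_fragment(source: Path, library: Path, category: str,
                     name: str, start_s: float, end_s: float) -> Path:
    """Copy the audio between start_s and end_s (seconds) from a generated
    WAV file into library/<category>/<name>.wav and return the new path."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")

    # Hypothetical layout: one folder per category, e.g. library/Kick/
    target_dir = library / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}.wav"

    with wave.open(str(source), "rb") as src:
        rate = src.getframerate()
        first = int(start_s * rate)          # first frame to keep
        count = int(end_s * rate) - first    # number of frames to keep
        if first >= src.getnframes():
            raise ValueError("start_s is beyond the end of the file")
        src.setpos(first)
        # readframes returns fewer frames if the file ends early
        frames = src.readframes(count)
        with wave.open(str(target), "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
    return target


def list_library(library: Path) -> dict[str, list[str]]:
    """Map each category folder to the sorted sample names it contains."""
    return {
        folder.name: sorted(p.stem for p in folder.glob("*.wav"))
        for folder in sorted(library.iterdir())
        if folder.is_dir()
    }
```

A call such as `extract_fragment(Path("generated.wav"), Path("library"), "Kick", "kick_001", 1.0, 1.5)` would file a half-second fragment under the ‘Kick’ category. Choosing which fragments are musically interesting remains a listening decision that the script does not attempt to make.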
Audio 1: Example of audio generated from dataset 3 (Conversational)
Audio 2: Example of audio generated from dataset 4 (Vocal Dérive single vocal)
Audio 3: Example of audio generated from dataset 5 (Mixed vocal)
Audio 4: Example of percussive audio generated from dataset 5 (Mixed vocal)
In addition to generating samples using PRiSM SampleRNN, Elf Tech was used to create vocables
from nonsensical or percussive sounds. ChatGPT was used to write lyrics ‘collaboratively’, and
TTS Maker was used to voice these lyrics where needed.
Audio 5: Example of original sample from PRiSM SampleRNN
Audio 5.1: Example of original sample from PRiSM SampleRNN with distortion
Audio 5.2: Example of original sample from PRiSM SampleRNN with distortion processed with Elf Tech
Tracks were composed using Ableton Live or Logic Pro X. The resulting works are performed
using Ableton Live and a Roland TR8 hardware drum machine (loaded with the unique samples),
with female vocalists included to further characterise the sounds and persona of ‘PATIENCE’.
4 Conclusion
In adopting an open-ended, exploratory approach, I have engaged in practitioner-led processes that
allow a nuanced brokering of our relationship with emerging technologies. Sitting adjacent to
formalised academic research, this practice-based method plays a critical role in shaping the
design of future research agendas.
Through exploring the outputs of the stated systems, I was drawn toward the generation and
manipulation of human-like sounds. This manifested either through the expansion of my own
vocal characteristics by blending them with those of other performers, or through the
personification of percussive and melodic elements taken from generated clips. As a result, the
project’s title evolved into a new artistic persona, ‘PATIENCE’, embodying the uncanny, hybrid
quality of the voice-based samples. This alter ego has led to several new projects that extend past
the initial intentions of the research.
References
Grimes (2023) Elf Tech. Available at: https://elf.tech/connect [Accessed: 5 May 2023]
ChatGPT (2022) Available at: https://chatgpt.com/ [Accessed: 14 December 2022]
Csikszentmihalyi, M., 1988. The flow experience and its significance for human psychology. Optimal
experience: Psychological studies of flow in consciousness, 2, pp.15-35.
Herndon, H. (2019) Proto. Available at: Spotify [Accessed: 10 May 2022]
Ikoniadou, Eleni et al. (2023) Future Chorus. Digital album compilation. Available at: Spotify [Accessed: 10
October 2023]
Sturm, B. L. & Ben-Tal, O. (2017) “Taking the Models back to Music Practice: Evaluating Generative
Transcription Models built using Deep Learning”, Journal of Creative Music Systems 2(1).
doi: https://doi.org/10.5920/JCMS.2017.09
Ma, B., Sargen, E., De Roure, D. and Howard, E., 2024. Learning to Learn: A Reflexive Case Study of
PRiSM SampleRNN. AIMC 2024 (09/09-11/09)
Melen, C. (2020) PRiSM SampleRNN. Available at: https://www.rncm.ac.uk/research/research-centres-rncm/prism/prism-collaborations/prism-samplernn/ [Accessed: 4 November 2021]
Dadabots, Reeps One and Bell Labs (2019) Second Self. Available at:
https://www.youtube.com/watch?v=q981cTdL0Y&t=67s [Accessed: 11 March 2022]
TTS Maker (2023) Available at: https://ttsmaker.com/ [Accessed: 21 April 2023]