Composing with an AI-Generated Sound Corpus:
Reflections on My Computer’s Interpretation of Falling
Down
Stevie J. Sutanto
Faculty of Music
Universitas Pelita Harapan
Tangerang, Banten
[email protected]
Abstract
This paper reflects on the creation of My Computer’s Interpretation of Falling
Down, a composition developed using an AI-generated sound corpus. Rather than
using AI solely as a source of raw material, the work explores how generative
models can also shape compositional structure and relationships between sounds.
The process involved generating a corpus of motion-related sounds via text prompts
submitted to a text-to-audio model, then organizing and sequencing those sounds
through feature-based clustering. The result is a piece shaped through interaction
between language, system behavior, and listening. Motivated by curiosity
about human–machine collaboration, the work explores how this approach might
not only shape musical form but also reveal how a black-box generative model
interprets a constrained topic through its underlying biases, using language that is
meaningful to humans rather than adjusting abstract model parameters.
1 Introduction
My Computer’s Interpretation of Falling Down explores the sonic qualities of objects in
motion (falling, rolling, spinning, and sliding) through the lens of machine-generated sound. Rather
than sourcing sounds from real-world recordings or synthesis, the material was developed using an
AI-generated sound corpus constructed from a set of descriptive prompts. This approach examines
how generative audio models, particularly text-to-audio models, which are typically employed to
create single audio pieces, can be more fully integrated into the compositional process.
In this work, AI tools were tasked not only with producing sound material but also with contributing
to the shaping of their organization and formal development. This ties into broader discussions in
generative music research, which position AI systems not only as content generators but as potential
co-creative agents (Singh et al., 2024). That sense of discovery, led partly by the system’s own
logic, was central to how this piece took shape.
2 Working with a Text-to-audio Model
Text-to-audio models have quickly evolved, allowing artists to describe sound via language and
receive audio renderings. However, their creative use is often limited to isolated tasks, like generating
one-shot effects or filling sonic gaps. I aimed to push this further, viewing the AI-generated corpus
not just as sounds but as a vital component in the compositional framework.
This curiosity relates to a broader question about the potential of generative models: how might their
internal associations, unpredictable behavior, or imperfections be part of the compositional process?
How can a text-to-audio system, when utilized creatively, contribute not only to a piece’s timbral
surface but also to its structural logic?
Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025),
Brussels, Belgium, September 10th-12th
Recent research, such as that by Cherep et al. (2024), shows that AI-generated sound need not aim
for realism; it can offer abstract renderings that evoke ideas rather than reproduce them. Similarly,
Liu et al. (2025) highlight frameworks that position AI as a compositional collaborator, coordinating
diverse audio elements in structured ways. This project fits within these explorations while focusing
on a single, constrained world: objects in motion.
3 From Corpus to Composition
The generative process began with the use of a text-to-audio model accessed through the ElevenLabs
sound effects API [1]. In this project, I explored how such a commercial model could respond to a
more focused set of physical-motion scenarios, aiming to build a unified and thematically constrained
sound corpus.
To manage the diversity and specificity of the corpus, I created a simple Python script that
automatically generated descriptive prompts by combining variables such as material (e.g., wood, glass,
metal), object size (small, medium, large), type of motion (falling, sliding, bouncing, rolling), and
surface interaction (concrete, gravel, water, etc.). These phrases were then sent to the API, which
returned a range of short audio clips based on the textual input. This automated approach enabled
the model to respond with diverse sounds unified by a shared conceptual domain. Examples of the
generated prompts include:
• “A heavy wooden object sliding across a concrete floor”
• “A small glass ball spinning to a stop on metal”
• “Something rubbery bouncing quickly on gravel”
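
The prompt-building step can be sketched as follows. This is a minimal illustration, not the script used for the piece: the word lists come from the categories named above, while the template wording, the function name build_prompts, and its seed and limit parameters are assumptions made for the example. Sending the prompts to the API is left out.

```python
import itertools
import random

# Vocabulary drawn from the categories named in the text. The exact lists used
# for the piece are not given, so these are illustrative.
MATERIALS = ["wooden", "glass", "metal"]
SIZES = ["small", "medium", "large"]
MOTIONS = ["falling", "sliding", "bouncing", "rolling"]
SURFACES = ["concrete", "gravel", "water"]

# Hypothetical sentence template; the original prompts vary more in wording.
TEMPLATE = "A {size} {material} object {motion} on {surface}"


def build_prompts(seed=None, limit=None):
    """Return one prompt per combination of size, material, motion, and surface.

    If seed is given, the list is shuffled reproducibly; if limit is given,
    only the first `limit` prompts are returned.
    """
    combos = itertools.product(SIZES, MATERIALS, MOTIONS, SURFACES)
    prompts = [
        TEMPLATE.format(size=size, material=material, motion=motion, surface=surface)
        for size, material, motion, surface in combos
    ]
    if seed is not None:
        random.Random(seed).shuffle(prompts)
    return prompts if limit is None else prompts[:limit]


if __name__ == "__main__":
    for prompt in build_prompts(seed=0, limit=5):
        print(prompt)
```

Enumerating the full Cartesian product keeps the corpus evenly spread over the chosen variables; sampling a subset is one way to keep the number of API requests manageable.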
The generated audio was analyzed using perceptual and acoustic features, including spectral centroid,
brightness, duration, and envelope shape. Based on these features, I grouped the sounds into two
broad categories: impact gestures (short, transient events) and motion gestures (sustained sounds like
rolling or sliding).
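
As a rough illustration of this kind of analysis, the sketch below computes two of the named features (spectral centroid and an envelope-based duration) with NumPy and uses the duration to separate impact from motion gestures. The function names, the -40 dB floor, and the 0.5 second cut-off are assumptions chosen for the example; the text does not state the actual feature definitions or thresholds used.

```python
import numpy as np


def spectral_centroid(signal, sr):
    """Magnitude-weighted mean frequency (Hz) of the whole clip."""
    mags = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    total = mags.sum()
    return float((freqs * mags).sum() / total) if total > 0 else 0.0


def effective_duration(signal, sr, floor_db=-40.0):
    """Seconds between the first and last sample above floor_db relative to the peak."""
    env = np.abs(signal)
    peak = env.max()
    if peak == 0:
        return 0.0
    threshold = peak * 10 ** (floor_db / 20)
    active = np.flatnonzero(env > threshold)
    return float((active[-1] - active[0] + 1) / sr)


def classify_gesture(signal, sr, max_impact_seconds=0.5):
    """Label a clip 'impact' if it dies away quickly, otherwise 'motion'.

    The 0.5 s cut-off is an assumed value for illustration only.
    """
    if effective_duration(signal, sr) < max_impact_seconds:
        return "impact"
    return "motion"


if __name__ == "__main__":
    sr = 22050
    t = np.arange(2 * sr) / sr
    # Synthetic stand-ins for generated clips: a fast-decaying burst and a steady tone.
    burst = np.sin(2 * np.pi * 2000 * t) * np.exp(-40 * t)
    tone = np.sin(2 * np.pi * 300 * t)
    for name, clip in [("burst", burst), ("tone", tone)]:
        print(name, classify_gesture(clip, sr), round(spectral_centroid(clip, sr), 1))
```

In practice the grouping described above also drew on brightness and envelope shape, and on listening; a single duration threshold is only the simplest possible stand-in.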
To connect these categories, I used a KDTree structure to efficiently search for acoustic similarity.
For each impact sound, the system retrieved nearby motion gestures in the feature space: those with
comparable spectral and temporal characteristics. These pairings were not intended to be literal but
were designed to create perceptual continuity: a sense that one sound might logically follow another,
even if they originated from entirely different prompts.
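
A nearest-neighbour lookup of this kind can be sketched with SciPy's KDTree. This is an assumed implementation for illustration: the text names a KDTree but not the library, and the feature layout (one row per sound, columns for spectral centroid in Hz and duration in seconds), the standardization step, and the function name pair_impacts_with_motions are choices made for the example.

```python
import numpy as np
from scipy.spatial import KDTree


def pair_impacts_with_motions(impact_feats, motion_feats, k=2):
    """For each impact sound, return the indices of its k nearest motion sounds.

    Both inputs are arrays with one row per sound and one column per feature.
    The result has shape (number of impacts, k), nearest neighbour first.
    """
    impact_feats = np.asarray(impact_feats, dtype=float)
    motion_feats = np.asarray(motion_feats, dtype=float)
    k = min(k, len(motion_feats))  # cannot return more neighbours than exist

    # Standardize each column using both groups together, so that a feature
    # measured in Hz does not swamp one measured in seconds.
    stacked = np.vstack([impact_feats, motion_feats])
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0

    tree = KDTree((motion_feats - mean) / std)
    _, indices = tree.query((impact_feats - mean) / std, k=k)
    # query() drops the last axis when k == 1; restore a consistent 2-D shape.
    return np.asarray(indices).reshape(len(impact_feats), -1)


if __name__ == "__main__":
    # Made-up feature rows: [spectral centroid in Hz, duration in seconds].
    impacts = [[3000.0, 0.1], [500.0, 0.2]]
    motions = [[2800.0, 2.0], [600.0, 2.1], [1500.0, 2.5]]
    print(pair_impacts_with_motions(impacts, motions, k=2))
```

Returning several candidates per impact, rather than a single match, leaves room for the listening-based choice among the system's suggestions.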
The resulting sequences emerged through a balance of algorithmic association and listening-based
curation. I treated the system’s suggestions not as fixed solutions but as proposals: starting points
for exploring connections, questioning assumptions, and shaping the piece through attention to what
the material seemed to offer. In this way, the form of the composition was not imposed in advance,
but gradually discovered through interaction with the corpus and the behavior of the system itself.
This method resonates with corpus-based (Schwarz, 2007) and AI-assisted composition practice, but
with a key difference: the sound materials were not drawn from an archive of real-world recordings
or instrumental samples. This process involved constructing the corpus from the ground up through
interaction with a generative model, serving as both a dataset and a creative environment.
4 Challenges
Working with AI-generated sound corpora presented a number of challenges, some technical, others
more conceptual. One recurring difficulty was the unpredictability of the model’s output. While the
structure of the prompts provided a degree of control, the results were not always consistent or clearly
aligned with the intended motion. Some sounds arrived overly abstract, while others felt distant from
the physical behaviors I had in mind. At first, this unpredictability felt like a limitation, but over
time I began to view it as part of the material’s expressive range. Many of the textures that became
essential to the final piece emerged precisely from these unexpected responses.
There was also a question of authorship. While I designed and curated the system, many of the
specific sonic decisions (the choice of gestures, their order, their timing) were shaped by the model’s
interpretations and clustering. Rather than seeing this as a loss of control, I approached it as a kind of
co-composition, where the system’s proposals acted as a stimulus for creative response.
[1] https://elevenlabs.io/sound-effects
5 Reflection
This composition is a small step in exploring how AI-generated sound corpora might be used not
only as a source of material but also as a method of shaping musical form. By inviting a generative
system to take part in both sound production and structural organization, I hoped to test what kinds of
musical thinking might emerge.
The result is a piece shaped as much by listening as by designing: listening to the outputs of
the model, to the relationships between sounds, and to the way form began to take shape through
these interactions. Rather than aiming for a demonstration of technical novelty, the work leans into
the uncertainties of the process: the occasional mismatches, the unlikely pairings, the surprising
continuities that emerged through the system’s internal logic.
While still in an exploratory stage, I see this approach as a step toward more integrated uses of AI in
sound composition, where generative models contribute not only to what we hear but also to how we
imagine and shape the spaces between sounds. At the same time, this method offers a way to explore
how a black-box generative model interprets specific topics and reveals underlying biases through
language that is meaningful to humans, rather than by tuning abstract model parameters.
References
Cherep, M., Singh, N., and Shand, J. (2024). Creative text-to-audio generation via synthesizer
programming. arXiv preprint arXiv:2406.00294.
Liu, X., Zhu, Z., Liu, H., Yuan, Y., Huang, Q., Cui, M., Liang, J., Cao, Y., Kong, Q., Plumbley,
M. D., et al. (2025). WavJourney: Compositional audio creation with large language models. IEEE
Transactions on Audio, Speech and Language Processing.
Schwarz, D. (2007). Corpus-based concatenative synthesis. IEEE Signal Processing Magazine,
24(2):92–104.
Singh, N., Mishra, M., and Machover, T. (2024). AI for Musical Discovery. An MIT Exploration of
Generative AI. Publisher: MIT.