The Jam_bot, a Real-Time System for Collaborative Free Improvisation With Music Language Models

Author: Lancelot Blanchard; Perry Naseck; Stephen Brade; Kimaya Lecamwasam; Jordan Rudess; Cheng-Zhi Anna Huang; Joseph Paradiso
Publisher: Zenodo
DOI: 10.5281/zenodo.17706584
Source: https://zenodo.org/records/17706584/files/000088.pdf
THE JAM_BOT, A REAL-TIME SYSTEM FOR COLLABORATIVE
FREE IMPROVISATION WITH MUSIC LANGUAGE MODELS
Lancelot Blanchard¹,∗ Perry Naseck¹,∗ Stephen Brade²
Kimaya Lecamwasam¹ Jordan Rudess¹,³,♯ Cheng-Zhi Anna Huang²
Joseph Paradiso¹
¹ MIT Media Lab, Cambridge, MA, USA
² MIT Music Tech, Cambridge, MA, USA
³ Wizdom Music, New City, NY, USA
∗ Authors contributed equally
♯ Work performed as part of a Visiting Artist Residency
ABSTRACT
In order to design a Generative AI system that could improvise on stage with GRAMMY-winning keyboard virtuoso Jordan Rudess, we developed the “JAM_BOT”, a real-time performance system that could match his eclectic improvisational aesthetics. We debuted the JAM_BOT at a high-stakes sold-out concert to critical acclaim, realizing a series of virtuosic tightly-coupled Human-AI free improvisations in varying musical styles. Reflecting on our year-long collaboration, we summarize learnings for AI researchers and musicians on the adaptations needed to turn state-of-the-art symbolic music Language Models (LMs) into JAM_BOTS and the engineering required to make them performance-ready. We focus on three aspects: (1) enabling JAM_BOTS to take on different musical roles by adapting music LMs to employ different interaction strategies by modifying the context and conditioning signals; (2) describing how Rudess intentionally structures his improvisation in order to finetune JAM_BOTS to match the style needed for each piece; and (3) showing the optimizations needed to run music LMs in real-time and embed them in a low-latency multi-threaded system that listens, prompts, and schedules model generations seamlessly. We hope these insights enable more musician-AI symbiotic virtuosity.
1. INTRODUCTION
On September 21, 2024 at the MIT Media Lab, we put our Generative AI-powered JAM_BOT to the test together with Jordan Rudess in a high-stakes sold-out concert. Jordan Rudess is known for his versatility as an improviser and his eclectic musical performances which showcase his virtuosity in distinct genres. When planning for this performance, Rudess wanted to improvise with an AI system on stage in real-time instead of working with Generative AI off-stage in an offline fashion. Through this system, Rudess wanted to be able to “improvise with [himself]” and train models that could understand “the language and logic behind the way [he] improvised” to “push boundaries”, which would enable him to attain a new form of “symbiotic virtuosity” together with the system [1]. As such, we designed a Generative AI system that could improvise on stage with him through a range of musical pieces of varying styles. In this paper, we reflect on this year-long close collaboration and explore how Rudess’s unique aesthetic approaches to improvisation shaped the design of a novel real-time performance system, which we call the JAM_BOT.

© L. Blanchard, P. Naseck, S. Brade, K. Lecamwasam, J. Rudess, C.Z.A. Huang, and J. Paradiso. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: L. Blanchard, P. Naseck, S. Brade, K. Lecamwasam, J. Rudess, C.Z.A. Huang, and J. Paradiso, “The JAM_BOT, a Real-Time System for Collaborative Free Improvisation with Music Language Models”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
We situate our work in the context of free improvisation, where musicians perform without a predefined musical score or structure. With the integration of a JAM_BOT, how can the artist dynamically choreograph the piece? How will they and the JAM_BOT coordinate what and when to play? How can they anticipate what to expect from the JAM_BOT, and maintain sufficient agency and controllability to guide the improvisation, especially when everything must proceed smoothly during a high-stakes performance?
To tackle these challenges, we collaborated closely with Jordan Rudess to explore how autoregressive symbolic music Language Models (LMs) could be adapted into JAM_BOTS, enabling virtuosic, tightly-coupled Human-AI free improvisation. Although music LMs, from Music Transformer [2] to Anticipatory Music Transformers [3], can generate compelling musical sequences, their outputs are often generic, their inference speeds too slow, and their input, output, and conditioning structures too rigid. To support highly-entangled Human-AI improvisations where musical role-switching is frequent, mechanisms for coordination or signaling are essential. The JAM_BOT also needs to pay “double attention” to both what the human musician is playing and its own past, present, and future [4].
Our contributions are threefold: First, we develop three interaction strategies for the JAM_BOT, enabling the system to take on different musical roles in the improvisation and the musician to shape the musical outcome. Second, for JAM_BOTS to match the style and identity needed for each piece, we describe how Rudess intentionally structures his improvisations in order to finetune music LMs to enable the aforementioned interaction strategies and model stylistic gestures. Third, we show the optimizations needed to run music LMs in real-time and how to embed them in a low-latency multi-threaded system that listens, prompts, and schedules model generations seamlessly.
The combination of these three contributions culminated in the 2024 concert. In this debut performance, Jordan Rudess improvised alongside the JAM_BOT across multiple musical pieces, each with distinct genres and identities, demonstrating that the system is able not only to listen and respond in real-time in a style-specific fashion but also to appeal to and engage a wide audience. We also release our code publicly in the hope that JAM_BOTS can be used in other performances.*
2. RELATED WORK
2.1 Symbolic Music Generation and Offline Interactions
Algorithmic composition is an ongoing curiosity for musicians and scientists that focuses on adapting computational advancements for the purposes of music creation. Early work explored Markov chains and genetic algorithms [5–7], while more recent research has introduced compelling techniques based on LSTM networks [8, 9], and later, Transformers [2, 3]. Transformers, in particular, form the backbone of music LMs, which can generate music with local coherence and a compelling global arc. Offline interactive methods have emerged through unique adaptations of existing algorithms. For example, infilling techniques accommodate the nonlinear nature of human composition by training models to bridge gaps between existing musical arcs [10, 11]. Other strategies model multiple tracks of music [11–13] or condition symbolic generation on particular styles [14, 15]. Our work builds on these techniques by embedding state-of-the-art symbolic music generation into a real-time improvisation. We adapt the Anticipatory Music Transformer (AMT) [3] by developing performance-tested interaction strategies for controllable and compelling free improvisation.
2.2 Real-Time Musician-Machine Collaborations
Researchers have developed novel computational frameworks to enable real-time interactions between musicians and musical agents. Early systems established hard-coded links between musicians’ gestures and synthetic performers [16], enabling simultaneous conducting and performance of semi-autonomous rule-based systems like Voyager [17]. Later works such as Continuator [18] used Markov chains to enable real-time responses that mirror a musician’s improvisations and style. OMax Brothers [19] expands on this by using Markov models to create a modular real-time jam space for any number of musicians or synthetic performers. Markov models continue to appear in contemporary systems, including a recent live co-improvisation featured in WIRED.† In parallel, human-robot interaction research has embedded improvisational agents in anthropomorphic forms, lending interpretability and embodiment to co-improvisation [20–22], while fast audio models such as RAVE [23] can be leveraged for real-time timbre transfer in Max‡ and have been deployed in interactive dance performances [24]. Recent systems focused on symbolic music embed lightweight recurrent models to co-improvise in the style of Bach [25], or to map button presses to real-time piano output [26]. More recently, Transformers have been adapted for real-time use: ReaLChords and its companion, ReaLJam, perform adaptive live harmonization with strong robustness to unfamiliar melodic input, enabled by Reinforcement Learning [27, 28]. In this project, we specifically focus on symbolic music and contribute to this area by adapting a music LM to support three interactive paradigms for free musical dialogue with a virtuoso musician. We present our training pipeline for adapting AMT to these paradigms, and document our optimization of AMT for real-time performance.

* Our code and resources can be accessed at https://jam-bot-ismir-2025.media.mit.edu/.
3. DESIGNING & TRAINING THE JAM_BOT
During our iterative discussions with Jordan Rudess, we identified key design requirements for the JAM_BOT for optimal performability and to ensure Rudess’s comfort on stage.
3.1 Modeling Separate Musical Identities
“I wanted to dig into specific parts of my musical personality and really explore what makes each one tick. Each model is like a deep dive into a different side of how I play or think musically, and building them separately gave me the freedom to shape them with real intention and nuance.” (Jordan Rudess)

Jordan Rudess’s performances typically consist of a collection of musical scenes, each showcasing his virtuosity in distinct genres. For the JAM_BOT to accompany his improvisation effectively, the system must independently understand these varying styles. Our experiments revealed that, in order to act as a convincing improvisation partner, the JAM_BOT needed not only to display local coherence with recent musical input, ensuring continuity, but also a strong style adherence to the artist’s current improvisational style.
To ensure this stylistic adherence, we collect multiple small sets of training data that we use to train multiple models for use in the JAM_BOT system, each corresponding to the genre of a musical piece that Rudess can engage in. While state-of-the-art music LMs are very capable of generating long and coherent musical sequences, their large-scale training can sometimes be detrimental to their stylistic adherence and make their output too generic. For example, AMT is trained on the Lakh dataset, which has been demonstrated to be comprised of primarily electronic, pop, and classical music [29], biasing the style of the AMT’s generation to these genres. Previous work [30] suggests that, in creative contexts, overfitting on small datasets can be a powerful mechanism for enabling greater human influence over Generative AI systems.

† https://www.wired.com/story/generative-ai-music/
‡ https://github.com/acids-ircam/RAVE
With this in mind, we recorded Rudess during his practice sessions and collected 15-45 minute-long MIDI clips. We augmented these data by transposing them to all twelve keys and used them to fine-tune a pretrained AMT model (stanford-crfm/music-medium-800k, approx. 360M parameters). Models were trained for 2,000 steps, with overfitting usually occurring as early as 300 steps in, where validation loss would start to plateau. While we do not currently plan to release the dataset, further statistical analyses can be found on our website.
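
The transposition step can be illustrated with a minimal sketch. It is not taken from the released code: the `Note` type, the function names, and the choice of semitone offsets (-6 to +5, which covers twelve keys) are assumptions made for illustration, and the paper does not state which offsets were used.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; the real system works on MIDI/AMT events."""
    pitch: int       # MIDI pitch, 0..127
    start: float     # onset time in seconds
    duration: float  # length in seconds


def transpose_clip(notes, semitones):
    """Shift every note by `semitones`, dropping notes that leave the MIDI range."""
    shifted = []
    for note in notes:
        pitch = note.pitch + semitones
        if 0 <= pitch <= 127:
            shifted.append(replace(note, pitch=pitch))
    return shifted


def augment_all_keys(notes, low=-6, high=5):
    """Return one transposed copy per offset; -6..+5 is an assumed range."""
    return {offset: transpose_clip(notes, offset) for offset in range(low, high + 1)}


clip = [Note(60, 0.0, 0.5), Note(64, 0.5, 0.5), Note(67, 1.0, 1.0)]
augmented = augment_all_keys(clip)
```

Each of the twelve copies keeps the original timing, so only the key changes between training examples.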
3.2 Implementing Controllability
“I wanted the JAM_BOT to feel like a version of myself–like if I could put my musical brain into another player and see what it would be like to jam with me. As someone who improvises by ear, that was a really exciting idea to explore. [...] [The system could] improvise and perform live in a duet with me. Sometimes leading, sometimes following, the model and I could create new and unique music.” (Jordan Rudess)

A key requirement for the JAM_BOT was its controllability. During free improvisation, the absence of predetermined structure and planning can make using an unpredictable generative system daunting. As such, Rudess required a mechanism to coordinate musical roles and provide musical guidance on stage: one that could either lead or follow musical decisions, transition seamlessly between these roles, and accommodate harmonic, melodic, and rhythmic cues with varying degrees of rigidity.
4. DEVELOPING INTERACTION STRATEGIES
FOR THE JAM_BOT
Implementing these requirements, however, is challenging. In real-time settings, it is difficult to determine the optimal timing and context to simulate organic musical dialogue in autoregressive models. The inherent speed and complexity of Rudess’s music specifically makes it impossible to naively prompt our model continuously, since the prompting rate needed to generate coherent sequences would be too high. As such, we took inspiration from human improvisers and the balance between their own coherent musical performance and their continuous focus on the musical information from fellow players to carefully decide when and how to best prompt the music LM. Norgaard et al. describe this dual-process phenomenon in expert jazz musicians as a conscious focus on higher-level musical elements and ensemble interaction, alongside a subconscious process of generating note choices [4]. To model this phenomenon, we develop three interaction strategies that balance the model’s attention to external information and focus on its own composition (Fig. 1). Additionally, to ensure that the system displays strong style adherence, we craft precise training datasets for each interaction strategy.
4.1 At regular time intervals (1)
When & How to prompt The most straightforward approach to prompt the model is to do so at regular time intervals using the harmonic, melodic, and rhythmic content most recently played by the performer. Drawing from common practices in improvised music, this enables the human and the system to alternate (“trade”) in their improvisation, with each yielding to the other after a predetermined time interval (e.g., 4 bars, 2 bars, 1 bar). During the system’s turn, it focuses solely on its own output, while it listens to the human input when it is not playing.
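
As a rough illustration of trading at fixed intervals, the sketch below converts a bar count into seconds and decides whose turn it is at a given time. The fixed tempo, the 4/4 default, the convention that the human plays first, and both function names are assumptions for this example rather than details from the paper.

```python
def turn_length_seconds(bars, bpm, beats_per_bar=4):
    """Duration of one trading turn, assuming a constant tempo."""
    return bars * beats_per_bar * 60.0 / bpm


def whose_turn(elapsed_seconds, bars, bpm, beats_per_bar=4):
    """Alternate turns every `bars` bars; the human is assumed to start."""
    turn_index = int(elapsed_seconds // turn_length_seconds(bars, bpm, beats_per_bar))
    return "human" if turn_index % 2 == 0 else "jam_bot"


# Trading fours at 120 BPM: each turn lasts 8 seconds.
four_bar_turn = turn_length_seconds(bars=4, bpm=120)
```

In such a scheme the system would collect the performer's notes during a "human" turn and use them as the prompt when the turn flips.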
Training Data & Performance To fine-tune a music LM to enable this kind of interaction, we collect long, single-instrument MIDI clips. Through this fine-tuning, the model is able to learn how to continue any sequence of musical input in a specific style, making it a good fit for this type of prompting strategy. In the final performance, Rudess was able to use this mechanism to create two pieces with greatly contrasting styles: a progressive rock piece, and a contrapuntal baroque one, where he and the JAM_BOT were able to trade off each other. See Musical Examples 1¹ and 2² for a demonstration of the two pieces with their corresponding training data.
4.2 After every musical gesture (2)
When & How to prompt Another approach to modeling the dual process of consciousness is by attending to every single input as it is received. Using this approach, the system listens to each note played by the user, updating its context after a brief delay (e.g., 100 ms to accommodate chord input). This strategy can be used to prompt the model with either harmonic or melodic content. For melody conditioning, a longer delay (e.g., 800 ms) allows the system to wait for a sustained note before re-prompting the model. Using this strategy, the system can tightly follow the harmonic or melodic decisions of the performer before seamlessly transitioning to a leading role, until another input is received. To implement a stricter form of conditioning, we use the anticipatory mechanism of Anticipatory Music Transformers to repeatedly condition the model on the same prompt every x ms, for a given x. This locks the system into a following role, preventing it from making novel musical decisions.
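
The delay-based triggering described here resembles a debounce. The sketch below is one hypothetical way to express it: notes are grouped into a gesture, and the gesture is released only once the delay has elapsed since the last note. The class name, its methods, and the millisecond timestamps are illustrative assumptions, not the system's actual interface.

```python
class GestureDebouncer:
    """Group incoming notes into one gesture after `delay_ms` of silence."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self.pending = []          # pitches received since the last release
        self.last_note_ms = None   # timestamp of the most recent note

    def note_on(self, pitch, now_ms):
        """Record a note and restart the delay window."""
        self.pending.append(pitch)
        self.last_note_ms = now_ms

    def poll(self, now_ms):
        """Return the gesture once the delay has elapsed, otherwise None."""
        if self.pending and now_ms - self.last_note_ms >= self.delay_ms:
            gesture, self.pending = self.pending, []
            return gesture
        return None


# A chord whose notes arrive slightly apart is released as one gesture.
debouncer = GestureDebouncer(delay_ms=100)
for pitch, time_ms in [(60, 0), (64, 20), (67, 35)]:
    debouncer.note_on(pitch, time_ms)
too_early = debouncer.poll(100)
chord = debouncer.poll(135)
```

With a short delay the three notes are treated as a single chord prompt; a longer delay, such as the 800 ms mentioned for melody conditioning, would wait for a sustained note instead.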
Training Data & Performance Similarly to the previous case, we can enable this type of interaction by fine-tuning music LMs on long recorded MIDI clips. To model the joint distribution between the harmony/melody conditioning (conditioning signal) and the sequence to generate (input signal), we create MIDI files with two different instruments, represented by two different General MIDI codes. These two instruments are recorded either using different keyboard registers or through different MIDI channels. Empirically, we discovered that the best training data was collected when Rudess fixed a conditioning signal (e.g., a specific chord) for a few bars, and improvised multiple input signals over the same condition before transitioning to a new one. In the performance, this type of interaction was used for a rubato piece, where Rudess would play sequences of chords, dictating harmonic decisions to the JAM_BOT, which would freely improvise melodies on top. This is exemplified in Musical Example 3.³ Another example of this interaction strategy is shown in Musical Example 4,⁴ where Rudess can prompt the JAM_BOT to offer chord suggestions by playing melodies on top.

Figure 1. The three different input strategies used to prompt the music LMs: at regular time intervals, after every musical gesture, and on request. Interaction strategies (1) and (3) provide both harmonic and melodic information to the system, while interaction strategy (2) provides either. All provide rhythmic information. ∆ refers to the delay parameter set to trigger generation for (2), while k refers to the number of notes to fill the buffer for (3).

¹ https://jam-bot-ismir-2025.media.mit.edu/#musical-example-1
² https://jam-bot-ismir-2025.media.mit.edu/#musical-example-2
4.3 On request (3)
When & How to prompt Finally, we can employ a hybrid approach, which allows the model to alternate explicitly between focus on its own composition and the user input on request. Here, the system shifts its attention to the performer’s main conditioning input (conditioning signal 1) upon request, triggered by a specific input (conditioning signal 2, e.g., a specific register of the keyboard). Once activated, the system uses the most recent notes stored in a buffer of predetermined size to condition its future outputs. This approach, for example, can allow the performer to play melodic lines and prompt the system to refresh its output when a root note is played in a lower octave.
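
A minimal sketch of the on-request mechanism follows, assuming a fixed-size buffer of recent pitches and a pitch threshold that marks the trigger register. The class, the threshold value, and the shape of the returned prompt (root note plus recent notes) are hypothetical choices for illustration.

```python
from collections import deque


class OnRequestPrompter:
    """Keep the last `buffer_size` notes; a low-register note requests a prompt."""

    def __init__(self, buffer_size, trigger_max_pitch):
        self.buffer = deque(maxlen=buffer_size)     # conditioning signal 1
        self.trigger_max_pitch = trigger_max_pitch  # register for signal 2

    def note_on(self, pitch):
        """Return a prompt when the trigger register is hit, otherwise None."""
        if pitch <= self.trigger_max_pitch:
            # Assumed prompt layout: the triggering root plus buffered notes.
            return {"root": pitch, "recent": list(self.buffer)}
        self.buffer.append(pitch)
        return None


prompter = OnRequestPrompter(buffer_size=3, trigger_max_pitch=35)
silent = [prompter.note_on(p) for p in (60, 62, 64, 65)]
prompt = prompter.note_on(33)  # a note in the lowest octave triggers a prompt
```

Because the buffer has a fixed length, older melodic material falls out automatically and the prompt always reflects the most recent playing.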
Training Data & Performance This type of interaction is slightly more complex than the previous two. In order to fine-tune music LMs and enable this interaction, we must record two different MIDI clips, enabling the modeling of both conditioning signals. In the final performance, Jordan Rudess used this strategy for a piece where the JAM_BOT was able to improvise chords and basslines, while Rudess improvised melodies on top. When he wanted to focus the model’s attention on his most recent harmonic and melodic movements, he could hit a root note on the lowest octave of his keyboard to trigger the prompting of the underlying music LM. The training data for this interaction required Rudess to provide two MIDI clips: one with pairs of chords and basslines, and one with pairs of chords and root notes. On stage, Rudess performed with this interaction strategy to create a musical piece where he would improvise melodies over the JAM_BOT’s basslines and chord progressions. On cue, Rudess was able to provide a root note to guide the system’s harmonic generation. This is exemplified in Musical Example 5.⁵

³ https://jam-bot-ismir-2025.media.mit.edu/#musical-example-3
⁴ https://jam-bot-ismir-2025.media.mit.edu/#musical-example-4
We note that, for all input strategies, to additionally ensure local coherence with the musician’s input, we use particularly small context windows (between 40 and 60 notes) to prompt the music LM. This allows the JAM_BOT to generate sequences that connect directly to the performer’s input, instead of focusing on musical events that have happened in a more distant past.
5. OPTIMIZING MUSIC LMS FOR REAL-TIME
PERFORMANCE
With our interaction strategies designed and custom music LMs fine-tuned, we can now develop a real-time architecture that facilitates seamless integration with the performer. To achieve this, the system must execute the following tasks in parallel:
• Receive MIDI input from the performer, assign timestamps, and convert it into musical events.
• Based on the current interaction strategy, aggregate musical events and dynamically create prompts for the music LM.
• Perform autoregressive inference using the current prompt.
• Gather the music LM’s output and schedule musical events to be played as MIDI notes.

⁵ https://jam-bot-ismir-2025.media.mit.edu/#musical-example-5
These tasks must not only run in parallel but also synchronize with a global clock to minimize latency. While real-time audio processing typically addresses such challenges, applying these principles to complex Machine Learning algorithms is less common due to inherent latency issues. Additionally, the intricacy and tight coupling of the interaction strategies we introduce, compared to simple tasks like continuation, add complexity to the system design. In this section, we detail the system’s architecture and outline the optimizations implemented for each step to ensure efficient performance. We hope that our architecture can inspire future real-time AI-powered music systems.
5.1 Developing a Real-Time Environment
To build a real-time system, we utilize the JUCE framework, an open-source, cross-platform C++ framework popular for audio applications [31]. Beyond offering tools for graphical interfaces and MIDI device interaction, it provides numerous thread-safe algorithms and data structures. Based on the four tasks outlined earlier, we partition our system into four parallel threads, each communicating via thread-safe queues (Fig. 2), described below.
Our first thread is the Clock Thread which, similarly to Digital Audio Workstations (DAWs), manages continuous synchronization to a global time clock that is either internal, using the machine’s clock system, or external, receiving information from another system via MIDI Timecode (MTC). To prevent sudden time jumps when synchronizing with an external clock, we use a Proportional Control Loop to update our local time offset δ as follows:

\[
\delta_{\text{new}} =
\begin{cases}
(1 - G)\,\delta_{\text{prev}} + G \cdot \delta & \text{if } |\delta - \delta_{\text{prev}}| \le T \\
\delta & \text{otherwise}
\end{cases}
\]

where G is a gain parameter, δ = (t_local − t_received) is the new calculated time offset from the time information received via MTC, and T is a threshold parameter.
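
The update rule for the time offset can be written as a short function. This is a sketch of the formula only, with hypothetical argument names; the gain and threshold values in the usage lines are made up for illustration.

```python
def update_offset(prev_offset, measured_offset, gain, threshold):
    """Proportional update of the local time offset.

    Small deviations are smoothed toward the measured offset; deviations
    larger than `threshold` are adopted immediately.
    """
    if abs(measured_offset - prev_offset) <= threshold:
        return (1.0 - gain) * prev_offset + gain * measured_offset
    return measured_offset


# Illustrative values (seconds): a 10 ms drift is smoothed, a 1 s jump is not.
smoothed = update_offset(prev_offset=0.0, measured_offset=0.010, gain=0.5, threshold=0.050)
jumped = update_offset(prev_offset=0.0, measured_offset=1.0, gain=0.5, threshold=0.050)
```

A smaller gain gives a steadier clock at the cost of slower convergence to the external timecode.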
The Input Capture Thread handles MIDI input from the performer and passes it to the Processing Thread. If pass-through is enabled, it can also send MIDI input directly to the MIDI synthesizer. The Processing Thread timestamps the MIDI notes received by the Input Capture Thread using the Clock Thread. It then converts these MIDI notes into musical events by combining Note On and Note Off messages, and creates musical prompts according to the selected interaction strategy. These prompts are then passed to the Generation Thread. When generation needs to be reset (e.g., after 4 bars when prompting the model at regular time intervals), the Processing Thread issues a reset signal to the Generation Thread. The Generation Thread conducts iterative inferences on the music LM with a dynamically adaptive context, sending generated notes to the Processing Thread. If necessary, it can also send meta-signals, such as requests to invalidate previously sent notes. The Processing Thread collects the generated MIDI notes from the Generation Thread and schedules them for playback, where it sends them to a virtual or hardware MIDI output for synthesis.

Figure 2. The different parallel processes of the JAM_BOT system, with their interactions.
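
The conversion of Note On and Note Off messages into musical events can be sketched as follows. The message tuples and the `(start, pitch, duration)` event layout are simplifying assumptions; the actual system is written in C++ with JUCE and its data types are not reproduced here.

```python
def pair_notes(messages):
    """Combine Note On / Note Off messages into (start, pitch, duration) events.

    `messages` is an iterable of (time_seconds, kind, pitch) tuples where
    `kind` is "on" or "off". Unmatched Note Off messages are ignored, and
    notes still held at the end are left out.
    """
    open_notes = {}  # pitch -> onset time of the currently held note
    events = []
    for time, kind, pitch in messages:
        if kind == "on":
            open_notes[pitch] = time
        elif kind == "off" and pitch in open_notes:
            start = open_notes.pop(pitch)
            events.append((start, pitch, time - start))
    return sorted(events)


stream = [
    (0.0, "on", 60),
    (0.25, "on", 64),
    (0.5, "off", 60),
    (1.0, "off", 64),
    (1.5, "off", 72),  # stray Note Off with no matching Note On
]
events = pair_notes(stream)
```

Events in this form carry both onset and duration, which is the information a prompt builder needs regardless of the interaction strategy in use.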
5.2 Optimizing Deep Neural Networks
To achieve real-time performance, model inference must be highly optimized. Our base model (stanford-crfm/music-medium-800k) utilizes the transformers Python library, which is built on top of the PyTorch framework. Although the model logic can be exported to TorchScript for execution in C++ via the PyTorch C++ API, the generation performance remains suboptimal in this setting. To enhance efficiency, we convert our base model to the ONNX framework, enabling optimized execution across various hardware backends. By exporting the model to ONNX, we benefit from graph optimizations, operator fusion, and native CUDA support through ONNX Runtime. This transition allows our system to leverage hardware accelerations beyond what is natively available in PyTorch.
We also apply 8-bit weight quantization to the model using ONNX Runtime’s quantization toolkit. This drastically reduces memory footprint and computational cost while maintaining inference accuracy within an acceptable range. Quantization not only reduces the size of the model in memory but also speeds up matrix multiplications, which are the primary bottlenecks in Transformer-based architectures. Finally, we apply KV caching to optimize sequential inference. Without caching, each new token generation requires reprocessing the entire input sequence, leading to unnecessarily long computations. By storing and reusing attention key-value pairs from previous inference steps, we ensure that subsequent token generations operate with a reduced computational burden, enabling better real-time performance. These optimizations allow us to eventually run inferences on a consumer-grade NVIDIA GeForce RTX 4090 GPU.
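
To make the KV caching idea concrete, here is a toy single-head attention in pure Python. It is not the ONNX Runtime implementation and omits everything a real Transformer needs (multiple heads, layers, positional information); all names and the tiny weight matrices are assumptions for illustration. It only shows that caching keys and values gives the same result as recomputing them for the whole sequence.

```python
import math


def project(token, matrix):
    """Multiply a vector by a small weight matrix (list of rows)."""
    return [sum(t * w for t, w in zip(token, row)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(width)]


class CachedAttention:
    """Store keys and values so each step only projects the newest token."""

    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.keys, self.values = [], []

    def step(self, token):
        self.keys.append(project(token, self.wk))
        self.values.append(project(token, self.wv))
        return attend(project(token, self.wq), self.keys, self.values)


def attention_without_cache(tokens, wq, wk, wv):
    """Recompute every key and value from scratch for the last token."""
    keys = [project(t, wk) for t in tokens]
    values = [project(t, wv) for t in tokens]
    return attend(project(tokens[-1], wq), keys, values)


wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[0.5, 0.25], [0.25, 0.5]]
wv = [[1.0, 2.0], [3.0, 4.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cached = CachedAttention(wq, wk, wv)
cached_outputs = [cached.step(token) for token in tokens]
full_output = attention_without_cache(tokens, wq, wk, wv)
```

The cached version performs one key and one value projection per new token, whereas the uncached version repeats those projections for every earlier token at each step.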
6. EVALUATION
6.1 Jordan Rudess’s self report
“It’s been pretty mind-blowing to create this tech-based version of myself–like looking into a real-time musical mirror. I’ve played with so many amazing musicians over the years and learned a ton from each experience, but this was something totally different. It gave me a deep, almost analytical look at how I actually think and play–how musical ideas are built from the rules and instincts I’ve internalized over time. It’s been incredibly educational, not just creatively but in understanding the architecture of my own musical language. And I’m still wide open to where this exploration can go next.” (Jordan Rudess)
6.2 Audiences’ perception
During the September 2024 performance, we collected audience feedback to assess the perceived impact of the JAM_BOT. Five of our 41 participants were excluded for lack of completion of any portion of the survey, resulting in 36 responses. Participants were asked whether they noticed specific JAM_BOT behaviors during the performance, and most reported observing real-time reactions to the live musicians (n = 25) and independent harmonic decision-making (n = 24). This suggests that the system effectively follows and leads musical decisions as required. We also assessed whether the output perceptibly strayed from the directions of the performers, defined in this case as “coherence”, and found that there was no significant difference between the number of participants who did (n = 15) and did not (n = 13) feel that the outputs were coherent. This ambiguity suggested that further, more nuanced assessments of these dynamics were needed. Qualitative responses highlighted a range of reactions to the system’s behavior and the performance overall, with specific appreciation for the JAM_BOT’s responsive, real-time output. However, participants also noted that some of the generated music felt monotonous, while others stated that “AI is going to have a place in the future of music whether someone likes it or not...” and that “it was really amazing how well the model was able to react to the players and create a lovely sound...” These reflections, however, are limited by the context in which they were elicited. Since the concert was open to the public, our sample population varied widely in both musical expertise and AI familiarity. Given the sources of auditory, visual, and social distraction that also characterize live performance contexts, it was especially important to supplement our findings from the performance with controlled evaluations of the efficacy of JAM_BOT. The full description of the study and statistical analyses can be found on our website.⁶
6.3 Comparing JAM_BOT and previous methods
We conduct a listening study (n = 24) to baseline a JAM_BOT model that can be prompted at regular time intervals against both Jordan Rudess’s playing and Continuator [18], because of how Continuator can be adapted to any stylistic input and the contemporary use of Markov models in real-time contexts [32]. We employ a pairwise comparison in which participants listen to two continuations of the same musical prompt and rate which continuation they prefer for the initial musical prompt on a 5-point Likert scale. The results (Fig. 3) find no significant distinction between the continuations of our method and Jordan Rudess’s playing, as well as a strong and significant preference for both of these sources over Continuator. The full description of the study and statistical analyses can be found on our website.⁷

Figure 3. Results depicting number of times a source is preferred in our listening study. Error bars show the standard deviation of a binomial distribution fitted to the binary win/loss counts of each source.

⁶ https://jam-bot-ismir-2025.media.mit.edu/#appendix-audience
⁷ https://jam-bot-ismir-2025.media.mit.edu/#appendix-listening
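
For readers unfamiliar with binomial error bars, the sketch below computes the standard deviation of a win count under a binomial model, sqrt(n·p·(1−p)), with p estimated as wins divided by comparisons. The counts are invented for illustration, and whether the study reports the deviation of the count or of the proportion is an assumption here.

```python
import math


def binomial_std(wins, trials):
    """Standard deviation of the win count for a binomial with p = wins / trials."""
    p = wins / trials
    return math.sqrt(trials * p * (1.0 - p))


# Made-up example: a source preferred in 8 of 16 pairwise comparisons.
example_std = binomial_std(wins=8, trials=16)
```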
7. CONCLUSION & FUTURE WORK
We introduced the JAM_BOT, a real-time Generative AI system utilizing music Language Models to facilitate collaborative free improvisation with performers. Developed in collaboration with Jordan Rudess, the system was tailored to meet his requirements during a live music performance. This process involved designing three interaction strategies to guide real-time prompting of the music LM, along with selecting and refining the necessary training data. We also developed an optimized system to embed our models and run them in real-time. We hope that our work can foster more collaborations between AI researchers and artists and inspire future development of generative AI music systems for use in live music performances. We believe additional interaction strategies, particularly through the use of Reinforcement Learning, could enhance the system’s responsiveness to external musical input. Additionally, we see potential improvements by enabling expressive MIDI outputs and extending the JAM_BOT’s capabilities beyond free improvisation to planned improvisation.
“I’m excited about the idea of eventually creating a single, intelligent model that brings together everything we’ve developed across the separate models so far. Right now, different musical concepts live in different places, but I’d love to see one unified “Jordan Rudess” model that can understand and respond more naturally and fluidly. Even beyond just my own musical input, I see the potential for it to become something even more expansive–capable of drawing from a broader range of influences while still feeling deeply connected to my musical identity.” (Jordan Rudess)
8. ACKNOWLEDGMENTS
A special thank you to Jordan and Danielle Rudess, who enthusiastically guided this research project. Thank you to everyone who made this research and performance possible: Lydia Brosnahan, Brian Mayton, Carlo Mandolini, Nathan Perry, Kevin Davis, Cornelle King, Eran Egozy, Audrey Lee, MIT E33, MIT Center for Art Science and Technology, MIT Office of the Arts, and the entire Responsive Environments and MIT Media Lab communities. Thank you to the team at JUCE for developing the framework we used in this project and for providing a new educational license as a result of this work.
9. ETHICS STATEMENT
We developed the JAM_BOT with the intention of augmenting the ability of performers and composers, not replacing them, but we must recognize that similar systems have the potential to displace both parties. In an effort to promote artists’ rights and autonomy, we used a 1:1 trainer-to-performer system, meaning that JAM_BOT performers own their model and all of its generations. However, future uses of JAM_BOTS are not limited to this approach, which raises concerns about scenarios where performers and composers are seen as completely separate entities from research teams. Without strong connections between the artists and those training the models, we run the risk of jeopardizing artist ownership and creative expression. We also note that the JAM_BOT was pre-trained on predominantly Western music, which limits its musical vocabulary and contributes to broader conversations regarding the dilution of non-Western musical traditions in music generation. Similar issues have been raised by other researchers in the field, though the particular nuance of ethical considerations for real-time systems like JAM_BOTS is worth continued discussion.
10. REFERENCES
[1] L. Blancha d, P. Naseck, E. Egozy, and J. A. Pa adiso,
“De eloping Symbio ic Vi uosi y: Ai-Augmen ed
Musical Ins umen s and Thei Use in Li e Music Pe -
o mances,” An MIT Explo a ion o Gene a i e AI, sep
25 2024, h ps://mi -genai.pubpub.o g/pub/iz684jj .
[2] C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer,
I. Simon, C. Hawthorne, A. M. Dai, M. D. Hoffman,
M. Dinculescu, and D. Eck, “Music Transformer,”
arXiv preprint arXiv:1809.04281, 2018.
[3] J. Thickstun, D. L. W. Hall, C. Donahue, and P. Liang,
“Anticipatory music transformer,” Transactions on
Machine Learning Research, 2024. [Online]. Available:
https://openreview.net/forum?id=EBNJ33Fcrl
[4] M. Norgaard, “The interplay between conscious
and subconscious processes during expert musical
improvisation,” in Music and Consciousness 2:
Worlds, Practices, Modalities, R. Herbert, D. Clarke,
and E. Clarke, Eds. Oxford University Press, Apr.
2019, p. 0. [Online]. Available:
https://doi.org/10.1093/oso/9780198804352.003.0011
[5] T. Funk, “A musical suite composed by an electronic
brain: Reexamining the Illiac Suite and the legacy of
Lejaren A. Hiller Jr.,” Leonardo Music Journal, vol. 28,
pp. 19–24, 2018.
[6] D. Cope, The algorithmic composer. AR Editions,
Inc., 2000, vol. 16.
[7] J. Biles et al., “GenJam: A genetic algorithm for generating
jazz solos,” in ICMC, vol. 94. Ann Arbor, MI,
1994, pp. 131–137.
[8] D. Eck and J. Schmidhuber, “A first look at music composition
using LSTM recurrent neural networks,” Istituto
Dalle Molle Di Studi Sull Intelligenza Artificiale, vol.
103, no. 4, pp. 48–56, 2002.
[9] S. Oore, I. Simon, S. Dieleman, D. Eck, and K. Simonyan,
“This time with feeling: Learning expressive
musical performance,” Neural Computing and Applications,
vol. 32, pp. 955–967, 2020.
[10] C.-Z. A. Huang, T. Cooijmans, A. Roberts,
A. Courville, and D. Eck, “Counterpoint by convolution,”
in Proceedings of ISMIR 2017, 2017.
[11] M. E. Malandro, “Composer’s Assistant: An Interactive
Transformer for Multi-Track MIDI Infilling,” in
Proc. 24th Int. Society for Music Information Retrieval
Conf., Milan, Italy, 2023, pp. 327–334.
[12] H.-W. Dong, W.-Y. Hsiao, L.-C. Yang, and Y.-H. Yang,
“MuseGAN: Multi-track sequential generative adversarial
networks for symbolic music generation and accompaniment,”
in Proceedings of the AAAI Conference on
Artificial Intelligence, vol. 32, no. 1, 2018.
[13] H.-W. Dong, K. Chen, S. Dubnov, J. McAuley, and
T. Berg-Kirkpatrick, “Multitrack music transformer,”
in ICASSP 2023-2023 IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP).
IEEE, 2023, pp. 1–5.
[14] Y.-J. Shih, S.-L. Wu, F. Zalkow, M. Müller, and Y.-H.
Yang, “Theme Transformer: Symbolic music generation
with theme-conditioned transformer,” IEEE Transactions
on Multimedia, vol. 25, pp. 3495–3508, 2022.
[15] S.-L. Wu and Y.-H. Yang, “MuseMorphose: Full-song
and fine-grained piano music style transfer with one
transformer VAE,” IEEE/ACM Transactions on Audio,
Speech, and Language Processing, vol. 31, pp. 1953–1967,
2023.
[16] B. L. Vercoe, “Synthetic listeners and synthetic performers,”
The Journal of the Acoustical Society of
America, vol. 88, no. S1, pp. S70–S70, 1990.
[17] G. E. Lewis, “Too many notes: Computers, complexity
and culture in Voyager,” Leonardo Music Journal,
vol. 10, pp. 33–39, 2000.
[18] F. Pachet, “The Continuator: Musical interaction with
style,” Journal of New Music Research, vol. 32, no. 3,
pp. 333–341, 2003.
[19] G. Assayag, G. Bloch, M. Chemillier, A. Cont, and
S. Dubnov, “Omax brothers: a dynamic topology of
agents for improvization learning,” in Proceedings of
the 1st ACM Workshop on Audio and Music Computing
Multimedia, 2006.
[20] G. Weinberg and S. Driscoll, “Robot-human interaction
with an anthropomorphic percussionist,” in
Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, ser. CHI ’06. New
York, NY, USA: Association for Computing Machinery,
2006, pp. 1229–1232. [Online]. Available:
https://doi.org/10.1145/1124772.1124957
[21] G. Weinberg, M. Godfrey, A. Rae, and J. Rhoads, “A
real-time genetic algorithm in human-robot musical
improvisation,” in Computer Music Modeling and Retrieval.
Sense of Sounds, R. Kronland-Martinet, S. Ystad,
and K. Jensen, Eds. Berlin, Heidelberg: Springer
Berlin Heidelberg, 2008, pp. 351–359.
[22] G. Hoffman and G. Weinberg, “Gesture-based human-robot
jazz improvisation,” in 2010 IEEE International
Conference on Robotics and Automation. IEEE, 2010,
pp. 582–587.
[23] A. Caillon and P. Esling, “RAVE: A variational
autoencoder for fast and high-quality neural audio
synthesis,” 2021. [Online]. Available:
https://arxiv.org/abs/2111.05011
[24] S. Nabi, P. Esling, G. Peeters, and F. Bevilacqua,
“Embodied exploration of deep latent spaces in
interactive dance-music performance,” in Proceedings
of the 9th International Conference on Movement and
Computing, ser. MOCO ’24. New York, NY, USA:
Association for Computing Machinery, 2024. [Online].
Available: https://doi.org/10.1145/3658852.3659072
[25] C. Benetatos, J. VanderStel, and Z. Duan, “BachDuet:
A deep learning system for human-machine counterpoint
improvisation,” in Proceedings of the International
Conference on New Interfaces for Musical Expression,
2020.
[26] C. Donahue, I. Simon, and S. Dieleman, “Piano Genie,”
in Proceedings of the 24th International Conference on
Intelligent User Interfaces, 2019, pp. 160–164.
[27] Y. Wu, T. Cooijmans, K. Kastner, A. Roberts, I. Simon,
A. Scarlatos, C. Donahue, C. Tarakajian, S. Omidshafiei,
A. Courville, P. S. Castro, N. Jaques, and
C.-Z. A. Huang, “Adaptive accompaniment with ReaLchords,”
in International Conference on Machine
Learning, 2024.
[28] A. Scarlatos, Y. Wu, I. Simon, A. Roberts, T. Cooijmans,
N. Jaques, C. Tarakajian, and C.-Z. A. Huang,
“ReaLJam: Real-time human-AI music jamming with
reinforcement learning-tuned transformers,” CHI Late-Breaking
Works track, 2025.
[29] C. Raffel, Learning-based methods for comparing sequences,
with applications to audio-to-MIDI alignment
and matching. Columbia University, 2016.
[30] G. Vigliensoni, P. Perry, and R. Fiebrink,
“A Small-Data Mindset for Generative AI
Creative Work,” New York, NY, USA, May 2022.
[Online]. Available:
https://ualresearchonline.arts.ac.uk/id/eprint/18343/1/CHI_Workshop_GenAI.pdf
[31] J. Storer, “JUCE,” https://github.com/juce-framework/JUCE,
2004.
[32] O. Ben-Tal and D. Dolan, “Musical and meta-musical
conversations,” 2023.