GOAT: A LARGE DATASET OF PAIRED GUITAR AUDIO RECORDINGS
AND TABLATURES
Jackson Loth1, Pedro Sarmento1,2, Saurjya Sarkar1, Zixun Guo1, Mathieu Barthet1,3, and Mark Sandler1
1Centre for Digital Music, Queen Mary University of London
2Music.AI
3Aix-Marseille Univ CNRS PRISM
{j.j.loth, p.p.sarmento, saurjya.sarkar, zixun.guo, m.barthet, mark.sandler}@qmul.ac.uk
ABSTRACT
In recent years, the guitar has received increased attention from the music information retrieval (MIR) community, driven by the challenges posed by its diverse playing techniques and sonic characteristics. Mainly fueled by deep learning approaches, progress has been limited by the scarcity and limited annotations of datasets. To address this, we present the Guitar On Audio and Tablatures (GOAT) dataset, comprising 5.9 hours of unique high-quality direct input audio recordings of electric guitars from a variety of different guitars and players. We also present an effective data augmentation strategy using guitar amplifiers which delivers near-unlimited tonal variety, of which we provide a starting 29.5 hours of audio. Each recording is annotated using guitar tablatures, a guitar-specific symbolic format supporting string and fret numbers, as well as numerous playing techniques. For this we utilise both the Guitar Pro format, a software for tablature playback and editing, and a text-like token encoding. Furthermore, we present competitive results using GOAT for MIDI transcription and preliminary results for a novel approach to automatic guitar tablature transcription. We hope that GOAT opens up the possibilities to train novel models on a wide variety of guitar-related MIR tasks, from synthesis to transcription to playing technique detection.
1. INTRODUCTION
The guitar is one of the most popular instruments in modern western music. It is no surprise that guitar-centered research has received a lot of attention, particularly in the wake of advances in deep learning. Tasks such as automatic guitar transcription (AGT) [1] [2] [3] [4], tablature generation [5] [6], guitar amplifier modeling [7] [8] [9], and more have seen great progress. Despite this interest in guitars, there remains limited data availability of guitar audio, particularly in the case of annotated audio recordings.
© J. Loth, P. Sarmento, S. Sarkar, Z. Guo, M. Barthet, and M. Sandler. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: J. Loth, P. Sarmento, S. Sarkar, Z. Guo, M. Barthet, and M. Sandler, “GOAT: A Large Dataset of Paired Guitar Audio Recordings and Tablatures”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
Musical Instrument Digital Interface (MIDI) is a well known standard for representing music, and is a common choice when annotating the guitar [10] [4]. MIDI easily encodes essential parameters of musical notes such as pitch, note onset, duration and velocity. However, it does not give any indication of string or fret number, nor does it have any standardized way of representing the numerous expressive components of guitar playing [11]. It is based on a descriptive type of notation, in which there is a relationship between symbols and pitch. Tablatures, however, are a prescriptive type of notation commonly used for guitar, in which there is a relation between symbols and actions, i.e. how to play said symbols [12]. While there are some guitar-focused datasets in the literature that offer tablature-like annotations [13] [14] [15] [16], they are typically limited to fret and string values, ignoring expressive techniques (e.g. bends, palm mutes, legatos).
As a motivation behind the creation of GOAT, the Guitar On Audio and Tablatures dataset, we seek to surpass the limitations of MIDI and instead prioritize building a dataset of guitar recordings annotated using guitar tablatures, a popular musical format among guitarists. For this we rely on Guitar Pro, a commercial software for tablature editing and playback that is widespread in the guitar community. This dataset consists of 5.9 hours of unique real audio recordings of guitars, fully annotated for fret/string numbers and expressive playing techniques. We also include the entire dataset rendered through a wide variety of different digital guitar amplifiers in various configurations and cabinet impulse responses, for a total of 29.5 hours of fully annotated audio. The annotations are provided in Guitar Pro format, as well as DadaGP [12] tokens, a compressed text-like representation of the information in the tablatures. In an attempt to make a bridge with prior literature, we also include MIDI versions of the content.
This paper’s contributions include (1) an overview of the GOAT dataset and the data collection methodology; (2) a data augmentation strategy for obtaining a large amount of timbral variation; (3) an evaluation of results when using the GOAT dataset for the task of AGT, complemented with (4) preliminary results on a novel audio-to-text-based approach to AGT; finally, some (5) suggestions of prospective applications using the GOAT dataset. We first cover some relevant background concerning guitar tablatures and similar previously released music datasets. In Section 3, we extensively describe the methodology used to compile the GOAT dataset. We then describe, in Section 4, the details of the GOAT dataset, the features it encompasses and the ones it lacks. Within Section 5 we present results from a use case of GOAT in the context of AGT, following both a traditional MIDI-based approach for comparison with previous results, and a novel procedure using the text-like token representations of GOAT to fine-tune a Whisper [17] model. Finally, Section 6 proposes additional applications of the dataset.
2. BACKGROUND
2.1 Guitar Tablatures
Guitar tablatures (see Figure 1), also called tabs, are symbolic representations of guitar music.
Figure 1. Example of two measures of a guitar tablature, from the Guitar Pro editing software, exemplifying different guitar playing techniques (i.e. bends, palm mutes, legatos and slides).
In contrast to MIDI, which simply represents a note pitch over time, tabs represent both the fret and string number of a guitar. Tablatures can also support expressive playing techniques such as bends, hammer-ons, pull-offs, strum directions and more. It is a very intuitive, visual representation of guitar music and is popular among guitarists for learning specific songs and performances. Guitar tabs have seen an increase in attention from researchers in the past few years in areas such as guitar tablature generation [12] [5] [6] [18] [19], automatic guitar transcription [1] [3], tablature prediction from MIDI [20] and acoustic guitar synthesis from tablatures [21].
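To make the contrast with MIDI concrete, the following minimal sketch maps a (string, fret) pair to a MIDI pitch. It assumes a six-string guitar in standard tuning and the usual convention that the low E string is MIDI note 40; the names `OPEN_STRING_MIDI`, `tab_to_midi` and `positions_for_pitch` are illustrative and are not part of GOAT, Guitar Pro or DadaGP.

```python
# Open-string MIDI pitches for standard tuning, string 6 (low E) to string 1 (high e).
# Assumption: E2 = MIDI 40, the usual convention for a six-string guitar.
OPEN_STRING_MIDI = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def tab_to_midi(string: int, fret: int) -> int:
    """Return the MIDI pitch sounded by fretting `fret` on `string` (0 = open)."""
    if string not in OPEN_STRING_MIDI:
        raise ValueError(f"unknown string number: {string}")
    if fret < 0:
        raise ValueError("fret must be non-negative")
    return OPEN_STRING_MIDI[string] + fret


def positions_for_pitch(pitch: int, max_fret: int = 24) -> list[tuple[int, int]]:
    """List every (string, fret) position that produces `pitch`."""
    return [
        (string, pitch - open_pitch)
        for string, open_pitch in sorted(OPEN_STRING_MIDI.items())
        if 0 <= pitch - open_pitch <= max_fret
    ]
```

Under these assumptions, a single pitch such as the open high e (MIDI 64) can be played at up to six different positions on a 24-fret neck, which is the string and fret information that a MIDI annotation alone does not carry.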
As previously discussed, representing tablatures is difficult as there is no standardised representation like MIDI. Some works use a JSON-like representation to encode information [13] [22]. DadaGP [12] also introduced an easy to parse text-based encoding which uses a simple vocabulary to describe strings, frets, and many guitar-specific expressive techniques.
2.2 Guitar-Focused Datasets
To facilitate guitar-focused research, a number of datasets have been compiled for various purposes, focusing mainly on the task of AGT, as observable in Table 1. Datasets such as GuitarSet [13], EGDB [10], GAPS [4] (which focuses exclusively on classical guitar recordings), IDMT-SMT-Guitar [22], FrançoisLeduc [2] and Guitar-TECHS [16] contain real guitar recordings with annotations of notes, typically in MIDI or tablature-like form. IDMT-SMT-Audio-Effects [23] focuses on guitar recordings with various audio effects. SynthTab [14] contains an extremely large collection of audio renditions of guitar tracks synthesised using a virtual instrument. While still sounding somewhat realistic, we do not include SynthTab in Table 1 due to the synthetic nature of the audio.
Dataset              Length (m)   MIDI   Tab
GAPS [4]             843          ✓      ✗
GOAT [Ours]          354          ✓      ✓
IDMT [22]            340          ✗      ✓
Guitar-TECHS [16]    312          ✓      ✓
GuitarSet [13]       180          ✗      ✓
FrançoisLeduc [2]    240          ✓      ✗
EGDB [10]            118          ✓      ✗
Table 1. Guitar-focused datasets in the literature. We report real audio content duration (in minutes), and existence of annotations in MIDI and Tablature format. We consider tablature format to be any kind of annotation containing string and fret information.
3. METHODOLOGY
In order to have as much flexibility as possible with the dataset, it was decided early on that the dataset would have the following qualities: (1) direct input (DI) recordings to allow for post-processing towards tone variety; (2) 44.1 kHz sampling rate to account for high-quality audio; (3) annotations in the form of guitar tablatures to capture as much detail about the playing as possible.
3.1 Data Collection
The data was produced by both the main authors and two third party content creators. These creators delivered digital audio workstation (DAW) projects containing a pre-existing cover of a popular song and corresponding tablatures, from which the audio and tab pairs were extracted. This allowed us to quickly obtain a large amount of data. Each song was manually checked and aligned against the tablature to ensure that every note was correct between audio and tablature. If multiple guitar parts were present in a tab, they were split off into separate files. In addition to this, we recorded over an hour of additional audio to complement the dataset. This was done by collecting community-created tablatures 1 and recording the parts in each song exactly as described in the tablature.
3.2 Data Processing
In order to protect the work made by the third party creators, all song and artist names were removed from the file names and tablatures. The tablatures were all converted to the DadaGP [12] text encoding format due to its popularity in tablature-related tasks. MIDI was also generated from the tablatures using the Guitar Pro MIDI export function. While the MIDI annotations do not encode the expressive techniques included in the tabs, they allow the dataset to be used along with other datasets and models which were built for MIDI annotations, leveraging contributions from the literature. Despite the actual notes played in the audio files having small timing imperfections, the notes in the tablature annotations are all quantized to a grid. The alignment procedure from [4] was used to fine-align the MIDI notes to the performances. Both the fine-aligned and quantized MIDI are provided in the dataset. The tabs were also rendered using the Guitar Pro Realistic Sound Engine (RSE) 2 virtual instrument to create additional synthetic data.
Figure 2. Note distribution over the fretboard (i.e. string and fret positions) of the content in the GOAT dataset.
1 https://www.songsterr.com/
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
3.3 Amplifier Rendering
The audio in GOAT is recorded from the raw output of an electric guitar pickup, also called direct input or DI. This is not a sound commonly heard in music, as the electric guitar sound most listeners are accustomed to comes from a DI run through a guitar amplifier and guitar speaker cabinet connected in series. We collected DIs in order to maximize the potential for data augmentation via reamping. Reamping is a process in which a DI is run through an amplifier and sometimes a full effects chain to transform it into a desired final sound. This is common in music production when an engineer wants to capture a performance but not commit to a final guitar tone.
While it would be theoretically possible to reamp GOAT using a variety of real amplifiers and cabinets, this process is time consuming and expensive. Instead, we can use high quality digital amplifier models and impulse responses (IRs) of a guitar cabinet speaker to model the amplifier and speaker cabinet respectively. This approach was used by [10] and showed improvement in transcription with the increased tone variety, though the amplifier models used were of low quality and limited to just five total digital amplifiers. We extend this idea by using a higher quality modern amplifier modeling software, as well as significantly increasing the variety in amplifiers and cabinets used. Through this method, we are able to transform the audio in GOAT much closer to the sound of real world data, as well as significantly increase the timbral variety.
2 https://www.guitar-pro.com/blog/p/14545-signature-sounds-explained-guitar-pro-7
For the amplifier, we used the Neural Amp Modeler 3 (NAM), a high quality digital amplifier plugin. The NAM allows users to capture a ‘profile’ of a guitar amplifier, essentially a snapshot of the amplifier’s sound. A large community 4 has grown out of users sharing their profiles online. Using publicly available NAM profiles, we curated a dataset of roughly 7,000 profiles from over 1,000 different amplifiers. A profile was chosen at random to render each audio item in the dataset. IRs were used from the NeuralDSP Archetype Nolly 5 plugin. This plugin has cabinet IRs from several different microphones, with parameters to control the microphone height and distance from the speaker. The microphone and its parameters were chosen at random for each audio item in the dataset. A send to a room reverb was also enabled at random (with a 25% chance of being enabled) and the send level was randomised between −10 dB and 0 dB. A Python package called Pedalboard [24] was used to render the audio. This process was repeated 5 times over, resulting in 29.5 hours of fully annotated audio. This process can, however, be extended by choosing different post-processing approaches. The reamping code 6 is made publicly available.
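The random choices described above can be sketched as a small configuration sampler. This is not the authors' reamping code and it does not call Pedalboard or NAM; it only illustrates the sampling logic under stated assumptions. The `ReampSettings` fields, the normalised 0 to 1 ranges for microphone height and distance, and the function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ReampSettings:
    profile: str          # amplifier profile chosen for this item
    microphone: str       # cabinet IR microphone
    mic_height: float     # hypothetical normalised range 0..1
    mic_distance: float   # hypothetical normalised range 0..1
    reverb_enabled: bool
    reverb_send_db: float


def sample_settings(profiles, microphones, rng: random.Random) -> ReampSettings:
    """Draw one random rendering configuration for a single audio item."""
    return ReampSettings(
        profile=rng.choice(profiles),
        microphone=rng.choice(microphones),
        mic_height=rng.random(),
        mic_distance=rng.random(),
        reverb_enabled=rng.random() < 0.25,      # 25% chance, as stated in the text
        reverb_send_db=rng.uniform(-10.0, 0.0),  # send level in dB
    )


def db_to_gain(level_db: float) -> float:
    """Convert a level in decibels to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)
```

Passing an explicitly seeded `random.Random` keeps each rendering pass reproducible, so repeating the pass with five different seeds would give five distinct renders per DI item.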
4. DATASET
4.1 Statistical Overview
The finalised dataset contains 5.17 hours of DI recordings in standard tuning 7 and 25.85 hours of amplifier recording, as well as an additional 43.85 minutes in non-standard tuning. This is broken into 153 (standard tuning) and 19 (non-standard tuning) individual dataset items.
For each data point in the GOAT dataset we provide the following: (1) the DI audio file (in .wav format) and (2) the DI audio rendered through 5 different amplifiers and 5 different cabinets (in .wav format); (3) the corresponding Guitar Pro tablature (in .gp and .gp5 format, the latter to allow for conversion into the DadaGP format), the (4) corresponding DadaGP text encoding of the Guitar Pro annotation (in .txt format), and (5) a rendered version into audio using the RSE virtual instrument in Guitar Pro (in .wav format); finally, (6) the corresponding MIDI and fine-aligned MIDI annotations (in .mid format). A .csv file containing metadata and file paths is also provided.
Figure 3. Statistical information about the GOAT dataset. Histograms of tempo (in BPM) per song (a), time signatures per measure (b), most common note durations, in which a value of 960 ticks corresponds to a quarter note (c), and most frequent guitar playing techniques (d).
3 https://www.neuralampmodeler.com/
4 https://tonehunt.org/
5 https://neuraldsp.com/plugins/archetype-nolly
6 https://github.com/JackJamesLoth/GOAT-Dataset
7 Standard tuning uses the open string pitches E, A, D, G, B, e.
Guitar          Standard (m)   Other (m)
Stratocaster    208.82         26.15
Les Paul        6.08           7.36
Jazzmaster      15.62          10.34
Strandberg      79.83          -
Table 2. Breakdown of content of audio (in minutes) by type of guitar, using standard or alternative tunings (e.g. drop-D tunings, half-step or whole-step down-tunings).
Table 2 reports the content (in terms of minutes) of audio, divided by type of guitar used in the recordings. Overall, the dataset contains 172 distinct audio files, each with all the annotations and metadata described previously.
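The durations quoted in this section can be cross-checked against Table 2 with a few lines of arithmetic. The sketch below copies the per-guitar minutes from Table 2 and assumes that the reported amplifier totals are obtained by multiplying the rounded DI hours by the five rendering passes of Section 3.3; the variable names are illustrative.

```python
# Minutes of DI audio per guitar, copied from Table 2 (None marks the empty cell).
TABLE_2 = {
    "Stratocaster": (208.82, 26.15),
    "Les Paul": (6.08, 7.36),
    "Jazzmaster": (15.62, 10.34),
    "Strandberg": (79.83, None),
}
RENDERS_PER_ITEM = 5  # each DI item is reamped five times (Section 3.3)

standard_min = sum(std for std, _ in TABLE_2.values())
other_min = sum(other for _, other in TABLE_2.values() if other is not None)

standard_hours = standard_min / 60                   # DI hours in standard tuning
total_di_hours = (standard_min + other_min) / 60     # all DI hours

# Assumption: amplifier totals are derived from the rounded hour figures.
amp_hours_standard = round(standard_hours, 2) * RENDERS_PER_ITEM
total_amp_hours = round(total_di_hours, 1) * RENDERS_PER_ITEM
```

With these inputs the standard-tuning total is about 310.35 minutes (5.17 hours), the non-standard total is 43.85 minutes, and the overall DI total is about 354 minutes (5.9 hours), which matches the figures given in the text and in Table 1.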
In Figure 2 we observe a compilation of the note distribution of all the songs in GOAT, from a total of 109,869 played notes, containing information about the frets and strings used for each.
Chord Type    Number of Instances
w/ 3 notes    7,174
w/ 4 notes    4,260
w/ 5 notes    1,065
w/ 6 notes    1,039
Table 3. Breakdown of number of chords in the GOAT dataset in terms of number of notes.
Statistics on the chords present in GOAT are shown in Table 3. From a total of 13,538 chords in the dataset, there is a prevalence of chords played using only 3 notes. This includes triads (major, minor, diminished, augmented), but mostly “power chords”, common in the rock/metal sonorities, comprised of the root, and the fifth and octave above.
Figure 3 shows histogram breakdowns of the musical content in the dataset. In (a) we observe a distribution of the tempos, in BPM, of each song in the GOAT dataset. In terms of time signatures, in (b) we see a prevalence of 4/4 time signatures in most of the measures of the songs in the dataset. The statistics presented in (c) and (d) leverage the token format of the DadaGP annotations. Inspecting the GOAT dataset via this text-like format, we can produce a histogram of the most common note durations (see (c), in which a quarter note corresponds to 960 ticks, 480 ticks correspond to an eighth note, 240 ticks to a sixteenth note, 320 to an eighth note triplet, 720 ticks to a dotted eighth note, and so on), and the most commonly used guitar playing techniques (d). Regarding the latter, we have highlighted the four most prevalent expressive techniques, but the dataset contains instances of others (e.g. tapping, vibrato), although with a much smaller representation.
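The tick values listed above all follow from the 960 ticks per quarter note resolution. As a minimal sketch, the hypothetical helper `note_ticks` below derives them from a note value expressed as a fraction of a whole note, with optional dotting and tuplet scaling; it is an illustration of the arithmetic, not a DadaGP or Guitar Pro function.

```python
from fractions import Fraction

TICKS_PER_QUARTER = 960  # resolution stated in the text


def note_ticks(whole_note_fraction: Fraction, dotted: bool = False,
               tuplet: Fraction = Fraction(1)) -> int:
    """Duration in ticks of a note value given as a fraction of a whole note.

    `tuplet` scales the duration, e.g. Fraction(2, 3) for a triplet.
    """
    ticks = Fraction(TICKS_PER_QUARTER * 4) * whole_note_fraction * tuplet
    if dotted:
        ticks *= Fraction(3, 2)
    if ticks.denominator != 1:
        raise ValueError("duration is not a whole number of ticks")
    return int(ticks)
```

For example, `note_ticks(Fraction(1, 8), tuplet=Fraction(2, 3))` gives 320 and `note_ticks(Fraction(1, 8), dotted=True)` gives 720, in agreement with the eighth note triplet and dotted eighth note values quoted above.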
4.2 Distribution
We distribute the GOAT dataset on the Zenodo 8 platform. The dataset is made available by request to better control its use for research purposes only.
4.3 What Is Missing?
As of this version, there is no information regarding key signature as metadata in GOAT. Additionally, akin to what the authors referred to in [12], Guitar Pro (up until its version 5) does not include note velocity information as in MIDI. However, Guitar Pro does represent loudness between notes and musical phrases by using traditional dynamic instructions (e.g. forte, piano). Thus, dynamics are excluded from both the Guitar Pro and MIDI renditions of the GOAT dataset, and they are exclusively contained in the realistic DI recordings. In terms of metadata, it would be interesting to provide a distribution of the type of chords in terms of harmonic quality (e.g. major, minor, seventh) as well as diagrams of fingerings, but that information is currently absent from what GOAT provides.
8 https://zenodo.org/records/15690894
5. EXPERIMENTS USING GOAT
In order to showcase the applicability of GOAT, we selected automatic music transcription as a use case. We evaluate GOAT on two tasks: transcription into MIDI format (here referred to as MIDI transcription), focusing only on capturing each note’s pitch, duration and velocity information, and a newly proposed task of automatic guitar tablature transcription (AGTT) via DadaGP token prediction.
5.1 MIDI Transcription
While GOAT is not expressly a MIDI-focused dataset, MIDI transcription is still the most common baseline used in automatic guitar transcription (AGT) and is the method researchers will be most familiar with. Despite the previously discussed shortcomings of MIDI, it is still beneficial to evaluate GOAT using the task of MIDI transcription in order to better understand how the dataset compares to other similar datasets.
Following [4] [2], we finetune a high-resolution transcription model [25] pretrained using extensively augmented data [26]. Rather than training from scratch, we finetune a pretrained model, as [2, 4] have demonstrated that finetuning yields superior performance. We use the same training setup as [4], where each training recording is segmented into 10-second chunks with a hop size of 1 second. During training, we apply data augmentation by randomly shifting the pitch by up to ±2 semitones.
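The segmentation into overlapping chunks can be sketched as follows. The function `chunk_bounds` is a hypothetical helper, not code from [4]; in particular, dropping a final partial chunk rather than padding it is an assumption, since the text does not say how the end of a recording is handled.

```python
def chunk_bounds(duration_s: float, chunk_s: float = 10.0, hop_s: float = 1.0):
    """Return (start, end) times in seconds of every full chunk in a recording."""
    if chunk_s <= 0 or hop_s <= 0:
        raise ValueError("chunk_s and hop_s must be positive")
    bounds = []
    start = 0.0
    # Assumption: a trailing partial chunk is dropped rather than padded.
    while start + chunk_s <= duration_s:
        bounds.append((start, start + chunk_s))
        start += hop_s
    return bounds
```

With a 1-second hop, consecutive chunks overlap by 9 seconds, so a 30-second recording yields 21 training chunks under these assumptions.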
We train three separate models using GOAT for our experiments. The models were trained on an NVIDIA A5000 GPU until convergence. The first uses only the raw DI recordings (DI). The next uses the dataset reamped once using our methodology discussed in Section 3.3 (AMP). Finally, we train a model using the dataset reamped twice (AMP-XL). While this does not mean that the model would see more variety in MIDI annotation during training, it does mean that the model will see more timbral variety during training. These models are then evaluated on the test split of GOAT (both DI and AMP) and GuitarSet [13] (which contains only acoustic guitar). We choose to evaluate our model on GuitarSet to assess its zero-shot learning capabilities. This is done in order to test the following hypotheses: H1 - data augmentation in the form of guitar amplifier reamping improves the generalizability of a transcription model on real data; H2 - reamping the dataset multiple times to give additional timbral variety further improves the generalizability of a transcription model on real data. The MIDI transcription results are found in Table 4.
5.1.1 Discussion
All three models in general achieve fairly competitive results on the test datasets [2]. The AMP model is able to obtain comparable results to the DI model on the DI and GuitarSet test split, while significantly outperforming the DI model on the AMP test split, the latter being the closest to a real world transcription scenario concerning electric guitar. This is strong evidence for H1, suggesting that the amplifier data augmentation allows the model to generalize to different amplifier timbres without sacrificing quality in the case of DI or acoustic guitar. This improvement also surpasses the gains observed between models trained with DI and amplifier data in [10], further validating the effectiveness of our data augmentation strategy. Notably, our model trained solely on DI data achieves comparable transcription performance on GuitarSet’s test set [2], further validating GOAT’s effectiveness in training AGT systems.
Data      Test Split   P       R       F
DI        DI           0.847   0.798   0.820
DI        AMP          0.771   0.677   0.714
DI        GuitarSet    0.879   0.849   0.860
AMP       DI           0.853   0.836   0.843
AMP       AMP          0.824   0.781   0.800
AMP       GuitarSet    0.882   0.815   0.843
AMP-XL    DI           0.831   0.809   0.818
AMP-XL    AMP          0.821   0.796   0.806
AMP-XL    GuitarSet    0.872   0.808   0.835
Table 4. MIDI transcription results. Metrics used are note-level precision (P), recall (R) and F1 score (F).
Interestingly, the AMP-XL model performs worse than both the DI and AMP models on the DI and GuitarSet test splits, while performing slightly better than AMP on the AMP test split. This seems to suggest that the additional timbre variety is only helping generalisation among amplifier timbres, while hurting generalisation among the unseen DI and acoustic guitar timbres. However, we posit that this still shows some evidence of H2 given that the AMP test split is the closest to a real world electric guitar transcription use case.
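For readers less familiar with the metrics in Table 4, the sketch below computes note-level precision, recall and F1 from match counts using the standard definitions. How predicted notes are matched to reference notes (for example, the onset tolerance) is not specified in this section, so the counts are taken as given; the function name and arguments are illustrative.

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Note-level precision, recall and F1 from match counts."""
    predicted = true_positives + false_positives   # notes the model output
    reference = true_positives + false_negatives   # notes in the annotation
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / reference if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

If scores are averaged per recording before being reported, the tabulated F value need not equal the harmonic mean of the tabulated P and R, which may explain small differences when recomputing F from the other two columns.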
5.2 Tablature Transcription with Whisper
Given our choice of Guitar Pro as the annotation format in GOAT, an obvious choice of task to evaluate the dataset is AGTT. While this task has seen some previous work [1] [3], it is still conducted in a manner similar to MIDI transcription by predicting string and fret activations at small time scales. We formulate the problem differently and propose a new task by instead using the DadaGP annotations as plain text and treating the problem as an audio-to-text problem. This approach means that we are not predicting exact note timings, but rather the more general sheet music-like interpretation of the input audio.
For this task, we fine-tune Whisper [17], a large foundation model trained for automatic speech recognition. The audio in GOAT was split into 15 second chunks, as Whisper has a maximum audio input length of 30 seconds. Whisper also has a maximum sequence length of 448, so any chunks whose annotations exceeded 448 tokens were excluded. This meant we were only able to use roughly 4.65 hours of GOAT. A custom tokenizer was also created so that the model could correctly interpret and predict DadaGP tokens. The vocabulary used in DadaGP was simplified, removing tokens relating to artist and genre and simplifying note tokens by removing timbre information.
Data      Loss    Accuracy   WER
DI        1.787   0.682      0.515
AMP       2.065   0.681      0.547
AMP-XL    2.285   0.692      0.518
Table 5. Evaluation results on the test split of GOAT from the best runs of the fine-tuned Whisper model.
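The length-based exclusion described above amounts to a simple filter over tokenized chunks. The sketch below assumes each chunk is available as an identifier paired with its list of annotation tokens; whether special tokens (such as start or end markers) count towards the 448 limit is not stated, so this version counts only the tokens it is given. The names are hypothetical.

```python
MAX_TARGET_TOKENS = 448  # maximum sequence length stated in the text


def keep_chunk(tokens: list[str], max_tokens: int = MAX_TARGET_TOKENS) -> bool:
    """True if a chunk's annotation fits within the decoder's length limit."""
    return len(tokens) <= max_tokens


def filter_chunks(chunks):
    """Keep only (audio_id, tokens) pairs whose annotation is short enough."""
    return [(audio_id, tokens) for audio_id, tokens in chunks if keep_chunk(tokens)]
```

Dense passages produce more tokens per 15-second chunk, so a filter of this kind removes them preferentially, which is worth keeping in mind when interpreting results on the remaining data.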
In our experiments, we again train three different models using the same data setup as in Section 5.1. The models were trained on an NVIDIA A100 GPU until convergence. We use token accuracy and word error rate (WER) to evaluate the performance. Preliminary results from the best performing epochs are presented in Table 5.
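As a reference for the two metrics, the following minimal sketch computes WER as the token-level edit distance divided by the reference length, and a position-wise token accuracy. The exact definition of token accuracy used in the experiments is not given here, so the position-wise version is an assumption, and all function names are illustrative.

```python
def edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Edit distance normalised by the number of reference tokens."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def token_accuracy(reference: list[str], hypothesis: list[str]) -> float:
    """Fraction of reference positions where the hypothesis token matches."""
    if not reference:
        raise ValueError("reference must not be empty")
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return matches / len(reference)
```

Both metrics treat every token equally, so a frequent token type contributes to the score in proportion to how often it appears in the reference sequences.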
5.2.1 Discussion
The transcription results ultimately did not work well, though they do show some promise. DadaGP is a highly structured text encoding, and the biggest initial hurdle is learning this structure. For example, in a DadaGP encoding, notes are specified through one or more note tokens, followed by a wait token. Given the abundance of wait tokens in DadaGP encodings, it is possible for the model to simply learn to output only these tokens to optimise the loss function. While example outputs of the fine-tuned models do show this behavior in parts of the prediction, the first half of the token predictions still tend to be semi-structured DadaGP encodings. Common note groups (e.g. simple chords) are predicted. However, these ultimately did not correlate with the notes in the ground-truth encodings. While the accuracy appears to be somewhat high, this is likely misleading due to the prevalence of wait tokens in both the dataset and model outputs.
It seems likely that more fine-tuning and data are needed to get a workable transcription model. Training using the AMP-XL data resulted in a small improvement in accuracy over both other models and in WER relative to AMP (Table 5), suggesting that additional data could be beneficial. A pretraining step using synthetic data, such as SynthTab [14] for example, could be a feasible way to collect enough data to get the model to properly learn the DadaGP structure. This could then be followed by a further fine-tuning step on GOAT to expose the model to a wide range of timbres and real guitar playing. However, we leave these improvements to future work. We include these limited results in order to show the potential of this approach and showcase the benefit of a dataset like GOAT, which is unique in being the only non-synthetic dataset of paired guitar audio and tablatures.
6. PROSPECTIVE USE CASES
We envision GOAT as a general purpose dataset that can be useful for a number of different tasks. While we propose these tasks as the starting point for work with this dataset, we hope that the dataset finds uses in other unique and interesting tasks as well.
6.1 Automatic Guitar Transcription
Containing both MIDI and tablature annotations for the source audio, GOAT lends itself to AGT very well. Section 5 details preliminary results for MIDI-based transcription and a newly proposed task of DadaGP token prediction for tablature transcription. We leave improvement on these baselines as future work.
6.2 Realistic Guitar Synthesis
GOAT provides numerous examples of different notes and playing techniques, with annotations for all of them. The data can be used to help train or fine-tune popular models such as RAVE [27] and DDSP [28]. It is also suitable for MIDI synthesis models such as [29] or even novel tasks such as synthesis from guitar tablatures.
6.3 Automatic Guitar Playing Technique Identification
Automatic guitar playing technique identification is another task that we believe can benefit from the availability of GOAT. For example, the authors in [30] propose ways of modelling bends in guitar tablatures. Moreover, while datasets exist for this type of work [31], GOAT has significantly more audio and contains both playing technique and note annotations. We hypothesise that this could potentially be combined with AGTT, further improving the capabilities of guitar transcription models.
6.4 Effect/Distortion Removal
The clean DI audio and amplifier pairs provide numerous examples for effect/distortion removal techniques [32] [33]. The data augmentation techniques can also be extended to other common effects (e.g. phaser, chorus).
7. CONCLUSION
We present GOAT, a vast dataset containing pairs of real guitar audio and guitar tablature annotations. This dataset is unique among guitar-focused datasets in that it uses digital representations of tablatures as the annotation for real audio, an annotation format which is well suited to capture guitar-specific metadata. A data augmentation strategy which offers a large amount of tonal variety is presented. We validate the use of the dataset and the augmentation strategy through a standard literature-based MIDI transcription experiment. The results show competitive performance as well as considerable improvement of generalisation when using data augmentation. We also show preliminary steps towards AGTT via DadaGP tokens, a task uniquely suited to GOAT. Finally, we propose prospective interesting applications and paths of research utilising GOAT including transcription, synthesis, playing technique identification and effect removal tasks. We hope that this dataset helps researchers further improve guitar-focused MIR research.
8. ETHICAL STATEMENT
We declare that, in the creation of the GOAT dataset, careful consideration was undertaken by the authors to ensure that the process follows the ethical guidelines supported by the ISMIR community. In this regard, we inform that (1) the content creators were made aware of the prospective applications and use cases of the data they provided, that (2) they agreed to such applications, and that (3) they were duly compensated for the data. However, due to the fact that the GOAT dataset contains covers (i.e. subjective renditions, performances and annotations/transcriptions) of popular songs, both in recorded audio and tablature formats, and because we do not exclusively own the copyrights of some of its content, we intend to make the dataset available for research purposes only, upon request.
9. ACKNOWLEDGEMENTS
This work is supported by the EPSRC UKRI Centre for Doctoral Training in Artificial Intelligence and Music (Grant no. EP/S022694/1) as well as UKRI Innovate UK (Project no. 10102804).
10. REFERENCES
[1] A. Wiggins and Y. E. Kim, “Guitar Tablature Estimation with a Convolutional Neural Network,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2019, pp. 284–291.
[2] X. Riley, D. Edwards, and S. Dixon, “High Resolution Guitar Transcription via Domain Adaptation,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 1051–1055.
[3] F. Cwitkowitz, T. Hirvonen, and A. Klapuri, “FretNet: Continuous-valued pitch contour streaming for polyphonic guitar tablature transcription,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
[4] X. Riley, Z. Guo, D. Edwards, S. Dixon et al., “GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model,” in 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024.
[5] J. Lo h, P. Sa men o, C. Ca , Z. Zukowski, and M. Ba -
he , “P ogGP: F om Gui a P o Tabla u e Neu al Gen-
e a ion To P og essi e Me al P oduc ion,” in 16 h In-
e na ional Symposium on Compu e Music Mul idis-
ciplina y Resea ch, Tokyo, Japan, 2023.
[6] “Sh edGP: Gui a is S yle-Condi ioned Tabla u e Gen-
e a ion,” in 16 h In e na ional Symposium on Com-
pu e Music Mul idisciplina y Resea ch, 2023.
[7] A. W igh , E.-P. Damskägg, L. Ju ela, and
V. Välimäki, “Real-Time Gui a Ampli ie Emula-
ion wi h Deep Lea ning,” Applied Sciences, ol. 10,
no. 3, p. 766, Jan. 2020. [Online]. A ailable:
h ps://www.mdpi.com/2076-3417/10/3/766
[8] A. W igh , V. Välimäki, and L. Ju ela, “Ad e sa -
ial gui a ampli ie modelling wi h unpai ed da a,” in
ICASSP 2023-2023 IEEE In e na ional Con e ence on
Acous ics, Speech and Signal P ocessing (ICASSP).
IEEE, 2023, pp. 1–5.
[9] Y.-H. Chen, Y.-T. Yeh, Y.-C. Cheng, J.-T. Wu, Y.-H.
Ho, J.-S. R. Jang, and Y.-H. Yang, “Towa ds Ze o-Sho
Ampli ie Modeling: One- o-many Ampli ie Model-
ing ia Tone Embedding Con ol,” P oceedings o he
In e na ional Socie y o Music In o ma ion Re ie al
Con e ence (ISMIR), 2024.
[10] Y.-H. Chen, W.-Y. Hsiao, T.-K. Hsieh, J.-S. R. Jang,
and Y.-H. Yang, “Towa ds Au oma ic T ansc ip ion
o Polyphonic Elec ic Gui a Music: A New Da ase
and a Mul i-loss T ans o me Model,” in IEEE In e -
na ional Con e ence on Acous ics, Speech and Signal
P ocessing (ICASSP). IEEE, 2022, pp. 786–790.
[11] J. Lo h, P. Sa men o, S. Sa ka , and M. Ba he , “Anal-
ysis o MIDI as Inpu Rep esen a ions o Gui a Syn-
hesis,” in DMRN+19: Digi al Music Resea ch Ne -
wo k One-day Wo kshop, 2024.
[12] P. Sa men o, A. Kuma , C. Ca , Z. Zukowski, M. Ba -
he , and Y.-H. Yang, “DadaGP: a Da ase o Tokenized
Gui a P o Songs o Sequence Models,” in P oceed-
ings o he In e na ional Socie y o Music In o ma ion
Re ie al Con e ence (ISMIR), 2021, pp. 610–618.
[13] Q. Xi, R. M. Bi ne , J. Pauwels, X. Ye, and J. P. Bello,
“Gui a Se : A Da ase o Gui a T ansc ip ion,” P o-
ceedings o he In e na ional Socie y o Music In o -
ma ion Re ie al Con e ence (ISMIR), 2018.
[14] Y. Zang, Y. Zhong, F. Cwitkowitz, and Z. Duan,
“SynthTab: Leveraging Synthesized Data for Guitar
Tablature Transcription,” in IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP),
2024, pp. 1286–1290.
[15] H. Pedroza, W. Abreu, R. M. Corey, and I. R. Roman,
“Leveraging Real Electric Guitar Tones and Effects to
Improve Robustness in Guitar Tablature Transcription
Modeling,” in DAFx, 2024.
[16] ——, “Guitar-TECHS: An Electric Guitar Dataset
Covering Techniques, Musical Excerpts, Chords and
Scales Using a Diverse Array of Hardware,” IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP), 2025.
[17] A. Radford, J. W. Kim, T. Xu, G. Brockman,
C. McLeavey, and I. Sutskever, “Robust Speech
Recognition via Large-Scale Weak Supervision,” in
International Conference on Machine Learning.
Proceedings of Machine Learning Research, 2023, pp.
28 492–28 518.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
[18] P. Sarmento, A. Kumar, Y.-H. Chen, C. Carr,
Z. Zukowski, and M. Barthet, “GTR-CTRL: Instrument
and Genre Conditioning for Guitar-Focused Music
Generation with Transformers,” in Proc. of the
EvoMUSART Conf. Springer, 2023, pp. 260–275.
[19] P. Sarmento, “Guitar Tablature Generation with Deep
Learning,” Ph.D. dissertation, Queen Mary University
of London, August 2024.
[20] D. Edwards, X. Riley, P. Sarmento, and S. Dixon,
“MIDI-to-Tab: Guitar Tablature Inference via Masked
Language Modeling,” in Proceedings of the International
Society for Music Information Retrieval Conference
(ISMIR), 2024.
[21] H. Kim, S. Choi, and J. Nam, “Expressive Acoustic
Guitar Sound Synthesis with an Instrument-Specific
Input Representation and Diffusion Outpainting,” in
IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), 2024, pp. 7620–7624.
[22] C. Kehling, J. Abeßer, C. Dittmar, and G. Schuller,
“Automatic tablature transcription of electric guitar
recordings by estimation of score- and instrument-related
parameters,” in DAFx, 2014, pp. 219–226.
[23] M. Stein, J. Abeßer, C. Dittmar, and G. Schuller,
“Automatic Detection of Audio Effects in Guitar and Bass
Recordings,” in Audio Engineering Society Convention
128. Audio Engineering Society, 2010.
[24] P. Sobot, “Pedalboard,” Jul. 2021. [Online]. Available:
https://doi.org/10.5281/zenodo.7817838
[25] Q. Kong, B. Li, X. Song, Y. Wan, and Y. Wang,
“High-Resolution Piano Transcription with Pedals by
Regressing Onset and Offset Times,” IEEE/ACM
Transactions on Audio, Speech and Language Processing,
vol. 29, pp. 3707–3717, 2021.
[26] D. Edwards, S. Dixon, E. Benetos, A. Maezawa, and
Y. Kusaka, “A Data-Driven Analysis of Robust Automatic
Piano Transcription,” IEEE Signal Process. Lett.,
vol. 31, pp. 681–685, 2024.
[27] A. Caillon and P. Esling, “RAVE: A variational
autoencoder for fast and high-quality neural audio
synthesis.” [Online]. Available:
http://arxiv.org/abs/2111.05011
[28] J. Engel, L. H. Hantrakul, C. Gu, and A. Roberts,
“DDSP: Differentiable Digital Signal Processing,” in
International Conference on Learning Representations
(ICLR), 2020.
[29] B. Maman, J. Zeitler, M. Müller, and A. H. Bermano,
“Performance conditioning for diffusion-based
multi-instrument music synthesis,” in ICASSP 2024-2024
IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP). IEEE, 2024, pp.
5045–5049.
[30] A. D’Hooge, L. Bigo, and K. Déguernel, “Modeling
bends in popular music guitar tablatures,” in 24th
International Society for Music Information Retrieval
Conference. Milan, Italy: International Society for Music
Information Retrieval, Nov 2023.
[31] Y.-P. Chen, L. Su, Y.-H. Yang et al., “Electric Guitar
Playing Technique Detection in Real-World Recording
Based on F0 Sequence Pattern Recognition,” in
Proceedings of the International Society for Music
Information Retrieval (ISMIR), 2015, pp. 708–714.
[32] M. Rice, C. J. Steinmetz, G. Fazekas, and J. D. Reiss,
“General Purpose Audio Effect Removal,” in IEEE
Workshop on Applications of Signal Processing to
Audio and Acoustics (WASPAA). IEEE, 2023, pp. 1–5.
[33] J. Imort, G. Fabbro, M. A. Martinez Ramirez,
S. Uhlich, Y. Koyama, and Y. Mitsufuji, “Distortion Audio
Effects: Learning How to Recover the Clean Signal,”
in Proceedings of the International Society for Music
Information Retrieval (ISMIR), 2022.