GOAT: A Large Dataset of Paired Guitar Audio Recordings and Tablatures

Author: Jackson Loth; Pedro Sarmento; Saurjya Sarkar; Zixun Guo; Mathieu Barthet; Mark Sandler

Publisher: Zenodo

DOI: 10.5281/zenodo.17706552

Source: https://zenodo.org/records/17706552/files/000076.pdf

GOAT: A LARGE DATASET OF PAIRED GUITAR AUDIO RECORDINGS
AND TABLATURES
Jackson Loth¹, Pedro Sarmento¹,², Saurjya Sarkar¹, Zixun Guo¹, Mathieu Barthet¹,³, and Mark Sandler¹
¹Centre for Digital Music, Queen Mary University of London
²Music.AI
³Aix-Marseille Univ, CNRS, PRISM
{j.j.loth, p.p.sarmento, saurjya.sarkar, zixun.guo, m.barthet, mark.sandler}@qmul.ac.uk
ABSTRACT
In recent years, the guitar has received increased attention from the music information retrieval (MIR) community, driven by the challenges posed by its diverse playing techniques and sonic characteristics. Progress, mainly fueled by deep learning approaches, has been limited by the scarcity of datasets and their limited annotations. To address this, we present the Guitar On Audio and Tablatures (GOAT) dataset, comprising 5.9 hours of unique high-quality direct input audio recordings of electric guitars, recorded on a variety of different guitars by different players. We also present an effective data augmentation strategy using guitar amplifiers which delivers near-unlimited tonal variety, of which we provide an initial 29.5 hours of audio. Each recording is annotated using guitar tablatures, a guitar-specific symbolic format supporting string and fret numbers, as well as numerous playing techniques. For this we utilise both the format of Guitar Pro, a software for tablature playback and editing, and a text-like token encoding. Furthermore, we present competitive results using GOAT for MIDI transcription and preliminary results for a novel approach to automatic guitar tablature transcription. We hope that GOAT opens up the possibility to train novel models on a wide variety of guitar-related MIR tasks, from synthesis to transcription to playing technique detection.
1. INTRODUCTION
The guitar is one of the most popular instruments in modern Western music. It is no surprise that guitar-centered research has received a lot of attention, particularly in the wake of advances in deep learning. Tasks such as automatic guitar transcription (AGT) [1] [2] [3] [4], tablature generation [5] [6], guitar amplifier modeling [7] [8] [9], and more have seen great progress. Despite this interest in guitars, data availability for guitar audio remains limited, particularly in the case of annotated audio recordings.
© J. Loth, P. Sarmento, S. Sarkar, Z. Guo, M. Barthet, and M. Sandler. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: J. Loth, P. Sarmento, S. Sarkar, Z. Guo, M. Barthet, and M. Sandler, “GOAT: A Large Dataset of Paired Guitar Audio Recordings and Tablatures”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
Musical Instrument Digital Interface (MIDI) is a well-known standard for representing music, and is a common choice when annotating the guitar [10] [4]. MIDI easily encodes essential parameters of musical notes such as pitch, note onset, duration and velocity. However, it does not give any indication of string or fret number, nor does it have any standardized way of representing the numerous expressive components of guitar playing [11]. It is based on a descriptive type of notation, in which symbols relate to pitch. Tablatures, however, are a prescriptive type of notation commonly used for guitar, in which symbols relate to actions, i.e. how to play said symbols [12]. While there are some guitar-focused datasets in the literature that offer tablature-like annotations [13] [14] [15] [16], they are typically limited to fret and string values, ignoring expressive techniques (e.g. bends, palm mutes, legatos).
In creating GOAT, the Guitar On Audio and Tablatures dataset, we seek to overcome the limitations of MIDI and instead prioritize building a dataset of guitar recordings annotated using guitar tablatures, a popular musical format among guitarists. For this we rely on Guitar Pro, a commercial software for tablature editing and playback that is widespread in the guitar community. This dataset consists of 5.9 hours of unique real audio recordings of guitars, fully annotated with fret/string numbers and expressive playing techniques. We also include the entire dataset rendered through a wide variety of different digital guitar amplifiers in various configurations and cabinet impulse responses, for a total of 29.5 hours of fully annotated audio. The annotations are provided in Guitar Pro format, as well as DadaGP [12] tokens, a compressed text-like representation of the information in the tablatures. In an attempt to bridge the gap with prior literature, we also include MIDI versions of the content.
This paper’s contributions include (1) an overview of the GOAT dataset and the data collection methodology; (2) a data augmentation strategy for obtaining a large amount of timbral variation; (3) an evaluation of results when using the GOAT dataset for the task of AGT, complemented with (4) preliminary results on a novel audio-to-text-based approach to AGT; and finally, (5) some suggestions for prospective applications using the GOAT dataset. In Section 2, we first cover some relevant background concerning guitar tablatures and similar previously released music datasets. In Section 3, we extensively describe the methodology used to compile the GOAT dataset. We then describe, in Section 4, the details of the GOAT dataset, the features it encompasses and the ones it lacks. Within Section 5 we present results from a use case of GOAT in the context of AGT, following both a traditional MIDI-based approach for comparison with previous results, and a novel procedure using the text-like token representations of GOAT to fine-tune a Whisper [17] model. Finally, Section 6 proposes additional applications for the dataset.
2. BACKGROUND
2.1 Guitar Tablatures
Guitar tablatures (see Figure 1), also called tabs, are symbolic representations of guitar music.
Figure 1. Example of two measures of a guitar tablature, from the Guitar Pro editing software, exemplifying different guitar playing techniques (namely bends, palm mutes, legatos and slides).
In contrast to MIDI, which simply represents note pitch over time, tabs represent both the fret and the string number of each note on a guitar. Tablatures can also support expressive playing techniques such as bends, hammer-ons, pull-offs, strum directions and more. Tablature is a very intuitive, visual representation of guitar music and is popular among guitarists for learning specific songs and performances. Guitar tabs have seen increased attention from researchers in the past few years in areas such as guitar tablature generation [12] [5] [6] [18] [19], automatic guitar transcription [1] [3], tablature prediction from MIDI [20] and acoustic guitar synthesis from tablatures [21].
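To make the contrast with MIDI concrete, the short sketch below (ours, not part of the GOAT tooling) maps a (string, fret) tab position to a MIDI pitch in standard tuning, and shows that the inverse mapping is one-to-many: a MIDI note keeps only the pitch, while a tab also fixes which of several positions is played. The function names and the 24-fret limit are assumptions made for illustration.

```python
# Open-string MIDI pitches in standard tuning (string 6 = low E2, string 1 = high E4).
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def tab_to_midi(string: int, fret: int, tuning: dict = STANDARD_TUNING) -> int:
    """MIDI pitch sounded by a single (string, fret) tablature position."""
    return tuning[string] + fret


def positions_for_pitch(pitch: int, max_fret: int = 24, tuning: dict = STANDARD_TUNING) -> list:
    """Every (string, fret) pair that sounds the given MIDI pitch."""
    return [
        (string, pitch - open_pitch)
        for string, open_pitch in sorted(tuning.items())
        if 0 <= pitch - open_pitch <= max_fret
    ]


print(tab_to_midi(5, 3))        # 48 (C3, third fret of the A string)
print(positions_for_pitch(64))  # six different positions all sounding E4
```

Going from tab to MIDI is a lookup, whereas going from MIDI back to tab requires choosing among candidate positions, which is why tablature annotations carry information that MIDI alone cannot.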
As previously discussed, representing tablatures is difficult as there is no standardised representation like MIDI. Some works use a JSON-like representation to encode information [13] [22]. DadaGP [12] also introduced an easy-to-parse text-based encoding which uses a simple vocabulary to describe strings, frets, and many guitar-specific expressive techniques.
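As an illustration of why a text encoding is easy to parse, the sketch below reads a simplified token stream into timed note events. The token spellings (`<track>:note:s<string>:f<fret>` for a note and `wait:<ticks>` to advance time) are an assumption modelled loosely on DadaGP rather than its full specification; the real vocabulary contains many more token types, which this sketch simply skips.

```python
def parse_tokens(tokens: list) -> list:
    """Convert a simplified DadaGP-style token list into (onset_tick, string, fret) events.

    Assumed token forms (illustrative only):
      '<track>:note:s<string>:f<fret>'  a note sounding at the current time
      'wait:<ticks>'                    advance the current time by <ticks>
    Any other token (headers, measure markers, effects) is ignored here.
    """
    events = []
    tick = 0
    for token in tokens:
        parts = token.split(":")
        if parts[0] == "wait":
            tick += int(parts[1])
        elif len(parts) == 4 and parts[1] == "note":
            string = int(parts[2][1:])  # drop the leading 's'
            fret = int(parts[3][1:])    # drop the leading 'f'
            events.append((tick, string, fret))
    return events


example = [
    "start", "new_measure",
    "distorted0:note:s6:f0", "distorted0:note:s5:f2", "distorted0:note:s4:f2",
    "wait:480",
    "distorted0:note:s6:f3", "nfx:palm_mute",
    "wait:480", "end",
]
print(parse_tokens(example))  # [(0, 6, 0), (0, 5, 2), (0, 4, 2), (480, 6, 3)]
```

All note tokens written before a `wait` share the same onset, so chords fall out of the encoding without any extra markup.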
2.2 Guitar-Focused Datasets
To facilitate guitar-focused research, a number of datasets have been compiled for various purposes, focusing mainly on the task of AGT, as can be seen in Table 1. Datasets such as GuitarSet [13], EGDB [10], GAPS [4] (which focuses exclusively on classical guitar recordings), IDMT-SMT-Guitar [22], FrançoisLeduc [2] and Guitar-TECHS [16] contain real guitar recordings with annotations of notes, typically in MIDI or tablature-like form. IDMT-SMT-Audio-Effects [23] focuses on guitar recordings with various audio effects. SynthTab [14] contains an extremely large collection of audio renditions of guitar tracks synthesised using a virtual instrument. Although its audio still sounds somewhat realistic, we do not include SynthTab in Table 1 due to its synthetic nature.

Dataset              Length (min)   MIDI   Tab
GAPS [4]                      843     ✓      ✗
GOAT [Ours]                   354     ✓      ✓
IDMT [22]                     340     ✗      ✓
Guitar-TECHS [16]             312     ✓      ✓
GuitarSet [13]                180     ✗      ✓
FrançoisLeduc [2]             240     ✓      ✗
EGDB [10]                     118     ✓      ✗
Table 1. Guitar-focused datasets in the literature. We report real audio content duration (in minutes), and the existence of annotations in MIDI and tablature format. We consider tablature format to be any kind of annotation containing string and fret information.
3. METHODOLOGY
In order to have as much flexibility as possible with the dataset, it was decided early on that the dataset would have the following qualities: (1) direct input (DI) recordings to allow for post-processing towards tone variety; (2) a 44.1 kHz sampling rate to ensure high-quality audio; (3) annotations in the form of guitar tablatures to capture as much detail about the playing as possible.
3.1 Data Collection
The data was produced by both the main authors and two third-party content creators. These creators delivered digital audio workstation (DAW) projects containing a pre-existing cover of a popular song and corresponding tablatures, from which the audio and tab pairs were extracted. This allowed us to quickly obtain a large amount of data. Each song was manually checked and aligned against the tablature to ensure that every note was correct between audio and tablature. If multiple guitar parts were present in a tab, they were split out into separate files. In addition to this, we recorded over an hour of additional audio to complement the dataset. This was done by collecting community-created tablatures¹ and recording the parts in each song exactly as described in the tablature.
3.2 Data Processing
In order to protect the work of the third-party creators, all song and artist names were removed from the file names and tablatures. The tablatures were all converted to the DadaGP [12] text encoding format due to its popularity in tablature-related tasks. MIDI was also generated from the tablatures using the Guitar Pro MIDI export function. While the MIDI annotations do not encode the expressive techniques included in the tabs, they allow the dataset to be used along with other datasets and models which were built for MIDI annotations, leveraging contributions from the literature. While the notes played in the audio files have small timing imperfections, the notes in the tablature annotations are all quantized to a grid. The alignment procedure from [4] was used to fine-align the MIDI notes to the performances. Both the fine-aligned and quantized MIDI are provided in the dataset. The tabs were also rendered using the Guitar Pro Realistic Sound Engine (RSE)² virtual instrument to create additional synthetic data.
¹ https://www.songsterr.com/
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
Figure 2. Note distribution over the fretboard (i.e. string and fret positions) for the content in the GOAT dataset.
3.3 Amplifier Rendering
The audio in GOAT is recorded from the raw output of an electric guitar pickup, also called direct input or DI. This is not a sound commonly heard in music, as the electric guitar sound most listeners are accustomed to comes from a DI run through a guitar amplifier and a guitar speaker cabinet connected in series. We collected DIs in order to maximize the potential for data augmentation via reamping. Reamping is a process in which a DI is run through an amplifier, and sometimes a full effects chain, to transform it into a desired final sound. This is common in music production when an engineer wants to capture a performance but not commit to a final guitar tone.
While it would be theoretically possible to reamp GOAT using a variety of real amplifiers and cabinets, this process is time-consuming and expensive. Instead, we can use high-quality digital amplifier models and impulse responses (IRs) of a guitar cabinet speaker to model the amplifier and speaker cabinet respectively. This approach was used by [10] and showed improvement in transcription with the increased tone variety, though the amplifier models used were of low quality and limited to just five digital amplifiers in total. We extend this idea by using higher-quality modern amplifier modeling software, as well as significantly increasing the variety of amplifiers and cabinets used. Through this method, we are able to bring the audio in GOAT much closer to the sound of real-world data, as well as significantly increase its timbral variety.
² https://www.guitar-pro.com/blog/p/14545-signature-sounds-explained-guitar-pro-7
For the amplifier, we used the Neural Amp Modeler³ (NAM), a high-quality digital amplifier plugin. NAM allows users to capture a ‘profile’ of a guitar amplifier, essentially a snapshot of the amplifier’s sound. A large community⁴ has grown out of users sharing their profiles online. Using publicly available NAM profiles, we curated a dataset of roughly 7,000 profiles from over 1,000 different amplifiers. A profile was chosen at random to render each audio item in the dataset. Cabinet IRs were taken from the NeuralDSP Archetype Nolly⁵ plugin. This plugin has cabinet IRs from several different microphones, with parameters to control the microphone height and distance from the speaker. The microphone and its parameters were chosen at random for each audio item in the dataset. A send to a room reverb was also enabled at random (with a 25% chance of being enabled) and the send level was randomised between −10 dB and 0 dB. A Python package called Pedalboard [24] was used to render the audio. This process was repeated 5 times over, resulting in 29.5 hours of fully annotated audio. This process can, however, be extended by choosing different post-processing approaches. The reamping code⁶ is made publicly available.
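The randomised rendering set-up described above can be summarised as a sampling routine. The sketch below only draws the configuration (which amplifier profile, which cabinet microphone, whether the reverb send is on and at what level); it does not load NAM profiles or IRs and does not call Pedalboard. The parameter names, the normalised 0 to 1 ranges for microphone height and distance, and the uniform sampling are assumptions; the released reamping code may differ.

```python
import random


def sample_render_config(profiles: list, mics: list, rng: random.Random) -> dict:
    """Draw one random amplifier/cabinet/reverb configuration (hypothetical names)."""
    reverb_on = rng.random() < 0.25  # room reverb send enabled with a 25% chance
    return {
        "amp_profile": rng.choice(profiles),    # one NAM profile, chosen uniformly
        "cab_mic": rng.choice(mics),            # one cabinet microphone/IR
        "mic_height": rng.uniform(0.0, 1.0),    # assumed normalised range
        "mic_distance": rng.uniform(0.0, 1.0),  # assumed normalised range
        # Send level drawn uniformly in [-10, 0] dB, only when the send is enabled.
        "reverb_send_db": rng.uniform(-10.0, 0.0) if reverb_on else None,
    }


def plan_renders(item_ids, profiles, mics, n_renders=5, seed=0) -> dict:
    """Independent configurations for each dataset item, n_renders per item."""
    rng = random.Random(seed)
    return {
        item: [sample_render_config(profiles, mics, rng) for _ in range(n_renders)]
        for item in item_ids
    }


plan = plan_renders(["item_001", "item_002"], ["profile_a", "profile_b"], ["mic_1", "mic_2"])
print(len(plan["item_001"]))  # 5 configurations per item
```

Drawing five independent configurations per item is what turns 5.9 hours of DI audio into roughly 5 × 5.9 = 29.5 hours of amplified audio; raising `n_renders` is one way the augmentation could be extended.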
4. DATASET
4.1 Statistical Overview
The finalised dataset contains 5.17 hours of DI recordings in standard tuning⁷ and 25.85 hours of amplified recordings, as well as an additional 43.85 minutes in non-standard tuning. This is broken into 153 (standard tuning) and 19 (non-standard tuning) individual dataset items.
For each data point in the GOAT dataset we provide the following: (1) the DI audio file (.wav); (2) the DI audio rendered through 5 different amplifiers and 5 different cabinets (.wav); (3) the corresponding Guitar Pro tablature (.gp and .gp5, the latter to allow for conversion into the DadaGP format); (4) the corresponding DadaGP text encoding of the Guitar Pro annotation (.txt); (5) an audio rendering made with the RSE virtual instrument in Guitar Pro (.wav); and finally, (6) the corresponding MIDI and fine-aligned MIDI annotations (.mid). A .csv file containing metadata and file paths is also provided.
³ https://www.neuralampmodeler.com/
⁴ https://tonehunt.org/
⁵ https://neuraldsp.com/plugins/archetype-nolly
⁶ https://github.com/JackJamesLoth/GOAT-Dataset
⁷ Standard tuning uses the open string pitches E, A, D, G, B, e.
Figure 3. Statistical information about the GOAT dataset. Histograms of tempo (in BPM) per song (a), time signatures per measure (b), most common note durations, in which a value of 960 ticks corresponds to a quarter note (c), and most frequent guitar playing techniques (d).
Guitar          Standard (min)   Other (min)
Stratocaster            208.82         26.15
Les Paul                  6.08          7.36
Jazzmaster               15.62         10.34
Strandberg               79.83             -
Table 2. Breakdown of audio content (in minutes) by type of guitar, for standard and alternative tunings (e.g. drop-D tunings, half-step or whole-step down-tunings).
Table 2 reports the amount of audio (in minutes), divided by the type of guitar used in the recordings. Overall, the dataset contains 172 distinct audio files, each with all the annotations and metadata described previously.
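As a consistency check, the per-guitar durations in Table 2 add up to the totals given at the start of this section, and multiplying by the five amplified renderings gives the amplified totals:

```latex
\begin{aligned}
\text{Standard tuning:}\quad & 208.82 + 6.08 + 15.62 + 79.83 = 310.35\ \text{min} \approx 5.17\ \text{h}\\
\text{Other tunings:}\quad & 26.15 + 7.36 + 10.34 = 43.85\ \text{min} \approx 0.73\ \text{h}\\
\text{All DI audio:}\quad & 5.17\ \text{h} + 0.73\ \text{h} \approx 5.9\ \text{h}, \qquad 153 + 19 = 172\ \text{items}\\
\text{Amplified (5 renders):}\quad & 5 \times 5.17\ \text{h} = 25.85\ \text{h (standard)}, \qquad 5 \times 5.9\ \text{h} = 29.5\ \text{h (all)}
\end{aligned}
```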
Figure 2 shows the note distribution across all the songs in GOAT, computed over a total of 109,869 played notes, with the frets and strings used for each.
Chord type       Number of instances
w/ 3 notes                     7,174
w/ 4 notes                     4,260
w/ 5 notes                     1,065
w/ 6 notes                     1,039
Table 3. Breakdown of the number of chords in the GOAT dataset by number of notes.
Statistics on the chords present in GOAT are shown in Table 3. Of the 13,538 chords in the dataset, most are played using only 3 notes. These include triads (major, minor, diminished, augmented), but mostly “power chords”, common in rock/metal sonorities, which consist of the root together with the fifth and the octave above.
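A histogram like Table 3 can be derived from note events by grouping notes that share an onset. The sketch below assumes (hypothetically) that a chord is any onset with at least three simultaneous notes on distinct strings within one track, and adds a check for the root, fifth and octave shape of a power chord in standard tuning; the exact counting rules behind Table 3 may differ.

```python
from collections import Counter, defaultdict

# Open-string MIDI pitches in standard tuning (string 6 = low E).
OPEN_STRINGS = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def chord_size_histogram(events: list, min_notes: int = 3) -> dict:
    """Count onsets by number of simultaneous notes; events are (tick, string, fret)."""
    strings_at = defaultdict(set)
    for tick, string, _fret in events:
        strings_at[tick].add(string)  # a string sounds at most one note at a time
    sizes = Counter(len(strings) for strings in strings_at.values())
    return {size: count for size, count in sorted(sizes.items()) if size >= min_notes}


def is_power_chord(notes: list) -> bool:
    """True if (string, fret) pairs form root, fifth and octave (0, 7, 12 semitones)."""
    pitches = sorted(OPEN_STRINGS[s] + f for s, f in notes)
    return [p - pitches[0] for p in pitches] == [0, 7, 12]


events = [(0, 6, 0), (0, 5, 2), (0, 4, 2),                   # E5 power chord
          (480, 5, 3), (480, 4, 2), (480, 3, 0), (480, 2, 1),  # C major shape (four strings)
          (960, 1, 0)]                                         # single note
print(chord_size_histogram(events))              # {3: 1, 4: 1}
print(is_power_chord([(6, 0), (5, 2), (4, 2)]))  # True
```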
Figure 3 shows histogram breakdowns of the musical content in the dataset. In (a) we observe the distribution of tempos, in BPM, of each song in the GOAT dataset. In terms of time signatures, in (b) we see a prevalence of 4/4 in most of the measures of the songs in the dataset. The statistics presented in (c) and (d) leverage the token format of the DadaGP annotations. Inspecting the GOAT dataset via this text-like format, we can produce a histogram of the most common note durations (see (c), in which a quarter note corresponds to 960 ticks, 480 ticks correspond to an eighth note, 240 ticks to a sixteenth note, 320 to an eighth note triplet, 720 ticks to a dotted eighth note, and so on), and of the most commonly used guitar playing techniques (d). Regarding the latter, we have highlighted the four most prevalent expressive techniques, but the dataset contains instances of others (e.g. tapping, vibrato), although with much smaller representation.
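The tick values quoted above all follow from a resolution of 960 ticks per quarter note. A small helper (a sketch, not part of the DadaGP tooling; the names are hypothetical) makes the arithmetic explicit by expressing durations as exact fractions of a whole note:

```python
from fractions import Fraction

TICKS_PER_QUARTER = 960  # quarter-note resolution quoted for the DadaGP annotations

# Common note values as fractions of a whole note.
NOTE_NAMES = {
    Fraction(1, 4): "quarter note",
    Fraction(1, 8): "eighth note",
    Fraction(1, 16): "sixteenth note",
    Fraction(1, 12): "eighth-note triplet",
    Fraction(3, 16): "dotted eighth note",
}


def ticks_to_fraction(ticks: int) -> Fraction:
    """Duration in ticks expressed as an exact fraction of a whole note."""
    return Fraction(ticks, 4 * TICKS_PER_QUARTER)


def name_duration(ticks: int) -> str:
    """Readable name for a tick duration, falling back to the raw fraction."""
    value = ticks_to_fraction(ticks)
    return NOTE_NAMES.get(value, f"{value} of a whole note")


for ticks in (960, 480, 240, 320, 720, 1920):
    print(ticks, name_duration(ticks))
```

Exact fractions avoid floating-point rounding, so a triplet (960 / 3 = 320 ticks) maps cleanly to 1/12 of a whole note.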
4.2 Distribution
We distribute the GOAT dataset on the Zenodo⁸ platform. The dataset is made available by request to better control its use for research purposes only.
4.3 What Is Missing?
As of this version, GOAT contains no key signature information as metadata. Moreover, akin to what the authors referred to in [12], Guitar Pro (up until its version 5) does not include note velocity information as in MIDI. Instead, Guitar Pro represents loudness between notes and musical phrases by using traditional dynamic markings (e.g. forte, piano). Thus, note-level dynamics are excluded from both the Guitar Pro and MIDI renditions of the GOAT dataset, and are contained exclusively in the real DI recordings. In terms of metadata, it would be interesting to provide a distribution of the types of chords in terms of harmonic quality (e.g. major, minor, seventh) as well as diagrams of fingerings, but that information is currently absent from what GOAT provides.
⁸ https://zenodo.org/records/15690894
5. EXPERIMENTS USING GOAT
In order to showcase the applicability of GOAT, we selected automatic music transcription as a use case. We evaluate GOAT on two tasks: transcription into MIDI format (here referred to as MIDI transcription), focusing only on capturing the notes’ pitch, duration and velocity information; and a newly proposed task of automatic guitar tablature transcription (AGTT) via DadaGP token prediction.
5.1 MIDI Transcription
While GOAT is not expressly a MIDI-focused dataset, MIDI transcription is still the most common baseline used in automatic guitar transcription (AGT) and is the method researchers will be most familiar with. Despite the previously discussed shortcomings of MIDI, it is still beneficial to evaluate GOAT on the task of MIDI transcription in order to better understand how the dataset compares to other similar datasets.
Following [4] [2], rather than training from scratch, we fine-tune a high-resolution transcription model [25] pretrained using extensively augmented data [26], as [2, 4] have demonstrated that fine-tuning yields superior performance. We use the same training setup as [4], where each training recording is segmented into 10-second chunks with a hop size of 1 second. During training, we apply data augmentation by randomly shifting the pitch by up to ±2 semitones.
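The segmentation and pitch-shift augmentation described above can be sketched as follows. This is a label-side illustration only: window start times for 10-second chunks with a 1-second hop, and a random transposition of note labels by up to ±2 semitones. The matching audio pitch shift would need a separate pitch-shifting routine, which is not shown; skipping recordings shorter than one window and the (onset, offset, pitch) note format are assumptions.

```python
import random


def window_starts(duration_s: float, chunk_s: float = 10.0, hop_s: float = 1.0) -> list:
    """Start times (s) of every full-length window, spaced hop_s apart."""
    if duration_s < chunk_s:
        return []  # assumption: recordings shorter than one window are skipped
    n_windows = int((duration_s - chunk_s) // hop_s) + 1
    return [i * hop_s for i in range(n_windows)]


def notes_in_window(notes: list, start_s: float, chunk_s: float = 10.0) -> list:
    """(onset, offset, pitch) notes starting inside the window, in window-relative time.

    Offsets are not clipped to the window end in this sketch.
    """
    return [(on - start_s, off - start_s, pitch)
            for on, off, pitch in notes
            if start_s <= on < start_s + chunk_s]


def random_transpose(notes: list, rng: random.Random, max_shift: int = 2) -> tuple:
    """Shift all note labels by one random whole number of semitones in [-max_shift, max_shift]."""
    shift = rng.randint(-max_shift, max_shift)  # randint includes both end points
    return shift, [(on, off, pitch + shift) for on, off, pitch in notes]


rng = random.Random(0)
notes = [(0.5, 1.0, 40), (10.5, 11.0, 45), (12.0, 13.0, 47)]
print(len(window_starts(25.0)))     # 16 windows: starts 0, 1, ..., 15 s
print(notes_in_window(notes, 2.0))  # [(8.5, 9.0, 45)]
print(random_transpose(notes, rng))
```

With a 1-second hop, consecutive windows overlap by 9 seconds, so each note appears in up to ten training examples at different positions within the window.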
We train three separate models using GOAT for our experiments. The models were trained on an NVIDIA A5000 GPU until convergence. The first uses only the raw DI recordings (DI). The next uses the dataset reamped once using our methodology discussed in Section 3.3 (AMP). Finally, we train a model using the dataset reamped twice (AMP-XL). This does not give the model more variety in MIDI annotations during training, but it does expose the model to more timbral variety. These models are then evaluated on the test split of GOAT (both DI and AMP) and on GuitarSet [13] (which contains only acoustic guitar). We choose to evaluate our models on GuitarSet to assess their zero-shot capabilities. This is done in order to test the following hypotheses: H1: data augmentation in the form of guitar amplifier reamping improves the generalizability of a transcription model on real data; H2: reamping the dataset multiple times to give additional timbral variety further improves the generalizability of a transcription model on real data. The MIDI transcription results are found in Table 4.
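For reference, note-level precision, recall and F1 count a predicted note as correct when it matches a not-yet-matched reference note. The sketch below uses onset-only matching with a 50 ms tolerance and exact pitch, plus a greedy nearest-onset assignment. This is a simplification (libraries such as mir_eval use an optimal bipartite matching and can also check offsets), and it is not claimed to reproduce the exact matching criteria behind Table 4.

```python
def note_prf(ref: list, est: list, onset_tol: float = 0.05) -> tuple:
    """Onset-only note-level precision, recall and F1.

    ref, est: lists of (onset_seconds, midi_pitch). A predicted note matches an
    unused reference note of the same pitch whose onset is within onset_tol.
    Greedy nearest-onset matching is a simplification of optimal matching.
    """
    used = set()  # indices of predicted notes already matched
    matched = 0
    for r_onset, r_pitch in sorted(ref):
        best = None  # (onset error, index) of the closest candidate so far
        for j, (e_onset, e_pitch) in enumerate(est):
            if j in used or e_pitch != r_pitch:
                continue
            err = abs(e_onset - r_onset)
            if err <= onset_tol and (best is None or err < best[0]):
                best = (err, j)
        if best is not None:
            used.add(best[1])
            matched += 1
    precision = matched / len(est) if est else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


reference = [(0.00, 40), (0.50, 45), (1.00, 50)]
predicted = [(0.01, 40), (0.52, 45), (1.20, 50), (1.50, 55)]
print(note_prf(reference, predicted))  # precision 0.5, recall 2/3, F1 4/7
```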
5.1.1 Discussion
All three models generally achieve fairly competitive results on the test datasets [2]. The AMP model obtains results comparable to the DI model on the DI and GuitarSet test splits (F1 of 0.843 vs. 0.820 and 0.843 vs. 0.860, respectively), while substantially outperforming it on the AMP test split (0.800 vs. 0.714), the latter being the closest to a real-world electric guitar transcription scenario. This is strong evidence for H1, suggesting that the amplifier data augmentation allows the model to generalize to different amplifier timbres without sacrificing quality in the case of DI or acoustic guitar. This improvement also surpasses the gains observed between models trained with DI and amplifier data in [10], further validating the effectiveness of our data augmentation strategy. Notably, our model trained solely on DI data achieves transcription performance on GuitarSet’s test set comparable to that reported in [2], further validating GOAT’s effectiveness for training AGT systems.

Training data   Test split   P       R       F
DI              DI           0.847   0.798   0.820
DI              AMP          0.771   0.677   0.714
DI              GuitarSet    0.879   0.849   0.860
AMP             DI           0.853   0.836   0.843
AMP             AMP          0.824   0.781   0.800
AMP             GuitarSet    0.882   0.815   0.843
AMP-XL          DI           0.831   0.809   0.818
AMP-XL          AMP          0.821   0.796   0.806
AMP-XL          GuitarSet    0.872   0.808   0.835
Table 4. MIDI transcription results. Metrics used are note-level precision (P), recall (R) and F1 score (F).
Interestingly, the AMP-XL model performs worse than both the DI and AMP models on the DI and GuitarSet test splits, while performing slightly better than AMP on the AMP test split (F1 of 0.806 vs. 0.800). This seems to suggest that the additional timbre variety only helps generalisation among amplifier timbres, while hurting generalisation to the unseen DI and acoustic guitar timbres. However, we posit that this still shows some evidence for H2, given that the AMP test split is the closest to a real-world electric guitar transcription use case.
5.2 Tablature Transcription with Whisper
Given our choice of Guitar Pro as the annotation format in GOAT, an obvious task for evaluating the dataset is AGTT. While this task has seen some previous work [1] [3], it is still conducted in a manner similar to MIDI transcription, by predicting string and fret activations at small time scales. We formulate the problem differently and propose a new task by instead using the DadaGP annotations as plain text and treating the problem as an audio-to-text problem. This approach means that we are not predicting exact note timings, but rather a more general, sheet-music-like interpretation of the input audio.
Training data   Loss    Accuracy   WER
DI              1.787   0.682      0.515
AMP             2.065   0.681      0.547
AMP-XL          2.285   0.692      0.518
Table 5. Evaluation results on the test split of GOAT from the best runs of the fine-tuned Whisper model.

For this task, we fine-tune Whisper [17], a large foundation model trained for automatic speech recognition. The audio in GOAT was split into 15-second chunks, within Whisper’s maximum audio input length of 30 seconds. Whisper also has a maximum sequence length of 448 tokens, so any chunks whose annotations exceeded 448 tokens were excluded. This meant we were only able to use roughly 4.65 hours of GOAT. A custom tokenizer was also created so that the model could correctly interpret and predict DadaGP tokens. The vocabulary used in DadaGP was simplified, removing tokens relating to artist and genre, and simplifying note tokens by removing timbre information.
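The chunking and length filter described above can be sketched as follows. It assumes non-overlapping 15-second chunks with a shorter final chunk, and that each chunk's target is a list of DadaGP tokens; the 448-token limit comes from the text, while counting Whisper's own special tokens against that budget is an assumption, exposed through the hypothetical `n_special` parameter.

```python
CHUNK_S = 15.0    # chunk length used for GOAT (Whisper accepts up to 30 s of audio)
MAX_TOKENS = 448  # Whisper's maximum decoder sequence length


def chunk_bounds(duration_s: float, chunk_s: float = CHUNK_S) -> list:
    """Non-overlapping (start, end) windows covering the recording."""
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    return bounds


def filter_chunks(chunks: list, max_tokens: int = MAX_TOKENS, n_special: int = 0) -> list:
    """Keep chunks whose target token sequence (plus special tokens) fits the decoder."""
    return [c for c in chunks if len(c["tokens"]) + n_special <= max_tokens]


def kept_hours(chunks: list) -> float:
    """Total audio duration, in hours, of the retained chunks."""
    return sum(c["end"] - c["start"] for c in chunks) / 3600.0


print(chunk_bounds(40.0))  # [(0.0, 15.0), (15.0, 30.0), (30.0, 40.0)]
```

Dense passages produce long token sequences, so a length filter of this kind removes the busiest chunks first, which is consistent with only part of the 5.9 hours (roughly 4.65 hours) remaining usable.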
In our experiments, we again train three different models using the same data setup as in Section 5.1. The models were trained on an NVIDIA A100 GPU until convergence. We use token accuracy and word error rate (WER) to evaluate performance. Preliminary results from the best-performing epochs are presented in Table 5.
5.2.1 Discussion
The transcription approach ultimately did not work well, though it does show some promise. DadaGP is a highly structured text encoding, and the biggest initial hurdle is learning this structure. For example, in a DadaGP encoding, notes are specified through one or more note tokens, followed by a wait token. Given the abundance of wait tokens in DadaGP encodings, the model can simply learn to output only these tokens to optimise the loss function. While example outputs of the fine-tuned models do show this behavior in parts of the prediction, the first half of the token predictions still tends to consist of semi-structured DadaGP encodings. Common note groups (e.g. simple chords) are predicted; however, these ultimately did not correlate with the notes in the ground-truth encodings. While the accuracy appears to be somewhat high, this is likely misleading due to the prevalence of wait tokens in both the dataset and the model outputs.
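Why accuracy can look deceptively good is easy to see on a toy example. The sketch below computes WER as the token-level Levenshtein distance divided by the reference length, and token accuracy as the fraction of positions where prediction and reference agree (an assumed definition; the paper does not spell out how accuracy was computed). The token spellings are simplified and hypothetical. A degenerate prediction made only of wait tokens already reaches 62.5% accuracy and a WER of 0.375 while transcribing no notes at all, which is why both figures need to be read against such a baseline.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete reference token r
                           cur[j - 1] + 1,           # insert hypothesis token h
                           prev[j - 1] + (r != h)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def wer(ref: list, hyp: list) -> float:
    """Token-level word error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def token_accuracy(ref: list, hyp: list) -> float:
    """Fraction of reference positions predicted exactly (assumed definition)."""
    return sum(r == h for r, h in zip(ref, hyp)) / len(ref)


reference = ["note:s6:f0", "wait:480", "note:s5:f2", "wait:480",
             "wait:480", "wait:480", "note:s4:f2", "wait:480"]
only_waits = ["wait:480"] * len(reference)
print(token_accuracy(reference, only_waits))  # 0.625
print(wer(reference, only_waits))             # 0.375
```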
It seems likely that more fine-tuning and more data are needed to obtain a workable transcription model. Training on the AMP-XL data resulted in a small improvement in accuracy (0.692 vs. 0.682 for DI) and a lower WER than AMP (0.518 vs. 0.547), suggesting that additional data could be beneficial. A pretraining step using synthetic data, such as SynthTab [14], could be a feasible way to collect enough data for the model to properly learn the DadaGP structure. This could then be followed by a further fine-tuning step on GOAT to expose the model to a wide range of timbres and real guitar playing. However, we leave these improvements to future work. We include these limited results in order to show the potential of this approach and to showcase the benefit of a dataset like GOAT, which is unique in being the only non-synthetic dataset of paired guitar audio and tablatures.
6. PROSPECTIVE USE CASES
We envision GOAT as a general-purpose dataset that can be useful for a number of different tasks. While we propose these tasks as starting points for working with this dataset, we hope that the dataset finds uses in other unique and interesting tasks as well.
6.1 Automatic Guitar Transcription
Containing both MIDI and tablature annotations of the source audio, GOAT lends itself very well to AGT. Section 5 details preliminary results for MIDI-based transcription and for a newly proposed task of DadaGP token prediction for tablature transcription. We leave improvement on these baselines as future work.
6.2 Realistic Guitar Synthesis
GOAT provides numerous examples of different notes and playing techniques, with annotations for all of them. The data can be used to help train or fine-tune popular models such as RAVE [27] and DDSP [28]. It is also suitable for MIDI synthesis models such as [29], or even novel tasks such as synthesis from guitar tablatures.
6.3 Automatic Guitar Playing Technique Identification
Automatic guitar playing technique identification is another task that we believe can benefit from the availability of GOAT. For example, the authors in [30] propose ways of modelling bends in guitar tablatures. Moreover, while datasets exist for this type of work [31], GOAT has significantly more audio and contains both playing technique and note annotations. We hypothesise that this could potentially be combined with AGTT, further improving the capabilities of guitar transcription models.
6.4 Effect/Distortion Removal
The clean DI audio and amplified pairs provide numerous training examples for effect/distortion removal techniques [32] [33]. The data augmentation techniques can also be extended to other common effects (e.g. phaser, chorus).
7. CONCLUSION
We present GOAT, a vast dataset containing pairs of real guitar audio and guitar tablature annotations. This dataset is unique among guitar-focused datasets in that it uses digital representations of tablatures as the annotation of real audio, an annotation format which is well suited to capturing guitar-specific metadata. A data augmentation strategy which offers a large amount of tonal variety is presented. We validate the use of the dataset and the augmentation strategy through a standard, literature-based MIDI transcription experiment. The results show competitive performance as well as considerable improvement in generalisation when using data augmentation. We also show preliminary steps towards AGTT via DadaGP tokens, a task uniquely suited to GOAT. Finally, we propose interesting prospective applications and paths of research utilising GOAT, including transcription, synthesis, playing technique identification and effect removal tasks. We hope that this dataset helps researchers further improve guitar-focused MIR research.
8. ETHICAL STATEMENT
We declare that, in the creation of the GOAT dataset, careful consideration was undertaken by the authors to ensure that the process follows the ethical guidelines supported by the ISMIR community. In this regard, we confirm that (1) the content creators were made aware of the prospective applications and use cases of the data they provided, that (2) they agreed to such applications, and that (3) they were duly compensated for the data. However, because the GOAT dataset contains covers (i.e. subjective renditions, performances and annotations/transcriptions) of popular songs, both in recorded audio and tablature formats, and because we do not exclusively own the copyrights of some of its content, we intend to make the dataset available for research purposes only, upon request.
9. ACKNOWLEDGEMENTS
This work is supported by the EPSRC UKRI Centre for Doctoral Training in Artificial Intelligence and Music (Grant no. EP/S022694/1) as well as UKRI Innovate UK (Project no. 10102804).
10. REFERENCES
[1] A. Wiggins and Y. E. Kim, “Gui a Tabla u e Es ima-
ion wi h a Con olu ional Neu al Ne wo k,” in P o-
ceedings o he In e na ional Socie y o Music In o -
ma ion Re ie al Con e ence (ISMIR), 2019, pp. 284–
291.
[2] X. Riley, D. Edwa ds, and S. Dixon, “High Resolu ion
Gui a T ansc ip ion ia Domain Ddap a ion,” in In e -
na ional Con e ence on Acous ics, Speech and Signal
P ocessing (ICASSP). IEEE, 2024, pp. 1051–1055.
[3] F. Cwi kowi z, T. Hi onen, and A. Klapu i, “F e Ne :
Con inuous- alued pi ch con ou s eaming o poly-
phonic gui a abla u e ansc ip ion,” in IEEE In e -
na ional Con e ence on Acous ics, Speech and Signal
P ocessing (ICASSP). IEEE, 2023, pp. 1–5.
[4] J. Riley, Z. Guo, D. Edwa ds, S. Dixon e al., “GAPS:
A La ge and Di e se Classical Gui a Da ase and
Benchma k T ansc ip ion Model,” in 25 h In e na-
ional Socie y o Music In o ma ion Re ie al (ISMIR)
Con e ence, 2024.
[5] J. Lo h, P. Sa men o, C. Ca , Z. Zukowski, and M. Ba -
he , “P ogGP: F om Gui a P o Tabla u e Neu al Gen-
e a ion To P og essi e Me al P oduc ion,” in 16 h In-
e na ional Symposium on Compu e Music Mul idis-
ciplina y Resea ch, Tokyo, Japan, 2023.
[6] “Sh edGP: Gui a is S yle-Condi ioned Tabla u e Gen-
e a ion,” in 16 h In e na ional Symposium on Com-
pu e Music Mul idisciplina y Resea ch, 2023.
[7] A. W igh , E.-P. Damskägg, L. Ju ela, and
V. Välimäki, “Real-Time Gui a Ampli ie Emula-
ion wi h Deep Lea ning,” Applied Sciences, ol. 10,
no. 3, p. 766, Jan. 2020. [Online]. A ailable:
h ps://www.mdpi.com/2076-3417/10/3/766
[8] A. W igh , V. Välimäki, and L. Ju ela, “Ad e sa -
ial gui a ampli ie modelling wi h unpai ed da a,” in
ICASSP 2023-2023 IEEE In e na ional Con e ence on
Acous ics, Speech and Signal P ocessing (ICASSP).
IEEE, 2023, pp. 1–5.
[9] Y.-H. Chen, Y.-T. Yeh, Y.-C. Cheng, J.-T. Wu, Y.-H.
Ho, J.-S. R. Jang, and Y.-H. Yang, “Towa ds Ze o-Sho
Ampli ie Modeling: One- o-many Ampli ie Model-
ing ia Tone Embedding Con ol,” P oceedings o he
In e na ional Socie y o Music In o ma ion Re ie al
Con e ence (ISMIR), 2024.
[10] Y.-H. Chen, W.-Y. Hsiao, T.-K. Hsieh, J.-S. R. Jang,
and Y.-H. Yang, “Towa ds Au oma ic T ansc ip ion
o Polyphonic Elec ic Gui a Music: A New Da ase
and a Mul i-loss T ans o me Model,” in IEEE In e -
na ional Con e ence on Acous ics, Speech and Signal
P ocessing (ICASSP). IEEE, 2022, pp. 786–790.
[11] J. Lo h, P. Sa men o, S. Sa ka , and M. Ba he , “Anal-
ysis o MIDI as Inpu Rep esen a ions o Gui a Syn-
hesis,” in DMRN+19: Digi al Music Resea ch Ne -
wo k One-day Wo kshop, 2024.
[12] P. Sa men o, A. Kuma , C. Ca , Z. Zukowski, M. Ba -
he , and Y.-H. Yang, “DadaGP: a Da ase o Tokenized
Gui a P o Songs o Sequence Models,” in P oceed-
ings o he In e na ional Socie y o Music In o ma ion
Re ie al Con e ence (ISMIR), 2021, pp. 610–618.
[13] Q. Xi, R. M. Bittner, J. Pauwels, X. Ye, and J. P. Bello, “GuitarSet: A Dataset for Guitar Transcription,” Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2018.
[14] Y. Zang, Y. Zhong, F. Cwitkowitz, and Z. Duan, “SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 1286–1290.
[15] H. Pedroza, W. Abreu, R. M. Corey, and I. R. Roman, “Leveraging Real Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling,” in DAFx, 2024.
[16] H. Pedroza, W. Abreu, R. M. Corey, and I. R. Roman, “Guitar-TECHS: An Electric Guitar Dataset Covering Techniques, Musical Excerpts, Chords and Scales Using a Diverse Array of Hardware,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.
[17] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust Speech Recognition via Large-Scale Weak Supervision,” in International Conference on Machine Learning. Proceedings of Machine Learning Research, 2023, pp. 28492–28518.
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
[18] P. Sarmento, A. Kumar, Y.-H. Chen, C. Carr, Z. Zukowski, and M. Barthet, “GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers,” in Proc. of the EvoMUSART Conf. Springer, 2023, pp. 260–275.
[19] P. Sarmento, “Guitar Tablature Generation with Deep Learning,” Ph.D. dissertation, Queen Mary University of London, August 2024.
[20] D. Edwards, X. Riley, P. Sarmento, and S. Dixon, “MIDI-to-Tab: Guitar Tablature Inference via Masked Language Modeling,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2024.
[21] H. Kim, S. Choi, and J. Nam, “Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 7620–7624.
[22] C. Kehling, J. Abeßer, C. Dittmar, and G. Schuller, “Automatic tablature transcription of electric guitar recordings by estimation of score- and instrument-related parameters,” in DAFx, 2014, pp. 219–226.
[23] M. Stein, J. Abeßer, C. Dittmar, and G. Schuller, “Automatic Detection of Audio Effects in Guitar and Bass Recordings,” in Audio Engineering Society Convention 128. Audio Engineering Society, 2010.
[24] P. Sobot, “Pedalboard,” Jul. 2021. [Online]. Available: https://doi.org/10.5281/zenodo.7817838
[25] Q. Kong, B. Li, X. Song, Y. Wan, and Y. Wang, “High-Resolution Piano Transcription with Pedals by Regressing Onset and Offset Times,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 29, pp. 3707–3717, 2021.
[26] D. Edwards, S. Dixon, E. Benetos, A. Maezawa, and Y. Kusaka, “A Data-Driven Analysis of Robust Automatic Piano Transcription,” IEEE Signal Process. Lett., vol. 31, pp. 681–685, 2024.
[27] A. Caillon and P. Esling, “RAVE: A variational autoencoder for fast and high-quality neural audio synthesis.” [Online]. Available: http://arxiv.org/abs/2111.05011
[28] J. Engel, L. H. Hantrakul, C. Gu, and A. Roberts, “DDSP: Differentiable Digital Signal Processing,” in International Conference on Learning Representations (ICLR), 2020.
[29] B. Maman, J. Zeitler, M. Müller, and A. H. Bermano, “Performance conditioning for diffusion-based multi-instrument music synthesis,” in ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 5045–5049.
[30] A. D’Hooge, L. Bigo, and K. Déguernel, “Modeling bends in popular music guitar tablatures,” in 24th International Society for Music Information Retrieval Conference. Milan, Italy: International Society for Music Information Retrieval, Nov. 2023.
[31] Y.-P. Chen, L. Su, Y.-H. Yang et al., “Electric Guitar Playing Technique Detection in Real-World Recording Based on F0 Sequence Pattern Recognition,” in Proceedings of the International Society for Music Information Retrieval (ISMIR), 2015, pp. 708–714.
[32] M. Rice, C. J. Steinmetz, G. Fazekas, and J. D. Reiss, “General Purpose Audio Effect Removal,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2023, pp. 1–5.
[33] J. Imort, G. Fabbro, M. A. Martinez Ramirez, S. Uhlich, Y. Koyama, and Y. Mitsufuji, “Distortion Audio Effects: Learning How to Recover the Clean Signal,” in Proceedings of the International Society for Music Information Retrieval (ISMIR), 2022.