MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling

Author: Jingjing Tang; Xin Wang; Zhe Zhang; Junichi Yamagishi; Geraint Wiggins; George Fazekas

Publisher: Zenodo

DOI: 10.5281/zenodo.17706539

Source: https://zenodo.org/records/17706539/files/000072.pdf

MIDI-VALLE: IMPROVING EXPRESSIVE PIANO PERFORMANCE
SYNTHESIS THROUGH NEURAL CODEC LANGUAGE MODELLING
Jingjing Tang¹  Xin Wang²  Zhe Zhang²
Junichi Yamagishi²  Geraint Wiggins¹,³  György Fazekas¹
¹ Centre for Digital Music, Queen Mary University of London, UK
² National Institute of Informatics, Japan
³ Vrije Universiteit Brussel, Belgium
ABSTRACT
Generating expressive audio performances from music scores requires models to capture both instrument acoustics and human interpretation. Traditional music performance synthesis pipelines follow a two-stage approach, first generating expressive performance MIDI from a score, then synthesising the MIDI into audio. However, the synthesis models often struggle to generalise across diverse MIDI sources, musical styles, and recording environments. To address these challenges, we propose MIDI-VALLE, a neural codec language model adapted from the VALLE framework, which was originally designed for zero-shot personalised text-to-speech (TTS) synthesis. For performance MIDI-to-audio synthesis, we improve the architecture to condition on a reference audio performance and its corresponding MIDI. Unlike previous TTS-based systems that rely on piano rolls, MIDI-VALLE encodes both MIDI and audio as discrete tokens, facilitating a more consistent and robust modelling of piano performances. Furthermore, the model’s generalisation ability is enhanced by training on an extensive and diverse piano performance dataset. Evaluation results show that MIDI-VALLE significantly outperforms a state-of-the-art baseline, achieving over 75% lower Fréchet Audio Distance on the ATEPP and Maestro datasets. In the listening test, MIDI-VALLE received 202 votes compared to 58 for the baseline, demonstrating improved synthesis quality and generalisation across diverse performance MIDI inputs.
1. INTRODUCTION
Music performance synthesis (MPS) refers to the process of generating expressive audio performances from music scores. This task requires models to capture acoustic characteristics of musical instruments and infuse human-like expressiveness into music scores. While considerable progress has been made in modelling these aspects separately, an effective MPS system is expected to integrate both dimensions to achieve high-quality synthesis.

© J. Tang, X. Wang, Z. Zhang, J. Yamagishi, G. Wiggins and G. Fazekas. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: J. Tang, X. Wang, Z. Zhang, J. Yamagishi, G. Wiggins and G. Fazekas, “MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.

A common approach to MPS involves a two-stage pipeline consisting of an expressive performance rendering (EPR) model, which generates expressive performance MIDI from a score, and an expressive performance synthesis (EPS) model, which converts performance MIDI into audio [1–3]. In recent work on developing EPS models [2–5], the task has been recognised as analogous to speech synthesis, as both generate audio from symbolic representations. This parallel motivated researchers to apply advanced techniques from the text-to-speech (TTS) domain to address the challenges in EPS. Previous studies have demonstrated the effectiveness of TTS techniques, such as WaveNet [6] and acoustical models with vocoders [2,4,5], in synthesising performance MIDI to audio. However, due to limited training data diversity and constrained architecture design, these models struggle to generalise across acoustic environments and timbre variations, limiting the expressiveness and realism of their outputs. Moreover, when integrating these EPS systems with EPR models, discrepancies in the way EPR and EPS models process and represent MIDI data introduce inconsistencies. These discrepancies often result in the loss of fine-grained temporal details, leading to reduced synthesis quality.
To address these limitations, we introduce a novel EPS model, MIDI-VALLE¹, adapted from VALLE, a state-of-the-art TTS framework for zero-shot personalised speech synthesis [7]. The VALLE model conditions synthesis on speaker-specific audio prompts, enabling zero-shot adaptation to unseen speakers. We optimise this architecture for performance MIDI-to-audio synthesis by conditioning on a reference audio performance and its corresponding MIDI representation. Instead of using the Maestro dataset [6], which contains recorded performance MIDI and audio pairs, we train MIDI-VALLE on ATEPP [8], a larger and more diverse dataset comprising transcribed performance MIDI and audio pairs. This allows the model to learn from a broader range of musical expressions, improving generalisation across unseen MIDI sources, composition styles, and recording environments. As demonstrated by both objective and subjective evaluation results, MIDI-VALLE shows enhanced adaptability and robustness in handling diverse performance inputs compared to previous state-of-the-art EPS models.

¹ Demo and code are available at https://tangjjbetsy.github.io/MIDI-VALLE/

Moreover, to fully leverage neural codec language modelling, we tokenise performance MIDI and audio using the Octuple MIDI tokenisation method [9] and a high-fidelity audio codec model [10], which ensures accurate reconstruction from audio tokens. Compared to traditional piano-roll and spectrogram representations, this discrete token-based approach ensures a more consistent alignment between MIDI and audio. The results from the listening test demonstrate that MIDI-VALLE, when integrated with different EPR models in a two-stage MPS pipeline, provides a more robust and adaptable synthesis framework.
2. RELATED WORKS
2.1 Expressive Performance Synthesis
In the EPS domain, several studies have explored various approaches to MPS, including DDSP-based modelling [1, 11] and TTS-inspired models [2–5]. These TTS-inspired models typically process piano performance MIDIs as piano rolls for audio synthesis. Hawthorne et al. [6] employed WaveNet to map piano rolls directly to waveforms. More recent works [2–5] adapted transformer-based TTS models [12, 13] to first convert piano rolls into intermediate acoustic representations, such as spectrograms. These representations were subsequently transformed into waveforms using vocoders like HiFi-GAN [14]. These EPS models were mainly trained on the Maestro dataset [6], which consists of recorded MIDI and audio pairs from piano competitions. Although the dataset includes performances of diverse compositions, they were recorded in a relatively homogeneous acoustic environment. This lack of acoustic variety limits the ability of models trained on the dataset to generalise to more varied acoustic conditions. Tang et al. [3] attempted to fine-tune a state-of-the-art model [5] using the ATEPP [8] dataset, which features recordings captured in a broader range of acoustic settings. However, the fine-tuned model still struggled to produce consistent ambient sounds, applying mismatched or inconsistent room reverberation and background noise.
A key challenge in creating a two-stage pipeline for MPS is the difference in MIDI representations used by EPR and EPS models, particularly in temporal information. EPS models typically use piano-roll representations, while EPR models either tokenise MIDI [3, 15, 16] into discrete events or encode continuous features [17–19] like timing and velocity. These differences complicate MIDI conversion between stages, especially with note timing and pedal treatment. Consequently, performance MIDI generated from EPR models differs significantly from the performance MIDI used in EPS models, making direct integration impractical without additional fine-tuning. More details are discussed in Section 5.2 and Section 6.
2.2 Neural Codec Language Modelling for Audio Generation
Recent advances in audio and music generation have leveraged neural codec language models to address the challenges of generalising across diverse acoustics and music styles. The state-of-the-art text-to-audio [20] and text-to-music [21, 22] models use codec models like Encodec [10] and SoundStream [23] to compress audio into discrete tokens, enabling more efficient training on large-scale datasets by reducing computational costs. In the TTS domain, the VALLE model [7], inspired by AudioLM [20], uses Encodec to synthesise high-quality speech while preserving speaker-specific features. By replacing mel-spectrograms with compressed audio codec tokens, VALLE formulates TTS as conditional codec language modelling. This enables effective zero-shot timbre adaptation and preservation of speaker emotion and the acoustic environment encoded in the reference prompt. Building on this approach, we extend codec language modelling to piano performance synthesis by tokenising both performance MIDI and audio, demonstrating the effectiveness of codec language modelling in synthesising expressive piano performances.
3. MIDI-VALLE FOR PIANO SYNTHESIS
Our MIDI-VALLE model focuses on performance MIDI-to-audio synthesis, drawing parallels to text-to-speech synthesis by VALLE. The following sections discuss the tokenisation strategies and key architectural differences between MIDI-VALLE and VALLE, highlighting the similarities and distinctions between speech and music synthesis.
3.1 Tokenisation
3.1.1 Audio Tokenisation
Instead of the original Encodec model [10], we follow the audio tokenisation approach applied in MusicGen [21]. Specifically, we fine-tune a four-level residual vector quantisation (RVQ) [24] to generate four codebooks that represent the audio samples. In RVQ, each quantiser encodes the residual error from the previous one, creating interdependencies among the codebooks. As observed in [10,25], the first codebook encodes the primary acoustic information, while the subsequent codebooks refine the output by modelling finer details. The fine-tuned codec, Piano-Encodec, converts audio performances into discrete tokens while preserving high-fidelity acoustics and timbral characteristics. The decoder then reconstructs the audio from these tokens.
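The residual scheme described above can be illustrated with a small sketch. It is not the Piano-Encodec implementation: the codebooks are random rather than learned, and the names `rvq_encode`, `rvq_decode`, and the 8-dimensional frame are assumptions made purely for illustration. Only the four levels and the codebook size of 2048 come from the paper.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantise vector x level by level; return one code index per codebook."""
    residual = x.astype(float)
    indices = []
    for cb in codebooks:                          # cb: (codebook_size, dim)
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))               # nearest code vector
        indices.append(idx)
        residual = residual - cb[idx]             # next level models what is left
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected code vectors across levels."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

rng = np.random.default_rng(0)
dim, levels, size = 8, 4, 2048                    # dim is an illustrative assumption
# Random codebooks with shrinking scale stand in for learned ones.
codebooks = [rng.normal(scale=0.5 ** k, size=(size, dim)) for k in range(levels)]
frame = rng.normal(size=dim)                      # one hypothetical latent frame
codes = rvq_encode(frame, codebooks)
approx = rvq_decode(codes, codebooks)
```

Because each level only sees the error left by the previous ones, the first index carries most of the information and later indices add refinements, which is the interdependency the text refers to.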
3.1.2 MIDI Tokenisation
Classical piano music and speech differ significantly in complexity and structure. Classical music features intricate frequency patterns and precise timing, making segmentation challenging due to issues with note separation, timing accuracy, and managing prolonged note durations caused by pedalling. In contrast, speech is simpler, with clear segmentation based on phoneme boundaries and greater tolerance to timing variations.

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025

Figure 1: Overview of the MIDI-VALLE architecture, adapted from VALLE [7]. Audio prompt is a 3-second segment selected from a reference performance. The text prompt in VALLE is replaced with the corresponding MIDI prompt concatenated with the target MIDI for synthesis.

                 Music                    Speech
Pitch  Vel   Dur   IOI   Pos   Bar
   92   68  1156   772   388    20           512
Table 1: Vocabulary sizes of musical features and speech.

We employ the Octuple MIDI tokenisation method [9], as utilised in [3], to achieve a consistent and discrete representation of piano performances within the EPR and EPS systems. Unlike methods such as Compound Word [26] or REMI [27], the Octuple approach uses distinct vocabularies for each musical feature, enabling note-wise encoding and resulting in a K×N array (number of features × number of notes). This method reduces vocabulary size and structural complexity, resulting in shorter token sequences without needing to group note features [9]. We extend the Octuple method by tokenising the inter-onset interval (IOI) to capture onset timing differences between consecutive notes. The MIDI tokenisation offers advantages over the piano-roll representation used in prior studies [3–5, 11]. Piano-roll encodes only note onsets and durations on a fixed temporal grid, lacking the resolution and flexibility to capture subtle timing variations that significantly influence articulation.
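As a rough illustration of note-wise encoding with an added IOI feature, the sketch below builds a K×N array from a list of notes. The time resolution `TICK`, the fixed bar length `BAR_LEN`, and the `Note` and `tokenise` names are hypothetical; the paper's actual quantisation grid and bar/position derivation are not specified here and may differ.

```python
from dataclasses import dataclass

@dataclass
class Note:
    onset: float      # seconds
    duration: float   # seconds
    pitch: int        # MIDI pitch number
    velocity: int     # MIDI velocity

TICK = 0.01           # assumed time resolution in seconds (hypothetical)
BAR_LEN = 2.0         # assumed fixed bar length in seconds (hypothetical)
FEATURES = ("pitch", "vel", "dur", "ioi", "pos", "bar")

def tokenise(notes):
    """Return a K x N list of lists: one row per feature, one column per note."""
    notes = sorted(notes, key=lambda n: (n.onset, n.pitch))
    rows = {k: [] for k in FEATURES}
    prev_onset = notes[0].onset if notes else 0.0
    for n in notes:
        rows["pitch"].append(n.pitch)
        rows["vel"].append(n.velocity)
        rows["dur"].append(round(n.duration / TICK))
        # IOI: onset difference to the previous note (0 for the first note)
        rows["ioi"].append(round((n.onset - prev_onset) / TICK))
        bar, pos = divmod(n.onset, BAR_LEN)
        rows["bar"].append(int(bar))
        rows["pos"].append(round(pos / TICK))
        prev_onset = n.onset
    return [rows[k] for k in FEATURES]

notes = [Note(0.00, 0.50, 60, 70), Note(0.26, 0.40, 64, 64), Note(2.10, 1.00, 67, 80)]
tokens = tokenise(notes)
```

Each column describes one note, so the sequence length equals the number of notes, in contrast to a piano roll whose length grows with the number of time frames.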
Table 1 illustrates the structural and representational differences between MIDI and text tokens. MIDI tokens comprise multiple sequences that encode musical features such as pitch, velocity (Vel), duration (Dur), inter-onset interval (IOI), position (Pos), and bar, explicitly representing timing information. These sequences are processed through different embedding layers and concatenated for embedding pooling [26]. In contrast, speech text is tokenised by a single sequence of integers representing phonemes, with timing implicitly conveyed through token order. These differences in token representation are critical to the successful training of MIDI-VALLE.
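The embedding pooling step can be sketched as one lookup table per feature, followed by concatenation and a linear projection to the model width. The vocabulary sizes are taken from Table 1 and the width of 1024 from the model description; the per-feature embedding widths, the random weights, and the `embed_notes` name are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Vocabulary sizes from Table 1; the embedding widths are assumed values.
vocab = {"pitch": 92, "vel": 68, "dur": 1156, "ioi": 772, "pos": 388, "bar": 20}
emb_dim = {"pitch": 64, "vel": 32, "dur": 64, "ioi": 64, "pos": 32, "bar": 16}
d_model = 1024

tables = {k: rng.normal(size=(vocab[k], emb_dim[k])) for k in vocab}
proj = rng.normal(size=(sum(emb_dim.values()), d_model)) * 0.02

def embed_notes(tokens):
    """tokens: dict feature -> list of N ints. Returns an (N, d_model) array."""
    parts = [tables[k][np.asarray(tokens[k])] for k in vocab]   # one lookup per feature
    concat = np.concatenate(parts, axis=-1)                     # (N, sum of widths)
    return concat @ proj                                        # pooled to model width

tokens = {"pitch": [60, 64], "vel": [50, 40], "dur": [50, 40],
          "ioi": [0, 26], "pos": [0, 26], "bar": [0, 0]}
x = embed_notes(tokens)
```

The result has one vector per note, which is what lets a multi-feature MIDI sequence take the place of the single phoneme sequence used in TTS.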
3.2 Model Design
Unlike VALLE, MIDI-VALLE is designed to preserve the timbral and acoustic characteristics of the reference piano performance. Given a piano performance dataset $D_p = \{x_i, y_i\}$, where $x = \{x_0, x_1, \ldots, x_L\}$ is a MIDI token sequence and $y$ is the corresponding audio segment, the audio is encoded into discrete acoustic codes using the pre-trained Piano-Encodec model: $\mathrm{encodec}(y) = C^{T \times 4}$, where $C$ is a two-dimensional codec matrix representation and $T$ is the codec sequence length. During training, the model learns to predict a codec matrix $\hat{C}$ from the input MIDI $x$ and an acoustic prompt matrix $\tilde{C}$, which is derived from the first three seconds of the corresponding audio. The synthesised audio is reconstructed by $\hat{y} = \mathrm{decodec}(\hat{C})$, aiming to approximate the original audio $y$. The model is trained to maximise the conditional likelihood $\max p(C \mid x, \tilde{C})$.
As shown in Figure 1, MIDI-VALLE follows a design similar to VALLE, comprising an autoregressive (AR) transformer decoder that predicts discrete tokens from the first quantiser, $c_{t,1}$, and a non-autoregressive (NAR) transformer decoder that generates codes for the remaining three quantisers, $c_{t,2:4}$. To process MIDI token sequences, both models employ the embedding pooling [26] technique to map concatenated embeddings of various musical features to match the required input size. The AR decoder takes the MIDI token sequence as input and autoregressively predicts audio codec tokens in a causal manner, without using any acoustic prompts. In contrast, the NAR decoder is conditioned on an acoustic prompt $\tilde{C}$, extracted from the first three seconds of the performance. This replaces the neighbouring-context strategy used in VALLE and better preserves musical coherence, as acoustic characteristics can change rapidly and vary significantly between segments. During NAR training, each token in the self-attention layer can attend to all input tokens. Despite their different decoding approaches, both the AR and NAR models share the same architecture: 12 attention layers, 16 attention heads, and hidden dimensions of size 1024.
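One way to read the two-stage decoding described above is as a factorisation of the conditional likelihood in the style of the VALLE formulation [7]. The parameter symbols $\theta_{AR}$ and $\theta_{NAR}$ and the colon indexing are notation introduced here for illustration, and the exact conditioning used in MIDI-VALLE may differ (for instance, at inference the AR decoder can additionally receive the encoded audio prompt).

```latex
% AR stage: first-quantiser codes, predicted causally from the MIDI tokens x
p(c_{:,1} \mid x; \theta_{AR})
  = \prod_{t=1}^{T} p(c_{t,1} \mid c_{<t,1}, x; \theta_{AR})

% NAR stage: codes of quantisers 2..4, each conditioned on all earlier quantisers,
% the MIDI tokens x, and the acoustic prompt \tilde{C}
p(C_{:,2:4} \mid x, \tilde{C}; \theta_{NAR})
  = \prod_{j=2}^{4} p(c_{:,j} \mid C_{:,<j}, x, \tilde{C}; \theta_{NAR})
```

Under this reading, the AR product runs over time steps for a single codebook, while the NAR product runs over codebooks with all time steps of a codebook predicted in parallel.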
During inference, the model inputs a target MIDI for synthesis and optionally accepts an audio prompt with its corresponding MIDI segment as the MIDI prompt. The audio prompt could be any 3-second excerpt from any recorded performance, and the associated MIDI prompt is concatenated to the beginning of the target MIDI before tokenisation. The impact of selecting different prompts is discussed in Section 6.2. The encoded audio prompt, if provided, is appended after the MIDI tokens in the input to the AR decoder for controlling the acoustic environment and timbral characteristics. The model then estimates the audio codec tokens for the target MIDI and reconstructs the corresponding audio performance using the Piano-Encodec.
4. EXPERIMENTS
4.1 Datasets
We used the ATEPP [8] dataset, excluding low-quality performances and their transcriptions. A total of 8,825 performance recordings were selected and split into training, validation, and test sets in an 8:1:1 ratio. The repertoire has around 700 hours of audio recordings from 1,099 albums, featuring 1,523 compositions by 25 composers, performed by 46 pianists. All the performances were segmented randomly into clips of 15-20 seconds. To ensure precise alignment between the audio segments and the corresponding MIDIs, notes were truncated at segmenting points, with the remainder continuing in the next segment if a note was interrupted. Due to limited pedalling transcription accuracy in the ATEPP dataset, pedal information was excluded during MIDI tokenisation, and note durations represent raw durations only, without sustain extension.
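The segmentation rule (random 15-20 second clips, with interrupted notes truncated at the cut and continued in the next clip) can be sketched as follows. The function name `segment_notes`, the `(onset, offset, pitch)` tuple layout, and the handling of the final, possibly shorter clip are assumptions; the authors' preprocessing code may behave differently.

```python
import random

def segment_notes(notes, total_len, min_len=15.0, max_len=20.0, seed=0):
    """notes: list of (onset, offset, pitch) in seconds.
    Returns a list of (start, end, clipped_notes), times relative to each segment."""
    rng = random.Random(seed)
    bounds, t = [0.0], 0.0
    while t < total_len:
        # draw a random segment length; the last segment may be shorter (assumption)
        t = min(total_len, t + rng.uniform(min_len, max_len))
        bounds.append(t)
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        clipped = []
        for onset, offset, pitch in notes:
            if offset <= start or onset >= end:
                continue                      # note lies outside this segment
            # truncate at the cut points; the remainder reappears in the next segment
            clipped.append((max(onset, start) - start, min(offset, end) - start, pitch))
        segments.append((start, end, clipped))
    return segments

# A single long note from 14 s to 30 s is split across the segment boundaries.
segs = segment_notes([(14.0, 30.0, 60)], total_len=40.0)
```

Splitting a note this way keeps the total sounding duration unchanged while guaranteeing that no note extends past the audio of its clip.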
4.2 Implementation Details
The Encodec model [21] was fine-tuned using audio from the ATEPP dataset. All performances were converted into 32 kHz monophonic audio and encoded with a frame rate of 50 Hz. The extracted audio embeddings were quantised using RVQ with four quantisers, each having a codebook size of 2048. One-second audio segments were randomly sampled from the entire ATEPP dataset at each epoch, following the strategy proposed in [10]. Fine-tuning was carried out over 40 epochs on a Tesla A100 GPU for one day, with performance improvements discussed in Section 6.1.
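The codec settings above imply the following sizes, worked through here as a quick check. The figures are derived only from the stated sample rate, frame rate, number of quantisers, and codebook size; the helper name `codec_shape` is hypothetical, and the real codec may pad or trim by a frame.

```python
import math

sample_rate = 32_000      # Hz, from the text
frame_rate = 50           # codec frames per second, from the text
n_codebooks = 4           # RVQ quantisers
codebook_size = 2048      # entries per codebook

samples_per_frame = sample_rate // frame_rate             # audio samples per codec frame
bits_per_code = int(math.log2(codebook_size))             # bits needed for one index
bitrate_bps = frame_rate * n_codebooks * bits_per_code    # token bitrate

def codec_shape(seconds):
    """Approximate (T, 4) shape of the code matrix for a clip of this length."""
    return (int(seconds * frame_rate), n_codebooks)

prompt_shape = codec_shape(3)                         # 3-second acoustic prompt
clip_shapes = (codec_shape(15), codec_shape(20))      # 15-20 second training clips
```

So each codec frame covers 640 samples, a 3-second prompt corresponds to roughly 150 frames, and a training clip to roughly 750-1000 frames of four codes each, at about 2.2 kbit/s.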
Our MIDI-VALLE was implemented based on an unofficial version of VALLE [25], with training optimised using the ScaledAdam [28] optimiser and a base learning rate of 0.05. The learning rate was adjusted using the Eden scheduler, as described in [28]. The AR and NAR decoders were trained jointly, with gradients updated in the same step, converging after approximately 300k steps (2.5 days) on two Tesla A100 GPUs.
5. EVALUATION
5.1 Objective Metrics
To evaluate the performance of the proposed MIDI-VALLE system, we employ three objective metrics: Fréchet Audio Distance (FAD) [29, 30], spectrogram distortion, and chroma distortion. FAD measures the perceptual quality and realism of generated audio by comparing it to reference performances using embeddings extracted from Piano-Encodec. Adapted from [3], spectrogram distortion evaluates the fidelity of reconstructed acoustics and timbre, while chroma distortion evaluates harmonic consistency at the pitch class level. They are computed using the normalised root mean square error and mean absolute error, respectively.
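A minimal sketch of the three quantities follows, assuming embeddings and feature matrices are already available as NumPy arrays. The Fréchet distance uses the standard Gaussian formula; the range-based normalisation in `nrmse` is one common convention and is an assumption, since the exact normalisation adapted from [3] is not given here. Embedding extraction with Piano-Encodec is outside the scope of the sketch.

```python
import numpy as np

def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two (n, d) embedding sets."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # trace of sqrt(cov_a @ cov_b) equals the sum of square roots of its eigenvalues
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

def nrmse(ref, est):
    """RMSE normalised by the value range of the reference (assumed convention)."""
    rmse = np.sqrt(np.mean((ref - est) ** 2))
    return float(rmse / (ref.max() - ref.min()))

def mae(ref, est):
    """Mean absolute error between two equally shaped feature matrices."""
    return float(np.mean(np.abs(ref - est)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 4))               # hypothetical embeddings
fd_same = frechet_distance(emb, emb)          # identical sets
fd_shift = frechet_distance(emb, emb + 1.0)   # same spread, shifted mean
```

Identical embedding sets give a distance near zero, while shifting every dimension by one adds the squared mean difference, which illustrates why FAD is sensitive to systematic timbral or ambient offsets.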

Dataset       Genre      MIDI Type    RE⋆
ATEPP [8]     classical  Transcribed  Live & Studio
Maestro [6]   classical  Recorded     Competition
Pijama [31]   jazz       Transcribed  Live & Studio
Table 2: Comparison of the three piano solo datasets used for evaluation. ⋆RE stands for recording environment.

We evaluate MIDI-VALLE against the state-of-the-art TTS-based EPS system, M2A [3], using three datasets: ATEPP [8], Maestro [6], and Pijama [31]. The M2A system [3] was originally trained on the Maestro dataset and subsequently fine-tuned using a curated subset of 371 performances from ATEPP. All three datasets provide performance MIDIs paired with corresponding audio recordings. As presented in Table 2, ATEPP and Maestro are classical piano performance corpora, comprising transcribed and recorded performance MIDIs, respectively. The Pijama dataset contains transcribed jazz piano solos recorded in live and studio settings. For evaluation, we use only the test set of ATEPP and randomly select 100 performances from each dataset to ensure diverse compositional styles and recording conditions.
Besides human performance recordings, reconstructed audio from Piano-Encodec is used as an additional reference for calculating metrics. This helps evaluate how well MIDI-VALLE aligns with its training target: the codec representations extracted by Piano-Encodec. All performances are divided into 15-20 second segments, and the metrics are calculated by comparing the model-generated outputs to both the ground truth audio and the Piano-Encodec reconstructions.
5.2 Listening Tests
The listening test evaluates the synthesis quality of generations by MIDI-VALLE and M2A, and their compatibility with different EPR systems in a two-stage MPS pipeline, using two types of preference-based evaluations.
For the synthesis quality evaluation, participants are presented with a reference audio recording of human performances alongside two synthesised versions of the same musical excerpt, one generated by MIDI-VALLE and the other by M2A, both conditioned on the same performance MIDI. Participants are asked to select the version that more closely resembles the reference in terms of timbre, phrasing, and expressiveness. The stimuli are drawn from the three datasets used in the objective evaluation, consisting of 6 excerpts from ATEPP, 4 from Maestro, and 4 from Pijama. Each excerpt represents a distinct composition and performance, lasting approximately 15–20 seconds. In total, 14 pairwise comparisons are created for this evaluation.
For the system compatibility evaluation, participants are presented with two synthesised outputs generated by MIDI-VALLE and M2A, based on performance MIDIs produced by different EPR systems. Three EPR systems are considered: M2M [3], a Transformer-based model introduced alongside M2A as part of an MPS system; VirtuosoNet [18], which employs a hierarchical recurrent neural network (RNN) architecture; and DExter [19], a diffusion-based generative model. These systems differ in their MIDI processing and representations, particularly in how sustain pedalling is handled and how MIDI files are encoded. For example, both DExter and VirtuosoNet struggle to accurately model sustain pedal effects, which can result in unnatural note offset predictions. In contrast, M2M tokenises MIDI files but disregards pedalling effects, relying on M2A to synthesise these effects. In our listening test, participants are expected to indicate their preference based on the naturalness, clarity, and expressiveness of the synthesised audio of the same performance MIDI. For each EPR system, four 15–20 second excerpts of distinct compositions are selected, leading to 12 pairwise comparisons.
A total of 20 participants, almost all with over 2 years of music training, were recruited, with each evaluating half of the stimuli. This ensured that each stimulus was assessed by 9 to 11 participants, resulting in a total of 260 votes.
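The vote total follows directly from the design described above, as the short calculation below shows. It assumes each participant rated exactly half of the comparisons; the variable names are illustrative.

```python
# Pairwise comparisons per evaluation, as stated in the text
quality_pairs = {"ATEPP": 6, "Maestro": 4, "Pijama": 4}
compat_pairs = {"M2M": 4, "VirtuosoNet": 4, "DExter": 4}

n_pairs = sum(quality_pairs.values()) + sum(compat_pairs.values())
participants = 20
pairs_per_participant = n_pairs // 2          # each participant rates half the stimuli
total_votes = participants * pairs_per_participant
mean_raters_per_pair = total_votes / n_pairs
```

With 26 comparisons and 13 ratings per participant, the total is 260 votes, an average of 10 raters per comparison, which is consistent with the reported range of 9 to 11 and with the 202 and 58 votes quoted in the abstract.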
6. RESULTS & DISCUSSION
6.1 Objective Evaluation
As shown in Table 3, fine-tuning with the ATEPP dataset significantly enhanced Piano-Encodec compared to the original Encodec [21], reducing spectrogram distortion from 0.304 to 0.123 and chroma distortion from 0.478 to 0.140. In addition, Piano-Encodec achieves high-fidelity reconstruction of human performances, with much lower FAD, spectrogram, and chroma distortions than generative models. Although fine-tuned using the ATEPP dataset, the Piano-Encodec model achieves impressive reconstruction quality on both the Maestro and Pijama datasets. These results validate the reliability of the Piano-Encodec model as an embedding extraction tool for assessing acoustic and musical similarity between synthesised outputs and reference audio.
Compared to M2A, as presented in Table 4, our MIDI-VALLE model achieves over 75% lower FAD on the ATEPP and Maestro datasets, showing that MIDI-VALLE effectively maps MIDI tokens to realistic audio. However, the high FAD scores for Pijama indicate that MIDI-VALLE struggles with jazz performances, likely due to its training on classical music, which limits its ability to capture the complex harmonic structures and rhythms of jazz. Furthermore, the FAD between MIDI-VALLE outputs and reconstructions is lower than with ground truth, suggesting that MIDI-VALLE aligns more with the quantised representations used in training than with the original audio.

Model         Dataset  FAD ↓  Spec. ↓       Chroma ↓
Encodec [21]  ATEPP    –      0.304 ± .005  0.478 ± .011
Piano-Enc.    ATEPP    0.685  0.123 ± .002  0.140 ± .002
              Maestro  0.984  0.135 ± .002  0.139 ± .001
              Pijama   1.133  0.143 ± .003  0.137 ± .001
Table 3: Reconstruction quality of Encodec [10] and Piano-Encodec (Piano-Enc.) evaluated on three datasets. Metrics are calculated by comparing the reconstructed performances with the ground truth recordings. FAD, spectrogram distance (Spec.), and chroma distance with 95% confidence intervals are presented.


Dataset  Model    Ref.  FAD ↓    Spec. ↓       Chroma ↓
ATEPP    M2A [3]  GT¹   11.014   0.218 ± .005  0.421 ± .017
                  RC²   11.463   0.214 ± .004  0.464 ± .017
         MV       GT    3.329    0.219 ± .005  0.436 ± .012
                  RC    2.659    0.199 ± .005  0.442 ± .012
Maestro  M2A [3]  GT    34.479   0.230 ± .003  0.387 ± .007
                  RC    33.753   0.224 ± .003  0.427 ± .007
         MV       GT    11.281   0.231 ± .004  0.428 ± .009
                  RC    9.168    0.206 ± .003  0.420 ± .009
Pijama   M2A [3]  GT    274.153  0.312 ± .010  0.471 ± .009
                  RC    267.969  0.293 ± .008  0.509 ± .010
         MV       GT    102.022  0.322 ± .010  0.558 ± .014
                  RC    97.634   0.298 ± .009  0.584 ± .015
Table 4: Spectrogram (Spec.) and chroma distortions are presented along with 95% confidence intervals for comparing M2A [3] and MIDI-VALLE (MV) on the ATEPP, Maestro, and Pijama datasets. Metrics are calculated by comparing the model generations with the reference (Ref.). ¹GT refers to the ground truth performance recording and ²RC indicates audio reconstructed via Piano-Encodec.

In terms of chroma distortion, MIDI-VALLE shows similar harmonic consistency to M2A on ATEPP, whereas M2A slightly outperforms MIDI-VALLE on Maestro. This performance difference aligns with the fact that M2A was originally trained on the Maestro dataset. Regarding spectrogram distortion, which reflects the model’s ability to reconstruct acoustics and timbre, MIDI-VALLE exhibits a smaller distance to the reconstruction, compared to M2A on both ATEPP and Maestro. Additionally, as shown in Figure 2, MIDI-VALLE provides a more accurate reconstruction across the full frequency spectrum compared to M2A, improving both the timbre realism and the perceptual weight of the sound. These results suggest that MIDI-VALLE effectively adapts to various recording environments and reconstructs acoustics and ambient sound that closely match the provided audio prompts.
Furthermore, MIDI-VALLE, trained solely with transcribed performance MIDIs, generalises well to recorded MIDIs without fine-tuning, as shown by its lower FADenc score and similar spectrogram and chroma distortions to M2A. This makes it beneficial for real-world applications with limited recorded data but rich transcribed data. In contrast, M2A, which is trained on recorded performance MIDIs, struggles to adapt to transcribed datasets without fine-tuning [3], primarily due to the inherent limitations of the piano-roll representation, as discussed in Section 3.1.2.

Figure 2: Rainbow-grams [5] of the performances synthesised by MIDI-VALLE and M2A, along with the ground truth and the Piano-Encodec reconstruction, are shown. The rainbow-gram, based on the Constant-Q Transform, uses colour to represent instantaneous frequency and lightness to indicate spectral amplitude. As the spectral amplitude of a frequency bin increases, the corresponding image pixel becomes lighter.

On the Pijama dataset, MIDI-VALLE exhibits increased spectrogram and chroma distortions, highlighting its difficulty in capturing the complex harmonic structures, syncopated rhythms, and nuanced articulations characteristic of jazz music. These stylistic differences from classical music might lead to unseen token patterns in the MIDI representation, making adaptation challenging. Nevertheless, MIDI-VALLE still outperforms M2A in terms of FAD and achieves comparable spectrogram distortion, suggesting that it better preserves timbral and ambient features that contribute to perceptual similarity. However, the considerable FAD gap between MIDI-VALLE and the ground truth indicates that the overall audio quality remains limited.
6.2 Subjective Evaluation
The listening test results further validate the findings from the objective metrics. As shown in Figure 3, MIDI-VALLE receives significantly more votes than M2A in the synthesis quality evaluation on the ATEPP and Maestro datasets. However, M2A is favoured for segments from the Pijama dataset, indicating that while MIDI-VALLE generalises well to classical piano, it requires further refinement to adapt effectively to stylistically distinct genres such as jazz. In the system compatibility evaluation, MIDI-VALLE is consistently preferred over M2A across all EPR systems, demonstrating better adaptability to subtle timing and articulation differences in performance MIDI. While M2A’s piano-roll representation is prone to artefacts under such variations, MIDI-VALLE remains robust, producing more natural and expressive outputs.
Additionally, the output quality of MIDI-VALLE is strongly influenced by the audio prompt due to its inherited zero-shot design. We observed that, beyond capturing ambient characteristics, the prompt could also determine the loudness and timbre of the generated audio, highlighting the model’s ability to adapt to diverse acoustic environments. Moreover, while using the first three seconds of a target segment often produces high-quality results, MIDI-VALLE can generate coherent and natural outputs from any prompt that is stylistically consistent and acoustically clear, enabling it to handle improvised inputs within the classical style.

Figure 3: Win counts are presented for MIDI-VALLE and M2A models across multiple datasets and combined EPR systems in the listening tests.

However, the current design requires precise alignment between MIDI and audio prompts. Subtle timing variations or extra notes can lead to unexpected notes or omissions at the start of the generation. Despite the accurate truncation of the MIDI prompt to three seconds, misalignments still occur. As shown in Figure 2, the plot of the MIDI-VALLE output appears slightly shifted due to the truncation occurring in the middle of the first note, impacting both its timing and articulation. Manually selecting cutting points that align with the end of MIDI notes, and with the moment the sound has completely faded in the audio, could resolve these misalignments, but this method is not practical for synthesising multiple segments into a complete performance. When generating long performances by concatenating multiple synthesised segments, discontinuities in acoustic characteristics can still be observed.
7. CONCLUSION
We present MIDI-VALLE, a novel EPS model adapted from the VALLE framework, for performance MIDI-to-audio synthesis. Our results demonstrate that MIDI-VALLE outperforms the existing EPS baseline in both adaptability and synthesis quality, producing more natural and expressive audio across a wide range of performance inputs and recording conditions. This improvement is primarily attributed to the discrete tokenisation approach and its inherited zero-shot design, which enhances the model’s ability to capture performance nuances and adapt to diverse inputs. Future work will focus on improving generalisation across musical genres, investigating the impact of model size and the role of codebooks in expressive audio generation, and broadening comparisons with physical synthesis methods and alternative audio codec models.
8. ACKNOWLEDGEMENT
This work was supported by the UKRI Centre for Doctoral Training in Artificial Intelligence and Music [grant number EP/S022694/1] and the National Institute of Informatics, Japan. J. Tang is a research student jointly funded by the China Scholarship Council [grant number 202008440382] and Queen Mary University of London. G. Wiggins received funding from the Flemish Government under the "Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen". We thank the reviewers for their valuable feedback, which helped improve the quality of this work.
9. ETHICS STATEMENT
No personal or sensitive user data is involved in this research. The datasets used in this study (ATEPP [8], Maestro [6], and Pijama [31]) contain audio recordings and corresponding MIDI annotations of piano performances. The MIDI files from all three datasets are publicly available. However, the audio recordings in ATEPP and Pijama are accessible exclusively for research purposes under academic use agreements, and have been used accordingly. All model training and evaluation were conducted in compliance with these terms, with no commercial usage or redistribution of the restricted audio data. The listening tests involved voluntary participation by musically trained individuals, who were informed of the purpose and anonymised participation. No personal data was collected. The study was reviewed and approved by the Electronic Engineering and Computer Science Devolved School Research Ethics Committee at Queen Mary University of London under reference number QMERC20.565.DSEECS25.019.
Code and generated audio examples are made available to promote transparency and reproducibility. We acknowledge the potential for misuse of generative audio models, including the synthesis of deceptive or misleading content. We strongly discourage such applications and advocate for the responsible use of this technology, including clear attribution and disclosure when synthetic audio is employed.
10. REFERENCES
[1] Y. Wu, E. Manilow, Y. Deng, R. Swavely, K. Kastner,
T. Cooijmans, A. Courville, C.-Z. A. Huang, and J. Engel,
“MIDI-DDSP: Detailed control of musical performance via
hierarchical modeling,” in International Conference on
Learning Representations, 2022.
[2] H.-W. Dong, C. Zhou, T. Berg-Kirkpatrick, and
J. McAuley, “Deep performer: Score-to-audio music
performance synthesis,” in ICASSP 2022-2022 IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP). IEEE, 2022, pp. 951–955.
[3] J. Tang, E. Cooper, X. Wang, J. Yamagishi, and
G. Fazekas, “Towards an integrated approach for
expressive piano performance synthesis from music
scores,” in ICASSP 2025 - 2025 IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2025, pp. 1–5.
[4] E. Cooper, X. Wang, and J. Yamagishi, “Text-to-speech
synthesis techniques for midi-to-audio synthesis,”
Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 2021.
[5] X. Shi, E. Cooper, X. Wang, J. Yamagishi, and
S. Narayanan, “Can knowledge of end-to-end text-to-speech
models improve neural midi-to-audio synthesis
systems?” in ICASSP 2023-2023 IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, 2023, pp. 1–5.
[6] C. Hawthorne, A. Stasyuk, A. Roberts, I. Simon,
C.-Z. A. Huang, S. Dieleman, E. Elsen,
J. Engel, and D. Eck, “Enabling factorized piano
music modeling and generation with the MAESTRO
dataset,” in International Conference on
Learning Representations, 2019. [Online]. Available:
https://openreview.net/forum?id=r1lYRjC9F7
[7] S. Chen, C. Wang, Y. Wu, Z. Zhang, L. Zhou, S. Liu,
Z. Chen, Y. Liu, H. Wang, J. Li, L. He, S. Zhao, and
F. Wei, “Neural codec language models are zero-shot
text to speech synthesizers,” IEEE Transactions on Audio,
Speech and Language Processing, vol. 33, pp. 705–718, 2025.
[8] H. Zhang, J. Tang, S. R. Rafee, S. Dixon, G. A.
Wiggins, and G. Fazekas, “ATEPP: A Dataset of Automatically
Transcribed Expressive Piano Performance,”
in International Society for Music Information Retrieval
Conference, Dec. 2022, pp. 446–453. [Online].
Available: https://doi.org/10.5281/zenodo.7342764
[9] M. Zeng, X. Tan, R. Wang, Z. Ju, T. Qin, and T.-Y. Liu,
“MusicBERT: Symbolic music understanding with
large-scale pre-training,” in Findings of the Association
for Computational Linguistics: ACL-IJCNLP 2021,
Online, Aug. 2021, pp. 791–800. [Online]. Available:
https://aclanthology.org/2021.findings-acl.70
[10] A. Défossez, J. Copet, G. Synnaeve, and Y. Adi, “High
fidelity neural audio compression,” Transactions on
Machine Learning Research, 2023, featured Certification,
Reproducibility Certification. [Online]. Available:
https://openreview.net/forum?id=ivCd8z8zR2
[11] L. Renault, R. Mignot, and A. Roebel, “DDSP-Piano:
a Neural Sound Synthesizer Informed by Instrument
Knowledge,” AES - Journal of the Audio Engineering
Society Audio-Accoustics-Application, vol. 71, no. 9,
pp. 552–565, Sep. 2023. [Online]. Available:
https://hal.science/hal-04073770
[12] N. Li, S. Liu, Y. Liu, S. Zhao, and M. Liu, “Neural
speech synthesis with transformer network,” in Proceedings
of the AAAI conference on artificial intelligence,
vol. 33, no. 01, 2019, pp. 6706–6713.
[13] Y. Ren, Y. Ruan, X. Tan, T. Qin, S. Zhao, Z. Zhao,
and T.-Y. Liu, “Fastspeech: fast, robust and controllable
text to speech,” Proceedings of the 33rd International
Conference on Neural Information Processing
Systems, 2019.
[14] J. Kong, J. Kim, and J. Bae, “Hifi-gan: Generative
adversarial networks for efficient and high fidelity speech
synthesis,” Advances in neural information processing
systems, vol. 33, pp. 17 022–17 033, 2020.
[15] J. Tang, G. Wiggins, and G. Fazekas, “Reconstructing
human expressiveness in piano performances with a
transformer network,” The 16th International Symposium
on Computer Music Multidisciplinary Research,
2023.
[16] I. Borovik and V. Viro, “Scoreperformer: Expressive
piano performance rendering with fine-grained
control,” in Proceedings of the 23rd International Society
for Music Information Retrieval Conference, 2023, pp.
588–596.
[17] L. Renault, R. Mignot, and A. Roebel, “Expressive
Piano Performance Rendering from Unpaired Data,”
in International Conference on Digital Audio Effects
(DAFx23), Copenhague, Denmark, Sep. 2023, pp.
355–358. [Online]. Available:
https://hal.science/hal-04221612
[18] D. Jeong, T. Kwon, Y. Kim, K. Lee, and J. Nam,
“Virtuosonet: A hierarchical rnn-based system for modeling
expressive piano performance,” in Proceedings of
the 20th International Society for Music Information
Retrieval Conference, 2019.
[19] H. Zhang, S. Chowdhury, C. E. Cancino-Chacón,
J. Liang, S. Dixon, and G. Widmer, “Dexter: Learning
and controlling performance expression with diffusion
models,” Applied Sciences, no. 15, 2024. [Online].
Available: https://www.mdpi.com/2076-3417/14/15/6543
[20] Z. Borsos, R. Marinier, D. Vincent, E. Kharitonov,
O. Pietquin, M. Sharifi, D. Roblek, O. Teboul,
D. Grangier, M. Tagliasacchi et al., “Audiolm: a
language modeling approach to audio generation,”
IEEE/ACM transactions on audio, speech, and language
processing, vol. 31, pp. 2523–2533, 2023.
[21] J. Copet, F. Kreuk, I. Gat, T. Remez,
D. Kant, G. Synnaeve, Y. Adi, and A. Défossez,
“Simple and controllable music generation,”
in Thirty-seventh Conference on Neural Information
Processing Systems, 2023. [Online]. Available:
https://openreview.net/forum?id=jtiQ26sCJi
[22] A. Agostinelli, T. I. Denk, Z. Borsos, J. Engel,
M. Verzetti, A. Caillon, Q. Huang, A. Jansen,
A. Roberts, M. Tagliasacchi et al., “Musiclm:
Generating music from text,” arXiv preprint
arXiv:2301.11325, 2023.
[23] N. Zeghidour, A. Luebs, A. Omran, J. Skoglund,
and M. Tagliasacchi, “Soundstream: An end-to-end
neural audio codec,” IEEE/ACM Trans. Audio, Speech
and Lang. Proc., vol. 30, pp. 495–507, Nov. 2021.
[Online]. Available: https://doi.org/10.1109/TASLP.2021.3129994
[24] R. Gray, “Vector quantization,” IEEE ASSP Magazine,
vol. 1, no. 2, pp. 4–29, 1984.
[25] F. Li, “Vall-e: A neural codec language model,” 2023.
[Online]. Available: http://github.com/lifeiteng/vall-e
[26] W.-Y. Hsiao, J.-Y. Liu, Y.-C. Yeh, and Y.-H. Yang,
“Compound word transformer: Learning to compose
full-song music over dynamic directed hypergraphs,”
in Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 35, no. 1, 2021, pp. 178–186.
[27] Y.-S. Huang and Y.-H. Yang, “Pop music transformer:
Beat-based modeling and generation of expressive pop
piano compositions,” in Proceedings of the 28th ACM
International Conference on Multimedia, ser. MM ’20.
New York, NY, USA: Association for Computing
Machinery, 2020, pp. 1180–1188. [Online]. Available:
https://doi.org/10.1145/3394171.3413671
[28] Z. Yao, L. Guo, X. Yang, W. Kang, F. Kuang, Y. Yang,
Z. Jin, L. Lin, and D. Povey, “Zipformer: A faster
and better encoder for automatic speech recognition,”
in The Twelfth International Conference on Learning
Representations, 2024. [Online]. Available:
https://openreview.net/forum?id=9WD9KwssyT
[29] K. Kilgour, M. Zuluaga, D. Roblek, and M. Sharifi,
“Fréchet audio distance: A reference-free metric
for evaluating music enhancement algorithms,” in
Interspeech, 2019. [Online]. Available:
https://api.semanticscholar.org/CorpusID:202725406
[30] A. Gui, H. Gamper, S. Braun, and D. Emmanouilidou,
“Adapting frechet audio distance for generative music
evaluation,” in Proc. IEEE ICASSP 2024, 2024.
[Online]. Available: https://arxiv.org/abs/2311.01616
[31] D. Edwards, S. Dixon, and E. Benetos, “Pijama: Piano
jazz with automatic midi annotations,” Transactions
of the International Society for Music Information
Retrieval, 2023.