Exploring Network Adaptations for Minimum Latency Real-Time Piano Transcription

Author: Patricia Hu; Silvan Peter; Jan Schlüter; Gerhard Widmer

Publisher: Zenodo

DOI: 10.5281/zenodo.17706339

Source: https://zenodo.org/records/17706339/files/000010.pdf

EXPLORING SYSTEM ADAPTATIONS FOR
MINIMUM LATENCY REAL-TIME PIANO TRANSCRIPTION
Patricia Hu¹, Silvan David Peter¹, Jan Schlüter¹, Gerhard Widmer¹,²
¹ Institute of Computational Perception, Johannes Kepler University Linz, Austria
² LIT AI Lab, Linz Institute of Technology, Austria
ABSTRACT
Advances in neural network design and the availability of large-scale labeled datasets
have driven major improvements in piano transcription. Existing approaches target
either offline applications, with no restrictions on computational demands, or online
transcription, with delays of 128–320 ms. However, most real-time musical applications
require latencies below 30 ms. In this work, we investigate whether and how the current
state-of-the-art online transcription model can be adapted to real-time piano
transcription. Specifically, we eliminate all non-causal processing, and reduce
computational load through shared computations across core model components and
variations in model size. Additionally, we explore different pre- and postprocessing
strategies, and related label encoding schemes, and discuss their suitability for
real-time transcription. Evaluating the adaptations on the MAESTRO dataset, we find a
drop in transcription accuracy due to strictly causal processing as well as a tradeoff
between the preprocessing latency and prediction accuracy. We release our system as a
baseline to support researchers in designing models towards minimum latency real-time
transcription.
1. INTRODUCTION
Automatic music transcription (AMT) is the task of transforming audio signals into their
symbolic music representation, and is commonly referred to as one of the holy grails in
Music Information Retrieval (MIR), given its role in linking the audio and symbolic
domain, as well as its relevance to various downstream tasks and musical applications
[1,2]. The transcription of piano solo music is among the most extensively studied
tasks, driven by the instrument's well-defined onset characteristics and the
availability of large-scale, strongly labeled training data [1]. Consequently, research
on automatic piano transcription has seen substantial progress. Notable contributions
include [1–4], with the former two setting new benchmarks by leveraging large
architectures, increased model complexity, extended training, and a novel
regression-based target encoding approach, leading to higher-resolution piano
transcription.

© P. Hu, S. Peter, J. Schlüter and G. Widmer. Licensed under a Creative Commons
Attribution 4.0 International License (CC BY 4.0). Attribution: P. Hu, S. Peter,
J. Schlüter and G. Widmer, “Exploring System Adaptations for Minimum Latency Real-Time
Piano Transcription”, in Proc. of the 26th Int. Society for Music Information Retrieval
Conf., Daejeon, South Korea, 2025.

Progress in automatic piano transcription has focused almost exclusively on offline
methods, with only a few investigations attempting to solve this task online [5–7].
These online systems typically reconfigure offline approaches for block-wise updates
while keeping audio representations and network structure, and achieve latencies
between 128 and 320 ms. Latency values in the hundreds of milliseconds are suitable for
musical applications such as subtitling, page turning, or visualizations. However, most
interactive musical applications require lower latencies: for digital instruments, a
commonly named bound is 10 ms [8–10], and for networked ensemble playing, latencies of
up to 30 ms are acceptable [11–13]. A real-time transcription model should enable
fluent musical interaction, for which we take 30 ms as a minimal requirement, and 10 ms
as a goal for imperceptible latency. Latency stems from various sources: audio
buffering, preprocessing, model inference and postprocessing. For an automatic piano
transcription system to perform inference in real-time, it must minimize latency in all
sources. It should adhere to strict causality requirements in both the model
architecture and the postprocessing, ensuring that both rely exclusively on past
information.
In this work, we adapt a state-of-the-art model for real-time transcription towards
minimal latency. We do so by allowing only causal processing within the model and
reducing computational load by sharing computations across core components of the
model. We further investigate the latencies incurred by widely used pre- and
postprocessing strategies, and explore options to mitigate these using a combination of
adapted STFT processing, label encoding schemes, loss functions, and causal
postprocessing. Our contribution is two-fold: First, we examine the sources of latency
and causality violations in the current state-of-the-art system in online piano
transcription, and propose and evaluate changes in the modeling and pre- and
postprocessing stages for efficient real-time transcription. Second, we provide an
open-source basis for low-latency, real-time automatic piano transcription, inviting
further research and development in this area.
The remainder of this article is structured as follows: In Section 2 we point to
related work, one of which will form the starting point for our adaptations towards
minimum latency, and which we will therefore focus on in greater detail in Section 3.
In Section 4 we present our latency-minimizing adaptations and report the experiments
conducted to assess their effect on transcription accuracy. We discuss the challenges
and lessons learned, and give an outlook for future work, in Section 5.
2. RELATED WORK
As outlined in the previous section, both online and real-time transcription remain
largely unexplored. We will discuss three noteworthy contributions [5–7].
Fernandez [6] proposes a purely convolutional modeling approach that focuses solely on
onset and velocity prediction, achieving a latency between 4 and 9 seconds.
Kwon et al. [7, 14, 15] propose an autoregressive neural network for efficient online
piano transcription. The architecture comprises two main components: an acoustic module
consisting of a stack of convolutional layers with frequency-conditioned FiLM layers,
and a note sequence module consisting of pitchwise LSTMs, a multi-state softmax output
(for different note states: onset, sustain, re-onset, offset, and off), and an
autoregressive connection from the note state output of the previous time step to the
current sequence module input. The authors propose various architectures that balance
accuracy and latency. Overall, their models achieve latencies from 128 to 320 ms.
Kusaka and Maezawa [5] introduce Mobile-AMT, a framework designed to tackle both
real-time processing and generalization to unseen recording environments in automatic
piano transcription. They optimize a state-of-the-art offline transcription model [2]
by replacing its conventional convolutional components with lightweight, efficient
alternatives [16] and train it using a data augmentation scheme that simulates four
distinct acoustic distribution shifts. The resulting model sets the current state of
the art in online automatic piano transcription, achieving F1 scores comparable to
offline state-of-the-art models while being robust to in-the-wild recordings. Its
latency is reported as 174 ms, but an apparent oversight in the architecture increases
latency to 10 s. We use this method as our starting point and detail it in the next
section.
3. STARTING POINT
As Mobile-AMT [5] represents the current state of the art in real-time piano
transcription, we use this method as the foundation and reference method for our
adaptations towards minimum-latency transcription. To provide context for these
modifications, we first outline the structure of the model, and particularly focus on
the modeling aspects that violate causality requirements for real-time systems and
necessitate adaptation.
3.1 Model Architecture
Mobile-AMT [5] is a lightweight adaptation of the state-of-the-art offline piano
transcription model by Kong et al. [2]. It replaces all conventional convolutional
blocks with MobileNet [16] equivalents, which consist of depthwise separable
convolutions that reduce computational complexity while maintaining representational
power. Additionally, Mobile-AMT removes all bi-directional flows in recurrent model
layers, and drops one of the originally four acoustic model core stacks (the one for
note offset prediction, which is instead conditioned on the frame and onset output¹).
All optimization and activation layers are retained from the original offline model.
With these modifications, the authors report a resulting latency of 174 ms, and argue
their model to be capable of real-time inference.
Apart from depthwise separable convolutions, Mobile-AMT also adopts MobileNet's
Squeeze-and-Excitation (SE) layers to dynamically recalibrate channel-wise features.
The squeeze operation involves global pooling over all spatial dimensions, and
therefore relies on information from the entire feature map, making the operation
non-causal. As Mobile-AMT processes its input in 10-second blocks, the squeeze
operation adds 10 seconds of latency, which is not accounted for in the authors'
calculations.
3.2 Postprocessing
Mobile-AMT uses the same regression target encoding format as its underlying offline
transcription model [2], resulting in incrementally increasing and decreasing onset and
offset targets over a sequence of frames instead of binary, pointwise classification
targets.
While this target encoding format allows for high, sub-frame resolution of onset and
offset detection, it necessitates non-causal postprocessing, as the detection of an
onset/offset relies on both past and future frames. The authors of Mobile-AMT adopt the
same postprocessing strategy as [2], and account for it in their overall latency
calculation.
3.3 Training and Evaluation Setup
Mobile-AMT is trained on 10-second segments of 16 kHz audio, which are transformed into
229-bin mel spectrograms after an STFT with a 2048-sample Hann window and a hop size of
320 samples. The loss function for the onset, offset and frame targets is binary
cross-entropy (BCE), and velocities are trained using mean squared error (MSE).
Mobile-AMT proposes a data augmentation scheme for training to enhance robustness to
real-world, in-the-wild recordings; the training duration therefore depends on the data
augmentations applied. For the non-augmented baseline, Mobile-AMT is trained for 3000
epochs on the MAESTRO dataset [1] with a batch size of 16 and uses the Adam optimizer
with a learning rate of 0.001 annealed to 0 following a cosine schedule [12].
4. REAL-TIME ADAPTATIONS
As our main contribution, we implement and evaluate adaptations to the starting point
that aim to reduce the system's latency. We form three groups of experiments: adapting
the training and postprocessing, adapting the audio preprocessing, and adapting the
network architecture. We conduct each experiment with a reduced training time of 500
epochs (30k updates, approximately one-sixth of the total training time reported in our
reference method [5]) to save computational resources, as most effects become evident
already during the early training stages. We perform one final comparison of a model in
which we combine selected modifications, and compare this to our reference method, with
both models trained for 2000 epochs.

¹ The offset prediction is omitted during inference, but no additional details are
provided on whether or how the postprocessing is adjusted to account for this missing
information.

Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025

For all our experiments, unless otherwise noted, we train on 3-second segments of audio
at 16 kHz, transformed into 229-bin mel spectrograms after an STFT with a Hann window of
2048 samples and a hop size of 160 samples. We double the frame rate compared to
Mobile-AMT to allow for more frequent updates, and therefore reduced latency.
Apart from the training duration and data encoding, we follow the same training scheme
as used in the non-augmented setup in our reference method, i.e., all adapted models are
trained and evaluated on the MAESTRO 3.0 dataset using mir_eval [17]. As our work
focuses on minimizing delay in real-time transcription, we mostly focus on the note
onset metrics. Except for the final comparison, we only evaluate on the validation set.
Since our goal is to achieve minimal latency for real-time musical interactions, we
assess inference performance on stricter timing tolerances (10–30 ms) to reduce the
mismatch between system evaluation and target performance.
4.1 Training and Postprocessing
In Mobile-AMT, the training target for an onset or offset extends over multiple time
steps, forming a triangle centered on the target annotation. Originally proposed by
Kong et al. [2], such triangles allow expressing positions at a higher resolution than
the frame rate. However, predicting such triangles requires lookahead in the model, as
the output must increase before the actual event. Furthermore, interpreting the
predictions requires lookahead in postprocessing in order to find each triangle's
maximum. As a preparation for switching to a causal version of Mobile-AMT with causal
postprocessing, we thus replace the triangular targets with binary targets that are
active only at the frame that is closest to an annotation.
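
The difference between the two encodings can be illustrated with a small sketch for a single annotated onset. The function names, the linear ramp and the triangle half-width of 5 frames are assumptions made for illustration only; the exact regression target of Kong et al. [2] may differ in detail.

```python
def binary_target(onset_time, n_frames, frame_rate=100.0):
    """Pointwise target: 1 only at the frame closest to the annotation."""
    target = [0.0] * n_frames
    idx = round(onset_time * frame_rate)
    if 0 <= idx < n_frames:
        target[idx] = 1.0
    return target


def triangular_target(onset_time, n_frames, frame_rate=100.0, half_width=5):
    """Regression-style target: linear ramp towards and away from the annotation.

    half_width (in frames) is a hypothetical value, not taken from the paper.
    """
    position = onset_time * frame_rate  # annotation position in (fractional) frames
    return [max(0.0, 1.0 - abs(i - position) / half_width) for i in range(n_frames)]


onset = 0.123  # seconds, i.e. between frames 12 and 13 at 100 frames per second
binary = binary_target(onset, 20)
triangle = triangular_target(onset, 20)
```

In this sketch the triangular target is already non-zero several frames before frame 12, which is the lookahead that a causal model cannot provide, while the binary target is active at frame 12 only.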
As the changed label encoding results in a heavily imbalanced binary classification
task, particularly for onset and offset targets, we try weighting their positive
occurrences with a factor of 10 in the binary cross-entropy loss [18]. Additionally,
pointwise classification targets penalize small temporal deviations more strongly than
triangular targets. To account for this, we test applying a shift-tolerant loss
recently proposed for beat detection [19], with a tolerance of ±1 frame (±10 ms at our
frame rate).
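
As a rough illustration of the two loss variants, the sketch below implements a positively weighted BCE and a simplified shift-tolerant BCE for one pitch over time. The shift-tolerant variant is an assumed reading of the idea (score each positive frame against the best prediction within the tolerance, and ignore negative frames next to a positive); it is not the reference implementation of [19], and all function names are hypothetical.

```python
import math


def _clamp(p, eps=1e-7):
    return min(max(p, eps), 1.0 - eps)


def weighted_bce(probs, targets, pos_weight=10.0):
    """Binary cross-entropy with positive frames weighted by pos_weight."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = _clamp(p)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def shift_tolerant_bce(probs, targets, tolerance=1, pos_weight=1.0):
    """Simplified shift-tolerant BCE (assumption, see lead-in)."""
    n = len(probs)
    total, count = 0.0, 0
    for i, t in enumerate(targets):
        lo, hi = max(0, i - tolerance), min(n, i + tolerance + 1)
        if t == 1:
            # positive frame: credit the best prediction within the tolerance
            total -= pos_weight * math.log(_clamp(max(probs[lo:hi])))
            count += 1
        elif max(targets[lo:hi]) == 0:
            # negative frame far from any positive: regular BCE term
            total -= math.log(1.0 - _clamp(probs[i]))
            count += 1
        # negative frames adjacent to a positive are ignored
    return total / max(count, 1)


targets = [0, 0, 0, 1, 0, 0, 0]
exact = [0.01, 0.01, 0.01, 0.9, 0.01, 0.01, 0.01]
late = [0.01, 0.01, 0.01, 0.01, 0.9, 0.01, 0.01]  # prediction one frame late
```

With these inputs, the weighted BCE penalizes the late prediction heavily, whereas the shift-tolerant variant assigns both predictions the same loss. This is also why such a loss can permit systematically delayed predictions.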
        Onset tolerance
Exp.    10 ms            20 ms             30 ms
Postprocessing onset threshold: 0.45
TP1      9.58 ± 2.29     25.80 ± 4.83      47.44 ± 7.30
TP2     29.84 ± 8.37     47.64 ± 11.76     50.28 ± 11.83
TP3     13.95 ± 4.91     34.34 ± 8.05      45.10 ± 8.33
TP4     27.24 ± 7.47     55.49 ± 9.73      64.95 ± 10.06
TP5     16.00 ± 5.32     41.02 ± 8.50      55.40 ± 8.53
Postprocessing onset threshold: 0.55
TP1     13.42 ± 2.60     35.07 ± 5.31      56.02 ± 8.14
TP2     23.56 ± 8.41     35.50 ± 11.44     36.87 ± 11.49
TP3     17.04 ± 5.52     40.54 ± 8.26      51.46 ± 8.29
TP4     27.42 ± 7.52     54.95 ± 10.18     63.67 ± 10.72
TP5     17.46 ± 5.59     44.19 ± 8.53      58.68 ± 8.61
Postprocessing onset threshold: 0.65
TP1     19.10 ± 3.12     43.88 ± 7.02      59.93 ± 9.77
TP2     14.32 ± 6.70     20.52 ± 8.68      21.07 ± 8.70
TP3     20.70 ± 5.90     46.48 ± 8.28      56.72 ± 8.28
TP4     27.18 ± 7.65     53.65 ± 10.71     61.55 ± 11.36
TP5     18.74 ± 5.80     46.80 ± 8.60      61.05 ± 8.89

Table 1. Comparison of note onset F1 scores (mean ± std, each over three experimental
runs) on the MAESTRO v.3 validation set across three onset tolerances for different
target encodings, (weighted) loss functions and onset thresholds in postprocessing.

We also modify the postprocessing to ensure that the prediction for the current frame
relies only on past information. First, we binarize the output probabilities: for onset
and offset targets, an activation is detected when the current frame exceeds a given
threshold while the previous frame does not. A frame activation is recorded if the
current frame surpasses the threshold. Next, we eliminate re-onsets of the same pitch
that occur within a predefined minimum re-onset distance. Finally, we determine the
offset of an active note based on the earlier occurring one of either frame inactivity
or offset activity.
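
The causal postprocessing steps (thresholding with rising-edge detection, re-onset suppression, and offset detection from frame inactivity or offset activity) can be sketched for a single pitch as follows. This is a minimal illustration under assumptions, not the released implementation: the function name, the default thresholds, the re-onset distance of 3 frames, and the choice to let a valid re-onset close a still-active note are all hypothetical.

```python
def causal_decode(onset_p, frame_p, offset_p, onset_thr=0.5, frame_thr=0.5,
                  offset_thr=0.5, min_reonset=3):
    """Decode notes for one pitch using only current and past frames."""
    notes = []           # finished notes as (onset_frame, offset_frame)
    active_since = None  # onset frame of the currently sounding note, if any
    last_onset = None
    prev_on = prev_off = 0.0
    for t in range(len(onset_p)):
        # rising edge: current frame above threshold, previous frame not
        onset = onset_p[t] > onset_thr and prev_on <= onset_thr
        offset = offset_p[t] > offset_thr and prev_off <= offset_thr
        frame_active = frame_p[t] > frame_thr
        if onset and (last_onset is None or t - last_onset >= min_reonset):
            if active_since is not None:
                notes.append((active_since, t))  # assumption: re-onset ends old note
            active_since = last_onset = t
        elif active_since is not None and (offset or not frame_active):
            # whichever comes first: offset activity or frame inactivity
            notes.append((active_since, t))
            active_since = None
        prev_on, prev_off = onset_p[t], offset_p[t]
    if active_since is not None:
        notes.append((active_since, len(onset_p)))
    return notes


# A re-onset at frame 4 falls within the minimum distance and is discarded;
# the note ends at frame 6, when the frame activation drops.
onset_p = [0, 0.9, 0.9, 0, 0.8, 0, 0, 0, 0, 0]
frame_p = [0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.2, 0, 0, 0]
offset_p = [0] * 10
decoded = causal_decode(onset_p, frame_p, offset_p, min_reonset=5)
```

Because each decision depends only on frame t and frame t−1, the loop body could be run once per incoming frame in a streaming setting.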
We test the change in label encoding in combination with different (weighted) loss
functions in five different experimental setups: In TP1 we train our reference model
with the original loss functions and regression target encoding scheme as proposed by
the authors [5]. In TP2 we use classification targets, and in TP3 we additionally apply
a weight factor of 10 on both the onset and offset targets. Finally, we apply the
shift-tolerant BCE loss (with a tolerance of ±10 ms), either unweighted (TP4) or
weighted again by a factor of 10 (TP5). All five setups use our strictly causal
postprocessing described above. The model architecture remains as proposed in
Mobile-AMT.
Table 1 lists the note onset F1 scores on the MAESTRO 3.0 validation set for the
different training targets and loss functions, combined with different onset threshold
values applied during postprocessing, as evaluated on different onset tolerance
thresholds. We choose to evaluate at lower tolerance thresholds as they reflect a
real-time system's practical responsiveness better than the commonly used ±50 ms, which
might mask latency issues by crediting the model for detections that would feel delayed
to a user in an interactive scenario. Learning pointwise binary targets without any
further modification (TP2) proves superior to short sequential regression (TP1),
provided a low enough onset threshold during postprocessing. The effect of lower onset
detection thresholds is somewhat mitigated by weighting positive labeled examples
higher (TP3), and virtually fully eliminated by using a shift-tolerant loss (TP4).
There is no further improvement when combining weights and shift tolerance (TP5).
For the next experiments, we use binary targets with weights. While the shift-tolerant
loss performed favorably here, our goal is to use a causal model, for which shift
tolerance could result in systematically delayed predictions.

Figure 1. Mobile-AMT centers STFT windows on the time point to predict for, incurring a
delay of 1024 samples. We shift the window to reduce the delay to 160 samples, and
change the window function to better use that limited amount of future information.

4.2 Audio Preprocessing
Mobile-AMT processes audio in spectrogram frames of 2048-sample windows centered on the
time points to predict events for. Thus, even with a causal model that does not process
information from future frames, in real-time inference, every time a complete audio
buffer of 2048 samples is filled, inference will be triggered to predict events that
are already 1024 samples (64 ms) in the past. Figure 1 illustrates this: the top row
shows an audio waveform, the second row a typical STFT window centered at the onset.
Shorter filters reduce this delay, as the delay is fixed to half the filter length, but
this comes at the cost of lower frequency resolution, which we should avoid: With a
2048-sample STFT at 16 kHz, the bin width is 7.8 Hz, which is already too coarse to
achieve semitone precision for the lowest piano notes (A0 at 27.5 Hz, B♭0 at 29.14 Hz).
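
The numbers in this paragraph follow from a few lines of arithmetic, reproduced here as a sketch. The helper `midi_to_hz` and the assumption of twelve-tone equal temperament with A4 = 440 Hz are introduced for illustration.

```python
SAMPLE_RATE = 16_000  # Hz
WINDOW = 2048         # STFT window length in samples

# Width of one STFT bin and delay of a centered window (half the window length)
bin_width_hz = SAMPLE_RATE / WINDOW
centered_delay_ms = 1000 * (WINDOW // 2) / SAMPLE_RATE


def midi_to_hz(midi_note):
    """Equal-tempered pitch, assuming A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


a0 = midi_to_hz(21)        # lowest piano key
b_flat0 = midi_to_hz(22)   # one semitone above
semitone_gap_hz = b_flat0 - a0

print(f"bin width: {bin_width_hz:.2f} Hz, centered delay: {centered_delay_ms:.0f} ms")
print(f"A0 = {a0:.2f} Hz, B-flat 0 = {b_flat0:.2f} Hz, gap = {semitone_gap_hz:.2f} Hz")
```

The gap between the two lowest keys (about 1.6 Hz) is several times smaller than one STFT bin, which is why shortening the window further is not an attractive way to reduce the delay.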
We can however reduce this delay to a lower number n_s of samples (e.g., 160 samples or
10 ms) while keeping the window length and frequency resolution unchanged, by shifting
windows so they end n_s samples after their reference point instead of being centered.
The third row in Figure 1 shows the Hann window shifted to n_s = 160 samples. The Hann
window strongly attenuates the boundaries, the right one of which now contains highly
relevant information for the prediction. To mitigate this unwanted attenuation, we can
replace the Hann window with an asymmetric window that tapers (2048 − n_s) samples
before and n_s samples after the reference point. The last row in Figure 1 illustrates
this windowing function for a 1888/160 sample asymmetry. Note how we keep more
information from the incoming samples in the gray shaded area under the window
function, albeit at the cost of increasing spectral leakage (by about 20 dB for
n_s = 160).
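
One possible construction of such an asymmetric window is sketched below. The text does not specify the exact taper, so the use of two raised-cosine (half-Hann) flanks that meet at the reference point is an assumption, as are the function names.

```python
import math


def asymmetric_window(length=2048, n_after=160):
    """Window peaking n_after samples before its end (assumed half-Hann flanks)."""
    n_before = length - n_after
    # slow rise over the samples before the reference point
    rise = [0.5 - 0.5 * math.cos(math.pi * i / n_before) for i in range(n_before)]
    # fast decay over the n_after samples after the reference point
    fall = [0.5 + 0.5 * math.cos(math.pi * i / n_after) for i in range(n_after)]
    return rise + fall


def hann_window(length=2048):
    """Symmetric Hann window, for comparison."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


asym = asymmetric_window()
hann = hann_window()

# Weight given to the 160 most recent samples by each window
recent_asym = sum(asym[-160:])
recent_hann = sum(hann[-160:])
```

In this sketch the asymmetric window peaks at sample 1888, i.e., 160 samples before its end, and places far more weight on the most recent 160 samples than a Hann window of the same length that is merely shifted.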
                     Onset tolerance
Exp.  Window, delay  10 ms            20 ms            30 ms
H1    Hann 64 ms     17.43 ± 5.10     34.65 ± 7.40     39.86 ± 7.45
H2    Hann 10 ms      0.00 ± 0.01      0.00 ± 0.02      0.04 ± 0.10
T1    asym. 10 ms    22.25 ± 5.17     25.21 ± 5.20     25.84 ± 5.18
T2    asym. 20 ms    28.61 ± 7.09     33.76 ± 7.01     34.65 ± 6.89
T3    asym. 30 ms    27.61 ± 6.79     37.91 ± 7.54     39.43 ± 7.33
T4    asym. 40 ms    24.39 ± 6.50     37.51 ± 7.81     39.87 ± 7.60
T5    asym. 50 ms    20.99 ± 6.02     36.88 ± 7.75     40.47 ± 7.59
ST    asym. 10 ms     0.41 ± 0.40      1.57 ± 0.82     11.86 ± 4.25

Table 2. Note onset F1 scores on the MAESTRO v.3 validation set for different windowing
functions. The last row additionally uses a shift-tolerant training loss.

To experiment with different windowing configurations for reducing the delay in audio
preprocessing, we modify our reference method to apply only causal processing, as
allowing the model access to future frames would render our interventions meaningless.
Specifically, we make each convolution causal, so the model's receptive field of 9
frames extends 8 frames into the past, rather than splitting into 4 frames in the past
and 4 frames in the future. Additionally, we remove the Squeeze-Excitation layers of
the MobileNetV3 blocks, which perform global average pooling over both past and future
frames in an excerpt.
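
The effect of causal padding on the receptive field can be demonstrated with a small one-dimensional sketch along the time axis. The layer configuration used here (four stacked convolutions with kernel size 3, giving a receptive field of 1 + 4·2 = 9 frames) is a hypothetical example chosen to match the stated receptive field; the actual layer layout of the model is not given in the text.

```python
def conv1d_time(x, kernel, causal=True):
    """1-D convolution over time with zero padding.

    causal=True pads only on the left, so output[t] depends on x[t-k+1..t];
    causal=False pads symmetrically, so output[t] also sees future frames.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = k - 1 - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


def stack(x, n_layers, causal):
    """Apply n_layers identical kernel-size-3 convolutions (hypothetical setup)."""
    for _ in range(n_layers):
        x = conv1d_time(x, [1.0, 1.0, 1.0], causal=causal)
    return x


impulse = [0.0] * 8 + [1.0] + [0.0] * 12   # single event at frame 8
causal_out = stack(impulse, 4, causal=True)
centered_out = stack(impulse, 4, causal=False)

causal_frames = [t for t, v in enumerate(causal_out) if v != 0]
centered_frames = [t for t, v in enumerate(centered_out) if v != 0]
```

With causal padding, the event at frame 8 influences outputs at frames 8 to 16 only, so no output depends on a later input. With centered padding it influences frames 4 to 12, meaning outputs up to four frames before the event already depend on it.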
Table 2 shows our results. The original centered Hann window with our causal model (H1)
performs worse than our non-causal starting point (TP3 in Table 1). Shifting the Hann
window from a delay of 64 ms to a delay of 10 ms (H2) seems to completely attenuate
usable information in the frames. Using an asymmetric window (T1) improves performance,
but still falls behind the centered Hann window. Successively increasing the delay up
to 50 ms, we see a strong improvement (T2–T5 and Figure 2). For a delay of 30 ms or
more, we match the performance of the centered Hann window at an evaluation onset
tolerance of 30 ms. For stricter tolerances, shifted asymmetric windows of 20 ms delay
or more surpass the centered Hann window.
We also take the chance to investigate how a shift-tolerant loss of ±1 frame affects
results for the causal model. The loss could allow the model to systematically predict
events one frame (10 ms) later than annotated. Surprisingly, using an asymmetric window
with 10 ms of delay, we find that the shift-tolerant loss (ST) performs on par with
30 ms delay (T3) when admitting an evaluation tolerance of 50 ms (not shown in the
table), but breaks down with any stricter tolerance (as seen in the table).
For the third group of experiments, we keep the strictest setting with asymmetric
windows at a delay of 10 ms.
4.3 Model Architecture
In our final group of experiments, we investigate architectural modifications to our
reference model. The architecture of Mobile-AMT consists of three acoustic stacks, each
consisting of recurrent convolutional blocks. Each stack learns one target (onset,
frame or velocity). For some targets, the outputs of multiple stacks are concatenated
to condition the final predictions. Compared to their reference offline model [2],
Mobile-AMT omits the acoustic stack for the offset target, reusing the stack for the
frame target.

[Plot: "Asymmetric window with different amounts of shift". x-axis: onset tolerance
(10, 20, 30, 50 ms); y-axis: mean note onset F1 score (0.200–0.400); one curve per
window delay of 10, 20, 30, 40 and 50 ms.]
Figure 2. Note onset F1 scores (means only) for different window delays and onset
tolerance thresholds.

        Onset tolerance
Exp.    10 ms            20 ms            30 ms
Note onset F1 (mean ± std)
A1      20.23 ± 4.29     23.03 ± 4.25     23.75 ± 4.24
A2      25.77 ± 4.65     30.66 ± 4.98     31.79 ± 4.93
A3      21.59 ± 4.79     24.48 ± 4.83     25.14 ± 4.80
A4      21.53 ± 4.31     24.51 ± 4.24     25.22 ± 4.19
A5      19.44 ± 4.20     22.39 ± 4.34     23.12 ± 4.30
A6      25.82 ± 4.48     30.52 ± 4.51     31.56 ± 4.44
Note onset and offset F1 (mean ± std)
A1       3.25 ± 1.31      5.13 ± 2.54      7.20 ± 3.51
A2       3.56 ± 1.29      6.17 ± 2.80      8.39 ± 4.01
A3       5.84 ± 1.41      5.84 ± 2.48      7.71 ± 3.43
A4       5.94 ± 1.31      5.94 ± 2.48      7.89 ± 3.49
A5       4.84 ± 1.11      4.84 ± 2.36      6.61 ± 3.32
A6       6.79 ± 1.15      6.79 ± 2.53      8.89 ± 3.59

Table 3. Note onset and onset-and-offset F1 scores on the MAESTRO v.3 validation set
across three onset tolerances for architectural modifications and input
representations.

Our experiments involve the following adaptations, each tested independently: First, in
A1 we (re)introduce a separate offset acoustic stack to explore whether and how it
improves offset label prediction. In A2 we remove the velocity conditioning on the
onsets. Next, we examine whether further streamlining the architecture by sharing a
fourth (A3), half (A4) or all (A5) of the convolutional blocks in the model's acoustic
stacks affects performance. Lastly, in A6 we examine the effect of training on the
original 10-second sequence length.
Table 3 presents the note onset and note onset-and-offset F1 scores for the model and
data adaptations on the MAESTRO validation set. Overall, it is evident that the
combined impact of binary, heavily imbalanced pointwise targets, causal modeling, and
the shifted asymmetric window results in a significantly harder learning problem, with
the same training duration (500 epochs) leading to significantly poorer scores than the
base case (TP1 in Table 1). However, across all experimental setups in Section 4.1
compared to the current one, all our causal modifications demonstrate significantly
stronger robustness to decreasing tolerance thresholds, which is important to guarantee
low latency in predictions, and therefore appear promising for further training.
Furthermore, when comparing all model architecture modifications (A1–A5) on note onset
and note onset-and-offset F1 score, we observe two unexpected model behaviours: First,
adding a separate offset stack (A1) does not improve offset prediction. As our
postprocessing detects a note offset as the earlier of either offset activation or
frame inactivation, we hypothesize that frame activity is sufficiently learned to
compensate for the absence of an offset acoustic stack. Second, removing the velocity
conditioning on onset prediction (A2) results in a strong improvement in onset
prediction. Furthermore, sharing the acoustic stack across increasing proportions
(A3–A5) does not appear to hinder the model's ability to learn meaningful
representations. Finally, experiment A6 suggests that the model benefits from the
larger contextual window.
4.4 Final comparison
For our final comparison, we proceed with the following data and model configurations:
we continue with the shifted (160 samples) asymmetric window for the STFT (T1 in
Sec. 4.2), remove the velocity conditioning (A2) and share all convolutional layers in
the acoustic stack across all targets (A5). Mobile-AMT uses the original non-causal
postprocessing described in Section 3.2, while our model uses the causal postprocessing
introduced in Section 4.1.
Table 4 summarizes the results over different onset (and offset) tolerances for note
onset and onset-and-offset metrics. As expected, Mobile-AMT outperforms our modified
causal model across all metrics at tolerances of 20 and 30 ms, with a significant
margin. Upon reviewing all experiments conducted, we conclude that the largest
performance drops are attributed to the shifted window function and the causal
convolutions in our model. When comparing Mobile-AMT and our adapted model across
various onset tolerance thresholds, we observe, similar to the previous experiment,
that while our modified causal model predicts fewer targets with lower accuracy
overall, it demonstrates higher precision and robustness when evaluated at the
strictest timing tolerance of 10 ms.
                        Note Onset                                      Note Onset with Offset
Model       Tol. (ms)   Precision       Recall          F1              Precision       Recall          F1
Causal-AMT  10          43.51 ± 6.87    25.60 ± 8.64    31.55 ± 7.76     6.86 ± 2.15     4.11 ± 2.03     5.03 ± 2.08
Mobile-AMT  10          22.70 ± 5.82    15.57 ± 2.96    18.26 ± 3.51     3.27 ± 1.19     2.32 ± 0.94     2.69 ± 1.00
Causal-AMT  20          50.86 ± 7.52    29.71 ± 9.11    36.70 ± 7.96    10.89 ± 3.91     6.34 ± 2.90     7.86 ± 3.16
Mobile-AMT  20          59.28 ± 7.41    41.87 ± 8.44    48.52 ± 6.96    11.03 ± 4.15     7.98 ± 3.73     9.16 ± 3.84
Causal-AMT  30          51.85 ± 7.49    30.24 ± 9.07    37.38 ± 7.85    14.49 ± 5.72     8.33 ± 3.93    10.37 ± 4.42
Mobile-AMT  30          81.17 ± 6.29    57.84 ± 12.79   66.80 ± 9.78    18.42 ± 6.89    13.27 ± 6.09    15.26 ± 6.30

Table 4. Final comparison between our implementation of Mobile-AMT and our modified
minimal-latency, strictly causally adapted model.

5. DISCUSSION AND OUTLOOK
In this work, we investigate whether and how the current state of the art in real-time
piano transcription can be adapted to achieve minimum-latency automatic piano
transcription suitable for real-time musical interaction.
What latency is suitable cannot be answered universally, so our choice of 10–30 ms is
worthy of discussion. While 10 ms is suggested in digital instrument design [8–10],
thresholds of latency perception vary depending on the musical situation, task, and
instrument: for percussive digital instruments, decreased ratings for values of 20 ms
and above were found [20], instrument-specific thresholds between below 10 ms and 40 ms
are reported in a live monitoring setting [21], and about 30 ms were found for gestural
control [22]. Offsets as low as 6 ms may be perceived in simple isochronously spaced
stimuli [23], while other researchers found just noticeable latency differences at
27 ms and higher [24]. Transcription-enabled real-time applications like interactive
accompaniment or generative improvisation are more akin to ensemble playing than direct
instrument control. In networked musical contexts, researchers typically aim for
20–30 ms of latency to meet performance conditions that mirror traditional in-person
ensembles [11, 12]. However, studies also found that musicians may be able to
compensate for latencies up to 50 ms [13,25] or even 100 ms [26] for one piano piece, a
value that was deemed “neither musical nor interactive” in another study [27]. A
real-time transcription model should not only be tolerable but enable fluent musical
interaction, so we took 30 ms as a minimal requirement, and 10 ms as a goal for
imperceptible latency.
We investigate multiple adaptations to reduce latency, including label encoding with
causal postprocessing, shifted asymmetric window functions during preprocessing, and
architectural modifications that enforce causal processing within the model.
Additionally, we reduce the model size (from 320 to 160 GFLOPs for 3 seconds of input)
by sharing computations across core model components for all targets.
In a first set of experiments, we assess the impact of regression versus classification
loss encodings for non-causal models. The original regression targets only make sense
in conjunction with a lookahead, as the targets begin to increase several frames before
the actual onset, which is impossible for a causal model to predict. To mitigate the
cost in training stability and accuracy incurred by localized, causal-ready targets, we
experiment with loss functions that weight the active frames over the inactive frames
to combat label imbalance, and loss functions that are tolerant to small temporal
shifts. We find that the weighted classification losses approximate the baseline, and
the shift-tolerant losses reach the same level in the absence of targets requiring
lookahead.
In a second experiment, we investigate the delay incurred by the computation of audio
feature representations. Specifically, we look at STFT windows and their corresponding
centered targets. Transcription requires high frequency resolution for pitch
estimation, which requires large windows. Centering the targets results in an often
overlooked delay of half the window length, 64 ms in our case. We test configurations
of shifted windows along with asymmetric windowing functions that do not attenuate the
most recent samples. We find that aggressively shifted windows at 10 ms do deteriorate
the transcription accuracy by a lot, yet at 30 ms, we reach comparable performance to
an unshifted causal model. Here, a shift-tolerant loss does not improve performance. At
the same time, configuring the model architecture for strictly causal processing also
deteriorates performance with respect to the baseline with more than 100 ms of
lookahead.
In a third experiment, we assess different model architectures and their impact. We
observe that sharing the convolutional components of the acoustic stack across
different target types proves beneficial. We hypothesize that the local acoustic
features captured in the convolutional layers of the acoustic model can be effectively
learned independently of sequential information, making them invariant to the target
type. Furthermore, removing the velocity conditioning on the onsets strongly improves
the accuracy of onset predictions.
Overall, we find that we can compensate well for algorithmic issues: we can scale the
model and use lookahead-free targets without a major drop in performance. What proves
difficult, however, is to render the model strictly causal and to effectively process
the incoming audio without loss of relevant information. For a latency of 10 ms, it
would be required that the model predicts pitches with at most 10 ms of incoming audio
samples. For onsets of the lowest two octaves on the piano, this means that there is
not even a full period of the fundamental frequency present in the samples, and
predictions may need to rely on harmonic partials. Along with the transient phase and
the consequently blurry STFT frame, this leads to an increasingly hard transcription
task. We hope that these findings and pinpointed challenges will contribute to future
research on real-time, minimum latency automatic piano transcription.
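
The statement about the lowest two octaves can be checked with a short calculation: a note's fundamental period exceeds 10 ms exactly when its fundamental frequency is below 100 Hz. The sketch assumes equal temperament with A4 = 440 Hz and the standard 88-key range (MIDI notes 21 to 108); the function names are hypothetical.

```python
def midi_to_hz(midi_note):
    """Equal-tempered pitch, assuming A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def keys_without_full_period(budget_ms=10.0, lowest=21, highest=108):
    """MIDI notes whose fundamental period is longer than the audio budget."""
    return [m for m in range(lowest, highest + 1)
            if 1000.0 / midi_to_hz(m) > budget_ms]


affected = keys_without_full_period()
print(len(affected), "keys, from MIDI note", affected[0], "to", affected[-1])
```

Under these assumptions, the 23 keys from A0 (MIDI 21) to G2 (MIDI 43, about 98 Hz) do not complete a full fundamental period within 10 ms, which is just short of two full octaves.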
While this study primarily focuses on the algorithmic
performance and robustness of a real-time transcription
model, we acknowledge that a detailed analysis of
processing time (including both network inference and
preprocessing) across different hardware platforms remains
an important area of future work to allow for the
practical deployment of a real-time transcription system in
real-world scenarios. Likewise, we want to take a closer
look at the design of the underlying window function
and filterbank, in order to find an appropriate balance
between reducing prediction delay and increasing future
context, all while maintaining the desired STFT properties.
Lastly, across all experimental groups, our system
adaptations consistently outperformed the baseline at lower
timing tolerance, which we consider a desirable property
worthy of further investigation.
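The trade-off between prediction delay and future context mentioned above can be made concrete with simple frame arithmetic. The sketch below is illustrative only: the sample rate, window length, and helper name are assumptions, not the configuration used in this work. It splits an analysis window into the audio that lies before the frame's reference time (past context) and the audio that lies after it (future context, which adds directly to the prediction delay).

```python
# Past and future context of one analysis window, in milliseconds.
# All numeric values used below are hypothetical examples.


def frame_context_ms(window_samples: int, future_samples: int, sample_rate: int):
    """Return (past_ms, future_ms) for a window of `window_samples` samples,
    of which `future_samples` lie after the frame's reference time."""
    if not 0 <= future_samples <= window_samples:
        raise ValueError("future_samples must lie within the window")
    past_samples = window_samples - future_samples
    to_ms = 1000.0 / sample_rate
    return past_samples * to_ms, future_samples * to_ms


if __name__ == "__main__":
    sr = 16000      # assumed sample rate in Hz
    window = 2048   # assumed window length in samples
    for label, future in [("centered", window // 2),
                          ("10 ms of future audio", 160),
                          ("strictly causal", 0)]:
        past_ms, future_ms = frame_context_ms(window, future, sr)
        print(f"{label}: past {past_ms:.1f} ms, future {future_ms:.1f} ms")
```

With these example values, a centered window waits for 64 ms of future audio, whereas a strictly causal window waits for none and relies entirely on past samples. An asymmetric window sits between the two.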
Proceedings of the 26th ISMIR Conference, Daejeon, Korea, September 21-25, 2025
6. ACKNOWLEDGEMENTS
This research acknowledges support by the European
Research Council (ERC), under the European Union’s
Horizon 2020 research and innovation programme, grant
agreement No. 101019375 ("Whither Music?"). The LIT AI Lab is
supported by the Federal State of Upper Austria.
7. REFERENCES
[1] C. Haw ho ne, A. S asyuk, A. Robe s, I. Simon, C.-
Z. A. Huang, S. Dieleman, E. Elsen, J. Engel, and
D. Eck, “Enabling ac o ized piano music modeling
and gene a ion wi h he MAESTRO da ase ,” in In-
e na ional Con e ence on Lea ning Rep esen a ions,
2019.
[2] Q. Kong, B. Li, X. Song, Y. Wan, and Y. Wang, “High-
Resolu ion Piano T ansc ip ion wi h Pedals by Re-
g essing Onse and O se Times,” IEEE/ACM T ans-
ac ions on Audio Speech and Language P ocessing,
ol. 29, pp. 3707–3717, 2021.
[3] S. Sig ia, E. Bene os, and S. Dixon, “An end- o-end
neu al ne wo k o polyphonic piano music ansc ip-
ion,” IEEE/ACM T ansac ions on Audio, Speech, and
Language P ocessing, ol. 24, no. 5, pp. 927–939,
2016.
[4] R. Kelz, S. Böck, and G. Widme , “Deep polyphonic
ads piano no e ansc ip ion,” in ICASSP 2019-2019
IEEE In e na ional Con e ence on Acous ics, Speech
and Signal P ocessing (ICASSP). IEEE, 2019, pp.
246–250.
[5] Y. Kusaka and A. Maezawa, “Mobile-AMT: Real-
Time Polyphonic Piano T ansc ip ion o In- he-Wild
Reco dings,” in 2024 32nd Eu opean Signal P ocess-
ing Con e ence (EUSIPCO). IEEE, 2024, pp. 36–40.
[6] A. Fe nandez, “Onse s and Veloci ies: A o dable
Real-Time Piano T ansc ip ion Using Con olu ional
Neu al Ne wo ks,” in 2023 31s Eu opean Signal P o-
cessing Con e ence (EUSIPCO). IEEE, 2023, pp.
151–155.
[7] T. Kwon, D. Jeong, and J. Nam, “Towa ds E icien and
Real-Time Piano T ansc ip ion Using Neu al Au o e-
g essi e Models,” IEEE/ACM T ansac ions on Audio,
Speech, and Language P ocessing, 2024.
[8] D. Wessel and M. W igh , “P oblems and p ospec s o
in ima e musical con ol o compu e s,” Compu e mu-
sic jou nal, ol. 26, no. 3, pp. 11–22, 2002.
[9] A. P. McPhe son, R. H. Jack, and G. Mo o, “Ac ion-
sound la ency: A e ou ools as enough?” in
16 h In e na ional Con e ence on New In e aces o
Musical Exp ession, NIME 2016, G i i h Uni e si y,
B isbane, Aus alia, July 11-15, 2016. nime.o g,
2016, pp. 20–25. [Online]. A ailable: h ps://doi.o g/
10.5281/zenodo.3964611
[10] F. Caspe, J. Shie , M. Sandle , C. Sai is, and
A. McPhe son, “Designing neu al syn hesize s o low
la ency in e ac ion,” a Xi p ep in a Xi :2503.11562,
2025.
[11] L. Tu che and C. Ro ondi, “On he ela ion be ween
he ields o ne wo ked music pe o mances, ubiqui-
ous music, and in e ne o musical hings,” Pe sonal
and Ubiqui ous Compu ing, ol. 27, no. 5, pp. 1783–
1792, 2023.
[12] E. Lakio akis, C. Liaskos, and X. Dimi opoulos, “Im-
p o ing ne wo ked music pe o mance sys ems us-
ing applica ion-ne wo k collabo a ion,” Concu ency
and Compu a ion: P ac ice and Expe ience, ol. 31,
no. 24, p. e4730, 2019.
[13] E. Chew, R. Zimme mann, A. A. Sawchuk, C. Ky -
iakakis, C. Papadopoulos, A. F ançois, G. Kim,
A. Rizzo, and A. Volk, “Musical in e ac ion a a dis-
ance: Dis ibu ed imme si e pe o mance,” in P o-
ceedings o he MusicNe wo k Fou h Open Wo kshop
on In eg a ion o Music in Mul imedia Applica ions.
MusicNe wo k Ba celona, 2004, pp. 15–16.
[14] T. Kwon, D. Jeong, and J. Nam, “Polyphonic Piano
T ansc ip ion Using Au o eg essi e Mul i-S a e No e
Model,” in The 21 h In e na ional Socie y o Music
In o ma ion Re ie al Con e ence (ISMIR). In e na-
ional Socie y o Music In o ma ion Re ie al, 2020.
[15] D. Jeong and S. Telecom, “Real- ime au oma ic piano
music ansc ip ion sys em,” in La e B eaking Demo.
In e na ional Socie y o Music In o ma ion Re ie al,
2020, pp. 4–6.
[16] A. Howa d, M. Sandle , G. Chu, L.-C. Chen, B. Chen,
M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasude an e al.,
“Sea ching o MobileNe V3,” in P oceedings o he
IEEE/CVF in e na ional con e ence on compu e i-
sion, 2019, pp. 1314–1324.
[17] C. Ra el, B. McFee, E. J. Humph ey, J. Salamon,
O. Nie o, D. Liang, D. P. Ellis, and C. C. Ra el,
“MIR_EVAL: A T anspa en Implemen a ion o Com-
mon MIR Me ics.” in ISMIR, ol. 10, 2014, p. 2014.
[18] R. M. Bi ne , J. J. Bosch, D. Rubins ein, G. Mesegue -
B ocal, and S. Ewe , “A ligh weigh ins umen -
agnos ic model o polyphonic no e ansc ip ion and
mul ipi ch es ima ion,” in P oceedings o he IEEE In-
e na ional Con e ence on Acous ics, Speech, and Sig-
nal P ocessing (ICASSP), Singapo e, 2022.
[19] F. Fosca in, J. Schlü e , and G. Widme , “Bea his!
Accu a e bea acking wi hou DBN pos p ocessing,”
a Xi p ep in a Xi :2407.21658, 2024.
[20] R. H. Jack, A. Meh abi, T. S ockman, and A. McPhe -
son, “Ac ion-sound la ency and he pe cei ed quali y o
digi al musical ins umen s: Compa ing p o essional
P oceedings o he 26 h ISMIR Con e ence, Daejeon, Ko ea, Sep embe 21-25, 2025
89
pe cussionis s and ama eu musicians,” Music Pe cep-
ion, ol. 36, no. 1, pp. 109–128, 09 2018. [Online].
A ailable: h ps://doi.o g/10.1525/mp.2018.36.1.109
[21] M. Les e and J. Boley, “The e ec s o la ency on li e
sound moni o ing,” Jou nal o he Audio Enginee ing
Socie y, no. 7198, oc obe 2007.
[22] T. Mäki-Pa ola and P. Hämäläinen, “La ency ole ance
o ges u e con olled con inuous sound ins umen
wi hou ac ile eedback,” in P oceedings o he 2004
In e na ional Compu e Music Con e ence, ICMC
2004, Miami, Flo ida, USA, No embe 1-6, 2004.
Michigan Publishing, 2004. [Online]. A ailable:
h ps://hdl.handle.ne /2027/spo.bbp2372.2004.032
[23] A. F ibe g and J. Sundbe g, “Time disc imina ion in
a mono onic, isoch onous sequence,” The Jou nal o
he Acous ical Socie y o Ame ica, ol. 98, no. 5, pp.
2524–2531, 1995.
[24] A. Schmid, M. Amb os, J. Bogon, and R. Wimme ,
“Measu ing he jus no iceable di e ence o audio
la ency,” in P oceedings o he 19 h In e na ional
Audio Mos ly Con e ence: Explo a ions in Sonic
Cul u es, AM 2024, Milan, I aly, Sep embe 18-
20, 2024, L. A. Ludo ico and D. A. Mau o,
Eds. ACM, 2024, pp. 325–331. [Online]. A ailable:
h ps://doi.o g/10.1145/3678299.3678331
[25] S. Dahl and R. B esin, “Is he playe mo e in luenced
by he audi o y han he ac ile eedback om he in-
s umen ,” in P oceedings o he Digi al Audio E ec s
Con e ence (DAFx), 2001, pp. 6–9.
[26] A. A. Sawchuk, E. Chew, R. Zimme mann, C. Pa-
padopoulos, and C. Ky iakakis, “F om emo e media
imme sion o dis ibu ed imme si e pe o mance,”
in P oceedings o he 2003 ACM SIGMM Wo k-
shop on Expe ien ial Telep esence, se . ETP ’03.
New Yo k, NY, USA: Associa ion o Compu ing
Machine y, 2003, p. 110–120. [Online]. A ailable:
h ps://doi.o g/10.1145/982484.982506
[27] C. Ba le e, D. Headlam, M. Bocko, and G. Velikic,
“E ec o ne wo k la ency on in e ac i e musical
pe o mance,” Music Pe cep ion, ol. 24, no. 1,
pp. 49–62, 09 2006. [Online]. A ailable: h ps:
//doi.o g/10.1525/mp.2006.24.1.49