A Spanish multispeaker database of esophageal speech

Author: Serrano García, Luis,Raman, Sneha,Hernáez Rioja, Inmaculada,Navas Cordón, Eva,Sánchez de la Fuente, Jon,Saratxaga Couceiro, Ibon

Publisher: Elsevier

Year: 2021

DOI: 10.1016/j.csl.2020.101168

Source: https://addi.ehu.eus/bitstream/10810/47986/4/AhoSLABI_CSL.V7.pdf

A Spanish Multispeaker Database of Esophageal Speech
Luis Serrano García, Sneha Raman, Inma Hernáez Rioja, Eva Navas Cordón, Jon Sanchez, Ibon Saratxaga
HiTZ Basque Center for Language Technology, University of the Basque Country (UPV/EHU), Bilbao, Spain
Abstract
A laryngectomee is a person whose larynx has been removed by surgery, usually due to laryngeal cancer. After surgery, most laryngectomees are able to speak again, using techniques that are learned with the help of a speech therapist. This is termed alaryngeal speech, and esophageal speech (ES) is one of the several alaryngeal speech production modes. A considerable amount of research has been dedicated to the study of alaryngeal speech, with a wide range of aims such as helping speech therapists with evaluation and diagnosis, and improving its quality and intelligibility using digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, which is designed to allow the development of new support technologies for this speech impairment. The database primarily consists of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. Additionally, it includes parallel recordings of the sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. Apart from the sentences, the database includes sustained vowels and a small set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocols and procedure, as well as the labeling process. The main acoustic characteristics of the voices, such as speaking rate, durations of the recordings, phones and silences, and other such characteristics are compared with those of a reduced set of healthy voices. In addition, we describe an experiment using the database to improve the performance of an ASR system for ES speakers. This new resource will be made available to the scientific community with the hope that it will be used to improve the quality of life of the laryngectomees.
Keywords: esophageal speech, voice conversion, speech databases, speech intelligibility, speech analysis
2010 MSC: 00-01, 99-00
¹ Current Address: Communications Engineering Dept., Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU) Spain
Preprint submitted to Computer Speech and Language, September 23, 2020
This is the accepted version of an article published by Elsevier. The final version of Luis Serrano García, Sneha Raman, Inma Hernáez Rioja, Eva Navas Cordón, Jon Sanchez, Ibon Saratxaga. A Spanish multispeaker database of esophageal speech. Computer Speech & Language 66 : (2021) // Article ID 101168 is available at https://doi.org/10.1016/j.csl.2020.101168 ©2020 Elsevier Ltd. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
1. Introduction
Esophageal speech (ES) is a type of speech produced by laryngectomees, who are people whose larynx has been surgically removed. The larynx is a fundamental organ in the speech production mechanism. It contains the vocal folds, which are responsible for generating the air vibrations that are perceived as a sound. In addition to the removal of the larynx, the laryngectomy separates the nasal cavity and the vocal tract. As a result, the laryngectomees breathe through a hole (called the stoma) which lets the outside air directly into the trachea. Despite the removal of the vocal folds, it is still possible for people who have undergone a total laryngectomy to produce intelligible speech via one of the three main types of alaryngeal speech: using an electro-larynx (EL Speech), tracheoesophageal speech (TES) and ES.
EL speech uses an external vibration device which is placed in contact with the throat. This device generates an acoustic buzz that can be modulated by the movement of the articulators. Intelligible speech is obtained with this method, but the quality is poor, mainly due to the dominant buzzing. The main and perhaps the only advantage of this method is that no learning is required.
In Healthy Laryngeal Speech (HS), the air that flows through the lungs and the trachea vibrates the vocal folds to create sound. This is not anatomically possible for a laryngectomee. Therefore, airflow is produced using other strategies, the effectiveness of which depends on the characteristics and anatomy of each person. A surgical solution is to create a fistula allowing air to pass between the trachea and the esophagus. A valve is placed in the fistula so that no food or liquid can pass to the trachea. The airflow aided by this valve produces vibrations of the esophageal sphincter, which generates TES. TES is more intelligible and requires less effort from the speaker than other methods of obtaining the air [1][2][3][4][5][6]. However, the valves must be changed periodically (requiring surgery) and there are other possible medical complications associated with the implant [7][8].
Unlike TES speakers, ES speakers do not have the valve that allows the controlled entry of air. ES speakers achieve this function by swallowing air and expelling it, very much like the production of a burp. As in TES, the pharyngoesophageal segment is used as a substitute vibratory element instead of the vocal folds. Learning to produce speech in this manner requires long periods of training (usually months) with the assistance of a speech therapist. Due to the difficulties in the production method [9], some individuals never manage to learn ES. However, despite the long periods of learning, ES has the advantage of not requiring a device or periodic surgeries. Therefore, we consider there is a clear advantage in promoting the learning of ES.
As ES and HS production mechanisms are very different, their speech signal characteristics differ greatly too [4][10]. The main consequence of ES is a dramatic reduction in naturalness and intelligibility [11][5][12]. A considerable amount of research effort has been devoted to overcoming these limitations of ES, some of which involves artificially modifying its characteristics.
There have been several approaches to enhance the quality and intelligibility of alaryngeal voices. Some studies use source-filter analysis of the pathological signal and focus on modifying the source, the filter, or both. An example of this approach can be found in [13], where an adaptive gain equalizer algorithm was used to modify the ES source; or in [14], where the reconstruction of normal sounding speech for laryngectomy patients was attempted through a modified CELP (Code Excited Linear Prediction) codec. In [15], different manipulations of both source and filter were evaluated. Another approach to improve intelligibility and quality is to work with the prosodic elements. In [16], the pitch information extracted from an electroglottograph (EGG) was used to create a synthetic glottal signal which reduced jitter and shimmer. Additionally, spectral smoothing and tilt correction were applied. These modifications reduced the harshness and breathiness of the TES speech. The same authors describe a method for rectifying the duration of pathological phones in [17]. Along the same lines, [18] presents a system where concatenation of randomly chosen healthy reference patterns replaces the pathological excitation, adjusting the short, medium and long-term variability of the pitch.
A different approach to the problem is to use Voice Conversion (VC) techniques. VC aims to modify the characteristics of the voice of an input speaker, making them sound like those of a target speaker. In the classical approach [19], a conversion function is trained using data from both source and target speaker. Although non-parallel training is also possible [20], in VC, a set of parallel source-target sentences is desirable. A set of 50 phonemically balanced sentences in Japanese was used to evaluate the performance and capability of the different VC strategies that were aimed at improving the quality and intelligibility of alaryngeal voices [21][22][23][24][25][26][27].
In this paper, we present an acoustic speech database specifically designed for the research of speech conversion techniques, applied to ES. The purpose of developing the AhoSLABI database² was to compile acoustic data which would allow us to investigate the use of VC techniques in improving the intelligibility and quality of ES. Some of our previous work in the field of personalized synthetic voices [28][29] has revealed that laryngectomees are a highly interested user group of the technological developments in speech synthesis, speech recognition and VC techniques. Although no official statistics have been published, in 2018, an estimated 1200 laryngectomies were performed in Spain [8]. We aim to provide some useful tools to these laryngectomees, and for social and geographical reasons, we have developed the database in Spanish. To promote research and the comparison of techniques and results, we have provided an open access phonetically labeled database.
² The name is a combination of the laboratory name of the authors (Aholab), and the name of the Biscayan Association of Laryngectomees ASLABI
First, we present an overview of the existing databases in the following section. Thereafter, we describe the contents of the database and the processes performed. Section “Design of the AhoSLABI database” describes the corpus contents and characteristics of the speakers, as well as the recording setup. The “Results” section presents some metrics of the database and provides some linguistic and acoustic statistics about its contents. In this section, we also report the process of extraction of phonetic labels and the evaluation of the automatic labeling procedure. In addition, we give some preliminary results of ASR and VC experiments performed with the database. The final section presents the conclusions and discusses possible future uses of the database.
2. Existing Related Material
In some types of pathological speech such as dysarthric speech, certain databases have been extensively used and have become a de facto standard [30][31][32]. The same cannot be said of alaryngeal speech databases. For ES, many different recordings of varied characteristics have been performed, each adapted to the purpose of the study. In this section, we review the research publications in the field and give an overview of the existing recordings and their characteristics.
Research on alaryngeal speech has traditionally focused on the production of sustained vowels. Vowels allow easy measurement of fundamental frequency, harmonic properties, and intensity and duration of phonation, which are basic features in assessing the speaker's voice quality and speaking proficiency. Vowel-based analyses were performed in a number of studies [33][4][34][35][36][37][38][39]. Some studies used recordings of words and sentences to measure the speaking rate [40][2][41], to study pauses [42] or both [43]. Recordings of words and sentences have also been used in perceptual evaluations [44][45][46][47][48][49][50], and to evaluate synthetic manipulations [15].
Automatic speech recognition (ASR) is also problematic for alaryngeal voices. Some ASR experiments use only vowels [51][52]. Typically, hundreds of sentences are used to train such ASR systems. In [53], a parallel database of 500 sentences pronounced by seven EL and seven HS German speakers was used to evaluate an ASR designed for HS speakers. In [54], 480 sentences produced by one French ES speaker were recorded with the purpose of improving the performance of an existing ASR system.
The statistical VC experiments described in [21][22][55][23][24][25][26][27] use 50 parallel HS-ES sentences, but in Japanese. In order to facilitate the alignment procedure, the HS speaker tried to imitate the rhythm of the ES speakers' utterances. Such a parallel HS-ES database is desirable for VC.
In conclusion, to the best of our knowledge, no standard database exists to perform comparable research for Spanish ES, let alone to carry out VC experiments. We hope to fill this void with the database described in this paper.
3. Design of the AhoSLABI database
3.1. Text content
We selected the Spanish text corpus called ZureTTS described in [28] for the recordings. This corpus contains 100 phonetically balanced sentences encompassing all the phonemes in Castilian Spanish. The phoneme frequency distribution is shown in Table 1. The phoneme codes follow the Spanish SAMPA convention³. The total number of phones is 5625. This distribution is consistent with other previous Spanish corpora (see for example [56]). The sentences in the corpus are semantically relatively complex. As we already have HS recordings of this corpus, it made sense to record the ES database with the same corpus. This allowed us to have a parallel ES-HS corpus which is useful for tasks such as parallel VC.
³ https://www.phon.ucl.ac.uk/home/sampa/spanish.htm
For a healthy speaker, the recording process usually takes between 30 and 40 minutes. For an ES speaker, the same task takes longer (see subsection Recorded Material and Durations) and for novice ES speakers, it can be quite exhausting. This is why the 100 sentences recorded were further divided into three blocks of 33, 33 and 34 sentences respectively. Each one of these blocks was phonetically balanced within itself. Therefore, if a speaker was tired and decided not to continue with the recording process after the first or the second block, the collected material would still be useful.
Table 1: Percentage of phonemes in the AhoSLABI corpus.
Phoneme  Occurrences (%)    Phoneme  Occurrences (%)
a        12.71              b        2.83
e        13.17              d        4.98
i        8.69               g        1.44
o        9.76               p        1.92
u        4.43               t        4.48
m        2.52               k        3.47
n        7.13               f        1.08
J        0.30               s        5.99
l        4.96               T        1.99
L        0.69               x        0.82
jj       0.41               tS       0.44
r        4.75               rr       1.03
In addition to the 100 sentences, each ES speaker recorded 4 instances of the sustained articulation of all five Spanish vowels. Four words containing diphthongs were also recorded (murciélago, acuífero, ayuntamiento, aceituno). Ten isolated words, which are also present in the ZureTTS corpus, were included in the recordings, to enable future evaluations of spoken term detection tasks and the like.
3.2. Characteristics of the Speakers
All the ES speakers who participated in the recording process are members of the Association of Laryngectomees of Biscay (ASLABI). The speakers underwent speech therapy sessions after the laryngectomy to learn ES production techniques.
Most candidates performed the recordings months after having finished the speech therapy sessions. We call these speakers 'proficient' ES speakers. On the other hand, 4 of them were still attending the therapy sessions and their speech had very low intelligibility. We call these speakers 'non-proficient' speakers. Out of the 4 non-proficient speakers, 2 returned after finishing the therapy and performed the recordings again. We have kept all these sessions in the database.
The database contains recordings from 31 speakers (27 male and 4 female). It is composed of 34 different sessions as follows:
• 26 proficient ES speakers with one recording session each
• 2 non-proficient ES speakers with one recording session each
• 2 ES speakers with one recording session each when they were non-proficient and one when they were proficient (in total 4 sessions)
• 1 speaker's recordings in both TES and ES (in total 2 sessions)
In summary, out of the 34 sessions, 29 correspond to proficient ES speakers, one to a proficient TES speaker and the remaining four to non-proficient ES speakers.
The mean age of the speakers was 65 years and 4 months, but with large variation. The youngest was 51 years and 4 months old at the time of recording, and the oldest was 82 years and 5 months old.
In order to identify each session, a four-character code is used:
• The first two numbers identify the speaker (01 to 32)⁴
• One character specifies the speaker's gender, M or F.
• One character specifies the kind of speaker: "3" for the proficient speakers and "2" for the non-proficient speakers. For the TES speaker a "T" has been used.
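The session code described above can be decoded mechanically. The following is a minimal sketch in Python, not part of the database distribution; the names `parse_session_id`, `Session` and `KINDS` are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple

# Two digits (speaker), one gender letter, one speaker-kind character.
SESSION_RE = re.compile(r"(\d{2})([MF])([23T])")

# Meaning of the last character, as described in the text.
KINDS = {"3": "proficient ES", "2": "non-proficient ES", "T": "TES"}


class Session(NamedTuple):
    speaker: int  # speaker number, 1 to 32
    gender: str   # "M" or "F"
    kind: str     # one of the values in KINDS


def parse_session_id(code: str) -> Session:
    """Split a four-character session code into its three fields."""
    match = SESSION_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid session identifier: {code!r}")
    number, gender, kind = match.groups()
    speaker = int(number)
    if not 1 <= speaker <= 32:
        raise ValueError(f"speaker number out of range: {speaker}")
    return Session(speaker, gender, KINDS[kind])
```

For instance, `parse_session_id("09MT")` yields speaker 9, gender "M" and kind "TES", while a malformed code such as "9MT" raises a `ValueError`.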
The majority of sessions (25) feature proficient male speakers. Table 2 lists all the session identifiers.
Table 2: Session identifiers.
                                   Session identifier
Non-proficient, male               13M2, 14M2, 16M2
Non-proficient, female             15F2
Proficient, female                 11F3, 15F3, 25F3, 28F3
Tracheoesophageal speaker, male    09MT
Proficient, male                   All the others
In addition to the ES speakers, recordings of the 100 sentences from 9 healthy speakers (6 males, 3 females, average age: 36 years and 3 months) are provided. These speakers were selected because of their availability and willingness to be part of the public database, and no age-balance criterion was considered.
3.3. Recording protocol
The database recording protocol and procedures were approved by the ethics committee of the University of the Basque Country (UPV/EHU) (signed on 26th February 2017). The recordings were made in the soundproofed recording room at the Faculty of Engineering (UPV/EHU). Four different microphones (studio microphone - Neumann TLM 103, instrumentation microphone
⁴ Recordings from speaker number 27 are not included in the database
syllables per second obtained for each session, ordered by mean. The results of the set of 9 HS speakers are also shown. As expected, HS showed a higher speaking rate than ES. It can also be seen that the TES speaker (session 09MT) achieved a speaking rate which does not differ from that of a healthy speaker, which corroborates previous analysis on TES and ES [44]. Moreover, when the same speaker did not use the valve (session 09M3), his speaking rate slowed considerably. Another interesting result is that 3 out of the 4 non-proficient speakers had the slowest speaking rates. Two of these non-proficient speakers repeated the recordings three months later, after gaining more control and speech proficiency. While speaker 15F increased the speed, speaker 16M was speaking even slower. However, based on only these two speakers, we cannot generalise these observations.
Figure 4: Speaking rate. Speaking rate calculated for 34 sessions of esophageal speakers (blue) and 9 of healthy speakers (green). In each box, the center line is the median, the edges of the box represent the 25th and 75th percentiles, the whiskers extend to the most extreme values not considered outliers, and the outliers are shown individually with a red cross.
4.4. ASR experiments
Standard ASR systems normally use healthy speech as training material and therefore perform poorly for ES. In this subsection, we describe an experiment where we compare the results of two ASR systems, one trained with HS and the other one with ES from the AhoSLABI database.
The starting point for both ASR systems is a standard Spanish ASR built using the Kaldi toolkit [58]. The specific implementation for Spanish is described in [59] and it is implemented following the recipe s5 for the Wall Street Journal database. The training begins with a flat-start initialization of context-independent phonetic Hidden Markov Models (HMM), and then a series of accumulative trainings are done. For the final step of the recognizer, a neural network is trained. The input features to the neural network consist of a series of 40-dimensional features. The network sees a window of these features, with 4 frames on each side of the central frame. The features are derived by processing the conventional 13-dimensional Mel-Frequency Cepstral Coefficients (MFCCs) to which a process of mean and variance normalization (CMVN) is applied to mitigate the effects of the channel. The necessary steps are described in [60] and basically consist in applying a series of transformations to the normalized cepstra: first linear discriminant analysis (LDA), then maximum likelihood linear transform (MLLT) and global feature-space maximum likelihood linear regression (fMLLR). At the recognition stage, the same transformations are applied to the test data, handling them as a block.
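Two of the steps in this front end, the mean and variance normalization and the context window of 4 frames on each side, are simple enough to sketch. The code below is an illustration in plain Python and is not the Kaldi implementation: the function names `cmvn` and `splice` are hypothetical, normalization is assumed to be computed per utterance, and frames at the edges are assumed to be padded by repeating the first or last frame. The LDA, MLLT and fMLLR transforms are not shown.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Normalize each coefficient to zero mean and unit variance.

    `frames` is a list of equal-length lists of floats (one per frame).
    Statistics are computed over the given frames only (assumption:
    one call per utterance).
    """
    dims = range(len(frames[0]))
    means = [fmean(f[d] for f in frames) for d in dims]
    # A constant coefficient has zero deviation; divide by 1.0 instead.
    stds = [pstdev([f[d] for f in frames]) or 1.0 for d in dims]
    return [[(f[d] - means[d]) / stds[d] for d in dims] for f in frames]


def splice(frames, context=4):
    """Concatenate each frame with `context` neighbours on each side."""
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            # Edge handling (assumption): repeat the first/last frame.
            idx = min(max(t + offset, 0), last)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced
```

With 40-dimensional input features and `context=4`, each spliced vector has 9 × 40 = 360 values, which is the size of the window the network sees under these assumptions.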
The main corpus used for the training of the acoustic models is the Spanish section of a subset of the Basque Parliament database. This subset contains the recordings of 47 parliamentary sessions of the Basque Parliament in both Basque and Spanish, together with their corresponding transcriptions⁵. Some preliminary work has been done to separate the Spanish interventions from the Basque ones. As a result, there are more than 124 hours of speech in Spanish uttered by 84 different speakers, 45 male and 39 female. In addition to the Basque Parliament database, about 4 hours of speech extracted from 5 audio files in Spanish from the Spanish MAVIR workshops held in 2006, 2007 and 2008 were also used to train the acoustic models (see [61] for more details).
⁵ This database is presently being developed by the GTTS research group of the UPV/EHU, contact ge man.bo [email protected]
To avoid the effects of Out Of Vocabulary (OOV) words, the lexicon of both ASR systems has been reduced to the vocabulary of the 100 sentences of the database and unigram models are used. For the ASR trained with HS, the healthy speakers of the database had a mean WER score of 15.8±3.9, while the ES speakers had a mean WER score of 68.7±16.9. These results show how problematic generic ASR trained with HS can be for ES.
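For readers unfamiliar with the metric, the word error rate (WER) is conventionally the word-level edit distance between the reference and the recognized sentence, divided by the number of reference words. The sketch below follows that standard definition; it is not the scoring tool used in the experiments, and the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent for one reference/hypothesis pair."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution/match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Because insertions count as errors, the WER is not bounded by 100%: a one-word reference recognized as three wrong words gives 300%. This is how a session can score above 100%.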
To train the system with ES, we used all the ES speakers for which the complete set of 100 sentences was available. The speakers were divided into 3 blocks of 10 speakers each. The sentences were divided into 10 blocks. A two-level cross-validation was performed, one at the speaker level and the other at the sentence level. In total 10 (sentence blocks) times 3 (speaker blocks), i.e., 30 cross-validations were performed to ensure that the test data was not seen in the training phase. In each of these cross-validations, 90 sentences from all the speakers of 2 blocks were used as training material and the 10 test sentences of the 3rd block of speakers were evaluated. When done 30 times, all the sentences from all the speakers were covered.
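The two-level scheme can be written down as a small fold generator. This is a sketch under stated assumptions rather than the authors' script: the function name `make_folds` is hypothetical, and blocks are formed from contiguous slices, since the text does not say how speakers and sentences were assigned to blocks.

```python
from itertools import product


def make_folds(speakers, sentences, n_speaker_blocks=3, n_sentence_blocks=10):
    """Return one fold per (speaker block, sentence block) pair.

    In each fold the test set is one sentence block spoken by one speaker
    block, and the training set is the remaining sentences spoken by the
    remaining speakers, so neither test speakers nor test sentences are
    seen in training.
    """
    def blocks(items, n):
        if len(items) % n:
            raise ValueError("items must split evenly into blocks")
        size = len(items) // n
        # Assumption: contiguous blocks in the given order.
        return [items[i * size:(i + 1) * size] for i in range(n)]

    spk_blocks = blocks(list(speakers), n_speaker_blocks)
    sen_blocks = blocks(list(sentences), n_sentence_blocks)
    folds = []
    for s, k in product(range(n_speaker_blocks), range(n_sentence_blocks)):
        folds.append({
            "test_speakers": spk_blocks[s],
            "test_sentences": sen_blocks[k],
            "train_speakers": [x for i, b in enumerate(spk_blocks)
                               if i != s for x in b],
            "train_sentences": [x for i, b in enumerate(sen_blocks)
                                if i != k for x in b],
        })
    return folds
```

Calling `make_folds(range(30), range(100))` produces 30 folds, each with 20 training speakers, 90 training sentences, 10 test speakers and 10 test sentences, and the test sets together cover every speaker and sentence combination once.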
The ASR scores of the 29 proficient speakers from both systems (ASR trained with HS and ASR trained with ES) are presented in Figure 5. The non-proficient speaker (14M2) has been removed from the global results due to their poor performance (WER higher than 100%). The WER scores from the ASR trained with HS were significantly higher than those from the ASR trained with ES (t(28)=16.14, p<0.001). As can be observed, some speakers benefit more than others from the ES training. The mean improvement in WER is 23.2±7.7.
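The text does not name the statistical test, but 28 degrees of freedom for 29 speakers is what a paired t-test on the per-speaker WER differences would give. The sketch below computes that statistic under this assumption; the function names `paired_t` and `t_from_summary` are hypothetical, and no per-speaker values from the paper are used.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two matched lists."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def t_from_summary(mean_diff, sd_diff, n):
    """Same statistic computed from the mean and deviation of the differences."""
    return mean_diff / (sd_diff / sqrt(n))
```

As a rough consistency check, and assuming the ± value is the sample standard deviation of the per-speaker improvements, `t_from_summary(23.2, 7.7, 29)` gives approximately 16.2, close to the reported 16.14; the small gap is what one would expect from rounding in the published summary values.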
This result demonstrates that generic Spanish ASR systems can be made more ES inclusive by using the AhoSLABI database.
Figure 5: ASR Results. Mean speaker-wise Word Error Rates (in %) for ASR trained with HS and ASR trained with ES
5. Conclusions
In this article we have described a database of Spanish ES called AhoSLABI. The database comprises mainly male ES, although it also contains the recordings of four female ES speakers and one male TES speaker. The main content of the database is the recordings of a set of 100 phonetically balanced sentences. The database also contains parallel recordings of 9 healthy speakers. We performed segmentation and labeling on the data. We have described the main aspects of the experimental setup, speaker characteristics and the acoustic properties of the recordings.
The primary motivation for creating this database was the authors' desire to have the laryngectomees benefit from recent advances in speech technologies, specifically in the field of VC. In particular, as reported in section 1, VC techniques have been proposed in the literature to improve the intelligibility of these voices. This was the main reason to record the sentences, as most VC systems need parallel source-target utterances to train the conversion function. Some of our VC work ([62] and [63]) demonstrates how ES can be made more intelligible or more preferable to listeners using VC techniques.
Although VC was our main intended application, there are many other areas of study where these recordings could be of interest. The sustained vowel recordings are helpful in the evaluation of fundamental frequency, shimmer, jitter, and intensity and duration of phonation. The signals can be used to train and test the performance of ASR systems with ES as shown in section 4.4 of this paper. Additionally, a small set of isolated words is also available, which can be useful to test ASR systems in a spoken term detection task.
Another research area is related to the loss of identity in the laryngectomees' voices. One's voice is a very important personality trait which is lost with laryngectomy. The recordings available could be useful in the emulation of pre-laryngectomy speech characteristics. Investigating ways to restore this identity could be more feasible if pre-surgery recordings were available. In the future, the authors intend to also record voices of pre-laryngectomy patients.
Subjective evaluation of the quality and intelligibility of alaryngeal speech to improve diagnosis and therapy is also possible with these recordings, because the number and variety of individuals is considerably high. A preliminary study of the intelligibility and listening effort of AhoSLABI was conducted in [64]. We believe that it is not only speech engineers but also researchers in speech therapy who can benefit from this database⁶.
6. Acknowledgments
This work was partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R), the Basque Government (PIBA-018-0035) and by the European Union's H2020 research and innovation program under the Marie Curie European Training Network ENRICH (675324).
The authors want to thank the Asociación Bizkaina de Laringectomizados for their valuable collaboration and all the laryngectomees for their voice donations. We also would like to thank the reviewers for their fruitful comments that have contributed greatly to the value of the paper.
⁶ The database is available for researchers through the European Language Resources Agency repository.
References
[1] S. E. Williams, J. B. Watson, Speaking proficiency variations according to method of alaryngeal voicing, Laryngoscope 97 (1987) 737–739.
[2] R. H. Pindzola, B. H. Cain, Acceptability ratings of tracheoesophageal speech, Laryngoscope 98 (1988) 394–397.
[3] W. Ainsworth, S. W., Perceptual comparison of neoglottal, oesophageal and normal speech, Folia Phoniatr (Basel) 44 (6) (1992) 297–307.
[4] F. Debruyne, P. Delaere, J. Wouters, P. Uwents, Acoustic analysis of tracheo-oesophageal versus oesophageal speech, The Journal of Laryngology & Otology 108 (4) (1994) 325–328.
[5] T. Most, Y. Tobin, R. C. Mimran, Acoustic and perceptual characteristics of esophageal and tracheoesophageal speech production, Journal of communication disorders 33 (2) (2000) 165–181.
[6] L. Širić, D. Šoš, M. Rosso, S. Stevanović, Objective assessment of tracheoesophageal and esophageal speech using acoustic analysis of voice, Collegium antropologicum 36 (2) (2013) 111–114.
[7] B. M. Op de Coul, F. J. Hilgers, A. J. Balm, I. B. Tan, F. J. van den Hoogen, H. van Tinteren, A decade of postlaryngectomy vocal rehabilitation in 318 patients: a single Institution's experience with consistent application of provox indwelling voice prostheses, Archives of otolaryngology–head & neck surgery 126 (11) (2000) 1320–8. doi:10.1001/archotol.126.11.1320. URL http://www.ncbi.nlm.nih.gov/pubmed/11074828
[8] P. Díaz de Cerio Canduela, I. Arán González, R. Barberá Durban, A. Sistiaga Suárez, M. Tobed Secall, P. L. Parente Arias, Rehabilitation of the laryngectomised patient. Recommendations of the Spanish Society of Otolaryngology and Head and Neck Surgery, Acta Otorrinolaringológica Española (2018) 1–6. doi:10.1016/j.otorri.2018.01.003. URL https://doi.org/10.1016/j.otorri.2018.01.003
[9] E. Lundström, Voice Function and Quality of Life in Laryngectomees, in: Studies in Logopedics and Phoniatrics, 13, Karolinska Institutet, Stockholm, 2009.
[10] W. Wszolek, M. Modrzejewski, M. Przysiezny, Acoustic analysis of esophageal speech in patients after total laryngectomy, Archives of Acoustics 32 (4 (Supplement)) (2007) 151–158.
[11] B. Weinberg, Acoustical properties of esophageal and tracheoesophageal speech, Laryngectomee rehabilitation (1986) 113–127.
[12] T. Drugman, M. Rijckaert, C. Janssens, M. Remacle, Tracheoesophageal speech: A dedicated objective acoustic assessment, Computer Speech & Language 30 (1) (2015) 16–31.
[13] R. Ishaq, B. G. Zapirain, Esophageal speech enhancement using modified voicing source, in: Signal Processing and Information Technology (ISSPIT), 2013 IEEE International Symposium on, IEEE, 2013, pp. 000210–000214.
[14] H. R. Sharifzadeh, I. V. McLoughlin, F. Ahmadi, Reconstruction of normal sounding speech for laryngectomy patients through a modified celp codec, IEEE Transactions on Biomedical Engineering 57 (10) (2010) 2448–2458.
[15] R. van Son, I. Jacobi, F. J. Hilgers, et al., Manipulating tracheoesophageal speech, in: Interspeech, 2010, pp. 274–277.
[16] A. Del Pozo, S. Young, Continuous tracheoesophageal speech repair, in: Signal Processing Conference, 2006 14th European, Citeseer, 2006, pp. 1–5.
[17] A. Del Pozo, S. Young, Repairing tracheoesophageal speech duration, in: Proc Speech Prosody, Citeseer, 2008, pp. 187–190.
[18] O. Schleusing, R. Vetter, P. Renevey, J.-M. Vesin, V. Schweizer, Prosodic speech restoration device: Glottal excitation restoration using a multi-resolution approach, in: International Joint Conference on Biomedical Engineering Systems and Technologies, Springer, 2010, pp. 177–188.
[19] Y. Stylianou, O. Cappé, E. Moulines, Continuous probabilistic transform for voice conversion, IEEE Transactions on Speech and Audio Processing 6 (2) (1998) 131–142. doi:10.1109/89.661472.
[20] D. Erro, A. Moreno, A. Bonafonte, Inca algorithm for training voice conversion systems from nonparallel corpora, IEEE Transactions on Audio, Speech, and Language Processing 18 (5) (2009) 944–953.
[21] M. Kishimoto, T. Toda, H. Doi, S. Sakti, S. Nakamura, Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement, in: Signal Processing (ICSP), 2012 IEEE 11th International Conference on, Vol. 1, IEEE, 2012, pp. 590–594.
[22] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, Esophageal speech enhancement based on statistical voice conversion with gaussian mixture models, IEICE TRANSACTIONS on Information and Systems 93 (9) (2010) 2472–2482.
[23] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, Statistical approach to enhancing esophageal speech based on gaussian mixture models, in: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, IEEE, 2010, pp. 4250–4253.
[24] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, Speaking-aid systems based on one-to-many eigenvoice conversion for total laryngectomees, APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.
[25] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques, in: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, IEEE, 2011, pp. 5136–5139.
[26] K. Yamamoto, T. Toda, H. Doi, H. Saruwatari, K. Shikano, Statistical approach to voice quality control in esophageal speech enhancement, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4497–4500.
[27] H. Doi, Augmented speech production beyond physical constraints using statistical voice conversion – alaryngeal speech enhancement and singing voice quality control, Ph.D. thesis, Nara Institute of Science and Technology (2013).
[28] D. Erro, I. Hernáez, E. Navas, A. Alonso, H. Arzelus, I. Jauk, N. Q. Hy, C. Magariños, R. Pérez-Ramón, M. Sulír, et al., Zuretts: online platform for obtaining personalized synthetic voices, Proceedings of eNTERFACE (2014) 1178–1193.
[29] D. Erro, I. Hernaez, A. Alonso, D. García-Lorenzo, E. Navas, J. Ye, H. Arzelus, I. Jauk, N. Q. Hy, C. Magariños, R. Pérez-Ramón, M. Sulír, X. Tian, X. Wang, Personalized synthetic voices for speaking impaired: Website and app, in: Interspeech, 2015, pp. 1251–1254.
[30] M. Eye, E. Infirmary, Voice disorders database, version. 1.03 (cd-rom) (1994).
[31] X. Menendez-Pidal, J. B. Polikoff, S. M. Peters, J. E. Leonzio, H. T. Bunnell, The nemours database of dysarthric speech, in: Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, Vol. 3, IEEE, 1996, pp. 1962–1965.
[32] H. Kim, M. Hasegawa-Johnson, A. Perlman, J. Gunderson, T. S. Huang, K. Watkin, S. Frame, Dysarthric speech database for universal access research, in: Proceedings of Interspeech, 2008, pp. 1741–1744.
[33] M. Kinishi, M. Amatsu, Pitch perturbation measures of voice production of laryngectomees after the amatsu tracheoesophageal shunt operation, Auris Nasus Larynx 13 (1) (1986) 53–62.
[34] M. R. Arias, J. L. Ramón, M. Campos, J. J. Cervantes, Acoustic analysis of the voice in phonatory fistuloplasty after total laryngectomy, Otolaryngology–Head and Neck Surgery 122 (5) (2000) 743–747.
[35] C. J. van As-Brooks, F. J. Koopmans-van Beinum, L. C. Pols, F. J. Hilgers, Acoustic signal typing for evaluation of voice quality in tracheoesophageal speech, Journal of Voice 20 (3) (2006) 355–368.
[36] M. Carello, M. Magnano, A first comparative study of oesophageal and voice prosthesis speech production, EURASIP Journal on Advances in Signal Processing 2009 (1) (2009) 821304.
[37] J. K. MacCallum, L. Cai, L. Zhou, Y. Zhang, J. J. Jiang, Acoustic analysis of aperiodic voice: perturbation and nonlinear dynamic properties in esophageal phonation, Journal of Voice 23 (3) (2009) 283–290.
[38] N. Deore, S. Datta, R. Dwivedi, R. Palav, R. Shah, S. Sayed, M. Jagde, R. Kazi, Acoustic analysis of tracheo-oesophageal voice in male total laryngectomy patients, The Annals of The Royal College of Surgeons of England 93 (7) (2011) 523–527.
[39] H.-J. Shim, H. R. Jang, H. B. Shin, D.-H. Ko, Cepstral, spectral and time-based analysis of voices of esophageal speakers, Folia Phoniatrica et Logopaedica 67 (2) (2015) 90–96.
[40] J. Robbins, H. B. Fisher, E. C. Blom, M. I. Singer, A comparative acoustic study of normal, esophageal, and tracheoesophageal speech production, Journal of Speech and Hearing disorders 49 (2) (1984) 202–210.