A Spanish Multispeaker Database of Esophageal Speech
Luis Serrano García, Sneha Raman, Inma Hernáez Rioja, Eva Navas Cordón, Jon Sanchez, Ibon Saratxaga
HiTZ Basque Center for Language Technology, University of the Basque Country (UPV/EHU), Bilbao, Spain
Abstract
A laryngectomee is a person whose larynx has been removed by surgery, usually due to laryngeal cancer. After surgery, most laryngectomees are able to speak again, using techniques that are learned with the help of a speech therapist. This is termed alaryngeal speech, and esophageal speech (ES) is one of the several alaryngeal speech production modes. A considerable amount of research has been dedicated to the study of alaryngeal speech, with a wide range of aims such as helping speech therapists with evaluation and diagnosis, and improving its quality and intelligibility using digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, which is designed to allow the development of new support technologies for this speech impairment. The database primarily consists of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. Additionally, it includes parallel recordings of the sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. Apart from the sentences, the database includes sustained vowels and a small set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocols and procedure, as well as the labeling process. The main acoustic characteristics of the voices, such as speaking rate, durations of the recordings, phones and silences, and other such characteristics are compared with those of a reduced set of healthy voices. In addition, we describe an experiment using the database to improve the performance of an ASR system for ES speakers. This new resource will be made available to the scientific community with the hope that it will be used to improve the quality of life of the laryngectomees.

¹ Current Address: Communications Engineering Dept., Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Spain
Preprint submitted to Computer Speech and Language, September 23, 2020
This is the accepted version of an article published by Elsevier. The final version of Luis Serrano García, Sneha Raman, Inma Hernáez Rioja, Eva Navas Cordón, Jon Sanchez, Ibon Saratxaga. A Spanish multispeaker database of esophageal speech. Computer Speech & Language 66 : (2021) // Article ID 101168 is available at https://doi.org/10.1016/j.csl.2020.101168 ©2020 Elsevier Ltd. All rights reserved. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Keywords: esophageal speech, voice conversion, speech databases, speech intelligibility, speech analysis
2010 MSC: 00-01, 99-00
1. Introduction
Esophageal speech (ES) is a type of speech produced by laryngectomees, who are people whose larynx has been surgically removed. The larynx is a fundamental organ in the speech production mechanism. It contains the vocal folds, which are responsible for generating the air vibrations that are perceived as a sound. In addition to the removal of the larynx, the laryngectomy separates the nasal cavity and the vocal tract. As a result, the laryngectomees breathe through a hole (called the stoma) which lets the outside air directly into the trachea. Despite the removal of the vocal folds, it is still possible for people who have undergone a total laryngectomy to produce intelligible speech via one of the three main types of alaryngeal speech: using an electro-larynx (EL Speech), tracheoesophageal speech (TES) and ES.
EL speech uses an external vibration device which is placed in contact with the throat. This device generates an acoustic buzz that can be modulated by the movement of the articulators. Intelligible speech is obtained with this method, but the quality is poor, mainly due to the dominant buzzing. The main and perhaps the only advantage of this method is that no learning is required.
In Healthy Laryngeal Speech (HS), the air that flows through the lungs and the trachea vibrates the vocal folds to create sound. This is not anatomically possible for a laryngectomee. Therefore, airflow is produced using other strategies, the effectiveness of which depends on the characteristics and anatomy of each person. A surgical solution is to create a fistula allowing air to pass between the trachea and the esophagus. A valve is placed in the fistula so that no food or liquid can pass to the trachea. The airflow aided by this valve produces vibrations of the esophageal sphincter, which generates TES. TES is more intelligible and requires less effort from the speaker than other methods of obtaining the air [1][2][3][4][5][6]. However, the valves must be changed periodically (requiring surgery) and there are other possible medical complications associated with the implant [7][8].
Unlike TES speakers, ES speakers do not have the valve that allows the controlled entry of air. ES speakers achieve this function by swallowing air and expelling it, very much like the production of a burp. Like TES, the pharyngoesophageal segment is used as a substitute vibratory element instead of the vocal folds. Learning to produce speech in this manner requires long periods of training (usually months) with the assistance of a speech therapist. Due to the difficulties in the production method [9], some individuals never manage to learn ES. However, despite the long periods of learning, ES has the advantage of not requiring a device or periodic surgeries. Therefore, we consider there is a clear advantage in promoting the learning of ES.
As ES and HS production mechanisms are very different, their speech signal characteristics differ greatly too [4][10]. The main consequence of ES is a dramatic reduction in naturalness and intelligibility [11][5][12]. A considerable amount of research effort has been devoted to overcoming these limitations of ES, some of which involves artificially modifying its characteristics.
There have been several approaches to enhance the quality and intelligibility of alaryngeal voices. Some studies use source-filter analysis of the pathological signal and focus on modifying the source, the filter, or both. An example of this approach can be found in [13], where an adaptive gain equalizer algorithm was used to modify the ES source; or in [14], where the reconstruction of normal sounding speech for laryngectomy patients was attempted through a modified CELP (Code Excited Linear Prediction) codec. In [15], different manipulations of both source and filter were evaluated. Another approach to improve intelligibility and quality is to work with the prosodic elements. In [16], the pitch information extracted from an electroglottograph (EGG) was used to create a synthetic glottal signal which reduced jitter and shimmer. Additionally, spectral smoothing and tilt correction were applied. These modifications reduced the harshness and breathiness of the TES speech. The same authors describe a method for rectifying the duration of pathological phones in [17]. Along the same lines, [18] presents a system where concatenation of randomly chosen healthy reference patterns replaces the pathological excitation, adjusting the short, medium and long-term variability of the pitch.
A different approach to the problem is to use Voice Conversion (VC) techniques. VC aims to modify the characteristics of the voice of an input speaker, making them sound like those of a target speaker. In the classical approach [19], a conversion function is trained using data from both source and target speaker. Although non-parallel training is also possible [20], in VC, a set of parallel source-target sentences is desirable. A set of 50 phonemically balanced sentences in Japanese was used to evaluate the performance and capability of the different VC strategies that were aimed at improving the quality and intelligibility of alaryngeal voices [21][22][23][24][25][26][27].
In this paper, we present an acoustic speech database specifically designed for the research of speech conversion techniques, applied to ES. The purpose of developing the AhoSLABI database² was to compile acoustic data which would allow us to investigate the use of VC techniques in improving the intelligibility and quality of ES. Some of our previous work in the field of personalized synthetic voices [28][29] has revealed that laryngectomees are a highly interested user group of the technological developments in speech synthesis, speech recognition and VC techniques. Although no official statistics have been published, in 2018, an estimated 1200 laryngectomies were performed in Spain [8]. We aim to provide some useful tools for these laryngectomees, and for social and geographical reasons, we have developed the database in Spanish. To promote research and the comparison of techniques and results, we have provided an open access phonetically labeled database.

² The name is a combination of the laboratory name of the authors (Aholab), and the name of the Biscayan Association of Laryngectomees ASLABI
First, we present an overview of the existing databases in the following section. Thereafter, we describe the contents of the database and the processes performed. Section "Design of the AhoSLABI database" describes the corpus contents and characteristics of the speakers, as well as the recording setup. The "Results" section presents some metrics of the database and provides some linguistic and acoustic statistics about its contents. In this section, we also report the process of extraction of phonetic labels and the evaluation of the automatic labeling procedure. In addition, we give some preliminary results of ASR and VC experiments performed with the database. The final section presents the conclusions and discusses possible future uses of the database.
2. Existing Related Material
In some types of pathological speech such as dysarthric speech, certain databases have been extensively used and have become a de facto standard [30][31][32]. The same cannot be said of alaryngeal speech databases. For ES, many different recordings of varied characteristics have been performed, each adapted to the purpose of the study. In this section, we review the research publications in the field and give an overview of the existing recordings and their characteristics.
Research on alaryngeal speech has traditionally focused on the production of sustained vowels. Vowels allow easy measurement of fundamental frequency, harmonic properties, and intensity and duration of phonation, which are basic features in assessing the speaker's voice quality and speaking proficiency. Vowel-based analyses were performed in a number of studies [33][4][34][35][36][37][38][39]. Some studies used recordings of words and sentences to measure the speaking rate [40][2][41], to study pauses [42] or both [43]. Recordings of words and sentences have also been used in perceptual evaluations [44][45][46][47][48][49][50], and to evaluate synthetic manipulations [15].
Automatic speech recognition (ASR) is also problematic for alaryngeal voices. Some ASR experiments use only vowels [51][52]. Typically, hundreds of sentences are used to train such ASR systems. In [53], a parallel database of 500 sentences pronounced by seven EL and seven HS German speakers was used to evaluate an ASR designed for HS speakers. In [54], 480 sentences produced by one French ES speaker were recorded with the purpose of improving the performance of an existing ASR system.
The statistical VC experiments described in [21][22][55][23][24][25][26][27] use 50 parallel HS-ES sentences, but in Japanese. In order to facilitate the alignment procedure, the HS speaker tried to imitate the rhythm of the ES speakers' utterances. Such a parallel HS-ES database is desirable for VC.
In conclusion, to the best of our knowledge, no standard database exists to perform comparable research for Spanish ES, let alone to carry out VC experiments. We hope to fill this void with the database described in this paper.
3. Design of the AhoSLABI database
3.1. Text content
We selected the Spanish text corpus called ZureTTS described in [28] for the recordings. This corpus contains 100 phonetically balanced sentences encompassing all the phonemes in Castilian Spanish. The phoneme frequency distribution is shown in Table 1. The phoneme codes follow the Spanish SAMPA convention³. The total number of phones is 5625. This distribution is consistent with other previous Spanish corpora (see for example [56]). The sentences in the corpus are semantically relatively complex. As we already have HS recordings of this corpus, it made sense to record the ES database with the same corpus. This allowed us to have a parallel ES-HS corpus which is useful for tasks such as parallel VC.

³ https://www.phon.ucl.ac.uk/home/sampa/spanish.htm
For a healthy speaker, the recording process usually takes between 30 and 40 minutes. For an ES speaker, the same task takes longer (see subsection Recorded Material and Durations) and for novice ES speakers, it can be quite exhausting. This is why the 100 recorded sentences were further divided into three blocks of 33, 33 and 34 sentences respectively. Each one of these blocks was phonetically balanced within itself. Therefore, if a speaker was tired and decided not to continue with the recording process after the first or the second block, the collected material would still be useful.
Table 1: Percentage of phonemes in the AhoSLABI corpus.
Phoneme  Occurrences (%)  Phoneme  Occurrences (%)
a        12.71            b        2.83
e        13.17            d        4.98
i        8.69             g        1.44
o        9.76             p        1.92
u        4.43             t        4.48
m        2.52             k        3.47
n        7.13             f        1.08
J        0.30             s        5.99
l        4.96             T        1.99
L        0.69             x        0.82
jj       0.41             tS       0.44
r        4.75             rr       1.03
In addition to the 100 sentences, each ES speaker recorded 4 instances of the sustained articulation of all five Spanish vowels. Four words containing diphthongs were also recorded (murciélago, acuífero, ayuntamiento, aceituno). Ten isolated words, which are also present in the ZureTTS corpus, were included in the recordings, to enable future evaluations of spoken term detection tasks and the like.
3.2. Characteristics of the Speakers
All the ES speakers who participated in the recording process are members of the Association of Laryngectomees of Biscay (ASLABI). The speakers underwent speech therapy sessions after the laryngectomy to learn ES production techniques.
Most candidates performed the recordings months after having finished the speech therapy sessions. We call these speakers 'proficient' ES speakers. On the other hand, 4 of them were still attending the therapy sessions and their speech had very low intelligibility. We call these speakers 'non-proficient' speakers. Out of the 4 non-proficient speakers, 2 returned after finishing the therapy and performed the recordings again. We have kept all these sessions in the database.
The database contains recordings from 31 speakers (27 male and 4 female). It is composed of 34 different sessions as follows:
• 26 proficient ES speakers with one recording session each
• 2 non-proficient ES speakers with one recording session each
• 2 ES speakers with one recording session each when they were non-proficient and one when they were proficient (in total 4 sessions)
• 1 speaker's recordings in both TES and ES (in total 2 sessions)
In summary, out of the 34 sessions, 29 correspond to proficient ES speakers, one to a proficient TES speaker and the remaining four to non-proficient ES speakers.
The mean age of the speakers was 65 years and 4 months, but with large variation. The youngest was 51 years and 4 months old at the time of recording, and the oldest was 82 years and 5 months old.
In order to identify each session, a four character code is used:
• The first two numbers identify the speaker (01 to 32)⁴
• One character specifies the speaker's gender, M or F.
• One character specifies the kind of speaker: "3" for the proficient speakers and "2" for the non-proficient speakers. For the TES speaker a "T" has been used.
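The identifier scheme above can be decoded mechanically. The following is a minimal sketch in Python, assuming the code is always exactly four characters in the order speaker number, gender, speaker kind; the names `Session` and `parse_session_id` are hypothetical and are not part of the database distribution.

```python
import re
from typing import NamedTuple


class Session(NamedTuple):
    speaker: int   # speaker number, 1 to 32
    gender: str    # "M" or "F"
    kind: str      # human-readable speaker kind


# Mapping of the last character of the code, as described in the text.
_KINDS = {"3": "proficient ES", "2": "non-proficient ES", "T": "TES"}
_PATTERN = re.compile(r"^(\d{2})([MF])([23T])$")


def parse_session_id(code: str) -> Session:
    """Decode a four-character session identifier such as '09MT'."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a valid session identifier: {code!r}")
    number, gender, kind = match.groups()
    speaker = int(number)
    if not 1 <= speaker <= 32:
        raise ValueError(f"speaker number out of range: {speaker}")
    return Session(speaker, gender, _KINDS[kind])
```

For instance, `parse_session_id("15F2")` describes the non-proficient session of female speaker 15, and `parse_session_id("15F3")` the later, proficient session of the same speaker.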
The majority of sessions (25) feature proficient male speakers. Table 2 lists all the session identifiers.
Table 2: Session identifiers.
Speaker group                     Session identifier
Non-proficient, male              13M2, 14M2, 16M2
Non-proficient, female            15F2
Proficient, female                11F3, 15F3, 25F3, 28F3
Tracheoesophageal speaker, male   09MT
Proficient, male                  All the others
In addition to the ES speakers, recordings of the 100 sentences from 9 healthy speakers (6 males, 3 females, average age: 36 years and 3 months) are provided. These speakers were selected because of their availability and willingness to be part of the public database, and no criteria of age balance were considered.
3.3. Recording protocol
The database recording protocol and procedures were approved by the ethics committee of the University of the Basque Country (UPV/EHU) (signed on 26th February 2017). The recordings were made in the soundproofed recording room at the Faculty of Engineering (UPV/EHU). Four different microphones (studio microphone - Neumann TLM 103, instrumentation microphone
⁴ Recordings from speaker number 27 are not included in the database
syllables per second obtained for each session, ordered by mean. The results of the set of 9 HS speakers are also shown. As expected, HS showed a higher speaking rate than ES. It can also be seen that the TES speaker (session 09MT) achieved a speaking rate which does not differ from that of a healthy speaker, which corroborates previous analysis on TES and ES [44]. Moreover, when the same speaker did not use the valve (session 09M3), his speaking rate slowed considerably. Another interesting result is that 3 out of the 4 non-proficient speakers had the slowest speaking rates. Two of these non-proficient speakers repeated the recordings three months later, after gaining more control and speech proficiency. While speaker 15F increased the speed, speaker 16M was speaking even slower. However, based on only these two speakers, we cannot generalise these observations.
Figure 4: Speaking rate. Speaking rate calculated for 34 sessions of esophageal speakers (blue) and 9 of healthy speakers (green). In each box, the centre line is the median, the edges of the box represent the 25th and 75th percentiles, the whiskers extend to the most extreme values not considered outliers, and the outliers are shown individually with a red cross.
4.4. ASR experiments
Standard ASR systems normally use healthy speech as training material and therefore perform poorly for ES. In this subsection, we describe an experiment where we compare the results of two ASR systems, one trained with HS and the other one with ES from the AhoSLABI database.
The starting point for both ASR systems is a standard Spanish ASR built using the Kaldi toolkit [58]. The specific implementation for Spanish is described in [59] and it is implemented following the recipe s5 for the Wall Street Journal database. The training begins with a flat-start initialization of context-independent phonetic Hidden Markov Models (HMM), and then a series of accumulative trainings are done. For the final step of the recognizer, a neural network is trained. The input features to the neural network consist of a series of 40-dimensional features. The network sees a window of these features, with 4 frames on each side of the central frame. The features are derived by processing the conventional 13-dimensional Mel-Frequency Cepstral Coefficients (MFCCs) to which a process of mean and variance normalization (CMVN) is applied to mitigate the effects of the channel. The necessary steps are described in [60] and basically consist in applying a series of transformations to the normalized cepstra: first linear discriminant analysis (LDA), then maximum likelihood linear transform (MLLT) and global feature-space maximum likelihood linear regression (fMLLR). At the recognition stage, the same transformations are applied to the test data, handling them as a block.
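Two of the steps in this feature pipeline, mean and variance normalization and the context window of 4 frames on each side, are simple enough to illustrate directly. The sketch below is plain Python and is not the Kaldi implementation: it assumes normalization statistics are computed over a single utterance and that edge frames are padded by repetition, and it omits the LDA, MLLT and fMLLR transforms. The function names `cmvn` and `splice` are hypothetical.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Normalize each coefficient to zero mean and unit variance over the utterance."""
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    # Constant coefficients have zero deviation; divide by 1.0 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in frames]


def splice(frames, context=4):
    """Concatenate each frame with `context` neighbours on each side.

    Frames beyond the utterance boundaries are replaced by the first or
    last frame (an assumption made for this sketch).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            window.extend(frames[min(max(t + offset, 0), last)])
        spliced.append(window)
    return spliced
```

With 40-dimensional features and a context of 4, each spliced vector has 9 × 40 = 360 components, which is the size of the window the network sees per frame under these assumptions.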
The main corpus used for the training of the acoustic models is the Spanish section of a subset of the Basque Parliament database. This subset contains the recordings of 47 parliamentary sessions of the Basque Parliament in both Basque and Spanish, together with their corresponding transcriptions⁵. Some preliminary work has been done to separate the Spanish interventions from the Basque ones. As a result, there are more than 124 hours of speech in Spanish uttered by 84 different speakers, 45 male and 39 female. In addition to the Basque Parliament database, about 4 hours of speech from 5 audio files in Spanish, extracted from the Spanish MAVIR workshops held in 2006, 2007 and 2008, were also used to train the acoustic models (see [61] for more details).

⁵ This database is presently being developed by the GTTS research group of the UPV/EHU, contact german.bo [email protected]
To avoid the effects of Out Of Vocabulary (OOV) words, the lexicon of both ASR systems has been reduced to the vocabulary of the 100 sentences of the database and unigram models are used. For the ASR trained with HS, the healthy speakers of the database had a mean word error rate (WER) score of 15.8±3.9, while the ES speakers had a mean WER score of 68.7±16.9. These results show how problematic generic ASR trained with HS can be for ES.
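For readers less familiar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcription, divided by the number of reference words. The following minimal sketch shows that standard definition; it is an illustration and not the scoring tool used in the experiments, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate (in %) of `hypothesis` against `reference`."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)
```

Because insertions are counted as errors but the denominator is the reference length, the score is not bounded by 100%: a hypothesis with many spurious words can yield a WER above 100%.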
To train the system with ES, we used all the ES speakers for which the complete set of 100 sentences was available. The speakers were divided into 3 blocks of 10 speakers each. The sentences were divided into 10 blocks. A two level cross validation was performed, one at the speaker level and the other at the sentence level. In total 10 (sentence blocks) times 3 (speaker blocks) i.e., 30 cross-validations were performed to ensure that the test data was not seen in the training phase. In each of these cross-validations, 90 sentences from all the speakers of 2 blocks were used as training material and the 10 test sentences of the 3rd block of speakers were evaluated. When done 30 times, all the sentences from all the speakers were covered.
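The two-level partition described above can be sketched as follows. This is an illustration under assumptions: the text does not state how speakers and sentences were assigned to blocks, so the interleaved assignment used here is hypothetical, as is the function name `two_level_folds`.

```python
from itertools import product


def two_level_folds(speakers, sentences, n_speaker_blocks=3, n_sentence_blocks=10):
    """Yield ((train_speakers, train_sentences), (test_speakers, test_sentences)).

    Each fold tests on one speaker block and one sentence block, and trains
    on the remaining speakers with the remaining sentences, so that neither
    the test speakers nor the test sentences are seen during training.
    """
    # Hypothetical block assignment: interleave items across blocks.
    speaker_blocks = [speakers[i::n_speaker_blocks] for i in range(n_speaker_blocks)]
    sentence_blocks = [sentences[i::n_sentence_blocks] for i in range(n_sentence_blocks)]
    for test_speakers, test_sentences in product(speaker_blocks, sentence_blocks):
        train_speakers = [s for s in speakers if s not in test_speakers]
        train_sentences = [s for s in sentences if s not in test_sentences]
        yield (train_speakers, train_sentences), (test_speakers, test_sentences)


folds = list(two_level_folds(list(range(30)), list(range(100))))
```

With 30 speakers and 100 sentences this yields 3 × 10 = 30 folds, each training on 20 speakers × 90 sentences and testing on 10 speakers × 10 sentences, and every speaker-sentence pair is tested exactly once.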
The ASR scores for the 29 proficient speakers from both systems (ASR trained with HS and ASR trained with ES) are presented in figure 5. The non-proficient speaker (14M2) has been removed from the global results due to their poor performance (WER higher than 100%). The WER scores from the ASR trained with HS were significantly higher than the ASR trained with ES (t(28)=16.14, p<0.001). As can be observed, some speakers benefit more than others from the ES training. The mean improvement in WER is 23.2±7.7.
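The 28 degrees of freedom match a paired comparison over the 29 proficient speakers, although the text does not name the test explicitly, so the paired t-test below is our reading. As a rough consistency check, if the ± value of the mean improvement is taken to be the standard deviation of the per-speaker differences (an assumption), then 23.2 / (7.7 / √29) ≈ 16.2, close to the reported 16.14. A minimal sketch of the statistic, with made-up WER values that are not taken from the database:

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired t-test on two samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(differences) / sqrt(n)
    return fmean(differences) / standard_error, n - 1


# Hypothetical per-speaker WER values (in %), for illustration only.
wer_hs_trained = [70.0, 55.0, 82.0, 64.0]
wer_es_trained = [45.0, 36.0, 52.0, 43.0]
t_value, dof = paired_t_statistic(wer_hs_trained, wer_es_trained)
```

The function assumes the per-speaker differences are not all identical; otherwise the standard error is zero and the statistic is undefined.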
This result demonstrates that generic Spanish ASR systems can be made more ES inclusive by using the AhoSLABI database.
Figure 5: ASR Results. Mean speaker-wise Word Error Rates (in %) for ASR trained with HS and ASR trained with ES
5. Conclusions
In this article we have described a database of Spanish ES called AhoSLABI. The database comprises mainly male ES, although it also contains the recordings of four female ES speakers and one male TES speaker. The main content of the database is the recordings of a set of 100 phonetically balanced sentences. The database also contains parallel recordings of 9 healthy speakers. We performed segmentation and labeling on the data. We have described the main aspects of the experimental setup, speaker characteristics and the acoustic properties of the recordings.
The primary motivation for creating this database was the authors' desire to have the laryngectomees benefit from recent advances in speech technologies, specifically in the field of VC. In particular, as reported in section 1, VC techniques have been proposed in the literature to improve the intelligibility of these voices. This was the main reason to record the sentences, as most VC systems need parallel source-target utterances to train the conversion function. Some of our VC work ([62] and [63]) demonstrates how ES can be made more intelligible or more preferable to listeners using VC techniques.
Although VC was our main intended application, there are many other areas of study where these recordings could be of interest. The sustained vowels recordings are helpful in the evaluation of fundamental frequency, shimmer, jitter, and intensity and duration of phonation. The signals can be used to train and test the performance of ASR systems with ES as shown in section 4.4 of this paper. Additionally, a small set of isolated words is also available which can be useful to test ASR systems in a spoken term detection task.
Another research area is related to the loss of identity in the laryngectomees' voices. One's voice is a very important personality trait which is lost with laryngectomy. The recordings available could be useful in the emulation of pre-laryngectomy speech characteristics. Investigating ways to restore this identity could be more feasible if pre-surgery recordings were available. In the future, the authors intend to also record voices of pre-laryngectomy patients.
Subjective evaluation of the quality and intelligibility of alaryngeal speech to improve diagnosis and therapy is also possible with these recordings, because the number and variety of individuals is considerably high. A preliminary study of the intelligibility and listening effort of AhoSLABI was conducted in [64]. We believe that it is not only speech engineers but also researchers in speech therapy who can benefit from this database⁶.

⁶ The database is available for researchers through the European Language Resources Agency repository.

6. Acknowledgments
This work was partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R), the Basque Government (PIBA-018-0035) and by the European Union's H2020 research and innovation program under the Marie Curie European Training Network ENRICH (675324).
The authors want to thank the Asociación Bizkaina de Laringectomizados for their valuable collaboration and all the laryngectomees for their voice donations. We also would like to thank the reviewers for their fruitful comments that have contributed greatly to the value of the paper.
References
[1] S. E. Williams, J. B. Watson, Speaking proficiency variations according to method of alaryngeal voicing, Laryngoscope 97 (1987) 737–739.
[2] R. H. Pindzola, B. H. Cain, Acceptability ratings of tracheoesophageal speech, Laryngoscope 98 (1988) 394–397.
[3] W. Ainsworth, S. W., Perceptual comparison of neoglottal, oesophageal and normal speech, Folia Phoniatr (Basel) 44 (6) (1992) 297–307.
[4] F. Debruyne, P. Delaere, J. Wouters, P. Uwents, Acoustic analysis of tracheo-oesophageal versus oesophageal speech, The Journal of Laryngology & Otology 108 (4) (1994) 325–328.
[5] T. Most, Y. Tobin, R. C. Mimran, Acoustic and perceptual characteristics of esophageal and tracheoesophageal speech production, Journal of communication disorders 33 (2) (2000) 165–181.
[6] L. Širić, D. Šoš, M. Rosso, S. Stevanović, Objective assessment of tracheoesophageal and esophageal speech using acoustic analysis of voice, Collegium antropologicum 36 (2) (2013) 111–114.
[7] B. M. Op de Coul, F. J. Hilgers, A. J. Balm, I. B. Tan, F. J. van den Hoogen, H. van Tinteren, A decade of postlaryngectomy vocal rehabilitation in 318 patients: a single Institution's experience with consistent application of provox indwelling voice prostheses, Archives of otolaryngology–head & neck surgery 126 (11) (2000) 1320–8. doi:10.1001/archotol.126.11.1320. URL http://www.ncbi.nlm.nih.gov/pubmed/11074828
[8] P. Díaz de Cerio Canduela, I. Arán González, R. Barberá Durban, A. Sistiaga Suárez, M. Tobed Secall, P. L. Parente Arias, Rehabilitation of the laryngectomised patient. Recommendations of the Spanish Society of Otolaryngology and Head and Neck Surgery, Acta Otorrinolaringológica Española (2018) 1–6. doi:10.1016/j.otorri.2018.01.003. URL https://doi.org/10.1016/j.otorri.2018.01.003
[9] E. Lundström, Voice Function and Quality of Life in Laryngectomees, in: Studies in Logopedics and Phoniatrics, 13, Karolinska Institutet, Stockholm, 2009.
[10] W. Wszolek, M. Modrzejewski, M. Przysiezny, Acoustic analysis of esophageal speech in patients after total laryngectomy, Archives of Acoustics 32 (4 (Supplement)) (2007) 151–158.
[11] B. Weinberg, Acoustical properties of esophageal and tracheoesophageal speech, Laryngectomee rehabilitation (1986) 113–127.
[12] T. Drugman, M. Rijckaert, C. Janssens, M. Remacle, Tracheoesophageal speech: A dedicated objective acoustic assessment, Computer Speech & Language 30 (1) (2015) 16–31.
[13] R. Ishaq, B. G. Zapirain, Esophageal speech enhancement using modified voicing source, in: Signal Processing and Information Technology (ISSPIT), 2013 IEEE International Symposium on, IEEE, 2013, pp. 000210–000214.
[14] H. R. Sharifzadeh, I. V. McLoughlin, F. Ahmadi, Reconstruction of normal sounding speech for laryngectomy patients through a modified celp codec, IEEE Transactions on Biomedical Engineering 57 (10) (2010) 2448–2458.
[15] R. van Son, I. Jacobi, F. J. Hilgers, et al., Manipulating tracheoesophageal speech, in: Interspeech, 2010, pp. 274–277.
[16] A. Del Pozo, S. Young, Continuous tracheoesophageal speech repair, in: Signal Processing Conference, 2006 14th European, Citeseer, 2006, pp. 1–5.
[17] A. Del Pozo, S. Young, Repairing tracheoesophageal speech duration, in: Proc Speech Prosody, Citeseer, 2008, pp. 187–190.
[18] O. Schleusing, R. Vetter, P. Renevey, J.-M. Vesin, V. Schweizer, Prosodic speech restoration device: Glottal excitation restoration using a multi-resolution approach, in: International Joint Conference on Biomedical Engineering Systems and Technologies, Springer, 2010, pp. 177–188.
[19] Y. Stylianou, O. Cappé, E. Moulines, Continuous probabilistic transform for voice conversion, IEEE Transactions on Speech and Audio Processing 6 (2) (1998) 131–142. doi:10.1109/89.661472.
[20] D. Erro, A. Moreno, A. Bonafonte, Inca algorithm for training voice conversion systems from nonparallel corpora, IEEE Transactions on Audio, Speech, and Language Processing 18 (5) (2009) 944–953.
[21] M. Kishimoto, T. Toda, H. Doi, S. Sakti, S. Nakamura, Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement, in: Signal Processing (ICSP), 2012 IEEE 11th International Conference on, Vol. 1, IEEE, 2012, pp. 590–594.
[22] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, Esophageal speech enhancement based on statistical voice conversion with gaussian mixture models, IEICE TRANSACTIONS on Information and Systems 93 (9) (2010) 2472–2482.
[23] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, Statistical approach to enhancing esophageal speech based on gaussian mixture models, in: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, IEEE, 2010, pp. 4250–4253.
[24] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, Speaking-aid systems based on one-to-many eigenvoice conversion for total laryngectomees, APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.
[25] H. Doi, K. Nakamura, T. Toda, H. Saruwatari, K. Shikano, An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques, in: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, IEEE, 2011, pp. 5136–5139.
[26] K. Yamamoto, T. Toda, H. Doi, H. Saruwatari, K. Shikano, Statistical approach to voice quality control in esophageal speech enhancement, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4497–4500.
[27] H. Doi, Augmented speech production beyond physical constraints using statistical voice conversion – alaryngeal speech enhancement and singing voice quality control, Ph.D. thesis, Nara Institute of Science and Technology (2013).
[28] D. Erro, I. Hernáez, E. Navas, A. Alonso, H. Arzelus, I. Jauk, N. Q. Hy, C. Magariños, R. Pérez-Ramón, M. Sulír, et al., Zuretts: online platform for obtaining personalized synthetic voices, Proceedings of eNTERFACE (2014) 1178–1193.
[29] D. Erro, I. Hernaez, A. Alonso, D. García-Lorenzo, E. Navas, J. Ye, H. Arzelus, I. Jauk, N. Q. Hy, C. Magariños, R. Pérez-Ramón, M. Sulír, X. Tian, X. Wang, Personalized synthetic voices for speaking impaired: Website and app., in: Interspeech, 2015, pp. 1251–1254.
[30] M. Eye, E. Infirmary, Voice disorders database, version. 1.03 (cd-rom) (1994).
[31] X. Menendez-Pidal, J. B. Polikoff, S. M. Peters, J. E. Leonzio, H. T. Bunnell, The nemours database of dysarthric speech, in: Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, Vol. 3, IEEE, 1996, pp. 1962–1965.
[32] H. Kim, M. Hasegawa-Johnson, A. Perlman, J. Gunderson, T. S. Huang, K. Watkin, S. Frame, Dysarthric speech database for universal access research, in: Proceedings of Interspeech, 2008, pp. 1741–1744.
[33] M. Kinishi, M. Amatsu, Pitch perturbation measures of voice production of laryngectomees after the amatsu tracheoesophageal shunt operation, Auris Nasus Larynx 13 (1) (1986) 53–62.
[34] M. R. Arias, J. L. Ramón, M. Campos, J. J. Cervantes, Acoustic analysis of the voice in phonatory fistuloplasty after total laryngectomy, Otolaryngology–Head and Neck Surgery 122 (5) (2000) 743–747.
[35] C. J. van As-Brooks, F. J. Koopmans-van Beinum, L. C. Pols, F. J. Hilgers, Acoustic signal typing for evaluation of voice quality in tracheoesophageal speech, Journal of Voice 20 (3) (2006) 355–368.
[36] M. Carello, M. Magnano, A first comparative study of oesophageal and voice prosthesis speech production, EURASIP Journal on Advances in Signal Processing 2009 (1) (2009) 821304.
[37] J. K. MacCallum, L. Cai, L. Zhou, Y. Zhang, J. J. Jiang, Acoustic analysis of aperiodic voice: perturbation and nonlinear dynamic properties in esophageal phonation, Journal of Voice 23 (3) (2009) 283–290.
[38] N. Deore, S. Datta, R. Dwivedi, R. Palav, R. Shah, S. Sayed, M. Jagde, R. Kazi, Acoustic analysis of tracheo-oesophageal voice in male total laryngectomy patients, The Annals of The Royal College of Surgeons of England 93 (7) (2011) 523–527.
[39] H.-J. Shim, H. R. Jang, H. B. Shin, D.-H. Ko, Cepstral, spectral and time-based analysis of voices of esophageal speakers, Folia Phoniatrica et Logopaedica 67 (2) (2015) 90–96.
[40] J. Robbins, H. B. Fisher, E. C. Blom, M. I. Singer, A comparative acoustic study of normal, esophageal, and tracheoesophageal speech production, Journal of Speech and Hearing disorders 49 (2) (1984) 202–210.