DEVELOPING NATURE-INSPIRED INTELLIGENCE BY NEURAL SYSTEMS
Multilingual audio information management system based on semantic knowledge in complex environments
Karmele Lopez-de-Ipina 1,2,3 • Nora Barroso 1,3 • Pilar M. Calvo 3,4 • Carmen Hernandez 5 • Aitzol Ezeiza 1,3 • Unai Susperregi 3,6 • Elsa Fernández 1,3
Received: 15 April 2019 / Accepted: 22 November 2019 / Published online: 3 February 2020
© The Author(s) 2020
Abstract
This paper proposes a multilingual audio information management system based on semantic knowledge in complex environments. The complex environment is defined by the limited resources (financial, material, human, and audio resources); the poor quality of the audio signal taken from an internet radio channel; the multilingual context (Spanish, French, and Basque, which is under-resourced in some areas); and the regular appearance of cross-lingual elements between the three languages. In addition, the system is constrained by the requirements of the local multilingual industrial sector. We present the first evolutionary system based on a scalable architecture that is able to fulfill these specifications with automatic adaptation based on automatic semantic speech recognition, folksonomies, automatic configuration selection, machine learning, neural computing methodologies, and collaborative networks. As a result, the initial goals have been accomplished and the usability of the final application has been tested successfully, even with non-experienced users.
Keywords Evolutionary computing · Artificial neural networks · Internet information management · Management of complex systems
Abbreviations
AdiUP Audio information management system
ANN Artificial neural networks
APD Acoustic phonetic decoding
ASR Automatic speech recognition
Cl Correct rates of classes
Co Correct rates of concepts
FFT Fast Fourier transform
FIP Filler insertion penalty
Corresponding author: Karmele Lopez-de-Ipina
1 Department of Systems Engineering and Automation, University of the Basque Country, Donostia-San Sebastián, Spain
2 Department of Psychiatry, University of Cambridge, Cambridge, UK
3 EleKin, Engineering and Society Research Group, University of the Basque Country, Europa Plaza 1, 20018 Donostia-San Sebastián, Spain
4 Department of Computer Architecture and Technology, University of the Basque Country, Donostia-San Sebastián, Spain
5 Department of Computer Science and Artificial Intelligence, University of the Basque Country, Donostia-San Sebastián, Spain
6 I unweb S.L. Company, Zentolen Gunea, 12, 1 Left, 20100 Ren e iaz, Gipuzkoa, Spain
Neural Computing and Applications (2020) 32:17869–17886
https://doi.org/10.1007/s00521-019-04618-7
HMM Hidden Markov model
LID Language identification
LOOCV Leave-one-out cross-validation
LU Lexical units
MFCC Mel frequency cepstral coefficients
NG Number of Gaussians
NS Number of states
OOV Out of vocabulary
PCA Principal components analysis
PER Phone error rate
SC-HMM Semicontinuous HMM
SLU Sublexical units
SNR Signal-to-noise ratio
SSG Audio and speech segments
SVM Support vector machines
VAD Voice activity detection
WADA Waveform amplitude distribution analysis
WER Word error rate
WIP Word insertion penalty
X-SAMPA eXtended speech assessment methods phonetic alphabet
1 Introduction
In order to contribute to the development of a community, we should take into account its socio-cultural and economic context. In this paper, we propose a joint solution to the socio-cultural context of the Basque Country and to the needs and characteristics of small and medium companies with limited staff and economic resources, which are one of the foundations of the economy of the area. The model is easily exportable to other similar environments that may appear in developing areas [1]. Thus, innovative technologies have to be adequately designed so that they suit the needs of the industrial fabric and can be integrated into real applications. For instance, those technologies should be designed to include auto-changing and auto-learning abilities. These capabilities allow systems to automatically learn from new data and to adapt their components to new conditions, thus increasing system robustness [1].
In the socio-cultural context of the Basque Country, the interest in multilingual systems [1–6] arises because there are three languages in coexistence (Basque, Spanish, and French). However, Basque is an under-resourced language and its situation differs depending on the geographical location: in the Spanish State, Basque and Spanish are both official languages, and the recovery of Basque is active with positive results; in the French State, Basque is not an official language and it shows a clear negative growth with a low number of speakers. In addition, there is a high cross-lingual interaction between the languages, even though the origins of Basque are completely different from the origins of the other two languages (Spanish and French). In this scenario, speakers tend to mix words and sentences from the three languages in their discourse, and the acoustic interactions between the languages and the Basque dialects are a very interesting area of research [6].
Although some people speak the three languages like native speakers, people are more commonly fluent in only two of them, Basque–Spanish or Basque–French. This means that their conversations contain mixed features of different languages, and for this reason cross-lingual ASR is necessary. Thus, the three languages have to be taken into account in order to develop an efficient ASR system, and it is essential to find new strategies for its development.
In this sense, audio information retrieval systems are increasingly used in different applications, ranging from the extraction of musical information [7, 8] and health-related applications [9, 10] to applications related to the acquisition of data in the mobile environment [11] or on the Internet of Things (IoT). New trends in the field of information and communications technologies are committed to humanizing not only information but also the ways to access it, by means of new techniques such as the concept-based semantic web or more user-friendly interfaces [7]. In this research area, within the field of speech processing, multilingual automatic speech recognition (ASR) systems provide users with improved interaction in various languages. This increases comfort and naturalness because users can express themselves in their own language. Machine learning and neural computing-based approaches provide flexible solutions to complex systems [1].
In this context, this paper proposes an audio information management system (the AdiUP system, available online: [12]) that can be directly applied to radio broadcasting and that has been tested using the trilingual (Basque, Spanish, and French) internet radio channel Info7 [13]. The system provides automatic tools to manage all the information included in the audio records by means of semantic knowledge. One of the objectives of the development is to allow new information resources to be introduced in a more open and collaborative way, where the user is an active part of the system. In this trilingual environment with an under-resourced language, there are other constraints regarding limited resources (financial, material, human, and audio resources) which, in addition to the poor quality of the signal, increase the complexity of the development. The developed AdiUP system is an ASR system for complex environments that fulfills these requirements with under-resourced languages. The proposed system has been built using universal design, so that it can provide barrier-free access, for example, to people with hearing loss, cognitive impairments, or deafness.
This document is organized as follows: Sect. 2 presents the environment and the requirements of the system. Section 3 describes the resources. Section 4 sets forth the design of the proposed system, its architecture and components. In Sect. 5, a new methodology for automatic selection of the system configuration is proposed. Section 6 comprises the experimentation and discussion. Finally, conclusions are drawn in Sect. 7.
2 Description of the environment and requirements of the system
In the field of speech processing, the extraction and indexing of audio information from internet mass media is a research area of interest to the international scientific community [14, 15]. The complexity of this task is closely related to factors such as the type of oral communication (isolated words, continuous speech, spontaneous speech, dialogue), the quality of the signal, the environment in which recognition takes place, the amount of terminology, the ability to generate confusion, the language, and the speakers [16]. There is a wide variety of solutions that address these problems in different ways, ranging from the detection of large vocabularies [17], through the detection of spoken numbers for telephone applications [18], to the detection of segments, spoken or not spoken [19].
When working in complex environments with a limited amount of data, multilingual contexts, nonlinearities, or uncontrollable noise, some possibilities are based on: enriching the poor resources of a language with resources from another, powerful language beside it; approaches oriented to the lack of resources; cross-lingual approaches [20]; training acoustic models for a new language using results from other languages [21]; data optimization methods; collaborative systems; or open configuration systems. However, the development of a robust ASR system is very tough when under-resourced languages are involved, even if there are powerful languages beside them, and the classic techniques perform poorly with regard to correct rates [22, 23]. In fact, the statistical nature of most of the approaches used in ASR systems requires large amounts of language resources in order to perform properly. In the case of under-resourced languages, the availability of resources is very limited, and most of the available resources are of very poor quality.
Therefore, there is nowadays an increasing interest in multilingual systems and under-resourced languages. There are interesting initiatives focused on these fields, such as Babel [24], and meetings and publications whose main goal is to search for new methodologies oriented to this kind of complex environment [25–27]. Current research lines include: proposals for generating language resources; acoustic and language modeling; multilingual approaches; and management of audio information [28, 29].
Within this complex environment, our proposal aims to: automatically generate an optimal corpus from an initial corpus (a corpus is said to be optimal when the recognition results measured by means of cross-validation methods are optimal for the task); reduce the dimensionality of data through the convergence of redundant information by means of principal components analysis (PCA); estimate the accuracy of the system by means of the leave-one-out cross-validation technique (LOOCV); reduce unwanted effects in sublexical units (SLU), caused by the lack of resources and the low number of samples, by properly choosing sublexical units and possible groupings of them; and extract information by means of folksonomies.
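The LOOCV estimate named above can be sketched in a few lines: each sample is held out once, a model is fit on the remaining samples, and accuracy is the fraction of held-out samples predicted correctly. The nearest-centroid classifier and the toy 2-D features are assumptions standing in for the actual recognizer and acoustic features, which this sketch does not reproduce.

```python
from math import dist
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def nearest_centroid_predict(train, x):
    """Predict the label whose class centroid is closest to x (Euclidean)."""
    by_label = {}
    for vector, label in train:
        by_label.setdefault(label, []).append(vector)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    return min(centroids, key=lambda label: dist(centroids[label], x))


def loocv_accuracy(samples):
    """Leave-one-out cross-validation: hold out each sample exactly once."""
    correct = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if nearest_centroid_predict(train, x) == y:
            correct += 1
    return correct / len(samples)


# Hypothetical 2-D feature vectors labeled by language.
samples = [
    ([0.0, 0.1], "BS"), ([0.2, 0.0], "BS"), ([0.1, 0.2], "BS"),
    ([1.0, 1.1], "SP"), ([1.2, 0.9], "SP"), ([0.9, 1.0], "SP"),
]
print(loocv_accuracy(samples))  # 1.0 for this well-separated toy set
```

With very small corpora, LOOCV is attractive because every sample serves for both training and evaluation, at the cost of one model fit per sample.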
In this context, the requirements and novelty of the system lie in the complex environment in which it has to work, which is defined by:
• The trilingual context that comprises the three languages of the Basque Country (Basque, Spanish, and French), which requires the design of the first multilingual ASR system for these three languages.
• The regular appearance of cross-lingual elements between the three languages.
• The fact that, within the geographical area of this research project, the variety of the Basque language in the French region is in a critical under-resourced situation because it is not an official language in the French State.
• The limited resources (financial, material, human, and audio resources), especially the audio corpora for training, which amount to only about 0.5–1% of the usual corpora for such systems: for example, [14] and [30] use 100–200 h for training, while our system uses only 1 h.
• The extremely poor quality of the audio signal, taken from an internet radio channel, which increases the complexity of the development.
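As a quick check of the 0.5–1% range quoted in the limited-resources requirement above, dividing the roughly 1 h of training audio used here by the 100–200 h reported for [14] and [30] gives:

$$\frac{1\ \text{h}}{200\ \text{h}} = 0.005 = 0.5\,\%, \qquad \frac{1\ \text{h}}{100\ \text{h}} = 0.01 = 1\,\%$$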
3 Materials
In this section, the resources for the development of the system are presented. Furthermore, the main linguistic features are analyzed because they have a clear impact both on the performance of the acoustic phonetic decoding (APD) system and on the size of the vocabulary of the system.
3.1 Phonetic features of the languages
The three languages involved in our study are Basque, Spanish, and French. Basque is a Pre-Indo-European language of unknown origin with circa 1,000,000 speakers in the Basque Country, which spreads over the international border between France and Spain. The Basque language has a wide range of dialects (six main dialects and several variations), and this dialectal variety entails phonetic, phonologic, and morphologic differences. In order to develop the APD system, a sound inventory of each language is necessary. Table 1 summarizes the sound inventories of the three languages, expressed in the eXtended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) notation, and the usage of the phonemes in the three cases.
3.2 Lexical structure
A further challenge in developing an ASR system is that Basque is an agglutinative language with a special intra-word morphosyntactic structure [11] that may, depending on the complexity of the task, lead to intractable vocabularies. Inside Basque words there is not only semantic information but also grammatical elements, as can be seen in Table 2.
A plausible approach to the problem would be to use lemmas and morphemes instead of words when defining the system vocabulary [11]. However, this could lead to recognition problems for the shortest morphemes, especially those that are not lemmas, for instance: “ak,” “ko,” “go,” “k,” etc. In this project, a robust proposal based on lemmas and pseudo-morphemes was used [31].
3.3 Cross-lingual effects
Most speakers in the Basque Country are bilingual, and they commonly mix two of the three languages in their speech, particularly in spontaneous speech. The two languages that are mixed depend on the region where the speaker lives: most Basque speakers living on the Spanish side also use Spanish, while Basque speakers on the French side also use French. Mixing all three languages also occurs. Indeed, the acoustic interactions between these three languages, together with the Basque dialects, are very strong, because speakers naturally and spontaneously mix sounds and vocabulary, and sometimes they also add other influences, such as English. Some speakers are able to use the three languages consecutively in the same sentence with native pronunciation. All the resources contain numerous instances of cross-lingual material at the word, sentence, and pronunciation levels (Tables 1 and 3). The strongest effects appear in the Spanish and French recordings, which unfortunately also have the highest background noise level. The inventory of allophones of the languages of the system and the matches between them are shown in Table 1 [32].
3.4 Resources and materials
The basic audio resources used in this project have been mainly provided by a small local news radio channel, Info7 [13], which is trilingual (Basque, BS; Spanish, SP; and French, FR). It has provided the audio and text data of its news bulletins for each language (a semi-parallel corpus). The texts have been processed to create XML files that include information on the different speakers, noises, and parts of the speech files, together with the transcriptions. The transcriptions of the Basque language also include morphological information such as the lemma of each word and its Part-Of-Speech tag.
In order to correctly implement an ASR system, it is crucial to design and obtain appropriate linguistic resources. A speech corpus is a collection of audio recordings tagged at different levels, which contains phrases, words, and common expressions of a certain language. This type of corpus is a database that implicitly stores various properties of the language, and this information lays the foundation for building voice recognition systems.
Table 1 Sound inventories in X-SAMPA
Sound type      Phonemes                     BS   FR   SP
Plosives        p b t d k g                  X    X    X
                J c p_h t_h k_h              X    –    –
Affricates      S J j                        X    –    X
                s_m s_a dZ                   X    –    –
Fricatives      B s_a z_a                    X    X    X
                S Z                          X    X    –
                D x G                        X    –    X
                S_m z_m j h                  X    –    –
                                             –    X    –
Nasals          m n J                        X    X    X
                F n_d N                      X    –    X
Liquids         l                            X    X    X
                R                            X    X    –
                L                            X    –    X
Vowel glides    w j                          X    X    X
                H                            –    X    –
Vowels          i e a o u                    X    X    X
                y @                          X    X    –
                A E O 2 9 a* e* o* 9*        –    X    –
BS Basque, FR French, SP Spanish; X marks the usage of the phonemes across languages

When only limited resources are available, the appropriate choice of the training corpus is a fundamental part of the design of the application. This is because, on the one hand, depending on the circumstances, a larger corpus does not imply better performance [33], and on the other hand, errors within the training corpus deteriorate the performance of the recognition system; in other words, recordings with errors at various levels can make the results inappropriate. The methodology for choosing the optimal corpus is called HobeCor. Thanks to the HobeCor system, files that contain extreme anomalies can be automatically removed, thus decreasing the number of errors in the recognition process, for example: if an extreme noise interrupts the communication, causing an extreme distortion of the signal that prevents readability; if background music covers the speech; if the audio file is empty although there is a phonetic transcription; etc. The HobeCor system optimizes the recognition results, even after removing part of the corpus. Therefore, it makes it possible to reduce the training corpus even further and to balance the corpora of the different languages with regard to time duration. The HobeCor system has only been applied to speech segments.
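The anomaly filtering that HobeCor performs can be illustrated with a minimal rule-based sketch. The `Segment` fields, the thresholds, and the rules are assumptions made for illustration only; they are not the paper's criteria, and HobeCor itself is described as selecting the corpus according to its effect on cross-validated recognition results.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speech segment and its metadata (hypothetical fields)."""
    name: str
    duration_s: float    # length of the audio in seconds
    transcription: str   # transcription attached to the segment
    snr_db: float        # estimated signal-to-noise ratio in dB
    music_overlap: bool  # True if background music covers the speech


# Hypothetical thresholds chosen only for this illustration.
MIN_DURATION_S = 0.1
MIN_SNR_DB = 5.0


def has_extreme_anomaly(seg: Segment) -> bool:
    """Flag the three anomaly types listed in the text."""
    empty_but_transcribed = seg.duration_s < MIN_DURATION_S and seg.transcription.strip() != ""
    extreme_noise = seg.snr_db < MIN_SNR_DB
    return empty_but_transcribed or extreme_noise or seg.music_overlap


def filter_corpus(segments):
    """Keep only the segments without extreme anomalies."""
    return [seg for seg in segments if not has_extreme_anomaly(seg)]


corpus = [
    Segment("bs_001", 4.2, "kaixo", 18.0, False),
    Segment("bs_002", 0.0, "egun on", 20.0, False),  # empty audio but transcribed
    Segment("fr_001", 3.1, "bonjour", 2.0, False),   # extreme noise
    Segment("sp_001", 5.0, "hola", 15.0, True),      # music covers the speech
]
print([seg.name for seg in filter_corpus(corpus)])  # ['bs_001']
```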
The resources inventory is summarized in Table 4, which shows the duration of the overall corpus provided by the Info7 radio channel (second column, audio) and the duration of the speech segments (third column, SSG). The two differ because the broadcast signal also contains segments with only music. The fourth column of Table 4 (SSG used for training) shows the durations of the training corpus used in this project after applying the HobeCor system.
In the French and Spanish audio, there is a high background noise due to signature tunes, which can be seen in Fig. 1, showing the NIST signal-to-noise ratio (SNR) and the waveform amplitude distribution analysis (WADA) SNR of the speech signals with regard to the signal length [34, 35].
4 Development of the proposed system
The AdiUP multilingual tool [30], developed for the Info7 internet radio channel, provides users with information such as the language, the subject of the speech, the terms of interest, or the duration of the audio files in news broadcast in Basque, Spanish, and French. Thus, by reading the information added to the audio, the user can select topics of interest or search among the audio files of the radio repository by subject, keywords, or language. In Sect. 4.1, the architecture of the system is analyzed, and the following subsections describe the components of the system: folksonomies, sublexical units and acoustic phonetic models, and lexical units (LU).
4.1 Architecture of the system
The architecture of the AdiUP system is based on three layers: the interface layer or user interaction layer, the domain layer, and the database layer, as outlined in Fig. 2.
Table 2 Examples of the agglutinative structure of the Basque language
Example in Basque                  Lemma+morphemes                        Translation
Etxekoa enak                       Etxe+ko+a en+ak                        The people from home
Parisekoak                         Paris+eko+ak                           People from Paris
Miguelek txakur hori ikusiko du    Miguel+ek txakur hori ikusi+ko du      Miguel will see that dog
Table 3 Examples of cross-lingual appearance in the language resources
Primary language   Text
Basque             Luis [Spanish] Scola [Italian] (Argentina [Spanish], 30 urte) Baskoniako jokalari ohia 2007tik dago Houston [English] Rockets [English] NBAko taldean, eta hauxe izan du denboraldirik onena: liga erregularrean 16.2 puntu eta 8.6 errebote lortu ditu partiduko
French             Euskal zaleen [Basque] Biltzarra [Basque] organise l’Assemblée Générale annuelle de ses membres le dimanche 25 avril, à partir de 9 heures, à Donostia [Basque]
Spanish            Koldo [Basque] Landaluze [Basque] habla de Más Allá del Tiempo dirigido por Robert [English] Schwentke [English], Que Se Mueran los Feos de Nacho G. Velilla y de El Fantástico Mr. [English] Fox [English] película de animación dirigida por Wes [English] Anderson [English]
Table 4 Summary of the resources inventory, audio, and speech segments (SSG)
Languages   Audio (hh:mm:ss)   SSG (hh:mm:ss)   SSG used for training (hh:mm:ss)
BS          2:47:27            2:10:37          0:55:09
FR          3:17:23            1:22:54          0:55:38
SP          2:10:15            1:01:13          0:55:37
Total       8:15:05            4:34:44          2:46:24
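The totals row of Table 4 can be checked by converting each duration to seconds, summing per column, and converting back. The helper names below are illustrative only.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(total: int) -> str:
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


# Columns: audio, SSG, SSG used for training (values from Table 4).
table4 = {
    "BS": ("2:47:27", "2:10:37", "0:55:09"),
    "FR": ("3:17:23", "1:22:54", "0:55:38"),
    "SP": ("2:10:15", "1:01:13", "0:55:37"),
}
totals = [to_hms(sum(to_seconds(row[col]) for row in table4.values())) for col in range(3)]
print(totals)  # ['8:15:05', '4:34:44', '2:46:24']
```

The computed totals match the last row of the table, and each language contributes roughly 55 min of training speech, which is the balancing effect attributed to HobeCor.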
Fig. 1 NIST SNR and WADA SNR of speech signals with regard to the signal length
Fig. 2 Architecture of the AdiUP system
4.1.1 Interface layer
This layer, labeled as 1 in Fig. 2, comprises all aspects of software and design related to the interfaces for different kinds of users. In addition to the super-user, there are two types of users that can access the application, external users and system-management collaborators:
• Super-user This user is responsible for the supervision and control of the whole system, manages external users and system-management collaborators, and grants permissions.
• External users The so-called external users only use the application for information retrieval; they cannot insert or modify any feature of the resources of the AdiUP system.
• System-management collaborators These special external users are not necessarily experts. They are part of a collaboration network of the radio channel intended to improve and evolve the initial prototype by adding new information and knowledge. These members are actively involved in the AdiUP system and can enrich the resources of the system by inserting new audio files with related information (such as audio data, the name of the speaker, or the language) and by correcting errors at different stages. These active users can influence and improve both the results of the ASR module and the categorization of the concepts of the audio files.
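The three user profiles translate naturally into a permission table. The sketch below is one possible encoding; the `Role` and `Action` names, and the choice that collaborators can also retrieve information, are assumptions inferred from the descriptions above rather than details of the AdiUP implementation.

```python
from enum import Enum, auto


class Role(Enum):
    SUPER_USER = auto()
    EXTERNAL_USER = auto()
    COLLABORATOR = auto()  # system-management collaborator


class Action(Enum):
    RETRIEVE = auto()        # search and consult audio information
    INSERT_AUDIO = auto()    # add audio files with related information
    CORRECT_ERRORS = auto()  # fix recognition or categorization errors
    MANAGE_USERS = auto()    # manage users and grant permissions


# Permission table inferred from the role descriptions (an assumption).
PERMISSIONS = {
    Role.SUPER_USER: set(Action),  # every action
    Role.EXTERNAL_USER: {Action.RETRIEVE},
    Role.COLLABORATOR: {Action.RETRIEVE, Action.INSERT_AUDIO, Action.CORRECT_ERRORS},
}


def can(role: Role, action: Action) -> bool:
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS[role]


print(can(Role.EXTERNAL_USER, Action.INSERT_AUDIO))  # False
```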
4.1.2 Domain layer
The domain layer consists of different components that provide all the information needed to fill the database of the system. This layer includes the Language Identification (LID) tool [32], which uses a hybrid system based on triphonemes and cross-lingual approaches and achieves an optimal and stable LID recognition rate despite the complexity of the problem. The domain layer is divided into several modules related to: the semantic management of audio information, the morpheme extraction task, the characteristics of the folksonomies (categorization and extra information provided by the super-user), and the decision process. The modules of this layer are labeled as 2 in Fig. 2:
Multilingual ASR Audio files inserted through the interface are processed by a multilingual ASR system. The engine is a keyword-spotting system based on concepts (one word, several words, or slogans) and fillers of different types. The aim of the fillers is to absorb the out-of-vocabulary (OOV) words according to the folksonomies.
Morphological analysis The output of the recognition system is analyzed morphologically in order to extract lemmas, which are used for concept specification in the folksonomy. The morphological analyzer has been developed by a partner company, Insima Teknologia [36], and it is multilingual: English, French, Spanish, and Basque.
Categorization Three folksonomies have been inserted in the AdiUP system, one for each language, and concepts and terms have been grouped into superclasses, classes, and subclasses. The results of the morphological analyzer are the input to the categorization module, which calculates a value that defines their membership of a class.
Decision system In this module, the final assignment of concepts to the audio file is done by means of a decision equation.
Additional information Extra information supplied by the super-user is inserted through this module.
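The data flow through these modules can be sketched as a chain of small functions. Everything here is hypothetical: the `<filler>` token, the toy lemma table standing in for the partner morphological analyzer, the lemma-to-class weights, and the simple threshold used in place of the decision equation of Sect. 4.2.2.

```python
FILLER = "<filler>"  # hypothetical label emitted when a filler absorbs OOV speech

# Toy lemma table standing in for the morphological analyzer.
LEMMAS = {"partiduko": "partidu", "taldean": "talde", "hauteskundeak": "hauteskunde"}

# Toy folksonomy instances: lemma -> (class, weight in 1..5).
INSTANCES = {"partidu": ("sports", 4), "talde": ("sports", 3), "hauteskunde": ("politics", 5)}


def spot_keywords(hypothesis):
    """Keep recognized keywords; filler tokens are dropped."""
    return [token for token in hypothesis if token != FILLER]


def lemmatize(words):
    """Map each word to its lemma (unknown words are kept unchanged)."""
    return [LEMMAS.get(word, word) for word in words]


def categorize(lemmas):
    """Accumulate instance weights per class."""
    scores = {}
    for lemma in lemmas:
        if lemma in INSTANCES:
            cls, weight = INSTANCES[lemma]
            scores[cls] = scores.get(cls, 0) + weight
    return scores


def decide(scores, threshold=3):
    """Assign the best-scoring class if it reaches a (hypothetical) threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


hyp = ["<filler>", "taldean", "<filler>", "partiduko"]
print(decide(categorize(lemmatize(spot_keywords(hyp)))))  # sports
```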
4.1.3 Database layer
The database contains all the information of the application, which is related to users, audio, concepts, and conceptual information. This information is divided into tables and implemented in the MySQL management system. This layer is labeled as 3 in Fig. 2. In order to index the results of the domain layer in the database, there is also a step called Indexing that transmits the information with the results between the domain layer and the database layer.
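A possible table layout for users, audio, concepts, and conceptual information, together with the Indexing step, is sketched below. Python's built-in `sqlite3` stands in for MySQL so that the example is self-contained, and all table and column names are assumptions.

```python
import sqlite3

# sqlite3 stands in for MySQL; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL, role TEXT NOT NULL);
CREATE TABLE audio    (id INTEGER PRIMARY KEY, path TEXT NOT NULL, language TEXT,
                       speaker TEXT, duration_s REAL,
                       uploaded_by INTEGER REFERENCES users(id));
CREATE TABLE concepts (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE audio_concepts (audio_id INTEGER REFERENCES audio(id),
                             concept_id INTEGER REFERENCES concepts(id),
                             weight REAL,
                             PRIMARY KEY (audio_id, concept_id));
"""


def index_result(conn, audio_id, concept, weight):
    """Indexing step: store one domain-layer result for an audio file."""
    conn.execute("INSERT OR IGNORE INTO concepts(name) VALUES (?)", (concept,))
    (concept_id,) = conn.execute(
        "SELECT id FROM concepts WHERE name = ?", (concept,)).fetchone()
    conn.execute("INSERT OR REPLACE INTO audio_concepts VALUES (?, ?, ?)",
                 (audio_id, concept_id, weight))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO audio(path, language) VALUES ('news_001.wav', 'BS')")
index_result(conn, 1, "sports", 35.0)
rows = conn.execute("""SELECT a.path, c.name FROM audio a
                       JOIN audio_concepts ac ON ac.audio_id = a.id
                       JOIN concepts c ON c.id = ac.concept_id""").fetchall()
print(rows)  # [('news_001.wav', 'sports')]
```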
Note that it is important to give the system a high level of flexibility, so that the application can provide a powerful way to build a network of collaborators, who will help to improve the initial system. The parts that may be changed are the application dictionary, the expansion and creation of folksonomies, the categorization method, and the thresholds for thematic decision.
4.2 Folksonomies
For each language, a folksonomy has been defined. Figure 3 shows an example of the structure of these folksonomies.
4.2.1 Components of the folksonomy
The folksonomy of each language consists of:
• Concepts The concepts defined in the folksonomy of each language are politics, international, culture, general, sports, and headlines.
• Superclasses, classes, and subclasses One concept, for example politics, is subdivided into more specific superclasses, for example Basque Country and State. Superclasses are in turn divided into classes, and classes are divided into subclasses. Each level provides more accurate information.
• Instances Instances are the last elements of the folksonomy; they are words, lemmas, or terms made of groups of words.
• Attributes The attribute is a numeric value that indicates the relevance or weight of a subclass within a certain class. In general, there is a single attribute for each subclass. However, some subclasses can be shared by two or more classes; in that case, the relevance or weight (attribute) given to the subclass within each class makes the difference.
• Axioms An axiom is a parameter that takes into account the relevance and the number of instances of each class that appear in a certain text. Thus, the class to which a certain text belongs can be chosen automatically by comparing the axioms of the text computed for each class.
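The hierarchy described above (concept, superclass, class, subclass, instance) maps onto nested records. This is a minimal sketch with hypothetical names and values; the per-class `relevance` mapping models the attribute, so a subclass shared by two classes can carry a different relevance in each.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Subclass:
    name: str
    instances: dict[str, int] = field(default_factory=dict)  # term or lemma -> weight 1..5


@dataclass
class FolkClass:
    name: str
    subclasses: list[Subclass] = field(default_factory=list)
    relevance: dict[str, int] = field(default_factory=dict)  # attribute per subclass name


@dataclass
class Superclass:
    name: str
    classes: list[FolkClass] = field(default_factory=list)


@dataclass
class Concept:
    name: str
    superclasses: list[Superclass] = field(default_factory=list)


def locate(concepts, term):
    """Return every (concept, superclass, class, subclass) path that contains term."""
    return [(c.name, h.name, k.name, s.name)
            for c in concepts for h in c.superclasses
            for k in h.classes for s in k.subclasses if term in s.instances]


# A tiny, hypothetical slice of one folksonomy.
football = Subclass("football", {"partidu": 4, "gol": 5})
local = FolkClass("local sports", [football], {"football": 5})
sports = Concept("sports", [Superclass("Basque Country", [local])])
print(locate([sports], "gol"))  # [('sports', 'Basque Country', 'local sports', 'football')]
```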
4.2.2 Construction of the folksonomy
In order to determine the class to which a certain text belongs, the following procedure is used, based on several variables:
$W_T^c$: the total weight of a class $c$. For each class, this value is calculated by adding the weights of the $n$ instances in the text that belong to that class:

$$W_T^c = \sum_{j=1}^{n} w_{jsch} \tag{1}$$

where $w_{jsch}$ is the weight of instance $j$ within a subclass $s$ that belongs to the class $c$, which can in turn belong to a superclass $h$ and, above that, to a concept $m$. The value of the weight ranges from 1 (the lowest relevance) to 5 (the highest relevance).
$W_R^c$: the so-called ratified weight of a class $c$, a numerical value that takes into account the relevance of the subclasses that belong to that class. Within each class, five weight values $W_{c,t}$, $t = 1, \ldots, 5$, are defined, where $W_{c,5}$ represents the weight of the most relevant subclasses and $W_{c,1}$ represents the weight of the least relevant subclasses. These parameters ($W_{c,t}$) depend on the subjectivity and experience of the creator of the folksonomy. For each class, the ratified weight $W_R^c$ is calculated as the sum of these five weights, that is:

$$W_R^c = \sum_{t=1}^{5} W_{c,t} \tag{2}$$
Fig. 3 Example of the structure of the folksonomy, showing only some of the concepts (culture, politics, international, …), superclasses (sports, arts, …), classes (local sports, general, …), etc.

$W^c$: the final weight value, or axiom, of the text computed for each class $c$ by multiplying the total weight by the ratified weight:

$$W^c = W_T^c \, W_R^c \tag{3}$$

Finally, the class to which the text belongs is the one that fulfills the following optimization criterion:

$$C = \arg\max_{c} \, \{W^c\} \tag{4}$$
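Equations (1)–(4) can be traced with a short worked example. The class names and weight values below are invented for illustration; only the arithmetic follows the definitions above.

```python
def total_weight(instance_weights):
    """Eq. (1): W_T^c, the sum of the weights w_jsch of the class-c instances in the text."""
    return sum(instance_weights)


def ratified_weight(class_weights):
    """Eq. (2): W_R^c, the sum of the five class-specific weights W_{c,1..5}."""
    if len(class_weights) != 5:
        raise ValueError("expected exactly five weights W_{c,1..5}")
    return sum(class_weights)


def classify(found, ratified):
    """Eqs. (3)-(4): W^c = W_T^c * W_R^c for every class, then the argmax class.

    found    -- class -> weights (1..5) of its instances detected in the text
    ratified -- class -> the five weights W_{c,1..5} set by the folksonomy creator
    """
    axioms = {c: total_weight(ws) * ratified_weight(ratified[c]) for c, ws in found.items()}
    best = max(axioms, key=axioms.get)
    return best, axioms


found = {"local sports": [4, 5], "regional politics": [5, 3, 1]}
ratified = {"local sports": [1, 2, 3, 4, 5], "regional politics": [1, 1, 2, 2, 3]}
best, axioms = classify(found, ratified)
print(best, axioms)  # local sports {'local sports': 135, 'regional politics': 81}
```

Both classes have the same total weight (9) in this example, so the ratified weight decides the outcome: 9 × 15 = 135 against 9 × 9 = 81.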
4.3 Modeling of sublexical units
Although an expert usually defines the set of SLUs, this method becomes very complex when the application is multilingual and resources are limited or under-resourced conditions apply (Fig. 4). The lack of resources produces unwanted effects in SLUs with very few samples. Consequently, it is necessary to define new hidden Markov model (HMM) topologies that can optimize the internal structure of the different sounds of the language. Two different HMM structure configurations have been tested:
1. In the first configuration, the HMMs have the same number of states (NS) for all the SLUs (NS-X, where X is the number of states used).
2. In the second configuration, the effect of assigning a different number of states to each SLU is analyzed, where the number of states depends on the nature of the allophones (a similar approach can be found in [37]), as shown in Table 5. Groups of SLUs are also defined according to the sound classes of the three languages. In some cases, certain allophones are grouped (joining their acoustic models) by considering different allophones as the same sound unit. Table 6 shows, for each language under study (Basque, BS; Spanish, SP; and French, FR; stated in the first column), the groups taken into account. The second column shows the name of the group type, and the third column describes the groups of allophones and their nomenclature. For example, /i/ = /i/+/j/ means that a new /i/ SLU is created by joining the /i/ vowel model and the /j/ semi-vowel model. Thus, the new acoustic model /i/ is trained with the speech samples of the previous /i/ and /j/.
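Both configurations and the allophone grouping can be expressed as simple mappings. The state counts per sound class below are placeholders, not the values of Table 5, and the sample identifiers are invented; the sketch only shows how pooling /i/ and /j/ gives the new /i/ model the samples of both.

```python
# Configuration 1: every SLU gets the same number of states (NS-X).
def uniform_states(slus, x):
    return {slu: x for slu in slus}


# Configuration 2: the number of states depends on the sound class.
# These counts are hypothetical placeholders, not the values of Table 5.
STATES_BY_CLASS = {"vowel": 3, "glide": 2, "plosive": 3, "fricative": 3, "nasal": 3, "liquid": 2}


def class_based_states(sound_class):
    """sound_class: SLU -> class name; returns SLU -> number of HMM states."""
    return {slu: STATES_BY_CLASS[cls] for slu, cls in sound_class.items()}


def group_samples(samples, groups):
    """Pool training samples of grouped allophones, e.g. /i/ = /i/ + /j/.

    samples: allophone -> list of segment ids
    groups:  new SLU -> list of member allophones
    """
    member_of = {a: new for new, members in groups.items() for a in members}
    pooled = {}
    for allophone, segments in samples.items():
        pooled.setdefault(member_of.get(allophone, allophone), []).extend(segments)
    return pooled


samples = {"i": ["s1", "s2"], "j": ["s3"], "a": ["s4"]}
print(group_samples(samples, {"i": ["i", "j"]}))
# {'i': ['s1', 's2', 's3'], 'a': ['s4']}
```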
Finally, sets of triphonemes are created for all the selected allophonic options, and discrete and semicontinuous HMMs are generated [38]. These triphonemes are used as SLUs in the acoustic phonetic decoding system.
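Triphoneme expansion replaces each phone with a context-dependent unit that records its left and right neighbors. The sketch below uses the common `l-c+r` naming convention (as in HTK) and a `sil` boundary symbol, both assumptions; the X-SAMPA phones of Basque *etxe* (/e tS e/) serve as input.

```python
def triphones(phones, boundary="sil"):
    """Expand a phone sequence into left-center+right units (l-c+r)."""
    padded = [boundary] + list(phones) + [boundary]
    return [f"{padded[k - 1]}-{padded[k]}+{padded[k + 1]}"
            for k in range(1, len(padded) - 1)]


def triphone_inventory(utterances):
    """Sorted set of distinct triphonemes over a list of phone sequences."""
    return sorted({unit for utt in utterances for unit in triphones(utt)})


print(triphones(["e", "tS", "e"]))
# ['sil-e+tS', 'e-tS+e', 'tS-e+sil']
```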
4.4 Modeling of lexical units
The AdiUP system uses LUs both for creating the vocabulary of the ASR engine and for other elements of the system, such as the components of the folksonomy. These LUs are selected taking into account the nature of each language. In the case of Basque, the fundamental LU is the lemma or a combination of lemmas, based on the pseudo-morphemes proposed in [31]. In the case of Spanish and French, the fundamental LU is the word (simple or compound).
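The language-dependent choice of LU can be sketched as follows. The segmentation table is a toy built from the Table 2 examples and only mimics the sub-word idea for Basque; the actual pseudo-morpheme segmentation of [31] is not reproduced here.

```python
# Toy segmentation table built from the Table 2 examples (hypothetical).
BASQUE_SEGMENTS = {
    "parisekoak": ["paris", "eko", "ak"],
    "miguelek": ["miguel", "ek"],
    "ikusiko": ["ikusi", "ko"],
}


def lexical_units(sentence, language):
    """Split a sentence into LUs: sub-word units for Basque, words otherwise."""
    words = sentence.lower().split()
    if language == "BS":
        return [unit for word in words for unit in BASQUE_SEGMENTS.get(word, [word])]
    return words


print(lexical_units("Miguelek txakur hori ikusiko du", "BS"))
# ['miguel', 'ek', 'txakur', 'hori', 'ikusi', 'ko', 'du']
print(lexical_units("Le dimanche 25 avril", "FR"))
# ['le', 'dimanche', '25', 'avril']
```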
Fig. 4 General diagram of the AdiUP system
future research lines, other under-resourced languages and media can be integrated.
Acknowledgements This work is being funded by Grants: TEC2016-77791-C4 from Plan Nacional de I+D+i, Ministry of Economic Affairs and Competitiveness of Spain, and from the DomusVi Foundation Kms para recordar, the Basque Government (ELKARTEK KK-2018/00114, GEJ IT1189-19), the Government of Gipuzkoa (DG18/14 DG17/16), UPV/EHU (GIU19/090), and COST ACTION (CA18106, CA15225).
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.
Informed consent Informed consent was obtained from all individual participants included in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Barroso N (2011) Ph.D. Thesis in Basque: contributions to the management of semantic information in complex audio environments. Department of Systems Engineering and Automation, University of the Basque Country (UPV/EHU), Donostia, Basque Country
2. Lopez de Ipiña K, Torres I, Oñederra L, Varona A, Ezeiza N (2000) First selection of lexical units for continuous speech recognition of Basque. In: Proceedings of ICSLP, vol 2, pp 531–535. Beijing
3. Ezeiza A, Lopez-de-Ipiña K, Hernández C, Barroso N (2013) Enhancing the feature extraction process for automatic speech recognition with fractal dimensions. Cogn Comput 5(4):545–550
4. Lopez-de-Ipiña K, Alonso JB, Solé-Casals J, Barroso N, Henriquez P, Faundez-Zanuy M, Travieso CM, Ecay-Torres M, Martinez-Lage P, Eguiraun H (2015) On automatic diagnosis of Alzheimer’s disease based on spontaneous speech analysis and emotional temperature. Cogn Comput 7(1):44–55
5. Faundez-Zanuy M, Hussain A, Mekyska J, Sesa-Nogueras E, Monte-Moreno E, Esposito A, Chetouani M, Garre-Olmo J, Abel A, Smekal Z, Lopez-de-Ipiña K (2013) Biometric applications related to human beings: there is life beyond security. Cogn Comput 5(1):136–151
6. Lopez-de-Ipiña K (2013) Ph.D. Thesis in Basque: automatic continuous speech recognition of Basque by means of stochastic models. Department of Computational Science and Artificial Intelligence, University of the Basque Country (UPV/EHU). Donostia, Basque Country
Fig. 6 Summary of the results of the usability tests of each user profile
7. Kim J, Urbano J, Liem CCS, Hanjalic A (2019) One deep music representation to rule them all? A comparative analysis of different representation learning strategies. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04076-1
8. Tran SN, Ngo S, d’Avila A (2019) Probabilistic approaches for music similarity using restricted Boltzmann machines. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04106-y
9. Guruler H (2016) A novel diagnosis system for Parkinson’s disease using complex-valued artificial neural network with k-means clustering feature weighting method. Neural Comput Appl. https://doi.org/10.1007/s00521-015-2142-2
10. López-de-Ipiña K, Martinez-de-Lizarduy U, Calvo PM, Beitia B, García-Melero J, Fernández E, Ecay-Torres M, Faundez-Zanuy M, Sanz P (2018) On the analysis of speech and dysfluencies for automatic detection of mild cognitive impairment. Neural Comput Appl. https://doi.org/10.1007/s00521-018-3494-1
11. Mustafa MK, Allen T, Appiah K (2017) A comparative review of dynamic neural networks and hidden Markov model methods for mobile on-device speech recognition. Neural Comput Appl 1:2–3. https://doi.org/10.1007/s00521-017-3028-2
12. The adiUP system. The application and generated resources. Univ. of the Basque Country. http://www.adiUP.info. Accessed 9 Nov 2019
13. Info7. Internet radio channel of Basque country. Available: http://www.info7.com/. Accessed 9 Nov 2019
14. Gauvain JL, Lamel L, Adda G (2002) The LIMSI broadcast news transcription system. Speech Commun 37(1–2):89–108
15. Barroso N, Lopez-de-Ipiña K, Ezeiza A, Hernandez C, Ezeiza N, Barroso O, Susperregi U, Barroso S (2011) GorUp: an ontology-driven audio information retrieval system that suits the requirements of under-resourced languages. In: Proceedings of Interspeech 2011. Florence, Italy
16. Anusuya M, Katti S (2011) Front-end analysis for speech recognition: a review. Int J Speech Technol 14(2):99–145
17. Beyerlein P, Aubert XL, Haeb-Umbach R, Harris M, Klakow D, Wendemuth A, Molau S, Pitz M, Sixtus A (2002) Large vocabulary continuous speech recognition of broadcast news - the Philips/RWTH approach. Speech Commun 37(1–2):109–131
18. Lin H, Ou Z (2006) Partial-tied-mixture auxiliary chain models for speech recognition based on dynamic Bayesian networks. In: IEEE international conference on systems, man and cybernetics 2006, pp 4415–4419. Taipei, Taiwan
19. Huijbregts M, de Jong F (2011) Robust speech/non-speech classification in heterogeneous multimedia content. Speech Commun 53(2):143–153
20. Schepens J, Dijkstra T, Grootjen F, van Heuven WJB (2013) Cross-language distributions of high frequency and phonetically similar cognates. PLoS ONE 8(5):e63006. https://doi.org/10.1371/journal.pone.0063006
21. Kanthak S, Ney H (2001) Context dependent acoustic modelling using graphemes for large vocabulary speech recognition. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing 2001, pp 845–848. Orlando, Florida, US
22. Le VB, Besacier L (2009) Automatic speech recognition for under-resourced languages: application to Vietnamese language. IEEE Trans Audio Speech Lang Process 17(8):1471–1482
23. Seng S, Sam S, Le VB, Bigi B, Besacier L (2008) Which units for acoustic and language modelling for Khmer automatic speech recognition. In: Proceedings of 1st international workshop on spoken languages technologies for under-resourced languages 2008. Hanoi, Vietnam
24. Gales MJF, Knill KM, Ragni A, Rath SP (2014) Speech recognition and keyword spotting for low resource languages: Babel project research at CUED. In: Proceedings of 4th international workshop on spoken languages technologies for under-resourced languages 2014, pp 16–23. St. Petersburg, Russia
25. Schlippe T, Quaschningk W, Schultz T (2014) Combining grapheme-to-phoneme converter outputs for enhanced pronunciation generation in low-resource scenarios. In: Proceedings of 4th international workshop on spoken languages technologies for under-resourced languages 2014, pp 139–145. St. Petersburg, Russia
26. Barnard E, Davel M, Van Heerden C, de Wet F, Badenhorst J (2014) The NCHLT speech corpus of the South African languages. In: Proceedings of 4th international workshop on spoken languages technologies for under-resourced languages 2014, pp 194–200. St. Petersburg, Russia
27. Vakil A, Palmer A (2014) Cross-language mapping for small-vocabulary ASR in under-resourced languages: investigating the impact of source language choice. In: Proceedings of 4th international workshop on spoken languages technologies for under-resourced languages 2014, pp 169–175. St. Petersburg, Russia
28. SLTU 2014. In: The 4th international workshop on spoken languages technologies for under-resourced languages, 2014. St. Petersburg, Russia. http://www.mica.edu.vn/sltu2014. Accessed 9 Nov 2019
29. Besacier L, Barnard E, Karpov A, Schultz T (2014) Introduction to the special issue on processing under-resourced languages. Speech Commun 56:83–84
30. Rousseau A, Deléglise P, Estève Y (2012) TED-LIUM: an automatic speech recognition dedicated corpus. In: Proceedings of 8th international conference on language resources and evaluation, pp 125–129, 2012. Istanbul, Turkey
31. Lopez-de-Ipiña K, Torres I, Oñederra L, Varona A, Ezeiza N, Peñagarikano M, Hernández M, Rodriguez LJ (2000) First selection of lexical units for continuous speech recognition of Basque. In: Proceedings of 7th international conference on spoken language processing, InterSpeech 2000, vol 2, pp 531–535. Beijing, China
32. Barroso N, Lopez-de-Ipiña K, Ezeiza A, Hernandez C (2013) Language identification for internet security in the Basque context: a cross-lingual approach. Aerosp Electron Syst Mag 28(8):24–31
33. Silipo R, Berthold MR (2000) Input features’ impact on fuzzy decision processes. IEEE Int Conf Syst Man Cybern B Cybern 30(6):821–834
34. Fillinger A (2008) NIST speech signal to noise ratio measurements. Technical Report, Information Technology Laboratory, National Institute of Standards and Technology, US Department of Commerce; 2008. http://www.nist.gov/smartspace/nist_speech_snr_measurement.html. Accessed 9 Nov 2019
35. Kim C, Stern RM (2008) Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis. In: Proceedings of 9th annual conference of the international speech communication association, Interspeech 2008. Springer, Berlin, pp 2598–2601
36. Insima Teknologia S.L.L. Company of Donostia-San Sebastian, Basque country. http://www.yildun-backup-remoto.com. Accessed 9 Nov 2019
37. Puertas JI (2000) Ph.D. Thesis in Spanish: robustness of phonetic speech recognition for telephone applications. Department of Signals, Systems and Radiocommunication. Madrid Polytechnic University (UPM). Madrid, Spain
38. HTK - Hidden Markov Model Toolkit - Speech Recognition toolkit. Cambridge University Engineering Department (CUED). http://htk.eng.cam.ac.uk. Accessed 9 Nov 2019
39. Tadjudin S, Landgrebe D (1999) Covariance estimation with limited training samples. In: IEEE transactions on geoscience and remote sensing symposium, Seattle, WA, vol 37, no 4, pp 2113–2118
40. Martinez A, Kak A (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233
41. Barroso N, Lopez-de-Ipiña K, Hernandez C, Ezeiza A (2011) Design of multi-feature class models for speech recognition security systems with under-resourced languages. In: Proceedings of 45th IEEE Carnahan conference on security technology 2011. Mataro, Spain
42. Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
43. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software [Internet]. University of Waikato. http://www.cs.waikato.ac.nz/ml/weka. Accessed 9 Nov 2019
44. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, Boston
45. Solé J, Zaiats V (2010) A non-linear VAD for noisy environment. Cogn Comput 2(3):191–198
46. Witten I, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann Publishers, Boston
47. Hix D, Hartson H (1993) Developing user interfaces: ensuring usability through product and process. Wiley, New York
48. Nielsen J (1993) Usability engineering. AP Professional. Academic Press, Boston
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.