
ACHIEVING SECURITY VIA SPEECH RECOGNITION

Author: patel, swati
Publisher: Zenodo
DOI: 10.5281/zenodo.17657299
Source: https://zenodo.org/records/17657299/files/6216ijist22.pdf
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
DOI : 10.5121/ijist.2016.6222
Swati Patel and Ni a Lokwani
Smt. Chandaben Mohanbhai Patel Institute of Computer Applications,
Charusat University, Changa
ABSTRACT
Speech is one of the essential means of conversation between human beings. As humans, we speak and
listen to each other through a human-human interface, and people have long tried to develop systems
that can listen to and process speech as naturally as a person does. This paper presents a brief
survey of speech recognition, which allows people to compose documents and control their computers
with their voice. In other words, it is the process of enabling a machine (such as a computer) to
identify and respond to the sounds produced in human speech. Automatic speech recognition (ASR) can
be treated as the independent, computer-driven transcription of spoken language into readable text
in real time. A speech recognition system requires careful attention to the following issues: the
various types of speech, speech representation, feature extraction techniques, speech classifiers,
databases, and performance evaluation. This paper helps in understanding these techniques along with
their pros and cons. A comparative study of the different techniques is made stage by stage.
KEYWORDS
Automatic speech recognition (ASR), Analysis, Feature extraction, Modeling, Testing, Speech processing,
Human Computer Interaction (HCI).
1. INTRODUCTION
Speech recognition is a process in which a computer (or another type of machine) identifies spoken
words and performs the required task. The task is to understand the spoken words, react
appropriately, and then convert them into text. It is sometimes also known as speech to text (STT).
Essentially, the system is a recognizer of what you are saying. It is mostly used when someone's
hands and eyes are busy. A speech recognition system consists of a microphone into which a person
can speak; speech recognition software to interpret the speech; a good-quality sound card for input
and/or output; and proper pronunciation on the part of the speaker.
2. SPEECH RECOGNITION BASICS
Speech recognition automatically extracts the string of spoken words from the speech signal.
The following terms are basic to understanding speech recognition technology.
Utterance
An utterance is the pronunciation (speaking) of a word or words that represent a single
meaning to the computer. It can be a word, a few words, a sentence, or even multiple sentences.
Speaker dependent & speaker independent systems
Speaker dependent systems are designed around the speech patterns of a specific speaker. They are
generally more accurate for that speaker, but inaccurate for other speakers. They assume that the
speaker will speak in a steady and consistent voice and tempo. Speaker independent systems are
built from a wide range of speakers. Adaptive systems usually start as speaker independent systems
and apply training techniques to adapt to the speaker and increase their recognition accuracy.
Vocabularies
A vocabulary is the collection of words or statements that can be recognized by the SR system.
Generally, it is easy for a computer to recognize a small vocabulary, but much harder to recognize
a large one. Unlike in an ordinary dictionary, an entry need not be a single word; it can be as long
as a sentence or two. A small vocabulary may contain only one or two entries, such as "Stand up",
whereas a large vocabulary can have a hundred thousand or more.
Accuracy
The skill of a recognizer can be studied by measuring its accuracy, that is, how well it recognizes
the words. This includes not only correctly identifying a word, but also detecting when a spoken
word is not in its vocabulary. Good ASR systems are more than 98% accurate. The accuracy that is
satisfactory for a system really depends on the application.
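The paper does not say how accuracy is computed. One widely used convention, assumed here, is word accuracy: one minus the word error rate, where errors are the substitutions, deletions and insertions needed to turn the reference transcript into the recognizer output. A minimal sketch (the function names are illustrative):

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of an extra word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word errors / number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(word_accuracy("call home now please", "call phone now please"))
```

With one substituted word out of four, the example transcript scores 0.75; a system meeting the 98% figure quoted above would make at most two such errors per hundred reference words.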
Training
Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it
may allow training to take place. An ASR system is trained by having the person repeat
standard or common phrases while the system adjusts its comparison algorithms to match that
particular person. The accuracy of a recognizer usually improves with training.
People who have difficulty speaking, or pronouncing certain words, can also benefit from
training. As long as the person can consistently repeat a word, ASR systems with training should
be able to adapt.
3. TYPES OF SPEECH RECOGNITION
Speech recognition systems can be divided into several classes according to the types of
utterances they are able to recognize. A central difficulty for ASR is determining when a person
starts and finishes a word. Most packages can fit into more than one class, depending on which mode
they are using.
Isolated Words
Isolated word recognizers require each word to have silence (lack of an audio signal) on both sides.
This does not mean that the vocabulary is limited to a single word, but the system handles a single
word at a time. Often, these systems have "Listen/Not-Listen" states, in which they require the
person to wait between words (usually doing the processing during the pauses). "Isolated word" is
therefore a fitting name for this class.
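Because isolated words are bounded by silence, a recognizer can locate them by watching the signal energy. The sketch below is one simple way to do this, not a method given in the paper: it marks a word wherever the mean energy of a short frame rises above a threshold. The frame length and threshold values are arbitrary assumptions chosen for the toy signal.

```python
def find_words(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of regions whose frame energy exceeds threshold."""
    segments = []
    start = None
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(s * s for s in frame) / len(frame)  # mean squared amplitude
        if energy > threshold and start is None:
            start = pos                       # silence -> speech: a word begins
        elif energy <= threshold and start is not None:
            segments.append((start, pos))     # speech -> silence: the word ends
            start = None
    if start is not None:                     # signal ended while still in a word
        segments.append((start, len(samples)))
    return segments


if __name__ == "__main__":
    # Two bursts of "speech" separated by silence (all-zero samples).
    signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8 + [0.8, -0.8] * 2 + [0.0] * 4
    print(find_words(signal))
```

Real audio would need a threshold tuned to the background noise level, which is one reason these systems ask the speaker to pause clearly between words.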
Connected Words
Connected words (or more correctly 'connected utterances') are similar to isolated words, but
allow separate words to be run together with a minimal pause between them.
Continuous Speech
Continuous speech recognizers are some of the most difficult to design because they must use
special methods to determine word boundaries. Continuous speech allows users to speak almost
naturally while the computer determines the content. Essentially, this is dictation to the
computer.
Spontaneous Speech
There appear to be various definitions of what spontaneous speech really is. At a basic
level, it can be thought of as speech that is natural sounding and not planned. An ASR system
with spontaneous speech ability should be able to cope with features of natural speech such as
words being run together.
Voice Verification/Identification
Some ASR systems are used to identify specific users. This paper does not cover
verification or security systems in detail.
4. WORKING OF SPEECH RECOGNITION SYSTEM
Figure 1. Process of Speech Recognition
First, the microphone converts the voice into an analog signal, which is then digitized before
further processing. Input to the system from the user is known as an utterance (spoken
input from the user of a speech system; an utterance may be a single word, a phrase, an entire
sentence, or even several sentences). The digitized signal is represented in binary form, the 1s
and 0s that computers work with. A computer cannot process sound in any other form.
A sound-recognition system has acoustic models. An acoustic model is created by taking audio
recordings of speech and their text transcripts, and using software to create numerical
representations of the sounds that make up each utterance; it is used by a speech recognition engine
to recognize speech. These models convert the audio sounds to one of about four dozen basic speech
components (called phonemes). The latest versions of speech technology are designed to eliminate
extra noise and unused information that the computer does not need for its work. The words we
speak are thus changed into digital forms of the basic speech components (phonemes).
Once this is complete, a second stage of the software begins to work. The digital form is compared
to the digital dictionary that is stored in the computer's internal storage. This is a very vast
collection of words, usually more than 100,000. When the software identifies a match based on the
digital form, it shows the words on the display. This is the simple, basic process of all speech
recognition systems.
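The dictionary comparison can be pictured as a lookup from phoneme sequences to words. The sketch below is only an illustration under stated assumptions: the three-entry `LEXICON`, its ARPAbet-style phoneme spellings, and the `<unknown>` marker are all hypothetical, and a real dictionary would hold the 100,000 or more entries mentioned above and tolerate inexact matches.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> word.
# Entries and phoneme symbols are illustrative, not taken from the paper.
LEXICON = {
    ("K", "AO", "L"): "call",
    ("HH", "OW", "M"): "home",
    ("S", "T", "AA", "P"): "stop",
}


def lookup(phonemes):
    """Return the word stored for this exact phoneme sequence, or None."""
    return LEXICON.get(tuple(phonemes))


def transcribe(phoneme_groups):
    """Turn a list of per-word phoneme sequences into displayable text."""
    words = []
    for group in phoneme_groups:
        word = lookup(group)
        words.append(word if word is not None else "<unknown>")
    return " ".join(words)


if __name__ == "__main__":
    print(transcribe([["K", "AO", "L"], ["HH", "OW", "M"]]))
```

An exact-match table like this fails as soon as one phoneme is misheard, which is why practical systems score approximate matches statistically instead.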
Figure 2. Working of Speech Recognition
To convert speech to text or to a computer command, a computer has to go through several complex
tasks. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC)
converts this analog wave into digital data that the computer can understand. To do this, it
digitizes the sound by taking precise measurements of the wave at regular intervals. The system
filters the digitized sound to remove unwanted noise, and sometimes to separate it into various
bands of frequency (frequency is the number of sound-wave cycles per second, heard by humans as
differences in pitch). It also normalizes the sound, that is, adjusts it to a constant volume level.
The sound may also have to be temporally aligned. People do not always speak at the same speed, so
the sound must be adjusted to match the speed of the template sounds already stored in the
system's memory.
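The digitizing and normalizing steps can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: a sine wave stands in for the microphone signal, and the 8000 Hz sample rate, 16-bit depth and peak normalization are assumptions chosen for the example.

```python
import math


def sample_wave(freq_hz, duration_s, sample_rate):
    """Measure a sine wave (standing in for the analog signal) at regular intervals."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def quantize(samples, bits=16):
    """Map samples in [-1, 1] to signed integer levels, as an ADC would."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]


def normalize(samples, target_peak=1.0):
    """Scale the signal so that its loudest sample has magnitude target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]


if __name__ == "__main__":
    quiet = [0.2 * s for s in sample_wave(1000, 0.01, 8000)]  # a quiet 1 kHz tone
    loud = normalize(quiet)
    print(len(loud), max(abs(s) for s in loud), quantize(loud)[:4])
```

Noise filtering and splitting into frequency bands are omitted here; they would operate on the same list of samples.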
Next the signal is divided into small segments as short as a few hundredths of a second, or even
thousandths in the case of plosive consonant sounds (consonant stops produced by obstructing
airflow in the vocal tract) like "p" or "t." The program then matches these segments to known
phonemes in the appropriate language. A phoneme is the smallest element of a language, a
representation of the sounds we make and put together to form meaningful expressions. There are
roughly 40 phonemes in the English language (different linguists have different opinions on the
exact number), while other languages have more or fewer phonemes.
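Dividing the signal into short segments is usually called framing. A minimal sketch follows, assuming non-overlapping 10 ms frames (one hundredth of a second); the paper gives only the rough durations, and practical systems often overlap adjacent frames.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut the signal into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame is shorter than one sample")
    # Any trailing samples that do not fill a whole frame are dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


if __name__ == "__main__":
    one_tenth_second = list(range(800))  # 0.1 s of dummy samples at 8000 Hz
    frames = split_into_frames(one_tenth_second, sample_rate=8000)
    print(len(frames), len(frames[0]))
```

Each frame is then short enough to be treated as a single, roughly steady sound and compared against the known phonemes.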
The next step seems simple, but it is actually the most difficult to accomplish and is the focus of
most speech recognition research. The program examines phonemes in the context of the other
phonemes around them. It runs the contextual phoneme plot through a complex statistical model
and compares it to a large library of known words, phrases and sentences. The program then
determines what the user was probably saying and either outputs it as text or issues a computer
command.
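One simple form such a statistical model can take is a bigram language model, which scores each candidate word by how likely it is to follow the previous word. The sketch below uses it to choose between homophones that sound identical. The probabilities in `BIGRAMS` are invented for illustration, and the exhaustive search is only workable for very short inputs; neither is taken from the paper.

```python
import math
from itertools import product

# Hypothetical bigram probabilities P(word | previous word), invented for illustration.
BIGRAMS = {
    ("<s>", "i"): 0.2,
    ("i", "want"): 0.3,
    ("want", "to"): 0.5,
    ("want", "two"): 0.05,
    ("want", "too"): 0.01,
    ("to", "go"): 0.2,
    ("two", "go"): 0.001,
    ("too", "go"): 0.001,
}
FLOOR = 1e-6  # probability assigned to word pairs missing from the table


def score(words):
    """Log-probability of a word sequence under the bigram table."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for word in words:
        total += math.log(BIGRAMS.get((prev, word), FLOOR))
        prev = word
    return total


def best_transcript(candidates_per_slot):
    """Try every combination of candidate words and keep the most probable one."""
    return max(product(*candidates_per_slot), key=score)


if __name__ == "__main__":
    # The third slot holds three homophones the acoustics cannot tell apart.
    slots = [["i"], ["want"], ["to", "two", "too"], ["go"]]
    print(" ".join(best_transcript(slots)))
```

The surrounding words settle the ambiguity: "want to go" is far more probable under the table than "want two go", so the context, not the sound, decides the output.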
5. APPROACHES TO SPEECH RECOGNITION
• Acoustic phonetic approach
• Pattern recognition approach
• Artificial intelligence approach
5.1 Acoustic Phonetic Approach
This is also called the rule-based approach. It uses knowledge of phonetics and linguistics to guide
the search process. Rules are defined to express anything that might help decode the signal,
typically within a "blackboard" architecture: at each decision point the system lays out the
possibilities and applies rules to determine which sequences are acceptable. Its performance is poor
because the rules are difficult to express and to improve systematically. The approach identifies
individual phonemes, words, sentence structure and/or meaning.
5.2 Pattern Recognition Approach
This method is divided into two steps: training of speech patterns, and recognition of patterns by
way of pattern comparison. In the parameter measurement phase (filter bank, LPC, DFT), a
sequence of measurements is made on the input signal to define the "test pattern".
The unknown test pattern is then compared with each sound reference pattern, and a measure of
similarity between the test pattern and each reference pattern is computed. Finally, the pattern
classification phase (dynamic time warping) selects the reference pattern that best matches the
unknown test pattern.
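Dynamic time warping (DTW) compares two sequences while allowing one to be stretched or compressed in time, so the same word spoken slowly or quickly can still match. The sketch below is a minimal version under simplifying assumptions: each pattern is a list of single numbers, whereas a real system would compare vectors of filter-bank or LPC measurements, and the reference patterns are made up for the example.

```python
def dtw_distance(a, b):
    """Cost of the cheapest time alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(test_pattern, references):
    """Return the label of the reference pattern closest to the test pattern."""
    return min(references, key=lambda label: dtw_distance(test_pattern, references[label]))


if __name__ == "__main__":
    # Made-up one-dimensional reference patterns, one per word.
    references = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 1, 1, 5]}
    slow_yes = [1, 1, 3, 3, 4, 4, 3, 1]  # "yes" spoken more slowly
    print(classify(slow_yes, references))
```

The slowed pattern aligns with the "yes" reference at zero cost even though the two differ in length, which is the temporal alignment a plain point-by-point comparison cannot provide.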
5.2.1 Template-based approach
This approach provides a family of techniques that have advanced the field considerably during
the last six decades. A collection of prototypical speech patterns is stored as reference patterns
representing the dictionary of candidate words. Recognition is then carried out by matching an
unknown spoken utterance with each reference template and choosing the category of the best
matching pattern. Templates are generally created for entire words. This has the advantage that
errors due to segmentation or classification of smaller, acoustically more variable units such as
phonemes can be avoided.
5.2.2 Statistics-based approach
Stochastic modelling entails the use of probabilistic models to deal with uncertain or incomplete
information. In speech recognition, uncertainty and incompleteness arise from many sources, such as
confusable sounds, speaker variability, contextual effects, and homophones. Today the most popular
stochastic approach is hidden Markov modelling. A hidden Markov
model is characterized by a finite-state Markov model and a set of output distributions. The
transition parameters in the Markov chain model temporal variability, while the parameters in the
output distributions model spectral variability. These two types of variability are the essence of
speech recognition.
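To make the two kinds of parameters concrete, the sketch below decodes a tiny hidden Markov model with the Viterbi algorithm, which finds the most probable sequence of hidden states for a sequence of observations. The paper names the model but not a decoding algorithm, so Viterbi is an assumption here, and the two states, three observation symbols and all probabilities are invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    if not observations:
        raise ValueError("need at least one observation")
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Transition probabilities decide which predecessor state is best.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            # Output (emission) probabilities weigh the observed symbol.
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


if __name__ == "__main__":
    # Toy model: all names and numbers are illustrative assumptions.
    states = ["S1", "S2"]
    start_p = {"S1": 0.6, "S2": 0.4}
    trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
    emit_p = {"S1": {"a": 0.5, "b": 0.4, "c": 0.1},
              "S2": {"a": 0.1, "b": 0.3, "c": 0.6}}
    print(viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p))
```

Here `trans_p` plays the role of the transition parameters (how long the model stays in a state, that is, timing) and `emit_p` the output distributions (which acoustic symbol a state tends to produce).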
5.3 Artificial Intelligence Recognition Approach
This approach is a combination of the acoustic phonetic approach and the pattern recognition
approach, exploiting the concepts and methods of both. The knowledge-based approach uses
linguistic, phonetic and spectrogram information.
Some speech researchers developed recognition systems that used acoustic phonetic knowledge to
improve the classification rules for speech sounds. While template-based approaches have been very
effective in the design of a variety of speech recognition systems, they provided little insight
into human speech processing, thereby making error analysis and knowledge-based system enhancement
difficult. On the other hand, a large body of linguistic and phonetic literature provided insights
into and understanding of human spoken language processing. In its complete form, the knowledge
engineering design involves the direct and explicit incorporation of experts' speech knowledge into
a recognition system. This knowledge is normally derived from careful study of spectrograms
and is incorporated using rules or procedures. Pure knowledge engineering was also
motivated by the interest and research in expert systems.
6. CHALLENGES AND DIFFICULTIES OF SR
Speech recognition is still a very difficult problem. The main problems are the following.
• Speaker Variability
The same speaker, or different speakers, may pronounce the same word in different ways.
• Channel Variability
The position and quality of the microphone and the background environment may
affect the output.
7. APPLICATIONS OF SPEECH RECOGNITION
Speech recognition applications include
• Voice dialling (e.g., "Call home"),
• Call routing (e.g., "I would like to make a collect call"),
• Simple data entry (e.g., entering a credit card number),
• Preparation of structured documents (e.g., a radiology report),
• Speech-to-text processing (e.g., word processors or emails), and
• Use in aircraft cockpits (usually termed Direct Voice Input).
8. PROS AND CONS OF SPEECH RECOGNITION
PROS:
• There is no need to type or write text, and speaking is generally quicker than typing or
handwriting.
• It allows for better spelling, whether in plain text or in documents.
• It is very useful to people with mental or physical disabilities.
CONS:
• No program is 100% perfect.
• Factors like slang, homonyms, signal-to-noise ratio, and overlapping speech affect
the accuracy of speech recognition.
• It can be expensive, depending on the program.
9. CONCLUSIONS
Speech is the primary and most appropriate means of communication between human beings.
Whether due to technological curiosity to make machines that mimic humans or the desire to
automate work with machines, research in speech and speaker identification continues to attract
interest. This paper introduces the basics of speech recognition technology and also highlights the
differences between different speech recognition systems. The most common approaches used
for speech recognition are also discussed, along with their current and future uses. Speech
recognition is one of the most integrative areas of machine intelligence, since recognizing speech
is something humans do every day.
10. ACKNOWLEDGEMENT
We are very thankful to our institute CMPICA (Smt. Chandaben Mohanbhai Patel Institute of
Computer Applications), CHARUSAT for providing the necessary infrastructure for the development of
the research work.
AUTHORS
Swati Patel received the M.C.A. & B.C.A. degrees from Dharmsinh Desai University,
Nadiad, Gujarat, India in 2011 & 2009 respectively. She is presently working as an
Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute of Computer
Applications, Changa. Her research area includes HCI and Pattern Recognition.
Ni a Lokwani received the M.C.A. degree from Smt. Chandaben Mohanbhai Patel
Institute of Computer Applications, Charusat University, Changa, Gujarat, India in 2012 &
B.C.A. degree from Dharmsinh Desai University, Nadiad, Gujarat, India in 2009. She is
presently working as an Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute
of Computer Applications, Changa. Her research area includes Human Computer
Interaction and working on Embedded Systems.