ACHIEVING SECURITY VIA SPEECH RECOGNITION

Author: patel, swati

Publisher: Zenodo

DOI: 10.5281/zenodo.17657299

Source: https://zenodo.org/records/17657299/files/6216ijist22.pdf

International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
DOI : 10.5121/ijist.2016.6222
ACHIEVING SECURITY VIA SPEECH
RECOGNITION
Swati Patel and Nita Lokwani
Smt. Chandaben Mohanbhai Patel Institute of Computer Applications,
Charusat University, Changa
ABSTRACT
Speech is one of the essential means of conversation between human beings. We as humans speak and
listen to each other in a human-human interface. People have tried to develop systems that can listen to
and process speech as naturally as persons do. This paper presents a brief survey of speech recognition,
which allows people to compose documents and control their computers with their voice. In other words,
it is the process of enabling a machine (like a computer) to identify and respond to the sounds produced
in human speech. Automatic speech recognition (ASR) can be treated as the independent, computer-driven
transcription of spoken language into readable text in real time. A speech recognition system requires
careful attention to the following issues: definition of the various types of speech, speech
representation, feature extraction techniques, speech classifiers, databases, and performance evaluation.
This paper helps in understanding the techniques along with their pros and cons. A comparative study of
the different techniques is done stage by stage.
KEYWORDS
Automatic speech recognition (ASR), Analysis, feature extraction, Modeling, Testing, speech processing,
Human Computer Interaction (HCI).
1. INTRODUCTION
Speech recognition is a process in which a computer (or other type of machine) identifies spoken
words and performs the required task. The task is to understand spoken words, react appropriately,
and then convert them into text. It is sometimes also known as speech to text (STT). Basically, it is
the correct recognition of what you are saying. It is mostly used when someone's hands and eyes are
busy. A speech recognition system consists of a microphone into which a person can speak; speech
recognition software to interpret the speech; a good quality sound card for input and/or output;
and proper pronunciation.
2. SPEECH RECOGNITION BASICS
Speech recognition automatically extracts the string of spoken words from the speech signal.
The following terms are basic to understanding speech recognition technology.
Utterance
An utterance is the pronunciation (speaking) of a word or words that represent a single
meaning to the computer. It can be a word, several words, a sentence, or even multiple sentences.
Speaker dependent & speaker independent systems
Speaker dependent systems are designed around the speech patterns of a specific speaker. Generally
they are more accurate for that speaker, but inaccurate for other speakers. They assume the speaker
will speak in a steady and consistent voice and tempo. Speaker independent systems are constructed
from a wide range of speakers. Adaptive systems usually start as speaker independent systems
and apply training techniques to adapt to the speaker and increase their recognition accuracy.
Vocabularies
A vocabulary is the collection of words or statements that can be recognized by the SR system.
Generally, it is very easy for a computer to recognize smaller vocabularies, but very difficult to
recognize large ones. Unlike in a normal dictionary, an entry does not have to be a single word; it can
be as long as a sentence or two. Smaller vocabularies can contain only one or two entries such as
"Stand up", while large vocabularies can have a hundred thousand or more.
Accuracy
The ability of a recognizer can be assessed by measuring its accuracy, or how well it recognizes
the words. This includes not only correctly identifying a word, but also identifying when a spoken
word is not in its vocabulary. Good ASR systems have an accuracy of more than 98%. The accuracy
that is satisfactory for a system really depends on the application.
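As an illustration of how accuracy can be measured, the following minimal sketch counts word-level errors between a reference transcript and a recognizer's output using edit distance, and reports word accuracy as one minus the error rate. The function names and example sentences are assumptions made for this illustration and do not come from any particular ASR system.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the reference transcript into the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (errors / number of reference words)."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / n
```

For example, `word_accuracy("call home now", "call phone now")` gives about 0.67, because one of the three reference words was substituted.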
Training
Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it
may allow training to take place. An ASR system is trained by having the person repeat
standard or common phrases and adjusting its comparison algorithms to match that particular
person. The accuracy of a recognizer usually improves with training.
Persons who have difficulty speaking, or pronouncing certain words, can also use
training. As long as the person can consistently repeat a word, ASR systems with training should
be able to adapt.
3. TYPES OF SPEECH RECOGNITION
Speech recognition systems can be divided into several classes according to the types
of utterances they are able to recognize. A key ability in ASR is detecting when a person starts and
completes a word. Most packages can fit into more than one class, depending on which mode they are
using.
Isolated Words
Isolated word recognizers require each word to have silence (lack of an audio signal) on both ends.
This does not mean that the system accepts only single words, but it does require a single utterance
at a time. Often, these systems have "Listen/Not-Listen" states, where they require the person to wait
between words (usually doing processing during the pauses). "Isolated utterance" might be a better
name for this class.
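The silence on both ends of an isolated word can be located with a simple energy threshold. The sketch below is one possible way to do this; the frame length and threshold are illustrative assumptions, and practical systems tune them to the microphone and background noise.

```python
def find_word_boundaries(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of the non-silent region,
    or None if every frame is below the energy threshold."""
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean power of the frame
        if energy > threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    # The end index is exclusive and clipped to the signal length.
    return active_starts[0], min(active_starts[-1] + frame_len, len(samples))


# Silence, a short burst of sound, then silence again.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4] + [0.0] * 8
boundaries = find_word_boundaries(signal)  # (8, 12)
```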
Connected Words
Connected words (or more correctly 'connected utterances') are similar to isolated words, but
allow separate words to be run together with a minimal pause between them.
Continuous Speech
Recognizers for continuous speech are some of the most difficult to design because they must
use special methods to determine word boundaries. Continuous speech allows users to
speak almost naturally, while the computer determines the content. Basically, it is
computer dictation.
Spontaneous Speech
There appear to be various definitions of what spontaneous speech really is. At a basic
level, it can be thought of as speech that is natural sounding and not planned. An ASR system
with spontaneous speech ability should be able to handle the features of natural speech, such as
words being run together.
Voice Verification/Identification
Some ASR systems are used to identify specific users. This document does not cover
verification or security systems.
4. WORKING OF SPEECH RECOGNITION SYSTEM
Figure 1. Process of Speech Recognition
Basically, the microphone converts the voice into an analog signal, which is then converted into
digital form for processing. Input to the system from the user is known as an utterance (spoken
input from the user of a speech system; an utterance may be a single word, an entire phrase, a
sentence, or even several sentences). The digital form is a binary representation of 1s and 0s,
which is the form of data the computer works with. The computer cannot process sound in any
other form.
A speech recognition system has acoustic models that convert the audio sounds to one of about four
dozen basic speech components (called phonemes). An acoustic model is created by taking audio
recordings of speech and their text transcriptions, and using software to create numerical
representations of the sounds that make up each word. It is used by a speech recognition engine to
recognize speech. The latest versions of speech technology are designed to eliminate extra noise
and unused information that the computer does not need. The words we speak are changed into digital
forms of the basic speech components (phonemes).
Once this is complete, a second step of the software begins to work. The result is compared to the
digital dictionary that is stored in the computer's internal storage. This is a very vast collection
of words, usually more than 100,000. When the software identifies a match based on the digital
form, it shows the words on the display. This is the simple and basic process of all speech
recognition systems.
Figure 2. Working of Speech Recognition
To convert speech to text or a computer command, a computer has to go through several complex
tasks. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC)
converts this analog wave into digital data that the computer can understand. To do this, it
digitizes the sound by taking small measurements of the wave at regular intervals. The system
filters the digitized sound to remove unwanted noise, and sometimes to separate it into various
bands of frequency (frequency is the number of sound-wave cycles per second, heard by humans as
differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may
also have to be temporally aligned. People do not always speak at the same speed, so
the sound must be adjusted to match the speed of the reference sounds already stored in the
system's memory.
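The sampling and normalization steps described above can be sketched as follows. A synthetic sine tone stands in for the microphone signal, and the sample rate, bit depth and amplitude are assumptions chosen for illustration, not values taken from a specific system.

```python
import math


def sample_and_quantize(freq_hz, duration_s, sample_rate=8000, bits=16, amplitude=0.25):
    """Measure a sine tone at regular intervals and round each measurement
    to a signed integer level, roughly as an ADC would."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    return [
        round(max_level * amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate))
        for t in range(n_samples)
    ]


def normalize_peak(samples, target_peak=1.0):
    """Scale the samples so that the loudest one has magnitude target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return [0.0 for _ in samples]        # pure silence stays silent
    return [s * target_peak / peak for s in samples]


pcm = sample_and_quantize(440, 0.01)         # 10 ms of a quiet 440 Hz tone
normalized = normalize_peak(pcm)             # same tone at a constant peak level
```

Peak normalization is only one of several possible choices; scaling to a constant average power would serve the same purpose of making quiet and loud recordings comparable.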
Next the signal is divided into small segments as short as a few hundredths of a second, or even
thousandths in the case of plosive consonant sounds (consonant stops produced by obstructing
airflow in the vocal tract), like "p" or "t." The program then matches these segments to known
phonemes in the appropriate language. A phoneme is the smallest element of a language: a
representation of the sounds we make and put together to form meaningful expressions. There are
roughly 40 phonemes in the English language (different linguists have different opinions on the
exact number), while other languages have more or fewer phonemes.
The next step seems simple, but it is actually the most difficult to accomplish and is the focus of
most speech recognition research. The program examines phonemes in the context of the other
phonemes around them. It runs the contextual phoneme plot through a complex statistical model
and compares them to a large library of known words, phrases and sentences. The program then
determines what the user was probably saying and either outputs it as text or issues a computer
command.
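The role of context can be illustrated at the word level with a toy example. When the sounds match several homophones, a statistical model of which word usually follows which can pick the most probable one. The bigram counts below are hypothetical values invented for this sketch; a real system would estimate them from a large corpus and would apply similar statistics to phonemes as well.

```python
# Hypothetical counts of how often the second word follows the first.
BIGRAM_COUNTS = {
    ("want", "to"): 50,
    ("want", "two"): 5,
    ("want", "too"): 1,
    ("number", "two"): 40,
    ("number", "to"): 2,
}


def pick_word(previous_word, candidates):
    """Choose the candidate that most often follows previous_word.
    Unseen pairs count as zero; ties go to the earliest candidate."""
    return max(candidates, key=lambda word: BIGRAM_COUNTS.get((previous_word, word), 0))


homophones = ["to", "two", "too"]            # all sound the same to the recognizer
after_want = pick_word("want", homophones)      # "to"
after_number = pick_word("number", homophones)  # "two"
```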
5. APPROACHES TO SPEECH RECOGNITION
• Acoustic phonetic approach
• Pattern recognition approach
• Artificial intelligence approach
5.1 Acoustic Phonetic Approach
It is also called the rule-based approach. It uses knowledge of phonetics and linguistics to guide
the search process. This approach generally uses rules that express everything (or anything) that
might help to decode the signal, organized in a "blackboard" architecture: at each decision point
it lays out the possibilities and uses rules to determine which sequences are acceptable.
It performs poorly because the rules are difficult to express and to improve. This
approach identifies individual phonemes, words, sentence structure and/or meaning.
5.2 Pattern Recognition Approach
This method is divided into two steps: training of speech patterns, and recognition of patterns by
way of pattern comparison. In the parameter measurement phase (filter bank, LPC, DFT), a
sequence of measurements is made on the input signal to define the "test pattern".
The unknown test pattern is then compared with each sound reference pattern, and a measure of
similarity between the test pattern and the reference pattern is computed. In the pattern
classification phase (dynamic time warping), the reference pattern that best matches the unknown
test pattern is selected.
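As a sketch of the parameter measurement phase, the code below computes the magnitude spectrum of one frame with a direct discrete Fourier transform (DFT). The naive O(N^2) form is an assumption of this example, chosen for clarity; practical systems use a fast Fourier transform and usually further processing such as a filter bank.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each DFT bin from 0 up to the Nyquist frequency."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the frame with a complex sinusoid of k cycles per frame.
        total = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitudes.append(abs(total))
    return magnitudes


# A test frame holding exactly 3 cycles of a sine wave over 32 samples.
frame = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
spectrum = dft_magnitudes(frame)
peak_bin = spectrum.index(max(spectrum))     # 3: the energy sits in bin 3
```

The list of magnitudes for each successive frame forms the sequence of measurements that makes up a test pattern.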
5.2.1 Template-based approach
This approach provides a family of techniques that have advanced the field considerably during
the last six decades. A collection of prototypical speech patterns is stored as reference patterns
representing the dictionary of candidate words. Recognition is then carried out by matching an
unknown spoken utterance with each reference template and choosing the category of the best
matching pattern. Generally, templates are created for entire words. This has the advantage that
errors due to segmentation or classification of smaller, acoustically more variable units such as
phonemes can be avoided.
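Template matching with dynamic time warping can be sketched as follows. For simplicity each pattern is a sequence of single numbers rather than feature vectors, and the template values are made up for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # b's element is held (b stretched)
                                     cost[i][j - 1],      # a's element is held (a stretched)
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[len(a)][len(b)]


def recognize(utterance, templates):
    """Return the label of the reference template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical whole-word templates.
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 2]}
slow_yes = [1, 1, 3, 3, 4, 3, 1]             # the same shape as "yes", spoken more slowly
best_label = recognize(slow_yes, templates)  # "yes"
```

Because the warping path may hold one sequence still while the other advances, the slower utterance aligns with the "yes" template at zero cost even though the two sequences differ in length.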
5.2.2 Statistics-based approach
Stochastic modelling entails the use of probabilistic models to deal with undefined or incomplete
information. In speech recognition, uncertainty and incompleteness arise from many sources, such
as confusable sounds, speaker variability, contextual effects, and homophones.
Today, the most popular stochastic approach is hidden Markov modelling. A hidden Markov
model is characterized by a finite-state Markov chain and a set of output distributions. The
transition parameters in the Markov chain model temporal variability, while the parameters in the
output distributions model spectral variability. These two types of variability are central to
speech recognition.
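To make the two parameter sets concrete, the sketch below computes the probability of an observation sequence under a small hidden Markov model using the forward algorithm. The two states, the "low"/"high" observation symbols and all probabilities are hypothetical values chosen for illustration.

```python
# Hypothetical two-state, left-to-right model; all numbers are illustrative.
START_P = {"S1": 1.0, "S2": 0.0}
TRANS_P = {                                  # transition parameters (temporal behaviour)
    "S1": {"S1": 0.6, "S2": 0.4},
    "S2": {"S1": 0.0, "S2": 1.0},
}
EMIT_P = {                                   # output distributions (spectral behaviour)
    "S1": {"low": 0.9, "high": 0.1},
    "S2": {"low": 0.2, "high": 0.8},
}


def forward_probability(observations, start_p, trans_p, emit_p):
    """Probability that the model generates the observation sequence."""
    if not observations:
        return 1.0
    states = list(start_p)
    # alpha[s] = P(observations so far, current state is s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


p = forward_probability(["low", "high"], START_P, TRANS_P, EMIT_P)  # about 0.342
```

A recognizer built this way would typically keep one such model per word or phoneme and choose the model that assigns the observed sequence the highest probability.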
5.3 Artificial Intelligence Recognition Approach
This approach is a combination of the acoustic phonetic approach and the pattern recognition
approach: it exploits the concepts of both acoustic phonetics and pattern recognition methods. The
knowledge-based approach uses linguistic, phonetic and spectrogram information.
Some speech researchers developed recognition systems that used acoustic phonetic knowledge to
improve classification rules for speech sounds. While template-based approaches have been effective
in the design of a variety of speech recognition systems, they provided little insight into human
speech processing, thereby making error analysis and knowledge-based system enhancement
difficult. On the other hand, a large body of linguistic and phonetic literature provided insights
into and understanding of human spoken language processing. In its complete form, the knowledge
engineering plan involves the direct and explicit incorporation of experts' speech knowledge into
a recognition system. This knowledge is normally derived from careful study of spectrograms
and is incorporated using guidelines or procedures. Pure knowledge engineering was also
motivated by the interest and research in expert systems.
6. CHALLENGES AND DIFFICULTIES OF SR
Speech recognition is still a very difficult problem. The main difficulties are the following:
• Speaker Variability
A single speaker, or more than one speaker, may pronounce the same word in different ways.
• Channel Variability
The position and quality of the microphone and the background environment may
affect the output.
7. APPLICATIONS OF SPEECH RECOGNITION
Speech recognition applications include
• Voice dialling (e.g., "Call home"),
• Call routing (e.g., "I would like to make a collect call"),
• Simple data entry (e.g., entering a credit card number),
• Preparation of structured documents (e.g., a radiology report),
• Speech-to-text processing (e.g., word processors or emails), and
• In aircraft cockpits (usually termed Direct Voice Input).
8. PROS AND CONS OF SPEECH RECOGNITION
PROS:
• There is no need to type or write text, and it is generally quicker than typing or
handwriting.
• It allows for better spelling, whether in text or in documents. It is very useful for people
with mental or physical disabilities.
CONS:
• No program is 100% perfect.
• Factors like slang, homonyms, signal-to-noise ratio, and overlapping speech affect
the accuracy of speech recognition.
• It can be expensive, depending on the program.
9. CONCLUSIONS
Speech is the primary, and the most appropriate, means of communication between human beings.
Whether due to technological curiosity to make machines that mimic humans or the desire to
automate work with machines, research in speech and speaker identification has attracted
considerable interest. This paper introduces the basics of speech recognition technology and also
highlights the differences between different speech recognition systems. The most common
approaches used to do speech recognition are also discussed, along with its current and future
uses. Speech recognition is one of the most integrating areas of machine intelligence, since
humans perform speech recognition as a daily activity.
10. ACKNOWLEDGEMENT
We are very thankful to our institute CMPICA (Smt. Chandaben Mohanbhai Patel Institute of
Computer Applications), CHARUSAT for providing the necessary infrastructure for development of
the research work.
AUTHORS
Swati Patel received the M.C.A. & B.C.A. degrees from Dharmsinh Desai University,
Nadiad, Gujarat, India in 2011 & 2009 respectively. She is presently working as an
Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute of Computer
Applications, Changa. Her research area includes HCI and Pattern Recognition.
Nita Lokwani received the M.C.A. degree from Smt. Chandaben Mohanbhai Patel
Institute of Computer Applications, Charusat University, Changa, Gujarat, India in 2012 &
B.C.A. degree from Dharmsinh Desai University, Nadiad, Gujarat, India in 2009. She is
presently working as an Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute
of Computer Applications, Changa. Her research area includes Human Computer
Interaction and working on Embedded Systems.
