Deep transfer learning-based gaze tracking for behavioral activity recognition

Author: De Lope Asiaín, Javier,Graña Romay, Manuel María

Publisher: Elsevier

Year: 2022

DOI: 10.1016/j.neucom.2021.06.100

Source: https://addi.ehu.eus/bitstream/10810/58308/1/1-s2.0-S0925231222006403-main.pdf

Deep transfer learning-based gaze tracking for behavioral activity recognition
Javier de Lope a,⇑, Manuel Graña b
a Department of Artificial Intelligence, Universidad Politécnica de Madrid (UPM), Madrid, Spain
b Computational Intelligence Group, University of the Basque Country (UPV/EHU), San Sebastian, Spain
Article info
Article history:
Received 7 March 2021
Revised 8 June 2021
Accepted 25 June 2021
Available online 23 May 2022
Keywords:
Deep transfer learning
Gaze tracking
Gaze ethogram
Human activity recognition
Abstract
Computational Ethology studies focused on human beings are usually referred to as Human Activity Recognition (HAR). Specifically, this paper belongs to a line of work on the identification of broad cognitive activities that users carry out with computers. The keystone of this kind of system is the noninvasive detection of the subject's gaze fixations in selected display areas. Noninvasiveness is ensured by using conventional laptop cameras without additional illumination or tracking devices. The gaze ethograms, composed as sequences of gaze fixations, are the basis for identifying the user activities. To determine the gaze fixation display areas with the highest accuracy, this paper explores the use of a transfer learning approach applied to several well-known deep learning network (DLN) architectures whose input is the eye area extracted from the face image, and whose output is the identification of the gaze fixation area on the computer screen. Two different datasets are created and used in the validation experiments. We report encouraging results that may allow the general use of the system.
© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Computational Ethology [1] has become a hot research field in the last few years. It integrates the information from several different sensors and activity measurement devices in order to characterize the behavior of living beings. Specifically, the computer-based analysis and recognition of human behavior, referred to as Human Activity Recognition (HAR) [2], receives plenty of attention and contributions. Basically, there are two types of sensors used in HAR research: cameras [3] and inertial sensors [4]. In computational neuroethology, these sensors are usually combined with neuronal activity data captured by using, for example, electroencephalography (EEG) equipment [5]. Much effort in HAR research is currently directed to the monitoring of aging people [14], and to performance improvement in some sports [15]. The monitoring of elderly people is usually motivated by behavioral decline due to neurodegenerative diseases, and its goal is to detect abnormal situations and raise alarms [6], for example, fall detection [16]. HAR studies are usually oriented to the identification of low level activities, for instance, the detection of abnormal behavioral situations in the elderly [6] by the use of 3D skeleton models of body postures [7]; hence they do not deal with higher level behavior representations such as those provided by ethograms.
An ethogram is a time plot of the low level actions carried out by the subject under observation that provides a high level behavioral representation. Ethograms have been used for animal phenotype characterization [8]. We are currently interested in the characterization of the behavioral states of a laptop computer user, using the laptop camera and the microphone to determine the activity performed by the user by noninvasive computational methods. Previously, we have studied the performance of conventional machine learning approaches on such a task [9]. In this paper we explore the use of deep learning techniques to recognize the subject's behavioral activity. Our hypothesis is that the subject's gaze fixation information allows us to determine the specific activities in which the subject is engaged [9,61].
A visual fixation is the sustained gaze during a time interval in a specific direction which falls upon a single location in the visual stimulus. Its average duration in uncontrolled conditions is about 200 ms [12]. The saccades are quick, simultaneous movements of both eyes between two or more phases of fixation in the same direction [13]. Blinking is the semi-automatic rapid closing of the eyelids. Its rate is generally greater than a dozen blinks per minute, although it may decrease when the eyes are focused on an object for an extended period of time, for example, when reading.
⇑ Corresponding author.
Neurocomputing 500 (2022) 518–527
As the information is retrieved during the fixations, we determine when they are produced and in which order they are performed. We call these temporal sequences of visual fixations gaze ethograms; the fixations are the atomic actions building up the behavioral representation. We define areas of the display that receive the user's attention in order to categorize the visual fixations. The gaze ethogram may be used to recognize the subject's behavioral activity. The work in this paper is devoted to the evaluation of deep learning architectures on the task of recognizing the gaze fixation from unfiltered images of the eye region.
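
As an illustration of how a gaze ethogram could be assembled from per-frame classifier outputs, the following Python sketch collapses consecutive frames with the same target label into fixation events. The function name `build_ethogram`, the frame rate, and the use of the 200 ms average fixation duration as a minimum-duration filter are assumptions made for this example; they are not taken from the system described in this paper.

```python
from itertools import groupby


def build_ethogram(frame_labels, fps=30.0, min_duration_s=0.2):
    """Collapse per-frame target labels into (target, start_s, duration_s) events."""
    ethogram = []
    frame_index = 0
    for target, run in groupby(frame_labels):
        run_length = len(list(run))
        duration_s = run_length / fps
        # Runs shorter than the threshold are treated as saccades or noise
        # and are left out of the ethogram (an assumption of this sketch).
        if duration_s >= min_duration_s:
            ethogram.append((target, frame_index / fps, duration_s))
        frame_index += run_length
    return ethogram


# Hypothetical classifier output for 24 consecutive frames at 30 fps.
labels = [1] * 12 + [2] * 3 + [5] * 9
for target, start_s, duration_s in build_ethogram(labels):
    print(f"target {target}: start {start_s:.2f} s, duration {duration_s:.2f} s")
```

In this toy sequence the three-frame run on target 2 lasts only 0.1 s, so it is discarded and the ethogram keeps the fixations on targets 1 and 5.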
The rest of the paper is organized as follows. First, Section 2 provides a short overview of the state of the art in both lines of our work: gaze detection and tracking, and deep learning techniques applied to life sciences. Section 3 describes the experimental datasets and the proposed computational methods. Section 4 provides the experimental results, together with a critical discussion of them. Finally, Section 5 concludes with some summarizing remarks on our work and directions for future work.
2. Background
This section provides a short review of the state of the art in related research. First, we summarize the works on gaze detection and tracking with approximate or equivalent goals. Then, we review some antecedents and current developments in the growing area of deep learning applied to life sciences.
2.1. Gaze detection and tracking
Gaze information has been used for diagnostic and active interaction purposes [10,11,18]. For example, gaze interaction has been used for communication with people suffering extreme disability [24] or in patients with Alzheimer's Disease (AD) [25]. Diagnostic applications have been widespread in many different areas such as neuroscience [26,27], influence of students' visual attention and school failure [28], or analysis of facial expression exploration in subjects with social anxiety [29].
Gaze detection has been a research challenge for a long time [17,18]. Early successful approaches [19] were based on electrooculography (EOG), which is a technique that uses a series of electrodes situated on the user's face to measure the eye motion in an electromagnetic field. Videooculography (VOG) systems [20] are optical-based systems using specific illumination systems (often infrared) that enhance the detection of eye features such as the pupil and the cornea.
There is a need for much less invasive systems that do not require the subject to wear specific intrusive technology, as is the case for EOG and VOG. Solutions based on computer vision use conventional machine learning techniques: some are based on the localization of the eyebrows [21], others use the estimation of the 3D face motion from a single camera [22]. Recent approaches based on deep learning architectures have been tested in neuroscience studies [23]. The work in this paper goes in this direction, towards minimally invasive and reliable gaze detection and tracking systems.
2.2. Deep learning in the life sciences
Deep Learning (DL) approaches are the protagonists of the Artificial Neural Network (ANN) resurgence in the last decade [31–33]. They overcome the problems of the vanishing gradient and overfitting by various approaches. They produce a data driven hierarchy of abstract representations by stochastic gradient descent training procedures. Specifically, convolutional neural network (CNN) [35] training produces a hierarchy of filters tuned from the data. CNNs have been extremely influential in the advance of computer vision (CV) tasks. This architecture has inspired new generations of DL networks (DLN) with diverse architectures, which are reporting superior performance on many different problems in areas such as image processing [36,37], pattern recognition and object detection [38–40], classification [41,42], tracking [48], and activity recognition from data provided by inertial sensors [49].
In the Life Sciences (LS) the number of reported DLN applications during the last five years has been growing exponentially [34]. Example applications of DLN in LS areas are the analysis of medical images in the neurosciences [23,43] and other medical areas [44,45], including early stage detection of COVID-19 in X-ray imaging [46,47]. DLNs have also been applied to facial image processing, which deals with a rather complex object because of many different factors like the face position and orientation, the mouth and eyes opening, and the human skin color range. There are reported DLN approaches to face contour detection [50,51], facial component extraction [52,53], biometric facial recognition [54], and gender classification [55].
3. Materials and methods
As previously stated, our setup employs the behavioral activity recognition system to determine the activity carried out by the subject [9]. This system uses gaze ethograms to describe and identify such activities. Fig. 1(a) shows an instance of a gaze ethogram obtained from a subject reading a text on the computer display. For activity recognition purposes it is enough that the gaze tracking system identifies the gaze fixation targets corresponding to the broad areas in Fig. 1(b). The target number order has been arbitrarily defined in order to reduce the subject fatigue while performing the system calibration.

Fig. 1. (a) Gaze ethogram corresponding to the user activity "reading a text" in an experiment of duration 200 s. The targets correspond to nine different display areas in which the subject's fixations are detected. (b) Template used for calibration. The numbers denote the sequence of locations of the target areas of gaze fixations followed by the user during calibration. The same numbers are used as output categories of the DLNs.

We describe the overall cognitive activity recognition based on gaze ethograms elsewhere [9]. The work in this paper covers a novel proposal that utilizes deep learning for estimating the gaze fixation on the visual target areas. The system hardware configuration is a laptop computer endowed with a web camera on top of the screen upon which a user is working. The distance of the face to the camera is roughly 50 cm, and the camera view of the face is frontal, although the subject can move freely and change pose at will. We are using off-the-shelf web cameras that are factory installed in laptops, therefore robustness is a challenge and a limitation. The resolution of these cameras is limited and often the image quality is quite low. Additional difficulties arise from the uncontrolled illumination conditions, and the user's freedom of movement in front of the camera.
3.1. Datasets
We have generated two different datasets for these experiments. Both are produced from videos captured with laptop cameras at 720p resolution, in which the subjects fixate on every target of a calibration template, in order, for 3 s each. The extracted images are then filtered to remove examples that are too blurred, very similar to one another, or have unclear target destinations, and the remaining images are hand-labeled to assign the target to each one.
The first dataset contains images from 12 subjects with different equipment and illumination conditions. The images in this dataset have been balanced in order to guarantee an equivalent number of images in each class, trying to anticipate troubles during the training stage. This dataset contains 450 images.
The second dataset contains images from a single subject. The videos have been recorded under different illumination conditions and at varying distances to the camera. The underlying idea is to compare the performance between ANNs trained with general, multi-user data and tailored, single-user data. This dataset is composed of 700 images.
To localize the face in the images we apply a pre-trained detector based on histograms of oriented gradients (HOG) [56] as input features to classification by linear support vector machines (SVM) [57]. Once the face is localized in the image, the next step consists of determining the position of face landmarks. This problem is known as face alignment. We use a previously trained ensemble of regression trees to estimate the face landmark positions directly from a sparse subset of pixel intensities [60]. The method returns 68 2D points in the image that can be used to localize the eyes, eyebrows, nose, mouth, and jawline. This approach allows almost real-time response, although we have found trouble when the user is wearing some kind of glasses during the data capture. Finally, we select the eye area in the original images and add it to the validation datasets. Fig. 2 shows some examples of those images and the corresponding labels. In this case we use the target identifier method for labeling [61].
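
The last step, selecting the eye area from the 68 landmarks, can be sketched as follows. This is a minimal example, assuming the widely used 68-point ordering in which indices 36 to 47 (0-based) outline the two eyes. The function name `crop_eye_area` and the `margin` parameter are hypothetical, since the text does not specify how the crop is bounded.

```python
import numpy as np

# Assumption: common 68-point landmark ordering, where points 36-47 are the eyes.
EYE_POINTS = slice(36, 48)


def crop_eye_area(image, landmarks, margin=0.25):
    """Return the sub-image bounding both eyes, enlarged by a relative margin."""
    eyes = np.asarray(landmarks, dtype=float)[EYE_POINTS]
    x_min, y_min = eyes.min(axis=0)
    x_max, y_max = eyes.max(axis=0)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    height, width = image.shape[:2]
    # Clamp the padded box to the image borders.
    left = max(int(np.floor(x_min - pad_x)), 0)
    top = max(int(np.floor(y_min - pad_y)), 0)
    right = min(int(np.ceil(x_max + pad_x)) + 1, width)
    bottom = min(int(np.ceil(y_max + pad_y)) + 1, height)
    return image[top:bottom, left:right]


# Synthetic 720p frame and landmarks, used only to exercise the function.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
points = np.zeros((68, 2))
points[EYE_POINTS, 0] = np.linspace(500, 700, 12)  # x coordinates
points[EYE_POINTS, 1] = np.linspace(300, 340, 12)  # y coordinates
print(crop_eye_area(frame, points).shape)
```

A single crop covering both eyes is one possible design. Cropping each eye separately would be an equally plausible reading of "the eye area".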
3.2. Deep transfer learning
We have retrained six DLN models. We use a transfer learning approach, where each DLN has been previously trained over data from the ImageNet challenge. We keep the weights of the intermediate layers, retraining the final layers that produce the actual classification output. In deep transfer learning [30] the already trained DLN hidden layers are assumed to be a general feature extraction procedure, defining a manifold that can be used to map the input data for classification or regression tasks that are different from the original one. Task specific information is provided when training the output layers of the DLN.
We use the same fully connected output layer for all the nets. It contains 10 neurons: nine identify the display target areas and the tenth detects cases in which the subject has the eyes closed. We have chosen a softmax function for computing the classes because it is usually recommended for likelihood computation in multi-class domains. Unless stated otherwise, we have used the Adam optimizer with a learning rate of 10^-4. We have validated the retrained models by cross-validation in all the experiments reported in the next section. We have repeated an 80% hold-out validation 30 times, each time randomly selecting 80% of the dataset for training and using the remaining 20% for testing. We report the average accuracy of the test results.
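
The validation protocol (30 repetitions of a random 80/20 hold-out split, averaging the test accuracy) can be sketched as below. This is only an outline under stated assumptions: `repeated_holdout` and the `evaluate` callback are hypothetical names, and `dummy_evaluate` is a placeholder standing in for the step that would retrain a DLN on the training indices and return its test accuracy.

```python
import numpy as np


def repeated_holdout(n_samples, evaluate, repeats=30, train_fraction=0.8, seed=0):
    """Average the score returned by `evaluate` over repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * n_samples))
    scores = []
    for _ in range(repeats):
        order = rng.permutation(n_samples)
        train_idx, test_idx = order[:n_train], order[n_train:]
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores)), scores


def dummy_evaluate(train_idx, test_idx):
    # Placeholder: returns the test fraction instead of a real accuracy.
    return len(test_idx) / (len(train_idx) + len(test_idx))


mean_score, scores = repeated_holdout(450, dummy_evaluate)
print(len(scores), mean_score)
```

With the 450 images of the multi-user dataset, each repetition would use 360 images for training and 90 for testing.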
3.2.1. VGG19
The first DLN evaluated was the classical VGG19 [62]. It is a CNN with 19 weight layers (16 convolutional and 3 fully connected), with max-pooling layers interleaved to reduce the image size. In order to adapt the model to our data we have removed the last layers of the pre-trained network, and added and trained two fully connected layers with 50 and 25 neurons, respectively, each with its own dropout layer to reduce overfitting.
3.2.2. Inception-v3
The Inception neural network [63] has several versions, the fourth being the most recent. We have used the previous version because of its availability. Its structure is composed of a pattern of layers that is replicated along the net. There are modules with multiple convolutional layers in parallel that extract different image features, which are concatenated at the end of the module. We have added and trained an additional fully connected layer to fit it to our datasets.
3.2.3. Xception
The Xception neural network [64] uses the same modular composition idea as the Inception architectures, but with a modification in the patterns: it replaces the parallel convolutional layers with depthwise separable convolutional layers. These layers reduce the computations, so the time required to train on many more images is considerably shorter.
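
The saving can be made concrete by counting weights. The sketch below compares a standard convolution with a depthwise separable one (a per-channel spatial filter followed by a 1x1 pointwise convolution). Biases are ignored, and the layer sizes are illustrative values chosen for this example rather than the ones used in Xception.

```python
def conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights of a depthwise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution mixing the channels
    return depthwise + pointwise


standard = conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable)  # 294912 33920
```

For this 3x3 layer with 128 input and 256 output channels, the separable form needs roughly 8.7 times fewer weights.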
The Xception structure presents three different stages. The initial stage applies a filter to the image to reduce the image size while it keeps the convolutional layers. The middle stage consists of repeated modules, which are duplicated up to eight times. The final stage has been modified and retrained to adapt it to our datasets. Here we have used two fully connected layers with 50 and 20 neurons followed by a single dropout layer to avoid overfitting.
3.2.4. ResNet50
ResNet50 [65] is a residual network. This kind of DLN architecture tries to model the residual of the prediction at previous layers. It has direct propagation of the input along the layers of the network in order to compute this residual. This structural feature alleviates the vanishing gradient problem and provides interesting computational properties, such that the computation at a given layer is independent from previous layers. Residual networks may have a very large number of layers; the one that we retrain on our datasets has 50 layers that are grouped into several blocks. At the beginning of each block, the computed residual is stored and it is used at the end of the block with the computed weights. In this case, we are using the SGD optimizer due to its superior performance against other optimizers for this kind of network. The learning rate is 10^-5 and the decay rate is 10^-6 for each iteration.
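
The shortcut idea can be illustrated with a toy block that adds its input back to the output of two learned transformations. This is a simplified sketch with dense layers in NumPy, assuming ReLU activations and equal input and output widths. The actual ResNet50 blocks are convolutional and include batch normalization, and `residual_block` is a name chosen for this example.

```python
import numpy as np


def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is two linear maps with a ReLU between."""
    hidden = np.maximum(x @ w1, 0.0)
    residual = hidden @ w2
    # The identity shortcut: the block input is added to the learned residual.
    return np.maximum(x + residual, 0.0)


x = np.array([[1.0, 2.0, 3.0]])
zero = np.zeros((3, 3))
# With zero weights the learned residual vanishes and the input passes through.
print(residual_block(x, zero, zero))
```

Because the block only has to learn a correction to its input, a block with near-zero weights behaves like the identity, which is one common explanation for why very deep residual networks remain trainable.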

Fig. 2. Examples of eye region images captured while subjects are performing fixations on each target, which compose the datasets. These images are the input to the DLNs providing the gaze fixation identification.

Fig. 3. Accuracy curves in training and test with the multi-user dataset.

3.2.5. Inception-ResNet-v2
The Inception-ResNet net [66,67] combines both the ResNet and Inception approaches to create a model with the advantages provided by them. The structure is composed of several blocks besides the parallel convolutional layers used to concatenate blocks. Inside the blocks there are repeated modules of the Inception flavor. There are also connections from the beginning of blocks to the end, similar to the ResNet ones.
3.2.6. DenseNet
DenseNet [68] follows a design idea similar to ResNet, although now the authors add the residual of each block globally and not only at the end of each block. Thus, several connections appear from the inputs of the convolutional layers in each block to the outputs of other blocks. Thanks to this modification the net is more compact and requires fewer layers to extract information from the image, because each layer can receive information from previous layers.

Fig. 4. Categorical cross-entropy loss curves in training and test with the multi-user dataset.

Table 1
Net accuracy and error with the multi-user dataset.

Net                    Best accuracy   Lowest error
VGG19                  89.01%          0.3452
Inception-v3           86.96%          0.5529
Xception               82.61%          0.6023
ResNet50               86.96%          0.6531
Inception-ResNet-v2    84.78%          0.6248
DenseNet               91.30%          0.4281

Table 2
Normalized confusion matrix from DenseNet and the multi-user dataset (rows: actual target; columns: predicted target).

Actual    0      1      2      3      4      5      6      7      8      9
0         1      0      0      0      0      0      0      0      0      0
1         0      1      0      0      0      0      0      0      0      0
2         0      0      .833   0      0      0      0      0      0      .167
3         0      0      .143   .857   0      0      0      0      0      0
4         0      0      0      0      1      0      0      0      0      0
5         0      0      0      0      0      .667   0      0      0      .333
6         0      0      0      0      0      0      .800   .200   0      0
7         0      0      0      0      0      0      0      1      0      0
8         0      0      0      0      0      0      0      0      1      0
9         0      0      0      0      0      0      0      0      0      1

Fig. 5. Average accuracy curves in training and test over the single-user dataset.

4. Experimental results and discussion
Now we discuss the learning results and general performance obtained with the DLNs described above. First, we show the multi-user dataset results. Then, we summarize and compare the results obtained on the single-user dataset.
4.1. Multi-user dataset
Fig. 3 depicts the average accuracy curves obtained in the training and test phases with the multi-user dataset. All the nets achieve a high accuracy after a low number of epochs. Moreover, the tendencies in both training and test are almost parallel in every case. The features extracted by the pretrained hidden layers of the DLNs appear to provide a good baseline for this problem and our dataset. With the exception of VGG19, the DLNs stall after roughly 30 epochs. This usually comes from the fact that the DLNs start to overfit to the training data. Therefore, they would probably need more data to keep improving the training.

Fig. 6. Average categorical cross-entropy loss curves during training and test over the single-user dataset.

Table 3
Best accuracy and error achieved by DLN models over the single-user dataset.

Net                    Best accuracy   Lowest error
VGG19                  94.62%          0.1788
Inception-v3           93.55%          0.3151
Xception               93.55%          0.2475
ResNet50               91.40%          0.2486
Inception-ResNet-v2    91.24%          0.2982
DenseNet               95.70%          0.2195

Training minimizes the categorical cross-entropy loss, which compares the real distribution with the predicted one. The lower the output of this function, the greater the degree of similarity of both distributions, and the greater the expected accuracy of classification. Fig. 4 shows the average evolution of this error measure in both training and test datasets. These curves are highly negatively correlated to the ones in Fig. 3: the higher the accuracy, the lower the error. Here, we also observe that the best training evolution results are obtained with the VGG19 net. Other nets tend to overfit after the first epochs.
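
For reference, the softmax output and the categorical cross-entropy loss can be written in a few lines of NumPy. The sketch below uses the standard definitions with integer class labels. The small constant `eps` is an assumption added here for numerical safety, and the uniform-logit example is illustrative rather than taken from the experiments.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(probs, targets, eps=1e-12):
    """Mean negative log-probability assigned to the true class of each sample."""
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(picked + eps)))


# An untrained 10-class output layer with equal logits predicts 0.1 per class.
logits = np.zeros((2, 10))
targets = np.array([3, 9])
print(categorical_cross_entropy(softmax(logits), targets))
```

With ten equally likely classes the loss equals ln(10), approximately 2.30, which gives a reference point for reading the error values in Tables 1 and 3.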
Table 1 shows the highest accuracy and the lowest error achieved by every DLN architecture. The results indicate that we have been able to get an accuracy of at least 80% with the new data. DenseNet achieves the best result with more than 90%.
Table 2 shows a typical test confusion matrix obtained by the retrained DenseNet on the multi-user dataset. Usually the DLNs tend to output erroneous targets when subjects look at the target areas located at the bottom of the display. The reason is that the eyes are often closed in those images, so that it is hard to determine the right target under these conditions.
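
A row-normalized confusion matrix of this kind can be computed from the actual and predicted labels as sketched below. The function name `normalized_confusion_matrix` is hypothetical, and the six-sample example is made up so that it reproduces the proportions of one row of Table 2. It is not the experimental data.

```python
import numpy as np


def normalized_confusion_matrix(actual, predicted, n_classes=10):
    """Rows are actual targets, columns predicted targets; each row sums to 1."""
    counts = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(counts, (np.asarray(actual), np.asarray(predicted)), 1.0)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Classes with no test samples keep an all-zero row instead of dividing by 0.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


# Six made-up test images of target 2, one of them classified as class 9.
actual = [2, 2, 2, 2, 2, 2]
predicted = [2, 2, 2, 2, 2, 9]
matrix = normalized_confusion_matrix(actual, predicted)
print(np.round(matrix[2], 3))
```

Five correct predictions out of six give the .833 and .167 entries seen in the row for target 2, which shows how coarse the proportions become when each class has only a handful of test images.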
4.2. Single-user dataset
The proposed transfer learning architectures achieve better results when they are evaluated on the second dataset, which is composed of images from a single user. The learning problem appears easier than in the multi-user case, because we remove the data variability due to the user. Moreover, the dataset is larger than in the multi-user case. We can observe in Fig. 5 that all DLN models achieve an accuracy greater than 80% in just a few epochs. However, overfitting appears in the initial epochs, so that further training of the DLNs does not improve their performance. The hypothesized reason is the high similarity between all the images in the dataset. It might be partially solved by removing redundant data or by applying additional regularization methods apart from the dropout layers deployed at design time. Fig. 6 shows the evolution of the loss function on the training and test datasets. The error rate falls rapidly but remains stable after the first epochs.
Table 3 summarizes the highest accuracy and the lowest error achieved by every DLN architecture after transfer learning. The results are better than those obtained with the multi-user dataset. All the DLN architectures achieve accuracies over 90%. DenseNet provides the best results. Note that here the nets are learning to classify the gaze corresponding to just one subject. This gives an idea of how important it can be to tailor the classifiers to a final user.
Table 4 shows a typical confusion matrix generated from DenseNet and the single-user dataset. The confusion error in predicted targets follows a very similar pattern to the multi-user case.
4.3. Discussion
The models achieve competitive results with both datasets. The test accuracy achieved over the single-user dataset is greater, but these results must be taken with care. The multi-user case could offer the better solution for a global system or default mode, while the single-user case has to be retrained for each particular user.
We expect that the results with the multi-user dataset would improve if more images from new users were added to the dataset, because the current number of images is not particularly high and DLN methods usually require large datasets for effective training.
Also, we have used the same structure and layers in the nets for both datasets. We could probably modify some layers in order to manage the overfitting problems found with the single-user dataset, as previously commented.
5. Conclusions and further work
We have presented a method for gaze fixation detection based on deep transfer learning in the context of behavioral activity recognition systems. Gaze fixation detection is usually an important part of such systems. In our case we must achieve the best performance of the gaze tracking system because the goal of our system is to determine the activities that a user carries out in front of a computer, and the inputs come from the camera on top of the screen.
In spite of the reduced datasets used in the experiments, the use of publicly available pre-trained networks for domain transfer learning allows us to achieve good performance with affordable computational cost. The best results according to the recognition accuracy have been reported by the DenseNet model. Other models require lower training time or are easier to implement, so recognition accuracy should be treated as just one of the criteria to consider.
Future work will examine recent innovative DL architectures, specifically the 3D-ResNet35 architecture [69,70] recommended by the reviewers, which promises enhanced results due to its ability to process 3D data. Another alternative for future work is to create a new architecture from scratch. We should extend our dataset for this endeavor, because large datasets are a basic requirement of DLN training.
CRediT authorship contribution statement
Javier de Lope: Conceptualization, Methodology, Software, Writing – review & editing. Manuel Graña: Methodology, Writing – review & editing.

Table 4
Normalized confusion matrix from DenseNet over the single-user dataset (rows: actual target; columns: predicted target).

Actual    0      1      2      3      4      5      6      7      8      9
0         1      0      0      0      0      0      0      0      0      0
1         0      1      0      0      0      0      0      0      0      0
2         0      0      1      0      0      0      0      0      0      0
3         0      0      0      1      0      0      0      0      0      0
4         0      0      0      0      .937   0      0      0      .063   0
5         0      .040   0      0      0      .800   .120   0      0      .040
6         0      0      0      0      0      0      1      0      0      0
7         .133   0      0      0      0      0      0      .867   0      0
8         0      0      0      0      0      0      0      0      1      0
9         0      0      0      0      0      0      0      0      0      1

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work has been supported by FEDER funds through MINECO project TIN2017-85827-P. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 777720. XinZhe Jin contributed some early computational experiences.
References
[1] D.J. Anderson, P. Perona, Toward a science of Computational Ethology, Neuron 84 (2014) 18–31.
[2] M. Vrigkas, C. Nikou, I. Kakadiaris, A review of human activity recognition methods, Front. Robot. Artif. Intell. 2 (2015) 11.
[3] S.-R. Ke, H. Le Uyen Thuc, Y.-J. Lee, J.-N. Hwang, J.-H. Yoo, K.-H. Choi, A review on video-based human activity recognition, Computers 2(2) (2013) 88–131.
[4] J.Y. Yang, J.S. Wang, Y.P. Chen, Using acceleration measurements for activity recognition: An effective learning algorithm for constructing neural classifiers, Pattern Recogn. Lett. 29 (16) (2008) 2213–2220.
[5] M. Graña, M. Aguilar-Moreno, J. De Lope, I. Baglietto, X. Garmendia, Improved activity recognition combining inertial motion sensors and electroencephalogram signals, Int. J. Neural Syst. 30 (10) (2020) 2050053.
[6] A. Lentzas, D. Vrakas, Non-intrusive human activity recognition and abnormal behavior detection on elderly people: A review, Artif. Intell. Rev. 53 (2020) 1975–2021.
[7] N. Tasnim, M. Islam, J.-H. Baek, Deep learning-based action recognition using 3D skeleton joints information, Inventions 5 (2020) 49.
[8] J.H.F. Abeelen, Mouse mutants studied by means of ethological methods, Genetica 34 (1964) 79–94.
[9] J. De Lope, M. Graña, Behavioral activity recognition based on gaze ethograms, Int. J. Neural Syst. 30 (7) (2020) 2050025.
[10] A. George, Image based eye gaze tracking and its applications, arXiv 2019, 1907.04325.
[11] R. Hof, How do you Google? New eye tracking study reveals huge changes, Forbes Online, 2015.
[12] B. Cassin, S. Solomon, Dictionary of Eye Terminology, Triad Publishing Company, Gainesville, Florida, 1990.
[13] J.D. Enderle, D.A. Sierra, A new linear muscle fiber model for neural control of saccades, Int. J. Neural Syst. 73 (2013) 1350002.
[14] R.G. Hussain, M.A. Ghazanfar, M.A. Azam, U. Naeem, S.U. Rehman, A performance comparison of machine learning classification approaches for robust activity of daily living recognition, Artif. Intell. Rev. 52(1) (2019) 357–379.
[15] G. Andrienko, N. Andrienko, G. Budziak, J. Dykes, G. Fuchs, T. von Landesberger, H. Weber, Visual analysis of pressure in football, Data Min. Knowl. Disc. 31 (6) (2017) 1793–1839.
[16] E.E. Stone, M. Skubic, Unobtrusive, continuous, in-home gait measurement using the Microsoft Kinect, IEEE Trans. Biomed. Eng. 60 (10) (2013) 2925–2932.
[17] A.T. Duchowski, Eye Tracking Methodology: Theory and Practice, Springer, Cham, 2017.
[18] A.T. Duchowski, Gaze-based interaction: A 30 year retrospective, Comput. Graph. 73 (2018) 59–69.
[19] L.R. Young, D. Sheena, Survey of eye movement recording methods, Behav. Res. Methods Instrum. 7 (5) (1975) 397–439.
[20] B.W. Blakley, L. Chan, Methods considerations for nystagmography, J. Otolaryngol. Head Neck Surg. 44 (2015) 25.
[21] L. Florea, C. Florea, C. Vertan, Recognition of the gaze direction: Anchoring with eyebrows, J. Vis. Commun. Image Rep. 35 (2016) 67–77.
[22] K.R. Park, J.J. Lee, J. Kim, Gaze position detection by computing the three dimensional facial positions and motions, Pattern Recogn. 35 (11) (2002) 2559–2569.
[23] Y.-H. Yiu, M. Aboulatta, T. Raiser, L. Ophey, V.L. Flanagin, P. Zu Eulenburg, S.-A. Ahmadi, DeepVOG: Open-source pupil segmentation and gaze estimation in neuroscience using deep learning, J. Neurosci. Methods 324 (2019) 108307.
[24] N. Barbara, T.A. Camilleri, K.P. Camilleri, EOG-based eye movement detection and gaze estimation for an asynchronous virtual keyboard, Biomed. Signal Process. Control 47 (2019) 159–167.
[25] P.M. Insch, G. Slessor, J. Warrington, L.H. Phillips, Gaze detection and gaze cuing in Alzheimer's Disease, Brain Cogn. 116 (2017) 47–53.
[26] O. Grynszpan, J. Bouteiller, S. Grynszpan, F. Le Barillier, J.C. Martin, J. Nadel, Altered sense of gaze leading in autism, Res. Autism Spectr. Disord. 67 (2019) 101441.
[27] J. Kim, J. Seo, T.H. Laine, Detecting boredom from eye gaze and EEG, Biomed. Signal Process. Control 46 (2018) 302–313.
[28] M.-J. Tsai, H.-T. Hou, M.-L. Lai, W.-Y. Liu, F.-Y. Yang, Visual attention for solving multiple-choice science problem: An eye-tracking analysis, Comput. Educ. 58 (2012) 375–385.
[29] A. Gutierrez-Garcia, A. Fernandez-Martin, M. Del Libano, M.G. Calvo, Selective gaze direction and interpretation of facial expressions in social anxiety, Pers. Individ. Differ. 147 (2019) 297–305.
[30] M. Talo, U.B. Baloglu, O. Yildirim, U.R. Acharya, Application of deep transfer learning for automated brain abnormality classification using MR images, Cogn. Syst. Res. 54 (2019) 176–188.
[31] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, U. Montreal, Greedy layer-wise training of deep networks (2007) 19.
[32] G.E. Hinton, D. Osindero, Y-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18(7) (2006) 1527–1554.
[33] M.A. Ranzato, Y.-L. Boureau, Y. LeCun, Sparse feature learning for deep belief networks, Conf. Neural Inf. Proc. Syst. (2007) 1185–1192.
[34] D. Bacciu, P. Lisboa, J. Martin-Guerrero, R. Stoean, A. Vellido, Bioinformatics and medicine in the era of deep learning, 2018, arXiv:1802.09791.
[35] Y. Lecun, P. Haffner, L. Bottou, Y. Bengio, Object recognition with gradient-based learning, in: Shape, Contour and Grouping in Computer Vision, Lecture Notes in Computer Science, vol 1681, Springer, Berlin, Heidelberg. doi: 10.1007/3-540-46805-6_19.
[36] L.A. Gatys, A.S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, IEEE Conf. on Computer Vision and Pattern Recognition (2016) 2414–2423.
[37] G. Antipov, M. Baccouche, J. Dugelay, Face aging with conditional generative adversarial networks, IEEE Int. Conf. in Image Processing (2017) 2089–2093.
[38] A. Ucar, Y. Demir, C. Guzelis, Object recognition and detection with deep learning for autonomous driving applications, Simulation 93 (2017).
[39] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: IEEE Int. Conf. in Image Processing, 2016.
[40] K. Potdar, C. Pai, S. Akolkar, A convolutional neural network based live object recognition system as blind aid, 2018.
[41] B. Ma, X. Li, Y. Xia, Y. Zhang, Autonomous deep learning: A genetic DCNN designer for image classification, Neurocomputing 379 (2020) 152–161.
[42] Y. Zhang, Y. Wang, X.-Y. Liu, S. Mi, M.-L. Zhang, Large-scale multi-label classification using unknown streaming images, Pattern Recogn. 99 (2020) 107100.
[43] M. Talo, U.B. Baloglu, O. Yildirim, U.R. Acharya, Application of deep transfer learning for automated brain abnormality classification using MR images, Cogn. Syst. Res. 54 (2019) 176–188.
[44] U. Raghavendra, H. Fujita, S.V. Bhandary, A. Gudigar, J.H. Tan, U.R. Acharya, Deep convolutional neural network for accurate diagnosis of glaucoma using digital fundus images, Inf. Sci. 441 (2018) 41–49.
[45] O. Yildirim, M. Talo, B. Ay, U.B. Baloglu, G. Aydin, U.R. Acharya, Automated detection of diabetic subject using pre-trained 2D-CNN models with frequency spectrum images extracted from heart rate signals, Comput. Biol. Med. 113 (2019) 103387.
[46] L. Wang, A. Wong, COVID-NET: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images, 2020.
[47] F. Shan, Y. Gao, J. Wang, W. Shi, N. Shi, M. Han, Z. Xue, D. Shen, Y. Shi, Lung infection quantification of COVID-19 in CT images with deep learning, 2020.
[48] W. Ouyang, X. Wang, Joint deep learning for pedestrian detection, IEEE Int. Conf. in Computer Vision, 2013.
[49] R. Grzeszick, J.M. Lenk, F.M. Rueda, G.A. Fink, S. Feldhorst, M. ten Hompel, Deep neural network based human activity recognition for the order picking process, iWOAR 2017.
[50] H. Jiang, E. Learned-Miller, Face detection with Faster RCNN, IEEE Int. Conf. Automatic Face Gesture Recognition (2017) 650–657.
[51] X. Sun, P. Wu, S.C. Hoi, Face detection using deep learning: An improved Faster RCNN approach, Neurocomputing 299 (2018) 42–50.
[52] R. Ranjan, V.M. Patel, R. Chellappa, A deep pyramid deformable part model for face detection, CoRR 2015, abs/1508.04389.
[53] S. Yang, P. Luo, C.C. Loy, X. Tang, Faceness-net: Face detection through deep
acial pa esponses, IEEE T ans. Pa e n Anal. Mach. In ell. (2017).
[54] W. Wang, J. Yang, J. Xiao, S. Li, D. Zhou, Face ecogni ion based on deep
lea ning, in: Q. Zu, B. Hu, N. Gu, S. Seng (Eds.), Human Cen e ed Compu ing,
Sp inge , Cham, 2015, pp. 812–820.
[55] M. Islam, N. Tasnim, J.-H. Baek, Human gende classi ica ion using ans e
lea ning ia Pa e o on ie CNN ne wo ks, In en ions 5 (2020) 16.
[56] N. Dalal, B. T iggs, His og am o o ien ed g adien s o human de ec ion, IEEE
Con . Comp. Vision and Pa e n Recogni ion (2005) 886–893.
[57] C. Co es, V.N. Vapnik, Suppo - ec o ne wo ks, Mach. Lea n. 20 (3) (1995)
273–297.
[60] V. Kazemi, J. Sulli an, One millisecond ace alignmen wi h an ensemble o
eg ession ees, IEEE Con . Compu e Vision and Pa e n Recogni ion (2014)
1867–1874.
[61] J. De Lope, M. G aña, Compa ison o labeling me hods o beha io al ac i i y
classi ica ion based on gaze e hog ams, in: E.A. De la Cal, J.R. Villa Flecha, H.
Quin ian, E. Co chado (Eds.), Hyb id A i icial In elligen Sys ems, Sp inge ,
Cham, 2020, pp. 132–144.
[62] K. Simonyan, A. Zisse man, Ve y deep con olu ional ne wo ks o la ge-scale
image ecogni ion. a Xi 2014, 1409.1556..
[63] S. Gi i, B. Joshi, T ans e lea ning based image isualiza ion using CNN, In . J.
A i . In ell. Appl. 10 (4) (2019) 47–55.
