Deep transfer learning-based gaze tracking for behavioral activity recognition
Javier de Lope a,⇑, Manuel Graña b
a Department of Artificial Intelligence, Universidad Politécnica de Madrid (UPM), Madrid, Spain
b Computational Intelligence Group, University of the Basque Country (UPV/EHU), San Sebastian, Spain
article info
Article history:
Received 7 March 2021
Revised 8 June 2021
Accepted 25 June 2021
Available online 23 May 2022
Keywords:
Deep transfer learning
Gaze tracking
Gaze ethogram
Human activity recognition
abstract
Computational Ethology focused on human beings is usually referred to as Human Activity
Recognition (HAR). Specifically, this paper belongs to a line of work on the identification of
broad cognitive activities that users carry out with computers. The keystone of this kind of
system is the noninvasive detection of the subject's gaze fixations in selected display areas.
Noninvasiveness is ensured by using conventional laptop cameras without additional illumination
or tracking devices. The gaze ethograms, composed as sequences of gaze fixations, are the basis
to identify the user activities. To determine the gaze fixation display areas with the highest
accuracy, this paper explores the use of a transfer learning approach applied to several
well-known deep learning network (DLN) architectures whose input is the eye area extracted from
the face image, and whose output is the identification of the gaze fixation area in the
computer screen. Two different datasets are created and used in the validation experiments. We
report encouraging results that may allow the general use of the system.
© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Computational Ethology [1] has become a hot research field in the last few years. It
integrates the information from several different sensors and activity measurement devices in
order to characterize the behavior of living beings. Specifically, the computer-based analysis
and recognition of human behavior, referred to as Human Activity Recognition (HAR) [2],
receives plenty of attention and contributions. Basically, there are two types of sensors used
in HAR research: cameras [3] and inertial sensors [4]. In computational neuroethology, these
sensors are usually combined with neuronal activity data captured by using, for example,
electroencephalography (EEG) equipment [5]. Much effort in HAR research is currently directed
to the monitoring of aging people [14], and to performance improvement in some sports [15].
The monitoring of elderly people is usually motivated by behavioral decline due to
neurodegenerative diseases, and its goal is to detect abnormal situations and raise alarms
[6], for example, fall detection [16]. HAR studies are usually oriented to the identification
of low level activities, for instance, the detection of abnormal behavioral situations in the
elderly [6] by the use of 3D skeleton models of body postures [7]; hence they do not deal with
higher level behavior representations such as those provided by ethograms.
An ethogram is a time plot of the low level actions carried out by the subject under
observation that provides a high level behavioral representation. Ethograms have been used for
animal phenotype characterization [8]. We are currently interested in the characterization of
behavioral states of a laptop computer user by using the laptop camera and the microphone to
determine the activity performed by the user by noninvasive computational methods. Previously,
we have studied the performance of conventional machine learning approaches on this task [9].
In this paper we explore the use of deep learning techniques to recognize the subject's
behavioral activity. Our hypothesis is that the subject's gaze fixation information makes it
possible to determine the specific activities in which the subject is engaged [9,61].
A visual fixation is the sustained gaze during a time interval in a specific direction which
falls upon a single location in the visual stimulus. Its average duration in uncontrolled
conditions is about 200 ms [12]. The saccades are quick, simultaneous movements of both eyes
between two or more phases of fixation in the same direction [13]. Blinking is the
semi-automatic rapid closing of the eyelids. Its rate is generally greater than a dozen blinks
per minute, although it may decrease when the eyes are focused on an object for an extended
period of time, for example, when reading.
https://doi.org/10.1016/j.neucom.2021.06.100
0925-2312/© 2022 The Author(s). Published by Elsevier B.V.
⇑ Corresponding author.
Neurocomputing 500 (2022) 518–527
As the information is retrieved during the fixations, we determine when they are produced and
in which order they are performed. We call gaze ethograms these temporal sequences of visual
fixations, which are the atomic actions building up the behavioral representation. We define
areas of the display which receive the user's attention in order to categorize the visual
fixations. The gaze ethogram may be used to recognize the subject's behavioral activity. The
work in this paper is devoted to the evaluation of deep learning architectures on the task of
recognizing the gaze fixation from unfiltered images of the eye region.
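To make the notion of a gaze ethogram concrete, the following minimal sketch collapses a stream of per-frame target labels into a temporal sequence of fixations. It is an illustration under stated assumptions, not the authors' implementation: the function name `gaze_ethogram`, the use of `None` for frames without a usable target, and the 0.2 s minimum duration (borrowed from the average fixation duration quoted above) are all hypothetical choices.

```python
from itertools import groupby


def gaze_ethogram(frame_targets, fps, min_duration_s=0.2):
    """Collapse per-frame target labels into (target, start_s, duration_s) tuples.

    frame_targets: one label per video frame (None = no usable target).
    min_duration_s: hypothetical threshold below which a run is not a fixation.
    """
    ethogram = []
    frame_index = 0
    for target, run in groupby(frame_targets):
        run_length = len(list(run))
        duration = run_length / fps
        # Keep only runs that are long enough and point at a real target.
        if target is not None and duration >= min_duration_s:
            ethogram.append((target, frame_index / fps, duration))
        frame_index += run_length
    return ethogram


# Synthetic example: 30 frames per second, targets numbered as display areas.
frames = [1] * 10 + [2] * 2 + [3] * 15 + [None] * 5 + [3] * 8
for target, start, duration in gaze_ethogram(frames, fps=30):
    print(f"target {target}: start {start:.2f} s, duration {duration:.2f} s")
```

In this sketch the two-frame run on target 2 is discarded as too short to be a fixation, while the two runs on target 3 remain separate entries because they are interrupted by frames without a target.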
The rest of the paper is organized as follows. First, Section 2 provides a short review of the
state-of-the-art in both lines of our work: gaze detection and tracking, and deep learning
techniques applied to life sciences. Section 3 describes the experimental datasets and the
computational methods evaluated. Section 4 provides the experimental results. There, we also
offer a critical discussion of the results. Finally, in Section 5 we conclude with some
summarizing remarks on our work and directions of future work.
2. Background
This section provides a short review of the state-of-the-art in related research. First, we
summarize the works on gaze detection and tracking with approximate or equivalent goals. Then,
we review some antecedents and current developments in the growing area of deep learning
applied to life sciences.
2.1. Gaze detection and tracking
Gaze information has been used for diagnostic and active interaction purposes [10,11,18]. For
example, gaze interaction has been used for communication with people suffering extreme
disability [24] or in patients with Alzheimer's Disease (AD) [25]. Diagnostic applications
have been widespread in many different areas such as neuroscience [26,27], influence of
students' visual attention and school failure [28] or analysis of facial expression
exploration in subjects with social anxiety [29].
Gaze detection has been a research challenge for a long time [17,18]. Early successful
approaches [19] were based on electrooculography (EOG), which is a technique that uses a
series of electrodes situated on the user's face to measure the eye motion in an
electromagnetic field. Videooculography (VOG) systems [20] are optical-based systems using
specific illumination systems (often infrared) that enhance the detection of eye features such
as the pupil and the cornea.
There is a need for much less invasive systems that do not require the subject to wear
specific intrusive technology, as is the case of EOG and VOG. Solutions based on computer
vision use conventional machine learning techniques: some are based on the localization of the
eyebrows [21], others use the estimation of the 3D face motion from a single camera [22].
Recent approaches based on deep learning architectures have been tested in neuroscience
studies [23]. The work in this paper goes in this direction, towards minimally invasive,
reliable gaze detection and tracking systems.
2.2. Deep learning in the life sciences
Deep Learning (DL) approaches are the protagonists of the Artificial Neural Network (ANN)
resurgence in the last decade [31–33]. They overcome the problems of the vanishing gradient
and overfitting by various approaches. They produce a data driven hierarchy of abstract
representations by stochastic gradient descent training procedures. Specifically, the
convolutional neural network (CNN) [35] training produces a hierarchy of filters tuned from
the data. CNNs have been extremely influential in the advance of computer vision (CV) tasks.
This architecture has inspired new generations of DL networks (DLN) with diverse
architectures, which are reporting superior performance on many different problems in areas
such as image processing [36,37], pattern recognition and object detection [38–40],
classification [41,42], tracking [48], and activity recognition from data provided by inertial
sensors [49].
In the Life Sciences (LS) the number of reported DLN applications during the last five years
has been growing exponentially [34]. Example applications of DLN in LS areas are the analysis
of medical images in the neurosciences [23,43] and other medical areas [44,45], including
early stage detection of COVID-19 in X-ray imaging [46,47]. DLNs have also been applied to
facial image processing, which deals with a rather complex object because of many different
factors like the face position and orientation, the mouth and eyes opening, and the human skin
color range. There are reported DLN approaches to face contour detection [50,51], facial
components extraction [52,53], biometric facial recognition [54], and gender classification
[55].
3. Materials and methods
As previously stated, our setup employs the behavioral activity recognition system to
determine the activity carried out by the subject [9]. This system uses gaze ethograms to
describe and identify such activities. Fig. 1(a) shows an instance of a gaze ethogram obtained
from a subject reading a text on the computer display. For activity recognition purposes it is
enough that the gaze tracking system identifies the gaze fixation targets corresponding to the
broad areas in Fig. 1(b). The target number order has been arbitrarily defined in order to
reduce the subject fatigue while performing the system calibration.
Fig. 1. (a) Gaze ethogram corresponding to the user activity "reading a text" in an
experiment of duration 200 s. The targets correspond to nine different display areas in which
the subject's fixations are detected. (b) Template used for calibration. The numbers denote
the sequence of locations of the target areas of gaze fixations followed by the user during
calibration. The same numbers are used as output categories of the DLNs.
We describe the overall cognitive activity recognition based on gaze ethograms elsewhere [9].
The work in this paper covers a novel proposal that utilizes deep learning for estimating the
gaze fixation on the visual target areas. The system hardware configuration is a laptop
computer endowed with a web camera on top of the screen upon which a user is working. The
distance of the face to the camera is roughly 50 cm, and the camera view of the face is
frontal, although the subject can move freely and change pose at will. We are using
off-the-shelf web cameras that are factory installed in laptops, therefore robustness is a
challenge and a limitation. The resolution of these cameras is limited and often the image
quality is quite low. Additional difficulties arise from the uncontrolled illumination
conditions, and the user's freedom of movement in front of the camera.
3.1. Datasets
We have generated two different datasets for these experiments. Both are produced from videos
captured by laptop cameras with a resolution of 720p, in which the subjects perform fixations
in order on every target of a calibration template for 3 s. Then, the extracted images are
filtered to remove those that are too blurred, very similar to each other, or have unclear
target destinations, and the remaining images are hand-labeled to assign the target to each
one.
The first dataset contains images from 12 subjects with different equipment and illumination
conditions. The images in this dataset have been balanced in order to guarantee an equivalent
number of images in each class, trying to anticipate troubles during the training stage. This
dataset contains 450 images.
The second dataset contains images from a single subject. The videos have been recorded under
different illumination conditions and varying distance to the camera. The underlying idea is
to compare the performance between ANNs trained with general, multi-user data and tailored,
single-user data. This dataset is composed of 700 images.
To localize the face in the images we apply a pre-trained detector based on histograms of
oriented gradients (HOG) [56] as input features to classification by linear support vector
machines (SVM) [57]. Once the face is localized in the image, the next step consists of
determining the position of face landmarks. This problem is known as face alignment. We use a
previously trained ensemble of regression trees to estimate the face landmark positions
directly from a sparse subset of pixel intensities [60]. The method returns 68 2D points in
the image that can be used to localize the eyes, eyebrows, nose, mouth, and jawline. This
approach allows almost real-time response, although we have found trouble when the user is
wearing some kind of glasses during the data capture. Finally, we select the eye area in the
original images to add them to the validation datasets. Fig. 2 shows some examples of those
images and the corresponding label. In this case we use the target identifier method for
labeling [61].
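The selection of the eye area from the 68 landmarks can be sketched as a bounding-box computation. The sketch below is an assumption-laden illustration rather than the authors' code: it assumes the landmarks follow the widely used 68-point annotation layout in which indices 36 to 47 are the eye contours (the text only states that 68 2D points are returned), and the function name `eye_region_box` and the 25% margin are hypothetical.

```python
def eye_region_box(landmarks, image_width, image_height, margin=0.25):
    """Return (left, top, right, bottom) enclosing both eyes plus a margin.

    landmarks: list of 68 (x, y) points; indices 36..47 are ASSUMED to be
    the eye contours, as in the common 68-point annotation scheme.
    margin: hypothetical relative padding added around the eye contours.
    """
    eye_points = landmarks[36:48]
    xs = [x for x, _ in eye_points]
    ys = [y for _, y in eye_points]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    # Clamp the padded box to the image limits.
    left = max(0, int(min(xs) - pad_x))
    top = max(0, int(min(ys) - pad_y))
    right = min(image_width, int(max(xs) + pad_x) + 1)
    bottom = min(image_height, int(max(ys) + pad_y) + 1)
    return left, top, right, bottom


# Synthetic landmarks for a 1280x720 frame: only the eye points matter here.
synthetic = [(0, 0)] * 36 \
    + [(100 + 10 * i, 200 + (i % 2) * 10) for i in range(12)] \
    + [(0, 0)] * 20
print(eye_region_box(synthetic, image_width=1280, image_height=720))
```

The returned box can be used to slice the original frame; cropping from the original image rather than from a resized face crop preserves as much of the limited camera resolution as possible.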
3.2. Deep transfer learning
We have retrained six models of DLN. We use a transfer learning approach, where each DLN has
been previously trained over data from the ImageNet challenge. We keep the weights of the
intermediate layers, retraining the final layers that produce the actual classification
output. In deep transfer learning [30] the already trained DLN hidden layers are assumed to be
a general feature extraction procedure, defining a manifold that can be used to map the input
data for classification or regression tasks that are different from the original one. Task
specific information is provided when training the output layers of the DLN.
We use the same output fully connected layer for all the nets. It contains 10 neurons: nine
identify the display target areas and the tenth is used to detect cases in which the subject
has the eyes closed. We have chosen a softmax function for computing the classes because it is
usually recommended for likelihood computation in multi-class domains. Unless stated
otherwise, we have used the Adam optimizer with a learning rate of $10^{-4}$. We have
validated the retrained models by cross-validation in all the experiments reported in the next
section. We have repeated an 80% hold-out validation 30 times, where we randomly select 80% of
the dataset for training and use the remaining 20% for test. We report the average accuracy of
the test results.
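The repeated hold-out protocol can be sketched independently of any deep learning framework. In the sketch below, which is only an illustration, `evaluate` stands in for retraining the final layers on the training split and measuring accuracy on the test split; the names `repeated_holdout` and `majority_baseline`, the fixed seed, and the synthetic labels are hypothetical.

```python
import random
from statistics import mean


def repeated_holdout(samples, evaluate, repetitions=30, train_fraction=0.8, seed=0):
    """Average the test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repetitions):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]
        accuracies.append(evaluate(train, test))
    return mean(accuracies)


def majority_baseline(train, test):
    """Placeholder model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(label == majority for _, label in test) / len(test)


# 450 synthetic (image_id, label) pairs with 10 balanced classes.
data = [(i, i % 10) for i in range(450)]
print(f"average test accuracy: {repeated_holdout(data, majority_baseline):.3f}")
```

With 450 samples each repetition trains on 360 and tests on 90. Drawing a fresh random split every time means the test sets of different repetitions overlap, so the reported average is an estimate over random splits rather than a k-fold partition.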
3.2.1. VGG19
The first DLN evaluated was the classical VGG19 [62]. It is a classical CNN which has 19
weight layers (16 convolutional and 3 fully connected), with max-pooling layers interleaved to
reduce the image size. In order to adapt the model to our data we have removed the last layers
of the pre-trained network, and added and trained two fully connected layers with 50 and 25
neurons, respectively, each with its own dropout layer to reduce overfitting.
3.2.2. Inception-v3
The Inception neural network [63] has several versions, the fourth is the most recent. We have
used the previous version because of its availability. Its structure is composed of a pattern
of layers that is replicated along the net. There are modules with multiple convolutional
layers in parallel that extract different image features, which are concatenated at the end of
the module. We have added and trained an additional fully connected layer to fit it to our
datasets.
3.2.3. Xception
The Xception neural network [64] uses the same modular composition idea of Inception
architectures but there is a modification in the patterns: it replaces the parallel
convolutional layers with separable convolutional layers. These new layers reduce the
computations, so the time required to train on many more images is considerably shorter.
The Xception structure presents three different stages. The initial stage applies a filter to
the image for reducing the image size while it keeps the convolutional layers. The middle
stage consists of repeated modules, which are duplicated up to eight times. The final stage
has been modified and retrained to adapt it to our datasets. Here we have used two fully
connected layers with 50 and 20 neurons followed by a single dropout layer to avoid
overfitting.
3.2.4. ResNet50
ResNet50 [65] is a residual network. This kind of DLN architecture tries to model the residual
of the prediction at previous layers. It has direct propagation of the input along the layers
of the network in order to compute this residual. This structural feature alleviates the
vanishing gradient problem and provides interesting computational properties, such that the
computation at a given layer is independent from previous layers. Residual networks may have a
very large number of layers; the one that we retrain on our datasets has 50 layers that are
grouped into several blocks. At the beginning of each block, the computed residual is stored
and it is used at the end of the block with the computed weights. In this case, we are using
the SGD optimizer due to its superior performance against other optimizers for this kind of
network. The learning rate is $10^{-5}$ and the decay rate is $10^{-6}$ for each iteration.
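For reference, the residual computation described above is commonly written as follows. The notation is ours and is only an illustrative assumption: $\mathbf{x}$ is the input stored at the beginning of a block, $\mathcal{F}$ is the mapping computed by the stacked layers of the block with weights $W$, and $\mathbf{y}$ is the block output.

```latex
\[
  \mathbf{y} = \mathbf{x} + \mathcal{F}(\mathbf{x}; W),
  \qquad
  \frac{\partial \mathbf{y}}{\partial \mathbf{x}}
    = I + \frac{\partial \mathcal{F}(\mathbf{x}; W)}{\partial \mathbf{x}}
\]
```

The identity term $I$ in the Jacobian gives the gradient a direct path through every block, which is the usual explanation of why this kind of connection alleviates the vanishing gradient problem.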
Fig. 2. Examples of eye region images captured while subjects are performing fixations in each
target that compose the datasets. These images are the input of the DLNs providing the gaze
fixation identification.
Fig. 3. Accuracy curves in training and test with the multi-user dataset.
3.2.5. Inception-ResNet-v2
The Inception-ResNet net [66,67] combines both ResNet and Inception approaches to create a
model with the advantages provided by them. The structure is composed of several blocks
besides the parallel convolutional layers used to concatenate blocks. Inside the blocks there
are repeated modules of the Inception flavor. There are also connections from the beginning of
blocks to the end, similar to the ResNet ones.
3.2.6. DenseNet
DenseNet [68] follows a design idea similar to ResNet, although now the authors add the
residual of each block globally and not only at the end of each block. Thus, several
connections appear from the inputs of the convolutional layers in each block to the outputs of
other blocks. Thanks to this modification the net is more compact and requires fewer layers to
extract information from the image, because each layer can receive information from previous
layers.
Fig. 4. Categorical cross-entropy loss curves in training and test with the multi-user dataset.
Table 1
Nets accuracy and error with the multi-user dataset.
Net                   Best Accuracy   Lowest Error
VGG19                 89.01%          0.3452
Inception-v3          86.96%          0.5529
Xception              82.61%          0.6023
ResNet50              86.96%          0.6531
Inception-ResNet-v2   84.78%          0.6248
DenseNet              91.30%          0.4281
Table 2
Normalized confusion matrix from DenseNet and the multi-user dataset.
Actual   Predicted Target
Target   0      1      2      3      4      5      6      7      8      9
0        1      0      0      0      0      0      0      0      0      0
1        0      1      0      0      0      0      0      0      0      0
2        0      0      .833   0      0      0      0      0      0      .167
3        0      0      .143   .857   0      0      0      0      0      0
4        0      0      0      0      1      0      0      0      0      0
5        0      0      0      0      0      .667   0      0      0      .333
6        0      0      0      0      0      0      .800   .200   0      0
7        0      0      0      0      0      0      0      1      0      0
8        0      0      0      0      0      0      0      0      1      0
9        0      0      0      0      0      0      0      0      0      1
Fig. 5. Average accuracy curves in training and test over the single-user dataset.
4. Experimental results and discussion
Now we discuss the learning results and general performance obtained with the DLNs described
above. First, we show the multi-user dataset results. Then, we summarize and compare the
results obtained on the single-user dataset.
4.1. Multi-user dataset
Fig. 3 depicts the average accuracy curves obtained in training and test phases with the
multi-user dataset. All the nets achieve a high accuracy after a low number of epochs.
Moreover, the tendencies in both training and test are almost parallel in every case. The
features extracted by the pretrained hidden layers of the DLNs appear to provide a good
baseline for this problem and our dataset. With the exception of VGG19, the DLNs stall after
roughly 30 epochs. This usually comes from the fact that DLNs start to overfit to the training
data. Therefore, they would probably need more data to keep improving the training.
Fig. 6. Average categorical cross-entropy loss curves during training and test over the
single-user dataset.
Table 3
Best accuracy and error achieved by DLN models over the single-user dataset.
Net                   Best Accuracy   Lowest Error
VGG19                 94.62%          0.1788
Inception-v3          93.55%          0.3151
Xception              93.55%          0.2475
ResNet50              91.40%          0.2486
Inception-ResNet-v2   91.24%          0.2982
DenseNet              95.70%          0.2195
Training minimizes the categorical cross-entropy loss in order to compare the real
distribution with the predicted one. The lower the output of this function, the greater the
degree of similarity of both distributions, and the greater the expected accuracy of
classification. Fig. 4 shows the average evolution of this error measure in both training and
test datasets. These curves are highly negatively correlated to the ones in Fig. 3: the higher
the accuracy, the lower the error. Here, we also observe that the best training evolution
results are obtained with the VGG19 net. Other nets tend to overfit after the first epochs.
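As a small numerical illustration of this loss, the sketch below computes the softmax of a 10-value output and the categorical cross-entropy against a one-hot target, that is, the negative logarithm of the probability assigned to the true class. It is a plain-Python sketch with hypothetical function names and synthetic values, not the training code used in the experiments.

```python
import math


def softmax(logits):
    """Turn raw output activations into class probabilities."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(true_class, probabilities, eps=1e-12):
    """Loss for a one-hot target: -log of the probability of the true class."""
    return -math.log(max(probabilities[true_class], eps))


# Ten outputs: nine display targets plus the "eyes closed" class.
uninformed = softmax([0.0] * 10)
confident = softmax([0.0] * 3 + [5.0] + [0.0] * 6)
print(f"loss with uniform output:   {categorical_cross_entropy(3, uninformed):.4f}")
print(f"loss with confident output: {categorical_cross_entropy(3, confident):.4f}")
```

A uniform prediction over the 10 classes gives a loss of ln 10, while a prediction concentrated on the true class gives a loss close to zero, which is the inverse relation between loss and accuracy observed in the curves.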
Table 1 shows the highest accuracy and the lowest error achieved by every DLN architecture.
The results indicate that we have been able to get at least an accuracy of 80% with the new
data. DenseNet achieves the best result with more than 90%.
Table 2 shows a typical test confusion matrix obtained by the retrained DenseNet on the
multi-user dataset. Usually the DLNs tend to output erroneous targets when subjects look at
the target areas located at the bottom of the display. The reason is that the eyes are often
closed in those images so that it is hard to determine the right target under these
conditions.
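A normalized confusion matrix of this kind can be obtained by counting (actual, predicted) pairs and dividing each row by its total. The following sketch is illustrative only: the function name is hypothetical, and the six-image example is an assumed count chosen because it reproduces the .833/.167 split of one row, not the authors' actual test data.

```python
def normalized_confusion_matrix(actual, predicted, num_classes=10):
    """Row-normalized confusion matrix: entry [a][p] is the fraction of
    samples of actual class a that were predicted as class p."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Classes absent from the test split get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical test split: six images of target 2, one of them assigned to
# class 9 (eyes closed).
actual = [2] * 6
predicted = [2] * 5 + [9]
row = normalized_confusion_matrix(actual, predicted)[2]
print([round(value, 3) for value in row])
```

Because each row is normalized separately, the diagonal entries are per-class recall values, which is why confusions of the lower display targets with class 9 stand out in only a few rows.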
4.2. Single-user dataset
The proposed transfer learning architectures achieve better results when they are evaluated on
the second dataset, which is composed of images from a single user. The learning problem
appears easier than in the multi-user case, because we remove the data variability due to the
user. Moreover, the dataset is larger than in the multi-user case. We can observe in Fig. 5
that all DLN models achieve an accuracy greater than 80% in just a few epochs. However,
overfitting appears in the initial epochs so that further retraining of the DLNs does not
improve their performance anymore. The hypothesized reason is the high similarity between all
the images in the dataset. It might be partially solved by removing redundant data or by
applying additional regularization methods apart from the dropout layers deployed at design
time. Fig. 6 shows the evolution of the loss function on learning and test datasets. The error
rate falls rapidly but it remains stable after the first epochs.
Table 3 summarizes the highest accuracy and the lowest error achieved by every DLN
architecture after transfer learning. The results are better than those obtained with the
multi-user dataset. All the DLN architectures achieve accuracies over 90%. DenseNet provides
the best results. Note that here the nets are learning to classify the gaze corresponding to
just one subject. This gives an idea about how important it can be to tailor the classifiers
to a final user.
Table 4 shows a typical confusion matrix generated from DenseNet and the single-user dataset.
The confusion error in predicted targets follows a very similar pattern to the multi-user
case.
4.3. Discussion
The models achieve competitive results with both datasets. The test accuracy achieved over the
single-user dataset is greater but these results must be taken with care. The multi-user case
could offer the better solution for a global system or default mode, while the single-user
case has to be retrained for each particular user. We expect that the results with the
multi-user dataset would improve if more images from new users were added to the dataset,
because the current number of images is not particularly high and DLN methods usually require
large datasets for effective training.
Also, we have used the same structure and layers in the nets for both datasets. We could
probably modify some layers in order to manage the overfitting problems found with the
single-user dataset, as previously commented.
5. Conclusions and further work
We have presented a method for gaze fixation detection based on deep transfer learning in the
context of behavioral activity recognition systems. This is usually an important part of such
systems. In our case we must achieve the best performance of the gaze tracking system because
the goal of our system is to determine activities that a user carries out in front of a
computer and the inputs come from the camera on top of the screen.
In spite of the reduced datasets used in the experiments, the use of publicly available
pre-trained networks for domain transfer learning achieves good performance with affordable
computational cost. The best results according to the recognition accuracy have been reported
by the DenseNet model. Other models require lower training time or are easier to implement, so
recognition accuracy should be regarded as just one factor to consider.
Future work will test innovative recent DL architectures. Specifically, we will follow the
recommendation of the reviewers concerning the 3D-ResNet35 architecture [69,70], which
promises enhanced results due to its ability to process 3D data. Another alternative for
future work is to create a new architecture from scratch. We should extend our dataset for
this endeavor, because a basic requirement of DLN training is a large dataset.
CRediT authorship contribution statement
Javier de Lope: Conceptualization, Methodology, Software, Writing – review & editing. Manuel
Graña: Methodology, Writing – review & editing.
Table 4
Normalized confusion matrix from DenseNet over the single-user dataset.
Actual   Predicted Target
Target   0      1      2      3      4      5      6      7      8      9
0        1      0      0      0      0      0      0      0      0      0
1        0      1      0      0      0      0      0      0      0      0
2        0      0      1      0      0      0      0      0      0      0
3        0      0      0      1      0      0      0      0      0      0
4        0      0      0      0      .937   0      0      0      .063   0
5        0      .040   0      0      0      .800   .120   0      0      .040
6        0      0      0      0      0      0      1      0      0      0
7        .133   0      0      0      0      0      0      .867   0      0
8        0      0      0      0      0      0      0      0      1      0
9        0      0      0      0      0      0      0      0      0      1
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work has been supported by FEDER funds through MINECO project TIN2017-85827-P. This
project has received funding from the European Union's Horizon 2020 research and innovation
programme under the Marie Sklodowska-Curie grant agreement No. 777720. XinZhe Jin contributed
some early computational experiences.
References
[1] D.J. Anderson, P. Perona, Toward a science of Computational Ethology, Neuron
84 (2014) 18–31.
[2] M. Vrigkas, C. Nikou, I. Kakadiaris, A review of human activity recognition
methods, Front. Robot. Artif. Intell. 2 (2015) 11.
[3] S.-R. Ke, H. Le Uyen Thuc, Y.-J. Lee, J.-N. Hwang, J.-H. Yoo, K.-H. Choi, A review
on video-based human activity recognition, Computers 2(2) (2013) 88–131.
[4] J.Y. Yang, J.S. Wang, Y.P. Chen, Using acceleration measurements for activity
recognition: An effective learning algorithm for constructing neural classifiers,
Pattern Recogn. Lett. 29 (16) (2008) 2213–2220.
[5] M. Graña, M. Aguilar-Moreno, J. De Lope, I. Baglietto, X. Garmendia, Improved
activity recognition combining inertial motion sensors and
electroencephalogram signals, Int. J. Neural Syst. 30 (10) (2020) 2050053.
[6] A. Lentzas, D. Vrakas, Non-intrusive human activity recognition and abnormal
behavior detection on elderly people: A review, Artif. Intell. Rev. 53 (2020)
1975–2021.
[7] N. Tasnim, M. Islam, J.-H. Baek, Deep learning-based action recognition using
3D skeleton joints information, Inventions 5 (2020) 49.
[8] J.H.F. Abeelen, Mouse mutants studied by means of ethological methods,
Genetica 34 (1964) 79–94.
[9] J. De Lope, M. Graña, Behavioral activity recognition based on gaze ethograms,
Int. J. Neural Syst. 30 (7) (2020) 2050025.
[10] A. George, Image based eye gaze tracking and its applications. arXiv 2019,
1907.04325.
[11] R. Hof, How do you Google? New eye tracking study reveals huge changes,
Forbes Online, 2015.
[12] B. Cassin, S. Solomon, Dictionary of Eye Terminology, Triad Publishing Company,
Gainesville, Florida, 1990.
[13] J.D. Enderle, D.A. Sierra, A new linear muscle fiber model for neural control of
saccades, Int. J. Neural Syst. 73 (2013) 1350002.
[14] R.G. Hussain, M.A. Ghazanfar, M.A. Azam, U. Naeem, S.U. Rehman, A
performance comparison of machine learning classification approaches for
robust activity of daily living recognition, Artif. Intell. Rev. 52(1) (2019) 357–
379.
[15] G. Andrienko, N. Andrienko, G. Budziak, J. Dykes, G. Fuchs, T. von Landesberger,
H. Weber, Visual analysis of pressure in football, Data Min. Knowl. Disc. 31 (6)
(2017) 1793–1839.
[16] E.E. Stone, M. Skubic, Unobtrusive, continuous, in-home gait measurement
using the Microsoft Kinect, IEEE Trans. Biomed. Eng. 60 (10) (2013) 2925–
2932.
[17] A.T. Duchowski, Eye Tracking Methodology: Theory and Practice, Springer,
Cham, 2017.
[18] A.T. Duchowski, Gaze-based interaction: A 30 year retrospective, Comput.
Graph. 73 (2018) 59–69.
[19] L.R. Young, D. Sheena, Survey of eye movement recording methods, Behav. Res.
Methods Instrum. 7 (5) (1975) 397–439.
[20] B.W. Blakley, L. Chan, Methods considerations for nystagmography, J.
Otolaryngol. Head Neck Surg. 44 (2015) 25.
[21] L. Florea, C. Florea, C. Vertan, Recognition of the gaze direction: Anchoring with
eyebrows, J. Vis. Commun. Image Rep. 35 (2016) 67–77.
[22] K.R. Park, J.J. Lee, J. Kim, Gaze position detection by computing the three
dimensional facial positions and motions, Pattern Recogn. 35 (11) (2002)
2559–2569.
[23] Y.-H. Yiu, M. Aboulatta, T. Raiser, L. Ophey, V.L. Flanagin, P. Zu Eulenburg, S.-A.
Ahmadi, DeepVOG: Open-source pupil segmentation and gaze estimation in
neuroscience using deep learning, J. Neurosci. Methods 324 (2019) 108307.
[24] N. Barbara, T.A. Camilleri, K.P. Camilleri, EOG-based eye movement detection
and gaze estimation for an asynchronous virtual keyboard, Biomed. Signal
Process. Control 47 (2019) 159–167.
[25] P.M. Insch, G. Slessor, J. Warrington, L.H. Phillips, Gaze detection and gaze
cuing in Alzheimer's Disease, Brain Cogn. 116 (2017) 47–53.
[26] O. Grynszpan, J. Bouteiller, S. Grynszpan, F. Le Barillier, J.C. Martin, J. Nadel,
Altered sense of gaze leading in autism, Res. Autism Spectr. Disord. 67 (2019)
101441.
[27] J. Kim, J. Seo, T.H. Laine, Detecting boredom from eye gaze and EEG, Biomed.
Signal Process. Control 46 (2018) 302–313.
[28] M.-J. Tsai, H.-T. Hou, M.-L. Lai, W.-Y. Liu, F.-Y. Yang, Visual attention for solving
multiple-choice science problem: An eye-tracking analysis, Comput. Educ. 58
(2012) 375–385.
[29] A. Gutierrez-Garcia, A. Fernandez-Martin, M. Del Libano, M.G. Calvo, Selective
gaze direction and interpretation of facial expressions in social anxiety, Pers.
Individ. Differ. 147 (2019) 297–305.
[30] M. Talo, U.B. Baloglu, O. Yildirim, U.R. Acharya, Application of deep transfer
learning for automated brain abnormality classification using MR images,
Cogn. Syst. Res. 54 (2019) 176–188.
[31] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, U. Montreal, Greedy layer-
wise training of deep networks (2007) 19.
[32] G.E. Hinton, D. Osindero, Y-W. Teh, A fast learning algorithm for deep belief
nets, Neural Comput. 18(7) (2006) 1527–1554.
[33] M.A. Ranzato, Y.-L. Boureau, Y. LeCun, Sparse feature learning for deep belief
networks, Conf. Neural Inf. Proc. Syst. (2007) 1185–1192.
[34] D. Bacciu, P. Lisboa, J. Martin-Guerrero, R. Stoean, A. Vellido, Bioinformatics
and medicine in the era of deep learning, 2018, arXiv:1802.09791.
[35] Y. Lecun, P. Haffner, L. Bottou, Y. Bengio, Object recognition with gradient-
based learning, in: Shape, Contour and Grouping in Computer Vision. Lecture
Notes in Computer Science, vol 1681. Springer, Berlin, Heidelberg. doi:
10.1007/3-540-46805-6_19.
[36] L.A. Gatys, A.S. Ecker, M. Bethge, Image style transfer using convolutional
neural networks, IEEE Conf. on Computer Vision and Pattern Recognition
(2016) 2414–2423.
[37] G. Antipov, M. Baccouche, J. Dugelay, Face aging with conditional generative
adversarial networks, IEEE Int. Conf. in Image Processing (2017) 2089–2093.
[38] A. Ucar, Y. Demir, C. Guzelis, Object recognition and detection with deep
learning for autonomous driving applications, Simulation 93 (2017).
[39] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-
time object detection, in: IEEE Int. Conf. in Image Processing, 2016.
[40] K. Potdar, C. Pai, S. Akolkar, A convolutional neural network based live object
recognition system as blind aid, 2018.
[41] B. Ma, X. Li, Y. Xia, Y. Zhang, Autonomous deep learning: A genetic DCNN
designer for image classification, Neurocomputing 379 (2020) 152–161.
[42] Y. Zhang, Y. Wang, X.-Y. Liu, S. Mi, M.-L. Zhang, Large-scale multi-label
classification using unknown streaming images, Pattern Recogn. 99 (2020)
107100.
[43] M. Talo, U.B. Baloglu, O. Yildirim, U.R. Acharya, Application of deep transfer
learning for automated brain abnormality classification using MR images,
Cogn. Syst. Res. 54 (2019) 176–188.
[44] U. Raghavendra, H. Fujita, S.V. Bhandary, A. Gudigar, J.H. Tan, U.R. Acharya,
Deep convolutional neural network for accurate diagnosis of glaucoma using
digital fundus images, Inf. Sci. 441 (2018) 41–49.
[45] O. Yildirim, M. Talo, B. Ay, U.B. Baloglu, G. Aydin, U.R. Acharya, Automated
detection of diabetic subject using pre-trained 2D-CNN models with frequency
spectrum images extracted from heart rate signals, Comput. Biol. Med. 113
(2019) 103387.
[46] L. Wang, A. Wong, COVID-NET: A tailored deep convolutional neural network
design for detection of COVID-19 cases from chest X-ray images, 2020.
[47] F. Shan, Y. Gao, J. Wang, W. Shi, N. Shi, M. Han, Z. Xue, D. Shen, Y. Shi, Lung
infection quantification of COVID-19 in CT images with deep learning, 2020.
[48] W. Ouyang, X. Wang, Joint deep learning for pedestrian detection, IEEE Int.
Conf. in Computer Vision, 2013.
[49] R. Grzeszick, J.M. Lenk, F.M. Rueda, G.A. Fink, S. Feldhorst, M. ten Hompel, Deep
neural network based human activity recognition for the order picking
process, iWOAR 2017.
[50] H. Jiang, E. Learned-Miller, Face detection with Faster RCNN, IEEE Int. Conf.
Automatic Face Gesture Recognition (2017) 650–657.
[51] X. Sun, P. Wu, S.C. Hoi, Face detection using deep learning: An improved Faster
RCNN approach, Neurocomputing 299 (2018) 42–50.
[52] R. Ranjan, V.M. Patel, R. Chellappa, A deep pyramid deformable part model for
face detection. CoRR 2015, abs/1508.04389.
[53] S. Yang, P. Luo, C.C. Loy, X. Tang, Faceness-net: Face detection through deep
facial part responses, IEEE Trans. Pattern Anal. Mach. Intell. (2017).
[54] W. Wang, J. Yang, J. Xiao, S. Li, D. Zhou, Face recognition based on deep
learning, in: Q. Zu, B. Hu, N. Gu, S. Seng (Eds.), Human Centered Computing,
Springer, Cham, 2015, pp. 812–820.
[55] M. Islam, N. Tasnim, J.-H. Baek, Human gender classification using transfer
learning via Pareto frontier CNN networks, Inventions 5 (2020) 16.
[56] N. Dalal, B. Triggs, Histogram of oriented gradients for human detection, IEEE
Conf. Comp. Vision and Pattern Recognition (2005) 886–893.
[57] C. Cortes, V.N. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995)
273–297.
[60] V. Kazemi, J. Sullivan, One millisecond face alignment with an ensemble of
regression trees, IEEE Conf. Computer Vision and Pattern Recognition (2014)
1867–1874.
[61] J. De Lope, M. Graña, Comparison of labeling methods for behavioral activity
classification based on gaze ethograms, in: E.A. De la Cal, J.R. Villar Flecha, H.
Quintian, E. Corchado (Eds.), Hybrid Artificial Intelligent Systems, Springer,
Cham, 2020, pp. 132–144.
[62] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale
image recognition. arXiv 2014, 1409.1556.
[63] S. Giri, B. Joshi, Transfer learning based image visualization using CNN, Int. J.
Artif. Intell. Appl. 10 (4) (2019) 47–55.