Academic Editor: Susana Martinez-Conde
Received: 14 February 2025
Revised: 12 June 2025
Accepted: 4 July 2025
Published: 7 July 2025
Citation: Barz, M.; Bhatti, O.S.; Alam, H.M.T.; Nguyen, D.M.H.; Altmeyer, K.; Malone, S.; Sonntag, D. eyeNotate: Interactive Annotation of Mobile Eye Tracking Data Based on Few-Shot Image Classification. J. Eye Mov. Res. 2025, 18, 27. https://doi.org/10.3390/jemr18040027
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Journal of Eye Movement Research
Article
eyeNotate: Interactive Annotation of Mobile Eye Tracking Data Based on Few-Shot Image Classification
Michael Barz 1,2,*, Omair Shahzad Bhatti 1, Hasan Md Tusfiqur Alam 1, Duy Minh Ho Nguyen 1,3,4, Kristin Altmeyer 5, Sarah Malone 5 and Daniel Sonntag 1,2
1 Interactive Machine Learning, German Research Center for Artificial Intelligence (DFKI), 66123 Saarbrücken, Germany; [email protected] (O.S.B.); hasan_md_tusfiqur[email protected] (H.M.T.A.); ho_minh_duy[email protected] (D.M.H.N.); [email protected] (D.S.)
2 Applied Artificial Intelligence, University of Oldenburg, 26129 Oldenburg, Germany
3 Machine Learning and Simulation Science Department, University of Stuttgart, 70569 Stuttgart, Germany
4 Max Planck Research School for Intelligent Systems (IMPRS-IS), 70569 Stuttgart, Germany
5 Department of Education, Saarland University, 66123 Saarbrücken, Germany; [email protected] (K.A.); [email protected] (S.M.)
* Correspondence: [email protected]
Abstract
Mobile eye tracking is an important tool in psychology and human-centered interaction design for understanding how people process visual scenes and user interfaces. However, analyzing recordings from head-mounted eye trackers, which typically include an egocentric video of the scene and a gaze signal, is a time-consuming and largely manual process. To address this challenge, we develop eyeNotate, a web-based annotation tool that enables semi-automatic data annotation and learns to improve from corrective user feedback. Users can manually map fixation events to areas of interest (AOIs) in a video-editing-style interface (baseline version). Further, our tool can generate fixation-to-AOI mapping suggestions based on a few-shot image classification model (IML-support version). We conduct an expert study with trained annotators (n = 3) to compare the baseline and IML-support versions. We measure the perceived usability, annotations' validity and reliability, and efficiency during a data annotation task. We asked our participants to re-annotate data from a single individual using an existing dataset (n = 48). Further, we conducted a semi-structured interview to understand how participants used the provided IML features and assessed our design decisions. In a post hoc experiment, we investigate the performance of three image classification models in annotating data of the remaining 47 individuals.
Keywords: eye tracking; interactive machine learning; area of interest (AOI); mobile eye tracking; visual attention; eye tracking data analysis; fixation-to-AOI mapping
1. Introduction
Eye tracking studies often consider visual attention to specific areas of interest (AOIs) to analyze and understand how people process visual information. AOIs are specific regions in a scene or interface that are defined by researchers [1]. Visual attention here refers to the time a person spends attending to these regions. By measuring visual attention to and transitions between AOIs during a study, researchers can gain insights into which elements of a scene are relevant to an activity and how interventions of an experiment influence the participant's eye movement behavior. This is usually performed based on fixation events, as they are assumed to approximate a person's allocation of cognitive resources through
the time they spend processing a visual scene [2]. Further, advances in modern head-worn eye tracking technology [3] can enable attention-aware mobile human–computer interfaces.
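Once each fixation carries an AOI label, the attention and transition measures mentioned above reduce to simple aggregations. The following is a minimal sketch, assuming a hypothetical `Fixation` record with a start time, a duration in milliseconds, and an assigned AOI label; it illustrates the measures and is not part of eyeNotate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fixation:
    """Hypothetical record for one fixation after fixation-to-AOI mapping."""
    start_ms: float
    duration_ms: float
    aoi: str  # AOI label assigned by an annotator or a model


def dwell_times(fixations):
    """Sum fixation durations per AOI (in milliseconds)."""
    totals = Counter()
    for fix in fixations:
        totals[fix.aoi] += fix.duration_ms
    return dict(totals)


def transition_counts(fixations):
    """Count gaze transitions between consecutive fixations on different AOIs."""
    counts = Counter()
    for prev, curr in zip(fixations, fixations[1:]):
        if prev.aoi != curr.aoi:
            counts[(prev.aoi, curr.aoi)] += 1
    return dict(counts)


# Usage example with made-up fixations on two hypothetical AOIs.
fixations = [
    Fixation(0, 250, "text"),
    Fixation(260, 300, "text"),
    Fixation(580, 200, "figure"),
    Fixation(800, 400, "text"),
]
print(dwell_times(fixations))        # {'text': 950, 'figure': 200}
print(transition_counts(fixations))  # {('text', 'figure'): 1, ('figure', 'text'): 1}
```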
In remote eye tracking with static stimuli such as images, an AOI can be defined once and reused for every participant. Dynamic AOIs in video-based stimuli can be annotated using key frame-based annotation techniques; i.e., AOIs are marked via bounding boxes for key frames, and interpolation is used to annotate intermediate frames [4]. However, these efficient fixation-to-AOI mapping techniques from remote eye tracking do not scale to mobile eye tracking applications. Accurately annotating mobile eye tracking data remains a challenging and time-consuming task because scene videos taken with a head-mounted eye tracking device are unique to every participant. In mobile eye tracking practice, one or more annotators decide per fixation whether an AOI was hit or not [5,6]. This fixation-wise annotation approach reduces the annotation effort compared to a video frame-based annotation because fixations last around 200–400 ms [1], which corresponds to 2.5–5 events per second, whereas videos are typically recorded with a sampling rate of at least 30 Hz. Still, it does not remedy the need to annotate AOIs in every single recording and hinders the development of attention-aware mobile interfaces.
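The efficiency argument can be checked with simple arithmetic. The sketch below compares the number of items to annotate per recording under frame-wise and fixation-wise annotation, assuming back-to-back fixations of constant duration and a 30 Hz scene video; the 60 s recording length is an arbitrary example value.

```python
def fixations_per_second(fixation_ms: float) -> float:
    """Number of fixation events per second for a given mean fixation duration."""
    return 1000.0 / fixation_ms


def annotation_items(recording_s: float, fixation_ms: float, fps: float = 30.0):
    """Return (#video frames, #fixations) to annotate for one recording.

    Assumes back-to-back fixations of constant duration (saccade time ignored),
    so the fixation count is an upper-bound estimate.
    """
    frames = recording_s * fps
    fixations = recording_s * fixations_per_second(fixation_ms)
    return frames, fixations


for duration in (200, 400):
    frames, fixations = annotation_items(60, duration)
    print(f"{duration} ms: {frames:.0f} frames vs {fixations:.0f} fixations "
          f"({frames / fixations:.0f}x fewer items)")
# 200 ms: 1800 frames vs 300 fixations (6x fewer items)
# 400 ms: 1800 frames vs 150 fixations (12x fewer items)
```

Even this upper-bound estimate leaves 6 to 12 times fewer items to label than frame-wise annotation.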
Attaching fiducial markers to target stimuli was proposed as a solution in research [7–9] and was adopted in modern commercial software solutions like Pupil Cloud (https://pupil-labs.com/blog/pupil-cloud-projects-enrichments/; accessed on 2 February 2024). However, markers are obtrusive and may impact visual scanning behavior. Therefore, the present research aims at a solution for non-instrumented environments. Existing approaches for automatic or semi-automatic analysis of head-mounted eye tracking data use computer vision models to map fixations to AOIs. Most of these approaches rely on pre-trained computer vision models that do not allow for adapting the underlying model to a certain target domain [5,10–13]. These can be applied in very constrained settings only, i.e., if the dataset used for training the machine learning model matches the target domain. Some approaches support a single, a priori model training or fine-tuning step for adaptation to a target domain [14–16]. These approaches offer no possibility of adapting the model during the annotation process and, hence, suffer from a lack of flexibility. Further, not all methods are evaluated quantitatively [17–19], or evaluation metrics are not properly described [11,20] or inadequate, e.g., ignoring temporal aspects [16]. Some commercial tools offer automatic mapping of the gaze signal in world video coordinates to a reference frame that defines AOIs, such as the assisted mapping function of Tobii Pro (https://connect.tobii.com/s/article/how-to-perform-manual-and-assisted-mapping; accessed on 12 December 2024). However, this is only possible for a limited number of reference frames.
We aim to develop a method for semi-automatic mapping of fixations to AOIs, which enables efficient analysis and interpretation of humans' complex interaction behavior. This bears the potential to boost the efficiency of research based on eye tracking by automating the time-consuming and expensive data annotation process [16] and to facilitate novel real-time adaptive human–computer interaction [21,22]. Further, we aim to overcome the limitations of using pre-trained models, i.e., their lack of flexibility and quality assurance, by keeping humans in the loop. In this work, we implement and evaluate eyeNotate, a user interface that enables semi-automatic annotation of mobile eye tracking data. Our tool allows mobile eye tracking practitioners to manually annotate their recordings fixation-wise, reflecting the current state of the art and representing our baseline approach. Further, we implement an extension offering fixation-to-AOI mapping suggestions using a few-shot image classification model, which was shown to be successful in another use case [23]. This model can learn from user feedback, i.e., when users accept or reject/correct suggestions, following the interactive machine learning (IML) paradigm. IML combines
frequent human input and feedback with machine learning technologies without requiring background knowledge in machine learning [24,25]. Domain knowledge from end-users, like eye tracking practitioners, can be integrated more effectively into complex applications. However, it is important to thoroughly design such systems to achieve better user experiences and more effective learning systems [26]. We conduct a case study with n = 3 trained annotators to compare the baseline version and the IML-supported approach. We measure the perceived usability, annotation validity and reliability, and efficiency during a data annotation task using an existing mobile eye tracking dataset with ground-truth annotations (n = 48). We ask participants to re-annotate data of one individual in this dataset. After task completion, we conducted a semi-structured interview (SSI) to understand how participants used the provided IML features. In addition, we investigate the performance in automatically annotating the remainder of the dataset using our resulting machine learning models.
To address the challenges in annotating data from head-mounted eye trackers, we implement eyeNotate, a user interface that enables semi-automatic annotation. Our tool allows mobile eye tracking practitioners to manually annotate their recordings fixation-wise (baseline) and semi-automatically using fixation-to-AOI mapping suggestions based on a few-shot image classification model (IML-support). We contribute by (i) implementing the eyeNotate tool for semi-automatic annotation of head-mounted eye tracking data based on few-shot image classification, (ii) evaluating eyeNotate in a case study with n = 3 trained annotators to compare the baseline version and the IML-supported approach, measuring the perceived usability, annotation validity and reliability, and efficiency during a data annotation task, and (iii) conducting a post hoc machine learning experiment to assess the performance of the considered models in automatically annotating data from head-mounted eye trackers.
2. Related Work
We aim to improve the annotation process of mobile eye tracking data from diagnostic user studies, i.e., assigning each fixation in a set of recordings to an AOI based on the corresponding video frame from the front-facing scene camera and the fixation position. Here, we provide an overview of existing approaches for the annotation of mobile eye tracking data and video annotation in general. Further, we provide a brief overview of methods for real-time interpretation of eye tracking data that can be used to develop wearable attention-aware user interfaces [27]. Combined with unobtrusive modern eye tracking headgear (see, e.g., Tonsen et al. [3], Lander et al. [28]) or augmented reality headsets like Microsoft's HoloLens 2 that come with integrated eye tracking sensors, our system for interactive annotation and model training can enable developers to easily create custom computer vision models for attention-aware mobile interaction.
2.1. Annotation of Data from Mobile Eye Trackers
Head-mounted eye trackers allow researchers to investigate human behavior in mobile settings. However, efficient methods for mapping fixations to AOIs from remote eye tracking cannot be used because the video of the front-facing scene camera differs for each participant. Instrumenting the experiment scene with fiducial markers is an option to cope with this issue [7,8]. Software that accompanies modern head-mounted eye trackers typically integrates marker tracking, like the marker-based surface tracking in Pupil Capture [29]. However, the instrumentation of the experiment area comes with certain limitations. Marker tracking might be lost due to low camera quality or due to occlusion through other objects in the scene. In augmented reality (AR) settings, which allow learners to see digital objects embedded in reality by looking through the camera of smartphones or
tablets, supposedly unique markers might appear twice, causing ambiguity. Consequently, objects can no longer be distinctly identified by markers. Another disadvantage of marker-based surface tracking is that the numerous markers needed to reliably recognize objects in information-rich learning environments might impair the instructional design by claiming cognitive resources for the marker processing and distracting from learning-relevant visual stimuli. Therefore, this work focuses on an approach to facilitate and support the time-consuming and challenging procedure of mapping human gaze or fixations to objects or AOIs in non-instrumented environments. Commercial tools like Tobii Pro Lab (https://connect.tobii.com/s/article/how-to-perform-manual-and-assisted-mapping (accessed on 12 December 2024)) exist that offer automatic mapping of the gaze signal to AOIs defined in a reference image. However, the assisted mapping function works for static scenes only, is error-prone in cases of fast head movements and distorted image frames, and, hence, requires additional manual effort for correcting wrong assignments or annotating missing samples [15]. Further, the software is very expensive and does not support the annotation of eye tracking data from other devices like the Pupil Core head-worn device that we used. Previous research also addressed this problem in the context of data analysis of diagnostic eye tracking studies. However, these approaches come with certain limitations.
Most approaches rely on pre-trained computer vision models that do not support an adaptation of the underlying models to the target domain. Sümer et al. [10] investigated the problem of automatic attention detection in a teaching scenario. They extract image patches of all student faces in the egocentric video feed and cluster them using a ResNet-50 model [30] trained on VGGFace2 data [31]. They assign student IDs to each cluster, allowing them to map the teacher's gaze to individual students. Chong et al. [32] developed a system for measuring eye contact in adult–child social interactions using mobile eye trackers. Callemein et al. [33] presented a system for detecting when the participant's gaze focuses on the head or hands of another person without the possibility of differentiating between interlocutors. Machado et al. [11] matched fixations with bounding boxes from an object detection algorithm. They used a sliding-window approach with a MobileNet model [34], pre-trained on ImageNet data [35]. Venuprasad et al. [13] used unsupervised clustering with gaze and object locations to detect visual attention to an object or a face. They used a Faster-RCNN model [36], pre-trained using the MS COCO dataset [37]. Barz and Sonntag [38] compared two approaches for automatic fixation-to-AOI mapping using pre-trained deep learning models: two ResNet models pre-trained with ImageNet data and a Mask R-CNN model pre-trained using MS COCO data. In an evaluation based on the VISUS dataset [6], they found that pre-trained models have severe drawbacks in realistic scenarios like AOIs not being represented by the training data. Deane et al. [12] also presented an annotation system based on a pre-trained Mask R-CNN model [39]. They found high agreements between manual and automatic annotations for AOIs that match the MS COCO classes. These can be applied in very constrained settings only, i.e., if the dataset used for training the machine learning model matches the target domain.
Other approaches suffer from a lack of flexibility. Wolf et al. [14] developed an algorithm that maps fixations to object-based AOIs using the Mask R-CNN object detection model [39]. They conducted a controlled lab study to record data in a healthcare setting with two AOIs: a bottle and five syringes. An evaluation has shown that using 72 training images with 264 annotated object masks, their system can closely approximate the AOI-based metrics compared to manual fixation-wise annotations as a baseline. Batliner et al. [40] presented a similar system for simplifying usability research with mobile eye trackers for medical screen-based devices. Kumari et al. [15] investigate the effectiveness and efficiency of three object detection models for annotating mobile eye tracking data from students participating in STEM lab courses. These methods are based on a single, a priori
model training or fine-tuning step with no possibility of adapting the model during the annotation process.
Some approaches include promising interaction concepts but use outdated computer vision methods. Pontillo et al. [20] presented SemantiCode, an interactive tool for post hoc fixation-based annotation of egocentric eye tracking videos. It supports semi-automatic labeling using a distance function for color histograms of manually annotated fixations. Brône et al. [19] proposed to use object recognition with mobile eye tracking to enhance the analysis of customer journeys. In follow-up work, they compared different feature extraction methods [41] and evaluated their approach in a museum setting [42]. Evans et al. [43] reviewed methods for mobile eye tracking in outdoor scenes ranging from pupil detection and calibration to data analysis. They presented an early overview of methods for automating the process of analyzing mobile eye tracking data. Fong et al. [44] presented a semi-automatic data annotation approach. An annotator assigns video frames with a gaze overlay to AOIs, and as the annotation process advances, the system learns to classify AOIs via instance-based learning. Kurzhals et al. [18] used bag-of-SIFT features and color histograms with unsupervised clustering to sort fixation-based image patches by their appearance. They offer an interactive visualization for manual corrections.
Panetta et al. [16] presented an annotation method based on bag-of-visual-words as features and a support vector classification model (SVC) that is trained a priori. In follow-up work, they present a system that automatically segments objects of interest using two state-of-the-art neural segmentation models [45]. They used pre-trained models to showcase and evaluate new data visualization methods, but they did not assess the performance of their automatic annotation approach.
Recently, Kurzhals et al. [46] described an interactive approach for annotating and interpreting egocentric eye tracking data for activity and behavior analysis. They implement an iterative time sequence search based on eye movements and visual features. They aim to annotate high-level activity events instead of AOI-hit events like we do. In follow-up work, Kurzhals [47] presented an approach for annotating the objects viewed by study participants wearing mobile eye trackers. They propose to crop image patches around each point of gaze, segment the resulting image patches similar to the fixation detection method by Steil et al. [48], and present representative gaze thumbnails to annotators as image clusters in 2D. Annotators interact with this cluster representation to annotate and analyze the mobile eye tracking data. In contrast, our method is based on interactive few-shot image classification. Our system learns to recognize the type of fixated objects or regions based on human feedback during the interaction.
This work aims to accelerate and objectify research on visual attention with mobile eye tracking using technologies from the field of computer vision and interactive machine learning.
2.2. Video Annotation in General
The annotation of mobile eye tracking data requires the interpretation of the video feed from the front-facing scene camera. Hence, systems and methods for video annotation are closely related to our approach. An important difference is that general tools for video annotation do not take the gaze signal or fixation events into account. In fact, video annotation based on the definition of bounding boxes around relevant objects, a respective interpolation for intermediate frames, and a mapping of gaze or fixation points to these areas is the state of the art for annotating video stimuli used with remote eye tracking devices [4]. Even though these methods do not scale when it comes to the annotation of mobile eye tracking with individual video feeds for each participant, we briefly review recent approaches and tools for video annotation, as they can provide guidance for the design of similar systems. With LabelMovie, Palotai et al. [49] presented a tool for collaborative video
annotation. They proposed machine learning-based quality assurance and automation of the annotation process. In more recent work, the research group presented a method for the semi-automatic annotation of videos for analyzing the behavior of laboratory animals [50]. The Multimodal Multisensor Activity Annotation Tool (MMAAT) offers similar functionalities for multichannel data streams from multiple sensors, like depth channels from 3D cameras and accelerometers from wrist-worn devices [51]. The VGG Image Annotator (VIA) (https://www.robots.ox.ac.uk/~vgg/software/via/ (accessed on 12 December 2024)) is a stand-alone tool that enables manual annotation of images, audio, and video data in a web browser [52]. The Computer Vision Annotation Tool (CVAT) is an open-source system for interactive image and video annotation (https://github.com/opencv/cvat (accessed on 12 December 2024)). It integrates functionalities for scaling video annotation, like automatic pre-annotation based on computer vision models and key frame-based interpolation of manual annotations, in an easily deployable online platform for large-scale projects. A general overview of interaction methods for video content was presented by Schoeffmann et al. [53].
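For remote eye tracking stimuli, the key frame-based workflow described in this section (bounding boxes on key frames, interpolation for intermediate frames, and mapping gaze points to the resulting areas) can be sketched as follows. Linear interpolation and the `Box` type are illustrative assumptions; dedicated annotation tools may use different interpolation schemes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned AOI bounding box in pixel coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        """True if the gaze or fixation point (x, y) lies inside the box."""
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def interpolate_box(keyframes: Dict[int, Box], frame: int) -> Optional[Box]:
    """Linearly interpolate an AOI box between the two nearest key frames.

    Returns None if the frame lies outside the annotated key-frame range.
    """
    if frame in keyframes:
        return keyframes[frame]
    before = [k for k in keyframes if k < frame]
    after = [k for k in keyframes if k > frame]
    if not before or not after:
        return None
    a, b = max(before), min(after)
    t = (frame - a) / (b - a)  # relative position between the two key frames
    ka, kb = keyframes[a], keyframes[b]

    def lerp(u: float, v: float) -> float:
        return u + t * (v - u)

    return Box(lerp(ka.x0, kb.x0), lerp(ka.y0, kb.y0),
               lerp(ka.x1, kb.x1), lerp(ka.y1, kb.y1))


# Usage: an AOI moving 10 px to the right between frames 0 and 10.
keyframes = {0: Box(0, 0, 10, 10), 10: Box(10, 0, 20, 10)}
box = interpolate_box(keyframes, 5)
print(box)                  # Box(x0=5.0, y0=0.0, x1=15.0, y1=10.0)
print(box.contains(12, 5))  # True: a fixation at (12, 5) hits the AOI
```

Because every participant in mobile eye tracking has a different scene video, these key frames would have to be re-drawn for each recording, which is why this scheme does not scale to head-mounted data.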
2.3. Methods for Attention-Aware Interfaces
Human gaze can be considered a proxy for human visual attention and thus can enhance gaze-based multimodal interaction [54]. We provide a brief overview of such real-time interactive systems because they can benefit from our presented approach for interactive annotation of mobile eye tracking data. Related work includes approaches for building user interfaces that are aware of the current context or situation [55], including conversational interfaces [56]. For instance, Bulling et al. [57] presented an approach for inferring high-level contextual cues from eye movements to facilitate behavioral monitoring and life-logging. Similarly, Steil and Bulling [58] used topic modeling to detect everyday activities from eye movements in an unsupervised fashion. In a later work, the authors presented an approach for visual attention forecasting in mobile interaction settings, which takes the visual scene and device usage data as additional inputs [59].
Toyama et al. [60] implemented a Museum Guide that uses SIFT (scale-invariant feature transform) features [61] with the nearest neighbor algorithm and a threshold-based event detection to recognize user attention to one of 12 exhibits. They extended their approach for detecting read texts and fixated faces with the goal of building artificial episodic memories to support dementia patients [62]. Other approaches combine visual features of a scene with gaze information to detect actions recently performed by a user [63–66]. Prasov and Chai [67] developed a system that combines speech and passive gaze input to enhance reference resolution in conversational interfaces. Baur et al. [68] implemented NovA, a system for analyzing and interpreting social signals in multimodal interactions with a conversational agent, which integrates eye tracking technology. Thomason et al. [69] developed a gaze-based dialog system that enables the grounding of word meanings in multimodal robot perception. Uppal et al. [5] presented a method for segmenting the fixated object using an end-to-end computer vision model. Chang et al. [70] developed the MemX system that detects human visual attention based on mobile eye tracking and automatically extracts important video sequences that can be used for, e.g., lifelogging. Meyer et al. [71] proposed to use head and eye movement in combination with other sensor data to recognize human activities for building context-aware smart glasses.
3. Materials and Methods
We implement the eyeNotate system, a web-based tool for fixation-to-AOI mapping, and evaluate its usability, effectiveness, and efficiency in a small expert case study (n = 3). Further, we conduct a post hoc experiment to assess the performance of the underlying
machine learning models in automatically annotating long recordings from head-mounted eye trackers. In the following, we present the details about the implementation of eyeNotate and the methodology used for evaluating it.
3.1. The eyeNotate Annotation Tool
We implement eyeNotate, a web-based tool for fixation-to-AOI mapping, an essential data processing step in research based on mobile eye trackers. Our tool allows practitioners to annotate recordings manually fixation-wise, reflecting the current state of the art (baseline). We designed the user interface to enable efficient navigation through videos based on fixation events, aligned to common video-editing interfaces. Further, we integrate an IML component that can provide AOI label suggestions for fixations and learn from user feedback, i.e., when they accept or reject/correct suggestions, based on a few-shot image classification model (IML-support). User annotations and model-based suggestions are stored in a database. Figure 1 shows the basic user interface and an overview of the IML-support features.
Figure 1. (a) Screenshot of the user interface of our baseline annotation tool and (b–e) an overview of the IML-support features. It extends the baseline by (b) a status bar indicating the number of AOI suggestions grouped by model certainty and a trust-level slider for adjusting certainty intervals, (c) indicators for AOI suggestions in the fixation list, (d) adjusted fixation overlays for the video, and (e) an option to confirm AOI suggestions.
3.1.1. Baseline Annotation Tool
The baseline tool offers a video-editing-like interface for fixation-wise data annotation (see Figure 1a). It includes three main elements: a top bar displays information on the selected recording and the annotation progress, a list on the left shows all fixations and their annotation state, and a video view on the right displays the recording with a fixation overlay and buttons for manual annotation. Selecting a fixation from the list causes the video view to show the respective image frame with a circular overlay at the fixation position, indicating the currently assigned AOI. An AOI can be assigned to the fixation by clicking one of the AOI buttons or pressing the corresponding shortcut on the keyboard. This is visually confirmed by a green badge that appears next to the fixation's list entry, and the overlay in the video view turns green and shows the newly assigned AOI label. Navigation through fixations is possible via the arrow keys and on-screen video controls. When consecutive fixations hit the same AOI, they can be annotated simultaneously by selecting multiple fixations from the list using the Shift and arrow keys in combination. This is consistent with multi-item selection features in common list views.
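The Shift+arrow multi-selection can be modeled as an anchor (the fixation where the selection started) and a cursor (the fixation reached with the arrow keys). The sketch below is a hypothetical data model, not eyeNotate's actual implementation; `FixationItem` and `annotate_selection` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FixationItem:
    """One entry in the fixation list (hypothetical data model)."""
    index: int
    label: Optional[str] = None  # None means "not yet annotated"


def annotate_selection(items: List[FixationItem], anchor: int, cursor: int,
                       label: str) -> int:
    """Assign `label` to all fixations between anchor and cursor (inclusive).

    The selection may extend upwards or downwards, as with Shift+arrow keys.
    Returns the number of fixations that were annotated.
    """
    lo, hi = sorted((anchor, cursor))
    for item in items[lo:hi + 1]:
        item.label = label
    return hi - lo + 1


items = [FixationItem(i) for i in range(5)]
n = annotate_selection(items, anchor=3, cursor=1, label="AOI_1")
print(n, [item.label for item in items])
# 3 [None, 'AOI_1', 'AOI_1', 'AOI_1', None]
```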
3.1.2. Interactive Machine Learning Support
The IML-support version of our tool integrates an IML component based on a few-shot image classification model, which is initialized with a small set of images per AOI. This model generates AOI label suggestions for each fixation by cropping an image patch from the corresponding video frame around the fixation point. Manual annotations and confirmatory or corrective feedback are used to re-train the image classification model, aiming to improve its performance over time. The model training and inference run in parallel to enable flexible and quick adaptations of the model to the target domain. Figure 2 shows a high-level overview of the components of our system and how they interrelate.
[Figure 2 diagram. Components: Annotator, Frontend, Backend, Few-Shot Image Classification. Flows: annotate, confirm/correct, AOI suggestions, model training, model inference.]
Figure 2. Overview of the architecture of our interactive annotation system, including a web-based user interface (frontend), a backend for managing data storage, and an IML service that enables label suggestions and model retraining for the IML-support version of our tool.
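The patch extraction step can be illustrated with NumPy. This is a minimal sketch, assuming frames are arrays of shape (height, width, channels) and a square patch centered on the fixation; the default patch size of 224 pixels is a hypothetical value (a common CNN input size), not one reported for eyeNotate. Zero padding keeps the patch size constant when a fixation lies near the frame border.

```python
import numpy as np


def crop_fixation_patch(frame: np.ndarray, x: float, y: float,
                        size: int = 224) -> np.ndarray:
    """Crop a size x size patch centered on the fixation point (x, y).

    Fixations outside the frame are clamped to the nearest border pixel,
    and regions beyond the border are zero-padded.
    """
    h, w = frame.shape[:2]
    half = size // 2
    # Clamp the fixation to valid pixel coordinates.
    cx = int(np.clip(round(x), 0, w - 1))
    cy = int(np.clip(round(y), 0, h - 1))
    # Pad spatial dimensions only; a channel dimension (if any) stays unpadded.
    pad = ((half, half), (half, half)) + ((0, 0),) * (frame.ndim - 2)
    padded = np.pad(frame, pad)  # constant zero padding by default
    # Pixel (cy, cx) sits at (cy + half, cx + half) in the padded frame,
    # so the patch centered there starts at (cy, cx).
    return padded[cy:cy + size, cx:cx + size]


frame = np.ones((480, 640, 3), dtype=np.uint8)  # dummy scene-camera frame
patch = crop_fixation_patch(frame, x=5.0, y=470.0, size=64)
print(patch.shape)  # (64, 64, 3)
```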
User Interface
The user interface of the IML-support version is extended to display and interact with model-based label suggestions (see Figure 1b–e). A non-filled badge at a fixation's list item indicates that a suggestion is available (see Figure 1c). The outline color of the badge encodes the model's confidence, which is either high (green), medium (yellow), or low (red). The color is also reflected in the fixation overlay in the video view (Figure 1d). Users can set their perceived trust in the model using a slider in the top bar (Figure 1b). Moving the slider towards high trust decreases the confidence thresholds: more suggestions appear in green. Next to the slider, an overview displays the distribution of suggestions across confidence levels. A suggestion can be confirmed or corrected by users. They press the space key to confirm a suggestion for one or multiple selected fixations (Figure 1e). To correct it, they assign another class.
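The mapping from model confidence to the three badge colors, and the effect of the trust slider, can be sketched as a threshold function. The base thresholds (0.8 and 0.6), the maximum shift (0.2), and the linear trust scaling are hypothetical values chosen for illustration; the text above only states that higher trust lowers the thresholds.

```python
def confidence_level(confidence: float, trust: float = 0.0,
                     base_high: float = 0.8, base_medium: float = 0.6,
                     max_shift: float = 0.2) -> str:
    """Map a suggestion's confidence to 'high' (green), 'medium' (yellow), or 'low' (red).

    `trust` in [0, 1] models the slider position: higher trust lowers both
    thresholds by up to `max_shift`, so more suggestions appear in green.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be in [0, 1]")
    shift = trust * max_shift
    if confidence >= base_high - shift:
        return "high"
    if confidence >= base_medium - shift:
        return "medium"
    return "low"


print(confidence_level(0.7))             # medium
print(confidence_level(0.7, trust=1.0))  # high
```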
Image Classification Model
The IML-support version adopts a few-shot learning strategy based on the Feature Map Reconstruction Network (FRNet) [72] to generate AOI label suggestions. An overview of the training and inference of this model is illustrated in Figure 3. The FRNet is a convolutional neural network (CNN) architecture that performs classification via a class-agnostic distance function: the image classification task is framed as a reconstruction problem in latent space; i.e., predicting class membership relies on measuring the distance between a query point and reference points in latent space representing our target classes (i.e., AOIs). For any query image $x$, the convolutional block of the network outputs a feature map $Q \in \mathbb{R}^{r \times d}$, where $r$ is the spatial resolution ($h \times w$) and $d$ is the number of channels. The network is trained in an N-shot-K-way manner to learn support feature maps $S_k \in \mathbb{R}^{Nr \times d}$ for each AOI class $k \in K$ from a pool of $N$ training images per class.
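Episodic few-shot training repeatedly samples a small support set (and a disjoint query set) per class from the labeled pool. The sketch below shows this sampling step, assuming the pool maps each AOI label to a list of image patches; the query-set size and the function name are hypothetical and not taken from the eyeNotate implementation.

```python
import random
from typing import Dict, List, Optional, Tuple


def sample_episode(pool: Dict[str, List], n_shot: int, n_query: int,
                   rng: Optional[random.Random] = None
                   ) -> Tuple[Dict[str, List], Dict[str, List]]:
    """Sample one N-shot episode over all K classes in `pool`.

    Returns (support, query): dicts mapping each class label to n_shot
    support items and n_query disjoint query items.
    """
    rng = rng or random.Random()
    support, query = {}, {}
    for label, items in pool.items():
        if len(items) < n_shot + n_query:
            raise ValueError(f"class {label!r} has too few samples")
        chosen = rng.sample(items, n_shot + n_query)  # without replacement
        support[label] = chosen[:n_shot]
        query[label] = chosen[n_shot:]
    return support, query


# Usage: K = 7 hypothetical AOI classes with 20 patch IDs each, N = 10 shots.
pool = {f"AOI_{k}": list(range(k * 100, k * 100 + 20)) for k in range(7)}
support, query = sample_episode(pool, n_shot=10, n_query=5, rng=random.Random(0))
print(len(support), [len(v) for v in support.values()][:3])  # 7 [10, 10, 10]
```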
During inference, the model aims to reconstruct the best-fit query feature map $\bar{Q}_k$ for each class category as a weighted sum of rows of $S_k$ such that $\bar{Q}_k = W S_k \approx Q$, where $W$ denotes the model weights optimized during model training. By examining the negative reconstruction error, which represents the disparity between the original feature map $Q$ and each AOI-wise reconstructed feature map $\bar{Q}_k$, FRNet assigns a class score. Smaller reconstruction errors indicate a higher likelihood that the query image belongs to the same class as the support features. We train our classification model using n = 10 images for K = 7 AOIs (initial labeled data pool). Following Wertheimer et al. [72], we train FRNet by combining the classification loss with an auxiliary loss $\mathcal{L}_{aux}$ that penalizes inner products between support features of different classes, encouraging them to span distinct subspaces of the latent space:
$$\mathcal{L}_{aux} = \sum_{i \in K} \sum_{j \in K,\, j \neq i} \left\lVert S_i S_j^{\top} \right\rVert^2 \qquad (1)$$
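The reconstruction-based scoring and the auxiliary loss in Equation (1) can be sketched with NumPy. In the FRN paper, the reconstruction weights for a query are computed in closed form by ridge regression, $\bar{W} = Q S_k^{\top}(S_k S_k^{\top} + \lambda I)^{-1}$, and the original formulation also learns the regularizer and a scaling factor. Here $\lambda$ is a fixed hypothetical constant, and the features are random placeholders instead of CNN outputs, so this is an illustration of the idea rather than eyeNotate's training code.

```python
import numpy as np


def reconstruct(query: np.ndarray, support: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Reconstruct query rows (r x d) as weighted sums of support rows (N*r x d).

    Ridge-regression closed form: W = Q S^T (S S^T + lam I)^-1, Q_bar = W S.
    """
    gram = support @ support.T  # (N*r, N*r)
    # Solve (S S^T + lam I) W^T = S Q^T instead of forming an explicit inverse.
    w = np.linalg.solve(gram + lam * np.eye(len(gram)), support @ query.T).T
    return w @ support  # (r, d)


def class_scores(query: np.ndarray, supports: list, lam: float = 0.1) -> np.ndarray:
    """Negative mean squared reconstruction error per class (higher is better)."""
    r = query.shape[0]
    return np.array([-np.sum((query - reconstruct(query, s, lam)) ** 2) / r
                     for s in supports])


def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()


def aux_loss(supports: list) -> float:
    """Equation (1): sum over class pairs i != j of ||S_i S_j^T||_F^2."""
    total = 0.0
    for i, s_i in enumerate(supports):
        for j, s_j in enumerate(supports):
            if i != j:
                total += float(np.sum((s_i @ s_j.T) ** 2))
    return total


rng = np.random.default_rng(0)
r, d, n_shot = 4, 16, 2  # toy sizes: 4 spatial positions, 16 channels, 2 shots
supports = [rng.normal(size=(n_shot * r, d)) for _ in range(3)]
query = supports[1][:r] + 0.01 * rng.normal(size=(r, d))  # query resembles class 1
scores = class_scores(query, supports)
print(int(np.argmax(scores)), softmax(scores).round(3))  # predicted class, probabilities
```

Note how the auxiliary loss is zero exactly when the support features of different classes are mutually orthogonal.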
The annotation tool uses this pre-trained FRNet model to infer AOI labels for each fixation in the selected dataset. Label suggestions are displayed if the model's confidence exceeds a minimum value (0.4) that the user can adjust through the trust-level slider. Manual annotations and confirmed or corrected AOI labels are added to the labeled data pool. For every 10 new samples, a model re-training is started in the background. The model weights used for inference are updated upon completion. The models are trained for 30 epochs at each iteration with weights initialized from the previous step. On an NVIDIA RTX 3080 GPU (24GB), the model training takes 2–4 s per epoch.
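The feedback loop described above (a minimum-confidence filter for displayed suggestions, plus a background re-training after every 10 new labels) can be sketched as follows. `LabeledPool` and the `retrain` callback are hypothetical names. In eyeNotate, training runs in a separate service in parallel with inference, which a real implementation would model with a job queue rather than the synchronous callback used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LabeledPool:
    """Labeled data pool that triggers re-training every `retrain_every` new samples."""
    retrain: Callable[[List[Tuple[object, str]]], None]  # e.g., enqueue a warm-started training job
    retrain_every: int = 10
    samples: List[Tuple[object, str]] = field(default_factory=list)
    pending: int = 0  # new samples since the last re-training

    def add(self, patch: object, label: str) -> bool:
        """Add a manual, confirmed, or corrected label; return True if re-training was triggered."""
        self.samples.append((patch, label))
        self.pending += 1
        if self.pending >= self.retrain_every:
            self.pending = 0
            self.retrain(list(self.samples))  # pass a snapshot of the pool
            return True
        return False


def visible_suggestions(predictions, min_confidence: float = 0.4):
    """Keep (fixation_id, label, confidence) suggestions whose confidence exceeds the threshold."""
    return [p for p in predictions if p[2] > min_confidence]


jobs = []
pool = LabeledPool(retrain=lambda snapshot: jobs.append(len(snapshot)))
triggered = [pool.add(patch=i, label="AOI_1") for i in range(25)]
print(jobs)  # [10, 20]
print(visible_suggestions([(1, "AOI_1", 0.9), (2, "AOI_2", 0.4), (3, "AOI_3", 0.41)]))
# [(1, 'AOI_1', 0.9), (3, 'AOI_3', 0.41)]
```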
[Figure 3 diagram. Training phase: support images, support feature map, support feature pool. Inference phase: query image, query feature map $Q$, query reconstruction $\bar{Q}_c$.]
Figure 3. Overview of the FRNet classification workflow for a few-shot classification problem.
3.2. Evaluation
We evaluate our approach in two ways: we conduct a small case study with n = 3 trained annotators to quantitatively and qualitatively compare the baseline version of our tool with the IML-support version. Annotators have been asked to annotate a small portion (around 2%) of an existing dataset with ground-truth annotations. In a post hoc experiment, we assess the performance of three machine learning models in automatically annotating the remaining part of the dataset. In the following, we describe the use case and the corresponding dataset. Then, we provide details about the case study and the post hoc machine learning experiment.
3.2.1. Use Case and Dataset from Educational Research
The evaluation focuses on educational research as an important eye tracking use case. Most digital and analog learning environments are based on visual information. Hence, gaze behavior is an important observable cue allowing researchers to gain insights into
Table 3. Class-wise mean task completion times in terms of seconds per annotation and the mean number of annotations per class.
Class                                      E_L    E_R    T_L    T_R    P_6    P_8    P_10   BG
Mean number of annotations                 270    59     135    10     276    46     2      73
Mean task completion time [s/annotation]
  baseline                                 1.55   1.24   1.60   3.04   1.44   1.27   2.96   2.59
  IML-support                              2.13   1.32   1.89   2.10   1.41   1.14   3.50   2.01
4.1.3. Usability
We measured perceived usability using the SUS questionnaire. The baseline version is consistently rated as "excellent" with values ranging from 87.5 to 95 (91.6 on average).
For the IML-support condition, we observed a high variance in SUS scores: the ratings range from 50 for B1 ("poor") to 67.5 for B2 ("OK") to 97.5 for A1 ("excellent"), averaging
71.6. The reflexive thematic analysis of the SSI revealed two themes: (a) the tool's design facilitates the annotation of mobile eye tracking data, and (b) the constrained model
performance limits IML-based benefits. Details are provided in the discussion section below.
4.2. Results of the Machine Learning Experiment
Table 4 reports the accuracy of each model and training setting. FRNet outperforms MobileNet and ResNet: it achieves an accuracy of 57.57% in the base setting and 58.78% in
the final setting, which is 6.64% and 7.39% better than the second-best models, respectively. The model performs marginally better when taking the annotations of our participants
into account for training in the final setting (+1.21%). MobileNet ranks second for the base setting with an accuracy of 50.93%. The accuracy slightly decreases to 49.28% for the final
setting. ResNet performs worst for the base setting with 39.60% and benefits most from using more training samples in the final setting. The accuracy increases by 11.78% to 51.39%,
now slightly outperforming MobileNet.
Table 4. Accuracy of each model and training setting.
               Base Setting                 Final Setting
Test Samples   MobileNet  ResNet  FRNet     MobileNet  ResNet  FRNet
230.3k         50.93%     39.60%  57.57%    49.28%     51.39%  58.78%
Table 5 reports the class-wise and averaged F1 scores of each model and training setting. In both training settings, FRNet performs best in terms of the macro and weighted
average of the F1 score. The best performance is achieved in the final setting with a macro-average F1 score of 0.455 and a weighted average of 0.593. In the base setting,
the macro-average is 0.428, and the weighted average is 0.579. MobileNet and ResNet achieve considerably worse average F1 scores in both settings. For the base setting, the macro-average
is 0.202 for MobileNet and 0.256 for ResNet; the weighted average is 0.460 for MobileNet and 0.409 for ResNet. MobileNet does not benefit from taking more training
samples into account in the final setting: the macro-average F1 score slightly drops to 0.185, and the weighted average F1 score to 0.445. For ResNet, the macro-average F1 score stays
similar, while the weighted average F1 score improves by 0.062 to 0.471. However, this is still 0.122 worse compared to FRNet in the same setting and 0.107 worse than FRNet in the
base setting. It is noteworthy that the difference between FRNet and the other two models is larger for the macro-average F1 score (difference ≥ 0.172) than for the weighted average F1
score (difference ≥ 0.119). Also, the macro-average F1 score is always clearly worse than the weighted average F1 score, indicating that all models perform better for classes with many
samples than for classes with a small number of samples. A class-wise analysis shows that all models perform best for the background class (BG), with F1 scores starting from
0.569 for ResNet in the base setting and at least 0.663 for all other conditions. The best performance for the background class was observed for ResNet and FRNet in the final
setting with an F1 score of 0.681. We only observed a single better F1 score of 0.687 for the tablet class T_L for FRNet in the final setting. As the background class covers more
than half of all samples (137.9k of 230.3k samples), it has a large impact on the weighted average. For MobileNet and ResNet models, we observed low F1 scores of less than 0.5 for
all seven classes other than BG in both settings. FRNet shows a more balanced performance. In the base setting, only four out of eight classes achieve an F1 score below that threshold.
Further, for FRNet, we observed the best performance for each class besides P_10, for which MobileNet was better. In the final setting, FRNet improves for all classes besides
the experiment area E_R (−0.094), which is why five out of eight classes have an F1 score lower than 0.5. Still, the model performs best for all classes besides P_10. For BG, ResNet
performs equally well in this setting. The best F1 scores for FRNet are observed for the background class BG and the two tablet classes T_L and T_R.
Table 5. Class-wise F1 scores of each model and training setting.
                                   Base Setting                 Final Setting
Class             Test Samples     MobileNet  ResNet  FRNet     MobileNet  ResNet  FRNet
E_L               10,771           0.207      0.180   0.323     0.153      0.224   0.384
E_R               7780             0.001      0.481   0.556     0.006      0.457   0.463
T_L               26,167           0.077      0.002   0.662     0.057      0.001   0.687
T_R               11,407           0.183      0.316   0.570     0.144      0.087   0.575
P_6               14,725           0.317      0.256   0.334     0.310      0.320   0.375
P_8               10,242           0.003      0.198   0.209     0.014      0.120   0.329
P_10              11,392           0.151      0.044   0.093     0.133      0.153   0.146
BG                137,852          0.676      0.569   0.678     0.663      0.681   0.681
Macro Average                      0.202      0.256   0.428     0.185      0.255   0.455
Weighted Average                   0.460      0.409   0.579     0.445      0.471   0.593
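The gap between macro- and weighted-average F1 scores discussed above follows directly from how the two averages weight the classes. The sketch below recomputes both for FRNet in the final setting from the rounded values in Table 5; because the inputs are rounded to three decimals, the weighted average lands within rounding error of the reported 0.593 rather than matching it exactly. The function and constant names are illustrative.

```python
# Test samples per class and FRNet F1 scores in the final setting (Table 5)
SUPPORT = {"E_L": 10771, "E_R": 7780, "T_L": 26167, "T_R": 11407,
           "P_6": 14725, "P_8": 10242, "P_10": 11392, "BG": 137852}
F1_FRNET_FINAL = {"E_L": 0.384, "E_R": 0.463, "T_L": 0.687, "T_R": 0.575,
                  "P_6": 0.375, "P_8": 0.329, "P_10": 0.146, "BG": 0.681}


def macro_average(scores):
    """Unweighted mean: every class counts the same."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Support-weighted mean: large classes such as BG dominate."""
    total = sum(support[c] for c in scores)
    return sum(scores[c] * support[c] for c in scores) / total


macro = macro_average(F1_FRNET_FINAL)
weighted = weighted_average(F1_FRNET_FINAL, SUPPORT)
bg_share = SUPPORT["BG"] / sum(SUPPORT.values())
print(f"macro {macro:.3f}, weighted {weighted:.3f}, BG share {bg_share:.1%}")
```

Since BG holds roughly 60% of the test samples and has one of the highest F1 scores, it pulls the weighted average well above the macro average.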
Figure 7 shows the confusion matrix of the best-performing condition: FRNet in the final setting. It is normalized over the true conditions (i.e., over rows): the values on
the diagonal correspond to the recall of the respective class. Other values in the same row correspond to false-negative errors and sum up to the miss rate of that class. For instance,
for the background class BG, the recall is 61.33%, and the false negatives sum up to a miss rate of 38.66%. The background is often misclassified as one of the experiment area classes
(18.19%) or as one of the tablet classes (12%). The confusion matrix shows that classes with similar appearances are frequently confused. This can be observed for the two experiment
area AOIs, the two tablet AOIs, and the three workbook AOIs. For instance, for E_L, the recall is 56.43%, and, with 26.53%, the majority of the false negatives were classified as
E_R. The recall of T_L is 76.31%, while 12.38% of its samples were misclassified as T_R. A similar pattern was observed for the workbook AOIs P_*. All AOI classes are frequently
misclassified as background. Hereby, the false-negative errors of the experiment area and tablet AOIs range between 10.99% and 16.24%. The three workbook AOIs are affected
more severely: the false-negative errors range between 48.86% and 55.58%. These misclassifications lower the precision of BG to 0.765, which is nevertheless the best precision among all classes. Precision and recall
for all classes are reported in Table 6.
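Because Figure 7 is normalized over rows, each row yields recall and miss rate directly, whereas precision also depends on how many test samples each class has. The following sketch (illustrative helper names; percentages copied from Figure 7, class sizes from Table 5) undoes the row normalization to approximate the per-class precision reported in Table 6.

```python
CLASSES = ["E_L", "E_R", "T_L", "T_R", "P_6", "P_8", "P_10", "BG"]
SUPPORT = [10771, 7780, 26167, 11407, 14725, 10242, 11392, 137852]  # Table 5
# Figure 7 in percent: rows = true label, columns = predicted label
M = [
    [56.43, 26.53, 0.62, 0.12, 0.05, 0.00, 0.01, 16.24],
    [2.76, 83.06, 0.37, 2.13, 0.05, 0.04, 0.00, 11.58],
    [0.05, 0.26, 76.31, 12.38, 0.00, 0.00, 0.00, 10.99],
    [0.00, 0.92, 10.97, 73.46, 0.00, 0.09, 0.01, 14.55],
    [0.00, 0.11, 0.00, 0.00, 31.70, 16.66, 1.22, 50.31],
    [0.00, 0.58, 0.04, 0.04, 6.91, 40.85, 2.72, 48.86],
    [0.23, 0.63, 0.06, 0.09, 6.45, 27.27, 9.68, 55.58],
    [10.57, 7.62, 7.71, 4.29, 2.93, 3.96, 1.58, 61.33],
]


def recall_and_miss_rate(matrix, i):
    """Diagonal entry = recall; the rest of the row = miss rate (false negatives)."""
    row = matrix[i]
    miss = sum(v for j, v in enumerate(row) if j != i)
    return row[i] / 100, miss / 100


def precision(matrix, support, j):
    """Re-weight each row by its class size, then read down column j."""
    predicted_as_j = [n * row[j] / 100 for n, row in zip(support, matrix)]
    return predicted_as_j[j] / sum(predicted_as_j)


for j, name in enumerate(CLASSES):
    rec, _ = recall_and_miss_rate(M, j)
    print(f"{name}: precision ~{precision(M, SUPPORT, j):.3f}, recall {rec:.3f}")
```

For BG, E_L, and P_10, for example, this reproduces the Table 6 precision values to within rounding, which illustrates why the many AOI samples labeled as BG lower its precision despite BG's large support.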
True \ Predicted   E_L    E_R    T_L    T_R    P_6    P_8    P_10   BG
E_L                56.43  26.53  0.62   0.12   0.05   0.00   0.01   16.24
E_R                2.76   83.06  0.37   2.13   0.05   0.04   0.00   11.58
T_L                0.05   0.26   76.31  12.38  0.00   0.00   0.00   10.99
T_R                0.00   0.92   10.97  73.46  0.00   0.09   0.01   14.55
P_6                0.00   0.11   0.00   0.00   31.70  16.66  1.22   50.31
P_8                0.00   0.58   0.04   0.04   6.91   40.85  2.72   48.86
P_10               0.23   0.63   0.06   0.09   6.45   27.27  9.68   55.58
BG                 10.57  7.62   7.71   4.29   2.93   3.96   1.58   61.33
Figure 7. Confusion matrix of the test set for FRNet in the final setting (normalized over rows; values in %; rows: true label, columns: predicted label).
Table 6. Class-wise precision and recall of the FRNet model in the final setting.
Class           Precision   Recall
E_L             0.291       0.564
E_R             0.321       0.831
T_L             0.625       0.763
T_R             0.473       0.735
P_6             0.459       0.317
P_8             0.275       0.409
P_10            0.295       0.097
BG              0.765       0.613
Macro Avg.      0.438       0.541
Weighted Avg.   0.633       0.588
5. Discussion
With eyeNotate, we present a tool for annotating mobile eye tracking data. Our goal is to create a tool that allows researchers to more effectively and efficiently annotate
recordings from mobile eye trackers while providing a high usability. In the following, we discuss the results of our evaluation, including a case study with three trained annotators
and a post hoc machine learning experiment.
5.1. Validity and Reliability
The validity of users' annotations is high and similar for both versions of our annotation tool. We observed an accuracy of 94.76% for the baseline version and 94.55% for the
IML-support version (weighted mean). An additional analysis revealed 14 errors (1.6%) in the ground truth. We identified these errors in cases when all three annotators agreed on an
AOI that deviated from the ground truth. With a corrected ground truth, accuracy increases to 96.29% for the baseline version and 96.07% for the IML-support version. This suggests
we met our goal of achieving an accuracy of at least 95%. Our results further suggest that exp_2 was more difficult to annotate because accuracy values consistently dropped for
both versions of the tool from more than 97% to less than 90%, and we observed a higher ratio of ground-truth errors. A reason might be that the second phase included more
different AOI classes and a more complex scene. The inter-rater agreement was almost perfect with κ ≥ 0.9 in all cases, i.e., the reliability of annotations from both versions of our
tool is high.
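The section does not state which κ statistic was computed for the three annotators. As an illustration only, the sketch below implements Cohen's kappa for one pair of annotators; for three raters, one could average the pairwise values or use Fleiss' kappa instead. The label sequences in the usage example are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same fixations."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


a = ["BG", "BG", "T_L", "T_R", "P_6", "BG"]
b = ["BG", "BG", "T_L", "T_L", "P_6", "BG"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because a dominant class such as BG would otherwise inflate raw agreement.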
5.2. Efficiency
On average, task completion times for both tool versions were similar: annotators were 3.44% (48 s) slower when using the IML-support version. Likewise, the difference
in task completion times between versions per participant is small. On the other hand, the differences between participants are large. A1 required around 2000 s to solve the task
per tool, while B1 and B2 required around 1200 s and below 1000 s, respectively. B1 and B2 were thus up to almost twice as fast as A1 without compromising accuracy, which indicates that they
had a more efficient strategy in using our tools. Analyzing the task completion times over time, we observe that A1 is consistently slower than B1 and B2, with annotation
times of 250–300 s/100 annotations. B1 and B2 require only around 150 s/100 annotations. During the study, we observed that all participants used shortcuts for annotation and
confirmation, but A1 did not use the multi-select feature, which could explain the large difference to B1 and B2 in terms of task completion time. Another indicator of the high
effectiveness of the multi-select feature is that B1 and B2 had the lowest task completion times (50–100 s/100 annotations) at the end of exp_1, which includes many consecutive
occurrences of P_6 and P_8 (see also the low class-wise annotation times in Table 3). Overall, given the 870 fixations in the annotation task, eyeNotate achieves a worst-case
annotation rate of 2.41 s/fixation for user A1 when using the IML-support version and a best-case annotation rate of 1.11 s/fixation for user B2 when using the IML-support version.
This means that, using an automatic annotation method to map the remaining 230k fixations in the full dataset, there is a time-saving potential between 70 and 150 h for this use case.
However, we could not confirm our hypothesis that providing label suggestions would accelerate the labeling process. This is likely because all annotators tended to manually
check and confirm label suggestions in the IML-support version (cf. Section 5.3). We observed corresponding annotation behavior during the study, and theme (b) of our SSI
analysis concerning the constrained model performance confirms this: annotators did not trust the model sufficiently and felt highly responsible for performing the job well. Hence,
they did not benefit from automatic label suggestions in the way reported by Desmond et al. [23].
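The time-saving estimate in the efficiency discussion is simple arithmetic over the observed per-fixation annotation rates. A minimal sketch, assuming the roughly 230k remaining fixations would otherwise be annotated manually at the best-case (1.11 s) or worst-case (2.41 s) rate; the function name is illustrative.

```python
def annotation_hours(n_fixations: int, seconds_per_fixation: float) -> float:
    """Time needed to annotate n fixations manually at a given rate, in hours."""
    return n_fixations * seconds_per_fixation / 3600


REMAINING = 230_000          # remaining fixations of the full dataset (rounded, as in the text)
BEST, WORST = 1.11, 2.41     # s/fixation: B2 (best case) and A1 (worst case)

saving_low = annotation_hours(REMAINING, BEST)
saving_high = annotation_hours(REMAINING, WORST)
print(f"time-saving potential: {saving_low:.0f}-{saving_high:.0f} h")
```

The two values, roughly 71 h and 154 h, are what the text rounds to 70–150 h. Multiplying the same rates by the 870 study fixations gives about 966 s and 2097 s, consistent with the reported task completion times of B2 (below 1000 s) and A1 (around 2000 s).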
The differences in interaction design between the baseline and the IML-support version of our tool seemingly played no role in this context. Our findings from the SSI analysis
relating to theme (a) suggest that participants, in principle, liked the interaction design of the IML-support version, but due to the low model performance, these features were not
effective. Our findings suggest that future investigations should include more effective computer vision models that can better cope with the challenges of mobile eye tracking data,
like differentiating classes with similar appearance. This could, for instance, be achieved using a classification model that takes the position of a fixated object into account [86] or
by tracking objects once they have been annotated using 3D scene reconstruction and object tracking algorithms [87]. Follow-up work could also investigate how lay users,
in contrast to the trained annotators in our case study, perform in the annotation task, addressing the questions of whether lay users could achieve the same validity as trained
annotators and whether lay users would benefit more from label suggestions in terms of efficiency.
5.3. Usability
The usability of our tool's baseline version was consistently rated as excellent: the basic features and general interaction design of our annotation tool were perceived very
positively, which is supported by theme (a) of our thematic analysis concerning the tool's interaction design: "the tool's design facilitates the annotation of mobile eye tracking data."
However, B1 and B2 rated the IML-supported version drastically lower, which contradicts our assumption that both tools achieve a similar usability rating. Looking into individual
SUS items, B1 and B2 mainly penalized an increased inconsistency of the IML-support version and indicated that it was more cumbersome to use. Both felt less confident using
the IML-support version and thought it was less easy to use. Particularly, B1, who rated the usability of the IML-support version as "poor", reported that the system provided many
wrong label suggestions and seemed uncertain in many cases, which caused confusion and deteriorated trust. B1 reports that, as a consequence, they fell back to a manual annotation
strategy. B2 and A1 reported similar issues with the model performance despite rating usability higher. We observed that B2 and A1 favored manual annotation, similar to B1.
These usability issues can be attributed to the integration of IML-support features and relate to theme (b) of our thematic analysis concerning the constrained model performance: "the
constrained model performance limits IML-based benefits." The two themes, originating from a reflexive thematic analysis of the SSI, are detailed below.
5.3.1. (a) The Tool's Design Facilitates the Annotation of Mobile Eye Tracking Data
Our case study participants liked our tool's basic functionality and interaction design. In particular, they highlighted the clean design that allowed them to focus on the
annotation task throughout the study. They reported high usability and learnability. Quick reaction times and visual feedback were highly appreciated. Particularly, the video overlay
immediately displaying updates after manual annotation or confirmation was considered very helpful because they had to check the video frame to decide on the AOI class anyway.
All participants reported a high perceived performance due to the clean, focused interaction design and the ability to use shortcuts for navigation and annotation. Also, the multi-select
feature for annotation and confirmation seems to impact annotation efficiency positively. The video playback function was not used by our participants but might have supported
understanding the video-editing-like interface metaphor. Upon asking them, participants reported they understood the trust-level slider but did not use it often, although it was
considered useful. High-certainty suggestions (green highlight) were also considered helpful. However, certain but wrong label suggestions were frustrating as they could lead to
wrong confirmations. Also, the red color of uncertain suggestions was reported to interrupt the interaction flow in cases where the predictions were correct. In summary, color-coding
of the model certainty of label suggestions might cause frustration in the case of certain but wrong predictions and can interrupt the interaction flow in the case of uncertain but
correct predictions. An implication could be to restrict label suggestions to highly certain suggestions. Our participants suggested two interesting features that will be considered in
future versions of our tool. They proposed a feature that enables jumping to non-annotated fixations or uncertain suggestions. Further, they proposed a feature to batch-accept all
certain predictions, which would be dependent on the state of the trust-level slider and could be restricted to classes with good classification performance.
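The batch-accept and jump-to-next features proposed by the participants could look roughly like the following sketch. Everything here is hypothetical: the `Suggestion` record, the function names, and the rule that batch-accept never overwrites a label the user already set are assumptions, not features of the current eyeNotate version.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Suggestion(NamedTuple):
    fixation_id: int
    label: str
    confidence: float


def batch_accept(suggestions: Iterable[Suggestion], trust_threshold: float,
                 allowed_classes: Set[str],
                 confirmed: Optional[Dict[int, str]] = None) -> Dict[int, str]:
    """Accept all certain suggestions of allowed classes in one step."""
    result = dict(confirmed or {})
    for s in suggestions:
        if s.fixation_id in result:
            continue  # keep labels the user has already set
        if s.label in allowed_classes and s.confidence >= trust_threshold:
            result[s.fixation_id] = s.label
    return result


def next_to_review(suggestions: Iterable[Suggestion], confirmed: Dict[int, str],
                   trust_threshold: float) -> Optional[int]:
    """Return the first unlabeled fixation with an uncertain suggestion, if any."""
    for s in suggestions:
        if s.fixation_id not in confirmed and s.confidence < trust_threshold:
            return s.fixation_id
    return None
```

Tying `trust_threshold` to the trust-level slider and `allowed_classes` to classes with good classification performance mirrors the two conditions the participants proposed.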
5.3.2. (b) The Constrained Model Performance Limits IML-Based Benefits
All participants reported a perceived model performance of 30–40% accuracy, although the true value is higher (62%). This indicates that our participants had low trust in
the underlying model generating the AOI label suggestions and could explain why they checked all suggestions manually. This is also in line with their reports on problems with
certainty-based color coding. All participants specified that the model suffered from a left/right weakness: some AOIs with the same appearance were present on the left and
right sides of the experiment scene, but the model could not properly differentiate between them. We intentionally investigated this challenge by including experiment phase 2. One
example is T_L and T_R, referring to two instances of the same tablet mounted on the left or right side of the experiment scene. This is evident in the confusion matrix of FRNet in
Figure 7: T_L is wrongly classified as T_R in 12.38% of the cases, whereas the false-negative errors concerning all remaining classes besides BG sum up to only 0.31%. We observe similar problems
for the experiment area and workbook page AOIs. If objects look very much alike, our IML-support version has limitations. Addressing the left/right weakness is essential because
AOIs with similar appearances are common. Future research should investigate whether object-tracking or position-aware models can help to address this challenge. Another option
can be found in meta-models that iteratively learn for which classes a model performs well and activate suggestions for those only.
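As a deliberately simple stand-in for such a meta-model, the sketch below tracks, per class, how often users kept a suggested label and enables suggestions only for classes whose running precision is high enough. The class name `ClassGate` and both thresholds are hypothetical choices for illustration, not part of eyeNotate.

```python
from collections import defaultdict


class ClassGate:
    """Hypothetical per-class gate for label suggestions based on user feedback."""

    def __init__(self, min_precision=0.8, min_feedback=20):
        self.min_precision = min_precision   # required share of kept suggestions
        self.min_feedback = min_feedback     # feedback needed before trusting a class
        self.kept = defaultdict(int)
        self.total = defaultdict(int)

    def record(self, suggested, final_label):
        """Log whether the user kept or corrected a suggested label."""
        self.total[suggested] += 1
        self.kept[suggested] += int(suggested == final_label)

    def precision(self, cls):
        n = self.total.get(cls, 0)
        return self.kept.get(cls, 0) / n if n else 0.0

    def enabled(self, cls):
        """Show suggestions for cls only once enough reliable evidence exists."""
        return (self.total.get(cls, 0) >= self.min_feedback
                and self.precision(cls) >= self.min_precision)

    def enabled_classes(self):
        return {c for c in self.total if self.enabled(c)}
```

In the left/right case described above, frequent T_L to T_R corrections would keep T_L switched off while a reliable class such as BG stays enabled.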
5.4. Post Hoc ML Experiment
We observed the best average F1 scores and accuracy scores when using the FRNet model architecture in the final setting, i.e., when using the 870 annotated fixations for
training (see Tables 4 and 5). However, using more training data for the FRNet model only slightly increases the performance, e.g., +1.21% in accuracy and +0.015 concerning
the weighted average F1 score. With +11.78% for accuracy and +0.062 for the weighted F1 score, ResNet showed the greatest improvement when more training samples were added.
MobileNet performs slightly worse for all metrics. However, the results show that the models are not good enough for most applications, such as automatic or semi-automatic
annotation with humans-in-the-loop. This is in line with the users' feedback from the SSI as summarized in theme (b).
The best F1 score of 0.687 was observed for the T_L class for the FRNet model in the final setting, followed by an F1 score of 0.681 for the BG class. The precision is highest for BG
with 0.765 (see Table 6), so labeling support only for the BG class could have been effective. Since almost 60% of all labels belong to this class, this could already save a lot of time
without raising usability issues like the ones mentioned in theme (b). The high ratio of BG samples in the test set also means that summary statistics like accuracy and the weighted F1
score are biased by the relatively high performance for this class. This is visible in the large deviation between the weighted and macro-average F1 scores for all models. Overall,
FRNet shows the most balanced performance across all classes: it performs best for all classes besides P_10. This also explains why the relative difference between the macro-average
and weighted average F1 values is greater for MobileNet and ResNet.
The confusion matrix in Figure 7 shows the strengths and weaknesses of the FRNet model (final) on the class level in more detail. As counts are normalized over the true
condition, i.e., over rows, the diagonal shows the recall of the class of that row, while the remaining values of that row sum up to the corresponding miss
rate. For BG, we observed a recall of 61.33% with a precision of 76.53%. This means that, when limiting suggestions to the BG class, correct labels for more than one-third of all instances
(61.33% of the 59.88% of all 230.3k instances that belong to BG) could have been provided, and around three-quarters of all BG suggestions would have been correct. Still, one-quarter would have been wrong. So,
limiting suggestions to BG alone would likely not solve the usability issues mentioned in theme (b). These scores were observed for the default setting, in which BG is assigned if the
model's highest classification probability for an AOI class is lower than $\tau_{BG} = 0.4$. Lowering $\tau_{BG}$ would increase the precision of the BG class but at the cost of a lower recall. Likewise,
increasing the threshold for assigning one of the seven AOI classes, which we call $\tau_{AOI}$, would increase the precision of these classes. Eventually, a class-specific batch-accept feature
for accepting label suggestions of a certain class with manually tuned $\tau_{BG}$ and $\tau_{AOI}$ could be useful. The user should be able to configure the probability threshold $\tau_{BG}$ and the
classification thresholds $\tau_{AOI}$ for each class, which would allow annotators to accept labels based on their own experience of how the model performs per class. However, most
F1 scores and all precision scores of AOI classes are lower than the scores of the BG class (see Table 6), which indicates that tuning the thresholds of a batch-accept feature
might be difficult. We conduct and report on a follow-up experiment that investigates how changes in $\tau_{BG}$ and $\tau_{AOI}$ affect the classification performance and relate to the number of
fixations without a label suggestion. By that, we aim to estimate the potential of a class-wise batch-accept feature.
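The estimate for a BG-only suggestion mode combines the share of BG samples with its recall and precision. The short sketch below reproduces that arithmetic from the reported values (BG and total test samples from Table 5, recall from Figure 7, precision as reported above); the variable names are illustrative. It shows that correct BG labels cover about 37% of all instances, while roughly 48% of all instances would receive a BG suggestion, about a quarter of which would be wrong.

```python
N_BG, N_TOTAL = 137_852, 230_336   # BG and total test samples (Table 5)
RECALL_BG = 0.6133                 # Figure 7
PRECISION_BG = 0.7653              # precision of BG as reported in the text

bg_share = N_BG / N_TOTAL                         # share of BG among all samples
correct_share = RECALL_BG * bg_share              # correct BG labels / all samples
suggested_share = correct_share / PRECISION_BG    # all BG suggestions / all samples
wrong_fraction = 1 - PRECISION_BG                 # wrong share among BG suggestions

print(f"BG share: {bg_share:.1%}")
print(f"correct BG labels: {correct_share:.1%} of all samples")
print(f"BG suggestions: {suggested_share:.1%} of all samples, {wrong_fraction:.0%} of them wrong")
```

Dividing the true positives by the precision converts "correct labels" into "all suggestions", which is the quantity that determines how many fixations an annotator would still need to double-check.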
The confusion matrix also indicates that a reason for the low F1 scores is the similar appearance of the AOI classes, including the two experiment areas E_*, the two tablets T_*,
and the three workbook pages P_*. These three groups can be clearly identified along the diagonal as three squares based on the high number of false-negative errors within each
group. Further, it shows that many AOI classes are frequently misclassified as belonging to the background class BG, particularly the three workbook AOIs. Confusion of AOI classes
with the BG class could be reduced by increasing the classification threshold $\tau_{AOI}$. This could be realized, e.g., through a class-based trust-level slider. Confusion of similar-looking
AOI classes can only be solved by using more suitable approaches like multi-object tracking, i.e., once an AOI was manually labeled or confirmed by a user, the system could track this
instance to reveal wrong classifications or auto-confirm true classifications, or graph neural networks that consider the spatial location of an object for classification [86]. An option to
increase the utility of the FRNet model would be to provide label suggestions at a higher semantic level. For instance, eyeNotate could identify all tablets and ask the user which
instances belong to the left (T_L) or right (T_R) class. Similarly, this could be performed for the two experiment areas and the three workbook pages. Classification performance
would likely be higher for this four-class problem because it is a less complex classification problem. We investigate this aspect in another follow-up experiment. Further, a two-level
decision task (left vs. right) or three-level decision task in the case of the workbook pages is less difficult for users than the eight-level decision task, which includes all AOIs and the
separate background class.
Next, we report on the two mentioned follow-up experiments: one for estimating the utility of a class-wise batch-accept feature and one for investigating how the model
would perform for the four-class classification problem.
5.4.1. Estimating the Utility of a Class-Wise Batch-Accept Feature
To estimate the utility of a class-wise batch-accept feature, we investigate the impact of adjusting the classification thresholds $\tau_{BG}$ and $\tau_{AOI}$ on the model performance in an
additional experiment. In the current setting, eyeNotate suggested BG as a label when the highest AOI class probability was below a threshold of $\tau_{BG} = 0.4$ and the highest-ranked AOI class otherwise.
In this post hoc experiment, we add a second threshold $\tau_{AOI}$ that determines the minimum classification probability $p$ before we assign an AOI class. The larger the gap between
these two thresholds, the higher the number of instances without a label suggestion will be. Hence, there will be a trade-off between the number of instances with a label suggestion
and the precision of those.
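The two-threshold rule can be written as a small function. This sketch assumes, as the description above suggests, that the model outputs probabilities for the seven AOI classes only and that BG is derived from the highest of these probabilities; the function name `suggest_label` and the class order are illustrative.

```python
from typing import Optional, Sequence

AOI_CLASSES = ("E_L", "E_R", "T_L", "T_R", "P_6", "P_8", "P_10")


def suggest_label(probs: Sequence[float], tau_bg: float = 0.4,
                  tau_aoi: Optional[float] = None) -> Optional[str]:
    """Return 'BG', an AOI class, or None (no suggestion) for one fixation."""
    if len(probs) != len(AOI_CLASSES):
        raise ValueError("expected one probability per AOI class")
    if tau_aoi is None:
        tau_aoi = tau_bg            # default setting: tau_AOI equals tau_BG
    if tau_aoi < tau_bg:
        raise ValueError("tau_aoi must not be smaller than tau_bg")
    best = max(range(len(probs)), key=probs.__getitem__)
    p = probs[best]                 # highest AOI class probability
    if p < tau_bg:
        return "BG"
    if p >= tau_aoi:
        return AOI_CLASSES[best]
    return None                     # tau_bg <= p < tau_aoi: leave it to the annotator
```

With the default `tau_aoi = tau_bg`, every fixation receives a suggestion; widening the gap between the two thresholds trades coverage for precision, as described above.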
In the first step, we assess whether the default threshold for classifying the background class, $\tau_{BG} = 0.4$, was a good choice. For this, we plot an ROC curve that illustrates the trade-off
between the true-positive rate (recall) and the false-positive rate for classifying the BG class (versus all other AOI classes) depending on $\tau_{BG}$ (see Figure 8). Note that in the
default setting, $\tau_{AOI} = \tau_{BG}$. The ROC curve shows that the false-positive rate for $\tau_{BG} = 0.4$ is quite high: 28.07% of non-BG instances are wrongly classified as BG. Reducing $\tau_{BG}$ to
0.35 or 0.30 improves the false-positive rate: only 10.92% or 3.06% are wrongly classified as background. The recall would drop to 44.83% and 29.96%, respectively. A recall of 29.96%
still corresponds to 17.94% of all samples (41.3k) because 59.88% of all 230.3k samples belong to the BG class.
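The ROC analysis of the BG decision can be sketched as follows, assuming the default rule above (predict BG if and only if the highest AOI probability is below $\tau_{BG}$, with $\tau_{AOI} = \tau_{BG}$). The helper names are hypothetical and the usage example uses toy data, not the study data; sweeping every observed probability as a threshold yields the full curve and its area via the trapezoidal rule.

```python
import math


def roc_points(max_aoi_prob, is_bg, thresholds):
    """(tau_bg, FPR, TPR) for the rule 'predict BG iff max AOI probability < tau_bg'."""
    pos = [p for p, y in zip(max_aoi_prob, is_bg) if y]
    neg = [p for p, y in zip(max_aoi_prob, is_bg) if not y]
    if not pos or not neg:
        raise ValueError("need both BG and non-BG samples")
    points = []
    for tau in thresholds:
        tpr = sum(p < tau for p in pos) / len(pos)   # recall of BG
        fpr = sum(p < tau for p in neg) / len(neg)   # non-BG wrongly labeled BG
        points.append((tau, fpr, tpr))
    return points


def roc_auc(max_aoi_prob, is_bg):
    """Area under the ROC curve, sweeping tau_bg over every observed probability."""
    taus = [0.0] + sorted(set(max_aoi_prob)) + [math.inf]
    curve = sorted((fpr, tpr) for _, fpr, tpr in roc_points(max_aoi_prob, is_bg, taus))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


probs = [0.2, 0.3, 0.45, 0.6, 0.25, 0.5, 0.7, 0.9]   # toy highest AOI probabilities
bg = [True, True, True, True, False, False, False, False]
print(roc_points(probs, bg, [0.30, 0.35, 0.40]))
print(f"AUC = {roc_auc(probs, bg):.2f}")
```

Each threshold corresponds to one operating point on the curve, which is how the marked points for $\tau_{BG}$ = 0.40, 0.35, and 0.30 in Figure 8 can be read.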
However, simultaneously reducing $\tau_{BG}$ and $\tau_{AOI}$ optimizes the false-positive rate of the background class but will also lead to an increase in false-positive rates for all other
classes. Hence, we investigate the impact of increasing $\tau_{AOI}$ in steps of 0.05 on accuracy with constant $\tau_{BG}$ for $\tau_{BG} \in \{0.3, 0.35, 0.4\}$. At the same time, we investigate the impact on
the number of samples that will not be annotated. The results are presented in Figure 9a. It shows the model accuracy and the annotation ratio, i.e., the portion of samples that
received an annotation suggestion, as a function of $\tau_{AOI}$. Using the default parameters $\tau_{BG} = \tau_{AOI} = 0.4$, we observe an accuracy of 58.78%, as reported in Table 4 for FRNet in
the final setting. The annotation ratio is 100% because $\tau_{BG} = \tau_{AOI}$. For $\tau_{BG} = \tau_{AOI} = 0.3$, the curve starts with an accuracy of 45.15%. For $\tau_{BG} = \tau_{AOI} = 0.35$, accuracy starts
at 52.58%. In all three cases, the accuracy increases and the annotation ratio decreases with increasing $\tau_{AOI}$. Setting $\tau_{AOI} = 1$ means that we do not consider annotations of any
class besides BG. For $\tau_{BG} = 0.4$, the accuracy reaches 76.53% and the annotation ratio
57.96% in this setting. We observe that, for $\tau_{AOI} = 1$, the lower $\tau_{BG}$, the higher the accuracy and the lower the annotation ratio. Consequently, the maximum accuracy is reached for $\tau_{BG} = 0.3$
with 93.54%, as is the minimum annotation ratio of 18.97%. However, for $\tau_{AOI} = 1$,
prediction labels would be limited to BG. This indicates that a batch-accept feature for BG could be effective. For a batch-accept feature that includes classes other than BG, $\tau_{AOI}$
must be smaller than 1. To assess how well the model would perform for AOI classes only, i.e., for all classes besides the background class BG, we ran the experiment for $\tau_{BG} = 0$
and $0 \leq \tau_{AOI} \leq 1$. The corresponding diagram is shown in Figure 9b. Up to $\tau_{AOI} = 0.15$, all samples are classified as one of the AOI classes. This means that the minimum model
certainty lies between 0.15 and 0.2. With increasing $\tau_{AOI}$, the accuracy also increases until it reaches its maximum for $\tau_{AOI} = 0.9$ with 64.75%. However, with these parameters, only
1.24% of all samples would be annotated.
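The sweep behind Figure 9 can be sketched as follows: for a fixed $\tau_{BG}$, $\tau_{AOI}$ is increased in steps of 0.05 up to 1, and accuracy is computed only over samples that receive a suggestion. The input format (one highest-ranked AOI class and its probability per sample) and the function names are assumptions made for this illustration.

```python
def evaluate(top_preds, truths, tau_bg, tau_aoi):
    """Accuracy on annotated samples and annotation ratio for one threshold pair.

    top_preds: list of (highest-ranked AOI class, its probability) per sample.
    """
    annotated = correct = 0
    for (cls, p), truth in zip(top_preds, truths):
        if p < tau_bg:
            label = "BG"
        elif p >= tau_aoi:
            label = cls
        else:
            continue                     # no suggestion for this sample
        annotated += 1
        correct += label == truth
    ratio = annotated / len(truths)
    accuracy = correct / annotated if annotated else float("nan")
    return accuracy, ratio


def sweep_tau_aoi(top_preds, truths, tau_bg, step=0.05):
    """Evaluate tau_aoi = tau_bg, tau_bg + step, ..., 1.0 for a fixed tau_bg."""
    n_steps = round((1.0 - tau_bg) / step)
    results = []
    for k in range(n_steps + 1):
        tau_aoi = round(tau_bg + k * step, 10)   # avoid drift such as 0.6000000000000001
        results.append((tau_aoi, *evaluate(top_preds, truths, tau_bg, tau_aoi)))
    return results
```

Because unannotated samples are excluded from the accuracy but counted in the annotation ratio, the two curves move in opposite directions as $\tau_{AOI}$ grows.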
[Figure 8 (plot): true-positive rate (recall) over false-positive rate for BG vs. all (AUC: 0.72), with the chance line and the operating points $\tau_{BG}$ = 0.40 (default), 0.35, and 0.30 marked.]
Figure 8. ROC curve of the background class BG for the FRNet model in the final setting. The decision boundary corresponds to the threshold $\tau_{BG} = \tau_{AOI}$.
Overall, the results of this additional experiment indicate that a batch-accept feature for the background class BG could add value to eyeNotate. Since the parameters are
optimized over the test set, the results can only serve as an upper bound of the performance. In a realistic scenario, the performance with a human optimizing the parameters would
lie below this upper bound, but it would, in theory, be reachable for the considered use case, dataset, and model. However, the results also show that the classifier is not good
enough for providing label suggestions for AOI classes, even under the assumption that users could tune the decision thresholds. A reason is likely the high similarity between
some of the AOI classes.
5.4.2. Simulating Model Performance in a Four-Class Classification Setting
Another option to increase the utility of eyeNotate using the FRNet model is to treat the classification as a four-class problem, i.e., to only differentiate between the background
class BG and three further AOI classes: experiment area E, tablet T, and workbook pages P. For our use case, the human annotator would still need to decide whether, e.g., the
identified tablet is the left or right version. But this decision is less complex than assigning one out of all eight classes. Also, this investigation can reveal the potential benefit of
eyeNotate for other, simpler use cases. Hence, we assess the overall accuracy and the precision, recall, and F1 scores under the assumption that only four target classes exist,
i.e., E, T, P, and BG, using the FRNet model in the final setting. For this, we replace the true and predicted class labels with the corresponding summary class; e.g., E_L and E_R are
replaced with E before computing scores. The BG labels do not change.
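Mapping the eight classes to four summary classes only requires relabeling both true and predicted labels before scoring. A minimal sketch, assuming labels follow the `<group>_<instance>` naming used in this paper (E_L, T_R, P_10, and so on) and that BG stays unchanged; the helper names and the toy labels are illustrative.

```python
from collections import Counter


def to_summary(label):
    """Map E_L/E_R -> E, T_L/T_R -> T, P_6/P_8/P_10 -> P; keep BG."""
    return label if label == "BG" else label.split("_")[0]


def class_scores(truths, preds):
    """Accuracy plus per-class (precision, recall, F1)."""
    tp = Counter(t for t, p in zip(truths, preds) if t == p)
    n_true, n_pred = Counter(truths), Counter(preds)
    scores = {}
    for c in sorted(set(truths) | set(preds)):
        precision = tp[c] / n_pred[c] if n_pred[c] else 0.0
        recall = tp[c] / n_true[c] if n_true[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return sum(tp.values()) / len(truths), scores


truths = ["E_L", "E_R", "T_L", "T_R", "P_6", "BG"]
preds = ["E_R", "E_R", "T_R", "T_R", "P_8", "BG"]
acc8, _ = class_scores(truths, preds)
acc4, scores4 = class_scores([to_summary(t) for t in truths], [to_summary(p) for p in preds])
print(f"eight-class accuracy {acc8:.2f}, four-class accuracy {acc4:.2f}")
```

Left/right and page confusions disappear after relabeling, so only errors across groups or with BG remain, which is why the summary-class scores can only improve.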
In the four-class setting, FRNet achieves an accuracy of 65.30%, which is 6.52% better than in the original eight-class setting. Table 7 shows the corresponding precision, recall,
and F1 scores. As expected, the scores for summary classes are better compared to the original classes. For instance, for E, we observe an F1 score of 0.524, while the F1 scores
for E_L and E_R are 0.384 and 0.463, respectively. This also holds for T and P. The results do not change for BG because there were no changes concerning the background
class. Consequently, the macro-average and weighted average F1 scores are also higher. The macro-average F1 score increases by 0.167 and the weighted average F1 score by 0.063.
In summary, reducing the complexity of the classification problem has a positive effect on all observed scores. However, to enable effective annotation support, we will need to
further improve the model performance. Promising directions that should be investigated include methods like multi-object tracking and graph neural network models.
Table 7. Class-wise precision, recall, and F1 scores of the FRNet model in the final setting for a reduced set of four target classes.
AOI             # Samples   Precision   Recall   F1 Score
E               18,551      0.380       0.842    0.524
T               37,574      0.661       0.874    0.753
P               36,359      0.598       0.479    0.532
BG              137,852     0.765       0.613    0.681
Macro avg.                  0.601       0.702    0.622
Weighted avg.               0.691       0.653    0.656
[Figure 9 (two plots): accuracy and annotation ratio (0–100%) over $\tau_{AOI}$. (a) $\tau_{BG} \in \{0.3, 0.35, 0.4\}$, one accuracy curve and one annotation-ratio curve per value of $\tau_{BG}$, with $\tau_{AOI}$ from 0.30 to 1.00. (b) $\tau_{BG} = 0$.]
Figure 9. Accuracy and annotation ratio as a function of $\tau_{AOI}$ for the FRNet model in the final setting for $\tau_{BG} \in \{0.3, 0.35, 0.4\}$ (a) and for $\tau_{BG} = 0$ (b).
6. Conclusions
We presented eyeNotate, an interactive annotation tool for mobile eye tracking data based on few-shot image classification. The results of a case study confirmed that eyeNotate
effectively enables fixation-to-AOI mapping: users liked the basic functionality and interaction design, and the validity and reliability of users' annotations were high.
However, we observed that providing AOI label suggestions in the IML-support version did not increase the efficiency, likely because of performance issues of the model that led
to low trust among the trained annotators. Still, our results suggested that FSL bears great potential for initiating interactive data annotation. Overall, the task completion times were
low, with 1.11 s per annotation (best case) to 2.41 s (worst case). Participants identified constrained model performance as the main hindering factor, especially problems with
similar-looking AOIs. This limitation was confirmed in our post hoc machine learning experiment. Future research should aim to develop or integrate more sophisticated computer
vision methods that can cope with the dynamic and complex nature of mobile eye
19. Brône, G.; Oben, B.; Goedemé, T. Towards a More Effective Method for Analyzing Mobile Eye-Tracking Data: Integrating Gaze Data with Object Recognition Algorithms. In Proceedings of the PETMEI ’11: 1st International Workshop on Pervasive Eye Tracking & Mobile Eye-Based Interaction, Beijing, China, 18 September 2011; pp. 53–56. [CrossRef]
20. Pontillo, D.F.; Kinsman, T.B.; Pelz, J.B. SemantiCode: Using Content Similarity and Database-Driven Matching to Code Wearable Eyetracker Gaze Data. In Proceedings of the ETRA ’10: 2010 Symposium on Eye-Tracking Research & Applications, Austin, TX, USA, 22–24 March 2010; pp. 267–270. [CrossRef]
21. Huang, C.M.; Mutlu, B. Anticipatory Robot Control for Efficient Human-Robot Collaboration. In Proceedings of the HRI ’16: 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Christchurch, New Zealand, 7–10 March 2016; pp. 83–90.
22. Barz, M.; Kapp, S.; Kuhn, J.; Sonntag, D. Automatic Recognition and Augmentation of Attended Objects in Real-time using Eye Tracking and a Head-mounted Display. In Proceedings of the ETRA ’21 Adjunct: ACM Symposium on Eye Tracking Research and Applications, Virtual, 25–27 May 2021; p. 4. [CrossRef]
23. Desmond, M.; Muller, M.; Ashktorab, Z.; Dugan, C.; Duesterwald, E.; Brimijoin, K.; Finegan-Dollak, C.; Brachman, M.; Sharma, A.; Joshi, N.N.; et al. Increasing the Speed and Accuracy of Data Labeling Through an AI Assisted Interface. In Proceedings of the IUI ’21: 26th International Conference on Intelligent User Interfaces, College Station, TX, USA, 14–17 April 2021; pp. 392–401. [CrossRef]
24. Dudley, J.J.; Kristensson, P.O. A Review of User Interface Design for Interactive Machine Learning. ACM Trans. Interact. Intell. Syst. 2018, 8, 37. [CrossRef]
25. Sonntag, D.; Barz, M.; Gouvêa, T. A look under the hood of the Interactive Deep Learning Enterprise (No-IDLE). arXiv 2024, arXiv:2406.19054.
26. Amershi, S.; Cakmak, M.; Knox, W.B.; Kulesza, T. Power to the People: The Role of Humans in Interactive Machine Learning. AI Mag. 2014, 35, 105–120. [CrossRef]
27. Toyama, T. Towards Wearable Attention-Aware Systems in Everyday Environments. Ph.D. Thesis, Technische Universität Kaiserslautern, Kaiserslautern, Germany, 2015.
28. Lander, C.; Löchtefeld, M.; Krüger, A. HEYEbrid: A Hybrid Approach for Mobile Calibration-Free Gaze Estimation. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 1, 29. [CrossRef]
29. Kassner, M.; Patera, W.; Bulling, A. Pupil: An Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-Based Interaction. In Proceedings of the UbiComp ’14 Adjunct: 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, Seattle, WA, USA, 13–17 September 2014; pp. 1151–1160. [CrossRef]
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
31. Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. VGGFace2: A Dataset for Recognising Faces across Pose and Age. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 67–74. [CrossRef]
32. Chong, E.; Chanda, K.; Ye, Z.; Southerland, A.; Ruiz, N.; Jones, R.M.; Rozga, A.; Rehg, J.M. Detecting Gaze Towards Eyes in Natural Social Interactions and Its Use in Child Assessment. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 20. [CrossRef]
33. Callemein, T.; Van Beeck, K.; Brône, G.; Goedemé, T. Automated Analysis of Eye-Tracker-Based Human-Human Interaction Studies. In Proceedings of the Information Science and Applications 2018, Hong Kong, China, 25–27 June 2018; Kim, K.J., Baek, N., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2019; pp. 499–509. [CrossRef]
34. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
35. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. (IJCV) 2015, 115, 211–252. [CrossRef]
36. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28, pp. 91–99.
37. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; Volume 8693, pp. 740–755. [CrossRef]
38. Barz, M.; Sonntag, D. Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze. Sensors 2021, 21, 4143. [CrossRef]
39. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397. [CrossRef]
40. Batliner, M.; Hess, S.; Ehrlich-Adám, C.; Lohmeyer, Q.; Meboldt, M. Automated areas of interest analysis for usability studies of tangible screen-based user interfaces using mobile eye tracking. Artif. Intell. Eng. Des. Anal. Manuf. 2020, 34, 505–514. [CrossRef]
41. De Beugher, S.; Ichiche, Y.; Brône, G.; Goedemé, T. Automatic Analysis of Eye-Tracking Data Using Object Detection Algorithms. In Proceedings of the UbiComp ’12: 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 677–680. [CrossRef]
42. De Beugher, S.; Brône, G.; Goedemé, T. Automatic analysis of in-the-wild mobile eye-tracking experiments using object, face and person detection. In Proceedings of the 2014 International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, 5–8 January 2014; Volume 1, pp. 625–633.
43. Evans, K.M.; Jacobs, R.A.; Tarduno, J.A.; Pelz, J.B. Collecting and Analyzing Eye-Tracking Data in Outdoor Environments. J. Eye Mov. Res. 2012, 5, 19. [CrossRef]
44. Fong, A.; Hoffman, D.; Ratwani, R.M. Making Sense of Mobile Eye-Tracking Data in the Real-World: A Human-in-the-Loop Analysis Approach. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2016, 60, 1569–1573. [CrossRef]
45. Panetta, K.; Wan, Q.; Rajeev, S.; Kaszowska, A.; Gardony, A.L.; Naranjo, K.; Taylor, H.A.; Agaian, S. ISeeColor: Method for Advanced Visual Analytics of Eye Tracking Data. IEEE Access 2020, 8, 52278–52287. [CrossRef]
46. Kurzhals, K.; Rodrigues, N.; Koch, M.; Stoll, M.; Bruhn, A.; Bulling, A.; Weiskopf, D. Visual Analytics and Annotation of Pervasive Eye Tracking Video. In Proceedings of the ETRA ’20 Full Papers: ACM Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, 2–5 June 2020; p. 9. [CrossRef]
47. Kurzhals, K. Image-Based Projection Labeling for Mobile Eye Tracking. In Proceedings of the ACM Symposium on Eye Tracking Research and Applications, Virtual, 25–27 May 2021; Association for Computing Machinery: New York, NY, USA, 2021.
48. Steil, J.; Huang, M.X.; Bulling, A. Fixation Detection for Head-Mounted Eye Tracking Based on Visual Similarity of Gaze Targets. In Proceedings of the ETRA ’18: 2018 ACM Symposium on Eye Tracking Research & Applications, Warsaw, Poland, 14–17 June 2018; p. 9. [CrossRef]
49. Palotai, Z.; Láng, M.; Sárkány, A.; Tősér, Z.; Sonntag, D.; Toyama, T.; Lőrincz, A. LabelMovie: Semi-supervised machine annotation tool with quality assurance and crowd-sourcing options for videos. In Proceedings of the 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), Klagenfurt, Austria, 18–20 June 2014; pp. 1–4. [CrossRef]
50. Kopácsi, L.; Dobolyi, A.; Fóthi, A.; Keller, D.; Varga, V.; Lőrincz, A. RATS: Robust Automated Tracking and Segmentation of Similar Instances. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2021, Bratislava, Slovakia, 14–17 September 2021; Farkaš, I., Masulli, P., Otte, S., Wermter, S., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12893, pp. 507–518. [CrossRef]
51. Barz, M.; Moniri, M.M.; Weber, M.; Sonntag, D. Multimodal multisensor activity annotation tool. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp Adjunct 2016, Heidelberg, Germany, 12–16 September 2016; Lukowicz, P., Krüger, A., Bulling, A., Lim, Y.K., Patel, S.N., Eds.; Association for Computing Machinery: New York, NY, USA, 2016; pp. 17–20. [CrossRef]
52. Dutta, A.; Zisserman, A. The VIA Annotation Software for Images, Audio and Video. In Proceedings of the MM ’19: 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2276–2279. [CrossRef]
53. Schoeffmann, K.; Hudelist, M.A.; Huber, J. Video Interaction Tools: A Survey of Recent Work. ACM Comput. Surv. 2015, 48, 34. [CrossRef]
54. Qvarfordt, P. Gaze-Informed Multimodal Interaction. In The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations; Association for Computing Machinery and Morgan & Claypool: New York, NY, USA, 2017; Volume 1, pp. 365–402.
55. Bulling, A. Eye Movement Analysis for Context Inference and Cognitive-Awareness: Wearable Sensing and Activity Recognition Using Electrooculography. Ph.D. Thesis, ETH Zurich, Zurich, Switzerland, 2010; ISBN 978-3-909386-34-5. [CrossRef]
56. André, E.; Chai, J. Introduction to the Special Section on Eye Gaze and Conversation. ACM Trans. Interact. Intell. Syst. 2013, 3, 2. [CrossRef]
57. Bulling, A.; Weichel, C.; Gellersen, H. EyeContext: Recognition of High-level Contextual Cues from Human Visual Behaviour. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 27 April–2 May 2013; pp. 305–308. [CrossRef]
58. Steil, J.; Bulling, A. Discovery of Everyday Human Activities from Long-Term Visual Behaviour Using Topic Models. In Proceedings of the UbiComp ’15: 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 7–11 September 2015; pp. 75–85. [CrossRef]
59. Steil, J.; Müller, P.; Sugano, Y.; Bulling, A. Forecasting User Attention during Everyday Mobile Interactions Using Device-Integrated and Wearable Sensors. In Proceedings of the MobileHCI ’18: 20th International Conference on Human-Computer Interaction with Mobile Devices and Services, Barcelona, Spain, 3–6 September 2018; p. 13. [CrossRef]
60. Toyama, T.; Kieninger, T.; Shafait, F.; Dengel, A. Gaze Guided Object Recognition Using a Head-Mounted Eye Tracker. In Proceedings of the ETRA ’12: Symposium on Eye Tracking Research and Applications, Santa Barbara, CA, USA, 28–30 March 2012; pp. 91–98. [CrossRef]
61. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
62. Toyama, T.; Sonntag, D. Towards Episodic Memory Support for Dementia Patients by Recognizing Objects, Faces and Text in Eye Gaze. In Proceedings of the KI 2015: Advances in Artificial Intelligence—38th Annual German Conference on AI, Dresden, Germany, 21–25 September 2015; Hölldobler, S., Krötzsch, M., Peñaloza, R., Rudolph, S., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9324, pp. 316–323. [CrossRef]
63. Fathi, A.; Li, Y.; Rehg, J.M. Learning to Recognize Daily Actions Using Gaze. In Proceedings of the Computer Vision—ECCV 2012, Florence, Italy, 7–13 October 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7572, pp. 314–327. [CrossRef]
64. Li, Y.; Ye, Z.; Rehg, J.M. Delving into egocentric actions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 287–295. [CrossRef]
65. Li, Y.; Liu, M.; Rehg, J.M. In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11209, pp. 639–655. [CrossRef]
66. Shiga, Y.; Toyama, T.; Utsumi, Y.; Kise, K.; Dengel, A. Daily Activity Recognition Combining Gaze Motion and Visual Features. In Proceedings of the UbiComp ’14 Adjunct: 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, Seattle, WA, USA, 13–17 September 2014; pp. 1103–1111. [CrossRef]
67. Prasov, Z.; Chai, J.Y. What’s in a Gaze? The Role of Eye-Gaze in Reference Resolution in Multimodal Conversational Interfaces. In Proceedings of the IUI ’08: 13th International Conference on Intelligent User Interfaces, Gran Canaria, Spain, 13–16 January 2008; pp. 20–29. [CrossRef]
68. Baur, T.; Mehlmann, G.; Damian, I.; Lingenfelser, F.; Wagner, J.; Lugrin, B.; André, E.; Gebhard, P. Context-Aware Automated Analysis and Annotation of Social Human–Agent Interactions. ACM Trans. Interact. Intell. Syst. 2015, 5, 11. [CrossRef]
69. Thomason, J.; Sinapov, J.; Svetlik, M.; Stone, P.; Mooney, R.J. Learning Multi-Modal Grounded Linguistic Semantics by Playing “I Spy”. In Proceedings of the IJCAI’16: Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; AAAI Press: Washington, DC, USA, 2016; pp. 3477–3483.
70. Chang, Y.; Zhao, Y.; Dong, M.; Wang, Y.; Lu, Y.; Lv, Q.; Dick, R.P.; Lu, T.; Gu, N.; Shang, L. MemX: An Attention-Aware Smart Eyewear System for Personalized Moment Auto-Capture. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 23. [CrossRef]
71. Meyer, J.; Frank, A.; Schlebusch, T.; Kasneci, E. A CNN-Based Human Activity Recognition System Combining a Laser Feedback Interferometry Eye Movement Sensor and an IMU for Context-Aware Smart Glasses. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 5, 24. [CrossRef]
72. Wertheimer, D.; Tang, L.; Hariharan, B. Few-Shot Classification With Feature Map Reconstruction Networks. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8008–8017. [CrossRef]
73. Jarodzka, H.; Holmqvist, K.; Gruber, H. Eye tracking in Educational Science: Theoretical frameworks and research agendas. J. Eye Mov. Res. 2017, 10, 18. [CrossRef]
74. Malone, S.; Altmeyer, K.; Vogel, M.; Brünken, R. Homogeneous and heterogeneous multiple representations in equation-solving problems: An eye-tracking study. J. Comput. Assist. Learn. 2020, 36, 781–798. [CrossRef]
75. Reingold, E.M.; Sheridan, H. Eye movements and visual expertise in chess and medicine. In The Oxford Handbook of Eye Movements; Liversedge, S.P., Gilchrist, I., Everling, S., Eds.; Oxford University Press: Oxford, UK, 2011; pp. 524–550. [CrossRef]
76. Jarodzka, H.; van Gog, T.; Dorr, M.; Scheiter, K.; Gerjets, P. Learning to see: Guiding students’ attention via a Model’s eye movements fosters learning. Learn. Instr. 2013, 25, 62–70. [CrossRef]
77. Thees, M.; Altmeyer, K.; Kapp, S.; Rexigel, E.; Beil, F.; Klein, P.; Malone, S.; Brünken, R.; Kuhn, J. Augmented Reality for Presenting Real-Time Data During Students’ Laboratory Work: Comparing a Head-Mounted Display With a Separate Display. Front. Psychol. 2022, 13, 16. [CrossRef] [PubMed]
78. Salminen-Saari, J.F.A.; Garcia Moreno-Esteva, E.; Haataja, E.; Toivanen, M.; Hannula, M.S.; Laine, A. Phases of collaborative mathematical problem solving and joint attention: A case study utilizing mobile gaze tracking. ZDM—Math. Educ. 2021, 53, 771–784. [CrossRef]
79. Fleischer, T.; Deibl, I.; Moser, S.; Strahl, A.; Maier, S.; Zumbach, J. Mobile Eye Tracking during Experimenting with Digital Scaffolding—Gaze Shifts between Augmented Reality and Experiment during Zinc Iodide Electrolysis Set-Up. Educ. Sci. 2023, 13, 20, 170. [CrossRef]
80. Brooke, J. SUS: A ’Quick and Dirty’ Usability Scale. In Usability Evaluation in Industry, 1st ed.; CRC Press: London, UK, 1996; p. 6.
81. Fleiss, J.L. Measuring nominal scale agreement among many raters. Psychol. Bull. 1971, 76, 378–382. [CrossRef]
82. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [CrossRef]
83. Bangor, A.; Kortum, P.; Miller, J. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. J. Usability Stud. 2009, 4, 114–123.
84. Braun, V.; Clarke, V. Thematic analysis. In APA Handbook of Research Methods in Psychology, Volume 2: Research Designs: Quantitative, Qualitative, Neuropsychological, and Biological; APA Handbooks in Psychology®; American Psychological Association: Washington, DC, USA, 2012; pp. 57–71. [CrossRef]
85. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [CrossRef]
86. Le, H.H.; Nguyen, D.M.H.; Bhatti, O.S.; Kopácsi, L.; Ngo, T.P.; Nguyen, B.T.; Barz, M.; Sonntag, D. I-MPN: Inductive message passing network for efficient human-in-the-loop annotation of mobile eye tracking data. Sci. Rep. 2025, 15, 14192. [CrossRef]
87. Kopácsi, L.; Barz, M.; Bhatti, O.S.; Sonntag, D. IMETA: An Interactive Mobile Eye Tracking Annotation Method for Semi-Automatic Fixation-to-AOI Mapping. In Proceedings of the IUI ’23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, Sydney, NSW, Australia, 27–31 March 2023; pp. 33–36. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.