
Perceived Text Relevance Estimation Using Scanpaths and GNNs

Author: Mohamed Selim, Abdulrahman; Bhatti, Omair Shahazad; Barz, Michael; Sonntag, Daniel
Publisher: Zenodo
DOI: 10.1145/3678957.3685736
Source: https://zenodo.org/records/17258315/files/3678957.3685736.pdf
Perceived Text Relevance Estimation Using Scanpaths and GNNs
Abdulrahman Mohamed Selim
abdulrahman.mohamed@dfki.de
German Research Center for Artificial Intelligence (DFKI)
Saarbrücken, Germany
Omair Shahzad Bhatti
omair_shahzad.bhatti@dfki.de
German Research Center for Artificial Intelligence (DFKI)
Saarbrücken, Germany
Michael Barz
michael.barz@dfki.de
German Research Center for Artificial Intelligence (DFKI)
Saarbrücken, Germany
University of Oldenburg
Oldenburg, Germany
Daniel Sonntag
daniel.sonntag@dfki.de
German Research Center for Artificial Intelligence (DFKI)
Saarbrücken, Germany
University of Oldenburg
Oldenburg, Germany
ABSTRACT
A scanpath is an important concept in eye tracking that represents a person’s eye movements in a graph-like structure. Passive gaze-based interfaces, in which users do not consciously interact using their eyes, typically interpret users’ scanpaths to enable adaptive and personalised interaction. Despite the benefits of graph neural networks (GNNs) in graph processing, this technology has not been considered for that purpose. An example application is perceived relevance estimation, which still suffers from low classification performance. In this work, we investigate how and whether GNNs can be used to analyse scanpaths for readers’ perceived relevance estimation using the gazeRE dataset. This dataset contains eye tracking data from 24 participants, who rated the relevance of 12 short and 12 long documents in relation to a given query. The relevance was assigned either to an entire short document or to each paragraph within a long document, which allowed us to investigate two different GNN tasks. For comparison, we reproduced the gazeRE baseline using Random Forest and Support Vector classifiers, and an additional Convolutional Neural Network (CNN) from the literature. All models were evaluated using leave-users-out cross-validation. For short documents, the GNNs surpassed the baseline methods, with certain experiments showing an absolute balanced accuracy improvement of 7.6% and 14.3% over the CNN and gazeRE baselines, respectively. However, similar improvements were not observed in long documents. This work investigates and discusses the future potential of using GNNs as a scanpath analysis method for passive gaze-based applications, such as implicit relevance estimation.
CCS CONCEPTS
• Computing methodologies → Neural networks; • Human-centered computing → Human computer interaction (HCI).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICMI ’24, November 04–08, 2024, San Jose, Costa Rica
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0462-8/24/11
https://doi.org/10.1145/3678957.3685736
KEYWORDS
Eye Tracking; Scanpath; GNN; Passive Gaze-based Application
ACM Reference Format:
Abdulrahman Mohamed Selim, Omair Shahzad Bhatti, Michael Barz, and Daniel Sonntag. 2024. Perceived Text Relevance Estimation Using Scanpaths and GNNs. In INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION (ICMI ’24), November 04–08, 2024, San Jose, Costa Rica. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3678957.3685736
1 INTRODUCTION
Scanpaths refer to traces of a person’s eye movement across space over a period of time [19]. A scanpath consists of a series of alternating fixations and saccades [4]. Fixations describe the state when the eyes remain relatively still for a time period lasting somewhere between a few tens of milliseconds up to a few seconds, while saccades are the rapid eye movements from one fixation to another [19]. Scanpaths are among the most common methods for analysing and representing human eye movements [4, 27]. Figure 1 is an example of a visual encoding format where a scanpath is projected on top of a stimulus, e.g. a piece of text, where fixations are shown as numbered circles, and saccades are lines connecting them. Graph representation is another common format, where a scanpath’s gaze data is grouped, e.g. by clustering neighbouring fixations, to create nodes and edges representing a graph structure [30].
Graph Neural Networks (GNNs) are deep-learning models that process graph structures and capture their dependence via message passing between nodes [43]. GNNs have shown good performance in multiple fields, e.g., natural science, social science, and bioinformatics [43]. Despite this, GNNs have not been properly investigated in processing scanpath graph structures. There have been attempts in active gaze-based applications, e.g. [35]. However, we only found one publication, i.e. [38], that used GNNs with scanpath data in a passive gaze-based application, i.e., applications where eye tracking is used as a supporting modality to understand a user’s behaviour and activities without explicit gaze-based interaction [13, 34].
An important area of passive gaze-based applications is to understand and monitor cognitive processes [29], e.g. implicit relevance estimation during reading [1, 2, 6] or during decision making [15]. Detecting a user’s perceived relevance towards a piece of media is often used to improve the system performance and return user-tailored results, e.g., in recommender [33] and information retrieval [32] systems. Perceived relevance estimation is also used in human-computer interaction (HCI), e.g. to create adaptive user interfaces (UIs) [14, 15]. However, explicitly detecting perceived relevance using questionnaires or interviews could have a negative impact on a user’s cognitive load [33], which is why implicit relevance estimation is seen as a better alternative because it requires no extra effort on a user’s behalf [33, 39].
Figure 1: A manually generated scanpath over a piece of text as a simplified version of a real-world example.
In this paper, we investigated using GNNs for scanpath processing to estimate a user’s perceived relevance while reading text documents using the gazeRE dataset [1]. This dataset contains data from two different tasks using two different text corpora, where a user’s perceived relevance to a given query is estimated either for each paragraph in a document or for the document as a whole. This enabled us to treat each full document as a single graph and investigate a graph classification task, i.e. predicting the label assigned to the full document, and a node classification task, i.e. predicting the label assigned to each single paragraph. Using a GNN requires converting scanpaths into suitable graph structures. We implemented and compared four different scanpath graph structures to use as inputs to our GNNs. To evaluate our proposed method, we implemented two baseline approaches. The first approach reproduced the setup of Barz et al. [1], which used 17 eye tracking features with random forest (RF) and support vector machine (SVM) classifiers. We extended this setup by investigating an additional feature subset. The second approach replicated the method of Bhattacharya et al. [3], which used the VGG19 Convolutional Neural Network (CNN) architecture [36]. We extended this approach by examining its performance not only on short documents, similar to those used in their original experiment, but also on long documents.
We designed our experiments to explore two primary research topics in a single, coherent framework. The first topic examined the feasibility of using GNNs to process scanpaths for perceived relevance estimation, focusing on both graph and node classification tasks. The second topic involved a comparative analysis of the performance of GNNs with other machine learning algorithms, namely SVM, RF, and CNN, in the context of perceived relevance estimation. This analysis aimed to provide a comprehensive understanding of the performance of these different approaches.
2 RELATED WORK
In the literature, we found different methods to construct graph structures out of scanpaths for passive gaze-based applications. Lan et al. [25] used a CNN to process complex graph structures where each gaze point represented a node for a stimulus and task inference application. Ma et al. [28] treated each word in a reading task as a node to structure a scanpath as a graph to measure reading comprehension using network metrics such as density, centrality, and small-worldness. Cantoni et al. [8] focused on modelling user viewing behaviour for user authentication by splitting the stimuli into 7x6 grids and used the centre of each grid cell to combine the different fixations into graph nodes; each node had a weight representing the total number of fixations and total fixation duration within it. Khosravan et al. [21] used the BIRCH clustering algorithm [42] to generate a less dense graph structure out of scanpaths on medical images to simplify the scanpaths without changing their topology; they encoded the number of nodes in each cluster and the total duration spent within each cluster in the graph as a representation of the attention in a particular region. Despite structuring scanpaths as graphs being common for passive gaze-based applications, we only found one paper by Wang et al. [38] that proposed a gaze-guided GNN to process graphs created by embedding the raw gaze data with image patches from x-ray scans.
GNNs have emerged as a powerful tool for learning with graph-structured data, such as molecules and social, biological, and financial networks; the key to this learning process is the effective representation of the graph structure [41, 43]. GNNs operate on a message passing scheme; each node calculates a new feature vector that contains structural information of its neighbouring nodes by aggregating the feature vectors of these neighbouring nodes; to represent an entire graph, a pooling method is used, such as summing the representation vectors of all nodes in the graph [41]. Zhou et al. [43] described a general GNN task pipeline, which consists of: defining the graph structure; determining the graph type (e.g., directed or undirected graph); identifying the task type, whether node-level tasks that focus on the graph nodes (e.g., node classification), edge-level tasks that focus on the graph edges (e.g., predicting if an edge exists between two nodes), or graph-level tasks that focus on the full graph structure (e.g., graph classification); and finally, building the GNN model. We wanted to combine GNNs and scanpath graph representations (without stimulus information) for a passive gaze-based application. The application we decided to focus on was estimating a user’s perceived relevance while reading to see whether this approach could help improve the field’s current state.
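
The message passing and pooling scheme described above can be illustrated with a minimal sketch. It is not the model used in this work: it has no learned weights, it assumes mean aggregation over a node and its in-neighbours, and the names `message_passing_step` and `sum_pool` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_step(
    features: List[Vector], edges: List[Tuple[int, int]]
) -> List[Vector]:
    """One round of mean-aggregation message passing on a directed graph.

    Assumption for this sketch: a node's new vector is the element-wise mean
    of its own vector and the vectors of the nodes that point to it.
    """
    incoming: Dict[int, List[int]] = {i: [] for i in range(len(features))}
    for src, dst in edges:
        incoming[dst].append(src)

    updated = []
    for node, own in enumerate(features):
        messages = [own] + [features[src] for src in incoming[node]]
        updated.append([sum(col) / len(messages) for col in zip(*messages)])
    return updated


def sum_pool(features: List[Vector]) -> Vector:
    """Graph-level readout: sum the representation vectors of all nodes."""
    return [sum(col) for col in zip(*features)]


# Toy graph with three nodes, two features per node, and edges 0 -> 1 -> 2.
node_features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
directed_edges = [(0, 1), (1, 2)]

hidden = message_passing_step(node_features, directed_edges)
graph_vector = sum_pool(hidden)
print(hidden)        # per-node representations (node-level tasks)
print(graph_vector)  # single graph representation (graph-level tasks)
```

The per-node output corresponds to what a node-level task would classify, while the pooled vector corresponds to the input of a graph-level classifier.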
Previous studies showed that eye tracking is a valid modality for estimating a person’s perceived relevance towards a text document with respect to a previously read trigger question. Buscher et al. [6] investigated the relation between a user’s reading behaviour and their perceived relevance towards a document. They found that users tend to skim irrelevant documents but exert continuous reading behaviour while reading relevant ones. Gwizdka [17, 18] introduced the g-REL corpus, which is a collection of short text paragraphs and corresponding questions. They used it to investigate the relation between eye movements and a user’s perceived relevance while reading and were able to confirm the prior findings of Buscher et al. [6].
Bhattacharya et al. [3] used the g-REL corpus and encoded users’ scanpath data as images to estimate their perceived relevance using a CNN. They evaluated six different pre-trained CNNs but concluded that VGG19 [36] produced the best results. Afterwards, Bhattacharya et al. [2] introduced two novel convex hull-based scanpath features to estimate a user’s perceived relevance while reading short news articles. They conducted two separate data collection studies where 24 participants and 120 news articles were used in the first study, and 24 participants and 42 news articles were used in the second study. They used 10-fold cross-validation with an RF classifier for a binary classification over three separate subsets, i.e. Agree where the user’s perceived relevance matched the system relevance, Topical where the news articles were on the topic of interest but did not have the required answer, and All where they used the dataset as a whole. They achieved the best classification performance when they combined their two proposed convex hull features with 15 other eye tracking features from the literature. They reported the best model performance for the Agree subset followed by All, but the Topical subset produced poor results.
Barz et al. [1] extended the prior work of Bhattacharya et al. [2] by investigating using the same 17 features on long documents. They collected data from 24 participants using 12 documents from the g-REL corpus and 12 documents from the Google Natural Questions (GoogleNQ) corpus [24], which is a collection of long documents that require scrolling. Despite the lower model performance, using RF and SVM classifiers, they produced similar findings to Bhattacharya et al. [2] under the same experiment conditions for the g-REL corpus, but were unable to generalise their findings to the GoogleNQ corpus. Perceived relevance estimation is still an ongoing area of passive gaze-based research. It still has open questions regarding topical and long documents, so it is a suitable application domain to investigate and test GNNs for scanpath processing.
3 METHODS
In this paper, we present a novel GNN-based scanpath analysis approach using the gazeRE dataset for a node and a graph classification problem. For the graph classification, we evaluated four different scanpath graph representation formats. However, for the node classification, we evaluated one graph representation format. We evaluated different GNN operators for both tasks. As a baseline, we reproduced the setup reported in [1] using SVM and RF classifiers; we also evaluated these classifiers using only the two convex hull-based features from [2]. In addition, to compare against a neural network, we replicated the VGG19 setup reported in [3].
3.1 Dataset
The gazeRE dataset¹ has eye tracking data from 24 participants for perceived relevance estimation while reading. Each participant read 12 short articles from the g-REL corpus [17] and 12 long articles from the GoogleNQ corpus [24].
The g-REL corpus contains four relevant, four irrelevant, and four topical documents with respect to their accompanying query according to the system label. Each document had between three and five paragraphs (𝜇 = 3.5, 𝜎 = 0.645). Participants were shown a query and had to decide whether the entire document was relevant or irrelevant with respect to the query. A schematic example of a g-REL stimulus is shown in Figure 2a. Out of 288 total trials, 107 were perceived as relevant and 181 as irrelevant by the participants. The Agree subset (where the perceived relevance matched the system relevance) has 181 total trials, with 86 relevant and 95 irrelevant trials. The Topical subset (i.e., documents on the topic of interest but not containing the query answer) has 96 total trials, with 20 relevant and 76 irrelevant trials.
¹ https://github.com/DFKI-Interactive-Machine-Learning/gazeRE-dataset
[Figure 2 panels: (a) g-REL Corpus, showing a query and a single label for the document; (b) GoogleNQ Corpus, showing a query and one label per paragraph (Label 0 to Label 4).]
Figure 2: Schematic example of a stimulus for both g-REL in 2a and GoogleNQ in 2b from the gazeRE dataset [1].
The GoogleNQ corpus contains 12 long documents that require scrolling. One paragraph in each document is relevant, and the remaining paragraphs are topical to the accompanying query. GoogleNQ does not have explicitly irrelevant paragraphs. Each document had between five and seven paragraphs (𝜇 = 5.83, 𝜎 = 0.799). Participants were shown a query and had to decide whether each separate paragraph in a document was relevant or irrelevant with respect to the query. A schematic example of a GoogleNQ stimulus is shown in Figure 2b. GoogleNQ has a total of 288 trials, with 450 relevant and 1230 irrelevant paragraphs. The Agree subset (where the full document perceived relevance matched the system relevance) has 145 total trials, with 248 relevant and 1190 irrelevant paragraphs. Due to all 12 documents having topical paragraphs, GoogleNQ did not have a Topical subset.
We used the participants’ perceived relevance of the text to the query, i.e. relevant or irrelevant, as our labels for the binary classification problem. When we mention the word label moving forward, that is what we are referring to. There are multiple differences between g-REL and GoogleNQ. Each document in g-REL has one label assigned to the full document. However, in GoogleNQ, each document has multiple labels corresponding to the number of paragraphs within each document. Additionally, GoogleNQ does not have explicitly irrelevant paragraphs because all the paragraphs that do not have the answer to the query are on the topic of the query, i.e. Topical. We decided to use this dataset because the differences between the two corpora allowed us to investigate two different GNN task types: Node-level and Graph-level tasks. In both tasks, we treated each full document as a single graph. g-REL was suitable for graph classification because each document had one label assigned to it. GoogleNQ, on the other hand, was suitable for node classification, where we tried to classify the labels assigned to each paragraph in a document.
3.2 Traditional Machine Learning
In order to establish a baseline for comparison, we reproduced the setup reported in [1]. However, in addition to using the same 17 features, shown in Table 1, we investigated using just the convex hull features from Bhattacharya et al. [2], i.e. numbers 16 and 17 in Table 1 because Bhattacharya et al. [2] only evaluated them on short documents and not longer documents such as the GoogleNQ corpus. We used three machine learning algorithms with our two feature sets: the default RF classifier from scikit-learn²; the RF∗ classifier, which is the RF classifier with two additional preprocessing steps, the oversampling technique SMOTE [9] from the imbalanced-learn package [26], and the standardisation feature scaling method to make the features have zero mean and unit variance; in addition to the SVM∗ classifier, which is the default SVM classifier from scikit-learn with the same preprocessing steps as RF∗.
Table 1: Overview of the 17 eye tracking features from [1, 2].
Type Features
Fixation
1. Number of fixations
2. Sum of fixation durations
3. Mean of fixation durations
4. Standard deviation of fixation durations
Saccade
5. Sum of horizontal amplitudes of saccades, normalised by w
6. Sum of vertical amplitudes of saccades, normalised by h
7. Sum of Euclidean distance of normalised saccade amplitudes
8. Ratio of horizontal to vertical amplitudes
9. Average saccade amplitude
10. Horizontal saccade velocity
11. Vertical saccade velocity
12. Saccade velocity
Area
13. Area scanned by summed saccade amplitudes
14. The scanned area normalised by the scan time
15. Number of fixations per scanned area
16. The convex hull area normalised by the scan time
17. Number of fixations per convex hull area
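
As an illustration of the two convex hull features (numbers 16 and 17 in Table 1), the following sketch computes them from a list of fixation coordinates. It is a minimal sketch under stated assumptions, not the implementation from [1, 2]: fixation positions are assumed to be plain (x, y) pairs in arbitrary units, the scan time is assumed to be given in seconds, no further normalisation is applied, and the helper names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convex_hull_features(fixations: List[Point], scan_time: float) -> Tuple[float, float]:
    """Return (hull area / scan time, number of fixations / hull area)."""
    hull_area = polygon_area(convex_hull(fixations))
    if hull_area == 0 or scan_time <= 0:
        raise ValueError("need a non-degenerate hull and a positive scan time")
    return hull_area / scan_time, len(fixations) / hull_area


# Five hypothetical fixations; the interior point (2, 1) does not affect the hull.
example_fixations = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
area_per_time, fixations_per_area = convex_hull_features(example_fixations, scan_time=2.0)
print(area_per_time, fixations_per_area)
```

In this toy example the hull is a 4 by 3 rectangle, so the first feature is the area 12 divided by the scan time, and the second is five fixations divided by the same area.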
3.3 Convolutional Neural Network
We replicated the best-performing setup reported by Bhattacharya et al. [3] using VGG19, which is a variant of the VGG model with 19 weight layers: 16 convolutional layers that capture the spatial patterns in images, followed by three fully connected layers. The architecture uses small 3x3 convolution filters, which allow it to collect more detailed and complex features. We adapted the final output layers to ensure their compatibility with our binary classification task.
In the preprocessing step, we transformed each eye tracking recording into a scanpath image following the methods used by Bhattacharya et al. [3]. For g-REL, each document produced one single image, while for GoogleNQ, we produced a scanpath image for each paragraph independently to ensure that the image dimensions and presentation remained consistent with those used for g-REL. An example of a scanpath image is shown in Figure 3, where each fixation is represented by unique visual markers proportional to its duration, while the saccades are colour-coded to illustrate the sequence of reading movements across the text. Each scanpath image was generated on a 2560x1440 canvas and scaled down to 256x256 as input to the CNN.
² https://scikit-learn.org/stable/
Figure 3: An example of a scanpath visual representation for the CNN. The fixations are represented by different markers based on their duration, while the saccades are colour-coded based on their timestamp.
3.4 Graph Neural Network
3.4.1 Scanpath Graph Representation. In order to use the scanpaths as inputs to our GNNs, we converted the scanpaths into simplified graph structures. The generated graphs were directed (to retain the temporal information of a scanpath) and homogeneous (i.e., all the nodes and all the edges had the same type). The GoogleNQ corpus had a simple conversion process because its documents were used in the node classification task; we treated each document as a single graph, each paragraph represented a node, and the saccades from one paragraph to the next represented the edges. However, each document in the g-REL corpus was accompanied by only one label; we tested four different approaches to generate suitable graphs from the scanpaths: paragraph-based, line-based, cluster-based, and quartile-based.
The Paragraph-based approach, shown in Figure 4a, is the same approach followed in GoogleNQ (to have a common representation between both corpora) where each paragraph represented a node, and the saccades from one paragraph to the next represented the edges. The Line-based approach, shown in Figure 4d, tries to preserve the structure of the text and reading patterns, which could be seen as an extension of Ma et al. [28] where they treated each word as a node. The Cluster-based approach was inspired by Khosravan et al. [21], but instead of using the BIRCH clustering algorithm [42], we used the Affinity Propagation algorithm [16]; Figure 4b shows a very simplified depiction of this approach. The Quartile-based approach divides the full-text document into four equal-sized nodes, as shown in Figure 4c; this is similar to splitting the stimuli into grids as reported by Cantoni et al. [8].
The documents in both corpora contained additional white space around the text along the X-axis. We ignored this extra white space, focusing only on the main body of the text. We limited the gaze points to the text body and not the background or document title. Across the different graph generation approaches, we used the number of fixations and total fixation duration within each node as node features similar to [8, 21]. In addition, we computed the same 17 features shown in Table 1 for each node for all graph generation methods except for the line-based approach because we could not compute the area-based features and only computed the fixation and saccade-based features. To be able to use graph structures as inputs to a GNN, we need to define two parameters: Node Features (x) and Edges (E). The parameter x, shown in Algorithm 1, contains the node features, where each node has a feature vector. The parameter E, shown in Algorithm 1, contains the edges in the graph, which are directed in our use case.
(a) Paragraph-based (b) Cluster-based (c) Quartile-based (d) Line-based
Figure 4: Schematic examples of our four scanpath graph representations where each coloured element is a different node.
Algorithm 1: Graph Definition
1 Let $Graph = (V, E)$ be a directed graph, where $V$ is the set of nodes and $E$ is the set of edges.
2 $V = \{node^{(0)}, node^{(1)}, \ldots, node^{(n-1)}\}$ for $n$ nodes.
3 $E = \{(node^{(i)}, node^{(j)})\}$ for each edge from $node^{(i)}$ to $node^{(j)}$.
4 $x = (x^{(0)}, \ldots, x^{(n-1)}) = ((f_0^{(0)}, \ldots, f_m^{(0)}), \ldots, (f_0^{(n-1)}, \ldots, f_m^{(n-1)}))$ for $n$ nodes and $m$ node features.
5 Each $node^{(i)}$ is represented by $x^{(i)}$.
6 All connections from $node^{(i)}$ to other nodes in the graph are represented by edges $(node^{(i)}, node^{(j)})$.
7 For directed graphs $(node^{(i)}, node^{(j)}) \neq (node^{(j)}, node^{(i)})$.
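
The graph definition in Algorithm 1 can be made concrete for the paragraph-based representation with a short sketch. It assumes a hypothetical input format in which every fixation has already been assigned to a paragraph index, uses only the two basic node features (number of fixations and total fixation duration), and keeps each directed edge once; whether repeated transitions are deduplicated or weighted in the original implementation is not stated here, so that choice is an assumption.

```python
from typing import List, Tuple

# Hypothetical fixation record: (paragraph_index, duration_ms), in temporal order.
Fixation = Tuple[int, float]


def build_paragraph_graph(
    fixations: List[Fixation], n_paragraphs: int
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Return node features x and directed edges E for one document."""
    # x[i] = [number of fixations, total fixation duration] for paragraph i.
    x = [[0.0, 0.0] for _ in range(n_paragraphs)]
    for paragraph, duration in fixations:
        x[paragraph][0] += 1.0
        x[paragraph][1] += duration

    # A saccade that leaves one paragraph and lands in another becomes a
    # directed edge (source paragraph, target paragraph).
    edges: List[Tuple[int, int]] = []
    for (src, _), (dst, _) in zip(fixations, fixations[1:]):
        if src != dst and (src, dst) not in edges:
            edges.append((src, dst))
    return x, edges


# A reader who moves through paragraphs 0, 1, 2 and then regresses to paragraph 1.
scanpath = [(0, 200.0), (0, 150.0), (1, 300.0), (2, 100.0), (1, 250.0)]
x, E = build_paragraph_graph(scanpath, n_paragraphs=3)
print(x)  # one feature vector per node
print(E)  # (1, 2) and (2, 1) are distinct edges because the graph is directed
```

In PyTorch Geometric, `x` and `E` would typically be converted to the `x` and `edge_index` tensors of a `Data` object, with `edge_index` holding the source and target indices in two rows.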
3.4.2 GNN Model Architectures. We used the same GNN network architectures with both the graph and node classification problems. Wu et al. [40] stated in their review that an open research question is whether using deeper GNNs is actually a good strategy for learning graph data because the performance of some networks tended to drop with an increase in the number of graph convolutional layers. We kept our networks simple to investigate whether basic network structures could yield meaningful results and insights. We used PyTorch Geometric (PyG) for our implementations and used their documentation³ as a starting point. We used the Adam Optimiser [22] and the Cross-entropy Loss in our GNN architecture. Because Cross-entropy Loss⁴ in PyTorch already applies a softmax (LogSoftmax) to the network outputs internally, we did not add an extra activation function. The graph classification GNN is shown in Figure 5a, while the node classification GNN is shown in Figure 5b. We had two main differences between the graph and node classification networks: (1) for graph classification, we used an additional readout layer, i.e. Global Average Pooling, which produces a single global representation of each graph from its nodes, and (2) we used different normalisation strategies between both problems. According to Chen et al. [10], graph classification problems perform better when the node features are normalised using batch-based normalisation, while node classification problems perform better when the features are normalised using graph-based normalisation. We used BatchNorm [20] as our batch-based normalisation for the graph classification, and GraphNorm [7] as our graph-based normalisation for the node classification.
³ https://pytorch-geometric.readthedocs.io/en/latest/get_started/colabs.html
⁴ https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
We evaluated various graph convolutional operators such as those from Morris et al. [31] and Kipf and Welling [23]⁵. However, using the GATv2 operator from Brody et al. [5] performed the best, and we only focus on it here. The standard graph attentional (GAT) operator [37] assigns a weight, i.e. an attention coefficient, to each node’s neighbours, which indicates the importance of each neighbouring node, and by using multiple attention heads, each head can learn a different type of information concerning the neighbourhood. While the GAT operator is computationally efficient, it has static attention, meaning the weights are fixed and cannot adapt based on the query or context. To address this, we utilised the GATv2 operator, which allows for dynamic changes in the weights. This flexibility enables the model to adapt better and has been shown to outperform the traditional GAT operator. Our networks, as shown in Figure 5, consisted of three GATv2Conv layers, each followed by an ELU activation function because GATv2 incorporates LeakyReLU in its computations, a Dropout function before the last layer to prevent overfitting, and a final Linear layer, which mapped the outputs from the convolution layers to the number of classes.
4 EXPERIMENT
In all our experiments, we split g-REL into All, Agree, and Topical subsets, and GoogleNQ into All and Agree subsets, similar to [1, 2]. All contained the whole dataset; Agree contained the data where the user’s perceived relevance matched the system relevance; and Topical contained the data where the text was on the topic of interest, but without having the correct query answer.
⁵ Their respective architectures and results are available in the Appendix in the supplementary material.
⁶ https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedGroupKFold.html
⁷ https://optuna.org/
[Figure 5 panels: (a) The Graph Classification Networks, with components Input graph, three GATv2Conv layers of sizes (2, H), (H, H) and (H, H), ELU activations, Batch Normalisation, Global Average Pooling, Dropout, Linear (H, 2), and Predicted Graph Label; (b) The Node Classification Networks, with components Input graph, three GATv2Conv layers of sizes (2, H), (H, H) and (H, H), ELU activations, Graph Normalisation, Dropout, Linear (H, 2), and Predicted Nodes Labels.]
Figure 5: Our Graph Convolution Neural Networks
We implemented a 5-fold stratified leave-users-out cross-validation using scikit-learn⁶ to split the data into non-overlapping training and testing subsets. We used leave-users-out cross-validation because, for physiological data, traditional k-fold cross-validation might lead to overestimating the model performance [11, 12]. For each fold with the traditional machine learning models (i.e., RF and SVM), 80% of the data were used for training, and 20% were used for testing. However, with the CNN and GNN models, we used nested cross-validation for hyperparameter optimisation using the Optuna framework⁷. The outer cross-validation loop split the data into 20% for testing, and an inner cross-validation loop split the remaining 80% of the data into 64% for training and 16% for validation. In the inner cross-validation loop, each model configuration was tested on five different training and validation data splits, and then the validation performance metrics were averaged over the five folds. The configuration with the best average performance metric was then used to retrain the model on the whole training/validation data and produce the testing results; this was repeated five times, once for each testing subset. The pseudocode, shown in Algorithm 2, summarises this process. We used 5-fold leave-users-out cross-validation because five is a commonly used value, it is computationally efficient for nested cross-validation, and our data is quite small for two 10-fold cross-validation loops. Our code is publicly available on GitHub⁸.
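
The nested leave-users-out splitting described above can be sketched in plain Python. This is a simplified stand-in, not the authors' code: it assigns users to folds round-robin and is not stratified by label, whereas the experiments used scikit-learn's StratifiedGroupKFold; the participant identifiers and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

InnerSplit = Tuple[List[str], List[str]]


def group_folds(users: Sequence[str], k: int) -> List[List[str]]:
    """Partition the unique users into k disjoint folds (round-robin, not stratified)."""
    unique = sorted(set(users))
    return [unique[i::k] for i in range(k)]


def nested_leave_users_out(
    users: Sequence[str], k: int = 5
) -> List[Tuple[List[str], List[InnerSplit]]]:
    """For each outer test fold, list the inner (training users, validation users) splits."""
    all_users = sorted(set(users))
    splits = []
    for test_users in group_folds(all_users, k):
        # Roughly 80% of the users remain for training and validation.
        remaining = [u for u in all_users if u not in test_users]
        inner = []
        for val_users in group_folds(remaining, k):
            # Roughly 64% of all users for training and 16% for validation.
            train_users = [u for u in remaining if u not in val_users]
            inner.append((train_users, val_users))
        splits.append((test_users, inner))
    return splits


# 24 hypothetical participant identifiers, matching the dataset size.
participants = [f"P{i:02d}" for i in range(1, 25)]
splits = nested_leave_users_out(participants, k=5)

test_users, inner = splits[0]
train_users, val_users = inner[0]
print(len(test_users), len(train_users), len(val_users))
```

Because the split is made over users rather than individual trials, every trial of a given participant ends up in exactly one of the training, validation, or test sets of a given inner split, which is the property that leave-users-out cross-validation relies on.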
We used balanced accuracy as our main evaluation metric. Balanced accuracy is the average of the sensitivity (i.e., the true positive rate or how many positive labels were correctly classified as positive) and the specificity (i.e., the true negative rate or how many negative labels were correctly classified as negative). Our goal is to correctly identify a user’s perceived relevance towards a piece of text, which means focusing on correctly identified labels, whether relevant or irrelevant. Due to the data imbalance, especially for the Topical subset, we computed the balanced accuracy because it gives equal weight to both positive and negative classes. However, Barz et al. [1] used F1-score, which is the harmonic mean of the sensitivity and the precision (i.e., how many correct positive predictions exist in the total positive predictions). The issue is that the F1-score does not take into consideration the amount of true negative classifications, which is why we decided to use balanced accuracy as the main evaluation metric instead.
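
The difference between the two metrics can be shown with a small worked example. The functions below follow the definitions given above; the confusion-matrix counts are hypothetical, except for the class sizes of the first example, which reuse the 20 relevant and 76 irrelevant trials of the g-REL Topical subset.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Average of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(tp: int, fn: int, fp: int) -> float:
    """Harmonic mean of precision and sensitivity; true negatives are not used."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return _ratio(2 * precision * sensitivity, precision + sensitivity)


# A classifier that labels every Topical trial as irrelevant (20 relevant, 76 irrelevant).
plain_accuracy = 76 / 96
majority_ba = balanced_accuracy(tp=0, fn=20, tn=76, fp=0)
print(round(plain_accuracy, 3), majority_ba)

# Two hypothetical classifiers that differ only in their number of true negatives.
f1_few_tn = f1_score(tp=10, fn=10, fp=10)
f1_many_tn = f1_score(tp=10, fn=10, fp=10)
ba_few_tn = balanced_accuracy(tp=10, fn=10, tn=10, fp=10)
ba_many_tn = balanced_accuracy(tp=10, fn=10, tn=66, fp=10)
print(f1_few_tn, f1_many_tn, ba_few_tn, round(ba_many_tn, 3))
```

The majority-class classifier looks strong under plain accuracy but scores only chance level under balanced accuracy, and the last two classifiers receive the same F1-score even though one of them rejects far more irrelevant items correctly.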
4.1 Results
Table 2 shows the test balanced accuracies averaged across the 5-fold stratified leave-users-out cross-validation. For each corpus subset, the result of the best-performing model is emphasised and underlined. In the Appendix⁹, we reported additional performance metrics, such as F1-score, true positive rate, false positive rate, and area under the curve.
4.1.1 Traditional Machine Learning. For g-REL, using all 17 features with RF resulted in a
0.624 balanced accuracy for All. Both feature subsets resulted in an almost identical
balanced accuracy of 0.692 for Agree using SVM∗. On average, using all 17 features
produced better results for All and Agree. None of the approaches produced balanced
accuracies above 0.6 for Topical. For GoogleNQ, using all 17 features with SVM∗ resulted
in a 0.604 balanced accuracy for All, but none of the approaches produced balanced
accuracies above 0.6 for Agree.

8 https://github.com/DFKI-Interactive-Machine-Learning/GNN-Scanpath-Analysis-ICMI2024
9 In the supplementary material.

Algorithm 2: GNN and CNN Model Training and Evaluation Using Nested Cross-validation
Input: Scanpath Data
Output: Average Test Data Balanced Accuracy
 1  for each fold $i \in \{1, 2, 3, 4, 5\}$ do
 2      Split input data into training/validation set $D^{i}_{\mathrm{train,val}}$ and test set $D^{i}_{\mathrm{test}}$
 3      for Optuna trial $t \in \{1, 2, \ldots, n_{\mathrm{trials}}\}$ do
 4          Pick model configuration $c^{i}_{t}$
 5          for each fold $j \in \{1, 2, 3, 4, 5\}$ do
 6              Split $D^{i}_{\mathrm{train,val}}$ into training set $D^{ij}_{\mathrm{train}}$ and validation set $D^{ij}_{\mathrm{val}}$
 7              Train Model $m^{ij}_{c_t}$ using configuration $c^{i}_{t}$ and training set $D^{ij}_{\mathrm{train}}$
 8              Evaluate Model $m^{ij}_{c_t}$ on validation set $D^{ij}_{\mathrm{val}}$ to get the Balanced Accuracy $BA^{ij}_{\mathrm{val},c_t}$
 9              Store model configuration $c^{i}_{t}$, and Balanced Accuracy $BA^{ij}_{\mathrm{val},c_t}$
10          Compute Average Validation Balanced Accuracy $BA^{i}_{\mathrm{val},c_t}$ for configuration $c^{i}_{t}$
11      Determine the configuration $c^{i}_{\mathrm{best}}$ with the maximum $BA^{i}_{\mathrm{val},c_t}$
12      Train the best model $m^{i}_{\mathrm{best}}$ using $c^{i}_{\mathrm{best}}$ and $D^{i}_{\mathrm{train,val}}$
13      Test $m^{i}_{\mathrm{best}}$ on $D^{i}_{\mathrm{test}}$ to get test Balanced Accuracy $BA^{i}_{\mathrm{test}}$
14      Store $c^{i}_{\mathrm{best}}$, and $BA^{i}_{\mathrm{test}}$
15  Compute Average Test Balanced Accuracy $BA_{\mathrm{test}}$
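The nested leave-users-out procedure summarised in Algorithm 2 can be sketched in plain Python as follows. This is a simplified illustration rather than the released implementation. It assumes samples are `(user_id, features, label)` tuples and iterates over a fixed list of candidate configurations instead of Optuna trials. The folds are split by user but not stratified. The helpers `user_folds`, `split`, `nested_cv`, `train_fn`, and `predict_fn` are hypothetical names.

```python
import random
from statistics import mean


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary labels (1/0)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    sensitivity = tp / positives if positives else 0.0
    specificity = tn / negatives if negatives else 0.0
    return (sensitivity + specificity) / 2


def user_folds(samples, k, seed):
    """Partition the unique user IDs into k disjoint groups."""
    users = sorted({user for user, _, _ in samples})
    random.Random(seed).shuffle(users)
    return [set(users[i::k]) for i in range(k)]


def split(samples, held_out_users):
    """Separate samples of held-out users from all remaining samples."""
    kept = [s for s in samples if s[0] not in held_out_users]
    held = [s for s in samples if s[0] in held_out_users]
    return kept, held


def nested_cv(samples, configs, train_fn, predict_fn, k=5):
    """Outer loop estimates test performance; inner loop selects a configuration."""
    def score(model, data):
        labels = [label for _, _, label in data]
        predictions = [predict_fn(model, x) for _, x, _ in data]
        return balanced_accuracy(labels, predictions)

    test_scores = []
    for test_users in user_folds(samples, k, seed=0):
        train_val, test = split(samples, test_users)
        inner_folds = user_folds(train_val, k, seed=1)

        def validation_score(config):
            scores = []
            for val_users in inner_folds:
                train, val = split(train_val, val_users)
                scores.append(score(train_fn(config, train), val))
            return mean(scores)

        best_config = max(configs, key=validation_score)
        best_model = train_fn(best_config, train_val)  # retrain on train + val
        test_scores.append(score(best_model, test))
    return mean(test_scores)


# Toy usage: the "model" is just a decision threshold on a single feature.
def train_threshold(config, data):
    return config


def predict(threshold, x):
    return int(x > threshold)


toy_samples = [(u, x, y) for u in range(10) for x, y in ((0.2, 0), (0.8, 1))]
average_test_ba = nested_cv(toy_samples, [0.1, 0.5, 0.9], train_threshold, predict)
```

Splitting by user ID in both loops keeps every participant's data in exactly one of the training, validation, or test partitions at a time, which is the point of leave-users-out evaluation.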
4.1.2 Convolutional Neural Network. For g-REL, the CNN approach produced average balanced
accuracies of 0.676 and 0.768 for All and Agree, respectively. For GoogleNQ, the CNN
approach produced a 0.603 balanced accuracy for Agree. However, for the remaining subsets
from both corpora, the CNN resulted in balanced accuracies below 0.6.
4.1.3 Graph Neural Network. For the graph classification using g-REL, the 17 node features
resulted in higher balanced accuracies for both All and Agree compared to using only two
node features.
Perceived Text Relevance Estimation Using Scanpaths and GNNs ICMI ’24, November 04–08, 2024, San Jose, Costa Rica
The paragraph-based scanpath graph representation produced the highest average balanced
accuracies of 0.701 and 0.691 for All and Agree, respectively. However, for Topical, the
cluster-based scanpath graph representation resulted in a 0.648 average balanced accuracy
using only two node features, as opposed to 0.621 using all 17 node features.
For the node classification using GoogleNQ, the 17 node features were slightly better for
both subsets. However, the average balanced accuracies were below 0.6, with 0.553 and
0.559 for All and Agree, respectively.
5 DISCUSSION
In this study, we implemented a GNN to process scanpath data for perceived relevance
estimation, focusing on both graph and node classification tasks using the gazeRE dataset.
We used established methods from the literature as baselines, comparing our GNN results
with those obtained using traditional and neural network machine learning algorithms,
namely SVM, RF, and CNN classifiers. The experiments were conducted with two primary
objectives: (1) to assess the effectiveness of GNNs in scanpath analysis for perceived
relevance estimation and (2) to compare the performance of GNNs with that of established
methods from the literature.
5.1 Traditional Machine Learning
For g-REL, using all 17 features resulted in better accuracies than just the two convex
hull features, which aligns with the findings from Bhattacharya et al. [2]. Our results
were also consistent with the results from Barz et al. [1], who reported best balanced
accuracies of 0.605 for All, 0.689 for Agree, and 0.527 for Topical. The small difference
in performance can be attributed to Barz et al. [1] using normal k-fold instead of
leave-users-out cross-validation. Overall, Agree performed better than All, and none of
the approaches produced meaningful results for Topical, which aligns with the findings
from both Bhattacharya et al. [2] and Barz et al. [1].
For GoogleNQ, Barz et al. [1] were only able to achieve a 0.57 and a 0.543 balanced
accuracy for All and Agree, respectively. For Agree, we achieved similar results to Barz
et al. [1]. However, for All, we were able to achieve a better balanced accuracy using all
17 features. The SVM∗ was the only approach, across all experiments, including the GNN and
CNN approaches, to reach a balanced accuracy above 0.6 for All. The two convex hull
features were unsuccessful in producing any meaningful results for either subset. Overall,
despite the improvement for All, we believe it warrants further research, as we cannot
conclude the success of this approach at predicting users’ perceived relevance with longer
text documents.
5.2 Convolutional Neural Network
For g-REL, the CNN performed quite well. It resulted in 0.676, 0.768, and 0.572 balanced
accuracies for All, Agree, and Topical, respectively. This corresponds to a 5.2% and a
7.2% absolute difference for All and Agree compared to the traditional machine learning
classifiers, which is a noticeable improvement. However, despite the improvement for
Topical, its balanced accuracy is still below 0.6. Overall, we believe that the CNN was
successfully replicated on a new dataset. This demonstrates its ability to predict users’
perceived relevance, but only on short text documents.
For GoogleNQ, the CNN produced the overall best results for the Agree subset with a 0.603
balanced accuracy. However, it did not produce any meaningful results for All. Therefore,
we cannot reach a proper conclusion regarding its success at predicting users’ perceived
relevance with longer text documents.
5.3 Graph Neural Network
For the GNN experiments, we start with the graph classification task using g-REL, and then
we discuss the node classification task using GoogleNQ.
5.3.1 Graph Classification Task. For g-REL, the GNN outperformed CNN and traditional
machine learning for both All and Topical. It achieved a 0.701 balanced accuracy for All
using the paragraph-based graph representation with 17 node features; this is a 2.5% and a
7.7% improvement over CNN and traditional machine learning, respectively. For Topical, it
achieved a 0.648 balanced accuracy using the cluster-based graph representation with two
node features, which is a 7.6% and a 14.3% improvement over CNN and traditional machine
learning, respectively. For Agree, the highest balanced accuracy was 0.691 using the
paragraph-based graph representation with 17 node features; this closely matched
traditional machine learning but fell short of CNN by 7.7%. Using all 17 features, with
the normalisation step, provided the best performance for both All and Agree. However, for
Topical, using only fixation duration and the number of fixations in each node without
normalisation produced better results. Our assumption is that when using All, the model
requires more information to differentiate between the different classes, but for Topical,
a more concise view of the problem is more beneficial. Overall, the GNN approach was
effective in analysing scanpaths for perceived relevance estimation for short documents in
a graph classification task, outperforming the baseline approaches.
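The percentages quoted for the graph classification task are absolute differences between balanced accuracies, expressed in percentage points. The short Python sketch below recomputes them from the best per-approach values reported in Table 2. The dictionary layout and the helper name `difference_pp` are illustrative choices, not part of the original work.

```python
# Best balanced accuracy per approach and g-REL subset, as reported in Table 2.
BEST_BALANCED_ACCURACY = {
    "All": {"GNN": 0.701, "CNN": 0.676, "Traditional": 0.624},
    "Agree": {"GNN": 0.691, "CNN": 0.768, "Traditional": 0.692},
    "Topical": {"GNN": 0.648, "CNN": 0.572, "Traditional": 0.505},
}


def difference_pp(subset, approach, reference):
    """Absolute difference in percentage points between two approaches."""
    scores = BEST_BALANCED_ACCURACY[subset]
    return round((scores[approach] - scores[reference]) * 100, 1)


gnn_vs_cnn_all = difference_pp("All", "GNN", "CNN")
gnn_vs_traditional_all = difference_pp("All", "GNN", "Traditional")
gnn_vs_cnn_topical = difference_pp("Topical", "GNN", "CNN")
gnn_vs_traditional_topical = difference_pp("Topical", "GNN", "Traditional")
gnn_vs_cnn_agree = difference_pp("Agree", "GNN", "CNN")
```

A negative value, as for Agree against the CNN, means the GNN fell short of the reference approach.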
Regarding the scanpath graph representation formats, paragraph, line, and quartile-based
approaches performed well with both All and Agree, with paragraph producing better
balanced accuracies for both. These three representation formats might have been
successful because they retained some semantic information about the text form; this
requires further analysis to study the visualisation of the generated graphs superimposed
over the stimuli for each classification result and see if there are indeed any noticeable
patterns for each subset. However, the cluster-based graph representation using Affinity
Propagation produced better balanced accuracies for Topical. This might be because the
fixations were more focused on certain areas, e.g. certain words, so automatically
generated clusters were able to find patterns unique to the respective labels, which might
not have been found using paragraph, line, or quartile-based approaches; this requires
further investigation to check this assumption. Graph generation is quite important and
could lead to interesting research questions regarding how different approaches hold up in
different applications and finding better-performing generic scanpath graph generation
approaches.
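To make the idea of an area-based scanpath graph concrete, the following Python sketch builds a paragraph-level graph with the two node features mentioned above (number of fixations and total fixation duration). It is a simplified illustration under stated assumptions, not the construction used in the experiments. The input format, the function name `build_paragraph_graph`, the use of directed transition counts as edges, and the decision to ignore consecutive fixations within the same paragraph are all hypothetical.

```python
from collections import defaultdict


def build_paragraph_graph(fixations):
    """Build node features and transition edges from a fixation sequence.

    fixations: list of (paragraph_id, duration_ms) tuples in temporal order.
    Returns (nodes, edges), where nodes maps paragraph_id to
    [fixation_count, total_duration_ms] and edges maps (source, target)
    to the number of gaze transitions between the two paragraphs.
    """
    nodes = defaultdict(lambda: [0, 0.0])
    edges = defaultdict(int)
    previous = None
    for paragraph, duration in fixations:
        nodes[paragraph][0] += 1
        nodes[paragraph][1] += duration
        # Assumption: only transitions between different paragraphs form edges.
        if previous is not None and previous != paragraph:
            edges[(previous, paragraph)] += 1
        previous = paragraph
    return dict(nodes), dict(edges)


# Hypothetical scanpath over three paragraphs.
example_fixations = [(0, 200), (0, 150), (1, 300), (0, 100), (2, 250)]
example_nodes, example_edges = build_paragraph_graph(example_fixations)
```

The same skeleton would apply to line, quartile, or cluster-based areas by changing how each fixation is mapped to an area identifier.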
5.3.2 Node Classification Task. For GoogleNQ, the GNN approach was unable to improve the
balanced accuracies for either All or Agree. The GoogleNQ corpus did not have true
irrelevant paragraphs, and there was a high data imbalance between relevant and irrelevant
paragraphs. Even when using different graph convolutional operators, none of them produced
any meaningful above-chance-level results. We think that, in order to test relevance
estimation while reading longer documents that require scrolling, a more balanced dataset
is required, by either assigning a single label to a large document or having the same
number of paragraphs corresponding to each label, in addition to having truly irrelevant
paragraphs, not just relevant and topical ones. Although we cannot conclude that our
approach generalises to multi-paragraph documents, the 0.604 balanced accuracy achieved by
SVM∗ makes us believe that investigating different node classification algorithms from the
literature, and using different node and edge features, might lead to better results for
GoogleNQ before attempting to collect a new dataset and dismissing this one.

Table 2: The average balanced accuracy results (μ ± σ) for g-REL and GoogleNQ using 5-fold
leave-users-out cross-validation. GNN models include Paragraph (PB), Line (LB), Cluster
(CB), and Quartile-based (QB) graph structures. The subscript number indicates the total
number of features (either 2 or 17), except for the LB GNN, which did not have the five
area-based features. The best overall result for each corpus subset is underlined and
emphasized (here marked with †).

          g-REL                                                 GoogleNQ
Model     All               Agree             Topical           All               Agree
Baseline Models
RF_17     0.624 ± 0.056     0.651 ± 0.076     0.494 ± 0.097     0.527 ± 0.064     0.484 ± 0.015
RF∗_17    0.600 ± 0.045     0.650 ± 0.092     0.505 ± 0.101     0.572 ± 0.059     0.502 ± 0.108
SVM∗_17   0.607 ± 0.044     0.692 ± 0.107     0.490 ± 0.124     0.604 ± 0.047†    0.542 ± 0.103
RF_2      0.486 ± 0.056     0.557 ± 0.139     0.444 ± 0.050     0.479 ± 0.038     0.461 ± 0.027
RF∗_2     0.477 ± 0.049     0.570 ± 0.135     0.371 ± 0.116     0.503 ± 0.051     0.578 ± 0.218
SVM∗_2    0.587 ± 0.065     0.692 ± 0.031     0.454 ± 0.088     0.588 ± 0.033     0.509 ± 0.106
CNN       0.676 ± 0.078     0.768 ± 0.107†    0.572 ± 0.086     0.552 ± 0.024     0.603 ± 0.083†
Graph Neural Network
PB_17     0.701 ± 0.021†    0.691 ± 0.114     0.486 ± 0.116     0.553 ± 0.049     0.559 ± 0.049
LB_12     0.674 ± 0.050     0.668 ± 0.070     0.528 ± 0.086     -                 -
CB_17     0.563 ± 0.111     0.664 ± 0.073     0.621 ± 0.220     -                 -
QB_17     0.634 ± 0.034     0.674 ± 0.059     0.584 ± 0.129     -                 -
PB_2      0.650 ± 0.040     0.682 ± 0.103     0.591 ± 0.124     0.548 ± 0.035     0.535 ± 0.031
LB_2      0.646 ± 0.031     0.682 ± 0.082     0.555 ± 0.081     -                 -
CB_2      0.492 ± 0.089     0.534 ± 0.031     0.648 ± 0.205†    -                 -
QB_2      0.606 ± 0.043     0.645 ± 0.083     0.441 ± 0.132     -                 -
6 CONCLUSION
This paper investigated the feasibility and potential of using GNNs for scanpath analysis
for a passive gaze-based application, i.e., implicit relevance estimation during reading.
Our experiments used the gazeRE dataset [1], allowing us to test GNNs for graph and node
classification tasks based on texts from the g-REL and GoogleNQ corpora, respectively. We
implemented a very simple GNN with three GATv2 convolutional layers. For the graph
classification task, we evaluated four methods for generating graph structures from
scanpaths, while for the node classification task, we used a single graph generation
approach. As a baseline, we reproduced the method from Barz et al. [1] using RF and SVM
classifiers with 17 eye tracking features. We also trained these classifiers using only
the two convex hull-based features proposed by Bhattacharya et al. [2]. In addition, to
compare against a neural network, we replicated the CNN approach from Bhattacharya et al.
[3], which we also evaluated, for the first time, on long documents.
For g-REL, the GNN produced the best results for All and Topical subsets, while the CNN
produced the best results for Agree. Based on all the presented findings, we have shown
that GNNs are suitable for processing scanpath data for users’ perceived relevance
estimation for short text documents, which might warrant future investigation for other
passive gaze-based applications. Additionally, we have shown that the CNN approach
proposed by Bhattacharya et al. [3] is valid for perceived relevance estimation on short
documents by evaluating it on a new dataset. For GoogleNQ, the node classification GNN was
not successful in improving or producing meaningful above-chance-level results. In
addition, we also could not conclude the applicability of the CNN to long documents. Based
on relevant literature, we used very simple GNNs and node features in our approach to see
the feasibility and potential benefits of using GNNs for scanpath processing in relevance
estimation while reading. We believe the approach requires further investigation of
different and more complex GNN architectures and different features, e.g., feature
selection for the node features or adding edge features such as the actual distance
between the nodes. Our current results suggest the feasibility of using GNNs for scanpath
processing, but further studies are required to investigate its generalisability to more
diverse passive gaze-based applications. In addition, future studies should consider
larger datasets to validate the presented findings.
ACKNOWLEDGMENTS
This work was funded, in part, by the European Union under grant number 101093079
(MASTER), and the German Federal Ministry of Education and Research (BMBF) under grant
number 01IW23002 (No-IDLE).
REFERENCES
[1] Michael Barz, Omair Shahzad Bhatti, and Daniel Sonntag. 2022. Implicit Estimation of
Paragraph Relevance From Eye Movements. Frontiers in Computer Science 3 (2022).
https://doi.org/10.3389/fcomp.2021.808507
[2] Nilavra Bhattacharya, Somnath Rakshit, and Jacek Gwizdka. 2020. Towards Real-time
Webpage Relevance Prediction Using Convex Hull Based Eye-tracking Features. In ACM
Symposium on Eye Tracking Research and Applications (ETRA ’20 Adjunct). Association for
Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/3379157.3391302
[3] Nilavra Bhattacharya, Somnath Rakshit, Jacek Gwizdka, and Paul Kogut. 2020. Relevance
Prediction from Eye-Movements Using Semi-Interpretable Convolutional Neural Networks. In
Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (CHIIR
’20). Association for Computing Machinery, New York, NY, USA, 223–233.
https://doi.org/10.1145/3343413.3377960 event-place: Vancouver BC, Canada.
[4] T. Blascheck, K. Kurzhals, M. Raschke, M. Burch, D. Weiskopf, and T. Ertl. 2017.
Visualization of Eye Tracking Data: A Taxonomy and Survey: Visualization of Eye Tracking
Data. Computer Graphics Forum 36, 8 (Dec. 2017), 260–284.
https://doi.org/10.1111/cgf.13079
[5] Shaked Brody, Uri Alon, and Eran Yahav. 2022. How Attentive are Graph Attention
Networks? https://doi.org/10.48550/arXiv.2105.14491 arXiv:2105.14491 [cs].
[6] Georg Buscher, Andreas Dengel, and Ludger van Elst. 2008. Eye Movements as Implicit
Relevance Feedback. In CHI ’08 Extended Abstracts on Human Factors in Computing Systems
(CHI EA ’08). Association for Computing Machinery, New York, NY, USA, 2991–2996.
https://doi.org/10.1145/1358628.1358796 event-place: Florence, Italy.
[7] Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, and Liwei Wang. 2021.
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training.
https://doi.org/10.48550/arXiv.2009.03294 arXiv:2009.03294 [cs, math, stat].
[8] Virginio Cantoni, Chiara Galdi, Michele Nappi, Marco Porta, and Daniel Riccio. 2015.
GANT: Gaze analysis technique for human identification. Pattern Recognition 48, 4 (April
2015), 1027–1038. https://doi.org/10.1016/j.patcog.2014.02.017
[9] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic
Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16 (June
2002), 321–357. https://doi.org/10.1613/jair.953
[10] Yihao Chen, Xin Tang, Xianbiao Qi, Chun-Guang Li, and Rong Xiao. 2020. Learning Graph
Normalization for Graph Neural Networks. http://arxiv.org/abs/2009.11746 arXiv:2009.11746
[cs].
[11] Youngjun Cho. 2021. Rethinking Eye-blink: Assessing Task Difficulty through
Physiological Representation of Spontaneous Blinking. In Proceedings of the 2021 CHI
Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing
Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3411764.3445577
[12] Akbar Dehghani, Tristan Glatard, and Emad Shihab. 2019. Subject Cross Validation in
Human Activity Recognition. https://doi.org/10.48550/arXiv.1904.02666 arXiv:1904.02666
[cs, stat].
[13] Andrew T. Duchowski. 2018. Gaze-based interaction: A 30 year retrospective. Computers
& Graphics 73 (June 2018), 59–69. https://doi.org/10.1016/j.cag.2018.04.002
[14] João Marcelo Evangelista Belo, Mathias N. Lystbæk, Anna Maria Feit, Ken Pfeuffer,
Peter Kán, Antti Oulasvirta, and Kaj Grønbæk. 2022. AUIT – the Adaptive User Interfaces
Toolkit for Designing XR Applications. In Proceedings of the 35th Annual ACM Symposium on
User Interface Software and Technology (UIST ’22). Association for Computing Machinery,
New York, NY, USA. https://doi.org/10.1145/3526113.3545651 event-place: Bend, OR, USA.
[15] Anna Maria Feit, Lukas Vordemann, Seonwook Park, Caterina Berube, and Otmar Hilliges.
2020. Detecting Relevance during Decision-Making from Eye Movements for UI Adaptation. In
ACM Symposium on Eye Tracking Research and Applications (ETRA ’20 Full Papers).
Association for Computing Machinery, New York, NY, USA, 1–11.
https://doi.org/10.1145/3379155.3391321
[16] Brendan J. Frey and Delbert Dueck. 2007. Clustering by Passing Messages Between Data
Points. Science 315, 5814 (Feb 2007), 972–976. https://doi.org/10.1126/science.1136800
[17] Jacek Gwizdka. 2014. Characterizing Relevance with Eye-Tracking Measures. In
Proceedings of the 5th Information Interaction in Context Symposium (IIiX ’14).
Association for Computing Machinery, New York, NY, USA, 58–67.
https://doi.org/10.1145/2637002.2637011 event-place: Regensburg, Germany.
[18] Jacek Gwizdka. 2014. News Stories Relevance Effects on Eye-Movements. In Proceedings
of the Symposium on Eye Tracking Research and Applications (ETRA ’14). Association for
Computing Machinery, New York, NY, USA, 283–286. https://doi.org/10.1145/2578153.2578198
event-place: Safety Harbor, Florida.
[19] Kenneth Holmqvist, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka
Jarodzka, and Joost van de Weijer. 2011. Eye Tracking: A comprehensive guide to methods
and measures. Oxford University Press, Oxford, New York.
[20] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep
Network Training by Reducing Internal Covariate Shift.
https://doi.org/10.48550/arXiv.1502.03167 arXiv:1502.03167 [cs].
[21] Naji Khosravan, Haydar Celik, Baris Turkbey, Elizabeth C. Jones, Bradford Wood, and
Ulas Bagci. 2019. A collaborative computer aided diagnosis (C-CAD) system with
eye-tracking, sparse attentional model, and deep learning. Medical Image Analysis 51 (Jan.
2019), 101–115. https://doi.org/10.1016/j.media.2018.10.010
[22] Diederik P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization.
https://doi.org/10.48550/arXiv.1412.6980 arXiv:1412.6980 [cs].
[23] Thomas N. Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph
Convolutional Networks. (2016). https://doi.org/10.48550/ARXIV.1609.02907
[24] Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh,
Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina
Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit,
Quoc Le, and Slav Petrov. 2019. Natural Questions: A Benchmark for Question Answering
Research. Transactions of the Association for Computational Linguistics 7 (Aug. 2019),
453–466. https://doi.org/10.1162/tacl_a_00276
[25] Guohao Lan, Bailey Heit, Tim Scargill, and Maria Gorlatova. 2020. GazeGraph:
graph-based few-shot cognitive context sensing from human visual behavior. In Proceedings
of the 18th Conference on Embedded Networked Sensor Systems (SenSys ’20). Association for
Computing Machinery, New York, NY, USA, 422–435. https://doi.org/10.1145/3384419.3430774
[26] Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. 2017. Imbalanced-learn:
A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Journal
of Machine Learning Research 18, 17 (2017), 1–5. http://jmlr.org/papers/v18/16-365.html
[27] Beibin Li, Nicholas Nuechterlein, Erin Barney, Claire Foster, Minah Kim, Monique
Mahony, Adham Atyabi, Li Feng, Quan Wang, Pamela Ventola, Linda Shapiro, and Frederick
Shic. 2021. Learning Oculomotor Behaviors from Scanpath. In Proceedings of the 2021
International Conference on Multimodal Interaction. ACM, Montréal QC Canada, 407–415.
https://doi.org/10.1145/3462244.3479923
[28] Xiaochuan Ma, Yikang Liu, Roy Clariana, Chanyuan Gu, and Ping Li. 2023. From eye
movements to scanpath networks: A method for studying individual differences in expository
text reading. Behavior Research Methods 55, 2 (Feb. 2023), 730–750.
https://doi.org/10.3758/s13428-022-01842-3
[29] Päivi Majaranta and Andreas Bulling. 2014. Eye Tracking and Eye-Based Human–Computer
Interaction. Springer, London, 39–65. https://doi.org/10.1007/978-1-4471-6392-3_3
[30] Abdulrahman Mohamed Selim, Michael Barz, Omair Shahzad Bhatti, Hasan Md Tusfiqur
Alam, and Daniel Sonntag. 2024. A review of machine learning in scanpath analysis for
passive gaze-based interaction. Frontiers in Artificial Intelligence 7 (June 2024).
https://doi.org/10.3389/frai.2024.1391745 Publisher: Frontiers.
[31] Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric
Lenssen, Gaurav Rattan, and Martin Grohe. 2018. Weisfeiler and Leman Go Neural:
Higher-order Graph Neural Networks. (2018). https://doi.org/10.48550/ARXIV.1810.02244
[32] Erik Novak, Luka Bizjak, Dunja Mladenić, and Marko Grobelnik. 2022. Why is a document
relevant? Understanding the relevance scores in cross-lingual document retrieval.
Knowledge-Based Systems 244 (2022), 108545. https://doi.org/10.1016/j.knosys.2022.108545
[33] Douglas Oard and Jinmook Kim. 1998. Implicit Feedback for Recommender System.
Proceedings of the AAAI Workshop on Recommender Systems (1998).
https://cs.fit.edu/~pkc/apweb/related/oard-aaaiWS98.pdf
[34] Pernilla Qvarfordt. 2017. Gaze-informed multimodal interaction. In The Handbook of
Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality
Combinations - Volume 1, Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag,
Gerasimos Potamianos, and Antonio Krüger (Eds.). ACM, 365–402.
https://doi.org/10.1145/3015783.3015794
[35] Lei Shi, Cosmin Copot, and Steve Vanlanduit. 2021. Gaze Gesture Recognition by Graph
Convolutional Networks. Frontiers in Robotics and AI 8 (2021).
https://doi.org/10.3389/frobt.2021.709952
[36] Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for
Large-Scale Image Recognition. https://doi.org/10.48550/arXiv.1409.1556 arXiv:1409.1556
[cs].
[37] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and
Yoshua Bengio. 2017. Graph Attention Networks. (2017).
https://doi.org/10.48550/ARXIV.1710.10903
[38] Bin Wang, Hongyi Pan, Armstrong Aboah, Zheyuan Zhang, Elif Keles, Drew Torigian,
Baris Turkbey, Elizabeth Krupinski, Jayaram Udupa, and Ulas Bagci. 2024. GazeGNN: A
Gaze-Guided Graph Neural Network for Chest X-Ray Classification. 2194–2203.
https://openaccess.thecvf.com/content/WACV2024/html/Wang_GazeGNN_A_Gaze-Guided_Graph_Neural_Network_for_Chest_X-Ray_Classification_WACV_2024_paper.html
[39] Ryen W. White, Ian Ruthven, and Joemon M. Jose. 2002. The Use of Implicit Evidence
for Relevance Feedback in Web Retrieval. In Advances in Information