A systematic performance comparison of two Smooth Pursuit detection algorithms in Virtual Reality depending on target number, distance, and movement patterns [original]

Journal of Eye Movement Research, 15(3):9

https://doi.org/10.16910/jemr.15.3.9

Introduction

The use of the human gaze to interact with machines or

software has become a viable alternative to traditional

means of input. Compared to mouse control, gaze-based

interaction techniques can be faster and particularly useful

in situations where both hands are needed to perform a task

(Sibert & Jacob, 2000) or in hygiene-critical situations,

such as surgery (Mewes et al., 2017).

Especially smooth pursuit movements have proven

suitable to provide a range of unobtrusive interaction

methods that allow a broad range of users to interact effectively

with gaze-controlled interfaces. Applications

range from novel takes on gaze-spelling that let users select

their target letter by simply following its movement

A systematic performance comparison of

two Smooth Pursuit detection algorithms

in Virtual Reality depending on target

number, distance, and movement patterns

Sarah-Christin Freytag

TU Berlin, Berlin, Germany

Roland Zechner

TU Berlin, Berlin, Germany

Michelle Kamps

TU Berlin, Berlin, Germany

We compared the performance of two smooth-pursuit-based object selection algorithms in

Virtual Reality (VR). To assess the best algorithm for a range of configurations, we system-

atically varied the number of targets to choose from, their distance, and their movement

pattern (linear and circular). Performance was operationalized as the ratio of hits, misses

and non-detections. Averaged over all distances, the correlation-based algorithm performed

better for circular movement patterns compared to linear ones (F(1,11) = 24.27, p < .001, η²

= .29). This was not found for the difference-based algorithm (F(1,11) = 0.98, p = .344, η²

= .01). Both algorithms performed better at close distances than at larger ones (F(1,11)

= 190.77, p < .001, η² = .75 correlation-based, and F(1,11) = 148.20, p < .001, η² = .42,

difference-based). An interaction effect for distance x movement emerged. After systemat-

ically varying the number of targets, these results could be replicated, with a slightly smaller

effect.

Based on performance levels, we introduce the concept of an optimal threshold algorithm,

suggesting the best detection algorithm for the individual target configuration. Lessons learned from

adding the third dimension to the detection algorithms and the role of distractors are discussed,

and suggestions for future research are given.

Keywords: Eye movement, Eye Tracking, Smooth Pursuit, VR, Virtual Reality,

correlation-based algorithm, vector-angle based algorithm

*Corresponding author: Sarah-Christin Freytag, sarah.frey[email protected]-berlin.de

Received February 08, 2023; Published May 29, 2023.

Citation: Freytag, S.-C., Zechner, R. & Kamps, M. (2023). A systematic performance comparison of

two Smooth Pursuit detection algorithms in Virtual Reality depending on target number, distance, and

movement patterns. Journal of Eye Movement Research, 15(3):9. https://doi.org/10.16910/jemr.15.3.9

ISSN: 1995-8692

This article is licensed under a Creative Commons Attribution 4.0 International license.

Journal of Eye Movement Research Freytag, S.-C., Zechner, R., & Kamps, M. (2023)

15(3):9 A systematic performance comparison of two Smooth-Pursuit Algorithms in VR

with their eyes (Cymek et al., 2014; Khamis et al., 2016;

Lutz et al., 2015) to controlling smart-phone applications

by observing the movement speed of application icons

that, after surpassing a specific matching criterion,

will then be opened (Esteves et al., 2015). The ease of use

and usage of very natural gaze movements make these in-

teractions also suitable for interactions in public spaces

(Khamis et al., 2015; Vidal et al., 2013) and have shown

promising results when tested with large databases of users

(Freytag, 2020).

One of the great advantages of employing smooth-pur-

suit for interaction is the reduction of the Midas touch

problem, which states that for interactions that require

dwell-time-based approaches, a distinction between a rest-

ing gaze that indicates the intention to select and one that

was evoked by the wish to examine cannot sufficiently be

made (Huckauf & Urbina, 2008; Vidal et al., 2012).

All these applications use one of two algorithms to

compare the eye movements of the user with the movement

patterns of the UI elements: a correlation-based algorithm

using Pearson’s product-moment correlation,

and an algorithm based on vectors using the Euclidean dis-

tance. These algorithms are well-researched for interac-

tions on a 2D-plane. In addition to these, Drewes et al.

(2019b) introduced a novel slope approach, using the slope

of a linear regression line for object detection, showing a

possible detection for up to 160 individual objects, based

on circular movement on several rings of objects. How-

ever, this approach was tested in 2D as well.

Since the introduction of the Oculus Rift DK1 at the

end of 2012 (Kickstarter.com, 2012), the technological

progress as well as the availability of Head-Mounted Dis-

plays (HMDs) for the consumer market have skyrocketed

(Gamesradar, 2022). The integration of eye-tracking technology

into HMDs followed suit. Within a span of only a few

years, the solutions developed from research editions provided

by eye-tracking manufacturers, via clip-in solutions,

to, finally, the mass-production of consumer-level hard-

ware with eye-tracking integrated by default (VIVE,

2022). This widespread availability of eye-tracking data

during usage of HMDs opens the door for integrating gaze-

based interactions by default into consumer media. It also

provides researchers with an abundance of opportunities to

investigate the transferability of what is known to work in

2D to 3D virtual reality applications.

The natural navigation of the visual space provided by

HMDs suggests that the observed gaze behavior would be

close to natural, with no artificial affordances of control

disrupting the visual exploration of the virtual world. Due

to this, VR could potentially overcome shortcomings of

lab experiments by providing a semi-realistic experience

that surpasses artificial lab settings (Clay et al., 2019;

Lappi, 2015). However, there also are challenges unique

to experiences of VR via HMDs. One is the users' potential

ability to physically move across the 3D environment.

Khamis et al. (2018) investigated the influence of user

movement, target size, the distance to targets, and the ra-

dius of circular object trajectories on the performance of a

correlation-based algorithm, showing that, while still

yielding sufficient results, movement reduced the accuracy

of selections and negatively impacted performance.

For our study, we chose to keep all of these parameters

except for distance constant and our participants stationary

across all conditions to control for possible effects.

Another challenge is the vergence-accommodation conflict.

When focusing on an object in a natural setting, the

focal distance of the eye and the vergence align. While

viewing a scene via an HMD, however, the vergence of the

users’ eyes is set to the virtual distance of the focused ob-

ject behind the screen of the HMD – while the focal dis-

tance is set to the screen. This creates a mismatch which

does not exist in the natural world and might lead to eye

strain (Dörner et al., 2013) and possibly slightly influence

the individual vergence response itself (Neveu et al.,

2012). However, the additional gaze information along the

third axis remains available over the course of the interac-

tion in VR. Can this information be useful to improve

smooth-pursuit algorithms in 3D?

While previous studies investigated the performance of

smooth-pursuit algorithms in 3D VR, either correlation-based

(Khamis et al., 2018) or based on the Euclidean distance

(Piumsomboon et al., 2017), the depth information

of a third axis was not yet included in the calculations.

Breitenfellner et al. (2019) conclude that so far there was

no extension to the existing 2D smooth pursuit algorithms

for the use in VR. While Khamis et al. (2018) found no

significant effect of distance on the correlation-based al-

gorithms' performance, we assume that distance will affect

the performance once the third dimension is included and

provides additional information to the detection algorithms.

The aim of the study was to systematically examine the

potential of incorporating gaze information along the third

axis into the two currently most widely used

algorithms typical for 2D-smooth-pursuit interaction. One

correlation-based algorithm and one distance-based algo-

rithm were adapted to 3D. In a first experiment, the perfor-

mances of both algorithms were examined by systemati-

cally varying parameters of distance and trajectory of ob-

ject movement. During this experiment, only one object

was visible at all times, allowing for the assessment of se-

lection performance under ideal conditions.

The second experiment focused on the performance of

the algorithms while additional objects to choose from

were visible. The number of additional objects to choose

from, as well as the configuration within the 3D space was

varied systematically to test the algorithms under ecologi-

cally valid conditions.

The following section introduces the algorithms, fol-

lowed by the methods, and a description of the virtual en-

vironment, which were used for both experiments. After

that, details and outcomes of both experiments are pre-

sented individually, followed by a critical discussion and

outlook.

Algorithms and dependent variables

While 2D smooth-pursuit algorithms often use screen

coordinates to match targets and gaze, a 3D environment

requires adjustments. Instead of x-, y- and z-coordinates,

we defined the center of the HMD as the origin of a spher-

ical coordinate system and matched its position to the

origin of the world-space in our virtual environment. Dis-

tances were calculated as radial distance r with positions

being defined by the radius r and the angles theta θ and phi

φ for pitch and yaw respectively (Figure 1).
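
The conversion into this HMD-centred coordinate system could be sketched as follows. The axis convention (y up, z forward), the function name `to_spherical`, and the choice of degrees are assumptions made for illustration only; the text does not specify them.

```python
import math


def to_spherical(x, y, z):
    """Convert an HMD-centred Cartesian position to (r, theta, phi).

    Assumed convention (not stated in the text): y points up, z points
    forward, theta is the pitch above the horizontal plane and phi is
    the yaw around the vertical axis, both in degrees.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin
    theta = math.degrees(math.asin(y / r))
    phi = math.degrees(math.atan2(x, z))
    return r, theta, phi
```

Under these assumptions, a point 2 m straight ahead maps to r = 2 with both angles at 0°, and a point directly to the right maps to a yaw of 90°.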

The 3D Point of Regard (3D-POR) was used for gaze

estimation and defined as the mid-point between the re-

spective points on the gaze vectors of each eye where the

distance between both vectors reached its minimum. Both

of the following algorithms were initially tested against a

variable threshold. Determining the ideal threshold level

for both algorithms respectively was part of experiment 1.
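
The 3D-POR definition above corresponds to the standard closest-point computation between two lines. The following is a minimal sketch of that computation, not the implementation used in the study; the function name `point_of_regard`, the tuple-based vectors, and the tolerance `eps` are hypothetical.

```python
def point_of_regard(origin_l, dir_l, origin_r, dir_r, eps=1e-9):
    """Mid-point between the closest points of the two gaze lines.

    Returns None when the gaze directions are (nearly) parallel, because
    no unique closest pair of points exists in that case.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = tuple(a - b for a, b in zip(origin_l, origin_r))
    a = dot(dir_l, dir_l)
    b = dot(dir_l, dir_r)
    c = dot(dir_r, dir_r)
    d = dot(dir_l, w0)
    e = dot(dir_r, w0)
    denom = a * c - b * b  # zero for parallel directions
    if denom < eps:
        return None
    s = (b * e - c * d) / denom  # position along the left gaze line
    t = (a * e - b * d) / denom  # position along the right gaze line
    closest_l = tuple(o + s * k for o, k in zip(origin_l, dir_l))
    closest_r = tuple(o + t * k for o, k in zip(origin_r, dir_r))
    return tuple((p + q) / 2 for p, q in zip(closest_l, closest_r))
```

The parallel case matters for distant targets, where the gaze vectors of both eyes approach parallelism and the depth estimate becomes unreliable. The sketch also does not check whether the closest points lie in front of the eyes.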

Figure 1. Illustration of the HMD-based coordinate system with

radius r, pitch θ, and yaw φ.

A correlation-based algorithm was adapted from the

correlation-based algorithm for 2D smooth-pursuit as de-

scribed by Vidal et al. (2013). This algorithm calculates

the product-moment correlation between gaze coordinates

and the coordinates of the moving target. Instead of x and

y-coordinates, the 3D-adapted algorithm uses r, θ and φ for

the calculations.

The difference-based algorithm was based on the ap-

proach by Lutz et al. (2015). The authors calculate the dif-

ference between the movement vector of targets and gaze

as well as the difference in angle of the movement vectors

in relation to the x-y plane. Targets are selected when both

criteria fall below a selection threshold. This algorithm

was adapted to 3D by using the radial distance r as well as

θ and φ of the moving targets to calculate the difference to

the gaze path.

Workflow

For each new frame, first the validity of the gaze data

was assessed (see Figure S1). Next, a 3D-POR was calculated

and added to a Vector3 field storing the last x

samples, with x being defined as the size of a moving

window. Upon reaching the maximum sample size, the

currently oldest sample would be removed upon adding the

new sample. Simultaneously, the object coordinates of

each target object were stored in an identical manner in re-

spective Vector3 fields. After updating the 3D-POR

coordinates in the described manner, the respective algo-

rithms started calculating as follows:

The correlation-based algorithm iterates over all

possible interaction objects and calculates product-mo-

ment correlations between the gaze and object coordinates

for each object respectively. The calculations are per-

formed for each dimension (radius, θ, φ). In contrast to the

approach in 2D, where individual correlations are com-

pared to a threshold directly, we chose to calculate the av-

erage of all correlation coefficients for each object. While

this potentially introduces an uncorrelated parameter, the

effect will be the same for all respective samples which

remain distinguishable via the remaining parameters. A

lowering of the correlation threshold in these situations

will be tested, akin to the algorithm tested by

Khamis et al. (2018).

Upon calculating the correlation coefficients of all ob-

jects, the algorithm searches for the highest overall coeffi-

cient. If this correlation surpasses the particular threshold,

the respective item is marked as selected by the participant.
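
The selection step described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the names `pearson` and `select_by_correlation`, the window size of 30 samples, the synthetic trajectories, and the choice to count a constant dimension as a correlation of 0 are all hypothetical.

```python
import math
from collections import deque


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumption: a constant dimension counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def select_by_correlation(gaze_window, object_windows, threshold):
    """Return the object with the highest mean correlation, or None.

    gaze_window holds (r, theta, phi) samples; object_windows maps an
    object name to a window of the same length.
    """
    gaze_dims = list(zip(*gaze_window))
    best_name, best_score = None, -math.inf
    for name, window in object_windows.items():
        object_dims = list(zip(*window))
        # Average the coefficients of the three dimensions (r, theta, phi).
        score = sum(pearson(g, o) for g, o in zip(gaze_dims, object_dims)) / 3
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None


# Moving windows: deque(maxlen=...) drops the oldest sample automatically.
WINDOW_SIZE = 30  # hypothetical window size
gaze = deque(maxlen=WINDOW_SIZE)
targets = {"same": deque(maxlen=WINDOW_SIZE), "opposite": deque(maxlen=WINDOW_SIZE)}
for frame in range(60):
    angle = 0.1 * frame
    theta, phi = 5 * math.sin(angle), 5 * math.cos(angle)
    gaze.append((1.0, theta, phi))
    targets["same"].append((1.0, theta, phi))
    targets["opposite"].append((1.0, -theta, -phi))

selected = select_by_correlation(gaze, targets, threshold=0.6)
```

In this synthetic case the radius is constant, so it contributes a coefficient of 0 and the mean for the matching object is about 0.67. This illustrates why averaging over all three dimensions may call for a lower threshold than a per-dimension comparison.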

The difference-based algorithm first splits the gaze

data Vector3 field in half based on timestamps. The pa-

rameters of the halves containing the oldest and newest

gaze vectors respectively are averaged. The most recent

averaged gaze coordinates refer to the end point of the gaze

vector; the averaged coordinates of the other half constitute

the origin of the gaze vector. By averaging the gaze

data over several samples, we smooth the data and prevent

obtaining false correlation values due to outliers. The end-

point of each averaged half of the Vector3 field is sub-

tracted from the respective starting point in order to obtain

a movement vector ranging from start to finish of the

movement as recorded by the field interval, resulting in $\Delta_{\mathrm{param}}$.

These steps are performed for the gaze data as well as

for the positional coordinates of each object. In order to

achieve a relation between the object and gaze movement

vectors, the difference coefficients for r, θ and φ are cal-

culated as follows:

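(1) $\Delta_{\mathrm{radius}} = \left| \frac{\Delta \mathrm{radius}_{(\mathrm{object})}}{\Delta \mathrm{radius}_{(\mathrm{object})} + \Delta \mathrm{radius}_{(\mathrm{gaze})}} - 0.5 \right|$

(2) $\Delta_{\theta} = \left| \frac{\Delta \theta_{(\mathrm{object})}}{\Delta \theta_{(\mathrm{object})} + \Delta \theta_{(\mathrm{gaze})}} - 0.5 \right|$

(3) $\Delta_{\varphi} = \left| \frac{\Delta \varphi_{(\mathrm{object})}}{\Delta \varphi_{(\mathrm{object})} + \Delta \varphi_{(\mathrm{gaze})}} - 0.5 \right|$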

The obtained coefficients illustrate the difference be-

tween gaze radius r, gaze angle theta θ and gaze angle phi

φ and the respective object parameters. The coefficients lie

within the range of [0; ∞]. A coefficient of 0 indicates a

perfect fit between gaze and object parameters.

The calculated difference coefficients can be expressed graphically

on a base-10 logarithmic scale. For example, if the object difference vector is

kept constant at 10, a symmetrical image results for a var-

iable gaze difference vector for positive numbers (see Fig-

ure 2). The difference coefficient would reach its mini-

mum of 0 at a gaze vector of 10 and its maximum of 0.5 at

a gaze vector of 0. Likewise, at high positive deviations,

approximately 0.5 is reached. If the gaze moves in the opposite

direction to the object, difference coefficients of > 0.5

always result. In the special case that the gaze difference

value exactly equals the negative object difference value,

the difference coefficient cannot be calculated because of

a division by 0. This case should hardly occur in practice.
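
The behaviour described in this paragraph can be checked numerically with a small sketch for a single dimension. It assumes that equations (1) to (3) take the absolute value of the ratio minus 0.5, as the stated range of [0; ∞] suggests; the function names `movement_delta` and `difference_coefficient` and the end-minus-start sign convention are hypothetical.

```python
def movement_delta(samples):
    """Mean of the newer half minus mean of the older half of a window.

    Assumes the samples are ordered from oldest to newest.
    """
    half = len(samples) // 2
    older, newer = samples[:half], samples[half:]
    return sum(newer) / len(newer) - sum(older) / len(older)


def difference_coefficient(delta_object, delta_gaze):
    """Difference coefficient for one dimension (r, theta or phi).

    Returns None when the gaze displacement is the exact negative of the
    object displacement, because the denominator is then 0.
    """
    denominator = delta_object + delta_gaze
    if denominator == 0:
        return None
    return abs(delta_object / denominator - 0.5)
```

With the object displacement fixed at 10, a gaze displacement of 10 gives 0, a gaze displacement of 0 gives 0.5, and gaze displacements of 1 and 100 both give about 0.41, which matches the symmetry on the logarithmic axis. A gaze displacement of -5 gives 1.5, in line with values above 0.5 for opposite movement.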

Figure 2. Visualization of the relation between the ratio of gaze

to object movement and the resulting difference coefficient (for

one dimension).

In order to account for different distances and to correct

the 3D POR error, the three coefficients r, θ and φ are av-

eraged over all samples within the moving window. The

algorithm then compares the sum with the threshold. The

threshold level itself is adaptable and the determination of

the ideal threshold level was part of experiment 1.

Dependent Variables

The following section explains the parameters that

were analyzed as dependent variables in both experiments.

Detection rate (DR). A true positive (TP) detection

was defined by the target object surpassing the selection

threshold for the respective algorithm. A false positive

(FP) was defined as the algorithm detecting any other

object but the currently visible one as selected. No detec-

tion (ND) took place if the threshold was not surpassed for

any of the objects. The detection rate relates these parameters

akin to the assessment of a binary classifier:

(4) $DR = \frac{\sum TP}{\sum TP + \sum FP + \sum ND}$

The rates of false positives (FPR) and non-detections

(NDR) were calculated likewise.

Efficiency. The efficiency expresses the ratio of true

detections to overall detections:

(5) $Eff = \frac{\sum TP}{\sum TP + \sum FP}$
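
As an illustration, the rates and the efficiency could be computed from outcome counts as sketched below. The sketch assumes that DR, FPR and NDR are each taken relative to the sum of true positives, false positives and non-detections, and that efficiency is taken relative to true plus false positives. The function names and the example counts are hypothetical and are not data from the experiments.

```python
def detection_rates(tp, fp, nd):
    """Return DR, FPR and NDR as shares of all recorded outcomes."""
    total = tp + fp + nd
    if total == 0:
        raise ValueError("at least one outcome is required")
    return {"DR": tp / total, "FPR": fp / total, "NDR": nd / total}


def efficiency(tp, fp):
    """Share of true detections among all detections (None if there are none)."""
    detections = tp + fp
    return tp / detections if detections else None


# Hypothetical counts for illustration only.
rates = detection_rates(tp=6, fp=2, nd=2)
eff = efficiency(tp=6, fp=2)
```

For these made-up counts the detection rate is 0.6 and the efficiency is 0.75. The two measures diverge because non-detections lower the detection rate but do not enter the efficiency.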

Duration until selection. As long durations until de-

tections can invoke frustration in users (Khamis et al.,

2018), the duration until the algorithm was able to select

any object was introduced as additional criterion for com-

paring the performance of the algorithms. The duration is

expressed both in frames and in seconds (s).

Further Variables. To indicate the participants’ focus

on the task, the task performance of the participants, measured

as the sum of points related to the given task, and the average

reaction time per condition were tracked.

Methods

The following section describes the material used in

both experiments. Differences between both settings are

pointed out where applicable.

Virtual Environment. A virtual environment was cre-

ated with the Unity Game Engine (Unity, 2017). The envi-

ronment is seen from the viewpoint of a person standing

on a small planet of 2m diameter (Zehm, 2017) in front of

a starry sky. The environment was kept intentionally plain

to reduce the influence of head movements on the task

(Anderson & Bischof, 2019). An X on the planet marked

the ideal position for the subjects. A light source was

placed above and slightly behind the subject to prevent

blinding. A chicken inside a semi-transparent spherical

spaceship was introduced as a moving target (“Vertex

Cat”, 2017). The target was kept visually plain to prevent

sustained scanning of the details while hopefully being

sufficiently entertaining to maintain subjects’ motivation.

A high contrast to the backdrop was chosen to facilitate

visual detection (see Figure 3). The target had a diameter

of 0.07 m, corresponding to a visual angle of 10° in the close

condition and 2.9° visual angle in the far condition. The

size was chosen based on the results of a pre-test, consti-

tuting a compromise between identifiability over different

distances and simplicity.
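
The reported visual angles follow from the target diameter and the centre distances of the two conditions (0.4 m and 1.4 m) via the usual relation $2 \arctan(d / 2D)$. A minimal check, with the hypothetical helper `visual_angle_deg`:

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Visual angle in degrees of an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


near_angle = visual_angle_deg(0.07, 0.4)  # target at the centre of the near sphere
far_angle = visual_angle_deg(0.07, 1.4)   # target at the centre of the far sphere
```

This gives roughly 10.0° for the near condition and roughly 2.9° for the far condition, assuming the target sits at the centre of the respective spawn sphere.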

Figure 3. Virtual environment displaying the users' position.

Lower right corner: the target object "space chicken" in a close-

up.

Number of objects. A maximum number of 26 indi-

vidual objects being present at once was chosen in order to

prevent possible ceiling effects regarding the performance

of the algorithms. The high number allowed for the testing

of a variety of unique movement directions within the 3D

space and was therefore increased compared to similar

studies in 2D (e.g. Zeng et al., 2020). During the first ex-

periment, only one of the objects was visible while the oth-

ers remained hidden to the user, but were taken into ac-

count during the analysis. This approach was chosen to fa-

cilitate sustained and ideal smooth-pursuit movements on

one target, without other distractions. With this approach,

the algorithms could be tested under an idealized, highly

standardized smooth pursuit movement performed by the

participants. The second experiment introduced visibility

of a systematically varied number of distractors in order to

retest the resulting ideal performance as it would occur “in

the field” with a natural ecological validity (see experi-

ment 2).

Distances. Two distances (near / far) were imple-

mented after having been selected for optimal usability and

prevention of eye strain in a pre-test. In the “near” condi-

tion the center of a spawn sphere was set to an origin at

0.4m distance (with the sphere spanning from 0.2 - 0.6m)

to provide a substantial vergence of the eyes, while simul-

taneously being far enough to prevent eyestrain or irrita-

tion and disorientation due to too large portions of the vis-

ual field moving. The “far” condition set the center of the

spawn sphere at 1.4m distance (spanning from 1.2m to

1.6m) to test the performance of the algorithm near the

limit of depth detection due to parallelization of the eyes.
