Journal of Eye Movement Research, 15(3):9
https://doi.org/10.16910/jemr.15.3.9
Introduction
The use of the human gaze to interact with machines or
software has become a viable alternative to traditional
means of input. Compared to mouse control, gaze-based
interaction techniques can be faster and particularly useful
in situations where both hands are needed to perform a task
(Sibert & Jacob, 2000) or in hygiene-critical situations,
such as surgery (Mewes et al., 2017).
Smooth pursuit movements in particular have proven
suitable for a range of unobtrusive interaction
methods that allow a broad range of users to interact
effectively with gaze-controlled interfaces. Applications
range from novel takes on gaze-spelling that let users
select their target letter by simply following its movement
A systematic performance comparison of
two Smooth Pursuit detection algorithms
in Virtual Reality depending on target
number, distance, and movement patterns
Sarah-Christin Freytag
TU Berlin, Berlin, Germany
Roland Zechner
TU Berlin, Berlin, Germany
Michelle Kamps
TU Berlin, Berlin, Germany
We compared the performance of two smooth-pursuit-based object selection algorithms in
Virtual Reality (VR). To assess the best algorithm for a range of configurations, we system-
atically varied the number of targets to choose from, their distance, and their movement
pattern (linear and circular). Performance was operationalized as the ratio of hits, misses
and non-detections. Averaged over all distances, the correlation-based algorithm performed
better for circular movement patterns compared to linear ones (F(1,11) = 24.27, p < .001, η²
= .29). This was not found for the difference-based algorithm (F(1,11) = 0.98, p = .344, η²
= .01). Both algorithms performed better at close distances than at larger ones (F(1,11)
= 190.77, p < .001, η² = .75 correlation-based, and F(1,11) = 148.20, p < .001, η² = .42,
difference-based). An interaction effect for distance × movement emerged. After systematically
varying the number of targets, these results could be replicated, with a slightly smaller
effect.
Based on performance levels, we introduce the concept of an optimal threshold algorithm,
suggesting the best detection algorithm for the individual target configuration. Lessons learned
from adding the third dimension to the detection algorithms and the role of distractors are
discussed, and suggestions for future research are given.
Keywords: Eye movement, Eye Tracking, Smooth Pursuit, VR, Virtual Reality,
correlation-based algorithm, vector-angle based algorithm
*Corresponding author: Sarah-Christin Freytag, sarah.frey[email protected]-berlin.de
Received February 08, 2023; Published May 29, 2023.
Citation: Freytag, S.-C., Zechner, R. & Kamps, M. (2023). A systematic performance comparison of
two Smooth Pursuit detection algorithms in Virtual Reality depending on target number, distance, and
movement patterns. Journal of Eye Movement Research, 15(3):9. https://doi.org/10.16910/jemr.15.3.9
ISSN: 1995-8692
This article is licensed under a Creative Commons Attribution 4.0 International license.
Journal of Eye Movement Research Freytag, S.-C., Zechner, R., & Kamps, M. (2023)
15(3):9 A systematic performance comparison of two Smooth-Pursuit Algorithms in VR
with their eyes (Cymek et al., 2014; Khamis et al., 2016;
Lutz et al., 2015) to controlling smartphone applications
by visually following the moving icons of applications,
which are opened once a specific matching criterion is
surpassed (Esteves et al., 2015). The ease of use and the
reliance on very natural gaze movements also make these
interactions suitable for public spaces
(Khamis et al., 2015; Vidal et al., 2013), and they have shown
promising results when tested with large databases of users
(Freytag, 2020).
One of the great advantages of employing smooth-pur-
suit for interaction is the reduction of the Midas touch
problem, which states that for interactions that require
dwell-time-based approaches, a distinction between a rest-
ing gaze that indicates the intention to select and one that
was evoked by the wish to examine cannot sufficiently be
made (Huckauf & Urbina, 2008; Vidal et al., 2012).
All these applications use one of two algorithms to
compare the eye movements of the user with the movement
patterns of the UI elements: a correlation-based algorithm
using Pearson's product-moment correlation, and a
vector-based algorithm using the Euclidean distance.
These algorithms are well researched for interactions
on a 2D plane. In addition, Drewes et al.
(2019b) introduced a novel slope approach that uses the slope
of a linear regression line for object detection and showed
that up to 160 individual objects can be detected, based
on circular movement on several rings of objects.
However, this approach was also tested only in 2D.
Since the introduction of the Oculus Rift DK1 at the
end of 2012 (Kickstarter.com, 2012), the technological
progress as well as the availability of Head-Mounted Dis-
plays (HMDs) for the consumer market have skyrocketed
(Gamesradar, 2022). The integration of eye-tracking technology
into HMDs followed suit. Within only a few
years, the solutions developed from research editions
provided by eye-tracking manufacturers, via clip-in solutions,
to, finally, mass-produced consumer-level hardware
with eye-tracking integrated by default (VIVE,
2022). This widespread availability of eye-tracking data
during usage of HMDs opens the door for integrating gaze-
based interactions by default into consumer media. It also
provides researchers with an abundance of opportunities to
investigate the transferability of what is known to work in
2D to 3D virtual reality applications.
The natural navigation of the visual space provided by
HMDs suggests that the observed gaze behavior would be
close to natural, with no artificial affordances of control
disrupting the visual exploration of the virtual world. Due
to this, VR could potentially overcome shortcomings of
lab experiments by providing a semi-realistic experience
that surpasses artificial lab settings (Clay et al., 2019;
Lappi, 2015). However, there are also challenges unique
to experiencing VR via HMDs. One is the users' potential
ability to physically move across the 3D environment.
Khamis et al. (2018) investigated the influence of user
movement, target size, the distance to targets, and the
radius of circular object trajectories on the performance of a
correlation-based algorithm, showing that, while results
remained sufficient, movement reduced the accuracy
of selections and negatively impacted performance.
For our study, we chose to keep all of these parameters
except distance constant and our participants stationary
across all conditions to control for possible effects.
Another challenge is the vergence-accommodation
conflict. When focusing on an object in a natural setting, the
focal distance of the eye and the vergence align. While
viewing a scene via an HMD, however, the vergence of the
user's eyes is set to the virtual distance of the focused
object behind the screen of the HMD, while the focal
distance is set to the screen. This creates a mismatch that
does not exist in the natural world and might lead to eye
strain (Dörner et al., 2013) and possibly slightly influence
the individual vergence response itself (Neveu et al.,
2012). However, the additional gaze information along the
third axis remains available over the course of the
interaction in VR. Can this information be useful to improve
smooth-pursuit algorithms in 3D?
While previous studies investigated the performance of
smooth-pursuit algorithms in 3D VR, either correlation-based
(Khamis et al., 2018) or based on the Euclidean
distance (Piumsomboon et al., 2017), the depth information
of a third axis was not yet included in the calculations.
Breitenfellner et al. (2019) conclude that so far there has
been no extension of the existing 2D smooth pursuit algorithms
for use in VR. While Khamis et al. (2018) found no
significant effect of distance on the correlation-based
algorithm's performance, we assume that distance will affect
the performance once the third dimension is included and
provides additional information to the detection
algorithms.
The aim of the study was to systematically examine the
potential of incorporating gaze information along the third
dimensional axis into the two currently most widely used
algorithms for 2D smooth-pursuit interaction. One
correlation-based algorithm and one distance-based
algorithm were adapted to 3D. In a first experiment, the
performance of both algorithms was examined by systematically
varying the parameters distance and trajectory of
object movement. During this experiment, only one object
was visible at any time, allowing for the assessment of
selection performance under ideal conditions.
The second experiment focused on the performance of
the algorithms while additional objects to choose from
were visible. The number of additional objects, as well as
their configuration within the 3D space, was varied
systematically to test the algorithms under ecologically
valid conditions.
The following section introduces the algorithms, fol-
lowed by the methods, and a description of the virtual en-
vironment, which were used for both experiments. After
that, details and outcomes of both experiments are pre-
sented individually, followed by a critical discussion and
outlook.
Algorithms and dependent variables
While 2D smooth-pursuit algorithms often use screen
coordinates to match targets and gaze, a 3D environment
requires adjustments. Instead of x-, y- and z-coordinates,
we defined the center of the HMD as the origin of a spher-
ical coordinate system and matched its position to the
origin of the world-space in our virtual environment. Dis-
tances were calculated as radial distance r with positions
being defined by the radius r and the angles theta θ and phi
φ for pitch and yaw respectively (Figure 1).
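As a minimal sketch of this coordinate change, the conversion from HMD-centred Cartesian coordinates to (r, θ, φ) could look as follows. The axis convention (y up, z forward) and the function name `to_spherical` are assumptions made for illustration; the text does not specify them.

```python
import math


def to_spherical(x, y, z):
    """Convert an HMD-centred Cartesian point to (r, theta, phi).

    Assumed convention (not stated in the text): y points up, z points
    forward, theta is pitch and phi is yaw, both returned in degrees.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the origin has no defined direction")
    theta = math.degrees(math.asin(y / r))  # pitch: elevation above the horizontal plane
    phi = math.degrees(math.atan2(x, z))    # yaw: rotation around the vertical axis
    return r, theta, phi
```

With this convention, a point straight ahead of the HMD has θ = φ = 0, and only r changes with its distance.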
The 3D Point of Regard (3D-POR) was used for gaze
estimation and defined as the mid-point between the re-
spective points on the gaze vectors of each eye where the
distance between both vectors reached its minimum. Both
of the following algorithms were initially tested against a
variable threshold. Determining the ideal threshold level
for both algorithms respectively was part of experiment 1.
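The 3D-POR definition above can be sketched with the standard closest-points construction for two lines. The function name `point_of_regard`, the tuple-based vectors, and the handling of (near-)parallel gaze vectors are assumptions for illustration, not the authors' implementation; the gaze rays are treated as infinite lines.

```python
def point_of_regard(origin_l, dir_l, origin_r, dir_r, eps=1e-12):
    """Midpoint of the shortest segment between the two gaze lines.

    Each line is given by an eye position and a gaze direction (3-tuples).
    Returns None when the lines are (nearly) parallel, because the closest
    points are then not unique (an assumption about how to handle this case).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = tuple(a - b for a, b in zip(origin_l, origin_r))
    a, b, c = dot(dir_l, dir_l), dot(dir_l, dir_r), dot(dir_r, dir_r)
    d, e = dot(dir_l, w0), dot(dir_r, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom  # parameter of the closest point on the left line
    t = (a * e - b * d) / denom  # parameter of the closest point on the right line
    p_l = tuple(o + s * k for o, k in zip(origin_l, dir_l))
    p_r = tuple(o + t * k for o, k in zip(origin_r, dir_r))
    return tuple((u + v) / 2 for u, v in zip(p_l, p_r))
```

If both gaze lines intersect exactly, the two closest points coincide and the function returns the intersection point itself.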
Figure 1. Illustration of the HMD-based coordinate system with
radius r, pitch θ, and yaw φ.
A correlation-based algorithm was adapted from the
correlation-based algorithm for 2D smooth-pursuit as de-
scribed by Vidal et al. (2013). This algorithm calculates
the product-moment correlation between gaze coordinates
and the coordinates of the moving target. Instead of x and
y-coordinates, the 3D-adapted algorithm uses r, θ and φ for
the calculations.
The difference-based algorithm was based on the ap-
proach by Lutz et al. (2015). The authors calculate the dif-
ference between the movement vector of targets and gaze
as well as the difference in angle of the movement vectors
in relation to the x-y plane. Targets are selected when both
criteria fall below a selection threshold. This algorithm
was adapted to 3D by using the radial distance r as well as
θ and φ of the moving targets to calculate the difference to
the gaze path.
Workflow
For each new frame, first the validity of the gaze data
was assessed (see Figure S1). Next, a 3D-POR was calculated
and added to a Vector3 field storing the last x
samples, with x being defined as the size of a moving
window. Once the maximum sample size was reached, the
oldest sample was removed whenever a new sample was
added. Simultaneously, the object coordinates of
each target object were stored in an identical manner in
respective Vector3 fields. After updating the 3D-POR
coordinates in the described manner, the respective algo-
rithms started calculating as follows:
The correlation-based algorithm iterates over all
possible interaction objects and calculates product-moment
correlations between the gaze and object coordinates
for each object. The calculations are performed
for each dimension (radius, θ, φ). In contrast to the
approach in 2D, where individual correlations are compared
to a threshold directly, we chose to calculate the
average of all correlation coefficients for each object. While
this potentially introduces an uncorrelated parameter, the
effect is the same for all respective samples, which
remain distinguishable via the remaining parameters.
Lowering the correlation threshold in these situations
will be tested, akin to the algorithm tested by
Khamis et al. (2018).
Upon calculating the correlation coefficients of all ob-
jects, the algorithm searches for the highest overall coeffi-
cient. If this correlation surpasses the particular threshold,
the respective item is marked as selected by the participant.
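The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the names `pearson`, `select_by_correlation` and `WINDOW_SIZE` are hypothetical, samples are plain (r, θ, φ) tuples, and a dimension without any variation is counted as a correlation of 0.

```python
import math
from collections import deque

WINDOW_SIZE = 30  # hypothetical window length in samples


def pearson(a, b):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def select_by_correlation(gaze, targets, threshold):
    """Return the name of the best-matching target, or None.

    gaze: sequence of (r, theta, phi) samples in the moving window.
    targets: dict mapping a target name to its (r, theta, phi) samples.
    """
    best_name, best_score = None, -math.inf
    for name, path in targets.items():
        # One correlation per dimension, then the mean of the three.
        score = sum(
            pearson([g[k] for g in gaze], [p[k] for p in path]) for k in range(3)
        ) / 3
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None


# The moving window: appending to a full deque drops the oldest sample.
gaze_window = deque(maxlen=WINDOW_SIZE)
```

Averaging the three coefficients means that a single uncorrelated dimension caps the mean at about 2/3, which is one way to see why a lowered threshold may be needed in such situations.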
The difference-based algorithm first splits the gaze
data Vector3 field in half based on timestamps. The
parameters of the halves containing the oldest and newest
gaze vectors, respectively, are averaged. The averaged
coordinates of the most recent half mark the end point of the
gaze vector; the averaged coordinates of the other half
constitute its origin. By averaging the gaze
data over several samples, we smooth the data and prevent
obtaining false values due to outliers. The averaged
starting point is subtracted from the averaged end point
in order to obtain
a movement vector ranging from start to finish of the
movement as recorded by the field interval, resulting in
a difference vector ∆.
These steps are performed for the gaze data as well as
for the positional coordinates of each object. In order to
achieve a relation between the object and gaze movement
vectors, the difference coefficients for r, θ and φ are cal-
culated as follows:
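(1) $D_r = \left| \dfrac{\Delta r_{\text{object}}}{\Delta r_{\text{object}} + \Delta r_{\text{gaze}}} - 0.5 \right|$
(2) $D_\theta = \left| \dfrac{\Delta\theta_{\text{object}}}{\Delta\theta_{\text{object}} + \Delta\theta_{\text{gaze}}} - 0.5 \right|$
(3) $D_\varphi = \left| \dfrac{\Delta\varphi_{\text{object}}}{\Delta\varphi_{\text{object}} + \Delta\varphi_{\text{gaze}}} - 0.5 \right|$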
The obtained coefficients express the difference between
gaze radius r, gaze angle theta θ and gaze angle phi
φ and the respective object parameters. The coefficients lie
within the range [0, ∞). A coefficient of 0 indicates a
perfect fit between gaze and object parameters.
The calculated difference coefficients can be plotted
on a base-10 logarithmic scale. For example, if the object
difference vector is kept constant at 10, a symmetrical
curve results for a variable, positive gaze difference
vector (see Figure 2). The difference coefficient reaches
its minimum of 0 at a gaze vector of 10 and its maximum
of 0.5 at a gaze vector of 0. Likewise, at high positive
deviations, approximately 0.5 is reached. If the gaze moves
in the opposite direction to the object, difference
coefficients of > 0.5 are always obtained. Only in the
special case that the gaze difference value exactly equals
the negative object difference value can the difference
coefficient not be calculated, because this would require
a division by 0. This case should hardly occur in practice.
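The behavior described in this paragraph can be checked numerically. The sketch below assumes that the single-dimension coefficient has the form |∆object / (∆object + ∆gaze) − 0.5|, which is a reading of equations (1) to (3) that reproduces every value named in the text; the function name and the choice to return infinity for the undefined case are assumptions.

```python
import math


def difference_coefficient(delta_object, delta_gaze):
    """Difference coefficient for one dimension (assumed form, see lead-in)."""
    denominator = delta_object + delta_gaze
    if denominator == 0:
        # Gaze difference equals the negative object difference: undefined.
        # Returning infinity is an assumption, so this case never matches.
        return math.inf
    return abs(delta_object / denominator - 0.5)


# Object difference fixed at 10, as in the example in the text.
for gaze in (0, 1, 10, 100, -5):
    print(gaze, difference_coefficient(10, gaze))
```

For a fixed object difference of 10, gaze differences of 1 and 100 give the same coefficient, which is the symmetry on the logarithmic axis that Figure 2 illustrates.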
Figure 2. Visualization of the relation between the ratio of gaze
to object movement and the resulting difference coefficient (for
one dimension).
In order to account for different distances and to correct
the 3D POR error, the three coefficients r, θ and φ are av-
eraged over all samples within the moving window. The
algorithm then compares the sum with the threshold. The
threshold level itself is adaptable; determining the
ideal threshold level was part of experiment 1.
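Put together, the difference-based selection might be sketched as below. Several details are assumptions because the text leaves them open: the window is split by index (samples are taken to be time-ordered), the middle sample of an odd-sized window is ignored, the three coefficients are combined by their mean, the target with the lowest score wins, and it is selected only if that score falls below the threshold. All function names are hypothetical.

```python
import math


def movement_vector(window):
    """Mean of the newer half minus mean of the older half, per dimension.

    window: time-ordered (r, theta, phi) samples, oldest first.
    """
    samples = list(window)
    half = len(samples) // 2
    if half == 0:
        raise ValueError("need at least two samples")
    older, newer = samples[:half], samples[-half:]
    return tuple(
        sum(s[k] for s in newer) / half - sum(s[k] for s in older) / half
        for k in range(3)
    )


def difference_coefficient(delta_object, delta_gaze):
    """Assumed single-dimension form: |d_obj / (d_obj + d_gaze) - 0.5|."""
    denominator = delta_object + delta_gaze
    if denominator == 0:
        return math.inf  # assumption: the undefined case never matches
    return abs(delta_object / denominator - 0.5)


def select_by_difference(gaze_window, target_windows, threshold):
    """Return the name of the best-matching target, or None."""
    gaze_delta = movement_vector(gaze_window)
    best_name, best_score = None, math.inf
    for name, window in target_windows.items():
        object_delta = movement_vector(window)
        # Mean of the r, theta and phi coefficients (an assumption).
        score = sum(
            difference_coefficient(o, g) for o, g in zip(object_delta, gaze_delta)
        ) / 3
        if score < best_score:
            best_name, best_score = name, score
    return best_name if best_score < threshold else None
```

Note that a dimension in which neither gaze nor object moves produces the undefined case in this sketch; how the original implementation treats it is not described in the text.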
Dependent Variables
The following section explains the parameters that
were analyzed as dependent variables in both experiments.
Detection rate (DR). A true positive (TP) detection
was defined by the target object surpassing the selection
threshold for the respective algorithm. A false positive
(FP) was defined as the algorithm detecting any other
object but the currently visible one as selected. No
detection (ND) took place if the threshold was not surpassed for
any of the objects. The detection rate relates these
parameters, akin to the assessment of a binary classifier:
(4) $DR = \dfrac{TP}{TP + FP + ND}$
The rates of false positives (FPR) and non-detections
(NDR) were calculated likewise.
Efficiency. The efficiency expresses the ratio of true
detections to overall detections:
(5) $Eff = \dfrac{TP}{TP + FP}$
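As a small worked example, the rates can be computed from trial counts as sketched below. The counts used in the usage lines are made up for illustration and are not results from the study; the formulas for FPR and NDR assume that "likewise" means the same denominator as the detection rate.

```python
import math


def detection_metrics(tp, fp, nd):
    """Detection rate, false positive rate, non-detection rate and efficiency.

    tp, fp, nd: counts of true positives, false positives and non-detections.
    FPR and NDR use the same denominator as DR (an assumption).
    """
    total = tp + fp + nd
    if total == 0:
        raise ValueError("no trials to evaluate")
    detections = tp + fp
    return {
        "DR": tp / total,
        "FPR": fp / total,
        "NDR": nd / total,
        # Efficiency is undefined when nothing was detected at all.
        "Eff": tp / detections if detections else math.nan,
    }


# Hypothetical counts: 80 hits, 15 wrong selections, 5 non-detections.
metrics = detection_metrics(tp=80, fp=15, nd=5)
print(metrics)
```

By construction DR, FPR and NDR sum to 1, whereas the efficiency ignores non-detections and is therefore never lower than the detection rate.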
Duration until selection. As long durations until
detection can invoke frustration in users (Khamis et al.,
2018), the duration until the algorithm was able to select
any object was introduced as an additional criterion for
comparing the performance of the algorithms. The duration is
expressed both in frames per second (fps) and in seconds (s).
Further Variables. To indicate the participants’ focus
on the task, their task performance, measured as the sum
of points related to the given task, and the
average reaction time per condition were tracked.
Methods
The following section describes the material used in
both experiments. Differences between both settings are
pointed out where applicable.
Virtual Environment. A virtual environment was cre-
ated with the Unity Game Engine (Unity, 2017). The envi-
ronment is seen from the viewpoint of a person standing
on a small planet of 2m diameter (Zehm, 2017) in front of
a starry sky. The environment was kept intentionally plain
to reduce the influence of head movements on the task
(Anderson & Bischof, 2019). An X on the planet marked
the ideal position for the subjects. A light source was
placed above and slightly behind the subject to prevent
blinding. A chicken inside a semi-transparent spherical
spaceship was introduced as a moving target (“Vertex
Cat”, 2017). The target was kept visually plain to prevent
sustained scanning of the details while hopefully being
sufficiently entertaining to maintain subjects’ motivation.
A high contrast to the backdrop was chosen to facilitate
visual detection (see Figure 3). The target had a diameter
of 0.07 m, corresponding to a visual angle of 10° in the near
condition and 2.9° in the far condition. The
size was chosen based on the results of a pre-test, consti-
tuting a compromise between identifiability over different
distances and simplicity.
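The stated visual angles follow from the target diameter and the viewing distances. The short sketch below uses the standard formula 2·atan(size / (2·distance)) and assumes the distances to the centers of the two spawn spheres (0.4 m and 1.4 m); the function name is hypothetical.

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Visual angle in degrees of an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


near = visual_angle_deg(0.07, 0.4)  # about 10 degrees
far = visual_angle_deg(0.07, 1.4)   # about 2.9 degrees
print(round(near, 1), round(far, 1))
```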
Figure 3. Virtual environment displaying the users' position.
Lower right corner: the target object "space chicken" in a close-
up.
Number of objects. A maximum of 26 individual objects
present at once was chosen in order to
prevent possible ceiling effects regarding the performance
of the algorithms. This number is higher than in similar
studies in 2D (e.g. Zeng et al., 2020) because it allowed
for testing a variety of unique movement directions within
the 3D space. During the first experiment, only one of the
objects was visible while the others remained hidden from
the user but were taken into account during the analysis.
This approach was chosen to facilitate sustained and ideal
smooth-pursuit movements on one target, without other
distractions, so that the algorithms could be tested under
an idealized, highly standardized smooth pursuit movement
performed by the participants. The second experiment made
a systematically varied number of distractors visible in
order to retest the resulting ideal performance as it would
occur “in the field” with natural ecological validity
(see experiment 2).
Distances. Two distances (near / far) were implemented
after having been selected for optimal usability and
prevention of eye strain in a pre-test. In the “near”
condition, the center of a spawn sphere was placed at
0.4 m distance (with the sphere spanning from 0.2 m to 0.6 m)
to provide a substantial vergence of the eyes while
being far enough away to prevent eye strain, irritation,
or disorientation due to overly large portions of the
visual field moving. The “far” condition set the center of the
spawn sphere at 1.4 m distance (spanning from 1.2 m to
1.6 m) to test the performance of the algorithms near the
limit of depth detection, where the lines of sight of both
eyes become nearly parallel.
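To give a rough sense of how much vergence information the two distances provide, the vergence angle for a target straight ahead can be estimated as 2·atan(IPD / (2·distance)). The interpupillary distance of 0.063 m used below is an assumed typical value, not a figure from the study, and the function name is hypothetical.

```python
import math

ASSUMED_IPD_M = 0.063  # assumed typical interpupillary distance, not from the study


def vergence_angle_deg(distance_m, ipd_m=ASSUMED_IPD_M):
    """Angle between both lines of sight for a target straight ahead."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))


near = vergence_angle_deg(0.4)  # roughly 9 degrees under the assumed IPD
far = vergence_angle_deg(1.4)   # roughly 2.6 degrees under the assumed IPD
print(round(near, 2), round(far, 2))
```

Under this assumption, the angle between the two lines of sight shrinks to less than a third when moving from the near to the far condition, so the same angular tracking error translates into a much larger depth error.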