Integrating Neurophysiologic Relevance Feedback
in Intent Modeling for Information Retrieval
Giulio Jacucci
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki,
P.O. Box 68, (Pietari Kalmin katu 5), Helsinki FI-00014, Finland. E-mail: [email protected]
Oswald Barral
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki,
P.O. Box 68, (Pietari Kalmin katu 5), Helsinki FI-00014, Finland. E-mail: oswald.barral@helsinki.fi
Pedram Daee
Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University,
P.O.Box 15400, Aalto FI-00076, Finland. E-mail: [email protected]
Markus Wenzel
Neurotechnology Group, Technische Universität Berlin, Berlin 10587, Germany. E-mail: markus.wenzel@hhi.fraunhofer.de
Baris Serim
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki,
P.O. Box 68, (Pietari Kalmin katu 5), Helsinki FI-00014, Finland. E-mail: [email protected]
Tuukka Ruotsalo
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki,
P.O. Box 68, (Pietari Kalmin katu 5), Helsinki FI-00014, Finland. E-mail: [email protected]i
Patrik Pluchino
Human Inspired Technology Research Centre, University of Padova, Via Luzzatti 4, Padova 35121, Italy.
Jonathan Freeman
Goldsmiths, University of London, New Cross, London SE14 6NW, UK. E-mail: [email protected]
Luciano Gamberini
Human Inspired Technology Research Centre, University of Padova, Via Luzzatti 4, Padova 35121, Italy.
Samuel Kaski
Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University,
P.O.Box 15400, Aalto FI-00076, Finland. E-mail: [email protected]
Benjamin Blankertz
Neurotechnology Group, Technische Universität Berlin, Berlin 10587, Germany. E-mail: benjamin.blankertz@tu-berlin.de
Received November 20, 2017; revised April 20, 2018; accepted October 17, 2018
© 2019 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals, Inc. on behalf of ASIS&T.
Published online March 12, 2019 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.24161
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribu-
tion in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 70(9):917–930, 2019
The use of implicit relevance feedback from neurophysi-
ology could deliver effortless information retrieval. How-
ever, both computing neurophysiologic responses and
retrieving documents are characterized by uncertainty
because of noisy signals and incomplete or inconsis-
tent representations of the data. We present the first-of-
its-kind, fully integrated information retrieval system
that makes use of online implicit relevance feedback
generated from brain activity as measured through elec-
troencephalography (EEG), and eye movements. The
findings of the evaluation experiment (N = 16) show that
we are able to compute online neurophysiology-based
relevance feedback with performance significantly bet-
ter than chance in complex data domains and realistic
search tasks. We contribute by demonstrating how to
integrate in interactive intent modeling this inherently
noisy implicit relevance feedback combined with scarce
explicit feedback. Although experimental measures of
task performance did not allow us to demonstrate how
the classification outcomes translated into search task
performance, the experiment proved that our approach
is able to generate relevance feedback from brain sig-
nals and eye movements in a realistic scenario, thus
providing promising implications for future work in neu-
roadaptive information retrieval (IR).
Introduction
Information retrieval systems are confronted with a difficult
task: deriving a user's information needs from limited
explicit user signals and using these to retrieve information
matching those needs. Although modeling the data to be
retrieved has witnessed dramatic advances during the last
decades, understanding users' information needs is still
based on rather simple user signals, such as queries, clicks,
speech commands, or other explicit interactions. As a result,
understanding information needs implicitly without disrupt-
ing the user has become a central research challenge in
information retrieval (IR). Neurophysiologic measures are
promising candidates for implicitly gathering relevance feed-
back, as they reflect the inner state of the user and can be
collected unobtrusively at high throughput (Cowley et al.,
2016; Eugster et al., 2016; Jacucci, Fairclough, & Solovey,
2015; Wenzel, Bogojeski, & Blankertz, 2017). Neurophysiologic
signals hold great potential for information retrieval
because they provide a novel user signal that reveals interest in,
and the relevance of, diverse digital content at the moment
users consume it. Neurophysiologic signals also carry
extraordinary practical promise as numerous types of wearable
devices are rapidly becoming an integral part of people's
everyday life.
However, successful application of neurophysiologic
measures in IR encounters a dual uncertainty problem:
(a) noisiness and unknown causes of responses in neuro-
physiologic signals make it difficult to interpret them, a
problem exacerbated by the lack of stimulus control in
realistic settings, and (b) the IR process involves inherent
uncertainty originating from the ambiguity and inconsistency
of the representations of data to be retrieved. Unlike
explicit relevance feedback, which has low uncertainty due to a
user's overt control, implicit relevance feedback techniques
are intrinsically noisy. When a user's click-through activity
or brain responses are observed to infer relevance feedback,
the uncertainty about the accuracy of the feedback increases,
and incorporating this feedback within an interactive
IR system requires novel computational solutions. The
integration of brain signals has been especially challeng-
ing; even though they have shown promise, their use
beyond laboratory experiments with very controlled stimuli
remains largely unexplored. Previous work displays a lim-
ited number of unambiguous stimuli on the screen and/or
constrains user interaction to decrease the amount of noise
(Eugster et al., 2014; Eugster et al., 2016). In contrast, real-
istic search interfaces are characterized by dense informa-
tion, potential ambiguity regarding the relevance of search
results, and user interaction.
Our work provides the following contributions:
1. We demonstrate an approach able to predict implicit rele-
vance feedback from human-brain measurements in a real-
istic search scenario.
2. We present a first-of-its-kind interactive IR system that com-
bines brain-based feedback and eye tracking with scarce
explicit feedback for improved relevance predictions.
The article is structured as follows. First, we briefly discuss
related work on implicit relevance feedback in IR
using brain-computer interfaces (BCIs). The
section "An Approach for Single-Trial Relevance Computation
in IR" investigates the challenge of decoding single-trial
event-related potentials (ERPs) that involve semantic
interpretation of complex stimuli with large variability. We
follow with a detailed proposal of a neurophysiologic
approach for relevance computation, validating the method
while highlighting potential challenges
to be addressed when integrating relevance computation
from brain signals into an IR system.
In the subsequent section, "Addressing Uncertainty in an
Online Neuroadaptive System through Interactive Intent
Modeling," we propose interactive intent modeling as a
retrieval and ranking approach that facilitates the
elicitation of explicit and implicit relevance feedback. Our
approach combines the modeling of neurophysiologic
responses with interactive intent modeling in IR.
In the section "An Experiment in
Neuroadaptive Literature Search," we report the evaluation
of our approach through an experiment (N = 16)
showing that we can predict neurophysiology-based
relevance feedback in complex data domains and realistic
search tasks and combine it with explicit relevance feedback
in interactive intent modeling.
Related Work
Traditional relevance feedback techniques involve asking
a user to provide explicit judgments on the information content.
This has proven to be problematic because, in practice,
users are reluctant to interrupt their search task to
provide relevance feedback, even though they are aware
that doing so would improve their search performance
(Kelly & Fu, 2006). An important bottleneck of information
seeking systems is that a considerable amount of user
relevance feedback on retrieved items is needed to properly
explore the large information space (Daee, Pyykkö,
Glowacka, & Kaski, 2016). To overcome this challenge, previous
approaches investigated implicit relevance feedback
derived from search behavior, such as mouse and keyboard
interaction data, to understand a user's interests and to personalize
and rank search results (Kelly & Teevan, 2003). Other
sources of implicit feedback include eye tracking to infer a
user's interest through various metrics such as fixation count,
dwell time, pupil size, and scan paths (e.g., Gwizdka, 2014;
Oliveira, Aula, & Russell, 2009; Puolamäki, Salojärvi, Savia,
Simola, & Kaski, 2005), analysis of users' facial expressions
(e.g., Arapakis, Athanasakos, & Jose, 2010), physiologic
responses (e.g., Barral et al., 2015, 2016), or a combination
of these (e.g., Arapakis, Konstas, & Jose, 2009; Moshfeghi &
Jose, 2013). Lately, brain signals have been identified as
promising sources for implicit relevance feedback and infor-
mation personalization (e.g., Eugster et al., 2014; Eugster
et al., 2016; Golenia, Wenzel, & Blankertz, 2015).
IR is one of the fields that could profit from this direct
access to the mental processes of the brain (Golenia et al.,
2015; Gwizdka & Mostafa, 2015, 2017). First, mental
processes can reveal information about relevance in
response to particular information items, thereby providing
an effective way to elicit relevance feedback implicitly. This
brings efficiency gains, because more items can be exposed
and relevance feedback collected without disrupting the user's
search process. Second, mental processes and psychophysiologic
states can be used to automatically annotate information
such as news with affective or relevance responses for
future use and collaborative filtering (Barral et al., 2016;
Barral, Kosunen, & Jacucci, 2017). Finally, affective
states detectable from the brain can provide important
context information on when or how to present information
to the user, considering awareness, cognitive workload, and other
mental states. Research at the intersection between brain-computer
interfaces (BCIs) and IR is still at an early stage,
and appropriate neurophysiologic methods have to be
matched with the appropriate paradigms for HCI in
IR. Kauppi et al. (2015) studied magnetoencephalographic
signals alone and in conjunction with gaze signals to pro-
vide relevance feedback in an image retrieval task by using
a static image database. Similarly, Eugster et al. (2014)
decoded the EEG with the objective of providing relevance
feedback in a text retrieval task by using a static text data
set. Other studies (Golenia et al., 2015; Golenia, Wenzel,
Bogojeski, & Blankertz, 2018) demonstrated how the brain
response to relevant versus irrelevant information can be
harnessed to improve image searches in ambiguous search
tasks. Recently, the neurophysiologic correlates
of relevance were studied by Moshfeghi, Pinto, Pollick,
and Jose (2013)
using functional magnetic resonance imaging (fMRI),
revealing three brain regions in the frontal, parietal, and temporal
cortex where brain activity differed between processing
relevant and nonrelevant documents. Not only where
in the brain but also when relevance assessment phenomena
happen has been studied, for example by Allegretti
et al. (2015) using a 64-channel EEG device. They found a
significant difference in the EEG signals between relevance and
nonrelevance during the first 800 milliseconds (ms) of the relevance
assessment process, measured from the presentation of the image.
These studies provide important
additional evidence on the feasibility of brain-signal-based
relevance elicitation; however, most studies focus on the
relevance of images and videos and less on text, for which
it can be more challenging to elicit and detect physiologic
responses. Moreover, Eugster et al. (2016) gave relevance
feedback on words from the Wikipedia database according
to information extracted from EEG signals. The loop
between brain and computer was closed by presenting new
recommendations to the users according to the EEG-based
feedback, which resulted in a significant information gain
for about 70% of the participants of the study. This work
presumably constitutes the first proof-of-concept IR system
to perform automatic information filtering on the
basis of brain activity alone.
Despite these advancements, such as studies of neurophysiologic
correlates of relevance and applications using different
kinds of stimuli, there is a lack of understanding of how
to integrate neurophysiology-based relevance feedback in a
realistic IR scenario. On the one hand, this includes the need for
standardized approaches and procedures in research
(Mostafa & Gwizdka, 2016), considering for example the
use of machine learning. More importantly, questions arise
as to which user intent and retrieval models are best suited to
process the obtained implicit relevance feedback and how
this can be combined with other relevance information
obtained, for example, through explicit feedback.
An Approach for Single-Trial Relevance
Computation in IR
Uncertainty in Single-Trial EEG Decoding
Because of the comparatively high conductivity of the
brain and scalp relative to that of the skull, electrical
signals arrive spatially smeared at the EEG sensors,
leading to a low signal-to-noise ratio. Each sensor receives a
mixture of signals from many sources in the brain and,
conversely, the signals of one particular brain source are
recorded at many different electrodes with a broad spatial
profile. The predominant approach for real-time decoding
is to employ multivariate data analysis methods from the
field of machine learning (Lemm, Blankertz, Dickhaus, &
Müller, 2011) and to train subject-specific decoding
models on calibration data. Although this approach is comparatively
effective, a high degree of uncertainty in single-trial
analysis remains, probably because of the very high
number of potentially disturbing sources.
The perception and cognitive evaluation of visual stim-
uli, such as information presented on a computer screen, is
reflected by event-related potentials (ERPs). In the well-
known ERP-based Row-Column Speller (Farwell &
Donchin, 1988), users concentrate on a target symbol
while the rows and columns of the matrix of all symbols
are flashing randomly. If the user fixates on the target
symbol by gaze, the detection task boils down to a mere
detection of flashes. More recent ERP-based spellers, such
as the Center Speller (Treder, Schmidt, & Blankertz,
2011), circumvent the gaze dependency of the Row-Column
Speller but place a higher load on the user, as they
require the recognition of a target shape or color.
Advancing further into the realm of IR (3), the evaluation
of information involves semantic interpretation and more
complex stimuli with large variability. In this escalation,
the brain responses follow an increasingly less common
temporal structure across trials. This leads to a larger vari-
ability in the latencies, but also in the morphology of the
ERPs and, therefore, to a larger uncertainty in the decod-
ing, see Figure 1.
The challenge of extracting information from single-trial
EEG becomes even greater when free-viewing applications
are considered. A suitable method for the investigation of
free-viewing tasks is the analysis of eye-fixation-related potentials
(EFRPs; Baccino & Manunta, 2005). Nevertheless, the
decoding of the cognitive processes is hampered. On the one
hand, unrelated brain activity connected to saccades
and artifacts from eye movements overlay the EEG. On
the other hand, the temporal relationship between target-related
ERP components and eye movements is variable
because task-relevant processing of visual objects may
already start before the beginning of a saccade, for example
when the visual object is still at a peripheral location
(Wenzel, Golenia, & Blankertz, 2016).
Neurophysiology-Based Relevance Computation
We propose a method to predict the relevance of textual
keywords from brain signals and eye movements. The
approach follows a supervised learning scheme, in which a
user-specific classifier is trained by using labeled data.
Then, the trained classifier can be used to generate rele-
vance measures online, which can potentially be used in a
feedback loop while the user interacts with the system.
This machine learning approach parallels that of most modern
BCI systems (Nijholt et al., 2008).
Training the classifier. The purpose of this first phase
(referred to as the calibration phase) is to gather enough
brain activity associated with the user's relevance judgments
to train a classifier that will then be used to generate
relevance measures online. A series of keywords for
which relevance labels are known is presented to the
user, and eye tracking is employed to identify when an
eye fixation falls on a keyword. For each fixation that
falls on a keyword, a high-dimensional feature vector is
extracted from the EEG and eye movements (see below)
and is labeled as "relevant" or "irrelevant" according to
the known label of the keyword. A classification function
is then trained to discriminate the feature vectors of the
"relevant" and the "irrelevant" classes. To this end, regularized
linear discriminant analysis is used (Friedman,
1989), whereby the shrinkage parameter is calculated with
an analytic method (Ledoit & Wolf, 2004; Schäfer &
Strimmer, 2005).
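The training step can be illustrated with a minimal NumPy sketch of a binary linear discriminant whose pooled covariance is shrunk towards a scaled identity matrix. This is not the authors' implementation: the function names and the fixed shrinkage value `gamma` are assumptions made for illustration, whereas the text determines the shrinkage parameter analytically.

```python
import numpy as np


def train_shrinkage_lda(X, y, gamma=0.1):
    """Fit a binary LDA with a covariance shrunk towards a scaled identity.

    X: (n_samples, n_features) feature vectors, one per fixation.
    y: labels, 1 = "relevant", 0 = "irrelevant".
    gamma: shrinkage strength in [0, 1]. It is fixed here for illustration
           (an assumption); the text estimates it with an analytic method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu_rel = X[y == 1].mean(axis=0)
    mu_irr = X[y == 0].mean(axis=0)

    # Pooled within-class covariance estimate.
    centered = np.vstack([X[y == 1] - mu_rel, X[y == 0] - mu_irr])
    cov = centered.T @ centered / (len(X) - 2)

    # Shrink towards nu * I, where nu is the average eigenvalue of cov.
    d = cov.shape[0]
    nu = np.trace(cov) / d
    cov_shrunk = (1.0 - gamma) * cov + gamma * nu * np.eye(d)

    # Linear decision function: score(x) = w @ x + b.
    w = np.linalg.solve(cov_shrunk, mu_rel - mu_irr)
    b = -0.5 * w @ (mu_rel + mu_irr)
    return w, b


def relevance_score(w, b, x):
    """Positive scores lean towards "relevant", negative towards "irrelevant"."""
    return float(w @ np.asarray(x, dtype=float) + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic calibration data standing in for labeled fixation features.
    X = np.vstack([rng.normal(0.5, 1.0, (50, 8)), rng.normal(-0.5, 1.0, (50, 8))])
    y = np.array([1] * 50 + [0] * 50)
    w, b = train_shrinkage_lda(X, y)
    print(relevance_score(w, b, X[0]))
```

Shrinkage keeps the covariance estimate invertible and well conditioned when the number of features is large relative to the number of calibration fixations, which is the typical situation for single-trial EEG features.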
Online relevance computation. Once the system has been
calibrated for the specific user by training a user-specific
classifier, the user can interact with the system while EEG
signals and eye movements are monitored (referred to as
the online phase). For each keyword fixated on, a high-dimensional
feature vector is extracted (see below), and the
classifier infers its label online as belonging to the "relevant"
or "irrelevant" class. This means that the relevance
predictions are available to the system in real time and can
be used in an adaptive feedback loop.
FIG. 1. From target to relevance detection. The classical row-column speller (a) consists essentially of the detection of flashing. The center speller
(b) relies on the recognition of a target shape/color. In contrast, the task of searching for relevant terms (c) is incomparably more complex. [Color figure can
be viewed at wileyonlinelibrary.com]
Feature extraction. High-dimensional feature vectors are
extracted from EEG channels recorded at 1000 Hz according
to the following steps. First, the multichannel EEG signal
is re-referenced to the linked mastoids and low-pass
filtered (with a second-order Chebyshev filter; 42 Hz pass-band,
49 Hz stop-band). The continuous signal is then segmented
by extracting the interval from 100 ms to 800 ms
after the onset of every eye fixation. Slow fluctuations in
the signal are removed by baseline correction (i.e., by
subtracting the mean of the signal within the first 50 ms
after the fixation onset from each epoch). The signal is
downsampled from the original 1000 Hz to 20 Hz to
decrease the dimensionality of the feature vectors to be
obtained (14 values per channel). A low dimensionality in
comparison to the number of available samples has been
shown to reduce the risk of overfitting to the training data,
which in turn is beneficial for the classification performance
(Blankertz, Lemm, Treder, Haufe, & Müller, 2011).
The multichannel signal is vectorized by concatenating the
values measured at the EEG channels at the 14 time points.
The fixation duration is concatenated as an additional feature
to the EEG feature vector. Other eye-tracking-related
features (e.g., gaze velocity) are not considered as they are
not provided in real time by the application programming
interface of the device. Further, eye-movement-related signal
components are not removed from the EEG because
the classifier is expected to deal with task-unrelated eye
movements.
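The epoching, baseline correction, downsampling, and vectorization steps described above can be sketched as follows. This is an illustrative sketch only: it assumes the signal has already been re-referenced and low-pass filtered, and it downsamples by averaging non-overlapping blocks of 50 samples, which is an assumption because the text does not state the downsampling method. The function name and arguments are hypothetical.

```python
import numpy as np

FS = 1000              # original sampling rate in Hz (from the text)
TARGET_FS = 20         # rate after downsampling in Hz (from the text)
EPOCH_MS = (100, 800)  # epoch interval relative to fixation onset (from the text)
BASELINE_MS = 50       # baseline window after fixation onset (from the text)


def fixation_features(eeg, onset, fixation_duration_ms):
    """Build one feature vector for one eye fixation.

    eeg: (n_channels, n_samples) array, assumed to be re-referenced and
         low-pass filtered already (filtering is not shown here).
    onset: sample index of the fixation onset.
    fixation_duration_ms: fixation duration, appended as the last feature.
    """
    eeg = np.asarray(eeg, dtype=float)
    start = onset + EPOCH_MS[0] * FS // 1000
    stop = onset + EPOCH_MS[1] * FS // 1000
    if onset < 0 or stop > eeg.shape[1]:
        raise ValueError("epoch does not fit inside the recording")

    # Baseline: mean of the first 50 ms after fixation onset, per channel.
    baseline_end = onset + BASELINE_MS * FS // 1000
    baseline = eeg[:, onset:baseline_end].mean(axis=1, keepdims=True)
    epoch = eeg[:, start:stop] - baseline

    # Downsample 1000 Hz -> 20 Hz by averaging blocks of 50 samples
    # (assumed method), which yields 14 values per channel for 700 ms.
    step = FS // TARGET_FS
    n_points = epoch.shape[1] // step
    down = epoch[:, : n_points * step].reshape(eeg.shape[0], n_points, step).mean(axis=2)

    # Concatenate all channels and time points, then append fixation duration.
    return np.concatenate([down.ravel(), [float(fixation_duration_ms)]])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    recording = rng.normal(size=(32, 5000))  # synthetic 32-channel signal
    features = fixation_features(recording, onset=1000, fixation_duration_ms=240)
    print(features.shape)
```

With 32 channels, the resulting vector has 32 × 14 EEG values plus one fixation duration value.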
Method validation. To validate the approach in terms of
computing relevance measures from semantic words, we
carried out a prior experiment (N = 15). The main question
addressed was whether relevance inference from the
electroencephalogram (EEG) can be applied in settings where
the interpretation of the semantics goes beyond the simple
recognition of a previously known letter, picture, or shape
that is repeatedly flashed. In the experiment, participants
looked for words that belonged to semantic categories, and
the system predicted in real time which words, and thus which
semantic category, the user was interested
in. Results showed that models using EEG features alone,
and in combination with the eye fixation duration feature,
were able to generate single-trial predictions on the keywords
significantly above chance level. Further, these predictions
were aggregated in real time to provide reliable
estimates of the semantic category of interest,
with slight improvements when fixation duration was added
to the EEG-based feature vectors. Complete details on
the prior experiment have been published separately in
Wenzel et al. (2017).
The prior experiment provided several insights. First, it
validated the use of EEG and eye gaze signals to infer the
subjective relevance of words that required interpretation with
respect to their semantics in a free search task (as opposed
to commonly used "counting" tasks). Further, predictions
were generated on words that were presented simultaneously,
relating neural activity to keywords using eye
tracking. The prior experiment also revealed relatively
low single-trial classification performance, which
was successfully dealt with in real time by averaging over
semantic categories. However, when interacting with a real
IR system, the user's interests and intentions may be more
complex than those simulated in the prior experiment, and
other mechanisms should be envisaged to integrate contextual
information that may help to correct the noisy single-trial
predictions.
Addressing Uncertainty in an Online
Neuroadaptive System through Interactive Intent
Modeling
A promising solution to cope with the uncertainty in the
user's intent is interactive intent modeling (Ruotsalo,
Jacucci, Myllymäki, & Kaski, 2015), where the potential
search intentions of the user are represented and visualized
as keywords, their relevance is estimated using feedback
signals from the user, and information corresponding to the
model is retrieved. In terms of neuroadaptive systems,
intent modeling can mitigate both the uncertainty related to
the noise present in neurophysiologic signals and the mismatch
between the user's articulation of information needs
and the encodings of the information to be retrieved.
Adapting the intent model from suboptimal and noisy user
feedback
The intent model directly couples the potentially subop-
timal user feedback originating from implicit and explicit
user signals. The implicit feedback is connected to explicit
feedback by considering source-specific probabilistic
assumptions on their uncertainties. This provides the flexi-
bility to learn the true uncertainty of each feedback given
all preceding feedback.
Estimating the intent model. The relevance of keywords
in the model is described with a linear Gaussian model,
in which the accuracy of the feedback may differ between the
source types (implicit or explicit). The relevance
of keyword $i$ is modeled as
$$y_i \sim \mathcal{N}\left(x_i^\top \phi,\ \sigma^2 / w_i\right), \qquad (1)$$
where $x_i$ is the feature vector representing that keyword, $\phi$
is the unknown weight vector, which is shared between all
keywords and maps the feature vectors to relevance values
representing user intent, $\sigma^2$ is the variance of the feedback
noise, and $w_i$ models the accuracy of the relevance feedback.
We assume the prior distributions on the parameters
to be
$$\phi \sim \mathcal{N}(0, \lambda I),$$
$$\sigma^2 \sim \text{InverseGamma}(\alpha_{\sigma^2}, \beta_{\sigma^2}),$$
$$w_i \sim \text{Gamma}(\alpha_w, \beta_w),$$
where $\lambda$, $\alpha_{\sigma^2}$, and $\beta_{\sigma^2}$ are fixed hyperparameters. A key
aspect of our approach is that we distinguish between
implicit and explicit feedback by using different hyperparameters
for the prior of the accuracy values, that is, $(\alpha_w^{\text{exp}}, \beta_w^{\text{exp}})$
for explicit feedback and $(\alpha_w^{\text{imp}}, \beta_w^{\text{imp}})$ for implicit feedback.
The posterior of the model estimates both the user's current
search intent ($\phi$) and the accuracy of the user relevance
feedback (the $w_i$). As mentioned, the accuracies of the
user feedback on keywords are unknown and drawn from
a gamma distribution with two parameters, $\alpha_w$ and $\beta_w$.
The model differentiates between explicit and implicit feedback
by using different sets of hyperparameters for the
gamma distribution. The explicit feedback is considered
very certain (a gamma distribution with mean 1 and very
small variance, that is, $\alpha_w^{\text{exp}} = 100$, $\beta_w^{\text{exp}} = 100$). On the other
hand, the implicit feedback is uncertain a priori (gamma
distribution with mean 0.5 and large variance, that
is, $\alpha_w^{\text{imp}} = 1$, $\beta_w^{\text{imp}} = 2$), and therefore its accuracy is mostly
inferred from observations. For example, if the implicit
feedback is in line with the previous history of feedback,
then it will be inferred as certain and will contribute to the
user model. However, if it contradicts the system's current
belief, learned from the sequence of feedback, then its accuracy
may be inferred as a low value and it will not affect
the user model (the posterior of $\phi$) much. The model thus infers
the accuracies and corrects for the noise in the feedback.
We use mean-field variational inference for the posterior
inference (Attias, 1999; Kangasrääsiö, Chen, Glowacka, &
Kaski, 2016).
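As a quick check of the prior settings quoted above, the following sketch computes the mean and variance of the two gamma priors. It assumes the shape-rate parameterization, which is the one consistent with the stated means of 1 and 0.5; the function names are hypothetical.

```python
def gamma_moments(alpha, beta):
    """Mean and variance of a Gamma(shape=alpha, rate=beta) distribution."""
    return alpha / beta, alpha / beta ** 2


def feedback_noise_variance(sigma2, w):
    """Noise variance of one feedback value under Equation 1: sigma^2 / w_i."""
    return sigma2 / w


if __name__ == "__main__":
    explicit_mean, explicit_var = gamma_moments(100, 100)  # prior for explicit feedback
    implicit_mean, implicit_var = gamma_moments(1, 2)      # prior for implicit feedback
    print("explicit:", explicit_mean, explicit_var)
    print("implicit:", implicit_mean, implicit_var)
    # At the prior means, implicit feedback is treated as twice as noisy.
    print(feedback_noise_variance(0.1, explicit_mean),
          feedback_noise_variance(0.1, implicit_mean))
```

Under this parameterization the explicit prior has mean 1 and variance 0.01, while the implicit prior has mean 0.5 and variance 0.25, so the accuracy of implicit feedback is left largely to be determined by the data. The value 0.1 for the noise variance in the last lines is chosen only for illustration.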
Estimating Document Relevance
In addition to estimating the relevance of the keywords
in the intent model, the relevance of the documents
is estimated and the documents are ranked. We employ the feature
transformation that projects the relevance estimated for the
keywords to the documents (Daee et al., 2016). The
underlying principle is that the transformation projects documents
into the feature space of the keywords, because the relevance
of a document is a weighted sum of the relevance of
the individual keywords that appear in it. Based on this
projection, the relevance of a document also follows Equation 1,
with the difference that the document feature vector
is generated from the feature projection.
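The weighted-sum principle can be illustrated with a small sketch. The document-keyword weight matrix used here (for example, normalized keyword occurrences) is a hypothetical stand-in, since the exact transformation is specified in Daee et al. (2016) and not in this section.

```python
import numpy as np


def project_documents(doc_keyword_weights, keyword_features):
    """Project documents into the keyword feature space.

    doc_keyword_weights: (n_docs, n_keywords) weights of each keyword in each
        document (hypothetical, e.g. normalized occurrence counts).
    keyword_features: (n_keywords, n_features) keyword feature vectors x_i.
    Returns (n_docs, n_features) document feature vectors x_j.
    """
    W = np.asarray(doc_keyword_weights, dtype=float)
    X = np.asarray(keyword_features, dtype=float)
    return W @ X


def expected_relevance(features, phi):
    """Expected relevance x^T phi for each row of `features`."""
    return np.asarray(features, dtype=float) @ np.asarray(phi, dtype=float)


if __name__ == "__main__":
    X_kw = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    W = np.array([[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]])
    phi = np.array([0.4, -0.1])
    X_doc = project_documents(W, X_kw)
    print(expected_relevance(X_doc, phi))
```

Because the projection is linear, the expected relevance of a projected document equals the weighted sum of the expected relevance values of its keywords.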
Exploring uncertainty. Estimating the intent model by
directly exploiting the feedback observed from the user
leads to showing items like those already judged relevant
by the user in the previous iterations. Because the implicit
feedback observed from the user may be inaccurate, this
exploitative choice might cause the intent model to con-
verge to a suboptimal representation of the users intention.
Alternatively, the system might exploratively select items
that are relevant, but also uncertain. These items are likely
to be better for obtaining feedback in subsequent iterations
as they are novel and not too similar to the ones already
judged by the user.
Multiarmed bandits have been shown to be able to
model this exploration and exploitation dilemma in information
seeking (Ruotsalo et al., 2015). We use the Thompson
sampling algorithm (Agrawal & Goyal, 2013) as a
solution to the multiarmed bandit problem, to control the
exploration and exploitation balance of the recommended
keywords and documents (Daee et al., 2016). The idea
behind Thompson sampling is that the uncertainty in the
marginal posterior of $\phi$ can by itself control the exploration
and exploitation of the items. To implement the algorithm,
it is enough to draw a sample from the posterior and rank
all the keywords and documents accordingly. In detail, the
Thompson sampling algorithm performs the following
steps in each iteration:
1. Draw a sample from the marginal posterior of $\phi$ and denote
it as $\phi_p$.
2. Rank all the keywords based on the inner product $x_i^\top \phi_p$.
3. Rank all the documents based on the inner product $x_j^\top \phi_p$.
4. Recommend the highest ranked items and gather the
feedback.
5. Update the posterior.
Here, $x_i$ and $x_j$ denote the feature vectors of keyword
$i$ and document $j$ (after the transformation), respectively.
The highest ranked recommendations are expected to
balance exploration and exploitation
(Agrawal & Goyal, 2013).
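The five steps above can be sketched in a simplified setting. The sketch below is not the inference used in the article: it assumes that the noise variance and the feedback accuracies $w_i$ are known and fixed, so that the posterior of $\phi$ is Gaussian in closed form, whereas the article infers them with mean-field variational inference. All function names and default values are illustrative.

```python
import numpy as np


def gaussian_posterior(X_fb, y_fb, w_fb, lam=0.1, sigma2=0.1):
    """Closed-form posterior of phi for y_i ~ N(x_i^T phi, sigma2 / w_i).

    Simplifying assumption: sigma2 and the accuracies w_fb are treated as
    known constants rather than inferred. Prior: phi ~ N(0, lam * I).
    """
    X_fb = np.asarray(X_fb, dtype=float)
    y_fb = np.asarray(y_fb, dtype=float)
    w_fb = np.asarray(w_fb, dtype=float)
    d = X_fb.shape[1]
    precision = np.eye(d) / lam + (X_fb.T * (w_fb / sigma2)) @ X_fb
    cov = np.linalg.inv(precision)
    mean = cov @ (X_fb.T @ (w_fb * y_fb / sigma2))
    return mean, cov


def thompson_rank(mean, cov, X_keywords, X_docs, rng, top_k=10):
    """Steps 1 to 3: sample phi_p, then rank keywords and documents by x^T phi_p."""
    phi_p = rng.multivariate_normal(mean, cov)
    keyword_order = np.argsort(-(np.asarray(X_keywords) @ phi_p))[:top_k]
    document_order = np.argsort(-(np.asarray(X_docs) @ phi_p))[:top_k]
    return keyword_order, document_order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_kw = rng.normal(size=(20, 5))   # synthetic keyword features
    X_doc = rng.normal(size=(50, 5))  # synthetic projected document features
    # One explicit click (accuracy 1.0) and two implicit values (accuracy 0.5).
    idx = np.array([0, 3, 7])
    y = np.array([1.0, 0.8, 0.1])
    w = np.array([1.0, 0.5, 0.5])
    mean, cov = gaussian_posterior(X_kw[idx], y, w)
    keywords, documents = thompson_rank(mean, cov, X_kw, X_doc, rng, top_k=5)
    # Steps 4 and 5 would show these items, gather feedback, and refit the posterior.
    print(keywords, documents)
```

Because the ranking uses a posterior sample rather than the posterior mean, items whose relevance is still uncertain occasionally rank highly, which is how the exploration arises without a separate exploration parameter.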
Visualizing the Intent Model for Explicit and Implicit
Interaction
To enable implicit and explicit feedback from the user,
the intent model needs to be visualized for interaction. The
implicit feedback is captured from eye fixations
and EEG signals.
Interface views. The interface consists of two separate
views: intent model view and document view. The intent
model view, shown in Figure 2, visualizes the top-k key-
words chosen based on their estimated weights resulting
from the Thompson sampling algorithm. The view
employs a circular layout chosen to increase eye tracking
accuracy, which is higher at the center of the screen. The
keywords are positioned randomly, but the layout is optimized
to increase the distance between neighboring keywords
for more robust matching with eye fixations. The
document view, shown in Figure 3, has a conventional
ranked list visualization.
Interaction. The search is initiated by entering a query,
which results in the first set of results retrieved by the system.
To direct the search, users can open a view that displays
a set of keywords that are potentially relevant to the
user's search intent. The users can examine these keywords
and provide explicit relevance feedback on one of the keywords
by clicking on it. While users examine the keywords,
the physiologic classifier generates implicit
relevance feedback on them. The system then updates the
intent model by taking into account both the explicit relevance
feedback and the implicit feedback generated from
the keywords the user fixated on. The system then returns
the next iteration of results. This process is repeated until
the user decides to change the query or ends the search
task. Figure 4 depicts the user-system interaction as a
control loop.
An Experiment in Neuroadaptive Literature Search
This experiment evaluates the approach and system
presented in the previous two sections by investigating
the following question:
Is it possible to predict online relevance from neurophysiology
in a realistic search task and integrate it as
implicit feedback in combination with explicit feedback in
interactive intent modeling?
System Apparatus
The system that integrates neurophysiology-based implicit
feedback with interactive intent modeling is implemented as a
web application with a frontend (the interface) and backend
(the engine) architecture (see Figure 5). The engine comprises
three main components: the Controller, which coordinates
the different components of the system; the Physiologic
Classifier, which generates real-time implicit relevance feedback;
and the Interactive Intent Model, which handles the user
model and the information items of the system. The Physiologic
Classifier is implemented within the framework of the
BBCI-Toolbox.¹ For each gaze fixation, the classifier sends
a relevance value to the Controller. The Controller checks
whether the fixation falls on a keyword visible on the screen
and, if so, associates the predicted relevance value with that
keyword. For collecting eye movements, the system uses the
SensoMotoric Instruments RED500 eye tracker, interfaced
through the SMI iViewX SDK.² For collecting brain signals,
the system supports the BrainProducts QuickAmp and BrainAmp
amplifiers,³ both of which recorded 32 EEG channels at a
sampling rate of 1000 Hz. The Interactive Intent Model uses
the same document-retrieval model as in Ruotsalo
et al. (2013) to select a subset of documents, and uses a data
set from the following data sources: the Web of Science prepared
by Thomson Reuters, Inc., the digital library of the
Institute of Electrical and Electronics Engineers (IEEE), the
digital library of the Association for Computing Machinery
(ACM), and the digital library of Springer. The hyperparameters
of the intent model were tuned as ασ2 = 2, βσ2 = 0.1,
and λ = 0.1 based on pilot experiments (N = 27).
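The Controller step described above, attaching a predicted relevance value to a keyword only when the fixation lands on it, can be sketched as a simple hit test. This is a minimal illustration and not the system's implementation: the `Box` class, the `assign_fixation` function, and the pixel coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a keyword on screen, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def assign_fixation(
    fixation: Tuple[float, float],
    relevance: float,
    visible_keywords: Dict[str, Box],
) -> Optional[Tuple[str, float]]:
    """Return (keyword, relevance) if the fixation lands on a visible keyword."""
    x, y = fixation
    for keyword, box in visible_keywords.items():
        if box.contains(x, y):
            return keyword, relevance
    return None  # fixation fell on empty screen space; discard the prediction


keywords = {
    "kernel": Box(100, 100, 220, 140),
    "margin": Box(600, 300, 720, 340),
}
print(assign_fixation((650, 320), 0.8, keywords))  # ('margin', 0.8)
print(assign_fixation((10, 10), 0.3, keywords))    # None
```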
Participants
Sixteen participants (3 females) took part in the experi-
ment. The participants ranged from 22 to 39 years old
FIG. 2. A screenshot of the user interface displaying the intent model view.
¹ https://github.com/bbci/bbci_public
² http://www.smivision.com/
³ http://www.brainproducts.com/
(M = 28.3). Three participants were postdoctoral researchers,
and the rest were students (8 postgraduate, 5 undergraduate)
from the University of Helsinki in Finland and the University
of Padova in Italy. The participants reported themselves
as being physically and mentally healthy. The participants
reported a good level of English (M = 4.0, SD = 0.9, on a
1 to 5 scale) and high expertise in computer science (M = 4.4,
SD = 0.6, on a 1 to 5 scale). Their experience with
browsing scientific literature (M = 3.6, SD = 0.9, on a 1 to
5 scale) and their prior knowledge of machine learning (M = 2.8,
SD = 1.5, on a 1 to 5 scale) varied.
Procedure and Experimental Task
At the beginning of the session, the participants were
welcomed and briefed as to the procedure and purpose of
the experiment before signing the informed consent form.
The participants were instructed about the duration of the
experiment and reminded that they could withdraw from
the experiment at any point in time, without facing negative
consequences. While the physiologic sensors were being
set up, the participants filled in a background information
questionnaire. Afterwards, a standard 9-point eye tracker
calibration procedure was repeated until the error was
smaller than 0.5 degrees of visual angle.
The calibration phase. The participants then engaged in
the calibration phase for around 1 hour, until the system had
collected enough data points to train the physiologic classi-
fier. The participants were allowed to have small breaks dur-
ing the calibration phase whenever they felt tired or their
concentration was diminishing. To collect training data for
the physiologic classifier, we generated a data set that
matched the application domain by using a subset of the
data set used by the interactive intent model system. The
data set consisted of a set of topics with associated key-
words and was created using expert judgments in an itera-
tive process that aimed at minimizing the overlaps between
the topics, while maximizing the dissociation between relevant
and irrelevant keywords to a given topic.⁴
Participants were prompted with a list of five topics,
randomly selected from the calibration data set. On select-
ing a topic, a series of keywords were shown to the user,
who was asked to select the keywords relevant to the topic.
This procedure was repeated iteratively for several topics,
until the system had gathered enough data to train the
physiologic classifier.⁵
The online phase. Once enough data had been collected
and the physiologic classifier had been trained, the partici-
pants engaged in the online phase. Participants were pro-
vided the following instructions:
Imagine that you are going to write an essay about topic
X. Please bookmark the articles on the scroll list that you
think are relevant to the topic, so that you can use them
later in the essay. You will later be asked to write a short
outline of the essay based on your bookmarked articles.
The participants had to perform two versions of the
same task, using the topics "neural networks" and "support
vector machines." One of the tasks was performed using
the full system. The other task was performed using a
baseline system, which behaved in the exact same way as
the full system, but no implicit relevance feedback was fed
to the interactive intent model system. Instead, only the
explicit feedback provided by the user was used to refine
the user model and present the next iteration of results.
The participants were unaware that they were using two
different systems, and they were naïve about the systems'
implementation.
For evaluation purposes, the participants were prompted
at the end of each iteration with a dialog asking them to
label the relevance of the keywords they had fixated on
(on a scale from 0 to 5). This allowed the ground truth
to be collected on the relevance of the presented keywords
as perceived by the users. This was otherwise not
FIG. 3. A screenshot of the user interface displaying the document view.
[Color figure can be viewed at wileyonlinelibrary.com]
⁴ For review: Refer to Appendix A for more details on the generation
of the calibration data set.
⁵ For review: Refer to Appendix B for details on how the assessment
of keywords' relevance was carried out by the participants during the
calibration phase.
available, as the keywords were generated in real time
by the interactive intent model system, and their relevance
naturally depends on the users' information needs,
which were not known a priori.
The participants performed each task in the online phase
for around 20 minutes, for a maximum of 10 iterations.
The task and system type were counterbalanced. On com-
pletion of the task, participants were rewarded with two
movie tickets. In total, the experiment lasted approximately
2.5 hours.
Measures and Analyses
Calibration phase. To evaluate the feasibility and perfor-
mance of the system in predicting relevance from brain
signals, we first evaluated the classification performance in
the calibration phase. The data used in the calibration
phase were controlled and had the advantage that the same
data set was used to train the different user-specific classi-
fication models. Classification performance was computed
in terms of area under the ROC curve (AUROC) and was
evaluated using a standard 10 × 10-fold cross-validation
approach. AUROC is a widely used and sensible measure,
even under class imbalances, that links the true positive
rate and the false positive rate while avoiding possible
misinterpretations such as the accuracy paradox (Zhu &
Davidson, 2007).
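To make the point about class imbalance concrete, the following minimal Python sketch computes AUROC through its rank-based interpretation (the probability that a randomly chosen relevant item is scored above a randomly chosen irrelevant one) and contrasts it with accuracy on a made-up, imbalanced example. The function name and the data are illustrative assumptions, not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a random positive is scored above a random
    negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 90 irrelevant and 10 relevant items; a classifier that scores everything 0
labels = [0] * 90 + [1] * 10
constant_scores = [0.0] * 100
accuracy = sum(1 for y in labels if y == 0) / len(labels)  # always predict 0
print(accuracy)                        # 0.9
print(auroc(labels, constant_scores))  # 0.5
```

The uninformative classifier reaches 90% accuracy only because of the class imbalance, whereas its AUROC stays at the chance level of 0.5.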
To quantify the significance and the effect sizes of the
implicit relevance feedback from the brain signals, we
compared the classification performances against perfor-
mances from prediction models learned from randomized
labels. Standard permutation tests were applied for signifi-
cance testing (Good, 2013). In detail, for each of the
16 participants, we ran within-participant permutation tests
with 1000 iterations. For each iteration, we learned a clas-
sification model using randomized labels, and we then
computed the p-value as the percentage of random classifi-
cation performances that were equal to or greater than the
true classification performance.
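The within-participant permutation test can be sketched as follows. The sketch assumes a callable `evaluate(labels)` that stands in for "learn a model with these labels and return its performance"; here it is replaced by a toy that scores one fixed feature against the supplied labels, which is a simplification and not the classifier used in the study. All names and data are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(evaluate, labels, n_iter=1000, seed=0):
    """Share of label permutations scoring at least as well as the true labels.

    `evaluate(labels)` is a hypothetical callable that fits and scores a model
    for the given labeling and returns a performance value such as AUROC.
    """
    rng = random.Random(seed)
    true_score = evaluate(labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # keeps class counts, breaks the label-data link
        if evaluate(shuffled) >= true_score:
            at_least_as_good += 1
    return at_least_as_good / n_iter


# Toy stand-in for "train a classifier and compute AUROC": one fixed feature
# is scored directly against whichever labels are supplied.
feature = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p = permutation_p_value(lambda y: auroc(y, feature), labels)
print(p < 0.05)
```

With 1000 iterations the smallest nonzero p-value this procedure can report is 0.001, which matches the finest significance level shown in Figure 6.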
Online phase. The aim was to assess how well the classi-
fication performance achieved in the calibration phase
transferred to the online phase, during which the users
were engaged in a realistic information-seeking task, and
FIG. 5. Components of the system. [Color figure can be viewed at wileyonlinelibrary.com]
FIG. 4. Summary of the system as a control loop during the online phase.
the data presented to the user from which implicit rele-
vance feedback was classified were generated in real-time.
To do so, for each participant we computed the classifica-
tion performance in terms of AUROC for each of the fix-
ated keywords in the online phase in the tasks for which
the participants used the full system. We used the feedback
provided by the participants on the keywords as the labels.
We binarized the user feedback, so that keywords that
were rated between 0 and 2 were considered irrelevant and
keywords that were rated between 3 and 5 were considered
relevant. Further, participants P05 and P06 had to be
excluded from the analysis because the server hosting the
interactive intent model system went down during the execution
of the online phase.
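The binarization rule above (ratings 0 to 2 irrelevant, 3 to 5 relevant) is simple enough to state in a few lines of Python. The sketch also reports the share of relevant labels, since AUROC is undefined when only one class is present and unstable when one class dominates. The function names and ratings are made up for illustration.

```python
def binarize(ratings, threshold=3):
    """Map 0-5 relevance ratings to 0 (irrelevant) or 1 (relevant)."""
    for r in ratings:
        if not 0 <= r <= 5:
            raise ValueError(f"rating out of range: {r}")
    return [1 if r >= threshold else 0 for r in ratings]


def relevant_share(binary_labels):
    """Fraction of labels in the relevant class (0.0 or 1.0 leaves AUROC undefined)."""
    return sum(binary_labels) / len(binary_labels)


ratings = [0, 2, 3, 5, 4, 1, 3, 5]  # made-up ratings for one participant
labels = binarize(ratings)
print(labels)                  # [0, 0, 1, 1, 1, 0, 1, 1]
print(relevant_share(labels))  # 0.625
```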
As explained in Section "Addressing Uncertainty in an
Online Neuroadaptive System through Interactive Intent
Modeling," in each iteration, the intent model learns the relevance
of all keywords from the available sequence of
explicit and implicit feedback. Accordingly, we also com-
puted the classification performance in terms of the
AUROC of the relevance of keywords estimated by the
intent model. This is the performance after the user model
has accounted for the noise in implicit relevance feedback
values coming from the physiologic classifier.
Task performance. After completion of the search task,
participants were asked to write down some of the concepts
that they had learned about the topics, which led to
a very heterogeneous collection of "mini-essays" not suited
for comparison across participants. Instead, to assess
whether using physiology-based implicit relevance mea-
sures had an influence on the task performance, we com-
pared the quality of the documents that participants
bookmarked when using the full system (including implicit
relevance feedback) and when using the baseline system
(that did not include implicit relevance feedback). In total,
participants generated 397 bookmarks on 277 different
documents. On the population level, documents had often
been bookmarked using both system types (e.g., one participant
had bookmarked a document in the baseline system
condition, while another participant had bookmarked the
same document in the full system condition). To assess
any change in task performance between the two conditions,
we therefore limited the analysis to a "representative"
subset of documents that minimized overlaps.
Documents were selected as "representative" for one of the
conditions if, on the population level, the document was
bookmarked at least two more times in that condition than in the other.
This led to a subset of 21 documents (8 representing the
full system condition, and 13 representing the baseline sys-
tem condition). Documents were then rated by 3 experts
(on a 1-6 rating scale), on their relevance (i.e., is this docu-
ment relevant to the search task), obviousness (i.e., is this a
well-known overview article in a given research area), and
novelty (i.e., is this article uncommon yet relevant to a
given topic or specific subtopic in a given research area)
(Ruotsalo et al., 2013). Ratings were averaged across
experts, and Wilcoxon rank-sum tests were used to test for
statistical differences between the two conditions (full system
vs. baseline system), for each of the three rating categories
(relevance, obviousness, and novelty).
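The comparison of expert ratings between the two conditions can be illustrated with a small standard-library sketch of the Wilcoxon rank-sum (Mann-Whitney) test. It uses the normal approximation with a tie-corrected variance and no continuity correction; for groups as small as 8 and 13 documents a statistics package with an exact test may be preferable. The function name and the ratings below are made up and are not the study's data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (U, p) where U counts pairs with x_i > y_j, ties counted as 0.5.
    Uses a tie-corrected variance and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    # Tie correction: sum of t^3 - t over groups of tied values in the pool
    pooled = sorted(list(x) + list(y))
    n = n1 + n2
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        tie_term += t ** 3 - t
        i = j
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Made-up averaged expert ratings for two small groups of documents
full = [3.0, 3.5, 4.0, 2.67]
baseline = [4.67, 5.0, 3.33, 4.33, 3.67]
u, p = rank_sum_test(full, baseline)
print(u, round(p, 3))
```

Swapping the two groups changes U to n1 * n2 - U but leaves the two-sided p-value unchanged, so the direction in which a reported statistic was computed does not affect the conclusion.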
Results
Calibration phase. Classification performance proved
to be significantly better than random for 13 out of
16 participants, meaning that we were able to successfully
train the classifier for around 80% of the participants.
On the population level, the AUROC was 0.61 ± 0.02
(mean ± standard error of the mean). Figure 6
presents the individual classification performances in the
calibration phase.
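For readers who want to reproduce this kind of summary, the population-level figure is the mean of the per-participant AUROC values together with the standard error of that mean. A minimal Python sketch follows; the helper name and the values are made up and are not the study's data.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem


# Made-up per-participant AUROC values, not the study's data
aurocs = [0.50, 0.60, 0.70]
m, sem = mean_sem(aurocs)
print(f"{m:.2f} ± {sem:.2f}")
```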
FIG. 6. Individual classification performances in the calibration phase in terms of area under the ROC curve (AUROC), and improvement over the random
baseline at the levels of p< 0.05 (*), and p< 0.001 (**). The horizontal lines represent the mean (solid) and random (dashed).
Online phase. Online relevance predictions obtained directly
from the physiologic classifier yielded an average
AUROC on the population level of 0.53 ± 0.03
(mean ± standard error of the mean). The performance
was improved by the user model, leading to an average
AUROC of 0.60 ± 0.03. In fact, the intent
model increased prediction performance for 10 out of the
12 participants for whom a classifier was successfully
trained in the calibration phase, representing over 80% of
these participants. Figure 7 shows the results of the classi-
fication performance for the calibration phase and for the
online phase, in terms of the implicit relevance feedback,
both as directly obtained through classification of brain
signals, and as inferred by the intent model.
Task performance. Wilcoxon rank-sum tests did not
show a statistically significant difference between the full
system and the baseline system for any of the rating categories.
In terms of relevance, expert ratings of documents
representative of the full system (Mdn = 3.5) did not significantly
differ from those of the baseline system (Mdn = 4.67),
W = 69, p = 0.22. In terms of obviousness, expert ratings
of documents representative of the full system
(Mdn = 2.67) did not significantly differ from those of the
baseline system (Mdn = 3.33), W = 73.5, p = 0.12. In terms
of novelty, expert ratings of documents representative
of the full system (Mdn = 3.83) did not significantly
differ from those of the baseline system (Mdn = 3.67),
W = 55.5, p = 0.82.
Discussion and Conclusions
A methodology for predicting implicit relevance feed-
back from human-brain measurements in a realistic search
scenario was presented. The methodology was implemen-
ted in a first-of-its-kind interactive IR system that com-
bines brain-based feedback and eye tracking with scarce
explicit feedback. To our knowledge, the presented system
is the first closed-loop IR system that utilizes brain-based
feedback, combines it with eye-tracking and explicit feed-
back, and is evaluated in realistic IR tasks.
Empiric Findings
The empiric evidence suggests that the presented methodology
allows classification models for implicit relevance
prediction to be trained reliably on complex real-world
data. The results show that the classification performance
significantly outperforms random predictions for over 80%
FIG. 7. Individual classification performance in terms of area under the ROC curve (AUROC). Left: offline prediction in the calibration phase. Middle:
neurophysiologic prediction in the online phase. Right: intent model prediction in the online phase. Smaller black dots and dashed lines indicate mean
classification performance. The dashed horizontal line represents random classification. Participants for whom calibration did not outperform random
predictions are presented in gray. [Color figure can be viewed at wileyonlinelibrary.com]
of the participants, with some of the participants reaching
AUROC values over 0.7. One explanation for the random
classification outcomes among the remaining approximately
20% of participants could be the fact that BCI control
does not work for a non-negligible proportion of users
(approximately 15–30%) (Acqualagna, Botrel, Vidaurre,
Kübler, & Blankertz, 2016; Allison et al., 2010; Blankertz
et al., 2010; Guger et al., 2009). These results are comparable
to the ones obtained in the prior experiment (see
Section "Validating the Relevance Computation Method"
and Wenzel et al., 2017), where a limited and controlled
data set of keywords was used.
In addition, the results show that the classification
performances achieved using the controlled calibration
data set in the calibration phase transferred to the online
phase, during which the retrieved documents and keywords
varied for each participant, and their perception of relevance
was related to their current information needs, rather
than to a predefined experimental task. Although the classification
performance decreased as expected, the overall
distribution across participants remained above random
classification levels.
Further, we demonstrate that the approach is able to
combine the noisy neurophysiology-based implicit relevance
feedback with limited explicit feedback (one item per
search iteration), which improved the classification performance
for over 80% of the participants for whom we
had successfully trained a classifier during the calibration
phase.
Figure 7 shows atypical values for participant P02.
Inspecting the data, we found that this participant provided
highly unbalanced ground truth in the online phase
(i.e., 96% of the ground truth provided was from the rele-
vant class), which explains the drastic changes in the
AUROC values. Thus, the magnitude of such changes in
the performance measures should be interpreted cautiously.
Further, we looked at the participants who consistently presented
high AUROC values (i.e., P04, P10, and P16) to
identify factors that could explain their better performance,
which could potentially be used to improve the overall
model and the design of future studies (e.g., level of educa-
tion, prior knowledge about the topic, reported satisfaction
and engagement with the system, etc.). We did not find
anything especially noteworthy about them, nor perfor-
mance differences between undergraduate and postgraduate
participants.
Limitations
Our approach and study have at least the following
limitations. First, the predicted relevance from physiology,
although promising, still leaves room for improvement, both
in terms of classification performance and uniformity across
participants. Second, as the online phase involves an online
interaction loop between the participants and the system, we
could not perform offline permutation tests to evaluate how
much the achieved classification performance improved
over random classification (as done for the calibration
phase). This would have provided further empiric insight on
how much the classification performance transferred from
the controlled calibration phase to the realistic online phase,
as well as on how much of the performance of the intent
model is explained by the implicit feedback and how much
by the scarce explicit feedback. Third, the analysis on the
selection behavior of bookmarked documents did not yet
yield conclusive results in terms of task performance
improvements. Future work should extend the presented
results by further studying how the reported classification
performances could transfer over to search task performance.
Overall, although we report on the first-of-its-kind
closed-loop information retrieval system that fully integrates
neurophysiologic signals while users perform a real
information-seeking task, the experimental method and results
indicate that there is still room for improvement, both in
demonstrating the impact of the implicit feedback on the overall
intent model and in showing how using neurophysiologic
input transfers to improved task performance.
Implications and Future Work
In essence, relevance judgments happen in the human
brain and therefore the most intriguing way to predict rele-
vance is to directly utilize the brain signals. These signals
have advantages over the more conventional sources of
user signals from a practical IR point of view. Recording
relevance judgments directly from human neurophysiology
does not require any explicit user interaction,
such as the user actively clicking on items. The current work
contributes by showing that predicting relevance from
neurophysiology on information presented in realistic
information retrieval system responses is possible with
promising accuracy. Moreover, it demonstrates that the
relevance prediction methodology can be operationalized
as a part of a closed-loop information retrieval system.
Concretely, the work contributes not only a reference
approach and procedure but also proposes interactive intent
modeling (Ruotsalo et al., 2015) as a promising user intent
and retrieval model suited for processing implicit relevance
and combining it with other relevance information such as
explicit feedback. Our findings open a horizon for information
retrieval systems that can detect relevance directly
from human neurophysiology and combine this with potentially
scarce explicit signals without requiring users to
devote attention to laborious explicit interaction. Future
studies can put more emphasis on demonstrating the actual
task performance improvement that can be obtained, and
can devise ways to collect repeated measures of implicit
responses to reduce uncertainty either from the same participant
or across participants. The inherent uncertainty in
the brain measurements, the attentional focus of the user,
and the data representations, however, calls for computational
methods for simultaneously modeling cognitive
states and the data with which these states are
associated. Our findings already show a path towards closed-loop
systems that are able to analyze and utilize relevance
and human cognition directly from wearable sensors as it
is manifested as a part of human information search
activities.
Acknowledgments
We thank Mats Sjöberg, Antti Kangasrääsiö, Nishadh
Aluthge, and Hassan Abbas for their hard work in imple-
menting the system and running experimental studies. This
work has been supported by the European Commission
(MindSee FP7-ICT; Grant Agreement #611570).
References
Acqualagna, L., Botrel, L., Vidaurre, C., Kübler, A., & Blankertz, B.
(2016). Large-scale assessment of a fully automatic co-adaptive motor
imagery-based brain computer interface. PLoS One, 11(2), e0148886.
https://doi.org/10.1371/journal.pone.0148886
Agrawal, S., & Goyal, N. (2013). Thompson Sampling for Contextual
Bandits with Linear Payoffs. In Proceedings of the 30th International
Conference on Machine Learning (pp. 127–135). PMLR.
Allegretti, M., Moshfeghi, Y., Hadjigeorgieva, M., Pollick, F. E.,
Jose, J. M., & Pasi, G. (2015). When relevance judgement is happen-
ing?: An EEG-based study. In Proceedings of the 38th International
ACM SIGIR Conference on Research and Development in Information
Retrieval (pp. 719–722).
Allison, B., Luth, T., Valbuena, D., Teymourian, A., Volosyak, I., &
Graser, A. (2010). BCI demographics: How many (and what kinds of )
people can use an SSVEP BCI? IEEE Transactions on Neural Systems
and Rehabilitation Engineering, 18(2), 107–116.
Arapakis, I., Athanasakos, K., & Jose, J.M. (2010). A comparison of gen-
eral vs personalised affective models for the prediction of topical rele-
vance. In Proceedings of the 33rd International ACM SIGIR Conference
on Research and Development in Information Retrieval (pp. 371–378).
New York, NY: ACM. https://doi.org/10.1145/1835449.1835512
Arapakis, I., Konstas, I., & Jose, J.M. (2009). Using facial expressions
and peripheral physiological signals as implicit indicators of topical rel-
evance. In Proceedings of the 17th ACM International Conference on
Multimedia (pp. 461–470). New York, NY: ACM. https://doi.org/10.
1145/1631272.1631336
Attias, H. (1999). Inferring parameters and structure of latent variable
models by variational bayes. In Proceedings of the fifteenth conference
on uncertainty in artificial intelligence (pp. 21–30). San Francisco, CA:
Morgan Kaufmann Publishers Inc.
Baccino, T., & Manunta, Y. (2005). Eye-fixation-related potentials: Insight
into parafoveal processing. Journal of Psychophysiology, 19(3), 204–215.
Barral, O., Eugster, M.J., Ruotsalo, T., Spapé, M.M., Kosunen, I., Ravaja, N.,
… Jacucci, G. (2015). Exploring peripheral physiology as a predictor of
perceived relevance in information retrieval. In Proceedings of the 20th
International Conference on Intelligent User Interfaces (pp. 389–399).
New York, NY: ACM. https://doi.org/10.1145/2678025.2701389
Barral, O., Kosunen, I., & Jacucci, G. (2017). No need to laugh out loud:
Predicting humor appraisal of comic strips based on physiological sig-
nals in a realistic environment. ACM Transactions on Computer-
Human Interaction, 24(6), 40. Retrieved from http://doi.acm.
org/10.1145/3157730. https://doi.org/10.1145/3157730
Barral, O., Kosunen, I., Ruotsalo, T., Spapé, M.M., Eugster, M.J.A.,
Ravaja, N., … Jacucci, G. (2016). Extracting relevance and affect information
from physiological text annotation. User Modeling and User-Adapted
Interaction, 26(5), 493–520. https://doi.org/10.1007/s11257-016-9184-8
Blankertz, B., Lemm, S., Treder, M., Haufe, S., & Müller, K.-R. (2011). Single-trial
analysis and classification of ERP components – A tutorial. NeuroImage,
56(2), 814–825. https://doi.org/10.1016/j.neuroimage.2010.06.048
Blankertz, B., Sannelli, C., Halder, S., Hammer, E.M., Kübler, A.,
Müller, K.-R., … Dickhaus, T. (2010). Neurophysiological predictor of
SMR-based BCI performance. NeuroImage, 51(4), 1303–1309. https://
doi.org/10.1016/j.neuroimage.2010.03.022
Cowley, B., Filetti, M., Lukander, K., Torniainen, J., Henelius, A.,
Ahonen, L., … Jacucci, G. (2016). The psychophysiology primer: A
guide to methods and a broad review with a focus on human–computer
interaction. Foundations and Trends® in Human–Computer
Interaction, 9(3-4), 151–308. Retrieved from https://doi.
org/10.1561/1100000065. https://doi.org/10.1561/1100000065
Daee, P., Pyykkö, J., Glowacka, D., & Kaski, S. (2016). Interactive intent
modeling from multiple feedback domains. In Proceedings of the 21st
International Conference on Intelligent User Interfaces (pp. 71–75).
New York, NY: ACM. https://doi.org/10.1145/2856767.2856803
Eugster, M.J.A., Ruotsalo, T., Spapé, M.M., Barral, O., Ravaja, N.,
Jacucci, G., & Kaski, S. (2016). Natural brain-information interfaces:
Recommending information by relevance inferred from human brain
signals. Scientific Reports, 6, 38580. https://doi.org/10.1038/srep38580
Eugster, M.J.A., Ruotsalo, T., Spapé, M.M., Kosunen, I., Barral, O.,
Ravaja, N., … Kaski, S. (2014). Predicting term-relevance from brain
signals. In Proceedings of the 37th International ACM SIGIR Conference
on Research & Development in Information Retrieval (pp. 425–434).
New York, NY: ACM. https://doi.org/10.1145/2600428.2609594
Farwell, L.A., & Donchin, E. (1988). Talking off the top of your head:
Toward a mental prosthesis utilizing event-related brain potentials.
Electroencephalography and clinical Neurophysiology, 70(6), 510–523.
Friedman, J.H. (1989). Regularized discriminant analysis. Journal of the
American Statistical Association, 84(405), 165–175. https://doi.org/10.
1080/01621459.1989.10478752
Golenia, J.-E., Wenzel, M.A., & Blankertz, B. (2015). Live demonstrator
of EEG and eye-tracking input for disambiguation of image search
results. In Symbiotic interaction (pp. 81–86). Cham: Springer. https://
doi.org/10.1007/978-3-319-24917-9_8
Golenia, J.E., Wenzel, M.A., Bogojeski, M., & Blankertz, B. (2018).
Implicit relevance feedback from electroencephalography and eye track-
ing in image search. Journal of Neural Engineering, 15(2), 026002.
https://doi.org/10.1088/1741-2552/aa9999
Good, P. (2013). Permutation tests: a practical guide to resampling
methods for testing hypotheses. Berlin/Heidelberg, Germany: Springer
Science & Business Media.
Guger, C., Daban, S., Sellers, E., Holzner, C., Krausz, G., Carabalona, R.,
… Edlinger, G. (2009). How many people are able to control a
p300-based brain–computer interface (BCI)? Neuroscience Letters,
462(1), 94–98.
Gwizdka, J. (2014). Characterizing relevance with eye-tracking measures.
In Proceedings of the 5th Information Interaction in Context Symposium
(pp. 58–67). New York, NY: ACM. https://doi.org/10.1145/
2637002.2637011
Gwizdka, J., & Mostafa, J. (2015). NeuroIR 2015: SIGIR 2015 workshop
on neuro-physiological methods in IR research. In ACM SIGIR Forum
(Vol. 49, pp. 83–88). ACM. https://doi.org/10.1145/2888422.2888435
Gwizdka, J., & Mostafa, J. (2017). NeuroIIR: Challenges in bringing neu-
roscience to research in human-information interaction. In Proceedings
of the 2017 Conference on Conference Human Information Interaction
and Retrieval - CHIIR '17 (pp. 437–438). New York, NY: ACM Press.
https://doi.org/10.1145/3020165.3022165
Jacucci, G., Fairclough, S., & Solovey, E.T. (2015). Physiological computing.
Computer, 48(10), 12–16. Retrieved from http://ieeexplore.ieee.
org/document/7310960/. https://doi.org/10.1109/MC.2015.291
Kangasrääsiö, A., Chen, Y., Glowacka, D., & Kaski, S. (2016). Interactive
modeling of concept drift and errors in relevance feedback. In Proceedings
of the 2016 Conference on User Modeling Adaptation and Personalization
(pp. 185–193). New York, NY: ACM. https://doi.org/10.
1145/2930238.2930243
Kauppi, J.-P., Kandemir, M., Saarinen, V.-M., Hirvenkari, L., Parkkonen, L.,
Klami, A., … Kaski, S. (2015). Towards brain-activity-controlled information
retrieval: Decoding image relevance from MEG signals. NeuroImage,
112, 288–298. https://doi.org/10.1016/j.neuroimage.2014.12.079
Kelly, D., & Fu, X. (2006). Elicitation of term relevance feedback: An
investigation of term source and context. In Proceedings of the 29th
Annual International ACM SIGIR Conference on Research and Development
in Information Retrieval (pp. 453–460). New York, NY: ACM.
https://doi.org/10.1145/1148170.1148249
Kelly, D., & Teevan, J. (2003). Implicit feedback for inferring user preference:
A bibliography. SIGIR Forum, 37(2), 18–28. https://doi.org/10.
1145/959258.959260
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-
dimensional covariance matrices. Journal of Multivariate Analysis,
88(2), 365–411. https://doi.org/10.1016/S0047-259X(03)00096-4
Lemm, S., Blankertz, B., Dickhaus, T., & Müller, K.-R. (2011). Introduction
to machine learning for brain imaging. NeuroImage, 56(2), 387–399.
Moshfeghi, Y., & Jose, J.M. (2013). An effective implicit relevance feed-
back technique using affective, physiological and behavioural features.
In Proceedings of the 36th International ACM SIGIR Conference on
Research and Development in Information Retrieval (pp. 133–142).
New York, NY: ACM. https://doi.org/10.1145/2484028.2484074
Moshfeghi, Y., Pinto, L.R., Pollick, F.E., & Jose, J.M. (2013). Understanding
relevance: An fMRI study. In European Conference on Information
Retrieval (pp. 14–25). Berlin, Heidelberg: Springer.
Mostafa, J., & Gwizdka, J. (2016). Deepening the role of the user: Neuro-
physiological evidence as a basis for studying and improving search. In
Proceedings of the 2016 ACM on Conference on Human Information
Interaction and Retrieval (pp. 63–70). New York, USA; ACM.
Nijholt, A., Tan, D., Pfurtscheller, G., Brunner, C., Millán, J.d.R.,
Allison, B., … Müller, K.R. (2008). Brain-computer interfacing for
intelligent systems. IEEE Intelligent Systems, 23(3), 72–79.
https://doi.org/10.1109/MIS.2008.41
Oliveira, F.T., Aula, A., & Russell, D.M. (2009). Discriminating the
relevance of web search results with measures of pupil size. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems
(pp. 2209–2212). New York, NY: ACM.
https://doi.org/10.1145/1518701.1519038
Puolamäki, K., Salojärvi, J., Savia, E., Simola, J., & Kaski, S. (2005).
Combining eye movements and collaborative filtering for proactive
information retrieval. In Proceedings of the 28th Annual International
ACM SIGIR Conference on Research and Development in Information
Retrieval (pp. 146–153). New York, NY: ACM.
Ruotsalo, T., Jacucci, G., Myllymäki, P., & Kaski, S. (2015). Interactive
intent modeling: Information discovery beyond search.
Communications of the ACM, 58(1), 86–92.
Ruotsalo, T., Peltonen, J., Eugster, M., Głowacka, D., Konyushkova, K.,
Athukorala, K., et al. (2013). Directing exploratory search with
interactive intent modeling. In Proceedings of the 22nd ACM International
Conference on Conference on Information & Knowledge Management
(pp. 1759–1764). New York, NY: ACM.
Schäfer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale
covariance matrix estimation and implications for functional genomics.
Statistical Applications in Genetics and Molecular Biology, 4(1).
Article 32. https://doi.org/10.2202/1544-6115.1175
Treder, M.S., Schmidt, N.M., & Blankertz, B. (2011). Gaze-independent
brain–computer interfaces based on covert attention and feature
attention. Journal of Neural Engineering, 8(6), 066003.
Wenzel, M.A., Bogojeski, M., & Blankertz, B. (2017). Real-time inference
of word relevance from electroencephalogram and eye gaze. Journal of
Neural Engineering, 14(5), 056007.
https://doi.org/10.1088/1741-2552/aa7590
Wenzel, M.A., Golenia, J.-E., & Blankertz, B. (2016). Classification of eye
fixation related potentials for variable stimulus saliency. Frontiers in
Neuroscience, 10, 23. Retrieved from
https://www.frontiersin.org/article/10.3389/fnins.2016.00023.
https://doi.org/10.3389/fnins.2016.00023
Zhu, X., & Davidson, I. (2007). Knowledge discovery and data mining:
Challenges and realities. Hershey, PA: IGI Global.