ORIGINAL RESEARCH
published: 11 April 2022
doi: 10.3389/fnins.2022.836834
Frontiers in Neuroscience | www.frontiersin.org 1April 2022 | Volume 16 | Article 836834
Edited by:
Patrick Krauss,
University of Erlangen Nuremberg,
Germany
Reviewed by:
Konstantin Tziridis,
University Hospital Erlangen, Germany
Richard Gault,
Queen’s University Belfast,
United Kingdom
*Correspondence:
Saijal Shahania
Myra Spiliopoulou
Specialty section:
This article was submitted to
Auditory Cognitive Neuroscience,
a section of the journal
Frontiers in Neuroscience
Received: 16 December 2021
Accepted: 16 February 2022
Published: 11 April 2022
Citation:
Shahania S, Unnikrishnan V, Pryss R,
Kraft R, Schobel J, Hannemann R,
Schlee W and Spiliopoulou M (2022)
Predicting Ecological Momentary
Assessments in an App for Tinnitus by
Learning From Each User’s Stream
With a Contextual Multi-Armed Bandit.
Front. Neurosci. 16:836834.
doi: 10.3389/fnins.2022.836834
Predicting Ecological Momentary
Assessments in an App for Tinnitus
by Learning From Each User’s
Stream With a Contextual
Multi-Armed Bandit
Saijal Shahania1*, Vishnu Unnikrishnan1, Rüdiger Pryss2, Robin Kraft3,
Johannes Schobel4, Ronny Hannemann5, Winny Schlee6 and Myra Spiliopoulou1*
1Knowledge Management and Discovery Lab, Otto-von-Guericke University Magdeburg, Magdeburg, Germany, 2Institute of
Clinical Epidemiology and Biometry, University of Würzburg, Würzburg, Germany, 3Institute of Databases and Information
Systems, Ulm University, Ulm, Germany, 4DigiHealth Institute, Neu-Ulm University of Applied Sciences, Neu-Ulm, Germany,
5Sivantos GmbH - WS Audiology, Erlangen, Germany, 6Department of Psychiatry and Psychotherapy, University of
Regensburg, Regensburg, Germany
Ecological Momentary Assessments (EMA) deliver insights on how patients perceive
tinnitus at different times and how they are affected by it. Moving to the next level,
an mHealth app can support users more directly by predicting a user’s next EMA
and recommending personalized services based on these predictions. In this study,
we analyzed the data of 21 users who were exposed to an mHealth app with
non-personalized recommendations, and we investigated ways of predicting the next
vector of EMA answers. We studied the potential of entity-centric predictors that learn
for each user separately and neighborhood-based predictors that learn for each user
separately but also take similar users into account, and we compared them to a predictor
that learns from all past EMA indiscriminately, without considering which user delivered
which data, i.e., to a “global model.” Since users were exposed to two versions of the
non-personalized recommendations app, we employed a Contextual Multi-Armed Bandit
(CMAB), which chooses the best predictor for each user at each time point, taking each
user’s group into account. Our analysis showed that the combination of predictors into
a CMAB achieves good performance throughout, since the global model was chosen at
early time points and for users with few data, while the entity-centric, i.e., user-specific,
predictors were used whenever the user had delivered enough data; the CMAB itself
determined when the data were “enough.” This flexible setting delivered insights on how user
behavior can be predicted for personalization, as well as insights on the specific mHealth
data. Our main findings are that for EMA prediction the entity-centric predictors should be
preferred over a user-insensitive global model and that the choice of EMA items should be
further investigated because some items are answered more rarely than others. Although our
Shahania et al. CMAB for EMA Prediction
CMAB-based prediction workflow is robust to differences in exposure and interaction
intensity, experimenters who design studies with mHealth apps should be prepared
to quantify and closely monitor differences in the intensity of user-app interaction, since
users with many interactions may have a disproportionate influence on global models.
Keywords: contextual multi-armed bandits, mHealth for tinnitus, EMA prediction, similarity-based prediction,
prediction on sparse data, prediction on time series with gaps, prediction in mHealth data
1. INTRODUCTION
According to De Ridder et al. (2021), “The capacity to measure
the incidence, prevalence, and impact will help in identification
of human, financial, and educational needs required to address
acute tinnitus as a symptom but chronic tinnitus as a disorder.”
Mobile health apps for tinnitus have the potential of assisting
patients in self-assessment of their condition and of delivering
insights on tinnitus heterogeneity to the medical experts, as
reported by Probst et al. (2017), Cederroth et al. (2019), and
Pryss et al. (2019, 2021) among others. This is particularly
the case for mHealth apps that collect Ecological Momentary
Assessments (EMA): several studies on mHealth tinnitus apps
have demonstrated that EMA recordings deliver insights on
tinnitus stages during the day and on the interplay of personal
traits and severity of tinnitus symptoms (see, e.g., Probst et al.,
2017; Mehdi et al., 2020; Unnikrishnan et al., 2020a; Jamaludeen
et al., 2021).
Modern mHealth apps are able to deliver suggestions to the
app users, exploiting knowledge about the users’ prior behavior
(Martínez-Pérez et al., 2014) or behavior change (Mao et al.,
2020), and they also contribute to clinical decision support
(Watson et al., 2019). In the context of tinnitus, TinnitusTipps (see footnote 1)
delivers information toward “health literacy” and suggestions
that promote well-being, e.g., suggestions on physical exercising.
Personalization of such suggestions implies taking account
of a user’s personal traits and needs, and has the potential
of improving user experience and of anticipating undesirable
developments in the user’s condition. However, personalization
demands the ability to learn from past EMA and to predict future
EMA.
In contrast to passive recordings, e.g., of ambient noise, EMA
recordings demand action by the app user. Some studies, such as that
of Probst et al. (2017), concentrate on users who have many
recordings. While such studies are beneficial for the acquisition
of insights on how tinnitus is experienced in general, they
contribute less toward personalized services, which need to learn
and predict for each user, even for users who interact rarely
with the app and deliver too few data. Schleicher et al. (2020)
model the intensity of the interaction with the app as adherence
and attempt to identify adherence patterns. Other studies, such as
Unnikrishnan et al. (2020a), Prakash et al. (2021), and the
earlier version of this work (Shahania et al., 2021), investigate
ways of learning from users who deliver very little data. The
main objective of these studies is to provide tools that predict
future EMA of a user, thus forming a basis for the prediction
of undesirable events and for the design of personalized services.
The main challenge of such methods is the scarcity of data for
some users.
1 https://tinnitustipps.lenoxug.de/
Technically speaking, a method that learns from all the data
of all the users induces a “global model.” It abstracts from the
idiosyncrasies in the EMA recordings of each user and attempts
to predict the future condition of a user from all that is known
on the past EMA of all users. In contrast, a method that learns
from the data of a single user builds a model peculiar to that
user, capturing the user’s idiosyncrasies and exploiting them for
future predictions. Such a method is bound to learn from as little
data as are available for each user, but suffers from the obvious
disadvantage that some users have too few EMA recordings for
any reliable prediction of their future EMA. For such users,
it seems reasonable to exploit data of other, similar users for
predictive modeling. Indeed, in our earlier works (Unnikrishnan
et al., 2020a, 2021), we have shown that methods which exploit
similarity among users can predict future EMA recordings of
users with little data. We term a model learned on the data of a
user and of the users similar to him/her a “local model,” to highlight
the fact that such a model exploits only data in the user’s vicinity
(in a multi-dimensional space spanned over the static data of the
users and their EMA recordings). A model learned on the data of
the user without his/her neighbors is a special case of a local model
that considers only a single user.
There is a well-known separation of methods into idiographic
ones that learn for each individual separately and nomothetic
ones that aim to explain the common traits of all individuals.
Without entering the conceptual debate of idiographic vs
nomothetic approaches (cf. the elaboration of Hermans, 1988
among many), we point out that from the technical perspective,
a nomothetic approach builds a global model, an idiographic
approach builds a local model for each single individual, while
an approach that exploits information on a user and the users
similar to her is a nomothetic approach that builds a local
model by concentrating only on few users that are similar to a
given user. Since our methods use the same machine learning
instruments regardless of whether they build global or
local models, we avoid the distinction between nomothetic and
idiographic approaches hereafter.
In this study, we build upon our earlier works on learning
local models from little data (Unnikrishnan et al., 2020a), on
comparing methods that learn from all data of all individuals
to methods that learn local models (Unnikrishnan et al., 2019)
and on frameworks for such comparisons (Shahania et al., 2021).
We present a complete framework for studying how methods
designed for little data can contribute to predicting the condition
of a user interacting with an mHealth app for chronic tinnitus. In a
nutshell, we want to predict the next vector of answers in the best
possible way. We investigate the following research questions:
1. RQ 1: To what extent can methods that learn from little data
deliver good predictions?
· RQ 1a: How to exploit little data for predictions?
· RQ 1b: How to orchestrate the invocation of methods
building local models, so that they are only invoked when their
prediction is good?
· RQ 1c: How to measure the superiority of a little-data
method over a method that learns from all the data?
2. RQ 2: What factors influence the superiority of local vs. global
models?
· RQ 2a: How do different configurations of little-data
methods affect their performance?
· RQ 2b: How do different behavioral phenotypes affect the
performance of local vs. global models?
We address these research questions on the data of a clinical
study involving patients with chronic tinnitus: the participants
used an mHealth app to record daily EMA, and received regular
non-personalized tips on good practices for living with tinnitus.
To answer RQ 1, we consider global and local models for streams. Their purpose is
to capture the different interaction behavior of the entities (here:
mHealth app users), which causes the variability of the data
stream per entity. Hereby, the global model exploits the data
stream of all entities, whereas the local models only use data
of one entity (entity-centric local model) or an entity and its
nearest neighbors.
To ensure that we choose the best configuration depending
on the varying data streams, we incorporate the global and
local models into a contextual multi-armed bandit (CMAB). This
bandit employs a strategy to select one model for each time point
based on past rewards and additional meta information of the
entity. For RQ 2, we derive an evaluation procedure along with
the factors that are responsible for selecting each type of model
for different data streams. These factors include the history length
of the entities, their personal traits, their time of arrival in the
system, and the temporal proximity of the predictions.
This article is organized as follows. Section 2 covers the
materials used for our study. Section 3 describes state-of-the-art
techniques in stream learning and techniques for the creation of
multi-armed bandits, which we use to define our proposed approach.
This section also explains why global and local models are needed
to tackle our problem. Sections 4.2 and 4.3 present the results on
RQ 1 and RQ 2, followed by a discussion of the insights,
findings, and future work in Section 5.
2. MATERIALS
For our work, we have considered the data from a clinical study
with the mHealth app TinnitusTipps, conducted in 2018/2019.
The study was approved by the Ethical Review Board of the
University Clinic Regensburg. The ethical approval number is
17-544-101. All study participants provided informed consent.
For more information on the inclusion/exclusion criteria of the
participants, please refer to Schlee et al. (2022).
The TinnitusTipps app (see text footnote 1) was developed
by computer scientists, psychologists, and Sivantos GmbH—WS
Audiology (a company specialized in hearing aids). As we detail
in our earlier work (Shahania et al., 2021), TinnitusTipps uses tips
to promote “health literacy” and engaging features to stimulate
user involvement.
The study participants were split into three groups
that differed in their exposure to the functionalities of
TinnitusTipps. The tips were not personalized, so they were
always delivered at random, independently of the group of
the user.
Group A: The users of the mHealth app received the tips for
the entire project duration of 4 months (n=11).
Group B: In the first 2 months, the users received no tips.
They only got the normal “TrackYourTinnitus” function (see
Kraft et al., 2020), i.e., they only answered questionnaires. In
the second half of the project, in months 3 and 4, they received
the tips (n=10).
Group Y: This group is more similar to Group A, in the sense
that they received tips from the beginning (n=13). The
difference between groups A and Y is that the participants have
been supervised by different persons who followed slightly
different protocols. For this reason, we skipped group Y and
concentrated on groups A and B that followed the same study
protocol.
The study protocol encompasses two questionnaires, one to be
filled in at registration and one EMA questionnaire to be filled in at
least once a day. The registration questionnaire is the “Tinnitus
Sample Case History Questionnaire” (TSCHQ; Langguth et al.,
2007), consisting of 31 items. The EMA questionnaire consists
of the 8 items listed in Table 1.
We removed all users of group Y and those belonging to no
group, i.e., we considered groups A and B only. An extensive
exploratory analysis of the data of these users appears in the
following section. It includes the distribution of the values of
the answers to the EMA questionnaire items for each group, a
TABLE 1 | Items S01 to S08 from the EMA questionnaire of TinnitusTipps. The range of values for the answers of the items S02-S07 is between 0 and 100 (first two columns shown also in Shahania et al., 2021).
Item | Question description | Short description
S01 | Do you perceive the tinnitus right now? | -
S02 | How loud is your tinnitus right now? | Tinnitus loudness
S03 | How distressed are you by your tinnitus right now? | Tinnitus distress
S04 | How well do you hear right now? | -
S05 | How much are you limited by your hearing right now? | -
S06 | How stressed do you feel right now? | Stress
S07 | How exhausted do you feel right now? | -
S08 | Are you wearing a hearing aid right now? | -
For S01 and S08, the answer is binary (yes/no).
correlation analysis among the EMA items independently of the
groups, and a correlation analysis among the EMA items for
each group.
Before addressing the research questions, we performed an
exploratory analysis reported hereafter. For the RQs themselves,
we skipped the categorical items S01 and S08. Due to space
limitations, for some results on the RQs, we report only on the
items S02, S03, S07 (cf. 3rd column of Table 1). All results are
in the Supplementary Material.
3. OUR CMAB-BASED METHOD
The architectural design of our method depends on several
factors, which are depicted in Figure 1. At each time point,
we want to predict the vector of answers. The architecture offers
several predictors for this purpose, but in the end the CMAB chooses
the arm that is expected to deliver the best prediction. The expected
quality of the prediction is quantified as a reward. The behavior of
this bandit helps us to decide when local models are preferred over
global models (cf. RQ 1b). Therefore, the core factors of
investigation are as follows:
1. Data representation: Depending on the structure and features
of the data, the bandit has to be designed accordingly. Users’
answers to the items constitute a multidimensional vector
of features. We distinguish between representations for an
ensemble of bandits, where each ensemble member has a
predictor for one of the items, and a whole vector bandit,
where all the items are predicted as one vector by a single
bandit.
2. Arms as prediction models: The bandits choose the arm to
pull depending on the expected reward, informally the quality
of the prediction. We consider three arms: one arm considers
only the past data of the user for which the prediction must
be made (entity-centric arm), one arm additionally considers
the past data of the user’s k nearest neighbors (neighborhood
arm), and one arm considers the past data of all users (global
arm). Each arm of the bandit can be handled in the same way
or rules can be made to trigger the arms differently.
3. Sampling strategy: Since an arm is chosen based on the
expected rewards, the past rewards obtained when an arm was chosen
are sampled according to a heuristic called the sampling strategy.
4. Context: If the bandit makes use of additional meta-information
regarding the current sample, we call it context.
In our case, the context of an observation is the group to which
the entity referred to by this observation belongs. We consider
one contextual bandit which considers as context the group (A
or B) to which the user belongs, i.e., considers only data from
users of this group when building the predictors (cf. point 2).
We also consider a simple bandit that has no context, i.e., does
not take the groups into account.
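How the three arms and the group context restrict the data available to a predictor can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `training_data`, the `(user, observation)` layout of `history`, and the idea that the nearest neighbors are supplied by a separate search are all hypothetical. Restricting the neighbors to the user's own group in the contextual case is likewise an assumption.

```python
def training_data(history, user, arm, groups, neighbors=(), use_context=True):
    """Return the past observations an arm may learn from (illustrative sketch).

    history   : list of (user_id, observation) tuples, oldest first (assumed layout)
    groups    : dict mapping user_id -> "A" or "B"
    neighbors : ids of the user's nearest neighbors (assumed to be precomputed)
    """
    if arm == "entity":
        allowed = {user}                    # only the user's own history
    elif arm == "neighborhood":
        allowed = {user, *neighbors}        # the user plus the nearest neighbors
    elif arm == "global":
        allowed = set(groups)               # every known user
    else:
        raise ValueError(f"unknown arm: {arm}")

    if use_context:
        # Contextual bandit: keep only users from the same group as the target user.
        allowed = {u for u in allowed if groups[u] == groups[user]}

    return [obs for uid, obs in history if uid in allowed]
```

With `use_context=False` the same function describes the simple bandit, which ignores the groups.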
In summary, we employed 4 different configurations which are
based on the variations of the context and data representations.
These configurations are as follows:
1. Contextual Whole Vector Bandit: CWB
2. Simple Whole Vector Bandit: SWB
3. Contextual Ensemble Bandit: CEB
4. Simple Ensemble Bandit: SEB.
The SEB configuration means that the bandit does not use any
additional information to decide which arm to use but instead
relies completely on the sampling strategy being employed;
however, a different bandit is provided for each item. Thus, the
SEB indirectly has some contextual knowledge about the items
but none about the complete sample. On the other hand, the CWB
configuration uses meta-information about the sample for its
decision but provides only a single output for all the items and
therefore has no contextual knowledge about them. Furthermore,
the SWB configuration is a single bandit that uses no contextual
information about either the samples or the items, whereas the
CEB is provided with the context of both.
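The four configurations are the cross product of two binary choices (group context or not, one bandit per item or one for the whole vector). The short sketch below enumerates them; the dictionary keys and the function name are hypothetical, and the item list assumes the six non-binary items S02 to S07 used for prediction.

```python
from itertools import product

ITEMS = ["S02", "S03", "S04", "S05", "S06", "S07"]  # the six non-binary EMA items


def configurations(items=ITEMS):
    """Enumerate the four bandit configurations by context use and output type."""
    configs = {}
    for contextual, ensemble in product([True, False], [False, True]):
        acronym = ("C" if contextual else "S") + ("E" if ensemble else "W") + "B"
        configs[acronym] = {
            "uses_group_context": contextual,
            "one_bandit_per_item": ensemble,
            # An ensemble holds one bandit per item; a whole vector bandit is a single bandit.
            "n_bandits": len(items) if ensemble else 1,
        }
    return configs
```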
3.1. Data Representation: Entities and
Their Observations
The data comprise two main components: the entities
and their streams of observations. Every entity is a user
of the system, who is represented by observations in the form
of a 6-dimensional multivariate time series (cf. items of the EMA
questionnaire in Table 1). The time series of a user is ordered by
the calendar times at which we have seen the user: $t_0$ is the
first time and $t_n$ is the last time we see the user. Let $u$ be a
user, and let $t_{u,0}, t_{u,1}, \ldots, t_{u,n_u}$ be the time steps at which the user was
observed. These are calendar time steps, i.e., they contain a date
and a clock time as hh:mm:ss. An entity might interact with the app
at different intervals; therefore, each entity has a different
number of observations, and the time between observations might
range from hours to days. An observation $o_i$ at a time step $t_i$ is
an element of the time series of a particular entity and might
contain null (NaN) values. For us, a time point $t_i$ is not a specific
calendar date but the moment at which an answer is given by any entity, i.e.,
we sort the observations by their date of appearance and label the
sorted list as time steps. Therefore, there is no missing time step.
But since the time step $t_i$ depends on the entity’s interaction
with the mHealth app, the difference between time steps $t_{i-1}$ and
$t_i$ can range from a day to a week. Furthermore, if at a particular
$t_i$ there is a null value for some questions, we
replace it with the value from the prior time step $t_{i-1}$.
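The ordering into time steps and the replacement of null values can be sketched as below. The record layout (`user`, `timestamp`, `answers`) and the function names are hypothetical. The sketch also assumes that the "prior time step" used for filling is the previous observation of the same entity, which the text does not state explicitly.

```python
import math
from datetime import datetime


def _is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def to_time_steps(observations):
    """Sort observations by timestamp, number them, and forward-fill gaps per user."""
    ordered = sorted(observations, key=lambda o: o["timestamp"])
    last_answers = {}  # user id -> most recent (filled) answers of that user
    steps = []
    for step, obs in enumerate(ordered):
        previous = last_answers.get(obs["user"], {})
        filled = {
            # A missing answer takes the user's previous value (None if there is none yet).
            item: previous.get(item) if _is_missing(value) else value
            for item, value in obs["answers"].items()
        }
        last_answers[obs["user"]] = filled
        steps.append({"step": step, "user": obs["user"], "answers": filled})
    return steps
```

Because the steps are consecutive positions in the sorted list, no time step is ever missing, although the calendar distance between two steps varies.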
As discussed in point 1 above, the output of the bandit has two major
options, namely the whole vector bandit and the ensemble bandit. The whole
vector bandit provides a single output, i.e., a 6-dimensional
vector, for all the items of the observation. In the
ensemble setting, we instead use one bandit per item, each of which returns
a scalar output. It would be possible to combine both approaches,
i.e., to combine certain items for one bandit but not all. We did not
investigate this option for this article.
3.2. Arms as Prediction Models
As mentioned earlier, we predict the upcoming observations
using a bandit. For this purpose, we have to decide the definition
of the arms of the bandit which in our case are models used for
prediction. To better support the different types of entities and
their behavior including the number of observations per entity
FIGURE 1 | The figure shows the outline of the system used to predict the answers given by the entities. The entire architecture is divided into 4 main parts. The first
part shows the representation of the data and the output. The second part is about the prediction models as the arms of the bandit followed by the sampling strategies
and the context for the bandit. We have both ensemble and whole vector bandit for each contextual and simple bandits. RMSE here refers to root mean square error.
and thus exploiting also little data for our predictions (cf. RQ 1a),
several choices are made:
1. The arm setup based on the entity’s past observations (history)
[RQ 1a],
2. the learning algorithm for the predictor [RQ 1a],
3. the performance measure for the arm [RQ 1b],
4. and the reward function for the bandit [RQ 1c].
3.2.1. Arm Setup Based on the Entity’s History (for
RQ 1a)
We design the bandit with 3 arms and hence have 3 different
setups in place for prediction based on the number of entities and
their respective observations. The settings are as follows:
1. The global arm: This is modeled based on the history of all the
entities.
2. The entity-centric arm: This is modeled separately for each
entity based on a history of only the entity itself.
3. The neighborhood arm: It is designed based on the history of
the entity itself and its k nearest neighbors.
With the above settings, it is possible to focus on different
properties of the data. On the one hand, the entity-centric
arm is potentially better in exploiting the personal traits
of an entity but information may not be always recent as
there might be time gaps between the observations from a
particular entity (cf. Section 3.1). On the other hand, the
global arm can capture this temporal proximity but may over-
generalize in predicting certain observations of entities. Hence,
we introduce additionally the neighborhood arm which acts
as a trade-off between the global and the entity-centric arms,
capturing both the temporal proximity and the local behavioral
patterns. Since both entity-centric and neighborhood arms
capture the local behavioral patterns, we refer to them as local
arms.
Because of how the arms are designed, they need
different numbers of observations before they can be used: the
global arm needs only a fixed number of observations $m$, whereas
the entity-centric arm requires $n$ observations from the respective
user. The neighborhood arm additionally needs
observations from all of the user's $k$ nearest neighbors, i.e., $k \cdot n$. This implies
that at the beginning not all arms are equally available to
the bandit, especially the neighborhood arm. We keep the
arm setting fixed in a rule-based manner: an arm
can only be used once enough observations have been processed
for its model to start predicting. Hence, the global arm is used
as a fallback option when there are not enough observations
available for the local arms. For our experiments, we need to set
the respective numbers of observations $m$ and $n$. Because the global
arm needs to generalize more than the local arms, $m$ should be
sufficiently larger than $n$. Additionally, $n$ needs to be smaller than
the minimum history length in the data, which is 12 in our case
(entity 25). Based on these constraints and on prior experimentation,
we chose $m = 30$ and $n = 5$. Similarly, $k$ is set
to 4, as it provided the best results for almost all the items in the
preliminary analysis.
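The rule-based availability of the arms can be illustrated with the thresholds stated above (m = 30, n = 5, k = 4). The function and constant names are hypothetical, and counting a threshold as reached with "greater than or equal" is an assumption; the source does not say whether the comparison is strict.

```python
M_GLOBAL = 30    # observations required before the global arm may predict (m)
N_ENTITY = 5     # observations required per entity for the local arms (n)
K_NEIGHBORS = 4  # size of the neighborhood (k)


def available_arms(total_seen, user_seen, neighbor_seen):
    """List the arms that have processed enough observations to predict.

    total_seen    : number of observations seen from all users so far
    user_seen     : number of observations seen from the target user
    neighbor_seen : observation counts of the user's nearest neighbors
    """
    arms = []
    if total_seen >= M_GLOBAL:
        arms.append("global")
    if user_seen >= N_ENTITY:
        arms.append("entity")
        # The neighborhood arm additionally needs n observations from each of k neighbors.
        enough_neighbors = (
            len(neighbor_seen) >= K_NEIGHBORS
            and all(count >= N_ENTITY for count in neighbor_seen[:K_NEIGHBORS])
        )
        if enough_neighbors:
            arms.append("neighborhood")
    return arms
```

A bandit built on this rule would restrict its choice to the returned list, so the global arm acts as the fallback whenever the local arms are missing from it.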
3.2.2. Choice of a Learning Algorithm for the
Predictor (for RQ 1a)
As discussed in Section 3.1, the data constitute a stream of
observations, so we considered the following regression algorithms
for streams:
1. Hoeffding Tree (Hulten et al., 2001)
2. Hoeffding Adaptive Tree Regressor
3. Adaptive Random Forest Regressor (Gomes et al., 2017).
The only difference between the normal Hoeffding Tree
and the adaptive version is that the latter uses
ADWIN (Bifet and Gavaldá, 2007), an adaptive sliding
window algorithm, for drift detection. In the preliminary experiments, all three
algorithms performed mostly on par with each other, but we
decided to make use of the drift detection of the adaptive version.
We assume that this will allow us to capture drifts in the data
of the users, if any, in the future. Detailed descriptions of the
algorithms can be found in the Supplementary Material.
3.2.3. Performance Measure for the Arm (for RQ 1b)
The algorithms explored for prediction in Section 3.2.2 are
regression models; therefore, the squared error is picked as the
evaluation criterion. Since the root mean squared error (RMSE, as
in Equation 1) has been employed in related stream classification
papers (Unnikrishnan et al., 2020a), we decided
to use the same measure.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i} (x_i - \hat{x}_i)^2} \qquad (1)$$
For every target output and every time step, the RMSE value is
calculated, where $x_i$ is the actual value and $\hat{x}_i$ is the predicted
value, while $N$ is the number of observations. The RMSE value
is averaged over all target outputs to understand the overall
performance per time step.
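Equation (1) and the averaging over target outputs translate directly into code. The following is a minimal sketch; the function names and the dictionary layout keyed by item are illustrative choices, not taken from the source.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences (Equation 1)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(x - x_hat) ** 2 for x, x_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmse(actual_by_item, predicted_by_item):
    """Average the per-item RMSE values over all target outputs."""
    values = [
        rmse(actual_by_item[item], predicted_by_item[item])
        for item in actual_by_item
    ]
    return sum(values) / len(values)
```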
3.2.4. Choice of Reward Function for the Bandit (for
RQ 1c)
Most of the articles in the related work used $1 - \text{error}$ of the
models as their reward. At each time step $t_i$, the bandit decides
which arm to use for prediction based on past rewards, where the
reward is the RMSE over all the answers to the items (cf. Equation
1), normalized on the upper bound of the error $U$, defined as the
largest value minus the smallest value that each target can acquire.
For our application scenario, $U = 100 - 0 = 100$, since all items
we chose for our analysis have a value range from 0 to 100 (cf.
Table 1).
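The text states that the RMSE is normalized by U and that related work uses one minus the error as reward. The sketch below assumes that both are combined as 1 − RMSE/U, so that a larger reward means a better prediction. This exact form is an assumption; the source does not spell it out, and the function name is hypothetical.

```python
U = 100.0 - 0.0  # upper bound of the error: largest minus smallest item value


def reward_from_rmse(rmse_value, upper_bound=U):
    """Map an RMSE in [0, U] to a reward in [0, 1] (assumed form: 1 - RMSE / U)."""
    normalized = rmse_value / upper_bound
    # Clip to [0, 1] to guard against values outside the expected range.
    normalized = min(max(normalized, 0.0), 1.0)
    return 1.0 - normalized
```

For example, an RMSE of 25 on the 0 to 100 scale would give a reward of 0.75 under this assumption.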
3.3. Choice of Sampling Strategies for the
Bandit
After the arms have been defined for the bandit, we need
to decide the heuristic to choose an arm for the prediction
of an observation. This choice is dependent on the sampling
algorithms employed by the bandit. The inspiration for the
sampling algorithms was taken from the related work explored.
To compare various strategies, Upper Confidence Bound (UCB)
and Thompson Sampling (TS) were used to conduct the
experiments. UCB is based on the simple principle that the uncertainty
regarding an arm is directly proportional to the importance of
exploring that arm. So if an arm is very uncertain, UCB
chooses it, observes the corresponding reward, and thereby
makes the arm less uncertain. This goes on until the uncertainty
is below some decided threshold. For our experimentation, UCB1
was used, which is formulated mathematically as follows:
$$Q(a) + \sqrt{\frac{2 \log(t)}{N_t(a)}} \qquad (2)$$
where $Q(a)$ is the average reward of arm $a$ for each round, $t$ is the
total number of rounds “played” thus far, and $N_t(a)$ is the number
of times arm $a$ was selected thus far.
For each round/iteration/time-step, we play the arm that
maximizes Equation (2). The first term in Equation (2)
controls exploitation, i.e., choosing the arm where the average
is the largest, whereas the second term controls the exploration,
hence trying to maintain the balance between both.
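Arm selection with Equation (2) can be sketched as follows. The data layout (two dictionaries keyed by arm name) and the function names are hypothetical. Playing every untried arm once before applying the formula is the usual UCB1 convention and is assumed here, since the formula is undefined for an arm that was never selected.

```python
import math


def ucb1_select(mean_reward, pulls, t):
    """Pick the arm that maximizes Q(a) + sqrt(2 * log(t) / N_t(a))."""
    # An arm that was never played has undefined uncertainty: try it first.
    for arm, count in pulls.items():
        if count == 0:
            return arm

    def score(arm):
        exploitation = mean_reward[arm]                        # Q(a)
        exploration = math.sqrt(2 * math.log(t) / pulls[arm])  # uncertainty bonus
        return exploitation + exploration

    return max(pulls, key=score)


def update(mean_reward, pulls, arm, reward):
    """Incrementally update the pull count and average reward of the played arm."""
    pulls[arm] += 1
    mean_reward[arm] += (reward - mean_reward[arm]) / pulls[arm]
```

A rarely played arm receives a large bonus and is selected even if its average reward is lower, which is the exploration effect described above.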
TS on the other hand makes use of modeling the rewards
as probability distributions to sample the expected reward of
an arm instead of using the average expectation like UCB
does (Equation 2). Hence, the exploration is handled via the
sampling process instead of an explicit representation in the
formula. The advantage over UCB according to the survey is
that already good arms are more likely to be exploited without
forcing exploration based on uncertainty. Generally, this allows
TS to converge faster to an optimal solution, meaning the one
that always chooses the model with the maximum reward. But
this highly depends on the correct modeling of the reward. For
our analysis, we considered a normal distribution for sampling.
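A simplified picture of Thompson Sampling with normally distributed rewards is given below. It is a sketch under assumptions: each arm's expected reward is sampled from a normal distribution centered on its observed mean, with a standard deviation that shrinks as noise_std / sqrt(number of pulls). The parameters `noise_std`, `prior_mean`, and `prior_std` and their default values are hypothetical, and the source does not describe how the distribution was parameterized.

```python
import math
import random


def thompson_select(rewards_by_arm, rng, noise_std=0.1, prior_mean=0.5, prior_std=1.0):
    """Sample one plausible mean reward per arm and play the arm with the largest sample."""
    samples = {}
    for arm, rewards in rewards_by_arm.items():
        if rewards:
            mean = sum(rewards) / len(rewards)
            # The more often an arm was played, the narrower its distribution.
            std = noise_std / math.sqrt(len(rewards))
        else:
            mean, std = prior_mean, prior_std  # wide prior for an untried arm
        samples[arm] = rng.gauss(mean, std)
    return max(samples, key=samples.get)
```

Exploration happens through the sampling itself: an arm with few observations has a wide distribution and therefore occasionally produces the largest sample.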
Additionally, to evaluate the applied strategies we have also
considered the following sampling methods:
1. optimal: chooses an arm that outputs the maximum reward
out of all;
2. worst: chooses an arm that outputs the minimum reward out
of all;
3. random: chooses any arm randomly.
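The three reference strategies listed above can be sketched in a few lines. The function name and the `current_rewards` dictionary are hypothetical. Note that the optimal and worst strategies require the rewards of all arms at the current step, so they are only usable in hindsight as evaluation bounds.

```python
import random


def baseline_select(current_rewards, strategy, rng=None):
    """Choose an arm with one of the reference strategies.

    current_rewards : dict mapping arm name -> reward the arm obtains at this step
    strategy        : "optimal", "worst", or "random"
    """
    if strategy == "optimal":
        return max(current_rewards, key=current_rewards.get)
    if strategy == "worst":
        return min(current_rewards, key=current_rewards.get)
    if strategy == "random":
        chooser = rng if rng is not None else random
        return chooser.choice(sorted(current_rewards))
    raise ValueError(f"unknown strategy: {strategy}")
```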
With the sampling strategies, we decided on the last building
block for the bandit so that it can invoke the different arms when
they are potentially most beneficial (cf. RQ 1b). This of course
also is dependent on the arm itself and the reward function we
have chosen for its representation.
4. RESULTS
4.1. Exploratory Data Analysis
To acquire better insights on the data distribution of the EMA
of group A and group B participants, we first performed a
univariate analysis on the user’s answer to each EMA item and
then we computed the correlation matrix for the EMA items,
using a heatmap as a basis for each group. Also, a comprehensive
TABLE 2 | Number of times users answer zero for S01 in relation to their total number of observations in the first 2 months and the next 2 months of the study, for group A and group B.
Group | User | First 2 months: total observations | First 2 months: perc. of 0s | Next 2 months: total observations | Next 2 months: perc. of 0s
A | 17 | 108 | 69.4% | 62 | 74.2%
A | 18 | 50 | 22.0% | 35 | 0.0%
A | 19 | 20 | 20.0% | 0 | 0.0%
A | 22 | 157 | 0.0% | 130 | 0.0%
A | 23 | 129 | 0.8% | 20 | 0.0%
A | 28 | 150 | 49.3% | 125 | 48.8%
A | 30 | 101 | 0.0% | 27 | 0.0%
A | 31 | 104 | 18.3% | 99 | 1.0%
A | 33 | 39 | 12.8% | 0 | 0.0%
A | 40 | 24 | 16.7% | 8 | 0.0%
A | 43 | 50 | 0.0% | 18 | 0.0%
B | 20 | 132 | 0.0% | 84 | 0.0%
B | 24 | 102 | 0.0% | 80 | 0.0%
B | 25 | 11 | 72.7% | 1 | 100%
B | 29 | 169 | 34.9% | 149 | 1.3%
B | 35 | 190 | 0.0% | 134 | 0.0%
B | 42 | 58 | 34.5% | 21 | 4.8%
B | 47 | 165 | 1.2% | 135 | 0.0%
B | 48 | 173 | 0.6% | 153 | 0.0%
B | 51 | 259 | 0.4% | 168 | 0.6%
B | 52 | 45 | 26.7% | 9 | 11.1%
FIGURE 2 | Correlation analysis of the EMA items based on
Pearson’s correlation. When there is no correlation between two variables (a
correlation of 0 or near 0), the color is gray. The darkest red indicates a
perfect positive correlation, while the darkest blue indicates a perfect
negative correlation. The numbers inside each cell represent the
absolute value of the correlation.
analysis of the item S01 has been done to understand the role of
perceiving tinnitus in the answers given by the user.
4.1.1. S01 Analysis
To understand the differences between the tinnitus and non-
tinnitus perception times of the user, the values of item S01 were
analyzed for both groups (cf. Table 2). Overall, Group B is
more active than Group A. Independently of the group, we have
fewer observations from the users in the next 2 months of the
study. Users 17 and 28 from Group A and users 29 and 42 from Group B
report not perceiving tinnitus in a substantial share of their
observations. The Group B users mostly did so in the first 2 months, while the Group A
users did so in both the first and the next 2 months of their study.
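The percentages in Table 2 follow from a simple count per user and period. The sketch below assumes that S01 answers are coded so that 0 means no tinnitus was perceived; the answer lists are hypothetical and the function name is illustrative.

```python
def share_of_zeros(answers):
    """Return the percentage of answers equal to 0, or None without answers."""
    if not answers:
        return None
    return 100.0 * sum(1 for answer in answers if answer == 0) / len(answers)

# Hypothetical S01 answers of one user, split by study period.
first_two_months = [0, 0, 1, 0, 1]
next_two_months = []

print(share_of_zeros(first_two_months))
print(share_of_zeros(next_two_months))
```

Returning `None` for an empty period is a design choice of this sketch: it keeps "no observations" distinguishable from "never answered zero", a distinction that a plain 0.0% entry cannot express.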
4.1.2. Univariate Analysis
For the univariate analysis, the box plots and distributions for
each of the items are used to see how the overall spread is for the
answers between 0 and 100. We observe that for all the variables
except S04 for both the groups, the likelihood of having the value
0 is high. For variable S04, the values usually are between 20 and
100. For Group A the mode of S04 is around 80 and for Group
B we have a trimodal distribution with modes around 50, 80, and
100. For S02 the largest mode is around 80 for both the groups
but for Group A most of the values lie around 40. For S03 we
have a multi-modal distribution with a mode at around 30 and 85
for Groups A and B, respectively. For the items S05, S06, and
S07, there is a strong shift toward the value 0 in Group B. In
contrast, for Group A the values lie on average between
20 and 40, but also have a mode at 0. Furthermore, we inspected
the value distributions of the individual users for each item and
found that each user has their own “preferred range of values.”
This strengthens our expectation that local models (idiographic
ones and those based on neighborhood) will be predictive for
some users.
4.1.3. Bivariate Analysis
For the bivariate analysis among the EMA items, we put the
observations of both groups together and temporarily ignored
that some users contributed more observations than others.
The heatmap of the analysis is depicted in Figure 2. We found
that, independently of the users, tinnitus loudness (S02) and
tinnitus distress (S03) are positively correlated, and so are
stress (S06) and exhaustion (S07). The item S04 on hearing
is particular in that higher values are better: accordingly, it
FIGURE 3 | The figure shows for the ensemble bandit, the number of times each arm is invoked for the items by different samplers for both the groups. Five samplers
as discussed in Section 3.3 above have been compared here.
stands in negative correlation to stress (S06), exhaustion (S07),
and limitations because of hearing (S05). All of the above
correlations are significant based on the
p values before the Bonferroni correction.
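The pairwise analysis described above can be sketched as follows. The snippet computes Pearson's r for every pair of items and derives the per-test significance threshold that a Bonferroni correction would impose. The item values and the significance level of 0.05 are assumptions for illustration; the study's data and p values are not reproduced here.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical answers on a 0-100 scale (illustrative values only).
items = {
    "S02": [10, 40, 60, 80, 30],
    "S03": [5, 35, 70, 75, 20],
    "S04": [90, 70, 40, 30, 80],
}
pairs = list(combinations(items, 2))
correlations = {(a, b): pearson_r(items[a], items[b]) for a, b in pairs}
corrected_alpha = bonferroni_alpha(0.05, len(pairs))

for pair, r in correlations.items():
    print(pair, round(r, 2))
print("per-test threshold:", corrected_alpha)
```

With more items the number of pairs grows quadratically, so the corrected threshold becomes considerably stricter than the uncorrected one.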
4.2. Experiments for RQ 1
In the first set of experiments, we evaluate whether
the methods that take local behavioral patterns into account
are superior to the global method. For this purpose, we
look at how often the global, entity-centric, and neighborhood
(NN) arms are chosen, i.e., their percentage of invocations
for the ensemble bandit (cf. Figure 3). Here, we see that for
both groups, TS follows approximately the same trend as
the optimal sampler, whereas the behavior of UCB is close to
random. For group B, each model is approximately
equally likely to be selected, whereas for
group A the likelihood of the neighborhood arm being chosen is
comparatively low.
Also, to be able to compare the whole vector bandit (cf.
Figure 4) to the ensemble bandit, we look at the
number of invocations for each type of arm (global, entity-centric,
neighborhood) in both configurations. Since the
latter has invocations for each item, we calculate
the average number of invocations for each arm to get an
overview of its behavior. For that purpose, all the invocations
were summed up and divided by the total number of items,
i.e., 6 (cf. Table 1). Additionally, we compute the relative
number of invocations for each item by dividing by the sum
of all the invocations, which is 1680 and 2308 for groups A and
B, respectively. We can then compare the ensemble bandit and the
whole vector bandit for UCB in Table 3. Similarly, for TS, the ensemble and the
whole vector bandit are compared in Table 4. We observe that the behavior
of the bandits is very similar on average within each sampling
strategy. On the other hand, when we compare UCB and TS, we
observe a preference of TS toward global arms, whereas UCB
prefers the local ones. To see which strategy is closer to optimal, we
compare the number of invocations to those of the optimal sampler (cf.
Table 5) and see that TS behaves similarly to the
optimal one. This indicates that TS is more suitable for solving
the problem, since it aligns with the optimal strategy in all
configurations.
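As a worked example of the averaging described above, the following sketch recomputes the "Average" and "Relative" rows of Table 3 for group A (ensemble bandit with UCB) from the absolute invocation counts per item. The variable names are illustrative; the counts are copied from Table 3.

```python
# Absolute invocations per item (S02..S07) for group A with UCB, from Table 3.
invocations = {
    "global": [545, 511, 485, 512, 467, 531],
    "entity": [617, 627, 584, 569, 642, 590],
    "neighborhood": [518, 542, 611, 599, 571, 559],
}
n_items = 6            # number of predicted EMA items
total_per_item = 1680  # invocations per item for group A

# Average number of invocations per arm over the items.
average = {arm: sum(counts) / n_items for arm, counts in invocations.items()}

# Share of each arm among all invocations over all items.
relative = {
    arm: sum(counts) / (n_items * total_per_item)
    for arm, counts in invocations.items()
}

for arm in invocations:
    print(f"{arm}: average {average[arm]:.1f}, relative {relative[arm]:.0%}")
```

The resulting averages (508.5, 604.8, 566.7) and shares (30%, 36%, 34%) agree with the corresponding rows of Table 3 and are close to the whole vector bandit counts (506, 607, 567).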
Furthermore, to better understand the convergence and
therefore the learning behavior of the bandits, the average
cumulative reward for each group is plotted. This provides an
overview of the increase of rewards over time and therefore
the increase in the quality of the models and the bandits’
performance. This can be seen in Figure 5.
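One common way to obtain such a curve is sketched below. It assumes that the average cumulative reward at time step t is the sum of the rewards obtained up to t divided by t; the exact definition and any aggregation over entities used for Figure 5 may differ. The reward values are hypothetical.

```python
from itertools import accumulate

def average_cumulative_reward(rewards):
    """For each time step t, return the mean of the rewards observed up to t."""
    return [total / t for t, total in enumerate(accumulate(rewards), start=1)]

# Hypothetical rewards of one sampler over four time steps.
rewards = [0.2, 0.4, 0.9, 0.5]
curve = average_cumulative_reward(rewards)
print([round(value, 2) for value in curve])
```

Under this definition, a curve that rises over time indicates that later predictions earn higher rewards than earlier ones, for example because the underlying models have seen more data.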
The findings that we derive from the average cumulative
reward view in Figure 5 are as follows:
1. For the random sampling strategy, the reward increases with
every iteration/time-step.
2. UCB and TS follow a trend similar to that of the random strategy.
3. The investigated samplers are better than the worst sampler
except at the very beginning, where they do not have enough
data to exploit.
4. Comparing the performances for group A and group B, we see
that the lines for group B are closer to each other as compared
to that for A. This indicates that the predictability for group B
is slightly higher than that for group A.
5. This claim is supported by the fact that for group B on average
cumulative rewards are higher than that for group A, which
indicates fewer errors while predicting entities in group B.
To compare the behavior of the bandit irrespective of the groups
to which users belong, the same set of experiments is repeated
without considering the group information. The same trends
as for the contextual multi-armed bandit are observed.
Figure 6B shows that the TS, UCB, and random samplers perform
on par with each other. To see how this bandit behaves for the
users in both groups A and B, the average cumulative reward
per iteration/time-step was plotted. We see in Figure 7 that the bandit
learns well for both group A and group B users, irrespective of
the difference in the number of observations between the groups.
In summary, it can be observed that local arms are
preferred in many time-steps. There is no clear pattern in
the choice between global and local arms but we found
differences in the preference when using different sampling
strategies. Additionally, the predictability of the groups varies
slightly based on average cumulative rewards. To better
understand how the bandit chooses between global and
local arms, we investigate potential criteria in the next
subsection.
4.3. Experiments for RQ 2
To be able to understand the behavior of the bandit, i.e., choice of
the arms and development of the reward, we investigate different
factors. These factors include the history length of the entities,
their personal traits, their time of arrival in the system, and the
temporal proximity of the predictions.
First, we investigate the average error of each entity
per item for the prediction models underlying the arms. We observe
in Table 6 that there is no clear boundary on the history length
of the user beyond which local models
perform better on average than global models. Only entity 25,
with a history length of 12, displays a clear preference for global
models for all the items. Furthermore, also in Table 6 in the
last column where we have the average error of all items per
entity for different prediction models, this preference of entity
25 toward the global model can be confirmed. For entity 19 with
a history length of 20, the global model performs on par with
the neighborhood-based model. Beyond that, for most entities,
the local models seem to perform better for all items except item
S07, but their difference in performance to the global models is
mostly negligible.
The behavior of S07 cannot be explained by the history length
as a factor. Since the global model often performs better than
the local models, there seems to be a property that the local models
miss. One such property is the temporal proximity of the
predictions. As can be seen in Table 1, the item S07 asks
how exhausted the entity feels. This exhaustion might
depend on external factors such as weather, time of day, etc.,
and, therefore, depend more on the prior predictions
than on the entity itself.
The arrival time of the entity is interesting, since the
global model has already seen more instances if the
entity arrives later. Therefore, the global model might
already be a better predictor for the late-arriving entities than
for the early birds. This could not be confirmed, since there seems
to be no correlation between the arrival time of the entity
and the preference toward the global model. This can be
seen in Figure 8, where the choice of the arms depends more
on the history length than on the arrival time of
the entity.
Since the analysis is only based on the individual arms
of the bandit, we additionally compared the performance of
the whole vector bandit to that of its arms. In Table 7, we compare
the errors of the optimal sampling strategy to those of TS and
see that TS performs on average worse than or equal to the
best performing arm. This cannot be said about the optimal
FIGURE 4 | The figure shows the combined view of both the groups for the whole vector bandit w.r.t the number of times each arm is invoked for the items by different
samplers and the average cumulative reward per iteration/time-step for those items. Five samplers as discussed in Section 3.3 above have been compared here.
strategy, though, which, especially for shorter history lengths,
performs better than its components. Additionally, for the longer
history lengths, the error of the optimal strategy is close to
that of the best arms. But for entity 22, which is also the first to
arrive in the system, the optimal strategy is better by a similar margin as
for short history lengths. This might indicate that the initial
predictions for entity 22 are better in a bandit setting, but
over time the performance of the individual arms averages
out to be similar to that of an optimal sampler. Hence, the
effect of the choice between the models is marginal at best,
resulting in similar bandits and behavior similar to that of a purely
random model.
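The comparison above can be retraced with a short worked example on the values of Table 7. For each entity, the sketch computes how much lower the error of the optimal sampler is than that of the best single arm, and how far TS trails the best arm. The helper name is illustrative; the numbers are copied from Table 7.

```python
# Rows from Table 7: (entity, history length, optimal, TS, global, EC, NN).
TABLE_7 = [
    (25, 12, 11.72, 13.75, 13.01, 26.66, 14.73),
    (19, 20, 11.21, 12.87, 12.89, 16.52, 12.54),
    (40, 34, 13.22, 15.59, 17.07, 14.46, 15.45),
    (30, 128, 5.45, 6.61, 6.84, 6.18, 5.75),
    (29, 322, 8.68, 10.18, 10.38, 10.38, 9.07),
    (22, 490, 8.91, 10.32, 10.47, 10.95, 10.17),
]

def error_gaps(row):
    """Return (best arm error - optimal error, TS error - best arm error)."""
    entity, history_length, optimal, ts, *arm_errors = row
    best_arm = min(arm_errors)
    return best_arm - optimal, ts - best_arm

for row in TABLE_7:
    optimal_gain, ts_loss = error_gaps(row)
    print(f"entity {row[0]:>2} (history {row[1]:>3}): "
          f"optimal beats best arm by {optimal_gain:.2f}, "
          f"TS trails best arm by {ts_loss:.2f}")
```

For the six listed entities, the advantage of the optimal sampler over the best arm is between 1.24 and 1.33 for the three shortest histories and for entity 22, and 0.30 and 0.39 for entities 30 and 29.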
Additionally, we investigated whether there might be a cold start
problem for the bandits, i.e., initializing the bandit with 0
rewards and invocations and, hence, having no information
about the expected performance of each arm. To deal with this,
the optimal strategy is used to kick-start each of the samplers.
That is, for the prediction of the first 70 observations, the
optimal strategy is employed to initialize the UCB, random, and
Thompson samplers with the resulting number of invocations
and sum of rewards. The observed trend is similar to that in
Figure 3. The same experiment is repeated with the first
140 observations instead of 70 for initialization, and
the trend remains the same. This experiment further supports
the claim made about the similar behavior of the UCB, TS, and
random samplers.
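The warm start described above can be sketched as follows. The snippet assumes that the rewards of all arms are available for the initialization phase, lets an oracle pick the best arm at each of the first `n_init` steps, and returns the per-arm invocation counts and reward sums with which a sampler could then be initialized. The function name, the data layout, and the simulated rewards are assumptions for illustration.

```python
import random

def warm_start(reward_history, n_init):
    """Collect per-arm counts and reward sums from an oracle over n_init steps.

    reward_history[t][arm] is the reward the arm would have earned at step t.
    """
    n_arms = len(reward_history[0])
    counts = [0] * n_arms
    reward_sums = [0.0] * n_arms
    for step_rewards in reward_history[:n_init]:
        best = max(range(n_arms), key=lambda arm: step_rewards[arm])
        counts[best] += 1
        reward_sums[best] += step_rewards[best]
    return counts, reward_sums

# Simulated rewards for three arms over 200 steps (illustrative only).
rng = random.Random(0)
history = [[rng.random() for _ in range(3)] for _ in range(200)]
counts, reward_sums = warm_start(history, n_init=70)
print(counts, [round(total, 2) for total in reward_sums])
```

Changing `n_init` from 70 to 140 corresponds to the second variant of the experiment.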
In conclusion, we have shown that all 4 factors have influence
on the behavior of the bandit and the quality of the predictions.
The biggest influence comes from the history length and the
temporal proximity of the predictions, which will be discussed
further in Section 5.
5. DISCUSSION
5.1. Technical Perspective
As mentioned in Section 3.2, we incorporated local and global
models as arms of the CMAB, which managed the invocations of
those models, to assess their effectiveness for RQ 1. A clear trend
can be seen in Tables 3 and 4: local models are preferred about two thirds
of the time by UCB and more than half of the time by TS, where the
latter is close in behavior to an optimal sampling strategy. The
superiority of the local models is not necessarily large, though.
TABLE 3 | Number of invocations for the global and local models by ensemble predictors and whole vector predictor using UCB as a sampling strategy.
Columns: Item; Measure; Group A (Global, Entity, Neighborhood, Total); Group B (Global, Entity, Neighborhood, Total)
S02 Absolute 545 617 518 1680 480 863 965 2308
Relative 32% 37% 31% 100% 21% 37% 42% 100%
S03 Absolute 511 627 542 1680 554 810 944 2308
Relative 30% 37% 32% 100% 24% 35% 41% 100%
S04 Absolute 485 584 611 1680 601 822 885 2308
Relative 29% 35% 36% 100% 26% 36% 38% 100%
S05 Absolute 512 569 599 1680 576 833 899 2308
Relative 30% 34% 36% 100% 25% 36% 39% 100%
S06 Absolute 467 642 571 1680 661 847 800 2308
Relative 28% 38% 34% 100% 29% 37% 35% 100%
S07 Absolute 531 590 559 1680 588 822 898 2308
Relative 32% 35% 33% 100% 25% 36% 39% 100%
Total Absolute 3051 3629 3400 - 3460 4997 5391 -
Average 508.5 604.8 566.7 - 576.7 832.8 898.5 -
Relative 30% 36% 34% - 25% 36% 39% -
Whole Vector Absolute 506 607 567 1680 589 825 894 2308
Relative 30% 36% 34% 100% 26% 36% 39% 100%
TABLE 4 | Number of invocations for the global and local models by ensemble predictors and whole vector predictor using Thompson Sampling as a sampling strategy.
Columns: Item; Measure; Group A (Global, Entity, Neighborhood, Total); Group B (Global, Entity, Neighborhood, Total)
S02 Absolute 802 494 384 1680 947 716 645 2308
Relative 48% 29% 23% 100% 41% 31% 28% 100%
S03 Absolute 755 581 344 1680 972 671 665 2308
Relative 45% 35% 20% 100% 42% 29% 29% 100%
S04 Absolute 756 544 380 1680 930 692 686 2308
Relative 45% 32% 23% 100% 40% 30% 30% 100%
S05 Absolute 793 577 310 1680 962 706 640 2308
Relative 47% 34% 18% 100% 42% 31% 28% 100%
S06 Absolute 790 517 373 1680 967 704 637 2308
Relative 47% 31% 22% 100% 42% 31% 28% 100%
S07 Absolute 798 541 341 1680 987 677 644 2308
Relative 48% 32% 20% 100% 43% 29% 28% 100%
Total Absolute 4694 3254 2132 - 5765 4166 3917 -
Average 782.3 542.3 355.3 - 960.8 694.3 652.8 -
Relative 47% 32% 21% - 42% 30% 28% -
Whole Vector Absolute 780 628 272 1680 970 649 689 2308
Relative 46% 37% 16% 100% 42% 28% 30% 100%
Only minor differences could be observed in Section 4.2. The
sampling strategies, including worst and optimal, are all relatively
close together in performance, meaning that the choice of models
is not that significant. This might also be due to the late activation
of local models as described in Section 3.2.1, since the fallback
would always be the global model. A better strategy on when to
choose each arm might still prove to be beneficial.
Additionally, from Figures 3–5 we see that the context used in
the CMAB does not make a big difference. Hence, the context of
the items might be irrelevant to the problem or for the arms that
were chosen. The sampling strategies might not be able to pick up
relevant differences, or the models might be too close in performance.
Furthermore, the predictability for group B is slightly higher than
for group A for the CMAB. This might mean that users of group
B either formed a more distinct profile because they explored the
app more, or their behavior is just simpler and therefore easier
to predict. In both cases, the local arms might more easily pick up the
personal traits of each entity. Additionally, there is a difference in
the usage of the context by the strategies. TS shows a significant
difference in the choice of the neighborhood arm across groups.
TABLE 5 | Number of invocations for the global and local models by ensemble predictors and whole vector predictor using optimal sampler as a sampling strategy.
Columns: Item; Measure; Group A (Global, Entity, Neighborhood, Total); Group B (Global, Entity, Neighborhood, Total)
S02 Absolute 667 678 335 1680 862 895 551 2308
Relative 40% 40% 20% 100% 37% 39% 24% 100%
S03 Absolute 675 627 378 1680 924 843 541 2308
Relative 40% 37% 23% 100% 40% 37% 23% 100%
S04 Absolute 686 636 358 1680 914 910 484 2308
Relative 41% 38% 21% 100% 40% 39% 21% 100%
S05 Absolute 714 611 355 1680 948 839 521 2308
Relative 43% 36% 21% 100% 41% 36% 23% 100%
S06 Absolute 728 612 340 1680 887 893 528 2308
Relative 43% 36% 20% 100% 38% 39% 23% 100%
S07 Absolute 747 582 351 1680 945 843 520 2308
Relative 44% 35% 21% 100% 41% 37% 23% 100%
Total Absolute 4217 3746 2117 - 5480 5223 3145 -
Average 702.8 624.3 352.8 - 913.3 870.5 524.2 -
Relative 42% 37% 21% - 40% 38% 23% -
Whole Vector Absolute 670 653 357 1680 861 893 554 2308
Relative 40% 39% 21% 100% 37% 39% 24% 100%
This implies that TS can make use of the homogeneity of group B. These
findings are not significantly visible in the optimal sampler, but
indications can be found. The difference between the groups is
not shared by the simple bandit, as seen in Figure 7. Therefore,
we can assume that context is important overall, but the given
context is not yet sufficient.
To understand which factors influence the performance
differences between the models (RQ 2), we looked at the four factors
in Section 4.3. In Table 7, we found a clear dependency on the
history length of the entities. This is due, on the one hand, to the
training time of the local models and, on the other hand, to the fact that errors
average out over time. Additionally, we checked whether the arrival time
of an entity in the system influences the choice of arms, since
the global arm will have a smaller error and higher reward at
some point. Such a correlation does not exist, since Figure 8
only shows a dependency on the history length of the entity. The
factors temporal proximity of the predictions and personal
traits of the entities might be in a trade-off with each other,
since local models might be able to better capture the personal
traits but might not have recent predictions due to the behavior
of the entities. In Table 6, for most items, either local model
performs better than the global one. We assume these items to be
personal and therefore dependent on the traits of the entity. This
does not seem to be true for S07, where the global model oftentimes
outperforms the local models. We explained in Section 4.3 that
this might be due to external circumstances like weather and
time. Therefore, context about the items might be beneficial for
a CMAB.
As far as threats to validity are concerned, there is no
guarantee that the multi-armed bandits converged to an optimal
solution. This could be due to the lack of a fitting sampling
strategy and the fact that all models get better over time. For
our experiments, we assumed the data to follow the normal
distribution, and TS was designed accordingly. This assumption
can be too stringent, and hence we might just have a sub-optimal
sampling strategy. Second, from a data perspective, there
might be too few entities in the dataset overall, along with
varying history lengths, as entities with a short history might
not be properly integrated by the bandit. This might have had
a negative impact on the bandits and the models. This was
further worsened by the removal of entities without
group (context) information.
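To make the normality assumption concrete, the following is a minimal sketch of a textbook Gaussian Thompson sampler, not the implementation used in the study. It assumes that each arm's reward estimate is drawn from a normal distribution centered on the arm's running mean, with a standard deviation of 1/sqrt(n + 1) after n invocations. The class name, the simulated reward means, and the noise level are hypothetical.

```python
import math
import random

class GaussianThompsonSampler:
    """Thompson sampling with a normal belief per arm (illustrative sketch)."""

    def __init__(self, n_arms, rng=None):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.rng = rng or random.Random()

    def select_arm(self):
        # Draw one sample per arm; the spread shrinks as the arm is used more.
        samples = [
            self.rng.gauss(mean, 1.0 / math.sqrt(count + 1))
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(samples)), key=lambda arm: samples[arm])

    def update(self, arm, reward):
        # Incremental update of the running mean reward of the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

# Simulated environment with hypothetical mean rewards per arm.
rng = random.Random(42)
true_means = [0.3, 0.5, 0.8]
sampler = GaussianThompsonSampler(len(true_means), rng=rng)
for _ in range(2000):
    arm = sampler.select_arm()
    sampler.update(arm, rng.gauss(true_means[arm], 0.1))
print(sampler.counts)
```

If the rewards are heavily skewed or bounded, as rewards derived from prediction errors may be, a normal belief can misjudge the uncertainty of an arm, which is one way in which the assumption can turn out to be too stringent.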
For future work, one main focus should therefore be on the
data collection process. Entities could be informed that
the recommendations from the mHealth app can be improved if
they engage more with the system, because the answers of more active entities
are easier to predict. Since the group as a context for
the bandit did not provide useful insights, collecting additional
meta-information about the lifestyle or similar information about
the user might separate the groups inside the bandit better.
Additionally, it might be helpful to get feedback from the entities
on the aspects where the difference between the predictions and the
actual answers is large. This could be directly incorporated
into the reward function of the bandit and hence improve the
sampling strategies. Given more time, the sampling strategies
could have been enhanced by including information about the
recent behavior of each user, because their differences might
punish the rewards of not yet fully trained local models.
5.2. Mobile Health Perspective
From the perspective of optimal EMA prediction for personalized
services, our findings demonstrate a clear trend: the optimal
sampler of CMAB invokes the entity-centric model comparably
often as the global model when it comes to predictions of the
FIGURE 5 | Average cumulative reward per iteration/time-step for the items for Group A and B. Here, the first column represents selections by Group A and the
second column by Group B. Five samplers as discussed above have been compared here.
individual EMA through the ensemble (cf. Figure 3 and Table 5),
while for the whole-vector prediction it invokes the entity-centric
model equally or more often than the global one (cf. upper part of
Figure 4). Our findings on how often the optimal sampler of the
CMAB invokes each of the three models show a clear preference
toward the entity-centric model. This means that predictions
should preferably be based on the user’s own recordings,
especially for users who (like the participants of group B) deliver
many recordings. Another formulation of this finding is that if
an mHealth service provider has the option of invoking both a
global model and an entity-centric one, then the latter should be
preferred for users who interact intensively with the app—as soon
as they start doing so.
One might argue that since the global model is used as
often as the entity-centric model by the optimal sampler of the
CMAB, then the two models are equally good. Unfortunately,
this conclusion is not warranted. Rather, the global model is
used so often because many users contribute too few EMA,
FIGURE 6 | The figure shows the view for the simple bandit wherein the subfigure (A) shows the number of times each arm is invoked for all the items by different
samplers for every entity. The subfigure (B) shows the average cumulative reward per iteration/time-step for all the entities. In both cases, 5 samplers have been
compared wherein different colors represent different samplers.
FIGURE 7 | To see how the average cumulative reward in Figure 6B behaves for different groups, we split the behavior for both the groups, which is shown in the
subfigures (A,B). The Y-axis shows the average cumulative reward, the X-axis shows the time-steps, and different colors represent the 5 samplers that are compared here.
TABLE 6 | Average error for each item across all the time steps using global, entity-centric, and neighborhood prediction models.
Columns: Entity; Hist. length; S02 (Global, EC, NN); S03 (Global, EC, NN); S07 (Global, EC, NN); Total (Global, EC, NN)
25 12 15.65 40.71 18.21 14.00 34.49 13.44 10.58 22.56 13.41 13.01 26.66 14.73
19 20 14.89 18.40 18.14 22.28 39.18 22.90 7.74 19.69 10.05 12.89 16.52 12.54
40 34 17.82 14.51 16.68 20.49 17.37 18.99 12.75 9.07 9.98 17.07 14.46 15.45
30 128 5.91 5.43 4.36 7.22 5.65 5.06 7.02 10.11 9.80 6.84 6.18 5.75
29 322 12.38 12.42 10.74 11.69 12.54 10.97 8.53 13.62 11.69 10.38 10.38 9.07
22 490 7.46 6.69 6.63 15.83 16.52 15.54 7.64 17.86 15.61 10.47 10.95 10.17
For demonstration purposes, we include only items S02, S03, and S07.
hence no personal, entity-centric models are available for them.
As can be seen in the upper right subfigure of Figure 4, the
optimal sampler invokes the entity-centric model
more often than the global model for the users of group
B, i.e., for the group whose users contribute many
EMA.
FIGURE 8 | The relative number of invocations for global and local arms. The left subfigure shows the number of times each arm is invoked for all the items by the TS
sampler for selected entities. The Y-axis shows the relative number of invocations and the X-axis shows the entities. The right subfigure shows the same for the optimal
sampler. For demonstration purposes, we have chosen 6 entities with varying history lengths.
For the design of personalized services, as anticipated, e.g.,
in Vogel et al. (2021), this finding is rather disappointing, since
it means that knowledge from other users is not very useful
when it comes to prediction. This finding is also in contrast
to the whole body of research on collaborative filtering for
recommendation engines, where knowledge about similar users
is exploited to predict a user’s future preferences (Ricci et al.,
2010), and to the advances on the potential of mHealth apps
for decision support through prediction (Martínez-Pérez et al.,
2014). However, our finding is less surprising when placed in
the context of EMA: cutting-edge prediction technologies on
temporal data build upon large amounts of recordings, see
e.g. the sizes of the timestamped data sets used in Bellogín
and Sánchez (2017); large data sets can be accumulated easily
through sensor signals, as for the glucose predictor proposed
by Pérez-Gandía et al. (2018), but are less easy to accumulate
when users deliver EMA at their own discretion. Indeed, in our
earlier work (Schleicher et al., 2020) on EMA recordings with
the mHealth app TrackYourTinnitus (for short: TYT), we found
that the majority of the users contributed fewer than three EMA
in total.
Then, should we avoid learning from similar users for
prediction? Since the neighborhood-based model is invoked
comparatively rarely by the optimal sampler of the CMAB
(cf. Table 5, Figure 3, and upper part of Figure 4), one is
tempted to conclude that this model is inferior to the other
two. However, the low number of invocations can also be
explained by the limitations of the concrete study: the number
of users is small in total, the period of observation is short,
and the neighborhood-based model demands 5 observations
per neighbor, in order to start making predictions. Hence, this
model is available less often than the other two models. This
means that if the population of users were larger and more
EMA were available for some of them, then the aforementioned
finding might be reversed. There are indeed indications in that
direction: our earlier analyses on users of EMA-based mHealth
apps for tinnitus (Unnikrishnan et al., 2019, 2020a, 2021) and
TABLE 7 | Combined average error for each item across all the time steps for the
complete bandit using the optimal and Thompson sampling strategies, compared to that of its
arms.
Entity-Id Group History length Optimal TS Global EC NN
25 B 12 11.72 13.75 13.01 26.66 14.73
19 A 20 11.21 12.87 12.89 16.52 12.54
40 A 34 13.22 15.59 17.07 14.46 15.45
30 A 128 5.45 6.61 6.84 6.18 5.75
29 B 322 8.68 10.18 10.38 10.38 9.07
22 A 490 8.91 10.32 10.47 10.95 10.17
The users shown are chosen at random to have a mix of varying history lengths.
diabetes (Unnikrishnan et al., 2020b) demonstrate that it is
possible to exploit the data of users who deliver many EMA
in order to make high-quality predictions for users who deliver
few EMA (or are at the beginning of their interaction with
the app). Nonetheless, choosing appropriate data to inform
a neighborhood-based predictor is challenging (Unnikrishnan
et al., 2021), not least because dependencies between past and
current recordings do not generalize for the whole population
of users. More research is needed to find ways of exploiting
information on similar users for recommendations, when the
available data are very sparse.
5.3. Experimentation Perspective and
Insights for Tinnitus Research
Our investigation on the relative performance of global and
local models is based on an experimental study. Hence,
some of our findings are of relevance to the design of
experiments involving mHealth app users, at least in the
context of tinnitus research and of design of mHealth apps for
tinnitus users.
First of all, there are differences in the predictability of
different EMA items: for the prediction of EMA item S07,
the optimal sampler selects the global model more often than
the entity-centric one; for item S02, the entity-centric model
is equally often selected for group A and more often selected
for group B. This means that the global model, which is
wholly insensitive to which user has contributed which EMA
answer, delivers the best prediction (and is thus selected by the
optimal sampler) as often as does the entity-centric model of
that user.
Further, there are EMA items that were answered more
often than others. More research is needed to shed light on
the (un)popularity of some EMA items and to investigate
whether some items can be consistently predicted from others;
in our earlier work, we provide some indication to this
end, by identifying Granger causalities among EMA items
answered by TrackYourTinnitus users (Jamaludeen et al.,
2021).
The findings on the answers to the EMA item S01 were
remarkable. While the majority of users consistently answered
that they perceived tinnitus every time they were asked,
some users consistently answered that they do not, and
some of the latter consistently answered that they do after
they entered the second phase of the experiment. Since the
number of users in this study was small, we do not attempt
any conclusion from this finding. However, the presence of
4 out of 21 users with such different behavior with respect
to this item shows that further research is needed, and in
a larger pool of users, to better understand the role of
this item.
The tips considered in the mHealth app TinnitusTipps were
not personalized toward users, nor aligned to a user’s previous
answers to specific items. Hence, the effect of the tips on
the answers to the EMA items cannot be assessed. However,
group A was exposed to tips from the beginning of the
study, while group B was exposed only after the first two
months. More analysis is needed to understand whether this
may have resulted in the observed differences between group
A and group B with respect to the number of contributed
EMA recordings.
The optimal sampler of our CMAB demonstrated that the
choice among models is influenced by the group (A vs. B). The
difference between the two groups was that group B delivered
more EMA recordings from the beginning. Since the number
of recordings influences the quality of the entity-centric predictor
and of the neighborhood-based predictor, and (as a matter of
fact) the quality of any multi-level model, it is advisable to
quantify interaction intensity (cf. Schleicher et al., 2020) and to
incorporate it into the learning model.
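As a minimal sketch of this recommendation, interaction intensity could be measured as the number of EMA recordings per day within an observation window and appended to the context vector that the learner receives. The per-day definition, the function names `interaction_intensity` and `extend_context`, and the toy data below are illustrative assumptions; they are not the measure of Schleicher et al. (2020) and not part of the implementation evaluated in this study.

```python
from datetime import date


def interaction_intensity(recording_days, window_start, window_end):
    """Recordings per day within [window_start, window_end], inclusive.

    Assumed definition for illustration only.
    """
    n_days = (window_end - window_start).days + 1
    if n_days <= 0:
        raise ValueError("window_end must not precede window_start")
    n_recordings = sum(window_start <= d <= window_end for d in recording_days)
    return n_recordings / n_days


def extend_context(context, intensity):
    """Return a new context vector with the intensity as its last feature."""
    return list(context) + [intensity]


# Hypothetical user with four recordings, three of them inside the window.
days = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 9)]
intensity = interaction_intensity(days, date(2020, 1, 1), date(2020, 1, 4))

# Hypothetical two-feature context, extended by the intensity feature.
context = extend_context([1.0, 0.0], intensity)
```

With such a feature in the context, a learner could in principle distinguish users who record often from those who record rarely, instead of relying on group membership alone.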
DATA AVAILABILITY STATEMENT
The data are part of the TinnitusTipps app developed by Sivantos
and can be shared upon request to the authors. Requests should
be directed to WS: [email protected].
ETHICS STATEMENT
The studies involving human participants were reviewed and
approved by the Ethics Committee of the University of Regensburg
(17-544-101). The patients/participants provided their written
informed consent to participate in this study.
AUTHOR CONTRIBUTIONS
SS designed, implemented, and evaluated the technical
components under the supervision of MS, building upon
and extending prior results of VU. MS created a conceptual
model of the whole approach. VU contributed ideas and
insights from previous studies on the same data. RP, RH, and
WS designed the mHealth app. RK and JS implemented it under
the guidance of RP and provided data and instructions on data
usage. WS led the design of the two-armed mHealth study that
resulted in the data set used in this investigation. RH delivered
comments and feedback on the purpose of the app components
and RP on the architecture and functionalities. SS and MS wrote
the paper together. All other authors contributed comments
and feedback.
FUNDING
This work was partially funded by the CHRODIS PLUS Joint
Action, which has received funding from the European Union,
in the framework of the Health Programme (2014-2020), Grant
Agreement 761307 “Implementing good practices for chronic
diseases.” This work was partially inspired by the European
Union’s Horizon 2020 Research and Innovation Programme,
Grant Agreement 848261 “Unification of treatments and
Interventions for Tinnitus patients” (UNITI). The development
of the TinnitusTipps mHealth app was partially financed by
Sivantos GmbH–WS Audiology. The funder was not involved in
the study design, collection, analysis, interpretation of data, the
writing of this article, or the decision to submit it for publication.
ACKNOWLEDGMENTS
We would like to thank all reviewers for their comments,
which led to a much broader perspective of the paper and to
additional contributions. We particularly thank the reviewers for
their suggestion to investigate the role of EMA item S01; this
suggestion led to a remarkable result. Although we could not
draw conclusions from it, we believe that the publication itself
will lead to further investigation of the issue we identified.
SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at:
https://www.frontiersin.org/articles/10.3389/fnins.2022.836834/full#supplementary-material
REFERENCES
Bellogín, A., and Sánchez, P. (2017). “Revisiting neighbourhood-based
recommenders for temporal scenarios,” in RecTemp@RecSys (Madrid),
40–44.
Bifet, A., and Gavaldá, R. (2007). “Learning from time-changing data with adaptive
windowing,” in Proc. of the 2007 SIAM Int. Conf. on Data Mining (SDM’07),
vol. 7 (Minneapolis, MN), 443–448.
Cederroth, C. R., Gallus, S., Hall, D. A., Kleinjung, T., Langguth, B.,
Maruotti, A., et al. (2019). Towards an understanding of tinnitus
heterogeneity. Front. Aging Neurosci. 11, 53. doi: 10.3389/fnagi.2019.00053
De Ridder, D., Schlee, W., Vanneste, S., Londero, A., Weisz, N., Kleinjung, T., et al.
(2021). “Tinnitus and tinnitus disorder: theoretical and operational definitions
(an international multidisciplinary proposal),” in Progress in Brain Research
(Amsterdam: Elsevier BV), 1–25.
Gomes, H. M., Bifet, A., Read, J., Barddal, J. P., Enembreck, F., Pfahringer, B., et al.
(2017). Adaptive random forests for evolving data stream classification. Mach.
Learn. 106, 1469–1495. doi: 10.1007/s10994-017-5642-8
Hermans, H. J. (1988). On the integration of nomothetic and
idiographic research methods in the study of personal meaning.
J. Personality 56, 785–812. doi: 10.1111/j.1467-6494.1988.tb00477.x
Hulten, G., Spencer, L., and Domingos, P. (2001). “Mining time-changing data
streams,” in Proc. of the 7th ACM SIGKDD Int. Conf. on Knowledge Discovery
and Data Mining (KDD ’01) (New York, NY: ACM), 97–106.
Jamaludeen, N., Unnikrishnan, V., Pryss, R., Schobel, J., Schlee, W., and
Spiliopoulou, M. (2021). “Circadian conditional granger causalities on
ecological momentary assessment data from an mhealth app,” in 2021 IEEE 34th
International Symposium on Computer-Based Medical Systems (CBMS) (Aveiro:
IEEE), 354–359.
Kraft, R., Stach, M., Reichert, M., Schlee, W., Probst, T., Langguth, B., et al. (2020).
Comprehensive insights into the trackyourtinnitus database. Procedia Comput.
Sci. 175, 28–35. doi: 10.1016/j.procs.2020.07.008
Langguth, B., Goodey, R., Azevedo, A., Bjorne, A., Cacace, A., Crocetti, A., et al.
(2007). Consensus for tinnitus patient assessment and treatment outcome
measurement: tinnitus research initiative meeting, regensburg, july 2006. Progr.
Brain Res. 166, 525–536. doi: 10.1016/S0079-6123(07)66050-6
Mao, X., Zhao, X., and Liu, Y. (2020). mhealth app recommendation based on
the prediction of suitable behavior change techniques. Decis. Support Syst. 132,
113248. doi: 10.1016/j.dss.2020.113248
Martínez-Pérez, B., de la Torre-Díez, I., López-Coronado, M., Sainz-de Abajo, B.,
Robles, M., and García-Gómez, J. M. (2014). Mobile clinical decision support
systems and applications: a literature and commercial review. J. Med. Syst. 38,
1–10. doi: 10.1007/s10916-013-0004-y
Mehdi, M., Dode, A., Pryss, R., Schlee, W., Reichert, M., and Hauck, F. J.
(2020). Contemporary review of smartphone apps for tinnitus management
and treatment. Brain Sci. 10, 867. doi: 10.3390/brainsci10110867
Pérez-Gandía, C., García-Sáez, G., Subías, D., Rodríguez-Herrero, A., Gómez, E. J.,
Rigla, M., et al. (2018). Decision support in diabetes care: the challenge of
supporting patients in their daily living using a mobile glucose predictor. J.
Diabetes Sci. Technol. 12, 243–250. doi: 10.1177/1932296818761457
Prakash, S., Unnikrishnan, V., Pryss, R., Kraft, R., Schobel, J., Hannemann, R., et al.
(2021). Interactive system for similarity-based inspection and assessment of the
well-being of mhealth users. Entropy 23, 1695. doi: 10.3390/e23121695
Probst, T., Pryss, R. C., Langguth, B., Rauschecker, J. P., Schobel, J., Reichert,
M., et al. (2017). Does tinnitus depend on time-of-day? an ecological
momentary assessment study with the TrackYourTinnitus application. Front.
Aging Neurosci. 9, 253. doi: 10.3389/fnagi.2017.00253
Pryss, R., Langguth, B., Probst, T., Schlee, W., Spiliopoulou, M., and Reichert,
M. (2021). Smart mobile data collection in the context of neuroscience. Front.
Neurosci. 15, 618. doi: 10.3389/fnins.2021.698597
Pryss, R., Probst, T., Schlee, W., Schobel, J., Langguth, B., Neff, P., et al. (2019).
Prospective crowdsensing versus retrospective ratings of tinnitus variability and
tinnitus–stress associations based on the trackyourtinnitus mobile platform.
Int. J. Data Sci. Anal. (JDSA) 8, 327–338. doi: 10.1007/s41060-018-0111-4
Ricci, F., Rokach, L., and Shapira, B. (2010). Recommender Systems Handbook. New
York, NY: Springer. doi: 10.1007/978-0-387-85820-3_1
Schlee, W., Neff, P., Simoes, J., Langguth, B., Schoisswohl, S., Steinberger, H., et al.
(2022). Smartphone-guided educational counseling and self-help for chronic
tinnitus. Preprints (2022) 2022010469. doi: 10.20944/preprints202201.0469.v1
Schleicher, M., Unnikrishnan, V., Neff, P., Simões, J., Probst, T., Schlee, W., et al.
(2020). Understanding adherence to the recording of ecological momentary
assessments in the example of tinnitus monitoring. Sci. Rep. 10, 1–13.
doi: 10.1038/s41598-020-79527-0
Shahania, S., Unnikrishnan, V., Pryss, R., Kraft, R., Schobel, J., Hannemann, R.,
et al. (2021). “User-centric vs whole-stream learning for EMA prediction,”
in Proceedings of the IEEE Symposium on Computer Based Medical Systems
(CBMS’2021) (Aveiro: IEEE).
Unnikrishnan, V., Beyer, C., Matuszyk, P., Niemann, U., Schlee, W., Ntoutsi, E.,
et al. (2019). Entity-level stream classification: exploiting entity similarity to
label the future observations referring to an entity. Int. J. Data Sci. Anal. 9, 1–15.
doi: 10.1007/s41060-019-00177-1
Unnikrishnan, V., Schleicher, M., Shah, Y., Jamaludeen, N., Schobel, J.,
Kraft, R., et al. (2020a). The effect of non-personalised tips on the
continued use of self-monitoring mhealth applications. Brain Sci. 10, 924.
doi: 10.3390/brainsci10120924
Unnikrishnan, V., Shah, Y., Schleicher, M., Fernández-Viadero, C., Strandzheva,
M., Velikova, D., et al. (2021). “Love thy neighbours: a framework for
error-driven discovery of useful neighbourhoods for one-step forecasts on ema
data,” in 2021 IEEE 34th International Symposium on Computer-Based Medical
Systems (CBMS) (Aveiro: IEEE), 295–300.
Unnikrishnan, V., Shah, Y., Schleicher, M., Strandzheva, M., Dimitrov, P.,
Velikova, D., et al. (2020b). “Predicting the health condition of mhealth
app users with large differences in the number of recorded observations
- where to learn from?” in Int. Conf. on Discovery Science (Thessaloniki),
659–673.
Vogel, C., Schobel, J., Schlee, W., Engelke, M., and Pryss, R. (2021).
“Uniti mobile emi-apps for a large-scale european study on
tinnitus,” in 2021 43rd Annual International Conference of the IEEE
Engineering in Medicine & Biology Society (EMBC) (Mexico: IEEE),
2358–2362.
Watson, H. A., Tribe, R. M., and Shennan, A. H. (2019). The role of medical
smartphone apps in clinical decision-support: a literature review. Artif. Intell.
Med. 100, 101707. doi: 10.1016/j.artmed.2019.101707
Conflict of Interest: RH was employed by Sivantos GmbH–WS Audiology
during the development of the mHealth app software. In this article, he provided
comments and feedback on the description of the purpose of the app.
The remaining authors declare that the research was conducted in the absence of
any commercial or financial relationships that could be construed as a potential
conflict of interest.
Publisher’s Note: All claims expressed in this article are solely those of the authors
and do not necessarily represent those of their affiliated organizations, or those of
the publisher, the editors and the reviewers. Any product that may be evaluated in
this article, or claim that may be made by its manufacturer, is not guaranteed or
endorsed by the publisher.
Copyright © 2022 Shahania, Unnikrishnan, Pryss, Kraft, Schobel, Hannemann,
Schlee and Spiliopoulou. This is an open-access article distributed under the terms
of the Creative Commons Attribution License (CC BY). The use, distribution or
reproduction in other forums is permitted, provided the original author(s) and the
copyright owner(s) are credited and that the original publication in this journal
is cited, in accordance with accepted academic practice. No use, distribution or
reproduction is permitted which does not comply with these terms.