scieee Science in your language
[en] (orig)
Pervasive and Mobile Computing 103 (2024) 101951
Available online 28 May 2024
1574-1192/Β© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
Contents lists available at ScienceDirect
Pervasive and Mobile Computing
journal homepage: www.elsevier.com/locate/pmc
On data minimization and anonymity in pervasive mobile-to-mobile
recommender systems
Tobias Eichingerβˆ—, Axel KΓΌpper
Technische UniversitΓ€t Berlin, Service-centric Networking (SNET), Ernst-Reuter-Platz 7, Berlin, 10587, Germany
ARTICLE INFO
Dataset link: https://github.com/TEichinger/d
ec-cf-sim
Keywords:
Data minimization
Anonymity
Decentralized recommender system
Distributed gradient descent
Pervasive computing
Mobile computing
ABSTRACT
Data minimization is a legal principle that mandates limiting the collection of personal data
to a necessary minimum. In this context, we address ourselves to pervasive mobile-to-mobile
recommender systems in which users establish ad hoc wireless connections between their mobile
computing devices in physical proximity to exchange ratings that represent personal data on
which they calculate recommendations. The specific problem is: How can users minimize the
collection of ratings over all users while only being able to communicate with a subset of
other users in physical proximity? A main difficulty is the mobility of users, which prevents,
for instance, the creation and use of an overlay network to coordinate data collection. Users,
therefore, have to decide whether to exchange ratings and how many when an ad hoc
wireless connection is established. We model the randomness of these connections and apply
an algorithm based on distributed gradient descent to solve the distributed data minimization
problem at hand. We show that the algorithm robustly produces the least amount of connections
and also the least amount of collected ratings compared to an array of baselines. We find that
this simultaneously reduces the chances of an attacker relating users to ratings. In this sense, the
algorithm also preserves the anonymity of users, yet only of those users who do not establish
an ad hoc wireless connection with each other. Users who do establish a connection with each
other are trivially not anonymous toward each other. We find that users can further minimize
data collection and preserve their anonymity if they aggregate multiple ratings on the same
item into a single rating and change their identifiers between connections.
1. Introduction
Privacy is traditionally associated with the autonomy and freedom of individuals [1,2]. It forms the basis of a free so-
ciety and is, therefore, widely considered worthy of protection. A popular way to protect privacy is by law, where modern
data protection laws around the globe such as the European Union’s General Data Protection Regulation (GDPR), California’s
California Privacy Rights Act (CPRA), South Africa’s Personal Information Act (POPIA), or Brazil’s Lei Geral de ProteΓ§Γ£o de Dados
(LGPD) unanimously emphasize the protection of personal data. Across all data protection laws, data minimization is a central data
protection principle that mandates the limitation of collection and processing of personal data.
One way to apply data minimization is to mandate purpose limitation. Purpose limitation means that when personal data are
collected, the purposes of their collection must be specified and the collected data may only be processed to fulfill these purposes.
Purpose limitation imposes a burden of proof on data controllers, that is those who collect and process personal data, and data subjects,
that is, those individuals to whom the data relate. Data controllers must make it clear to their data subjects for which purposes they
βˆ—Corresponding author.
E-mail address: [emailΒ protected] (T. Eichinger).
https://doi.org/10.1016/j.pmcj.2024.101951
Received 1 September 2023; Received in revised form 5 May 2024; Accepted 25 May 2024
Pervasive and Mobile Computing 103 (2024) 101951
2
T. Eichinger and A. KΓΌpper
Fig. 1. Visualization of Shanmugam et al.’s [4] data minimization paradigm, based on the three phase model of Hestness et al. [16]. Data minimization is
characterized as a stopping problem, where data collection is to be stopped beyond the sufficiency point and at best at the saturation point. We adapt the
denomination of the three phases and the two points that separate them to the data minimization use case.
collect personal data and that the personal data that they collect is actually necessary to fulfill these purposes. We see that the tacit
assumption behind the principle of purpose limitation is that it is possible to decide whether processing a piece of personal data is
necessary for the fulfillment of a purpose.
In many application scenarios, it is possible to understand whether the collection and processing of personal data is necessary
to fulfill a purpose. It is, for instance, clear that it is necessary for an online shop to collect a customer’s address for the purpose
of delivery. In some application scenarios, however, it is not straightforward to decide whether the collection of personal data is
necessary for the fulfillment of that purpose. One such purpose is the provision of personalized content on, for instance, social media
platforms or e-commerce websites, where the selection of personalized content is performed by recommender systems. In the context
of personalization in recommender systems, it is not straightforward to decide whether the collection and processing of personal
data is necessary.
Biega et al. [3] outline that the main difficulty in applying purpose limitation in recommender systems is that it is typically
not possible to decide for a particular piece of personal data whether it is necessary to be collected. They, therefore, propose to
collect any personal data that can be considered relevant for the purpose of personalization until reaching a performance threshold.
Shanmugam et al. [4] argue that the use of an absolute performance threshold may cause a system to collect personal data while
only marginally increasing performance. While performance generally increases as the use of personal data increases, performance
gains decrease [3,5–7]. They propose to collect personal data until the performance gains, rather than the performance itself, reach
a threshold. Fig. 1 illustrates this data minimization paradigm.
In our own previous work, we study the data minimization problem as per Fig. 1 in decentralized recommender systems [8]. In
decentralized recommender systems, we are faced with multiple distinct instances of the data minimization problem, one for each
user in the system. In this paper, we specifically recapitulate our findings in the context of a pervasive mobile-to-mobile recommender
system in which users collect personal data through pairwise data exchanges in physical proximity (see for instance [9–15]).
Limitation of data exchanges between users to physical proximity is in itself a data minimization method, as every user only has
access to the personal data of a subset of users in the system. This limitation makes it, however, easier to identify the user to
whom personal data relate. We thus discuss the link between data minimization and anonymity in general and study the effects of
(A) users changing their identifiers and (B) aggregating their own personal data with that of others in particular. We begin with a
recapitulation of our distributed data minimization method.
2. Methods and material
The core methods we describe in this section have already been presented in our previous publication [8]. The source code that
implements both previously proposed methods and methods that we propose in this paper is publicly available [17]. It allows us
to reproduce all our results. We first introduce some basic conventions and notations that are subject to the application domain
of recommender systems. We then formally describe the pairwise connections that users establish in physical proximity. For each
pairwise connection that users establish, we formally describe the way in which users exchange personal data. On this basis, we
formulate data minimization as a distributed optimization problem and present an algorithm based on distributed gradient descent
to solve it. We begin with some conventions and notations.
Pervasive and Mobile Computing 103 (2024) 101951
3
T. Eichinger and A. KΓΌpper
Fig. 2. Example of three users 𝑒, 𝑣, 𝑀 βˆˆπ‘ˆexchanging rating data on four items 𝑖1, 𝑖2, 𝑖3, 𝑖4∈𝐼in two consecutive and pairwise data exchanges 𝑒↔𝑀and 𝑒↔𝑣.
The amount of data collected overall hinges on (a) the number of data exchanges and (b) the number of ratings exchanged in a data exchange. 𝑅denotes the
user-item matrix and 𝑅𝑒, 𝑅𝑣, 𝑅𝑀individual collected datasets.
2.1. Conventions and notations
We assume that users derive recommendations on their mobile phones on the basis of the data they collect from nearby other
users. We assume, more specifically, that all users make use of a collaborative filtering algorithm to derive recommendations.
Collaborative filtering generally denotes a class of recommendation algorithms [18]. These algorithms take ratings as input and
yield predicted ratings as output. Predicted ratings are estimates of how a user would rate an item they have not seen yet. Based on
these predicted ratings, recommendations can be displayed as a list of items with the highest predicted ratings. These assumptions
do not represent a limitation. Our methods can readily be applied to data types other than ratings and algorithms other than
collaborative filtering algorithms, provided that the performance of recommendations can be measured and performance differences
can be characterized as zero, marginal, or large. In a system with real users, one could thus regularly ask users to grade the quality of
recommendations or track their behavior within the system to measure the quality of recommendations and characterize performance
differences.
We measure performance with respect to the discrepancy between users’ actual ratings and the predicted ratings output by a
collaborative filtering algorithm to derive recommendations for them. We denote by π‘Ÿπ‘’,𝑖 the rating by some user π‘’βˆˆπ‘ˆon some
item π‘–βˆˆπΌ, where π‘ˆand 𝐼denote sets of users and items, respectively. We use the hat notation ξš‚π‘Ÿπ‘’,𝑖 to denote predicted ratings.
We denote by 𝑛user and 𝑛item the number of users and items in π‘ˆand 𝐼, respectively. We write all ratings π‘Ÿπ‘’,𝑖 into the 𝑛user ×𝑛item
user-item matrix 𝑅and denote by |𝑅|the number of ratings in 𝑅. We call a row π‘Ÿπ‘’in the user-item matrix 𝑅, corresponding to the
ratings by some user π‘’βˆˆπ‘ˆ, the profile of user 𝑒. We use the bar notation π‘Ÿπ‘’to denote user 𝑒’s mean rating.
We denote by sim(𝑒, 𝑣) ∈ [0,1] the cosine similarity between the profiles of some two users 𝑒, 𝑣 βˆˆπ‘ˆdefined by sim(𝑒, 𝑣) = <π‘Ÿπ‘’,π‘Ÿπ‘£>
β€–π‘Ÿπ‘’β€–β‹…β€–π‘Ÿπ‘£β€–,
where missing ratings are substituted with zeros, <β‹…,β‹…>denotes the dot product, β€–β‹…β€–the Euclidean norm, and π‘Ÿπ‘’, π‘Ÿπ‘£the profiles
of some users 𝑒and 𝑣respectively. The cosine similarity is widely used to describe two users’ similarity with respect to their item
preferences. If, for instance, two users have not rated a single item in common, then their cosine similarity is zero. Cosine similarity
is positive if users have rated at least on item in common, and is larger the closer their ratings on those commonly rated items align.
In decentralized collaborative filtering systems, every user only has their own profile to begin with and thus needs to collect
ratings from nearby other users. We denote by π‘Ÿπ‘’β†’π‘£the payload that user 𝑒sends to 𝑣. Payloads are sets of previously collected
profiles, which may include the sender’s own profile. We denote by 𝑅𝑒the collected dataset of user 𝑒, which consists of 𝑒’s own profile
and all payloads that 𝑒collected. We more generally denote by the union of all users’ collected datasets {𝑅𝑒}π‘’βˆˆπ‘ˆadata distribution
and define |{𝑅𝑒}π‘’βˆˆπ‘ˆ|=βˆ‘π‘’βˆˆπ‘ˆ|𝑅𝑒|as the number of ratings in it.
We denote by an epoch 𝑑a time interval of fixed length during which the users in π‘ˆestablish connections. We assume that all
connections that are established within the same epoch are independent of each other. If, for instance, a user 𝑒establishes two
connections to two users 𝑣and 𝑀in epoch 𝑑respectively, we assume that 𝑒cannot send ratings to 𝑀that 𝑒receives from 𝑣in
epoch 𝑑, and vice versa. In other words, we assume that all connections within the same epoch happen synchronously, although, in
practice, this will not be the case in a pervasive mobile-to-mobile recommender system. All connections that are established during
an epoch are dissolved at the end of the epoch.
Fig. 2 shows an example in which three users 𝑒, 𝑣, 𝑀 βˆˆπ‘ˆcollect ratings on four items 𝑖1, 𝑖2, 𝑖3, 𝑖4∈𝐼in two consecutive and
pairwise data exchanges. Initially (𝑑= 0), users only have their own profile, where profiles are indicated as row vectors. We see that
the row vectors of the user-item matrix 𝑅are split over all users. Over time, users collect payloads that hold each other’s ratings.
Eventually (𝑑= 2), users 𝑒and 𝑣each hold a full copy of the user-item matrix 𝑅, modulo row permutation, and 𝑀holds two out of
three rows of 𝑅.
Pervasive and Mobile Computing 103 (2024) 101951
4
T. Eichinger and A. KΓΌpper
We denote by 𝑇𝑒the number of epochs a user π‘’βˆˆπ‘ˆseeks to exchange rating data with other users. We call 𝑇𝑒the time horizon
of user 𝑒. With respect to the example shown in Fig. 2, users 𝑒and 𝑣have time horizons 𝑇𝑒=𝑇𝑣= 2, while user 𝑀has a time
horizon of 𝑇𝑀= 1. If all time horizons 𝑇𝑒coincide, we call the common value 𝑇the global time horizon.
From the perspective of a user, data minimization presents itself as a collection minimization problem as in traditional centralized
systems. With reference to Fig. 1, each user collects ratings in the user-item matrix 𝑅until reaching their individual saturation point,
that is, the point beyond which additional collection only yields marginal performance gains. From the perspective of the system,
however, data minimization presents itself as a redundancy minimization problem. The user-item matrix 𝑅is to be distributed over
all users with minimum redundancy such that every user has reached at least their individual sufficiency point and, at best, their
individual saturation point. Since users can only exchange data in physical proximity, the way in which data distributions form is
subject to the order in which connections between users form. This is what we describe in the following.
2.2. Connection behavior
Connectivity in pervasive mobile-to-mobile recommender systems, as in all wireless ad hoc networks, is characterized by node
mobility and intermittent connectivity. In view of intermittent connectivity, we assume for simplicity that no routing takes place. No
overlay networks are formed and communication is limited to users located within the transmission ranges of each other’s mobile
computing devices. With this assumption, we particularly omit a treatment of the hidden node problem. In view of node mobility,
we assume that pairwise connections between users are established randomly.
We describe connection behaviors as the random sampling of edges from a graph = (π‘ˆ, 𝐸), where π‘ˆdenotes the set of users
and 𝐸the set of edges between these users. Whenever two users share an edge, we assume that these two users are physically close
such that their mobile computing devices can establish an ad hoc wireless connection. We thus model node mobility in terms of the
changes in the graph . In the following, we describe the two connection behaviors that we consider in this paper.
2.2.1. Random connection behavior
We sample edges in two steps such that every user establishes a connection with 𝑛other users. In the first step, we generate an
ordering (𝑒1, 𝑒2, .., 𝑒𝑛)on the set of users π‘ˆ. We do so by 𝑛times uniformly randomly sampling a user from the user set π‘ˆwithout
replacement. In the second step, we iterate over this ordering. For 𝑖= 1,2,…, 𝑛, we define a subset of edges πΈπ‘–βŠ‚ 𝐸 that includes
the 𝑖th user 𝑒𝑖and excludes all other users π‘£βˆˆπ‘ˆthat have previously been sampled in an edge. Formally, we have
𝐸𝑖= {𝑒= {𝑒, 𝑣} ∈ 𝐸|π‘’π‘–βˆˆπ‘’and 𝑒, 𝑣 βˆ‰π‘’π‘—for all 𝑗 < 𝑖 },(1)
where 𝑒𝑗denotes the edge sampled in the 𝑗th iteration. Note that the subset 𝐸𝑖can be empty (𝐸𝑖= βˆ…). This is particularly the case
if the number of users in π‘ˆis odd and all users except one form pairs.
We see that in random connection behavior, the graph is fully-connected and every user can potentially establish a connection
with any other user. Any pair of users has the same probability of establishing a connection. The following connection behavior
assumes that not every pair of users can potentially establish a connection, yet only similar pairs of users.
2.2.2. Similarity-based connection behavior
Similarity-based connection behavior takes into consideration the cosine similarity between users’s profiles. Two users 𝑒and
𝑣share a graph in the graph from which we sample edges if their cosine similarity sim(𝑒, 𝑣)is positive. Recall that the cosine
similarity is only positive if two users have rated at least one item in common. It is widely known in the recommender systems
literature that, typically, only a fraction of user pairs has rated at least one item in common. As a consequence, when we apply the
same edge sampling procedure as for random connection behavior, we see that with similarity-based connection behavior, every
user can only establish connections with a subset of other users.
Similarity-based connection behavior takes into consideration that users are more likely to be in physical proximity to each other
if they are similar. In this case, we measure similarity in terms of the cosine similarity, that is, the similarity between two users’
ratings. Its underlying idea more generally extends to the idea behind homophilous search (see for instance [19]). Now that we
have described the way in which connections form, we describe the communication that happens between pairs of connected users.
2.3. Data exchange protocol
The Data Exchange Protocol (see Algorithm 1) formalizes the way in which a user sends ratings in a payload to a connected other
user. While both users of a connection communicate according to the Data Exchange Protocol, we only show one communication
direction, that is, communication from some user 𝑒to some user 𝑣, and omit the other since both directions are identical. We
assume that all users are benevolent in the sense that they follow the Data Exchange Protocol exactly and without enforcement, since
otherwise, users would not seek to engage with other users in the first place. The Data Exchange Protocol features the following
two consecutive phases.
(𝑖)Similarity Calculation Phase: It is a widely adopted assumption in the literature on decentralized collaborative filtering systems
that two users who are similar with respect to their profiles hold rating data relevant to each other. Therefore, user 𝑒sends the
profile π‘Ÿπ‘’to 𝑣and receives the profile π‘Ÿπ‘£from user 𝑣. Upon reception, user 𝑒calculates the cosine similarity sim(𝑒, 𝑣). Only if the
cosine similarity exceeds 𝑒’s similarity threshold π‘‘π‘’βˆˆ [0,1],𝑒decides to forward a payload in the Forwarding Phase.
Pervasive and Mobile Computing 103 (2024) 101951
5
T. Eichinger and A. KΓΌpper
Algorithm 1: Data Exchange Protocol
Given: sender 𝑒and receiver 𝑣
Parameters: forwarding parameter πœƒπ‘’, payload parameter 𝑛payload
1Procedure (𝑖)SimilarityCalculationPhase (πœƒ):
2send(π‘Ÿπ‘’);wait(π‘Ÿπ‘£);
/* 𝑒calculates the cosine similarity between 𝑒’s profile π‘Ÿπ‘’and 𝑣’s profile π‘Ÿπ‘£*/
3similarity = sim(π‘Ÿπ‘’,π‘Ÿπ‘£);
/* 𝑒calculates similarity threshold 𝑑𝑒*/
4π‘‘π‘’β†πΉβˆ’1
𝑒(1 βˆ’ πœƒπ‘’);
5end
6if similarity > 𝑑𝑒then
7Procedure (𝑖𝑖)PayloadForwardingPhase (𝑛payload):
/* 𝑒selects 𝑛payload profiles/aggregated payloads that have previously been collected in
the PayloadForwardingPhase and relate to the 𝑛payload users most similar to 𝑣*/
8π‘Ÿπ‘’β†’π‘£β†selectProfiles(𝑛payload, 𝑣);
/* [OPTIONAL] 𝑒aggregates multiple ratings on the same item into a single rating */
9π‘Ÿπ‘’β†’π‘£β†aggregatePayload(π‘Ÿπ‘’β†’π‘£);
10 send(uPayload);
11 end
12 end
13 wait(vPayload);
14 disconnect(𝑒, 𝑣);
We point out that it is convenient to represent 𝑒’s similarity threshold 𝑑𝑒as the a priori probability πœƒπ‘’βˆˆ [0,1] that the similarity
sim(𝑒, 𝑣)between user 𝑒and the a priori unknown user 𝑣exceeds 𝑒’s similarity threshold 𝑑𝑒. Here, we only describe the construction
of this probability for the special case that users establish connections with random connection behavior. A general construction
can be found in Appendix A. Assume the distribution of similarities between 𝑒and any other user is known to 𝑒, and its cumulative
distribution function 𝐹𝑒is continuous. Then, 1 βˆ’ πœƒπ‘’βˆΆ= 𝐹𝑒(𝑑𝑒)represents the portion of users that have a similarity lower than 𝑑𝑒to
𝑒, and πœƒπ‘’represents the portion of users that have a similarity higher than 𝑑𝑒to 𝑒. We call πœƒπ‘’the forwarding parameter of 𝑒. The
forwarding parameter πœƒπ‘’describes the probability that user 𝑒forwards a payload in the Forwarding Phase of a connection.
(𝑖𝑖)Payload Forwarding Phase: User 𝑒considers all previously collected profiles, including 𝑒’s own profile, for sending. Here, 𝑒
limits the amount of ratings sent to 𝑣in a payload by only selecting the 𝑛payload profiles collected from the users that are most similar
to 𝑣. We call 𝑛payload the payload parameter. Users further limit the number of ratings they send to 𝑣by aggregating the ratings in
the payload, which we describe in Section 2.4.
The Data Exchange Protocol formalizes the process through which users collect data. We can now formally represent an individual
user 𝑒’s data collection process as a finite sequence of parameter value 3-tuples
((𝑇(𝑑)
𝑒, πœƒ(𝑑)
𝑒))𝑑=1,2,…,π‘šπ‘’,(2)
where 𝑇(𝑑)
𝑒denotes the in epoch 𝑑projected end of 𝑒’s data collection process, πœƒ(𝑑)
𝑒denotes the probability that 𝑒sends a payload in a
connection in epoch 𝑑, and π‘šπ‘’the length of 𝑒’s data collection process. For the special case that all forwarding parameter values πœƒ(𝑑)
𝑒
coincide over all users 𝑒and are constant over all epochs 𝑑, we can roughly estimate that each user 𝑒collects about βˆ‘π‘šπ‘’
𝑑=1 πœƒ(𝑑)
𝑒payloads
irrespective of the underlying connection behavior. Each payload can be aggregated to decrease the number of ratings it holds, as
we describe in the following.
2.4. Payload aggregation
We call any method that aggregates multiple payload ratings on the same item into a single payload rating a payload aggregation
method. While the payload parameter 𝑛payload determines the number of profiles that a payload consists of, payload aggregation
summarizes the ratings in these 𝑛payload payloads such that every rating refers to only one distinct item. As such, payload aggregation
represents a basic method to minimize data collection. The gentle reader is referred to [20] for details on aggregation in the context
of recommender systems. We present in the following the two payload aggregation methods we study in this paper.
2.4.1. Rating by most similar user (MostSim)
We follow the payload aggregation method proposed by Shokri et al. [21]. Whenever there are multiple ratings on the same
item in a payload, the sender user chooses the rating in the profile that is most similar to the profile of the receiver user. We call
this payload aggregation method MostSim. We make an example to illustrate the MostSim payload aggregation method.
Pervasive and Mobile Computing 103 (2024) 101951
6
T. Eichinger and A. KΓΌpper
Example: With reference to the example in Fig. 2, we illustrate how user 𝑒would aggregate the payload π‘Ÿπ‘’β†’π‘£=[π‘Ÿπ‘’
π‘Ÿπ‘€]in epoch 𝑑= 2
before sending to user 𝑣with MostSim payload aggregation. Before user 𝑒aggregates the two profiles π‘Ÿπ‘’=[βˆ’ 3 5 βˆ’ ]and π‘Ÿπ‘€=[1 4 βˆ’ βˆ’ ],
𝑒first calculates the cosine similarities sim(𝑣, 𝑒)=0.78 and sim(𝑣, 𝑀)=0.09, and then aggregates them as follows:
MostSim ([π‘Ÿπ‘’
π‘Ÿπ‘€],[π‘ π‘–π‘š(𝑣, 𝑒)
π‘ π‘–π‘š(𝑣, 𝑀)])=MostSim ([βˆ’ 3 5 βˆ’
1 4 βˆ’ βˆ’],[0.78
0.09])=[135βˆ’].(3)
We have reduced the number of ratings in the payload π‘Ÿπ‘’β†’π‘£from four to three ratings.
The above example illustrates how MostSim payload aggregation reduces the number of ratings in a payload if there are multiple
ratings on the same item. The example illustrates further that aggregated payloads have the shape of a single profile. Indeed,
when we interpret aggregated payloads as profiles, it is clear that we can aggregate payloads of payloads. Note that in contrast to
non-aggregated payloads in which users can relate distinct profiles in a payload to distinct users, this is not possible for aggregated
payloads. Users, therefore, relate aggregated payloads to the users who forward them instead of the users whose profiles have been
aggregated (see also Line 8in Algorithm 1in this context). We continue with the second payload aggregation method we study.
2.4.2. Weighted averaging (WeighAvg)
Whenever there are multiple ratings on the same item in a payload, the sender user aggregates them by calculating weighted
average ratings, where weights represent cosine similarities between the profile a rating is in and the profile of the receiver user.
We call this payload aggregation method WeighAvg. We make an example to illustrate the WeighAvg payload aggregation method.
Example: With reference to the example in Fig. 2, we illustrate how user 𝑒would aggregate the payload π‘Ÿπ‘’β†’π‘£=[π‘Ÿπ‘’
π‘Ÿπ‘€]in
epoch 𝑑= 2 before sending to user 𝑣with WeighAvg payload aggregation. Before user 𝑒aggregates the two profiles π‘Ÿπ‘’=[βˆ’ 3 5 βˆ’ ]
and π‘Ÿπ‘€=[1 4 βˆ’ βˆ’ ],𝑒again calculates the cosine similarities sim(𝑣, 𝑒) = 0.78 and sim(𝑣, 𝑀) = 0.09 first and then aggregates them by
calculating the weighted arithmetic mean for each item as follows:
WeighAvg ([π‘Ÿπ‘’
π‘Ÿπ‘€],[π‘ π‘–π‘š(𝑣, 𝑒)
π‘ π‘–π‘š(𝑣, 𝑀)])=WeighAvg ([βˆ’ 3 5 βˆ’
1 4 βˆ’ βˆ’],[0.78
0.09])=[1 3.10 5 βˆ’].(4)
We have again reduced the number of ratings in the payload π‘Ÿπ‘’β†’π‘£from four to three ratings.
A comparison of WeighAvg and MostSim payload aggregation shows that the reduction in the number of ratings in a payload they
yield is identical. The difference between the two payload aggregation methods resides only in the cardinality of the ratings in the
aggregated payload they produce. This concludes the description of the payload aggregation methods we study. In the following,
we describe data minimization in decentralized recommender systems as a distributed optimization problem.
2.5. Distributed data minimization
We define the distributed data minimization problem as a combination of a local and a global minimization problem.
Definition 1 (Distributed Data Minimization).Let π‘ˆbe a set of users that exchange data via the Data Exchange Protocol.
Local Data Minimization: Let πœ€ > 0. Let further π‘’βˆˆπ‘ˆbe a fixed user with data collection process ((𝑇(𝑑)
𝑒, πœƒ(𝑑)
𝑒))𝑑=1,2,…,π‘šπ‘’associated
to the sequence (𝑅(𝑑)
𝑒)𝑑=1,2,…,π‘šπ‘’of collected datasets. Then user 𝑒minimizes the amount of collected data locally via
min
((𝑇(𝑑)
𝑒,πœƒ(𝑑)
𝑒))𝑑=1,2,…,π‘šπ‘’|𝑅(π‘šπ‘’)
𝑒|such that 𝑅(π‘šπ‘’)
𝑒is saturated with respect to πœ€. (5)
We call the integer π‘šπ‘’of a solution π‘…βˆ—
𝑒
(π‘šπ‘’)its saturation point.
Global Data Minimization: Let 𝑅(π‘šπ‘’)
𝑒denote the final collected dataset that a user π‘’βˆˆπ‘ˆcollects. Then the entirety of all users π‘ˆ
minimizes the amount of collected data globally via
min
{𝑅(π‘šπ‘’)
𝑒}π‘’βˆˆπ‘ˆ|{𝑅(π‘šπ‘’)
𝑒}π‘’βˆˆπ‘ˆ|such that all 𝑅𝑒are sufficient,(6)
where {𝑅(π‘šπ‘’)
𝑒}π‘’βˆˆπ‘ˆdenotes a data distribution. We call a solution {π‘…βˆ—
𝑒
(π‘šπ‘’)}π‘’βˆˆπ‘ˆto the global data minimization problem a
minimized data distribution. If additionally all collected datasets π‘…βˆ—
𝑒
(π‘šπ‘’)solve their respective local data minimization problem,
we call {π‘…βˆ—
𝑒
(π‘šπ‘’)}π‘’βˆˆπ‘ˆa minimum data distribution.
With respect to Fig. 1, we see that every user tries to solve their respective local data minimization problem by stopping data
collection at their respective saturation point. They not consider the amount of data that other users collect. Differently put, the
local data minimization problem puts each user into the shoes of a centralized recommendation provider who seeks to minimize the
amount of data they collect from other users as in a centralized recommender system. In other words, the local data minimization
problem is the data minimization problem in a centralized recommender system. In decentralized systems, we also have that all
users try to solve the global data minimization problem. A solution to the global data minimization requires that every user passes
their respective sufficiency point.
We see that the local and the global data minimization problems are competing problems and solving both problems simul-
taneously is in general not possible. A short explanation is that users’ sufficiency points and saturation points, in general, do not
coincide. A more detailed explanation can be found in Appendix B. A comparison of the computational complexity between the
data minimization problem in centralized and decentralized recommender systems can be found in Appendix C. It remains to define
saturation and sufficiency.
Pervasive and Mobile Computing 103 (2024) 101951
7
T. Eichinger and A. KΓΌpper
Definition 2 (Saturation).Let π‘’βˆˆπ‘ˆbe a user with data collection process ((𝑇(𝑑)
𝑒, πœƒ(𝑑)
𝑒))𝑑=1,2,…,π‘šπ‘’associated to the sequence
(𝑅(𝑑)
𝑒)𝑑=1,2,…,π‘šπ‘’of collected datasets. For a fixed rating prediction method and performance metric 𝜎, we denote by πœŽπ‘’(𝑅(𝑑)
𝑒)the
performance that user 𝑒measures for the collected dataset 𝑅(𝑑)
𝑒. We further denote by π›₯πœŽπ‘’(𝑅(𝑑)
𝑒, 𝑅(π‘‘βˆ’1)
𝑒)the performance difference
that user 𝑒measures between the collected datasets 𝑅(𝑑)
𝑒and 𝑅(π‘‘βˆ’1)
𝑒, that is between the end of epoch 𝑑and the end of epoch π‘‘βˆ’ 1.
Formally:
π›₯πœŽπ‘’(𝑅(𝑑)
𝑒, 𝑅(π‘‘βˆ’1)
𝑒) ∢= πœŽπ‘’(𝑅(𝑑)
𝑒) βˆ’ πœŽπ‘’(𝑅(π‘‘βˆ’1)
𝑒).(7)
We more briefly write π›₯𝜎(𝑑)
π‘’βˆΆ= π›₯πœŽπ‘’(𝑅(𝑑)
𝑒, 𝑅(π‘‘βˆ’1)
𝑒)if the underlying collected datasets 𝑅(𝑑)
𝑒and 𝑅(π‘‘βˆ’1)
𝑒are clear from the context.
In this setting, let πœ€ > 0. Then, if the performance difference π›₯𝜎(𝑑)
𝑒satisfies
|π›₯𝜎(𝑑)
𝑒|=|πœŽπ‘’(𝑅(𝑑)
𝑒) βˆ’ πœŽπ‘’(𝑅(π‘‘βˆ’1)
𝑒)|∈ (0, πœ€),(8)
for some collected dataset 𝑅(𝑑)
𝑒, we say that the performance difference π›₯𝜎(𝑑)
𝑒is marginal. In this case, we call 𝑅(𝑑)
𝑒an πœ€-saturated
collected dataset, or simply saturated collected dataset if πœ€is clear from the context.
Definition 3 (Sufficiency).Let π‘ˆand 𝐼be sets of users and items respectively and {𝑅𝑒}π‘’βˆˆπ‘ˆbe a data distribution. Let further πΌπ‘’βŠ‚ 𝐼
be the subset of items that a user π‘’βˆˆπ‘ˆis interested in and π‘…π‘’βˆˆ {𝑅𝑒}π‘’βˆˆπ‘ˆthat user’s collected dataset. We then call the collected
dataset 𝑅𝑒insufficient if none of the ratings in 𝑅𝑒refers to any of 𝑒’s items of interest in 𝐼𝑒, and else call it sufficient. More generally,
we call the data distribution {𝑅𝑒}π‘’βˆˆπ‘ˆsufficient if all its collected datasets π‘…π‘’βˆˆ {𝑅𝑒}π‘’βˆˆπ‘ˆare sufficient, and else call it insufficient.
Finally, we define the sufficiency coefficient ({𝑅𝑒}π‘’βˆˆπ‘ˆ)of a data distribution {𝑅𝑒}π‘’βˆˆπ‘ˆas the fraction of sufficient collected datasets
in the data distribution {𝑅𝑒}π‘’βˆˆπ‘ˆ:
({𝑅𝑒}π‘’βˆˆπ‘ˆ) = |{π‘…π‘’βˆˆ {𝑅𝑒}π‘’βˆˆπ‘ˆ|𝑅𝑒is sufficient}|
|{𝑅𝑒}π‘’βˆˆπ‘ˆ|∈ [0,1],(9)
where |{β‹…}|denotes the number of elements in the set {β‹…}. Clearly, a data distribution {𝑅𝑒}π‘’βˆˆπ‘ˆis sufficient if and only if its sufficiency
coefficient ({𝑅𝑒}π‘’βˆˆπ‘ˆ)equals 1.
We emphasize that whether a solution to the distributed data minimization problem exists, that is, a solution that minimizes
both the local and the global data minimization problem simultaneously, depends largely on the underlying connection behavior.
Details on when solutions to the individual problems exist with random connection behavior are given in Appendix D. To see that
a solution to neither the local nor the global problem may exist, consider the case that a user can only establish connections with
users to whom the user is not similar. In this case, the user neither forwards nor receives a payload. In the following, we describe
an algorithm that we find to solve the distributed data minimization problem with random and similarity-based data minimization.
2.6. Distributed gradient descent-based data minimization
We propose to apply Distributed Gradient Descent (DGD) data minimization as previously proposed in [8] for use in pervasive
mobile-to-mobile recommender systems. The algorithm makes it such that users coordinate their individual data collection processes
by communicating and adjusting their parameter values πœƒ(𝑑)
𝑒and 𝑇(𝑑)
𝑒to each other such that their collected datasets saturate
uniformly.
Definition 4 (DGD Data Minimization).Let π‘ˆbe a set of users that collect ratings via the Data Exchange Protocol. Let further π‘’βˆˆπ‘ˆ
be a user that establishes connections in some epoch 𝑑with the users in some subset of users 𝑉 βŠ‚ π‘ˆβˆ–{𝑒}, and the parameter variable
𝑝be either the time horizon 𝑇or the forwarding parameter πœƒ. User 𝑒then updates 𝑒’s local parameter value 𝑝𝑒at the end of epoch 𝑑
for use in epoch 𝑑+ 1 as follows:
π›₯𝑝(𝑑)
𝑒=𝑝(𝑑+1)
π‘’βˆ’π‘(𝑑)
𝑒
= βˆ’ π›Όπ‘β‹…βŽ§
βŽͺ
⎨
βŽͺ
⎩
1,if |π›₯𝜎(𝑑)
𝑒|= 0
1,if |π›₯𝜎(𝑑)
𝑒|∈ (0, πœ€)
0,if |π›₯𝜎(𝑑)
𝑒|∈ [πœ€, ∞)
⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
innovation term
+𝛽𝑝⋅1
|𝑉|βˆ‘
π‘£βˆˆπ‘‰
(𝑝(𝑑)
π‘£βˆ’π‘(𝑑)
𝑒)
⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
consensus term
+𝛾𝑝⋅{1,if 𝜎(𝑑)
𝑒undefined
0,else,
⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
regularization term
(10)
where 𝛼𝑝, 𝛽𝑝, 𝛾𝑝β‰₯0denote non-negative update weights specific to the parameter 𝑝,𝑉denotes the set of 𝑛connection users 𝑒establishes
a connection with in epoch 𝑑, and π›₯𝜎(𝑑)
𝑒denotes the performance differential of user 𝑒at the start and end of epoch 𝑑with respect to
some performance metric 𝜎. If a parameter update would cause a parameter value to fall below or exceed its respective boundary
interval π‘‡π‘’βˆˆ [0,∞) or πœƒπ‘’βˆˆ [0,1], the parameter value is instead set to its lower or upper bound respectively.
Termination: Data collection terminates if at least one of the following criteria are satisfied:
(I) All individual time horizons are smaller than the current epoch (max𝑒𝑇(𝑑)
𝑒< 𝑑).
(II) All individual forwarding parameters are zero (maxπ‘’πœƒπ‘’= 0).
Pervasive and Mobile Computing 103 (2024) 101951
8
T. Eichinger and A. KΓΌpper
Criterion (I) describes the situation that all users’ data collection processes have ended and no user seeks further connections.
Criterion (II) describes the situation that users do seek connections to other users, yet do not exchange payloads in any such
connection.
This concludes the presentation of the methods we use. In the following section, we establish a link between data minimization
and anonymity in the context of data minimization in general and distributed data minimization in pervasive mobile-to-mobile
recommender systems in particular.
3. Theory
The goal of this section is to establish a link between data minimization and anonymity. In order to do so, we follow in the lines
of Pfitzmann and Hansen’s [22] working document on the disambiguation of anonymity in the context of data minimization and
adapt their definitions to the context of pervasive mobile-to-mobile recommender systems. We begin with a general characterization
of data minimization.
3.1. A general characterization of data minimization
Data minimization as per Pfitzmann and Hansen [22] foresees data minimization as the following nested structure of minimiza-
tion objectives.
(1.) Minimize the possibility to collect personal data.
(2.) Minimize the collection of personal data.
(3.) Minimize the time that collected personal data are stored.
These three minimization objectives are nested in the sense that they are not to be minimized simultaneously, yet instead,
Objective (1.) first, only afterward Objective (2.), and finally Objective (3.). Since the three minimization objectives are formulated
for information systems in general, it is meaningful to discuss them in the context of pervasive mobile-to-mobile recommender
systems.
In a pervasive mobile-to-mobile recommender system, users establish ad hoc wireless connections in physical proximity to
exchange ratings with each other. In view of Objective (1.), we see that the possibility to collect personal data depends on the
mobility of users and their pairwise proximity to each other. Since we model the mobility of users as the random sampling of edges
from a graph, we can quantify the possibility to collect personal data in terms of the number of edges that are sampled, which
represents the number of connections that are established. Since we assume that every user establishes a single connection per
epoch, we address Objective (1.) by having every user 𝑒minimize their time horizon 𝑇𝑒, that is, the number of epochs in which 𝑒
establishes a connection with another user. On the basis of this minimized amount of connections, we then minimize the collection
of personal data in Objective (2.).
The collection of personal data hinges on the amount of ratings that are exchanged in each connection. The Data Exchange
Protocol foresees the exchange of ratings in two respects. First, in the Similarity Calculation Phase (𝑖), connected users exchange
their own respective profile ratings to calculate the cosine similarity. Only if the cosine similarity exceeds a similarity threshold,
users exchange the previously collected ratings of other users in the consecutive Payload Forwarding Phase (𝑖𝑖). Recall that the
forwarding parameter πœƒπ‘’quantifies the probability that a user 𝑒enters the Payload Forwarding Phase (𝑖𝑖) and sends a payload in
a connection. Recall further that the payload parameter 𝑛payload quantifies the number of profiles in that payload. We see that the
number of profiles forwarded in payloads is βˆ‘π‘’βˆˆπ‘ˆβˆ‘π‘‡π‘’
𝑑=1 πœƒπ‘’β‹…π‘›payload. Since we already minimized the time horizon 𝑇𝑒in Objective (1.)
and 𝑛payload is a constant, we address Objective (2.) by having every user 𝑒minimize their forwarding parameter πœƒπ‘’. On the basis
of this minimized amount of forwarded payloads, we finally minimize the time that collected personal data are stored.
The time that collected personal data are stored depends on the way in which recommendations are derived. If we assume
that each user trains a collaborative filtering model on their individually collected ratings on their mobile computing device, it
makes sense to store them until their data collection process terminates. After termination of the data collection process, a user
calculates recommendations on the collected data. After recommendations have been calculated, the collected ratings do not serve
their purpose anymore and can thus be deleted. With respect to the representation of individual users’ data collection processes as
in Eq. (2), we can reformulate the general Objectives (1.), (2.), and (3.) in the context of a pervasive mobile-to-mobile recommender
system for each user 𝑒as follows:
(1.’) Minimize the time horizon 𝑇𝑒.
(2.’) Minimize the forwarding parameter πœƒπ‘’.
(3.’) Delete all collected ratings after epoch 𝑇𝑒and after you have calculated recommendations.
We now see perhaps more clearly that, from the perspective of a single user, the distributed data minimization problem represents
a minimization problem on the parameter space spanned by the time horizon 𝑇𝑒and forwarding parameter πœƒπ‘’. From the perspective
of the system as a whole, it represents a minimization problem over the Cartesian product over all these parameter spaces.
So far, we only considered the minimization of ratings irrespective of whether they are the ratings by the user who forwards
them or those of a user who is not involved in a connection. We make this distinction in the following section in the context of a
discussion on anonymity.
Pervasive and Mobile Computing 103 (2024) 101951
9
T. Eichinger and A. KΓΌpper
3.2. Anonymity in terms of unlinkability
The basis for any discussion on anonymity requires the definition of the information of interest that is to be protected. In
the context of a pervasive mobile-to-mobile recommender system as described throughout Section 2, we consider two types of
information of interest, namely identifiers and ratings. Since we wish to characterize anonymity in terms of linkability, we need to
define linkability first. In order to do so, we begin by adapting the definition of linkability as per Pfitzmann and Hansen [22] to
identifiers and ratings.
Definition 5 (Linkability).Linkability of two or more identifiers and/or ratings from an attacker’s perspective means that the attacker
can sufficiently distinguish whether they are related or not.
In the context of the above definition, we may think of attackers as other users in the system or by-standers who are present
in physical proximity of users yet do not participate in the system as users themselves. As identifiers, we may think of for instance
network identifiers, application identifiers, and device identifiers which are used by users’ mobile computing devices to establish
an ad hoc wireless connection. For linguistic simplicity, we subsume all these different types of identifiers together under the term
identifier. For notational simplicity, we use the symbol π‘’βˆˆπ‘ˆto denote both a user as a person as well as their respective identifier
and always say whether we mean the user themselves or their identifier.
If a user always uses the same identifier, it is clear that other users can link distinct connections they establish with the same
user. We say that a user makes use of a persistent identifier. If users otherwise change their network identifiers between connections,
we say that they use changing identifiers. Making use of changing identifiers makes it more difficult for attackers to link the identifiers
and ratings of a user. It remains to describe how linkability relates to anonymity. In order to do so, we follow again the terminology
as per Pfitzmann and Hansen [22] and adapt their definition of sender anonymity.
Definition 6 (Anonymity (in Terms of Linkability)).We say that a user π‘’βˆˆπ‘ˆwho communicates using the identifier 𝑒is anonymous
from an attacker’s perspective if the attacker cannot sufficiently distinguish between the identifier 𝑒and any other identifier
(anonymity w.r.t. identifiers). Furthermore, we say that a user 𝑒communicates a rating π‘Ÿπ‘’,𝑖 anonymously from an attacker’s
perspective if the attacker cannot sufficiently distinguish between the rating π‘Ÿπ‘’,𝑖 and any other rating (anonymity w.r.t. ratings).
By the above definition, any two users who establish a connection can never be anonymous toward each other, neither with
respect to their identifiers nor their ratings. Only users who do not establish a connection can potentially remain anonymous toward
each other. We make a short example to illustrate this.
Example: We revisit the example in Fig. 2 in which user 𝑒establishes a connection 𝑒↔𝑀to 𝑀in epoch 𝑑= 1 and a connection
𝑒↔𝑣to 𝑣in epoch 𝑑= 2. Since connections are established in physical proximity and payloads are exchanged immediately between
connected users, user 𝑒can distinguish user 𝑀from user 𝑣on, for instance, the basis of when and where the two distinct connections
were established. We see that 𝑣and 𝑀cannot be anonymous toward 𝑒with respect to their identifiers. Since 𝑣and 𝑀communicate
their profiles π‘Ÿπ‘£and π‘Ÿπ‘€respectively in the Similarity Calculation Phase (𝑖), we see further that 𝑣and 𝑀are also not anonymous
toward 𝑒with respect to their ratings. The users 𝑣and 𝑀themselves, however, can potentially be anonymous toward each other,
since they do not establish a connection. We examine this question in the following.
When user 𝑒forwards 𝑀’s profile π‘Ÿπ‘€in the payload π‘Ÿπ‘’β†’π‘£=[π‘Ÿπ‘’
π‘Ÿπ‘€]to user 𝑣in epoch 𝑑= 2, we see that 𝑣can immediately distinguish
between 𝑒’s profile π‘Ÿπ‘’and 𝑀’s profile π‘Ÿπ‘€. In this case, 𝑀is trivially not anonymous toward 𝑣with respect to 𝑀’s ratings, since the
non-aggregated payload π‘Ÿπ‘’β†’π‘£=[π‘Ÿπ‘’
π‘Ÿπ‘€]allows to distinguish whether a rating is by the same or by different users. Since the profile
π‘Ÿπ‘€is marked with the identifier 𝑀, user 𝑀is trivially not anonymous toward user 𝑣with respect to 𝑀’s identifier. The situation is
different if 𝑒aggregates the payload π‘Ÿπ‘’β†’π‘£=[π‘Ÿπ‘’
π‘Ÿπ‘€]through, for instance, WeighAvg to the aggregated payload π‘Ÿπ‘’β†’π‘£=[1 3.10 5 βˆ’ ]before
sending to 𝑣(see Eq. (4)). In this case, user 𝑀is anonymous toward user 𝑣with respect to 𝑀’s identifier since the payload π‘Ÿπ‘’β†’π‘£is
not marked with user 𝑀’s identifier. User 𝑀is additionally anonymous toward 𝑣with respect to 𝑀’s ratings, since the aggregated
payload π‘Ÿπ‘’β†’π‘£=[1 3.10 5 βˆ’ ]itself does not allow 𝑣to tell the underlying profiles π‘Ÿπ‘’and π‘Ÿπ‘€that have been used for aggregation. Last
but not least, we point out that 𝑣is trivially anonymous toward 𝑀, since 𝑀neither receives 𝑣’s identifier nor any of 𝑣’s ratings.
The above example presents the use of changing identifiers and payload aggregation as methods to retain the anonymity between
users who do not establish an ad hoc wireless connection in physical proximity. We emphasize, however, that anonymity is never
absolute. With respect to the above example, if 𝑣knows, for instance, the order in which the two connections 𝑒↔𝑀and 𝑒↔𝑣
happen and the underlying payload aggregation method that 𝑒uses, it is possible for 𝑣to tell the existence of 𝑀and most likely
also to distinguish between 𝑒’s and 𝑀’s ratings in the aggregated payload π‘Ÿπ‘’β†’π‘£. In other words, the more connections users establish
and the more data they communicate in these connections, the easier it becomes for an attacker to relate identifiers and ratings to
each other.
The two data minimization objectives (1.’) and (2.’) that we formulated in the previous section implicitly aim to countervent the
possibility that an attacker can relate identifiers and ratings to each other. They do so by (1.’) minimizing the number of connections
that are established and (2.’) minimizing the number of ratings that are communicated over these connections. It is in this sense
that data minimization and anonymity are related to each other. In the following section, we present results on the minimization of
data collection according to Objectives (1.’) and (2.’) as well as the effect of users making use of changing identifiers and payload
aggregation.
Pervasive and Mobile Computing 103 (2024) 101951
10
T. Eichinger and A. KΓΌpper
4. Results
We first show how DGD data minimization affects the parameters 𝑇𝑒and πœƒπ‘’. We then demonstrate that DGD data minimization
consistently produces sufficient and minimized data distributions.
4.1. Dataset and experimental setup
We present results on a stratified sample of the MovieLens ml-25m dataset [23], which is a widely used benchmark dataset in the
recommender systems literature. The sample comprises 500 profiles that consist of 77,588 ratings on 9816 movies. We split ratings
into training and test sets per user at a ratio of 80 to 20. The training set comprises 62,073 ratings with profiles holding between
16 to 2726 ratings. We report results as means over five distinct splits into training and test data.
We simulate data collection on the training ratings following the Data Exchange Protocol as described in Section 2.3. Since each
user establishes a single connection per epoch, we see that every user establishes 𝑑connections until epoch 𝑑. We set 𝑛payload = 3
constant such that the number of profiles per payload remains constant over time, and we can expect that the average amount of
ratings in a payload remains constant over time.
We assess recommendation performance on models trained on every user’s individually collected training ratings and evaluated
on their test ratings. We train models with the help of the Cornac library [24]. We report results on the basis of the Singular Value
Decomposition (SVD) collaborative filtering algorithm, which we find to outperform, for instance, user-based collaborative filtering,
item-based collaborative filtering, and Matrix Factorization consistently. We thereby follow prior works that report the suitability
of SVD in the context of data minimization in recommender systems [3–5].
With respect to DGD data minimization, we experimented with distinct combinations of update weights 𝛼𝑇, 𝛽𝑇, π›Ύπ‘‡βˆˆ {1,2,3,5}
for the time horizon 𝑇𝑒and π›Όπœƒ, π›½πœƒ, π›Ύπœƒβˆˆ {0.1,0.2,0.3} for the forwarding parameter πœƒπ‘’. For small update weights 𝛽𝑇= 1 and π›½πœƒ= 0.1,
results differed only marginally in the amount of collected data and performance. However, larger update weights rendered DGD
data minimization unstable; that is, data distributions tended to become insufficient, and the data minimization procedure sometimes
did not terminate. For simplicity of presentation, we thus only report results for 𝛼𝑇=𝛽𝑇=𝛾𝑇= 1 and π›Όπœƒ=π›½πœƒ=π›Ύπœƒ= 0.1. Finally,
we consider performance changes π›₯πœŽπ‘’in the interval (0,0.01) marginal, that is, we set the performance threshold πœ€= 0.01.
4.2. Metrics
We measure the recommendation performance with respect to the Root Mean Squared Error (RMSE) metric. The RMSE is a
standard performance metric in the recommender systems literature.
Let 𝑒be a user with training ratings π‘Ÿπ‘’, test ratings π‘Ÿπ‘’|test, and collected dataset 𝑅𝑒. On the basis of the collected dataset 𝑅𝑒, user
𝑒derives a predicted rating ξš‚π‘Ÿπ‘’,𝑖 for every test rating π‘Ÿπ‘’,𝑖 βˆˆπ‘Ÿπ‘’|test. Note that 𝑒may not be able to derive predicted ratings for all test
ratings. For instance, user 𝑒cannot predict a rating for an item if 𝑒has not collected any rating on that item. We thus denote by
the subset ξš‚π‘Ÿπ‘’|test βŠ‚ π‘Ÿπ‘’|test the test ratings for which user 𝑒can derive a predicted rating. Then, we measure the error between the
predicted test ratings and the test ratings as follows:
RMSE𝑒(𝑅𝑒) = βŽ›βŽœβŽœβŽ
1
|ξš‚π‘Ÿπ‘’|test|βˆ‘
ξš‚π‘Ÿπ‘’,π‘–βˆˆξš‚π‘Ÿπ‘’|test (ξš‚π‘Ÿπ‘’,𝑖 βˆ’π‘Ÿπ‘’,𝑖)2⎞⎟⎟⎠
1βˆ•2
,(11)
where |ξš‚π‘Ÿπ‘’|test|denotes the number of ratings in ξš‚π‘Ÿπ‘’|test. In case that user 𝑒cannot derive any predicted rating for any test rating
(ξš‚π‘Ÿπ‘’|test = βˆ…), we say that the RMSE𝑒is undefined. We denote by RMSE the average over all RMSE𝑒that are not undefined.
We propose the following redundancy metric to measure the number of ratings in a data distribution {𝑅𝑒}π‘’βˆˆπ‘ˆfor use in
decentralized recommender systems:
({𝑅𝑒}π‘’βˆˆπ‘ˆ) = βˆ‘π‘’βˆˆπ‘ˆ|𝑅𝑒|
|𝑅|∈ [1,∞).(12)
The redundancy metric quantifies the amount of ratings in the system as multiples of the number of ratings in the
user-item matrix 𝑅. In centralized systems, the amount of ratings is often quantified as a fraction of the ratings in the user-item
matrix 𝑅in the interval [0,1] (see for instance [4,5]). The redundancy metric extends this metric to decentralized systems.
4.3. Baselines
We compare DGD data minimization against five baselines. The first two baselines are naive baselines that feature static
parameters 𝑇𝑒and πœƒπ‘’. They are naive in the sense that they assume that optimal constant parameter values exist for every user
Pervasive and Mobile Computing 103 (2024) 101951
11
T. Eichinger and A. KΓΌpper
Fig. 3. Mean time horizon 𝑇and mean forwarding parameter πœƒover time for Distributed Gradient Descent (DGD) data minimization and data minimization
baselines with dynamic parameter values. β–Άand β—€mark the beginning and end of data collection respectively. Error bars indicate deviations of one standard
deviation from the mean.
and that those parameter values are known prior to data collection. The remaining three baselines feature dynamic parameters 𝑇
and πœƒ. In contrast to the naive baselines, these baselines assume that constant optimal parameter values exist. Parameter values
need to be dynamically updated during data collection.
Static-Global (Static-G): All users employ the same constant parameter values 𝑇𝑒and πœƒπ‘’. They employ the same global similarity
threshold 𝑑instead of individual similarity thresholds 𝑑𝑒. Formally: 𝑑=𝐹(1βˆ’πœƒ), where 𝐹denotes the cumulative distribution function
of pairwise similarities between users.
Static-Individual (Static-I): All users employ the same constant parameter values 𝑇𝑒and πœƒπ‘’. In contrast to Static-Global, users
employ individual similarity thresholds 𝑑𝑒. Formally: 𝑑𝑒=𝐹𝑒(1βˆ’πœƒπ‘’), where 𝐹𝑒denotes the cumulative distribution function of pairwise
similarities between user 𝑒and all other users.
Dynamic-Centralized (Dyn-C): All users employ the same constant parameter values 𝑇𝑒=𝑇and πœƒπ‘’initially. Users stop data
collection as soon as they reach their saturation point. Formally: They set their time horizon to the current epoch (𝑇𝑒=𝑑) as soon
as their performance differentials become marginal (|π›₯𝜎(𝑑)
𝑒|∈ (0, πœ€)). We call this baseline centralized, since users minimize data
collection individually as if they were a centralized system. Users minimize data collection, in particular, independently from other
users.
Dynamic-Regularization (Dyn-R): All users employ the same constant parameter values 𝑇𝑒and πœƒπ‘’initially. Users update their
individual parameter values between epochs only with respect to the regularization term in Equation (Eq. (10)). Formally: Users
apply DGD data minimization with 𝛼𝑇=π›Όπœƒ=𝛽𝑇=π›½πœƒ= 0.
Dynamic-Regularization&Innovation (Dyn-R&I): All users employ the same constant parameter values 𝑇𝑒and πœƒπ‘’initially. In
contrast to Dyn-R&I, users not only update their individual parameter values with respect to the regularization term but also with
respect to the innovation term in Equation (Eq. (10)). Formally: Users apply DGD data minimization with 𝛽𝑇=π›½πœƒ= 0.
We see that users can control the amount of data they collect by controlling their local parameters 𝑇𝑒and πœƒπ‘’. Users increase their
parameter values to increase data collection and decrease their parameter values to decrease or terminate it. Before we study the
data distributions that result from users controlling their parameters 𝑇𝑒and πœƒπ‘’, we first shed a light on the parameters themselves.
4.4. Data collection parameters
Fig. 3 shows the behavior of the mean time horizon 𝑇(𝑑)∢= 1βˆ•π‘›user βˆ‘π‘’βˆˆπ‘ˆπ‘‡(𝑑)
𝑒and mean forwarding parameter
πœƒ(𝑑)∢= 1βˆ•π‘›user βˆ‘π‘’βˆˆπ‘ˆπœƒ(𝑑)
𝑒for DGD and the dynamic baselines Dyn-C, Dyn-R, and Dyn-R&I over epochs 𝑑. We omit showing
mean parameters for the static baselines Static-G and Static-I, because they are constant over time.
For DGD, both 𝑇and πœƒincrease initially until they reach a maximum almost simultaneously. Past the maximum, both mean
parameters decrease until πœƒdrops to zero and data collection terminates. Dyn-R&I follows the same pattern except that it exhibits
a milder decrease in the forwarding parameter than DGD. Recall that the parameter update formulas for DGD and Dyn-R&I differ
Pervasive and Mobile Computing 103 (2024) 101951
12
T. Eichinger and A. KΓΌpper
Table 1
Pairs ()of redundancies and sufficiency coefficients for Distributed Gradient Descent data minimization (DGD) and the five baselines
for distinct combinations of starting parameter values 𝑇𝑒and πœƒπ‘’for all users π‘’βˆˆπ‘ˆ(𝑛payload = 3). Redundancies are crossed out if the underlying
data distribution is insufficient ( β‰  100%). Asterisks mark statistical significance of the lowest redundancy under sufficiency per row under a
two-sided Welch’s t-test with Bonferroni correction and significance level 𝛼= 0.01.
π‘‡π‘’πœƒπ‘’Static-G Static-I Dyn-C Dyn-R Dyn-R&I DGD
Random
Connection Behavior
25 0.0 – – – 53.18 (100%) 13.38 (100%) 10.92βˆ—(100%)
0.1 10.94 (67.9%) 10.45 (93.2%) 8.58 (93.3%) 52.81 (100%) 13.40 (100%) 10.64βˆ—(100%)
0.2 19.55 (83.8%) 21.73 (99.5%) 16.61 (99.5%) 54.89 (100%) 13.82 (100%) 11.57βˆ—(100%)
0.3 28.15 (92.4%) 31.41 (100%) 21.74 (100%) 57.47 (100%) 15.36 (100%) 13.07βˆ—(100%)
50 0.0 – – – 60.70 (100%) 13.19 (100%) 10.93βˆ—(100%)
0.1 19.12 (75.7%) 21.12 (99.6%) 16.17 (99.6%) 60.40 (100%) 13.25 (100%) 10.64βˆ—(100%)
0.2 29.92 (88.9%) 38.04 (100%) 23.16 (100%) 61.26 (100%) 13.69 (100%) 11.56βˆ—(100%)
0.3 39.49 (95.6%) 47.25 (100%) 25.20 (100%) 63.30 (100%) 15.28 (100%) 13.08βˆ—(100%)
75 0.0 – – – 66.49 (100%) 13.19 (100%) 10.93βˆ—(100%)
0.1 23.07 (79.5%) 28.39 (100%) 19.66 (100%) 66.16 (100%) 13.25 (100%) 10.64βˆ—(100%)
0.2 34.21 (91.8%) 44.93 (100%) 24.05 (100%) 66.80 (100%) 13.75 (100%) 11.56βˆ—(100%)
0.3 44.20 (96.7%) 53.39 (100%) 25.47 (100%) 68.64 (100%) 15.28 (100%) 13.08βˆ—(100%)
RandInt(25,75) Rand(0.1,0.3) – 36.33 (99.3%) 20.41 (99.9%) 60.10 (100%) 13.58 (100%) 10.90βˆ—(100%)
Similarity-based
Connection Behavior
25 0.0 – – – 52.83 (100%) 11.90 (100%) 10.06βˆ—(100%)
0.1 10.64 (71.2%) 10.44 (94.6%) 9.55 (95.6%) 52.45 (100%) 11.95 (100%) 10.01βˆ—(100%)
0.2 20.52 (87.7%) 24.87 (99.6%) 18.98 (99.9%) 57.75 (100%) 13.95 (100%) 11.99βˆ—(100%)
0.3 30.66 (94.6%) 36.05 (100.0%) 25.32 (100%) 61.31 (100%) 15.77 (100%) 14.12βˆ—(100%)
50 0.0 – – – 60.75 (100%) 11.83 (100%) 10.07βˆ—(100%)
0.1 18.67 (78.2%) 22.25 (99.6%) 16.85 (99.8%) 60.02 (100%) 11.71 (100%) 10.00βˆ—(100%)
0.2 31.26 (91.8%) 41.60 (100%) 24.33 (100%) 64.92 (100%) 13.95 (100%) 12.00βˆ—(100%)
0.3 42.40 (97.1%) 52.08 (100%) 28.07 (100%) 68.43 (100%) 15.83 (100%) 14.11βˆ—(100%)
75 0.0 – – – 66.46 (100%) 11.83 (100%) 10.07βˆ—(100%)
0.1 22.82 (82.1%) 29.84 (100%) 19.51 (100%) 65.51 (100%) 11.71 (100%) 10.00βˆ—(100%)
0.2 35.85 (93.6%) 48.73 (100%) 24.97 (100%) 70.16 (100%) 13.81 (100%) 12.00βˆ—(100%)
0.3 47.16 (98.0%) 58.07 (100%) 28.21 (100%) 73.35 (100%) 15.83 (100%) 14.11βˆ—(100%)
RandInt(25,75) Rand(0.1,0.3) – 39.79 (99.8%) 21.37 (99.9%) 63.01 (100%) 13.76 (100%) 11.90βˆ—(100%)
only in DGD having the consensus term (see Section 4.3). Data collection terminates quicker for DGD than for Dyn-R&I due to the
consensus term in the update formula.
In contrast to DGD and Dyn-R&I, both 𝑇and πœƒincrease initially yet do not decrease for Dyn-R. Recall that the parameter update
formula of Dyn-R only features the regularization term, while that of Dyn-R&I features both regularization and innovation terms.
We thus attribute the initial increase in mean parameters to the regularization term and the subsequent decrease to the innovation
term. At the beginning of data collection, users only have their own profile, which is an insufficient (RMSE𝑒undefined) amount
of rating data. The regularization term thus causes mean parameter values to grow. As users collect rating data over time, the
number of users with an insufficient amount of ratings decreases and the number of users with a sufficient (RMSE𝑒defined) amount
of ratings increases. Past the maximum, the decrease in mean parameter values is caused by the innovation term. Users with a
sufficient amount of data begin to outweigh users with an insufficient amount.
For Dyn-C, mean parameters do not increase initially as for Dyn-R, Dyn-R&I, and DGD. In particular, the mean forwarding
parameter πœƒremains constant. The mean time horizon 𝑇decreases when users stop establishing connections with other users as
soon as their performance differentials become marginal (|π›₯𝜎(𝑑)
𝑒|∈ (0, πœ€)). Observe the high variance in mean time horizons 𝑇. The
variance is high since users do not coordinate data collection and thus do not agree on when they uniformly end their data collection
processes. Now that we have described the way users collect data in terms of the mean parameters πœƒand 𝑇, we measure in the
following the amount of data that are collected.
4.5. Data minimization results
Table 1 shows pairs () of redundancy and sufficiency for various starting parameter values 𝑇𝑒and πœƒπ‘’. Observe that DGD
consistently produces sufficient (= 100%) data distributions of the lowest redundancy. This holds across both random and
similarity-based connection behavior and also for the case that users apply random, and thus not identical, starting parameter values.
Redundancies produced by DGD are significantly lower compared to the redundancies of sufficient data distributions produced by
the baselines. However, Static-G, Static-I, and Dyn-C produce lower redundancies than DGD, although only for insufficient data
distributions.
The static baselines Static-G and Static-I produce the data distributions with the lowest sufficiency coefficients comparatively.
Notably, when users apply the same global similarity threshold as in Static-G, the system as a whole struggles to achieve sufficiency.
This indicates that the use of a global similarity threshold is not recommendable in view of data minimization. We conclude more
Pervasive and Mobile Computing 103 (2024) 101951
13
T. Eichinger and A. KΓΌpper
Table 2
Comparison of redundancy–sufficiency pairs ()when users do or do not make use of changing identifiers and payload aggregation MostSim or WeighAvg in
combination with Distributed Gradient Descent data minimization (DGD) for distinct combinations of starting parameter values 𝑇𝑒and πœƒπ‘’for all users π‘’βˆˆπ‘ˆ
(𝑛payload = 3). All combinations produce sufficient data distributions (= 100%). Pairwise two-sided Welch’s t-test with Bonferroni correction and significance
level 𝛼= 0.01 yield that neither of the combinations significantly outperforms all others.
Persistent identifiers Changing identifiers
π‘‡π‘’πœƒπ‘’DGD DGD[MostSim] DGD[WeighAvg] DGD DGD[MostSim] DGD[WeighAvg]
Random
Connection Behavior
25 0.0 10.92(100%) 5.25(100%) 5.25(100%) 10.90(100%) 5.24(100%) 5.24(100%)
0.1 10.64(100%) 5.27(100%) 5.26(100%) 10.66(100%) 5.27(100%) 5.26(100%)
0.2 11.57(100%) 5.41(100%) 5.41(100%) 11.59(100%) 5.41(100%) 5.41(100%)
0.3 13.07(100%) 5.91(100%) 5.91(100%) 13.07(100%) 5.91(100%) 5.91(100%)
50 0.0 10.93(100%) 5.25(100%) 5.24(100%) 10.91(100%) 5.23(100%) 5.23(100%)
0.1 10.64(100%) 5.26(100%) 5.25(100%) 10.65(100%) 5.26(100%) 5.25(100%)
0.2 11.56(100%) 5.41(100%) 5.41(100%) 11.59(100%) 5.41(100%) 5.41(100%)
0.3 13.08(100%) 5.91(100%) 5.91(100%) 13.07(100%) 5.91(100%) 5.90(100%)
75 0.0 10.93(100%) 5.25(100%) 5.24(100%) 10.91(100%) 5.23(100%) 5.23(100%)
0.1 10.64(100%) 5.26(100%) 5.25(100%) 10.65(100%) 5.26(100%) 5.25(100%)
0.2 11.56(100%) 5.41(100%) 5.41(100%) 11.59(100%) 5.41(100%) 5.41(100%)
0.3 13.08(100%) 5.91(100%) 5.91(100%) 13.07(100%) 5.91(100%) 5.90(100%)
RandInt (25,75) Rand (0.1,0.3) 10.90(100%) 5.25(100%) 5.23(100%) 10.91(100%) 5.24(100%) 5.23(100%)
Similarity-based
Connection Behavior
25 0.0 10.06(100%) 4.93(100%) 4.93(100%) 10.07(100%) 4.94(100%) 4.92(100%)
0.1 10.01(100%) 4.98(100%) 4.98(100%) 10.01(100%) 4.98(100%) 4.98(100%)
0.2 11.99(100%) 5.49(100%) 5.48(100%) 11.99(100%) 5.49(100%) 5.48(100%)
0.3 14.12(100%) 6.07(100%) 6.07(100%) 14.13(100%) 6.07(100%) 6.07(100%)
50 0.0 10.07(100%) 4.93(100%) 4.93(100%) 10.06(100%) 4.93(100%) 4.93(100%)
0.1 10.00(100%) 4.98(100%) 4.98(100%) 10.00(100%) 4.98(100%) 4.98(100%)
0.2 12.00(100%) 5.48(100%) 5.48(100%) 11.99(100%) 5.48(100%) 5.48(100%)
0.3 14.11(100%) 6.08(100%) 6.09(100%) 14.12(100%) 6.08(100%) 6.09(100%)
75 0.0 10.07(100%) 4.93(100%) 4.93(100%) 10.06(100%) 4.93(100%) 4.93(100%)
0.1 10.00(100%) 4.98(100%) 4.98(100%) 10.00(100%) 4.98(100%) 4.98(100%)
0.2 12.00(100%) 5.48(100%) 5.48(100%) 11.99(100%) 5.48(100%) 5.48(100%)
0.3 14.11(100%) 6.08(100%) 6.09(100%) 14.12(100%) 6.08(100%) 6.09(100%)
RandInt (25,75) Rand (0.1,0.3) 11.90(100%) 5.64(100%) 5.63(100%) 11.88(100%) 5.64(100%) 5.63(100%)
generally that the use of static parameter values is not suitable for distributed data minimization. This makes sense intuitively,
as dynamic parameters allow users to adjust their data collection to the current situation. Users may, for instance, adjust their
parameter values on the basis of other users’ parameter values or due to a change in connection behavior.
Regardless of whether we use dynamic parameters or static parameters, our results show that the effect of payload aggregation
and the use of changing identifiers have qualitatively the same effect on both DGD and the baselines. Table 2 shows the effect for
DGD in particular. We observe first that both payload aggregation and the use of changing identifiers does not break the robustness
with which DGD produces sufficient data distributions, as all sufficiency coefficients are perfect (= 100%). We observe further that
the impact of using changing identifiers as compared to using persistent identifiers on redundancy is marginal. More specifically, we
find that the differences are not statistically significant. The impact of payload aggregation is less subtle. We see that the use of either
MostSim or WeighAvg about halves redundancies. Although the use of payload aggregation allows to significantly reduce redundancy,
we find that the difference between the payload aggregation methods MostSim and WeighAvg is not statistically significant. Now
that we have seen that DGD solves the distributed data minimization problem, we are interested in seeing the performance that the
resulting data distributions yield.
4.6. Performance-redundancy trade-offs
Table 3 shows the absolute differences in RMSE and redundancy between DGD and each of the five baselines. Observe first
that the difference in RMSE between DGD and Dyn-R&I is less than 0.001, while using DGD results in less redundancy. Using DGD
rather than Dyn-R&I allows to collect a data amount of 2.55 copies of the user-item matrix 𝑅less with random connection behavior
and 1.71 copies with similarity-based connection behavior across all users with otherwise equal performance levels, which is a
relative improvement of 19.3% and 14.6% respectively. Recall that the difference in parameter update formulas is that DGD features
the consensus term, while Dyn-R&I does not (see Section 4.3). Users thus achieve a significantly better performance-redundancy
trade-off if they coordinate data minimization through the consensus term. Following a similar argumentation for DGD and Dyn-C,
we conclude that performing both global and local data minimization as in DGD, and not only local data minimization as in Dyn-C,
has a significantly positive impact on the performance-redundancy trade-off of minimized data distributions.
Recall that the update formula of Dyn-R&I features the innovation term and the regularization term, while that of Dyn-R only
features the regularization term. We see in Table 3 that using Dyn-R allows to trade a decrease in RMSE by 0.021 with random
connection behavior and 0.025 with similarity-based connection behavior, for a redundancy increased by the data amount of
Pervasive and Mobile Computing 103 (2024) 101951
14
T. Eichinger and A. KΓΌpper
Table 3
Absolute differences π›₯RMSE and redundancies π›₯of the baselines to DGD data minimization. Differences are taken on the basis of the least redundant and
sufficient data distributions shown in Table 1. Asterisks mark statistical significance between each of the baselines with DGD data minimization under a two-sided
Welch’s t-test with Bonferroni correction and significance level 𝛼= 0.01.
π‘‡π‘’πœƒπ‘’π›₯RMSE π›₯
Random
Connection Behavior
Static-G – – – –
Static-I 75 0.1 βˆ’0.015* +17.75*
Dyn-C 75 0.1 βˆ’0.006 +9.02*
Dyn-R 25 0.1 βˆ’0.021* +42.17*
Dyn-R&I 50 0.0 +0.001 +2.55*
DGD 25 0.1 Β±0.000 Β±0.00
Similarity-based
Connection Behavior
Static-G – – – –
Static-I 75 0.1 βˆ’0.017* +19.84*
Dyn-C 75 0.1 βˆ’0.009 +9.51*
Dyn-R 25 0.1 βˆ’0.025* +42.45*
Dyn-R&I 50 0.1 Β±0.000 +1.71*
DGD 50 0.1 Β±0.000 Β±0.00
Table 4
Absolute differences π›₯RMSE and redundancies π›₯of DGD with persistent identifiers and combinations of persistent and changing identifiers and payload
aggregation methods MostSim and WeighAvg. Differences are taken on the basis of the least redundant and sufficient data distribution shown in Table 2. Asterisks
mark statistical significance between each of the baselines with DGD data minimization under a two-sided Welch’s t-test with Bonferroni correction and significance
level 𝛼= 0.01.
π‘‡π‘’πœƒπ‘’π›₯RMSE π›₯
Random
Connection Behavior
Changing Identifiers
DGD[WeighAvg] 50 0.0 +0.005 βˆ’5.41*
DGD[MostSim] 50 0.0 +0.005 βˆ’5.41*
DGD 50 0.1 Β±0.000 +0.01
Persistent Identifiers
DGD[WeighAvg] RandInt (25,75) Rand (0.1,0.3) +0.006 βˆ’5.41*
DGD[MostSim] 25 0.0 +0.005 βˆ’5.39*
DGD 25 0.1 Β±0.000 Β±0.00
Similarity-based
Connection Behavior
Changing Identifiers
DGD[WeighAvg] 25 0.0 +0.003 βˆ’5.08*
DGD[MostSim] 50 0.0 +0.003 βˆ’5.07*
DGD 50 0.1 Β±0.000 Β±0.00
Persistent Identifiers
DGD[WeighAvg] 25 0.0 +0.004 βˆ’5.07*
DGD[MostSim] 25 0.0 +0.003 βˆ’5.07*
DGD 50 0.1 Β±0.000 Β±0.00
42.17 copies of the user-item matrix 𝑅more with random connection behavior, and 42.45 copies with similarity-based connection
behavior, over all users. We conclude that the use of the innovation term has a significantly positive impact on performance while
simultaneously having a significantly negative impact on redundancy. Whether this trade-off is desirable depends on the setting.
We have, for instance, seen in Fig. 3 that DGD converges quicker than Dyn-R, which may be a desirable property. However, only
considering performance and redundancy, we cannot strictly favor either DGD or Dyn-R&I. Following the same argumentation, we
cannot strictly favor either DGD or Static-I only on the basis of performance and redundancy. However, considering that Static-I
has a tendency to produce insufficient data distributions (see Table 1), we deem DGD to be generally favored over Static-I.
Table 4 shows how the use of changing identifiers and payload aggregation affect the performance-redundancy trade-off. We
observe first that the use of changing identifiers does not affects the RMSE at all with both random and similarity-based connection
behavior. We see further that with random connection behavior, the use of changing identifiers increases redundancy slightly by 0.01,
while for similarity-based connection behavior, it leaves redundancy unchanged. We find that the marginal differences in redundancy
and RMSE are not statistically significant. We thus conclude that the use of changing identifiers is feasible in the context of data
minimization.
The reduction in redundancy due to payload aggregation is statistically significant with both random and similarity-based
connection behavior. Interestingly, it is essentially the same, irrespective of whether users make use of persistent or changing
identifiers. We observe, however, that the redundancy reduction of βˆ’5.41 and βˆ’5.39 is larger with random connection behavior
than βˆ’5.08 and βˆ’5.07 with similarity-based connection behavior. Interestingly, the redundancies that the payload aggregation
methods MostSim and WeighAvg yield are essentially the same irrespective of whether or not users make use of persistent or
changing identifiers. We conclude that the use of changing identifiers is feasible in the context of data minimization with payload
aggregation. A direct comparison of MostSim and WeighAvg shows that both payload aggregation methods yield essentially the same
performance-redundancy trade-offs.
Pervasive and Mobile Computing 103 (2024) 101951
15
T. Eichinger and A. KΓΌpper
5. Conclusions
We study data minimization and anonymity in pervasive mobile-to-mobile recommender systems in which users establish ad hoc
wireless connections between their mobile computing devices in physical proximity to exchange ratings on which they calculate
recommendations. A major difference in achieving this goal in pervasive mobile-to-mobile recommender systems, compared to
decentralized recommender systems in general, resides in the mobility of users. Due to user mobility, it is not possible for users to
collect the ratings from other users at will. Users have to decide in situ and exactly when their mobile computing device establishes
a wireless ad hoc connection whether and how many ratings they want to exchange. One of the effects of user mobility is that a
solution to the distributed data minimization problem does not always exist.
We study two types of user mobility. We study, on the one hand, uniformly random connections between users and, on the
other hand, connections that are random yet limited to users who have rated at least one item in common. The randomness in both
models accounts for the randomness in connections that we would expect to arise in a pervasive mobile-to-mobile recommender
system. For both these mobility models, we find that our algorithm based on Distributed Gradient Descent (DGD) robustly solves
the distributed data minimization problem, outperforming an array of baselines. The algorithm’s robustness stems from its capacity
to allow users to dynamically change the amount of data they collect individually in order to minimize the amount of collected data
over all users globally.
The DGD data minimization algorithm minimizes first the number of connections, and on the basis of this minimized amount of
connections, the amount of data that are collected over these connections. We find that the algorithm consistently finds the least
amount of connections and the least amount of ratings communicated over these connections compared to the baselines. We find that
this property simultaneously reduces the chances of an attacker relating users to ratings. We further reduce the chances of an attacker
relating users to ratings by having users aggregate ratings on the same item into a single rating and change their identifiers between
connections. We find that payload aggregation allows to halve the amount of ratings that users collect irrespective of whether
users change their identifiers between connections or not and without jeopardizing the robustness of the DGD data minimization
algorithm. In this sense, the DGD data minimization algorithm not only minimizes the amount of ratings that users collect but also
preserves their anonymity. However, it is important to note that it only preserves the anonymity between users who do not establish
a connection. It is clear that users who do establish an ad hoc wireless connection in physical proximity can never be anonymous
toward each other.
CRediT authorship contribution statement
Tobias Eichinger: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Re-
sources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Axel KΓΌpper: Funding acquisition,
Supervision, Writing – review & editing.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing
interests: Tobias Eichinger reports financial support was provided by European Commission. If there are other authors, they declare
that they have no known competing financial interests or personal relationships that could have appeared to influence the work
reported in this paper.
Data availability
https://github.com/TEichinger/dec-cf-sim (see [17] in the manuscript).
Acknowledgments
Funding: This work was supported by the European Union’s Horizon 2020 research and innovation program [grant number N.
883464].
Appendix A. General construction of the forwarding parameter πœ½π’–
Let π‘’βˆˆπ‘ˆbe a fixed user with similarity threshold π‘‘π‘’βˆˆ [0,1]. Let further 𝑆= {𝑠𝑧}π‘§βˆˆπ‘ˆbe the set of similarity data of all users in
π‘ˆand sim βˆΆπ‘†Γ—π‘†β†’[0,1] be a similarity measure. Let further π‘‹π‘’βˆΆπ‘ˆβˆ–{𝑒}β†’[0,1] be a random variable that describes the behavior
with which the fixed user 𝑒forms a connection with an a priori unknown user π‘£βˆˆπ‘ˆβˆ–{𝑒}. More specifically, for any user π‘£βˆˆπ‘ˆβˆ–{𝑒}
we denote by P(𝑋𝑒=𝑣)the probability that the fixed user 𝑒forms a connection with the a priori unknown user 𝑣. More precisely:
P(𝑋𝑒=𝑣) ∢= P({πœ”|πœ”βˆˆπ‘‹π‘’(𝑣)}). We emphasize that the random variable 𝑋𝑒only describes a single connection. Modeling multiple
connections requires taking multiple random variables 𝑋𝑒.
In this setting, it is unclear whether the fixed user 𝑒forwards a payload in a connection with some other user 𝑣, as 𝑒’s similarity
to 𝑣is a priori unknown. We define the random variable π‘Œπ‘’to describe this a priori unknown similarity. We more specifically define
π‘Œπ‘’βˆΆπ‘†βˆ–{𝑠𝑒}β†’[0,1] by π‘Œπ‘’βˆΆ= sim(𝑠𝑒, 𝑠𝑋𝑒), where 𝑠𝑋𝑒denotes the similarity data of the a priori unknown user 𝑋𝑒. We denote by
Pervasive and Mobile Computing 103 (2024) 101951
16
T. Eichinger and A. KΓΌpper
πΉπ‘Œπ‘’βˆΆ [0,1] β†’[0,1] the cumulative distribution function of π‘Œπ‘’. Then, πΉπ‘Œπ‘’(𝑠)represents the probability that 𝑒forms a connection with
another user 𝑋𝑒=π‘£βˆˆπ‘ˆβˆ–{𝑒}such that their similarity is smaller or equal to 𝑠. Formally, we have:
πΉπ‘Œπ‘’(𝑠)Def. πΉπ‘Œπ‘’
=P(π‘Œπ‘’β‰€π‘ )
Def. π‘Œπ‘’
=P(π‘ π‘–π‘š(𝑠𝑒, 𝑠𝑋𝑒)≀𝑠)
Def. 𝑋𝑒
=P(𝑋𝑒=𝑣, π‘ π‘–π‘š(𝑠𝑒, 𝑠𝑣)≀𝑠)
=P(𝑒↔𝑣, π‘ π‘–π‘š(𝑠𝑒, 𝑠𝑣)≀𝑠).
(A.1)
If we plug in 𝑒’s similarity threshold 𝑑𝑒into this equation (𝑠=𝑑𝑒), we have that πΉπ‘Œπ‘’(𝑑𝑒) ∈ [0,1] represents the probability that 𝑒forms
a connection with another user with a similarity lower or equal than 𝑑𝑒to 𝑒. Conversely, we have that 1 βˆ’ πΉπ‘Œπ‘’(𝑑𝑒) ∈ [0,1] represents
the probability that 𝑒forms a connection with another user with a similarity higher than 𝑑𝑒to 𝑒. Since 𝑒only forwards a payload if
the similarity to the other user is higher than 𝑑𝑒, we see that the probability
πœƒπ‘’βˆΆ= 1 βˆ’ πΉπ‘Œπ‘’(𝑑𝑒) ∈ [0,1].(A.2)
represents the probability with which 𝑒forwards a payload in an a priori unknown connection.
Appendix B. Local versus global data minimization
We illustrate that the objectives of local and global data minimization are competing on the basis of the example on sufficiency
in Section 2.5. We check first whether the data distribution in the example qualifies as a solution to the global data minimization
problem. We then check whether individual users’ collected datasets qualify as solutions to the local data minimization problem.
Finally, we conclude that the existence of solutions to the individual problems does not imply the existence of a solution to the joint
problem.
The data distribution {𝑅𝑒, 𝑅𝑣, 𝑅𝑀}does not qualify as a solution to the global data minimization problem (see Definition 1 in
Section 2.5), since it is not a sufficient data distribution (see the side condition in Eq. (6)). In order to arrive at a sufficient data
distribution, 𝑀would require an additional data exchange in epoch 𝑑= 3, either 𝑒↔𝑀or 𝑣↔𝑀, for then 𝑀would obtain a rating
on 𝑀’s item of interest 𝑖4and arrive at a sufficient collected dataset 𝑅𝑀. However, users 𝑒and 𝑣do not seek data exchanges beyond
epoch 𝑑= 2, since they already collected a full copy of the user-item matrix modulo row permutation. For them, any additional
data exchange in epoch 𝑑= 3 would trivially not yield any change in performance (π›₯𝜎(3)
𝑒=π›₯𝜎(3)
𝑣= 0). We now check whether the
collected datasets 𝑅𝑒, 𝑅𝑣, 𝑅𝑀qualify as solutions to the local data minimization problem (see Definition 1 in Section 2.5).
Observe first that the collected dataset 𝑅𝑀trivially does not qualify as a solution to the local data minimization problem,
since the collected dataset 𝑅𝑀is insufficient. User 𝑀has not collected any rating on 𝑀’s item of interest 𝑖4, which is why π›₯𝜎(1)
𝑀
is trivially undefined. We see that collected datasets necessarily need to be sufficient in order to qualify as solutions to the local
data minimization problem. 𝑅𝑒and 𝑅𝑣are sufficient collected datasets and are thus good solution candidates for the local data
minimization problem. If we assume, for the sake of the argument only, that the change in performance for users 𝑒and 𝑣was
marginal in epoch 𝑑= 2 (π›₯𝜎(2)
𝑒, π›₯𝜎(2)
π‘£βˆˆ (0, πœ€)) for some suitable performance metric 𝜎and performance threshold πœ€, the collected
datasets 𝑅𝑒and 𝑅𝑣qualify as solutions to the local data minimization problem.
In summary, we can see from this example that sufficiency is a necessary criterion for a collected dataset to qualify as a solution
to the local data minimization problem. If a data distribution were to qualify as a solution to both the local and the global data
minimization problem, then the data distribution necessarily has to be sufficient. If users that arrive quickly at a minimized collected
dataset stop seeking data exchanges with other users too early, they exacerbate the data collection of other users who may, as a
consequence, not be able to collect even a sufficient dataset. The underlying idea of our proposed DGD data minimization scheme
is that users coordinate their individual data collection processes by coordinating the pace of their data collection.
Appendix C. Complexity of centralized versus distributed data minimization
We compare the computational complexity of the data minimization problem in the centralized and distributed case. Roughly,
centralized data minimization involves training a few large models, while distributed data minimization involves training many small
models. Recall that the distributed data minimization problem is a mixture of the local and the global data minimization problem.
We omit a discussion on the complexity of the global data minimization problem (see Definition 1) since it merely involves constant
time look-ups of items of interest in a user’s collected dataset. We thus essentially only compare the computational complexity of
the centralized data minimization problem versus the local data minimization problem.
We assume that predicted ratings are calculated on the basis of an SVD collaborative filtering algorithm since the empirical
evaluation of the proposed DGD data minimization scheme is based on SVD (see Section 4.1). We point out that in the
context of recommendation, SVD usually denotes a class of matrix factorization algorithms such as SVD++ [25], truncated SVD
(see Section 3.6.5 in [26]), or FunkSVD [27] rather than a particular algorithm. We therefore use the complexity (min{π‘šπ‘›2, π‘›π‘š2})
of a classical numerical SVD algorithm on a π‘šΓ—π‘›matrix as, for instance, reported in [28].
Centralized data minimization assumes that the training ratings of a user-item matrix 𝑅are collected cumulatively in batches.
This is similar to users collecting payloads through the Data Exchange Protocol for distributed data minimization. After collecting a
new batch of ratings, first, a new SVD is calculated on the union of the previously collected batches and the newly collected batch.
Pervasive and Mobile Computing 103 (2024) 101951
17
T. Eichinger and A. KΓΌpper
Afterward, the new recommendation performance that the new SVD yields on the test ratings is calculated. The new performance
is compared with the previous performance. If the performance differential is marginal, data collection is terminated.
The complexity of SVD calculations for centralized data minimization is the same in each iteration since the number of rows
𝑛user and the number of columns 𝑛item in the underlying user-item matrix remains constant. The reason for this is that centralized
systems need to serve recommendations to all users π‘’βˆˆπ‘ˆon all items π‘–βˆˆπΌ. Since the number of items usually largely exceeds the
number of users (𝑛item ≫ 𝑛user), we arrive at a computational complexity for centralized data minimization on the basis of SVD of
(𝑛iteration β‹…[𝑛item ⋅𝑛user2
⏟⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏟
SVD calculation
+𝑐]),(C.1)
where 𝑛iteration denotes the number of iterations until termination and 𝑐the constant effort to calculate the performance on the test
ratings and the performance differentials in each iteration.
Although centralized data minimization and local data minimization are conceptually similar, the complexity of SVD calculations
for local data minimization does not remain constant as for centralized data minimization. The complexity of SVD calculations for
local data minimization increases between iterations. The reason for this is that 𝑒does not need to serve recommendations to all
users π‘’βˆˆπ‘ˆon all items π‘–βˆˆπΌas in a centralized recommender system, yet only to 𝑒themself on those items for which ratings have
been collected. The complexity of SVD calculations increases with the increasing dimension of the underlying user-item matrix,
which is the product of the number of rows and the number of columns. With respect to the example in Fig. 2, we see that the
collected dataset 𝑅𝑀is a 2 Γ—3 user-item matrix since 𝑀did not collect any rating on item 𝑖4, while 𝑅𝑒and 𝑅𝑣are 3 Γ—4 user-item
matrices.
Recall that users collecting data via the Data Exchange Protocol only receive a payload with probability πœƒin a connection and
that a payload represents the analog to a batch in the centralized case. More specifically, we have seen at the end of Section 2.3
that a user 𝑒receives βˆ‘π‘‘πœƒ(𝑑)
𝑒payloads on average over the course of 𝑒’s data collection process. Without loss of generality, we
disregard the connections in which 𝑒does not receive a payload. We define the integer 𝑛reception(𝑒) ∢= βŒŠβˆ‘π‘‘πœƒ(𝑑)
π‘’βŒ‹as an estimate of the
number of connections in which 𝑒receives a payload over the course of 𝑒’s data collection process. We further define the integer
𝑛reception ∢= βˆ‘π‘’π‘›reception(𝑒)as an estimate of the overall number of payload receptions over all users’ data collection processes, in
analogy to 𝑛iteration in Eq. (C.1) in the centralized case. We then arrive at a computational complexity for local data minimization
on the basis of SVD of
(βˆ‘
π‘’βˆˆπ‘ˆ
𝑛reception(𝑒)
βˆ‘
𝑑=1
[𝑛(𝑑)
item(𝑒)⋅𝑛(𝑑)
user(𝑒)2
⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
SVD calculation
per user and epoch
+𝑐𝑒])
Eq. (C.3)
=(βˆ‘
π‘’βˆˆπ‘ˆ
𝑛reception(𝑒)
βˆ‘
𝑑=1
[𝑛item ⋅𝑛user2+𝑐𝑒])
=(βˆ‘
π‘’βˆˆπ‘ˆ
𝑛reception(𝑒)
βˆ‘
𝑑=1
[𝑛item ⋅𝑛user2+𝑐])
=(βˆ‘
π‘’βˆˆπ‘ˆ
𝑛reception(𝑒)β‹…[𝑛item ⋅𝑛user2+𝑐])
=(𝑛reception β‹…[𝑛item ⋅𝑛user2
⏟⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏟
averaged
SVD calculation
+𝑐]),
(C.2)
where 𝑛(𝑑)
user(𝑒)and 𝑛(𝑑)
item(𝑒)denote the number of rows and columns of a user 𝑒’s collected dataset respectively after receiving
𝑑payloads, 𝑐𝑒the user-specific constant computational effort to calculate the performance on their test ratings and their performance
differentials with 𝑐as their average, and 𝑛item ⋅𝑛user2the average computational effort of SVD calculations over all users and epochs
defined as
𝑛item ⋅𝑛user2∢= βˆ‘π‘’βˆˆπ‘ˆβˆ‘π‘›reception(𝑒)
𝑑=1 𝑛(𝑑)
item(𝑒)⋅𝑛(𝑑)
user(𝑒)2
βˆ‘π‘’βˆˆπ‘ˆπ‘›reception(𝑒).(C.3)
By comparison of Eqs. (C.1) and (C.2), we see that the computational complexity of centralized data minimization and local
data minimization have a similar structure. In particular, both grow linearly in the number of times that data is collected since the
collection of a batch or payload triggers the calculation of an SVD. The complexity of SVD calculations is expensive for large user-item
matrices, even if they are sparse. Therefore, if SVD calculations on the entire user-item matrix for centralized data minimization
become intractable, performing SVD calculations on multiple subsets of the entire user-item matrix for local data minimization may
be a tractable alternative. An empirical comparison of the computational effort of centralized versus local data minimization goes
beyond the scope of this paper and represents future work.
Pervasive and Mobile Computing 103 (2024) 101951
18
T. Eichinger and A. KΓΌpper
Appendix D. Completeness of local and global data minimization
We explain why individual solutions to the local and global data minimization problems exist with random connection behavior.
The local data minimization problem is well-defined; that is, a solution to the problem exists for a user 𝑒if that user 𝑒can collect
a locally minimized dataset 𝑅(π‘šπ‘’)
𝑒(see the side condition in Eq. (5)). Since local data minimization is equivalent to centralized data
minimization performed by an individual user, a user may apply data minimization schemes that have been proposed for use in
centralized systems (see for instance [3,4]). Data minimization schemes for use in centralized systems require the availability of
the entire user-item matrix 𝑅, that is, the availability of any other user’s profile. Randomly pairing 𝑒with any other user in the
Connection Phase (𝑖) of the Data Exchange Protocol makes any other user’s profile available to user 𝑒, although it may require
several epochs to actually collect these profiles. We conclude that solutions to the local data minimization problem always exist if
users are paired randomly.
The global data minimization problem is well-defined if every user π‘’βˆˆπ‘ˆcan collect a sufficient dataset 𝑅𝑒, which is equivalent
to the data distribution {𝑅𝑒}π‘’βˆˆπ‘ˆbeing sufficient (see the side condition in Eq. (6)). Under the assumption that the entire user-item
matrix 𝑅represents a sufficient collected dataset to any user 𝑒, we can completely analogously to the argumentation in the previous
paragraph for the local data minimization problem conclude that a solution to the global data minimization problem always exists
if users are paired randomly. We argue that the assumption that 𝑅is sufficient to any user 𝑒is justified since, otherwise, the
recommendation problem itself is not well-defined. More precisely, if 𝑅were not sufficient for a particular user 𝑒, that is, 𝑅does
not hold ratings on any of 𝑒’s items of interest, none of the items of interest can be recommended.
References
[1] European Union, Charter of fundamental rights of the European union, Off. J. Eur. Union 55 (C 326) (2012) 391–407, see in particular Chapter II.
[2] A.F. Westin, Privacy and Freedom, Ig Publishing, 2015, reprint from 1967.
[3] A.J. Biega, P. Potash, H. DaumΓ©, F. Diaz, M. Finck, Operationalizing the legal principle of data minimization for personalization, in: Proceedings of the
43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2020, pp. 399–408.
[4] D. Shanmugam, F. Diaz, S. Shabanian, M. Finck, A.J. Biega, Learning to limit data collection via scaling laws: A computational interpretation for the legal
principle of data minimization, in: Proceedings of the 5th ACM Conf. on Fairness, Accountability, and Transparency, ACM, 2022, pp. 839–849.
[5] R. Chow, H. Jin, B. Knijnenburg, G. Saldamli, Differential data analysis for recommender systems, in: Proceedings of the 7th ACM Conference on
Recommender Systems, ACM, 2013, pp. 323–326.
[6] M. Larson, A. Zito, B. Loni, P. Cremonesi, Towards minimal necessary data: The case for analyzing training data requirements of recommender algorithms,
in: Proceedings of the 1st FATREC Workshop on Responsible Recommendation, 2017, pp. 1–6.
[7] H. Wen, L. Yang, M. Sobolev, D. Estrin, Exploring recommendations under user-controlled data filtering, in: Proceedings of the 12th ACM Conference on
Recommender Systems, ACM, 2018, pp. 72–76.
[8] T. Eichinger, A. KΓΌpper, Distributed data minimization for decentralized collaborative filtering systems, in: Proceedings of the 24th International Conference
on Distributed Computing and Networking, ICDCN ’23, ACM, 2023, pp. 140–149.
[9] S. Lee, X. Zheng, J. Hua, H. Vikalo, C. Julien, Opportunistic federated learning: An exploration of egocentric collaboration for pervasive computing
applications, in: Proceedings of the 19th IEEE International Conference on Pervasive Computing and Communications, IEEE, 2021, pp. 1–8.
[10] J. Dunkel, R. Hermoso, Towards MANET-based recommender systems for open facilities, Appl. Intell. 52 (8) (2021) 9045–9066.
[11] T. Eichinger, F. Beierle, R. Papke, L. Rebscher, H. Chinh Tran, M. Trzeciak, On gossip-based information dissemination in pervasive recommender systems,
in: Proceedings of the 13th ACM Conference on Recommender Systems, ACM, 2019, pp. 442–446.
[12] L.N. Barbosa, J. Gemmell, M. Horvath, T. Heimfarth, Distributed user-based collaborative filtering on an opportunistic network, in: Proceedings of the
2018 IEEE 32nd International Conference on Advanced Information Networking and Applications, IEEE, 2018, pp. 266–273.
[13] P. Gratz, T. Leclerc, Delay-tolerant collaborative filtering, in: Proceedings of the 7th ACM International Symposium on Mobility Management and Wireless
Access, ACM, 2009, pp. 109–113.
[14] R. Schifanella, A. Panisson, C. Gena, G. Ruffo, MobHinter: Epidemic collaborative filtering and self-organization in mobile ad-hoc networks, in: Proceedings
of the 2nd ACM Conference on Recommender Systems, ACM, 2008, pp. 27–34.
[15] A. de Spindler, M.C. Norrie, M. Grossniklaus, Collaborative filtering based on opportunistic information sharing in mobile ad-hoc networks, in: Proceedings
of the 6th OTM Confederated International Conferences β€˜β€˜On the Move to Meaningful Internet Systems’’, in: LNCS, (4803) Springer, 2007, pp. 408–416.
[16] J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M.M.A. Patwary, Y. Yang, Y. Zhou, Deep learning scaling is predictable, empirically,
2017, arXiv:1712.00409.
[17] T. Eichinger, Simulation framework for distributed data minimization, 2023, URL: https://github.com/TEichinger/dec-cf-sim. (Accessed 5 May 2024).
[18] M.D. Ekstrand, J.T. Riedl, J.A. Konstan, Collaborative filtering recommender systems, Found. Trends HCI 4 (2) (2011) 81–173.
[19] M. McPherson, L. Smith-Lovin, J.M. Cook, Birds of a feather: Homophily in social networks, Annu. Rev. Sociol. 27 (2001) 415–444.
[20] G. Beliakov, T. Calvo, S. James, Aggregation functions for recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook,
second ed., Springer US, Boston, MA, 2015, pp. 777–808.
[21] R. Shokri, P. Pedarsani, G. Theodorakopoulos, J.-P. Hubaux, Preserving privacy in collaborative filtering through distributed aggregation of offline profiles,
in: Proceedings of the 3rd ACM Conference on Recommender Systems, ACM, 2009, pp. 157–164.
[22] A. Pfitzmann, M. Hansen, A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability,
pseudonymity, and identity management, 2010, v0.34.
[23] F.M. Harper, J.A. Konstan, The MovieLens datasets: History and context, ACM TIIS 5 (4) (2015) 1–19.
[24] Q.-T. Truong, A. Salah, H. Lauw, Multi-modal recommender systems: Hands-on exploration, in: Proceedings of the 15th ACM Conference on Recommender
Systems, ACM, 2021, pp. 834–837.
[25] Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, ACM, 2008, pp. 426–434.
[26] C.C. Aggarwal, Recommender Systems: The Textbook, first ed., Springer Publishing Company, Incorporated, 2016.
[27] S. Funk, Netflix update: Try this at home, 2006, URL: https://sifter.org/simon/journal/20061211.html. (Accessed 5 May 2024).
[28] L. Trefethen, D. Bau, Numerical Linear Algebra, 1997.