
Pervasive and Mobile Computing 103 (2024) 101951

Available online 28 May 2024

(http://creativecommons.org/licenses/by/4.0/).

Contents lists available at ScienceDirect

Pervasive and Mobile Computing

journal homepage: www.elsevier.com/locate/pmc

On data minimization and anonymity in pervasive mobile-to-mobile recommender systems

Tobias Eichinger∗, Axel Küpper

Technische Universität Berlin, Service-centric Networking (SNET), Ernst-Reuter-Platz 7, Berlin, 10587, Germany

ARTICLE INFO

Dataset link: https://github.com/TEichinger/dec-cf-sim

Keywords:

Data minimization

Anonymity

Decentralized recommender system

Distributed gradient descent

Pervasive computing

Mobile computing

ABSTRACT

Data minimization is a legal principle that mandates limiting the collection of personal data

to a necessary minimum. In this context, we address ourselves to pervasive mobile-to-mobile

recommender systems in which users establish ad hoc wireless connections between their mobile

computing devices in physical proximity to exchange ratings that represent personal data on

which they calculate recommendations. The specific problem is: How can users minimize the

collection of ratings over all users while only being able to communicate with a subset of

other users in physical proximity? A main difficulty is the mobility of users, which prevents,

for instance, the creation and use of an overlay network to coordinate data collection. Users,

therefore, have to decide whether to exchange ratings and how many when an ad hoc

wireless connection is established. We model the randomness of these connections and apply

an algorithm based on distributed gradient descent to solve the distributed data minimization

problem at hand. We show that the algorithm robustly produces the fewest connections and also the fewest collected ratings compared to an array of baselines. We find that

this simultaneously reduces the chances of an attacker relating users to ratings. In this sense, the

algorithm also preserves the anonymity of users, yet only of those users who do not establish

an ad hoc wireless connection with each other. Users who do establish a connection with each

other are trivially not anonymous toward each other. We find that users can further minimize

data collection and preserve their anonymity if they aggregate multiple ratings on the same

item into a single rating and change their identifiers between connections.

1. Introduction

Privacy is traditionally associated with the autonomy and freedom of individuals [1,2]. It forms the basis of a free society and is, therefore, widely considered worthy of protection. A popular way to protect privacy is by law, where modern data protection laws around the globe such as the European Union’s General Data Protection Regulation (GDPR), California’s California Privacy Rights Act (CPRA), South Africa’s Protection of Personal Information Act (POPIA), or Brazil’s Lei Geral de Proteção de Dados (LGPD) unanimously emphasize the protection of personal data. Across these laws, data minimization is a central data protection principle that mandates the limitation of collection and processing of personal data.

One way to apply data minimization is to mandate purpose limitation. Purpose limitation means that when personal data are

collected, the purposes of their collection must be specified and the collected data may only be processed to fulfill these purposes.

Purpose limitation imposes a burden of proof on data controllers, that is, those who collect and process personal data, toward data subjects, that is, those individuals to whom the data relate. Data controllers must make it clear to their data subjects for which purposes they collect personal data and that the personal data that they collect are actually necessary to fulfill these purposes. We see that the tacit assumption behind the principle of purpose limitation is that it is possible to decide whether processing a piece of personal data is necessary for the fulfillment of a purpose.

∗Corresponding author.

E-mail address: [email protected] (T. Eichinger).

https://doi.org/10.1016/j.pmcj.2024.101951

Received 1 September 2023; Received in revised form 5 May 2024; Accepted 25 May 2024

Fig. 1. Visualization of Shanmugam et al.’s [4] data minimization paradigm, based on the three-phase model of Hestness et al. [16]. Data minimization is characterized as a stopping problem, where data collection is to be stopped beyond the sufficiency point and at best at the saturation point. We adapt the denomination of the three phases and the two points that separate them to the data minimization use case.

In many application scenarios, it is possible to understand whether the collection and processing of personal data is necessary

to fulfill a purpose. It is, for instance, clear that it is necessary for an online shop to collect a customer’s address for the purpose

of delivery. In some application scenarios, however, it is not straightforward to decide whether the collection of personal data is

necessary for the fulfillment of that purpose. One such purpose is the provision of personalized content on, for instance, social media

platforms or e-commerce websites, where the selection of personalized content is performed by recommender systems. In the context

of personalization in recommender systems, it is not straightforward to decide whether the collection and processing of personal

data is necessary.

Biega et al. [3] outline that the main difficulty in applying purpose limitation in recommender systems is that it is typically

not possible to decide for a particular piece of personal data whether it is necessary to be collected. They, therefore, propose to

collect any personal data that can be considered relevant for the purpose of personalization until reaching a performance threshold.

Shanmugam et al. [4] argue that the use of an absolute performance threshold may cause a system to collect personal data while

only marginally increasing performance. While performance generally increases as the use of personal data increases, performance

gains decrease [3,5–7]. They propose to collect personal data until the performance gains, rather than the performance itself, reach

a threshold. Fig. 1 illustrates this data minimization paradigm.

In our own previous work, we study the data minimization problem as per Fig. 1 in decentralized recommender systems [8]. In

decentralized recommender systems, we are faced with multiple distinct instances of the data minimization problem, one for each

user in the system. In this paper, we specifically recapitulate our findings in the context of a pervasive mobile-to-mobile recommender

system in which users collect personal data through pairwise data exchanges in physical proximity (see for instance [9–15]).

Limitation of data exchanges between users to physical proximity is in itself a data minimization method, as every user only has

access to the personal data of a subset of users in the system. This limitation makes it, however, easier to identify the user to

whom personal data relate. We thus discuss the link between data minimization and anonymity in general and study the effects of

(A) users changing their identifiers and (B) aggregating their own personal data with that of others in particular. We begin with a

recapitulation of our distributed data minimization method.

2. Methods and material

The core methods we describe in this section have already been presented in our previous publication [8]. The source code that

implements both previously proposed methods and methods that we propose in this paper is publicly available [17]. It allows us

to reproduce all our results. We first introduce some basic conventions and notations that are subject to the application domain

of recommender systems. We then formally describe the pairwise connections that users establish in physical proximity. For each

pairwise connection that users establish, we formally describe the way in which users exchange personal data. On this basis, we

formulate data minimization as a distributed optimization problem and present an algorithm based on distributed gradient descent

to solve it. We begin with some conventions and notations.


Fig. 2. Example of three users $u, v, w \in U$ exchanging rating data on four items $i_1, i_2, i_3, i_4 \in I$ in two consecutive and pairwise data exchanges $u \leftrightarrow w$ and $u \leftrightarrow v$. The amount of data collected overall hinges on (a) the number of data exchanges and (b) the number of ratings exchanged in a data exchange. $R$ denotes the user-item matrix and $R_u, R_v, R_w$ the individual collected datasets.

2.1. Conventions and notations

We assume that users derive recommendations on their mobile phones on the basis of the data they collect from nearby other

users. We assume, more specifically, that all users make use of a collaborative filtering algorithm to derive recommendations.

Collaborative filtering generally denotes a class of recommendation algorithms [18]. These algorithms take ratings as input and

yield predicted ratings as output. Predicted ratings are estimates of how a user would rate an item they have not seen yet. Based on

these predicted ratings, recommendations can be displayed as a list of items with the highest predicted ratings. These assumptions

do not represent a limitation. Our methods can readily be applied to data types other than ratings and algorithms other than

collaborative filtering algorithms, provided that the performance of recommendations can be measured and performance differences

can be characterized as zero, marginal, or large. In a system with real users, one could thus regularly ask users to grade the quality of

recommendations or track their behavior within the system to measure the quality of recommendations and characterize performance

differences.

We measure performance with respect to the discrepancy between users’ actual ratings and the predicted ratings output by a collaborative filtering algorithm to derive recommendations for them. We denote by $r_{u,i}$ the rating by some user $u \in U$ on some item $i \in I$, where $U$ and $I$ denote sets of users and items, respectively. We use the hat notation $\hat{r}_{u,i}$ to denote predicted ratings. We denote by $n_{\text{user}}$ and $n_{\text{item}}$ the number of users and items in $U$ and $I$, respectively. We write all ratings $r_{u,i}$ into the $n_{\text{user}} \times n_{\text{item}}$ user-item matrix $R$ and denote by $|R|$ the number of ratings in $R$. We call a row $r_u$ in the user-item matrix $R$, corresponding to the ratings by some user $u \in U$, the profile of user $u$. We use the bar notation $\bar{r}_u$ to denote user $u$’s mean rating.

We denote by $\text{sim}(u, v) \in [0,1]$ the cosine similarity between the profiles of some two users $u, v \in U$ defined by

$$\text{sim}(u, v) = \frac{\langle r_u, r_v \rangle}{\|r_u\| \cdot \|r_v\|},$$

where missing ratings are substituted with zeros, $\langle \cdot, \cdot \rangle$ denotes the dot product, $\|\cdot\|$ the Euclidean norm, and $r_u, r_v$ the profiles of some users $u$ and $v$, respectively. The cosine similarity is widely used to describe two users’ similarity with respect to their item preferences. If, for instance, two users have not rated a single item in common, then their cosine similarity is zero. Cosine similarity is positive if users have rated at least one item in common, and is larger the closer their ratings on those commonly rated items align.
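The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the representation of profiles as lists with `None` for missing ratings, the function name `cosine_similarity`, and the convention of returning zero for an empty profile are assumptions. The profiles `r_u` and `r_w` are those used in the examples of Section 2.4; the profile `r_v` is hypothetical and was chosen only because it reproduces the similarity values 0.78 and 0.09 quoted there.

```python
import math
from typing import Optional, Sequence

Profile = Sequence[Optional[float]]  # None marks a missing rating (assumed encoding)


def cosine_similarity(profile_a: Profile, profile_b: Profile) -> float:
    """Cosine similarity with missing ratings substituted by zeros."""
    a = [0.0 if r is None else float(r) for r in profile_a]
    b = [0.0 if r is None else float(r) for r in profile_b]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for profiles without any rating
    return dot / (norm_a * norm_b)


r_u = [None, 3, 5, None]   # profile of u from the examples in Section 2.4
r_w = [1, 4, None, None]   # profile of w from the examples in Section 2.4
r_v = [2, None, 5, 1]      # hypothetical profile of v (not given in the text)

print(round(cosine_similarity(r_v, r_u), 2))  # 0.78
print(round(cosine_similarity(r_v, r_w), 2))  # 0.09
```

Two profiles without a commonly rated item, such as `[1, None]` and `[None, 4]`, yield a similarity of zero, in line with the remark above.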

In decentralized collaborative filtering systems, every user only has their own profile to begin with and thus needs to collect ratings from nearby other users. We denote by $r_{u \to v}$ the payload that user $u$ sends to $v$. Payloads are sets of previously collected profiles, which may include the sender’s own profile. We denote by $R_u$ the collected dataset of user $u$, which consists of $u$’s own profile and all payloads that $u$ collected. We more generally call the collection of all users’ collected datasets $\{R_u\}_{u \in U}$ a data distribution and define $|\{R_u\}_{u \in U}| = \sum_{u \in U} |R_u|$ as the number of ratings in it.

An epoch $t$ denotes a time interval of fixed length during which the users in $U$ establish connections. We assume that all connections that are established within the same epoch are independent of each other. If, for instance, a user $u$ establishes two connections to two users $v$ and $w$ in epoch $t$, we assume that $u$ cannot send ratings to $w$ that $u$ receives from $v$ in epoch $t$, and vice versa. In other words, we assume that all connections within the same epoch happen synchronously, although, in practice, this will not be the case in a pervasive mobile-to-mobile recommender system. All connections that are established during an epoch are dissolved at the end of the epoch.

Fig. 2 shows an example in which three users $u, v, w \in U$ collect ratings on four items $i_1, i_2, i_3, i_4 \in I$ in two consecutive and pairwise data exchanges. Initially ($t = 0$), users only have their own profile, where profiles are indicated as row vectors. We see that the row vectors of the user-item matrix $R$ are split over all users. Over time, users collect payloads that hold each other’s ratings. Eventually ($t = 2$), users $u$ and $v$ each hold a full copy of the user-item matrix $R$, modulo row permutation, and $w$ holds two out of three rows of $R$.
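The exchange sequence of Fig. 2 can be replayed with a short sketch. It assumes, for illustration only, that every user forwards all collected profiles in every exchange (no similarity threshold, no aggregation) and that collected datasets are stored as dictionaries keyed by the profile owner. The profile of `v` is hypothetical, and the names `exchange` and `count_ratings` are not taken from the paper.

```python
from typing import Dict, List, Optional

Profile = List[Optional[int]]  # None marks a missing rating (assumed encoding)


def count_ratings(dataset: Dict[str, Profile]) -> int:
    """Number of ratings |R_u| in one collected dataset."""
    return sum(r is not None for profile in dataset.values() for r in profile)


def exchange(datasets: Dict[str, Dict[str, Profile]], a: str, b: str) -> None:
    """Pairwise exchange: both users send everything they held before the exchange."""
    payload_a = dict(datasets[a])  # snapshot, so b only receives a's prior data
    payload_b = dict(datasets[b])
    datasets[a].update(payload_b)
    datasets[b].update(payload_a)


profiles = {
    "u": [None, 3, 5, None],
    "v": [2, None, 5, 1],      # hypothetical profile, not given in the text
    "w": [1, 4, None, None],
}
# t = 0: every user only holds their own profile
datasets = {user: {user: profile} for user, profile in profiles.items()}

exchange(datasets, "u", "w")   # epoch t = 1
exchange(datasets, "u", "v")   # epoch t = 2

sizes = {user: count_ratings(ds) for user, ds in datasets.items()}
total = sum(sizes.values())    # size of the data distribution
print(sizes, total)
```

With these assumed profiles, `u` and `v` end up holding all three rows and `w` holds two, which mirrors the final state described for Fig. 2.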


We denote by $T_u$ the number of epochs a user $u \in U$ seeks to exchange rating data with other users. We call $T_u$ the time horizon of user $u$. With respect to the example shown in Fig. 2, users $u$ and $v$ have time horizons $T_u = T_v = 2$, while user $w$ has a time horizon of $T_w = 1$. If all time horizons $T_u$ coincide, we call the common value $T$ the global time horizon.

From the perspective of a user, data minimization presents itself as a collection minimization problem as in traditional centralized

systems. With reference to Fig. 1, each user collects ratings in the user-item matrix 𝑅until reaching their individual saturation point,

that is, the point beyond which additional collection only yields marginal performance gains. From the perspective of the system,

however, data minimization presents itself as a redundancy minimization problem. The user-item matrix 𝑅is to be distributed over

all users with minimum redundancy such that every user has reached at least their individual sufficiency point and, at best, their

individual saturation point. Since users can only exchange data in physical proximity, the way in which data distributions form is

subject to the order in which connections between users form. This is what we describe in the following.

2.2. Connection behavior

Connectivity in pervasive mobile-to-mobile recommender systems, as in all wireless ad hoc networks, is characterized by node

mobility and intermittent connectivity. In view of intermittent connectivity, we assume for simplicity that no routing takes place. No

overlay networks are formed and communication is limited to users located within the transmission ranges of each other’s mobile

computing devices. With this assumption, we particularly omit a treatment of the hidden node problem. In view of node mobility,

we assume that pairwise connections between users are established randomly.

We describe connection behaviors as the random sampling of edges from a graph $\mathcal{G} = (U, E)$, where $U$ denotes the set of users and $E$ the set of edges between these users. Whenever two users share an edge, we assume that these two users are physically close such that their mobile computing devices can establish an ad hoc wireless connection. We thus model node mobility in terms of the changes in the graph $\mathcal{G}$. In the following, we describe the two connection behaviors that we consider in this paper.

2.2.1. Random connection behavior

We sample edges in two steps such that every user establishes a connection with $n$ other users. In the first step, we generate an ordering $(u_1, u_2, \ldots, u_n)$ on the set of users $U$. We do so by $n$ times uniformly randomly sampling a user from the user set $U$ without replacement. In the second step, we iterate over this ordering. For $i = 1, 2, \ldots, n$, we define a subset of edges $E_i \subset E$ that includes the $i$th user $u_i$ and excludes all other users $v \in U$ that have previously been sampled in an edge. Formally, we have

$$E_i = \{\, e = \{u, v\} \in E \mid u_i \in e \text{ and } u, v \notin e_j \text{ for all } j < i \,\}, \tag{1}$$

where $e_j$ denotes the edge sampled in the $j$th iteration. Note that the subset $E_i$ can be empty ($E_i = \emptyset$). This is particularly the case if the number of users in $U$ is odd and all users except one form pairs.
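The two-step sampling procedure can be sketched as follows. This is one possible reading of Eq. (1), assuming that one edge is drawn uniformly at random from each non-empty candidate set and that every user takes part in at most one edge per pass; the function name `sample_connections` and the representation of edges as `frozenset` pairs are illustrative choices.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set


def sample_connections(
    users: Sequence[str],
    edges: Sequence[FrozenSet[str]],
    rng: random.Random,
) -> List[FrozenSet[str]]:
    """Sample pairwise connections following the two-step procedure."""
    # Step 1: random ordering of all users (sampling without replacement)
    ordering = rng.sample(list(users), k=len(users))
    matched: Set[str] = set()          # users contained in a previously sampled edge
    sampled: List[FrozenSet[str]] = []
    # Step 2: iterate over the ordering and build the candidate set E_i
    for u_i in ordering:
        candidates = [e for e in edges if u_i in e and not (e & matched)]
        if candidates:                  # E_i may be empty
            e_i = rng.choice(candidates)
            sampled.append(e_i)
            matched |= e_i
    return sampled


users = ["a", "b", "c", "d", "e"]
# Random connection behavior: every pair of users shares an edge
complete_edges = [frozenset(pair) for pair in combinations(users, 2)]
connections = sample_connections(users, complete_edges, random.Random(0))
print(connections)
```

With five users and a complete edge set, two disjoint pairs are formed and one user stays unconnected, which corresponds to the empty candidate set mentioned above.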

We see that in random connection behavior, the graph $\mathcal{G}$ is fully connected and every user can potentially establish a connection with any other user. Any pair of users has the same probability of establishing a connection. The following connection behavior assumes that only similar pairs of users, rather than every pair of users, can potentially establish a connection.

2.2.2. Similarity-based connection behavior

Similarity-based connection behavior takes into consideration the cosine similarity between users’ profiles. Two users $u$ and $v$ share an edge in the graph from which we sample edges if their cosine similarity $\text{sim}(u, v)$ is positive. Recall that the cosine similarity is only positive if two users have rated at least one item in common. It is widely known in the recommender systems literature that, typically, only a fraction of user pairs has rated at least one item in common. As a consequence, when we apply the same edge sampling procedure as for random connection behavior, we see that with similarity-based connection behavior, every user can only establish connections with a subset of other users.
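As a small illustration of how the edge set shrinks, the sketch below builds the edges for four assumed profiles. The profiles of `v` and `x` are hypothetical, and `similarity_edges` is an illustrative name rather than a function from the published source code.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Sequence

Profile = Sequence[Optional[float]]  # None marks a missing rating (assumed encoding)


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine similarity with missing ratings substituted by zeros."""
    x = [0.0 if r is None else float(r) for r in a]
    y = [0.0 if r is None else float(r) for r in b]
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return sum(p * q for p, q in zip(x, y)) / norm if norm else 0.0


def similarity_edges(profiles: Dict[str, Profile]) -> List[FrozenSet[str]]:
    """Keep an edge only for user pairs with positive cosine similarity."""
    return [
        frozenset((u, v))
        for u, v in combinations(sorted(profiles), 2)
        if cosine_similarity(profiles[u], profiles[v]) > 0
    ]


profiles = {
    "u": [None, 3, 5, None],
    "v": [2, None, 5, 1],          # hypothetical
    "w": [1, 4, None, None],
    "x": [None, None, None, 4],    # hypothetical user who only rated the fourth item
}
edges = similarity_edges(profiles)
print(edges)
```

Here user `x` shares a rated item only with `v`, so `x` can connect to `v` but neither to `u` nor to `w`.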

Similarity-based connection behavior takes into consideration that users are more likely to be in physical proximity to each other

if they are similar. In this case, we measure similarity in terms of the cosine similarity, that is, the similarity between two users’

ratings. Its underlying idea more generally extends to the idea behind homophilous search (see for instance [19]). Now that we

have described the way in which connections form, we describe the communication that happens between pairs of connected users.

2.3. Data exchange protocol

The Data Exchange Protocol (see Algorithm 1) formalizes the way in which a user sends ratings in a payload to a connected other

user. While both users of a connection communicate according to the Data Exchange Protocol, we only show one communication

direction, that is, communication from some user 𝑢to some user 𝑣, and omit the other since both directions are identical. We

assume that all users are benevolent in the sense that they follow the Data Exchange Protocol exactly and without enforcement, since

otherwise, users would not seek to engage with other users in the first place. The Data Exchange Protocol features the following

two consecutive phases.

(i) Similarity Calculation Phase: It is a widely adopted assumption in the literature on decentralized collaborative filtering systems that two users who are similar with respect to their profiles hold rating data relevant to each other. Therefore, user $u$ sends the profile $r_u$ to $v$ and receives the profile $r_v$ from user $v$. Upon reception, user $u$ calculates the cosine similarity $\text{sim}(u, v)$. Only if the cosine similarity exceeds $u$’s similarity threshold $t_u \in [0,1]$ does $u$ decide to forward a payload in the Payload Forwarding Phase.


Algorithm 1: Data Exchange Protocol

Given: sender $u$ and receiver $v$

Parameters: forwarding parameter $\theta_u$, payload parameter $n_{\text{payload}}$

 1  Procedure (i) SimilarityCalculationPhase($\theta_u$):
 2      send($r_u$); wait($r_v$);
        /* $u$ calculates the cosine similarity between $u$’s profile $r_u$ and $v$’s profile $r_v$ */
 3      similarity = sim($r_u$, $r_v$);
        /* $u$ calculates similarity threshold $t_u$ */
 4      $t_u \leftarrow F_u^{-1}(1 - \theta_u)$;
 5  end
 6  if similarity > $t_u$ then
 7      Procedure (ii) PayloadForwardingPhase($n_{\text{payload}}$):
            /* $u$ selects $n_{\text{payload}}$ profiles/aggregated payloads that have previously been collected in the PayloadForwardingPhase and relate to the $n_{\text{payload}}$ users most similar to $v$ */
 8          $r_{u \to v} \leftarrow$ selectProfiles($n_{\text{payload}}$, $v$);
            /* [OPTIONAL] $u$ aggregates multiple ratings on the same item into a single rating */
 9          $r_{u \to v} \leftarrow$ aggregatePayload($r_{u \to v}$);
10          send(uPayload);
11      end
12  end
13  wait(vPayload);
14  disconnect($u$, $v$);

We point out that it is convenient to represent $u$’s similarity threshold $t_u$ as the a priori probability $\theta_u \in [0,1]$ that the similarity $\text{sim}(u, v)$ between user $u$ and the a priori unknown user $v$ exceeds $u$’s similarity threshold $t_u$. Here, we only describe the construction of this probability for the special case that users establish connections with random connection behavior. A general construction can be found in Appendix A. Assume the distribution of similarities between $u$ and any other user is known to $u$, and its cumulative distribution function $F_u$ is continuous. Then, $1 - \theta_u := F_u(t_u)$ represents the portion of users that have a similarity lower than $t_u$ to $u$, and $\theta_u$ represents the portion of users that have a similarity higher than $t_u$ to $u$. We call $\theta_u$ the forwarding parameter of $u$. The forwarding parameter $\theta_u$ describes the probability that user $u$ forwards a payload in the Payload Forwarding Phase of a connection.

(ii) Payload Forwarding Phase: User $u$ considers all previously collected profiles, including $u$’s own profile, for sending. Here, $u$ limits the number of ratings sent to $v$ in a payload by only selecting the $n_{\text{payload}}$ profiles collected from the users that are most similar to $v$. We call $n_{\text{payload}}$ the payload parameter. Users can further limit the number of ratings they send to $v$ by aggregating the ratings in the payload, which we describe in Section 2.4.
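One communication direction of the two phases can be sketched as follows. The sketch replaces the inverse cumulative distribution function $F_u^{-1}$ by an empirical quantile over a list of similarities that $u$ is assumed to know, omits the optional aggregation step, and uses illustrative names (`similarity_threshold`, `select_profiles`, `forward_payload`) that are not taken from the paper. The similarity sample and the profile of `v` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[float]]  # None marks a missing rating (assumed encoding)


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine similarity with missing ratings substituted by zeros."""
    x = [0.0 if r is None else float(r) for r in a]
    y = [0.0 if r is None else float(r) for r in b]
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return sum(p * q for p, q in zip(x, y)) / norm if norm else 0.0


def similarity_threshold(known_similarities: List[float], theta: float) -> float:
    """Empirical stand-in for t_u = F_u^{-1}(1 - theta_u)."""
    ordered = sorted(known_similarities)
    k = round((1.0 - theta) * len(ordered))  # number of values at or below t_u
    return ordered[k - 1] if k > 0 else -math.inf


def select_profiles(collected: Dict[str, Profile], receiver: Profile,
                    n_payload: int) -> Dict[str, Profile]:
    """Pick the n_payload collected profiles most similar to the receiver."""
    ranked = sorted(collected.items(),
                    key=lambda item: cosine_similarity(item[1], receiver),
                    reverse=True)
    return dict(ranked[:n_payload])


def forward_payload(collected: Dict[str, Profile], own: Profile, receiver: Profile,
                    t_u: float, n_payload: int) -> Dict[str, Profile]:
    """Phase (i) decides whether to forward; phase (ii) selects the payload."""
    if cosine_similarity(own, receiver) > t_u:
        return select_profiles(collected, receiver, n_payload)
    return {}  # no payload is forwarded


r_u, r_w = [None, 3, 5, None], [1, 4, None, None]
r_v = [2, None, 5, 1]                      # hypothetical profile of the receiver
collected_u = {"u": r_u, "w": r_w}
t_u = similarity_threshold([0.1, 0.3, 0.5, 0.9], theta=0.5)  # hypothetical sample
print(t_u, forward_payload(collected_u, r_u, r_v, t_u, n_payload=1))
```

With a forwarding parameter of 0.5, half of the sampled similarities exceed the threshold; a forwarding parameter of 0 raises the threshold to the largest sampled similarity, so nothing is forwarded in this example.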

The Data Exchange Protocol formalizes the process through which users collect data. We can now formally represent an individual user $u$’s data collection process as a finite sequence of parameter value pairs

$$\left( (T_u^{(t)}, \theta_u^{(t)}) \right)_{t=1,2,\ldots,m_u}, \tag{2}$$

where $T_u^{(t)}$ denotes the end of $u$’s data collection process as projected in epoch $t$, $\theta_u^{(t)}$ denotes the probability that $u$ sends a payload in a connection in epoch $t$, and $m_u$ the length of $u$’s data collection process. For the special case that all forwarding parameter values $\theta_u^{(t)}$ coincide over all users $u$ and are constant over all epochs $t$, we can roughly estimate that each user $u$ collects about $\sum_{t=1}^{m_u} \theta_u^{(t)}$ payloads irrespective of the underlying connection behavior. Each payload can be aggregated to decrease the number of ratings it holds, as we describe in the following.
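As a worked instance of this rough estimate, assume a common constant forwarding parameter $\theta$ and, as an additional simplifying assumption that is not stated in the text, one connection per epoch. The numbers below are purely illustrative.

$$\sum_{t=1}^{m_u} \theta_u^{(t)} = m_u \cdot \theta, \qquad \text{for example } m_u = 8,\ \theta = 0.25 \ \Rightarrow\ 8 \cdot 0.25 = 2 \text{ payloads}.$$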

2.4. Payload aggregation

We call any method that aggregates multiple payload ratings on the same item into a single payload rating a payload aggregation method. While the payload parameter $n_{\text{payload}}$ determines the number of profiles that a payload consists of, payload aggregation summarizes the ratings in these $n_{\text{payload}}$ profiles such that there is at most one rating per distinct item. As such, payload aggregation represents a basic method to minimize data collection. The reader is referred to [20] for details on aggregation in the context of recommender systems. We present in the following the two payload aggregation methods we study in this paper.

2.4.1. Rating by most similar user (MostSim)

We follow the payload aggregation method proposed by Shokri et al. [21]. Whenever there are multiple ratings on the same

item in a payload, the sender user chooses the rating in the profile that is most similar to the profile of the receiver user. We call

this payload aggregation method MostSim. We make an example to illustrate the MostSim payload aggregation method.


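Example: With reference to the example in Fig. 2, we illustrate how user $u$ would aggregate the payload $r_{u \to v} = \begin{bmatrix} r_u \\ r_w \end{bmatrix}$ in epoch $t = 2$ before sending to user $v$ with MostSim payload aggregation. Before user $u$ aggregates the two profiles $r_u = [\, - \;\; 3 \;\; 5 \;\; - \,]$ and $r_w = [\, 1 \;\; 4 \;\; - \;\; - \,]$, $u$ first calculates the cosine similarities $\text{sim}(v, u) = 0.78$ and $\text{sim}(v, w) = 0.09$, and then aggregates them as follows:

$$\text{MostSim}\left( \begin{bmatrix} r_u \\ r_w \end{bmatrix}, \begin{bmatrix} \text{sim}(v, u) \\ \text{sim}(v, w) \end{bmatrix} \right) = \text{MostSim}\left( \begin{bmatrix} - & 3 & 5 & - \\ 1 & 4 & - & - \end{bmatrix}, \begin{bmatrix} 0.78 \\ 0.09 \end{bmatrix} \right) = \begin{bmatrix} 1 & 3 & 5 & - \end{bmatrix}. \tag{3}$$

We have reduced the number of ratings in the payload $r_{u \to v}$ from four to three ratings.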
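The example can be reproduced with a minimal sketch of MostSim. The function name `most_sim` and the encoding of missing ratings as `None` are assumptions for illustration; the similarity values are taken from the example and passed in as given.

```python
from typing import List, Optional, Sequence

Profile = Sequence[Optional[float]]  # None marks a missing rating (assumed encoding)


def most_sim(profiles: Sequence[Profile],
             similarities: Sequence[float]) -> List[Optional[float]]:
    """Per item, keep the rating from the profile most similar to the receiver."""
    n_items = len(profiles[0])
    aggregated: List[Optional[float]] = []
    for i in range(n_items):
        # (similarity to receiver, rating) for every profile that rated item i
        rated = [(sim, p[i]) for p, sim in zip(profiles, similarities)
                 if p[i] is not None]
        if rated:
            aggregated.append(max(rated, key=lambda pair: pair[0])[1])
        else:
            aggregated.append(None)  # nobody in the payload rated item i
    return aggregated


r_u = [None, 3, 5, None]
r_w = [1, 4, None, None]
payload = most_sim([r_u, r_w], [0.78, 0.09])
print(payload)  # [1, 3, 5, None]
```

Only the second item is rated in both profiles; the rating 3 from $r_u$ is kept because $u$ is the more similar user, so the four input ratings shrink to three.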

The above example illustrates how MostSim payload aggregation reduces the number of ratings in a payload if there are multiple

ratings on the same item. The example illustrates further that aggregated payloads have the shape of a single profile. Indeed,

when we interpret aggregated payloads as profiles, it is clear that we can aggregate payloads of payloads. Note that in contrast to

non-aggregated payloads in which users can relate distinct profiles in a payload to distinct users, this is not possible for aggregated

payloads. Users, therefore, relate aggregated payloads to the users who forward them instead of the users whose profiles have been aggregated (see also Line 8 in Algorithm 1 in this context). We continue with the second payload aggregation method we study.

2.4.2. Weighted averaging (WeighAvg)

Whenever there are multiple ratings on the same item in a payload, the sender user aggregates them by calculating weighted

average ratings, where weights represent cosine similarities between the profile a rating is in and the profile of the receiver user.

We call this payload aggregation method WeighAvg. We make an example to illustrate the WeighAvg payload aggregation method.

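Example: With reference to the example in Fig. 2, we illustrate how user $u$ would aggregate the payload $r_{u \to v} = \begin{bmatrix} r_u \\ r_w \end{bmatrix}$ in epoch $t = 2$ before sending to user $v$ with WeighAvg payload aggregation. Before user $u$ aggregates the two profiles $r_u = [\, - \;\; 3 \;\; 5 \;\; - \,]$ and $r_w = [\, 1 \;\; 4 \;\; - \;\; - \,]$, $u$ again calculates the cosine similarities $\text{sim}(v, u) = 0.78$ and $\text{sim}(v, w) = 0.09$ first and then aggregates them by calculating the weighted arithmetic mean for each item as follows:

$$\text{WeighAvg}\left( \begin{bmatrix} r_u \\ r_w \end{bmatrix}, \begin{bmatrix} \text{sim}(v, u) \\ \text{sim}(v, w) \end{bmatrix} \right) = \text{WeighAvg}\left( \begin{bmatrix} - & 3 & 5 & - \\ 1 & 4 & - & - \end{bmatrix}, \begin{bmatrix} 0.78 \\ 0.09 \end{bmatrix} \right) = \begin{bmatrix} 1 & 3.10 & 5 & - \end{bmatrix}. \tag{4}$$

We have again reduced the number of ratings in the payload $r_{u \to v}$ from four to three ratings.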
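The weighted mean for the second item is $(0.78 \cdot 3 + 0.09 \cdot 4) / (0.78 + 0.09) = 2.70 / 0.87 \approx 3.10$. A minimal sketch of WeighAvg along these lines is given below; the function name `weigh_avg`, the `None` encoding, and the fallback to a plain mean when all weights are zero are assumptions for illustration.

```python
from typing import List, Optional, Sequence

Profile = Sequence[Optional[float]]  # None marks a missing rating (assumed encoding)


def weigh_avg(profiles: Sequence[Profile],
              similarities: Sequence[float]) -> List[Optional[float]]:
    """Per item, average the available ratings weighted by similarity to the receiver."""
    n_items = len(profiles[0])
    aggregated: List[Optional[float]] = []
    for i in range(n_items):
        rated = [(sim, p[i]) for p, sim in zip(profiles, similarities)
                 if p[i] is not None]
        if not rated:
            aggregated.append(None)  # nobody in the payload rated item i
            continue
        total_weight = sum(sim for sim, _ in rated)
        if total_weight == 0:
            # assumed fallback: unweighted mean if all similarities are zero
            aggregated.append(sum(r for _, r in rated) / len(rated))
        else:
            aggregated.append(sum(sim * r for sim, r in rated) / total_weight)
    return aggregated


r_u = [None, 3, 5, None]
r_w = [1, 4, None, None]
payload = weigh_avg([r_u, r_w], [0.78, 0.09])
rounded = [None if r is None else round(r, 2) for r in payload]
print(rounded)
```

Items rated by only one profile keep that rating, since a weighted mean over a single value returns the value itself.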

A comparison of WeighAvg and MostSim payload aggregation shows that the reduction in the number of ratings in a payload they yield is identical. The difference between the two payload aggregation methods resides only in the values of the ratings in the aggregated payload they produce: MostSim selects one of the existing ratings, whereas WeighAvg may produce a new, non-integer rating (3 versus 3.10 in the examples above). This concludes the description of the payload aggregation methods we study. In the following, we describe data minimization in decentralized recommender systems as a distributed optimization problem.

2.5. Distributed data minimization

We define the distributed data minimization problem as a combination of a local and a global minimization problem.

Definition 1 (Distributed Data Minimization). Let $U$ be a set of users that exchange data via the Data Exchange Protocol.

Local Data Minimization: Let $\varepsilon > 0$. Let further $u \in U$ be a fixed user with data collection process $((T_u^{(t)}, \theta_u^{(t)}))_{t=1,2,\ldots,m_u}$ associated to the sequence $(R_u^{(t)})_{t=1,2,\ldots,m_u}$ of collected datasets. Then user $u$ minimizes the amount of collected data locally via

$$\min_{((T_u^{(t)},\, \theta_u^{(t)}))_{t=1,2,\ldots,m_u}} \left| R_u^{(m_u)} \right| \quad \text{such that } R_u^{(m_u)} \text{ is saturated with respect to } \varepsilon. \tag{5}$$

We call the integer $m_u$ of a solution $R_u^{*(m_u)}$ its saturation point.

Global Data Minimization: Let $R_u^{(m_u)}$ denote the final collected dataset that a user $u \in U$ collects. Then the entirety of all users $U$ minimizes the amount of collected data globally via

$$\min_{\{R_u^{(m_u)}\}_{u \in U}} \left| \{R_u^{(m_u)}\}_{u \in U} \right| \quad \text{such that all } R_u^{(m_u)} \text{ are sufficient}, \tag{6}$$

where $\{R_u^{(m_u)}\}_{u \in U}$ denotes a data distribution. We call a solution $\{R_u^{*(m_u)}\}_{u \in U}$ to the global data minimization problem a minimized data distribution. If additionally all collected datasets $R_u^{*(m_u)}$ solve their respective local data minimization problem, we call $\{R_u^{*(m_u)}\}_{u \in U}$ a minimum data distribution.

With respect to Fig. 1, we see that every user tries to solve their respective local data minimization problem by stopping data collection at their respective saturation point. They do not consider the amount of data that other users collect. Differently put, the local data minimization problem puts each user into the shoes of a centralized recommendation provider who seeks to minimize the amount of data they collect from other users as in a centralized recommender system. In other words, the local data minimization problem is the data minimization problem in a centralized recommender system. In decentralized systems, all users additionally try to solve the global data minimization problem. A solution to the global data minimization problem requires that every user passes their respective sufficiency point.

We see that the local and the global data minimization problems are competing problems and solving both problems simultaneously is in general not possible. A short explanation is that users’ sufficiency points and saturation points, in general, do not coincide. A more detailed explanation can be found in Appendix B. A comparison of the computational complexity between the data minimization problem in centralized and decentralized recommender systems can be found in Appendix C. It remains to define saturation and sufficiency.


Definition 2 (Saturation). Let $u \in U$ be a user with data collection process $((T_u^{(t)}, \theta_u^{(t)}))_{t=1,2,\ldots,m_u}$ associated to the sequence $(R_u^{(t)})_{t=1,2,\ldots,m_u}$ of collected datasets. For a fixed rating prediction method and performance metric $\sigma$, we denote by $\sigma_u(R_u^{(t)})$ the performance that user $u$ measures for the collected dataset $R_u^{(t)}$. We further denote by $\Delta\sigma_u(R_u^{(t)}, R_u^{(t-1)})$ the performance difference that user $u$ measures between the collected datasets $R_u^{(t)}$ and $R_u^{(t-1)}$, that is, between the end of epoch $t$ and the end of epoch $t - 1$. Formally:

$$\Delta\sigma_u(R_u^{(t)}, R_u^{(t-1)}) := \sigma_u(R_u^{(t)}) - \sigma_u(R_u^{(t-1)}). \tag{7}$$

We more briefly write $\Delta\sigma_u^{(t)} := \Delta\sigma_u(R_u^{(t)}, R_u^{(t-1)})$ if the underlying collected datasets $R_u^{(t)}$ and $R_u^{(t-1)}$ are clear from the context. In this setting, let $\varepsilon > 0$. Then, if the performance difference $\Delta\sigma_u^{(t)}$ satisfies

$$|\Delta\sigma_u^{(t)}| = |\sigma_u(R_u^{(t)}) - \sigma_u(R_u^{(t-1)})| \in (0, \varepsilon), \tag{8}$$

for some collected dataset $R_u^{(t)}$, we say that the performance difference $\Delta\sigma_u^{(t)}$ is marginal. In this case, we call $R_u^{(t)}$ an $\varepsilon$-saturated collected dataset, or simply saturated collected dataset if $\varepsilon$ is clear from the context.
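The saturation criterion of Eqs. (7) and (8) can be checked mechanically once a user has recorded one performance value per epoch. The sketch below assumes such a list of measurements is available; the function names and the numeric values are hypothetical.

```python
from typing import List, Optional


def is_marginal(delta_sigma: float, eps: float) -> bool:
    """Eq. (8): the absolute performance difference lies in the open interval (0, eps)."""
    return 0 < abs(delta_sigma) < eps


def first_saturated_epoch(performances: List[float], eps: float) -> Optional[int]:
    """Return the first epoch t whose collected dataset is eps-saturated.

    performances[t] is the performance measured at the end of epoch t,
    with index 0 holding the value before any data exchange.
    """
    for t in range(1, len(performances)):
        delta = performances[t] - performances[t - 1]  # Eq. (7)
        if is_marginal(delta, eps):
            return t
    return None  # no saturated dataset within the recorded epochs


# Hypothetical error-type measurements (lower is better) over three epochs
measured = [1.20, 1.00, 0.90, 0.895]
print(first_saturated_epoch(measured, eps=0.01))  # 3
```

Note that a difference of exactly zero is not marginal under Eq. (8), because the interval $(0, \varepsilon)$ excludes zero.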

Definition 3 (Sufficiency). Let $U$ and $I$ be sets of users and items, respectively, and $\{R_u\}_{u \in U}$ be a data distribution. Let further $I_u \subset I$ be the subset of items that a user $u \in U$ is interested in and $R_u \in \{R_u\}_{u \in U}$ that user’s collected dataset. We then call the collected dataset $R_u$ insufficient if none of the ratings in $R_u$ refers to any of $u$’s items of interest in $I_u$, and else call it sufficient. More generally, we call the data distribution $\{R_u\}_{u \in U}$ sufficient if all its collected datasets $R_u \in \{R_u\}_{u \in U}$ are sufficient, and else call it insufficient. Finally, we define the sufficiency coefficient $\mathcal{S}(\{R_u\}_{u \in U})$ of a data distribution $\{R_u\}_{u \in U}$ as the fraction of sufficient collected datasets in the data distribution $\{R_u\}_{u \in U}$:

$$\mathcal{S}(\{R_u\}_{u \in U}) = \frac{\left| \{ R_u \in \{R_u\}_{u \in U} \mid R_u \text{ is sufficient} \} \right|}{\left| \{R_u\}_{u \in U} \right|} \in [0,1], \tag{9}$$

where $|\{\cdot\}|$ denotes the number of elements in the set $\{\cdot\}$. Clearly, a data distribution $\{R_u\}_{u \in U}$ is sufficient if and only if its sufficiency coefficient $\mathcal{S}(\{R_u\}_{u \in U})$ equals 1.

We emphasize that whether a solution to the distributed data minimization problem exists, that is, a solution that minimizes

both the local and the global data minimization problem simultaneously, depends largely on the underlying connection behavior.

Details on when solutions to the individual problems exist with random connection behavior are given in Appendix D. To see that

a solution to neither the local nor the global problem may exist, consider the case that a user can only establish connections with

users to whom the user is not similar. In this case, the user neither forwards nor receives a payload. In the following, we describe an algorithm that we find to solve the distributed data minimization problem with random and similarity-based connection behavior.

2.6. Distributed gradient descent-based data minimization

We propose to apply Distributed Gradient Descent (DGD) data minimization, as previously proposed in [8], for use in pervasive mobile-to-mobile recommender systems. With this algorithm, users coordinate their individual data collection processes by communicating their parameter values $\theta_u^{(t)}$ and $T_u^{(t)}$ and adjusting them to each other such that their collected datasets saturate uniformly.

Definition 4 (DGD Data Minimization). Let $U$ be a set of users that collect ratings via the Data Exchange Protocol. Let further $u \in U$ be a user that establishes connections in some epoch $t$ with the users in some subset of users $V \subset U \setminus \{u\}$, and let the parameter variable $p$ be either the time horizon $T$ or the forwarding parameter $\theta$. User $u$ then updates $u$'s local parameter value $p_u$ at the end of epoch $t$ for use in epoch $t + 1$ as follows:

$$
\begin{aligned}
\Delta p_u^{(t)} &= p_u^{(t+1)} - p_u^{(t)} \\
&= \underbrace{-\alpha_p \cdot \begin{cases} 1, & \text{if } |\Delta\sigma_u^{(t)}| = 0 \\ 1, & \text{if } |\Delta\sigma_u^{(t)}| \in (0, \varepsilon) \\ 0, & \text{if } |\Delta\sigma_u^{(t)}| \in [\varepsilon, \infty) \end{cases}}_{\text{innovation term}}
+ \underbrace{\beta_p \cdot \frac{1}{|V|} \sum_{v \in V} \left(p_v^{(t)} - p_u^{(t)}\right)}_{\text{consensus term}}
+ \underbrace{\gamma_p \cdot \begin{cases} 1, & \text{if } \sigma_u^{(t)} \text{ undefined} \\ 0, & \text{else} \end{cases}}_{\text{regularization term}},
\end{aligned}
\tag{10}
$$

where $\alpha_p, \beta_p, \gamma_p \geq 0$ denote non-negative update weights specific to the parameter $p$, $V$ denotes the set of $n_{\text{connection}}$ users that $u$ establishes a connection with in epoch $t$, and $\Delta\sigma_u^{(t)}$ denotes the performance differential of user $u$ between the start and end of epoch $t$ with respect to some performance metric $\sigma$. If a parameter update would cause a parameter value to fall below or exceed its respective boundary interval $T_u \in [0, \infty)$ or $\theta_u \in [0, 1]$, the parameter value is instead set to its lower or upper bound respectively.
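The update rule in Eq. (10) can be sketched for a single parameter as follows. This is a minimal illustration, not the authors' implementation: the function name and signature are hypothetical, and two cases that Eq. (10) leaves open are filled with assumptions, namely that the innovation term is 0 while the performance is undefined and that the consensus term is 0 if no connection was established.

```python
import math
from typing import Optional, Sequence


def dgd_update(
    p_u: float,
    neighbor_values: Sequence[float],
    delta_sigma: Optional[float],
    alpha: float,
    beta: float,
    gamma: float,
    eps: float,
    lower: float = 0.0,
    upper: float = math.inf,
) -> float:
    """Return p_u^(t+1) for one parameter (T or theta) following Eq. (10).

    delta_sigma is None when the performance sigma_u^(t) is undefined.
    """
    sigma_defined = delta_sigma is not None
    # Innovation term: 1 if the performance change is zero or marginal, else 0.
    # Assumption: it is 0 while the performance is still undefined.
    innovation = 1.0 if sigma_defined and abs(delta_sigma) < eps else 0.0
    # Consensus term: mean difference to the connected users' parameter values.
    # Assumption: it is 0 if no connection was established in this epoch.
    consensus = (
        sum(p_v - p_u for p_v in neighbor_values) / len(neighbor_values)
        if neighbor_values
        else 0.0
    )
    # Regularization term: 1 while the performance is undefined, else 0.
    regularization = 0.0 if sigma_defined else 1.0
    p_next = p_u - alpha * innovation + beta * consensus + gamma * regularization
    # Clip to the boundary interval of the parameter.
    return min(max(p_next, lower), upper)


# Forwarding parameter of a saturated user connected to a user with theta = 0.5.
theta_next = dgd_update(0.3, [0.5], 0.005, alpha=0.1, beta=0.1, gamma=0.1,
                        eps=0.01, upper=1.0)
# Time horizon of a user whose performance is still undefined.
T_next = dgd_update(25.0, [25.0], None, alpha=1.0, beta=1.0, gamma=1.0, eps=0.01)
```

In the first call the innovation term lowers $\theta_u$ by 0.1 and the consensus term raises it by 0.02; in the second call only the regularization term is active and extends the time horizon by one epoch.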

Termination: Data collection terminates if at least one of the following criteria is satisfied:

(I) All individual time horizons are smaller than the current epoch ($\max_u T_u^{(t)} < t$).

(II) All individual forwarding parameters are zero ($\max_u \theta_u = 0$).

Criterion (I) describes the situation that all users’ data collection processes have ended and no user seeks further connections.

Criterion (II) describes the situation that users do seek connections to other users, yet do not exchange payloads in any such

connection.
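The two termination criteria can be checked as in the following minimal sketch. It assumes a simulation setting in which all users' parameter values are available in dictionaries; the function and argument names are illustrative.

```python
from typing import Dict


def data_collection_terminated(
    time_horizons: Dict[str, float],
    forwarding_params: Dict[str, float],
    epoch: int,
) -> bool:
    """Check termination criteria (I) and (II) over all users."""
    # (I) Every user's time horizon lies before the current epoch.
    all_horizons_passed = max(time_horizons.values()) < epoch
    # (II) No user forwards payloads anymore.
    all_forwarding_zero = max(forwarding_params.values()) == 0
    return all_horizons_passed or all_forwarding_zero


# Criterion (II) holds although user v would still establish connections.
terminated = data_collection_terminated(
    {"u": 10, "v": 40}, {"u": 0.0, "v": 0.0}, epoch=12
)
```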

This concludes the presentation of the methods we use. In the following section, we establish a link between data minimization

and anonymity in the context of data minimization in general and distributed data minimization in pervasive mobile-to-mobile

recommender systems in particular.

3. Theory

The goal of this section is to establish a link between data minimization and anonymity. To do so, we follow Pfitzmann and Hansen's [22] working document on the disambiguation of anonymity in the context of data minimization and adapt their definitions to the context of pervasive mobile-to-mobile recommender systems. We begin with a general characterization of data minimization.

3.1. A general characterization of data minimization

Pfitzmann and Hansen [22] characterize data minimization as the following nested structure of minimization objectives.

(1.) Minimize the possibility to collect personal data.

(2.) Minimize the collection of personal data.

(3.) Minimize the time that collected personal data are stored.

These three minimization objectives are nested in the sense that they are not to be minimized simultaneously but in sequence: Objective (1.) first, then Objective (2.), and finally Objective (3.). Since the three minimization objectives are formulated for information systems in general, it is meaningful to discuss them in the context of pervasive mobile-to-mobile recommender systems.

In a pervasive mobile-to-mobile recommender system, users establish ad hoc wireless connections in physical proximity to

exchange ratings with each other. In view of Objective (1.), we see that the possibility to collect personal data depends on the

mobility of users and their pairwise proximity to each other. Since we model the mobility of users as the random sampling of edges

from a graph, we can quantify the possibility to collect personal data in terms of the number of edges that are sampled, which

represents the number of connections that are established. Since we assume that every user establishes a single connection per

epoch, we address Objective (1.) by having every user 𝑢minimize their time horizon 𝑇𝑢, that is, the number of epochs in which 𝑢

establishes a connection with another user. On the basis of this minimized amount of connections, we then minimize the collection

of personal data in Objective (2.).

The collection of personal data hinges on the amount of ratings that are exchanged in each connection. The Data Exchange

Protocol foresees the exchange of ratings in two respects. First, in the Similarity Calculation Phase (𝑖), connected users exchange

their own respective profile ratings to calculate the cosine similarity. Only if the cosine similarity exceeds a similarity threshold,

users exchange the previously collected ratings of other users in the consecutive Payload Forwarding Phase (𝑖𝑖). Recall that the

forwarding parameter 𝜃𝑢quantifies the probability that a user 𝑢enters the Payload Forwarding Phase (𝑖𝑖) and sends a payload in

a connection. Recall further that the payload parameter $n_{\text{payload}}$ quantifies the number of profiles in that payload. We see that the expected number of profiles forwarded in payloads is $\sum_{u \in U} \sum_{t=1}^{T_u} \theta_u \cdot n_{\text{payload}}$. Since we already minimized the time horizon $T_u$ in Objective (1.) and $n_{\text{payload}}$ is a constant, we address Objective (2.) by having every user $u$ minimize their forwarding parameter $\theta_u$. On the basis of this minimized amount of forwarded payloads, we finally minimize the time that collected personal data are stored.
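The following minimal sketch evaluates the double sum over users and epochs for constant per-user parameters, in which case the inner sum reduces to $T_u \cdot \theta_u \cdot n_{\text{payload}}$. The function name and the example values are made up for illustration.

```python
from typing import Dict


def expected_forwarded_profiles(
    time_horizons: Dict[str, int],
    forwarding_params: Dict[str, float],
    n_payload: int,
) -> float:
    """Sum over users of T_u * theta_u * n_payload (theta_u constant over epochs)."""
    return sum(
        time_horizons[u] * forwarding_params[u] * n_payload for u in time_horizons
    )


# Two users with different time horizons and forwarding parameters.
total = expected_forwarded_profiles({"u": 25, "v": 50}, {"u": 0.1, "v": 0.2}, 3)
```

Here user $u$ contributes $25 \cdot 0.1 \cdot 3 = 7.5$ and user $v$ contributes $50 \cdot 0.2 \cdot 3 = 30$ profiles in expectation, which shows that lowering either $T_u$ or $\theta_u$ lowers the total.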

The time that collected personal data are stored depends on the way in which recommendations are derived. If we assume

that each user trains a collaborative filtering model on their individually collected ratings on their mobile computing device, it

makes sense to store them until their data collection process terminates. After termination of the data collection process, a user

calculates recommendations on the collected data. After recommendations have been calculated, the collected ratings do not serve

their purpose anymore and can thus be deleted. With respect to the representation of individual users’ data collection processes as

in Eq. (2), we can reformulate the general Objectives (1.), (2.), and (3.) in the context of a pervasive mobile-to-mobile recommender

system for each user 𝑢as follows:

(1.’) Minimize the time horizon 𝑇𝑢.

(2.’) Minimize the forwarding parameter 𝜃𝑢.

(3.’) Delete all collected ratings after epoch 𝑇𝑢and after you have calculated recommendations.

We now see perhaps more clearly that, from the perspective of a single user, the distributed data minimization problem represents

a minimization problem on the parameter space spanned by the time horizon 𝑇𝑢and forwarding parameter 𝜃𝑢. From the perspective

of the system as a whole, it represents a minimization problem over the Cartesian product over all these parameter spaces.

So far, we only considered the minimization of ratings irrespective of whether they are the ratings by the user who forwards

them or those of a user who is not involved in a connection. We make this distinction in the following section in the context of a

discussion on anonymity.

3.2. Anonymity in terms of unlinkability

Any discussion of anonymity first requires defining the information of interest that is to be protected. In the context of a pervasive mobile-to-mobile recommender system as described throughout Section 2, we consider two types of information of interest, namely identifiers and ratings. Since we wish to characterize anonymity in terms of linkability, we need to define linkability first. To do so, we begin by adapting the definition of linkability as per Pfitzmann and Hansen [22] to identifiers and ratings.

Definition 5 (Linkability).Linkability of two or more identifiers and/or ratings from an attacker’s perspective means that the attacker

can sufficiently distinguish whether they are related or not.

In the context of the above definition, we may think of attackers as other users in the system or by-standers who are present

in physical proximity of users yet do not participate in the system as users themselves. As identifiers, we may think of for instance

network identifiers, application identifiers, and device identifiers which are used by users’ mobile computing devices to establish

an ad hoc wireless connection. For linguistic simplicity, we subsume all these different types of identifiers together under the term

identifier. For notational simplicity, we use the symbol 𝑢∈𝑈to denote both a user as a person as well as their respective identifier

and always say whether we mean the user themselves or their identifier.

If a user always uses the same identifier, it is clear that other users can link distinct connections they establish with the same

user. We say that a user makes use of a persistent identifier. If users otherwise change their network identifiers between connections,

we say that they use changing identifiers. Making use of changing identifiers makes it more difficult for attackers to link the identifiers

and ratings of a user. It remains to describe how linkability relates to anonymity. In order to do so, we follow again the terminology

as per Pfitzmann and Hansen [22] and adapt their definition of sender anonymity.

Definition 6 (Anonymity (in Terms of Linkability)).We say that a user 𝑢∈𝑈who communicates using the identifier 𝑢is anonymous

from an attacker’s perspective if the attacker cannot sufficiently distinguish between the identifier 𝑢and any other identifier

(anonymity w.r.t. identifiers). Furthermore, we say that a user 𝑢communicates a rating 𝑟𝑢,𝑖 anonymously from an attacker’s

perspective if the attacker cannot sufficiently distinguish between the rating 𝑟𝑢,𝑖 and any other rating (anonymity w.r.t. ratings).

By the above definition, any two users who establish a connection can never be anonymous toward each other, neither with

respect to their identifiers nor their ratings. Only users who do not establish a connection can potentially remain anonymous toward

each other. We give a short example to illustrate this.

Example: We revisit the example in Fig. 2 in which user 𝑢establishes a connection 𝑢↔𝑤to 𝑤in epoch 𝑡= 1 and a connection

𝑢↔𝑣to 𝑣in epoch 𝑡= 2. Since connections are established in physical proximity and payloads are exchanged immediately between

connected users, user 𝑢can distinguish user 𝑤from user 𝑣on, for instance, the basis of when and where the two distinct connections

were established. We see that 𝑣and 𝑤cannot be anonymous toward 𝑢with respect to their identifiers. Since 𝑣and 𝑤communicate

their profiles 𝑟𝑣and 𝑟𝑤respectively in the Similarity Calculation Phase (𝑖), we see further that 𝑣and 𝑤are also not anonymous

toward 𝑢with respect to their ratings. The users 𝑣and 𝑤themselves, however, can potentially be anonymous toward each other,

since they do not establish a connection. We examine this question in the following.

When user $u$ forwards $w$'s profile $r_w$ in the payload $r_{u \to v} = \begin{bmatrix} r_u \\ r_w \end{bmatrix}$ to user $v$ in epoch $t = 2$, we see that $v$ can immediately distinguish between $u$'s profile $r_u$ and $w$'s profile $r_w$. In this case, $w$ is trivially not anonymous toward $v$ with respect to $w$'s ratings, since the non-aggregated payload allows $v$ to distinguish whether ratings are by the same or by different users. Since the profile $r_w$ is marked with the identifier $w$, user $w$ is trivially not anonymous toward user $v$ with respect to $w$'s identifier. The situation is different if $u$ aggregates the payload through, for instance, WeighAvg into the aggregated payload $r_{u \to v} = \begin{bmatrix} 1 & 3.10 & 5 & - \end{bmatrix}$ before sending it to $v$ (see Eq. (4)). In this case, user $w$ is anonymous toward user $v$ with respect to $w$'s identifier since the payload $r_{u \to v}$ is not marked with user $w$'s identifier. User $w$ is additionally anonymous toward $v$ with respect to $w$'s ratings, since the aggregated payload itself does not allow $v$ to tell apart the underlying profiles $r_u$ and $r_w$ that have been used for aggregation. Last but not least, we point out that $v$ is trivially anonymous toward $w$, since $w$ neither receives $v$'s identifier nor any of $v$'s ratings.
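To illustrate why an aggregated payload hides the underlying profiles, the following sketch aggregates two profiles into a single row by a weighted average per item. Eq. (4) is not restated in this section, so the weighting used here (one non-negative weight per profile, renormalized over the profiles that rated an item) is an assumption, and the profiles and weights are made up rather than taken from Fig. 2.

```python
from typing import List, Optional, Sequence

Profile = Sequence[Optional[float]]  # one entry per item, None if unrated


def weighted_average_payload(
    profiles: Sequence[Profile], weights: Sequence[float]
) -> List[Optional[float]]:
    """Aggregate profiles item by item into one row without identifiers."""
    n_items = len(profiles[0])
    aggregated: List[Optional[float]] = []
    for i in range(n_items):
        # Only profiles that rated item i contribute to its aggregate.
        rated = [(p[i], w) for p, w in zip(profiles, weights) if p[i] is not None]
        total_weight = sum(w for _, w in rated)
        if not rated or total_weight == 0:
            aggregated.append(None)  # no profile rated this item
        else:
            aggregated.append(sum(r * w for r, w in rated) / total_weight)
    return aggregated


# Hypothetical profiles of u and w over four items and hypothetical weights.
r_u = [1.0, 4.0, None, None]
r_w = [None, 2.0, 5.0, None]
payload = weighted_average_payload([r_u, r_w], weights=[3.0, 1.0])
```

The receiver only sees one row of aggregated values. For the second item, the value 3.5 could result from many different pairs of ratings and weights, which is what makes relating ratings to individual users harder.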

The above example presents the use of changing identifiers and payload aggregation as methods to retain the anonymity between

users who do not establish an ad hoc wireless connection in physical proximity. We emphasize, however, that anonymity is never

absolute. With respect to the above example, if 𝑣knows, for instance, the order in which the two connections 𝑢↔𝑤and 𝑢↔𝑣

happen and the underlying payload aggregation method that 𝑢uses, it is possible for 𝑣to tell the existence of 𝑤and most likely

also to distinguish between 𝑢’s and 𝑤’s ratings in the aggregated payload 𝑟𝑢→𝑣. In other words, the more connections users establish

and the more data they communicate in these connections, the easier it becomes for an attacker to relate identifiers and ratings to

each other.

The two data minimization objectives (1.’) and (2.’) that we formulated in the previous section implicitly aim to counteract the

possibility that an attacker can relate identifiers and ratings to each other. They do so by (1.’) minimizing the number of connections

that are established and (2.’) minimizing the number of ratings that are communicated over these connections. It is in this sense

that data minimization and anonymity are related to each other. In the following section, we present results on the minimization of

data collection according to Objectives (1.’) and (2.’) as well as the effect of users making use of changing identifiers and payload

aggregation.

4. Results

We first show how DGD data minimization affects the parameters 𝑇𝑢and 𝜃𝑢. We then demonstrate that DGD data minimization

consistently produces sufficient and minimized data distributions.

4.1. Dataset and experimental setup

We present results on a stratified sample of the MovieLens ml-25m dataset [23], which is a widely used benchmark dataset in the

recommender systems literature. The sample comprises 500 profiles that consist of 77,588 ratings on 9816 movies. We split ratings

into training and test sets per user at a ratio of 80 to 20. The training set comprises 62,073 ratings with profiles holding between

16 and 2726 ratings. We report results as means over five distinct splits into training and test data.

We simulate data collection on the training ratings following the Data Exchange Protocol as described in Section 2.3. Since each

user establishes a single connection per epoch, we see that every user establishes 𝑡connections until epoch 𝑡. We set 𝑛payload = 3

constant such that the number of profiles per payload remains constant over time, and we can expect that the average amount of

ratings in a payload remains constant over time.

We assess recommendation performance on models trained on every user’s individually collected training ratings and evaluated

on their test ratings. We train models with the help of the Cornac library [24]. We report results on the basis of the Singular Value

Decomposition (SVD) collaborative filtering algorithm, which we find to outperform, for instance, user-based collaborative filtering,

item-based collaborative filtering, and Matrix Factorization consistently. We thereby follow prior works that report the suitability

of SVD in the context of data minimization in recommender systems [3–5].

With respect to DGD data minimization, we experimented with distinct combinations of update weights 𝛼𝑇, 𝛽𝑇, 𝛾𝑇∈ {1,2,3,5}

for the time horizon 𝑇𝑢and 𝛼𝜃, 𝛽𝜃, 𝛾𝜃∈ {0.1,0.2,0.3} for the forwarding parameter 𝜃𝑢. For small update weights 𝛽𝑇= 1 and 𝛽𝜃= 0.1,

results differed only marginally in the amount of collected data and performance. However, larger update weights rendered DGD

data minimization unstable; that is, data distributions tended to become insufficient, and the data minimization procedure sometimes

did not terminate. For simplicity of presentation, we thus only report results for 𝛼𝑇=𝛽𝑇=𝛾𝑇= 1 and 𝛼𝜃=𝛽𝜃=𝛾𝜃= 0.1. Finally,

we consider performance changes 𝛥𝜎𝑢in the interval (0,0.01) marginal, that is, we set the performance threshold 𝜀= 0.01.

4.2. Metrics

We measure the recommendation performance with respect to the Root Mean Squared Error (RMSE) metric. The RMSE is a

standard performance metric in the recommender systems literature.

Let $u$ be a user with training ratings $r_u$, test ratings $r_u|_{\text{test}}$, and collected dataset $R_u$. On the basis of the collected dataset $R_u$, user $u$ derives a predicted rating $\hat{r}_{u,i}$ for every test rating $r_{u,i} \in r_u|_{\text{test}}$. Note that $u$ may not be able to derive predicted ratings for all test ratings. For instance, user $u$ cannot predict a rating for an item if $u$ has not collected any rating on that item. We thus denote by the subset $\tilde{r}_u|_{\text{test}} \subset r_u|_{\text{test}}$ the test ratings for which user $u$ can derive a predicted rating. Then, we measure the error between the predicted test ratings and the test ratings as follows:

$$\mathrm{RMSE}_u(R_u) = \left( \frac{1}{|\tilde{r}_u|_{\text{test}}|} \sum_{r_{u,i} \in \tilde{r}_u|_{\text{test}}} (\hat{r}_{u,i} - r_{u,i})^2 \right)^{1/2}, \tag{11}$$

where $|\tilde{r}_u|_{\text{test}}|$ denotes the number of ratings in $\tilde{r}_u|_{\text{test}}$. If user $u$ cannot derive a predicted rating for any test rating ($\tilde{r}_u|_{\text{test}} = \emptyset$), we say that $\mathrm{RMSE}_u$ is undefined. We denote by RMSE the average over all $\mathrm{RMSE}_u$ that are defined.
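The per-user RMSE with its undefined case, and the average over defined values, can be sketched as follows. This is an illustration only; it does not use the Cornac evaluation pipeline, and the dictionary layout (item id to rating) and function names are assumptions.

```python
import math
from typing import Dict, Optional, Sequence


def rmse_u(
    test_ratings: Dict[str, float], predictions: Dict[str, float]
) -> Optional[float]:
    """RMSE over the test ratings for which a prediction exists.

    Returns None if no test rating can be predicted (RMSE_u undefined).
    """
    predictable = [item for item in test_ratings if item in predictions]
    if not predictable:
        return None
    squared_error = sum(
        (predictions[item] - test_ratings[item]) ** 2 for item in predictable
    )
    return math.sqrt(squared_error / len(predictable))


def mean_rmse(per_user: Sequence[Optional[float]]) -> Optional[float]:
    """Average over all per-user RMSE values that are defined."""
    defined = [value for value in per_user if value is not None]
    return sum(defined) / len(defined) if defined else None


# Item i3 has no prediction and is therefore excluded from the error.
value = rmse_u({"i1": 4.0, "i2": 2.0, "i3": 5.0}, {"i1": 3.0, "i2": 4.0})
undefined = rmse_u({"i1": 4.0}, {})
```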

We propose the following redundancy metric to measure the number of ratings in a data distribution $\{R_u\}_{u \in U}$ for use in decentralized recommender systems:

$$\operatorname{redundancy}(\{R_u\}_{u \in U}) = \frac{\sum_{u \in U} |R_u|}{|R|} \in [1, \infty). \tag{12}$$

The redundancy metric quantifies the amount of ratings in the system as multiples of the number of ratings in the user-item matrix $R$. In centralized systems, the amount of ratings is often quantified as a fraction of the ratings in the user-item matrix $R$ in the interval $[0, 1]$ (see for instance [4,5]). The redundancy metric extends this metric to decentralized systems.
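As a minimal sketch of Eq. (12), the redundancy is the total size of all collected datasets divided by the number of ratings in the user-item matrix. The `(user, item, value)` rating layout and the example data are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Rating = Tuple[str, str, float]  # hypothetical (user id, item id, value) layout


def redundancy(
    distribution: Dict[str, Set[Rating]], n_matrix_ratings: int
) -> float:
    """Number of collected ratings in multiples of the user-item matrix size."""
    return sum(len(collected) for collected in distribution.values()) / n_matrix_ratings


# The user-item matrix holds four ratings; each user holds their own two
# ratings plus one rating collected from the other user.
own_u = {("u", "i1", 4.0), ("u", "i2", 3.0)}
own_w = {("w", "i1", 5.0), ("w", "i3", 2.0)}
distribution = {
    "u": own_u | {("w", "i1", 5.0)},
    "w": own_w | {("u", "i2", 3.0)},
}
value = redundancy(distribution, n_matrix_ratings=4)
```

The result of 1.5 means that the system as a whole stores one and a half copies of the user-item matrix. A value of 1 corresponds to every user holding only their own profile.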

4.3. Baselines

We compare DGD data minimization against five baselines. The first two baselines are naive baselines that feature static parameters 𝑇𝑢 and 𝜃𝑢. They are naive in the sense that they assume that optimal constant parameter values exist for every user and that those parameter values are known prior to data collection. The remaining three baselines feature dynamic parameters 𝑇𝑢 and 𝜃𝑢. In contrast to the naive baselines, these baselines do not assume that constant optimal parameter values exist. Instead, parameter values need to be dynamically updated during data collection.

Fig. 3. Mean time horizon 𝑇 and mean forwarding parameter 𝜃 over time for Distributed Gradient Descent (DGD) data minimization and data minimization baselines with dynamic parameter values. ▶ and ◀ mark the beginning and end of data collection respectively. Error bars indicate deviations of one standard deviation from the mean.

Static-Global (Static-G): All users employ the same constant parameter values 𝑇𝑢and 𝜃𝑢. They employ the same global similarity

threshold 𝑡instead of individual similarity thresholds 𝑡𝑢. Formally: 𝑡=𝐹(1−𝜃), where 𝐹denotes the cumulative distribution function

of pairwise similarities between users.

Static-Individual (Static-I): All users employ the same constant parameter values 𝑇𝑢and 𝜃𝑢. In contrast to Static-Global, users

employ individual similarity thresholds 𝑡𝑢. Formally: 𝑡𝑢=𝐹𝑢(1−𝜃𝑢), where 𝐹𝑢denotes the cumulative distribution function of pairwise

similarities between user 𝑢and all other users.

Dynamic-Centralized (Dyn-C): All users employ the same constant parameter values $T_u = T$ and $\theta_u$ initially. Users stop data collection as soon as they reach their saturation point. Formally: They set their time horizon to the current epoch ($T_u = t$) as soon as their performance differentials become marginal ($|\Delta\sigma_u^{(t)}| \in (0, \varepsilon)$). We call this baseline centralized, since users minimize data collection individually as if they were a centralized system. In particular, users minimize data collection independently from other users.

Dynamic-Regularization (Dyn-R): All users employ the same constant parameter values $T_u$ and $\theta_u$ initially. Users update their individual parameter values between epochs only with respect to the regularization term in Eq. (10). Formally: Users apply DGD data minimization with $\alpha_T = \alpha_\theta = \beta_T = \beta_\theta = 0$.

Dynamic-Regularization&Innovation (Dyn-R&I): All users employ the same constant parameter values $T_u$ and $\theta_u$ initially. In contrast to Dyn-R, users not only update their individual parameter values with respect to the regularization term but also with respect to the innovation term in Eq. (10). Formally: Users apply DGD data minimization with $\beta_T = \beta_\theta = 0$.
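Since Dyn-R and Dyn-R&I are defined as DGD data minimization with some update weights set to zero, they can be expressed as weight configurations. The sketch below uses the DGD weights reported in Section 4.1; the class and function names are hypothetical, and it is an assumption that the baselines keep the remaining weights at the DGD values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UpdateWeights:
    alpha: float  # weight of the innovation term
    beta: float   # weight of the consensus term
    gamma: float  # weight of the regularization term


# DGD weights as reported in Section 4.1.
DGD_T = UpdateWeights(alpha=1.0, beta=1.0, gamma=1.0)
DGD_THETA = UpdateWeights(alpha=0.1, beta=0.1, gamma=0.1)


def dyn_r(weights: UpdateWeights) -> UpdateWeights:
    """Dyn-R: regularization term only (alpha = beta = 0)."""
    return replace(weights, alpha=0.0, beta=0.0)


def dyn_r_and_i(weights: UpdateWeights) -> UpdateWeights:
    """Dyn-R&I: regularization and innovation terms (beta = 0)."""
    return replace(weights, beta=0.0)


DYN_R_THETA = dyn_r(DGD_THETA)
DYN_RI_THETA = dyn_r_and_i(DGD_THETA)
```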

We see that users can control the amount of data they collect by controlling their local parameters 𝑇𝑢and 𝜃𝑢. Users increase their

parameter values to increase data collection and decrease their parameter values to decrease or terminate it. Before we study the

data distributions that result from users controlling their parameters 𝑇𝑢 and 𝜃𝑢, we first shed light on the parameters themselves.

4.4. Data collection parameters

Fig. 3 shows the behavior of the mean time horizon $T^{(t)} := \frac{1}{n_{\text{user}}} \sum_{u \in U} T_u^{(t)}$ and mean forwarding parameter $\theta^{(t)} := \frac{1}{n_{\text{user}}} \sum_{u \in U} \theta_u^{(t)}$ for DGD and the dynamic baselines Dyn-C, Dyn-R, and Dyn-R&I over epochs $t$. We omit showing mean parameters for the static baselines Static-G and Static-I, because they are constant over time.

For DGD, both 𝑇 and 𝜃 increase initially until they reach a maximum almost simultaneously. Past the maximum, both mean parameters decrease until 𝜃 drops to zero and data collection terminates. Dyn-R&I follows the same pattern except that it exhibits a milder decrease in the forwarding parameter than DGD. Recall that the parameter update formulas for DGD and Dyn-R&I differ only in DGD having the consensus term (see Section 4.3). Data collection terminates quicker for DGD than for Dyn-R&I due to the consensus term in the update formula.

Table 1
Pairs of redundancies and sufficiency coefficients, given as redundancy (sufficiency coefficient), for Distributed Gradient Descent data minimization (DGD) and the five baselines for distinct combinations of starting parameter values 𝑇𝑢 and 𝜃𝑢 for all users 𝑢 ∈ 𝑈 (𝑛payload = 3). Redundancies are crossed out if the underlying data distribution is insufficient (sufficiency coefficient ≠ 100%). Asterisks mark statistical significance of the lowest redundancy under sufficiency per row under a two-sided Welch’s t-test with Bonferroni correction and significance level 𝛼 = 0.01.

| Connection behavior | 𝑇𝑢 | 𝜃𝑢 | Static-G | Static-I | Dyn-C | Dyn-R | Dyn-R&I | DGD |
|---|---|---|---|---|---|---|---|---|
| Random | 25 | 0.0 | – | – | – | 53.18 (100%) | 13.38 (100%) | 10.92∗ (100%) |
| Random | 25 | 0.1 | 10.94 (67.9%) | 10.45 (93.2%) | 8.58 (93.3%) | 52.81 (100%) | 13.40 (100%) | 10.64∗ (100%) |
| Random | 25 | 0.2 | 19.55 (83.8%) | 21.73 (99.5%) | 16.61 (99.5%) | 54.89 (100%) | 13.82 (100%) | 11.57∗ (100%) |
| Random | 25 | 0.3 | 28.15 (92.4%) | 31.41 (100%) | 21.74 (100%) | 57.47 (100%) | 15.36 (100%) | 13.07∗ (100%) |
| Random | 50 | 0.0 | – | – | – | 60.70 (100%) | 13.19 (100%) | 10.93∗ (100%) |
| Random | 50 | 0.1 | 19.12 (75.7%) | 21.12 (99.6%) | 16.17 (99.6%) | 60.40 (100%) | 13.25 (100%) | 10.64∗ (100%) |
| Random | 50 | 0.2 | 29.92 (88.9%) | 38.04 (100%) | 23.16 (100%) | 61.26 (100%) | 13.69 (100%) | 11.56∗ (100%) |
| Random | 50 | 0.3 | 39.49 (95.6%) | 47.25 (100%) | 25.20 (100%) | 63.30 (100%) | 15.28 (100%) | 13.08∗ (100%) |
| Random | 75 | 0.0 | – | – | – | 66.49 (100%) | 13.19 (100%) | 10.93∗ (100%) |
| Random | 75 | 0.1 | 23.07 (79.5%) | 28.39 (100%) | 19.66 (100%) | 66.16 (100%) | 13.25 (100%) | 10.64∗ (100%) |
| Random | 75 | 0.2 | 34.21 (91.8%) | 44.93 (100%) | 24.05 (100%) | 66.80 (100%) | 13.75 (100%) | 11.56∗ (100%) |
| Random | 75 | 0.3 | 44.20 (96.7%) | 53.39 (100%) | 25.47 (100%) | 68.64 (100%) | 15.28 (100%) | 13.08∗ (100%) |
| Random | RandInt(25,75) | Rand(0.1,0.3) | – | 36.33 (99.3%) | 20.41 (99.9%) | 60.10 (100%) | 13.58 (100%) | 10.90∗ (100%) |
| Similarity-based | 25 | 0.0 | – | – | – | 52.83 (100%) | 11.90 (100%) | 10.06∗ (100%) |
| Similarity-based | 25 | 0.1 | 10.64 (71.2%) | 10.44 (94.6%) | 9.55 (95.6%) | 52.45 (100%) | 11.95 (100%) | 10.01∗ (100%) |
| Similarity-based | 25 | 0.2 | 20.52 (87.7%) | 24.87 (99.6%) | 18.98 (99.9%) | 57.75 (100%) | 13.95 (100%) | 11.99∗ (100%) |
| Similarity-based | 25 | 0.3 | 30.66 (94.6%) | 36.05 (100.0%) | 25.32 (100%) | 61.31 (100%) | 15.77 (100%) | 14.12∗ (100%) |
| Similarity-based | 50 | 0.0 | – | – | – | 60.75 (100%) | 11.83 (100%) | 10.07∗ (100%) |
| Similarity-based | 50 | 0.1 | 18.67 (78.2%) | 22.25 (99.6%) | 16.85 (99.8%) | 60.02 (100%) | 11.71 (100%) | 10.00∗ (100%) |
| Similarity-based | 50 | 0.2 | 31.26 (91.8%) | 41.60 (100%) | 24.33 (100%) | 64.92 (100%) | 13.95 (100%) | 12.00∗ (100%) |
| Similarity-based | 50 | 0.3 | 42.40 (97.1%) | 52.08 (100%) | 28.07 (100%) | 68.43 (100%) | 15.83 (100%) | 14.11∗ (100%) |
| Similarity-based | 75 | 0.0 | – | – | – | 66.46 (100%) | 11.83 (100%) | 10.07∗ (100%) |
| Similarity-based | 75 | 0.1 | 22.82 (82.1%) | 29.84 (100%) | 19.51 (100%) | 65.51 (100%) | 11.71 (100%) | 10.00∗ (100%) |
| Similarity-based | 75 | 0.2 | 35.85 (93.6%) | 48.73 (100%) | 24.97 (100%) | 70.16 (100%) | 13.81 (100%) | 12.00∗ (100%) |
| Similarity-based | 75 | 0.3 | 47.16 (98.0%) | 58.07 (100%) | 28.21 (100%) | 73.35 (100%) | 15.83 (100%) | 14.11∗ (100%) |
| Similarity-based | RandInt(25,75) | Rand(0.1,0.3) | – | 39.79 (99.8%) | 21.37 (99.9%) | 63.01 (100%) | 13.76 (100%) | 11.90∗ (100%) |

In contrast to DGD and Dyn-R&I, both 𝑇and 𝜃increase initially yet do not decrease for Dyn-R. Recall that the parameter update

formula of Dyn-R only features the regularization term, while that of Dyn-R&I features both regularization and innovation terms.

We thus attribute the initial increase in mean parameters to the regularization term and the subsequent decrease to the innovation

term. At the beginning of data collection, users only have their own profile, which is an insufficient (RMSE𝑢undefined) amount

of rating data. The regularization term thus causes mean parameter values to grow. As users collect rating data over time, the

number of users with an insufficient amount of ratings decreases and the number of users with a sufficient (RMSE𝑢defined) amount

of ratings increases. Past the maximum, the decrease in mean parameter values is caused by the innovation term. Users with a

sufficient amount of data begin to outweigh users with an insufficient amount.

For Dyn-C, mean parameters do not increase initially as for Dyn-R, Dyn-R&I, and DGD. In particular, the mean forwarding parameter $\theta$ remains constant. The mean time horizon $T$ decreases when users stop establishing connections with other users as soon as their performance differentials become marginal ($|\Delta\sigma_u^{(t)}| \in (0, \varepsilon)$). Observe the high variance in mean time horizons $T$. The variance is high since users do not coordinate data collection and thus do not agree on when they uniformly end their data collection processes. Now that we have described the way users collect data in terms of the mean parameters $\theta$ and $T$, we measure in the following the amount of data that are collected.

4.5. Data minimization results

Table 1 shows pairs of redundancy and sufficiency coefficient for various starting parameter values 𝑇𝑢 and 𝜃𝑢. Observe that DGD consistently produces sufficient (sufficiency coefficient = 100%) data distributions of the lowest redundancy. This holds across both random and similarity-based connection behavior and also for the case that users apply random, and thus not identical, starting parameter values. Redundancies produced by DGD are significantly lower compared to the redundancies of sufficient data distributions produced by the baselines. However, Static-G, Static-I, and Dyn-C produce lower redundancies than DGD, although only for insufficient data distributions.

The static baselines Static-G and Static-I produce the data distributions with the lowest sufficiency coefficients comparatively. Notably, when users apply the same global similarity threshold as in Static-G, the system as a whole struggles to achieve sufficiency. This indicates that the use of a global similarity threshold is not recommendable in view of data minimization. We conclude more generally that the use of static parameter values is not suitable for distributed data minimization. This makes sense intuitively, as dynamic parameters allow users to adjust their data collection to the current situation. Users may, for instance, adjust their parameter values on the basis of other users’ parameter values or due to a change in connection behavior.

Table 2
Comparison of redundancy–sufficiency pairs, given as redundancy (sufficiency coefficient), when users do or do not make use of changing identifiers and payload aggregation MostSim or WeighAvg in combination with Distributed Gradient Descent data minimization (DGD) for distinct combinations of starting parameter values 𝑇𝑢 and 𝜃𝑢 for all users 𝑢 ∈ 𝑈 (𝑛payload = 3). All combinations produce sufficient data distributions (sufficiency coefficient = 100%). Pairwise two-sided Welch’s t-tests with Bonferroni correction and significance level 𝛼 = 0.01 yield that none of the combinations significantly outperforms all others.

| Connection behavior | 𝑇𝑢 | 𝜃𝑢 | Persistent identifiers: DGD | Persistent identifiers: DGD[MostSim] | Persistent identifiers: DGD[WeighAvg] | Changing identifiers: DGD | Changing identifiers: DGD[MostSim] | Changing identifiers: DGD[WeighAvg] |
|---|---|---|---|---|---|---|---|---|
| Random | 25 | 0.0 | 10.92 (100%) | 5.25 (100%) | 5.25 (100%) | 10.90 (100%) | 5.24 (100%) | 5.24 (100%) |
| Random | 25 | 0.1 | 10.64 (100%) | 5.27 (100%) | 5.26 (100%) | 10.66 (100%) | 5.27 (100%) | 5.26 (100%) |
| Random | 25 | 0.2 | 11.57 (100%) | 5.41 (100%) | 5.41 (100%) | 11.59 (100%) | 5.41 (100%) | 5.41 (100%) |
| Random | 25 | 0.3 | 13.07 (100%) | 5.91 (100%) | 5.91 (100%) | 13.07 (100%) | 5.91 (100%) | 5.91 (100%) |
| Random | 50 | 0.0 | 10.93 (100%) | 5.25 (100%) | 5.24 (100%) | 10.91 (100%) | 5.23 (100%) | 5.23 (100%) |
| Random | 50 | 0.1 | 10.64 (100%) | 5.26 (100%) | 5.25 (100%) | 10.65 (100%) | 5.26 (100%) | 5.25 (100%) |
| Random | 50 | 0.2 | 11.56 (100%) | 5.41 (100%) | 5.41 (100%) | 11.59 (100%) | 5.41 (100%) | 5.41 (100%) |
| Random | 50 | 0.3 | 13.08 (100%) | 5.91 (100%) | 5.91 (100%) | 13.07 (100%) | 5.91 (100%) | 5.90 (100%) |
| Random | 75 | 0.0 | 10.93 (100%) | 5.25 (100%) | 5.24 (100%) | 10.91 (100%) | 5.23 (100%) | 5.23 (100%) |
| Random | 75 | 0.1 | 10.64 (100%) | 5.26 (100%) | 5.25 (100%) | 10.65 (100%) | 5.26 (100%) | 5.25 (100%) |
| Random | 75 | 0.2 | 11.56 (100%) | 5.41 (100%) | 5.41 (100%) | 11.59 (100%) | 5.41 (100%) | 5.41 (100%) |
| Random | 75 | 0.3 | 13.08 (100%) | 5.91 (100%) | 5.91 (100%) | 13.07 (100%) | 5.91 (100%) | 5.90 (100%) |
| Random | RandInt(25,75) | Rand(0.1,0.3) | 10.90 (100%) | 5.25 (100%) | 5.23 (100%) | 10.91 (100%) | 5.24 (100%) | 5.23 (100%) |
| Similarity-based | 25 | 0.0 | 10.06 (100%) | 4.93 (100%) | 4.93 (100%) | 10.07 (100%) | 4.94 (100%) | 4.92 (100%) |
| Similarity-based | 25 | 0.1 | 10.01 (100%) | 4.98 (100%) | 4.98 (100%) | 10.01 (100%) | 4.98 (100%) | 4.98 (100%) |
| Similarity-based | 25 | 0.2 | 11.99 (100%) | 5.49 (100%) | 5.48 (100%) | 11.99 (100%) | 5.49 (100%) | 5.48 (100%) |
| Similarity-based | 25 | 0.3 | 14.12 (100%) | 6.07 (100%) | 6.07 (100%) | 14.13 (100%) | 6.07 (100%) | 6.07 (100%) |
| Similarity-based | 50 | 0.0 | 10.07 (100%) | 4.93 (100%) | 4.93 (100%) | 10.06 (100%) | 4.93 (100%) | 4.93 (100%) |
| Similarity-based | 50 | 0.1 | 10.00 (100%) | 4.98 (100%) | 4.98 (100%) | 10.00 (100%) | 4.98 (100%) | 4.98 (100%) |
| Similarity-based | 50 | 0.2 | 12.00 (100%) | 5.48 (100%) | 5.48 (100%) | 11.99 (100%) | 5.48 (100%) | 5.48 (100%) |
| Similarity-based | 50 | 0.3 | 14.11 (100%) | 6.08 (100%) | 6.09 (100%) | 14.12 (100%) | 6.08 (100%) | 6.09 (100%) |
| Similarity-based | 75 | 0.0 | 10.07 (100%) | 4.93 (100%) | 4.93 (100%) | 10.06 (100%) | 4.93 (100%) | 4.93 (100%) |
| Similarity-based | 75 | 0.1 | 10.00 (100%) | 4.98 (100%) | 4.98 (100%) | 10.00 (100%) | 4.98 (100%) | 4.98 (100%) |
| Similarity-based | 75 | 0.2 | 12.00 (100%) | 5.48 (100%) | 5.48 (100%) | 11.99 (100%) | 5.48 (100%) | 5.48 (100%) |
| Similarity-based | 75 | 0.3 | 14.11 (100%) | 6.08 (100%) | 6.09 (100%) | 14.12 (100%) | 6.08 (100%) | 6.09 (100%) |
| Similarity-based | RandInt(25,75) | Rand(0.1,0.3) | 11.90 (100%) | 5.64 (100%) | 5.63 (100%) | 11.88 (100%) | 5.64 (100%) | 5.63 (100%) |

Regardless of whether we use dynamic parameters or static parameters, our results show that payload aggregation and the use of changing identifiers have qualitatively the same effect on both DGD and the baselines. Table 2 shows the effect for DGD in particular. We observe first that neither payload aggregation nor the use of changing identifiers breaks the robustness with which DGD produces sufficient data distributions, as all sufficiency coefficients are perfect (= 100%). We observe further that the impact of using changing identifiers as compared to using persistent identifiers on redundancy is marginal. More specifically, we find that the differences are not statistically significant. The impact of payload aggregation is less subtle. We see that the use of either MostSim or WeighAvg roughly halves redundancies. Although the use of payload aggregation significantly reduces redundancy, we find that the difference between the payload aggregation methods MostSim and WeighAvg is not statistically significant. Now that we have seen that DGD solves the distributed data minimization problem, we are interested in seeing the performance that the resulting data distributions yield.

4.6. Performance-redundancy trade-offs

Table 3 shows the absolute differences in RMSE and redundancy between DGD and each of the five baselines. Observe first that the difference in RMSE between DGD and Dyn-R&I is at most 0.001, while using DGD results in less redundancy. Using DGD rather than Dyn-R&I allows users to collect 2.55 fewer copies of the user-item matrix 𝑅 with random connection behavior and 1.71 fewer copies with similarity-based connection behavior across all users at otherwise equal performance levels, which is a relative improvement of 19.3% and 14.6%, respectively. Recall that the difference in parameter update formulas is that DGD features the consensus term, while Dyn-R&I does not (see Section 4.3). Users thus achieve a significantly better performance-redundancy trade-off if they coordinate data minimization through the consensus term. Following a similar line of argument for DGD and Dyn-C, we conclude that performing both global and local data minimization as in DGD, and not only local data minimization as in Dyn-C, has a significantly positive impact on the performance-redundancy trade-off of minimized data distributions.

Recall that the update formula of Dyn-R&I features the innovation term and the regularization term, while that of Dyn-R only features the regularization term. We see in Table 3 that using Dyn-R trades a decrease in RMSE of 0.021 with random connection behavior, and of 0.025 with similarity-based connection behavior, for a redundancy increased by the data amount of

Table 3

Absolute differences 𝛥RMSE and redundancies 𝛥 of the baselines to DGD data minimization. Differences are taken on the basis of the least redundant and sufficient data distributions shown in Table 1. Asterisks mark statistical significance between each of the baselines with DGD data minimization under a two-sided Welch's t-test with Bonferroni correction and significance level 𝛼 = 0.01.

| Connection behavior | Method | 𝑇𝑢 | 𝜃𝑢 | 𝛥RMSE | 𝛥 (redundancy) |
|---|---|---|---|---|---|
| Random | Static-G | – | – | – | – |
| Random | Static-I | 75 | 0.1 | −0.015* | +17.75* |
| Random | Dyn-C | 75 | 0.1 | −0.006 | +9.02* |
| Random | Dyn-R | 25 | 0.1 | −0.021* | +42.17* |
| Random | Dyn-R&I | 50 | 0.0 | +0.001 | +2.55* |
| Random | DGD | 25 | 0.1 | ±0.000 | ±0.00 |
| Similarity-based | Static-G | – | – | – | – |
| Similarity-based | Static-I | 75 | 0.1 | −0.017* | +19.84* |
| Similarity-based | Dyn-C | 75 | 0.1 | −0.009 | +9.51* |
| Similarity-based | Dyn-R | 25 | 0.1 | −0.025* | +42.45* |
| Similarity-based | Dyn-R&I | 50 | 0.1 | ±0.000 | +1.71* |
| Similarity-based | DGD | 50 | 0.1 | ±0.000 | ±0.00 |

Table 4

Absolute differences 𝛥RMSE and redundancies 𝛥 of DGD with persistent identifiers and combinations of persistent and changing identifiers and payload aggregation methods MostSim and WeighAvg. Differences are taken on the basis of the least redundant and sufficient data distribution shown in Table 2. Asterisks mark statistical significance between each of the baselines with DGD data minimization under a two-sided Welch's t-test with Bonferroni correction and significance level 𝛼 = 0.01.

| Connection behavior | Identifiers | Method | 𝑇𝑢 | 𝜃𝑢 | 𝛥RMSE | 𝛥 (redundancy) |
|---|---|---|---|---|---|---|
| Random | Changing | DGD[WeighAvg] | 50 | 0.0 | +0.005 | −5.41* |
| Random | Changing | DGD[MostSim] | 50 | 0.0 | +0.005 | −5.41* |
| Random | Changing | DGD | 50 | 0.1 | ±0.000 | +0.01 |
| Random | Persistent | DGD[WeighAvg] | RandInt (25,75) | Rand (0.1,0.3) | +0.006 | −5.41* |
| Random | Persistent | DGD[MostSim] | 25 | 0.0 | +0.005 | −5.39* |
| Random | Persistent | DGD | 25 | 0.1 | ±0.000 | ±0.00 |
| Similarity-based | Changing | DGD[WeighAvg] | 25 | 0.0 | +0.003 | −5.08* |
| Similarity-based | Changing | DGD[MostSim] | 50 | 0.0 | +0.003 | −5.07* |
| Similarity-based | Changing | DGD | 50 | 0.1 | ±0.000 | ±0.00 |
| Similarity-based | Persistent | DGD[WeighAvg] | 25 | 0.0 | +0.004 | −5.07* |
| Similarity-based | Persistent | DGD[MostSim] | 25 | 0.0 | +0.003 | −5.07* |
| Similarity-based | Persistent | DGD | 50 | 0.1 | ±0.000 | ±0.00 |

42.17 copies of the user-item matrix 𝑅 with random connection behavior, and 42.45 copies with similarity-based connection behavior, over all users. We conclude that the use of the innovation term has a significantly positive impact on performance while simultaneously having a significantly negative impact on redundancy. Whether this trade-off is desirable depends on the setting. We have, for instance, seen in Fig. 3 that DGD converges more quickly than Dyn-R, which may be a desirable property. However, only considering performance and redundancy, we cannot strictly favor either DGD or Dyn-R&I. By the same argument, we cannot strictly favor either DGD or Static-I only on the basis of performance and redundancy. However, considering that Static-I has a tendency to produce insufficient data distributions (see Table 1), we deem DGD to be generally favored over Static-I.
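One way to read "strictly favoring" a method on performance and redundancy alone is as Pareto dominance over the pair (𝛥RMSE, 𝛥 redundancy). The following minimal sketch works under that assumption; it is an illustration and not part of the evaluation. The values are the random-connection rows of Table 3, while the names `dominates` and `strictly_favored` are hypothetical.

```python
from typing import Dict, Tuple

# (delta RMSE, delta redundancy) relative to DGD with random connection
# behavior, copied from Table 3. Lower is better in both coordinates.
TABLE_3_RANDOM: Dict[str, Tuple[float, float]] = {
    "DGD": (0.000, 0.00),
    "Static-I": (-0.015, 17.75),
    "Dyn-R": (-0.021, 42.17),
}


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b in both coordinates and better in one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def strictly_favored(method: str, other: str) -> bool:
    """Hypothetical helper: does `method` Pareto-dominate `other`?"""
    return dominates(TABLE_3_RANDOM[method], TABLE_3_RANDOM[other])


if __name__ == "__main__":
    for other in ("Static-I", "Dyn-R"):
        print(other, strictly_favored("DGD", other), strictly_favored(other, "DGD"))
```

Under this reading, neither DGD nor Dyn-R dominates the other, and the same holds for DGD and Static-I: each baseline has a lower RMSE but a higher redundancy.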

Table 4 shows how the use of changing identifiers and payload aggregation affects the performance-redundancy trade-off. We observe first that the use of changing identifiers does not affect the RMSE at all with either random or similarity-based connection behavior. We see further that with random connection behavior, the use of changing identifiers increases redundancy slightly by 0.01, while for similarity-based connection behavior, it leaves redundancy unchanged. We find that the marginal differences in redundancy and RMSE are not statistically significant. We thus conclude that the use of changing identifiers is feasible in the context of data minimization.
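The significance statements rest on a two-sided Welch's t-test with Bonferroni correction (see the captions of Tables 3 and 4). As a minimal sketch of the two ingredients, the code below computes Welch's t statistic with the Welch-Satterthwaite degrees of freedom and a Bonferroni-adjusted significance level, using only the standard library. The function names and the number of comparisons are assumptions for illustration. The p-value would then follow from the t-distribution with the returned degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the complete test.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of sample a
    se2_b = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Hypothetical redundancy samples from repeated simulation runs.
    t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
    print(t, df, bonferroni_alpha(0.01, 5))
```

Welch's variant is the natural choice when the compared methods may produce samples with unequal variances, since it does not pool the variances.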

The reduction in redundancy due to payload aggregation is statistically significant with both random and similarity-based connection behavior. Interestingly, it is essentially the same, irrespective of whether users make use of persistent or changing identifiers. We observe, however, that the redundancy reduction is larger with random connection behavior (between −5.39 and −5.41) than with similarity-based connection behavior (between −5.07 and −5.08). We conclude that the use of changing identifiers is feasible in the context of data minimization with payload aggregation. A direct comparison of MostSim and WeighAvg shows that both payload aggregation methods yield essentially the same performance-redundancy trade-offs.
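To make the idea of aggregating several ratings on the same item into a single rating concrete, the sketch below shows two plausible aggregation rules. It assumes, based only on the method names, that MostSim keeps the rating of the most similar user and that WeighAvg forms a similarity-weighted average; the exact definitions are given where the methods are introduced and may differ. The type alias `Rating` and both function names are hypothetical.

```python
from typing import List, Tuple

# (similarity of the rating's author to the receiving user, rating value)
Rating = Tuple[float, float]


def most_sim(ratings: List[Rating]) -> float:
    """Assumed MostSim rule: keep the rating of the most similar author."""
    return max(ratings, key=lambda r: r[0])[1]


def weigh_avg(ratings: List[Rating]) -> float:
    """Assumed WeighAvg rule: similarity-weighted average of the ratings.

    Falls back to the plain mean if all similarities are zero.
    """
    total_weight = sum(sim for sim, _ in ratings)
    if total_weight == 0:
        return sum(value for _, value in ratings) / len(ratings)
    return sum(sim * value for sim, value in ratings) / total_weight


if __name__ == "__main__":
    same_item = [(0.9, 4.0), (0.1, 2.0)]  # two ratings on one item
    print(most_sim(same_item), weigh_avg(same_item))
```

Either rule replaces all ratings on an item by one value, which is what reduces the number of stored ratings per item.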

5. Conclusions

We study data minimization and anonymity in pervasive mobile-to-mobile recommender systems in which users establish ad hoc wireless connections between their mobile computing devices in physical proximity to exchange ratings on which they calculate recommendations. A major difference between minimizing data in pervasive mobile-to-mobile recommender systems and in decentralized recommender systems in general resides in the mobility of users. Due to user mobility, it is not possible for users to collect the ratings from other users at will. Users have to decide in situ, at the very moment their mobile computing device establishes an ad hoc wireless connection, whether to exchange ratings and how many. One of the effects of user mobility is that a solution to the distributed data minimization problem does not always exist.

We study two types of user mobility. We study, on the one hand, uniformly random connections between users and, on the

other hand, connections that are random yet limited to users who have rated at least one item in common. The randomness in both

models accounts for the randomness in connections that we would expect to arise in a pervasive mobile-to-mobile recommender

system. For both these mobility models, we find that our algorithm based on Distributed Gradient Descent (DGD) robustly solves

the distributed data minimization problem, outperforming an array of baselines. The algorithm’s robustness stems from its capacity

to allow users to dynamically change the amount of data they collect individually in order to minimize the amount of collected data

over all users globally.

The DGD data minimization algorithm first minimizes the number of connections and then, on the basis of this minimized number of connections, the amount of data collected over these connections. We find that the algorithm consistently finds the smallest number of connections and the smallest number of ratings communicated over these connections compared to the baselines. We find that this property simultaneously reduces the chances of an attacker relating users to ratings. We further reduce the chances of an attacker relating users to ratings by having users aggregate ratings on the same item into a single rating and change their identifiers between connections. We find that payload aggregation makes it possible to halve the number of ratings that users collect, irrespective of whether users change their identifiers between connections, and without jeopardizing the robustness of the DGD data minimization algorithm. In this sense, the DGD data minimization algorithm not only minimizes the number of ratings that users collect but also preserves their anonymity. However, it is important to note that it only preserves anonymity between users who do not establish a connection. Users who do establish an ad hoc wireless connection in physical proximity can never be anonymous toward each other.

CRediT authorship contribution statement

Tobias Eichinger: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Axel Küpper: Funding acquisition, Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing

interests: Tobias Eichinger reports financial support was provided by European Commission. If there are other authors, they declare

that they have no known competing financial interests or personal relationships that could have appeared to influence the work

reported in this paper.

Data availability

https://github.com/TEichinger/dec-cf-sim (see [17] in the manuscript).

Acknowledgments

Funding: This work was supported by the European Union’s Horizon 2020 research and innovation program [grant number N.

883464].

Appendix A. General construction of the forwarding parameter 𝜽𝒖

Let $u \in U$ be a fixed user with similarity threshold $t_u \in [0,1]$. Let further $S = \{s_z\}_{z \in U}$ be the set of similarity data of all users in $U$ and $\mathrm{sim} \colon S \times S \to [0,1]$ be a similarity measure. Let further $X_u \colon U \setminus \{u\} \to [0,1]$ be a random variable that describes the behavior with which the fixed user $u$ forms a connection with an a priori unknown user $v \in U \setminus \{u\}$. More specifically, for any user $v \in U \setminus \{u\}$ we denote by $\mathrm{P}(X_u = v)$ the probability that the fixed user $u$ forms a connection with the a priori unknown user $v$. More precisely: $\mathrm{P}(X_u = v) := \mathrm{P}(\{\omega \mid \omega \in X_u(v)\})$. We emphasize that the random variable $X_u$ only describes a single connection. Modeling multiple connections requires taking multiple random variables $X_u$.

In this setting, it is unclear whether the fixed user $u$ forwards a payload in a connection with some other user $v$, as $u$'s similarity to $v$ is a priori unknown. We define the random variable $Y_u$ to describe this a priori unknown similarity. More specifically, we define $Y_u \colon S \setminus \{s_u\} \to [0,1]$ by $Y_u := \mathrm{sim}(s_u, s_{X_u})$, where $s_{X_u}$ denotes the similarity data of the a priori unknown user $X_u$. We denote by $F_{Y_u} \colon [0,1] \to [0,1]$ the cumulative distribution function of $Y_u$. Then, $F_{Y_u}(s)$ represents the probability that $u$ forms a connection with another user $X_u = v \in U \setminus \{u\}$ such that their similarity is smaller than or equal to $s$. Formally, we have:

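$$
\begin{aligned}
F_{Y_u}(s) &\overset{\text{Def. } F_{Y_u}}{=} \mathrm{P}(Y_u \le s) \\
&\overset{\text{Def. } Y_u}{=} \mathrm{P}(\mathrm{sim}(s_u, s_{X_u}) \le s) \\
&\overset{\text{Def. } X_u}{=} \mathrm{P}(X_u = v,\ \mathrm{sim}(s_u, s_v) \le s) \\
&= \mathrm{P}(u \leftrightarrow v,\ \mathrm{sim}(s_u, s_v) \le s).
\end{aligned}
\tag{A.1}
$$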

If we plug $u$'s similarity threshold $t_u$ into this equation ($s = t_u$), we have that $F_{Y_u}(t_u) \in [0,1]$ represents the probability that $u$ forms a connection with another user whose similarity to $u$ is lower than or equal to $t_u$. Conversely, $1 - F_{Y_u}(t_u) \in [0,1]$ represents the probability that $u$ forms a connection with another user whose similarity to $u$ is higher than $t_u$. Since $u$ only forwards a payload if the similarity to the other user is higher than $t_u$, we see that

$$
\theta_u := 1 - F_{Y_u}(t_u) \in [0,1] \tag{A.2}
$$

represents the probability with which $u$ forwards a payload in an a priori unknown connection.
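For a finite user set, the construction in Eq. (A.2) reduces to summing connection probabilities. The following minimal sketch assumes a discrete connection distribution P(X_u = v) and known similarities sim(s_u, s_v); the function name, the dictionary layout, and the example values are hypothetical.

```python
from typing import Dict


def forwarding_parameter(
    conn_prob: Dict[str, float],
    similarity: Dict[str, float],
    t_u: float,
) -> float:
    """Compute theta_u = 1 - F_{Y_u}(t_u) for a discrete connection distribution.

    conn_prob[v]  : probability that u connects with user v (sums to 1)
    similarity[v] : similarity between u and v, in [0, 1]
    t_u           : similarity threshold of u
    """
    # F_{Y_u}(t_u): probability of connecting with a user of similarity <= t_u.
    cdf_at_threshold = sum(p for v, p in conn_prob.items() if similarity[v] <= t_u)
    return 1.0 - cdf_at_threshold


if __name__ == "__main__":
    conn_prob = {"v": 0.5, "w": 0.3, "x": 0.2}
    similarity = {"v": 0.2, "w": 0.6, "x": 0.9}
    print(forwarding_parameter(conn_prob, similarity, t_u=0.5))
```

In this example, only the connection with user v falls at or below the threshold, so u forwards a payload in the remaining half of all connections.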

Appendix B. Local versus global data minimization

We illustrate that the objectives of local and global data minimization are competing on the basis of the example on sufficiency

in Section 2.5. We check first whether the data distribution in the example qualifies as a solution to the global data minimization

problem. We then check whether individual users’ collected datasets qualify as solutions to the local data minimization problem.

Finally, we conclude that the existence of solutions to the individual problems does not imply the existence of a solution to the joint

problem.

The data distribution $\{R_u, R_v, R_w\}$ does not qualify as a solution to the global data minimization problem (see Definition 1 in Section 2.5), since it is not a sufficient data distribution (see the side condition in Eq. (6)). In order to arrive at a sufficient data distribution, $w$ would require an additional data exchange in epoch $t = 3$, either $u \leftrightarrow w$ or $v \leftrightarrow w$, for then $w$ would obtain a rating on $w$'s item of interest $i_4$ and arrive at a sufficient collected dataset $R_w$. However, users $u$ and $v$ do not seek data exchanges beyond epoch $t = 2$, since they already collected a full copy of the user-item matrix modulo row permutation. For them, any additional data exchange in epoch $t = 3$ would trivially not yield any change in performance ($\Delta\sigma_u^{(3)} = \Delta\sigma_v^{(3)} = 0$). We now check whether the collected datasets $R_u, R_v, R_w$ qualify as solutions to the local data minimization problem (see Definition 1 in Section 2.5).

Observe first that the collected dataset $R_w$ trivially does not qualify as a solution to the local data minimization problem, since it is insufficient. User $w$ has not collected any rating on $w$'s item of interest $i_4$, which is why $\Delta\sigma_w^{(1)}$ is trivially undefined. We see that collected datasets necessarily need to be sufficient in order to qualify as solutions to the local data minimization problem. $R_u$ and $R_v$ are sufficient collected datasets and are thus good solution candidates for the local data minimization problem. If we assume, for the sake of argument only, that the change in performance for users $u$ and $v$ was marginal in epoch $t = 2$ ($\Delta\sigma_u^{(2)}, \Delta\sigma_v^{(2)} \in (0, \varepsilon)$) for some suitable performance metric $\sigma$ and performance threshold $\varepsilon$, the collected datasets $R_u$ and $R_v$ qualify as solutions to the local data minimization problem.

In summary, we can see from this example that sufficiency is a necessary criterion for a collected dataset to qualify as a solution to the local data minimization problem. If a data distribution is to qualify as a solution to both the local and the global data minimization problem, then it necessarily has to be sufficient. If users who arrive quickly at a minimized collected dataset stop seeking data exchanges with other users too early, they impede the data collection of other users, who may, as a consequence, not be able to collect even a sufficient dataset. The underlying idea of our proposed DGD data minimization scheme is that users coordinate their individual data collection processes by coordinating the pace of their data collection.
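The role of sufficiency in this example can be sketched in a few lines. The code assumes that a collected dataset is sufficient if it holds at least one rating on each of the user's items of interest, and that a data distribution is sufficient if every user's collected dataset is. The concrete ratings and the items of interest of $u$ and $v$ are hypothetical; only $w$'s missing rating on $i_4$ follows the example.

```python
from typing import Dict, Set, Tuple

# A collected dataset as a set of (rating user, item) pairs.
Dataset = Set[Tuple[str, str]]


def is_sufficient(collected: Dataset, items_of_interest: Set[str]) -> bool:
    """Assumed criterion: every item of interest has at least one collected rating."""
    rated_items = {item for _, item in collected}
    return items_of_interest <= rated_items


def distribution_is_sufficient(
    distribution: Dict[str, Dataset],
    interests: Dict[str, Set[str]],
) -> bool:
    """A data distribution is sufficient if every user's dataset is sufficient."""
    return all(is_sufficient(distribution[u], interests[u]) for u in distribution)


# Hypothetical ratings: u and v hold rows of all three users, w holds two rows.
FULL: Dataset = {("u", "i1"), ("u", "i2"), ("v", "i3"), ("v", "i4"), ("w", "i1")}
DISTRIBUTION: Dict[str, Dataset] = {
    "u": set(FULL),
    "v": set(FULL),
    "w": {("w", "i1"), ("u", "i1"), ("u", "i2")},  # no rating on i4
}
INTERESTS: Dict[str, Set[str]] = {"u": {"i3"}, "v": {"i1"}, "w": {"i4"}}

if __name__ == "__main__":
    print(distribution_is_sufficient(DISTRIBUTION, INTERESTS))
```

Although $R_u$ and $R_v$ pass the check individually, the distribution as a whole fails because of $R_w$, which mirrors the conflict between local and global data minimization described above.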

Appendix C. Complexity of centralized versus distributed data minimization

We compare the computational complexity of the data minimization problem in the centralized and distributed case. Roughly,

centralized data minimization involves training a few large models, while distributed data minimization involves training many small

models. Recall that the distributed data minimization problem is a mixture of the local and the global data minimization problem.

We omit a discussion on the complexity of the global data minimization problem (see Definition 1) since it merely involves constant

time look-ups of items of interest in a user’s collected dataset. We thus essentially only compare the computational complexity of

the centralized data minimization problem versus the local data minimization problem.

We assume that predicted ratings are calculated on the basis of an SVD collaborative filtering algorithm since the empirical evaluation of the proposed DGD data minimization scheme is based on SVD (see Section 4.1). We point out that in the context of recommendation, SVD usually denotes a class of matrix factorization algorithms such as SVD++ [25], truncated SVD (see Section 3.6.5 in [26]), or FunkSVD [27] rather than a particular algorithm. We therefore use the complexity $\mathcal{O}(\min\{mn^2, nm^2\})$ of a classical numerical SVD algorithm on an $m \times n$ matrix as, for instance, reported in [28].

Centralized data minimization assumes that the training ratings of a user-item matrix 𝑅are collected cumulatively in batches.

This is similar to users collecting payloads through the Data Exchange Protocol for distributed data minimization. After collecting a

new batch of ratings, first, a new SVD is calculated on the union of the previously collected batches and the newly collected batch.

Afterward, the new recommendation performance that the new SVD yields on the test ratings is calculated. The new performance

is compared with the previous performance. If the performance differential is marginal, data collection is terminated.

The complexity of SVD calculations for centralized data minimization is the same in each iteration since the number of rows $n_{\text{user}}$ and the number of columns $n_{\text{item}}$ in the underlying user-item matrix remain constant. The reason for this is that centralized systems need to serve recommendations to all users $u \in U$ on all items $i \in I$. Since the number of items usually largely exceeds the number of users ($n_{\text{item}} \gg n_{\text{user}}$), we arrive at a computational complexity for centralized data minimization on the basis of SVD of

$$
\mathcal{O}\Big(n_{\text{iteration}} \cdot \big[\underbrace{n_{\text{item}} \cdot n_{\text{user}}^2}_{\text{SVD calculation}} + c\big]\Big), \tag{C.1}
$$

where $n_{\text{iteration}}$ denotes the number of iterations until termination and $c$ the constant effort to calculate the performance on the test ratings and the performance differentials in each iteration.
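As a rough numerical illustration of Eq. (C.1), the sketch below evaluates the operation-count proxy $\min\{mn^2, nm^2\}$ for an $m \times n$ matrix and multiplies it by the number of iterations. The function names, the unit constant `c`, and the example sizes are assumptions; the result is a proxy for asymptotic growth, not a measured runtime.

```python
def svd_cost(n_rows: int, n_cols: int) -> int:
    """Operation-count proxy min{m n^2, n m^2} for an SVD of an m x n matrix."""
    return min(n_rows * n_cols ** 2, n_cols * n_rows ** 2)


def centralized_cost(n_iteration: int, n_user: int, n_item: int, c: int = 1) -> int:
    """Proxy for Eq. (C.1): one SVD on the full user-item matrix per iteration."""
    return n_iteration * (svd_cost(n_user, n_item) + c)


if __name__ == "__main__":
    # Hypothetical system with many more items than users.
    print(svd_cost(100, 1000))              # equals n_item * n_user^2
    print(centralized_cost(10, 100, 1000))  # ten batches collected
```

With more items than users, the minimum is attained by $n_{\text{item}} \cdot n_{\text{user}}^2$, which is the term that appears in Eq. (C.1).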

Although centralized data minimization and local data minimization are conceptually similar, the complexity of SVD calculations for local data minimization does not remain constant as it does for centralized data minimization; it increases between iterations. The reason for this is that $u$ does not need to serve recommendations to all users $u \in U$ on all items $i \in I$ as in a centralized recommender system, but only to $u$ themself on those items for which ratings have been collected. The complexity of SVD calculations increases with the increasing dimension of the underlying user-item matrix, which is the product of the number of rows and the number of columns. With respect to the example in Fig. 2, we see that the collected dataset $R_w$ is a $2 \times 3$ user-item matrix since $w$ did not collect any rating on item $i_4$, while $R_u$ and $R_v$ are $3 \times 4$ user-item matrices.

Recall that users collecting data via the Data Exchange Protocol only receive a payload with probability $\theta$ in a connection and that a payload represents the analog to a batch in the centralized case. More specifically, we have seen at the end of Section 2.3 that a user $u$ receives $\sum_t \theta_u^{(t)}$ payloads on average over the course of $u$'s data collection process. Without loss of generality, we disregard the connections in which $u$ does not receive a payload. We define the integer $n_{\text{reception}}(u) := \lfloor \sum_t \theta_u^{(t)} \rfloor$ as an estimate of the number of connections in which $u$ receives a payload over the course of $u$'s data collection process. We further define the integer $n_{\text{reception}} := \sum_u n_{\text{reception}}(u)$ as an estimate of the overall number of payload receptions over all users' data collection processes, in analogy to $n_{\text{iteration}}$ in Eq. (C.1) in the centralized case. We then arrive at a computational complexity for local data minimization on the basis of SVD of

(∑

𝑢∈𝑈

𝑛reception(𝑢)

∑

𝑡=1

[𝑛(𝑡)

item(𝑢)⋅𝑛(𝑡)

user(𝑢)2

⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟

SVD calculation

per user and epoch

+𝑐𝑢])

Eq. (C.3)

=(∑

𝑢∈𝑈

𝑛reception(𝑢)

∑

𝑡=1

[𝑛item ⋅𝑛user2+𝑐𝑢])

=(∑

𝑢∈𝑈

𝑛reception(𝑢)

∑

𝑡=1

[𝑛item ⋅𝑛user2+𝑐])

=(∑

𝑢∈𝑈

𝑛reception(𝑢)⋅[𝑛item ⋅𝑛user2+𝑐])

=(𝑛reception ⋅[𝑛item ⋅𝑛user2

⏟⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏟

averaged

SVD calculation

+𝑐]),

(C.2)

where 𝑛(𝑡)

user(𝑢)and 𝑛(𝑡)

item(𝑢)denote the number of rows and columns of a user 𝑢’s collected dataset respectively after receiving

𝑡payloads, 𝑐𝑢the user-specific constant computational effort to calculate the performance on their test ratings and their performance

differentials with 𝑐as their average, and 𝑛item ⋅𝑛user2the average computational effort of SVD calculations over all users and epochs

defined as

𝑛item ⋅𝑛user2∶= ∑𝑢∈𝑈∑𝑛reception(𝑢)

𝑡=1 𝑛(𝑡)

item(𝑢)⋅𝑛(𝑡)

user(𝑢)2

∑𝑢∈𝑈𝑛reception(𝑢).(C.3)
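The step from the double sum to the averaged form in Eq. (C.2) can be checked numerically. The sketch below assumes a single constant `c` shared by all users, at least as many items as users in every collected dataset, and hypothetical matrix shapes; all function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

# (n_user, n_item) of a user's collected dataset after each payload reception
Shape = Tuple[int, int]


def n_reception(thetas: List[float]) -> int:
    """Estimated number of payload receptions: floor of the summed probabilities."""
    return math.floor(sum(thetas))


def local_cost(shapes: Dict[str, List[Shape]], c: int = 1) -> int:
    """Direct double sum over users and receptions (first line of Eq. (C.2))."""
    return sum(
        n_item * n_user ** 2 + c
        for per_user in shapes.values()
        for n_user, n_item in per_user
    )


def averaged_svd_cost(shapes: Dict[str, List[Shape]]) -> float:
    """Average SVD effort over all users and receptions, as in Eq. (C.3)."""
    total = sum(
        n_item * n_user ** 2
        for per_user in shapes.values()
        for n_user, n_item in per_user
    )
    count = sum(len(per_user) for per_user in shapes.values())
    return total / count


if __name__ == "__main__":
    shapes = {"u": [(2, 3), (3, 4)], "w": [(2, 3)]}  # hypothetical growth
    receptions = sum(len(per_user) for per_user in shapes.values())
    print(local_cost(shapes), receptions * (averaged_svd_cost(shapes) + 1))
```

Both expressions agree, which reflects that the averaged form is a rewriting of the double sum rather than an approximation when `c` is the same for all users.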

By comparison of Eqs. (C.1) and (C.2), we see that the computational complexities of centralized data minimization and local data minimization have a similar structure. In particular, both grow linearly in the number of times that data is collected since the collection of a batch or payload triggers the calculation of an SVD. SVD calculations are expensive for large user-item matrices, even if they are sparse. Therefore, if SVD calculations on the entire user-item matrix for centralized data minimization become intractable, performing SVD calculations on multiple subsets of the entire user-item matrix for local data minimization may be a tractable alternative. An empirical comparison of the computational effort of centralized versus local data minimization goes beyond the scope of this paper and represents future work.

Appendix D. Completeness of local and global data minimization

We explain why individual solutions to the local and global data minimization problems exist with random connection behavior.

The local data minimization problem is well-defined, that is, a solution to the problem exists for a user $u$, if that user $u$ can collect a locally minimized dataset $R_u^{(m_u)}$ (see the side condition in Eq. (5)). Since local data minimization is equivalent to centralized data minimization performed by an individual user, a user may apply data minimization schemes that have been proposed for use in centralized systems (see for instance [3,4]). Data minimization schemes for use in centralized systems require the availability of the entire user-item matrix $R$, that is, the availability of any other user's profile. Randomly pairing $u$ with any other user in the Connection Phase ($i$) of the Data Exchange Protocol makes any other user's profile available to user $u$, although it may require several epochs to actually collect these profiles. We conclude that solutions to the local data minimization problem always exist if users are paired randomly.
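How many epochs it may take to collect all profiles under random pairing can be illustrated with a small simulation. The sketch makes simplifying assumptions that go beyond the argument above: one connection per epoch, a uniformly random partner, and a full profile exchange in every connection. Under these assumptions the waiting time is the classical coupon-collector time with expectation $k \cdot H_k$ for $k$ other users. All names are hypothetical.

```python
import random


def epochs_until_all_profiles(n_users: int, rng: random.Random) -> int:
    """Count epochs until a fixed user has met every other user at least once."""
    others = n_users - 1
    met = set()
    epochs = 0
    while len(met) < others:
        met.add(rng.randrange(others))  # uniformly random partner this epoch
        epochs += 1
    return epochs


def expected_epochs(n_users: int) -> float:
    """Coupon-collector expectation k * H_k with k = n_users - 1."""
    k = n_users - 1
    return k * sum(1.0 / j for j in range(1, k + 1))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    runs = [epochs_until_all_profiles(20, rng) for _ in range(1000)]
    print(sum(runs) / len(runs), expected_epochs(20))
```

The simulation terminates with probability one, which matches the claim that every profile eventually becomes available, even though the number of required epochs grows faster than the number of users.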

The global data minimization problem is well-defined if every user $u \in U$ can collect a sufficient dataset $R_u$, which is equivalent to the data distribution $\{R_u\}_{u \in U}$ being sufficient (see the side condition in Eq. (6)). Under the assumption that the entire user-item matrix $R$ represents a sufficient collected dataset for any user $u$, we can conclude, in complete analogy to the argument in the previous paragraph for the local data minimization problem, that a solution to the global data minimization problem always exists if users are paired randomly. We argue that the assumption that $R$ is sufficient for any user $u$ is justified since, otherwise, the recommendation problem itself is not well-defined. More precisely, if $R$ were not sufficient for a particular user $u$, that is, if $R$ did not hold ratings on any of $u$'s items of interest, none of the items of interest could be recommended.

References

[1] European Union, Charter of fundamental rights of the European union, Off. J. Eur. Union 55 (C 326) (2012) 391–407, see in particular Chapter II.

[2] A.F. Westin, Privacy and Freedom, Ig Publishing, 2015, reprint from 1967.

[3] A.J. Biega, P. Potash, H. Daumé, F. Diaz, M. Finck, Operationalizing the legal principle of data minimization for personalization, in: Proceedings of the

43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2020, pp. 399–408.

[4] D. Shanmugam, F. Diaz, S. Shabanian, M. Finck, A.J. Biega, Learning to limit data collection via scaling laws: A computational interpretation for the legal

principle of data minimization, in: Proceedings of the 5th ACM Conf. on Fairness, Accountability, and Transparency, ACM, 2022, pp. 839–849.

[5] R. Chow, H. Jin, B. Knijnenburg, G. Saldamli, Differential data analysis for recommender systems, in: Proceedings of the 7th ACM Conference on

Recommender Systems, ACM, 2013, pp. 323–326.

[6] M. Larson, A. Zito, B. Loni, P. Cremonesi, Towards minimal necessary data: The case for analyzing training data requirements of recommender algorithms,

in: Proceedings of the 1st FATREC Workshop on Responsible Recommendation, 2017, pp. 1–6.

[7] H. Wen, L. Yang, M. Sobolev, D. Estrin, Exploring recommendations under user-controlled data filtering, in: Proceedings of the 12th ACM Conference on

Recommender Systems, ACM, 2018, pp. 72–76.

[8] T. Eichinger, A. Küpper, Distributed data minimization for decentralized collaborative filtering systems, in: Proceedings of the 24th International Conference

on Distributed Computing and Networking, ICDCN ’23, ACM, 2023, pp. 140–149.

[9] S. Lee, X. Zheng, J. Hua, H. Vikalo, C. Julien, Opportunistic federated learning: An exploration of egocentric collaboration for pervasive computing

applications, in: Proceedings of the 19th IEEE International Conference on Pervasive Computing and Communications, IEEE, 2021, pp. 1–8.

[10] J. Dunkel, R. Hermoso, Towards MANET-based recommender systems for open facilities, Appl. Intell. 52 (8) (2021) 9045–9066.

[11] T. Eichinger, F. Beierle, R. Papke, L. Rebscher, H. Chinh Tran, M. Trzeciak, On gossip-based information dissemination in pervasive recommender systems,

in: Proceedings of the 13th ACM Conference on Recommender Systems, ACM, 2019, pp. 442–446.

[12] L.N. Barbosa, J. Gemmell, M. Horvath, T. Heimfarth, Distributed user-based collaborative filtering on an opportunistic network, in: Proceedings of the

2018 IEEE 32nd International Conference on Advanced Information Networking and Applications, IEEE, 2018, pp. 266–273.

[13] P. Gratz, T. Leclerc, Delay-tolerant collaborative filtering, in: Proceedings of the 7th ACM International Symposium on Mobility Management and Wireless

Access, ACM, 2009, pp. 109–113.

[14] R. Schifanella, A. Panisson, C. Gena, G. Ruffo, MobHinter: Epidemic collaborative filtering and self-organization in mobile ad-hoc networks, in: Proceedings

of the 2nd ACM Conference on Recommender Systems, ACM, 2008, pp. 27–34.

[15] A. de Spindler, M.C. Norrie, M. Grossniklaus, Collaborative filtering based on opportunistic information sharing in mobile ad-hoc networks, in: Proceedings

of the 6th OTM Confederated International Conferences ‘‘On the Move to Meaningful Internet Systems’’, in: LNCS, (4803) Springer, 2007, pp. 408–416.

[16] J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M.M.A. Patwary, Y. Yang, Y. Zhou, Deep learning scaling is predictable, empirically,

2017, arXiv:1712.00409.

[17] T. Eichinger, Simulation framework for distributed data minimization, 2023, URL: https://github.com/TEichinger/dec-cf-sim. (Accessed 5 May 2024).

[18] M.D. Ekstrand, J.T. Riedl, J.A. Konstan, Collaborative filtering recommender systems, Found. Trends HCI 4 (2) (2011) 81–173.

[19] M. McPherson, L. Smith-Lovin, J.M. Cook, Birds of a feather: Homophily in social networks, Annu. Rev. Sociol. 27 (2001) 415–444.

[20] G. Beliakov, T. Calvo, S. James, Aggregation functions for recommender systems, in: F. Ricci, L. Rokach, B. Shapira (Eds.), Recommender Systems Handbook,

second ed., Springer US, Boston, MA, 2015, pp. 777–808.

[21] R. Shokri, P. Pedarsani, G. Theodorakopoulos, J.-P. Hubaux, Preserving privacy in collaborative filtering through distributed aggregation of offline profiles,

in: Proceedings of the 3rd ACM Conference on Recommender Systems, ACM, 2009, pp. 157–164.

[22] A. Pfitzmann, M. Hansen, A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability,

pseudonymity, and identity management, 2010, v0.34.

[23] F.M. Harper, J.A. Konstan, The MovieLens datasets: History and context, ACM TIIS 5 (4) (2015) 1–19.

[24] Q.-T. Truong, A. Salah, H. Lauw, Multi-modal recommender systems: Hands-on exploration, in: Proceedings of the 15th ACM Conference on Recommender

Systems, ACM, 2021, pp. 834–837.

[25] Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, in: Proceedings of the 14th ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining, ACM, 2008, pp. 426–434.

[26] C.C. Aggarwal, Recommender Systems: The Textbook, first ed., Springer Publishing Company, Incorporated, 2016.

[27] S. Funk, Netflix update: Try this at home, 2006, URL: https://sifter.org/simon/journal/20061211.html. (Accessed 5 May 2024).

[28] L. Trefethen, D. Bau, Numerical Linear Algebra, 1997.