
From Theory to Comprehension: A Comparative Study of

Differential Privacy and 𝑘-Anonymity

Saskia Nuñez von Voigt

[email protected]

Technische Universität Berlin

Berlin, Germany

Luise Mehner

[email protected]

Technische Universität Berlin

Berlin, Germany

Florian Tschorsch

[email protected]

Technische Universität Dresden

Dresden, Germany

ABSTRACT

The notion of 𝜀-differential privacy is a widely used concept of providing quantifiable privacy to individuals. However, it is unclear how to explain the level of privacy protection provided by a differential privacy mechanism with a set 𝜀. In this study, we focus on users’ comprehension of the privacy protection provided by a differential privacy mechanism. To do so, we study three variants of explaining the privacy protection provided by differential privacy: (1) the original mathematical definition; (2) 𝜀 translated into a specific privacy risk; and (3) an explanation using the randomized response technique. We compare users’ comprehension of privacy protection employing these explanatory models with their comprehension of privacy protection of 𝑘-anonymity as baseline comprehensibility. Our findings suggest that participants’ comprehension of differential privacy protection is enhanced by the privacy risk model and the randomized response-based model. Moreover, our results confirm our intuition that privacy protection provided by 𝑘-anonymity is more comprehensible.

CCS CONCEPTS

• Security and privacy → Usability in security and privacy; Data anonymization and sanitization; • General and reference → Surveys and overviews.

KEYWORDS

differential privacy, explanatory model, study

ACM Reference Format:

Saskia Nuñez von Voigt, Luise Mehner, and Florian Tschorsch. 2024. From Theory to Comprehension: A Comparative Study of Differential Privacy and 𝑘-Anonymity. In Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy (CODASPY ’24), June 19–21, 2024, Porto, Portugal. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3626232.3653261

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs International 4.0 License.
CODASPY ’24, June 19–21, 2024, Porto, Portugal
ACM ISBN 979-8-4007-0421-5/24/06
https://doi.org/10.1145/3626232.3653261

1 INTRODUCTION

Privacy-preserving techniques have been proposed in various domains to provide data protection guarantees. The aim of these techniques is to minimize the risk of identifying an individual while also maximizing the utility of the data. One simple method is to remove or generalize attributes so that each combination of attribute values comprises at least 𝑘 entries, leading to the concept of 𝑘-anonymity [ ]. Each individual in the data set is therefore indistinguishable from 𝑘−1 other individuals. However, 𝑘-anonymity does not provide strong mathematical privacy guarantees, as attribute values can be revealed in some situations [16, 18].

The privacy concept of 𝜀-differential privacy [ ] offers stronger privacy guarantees. It is a mathematical definition in which randomization is used to limit the impact on the output of an individual contributing to a database. The privacy parameter 𝜀 determines the privacy-utility tradeoff.

It is, however, difficult for a user to comprehend the level of privacy protection provided to them resulting from a particular 𝜀. Previous works have attempted to explain differential privacy mechanisms [ ], quantify privacy guarantees [ ], and communicate privacy risks [ ]. One approach to making the privacy parameter of differential privacy more comprehensible is to translate 𝜀 into a corresponding privacy risk, expressed as a percentage [ ]. Another approach has used the randomized response technique [ ] to describe privacy protection [ ]. This technique involves local differential privacy, which has been shown to be more intuitive [ ]. However, it is unclear whether these approaches enhance users’ comprehension of the implications of differential privacy mechanisms [3].

In contrast, for 𝑘-anonymity, the privacy parameter 𝑘 is directly linked to individual identifiability. We therefore argue that 𝑘-anonymity is easier to understand than privacy protection provided by differential privacy. Based on this assumption, we investigated how to explain the level of privacy protection of differential privacy in such a way that it is possibly just as comprehensible as 𝑘-anonymity.

To that end, we present three explanatory models that explain the privacy protection provided by differential privacy. In each explanatory model, we use a particular translation of the privacy parameter 𝜀 into a more intuitive concept. These translations of 𝜀 describe the level of privacy protection, making it easier to comprehend the implications of various differential privacy mechanisms. We build upon existing and established strategies to communicate the privacy protection provided by differential privacy quantitatively: (1) the original mathematical definition (DEF); (2) 𝜀 as a privacy risk (RISK); and (3) an explanation using the randomized response technique (RRT). We conducted an experimental study to investigate whether these explanatory models enhance users’ comprehension of differential privacy protection.


In our experimental study, we examined users’ comprehension of the privacy protection provided by a differential privacy mechanism compared to their comprehension of the privacy protection provided by a 𝑘-anonymity mechanism. We thus anchor the comprehension of the privacy protection of differential privacy in general and the respective comprehensibility with each explanatory model to the comprehensibility of 𝑘-anonymity. Our comparison increases the methodological validity of our study. Importantly, we do not compare the two mechanisms themselves nor their level of privacy protection. Instead, we are interested in the comprehensibility of privacy protection provided by the mechanisms.

With our results we provide evidence that the privacy protection provided by differential privacy is best understood using RRT as an explanatory model. Moreover, we establish 𝑘-anonymity as a baseline and an easily understandable privacy mechanism.

The paper’s contribution and structure can be summarized as follows: We present three explanatory models that include translations of the privacy parameter to help users understand privacy protection and thus the implications provided by a differential privacy mechanism in Section 2. We designed an experimental study addressing our research questions (Section 3) and performed a pilot study to validate our explanations and questions before conducting our main study. Our improvements designed to increase the internal validity of the questions concerning subjective and objective comprehension for the main study are presented in Section 4. In our main study, we examined the participants’ subjective and objective comprehension of the differential privacy protection with the explanatory models DEF, RISK, and RRT compared to users’ comprehension of the privacy protection provided by a 𝑘-anonymity mechanism (Section 5). Lastly, we discuss limitations and future work in Section 6 and we review related work in Section 7. We conclude our paper in Section 8.

2 EXPLANATORY MODELS

In this section, we provide three explanatory models for the implications of the privacy parameter 𝜀 of our privacy mechanism, differential privacy. Each model involves a translation of the privacy parameter into a more intuitive concept. Each translation is designed to help users understand the level of privacy protection provided with a specified privacy parameter and thus the implications of the mechanism. In addition, we give a brief overview of the privacy parameter of 𝑘-anonymity.

2.1 Privacy Protection of 𝑘-anonymity

The privacy protection of 𝑘-anonymity [ ] relies on the concept of anonymity sets. An anonymity set is a set of elements which are indistinguishable from each other. Individuals’ entries in a database are generalized or suppressed in such a way that, for each entry, there are at least 𝑘 entries with the same values in all columns that might be used to re-identify an individual. In other words, individuals’ entries are clustered into anonymity sets.

The privacy parameter 𝑘 translates to the size of the smallest anonymity set in the database. The higher 𝑘, the more indistinguishable individuals exist in each group, resulting in stronger privacy protection. For instance, with 𝑘 = 4, the chance of correctly linking an entry of a group to an individual is 1/4 = 0.25.
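The translation of 𝑘 into a linking chance can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the table, its column names, and the helper `smallest_anonymity_set` are hypothetical and chosen only to mirror the 𝑘 = 4 example.

```python
from collections import Counter


def smallest_anonymity_set(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share
    the same values in all quasi-identifier columns."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical, already generalized table (all values are made up).
table = (
    [{"age": "14-15", "class": "9", "drug_use": u} for u in ("yes", "no", "no", "no")]
    + [{"age": "16-17", "class": "10", "drug_use": u} for u in ("no", "yes", "yes", "no")]
)

k = smallest_anonymity_set(table, ["age", "class"])
linking_chance = 1 / k  # chance of linking an entry of the smallest group to one individual
print(k, linking_chance)
```

The quasi-identifier columns (here `age` and `class`) are an assumption of the sketch; which columns count as re-identifying depends on the adversary's background knowledge.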

2.2 Differential Privacy Definition (DEF)

Differential privacy [ ] bounds the amount of influence a single individual’s data can have on the output of a statistical computation over a database. A mechanism M is 𝜀-differentially private if, for any two neighboring data sets 𝐷1 and 𝐷2 differing in one individual and any set of statistical results computed over the data sets, 𝑆 ⊆ Range(M), it satisfies:

\[ P[M(D_1) \in S] \le e^{\varepsilon} \, P[M(D_2) \in S]. \qquad (1) \]

The maximum distance (measured as a ratio) between the probabilities of the mechanism returning the result with each database is bounded by a certain quantity. This quantity is based on the privacy parameter 𝜀. A privacy parameter closer to zero reduces the maximum distance, which means that the amount of influence any one individual’s data can have on the overall output is smaller. A smaller privacy parameter thus yields stronger privacy protection.

The privacy parameter 𝜀 translates into the factor by which the probability of returning any result can exceed the probability of the same result if an individual is missing from the data set. For instance, with 𝜀 = ln 3, and thus e^{ln 3} = 3, the probability of returning any result is at most three times the probability of the same result if one individual is missing in the data set.
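The factor-of-three reading can be checked numerically. The sketch below assumes a Laplace mechanism applied to a counting query with sensitivity 1 (noise scale 1/𝜀); this mechanism and the two example counts are assumptions made for illustration and are not prescribed by the definition itself.

```python
import math


def laplace_pdf(x, mean, scale):
    """Density of the Laplace distribution centred at `mean`."""
    return math.exp(-abs(x - mean) / scale) / (2 * scale)


def max_density_ratio(count_d1, count_d2, epsilon, outputs):
    """Largest ratio of output densities between two neighboring data sets
    for a counting query with sensitivity 1 (assumed), noise scale 1/epsilon."""
    scale = 1.0 / epsilon
    return max(
        laplace_pdf(x, count_d1, scale) / laplace_pdf(x, count_d2, scale)
        for x in outputs
    )


epsilon = math.log(3)
outputs = [i / 10 for i in range(-200, 400)]  # grid of possible noisy results
# Hypothetical neighboring data sets: the true counts differ by one individual.
ratio = max_density_ratio(10, 11, epsilon, outputs)
bound = math.exp(epsilon)
print(ratio, bound)
```

On this grid the largest ratio coincides with the bound e^𝜀 = 3 up to floating-point error, so the bound is tight for this particular mechanism.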

2.3 Epsilon as Privacy Risk (RISK)

Lee and Clifton [ ] proposed a way of calculating the risk of users in a data set being identified for the Laplacian differential privacy mechanism. In this framework, after an adversary receives an output of the differential privacy mechanism, she imagines every possible scenario for a distribution of all possible values for the individuals’ data that she does not already know. These scenarios are her so-called possible worlds. By comparing the probability of the mechanism returning the particular result for each possible world, the adversary decides which possible world is most likely to be true. The probability of the mechanism indicating the correct possible world when returning a result hence represents the users’ risk of being identified.

Mehner et al. [ ] simplified the framework by assuming worst-case values for some variables, so that the risk 𝑝 of being identified in a data set depends only on 𝜀 and 𝑛:

\[ p = \frac{1}{1 + e^{-\varepsilon}(n-1)}, \qquad (2) \]

where 𝑛 corresponds to the number of (unknown) possible worlds imagined by the adversary. Thus, the privacy parameter can be translated into a privacy risk in percent.

However, the number of possible worlds 𝑛 may be difficult to grasp. Moreover, 𝑛 depends on multiple often unspecified variables, such as the knowledge of the adversary, the number of individuals in the database and the number of possible values for an answer. According to Mehner et al. [ ], assuming the worst-case attack scenario, an adversary might have only two possible worlds. For example, she may be uncertain about only one individual’s answer and there may be only two possible values for that answer. Accordingly, the worst-case value is 𝑛 = 2, resulting in the global privacy risk:

\[ p = \frac{1}{1 + e^{-\varepsilon}}. \qquad (3) \]


We can therefore translate the privacy parameter for a given 𝜀 into the privacy risk of identifying the true answers of individuals included in the database. In other words, if an adversary queries the answer of an individual and there are only two possible answer values (i.e., in a worst-case attack scenario), we can determine the probability of the mechanism indicating the true answer of the individual for a specified 𝜀. For example, assume we set 𝜀 = ln 3, which yields a privacy risk of 75 %, i.e., in the worst-case attack scenario, the true answer of a person included in the database is revealed with a probability of 75 %.
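Equations (2) and (3) are easy to evaluate directly. The following sketch computes the privacy risk for a given 𝜀 and, as an additional convenience that is not part of the paper, inverts Equation (2) to obtain the 𝜀 matching a target risk; the function names are hypothetical.

```python
import math


def privacy_risk(epsilon, n_worlds=2):
    """Privacy risk p from Equation (2); n_worlds=2 gives the worst case of Equation (3)."""
    return 1.0 / (1.0 + math.exp(-epsilon) * (n_worlds - 1))


def epsilon_for_risk(p, n_worlds=2):
    """Algebraic inverse of Equation (2): the epsilon that yields risk p (requires 0 < p < 1)."""
    return math.log((n_worlds - 1) * p / (1.0 - p))


risk = privacy_risk(math.log(3))      # worst case with two possible worlds
eps_back = epsilon_for_risk(0.75)     # recovers ln 3
print(f"{risk:.0%}", eps_back)

for eps in (0.1, 0.5, 1.0, 2.0):
    print(eps, round(privacy_risk(eps), 3))
```

With 𝜀 = ln 3 and two possible worlds, the sketch reproduces the 75 % risk stated above; as 𝜀 approaches zero, the worst-case risk approaches 50 %, i.e., a coin flip between the two worlds.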

2.4 Using Randomized Response (RRT)

The number of possible worlds 𝑛 of Equation (2) is similar to the number of different answers in the randomized response technique [ ]. The randomized response technique is an approach designed to provide plausible deniability to data subjects. The idea is that some of the data subjects will give their true answer and others will give a forced answer. The decision of whether an individual gives a true or a forced answer is made randomly. Consequently, each answer has a probability of being an individual’s true answer. Therefore, users’ answers do not reveal the individuals’ true answers with certainty. The randomized response technique inherently holds the local differential privacy guarantee.

More precisely, with a probability of p_true, the true answer 𝑎 is stored in the database. The probability of any false answer 𝑎′ ≠ 𝑎 is p_false = (1 − p_true)/(𝑑 − 1), where 𝑑 is the number of possible answers. This mechanism is one approach of the randomized response, called unary encoding, and it satisfies local differential privacy:

\[ P[M(a) = a] \le e^{\varepsilon} \, P[M(a) = a'] \qquad (4) \]

\[ p_{true} \, e^{-\varepsilon} = p_{false}, \qquad (5) \]

resulting in

\[ p_{true} = \frac{1}{1 + e^{-\varepsilon}(d-1)}. \qquad (6) \]

The probability of storing a true answer is equal to the privacy risk in Equation (2), where the number of possible worlds 𝑛 corresponds to the number of different answers 𝑑.
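The step from Equation (5) to Equation (6) can be spelled out as a short derivation. It assumes that the bound in Equation (4) is met with equality and uses only the fact that the probabilities of the true answer and the 𝑑 − 1 false answers sum to one:

```latex
\begin{align*}
p_{true} + (d-1)\,p_{false} &= 1
  && \text{(probabilities sum to one)} \\
p_{true} + (d-1)\,e^{-\varepsilon}\,p_{true} &= 1
  && \text{(substitute } p_{false} = p_{true}\,e^{-\varepsilon}\text{)} \\
p_{true} &= \frac{1}{1 + e^{-\varepsilon}(d-1)}
\end{align*}
```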

Hence, we can translate the privacy parameter 𝜀 into the probability with which the mechanism stores a true answer in the database. For example, assume we set 𝜀 = ln 3 and have two possible answers (𝑑 = 2). As a result, the probability of storing the true answer is 75 %. With a higher number of possible answers, e.g., 𝑑 = 28, the probability of storing the true answer decreases to 10 %. Note that the model also works for real-valued (continuous) data. In this case, the worst case with 𝑑 = 2 should be used. The result indicates the probability of storing true answers, regardless of whether the data is discrete or continuous.
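A small simulation makes the RRT translation tangible. The sketch below stores the true answer with probability p_true from Equation (6) and otherwise stores one of the other answers chosen uniformly at random; the function names, the answer labels, and the fixed random seed are assumptions made for illustration.

```python
import math
import random


def p_true(epsilon, d):
    """Probability of storing the true answer, Equation (6)."""
    return 1.0 / (1.0 + math.exp(-epsilon) * (d - 1))


def randomized_response(true_answer, answers, epsilon, rng):
    """Store the true answer with probability p_true; otherwise store
    one of the other answers, chosen uniformly at random."""
    if rng.random() < p_true(epsilon, len(answers)):
        return true_answer
    others = [a for a in answers if a != true_answer]
    return rng.choice(others)


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
epsilon = math.log(3)
answers = ["yes", "no"]  # hypothetical answer options, d = 2
stored = [randomized_response("yes", answers, epsilon, rng) for _ in range(100_000)]
observed = stored.count("yes") / len(stored)

print(p_true(epsilon, 2), p_true(epsilon, 28), observed)
```

For 𝜀 = ln 3 the formula gives 0.75 for 𝑑 = 2 and 0.10 for 𝑑 = 28, and the observed share of stored true answers in the simulation lies close to 0.75.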

3 METHODOLOGY

In this section, we present and justify the hypotheses we formulated to design our study. In addition, we detail how participants were instructed, describe our sample, and explain how we conducted the study and analyzed the data.

Syntactic anonymization models, such as 𝑘-anonymity, were originally designed for privacy-preserving data publishing [ ]. Differential privacy, on the other hand, is more suitable for privacy-preserving data mining. The concept of privacy-preserving data publishing usually assumes a non-expert data publisher, i.e., the data publisher does not have the knowledge to perform data mining [ ]. Given that 𝑘-anonymity is a viable solution for privacy-preserving data publishing, the mechanism of 𝑘-anonymity is aimed at the non-expert who is the end user of the model. With 𝑘-anonymity as a simple and intuitive model [8], we derive our first hypothesis:

(H1) Differential privacy vs. 𝑘-anonymity: The privacy protection provided by 𝑘-anonymity is easier to comprehend than the privacy protection provided by differential privacy (independent of the explanatory model).

The definition of differential privacy is complex. Therefore, it is important to describe the techniques or the implications of a differentially private mechanism. RRT has often been used as an intuitive mechanism [ ]. Previous work has shown that RRT provides more understanding among users [ ]. RISK was developed as an intuitive explanation of 𝜀. Consequently, we derive the following hypothesis:

(H2) Explanatory models: The explanatory models RRT and RISK will provide a better comprehension of the privacy protection than the DEF model.

Previous work has shown that both numeracy skills and level of education affect risk understanding [ ]. Users with low numeracy skills have difficulty understanding risk in general [ ]. These findings from previous work lead us to our final hypothesis:

(H3) Education level and numeracy skills: High levels of education and high numeracy skills help users to comprehend the privacy protection provided by differential privacy.

3.1 Measures

Participants answered questions to evaluate their subjective and objective comprehension of privacy protection. We also included measures for covariates: demographics, privacy concerns and numeracy skills.

3.1.1 Comprehension. Similar to previous work [ ], we evaluate the subjective comprehension (perceived comprehension) and objective comprehension (actual comprehension) of the 𝑘-anonymity explanation and our explanatory models of the privacy protection provided by differential privacy (RISK, RRT, and DEF).

We designed the questions concerning comprehension from scratch, using direct questions. We included three 7-point Likert-scale questions regarding how the participants subjectively comprehended the level of privacy protection that each mechanism (and its respective privacy parameter) provided. Following the questions concerning subjective comprehension, there were four questions testing the participants’ objective comprehension of privacy protection. In addition, we gave the participants the possibility to comment on their comprehension answers. Last, we asked participants to directly compare which privacy mechanism they felt was most comprehensible and intuitive in terms of privacy protection.


Figure 1: Overview of the study design. [Flowchart stages: demographics (age, field of study, current level of education); scenario (statistics on drug use at school; parents should not infer their son’s or daughter’s drug use) with comprehension questions; privacy protection explanations (within-subject) for differential privacy (between-subject: RISK, RRT, or DEF) and for 𝑘-anonymity, each followed by objective and subjective questions; direct comparison; numeracy, privacy experience, and privacy concerns; check question.]

3.1.2 Covariates. We assessed the participants’ numeracy skills using subjective rating and objective test questions. The numeracy questions were taken from multiple validated numeracy assessments found in the literature [ ]. Moreover, we asked about any previous experience with privacy mechanisms in general and differential privacy in particular. Finally, we also assessed the participants’ general privacy concerns using a set of questions adapted from Malhotra et al. [ ]. We used these questions in the categories of collection and awareness. We also included “attention” check questions as part of the privacy aptitude and at the end of the study to exclude inattentive participants: “Please select 3 (More or less agree) for this question” and “What is 4+5”. We assume that those participants who were motivated at the end of the survey were also motivated at the beginning.

3.2 Scenario and Explanations

We defined a fictional scenario about drug use at school as a running example of a setting where privacy is crucial and where privacy protection needs to be well understood at the same time. A school stores student answers to a questionnaire on drug use in a database grouped by age and class. In order to raise awareness, parents can query the database, which is protected with a privacy mechanism.

Our explanations are designed from scratch. We used text-based explanations because we focused on evaluation of the explanatory model, not on how it was communicated. Our explanations start with a short description of the privacy mechanism, inspired by the Techniques description of Cummings et al. [ ]. This was followed by an explanation of the privacy protection parameter, e.g., 𝑘, difference (DEF), risk (RISK) and probability (RRT). Finally, we applied these explanations to our scenario and provided concrete examples. The exact wording of our explanations can be found in Appendix A.1.

3.3 Experimental Process

Prior to the main study, we conducted a pilot study to increase the validity of our study questions. In particular, the pilot study allowed us to validate our questions, explanations and instructions in terms of textual clarity and general comprehensibility. We summarize the results of the pilot study and the induced changes in Section 4.2.

Footnote 1: We believe that this mathematical operation does not relate to numeracy because of its simplicity. When answered, the question was answered correctly by all participants of our main study.

Footnote 2: The explanations, questions on subjective and objective understanding and anonymized tables can be found in http://arxiv.org/abs/2404.04006.

3.3.1 Overview of the Study Design. In Figure 1 we present an overview of our study design and procedure. The process and design of the main study and the pilot study were the same. Both studies had a mixed design with a between-subject factor “explanatory model” (for differential privacy protection with three conditions RISK, RRT, and DEF) and a within-subject factor with two levels (“privacy protection provided by 𝑘-anonymity” and “privacy protection of differential privacy”). The within-subject factor included in our study allowed us to evaluate the comprehensibility of differential privacy protection with each explanatory model compared to the comprehensibility of the privacy protection of 𝑘-anonymity.

As a result, we were able to

(1) verify whether the privacy protection of 𝑘-anonymity is indeed easier to comprehend than that of differential privacy,

(2) anchor the comprehensibility of differential privacy protection with each explanatory model to the comprehensibility of privacy protection of 𝑘-anonymity as a baseline for the best possible comprehensibility,

(3) control for any interindividual differences in comprehension skills between the three conditions.

Moreover, use of a within-subject design reduced the standard deviation in the objective and subjective comprehension scores, improving the statistical validity of our study.

After a short welcome text explaining the purpose of the study, the participants were asked to provide some demographic information about themselves (age, field of study and current level of education). Next, we introduced our fictional scenario. We ensured that the participants understood the scenario by asking three check questions: 1) Who provides the database in the scenario? 2) What kind of data is stored in the database? 3) Eve (the adversary) wants to find out the data of whom?

3.3.2 Procedure of Explanations. After ensuring that the participants had read and understood the scenario, each participant was presented with explanations of the privacy protection of two privacy-enhancing mechanisms: an explanation of 𝑘-anonymity and an explanation of differential privacy. To control for learning and other sequence effects, the order of the two explanations and their respective comprehension questions was balanced. In other words, participants were randomly assigned to either the first order group, where the explanation and questions for 𝑘-anonymity were presented first, or to the second order group, where the explanation and questions of the differential privacy protection were presented first. Since each participant read and answered the questions for the two explanations, our study had a within-subject factor with the privacy protection of differential privacy and the privacy protection of 𝑘-anonymity as factor levels.


The explanation of the privacy protection of 𝑘-anonymity was the same across all conditions. Each participant randomly (uniformly distributed) received one of the three explanations (RISK, RRT, or DEF) for differential privacy protection, resulting in three between-subject conditions for the factor “explanatory model for differential privacy protection”. We used similar phrasing and wording in all explanations, including the explanation of the privacy protection of 𝑘-anonymity, in order to compare the comprehension of the explanation. In addition, the subjective as well as the objective comprehension questions were identical for each explanation.
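The assignment procedure described above can be sketched as follows. This is an illustrative reconstruction under the assumption of simple, independent randomization per participant; the paper does not state how the survey tool implemented it, and the names `MODELS`, `ORDERS`, and `assign` are hypothetical.

```python
import random

MODELS = ["RISK", "RRT", "DEF"]                               # between-subject factor
ORDERS = ["k-anonymity first", "differential privacy first"]  # order of the two explanations


def assign(participant_id, rng):
    """Draw the explanatory model and the presentation order uniformly at random."""
    return {
        "participant": participant_id,
        "model": rng.choice(MODELS),
        "order": rng.choice(ORDERS),
    }


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
assignments = [assign(i, rng) for i in range(12)]
for a in assignments:
    print(a)
```

Simple randomization of this kind does not guarantee equally sized groups; a blocked scheme would be needed for that, which is a design choice outside the scope of this sketch.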

The level of privacy protection provided by the differential privacy mechanism, i.e., the privacy parameter 𝜀, was the same in each explanatory model. We wanted to rule out the possibility of the level of privacy protection systematically interfering with the participants’ comprehension of differential privacy protection. However, differential privacy assumes a stronger adversary than 𝑘-anonymity does. A 𝑘-anonymity mechanism cannot provide an equally strong privacy protection as the differential privacy mechanism explained using the RISK, RRT, and DEF explanatory models in our scenario. Therefore, we have to trust that the weaker privacy protection did not interfere with the participants’ comprehension. Consequently, in our study, we explain the privacy protection of 𝑘-anonymity with 𝑘 = 4. We believe that this is an appropriate value to explain the privacy protection of 𝑘-anonymity since this results in a probability of being identified of 0.25. Again, we emphasize that we cannot match the privacy levels of the two mechanisms.

3.3.3 Procedure after Explanations. After providing both the explanations and the questions about their comprehensibility, we asked participants directly about which privacy mechanism (if any) was more comprehensible with respect to the level of privacy protection and why. We also asked which mechanism (if any) they regarded as providing a greater privacy protection in the particular scenario, and why. The latter question was implemented to gain a deeper insight into whether the participants had gained a sense of the relationship between a particular privacy parameter and the respective level of privacy protection provided by each mechanism.

3.4 Participant Recruitment and Attributes

Both the pilot study and the main study were implemented using LimeSurvey and emailed to university students in Berlin. Our main study was publicly available between February 8 and 22, 2023. Participation was voluntary and we did not offer any remuneration.

We used a set of questions provided by the Ethics Commission of TU Berlin to self-evaluate the ethical considerations of the planned research project. We then decided that a detailed application to the Ethics Committee was not necessary. However, to address potential ethical issues, we informed the participants (of the pilot study and main study) about our data policies in our invitation email before the survey: The evaluation of the responses would be anonymized, i.e., we only used the LimeSurvey Response ID as an identifier and would remove it before the statistical analysis. We only accessed the results of the pilot study that were necessary to validate our explanations and questions.

Footnote 3: www.limesurvey.org

Footnote 4: We cannot exclude the possibility of participants who participated in both studies. However, the pilot study took place one year earlier, so we assume that the effect is negligible. In addition, participants were asked about their prior knowledge of privacy, so the overlap was controlled in the results of participants without any prior knowledge.

For the participants, the purpose of our study was to evaluate explanations of the privacy protection provided by two privacy-enhancing mechanisms. At that point, we did not refer to differential privacy protection as the focus of our study to avoid the influence of demand characteristics or participant expectations about our desired outcome of the study. All participants were presumed to have at least a high school diploma and to be currently studying at a university.

There were a total of

249

respondents in the main study. Of these,

only

participants answered the subjective and objective com-

prehension questions for both explanations and could therefore be

included in the analysis. Of these, three participants were excluded

because they gave an incorrect answer to one of the comprehension

questions regarding the scenario or because they answered one of

the attention-check questions incorrectly, resulting in a total of

analyzed participants. Of these,

participants fully completed the

study and thus answered all questions. We decided to nevertheless

include the other

participants who did not finish the study into

parts of our analysis to increase the statistical power of our study

and to reduce motivation bias. In conclusion,

participants were

included in analyses involving the objective and subjective compre-

hension,

participants were included in all our analyses, including

those concerning the direct comparison and those involving the

participants’ privacy concerns or numeracy skills.

Consequently, we included

submissions in the analysis:

for

RISK

for

RRT

, and

for

DEF

. Of these participants,

indi-

cated a “STEM” study field of science, technology, engineering, or

mathematics (

were students of computer science/engineering).

Five students indicated a study field of management or economics,

eight students indicated a study field related to architecture or de-

sign, and

students indicated a study field of social sciences, or

psychology. The age of the participants ranged from

years

with a mean age of approximately

25.03

and a median age of

The level of education was high overall, with

participants having

a bachelor’s degree or higher. Of these,

participants stated that

they had a master’s degree. These

participants spent an average

of 34.8 minutes on the study.

3.5 Data Set Pre-processing and Analysis

Each participant received a score for subjective comprehension and an objective comprehension score, both between 0 and 1, corresponding to “very poor” and “very good” comprehension, respectively. To obtain the subjective comprehension score, we calculated the mean score of the three subjective comprehension questions for each participant. In doing so, we inverted the score of the first question so that for every question a higher score indicated greater comprehension. We then normalized the scores to a range from 0 to 1. To measure objective comprehension, we scored each correct and incorrect answer as 1 and 0, respectively. We calculated the mean of the four objective comprehension questions for each participant and normalized the objective score to be between 0 and 1.
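The scoring can be illustrated with a short sketch. It assumes that the 7-point Likert answers are coded from 1 to 7, that the first subjective question is inverted by mirroring it on that scale, and that normalization is a linear rescaling to the range from 0 to 1; these coding details and the function names are assumptions, since the exact procedure is not spelled out here.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 7  # assumed coding of the 7-point Likert scale


def subjective_score(answers):
    """Mean of the three subjective ratings (first one inverted), rescaled to [0, 1]."""
    first, *rest = answers
    inverted_first = LIKERT_MIN + LIKERT_MAX - first  # e.g. 2 becomes 6
    raw = mean([inverted_first, *rest])
    return (raw - LIKERT_MIN) / (LIKERT_MAX - LIKERT_MIN)


def objective_score(correct_flags):
    """Share of correctly answered objective questions (correct = 1, incorrect = 0)."""
    return mean(1 if correct else 0 for correct in correct_flags)


# Hypothetical answers of a single participant.
s = subjective_score([2, 6, 5])
o = objective_score([True, True, False, True])
print(round(s, 3), o)
```

For the hypothetical participant, the inverted ratings are 6, 6, and 5, giving a subjective score of about 0.778, and three of four correct answers give an objective score of 0.75.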


In order to avoid the influence of baseline differences in the

participants’ comprehension abilities between the conditions, we

calculated the differences between the comprehension scores for

the privacy protection of

𝑘

-anonymity and differential privacy for

each participant. In other words, we subtracted the mean scores for

the objective and the subjective comprehension of the differential

privacy explanation from the mean scores of the

𝑘

-anonymity ex-

planation model. A positive difference means that the

𝑘

-anonymity

explanation had a higher score, and was thus easier to comprehend.

A negative difference, on the other hand, means that the differential privacy explanation had a higher score. If the difference is 0, the scores indicate a similar level of comprehension.
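The per-participant difference score can be illustrated with a short sketch. The function names and the tolerance used to treat two scores as equal are assumptions introduced here for illustration only.

```python
def comprehension_difference(k_anonymity_score, dp_score):
    """Difference score: k-anonymity score minus differential privacy score."""
    return k_anonymity_score - dp_score


def interpret_difference(difference, tolerance=1e-9):
    """Map a difference score to the explanation that was easier to comprehend."""
    if difference > tolerance:
        return "k-anonymity"
    if difference < -tolerance:
        return "differential privacy"
    return "similar"
```

Because both scores are taken from the same participant, the difference removes that participant's baseline comprehension ability from the comparison between conditions.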

We tested for differences between the subjective comprehension

scores for the two explanations for each condition using one-tailed t-

tests. Furthermore, we examined the effect of the explanatory model

for differential privacy protection on the differences between the

comprehension scores for the two explanations. We also examined

the effect of the order of explanations, i.e., whether participants had

read the explanation and answered the corresponding comprehen-

sion questions for privacy protection of

𝑘

-anonymity or for that

of differential privacy first. To that end, we conducted two-way

analyses of variance (ANOVAs) with two between-subject factors

of explanatory model and order. We conducted one ANOVA for the

differences in the subjective comprehension scores and one for the

differences in the objective comprehension scores. The ANOVAs

also allowed us to investigate any interaction effects between the

two factors (explanatory model and order).
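As a rough illustration of the within-condition comparison, the following sketch computes a t statistic on the per-participant differences and a standardized effect size. It rests on assumptions that are not confirmed by the text: that the t-tests were paired, and that Cohen’s 𝑑 is based on the average of the two sample variances. The function names are hypothetical, and the p-value is not computed here, since it requires the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(k_scores, dp_scores):
    """Return (t, degrees of freedom) for the per-participant differences.

    A positive t means the k-anonymity scores were higher on average.
    """
    if len(k_scores) != len(dp_scores) or len(k_scores) < 2:
        raise ValueError("need two equally long samples with at least 2 entries")
    diffs = [k - d for k, d in zip(k_scores, dp_scores)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


def cohens_d(k_scores, dp_scores):
    """Standardized mean difference using the average of the two variances."""
    pooled_sd = sqrt((stdev(k_scores) ** 2 + stdev(dp_scores) ** 2) / 2)
    return (mean(k_scores) - mean(dp_scores)) / pooled_sd
```

For a one-tailed test of the hypothesis that 𝑘-anonymity scores are higher, only large positive values of t count as evidence against the null hypothesis.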

We also wanted to analyze how the participants’ comprehension

of differential privacy protection was influenced by privacy con-

cerns and numeracy skills. For that we included only participants

who had completed the whole study, as the questions regarding

privacy concerns and numeracy skills were asked at the end of the

study. To test (H3), we calculated a privacy concern score as well

as a subjective numeracy score and an objective numeracy score

for each participant. We used Pearson’s correlation coefficient to

measure the correlations.
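For reference, Pearson’s correlation coefficient between two score lists can be computed as in the sketch below. The function name is an illustrative choice, and the code only returns 𝑟; the significance test is left to a statistics package.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 entries")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)
```

A value close to 1 indicates that, for example, higher numeracy scores go along with higher comprehension scores, whereas a value close to 0 indicates no linear relationship.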

In the final data set, each entry contains the following values: an explanation group {1-3}, a mean subjective comprehension score for 𝑘-anonymity [0-1], an objective comprehension score for 𝑘-anonymity [0-1], a mean subjective comprehension score for differential privacy [0-1], an objective comprehension score for differential privacy [0-1], a comparison regarding comprehensibility {𝑘-anonymity, differential privacy, both}, a comparison regarding prevention {𝑘-anonymity, differential privacy, both}, education {high school, bachelor, master}, a privacy concerns score [1-7], a subjective numeracy score [0-2], and an objective numeracy score [0-8].
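One possible way to represent such an entry in code is sketched below. The class name, field names, and validation logic are hypothetical; only the value ranges and categories are taken from the description above.

```python
from dataclasses import dataclass

CHOICES = ("k-anonymity", "differential privacy", "both")
EDUCATION = ("high school", "bachelor", "master")


def _check_range(name, value, low, high):
    """Raise ValueError if value lies outside the closed interval [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {value}")


@dataclass(frozen=True)
class StudyEntry:
    explanation_group: int          # {1-3}
    subjective_k_anonymity: float   # [0-1]
    objective_k_anonymity: float    # [0-1]
    subjective_dp: float            # [0-1]
    objective_dp: float             # [0-1]
    more_comprehensible: str        # one of CHOICES
    better_prevention: str          # one of CHOICES
    education: str                  # one of EDUCATION
    privacy_concerns: float         # [1-7]
    subjective_numeracy: float      # [0-2]
    objective_numeracy: int         # [0-8]

    def __post_init__(self):
        if self.explanation_group not in (1, 2, 3):
            raise ValueError("explanation_group must be 1, 2, or 3")
        for name in ("subjective_k_anonymity", "objective_k_anonymity",
                     "subjective_dp", "objective_dp"):
            _check_range(name, getattr(self, name), 0.0, 1.0)
        for name in ("more_comprehensible", "better_prevention"):
            if getattr(self, name) not in CHOICES:
                raise ValueError(f"{name} must be one of {CHOICES}")
        if self.education not in EDUCATION:
            raise ValueError(f"education must be one of {EDUCATION}")
        _check_range("privacy_concerns", self.privacy_concerns, 1, 7)
        _check_range("subjective_numeracy", self.subjective_numeracy, 0, 2)
        _check_range("objective_numeracy", self.objective_numeracy, 0, 8)
```

Validating the ranges when an entry is created makes it harder for a mis-coded response to slip into the statistical analysis unnoticed.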

3.6 Limitations

Our study was conducted with students from German universities

only. Therefore, our results cannot be transferred to the general

public. To allow more generic inferences, future work should test a

more heterogeneous sample. Also, the number of participants was

limited and many participants aborted the study. As a result, the

statistical power might simply not have been sufficient to detect all

effects. Therefore, future work should investigate a higher number

of participants, or should execute a power analysis, using our effect

sizes as a basis. To increase the statistical power of our study and to reduce motivation bias, we also included participants who answered the questions concerning subjective and objective comprehension for both explanations but did not fully complete the study.

However, these responses were only used in certain parts of our

analysis. Although these participants answered the scenario-check

question, they did not answer our attention- and math-check ques-

tions at the end of our study. Therefore, we cannot be certain that

those participants answered our study attentively.

Furthermore, we compared the comprehensibility of different

privacy mechanisms, which also provided different levels of privacy

protection. Therefore, we cannot rule out that this difference had an

effect on our results. However, it is unlikely as we did not compare

the privacy protection provided by the mechanisms but only the

participants’ comprehension.

Most importantly, our objective comprehension questions may

have been inherently easier to answer for one of the mechanisms’

privacy protection or for one of the explanatory models for differ-

ential privacy protection. Even though we conducted a pilot study

to validate our scenario, our explanations, and our comprehension

questions in terms of textual clarity and general comprehensibility,

we had to apply some adjustments to the objective comprehension

questions. As a result, the modified questions were not validated

before the main study. Therefore, the objective comprehension

questions may have inherently favored one of the explanations.

4 PILOT STUDY

The primary goal of the pilot study was to evaluate our study ques-

tions with respect to ambiguity, difficulty, and internal consistency.

Furthermore, the pilot study allowed us to refine the wording of

the questions, explanations, and study instructions. Moreover, it

allowed us to consider and address the comments provided by the

participants. These comments were read by one author and selected if they included feedback about the instructions.⁵

We had the following findings from our pilot study. The majority of incorrect values for our check question were very close to the correct value. Hence, some of the answers may have been incorrect due to mathematical difficulties rather than inattentiveness. Therefore, we replaced the “attention” check question (What is 15 + 7?) with a simpler calculation to reach less mathematically able people: What is 4 + 5?

All of our explanations were generally understandable, indicated by high mean comprehension scores for all explanations (min = 0.62, max = 0.92). In the main study, we also recorded the order of explanations for 𝑘-anonymity and differential privacy, to derive any possible relationship between the comprehension and the order of the explanations.

The means of the subjective scores were similar across all condi-

tions. Therefore, we only aligned the wording so that all questions

explicitly enquired about the level of privacy protection instead of

about the mechanism itself.

We removed one objective comprehension question since this question was answered correctly by very few participants compared to the other questions. In addition, we modified objective questions #1–#3 to maintain consistency in wording, by adding the concept of the privacy parameter in the explanations and in question #2. In question #3, we asked for the implications when the privacy parameter is 0.

⁵ The comments were solely intended to improve the wording of the questions, the explanations, and the study instructions, and were not primary research artifacts. Therefore, we did not use any further statistical methods for these entries.

Table 1: Comprehension scores for the explanatory models

              𝑘-anonymity              differential privacy
        N     subjective   objective   subjective   objective
RISK    27    0.78         0.77        0.64         0.51
RRT     30    0.70         0.73        0.65         0.50
DEF     33    0.78         0.76        0.45         0.43

The comments overall suggested a high level of comprehension of privacy protection for both mechanisms. We also confirmed that the privacy protection of 𝑘-anonymity was indeed regarded as inherently easier to comprehend, and 𝑘-anonymity as a less complex privacy mechanism. The comments led us to shorten the explanations as much as possible and to explain that the study focused on the comprehension of privacy protection rather than of the mechanisms themselves. The comments also led us to focus more on the link between the privacy parameter and the level of privacy protection provided, and less on how the mechanisms work.

Many participants were confused about the phrase “random

noise” in the explanation used in our pilot study, and the specified

results of the differential privacy mechanism. We decided to avoid

mathematical vocabulary wherever possible, to completely exclude

the concept of random noise from the explanations and to omit any

specific calculations or results returned by the mechanism. Instead, we emphasized the probabilities and described that differential privacy provides privacy protection by randomly modifying statistical results. We also received indications that it would be helpful to

include the information in our scenario that Eve, the adversary,

knows that the returned result may not be correct. Furthermore,

we changed the wording of the sample database answers from

“true/false” to “yes/no”, because this seemed to be confusing in light

of the “true answers of the students”.

5 RESULTS

In the following, we describe the results of our main study. Throughout our analysis, we use a significance level of 𝛼 = 0.05 and adjust the results of the post-hoc t-tests with Bonferroni corrections.⁶
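The Bonferroni adjustment can be sketched in a few lines. The function names are illustrative, and the sketch assumes that every comparison in a family is tested against the same adjusted significance level.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Adjusted per-comparison significance level alpha / n."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def is_significant(p_value, alpha=0.05, n_comparisons=1):
    """Check a p-value against the Bonferroni-adjusted significance level."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)
```

With 𝛼 = 0.05 and three comparisons, each individual test is evaluated against 0.05/3, so a p-value that would pass at 0.05 on its own may no longer count as significant within the family.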

5.1 Subjective Comprehension

The mean scores of subjective comprehension in Table 1 show that across all explanatory models for differential privacy protection, the level of privacy protection resulting from 𝑘-anonymity was easier to comprehend than that resulting from differential privacy. One-tailed t-tests revealed a significant difference in comprehensibility, with higher scores for the privacy protection of 𝑘-anonymity, using the RISK model (𝑡 ≈ 4.522, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 0.762) and using the DEF model (𝑡 ≈ 7.586, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 1.749). These results support (H1). In contrast to (H1), when we used the RRT model, the difference in comprehension between the privacy protection of 𝑘-anonymity and differential privacy was not significant for subjective comprehension (𝑡 ≈ 1.27, 𝑝 ≈ 0.107).

⁶ The false positive error grows with the number of tests performed. A common approach to deal with this is the Bonferroni correction, which sets the significance level for each of the 𝑛 comparisons to 𝛼/𝑛. For example, if we have a set of three hypothesis tests and 𝛼 = 0.05, our adjusted significance level equals 0.05/3 ≈ 0.017.

[Figure 2: Differences between scores for comprehension of 𝑘-anonymity and differential privacy. Panels: (a) Subjective, (b) Objective. Each panel plots the mean difference (y-axis, −1.0 to 1.0) per explanatory model (RISK, RRT, DEF), grouped by order of explanations: 𝑘-anonymity first (1) or differential privacy first (2).]

The subjective comprehension scores regarding differential privacy protection were higher when using the RISK model than when using the DEF model, and slightly higher still when using the RRT model. We present the difference between the mean subjective scores of 𝑘-anonymity and the mean subjective score of differential privacy in Figure 2a. In RRT, the differences were distributed around zero, with a median of zero, whereas the interquartile range and median in RISK were above zero, but lower than in DEF.

We tested the distribution of the differences with RISK, RRT, and DEF for each order of explanations with D’Agostino’s K-squared tests. There was no indication of the differences not being normally distributed for any of the conditions. Furthermore, Levene’s test did not show any significant differences of the variances between any of the conditions, indicating equality of variances. Hence, all requirements for conducting a between-subject two-way ANOVA were met. The ANOVA revealed a significant effect of the explanatory model on the difference between the subjective comprehension scores for the privacy protection of 𝑘-anonymity and differential privacy, with 𝐹 ≈ 12.979, 𝑝 < 0.001, and 𝜂² ≈ 0.234. There was no significant effect of the order of explanations on the differences. Furthermore, there was no significant interaction between explanatory model and order of explanations.

One-tailed t-tests showed that in comparison to the DEF model, the mean difference was significantly smaller with the RISK model (𝑡 ≈ −3.342, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 0.809) as well as with the RRT model (𝑡 ≈ −4.624, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 1.166). Moreover, the difference in comprehensibility was smaller when we used the RRT model than when we used the RISK model (𝑡 ≈ −1.725, 𝑝 ≈ 0.045, Cohen’s 𝑑 ≈ 0.458). These results corroborate (H2), where we hypothesized that the comprehension of the RISK and the RRT model is higher than that of the DEF model. For the RRT model, participants achieved subjective comprehension scores comparable to those for the privacy protection of 𝑘-anonymity, a privacy mechanism that is supposedly less complex and whose privacy protection is more intuitively understandable.


[Figure 3: Proportion of correctly answered questions on objective comprehension. Panels: (a) 𝑘-anonymity, (b) Differential privacy. Each panel plots the number of participants (y-axis) against the number of correct objective answers (x-axis, 0 to 4), by explanatory model (RISK, RRT, DEF).]

5.2 Objective Comprehension

From Table 1, we infer that the objective comprehension scores for 𝑘-anonymity were higher than for the differential privacy explanation. The scores for the objective comprehension of the privacy protection of differential privacy were generally low for all explanations. With a mean of around 0.5, the scores correspond to the success rate expected from randomly guessing the answers.
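To make the guessing baseline concrete, the following sketch computes the distribution of correct answers under pure guessing. It assumes four objective questions with two answer options each, so that each guess is correct with probability 0.5; the number of answer options and the function names are assumptions made for this illustration.

```python
from math import comb


def guessing_distribution(n_questions=4, p_correct=0.5):
    """Probability of k correct answers (k = 0..n) under independent guessing."""
    return [
        comb(n_questions, k) * p_correct**k * (1 - p_correct) ** (n_questions - k)
        for k in range(n_questions + 1)
    ]


def expected_guessing_score(n_questions=4, p_correct=0.5):
    """Expected normalized score (fraction of correct answers) under guessing."""
    distribution = guessing_distribution(n_questions, p_correct)
    return sum(k * p for k, p in enumerate(distribution)) / n_questions
```

Under these assumptions the expected normalized score equals the per-question guessing probability, which is why a mean near 0.5 is hard to distinguish from guessing.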

In Figure 3, we show the number of correct objective answers for each explanatory model. For 𝑘-anonymity, all participants had at least one correct answer (cf. Figure 3a). As illustrated in Figure 3b, three participants answered all objective comprehension questions wrong. All three participants were part of the DEF group. Notably, none of the participants answered all the questions correctly. One-tailed t-tests revealed a significant difference in comprehensibility, with higher scores for the privacy protection of 𝑘-anonymity, with RISK (𝑡 ≈ 6.31, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 1.72), with RRT (𝑡 ≈ 6.18, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 1.59), and with DEF (𝑡 ≈ 6.62, 𝑝 < 0.001, Cohen’s 𝑑 ≈ 1.51). These results confirm (H1).

In Figure 2b, we plot the mean differences of scores between

𝑘

-anonymity and differential privacy. The mean difference between

the objective comprehension of privacy protection provided by

𝑘

-anonymity and differential privacy was smallest with the

RRT

model, followed by the

RISK

model, and last the

DEF

model. With

the

RISK

and the

RRT

model the difference was smaller for the

second order group, i.e., where the differential privacy explanation

was shown first.

All requirements for conducting a between-subject two-way ANOVA were met. In particular, D’Agostino’s K-squared tests for normality did not reveal any significant deviation of the differences from a normal distribution for any of the three conditions. Also, Levene’s test indicated equality of variances between the conditions. The ANOVA did not reveal any significant effect of the explanatory model on the differences in objective comprehension (𝐹 ≈ 1.245, 𝑝 ≈ 0.293). There was no indication of a significant effect of the order of the explanations or a significant interaction of the explanatory model and the order. Figure 2b shows that the interquartile range of the differences was lower with RISK and RRT than with DEF. For these reasons, we conducted post-hoc one-tailed t-tests to investigate whether the mean differences with the RISK model and the RRT model for differential privacy protection differed significantly from the DEF model. The results indicated that the mean difference was smaller with the RISK model than with the DEF model (𝑡 ≈ −1.008, 𝑝 ≈ 0.159, Cohen’s 𝑑 ≈ 0.262), although this difference was not statistically significant. There was a tendency for the mean difference to be smaller with the RRT model than with the DEF model (𝑡 ≈ −1.467, 𝑝 ≈ 0.074).

[Figure 4: Comparison regarding comprehensibility and privacy prevention. Panels: (a) Comprehensible, (b) Prevent. Each panel shows the number of participants per explanatory model (RISK, RRT, DEF) who chose 𝑘-anonymity, both equally, or differential privacy.]

These results support (H2), suggesting that the users’ objective

comprehension concerning the privacy protection provided by dif-

ferential privacy is enhanced through the

RISK

model and may be

enhanced through the RRT model.

5.3 Comparison of 𝑘-Anonymity and Differential Privacy

We present the answers to the direct comparison between the differential privacy mechanism and the 𝑘-anonymity mechanism in Figure 4. Few participants rated the level of privacy protection and thus the implications of differential privacy as more comprehensible than that of 𝑘-anonymity (see Figure 4a). In DEF, nobody rated the differential privacy explanation as more comprehensible than 𝑘-anonymity. These results support (H1), which states that the privacy protection of 𝑘-anonymity is easier to comprehend than the privacy protection of differential privacy.

The overall answer about which mechanism was better at preventing a data breach was in favor of 𝑘-anonymity with the RISK model and the RRT model (cf. Figure 4b). In DEF, participants said that both were equally good at preventing a data breach.

5.4 Effects of Level of Education and Numeracy Skills

In Figure 5a, we show that objective comprehension scores increase with higher levels of education across all explanatory models (𝑟 ≈ 0.285, 𝑝 ≈ 0.008). Especially for the DEF model, a higher level of education is associated with a higher objective comprehension score. Thus, we can confirm (H3), which states that a high level of education helps users to comprehend the privacy protection of differential privacy.

Our participants’ objective and subjective numeracy skills were high overall; we had no participants with a score below 0.9 (see Figure 5b). The participants’ subjective numeracy skills also correlated positively with their subjective comprehension scores for differential privacy protection (𝑟 ≈ 0.299, 𝑝 ≈ 0.008). We did not find any correlation between the objective numeracy skills and the objective comprehension score. Regarding the subjective comprehension, we can confirm (H3). We did not find any other correlations between the objective or subjective comprehension scores and the level of privacy concerns.

[Figure 5: Correlations of comprehension. Panels: (a) Education level: objective comprehension score for differential privacy (objective-dp) by level of education (high school, bachelor, master) and explanatory model (RISK, RRT, DEF); (b) Subjective numeracy: mean subjective comprehension score for differential privacy (mean-subjective-dp) against the subjective numeracy score (0 to 2).]

5.5 Exclusion of Knowledgeable Participants

Prior knowledge of privacy definitions may have influenced the study results.⁷ Therefore, we reran our analysis, excluding all participants who indicated that they were already aware of one or more privacy mechanisms, particularly differential privacy. This

resulted in only

participants who completed the study, so the

results are limited in their power. However, most findings remained

unchanged after these adaptations.

The privacy protection provided by

𝑘

-anonymity was subjec-

tively rated as significantly more easily understood than the privacy

protection provided by differential privacy among respondents in

the RISK and in the DEF group, but not in the RRT group.

The mean difference between the subjective comprehension scores of 𝑘-anonymity and differential privacy was again smallest with the RRT model, followed by the RISK model. There was still a significant effect of the explanatory model on the differences in the subjective comprehension scores (𝐹 ≈ 8.814, 𝑝 < 0.001, 𝜂² ≈ 0.2570) and no significant effect from the order of the explanations or from the interaction between the explanatory model and the order of the explanations. One-tailed t-tests indicated that the mean difference between the subjective comprehension scores for the privacy protection provided by 𝑘-anonymity and differential privacy was significantly smaller when using the RRT model compared to the DEF model.

The scores regarding the objective comprehension of differential privacy protection were highest with the RISK model, followed by the RRT model. However, the mean difference between the objective comprehension scores for the privacy protection of 𝑘-anonymity and of differential privacy was now smallest with the RRT model. There was still no significant effect on differences in the objective comprehension scores.

⁷ We could not exclude the possibility of an overlap of participants who took part in the pilot study and the main study. However, by excluding participants with prior knowledge of privacy, we controlled for overlap and thus verified that the main results would remain valid.

The findings regarding the correlations were similar to the findings when knowledgeable participants were included. There was again a significant positive correlation between subjective comprehension scores and subjective numeracy skills (𝑟 ≈ 0.349, 𝑝 ≈ 0.008). The level of education also showed a positive, though not significant, association with objective comprehension (𝑟 ≈ 0.251, 𝑝 ≈ 0.067). Remarkably, we found a significant positive correlation between the objective and subjective comprehension for 𝑘-anonymity (𝑟 ≈ 0.361, 𝑝 ≈ 0.006). Again, we did not find any of the other expected correlations.

5.6 Summary of Findings

The following points represent our main findings:

• The privacy protection of 𝑘-anonymity was only rated as significantly more easily understood subjectively than the privacy protection of differential privacy with the RISK and the DEF models.

• The privacy protection of 𝑘-anonymity is objectively easier to comprehend than the privacy protection of differential privacy (independent of the explanatory model).

• The RISK and RRT models significantly enhanced the subjective comprehensibility of differential privacy protection to a greater extent than the DEF model.

• The objective comprehension of differential privacy protection is enhanced by the RISK model and may be enhanced through the RRT model.

• We find a positive correlation between the level of education and objective comprehension.

• Participants with high subjective numeracy skills also had high subjective comprehension scores. Moreover, participants with high objective numeracy skills also had higher objective comprehension scores.

6 DISCUSSION

Our study was motivated by investigating various models attempt-

ing to explain privacy protection of differential privacy. We com-

pared the comprehensibility of three different explanatory models

with the privacy protection of

𝑘

-anonymity as a baseline for com-

prehensibility. In the following section, we put our results into

a larger context. One especially salient aspect is the suitability

of our explanatory models for understanding differential privacy

protection. We also reflect on the differences in how users under-

stand privacy protection of differential privacy compared to the

protection of

𝑘

-anonymity. Finally, we present our thoughts on the

influence of the level of education and numeracy skills.

The different explanatory models support the comprehension of pri-

vacy protection to varying extents. The subjective comprehension is

significantly enhanced by the

RRT

and

RISK

models of explanation.

The

DEF

model was the most difficult to understand with respect to

the privacy protection. This result fits well with [

] and confirms

that the pure definition of differential privacy is not that easy to

understand. Nevertheless, the

RRT

model helped people to com-

prehend differential privacy protection significantly better than

the

RISK

model did. We therefore suggest that the

RRT

model, as a

metaphor, is easier to imagine. However, Karegar et al. [

] advised


being careful with metaphors, as users find them difficult to transfer

to other contexts. We believe that a transfer to other contexts is

easier with our explanatory

RRT

model because we describe privacy

protection and not the mechanism itself.

Explanatory models do not contribute to objective comprehension.

Whereas subjective comprehension was enhanced, we found no difference in objective comprehension of differential privacy protection. Surprisingly, our objective comprehension scores for all differential privacy models were low, with a mean of ≈ 0.5. This

may be either due to the complexity of our explanatory models

or to the difficulty of our objective questions. In order to compare

𝑘

-anonymity and differential privacy, we aligned the questions, but

we compared two different mechanisms. Our questions on objective

comprehension were therefore unable to fully capture the subtleties

of differential privacy and thus of our explanatory model. Future

studies should therefore adapt the questions accordingly to better

capture objective comprehension.

The privacy protection of 𝑘-anonymity is more comprehensible than that of differential privacy. We can summarize by stating that the privacy protection provided by 𝑘-anonymity seems to be subjectively easier to comprehend than that of differential privacy. Differential privacy protection explained with the RRT model seems almost as easy to comprehend as the privacy protection provided by 𝑘-anonymity. Nevertheless, even here more than half of the participants rated

𝑘

-anonymity as subjectively more comprehensible

when asked to directly compare the comprehension of the privacy

protection of both mechanisms. This result is in line with the find-

ings of Valdez et al. [

], who evaluated users’ willingness to share

personal health data when applying privacy-preserving techniques

such as

𝑘

-anonymity or differential privacy. The perception of pri-

vacy was rated more strongly for

𝑘

-anonymity than for differential

privacy. The authors assumed that this was because the protection

of differential privacy was too difficult to conceptualize. Our results

show that the

RRT

model enhances the subjective comprehensibility

significantly compared to the other two explanatory models, and

also yields the best scores for the participants’ subjective compre-

hension of the privacy protection in comparison to the privacy

protection of

𝑘

-anonymity. We therefore argue that this model can

serve as a basis for further studies.

In addition, we have high scores concerning objective compre-

hension for

𝑘

-anonymity. This is probably due to the fact that the

privacy parameter

𝑘

has a direct implication for data protection,

which can be understood independently of the data [

]: the param-

eter is related to the legal concept of individual identifiability. Our

explanatory models help establish this relationship to identifiability,

but the implications must be explained in terms of a use case. With

𝑘

-anonymity, privacy protection and the mechanism itself can be

easily visualized with a data set. Hence, a non-expert can verify

that the published data set is indeed

𝑘

-anonymous [

]. When com-

municating differential privacy guarantees, past studies have either

attempted to explain or visualize the mechanism itself ([

])

or the risk associated with an

𝜀

([

]). Similar to research by

Nanayakkara et al. [

], future research should find explanations

that convey both the risks and the privacy-utility tradeoff. We be-

lieve that the implications are better understood if the explanation

is not use-case dependent.

Different levels of education need different explanations. We observed that higher levels of education and numeracy skills were associated with better objective and subjective comprehension. Differential privacy provides a quantitative mathematical definition, so it makes sense that

numeracy skills would be helpful in understanding the privacy pro-

tection of differential privacy. However, we also observed a positive

correlation between the comprehension and subjective numeracy

skills. These results do not correspond to the Dunning-Kruger ef-

fect [

]. However, this might be due to the fact that our sample was

very homogeneous and highly educated. Nevertheless, different tar-

get groups should receive different explanations. Our target group

was end users; further research is needed to determine whether

our explanatory models can help other audiences, e.g. developers,

choose a suitable 𝜀.

7 RELATED WORK

Considering the demand for strong privacy guarantees, there is a

plethora of work on developing new algorithms with differential

privacy guarantees that focus on improving the privacy-utility

tradeoff [

]. A smaller

𝜀

leads to stronger privacy. However,

the problem of how exactly to determine the value of

𝜀

remains a

major challenge. According to Dwork, the value of

𝜀

is a “social

decision” [

]. Therefore, various approaches have been proposed

to quantify the privacy guarantees by translating

𝜀

into a privacy

risk [

] or by using metaphors to describe the differential

privacy mechanism [1, 11, 26].

The privacy risk as well as the randomized response technique

may serve as explanatory models for the differential privacy guaran-

tee [

]. The

RRT

model has been researched with regard to users’

trust and comprehension [

] of the technique. Bullek et al. [

]

focused on describing the randomized response technique by using

a spinner metaphor. Smart et al. [

] investigated explanations of a

differential privacy mechanism, hiding the

𝜀

used and evaluating

users’ willingness to share data. However, it has not yet been deter-

mined whether users’ understanding of this technique is sufficient.

Franzen et al. [

] evaluated the

RISK

approach empirically, with a

focus on how to communicate this risk. Our present study is a first step in evaluating the comprehensibility of the RISK model as well as of the RRT model. Furthermore, the randomized response technique has generally been researched in isolation; in contrast, this study evaluates the technique as a means of explaining differential privacy in general.

Some previous studies have evaluated descriptions for differen-

tial privacy. The work of Cummings et al. [

] took a look at users’

expectations that arose from descriptions of differential privacy

mechanisms already in the industry. According to the authors of

that study, existing descriptions fail to explain the differential pri-

vacy guarantee in that users’ expectations are set arbitrarily. We

have aimed for an explanation that would improve users’ compre-

hension of the differential privacy guarantee. Instead of looking

at previously written descriptions, we have evaluated explanatory

models that are intended to facilitate users’ comprehension.

Karegar et al. [

] used blurred images as a metaphor to explain

the privacy protection provided by differential privacy. The authors

noted that the explanations help to communicate the fact that intro-

duced noise protects privacy and that there is some privacy-utility


tradeoff. However, their explanations did not directly imply privacy

protection of a particular

𝜀

. ViP [

] is a tool for supporting the

decision of setting/splitting

𝜀

across queries. Nanayakkara et al. [

]

used

RISK

([

]) to show the risk for a different number of users

and set

𝜀

. The authors then depicted the privacy-utility tradeoff for

data analysts. In our study, we have focused on end users as the

target group of such explanations.

Xiong et al. [31] researched how different explanations of differential privacy influenced users’ willingness to share their personal data. The authors found that explanations focusing on the implications of differential privacy instead of the technical aspects increased users’ understanding and willingness to share personal data. Nanayakkara et al. [23] used odds-based explanations based on the RISK model inspired by [ ]. They then compared their explanations to the explanations provided by Xiong et al. [31]. Our study takes a further step towards a more comprehensive explanation of differential privacy, in that we evaluate different explanatory models that all focus on the implications of the mechanism, i.e., the privacy guarantee provided.

Our experimental setup, which compares users’ comprehension of the privacy protection provided by 𝑘-anonymity with that provided by differential privacy, was inspired by Valdez et al. [28]. They examined how understandings of privacy concern change depending on what data is collected and how it is used. The degree of privacy of 𝑘-anonymity was also described in terms of “indistinguishability” [ ]. To explain differential privacy, the concept of “exceptionality” was used, which indicated how exceptional one is among all other respondents. The results of their study suggest that being part of a larger crowd (𝑘-anonymity) appears to be more privacy protective than differential privacy does. The authors hypothesized that this is due to the explanation of exceptionality. With our work, we have provided models for explaining differential privacy protection and have evaluated them for their comprehensibility. By comparing the results with users’ comprehension of the privacy protection of 𝑘-anonymity, we have been able to help shed light on the extent of users’ comprehension.

8 CONCLUSION

We can conclude that different explanatory models indeed help people to comprehend the privacy protection provided by differential privacy. Our results confirm that the RISK and RRT models enhance users’ subjective comprehension of the protection provided by differential privacy more than the DEF model does. We have therefore presented a way to effectively explain the privacy protection of a Laplacian differential privacy mechanism. Moreover, the privacy protection provided by 𝑘-anonymity was more comprehensible than that provided by differential privacy. The RRT model yields the best scores for the participants’ subjective comprehension. Therefore, we can conclude that RRT can serve as a basis for further studies.

ACKNOWLEDGMENTS

This work is partially funded by the European Union (NextGenerationEU). It is also supported by the German Federal Ministry of Education and Research (BMBF) as part of the research projects FreeMove and GANGES under reference numbers 01UV2090B and 16KISA034, respectively.

REFERENCES

[1] Brooke Bullek, Stephanie Garboski, Darakhshan J. Mir, and Evan M. Peck. 2017. Towards Understanding Differential Privacy: When Do People Trust Randomized Response Technique? In CHI ’17: Proceedings of the 2017 Conference on Human Factors in Computing Systems. ACM, 3833–3837. https://doi.org/10.1145/3025453.3025698

[2] Chris Clifton and Tamir Tassa. 2013. On syntactic anonymity and differential privacy. In ICDEW ’13: IEEE 29th International Conference on Data Engineering Workshops. 88–93. https://doi.org/10.1109/ICDEW.2013.6547433

[3] Rachel Cummings, Gabriel Kaptchuk, and Elissa M. Redmiles. 2021. “I need a better description”: An Investigation Into User Expectations For Differential Privacy. In CCS ’21: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15-19, 2021. ACM, 3037–3052. https://doi.org/10.1145/3460120.3485252

[4] Cynthia Dwork. 2006. Differential Privacy. In ICALP ’06: Automata, Languages and Programming, 33rd International Colloquium, Proceedings, Part II (Lecture Notes in Computer Science, Vol. 4052). Springer, 1–12. https://doi.org/10.1007/11787006_1

[5] Cynthia Dwork. 2008. Differential Privacy: A Survey of Results. In TAMC ’08: Theory and Applications of Models of Computation, 5th International Conference (Lecture Notes in Computer Science, Vol. 4978). Springer, 1–19. https://doi.org/10.1007/978-3-540-79228-4_1

[6] Angela Fagerlin, Brian Zikmund-Fisher, Peter Ubel, Aleksandra Jankovic, Holly Derry, and Dylan Smith. 2007. Measuring Numeracy Without a Math Test: Development of the Subjective Numeracy Scale. Medical Decision Making 27 (2007), 672–680. https://doi.org/10.1177/0272989X07304449

[7] Daniel Franzen, Saskia Nuñez von Voigt, Peter Sörries, Florian Tschorsch, and Claudia Müller-Birn. 2022. Am I Private and If So, how Many?: Communicating Privacy Guarantees of Differential Privacy with Risk Communication Formats. In CCS ’22: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, November 7-11, 2022. ACM, 1125–1139. https://doi.org/10.1145/3548606.3560693

[8] Arik Friedman, Ran Wolff, and Assaf Schuster. 2008. Providing 𝑘-anonymity in data mining. The VLDB Journal 17, 4 (2008), 789–804. https://doi.org/10.1007/S00778-006-0039-5

[9] Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu. 2010. Privacy-preserving data publishing: A survey of recent developments. Comput. Surveys 42, 4 (2010), 14:1–14:53. https://doi.org/10.1145/1749603.1749605

[10] Justin Hsu, Marco Gaboardi, Andreas Haeberlen, Sanjeev Khanna, Arjun Narayan, Benjamin C. Pierce, and Aaron Roth. 2014. Differential Privacy: An Economic Method for Choosing Epsilon. In CSF ’14: IEEE 27th Computer Security Foundations Symposium. IEEE Computer Society, 398–410. https://doi.org/10.1109/CSF.2014.35

[11] Farzaneh Karegar, Ala Sarah Alaqra, and Simone Fischer-Hübner. 2022. Exploring User-Suitable Metaphors for Differentially Private Data Analyses. In SOUPS ’22: Proceedings of the Eighteenth Symposium on Usable Privacy and Security, Boston, MA, USA, August 7-9, 2022. USENIX Association, 175–193. https://www.usenix.org/conference/soups2022/presentation/karegar

[12] Carmen Keller and Michael Siegrist. 2009. Effect of Risk Communication Formats on Risk Perception Depending on Numeracy. Medical Decision Making 29, 4 (2009), 483–490. https://doi.org/10.1177/0272989X09333122

[13] Justin Kruger and David Dunning. 1999. Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology 77, 6 (1999), 1121.

[14] Johannes A Landsheer, Peter Van Der Heijden, and Ger Van Gils. 1999. Trust and understanding, two psychological aspects of randomized response. Quality and Quantity 33, 1 (1999), 1–12. https://doi.org/10.1023/A:1004361819974

[15] Jaewoo Lee and Chris Clifton. 2011. How Much Is Enough? Choosing 𝜖 for Differential Privacy. In ISC ’11: Information Security, 14th International Conference. Springer, 325–340. https://doi.org/10.1007/978-3-642-24861-0_22

[16] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In ICDE ’07: Proceedings of the 23rd International Conference on Data Engineering. IEEE Computer Society, 106–115. https://doi.org/10.1109/ICDE.2007.367856

[17] Isaac Lipkus, Greg Samsa, and Barbara Rimer. 2001. General Performance on a Numeracy Scale Among Highly Educated Samples. Medical Decision Making 21 (2001), 37–44. https://doi.org/10.1177/0272989X0102100105

[18] Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam. 2006. l-Diversity: Privacy Beyond k-Anonymity. In ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering. IEEE Computer Society, 24. https://doi.org/10.1109/ICDE.2006.1

[19] Naresh K. Malhotra, Sung S. Kim, and James Agarwal. 2004. Internet Users’ Information Privacy Concerns (IUIPC): The Construct, the Scale, and a Causal Model. Information Systems Research 15, 4 (2004), 336–355. https://doi.org/10.1287/isre.1040.0032


[20] Luise Mehner, Saskia Nuñez von Voigt, and Florian Tschorsch. 2021. Towards Explaining Epsilon: A Worst-Case Study of Differential Privacy Risks. In EuroS&P ’21: IEEE European Symposium on Security and Privacy Workshops, Vienna, Austria, September 6-10, 2021. IEEE, 328–331. https://doi.org/10.1109/EUROSPW54576.2021.00041

[21] Maurizio Naldi and Giuseppe D’Acquisto. 2015. Differential Privacy: An Estimation Theory-Based Method for Choosing Epsilon. arXiv preprint abs/1510.00917 (2015).

[22] Priyanka Nanayakkara, Johes Bater, Xi He, Jessica Hullman, and Jennie Rogers. 2022. Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases. Proceedings on Privacy Enhancing Technologies 2022, 2 (2022), 601–618. https://doi.org/10.2478/popets-2022-0058

[23] Priyanka Nanayakkara, Mary Anne Smart, Rachel Cummings, Gabriel Kaptchuk, and Elissa M. Redmiles. 2023. What Are the Chances? Explaining the Epsilon Parameter in Differential Privacy. In 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023. USENIX Association. https://www.usenix.org/conference/usenixsecurity23/presentation/nanayakkara

[24] K. Patel and G. B. Jethava. 2018. Privacy Preserving Techniques for Big Data: A Survey. In ICICCT ’18: Proceedings of the 2018 Second International Conference on Inventive Communication and Computational Technologies. 194–199. https://doi.org/10.1109/ICICCT.2018.8473289

[25] Sarina B. Schrager. 2018. Five Ways to Communicate Risks So That Patients Understand. Family Practice Management 25, 6 (2018), 28–31.

[26] Mary Anne Smart, Dhruv Sood, and Kristen Vaccaro. 2022. Understanding Risks of Privacy Theater with Differential Privacy. Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (2022), 1–24. https://doi.org/10.1145/3555762

[27] Latanya Sweeney. 2002. k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (2002), 557–570.

[28] André Calero Valdez and Martina Ziefle. 2019. The users’ perspective on the privacy-utility trade-offs in health recommender systems. International Journal of Human-Computer Studies 121 (2019), 108–121. https://doi.org/10.1016/j.ijhcs.2018.04.003

[29] Teng Wang, Xuefeng Zhang, Jingyu Feng, and Xinyu Yang. 2020. A Comprehensive Survey on Local Differential Privacy toward Data Statistics and Analysis. Sensors 20, 24 (2020), 7030. https://doi.org/10.3390/s20247030

[30] Stanley L. Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63–69.

[31] Aiping Xiong, Tianhao Wang, Ninghui Li, and Somesh Jha. 2020. Towards Effective Differential Privacy Communication for Users’ Data Sharing Decision and Comprehension. In SP ’20: IEEE Symposium on Security and Privacy. IEEE, 392–410. https://doi.org/10.1109/SP40000.2020.00088

APPENDIX

A SURVEY DETAILS

In this appendix, we provide the descriptions and explanations used for our explanatory models.

A.1 Descriptions and Explanations

𝐾-Anonymity. 𝐾-anonymity provides privacy protection by generalizing or removing all sensitive columns of the database that might be used to re-identify a student. This way, for every student in the database, there is a group of at least 𝑘 students with the same answers in all sensitive columns. In other words, there are always at least 𝑘 indistinguishable students. Accordingly, 𝑘 is the privacy parameter, which determines the level of privacy protection. The higher the privacy parameter 𝑘, the more indistinguishable students exist in each group, resulting in stronger privacy protection.

Assume the school sets the privacy parameter to 𝑘=4, i.e., the database is modified in a way that it always yields at least 4 indistinguishable students. Specifically, the database now contains four students from Bob’s class with a generalized age of (Peter, Bob, Marie, and Lucas). Names are not shown. Please note that Eve cannot link the individual rows to the respective students. Bob’s drug use could be indicated by each of the four rows with equal likelihood. Hence, Bob and his drug use remain hidden in the group of four indistinguishable students.
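The group-size condition described above can be checked mechanically. The following is a minimal sketch, not the procedure used in the study: the function name `is_k_anonymous`, the column names, and the example rows are all hypothetical, and the table is assumed to be already generalized.

```python
from collections import Counter

def is_k_anonymous(rows, grouping_columns, k):
    """Return True if every combination of values in grouping_columns
    is shared by at least k rows (an empty table is trivially k-anonymous)."""
    group_sizes = Counter(
        tuple(row[column] for column in grouping_columns) for row in rows
    )
    return all(size >= k for size in group_sizes.values())

# Hypothetical generalized table: names removed, age generalized to a range.
rows = [
    {"class": "10a", "age": "15-17", "drug_use": "yes"},
    {"class": "10a", "age": "15-17", "drug_use": "no"},
    {"class": "10a", "age": "15-17", "drug_use": "no"},
    {"class": "10a", "age": "15-17", "drug_use": "no"},
]

print(is_k_anonymous(rows, ["class", "age"], k=4))  # True
print(is_k_anonymous(rows, ["class", "age"], k=5))  # False
```

In this toy table all four rows share the same generalized values, so an observer cannot tell which row belongs to which student.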

Differential Privacy RISK. Differential privacy provides privacy protection by randomly modifying statistical results extracted from the database. The results therefore indicate the true answers of the students with a certain probability only: Every student in the database has a certain privacy risk that their true answer can be identified. This risk is controlled by a privacy parameter, which determines the level of privacy protection. A privacy parameter closer to zero reduces the privacy risk, resulting in stronger privacy protection.

Assume the school sets the privacy parameter in a way that yields a privacy risk for the students of 75 %, i.e., the true answers of the students are indicated with a probability of 75 %. Now, Eve accesses the database and asks for the number of 16.3 years old drug-using students in Bob’s class. Please note that Eve does not know whether the returned result indicates Bob’s true answer or not. The result was modified and might therefore be false. Bob has a privacy risk of 75 %.

Differential Privacy RRT. Local differential privacy provides privacy protection by randomly modifying the students’ answers containing sensitive information, before storing them in the database. The stored answers therefore correspond to the true answers of the students with a certain probability only: Every student in the database has a certain probability that their true answer is stored. This probability is controlled by a privacy parameter, which determines the level of privacy protection. A privacy parameter closer to zero reduces the probability, resulting in stronger privacy protection.

Assume the school sets the privacy parameter in a way that the mechanism stores the true answer of a student with a probability of 75 %. Now, Eve accesses the database and asks for the number of 16.3 years old drug-using students in Bob’s class. Please note that Eve does not know whether the returned result indicates Bob’s true answer or not. Bob’s answer was modified and might therefore be false. Bob’s true answer is stored with a probability of 75 %.
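One common way to realize such a scheme for a yes/no question is to store the true answer with probability p and the opposite answer otherwise. The sketch below assumes this binary flipping variant, which the survey text does not spell out; all function names are illustrative. For this variant the privacy parameter is ε = ln(p / (1 − p)), so p = 75 % corresponds to ε = ln 3.

```python
import math
import random

def randomized_response(true_answer, p_truth, rng=random):
    """Store the true yes/no answer with probability p_truth,
    otherwise store its negation (assumed binary flipping variant)."""
    return true_answer if rng.random() < p_truth else not true_answer

def epsilon_for(p_truth):
    """Privacy parameter of the flipping scheme for 0.5 < p_truth < 1:
    epsilon = ln(p / (1 - p))."""
    return math.log(p_truth / (1.0 - p_truth))

def estimate_true_share(stored_answers, p_truth):
    """Unbiased estimate of the share of true 'yes' answers,
    computed from the noisy stored answers (requires p_truth != 0.5)."""
    observed = sum(stored_answers) / len(stored_answers)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

print(round(epsilon_for(0.75), 4))  # 1.0986, i.e. ln(3)
```

As p approaches 50 %, ε approaches zero and a stored answer reveals nothing about the true one, which matches the statement that a privacy parameter closer to zero gives stronger protection.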

Differential Privacy DEF. Differential privacy provides privacy protection by randomly modifying statistical results extracted from the database. The true results extracted from the students’ answers are therefore returned with a certain probability only: The probability of the same result if one of the students gave a different answer has a certain difference to the probability of the true result. This difference is controlled by a privacy parameter, which determines the level of privacy protection. A privacy parameter closer to zero reduces the difference, resulting in stronger privacy protection.

Assume the school sets the privacy parameter in a way that

the probability of returning any statistical result is

times the

probability of the same result if one of the students gave a different

answer. Now, Eve accesses the database and asks for the number of 16.3 years old drug-using students in Bob’s class. Please note

that Eve does not know whether the returned result indicates Bob’s

true answer or not. The result was modified and might therefore be

returned if Bob gave a different answer. The probability of the true

result being returned is at most

times the probability of the same

result if Bob did not use drugs.
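For reference, the verbal description above paraphrases the standard definition of 𝜀-differential privacy (see, e.g., [4]). The symbols used here are notation assumed for this sketch and do not appear in the survey text: M is the randomized mechanism, D and D' are two databases that differ in one student's answer, and S is any set of possible results.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
\qquad \text{for all neighboring } D, D' \text{ and all sets of results } S
```

Under this reading, the factor referred to as "times" in the description corresponds to e^𝜀; as 𝜀 approaches zero, the factor approaches 1 and the two probabilities become nearly identical.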
