Document [original]

An Agent-Based Approach for

Privacy-Preserving Information Filtering

vorgelegt von

Diplom-Informatiker

Richard Ciss´ee

Von der Fakult¨at IV – Elektrotechnik und Informatik

der Technischen Universit¨at Berlin

zur Erlangung des Grads

Doktor der Ingenieurwissenschaften

– Dr.-Ing. –

genehmigte Dissertation

Promotionsausschuss:

Vorsitzender: Prof. Dr. rer. nat. Volker Markl

Berichter: Prof. Dr.-Ing. habil. Sahin Albayrak

Berichter: Prof. Dr. habil. Odej Kao

Tag der wissenschaftlichen Aussprache: 19.05.2009

Berlin 2009

D 83

Abstract

Recommender Systems and Matchmaker Systems utilize Information Filter-

ing technologies in order to provide personalized information in the form of

recommendations of items or users with similar interests, based on a user’s

long-term information needs, which are in turn derived from personal data

and personal preferences. These systems are inherently privacy-critical be-

cause they essentially require personal data. Systems for sensitive domains in

particular are not likely to be widely accepted by users unless they preserve

the privacy of the user data they operate on. At the same time, Informa-

tion Filtering technology providers as well as information providers have to

be sufficiently motivated to develop and run privacy-friendly Recommender

Systems and Matchmaker Systems. In the optimal case, these systems are

multilaterally privacy-preserving in the sense that the privacy of all partici-

pating entities is preserved adequately.

This work describes an approach for distributed multilateral Privacy-

Preserving Information Filtering based on Multi-Agent System technology,

which due to the capabilities of agents is an obvious choice for realizing

distributed privacy-preserving Recommender Systems and Matchmaker Sys-

tems. As prerequisites, we introduce mechanisms for controlling the com-

munication capabilities of agents, which are mainly used in order to prevent

agents from disclosing private data, as well as a solution for transparent per-

sistence of data within Multi-Agent Systems, which is used in order to realize

generic interactions between agents in our approach. As the core of the work,

we specify two modules which allow the realization of privacy-preserving Rec-

ommender Systems as well as privacy-preserving Matchmaker Systems. The

underlying agent interactions are based on cryptographic protocols, which

protect participants against malicious adversaries attempting to obtain or

propagate private data. We describe and examine filtering techniques that

are suitable for our approach. We also describe a prototypical application

based on these building blocks, and we evaluate the overall feasibility of the

approach in terms of functional and non-functional requirements of privacy-

preserving Information Filtering technologies.

iii

Zusammenfassung

Empfehlungssysteme und Matchmaker-Systeme verwenden Technologien zur

Informationsfilterung, um personalisierte Informationen in Form von Empfeh-

lungen von Objekten oder Benutzern mit ¨

ahnlichen Interessen zu liefern. Die-

se Empfehlungen basieren auf langfristigen Informationsbed¨

urfnissen eines

Benutzers, welche aus den pers¨

onlichen Daten und pers¨

onlichen Pr¨

aferenzen

dieses Benutzers abgeleitet werden. Aufgrund dieses expliziten Bedarfs an

pers¨

onlichen Daten ist in derartigen Systemen die Privatheit der Benutzer

inh¨

arent bedroht. Insbesondere in sensiblen Dom¨

anen werden diese Syste-

me wahrscheinlich nur auf breite Akzeptenz stoßen, wenn sie die Privat-

heit der zugrundeliegenden Benutzerdaten bewahren. Gleichzeitig m¨

ussen die

Anbieter von Technologien zur Informationsfilterung und die Anbieter der

zugrundeliegenden Informationen hinreichend motiviert sein, privatheitsun-

terst¨

utzende Empfehlungssysteme und Matchmaker-Systeme zu entwickeln

und zu betreiben. Optimalerweise ber¨

ucksichtigen derartige Systeme die Pri-

vatheit aller beteiligten Parteien und sind somit mehrseitig privatheitsbe-

wahrend.

Diese Arbeit beschreibt einen Ansatz zur verteilten mehrseitigen privat-

heitsbewahrenden Informationsfilterung, basierend auf Multiagentensystem-

Technologie. Aufgrund der F¨

ahigkeiten von Agenten liegt es nahe, diese

Technologie zur Umsetzung von verteilten Empfehlungssystemen und Match-

maker-Systemen einzusetzen. Als Grundlagen beschreibt diese Arbeit Mecha-

nismen zur Einschr¨

ankung der Kommunikationsf¨

ahigkeiten von Agenten und

eine L¨

osung f¨

ur transparente Persistenz von Daten in Multiagentensystemen.

Erstere werden verwendet, um Agenten an der unkontrollierten Weitergabe

von privaten Daten zu hindern, w¨

ahrend letztere verwendet wird, um gene-

rische Interaktionen zwischen den beteiligten Agenten zu erm¨

oglichen. Den

Kern der Arbeit stellt die Spezifikation zweier Module dar, welche die Umset-

zung von privatheitsbewahrenden Empfehlungssystemen und Matchmaker-

Systemen erm¨

oglichen. Die zugrundeliegenden Interaktionen zwischen Agen-

ten basieren auf kryptographischen Protokollen, welche von die beteiligten

Parteien zum Schutz vor b¨

oswilligen Angriffen verwendet werden, in denen

versucht wird, private Daten zu erhalten oder weiterzuverbreiten. Die Arbeit

beschreibt und bewertet Filterverfahren, die unter den gegebenen Bedingun-

gen einsetzbar sind. Weiterhin beschreibt die Arbeit eine prototypische An-

wendung, welche auf diesen Modulen aufbaut, und evaluiert abschließend

den gesamten Ansatz, basierend auf den funktionalen und nichtfunktionalen

Anforderungen von Technologien zur privatheitsbewahrenden Informations-

filterung.

Contents

List of Figures xiii

List of Tables xv

1 Introduction 1

1.1 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Methodology and Notation Conventions . . . . . . . . . . . . 6

1.3.1 Analysis Section . . . . . . . . . . . . . . . . . . . . . . 6

1.3.2 Design & Implementation Section . . . . . . . . . . . . 7

1.3.3 Cryptographic Protocols . . . . . . . . . . . . . . . . . 8

2 Problem Description 11

2.1 Privacy............................... 11

2.1.1 Definitions......................... 11

2.1.2 Strategies for Protecting Privacy . . . . . . . . . . . . 14

2.2 Information Filtering . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.1 Definitions......................... 16

2.2.2 IF Architectures . . . . . . . . . . . . . . . . . . . . . . 25

2.2.3 MainProblems ...................... 27

2.3 Privacy & Information Filtering . . . . . . . . . . . . . . . . . 30

2.3.1 Privacy as the Main Problem . . . . . . . . . . . . . . 30

2.3.2 Multilateral Privacy . . . . . . . . . . . . . . . . . . . 31

2.3.3 Measuring Privacy . . . . . . . . . . . . . . . . . . . . 32

2.3.4 Requirements ....................... 38

2.4 Multi-Agent Systems . . . . . . . . . . . . . . . . . . . . . . . 40

2.4.1 Definitions......................... 41

2.4.2 Basic Functionality . . . . . . . . . . . . . . . . . . . . 42

2.4.3 Malicious Hosts . . . . . . . . . . . . . . . . . . . . . . 42

2.5 Summary ............................. 43

vii

3 Related Work 45

3.1 Privacy-Enhancing Technologies . . . . . . . . . . . . . . . . . 46

3.1.1 Anonymous Communication . . . . . . . . . . . . . . . 47

3.1.2 Secure Multi-Party Computation . . . . . . . . . . . . 49

3.1.3 Trusted Computing . . . . . . . . . . . . . . . . . . . . 51

3.1.4 Privacy Enforcement . . . . . . . . . . . . . . . . . . . 54

3.1.5 Evaluation......................... 56

3.2 Privacy-Preserving Technologies . . . . . . . . . . . . . . . . . 56

3.2.1 Peer-Oriented Approaches . . . . . . . . . . . . . . . . 57

3.2.2 Privacy-Preserving Data Mining . . . . . . . . . . . . . 58

3.2.3 Private Information Retrieval . . . . . . . . . . . . . . 60

3.2.4 Privacy-Preserving IF Architectures . . . . . . . . . . . 63

3.2.5 Evaluation......................... 67

3.3 Privacy in Multi-Agent Systems . . . . . . . . . . . . . . . . . 70

3.3.1 Anonymous Communication . . . . . . . . . . . . . . . 70

3.3.2 Protection against Malicious Hosts . . . . . . . . . . . 71

3.3.3 Evaluation......................... 72

3.4 Summary ............................. 73

4 Privacy-Preserving Information Filtering 75

4.1 UseCases ............................. 75

4.2 Outline of the Solution . . . . . . . . . . . . . . . . . . . . . . 77

4.2.1 Trusted Environment . . . . . . . . . . . . . . . . . . . 79

4.2.2 Anonymous Centralized Model . . . . . . . . . . . . . 81

4.2.3 Use of MAS Technology . . . . . . . . . . . . . . . . . 82

4.2.4 Main Components . . . . . . . . . . . . . . . . . . . . 83

4.3 Implementation.......................... 86

4.4 Summary ............................. 88

5 Basic Infrastructure 91

5.1 Motivation............................. 91

5.2 Analysis .............................. 92

5.2.1 Ontologies......................... 93

5.2.2 Roles and Interactions . . . . . . . . . . . . . . . . . . 94

5.3 Design & Implementation . . . . . . . . . . . . . . . . . . . . 101

5.4 Summary .............................103

6 Transparent Persistence 105

6.1 Motivation.............................105

6.1.1 Persistence Interface . . . . . . . . . . . . . . . . . . . 106

6.1.2 Generic Transparent Persistence . . . . . . . . . . . . . 107

viii

6.2 Analysis ..............................108

6.2.1 Ontologies.........................108

6.2.2 Roles & Interactions . . . . . . . . . . . . . . . . . . . 109

6.2.3 Internal Functionality . . . . . . . . . . . . . . . . . . 110

6.3 Design & Implementation . . . . . . . . . . . . . . . . . . . . 111

6.3.1 Agents & Agent Services . . . . . . . . . . . . . . . . . 111

6.3.2 Internal Functionality . . . . . . . . . . . . . . . . . . 112

6.4 Summary .............................114

7 The Recommender Module 115

7.1 Motivation.............................115

7.2 Analysis ..............................116

7.2.1 Ontologies.........................116

7.2.2 Roles and Interactions . . . . . . . . . . . . . . . . . . 116

7.2.3 Summary .........................129

7.3 Design & Implementation . . . . . . . . . . . . . . . . . . . . 129

7.3.1 Threats and Countermeasures . . . . . . . . . . . . . . 130

7.3.2 Other Requirements . . . . . . . . . . . . . . . . . . . 144

7.3.3 Agents and Agent Services . . . . . . . . . . . . . . . . 144

7.3.4 Implementation . . . . . . . . . . . . . . . . . . . . . . 146

7.4 Summary .............................146

8 The Matchmaker Module 149

8.1 Motivation.............................149

8.2 Analysis ..............................149

8.2.1 Ontologies.........................150

8.2.2 Roles and Interactions . . . . . . . . . . . . . . . . . . 150

8.3 Design & Implementation . . . . . . . . . . . . . . . . . . . . 154

8.3.1 Threats and Countermeasures . . . . . . . . . . . . . . 154

8.3.2 Other Requirements . . . . . . . . . . . . . . . . . . . 155

8.3.3 Agents and Agent Services . . . . . . . . . . . . . . . . 156

8.3.4 Implementation . . . . . . . . . . . . . . . . . . . . . . 156

8.4 Summary .............................156

9 Exemplary Filtering Techniques 157

9.1 Motivation.............................157

9.2 Analysis ..............................158

9.3 Design & Implementation . . . . . . . . . . . . . . . . . . . . 160

9.3.1 Item Similarity Algorithm . . . . . . . . . . . . . . . . 160

9.3.2 Item-based Top-N Recommendation Algorithm . . . . 162

9.3.3 Hierarchical Clustering-based Algorithm . . . . . . . . 164

9.4 Summary .............................166

10 Evaluation 169

10.1 Coverage of Requirements . . . . . . . . . . . . . . . . . . . . 169

10.1.1 Functional Requirements . . . . . . . . . . . . . . . . . 169

10.1.2 Non-functional Requirements . . . . . . . . . . . . . . 171

10.2 Trusted Software Approaches . . . . . . . . . . . . . . . . . . 178

10.2.1Broadness.........................178

10.2.2 Provider Acceptance . . . . . . . . . . . . . . . . . . . 178

10.3 Usage Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . 179

10.4Summary .............................180

11 Conclusion & Outlook 181

11.1Applicability............................181

11.2FutureWork............................182

A Specification of Ontologies, Roles and Interactions 185

A.1 Basic Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . 185

A.1.1 Ontologies.........................185

A.1.2 Interactions ........................185

A.1.3 RoleModel ........................190

A.2 Transparent Persistence . . . . . . . . . . . . . . . . . . . . . 192

A.2.1 Ontologies.........................192

A.2.2 Interactions ........................192

A.2.3 RoleModel ........................192

A.3 The Recommender Module . . . . . . . . . . . . . . . . . . . . 196

A.3.1 Ontologies.........................196

A.3.2 Interactions ........................196

A.3.3 RoleModel ........................199

A.4 The Matchmaker Module . . . . . . . . . . . . . . . . . . . . . 205

A.4.1 Ontologies.........................205

A.4.2 Interactions ........................205

A.4.3 RoleModel ........................206

B Basic Infrastructure: Examples 207

B.1 Basic Interactions . . . . . . . . . . . . . . . . . . . . . . . . . 207

B.2 RevokingControl.........................209

B.3 Cascading Control . . . . . . . . . . . . . . . . . . . . . . . . 210

B.4 Additional Management Functionality . . . . . . . . . . . . . 211

C Exemplary Filtering Techniques: Examples 215

D List of Acronyms 219

Bibliography 223

Index 237

List of Figures

2.1 Aspects of Privacy . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2 IF Systems and Related Areas . . . . . . . . . . . . . . . . . . 18

2.3 Main Components of an IF Architecture. . . . . . . . . . . . . 24

2.4 Provider-Controlled IF Architecture. . . . . . . . . . . . . . . 26

2.5 Privacy-Enhanced IF Architecture. . . . . . . . . . . . . . . . 27

2.6 User-Controlled IF Architecture. . . . . . . . . . . . . . . . . . 28

3.1 Related Work: Main Areas . . . . . . . . . . . . . . . . . . . . 46

4.1 Abstract Protocol - Linkable Result Data . . . . . . . . . . . . 80

4.2 Abstract Protocol - Private Result Data . . . . . . . . . . . . 81

4.3 Proposed PPIF Architecture . . . . . . . . . . . . . . . . . . . 84

4.4 Main Components of the PPIF architecture . . . . . . . . . . 85

4.5 Screenshot of the Smart Event Assistant . . . . . . . . . . . . 88

5.1 Collaboration: Restrict Communication . . . . . . . . . . . . . 96

5.2 Collaboration: Revoke Control . . . . . . . . . . . . . . . . . . 98

5.3 Collaboration: Restrict Communication (Cascading Control) . 99

5.4 Collaboration: Revoke Control And Terminate . . . . . . . . . 100

6.1 Motivation for Transparent Generic Persistence in PPIF. . . . 106

6.2 Overview of TPMAS and JDO Implementation Tasks . . . . . 113

7.1 Collaboration: Create User Profile . . . . . . . . . . . . . . . . 118

7.2 Collaboration: Set Up Temporary Filter Entity . . . . . . . . 120

7.3 Collaboration: Update Profile Model . . . . . . . . . . . . . . 121

7.4 Collaboration: Get Results (Linkable Result Data) . . . . . . 125

7.5 Collaboration: Get Results (Private Result Data) . . . . . . . 126

8.1 Collaboration: Announce Profile Element . . . . . . . . . . . . 152

A.1 Ontology: Communication Rules . . . . . . . . . . . . . . . . 186

A.2 Ontology: Anonymity . . . . . . . . . . . . . . . . . . . . . . . 186

xiii

A.3 Ontology: Transparent Persistence . . . . . . . . . . . . . . . 192

A.4 Ontology: Query Construct . . . . . . . . . . . . . . . . . . . 193

A.5 Ontology: InformationFiltering . . . . . . . . . . . . . . . . . 197

A.6 Ontology: CollaborativeFiltering . . . . . . . . . . . . . . . . 205

B.1 Communication Control: Examples . . . . . . . . . . . . . . . 213

C.1 HAC Example: Elements and Clusters . . . . . . . . . . . . . 215

xiv

List of Tables

1.1 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . 4

2.1 Terminology of IF Systems. . . . . . . . . . . . . . . . . . . . 19

2.2 Overview of IF Systems. . . . . . . . . . . . . . . . . . . . . . 23

2.3 Categories of the Privacy Spectrum. . . . . . . . . . . . . . . . 37

2.4 Overview of Existing IF Architectures. . . . . . . . . . . . . . 44

3.1 Private Information Retrieval Schemes . . . . . . . . . . . . . 61

3.2 Overview of PETs, PPTs, and Approaches for PPIF. . . . . . 74

4.1 Main Use Cases of the PPIF Approach . . . . . . . . . . . . . 78

4.2 Main Components of the PPIF Approach . . . . . . . . . . . . 86

5.1 Infrastructure Module: Roles . . . . . . . . . . . . . . . . . . . 95

5.2 Infrastructure Module: Interactions and Agent Services . . . . 102

5.3 Infrastructure Module: Roles and Agents . . . . . . . . . . . . 102

6.1 TPMAS Module: Roles . . . . . . . . . . . . . . . . . . . . . . 109

6.2 TPMAS Module: Interactions and Agent Services . . . . . . . 111

6.3 TPMAS Module: Roles and Agents . . . . . . . . . . . . . . . 111

7.1 Recommender Module: Roles . . . . . . . . . . . . . . . . . . 117

7.2 Interaction Steps - Linkable Result Data . . . . . . . . . . . . 123

7.3 Interaction Steps - Private Result Data . . . . . . . . . . . . . 124

7.4 Interaction Steps (Unlinkable Queries) - Linkable Result Data 127

7.5 Interaction Steps (Unlinkable Queries) - Private Result Data . 128

7.6 Threat Model: Honest-but-curious Participants . . . . . . . . 130

7.7 Secure Message Forwarding via Digital Signatures . . . . . . . 132

7.8 Secure Message Forwarding via HMAC (SMF1) . . . . . . . . 132

7.9 Secure Message Forwarding via Symmetric Encryption (SMF2) 133

7.10 Complexity of Secure Message Forwarding protocols . . . . . . 134

7.11 Result Data Propagation: (Semi-)Linkable Data

(Recommender System scenario) . . . . . . . . . . . . . . . . . 136

7.12 Result Data Propagation: Semi-Private Data

(Recommender System scenario) . . . . . . . . . . . . . . . . . 137

7.13 Result Data Propagation: Completely Private Data

(Recommender System scenario) . . . . . . . . . . . . . . . . . 138

7.14 Result Data Propagation: TFERole as Malicious Participant . 140

7.15 Result Data Propagation: Private Data

(Hybrid System scenario) . . . . . . . . . . . . . . . . . . . . . 142

7.16 Recommender Module: Interactions and Agent Services . . . . 145

7.17 Recommender Module: Roles and Agents . . . . . . . . . . . . 145

8.1 Matchmaker Module: Roles . . . . . . . . . . . . . . . . . . . 150

8.2 Matchmaker Module: Interactions and Agent Services . . . . . 156

8.3 Matchmaker Module: Roles and Agents . . . . . . . . . . . . . 156

9.1 Overview of Exemplary Filtering Techniques. . . . . . . . . . . 167

10.1 Coverage of Functional Requirements . . . . . . . . . . . . . . 171

10.2 Coverage of Privacy Requirements . . . . . . . . . . . . . . . . 173

10.3 Theoretical Performance Evaluation . . . . . . . . . . . . . . . 175

10.4 Practical Performance Evaluation . . . . . . . . . . . . . . . . 176

10.5 Coverage of Other Requirements . . . . . . . . . . . . . . . . . 177

A.1 Interaction: RestrictCommunication . . . . . . . . . . . . . . . 187

A.2 Interaction: CheckRule . . . . . . . . . . . . . . . . . . . . . . 187

A.3 Interaction: ActivateRule . . . . . . . . . . . . . . . . . . . . . 187

A.4 Interaction: AcquireConsent . . . . . . . . . . . . . . . . . . . 187

A.5 Interaction: RevokeControl . . . . . . . . . . . . . . . . . . . . 188

A.6 Interaction: RevokeRule . . . . . . . . . . . . . . . . . . . . . 188

A.7 Interaction: RevokeRule . . . . . . . . . . . . . . . . . . . . . 188

A.8 Interaction: RevokeControlAndTerminate . . . . . . . . . . . . 189

A.9 Interaction: RevokeRuleAndTerminate . . . . . . . . . . . . . 189

A.10 Interaction: InformAboutTermination . . . . . . . . . . . . . . 189

A.11 Interaction: RequestControl . . . . . . . . . . . . . . . . . . . 189

A.12 Interaction: SetupAnonymizer . . . . . . . . . . . . . . . . . . 190

A.13 Role Schema: AnonymizerRole . . . . . . . . . . . . . . . . . . 190

A.14 Role Schema: SupervisorRole . . . . . . . . . . . . . . . . . . 191

A.15 Role Schema: ControllableAgentRole . . . . . . . . . . . . . . 191

A.16 Interaction: CreateContext . . . . . . . . . . . . . . . . . . . . 194

A.17 Interaction: TerminateContext . . . . . . . . . . . . . . . . . . 194

A.18 Interaction: ModifyObjects . . . . . . . . . . . . . . . . . . . 194

A.19 Interaction: RetrieveObjects . . . . . . . . . . . . . . . . . . . 194

A.20 Role Schema: TPMASProviderRole . . . . . . . . . . . . . . . 195

xvi

A.21 Interaction: UpdateProfile . . . . . . . . . . . . . . . . . . . . 196

A.22 Interaction: QueryProfile . . . . . . . . . . . . . . . . . . . . . 196

A.23 Interaction: ObtainTFE . . . . . . . . . . . . . . . . . . . . . 198

A.24 Interaction: SetUpdatePolicy . . . . . . . . . . . . . . . . . . 198

A.25 Interaction: UpdateProfileModel . . . . . . . . . . . . . . . . . 198

A.26 Interaction: QueryProfileModel . . . . . . . . . . . . . . . . . 199

A.27 Interaction: ModifyProfileModel . . . . . . . . . . . . . . . . . 199

A.28 Interaction: GetResultsInternally . . . . . . . . . . . . . . . . 200

A.29 Interaction: GetResults . . . . . . . . . . . . . . . . . . . . . . 200

A.30 Interaction: GetResultsAsSupplier . . . . . . . . . . . . . . . . 200

A.31 Interaction: GetResultsAsUser . . . . . . . . . . . . . . . . . . 201

A.32 Interaction: ExchangeResults . . . . . . . . . . . . . . . . . . 201

A.33 Interaction: ObtainRelay . . . . . . . . . . . . . . . . . . . . . 201

A.34 Interaction: ShareKeys . . . . . . . . . . . . . . . . . . . . . . 202

A.35 Role Schema: InterfaceRole . . . . . . . . . . . . . . . . . . . 202

A.36 Role Schema: ProfileManagerRole . . . . . . . . . . . . . . . . 203

A.37 Role Schema: TFEFactoryRole . . . . . . . . . . . . . . . . . 204

A.38 Role Schema: TFERole . . . . . . . . . . . . . . . . . . . . . . 204

A.39 Role Schema: RelayRole . . . . . . . . . . . . . . . . . . . . . 204

A.40 Interaction: AnnounceProfileElement . . . . . . . . . . . . . . 205

A.41 Role Schema: CentralizedModelManagerRole . . . . . . . . . . 206

B.1 Communication Control - Example 1 . . . . . . . . . . . . . . 208

B.2 Communication Control - Example 2 . . . . . . . . . . . . . . 208

B.3 Communication Control - Example 3 . . . . . . . . . . . . . . 209

B.4 Communication Control - Example 4 . . . . . . . . . . . . . . 210

B.5 Communication Control - Example 5 . . . . . . . . . . . . . . 211

B.6 Communication Control - Example 4a . . . . . . . . . . . . . 212

C.1 HAC Example: Initial Distances . . . . . . . . . . . . . . . . . 216

C.2 HAC Example: Distances after step 1 . . . . . . . . . . . . . . 216

C.3 HAC Example: Distances after Step 2 . . . . . . . . . . . . . 217

C.4 HAC Example: Distances after Step 3 . . . . . . . . . . . . . 217

C.5 HAC Example: Distances after Step 4 . . . . . . . . . . . . . 217

C.6 HAC Example: Distances after Step 5 . . . . . . . . . . . . . 217

C.7 HAC Example: Distances of New Element . . . . . . . . . . . 218

xvii

Chapter 1

Introduction

The total quantity of information stored in electronic format is increasing

exponentially [80]. Consequently, human users accessing information are

increasingly threatened by information overload, i.e. by the fact that the

amount of information available on a specific topic cannot be handled man-

ually any more.

Information Retrieval (IR) technologies enable users to specify short-term

information needs e.g. via search engines, and receive a manageable amount

of relevant information meeting their information needs. In many cases, how-

ever, users have long-term information needs which are only met inadequately

by the use of IR technologies because the respective information changes dy-

namically. As an example, if a user intends to keep track of new publications

within a specific area, he would have to specify the same information need

over and over again when using a search engine, and he would have to filter

the results manually in order to determine which results actually have not

been received before.

Information Filtering (IF) technologies, as the name implies, are a so-

lution for this problem, because they enable users to specify long-term in-

formation needs and to receive the respective information in an manageable

manner: Utilizing IF-based Recommender Systems or Matchmaker Systems,

a user is provided with probably relevant information in the form of rec-

ommendations (e.g. of documents or other users) that are determined based

on information known about the user. This personal information is stored

in a user profile and may contain the user’s general preferences, ratings of

information obtained previously, or references to other users with similar

interests.

Shifting the task of filtering relevant information from the user to a soft-

ware system effectively eliminates the threat of information overload, but un-

fortunately introduces threats with regard to the privacy of the user: While

the use of IR technologies such as search engines is generally not privacy-

critical because single queries cannot be used to obtain information about a

user1, the use of IF technologies is inherently privacy-critical because these

technologies explicitly require personal data.

This apparent discrepancy of personalization and privacy has to be over-

come if solutions based on IF technologies are to be widely accepted. While

users may be less concerned about their privacy in domains covered by ex-

isting Recommender Systems used e.g. in e-commerce applications, Recom-

mender Systems for more sensitive domains providing e.g. financial or health-

related information are not likely to be accepted unless they address privacy

threats. At the same time, providers have to be motivated sufficiently to offer

privacy-friendly Recommender Systems, which they may initially regard as

problematic because these systems cause customer data to be less accessible.

While there are several candidates for technologies which may be used

to realize a solution for privacy-preserving information filtering, this work

shows that Multi-Agent System technology is ideally suitable for realizing

a distributed system with the given requirements. Therefore, the solution

described in this work does not merely constitute the realization of an ab-

stract specification in the context of a Multi-Agent System (MAS) system,

but rather an approach which essentially requires the features provided by

MAS technology.

We summarize these aspects as the main motivation for this work:



The main goal of this work is to provide a solution for Privacy-Pre-

serving Information Filtering in which the aspects of personalization

and privacy are no longer irreconcilable. Users intending to receive

personalized information should not have to give up their privacy with

regard to the personal data this information is based on.



The solution should not preserve the privacy of the users at the cost of

reducing the privacy of the other participants, such as the information

provider. Rather, it should address the privacy of all participants and

thus achieve multilateral privacy.



In addition to these goals, the work is also motivated by the idea of real-

izing a solution which highlights the capabilities of Multi-Agent System

1The privacy of users is at risk, however, when it is possible to link larger numbers of

queries to a single user, even when the respective user is not directly identifiable. As a

recent example, when AOL released 20 million pseudonymized search engine queries from

several hundreds of thousands of users in 2006, it was possible to reconstruct the identities

and information needs of users based on their queries.

technology and shows the potential benefits of a wider propagation of

this technology.

1.1 Main Contributions

As its title implies, this work describes an approach for Privacy-Preserving

Information Filtering, which consists of five main parts as the main contri-

butions of this work:



We introduce mechanisms for controlling the communication capabili-

ties of agents, which are mainly used in order to prevent agents from

disclosing private data.



We introduce a mechanism for transparent persistence of data within

a MAS, which is used in order to realize generic interactions between

participants in our approach.



We specify interactions and protocols used to realize privacy-preserving

Recommender Systems.



We specify interactions and protocols used to realize privacy-preserving

Matchmaker Systems, i.e. systems determining users with similar in-

terests.



We examine and describe filtering techniques that are suitable for our

approach.

Work and results related to this thesis has been published in the following

papers and articles:



R. Ciss´ee. An Architecture for Agent-Based Privacy-Preserving infor-

mation filtering. In Proceedings of the 6th International Workshop on

Trust, Privacy, Deception and Fraud in Agent Systems, 2003.[31].



R. Ciss´ee and S. Albayrak. An Agent-Based Approach for Privacy-

Preserving Recommender Systems. In E. H. Durfee, M. Yokoo, M.

N. Huhns, and O. Shehory, editors, 6th International Joint Confer-

ence on Autonomous Agents and Multiagent Systems (AAMAS 2007),

Honolulu, Hawaii, USA, May 14–18, 2007, IFAAMAS, 2007.[32].



J. Wohltorf, R. Ciss´ee, and A. Rieger. Berlintainment: An Agent-

Based Context-Aware Entertainment Planning System. IEEE Com-

munications Magazine, 43(6), 2005.[117].

The first publication mainly contains the initial ideas of this work. The

second publication summarizes the results of this thesis, focusing on the cen-

tral aspects of communication control and Recommender System function-

ality. The third publication primarily describes the prototypical application

in which our solution for PPIF has been utilized.

1.2 Structure of the Thesis

This thesis consists of the following four main parts, as shown in Table 1.1:

Table 1.1: Overview of the structure of the Thesis.

Part Chapter

I context

1 Introduction

2 Problem Description

3 Related Work

II approach

4 Privacy-Preserving Information Filtering

5 Basic Infrastructure

6 Transparent Persistence

7 The Recommender Module

8 The Matchmaker Module

9 Exemplary Filtering Techniques

III review 10 Evaluation

11 Conclusion & Outlook

IV details

A Specification of Ontologies, Roles and

Interactions

B Basic Infrastructure: Examples

C Exemplary Filtering Techniques: Examples



The first part provides the context for our work. It comprises this chap-

ter as the introduction and the two following chapters: Chapter 2 de-

scribes the problems and states the requirements of Privacy-Preserving

Information Filtering (PPIF), by addressing the two areas of privacy

and Information Filtering separately and in combination. It provides

definitions used throughout the work, and lists requirements for PPIF.

As our solution for PPIF is realized via Multi-Agent System (MAS)

technology, it also provides definitions used in this context, and de-

scribes the central problem of malicious hosts in MAS systems. Chap-

ter 3 reviews related work in the areas of Privacy-Enhancing Technol-

ogies and Privacy-Preserving Technologies, and the state of the art in

privacy-preserving Information Filtering.



The second and main part describes our approach. It comprises a chap-

ter outlining the overall approach and implemention, and five chapters

describing the main components of our approach: Chapter 4 lists the

supported use cases and gives a high-level outline of our solution for

PPIF, based on two essential concepts, namely the concept of a trusted

environment for protecting privacy in Recommender Systems, and the

additional concept of an anonymous centralized model for protecting

privacy in Matchmaker Systems. It motivates the use of MAS technol-

ogy in this context, and describes the implementation of the approach

itself as well as the implementation of a prototypical application based

on the approach. Chapter 5 describes basic functionality for controlling

the communication capabilities of agents, and functionality for anony-

mous communication of agents. Chapter 6 describes an approach for

Transparent Persistence in Multi-Agent Systems (TPMAS). Chapter 7

describes functionality for realizing Recommender System functional-

ity. Chapter 8 describes functionality for realizing Matchmaker System

functionality. Chapter 9 describes exemplary filtering techniques that

are applicable in this context.



The third part constitutes a review our approach. It comprises the

following two chapters: Chapter 10 evaluates our approach for Priva-

cy-Preserving Information Filtering. It discusses the coverage of the

non-functional and functional requirements by our approach and by

the implemented application. It also compares our approach with ap-

proaches based on trusted software, and provides usage guidelines for

applying the functionality specified in our approach. Chapter 11 con-

cludes the work by discussing the applicability of the approach in large-

scale real-world applications and by discussing directions for further

research.



The forth part provides details of our approach. It comprises the fol-

lowing three appendices: Appendix A provides tables and diagrams

containing the formal specification of the components of our approach

for Privacy-Preserving Information Filtering. Appendix B illustrates

the interactions for controlling communication. Appendix C provides

a brief example for the algorithm used by a suitable filtering technique,

namely hierarchical agglomerative clustering via single-link clustering.

By mapping each main contribution of this work to the respective chapter

of the main part of the thesis, the structure of the second part directly

represents the main contributions of this work.

1.3 Methodology and Notation Conventions

Analysis and design of the main components of our approach is kept as generic

as possible. We follow the Gaia methodology [120] for Agent-Oriented Soft-

ware Engineering and use AUML [11] as the modeling language for diagrams.

As utilizing all steps of the Gaia methodology in this work would be neither

illustrative nor practicable, we concentrate on the essential steps, i.e. mainly

the analysis and design phases.

1.3.1 Analysis Section

Each analysis section combines the analysis and architectural design phase

of the Gaia methodology. For every module, the analysis section contains

the ontologies in which the required domain knowledge is conceptualized

and specified. An ontology consists of categories and ontology functions

which may be applied to these categories. Categories contain attributes,

whereas each attribute has type, which may be a base type or a category type.

Categories may inherit attributes from other categories. For the notation of

ontologies, we use AUML-based diagrams.

Furthermore, the analysis section contains the role model describing the

participating roles and interactions. Roles are denoted as ExemplaryRole.

The responsibilities of a role are specified as lifeness expressions exemplary-

Roleiwith i= 1, .., n. Internal activities are denoted as ExemplaryActivity.

Interactions are denoted as ExemplaryInteraction. An interaction consists of

single communication steps as protocol parts, denoted as intsi for the sender

role and intri for the receiver role in the interaction Int, with i= 1, .., n.

However, when specifying an interaction, we usually abstract from single

communication steps in the overall protocol and just indicate the following

three protocol parts:

IntI

def

=ints1.intrn

IntR

def

=intr1

IntP

def

=intsn

Interactions may be interleaving. As an example, the lifeness expressions

firstRole =FirstIntI

secondRole =FirstIntR.SecondIntI.FirstIntP

thirdRole =SecondIntR.SecondIntP

imply that the actual communication steps may be carried out as follows:

firstRole =firstInts1.firstIntr2.firstInts3.firstIntr4

secondRole =firstIntr1.secondInts1.secondIntr2.firstInts2.

firstIntr3.secondInts3.secondIntr4.firstInts4

thirdRole =secondIntr1.secondInts2.secondIntr3.secondInts4

Interactions may be anonymous. A mechanism for anonymous interac-

tion realizing anonymity of the initiator as well as unlinkability of the single

communication steps is described in Chapter 5. A protocol part in which

the respective participant remains anonymous is indicated as Intanon

I. If the

interaction consists of more than the minimal two communication steps, sub-

sequent pairs of communication steps may be assumed to be unlinkable.

Interactions are visualized via AUML-based collaboration diagrams. Here,

instances of a specific role are denoted as “id:RoleName”, or as “id@platform-

id:RoleName”, if the platform configuration for the deployment of the agents

realizing the respective roles is relevant.

1.3.2 Design & Implementation Section

Each design section addresses the steps of the detailed design phase of the

Gaia methodology. Based on the role model, an agent model is specified.

Agents are denoted as ExemplaryAgent. Agent services are denoted as

ExemplaryAgentService. In most cases, the mapping of roles to agents and

interactions to agent services is rather straightforward and therefore not de-

scribed in detail. Details of the implementation of the parts of our approach

that are realized as agent services are omitted for the same reason. Instead,

we focus on implementation aspects only when the respective functionality

has not been specified in terms of ontologies, roles, and interactions as de-

scribed above.

1.3.3 Cryptographic Protocols

For the cryptographic protocols used within our approach, a standard nota-

tion following [20] is used where each protocol step is written as

n:S→R:m

where nindicates the protocol step, Sand Rtwo principals, namely the

sender and receiver, and mthe message.

Steps that may be carried out in parallel are denoted as na,nb, etc.

Anonymous communication is denoted as nanon(S)and nanon(R)for sender and

receiver anonymity respectively. A message contains data X={x1, .., xp},

which is often processed by a specific function f(such as an encryption

function), i.e. m=f(X). A protocol step may be carried out iteratively for

the partial data of a message, in order to achieve unlinkability of the data.

This is denoted as

∀x∈X:nx:S→R:f(x).

If there are subsequent iterations on the same data, as in the following

example

∀x∈X:nx:A→B:f(x)

∀x∈X: (n+1)x:B→C:g(x),

the step (n+ 1)ximay be carried out as soon as the step nxiis finished,

without the complete step nbeing required to have been completed.

Encryption functions are denoted as follows: The term fK(x) denotes a

statement xencrypted with a key Kunder a specific encryption function

f. If fis implicitly given, we use {x}Kas a short expression. The term

K(x) denotes a encrypted statement xdecrypted with a key Kunder a

specific decryption function f0. The protocols in this work are largely based

on symmetric encryption schemes, i.e. schemes where a single key is used

for both encryption and decryption2, so that f0

K({x}K) = x. In asymmetric

encryption schemes, a public key Kis used for encryption, and a private

key K0for decryption, where the private key cannot be derived from the

public key in a feasible manner, so that f0

K0({x}K) = x. Conversely, a digital

2The actual key used for decryption may differ from the actual key used for encryption

as long as the keys are trivially related.

signature sof statement xis usually obtained via encryption of its hash with

a private key, i.e. sK0(h(x)) = {h(x)}K0. It is verifiable via the public key K

as f0

K({h(x)}K0) = h(x).

We denote the encryption of a set of statements X={x1, .., xp}as

{X}K= ({x1}Kx1, .., {xp}Kxp). The keys Kx1to Kxpmay actually be the

same key, but different keys may have to be used depending on the protocol

and the encryption scheme. A secret key known only to principal Aat the

start of the protocol is written as KA. A key shared between two principals

Aand Bis written as KAB.

An encrypted hash of message mis denoted as h(m), where hdenotes a

cryptographic hash function, i.e. a function returning a fixed-size string for a

message of arbitrary length with the following additional properties:



Preimage resistant: For a given h(m) and unknown m, it should be

hard (i.e. computationally infeasible) to find messages m0with h(m0) =

h(m).



Second preimage resistant: For a given m, it should be hard to find

messages m0with h(m0) = h(m).



Collision-resistant: It should be hard to find any two messages mand

m0with h(m0) = h(m).

In order to be able to verify both the integrity and the authenticity of

a message, a keyed-Hash Message Authentication Code (HMAC) may be

generated, based on a cryptographic hash function and a secret key, via

HMACK(m) = h(K∗+h(K∗+m)) with K∗being the key Kpadded

with extra zeroes as required by the hash function. Analogous to using

{m}Kas a short expression for an encrypted message, we use {h(m)}Kas

a short expression for a HMAC. We also use H(M) as a short expres-

sion for {h(m1), h(m2), .., h(mn)}with M={m1, m2, .., mn}. Note that

H(M)6=h(M), because the latter expression denotes a single hash of a set

of messages, while the former expression denotes a set of hashes of single

messages.

Chapter 2

Problem Description

This work deals with privacy in the context of Information Filtering (IF). In

order to be able to describe problems and state the requirements of Priva-

cy-Preserving Information Filtering (PPIF), we first address these two areas

separately, and provide definitions used throughout the work. Subsequently,

we examine the intersection of the two areas and list requirements for PPIF.

As our approach for PPIF is realized via Multi-Agent System (MAS) tech-

nology, we also provide definitions used in this context, and describe the

central problem of malicious hosts in MAS systems. Thus, this chapter is

structured as follows: The following section deals with privacy in general.

Section 2.2 introduces concepts, definitions and architectures related to of

Information Filtering. Section 2.3 discusses the aspect of privacy in the con-

text of Information Filtering architectures, and states the requirements for

Privacy-Preserving Information Filtering. Section 2.4 deals with MAS tech-

nology. Section 2.5 summarizes the results of this chapter.

2.1 Privacy

In this section, we establish a definition of privacy that is used throughout

this work, and we give a short overview of different strategies for protecting

privacy, one of which we follow in this work.

2.1.1 Definitions

The term privacy describes a fundamental human right recognized by soci-

eties worldwide [43]. There is, however, no single universally accepted defi-

nition of privacy. Many different definitions of privacy have been proposed,

mainly because the concept of privacy encompasses different aspects whose

importance cannot be quantified objectively, leading to either very broad or

extremely narrow definitions. Furthermore, different definitions arise from

attempts to distinguish between the related concepts of privacy, confiden-

tiality, and secrecy. In the following, we restrict the discussion to largely

accepted definitions that are most suitable in the context of this work.

The following definition is given by political scientist Alan Westin: “[Pri-

vacy is] the claim of individuals, groups, or institutions to determine for

themselves when, how, and to what extent information about them is com-

municated to others” [116]. It is extended and complemented by the defi-

nition given by psychologist Roger Ingham: “Privacy is concerned with the

claim that individuals or groups have to determine for themselves how, when

and to what extent certain aspects of their behavior are determined by oth-

ers, behavior in this context being generously defined” [63]. Taken together,

these definitions encompass most aspects of privacy. They agree on the fact

that privacy is related to the ability to create barriers between individuals

or groups on the one hand and the society or parts thereof on the other

hand. They focus, however, on different aspects of these barriers: They may

be used in order to prevent the dissemination of information (according to

Westin’s definition), i.e. as active privacy protection, or in order to protect

against intrusions (according to Ingham’s definition), i.e. as passive privacy

protection. In the first case, the initiative originates from the individual or

group trying to expand or at least maintain a barrier, while in the latter case

the initiative is taken by another party trying to breach a barrier.

There are two further aspects of privacy, which are largely orthogonal to

the aspects described above:



Physical privacy is defined as “freedom from unauthorized intrusion”

[83], or, more precisely, as “a restriction on the ability of others to

experience a person through one or more of the five senses” [88]. It

has been recognized as a basic right by future Supreme Court justice

Brandeis as early as 1890 [115], defining it as the “right to be let alone”

in 1928 [105]. These definitions show that physical privacy is more

closely related to passive privacy.



Informational privacy is often merged with concept of data protection.

It is best defined as a right to informational self-determination, au-

thorizing “each individual to determine on the circulation and the use

of his own personal data”1[18], thus controlling the dissemination of

personal information. While sometimes a distinction is made between

1Translated from the definition “die Befugnis des Einzelnen, grunds¨atzlich selbst ¨uber

die Preisgabe und Verwendung seiner pers¨onlichen Daten zu bestimmen.”

individuals and organizations with regard to this aspect, it is in fact

reasonable to apply this definition to groups and organizations as well,

as in the definitions of Westin and Ingham given above. Another as-

pect of informational privacy is the “protection against intrusion by

unwanted information” [87]. Informational privacy is therefore more

closely related to active privacy.

Based on the given definitions, we define four aspects of privacy as shown

in Figure 2.1. The combinations of these aspects are roughly equivalent to

the four aspects of privacy as defined by Westin [116]:



Solitude, i.e. the separation from direct or indirect observation and in-

terference by others, is related to the passive aspect of physical privacy.



Intimacy, i.e. the ability to keep physical interactions and communica-

tion private, is related to the active aspect of physical privacy.



Reserve, i.e. the decision not to reveal certain information to others, is

related to the active aspect of informational privacy.



Anonymity, i.e. The avoidance of identification, is related to the passive

aspect of informational privacy. This relation, however, is less close

than the relations regarding the previous aspects.

Figure 2.1: Aspects of privacy.

In this work, we focus on informational privacy and disregard aspects

of physical privacy. Therefore, in the following the term “privacy” always

denotes informational privacy.

2.1.2 Strategies for Protecting Privacy

The active protection of informational privacy is a non-trivial task, because

controlling the dissemination of information is almost impossible in scenarios

that require more complex actions than all-or-nothing approaches: Allowing

unrestricted access to information as the one extreme, or withholding infor-

mation entirely as the other extreme are approaches that may be accom-

plished in a straightforward manner. However, these approaches are insuffi-

cient in cases where the respective information actually has to be propagated

to certain other parties, but at the same time is intended to be kept under

some control with regard to further dissemination.

The main problem therefore is the following: How may a party prevent

further dissemination of information once it has lost the exclusive control over

this information? There are three main strategies addressing this problem:



Legal Regulation: In most developed countries, legal regulations exist

that deal with the protection of privacy. As examples, we review the

current situation in the United States of America and the European

Union.

In the United States, privacy as a fundamental human right is not ex-

plicitly addressed in the Constitution. Specific aspects of privacy, how-

ever, are protected implicitly, e.g. by the Fourth Amendment (address-

ing the passive aspect of physical privacy), or the First Amendment

(addressing the active aspect of privacy). Currently the main founda-

tion for data protection and informational privacy in the United States

are the Principles of Fair Information Practices based on the Code of

Fair Information Practices [109] and further refined in OECD guide-

lines [89]. They are agreed upon by industry groups, privacy experts

and the United States government. The Fair Information Principles are

intended to limit data collection and to enable individuals to control

the dissemination of their personal information.

In the European Union, the key document addressing privacy is Direc-

tive 95/46/EC of the European Parliament, known as the European

Union Privacy Directive or European Union (EU) Data Protection Di-

rective [40], which binds member states of the European Union to ratify

laws implementing its requirements. The directive restricts the collec-

tion and processing of personal information, and prohibits the transfer

of personal information to entities for whom less strict legal regulations

apply, such as companies in non-member states. Additionally, the safe

harbor framework has been introduced to allow the transfer of personal

data between the EU and the United States, comprising seven princi-

ples consistent with the Principles of Fair Information Practices. Some

member states of the EU have introduced more extensive privacy and

data protection regulations.



Self-Regulation: Today, the processing of personal information collected

via the internet, i.e. by web sites accessed by users in the context of

e-Commerce, is mainly regulated by privacy policies posted on the re-

spective site. These privacy policies are generally not legally binding

and therefore constitute mechanisms within the area of self-regulation.

Comprehensive privacy policies should, according to the Fair Informa-

tion Principles, address at least the following principles:

–Notice/ Awareness: The user has to be noticed of the provider’s

If the user is not aware of the provider’s privacy policy, the fol-

lowing principles are meaningless.

–Choice/ Consent: The user has to be given the possibility to ex-

plicitly decline the collection and further dissemination of personal

information, i.e. the ability to opt-out, or preferably the possibility

to explicitly allow these actions, i.e. the ability to opt-in.

–Access/ Participation: The user has to be given means to access

all personal information collected about him and has to be able

to change or delete at least information that is inaccurate.

–Security/ Integrity: The collected personal information should be

stored securely without third parties being able to obtain it. Ad-

ditionally, the integrity of the information should be ensured.

–Enforcement/ Redress: The provider should act according to his

stated privacy policy. If there is no enforcement mechanism in

place (by legal regulation, external audits, certification or other

means), the privacy policy is merely an assertion of the provider

rather than an agreement between the two parties involved.

Privacy seal programs have been introduced in order to inform users

of the enforcement of privacy policies. Compared to standard privacy

policies, they have two advantages: The user may decide immediately

whether he wants to interact with a provider, based on the displayed

seals. If the user trusts the seal issuer, he does not have to deal with

the respective privacy policy itself, which is in many cases rather com-

plicated and hard to interpret for users who are not familiar with legal

terminology. Furthermore, the user probably trusts an independent

seal issuer more readily than he would trust a single, perhaps formerly

unknown provider. It has been pointed out, however, that current seal

programs generally do not enforce sanctions against malicious providers

and are therefore largely ineffective [33].



Technology: A wide range of Privacy-Enhancing Technologies and Pri-

vacy-Preserving Technologies have been suggested and implemented,

addressing different aspects of privacy in the context of various ap-

plications. These approaches are discussed in detail in the following

chapter.

For all practical purposes, all three approaches ultimately require some

amount of trust: The user whose privacy is to be protected has either to

trust in the responsible legislative, executive and judicial offices to actually

enforce legal regulations, prosecute violations, and abide by these regula-

tions; or he has to trust in the parties he interacts with to adhere to the

stated self-regulatory principles; or he has to trust in the technology that is

applied to work as specified. However, strategies based on technology may

(at least theoretically) not require trust, because the technologies used my

be examined and verified by the user (although in practice most users do

not have the required knowledge to do so). In this work we focus on privacy

protection through technology and therefore subsequently ignore the other

strategies.

2.2 Information Filtering

In this section, we provide definitions that are relevant in the context of In-

formation Filtering, we discuss the main types of IF architectures, and we

list the main problems of IF architectures.

2.2.1 Definitions

In the following, the definitions of the most important terms related to In-

formation Filtering are given.



The term “Personalization” in its broadest definition denotes the adap-

tion of products or services to individual consumers or users, based on

the acquisition and analysis of personal information, i.e. information re-

lated to a single specific person. It is used mainly in the context of Web

Personalization (i.e. in the context of personalized web sites and person-

alized content provided via the World Wide Web) [70], but the underly-

ing concepts have been widely researched under the designation “User

Modelling and User-Adaptive Interaction” [71]. Related terms often

used synonymously are “Customer Relationship Management (CRM)”

and “1-to-1 Marketing”.



The term “Information Filtering” denotes concepts and methods used

to provide personalized information. Systems dealing with other as-

pects of personalization, such as personalized user interfaces, do not

constitute IF systems. IF differs from the related field of Informa-

tion Retrieval in the following aspect: While the objective of IR sys-

tems, such as search engines, is the fulfillment of short-term information

needs, IF systems deal with long-term information needs. Therefore,

while IR systems are based on queries expressing an ephemeral infor-

mation need, IF systems are usually based on user profiles expressing

persistent information needs. An IF system may contain IR functional-

ity, e.g. for refining a long-term information need in a specific context,

resulting in a combination of both kinds of information needs. An ex-

emplary scenario is described in 4.1. Moreover, Information Filtering

is distinguished from the related field of Web Mining by the following

definitions: While Web Mining systems search for relevant informa-

tion across multiple web resources, IF systems operate on structured

information already at hand.



The term “Recommender System” (synonymous with “Recommenda-

tion System”) originally denoted a system for Automated Collaborative

Filtering [96]. The definition, however, has been broadened and now

denotes “any system that produces individualized recommendations as

output or has the effect of guiding the user in a personalized way to

interesting or useful objects in a large space of possible options” [19].



The term “Matchmaker System” (synonymous with “Matchmaking

System”) denotes a system aiming at introducing similar users to each

other [45]. In this context, similarity is usually determined via personal

user profiles which contain general preferences and additional data in-

dicating a user’s interests. It should be noted that especially in the

context of MAS architectures, the term sometimes denotes a different

kind of system, namely a system assisting users or agents in finding

and accessing certain resources [74]. We do not follow this definition

in this work.

According to these definitions, an IF system can be classified as a Rec-

ommender System, a Matchmaker System, or as a Hybrid IF System, i.e. a

combination of both: A Matchmaker System may be combined with a Recom-

mender System, usually by generating personalized information via similar

users. A pure Recommender System may also determine similar users as an

intermediate step in the process of generating personalized information, but

does not introduce similar users to each other. In other words, it is possible

to generate personalized information via similar users, but it is not always

possible to determine similar users via personalized information. Regarding

Matchmaker Systems and Hybrid IF Systems, we additionally distinguish

between centralized and distributed approaches, depending on the prevalent

mode of interaction, as described below. Figure 2.2 shows the relation be-

tween these systems and the related areas.

Figure 2.2: A classification of Information Filtering systems (in

black) and related areas (in gray).

An IF system consists of three main kinds of abstract entities:



The user, who intends to receive personalized information. Addition-

ally, users may provide feedback in order to improve the quality of the

IF system. The user entity directly or indirectly (e.g. via a user agent)

represents a human user2. In distributed IF systems, the user entity

also directly participates in the process of generating personalized in-

formation for a specific, different user entity.

2In some systems, additional artificial user entities are introduced: It has been sug-

gested, for example, to utilize software agents in order to simulate human users with the

goal of improving the quality of recommendations by creating additional ratings [59].



The provider, who provides the information based on which ultimately

personalized information is generated, and from which user profile items

are obtained. The provider entity usually represents a legal entity, such

as a company. In non-distributed IF systems, the provider entity also

directly participates in the process of generating personalized informa-

tion for a specific user entity.



The filter, who provides filtering techniques as collections of algorithms

for generating personalized information. The filter entity usually rep-

resents a legal entity, such as a company.

Table 2.1 gives an overview of these and further terms used in the follow-

ing formal definitions.

Table 2.1: Terminology used in the context of Information Fil-

tering systems throughout this work.

Notation Term

uuser entity

Uthe set of all user entities

pprovider entity

ffilter entity

ssupplier entity, s∈ {u, p}

iitem

Ia set of items

PR a profile

qIR(PR) constrained profile

(as result of IR-based query on a profile)

qIF (PR) partial profile

(as result of IF-based query on a profile)

ma profile model

ft filtering technique

FT the set of all filtering techniques

pp filtering technique algorithm (used

during the information processing stage)

ff filtering technique algorithm (used

during the information filtering stage)

pred a prediction of relevance

REC a set of recommendations

SU a set of similar user entities

RES a set of results

(recommendations or similar user entities)

Because users generally do not want to re-enter personal information for

every single filtering process, and the amount of provider information often

prohibits the direct application of a filtering technique to large collections of

data, an IF system consists of two additional stages preceding the information

filtering stage itself. Thus, we define three separate stages in an IF system:



Information Collection Stage: In the first stage, the data to be used

as the input for subsequent filtering processes is collected. This stage

is independent of the actual filtering technique. On the user side, a

user profile is generated and maintained based on interaction with the

user and/or observation of the user’s actions: The user profile contains

profile elements, which are items (provided by the provider or a different

source) that have been rated either explicitly by the user himself, or

implicitly by analyzing the user’s behavior. Items usually contain pairs

of attributes and values. The user profile may also contain general

preferences of the user, or additional information, such as personally

identifying information and demographic data. On the provider side, a

provider profile is provided (via an unspecified mechanism) containing

a (usually large) collection of items of a specific domain. The formal

definition of a profile is given in Equation 2.1. As entities supplying

profiles, both user entity and provider entity are subsumed as supplier

entities.



Information Processing Stage: In the second stage, the data collected

in the first stage is processed further as a preparation for the final stage.

Based on the profile elements and an actual filtering technique, models

are created on the user and provider side. These models constitute an

additional part of the respective profile, which is therefore defined as

described by Equation 2.1.

PRs=Is∪[

ft∈F T

mIs,ft (2.1)

Simple filtering techniques may use the profile elements themselves as

input (on one side or on both sides), but in most cases models are

generated and maintained in order to reduce the complexity of the

final stage, or because the underlying algorithm explicitly requires a

model. In any case, the structure of the model depends on the filtering

technique to be applied. Exemplary models are neural networks, data

clusters, and decision trees. Based on the way the models are generated

and maintained, we distinguish between two main groups of filtering

techniques:

–In feature-based approaches the models are generated and main-

tained separately for the user and provider side, based solely on

the features of the respective profile items. Data is not exchanged

between different profiles. Therefore, feature-based models are

defined as described by Equation 2.2.

mIs,ft =ppft(Is) (2.2)

–In collaboration-based approaches the models are generated and

maintained via a collaboration of entities, usually different user

entities and a provider entity. Provider models therefore contain

user profile data, and user models may contain provider profile

data as well. Therefore, collaboration-based models are defined

as described by Equation 2.3 and Equation 2.4.

mIu,ft =ppft(Iu, Ip) (2.3)

mIp,ft =ppft(Ip,[

u∈U

Iu) (2.4)



Information Filtering Stage: In the third and final stage, the filtering

technique is applied to two profiles (either to the profile items or to

models) in order to generate personalized information. The personal-

ized information is always generated for a user. The information the

personalized information is based on is supplied by a provider entity

or a different user entity, i.e. s=por s=u00. A process of this stage

generates one of the following four main kinds of results:

–Aprediction of the relevance of a specific item for the given user.

In this case, the item in question has to be provided as additional

input to the filtering technique, and a filtering algorithm as defined

in Equation 2.5 is required. The supplier is the provider (in a

Recommender System or a non-distributed Hybrid IF System) or

a different, presumably similar user (in a distributed Hybrid IF

System). The prediction values are usually defined as a range of

possible values, where a higher value indicates a higher predicted

relevance. Additionally, the value reflects the relevance in relation

to other candidate items available to the supplier. Otherwise, a

supplier profile would not actually be required by the filtering

algorithm.

predu,s,ft,i =ffft(PRu, PRs, i) (2.5)

–The top-nrecommendations for a given user, i.e. a set of nitems

(or less, e.g. in case the number of candidate items is smaller than

n) that are recommended to the given user because they have the

highest predicted relevance of all candidate items available to the

supplier that are not already contained within the user profile.

In this case, a filtering algorithm as defined in Equation 2.6 is

required. The supplier is the provider (in a Recommender System

or a non-distributed Hybrid IF System) or a different, presumably

similar user (in a distributed Hybrid IF System).

RECu,s,ft,n =ffft(PRu, PRs, n) =

{i∈(PRs\PRu)| ∀ X⊆(PRs\PRu\ {i}) :

|X|< n ∨ ∃ x∈X:predu,s,ft,i > predu,s,ft,x}

(2.6)

In case of large supplier profiles, a model-based filtering algorithm

may be used instead, leading to results that may differ somewhat

from this definition.

–Aprediction of the similarity of a specific user and the given user.

Similarity may be based on users’ features, preferences, or any

kind of data related to users. The user in question has to be pro-

vided as an additional input to the filtering technique. In this

case, a filtering algorithm as defined in Equation 2.7 is required.

The supplier is the provider (in a Matchmaker System) or a differ-

ent user (in a distributed Matchmaker system), which may either

be a candidate that is likely to be similar itself (i.e. s=u00 =u0)

or a different user (i.e. s=u00 6=u0). For statements applying to

the prediction values, see above.

predu,s,ft,u0=ffft(PRu, PRs, u0) (2.7)

–The top-nsimilar users for a given user, i.e. a set of nusers (or

less, e.g. in case the number of candidate users is smaller than n)

that are returned to the given user because they have the highest

predicted similarity value of all candidate users available to the

supplier. In this case, a filtering algorithm as defined in Equation

2.8 is required. The supplier is the provider (in a Matchmaker

System) or a different, presumably similar user (in a distributed

Matchmaker system).

SUu,s,ft,n =ffft(PRu, PRs, n) =

{u0∈U| ∀ X⊆(U\ {u0}) :

|X|< n ∨ ∃ x∈X:predu,s,ft,u0> predu,s,ft,x}

(2.8)

In case of large supplier profiles, a model-based filtering algorithm

may be used instead, leading to results that may differ somewhat

from this definition.

Table 2.2: Overview of Information Filtering systems and the

results provided.

supplier: s=psupplier: s=u00

Recommender System/ distributed

Hybrid IF System Hybrid IF System

input: i predu,p,ft,i predu,u00 ,ft,i

input: n RECu,p,ft,n RECu,u00 ,ft,n

distributed

Matchmaker System Matchmaker System

input: u0predu,p,ft,u0predu,u00 ,ft,u0

input: n SUu,p,ft,n SUu,u00 ,ft,n

Table 2.2 summarizes the different kinds of IF systems and the results

they provide.

We note that Matchmaker Systems and Hybrid IF Systems require a

collaboration-based approach because they ultimately have to be based

either on a global provider profile containing information about users,

or on user profiles containing information about other users. Otherwise,

a large fraction of all user-user pairs would have to be compared in

order to find users that are actually similar, which is infeasible in real-

world applications containing a large number of users. Feature-based

approaches are only suitable for pure Recommender Systems, which by

definition cannot be based on collaborative approaches.

Finally, all four kinds of results may be produced based on the part

of a supplier profile returned as a result of a query on that profile. In

this case, which constitutes the combination of long-term and short-

term information needs described above, the given equations have to

be modified by substituting qIR(PRs) for PRs. We refer to this case as

the mixed IR/IF scenario.

Queries on a profile may also occur during the actual filtering process.

We distinguish between both types of queries by denoting the former

queries as qIR(PR) and the results as constrained profiles, and the latter

queries as qIF (PR) and the results as partial profiles.

The main components of an IF architecture, including profiles and mod-

ules for the three stages described above, are shown in Figure 2.3 for the

Recommender System scenario, i.e. with a provider as the supplier in the

information filtering stage. The Matchmaker System scenario is largely anal-

ogous, with a second user replacing the provider.

Figure 2.3: The main components of an IF architecture. The

provider-side components (Data Storage and Results Analyzer)

are used as placeholders for provider functionality that is not fur-

ther specified here.

The two main groups of filtering techniques are classified further mainly

by the kind of profile data they are based on. There are three feature-based

approaches (and various combinations thereof):



In Content-based Filtering approaches, results are generated by deter-

mining provider profile items that are similar to user profile items.

Similarity is determined either by comparing attributes of the respec-

tive items directly, or by creating and applying models that are based

on the attributes of items.



In Knowledge-based Filtering approaches, results are generated by ap-

plying domain-specific knowledge to the user profile items and deter-

mining relevant provider profile items based on this knowledge.



In Utility-based Filtering approaches, results are generated by applying

a utility function, which describes a user’s general preferences, to the

provider profile items.

Additionally, there are two main collaboration-based approaches:



In Collaborative Filtering approaches, similar users are identified based

on the rated items and general preferences of the respective user pro-

files. If recommendations are generated, they are based on the profile

items of similar users.



In Demographic Filtering approaches, similar users are identified based

on the demographic data of the respective user profiles. Personal pref-

erences may additionally used to some extent in order to supplement

or infer demographic data. If recommendations are generated, they are

based on the profile items of similar users.

Combinations of pure feature-based approaches and collaboration-based

approaches are classified as collaboration-based approaches as well, according

to the definitions. The various approaches and combinations are described

in detail in [19].

2.2.2 IF Architectures

As defined above, IF architectures contain three abstract entities aggregat-

ing various components, and operate on various data (such as profiles and

results). Architectures differ, however, in the way the components and data

are controlled, and in the actual entities aggregating the abstract entities.

Existing IF architectures can be grouped into the following three main cate-

gories:



In provider-controlled IF, the provider entity controls all components

and data including the user profile data, and aggregates the filter en-

tity as well. This architecture is the most common approach in research

prototypes as well as E-Commerce applications of Recommender Sys-

tems. Usually, all data is stored in a central database. This approach

allows personalized information to be generated in a highly efficient

manner, because all required information can be accessed directly and

in a uniform way. At the same time, as a centralized architecture con-

stituting a single point of failure e.g. with regard to access to the user

profiles, it is characterized by inherent security risks. Figure 2.4 shows

this architecture.

Figure 2.4: A provider-controlled IF architecture.



In privacy-enhanced IF, the privacy of the user is addressed by enabling

the user entity to store the user profile data locally, e.g. within a web

browser plug-in or as part of a personal agent. Apart from this mod-

ification, the provider-controlled IF architecture model is used. This

approach allows the user entity to determine the personal information

that is collected. Figure 2.5 shows this architecture.



In user-controlled IF, the user entity not only controls the user pro-

file, but aggregates the filter entity as well. Thus, the architecture

basically resembles the privacy-enhanced approach, with the status of

user and provider reversed: In order to generate results, the user en-

tity uses data from the provider profile. This approach is primarily

used in collaboration-based approaches that are largely independent

of a specific information provider. In this case a second user entity

replaces the provider entity as supplier of the additional data. Com-

pared to provider-centric approaches, however, its complexity is gener-

ally rather high: Additional infrastructure aspects have to be addressed

Figure 2.5: A privacy-enhanced IF architecture.

because more complex software on the user side is required, as well as

an advanced communication network (such as a peer-to-peer network).

Figure 2.6 shows this architecture.

2.2.3 Main Problems

IF systems have not gained widespread acceptance. There are various Rec-

ommender Systems and Matchmaker Systems available, usually as part of

e-commerce applications3, but they are generally limited to specific domains

in which privacy and quality aspects in particular are less relevant, such as

entertainment as opposed to health or finance, and they often constitute an

additional service rather than core functionality. We identify the following

four problems as the main limitations resulting in the lack of acceptance of

IF systems:



Privacy: Personalization and privacy are often seen as contrary and ir-

reconcilable: Regardless of the chosen approach, personal information

is always required as a basis for personalized information. Many users,

3Popular examples include online shops such as amazon.com, and internet radio sta-

tions such as last.fm.

Figure 2.6: A user-controlled distributed IF architecture.

however, are reluctant to provide this information. None of the archi-

tectures described above offer a solution for multilateral privacy, i.e. a

solution in which the privacy of all participating entities is protected:

–The main drawback of provider-controlled IF is the lack of user

privacy: The user entity cannot determine the personal informa-

tion that is collected. Furthermore, even if the provider entity

ensures and actually attempts to protect the privacy of the user

information4, this cannot be verified or enforced by the user en-

tity. Therefore, the user entity cannot prevent the propagation or,

more generally, the unintended use of private data.

–The privacy-enhanced IF approach allows the user entity to ini-

tially control the propagation of his personal information by de-

ciding which entities to grant access to this data. However, as

the provider entity ultimately still has to be allowed to access the

user profile in order to provide personalized results, the user entity

still cannot prevent the propagation and unintended use of private

data.

–The user-controlled IF approach is problematic with regard to

provider privacy, if the provider entity participates actively in the

4Approaches for protecting the privacy of user data on the provider side are discussed

in Section 3.1.4.

actual IF processes, because in this case the provider entity loses

control over its profile. In the case of a distributed system, user

privacy is still not protected completely because private data is

exchanged between users.

The privacy of the filter entity, who may regard the details of the

algorithms used by the filtering techniques as private data, is not pro-

tected by any of the approaches described here. We consider the lack

of privacy to be the most important problem in IF architectures, and

therefore it is the primary focus of this work. It is discussed in detail

in the following section.



Quality: Recommender Systems often return unsatisfying results be-

cause they are largely based on unfounded assumptions about the user

and his information needs. In particular, the metrics used to determine

which items to recommend to a user are inadequate in many cases [82],

because they lead to returning obvious recommendations that are very

similar to user profile elements, and neglect the aspect of serendipitous

(unexpected) recommendations. Basically, metrics often tend to ignore

the fact that the user may not always be interested in the elements

with the highest prediction [122].



User Effort: Users are often not willing to spend any amount of time

learning to interact with unknown software or interfaces. Instead, they

prefer to obtain results of lesser quality immediately, even if they would

profit more from the former course of action in the long term. This

behavior, the Production Paradox, is a significant aspect of the Paradox

of the Active User as defined in [26]. In the context of Recommender

Systems, this behavior is encountered in the fact that users are often

not willing to explicitly provide the information their profile is based

upon, e.g. by rating recommendations. Thus, Recommender Systems

that depend largely on explicit user feedback discourage many potential

users.



Provider Bias: Commercial IF architectures ultimately focus on real-

izing the provider’s goals, which may be contrary to the users’ goals:

While providers may offer personalized information with the ulterior

motives of customer retention (a user may use the respective service

in the future because he already put a certain amount of effort into

creating his user profile) and customer data aggregation, these aspects

have generally negative implications for users. The notion expressed

by the term “Customer Relationship Management” is challenged by

the fact that customers generally reject the idea of relationships with

corporations.

To summarize, we identify the following main reasons for a lack of accep-

tance of existing IF-based systems: Lack of privacy, lack of quality, required

user effort, and provider bias. Based on these problems, we arrive at require-

ments of a PPIF architecture in Section 2.3.4. Other problems in IF systems,

such as security aspects, obviously have to be addressed as well, but are less

specific to IF systems and therefore not discussed here.

2.3 Privacy & Information Filtering

In this section, we discuss the aspect of privacy in the context of IF architec-

tures. In the following, we show that privacy is in fact the main problem in

IF architectures, we introduce the concept of multilateral privacy, and dis-

cuss different aspects of privacy that have to be considered when designing

an architecture for PPIF. Finally, we state the requirements for Privacy-Pre-

serving Information Filtering.

2.3.1 Privacy as the Main Problem

Privacy is often stated to be the primary problem in IF architectures [24, 77].

This statement is, however, somewhat at odds with the actual behavior of a

large number of users who can be classified as privacy pragmatists willing to

waive privacy in return for other benefits or incentives [107]. Only a small

number of users are privacy fundamentalists aiming at keeping all personal

information private, regardless of its actual sensitivity. A comprehensive

overview of surveys dealing with users’ privacy concerns in the context of

personalization is given in [108]. It observes widespread differences between

stated preferences and the actual behavior of users. Nevertheless, the con-

flict of privacy and personalization is shown to be a significant problem in

personalized electronic commerce applications, and the need for solutions is

stressed.

The currently observed user behavior, however, has to be assessed in the

context of existing IF architectures, which do not realize their full potential,

mainly because they are restricted to the specific domains, as the following

examples indicate.



E-commerce: most current Recommender Systems assist a potential

buyer only by recommending products of a relatively low value, such

as books as opposed to cars, houses, or stock.



Lifestyle: Current Recommender Systems are largely restricted to

entertainment-related domains such as movies and music, ignoring e.g.

healthcare and wellness-related products.



Knowledge: Current Recommender Systems offer personalized news-

letters for politics, local news, or sports, but they generally disregard

the areas of academic publications, as well as financial and economic

information.

In future IF systems offering these kinds of information, perhaps even in an

integrated manner, users can be expected to be much more concerned about

their privacy, basically because of the large amount and sensitive character

of the personal information required in these cases, as opposed to current

solutions.

2.3.2 Multilateral Privacy

In IF architectures which aim at becoming widely accepted, privacy aspects

related to all main entities should be addressed:



Protecting the user privacy is the most obvious problem, because the

goals of privacy protection and personalization are inherently conflict-

ing: By definition, users have to provide personal information in order

to obtain personalized information.



Protecting the provider privacy is a problem that is sometimes ne-

glected, especially if the term “privacy” is only applied to individuals.

Obviously, the provider information cannot be protected completely, at

least in Recommender Systems, because recommendations are based on

provider information. Nevertheless, an information provider is likely to

be concerned about the dissemination of his information, which is his

principal asset. It would be detrimental, for example, if competitors

would be able to extract and re-create the entire provider information

in a feasible way, e.g. via using artificial user profiles as means for

retrieving a large number of recommendations.



Protecting the filter privacy is a problem that is generally ignored in the

literature. It seems reasonable, however, to regard the filter algorithms

themselves as a valuable asset that should be protected, mainly because

the quality of personalized information and thus probably the commer-

cial viability of the overall system directly depends on the quality of

the filtering techniques.

We therefore use the term “Multilateral Privacy”, analogous to the term

’Multilateral Security’ introduced in [94], in association with a system in

which the privacy of all participating entities is addressed, with no entity

taking precedence over another.

2.3.3 Measuring Privacy

The requirement “privacy protection” as such is rather vague and cannot be

measured objectively for a given IF system. Therefore, it has to be specified

further, in particular with regard to the following aspects:



Against what kind of adversary does private information have to be

protected?



What kind of threats have to be countered?



To what extent has private information to be protected?

We discuss these aspects in the following sections.

2.3.3.1 Adversary Model

From the viewpoint of one entity or participant of an IF system, any other

participant may be an adversary attempting to access and/or propagate pri-

vate information in a way that has not been agreed upon. We distinguish

between the following types of participants5:



An honest participant carries out all actions exactly as specified and

announced. Even in systems consisting entirely of honest participants,

threats with regard to private information exist, as described below.



An honest-but-curious adversary (sometimes referred to as “semi-hon-

est”) carries out all actions as specified and announced, but tries to

gain additional information by arbitrary means, e.g. by analyzing the

exchanged information. As an example, an honest-but-curious provider

entity in a IF system based on a centralized architecture returns rec-

ommendations as specified, but it may use the user profile data for

additional purposes.

5The terminology has been introduced in [56] in the context of Secure Multi-Party

Computation, for which see Section3.1.2, but it is applicable in this broader context as

well.



Amalicious adversary tries to gain additional information by means

that violate the protocols agreed upon. As an example, a malicious

provider entity in a IF system based on a centralized architecture may

not even return any result data, or he may deliberately return incorrect

data in order to induce the user to provide additional private informa-

tion.

It should be noted that an honest participant may be forced to deviate from

the protocol agreed upon as a reaction to a detected threat. Following [121],

we classify a role as an honest defending party in this case.

The honest-but-curious adversary model is obviously weaker than the

malicious adversary model, and thus protecting private information against

an honest-but-curious adversary is generally easier than protecting private

information against a malicious adversary. If prevention fails, however, it

may conversely be easier to at least detect a malicious action as a noticeable

deviation from the protocol, while an honest-but-curious adversary may be

harder to detect because the results are indistinguishable from the results

provided by an honest participant.

Furthermore, while the honest-but-curious adversary model is in many

cases sufficiently realistic because the adversary may be interested in correct

results as well, or because a malicious adversary is detected at some point

in time, in the context of Information Filtering both kinds of adversaries

should be assumed. However, we consider it to be sufficient for participants

to be able to detect malicious adversaries, instead of aiming at an architecture

preventing them completely, because a probable risk of detection should deter

malicious adversaries in most cases.

2.3.3.2 Threats

There are different threats with regard to private information that is propa-

gated to a recipient, e.g. from the user entity to the provider entity. In [45],

the following five main threats are identified:



Deception by the recipient: The recipient of private information uses

the information for other purposes, or purposes exceeding those stated

or agreed upon.



Mission creep: The recipient initially adheres to his stated policy with

regard to private information, but expands the original purposes over

time, without re-negotiating with or even notifying the respective party.



Accidental disclosure: The recipient propagates private information ac-

cidentally, e.g. via discarded hardware.



Disclosure by malicious intent: Private information is stolen by a third

party with malicious intents.



Forced disclosure: The recipient is legally forced to propagate private

information, e.g. via a subpoena.

Not all of these threats are equally important in an IF architecture. In a

Recommender System for restaurants, for example, the threat of forced dis-

closure seems somewhat far-fetched, considering the character of the private

information.

It should be noted that with the exception of the first threat, all threats

may emerge even in systems that initially contain only honest participants.

Countering the first threat, which may originate from honest-but-curious as

well as malicious adversaries, goes a long way towards countering the other

threats. Therefore, an IF architecture realizing multilateral privacy should

primarily focus on the threat of deception by the recipient.

2.3.3.3 Degrees of Privacy

As stated above, an all-or-nothing approach with regard to privacy is imprac-

tical in IF architectures, because a certain amount of information has to be

propagated by definition. Therefore, we define an acceptable degree of pri-

vacy that privacy-preserving IF architectures should provide. This sufficient

degree of privacy is characterized by the following aspects:



Computational vs. Information-Theoretic Privacy: Information can be

considered as private either if it is computationally infeasible for a sec-

ond party to obtain it, or if it is theoretically impossible for a second

party to obtain it. The first case, i.e. the case of computational pri-

vacy, relies on (unproven but widely accepted) intractability assump-

tions used e.g. as the foundation of an encryption scheme. The second

case, i.e. the case of information-theoretic privacy, does not rely on

these kinds of assumptions and therefore even prevents an adversary

with unbounded computing power to obtain the information. While

information-theoretic privacy is obviously stronger, it is generally im-

practical to achieve. We therefore consider computational privacy to

be sufficient in the context of privacy-preserving IF architectures.



Aspects of Anonymity: If an entity cannot be associated with its pri-

vate information by a second party, it is considered to be anonymous in

this regard. An infrastructure for anonymous communication supports

anonymity but may not ensure it unconditionally, because it does not

address the problem that the communicated information itself may be

sufficient for identifying an entity. Therefore, in IF architectures a user

profile may be anonymous in the sense that the respective user entity

is not directly identifiable, but the user’s identity may still be deduced

indirectly by combining profile elements. In some cases, even a small

number of apparently uncritical profile elements may be combined suc-

cessfully in order to identify a user, e.g. if the respective user profile

can be determined to be the only one containing that specific combi-

nation of elements6. Therefore, anonymous communication in itself is

not sufficient. Additionally, private information is exposed even when a

user cannot be identified. This is problematic especially in cases where

valuable, i.e. non-trivial and interesting information may be deduced

from a specific combination of elements in a user profile (e.g. a success-

ful stock portfolio or a list of related work pointing to scientific work

in progress or a pending patent ). These issues are discussed in detail

in [77]. We conclude that in privacy-preserving IF it should neither be

possible to associate user entities with profile items, nor a profile item

with other profile items. Provider and filter entities, however, are not

required to remain anonymous.



Unlinkability vs. Unobservability: In the optimal case, private informa-

tion is unobservable, i.e. an adversary cannot even determine whether

the respective information exists at all. In IF architectures, user profile

information is unobservable if another entity cannot determine whether

a specific profile item is actually contained in any user profile at all.

Private information is observable but unlinkable if an adversary is able

to determine that the respective information exists, but cannot asso-

ciate it with certainty with any participant or other information, i.e.

if he cannot determine a link. According to the privacy spectrum de-

fined in [95], unlinkability with regard to private information is further

classified by the following categories:

–Beyond Suspicion: From the adversary’s point of view, all theo-

retically possible links have the same probability. Therefore, he

cannot rationally suspect a specific participant or other informa-

tion to be associated with the respective information.

–Probable Innocence: From the adversary’s point of view, there may

exist links with a probability higher than that of other links. Still,

6As an example, a large majority of United States citizens may be identified correctly

based on just three attributes: Zip code, birth date and gender [106].

for any given link, its probability is smaller than 0.5, i.e. it is more

likely that the link does not actually exist.

–Possible Innocence: From the adversary’s point of view, a link

exists with a probability 1 ≥p≥0.5, i.e. the possibility that the

link does not actually exist cannot be ruled out.

Obviously, if private information can be associated with certainty with

a specific participant or other information, unlinkability is no longer

given and the information is exposed. All categories of the privacy

spectrum are summarized in Table 2.3 with regard to user privacy in

IF architectures. We consider the “probable innocence” degree of un-

linkability to be sufficient for user privacy in the context of privacy-

preserving IF architectures under the condition that the probability

is close to 1/|U|for all theoretical links, i.e. the “beyond suspicion”

degree is almost reached.

Regarding provider privacy, all information that is not provided directly

via recommendations should be unobservable. Regarding filter privacy,

the filter algorithm, if regarded as one single element of information,

should be unobservable as well: It is obviously observable that some

filtering technique is applied, but it should not be observable that a

specific filtering technique is applied.



Privacy of the Returned Information: Finally, it has to be considered

whether the returned information should be regarded as private as well,

i.e. whether e.g. the provider entity should be allowed to obtain the rec-

ommendations generated for a given user entity. This course of action

has the following advantages: The provider entity receives feedback

about the information that it provides, which may be useful for improv-

ing the quality of the information, and for adding further information in

areas that are highly demanded. Additionally, it may reduce the com-

plexity of the underlying filtering technique algorithms, because the re-

quirements with regard to privacy are somewhat relaxed. On the other

hand, allowing the provider entity to obtain result data compromises

the privacy of the user profile information at least indirectly, because

the provider entity may attempt to infer user profile information via the

recommendations. Determining an optimal trade-off between these as-

pects is problematic and depends on the given scenario. Therefore, we

do not resolve this issue at this point and merely observe that a privacy-

preserving IF architecture should preferably support both approaches

and reach a decision in particular cases e.g. through negotiation be-

tween the participants. As a minimal requirement, result data should

Table 2.3: The categories of the privacy spectrum according to

[95] with regard to user privacy in IF architectures. I0is the set

of all observed elements, cis a constant. The given probabilities

reflect the adversary’s point of view.

User-Element Element-Element

Association Association

Unobservability: I0=∅I0=∅

Absolute Privacy

Unlinkability: ∀u∈U:∀i0∈I0:

Beyond Suspicion prob(i∈PRu)prob(∃PR :{i, i0} ⊆ PR)

=c=c

Unlinkability: ∀u∈U:∀i0∈I0:

Probable Innocence prob(i∈PRu)prob(∃PR :{i, i0} ⊆ PR)

≤0.5≤0.5

Unlinkability: ∀u∈U:∀i0∈I0:

Possible Innocence prob(i∈PRu)prob(∃PR :{i, i0} ⊆ PR)

<1<1

Exposure ∃u∈U:∃i0∈I0:

prob(i∈PRu)prob(∃PR :{i, i0} ⊆ PR)

= 1 = 1

not compromise a user’s anonymity. As a maximal requirement, a de-

gree of unlinkability similar to the degree of unlinkability reached with

regard to profile elements should be reached. Predictions are consid-

ered to be unproblematic as long as they cannot be linked to a specific

user.

To summarize, a privacy-preserving IF architecture requires a degree of

privacy characterized by the following aspects:



computational privacy;



unlinkability of private user information and the respective user, and

of private user information elements among themselves;



regarding user privacy, a degree of unlinkability assuring probable in-

nocence with a sufficiently low probability threshold;



regarding provider and filter privacy, unobservability (with the excep-

tion of returned information);



(optionally) user privacy with regard to returned information.

2.3.4 Requirements

The functional requirements of a comprehensive Privacy-Preserving Informa-

tion Filtering system are the same as in any other comprehensive IF system,

and follow directly from the definitions given above:



The system should provide sufficient functionality for realizing the three

stages introduced in Section 2.2.1, namely the information collection

stage, the information processing stage and the information filtering

stage.



The system should be able to return all different kinds of result data

defined in Section 2.2.1, namely predictions of the relevance of specific

items, top-nrecommendations of items, predictions of the similarity of

specific users, and top-nsimilar users for a given user. In other words,

the system should be able to provide Recommender System function-

ality as well as Matchmaker System functionality. As non-distributed

Matchmaker Systems and non-distributed Hybrid IF Systems would be

difficult to realize in a privacy-preserving manner, we explicitly do not

require the system to be able to provide the respective functionality,

which is not strictly required anyway to cover all kinds of result data.



Regarding filtering techniques, the system should be able to support

feature-based approaches as well as collaborative approaches.

Additionally, we define several non-functional requirements of a Priva-

cy-Preserving Information Filtering system. As described in Section 2.2.3,

we identify the following main reasons for a lack of acceptance of existing

IF-based systems: Lack of privacy, lack of quality, required user effort, and

provider bias. The first two can be expressed directly as requirements. The

issue of user effort is partially covered by introducing the requirement of

broadness, i.e. by requiring a solution to be applicable in various domains

and in combination with a wide range of filtering techniques, because in

this case the user is able to re-use his personal profile and has to enter in-

formation only once for different providers. The aspect of provider bias is

addressed indirectly by the other requirements, and by acceptance aspects

discussed below. Finally, privacy-preserving IF systems must not disregard

performance issues: While some trade-offs with regard to performance are to

be expected and may be acceptable, the overall performance of the resulting

systems should be comparable to the performance of centralized systems,

because otherwise it would be infeasible to deploy the resulting systems in

real-world scenarios, or they would not be accepted by users because of us-

ability issues. Therefore, we also introduce the requirement of performance.

Security requirements such as the requirement of secure communication, or

protection against denial-of-service attacks, are not listed explicitly because

they are already covered, from the respective entity’s point of view, by the

requirements regarding privacy.

We define the requirements as follows:



User Privacy (Ru): No linkable information about user profiles should

be acquired permanently by any other entity or external party, includ-

ing other user entities, apart from observations of single profile elements

which can be linked to a specific user or to other observed profile ele-

ments with negligible probability. The adverb “permanently” is used

here to specify that private information related to a specific user may

be acquired by another entity temporarily as long as it is not prop-

agated or processed further and as long as it is removed completely,

resulting in a state identical to a hypothetical state in which it had

never been acquired. Result data, i.e. recommendations, predictions,

or similar users should be regarded as private in this sense as well, if

possible. If the returned information is not private, user profile infor-

mation that can be deduced directly from returned information may be

propagated as well. User anonymity per se is not required but may be

provided optionally. While these requirements are obviously somewhat

weaker than theoretically possible from the user’s point of view (the

optimal protection is reached by requiring unobservability of user pro-

file information), they constitute a realistic compromise because user

privacy can still be considered to be protected adequately while the

relaxed requirements enable feasible solutions in terms of complexity

and quality, and allow the provider to obtain some feedback about the

provided information.



Provider Privacy (Rp): No information about provider profiles, with the

exception of the returned information, should be acquired permanently

by other entities or external parties at all, i.e. provider information

remains unobservable. Additionally, the propagation of information is

entirely under the control of the provider. Thus, it is ensured that

the provider may prevent e.g. the automatic large-scale extraction of

information.



Filter Privacy (Rf): The algorithms used by the filter entity should

not be acquired permanently by any other entity or external party.

General information about the algorithm may be provided by the filter

in order to allow other entities to reach a decision on whether to apply

the respective filtering technique.



Quality (Rqq): The quality of the returned information should be close

to the level of quality achieved in traditional IF approaches (although

a small decrease in quality may be acceptable as a trade-off).



Broadness (Rbb): The architecture should not be restricted to single

information domains, specific filtering techniques, or specific persistent

storage mechanisms. It should be easily adaptable to various domains

and environments in order to facilitate fast and efficient development.



Performance (Rpp): The computational complexity and latency of the

realized solution should be close to the performance achieved in tradi-

tional IF approaches (although a small decrease in performance may

be acceptable as a trade-off).

In addition to non-functional requirements, we also introduce the follow-

ing acceptance aspects:



User Acceptance (Au): A system meeting all requirements listed above

may still not achieve a high degree of user acceptance, e.g. if users do

not trust the underlying technology, or if the system lacks usability.



Provider Acceptance (Ap): Furthermore, a system meeting all require-

ments listed above may not be commercially viable or valuable in any

other way for the providers of the system and the filtering techniques,

e.g. if no viable business model is found or if the costs of running the

system are too high.

These acceptance aspects have to be kept in mind when addressing the

requirements, because otherwise the resulting systems would only be of theo-

retical interest. Acceptance is generally achieved to a large degree by meeting

the respective privacy requirement. Provider acceptance of user-controlled

approaches is expected to be somewhat lower because these approaches may

not allow the provider to obtain any information about users at all.

2.4 Multi-Agent Systems

Multi-Agent System technology are one possible choice for realizing a dis-

tributed PPIF architecture. Section 4.2.3 discusses this choice in more de-

tail. In this section, we provide definitions of the main concepts of MAS

technology, describe the basic functionality required for our approach, and

discuss the problem of malicious hosts and possible solutions thereof.

2.4.1 Definitions

There is no universally accepted definition for the terms agent and Multi-

Agent System. The IEEE Computer Society standards organization Foun-

dation for Intelligent Physical Agents (FIPA) provides a set of specifications

covering most aspects of MAS architectures (see [46, 47, 49, 48] for the most

general specifications). Our approach is generally applicable to all architec-

tures complying with these specifications and the following basic definitions:



An agent is a software-based entity operating autonomously,reactively,

and pro-actively. It has the ability to interact with other agents via a

special Agent Communication Language (ACL). Autonomy here de-

notes the ability of an agents to carry out complex tasks independently,

i.e. without the intervention of other agents or humans, while having

control over its own actions and internal state, e.g. through running

on its own internal execution thread. Reactivity denotes the ability

of an agent to respond to changes detected in its environment. Pro-

activeness denotes the ability of an agent to act not only reactively,

but also on its own initiative, e.g. triggered by its internal state. This

definition closely follows [118]. An agent basically contains program

code that enables it to operate according to this definition, knowledge,

i.e. basically a set of data, and its internal state.



An agent service is a specific task an agent may carry out internally

or on behalf of another agent. In order to enable other agents to use

its services, an agent usually has to announce these services by provid-

ing an abstract description in a well-defined format, i.e. based on an

common ontology.



An agent platform is the runtime environment of a group of agents.

Agent platforms provide infrastructure functionality, such as yellow

pages services for agent service discovery, and white pages services

listing existing agents themselves. Additionally, agents on an agent

platform are usually protected against threats originating from other

agents on the same or any other platform, or from other external en-

tities. The life cycle of an agent begins with the agent being created

on a specific platform, and ends with the agent (or the entire platform

he is located on) being terminated. Additionally, mobile agents have

the ability to migrate from one platform to another, by transporting

their program code, knowledge and internal state. An agent platform

is provided by an host entity.



AMulti-Agent System is the entirety of agents and agent platforms

either deployed in the context of an application as a distributed problem

solving system, or deployed with different and possibly conflicting goals

in a more abstract common context as an open system [120].

2.4.2 Basic Functionality

Our approach requires a MAS with certain basic features and functional-

ity which are largely implicit in the definitions given above: All interactions

between agents are carried out via agent services. In order to exchange a mes-

sage between a sender agent and a receiver agent, an agent service is used

where the sender is the service user and the receiver is the service provider.

The agent service is realized by both participants following a specific pro-

tocol, i.e. a user protocol on the service user side and a provider protocol

on the service provider side. For basic interactions, the user protocol may

be as simple as sending initial data and receiving the result data, but more

complex user protocols may be required for complex interactions. Regarding

unobservability of interactions, it should be possible to hide the content of

an interaction from all possible observers, namely other agents, the agent

platform itself as well as external entities. The fact that an interaction takes

place may however be observable, e.g. by the agent platform. Agents should

be able to control the access to the services they offer, which is done e.g.

by using an Service Control List mechanism. An agent’s program code and

knowledge is considered private information that should not be accessible by

other parties without the agent’s consent. Regarding platform management

functionality, we assume a special agent realizing a PlatformManager-

Role that provides at least interactions for creating and terminating agents,

and for migrating mobile agents. All agents realize a generic AgentRole

which allows them to use services offered by agents realizing more specific

roles.

2.4.3 Malicious Hosts

Features of agents such as autonomy and the capability to contain private

information in the form of knowledge suggest the use of agents as personal

agents, i.e. agents acting on behalf of a human user (or, more generally,

abstract entity) and containing private information of the respective user or

entity. In this context, the security risks of agents operating on a platform

provided by a potentially untrustworthy host have to be addressed: While

the protection of hosts against agents, and of agents against other agents

in the same environment are relatively straightforward issues that may be

addressed adequately (see e.g. [99]), the problem of protecting agents against

malicious hosts is more complicated (see [111] for a survey). In addition to

threats not necessarily originating from the host, such as threats related to

communication of mobile agents, there are two main threats related to the

following aspects of an agent running on a remote platform:



Privacy: The host may attempt to obtain the program code and/or

knowledge of an agent without interfering with the execution of the

agent. In this case, the host is regarded as honest-but-curious with

respect to the general threat model.



Integrity: The host may attempt to tamper with the agent code and/or

data during the execution of the agent. In this case, the host is regarded

as malicious with respect to the general threat model.

In the context of personal agents, these threats directly affect the privacy

of the user or entity represented by the personal agent. Apart from relying

on trusted hosts, several approaches have been suggested to counter these

threats, which are discussed in Section 3.3.2.

2.5 Summary

This chapter provides definitions that are used throughout this work. Based

on these definition, the main problems of IF systems in general and the prob-

lem of privacy in IF systems in particular are described, and requirements

for a Privacy-Preserving Information Filtering architecture are derived from

this problem description.

Regarding definitions, this work focuses on informational privacy as one

of several aspects of privacy (Section 2.1.1), and focuses on privacy protec-

tion via technology as one of several strategies for protecting privacy (Sec-

tion 2.1.2). It deals with IF systems in the form of Recommender Systems,

Matchmaker Systems, and Hybrid IF Systems, which provide functionality

for three separate stages, namely the information collection stage, the infor-

mation processing stage, and the information filtering stage. In the latter

two stages, filtering techniques are used that are group into feature-based

approaches and collaboration-based approaches (Section 2.2.1).

As part of the problem description, we describe different existing types

of IF architectures (Section 2.2.2), and list the main problems of IF sys-

tem (Section 2.2.3), out of which we highlight privacy as the main problem

(Section 2.3.1).

We introduce the concept of multilateral privacy (Section 2.3.2), which

we specify further in the context of IF systems by defining realistic adver-

sary models (Section 2.3.3.1), by describing various threats to privacy (Sec-

tion 2.3.3.2), and by defining a degree of privacy that reflects the interests of

the involved entities adequately and at the same time is realistically achiev-

able (Section 2.3.3.3). Subsequently, we list all requirements of a Privacy-

Preserving Information Filtering architecture (Section 2.3.4).

As our solution is based on Multi-Agent System technology, we define the

main concepts of MAS technology (Section 2.4.1); we list basic functionality

required for our approach (Section 2.4.2); and we describe the problem of

malicious hosts in MAS systems as the central problem related to privacy in

a MAS context (Section 2.4.3).

Table 2.4: An overview of existing IF architectures in relation

to the requirements and acceptance aspects of Privacy-Preserving

Information Filtering. A requirement is fully met (indicated by

“X”), partially met (indicated by “



”), or not met at all (indicated

by “–”). Acceptance is indicated in an analogous manner. Note

that the ratings do not always indicate the best value theoretically

possible for the respective architecture, but an average value.

Privacy Other Accep-

Requirements Requirements tance

RuRpRfRqq Rbb Rpp AuAp

provider-controlled IF –X–



X X –X

privacy-enhanced IF



X–

   

user-controlled IF



X–

 

–

 

(collaboration-based)

Table 2.4 summarizes the problems of existing IF architectures by listing

the existing approaches for provider-controlled, privacy-enhanced and user-

controlled IF in relation to the coverage of the requirements and acceptance

aspects. In the following chapter, work related to the area of Privacy-Pre-

serving Information Filtering is discussed based on the definitions provided

in this chapter, and existing approaches are evaluated in the same manner.

Chapter 3

Related Work

In this chapter, we review the state of the art in privacy-preserving Infor-

mation Filtering, including related approaches and building blocks thereof.

Work related to specific areas of our approach that is not directly relevant

for the overall approach, such as specific algorithms for filtering techniques,

is discussed in the respective chapters.

The chapter is structured as follows: The first section discusses Priva-

cy-Enhancing Technologies (PETs) as the fundamental building blocks of

privacy-preserving architectures, including basic concepts and functionality

used by these PETs. Section 3.2 discusses Privacy-Preserving Technologies

(PPTs) as privacy-preserving approaches for related areas, such as Private

Information Retrieval and privacy-preserving data mining, as well as privacy-

preserving Information Filtering architectures, i.e. approaches comparable to

our architecture for Privacy-Preserving Information Filtering. Each section

contains evaluation of the respective areas in the context of PPIF. Figure 3.1

gives an overview of the different groups of related work and their relation-

ships, i.e. the ways they build upon each other. These two sections comprise

the state of the art of privacy-preserving Information Filtering.

Section 3.3 discusses related work in the area of Multi-Agent System

(MAS) technology that is directly relevant for our solution, i.e. related work

dealing with anonymous communication and the problem of malicious hosts.

Section 3.4 summarizes the chapter by giving an overview of the applicability

of the different approaches with regard to Privacy-Preserving Information

Filtering.

Figure 3.1: Areas of research in PETs and PPTs. As indicated

by the lines between areas, PPTs build upon the functionality

provided by PETs, which in turn use basic concepts and function-

ality.

3.1 Privacy-Enhancing Technologies

Privacy-Enhancing Technologies have been researched under this designation

for more than a decade. It should be noted that the designation is not used

in all related work: There is a large amount of related work e.g. in the areas

of anonymous communication and secure multi-party computing that is not

explicitly labeled as PETs. Surveys of the field are given e.g. by [55] and

[54].

In the following, we distinguish between PETs and PPTs by defining Pri-

vacy-Enhancing Technologies as basic building blocks which, while they may

be used directly for a specific purpose, such as anonymous communication,

have to be combined with other functionality in order to realize Privacy-Pre-

serving Technologies used within applications for broader purposes, such as

PPIF. In other words, in a complex scenario a single Privacy-Enhancing

Technology (PET) may be used to enhance privacy, as the name implies, but

is not sufficient for preserving privacy under all circumstances.

In addition to work on the theoretical foundations, research in the area

currently focuses on four main areas:



Anonymous communication over the internet and in peer-to-peer net-

works, including work on traffic analysis and related issues, such as

anonymous web browsing, is discussed in Section 3.1.1.



Protocols for Secure Multi-Party Computation are discussed in Sec-

tion 3.1.2.



Trusted Computing mechanisms and applications are discussed in Sec-

tion 3.1.3.



Other provider-side technologies for privacy enforcement, such as en-

terprise privacy policies and DRM-related approaches, are discussed in

Section 3.1.4.

Research on privacy-enhanced applications, e.g. in the areas of face recog-

nition, public transport, ubiquitous computing, and e-learning is not directly

relevant in the context of Privacy-Preserving Information Filtering and there-

fore not discussed further here.

3.1.1 Anonymous Communication

With regard to user acceptance and prevalence, PETs for anonymous com-

munication are arguably the most successful group of PETs to date. They

are usually realized in the form of tools for anonymous e-mail and brows-

ing, but may address related issues, such as anonymous communication in

peer-to-peer networks as well.

The following terminology is adapted from [91]1: A sender sends messages

to a single recipient or a group of recipients via a communication network.

An attacker is interested in obtaining information about these messages, e.g.

about communication patterns, and may attempt to manipulate the commu-

nication. The content of the messages themselves is usually not considered in

this context, because it is expected to be encrypted and therefore protected

against attacks. In this aspect, the concepts discussed in the following have

to be distinguished from the concepts introduced in Section 2.3.3.3, where we

focus on the actual information, i.e. the content of the messages. However,

as indicated by the use of similar terminology, such as “unobservability” and

“unlinkability”, in both contexts, the respective concepts are largely analo-

gous. In the context of anonymous communication, we distinguish between

the following concepts:



Anonymity is a feature of an entity (a sender or recipient) who is not

identifiable within a set of entities of the same type, the anonymity set.



Unlinkability is a feature of a number of entities or messages who appear

no more related after an attacker’s observation than they are based on

his prior knowledge. Anonymity may therefore be defined in terms of

unlinkability:

1An extended version of this work is available online as version v0.31 via the URL

<http://dud.inf.tu-dresden.de/literatur/Anon%5FTerminology%5Fv0.31.pdf>.

–Sender Anonymity is established in systems where, out of the sets

of all senders and messages, there exists no sender and message

that can be linked.

–Recipient Anonymity is defined in an analogous manner for recip-

ients and messages.

–Relationship Anonymity is defined in an analogous manner for

senders and recipients. It is a weaker notion of anonymity be-

cause it is implied by sender anonymity as well as by recipient

anonymity.



Unobservability is a feature of entities who cannot be distinguished

from any other entity of the same type. Analogous to the definitions

above, Sender Unobservability,Recipient Unobservability, and Rela-

tionship Unobservability are defined. It should be noted that unobserv-

ability always implies anonymity. Sometimes the term Untraceability

is used synonymously.



Pseudonymity is a feature of entities who use one or more pseudonyms

for identification.

In the following, we review the main groups of solutions, which mainly

focus on achieving sender and/or relationship anonymity. They may be ex-

tended by various methods (such as dummy traffic, or the use of steganog-

raphy) to achieve a certain degree of unobservability. Exemplary approaches

for recipient anonymity are broadcast mechanisms and Private Information

Retrieval schemes, for which see Section 3.2.3.

3.1.1.1 Mix networks

The most basic approach for anonymity through unlinkability is the use of

relays, also known as proxies, for all communication in order to hide the

actual sender and/or recipients. Simple proxies, however, are vulnerable

to traffic analysis threats. Mix networks (first suggested by [28]) address

this problem by providing a number of proxies as mixes. Each messages

is routed through various mixes, and each mix withholds messages until a

certain amount of messages have been collected, and then re-encrypts and

propagates messages in permutated order.

Subsequent Research on mix networks has generally segmented into two

groups of approaches: High-latency approaches, such as the Mixminion ap-

proach [36], aim to minimize breaches of anonymity by introducing large and

variable latencies, and are therefore less suitable for immediate communi-

cation, such as web browsing. On the other hand, low-latency approaches,

such as onion routing [57] and its more recent modifications [39], aim at

anonymizing network traffic itself.

While mix networks have been studied extensively (see e.g. [39] for an

overview), they have been observed [23] to largely rely on insufficient crypto-

graphic constructions, which has led to a large number of schemes that have

been broken and modified, sometimes repeatedly. Nevertheless, existing ap-

plications for anonymous communication are mainly based on mix networks.

3.1.1.2 DC-Nets

While mix networks may be used to provide computational privacy (as de-

fined in Section 2.3.3.3), DC-Nets are an alternate approach providing in-

formation-theoretic privacy as well as sender unobservability. DC-Nets have

been introduced in [27]. The name is derived from the so-called dining cryp-

tographers problem, which illustrates anonymous communication via a simple

scenario: Three or more entities intend to receive a message (in the most ba-

sic case with a length of one bit) sent by one of the entities who intends to

remain anonymous. The entities arrange themselves as a ring, i.e. a circular

linked list, and each entity shares a secret bit with each of his two neigh-

bors. Starting with a designated entity and a result bit set to zero, each

entity in turn combines the two secret bits obtained from its neighbors, the

message itself (in case the entity is the sender), and the result bit via the

XOR-operation. Once this operation has been carried out by all entities, the

result bit represents the message, without any information about the identity

of the sender having been revealed.

In more general terms, the protocol realizes a superimposed sending of

a message on a ring network, with each message bit requiring one round

of communication around the ring for superimposing the message, and one

round for broadcasting the result. Obviously, all entities are required to act

in a non-malicious way, as a malicious entity may easily alter the message or

disrupt the protocol. The main drawback of DC-Nets is the communication

complexity which makes their use infeasible for real-world multi-user systems.

3.1.2 Secure Multi-Party Computation

Secure Multi-Party Computation (SMPC) protocols are the main building

block for interactions involving two or more entities who intend to exchange

some information while keeping related information private. The concept has

been introduced in [119] as secure two-party computation and later general-

ized in e.g. [56]. An SMPC protocol involves nentities with inputs I1. . . In

and a function f= (f1, .., fn) for computing outputs O1. . . On, based on

these inputs, i.e. a function denoted by

(I1, .., In)7→ (O1=f1(I1, .., In), .., On=fn(I1, .., In)).

Privacy is preserved by the protocol if the information an entity obtains

during the protocol does not exceed the information the respective entity

could derive from its own input and from the output of the protocol.

In this definition, all information is regarded as sensitive with regard to

privacy. If additional information can be obtained, the respective protocol is

therefore not privacy-preserving in a strict sense. It may still be sufficient,

however, if the disclosed information does not actually violate the privacy

of a participating entity. While the respective protocol may be optimized in

terms of complexity by allowing additional information to be disclosed, it is

generally more difficult to prove its sufficiency with regard to privacy.

Generic SMPC protocols model the function to be computed as a com-

binatorial circuit, and carry out comparatively small sub-protocols for every

gate of the circuit. While theoretically applicable to a large class of func-

tions, generic protocols are often practically infeasible because of large input

sizes and the fact that complex functions have to be modeled as complex cir-

cuits, leading to inefficient protocols. Nevertheless, applications for realizing

generic Secure Multi-Party Computation protocols have been introduced [81]

and may be used for simple functions.

The following examples for simple functions are given in [34] in the context

of Privacy-Preserving Data Mining, which is discussed in Section 3.2.2:



Secure Sum: For values a1. . . andistributed between multiple partici-

pants, the sum of the values is basically computed in the following way

if it is to be known to lie in the range [0..n]: A designated first par-

ticipant adds a random number to his local value and propagates the

result, among all participants, with each participant adding his own

local value. In each step, nis subtracted from the result if it is greater

than n. Finally, the first participant subtracts the random value and

thus obtains the actual result.



Secure Set Union: For sets A1. . . Andistributed between multiple par-

ticipants, the union A1∪.. ∪Anis computed by each participant en-

crypting, via a commutative encryption scheme2, all items of his own

set, and the encrypted items of all other sets, received from the other

participants. At the end of the process, each item has been encrypted

2In commutative encryption schemes, the results is not affected by the order in which

encryption or decryption function are applied, i.e. {{m}K1}K2={{m}K2}K1.

ntimes, and because of the commutativity of encryption equal result

values imply equal items. Thus, duplicates may be removed, and the

resulting union set of encrypted items is decrypted by each participant

in turn, resulting in the union set of decrypted items.



Secure Size of Set Intersection: For sets A1. . . Andistributed between

multiple participants, the size of the intersection set A1∩..∩Anis com-

puted by each participant encrypting, via a commutative encryption

scheme, all items of his own set, and the encrypted items of all other

sets, received from the other participants. At the end of the process,

each item has been encrypted ntimes, and because of the commuta-

tivity of encryption equal result values imply equal items. Therefore,

the size of the intersection set may be determined by each participant

simply by counting the number of encrypted items appearing in each

of the sets.



Scalar Product: Apart from general approaches based on secure multi-

party computation, which are not efficient, the problem of determining

the scalar product of vectors in a privacy-preserving way has only been

addressed for two parties, with various complex solutions e.g. based on

the use of random vectors and matrices.

Private Information Retrieval schemes, described in Section 3.2.3, may

also be based on SMPC protocols, because they basically realize the function

(i, {x1, .., xn})7→ (xi, λ),

i.e. the first entity retrieves xifor a given i, while the second entity learns

nothing at all.

3.1.3 Trusted Computing

Trusted computing aims at realizing trusted systems by increasing the secu-

rity of open systems (i.e. systems which are accessed by various groups of

entities, where entities of one group do not necessarily trust entities of other

groups) to a level comparable with the level of security that is possible in

closed systems (i.e. systems which are accessed by a single group of entities

that trust each other). It is based on a combination of tamper-proof hard-

ware and various software components. A trusted computing architecture

has to address two aspects: Integrity of the system, which is achieved by

ensuring the system cannot be tampered with in any way, and authenticity

of the system, which is achieved by ensuring that a remote party can be

convinced of the integrity of the system. The aspects are realized by three

main mechanisms:



Secure Bootstrapping ensures the system is initialized resulting in a

state that adheres to a given security policy, e.g. by booting into a

trusted operating system.



Strong Isolation prevents the initialized system from being tampered

with, and prevents applications from tampering each other.



Remote Attestation certifies the integrity of software run on the system

to a remote party.

A secure bootstrapping mechanism (described e.g. in [7]) basically enables

a system to measure its own integrity during the boot process, and terminate

the process if the integrity is compromised, via a chain of identity checks: A

tamper-proof hardware device, the trusted module, starts the boot process by

recomputing the hash of the BIOS, and compares this value with a hash of the

BIOS signed with the private key of the trusted module and stored within it.

The private key itself is embedded in the trusted module, and the respective

public-private key-pair is certified by a Certificate Authority (CA)3. If the

values match, the BIOS has not been tampered with, and control is passed

to it. The process is continued by the BIOS and the next layer in the chain

in a similar manner, and by subsequent layers up to the operating system.

Thus, each layer certifies the next layer in the chain, by signing a hash

of its executable image, and its public key. When the process is completed

successfully, the system is guaranteed to have booted into a trusted operating

system.

Secure bootstrapping alone does not allow a remote party to verify the

integrity of the boot process. This is addressed either by extending the pro-

cess to authenticated boot, which we do not describe here, or through remote

attestation. In other words, secure bootstrapping restricts the software that

may actually run on a system, while remote attestation reports which soft-

ware runs on a system. It is up to the remote party to decide how to deal

with the information given.

The basic mechanism of remote attestation, however, is similar to the

secure bootstrapping mechanism described above: An application is attested

by the operating system signing a hash of the executable of the application.

3It should be noted that for this reason, trusted computing requires a trusted third

party essentially certifying that the trusted module works as specified. This holds even

when a central CA is replaced by a more flexible mechanism enabling direct anonymous

attestation [16].

This certificate is sent to the remote party, along with all other certificates of

the chain starting at the trusted module and ending at the operating system.

The remote party verifies each certificate (thus verifying the integrity of the

boot process as well), and checks the corresponding hashes against a list of

approved soft- and hardware. Remote attestation should result in a secret

shared between the application and the remote party, otherwise it cannot

be ensured that the attested application is actually executed. We discuss

problems and their suggested solutions related to remote attestation in the

following section.

Finally, strict isolation is handled by the trusted operating system. Vir-

tual machine monitors may be used in order to abstract from the actual

hardware, and leverage strict isolation by running applications in different

virtual machine monitors [51].

3.1.3.1 Semantic Remote Attestation

According to [60], the basic remote attestation process has the following

problems:



Program behavior is not attested: The only information provided is

that a certain executable is running. It is up to the remote party to

determine whether this executable actually runs as specified, or it has

to be trusted. Both alternatives are obviously problematic especially

in cases where sensitive information is involved. Even if the executable

is actually intended to act non-maliciously, it may fail to do so because

of bugs or design flaws. In any case, it is impossible for the average

human user to analyze an executable.



Inflexibility: Remote attestation is carried out once, before the exe-

cutable starts to run. Therefore, information about its runtime state

or input data cannot be provided.



Management issues: Updates or patches of executables result in dif-

ferent hashes. The remote party has to update its list of approved

executables accordingly, again with the problem that program behav-

ior is not attested. The problem is exacerbated by the fact that in many

cases multiple patches and updates exist and are applied in various or-

der, resulting in a large number of different executables. Additionally,

the number of executables that have to be approved is increased by the

fact that platform-specific binaries are attested, i.e. the same software

on different operating systems results in different executables.



Revocation: A problem inherited from public-key cryptography is the

revocation of certificates, which cannot be addressed easily in an effi-

cient way. To make sure a certificate is valid, Certificate Revocation

Lists would have to be checked for every attestation process.

In semantic remote attestation [60], some of these problems are addressed

by using language-based techniques in combination with a virtual machine

approach, with the goal of attesting program behavior rather than particular

executables. Instead of the trusted operating system attesting the executable

of an application, the virtual machine, which is capable of executing platform-

independent code, attests various properties of an application running within

it. This is possible because in order to be executable within a virtual machine,

the respective code contains high-level information which may be used for

attestation.

3.1.3.2 Applications

Trusted computing is most often discussed in relation with Digital Rights

Management (DRM), i.e. as a mechanism for realizing provider privacy by

limiting users’ access to information. There is, however, a large number of

other potential applications, including mechanisms for realizing user privacy:

Some example applications, including peer-to-peer networks, distributed fire-

walls, and distributed computing in general, are listed in [51]. Other obvi-

ous potential applications are MAS architectures supporting mobile agents,

anonymous remailers, PPIF, and other PPTs.

In trusted computing, tamper-proof hardware is used only for the boot-

strapping process. Related approaches use a secure coprocessor as tamper-

proof hardware for additional tasks (see e.g. the Private Information Retrieval

(PIR) schemes discussed in Section 3.2.3). This course of action focuses more

on prevention of tampering, while trusted computing focuses on detection of

tampering. Remote attestation, however, is often not addressed explicitly by

these approaches, and they require customized tamper-proof hardware.

3.1.4 Privacy Enforcement

In this section, various approaches for privacy enforcement are discussed.

They are based on the assumption that a given provider of information ser-

vices intends to actually protect the privacy of his customers, i.e. the users,

and thus does not act in a malicious manner. A basic requirement for pri-

vacy enforcement is the following: Privacy policies of providers as well as

privacy preferences of users have to be expressed in a structured form, i.e. in

a form that allows the user preferences to be processed and applied to the

user profile data automatically, according to the respective privacy policy.

The Platform for Privacy Preferences Project specification [90] meets this

requirement by defining a syntax and semantics for privacy policies, as well

as mechanisms for associating privacy policies with web resources. It does

not offer support for actual provider-side enforcement of privacy policies.

A basic kind of enforcement make take place at the user side, through

programs which for example fill out web-based forms according to the user’s

preferences and the respective web site’s privacy policy. User-side enforce-

ment can only be used to restrict the information a provider receives, but

not to control further dissemination of this information. The latter aspect is

addressed by provider-side enforcement, which aims at protecting sensitive

data by various means, listed as follows:



Hippocratic Databases: In analogy to the hippocratic oath, which pro-

tects the privacy of patients, a mechanism for protecting user data

in databases is described in [2]. Basically, database records are ex-

tended by adding privacy metadata, derived from the information a

user has specified in a Platform for Privacy Preferences Project (P3P)

profile and the provider’s privacy policy. Based on the privacy meta-

data, sophisticated access control mechanism are implemented. Ad-

ditional tools are utilized to detect unusual and potentially privacy-

critical queries, and to record an audit trail for each query.



Enterprise P3P: A similar approach is described in [69], resulting in a

Platform for Enterprise Privacy Practices in which every data object

is enhanced by metadata information specifying the respective access

policy. Thus, data may be handled according to a privacy policy and a

user’s preferences at all stages of an enterprise process, not only in the

context of a database.



RAIC-based dynamic adaption: A component-based approach is de-

scribed in [72] resulting in an architecture based on a Redundant Array

of Independent Components where a simple component provides func-

tionality adapted to specific privacy constraints, e.g. a specific filtering

technique in an IF context. Suitable components are selected for each

process based on the current privacy constraints, such as the privacy

preferences of the current user.

With regard to the privacy threats listed in Section 2.3.3.2, privacy en-

forcement primarily deals with the threat of accidental disclosure.

3.1.5 Evaluation

While Privacy-Enhancing Technologies are useful building blocks, each group

of PETs described in this section is, in itself, not sufficient for realizing a

privacy-preserving IF architecture in a feasible way, because an IF system

based on a single group of PETs does not meet all requirements listed in

Section 2.3.4:



Anonymous communication between user and provider may not prevent

the provider from identifying the user via the user’s profile data, and

it may allow the provider to obtain private information by aggregating

profile data. The other non-functional requirements would not be af-

fected decisively by the use of anonymous communication. Because of

its prevalence as a PET, it is generally accepted by users. Providers,

however, may not accept it as readily because they usually prefer to be

able to identify the user they are interacting with.



Secure multi-party computation protocols are infeasible for large quan-

tities of data and complex functions, such as filtering technique algo-

rithms. Additionally, they cannot be used if the function to be com-

puted is to be kept private as well, which is required for filter privacy.

These PETs is not likely to affect acceptance aspects decisively.



Trusted computing by itself is not sufficient for realizing a generic

privacy-preserving IF architecture, mainly because of practical issues

related to remote attestation of an application meeting the requirement

of broadness (these issues are discussed in detail in Section 10.2), and

because of a lack of user acceptance with regard to trusted computing

in general.



Privacy enforcement mechanisms at the provider side do not protect

sensitive user data against malicious providers. These PETs is not

likely to affect the other requirements and acceptance aspects decisively.

Table 3.2 summarizes the evaluation of all PETs in the context of all

work related to PPIF. To recapitulate, functionality from different groups

of PETs has to be combined with additional functionality in order to realize

the goal of Privacy-Preserving Information Filtering.

3.2 Privacy-Preserving Technologies

In this section, we discuss Privacy-Preserving Technologies as work that is

related to our approach. While these PPTs are not directly applicable for

PPIF, mainly because they are intended for different purposes, they are nev-

ertheless relevant because the problems and concepts introduced are similar

to those of PPIF. Furthermore, we discuss related work in privacy-preserving

IF architectures.

3.2.1 Peer-Oriented Approaches

Peer-oriented approaches are protocols based on Secure Multi-Party Com-

putation that allow a group of similar entities, i.e. entities characterized

by having equivalent goals and uniform private data structures, to accom-

plish various tasks related to Information Retrieval and Information Filtering.

Hence, these approaches are not applicable in scenarios containing entities

with different goals (such as user and provider). In the following, we discuss

two exemplary peer-oriented approaches. For other approaches, see e.g. [22].

3.2.1.1 Privacy-Preserving Indexing

In this scenario, described e.g. in [13], a document collection is shared among

multiple entities, e.g. in a peer-to-peer file-sharing setting, and a global index

is to be created in order to enable documents to be retrieved in a more

efficient way than by asking each entity whether it may be able to provide

the document in question. However, the participating entities do not wish

their entire partial collections to be known globally, and they intend to be

able to decide whether to provide a document, based on the entity requesting

the document. The protocol suggested in [13] basically introduces a number

of false positives for each document equal to the number of true positives

(the entities actually sharing the respective document). The global index

contains all false and true positives, and thus the probability that an entity

listed in the index as sharing a specific document actually does so is 0.5. In

terms of unlinkability (see Section 2.3.3.3), this solution therefore provides

an unlinkability degree close to “probable innocence”. However, the privacy-

preserving construction of the index turns out to be problematic in the case

of colluding malicious entities.

3.2.1.2 Privacy-Preserving Clustering

Another peer-oriented approach closely related to Information Filtering is

privacy-preserving clustering, in which two or more entities aim at construct-

ing a global model of clusters of private data. Apart from relying on a trusted

third party, there are two basic approaches:



Data Perturbation: Before the clustering algorithm is applied, noise is

added to the underlying data.



Secure Multi-Party Computation: The clustering algorithm is based on

a SMPC protocol.

Perturbation-based approaches are described e.g. in [84]. In [66], an SMPC-

based approach for privacy-preserving clustering is described which is based

on the k-means clustering algorithm for two parties under the assumption of

a honest-but-curious adversary, which utilizes a privacy-preserving protocol

for computing cluster means. Two approaches for the protocol itself are

described: A protocol for oblivious polynomial evaluation [86], and a protocol

for homomorphic encryption [14].

3.2.2 Privacy-Preserving Data Mining

Data mining, also known as Knowledge Discovery in Databases (KDD), deals

with the acquisition of non-obvious, potentially useful information, via an

automatic extraction of previously unknown patterns from large amounts of

data usually stored within databases. This is generally achieved by building

models aggregating the raw input data. The output generated by a data min-

ing process is a classifier or rule describing the previously unknown pattern.

In addition to applying standard machine-learning techniques to generate the

output, algorithms based on association rule mining are often used (see [62]

for a survey), producing association rules, which describe relations between

sets of items expressing statements of the form “users who bought item x

and yalso bought itemz”.

Privacy-Preserving Data Mining aims at preserving the privacy of users

providing the input data, basically via two distinct approaches:



Partial privacy: The model on which data mining processes are to

be applied may contain sensitive information, but this information is

hidden in the output.



Complete privacy: Sensitive information is hidden even in the model

itself, though it may be contained within the input data.

Based on the classification given in [112], we distinguish between the

following three classes of approaches for preserving privacy:



Heuristic-based approaches are utilized to obtain partial privacy, i.e.

they are applied when output is generated from a model.



Cryptography-based approaches are utilized to obtain complete privacy.

They are applied to create a model by secure multi-party computation,

based on input that has not to be revealed itself.



Reconstruction-based approaches are utilized to obtain complete pri-

vacy. They allow the perturbation of the input data, resulting in a

model containing no sensitive information. Because this model is based

on perturbed data, the actual model has to be reconstructed from it.

3.2.2.1 Heuristic-Based Approaches

Heuristic-based approaches aim at preventing certain patterns in the respec-

tive data set, such as rules based on sensitive information. In the case of

association rule mining, the problem is formalized as follows: Given a set of

rules Rand a subset Rh⊂R, the underlying data set Dhas to be trans-

formed to a data set D0in a way that allows the mining of rules R, but

not of Rh. In other words, heuristic based approaches aim at limiting pos-

sibilities for data mining. For this reason, and because they only achieve

partial privacy, they are not relevant as related work with regard to PPIF,

and therefore not discussed further here.

3.2.2.2 Cryptography-Based Approaches

Cryptography-based approaches based on secure multi-party computation

have first been suggested in [78]. Generic solutions are not applicable for

privacy-preserving data mining in a feasible manner, because large data sets

and complex algorithms are involved. Therefore, specific protocols for secure

multi-party computation have to be designed.

In [78], a secure two-party computation protocol for the a decision tree

learning algorithm is described realizing the function

(D1, D2)7→ (ID3(D1∪D2), ID3(D1∪D2)),

i.e. given two participants with separate databases D1and D2, each partici-

pant obtains a decision tree based on the joint database. The communication

complexity of this protocol is shown to be reasonably close to the commu-

nication complexity of a non-private protocol for distributed computation of

the decision tree.

In [34], basic operations are described as a toolkit for realizing various effi-

cient data mining algorithms, as listed in Section 3.1.2. Using a combination

these basic operations, algorithms for association rule mining in horizontally

partitioned data (transactions are distributed between participants) [68] and

for association rule mining in vertically partitioned data (each transaction

is distributed between participants) [110] are described. While the result-

ing data mining techniques are not secure multi-party computations in the

strict sense, because intermediate information is disclosed in addition to the

output, they are shown to still preserve privacy to a large extent, by en-

suring controlled disclosure of information. In other words, the information

disclosed in intermediate stages of the algorithm may be regarded as being

not sensitive, e.g. because it may give only a very general idea about the

underlying data.

3.2.2.3 Reconstruction-Based Approaches

The original privacy-preserving data mining approach described in [3] is a

reconstruction-based approach using a value distortion mechanism in which

input data attribute values are perturbed by adding a random value (based on

a uniform or gaussian distribution). In the reconstructed model, the original

data distribution is reconstructed with sufficient accuracy, i.e. the decision

tree classifiers generated as output based on the reconstructed model are

similarly accurate to classifiers generated based on a model created from

non-perturbed data.

A related approach [44] deals with creating association rules based on

a reconstructed model. Recommender systems are explicitly stated as a

possible application. The association rules are created based on randomized

transactions, i.e. transactions in which each original items is replaced with

a certain probability. Again, the resulting association rules are sufficiently

accurate to be used instead of rules based on non-perturbed transactions.

3.2.3 Private Information Retrieval

Private Information Retrieval deals with retrieving information via a query

mechanism, usually from a database, without revealing information about the

query and the retrieved results itself to the entity providing the information.

It has been introduced in [30]. Formally, the problem is expressed as follows:

Given a database Das a binary string xof length n, i.e. D=x1· · · xn, a

user has an index iand wishes to obtain xi, without the entity hosting the

database obtaining ior xi. In a more general definition, the database con-

tains more complex database records with a fixed maximal length. Obviously,

retrieving the entire database constitutes a trivial solution to this problem.

This approach has communication complexity of O(n) and is therefore in-

feasible for large databases. Additionally, it may violate the privacy of the

information provider. In the following, we discuss various classes of schemes

for PIR. Table 3.1 gives an overview of the discussed approaches.

Table 3.1: An overview of suggested Private Information

Retrieval schemes. Each solution is shown with its communication

complexity.

information-

theoretic privacy

computational

privacy

k-server schemes [30] [29]

O(n1/3)O(nε)

(O(n) computational [6]

complexity) O(n1/(2k−1)

single-server schemes trivial solution [75]

O(n)O(nεlog n)

(O(n) computational [21]

complexity) O(logdn)

single-server n/a [104]

hardware-based schemes optimal

(up to O(1) on-line [8]

computational complexity) optimal

The first approaches to PIR are based on replicating the database by

distributing copies of the original database on different servers:



In [30], a two-server PIR is described that has a communication com-

plexity of O(n1/3).



In [6], this solution is generalized to a k-server PIR scheme with com-

munication complexity O(n1/(2k−1).



In [29], a two-server solution with an improved communication com-

plexity O(nε) for any ε > 0 is described. The improved complexity is

reached by replacing the goal of information-theoretic privacy with the

goal of computational privacy, as defined in Section 2.3.3.3.

In all multi-server approaches, the respective providers are assumed not

to collude, acting in a non-honest manner. This requirement turns out to

be rather unrealistic, mainly because it is difficult to enforce, even more so

as the respective entities have to communicate and thus know each other

in order to keep the databases consistent. Therefore, later approaches are

based on single-server schemes and achieve computational privacy rather

than information-theoretic privacy (which in single-server PIR schemes can

only be reached via the trivial approach):



In [75], the first single-server PIR scheme is described, relying on an

intractability assumption. The communication complexity is O(nεlogn)

for any ε > 0. Basically, the approach is based on encrypting the query

and applying the encrypted query to the database in a way that the

provider is not able to recognize either the query or the result.



In [21], a single-server PIR scheme with polylogarithmic communication

complexity O(logdn) with d > 1 is described, based on a different

intractability assumption. This communication complexity is close to

optimal, because retrieving a bit from a binary string in the non-private

case has a communication complexity of O(logn).

Single-server solutions, however, require a computational complexity of

O(n), because the server has to access all data in every single PIR process,

because otherwise information could be obtained via observing which data is

actually accessed. For this reason, the single-server solutions described above,

while theoretically viable, are considered impractical with regard to real-

world large-scale databases. Therefore, most recent approaches are hardware-

based, i.e. they require tamper-proof hardware:



In [104], a secure coprocessor is used as a tamper-proof device for stor-

ing and propagating the relevant database record, based on an en-

crypted query received from the user. As in earlier approaches, all

records are read from the database, and therefore the computational

complexity is still in O(n).



In [8], the previous approach is optimized by storing all original data-

base records in encrypted and permutated form via the secure coproces-

sor. Thus, no longer all records have to read from the database, because

observing the retrieval of an encrypted records does not give any infor-

mation about the original record4. After a constant number of queries,

the secure coprocessor switches to a different set of encrypted and per-

mutated records. Because the preprocessing of encrypted databases in

done off-line, i.e. independent of actual queries, on-line computa

Finally, there are the following approaches closely related to PIR:

4If only one record would be read for every query, however, subsequent queries would

be revealed to be different if different encrypted records were read. Therefore, for the k-th

query, all k−1 records previously accessed have to be read again, and a random record

has to be read for duplicate queries.



Private Information Storage: The problem of Private Information Stor-

age, i.e. the unobservable storage of data in a remote database, is similar

to the problem of PIR, and analogous solutions have been suggested

e.g. for multi-server schemes and hardware-based schemes. We omit a

detailed discussion of these approaches here, because they are not di-

rectly relevant for the field of Privacy-Preserving Information Filtering.



Symmetrically-Private Information Retrieval: If the privacy of the da-

tabase provider has to protected in addition to the privacy of the user,

a Symmetrically-Private Information Retrieval (SPIR) scheme is re-

quired. The privacy of the database is protected if a user does not

receive any information in addition to the result of the query. Starting

with [53], there are a number of approaches addressing this problem

explicitly. In hardware-based PIR schemes, it is usually addressed im-

plicitly, because only the query result is returned to the user anyway.

3.2.4 Privacy-Preserving IF Architectures

Related work in privacy-preserving IF architectures focuses on distributed

architectures, i.e. on collaboration-based approaches. The main problem in

distributed IF architectures is how to determine for a given user either the

most similar users themselves or at least potential candidates for recommen-

dations. In [85], the following five approaches are suggested:



Random Discovery: A file-sharing protocol or similar mechanism is

used for finding and contacting other users. This approach is not very

efficient because a discovered user will only be similar coincidentally

and therefore a large number of other users has to be contacted.



Transitive Traversal: The Random Discovery approach is improved by

the the following approach: A given user asks another user that is

known to be similar for a list of further similar users, and contacts

these, because they are expected to be similar to the given user as well.

In this approach, a new user has still to contact other users randomly.



Centralized Model Generation: A trusted central server stores all ob-

jects themselves, and all user ratings of these objects, in a single model.

This approach is obviously critical with regard to privacy, because sen-

sitive information is stored centrally and trust is required to a large

extent unless the sensitive information is protected via other mecha-

nisms.



Distributed Model Generation: Using an approach similar to a secure

voting scheme, such as a secure blackboard, a global model may be

created in a privacy-preserving manner. Apart from the complexity,

the main drawback of this approach is the fact that it cannot be used to

determine similar users in addition to the recommendations themselves,

because global models abstract from user-user relationships.



Distributed Model Storage: In peer-to-peer networks with a determin-

istic overlay routing system, the network itself may be used as a dis-

tributed storage system for user-object relationships. This approach is

problematic with regard to privacy as well, because each user has to

make a list of rated objects available for other users.

A number of solutions have been suggested along these main approaches:



Competitive Recommender Systems: In [9] a distributed algorithm

based on the competitive recommender systems approach [41] is in-

troduced that is based essentially on random discovery: A recommen-

dation is generated for a user by iteratively retrieving an item and

determining its rating as a recommendation, until an item with a suffi-

ciently high rating is found. An item is retrieved either randomly from

the provider, or by asking a random user for a recommendation, the

decision being based on a random coin flip. The complexity of this

algorithm is largely dependent on the assumption that the number of

types is small or even constant with regard to the number of users,

and on the assumption that the number of users belonging to a type is

significantly larger than the number of users not belonging to any type,

because otherwise a large average number of iterations is required for

each user.



Yenta: In [45], an agent-based approach is described in which user

agents representing similar users are discovered via the transitive traver-

sal approach. A hill-climbing algorithm is applied in order to find a

user agent with maximal similarity. The problems inherent to this

approach, namely the issues of local maxima and isolated groups of

agents are mentioned, but, somewhat surprisingly, stated to be negli-

gible. Privacy is preserved through pseudonymous interaction between

the agents and adding obfuscating data to personal information, re-

sulting in a “probable innocence” degree of unlinkability. More recent

related approaches are described e.g. in [79].



PocketLens: As described in [85], all five approaches are implemented

as reference architectures for the PocketLens algorithm, a neighbor-

hood-based algorithm for distributed collaborative filtering. The real-

ized architectures are characterized by the respective drawbacks listed

above.



Alambic: The Alambic system [4] is based on a centralized model gen-

eration approach in which the privacy of the model is preserved by a

combination of mechanisms. It is described in detail below.



Cryptography-Based Collaborative Filtering: Various cryptography-

based approaches [24][25] generating a global model in a distributed

way have been suggested. They are described in detail below.



Reconstruction-Based Collaborative Filtering: Various reconstruction-

based approaches [93][92] generating a central model in a privacy-

preserving way have been suggested. They are described in detail be-

low.

Finally, it should be noted that other aspects not related to privacy have

to be addressed as well when designing a distributed IF architecture: In

[113], an agent-based distributed recommender system with the focus on

utility rather than privacy is described, i.e. the main problem addressed is

how to enable agents decide when and to whom to provide recommendations.

It is concluded that agent with similar profiles may profit from exchanging

recommendations, but no mechanism for determining likely candidates is

given apart from random interaction of agents, which is rather inefficient.

Privacy aspects are explicitly not addressed.

The approaches most relevant as work related to our approach are dis-

cussed in detail in the following.

3.2.4.1 Alambic

The Alambic system [4] proposes a mechanism for privacy-preserving demo-

graphic filtering which may be generalized in order to support other kinds of

filtering techniques. While its description strongly suggests that the archi-

tecture is based on MAS technology, this is not stated explicitly. Basically,

the filter is realized as an independent entity which is to some degree con-

trolled by the provider (all communication with other entities is controlled

and monitored by the provider, by unspecified means), but at the same time

protected against manipulations (through code encryption and obfuscation,

or alternatively by utilizing tamper-proof hardware). The provider model

contains clusters of users with a similar demographic background, where

each cluster is assigned indexes of the provider profile elements themselves.

The provider model is exclusively controlled and accessed by the filter role.

It is updated based on feedback of users.

In a filtering process, the user profile is encrypted via the filter role’s

public key as protection against the provider entity. The encrypted profile is

sent from the user role to the provider entity, where it is stored for unspecified

reasons. It is then propagated to the filter entity, where it is decrypted and

subsequently compared with the cluster centroids in order to determine the

best matching cluster. The indexes of profile elements associated with the

best matching cluster are encrypted twice via a symmetric encryption scheme,

first with a secret key shared with the provider entity (as protection against

the user entity), and then with a secret key shared with the user entity via

the encrypted profile (as protection against the provider entity).

The encrypted indexes are returned to the user entity, presumably again

via the provider entity. Finally, the actual recommendations are determined

via the indexes, either directly by the provider role itself, or via a more

complex PIR-based protocol between user and provider entity which keeps

the actual recommendations hidden from the provider. Again, details of the

protocol are unspecified. All user interaction with the system is anonymous.

The specific algorithm used for determining clusters is not specified, because

it is irrelevant for the overall privacy-preserving architecture.

3.2.4.2 Cryptography-Based Collaborative Filtering

In [24], a distributed privacy-preserving architecture for a recommender sys-

tem based on collaborative filtering via Singular Value Decomposition is de-

scribed. In this approach, recommendations are generated via a public model

aggregating the distributed user profiles without containing explicit informa-

tion about user profiles themselves. A similar approach based on factor anal-

ysis is described in [25]. Both approaches are based on secure multi-party

computation.

The overall complexity is O(m n log n) for nusers and mdifferent items

contained within the user profiles, which is reasonable close to the lower

bound of Ω(m n). The model is structured as a matrix Pwith Pij being

the rating of user ifor item j, which is zero if the item has not been rated,

and greater than zero otherwise. This matrix is not known to any partici-

pant. Recommendations are generated via the public matrix A, computed

via SMPC, with Abeing a global linear approximation of P.

All computation is distributed between the users. An honest majority of

users is assumed, i.e. a fraction α > 1/2 of all involved computers are assumed

to be uncorrupted. The approach assumes two services: A blackboard in the

form of a write-once, read-many storage system, and a trusted source of

random bits. The protocol itself, which we do not present here, is largely

based on a scheme for secure e-voting described in [35]. It uses homomorphic

encryption5in order to handle sums of encrypted vectors without exposing

the underlying data itself.

3.2.4.3 Reconstruction-Based Collaborative Filtering

In [93], a distributed privacy-preserving architecture for a recommender sys-

tem based on collaborative filtering via Singular Value Decomposition is de-

scribed, i.e. an architecture very similar to the one described above. The main

difference is the following: Privacy is preserved through random perturba-

tion instead of secure-multi party computation. A similar approach based on

correlation of users is given in [92]. In the first approach, a global model is

created based on vectors representing the single user profiles in which all data

has been perturbed by adding random values to the vector elements, based on

a given distribution. A significant trade-off between privacy and accuracy is

observed: Obviously, using a small amount of perturbation is insufficient for

preserving privacy because the modified vector closely resembles the origi-

nal vector. On the other hand, a large amount of perturbation decreases

the overall accuracy because the vector elements become indistinguishable

from random noise. Determining the optimal trade-off is problematic be-

cause while accuracy can be expressed in terms of mean absolute error of

the model created from the perturbed data, privacy is generally more diffi-

cult to quantify. Based on the method suggested in [1], a privacy measure

is used indicating how closely an original value can be estimated based on

the perturbed value: The privacy loss P(X|Z) is the fraction of privacy of a

variable Xthat is lost by revealing the variable Z. Privacy of a variable itself

is expressed based on its differential entropy, i.e. a measure of uncertainty in

the values of the variable. Using these measures for privacy and accuracy,

an optimal trade-off may be determined.

3.2.5 Evaluation

Unlike single PETs, the Privacy-Preserving Technologies discussed in this

section are sufficient for preserving privacy in specific application domains,

sometimes under certain realistic assumptions. While there is a compara-

tively small amount of related work describing PPTs for Information Filter-

5Homomorphic encryption schemes basically allow operations on encrypted data to be

carried out in a meaningful manner.

ing, the related approaches discussed here are also relevant in the context of

PPIF because they may used as parts of the overall architecture:



Peer-oriented approaches methods may be used as part of a filtering

technique in collaboration-based approaches, i.e. for clustering users

in order to determine similar users. While feature-based approaches

may also be based on clustering of profile elements, in this case the

clustering itself generally does not have to be performed in a privacy-

preserving manner because the underlying information does not refer

to more than one participant.



Privacy-preserving data mining approaches are relevant because data

mining methods may be used for creating provider profile models in

collaboration-based approaches, e.g. by deriving association rules based

on user feedback.



Private Information Retrieval schemes are intrinsically very relevant in

the context of PPIF because the filtering process may rely on queries

on a large dataset. Apart from performance issues, however, proposed

solutions in this area are currently of a more theoretical nature and

cannot easily be applied to real-world database management systems.

However, these approaches may not be used by themselves for realizing

Privacy-Preserving Information Filtering, because the respective IF system

would not meet all requirements listed in Section 2.3.4:



By definition, Peer-oriented approaches methods are only suitable for

collaboration-based approaches and thus do not even meet all func-

tional requirements. Even ignoring this aspect, they do not address

filter privacy, and are characterized by a high complexity, leading to

problems with regard to performance and/or broadness.



Privacy-preserving data mining approaches do not address filter pri-

vacy, and protect the privacy of the user only up to the information

processing stage, but not in the information filtering stage. They are

additionally characterized by quality issues (perturbation-based ap-

proaches) or performance issues (approaches based on Secure Multi-

Party Computation).



Private Information Retrieval schemes do not address provider privacy

in the context of IF, nor filter privacy. As largely theoretical ap-

proaches, they do not meet the requirement of adequate performance

in realistic scenarios.

Acceptance is achieved to a large degree by meeting the respective privacy

requirement.

Table 3.2 summarizes the evaluation of these PPTs in the context of all

work related to PPIF. To recapitulate, the PPTs discussed in this section

provide useful functionality for parts of PPIF architectures but cannot be

used by themselves in order to realize these architectures.

Finally, we evaluate the existing approaches for distributed PPIF archi-

tectures. These approaches are usually restricted to specific filtering tech-

niques and are only suitable for collaboration-based approaches and thus

do not meet all functional requirements. With regard to the non-functional

requirements, they are characterized by the following drawbacks:



Non-model-based approaches, such as the competitive recommender

systems approach and Yenta, are infeasible for real-world applications,

mainly due to the complexity involved in the task of determining sim-

ilar users. They do not address filter privacy, and are restricted to

specific filtering techniques, which is problematic with regard to the

requirement of broadness.



The PocketLens approach is characterized by the drawbacks listed in

the discussion of the five main approaches for determining similar users,

which are problematic either with regard to user privacy, or with re-

gard to performance. Furthermore, the PocketLens approach does not

address filter privacy.



The Alambic system is the approach most closely related to our archi-

tecture. While basically viable, two aspects are insufficiently addressed,

namely the protection of the filter against manipulation attempts (the

technology-based solutions are only described in an abstract way, while

a solution based on an additional trusted party only moves the under-

lying problem to a different level), and the prevention of collusions

between the filter and the provider (the suggested solution again re-

lies on a trusted third party). In other words, the approach does not

address filter privacy, and it only insufficiently addresses user privacy.



All Cryptography-Based Collaborative Filtering approaches are largely

theoretical, i.e. they often have not even been fully implemented. Fur-

thermore, they require the participation of all users, especially for up-

dating the model, a course of action that is somewhat inefficient for a

large number of users. Additionally, the suggested approaches cannot

be used to realize a Matchmaker System, because similar users are not

determined as such, and they do not address filter privacy.



The described Reconstruction-Based Collaborative Filtering approach

deals only with the privacy-preserving generation of the model, it does

not address the problem of obtaining recommendations in a privacy-

preserving manner. Therefore, only the information procession stage

of an entire IF process is covered. As in the case of perturbation-

based approaches for privacy-preserving data mining, the requirement

of quality is problematic, and, as in all approaches discussed in this

section, filter privacy is not addressed.

Acceptance is achieved to a large degree by meeting the respective pri-

vacy requirement. Provider acceptance of these approaches is expected to

be somewhat lower, though, because these approaches may not allow the

provider to obtain any information about users at all.

Table 3.2 summarizes the evaluation of these distributed PPIF architec-

tures in the context of all work related to PPIF. To recapitulate, none of

the distributed PPIF architectures discussed in this section, which meet the

non-functional requirements to a larger extend than single PETs or PPTs,

constitutes a generic framework suitable for various kinds of filtering tech-

niques and for realizing feature-based and collaboration-based Recommender

Systems as well as Matchmaker Systems.

3.3 Privacy in Multi-Agent Systems

In this section, we discuss work related to privacy aspects of architectures

based on Multi-Agent System technology. After a brief overview of ap-

proaches for anonymous communication between agents, we describe pro-

posed solutions dealing with the problem of malicious hosts in MAS archi-

tectures.

3.3.1 Anonymous Communication

A MAS architecture requiring anonymous communication of agents may

adapt general solutions for anonymous communication, as described in Sec-

tion 3.1.1. Theoretic foundations for reasoning about anonymity and infor-

mation hiding in MAS architectures are given by [61]. Furthermore, the

following MAS-specific approaches for anonymous communication have been

suggested:



In [73], an approach for anonymous communication of mobile agents

based on onion routing (see Section 3.1.1.1) is described which is is

based on the JADE6multi-agent platform. Every agent platform con-

sists of dedicated onion agents providing a data forwarding service (thus

representing the mixes) which communicate via an Agent Communi-

cation Language, and an additional manager agent monitoring these

agents.



In [15], the anonymizing service provided by the AgentScape7frame-

work is described. It is realized as a simple router, i.e. a proxy agent

(see Section 3.1.1.1) relaying communication between two agents that

intend to remain anonymous.

Both approaches may be adapted to other MAS architectures in a straight-

forward manner.

3.3.2 Protection against Malicious Hosts

As discussed in Section 2.4.3, the protection of agents against malicious

hosts is a central requirement in MAS architectures consisting of personal

agents representing different entities that regard each other as potentially

non-honest. The solutions addressing this problem have been suggested:



Code encryption: As described in [98], the code of a mobile agent my

be protected by using an encrypted function computing the result for

input provided by the host. While the host obtains the result of the

function, he is not able to obtain the original function itself. He may,

however, still attack the mobile agent by re-running it and attempting

to obtain information about the function by using different inputs.

These kinds of attacks may be prevented by requiring communication

with a trusted third party during the computation of the result [5].



Data Encryption: In addition to encrypting a function as part of a

mobile agent’s algorithm, the data of an agent may be encrypted as

well, and all operations of the agent on an untrusted platform may be

carried out on the encrypted data itself, without the need for decrypting

the data. This approach requires a homomorphic encryption scheme,

e.g. a scheme based on the ElGamal cryptosystem [42].



Code Obfuscation: Code obfuscation in the context of MAS is the trans-

formation of an agent’s code into a form that is hard to understand or

6Available via the URL <http://jade.tilab.com/>.

7Available via the URL <http://www.agentscape.org>.

to reverse-engineer by attackers, but at the same time results in equiv-

alent behavior of the agent executing the obfuscated code, compared

to an agent executing the original code. A comprehensive overview is

given e.g. in [37]. It has been shown [10] that there is no general obfus-

cation method that always results in a perfectly obfuscated program

which reveals nothing about the underlying algorithm. Research in the

area continues, however, because in practice less-than-ideal obfuscation

may be acceptable: In the case of mobile agents, an obfuscation scheme

deterring or delaying attacks for a certain amount of time could still be

useful for protecting mobile agents against manipulations while they

are running on an untrusted platform.



Recording and Tracing Approaches: A number of other approaches have

been suggested which mainly aim at detecting malicious hosts, without

the possibility of preventing attacks. A survey of these approaches

is given in [64]. They are based on recording the itinerary of mobile

agents visiting a number of mobile platforms, and detecting possible

malicious hosts by inconsistencies in the recordings [97], or by tracing

the execution of an agent in a non-repudiable log file [114].



Trusted Computing: While the approaches outlined above may be suf-

ficient in specific scenarios, the only approach suitable for protecting

mobile agents against malicious hosts in a generic way appears to be the

use of secure hardware, e.g. through trusted computing, as described

in Section 3.1.3. Via remote attestation, a platform may be ensured to

be incapable of acting in a malicious way, e.g. by realizing the virtual

machine approach for semantic remote attestation described in [60].

The drawbacks of these approaches are discussed in the following section.

3.3.3 Evaluation

While the solutions for anonymous communication in MAS architectures con-

stitute adequate solutions, the proposed approaches dealing with the problem

of malicious hosts are characterized by several drawbacks:



Code encryption: The main drawback of this approach is that mainly

because of complexity issues, it can be applied efficiently only to basic

algorithms, such as algorithms for evaluating polynomial expressions.

As in the case of secure multi-party computation (see Section 3.1.2,

it is infeasible for complex algorithms such as filtering techniques for

Information Filtering.



Data Encryption: Data encryption in itself is sufficient for protecting

data of a mobile agent, as long as the data is static in the follow-

ing sense: The data should be obtained before the agent migrates to

the untrusted platform, and decrypted only after the agent has left it.

Unencrypted data received through communication while the agent is

running on the untrusted platform should not be combined with the

encrypted data in any way. This restriction implies that most mobile

agent scenarios cannot use this approach, because they usually involve

communication of the mobile agent, except for simple remote comput-

ing scenarios.



Code Obfuscation: Code obfuscation in itself is insufficient for protect-

ing the privacy of mobile agents against non-honest hosts, because the

respective attacks are not limited to the duration of the execution of

the mobile agent.



Recording and Tracing Approaches: These approaches do not actually

protect the privacy of mobile agents, which makes them infeasible for

many scenarios including Privacy-Preserving Information Filtering.



Trusted Computing: As described above, trusted computing appears

to be the only approach suitable for protecting mobile agents against

malicious hosts in a generic way.

To recapitulate, privacy aspects in MAS architectures are largely cov-

ered by related work. Therefore, we do not provide additional approaches

addressing these aspects in this work, but rather build upon the existing

solutions.

3.4 Summary

This chapter discusses different areas of related work, namely Privacy-En-

hancing Technologies (Section 3.1) and Privacy-Preserving Technologies (Sec-

tion 3.2), including approaches for distributed Privacy-Preserving Informa-

tion Filtering. It also discusses work related to aspects of privacy in Multi-

Agent Systems (Section 3.3). We summarize the evaluation of the various

areas of work related to PPIF (i.e. Section 3.1.5 and Section 3.2.5) in Ta-

ble 3.2, based on the non-functional requirements defined in Section 2.2.3.

To recapitulate, no single solution is sufficient for realizing an architecture

for Privacy-Preserving Information Filtering meeting all functional and non-

functional requirements.

Table 3.2: An overview of existing PETs, PPTs, and approaches

for PPIF in relation to the requirements and acceptance aspects

of Privacy-Preserving Information Filtering. A requirement may

be fully met (indicated by “X”), partially met (indicated by “o”),

or not met at all (indicated by “–”). Acceptance is indicated

in an analogous manner. Note that the ratings do not always

indicate the best value theoretically possible for the architecture

utilizing the respective technology, or the respective architecture

for distributed PPIF, but a realistic average value.

Privacy Other Accep-

Requirements Requirements tance

RuRpRfRqq Rbb Rpp AuAp

PETs

Anonymous Communication –X–

  

X–

SMPC X X –



– –

 

Trusted Computing X X X



–



–



Privacy Enforcement –X–

    

PPTs

Peer-Oriented Approaches



X–



– –

 

SMPC-based PPDM –X–



– – – X

Perturbation-based PPDM –X– – –



–X

Private IR X



–

 

–

 

Distributed PPIF

Random Discovery-based



X–



– –

 

PocketLens



X–



– –

 

Alambic



X–



–

  

Cryptography-based X X –



– –

 

Reconstruction-based X X – – –

  

At first glance, out of all related work, trusted computing seems to be

the most suitable approach for PPIF. Consequently, a straightforward course

of action would be to take an existing comprehensive IF system and to the

respective architecture based on trusted computing. As discussed in Sec-

tion 10.2, this approach is problematic mainly with regard to the requirement

of broadness and the aspect of user acceptance. Therefore, even if the IF sys-

tem is to be based on trusted computing, a different approach is required.

We describe this approach in the following chapter.

Chapter 4

Privacy-Preserving Information

Filtering

This chapter gives an overview of our approach for Privacy-Preserving In-

formation Filtering. It focuses on the general idea and main concepts of our

solution, and omits all details of the approach, which are covered by the

following chapters.

This chapter is structured as follows: The following section lists the sup-

ported use cases, which are directly derived from the functional requirements

listed in Section 2.3.4. Section 4.2 gives a high-level outline of our solution

for PPIF. Based on two essential concepts, namely the concept of a trusted

environment for protecting privacy in Recommender Systems (Section 4.2.1),

and the additional concept of an anonymous centralized model for protecting

privacy in Matchmaker Systems (Section 4.2.2), we motivate the use of MAS

technology in this context (Section 4.2.3) and introduce the components of

our approach that realize these concepts via MAS technology (Section 4.2.4).

In Section 4.3, we briefly describe the implementation of the approach itself

as well as the implementation of a prototypical application based on the

approach. Section 4.4 summarizes the chapter.

4.1 Use Cases

As defined in Section 2.2.1, we distinguish between the following kinds of IF

systems, each fulfilling a distinct goal:



ARecommender System generates recommendations and predictions of

items.



AMatchmaker System determines similar users.



AHybrid IF System generates recommendations and predictions of

items via determining similar users.

A comprehensive IF approach should provide functionality that is sufficient

for realizing these kinds of systems. In other words, an approach for PPIF

should provide functionality for realizing the following four main use cases:



The use case “get prediction for item” with interactions resulting in

predu,s,ft,i, i.e. the predicted relevance of item ifor a given user u,

based on the profile of the supplier s(which may be a provider or

another user), and the filtering technique ft.



The use case “get recommendations” with interactions resulting in

RECu,s,ft,n, i.e. the top-nrecommendations with parameters as above.



The use case “get prediction for user” with interactions resulting in

predu,s,ft,u0, i.e. the predicted similarity of user u0with parameters as

above.



The use case “get similar users” with interactions resulting in SUu,s,ft,n,

i.e. the top-nsimilar users with parameters as above.

We note that each use case actually consists of two partial use cases,

one in which the complete supplier profile is used, and one in which a con-

strained supplier profile is used, containing elements returned as a result of

a query on the supplier profile: The latter use case constitutes the mixed

IR/IF scenario, the IR-related part of which is not considered as privacy-

critical, because it is assumed not to be directly related to the user profile

data. A plausible scenario for this use case is the following example: A user

wants to receive recommendations from a movie recommender, but the rec-

ommendations should be restricted to movies being shown on a specific day

on television or in cinemas in a specific city. In this case, the respective

queries usually return a more manageable part of the supplier profile, such

as 200 out of 20.000 movies.

Additionally, these use cases may be split up further depending on the

actual degree of user privacy (see Section 2.3.3.3), which is basically defined

via the result data the supplier obtains:



Completely Linkable Result Data: All result data is linkable to the user.

Consequently, it is internally linkable as well.



Semi-Linkable Result Data: The result data is internally linkable, but

not to the user. As an example, the supplier may determine whether

two recommendations belong to one set of recommendations generated

for a single user, but the user himself remains anonymous. This case

does not apply to the prediction-based use cases, because the concept

of unlinkability does not apply to single elements.



Semi-Private Result Data: The result data is internally unlinkable, and

it is not linkable to a user as well.



Completely Private Result Data: The supplier does not obtain any

result data.

We subsume the first two cases under the designation Linkable Result

Data, and the last two cases under the designation Private Result Data.

In addition to these use cases, a comprehensive approach for PPIF should

also provide functionality for the following use cases related to the first two

stages of an IF process, as described in Section 2.2.1:



The use case “update profile elements”, in which an entity adds ele-

ments to the respective profile, or removes elements from it during the

Information Collection stage.



The use case “update profile model”, in which the model for a profile is

updated based on added or removed elements during the Information

Processing stage.

Table 4.1 summarizes all six use cases described above, which we refer

to as main use cases in the following. Other use cases providing aggregated

by these main use cases are referred to as partial use cases. As they are not

directly relevant for the outline of our solution, they are not listed here.

4.2 Outline of the Solution

We propose an approach for agent-based Privacy-Preserving Information Fil-

tering suitable for realizing Recommender Systems, Matchmaker Systems,

and Hybrid IF Systems. The approach meets all requirements stated in

Section 2.3.4 and may be used to realize all use cases introduced above, as

shown in Section 10.1.

In the following outline of our solution, we focus on the information filter-

ing stage and disregard the two preceding stages, mainly because they are no

more critical with regard to privacy as they typically involve fewer entities:

Table 4.1: Main use cases covered by our approach for PPIF.

main use case supplier result data

profile compl. semi- private

linkable linkable

information collection stage

update profile elements n/a n/a (no result data)

information processing stage

update profile model n/a n/a (no result data)

information filtering stage

get prediction for item complete Xn/a X

get prediction for item query result Xn/a X

get recommendations complete X X X

get recommendations query result X X X

get prediction for user complete Xn/a X

get prediction for user query result Xn/a X

get similar users complete X X X

get similar users query result X X X



A process of the information collection stage only involves a single

entity, namely the user or the provider entity, depending on the profile

data to be collected.



A feature-based process of the information processing stage involves two

entities, namely the user or the provider entity, depending on the profile

data to be processed, and the filter entity. As noted in Section 2.2.1,

these processes occur in Recommender Systems only.



A collaboration-based process of the information processing stage typ-

ically involves all three abstract entities. As noted in Section 2.2.1,

these processes occur in Matchmaker Systems and Hybrid IF Systems.



A process of the information filtering stage involves all three abstract

entities.

Additionally, processes of the first two stages do not return sensitive result

data, as in the final stage. Therefore, the concepts of a solution for the

information filtering stage are likely to be applicable for the other stages as

well (Section 7.2.2.1 and Section 7.2.2.2 show that this is in fact the case).

We begin the outline by addressing the most problematic aspect in Priva-

cy-Preserving Information Filtering, namely the apparent paradox of provid-

ing private information in order to obtain personalized information without

losing control over the provided private information.

4.2.1 Trusted Environment

The requirements state that no private information should be acquired per-

manently by other entities. The basic idea for realizing the privacy-related

requirements in Recommender Systems is already suggested implicitly by

the use of the adverb “permanently” in this context: While it is obviously

important that permanent acquisition is prevented, temporary acquisition

of private information may be allowed and therefore used to full capacity.

Thus, the use case “get recommendations” is realized as follows on the most

abstract level: User and supplier entity both propagate the respective profile

data to the filter entity. The filter entity provides recommendations (either to

both entities, or only to the user entity), and deletes all private information

afterwards.

There are basically two approaches for realizing an acquisition of private

information that is in fact only temporary:



Trusted Software: The respective entity is trusted or known - e.g.

through validation via trusted computing mechanisms - to remove the

respective information as specified;



Trusted Environment: The respective entity operates in an environment

that is trusted or otherwise known to control the communication and

lifecycle of the entity to an extent that the removal of the respective

information may be achieved regardless of the attempted actions of the

entity itself. Additionally, the environment itself is trusted or otherwise

known not to act in a malicious manner in this context (e.g. it cannot

extract and propagate the respective information itself).

Our solution is based on a trusted environment because, although it is more

complex than the trusted software approach, the trust issues are resolvable

more easily in this approach, basically because a trusted environment may be

realized in a more generic way. We address this issue in detail in Chapter10.2.

According to this decision, we specify the abstract information filtering

protocol for the use case “get recommendations (linkable result data)” as

shown in Figure 4.1: The filter entity deploys a Temporary Filter Entity

(TFE) operating in a trusted environment. The user entity deploys an ad-

ditional relay entity operating in the same environment. These additional

entities are short-lived, i.e. they are terminated after a specified number of

tasks. Through mechanisms provided by this environment, the relay entity

is able to control the communication of the TFE, and the supplier entity is

able to control the communication of both relay entity and the TFE. Thus,

it is possible to ensure that the controlled entities are only able to propagate

recommendations, but no other private information.

In the first stage (Step I.a to Step I.c of Figure 4.1), the relay entity

establishes control of the TFE, and thus prevents it from propagating user

profile information. User profile data is propagated without participation

of the supplier entity from the user entity to the TFE via the relay entity.

In the second stage (Step II.a to Step II.c of Figure 4.1), the supplier entity

establishes control of both relay and TFE, and thus prevents them from prop-

agating supplier profile information. Supplier profile data is propagated from

the supplier entity to the TFE via the relay entity. In the third stage (Step

III.a to Step III.e of Figure 4.1), the TFE returns the recommendations via

the relay entity, and the controlled entities are terminated. Taken together,

these steps ensure that all private information is acquired temporarily only

by the other main entities. The use case “get prediction for item (linkable re-

sult data)” is realized via a similar protocol, in which the result data contains

the prediction instead of recommendations.

The use cases “get recommendations (private result data)” and “get pre-

diction for item (private result data)” are realized by a protocol based on

similar steps (see Figure 4.2), but including an additional relay entity de-

ployed by the supplier, which is basically required for validating the result

data before it is propagated to the user. In the case of linkable result data,

this task is carried out by the supplier entity itself, which is not possible in

the case of private result data, because the supplier entity may not obtain

result data directly.

Figure 4.1: The abstract privacy-preserving information filter-

ing protocol for the use cases returning linkable result data. All

communication across the environments indicated by dashed lines

is prevented with the exception of communication with the con-

trolling entity.

Figure 4.2: The abstract privacy-preserving information filter-

ing protocol for the use cases returning private result data. All

communication across the environments indicated by dashed lines

is prevented with the exception of communication with the con-

trolling entity.

While these protocols may also be applied in distributed Matchmaker

Systems, in this case another central problem has yet to be addressed, namely

the challenge of determining user candidates in an efficient manner.

4.2.2 Anonymous Centralized Model

In order to meet the privacy-related requirements in Matchmaker Systems,

the protocols introduced above may be applied, with a different user consti-

tuting the supplier entity in the interactions. Determining similar users in

general, however, is difficult if the number of users is too large to efficiently

carry out this protocol for each pair of users. As described in Section 3.2.4,

there are various approaches for determining suitable candidates from the

set of all users. Our solution is based on a combination of the mechanisms

of random discovery, transitive traversal, and central model generation.

In order to preserve privacy, all information related to users is stored

anonymously in a centralized model: A user adding a specific item to his

profile may announce this anonymously, which allows the provider of the

centralized model to store the relationship of item and pseudonym. By using

a different pseudonym for each user-element relation stored in the central

model, unlinkability of users and items as well as of the items among them-

selves is realized. Subsequently, a given user may obtain the information

that the profiles of other users contain a specific item, but is given only

a pseudonymous communication address for contacting the candidate user.

Obviously, a mechanism for anonymous communication is required for this

solution.

The provider of the centralized model does not necessarily have to be iden-

tical with the provider of the underlying data. In most scenarios, however,

a single entity is likely to constitute both providers, because maintaining a

centralized model allows the provider to obtain some feedback regarding the

prevalence of items in user profiles, which may be useful information. Other

entities are less likely to be sufficiently motivated to provide a centralized

model.

The protocol for the use case “get similar users” is defined as follows: The

user entity anonymously receives anonymous candidates from the provider,

which may be selected randomly or based on his profile elements. The user

entity interacts with the candidates in order to determine the similarity of

the respective profiles, or in order to obtain additional candidates with whom

he interacts in the same manner. Over time, the most similar users are found

with high probability. The user may also receive candidate users randomly,

or from other similar users.

4.2.3 Use of MAS Technology

As outlined in the previous sections, our approach for Privacy-Preserving

Information Filtering is based on a distributed system in which the main

abstract entities of user, provider and filter are modeled as distinct enti-

ties which control the respective private information exclusively. Thus, only

interactions involving more than one entity have to be considered as privacy-

critical. If communication control as described above is actually realized,

and the entities are protected against external threats, sensitive information

may actually be protected and a PPIF architecture may be realized. This

approach requires a participating entity to have the following five main abil-

ities:



The ability to perform certain well-defined tasks (such as carrying out

a filtering process) with a high degree of autonomy, i.e. largely indepen-

dent of other entities (e.g. because the entity is not able to communicate

in an unrestricted manner);



The ability to be deployable dynamically in a well-defined environment;



The ability to communicate with other entities;



The ability to achieve protection against manipulation attempts;



The ability to control and restrict the communication of other entities;

As defined in Section 2.4.1, MAS architectures are an ideal solution for

realizing the approach, because they provide agents as entities actually char-

acterized by autonomy, mobility and the ability to communicate, as well

as agent platforms as environments providing means to realize the security

of agents. In this context, the issue of malicious hosts, i.e. host attacking

agents, has to be addressed explicitly. Additionally, existing MAS architec-

tures generally do not allow agents to control the communication of other

agents, i.e. this specific ability is not covered as such. It is possible, however,

to expand MAS architecture in order to provide agents with this ability. For

these reasons, our approach is based on a MAS architecture. Concluding the

outline, we give a high-level overview of the architecture and lists its main

components.

4.2.4 Main Components

Continuing the depictions of existing IF architectures in Section 2.2.2, Figure

4.3 shows a high-level overview of the architecture for PPIF for the non-

collaboration-based scenario. In addition to the MAS architecture itself,

which is assumed as given, the architecture in general consists of the following

five main components providing the required functionality:



Because MAS architectures generally do not provide functionality for

controlling the communication of agents, nor for anonymous commu-

nication, a component realizing this functionality is provided, namely

the Infrastructure Module described in Chapter 5.



In order to facilitate the use of different data storage mechanisms,

and to provide a uniform interface for accessing persistent information,

which may be utilized for monitoring critical interactions involving po-

tentially private information e.g. as part of queries, a component for

transparent persistence is provided, namely the TPMAS Module de-

scribed in Chapter 6.

Figure 4.3: The proposed architecture for Privacy-Preserving

Information Filtering.



Functionality for realizing the Recommender System use cases “get

prediction for item” and “get recommendations” is provided within

the Recommender Module component described in Chapter 7.



Functionality for realizing the Matchmaker System use cases “get pre-

diction for user” and “get similar users” is provided within the Match-

maker Module component described in Chapter 8.



Finally, while these components may generally be used in connection

with various filtering techniques that are not restricted to specific do-

mains, the protocols impose certain other restrictions on the actual fil-

tering techniques. Therefore, Exemplary Filtering Techniques are pro-

vided as a separate component described in Chapter 9 in order to show

that the requirements may actually be met by choosing appropriate

filtering techniques.

Figure 4.4 gives an overview of the five main components.

All non-functional requirements listed in Section 2.3.4 are addressed by

these components, with different components focusing on different require-

ments:



The requirement of user privacy is primarily addressed by the In-

frastructure Module providing foundations for privacy protection, the

Figure 4.4: The five main components of the PPIF architecture.

Additional parts required for the architecture but not directly

contributed by this work are grayed out.

Recommender Module providing privacy-preserving protocols, and the

Matchmaker Module protecting the privacy of users as participants in

a distributed Matchmaker System.



The requirement of provider privacy is primarily addressed by the In-

frastructure Module and the Recommender Module, analogous to user

privacy.



The requirement of filter privacy is primarily addressed by the Infras-

tructure Module and the Recommender Module, analogous to user pri-

vacy.



The requirement of quality is addressed by the Exemplary Filtering

Techniques providing result data, and the Matchmaker Module provid-

ing users that are probably similar to a given user. In both cases, the

provided information should be of high quality.



The requirement of broadness is addressed by the TPMAS Module

providing a uniform interface for accessing persistent information, and

by Exemplary Filtering Techniques which are not restricted to a specific

domain.



The requirement of performance is primarily addressed by the Exem-

plary Filtering Techniques, because the filtering process itself is most

critical with regard to performance. All other components take this

requirement into account as well.

The algorithms used in the Exemplary Filtering Techniques component

take the privacy requirements into account as well. Table 4.2 gives an

overview of these relationships.

The trusted environment introduced above encompasses the MAS archi-

tecture itself and the Infrastructure Module. In other words, these com-

ponents have to be trusted to act in a non-malicious manner to rule out

the possibility of malicious hosts. Explicit trust with regard to the other

components is not required because they operate within the trusted environ-

ment and are thus prevented from acting in a malicious manner. Finally, it

should be noted that while we have chosen a specific MAS architecture for

the implementation, the specification of the approach is applicable to any

FIPA-compliant MAS architecture.

Table 4.2: The five main components of our approach for PPIF

in relation to the requirements. A components provides primary

functionality (indicated by “X”), auxiliary functionality (indi-

cated by “



”), or no explicit functionality (indicated by “–”) with

regard to a specific requirement.

Privacy Other

Requirements Requirements

RuRpRfRqq Rbb Rpp

Infrastructure Module X X X – –



TPMAS Module – – – – X



Recommender Module X X X – –



Matchmaker Module X– – X–



Exemplary FTs

  

X X X

4.3 Implementation

We have implemented our approach for Privacy-Preserving Information Fil-

tering based on JIAC IV [50, 101, 100], a FIPA-compliant MAS architecture.

JIAC IV integrates fundamental aspects of autonomous agents regarding

pro-activeness, intelligence, communication capabilities and mobility by pro-

viding a scalable component-based architecture. Additionally, JIAC IV offers

components realizing management and security functionality, and provides

a methodology for Agent-Oriented Software Engineering. In the context

of PPIF, JIAC IV stands out among other MAS architectures as the only

security-certified architecture, as it has been certified by the German Federal

Office for Information Security according to the Evaluation Assurance Level

3 of the Common Criteria for Information Technology Security standard [52].

JIAC IV offers several security features in the areas of access control for

agent services, secure communication between agents, and low-level security

based on Java security policies [99]. Access control for agent services is based

on authenticated users or X.509 certificates associated with agents. JIAC IV

offers also means to secure the communication channel between agents. This

is either achieved by using the SSL protocol on the transport level or, if

this not possible, e.g. because a FIPA-compliant exchange of speech acts via

the Agent Communication Channel is required, by using an application level

protocol similar to SSL in order to protect speech acts. X.509 certificates are

used for access control and for protecting the communication channel, based

on a public key infrastructure [17]. Finally, Java security mechanisms [58]

are used to protect agents from attacks performed by other agents within

the same Java Virtual Machine. Java security mechanisms are also used

to represent human users as subjects within the Java Authentication and

Authorization architecture [76].

We have implemented all components listed above, following the decisions

made in the analysis and design phase, which are described in the following

chapters. As a proof of concept, and in order to evaluate performance and

quality under real-life conditions, we have also used our approach within the

Smart Event Assistant, a MAS-based Recommender System which integrates

various personalized services for entertainment planning in different German

cities, such as a restaurant finder and a movie finder [117]. Additional ser-

vices, such as a calendar, a routing service and news services complement

the information services. An intelligent day planner integrates all function-

ality by providing personalized recommendations for the various information

services, based on the user’s preferences and taking into account the loca-

tion of the user as well as the potential venues. All services are accessible

via mobile devices as well1. Figure 4.5 shows a screenshot of the intelligent

1The Smart Event Assistant was accessible online in different versions until 2007

via the URL <http://www.smarteventassistant.de>. It is currently being redeveloped

and extended as the Smart Personal Assistant, which is accessible online via the URL

<http://www.smartassistantsolutions.de>.

Figure 4.5: Screenshot of the Smart Event Assistant, a privacy-

preserving Recommender System supporting users in planning

entertainment-related activities.

day planner’s result dialog. The Smart Event Assistant is entirely realized

as a MAS system providing, among other functionality, various filter agents

and different service provider agents, which together with the personal user

agents utilize the functionality provided by our approach.

We describe typical scenarios of the Smart Event Assistant in more detail

in Section 10.1.2.4, where we evaluate the performance of our approach. Due

to resource restrictions with regard to the Smart Event Assistant project, we

did not have the time to deploy the system in a trusted environment, i.e.

based on a trusted computing infrastructure, which therefore remains future

work.

4.4 Summary

This chapter gives an overview of our approach for Privacy-Preserving Infor-

mation Filtering. From the requirements listed in Section 2.3.4, we derive a

number of use cases which have to be realized by a comprehensive architecture

for PPIF (Section 4.1). We outline our solution for PPIF, which is based on

a trusted environment that is used to control the communication capabilities

of entities deployed in this environment. Based on this trusted environment,

we specify protocols for realizing the Recommender System-related use cases

in a privacy-preserving manner (Section 4.2.1). We describe an anonymous

centralized model of user-item relationships which is used for realizing the

Matchmaker System-related use cases in a privacy-preserving manner (Sec-

tion 4.2.2). We list the required abilities of entities operating in this context,

and motivate the use of MAS technology by mapping these required abil-

ities to the capabilities of agents and agent platforms (Section 4.2.3). We

list the components of our approach, which address the given requirements

(Section 4.2.4). We give a short overview of JIAC IV as the foundation of

our implementation, and introduce the Smart Event Assistant as a prototyp-

ical application utilizing our approach (Section 4.3). The following chapters

describe the components of our approach in detail.

Chapter 5

Basic Infrastructure

This chapter describes basic functionality for controlling the communication

capabilities of agents, and functionality for anonymous communication of

agents. We subsume both kinds of functionality in a single chapter because in

both cases communication capabilities of agents are addressed, and because

the functionality may be realized most efficiently by extending the respective

MAS architecture itself.

The chapter is structured as follows: Section 5.1 briefly motivates the

Infrastructure Module. Section 5.2 describes the ontologies, roles and in-

teractions of the module, while Section 5.3 describes the agents and agent

services realizing these interactions. Section 5.4 concludes the chapter with

a summary.

5.1 Motivation

As noted in Section 4.2.4, the ability to control the communication of agents

is generally not a feature of existing Multi-Agent System architectures but

at the same time a central feature of our approach for agent-based Privacy-

Preserving Information Filtering.

Anonymous communication is required for several interactions in our ap-

proach mainly in order to achieve unlinkability of user-related data. Depend-

ing on the actual scenario, solutions for sender anonymity as well as receiver

anonymity are required.

We therefore provide the respective functionality within the Infrastruc-

ture Module, as specified in the following sections.

5.2 Analysis

This section describes the ontologies, roles and interactions of the Infrastruc-

ture Module. For the sake of readability, all tables and diagrams containing

the formal specification may be found in Appendix A.1. Usually the analy-

sis phase and thus the specification of interactions abstracts from agents and

platform configurations. However, in this case it is necessary to refer to these

concepts because of the reflective nature of the interactions.

When utilizing this module, it should be noted that the concepts of com-

munication control and anonymous communication are mutually exclusive,

because agents on a controlled platform are always identifiable when com-

municating, and thus cannot communicate anonymously.

The functionality required for controlling communication cannot be real-

ized based on regular agent services and/or components, because an agent on

a platform is usually not allowed to interfere with the actions of other agents

in any way. Otherwise, the security of agents would be severely compromised.

Therefore, additional infrastructure providing the required functionality has

to be added to the MAS architecture itself.

Controlling the communication capabilities of an agent is realized by re-

stricting its incoming and outgoing communication channels to specific plat-

forms or agents on external platforms as well as other possible communication

channels, such as the file system. These restrictions are stated via rules, sim-

ilar to rules used by a firewall, which become effective only if the respective

agent consents. Consent is required because otherwise the overall security

would be compromised, as attackers could easily block various communica-

tion channels. Our approach does not require controlling the communication

between agents on the same platform, and therefore this aspect is not ad-

dressed1. Consequently, all rules addressing communication capabilities have

to be enforced across entire platforms, because otherwise a controlled agent

could simply use a non-controlled agent on the same platform as a relay for

communicating with agents residing on external platforms.

Obviously, an agent on a controlled platform should not be able to migrate

freely to other platforms, because in that case control could be lost. Because

migration is initiated by communicating with a remote platform manager

agent, it is made impossible implicitly with the following non-critical ex-

ception: An agent on a controlled platform may still be able to migrate to

1While also depending on the actual MAS architecture utilized, this would typically

be more complicated or even impossible to realize, because intra-platform agent commu-

nication is usually based on channels that are more difficult to control than inter-platform

communication channels, such as communication within a single Java Virtual Machine as

opposed to TCP/IP-based communication.

platforms it is allowed to communicate with.

Agents attempting to migrate onto a controlled platform should be made

aware of this fact, or incoming migration should be prohibited altogether.

The same applies to agents attempting to create additional agents on a con-

trolled platform. Because these aspects are not directly relevant for the

scenarios discussed in this work, we will not address them further.

The functionality for anonymous communication may either be realized

based on regular agent services and/or components, or by adding additional

infrastructure providing the required functionality to the MAS architecture

itself (see Section 3.3.1 for related work in this area). In any case, some kind

of relay is required because sender and receiver (i.e. agent service user and

provider) cannot communicate directly without compromising anonymity.

5.2.1 Ontologies

Rules for controlling communication are expressed via the ontology “Com-

munication Rules”, for which see Section A.1 in the appendix. Basically, a

rule specifies a controller (i.e. the controlling agent itself), a sender (i.e. the

platform that is controlled) and receivers (a set of agents and/or platforms

to be excepted from communication blocking).

For every controlled platform, a set of activated rules contains the rules

which are applicable to the respective platform. Activated rules may con-

tain different controlling agents. From the set of activated rules, which may

be contradictory, a single effective rule is generated consisting of two parts.

The first part is applied in order to decide which communication attempts to

block, while the second part is applied in order to decide which platforms an

agent on a controlled platform may control in turn. Generally, the effective

rule is determined by creating the intersection of the sets of exceptions of

each single activated rule. As an example, if the first activated rule states

that communication with all platforms except P1and P2is to be blocked,

and the second activated rule states that communication with all platforms

except P2and P3is to be blocked, the effective rule in this case states that

communication with all platforms except P2is to be blocked. All rules orig-

inating from agents on a specific platform are collected as a set of foreign

rules.

Because a controlling agent is expected to intend to communicate with

agents on the controlled platform, it is excepted from communication block-

ing in the first part of the effective rule. In cases where two or more different

agents control one platform, only the activated rules related to the first agent

are considered when determining the first part of the effective rule in order

to ensure that the first controller is able to communicate with the controlled

platform. For the second part of the effective rule, however, all activated

rules are considered2.

All information related to sender and receiver anonymity is expressed via

the ontology “Anonymity”, for which see Section A.1 in the appendix. Sender

anonymity requires a pseudonym to be used for the sender, the agent address

of the actual interaction partner, information about the interactions that are

to be anonymized, and information related to the required degree of unlinka-

bility, i.e. whether all single interaction steps should be unlinkable. Receiver

anonymity requires a pseudonym to be used for the receiver, the actual agent

address of the receiver, and information about the interactions that are to

be anonymized. Depending on the implementation of the anonymizer, it

may be infeasible to provide a mechanism for continuous receiver anonymity,

mainly because there may be a large number of potential receivers for each

actual interaction which would have to reachable continuously. Therefore, a

time slot may be given optionally indicating the time periods in which the

anonymizer should actually facilitate anonymous interaction. In both cases,

an optional attribute may be used to indicate the required multiplicity, i.e.

whether the respective interaction should be carried out anonymously once,

a fixed number of times, or an unlimited number of times.

5.2.2 Roles and Interactions

This section describes the roles and interactions of the Infrastructure Module.

For the role schemas, see Appendix A.1. Table 5.1 provides an overview of

the roles.

5.2.2.1 Communication Control

A role with special privileges exceeding those of regular roles, namely the

SupervisorRole, is required for actually enforcing control of the commu-

nication capabilities of specific agents. On every platform hosting agents

which are potential candidates for controlled agents, there has to be an agent

realizing this role. Similar to the agent realizing the PlatformManager-

Role itself, the agent realizing the SupervisorRole has to be trusted to

act non-maliciously, i.e. to carry out all tasks as specified without trying to

obtain additional information. In other words, these agent have to be part

of the trusted environment described in Section 4.2.1.

To simplify matters, both roles may be realized by one single agent. We

model controlling agents in the interactions described below as agents real-

2This is done in order to avoid subsequent complications in more complex situations,

as described in Appendix B, Example 4.

Table 5.1: The roles participating in the Infrastructure Module.

short name/ aggregated by

role name user provider filter

SupervisorRole SR(U) SR(P)

Responsible for actually controlling the communication capa-

bilities of agents realizing the ControllableAgentRole,

and for handling request for control regarding agents on re-

mote platforms.

ControllableAgentRole CAR(U) CAR(P)

Provides functionality that allows a ControllerRole to

obtain the consent to control the agent realizing this role un-

der certain circumstances.

AnonymizerRole AR(U) AR(P)

Provides functionality for anonymous communication.

AgentRole *(U) *(P)

Does not provide specific functionality. Participates in inter-

actions with other roles.

izing a generic AgentRole, because they do not have to provide specific

functionality, and controlled agents as agents aggregating the Control-

lableAgentRole. The agent realizing the PlatformManagerRole

and the agent realizing the SupervisorRole itself are always exempt from

communication blocking, because they have to communicate with other plat-

forms in order to carry out their respective tasks.

For clarity, we point out that the term “controlling agent” always refers

to the agent that initiated the control, while the agent realizing the Super-

visorRole is responsible for actually enforcing control.

To keep the complexity of the process of adding and handling rules man-

ageable, a group of platforms is always blocked uniformly, i.e. each agent

on any platform within the group may communicate with any other agent

located on a platform within the group, with the controlling agent unless

otherwise restricted, but with no one else. Therefore, a controlling agent

may only to specify a group of platforms he intends to block, without being

able to add additional restrictions, which would only complicate matters un-

necessarily. Consequently, agents on a controlled platform are blocked from

communication with any agent outside the respective group of controlled

platforms, with the exception of the controlling agent itself.

Basic Interactions To facilitate controlling the communication of agents,

the four basic interactions RestrictCommunication,CheckRule,ActivateRule,

and AcquireConsent are provided as specified in Table A.1, Table A.2, Table

A.3, and Table A.4 of the appendix. The partial use case “restrict communi-

cation” representing a typical scenario based on these interactions is shown

in Figure 5.1. It should be noted that if consent is withheld by at least

one role, further interactions are not carried out, and no rules are added or

activated anywhere.

Figure 5.1: Collaboration Diagram for the partial use case “re-

strict communication”.

It may appear unnecessary to involve the supervisor, which is the agent

realizing the SupervisorRole, on the controlling agents’s own platform in

the process, instead of allowing the respective role to directly interact with

the supervisor on the platform to be controlled. This direct interaction is not

allowed because it would not allow scenarios based on cascading control, as

described below. Furthermore, an agent’s own supervisor checks the second

part of its effective rule in order to determine whether the agent may control

the remote platform as intended, and it checks for attempts to block the same

group of platforms more than once by checking its foreign rules. Thus, poten-

tially unsuccessful attempts to restrict communication are detected without

unnecessary inter-platform interactions. A detailed example highlighting the

use of these basic interactions, and the effect of multiple activated rules on

the effective rule is given in Appendix B.

Revoking Control Revoking control by removing rules that have already

been activated is required for several reasons: The tasks of controlling a

group of platforms may have to be transferred from one agent to another,

or a group of controlled platforms may have to be enlarged or downsized.

Furthermore, a rule may have been activated as a precautionary act that

subsequently turns out to be unnecessary, or a platform may have to be con-

trolled temporarily only, e.g. because sensitive information handled by an

agent on the platform has expired. Therefore, the interactions RevokeControl

and RevokeRule are provided as specified in Table A.5 and A.6 of the ap-

pendix. The partial use case “revoke control” representing a typical scenario

based on these interactions is shown in Figure 5.2 . When revoking control,

the effective rules are determined based on the remaining active rules. The

agents on the controlled platform are not required to consent. Only the con-

trolling agent itself may revoke control. Again, Appendix B illustrates these

interactions via an example. Similar to restricting communication, control

is always revoked with respect to a group of platforms. Therefore, it is not

possible to revoke control for a single platform within a group of controlled

platforms directly. Instead, communication has to be restricted explicitly

for a new group consisting of the remaining platforms, and in a second step

control may be revoked on the original group of platforms. For convenience,

an additional interaction could combine both steps, but this interaction is

not strictly required and thus optional.

Cascading Control A scenario not addressed by the interactions intro-

duced so far is the cascading control scenario in which an agent intends to

control a platform containing agents which in turn control further platforms.

In this case, these further platforms are added to the groups of platforms to

be controlled, by returning the respective information as an additional output

of the CheckRule interaction. The rules regarding these additional platforms

are activated without acquiring consent of the respective agents, because they

have already consented to be controlled. Instead, these agents are notified of

the cascading control via the interaction InformAboutCascadingControl speci-

fied in Table A.7 of the appendix, because this information may help agents

to determine whether to carry out a specific protocol for which cascading

control is required. The partial use case “restrict communication (cascad-

ing control)” representing a typical scenario based on cascading control is

shown in Figure 5.3. The initiator of the interaction RestrictCommuni-

cation is informed about all platforms that are controlled in addition to

the group given as input. From this point onwards, no distinction is made

between platforms controlled through cascading control and platforms con-

Figure 5.2: Collaboration Diagram for the partial use case “re-

voke control”.

trolled regularly. Therefore, revoking control of a platform does not affect

the activated rules regarding this platform even if they have been established

through cascading control. Again, Appendix B provides an example for this

scenario.

Additional Management Functionality As described above, control-

ling a certain agent is only possible by controlling an entire platform. There-

fore, a large number of platforms is required in scenarios containing a large

number of separate processes that have to be executed by agents controlled

by different controlling agents. For example, in the Recommender System

use cases of our PPIF approach, the optimal solution with regard to privacy

and security is to use one separate platform for each process of the infor-

mation filtering stage. For real-world applications aiming at a large number

of users, the number of required platforms is therefore much larger than the

usual number of platforms in a deployed system based on MAS architecture,

which is typically relatively small due to the amount of resources required to

run a platform.

For these reasons, it is infeasible to control at platform permanently, or

in fact longer than for the time period that is sufficient for carrying out a

small number of interactions. If control is short-lived, a controlled platform

Figure 5.3: Collaboration Diagram for the use case “restrict

communication (cascading control)”.

may be terminated or recycled and the respective resources may be re-used

by subsequent for further tasks. An additional positive effect of short-lived

control is that it further minimizes security risks. Short-lived control of a

platform implies that the agents located on that platform are short-lived

as well, because they will usually not be allowed to migrate away from a

platform that is to be terminated, as this would result in a loss of control.

Therefore, an agent consenting to be controlled implicitly consents to be

terminated at any time as well.

The interactions for terminating controlled agents, RevokeControlAndTer-

minate and RevokeRuleAndTerminate, specified in Table A.8 and A.9 of the

appendix, are very similar to the interactions defined above, with the ad-

ditional result that the agents on the respective platforms are terminated

either immediately after control is revoked or at a later time, depending on

the following conditions: There must be no other remaining activated rule re-

garding the same controller, and the controller who started the service must

be the first in the list of controllers. If the agents are not terminated imme-

diately, an activated rule is added and evaluated whenever activated rules

are removed via the interactions RevokeControl and RevokeControlAndTer-

minate, in order to terminate the agents at the appropriate point of time.

Additionally, when a platform controlled by more than one agent is actually

terminated, other supervisors are notified via the interaction InformAbout-

Termination specified in Table A.10 of the appendix. This interaction is also

used in the cascading control scenario in order to inform the supervisors of

platforms controlled by an agent to be terminated, in order to enable them

to remove the respective activated rules as well. The partial use case “revoke

control and terminate” representing a typical scenario based on these inter-

actions is shown in Figure 5.4. Again, Appendix B illustrates this scenario.

The interactions TerminateAgents and TerminateAgent are regarded as stan-

dard platform management interactions and therefore not specified further

here.

Figure 5.4: Collaboration Diagram for the partial use case “re-

voke control and terminate”.

Finally, an agent may actively request to be controlled by another agent,

mainly in order to be able to carry out a protocol requiring this control

at a certain point. The respective interaction RequestControl is provided as

specified in Table A.11.

An aspect not addressed further in this work is the following: It may

become necessary to enforce termination of platforms from a global platform

manager’s point of view, in cases where the controlling agents deliberately or

100

accidentally fail to release control. This could be handled by using a time-out

after which platforms are terminated automatically, or by using an incentive

or billing mechanism that makes it desirable for a controlling agent to release

control as soon as possible.

5.2.2.2 Anonymous Communication

The required functionality for anonymous communication is realized via an

abstract AnonymizerRole. An agent requiring sender anonymity for a

specific interaction relays that interaction through the AnonymizerRole,

by interacting with it as if it where the actual interaction partner, i.e. the

AnonymizerRole is the interaction partner in the interaction with the ac-

tual initiator, and the initiator in the interaction with the actual partner.

The main interaction is preceded by the initiator providing the required in-

formation via the interaction SetupAnonymizer specified in Table A.12 of the

appendix. The same interaction is used for realizing receiver anonymity, basi-

cally via the same relay mechanism. In this case, the agent requiring receiver

anonymity is the partner in the main interaction, i.e. the roles of initiator

and partner are reversed.

5.3 Design & Implementation

The design of the functionality for controlling communication and for anony-

mous communication is rather straightforward, because roles are mapped di-

rectly to agents and interactions to agent services as shown in Table 5.2 and

Table 5.3 respectively, the only exception being the interactions CheckRule

and ActivateRule, which are realized within a single agent service, because

they are always used in conjunction. The implementation of agents, agent

services and internal components is similarly straightforward and therefore

omitted here. The remaining issue that has to be addressed with regard

to communication control is the implementation of the actual control mech-

anism. As JIAC IV is based on Java, we utilize methods provided via the

Java Security Manager as part of the Java security model. Thus, the super-

visor agent is enabled to define custom security policies, thereby granting or

denying other agents, which are executed as threads in the JVM, access to

resources such as files or sockets for TCP/IP-based communication. Apart

from denial of service attacks in which agents may attempt to disrupt the ex-

ecution of other agents on the same platform by seizing a large amount of the

available resources, all other malicious actions and communication attempts

may be blocked via this mechanism.

101

Table 5.2: The mapping of interactions to agent services.

Interaction Table Agent Service

RestrictCommunication A.1 RestrictCommunication

CheckRule A.2 ImplementRule

ActivateRule A.3

AcquireConsent A.4 AcquireConsent

RevokeControl A.5 RevokeControl

RevokeRule A.6 RevokeRule

InformAboutCascadingControl A.7 InformAboutCascadingControl

RevokeControlAndTerminate A.8 RevokeControlAndTerminate

RevokeRuleAndTerminate A.9 RevokeRuleAndTerminate

InformAboutTermination A.10 InformAboutTermination

RequestControl A.11 RequestControl

SetupAnonymizer A.12 SetupAnonymizer

Table 5.3: The mapping of roles to agents.

Role Agent

AgentRole unspecified

ControllableAgentRole unspecified

SupervisorRole SupervisorAgent

AnonymizerRole AnonymizerAgent

As noted above, the SupervisorAgent and the agent realizing the Plat-

formManagerRole have to be part of a trusted environment, which may

be realized e.g. based on a trusted computing infrastructure. As part of

the deployment phase, however, the trusted environment does not affect the

implementation phase and is therefore not discussed further at this point.

The remaining issue that has to be addressed with regard to anonymous

communication is the implementation of the anonymizer itself. While theo-

retically any approach used for anonymous communication on the Internet

may be mapped to a MAS context, resulting in an agent-based mix net-

work, onion routing approach or a similar system (see Section 3.1.1 and

Section 3.3.1 for related work in this area), we have chosen a simple proxy

mechanism mainly because details of the anonymizer are out of the scope of

this work.

The anonymizer therefore is implemented as a simple component, which

may actually be part of the agent that intends to communicate anonymously,

instead of a separate agent. The anonymizer component creates and termi-

nates the actual relay agents as requested. The relay agents offer and execute

the services that are to be used anonymously. Because they are deployed and

102

controlled by the agent that intends to communicate anonymously, they are

trusted implicitly, and a trusted third party is not required. Currently, the

code for the relay agents has to be created manually, based on the respective

service to be relayed. It is conceivable, however, to automatize this process

and create the respective code dynamically at least for services without a

user protocol.

5.4 Summary

This chapter describes basic functionality for controlling the communication

capabilities of agents, and functionality for anonymous communication of

agents because these kinds of functionality are generally not a feature of

existing Multi-Agent System architectures but at the same time a central

requirement of our approach for agent-based Privacy-Preserving Information

Filtering.

We motivate the need for communication control and anonymous com-

munication as additional functionality in MAS architectures for PPIF (Sec-

tion 5.1). We specify ontologies containing the basic concepts, namely rules

for communication control, sender anonymity, and receiver anonymity (Sec-

tion 5.2.1). Regarding communication control, we specify roles and basic

interactions for establishing and revoking control, interactions for cascad-

ing control, and interactions for additional management functionality (Sec-

tion 5.2.2.1). Regarding anonymous communication, we specify a role and an

abstract interaction (Section 5.2.2.2). Regarding the design and implemen-

tation of the specified functionality, we list agents and agent services, and we

discuss the implementation of the actual control mechanism and the imple-

mentation of the anonymizer (Section 5.3). In the appendix, we give several

examples illustrating the aspects of communication control (Appendix B).

The modules realizing the main use cases of our approach are based on

functionality described in this chapter as well as functionality related to

accessing persistent information in a well-defined manner, which is described

in the following chapter.

103

Chapter 6

Transparent Persistence

This chapter describes an approach for Transparent Persistence in Multi-

Agent Systems (TPMAS). It should be noted that it is possible to use this

approach in any scenario involving the use of large amounts of persistent data,

as it is not adapted especially for Privacy-Preserving Information Filtering.

As motivated below, however, our approach for PPIF strongly benefits from

a transparent persistence management mechanism, and therefore we include

it here as a main component of the approach.

The chapter is structured as follows: Section 6.1 motivates the TPMAS

Module. Section 6.2 describes the ontologies, roles and interactions of the

module, while Section 6.3 describes the agents and agent services realizing

these interactions. Section 6.4 concludes the chapter with a summary.

6.1 Motivation

Information Filtering in general is based on data that has to be stored per-

sistently: It deals with users’ long-term information needs, and therefore the

respective user profile data has to be stored persistently. As in Information

Retrieval, the provider data usually consists of large data sets that are rel-

atively static1, and is therefore best managed by using a persistent storage

mechanism as well.

In standard Information Filtering architectures, the filter entity is not re-

alized as independent from the provider entity. Therefore, the provider may

use a specific persistent storage mechanism, which is usually a Relational

Database Management System (RDBMS), and may utilize filtering tech-

niques operating directly, e.g. via JDBC, on the data store. This straightfor-

1Static in the sense that while single items are usually added and removed constantly,

the bulk of the provider data is kept permanently for a comparatively long time.

105

ward approach has the advantage that it may be optimized for performance

easily. It is visualized in Figure 6.1 as “Standard Architecture”.

Figure 6.1: The motivation for a transparent and generic per-

sistence mechanism in the context of PPIF. The topmost layer

visualizes the data flow in a standard IF architecture. The layer

below visualizes the additional operations introduced by our PPIF

approach, and the bottommost layer shows subsequent optimiza-

tions leading to our final approach.

6.1.1 Persistence Interface

In our approach for Privacy-Preserving Information Filtering, the filter entity

is actually independent from the provider entity. Therefore, the respective fil-

ter role cannot operate directly on the provider’s data store, and an interface

for accessing persistent information is required. Apart from functionality for

retrieving profile data during the information filtering stage (i.e. read-only ac-

cess to the persistent data), the interface should also provide functionality for

storing and retrieving data to facilitate creating and updating profile models

during the information collection stage and the information processing stage.

106

Though these models are dependent on the specific filtering technique, they

cannot be stored internally by the filter role, because the temporary filter

entities accessing these models are short-lived. Therefore, the profile models

have to be stored by the user and provider role respectively.

Because the structure of a profile model is entirely dependent on the spe-

cific filtering technique, and filtering techniques may be based on arbitrary

structures, it is not advisable to realize a persistence interface by providing

various dedicated interaction for specific model structures (e.g. interactions

with the goal of training a neural network, updating a decision tree or car-

rying out a clustering algorithm), because these interactions would always

be potentially incomplete. Instead, the persistence interface should provide

generic functionality for storing and removing data. Because the profiles

and profile models, especially those on the provider side, are usually rather

large, they should not have to be dealt with as a whole. Therefore, the

persistence interface should also provide services for retrieving, storing and

updating parts of a profile. The resulting data flow is visualized in Figure

6.1 as “Unoptimized PPIF”.

6.1.2 Generic Transparent Persistence

In the Privacy-Preserving Information Filtering approach, the retrieval of

potentially sensitive information is monitored by controlling the communica-

tion of agents. As an example, when retrieving parts of the provider profile

within a filtering process, the controlling agent associated with the user role

has to make sure no sensitive information about the user profile is used

within the respective query. Therefore, a uniform query structure should be

used regardless of the actual persistent storage mechanism (which may be a

RDBMS, a file system, or something else). Otherwise, the controlling role

would have to be adjusted to every single storage mechanism used.

While for the controller any uniform structure would be acceptable, a

transparent persistence interface is preferable for the filter: It should be pos-

sible to persist the objects handled by the filter, and to use these objects

when creating queries, rather than having to transform these objects into

some other uniform structure within the filter. Otherwise, the mapping pro-

cess would become needlessly complicated, as both the filter and the provider

would have to carry out transformations. In the case of transparent persis-

tence, transformations only have to be carried out by the provider. This

aspect of transparent persistence is visualized in Figure 6.1 as the part of the

final architecture “PPIF with TPMAS” related to the filter.

Finally, it should be possible to exchange the actual persistent storage

mechanism without having to adjust any of the interactions. In the context

107

of Object-Oriented Software Engineering2, this is achieved via using a per-

sistence mechanism such as the Java Data Objects (JDO) specification [65].

This aspect of generic persistence is visualized in Figure 6.1 as the part of

the final architecture “PPIF with TPMAS” related to the provider.

6.2 Analysis

This section describes the ontologies, roles and interactions of the TPMAS

Module. For the sake of readability, all tables and diagrams specifying these

components may be found in Appendix A.2.

6.2.1 Ontologies

We do not use a special category for persistent objects, rather, it should be

possible to treat objects of all categories defined in arbitrary ontologies as

persistent objects. Therefore, ontologies defining certain categories may be

used without need for adjustments. For reasons of access management and

in order to keep a large number of persistent objects manageable, persistent

objects are stored in groups, namely contexts. Every operation is applied to

a single context, rather than globally to all persistent objects3. A context is

referred to via a unique identifier.

Access control of contexts is supported by a simple authorization ap-

proach in which each context is assigned three authorization tokens for var-

ious access rights (read only access, read/write access, and full access in-

cluding the right to create and terminate a context). These tokens may be

propagated at the discretion of the agent that has created the respective

context. More complex access control mechanisms, such as Role-Based Ac-

cess Control, are not strictly required for our approach and are therefore not

described here. If actually required, they may be added easily on top of the

existing mechanism.

Apart from the categories required to create complex queries, which are

collected in a separate ontology described below, only a few other basic cat-

egories are required for storing and retrieving persistent objects. For the

respective ontology “Transparent Persistence”, see Section A.2 of the ap-

pendix.

2Object-Oriented Software Engineering concepts are actually applicable in this case

within the larger context of Agent-Oriented Software Engineering because the interactions

between the provider agent and the data storage are not agent-based, and the internal

functionality of agents is realized in an object-oriented manner.

3The analogous element in a database management system is a single database.

108

When retrieving objects from a context containing a potentially large

number of objects, it is advisable to keep the size of the list of returned objects

as small as possible, instead of retrieving a large list of perhaps only partially

relevant objects. Therefore, the structure of the query construct used within

the respective interaction should support complex queries, i.e. it should be

possible to express queries in a manner similar to other query languages,

such as SQL or JDOQL, the query language used in the JDO specification.

For this purpose, an ontology-based query structure is provided which allows

conjunctive and disjunctive queries on all attributes of the objects stored

within a context. For the respective ontology “Query Construct”, see again

Section A.2 of the appendix.

6.2.2 Roles & Interactions

This section describes the roles and interactions of the TPMAS Module. For

the role schemas, see Appendix A.2. Table 6.1 provides an overview of the

roles. The only role actually specified in the TPMAS approach is the role

providing, via transparent persistence, access to the actual persistent storage

mechanism, namely the TPMASProviderRole. Because any role may use

the services offered by this role, a designated service user role does not have

to be specified and we use a generic AgentRole for the specification.

Table 6.1: The roles participating in the TPMAS Module.

short name/ aggregated by

role name user provider filter

TPMASProviderRole TPMAS(U) TPMAS(P)

Provides transparent access to a persistent storage mecha-

nism.

AgentRole *(U) *(P)

Does not provide specific functionality. Participates in inter-

actions with other roles.

Two kinds of interactions are provided: Interactions operating on the

context level, and interactions operating on the object level. Within the first

group, the interactions CreateContext and TerminateContext for creating and

terminating contexts are provided as specified in Table A.16 and Table A.17

of the appendix. Within the second group, the interaction ModifyObjects

for storing, updating and removing objects within a contexts is provided

as well as the interaction RetrieveObjects for retrieving objects, based on a

query, as specified in Table A.18 and Table A.19 of the appendix. Because

109

each interaction only involves the TPMASProviderRole and a generic

AgentRole, we omit collaboration diagrams here.

6.2.3 Internal Functionality

It does not make sense to entirely specify the internal functionality of the

TPMASProviderRole at this point, because it largely depends on the

actual MAS architecture on the one hand, and the actual persistent storage

mechanisms on the other hand. However, as mdifferent MAS architectures

and ndifferent persistent storage mechanisms would require m·ndifferent

solutions, it seems appropriate to reduce this number by introducing further

functionality. Using a mechanism for transparent persistence, the number of

different solutions is in fact reduced to m, i.e. one per actual MAS architec-

ture.

We therefore utilize the Java Data Objects (JDO) specification [65]4as

the basis for the mechanism for transparent persistence. As many MAS ar-

chitectures are based on Java, this choice is obvious and at the same time

not too restrictive. There are various open source and commercial implemen-

tations of the JDO specification, which are interchangeable. Therefore, our

approach is not limited to one specific JDO implementation, and we do not

have to choose one at this point.

The JDO specification contains the following main components:



PersistenceCapable interface: For Java objects that are to be made

persistent, the respective classes have to implement this interface in

order to provided the required functionality in the form of fields and

methods. In most JDO implementations, the required code is added

automatically by enhancing the respective classes.



PersistanceManager interface: Persistence managers implementing this

interface manage groups of persistent objects and provide functionality

for adding and removing persistent objects via transactions.



Query interface: JDO implementations provide, via this interface, map-

pings from queries expressed in JDOQL to queries expressed in the

query language of the persistent storage mechanism (e.g. SQL).

For classes implementing the PersistentCapable interface, XML-based

metadata files have be supplied specifying how a class is to be persisted,

4The specification was originally developed under the Java Community Process as Java

Specification Request (JSR) 12, and released in 2002 as JDO 1.0. An extension to the

JDO specification has been developed as JSR 243, and released in 2006 as JDO 2.0.

110

and which fields are to be made persistent. This information is used when

storing persistent objects, and most JDO implementations also provide tools

that generate, based on this information, the actual structures persistent

objects are stored in (e.g. tables within a relational database).

There are other solutions for transparent persistence of Java objects which

could be used alternatively to achieve transparent persistence within MAS

architectures. They are, however, usually less generic with regard to the per-

sistent storage mechanism (as an example, object-relational mapping tools,

such as Hibernate [12] or the Java Persistence API as part of the Enterprise

Java Beans (EJB) 3.0 specification [67] require a Relational Database Man-

agement System as the persistent storage mechanism). Therefore, we have

chosen the JDO specification as the most suitable solution for our approach.

6.3 Design & Implementation

This section describes the agents and agent services of the TPMAS Module,

as well as internal functionality.

6.3.1 Agents & Agent Services

The design of the functionality for transparent persistence is rather straight-

forward, because roles are mapped directly to agents and interactions to

agent services as shown in Table 6.2 and Table 6.3 respectively. The im-

plementation of agents, agent services and internal components is similarly

straightforward and therefore omitted here.

Table 6.2: The mapping of interactions to agent services.

Interaction Table Agent Service

CreateContext A.16 CreateContext

TerminateContext A.17 TerminateContext

ModifyObjects A.18 ModifyObjects

RetrieveObjects A.19 RetrieveObjects

Table 6.3: The mapping of roles to agents.

Role Agent

AgentRole unspecified

TPMASProviderRole TPMASProviderAgent

111

6.3.2 Internal Functionality

In order to facilitate the utilization of a JDO implementation within a MAS

architecture, we provide the following functionality:



Functionality for dynamically creating a Java class for a category of

an ontology, and a bidirectional mapping of objects of this Java class

to objects of the category. MAS architectures may use Java classes

themselves as ontology categories, without using a separate language

in which ontologies are expressed. In this case, the functionality obvi-

ously does not have to be actually implemented. However, other MAS

architectures, such as JIAC IV, use a separate language for specifying

ontologies. Therefore, in this case Java classes have to be created dy-

namically. A dynamically created Java class representing a category

of an ontology contains fields matching the attributes of the category

(thus, a mapping of the base types defined in the respective ontology

language to Java base types is required), a constructor with the cat-

egory attributes as input parameters, and, for the reverse mapping,

methods with the category attributes as return parameters.



Functionality for creating all required metadata information and files

for a given category. Metadata information is normally not created

automatically in order to give the developer greater control over the

data storage schema and the way objects are made persistent. Because

this course of action is usually not required in our approach, we actually

create all metadata information automatically, which turns out to be a

rather straightforward task. It should be noted, though, that ontology

objects may also be mapped to already existing elements stored in a

persistent storage mechanism. In this case, the respective metadata has

to be created manually, based on the structure of the stored elements.

Figure 6.2 shows the main tasks carried out by our TPMAS implementa-

tion and a JDO implementation for handling persistent objects and queries,

including all aspects of the previous list. Note that the tasks related to

preparing a context, while shown separately from the tasks related to storing

an object, are carried out, if necessary, immediately before storing an object

and not as part of a separate agent service. Therefore, they are hidden from

the service user who does not have to keep track of whether a context has

already been prepared for storing objects of a certain category.

112

Figure 6.2: Overview of tasks provided by the TPMAS Module

implementation and a JDO implementation for handling persis-

tent objects and queries.

113

6.4 Summary

This chapter describes an approach for Transparent Persistence in Multi-

Agent Systems (TPMAS) which is a main component of our approach for

Privacy-Preserving Information Filtering, but at the same time may be used

independently in any scenario involving the use of large amounts of persistent

data.

We motivate the concept of transparent persistence by showing that our

approach for PPIF requires a persistence interface and benefits from generic

transparent persistence (Section 6.1). We specify ontologies containing the

basic concepts (Section 6.2.1). We specify roles and basic interactions for

handling contexts and objects (Section 6.2.2), and we discuss internal func-

tionality required for our approach (Section 6.2.3). Regarding the design

and implementation of the specified functionality, we list agents and agent

services (Section 6.3.1), and we discuss the implementation of the internal

functionality (Section 6.3.2).

With the functionality described in this and the previous chapter as a

foundation, we are now able to describe the modules realizing the main use

cases of our approach in the following chapters.

114

Chapter 7

The Recommender Module

This chapter describes functionality provided by the Recommender Module,

i.e. it primarily addresses the use cases “get prediction for item” and “get

recommendations”, as defined in Section 4.1. Moreover, it addresses the

use cases related to the first two IF stages, namely the use cases “update

profile elements” and “update profile model”. In addition to the use of

the Recommender Module functionality in a Recommender System context,

this chapter also covers its use in a Hybrid IF System context, because the

interactions are largely similar in both cases.

The chapter is structured as follows: Section 7.1 briefly motivates the

Recommender Module. Section 7.2 describes the ontologies, roles and in-

teractions of the module, while Section 7.3 describes the agents and agent

services realizing these interactions. Section 7.4 concludes the chapter with

a summary.

7.1 Motivation

The Recommender Module constitutes one of the two core modules of our

approach for Privacy-Preserving Information Filtering: Together with the

Matchmaker Module, it addresses all use cases defined in Section 4.1. It

provides primary functionality related to the requirements of user privacy,

provider privacy and filter privacy (see Table 4.2). Thus, the need for func-

tionality described in this chapter is motivated directly by the outline of our

solution given in Section 4.2, as the abstract IF protocols introduced in the

outline are realized via agent interactions, i.e. as part of agent services.

115

7.2 Analysis

This section describes the ontologies, roles and interactions of the Recom-

mender Module. For the sake of readability, all tables and diagrams specify-

ing these components may be found in Appendix A.3.

7.2.1 Ontologies

The main ontology of this module, namely the ontology “Information Filter-

ing” shown in Figure A.5 of the appendix, contains categories and attributes

that are directly derived from the definitions given in Section 2.2.1. These

are explained further in the context of the interactions they are used in.

7.2.2 Roles and Interactions

This section describes the roles and interactions of the Recommender Mod-

ule. For the role schemas, see Appendix A.3. The main abstract entities

(user entity, provider entity, and filter entity) introduced in Section 2.2.1 are

split into and mapped to different roles, each providing specific functionality

as described in Table 7.1. The roles InterfaceRole,ProfileManager-

Role,RelayRole, and TPMASProviderRole are aggregated by the

user entity as well as by the provider entity. The roles TFERole and TFE-

FactoryRole are exclusively aggregated by the filter entity.

Interactions are defined according to the three stages of Information Fil-

tering in the following sections. In these steps, all participating roles are

assumed to act in an honest or at least honest-but-curious manner, i.e. they

follow the specified protocols (see Section 2.3.3.1 for definitions of adversary

models). Roles aggregated by the same abstract entity are assumed to always

act in an honest manner with regard to each other, i.e. in interactions within

the respective abstract entity. Additionally, these interactions are considered

to be unobservable with regard to other roles aggregated by different entities1.

Thus, threats related to privacy have to be considered whenever interactions

between roles aggregated by different abstract entities take place. Additional

threats emanating from roles acting in a malicious manner are discussed and

addressed in Section 7.3.1. They do not have to be considered here because

it turns out that they are addressable by refining the single protocol steps of

interactions, from which we abstract in the analysis phase.

The TPMASProviderRole only interacts with the ProfileMan-

agerRole of the respective abstract entity, via the interactions specified

1As discussed in Section 2.4.2, we consider this condition to be fulfilled in the underlying

MAS architecture

116

Table 7.1: The roles participating in the Recommender Module.

short name/ aggregated by

role name user supplier filter

InterfaceRole IR(U) IR(S)

Responsible for interaction with human users or other soft-

ware.

ProfileManagerRole PMR(U) PMR(S)

Responsible for the management of a profile, which may be

accessed through this role only. Provides agents realizing the

RelayRole. Responsible for controlling agents agents of

other main abstract entities realizing the RelayRole.

RelayRole RR(U) RR(S)

Responsible for controlling agents of other main abstract en-

tities realizing the RelayRole and the TFERole.

TPMASProviderRole TP(U) TP(S)

Provides transparent access to a persistent storage mecha-

nism.

TFERole TFE

Carries out tasks of the information processing stage and the

information filtering stage.

TFEFactoryRole FF

Provides agents realizing the TFERole.

in Section 6.2.2. We omit this interactions, which may be mapped to the

interactions of the ProfileManagerRole in a straightforward manner, in

the following.

7.2.2.1 Information Collection Stage

The information collection stage deals with interactions related to creating

and updating profiles. Because the basic profile data associated with a given

abstract entity does not depend on a specific filtering technique, and is not

directly related to a second entity, there are actually no threats related to

privacy that have to be addressed. As an example, a human user may add

his favorite movie to his personal profile via a Graphical User Interface (GUI)

provided by the InterfaceRole of his personal agent in an unobservable

way, i.e. even without the entity that originally provided the related informa-

tion noticing. The user may subsequently remove the movie from his profile,

or add movies from other sources in the same way. The information provider

117

profile is updated in a similar manner, though usually on a larger scale and

via an API rather than a GUI.

Therefore, this stage requires only basic interactions addressing the use

case “update profile elements”, namely the interactions UpdateProfile and

QueryProfile , which provide functionality for updating and querying profiles

as specified in Table A.21 and Table A.22 of the appendix. Figure 7.1 il-

lustrates these interactions via the partial use case “create user profile” as a

special case of the main use case “update profile elements”.

Figure 7.1: Collaboration Diagram for the partial use case “cre-

ate user profile” as a special case of the main use case “update

profile elements”.

It may seem unnecessary to specify these additional interactions as we

have specified similar interactions for transparent persistence in the previous

chapter. These interactions are in fact used by the ProfileManagerRole

in order to store profiles persistently. They are not used directly, however,

because of additional functionality (for which see Section 7.2.2.2) provided by

the ProfileManagerRole, which is partly triggered by the interactions

of the Information Collection stage. Additionally, this course of action keeps

the overall architecture flexible, because the relation between profiles and

contexts is not fixed and may be arranged by the ProfileManagerRole

as it sees fit2, and it keeps the interface for the interaction partner simple

because it does not have to deal with managing contexts and access control

data.

2For example, a large profile may be stored across multiple databases and therefore

in different contexts, which may even be managed by different agents implementing the

TPMASProviderRole. These details are likely to be irrelevant for the respective service

user.

118

7.2.2.2 Information Processing Stage

The raw profile data collected in the first stage may be used directly as input

for a filtering technique in the Information Filtering stage. More complex

filtering techniques, however, require a further processing of the collected

data, resulting in models structuring the profile data in a certain way. Models

may be used on both user profile and provider profile data, or only on data

of a single profile, again depending on the filtering technique.

Different filtering techniques, such as minor variations of the same main

technique, may use the same profile model. Therefore, the ontology “Infor-

mation Filtering” groups filtering techniques by the profile model they are

based on. In order to be able to create and maintain a model at this stage,

the filtering technique to be applied in the following stage has to be known. If

different filtering techniques are to be applied, different corresponding mod-

els have to be maintained. In principle, the required models could be created

as a first step of the Information Filtering stage itself, but this approach is

usually infeasible due to the complexity of the process combined with the

fact that the Information Filtering stage may be initiated directly by a hu-

man user waiting for the results, which makes it more time-critical than the

preceding stages.

Nevertheless, for all but the most basic models, the algorithm used to

create and maintain the profile models should be considered part of the fil-

tering technique itself and is therefore provided by the filter entity. Thus,

two entities are involved in each process of the Information Processing stage,

and privacy aspects have to be addressed3. The filter entity is responsible,

via functionality provided by the a TFERole, for creating and updating

the profile models. The agent realizing this role is located on a platform

controlled by a user or a provider entity, depending on the profile on which

a model is to be created or updated. Interactions specified in Section 5.2

are used for controlling the platform. The agent realizing the TFERole

is created via a manager role, the TFEFactoryRole, via the interaction

ObtainTFE specified in Table A.23 of the appendix. Figure 7.2 illustrates the

respective partial use case “set up temporary filter entity”.

Due to the complexity of the process, it is advisable to create and update

large profile models independent of the actual information filtering process

itself. Therefore, it is necessary for the respective abstract entity to announce

an intended future use of a certain group of filtering techniques to its Pro-

fileManagerRole in order to trigger the creation of the respective model.

3Note that while all three abstract entities are involved in collaboration-based processes

of the Information Processing stage, we do not have to address this complication in the

context of Recommender System functionality, for reasons discussed in Section 2.2.1.

119

Figure 7.2: Collaboration Diagram for the partial use case “set

up temporary filter entity”.

For this reason the interaction SetUpdatePolicy specified in Table A.24 of the

appendix is provided by the ProfileManagerRole. It allows its initiator

to define a profile model update policy and a group of filtering techniques to

be used on a the respective profile or group of profiles.

Additionally, the ProfileManagerRole acts as a relay between the

TFERole and the TPMASProviderRole that handles the persistent

storage of the profiles. Therefore, similar to the profile management inter-

actions introduced in Section 7.2.2.1, the additional interactions UpdatePro-

fileModel and QueryProfileModel are provided as specified in Table A.25 and

Table A.26 of the appendix. These interactions are required at this stage

because the TFERole realized by an agent on a controlled platform cannot

communicate with the TPMASProviderRole directly (unless the con-

troller agent itself realizes this role as well) and therefore a relay is required.

Finally, for creating and modifying a profile model, the interaction Mod-

ifyProfileModel is provided as specified in Table A.27 of the appendix. This

interaction is initiated by the ProfileManagerRole, based on the respec-

tive update policy: If a profile is to be updated immediately, the interaction

is triggered whenever the interaction UpdateProfile is carried out. If a profile

is to be updated periodically, it is triggered by an internal timer. If an update

policy implies that a new profile model has to be created, it is started imme-

diately as well. Figure 7.3 illustrates the respective main use case “update

120

profile model”.

In each case, the ProfileManagerRole is responsible for supplying

the appropriate profile elements. In the first case, they may be carried over

directly from the respective profile management service. In the second case,

the ProfileManagerRole either has to keep track of all elements received

after the last profile model update or, as is done in the third case as well,

use the interaction RetrieveObjects to obtain the appropriate elements.

These interactions are sufficient if the TFERole is assumed to be honest

or at least honest-but-curious, because there is no way to propagate private

information outside the specified interactions.

Figure 7.3: Collaboration Diagram for the main use case “up-

date profile model”.

If the filter entity considers the generated model to contain sensitive infor-

mation, such as data that could be analyzed in order to obtain information

about the algorithm used by the respective filtering technique, the model

should be regarded as private data of the filter entity and subsequently it

should be protected accordingly. As described in more detail in the context

of exemplary filtering techniques in Chapter 9, this may be achieved by en-

121

crypting the model before it is propagated from the filter entity to another

entity.

7.2.2.3 Information Filtering Stage

The final stage providing the information filtering process itself uses the

data collected and processed in the preceding two stages and compares two

profiles in order to generate recommendations or a prediction for a given

item. Three different abstract entities are involved in this stage: The user

entity, a supplier entity which may be a provider entity (in a Recommender

System context) or a different user entity (in a Hybrid IF System context4),

and a filter entity. Therefore, it is the most complex stage with regard to

privacy threats.

The TFERole introduced in the previous section is used in this stage

to carry out the actual filtering process. It is neither required nor possible

to actually use the same agent for both tasks, because the respective agent

is terminated at the end of the information processing stage. With regard

to functionality, the TFERole actually aggregates two partial roles, one

used in the Information Processing stage and one used in the Information

Filtering stage, because different algorithms may be used in these stages.

However, the partial roles have to dovetail in order for the actual filtering

technique to be applicable to the generated profile models. Apart from the

actual algorithm applied, they are utilized in similar manners. Therefore, we

subsume these partial roles as the TFERole.

Based on the outline of the information filtering process described de-

scribed in Section 4.2, we describe the essential interaction steps for the use

cases based on linkable result data and private result data (including the

Hybrid IF System scenario) in Table 7.2 and Table 7.3 respectively.

For the Hybrid IF System scenario, we assume the result data to be

completely private, mainly because there is no reason why the supplier, who

in this case represents another user, should obtain the result data which

is part of his user profile and as such no new information. Therefore, the

protocol outlined in Table 7.3 could be used in this case. However, in order to

provide an additional incentive for the supplier to participate in the process

at all, additional result data should returned by the TFERole which is

actually relevant for the supplier, such as recommendations taken from the

user profile. While this could be achieved by applying the protocol with the

roles of user and supplier reversed, it is easier and more efficient to propagate

4While this chapter focuses on Recommender System functionality, the introduced

protocols may also be used in a Hybrid IF System, as described in Section 8.2.2.3. For

this reason, we describe them in a generalized form here.

122

Table 7.2: The essential interaction steps of the information fil-

tering stage for the use cases based on linkable result data, based

on the abstract protocol shown in Figure 4.1. In the case of semi-

linkable result data, the user entity roles have to remain anony-

mous in all interactions with roles of other entities.

Step Sender →Receiver Message part of interaction

I.a RR(U) restricts communication of TFE

I.b PMR(U)→RR(U)PRuQueryProfileModel

I.c RR(U)→TFE PRuQueryProfileModel

II.a PMR(S) restricts communication of RR(U), TFE

II.b PMR(S)→RR(U)PRsQueryProfileModel

II.c RR(U)→TFE PRsQueryProfileModel

III.a TFE →RR(U)RES GetResultsAsUser

III.b RR(U)→PMR(S)RES GetResultsAsSupplier

III.c PMR(S)→PMR(U)RES GetResults

III.d RR(U) terminates TFE

III.e PMR(S) terminates RR(U)

all result data in a single protocol. In this case, the result data contains

specific information for the user and the supplier, i.e. RES =RESu∪RESs.

Because the TFERole uses sensitive information related to two different

abstract entities in this stage, it has to be controlled by both entities, or,

more precisely, by each entity as soon as the respective sensitive information

is provided. As described in Section 5.2, effective control by more than one

controller can only be established through cascading control, i.e. one of the

controllers has to be controlled in turn.

We therefore introduce an additional role, the RelayRoleUser. This role,

which is aggregated by the abstract user entity, controls the TFERole and

is in turn controlled by the ProfileManagerRoleProvider. This second

control is established after the TFERole has received all user profile data

required for the filtering process. This is done for the following reason: If

control of the RelayRoleUser would be established at the beginning of the

filtering process, the user profile information would have to be communicated

via the ProfileManagerRoleSupplier, and the privacy of the user could

not be preserved. Using this construction, however, there is no way for the

TFERole to receive user profile data after a certain point, and especially

not after having received supplier profile data. This limitation has to be

taken into account when designing or selecting suitable filtering techniques,

and is therefore addressed in Chapter 9. For use cases based on private result

data, an additional RelayRoleSupplier is required for the result propagation,

123

Table 7.3: The essential interaction steps of the information fil-

tering stage for the use cases based on private result data, based

on the abstract protocol shown in Figure 4.2. In the case of semi-

private result data, all user roles must remain anonymous in in-

teractions with other roles.

Step Sender →Receiver Message part of interaction

I.a RR(U) restricts communication of TFE

I.b PMR(U)→RR(U)PRuQueryProfileModel

I.c RR(U)→TFE PRuQueryProfileModel

II.a RR(S) restricts communication of RR(U), TFE

II.b PMR(S)→RR(S)PRsQueryProfileModel

II.c RR(S)→RR(U)PRsQueryProfileModel

II.d RR(U)→TFE PRsQueryProfileModel

III.a PMR(U) restricts communication of RR(S), TFE

III.b TFE →RR(U)RES GetResultsAsUser

III.c RR(U)→RR(S)RES GetResultsAsSupplier

III.d RR(S)→PMR(U)RES GetResults

For semi-private result data:

repeat 3.5 ∀res ∈RES:

III.e PMR(U)→PMR(S)res ExchangeResults

For completely private result data:

III.e omitted

For the Hybrid IF System scenario:

III.e PMR(U)→PMR(S)RESsExchangeResults

III.f RR(U) terminates TFE

III.g RR(S) terminates RR(U)

III.h PMR(U) terminates RR(S)

as described in Table 7.3.

The additional interactions required in this stage, namely the interac-

tions GetResultsInternally,GetResults,GetResultsAsSupplier,GetResultsAsUser,

ExchangeResults, and ObtainRelay are specified in Table A.28, Table A.29, Ta-

ble A.30, Table A.31, Table A.32, and Table A.33 of the appendix. Finally,

the interaction ShareKeys as specified in Table A.34 of the appendix is used

by roles aggregated by the same abstract entity for exchanging keys used in

encryption schemes. It is included here because while it is not required as

long as only honest and honest-but-curious participants are assumed, is is

required in case of malicious participants, as described in Section 7.3.1.

Figure 7.4 illustrates the use cases “get recommendations” and “get pre-

124

Figure 7.4: Collaboration Diagram for the main use cases “get

recommendations” and “get prediction for item”, based on link-

able result data in a Recommender System context.

diction for item”, based on linkable result data in a Recommender System

context. Figure 7.5 illustrates the same use cases, based on private result

data in a Recommender System context.

Query Data Propagation As indicated by its name, the RelayRoleUser

has to act as a relay for interactions initiated by the TFERole. It par-

ticipates in the interactions QueryProfile and QueryProfileModel as well, but

instead of interacting directly with a TPMASProviderRole (as the Pro-

fileManagerRole does), it relays the queries by interacting with the Pro-

fileManagerRole of the other abstract entity.

In the Hybrid IF System scenario, where the supplier represents a dif-

ferent user entity, entire user profiles may be retrieved independent of each

other because they are generally rather small. In the Recommender System

scenario, however, retrieving an entire provider profile is often infeasible due

125

Figure 7.5: Collaboration Diagram for the main use cases “get

recommendations” and “get prediction for item”, based on private

result data in a Recommender System context.

to its size. This may not apply to the mixed IR/IF scenario in which a

constrained provider profile is used that is obtained via an additional non-

privacy-critical query. Regular queries on the supplier profile (including the

supplier profile models), however, are potentially critical with regard to user

privacy, because the TFERole may use parts of the user profile within the

query structure. This course of action should not be prevented completely,

because it is actually the only feasible way to obtain a partial supplier pro-

file containing the relevant parts of a supplier profile, short of retrieving the

entire profile, as the relevant parts are expected to be those that have some

relation to the user profile.

When querying the supplier profile, the respective filtering algorithms

have to take user privacy into account and use either unlinkable user profile

elements in the query, or no user profile elements as such at all. Exemplary

filtering techniques for both approaches are given in Chapter 9. In the case

126

Table 7.4: The Phase II interaction steps of the information fil-

tering stage for scenarios based on unlinkable queries and for the

use cases based on linkable result data. The RelayRoleUser has

to remain anonymous in all interactions with the ProfileMan-

agerRoleSupplier.

Step Sender →Receiver Message part of interaction

repeat II.a to II.e ∀pr ∈PRu:

II.a TFE →RR(U)q(pr)QueryProfileModel

II.b RR(U)→PMR(S)q(pr)QueryProfileModel

II.c PMR(S)→RR(U){PPq(pr)}KPQueryProfileModel

II.f PMR(S) restricts communication of RR(U), TFE

II.g PMR(S)→RR(U)KPQueryProfileModel

II.h RR(U)→TFE PPq(P Ru)QueryProfileModel

of honest and honest-but-curious participants we assume the TFERole to

actually use a privacy-preserving approach for querying. Threats originating

from a malicious TFERole are addressed in Section 7.3.1.2.

Unlinkable queries have to be realized through anonymized interaction:

The RelayRole sends single queries to the ProfileManagerRoleSupplier

(or the RelayRoleSupplier in case of private result data), and receives the

respective results. Because agents on controlled platforms cannot commu-

nicate anonymously, these interactions have to be carried out before control

of the RelayRole is established. In order to protect the provider data, it

is send in encrypted form by the ProfileManagerRoleSupplier, who pro-

vides the key only after control has finally been established, i.e. after the

final anonymous interaction. Taken together, these steps (as listed in Table

7.4 and Table 7.5) replace the steps of Phase II of the abstract protocol.

The unlinkability of single queries obviously depends on the number of

parallel interactions of different agents realizing the RelayRole with one

supplier: If only one single filtering process takes place in a given time period,

unlinkability is not achieved. Unfortunately, it is difficult for the user to come

up with a realistic estimation of this number. There are three approaches

for increasing the degree of unlinkability in case a low number of parallel

interactions is suspected:



The time period may be increased by deliberately delaying the single

interactions. While the probability of parallel interactions rises with

increasing length of the time period, this approach also results in in-

creasing response times, which may be critical in case the user actively

waits for results.

127

Table 7.5: The Phase II interaction steps of the information fil-

tering stage for scenarios based on unlinkable queries and for the

use cases based on private result data. The RelayRoleUser has

to remain anonymous in all interactions with the ProfileMan-

agerRoleSupplier.

Step Sender →Receiver Message part of interaction

repeat II.a to II.c ∀pr ∈PRu:

II.a TFE →RR(U)q(pr)QueryProfileModel

II.b RR(U)→RR(S)q(pr)QueryProfileModel

II.c RR(S)→PMR(S)q(pr)QueryProfileModel

II.d PMR(S)→RR(S){PPq(pr)}KPQueryProfileModel

II.e RR(S)→RR(U){PPq(pr)}KPQueryProfileModel

II.d RR(S) restricts communication of RR(U), TFE

II.e PMR(S)→RR(S)KPQueryProfileModel

II.f RR(S)→RR(U)KPQueryProfileModel

II.g RR(U)→TFE PPq(P Ru)QueryProfileModel



Additional interactions based on dummy queries may be initiated by

the RelayRole.



Entire dummy interactions of the type GetResults may be initiated by

the ProfileManagerRoleUser. This approach may be problematic

with regard to the overall performance.

In the following, we assume that the number of parallel interactions is

sufficiently large, which is a realistic assumption for systems handling a large

number of users.

Result Data Propagation In both main use cases, the results, i.e. recom-

mendations, similar users, or a prediction, have to be propagated along the

cascade of controllers. In the Recommender System scenario, the supplier

may obtain the personalized information as well, mainly in order to improve

the quality of its information, based on data about information in high de-

mand. In the distributed Hybrid IF System scenario, result information is

usually not propagated to the other participating entity because it is neither

required nor would it generally be possible to realize unlinkability in this

case, because a user generally participates in parallel interactions to a much

smaller extent compared to a provider.

As described in Section 4.1, there are four different cases with regard to

the propagation of result data, which are realized by adjusting the steps of

Phase III of the information filtering protocol:

128



Completely Linkable Result Data: In this scenario, no adjustments are

required.



Semi-Linkable Result Data: In this scenario, the ProfileManager-

RoleUser has to remain anonymous in all interactions with the Pro-

fileManagerRoleSupplier. The interaction steps as such do not have

to be adjusted.



Semi-Private Result Data: In this scenario, the result data is propa-

gated to the ProfileManagerRoleUser via an additional relay, the

RelayRoleSupplier. In single anonymous interactions, the Profile-

ManagerRoleUser propagates the result data to the ProfileMan-

agerRoleSupplier.



Completely Private Result Data: In this scenario, the final interaction

step between ProfileManagerRoleUser and ProfileManager-

RoleSupplier is omitted.

Other protocols are conceivable, especially for the scenarios based on pri-

vate result data. By encrypting the result data, it would be possible to forgo

the ProfileManagerRoleSupplier. It will turn out, however, that these

alternatives are less suitable in the case of malicious participants, which is

addressed in the following section. Therefore, they are not examined further

at this point.

7.2.3 Summary

In the analysis phase, we have defined basic interactions addressing all threats

originating from honest-but-curious participants in our approach by counter-

measures as shown in Table 7.6. These basic interactions have to be refined

further in order to address threats originating from malicious participants.

We combine this refinement with other tasks of the design phase, which is

discussed in the following section.

7.3 Design & Implementation

This section describes the agents and agent services of the Recommender

Module. The interactions defined in the previous sections have to be refined

further in order to address threats originating from malicious participants,

as well as other aspects. It turns out that the interactions as such are suf-

ficient, but the interaction steps have to be extended in many cases. The

129

Table 7.6: Threats in PPIF with honest-but-curious partici-

pants, and countermeasures in our approach.

permanent by user by supplier by filter

acquisition of

user profile does not acquire TFERole

data n/a linkable data is controlled

permanently by user

supplier profile RelayRoleUser TFERole

data is controlled n/a is controlled

by supplier by supplier

only acquired as TFERole

result data n/a specified is controlled

by user

following section describes extensions as countermeasures for various threats

originating from malicious participants. Subsequent sections describe exten-

sions addressing other aspects, and also the agents and agent services.

7.3.1 Threats and Countermeasures

Malicious participants may deviate from the specified protocols in any con-

ceivable way either in order to propagate private information related to an-

other participant, or in order to alter or disrupt the overall interaction for

other reasons. Because there is no possible way for malicious participants

on controlled platforms to communicate with external parties, we are able

to restrict the discussion of threats originating from malicious participants

to the interactions taking place along the cascade of controllers up to the

initiator, i.e. to interactions between roles as defined in the Section 7.2.2 of

this chapter as shown in Figure 4.1 and Figure 4.2 for the use cases based on

linkable result data and private result data respectively.

As there is no critical interaction between roles aggregated by the same

abstract entities, the opportunities for deviations are limited: Attempts to

establish an additional channel for propagating private information are easily

detected and stopped, basically because these attempts would be registered

as obvious deviations from the protocol. Therefore, critical malicious at-

tempts are limited to the following two main aspects, which are discussed in

the following sections: Altering queries on the supplier profile, and altering

result data. We discuss modifications to the protocols which address these

threats in the following, starting with the description of various protocols as

building blocks for secure message forwarding.

130

7.3.1.1 Secure Message Forwarding

We introduce two generic protocols for secure message forwarding as building

blocks that are applied in the following sections in order to eliminate threats

based on altered result data and the use of subliminal channels.

Consider a scenario involving three entities A,B, and C.Aand Care

not able to communicate directly, but both are able to communicate with

B, whom they do not trust. Aand C, however, have been able to exchange

unlimited information, including a shared key KAC, earlier. Bhas a secret

key KB.Aintends to propagate the message mto C, which has to be done

via B.Bmay know the content of the message, but should not be able

to alter it. On the other hand, Bwill only forward the message if it can

be certain that no additional information is propagated via any subliminal

channel (such as a key used by A). The message mitself is not considered

to contain any hidden information. This restriction is somewhat problematic

because it is conceivable that Aand Chave previously agreed on some code

to be used in the message. However, Bmay modify the original message prior

to the actual protocol steps until he is convinced that it does not contain any

hidden information.

This scenario is generally known as the prisoners’ problem [102], because

a real-life analogy consists of two prisoners intending to communicate, which

has to be done via a warden who insists on being able to access all messages

in unencrypted form and will only forward messages if they contain infor-

mation he has approved. The prisoners, on the other hand, want to prevent

the warden from modifying the messages in an undetectable manner. The

prisoners’ problem has been introduced to motivate the use of subliminal

channels [102], whereas in our solution for secure message forwarding the

goal is to prevent the participants from using subliminal channels.

The most straightforward protocol for secure message forwarding is a

protocol based on digital signatures, as listed in Table 7.7. However, digital

signature schemes have been shown to contain subliminal channels [103] and

are therefore unsuitable in this context.

We provide the following solution for a secure message forwarding pro-

tocol (designated SMF1): As listed in Table 7.8, a keyed-Hash Message Au-

thentication Code (HMAC) of the message mis propagated by Ain addition

to the message itself in order to prevent undetectable modifications of the

message (Step a). The HMAC is encrypted by Bwith a secret key, and the

encrypted HMAC is propagated, along with a hash of the key (Step c). The

key KAC is sent to Bin order to allow Bto verify that the HMAC is not

used as a subliminal channel, i.e. that it is actually an encrypted hash of the

message m(Step d). Finally, the message itself is propagated by B, along

131

Table 7.7: A protocol for secure message forwarding based on

digital signatures. sK(x) indicates a message xsigned via a private

key K. The corresponding public key may be used to verify the

signature.

Step Sender →Receiver Message

a A →B m, sKA(h(m))

b B checks m

c B checks h(m) via public key KA0

d B →C m, sKA(h(m))

e C checks h(m) via public key KA0

with the key KB, which allows Cto verify that the message has not been

altered (Step f).

Propagating the hash of the key KBprevents Bfrom altering the message

to m0after Step d, which would be undetectable otherwise because Bcould

choose a key KB0so that {{h(m)}KAC }KB={{h(m0)}KAC }KB0. The key

KAC cannot be used as a subliminal channel: If Aand Cagree on a number

of different keys KACnbefore the first step of the protocol, and Auses a

specific key KACiin order to propagate additional information, Chas to try

out various keys until a valid hash is obtained. This is not possible because

Chas to propagate KAC before he obtains the encrypted hash. Thus, C

would have to guess the correct key.

Table 7.8: A protocol for secure message forwarding (SMF1)

based on a HMAC.

Step Sender →Receiver Message

a A →B m, {h(m)}KAC

b B checks m

c B →C h(KB),{{h(m)}KAC }KB

d C →B KAC

e B decrypts and checks h(m)

f B →C m, KB

g C checks h(KB)

h C decrypts and checks h(m)

Obviously, the keys KBand KAC may only be used once, which is un-

fortunate because it adds to the complexity of the communication required

for key exchange. This drawback has to be put up with in order to achieve

secure message forwarding. The encryption scheme used for encrypting the

hash obviously has to be secure against known-plaintext attacks, because

132

otherwise Bmay be able to obtain KAC after Step aand subsequently al-

ter min an undetectable way. Additionally, the encryption scheme must

not be commutative, i.e. a scheme where {{m}KA}KB={{m}KB}KAcannot

be used for this protocol: In this case, Bcould alter mto m0and choose

a key KB0so that {m}KB={m0}KB0and, because of the commutativity,

{{m}KAC }KB={{m0}KAC }KB0. By propagating KB0instead of KB,Bwould

cause Cto unknowingly decrypt m0instead of m.

We additionally provide a slightly modified protocol (SMF2), listed in

Table 7.9, in which the message itself is encrypted, instead of using an HMAC.

This solution has a slightly lower communication complexity, but a higher

computational complexity (as shown in Table 7.10). It is somewhat less

suitable for cascading secure message forwarding involving more than three

participants, but more suitable for secure iterative message forwarding, as

described in the following. This modified protocol does not suffer from the

vulnerabilities mentioned above, i.e. a commutative encryption scheme may

be used here as well as an encryption scheme vulnerable against known-

plaintext attacks (although both options are generally not recommended).

Table 7.9: A protocol for secure message forwarding (SMF2)

based on a symmetric encryption scheme.

Step Sender →Receiver Message

a A →B{m}KAC

b B →C h(KB),{{m}KAC }KB

c C →B KAC

d B decrypts and checks m

e B →C KB

f C decrypts m

g C checks h(KB)

Cascading Secure Message Forwarding A generalized case of the pro-

tocol for secure message forwarding is the following: A message is to be

forwarded securely along a cascade of participants A1,B1, .., An−1,Bn−1,

An,Bn(with Bnbeing optional), in which each Aishares keys with and

trusts the other Aj, but does not trust any Bk, and vice versa. Every par-

ticipant along the cascade has to be able to verify the message m, but must

not be able to modify it. This is achieved by repeating the protocol steps

introduced above for every three consecutive participants. For SMF1, this is

especially efficient because the final step of the iteration i−1 may be merged

with the first step of the iteration i, resulting in a reduced communication

complexity because monly has to be propagated once. For SMF2, no such

133

Table 7.10: A comparison of the protocols for secure message

forwarding in terms of communication complexity and computa-

tional complexity. The size of hashes and keys is constant and

therefore almost insignificant in relation to the size of messages.

The same applies with regard to the computational complexity

of encrypting and decrypting hashes vs. messages. I(x) denotes

information of size x.

SMF1 SMF2

communication complexity

# of I(m) 2 2

# of I(h) 3 1

# of I(K) 2 2

computational complexity

# of encryptions of I(m) – 2

# of decryptions of I(m) – 4

# of encryptions of I(h) 2 –

# of decryptions of I(h) 4 –

# of hashing operations on I(m) 3 –

# of hashing operations on I(K) 2 2

optimization is possible. Additionally, when used in this way, the restrictions

described above regarding vulnerability against known-plaintext attacks and

commutativity apply to SMF2 as well.

Secure Iterative Message Forwarding Another generalized case of the

protocol for secure message forwarding is the following: Aintends to forward

a number of messages to Cvia B, but Bmust not be able to withhold the

propagation of message miand still obtain subsequent messages mi+xat

the same time. SMF1 cannot be used for this task without modifications,

because Awould not be able to decide whether to start another iteration of

the protocol. SMF2, on the other hand, may be used for this task because

if Cdoes not receive the information required to obtain mi, he may refuse

to proceed with subsequent iterations, which prevents Bfrom being able to

decrypt mi+x. Finally, the protocol for cascading secure message forwarding

may be combined with the protocol for secure iterative message forwarding.

The protocol listed in Table 7.11 constitutes an example.

134

7.3.1.2 Altering Queries

Based on the protocols for secure message forwarding, we are now able extend

the interactions described above in order to counter malicious threats.

Queries on a profile are relayed through the RelayRole associated with

the opposing participating entity. The RelayRole therefore has to de-

cide whether a query is used as specified, i.e. to retrieve data in a privacy-

preserving way, or whether the query is used instead to propagate private

information. In the most straightforward case, a complete profile is returned

and the respective query does not contain any specific information at all. If

single profile elements are used within queries, unlinkable interactions may

be carried out as described in Section 7.2.2.3. In all other cases, the decision

becomes more complicated. Ultimately, if the user does not trust the fil-

ter completely, filtering techniques based on more advanced query structures

should not be used because the possibility of using the queries as subliminal

channels cannot be ruled out completely.

7.3.1.3 Altering Result Data in Recommender Systems

Depending on the use case, the result data consists of a set of recommenda-

tions or similar users, or a single prediction of the relevance of an item. In

the Recommender System scenario, participants may attempt to alter this

result data for various purposes. In order to prevent a successful execution

of these attempts, the interaction steps are extended as explained below and

as listed in Table 7.11, Table 7.12, and Table 7.13. Result data may be with-

held maliciously by the TFERole or any RelayRole. While this cannot

be prevented, it constitutes neither a serious nor a probable threat, because

the respective main abstract entity would not benefit from this action and

therefore the motivation for deviating from the protocol in this manner is

considered to be low. Other threats are examined in detail in the following.

TFERole Alters Result Data The TFERole may attempt to propa-

gate private user information via the result data, or otherwise alter the result

data in a way that is unfavorable for the user entity. As the filter entity would

not benefit directly from this action, it makes sense only if filter and supplier

collude, or if the filter entity intends to cause suspicion of a possible collusion,

a scenario that seems somewhat far-fetched.

The TFERole may also alter the result data in a way that results in the

user obtaining incorrect data, while the supplier obtains correct data, e.g. by

returning result data according to a previously defined code (such as a code

in which a recommendation xactually stands for recommendation y, or a

135

Table 7.11: The extended protocol steps for the propagation of

result data in the Recommender System scenario, for the case of

completely linkable result data. For semi-linkable result data, the

ProfileManagerRoleUser has to remain anonymous in inter-

actions with the ProfileManagerRoleSupplier. Apart from this

modification, the same steps may be used.

Step Sender →Receiver Message part of interaction

[O.a to O.c: Key sharing]

O.a TFEF →TFE KA ShareKeys

O.b TFEF →PMR(S)KA ShareKeys

repeat O.c ∀res ∈RES:

O.c PMR(U)→RR(U)KDres ShareKeys

[Phase Iand Phase II as above]

[III.a to III.i: SMF1 with modified final step for RES]

III.a TFE →RR(U)RES, {H(RES)}KA GetResultsAsUser

III.b RR(U) analyzes RES

III.c RR(U) creates secret key KB

III.d RR(U)→PMR(S)h(KB),{{H(RES)}KA}KB GetResultsAsSupplier

III.e PMR(S)→RR(U)KA GetResultsAsSupplier

III.f RR(U) decrypts and checks h(RES)

III.g RR(U)→PMR(S)KB GetResultsAsSupplier

III.h PMR(S) checks h(KB)

III.i PMR(S) decrypts H(RES) (cannot check it yet)

[III.j to III.r: SMF2 ∀res ∈RES]

repeat III.j ∀res ∈RES:

III.j RR(U)→PMR(S){res}KDres GetResultsAsSupplier

III.k PMR(S) creates secret key KEres

repeat III.l ∀res ∈RES:

III.l PMR(S)→PMR(U)h(KEres),{{res}KDres }KEres GetResults

repeat III.m to III.r ∀res ∈RES:

III.m PMR(U)→PMR(S)KDres GetResults

III.n PMR(S) decrypts and analyzes res

III.o PMR(S) checks H(RES) from above w.r.t h(res)

III.p PMR(S)→PMR(U)KEres,complete data(res)GetResults

III.q PMR(U) decrypts res

III.r PMR(U) checks h(KEres)

III.s RR(U) terminates TFE

III.t PMR(S) terminates RR(U)

136

Table 7.12: The extended protocol steps for the propagation of result

data in the Recommender System scenario, for the case of semi-private

result data. The user must remain anonymous in Step III.t and Step III.u.

Step Sender →Receiver Message part of interaction

[O.a to O.d: Key sharing]

O.a/b TFEF →TFE/RR(S)KA ShareKeys

O.c PMR(S)→RR(S)KC (session-independent) ShareKeys

O.d PMR(U)→RR(U)KD ShareKeys

[Phase Iand Phase II as above]

III.a PMR(U) restricts communication of RR(S), TFE

[III.b to III.j: SMF1 for RES]

III.b TFE →RR(U)RES, {h(RES)}KA GetResultsAsUser

III.c RR(U) analyzes RES

III.d RR(U) creates secret key KB

III.e RR(U)→RR(S)h(KB),{{h(RES)}KA}KB GetResultsAsSupplier

III.f RR(S)→RR(U)KA GetResultsAsSupplier

III.g RR(U) decrypts and checks h(RES)

III.h RR(U)→RR(S)RES, KB GetResultsAsSupplier

III.i RR(S) analyzes RES

III.j RR(S) checks h(KB); decrypts and checks h(RES)

[III.k to III.q: SMF1 with modified final step for RES]

III.k RR(U)→RR(S){H(RES)}KD GetResultsAsSupplier

III.l RR(S) creates secret key KE

III.m RR(S)→PMR(U)h(KE),{{H(RES)}KD}KE GetResults

III.n PMR(U)→RR(S)KD GetResults

III.o RR(S) decrypts and checks h(RES)

III.p RR(S)→PMR(U)KE GetResults

III.q PMR(U) checks h(KE)

repeat III.r to III.w ∀res ∈RES:

III.r RR(S)→PMR(U)res GetResults

III.s PMR(U) decrypts and checks h(res)

III.t PMR(U)→PMR(S)res ExchangeResults

III.u PMR(S)→PMR(U)complete data(res),{h(res)}KC ExchangeResults

III.v PMR(U)→RR(S){h(res)}KC GetResults

III.w RR(S) decrypts and checks h(res)

III.x RR(U) terminates TFE

III.y RR(S) terminates RR(U)

III.z PMR(U) terminates RR(S)

137

Table 7.13: The extended protocol steps for the propagation of

result data in the Recommender System scenario, for the case of

completely private result data.

Step Sender →Receiver Message part of interaction

[O.a to O.c: Key sharing]

O.a/b TFEF →TFE/RR(S)KA ShareKeys

O.c PMR(U)→RR(U)KD ShareKeys

[Phase Iand Phase II as above]

III.a PMR(U) restricts communication of RR(S), TFE

[III.b to III.j: SMF1 for RES]

III.b TFE →RR(U)RES, {h(RES)}KA GetResultsAsUser

III.c RR(U) analyzes RES

III.d RR(U) creates secret key KB

III.e RR(U)→RR(S)h(KB),{{h(RES)}KA}KB GetResultsAsSupplier

III.f RR(S)→RR(U)KA GetResultsAsSupplier

III.g RR(U) decrypts and checks h(RES)

III.h RR(U)→RR(S)RES, KB GetResultsAsSupplier

III.i RR(S) analyzes RES

III.j RR(S) checks h(KB), decrypts and checks h(RES)

[III.k to III.r: SMF1 for RES]

III.k RR(U)→RR(S){h(RES)}KD GetResultsAsSupplier

III.l RR(S) creates secret key KE

III.m RR(S)→PMR(U)h(KE),{{h(RES)}KD}KE GetResults

III.n PMR(U)→RR(S)KD GetResults

III.o RR(S) decrypts and checks h(RES)

III.p RR(S)→PMR(U)RES, KE GetResults

III.q PMR(U) checks h(KE)

III.r PMR(U) decrypts and checks h(RES)

III.s RR(U) terminates TFE

III.t RR(S) terminates RR(U)

III.u PMR(U) terminates RR(S)

138

prediction kfor a prediction k+m). While there does not seem to be any

way to prevent this, it affects the quality and consistency of the result data

and therefore will probably be noticed. Additionally, it does not immediately

threaten user privacy.

These threats do not apply to the case of completely private result data,

because in that case the result data is acquired permanently by the user

entity only. In the case of semi-private, semi-linkable or completely linkable

result data, the following solution applies:

As the RelayRoleUser receives all result data before it is acquired per-

manently by the supplier, it is able to check the result data for suspicious

information. While some attempts may be detected immediately, e.g. if user

profile elements are used as recommendations, more sophisticated attempts

based on subliminal channels have to be detected as well. The possibility

of subliminal channels within the result data used to propagate private in-

formation increases with the complexity of the result data. For example, a

vector containing several floating point decimals may be used more easily for

encoding information than single boolean values. Because recommendations

are always a subset of the supplier profile data, it is sufficient to return a short

identifier as a recommendation, based on which the user may subsequently

obtain the complete element.

For the same reason, predictions should be taken from a limited range of

possible values, instead of allowing arbitrary values. Taken together, these

steps minimize the possibilities for subliminal channels. The interaction

between the ProfileManagerRoleSupplier and the ProfileManager-

RoleUser is extended by a final step in which the complete recommendation

is returned if necessary. To prevent the supplier from returning arbitrary

information as the complete recommendation, the identifier should be mean-

ingful in itself, i.e. a string expressing a recognizable movie title would be

preferable to an apparently arbitrary number.

Furthermore, the TFERole may attempt to propagate private supplier

information via the result data, or otherwise alter the result data in a way

that is unfavorable to the supplier, in a similar way as described above. This

only makes sense if filter and user collude, or if the filter intends to cause

suspicion of a possible collusion. In this case, the following solution applies:

As either the RelayRoleSupplier or the ProfileManagerRoleSupplier

receives all result data before it is acquired permanently by the user, it is

able to check the result data for suspicious information, in a similar manner

as described above. The use of subliminal channels by the TFERole (such

as h(RES) or KA) is prevented implicitly by using the protocol for secure

message forwarding. Table 7.14 summarizes the threats and countermeasures

discussed in this paragraph.

139

Table 7.14: Countermeasures against the TFERole as a mali-

cious participant. The user entity has to be able to analyze the

results before they are obtained by the supplier entity (perma-

nently or temporarily), and the supplier entity has to be able to

analyze the results before they are permanently obtained by the

user entity. This is actually accomplished in all protocols as sum-

marized here.

linkable semi-private compl. private

internal action result data result data result data

(see Table 7.11) (see Table 7.12) (see Table 7.13)

user entity RR(U)RR(U)RR(U)

analyzes result data via Step III.b Step III.c Step III.c

supplier entity obtains/ PMR(S)RR(S)RR(S)

analyzes result data via Step III.n Step III.i Step III.i

user entity obtains PMR(U)PMR(U)PMR(U)

result data via Step III.q Step III.r Step III.p

Other Roles Alter Result Data The RelayRoleUser may attempt to

propagate private supplier information via the result data, or otherwise alter

the result data in a way that is unfavorable to the supplier. In order to

prevent a successful execution of this attempt, the result data is propagated

via the protocol for secure message forwarding SMF1 (Step III.a to Step III.i

in Table7.11, Step III.b to Step III.j in Table 7.12, and Step III.b to Step III.j

in Table 7.13).

If the TFERole and the RelayRoleUser collude, this threat obviously

is not preventable at this stage. In this case, the supplier still is able to react

in the same manner as described above for the threat originating from the

TFERole. In all other cases, however, it is more efficient to rely on the

encryption than having to analyze the returned data.

The RelayRoleSupplier or the ProfileManagerRoleSupplier may at-

tempt to propagate private user information via the result data, or otherwise

alter the result data in a way that is unfavorable to the user. In order to

prevent a successful execution of this attempt, the result data is propagated

via the protocol for secure message forwarding SMF2 (Step III.j to Step III.r

in Table 7.11, and the protocol for secure message forwarding SMF1 respec-

tively (Step III.k to Step III.q in Table 7.12, and Step III.k to Step III.r in

Table 7.13). In particular, the iterative disclosure of result data in Step III.j

to Step III.r in Table 7.11 prevents the ProfileManagerRoleSupplier from

withholding result data. Similarly, the protocol part consisting of Step III.r

to Step III.w in Table 7.12 prevents the ProfileManagerRoleUser from

140

withholding or altering result data, because this would be noticed by the

RelayRoleSupplier, which subsequently would be able to halt the protocol.

7.3.1.4 Altering Result Data in Hybrid IF Systems

In the Hybrid IF System scenario, participants may attempt to alter result

data for various purposes largely analogous to the cases described above. In

order to prevent a successful execution of these attempts, the interaction

steps are extended as listed in Table 7.15.

7.3.1.5 Completion of Iterative Disclosure

A minor but potentially problematic threat arises in protocols based on an

iterative disclosure of result data: While a participant withholding the entire

result data would cause the respective protocol to stop, withholding the final

part of the result data (e.g. the final recommendation) does not have any

direct consequences because the sender does not subsequently receive any

additional information anyway.

If both participants act strictly rationally, this problem does not only

affect the final part of the result data, but ultimately the entire result data,

because a participant who cannot expect to receive data in step nmay choose

not to carry out step n−1, and thus no participant would be sufficiently

motivated even to begin the protocol.

We discuss this threat for the different scenarios and cases:



In the cases of completely linkable data and semi-linkable data in the

Recommender System scenario, this threat is less problematic because

it only applies to the ProfileManagerRoleSupplier (Step III.p in Ta-

ble 7.11), who can be expected to carry out the protocol as specified in

order to gain the trust of the users.



In the case of semi-private result data, this threat is less problematic

because while the ProfileManagerRoleUser may actually withhold

the final result data (Step III.t in Table 7.12), this does not lead to

the ProfileManagerRoleSupplier withholding the previous data, be-

cause the interactions are unlinkable from the point of view of the

ProfileManagerRoleSupplier.



In the case of completely private result data in the Recommender Sys-

tem scenario, the threat does not apply because there is no iterative

disclosure.

141

Table 7.15: The extended protocol steps for the propagation of

result data in the Hybrid System scenario, for the case of com-

pletely private result data.

Step Sender →Receiver Message part of interaction

[O.a to O.e: Key sharing]

O.a/b TFEF →TFE/RR(S)KA ShareKeys

O.c PMR(S)→RR(S)KC ShareKeys

repeat O.c ∀res(U)∈RES(U):

O.d PMR(S)→RR(S)KFres(U)ShareKeys

O.e PMR(U)→RR(U)KD ShareKeys

[Phase Iand Phase II as above]

III.a PMR(U) restricts communication of RR(S), TFE

III.b RR(S)→RR(U)KC GetResultsAsSupplier

repeat III.c ∀res(U)∈RES(U):

III.c RR(S)→RR(U)KFres(U)GetResultsAsSupplier

[III.d to III.k: SMF1 for RES]

III.d TFE →RR(U)RES, {h(RES)}KA GetResultsAsUser

III.e RR(U) analyzes RES; creates secret key KB

III.f RR(U)→RR(S)h(KB),{{h(RES)}KA}KB GetResultsAsSupplier

III.g RR(S)→RR(U)KA GetResultsAsSupplier

III.h RR(U) decrypts and checks h(RES)

III.i RR(U)→RR(S)RES, KB GetResultsAsSupplier

III.j RR(S) analyzes RES

III.k RR(S) checks h(KB); decrypts and checks h(RES)

[III.l to III.r: SMF1 for RES0=Sres0with res0def

={res(S)}KC,{res(U)}KFres(U)]

III.l RR(U)→RR(S){h(RES0)}KD GetResultsAsSupplier

III.m RR(S) creates secret key KE

III.n RR(S)→PMR(U)h(KE),{{h(RES0)}KD}KE GetResults

III.o PMR(U)→RR(S)KD GetResults

III.p RR(S) decrypts and checks h(RES0)

III.q RR(S)→PMR(U)RES0, KE GetResults

III.r PMR(U) checks h(KE); decrypts and checks h(RES0)

repeat III.s to III.t ∀res0∈RES0:

III.s PMR(U)→PMR(S){res(S)}KC ExchangeResults

III.t PMR(S)→PMR(U)KFres(U)ExchangeResults

III.u RR(U) terminates TFE

III.v RR(S) terminates RR(U)

III.w PMR(U) terminates RR(S)

142



In the case of completely private result data in the Hybrid System

scenario, the threat may be countered by letting the initiating user

represented by the ProfileManagerRoleUser decide on the size of

the result data (i.e. the number nof recommendations to be returned),

and by keeping this number secret from the other user represented by

the ProfileManagerRoleSupplier. Thus, the ProfileManager-

RoleSupplier cannot withhold (Step III.t in Table 7.15) without risking

to miss information about further result data.

If the result data consists of a single prediction, the iterative disclosure

procedure is reduced to a single iteration which is even more problematic.

Therefore, if predictions are to be returned instead of recommendations, the

result data should contain a number of predictions for different items instead

of a single prediction. If this is done, the countermeasures described above

apply here as well.

7.3.1.6 Reaction to Detected Threats

Finally, all roles involved in the interactions have to be able to react in an

appropriate manner if they detect deviations from the protocol. While the

ProfileManagerRole may just abort the overall interaction and log or

otherwise report the attempted deviation, the RelayRole is more restricted

in this regard, because it cannot communicate freely. As an example, if the

RelayRole in the Recommender System scenario detects or suspects a col-

lusion between the other roles involved, it cannot report to the ProfileM-

anagerRoleUser. It may still abort the overall interaction, but in this case

the supplier may divert suspicion by announcing that a different problem

caused the interruption of the overall interaction.

Therefore, a RelayRole should be allowed to establish an additional

channel in order to propagate some kind of status flag to the ProfileM-

anagerRoleUser. If only a few bits are used for this channel, it is likely to

be too small to be used as a feasible subliminal channel, but wide enough to

communicate one of a number of status flags previously agreed upon. If the

meaning of status flags is changed in each overall interaction, the supplier

cannot modify the status flag in an undetectable way. Additionally, the Re-

layRole should continue to interact with the supplier even when it detects

a deviation in order to ensure that the result flag is actually propagated, but

at the same time it should deviate itself from the protocol in order to prevent

the acquision of private information, e.g. by replacing encrypted result data

with random noise.

143

7.3.2 Other Requirements

Apart from privacy requirements, interactions may be extended for other

reasons, e.g. in order to improve the performance of the system:

In the mixed IR/IF scenario, recommendations may be generated in two

ways:



The user may apply the IR-related query to recommendations retrieved

in the usual manner. However, this approach is likely to result in an

small or even empty set of recommendations, because few or no matches

may be found in the candidate set of recommendations, which is usually

itself rather small.



The supplier may apply the IR-related query to his profile, and gener-

ate recommendations from the result set. While this approach cannot

utilize a pre-computed profile model, it is actually more feasible be-

cause the relevant part of the provider profile can be expected to be

small enough to be propagated completely to the TFRRole.

Therefore, the interaction GetRecommendations has to be extended by

using the IR-related query as an additional input parameter.

Furthermore, especially in this case repeated interactions between a given

user and supplier within a short time period are likely because the user may

send various IR-related queries5. In this case, it is not required to repeat all

interactions for each single filtering process, because the agents on controlled

platforms may be re-used. Interactions may be extended accordingly to al-

low this re-use of agents. We omit the details of the extended interactions

because they are not directly relevant for the overall architecture. It should

be noted, however, that re-using the RelayRole rules out the possibility of

anonymous interactions for querying the provider profile.

7.3.3 Agents and Agent Services

Based on the extended interactions described in the previous sections, we

are now able to define the agent services. In most cases, interactions are

mapped to agent services in a straightforward manner, as shown in Table

7.16. Interactions with similar input and output are aggregated as one agent

service, because they can be regarded as having the same effect. Depending

on the actual interaction, different protocols are used within the respective

service.

5In the regular scenario, repeated interactions only make sense when at least one profile

changes, which happens only intermittently.

144

Table 7.16: The mapping of interactions to agent services.

Interaction Table Agent Service

Information Collection Stage

UpdateProfile A.21 UpdateProfile

QueryProfile A.22 QueryProfile

Information Processing Stage

ObtainTFE A.23 ObtainTFE

SetUpdatePolicy A.24 SetUpdatePolicy

UpdateProfileModel A.25 UpdateProfileModel

QueryProfileModel A.26 QueryProfileModel

ModifyProfileModel A.27 ModifyProfileModel

Information Filtering Stage

GetResultsInternally A.28 GetResults

GetResults A.29

GetResultsAsSupplier A.30

GetResultsAsUser A.31

ExchangeResults A.32 ExchangeResults

ObtainRelay A.33 ObtainRelay

ShareKeys A.34 ShareKeys

Roles are aggregated by agents in a similarly straightforward manner, as

shown in Table 7.17. While it may be advisable to aggregate various roles be-

longing to the same abstract entity (such as the TPMASProviderRoleUser

and the ProfileManagerRoleUser) for performance reasons, we use sep-

arate agents in order to keep the architecture flexible.

Table 7.17: The mapping of roles to agents.

Role Agent

InterfaceRoleUser InterfaceAgentUser

InterfaceRoleProvider InterfaceAgentProvider

ProfileManagerRoleUser PMAgentUser

ProfileManagerRoleProvider PMAgentProvider

RelayRoleUser RelayAgentUser

RelayRoleProvider RelayAgentProvider

TFEFactoryRole TFEFactoryAgent

TFERole TFEAgent

Communication within agent services is encrypted by mechanisms pro-

vided by the respective MAS architecture, unless the service is considered not

to require encryption, either because no sensitive information is communi-

145

cated, or because the respective data is already encrypted for other reasons,

as described above.

7.3.4 Implementation

As we have only specified agents and agent services for this module, the

implementation is straightforward and therefore most details are omitted

here. Advanced Encryption Standard (AES) is used as the symmetric en-

cryption scheme and HMAC-SHA-1 as the MAC based on a cryptographic

hash function. These algorithms may easily be replaced with similarly suited

algorithms.

For load balancing and improved performance, the provider entity may

use a different agents providing the same functionality, i.e. realizing the same

role, such as a number of PMAgentProvider agents. In this case, the

load balancing must be actually carried out internally, e.g. by using a single

PMAgentProvider manager agent distributing the actual load. It must not

be realized by providing a dedicated PMAgentProvider agent whenever a

user initiates a filtering process because in this case an honest-but-curious

provider may link anonymous communications with the PMAgentProvider

as recipient to a specific user. In the basic implementation, load balancing

is not addressed.

7.4 Summary

This chapter describes functionality provided by the Recommender Module

addressing the use cases “get prediction for item” and “get recommenda-

tions”, as well as the use cases “update profile elements” and “update profile

model”, as defined in Section 4.1. In addition to the use of the Recommender

Module functionality in a Recommender System context, it also covers its

use in a Hybrid IF System context.

We briefly motivate the need for the Recommender Module (Section 7.1).

We specify an ontology containing the basic concepts (Section 7.2.1). We

specify roles and basic interactions for the three stages information collec-

tion, information processing, and information filtering, and address threats

arising from honest-but-curious participants, in particular in the context of

query data propagation and result data propagation (Section 7.2.2). Regard-

ing the design and implementation of the specified functionality, we address

threats arising from malicious participants by specifying two basic protocols

for secure message forwarding (Section 7.3.1.1). Based on these protocols, we

address the threats of altering queries (Section 7.3.1.2), altering result data

146

(Section 7.3.1.3 and (Section 7.3.1.4), and threats in the context of iterated

disclosure of results (Section 7.3.1.5). We briefly discuss how roles should re-

act when a threat is detected (Section 7.3.1.6), and how interactions have to

be extended in order to meet other requirements (Section 7.3.2). Finally, we

list agents and agent services (Section 7.3.3), and we discuss implementation

details (Section 7.3.4). The following chapter addresses the remaining main

use cases of our approach.

147

Chapter 8

The Matchmaker Module

This chapter describes functionality provided by the Matchmaker Module, i.e.

it primarily addresses the use cases “get prediction for user” and “get similar

users”, as defined in Section 4.1. It uses functionality of the Recommender

Module described in the previous chapter.

The chapter is structured as follows: Section 8.1 briefly motivates the

Matchmaker Module. Section 8.2 describes the ontologies, roles and interac-

tions of the module, while Section 8.3 describes the agents and agent services

realizing these interactions. Section 8.4 concludes the chapter with a sum-

mary.

8.1 Motivation

The Matchmaker Module is one of the two core modules of our approach for

Privacy-Preserving Information Filtering. Together with the Recommender

Module, it addresses all use cases defined in Section 4.1. It provides primary

functionality related to the requirement of user privacy (see Table 4.2). Thus,

the need for functionality described in this chapter is motivated directly by

the outline of our solution given in Section 4.2, as the abstract IF protocols

introduced in the outline are realized via agent interactions, i.e. as part of

agent services.

8.2 Analysis

This section describes the ontologies, roles and interactions of the Match-

maker Module. For the sake of readability, all tables and diagrams specifying

these components may be found in Appendix A.4.

149

8.2.1 Ontologies

The main ontology of this module, the ontology “Distributed Information

Filtering” shown in Figure A.6 of the Appendix, complements the ontology

introduced in the previous chapter. It contains categories and attributes for

distributed Matchmaker Systems and distributed Hybrid IF Systems that

are explained further in the context of the interactions they are used in.

8.2.2 Roles and Interactions

The Matchmaker Module utilizes the roles introduced in Table 7.1, and one

additional role as described in Table 8.1. For the role schema, see Ap-

pendix A.4.

Table 8.1: The roles participating in the Matchmaker Module.

short name/ aggregated by

role name user provider filter

InterfaceRole 









see Table 7.1

ProfileManagerRole

RelayRole

TPMASProviderRole

TFERole

TFEFactoryRole

CentralizedModelManagerRole CMMR(P)

Responsible for the management of relations between profile

elements and references to user entities.

Analogous to the Recommender Module, we initially assume all partici-

pating roles to act in an honest or at least honest-but-curious manner, and

address threats emanating from roles acting in a malicious manner in Sec-

tion 8.3.1.

8.2.2.1 Determining Potentially Similar Users

In distributed collaboration-based IF approaches, similar users are deter-

mined by comparing user profiles. In our approach, the user profiles are

distributed among the user entities. Therefore, similar users have to be

determined in a distributed manner as well. Assuming the Information Fil-

tering process of determining the similarity of two specific users as given,

the straightforward approach of determining all similar users would be to

apply this Information Filtering process to all combinations of two users. In

150

systems with a large number of users, however, it is obviously infeasible to

carry out the Information Filtering process for every pair of users, because

the overall complexity would be quadratical in the number of users not even

taking into account the fact that the process has to be repeated periodically

because profile elements and thus similarities change over time. Therefore,

the Information Filtering process should only be carried out for pairs of users

who can be expected to be similar with a probability that is at least above

average. This section introduces interactions for determining such candidate

pairs. Other approaches are discussed in Section 3.2.4.

Candidate pairs are determined during the information collection and

information processing stage. They are stored in the user profile models of

the respective candidate users. We determine candidate pairs by considering

two users as potentially similar if they have profiles containing the same

profile element, or similar profile elements.

An entity keeping track of the profile elements of different user profiles

would be able to determine overlaps and thus potentially similar users. In our

approach, however, all user profile information is stored in a decentralized

way by the user agents. Therefore, an additional role realizing this task is

introduced, namely the CentralizedModelManagerRole. While this

role could be associated with any abstract entity, the provider entity is the

most obvious entity for aggregating this role, mainly because it already man-

ages the items that are potential user profile elements. Additionally, the

provider entity may be sufficiently motivated to carry out the tasks assigned

to this role because the respective interactions allow the provider entity to

collect general information about the dissemination of the information it pro-

vides, such as statistics of the most popular items, i.e. items appearing in a

large number of user profiles.

In relation to a given profile element, the CentralizedModelMan-

agerRole stores references to users who have added this element to their

respective profiles within a profile model as part of the information processing

stage. As a user entity may add an element to its profile without interact-

ing with any other role, the user entity itself is responsible for announcing

the operation to the CentralizedModelManagerRole, as an additional

interaction within the main use case “update profile model”. The Cen-

tralizedModelManagerRole utilizes a filtering technique in order to

determine references to other user entities who have announced this element

or similar elements, and notifies the user entity as an additional interaction

within the main use case “update profile model”. The other user entities do

not have to be notified by the CentralizedModelManagerRole, be-

cause they will be contacted by the user entity itself if it actually intends to

determine similar users. The respective interaction AnnounceProfileElement

151

Figure 8.1: Collaboration Diagram for the partial use case “an-

nounce profile element” as part of the main use case “update pro-

file model”.

is specified in Table A.40 of the appendix.

The obvious problem arising from the introduction of the Centralized-

ModelManagerRole is the privacy of the user entities, i.e. the fact that

user profile data is stored in a centralized way by a potentially honest-but-

curious role. This problem is addressed by the following solution: The refer-

ence to the user entity stored by the CentralizedModelManagerRole

does not contain data that may be used to actually identify the respective

user entity. Instead, a pseudonym is stored which may be used to contact

the respective entity. Furthermore, different pseudonyms have to be used by

a given user entity for different profile elements, because otherwise the com-

plete user profile could be reconstructed by searching for all profile elements

associated with the same pseudonym. The mechanism for anonymous com-

munication introduced in Section 5.2 is used to contact the user entity via its

pseudonym. In other words, the user entity has to utilize an Anonymizer-

Role in order to achieve receiver anonymity when contacted by potentially

similar user entities, and for achieving sender anonymity when announcing

the respective profile element. The interaction between the different roles is

shown in Figure 8.1.

Due to the fact that all data stored in the centralized model is anonymized,

all threats originating from an honest-but-curious provider entity may be ad-

dressed adequately, because there is no way for the provider entity to obtain

additional private information. Threats originating from malicious partici-

152

pants are addressed in the design phase described in Section 8.3.

8.2.2.2 Determining the Actual Similarity

Determining the actual similarity of two users is a rather straightforward

task. In fact, we have already described all required interactions in the

previous chapter, because the use case “get prediction for user” is basically

realized via the same interaction protocol as the use cases “get prediction for

item” and “get recommendations”, with the following differences:



The supplier entity is actually realized by a second user entity, instead

of a provider entity.



User profiles are usually small enough to be processed entirely. There-

fore, the RelayRole does not have to actually analyze queries (as long

as it ascertains that no private data is used within the query structure).



The filtering technique is primarily applied in order to determine the

overall similarity of the two user profiles. It may additionally provide

recommendations and/or predictions within the same interaction. Ob-

viously, recommendations are generally only likely to be relevant in

cases of high similarity. Filtering techniques may be designed in a way

that allows their use for both goals. We give an example of a suitable

filtering technique in the following chapter.

The use case “get similar users” is realized by carrying out the interactions

of the use case “get prediction for user” for each candidate user, whereas the

top-N most similar users are retained.

As we utilize no additional interactions for determining the actual sim-

ilarity of users, no additional threats originating from honest-but-curious

participants in this contexts have to be addressed.

8.2.2.3 Hybrid IF System Functionality

A Hybrid IF System generates recommendations and predictions of items via

determining similar users. It may be realized simply by combining Match-

maker System functionality and Recommender System functionality. In our

approach, the functionality described above allows a user to find other similar

users. Recommendations or predictions of item relevance may be obtained

from a similar user by carrying out the interactions described in the previous

chapter, with the similar user representing the supplier. Both steps may also

be combined by returning recommendations or predictions of item relevance

153

in addition to the value indicating the similarity of users, which reduces the

number of interactions between users. In this case, the additional result data

should only be returned if the users are actually similar, because otherwise

the result data cannot be expected to be relevant. It should also be noted

that while result data from two user profiles could theoretically be generated

via the same filtering techniques that are used in the pure Recommender

System context, using adjusted filtering techniques in this context may be

more advantageous with regard to quality.

8.3 Design & Implementation

This section describes the agents and agent services of the Matchmaker Mod-

ule. The interactions defined in the previous sections have to be refined

further in order to address threats originating from malicious participants,

as well as further requirements. The following section describes extensions

constituting countermeasures for various threats originating from malicious

participants. Subsequent sections describe extensions addressing further re-

quirements, and finally the agents and agent services.

8.3.1 Threats and Countermeasures

The introduction of the CentralizedModelManagerRole creates new

possibilities for critical malicious actions. A malicious entity acting as a user

entity may attempt to reconstruct the user profile of other user entities by

the following course of action: The malicious entity adds a large number of

elements to its user profile, and thus receives a subsequently large number

of references to potentially similar user entities. By determining the similar-

ities of the respective user entities (which are initially only known via their

pseudonyms), the malicious entity may subsequently be able to link different

pseudonyms to one single user entity, namely in cases where the similar-

ity value and/or the generated recommendations and predictions are exactly

identical. Once different pseudonyms are linkable to a single user entity, the

respective profile elements become linkable as well, and thus a user profile

may be reconstructed.

This threat is basically prevented by blurring the returned similarity

value, making it impossible to link similarities1. Recommendations should

1As an example, if similarity is expressed as a continuous value between 0% and 100%,

two separate similarities of 89.1327% probably indicate the same actual user, while using

discrete values such as 0%, 5%, 10%, etc. would not allow to draw the same conclusion for

two similarities of 90%, because the value will apply to many different users.

154

only be returned in cases of high similarity and non-suspicious foreign user

profiles. User profiles containing only a few elements, or an exceedingly large

number of elements, should be considered suspicious as they are likely not

to belong to a regular, i.e. honest user entity.

Malicious actions with regard to the propagation of the similarity value

itself do not have to be addressed explicitly, because it may be treated as

a special kind of recommendation in this respect, and the mechanisms de-

scribed in Section 7.3.1.3 apply, with the following exception: There is ac-

tually no way to prevent a ProfileManagerRole from withholding the

similarity value itself. Because the similarity value alone does not give any

useful information, this threat is negligible.

8.3.2 Other Requirements

Apart from privacy requirements, interactions may be extended for other

reasons, e.g. in order to improve the performance of the system:

As described in Section 5.2, using relay agents for achieving receiver

anonymity is problematic because these relay agents are not as short-lived

as relay agents used for achieving sender anonymity, basically because the

receiver usually does not decide when the communication takes place. There-

fore, in a straightforward realization of the interaction introduced above, each

user entity has to create and maintain one additional agent per user profile

element. However, the total number of agents would be rather large in sys-

tems with a large number of users and comprehensive user profiles (O(s·u)

for uusers with an average number of elements s). This number could be

reduced to O(u), which is manageable because there are already O(u) agents

associated with user entities anyway, by the following approach:

Based on the average user profile size (which may be determined via the

CentralizedModelManagerRole), a correspondingly large number of

time slots are introduced, and each potential user profile element is assigned

a time slot. User entities announce user profile elements only during the

respective time slot, and thus have to keep the respective agents realizing

the functionality for receiver anonymity alive only during that time slot.

If the suggested approach is still considered infeasible for a specific imple-

mentation, other solutions may be used without having to change the overall

architecture: Solutions related to mix networks or similar anonymizer ap-

proaches (for which see Section 3.1.1) may be successfully applied in this

case, although it should be noted that they would introduce additional trust

issues.

155

8.3.3 Agents and Agent Services

Based on the extended interactions described in the previous sections, we

are now able to define an additional agent service. As in most other cases,

the respective interaction is mapped to the agent service in a straightforward

manner, as shown in Table 8.2. Roles are aggregated by agents in a similarly

straightforward manner, as shown in Table 8.3.

Table 8.2: The mapping of interactions to agent services.

Interaction Table Agent Service

AnnounceElement A.40 AnnounceElement

Table 8.3: The mapping of roles to agents.

Role Agent

CentralizedModel- CentralizedModel-

ManagerRole ManagerAgent

8.3.4 Implementation

As we have only specified agents and agent services for this module, the

implementation is straightforward and therefore its details are omitted here.

8.4 Summary

This chapter describes functionality provided by the Matchmaker Module

addressing the use cases “get prediction for user” and “get similar users”,

as defined in Section 4.1. In addition to the use of the Matchmaker Module

functionality in a distributed Matchmaker System context, it also covers its

use in a distributed Hybrid IF System context.

We briefly motivate the need for the Matchmaker Module (Section 8.1).

We specify an ontology containing the basic concepts (Section 8.2.1). We

specify roles and basic interactions, and address threats arising from honest-

but-curious participants 8.2.2). Regarding the design and implementation of

the specified functionality, we address threats arising from malicious partici-

pants (Section 8.3.1), and we discuss how interactions have to be extended in

order to meet other requirements (Section 8.3.2). Finally, we list agents and

agent services (Section 8.3.3). The following chapter describes exemplary

filtering techniques that may be used in our approach.

156

Chapter 9

Exemplary Filtering Techniques

This chapter describes exemplary filtering techniques that may be used by

the Recommender Module and the Matchmaker Module. These filtering

techniques are provided as building blocks to be utilized by the filter entity.

While the modules described in the previous chapters provided ontologies,

interactions and ultimately agents and agent services, this chapter deals with

functionality to be used internally, i.e. within a single agent.

As the Gaia methodology is not applicable to internal functionality, the

section of this chapter are structured differently than the sections of the pre-

vious chapters. Section 9.1 briefly motivates exemplary filtering techniques.

Section 9.2 describes the general requirements of filtering techniques that are

to be applied in our approach for Privacy-Preserving Information Filtering,

and Section 9.3 describes three exemplary filtering techniques meeting these

requirements. Section 9.4 concludes the chapter with a summary.

9.1 Motivation

In the previous chapters, the filtering techniques used by the Recommender

Module and the Matchmaker Module have been largely treated as black

boxes, i.e. we have assumed that filtering techniques exist which are may be

utilized within the information processing stage and the information filtering

stage in a way that meets all functional and non-functional requirements.

In this chapter, we break down the requirements regarding the use of a

specific filtering technique, and we describe exemplary filtering techniques in

order to show that these requirements may actually be met.

157

9.2 Analysis

This section lists the requirements for filtering techniques to be used in the

context of Privacy-Preserving Information Filtering. It should be noted that

our approach does not require a single filtering technique to be applicable in

the context of each of the four main use cases, or each of the sub-cases for

the propagation of result data (as defined in Section 4.1). However, all use

cases should be covered by at least one filtering technique.

As discussed in Section 2.2.1, there are two main groups of filtering tech-

niques, namely feature-based and collaboration-based filtering techniques.

Feature-based filtering techniques are less problematic with regard to pri-

vacy because the respective profiles do not contain private data associated

with other users. There are, however, feature-based approaches that are not

directly applicable: Learning-based approaches in which the filter entity uses

the data obtained during a specific filtering process (or data provided by

the user as feedback) in order to refine further filter processes are obviously

problematic because our approach does not allow any additional data to be

propagated by the Temporary Filter Entity. Therefore, if these approaches

are to be used the feedback has to be obtained outside of the PPIF part of

the respective system, e.g. by using the same filtering technique in a non-

privacy-preserving context.

Collaboration-based filtering techniques are generally problematic be-

cause they are based on data obtained by combining and analyzing elements

from different user profiles. They may still be used either by again obtaining

the data outside of the PPIF part of the respective system, or by combining

a feature-based and a collaboration-based approach, e.g. as described as our

solution for Matchmaker Systems in Chapter 8. In the latter case, however,

the actual algorithm used to create the centralized model during the infor-

mation processing stage, as well as the actual algorithm used to generate

result data based on two profiles during the information filtering stage, are

feature-based filtering techniques.

There are two main aspects that have to be considered for filtering tech-

niques to be applied in our approach:



Influence of supplier profile data on obtained user profile data: The

protocols described in Chapter 7 are only applicable if the utilized

filtering algorithm does not have to retrieve user profile data based on

obtained supplier profile data, because as soon as supplier profile data

has been obtained, no further user profile data may be obtained in a

privacy-preserving manner. While the protocols could be extended to

facilitate an iterative propagation of user and supplier profile data, this

158

is usually unnecessary because the user profile is expected to be small

enough to be propagated as a whole.



Influence of user profile data on obtained supplier profile data: In prac-

tice, the applicability of a specific filtering technique largely depends on

the size of the provider profile, including the models maintained during

the information processing stage: While the user profile is typically

small enough to be propagated entirely, there are different options for

the propagation of the provider profile, in case the supplier is actually

a provider entity (and not a different user entity):

–Complete Propagation: Propagating the entire provider profile is

usually infeasible, due to its size. The situation is analogous to the

Private Information Retrieval scenario described in Section 3.2.3,

i.e. propagating the entire provider profile constitutes a trivial

solution that is theoretically applicable for all filtering techniques.

–Constrained Propagation via IR: In the mixed IF/IR scenario,

only a small part of the provider profile has to be propagated, the

elements of which are based on a non-privacy-preserving IR query.

–Partial Propagation via Unlinkable Queries: In all other cases, the

relevant parts of the provider profile have to be retrieved based on

user profile data. As described in Section 7.2.2.3, the respective

queries are privacy-critical and should therefore be propagated in

an unlinkable manner. Depending on the size of the user profile,

this approach is rather time-critical.

–Partial Propagation via Refinement: A different approach is based

on the fact that the provider entity may obtain any information

during the profile propagation process as long as it is information

that may be deduced from the recommendations themselves af-

ter the filtering process. If, for example, the item iis returned

as a recommendation, the prior propagation of the part of the

provider profile containing the 100 items most similar to idoes

not allow the provider to deduce additional information about the

user profile. In other words, this strategy refines a large num-

ber of provider profile items to a comparatively small number of

recommendations without using user profile elements within the

respective queries. This strategy is obviously not applicable in the

case of completely private result data. In the case of semi-private

result data, only single recommendations may be generated dur-

ing a specific filtering process. It is therefore most suitable in the

case of linkable and semi-linkable result data.

159

Filtering techniques to be applied in our approach should be able to

deal with at least one of the latter strategies in addition to the trivial

solution of complete propagation.

In the following section we specify filtering techniques meeting these re-

quirements.

9.3 Design & Implementation

In this section, we describe three exemplary filtering techniques that meet the

requirements given above. The first filtering technique is not based on profile

models and therefore applicable for the use cases “get recommendations” in

a Hybrid IF System, “get prediction for item”, and “get prediction for user”,

because these use cases require a comparatively small amount of supplier

profile data and therefore may be realized without profile models. The other

two filtering techniques are model-based and therefore primarily applicable

for the main use case “get recommendations” in a Recommender System.

All filtering techniques are ultimately based on determining the similarity

of two single items. It should be noted that the actual item similarity algo-

rithm may be chosen freely without affecting the other parts of the respective

filtering technique. In particular, this aspects enables the filter entity to ac-

tually preserve the privacy of its sensitive data, because the filter entity may

adjust or alter the item similarity algorithm independent of the other entities.

The model-based filtering techniques are used for the information process-

ing stage in the context of the partial use case “announce profile element” as

well: The centralized model of candidate users is based on item similarity as

well, and therefore the models created via the filtering techniques described

below may be used in this context.

9.3.1 Item Similarity Algorithm

As its name implies, the item similarity algorithm ft1is directly based on

item similarity. The similarity of two items i1and i2which are contained

in a profile is determined by a function sim(i1, i2). Following [38], we use a

cosine-based similarity function, i.e. given the vectors ~

I1and ~

I2containing

the attribute values of i1and i2respectively, the similarity of these items is

defined by Equation 9.11.

1In the original algorithm, the vectors contain user-related data, which is infeasible in

our approach. The algorithm, however, is applicable as long as the vectors contain any

kind of comparable data.

160

simft1(i1, i2) = cos(~

I1,~

I2) (9.1)

A prediction of the relevance of an item iis generated by comparing

the given item with all items of the user profile and returning the largest

similarity, as defined by Equation 9.2. It should be noted that no provider

profile items have to be propagated in this case apart from the item i, which

is assumed to be given.

predu,s,ft1,i =max(simft1(i, i1), .., simft1(i, in)) with

{i1, .., in}=PR(u)(9.2)

The top-n recommendations in a Hybrid IF System are generated by by

determining the pairwise similarity of all items of the two respective user

profiles, and by returning the nmost similar items that are not already con-

tained in the user profile of the entity initiating the interaction, as defined

by Equation 9.3. In case of small or largely equal profiles, less than nrec-

ommendations may be returned.

RECu,u0,ft1,n =

{i∈(PRu0\PRu)| ∀ X⊆(PRu0\PRu\ {i}) :

|X|< n ∨ ∃ x∈X:predu,u0,ft1,i > predu,u0,ft1,x}

(9.3)

A prediction of the similarity of a user is generated by determining the

pairwise similarity of all items of the two respective profiles, and by returning

the average similarity of items based on the pairwise similarities, as defined

by Equation 9.4.

predu,u0,ft1,u0=

r=1

s=1

simft1(ir, js)with

{i1, .., im}=PR(u)and (9.4)

{j1, .., jn}=PR(u0)

This filtering technique is applicable in our approach because the user pro-

file data may be propagated independent of and prior to the supplier profile

data. The supplier profile data is propagated completely in the Matchmaker

System-related use cases, which is feasible because in there cases the supplier

represents a second user with a comparatively small profile, and it does not

have to be propagated at all in the use case “get prediction for item”, because

no additional supplier profile data is required in this case.

161

Thus, the requirements of user privacy and provider privacy are addressed

adequately. As the filtering technique is not based on profile models, filter

privacy is addressed adequately as well, because no other entity obtains any

information about the filtering algorithm itself. The requirement of quality

is met because the output of this filtering technique in a Privacy-Preserv-

ing Information Filtering context is the same as its output in a regular IF

context. The requirement of broadness is met because the filtering technique

is not domain-specific. The requirement of performance is met because the

same algorithms are used as in the context of a regular IF system.

To summarize, the item similarity algorithm may be applied in our ap-

proach for Privacy-Preserving Information Filtering. As its implementation

is straightforward, details are omitted here.

9.3.2 Item-based Top-N Recommendation Algorithm

In the following, we use the filtering technique described in [38] as an ex-

emplary representative for the class of non-collaborative, non-learning-based

filtering techniques.

The item-based top-N recommendation algorithm ft2is based on a pro-

vider model created during the information processing stage. The kitems

with highest similarity values with regard to i2are contained in the set TOPi2,

excluding i2itself. Similarity is determined via the similarity function intro-

duced above. The provider model is constituted by a matrix Mcontaining

item similarities according to Equation 9.5 (for performance reasons, all but

the khighest values are set to zero for each column of the matrix).

mIp,ft1=M with

Mi1,i2=simft2(i1, i2)· |{i1} ∩ TOPi2|(9.5)

The items contained in the user profile are modeled as a vector ~

Uwith

non-zero values for items contained in the user profile according to Equation

9.6.

mIu,ft1=~

U with

Ui=|{i} ∩ PRU|(9.6)

Recommendations are generated by obtaining the vector ~x =M·~

which contains sums of similarity values for all items i: The value ~xiis the

sum of the similarity values of item iwith all items contained in the user

162

profile. Based on ~x, the Nitems with the highest values that are not already

contained in the user profile are returned as recommendations.

This filtering technique is applicable in our approach because the user

profile data may be propagated independent of and prior to the provider

profile data. The provider profile data may be propagated as follows:



Complete Propagation: Propagating the entire model is obviously pos-

sible. Its size is in O(I) as long as kis chosen independent of |I|, which

is optimal (a complete model should at least contain information about

all profile elements, and thus its size cannot be smaller than O(I)) but

still infeasible for large provider profiles.



Constrained Propagation via IR: If only a part of the provider profile is

selected in a non-privacy-preserving way as described above, only the

rows of the matrix Mcontaining the respective elements have to be

propagated, and the complete matrix may be reconstructed afterwards

by using zero values in the remaining rows. Thus, the propagated part

of the provider model may be reduced to a manageable size.



Partial Propagation via unlinkable queries: The values of the columns

of the matrix Mcorresponding to zero values in the vector ~

Udo not

contribute to the sums contained in the vector ~x. Therefore, the com-

plete matrix may be reconstructed after obtaining single columns of the

matrix Mvia unlinkable queries by using zero values in the remaining

columns. Thus, this approach is applicable for this filtering technique

as well.



Partial Propagation Via Refinement: The filtering technique is not

suitable for this strategy, because the provider profile cannot be re-

constructed or traversed iteratively.

While the requirements of user privacy and provider privacy are thus

addressed adequately, the filtering technique has to be adapted in order to

address the requirement of filter privacy: If the filtering algorithm is regarded

as sensitive data, the provider profile model has to be protected because it

contains similarity values results of the part of the filtering algorithm used

during the information processing stage. By accessing these similarity values,

the provider entity could be able to reconstruct the item similarity algorithm,

or it could use the data in order to carry out filtering processes by itself.

Therefore, the provider profile model has to be encrypted by the filter entity

before it is propagated to the provider entity for storage. The model cannot

be encrypted as a whole, as this would prevent the propagation of parts of the

163

models in the context of constrained or partial propagation. Therefore, the

rows of the matrix Mare encrypted separately, without changing their order,

so that the provider is still able to provide the requested data. Nevertheless,

the model should be re-encrypted completely whenever items are added or

removed, because all other approaches, such as only encrypting altered ele-

ments of the matrix Mindividually, would allow the provider entity to obtain

additional information. The requirement of quality is met because the out-

put of this filtering technique in a Privacy-Preserving Information Filtering

context is the same as its output in a regular IF context. The requirement of

broadness is met because the filtering technique is not domain-specific. The

requirement of performance is met in the approaches based on constrained or

partial propagation because the same algorithms are used as in the context

of a regular IF system, and the additional operations required for encryption

and decryption of the model do not change the overall complexity class.

To summarize, the item-based top-N recommendation algorithm may be

applied in our approach for Privacy-Preserving Information Filtering. As its

implementation is straightforward, details are omitted here.

9.3.3 Hierarchical Clustering-based Algorithm

As the item-based top-N recommendation algorithm is not suitable for the

provider profile propagation strategy of partial propagation via refinement,

we introduce a different algorithm which utilizes this strategy. Taken to-

gether, the two algorithms cover all propagation strategies. Hierarchical

clustering algorithms are ideally suited for approaches determining recom-

mendations in an iterative manner, because of the structure of the underlying

model. We describe a hierarchical agglomerative clustering-based algorithm

as a representative of this class of algorithms.

As the name implies, hierarchical clustering algorithms create a hierarchy

of clusters containing similar elements, resulting in a tree representing the

element set, which in our case is the provider profile. The leaves of this tree

are single profile elements as clusters of size one. In our case, the algorithm

ft3creates a tree via single-link clustering, i.e. by iteratively merging the

two most similar clusters, where cluster similarity is defined via item simi-

larity, and thus the two clusters containing the two most similar items are

considered to be most similar2. These items are additionally defined as the

cluster representatives. The process is repeated until all items are merged

2Other clustering approaches, such as complete-link clustering or average-link clus-

tering, could also be used here. The only differ with regard to the definition of cluster

similarity.

164

into one single cluster representing the root of the tree. An example is given

in Appendix C.

Recommendations are determined by iterating through the tree, from the

root upwards, by selecting in each single step the cluster whose representa-

tive is most similar to a given user profile element (or a group of user profile

elements). Once a cluster of sufficiently small size is reached, its elements

may be used either directly as recommendations, or as candidates for recom-

mendations from which, via additional similarity measurements, the actual

recommendations are determined.

This filtering technique is applicable in our approach because the user

profile data may be propagated independent of and prior to the provider

profile data. The provider profile data may be propagated as follows:



Complete Propagation: Propagating the entire model is obviously pos-

sible. Its size is in O(I), as there are 2 · |I| − 1 clusters (or somewhat

less if only clusters of at least size kare maintained). Again, this value

is optimal but still infeasible for large provider profiles.



Constrained Propagation via IR: If only a part of the provider profile

is selected in a non-privacy-preserving way as described above, either

the clustering process has to be carried out for the constrained profile,

which may be infeasible due to time constraints, or a model based on

the complete tree has to be used, whereas items not contained in the

constrained profile are simply ignored, which may lead to results of

lower quality. Therefore, the filtering technique is less suitable for this

strategy.



Partial Propagation via unlinkable queries: Unless combined with the

following strategy, the filtering technique is not suitable for this strategy

either, because the model is based on clusters of items rather than on

single items. Otherwise, the only way to use this strategy would be

to carry out at least part of the filtering process at the provider side,

which is contrary to our view of the filter entity as independent of other

entities.



Partial Propagation Via Refinement: The filtering technique is ideally

suitable for this strategy, because it may be carried out iteratively by

requesting the respective cluster representatives via separate queries.

The queries do not reveal any additional information because they do

not contain user profile elements, and they refer to clusters that may

be reconstructed via the recommendations themselves, as described in

the example given in Appendix C.

165

As in the case of the filtering technique utilizing the item-based top-N rec-

ommendation algorithm described above, this filtering technique has to be

adapted as well in order to address the requirement of filter privacy. Again,

the provider profile model has to be encrypted by the filter entity before it

is propagated to the provider entity for storage. To facilitate constrained

or partial propagation of the profile, each cluster has to be encrypted sep-

arately. Over time, an honest-but-curious provider may be able to deduce

the contents of specific clusters based on the generated recommendations.

As this knowledge would still not enable the provider to carry out the filter-

ing process by itself, we consider it to be less privacy-critical. In any case,

a concerned filter entity may offset this threat by re-encrypting the com-

plete model periodically, a task which is necessary anyway whenever items

are added or removed to the profile. Complete filter privacy may only be

achieved by combining the two approaches for partial propagation. The re-

quirement of quality is met because the output of this filtering technique in

a Privacy-Preserving Information Filtering context is the same as its output

in a regular IF context. The requirement of broadness is met because the

filtering technique is not domain-specific. The requirement of performance is

met because the same algorithms are used as in the context of a regular IF

system, and the additional operations required for encryption and decryption

of the model do not change the overall complexity class.

To summarize, the hierarchical agglomerative clustering-based algorithm

may be applied in our approach for Privacy-Preserving Information Filtering.

As its implementation is straightforward, details are omitted here.

9.4 Summary

This chapter describes exemplary filtering techniques that may be used by the

Recommender Module and the Matchmaker Module. We briefly motivate the

need for providing exemplary filtering techniques (Section 9.1). We discuss

the main aspects that have to be considered for filtering techniques to be

applied in our approach (Section 9.2). We describe three exemplary filtering

techniques, namely an item similarity algorithm (Section 9.3.1), an item-

based top-N recommendation algorithm (Section 9.3.2), and a hierarchical

agglomerative clustering-based algorithm (Section 9.3.3), and show that they

meet all requirements, as summarized in Table 9.1. The following chapter

evaluates our approach in general and in the context of the prototypical

application described in Section 4.3.

166

Table 9.1: An overview of the introduced exemplary filtering

techniques in relation to the requirements and acceptance aspects

of Privacy-Preserving Information Filtering for the use cases real-

ized via the respective filtering technique. A requirement is fully

met (indicated by “X”), partially met (indicated by “



”), or not

met at all (indicated by “–”). Acceptance is indicated in an anal-

ogous manner. The term “propagation” refers to the propagation

of the supplier profile.

Privacy Other Accep-

Requirements Requirements tance

RuRpRfRqq Rbb Rpp AuAp

item similarity algorithm ft1

complete propagation X X X X X X X X

without propagation X X X X X X X X

item-based top-N recommendation algorithm ft2

complete propagation X X X X X –X



constrained prop. via IR X X X X X X X X

partial prop. via queries X X X X X



partial prop. via refinement n/a

hierarchical agglomerative clustering-based algorithm ft3

complete propagation X X X X X –X



constrained prop. via IR X X X



X X

partial prop. via queries X X –X X X X



partial prop. via refinement X X



X X X X X

168

Chapter 10

Evaluation

This chapter evaluates our approach for Privacy-Preserving Information Fil-

tering. It is structured as follows: Section 10.1 discusses the coverage of the

non-functional and functional requirements by our approach and by the im-

plemented application. Section 10.2 compares our approach with approaches

based on trusted software. Section 10.3 evaluates the applicability of the

functionality specified in our approach and provides usage guidelines. Sec-

tion 10.4 concludes the chapter with a summary.

10.1 Coverage of Requirements

In this section, we show that our approach meets all functional and non-

functional requirements listed in Section 2.3.4. In addition to a theoretical

evaluation, we also discuss results obtained from evaluating the prototypical

application that we have implemented based on our approach, as described

in Section 4.3. It should be noted, though, that most non-functional require-

ments (such as the requirement of privacy) cannot be quantified easily, and

some of them (such as the requirement of broadness) cannot be quantified

at all, and therefore we cannot actually measure the extent to which these

requirements are met in the prototypical application. With regard to these

requirements, we have to rely on the theoretical evaluation.

10.1.1 Functional Requirements

In the following, we review the functional requirements and show that they

are actually met by our approach. As shown in Table 10.1, not all required

functionality has actually been implemented for the prototypical application,

169

mainly because the application focuses on Recommender System functional-

ity rather than Matchmaker System functionality.



The system should provide sufficient functionality for realizing the in-

formation collection stage, the information processing stage and the

information filtering stage of an IF system. This requirement is met by

our approach, because it provides for the respective main use cases:

–The information collection stage is covered by functionality for

the main use case “update profile elements”, as described in Sec-

tion 7.2.2.1.

–The information processing stage is covered by functionality for

the main use case “update profile model”, as described in Sec-

tion 7.2.2.2.

–The information filtering stage is covered by functionality for the

main use cases “get prediction for item”, “get recommendations”,

“get prediction for user”, and “get similar users”, as described in

Section 7.2.2.3 and Section 8.2.2.

The respective functionality has been implemented in the prototypical

application, with the exception of functionality for the use cases “get

prediction for user” and “get similar users”.



The system should be able to return all different kinds of result data

defined in Section 2.2.1, namely predictions of the relevance of specific

items, top-nrecommendations of items, predictions of the similarity

of specific users, and top-nsimilar users for a given user. In other

words, the system should be able to provide Recommender System

functionality as well as Matchmaker System functionality. As described

above, the different kinds of result data are covered by the different use

cases of the information filtering stage, and thus this requirement is

met by our approach, as described in Section 7.2.2.3.



Regarding filtering techniques, the system should be able to support

feature-based approaches as well as collaborative approaches. This re-

quirement is met by our approach, because the Recommender Module

utilizes feature-based approaches, while the Matchmaker Module uti-

lizes collaboration-based approaches for the generation of the central-

ized model. Two exemplary feature-based approaches, both of which

have been implemented in the prototypical application, are described

in Chapter 9.

170

Table 10.1: Coverage of the functional requirements of an IF

architecture by our approach for PPIF and the implemented pro-

totypical application. A requirement is covered (indicated by “X”)

or not covered (indicated by “–”).

functional covered by implemented

requirement our approach in application

information collection stage X X

information processing stage X X

information filtering stage X X

item predictions X X

recommendations X X

user similarity predictions X–

similar users X–

feature-based filtering techniques X X

collab.-based filtering techniques X–

10.1.2 Non-functional Requirements

In the following, we show that in addition to the functional requirements,

the non-functional requirements are also actually met by our approach. We

also discuss acceptance aspects here because they are closely related to non-

functional requirements. As shown in Table 10.2 and Table 10.5, not all

specified functionality has actually been implemented for the prototypical

application. The part of the functionality that has been implemented, how-

ever, is sufficient for evaluating the quantifiable requirements, such as per-

formance.

10.1.2.1 Privacy

The requirement of privacy is mainly met by keeping private data under

the control of the respective entity. Other entities may only access the data

temporarily, without being able to propagate the data to any other entity

or in any other way that would cause the first entity to lose control of the

further dissemination of its private data.

This solution is basically realized via mechanisms for communication con-

trol, as described in Section 5.2.2.1, which are applied in the interactions of

the Recommender Module in order to protect privacy against threats orig-

inating from honest-but-curious adversaries, as described in Section 7.2.2.

The respective protocols are extended by steps for secure message forward-

ing in order to protect privacy against threats originating from malicious

adversaries, as described in Section 7.3. Communication control and the

171

extended protocols have been implemented in the prototypical application.

Communication control has to be based on a trusted environment, which may

be realized via a trusted computing infrastructure but is not implemented

in the prototypical application. This solution applies to all entities and thus

addresses user privacy, provider privacy, and filter privacy.

In parts of protocols where communication control is not applicable or

insufficient, the requirement of privacy is additionally addressed via anony-

mous communication and encryption of private data, as described in the

following for the different main abstract entities:



User privacy: Privacy of the user entity is additionally preserved via

anonymous communication in the following areas of our approach:

–Query data propagation: User profile data has to be propagated

to the provider entity in case of large provider profiles. Unlinka-

bility of user and profile elements as well as unlinkability of profile

elements among themselves is achieved by propagating single ele-

ments via anonymous interactions, as described in Section 7.2.2.3.

This is not required for the scenarios of the prototypical appli-

cation, as described in Section 10.1.2.4, and therefore not imple-

mented.

–Result data propagation: User profile data may be reconstructed

from the result data, which is propagated to the provider entity

unless the partial use case “completely private result data” is re-

alized. In case of semi-linkable or semi-private result data, recon-

struction is prevented by propagating result data via anonymous

interactions, as described in Section 7.2.2.3. The prototypical ap-

plication realizes the partial use case “completely linkable result

data”, and therefore the other partial use cases are not imple-

mented in the prototypical application.

–Determining Potentially Similar Users: The Matchmaker Module

determines potentially similar users via a centralized model con-

taining user profile data. As in the case of query data propaga-

tion, unlinkability is achieved via anonymous communication, as

described in Section 8.2.2.1. As noted above, Matchmaker System

functionality is not implemented in the prototypical application.



Provider privacy: Privacy of the provider entity is additionally pre-

served via data encryption in the area of query data propagation of our

approach: As described above, anonymous communication is used to

172

protect the user privacy in this context, which implies that communi-

cation control cannot be used to protect provider profile data, which

is propagated to the user entity within the respective interactions, be-

cause controlled agents cannot communicate anonymously. In order to

protect the privacy of the provider profile data up to the point in time

when the respective user agent may be controlled, the data is encrypted

by the provider entity as described in Section 7.2.2.3. As noted above,

this is not required for the scenarios of the prototypical application,

and therefore not implemented.



Filter privacy: Privacy of the filter entity is additionally preserved via

data encryption in the area of information processing of our approach:

As described in Section 7.2.2.2, the provider profile model may allow

an honest-but-curious entity to obtain information about the filtering

algorithm, or even to carry out subsequent filtering processes by itself.

In order to protect the privacy of the filtering algorithm in this context,

the model is encrypted by the filter entity as described in Section 9.3

in the context of exemplary filtering techniques. This is implemented

in the prototypical application.

Table 10.2: Coverage of the privacy requirements of an IF ar-

chitecture in our approach for PPIF and the implemented proto-

typical application. A requirement is covered (indicated by “X”)

or not covered (indicated by “–”).

non-functional requirement/ covered by implemented

solution our approach in application

privacy (all entities)

communication control X X

extended protocols X X

trusted environment X–

user privacy

anonymous communication X–

provider privacy

data encryption X–

filter privacy

data encryption X X

To summarize, the requirement of privacy is met by our approach via a

combination of communication control, anonymous communication and data

encryption, as shown in Table 10.2.

173

10.1.2.2 Quality

While the requirement of quality is quantifiable and thus could be evaluated

practically e.g. by comparing the quality of recommendations provided by the

prototypical application with the quality of recommendations provided by a

regular IR system within the same domain, it is actually sufficient to evaluate

this requirement theoretically: As described in Chapter 9, the algorithms the

filtering techniques are based on are the same algorithms that are used in

a regular IF system, and therefore the quality of the result data does not

change compared to a regular IF system.

10.1.2.3 Broadness

The requirement of broadness implies that the architecture should not be

restricted to single information domains, specific filtering techniques, or spe-

cific persistent storage mechanisms. It is met by our approach due to the fact

that the specification of functionality does not bring about any restrictions of

this kind. In particular, the Infrastructure Module, the Recommender Mod-

ule and the Matchmaker Module are designed in a domain-independent way.

The TPMAS Module described in Chapter 6 provides a transparent persis-

tence interface which allows the use of arbitrary persistent storage mecha-

nisms. Domain-specific filtering techniques may be used for domain-specific

applications based on our approach, but, as shown in Chapter 9, domain-

independent filtering techniques may be applied effectively.

10.1.2.4 Performance

The requirement of performance is theoretically met by our approach, be-

cause the introduced protocols do not change the computational complexity

class or the communication complexity class with regard to profile data and

results, compared to a regular IF approach, as the amount of information

propagated and processed in each case is in the same class. However, as

exemplary shown in Table 10.3 for a typical use case, there is a substantial

amount of additional interactions and computations which are constant with

regard to profile data and results, but obviously affect the performance in

practice considerably.

Therefore, we also evaluate performance based on the implement proto-

typical application described in Section 4.3, i.e. the Smart Event Assistant.

In this application, recommendations are provided in two ways, based on an

item-based top-Nrecommendation algorithm: A push service delivers new

recommendations to the user in regular intervals (e.g. once per day) via email

or SMS. Because the user is not online during these interactions, they are

174

Table 10.3: The performance of a typical privacy-preserving

filtering process realizing the use case “get recommendations”

with linkable result data, assuming honest-but-curious partici-

pants, compared to the performance of the respective non-privacy-

preserving process.

# main computations # main interactions

non- privacy- non- privacy-

privacy- preserving privacy- preserving

preserving preserving

Phase I

create agent – 2 – 2

restrict comm. – 1 – 3

propagation PRU– – 1 3

Phase II

restrict comm. – 2 – 6

propagation PRS– – 1 3

Phase III

filtering algorithm 1 1 – –

propagation RES – – 1 3

terminate agent – 2 – 6

less critical with regard to performance and the protracted duration of the

information filtering process is acceptable in this case. Recommendations

generated for the intelligent day planner service, however, have to be deliv-

ered with very little latency because the process is triggered by the user, who

expects to receive results promptly. In this scenario, the overall performance

is substantially improved by setting up the relay agent and the TFE agent

offline, i.e. prior to the user’s request, and by the fact that the interactions

are based on the mixed IR/IF scenario, and thus the relevant part of the

provider profile may be propagated in an efficient manner: Because the user

is only interested in items, such as movies, available within a certain time

period and related to specific locations, such as screenings at cinemas in a

specific city, the relevant part of the provider profile is usually small enough

to be propagated entirely. As these additional parameters are not seen as

privacy-critical (because they are not based on the user profile, but rather

constitute a short-term information need), the relevant part of the provider

profile may be propagated as a whole, with no need for more complex in-

teractions. Thus, even a higher number of elements may be retrieved in an

efficient manner, because only a single query and a single interaction iteration

is required.

175

Taken together, these improvements result in a filtering process that,

according to our evaluation, takes about three times as long as the respective

non-privacy-preserving filtering process, which we regard as an acceptable

trade-off for the increased level of privacy. Table 10.4 shows the results of

the performance evaluation in more detail.

Table 10.4: The performance of typical privacy-preserving filter-

ing processes in the Smart Event Assistant, compared to the per-

formance of the respective non-privacy-preserving process. In the

non-privacy-preserving version, a filter agent retrieves the profiles

directly from a database and propagates the result to a provider

agent.

push scenario day planning scenario

non- privacy- non- privacy-

privacy- preserving privacy- preserving

preserving preserving

profile size (retrieved items/total amount of items)

user profile 25/25 25/25

provider profile 125/10,000 500/10,000

elapsed time in filtering process (in seconds)

agent creation etc. n/a 2.2 n/a offline

database access 0.2 0.5 0.4 0.4

profile propagation n/a 0.8 n/a 0.3

filtering algorithm 0.2 0.2 0.2 0.2

result propagation 0.1 1.1 0.1 1.1

overall time 0.5 4.8 0.7 2.0

To summarize, the requirement of performance is met by our approach

for different scenarios, as shown in Table 10.5 together with the other non-

privacy-related requirements.

10.1.2.5 User Acceptance

With regard to user acceptance, we consider our approach to be sufficiently

acceptable in terms of usability, mainly because the user interactions are

largely the same as in a regular IF system. Compared to other conceivable

approaches for Privacy-Preserving Information Filtering, such as centralized

approaches based on trusted computing, the user is expected to trust an ap-

plication based on our approach to a higher degree, because the concept of a

personal user agent containing and protecting personal data may be immedi-

ately more comprehensible than the concept of privacy protection via trusted

176

Table 10.5: Coverage of the requirements of quality, broadness,

and performance in our approach for PPIF and the implemented

prototypical application. A requirement is covered (indicated by

“X”) or not covered (indicated by “–”).

non-functional requirement/ covered by implemented

solution/ scenario our approach in application

quality

standard IF filtering techniques X X

broadness

transparent persistence X X

domain-independent protocols X X

domain-independent filtering techniques X X

performance

use cases in general



–

recommendations for offline user X X

mixed IR/IF scenario X X

computing technology. While it should be noted that our solution is based

on a trusted environment which is likely to be realized via trusted computing

as well, it may still be more acceptable especially if the trusted environment

is provided by an independent party and used for various additional tasks.

10.1.2.6 Provider Acceptance

We consider provider acceptance to be the greatest challenge of our ap-

proach. As long as users continue to accept and use non-privacy-preserving

Recommender Systems and Matchmaker Systems, providers are not likely

to embrace approaches for Privacy-Preserving Information Filtering mainly

because they obtain less information about users, and have to provide addi-

tional resources. However, as discussed in Section 2.2.3, this may change in

case of more complex applications, or in case of applications for more sensi-

tive domains, such as healthcare-related or financial Recommender Systems.

Ultimately, in order to acquire and retain large numbers of users for these

kinds of applications, providers will have to trade-off information about users

for user acceptance, and thus are expected to accept approaches for PPIF

perforce at least in specific scenarios.

177

10.2 Trusted Software Approaches

As described in Section 4.2.1, our solution is based on a trusted environment

in a Multi-Agent System technology context, which implies that the prob-

lem of malicious hosts has to be addressed. As discussed in Section 3.3.2, a

trusted computing infrastructure is the only viable technology-based solution

for this problem, and therefore our approach is based on a trusted computing

infrastructure. Furthermore, as described in Section 3.1.3, trusted computing

has been suggested as a Privacy-Enhancing Technology and could be applied

to a regular IF system in a rather straightforward way, resulting in system

realizing Privacy-Preserving Information Filtering via the trusted software

approach. In this section, we discuss the drawbacks of approaches based on

trusted software, and show that our approach based on a trusted environ-

ment, while somewhat more complex, is in fact more suitable with regard to

the requirements for Privacy-Preserving Information Filtering.

10.2.1 Broadness

The main problem of a solution for PPIF based on trusted software is its

lack of broadness: a software is trusted by a user in the context of trusted

computing when the user is able to verify, via examination, that the software

works as specified, and when the user is able to verify, via remote attestation,

that the instance of the software deployed by the provider actually matches

the examined software and is actually running within a trusted computing

infrastructure. Both aspects are problematic because they have to be carried

out for each single version of each single system in which the user participates.

Typically, the user himself does not have the knowledge or the resources to

examine software, and thus has to rely on third parties for this task. In

any case, the examination has to be carried out whenever some part of the

respective software, including the filtering algorithms in the context of IF, is

patched, upgraded or replaced.

In contrast, in a solution based on a trusted environment, only a com-

paratively small part of the overall system has to be examined and attested.

Other parts of the system, such as filtering algorithms or the protocols used in

the interactions, may be changed without changing the trusted environment

itself.

10.2.2 Provider Acceptance

Furthermore, a solution based on a trusted environment may be more ac-

ceptable for providers basically because the trusted environment could be

178

used for various other tasks and applications involving mobile agents: As

discussed in Section 11.2, it could actually be used to realize applications

from most areas of related work. Furthermore, as all applications based on

mobile agents ultimately require a trusted environment, it seems to be realis-

tic to assume that there is a greater chance that such a trusted environment

is actually realized, as compared to the chance that trusted software for a

large number of very specific tasks is actually realized.

10.3 Usage Guidelines

In this section, we evaluate the main use cases of our approach including par-

tial use cases with regard to applicability. While the provided functionality

may be combined arbitrarily, not all combinations are equally viable espe-

cially with regard to performance, nor are they all equally useful. Therefore,

in a real-world system based on our approach, we do not expect all combina-

tions to be available. In any case, even privacy-aware users are probably not

interested in having to decide whether the received result data should be e.g.

semi-linkable or semi-private. We give the following usage guidelines, based

on typical scenarios in Recommender Systems and Matchmaker Systems:



Recommender System; use case “get recommendations” based on the

complete provider profile: This use case is generally not that time-

critical with regard to performance, because it addresses a long-term

information need of a user and is based on relatively static profiles (in

the sense that profile updates are expected to take place in intervals

that are much longer (e.g. several hours) than the time required to carry

out a filtering process (e.g. a few seconds). Subsequently, the use case is

not based on immediate user input and can be realized without human

user interaction. At the same time, the result data generated in this use

case would allow honest-but-curious participants to deduce information

about the user profile to a larger extent than in the context of other

use cases, and therefore the result data is highly privacy-critical. We

therefore suggest to use the protocol for semi-private or completely

private result data in this case, and propagate a partial provider profile

based on unlinkable queries.



Recommender System; use case “get recommendations” based on a

constrained provider profile that represents the result of a query in the

mixed IR/IF scenario: This use case is more time-critical because the

query based on which the constrained profile is determined is expected

to have been created via interaction with a human user waiting for

179

results. At the same time, the result data may not be as privacy-critical

as in the scenario described above, basically because it may reflect the

user profile to a lesser degree, depending on the query and the number of

results obtained. If, for example, the constrained profile only contains

items that would never have been suggested as recommendations if the

complete profile had been used, the result data may not allow other

entities to correctly deduce information about the user profile. We

therefore suggest to use the protocol for completely linkable or semi-

linkable result data in this case, and propagate the entire constrained

provider profile.



Matchmaker System; use case “get similar users”: This use case again

addresses a long-term information need of a user that is based on rel-

atively static profiles, and is therefore not considered as time-critical

with regard to performance. Furthermore, it is also not highly privacy-

critical, again because the result data may not reflect the respective user

profiles closely. Users may also trust other users to a larger degree than

provider, i.e. other users are usually expected to act non-maliciously.

We therefore suggest to use the protocol for completely linkable or

semi-linkable result data in this case, and propagate the entire supplier

profile.

All other combinations of use cases occur less often in real-world IF sys-

tems and are therefore omitted here.

10.4 Summary

This chapter evaluates our approach for Privacy-Preserving Information Fil-

tering. We discuss the coverage of the non-functional and functional require-

ments by our approach and by the implemented application (Section 10.1),

and show that all requirements are addressed adequately. We compare our

approach with approaches based on trusted software (Section 10.2), and show

that approaches based on a trusted environment meet the requirements to

a higher degree. Finally, we evaluate the applicability of the functionality

provided by our approach (Section 10.3) and provide usage guidelines for

applications based on our approach. The following chapter concludes this

work and outlines future directions of research.

180

Chapter 11

Conclusion & Outlook

This work describes an agent-based approach for Privacy-Preserving Infor-

mation Filtering. This approach utilizes mechanisms that allow entities to

control the communication capabilities of other entities, combined with an

infrastructure for anonymous communication and data encryption. Based

on these building blocks, a trusted environment providing functionality for

the various stages of Information Filtering is realized. We describe how this

functionality may be used for obtaining multilateral privacy in the context

of Information Filtering, i.e. privacy of all participating entities. The ap-

proach provides functionality for privacy-preserving Recommender Systems,

distributed Matchmaker Systems, and combinations thereof. It utilizes fun-

damental features of agents such as autonomy, adaptability and the abil-

ity to communicate, which make agents uniquely suitable for representing

the entities participating in Privacy-Preserving Information Filtering. As

a proof of concept, an in order to be able to evaluate the approach practi-

cally, especially with regard to performance, we have implemented the Smart

Event Assistant as a prototypical application supporting users in planning

entertainment-related activities in a privacy-preserving manner.

This chapter concludes the work by discussing the applicability of the ap-

proach in large-scale real-world applications (Section 11.1) and by discussing

directions for further research and development (Section 11.2).

11.1 Applicability

If our approach is to be successfully applied in a large scale real-world Rec-

ommender System or Matchmaker System, the following aspects have to be

addressed:



An industrial-strength MAS technology is required as a robust foun-

181

dation for the implemented functionality. Application providers are

only expected to accept the use of MAS technology in the context of a

commercial application if it achieves a level of maturity with regard to

stability, required maintenance effort, support and ease of development

that is comparable to other technologies. Because most current MAS

technologies have been developed within a research context, they have

not been widely used in commercial applications yet.



Feedback obtained from users of the Smart Event Assistant indicates

that most users are indifferent to privacy in the context of entertain-

ment-related personal information. Therefore, in order to achieve a

high degree of user acceptance especially in view of expected minor

performance trade-offs, the application should probably focus on infor-

mation of a more privacy-critical domain, such as healthcare-related or

financial information.



Finally, in order to further increase user acceptance, the personal agent

containing and controlling the private user data should be physically as-

sociated with the respective user: Rather than operating on a platform

running on a server supplied by the application provider, the personal

agent should operate on a platform running on a device owned by the

user, even though the former approach is also feasible if the agent op-

erates within a trusted environment at all times. The latter approach

is probably realized best by utilizing mobile devices, because a mobile

device is available to the respective user in more situations than e.g. his

PC. This approach requires the respective MAS technology to be de-

ployed on a number of different mobile devices with different operating

systems, which is not possible with current MAS technologies.

To summarize, we consider the degree of maturity of current MAS tech-

nologies to be the biggest obstacle with regard to a successful real-world

application based on our approach.

11.2 Future Work

We envision two main areas of future work, namely the further implementa-

tion of our approach and possible generalizations of our approach addressing