Philip Raschke, Sebastian Zickau, Jacob Leon Kröger, Axel Küpper
Towards real-time web tracking detection with T.EX
- The Transparency EXtension
Open Access via institutional repository of Technische Universität Berlin
Document type
Conference paper | Accepted version
(i. e. final author-created version that incorporates referee comments and is the version accepted for
publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)
This version is available at
https://doi.org/10.14279/depositonce-16456
Citation details
Raschke, P., Zickau, S., Kröger, J. L., & Küpper, A. (2019). Towards Real-Time Web Tracking Detection with
T.EX - The Transparency EXtension. In Privacy Technologies and Policy (pp. 3–17). Springer International
Publishing. https://doi.org/10.1007/978-3-030-21752-5_1.
Terms of use
This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by
the copyright and related rights legislation that applies to your usage. For other uses, you must obtain
permission from the rights-holder(s).
Towards Real-time Web Tracking Detection
With T.EX - The Transparency EXtension
Philip Raschke, Sebastian Zickau, Jacob Leon Kröger, and Axel Küpper
Service-centric Networking, Weizenbaum-Institut, Telekom Innovation Laboratories,
Technische Universität Berlin, Germany
{philip.raschke,sebastian.zickau,kroeger,axel.kuepper}@tu-berlin.de
Abstract. Targeted advertising is an inherent part of the modern Web
as we know it. For this purpose, personal data is collected at large scale to
optimize and personalize displayed advertisements to increase the probability
that we click them. Anonymity and privacy have also been important
aspects of the World Wide Web since its beginning. Activists and developers
relentlessly release tools that promise to protect us from Web
tracking. Besides extensive blacklists to block Web trackers, researchers
have used machine learning techniques in recent years to automatically detect
Web trackers. However, artificial data of limited quality is often used
for this purpose.
Due to its sensitivity and the manual effort to collect it, real user data
is avoided. Therefore, we present T.EX - The Transparency EXten-
sion, which aims to record a browsing session in a secure and privacy-
preserving manner. We define requirements and objectives, which are
used for the design of the tool. An implementation is presented, which
is evaluated for its performance. The evaluation shows that our imple-
mentation can be used for the collection of data to feed machine learning
algorithms.
Keywords: Web tracking, browsing behavior, data privacy, browser extension,
data quality, machine-learning, classification algorithm
1 Introduction
There is no doubt that our Web browsing behavior is very sensitive. The websites
we visit and the content we consume reveal information about our personality,
our preferences, orientations, and habits. We give away our physical addresses,
our phone numbers, and bank account information to use services or order goods.
Simultaneously, the majority of websites nowadays integrate content from multiple
external sources or third parties. Consequently, when visiting a website
(also referred to as the first party), these third parties are notified of our visit
the moment our browser requests the external content. While our physical ad-
dress, phone number, or bank account information is not disclosed to these third
parties, a link to the website we visited is.
The reasons for websites to integrate external content are manifold. Services
embed images, audio, or videos without having to host or being allowed to host
the content on an own server. But also many third-party scripts are integrated for
various reasons. They are in particular critical, since their integration enables the
execution of third-party code on the user’s machine. There, they can access and
gather information of the device and send it to a server where it is aggregated and
analyzed. This way, a malicious third party can track every mouse movement,
every keystroke, and every change of the scroll position of a user on a different
website even without his or her awareness.
While on paper this sounds like a severe data security and privacy threat, this
technique is widely used in the field of targeted advertising and Web analytics to
track user behavior across multiple websites. In fact, Web trackers are an inherent
part of the modern Web, because of their economic value for content providers
and publishers. Websites display advertisements provided by ad exchanges or
advertising networks in exchange for a payment per view or click. This way,
each user of a website generates revenue.
While there is a variety of browser extensions that promise to tackle the
issue, they are mostly blacklist-based, i.e. manual effort is required to identify
trackers, which are then blocked (often by the domain name). This has four
major disadvantages: (i) trackers can easily change their domain name, (ii) web-
sites may offer relevant content or services, while also tracking user behavior
(Amazon, Google, etc.), (iii) blacklists can be wrong, not complete, or outdated,
and (iv) blocking requests to domains might create errors that prevent access
to the desired content of the first party. The latter also occurs in the opposite
causal direction, i.e. first parties block users from their content, if they block
requests to third parties. Another conceptual flaw of blacklists is that they are
not transparent themselves by providing little to no information on the third
party in question and why it is blocked or not.
Consequently, an automated approach to detect Web trackers is desirable.
This is a classification problem, which can be solved with machine learning
techniques. However, machine learning approaches require rather large amounts
of training data, which ideally is real data. Yet researchers in this field
often use bots to generate this data by crawling the Alexa.com top K websites.
While this method produces large amounts of data rather quickly, it has a major
drawback: it is not real data. These bots open the website, wait until it is finished
loading, and then open the next in the list. These bots cannot log into websites
like Facebook or Twitter, which even have implemented countermeasures for
artificial users of their services. Even worse, the front pages of these services are
very limited and only offer a login form. It can be safely assumed that most
of the third-party communication takes place after the login. By using bots,
tracking of user interactions like moving the mouse, pressing a key, or scrolling
is completely neglected.
For this reason, we present T.EX - The Transparency EXtension, a secure
browser extension to enable client-side recording, storage, and analysis of indi-
vidual browsing behavior. With this tool researchers can generate data sets with
real users in a secure, privacy-preserving, and user-friendly way. In this paper,
we define requirements concerning security, privacy, and usability and explain
how they were met. In addition, the extension provides data visualization capa-
bilities allowing (experienced) users to assess their browsing behavior and the
third-party communication involved in it.
The remainder of this paper is structured as follows: Section 2 defines the
objectives and requirements of the tool. Section 3 elaborates on the limitations
and the derived design decisions. Section 4 gives an overview of related work
and assesses whether suitable solutions already exist. Section 5 presents the
implementation of the tool. In Section 6, we evaluate the tool with regard to the
specified objectives. Finally, a conclusion is given including an outlook.
2 Objectives and requirements
As stated above, the main objective of the tool is to enable the generation
of real user data in a secure, privacy-preserving, and user-friendly manner by
allowing users to record browsing sessions. On this basis, we derive the following
objectives:
Obj1 The tool needs to be able to monitor Hypertext Transfer Protocol (HTTP)
and Hypertext Transfer Protocol Secure (HTTPS) traffic, including header
information, parameters, and the body.
Obj2 An accurate differentiation between first and third party must be realized.
The first party should not be identified only by its host name but rather by
the actual page (HTTP path) the user visited.
Obj3 The network traffic must be persistently stored for a certain amount of
time. This data must be securely (i.e. encrypted) stored on the user’s device,
so no other (malicious) software on the user’s machine can access it.
Obj4 The extraction of data must be in a privacy-preserving manner, i.e. only
relevant data should be collected. Furthermore, no external servers must be
involved.
Obj5 The user must be able to completely delete the data at any time. There
should be a means to prove the erasure of the data.
Obj6 Furthermore, the user must be able to export the data in a machine-
readable format.
Obj7 The user must be able to disable the recording of network traffic at any
time. Ideally, the user can be given a guarantee or proof that the recording
is stopped.
Obj8 Usage of the tool should be user-friendly to the extent that the perceived
Quality of Experience (QoE) is not impacted by it.
Obj9 The tool must offer data visualization capabilities so that users can review
the recorded data before they export it. A search function enables users to
check if any sensitive information is contained within the data set.
3 Limitations
Unfortunately, the above defined objectives cannot be realized without con-
straints. In this section, we infer limitations from these objectives and elaborate
on consequent design decisions for the tool.
In order to realize Obj1, HTTP and HTTPS traffic needs to be intercepted.
Obviously, this is a severe data security risk and infringement of the user’s pri-
vacy. For this reason, the collected and recorded data must remain on the user’s
device (see Obj4). However, intercepting HTTPS traffic on the network layer is
not possible without aggressive intervention. A man-in-the-middle attack could
be used in order to intercept the encrypted traffic, but this would put the user’s
overall data security at risk.
Fortunately, we can rely on capabilities offered by browser vendors. Experi-
enced users or system administrators have the expertise to obtain the data using
the browser’s developer tools like Google Chrome’s DevTools or the Inspector of
Firefox. However, the data that is logged there is kept separate for each browser
session (tab). Consequently, for a holistic view, an aggregation of the data is
required. The user would need to open the corresponding tool before the beginning
of each browsing session in each tab. The log is cleared with every new page the
user visits, so a checkbox needs to be ticked to persist the log (in each tab). To
export the recorded data, only Firefox’ Inspector offers a complete export of the
data, while Chrome’s DevTools only offer an option to export one request at a
time. Collecting data using this method is cumbersome and error-prone, which
violates Obj8. Further inspection of this method also revealed that Obj2 is vio-
lated, since the exported data either does not contain the first party (Chrome)
or only gives the host name of it (Firefox).
Clearly, a more sophisticated method is required. Luckily, HTTP and HTTPS
traffic can be logged using Chrome’s or Firefox’ extension Application Program-
ming Interface (API). So, Obj1 can be best implemented in a browser extension.
In fact, we found no alternative approach to realize Obj1 without aggressively
interfering with the user’s device. Using the extension API also allows us to
identify the first party including the HTTP path (see Obj2). Besides an initia-
tor field in the traffic log, it is possible to map a request to a certain open and
active tab of which the URL can be used.
To persistently store the data like stated in Obj3, a sophisticated database
like MySQL or MongoDB would be ideal, however this would require users to
install additional software on their device (violation of Obj8) or to transmit the
data to an external server (violation of Obj4). Browser extensions are able to
store data in the so-called local storage, which offers limited storage capabilities.
The local storage is a key-value store, thus complex queries cannot be easily ex-
pressed. Furthermore, the local storage is not encrypted, thus malicious software
on the user’s device could easily gain access to it. Therefore, encryption must
be implemented within the browser extension. However, inconvenient key-pair
generation and management must be avoided in order to not violate Obj8.
In order to realize a collection of data in a privacy-preserving manner (Obj4),
only the outgoing traffic is recorded. This way, we follow a data minimization
approach. The HTTP response, besides the actual content the user consumes,
contains cookies and identifiers that are assigned to the user and which are
used for subsequent requests. By neglecting the HTTP response, we miss these
assignments. However, we assume the preserved privacy is of higher value than
the benefit gained from the HTTP responses. Moreover, it is unclear whether the
accuracy of a classification algorithm to detect Web trackers would be increased
if the HTTP response is taken into consideration. It would be interesting to
investigate this in a separate study.
Since the HTTP body is used to transmit sensitive data like passwords,
messages, photos or videos, recording it can be highly sensitive. Therefore, it is
not recorded by default, but users can enable this feature at their own risk.
The reason why we do not completely exclude it, like we do with the HTTP
response, is that we could observe Web trackers using it for passing identifiers
to their servers.
The local storage can be cleared at any time; therefore, the user is given a
button to trigger the erasure of all data (Obj5). Moreover, the local storage is
file-based, i.e. its content can be found in plain text in files on the user’s machine.
Thus, to ensure the erasure of all personal data, the user can additionally delete
the corresponding files. The path to these files is static, so it can be given to
user so he or she can find it.
To export the data in a machine-readable format (Obj6) the whole local
storage must be queried, requests must be decrypted, and saved to a dedicated
file. Since data in the local storage is in JSON format, it is reasonable to export
it as such. Due to the diverse structure of the recorded data, an export in CSV
would be rather unwieldy.
Disabling the recording (Obj7) can be realized with a set of means: by imple-
menting blacklists (or whitelists), by offering a button to start and stop recording
at any time, or by disabling the extension completely. The latter is undoubtedly
the safest and easiest way to guarantee that the recording is disabled. Black-
lists or whitelists determine on which websites recording should be disabled or
enabled respectively. This approach, however, requires users to invest some ef-
fort for preconfiguration, which might violate Obj8. A button to start and stop
recording is rather easy to implement, but offers no advantage compared to en-
abling or disabling the extension, since this can be triggered with one click as
well.
To achieve Obj8, all other objectives must be realized with as little user
effort as possible. This means that the usage of the extension itself is realized
in a user-friendly manner. But furthermore, the usage of the extension should
not impact the perceived QoE while browsing the Web, i.e. websites should not
take noticeably longer to load, and CPU and memory consumption should not increase
so drastically that other applications are affected.
The visualization of the data (Obj9) can be done in the browser using Hy-
pertext Markup Language (HTML), Cascading Style Sheets (CSS), JavaScript,
and Scalable Vector Graphics (SVG). To highlight the communication flows, we
chose a graph representation of the data. A search function is provided to users
allowing them to query the data for personal information they do not want to
be included in a resulting data set, which is further processed.
4 Related work
Trackers enjoy a long presence in the history of the Web. In fact, they exist
almost as long as the Web itself. Lerner et al. [11] proved the presence of Web
trackers in 1996 by examining and analyzing the Web Archive. The Internet, as
a distributed system, is built upon interconnections of nodes, thus, third par-
ties are conceptually nothing to despise. However, for the precise personalization
of displayed advertisements, personal data is required, which is often collected
without a user's awareness using Web tracking techniques. One could argue that
the most severe issue with third-party content is not its presence but users'
unawareness of it. A study by Thode et al. [14] shows that users' expectations
regarding third-party tracking heavily differ from reality. With the General Data
Protection Regulation (GDPR) [7] coming into effect in May 2018, this circum-
stance becomes problematic, since it requires the processing of personal data to
be transparent.
Bujlow et al. [2] published a sophisticated survey on all known Web tracking
techniques to date. Most modern and often more accurate methods mostly rely
on third-party scripts that are executed on the user’s device to obtain a set
of data items to generate a so-called browser fingerprint, which is sufficient to
uniquely identify the user among other users.
Today, Web trackers are subject to extensive studies due to the threat they
pose to our data privacy. A very sophisticated study was conducted by Englehardt
et al. [6] in 2016, who aimed to measure and analyze the extent of third-party
presence on one million websites. To this end, they designed and developed
the tool OpenWPM to measure and record HTTP traffic. Yet, OpenWPM crawls
the top one million websites using Selenium, a framework to simulate
and automate user interactions. Thus, their measured data is not real user data.
Regardless of the data quality, they found third-party scripts present on nearly
all considered websites. Their results further show that only few third parties
are present on a high number of first parties. This is clear evidence for data
monopolies of the most prominent Web trackers. However, this circumstance is
also an advantage: one has to identify and block the few most prominent third
parties only to effectively protect oneself from Web tracking on the most popular
websites at least. This is one of the reasons why the blacklist-based approach is
so popular: it is very effective.
There are many browser extensions for all major browsers that follow this
approach. Their promise is to protect users from unintended and unauthorized
third-party information disclosure. Browser extensions like Ghostery [10], Ultra-
Block - Privacy Protection & Adblocker [16], Crumble [4], or Privacy Badger [13]
are very popular tools with millions of users. However, only Privacy Badger tries
to identify Web trackers based on their prominence in addition to blacklists. Pri-
vacy Badger blocks a third party if its presence is observed on three distinct first
parties. An additional challenge of these browser extensions is to maintain the
same level of user-perceived QoE after the extension has been installed. From a
user’s perspective, blocking third-party requests is very beneficial, since loading
times are decreased and computing resources are spared, as a study of Kontaxis
and Chew [8] confirms.
However, the browser extensions presented above give little to no information
on the tracking third party itself or technical details about the process of
data exposure. Two browser extensions give more information:
uMatrix [17] and uBO-Scope [15]. The extension uMatrix provides the user
with insights on the type of HTTP requests issued to the corresponding third
parties. To our knowledge, the extension uBO-Scope is the only one that
accurately gives information on the extent of presence of a specific third party
during the current browsing session. A high presence of a third party is indicated
with red in the extension’s pop-up window.
Nonetheless, all the browser extensions presented above aim to identify
and block tracking activities rather than serve as a tool to assess data flows to
third parties. They offer limited data visualization capabilities and no record-
ing options, which makes it difficult to analyze or further process the measured
data. The browser extension closest to the objectives of T.EX is Firefox’ Light-
beam [9], which has strong visualization features (Obj9), but fails to give more
insights on the communication that has taken place and the third parties themselves
(Obj1). Lightbeam allows exporting the recorded data in a machine-readable format
(Obj6), yet the exported information does not include the first party
with its HTTP path (Obj2).
The idea to use machine-learning techniques to identify Web trackers was
proposed by Bau et al. [1] in 2013. They elaborate on useful data sources and
how to obtain labeled training sets. Following the paper’s position, there were
several publications of researchers in the following years describing supervised
or unsupervised classification of Web tracking activities. In 2014, Metwalley et
al. [12] present an unsupervised approach that leads to successful results. Their
algorithm is able to detect 34 Web trackers that have never been documented
before. Similar results are achieved by Wu et al. [18] in 2016. They use a su-
pervised approach and detect 35 new tracking parties. Despite their successful
revelation of new Web trackers, both research groups use crawlers to generate
the data with which they feed their machine-learning algorithms.
The importance of proper data quality is highlighted by the publication of
Yu et al. [19], who achieve remarkable results with regard to accuracy and performance
of detecting Web trackers. The authors are a research group from the
development team of Cliqz, a German browser vendor that offers a browser of the
same name, Cliqz [3]. Through their product, they were able to use browsing
data of 200,000 users for their algorithm. This way, they were able to outperform
their commercial competitor Disconnect.me [5], which is also used by Firefox.
Fig. 1. The user interface of the browser extension including a graph, a search feature,
and further information on the third parties. Highly connected nodes are colored red
to indicate third parties with high extent of presence on other websites.
Fig. 2. Records visualized on a timeline enabling users to investigate requests initiated
by a certain website to a certain third party. By selecting an event, users can see the
corresponding record including all recorded data.
5 Implementation
This chapter presents the implementation of T.EX and explains how the indi-
vidual objectives were realized. T.EX has been implemented for Google Chrome,
however it is planned to port the implementation to Mozilla Firefox. Since the
offered browser extension APIs of the two browser vendors are based on the
WebExtension APIs, it can be expected that most of the code can be reused for
the implementation of a Firefox extension.
5.1 HTTP and HTTPS traffic logging and recording
To intercept and log HTTP and HTTPS traffic, the interface webRequest is used.
Chrome and Firefox emit an event onBeforeRequest before a request is issued.
Extensions can subscribe to the event by adding a listener to it. Both browsers
provide extensions with valuable information on the issued request, including all
necessary information on the target t of the request, search parameters S, request
headers H, form data F, and even data in the request body B. Interestingly,
determining the source s of a request requires more effort in Google Chrome.
While Firefox emits the initiator of a request in the originUrl field, Chrome only
gives information on the source in an optional field called initiator. To retrieve
the source even if the field is not set, a query of open tabs with the tabId is
required. A logged event is called record r, which is defined as follows:
r ∈ R := (s, t, S, H, F, B)    (1)
kv := (key, N) ∈ S ∪ H ∪ F ∪ B    (2)
v, kv ∈ N    (3)
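The record definition and the source lookup described above can be sketched as follows. This is a minimal illustration in Python rather than the extension's own code: the `details` dictionary only mimics the event payload (the field names `url`, `tabId`, `initiator`, and `originUrl` follow the text, while `headers`, `formData`, and `body` are hypothetical keys), `open_tabs` is an assumed mapping from tab ids to tab URLs, and the hostname comparison in `is_third_party` is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


@dataclass
class Record:
    """One logged request r = (s, t, S, H, F, B)."""
    source: Optional[str]                          # s: page that triggered the request
    target: str                                    # t: URL the request is sent to
    params: dict = field(default_factory=dict)     # S: search parameters
    headers: dict = field(default_factory=dict)    # H: request headers
    form_data: dict = field(default_factory=dict)  # F: form data
    body: Optional[dict] = None                    # B: only kept if the user opts in


def resolve_source(details: dict, open_tabs: dict) -> Optional[str]:
    """Use the source reported by the browser; otherwise look up the tab URL."""
    reported = details.get("originUrl") or details.get("initiator")
    if reported:
        return reported
    return open_tabs.get(details.get("tabId"))


def make_record(details: dict, open_tabs: dict, record_body: bool = False) -> Record:
    """Build a record from a (simplified) request event payload."""
    target = details["url"]
    return Record(
        source=resolve_source(details, open_tabs),
        target=target,
        params=dict(parse_qsl(urlsplit(target).query)),
        headers=dict(details.get("headers", {})),       # hypothetical key
        form_data=dict(details.get("formData", {})),    # hypothetical key
        body=details.get("body") if record_body else None,
    )


def is_third_party(record: Record) -> bool:
    """Simplified check: source and target host names differ."""
    source_host = urlsplit(record.source or "").hostname
    return source_host != urlsplit(record.target).hostname
```

A request without an initiator is then attributed to the URL of its tab, which keeps the HTTP path of the first party as required by Obj2.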
5.2 Persistent storage of records
Records need to be persistently stored in order to enable an assessment of them
later in time. The local storage of browsers is rather limited with regard to
performance and expressiveness of queries. The local storage is a so-called key-
value-store that allows to load values for certain keys or a set of keys, yet does
not offer possibilities to query ranges. Each key has to be unique and queried
explicitly. This means in practice that the local storage cannot be queried to
return records that have been recorded in the last seven days for example. Fur-
thermore, it is not advisable to get or set values in a high frequency, since the
local storage can be easily overwhelmed, which directly leads to a bad QoE.
For this reason, two strategies are implemented: the aggregation of records
into chunks and the writing of chunks into the local storage in a defined interval
i. This way, the local storage is accessed less frequently and the work load is evenly
distributed over time. However, these strategies raise the question of appropriate
keys that can be used for the chunks, so that they can be queried later in time.
To enable this, we implement a chain of chunks C, i.e. each chunk c points
to the previous chunk, and the key of the most recent chunk is stored in a global field
called currentId. Each chunk receives a timestamp ts, which is used as key for
the chunk.
c ∈ C := (ts, lastId, R[ts−i, ts])    (4)
currentId = ts    (5)
Eventually, this implementation enables queries of chunks in a certain time
range. Moreover, this implementation allows the erasure of old chunks after
a predefined time. Given that the local storage by default is limited to 5.24
megabytes, this feature is crucial. Both Chrome and Firefox have the extra
permission unlimitedStorage. Extensions that ask for the privilege are allowed
to store more data. Nonetheless, an implementation that does not rely on the
permission is desirable.
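The chain of chunks can be illustrated with a small sketch. It is not the extension's code: a plain Python dictionary stands in for the browser's key-value local storage, and the class and method names (`ChunkStore`, `append`, `query`, `prune`) are hypothetical. Only the ideas from the text are modeled: chunks keyed by timestamp, a `lastId` pointer to the previous chunk, and a global `currentId`.

```python
from typing import Optional


class ChunkStore:
    """Chain of chunks on top of a key-value store without range queries."""

    def __init__(self) -> None:
        self.kv: dict = {}  # stand-in for the local storage

    def append(self, ts: int, records: list) -> None:
        """Store the records of the interval ending at ts as a new chunk."""
        last_id: Optional[int] = self.kv.get("currentId")
        self.kv[str(ts)] = {"ts": ts, "lastId": last_id, "records": records}
        self.kv["currentId"] = ts

    def query(self, start: int, end: int) -> list:
        """Return all chunks with start <= ts <= end, oldest first."""
        result = []
        key = self.kv.get("currentId")
        # Walk backwards through the chain; stop once chunks are too old.
        while key is not None and key >= start:
            chunk = self.kv[str(key)]
            if chunk["ts"] <= end:
                result.append(chunk)
            key = chunk["lastId"]
        result.reverse()
        return result

    def prune(self, cutoff: int) -> int:
        """Delete chunks older than cutoff and return how many were removed."""
        removed = 0
        oldest_kept = None
        key = self.kv.get("currentId")
        while key is not None:
            chunk = self.kv[str(key)]
            older = chunk["lastId"]
            if chunk["ts"] >= cutoff:
                oldest_kept = chunk
            else:
                del self.kv[str(key)]
                removed += 1
            key = older
        if oldest_kept is not None:
            # Cut the chain (a real store would have to write this chunk back).
            oldest_kept["lastId"] = None
        else:
            self.kv.pop("currentId", None)
        return removed
```

Because every lookup uses an explicit key, a time-range query only needs `currentId` and the `lastId` pointers, which matches the restriction that the local storage cannot be queried by range.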
5.3 Encryption and decryption of chunks
Since the local storage resides on the user’s machine unencrypted, encryption
needs to be implemented in order to ensure data security. Otherwise, a malicious
application on the user’s device could gain access to this data and gain valuable
information like passwords, the browser history, email addresses, bank account
information and suchlike. Without encryption, T.EX would rather constitute a
severe risk than contribute to improved data security and privacy.
To implement encryption, the user is prompted to generate a key pair (pubKey
and privKey) after the installation of the browser extension. This requires the
user to enter a password pwd. The generated private key is encrypted with the
entered password using the Advanced Encryption Standard (AES). The generated
public key and the encrypted private key encPrivKey are then stored in
the local storage.
To encrypt chunks, a random key aesKey is generated that serves as symmet-
ric key for the encryption. This random key is used for the whole browsing session
until the browser is closed. This key is encrypted with the public key so that
only the private key can decrypt it. This encrypted symmetric key encAesKey
is stored along with the encrypted chunk in the local storage. To decrypt chunks,
the user is prompted to enter the password to decrypt the private key, which is
then used to decrypt the symmetric key to eventually retrieve the chunks.
5.4 Data visualization
As can be seen in Figure 1, data flows are represented by a graph G := (V, E),
which illustrates connections between visited websites (green-colored nodes) and
involved third parties (beige or red-colored nodes). Red-colored nodes are highly
connected nodes that receive data from various websites and Web applications.
For the coloring, a rather simple rule-based approach is used as a starting point.
However, it is planned to extend the coloring function at a later point in time.
Algorithm 1 Set-up and encryption of chunks
privKey, pubKey ← generateKeyPair()
pwd ← user-entered password
encPrivKey ← encrypt(privKey, pwd)
save(encPrivKey, pubKey)
c ← (ts, lastId, R[ts−i, ts])
aesKey ← generateRandomKey() for each session
encAesKey ← encrypt(aesKey, pubKey)
c′ ← (ts, lastId, encrypt(R[ts−i, ts], aesKey), encAesKey)
save(c′)
Algorithm 2 Decryption of chunks
encPrivKey ← load from local storage
pwd ← password prompt
privKey ← decrypt(encPrivKey, pwd)
c′ ← load from local storage
aesKey ← decrypt(c′_encAesKey, privKey)
c ← (ts, lastId, decrypt(R[ts−i, ts], aesKey))
A more fine-grained color gradient is currently being investigated to highlight only the Web
trackers in the graph.
G := (V, E)    (6)
V := {r_s, r_t | r ∈ R}    (7)
E := {(r_s, r_t) | r ∈ R}    (8)
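As an illustration of equations (6) to (8), the following sketch derives V and E from a list of (source, target) URL pairs and applies a simple rule-based coloring. Several points are assumptions and not taken from the implementation: nodes are reduced to host names, every node that appears as a source is treated as a visited website, and the threshold of three distinct websites for a red node is a hypothetical value.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_graph(records):
    """Build G = (V, E) from an iterable of (source_url, target_url) pairs."""
    vertices, edges = set(), set()
    for source_url, target_url in records:
        r_s = urlsplit(source_url).hostname
        r_t = urlsplit(target_url).hostname
        vertices.update((r_s, r_t))
        edges.add((r_s, r_t))
    return vertices, edges


def color_nodes(vertices, edges, threshold=3):
    """Green: visited website. Red: third party contacted by at least
    `threshold` distinct websites. Beige: any other third party."""
    visited = {r_s for r_s, _ in edges}
    contacted_by = defaultdict(set)
    for r_s, r_t in edges:
        if r_s != r_t:
            contacted_by[r_t].add(r_s)
    colors = {}
    for node in vertices:
        if node in visited:
            colors[node] = "green"
        elif len(contacted_by[node]) >= threshold:
            colors[node] = "red"
        else:
            colors[node] = "beige"
    return colors
```

Since E is a set, repeated requests between the same two hosts collapse into a single edge, so the coloring depends on how many distinct websites contact a third party and not on the number of requests.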
Users can search for keywords that might appear in URLs, headers, or pa-
rameters. Purple-colored nodes (as seen in Figure 1) are nodes that contain the
keyword in the record. By clicking on a node the user is able to retrieve more
information on the corresponding node, such as to which nodes data has been
sent or from which nodes data was retrieved. For further investigation of the
occurred communication, the user can investigate requests to or from one node,
which are visualized on a timeline. By selecting an entry on the timeline the
record is visualized (see Figure 2).
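The keyword search over a record can be sketched as a recursive walk over its nested key-value structure. This is only an illustration under assumptions: a record is modeled as nested dictionaries and lists, the function name `contains_keyword` is hypothetical, and matching is done case-insensitively on keys and values, which the text does not specify.

```python
def contains_keyword(node, keyword: str) -> bool:
    """Return True if the keyword occurs in any key or value of the node."""
    needle = keyword.lower()
    if node is None:
        return False
    if isinstance(node, dict):
        # A key-value pair matches if the key or the nested value matches.
        return any(
            needle in str(key).lower() or contains_keyword(value, keyword)
            for key, value in node.items()
        )
    if isinstance(node, (list, tuple)):
        return any(contains_keyword(item, keyword) for item in node)
    return needle in str(node).lower()
```

Nodes whose records match could then be highlighted in the graph, so that users can check for personal information before exporting the data.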
6 Evaluation
The aim of this section is to evaluate whether the usage of T.EX has a
non-negligible impact on the user-perceived QoE while browsing the Web. Therefore,
we investigate whether the loading time of a website noticeably increases
when using T.EX. We measure loading times by recording two key events:
onDOMContentLoaded and onCompleted. Both events occur strictly sequentially, i.e.
onDOMContentLoaded, which indicates that the Document Object Model (DOM)
is fully built, always occurs before onCompleted, which indicates that
all referenced resources are also fully loaded and initialized. From a user's perspective,
the first event occurs close to the moment when the user is able to
see the website. The latter, in contrast, is triggered when the loading
indicator of the browser disappears.
Analogously, we measure the resource consumption (i.e. CPU and memory
usage) during a website request and loading in order to learn the impact of
the browser extension on hardware resources. For this purpose, we request and
compute CPU and memory usage at a fixed interval (a so-called tick, every
50 milliseconds). Besides CPU and memory usage, we further evaluate the disk
space consumption of T.EX on a general level to find out how fast the extension
reserves disk space for its purpose.
As stated above, we open websites with and without T.EX activated. We
additionally repeat the procedure with a different, comparable browser extension
activated in order to be able to assess the performance of T.EX in comparison
with other extensions. For this purpose, we identified Privacy Badger as a good
candidate, since it uses the same APIs to analyze traffic in real-time. However, we
know that Privacy Badger decreases loading times of websites, while we expect
T.EX to increase loading times. This is due to Privacy Badger preventing HTTP
requests from occurring, thus saving time to load, while T.EX logs, processes,
and stores HTTP requests. For both hardware resources are used. With this
evaluation procedure we aim to put the increased hardware usage of T.EX into
perspective.
As test websites, we use the German news site spiegel.de and the front page of
google.de, which differ in the amount of third-party content they integrate.
While accessing google.de triggers only 23 requests, all of which request
content from Google servers, requesting spiegel.de involves more than 400
requests to more than 50 third parties. We expect hardware usage and loading
times to increase linearly with the number of involved requests; we therefore
selected two websites that lie at opposite ends of the spectrum in that respect.
The experiment was conducted on a machine with an Intel Core i7 (2.2 GHz
quad-core) and 16 GB of memory. The machine was connected to the Internet via a
1 Gbit Ethernet connection. Each experiment was repeated three times to detect
anomalies.
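One simple way to flag an anomalous repetition is to compare each run with the
median of the three. The sketch below is an assumption about how such a check
could look; the function flag_anomalies, the tolerance of 25 percent, and the
example values are hypothetical and not taken from the experiment.

```python
from statistics import median


def flag_anomalies(runs_ms, tolerance=0.25):
    """Return the indices of runs that deviate from the median by more
    than `tolerance`, expressed as a fraction of the median."""
    mid = median(runs_ms)
    return [index for index, value in enumerate(runs_ms)
            if abs(value - mid) > tolerance * mid]


# Hypothetical loading times in ms for three repetitions of one setup.
print(flag_anomalies([2300.0, 2350.0, 4100.0]))
```

With three repetitions, the median is robust against a single outlier, which is
why it is used here instead of the mean.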
The results of the experiment are depicted in Figure 3. The rows represent the
corresponding runs without T.EX activated (top), with T.EX activated (middle),
and with Privacy Badger activated (bottom). In each run the CPU usage (left
column), memory usage (middle column), and loading times (right column) were
measured.
Fig. 3. The results of the evaluation: the first column shows the CPU usage, the second
column the memory usage, and the third column the loading times. The first row
represents the measurements without T.EX activated, the second row with T.EX
activated, and the last row with Privacy Badger activated.
By comparing the individual results displayed in the first column, an increase
in CPU usage with T.EX activated is clearly observable. The CPU is working much
closer to capacity and maintains this level during the whole time the website
is loaded. The reason for the CPU demand of T.EX is the continuous encryption
of records in the background. Thus, disabling the encryption would improve
performance, yet would violate the extension's main objectives. Additional CPU
capacity is used because requests are preprocessed before they are stored in
the local storage. This preprocessing could be deferred to a later point in
time, for example, when the browser has been idle for a certain amount of time,
i.e. when it is currently not used by the user.
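The idea of deferring preprocessing until the browser is idle can be sketched as
follows. This is an illustration of the proposed optimization, not the current
behavior of T.EX; the class DeferredPreprocessor, its methods, and the idle
threshold of 60 seconds are hypothetical.

```python
import time
from collections import deque


class DeferredPreprocessor:
    """Buffer raw records and preprocess them only once no request has
    been seen for at least `idle_after_s` seconds."""

    def __init__(self, preprocess, store, idle_after_s=60.0,
                 clock=time.monotonic):
        self._preprocess = preprocess
        self._store = store
        self._idle_after_s = idle_after_s
        self._clock = clock
        self._pending = deque()
        self._last_activity = clock()

    def on_request(self, raw_record):
        """Hot path while a page loads: enqueue only, no heavy work."""
        self._pending.append(raw_record)
        self._last_activity = self._clock()

    def on_tick(self):
        """Called periodically; drains the queue if the idle threshold
        has been reached and returns the number of processed records."""
        if self._clock() - self._last_activity < self._idle_after_s:
            return 0
        processed = 0
        while self._pending:
            self._store(self._preprocess(self._pending.popleft()))
            processed += 1
        return processed
```

The trade-off of this design is that unprocessed records remain in memory until
the idle threshold is reached, so memory usage grows during long active
sessions.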
The memory consumption is largely consistent with our expectation: usage
increases moderately but not excessively. Comparable browser extensions like
Privacy Badger that perform similar tasks show the same level of memory
consumption. The perceived QoE should not be affected too much by this
circumstance. The loading times, in contrast, seem to be strongly affected by
the usage of T.EX. When comparing the plots in the third column of Figure 3, it
is noticeable that the loading time increases drastically when T.EX is
activated. This applies not to the onDOMContentLoaded event but to the
onCompleted event. Note that the page is usable much earlier, so that the user
can already interact with it before all referenced resources are fully loaded.
Nevertheless, the performance of T.EX with regard to loading times requires
improvement. It is also noteworthy that the loading times of google.de are
comparable to those measured in the other runs. Consequently, the drastic
increase in loading time occurs on websites with massive third-party
involvement. An exponential increase relative to the number of involved third
parties could be ruled out.
Finally, we investigate the disk space consumption. While it can be measured
easily by checking the size of the local storage files, it is rather difficult
to define a rule to estimate the storage usage. In general, it heavily depends
on the usage and browsing behavior of the user. In a dedicated three-hour
session, we collected 80 megabytes of data, while on a different machine that
is used exclusively (but then extensively) during office hours, we collected
almost 700 megabytes in a single month. Nonetheless, it must be stated that the
storage requirements imposed by the usage of T.EX exceed the requirements of
other browser extensions. Therefore, users of T.EX must be aware that the
recording of browsing sessions is storage intensive.
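To put the two observations into perspective, the following sketch converts them
into average growth rates. The yearly figure is a naive linear extrapolation of
the monthly observation (rounded to 700 megabytes) and an assumption of this
sketch, not a measured value; the function rate is hypothetical.

```python
def rate(total_mb, duration):
    """Average growth in megabytes per unit of `duration`."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return total_mb / duration


session_mb_per_hour = rate(80, 3)       # dedicated three-hour session
office_mb_per_year = rate(700, 1) * 12  # naive linear extrapolation

print(f"{session_mb_per_hour:.1f} MB/h, {office_mb_per_year:.0f} MB/year")
```

The large gap between the two rates illustrates why a general rule is hard to
define: the dedicated session produced data far more densely than everyday
office use.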
7 Conclusion & outlook
This paper presents T.EX, a browser extension that provides transparency to
experienced users or system administrators who want to record and analyze
communication flows to external third parties while browsing the Web. To this
end, objectives and requirements have been defined and their implementation has
been presented. T.EX will serve as a tool to conduct measurements and obtain
real user data in a secure and privacy-preserving manner, which might
contribute to more accurate machine learning models to identify Web trackers
and tracking activities in real time. We evaluated T.EX by measuring its impact
on performance to derive consequences for the user-perceived QoE. Our results
show that T.EX achieves performance comparable to that of other privacy browser
extensions like Privacy Badger. However, it has an impact on the loading times
of certain websites that cannot be neglected. This issue will be investigated
in future work. Furthermore, we will use T.EX to collect data that will be used
to identify trackers and their tracking activities.
Acknowledgments
Supported by the European Union’s Horizon 2020 research and innovation pro-
gramme under grant 731601.
References
1. J. Bau, J. Mayer, H. Paskov, and J. Mitchell, A Promising Direction for Web
Tracking Countermeasures, W2SP, 2013.
2. T. Bujlow, V. Carela-Espanol, B. R. Lee, and P. Barlet-Ros, A Survey on Web
Tracking: Mechanisms, Implications, and Defenses, Proceedings of the IEEE, vol.
105, no. 8, pp. 1476–1510, 2017.
3. Cliqz - Der sichere Browser mit integrierter Schnell-Suche [Online]. Available:
https://cliqz.com/ [Accessed: 4-Feb-2019].
4. Crumble Online Privacy, Stop Tracking. [Online]. Available: https:
//chrome.google.com/webstore/detail/crumble--online-privacy/
icpfjjckgkocbkkdaodapelofhgjncoh. [Accessed: 4-Feb-2019].
5. Disconnect [Online]. Available: https://disconnect.me/. [Accessed: 4-Feb-2019].
6. S. Englehardt and A. Narayanan, Online Tracking, Proceedings of the 2016 ACM
SIGSAC Conference on Computer and Communications Security - CCS '16, no. 1,
pp. 1388–1401, 2016.
7. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27
April 2016 on the protection of natural persons with regard to the processing of
personal data and on the free movement of such data, and repealing Directive
95/46/EC (General Data Protection Regulation), OJ L 119, 4.5.2016, p. 1-88.
8. G. Kontaxis and M. Chew, Tracking Protection in Firefox For Privacy and Perfor-
mance, In IEEE Web 2.0 Security & Privacy, Jun. 2015.
9. Firefox Lightbeam Add-ons for Firefox. [Online]. Available: https://addons.
mozilla.org/de/firefox/addon/lightbeam/ [Accessed: 4-Feb-2019].
10. Ghostery Makes the Web Cleaner, Faster and Safer! [Online]. Available: https:
//www.ghostery.com. [Accessed: 4-Feb-2019].
11. A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner, Internet Jones and the
Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996
to 2016, Usenix Security, 2016.
12. H. Metwalley, S. Traverso, and M. Mellia, Unsupervised Detection of Web Trackers,
in 2015 IEEE Global Communications Conference (GLOBECOM), 2014, pp. 1–6.
13. Privacy Badger — Electronic Frontier Foundation. [Online]. Available: https://
www.eff.org/privacybadger. [Accessed: 4-Feb-2019].
14. W. Thode, J. Griesbaum, and T. Mandl, I would have never allowed it: User
Perception of Third-party Tracking and Implications for Display Advertising,
Re:inventing Information Science in the Networked Society. Proceedings of the
14th International Symposium on Information Science (ISI 2015), Zadar, Croatia,
19th–21st May 2015, vol. 66, no. May 2015, pp. 445–456, 2015.
15. uBO-Scope: A tool to measure over time your own exposure to third parties on the
web. [Online]. Available: https://github.com/gorhill/uBO-Scope. [Accessed: 4-
Feb-2019].
16. UltraBlock - Block Ads, Trackers and Third Party Cookies. [Online]. Available:
https://ultrablock.org/. [Accessed: 4-Feb-2019].
17. uMatrix: Point and click matrix to filter net requests according to source, desti-
nation and type. [Online]. Available: https://github.com/gorhill/uMatrix. [Ac-
cessed: 4-Feb-2019].
18. Q. Wu, Q. Liu, Y. Zhang, P. Liu, and G. Wen, A machine learning approach for
detecting third-party trackers on the web, in Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 9878 LNCS, no. 4, I. Askoxylakis, S. Ioannidis, S. Katsikas,
and C. Meadows, Eds. Cham: Springer International Publishing, 2016, pp. 238–258.
19. Z. Yu, S. Macbeth, K. Modi, and J. M. Pujol, Tracking the Trackers, in Proceedings
of the 25th International Conference on World Wide Web - WWW '16, 2016, no.
AUG., pp. 121–132.