Jonathan Heiss, Anselm Busse, Stefan Tai
Trustworthy Pre-processing of Sensor Data in Data
On-Chaining Workflows for Blockchain-Based IoT
Applications
Open Access via institutional repository of Technische Universität Berlin
Document type
Conference paper | Accepted version
(i. e. final author-created version that incorporates referee comments and is the version accepted for
publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)
This version is available at
https://doi.org/10.14279/depositonce-19863
Citation details
Heiss, J., Busse, A., & Tai, S. (2021). Trustworthy Pre-processing of Sensor Data in Data On-Chaining
Workflows for Blockchain-Based IoT Applications. In Service-Oriented Computing (pp. 133–149). Springer
International Publishing. https://doi.org/10.1007/978-3-030-91431-8_9.
Terms of use
This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by
the copyright and related rights legislation that applies to your usage. For other uses, you must obtain
permission from the rights-holder(s).
Trustworthy Pre-Processing of Sensor Data in
Data On-chaining Workflows for
Blockchain-based IoT Applications
Jonathan Heiss, Anselm Busse, and Stefan Tai
Information Systems Engineering (ISE)
TU Berlin, Germany
{jh,ab,st}@ise.tu-berlin.de
Abstract. Prior to provisioning sensor data to smart contracts, a pre-
processing of the data on intermediate off-chain nodes is often necessary.
When doing so, originally constructed cryptographic signatures cannot
be verified on-chain anymore. This exposes an opportunity for undetected
manipulation and presents a problem for applications in the Internet of
Things where trustworthy sensor data is required on-chain.
In this paper, we propose trustworthy pre-processing as an enabler for
end-to-end sensor data integrity in data on-chaining workflows. We define
requirements for trustworthy pre-processing, present a model and common
workflow for data on-chaining, select off-chain computation utilizing
Zero-knowledge Proofs (ZKPs) and Trusted Execution Environments (TEEs) as
promising solution approaches, and discuss both our proof-of-concept
implementations and initial experimental, comparative evaluation results.
We present the importance of trustworthy pre-processing and principal
solution approaches, addressing the major problem of end-to-end sensor
data integrity in blockchain-based IoT applications.
Keywords: Internet of Things · Blockchain · Data Integrity · On-chaining
· Off-chaining · Pre-processing · TEEs · zkSNARKs
1 Introduction
Blockchain technology is increasingly used in the Internet of Things (IoT) to
store and process critical sensor data originating from and shared between mul-
tiple, often mutually distrusting parties [21, 25, 15, 16, 7, 12, 24]. In local energy
grids with blockchain-based energy trading, for example, energy consumers and
producers depend on smart meter-generated measurement data [20, 7]. In sup-
ply chains, product-related manufacturing and shipping events are written to
a blockchain to provide a single source of truth for all involved, independent
parties [25, 24]. In healthcare, blockchain use cases exist for doctors, hospitals,
and emergency services to have access to patients’ health data collected by wear-
ables [12].
However, the variety and scale of connected IoT devices and the generated
data pose new challenges regarding data processing and data on-chaining. Raw
sensor measurements cannot directly be used on the blockchain because of vol-
ume limitations [18] or because sensitive information may be exposed and be-
come accessible to unintended readers [7]. Blockchains inherently have privacy
and scalability limitations [6, 17] that must be taken into account.
Consequently, the on-chain processing of sensor data is preceded by pre-
processing steps to reduce data volume and ensure that confidential information
is veiled. Such pre-processing typically is executed on intermediate, off-chain
nodes as part of multi-staged data provisioning workflows [21, 25, 15, 16, 7, 12]:
data originates on constrained sensor nodes, then moves to more powerful gate-
way nodes for pre-processing, and is finally provisioned to smart contracts as ag-
gregated information. For example, in the healthcare use case described in [12],
data is pre-processed by personal computers or smartphones; in energy grids [7]
by workstations located within participating households; in supply chains [25]
by board computers and mobile devices.
While pre-processing has become an integral element in such data on-chaining
workflows and is necessary to mitigate scalability and privacy issues, off-chain
pre-processing also represents a security risk. Sensor devices typically sign their
measurements to provide data integrity. However, sensor data integrity is not
end-to-end: once data is pre-processed on middleboxes, signatures constructed
on the input do not apply to the output anymore. Contrary to smart contract
application logic, application stakeholders cannot validate off-chain processing as
part of the blockchain’s consensus protocol. Consequently, naive pre-processing
can be exploited for malicious data manipulation without being noticed. This
attack vector threatens data integrity in data on-chaining workflows and quickly
questions the entire blockchain-based IoT system design and data quality.
To address this problem, solutions are needed to ensure trustworthy pre-
processing, i.e., to make computational correctness verifiable on the blockchain.
Off-chain computations have been proposed [6] to outsource blockchain transac-
tion processing to off-chain nodes without compromising trust guarantees. Zero-
Knowledge (ZK) computations and Trusted Execution Environments (TEE) are
two important approaches here that are also increasingly being used in early-
adoption projects and practice [7, 10, 9, 1]. However, using ZK computations and
TEEs for trustworthy pre-processing has not been examined so far.
In the face of the rising interest in blockchain-based sensor data management
and the need for end-to-end sensor data integrity, in this paper, we analyze the
underlying problem of trustworthy pre-processing in data on-chaining workflows,
propose a model for integrity-preserving data on-chaining, and examine its prac-
tical applicability based on ZK computations and TEEs. Thereby, we make two
individual contributions:
1. First, we propose a model for end-to-end sensor data integrity through trust-
worthy pre-processing. We characterize sensor data pre-processing in on-
chaining workflows for blockchain-based IoT applications based on relevant
literature. From our findings, we refine our problem statement and introduce
trustworthy pre-processing as a workflow element that enables application
stakeholders through participation in the blockchain network to verify data
integrity from source to sink.
2. Second, we examine the applicability of zkSNARKs-based and Trusted Exe-
cution Environments (TEE)-based off-chain computations for our proposed
model. Based on a typical application workflow, we first conceptualize how
trustworthy pre-processing can be instantiated with ZoKrates [8], a toolkit
for zkSNARKs-based off-chain computation, and with Intel SGX [5], In-
tel’s realization of TEEs. Then, we implement the proposed model with
both technologies as a proof of concept and present preliminary experiments
in a testbed. While our results attest to the applicability of trustworthy
pre-processing with both approaches, they also confirm that, in comparison,
zkSNARKs provide stronger integrity guarantees (weaker trust assumptions),
whereas TEEs enable more efficient off-chain pre-processing.
2 Pre-Processing
To lay the foundation for trustworthy pre-processing, in this section, we first
describe the general characteristics of pre-processing in blockchain-based IoT
applications that we observed in pertinent research papers. Next, we refine our
problem statement and define computational integrity, based on [2]. Finally,
we present a model for trustworthy pre-processing on gateway nodes for use in
data on-chaining workflows that start with sensor devices and result in smart
contracts.
2.1 Characterization
Pre-processing in blockchain-based applications shares common objectives, input
types, and functionality.
Objectives In data on-chaining workflows, off-chain pre-processing helps to
mitigate blockchain-inherent scalability and privacy limitations. Thereby, it pur-
sues the following objectives:
Offloading Computation: Outsource on-chain data processing to an off-chain
node that is not bound to costly consensus-based transaction processing [7].
Reducing Storage: Reduce the volume of sensor data to minimize the storage
footprint on the blockchain [12, 19].
Enabling Confidentiality: Hide sensitive information contained in raw mea-
surements or meta-data from stakeholders that do have read permissions [7,
25, 21].
Inputs Pre-processing can be executed on different types of data. We distinguish
between the following:
Measurements include all data that is generated by sensor devices. This
includes time series data collected over a longer period of time [22], for
example, temperature or location data, and event data that represents ex-
ternally triggered occurrences [25], for example, the scanning or opening of
a container in a logistics context.
Meta-data originates from the sensor device and contains descriptive infor-
mation about the measurements, such as sensor identities, target storage
addresses, or timestamps.
Auxiliary data is added at the gateway node. Examples are filter rules, access
control lists, or storage addresses.
Measurements and meta-data are critical for pre-processing and are referred to
in the following as sensory data. In contrast, auxiliary data is never processed
alone but optionally used to enrich pre-processing.
Types Without claiming completeness, we identify three general types of data
pre-processing which can be observed in relevant applications [25, 20, 12, 21] and
which represent typical functionality for operating on sequential data 1.
Mapping: Data is transformed into a target format, e.g., enumeration, en-
cryption, decryption, hashing [21, 25].
Reducing: Data of one or multiple sensor devices is consolidated, e.g., the
arithmetic average or a total amount is calculated [20].
Filtering: Data is filtered according to predefined rules, e.g., only values
below a predefined threshold are returned [12].
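The three pre-processing types can be illustrated with a minimal Python sketch. The readings, the threshold, and the function names are hypothetical and chosen only for illustration; they are not taken from the cited applications.

```python
import hashlib
from statistics import mean

# Hypothetical temperature readings in degrees Celsius (illustrative values only).
readings = [4.0, 5.5, 9.0, 3.5]

def map_hash(values):
    """Mapping: transform each value into a target format (here a SHA-256 hex digest)."""
    return [hashlib.sha256(repr(v).encode()).hexdigest() for v in values]

def reduce_average(values):
    """Reducing: consolidate many values into one (here the arithmetic average)."""
    return mean(values)

def filter_below(values, threshold):
    """Filtering: return only the values below a predefined threshold."""
    return [v for v in values if v < threshold]

hashed = map_hash(readings)
average = reduce_average(readings)
below = filter_below(readings, 5.0)
```

In practice the three types are often chained, for example filtering first and reducing the filtered values afterwards.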
2.2 Problem Refinement
Data provisioning is often controlled by one of the stakeholders, e.g., shippers in
supply chains [25, 15] or producers in energy markets [7]. Stakeholders may have
a personal, often economically motivated interest in manipulating the data, e.g.,
in cooling chains to prevent contractual penalties if perishable freight has perished
or to improve accounting positions. Given such motives, we assume data-providing
stakeholders as potential attackers.
In data on-chaining workflows, data can take three states: it is in transit
when it is transmitted from one to another component, it is at rest when it is
persisted on disk, and it is in use when it is processed in memory. During the
states in transit and at rest, data integrity and authenticity can be verified using
cryptographic signatures. However, when data is processed, it is transformed and
signatures constructed on the input do not apply for the output anymore. Fur-
thermore, off-chain pre-processing cannot be validated by stakeholders through
the consensus mechanism. An attacker could selfishly execute different functions
on the data to manipulate the output and obtain a personal benefit without be-
ing noticed. Therefore, we assume manipulation of computation as the potential
attack.
1 https://web.mit.edu/6.005/www/fa15/classes/25-map-filter-reduce/
2.3 Computational Integrity
As a first step towards trustworthy pre-processing, we characterize computa-
tional integrity. We adopt the model proposed in [2].
A pre-processing program $P$ is executed on input data $D$ and some auxiliary
data $A$ and returns output $O$ such that $P(D, A) \to O$.
A malicious executer may benefit from creating a manipulated program $P'$
such that $P'(D, A) \to O' \mid O' \neq O$. For example, in the supply chain use case,
a shipper executes a threshold check $P$ on temperature measurements $D$ using
the threshold $A$. If the shipper knows that the outcome $O$ triggers a contractual
penalty, but $O'$ does not, it may change $P$ to $P'$ to obtain $O'$ instead of $O$. It
then reports $O'$ to the blockchain and is exempt from the penalty. Additionally,
the executer may leave the program $P$ unchanged but manipulate the input data
$D$ such that $P(D', A) \to O' \mid D' \neq D \land O' \neq O$ or the auxiliary data $A$ such that
$P(D, A') \to O' \mid A' \neq A \land O' \neq O$.
To prevent both program and input manipulation, stakeholders should be
able to verify computational integrity, which is only guaranteed if a reported
output $O'$ is computed by the right program $P$ on the right input data $(D, A)$ such that
$P'(D', A') \to O' \mid (P' = P) \land (D' = D) \land (A' = A)$. Therefore, we assume that
program $P$ also generates an evidence $E$ that asserts computational integrity such
that $P(D, A) \to (O, E)$. To enable third-party stakeholders to verify computational
integrity, an asymmetric key pair is additionally required: the evidence
signed with the proving key can be verified by any third party with the corre-
sponding verification key. The evidence and the evidence key pair represent the
major artefacts for trustworthy pre-processing.
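The three manipulation cases described in this section (a changed program, changed input data, and changed auxiliary data) can be made concrete with a small Python sketch of the threshold check. All values and the names `P_prime`, `D_prime`, and `A_prime` are hypothetical and serve only to illustrate how each manipulation changes the reported outcome.

```python
def P(D, A):
    """Honest program: report True if any measurement exceeds the threshold A."""
    return any(d > A for d in D)

def P_prime(D, A):
    """Manipulated program: always reports that no violation occurred."""
    return False

D = [4, 6, 9]   # hypothetical temperature measurements
A = 8           # hypothetical threshold (auxiliary data)

O = P(D, A)                        # honest outcome: a violation is detected

O_program = P_prime(D, A)          # program manipulation
D_prime = [min(d, A) for d in D]   # input manipulation: clip values at the threshold
O_input = P(D_prime, A)
A_prime = 100                      # auxiliary data manipulation: raise the threshold
O_aux = P(D, A_prime)
```

Without evidence that binds the output to the program and both inputs, a verifier who only sees the reported outcome cannot distinguish the honest result from any of the three manipulated ones.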
2.4 End-to-End Data Integrity
Given that integrity of data can be verified while it is in use, we can define a
data on-chaining workflow where integrity is verifiable from its source on the
sensor node to its sink on the smart contract as depicted in Figure 1. Note that
instead of a simple signature, verifiable evidence is provided to the blockchain
that allows data integrity verification with moderate computational overhead in
the blockchain network.
Fig. 1: End-to-End Data Integrity through Trustworthy Pre-Processing
One Time Setup During an initial one time setup, central system artifacts are
generated and deployed on the system components. Given that these artifacts
are critical to verify computational integrity, we assume a trusted setup where
each stakeholder can verify the integrity of the artifacts. It consists of three
steps:
As a first step (1. Integrity Assertion), an environment is established that
enables the gateway node to generate verifiable evidence of computational in-
tegrity as accompanying artefacts of the pre-processing outputs. This includes
the integrity of sensory and auxiliary inputs. Examples for such environments
are mathematical constraint systems [8] or trusted execution environments [5]
as will be described in the subsequent section.
Next (2. Key Generation), two key pairs are required: an evidence key pair
consisting of a proving and verification key for signing and verifying the evidence
and a sensor key pair, represented as a cryptographic public and private key that
is used to sign and verify the sensor data on the sensor node and the gateway
node respectively.
As the last setup step (3. Deployment), all artefacts are deployed: The gate-
way node is equipped with the sensor node’s public key, the integrity-preserving
pre-processing program, the proving key, and optionally auxiliary data. The
smart contract receives the verification key that enables evidence verification.
Recurring Operations Sensory data arrives recurringly at the gateway node
in regular intervals, e.g., batches of time series data, or in irregular intervals,
e.g., externally triggered events. Then (4. Pre-Processing), the pre-processing
program takes the signed sensory data, the sensor’s public key, and optionally
auxiliary data as inputs and executes the following steps:
(a) The sensory inputs’ signature is verified with the sensor device’s public key.
(b) Pre-processing functions are executed on the verified inputs. Examples are
provided in section 2.1.
(c) An evidence is created and signed with the gateway’s proving key. The
evidence enables the smart contract to verify computational integrity.
Outputs and signed evidence are transmitted to the smart contract through
the blockchain node. The smart contract verifies the evidence using the
verification key (5. Verification). Successful verification on the blockchain enables
application stakeholders to independently verify that integrity of sensor data
has been preserved from source to sink despite intermediate pre-processing.
Pre-processing outputs can be consumed through participating blockchain nodes
and used for subsequent processing.
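The recurring operations can be sketched end to end in Python. This is a simplified illustration under a strong assumption: HMAC-SHA256 with shared secrets stands in for both the asymmetric sensor key pair and the evidence key pair, so it models only the signing and verification flow, not the binding of the evidence to the pre-processing program. All key values and function names are hypothetical.

```python
import hashlib
import hmac
import json

# ASSUMPTION: symmetric HMAC keys stand in for the asymmetric sensor and
# evidence key pairs of the model. Key values are placeholders.
SENSOR_KEY = b"hypothetical-sensor-key"
EVIDENCE_KEY = b"hypothetical-evidence-key"

def sign(key, payload):
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify(key, payload, tag):
    return hmac.compare_digest(sign(key, payload), tag)

def sensor_node(measurements):
    """Source: serialize and sign the sensory data."""
    payload = json.dumps(measurements).encode()
    return payload, sign(SENSOR_KEY, payload)

def gateway_node(payload, signature, threshold):
    """Step 4: (a) verify the signature, (b) pre-process, (c) create evidence."""
    if not verify(SENSOR_KEY, payload, signature):
        raise ValueError("sensor signature invalid")
    output = sum(1 for m in json.loads(payload) if m > threshold)
    evidence = sign(EVIDENCE_KEY, json.dumps(output).encode())
    return output, evidence

def smart_contract(output, evidence):
    """Step 5: accept the output only if the evidence verifies."""
    return verify(EVIDENCE_KEY, json.dumps(output).encode(), evidence)

payload, signature = sensor_node([3, 9, 12])
output, evidence = gateway_node(payload, signature, threshold=8)
accepted = smart_contract(output, evidence)
tampered_accepted = smart_contract(output - 1, evidence)
```

A reported output that differs from the one the evidence was created for is rejected, which is the property the smart contract relies on in step 5.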
3 Application
For trustworthy pre-processing to become easily applicable in practice, technolo-
gies are required that enable on-chain verifiability of computational integrity and
that can implement the pre-processing characteristics as described in Section 2.1.
Fig. 2: Off-chain Computation Technologies according to [6]
3.1 Technologies for Trustworthy Pre-processing
Off-chain computation has been proposed to mitigate privacy and scalability
limitations of blockchains by outsourcing computation to off-chain nodes with-
out compromising core blockchain properties [6, 17]. Thereby, it represents a
matching concept for trustworthy pre-processing.
However, the different approaches to off-chain computation presented in [6]
and depicted in Figure 2 are not equally suitable. Both incentive-based and
sMPC-based approaches require multiple nodes that execute non-trivial pro-
tocols. However, in data on-chaining applications in the IoT [12, 15, 25, 7, 21],
pre-processing is typically executed on a single node with limited networking
and storage capacity. If such a constraint is given, the distributed computation
model and interactive nature of incentive- and sMPC-based approaches may be
inconsistent with use case specific requirements which restricts general applica-
bility. In contrast, zero-knowledge and enclave-based approaches can be executed
non-interactively on a single node and, hence, promise broader applicability for
trustworthy pre-processing.
3.2 ZkSNARKs-based Pre-Processing with ZoKrates
Zero-knowledge proofs enable a prover to convince a verifier that it has correctly
executed a computation without revealing inputs to the verifier.
zkSNARKs can be summarized as one type of zero-knowledge protocol
that is distinguished by succinctness, i.e., resulting artefacts are small in size
and can be verified fast, non-interactivity, i.e., only one message is required to
convince the verifier, and argument of knowledge, i.e., the prover is able to prove
that she has access to the correct data.
ZoKrates [8] provides a toolbox and a higher-level language to implement a
zkSNARKs-proving system where an off-chain prover can convince an on-chain
verifier that the computation has been executed correctly.
To describe the ZoKrates-based pre-processing (compare Figure 3), we leverage
the model presented in Section 2.4 and build upon the ZoKrates workflow
described in [8].
One Time Setup
1. Integrity Assertion: To guarantee integrity of auxiliary data and the sensor
public key, both are typed as public arguments in the ZoKrates program and,
Fig. 3: Trustworthy Pre-Processing with ZoKrates
hence, are required on-chain for evidence verification. Since the verification
would fail on different public inputs, their integrity can be determined on-
chain.
Once specified, the high-level ZoKrates code is compiled into an executable
constraint system (ECS) in the ZoKrates Intermediate Representation (ZIR)
format that can be considered an extension of a Rank-1 Constraint System
and enables assertion of computational integrity: if a variable assignment is
found that satisfies the defined constraints, computational integrity can be
proven.
2. Evidence Key Generation: An evidence key pair is generated from a Common
Reference String (CRS) [8] which enables proof creation and verification.
Since the CRS allows construction of fake proofs, it must be securely disposed
of after key generation. The evidence key pair is cryptographically bound to
the previously generated ECS.
3. Deployment: The ECS, the evidence proving key, auxiliary data, and the
sensor public key are deployed to the gateway node which takes the role
of the off-chain prover. Verification key and the verification contract are
deployed to the blockchain.
Recurring Operations
5. Execution: The ZIR program is executed on predefined inputs through the
ZoKrates interpreter. The output is called the witness, an artefact representing
variable assignments that satisfy the specified constraints for a specific exe-
cution. In a separate step, the cryptographic proof is generated based on the
execution-specific witness and the program-specific proving key. Finally, out-
puts and evidence are forwarded to the smart contract through a blockchain
node.
6. Verification: The verification contract takes the cryptographic proof, the
verification key, and public program arguments as input parameters. The
verification is only successful if the proof was generated with the right program
and on the right (public) inputs.
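The notion of a witness as a variable assignment that satisfies a constraint system can be illustrated with a plain Rank-1 Constraint System check in Python. This is a toy sketch, not the ZIR format and not a proof system: the field modulus 97, the example computation out = x * x + 1, and all names are assumptions made for illustration.

```python
P_FIELD = 97  # toy prime modulus, chosen only for readability

def dot(row, w):
    """Inner product of a constraint row with the witness, reduced mod P_FIELD."""
    return sum(r * x for r, x in zip(row, w)) % P_FIELD

def satisfies(constraints, w):
    """Check (a . w) * (b . w) == (c . w) for every constraint (a, b, c)."""
    return all(
        (dot(a, w) * dot(b, w)) % P_FIELD == dot(c, w)
        for a, b, c in constraints
    )

# Witness layout: [1, x, y, out] for the computation y = x * x, out = y + 1.
constraints = [
    ([0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]),  # x * x = y
    ([1, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]),  # (1 + y) * 1 = out
]

honest_witness = [1, 3, 9, 10]
forged_witness = [1, 3, 9, 11]  # claims a wrong output for the same input

honest_ok = satisfies(constraints, honest_witness)
forged_ok = satisfies(constraints, forged_witness)
```

A zkSNARK proof additionally convinces the verifier that such a satisfying assignment exists without revealing the private parts of it; that step is not modelled here.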
Fig. 4: Trustworthy Pre-Processing with Intel SGX
3.3 Enclave-based Pre-Processing with Intel SGX
Enclave-based computation enables an enclave-external party to verify that an
output has been computed by a specific program inside a specific enclave that
protects internal integrity. Thereby, it relies on two concepts: Trusted Execution
Environments and Remote Attestation.
Trusted Execution Environments (TEE) are hardware-secured parts of a sys-
tem architecture that protect data and code from external manipulation and dis-
closure. Programs executed inside such TEEs are running in an isolated and/or
encrypted memory region that cannot even be accessed in the highest privilege
level of the system. Thus, it protects the content of the TEE from the system
owner and guarantees the integrity of computation executed inside the TEE.
Intel SGX is Intel’s concrete implementation of TEEs. We use the terms TEE
and enclave interchangeably.
Remote Attestation enables the external verification of the integrity of the
TEE’s internal state and the authenticity of messages received from inside, thus
ensuring that a malicious attacker cannot falsely pose as a trusted enclave.
TEE-enabled devices have a device identity key that is embedded into the device
hardware during manufacturing and can be verified by external parties through
a Public Key Infrastructure (PKI). Using this key, the device creates for each
instantiated TEE an identity certificate which can externally be verified through
the PKI. This enables evidence key generation. When remote attestation is
requested, the enclave returns signed measurements which represent a complete
snapshot of the TEE’s internal state. With SGX as TEE, remote attestation and
the PKI are managed by Intel.
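The core comparison behind remote attestation, matching a reported measurement against an expected reference value, can be sketched in Python. This is a strongly simplified illustration: in SGX the measurement is computed by the hardware while the enclave is built and is delivered inside a signed report, whereas here a SHA-256 digest over hypothetical code and data bytes stands in for it, and signature checking is omitted.

```python
import hashlib
import hmac

def measure(enclave_code: bytes, enclave_data: bytes) -> bytes:
    """Hypothetical stand-in for an enclave measurement: a digest over the
    length-prefixed code and initial data."""
    h = hashlib.sha256()
    for part in (enclave_code, enclave_data):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()

def attest(reported: bytes, reference: bytes) -> bool:
    """Accept the enclave only if the reported measurement equals the reference."""
    return hmac.compare_digest(reported, reference)

# Reference value computed once from the audited enclave contents (placeholders).
reference = measure(b"pre-processing program", b"threshold=8")

genuine = attest(measure(b"pre-processing program", b"threshold=8"), reference)
modified = attest(measure(b"pre-processing program", b"threshold=100"), reference)
```

Because auxiliary data is part of the measured state in this sketch, changing the threshold changes the measurement and the attestation fails.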
In the following, we describe pre-processing with Intel SGX as depicted in
Figure 4. To achieve comparability with ZoKrates-based pre-processing, we use
the same workflow model as described in Section 2.4.
One Time Setup
1. Integrity Assertion: To guarantee integrity of auxiliary data and the sensor
public key, both must be protected through the TEE’s security guarantees.
Therefore, they are specified inside the enclave during implementation.
Once the enclave is instantiated and loaded in memory, as a first step, re-
mote attestation is executed to verify the enclave’s internal state. The signed
measurements are verified using the enclave’s public key that was previously
authenticated through the externally managed PKI. If the measurements
match a predefined reference value that represents the ground truth of the
enclave’s internal state, the enclave’s integrity is verified.
2. Key Generation: To verify the enclave’s integrity, a unique enclave-bound
key pair is required that can be authenticated from outside the enclave. This
evidence key pair is used to sign program results computed inside the enclave.
Given that the enclave’s integrity guarantees hold, this signature enables
verification of computational integrity on the blockchain. The evidence key is
generated inside the enclave and can be authenticated through an externally
managed PKI.
3. Deployment: The enclave’s evidence public key becomes part of the verifi-
cation contract which implements the signature verification on-chain and is
deployed to the blockchain. At this point, the enclave is already instantiated
on the gateway node.
Recurring Operations
5. Execution: Sensor data is provided through the host program which repre-
sents the only interface to the enclave. Auxiliary data and the sensor public
key are already part of the enclave and, hence, protected. The program is
executed as defined in Section 2.4. The computational outputs are signed
with the evidence proving key.
6. Verification: The verification contract validates the signature with the ev-
idence verification key. A successful validation proves the outputs’ authen-
ticity, i.e., they have been signed with the right proving key that is unique
to the enclave, and integrity, i.e., the received outputs are computed by the
right pre-processing program inside the enclave.
4 Evaluation
Given the two conceptual workflow descriptions, in this section, we evaluate the
technical feasibility for each technology.
4.1 Implementation
Our proof-of-concept (PoC) implementations follow the descriptions provided
in Sections 3.2 and 3.3, respectively. Thereby, we focus on the recurring operations
steps, execution and verification, which we consider most relevant to
demonstrate feasibility. Aspects of the setup phase are discussed in Section 5.
The PoC program should respect the pre-processing characteristics presented
in Section 2.1. Our program mimics a threshold violation check on sensory data
where the threshold represents auxiliary data. The sensory data is filtered for
violations, then reduced by counting the violations, and mapped by scaling the fil-
tered values down. The smart contract is only provided with the violation count.
Thereby, the program fulfills all three objectives: computation is outsourced to
an off-chain node, the data footprint is reduced in size, and the potentially sen-
sitive sensor measurements are not published on-chain.
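The logic of such a threshold violation check can be sketched in plain Python. This is not the PoC code, which is written for ZoKrates and Intel SGX; the function name, the scaling factor, and the sample values are assumptions made for illustration.

```python
def preprocess(measurements, threshold, scale=10):
    """Filter violations, count them, and scale the filtered values down.

    Returns the violation count (the only value meant for the smart contract)
    and the scaled violations (kept off-chain in this sketch).
    """
    violations = [m for m in measurements if m > threshold]  # filtering
    count = len(violations)                                  # reducing
    scaled = [v // scale for v in violations]                # mapping
    return count, scaled

# Hypothetical integer measurements and threshold (auxiliary data).
count, scaled = preprocess([120, 250, 310, 90], threshold=200)
on_chain_output = count
```

Only `on_chain_output` would leave the gateway, so neither the raw measurements nor the individual violating values become visible on-chain.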
ZoKrates: For our ZoKrates-based implementation, we simulate the sensor
node with a Python script that hashes the data with SHA256 and signs it with
an EdDSA-based sensor key pair, which ZoKrates supports. Plain sensory data is
a private input, while the data’s hash, signature, and the sensor public key are
public inputs to the ZoKrates program. To verify integrity of sensory inputs, the
signature’s hash input is reconstructed from the plain sensor data and compared
to the public hash input. Only if both signature verification and hash comparison
are successful is integrity guaranteed. Hashing and signature verification are
implemented using the ZoKrates Standard Library. Pre-processing is executed
by two commands provided by the ZoKrates CLI: compute-witness that requires
the compiled program and generate-proof that takes proving key and witness as
inputs. The outputs are written to disk.
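The order of the ZoKrates CLI calls can be sketched with a small Python helper that only assembles the command lines. The program file name and the argument values are hypothetical, the commands rely on the CLI's default file names for intermediate artefacts, and flags may differ between ZoKrates versions, so this should be read as an outline rather than the exact invocation used in the PoC.

```python
import shutil
import subprocess

def zokrates_commands(program="preprocess.zok", args=("1", "2")):
    """Return the CLI calls for setup (first two and last) and execution (middle two)."""
    return [
        ["zokrates", "compile", "-i", program],       # high-level code -> constraint system
        ["zokrates", "setup"],                        # generate proving and verification key
        ["zokrates", "compute-witness", "-a", *args], # execute the program on concrete inputs
        ["zokrates", "generate-proof"],               # proving key + witness -> proof
        ["zokrates", "export-verifier"],              # generate the Solidity verification contract
    ]

def run_all(commands):
    """Run the commands in order if the zokrates binary is installed."""
    if shutil.which("zokrates") is None:
        return False
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return True

commands = zokrates_commands("preprocess.zok", ("3", "4"))
```

Only `compute-witness` and `generate-proof` belong to the recurring operations; the remaining calls are part of the one time setup.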
Intel SGX: For the SGX evaluation, we have implemented two enclaves. The
first one simulates a sensor node and signs the sensory input data with an
internally generated sensor key pair using the SGX-provided operations sgx_create_keypair
and sgx_ecdsa_sign. The second enclave represents the gateway node that
stores auxiliary data and the sensor public key internally. It verifies the sensor
data with the sensor public key using the SGX operation sgx_ecdsa_verify.
Evidence key pair generation and signature construction on computational outputs
are realized with the same SGX commands as the sensor enclave. The processing
result and the corresponding signature are written to disk.
Ethereum: As blockchain technology, we chose Ethereum [27], which is
widely used and finds application both as a public blockchain and as a
consortium blockchain based on Proof-of-Authority consensus and non-public
deployment. For each technology, a verification contract is implemented in Solidity
that runs on a locally deployed Ethereum blockchain and is accessed through a
Ganache blockchain client. To validate Intel SGX evidence, we build upon an ex-
isting ECDSA implementation for the Ethereum blockchain 2. ZoKrates proofs
rely on EdDSA (twisted Edwards curve) and are verified through a dedicated
verification contract that is generated by ZoKrates CLI support 3.
4.2 Experiments
Given our proof-of-concept implementations, we can now conduct initial experi-
ments to obtain the first practical insights into trustworthy pre-processing with
zkSNARKs and TEEs. At this point, it should be noted that experimental re-
sults strongly depend on our non-optimized PoC implementations and, hence,
cannot simply be generalized.
2 https://github.com/tdrerup/elliptic-curve-solidity
3 https://github.com/Zokrates/ZoKrates
Fig. 5: Pre-processing with ZoKrates. (a) Various batch sizes, batch count of 1;
(b) various batch counts, batch size of 1. Time in seconds on a logarithmic y-axis.
Experimental Setup For our experimental setup, we deploy our implementations
on an Intel NUC-Kit NUC7PJYH with an SGX-enabled Pentium Silver
J5005 CPU, 8 GB of Memory, and an Ubuntu 18.04.5 LTS operating system. To
construct workloads, we use smart meter measurements collected in a testbed
of an energy grid research project 4 and prepare the measurements such that (1)
each measurement consists of four integer values, (2) measurements are collected
into batches of different sizes line-wise in plain text, and (3) each batch is signed
to represent the sensor’s signature.
As mentioned in Section 2.1, pre-processing is typically exposed to two types
of workloads: event and batch processing. To simulate that in our experimental
setup, we turn two knobs: for events of different sizes, we change the input
data size per execution (batch size); for batch processing, we vary the number
of subsequent executions (batch count). The latter is executed on size-one batches,
which contain a single measurement.
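The two workload knobs can be illustrated with a short Python sketch that groups measurements into line-wise plain-text batches. The measurement values and the space separator are assumptions, and the signing of each batch is omitted here.

```python
def make_batches(measurements, batch_size):
    """Group measurements (four integers each) into plain-text batches,
    one measurement per line."""
    batches = []
    for start in range(0, len(measurements), batch_size):
        chunk = measurements[start:start + batch_size]
        batches.append("\n".join(" ".join(str(v) for v in m) for m in chunk))
    return batches

# Hypothetical measurements, each consisting of four integer values.
measurements = [(i, i + 1, i + 2, i + 3) for i in range(16)]

# Knob 1 (batch size): one execution on a single batch of 16 measurements.
size_run = make_batches(measurements, batch_size=16)

# Knob 2 (batch count): 16 subsequent executions on size-one batches.
count_run = make_batches(measurements, batch_size=1)
```

Both runs process the same 16 measurements, which makes the cost of per-execution overhead directly comparable with the cost of larger inputs.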
The computational outputs of size-one-batch experiments are used for on-
chain verification, which is measured in Gas, an Ethereum-specific metric for
capturing computational complexity of on-chain transaction processing.
Results The results summarized for ZoKrates in Figure 5 and for Intel SGX in
Figure 6 show the overall execution time for off-chain pre-processing in seconds
and microseconds, respectively. As expected, the execution time of zkSNARKs-
based pre-processing is orders of magnitude higher than that of enclave-based
pre-processing. With larger batch sizes, the execution time increases almost grad-
ually. This holds true for each technology individually as shown in Figure 5 a)
and Figure 6 a). Similar behaviour can be observed for increasing the batch
count as shown in Figure 5 b) and Figure 6 b). However, we can observe that
for both ZoKrates and SGX the increase is much steeper for a growing batch
count than for a growing batch size (note the different logarithmic y-scales). For
this specific implementation example, this would mean that it is preferable to
increase the amount of processed data through larger batch sizes rather than
larger batch counts when possible in the actual application scenario.
⁴ https://blogpv.net/
[Figure 6, panel (a): Various Batch Sizes, Count of 1; panel (b): Various Batch Counts, Size of 1. Both panels plot Time [µs] on a logarithmic y-axis over the values 1, 32, 128, 256, and 512.]
Fig. 6: Pre-processing with Intel SGX
In ZoKrates-based pre-processing, the accompanying construction of crypto-
graphic proofs represents a memory-intensive computation that correlates with
the input size. The experiment for the next larger batch size of 32 measurements
in ZoKrates ran out of memory during the proof-generation on the test system.
Given that sensory data can quickly grow very large, the memory capacity of
constrained IoT or edge devices may present a limiting factor, but may not be
an issue for larger middleboxes.
In contrast, Intel SGX reduces pre-processing overhead. Even though our
implementation was also memory-limited for batch sizes larger than
1024 measurements, this is a limitation of the current SGX design that
might change in the future and can be mitigated, e.g., by splitting up the
processes into multiple enclaves on the same machine. Better efficiency and smaller
memory consumption distinguish Intel SGX as a suitable technology for lower
IoT layers where computational resources are typically scarce. However, contrary
to ZoKrates, SGX-based pre-processing requires increased trust in the
correctness of the hardware implementation and in the attestation process, which
relies on Intel to attest correctly.
In our proof-of-concept implementation, on-chain verification costs are lower
for ZoKrates-generated proofs (567 614 Gas) than for Intel SGX-generated signatures
(1 211 443 Gas). However, since on-chain verification costs strongly depend
on the implementation of the respective signature algorithm, our results cannot be
generalized, e.g., to other blockchain technologies.
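To put the two reported Gas figures in relation, the following short calculation derives their difference and ratio. Only the two constants come from the measurements above; the helper name is illustrative.

```python
ZOKRATES_GAS = 567_614  # reported cost of verifying a ZoKrates-generated proof
SGX_GAS = 1_211_443     # reported cost of verifying an SGX-generated signature

def gas_comparison(lower, higher):
    """Return the absolute difference and the ratio higher / lower."""
    return higher - lower, higher / lower

difference, ratio = gas_comparison(ZOKRATES_GAS, SGX_GAS)
print(f"difference: {difference} Gas, ratio: {ratio:.2f}x")
```

In this proof of concept, verifying the SGX-generated signature thus costs roughly twice as much Gas as verifying the ZoKrates proof.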
5 Discussion
While the previous section provided initial insights into the performance behavior
of each technology, this section discusses security and trust
aspects and potential extensions for trustworthy pre-processing.
Integrity and Trust Assumptions: As described in Section 2.2, pre-
processing is assumed to be executed by non-trusted stakeholders who have
an incentive for data manipulation. While off-chain technologies eliminate un-
noticed attacks during pre-processing, the setup phase still reveals an attack
surface. In ZoKrates, for example, key generation must be executed in a trusted
setup to guarantee that the randomness used to generate the Common Reference
String is safely disposed of, which prevents fake proof generation. However,
establishing a trusted setup for zkSNARKs is a known problem to which various
approaches exist as referenced in [8]. In Intel
SGX, the integrity guarantee strongly relies on the internal state of the enclave
and on the authenticity of the evidence key pair. To preserve this guarantee,
remote attestation and key authenticity must be verified through a trusted third
party or by all involved stakeholders individually. Also, auxiliary data and the
sensor’s public key must be verified before being added to the enclave. Beyond
the setup, zkSNARKs-based pre-processing does not rely on further trust as-
sumptions, whereas enclave-based pre-processing heavily relies on a trustworthy
manufacturer that ensures that private keys are kept secret and certificates ob-
tained from the PKI are authentic to the device’s identities. This distinguishes
ZoKrates as particularly suitable for processing critical data with substantial
security demands.
Further Attacks: Beyond our attack model described in Section 2.2, attacks
on data freshness and availability must be considered. While an attacker that
controls communication channels, e.g., between gateway and blockchain node,
cannot compromise data integrity without being noticed (Man-in-the-Middle At-
tack) due to signature and evidence verification, it can, however, intercept and
replay messages in a different order to impact the overall application logic (Re-
play Attack). To prevent this, secure timestamps or challenge-response patterns
can be applied. Furthermore, to prevent a malicious executor from compromis-
ing availability by withholding messages (Denial of Service Attack), gateway
nodes can be deployed redundantly to eliminate centralization, similar to the
proposal in [26].
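A replay check on the receiving side could be sketched as follows. This is an assumption-laden illustration: a strictly increasing counter stands in for a secure timestamp, an HMAC stands in for the signature or evidence, and the class and method names are hypothetical.

```python
import hashlib
import hmac

class ReplayGuard:
    """Accept each authenticated message only once and only in increasing order."""

    def __init__(self, key):
        self._key = key
        self._last_counter = -1

    @staticmethod
    def tag(key, counter, payload):
        """Bind the counter to the payload (HMAC as a stand-in for a signature)."""
        msg = counter.to_bytes(8, "big") + payload
        return hmac.new(key, msg, hashlib.sha256).digest()

    def accept(self, counter, payload, tag):
        """Return True if the message is authentic and fresh, False otherwise."""
        expected = self.tag(self._key, counter, payload)
        if not hmac.compare_digest(expected, tag):
            return False  # payload or counter was modified in transit
        if counter <= self._last_counter:
            return False  # message was replayed or delivered out of order
        self._last_counter = counter
        return True
```

Because the counter is covered by the tag, an attacker cannot relabel an intercepted message with a fresh counter without invalidating the tag.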
Multi-Stage Pre-Processing: In multi-stage data on-chaining workflows,
multiple pre-processing tasks may be executed subsequently by different non-
trusted stakeholders. To verify integrity on-chain, an evidence chain must be
established that allows any subsequent computation to validate the provided
evidence of the previous computation. This way, end-to-end integrity could be
guaranteed along arbitrarily long on-chaining workflows.
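The linking idea behind such an evidence chain can be sketched as below. This is a simplified illustration under stated assumptions: each stage's evidence is modeled as an HMAC over the previous evidence and the stage output, which only demonstrates origin and linkage. In an actual workflow, the evidence would be a zkSNARK proof or an enclave signature that also attests to the correctness of the computation. All names and keys are hypothetical.

```python
import hashlib
import hmac

def extend_chain(chain, key, output):
    """Append a stage result whose evidence covers the previous evidence."""
    prev = chain[-1][1] if chain else b""
    tag = hmac.new(key, prev + output, hashlib.sha256).digest()
    chain.append((output, tag))
    return chain

def verify_chain(chain, keys):
    """Check every link against the key of the stage that produced it."""
    if len(chain) != len(keys):
        return False
    prev = b""
    for (output, tag), key in zip(chain, keys):
        expected = hmac.new(key, prev + output, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag
    return True
```

Since every piece of evidence covers its predecessor, replacing the output of any intermediate stage invalidates that link and all links that follow it.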
Confidential Pre-Processing: While this work focuses on integrity preser-
vation, in some use cases it might be required to keep inputs to pre-processing
hidden from the executor. This can, for example, be achieved through Intel SGX,
where encrypted inputs can be decrypted inside the enclave, processed, and en-
crypted again before being returned. Thereby, inputs and outputs would not be
accessible by the executor. However, side-channel attacks, which are known to
extract confidential information from enclaves [4], must be taken into account.
6 Related Work
In this paper, we extend trustworthy data on-chaining as presented in [14] by
considering data in use as an additional attack vector. Furthermore, we lever-
age approaches to off-chain computation presented in [6] to realize trustworthy
pre-processing. From the proposed off-chain computation technologies in [6],
zkSNARKs and Trusted Execution Environments are increasingly adopted in
scientific literature on blockchain-based IoT applications.
Recently, many proposals leverage zkSNARKs for off-chain computations
through ZoKrates; however, only a few intersect with blockchain-based sensor data
management. While in [7] ZoKrates is applied for off-chain processing of sensor
data, i.e., smart meter measurements in local energy grids, other works mainly
use ZoKrates for privacy-preserving authentication, e.g., in the context of smart
vehicle authentication at charging stations [11], consumer authentication for car
sharing [13], or in health care for patient authentication [23].
TEEs are leveraged in various papers to implement trustworthy oracles that
bridge data provisioning from off-chain data sources to smart contracts. For ex-
ample, in TownCrier [28], a TEE-based oracle system is proposed to authenticate
data provided by HTTPS-enabled off-chain data sources, or in [26], a distributed
TEE-enabled oracle system is proposed that improves availability. Beyond scientific
usage, e.g., ChainLink⁵ works on a solution to implement these concepts
for practical usage [3].
While the main focus of these proposals lies in data provisioning, other works
instead use TEEs for sensor data management. In [9], for example, a system is
proposed that employs TEEs for intermediate processing of sensory data before
it is forwarded to the blockchain and the cloud. The authors of [1] use TEEs for
trustworthy access management of sensor data in hybrid storage systems where
off-chain storage holds encrypted sensor data and the blockchain stores its hashes
and access logs. While these proposals do not apply pre-processing as defined
in this paper, they underline the need for a systematization of trustworthy pre-
processing that we aim to provide with our contributions.
7 Conclusion
End-to-end sensor data integrity is critical to many blockchain-based IoT appli-
cations. Data on-chaining workflows accordingly require pre-processing on off-
chain nodes to be trustworthy. In this paper, we explored the use of zkSNARKs-
and TEE-based computations for trustworthy pre-processing, first, as individual
candidate technologies that require non-trivial set-ups for integration in data on-
chaining workflows, and second, through a preliminary, comparative experimen-
tal evaluation based on two proof-of-concept implementations. We conclude that
each presents an important approach that (a) can conceptually be well-integrated
in respective workflows and (b) satisfies the requirements and primary objective
of end-to-end data integrity. Our proof-of-concept implementations use current,
state-of-the-art software, and, since both zero-knowledge proofs and TEEs are
very active areas of research, our implementations and the experimental findings
must be seen as preliminary. We expect rapid advances in the software stacks
used and a relaxation of current memory limitations, and, consequently,
expect performance numbers to change. Still, a principal performance gap
⁵ https://chain.link/
and performance advantage of TEEs over zkSNARKs is expected to remain.
However, as discussed in this paper, the choice of an approach and technology
will depend also on other, non-performance criteria like the integrity and trust
assumptions or existing attack vectors for the specific IoT application under con-
sideration. Future work will address extensions of the proposed model regarding
its computational scalability through parallel execution and its applicability for
stream processing.
References
1. Ayoade, G., Karande, V., Khan, L., Hamlen, K.: Decentralized iot data manage-
ment using blockchain and trusted execution environment. In: 2018 IEEE Interna-
tional Conference on Information Reuse and Integration (IRI). pp. 15–22 (2018)
2. Ben-Sasson, E., Bentov, I., Chiesa, A., Gabizon, A., Genkin, D., Hamilis, M.,
Pergament, E., Riabzev, M., Silberstein, M., Tromer, E., Virza, M.: Computa-
tional integrity with a public random string from quasi-linear pcps. In: Advances
in Cryptology – EUROCRYPT 2017. pp. 551–579 (2017)
3. Breidenbach, L., Cachin, C., Chan, B., Coventry, A., Ellis, S., Juels, A., Koushan-
far, F., Miller, A., Magauran, B., Moroz, D., et al.: Chainlink 2.0: Next steps in
the evolution of decentralized oracle networks (2021)
4. Bulck, J.V., Minkin, M., Weisse, O., Genkin, D., Kasikci, B., Piessens, F., Silber-
stein, M., Wenisch, T.F., Yarom, Y., Strackx, R.: Foreshadow: Extracting the keys
to the intel SGX kingdom with transient out-of-order execution. In: 27th USENIX
Security Symposium (USENIX Security 18). pp. 991–1008. USENIX Association,
Baltimore, MD (Aug 2018)
5. Costan, V., Devadas, S.: Intel SGX explained. IACR Cryptol. ePrint Arch. 2016,
86 (2016), http://eprint.iacr.org/2016/086
6. Eberhardt, J., Heiss, J.: Off-chaining models and approaches to off-chain computations.
In: Proceedings of the 2nd Workshop on Scalable and Resilient Infrastructures
for Distributed Ledgers. SERIAL’18, ACM (2018)
7. Eberhardt, J., Peise, M., Kim, D.H., Tai, S.: Privacy-preserving netting in local
energy grids. In: 2020 IEEE International Conference on Blockchain and Cryp-
tocurrency (ICBC). pp. 1–9 (2020)
8. Eberhardt, J., Tai, S.: Zokrates - scalable privacy-preserving off-chain computa-
tions. In: IEEE International Conference on Blockchain (2018)
9. Enkhtaivan, B., Inoue, A.: Mediating data trustworthiness by using trusted hard-
ware between iot devices and blockchain. In: 2020 IEEE International Conference
on Smart Internet of Things (SmartIoT). pp. 314–318 (2020)
10. Gabay, D., Akkaya, K., Cebe, M.: A privacy framework for charging connected
electric vehicles using blockchain and zero knowledge proofs. In: 2019 IEEE 44th
LCN Symposium on Emerging Topics in Networking (LCN Symposium). pp. 66–73
(2019)
11. Gabay, D., Akkaya, K., Cebe, M.: Privacy-preserving authentication scheme for
connected electric vehicles using blockchain and zero knowledge proofs. IEEE
Transactions on Vehicular Technology 69(6), 5760–5772 (2020)
12. Griggs, K.N., Ossipova, O., Kohlios, C.P., Baccarini, A.N., Howson, E.A., Haya-
jneh, T.: Healthcare blockchain system using smart contracts for secure automated
remote patient monitoring. Journal of Medical Systems 42, 1–7 (2018)
13. Gudymenko, I., Khalid, A., Siddiqui, H., Idrees, M., Clauß, S., Luckow, A.,
Bolsinger, M., Miehle, D.: Privacy-preserving blockchain-based systems for car
sharing leveraging zero-knowledge protocols. In: 2020 IEEE International Con-
ference on Decentralized Applications and Infrastructures (DAPPS). pp. 114–119
(2020)
14. Heiss, J., Eberhardt, J., Tai, S.: From oracles to trustworthy data on-chaining
systems. In: IEEE International Conference on Blockchain (2019)
15. Helo, P., Shamsuzzoha, A.: Real-time supply chain—a blockchain architecture for
project deliveries. Robotics and Computer-Integrated Manufacturing 63, 101909
(2020)
16. Huang, S., Wang, G., Yan, Y., Fang, X.: Blockchain-based data management for
digital twin of product. Journal of Manufacturing Systems 54, 361–371 (2020)
17. Eberhardt, J., Tai, S.: On or Off the Blockchain? Insights on Off-Chaining Computation
and Data. In: ESOCC 2017: 6th European Conference on Service-Oriented
and Cloud Computing (2017)
18. Kurt Peker, Y., Rodriguez, X., Ericsson, J., Lee, S.J., Perez, A.J.: A cost analysis of
internet of things sensor data storage on blockchain via smart contracts. Electronics
9(2) (2020)
19. Kurt Peker, Y., Rodriguez, X., Ericsson, J., Lee, S.J., Perez, A.J.: A cost analysis of
internet of things sensor data storage on blockchain via smart contracts. Electronics
9(2) (2020)
20. Peise, M., Kuhlenkamp, J., Busse, A., Eberhardt, J., Ulbricht, M.R., Tai, S., Baus,
J., Kassebaum, M., Körner, T.: Blockchain-based local energy grids: Advanced use
cases and architectural considerations. In: IEEE 18th ICSA-C. pp. 130–137 (2021)
21. Putz, B., Dietz, M., Empl, P., Pernul, G.: Ethertwin: Blockchain-based secure dig-
ital twin information management. Information Processing & Management 58(1)
(2021)
22. Shafagh, H., Burkhalter, L., Hithnawi, A., Duquennoy, S.: Towards blockchain-
based auditable storage and sharing of iot data. In: Proceedings of the 2017 on
Cloud Computing Security Workshop. p. 45–50 (2017)
23. Sharma, B., Halder, R., Singh, J.: Blockchain-based interoperable healthcare using
zero-knowledge proofs and proxy re-encryption. 2020 International Conference on
COMmunication Systems & NETworkS (COMSNETS) pp. 1–6 (2020)
24. Sigwart, M., Borkowski, M., Peise, M., Schulte, S., Tai, S.: Blockchain-based data
provenance for the internet of things. In: Proceedings of the 9th International
Conference on the Internet of Things (2019)
25. Sund, T., Lööf, C., Nadjm-Tehrani, S., Asplund, M.: Blockchain-based event
processing in supply chains—a case study at ikea. Robotics and Computer-Integrated
Manufacturing 65, 101971 (2020)
26. Woo, S., Song, J., Park, S.: A distributed oracle using intel sgx for blockchain-based
iot applications. Sensors 20(9) (2020)
27. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger.
Ethereum Project Yellow Paper (2014)
28. Zhang, F., Cecchetti, E., Croman, K., Juels, A., Shi, E.: Town crier: An authen-
ticated data feed for smart contracts. In: Proceedings of the 2016 ACM SIGSAC
Conference on Computer and Communications Security (2016)