Jonathan Heiss, Anselm Busse, Stefan Tai

Trustworthy Pre-processing of Sensor Data in Data

On-Chaining Workflows for Blockchain-Based IoT

Applications

Open Access via institutional repository of Technische Universität Berlin

Document type

Conference paper | Accepted version

(i. e. final author-created version that incorporates referee comments and is the version accepted for

publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)

This version is available at

https://doi.org/10.14279/depositonce-19863

Citation details

Heiss, J., Busse, A., & Tai, S. (2021). Trustworthy Pre-processing of Sensor Data in Data On-Chaining

Workflows for Blockchain-Based IoT Applications. In Service-Oriented Computing (pp. 133–149). Springer

International Publishing. https://doi.org/10.1007/978-3-030-91431-8_9.

This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by

the copyright and related rights legislation that applies to your usage. For other uses, you must obtain

permission from the rights-holder(s).

Trustworthy Pre-Processing of Sensor Data in

Data On-chaining Workflows for

Blockchain-based IoT Applications

Jonathan Heiss, Anselm Busse, and Stefan Tai

Information Systems Engineering (ISE)

TU Berlin, Germany

{jh,ab,st}@ise.tu-berlin.de

Abstract. Prior to provisioning sensor data to smart contracts, a pre-

processing of the data on intermediate off-chain nodes is often necessary.

When doing so, originally constructed cryptographic signatures cannot

be verified on-chain anymore. This exposes an opportunity for undetected

manipulation and presents a problem for applications in the Internet of

Things where trustworthy sensor data is required on-chain.

In this paper, we propose trustworthy pre-processing as an enabler for end-to-end sensor data integrity in data on-chaining workflows. We define requirements for trustworthy pre-processing, present a model and common workflow for data on-chaining, select off-chain computation utilizing Zero-knowledge Proofs (ZKPs) and Trusted Execution Environments (TEEs) as promising solution approaches, and discuss both our proof-of-concept implementations and initial experimental, comparative evaluation results. The importance of trustworthy pre-processing and principal solution approaches are presented, addressing the major problem of end-to-end sensor data integrity in blockchain-based IoT applications.

Keywords: Internet of Things · Blockchain · Data Integrity · On-chaining · Off-chaining · Pre-processing · TEEs · zkSNARKs

1 Introduction

Blockchain technology is increasingly used in the Internet of Things (IoT) to

store and process critical sensor data originating from and shared between mul-

tiple, often mutually distrusting parties [21, 25, 15, 16, 7, 12, 24]. In local energy

grids with blockchain-based energy trading, for example, energy consumers and

producers depend on smart meter-generated measurement data [20, 7]. In sup-

ply chains, product-related manufacturing and shipping events are written to

a blockchain to provide a single source of truth for all involved, independent

parties [25, 24]. In healthcare, blockchain use cases exist for doctors, hospitals,

and emergency services to have access to patients’ health data collected by wear-

ables [12].

However, the variety and scale of connected IoT devices and the generated

data pose new challenges regarding data processing and data on-chaining. Raw

sensor measurements cannot directly be used on the blockchain because of vol-

ume limitations [18] or because sensitive information may be exposed and be-

come accessible to unintended readers [7]. Blockchains inherently have privacy

and scalability limitations [6, 17] that must be taken into account.

Consequently, the on-chain processing of sensor data is preceded by pre-

processing steps to reduce data volume and ensure that confidential information

is veiled. Such pre-processing typically is executed on intermediate, off-chain

nodes as part of multi-staged data provisioning workflows [21, 25, 15, 16, 7, 12]:

data originates on constrained sensor nodes, then moves to more powerful gate-

way nodes for pre-processing, and is finally provisioned to smart contracts as ag-

gregated information. For example, in the healthcare use case described in [12],

data is pre-processed by personal computers or smartphones; in energy grids [7]

by workstations located within participating households; in supply chains [25]

by board computers and mobile devices.

While pre-processing has become an integral element in such data on-chaining

workflows and is necessary to mitigate scalability and privacy issues, off-chain

pre-processing also represents a security risk. Sensor devices typically sign their

measurements to provide data integrity. However, sensor data integrity is not

end-to-end: once data is pre-processed on middleboxes, signatures constructed

on the input do not apply to the output anymore. Contrary to smart contract

application logic, application stakeholders cannot validate off-chain processing as

part of the blockchain’s consensus protocol. Consequently, naive pre-processing

can be exploited for malicious data manipulation without being noticed. This

attack vector threatens data integrity in data on-chaining workflows and quickly

questions the entire blockchain-based IoT system design and data quality.

To address this problem, solutions are needed to ensure trustworthy pre-

processing, i.e., to make computational correctness verifiable on the blockchain.

Off-chain computations have been proposed [6] to outsource blockchain transac-

tion processing to off-chain nodes without compromising trust guarantees. Zero-

Knowledge (ZK) computations and Trusted Execution Environments (TEE) are

two important approaches here that are also increasingly being used in early-

adoption projects and practice [7, 10, 9, 1]. However, using ZK computations and

TEEs for trustworthy pre-processing has not been examined so far.

In the face of the rising interest in blockchain-based sensor data management

and the need for end-to-end sensor data integrity, in this paper, we analyze the

underlying problem of trustworthy pre-processing in data on-chaining workflows,

propose a model for integrity-preserving data on-chaining, and examine its prac-

tical applicability based on ZK computations and TEEs. Thereby, we make two

individual contributions:

1. First, we propose a model for end-to-end sensor data integrity through trust-

worthy pre-processing. We characterize sensor data pre-processing in on-

chaining workflows for blockchain-based IoT applications based on relevant

literature. From our findings, we refine our problem statement and introduce

trustworthy pre-processing as a workflow element that enables application

stakeholders through participation in the blockchain network to verify data

integrity from source to sink.

2. Second, we examine the applicability of zkSNARKs-based and Trusted Exe-

cution Environments (TEE)-based off-chain computations for our proposed

model. Based on a typical application workflow, we first conceptualize how

trustworthy pre-processing can be instantiated with ZoKrates [8], a toolkit

for zkSNARKs-based off-chain computation, and with Intel SGX [5], In-

tel’s realization of TEEs. Then, we implement the proposed model with

both technologies as a proof of concept and present preliminary experiments

in a testbed. While our results attest to the applicability of trustworthy

pre-processing with both approaches, they also confirm that, in comparison, zkSNARKs provide stronger integrity guarantees (weaker trust assumptions), whereas TEEs enable more efficient off-chain pre-processing.

2 Pre-Processing

To lay the foundation for trustworthy pre-processing, in this section, we first

describe the general characteristics of pre-processing in blockchain-based IoT

applications that we observed in pertinent research papers. Next, we refine our

problem statement and define computational integrity, based on [2]. Finally,

we present a model for trustworthy pre-processing on gateway nodes for use in

data on-chaining workflows that start with sensor devices and result in smart

contracts.

2.1 Characterization

Pre-processing in blockchain-based applications shares common objectives, input

types, and functionality.

Objectives In data on-chaining workflows, off-chain pre-processing helps to

mitigate blockchain-inherent scalability and privacy limitations. Thereby, it pur-

sues the following objectives:

–Offloading Computation: Outsource on-chain data processing to an off-chain

node that is not bound to costly consensus-based transaction processing [7].

–Reducing Storage: Reduce the volume of sensor data to minimize the storage

footprint on the blockchain [12, 19].

–Enabling Confidentiality: Hide sensitive information contained in raw mea-

surements or meta-data from stakeholders that do have read permissions [7,

25, 21].

Inputs Pre-processing can be executed on different types of data. We distinguish

between the following:

–Measurements include all data that is generated by sensor devices. This

includes time series data collected over a longer period of time [22], for

example, temperature or location data, and event data that represents ex-

ternally triggered occurrences [25], for example, the scanning or opening of

a container in a logistics context.

–Meta-data originates from the sensor device and contains descriptive infor-

mation about the measurements, such as sensor identities, target storage

addresses, or timestamps.

–Auxiliary data is added at the gateway node. Examples are filter rules, access

control lists, or storage addresses.

Measurements and meta-data are critical for pre-processing and are referred to

in the following as sensory data. In contrast, auxiliary data is never processed

alone but optionally used to enrich pre-processing.

Types Without claiming completeness, we identify three general types of data

pre-processing which can be observed in relevant applications [25, 20, 12, 21] and

which represent typical functionality for operating on sequential data¹.

–Mapping: Data is transformed into a target format, e.g., enumeration, en-

cryption, decryption, hashing [21, 25].

–Reducing: Data of one or multiple sensor devices is consolidated, e.g., the

arithmetic average or a total amount is calculated [20].

–Filtering: Data is filtered according to predefined rules, e.g., only values

below a predefined threshold are returned [12].
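The three types correspond to the familiar map, reduce, and filter operations on sequences. The following is a minimal Python sketch, assuming plain integer readings; the helper names `map_hash`, `reduce_average`, and `filter_below` and the sample values are hypothetical and do not come from the cited applications.

```python
import hashlib
from statistics import mean


def map_hash(readings):
    """Mapping: transform each reading into a target format (here a SHA-256 hex digest)."""
    return [hashlib.sha256(str(r).encode()).hexdigest() for r in readings]


def reduce_average(readings):
    """Reducing: consolidate the readings into their arithmetic average."""
    return mean(readings)


def filter_below(readings, threshold):
    """Filtering: return only the values below a predefined threshold."""
    return [r for r in readings if r < threshold]


# Illustrative values only, not measured data.
readings = [4, 7, 9, 3]
hashed = map_hash(readings)
average = reduce_average(readings)
below = filter_below(readings, 5)
```

In a real workflow these functions would typically be composed, for example filtering first and then reducing the filtered values before provisioning the result on-chain.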

2.2 Problem Refinement

Data provisioning is often controlled by one of the stakeholders, e.g., shippers in

supply chains [25, 15] or producers in energy markets [7]. Stakeholders may have

a personal, often economically motivated interest in manipulating the data, e.g., in cooling chains to avoid contractual penalties when perishable freight has spoiled, or to improve accounting positions. Given such motives, we assume that data-providing stakeholders are potential attackers.

In data on-chaining workflows, data can take three states: it is in transit

when it is transmitted from one to another component, it is at rest when it is

persisted on disk, and it is in use when it is processed in memory. During the

states in transit and at rest, data integrity and authenticity can be verified using

cryptographic signatures. However, when data is processed, it is transformed and

signatures constructed on the input do not apply for the output anymore. Fur-

thermore, off-chain pre-processing cannot be validated by stakeholders through

the consensus mechanism. An attacker could selfishly execute different functions

on the data to manipulate the output and obtain a personal benefit without be-

ing noticed. Therefore, we assume manipulation of computation as the potential

attack.

¹ https://web.mit.edu/6.005/www/fa15/classes/25-map-filter-reduce/

2.3 Computational Integrity

As a first step towards trustworthy pre-processing, we characterize computa-

tional integrity. We adopt the model proposed in [2].

A pre-processing program P is executed on input data D and some auxiliary data A and returns output O such that P(D, A) → O.

A malicious executor may benefit from creating a manipulated program P′ such that P′(D, A) → O′ | O′ ≠ O. For example, in the supply chain use case, a shipper executes a threshold check P on temperature measurements D using the threshold A. If the shipper knows that the outcome O triggers a contractual penalty, but O′ does not, it may change P to P′ to obtain O′ instead of O. It then reports O′ to the blockchain and is exempt from the penalty. Additionally, the executor may leave the program P unchanged but manipulate the input data D such that P(D′, A) → O′ | D ≠ D′ ∧ O′ ≠ O, or the auxiliary data A such that P(D, A′) → O′ | A ≠ A′ ∧ O′ ≠ O.

To prevent both program and input manipulation, stakeholders should be able to verify computational integrity, which is only guaranteed if output O is computed by the right program P on the right input data (D, A) such that P(D, A) → O | (P ≠ P′) ∧ (D ≠ D′) ∧ (A ≠ A′). Therefore, we assume that program P also generates an evidence E that asserts computational integrity such that P(D, A) → (O, E). To enable third-party stakeholders to verify computational integrity, an asymmetric key pair is additionally required: the evidence signed with the proving key can be verified by any third party with the corresponding verification key. The evidence and the evidence key pair represent the major artefacts for trustworthy pre-processing.
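The three manipulation cases can be illustrated with a small Python sketch of the threshold check from the supply chain example. The concrete measurements, the threshold, and the way the attacker manipulates program, input, and auxiliary data are assumptions made purely for illustration.

```python
def P(D, A):
    """Honest program: True (penalty) if any measurement in D exceeds threshold A."""
    return any(d > A for d in D)


def P_prime(D, A):
    """Manipulated program P': always reports that no violation occurred."""
    return False


D = [5, 6, 9]  # hypothetical temperature measurements
A = 8          # hypothetical threshold (auxiliary data)

O = P(D, A)                       # honest output, triggers the penalty

O_program = P_prime(D, A)         # program manipulation: P replaced by P'

D_prime = [min(d, A) for d in D]  # input manipulation: clamp values to the threshold
O_input = P(D_prime, A)

A_prime = 10                      # auxiliary data manipulation: raise the threshold
O_aux = P(D, A_prime)
```

In all three cases the reported output differs from the honest output O, and a verifier that only sees the output cannot tell which case occurred. This is the gap that the evidence E is meant to close.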

2.4 End-to-End Data Integrity

Given that integrity of data can be verified while it is in use, we can define a

data on-chaining workflow where integrity is verifiable from its source on the

sensor node to its sink on the smart contract as depicted in Figure 1. Note that

instead of a simple signature, verifiable evidence is provided to the blockchain

that allows data integrity verification with moderate computational overhead in

the blockchain network.

Fig. 1: End-to-End Data Integrity through Trustworthy Pre-Processing

One Time Setup During an initial one time setup, central system artifacts are

generated and deployed on the system components. Given that these artifacts

are critical to verify computational integrity, we assume a trusted setup where

each stakeholder can verify the integrity of the artifacts. It consists of three

steps:

As a first step (1. Integrity Assertion), an environment is established that

enables the gateway node to generate verifiable evidence of computational in-

tegrity as accompanying artefacts of the pre-processing outputs. This includes

the integrity of sensory and auxiliary inputs. Examples for such environments

are mathematical constraint systems [8] or trusted execution environments [5]

as will be described in the subsequent section.

Next (2. Key Generation), two key pairs are required: an evidence key pair

consisting of a proving and verification key for signing and verifying the evidence

and a sensor key pair, represented as a cryptographic public and private key that

is used to sign and verify the sensor data on the sensor node and the gateway

node respectively.

As the last setup step (3. Deployment), all artefacts are deployed: The gate-

way node is equipped with the sensor node’s public key, the integrity-preserving

pre-processing program, the proving key, and optionally auxiliary data. The

smart contract receives the verification key that enables evidence verification.

Recurring Operations Sensory data arrives recurringly at the gateway node

in regular intervals, e.g., batches of time series data, or in irregular intervals,

e.g., externally triggered events. Then (4. Pre-Processing), the pre-processing

program takes the signed sensory data, the sensor’s public key, and optionally

auxiliary data as inputs and executes the following steps:

(a) The sensory inputs’ signature is verified with the sensor device’s public key.

(b) Pre-processing functions are executed on the verified inputs. Examples are

provided in section 2.1.

(c) Evidence of computational integrity is generated and signed with the proving key. The evidence enables the smart contract to verify computational integrity.

Outputs and signed evidence are transmitted to the smart contract through

the blockchain node. The smart contract verifies the evidence using the veri-

fication key (5. Verification). Successful verification on the blockchain enables

application stakeholders to independently verify that the integrity of sensor data has been preserved from source to sink despite intermediate pre-processing.

Pre-processing outputs can be consumed through participating blockchain nodes

and used for subsequent processing.
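The message flow of the recurring operations can be sketched in Python as follows. This is a deliberately simplified stand-in: it uses HMAC-SHA256 from the standard library instead of asymmetric signatures, so the proving and verification key coincide, and the evidence only authenticates the output. Proving that the right program ran on the right inputs is what the constraint system or the TEE contributes in the actual approaches. All keys, names, and values are hypothetical.

```python
import hashlib
import hmac
import json


def sign(key, payload):
    """Stand-in signature: HMAC-SHA256 over canonical JSON (symmetric, unlike the model's key pairs)."""
    message = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key, payload, tag):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, payload), tag)


# One-time setup: keys and auxiliary data are generated and deployed (hypothetical values).
SENSOR_KEY = b"sensor-key"
EVIDENCE_KEY = b"evidence-key"
THRESHOLD = 8


def sensor_emit(measurements):
    """Sensor node: sign the sensory data before sending it to the gateway."""
    payload = {"measurements": measurements}
    return payload, sign(SENSOR_KEY, payload)


def gateway_preprocess(payload, signature):
    """Gateway node: steps (a) to (c) of the pre-processing program."""
    # (a) Verify the signature of the sensory input.
    if not verify(SENSOR_KEY, payload, signature):
        raise ValueError("sensor signature invalid")
    # (b) Pre-process the verified input: count threshold violations.
    output = {"violations": sum(m > THRESHOLD for m in payload["measurements"])}
    # (c) Generate signed evidence for the output.
    evidence = sign(EVIDENCE_KEY, output)
    return output, evidence


def contract_verify(output, evidence):
    """Smart contract: accept the output only if the evidence verifies."""
    return verify(EVIDENCE_KEY, output, evidence)
```

A short usage example: `payload, sig = sensor_emit([5, 9, 10])` followed by `output, evidence = gateway_preprocess(payload, sig)` yields an output that `contract_verify(output, evidence)` accepts, whereas a modified output is rejected.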

3 Application

For trustworthy pre-processing to become easily applicable in practice, technolo-

gies are required that enable on-chain verifiability of computational integrity and

that can implement the pre-processing characteristics as described in 2.1.

Fig. 2: Off-chain Computation Technologies according to [6]

3.1 Technologies for Trustworthy Pre-processing

Off-chain computation has been proposed to mitigate privacy and scalability

limitations of blockchains by outsourcing computation to off-chain nodes with-

out compromising core blockchain properties [6, 17]. Thereby, it represents a

matching concept for trustworthy pre-processing.

However, the different approaches to off-chain computation presented in [6]

and depicted in Figure 2 are not equally suitable. Both incentive-based and

sMPC-based approaches require multiple nodes that execute non-trivial pro-

tocols. However, in data on-chaining applications in the IoT [12, 15, 25, 7, 21],

pre-processing is typically executed on a single node with limited networking

and storage capacity. If such a constraint is given, the distributed computation

model and interactive nature of incentive- and sMPC-based approaches may be

inconsistent with use case specific requirements which restricts general applica-

bility. In contrast, zero-knowledge and enclave-based approaches can be executed

non-interactively on a single node and, hence, promise broader applicability for

trustworthy pre-processing.

3.2 ZkSNARKs-based Pre-Processing with ZoKrates

Zero-knowledge proofs enable a prover to convince a verifier that it has correctly

executed a computation without revealing inputs to the verifier.

zkSNARKs are one type of zero-knowledge protocol that is distinguished by succinctness, i.e., the resulting artefacts are small in size and can be verified quickly; non-interactivity, i.e., only one message is required to convince the verifier; and argument of knowledge, i.e., the prover is able to prove that she has access to the correct data.

ZoKrates [8] provides a toolbox and a higher-level language to implement a

zkSNARKs-proving system where an off-chain prover can convince an on-chain

verifier that the computation has been executed correctly.

To describe the ZoKrates-based pre-processing (compare Figure 3), we leverage the model presented in Section 2.4 and build upon the ZoKrates workflow described in [8].

One Time Setup

1. Integrity Assertion: To guarantee integrity of auxiliary data and the sensor

public key, both are typed as public arguments in the ZoKrates program and,

Fig. 3: Trustworthy Pre-Processing with ZoKrates

hence, are required on-chain for evidence verification. Since the verification

would fail on different public inputs, their integrity can be determined on-

chain.

Once specified, the high-level ZoKrates code is compiled into an executable

constraint system (ECS) in the ZoKrates Intermediate Representation (ZIR)

format that can be considered an extension of a Rank-1 Constraint System and enables the assertion of computational integrity: if a variable assignment is found that satisfies the defined constraints, computational integrity can be proven.

2. Evidence Key Generation: An evidence key pair is generated from a Common

Reference String (CRS) [8] which enables proof creation and verification.

Since the CRS allows the construction of fake proofs, it must be securely disposed of after key generation. The evidence key pair is cryptographically bound to

the previously generated ECS.

3. Deployment: The ECS, the evidence proving key, auxiliary data, and the

sensor public key are deployed to the gateway node which takes the role

of the off-chain prover. Verification key and the verification contract are

deployed to the blockchain.

Recurring Operations

5. Execution: The ZIR program is executed on predefined inputs, through the

ZoKrates interpreter. The output is called witness, an artefact representing

variable assignments that satisfy the specified constraints for a specific exe-

cution. In a separate step, the cryptographic proof is generated based on the

execution-specific witness and the program-specific proving key. Finally, out-

puts and evidence are forwarded to the smart contract through a blockchain

node.

6. Verification: The verification contract takes the cryptographic proof, the

verification key, and public program arguments as input parameters. The

verification is only successful if the proof was generated with the right program and on the right (public) inputs.
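The setup and recurring steps map onto ZoKrates CLI invocations, which can be sketched as a small Python helper that assembles the command lines. The subcommands `compile`, `setup`, `export-verifier`, `compute-witness`, and `generate-proof` are part of the ZoKrates CLI, but flags and defaults may differ between versions; the file name `preprocess.zok`, the helper names, and the argument values are assumptions for illustration.

```python
import shutil
import subprocess


def zokrates_commands(source_file, arguments):
    """Assemble the ZoKrates CLI calls for the one-time setup and the recurring steps."""
    args = [str(a) for a in arguments]
    setup = [
        ["zokrates", "compile", "-i", source_file],  # program -> constraint system
        ["zokrates", "setup"],                       # proving and verification key
        ["zokrates", "export-verifier"],             # Solidity verification contract
    ]
    recurring = [
        ["zokrates", "compute-witness", "-a", *args],  # execute program on the inputs
        ["zokrates", "generate-proof"],                # proof from witness and proving key
    ]
    return setup, recurring


def run(commands):
    """Execute the assembled commands; requires the zokrates binary on PATH."""
    if shutil.which("zokrates") is None:
        raise RuntimeError("zokrates CLI not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical program file and inputs.
setup_cmds, recurring_cmds = zokrates_commands("preprocess.zok", [3, 7])
```

Calling `run(setup_cmds)` once and `run(recurring_cmds)` for every incoming batch would mirror the split between one-time setup and recurring operations; deploying the exported contract to the blockchain is outside the scope of this sketch.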

Fig. 4: Trustworthy Pre-Processing with Intel SGX

3.3 Enclave-based Pre-Processing with Intel SGX

Enclave-based computation enables an enclave-external party to verify that an

output has been computed by a specific program inside a specific enclave that

protects internal integrity. Thereby, it relies on two concepts: Trusted Execution

Environments and Remote Attestation.

Trusted Execution Environments (TEE) are hardware-secured parts of a sys-

tem architecture that protect data and code from external manipulation and dis-

closure. Programs executed inside such TEEs are running in an isolated and/or

encrypted memory region that cannot even be accessed in the highest privilege

level of the system. Thus, it protects the content of the TEE from the system

owner and guarantees the integrity of computation executed inside the TEE.

Intel SGX is Intel’s concrete implementation of TEEs. We use the terms TEE

and enclave interchangeably.

Remote Attestation enables the external verification of the integrity of the

TEE’s internal state and the authenticity of messages received from inside, thus ensuring that a malicious attacker cannot falsely pose as a trusted enclave.

TEE-enabled devices have a device identity key that is embedded into the device

hardware during manufacturing and can be verified by external parties through

a Public Key Infrastructure (PKI). Using this key, the device creates for each

instantiated TEE an identity certificate which can externally be verified through

the PKI. This enables evidence key generation. When remote attestation is re-

quested, the enclave returns signed measurements which represent a complete

snapshot of the TEE’s internal state. With SGX as TEE, remote attestation and

the PKI are managed by Intel.

In the following, we describe pre-processing with Intel SGX as depicted in

Figure 4. To achieve comparability with ZoKrates-based pre-processing, we use

the same workflow model as described in Section 2.4.

One Time Setup

1. Integrity Assertion: To guarantee integrity of auxiliary data and the sensor

public key, both must be protected through the TEE’s security guarantees.

Therefore, they are specified inside the enclave during implementation.

Once the enclave is instantiated and loaded in memory, as a first step, re-

mote attestation is executed to verify the enclave’s internal state. The signed

measurements are verified using the enclave’s public key that is previously

authenticated through the externally managed PKI. If the measurements

match a predefined reference value that represents the ground truth of the

enclave’s internal state, the enclave’s integrity is verified.

2. Key Generation: To verify the enclave’s integrity a unique enclave-bound

key pair is required that can be authenticated from outside the enclave. This

evidence key pair is used to sign program results computed inside the enclave.

Given that the enclave’s integrity guarantees hold, this signature enables

verification of computational integrity on the blockchain. The evidence key is

generated inside the enclave and can be authenticated through an externally

managed PKI.

3. Deployment: The enclave’s evidence public key becomes part of the verifi-

cation contract which implements the signature verification on-chain and is

deployed to the blockchain. At this point, the enclave is already instantiated

on the gateway node.

Recurring Operations

5. Execution: Sensor data is provided through the host program which repre-

sents the only interface to the enclave. Auxiliary data and the sensor public

key are already part of the enclave and, hence, protected. The program is

executed as defined in Section 2.4. The computational outputs are signed

with the evidence proving key.

6. Verification: The verification contract validates the signature with the ev-

idence verification key. A successful validation proves the outputs’ authen-

ticity, i.e., they have been signed with the right proving key that is unique

to the enclave, and integrity, i.e., the received outputs are computed by the

right pre-processing program inside the enclave.

4 Evaluation

Given the two conceptual workflow descriptions, in this section, we evaluate the

technical feasibility for each technology.

4.1 Implementation

Our proof-of-concept (PoC) implementations follow the descriptions provided

in Sections 3.2 and 3.3, respectively. We focus on the recurring operation steps, execution and verification, which we consider most relevant to demonstrate feasibility. Aspects of the setup phase are discussed in Section 5.

The PoC program should respect the pre-processing characteristics presented

in Section 2.1. Our program mimics a threshold violation check on sensory data

where the threshold represents auxiliary data. The sensory data is filtered for

violations, then reduced by counting the violations, and mapped by scaling the fil-

tered values down. The smart contract is only provided with the violation count.

Thereby, the program fulfills all three objectives: computation is outsourced to

an off-chain node, the data footprint is reduced in size, and the potentially sen-

sitive sensor measurements are not published on-chain.

ZoKrates: For our ZoKrates-based implementation, we simulate the sensor

node with a Python script that hashes the data with SHA-256 and signs it with an EdDSA-based sensor key pair, which ZoKrates supports. Plain sensory data is a private input, while the data’s hash, signature, and the sensor public key are public inputs to the ZoKrates program. To verify the integrity of sensory inputs, the signature’s hash input is reconstructed from the plain sensor data and compared to the hash provided as input. Only if both signature verification and hash comparison are successful is integrity guaranteed. Hashing and signature verification are

implemented using the ZoKrates Standard Library. Pre-processing is executed

by two commands provided by the ZoKrates CLI: compute-witness that requires

the compiled program and generate-proof that takes proving key and witness as

inputs. The outputs are written to disk.

Intel SGX: For the SGX evaluation, we have implemented two enclaves. The

first one simulates a sensor node and signs the sensory input data with an internally generated sensor key pair using the SGX-provided operations sgx_create_keypair and sgx_ecdsa_sign. The second enclave represents the gateway node that stores auxiliary data and the sensor public key internally. It verifies the sensor data with the sensor public key using the SGX operation sgx_ecdsa_verify. Evidence key pair generation and signature construction on computational outputs

are realized with the same SGX commands as the sensor enclave. The processing

result and the corresponding signature are written to disk.

Ethereum: As blockchain technology, we chose Ethereum [27], which is

widely used and finds application both as a public blockchain and as a consortium blockchain based on Proof-of-Authority consensus and non-public deployment. For each technology, a verification contract is implemented in Solidity that runs on a locally deployed Ethereum blockchain and is accessed through a Ganache blockchain client. To validate Intel SGX evidence, we build upon an existing ECDSA implementation for the Ethereum blockchain². ZoKrates proofs rely on EdDSA (twisted Edwards curve) and are verified through a dedicated verification contract that is generated by ZoKrates CLI support³.

4.2 Experiments

Given our proof-of-concept implementations, we can now conduct initial experi-

ments to obtain the first practical insights into trustworthy pre-processing with

zkSNARKs and TEEs. At this point, it should be noted that experimental re-

sults strongly depend on our non-optimized PoC implementations and, hence,

cannot simply be generalized.

² https://github.com/tdrerup/elliptic-curve-solidity

³ https://github.com/Zokrates/ZoKrates

Fig. 5: Pre-processing with ZoKrates. Panels: (a) Various Batch Sizes, Count of 1 (y-axis: Time [s], logarithmic, ticks from 10^1.5 to 10^1.7); (b) Various Batch Counts, Size of 1 (y-axis: Time [s], logarithmic, ticks from 10^1.5 to 10^3).

Experimental Setup For our experimental setup, we deploy our implementations on an Intel NUC-Kit NUC7PJYH with an SGX-enabled Pentium Silver J5005 CPU, 8 GB of memory, and an Ubuntu 18.04.5 LTS operating system. To construct workloads, we use smart meter measurements collected in a testbed of an energy grid research project⁴ and prepare the measurements such that (1)

each measurement consists of four integer values, (2) measurements are collected

into batches of different sizes line-wise in plain text, and (3) each batch is signed

to represent the sensor’s signature.

As mentioned in Section 2.1, pre-processing is typically exposed to two types

of workloads: event and batch processing. To simulate both in our experimental setup, we vary two parameters: for events of different sizes, we change the input data size per execution (batch size); for batch processing, we vary the number of subsequent executions (batch count). The latter is executed on size-one batches, which contain a single measurement.
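The two workload dimensions can be sketched as a small timing harness in Python. This is not the benchmark code used for the experiments: the synthetic measurements, the stand-in `preprocess` workload (a plain threshold count without proof generation or enclave calls), and the chosen sizes and counts are assumptions for illustration only.

```python
import time


def make_batch(size):
    """Build a plain-text batch with one measurement of four integers per line (synthetic values)."""
    return "\n".join(" ".join(str(i + k) for k in range(4)) for i in range(size))


def preprocess(batch, threshold):
    """Stand-in workload: count the values in the batch that exceed the threshold."""
    values = [int(v) for line in batch.splitlines() for v in line.split()]
    return sum(v > threshold for v in values)


def time_runs(batch_size, batch_count, threshold=2):
    """Return the wall-clock seconds for batch_count executions on a batch of batch_size."""
    batch = make_batch(batch_size)
    start = time.perf_counter()
    for _ in range(batch_count):
        preprocess(batch, threshold)
    return time.perf_counter() - start


# Parameter 1: vary the batch size with a batch count of 1.
by_size = {n: time_runs(n, 1) for n in (1, 2, 4, 8, 16)}
# Parameter 2: vary the batch count on size-one batches.
by_count = {n: time_runs(1, n) for n in (1, 2, 4, 8, 16)}
```

Replacing `preprocess` with a call into the ZoKrates or SGX implementation would turn the harness into an actual measurement of the respective technology.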

The computational outputs of size-one-batch experiments are used for on-

chain verification, which is measured in Gas, an Ethereum-specific metric for

capturing computational complexity of on-chain transaction processing.

Results The results summarized for ZoKrates in Figure 5 and for Intel SGX in

Figure 6 show the overall execution time for off-chain pre-processing in seconds

and microseconds, respectively. As expected, the execution time of zkSNARKs-

based pre-processing is orders of magnitude higher than that of enclave-based

pre-processing. With larger batch sizes, the execution time increases almost grad-

ually. This holds true for each technology individually as shown in Figure 5 a)

and Figure 6 a). Similar behaviour can be observed for increasing the batch

count as shown in Figure 5 b) and Figure 6 b). However, we can observe that

for both ZoKrates and SGX the increase is much steeper for a growing batch

count than for a growing batch size (note the different logarithmic y-scales). For

this specific implementation, this means that it is preferable to

increase the amount of processed data through larger batch sizes rather than

larger batch counts, where the application scenario permits.

⁴ https://blogpv.net/


[Figure 6, plot not reproduced: (a) various batch sizes, count of 1; (b) various batch counts, size of 1. X-axis ticks: 128, 256, 512. Logarithmic y-axis: time [µs], ticks from 10^3 to 10^3.2 in (a) and from 10^3 to 10^5 in (b).]

Fig. 6: Pre-processing with Intel SGX

In ZoKrates-based pre-processing, the accompanying construction of crypto-

graphic proofs represents a memory-intensive computation that correlates with

the input size. The experiment for the next larger batch size of 32 measurements

in ZoKrates ran out of memory during the proof-generation on the test system.

Given that sensory data can quickly grow very large, the memory capacity of

constrained IoT or edge devices may present a limiting factor, but may not be

an issue for larger middleboxes.

In contrast, Intel SGX reduces pre-processing overhead. Even though our

implementation was also memory-limited for batch sizes larger than

1024 measurements, this is a limitation of the current SGX design that

might change in the future and can be mitigated, e.g., by splitting up the

processes into multiple enclaves on the same machine. Better efficiency and smaller

memory consumption distinguish Intel SGX as a suitable technology for lower

IoT layers where computational resources are typically scarce. However, in

contrast to ZoKrates, SGX-based pre-processing requires increased trust in the

correctness of the hardware implementation and in the attestation process,

which requires trusting Intel to attest correctly.

In our proof-of-concept implementation, on-chain verification costs are lower

for ZoKrates-generated proofs (567 614 Gas) than for Intel SGX-generated

signatures (1 211 443 Gas). However, since on-chain verification costs strongly depend

on the implementation of the respective signature algorithm, our results cannot be

generalized, e.g., for other blockchain technologies.
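To put the two Gas figures in relation, the following short calculation derives the cost ratio and the relative saving. It uses only the two measured values reported above; the variable names are ours.

```python
# Measured on-chain verification costs from the proof-of-concept (in Gas).
GAS_ZOKRATES = 567_614
GAS_SGX = 1_211_443

# How many times more expensive SGX signature verification is.
ratio = GAS_SGX / GAS_ZOKRATES

# Relative saving of ZoKrates proof verification compared to SGX.
saving = 1 - GAS_ZOKRATES / GAS_SGX

print(f"ratio: {ratio:.2f}x, saving: {saving:.0%}")
```

In this setup, verifying an SGX-generated signature costs roughly 2.1 times as much Gas as verifying a ZoKrates proof, which corresponds to a saving of about 53 %.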

5 Discussion

While in the previous section, initial insights about the performance behavior

of each technology were provided, in this section, we discuss security and trust

aspects and potential extensions for trustworthy pre-processing.

Integrity and Trust Assumptions: As described in Section 2.2, pre-

processing is assumed to be executed by non-trusted stakeholders who have

an incentive for data manipulation. While off-chain technologies eliminate un-

noticed attacks during pre-processing, the setup phase still reveals an attack


surface. In ZoKrates, for example, key generation must be executed in a trusted

setup to guarantee that the secret randomness used to derive the Common Reference

String is safely disposed of, which prevents fake proof generation. However, establishing a trusted setup for zkSNARKs

is a known problem to which various approaches exist as referenced in [8]. In Intel

SGX, the integrity guarantee strongly relies on the internal state of the enclave

and on the authenticity of the evidence key pair. To preserve this guarantee,

remote attestation and key authenticity must be verified through a trusted third

party or by all involved stakeholders individually. Also, auxiliary data and the

sensor’s public key must be verified before being added to the enclave. Beyond

the setup, zkSNARKs-based pre-processing does not rely on further trust as-

sumptions, whereas enclave-based pre-processing heavily relies on a trustworthy

manufacturer that ensures that private keys are kept secret and certificates ob-

tained from the PKI are authentic to the device’s identities. This distinguishes

ZoKrates as particularly suitable for processing critical data with substantial

security demands.

Further Attacks: Beyond our attack model described in Section 2.2, attacks

on data freshness and availability must be considered. While an attacker that

controls communication channels, e.g., between gateway and blockchain node,

cannot compromise data integrity without being noticed (Man-in-the-Middle At-

tack) due to signature and evidence verification, it can, however, intercept and

replay messages in a different order to impact the overall application logic (Re-

play Attack). To prevent this, secure timestamps or challenge-response patterns

can be applied. Furthermore, to prevent a malicious executor from compromis-

ing availability by withholding messages (Denial of Service Attack), gateway

nodes can be deployed redundantly to eliminate centralization, similar to the

proposal in [26].
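A challenge-response pattern against replayed messages can be sketched as follows. This is an illustration under assumptions: `Verifier`, `respond`, and `KEY` are hypothetical names, and an HMAC stands in for the signature or evidence that would be verified in an actual workflow.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared key; stands in for the evidence key material.
KEY = b"demo-shared-key"


def respond(nonce, payload):
    """Sender side: bind the payload to the verifier's fresh nonce."""
    return hmac.new(KEY, nonce + payload, hashlib.sha256).digest()


class Verifier:
    """Receiver side: issues nonces and accepts each nonce at most once."""

    def __init__(self):
        self._open = set()

    def challenge(self):
        nonce = secrets.token_bytes(16)
        self._open.add(nonce)
        return nonce

    def accept(self, nonce, payload, tag):
        if nonce not in self._open:
            return False  # unknown or already used nonce: replay
        if not hmac.compare_digest(respond(nonce, payload), tag):
            return False  # payload or tag was manipulated
        self._open.remove(nonce)
        return True
```

A message is accepted only on first use of its nonce, so an intercepted message that is sent again, or attached to a later challenge, is rejected.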

Multi-Stage Pre-Processing: In multi-stage data on-chaining workflows,

multiple pre-processing tasks may be executed subsequently by different non-

trusted stakeholders. To verify integrity on-chain, an evidence chain must be

established that allows any subsequent computation to validate the provided

evidence of the previous computation. This way, end-to-end integrity could be

guaranteed along arbitrarily long on-chaining workflows.
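The idea of an evidence chain can be illustrated with a minimal sketch. The following assumes, purely for illustration, that each stage's evidence is a keyed hash over the previous evidence and the stage's own output; in an actual workflow, this role would be taken by zkSNARK proofs or enclave signatures. The names `stage_evidence` and `verify_chain` are hypothetical.

```python
import hashlib
import hmac


def stage_evidence(key, prev_evidence, output):
    """Evidence of one stage, bound to the evidence of the previous stage."""
    return hmac.new(key, prev_evidence + output, hashlib.sha256).digest()


def verify_chain(keys, root, outputs, evidences):
    """Validate every link, starting from the root (e.g., sensor signature)."""
    if not (len(keys) == len(outputs) == len(evidences)):
        return False
    prev = root
    for key, output, evidence in zip(keys, outputs, evidences):
        expected = stage_evidence(key, prev, output)
        if not hmac.compare_digest(expected, evidence):
            return False
        prev = evidence
    return True


# Two stages operated by different (hypothetical) stakeholders.
keys = [b"stage-1-key", b"stage-2-key"]
root = b"sensor-signature"
outputs = [b"filtered", b"aggregated"]
e1 = stage_evidence(keys[0], root, outputs[0])
e2 = stage_evidence(keys[1], e1, outputs[1])
```

Because each link includes the previous evidence, changing the output of any stage invalidates the evidence of all subsequent stages.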

Confidential Pre-Processing: While this work focuses on integrity preser-

vation, in some use cases it might be required to keep inputs to pre-processing

hidden from the executor. This can, for example, be achieved through Intel SGX,

where encrypted inputs can be decrypted inside the enclave, processed, and en-

crypted again before being returned. Thereby, inputs and outputs would not be

accessible by the executor. However, side-channel attacks, which are known to

extract confidential information from enclaves [4], must be taken into account.

6 Related Work

In this paper, we extend trustworthy data on-chaining as presented in [14] by

considering data in use as an additional attack vector. Furthermore, we lever-

age approaches to off-chain computation presented in [6] to realize trustworthy


pre-processing. From the proposed off-chain computation technologies in [6],

zkSNARKs and Trusted Execution Environments are increasingly adopted in

scientific literature on blockchain-based IoT applications.

Recently, many proposals leverage zkSNARKs for off-chain computations

through ZoKrates; however, only a few address blockchain-based sensor data

management. While in [7] ZoKrates is applied for off-chain processing of sensor

data, i.e., smart meter measurements in local energy grids, other works mainly

use ZoKrates for privacy-preserving authentication, e.g., in the context of smart

vehicle authentication at charging stations [11], consumer authentication for car

sharing [13], or in health care for patient authentication [23].

TEEs are leveraged in various papers to implement trustworthy oracles that

bridge data provisioning from off-chain data sources to smart contracts. For ex-

ample, in TownCrier [28], a TEE-based oracle system is proposed to authenticate

data provided by HTTPS-enabled off-chain data sources, or in [26], a distributed

TEE-enabled oracle system is proposed that improves availability. Beyond

scientific usage, ChainLink⁵, for example, works on a solution to implement these concepts

for practical usage [3].

While the main focus of these proposals lies in data provisioning, other works

instead use TEEs for sensor data management. In [9], for example, a system is

proposed that employs TEEs for intermediate processing of sensory data before

it is forwarded to the blockchain and the cloud. The authors of [1] use TEEs for

trustworthy access management of sensor data in hybrid storage systems where

off-chain storage holds encrypted sensor data and the blockchain stores its hashes

and access logs. While these proposals do not apply pre-processing as defined

in this paper, they underline the need for a systematization of trustworthy pre-

processing that we aim to provide with our contributions.

7 Conclusion

End-to-end sensor data integrity is critical to many blockchain-based IoT appli-

cations. Data on-chaining workflows accordingly require pre-processing on off-

chain nodes to be trustworthy. In this paper, we explored the use of zkSNARKs-

and TEE-based computations for trustworthy pre-processing, first, as individual

candidate technologies that require non-trivial set-ups for integration in data on-

chaining workflows, and second, through a preliminary, comparative experimen-

tal evaluation based on two proof-of-concept implementations. We conclude that

each presents an important approach that (a) can conceptually be well-integrated

in respective workflows and (b) satisfies the requirements and primary objective

of end-to-end data integrity. Our proof-of-concept implementations use current,

state-of-the-art software, and, since both zero-knowledge proofs and TEEs are

very active areas of research, our implementations and the experimental find-

ings must be seen as preliminary. We expect rapid advances regarding the used

software stacks and current constraints regarding memory limitations, and, con-

sequently, performance numbers to change. Still, a principal performance gap

⁵ https://chain.link/


and performance advantage of TEEs over zkSNARKs is expected to remain.

However, as discussed in this paper, the choice of an approach and technology

will depend also on other, non-performance criteria like the integrity and trust

assumptions or existing attack vectors for the specific IoT application under con-

sideration. Future work will address extensions of the proposed model regarding

its computational scalability through parallel execution and its applicability for

stream processing.

References

1. Ayoade, G., Karande, V., Khan, L., Hamlen, K.: Decentralized iot data manage-

ment using blockchain and trusted execution environment. In: 2018 IEEE Interna-

tional Conference on Information Reuse and Integration (IRI). pp. 15–22 (2018)

2. Ben-Sasson, E., Bentov, I., Chiesa, A., Gabizon, A., Genkin, D., Hamilis, M.,

Pergament, E., Riabzev, M., Silberstein, M., Tromer, E., Virza, M.: Computa-

tional integrity with a public random string from quasi-linear pcps. In: Advances

in Cryptology – EUROCRYPT 2017. pp. 551–579 (2017)

3. Breidenbach, L., Cachin, C., Chan, B., Coventry, A., Ellis, S., Juels, A., Koushan-

far, F., Miller, A., Magauran, B., Moroz, D., et al.: Chainlink 2.0: Next steps in

the evolution of decentralized oracle networks (2021)

4. Bulck, J.V., Minkin, M., Weisse, O., Genkin, D., Kasikci, B., Piessens, F., Silber-

stein, M., Wenisch, T.F., Yarom, Y., Strackx, R.: Foreshadow: Extracting the keys

to the intel SGX kingdom with transient out-of-order execution. In: 27th USENIX

Security Symposium (USENIX Security 18). pp. 991–1008. USENIX Association,

Baltimore, MD (Aug 2018)

5. Costan, V., Devadas, S.: Intel SGX explained. IACR Cryptol. ePrint Arch. 2016,

86 (2016), http://eprint.iacr.org/2016/086

6. Eberhardt, J., Heiss, J.: Off-chaining models and approaches to off-chain computations.

In: Proceedings of the 2nd Workshop on Scalable and Resilient Infrastructures

for Distributed Ledgers. SERIAL’18, ACM (2018)

7. Eberhardt, J., Peise, M., Kim, D.H., Tai, S.: Privacy-preserving netting in local

energy grids. In: 2020 IEEE International Conference on Blockchain and Cryp-

tocurrency (ICBC). pp. 1–9 (2020)

8. Eberhardt, J., Tai, S.: Zokrates - scalable privacy-preserving off-chain computa-

tions. In: IEEE International Conference on Blockchain (2018)

9. Enkhtaivan, B., Inoue, A.: Mediating data trustworthiness by using trusted hard-

ware between iot devices and blockchain. In: 2020 IEEE International Conference

on Smart Internet of Things (SmartIoT). pp. 314–318 (2020)

10. Gabay, D., Akkaya, K., Cebe, M.: A privacy framework for charging connected

electric vehicles using blockchain and zero knowledge proofs. In: 2019 IEEE 44th

LCN Symposium on Emerging Topics in Networking (LCN Symposium). pp. 66–73

(2019)

11. Gabay, D., Akkaya, K., Cebe, M.: Privacy-preserving authentication scheme for

connected electric vehicles using blockchain and zero knowledge proofs. IEEE

Transactions on Vehicular Technology 69(6), 5760–5772 (2020)

12. Griggs, K.N., Ossipova, O., Kohlios, C.P., Baccarini, A.N., Howson, E.A., Haya-

jneh, T.: Healthcare blockchain system using smart contracts for secure automated

remote patient monitoring. Journal of Medical Systems 42, 1–7 (2018)


13. Gudymenko, I., Khalid, A., Siddiqui, H., Idrees, M., Clauß, S., Luckow, A.,

Bolsinger, M., Miehle, D.: Privacy-preserving blockchain-based systems for car

sharing leveraging zero-knowledge protocols. In: 2020 IEEE International Con-

ference on Decentralized Applications and Infrastructures (DAPPS). pp. 114–119

(2020)

14. Heiss, J., Eberhardt, J., Tai, S.: From oracles to trustworthy data on-chaining

systems. In: IEEE International Conference on Blockchain (2019)

15. Helo, P., Shamsuzzoha, A.: Real-time supply chain—a blockchain architecture for

project deliveries. Robotics and Computer-Integrated Manufacturing 63, 101909

(2020)

16. Huang, S., Wang, G., Yan, Y., Fang, X.: Blockchain-based data management for

digital twin of product. Journal of Manufacturing Systems 54, 361–371 (2020)

17. Eberhardt, J., Tai, S.: On or Off the Blockchain? Insights on Off-Chaining Computation

and Data. In: ESOCC 2017: 6th European Conference on Service-Oriented

and Cloud Computing (2017)

18. Kurt Peker, Y., Rodriguez, X., Ericsson, J., Lee, S.J., Perez, A.J.: A cost analysis of

internet of things sensor data storage on blockchain via smart contracts. Electronics

9(2) (2020)

19. Kurt Peker, Y., Rodriguez, X., Ericsson, J., Lee, S.J., Perez, A.J.: A cost analysis of

internet of things sensor data storage on blockchain via smart contracts. Electronics

9(2) (2020)

20. Peise, M., Kuhlenkamp, J., Busse, A., Eberhardt, J., Ulbricht, M.R., Tai, S., Baus,

J., Kassebaum, M., Zörner, T.: Blockchain-based local energy grids: Advanced use

cases and architectural considerations. In: IEEE 18th ICSA-C. pp. 130–137 (2021)

21. Putz, B., Dietz, M., Empl, P., Pernul, G.: Ethertwin: Blockchain-based secure dig-

ital twin information management. Information Processing & Management 58(1)

(2021)

22. Shafagh, H., Burkhalter, L., Hithnawi, A., Duquennoy, S.: Towards blockchain-

based auditable storage and sharing of iot data. In: Proceedings of the 2017 on

Cloud Computing Security Workshop. pp. 45–50 (2017)

23. Sharma, B., Halder, R., Singh, J.: Blockchain-based interoperable healthcare using

zero-knowledge proofs and proxy re-encryption. 2020 International Conference on

COMmunication Systems & NETworkS (COMSNETS) pp. 1–6 (2020)

24. Sigwart, M., Borkowski, M., Peise, M., Schulte, S., Tai, S.: Blockchain-based data

provenance for the internet of things. In: Proceedings of the 9th International

Conference on the Internet of Things (2019)

25. Sund, T., Lööf, C., Nadjm-Tehrani, S., Asplund, M.: Blockchain-based event processing

in supply chains—a case study at ikea. Robotics and Computer-Integrated

Manufacturing 65, 101971 (2020)

26. Woo, S., Song, J., Park, S.: A distributed oracle using intel sgx for blockchain-based

iot applications. Sensors 20(9) (2020)

27. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger.

Ethereum Project Yellow Paper (2014)

28. Zhang, F., Cecchetti, E., Croman, K., Juels, A., Shi, E.: Town crier: An authen-

ticated data feed for smart contracts. In: Proceedings of the 2016 ACM SIGSAC

Conference on Computer and Communications Security (2016)