Tamper resistance of AES: Models, attacks and countermeasures / Volker Krummel [original]

Tamper Resistance of AES

–

Models, Attacks and Countermeasures

A dissertation submitted to the

DEPARTMENT OF COMPUTER SCIENCE

UNIVERSITY OF PADERBORN

for the degree of

Doktor der Naturwissenschaften

presented by

VOLKER KRUMMEL

accepted on the recommendation of

Prof. Dr. Johannes Blömer, examiner

Prof. Dr. Joachim von zur Gathen, co-examiner

2007

≪Timmy & Finn – Sonnenkinder, die auch im Regen lachen≫

Acknowledgments

I am deeply grateful to my supervisor, Prof. Dr. Johannes Blömer, for his great support and continuous encouragement in writing this thesis. Among other topics, he introduced me to the field of tamper resistance and side channel attacks and supplied me with new, interesting and challenging problems and ideas. Johannes allowed me great freedom in my research and always took the time to discuss the ongoing progress. His comments and suggestions were always very helpful in improving my work.

I am also truly indebted to my second supervisor, Prof. Dr. Joachim von zur Gathen, who

sparked my interest in cryptography. The opportunity to join his working group allowed me

to deepen my research in this fascinating area.

Furthermore, I would like to thank Dr. Jean-Pierre Seifert, the coordinator of our joint

project with the Intel Corporation. The cooperation with Intel not only implied financial

support of my research but also provided valuable insights into recent cryptographic problems.

This thesis would not have been possible without the generous support of the “Institut

für Industriemathematik” of the University of Paderborn. Special thanks go to Tanja Bürger

and Dr. Robert Preis who were very helpful in handling all the administrative obstacles.

For proofreading parts of my thesis, I would like to thank Marcel R. Ackermann, Dr.

Valentina Damerow and Stefanie Naewe.

Contents

1 Introduction
2 The Advanced Encryption Standard (AES)
2.1 Symmetric Block Ciphers
2.2 Basic Algebraic Structures of AES
2.2.1 Representation of Data
2.2.2 The Finite Field $\mathbb{F}_2[x]/\langle x^8+x^4+x^3+x+1\rangle$
2.2.3 The Ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$
2.2.4 The Ring $R = \mathbb{F}_{256}[y]/\langle y^4+1\rangle$
2.3 The Standard Implementation of AES
2.3.1 State Transformations
2.3.2 Encryption
2.3.3 Key Expansion
2.3.4 Decryption
2.4 The Fast Implementation of AES
3 Security and Side Channel Attacks
3.1 General Principles of Side Channel Attacks
3.2 Side Channels
3.2.1 Timing Attack
3.2.2 Power Analysis
3.2.3 Fault Attacks
3.2.4 Cache Attacks
3.2.5 Other Side Channel Attacks
3.3 Countermeasures
4 Provably Secure Randomization of Cryptographic Algorithms
4.1 Security Model
4.1.1 Discussion of the Security Notion
4.2 Masking AES
4.3 Perfectly Masking AES against Order-1 Adversaries
4.3.1 Idea
4.3.2 Method
4.3.3 Security Analysis
4.3.4 Simplified Version
4.4 Implementation and Costs
4.4.1 Efficient Hardware Implementation over $GF(((2^2)^2)^2)$
4.4.2 Cost and Comparison to Previous Countermeasures
4.5 Order-$d$ Perfectly Masking
4.5.1 Perfect Mask Change
4.5.2 Squaring
4.5.3 Multiplication
4.6 Conclusions
5 Fault Based Collision Attacks
5.1 The Concept of Fault Attacks
5.1.1 Methods to Induce Faults
5.1.2 Fault Models
5.2 The Concept of Collision Attacks
5.3 New Fault Model
5.3.1 Notation
5.3.2 Model
5.4 Fault Based Collision Attacks on AES
5.4.1 Basic Attack
5.4.2 Second Attack
5.4.3 Third Attack
5.4.4 Fourth Attack
5.4.5 Fifth Attack
5.5 Conclusion
6 Cache Behavior Attacks (CBAs)
6.1 Cache Mechanism and Technical Background
6.2 Security Models for CBAs
6.2.1 Fundamental Model for CBAs
6.2.2 Time Driven CBA
6.2.3 Trace Driven CBA
6.2.4 Access Driven CBA
6.2.5 Extending the Threat Model for Access Driven CBAs
6.3 Access Driven CBAs on AES
6.3.1 Access Driven CBA on the First Round
6.3.2 Access Driven CBA on the Last Round
6.4 General Methods to Thwart CBAs
6.5 Information Leakage and Resistance
6.6 Information Leakage and Resistance of Selected Implementations
6.7 Countermeasures Based on Permutations
6.7.1 An Access Driven CBA on a Permuted Sbox
6.7.2 Separability and Distinguished Permutations
6.8 Summary of Countermeasures and Open Problems
A Sbox Tables $T_0,\dots,T_4$ of AES
B Decompositions of the AES Sbox

Chapter 1

Introduction

Security, in whatever context or meaning, has been a goal of human beings since the dawn of mankind. One aspect of security is secret communication, i.e., preventing others from reading private messages. The oldest known approach to making texts hard to read dates back about 4000 years. At that time in Egypt, a master scribe used unusual hieroglyphs to obfuscate the meaning of an inscription in the tomb of Khnumhotep II (Kahn 1996). Cryptography – the science of secret writing – was born.

In the course of time, people invented many systems for keeping messages secret. Most of them were broken because of a lack of thorough analysis and invalid assumptions on the part of the inventor. A famous example is the Enigma cipher machine used by German forces in the Second World War. Several drawbacks of the Enigma, in connection with a bad protocol and protocol failures by the participants, helped Polish and British experts to break the cipher. Since the Second World War, the importance of cryptography has been widely understood, and the theory of the design and analysis of encryption algorithms has made enormous progress. Nowadays we have a large number of strong algorithms whose security has been analyzed independently by crypto researchers all over the world, e.g., see (Menezes, van Oorschot and Vanstone 1997) and (Schneier 1996).

But cryptography expanded from the science of secret writing to the science of arbitrary

security problems like authentication or data integrity. Cryptography can solve some very

difficult problems concerning security. Hence, cryptographic algorithms are the main building

blocks of security systems like access control or electronic payments. However, it was known

right from the beginning that using strong cryptographic algorithms does not necessarily

lead to a secure system. Quite the contrary is true. Using weak cryptography would not

weaken many systems because there are several other components of the systems that allow

even easier attacks. We are confronted with this kind of problem when we see the security

problems that occur because of the human factor or implementational mistakes like buffer

overflows etc. Securing a system can be compared to protecting a house against burglars.

Further strengthening the front door with sophisticated locks does not improve the security

if the window on the back is still open. An attacker is not fair. He would not spend his

(life)time trying to pick the locks of the front door but simply slips in through the open

window. The same is true for security systems and even for cryptographic algorithms. We

cannot expect that an attacker does what we suppose him to do. He will take every chance he

can get to break the system. Since the system is only as secure as its weakest link, to improve

the security of a system one has to perform the following steps according to (Ferguson and

Schneier 2003):

1. detect all links

2. determine weak links

3. strengthen weak links

These steps are easily written down but very hard to perform. In particular, detecting all links and determining the weak ones is hard and tricky. The problem is that there are no rules an attacker sticks to.

Peter Wright was the first to publish details about operation “ENGULF”, an example of such an “unfair” attack (Wright 1987). Wright was a scientist at MI5, one of the secret services of the United Kingdom. During the Suez crisis in 1956, MI5 was interested in the messages of the Egyptian embassy that were encrypted by a Hagelin rotor machine. To improve secrecy, a new key was set up every day. Although MI5 had exactly the same model of the cipher machine, they could not break the encryption efficiently. Therefore, Wright suggested placing a microphone close to the cipher machine to determine the key settings by listening to the sound that occurs when a new key is set up. The sound enabled MI5 to figure out the daily key and read all the messages.

In 1985 van Eck published a different approach later called “van Eck phreaking” to obtain

private information (van Eck 1985). He showed how to exploit the electromagnetic emanations

of computer displays to reconstruct the content of the display even from a large distance.

Attacks that bypass security mechanisms by exploiting additional information or by manipulating the environment are called tampering attacks. These attacks show that security

engineering – the science of developing reliable and secure systems – is a much wider field

than cryptography, e.g., see (Anderson 2001).

But cryptography itself became a target of security concerns when Kocher published an attack that determines a secret RSA key by analyzing the running times of private key operations (Kocher 1996). Only a few years later, he also showed how to break cryptographic schemes by analyzing the power consumption (Kocher, Jaffe and Jun 1999). Several similar methods – so-called side channel attacks – were developed to break cryptographic algorithms very efficiently. Like strengthening the links of a security system, protecting cryptographic algorithms against side channel attacks is quite tricky.

Organization of the Thesis and Main Results In this thesis we focus on analyzing

the tamper resistance of cryptographic algorithms. More precisely, we examine the security

of today's most important symmetric encryption scheme, the Advanced Encryption Standard

(AES), against side channel attacks.

The first goal was to develop a general and strong model in which the effectiveness of

countermeasures to thwart side channel attacks can be analyzed. This goal was motivated by

finding a secure implementation of AES, a problem that had not been solved satisfactorily before.

In Chapter 4 we present our strong and general model which covers adversaries of different

power. After that, we develop a general method to implement ciphers like AES provably

secure in our model. We give the security proof together with a thorough analysis of the

costs of our AES implementation in hardware. The results of this chapter were published in

(Blömer, Guajardo and Krummel 2004).

A further goal was to analyze the effectiveness of countermeasures that were proposed

to thwart side channel attacks but were not analyzed thoroughly. We focus on the so called

memory encryption, a method that is based on encrypting the main memory to prevent

information leakage. At first sight, memory encryption provides a large improvement of

security and hence is used in many high security smartcards. In Chapter 5 we show that

this first impression is wrong. We present a new concept of fault attacks called fault based

collision attacks that defeats memory encryption using only a moderate number of faults.

The results of this chapter were published in (Blömer and Krummel 2006).

In the last part of the thesis we analyze a different kind of side channel attacks, so called

cache based attacks. Cache based attacks have been proven to be very powerful and turned

out to be one of the biggest threats of cryptographic software implementations running on

computers with cache. In Chapter 6 we first strengthen the existing threat model to adapt

it to the recent methodology of cache based attacks. We introduce two security concepts:
information leakage and resistance. Information leakage measures the maximal amount of

information that leaks through an arbitrary number of cache based attacks. The resistance

estimates the information an attacker may get after a single cache based attack. We analyzed

several implementations of AES determining their information leakage and their resistance.

It turns out that all implementations proposed so far provide only poor resistance and leak all

key bits. Therefore, we propose a new implementation of AES using small sboxes that does

not leak a single key bit. Furthermore, we analyzed a proposed countermeasure based on

random permutations. We show how to efficiently defeat this countermeasure using cache based attacks. To improve the effectiveness of this countermeasure we develop a special class of permutations, so-called distinguished permutations. Using distinguished permutations we

can provably protect half of the key bits even for an unlimited number of cache attacks. The

results of this chapter were published in (Blömer and Krummel 2007).

Chapter 2

The Advanced Encryption

Standard (AES)

In 1977, the National Bureau of Standards (NBS) of the USA announced the first standardized

symmetric encryption algorithm called Data Encryption Standard (DES) which immediately

became the de facto standard worldwide. In 1997, the National Institute of Standards and

Technology (NIST), formerly named NBS, started to search for a successor of DES. NIST arranged a public competition of proposed algorithms that were submitted by researchers of the cryptography community. These submissions were publicly analyzed by crypto researchers all over the world. Five candidate algorithms made it to the final decision. In the end Rijndael, an algorithm by the two Belgian cryptographers Joan Daemen and Vincent Rijmen, was chosen as the successor of DES and named the Advanced Encryption Standard (AES). In this chapter we first give the background of symmetric encryption algorithms and then describe the AES in more detail. Further information about the AES can

be found in (Daemen and Rijmen 2002) and (NIST 2001). A more condensed description of

the AES can be found in (Lenstra 2002).

2.1 Symmetric Block Ciphers

Since the seminal paper (Diffie and Hellman 1976) encryption schemes (or ciphers) can be

classified as either symmetric or asymmetric ciphers. Asymmetric ciphers use a pair of keys,

a public key for encryption and a private key for decryption. For the security of this kind of

encryption systems it is essential that the private key cannot be derived from the public key

efficiently. Using a key pair (a public and a private one) allows two parties to communicate

privately without sharing a common secret. Two famous examples for asymmetric ciphers are

RSA (Rivest, Shamir and Adleman 1978) and the ElGamal cryptosystem (ElGamal 1985).

Symmetric ciphers use a single key for both encryption and decryption. Hence,

before being able to communicate securely both parties have to agree on a common secret

key. To be more precise, a symmetric encryption scheme is defined as follows.

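Definition 1 (symmetric encryption scheme) Let $\mathcal{P}$, $\mathcal{K}$ and $\mathcal{C}$ be the sets of valid plaintexts, keys and ciphertexts, respectively. A symmetric encryption scheme consists of a pair of algorithms $(\mathrm{enc}, \mathrm{dec})$. The algorithm $\mathrm{enc}$ computes the unique ciphertext $c \in \mathcal{C}$ given a valid plaintext $p \in \mathcal{P}$ and a valid key $k \in \mathcal{K}$:

$$\mathrm{enc} : \mathcal{P} \times \mathcal{K} \to \mathcal{C}, \qquad (p, k) \mapsto c = \mathrm{enc}_k(p).$$

The algorithm $\mathrm{dec}$ computes the unique plaintext $p \in \mathcal{P}$ given a valid ciphertext $c \in \mathcal{C}$ and a valid key $k \in \mathcal{K}$:

$$\mathrm{dec} : \mathcal{C} \times \mathcal{K} \to \mathcal{P}, \qquad (c, k) \mapsto p = \mathrm{dec}_k(c).$$

The algorithms $\mathrm{enc}$ and $\mathrm{dec}$ are related by the property that

$$\forall p \in \mathcal{P}\ \forall k \in \mathcal{K} : \mathrm{dec}_k(\mathrm{enc}_k(p)) = p.$$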

A symmetric encryption scheme that takes a plaintext block of fixed size as input and computes a ciphertext block of fixed length is called a block cipher. In a so-called iterated block cipher, a sequence of transformations is applied repeatedly.

AES is an iterated block cipher with a fixed block length of 128 bits. The key length

can be 128, 192 or 256 bits. Depending on the chosen key length AES is named AES-128,

AES-192 or AES-256, respectively. To simplify notation we only describe AES-128. Similar

descriptions of the other variants are given in (Daemen and Rijmen 2002) or (NIST 2001).

2.2 Basic Algebraic Structures of AES

The design of AES makes use of several algebraic structures. In this section we briefly describe

each of these structures together with their associated operations.

2.2.1 Representation of Data

The basic information unit of AES is a byte consisting of 8 bits. Depending on the underlying algebraic structure, AES deals with different representations of bytes. Firstly, a byte $b$ can be written in binary notation as $b = (b_7,\dots,b_0)$ where each $b_i \in \mathbb{F}_2$. We can also interpret $b$ as a natural number $\sum_{i=0}^{7} b_i \cdot 2^i$ between 0 and 255 and represent it by its hexadecimal notation $xy$ where $x, y \in \{0,1,\dots,9,A,B,C,D,E,F\}$. When working in a finite field or ring, the polynomial notation

$$b = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$$

with coefficients in $\mathbb{F}_2$ is used.
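The three notations can be illustrated with a short Python sketch. It is not part of the original text; the helper name `byte_notations` and the textual form of the polynomial are our own illustrative choices.

```python
def byte_notations(b):
    """Return the binary, hexadecimal and polynomial notation of a byte value.

    `byte_notations` is a hypothetical helper name used for illustration only.
    """
    if not 0 <= b <= 255:
        raise ValueError("a byte must lie between 0 and 255")
    bits = tuple((b >> i) & 1 for i in range(7, -1, -1))  # (b7, ..., b0)
    hexa = f"{b:02X}"                                      # two hex digits xy
    terms = []
    for i in range(7, -1, -1):                             # highest degree first
        if (b >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    poly = " + ".join(terms) if terms else "0"
    return bits, hexa, poly


bits, hexa, poly = byte_notations(0x57)
```

For the byte 57 this yields the bit vector (0,1,0,1,0,1,1,1) and the polynomial x^6 + x^4 + x^2 + x + 1.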

2.2.2 The Finite Field $\mathbb{F}_2[x]/\langle x^8+x^4+x^3+x+1\rangle$

One of the algebraic structures of AES is the finite field with 256 elements. To be more precise, AES uses the polynomial

$$m := x^8 + x^4 + x^3 + x + 1$$

that is irreducible over $\mathbb{F}_2$ to define the finite field

$$\mathbb{F}_{256} = \mathbb{F}_2[x]/\langle m \rangle.$$

The polynomial representation of a byte $b$ is considered as an element of $\mathbb{F}_{256}$. In the sequel, we briefly describe the associated operations.

The addition of two elements $a, b \in \mathbb{F}_{256}$ is computed as

$$a + b = \sum_{i=0}^{7} (a_i \oplus b_i)\,x^i$$

where $\oplus$ denotes the addition in $\mathbb{F}_2$.

The multiplication of two elements $a, b \in \mathbb{F}_{256}$ is computed as

$$a \cdot b = \left(\sum_{i=0}^{7} a_i x^i\right) \cdot \left(\sum_{i=0}^{7} b_i x^i\right) \pmod{x^8+x^4+x^3+x+1}.$$

Obviously, $1 \in \mathbb{F}_{256}$ is the neutral element of the multiplication, i.e., multiplication by $a = 1$ is the identity. Furthermore, the multiplication by $a = x \in \mathbb{F}_{256}$ can be implemented very efficiently. To do so, the coefficients of $b$ are shifted one position to the left, setting the rightmost coefficient to 0:

$$x \cdot b = b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x.$$

To determine the correct reduced result we distinguish two cases. If $b_7 = 0$ then

$$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x = b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$$

is already in reduced form and we do not need a further reduction. If $b_7 = 1$ then we have to reduce the result modulo the polynomial $m$ as defined above. We can compute the correct reduced result by simply adding $m$ to the product $x \cdot b$:

$$\begin{array}{rl}
 & x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x + 0 \\
+ & x^8 + x^4 + x^3 + x + 1 \\ \hline
= & b_6x^7 + b_5x^6 + b_4x^5 + (b_3+1)x^4 + (b_2+1)x^3 + b_1x^2 + (b_0+1)x + 1
\end{array}$$

Algorithm 1 shows the computation of $x \cdot b \pmod{m}$ called xtime.

Algorithm 1 xtime
Input: $b = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 \in \mathbb{F}_2[x]/\langle m \rangle$
Output: $x \cdot b \in \mathbb{F}_2[x]/\langle m \rangle$
c ← $b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$ {left shift of coefficients}
if $b_7 = 0$ then
    return c {already correct result}
else
    return c + m {reduce and return}
end if
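A minimal Python sketch of Algorithm 1 is given below, together with a general multiplication in $\mathbb{F}_{256}$ built from repeated xtime steps. It is an illustration rather than part of the original text: bytes are assumed to be stored as integers whose bit $i$ is the coefficient $b_i$, and the names `gf_add` and `gf_mul` as well as the shift-and-add strategy are our own choices.

```python
M = 0x11B  # bit mask of m = x^8 + x^4 + x^3 + x + 1


def gf_add(a, b):
    """Addition in F_256: coefficient-wise addition in F_2, i.e. XOR."""
    return a ^ b


def xtime(b):
    """Compute x * b mod m as in Algorithm 1."""
    c = b << 1          # left shift of coefficients
    if c & 0x100:       # b7 was 1: the x^8 term is present
        c ^= M          # reduce by adding m
    return c


def gf_mul(a, b):
    """Multiply a and b in F_256 by shift-and-add (illustrative helper)."""
    result = 0
    while b:
        if b & 1:               # current coefficient of b is 1
            result ^= a         # add a * x^i
        a = xtime(a)            # a * x^(i+1)
        b >>= 1
    return result
```

The branch in `xtime` depends on the secret-dependent bit $b_7$, so this sketch is meant to illustrate the arithmetic only and makes no claim about side channel resistance.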

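Inversion For every element $a$ of the multiplicative group $\mathbb{F}_{256}^{\times}$ there exists a unique element $b \in \mathbb{F}_{256}^{\times}$ such that $ab = 1$. The element $b = a^{-1}$ is called the inverse of $a$. We extend the inversion to all elements of $\mathbb{F}_{256}$ by defining the function

$$\mathrm{INV} : \mathbb{F}_{256} \to \mathbb{F}_{256}, \qquad a \mapsto \begin{cases} a^{-1}, & \text{if } a \in \mathbb{F}_{256}^{\times} \\ 0, & \text{if } a = 0 \end{cases}$$

By Lagrange's Theorem, $a^{255} = 1$ for every $a \in \mathbb{F}_{256}^{\times}$. Hence, we can compute $\mathrm{INV}(a)$ in $\mathbb{F}_{256}$ by raising $a$ to the 254th power:

$$\mathrm{INV}(a) = a^{254} \in \mathbb{F}_{256}.$$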

We can use the repeated squaring algorithm to compute the power of an element efficiently.

See for example (von zur Gathen and Gerhard 2003) or (Shoup 2005) for a comprehensive

treatise of the topic.
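As a sketch of how INV can be computed with repeated squaring, consider the following Python fragment. It is an illustration under our own assumptions: bytes are integers, and the helper names `gf_mul`, `gf_pow` and `inv` do not appear in the original text.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/<x^8 + x^4 + x^3 + x + 1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B      # reduce modulo m
        b >>= 1
    return result


def gf_pow(a, e):
    """Raise a to the power e by repeated squaring (square-and-multiply)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)    # a, a^2, a^4, ...
        e >>= 1
    return result


def inv(a):
    """INV(a) = a^254; this maps 0 to 0 and every other a to its inverse."""
    return gf_pow(a, 254)
```

Since 254 has eight binary digits, the loop performs at most eight squarings and eight multiplications per inversion.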

2.2.3 The Ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$

Another algebraic structure that is used in AES is the ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$. Since

$$x^8 + 1 = (x+1)^8 \in \mathbb{F}_2[x]$$

is not irreducible over $\mathbb{F}_2$, the ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$ does not form a field and we cannot invert each of its elements $b \neq 0$. Data bytes are again represented as polynomials, as in $\mathbb{F}_{256}$ described above. Apart from the reduction, which is now computed modulo $x^8+1$, addition and multiplication are defined as above. Hence, for two elements $a, b \in \mathbb{F}_2[x]/\langle x^8+1\rangle$

$$a + b = \sum_{i=0}^{7} (a_i \oplus b_i)\,x^i$$

and

$$a \cdot b = \left(\sum_{i=0}^{7} a_i x^i\right) \cdot \left(\sum_{i=0}^{7} b_i x^i\right) \pmod{x^8+1}.$$
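Because $x^8 \equiv 1$ in this ring, multiplying by $x^i$ is a cyclic rotation of the coefficient vector by $i$ positions. The following Python sketch uses this observation; the names `rotl8` and `ring_mul` are hypothetical and bytes are assumed to be integers as before.

```python
def rotl8(a, i):
    """Rotate the 8 coefficients of a cyclically by i positions to the left."""
    i %= 8
    return ((a << i) | (a >> (8 - i))) & 0xFF


def ring_mul(a, b):
    """Multiply a and b in F_2[x]/<x^8 + 1> (illustrative helper)."""
    result = 0
    for i in range(8):
        if (b >> i) & 1:
            result ^= rotl8(a, i)   # add a * x^i, where x^8 wraps around to 1
    return result


# (x + 1)^8 = x^8 + 1 is zero in the ring, so x + 1 cannot be invertible.
power = 1
for _ in range(8):
    power = ring_mul(power, 0x03)
```

The final loop leaves `power` equal to 0, which illustrates that the ring contains zero divisors and therefore is not a field.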

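2.2.4 The Ring $R = \mathbb{F}_{256}[y]/\langle y^4+1\rangle$

AES also deals with 4-tuples of bytes. Here, each byte, considered as an element of $\mathbb{F}_{256}$ as described above, is a coefficient of a polynomial

$$\beta = \beta_3y^3 + \beta_2y^2 + \beta_1y + \beta_0 \pmod{y^4+1}$$

of degree less than 4. The polynomials described above form the ring

$$R := \mathbb{F}_{256}[y]/\langle y^4+1\rangle.$$

For two elements $\alpha = \sum_{i=0}^{3} \alpha_i y^i \in R$ and $\beta = \sum_{i=0}^{3} \beta_i y^i \in R$ the sum $\alpha + \beta$ is computed as

$$(\alpha_3+\beta_3)y^3 + (\alpha_2+\beta_2)y^2 + (\alpha_1+\beta_1)y + (\alpha_0+\beta_0),$$

where the addition of two coefficients is computed in $\mathbb{F}_{256}$.

The product $\alpha \cdot \beta$ is computed as

$$\left(\sum_{i=0}^{3} \alpha_i y^i\right) \cdot \left(\sum_{i=0}^{3} \beta_i y^i\right) \pmod{y^4+1}.$$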
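The arithmetic in $R$ can be sketched in Python as follows. The sketch assumes that an element of $R$ is stored as a list of its four coefficients in the order $[\beta_0, \beta_1, \beta_2, \beta_3]$; the names `r_add` and `r_mul` are our own. The reduction uses $y^4 \equiv 1$, which holds because $-1 = 1$ in characteristic 2.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/<x^8 + x^4 + x^3 + x + 1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def r_add(alpha, beta):
    """Coefficient-wise addition in R (addition in F_256 is XOR)."""
    return [a ^ b for a, b in zip(alpha, beta)]


def r_mul(alpha, beta):
    """Multiply two elements of R = F_256[y]/<y^4 + 1>."""
    product = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # y^i * y^j = y^((i + j) mod 4) because y^4 = 1 in R
            product[(i + j) % 4] ^= gf_mul(alpha[i], beta[j])
    return product
```

For example, `r_mul([0, 1, 0, 0], [0, 0, 0, 1])` multiplies $y$ by $y^3$ and returns the coefficient list of the constant polynomial 1.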

2.3 The Standard Implementation of AES

After explaining the basic algebraic structures, we now describe the standard implementation

of AES as defined in (Daemen and Rijmen 2002) and (NIST 2001). As mentioned above the

basic information unit of AES is a byte. Sixteen bytes arranged in a $4 \times 4$ matrix form a so-called state.

To process a plaintext block $p$ of 128 bits, $p$ is transformed into a state. To do so, $p$ is divided into 16 bytes $p_0, p_1, \dots, p_{15}$. The bytes are mapped to a $4 \times 4$ array as shown in Figure 2.1.

[Figure 2.1: Mapping the plaintext $p$ into a state; the bytes $p_0,\dots,p_{15}$ fill the $4 \times 4$ array column by column]
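The mapping can be sketched in Python as follows, assuming the column-wise order specified in (NIST 2001), i.e., byte $p_{r+4c}$ is placed in row $r$ and column $c$. The function names are hypothetical, and the state is represented as a list of four rows.

```python
def bytes_to_state(block):
    """Map 16 bytes p0..p15 to a 4x4 state, filling column by column."""
    if len(block) != 16:
        raise ValueError("a block consists of exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Inverse mapping: read the state column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


state = bytes_to_state(bytes(range(16)))
```

With the bytes 0,...,15 as input, the first row of `state` is [0, 4, 8, 12] and the last row is [3, 7, 11, 15].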

AES is an iterated block cipher. During the AES encryption several different transformations, grouped in so-called rounds, are repeatedly applied to the state. In the sequel, we first

describe each of these transformations and then provide the complete encryption algorithm.

2.3.1 State Transformations

The SubBytes (SB) Transformation

SubBytes is the non-linear transformation of AES.

[Figure 2.2: The SubBytes transformation, mapping each state byte $s_{i,j}$ to $s'_{i,j}$]

It substitutes each byte of the state independently of the other bytes by applying a fixed mapping. In the first step of this mapping each byte $b$, considered as an element of $\mathbb{F}_{256}$, is substituted by its inverse $\mathrm{INV}(b)$. In the second step, $\mathrm{INV}(b)$ is interpreted as an element of the ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$, and a fixed affine mapping in this ring is applied to it:

$$(x^4+x^3+x^2+x+1) \cdot \mathrm{INV}(b) + (x^6+x^5+x+1) \pmod{x^8+1}. \tag{2.1}$$

To apply the mapping efficiently it is usually precomputed for all 256 possible inputs and the result is stored in a table of 256 bytes. This table is called the substitution box (sbox) $S$. We denote the application of the mapping to a byte $b$ by $S[b]$. Figure 2.2 depicts the application of the sbox.
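The precomputation of the sbox from the two steps above can be sketched in Python. The sketch is our own illustration: it computes INV as the 254th power and realizes the multiplication by $x^4+x^3+x^2+x+1$ modulo $x^8+1$ as an XOR of five cyclic rotations; all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/<x^8 + x^4 + x^3 + x + 1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def inv(a):
    """INV(a) = a^254 in F_256, so that INV(0) = 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(a, i):
    """Multiplication by x^i modulo x^8 + 1 is a cyclic left rotation."""
    return ((a << i) | (a >> (8 - i))) & 0xFF


def affine(v):
    """The affine mapping (2.1): (x^4+x^3+x^2+x+1) * v + (x^6+x^5+x+1)."""
    result = 0
    for i in range(5):              # terms 1, x, x^2, x^3, x^4
        result ^= rotl8(v, i)
    return result ^ 0x63            # x^6 + x^5 + x + 1 as a byte


SBOX = [affine(inv(b)) for b in range(256)]
```

The resulting table starts with `SBOX[0] == 0x63` and `SBOX[1] == 0x7C`, and it is a permutation of the 256 byte values.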

The ShiftRows (SR) Transformation

The ShiftRows transformation performs a cyclic shift on each row of the state. Each row is shifted to the left by a fixed number of byte positions. The first row is not shifted, the second row is shifted one position to the left, the third row is shifted two positions to the left and the fourth row is shifted three positions to the left. The ShiftRows operation is depicted in Figure 2.3.

[Figure 2.3: The ShiftRows transformation]
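A minimal Python sketch of ShiftRows follows. It assumes that the state is stored as a list of four rows, each a list of four bytes; the function name is our own.

```python
def shift_rows(state):
    """Cyclically shift row r of the state by r positions to the left."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
shifted = shift_rows(state)
```

Here `shifted` has the rows [0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9] and [15, 12, 13, 14].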

The MixColumns (MC) Transformation

The MixColumns transformation performs a linear combination of the bytes of a column. Each byte of the state is interpreted as an element of $\mathbb{F}_{256}$. The four bytes $\beta_0, \beta_1, \beta_2, \beta_3$ of a column are considered as the coefficients of a polynomial

$$\beta = \beta_3y^3 + \beta_2y^2 + \beta_1y + \beta_0 \in R$$

of degree less than 4 in the ring $R = \mathbb{F}_{256}[y]/\langle y^4+1\rangle$. The polynomial $\beta$ is then multiplied by a fixed polynomial:

$$c := \mathtt{03}\,y^3 + \mathtt{01}\,y^2 + \mathtt{01}\,y + \mathtt{02} \in \mathbb{F}_{256}[y]/\langle y^4+1\rangle.$$

[Figure 2.4: The MixColumns transformation, multiplying a column $\beta$ by $c$ to obtain $\beta'$]

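MixColumns is depicted in Figure 2.4. Alternatively, we can represent the MixColumns transformation as a matrix multiplication:

$$\underbrace{\begin{pmatrix} \mathtt{02} & \mathtt{03} & \mathtt{01} & \mathtt{01} \\ \mathtt{01} & \mathtt{02} & \mathtt{03} & \mathtt{01} \\ \mathtt{01} & \mathtt{01} & \mathtt{02} & \mathtt{03} \\ \mathtt{03} & \mathtt{01} & \mathtt{01} & \mathtt{02} \end{pmatrix}}_{\in\, \mathbb{F}_{256}^{4 \times 4}} \cdot \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} \beta'_0 \\ \beta'_1 \\ \beta'_2 \\ \beta'_3 \end{pmatrix}$$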
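The matrix form translates directly into a short Python sketch. It is an illustration under our own assumptions: a column is a list $[\beta_0, \beta_1, \beta_2, \beta_3]$ of integers and the names `MIX_MATRIX` and `mix_column` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/<x^8 + x^4 + x^3 + x + 1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


MIX_MATRIX = [
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02],
]


def mix_column(column):
    """Multiply the MixColumns matrix by one column over F_256."""
    result = []
    for row in MIX_MATRIX:
        acc = 0
        for coefficient, byte in zip(row, column):
            acc ^= gf_mul(coefficient, byte)    # sum in F_256 is XOR
        result.append(acc)
    return result


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

For the column (DB, 13, 53, 45) the result is (8E, 4D, A1, BC). Since every row of the matrix sums to 01, a column of four equal bytes is left unchanged.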







The AddRoundKey Transformation

To introduce the secret key into the encryption, the AddRoundKey transformation is used. The so-called key schedule, explained in Section 2.3.3, takes the cipher key as input and generates a so-called round key for every round of AES. The round key is of the same size as the encryption state, i.e., it forms a $4 \times 4$ byte matrix. The AddRoundKey transformation combines a byte $b$ of the state with its corresponding byte $k$ of the round key by computing the bitwise addition modulo 2 (XOR): $b \oplus k$. The AddRoundKey transformation is depicted in Figure 2.5.

[Figure 2.5: The AddRoundKey transformation, combining each state byte $s_{i,j}$ with the round key byte $k_{i,j}$ to obtain $s'_{i,j}$]

2.3.2 Encryption

The AES encryption entirely consists of the four state transformations. A round of the AES

encryption is composed by consecutively applying the state transformations to the state in the

order shown in Algorithm 2. The complete encryption algorithm is shown in Algorithm 3.

Algorithm 2 A round of the AES encryption

SubBytes
ShiftRows
MixColumns
AddRoundKey

It consists of an initial AddRoundKey followed by nine applications of the AES round described in Algorithm 2. After that, a truncated round is applied that consists only of SubBytes, ShiftRows and AddRoundKey.

Algorithm 3 Complete AES encryption
Input: plaintext $p_0,\dots,p_{15} \in \{0,1\}^8$, key $k$
Output: ciphertext $c_0,\dots,c_{15} \in \{0,1\}^8$
AddRoundKey
for i = 1 to 9 do
    SubBytes
    ShiftRows
    MixColumns
    AddRoundKey
end for
SubBytes
ShiftRows
AddRoundKey

2.3.3 Key Expansion

AES-128 applies the AddRoundKey transformation eleven times to the intermediate state: before the first round, in each of the nine rounds and in the truncated last round. To generate a different round key for each of these applications of AddRoundKey, a so-called expanded key $w$ is derived from the cipher key $k = k_0,\dots,k_{15} \in (\{0,1\}^8)^{16}$ as follows. The cipher key $k$ is mapped to a $4 \times 4$ state matrix in the same way as a plaintext is mapped to a state in Figure 2.1. The four bytes of each column of this matrix form a so-called word. We define two operations on words. The first operation, SubWord, applies the sbox to every byte of the word:

$$\mathrm{SubWord} : (\{0,1\}^8)^4 \to (\{0,1\}^8)^4, \qquad (\beta_0, \beta_1, \beta_2, \beta_3) \mapsto (S[\beta_0], S[\beta_1], S[\beta_2], S[\beta_3]).$$

The second operation, RotWord, cyclically shifts the 4 bytes of a word one position to the left:

$$\mathrm{RotWord} : (\{0,1\}^8)^4 \to (\{0,1\}^8)^4, \qquad (\beta_0, \beta_1, \beta_2, \beta_3) \mapsto (\beta_1, \beta_2, \beta_3, \beta_0).$$

Furthermore, for $i \geq 1$ let

$$\mathrm{Rcon}[i] := (x^{i-1}, 0, 0, 0) \in (\mathbb{F}_2[x]/\langle m \rangle)^4$$

be the so-called round constant for the $i$th round key. The expanded key $w$ is then computed according to Algorithm 4.

The round key for round $i$ is extracted from the expanded key $w$ by mapping the words $w_{4i},\dots,w_{4i+3}$ of the expanded key $w$ to the columns of a $4 \times 4$ byte matrix.

Algorithm 4 Key schedule of AES-128 in pseudocode
Input: cipher key $k = k_0,\dots,k_{15}$ with $k_j \in \{0,1\}^8$
Output: expanded key $w = w_0,\dots,w_{43}$ with $w_i \in \{0,1\}^{32}$
for i ← 0,...,3 do
    $w_i = (k_{4i}, k_{4i+1}, k_{4i+2}, k_{4i+3})$
end for
for i ← 4,...,43 do
    temp = $w_{i-1}$
    if i ≡ 0 mod 4 then
        temp = SubWord(RotWord(temp)) ⊕ Rcon[i/4]
    end if
    $w_i = w_{i-4}$ ⊕ temp
end for
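To make the interplay of Algorithm 4 with the complete encryption of Algorithm 3 concrete, the following Python sketch implements both for AES-128. It is an unoptimized illustration under our own assumptions and makes no claim of side channel resistance: the state is stored as a list of four columns, the sbox is recomputed from its definition, and all function names are hypothetical. The values used in the usage line are an example vector published in (NIST 2001).

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/<x^8 + x^4 + x^3 + x + 1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def make_sbox():
    """Build S from INV (b -> b^254) followed by the affine mapping (2.1)."""
    sbox = []
    for b in range(256):
        inverse = 1
        for _ in range(254):
            inverse = gf_mul(inverse, b)
        value = 0
        for i in range(5):      # multiply by x^4+x^3+x^2+x+1 mod x^8+1
            value ^= ((inverse << i) | (inverse >> (8 - i))) & 0xFF
        sbox.append(value ^ 0x63)
    return sbox


SBOX = make_sbox()


def expand_key(key):
    """Algorithm 4: derive the words w_0..w_43, each a list of 4 bytes."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1                                    # x^(i/4 - 1), starting at x^0
    for i in range(4, 44):
        temp = list(w[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # RotWord
            temp = [SBOX[t] for t in temp]      # SubWord
            temp[0] ^= rcon                     # add Rcon[i/4]
            rcon = gf_mul(rcon, 2)              # next power of x in F_256
        w.append([a ^ b for a, b in zip(w[i - 4], temp)])
    return w


def mix_column(col):
    """Multiply one column by c = 03 y^3 + 01 y^2 + 01 y + 02 in R."""
    b0, b1, b2, b3 = col
    return [gf_mul(2, b0) ^ gf_mul(3, b1) ^ b2 ^ b3,
            b0 ^ gf_mul(2, b1) ^ gf_mul(3, b2) ^ b3,
            b0 ^ b1 ^ gf_mul(2, b2) ^ gf_mul(3, b3),
            gf_mul(3, b0) ^ b1 ^ b2 ^ gf_mul(2, b3)]


def add_round_key(state, w, rnd):
    """XOR column c of the state with word w[4*rnd + c]."""
    return [[s ^ k for s, k in zip(state[c], w[4 * rnd + c])] for c in range(4)]


def encrypt_block(plaintext, key):
    """Algorithm 3 for a 16-byte plaintext and a 16-byte key."""
    w = expand_key(key)
    state = [list(plaintext[4 * c:4 * c + 4]) for c in range(4)]  # columns
    state = add_round_key(state, w, 0)
    for rnd in range(1, 11):
        state = [[SBOX[b] for b in col] for col in state]          # SubBytes
        state = [[state[(c + r) % 4][r] for r in range(4)]         # ShiftRows
                 for c in range(4)]
        if rnd < 10:                        # the truncated last round skips it
            state = [mix_column(col) for col in state]
        state = add_round_key(state, w, rnd)
    return bytes(b for col in state for b in col)


ciphertext = encrypt_block(bytes.fromhex("00112233445566778899aabbccddeeff"),
                           bytes(range(16)))
```

Storing the state as columns keeps AddRoundKey and MixColumns simple, because both the round key words and the MixColumns inputs are columns; ShiftRows then becomes an index computation across columns.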

2.3.4 Decryption

The decryption of AES, that is, determining the unique plaintext given the corresponding

ciphertext and the correct secret key, is done by inverting every transformation that was

applied in the encryption. In the sequel, we show how every single transformation can

be inverted. Hence, applying the inverse of each transformation in reverse order will

compute the correct plaintext.

The InvSubBytes Transformation

To undo the SubBytes transformation that substituted a byte $b$ with

\[ (x^4 + x^3 + x^2 + x + 1) \cdot \mathrm{INV}(b) + (x^6 + x^5 + x + 1) \pmod{x^8 + 1} \]

we proceed in two steps. Firstly, notice that the function INV is self-inverse. Secondly, the

affine mapping in the ring $R$ is invertible, having the inverse

\[ (x^6 + x^3 + x) \cdot b + (x^2 + 1) \pmod{x^8 + 1}. \]

Hence, the inverse transformation InvSubBytes of the SubBytes transformation is given by

applying the mapping

\[ \mathrm{INV}\bigl((x^6 + x^3 + x) \cdot b + (x^2 + 1) \pmod{x^8 + 1}\bigr) \tag{2.2} \]

to every byte of the state.

To increase the efficiency, one can precompute all 256 possible values and store them in a

table called the inverse sbox $S^{-1}$.
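As an illustration, the following Python sketch precomputes both tables directly from the two formulas above and checks that they invert each other. It is not taken from the original text. We assume that INV is inversion in GF(2^8) modulo the AES polynomial $x^8 + x^4 + x^3 + x + 1$ with INV(0) = 0, and the names `ring_mul`, `field_inv`, `sub_byte` and `inv_sub_byte` are hypothetical helpers.

```python
def ring_mul(a, b):
    """Multiply in the ring F_2[x]/<x^8 + 1>: carry-less product, x^8 folded to 1."""
    product = 0
    for i in range(8):
        if (b >> i) & 1:
            product ^= a << i
    return (product & 0xFF) ^ (product >> 8)


def gf_mul(a, b):
    """Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def field_inv(b):
    """INV(b) = b^254 in GF(2^8); this maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, b)
    return result


def sub_byte(b):
    # (x^4 + x^3 + x^2 + x + 1) * INV(b) + (x^6 + x^5 + x + 1) mod x^8 + 1
    return ring_mul(0x1F, field_inv(b)) ^ 0x63


def inv_sub_byte(b):
    # INV((x^6 + x^3 + x) * b + (x^2 + 1) mod x^8 + 1), cf. (2.2)
    return field_inv(ring_mul(0x4A, b) ^ 0x05)


SBOX = [sub_byte(b) for b in range(256)]
INV_SBOX = [inv_sub_byte(b) for b in range(256)]
```

With these tables, InvSubBytes is a single lookup `INV_SBOX[b]` per state byte, and `INV_SBOX[SBOX[b]] == b` holds for every byte `b`.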


The InvShiftRows Transformation

The ShiftRows transformation is inverted by cyclically shifting the bytes of each row

by the appropriate number of positions to the right. That is, shifting the second row one position,

the third row two positions and the fourth row three positions to the right cancels the effect

of ShiftRows on a state.
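A minimal Python sketch of both directions, assuming the state is stored as a list of four rows of four bytes each (this representation and the function names are our own choice for illustration):

```python
def shift_rows(state):
    """Rotate row i of the state by i positions to the left."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row i of the state by i positions to the right."""
    return [row[4 - i:] + row[:4 - i] for i, row in enumerate(state)]
```

Applying `inv_shift_rows` to the output of `shift_rows` returns the original state, since a right rotation by $i$ positions undoes a left rotation by $i$ positions.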

The InvMixColumns Transformation

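In the MixColumns transformation, each column of the state is interpreted as an element of

the ring $\mathbb{F}_{256}[y]/\langle y^4 + 1 \rangle$ and is multiplied by the fixed polynomial $c = \mathtt{03} \cdot y^3 + \mathtt{01} \cdot y^2 + \mathtt{01} \cdot y + \mathtt{02}$.

Since $\gcd(c, y^4 + 1) = 1$, the inverse of $c$ exists:

\[ c^{-1} := \mathtt{0B} \cdot y^3 + \mathtt{0D} \cdot y^2 + \mathtt{09} \cdot y + \mathtt{0E} \in \mathbb{F}_{256}[y]/\langle y^4 + 1 \rangle. \]

Multiplying each column, interpreted as an element of $\mathbb{F}_{256}[y]/\langle y^4 + 1 \rangle$, by $c^{-1}$ cancels the effect

of the MixColumns operation on a state.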
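The claim that $c^{-1}$ inverts $c$ can be checked numerically. The following Python sketch is an illustration only. It represents a polynomial $a_3 y^3 + a_2 y^2 + a_1 y + a_0$ as the list `[a0, a1, a2, a3]`, assumes that $\mathbb{F}_{256}$ is defined by the AES polynomial $x^8 + x^4 + x^3 + x + 1$, and uses hypothetical helper names.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly_mul(a, b):
    """Multiply two elements of F_256[y]/<y^4 + 1>, given as [a0, a1, a2, a3]."""
    result = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # y^4 = 1, so exponents are added modulo 4
            result[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return result


C = [0x02, 0x01, 0x01, 0x03]      # c = 03 y^3 + 01 y^2 + 01 y + 02
C_INV = [0x0E, 0x09, 0x0D, 0x0B]  # c^{-1} = 0B y^3 + 0D y^2 + 09 y + 0E


def mix_column(column):
    return poly_mul(C, column)


def inv_mix_column(column):
    return poly_mul(C_INV, column)
```

Here `poly_mul(C, C_INV)` evaluates to `[1, 0, 0, 0]`, i.e., the constant polynomial 1, so `inv_mix_column(mix_column(col))` returns `col` for every column.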

The InvAddRoundKey Transformation

The round key is combined with the state by bitwise adding (XOR) the bytes of the round

key with the corresponding bytes of the state. Since the XOR operation is its own inverse,

adding the round key again cancels the effect of the AddRoundKey transformation.

After specifying the inverse of each individual transformation, we can compute the decryption

of a ciphertext by applying the inverse transformations in reverse order, as

shown in Algorithm 5.

Algorithm 5 Complete AES decryption

Input: ciphertext $c_0, \dots, c_{15} \in \{0,1\}^8$, key $k$

Output: plaintext $p_0, \dots, p_{15} \in \{0,1\}^8$

1: InvAddRoundKey

2: InvShiftRows

3: InvSubBytes

4: for $i = 9$ down to 1 do

5:   InvAddRoundKey

6:   InvMixColumns

7:   InvShiftRows

8:   InvSubBytes

9: end for

10: InvAddRoundKey


2.4 The Fast Implementation of AES

Combining the transformations SubBytes, ShiftRows and MixColumns as described in Section 4.2

of (Daemen and Rijmen 2002) leads to an alternative description of AES. Notice

that the operations SubBytes and ShiftRows can be exchanged: SubBytes substitutes the

bytes independently of their position, whereas ShiftRows changes the position of the bytes

independently of their values.

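Let

\[
s := \begin{pmatrix}
s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\
s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\
s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\
s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3}
\end{pmatrix}
\]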

be the state before it enters an encryption round. For $0 \le j \le 3$ consider the four bytes

$s_{0,j}, s_{1,j+1}, s_{2,j+2}, s_{3,j+3}$ of the state $s$, where the column indices are computed modulo 4. The four

bytes are transformed by SubBytes and ShiftRows such that they form the new $j$th column:

\[
\begin{pmatrix} S[s_{0,j}] \\ S[s_{1,j+1}] \\ S[s_{2,j+2}] \\ S[s_{3,j+3}] \end{pmatrix}.
\]

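The application of MixColumns and AddRoundKey leads to

\[
\begin{pmatrix}
\mathtt{02} & \mathtt{03} & \mathtt{01} & \mathtt{01} \\
\mathtt{01} & \mathtt{02} & \mathtt{03} & \mathtt{01} \\
\mathtt{01} & \mathtt{01} & \mathtt{02} & \mathtt{03} \\
\mathtt{03} & \mathtt{01} & \mathtt{01} & \mathtt{02}
\end{pmatrix}
\cdot
\begin{pmatrix} S[s_{0,j}] \\ S[s_{1,j+1}] \\ S[s_{2,j+2}] \\ S[s_{3,j+3}] \end{pmatrix}
\oplus
\begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}.
\]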

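We rewrite the matrix multiplication as a linear combination of the column vectors of the MixColumns matrix:

\[
S[s_{0,j}] \begin{pmatrix} \mathtt{02} \\ \mathtt{01} \\ \mathtt{01} \\ \mathtt{03} \end{pmatrix}
\oplus S[s_{1,j+1}] \begin{pmatrix} \mathtt{03} \\ \mathtt{02} \\ \mathtt{01} \\ \mathtt{01} \end{pmatrix}
\oplus S[s_{2,j+2}] \begin{pmatrix} \mathtt{01} \\ \mathtt{03} \\ \mathtt{02} \\ \mathtt{01} \end{pmatrix}
\oplus S[s_{3,j+3}] \begin{pmatrix} \mathtt{01} \\ \mathtt{01} \\ \mathtt{03} \\ \mathtt{02} \end{pmatrix}
\oplus \begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}.
\]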

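Based on this linear combination we can construct new sboxes

\[ T_0, T_1, T_2, T_3 : \{0,1\}^8 \to (\{0,1\}^8)^4 \]

as follows:

\[
T_0[a] := \begin{pmatrix} S[a] \cdot \mathtt{02} \\ S[a] \cdot \mathtt{01} \\ S[a] \cdot \mathtt{01} \\ S[a] \cdot \mathtt{03} \end{pmatrix}, \quad
T_1[a] := \begin{pmatrix} S[a] \cdot \mathtt{03} \\ S[a] \cdot \mathtt{02} \\ S[a] \cdot \mathtt{01} \\ S[a] \cdot \mathtt{01} \end{pmatrix}, \quad
T_2[a] := \begin{pmatrix} S[a] \cdot \mathtt{01} \\ S[a] \cdot \mathtt{03} \\ S[a] \cdot \mathtt{02} \\ S[a] \cdot \mathtt{01} \end{pmatrix}, \quad
T_3[a] := \begin{pmatrix} S[a] \cdot \mathtt{01} \\ S[a] \cdot \mathtt{01} \\ S[a] \cdot \mathtt{03} \\ S[a] \cdot \mathtt{02} \end{pmatrix}.
\]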


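Each of the sboxes $T_0, T_1, T_2, T_3$ has 256 entries of four bytes each. Four bytes, i.e., one column of the state, can be encrypted

for one (full) round by computing

\[
T_0[s_{0,j}] \oplus T_1[s_{1,j+1}] \oplus T_2[s_{2,j+2}] \oplus T_3[s_{3,j+3}] \oplus
\begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}.
\]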

For the last (truncated) round, which does not have a MixColumns transformation, things

are simpler. We could simply apply the standard sbox $S$ to every byte of the state.

However, to increase the efficiency on 32-bit platforms, (Daemen and Rijmen 2002) suggested

using the sbox

\[ T_4 : \{0,1\}^8 \to (\{0,1\}^8)^4, \quad a \mapsto (S[a], S[a], S[a], S[a]). \]

Merging the transformations as described above leads to a description of AES that only

uses applications of the sboxes and key additions to compute the correct AES encryption.
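The equivalence of the table-based round and the standard round can be checked with a small Python sketch. It is an illustration under stated assumptions and not the implementation of (Daemen and Rijmen 2002): the state and the round key are lists of four rows, the sbox is recomputed assuming the AES polynomial $x^8 + x^4 + x^3 + x + 1$, each table entry is kept as a tuple of four bytes rather than packed into a 32-bit word, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """AES sbox: field inversion followed by the affine mapping."""
    sbox = []
    for value in range(256):
        inverse = 0
        if value:
            inverse = next(c for c in range(1, 256) if gf_mul(value, c) == 1)
        out = 0x63
        for shift in range(5):
            out ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()

# MixColumns matrix; column i supplies the constant factors of table T_i.
MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]

# T[i][a] is the column vector S[a] * (column i of MIX).
T = [[tuple(gf_mul(SBOX[a], MIX[row][i]) for row in range(4))
      for a in range(256)]
     for i in range(4)]


def round_column_ttable(state, key, j):
    """New column j after one full round, using only table lookups and XOR."""
    column = [0, 0, 0, 0]
    for i in range(4):
        entry = T[i][state[i][(j + i) % 4]]
        column = [c ^ e for c, e in zip(column, entry)]
    return [column[row] ^ key[row][j] for row in range(4)]


def round_column_reference(state, key, j):
    """The same column via SubBytes, ShiftRows, MixColumns and AddRoundKey."""
    substituted = [[SBOX[b] for b in row] for row in state]
    shifted = [row[i:] + row[:i] for i, row in enumerate(substituted)]
    column = [shifted[row][j] for row in range(4)]
    mixed = [0, 0, 0, 0]
    for row in range(4):
        for col in range(4):
            mixed[row] ^= gf_mul(MIX[row][col], column[col])
    return [mixed[row] ^ key[row][j] for row in range(4)]
```

For every state, round key and column index $j$, both functions return the same four bytes. The table version needs four lookups and four word XORs per column, at the price of four tables with 256 entries each.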


Chapter 3

Security and Side Channel Attacks

Classical cryptography covers several different security notions, e.g., security against known

plaintext attacks or chosen plaintext attacks. But all the different security notions share at

least one assumption: The encryption function is a black box. I.e., the only information an

attacker can get or influence is the plaintext and the ciphertext of the encryption function

as depicted in Figure 3.1. Here Alice and Bob want to communicate confidentially over an

insecure channel. To protect their communication they encrypt the messages in a private

environment before sending them. The attacker named Eve wants to obtain information

about the messages or the key used for encryption.

Figure 3.1: Black box model of classical cryptography (diagram labels: Alice, encrypt, decrypt, Bob, Eve)

However, cryptographic algorithms have to be implemented either in hardware or software.

It turned out that implementations of cryptographic algorithms leak some information about

the cryptographic operations through so-called side channels. The information that leaks is

called side channel information, e.g., the time it takes to encrypt a plaintext or the power

consumption. Side channel information depends on the implementation and its inputs,

i.e., the plaintext and the secret key. An attack that uses side channel information is called a

side channel attack (SCA). It turns out that side channel attacks are much more efficient than

classical attacks for virtually every cryptographic algorithm. Hence, to analyze the security

of cryptographic algorithms it is essential that side channels are considered as a real threat

and are incorporated into the black box model. This leads to an extended black box model

like the one depicted in Figure 3.2.

Figure 3.2: Extended black box model that incorporates side channels (diagram labels: Alice, encrypt, decrypt, Bob, Eve; side channels at both endpoints: fault, electromagnetic emanation, power consumption, light, sound, time, probing)

However, securing algorithms against side channel attacks is quite tricky. At least two problems occur:

1. It is unclear how to determine all side channels. So far, no security model that considers

all side channels is known.

2. It is difficult to prevent the leakage of information. Most of the countermeasures proposed

so far only thwart a certain way of exploiting side channel information.

We introduce a model for analyzing the security against side channel attacks in Chapter 4.

3.1 General Principles of Side Channel Attacks

In the sequel, we describe the general principle of side channel attacks and the assumptions

that are necessary for mounting a side channel attack on an implementation. The essential

assumptions concerning an attacker $\mathcal{A}$ that exploits side channel information are:

Assumption 1 (Kerckhoffs' extended principle) $\mathcal{A}$ knows all technical details about the

underlying cryptographic algorithm and its implementation.

This assumption is implicitly used in all side channel attacks. In the following we simply

refer to it as Kerckhoffs' extended principle.

Assumption 2 $\mathcal{A}$ is able to get plaintexts (or ciphertexts) of encryptions. Furthermore, for

each encryption $\mathcal{A}$ is able to obtain side channel information.

The general structure of a side channel attack consists of the following steps:

measurement step In the measurement step the adversary $\mathcal{A}$ obtains the side channel

information of the implementation together with the corresponding plaintext and/or

ciphertext. To perform the measurement step, $\mathcal{A}$ needs access to the implementation

of the algorithm. Therefore, this step is also called the online step.

analysis step $\mathcal{A}$ interprets the information collected in the measurement step and tries to

connect the side channel information to some property of an intermediate state of the

encryption. This analysis lets $\mathcal{A}$ derive some information about the secret key. Depending

on the side channel attack, the analysis step either determines the secret key uniquely

or reduces the number of key candidates so significantly that a brute force attack becomes

feasible. The analysis can be performed without access to the implementation, and

hence this step is also called the offline step.

3.2 Side Channels

In the sequel we give an overview of the most commonly analyzed side channels and specify

the common structure of side channel attacks.

3.2.1 Timing Attack

The first publication of a successful timing attack was Kocher's timing attack on modular

exponentiation as used in RSA (Kocher 1996). In the asymmetric cipher RSA, a ciphertext

$c$, viewed as an element of the multiplicative group $\mathbb{Z}_N^*$, is decrypted by raising it to the $d$-th

power

\[ p = c^d \bmod N, \]

where $N \in \mathbb{Z}$ is the public modulus and $d \in \mathbb{Z}_{\varphi(N)}^*$ is the secret exponent. The exponentiation

can be computed efficiently using the repeated squaring algorithm or a variation of it. Kocher

showed how to determine $d$ efficiently by analyzing time measurements of decryptions of many

different ciphertexts.

Quisquater’s Timing Attack on RSA

(Dhem, Koeune, Leroux, Mestré, Quisquater and Willems 1998) improved the timing attack

on RSA implementations that use a fast modular multiplication method called Montgomery multiplication

(Montgomery 1985). The structure of the attack is as follows. Let $[d_0, d_1, \dots, d_n]$ be the

binary representation of $d$. Knowing the bits $d_0, \dots, d_{i-1}$ of $d$, the bit $d_i$ can be determined

by computing the following steps for many ciphertexts $c$:

1. measure the running time $T$ of the decryption of $c$

2. compute $z = c^{[d_0, d_1, \dots, d_{i-1}, 0]}$

3. if computing $z \cdot c$ takes "long" then put $T$ into set $S_1$

4. else put $T$ into set $S_2$

After that, the attacker $\mathcal{A}$ compares the average timings of $S_1$ and $S_2$. If they differ significantly,

$\mathcal{A}$ assumes that $d_i = 1$. Otherwise he assumes that $d_i = 0$.

Before starting the attack, $\mathcal{A}$ first implicitly assumes that $d_i = 1$, which implies that a

modular multiplication is computed in step $i$ of the decryption. Since $\mathcal{A}$ knows all preceding

bits, he can compute the intermediate result of the decryption right before step $i$. He splits the

set of time measurements depending on the time it would take to compute the multiplication

in step $i$. The set $S_1$ stores all timing measurements of ciphertexts that would take a "long"

time for the multiplication. The set $S_2$ stores all timing measurements of ciphertexts that

would take a "short" time for the multiplication. The assumption is that if the multiplication

takes a long time, then it is more likely that the overall decryption time is greater than the

decryption time of a ciphertext for which the multiplication takes a short time. Hence, if

$d_i = 1$ then the average running time of set $S_1$ should be significantly larger than the average

running time of the set $S_2$. On the other hand, if $d_i = 0$ then no modular multiplication will

be computed. The splitting of measurements into the two sets is then assumed to be random and

we expect that the average running times do not differ significantly.
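The partitioning step can be sketched in Python on a simulated device. The sketch rests on several assumptions that are not taken from the original text: the exponent bits are processed from the most significant bit $d_0$ downwards, the device uses Montgomery multiplication, and the toy timing model charges one time unit per multiplication plus one extra unit whenever the final conditional subtraction of the Montgomery multiplication is executed (this is what "long" means here). All function names are hypothetical, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_constant(n, r_bits):
    """Return n' with n * n' = -1 mod R, where R = 2^r_bits and n is odd."""
    r = 1 << r_bits
    return (-pow(n, -1, r)) % r


def mont_mul(a, b, n, r_bits, n_prime):
    """Return (a * b * R^-1 mod n, extra), extra = final subtraction executed."""
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> r_bits
    if u >= n:
        return u - n, True
    return u, False


def exponentiate(c, bits, n, r_bits, n_prime):
    """Square and multiply over the given bits; returns (Montgomery form, time)."""
    r = 1 << r_bits
    c_bar = (c * r) % n
    z = r % n  # Montgomery form of 1
    cost = 0
    for bit in bits:
        z, extra = mont_mul(z, z, n, r_bits, n_prime)
        cost += 1 + extra
        if bit:
            z, extra = mont_mul(z, c_bar, n, r_bits, n_prime)
            cost += 1 + extra
    return z, cost


def timed_decrypt(c, bits, n, r_bits, n_prime):
    """Simulated device: returns (c^d mod n, simulated running time)."""
    z, cost = exponentiate(c, bits, n, r_bits, n_prime)
    plaintext, _ = mont_mul(z, 1, n, r_bits, n_prime)
    return plaintext, cost


def timing_gap(ciphertexts, timings, known_bits, n, r_bits, n_prime):
    """Mean time of S1 minus mean time of S2 for the bit after known_bits."""
    r = 1 << r_bits
    s1, s2 = [], []
    for c, t in zip(ciphertexts, timings):
        z, _ = exponentiate(c, known_bits, n, r_bits, n_prime)
        z, _ = mont_mul(z, z, n, r_bits, n_prime)  # squaring of step i
        _, is_long = mont_mul(z, (c * r) % n, n, r_bits, n_prime)
        (s1 if is_long else s2).append(t)
    if not s1 or not s2:
        return 0.0
    return sum(s1) / len(s1) - sum(s2) / len(s2)
```

In this toy model one would collect `timings` with `timed_decrypt` for many random ciphertexts and guess $d_i = 1$ whenever `timing_gap` is clearly positive. The threshold for "clearly" depends on the number of measurements and on the noise and is deliberately left open here.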

There are several different variants of timing attacks. (Schindler 2000) adapted the concept

of timing attacks to RSA using the Chinese Remainder Theorem. (Cathalo, Koeune and

Quisquater 2003) developed a different type of timing attack to break the identification

scheme GPS of (Baudron, Boudot, Bourel, Bresson, Corbel, Frisch, Gilbert, Girault, Goubin,

Misarsky, Nguyen, Patarin, Pointcheval, Stern, Traoré and Poupard 2000). Symmetric ciphers

are also susceptible to timing attacks. (Hevia and Kiwi 1999) showed how to determine the

secret DES key and (Koeune and Quisquater 1999) obtained secret AES keys by mounting

timing attacks.

The power of timing attacks goes far beyond local attacks. (Brumley and Boneh 2005)

demonstrated that remote timing attacks are possible. They determined the secret RSA

exponent of a web server running OpenSSL by remotely taking time measurements over a

computer network. This remote timing attack was improved by (Acıiçmez, Schindler and

Koç 2005).


3.2.2 Power Analysis

The idea of power analysis is that the power consumption of a cryptographic device is related

to intermediate results of an encryption algorithm and hence depends on the secret key. The

first successful power attacks are due to (Kocher, Jaffe and Jun 1998). Power analysis can be

divided into simple power analysis (SPA) and differential power analysis (DPA). In an SPA,

the attacker analyzes a single power trace to figure out which operations were executed

on which operands and at what time.

As the name suggests, differential power analysis is based on the differences of power

traces obtained from many different inputs. Similar to timing attacks, the attacker $\mathcal{A}$ splits

a large set of power traces into two sets depending on a guess of a part of the key. For

each plaintext the attacker does the following. If the guess of the part of the key implies that

a certain operation during the encryption should consume a lot of power, then the obtained

power trace is put into set $S_1$. On the other hand, if the key guess implies that the operation

does not consume much power, the trace is put into set $S_2$. In the end, the attacker computes

the difference of the average traces of both sets. If there is a peak in the difference trace, then

the attacker assumes that the guess of the part of the key was correct. Otherwise he assumes

that the guess was wrong.

The underlying idea is similar to the one of the timing attack. If the key guess is correct,

then all power traces in the set $S_1$ show a high power consumption (peak) at the time when

the operation in question is executed. Hence, the average power trace of set $S_1$ also shows this

peak. The average trace of set $S_2$ does not have this peak. Therefore, the peak of $S_1$ will be

visible in the difference of the two average traces.

If the key guess was wrong, then the attacker wrongly decides whether the operation would

consume a lot of power or not. The assumption is that in this case the assignment of power

traces to the sets $S_1, S_2$ is random. Hence, we expect that when computing the average

traces the peaks in the power traces cancel out and we get a smooth difference trace.
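The difference-of-means computation can be illustrated on simulated data. The following Python sketch makes strong simplifying assumptions that are ours and not part of the original text: every trace consists of a single noise-free sample, the sample equals the Hamming weight of the sbox output $S[p \oplus k]$ for a plaintext byte $p$ and a key byte $k$, and the attacker predicts "a lot of power" whenever the least significant bit of $S[p \oplus \text{guess}]$ is 1. The sbox is recomputed assuming the AES polynomial, and all names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """AES sbox: field inversion followed by the affine mapping."""
    sbox = []
    for value in range(256):
        inverse = 0
        if value:
            inverse = next(c for c in range(1, 256) if gf_mul(value, c) == 1)
        out = 0x63
        for shift in range(5):
            out ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(out)
    return sbox


SBOX = build_sbox()


def hamming_weight(value):
    return bin(value).count("1")


def simulate_traces(plaintexts, key):
    """Toy leakage: one noise-free sample per encryption, HW(S[p XOR key])."""
    return [hamming_weight(SBOX[p ^ key]) for p in plaintexts]


def difference_of_means(plaintexts, traces, guess):
    """Mean of S1 minus mean of S2 for one guess of the key byte."""
    s1, s2 = [], []
    for p, trace in zip(plaintexts, traces):
        predicted_bit = SBOX[p ^ guess] & 1
        (s1 if predicted_bit else s2).append(trace)
    if not s1 or not s2:
        return 0.0
    return sum(s1) / len(s1) - sum(s2) / len(s2)
```

If all 256 plaintext bytes are used, the correct guess yields a difference of exactly 1 in this model, because the predicted bit then really contributes one unit to every trace in $S_1$ and none to the traces in $S_2$. Wrong guesses are expected to give smaller differences, although some wrong guesses may still produce noticeable values, so a real attack compares all 256 guesses on many noisy traces.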

3.2.3 Fault Attacks

The main idea of fault attacks is to obtain information about the secret key by inducing

faults into the cryptographic operation. We deal with fault attacks in more detail in Chapter

5 (page 49).

3.2.4 Cache Attacks

A cache is a fast buffer memory that can be accessed faster than the main memory. Hence,

buffering data that is used more often in the cache increases the performance of a computer.

In a cache attack the attacker observes information about the cache behavior of an algorithm.

E.g., he figures out how many cache accesses happened or which operation caused a cache

access. We analyze cache attacks in more detail in Chapter 6 (page 71).

3.2.5 Other Side Channel Attacks

Besides the side channels described above, there are several other ways for an attacker to

obtain information about the internal states of a cryptographic algorithm. (van Eck 1985)

shows how to reconstruct the content of a computer display by analyzing the electromagnetic

radiation of the monitor. Neal Stephenson features this kind of attack, so-called van Eck phreaking,

in his novel "Cryptonomicon" (Stephenson 1999). The concept of using electromagnetic

radiation to attack cryptographic algorithms was demonstrated in (Quisquater and Samyde

2001), (Gandolfi, Mourtel and Olivier 2001) and (Kuhn 2003).

Another example of a side channel attack, proposed in (Shamir and Tromer 2004), is to

analyze the sound a computer generates while operating with the secret key. Further kinds

of side channel attacks are, among others, so-called frequency-based attacks (Tiu 2005), visible

light attacks (Kuhn 2002) and scan-based attacks (Yang, Wu and Karri 2004).

3.3 Countermeasures

In general, there are two strategies to thwart side channel attacks. The first strategy is to

prevent the information leakage. E.g., to thwart timing attacks one could build an implementation

that uses constant execution time for all possible inputs. However, this approach

has several disadvantages. Firstly, building such an implementation is costly because it has

to consider all details of the underlying hardware and other parts of the environment. Secondly,

missing one of the details could lead to an implementation that is susceptible to other

side channel attacks. The third disadvantage of this approach is that it leads to inefficient

implementations that have to be redesigned for every different environment.

The second strategy is to randomize the intermediate values of an implementation such

that the leaking information is useless for an attacker. Furthermore, the implementation

has to ensure that the correct ciphertext is computed in the end. Of course, this approach

needs random values for obfuscating intermediate values. But randomization has several

advantages over the strategy of preventing information leakage. The first advantage is that

one can define a general model to analyze the effectiveness of the randomization. Furthermore,

the randomization can be done independently of the underlying hardware. Therefore one can

reuse randomized algorithms on several different platforms.

In the next chapter, we will present such a randomization strategy to provably protect

the AES against side channel attacks in a strong model.

Chapter 4

Provably Secure Randomization of

Cryptographic Algorithms

The security of AES against Simple Power Analysis (SPA), Differential Power Analysis

(DPA), Higher Order Differential Power Analysis (HODPA) as published in (Kocher et al.

1998), (Kocher et al. 1999), and Timing Attacks (Kocher 1996) has received considerable

attention since the beginning of the AES selection process. (Koeune and Quisquater 1999)

describe timing attacks against careless implementations of AES. (Biham and Shamir 1999)

and (Daemen and Rijmen 1999) discuss DPA attacks on the AES candidates in software based

solutions. (Örs, Gürkaynak, Oswald and Preneel 2004) describe the first power analysis-based

attack on a dedicated ASIC implementation of AES and (Mangard 2002) discusses an SPA

attack on the key schedule of AES.

As a result of these attacks, numerous hardware and algorithmic countermeasures have

been proposed. Hardware methodologies were proposed right from the beginning including

randomized clocks, memory encryption schemes, see (Clavier, Coron and Dabbous 2000) and

(Golić 2003), power consumption randomization (Daemen and Rijmen 1999), and decorrelating

the external power supply from the internal power consumed by the chip. Moreover,

the use of different hardware logic, such as complementary logic (Daemen and Rijmen 1999),

sense amplifier based logic (SABL) and asynchronous logic (Fournier, Moore, Li, Mullins

and Taylor 2003) and (Moore, Anderson, Mullins, Taylor and Fournier 2003) has also been

proposed. Some of these methods soon proved to be ineffective while other more successful

countermeasures are very costly in terms of development, area and power consumption. For

example, the techniques in (Daemen and Rijmen 1999), (Tiri, Akmal and Verbauwhede 2002),

(Tiri and Verbauwhede 2003), (Fournier et al. 2003) and (Moore et al. 2003) require about

twice as much area and will consume twice as much power as an implementation that is

not protected against power attacks. In addition, hardware countermeasures will only protect

against known techniques and attacks. They cannot provide security in a precisely defined

mathematical sense. Hence, although hardware countermeasures are an important defense

against side channel attacks, they should be complemented by algorithmic countermeasures

that are provably secure in a mathematically precise sense.

In this chapter, we focus on algorithmic countermeasures against timing and power attacks

on AES. In general, efficient algorithmic countermeasures against timing and power

attacks are based on randomization techniques. Here the problem is to guarantee that all

information that is accessible via side channels is random and hence useless to the attacker.

Moreover, the randomization must be used in such a way that, at the end of the algorithm, the

correct encryption or signature corresponding to the input plaintext is obtained. Randomized

algorithmic countermeasures against timing and power attacks include secret-sharing

schemes, independently proposed by (Goubin and Patarin 1999) and (Chari, Jutla, Rao and

Rohatgi 1999), as well as methods based on the idea of masking all data and intermediate results

during an encryption operation, originally introduced by (Messerges 2000). This chapter

is organized as follows.

Section 4.1: Security Model

In this section we introduce and discuss our mathematically precise security notion in

which we discuss randomization techniques. For our security notion we only make some

inevitable assumptions: Firstly, we assume that some (small) part of the computation

runs in a protected environment. Secondly, we limit the number of intermediate results

that an adversary has access to. Note that previous methods made at least these assumptions.

On the other hand, we assume that arbitrary differences in the distribution

of an intermediate result that depends on the plaintext or secret key of the cryptosystem

can be used to break the system completely. Accordingly, our security notion requires

that the distribution of any intermediate result is stochastically independent of the

secret key being used and independent of the plaintext. Independent of our research,

Golić briefly sketched a similar requirement in (Golić 2003). In the sequel, we call an

algorithm order-$d$ perfectly masked if the joint distribution of any $d$ intermediate results

is independent of the secret key and the plaintext. This notion of security strengthens

the security notion proposed in (Chari et al. 1999). Their security notion only requires

that the distribution of some side channel information about an intermediate result

has to be indistinguishable by an adversary. Since our security notion assumes that

even tiny differences in the distribution of the values of intermediate results completely

break an implementation of a cryptosystem, this notion is strong and often unrealistic.

On the other hand, we will argue that our security notion implies security against most

side channel attacks.

Section 4.2: Masking AES

In this section we briefly describe the masking techniques proposed so far. The first

algorithmic countermeasure against power attacks customized for the AES was the

Transform Masking Method by (Akkar and Giraud 2001). This method was further

simplified by (Trichina, Seta and Germani 2002). It was noticed in (Trichina et al. 2002),

(Golić and Tymen 2002) and (Akkar and Goubin 2003) that the multiplicative masking

introduced in (Akkar and Giraud 2001) masks only non-zero values, i.e., a zero byte will

not get masked because of the multiplicative nature of the mask. This feature renders

the method of Akkar and Giraud vulnerable to DPA attacks. A second masking technique

for AES is the Random Representation Method of (Golić and Tymen 2002). Similar to

Akkar and Giraud, Golić and Tymen do not try to show that their technique randomizes

all intermediate results. Instead, the authors argue experimentally that using their

methods the Hamming weights of all intermediate results are distributed in roughly

the same way, independent of the plaintext and the secret key. We conclude that so

far customized randomization techniques for AES were based on empirical assumptions

about the power of potential adversaries. Then these assumptions were used to define

some ad-hoc-model in which to analyze and argue the security of the methods. We

believe that this is a potentially dangerous approach.

Section 4.3: Perfectly Masking AES against Order-1 Adversaries

Based on our security notion we develop an order-1 perfectly masked algorithm for AES.

Hence, this algorithm is secure against any adversary that gets plaintext/ciphertext

pairs and a single arbitrary intermediate result for each of those pairs. The main

problem here is to describe a secure algorithm for the inversion operation that is the

main ingredient of the AES SubBytes transformation. Our solution is based on a

general technique to turn an arbitrary algorithm using arithmetic operations defined

over some finite field into a randomized algorithm that securely computes the same

function.

Section 4.4: Implementation and Costs

We show that masking countermeasures are inexpensive to implement in hardware. Our

method amounts to only a 20% increase in the overall area required for an AES hardware

implementation when compared to dual-rail logic type countermeasures. To show this,

we provide a detailed cost comparison of the different methods. Because our method is

based on the usage of multipliers and adders over any binary field, designers might use

this method to implement DPA-safe circuits which utilize previously designed multiplier

and adder blocks. Moreover, the method is modular and encourages reusability.

Section 4.5: Order-$d$ Perfectly Masking

In this section we generalize our method of order-1 perfectly masked algorithms. We

show how to design order-$d$ perfectly masked algorithms that are secure against adversaries

that get the values of a fixed number $d$ of intermediate results.

Section 4.6: Conclusion

We conclude the chapter by giving a brief survey of our contribution in the area of

building reliable security models and developing provably secure algorithms.


4.1 Security Model

In this section we describe our model which we will use in the sequel to analyze the security

of algorithms against side channel attacks. We specify the underlying assumptions that

characterize the model.

Let $\mathcal{P}$, $\mathcal{K}$ and $\mathcal{C}$ denote the set of plaintexts, the set of keys and the set of ciphertexts,

respectively. We consider some encryption function

\[ \mathrm{enc} : \mathcal{P} \times \mathcal{K} \to \mathcal{C}, \quad (x, k) \mapsto c. \]

Given an algorithm E that evaluates the function enc, for each plaintext $x \in \mathcal{P}$ and key

$k \in \mathcal{K}$, we view the computation of $E(x, k)$ as a sequence of $t \in \mathbb{N}$ intermediate results

\[ I_1(x, k, R), \dots, I_t(x, k, R). \]

Each intermediate result $I_i$ may depend on the plaintext $x$, on the secret key $k$, and on some

$R \in \{0,1\}^\alpha$ for an appropriate constant $\alpha \in \mathbb{N}$. The element $R$ is used to randomize the

computation and is chosen uniformly at random from $\{0,1\}^\alpha$. For simplicity we assume that

we have a true random number generator (TRNG) and that the adversary is not able to

manipulate the random bits. Note that the ciphertext $\mathrm{enc}(x, k) = I_t(x, k, R)$ only depends

on $x$ and $k$ and not on $R$.

We consider an adversary $\mathcal{A}$ that wants to derive information about the secret key $k$ by

using side channel information. To characterize the security model we make the following

assumptions:

Assumption 3

1. The adversary $\mathcal{A}$ can choose an arbitrary number of plaintexts (or ciphertexts) and

obtains the corresponding ciphertexts (or plaintexts).

2. For each encryption (or decryption), $\mathcal{A}$ gets the values of a constant number $d$ of intermediate

results.

In point 1 of Assumption 3 we allow the adversary to obtain an arbitrary number of (adaptively)

chosen plaintext/ciphertext pairs $(x, \mathrm{enc}(x, k))$. Furthermore, for each pair, the adversary

$\mathcal{A}$ obtains the values of $d$ intermediate results of his choice. $\mathcal{A}$ may get different

intermediate results for different plaintext/ciphertext pairs. The larger the number $d$ of

known intermediate results is, the more powerful $\mathcal{A}$ is. We call an adversary $\mathcal{A}$ that can get

at most $d$ intermediate results for each pair $(x, \mathrm{enc}(x, k))$ an order-$d$ adversary.

So far we considered intermediate results without specifying the possible intermediate

results that an adversary may get. We consider an algorithm as a sequence of operations

that are treated as encapsulated modules. This leads to a classification of intermediate results

into different levels down to the bit level:

1. Text level: The whole algorithm is treated as a module. This level is the one of classical

cryptography. The only information available to the adversary is the plaintext and the

ciphertext.

2. Block level: Each part or subroutine of the algorithm is treated as a module. In the

case of a block cipher such as the AES, each transformation within a round is treated

as a module (SubBytes, ShiftRows, MixColumns and AddRoundKey).

3. Unit level: Each arithmetic operation is treated as a module. These operations work on

the atomic units of information in the cipher. For example, the AES units of information

are bytes; no operation acts on single bits or nibbles directly. In hardware terms this

level is based on the contents of registers.

4. Bit level: Each bit manipulation is treated as a module, for example XOR, shift etc.

Every output of such a module is an intermediate result. In this section we concentrate

on intermediate results at the unit level. For AES this seems to be a natural choice since

basically all operations in AES are arithmetic operations on bytes. Therefore timing, power

and fault attacks on AES have focused on these operations as well.

Assumption 4 Some of the operations of the algorithm E that evaluates the encryption

function are protected against $\mathcal{A}$.

This assumption is inevitable to achieve a reasonable notion of security. To see this, note

that the secret key $k$ itself can be considered as an intermediate result. Letting $\mathcal{A}$ obtain $k$

directly would render all algorithms and countermeasures insecure. Hence, we must assume

that some parts of the computation run in a guaranteed secure environment, i.e., some

intermediate results cannot be accessed by an adversary. At least implicitly, all previously

proposed countermeasures against side channel attacks have made the same assumption.

Note that modern smartcards are protected by different types of countermeasures like sensors

and shields. Hence, the assumption that at least some computations are done in a secure

environment is realistic. However, it is desirable to clearly specify and to limit the number

of those operations because their protection is expensive.

Assumption 5 If the joint distribution of $d$ intermediate results depends on the plaintext $x$

and on the secret key $k$ then $\mathcal{A}$ can determine $k$.

This assumption strengthens the adversary. If the joint distribution of $d$ intermediate results

depends on the secret key then it provides $\mathcal{A}$ with some information about $k$. To simplify and

strengthen our security model we assume that in this case $\mathcal{A}$ can determine the entire key $k$.


Intuitively, we say that the algorithm computing enc is insecure if the joint distribution of the intermediate results that are accessible to an adversary depends on the plaintext x and on the secret key k. To formalize this, fix some d-tuple I_1, ..., I_d of intermediate results. For a pair (x, k) of plaintext and key we denote by D_{x,k}(R) the joint distribution of I_1, ..., I_d induced by choosing R uniformly at random in {0,1}^α for an appropriate constant α. Now we can define our notion of security called perfect masking:

Definition 2 (perfect masking) An algorithm that evaluates an encryption function enc is order-d perfectly masked if for all d-tuples I_1, ..., I_d of intermediate results we have that D_{x,k}(R) = D_{x′,k′}(R) for all pairs (x, k), (x′, k′).

For d = 1 we say that an algorithm is perfectly masked.
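As an illustration of Definition 2 for d = 1, the following Python sketch enumerates the distribution of a single intermediate result over all mask values and compares it across all secret values. It is a toy model under stated assumptions: the byte u stands in for the (x, k)-dependent value, the mask r plays the role of R with α = 8, and the helper names gf_mul, distribution and is_perfectly_masked are ours, not part of the original text.

```python
from collections import Counter

def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce modulo the AES polynomial
    return p

def distribution(intermediate, u):
    """Distribution of one intermediate result over all 256 mask values."""
    return Counter(intermediate(u, r) for r in range(256))

def is_perfectly_masked(intermediate):
    """Order-1 check: the distribution must be identical for every u."""
    reference = distribution(intermediate, 0)
    return all(distribution(intermediate, u) == reference for u in range(256))

# Additive masking u XOR r: uniform for every u.
additive_ok = is_perfectly_masked(lambda u, r: u ^ r)
# Multiplicative masking u * r: u = 0 always yields 0, so it fails.
multiplicative_ok = is_perfectly_masked(gf_mul)
print(additive_ok, multiplicative_ok)
```

The same exhaustive comparison can in principle be applied to any single intermediate result of a masked algorithm, as long as the mask space is small enough to enumerate.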

4.1.1 Discussion of the Security Notion

Our notion of security is very strong. Basically, we assume that an adversary can determine

the secret key even from tiny differences in the (joint) distribution of intermediate results. In

many realistic cases this may not be true. However, we do not want to base our security model

on assumptions about technical abilities or limitations adversaries currently have. Instead

we want to provide a precise mathematical notion that captures security against current

side channel attacks as well as future ones. Our notion of security strengthens the security

notion of (Chari et al. 1999). We require that for any two pairs (x, k), (x′, k′) of plaintext and key the joint distributions D_{x,k}(R), D_{x′,k′}(R) of d intermediate results induced by these pairs must be identical. Chari et al., on the other hand, only demand that the distributions D_{x,k}(R), D_{x′,k′}(R) must be indistinguishable by an adversary. As Chari et al. point out, if the joint distributions of d intermediate results induced by different plaintext/key pairs are indistinguishable for an adversary then power analysis and timing attacks using information about at most d intermediate results cannot be mounted. Clearly, identical distributions are indistinguishable. Hence, an algorithm that is order-d perfectly masked is secure against timing and power analysis attacks using information about d intermediate results.

In the sequel, we will concentrate on methods to achieve a perfectly masked algorithm to

compute AES. From the discussion above it follows that the perfectly masked algorithm for

AES that we describe in Section 4.3 (page 33) is secure against timing and power analysis

attacks using a single intermediate result. As can easily be seen, our algorithm is not secure,

if an adversary has access to two or more intermediate results. Notice that most countermeasures proposed so far also assume an adversary with access to a single intermediate result, see (Akkar and Giraud 2001), (Golić and Tymen 2002) and (Trichina 2003).


4.2 Masking AES

(Messerges 2000) introduces the idea of masking all intermediate values of an encryption operation as an effective countermeasure against Simple Power Attacks and Differential Power Attacks. Randomizing the computation of a function f is thus achieved by computing f(u′), where u′ = u + r and r is a randomly chosen mask. If the function is linear, one can recover the desired value f(u) from f(u′) = f(u) + f(r). A similar computation will recover f(u) if the function f is affine. For non-linear functions, the previous equation does not hold and it is necessary to come up with a series of computations depending only on r and u′ such that we obtain the value of f(u) without leaking any information.
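The difference between linear and non-linear functions can be checked exhaustively over F_256. The sketch below is only an illustration: it uses squaring as an example of a linear map and the inversion x ↦ x^{-1} (with 0 ↦ 0) as the non-linear one. The names gf_mul, SQUARE, INVERSE and is_additive are our own choices.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

# Lookup tables for squaring and for inversion (0 is mapped to 0).
SQUARE = [gf_mul(x, x) for x in range(256)]
INVERSE = [0] * 256
for x in range(1, 256):
    for y in range(1, 256):
        if gf_mul(x, y) == 1:
            INVERSE[x] = y
            break

def is_additive(table):
    """True if f(u + r) = f(u) + f(r) for all u, r (addition is XOR)."""
    return all(table[u ^ r] == table[u] ^ table[r]
               for u in range(256) for r in range(256))

squaring_is_linear = is_additive(SQUARE)
inversion_is_linear = is_additive(INVERSE)

# For the linear map, f(u) is recovered from f(u') and f(r) alone.
u, r = 0x53, 0xA9
recovered = SQUARE[u ^ r] ^ SQUARE[r]
print(squaring_is_linear, inversion_is_linear, recovered == SQUARE[u])
```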

We notice that in the case of the AES, the only non-linear function in the algorithm is the AES SubBytes transformation. As described in Section 2.3 (page 9), SubBytes consists of the function

\[ \mathrm{INV}(x) = \begin{cases} x^{-1}, & \text{if } x \in \mathbb{F}_{256}^{\times} \\ 0, & \text{if } x = 0 \end{cases} \]

together with an affine mapping. In particular, most researchers have concentrated their efforts on efficient methods to perform inversion over F_256 in a secure manner via masking countermeasures, i.e., computing u^{-1} + r from u + r without compromising the value of u. In this context, three masking methods have been proposed: two of them, (Akkar and Giraud 2001) and (Golić and Tymen 2002), are based on the idea of combining boolean and multiplicative masking operations, and the third one is based on the idea of masking the individual logic operations required to compute an F_256 inverse. A simplification of (Akkar and Giraud 2001) was introduced in (Trichina et al. 2002), but (Akkar, Bévan and Goubin 2004) found that the simplifications lead to further vulnerabilities against DPA. Thus, we do not consider it any further. In the following, we briefly summarize the previously proposed countermeasures.

The Transform Masking Method (TMM)

In (Akkar and Giraud 2001), Akkar and Giraud introduce the Transform Masking Method (TMM) and algorithms to transform between boolean masking (XOR operation) and multiplicative masking (multiplication in F_256), which is compatible with inversion in F_256. (Akkar and Giraud 2001) solves the problem using Algorithm 6, where r_1 ∈ F_256 is a random field element and r_2 ∈ F_256^× is a random element of the multiplicative group.

However, as noticed in (Trichina et al. 2002) and (Golić and Tymen 2002), this countermeasure is susceptible to first-order DPA if u = 0 because zero cannot be masked with a multiplicative mask. It is clear that because of the special nature of the zero value, multiplicative masking cannot lead to perfect masking.


Algorithm 6 Transform Masking Method
Input: u′ = u ⊕ r_1 ∈ F_256, r_1 ∈ F_256, r_2 ∈ F_256^×
Output: INV(u) ⊕ r_1
1: t_1 ← u′ · r_2 {t_1 = (u ⊕ r_1) · r_2}
2: t_2 ← r_1 · r_2 {t_2 = r_1 · r_2}
3: t_1 ← t_1 ⊕ t_2 {t_1 = u · r_2}
4: t_3 ← r_2^{-1} {t_3 = r_2^{-1}}
5: t_1 ← INV(t_1) {t_1 = INV(u · r_2)}
6: t_2 ← t_3 · r_1 {t_2 = r_1 · r_2^{-1}}
7: t_1 ← t_1 ⊕ t_2 {t_1 = INV(u · r_2) ⊕ (r_1 · r_2^{-1})}
8: t_1 ← t_1 · r_2 {t_1 = INV(u) ⊕ r_1}
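To make the zero-value problem concrete, the following Python sketch follows the eight steps of Algorithm 6 and additionally returns the multiplicatively masked value u · r_2 from step 3. It is a minimal illustration, not an implementation from (Akkar and Giraud 2001); the function name tmm and the extra return value leak are assumptions made here for demonstration.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

# INV table: x -> x^{-1} for x != 0, and 0 -> 0.
INV = [0] * 256
for x in range(1, 256):
    for y in range(1, 256):
        if gf_mul(x, y) == 1:
            INV[x] = y
            break

def tmm(u_masked, r1, r2):
    """Steps 1-8 of Algorithm 6; also returns the step-3 value u * r2."""
    t1 = gf_mul(u_masked, r2)   # (u ^ r1) * r2
    t2 = gf_mul(r1, r2)         # r1 * r2
    t1 ^= t2                    # u * r2  (multiplicatively masked)
    leak = t1
    t3 = INV[r2]                # r2^{-1}
    t1 = INV[t1]                # INV(u * r2)
    t2 = gf_mul(t3, r1)         # r1 * r2^{-1}
    t1 ^= t2                    # INV(u * r2) ^ r1 * r2^{-1}
    t1 = gf_mul(t1, r2)         # INV(u) ^ r1
    return t1, leak

# The step-3 value is 0 exactly when u = 0, whatever the masks are.
zero_leaks = all(tmm(0 ^ r1, r1, r2)[1] == 0
                 for r1 in range(256) for r2 in range(1, 256, 17))
print(zero_leaks)
```

The output is correct for every input, but the intermediate value of step 3 is constant for u = 0, which is the first-order weakness described above.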

Embedded Multiplicative Masking (EMM)

Let m = x^8 + x^4 + x^3 + x + 1 ∈ F_2[x] be the polynomial of the AES specification. The basic idea of EMM as described in (Golić and Tymen 2002) is to embed the field F_256 = F_2[x]/⟨m⟩ in the ring

\[ R_n := \mathbb{F}_2[x]/(m \cdot q) \cong \mathbb{F}_{256} \times \mathbb{F}_{2^n}, \]

where q ∈ F_2[x] is another irreducible polynomial of degree n that is co-prime to m. The field F_256 is a subring of the ring R_n with the homomorphism defined by

\[ \mathbb{F}_{256} \to R_n, \qquad v \mapsto (v \bmod m,\; v \bmod q). \]

(Golić and Tymen 2002) then suggests using a random mapping ρ_k defined by

\[ \rho_k : \mathbb{F}_{256} \to R_n, \qquad v \mapsto v + r m \bmod mq, \]

where r ∈ F_2[x] is a randomly chosen polynomial of degree less than n. To compute INV(v) an adapted function

\[ \mathrm{INV}' : \mathbb{F}_{256} \to \mathbb{F}_{256}, \qquad v \mapsto v^{254} \bmod mq \]

can be used.

In this way, arithmetic operations remain compatible with F_256 and the zero value gets mapped to one of 2^n random values. Thus, it is harder to detect the zero value as n becomes larger. From a security point of view, however, the approach in (Golić and Tymen 2002) does not yield perfect masking since the sets of representatives of different values are pairwise disjoint. From an implementation point of view, we will show in Section 4.4.2 (page 39) that this method is too expensive to implement in hardware. This is important since our method can be implemented with less than half the hardware resources and, at the same time, yields perfect masking.

Combinational Logic Design for the AES Sbox on Masked Data

To the authors’ knowledge, (Trichina 2003) is the first to consider embedding a masking countermeasure directly in hardware. (Trichina 2003) allows for a modified inversion function which on input u ⊕ r_1 outputs u^{-1} ⊕ r_2, where r_1 and r_2 need not be the same. In addition, (Trichina 2003) reduces the masking problem for inversion in F_{2^m} to the problem of masking a logical AND operation since masking XOR operations is, in principle, trivial. In particular, given masked bits u′ = u ⊕ r_1, v′ = v ⊕ r_2 and corresponding masks r_1, r_2, we compute (u ∧ v) ⊕ r_3, where r_3 is the output mask. According to (Trichina 2003) and setting r_3 = r_1 ∧ r_2 this can be accomplished as:

\[ (u \wedge v) \oplus r_3 = (u \wedge v) \oplus (r_1 \wedge r_2) = (u' \wedge v') \oplus \bigl( (r_1 \wedge v') \oplus (r_2 \wedge u') \bigr) \tag{4.1} \]

where the parentheses indicate the order in which intermediate results are computed. Equation (4.1) implies that we can compute the AND operation of two bits u, v without using the actual bits but rather their masked counterparts u′, v′ and corresponding masks r_1, r_2. We notice that if u = v = 0, the intermediate value (r_1 ∧ v′) ⊕ (r_2 ∧ u′) is always equal to zero for any value of r_1 and r_2. This implies that (4.1) does not lead to perfect masking.
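Since only four bits are involved, Equation (4.1) and the behaviour of its inner intermediate value can be checked by enumeration. The Python sketch below does this; masked_and and inner_values are hypothetical helper names introduced for this illustration.

```python
from itertools import product

def masked_and(u_m, v_m, r1, r2):
    """Evaluate the right-hand side of (4.1) on masked bits.

    Returns the masked result and the inner intermediate value
    (r1 AND v') XOR (r2 AND u').
    """
    inner = (r1 & v_m) ^ (r2 & u_m)
    return (u_m & v_m) ^ inner, inner

# Equation (4.1) holds for all 16 combinations of bits and masks.
identity_holds = all(
    masked_and(u ^ r1, v ^ r2, r1, r2)[0] == (u & v) ^ (r1 & r2)
    for u, v, r1, r2 in product((0, 1), repeat=4)
)

def inner_values(u, v):
    """Set of values taken by the inner intermediate over all masks."""
    return {masked_and(u ^ r1, v ^ r2, r1, r2)[1]
            for r1, r2 in product((0, 1), repeat=2)}

leak_for_zero_inputs = inner_values(0, 0)   # constant, independent of masks
leak_for_one_inputs = inner_values(1, 1)    # varies with the masks
print(identity_holds, leak_for_zero_inputs, leak_for_one_inputs)
```

The inner value is constant for u = v = 0 but not for u = v = 1, so its distribution depends on the unmasked inputs.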

4.3 Perfectly Masking AES against Order-1 Adversaries

As mentioned before, in order to obtain a perfectly masked algorithm for AES we concentrate

on the problem of computing multiplicative inverses in F256 because this is the main step of

the SubBytes transformation. In this section we present an algorithm that is secure against an adversary who is able to get the value of a single intermediate result. In Section 4.5 (page 41) we will show how to generalize this method to protect against order-d adversaries for an arbitrary but fixed d ≥ 1.

Let r, r′ be independent and uniformly distributed random masks. We start with an additively masked value u ⊕ r and would like to compute INV(u) ⊕ r′. However, a direct application of INV leads to INV(u ⊕ r), which is of no use because of the non-linearity of inversion.

4.3.1 Idea

The basis of our idea is to compute INV(x) as x^254 in F_256. For simplicity we only consider the repeated squaring algorithm to compute the 254th power. However, to improve efficiency one could use an optimal addition chain. For a thorough treatment of efficient exponentiation methods see for example (von zur Gathen and Nöcker 1997, von zur Gathen and Gerhard 2003). In general the multiplicative inverse of a nonzero element of an arbitrary finite field F_{p^m} can always be computed by raising it to the (p^m − 2)-th power. Since our inputs are additively masked values (u ⊕ r) we correct the result of every single operation in the repeated squaring algorithm in order to obtain the desired result. Our invariant is that at the end of each step our result has the form

\[ u^e \oplus r' \tag{4.2} \]

for some e ∈ N and r′ ∈ F_256 chosen uniformly at random. Hence, the problem is to correct the intermediate results without revealing any information about u.

4.3.2 Method

We introduce some variables: We name r_{j,i} the jth random mask used in step i of the repeated squaring algorithm. All r_{j,i} are independent and uniformly distributed masks. The direct result of a squaring or multiplication performed on some masked values is called f_i. Furthermore, we need so-called auxiliary terms s_{1,i} and s_{2,i} to transform the direct result f_i. The variable t_{1,i} is the intermediate result that appears during the correction and t_i is the final result which complies with our invariant (4.2), i.e., it is of the form u^e ⊕ r_{1,i} for some e.

The input to our modified inversion algorithm is the masked value (u ⊕ r_{1,0}). Next, we describe how to perform multiplications and squarings in a perfectly masked manner. The security analysis is shown in Section 4.3.3. We distinguish between squaring and multiplication because the former is linear and hence can be masked more efficiently.

Perfectly Masked Squaring (PMS) The perfectly masked squaring algorithm that is used in step i of the repeated squaring algorithm is described in Algorithm 7. The input t_{i−1} = u^e ⊕ r_{1,i−1} is squared in step 1. In order to compute the output that respects our invariant we have to change the mask to r_{1,i}. To do so in steps 2 and 3 we use the auxiliary term s_{1,i} and compute the desired output t_i = u^{2e} ⊕ r_{1,i}.

Algorithm 7 Perfectly Masked Squaring (PMS)
Input: t_{i−1} = u^e ⊕ r_{1,i−1}, r_{1,i−1}, r_{1,i} ∈ F_256
Output: u^{2e} ⊕ r_{1,i} ∈ F_256
1: f_i ← t_{i−1}^2 {f_i = u^{2e} ⊕ r_{1,i−1}^2}
2: s_{1,i} ← r_{1,i−1}^2 ⊕ r_{1,i} {auxiliary term to correct f_i}
3: t_i ← f_i ⊕ s_{1,i} {t_i = u^{2e} ⊕ r_{1,i}}

Perfectly Masked Multiplication (PMM) Our perfectly masked multiplication method is described in Algorithm 8. The inputs are two intermediate results: the output x of the previous step and a freshly masked value x′ derived by securely changing the masked value from u ⊕ r_1 to u ⊕ r_2. In Step 1 we calculate the product f_i of two intermediate results. The variable f_i contains the desired power of u as well as some disturbing terms. In Steps 2-5 we compute the auxiliary terms s_{1,i} and s_{2,i}. In the end (Steps 6 and 7) we eliminate the disturbing parts of f_i and transform it according to our invariant. This is done by simply adding up the two auxiliary terms s_{1,i}, s_{2,i} and f_i.

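Algorithm 8 Perfectly Masked Multiplication (PMM)
Input: x = u^e ⊕ r_{1,i−1}, x′ = u ⊕ r_{2,i}, r_{1,i−1}, r_{1,i}, r_{2,i} ∈ F_256
Output: u^{e+1} ⊕ r_{1,i} ∈ F_256
1: f_i ← x · x′ {f_i = u^{e+1} ⊕ u^e · r_{2,i} ⊕ u · r_{1,i−1} ⊕ r_{1,i−1} · r_{2,i}}
2: v_{1,i} ← x′ · r_{1,i−1} {v_{1,i} = u · r_{1,i−1} ⊕ r_{1,i−1} · r_{2,i}}
3: v_{2,i} ← v_{1,i} ⊕ r_{1,i} {v_{2,i} = u · r_{1,i−1} ⊕ r_{1,i−1} · r_{2,i} ⊕ r_{1,i}}
4: s_{1,i} ← v_{2,i} ⊕ r_{1,i−1} · r_{2,i} {s_{1,i} = u · r_{1,i−1} ⊕ r_{1,i}}
5: s_{2,i} ← x · r_{2,i} {s_{2,i} = u^e · r_{2,i} ⊕ r_{1,i−1} · r_{2,i}}
6: t_{1,i} ← f_i ⊕ s_{1,i} {t_{1,i} = u^{e+1} ⊕ u^e · r_{2,i} ⊕ r_{1,i−1} · r_{2,i} ⊕ r_{1,i}}
7: t_i ← t_{1,i} ⊕ s_{2,i} {t_i = u^{e+1} ⊕ r_{1,i}}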

Table 4.1 lists all intermediate results that occur during the computation of x^254.
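Algorithms 7 and 8 and their combination into a masked computation of u^254 can be sketched in Python as follows. This is an illustrative software model only: the function names pms, pmm and masked_inverse are ours, masks are drawn with the standard secrets module, and the way the freshly masked copy of u is derived from the input is an assumption, since the text only states that the mask is changed securely.

```python
import secrets

def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def pms(t_prev, r_prev, r_new):
    """Algorithm 7: from u^e ^ r_prev compute u^(2e) ^ r_new."""
    f = gf_mul(t_prev, t_prev)             # u^(2e) ^ r_prev^2
    s1 = gf_mul(r_prev, r_prev) ^ r_new    # auxiliary term
    return f ^ s1

def pmm(x, x_fresh, r_prev, r_new, r_fresh):
    """Algorithm 8: from u^e ^ r_prev and u ^ r_fresh compute u^(e+1) ^ r_new."""
    f = gf_mul(x, x_fresh)
    v1 = gf_mul(x_fresh, r_prev)           # u*r_prev ^ r_prev*r_fresh
    v2 = v1 ^ r_new
    s1 = v2 ^ gf_mul(r_prev, r_fresh)      # u*r_prev ^ r_new
    s2 = gf_mul(x, r_fresh)                # u^e*r_fresh ^ r_prev*r_fresh
    t1 = f ^ s1
    return t1 ^ s2

def masked_inverse(u_masked, r0):
    """Return (INV(u) ^ r, r) from u ^ r0 using 13 alternating S/M steps."""
    t, r = u_masked, r0
    for step in range(1, 14):
        r_new = secrets.randbelow(256)
        if step % 2 == 1:                  # odd steps: squaring
            t = pms(t, r, r_new)
        else:                              # even steps: multiply by u
            r_fresh = secrets.randbelow(256)
            # Assumed remasking order: add the new mask before removing r0.
            x_fresh = (u_masked ^ r_fresh) ^ r0
            t = pmm(t, x_fresh, r, r_new, r_fresh)
        r = r_new
    return t, r

r0 = secrets.randbelow(256)
t, r = masked_inverse(0x53 ^ r0, r0)
print(hex(t ^ r))  # unmasked only here to check the result
```

The exponents run through 2, 3, 6, 7, 14, 15, 30, 31, 62, 63, 126, 127, 254, matching the thirteen steps of the repeated squaring chain.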

4.3.3 Security Analysis

As defined in our security model we have to look at all intermediate results. For Algorithm 7 and Algorithm 8 we only have to analyze the distributions of the following intermediate results: f_i, s_{1,i}, s_{2,i}, t_i, t_{1,i}, v_{1,i}, v_{2,i} where 1 ≤ i ≤ 13. These are the results that depend on u. We can neglect intermediate results such as r_{1,i}^2 since they do not depend on u.

Our security analysis is based on the following three lemmata that characterize the distributions of intermediate results.

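i | Op | f_i | s_{1,i} | s_{2,i} | t_{1,i} | t_i
1 | (S) | u^2 ⊕ r_{1,0}^2 | r_{1,0}^2 ⊕ r_{1,1} | - | - | u^2 ⊕ r_{1,1}
2 | (M) | (u^2 ⊕ r_{1,1})(u ⊕ r_{2,2}) | u r_{1,1} ⊕ r_{1,2} | u^2 r_{2,2} ⊕ r_{1,1} r_{2,2} | u^3 ⊕ u^2 r_{2,2} ⊕ r_{1,1} r_{2,2} ⊕ r_{1,2} | u^3 ⊕ r_{1,2}
3 | (S) | u^6 ⊕ r_{1,2}^2 | r_{1,2}^2 ⊕ r_{1,3} | - | - | u^6 ⊕ r_{1,3}
4 | (M) | (u^6 ⊕ r_{1,3})(u ⊕ r_{2,4}) | u r_{1,3} ⊕ r_{1,4} | u^6 r_{2,4} ⊕ r_{1,3} r_{2,4} | u^7 ⊕ u^6 r_{2,4} ⊕ r_{1,3} r_{2,4} ⊕ r_{1,4} | u^7 ⊕ r_{1,4}
5 | (S) | u^14 ⊕ r_{1,4}^2 | r_{1,4}^2 ⊕ r_{1,5} | - | - | u^14 ⊕ r_{1,5}
6 | (M) | (u^14 ⊕ r_{1,5})(u ⊕ r_{2,6}) | u r_{1,5} ⊕ r_{1,6} | u^14 r_{2,6} ⊕ r_{1,5} r_{2,6} | u^15 ⊕ u^14 r_{2,6} ⊕ r_{1,5} r_{2,6} ⊕ r_{1,6} | u^15 ⊕ r_{1,6}
7 | (S) | u^30 ⊕ r_{1,6}^2 | r_{1,6}^2 ⊕ r_{1,7} | - | - | u^30 ⊕ r_{1,7}
8 | (M) | (u^30 ⊕ r_{1,7})(u ⊕ r_{2,8}) | u r_{1,7} ⊕ r_{1,8} | u^30 r_{2,8} ⊕ r_{1,7} r_{2,8} | u^31 ⊕ u^30 r_{2,8} ⊕ r_{1,7} r_{2,8} ⊕ r_{1,8} | u^31 ⊕ r_{1,8}
9 | (S) | u^62 ⊕ r_{1,8}^2 | r_{1,8}^2 ⊕ r_{1,9} | - | - | u^62 ⊕ r_{1,9}
10 | (M) | (u^62 ⊕ r_{1,9})(u ⊕ r_{2,10}) | u r_{1,9} ⊕ r_{1,10} | u^62 r_{2,10} ⊕ r_{1,9} r_{2,10} | u^63 ⊕ u^62 r_{2,10} ⊕ r_{1,9} r_{2,10} ⊕ r_{1,10} | u^63 ⊕ r_{1,10}
11 | (S) | u^126 ⊕ r_{1,10}^2 | r_{1,10}^2 ⊕ r_{1,11} | - | - | u^126 ⊕ r_{1,11}
12 | (M) | (u^126 ⊕ r_{1,11})(u ⊕ r_{2,12}) | u r_{1,11} ⊕ r_{1,12} | u^126 r_{2,12} ⊕ r_{1,11} r_{2,12} | u^127 ⊕ u^126 r_{2,12} ⊕ r_{1,11} r_{2,12} ⊕ r_{1,12} | u^127 ⊕ r_{1,12}
13 | (S) | u^254 ⊕ r_{1,12}^2 | r_{1,12}^2 ⊕ r_{1,13} | - | - | u^254 ⊕ r_{1,13}

Table 4.1: Computation of (u^254 ⊕ r_{1,13}) using repeated squaring. (S) denotes a squaring step, (M) a multiplication step.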


Lemma 1 Let u ∈ F_256 be arbitrary. Let r ∈ F_256 be uniformly distributed and independent of u. Then Z = u ⊕ r is uniformly distributed.

Lemma 2 Let u, u′ ∈ F_256 and let r, r′ ∈ F_256 be independent and uniformly distributed. Set I_1 = u ⊕ r and I_2 = u′ ⊕ r′. Then the product Z = I_1 · I_2 is distributed according to

\[ \Pr(Z = b) = \begin{cases} (2^9 - 1)/2^{16}, & \text{if } b = 0 \\ (2^8 - 1)/2^{16}, & \text{if } b \neq 0 \end{cases} \]

We call this distribution D_0.

Lemma 3 In any finite field of characteristic 2, squaring is a one-to-one mapping.

The proofs of these lemmata are straightforward.
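For F_256 the three lemmata can also be confirmed by exhaustive enumeration. The following Python sketch counts how often each value occurs; the fixed values chosen for u and u′ and the helper names are arbitrary choices for this illustration.

```python
from collections import Counter

def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def product_distribution(u, u_prime):
    """Counts of (u ^ r) * (u' ^ r') over all 2^16 mask pairs (r, r')."""
    return Counter(gf_mul(u ^ r, u_prime ^ r2)
                   for r in range(256) for r2 in range(256))

# Lemma 1: u ^ r hits every byte exactly once as r runs over F_256.
lemma1_counts = Counter(0x3A ^ r for r in range(256))

# Lemma 2: zero occurs 2^9 - 1 times, every other value 2^8 - 1 times.
d0 = product_distribution(0x3A, 0xC5)
d0_other = product_distribution(0x00, 0x01)

# Lemma 3: squaring is a bijection on F_256.
squares = {gf_mul(x, x) for x in range(256)}

print(d0[0], d0[1], len(squares))
```

Dividing the counts by 2^16 gives exactly the probabilities of the distribution D_0, and the counts do not depend on the choice of u and u′.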

In the sequel, we examine each of the intermediate results that occur in the PMS (Algorithm 7) and in the PMM (Algorithm 8). We show that the distribution of each of these intermediate results is independent of the secret value u.

Analysis of f_i We have to look at the intermediate result f_i in the two cases of squaring and multiplication.

Squaring: The computation is f_i ← t_{i−1}^2 = u^{2e} ⊕ r_{1,i−1}^2 for some 2 ≤ e ≤ 254. Since r_{1,i−1} is chosen uniformly at random, Lemma 3 together with Lemma 1 shows that f_i is uniformly distributed for all u.

Multiplication: The variable is computed as f_i ← (u^e ⊕ r_{1,i−1}) · (u ⊕ r_{2,i}) = u^{e+1} ⊕ u^e r_{2,i} ⊕ u r_{1,i−1} ⊕ r_{1,i−1} r_{2,i}. Here the terms u^e ⊕ r_{1,i−1} and u ⊕ r_{2,i} are independent (because of the independence of r_{1,i−1} and r_{2,i}) and uniformly distributed (see Lemma 1). So by Lemma 2, f_i is distributed according to D_0 for all u.

Analysis of s_{1,i}, s_{2,i} We examine the intermediate results s_{1,i}, s_{2,i} for multiplication and squaring.

Squaring: Here s_{1,i} can be neglected since it does not depend on u.

Multiplication: The variable s_{1,i} is calculated by adding or multiplying independent masks on the term (u ⊕ r_{2,i}), leading to the term u r_{1,i−1} ⊕ r_{1,i}. So s_{1,i} is obviously uniformly distributed. The variable s_{2,i} ← (u^e ⊕ r_{1,i−1}) · r_{2,i} is the product of two independent and uniformly distributed variables that are both independent of u. So the variable s_{2,i} is distributed according to D_0 independent of the value of u.


Analysis of t_{1,i}, t_i All these intermediate results are sums of some part depending on u and an independent additive mask. So all of them are uniformly distributed by Lemma 1.

Hence corresponding intermediate results are always identically distributed and independent of the value of u. This implies that the whole computation is perfectly masked.

4.3.4 Simplified Version

Previously we assumed that for each step we generate new random masks. In the sequel,

we show how to improve the method described above in terms of the number of random

masks needed to achieve a perfectly masked exponentiation. The fact that an adversary only

obtains a single intermediate result allows us to reuse random masks in different steps of the

algorithm.

Algorithm 9 Simplified Perfectly Masked Squaring (s-PMS)
Input: x = u^e ⊕ r_1, r_1 ∈ F_256
Output: u^{2e} ⊕ r_1 ∈ F_256
1: f_i ← x^2 {f_i = u^{2e} ⊕ r_1^2}
2: s_{1,i} ← r_1^2 ⊕ r_1 {auxiliary term to correct f_i}
3: t_i ← f_i ⊕ s_{1,i} {t_i = u^{2e} ⊕ r_1}

We call the improved versions of the squaring and multiplication algorithms simplified Perfectly Masked Squaring (s-PMS) (Algorithm 9) and simplified Perfectly Masked Multiplication (s-PMM) (Algorithm 10), respectively.

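Algorithm 10 Simplified Perfectly Masked Multiplication (s-PMM)
Input: x = u^e ⊕ r_1, x′ = u ⊕ r_2, r_1, r_2, r_3 ∈ F_256
Output: u^{e+1} ⊕ r_1 ∈ F_256
1: f_i ← x · x′ {f_i = u^{e+1} ⊕ u^e · r_2 ⊕ u · r_1 ⊕ r_1 · r_2}
2: t_1 ← r_1 · r_2 ⊕ r_3 {t_1 = r_1 · r_2 ⊕ r_3}
3: f′_i ← f_i ⊕ t_1 {f′_i = u^{e+1} ⊕ u^e · r_2 ⊕ u · r_1 ⊕ r_3}
4: s_{1,i} ← x · r_2 {s_{1,i} = u^e · r_2 ⊕ r_1 · r_2}
5: s_{2,i} ← x′ · r_1 {s_{2,i} = u · r_1 ⊕ r_1 · r_2}
6: t_{1,i} ← f′_i ⊕ s_{1,i} {t_{1,i} = u^{e+1} ⊕ u · r_1 ⊕ r_1 · r_2 ⊕ r_3}
7: t_{2,i} ← t_{1,i} ⊕ s_{2,i} {t_{2,i} = u^{e+1} ⊕ r_3}
8: t_{3,i} ← t_{2,i} ⊕ r_3 ⊕ r_1 {t_{3,i} = u^{e+1} ⊕ r_1}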
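The simplified algorithms can be modelled in the same way as Algorithms 7 and 8. The Python sketch below is an illustration under stated assumptions: the names s_pms, s_pmm and masked_inverse_simplified are ours, and in the last step of s_pmm the original mask r_1 is added before the temporary mask r_3 is removed, so that the value is never left unmasked. This evaluation order is an assumption on how step 8 is meant to be carried out.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def s_pms(x, r1):
    """Algorithm 9: from u^e ^ r1 compute u^(2e) ^ r1."""
    f = gf_mul(x, x)               # u^(2e) ^ r1^2
    s1 = gf_mul(r1, r1) ^ r1       # auxiliary term
    return f ^ s1

def s_pmm(x, x_fresh, r1, r2, r3):
    """Algorithm 10: from u^e ^ r1 and u ^ r2 compute u^(e+1) ^ r1."""
    f = gf_mul(x, x_fresh)
    t1 = gf_mul(r1, r2) ^ r3
    f2 = f ^ t1                    # u^(e+1) ^ u^e*r2 ^ u*r1 ^ r3
    s1 = gf_mul(x, r2)             # u^e*r2 ^ r1*r2
    s2 = gf_mul(x_fresh, r1)       # u*r1 ^ r1*r2
    t1_i = f2 ^ s1
    t2_i = t1_i ^ s2               # u^(e+1) ^ r3
    return (t2_i ^ r1) ^ r3        # add r1 first, then remove r3 (assumed order)

def masked_inverse_simplified(u_masked, r1, r2, r3):
    """Return INV(u) ^ r1 from u ^ r1 using only the masks r1, r2, r3."""
    x_fresh = (u_masked ^ r2) ^ r1     # u ^ r2, remasked without exposing u
    t = u_masked
    for step in range(1, 14):
        t = s_pms(t, r1) if step % 2 == 1 else s_pmm(t, x_fresh, r1, r2, r3)
    return t

masked = masked_inverse_simplified(0x53 ^ 0x1F, 0x1F, 0xA4, 0x6B)
print(hex(masked ^ 0x1F))  # unmasked only to check the result
```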

Thus, we can reduce the number of random masks needed to only three masks (r_1, r_2, r_3). To achieve this we modify our computations such that after each step we switch back to our original mask. This can be done by simply adding our original mask and then adding our temporarily used mask. Because of the independence of the masks this has no impact on the security. Table 4.2 lists all intermediate results that occur during the computation of x^254 using the simplified method.

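i | Op | f_i | s_{1,i} | s_{2,i} | t_{1,i} | t_{2,i} | t_{3,i} | t_i
1 | (S) | u^2 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^2 ⊕ r_1
2 | (M) | (u^2 ⊕ r_1)(u ⊕ r_2) | u r_1 ⊕ r_3 | u^2 r_2 ⊕ r_1 r_2 | u^3 ⊕ u^2 r_2 ⊕ r_1 r_2 ⊕ r_3 | u^3 ⊕ r_3 | u^3 ⊕ r_3 ⊕ r_1 | u^3 ⊕ r_1
3 | (S) | u^6 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^6 ⊕ r_1
4 | (M) | (u^6 ⊕ r_1)(u ⊕ r_2) | u r_1 ⊕ r_3 | u^6 r_2 ⊕ r_1 r_2 | u^7 ⊕ u^6 r_2 ⊕ r_1 r_2 ⊕ r_3 | u^7 ⊕ r_3 | u^7 ⊕ r_3 ⊕ r_1 | u^7 ⊕ r_1
5 | (S) | u^14 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^14 ⊕ r_1
6 | (M) | (u^14 ⊕ r_1)(u ⊕ r_2) | u r_1 ⊕ r_3 | u^14 r_2 ⊕ r_1 r_2 | u^15 ⊕ u^14 r_2 ⊕ r_1 r_2 ⊕ r_3 | u^15 ⊕ r_3 | u^15 ⊕ r_3 ⊕ r_1 | u^15 ⊕ r_1
7 | (S) | u^30 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^30 ⊕ r_1
8 | (M) | (u^30 ⊕ r_1)(u ⊕ r_2) | u r_1 ⊕ r_3 | u^30 r_2 ⊕ r_1 r_2 | u^31 ⊕ u^30 r_2 ⊕ r_1 r_2 ⊕ r_3 | u^31 ⊕ r_3 | u^31 ⊕ r_3 ⊕ r_1 | u^31 ⊕ r_1
9 | (S) | u^62 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^62 ⊕ r_1
10 | (M) | (u^62 ⊕ r_1)(u ⊕ r_2) | u r_1 ⊕ r_3 | u^62 r_2 ⊕ r_1 r_2 | u^63 ⊕ u^62 r_2 ⊕ r_1 r_2 ⊕ r_3 | u^63 ⊕ r_3 | u^63 ⊕ r_3 ⊕ r_1 | u^63 ⊕ r_1
11 | (S) | u^126 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^126 ⊕ r_1
12 | (M) | (u^126 ⊕ r_1)(u ⊕ r_2) | u r_1 ⊕ r_3 | u^126 r_2 ⊕ r_1 r_2 | u^127 ⊕ u^126 r_2 ⊕ r_1 r_2 ⊕ r_3 | u^127 ⊕ r_3 | u^127 ⊕ r_3 ⊕ r_1 | u^127 ⊕ r_1
13 | (S) | u^254 ⊕ r_1^2 | r_1^2 ⊕ r_1 | - | - | - | - | u^254 ⊕ r_1

Table 4.2: Computation of (u^254 ⊕ r_1) using repeated squaring (simplified version)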

4.4 Implementation and Costs

Throughout the chapter, we have only considered a theoretical implementation of the inversion algorithm according to the square-and-multiply algorithm. However, our method is compatible with any implementation that combines additions, multiplications, and squarings in a field or ring. More precisely, an arbitrary straight-line program over some finite field using only additions and multiplications can be transformed to an equivalent program that is perfectly masked. We do not consider software implementations of the presented countermeasures. However, we notice that for constrained environments previous publications have based their software implementations of side channel countermeasures on table lookups. From a hardware point of view, the most area efficient ASIC hardware implementation is the one described in (Satoh, Morioka, Takano and Munetoh 2001) based on composite fields. We will discuss a possible implementation of our countermeasure based on composite fields and will provide area and delay estimates in the next section.

4.4.1 Efficient Hardware Implementation over GF(((2^2)^2)^2)

First we describe in some detail how to implement an inverter over GF(((2^2)^2)^2), so that it is clear how we obtained our area and delay estimates. This methodology is not new and is well known in the literature, e.g., see (Lidl and Niederreiter 1983). We assume a composite field representation GF(((2^2)^2)^2) ≅ F_256 for the inverse transformation using the following irreducible polynomials:

GF(2^2): P(x) = x^2 + x + 1, P(α) = 0
GF((2^2)^2): Q(y) = y^2 + y + α, Q(β) = 0
GF(((2^2)^2)^2): R(z) = z^2 + z + λ, λ = (α + 1)β

We use the s-PMM and s-PMS algorithms from Section 4.3 instead of the usual ones to build our inversion circuit and, thus, render it secure against side channel attacks. Based on (Itoh and Tsujii 1988) and (Guajardo and Paar 2002), (Satoh et al. 2001) notice that for A ∈ GF(((2^2)^2)^2), A^{-1} can be computed as A^{-1} = (A^17)^{-1} A^16, where A^17 ∈ GF((2^2)^2). See for example (Lidl and Niederreiter 1983) for the proof. Notice that the Itoh and Tsujii algorithm can be recursively applied to B = A^17 ∈ GF((2^2)^2), thus obtaining B^{-1} = (B^4 · B)^{-1} · (B^4) where B^5 ∈ GF(2^2). In the following, we write B = B_1 β + B_0 ∈ GF((2^2)^2) with B_i ∈ GF(2^2). Then, we can minimize the area requirement of the implementation using the following facts:

1. B^4 ∈ GF((2^2)^2) can be computed as B^4 ≡ B_1 β + (B_1 + B_0), i.e., only one addition over GF(2^2).

2. B^5 ∈ GF(2^2) can be computed as B^5 ≡ B_0 · B_1 + B_0^2 + B_1^2 · α, where B_1^2 · α requires only wires for its implementation (no gates).

3. Given C = c_1 α + c_0 ∈ GF(2^2), C^{-1} ≡ c_1 α + (c_1 + c_0), i.e., it requires one GF(2) adder.
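The three facts can be verified by brute force over the 16 elements of GF((2^2)^2). The Python sketch below is a software model only and says nothing about gate counts: elements of GF(2^2) are packed as two bits (c_1, c_0) with α = 0b10, elements of GF((2^2)^2) are pairs (B_1, B_0), and all function names are our own.

```python
ALPHA = 0b10  # the element α of GF(2^2), stored as bits (c1, c0)

def gf4_mul(a, b):
    """Multiply in GF(2^2) = GF(2)[x]/(x^2 + x + 1), i.e. α^2 = α + 1."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)
    lo = (a1 & b1) ^ (a0 & b0)
    return (hi << 1) | lo

def gf16_mul(b, c):
    """Multiply (B1, B0) * (C1, C0) in GF((2^2)^2) with β^2 = β + α."""
    b1, b0 = b
    c1, c0 = c
    hh = gf4_mul(b1, c1)
    hi = hh ^ gf4_mul(b1, c0) ^ gf4_mul(b0, c1)
    lo = gf4_mul(hh, ALPHA) ^ gf4_mul(b0, c0)
    return (hi, lo)

def gf16_pow(b, e):
    """Naive exponentiation by repeated multiplication."""
    result = (0, 1)
    for _ in range(e):
        result = gf16_mul(result, b)
    return result

elements = [(b1, b0) for b1 in range(4) for b0 in range(4)]

# Fact 1: B^4 = B1*β + (B1 + B0).
fact1 = all(gf16_pow(b, 4) == (b[0], b[0] ^ b[1]) for b in elements)

# Fact 2: B^5 = B0*B1 + B0^2 + B1^2*α, an element of GF(2^2).
fact2 = all(
    gf16_pow(b, 5) == (0, gf4_mul(b[1], b[0]) ^ gf4_mul(b[1], b[1])
                       ^ gf4_mul(gf4_mul(b[0], b[0]), ALPHA))
    for b in elements
)

# Fact 3: for C = c1*α + c0 != 0, C^{-1} = c1*α + (c1 + c0).
fact3 = all(
    gf4_mul(c, ((c >> 1) << 1) | ((c >> 1) ^ (c & 1))) == 1
    for c in range(1, 4)
)
print(fact1, fact2, fact3)
```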

Thus, computing B^{-1} = B^{-5} · B^4 ∈ GF((2^2)^2) requires 3 GF(2^2) multipliers, 1 GF(2^2) squarer, and 4 GF(2^2) adders. The inversion in GF(((2^2)^2)^2) can then be implemented according to (Satoh et al. 2001) with 2 adders, 3 multipliers, 1 inverter, and 1 squarer followed by multiplication with λ = (α + 1)β, all over GF((2^2)^2).

The perfectly masked version can be implemented in hardware similarly, except that instead of the usual adders, multipliers, squarers, and inverters, we use circuits which implement the algorithms from Section 4.3 (page 33).

4.4.2 Cost and Comparison to Previous Countermeasures

Area and delay estimates for circuits with and without countermeasures are provided in Table

4.3. The estimates are given in terms of the area and delay of 2-input AND gates, 2-input

XOR gates, and NOT gates. The complexity and specific implementation of these circuits

is taken from (Voigtl¨ander 2003). In addition, we provide complexity estimates in terms of

normalized area and delay. The normalization is done with respect to the area and delay

of a NOT gate. We have assumed that the areas of a 2-input AND gate and 2-input XOR

gate are twice and 3 times that of an inverter, respectively. Similarly, it is assumed that the

delays of NOT, AND, and XOR gates are equal. Notice that the assumptions regarding the

gates’ area and delay are not arbitrary but based on the actual sizes of several standard cell

libraries.


Arithmetic Operation | A | A′ | T | T′ | A′·T′
Inversion over GF(((2^2)^2)^2) (Satoh et al. 2001) | 312 | 1 | 17 | 1 | 1
Inversion with DPA countermeasure from (Trichina 2003) according to (4.1) | 1071 | 3.4 | 26 | 1.5 | 5.1
GF(((2^2)^2)^2) PM inverter from this thesis (Blömer et al. 2004) | 1704 | 5.5 | 21 | 1.2 | 6.6
Inversion with DPA countermeasure from (Trichina 2003) | 1341 | 4.3 | 34 | 2 | 8.6
Inversion with countermeasure from (Akkar and Giraud 2001) | 1784 | 5.7 | 34 | 2 | 11.4

Table 4.3: Hardware cost comparison of area A and delay T for different inversion circuits with side channel countermeasures. A′ := A/A_{Normal Inv.} and T′ := T/T_{Normal Inv.} are the normalized area and delay, respectively.
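The normalized columns of Table 4.3 can be recomputed from the raw values of A and T. The Python sketch below does so; the dictionary keys are shortened row labels chosen here, and the observation that the tabulated products match the products of the rounded factors is ours, not a statement from the original text.

```python
# Raw area A and delay T per row of Table 4.3.
rows = {
    "plain inverter (Satoh et al. 2001)": (312, 17),
    "Trichina 2003, Eq. (4.1)": (1071, 26),
    "PM inverter (this chapter)": (1704, 21),
    "Trichina 2003": (1341, 34),
    "Akkar and Giraud 2001": (1784, 34),
}

base_area, base_delay = rows["plain inverter (Satoh et al. 2001)"]

normalized = {}
for name, (area, delay) in rows.items():
    a_norm = round(area / base_area, 1)     # A' = A / A_NormalInv
    t_norm = round(delay / base_delay, 1)   # T' = T / T_NormalInv
    # Product formed from the rounded factors, as the table appears to do.
    normalized[name] = (a_norm, t_norm, round(a_norm * t_norm, 1))

for name, values in normalized.items():
    print(f"{name}: A'={values[0]}, T'={values[1]}, A'*T'={values[2]}")
```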

Finally, we point out that (Satoh et al. 2001), which describes AES ASIC implementations over GF(((2^2)^2)^2), does not provide the actual circuits used to implement the AES sbox.

Table 4.3 provides a cost comparison among the different masking countermeasures. We did not consider the method from (Golić and Tymen 2002) briefly sketched in Section 4.2 (page 32) because its hardware implementation requires too many hardware resources. We can estimate the cost of (Golić and Tymen 2002) if the degree of the polynomial q is n = 8 by simply considering the cost of a multiplier and an inverter over F_2[x]/(mq) ≅ F_256 × F_{2^n}. According to (Drolet 1998), such a multiplier requires 289 2-input AND gates and 272 2-input XOR gates. The map INV′(v) = v^254 mod mq can also be implemented with a multiplier (a squarer requires only wires). Thus, we would need at least 1 multiplier and 1 inverter over F_2[x]/(mq) and 3 multipliers and 1 inverter over F_256. This results in a circuit which requires at least 731 AND and 766 XOR gates or about twice as many gates as our method.

Table 4.3 shows that the countermeasure of (Trichina 2003) implemented according to Equation (4.1) on page 33 has the best area/time product of all the implementations. However, as we have seen in Section 4.2, this countermeasure is susceptible to DPA attacks if the input byte is zero and, thus, does not provide perfect masking. If we then consider the best area/time product of the countermeasures that offer DPA resistance, the implementation presented in this chapter has the best area/time product. This result is mainly due to the reduced critical path in the circuit. In addition, our design encourages re-usability of previously designed blocks. In other words, since the masking method depends only on multipliers and adders, if one has multiplier and adder blocks already designed, they can be used immediately to build a perfectly masked circuit (with the work from (Trichina 2003), implementation of the masking countermeasure would require a complete circuit redesign).

Finally, we estimate the cost that our masking countermeasure would have on an AES hardware implementation. To do this, we assume that the implementation would follow the architecture described in (Satoh et al. 2001) where the SubBytes transformation occupies about 22% of the design with 4 sboxes in parallel. In SubBytes, the inverse transformation accounts for 60% or about 14% of the total area. We also assume that the remaining circuits require twice as much area as an implementation without masking countermeasures. Then, an AES hardware implementation using our new inversion circuit would need about 2.5 times the area that an AES hardware implementation without countermeasures would need. Of this, 31% would correspond to the inverter circuit. The required area is only 20% larger than an implementation that uses hardware countermeasures based on the usage of different hardware logic. Such methods double the hardware resources when compared to an implementation using standard (single-rail) logic.
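One way to reproduce the figures of 2.5 times the area and 31% is sketched below in Python. It assumes that the rounded share of 14% and the normalized area 5.5 of the perfectly masked inverter from Table 4.3 were used. These inputs are our reading of the text, and the variable names are chosen for illustration.

```python
inverter_share = 0.14       # inverter area as a fraction of an unprotected AES
pm_inverter_factor = 5.5    # normalized area A' of the PM inverter (Table 4.3)
remaining_factor = 2.0      # assumed growth of all remaining circuits

# Area of the protected design, in units of the unprotected design.
inverter_area = inverter_share * pm_inverter_factor
remaining_area = (1 - inverter_share) * remaining_factor
total_area = inverter_area + remaining_area

# Fraction of the protected design taken up by the inverter.
inverter_fraction = inverter_area / total_area

print(round(total_area, 2), round(100 * inverter_fraction))
```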

In addition to time and area, other costs are also of importance. For example, the amount

of randomness is rather crucial since its generation is quite expensive. In our simplified

algorithm we only need 3 random masks in order to compute INV(x) in a secure manner.

Another important cost factor is the number of operations that have to be protected by

hardware means. Our approach needs this inevitable protection only for one intermediate

result. Hence it is optimal with respect to this cost measure.

4.5 Order-d Perfectly Masking

For the sake of completeness, in this section we focus on generalizing the method of Section 4.3 to adversaries of arbitrary but fixed order d. However, adversaries that can obtain values of two or more intermediate results are very powerful and are currently assumed to be unrealistic. Moreover, for increasing d an increasing amount of random bits is needed to achieve such a high level of security. This, however, decreases efficiency considerably. In particular, instead of using a single random byte r as a mask one has to use masks of the form

\[ R = \bigoplus_{i=1}^{d} r_i \]

that are the sum of d independent and uniformly distributed random bytes. Hence, our invariant in the order-d case is that every output of an operation is of the form

\[ u \oplus R = u \oplus \bigoplus_{i=1}^{d} r_i \tag{4.3} \]

for d independent and uniformly distributed random bytes r_i.

4.5.1 Perfect Mask Change

However, simply substituting the mask r by the mask R in the method described above is not sufficient. To see this, consider the problem of changing masks of intermediate results in order to introduce new randomness into the encryption. Note that changing masks is implicitly done in the Perfectly Masked Squaring Algorithm (Algorithm 7 (page 34)) and the Perfectly Masked Multiplication Algorithm (Algorithm 8 (page 35)). To change the mask R^{(1)} of an intermediate result Z_1 := u ⊕ R^{(1)} into the mask R^{(2)} the straightforward approach is to compute

1. Z_1 := u ⊕ R^{(1)}
2. Z_2 := Z_1 ⊕ R^{(2)} = u ⊕ R^{(1)} ⊕ R^{(2)}
3. Z_3 := Z_2 ⊕ R^{(1)} = u ⊕ R^{(2)}

However, for d ≥ 3 an order-d adversary A can get the values of Z_1, Z_2 and Z_3. Hence, A can compute the unmasked value u = Z_1 ⊕ Z_2 ⊕ Z_3.
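The failure of the straightforward mask change can be demonstrated in a few lines of Python. In this sketch the function names are hypothetical and the masks are sums of d random bytes drawn with the standard secrets module.

```python
import secrets
from functools import reduce
from operator import xor

def random_mask_bytes(d):
    """d independent, uniformly distributed masking bytes."""
    return [secrets.randbelow(256) for _ in range(d)]

def naive_mask_change(u, old_bytes, new_bytes):
    """Straightforward change from mask R1 to mask R2 in three steps."""
    R1 = reduce(xor, old_bytes, 0)
    R2 = reduce(xor, new_bytes, 0)
    z1 = u ^ R1          # u ^ R1
    z2 = z1 ^ R2         # u ^ R1 ^ R2
    z3 = z2 ^ R1         # u ^ R2
    return z1, z2, z3

d = 3
u = 0x5A
z1, z2, z3 = naive_mask_change(u, random_mask_bytes(d), random_mask_bytes(d))

# An adversary who sees all three intermediate results recovers u,
# regardless of how many bytes each mask is composed of.
recovered = z1 ^ z2 ^ z3
print(recovered == u)
```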

In order to securely change the mask R^{(1)} = Σ_{i=1}^{d} r_i^{(1)} of an intermediate result u ⊕ R^{(1)} to a different mask R^{(2)} = Σ_{i=1}^{d} r_i^{(2)} we propose to use Algorithm 11.

Algorithm 11 Perfect Mask Change (PMC)
Input: Z_1 = u^e ⊕ R^{(1)} for some 1 ≤ e ≤ 254, d ∈ N, masking bytes r_1, ..., r_d (forming R^{(1)}) and r_{d+1}, ..., r_{2d} (forming R^{(2)})
Output: Z_{2d+1} = u^e ⊕ R^{(2)}
1: for i = 2 ... 2d do
2: Z_i ← Z_{i−1} ⊕ r_{d+i/2} {add (i/2)-th masking byte of R^{(2)}}
3: Z_{i+1} ← Z_i ⊕ r_{i/2} {remove (i/2)-th masking byte of R^{(1)}}
4: i ← i + 1
5: end for

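Example 1 For d = 3 Algorithm 11 computes the following intermediate results:

Z_1 = u^e ⊕ r_1^{(1)} ⊕ r_2^{(1)} ⊕ r_3^{(1)}
Z_2 = u^e ⊕ r_1^{(1)} ⊕ r_2^{(1)} ⊕ r_3^{(1)} ⊕ r_1^{(2)}
Z_3 = u^e ⊕ r_2^{(1)} ⊕ r_3^{(1)} ⊕ r_1^{(2)}
Z_4 = u^e ⊕ r_2^{(1)} ⊕ r_3^{(1)} ⊕ r_1^{(2)} ⊕ r_2^{(2)}
Z_5 = u^e ⊕ r_3^{(1)} ⊕ r_1^{(2)} ⊕ r_2^{(2)}
Z_6 = u^e ⊕ r_3^{(1)} ⊕ r_1^{(2)} ⊕ r_2^{(2)} ⊕ r_3^{(2)}
Z_7 = u^e ⊕ r_1^{(2)} ⊕ r_2^{(2)} ⊕ r_3^{(2)}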
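The sequence of Example 1 can be generated with a short Python sketch of the mask change. It follows the pattern of the example, alternately adding one byte of the new mask and removing one byte of the old mask. The function names and the symbolic bookkeeping of mask indices are ours and serve only as an illustration.

```python
import secrets
from functools import reduce
from operator import xor

def xor_all(values):
    """XOR of all bytes in a list."""
    return reduce(xor, values, 0)

def perfect_mask_change(z1, old_bytes, new_bytes):
    """Return [Z_1, ..., Z_{2d+1}], changing the mask one byte at a time."""
    zs = [z1]
    for r_old, r_new in zip(old_bytes, new_bytes):
        zs.append(zs[-1] ^ r_new)   # add next byte of the new mask
        zs.append(zs[-1] ^ r_old)   # remove next byte of the old mask
    return zs

def mask_index_sets(d):
    """Indices of the masking bytes r_1..r_2d present in each Z_i."""
    current = set(range(1, d + 1))
    sets = [set(current)]
    for k in range(1, d + 1):
        current.add(d + k)
        sets.append(set(current))
        current.remove(k)
        sets.append(set(current))
    return sets

d = 3
value = 0xA7                                   # stands for u^e
old = [secrets.randbelow(256) for _ in range(d)]
new = [secrets.randbelow(256) for _ in range(d)]
zs = perfect_mask_change(value ^ xor_all(old), old, new)

print(len(zs), zs[-1] == value ^ xor_all(new))
print([len(s) for s in mask_index_sets(d)])
```

Every intermediate result carries either d or d + 1 masking bytes, which matches the alternating pattern of Example 1.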

Security Analysis

We first introduce our notation that we use for the proof of security. Let dbe the number of

intermediate results an adversary can get. For 1 ≤i≤2d+ 1 let

δi=i+ 1 mod 2

4.5. Order-dPerfectly Masking 43

indicate whether the intermediate result Ziis randomized by dor d+ 1 masks. Let

Si=i

2,...,i

2+d+δi

denote the set of indices of masks involved in the randomization of Zi. I.e.,

Zi=ue⊕X

j∈Si

rj.

Furthermore, let $1 \leq \ell \leq d$ and

\[ I := \{ i_1, \ldots, i_\ell \mid i_1 < i_2 < \cdots < i_\ell \} \]

be the set of indices of intermediate results known to the attacker and let

\[ M := \{ j_1, \ldots, j_{d-\ell} \mid j_1 < j_2 < \cdots < j_{d-\ell} \} \]

be the set of indices of masking bytes known to the attacker.

For $i \in I$ let $\bar{S}_i = S_i \setminus M$ denote the set of masks unknown to the attacker that randomize the intermediate result $Z_i$ and let

\[ \bar{Z}_i := Z_i \oplus \sum_{j \in M \cap S_i} r_j = u^e \oplus \sum_{j \in \bar{S}_i} r_j \]

denote a known intermediate result after removing all known masks. Note that $|\bar{S}_i| \geq 1$ for all $i \in I$ holds by construction, since $|S_i| \geq d$ and $|M| = d - \ell \leq d - 1$. Hence, all $\bar{Z}_i$ are uniformly distributed by Lemma 1 (page 36). Furthermore, depending on the set of known masks it is possible that $\bar{Z}_i = \bar{Z}_j$ for $Z_i \neq Z_j$.
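The index sets used in this notation can be checked mechanically. The sketch below is an illustration only: it tracks which mask indices are present in each $Z_i$ of Algorithm 11 (following the order of operations in Example 1), compares the result with the closed form $\{\lceil i/2 \rceil, \ldots, \lfloor i/2 \rfloor + d\}$ that Example 1 suggests, and confirms that at least one mask stays unknown whenever the attacker knows at most $d-1$ masking bytes. All function names are our own.

```python
from itertools import combinations

def index_sets(d):
    """Simulate Algorithm 11 symbolically: map i to the mask indices present in Z_i."""
    current = set(range(1, d + 1))            # Z_1 carries r_1, ..., r_d
    sets = {1: frozenset(current)}
    for k in range(1, d + 1):
        current.add(d + k)                    # Z_{2k}: add r_{d+k}
        sets[2 * k] = frozenset(current)
        current.remove(k)                     # Z_{2k+1}: remove r_k
        sets[2 * k + 1] = frozenset(current)
    return sets

def closed_form(i, d):
    """Candidate closed form: indices ceil(i/2), ..., floor(i/2) + d."""
    return frozenset(range((i + 1) // 2, i // 2 + d + 1))

def min_unknown_masks(d):
    """Smallest size of S_i minus M over all Z_i and all sets M of d-1 known masks."""
    sets = index_sets(d)
    return min(len(s - set(known))
               for known in combinations(range(1, 2 * d + 1), d - 1)
               for s in sets.values())
```

For small $d$ the two descriptions of $S_i$ agree and `min_unknown_masks(d)` is at least 1, which is the property the proof relies on.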

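Lemma 4 Let $Z_i$, $\bar{Z}_i$ and $r_j$ be defined as above. Then

\[ \Pr(Z_{i_1}, \ldots, Z_{i_\ell} \mid r_{j_1}, \ldots, r_{j_{d-\ell}}) = \Pr(\bar{Z}_{i_1}, \ldots, \bar{Z}_{i_\ell}). \]

Proof. Let $\zeta_{i_1}, \ldots, \zeta_{i_\ell} \in \mathbb{F}_{256}$ and $\rho_{j_1}, \ldots, \rho_{j_{d-\ell}} \in \mathbb{F}_{256}$. Then

\begin{align*}
& \Pr\bigl( Z_{i_1} = \zeta_{i_1}, \ldots, Z_{i_\ell} = \zeta_{i_\ell} \bigm| r_{j_1} = \rho_{j_1}, \ldots, r_{j_{d-\ell}} = \rho_{j_{d-\ell}} \bigr) \\
&= \Pr\Bigl( \Bigl( \bar{Z}_{i_1} \oplus \sum_{j \in S_{i_1} \cap M} r_j = \zeta_{i_1} \Bigr), \ldots, \Bigl( \bar{Z}_{i_\ell} \oplus \sum_{j \in S_{i_\ell} \cap M} r_j = \zeta_{i_\ell} \Bigr) \Bigm| (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \Bigr) \\
&= \Pr\Bigl( \Bigl( \bar{Z}_{i_1} = \zeta_{i_1} \oplus \sum_{j \in S_{i_1} \cap M} \rho_j \Bigr), \ldots, \Bigl( \bar{Z}_{i_\ell} = \zeta_{i_\ell} \oplus \sum_{j \in S_{i_\ell} \cap M} \rho_j \Bigr) \Bigm| (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \Bigr) \\
&= \frac{\Pr\Bigl( \bigl( \bar{Z}_{i_1} = \zeta_{i_1} \oplus \sum_{j \in S_{i_1} \cap M} \rho_j \bigr), \ldots, \bigl( \bar{Z}_{i_\ell} = \zeta_{i_\ell} \oplus \sum_{j \in S_{i_\ell} \cap M} \rho_j \bigr), (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \Bigr)}{\Pr\bigl( (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \bigr)}
\end{align*}

Since $|\bar{S}_i| \geq 1$ for all $i \in I$, the variables $\bar{Z}_{i_1}, \ldots, \bar{Z}_{i_\ell}$ and $r_{j_1}, \ldots, r_{j_{d-\ell}}$ are stochastically independent. Hence, we have that

\begin{align*}
& \frac{\Pr\Bigl( \bigl( \bar{Z}_{i_1} = \zeta_{i_1} \oplus \sum_{j \in S_{i_1} \cap M} \rho_j \bigr), \ldots, \bigl( \bar{Z}_{i_\ell} = \zeta_{i_\ell} \oplus \sum_{j \in S_{i_\ell} \cap M} \rho_j \bigr), (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \Bigr)}{\Pr\bigl( (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \bigr)} \\
&= \frac{\Pr\Bigl( \bigl( \bar{Z}_{i_1} = \zeta_{i_1} \oplus \sum_{j \in S_{i_1} \cap M} \rho_j \bigr), \ldots, \bigl( \bar{Z}_{i_\ell} = \zeta_{i_\ell} \oplus \sum_{j \in S_{i_\ell} \cap M} \rho_j \bigr) \Bigr) \cdot \Pr\bigl( (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \bigr)}{\Pr\bigl( (r_{j_1} = \rho_{j_1}), \ldots, (r_{j_{d-\ell}} = \rho_{j_{d-\ell}}) \bigr)} \\
&= \Pr\Bigl( \Bigl( \bar{Z}_{i_1} = \zeta_{i_1} \oplus \sum_{j \in S_{i_1} \cap M} \rho_j \Bigr), \ldots, \Bigl( \bar{Z}_{i_\ell} = \zeta_{i_\ell} \oplus \sum_{j \in S_{i_\ell} \cap M} \rho_j \Bigr) \Bigr).
\end{align*}

Since all $\bar{Z}_{i_k}$ for $1 \leq k \leq \ell$ are uniformly distributed, this proves the lemma. □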

To prove the security of Algorithm 11 we also need the following lemma.

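Lemma 5 For some $1 \leq b \leq \ell$ let

\[ I = \bigcup_{1 \leq c \leq b} I_c \]

be a partition of the set $I$ into subsets $I_1, \ldots, I_b$ such that

\[ \bar{Z}_i = \bar{Z}_j \iff \exists\, 1 \leq c \leq b : i, j \in I_c. \]

I.e., the indices $i, j$ of two elements $\bar{Z}_i$, $\bar{Z}_j$ are in the same subset $I_c$ iff $\bar{Z}_i = \bar{Z}_j$.

Then

\[ \Pr\Bigl( \bigwedge_{i \in I} \bar{Z}_i \Bigr) = \prod_{1 \leq c \leq b} \Pr\Bigl( \bigwedge_{i \in I_c} \bar{Z}_i \Bigr). \]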

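Proof. For $i \in I_c$ let $T_c$ denote the set of indices of masks that randomize $\bar{Z}_i$. The construction of the intermediate results of Algorithm 11 implies that for each $1 \leq c < b$

\[ \min\{ j \mid j \in T_c \} < \min\Bigl\{ j \Bigm| j \in \bigcup_{c < c' \leq b} T_{c'} \Bigr\} \]

\[ \max\{ j \mid j \in T_c \} < \max\Bigl\{ j \Bigm| j \in \bigcap_{c < c' \leq b} T_{c'} \Bigr\} \]

holds. Hence, for each $1 \leq c < b$ at least one of the following cases holds:

1. If

\[ T_c \setminus \bigcup_{c+1 \leq j \leq b} T_j \neq \emptyset \]

it follows that all elements of $\{ \bar{Z}_i \mid i \in I_c \}$ are randomized by a uniformly distributed mask that is not involved in randomizing elements of $\{ \bar{Z}_i \mid i \in \bigcup_{c+1 \leq j \leq b} I_j \}$. Hence, it follows that

\[ \Pr\Bigl( \bigwedge_{i \in \bigcup_{c \leq j \leq b} I_j} \bar{Z}_i \Bigr) = \Pr\Bigl( \bigwedge_{i \in I_c} \bar{Z}_i \Bigr) \cdot \Pr\Bigl( \bigwedge_{i \in \bigcup_{c+1 \leq j \leq b} I_j} \bar{Z}_i \Bigr). \]

2. If

\[ \bigcap_{c+1 \leq j \leq b} T_j \setminus T_c \neq \emptyset \]

it follows that all elements of $\{ \bar{Z}_i \mid i \in \bigcup_{c+1 \leq j \leq b} I_j \}$ are randomized by a uniformly distributed mask that is not involved in randomizing elements of $\{ \bar{Z}_i \mid i \in I_c \}$. Hence, it follows that

\[ \Pr\Bigl( \bigwedge_{i \in \bigcup_{c \leq j \leq b} I_j} \bar{Z}_i \Bigr) = \Pr\Bigl( \bigwedge_{i \in I_c} \bar{Z}_i \Bigr) \cdot \Pr\Bigl( \bigwedge_{i \in \bigcup_{c+1 \leq j \leq b} I_j} \bar{Z}_i \Bigr). \]

Applying this case differentiation inductively proves the lemma. □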

Lemma 4 shows that instead of analyzing the joint distribution of $\ell$ intermediate results $Z_{i_1}, \ldots, Z_{i_\ell}$ together with $d - \ell$ masks $r_{j_1}, \ldots, r_{j_{d-\ell}}$ it is sufficient to analyze the joint distribution of the $\ell$ variables $\bar{Z}_{i_1}, \ldots, \bar{Z}_{i_\ell}$ as defined above. Lemma 5 shows that the joint distribution of $\bar{Z}_{i_1}, \ldots, \bar{Z}_{i_\ell}$ is in fact independent of the secret variable $u$. Hence, an adversary that knows at most $d$ intermediate results and masks of Algorithm 11 does not learn anything about the secret value $u$. Therefore, Lemma 4 together with Lemma 5 proves that Algorithm 11 is order-$d$ perfectly masked.
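For very small parameters the claim can also be checked by brute force. The sketch below is an illustration under simplifying assumptions: values are 2-bit words instead of bytes (the mask change only uses XOR, so the word size does not affect the argument), the secret is represented by a value `v`, and the adversary may probe any $d$ of the intermediate results and masking bytes. The code enumerates all masks and compares the joint distribution of every probe set across all secrets.

```python
from collections import Counter
from itertools import combinations, product

BITS = 2                      # toy word size so that exhaustive enumeration stays small
SIZE = 1 << BITS

def observables(v, masks):
    """Everything an adversary may probe: Z_1..Z_{2d+1} followed by the 2d masks."""
    d = len(masks) // 2
    z = v
    for r in masks[:d]:
        z ^= r                              # Z_1 = v xor R^(1)
    zs = [z]
    for k in range(d):
        zs.append(zs[-1] ^ masks[d + k])    # add a byte of the new mask
        zs.append(zs[-1] ^ masks[k])        # remove a byte of the old mask
    return zs + list(masks)

def is_order_d_masked(d):
    """True if every set of d probes has the same distribution for all secrets v."""
    n_obs = (2 * d + 1) + 2 * d
    for probe in combinations(range(n_obs), d):
        dists = []
        for v in range(SIZE):
            counts = Counter()
            for masks in product(range(SIZE), repeat=2 * d):
                obs = observables(v, masks)
                counts[tuple(obs[i] for i in probe)] += 1
            dists.append(counts)
        if any(dist != dists[0] for dist in dists):
            return False                    # this probe set leaks information about v
    return True
```

Calling `is_order_d_masked(1)` and `is_order_d_masked(2)` takes well under a second; larger $d$ quickly becomes expensive, which is why the proof above is needed for the general case.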

Generalized Mask Changing

We can generalize the secure change of masks to an intermediate result

\[ u \oplus \bigoplus_{i=1}^{l} R^{(i)} = u \oplus \bigoplus_{i=1}^{l} \bigoplus_{j=1}^{d} r_j^{(i)} \]

that is masked with $l$ masks $R^{(1)}, \ldots, R^{(l)}$, each consisting of the sum of $d$ random bytes.

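Algorithm 12 Perfect Multiple Mask Change (PMMC)

Input: $Z = u^e \oplus R^{(1)} \oplus \cdots \oplus R^{(l)}$, $d, l \in \mathbb{N}$, masking bytes $r_1^{(1)}, \ldots, r_d^{(1)}$ (forming $R^{(1)}$), $r_1^{(2)}, \ldots, r_d^{(2)}$ (forming $R^{(2)}$), $\ldots$, $r_1^{(l)}, \ldots, r_d^{(l)}$ (forming $R^{(l)}$), and $r_1^{(l+1)}, \ldots, r_d^{(l+1)}$ (forming $R^{(l+1)}$)

Output: $Z_d^{(l)} = u^e \oplus R^{(l+1)}$

1: $Z_0^{(l)} \leftarrow Z$

2: for $i = 1, \ldots, d$ do

3: $Z_i^{(0)} \leftarrow Z_{i-1}^{(l)} \oplus r_i^{(l+1)}$ {add fresh mask $r_i^{(l+1)}$}

4: for $j = 1, \ldots, l$ do

5: $Z_i^{(j)} \leftarrow Z_i^{(j-1)} \oplus r_i^{(j)}$ {remove old mask $r_i^{(j)}$}

6: end for

7: end for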


The security of Algorithm 12 can be shown using similar arguments as in the security

proof of Algorithm 11.
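The generalized mask change can be sketched in the same style as the single mask change. The following Python code is an illustration under our own naming (`pmmc`, `old_masks`, `new_mask` and the value `v` standing for $u^e$ are assumptions); it follows the loop structure of Algorithm 12, adding one fresh masking byte and then removing the corresponding byte of each old mask.

```python
import secrets
from functools import reduce
from operator import xor

def pmmc(z, old_masks, new_mask):
    """Replace the l old masks (each a list of d bytes) by new_mask (d bytes).

    Returns the final value and all intermediate results in computation order.
    """
    trace = []
    for i, fresh in enumerate(new_mask):
        z ^= fresh                     # add the i-th byte of the fresh mask
        trace.append(z)
        for mask in old_masks:
            z ^= mask[i]               # remove the i-th byte of each old mask
            trace.append(z)
    return z, trace

d, l = 3, 2
v = 0x5E                               # stands for the secret value u^e (hypothetical)
old_masks = [[secrets.randbelow(256) for _ in range(d)] for _ in range(l)]
new_mask = [secrets.randbelow(256) for _ in range(d)]
masked = reduce(xor, (byte for mask in old_masks for byte in mask), v)
result, trace = pmmc(masked, old_masks, new_mask)
```

After the call, `result` carries only the fresh mask, and `trace` contains the $d \cdot (l+1)$ intermediate results an adversary could probe.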

In the sequel, we propose methods for squaring and multiplication in a secure manner.

4.5.2 Squaring

The order-$d$ perfectly masked squaring algorithm is shown in Algorithm 13. The input $u^e \oplus R^{(1)}$ is squared in Step 1. In the following step we use the method PMC (Algorithm 11) to change the mask $(R^{(1)})^2$ to $R^{(2)}$. Since squaring in a finite field of characteristic 2 is a one-to-one mapping, the security of the squaring step relies entirely on the security of the mask change. We showed above that the algorithm PMC is in fact order-$d$ perfectly masked. Hence, Algorithm 13 is also order-$d$ perfectly masked.

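Algorithm 13 Order-$d$ Perfectly Masked Squaring ($d$-PMS)

Input: $x = u^e \oplus R^{(1)}$, masking bytes $r_1^{(1)}, \ldots, r_d^{(1)}$ (forming $R^{(1)}$) and $r_1^{(2)}, \ldots, r_d^{(2)}$ (forming $R^{(2)}$)

Output: $u^{2e} \oplus R^{(2)}$

1: $f \leftarrow x^2$ {$f = u^{2e} \oplus (R^{(1)})^2$}

2: $t \leftarrow \mathrm{PMC}(f, (r_1^{(1)})^2, \ldots, (r_d^{(1)})^2, r_1^{(2)}, \ldots, r_d^{(2)})$ {$t = u^{2e} \oplus R^{(2)}$}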
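A minimal Python sketch of the masked squaring is given below. It assumes the AES representation of $\mathbb{F}_{256}$ with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$; the helper names `gf_mul`, `pmc` and `masked_square` and the test values are our own. The sketch relies on the fact that squaring is additive in characteristic 2, so the squared mask is the XOR of the squared masking bytes.

```python
import secrets
from functools import reduce
from operator import xor

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def pmc(z, old_masks, new_masks):
    """Mask change as in Algorithm 11: add a new byte, then remove an old one."""
    for old, new in zip(old_masks, new_masks):
        z ^= new
        z ^= old
    return z

def masked_square(x, r_old, r_new):
    f = gf_mul(x, x)                              # f = v^2 xor (R^(1))^2
    squared_old = [gf_mul(r, r) for r in r_old]   # bytes of the squared mask
    return pmc(f, squared_old, r_new)             # result is v^2 xor R^(2)

d = 3
v = 0x9D                                          # stands for u^e (hypothetical)
r_old = [secrets.randbelow(256) for _ in range(d)]
r_new = [secrets.randbelow(256) for _ in range(d)]
t = masked_square(v ^ reduce(xor, r_old, 0), r_old, r_new)
```

The unmasked square is never computed on its own: `t` equals `gf_mul(v, v)` XORed with the new mask.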

4.5.3 Multiplication

The order-$d$ perfectly masked multiplication algorithm is presented in Algorithm 14. The inputs $Z^{(1)} = u^e \oplus R^{(1)}$ and $Z^{(2)} = u^f \oplus R^{(2)}$ are multiplied in Step 1. The first loop (Steps 2-6) eliminates the first disturbing term $u^e \cdot R^{(2)}$, leaving the intermediate result

\[ Z_d^{(1)} = u^{e+f} \oplus u^f \cdot R^{(1)} \oplus R^{(3)}. \]

The second disturbing term $u^f \cdot R^{(1)}$ is removed in the second loop (Steps 8-12). In the final step the result is recomputed to comply with our invariant (4.3) on page 41.

We verified that Algorithm 14 is order-$d$ perfectly masked for $d = 1, 2, 3$. Due to the large number of different distributions of the intermediate results we are not aware of an efficient method to prove the security of Algorithm 14 for arbitrary $d > 3$. We believe that Algorithm 14 is also order-$d$ perfectly masked for $d > 3$. However, the security level provided by an order-3 perfectly masked algorithm goes far beyond the security requirements of practical applications.


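Algorithm 14 Perfectly Masked Multiplication (PMM)

Input: $Z^{(1)} = u^e \oplus R^{(1)}$, $Z^{(2)} = u^f \oplus R^{(2)}$, $d \in \mathbb{N}$, used masks $r_1^{(1)}, \ldots, r_d^{(1)}, r_1^{(2)}, \ldots, r_d^{(2)}$, new masks $r_1^{(3)}, \ldots, r_d^{(3)}, r_1^{(4)}, \ldots, r_d^{(4)}, r_1^{(5)}, \ldots, r_d^{(5)}$

Output: $u^{e+f} \oplus R^{(5)}$

1: $Z_0^{(1)} \leftarrow Z^{(1)} \cdot Z^{(2)}$ {$Z_0^{(1)} = u^{e+f} \oplus u^e \cdot R^{(2)} \oplus u^f \cdot R^{(1)} \oplus R^{(1)} \cdot R^{(2)}$}

eliminate first disturbing term

2: for $i = 1, \ldots, d$ do

3: $s_i^{(1)} \leftarrow Z^{(1)} \cdot r_i^{(2)}$ {$s_i^{(1)} = u^e \cdot r_i^{(2)} \oplus R^{(1)} \cdot r_i^{(2)}$}

4: $s_i^{(2)} \leftarrow s_i^{(1)} \oplus r_i^{(3)}$ {$s_i^{(2)} = u^e \cdot r_i^{(2)} \oplus R^{(1)} \cdot r_i^{(2)} \oplus r_i^{(3)}$}

5: $Z_i^{(1)} \leftarrow Z_{i-1}^{(1)} \oplus s_i^{(2)}$ {$Z_i^{(1)} = u^{e+f} \oplus u^e \cdot \sum_{j=i+1}^{d} r_j^{(2)} \oplus u^f \cdot R^{(1)} \oplus R^{(1)} \cdot \sum_{j=i+1}^{d} r_j^{(2)} \oplus \sum_{j=1}^{i} r_j^{(3)}$}

6: end for {$Z_d^{(1)} = u^{e+f} \oplus u^f \cdot R^{(1)} \oplus R^{(3)}$}

eliminate second disturbing term

7: $Z_0^{(2)} \leftarrow Z_d^{(1)}$

8: for $i = 1, \ldots, d$ do

9: $s_i^{(3)} \leftarrow Z^{(2)} \cdot r_i^{(1)}$ {$s_i^{(3)} = u^f \cdot r_i^{(1)} \oplus R^{(2)} \cdot r_i^{(1)}$}

10: $s_i^{(4)} \leftarrow s_i^{(3)} \oplus r_i^{(4)}$ {$s_i^{(4)} = u^f \cdot r_i^{(1)} \oplus R^{(2)} \cdot r_i^{(1)} \oplus r_i^{(4)}$}

11: $Z_i^{(2)} \leftarrow Z_{i-1}^{(2)} \oplus s_i^{(4)}$ {$Z_i^{(2)} = u^{e+f} \oplus u^f \cdot \sum_{j=i+1}^{d} r_j^{(1)} \oplus R^{(2)} \cdot \sum_{j=1}^{i} r_j^{(1)} \oplus R^{(3)} \oplus \sum_{j=1}^{i} r_j^{(4)}$}

12: end for {$Z_d^{(2)} = u^{e+f} \oplus R^{(1)} R^{(2)} \oplus R^{(3)} \oplus R^{(4)}$}

Change mask

13: $Z^{(3)} \leftarrow \mathrm{PMMC}(Z_d^{(2)}, (r_1^{(1)} r_1^{(2)}), \ldots, (r_d^{(1)} r_d^{(2)}), r_1^{(3)}, \ldots, r_d^{(3)}, r_1^{(4)}, \ldots, r_d^{(4)}, r_1^{(5)}, \ldots, r_d^{(5)})$ {$Z^{(3)} = u^{e+f} \oplus R^{(5)}$}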
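The two elimination loops of Algorithm 14 can be traced with a short Python sketch. It is an illustration only: it assumes the AES representation of $\mathbb{F}_{256}$ (reduction polynomial 0x11B), uses our own names (`gf_mul`, `elimination_loops`, and `a`, `b` for the secrets $u^e$, $u^f$), and stops before the final mask change in Step 13.

```python
import secrets
from functools import reduce
from operator import xor

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def elimination_loops(z1, z2, r1, r2, r3, r4):
    """Steps 1-12 of Algorithm 14; returns the values after each loop."""
    z = gf_mul(z1, z2)                          # step 1
    for i in range(len(r2)):
        z ^= gf_mul(z1, r2[i]) ^ r3[i]          # steps 3-5
    after_first = z
    for i in range(len(r1)):
        z ^= gf_mul(z2, r1[i]) ^ r4[i]          # steps 9-11
    return after_first, z

d = 3
a, b = 0x53, 0xCA                               # stand for u^e and u^f (hypothetical)
r1, r2, r3, r4 = ([secrets.randbelow(256) for _ in range(d)] for _ in range(4))
R1, R2, R3, R4 = (reduce(xor, r, 0) for r in (r1, r2, r3, r4))
after_first, after_second = elimination_loops(a ^ R1, b ^ R2, r1, r2, r3, r4)
```

The two returned values match the loop comments of the algorithm: after the first loop only $u^f \cdot R^{(1)}$ and $R^{(3)}$ remain as masks, and after the second loop the product $R^{(1)} R^{(2)}$ together with $R^{(3)}$ and $R^{(4)}$.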

4.6 Conclusions

In this chapter we analyzed the security of cryptographic algorithms such as AES against

side channel attacks. We proposed a strong and general model to analyze the security.

Furthermore, we proposed a generic method to implement cryptographic algorithms that is

provably secure in our model. I.e., we showed that using our method, an adversary who can

determine the value of a single but arbitrary intermediate result in every encryption does not

derive any information about the secret key. Moreover, we analyzed the costs of our method

when implemented in hardware and compared it with the efficiency of other methods. In the

last part, we proposed a way to generalize our method to even more powerful adversaries

that can obtain the values of an arbitrary but fixed number dof intermediate results.


Chapter 5

Fault Based Collision Attacks

In this chapter we examine the security threat caused by so-called fault attacks. Fault attacks are a special type of side channel attack in which the attacker forces a cryptographic device to malfunction. The output or reaction of the device is then used to derive information about the secret key. A typical target for fault attacks is a smartcard (Rankl and Effing 2002). A smartcard is a general purpose computer embedded in a plastic cover the size of a credit card. The main building blocks of a smartcard are a CPU, a ROM that contains for example the operating system, an EEPROM containing among other things the secret key, and a RAM to store intermediate results of computations. To communicate with the outside world the smartcard has to be inserted into a so-called card reader, which also provides the energy the smartcard needs to operate.

Smartcards are perfectly suited for storing private information such as cryptographic keys because the corresponding cryptographic operations such as encryption or digital signature are carried out directly on the smartcard. Therefore, the key never has to leave the smartcard and hence seems to be protected very well, even in hostile environments. However, as explained in Chapter 3 (page 19), physical instances of algorithms (in hardware or software) may leak information about the computation through side channels.

(Boneh, DeMillo and Lipton 1997) were the first to show that faults induced into the encryption process of RSA can reveal the secret key. (Biham and Shamir 1997) combined fault attacks with the concept of differentials and mounted a differential fault attack (DFA) on DES. (Skorobogatov and Anderson 2002) showed that fault attacks are realizable with sufficient precision in practice. (Blömer and Seifert 2003), (Bar-El, Choukri, Naccache, Tunstall and Whelan 2006) and (Otto 2005) give an overview of the physics of inducing faults.

In this chapter we focus on fault attacks on AES. The first fault attacks on AES reported in the literature were due to (Blömer and Seifert 2003), followed by improved attacks of (Dusart, Letourneux and Vivolo 2003), (Giraud 2004), (Chen and Yen 2003) and (Piret and Quisquater 2003). All these publications demonstrate the power of fault attacks. However, these attacks either use the fault model of bit resets (Blömer and Seifert 2003), in which case they do not need the faulty ciphertexts, or they only require the fault model of bit flips, in which case they need the faulty ciphertexts as described in (Dusart et al. 2003), (Giraud 2004), (Chen and Yen 2003), (Piret and Quisquater 2003). The fault attacks presented in this thesis use bit flips and, instead of faulty ciphertexts, only use so-called collision information. This turns out to be a much weaker requirement than the requirement that an attacker gets complete faulty ciphertexts. To obtain our new attacks, we show how to combine fault attacks with so-called collision attacks. In a collision attack the adversary tries to detect identical intermediate results during the encryption of different plaintexts, e.g., by using side channel information, and uses this information to derive the secret key. This idea is due to Dobbertin. Schramm et al. developed collision attacks against DES (Schramm, Wollinger and Paar 2003) and AES (Schramm, Leander, Felke and Paar 2004) and showed how to detect collisions using power traces.

We combine the concepts of fault and collision attacks by inducing faults to generate collisions. This approach allows us to relax the requirement of getting faulty ciphertexts to the requirement of detecting collisions in the encryption process. First we explain the basic idea underlying our attacks by presenting an attack based on some rather strong assumptions. After that we present an attack utilizing the same basic ideas that successfully attacks a smartcard protected by a so-called memory encryption mechanism (MEM). To the best of our knowledge, this is the first fault attack on smartcards protected by memory encryption.

To defend against side channel attacks the manufacturers of smartcards developed several countermeasures. One type of countermeasure is intended to protect the card, e.g., shields, sensors or error detection. Another type is designed to render side channel attacks useless using techniques to obfuscate the side channel information, e.g., by random masking (Messerges 2000), (Golić and Tymen 2002), (Blömer et al. 2004). Yet another more efficient approach is to use a so-called memory encryption mechanism (MEM). Memory encryption mechanisms encrypt an intermediate result directly after it leaves the processor and decrypt data right before it enters the processor (see Figure 5.1). This guarantees that all data stored in the RAM is encrypted. The intention is that memory encryption makes it harder for an adversary to derive information about intermediate states of the encryption process by using side channels of the smartcard. In general, it is assumed that, unlike for the RAM, it is too difficult to induce faults into the registers of the highly integrated processor with reasonable precision. Hence, memory encryption is widely believed to be a useful countermeasure against side channel attacks, in particular fault attacks.

[Figure 5.1: Model of an enhanced smartcard with memory encryption mechanism (MEM). Labels recoverable from the figure: ROM, EEPROM, MEM, Processor, RAM, key, "protected against faults", "encrypted".]

Due to the limited computational power of smartcards the MEM has to be very fast. So the manufacturers of smartcards use lightweight encryption algorithms that are very fast but may not be secure against serious cryptanalysis. To increase the impact of the MEM the manufacturers like to keep their algorithms secret. However, many manufacturers do not analyze the impact of MEMs on security but simply present them as an improvement of security. The strategy is to implement as many promising countermeasures as possible without exceeding a certain cost threshold. Even a weak countermeasure should increase security.

Our attack, which works even in the presence of a MEM, shows that the security improvement of the MEM as generally used is rather limited. In particular, we present an attack on an AES implementation protected by MEM that determines the full AES key by inducing only 285 faults and detecting collisions.

The chapter is organized as follows.

Section 5.1: The Concept of Fault Attacks

In this section we introduce the concept of fault attacks. We categorize the existing fault attacks depending on their properties, such as the precision of time and location. Furthermore, we give the basic methods known so far to analyze the output or the reaction of the device in order to derive secret information.

Section 5.2: The Concept of Collision Attacks

To get a better understanding of fault based collision attacks we briefly sketch the idea of so-called collision attacks. Later we combine this concept with fault attacks to obtain our novel concept of fault based collision attacks.

Section 5.3: New Fault Model

In Section 5.3 we present our model for analyzing fault based collision attacks as published in (Blömer and Krummel 2006). Fault based collision attacks are an improvement of classical fault attacks. On the one hand they do not need strong assumptions like the ability to force bits to a certain value. On the other hand they do not need faulty ciphertexts to derive information about the secret key. We explicitly specify the underlying assumptions and justify why fault based collision attacks are realistic threats to the security of cryptographic hardware.

Section 5.4: New Fault Attacks

In Section 5.4 we describe fault based collision attacks on AES and analyze their complexity. Unlike the classical fault attacks using bit flips, such as the attacks of (Dusart et al. 2003), (Giraud 2004), (Chen and Yen 2003) and (Piret and Quisquater 2003), obtaining faulty ciphertexts is not essential for our attacks. Therefore our attacks are applicable in scenarios where classical fault attacks do not work. On the other hand, our new attacks need more faults than the classical fault attacks. We explain the basic idea in our first attack. This attack is our basic attack and is based on rather strong assumptions. However, in the sequel we show how to strengthen it and how to adapt it to several other scenarios. The second attack we present is our strongest attack. This attack shows how to successfully attack a smartcard that is protected by a MEM. To the best of our knowledge this is the first successful attack against a smartcard protected by a MEM.

Section 5.5: Conclusions

We finish this chapter by reflecting on the impact of fault based collision attacks on the security of recent smartcards. Furthermore, we propose some ideas of how to protect cryptographic hardware against such attacks.

5.1 The Concept of Fault Attacks

In the sequel, we briefly summarize methods commonly suggested to induce faults in an

encryption. Based on these methods we present the standard models to analyze fault attacks.

5.1.1 Methods to Induce Faults

Researchers developed a wide variety of methods to induce faults into electrical circuits. In the sequel, we list some common fault induction methods to motivate the fault models we give afterwards. These methods are the starting point for the theoretical models used to develop and analyze fault attacks on cryptographic algorithms. Since we focus on the theoretical analysis of fault attacks we only describe each method briefly. A more complete list can be found in (Bar-El et al. 2006) and (Otto 2005).

Optical Fault Induction Exposing an electrical circuit to an intense light source causes photoelectric effects due to the current induced by photons. In turn, these photoelectric effects cause faulty behavior of the circuit. If the circuit is laid open, intense light is an easy way to induce faults. In (Skorobogatov and Anderson 2002) the authors showed how to induce faults with reasonable precision using only a low-cost flash light. The precision of inducing faults this way can be improved using more sophisticated lab instruments.

Power Spikes The power supply of a smartcard is always provided by the smartcard reader. To ensure that the smartcard works properly in common environments the manufacturers agreed in (ISO 2002) that a smartcard must tolerate a variation of ±10% of the standard supply voltage of 5V. Increasing or decreasing the voltage beyond the specified limits is called a power spike. Power spikes may result in a transient malfunction of the smartcard. E.g., if a power spike occurs during an encryption some intermediate operation may not work properly and produce a faulty result.

Temperature Like the supply voltage, the operating temperature of a device is also restricted to certain thresholds to ensure proper operation. Heating up or cooling down a smartcard beyond these thresholds may result in malfunctions, for example modifications of the content of RAM cells.

Clock Glitches Due to the lack of an internal clock the correct operation of a smartcard entirely depends on the external clock signal that is given by the card reader. Disturbing this clock signal may cause the card to spuriously skip operations.

X-rays and Focused Ion Beams There are two different ways to use X-rays or focused ion beams in attacking a smartcard. Firstly, they can be used to drill holes through a mechanical shield with high precision. Hence, the shield cannot prevent an attacker from accessing the underlying hardware with some analysis tools, e.g., a probe. Secondly, X-rays and focused ion beams can be used to induce faults without manipulating the coating of the smartcard. Details can be found in (Kömmerling and Kuhn 1999).

Eddy Current The French physicist Leon Foucault discovered in 1851 that moving a conductor through a magnetic field causes some current flow called eddy current. Using eddy current to disturb the operations of a smartcard is one of the oldest proposed methods of fault induction. See for example (Kocher 1996), (Anderson and Kuhn 1996) and (Kömmerling and Kuhn 1999). However, it is difficult to focus the fault on a certain area of the chip. In (Quisquater and Samyde 2002) the authors developed a refined method of inducing eddy current.

5.1.2 Fault Models

To analyze fault based attacks we first have to develop adequate models that cover the

important aspects of real environments. Independent of the method to induce faults the

following properties are essential for the analysis of fault attacks:

Precision The precision of the fault induction is crucial for both the success and the complexity of a fault based attack. We distinguish between the precision of time and location. The precision of location defines the ability of the attacker to focus the fault induction on a certain part of the hardware. To induce a fault into a specific intermediate result an adversary must also be able to induce faults at a precise time depending on the progress of the encryption.

Number of affected bits This property specifies how many bits are affected by the induced faults. Precise fault injection techniques can modify single bits whereas other techniques may change bytes or even a whole intermediate state.

Effect of the fault Our strongest model allows the adversary to set a bit of an intermediate result to a certain value. I.e., if an adversary $\mathcal{A}$ can force a bit to be 0 we call this a bit reset. Moreover, if $\mathcal{A}$ can force a bit to be 1 we call it a bit set. In weaker models the adversary does not have such a strong influence on the value of a faulty intermediate result. E.g., if the adversary can only modify a whole intermediate state it is very unlikely that he can force the complete state to a chosen value. In such scenarios we assume that he can change the intermediate state in a random and unpredictable way. We call this a random fault.

Incidence of fault The incidence of a fault also plays an important role in the analysis of fault based attacks. A fault that changes the content of a memory cell only once, after which the hardware works properly during the rest of the encryption, is called a transient fault. For example, a focused ion beam changes the content of some bits in the RAM but does not destroy any transistor. In contrast, permanent faults are defective parts of the hardware that do not work correctly after the fault is induced. E.g., this could be an interrupted wire that prevents the information flow.
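The fault effects distinguished above can be written down as operations on a single byte. The following Python sketch is purely illustrative; the function names are our own, and the random fault is modeled as a uniformly chosen replacement value, which is one possible reading of "random and unpredictable".

```python
import secrets

def bit_reset(byte, pos):
    """Force bit `pos` of the byte to 0."""
    return byte & ~(1 << pos) & 0xFF

def bit_set(byte, pos):
    """Force bit `pos` of the byte to 1."""
    return byte | (1 << pos)

def bit_flip(byte, pos):
    """Invert bit `pos` of the byte."""
    return byte ^ (1 << pos)

def random_fault(byte):
    """Replace the byte by an unpredictable value (may coincide with the original)."""
    return secrets.randbelow(256)
```

Bit set and bit reset change the value only if the targeted bit had the opposite value, whereas a bit flip always changes it. The attacks described later exploit exactly this difference.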

A fault attack can be divided into two steps. In the first step the adversary $\mathcal{A}$ induces a fault into the encryption, e.g., by using a method described above. We call this step the fault induction step.

In the second step of a fault based attack, $\mathcal{A}$ analyzes the impact of the induced fault on the encryption. Depending on the implementation and the abilities of $\mathcal{A}$ the analysis differs. We distinguish two kinds of fault based attacks:

Fault Attacks Based on the Analysis of Faulty Output

If the encryption algorithm is not protected against fault attacks it does not react to the fault directly. It simply continues its computation based on faulty intermediate results and outputs a faulty ciphertext in the end. Giving $\mathcal{A}$ access to both the faulty and the corresponding correct ciphertext allows him to backtrack the encryption and deduce information about the last round key. E.g., for ciphers like AES an adversary $\mathcal{A}$ performs the following procedure:

1. $\mathcal{A}$ guesses the $i$-th byte $\hat{k}_i^{10}$ of the last round key and computes some intermediate results by tracing back the last round of the encryption for both the faulty and the correct ciphertext.

2. $\mathcal{A}$ verifies whether the difference of the corresponding intermediate results could be caused by the induced fault. If this difference could not be caused by the fault, the candidate $\hat{k}_i^{10}$ is proven to be wrong and discarded. In the other case, $\mathcal{A}$ keeps $\hat{k}_i^{10}$ as a possible key value.

See for example (Dusart et al. 2003) and (Giraud 2004) for fault attacks of this type on AES.
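The two-step procedure above can be sketched for a single byte of the last AES round. The code below is a simplified illustration, not one of the cited attacks: it assumes that the fault is a single bit flip in a state byte right before the final SubBytes, ignores ShiftRows (which only moves bytes), and uses hypothetical names and test values. The S-box is generated from its definition as inversion in $\mathbb{F}_{256}$ followed by the affine map.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """AES S-box: multiplicative inverse (0 maps to 0) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

def candidates(c, c_faulty):
    """Key guesses for which the traced-back difference is a single bit flip."""
    return {k for k in range(256)
            if bin(INV_SBOX[c ^ k] ^ INV_SBOX[c_faulty ^ k]).count("1") == 1}

true_key = 0x4F                                   # hypothetical last round key byte
surviving = set(range(256))
for x, bit in [(0x12, 0), (0xA7, 3), (0x5C, 6)]:  # hypothetical states and fault positions
    c = SBOX[x] ^ true_key                        # correct ciphertext byte
    c_faulty = SBOX[x ^ (1 << bit)] ^ true_key    # ciphertext byte after the bit flip
    surviving &= candidates(c, c_faulty)
```

Each pair of correct and faulty ciphertext bytes discards most key guesses, and the correct key byte always remains in `surviving`. How many pairs are needed until only one candidate is left depends on the concrete values.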

Fault Attacks based on the Information whether the Output is Faulty or Not

If the implementation does not output the faulty ciphertext the analysis is more involved. However, we assume that the adversary $\mathcal{A}$ always notices if the induced fault falsifies the encryption. E.g., a so-called security reset that puts the implementation into a specified state after detecting a fault would reveal the information that a fault occurred. But even if the implementation does not reveal the detection of a fault directly it has to react to the fault somehow. For example, it might recompute the encryption. But such a special treatment of faults could be detected by the adversary by simply measuring the time an encryption takes. If a faulty encryption takes longer than the correct encryption the induced fault had an impact on the encryption. If the faulty encryption is as fast as the correct encryption the adversary concludes that the induced fault does not influence the encryption. We distinguish two kinds of attacks:

try and error attack The so-called try and error attack works if $\mathcal{A}$ can set or reset bits. After forcing a bit of an intermediate result to either 0 (or 1), $\mathcal{A}$ determines if a fault occurred or not. If the encryption is correct then the fault attack did not change the value of that bit. Hence, $\mathcal{A}$ concludes that the bit was 0 (or 1 respectively). If the encryption is faulty then the fault attack changed the value of that bit. Hence, $\mathcal{A}$ concludes that the bit was 1 (or 0 respectively). By repeating fault attacks $\mathcal{A}$ can determine the values of several bits of an intermediate state. In turn, $\mathcal{A}$ can use this information to derive information about the secret key. See for example (Blömer and Seifert 2003) for this kind of fault attack on AES.
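The try and error attack can be simulated with a toy device. In the sketch below the class `Card`, its method name and the secret value are hypothetical; the device only reports whether a bit reset changed the targeted intermediate byte, which models the assumption that every faulty intermediate result leads to a detectably faulty encryption.

```python
class Card:
    """Toy device holding a secret intermediate byte."""

    def __init__(self, secret):
        self._secret = secret

    def encryption_is_faulty_after_reset(self, pos):
        """Reset bit `pos` of the secret byte and report whether anything changed."""
        faulty = self._secret & ~(1 << pos) & 0xFF   # bit reset: force the bit to 0
        return faulty != self._secret

def recover_byte(card):
    """Recover the secret byte with one bit reset per bit position."""
    value = 0
    for pos in range(8):
        if card.encryption_is_faulty_after_reset(pos):
            value |= 1 << pos        # the reset changed the bit, so it was 1
    return value

recovered = recover_byte(Card(0xA7))
```

Eight faults are enough to read one byte in this idealized setting, one for each bit.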

fail safe attack Like the try and error attack, the so-called fail safe attack does not need a faulty ciphertext, but it also works with random faults. To illustrate the attack consider the square and multiply always algorithm (Algorithm 15) to compute an RSA signature.

Algorithm 15 square and multiply always

Input: RSA modulus $N$, message $m \in \mathbb{Z}_N$, secret exponent $d = \sum_{i=0}^{\ell-1} d_i \cdot 2^i \in \mathbb{Z}^*_{\varphi(N)}$, $d_i \in \{0, 1\}$

Output: $m^d \bmod N$

1: $t \leftarrow 1$

2: for $i = \ell - 1$ to $0$ do

3: $t_0 \leftarrow t^2$

4: $t_1 \leftarrow t_0 \cdot m$ {multiply always}

5: if $d_i = 0$ then

6: $t \leftarrow t_0$

7: else

8: $t \leftarrow t_1$

9: end if

10: end for

This implementation was proposed to counteract side channel attacks like timing and power analysis. However, it turned out that this implementation is susceptible to a fail safe attack. The idea is as follows: The attacker induces a random fault into $t_1$ in the $j$th execution of line 4 in the loop of Algorithm 15. He can then determine the $j$th bit $d_j$ of the secret exponent as follows. If $d_j = 0$ then no multiplication is needed in step $j$ to compute the signature and the variable $t_1$ does not influence the computation. Hence the result would be correct. If $d_j = 1$ then $t_1$ is needed in step $j$ to compute the signature and hence the result would be incorrect. $\mathcal{A}$ can determine the complete secret key by repeating this attack for all bits of $d$.
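The fail safe attack can be reproduced with toy parameters. The following Python sketch is an illustration under stated assumptions: the modulus $N = 77$, the message $m = 2$ and the exponent bits are made-up toy values, the fault is modeled as adding 1 to $t_1$, and the list `secret_bits` stores the exponent with the most significant bit first, so that list position $j$ corresponds to the $j$th loop iteration.

```python
def sign(m, d_bits, N, fault_step=None):
    """Square and multiply always; optionally corrupt t1 in one loop iteration."""
    t = 1
    for step, bit in enumerate(d_bits):
        t0 = (t * t) % N
        t1 = (t0 * m) % N                  # multiply always
        if step == fault_step:
            t1 = (t1 + 1) % N              # toy model of a fault in t1
        t = t0 if bit == 0 else t1
    return t

def fail_safe_attack(device, n_bits):
    """Recover the exponent bits from the information 'result correct or not'."""
    correct = device(None)
    return [0 if device(step) == correct else 1 for step in range(n_bits)]

N, m = 77, 2                               # toy modulus 7 * 11 and message
secret_bits = [1, 0, 1, 1]                 # toy exponent d = 11, MSB first
device = lambda fault_step: sign(m, secret_bits, N, fault_step)
recovered = fail_safe_attack(device, len(secret_bits))
```

The attacker never looks at the faulty value itself, only at whether it differs from the correct signature. In this toy run a fault in iteration 1, where the exponent bit is 0, leaves the result unchanged, while faults in the other iterations change it.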

5.2 The Concept of Collision Attacks

The idea of collision attacks is due to Dobbertin and the first collision attacks were published in (Schramm et al. 2003) and (Schramm et al. 2004). A collision is the occurrence of identical intermediate results in the encryptions of different plaintexts. An adversary $\mathcal{A}$ tries to detect collisions and uses this information together with the plaintexts (or ciphertexts) to derive information about the secret key. To detect collisions the authors of (Schramm et al. 2003) and (Schramm et al. 2004) proposed to use side channel information, e.g., power traces, and mounted successful collision attacks on DES and AES.

5.3 New Fault Model

5.3.1 Notation

In this chapter we focus on the AES-128 symmetric cipher and simply call it AES. However, the attacks presented in this chapter can also be easily adapted to other versions of AES having larger key sizes. As defined in Chapter 2, let $\mathcal{P} := \{0,1\}^{128}$ be the set of plaintexts, $\mathcal{C} := \{0,1\}^{128}$ be the set of ciphertexts and $\mathcal{K} := \{0,1\}^{128}$ be the set of keys. In the classical model the AES encryption with a fixed key $k \in \mathcal{K}$ is a bijective function:

\[ \mathrm{AES}_k : \mathcal{P} \to \mathcal{C}, \qquad p \mapsto c := \mathrm{AES}_k(p). \]

Let

\[ O := \{ SB, SR, MC, AR \} \]

denote the set of round transformations SubBytes, ShiftRows, MixColumns and AddRoundKey. Furthermore, let

\[ p^{(r),(o)}_{i,j} \]

denote the $j$th bit of the $i$th byte of the encryption state of plaintext $p$ after the operation $o \in O$ of round $1 \leq r \leq 10$. For example, $p^{(3),(SR)}_{5,3}$ is the 3rd bit of the 5th byte of the encryption state of plaintext $p$ after the ShiftRows transformation of round 3. In the sequel, we will omit the index $j$ that defines the bit position if it is not relevant in that context. The $i$th byte of the round key $k^{(r)}$ of round $r$ is called $k^{(r)}_i$.

We can then define the set of bits that are results of a round transformation as

\[ S := \bigl\{ p^{(0),(AR)}_{i,j} \bigm| i \in \{0, \ldots, 15\},\ j \in \{0, \ldots, 7\} \bigr\} \cup \bigl\{ p^{(r),(o)}_{i,j} \bigm| o \in O,\ r \in \{1, \ldots, 10\},\ i \in \{0, \ldots, 15\},\ j \in \{0, \ldots, 7\} \bigr\}. \]
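To get a feeling for the size of the set $S$ of possible fault positions, the definition can be enumerated directly. The sketch below is only an illustration with our own tuple encoding (round, operation, byte index, bit index); it follows the definition literally, i.e., it includes all four operations for every round from 1 to 10 together with the initial AddRoundKey.

```python
from itertools import product

OPS = ("SB", "SR", "MC", "AR")

def fault_positions():
    """Enumerate S as tuples (round, operation, byte index, bit index)."""
    initial = [(0, "AR", i, j) for i, j in product(range(16), range(8))]
    rounds = [(r, o, i, j)
              for r, o, i, j in product(range(1, 11), OPS, range(16), range(8))]
    return initial + rounds

positions = fault_positions()
```

Under this literal reading, $S$ contains $128 + 10 \cdot 4 \cdot 128 = 5248$ bit positions, and the example $p^{(3),(SR)}_{5,3}$ corresponds to the tuple `(3, "SR", 5, 3)`.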

5.3.2 Model

To model faults mathematically, we extend the AES function with a second variable $b$ that specifies a bit position during the computation of $\mathrm{AES}_k$. The set of all realizable functions via AES is extended by flipping bit $b$ during the computation of $\mathrm{AES}_k$. We call the extended function FAES:

\[ \mathrm{FAES}_k : \mathcal{P} \times S \to \mathcal{C}, \qquad (p, b) \mapsto c := \mathrm{FAES}_k(p, b). \]

However, the extended function $\mathrm{FAES}(p, b)$ is not bijective. There exist collisions such that two intermediate states of computations of $\mathrm{FAES}(p, b)$ and $\mathrm{FAES}(p', b')$ with different inputs $(p, b) \neq (p', b')$ are equal. An attacker wants to detect those collisions and then use them to derive the secret key $k$.
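A simple instance of such a collision can be seen directly after the initial AddRoundKey. The following Python sketch is a toy illustration with our own names and test values: flipping bit $e$ of byte $i$ of the state for plaintext $p$ yields the same state as encrypting, without any fault, the plaintext $p'$ that differs from $p$ in exactly that bit.

```python
import secrets

def state_after_initial_ark(p, k, flip=None):
    """State after the initial AddRoundKey; flip = (byte index, bit index) or None."""
    state = [pb ^ kb for pb, kb in zip(p, k)]
    if flip is not None:
        i, e = flip
        state[i] ^= 1 << e               # transient bit flip in the stored state
    return state

key = [secrets.randbelow(256) for _ in range(16)]   # hypothetical secret key
p = [0] * 16
p_prime = list(p)
p_prime[5] ^= 1 << 3                     # p' differs from p in bit 3 of byte 5

faulty_state = state_after_initial_ark(p, key, flip=(5, 3))
clean_state = state_after_initial_ark(p_prime, key)
```

Since the two states are equal, every later intermediate result and the final output coincide as well, independently of the key. This particular collision therefore reveals nothing about $k$ by itself; it only shows why the extended function cannot be injective.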

In our scenario we have a smartcard with an implementation of AES and a secret AES key $k$ stored on it. An adversary $\mathcal{A}$ has access to the smartcard and wants to determine information about the secret key $k$. In our model we assume that the following holds:

1. $\mathcal{A}$ is able to trigger the smartcard to encrypt chosen plaintexts.

2. $\mathcal{A}$ can induce transient bit flips into the encryption process.

3. $\mathcal{A}$ is able to detect collisions.


Discussion of the New Model

To simplify the description of our attacks we assume that the adversary $\mathcal{A}$ is able to input chosen plaintexts into the smartcard. However, our attacks can also be transformed to known plaintext attacks. During the encryption $\mathcal{A}$ can induce faults in terms of transient bit flips into the result of a round transformation that is stored in the RAM. To be more precise, $\mathcal{A}$ can flip a single bit of some specified byte in the memory. Furthermore, $\mathcal{A}$ can detect collisions by obtaining some information about an internal state of the encryption process. However, we do not assume that this information lets $\mathcal{A}$ determine (parts of) the secret key directly. Nevertheless, it enables $\mathcal{A}$ to detect if a collision occurred or not. We call any kind of information that lets $\mathcal{A}$ detect collisions collision information of some intermediate state of $\mathrm{FAES}(p, b)$. Later we will show examples of how to derive collision information.

Modeling Collision Information We model collision information as the evaluation of an injective function $f_k$ that depends on the concrete implementation of $\mathrm{AES}_k$ and the secret key $k$. It gets as input the specification of a bit position that is flipped during the encryption of a plaintext $p$. The output is some information about an intermediate state of the encryption. According to the notation introduced above we denote the collision information of encrypting plaintext $p$ and inducing a bit flip of bit $e$ of byte $i$ of the state after transformation $o$ in round $r$ by $f_k(p_i^{(r),(o)}, e)$. E.g., $f_k(p_i^{(1),(SB)}, e)$ is the collision information $\mathcal{A}$ obtains after flipping bit $e$ of byte $i$ of the state after the SubBytes transformation of round 1. It is also possible to derive the collision information without inducing a fault. We denote the evaluation of $f_k$ without inducing a fault in the encryption process by $f_k(p_i^{(r),(o)}, -)$.

Realizations of Collision Information Depending on the purpose of the smartcard $f_k$

can have different realizations. Given the ciphertexts the detection of collisions is easy

because the equality of ciphertexts implies equality of intermediate states. However, in many

cases the output of an encryption is not available to the attacker. For example, if the

smartcard computes a message authentication code (MAC) or a hash value using AES as a building

block, $f_k$ can simply be the MAC or the hash value. Remember that the MAC is the final

result of a number of interlinked AES encryptions and not the result of a single AES encryption.

The final ciphertext could also be used as collision information if the smartcard computes

multiple encryptions with different encryption algorithms. Finally, if the smartcard computes

a single encryption but does not output faulty ciphertexts, $f_k$ could be the measurement of

some side channel information, e.g., a power trace, that allows A to detect collisions.

Cost Analysis To analyze the costs of a fault based collision attack we simply count the

number of faults we have to induce. The evaluation of $f_k$ without inducing a fault is free.

We also neglect the complexity of additional computations that can be performed offline since

in our cases they are obviously easy.

5.4 Fault Based Collision Attacks on AES

Now we describe fault based collision attacks on AES. For simplicity, we only show how to

compute byte $k_0$ of the secret key k. Similar approaches can be used to compute the other

key bytes.

We describe how to mount and analyze fault based collision attacks on AES in different

scenarios. Each scenario is characterized by abilities of the adversary and the properties of

the environment.

Precision of Fault Induction The first characteristic defines the precision of the fault

induction. We consider two cases. In the first case the adversary A is able to flip a

specific bit of an intermediate state. In the second case the adversary can focus the

fault to a single byte of an intermediate state. However, we assume that he cannot

focus on a single bit of that byte but each possible bit flip occurs with probability 1/8.

Memory Encryption Mechanism (MEM) The second characteristic specifies whether

the smartcard is protected by a MEM or not. The MEM encrypts every intermediate

result that leaves the processor and decrypts a value right before it enters the processor,

see Figure 5.1 (page 50). Since a smartcard has only restricted computational power

and memory most manufacturers choose a byte oriented encryption function with a

fixed key that is used for encryption and decryption. In our approach we simply model

the memory encryption as an unknown but fixed function $h: \{0,1\}^8 \to \{0,1\}^8$. That

means that we do not rely on a weakness in the memory encryption itself. In particular,

we do not assume to have any information of how bit flips affect further processing of

that byte.

Validation of Collision Information The last characteristic defines whether collision in-

formation remains valid for a long period of time or not. If collision information does

not remain valid there is no reason for A to store collision information since he cannot

use it later in the attack. A is only able to compare collision information of two recently

taken measurements and store the result. This effect could be caused by environments

that are frequently changed such that collision information taken at different times is

hardly comparable, e.g., due to some countermeasure that induces noise into the

collision information. If, however, collision information remains valid over the time span

used for the attack it may be useful for A to store this information in a preprocessing

step to have it available once and for all. It will turn out later that stored information

helps to reduce the number of induced faults.

We denote the transformation of SubBytes applied on a single byte x of the state simply

as the application of the sbox on x and write it as S[x]. To simplify notation we define

$\Delta(p_i, q_i) = p_i \oplus q_i$

to be the difference of two plaintext bytes $p_i$ and $q_i$. Then

$\Delta_{in}(p_i, q_i) = (p_i \oplus k_i^{(0)}) \oplus (q_i \oplus k_i^{(0)}) = p_i \oplus q_i$

is the input difference of $(p_i, q_i)$ before the first application of the sbox and

$\Delta_{out}(p_i, q_i) = S[p_i \oplus k_i^{(0)}] \oplus S[q_i \oplus k_i^{(0)}]$

is the output difference of $(p_i, q_i)$ after the first application of the sbox.

5.4.1 Basic Attack

First, we describe the scenario in which the attack takes place. We assume that A can flip

a specific bit at position e of the intermediate state $p^{(1),(SB)}$. We also assume that collision

information remains valid over the time span of the attack. Finally, we assume that the

smartcard is not protected by a MEM.

In a preprocessing step the adversary computes an array $B_e$ of length 256. In position

$B_e[y]$, $y \in \{0,\ldots,255\}$, the array stores the following information:

$B_e[y] := \{\{s, t\} \mid s \oplus t = y,\ S[s] \oplus S[t] = 2^e\}$,

i.e., $B_e[y]$ stores all (unordered) pairs of bytes with $\Delta_{in}(s, t) = y$ and

$\Delta_{out}(s, t) = S[s] \oplus S[t] = 2^e$.

Furthermore, by $C_e[y]$ denote the union of sets in $B_e[y]$. The sets $C_e[y]$ are pairwise disjoint.

As it turns out, for every $e \in \{0,1,\ldots,7\}$ we have that 129 sets $C_e[y]$ are empty, 126 sets

$C_e[y]$ contain exactly two elements, and one set $C_e[y]$ contains exactly four elements.
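
The preprocessing step can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the helper names build_sbox and build_tables are hypothetical, and the S-box is generated from its standard definition (inversion in GF(2^8) followed by the affine map) rather than taken from any particular implementation. The loop at the end prints, for every bit position e, how many sets $C_e[y]$ have each size.

```python
from collections import Counter

def build_sbox():
    """Generate the AES S-box: inverse in GF(2^8) followed by the affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
            b >>= 1
        return r

    def rotl(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if mul(x, y) == 1) if x else 0
        sbox.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3)
                    ^ rotl(inv, 4) ^ 0x63)
    return sbox

S = build_sbox()

def build_tables(e):
    """Return (B_e, C_e): B_e[y] holds the unordered pairs {s, t} with
    s ^ t == y and S[s] ^ S[t] == 2**e; C_e[y] is the union of these pairs."""
    B = [set() for _ in range(256)]
    for s in range(256):
        for t in range(s + 1, 256):
            if S[s] ^ S[t] == 1 << e:
                B[s ^ t].add(frozenset((s, t)))
    C = [set().union(*pairs) for pairs in B]
    return B, C

for e in range(8):
    _, C = build_tables(e)
    print(e, sorted(Counter(len(c) for c in C).items()))
```

Each line of output lists (size, number of sets) pairs for one bit position, which can be compared directly with the counts stated above.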

Next, A collects a set B of collision information $f_k(p_0^{(1),(SB)}, -)$ for all 256 different values

of $p_0$ and arbitrary but fixed $p_1,\ldots,p_{15}$. Then A chooses an arbitrary value $q_0$ and encrypts

the corresponding plaintext flipping an arbitrary bit e of $q_0^{(1),(SB)}$. If $f_k$ has the property that

$f_k(p_0^{(1),(SB)}, -) = f_k(q_0^{(1),(SB)}, e)$

then A is able to find the corresponding plaintext $p_0$ satisfying

$S[p_0 \oplus k_0] = S[q_0 \oplus k_0] \oplus 2^e$

by comparing the collision information with the elements of B. Given the pair $p_0, q_0$ the

adversary A knows the difference $p_0 \oplus k_0 \oplus q_0 \oplus k_0 = p_0 \oplus q_0$. Using array $B_e$ the adversary

A now concludes

$\{p_0 \oplus k_0, q_0 \oplus k_0\} \in B_e[p_0 \oplus q_0]$.

Hence, A knows that the correct key byte $k_0$ satisfies

$k_0 \in \{p_0 \oplus s \mid s \in C_e[p_0 \oplus q_0]\}$. (5.1)

As mentioned above, $|C_e[y]| \le 4$ for all y, and $|C_e[y]| = 2$ for all but one of the nonempty sets. Hence, at this

point A has reduced the number of possible values for key byte $k_0$ to at most 4.

Next, A repeats the experiment described above with some value $q_0'$, such that

$q_0' \oplus s \notin C_e[p_0 \oplus q_0]$ for all $s \in \{p_0 \oplus \bar{s} \mid \bar{s} \in C_e[p_0 \oplus q_0]\}$. Using the collision information in set B, the

adversary A determines $p_0'$ such that $S[p_0' \oplus k_0] = S[q_0' \oplus k_0] \oplus 2^e$. As before A concludes that

the key byte $k_0$ satisfies

$k_0 \in \{p_0' \oplus s \mid s \in C_e[p_0' \oplus q_0']\}$. (5.2)

By choice of $q_0'$, the adversary A is guaranteed that $p_0 \oplus q_0 \neq p_0' \oplus q_0'$. By elementary arithmetic

it follows that if $|C_e[p_0' \oplus q_0']| = |C_e[p_0 \oplus q_0]| = 2$, then (5.1) and (5.2) uniquely determine

the key byte $k_0$. By analyzing the structure of the arrays $B_e$ we verified that the key byte $k_0$

is also uniquely determined if one of the sets has size four.

Cost Analysis To determine a single AES key byte A has to induce two faults. Thus 32

faults are enough to determine the complete 128-bit AES key.
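
The basic attack on one key byte can be simulated as follows. This is a sketch under stated assumptions, not an attack on real hardware: the class SimulatedCard and the functions one_experiment and recover_k0 are hypothetical names, and the collision information $f_k$ is modeled as a secret random permutation of the state byte so that the attacker can only compare values for equality.

```python
import random

def build_sbox():
    """Generate the AES S-box: inverse in GF(2^8) followed by the affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r

    def rotl(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if mul(x, y) == 1) if x else 0
        sbox.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3)
                    ^ rotl(inv, 4) ^ 0x63)
    return sbox

S = build_sbox()

def C(e, y):
    """C_e[y]: all bytes s with S[s] ^ S[s ^ y] == 2**e."""
    return {s for s in range(256) if S[s] ^ S[s ^ y] == 1 << e}

class SimulatedCard:
    """Hypothetical stand-in for the smartcard (byte 0 of round 1 only)."""
    def __init__(self, k0, seed=0):
        self._k0 = k0
        self._f = list(range(256))          # secret injective f_k
        random.Random(seed).shuffle(self._f)
        self.faults = 0

    def collision_info(self, p0, flip_bit=None):
        state = S[p0 ^ self._k0]            # state byte after SubBytes
        if flip_bit is not None:
            state ^= 1 << flip_bit          # transient single-bit fault
            self.faults += 1
        return self._f[state]

def one_experiment(card, table, q0, e):
    """Induce one fault and return (C_e[p0 ^ q0], key candidates)."""
    p0 = table[card.collision_info(q0, flip_bit=e)]
    c = C(e, p0 ^ q0)
    return c, {p0 ^ s for s in c}

def recover_k0(card, e=0, q0=0):
    # Fault-free collision information for all 256 values of p0 (set B).
    table = {card.collision_info(p0): p0 for p0 in range(256)}
    c, cand1 = one_experiment(card, table, q0, e)            # condition (5.1)
    # Choose q0' such that q0' ^ s is outside C_e[p0 ^ q0] for all candidates s.
    q1 = next(q for q in range(256) if all((q ^ s) not in c for s in cand1))
    _, cand2 = one_experiment(card, table, q1, e)            # condition (5.2)
    return cand1 & cand2

card = SimulatedCard(k0=0xA7)
print(sorted(recover_k0(card)), card.faults)
```

The returned set always contains the true key byte, and the fault counter confirms that two faults were induced per key byte.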

5.4.2 Second Attack

The scenario for this attack is as follows. We assume that the adversary A can flip a specific

bit e of the intermediate state $p^{(0),(AR)}$. We also assume that collision information remains

valid over the time span of the attack. Finally, we assume that the smartcard is protected

by a MEM modelled as a function $h: \{0,1\}^8 \to \{0,1\}^8$. This implies that after a flip of bit e

the encryption continues using the value $h^{-1}(h(p_i \oplus k_i) \oplus 2^e)$ instead of $p_i \oplus k_i$. Therefore, we

assume that we have no information about the impact of bit flips on the encryption process.

The attack is divided into two steps. In the first step the adversary A collects the necessary

information to compute a function $g_0$ that equals $x \mapsto h(x \oplus k_0)$ up to an additive constant. To

do so A selects a set S of 256 plaintexts p that take on all different values in byte $p_0$ and

that are equal in each other byte. A uses the smartcard to derive the collision information

for each of these plaintexts by evaluating $f_k(h(p_0^{(0),(AR)}), -)$ and stores it in the table B. Then

A encrypts plaintexts p of the set S and induces a bit fault into bit $0 \le e \le 7$ of $h(p_0^{(0),(AR)})$

and compares the collision information $f_k(h(p_0^{(0),(AR)}), e)$ with the entries of table B to find

the corresponding plaintext $p_0'$. So A knows the difference

$h(p_0 \oplus k_0) \oplus h(p_0' \oplus k_0) = 2^e$

and stores the triple $(p_0, p_0', e)$ in a difference table ∆B. This step is repeated for different

plaintexts p and for different faulty bit positions until A has received enough information to

compute the differences

$h(p_0 \oplus k_0) \oplus h(p_0' \oplus k_0)$

of one byte $p_0$ with all other bytes $p_0'$. The details are given in the following lemma.

Lemma 6 Let $m: \{0,1\}^q \to \{0,1\}^q$ be an unknown function defined over $\mathbb{F}_{2^q}$. There exists

a set D of $2^q - 1$ pairs $(u, v) \in \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$ with the following property: If for all $(u, v) \in D$

we know $e \in \{0,\ldots,q-1\}$ such that $m(u) \oplus m(v) = 2^e$, then one can determine a function

$g: \{0,1\}^q \to \{0,1\}^q$ such that $g \oplus c = m$ for some constant $c \in \mathbb{F}_{2^q}$.

Proof. Given some set $D \subseteq \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$ we construct a graph G whose set of vertices is $\mathbb{F}_{2^q}$

as follows. We connect two vertices u, v with an edge of weight e if $(u, v) \in D$.

If in G there exists a path between two vertices x, y then the difference $m(x) \oplus m(y)$ is

determined by the differences of pairs in D. Furthermore, if the graph G is connected we can

compute the difference $m(x) \oplus m(y)$ for all $(x, y) \in \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$. In particular, we can determine

all differences of the form $m(u) \oplus m(u_0)$ for an arbitrary but fixed input $u_0$. Using Lagrange

interpolation we can compute the function $g(u) = m(u) \oplus m(u_0)$. Setting $c := m(u_0)$ proves

the lemma.

Next we describe a set D of pairs (u, v) with known differences $m(u) \oplus m(v) = 2^e$,

such that the graph G as defined above is in fact connected. First we fix an arbitrary

$e_1 \in \{0,\ldots,q-1\}$. Then there exists a set $D_1$ of $2^{q-1}$ distinct pairs $(u, v) \in \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$ such

that $m(u) \oplus m(v) = 2^{e_1}$. All pairs in $D_1$ will be elements of D. If we consider the graph

whose edges are defined by pairs in $D_1$ we get a graph $G_1$ on the vertex set $\mathbb{F}_{2^q}$ that consists

of $2^{q-1}$ connected components each consisting of exactly 2 vertices.

Next we choose $e_2 \neq e_1$. Then there exists a set $D_2$ of $2^{q-2}$ pairs of vertices (u, v) with

$m(u) \oplus m(v) = 2^{e_2}$ such that each pair in $D_2$ connects different connected components of $G_1$.

We call the resulting graph $G_2$. The set D will also contain all elements from $D_2$.

Continuing in this way with all possible $e_i \in \{0,\ldots,q-1\}$ we get sets of pairs $D_1, D_2, \ldots, D_q$

and graphs $G_1, G_2, \ldots, G_q$ such that $G_i$ has $2^{q-i}$ connected components. In particular, $G_q$ is

connected. Moreover, the edges of $G_q$ are given by the pairs in $D := \bigcup_{i=1}^{q} D_i$. The size of D

is $2^q - 1$. This proves the lemma. □
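
The construction in the proof can be sketched as follows, assuming that m is a bijection (as is the case for an invertible memory encryption function). The function name recover_up_to_constant and the oracle interface are our own illustrative choices; the oracle stands for one fault induction followed by a lookup in table B. Instead of Lagrange interpolation the sketch simply stores g as a lookup table, which carries the same information.

```python
import random
from collections import deque

def recover_up_to_constant(oracle, q=8):
    """oracle(u, e) returns v with m(u) ^ m(v) == 2**e and costs one fault.
    Returns (g, faults) with g[u] == m(u) ^ m(0) for all u."""
    n = 1 << q
    parent = list(range(n))                 # union-find over components

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = [[] for _ in range(n)]
    faults = 0
    for e in range(q):                      # round i uses bit position e_i
        merged = set()                      # roots already merged this round
        for u in range(n):
            if find(u) in merged:
                continue
            v = oracle(u, e)
            faults += 1
            edges[u].append((v, 1 << e))
            edges[v].append((u, 1 << e))
            parent[find(u)] = find(v)       # join the two components
            merged.add(find(u))

    g = {0: 0}                              # fix u_0 = 0, so g(u_0) = 0
    queue = deque([0])
    while queue:                            # propagate differences along paths
        u = queue.popleft()
        for v, d in edges[u]:
            if v not in g:
                g[v] = g[u] ^ d
                queue.append(v)
    return g, faults

# Simulation with a hypothetical random memory encryption h and key byte k0.
rng = random.Random(1)
h = list(range(256))
rng.shuffle(h)
h_inv = [0] * 256
for x, y in enumerate(h):
    h_inv[y] = x
k0 = 0x3C

def oracle(u, e):
    # The plaintext byte whose encrypted state equals h(u ^ k0) with bit e flipped.
    return h_inv[h[u ^ k0] ^ (1 << e)] ^ k0

g0, faults = recover_up_to_constant(oracle)
constants = {g0[x] ^ h[x ^ k0] for x in range(256)}
print(faults, len(constants))
```

In each round only one vertex per connected component is faulted, so the number of faults halves from round to round, matching the sizes of the sets $D_i$ in the proof.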

We want to apply Lemma 6 to the function $h(x \oplus k_0)$. It is easy to see that A can compute

exactly the set of differences D described in the proof of Lemma 6 since he is able to flip a

specific bit. Hence, knowing D the adversary A can compute a function $g_0: \{0,1\}^8 \to \{0,1\}^8$

such that for all $x \in \mathbb{F}_{256}$ the difference $g_0(x) \oplus h(x \oplus k_0)$ is some constant $c_0 \in \mathbb{F}_{256}$. Since

A does not know the constant $c_0$ he does not get any information about the key byte $k_0$ at

this point.

A continues by computing for all other byte positions $1 \le i \le 15$ functions $g_1,\ldots,g_{15}$ such

that for all $x \in \mathbb{F}_{256}$ the function $g_i: \{0,1\}^8 \to \{0,1\}^8$ has the property that $g_i(x) \oplus h(x \oplus k_i) = c_i$

for some unknown constant $c_i \in \mathbb{F}_{256}$. Each of the $g_i$'s does not reveal any information

about the involved key byte $k_i$ because the constant $c_i$ can take on all possible values and is

unknown to A.

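To derive information about the key, A proceeds as follows. He guesses two candidates

$\hat{k}_0, \hat{k}_i$ for the key bytes $k_0, k_i$, respectively. To test this hypothesis on the key, A selects several

bytes x uniformly at random and computes

$g_0(x \oplus \hat{k}_0) = h(x \oplus \hat{k}_0 \oplus k_0) \oplus c_0$

and

$g_i(x \oplus \hat{k}_i) = h(x \oplus \hat{k}_i \oplus k_i) \oplus c_i$.

Depending on the hypothesis $(\hat{k}_0, \hat{k}_i)$ the difference

$t_{0,i} := g_0(x \oplus \hat{k}_0) \oplus g_i(x \oplus \hat{k}_i)$

computes to

$h(x) \oplus c_0 \oplus h(x) \oplus c_i = c_0 \oplus c_i$, if $\hat{k}_0 \oplus k_0 = \hat{k}_i \oplus k_i$ (5.3)

$h(x \oplus \hat{k}_0 \oplus k_0) \oplus c_0 \oplus h(x \oplus \hat{k}_i \oplus k_i) \oplus c_i$, if $\hat{k}_0 \neq k_0$ and $\hat{k}_i \neq k_i$ (5.4)

$h(x) \oplus c_0 \oplus h(x \oplus \hat{k}_i \oplus k_i) \oplus c_i$, if $\hat{k}_0 = k_0$ and $\hat{k}_i \neq k_i$ (5.5)

$h(x \oplus \hat{k}_0 \oplus k_0) \oplus c_0 \oplus h(x) \oplus c_i$, if $\hat{k}_0 \neq k_0$ and $\hat{k}_i = k_i$ (5.6)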

Now we assume that the function h has the following property. There do not exist constants

$a, c \in \mathbb{F}_{256}$ with $c \neq 0$ such that $h(x) \oplus a = h(x \oplus c)$ for all x. Note that this assumption does not

restrict the choice of h for two reasons. Firstly, a function used for memory encryption that

does not have this property contains too much structure and is probably easier to attack.

Secondly, most functions have this property. In fact, a random function has the property

with probability at least $1 - 2^{-127}$.

This assumption implies that unlike in case (5.3) in cases (5.4), (5.5), (5.6) the difference

$t_{0,i}$ is not constant. Moreover, if the guess $\hat{k}_0, \hat{k}_i$ was correct, that is $\hat{k}_0 = k_0$ and $\hat{k}_i = k_i$, then

A will always be in case (5.3). Now A can easily test the hypothesis $(\hat{k}_0, \hat{k}_i)$ by computing

$t_{0,i}$ for several bytes x. If $t_{0,i}$ varies for several different values of x then A knows that he is

not in case (5.3). It follows that the pair $(\hat{k}_0, \hat{k}_i)$ cannot be correct. On the other hand if $t_{0,i}$

remains constant A concludes to be in case (5.3) and keeps the pair $(\hat{k}_0, \hat{k}_i)$ as a potentially

correct candidate.

This implies that for every possible key byte $\hat{k}_0$ the adversary A obtains a single candidate

$\hat{k}_i$ for $1 \le i \le 15$ that fulfills condition (5.3). Guessing $\hat{k}_0$ the adversary A can compute a

vector $(\hat{k}_1,\ldots,\hat{k}_{15})$ composed of unique candidates $\hat{k}_i$ that only depend on $\hat{k}_0$. To uniquely

determine the correct key, A simply mounts an exhaustive search attack on the 256 possible

values of $\hat{k}_0$.
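
The hypothesis test is an offline computation and can be illustrated with a short simulation. All concrete values below (the random permutation standing in for h, the key bytes and the constants) are hypothetical, and the sketch checks all 256 bytes x rather than a random sample, which is a simplification on our part.

```python
import random

rng = random.Random(7)
h = list(range(256))
rng.shuffle(h)                       # stand-in for the unknown memory encryption

k0, ki = 0x2B, 0x7E                  # hypothetical secret key bytes
c0, ci = 0x15, 0xA6                  # hypothetical unknown constants

# What the adversary holds after the first step: h(x ^ k) up to a constant.
g0 = [h[x ^ k0] ^ c0 for x in range(256)]
gi = [h[x ^ ki] ^ ci for x in range(256)]

def consistent(k0_hat, ki_hat):
    """True if t_{0,i} takes the same value for every x, i.e. case (5.3)."""
    t_first = g0[k0_hat] ^ gi[ki_hat]            # value for x = 0
    return all(g0[x ^ k0_hat] ^ gi[x ^ ki_hat] == t_first for x in range(256))

def candidates_for(k0_hat):
    """All guesses for k_i that survive the test for a fixed guess of k_0."""
    return [ki_hat for ki_hat in range(256) if consistent(k0_hat, ki_hat)]

print(candidates_for(k0))            # guess for k_0 is correct
print(candidates_for(k0 ^ 0x01))     # guess for k_0 is off by one bit
```

For each guess of $k_0$ the surviving guess for $k_i$ satisfies $\hat{k}_0 \oplus k_0 = \hat{k}_i \oplus k_i$, so a wrong guess for $k_0$ yields a consistently shifted, wrong guess for $k_i$.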

Cost Analysis A has to induce 255 faults to compute a function $g_i$ according to Lemma

6. To test a hypothesis of the key A does not need to induce faults. So the overall number

of faults is $16 \cdot 255 = 4080$.

Improvement The previous attack can be improved with respect to the number of induced

faults as shown below. In the first step A computes the function $g_0$ such that $g_0(x) = h(x \oplus k_0) \oplus c_0$,

where $c_0 \in \mathbb{F}_{256}$ is unknown, as above. To determine the other functions

$g_1,\ldots,g_{15}$ the adversary A uses the fact that each $g_i$ is related to $g_0$ by the following equation

$g_i(x) = h(x \oplus k_i) \oplus c_i = g_0(x \oplus k_i \oplus k_0) \oplus c_i \oplus c_0$.

So knowing $g_0$ (determined as above) A computes a list of all 256 functions $g_{0,s}(x) := g_0(x \oplus s)$,

$s \in \mathbb{F}_{256}$. To determine which of these functions equals $g_i$ up to the additive constant $c_i \oplus c_0$, the adversary A chooses arbitrary

$p_i, q_i$ and evaluates $f_k(h(p_i^{(0),(AR)}), -)$ and $f_k(h(q_i^{(0),(AR)}), e)$ at byte position i. Using this

information A computes some differences $g_i(p_i) \oplus g_i(q_i)$ as described in the computation of

$g_0$ above.

To determine the correct function $g_{0,s_i}$, the adversary A simply checks which of the

functions $g_{0,s}$ fulfills these differences simultaneously until only one function remains. See

below for the required number of experiments. Then A knows the sum $s_i = k_0 \oplus k_i$ of two

AES key bytes. A repeats this procedure for all other byte positions $1 \le i \le 15$. As before

guessing $\hat{k}_0$ the adversary A can determine a unique candidate $\hat{k}_i$. That means that A has

a vector $(\hat{k}_1,\ldots,\hat{k}_{15})$ with fixed candidates $\hat{k}_i$ for each of the 256 candidates $\hat{k}_0$. Like in the

original version of this attack this reduces the set of possible AES keys to only 256 candidates.

An exhaustive search reveals the full AES key.

Cost Analysis To compute $g_0$ the adversary A has to induce 255 faults like in the original

version. To determine further $g_i$'s, A has to collect a set of differences $g_i(p) \oplus g_i(q)$ that is

fulfilled by only one of the 256 functions $g_{0,s}$ simultaneously. Notice that if the function $g_{0,s}$

fulfills a difference, i.e., $g_0(p \oplus s) \oplus g_0(q \oplus s) = g_i(p) \oplus g_i(q)$, then because of symmetry the

function $g_{0,s'}$ given by $s' := p \oplus q \oplus s$ also fulfills this difference since

$g_0(p \oplus (p \oplus q \oplus s)) \oplus g_0(q \oplus (p \oplus q \oplus s)) = g_0(q \oplus s) \oplus g_0(p \oplus s) = g_i(q) \oplus g_i(p)$.

Assuming that the 256 functions $g_{0,s}$ behave like random permutations (except for the

symmetry) we expect that A needs 2 differences to uniquely identify the correct one with high

probability. We tested this assumption by various experiments and in our experiments it

proved to be correct. Hence, we expect that A needs $255 + 15 \cdot 2 = 285$ faults to determine

the full 128-bit AES key.

As mentioned before we do not consider the complexity of the offline computations like

Lagrange interpolation etc. since all these computations can be performed efficiently without

access to the smartcard.
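
The improved second step can be sketched as follows. The simulation assumes that $g_0$ has already been recovered as a table; the names measure_difference and shifts_consistent_with are hypothetical, as are the random permutation standing in for h and all key and constant values.

```python
import random

rng = random.Random(11)
h = list(range(256))
rng.shuffle(h)                       # stand-in for the unknown memory encryption
h_inv = [0] * 256
for x, y in enumerate(h):
    h_inv[y] = x

k0, ki, c0 = 0x4D, 0xE2, 0x99        # hypothetical secrets
g0 = [h[x ^ k0] ^ c0 for x in range(256)]   # recovered in the first step

def measure_difference(q, e):
    """Simulate one fault at byte position i: flipping bit e of h(q ^ k_i)
    collides with the fault-free state of plaintext byte p.
    Returns (p, q, d) with g_i(p) ^ g_i(q) == d == 2**e."""
    p = h_inv[h[q ^ ki] ^ (1 << e)] ^ ki
    return p, q, 1 << e

def shifts_consistent_with(differences):
    """All s such that g_{0,s}(x) = g0(x ^ s) fulfills every measured difference."""
    return [s for s in range(256)
            if all(g0[p ^ s] ^ g0[q ^ s] == d for p, q, d in differences)]

diffs = [measure_difference(0x00, 0), measure_difference(0x01, 1)]
print(shifts_consistent_with(diffs[:1]))   # one difference: symmetric partner remains
print(shifts_consistent_with(diffs))       # two differences
```

With a single difference both $s$ and $p \oplus q \oplus s$ survive, which is the symmetry noted in the cost analysis; the second difference is what is expected to break the tie.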

5.4.3 Third Attack

First, we describe the scenario in which the attack takes place. We assume that A can

flip a specific bit at position e of the intermediate state $p^{(1),(SB)}$. We do not assume that

collision information remains valid over the time span of the attack. Hence, A is only able

to compare collision information of two recently obtained measurements. Finally, we assume

that the smartcard is not protected by a MEM. Because it is always clear from the context

we simplify notation by identifying elements of $\mathbb{F}_{256}$ with their canonical representation as

elements of the set $\{0,\ldots,255\}$.

As a basis for his attack A fixes some input difference $\Delta_{in}$ and output difference $\Delta_{out}$ of

the application of the sbox in round 1. To be able to detect collisions with a single bit flip

we restrict $\Delta_{out}$ to be a power of 2.

The analysis of the sbox shows that there are a lot of suitable values for $\Delta_{in}$ and $\Delta_{out}$.

E.g., A chooses $\Delta_{in} = 10$ and $\Delta_{out} = 4$. Only the two pairs

$Z_1 := (p_0 \oplus k_0 = 0,\ q_0 \oplus k_0 = 10)$

and

$Z_2 := (p_0 \oplus k_0 = 244,\ q_0 \oplus k_0 = 254)$

together with their commuted counterparts fulfill the chosen requirements. A fault that is

induced into bit 2 of $q_0^{(1),(SB)}$ after the application of the sbox results in a collision for one of

these pairs. In order to detect such a collision the collision information $f_k$ should have the

property that

$f_k(p_0^{(1),(SB)}, -) = f_k(q_0^{(1),(SB)}, 2)$.

If A finds such a collision he can conclude that the key byte $k_0$ is an element of the set

$\{p_0 \oplus 0,\ p_0 \oplus 10,\ p_0 \oplus 244,\ p_0 \oplus 254\}$.

More precisely, the attack using $f_k$ with the property defined above works as follows.

First, A generates all 128 pairs of plaintexts (p, q) (without symmetry) that have difference

10 in byte 0 ($p_0 = q_0 \oplus 10$) and are equal in the other bytes, i.e.,

$\Delta(p_i, q_i) = \begin{cases} 10, & \text{if } i = 0 \\ 0, & \text{otherwise.} \end{cases}$

A knows that exactly two of these pairs have output difference 4 in byte 0. The input

difference of the sbox is the same as the difference of $p_0$ and $q_0$ since AddRoundKey does not

change it. A checks all 128 pairs (p, q) until

$f_k(p_0^{(1),(SB)}, -) = f_k(q_0^{(1),(SB)}, 2)$.

Taking the symmetry into account it follows that either $p_0 \oplus k_0 = 0$, $p_0 \oplus k_0 = 10$, $p_0 \oplus k_0 = 244$

or $p_0 \oplus k_0 = 254$. So there are only 4 candidates for $k_0$ left. A can repeat this attack for all

byte positions of the state. This leaves $2^{2 \cdot 16} = 2^{32}$ possible keys. To determine the complete

128-bit AES key A mounts an exhaustive search attack.
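
The sbox analysis underlying this attack is easy to reproduce. The following sketch uses hypothetical helper names (build_sbox, pairs_with, key_candidates) and a hypothetical key byte; it lists the unordered pairs for a chosen input and output difference and the resulting key candidates once a colliding plaintext byte $p_0$ is known.

```python
def build_sbox():
    """Generate the AES S-box: inverse in GF(2^8) followed by the affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r

    def rotl(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if mul(x, y) == 1) if x else 0
        sbox.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3)
                    ^ rotl(inv, 4) ^ 0x63)
    return sbox

S = build_sbox()

def pairs_with(delta_in, delta_out):
    """Unordered pairs (x, x ^ delta_in) whose sbox outputs differ by delta_out."""
    return sorted({tuple(sorted((x, x ^ delta_in)))
                   for x in range(256)
                   if S[x] ^ S[x ^ delta_in] == delta_out})

def key_candidates(p0, delta_in, delta_out):
    """Candidates for k0 once a collision for plaintext byte p0 was detected."""
    return sorted({p0 ^ v
                   for pair in pairs_with(delta_in, delta_out)
                   for v in pair})

print(pairs_with(10, 4))             # the pairs Z1 and Z2
k0 = 0x5A                            # hypothetical key byte
p0 = 244 ^ k0                        # a plaintext byte that collides
print(key_candidates(p0, 10, 4))
```

The candidate list has four entries per byte position, which is where the $2^{32}$ remaining keys come from.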

Cost Analysis In the first step A examines 128 pairs of plaintexts with difference 10.

Two of these pairs result in a collision so the expected number of faults A has to induce is

$(2/128)^{-1} = 64$. To compute a 128 bit AES key, A expects to induce $16 \cdot 64 = 1024$ faults

and a brute force attack of size $2^{32}$.

Alternative To determine the correct candidate of the key byte A could also repeat the

same procedure as above with another difference. We assume that $f_k$ lets A detect collisions

when flipping bit 3, i.e.

$f_k(p_0'^{(1),(SB)}, -) = f_k(q_0'^{(1),(SB)}, 3)$.

If we consider all pairs $(p', q')$ such that

$\Delta(p_i', q_i') = \begin{cases} 5, & \text{if } i = 0 \\ 0, & \text{otherwise} \end{cases}$

the analysis of the sbox shows that

$Z_3 := (p_0' \oplus k_0 = 0,\ q_0' \oplus k_0 = 5)$

and

$Z_4 := (p_0' \oplus k_0 = 122,\ q_0' \oplus k_0 = 127)$

are the only pairs with $\Delta_{in} = 5$ and $\Delta_{out} = 8$. Detecting one of these pairs using $f_k$ yields

again a set of 4 candidates for $k_0$.

Next, A computes the difference of plaintexts $p_0$ and $p_0'$. The difference must be one of

the differences listed in Table 5.1. Since all possible differences are distinct, A can determine

$p_0 \oplus k_0$ and hence $k_0$.

Cost Analysis Following the cost analysis as above this method determines the correct

candidate of each key byte with 1024 faults as in the previous method plus additional 1024

faults.

p0′⊕k0 \ p0⊕k0      0    10   244   254
0                   0    10   244   254
5                   5    15   241   251
122               122   112   142   132
127               127   117   139   129

Table 5.1: All possible differences of $p_0$, $p_0'$ (columns: $p_0 \oplus k_0$, rows: $p_0' \oplus k_0$)
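
The table lookup can be written down directly. In this sketch the names FIRST, SECOND, lookup and key_byte are our own; the two tuples are the possible values of $p_0 \oplus k_0$ and $p_0' \oplus k_0$ from the two collision searches, and the key byte used in the example is hypothetical.

```python
FIRST = (0, 10, 244, 254)     # possible p0 ^ k0 after the first collision
SECOND = (0, 5, 122, 127)     # possible p0' ^ k0 after the second collision

# p0 ^ p0' = (p0 ^ k0) ^ (p0' ^ k0), so every table cell is a ^ b.
lookup = {a ^ b: (a, b) for a in FIRST for b in SECOND}

def key_byte(p0, p0_prime):
    """Recover k0 from the two plaintext bytes that produced collisions."""
    a, _ = lookup[p0 ^ p0_prime]   # a equals p0 ^ k0
    return p0 ^ a

for b in SECOND:                   # reprint the rows of the table
    print(b, [a ^ b for a in FIRST])

k0 = 0xC3                          # hypothetical key byte
print(hex(key_byte(244 ^ k0, 122 ^ k0)))
```

Because the dictionary ends up with 16 distinct keys, every observed difference identifies exactly one cell of the table.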

5.4.4 Fourth Attack

We assume that A can flip a bit of a specific byte of the intermediate state $p^{(1),(SB)}$. However,

he has no control over the bit position. Instead, we assume that all of the 8 possible bit flips

occur with the same probability 1/8. We also assume that collision information remains valid

over the time span of the attack. Finally, we assume that the smartcard is not protected by

a MEM.

The attack works as follows. In a first step A selects a set S of 256 plaintexts p that take

on all different values in byte $p_0$ and are equal in each other byte. A collects the collision

information $f_k(p_0^{(1),(SB)}, -)$ for all elements of S. Then he chooses an arbitrary plaintext q

and encrypts q inducing a fault into bit e of $q_0^{(1),(SB)}$. By comparing the collision information

$f_k(q_0^{(1),(SB)}, e)$ with the collision information collected in the first step A can determine the

corresponding plaintext $p_0$ such that

$S[p_0 \oplus k_0] = S[q_0 \oplus k_0] \oplus 2^e$.

Note that e is unknown to A since he does not have any influence on the bit position. A can

test all candidates $\hat{k}_0$ of $k_0$ by simply checking if $S[p_0 \oplus \hat{k}_0] \oplus S[q_0 \oplus \hat{k}_0]$ is a power of 2. If this

condition is true A stores $\hat{k}_0$ as a possible key value and discards it otherwise. An analysis

of the AES sbox shows that after checking all candidates a set of at most 16 candidates will

remain. A repeats this procedure with different $q_0$ until only one candidate is left. Using a

refined method similar to the attack in Section 5.4.1 using approximately 3 different $q_0$ we

can determine the correct key byte with high probability. Hence, we expect that this attack

needs roughly $3 \cdot 16 = 48$ faults.
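
The candidate filtering can be simulated as follows. The sketch implements the plain repetition with random $q_0$, not the refined method mentioned above, so its fault count is only indicative. The function names, the fault budget max_faults and the key byte are hypothetical, and the table lookup on the card is replaced by a direct computation with the inverse sbox.

```python
import random

def build_sbox():
    """Generate the AES S-box: inverse in GF(2^8) followed by the affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r

    def rotl(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if mul(x, y) == 1) if x else 0
        sbox.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3)
                    ^ rotl(inv, 4) ^ 0x63)
    return sbox

S = build_sbox()
S_INV = [0] * 256
for x, y in enumerate(S):
    S_INV[y] = x

POWERS_OF_TWO = {1 << e for e in range(8)}

def surviving_candidates(p0, q0):
    """Guesses k for which S[p0 ^ k] ^ S[q0 ^ k] is a single-bit difference."""
    return {k for k in range(256) if S[p0 ^ k] ^ S[q0 ^ k] in POWERS_OF_TWO}

def attack(k0, rng, max_faults=50):
    candidates = set(range(256))
    faults = 0
    while len(candidates) > 1 and faults < max_faults:
        q0 = rng.randrange(256)
        e = rng.randrange(8)                       # bit position unknown to A
        p0 = S_INV[S[q0 ^ k0] ^ (1 << e)] ^ k0     # found via collision lookup
        faults += 1
        candidates &= surviving_candidates(p0, q0)
    return candidates, faults

cands, faults = attack(0x6E, random.Random(3))
print(sorted(cands), faults)
```

Intersecting the candidate sets of independent experiments is the simplest way to reach a single candidate; the true key byte survives every intersection by construction.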

5.4.5 Fifth Attack

We assume that A can flip a bit of a specific byte of the intermediate state $p^{(1),(SB)}$. However,

he has no control over the bit position. Instead, we assume that all of the 8 possible bit flips

in a position $b \in \{0,\ldots,7\}$ occur with the same probability 1/8. We do not assume that

collision information remains valid over the time span of the attack. Hence, A is only able

to compare collision information of two recently obtained measurements. Finally, we assume

that the smartcard is not protected by a MEM.

A chooses $\Delta_{in}$ of the sbox in round 1 in such a way that the number of pairs that have

difference $\Delta_{in}$ and output difference with Hamming weight 1 is maximal. This choice reduces

the number of faults A has to induce as we will see later. An analysis of the sbox shows

that $\Delta_{in} = 216$ is the best choice since 8 is the maximum number of pairs that fulfill the

requirements.

A single bit flip induced into $q_0^{(1),(SB)}$ may produce a collision if and only if $p_0 \oplus k_0$ is one

of the following values:

0, 2, 8, 28, 29, 41, 111, 117, 173, 183, 196, 197, 208, 216, 218, 241.

To detect the collision $f_k$ should have the property that

$f_k(p_0^{(1),(SB)}, -) = f_k(q_0^{(1),(SB)}, b)$. (5.7)

A collision implies that $k_0$ is an element of the set of 16 candidates

$L = \{p_0,\ p_0 \oplus 2,\ p_0 \oplus 8,\ p_0 \oplus 28,\ p_0 \oplus 29,\ p_0 \oplus 41,\ p_0 \oplus 111,\ p_0 \oplus 117,\ p_0 \oplus 173,$

$p_0 \oplus 183,\ p_0 \oplus 196,\ p_0 \oplus 197,\ p_0 \oplus 208,\ p_0 \oplus 216,\ p_0 \oplus 218,\ p_0 \oplus 241\}$.

To determine $k_0$ the adversary A first builds a list of all 128 pairs $(p_0, q_0)$ of plaintexts with

difference 216 in byte 0 and difference 0 in all other bytes. Then A selects an arbitrary

$q_0$, derives $f_k(q_0^{(1),(SB)}, b)$ of the corresponding plaintext and compares it with the collision

information $f_k(p_0^{(1),(SB)}, -)$ of the corresponding plaintext of $p_0$. A repeats this procedure

until he detects a collision. At this point A knows that $k_0$ is an element of the set L.

To identify the correct candidate A could start an exhaustive search or repeat the

procedure with a different combination of input and output differences. For example A chooses

input difference 4 and output difference 32. Since (88, 92) is the only such pair A can use $f_k$

as a special case of (5.7) having the property

$f_k(p_0^{(1),(SB)}, -) = f_k(q_0^{(1),(SB)}, 5)$

to test each candidate $\hat{k}_0 \in L$ of $k_0$.

To check whether a candidate $\hat{k}_0 \in L$ is equal to $k_0$, A derives the collision information

$f_k(p_0^{(1),(SB)}, -)$ and $f_k(q_0^{(1),(SB)}, b)$ for $p_0 = \hat{k}_0 \oplus 92$ and $q_0 = \hat{k}_0 \oplus 88$. Since (92, 88) is the only

pair with $\Delta_{in} = 4$ and Hamming weight of $\Delta_{out}$ equal to 1, the adversary A can check his hypothesis

$\hat{k}_0$. More precisely if $\hat{k}_0 \neq k_0$ the Hamming weight of the output difference will always be

greater than 1 except for the case that $p_0^{(0),(AR)} = 88$ and $q_0^{(0),(AR)} = 92$. But this case implies

that $\hat{k}_0 \oplus 4 = k_0$ which is impossible since every difference of two of the sixteen candidates

is different from 4. So a wrong hypothesis cannot create a collision. On the other hand if

$\hat{k}_0 = k_0$ then $p_0 \oplus k_0 = 92 \oplus \hat{k}_0 \oplus k_0 = 92$ and $q_0 \oplus k_0 = 88 \oplus \hat{k}_0 \oplus k_0 = 88$ is the demanded pair

and A will detect a collision using $f_k$.
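
The sbox properties used in this attack can be checked with a short script. The helper names (build_sbox, hamming_weight, single_bit_inputs) are hypothetical; the list THESIS_VALUES is copied from the text above so that it can be compared with the computed values.

```python
def build_sbox():
    """Generate the AES S-box: inverse in GF(2^8) followed by the affine map."""
    def mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r

    def rotl(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if mul(x, y) == 1) if x else 0
        sbox.append(inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3)
                    ^ rotl(inv, 4) ^ 0x63)
    return sbox

S = build_sbox()

def hamming_weight(x):
    return bin(x).count("1")

def single_bit_inputs(delta_in):
    """All x such that S[x] ^ S[x ^ delta_in] has Hamming weight 1."""
    return [x for x in range(256)
            if hamming_weight(S[x] ^ S[x ^ delta_in]) == 1]

THESIS_VALUES = [0, 2, 8, 28, 29, 41, 111, 117, 173, 183,
                 196, 197, 208, 216, 218, 241]

values_216 = single_bit_inputs(216)
print(values_216)
print(set(THESIS_VALUES) <= set(values_216))

# Largest number of single-bit pairs over all nonzero input differences.
print(max(len(single_bit_inputs(d)) // 2 for d in range(1, 256)))

# Input difference 4: the pair used to test each candidate in L.
print(single_bit_inputs(4), hex(S[88] ^ S[92]))

# No two of the sixteen candidates differ by 4.
print(any((a ^ 4) in THESIS_VALUES for a in THESIS_VALUES))
```

The last line is the check that rules out a false positive in the second step: a wrong candidate from L can never map onto the pair (88, 92).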

Cost Analysis The success probability of finding one of the 8 pairs in part one of the

attack choosing $p_0$ uniformly at random is $\frac{8}{128} \cdot \frac{1}{8} = \frac{1}{128}$. Hence 128 is the expected number

of faults A has to induce.

The success probability in the second step is $(1/8) \cdot (1/16) = 1/128$. So we expect that

A needs additional 128 faults. Hence the total number of faults to determine a key byte is

$2 \cdot 128 = 256$.

To compute a complete 128 bit AES key we expect that A needs $16 \cdot 256 = 4096$ faults.

5.5 Conclusion

In this chapter we introduced the concept of fault based collision attacks. We showed that

combining the concepts of fault attacks and collision attacks leads to powerful attacks. Fault

based collision attacks do not need faulty ciphertexts but only need collision information. It

turned out that this is a much weaker requirement.

Furthermore, we considered so called memory encryption mechanisms (MEM), an elaborate

countermeasure widely used to protect high-end security smartcards against side

channel attacks. We showed that using MEM in a straightforward manner does not increase

security as much as one would expect. E.g., we presented a fault based collision attack on

AES that breaks an implementation protected by a MEM by inducing only about 285 faults.

Moreover, we showed how to mount further fault based collision attacks on AES in different

scenarios. Table 5.2 shows an overview of the 5 attacks presented in this chapter. The first

row shows the precision of the fault induction needed for each of our attacks. The second

row shows whether the collision information is valid over the whole time span of the attack

or if it changes after a short period of time. The third row shows if the target smartcard is

protected by a MEM. The expected number of faults needed for the attack is shown in the

last row.

                          basic attack  attack 2  attack 3  attack 4  attack 5
Precision                 high          high      high      loose     loose
coll. information valid?  yes           yes       no        yes       no
MEM                       no            yes       no        no        no
# faults                  32            285       1024      48        4096

Table 5.2: Overview of the fault based collision attacks

To thwart our attack one has to be more careful. Using a MEM one has to ensure

that different memory encryption functions (keys) are used to protect different bytes of an

intermediate state. Furthermore, we suggest changing the keys of the memory encryption

frequently. Depending on the smartcard and the application one can also consider increasing

the block size of the memory encryption function, e.g., to 16 bit. This would increase the

complexity of fault based collision attacks.

For high-end security applications we suggest using a randomization strategy like the one

proposed in Chapter 4. Obviously, this approach is more expensive in terms of random bits.

However, it provides a much better security that can be scaled to meet the desired security

level.

Chapter 6

Cache Behavior Attacks (CBAs)

The performance of recent computers benefits from the progress in chip design and computer

architecture. In particular, the usage of fast but small buffers, so called cache memories, improves

the execution time of algorithms significantly. At first glance this helps to improve security

because even more complex cryptographic algorithms, e.g., encryption algorithms, could be

used without slowing down the system too much. However, performance improvements often

also open side channels that leak information about intermediate states of the encryption

process. In this chapter we analyze and formalize the information leakage due to cache

behavior.

It was first observed in (Hu 1992) and (Trostle 1998) that cache behavior opens a covert

channel. They did not focus on attacking cryptographic algorithms but analyzed the

multi-level security of complex systems. Later, (Kocher 1996) and (Kelsey, Schneier, Wagner and

Hall 1998) were the first who mentioned that cache behavior may be a possible point of

attack for cryptographic algorithms. During the selection process of AES the resistance of the

candidate algorithms against side channel attacks was investigated for example in (Daemen

and Rijmen 1999). At that time only time and power consuming operations such as

multiplication were considered. Table lookups, e.g., for efficient application of sboxes, were

regarded to be resistant against side channel attacks since they were supposed to be constant

time and constant power consuming. However, this turned out to be wrong.

The first theoretical cache behavior attack (CBA) was mounted on DES and presented

in (Page 2002). Later the authors of (Tsunoo, Saito, Suzaki, Shigeri and Miyauchi 2003c)

proved that cache attacks are a realistic threat for cryptographic algorithms. They performed

a cache based attack on DES that successfully determined the secret key. Page extended the

theoretical concept of CBAs in (Page 2003). He started to classify CBAs into time driven

CBAs and trace driven CBAs depending on the attacker's abilities. The subsequent publications of

practical attacks against AES (Bernstein 2005), (Osvik, Shamir and Tromer 2006), (Brickell,

Graunke, Neve and Seifert 2006) and RSA (Percival 2005) revealed the full power of cache

behavior attacks. These attacks even justify introducing a new class of CBAs, so called

access driven CBAs.

In this chapter we give the background of CBAs and present the progress in the area of

CBAs up to now. After that we present a different view on how to counteract CBAs that

leads to novel countermeasures. A more detailed description of the structure of this chapter

is as follows:

Section 6.1: Cache Mechanism and Technical Background

We give a brief summary of the memory management, i.e., the cache mechanism of

recent computers. All technical details that are necessary to understand CBAs and

countermeasures are explained here.

Section 6.2: Security Models for CBAs

In this section we describe the theoretical foundations of CBAs. To analyze attacks and

countermeasures one has to define the abilities of the attacker and the properties of the

underlying implementation. We distinguish three different models: time driven, trace

driven, and access driven CBAs. We propose to use a strengthened variation of the

access driven model as a basis for security analysis and for developing countermeasures.

Section 6.3: Access Driven CBAs on AES

In this section we describe two concrete CBAs on AES. The first one is due to (Osvik

et al. 2006). It is based on the first round(s) of AES. The second attack is more efficient

and only focuses on the last round of AES. The differences of these attacks lead to a

new countermeasure that we present in Section 6.7.2.

Section 6.4: General Methods to Thwart CBAs

This section provides a list of methods to thwart cache behavior attacks proposed so

far.

Section 6.5: Information Leakage and Resistance

In this section we introduce the concepts of information leakage and resistance, which we
use to estimate the susceptibility of an implementation. Information leakage
allows us to estimate the uncertainty about the secret key that remains for an attacker
after successfully mounting a CBA. The resistance is a measure that indicates the
expected effort for an adversary to derive some information about the secret key.

Section 6.6: Information Leakage and Resistance of Selected Implementations

In this section we examine the information leakage and the resistance, as defined in the
previous section, of selected implementations of AES against access driven CBAs. Besides
well-known implementations we also consider new implementations of AES designed to
counteract CBAs. We show that one of the new implementations is provably secure even in
our strengthened access driven CBA model.

Section 6.7: Countermeasures Based on Permutations

The usage of random permutations is one of the countermeasures proposed in the
literature. We analyze the security a random permutation provides by describing an
attack on an AES implementation protected by a random permutation. In the
sequel, we introduce so-called distinguished permutations. A distinguished permutation
is a permutation having a special property that ensures that some key bits are protected
unconditionally. This is an improvement over the usage of general permutations that
leak all bits of the secret key as our attack in Section 6.7.1 shows.

Section 6.8: Concluding Remarks

Finally, we recapitulate CBAs and countermeasures. We describe how to combine the

proposed countermeasures to improve the ratio of security and efficiency.

6.1 Cache Mechanism and Technical Background

In this section we introduce the technical background of cache based attacks. A thorough

treatment of computer architecture and memory management is given in (Hennesey and

Patterson 2002). (Handy 1998) treats cache memories in even greater depth from
a processor designer's point of view.

The processor (CPU) and the main memory (RAM) are the two main building blocks
of recent computers that play an important role in cache based attacks. The CPU
has only a few, but very fast, so-called CPU registers (short: registers), each having the size of a
processor word, e.g., 32 or 64 bits. To process data that is stored in RAM, the data has
to be transferred to the CPU registers. Hence, RAM should have at least two properties:

1. RAM should be large so that it can store a lot of data.

2. RAM should be fast so that data can be accessed and processed quickly.

However, with current technology these two properties are contradictory. Memory that has to
be fast is necessarily restricted to small size and memory that has to be large is necessarily
slow. In order to compensate for this discrepancy, modern computers use a hierarchy of typically
4 different levels of memory that differ in size and speed. The CPU registers are placed in
level 1 of the memory hierarchy. They have the shortest access time but are limited altogether
to less than 1 KB. To compensate for the rather slow accesses to the main memory placed in level
3, the so-called cache memory (short: cache) is placed in level 2. Cache is much faster than
the main memory but its size is restricted to a few megabytes. So, cache memory constitutes
a trade-off between the small but fast CPU registers and the large but slow main memory (see footnote 1).
The hard drive is placed in level 4 of the memory hierarchy. It is orders of magnitude larger
but also orders of magnitude slower than the other types of memory. However, the hard
drive has no influence on CBAs. Table 6.1 shows the memory hierarchy of recent computers
together with the typical sizes and the typical access times of the memories at the
different levels.

Footnote 1: Note that in recent CPUs the cache memory (level 2) is itself split into several so-called
cache levels that again differ in size and speed. However, in order to simplify descriptions we stick
to the simpler situation with only a single cache level.

The cache memory is divided into d so-called cache lines, each of size λ bits. The set of
cache lines is partitioned into m so-called cache sets, each containing exactly d/m cache lines (see footnote 2).
Likewise, the memory is divided into so-called memory blocks of size λ bits. The memory
blocks are labeled with consecutive numbers referred to as the address of the memory block.

Every transfer of data from the main memory is redirected through the cache. Whenever
data should be transferred to the CPU it is checked whether the data is already in the cache
or not. If the requested data is not in the cache, the whole memory block B that contains
the data is first loaded from the main memory to the cache and then the data is transferred
to the processor. This is called a cache miss. To which cache set and cache line the data is
transferred depends on the address M of the requested data. M is split into t so-called tag
bits, s so-called set bits and b offset bits as depicted in Figure 6.1. The set bits determine
the cache set. According to a placement strategy, one of the cache lines in the determined
cache set is chosen to host the data of the memory block B. The tag bits are also stored as
meta information about the content of the cache line. Since the cache is much smaller than
the main memory, the previous content of the chosen cache line has to be overwritten. This
is called data eviction.

On the other hand, if the cache already contains the requested data, it is directly
transferred from the cache to the processor, avoiding the access to the slow main memory. This
is called a cache hit. Hence, if a process uses certain data more often, after the first access
the data resides in the cache and can be quickly transferred to the processor. To find the
requested data in the cache, the address M is split as above into set bits, tag bits and offset
bits. The set bits determine the correct cache set S. The data is contained in that cache line
of S whose tag bits match the tag bits of the requested data.
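The address split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_address is hypothetical, addresses are treated as plain integers, and the offset is taken to be the b lowest bits, followed by the s set bits and then the tag bits.

```python
def split_address(address: int, s: int, b: int) -> tuple[int, int, int]:
    """Split an address M into (tag, set index, offset).

    Assumed layout (most to least significant): tag bits | s set bits | b offset bits.
    """
    offset = address & ((1 << b) - 1)            # lowest b bits
    set_index = (address >> b) & ((1 << s) - 1)  # next s bits select the cache set
    tag = address >> (b + s)                     # remaining high bits
    return tag, set_index, offset


# Illustrative parameters: b = 6 offset bits and s = 6 set bits.
tag, set_index, offset = split_address(0x12345, s=6, b=6)
```

Two addresses with the same set index but different tags compete for the cache lines of the same cache set, which is exactly the situation in which data eviction can occur.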

A cache that groups d/m cache lines in a cache set is called a (d/m)-way associative cache.

Footnote 2: Note that d is always chosen as a multiple of m.

level        | 1 (registers) | 2 (cache)     | 3 (main memory) | 4 (hard drive)
typical size | < 1 KB        | < 16 MB       | < 16 GB         | > 100 GB
access time  | 0.25-0.5 ns   | 0.5-25 ns     | 80-250 ns       | 5 ms
hit time     | -             | 1-2 cycles    | 100 cycles      | 10,000,000 cycles
miss penalty | -             | 25-100 cycles | -               | -

Table 6.1: The memory hierarchy


[Figure: the data address consists of the memory block address (t tag bits followed by s set bits) and b offset bits]

Figure 6.1: Partitioning the address of requested data

If d/m = 1 then the cache is called a direct mapped cache. If every memory block can be
hosted by every cache line the cache is called a fully associative cache. Figure 6.2 illustrates
the different types of caches. On the one hand, the larger the number d/m of cache lines per
cache set is, the higher is the chance to avoid eviction of data that is still needed. On the
other hand, the larger the number d/m is, the longer it takes to find the requested data in
the cache. Therefore, most recent processors use direct mapped caches. Some processors use
2- or 4-way associative caches.
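The effect of associativity on eviction can be illustrated with a small cache simulation. The following is a minimal sketch and not a model of any real processor: the class name is hypothetical, memory blocks are identified by their block address only, and least-recently-used replacement is assumed as the placement strategy, which the text above leaves open.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache with num_sets cache sets of `ways` cache lines each (LRU assumed)."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict per cache set; keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_address: int) -> bool:
        """Access a memory block; return True on a cache hit, False on a miss."""
        set_index = block_address % self.num_sets
        tag = block_address // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)        # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)     # data eviction of the least recently used line
        lines[tag] = None
        return False


# Both caches have 4 cache lines in total; blocks 0 and 4 map to the same cache set.
direct_mapped = SetAssociativeCache(num_sets=4, ways=1)
two_way = SetAssociativeCache(num_sets=2, ways=2)
pattern = [0, 4, 0, 4]
hits_direct = [direct_mapped.access(b) for b in pattern]
hits_two_way = [two_way.access(b) for b in pattern]
```

In this toy run the direct mapped cache misses on every access because the two blocks evict each other, whereas the 2-way associative cache hits on the third and fourth access.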

6.2 Security Models for CBAs

In this section we present general principles of how to exploit the knowledge about the
cache behavior to determine information about intermediate results of an algorithm. In turn,
this information can be used to derive information about the secret key of a cryptographic
algorithm. In the next section we describe the basic setting and the basic abilities of the
adversary referred to as the fundamental model. After that we present three different threat
models for CBAs that are based on the fundamental model.

[Figure: main memory mapped to a direct mapped cache, a 2-way associative cache and a full associative cache]

Figure 6.2: Different types of cache memory

6.2.1 Fundamental Model for CBAs

We consider a so-called crypto process running on a computer with cache memory. This
crypto process encrypts (or decrypts) a given plaintext (or ciphertext) using a secret key k.
The adversary A wants to derive information about k by analyzing plaintext/ciphertext pairs.
Depending on the underlying threat model, A gets some additional side channel information
that leaks due to the cache mechanism. That is, we focus on the security problems that arise
from sharing the cache between processes. However, reading data of other processes directly is
prevented by the memory management. The only interaction that happens is the mutual
eviction of data.

To be more specific, in the fundamental threat model for CBAs we assume that the
following holds:

Assumption 11 A knows all technical details about the underlying cryptographic
algorithm and its implementation (Kerckhoffs' extended principle).

Assumption 12 Every memory block of the sboxes is mapped to a different cache
line. I.e., the applications of the sboxes do not cause any data eviction of sbox data.

Assumption 13 During the attack only cache accesses caused by the encryption
occur.

Assumption 14 In the beginning of an encryption / decryption no sbox data is
stored in the cache.

Assumption 15 A can feed the crypto process with known (or chosen) plaintexts
(or ciphertexts) and obtains the corresponding ciphertexts (or plaintexts).

Discussion of the Fundamental Model In the following we discuss and justify the

fundamental model for CBAs as given above. Variations of how to implement a cryptographic

algorithm efficiently are rather limited. Hence, the security of an implementation should not

rely on keeping implementational aspects secret. We call this Kerckhoffs’ extended principle

according to Kerckhoffs’ principle (Kerckhoffs 1883).


In this thesis we focus on the symmetric cipher AES (see footnote 3). Besides the standard implementation
we consider several variations of the fast implementation of AES as described in Section 2.4
(page 16). This implementation uses 5 sboxes $T_0, \dots, T_4$, each having 256 entries of size 4
bytes.

Since we focus on information leakage due to table lookups, we assume that A knows
the position of these sboxes in the memory and the possible cache lines they can be mapped
to. Recent processors possess several megabytes of cache memory that can hold the much
smaller sbox data completely. Hence, an access to some sbox data cannot evict other sbox
data. To simplify the description further, we assume that each sbox is mapped consecutively
into the cache memory. Let v be the number of sbox elements that can be stored in a single
cache line. For each sbox $T_j$, $0 \le j \le 4$, and $0 \le i \le \lceil 256/v \rceil - 1$, let $CL^j_i$ denote the cache line
that contains the following elements:

\[ CL^j_i = [\, T_j[i \cdot v], \dots, T_j[i \cdot v + v - 1] \,]. \]

If it is clear from the context which sbox is referred to, we simply write

\[ CL_i = [\, S[i \cdot v], \dots, S[i \cdot v + v - 1] \,]. \]

Furthermore, for an index x of an sbox entry let $\langle x \rangle$ denote the index of the cache line that
stores x. For example, $\langle T_j[i \cdot v + 1] \rangle = i$. Remember that A knows all technical details
about the implementation, in particular the position and the mapping of the sboxes to the
cache lines. Hence, A can compute $\langle x \rangle$ for every x efficiently.
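Under the assumption that each sbox is mapped consecutively into the cache and starts at a cache line boundary, the mapping from an sbox index to its cache line is a simple integer division. The helper names below are hypothetical and only illustrate the notation introduced above.

```python
import math


def cache_line_index(x: int, v: int) -> int:
    """Index of the cache line that stores sbox entry x, with v entries per cache line."""
    return x // v


def cache_line_entries(i: int, v: int) -> range:
    """Sbox indices i*v, ..., i*v + v - 1 stored in cache line i."""
    return range(i * v, i * v + v)


def num_cache_lines(v: int, table_size: int = 256) -> int:
    """Number of cache lines occupied by one sbox with table_size entries."""
    return math.ceil(table_size / v)
```

For v = 16, the entries 16, ..., 31 all map to cache line 1, so an observer who only learns the cache line index cannot tell these 16 entries apart.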

State of the art encryption algorithms like AES are very fast. Therefore, it is very unlikely

that the encryption of a single plaintext is accidentally interrupted by another process. This

implies that during the encryption no other process causes cache accesses. Additionally, we

assume that in the beginning of every encryption (decryption) the cache does not hold any

data of an sbox. Hence, cache hits and cache misses only depend on the actual encryption

process. In particular, former encryptions do not have any influence on the cache content.

Note that the Assumptions 12 through 14 of the fundamental CBA model above improve

the strength of an adversary. They allow a simpler analysis and a simpler description of CBAs

but are not essential for an attack to be successful in principle. However, if the Assumptions

12 through 14 do not hold the complexity of an attack increases.

In (Page 2003) two general approaches are given to classify models of cache behavior

attacks: the trace driven CBA and the time driven CBA.

6.2.2 Time Driven CBA

As described in Section 3.2.1 it is possible to use timings of encryptions to determine
information about the secret key. The classical timing attacks on RSA (Kocher 1996, Dhem
et al. 1998) and AES (Koeune and Quisquater 1999) are based on data-dependent timings
of certain operations during the encryption, e.g., multiplication. However, in modern block
ciphers like DES and AES, a complex function like the non-linear substitution is usually
realized via table lookups. In the AES selection phase it was not clear how to mount side channel
attacks based on table lookups. Table lookups were regarded as constant time operations
and therefore regarded as resistant against timing attacks, see (Daemen and Rijmen 1999).
As it turned out, this is not true for implementations running on computers with cache.

Footnote 3: However, all the analysis can be adapted to implementations of virtually any block cipher
that uses table lookups.

On computers with cache, table lookups to some indices will cause a cache hit while table

lookups to other indices cause a cache miss. An element of an sbox that is already stored in

the cache can be accessed faster than an element that is not stored in the cache. The index

of such an sbox lookup depends on values of intermediate results that again depend on the

plaintext and the secret key.

Hence, values of intermediate results indirectly influence the running time of the
algorithm, even for table lookups. These data-dependent timings can be statistically analyzed
by an attacker A to derive information about intermediate states. In turn, information
about intermediate states lets A deduce information about the secret key k. Hence, there is
information leakage due to the cache behavior of the cryptographic algorithm.

Threat Model for Time Driven CBAs The threat model for time driven CBAs is based

on the fundamental threat model presented in Section 6.2.1 (page 76). For a time driven CBA

to be successful, the following assumptions must be valid:

Assumption 18 It is more likely that an encryption of a plaintext that causes only

few cache misses has a short running time than an encryption of a plaintext that

causes more cache misses.

Assumption 19 A is able to measure the time an encryption takes with reasonable

precision.

Discussion of the Threat Model Assumption 18 specifies the relation between the cache

behavior and the overall encryption time. The impact of a single cache hit or miss on the

encryption time depends on the underlying hardware. To mount a time driven CBA we

assume (Assumption 19) that the attacker can measure the encryption time. The precision

of these measurements is sufficient to allow a statistical analysis of the timings to verify if a

cache hit or miss occurred during a certain step of the encryption. In general, an attacker A

does not need complex equipment or techniques to measure the running time with reasonable

precision. For example, modern processors provide so called performance registers, e.g., to

measure timings of processes with a resolution in the range of clock cycles. A description of

how to use the performance registers, e.g., for time measurements is given in (Intel 2006).


Basic Structure of Time Driven CBAs The basic structure of a time driven CBA
follows the structure of general side channel attacks. In order to determine information
about the i-th byte $k_i$ of the secret key k, the attack consists of the two steps shown in
Figure 6.3.

measurement step: An attacker A chooses a set S of $n \in \mathbb{N}$ arbitrary but different
plaintexts $p^{(1)}, \dots, p^{(n)}$. For each of these plaintexts $p^{(j)}$, A measures the time $t^{(j)}$
the crypto process needs to encrypt $p^{(j)}$.

analysis step: To test a hypothesis $\hat{k}_i$ of the i-th byte $k_i$ of the secret key k the attacker
A uses the following method based on the method described in (Dhem et al. 1998).

1. A reproduces a part of the encryption of $p^{(j)}$ assuming that $\hat{k}_i$ is correct. In
particular, A computes a certain intermediate result $\hat{x}^{(j)}$ of the encryption
of $p^{(j)}$ that only depends on the plaintext, the candidate $\hat{k}_i$ and possibly on
other parts of the key that A already knows. E.g., in AES this could be a
byte of the state after the first application of the sbox.

2. Furthermore, A simulates the cache behavior of the encryption on that
computer. Hence, A determines the number $z^{(j)}$ of cache misses that occur during
the computation of $\hat{x}^{(j)}$. Let

\[ M = \frac{1}{n} \cdot \sum_{j=1}^{n} z^{(j)} \]

denote the average number of cache misses taken over all $z^{(j)}$.

3. A partitions the set S of plaintexts into two sets $S_s$ and $S_l$ as follows. A
plaintext $p^{(j)}$ is placed in set $S_s$ if the number of cache misses that occur
during the computation of $\hat{x}^{(j)}$ is less than M. Otherwise $p^{(j)}$ is placed in set
$S_l$.

4. A computes the mean encryption times $M_s$ and $M_l$ of plaintexts in $S_s$ and
$S_l$ as

\[ M_s = \frac{1}{|S_s|} \sum_{p^{(j)} \in S_s} t^{(j)} \quad \text{and} \quad M_l = \frac{1}{|S_l|} \sum_{p^{(j)} \in S_l} t^{(j)}. \]

If $M_s$ and $M_l$ differ significantly, A concludes that the candidate $\hat{k}_i$ is correct.
In the other case, A concludes that the candidate is wrong.

Figure 6.3: Basic structure of a time driven CBA
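The partitioning and comparison in steps 3 and 4 of Figure 6.3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the simulated miss counts for a key hypothesis are assumed to be given, and the timings are synthetic numbers rather than real measurements. What counts as a "significant" difference is left to the caller.

```python
from statistics import mean


def mean_time_gap(predicted_misses, timings):
    """Return M_l - M_s for one key hypothesis.

    predicted_misses[j] is the simulated miss count z^(j), timings[j] the measured t^(j).
    """
    avg = mean(predicted_misses)  # average miss count M
    short = [t for z, t in zip(predicted_misses, timings) if z < avg]   # set S_s
    long_ = [t for z, t in zip(predicted_misses, timings) if z >= avg]  # set S_l
    if not short or not long_:
        return 0.0  # degenerate partition, no evidence either way
    return mean(long_) - mean(short)


# Synthetic example: the timings grow with the true number of cache misses.
timings = [110.0, 120.0, 130.0, 140.0, 110.0, 120.0, 130.0, 140.0]
z_correct = [1, 2, 3, 4, 1, 2, 3, 4]   # predictions under the correct hypothesis
z_wrong = [1, 4, 2, 3, 4, 1, 3, 2]     # predictions under a wrong hypothesis

gap_correct = mean_time_gap(z_correct, timings)  # 20.0
gap_wrong = mean_time_gap(z_wrong, timings)      # 0.0
```

In this synthetic example only the correct hypothesis separates the timings into a fast and a slow group.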


To see why the attack works we first consider the case that the candidate $\hat{k}_i$ is correct.
Hence, it is more likely that $z^{(j)}$ matches the number of cache misses that occur during
the computation of $x^{(j)}$ in the encryption. Due to Assumption 18, it is more likely that an
encryption of a plaintext $p^{(j)}$ that causes only few cache misses while computing $x^{(j)}$ has
a shorter running time than an encryption that causes many cache misses. Therefore, $M_l$
should be significantly larger than $M_s$.

If $\hat{k}_i$ is not correct it is likely that the $z^{(j)}$ are not the correct numbers of cache misses
that occur during the computation of $x^{(j)}$. Hence, the partition of the plaintexts into the
sets $S_s$ and $S_l$ is not entirely determined by the correct number of cache misses. We expect
that the mean times $M_s$ and $M_l$ of both sets do not differ significantly.

The success probability of the attack depends on the precision of time measurements and
on the number n of plaintexts. Improving the precision of the measurements and increasing
the number n of plaintexts increases the success probability.

The first time driven CBAs mounted on DES, AES and several other block ciphers were

published in (Tsunoo, Tsujihara, Minematsu and Miyauchi 2002), (Tsunoo, Kubo, Shigeri,

Tsujihara and Miyauchi 2003a), (Tsunoo, Kawabata, Tsujihara, Minematsu and Miyauchi

2003b), (Tsunoo et al. 2003c), (Tsunoo, Suzaki, Saito, Kawabata and Miyauchi 2003d) and

(Tsunoo, Tsujihara, Shigeri, Kubo and Minematsu 2006).

6.2.3 Trace Driven CBA

In a trace driven CBA the attacker A is more powerful. We assume that A is able to derive the
profile of the cache behavior. That means that for each memory access A gets the information
whether a cache hit or a cache miss occurred. Furthermore, it is assumed that A is able to relate
this information to operations of the encryption. The sequence of operations together with
the information whether a cache hit or miss occurred is called a cache trace. In the sequel,
we present a threat model for trace driven CBAs to formalize the abilities of the attacker.

Threat Model for Trace Driven CBAs As for time driven CBAs the threat model

for trace driven CBAs is based on the fundamental model of Section 6.2.1 (page 76). The

fundamental threat model is extended by the ability of the adversary to obtain cache traces

of an encryption.

Assumption 21 A is able to obtain the trace of cache activity.

In order to get a simpler description of the basic structure of trace driven CBAs we assume
that A always gets the correct trace without any distortion. This simplification reduces the
complexity but is not essential for a trace driven CBA to be successful.


Discussion of the Threat Model Assumption 21 provides the basis of trace driven CBAs.

However, obtaining traces of encryptions is not as easy as simple time measurements. The

attacker needs more sophisticated tools to mount a trace driven CBA. For example, Page

(Page 2002) proposes power analysis or the analysis of electromagnetic radiation as means

to determine cache traces. In (Bertoni, Zaccaria, Breveglieri, Monchiero and Palermo 2005)

the authors show how to obtain cache traces via power analysis.

Basic Structure of Trace Driven Attacks As for time driven CBAs, the basic structure
of trace driven CBAs follows the structure of general side channel attacks. In order to
determine information about the i-th byte $k_i$ of the secret key k, the attack consists of the two steps
shown in Figure 6.4.

measurement step: A chooses a set S of $n \in \mathbb{N}$ plaintexts $p^{(1)}, \dots, p^{(n)}$
and obtains the cache trace of the encryption of each $p^{(j)}$ as explained
above.

analysis step: To test a hypothesis $\hat{k}_i$ of a byte $k_i$ of the secret key k the
adversary uses the following method.

1. A reproduces a part of the encryption of $p^{(j)}$ assuming that $\hat{k}_i$ is
correct. In particular, A computes a certain intermediate result
$\hat{x}^{(j)}$ of the encryption of $p^{(j)}$ that only depends on the plaintext,
the key byte $\hat{k}_i$ and possibly on other parts of the key that A
already knows. E.g., in AES this could be a byte of the state
after the first application of the sbox.

2. Furthermore, A simulates the cache behavior of the encryption
on that computer. Hence, A determines the cache trace that
occurs during the computation of $\hat{x}^{(j)}$. If the trace of the
simulated cache behavior that occurs during the computation of
$\hat{x}^{(j)}$ matches the obtained cache trace, the hypothesis may be
correct. Otherwise the hypothesis is proven to be wrong.

Figure 6.4: Basic structure of a trace driven CBA
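The hypothesis test of Figure 6.4 can be illustrated with a deliberately small toy model. The sketch below is an assumption-laden simplification and not one of the published attacks: it considers only two lookups into the same table, with indices $p_0 \oplus k_0$ and $p_1 \oplus k_1$, an initially empty cache (Assumption 14), v = 16 entries per cache line, and a key byte $k_0$ that A already knows. All function names are hypothetical.

```python
V = 16  # sbox entries per cache line (illustrative assumption)


def simulate_trace(indices, v=V):
    """Hit/miss trace (True = hit) for lookups into one table, starting with an empty cache."""
    cached_lines = set()
    trace = []
    for x in indices:
        line = x // v
        trace.append(line in cached_lines)
        cached_lines.add(line)
    return trace


def surviving_hypotheses(plaintexts, observed_traces, k0, v=V):
    """All guesses for k1 whose simulated traces match every observed trace."""
    survivors = []
    for guess in range(256):
        if all(simulate_trace([p0 ^ k0, p1 ^ guess], v) == trace
               for (p0, p1), trace in zip(plaintexts, observed_traces)):
            survivors.append(guess)
    return survivors


# Toy demonstration: the "observed" traces are produced by the same simulation
# with the true key bytes, standing in for real trace measurements.
k0, k1 = 0x3A, 0xC5
plaintexts = [(0x00, 16 * t) for t in range(16)]
observed = [simulate_trace([p0 ^ k0, p1 ^ k1]) for p0, p1 in plaintexts]
survivors = surviving_hypotheses(plaintexts, observed, k0)
```

In this toy model exactly the 16 guesses that share the upper four bits with the true key byte survive, since the trace only depends on the cache line index of each lookup.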

Examples for trace driven CBAs are given in (Page 2002) and (Acıiçmez and Koç 2006).


6.2.4 Access Driven CBA

In this section we present a threat model that is stronger than the models presented above. In
addition to the plaintext/ciphertext pair, the adversary A gets the information which cache
lines were accessed during the encryption. Strengthening the threat model in this way is

justified by the attacks of (Bernstein 2005), (Osvik et al. 2006) and (Neve and Seifert 2006).

These attacks show that cache based attacks are indeed very powerful, even in practice.

Hence, a conservative attitude towards unclear aspects of A’s technical abilities is necessary

to get a reliable analysis.

Threat Model for Access Driven Attacks As with the models described so far,
access driven CBAs are also based on the fundamental threat model of Section 6.2.1 (page
76). We call this threat model the ad CBA model. We extend the fundamental model by
assuming that the following holds:

Assumption 24 A gets the indices of the cache lines that were accessed during the
encryption (decryption). We call this information cache information.

Assumption 25 We explicitly assume that A cannot distinguish between elements
in a single cache line.

The main point is that the adversary A is able to determine information about which
cache lines were accessed during the encryption of a plaintext. To build a strong model we
simplify the determination of accessed cache lines in the following way. We assume that
A simply gets the correct partition of the set of all cache lines D into the set $D_0$ of indices
of accessed cache lines and the set $D_1$ of indices of cache lines that were not accessed
during the encryption of the plaintext p into the ciphertext c. We call this partition cache
information. The triple $(p, D_0, D_1)$ (or $(c, D_0, D_1)$) is called a measurement.

Discussion of the Threat Model Assumption 24 provides the basis of access driven
CBAs. In (Hu 1992) the author already presented a method to determine the indices of
cache lines that were accessed during a computation. Assuming that A has access to the
computer, he can measure the time it takes to access certain data with reasonable precision.
In contrast to the time driven CBA, A does not need to measure timings of the encryption
process. He only needs to measure the time it takes to access parts of his own data. See
(Intel 1997) for a description of how to do precise time measurements on a PC. To detect
which cache lines have been accessed during the encryption, A can use the Prime-and-Probe
method shown in Figure 6.5.

1. Flush the cache by accessing d memory blocks $B_1, \dots, B_d$ such that $B_i$ is
mapped to cache line $CL_i$.

2. Trigger the crypto process to encrypt the plaintext p.

3. For each memory block $B_i$, $1 \le i \le d$, do

(a) measure the time t to access $B_i$

(b) if t is large then cache line $CL_i$ has been accessed during the encryption

Figure 6.5: Prime-and-Probe method

If, on the one hand, the crypto process accesses the cache line $CL_i$ during the encryption, it
evicts the data block $B_i$ from the cache. Hence, accessing $B_i$ after the encryption causes a
cache miss, which in turn results in a larger access time. On the other hand, if the crypto
process does not access cache line $CL_i$, the data block $B_i$ remains in the cache. Hence,
accessing $B_i$ after the encryption of p causes a cache hit, allowing A to access $B_i$ very quickly.
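The logic of the Prime-and-Probe method in Figure 6.5 can be reproduced with a toy simulation. The sketch below does not measure real hardware: it assumes a direct mapped cache, models the crypto process as a callback, and uses illustrative cycle counts and a simple threshold for "t is large". All names are hypothetical.

```python
HIT_TIME, MISS_TIME = 2, 100  # illustrative cycle counts (assumptions)


class DirectMappedCache:
    """Toy direct mapped cache that reports a simulated access time."""

    def __init__(self, d: int):
        self.lines = [None] * d  # block currently stored in each cache line

    def access(self, block, line: int) -> int:
        """Access `block`, which maps to cache line `line`; return the simulated time."""
        if self.lines[line] == block:
            return HIT_TIME
        self.lines[line] = block  # cache miss: load the block, evicting the old content
        return MISS_TIME


def prime_and_probe(cache: DirectMappedCache, d: int, crypto_process) -> set:
    """Return the set of cache line indices the crypto process accessed."""
    for i in range(d):                       # step 1: fill the cache with A's own blocks
        cache.access(("attacker", i), i)
    crypto_process(cache)                    # step 2: trigger the encryption
    accessed = set()
    threshold = (HIT_TIME + MISS_TIME) / 2
    for i in range(d):                       # step 3: time the access to each block B_i
        if cache.access(("attacker", i), i) > threshold:
            accessed.add(i)
    return accessed


def toy_crypto_process(cache: DirectMappedCache) -> None:
    """Stand-in for the encryption: touches three cache lines."""
    for line in (3, 7, 12):
        cache.access(("victim", line), line)


accessed_lines = prime_and_probe(DirectMappedCache(16), 16, toy_crypto_process)
```

Here A never reads the data of the crypto process; the set of accessed cache lines is inferred solely from the access times to A's own blocks.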

However, we assume that A cannot distinguish elements of a single cache line. Up to
now it is not clear whether it is technically possible to distinguish accesses to elements within the
same cache line. No access driven CBA published so far requires this somewhat difficult and
unlikely ability of the adversary A. Obviously, the ability to distinguish elements within the
same cache line would allow even more powerful cache attacks than the attacks published so
far. As we will see, all efficient countermeasures are implicitly based on this assumption.

Basic Structure of Access Driven Attacks Next we give the general structure of an
access driven CBA to show how an attacker A can use cache information to derive information
about the secret key. The attacker A performs the two steps shown in Figure 6.6.

measurement step: A gets $n \in \mathbb{N}$ measurements $m^{(1)}, \dots, m^{(n)}$ of encryptions of
plaintexts $p^{(1)}, \dots, p^{(n)}$ with the secret key k. That means, for each plaintext
$p^{(1)}, \dots, p^{(n)}$ the adversary A knows the partition of the set of all cache lines
into the set $D_0$ of accessed cache lines and into the set $D_1$ of cache lines that
were not accessed during the encryption.

analysis step: For each measurement $m^{(j)}$ the attacker A analyzes the corresponding
cache information to compute a set of possible values of an intermediate
result $x_i^{(j)}$ of the encryption of $p^{(j)}$ that only depends on the plaintext (or
ciphertext) and on the i-th byte $k_i$ of the (round-)key k. Then A computes
a set $\hat{K}_i^{(j)}$ of candidates for $k_i$ that would produce one of the possible values
for $x_i^{(j)}$ during the encryption. Finally, A combines the information of all
measurements $m^{(1)}, \dots, m^{(n)}$ by computing

\[ \hat{K}_i := \bigcap_{j=1}^{n} \hat{K}_i^{(j)}. \]

Figure 6.6: Formal outline of an access driven CBA

At this point A has computed a set $\hat{K}_i$ of possible key candidates for $k_i$. He knows that
one of the elements of $\hat{K}_i$ is the correct key byte $k_i$ because $k_i \in \hat{K}_i^{(j)}$ for all $1 \le j \le n$.
Hence, the correct value is also an element of the intersection of all sets $\hat{K}_i^{(j)}$.

Wrong key candidates occur for two reasons. Firstly, each access to a cache line does not
determine the intermediate result exactly but leaves v possible values where v is the number
of sbox elements that are stored in a single cache line. Secondly, there occur sbox lookups
during the encryption that do not compute $x^{(j)}$ directly but also induce cache accesses. We
call these sbox lookups perturbing lookups. Since an adversary cannot decide whether an
sbox lookup is perturbing or not, he has to consider all key candidates that cause an access
to a cache line of the set $D_0$.

The number of the remaining candidates depends on the number of measurements and,
as we will see later, on specific details of the attack. We present an access driven CBA that
can only determine half of the key bits whereas another attack that we present reveals the
complete key.

6.2.5 Extending the Threat Model for Access Driven CBAs

We present an extended threat model that strengthens the adversary compared to the adversary
of the access driven CBA threat model. We call this model the ead CBA model. In addition to
the assumptions of access driven CBAs as described above, the following assumption holds:

Assumption 27 A can restrict cache information to certain rounds of the
encryption.

We assume that the adversary can influence the start and end of a measurement. I.e.,
A can restrict cache information to certain rounds of the encryption. Hence, A can focus
on chosen rounds of the AES encryption (decryption). As we will see, restricting the cache
information to certain rounds decreases the expected number of accessed cache lines. In turn
this improves the complexity of access driven CBAs significantly but does not increase the
information that leaks through the cache behavior of the crypto process.
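A rough feeling for why fewer observed rounds mean fewer accessed cache lines can be obtained from a simple occupancy estimate. The sketch below is a back-of-the-envelope illustration and not part of the original analysis: it assumes that the lookups into one sbox hit its cache lines independently and uniformly at random, that an sbox occupies 16 cache lines, and that each round performs 4 lookups per sbox, as in the usual T-table formulation of AES. The function name is hypothetical.

```python
def expected_accessed_lines(num_lookups: int, num_lines: int = 16) -> float:
    """Expected number of distinct cache lines hit by num_lookups uniform random lookups."""
    return num_lines * (1 - (1 - 1 / num_lines) ** num_lookups)


one_round = expected_accessed_lines(4)        # 4 lookups into one sbox
nine_rounds = expected_accessed_lines(4 * 9)  # 36 lookups into the same sbox
```

Under these assumptions a single round touches about 3.6 of the 16 cache lines of an sbox on average, whereas nine rounds touch more than 14 of them, which leaves far less room for excluding key candidates per measurement.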

Restricting measurements to certain rounds is justified by the property of modern
multitasking operating systems to change the active process after a constant amount of running
time. For example, see (Stallings 2005) for further details. Hence, it is possible that the
encryption process is interrupted by the attacker's process, allowing A to access the cache
during an encryption (decryption). In (Bernstein 2005) Bernstein already warned that this
property may be exploitable, and the authors of (Brickell et al. 2006) managed to exploit it
to determine cache information of arbitrary rounds on a real PC with reasonable
precision. Later, we will use the ead CBA model to analyze the resistance of implementations
and countermeasures against CBAs.

Table 6.2 compares the three different types of CBAs described above. The first column

indicates how difficult it is to mount the attack. The second column lists how many mea-

surements have to be done. In the next section we give the descriptions of two access driven

CBAs on AES based on the first round(s) and on the last round.

6.3 Access Driven CBAs on AES

To illustrate the general structure of access driven CBAs in the ead CBA model, in this section we present two access driven CBAs on AES. The first attack, as presented in (Osvik et al. 2006), is based on the first round(s) of AES. The second attack is based on the last round of AES; the idea was mentioned in (Osvik et al. 2006) and (Brickell et al. 2006). In the sequel we describe both attacks on the fast implementation of AES (see Section 2.4). Although both attacks work for different sizes of cache lines, we simplify the descriptions by fixing the size of a cache line to $\lambda = 512$ bits. Hence, each cache line can store $v = 16$ entries of a large sbox $T_0, \dots, T_4$ and each sbox $T_j$ fits into $m = 16$ cache lines $CL^j_0, \dots, CL^j_{15}$. For $0 \le \ell \le 15$ the sbox the attack focuses on is mapped into the cache lines as follows:

$$CL^j_\ell = \{ T_j[x] \mid x = \ell \cdot 16, \dots, \ell \cdot 16 + 15 \}.$$
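The mapping of table entries to cache lines can be illustrated with a short sketch. It assumes, as in the text, a line size of 512 bits, tables that are aligned to a cache-line boundary, and entries that never straddle two lines. The function names `cache_layout` and `cache_line_of` are hypothetical and only serve the illustration.

```python
def cache_layout(entries: int, entry_bits: int, line_bits: int = 512):
    """Return (m, v): cache lines per table and table entries per line.

    Assumption: the table starts at a cache-line boundary.
    """
    v = line_bits // entry_bits      # entries that fit into one cache line
    m = -(-entries // v)             # ceiling division: lines per table
    return m, v


def cache_line_of(x: int, v: int = 16) -> int:
    """Index l of the cache line that holds table entry x."""
    return x // v


# Large sboxes of the fast implementation: 256 entries of 32 bits each.
print(cache_layout(256, 32))
# Entries 16*l, ..., 16*l + 15 share cache line l.
print([cache_line_of(x) for x in (0, 15, 16, 255)])
```

Only the line index `x // v` is visible to the adversary in this model; the position of an entry inside a line is not.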

type            difficulty   complexity
time driven     low          high
trace driven    high         medium
access driven   low          low

Table 6.2: Comparing properties of different CBAs

6.3.1 Access Driven CBA on the First Round

The first CBA is based on intermediate results of the first round. To be more precise, A focuses on the result of the first application of an sbox in the first round. Since the involved sbox depends on the index $i$ of the key byte, we only consider the output

$$x_i = T_{(i \bmod 4)}[p_i \oplus k_i]$$

of the sbox $T_{(i \bmod 4)}$. To simplify notation we simply write

$$x_i = T[p_i \oplus k_i].$$

Structure of the Attack

To derive information about the $i$-th byte $k_i$ of the secret key $k$ the attacker performs the following operations according to the basic structure of access driven CBAs shown in Section 6.2.4 (page 83):

1. A chooses $n \in \mathbb{N}$ plaintexts $p^{(1)}, \dots, p^{(n)}$ that are fixed in byte $p^{(j)}_i$ and are independent and uniformly distributed in the other bytes.

2. A obtains measurements $m^{(j)} = (D^{(j)}_0, D^{(j)}_1, p^{(j)})$ for $1 \le j \le n$.

3. A concludes that

$$x \in \hat{X}^{(j)} := \bigcup_{\ell \in D^{(j)}_0} \{ \ell \cdot 16, \dots, \ell \cdot 16 + 15 \}.$$

4. A computes the sets

$$\hat{K}^{(j)}_i = \left\{ p^{(j)}_i \oplus \hat{x}^{(j)}_i \;\middle|\; \hat{x}^{(j)}_i \in \hat{X}^{(j)} \right\}$$

for all $1 \le j \le n$.

5. A computes the set

$$\hat{K}_i = \bigcap_{j=1}^{n} \hat{K}^{(j)}_i$$

of candidates for $k_i$.

Discussion of the Attack Let us assume that A can restrict the measurements to the first round. Then $D^{(j)}_0$ is the set of indices of those among the 16 cache lines that were accessed during the 4 applications of $T_{(i \bmod 4)}$ in round 1 of the encryption of the plaintext $p^{(j)}$. Hence, the correct key byte $k_i$ is an element of every $\hat{K}^{(j)}_i$. Remember that a cache line can store $v = 16$ elements of an sbox. Hence, depending on the plaintexts $p^{(1)}, \dots, p^{(n)}$, the remaining set of key candidates $\hat{K}_i$ contains at most $4 \cdot 16 = 64$ elements if all $n$ measurements cause the access of the same 4 cache lines. However, fixing byte $p_i$ of the plaintexts and choosing all other bytes uniformly at random lets A determine the cache line $\ell$ that is accessed while computing $x$ after only a few measurements. Knowing $\ell$ lets A reduce the number of possible key candidates to 16.

To see why at least 16 key candidates survive this attack, we look at the structure of the elements of a set $\hat{K}^{(j)}_i$. For every accessed cache line $\ell$, the set $\hat{K}^{(j)}_i$ contains the elements $p^{(j)}_i \oplus 16\ell, \dots, p^{(j)}_i \oplus (16\ell + 15)$. That means that the elements of each $\hat{K}^{(j)}_i$ restricted to the 4 lower bits take on all 16 possible values. It follows that the attack is not able to determine the lower 4 bits of the key byte and hence $2^4 = 16$ candidates for $k_i$ remain. If A cannot restrict the cache information to the first round, the set $D_0$ also contains indices of cache lines that were accessed by the perturbing lookups in rounds 2 to 9. Hence, it takes more measurements to determine information about the key. The total amount of information that A obtains is again the upper 4 bits of each key byte.

To determine the remaining bits of each key byte, one can combine this attack with a modified attack on the second round to compute the complete key.
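The five steps and the observation that exactly the upper 4 bits of a key byte leak can be checked with a small simulation. The following is a minimal sketch under stated assumptions: the measurement is idealized (noise free and restricted to round 1), the cache behavior is modeled by the set of accessed line indices only, and the names `observe_first_round` and `first_round_candidates` are hypothetical.

```python
import random

V = 16  # table entries per cache line (512-bit lines, 32-bit entries)


def observe_first_round(plaintext, key, i):
    """Idealized measurement restricted to round 1: the set D_0 of cache
    lines of T_(i mod 4) touched by its 4 lookups at indices p_t XOR k_t."""
    same_table = [t for t in range(16) if t % 4 == i % 4]
    return {(plaintext[t] ^ key[t]) // V for t in same_table}


def first_round_candidates(key, i, n, rng):
    """Steps 1-5 of the attack on key byte i using n measurements."""
    fixed = rng.randrange(256)                  # step 1: byte i is fixed
    candidates = set(range(256))
    for _ in range(n):
        p = [rng.randrange(256) for _ in range(16)]
        p[i] = fixed
        lines = observe_first_round(p, key, i)                  # step 2
        x_hat = {l * V + o for l in lines for o in range(V)}    # step 3
        candidates &= {p[i] ^ x for x in x_hat}                 # steps 4, 5
    return candidates


rng = random.Random(1)
key = [rng.randrange(256) for _ in range(16)]
cands = first_round_candidates(key, i=0, n=100, rng=rng)
print(len(cands), key[0] in cands)
```

In this model the surviving candidates are exactly the 16 bytes that share the upper 4 bits with the correct key byte, which matches the argument given above.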

6.3.2 Access Driven CBA on the Last Round

In this section we describe a CBA that is based on an intermediate result (see footnote 4)

$$x_i = S^{-1}[c_i \oplus k^{10}_i]$$

where A uses cache information about the sbox lookup of the last round to determine the secret key $k$.

Basing the attack on the last round has advantages over the attack on the first rounds. First, cache information of the last round is sufficient to determine all bits of the secret key, so A does not need to attack different rounds. Second, the sbox $T_4$ of the last round is special and is only used in that round. This helps the attacker because the cache information is never perturbed by cache accesses of other rounds: it is restricted to the last round automatically.

For the sake of simplicity, we only show how to compute a single byte $k^{10}_i$ of the last round key $k^{10}$. However, the same strategy can be applied to determine the other key bytes of $k^{10}$. Knowing all key bytes of the last round key allows one to revert the key schedule and compute the cipher key $k$. As mentioned above, we fix the size of a cache line to $\lambda = 512$ bits and only consider the sbox $T_4$ of the fast implementation of AES as described in Section 2.4 (page 16), since it is widely used in common crypto libraries like openssl (OpenSSL Project 2005). We denote the $j$-th cache line used for the table lookups for $T_4$ by $CL_j$, $j = 0, \dots, 15$. Hence, $CL_j$ contains the 4-tuples

$$\{ (S[x], S[x], S[x], S[x]) \mid x = 16 \cdot j, \dots, 16 \cdot j + 15 \}$$

as defined in Section 2.4 (page 16).

Footnote 4: To simplify notation we omitted the ShiftRows operation.

Structure of the Attack The structure of the attack on the 10th round is similar to the structure of the attack on the first round. To derive information about the $i$-th byte of the last round key $k^{10}$ the attacker performs the following operations:

1. A chooses $n \in \mathbb{N}$ plaintexts $p^{(1)}, \dots, p^{(n)}$ uniformly at random.

2. A obtains the ciphertexts and the measurements $m^{(j)} = (D^{(j)}_0, D^{(j)}_1, c^{(j)})$ for $1 \le j \le n$.

3. A concludes that

$$x^{(j)}_i \in \hat{X}^{(j)}_i := \bigcup_{\ell \in D^{(j)}_0} \{ \ell \cdot 16, \dots, \ell \cdot 16 + 15 \}.$$

4. A computes the sets

$$\hat{K}^{(j)}_i = \left\{ c^{(j)}_i \oplus S[\hat{x}^{(j)}_i] \;\middle|\; \hat{x}^{(j)}_i \in \hat{X}^{(j)}_i \right\}$$

for all $1 \le j \le n$.

5. A computes the set

$$\hat{K}_i = \bigcap_{j=1}^{n} \hat{K}^{(j)}_i$$

of candidates for $k^{10}_i$.

If $\hat{K}_i$ contains only a single element, the adversary has determined $k^{10}_i$. It is not hard to see that the intersection of sets in step 5 will eventually contain only a single element if every wrong key candidate is missing from at least one of the sets $\hat{K}^{(j)}_i$. The big difference between the attack on the first round and the attack on the last round is that in step 4 the sbox is involved in computing the intermediate result. We verified that, unlike in the attack on the first round, the diffusion of the bits caused by the sbox lets A detect wrong key candidates. That means that for every wrong key candidate $\hat{k}$ there exist appropriate choices of plaintexts such that the resulting set of key candidates does not contain $\hat{k}$. We will consider this property of the attack more closely in Section 6.7. Moreover, experiments show that on average approximately 15 pairs $(p^{(j)}, c^{(j)})$ together with the cache information $D^{(j)}_0$ suffice to determine the key byte $k^{10}_i$ uniquely.
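The attack on the last round can be sketched in the same idealized model. The sketch below makes several simplifying assumptions that are not part of the original description: the inputs of the 16 lookups to $T_4$ are drawn uniformly at random instead of being produced by a real AES encryption, ShiftRows is omitted as in the text, and the measurement is noise free. The AES sbox is recomputed from its definition; all function names are hypothetical.

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def rotl8(value, s):
    """Rotate an 8-bit value left by s bits."""
    return ((value << s) | (value >> (8 - s))) & 0xFF


def aes_sbox():
    """AES sbox S: inversion in GF(2^8) followed by the affine map."""
    box = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):        # x^254 is the inverse of x != 0
                inv = gf_mul(inv, x)
        box.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                   ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return box


S = aes_sbox()
V = 16  # entries of T_4 per cache line


def last_round_candidates(k10_i, n, rng):
    """Simulate steps 1-5 for one byte of the last round key."""
    candidates = set(range(256))
    for _ in range(n):
        # Inputs of the 16 lookups to T_4; input 0 belongs to byte i.
        x = [rng.randrange(256) for _ in range(16)]
        c_i = S[x[0]] ^ k10_i                        # ciphertext byte
        lines = {t // V for t in x}                  # measured set D_0
        x_hat = (l * V + o for l in lines for o in range(V))   # step 3
        candidates &= {c_i ^ S[t] for t in x_hat}    # steps 4 and 5
    return candidates


rng = random.Random(7)
print(last_round_candidates(0xA7, 200, rng))
```

Because the sbox is applied in step 4, candidates that differ from the correct key byte only in the lower bits are also eliminated, in contrast to the attack on the first round.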

6.4 General Methods to Thwart CBAs

In this section we give an overview of countermeasures to harden implementations of cryptographic algorithms against CBAs, as proposed for example in (Page 2003). We give a brief description and assess each countermeasure with respect to performance and security.

remove cache A straightforward countermeasure against CBAs is to remove or disable the cache and hence the cache effects. On the one hand, it is not clear how to do this on recent processors. On the other hand, disabling the cache would have devastating consequences for the performance of implementations.

minimize time accuracy Time driven CBAs depend on the ability of the attacker to mea-

sure timings with reasonable precision. Disturbing timing measurements, e.g., by in-

serting random dummy operations into the encryption process, would increase the effort

for an attacker. However, a time driven CBA may still be feasible.

maximize line size The size of a cache line determines the amount of information that

leaks by a CBA. The larger a cache line is, the lower is the amount of information that

leaks. This also increases the effort needed to perform a CBA but does not necessarily

prevent it.

perform cache warming Warming the cache, i.e., loading the whole sbox into the cache before starting the encryption, was first regarded as an effective countermeasure. However, Bernstein (Bernstein 2005) questioned its effectiveness and the authors of (Brickell et al. 2006) managed to defeat this countermeasure.

disable cache flushing Another point of defense could be to prevent an attacker from

flushing the cache. In combination with cache warming this would render all CBAs

useless. However, building this countermeasure needs additional hardware support

that would be very expensive.

cache flushing on every process switch In the analysis of the VAX security kernel (Hu

1992) the author proposes to clear the cache on every process switch. This approach

needs the support of the kernel of the operating system and would obviously close the

cache based side channel. However, even with hardware acceleration the impact on the

performance would be very high.

randomize the instruction order Randomizing the instruction order could also increase

the effort to mount CBAs. Because the attacker cannot associate side channel informa-

tion to certain operations, the number of measurements needed to deduce information

about intermediate states increases. See (May, Muller and Smart 2001a) and (May,

Muller and Smart 2001b).

randomize intermediate states Randomizing intermediate states as described in Chapter 4 obviously thwarts CBAs. Each intermediate result is completely randomized such that it is independent of the plaintext and the secret key. Hence, even if table lookups are used to compute intermediate results, the information that leaks via a CBA is also independent of the secret key.

6.5 Information Leakage and Resistance

CBAs are very powerful attacks. Although they seem unrealistic and hypothetical at first sight, they were proven to be a real threat for implementations of cryptographic algorithms on computers with cache. Hence, a strong threat model is essential for a thorough security analysis. The threat model described above is stronger than the threat models published so far. The adversary is more powerful because A can restrict the cache information to a smaller interval of encryption operations. This reduces the number of accessed cache lines per measurement and increases the efficiency of cache based attacks. The main questions when analysing the security against CBAs are the information leakage and the complexity of a CBA. After giving a formal definition of information leakage, we introduce the notion of the so-called resistance of an implementation as a measure that allows us to estimate the complexity of a CBA.

Information Leakage The most important aspect of an implementation regarding the

security against access driven CBAs is to determine the maximal amount of information

that leaks via access driven CBAs. As we will see, the amount of leaking information about

the secret key varies depending on the details of the CBA and the implementation of the

cryptographic algorithm. We make the following definition:

Definition 3 (information leakage) We consider an adversary who can mount a CBA using an arbitrary number of measurements. Let $\hat{K}_i$ be the set of remaining key candidates for a key byte $k^{10}_i$ at the end of the attack. Then the leaking information is

$$8 - \log_2 |\hat{K}_i|$$

bits.

The amount of leaking information allows us to estimate the uncertainty of an attacker about the secret key that remains after a successful access driven CBA. To quantify the maximal amount of information A can obtain about the secret key by access driven CBAs, we define $|CL|$ to be the size of a cache line in bits, $|S|$ the number of entries of the sbox and $s$ the size of a single sbox element in bits. Hence, the number of elements that fit into a cache line is $|CL|/s$ and the cache information of a single measurement leaks at most

$$\log_2 |S| - \log_2 \frac{|CL|}{s} = \log_2 \frac{|S| \cdot s}{|CL|}$$

bits. Depending on the exact nature of an attack, the sets of measurements let the attacker reduce the number of remaining key candidates after the attack. The information leakage varies between 0 and 8 bits of information per byte. For example, the attack on the first round of (Osvik et al. 2006) mounted on the fast implementation can determine at most 4 bits of every key byte regardless of the number of measurements. In contrast, the attack of (Brickell et al. 2006) based on the last round allows an adversary to determine all key bits. Furthermore, in Section 6.6 (page 96) we present an implementation that does not leak any information in our model.
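Both quantities are easy to evaluate numerically. The following sketch computes the information leakage of Definition 3 from the number of remaining candidates and the single-measurement bound from the table parameters; the function names are hypothetical, and the default line size of 512 bits is the value fixed in this chapter.

```python
import math


def leakage_bits(num_candidates: int) -> float:
    """Definition 3: leaked bits of a key byte if num_candidates remain."""
    return 8 - math.log2(num_candidates)


def max_leak_per_measurement(entries: int, entry_bits: int,
                             line_bits: int = 512) -> float:
    """Upper bound log2(|S| * s / |CL|) for a single measurement."""
    return math.log2(entries * entry_bits / line_bits)


# First-round attack on the fast implementation: 16 candidates remain.
print(leakage_bits(16))
# Large sboxes of the fast implementation: 256 entries of 32 bits.
print(max_leak_per_measurement(256, 32))
# A table of 256 entries of 2 bits fills exactly one cache line.
print(max_leak_per_measurement(256, 2))
```

The bound is 0 exactly when the whole table fits into a single cache line, which is the idea behind the split sboxes analyzed in Section 6.6.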

Complexity of a CBA The information leakage as defined above measures the maximal

amount of information a CBA can provide using an arbitrary number of measurements.

Determining the expected number of measurements an attacker needs to obtain the complete

leaking information depends on the details of the implementation and on details of the CBA.

For simplification we introduce the notion of so called resistance. The resistance focuses on

the general structure of a CBA as shown in Section 6.2.4 (page 83) and does not consider

details of certain CBAs. It is a general measure to estimate the complexity of CBAs on

different implementations.

Definition 4 (Resistance) The resistance of an implementation is the expected number $E_r$ of key candidates that are proven to be wrong during a single measurement that is based on $r$ rounds of the encryption.

The larger $E_r$, the more susceptible the implementation is to access driven CBAs. In particular, if an implementation does not leak any information, then an adversary cannot rule out key candidates and hence the resistance is 0. To compute $E_r$ we assume that all sbox lookups are independently and uniformly distributed. This assumption is justified because an attacker A usually does not have any information about the distribution of the sbox lookups. Hence, the best he can do in an attack is to choose the parts of the plaintexts/ciphertexts that are not relevant for the attack uniformly at random.

Let $m$ be the number of cache lines needed to store the complete sbox. Each cache line can store $v$ elements of an sbox. Furthermore, let $w$ be the number of sbox lookups per round and let $r$ be the number of rounds the attack focuses on. In an access driven CBA a key candidate is proven to be incorrect if it causes an access to a cache line that was not accessed during a measurement. Assuming that all sbox lookups are uniformly distributed, the probability that a given cache line is not accessed in any of the $r \cdot w$ sbox lookups is

$$p_{miss} := \left( \frac{m-1}{m} \right)^{r \cdot w}.$$

Hence,

$$E_r := \left( \frac{m-1}{m} \right)^{r \cdot w} \cdot m \cdot v \qquad (6.1)$$

is the expected number of key candidates that can be sorted out after a single measurement. However, the maximal amount of information an arbitrary number of measurements can reveal is limited by the information leakage. Further measurements will not reveal further information. We verified by experiments that the number of measurements needed to achieve the full information leakage only depends on $E_r$.
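Equation (6.1) translates directly into code. The sketch below is only an illustration of the formula; the function name `resistance` is hypothetical and the parameter names follow the symbols $m$, $v$, $w$ and $r$ defined above.

```python
def resistance(m: int, v: int, w: int, r: int) -> float:
    """E_r from equation (6.1).

    m: cache lines per sbox, v: sbox entries per cache line,
    w: sbox lookups per round, r: rounds covered by one measurement.
    """
    p_miss = ((m - 1) / m) ** (r * w)   # a given line is never touched
    return p_miss * m * v


# Standard sbox: 4 lines of 64 entries, 16 lookups per round.
print(round(resistance(4, 64, 16, 1), 2))
# Sbox T_4 of the fast implementation: 16 lines of 16 entries, 16 lookups.
print(round(resistance(16, 16, 16, 1), 1))
```

Note that $E_r$ decreases geometrically in $r \cdot w$, so every additional round covered by a measurement makes a single measurement less useful for the adversary.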

In the sequel, we focus on methods to counteract CBAs. In general, there are two approaches to counteract such a side channel. The first approach is to use some kind of randomization to ensure that the leaking information does not reveal information about the secret key. Using randomization is a general strategy that protects against several kinds of side channel attacks, see for example Chapter 4 (page 25). In Section 6.7 we analyze a more efficient method to counteract CBAs based on random permutations. Before that, we consider the second approach, which is to reduce the bandwidth of the side channel. We present several implementations of AES and examine their information leakage and their resistance.

6.6 Information Leakage and Resistance of Selected Imple-

mentations

As Bernstein pointed out in (Bernstein 2005), to thwart cache attacks it is not sufficient to load all sbox entries into the cache before accessing the sbox in order to compute an intermediate result, because A can get cache information at all times. Hence, loading the complete sbox into the cache does not suffice to hide all cache information. Therefore, he advises avoiding the usage of table lookups in cryptographic algorithms. Computing the AES SubBytes operation according to its definition

$$f : \{0,1\}^8 \to \{0,1\}^8, \qquad x \mapsto a \cdot \mathrm{INV}(x) \oplus b$$

would cause virtually no cache accesses and hence seems to be secure against CBAs. However, implementing SubBytes like this would result in a very inefficient implementation on a PC. To achieve a high level of efficiency people prefer to use precomputed tables. In the sequel, we analyze the security of some well known and some novel variations of implementations of AES. For each of these implementations we consider access driven CBAs based on different sboxes and examine the information leakage and the resistance as defined in (6.1). To simplify notation we fix the size of a cache line to 512 bits as we did above. Furthermore, we did timing experiments for each implementation to estimate its efficiency. The testing environment for our timing experiments is shown in Table 6.3. For each implementation we compare its timing with the timing of the fast implementation. Table 6.9 summarizes the information leakage, resistance and efficiency for all considered implementations.

CPU        Intel Pentium M, 1400 MHz
OS         Linux, Kernel 2.6.18
Compiler   gcc 4.1.1

Table 6.3: Experimental environment

Standard Implementation

The standard implementation as described in Section 2.3 (page 9) uses only the standard sbox

S. Hence, an access driven CBA as described above is based on that sbox. The standard

sbox consists of 256 entries each of size one byte. Hence, the sbox can be stored in m= 4

cache lines each of which can hold v= 64 sbox entries. In each round the sbox is applied

w= 16 times. Next, we analyze the susceptibility to access driven CBAs as described above:

Information leakage To determine the number of leaking bits we performed experiments. Due to the low number $m$ of cache lines and the relatively high number of sbox accesses per round, the probability that a cache line is not accessed in a part of the encryption becomes very small with an increasing number $r$ of involved rounds. We verified by experiments that measurements taken over at most 3 rounds of the standard implementation leak all key bits. Although the small probability $p_{miss}$ prevents performing further experiments, we assume that measurements over more rounds will also leak all key bits.

Resistance As explained above, the probability that a cache line is not accessed during $r$ rounds of an encryption decreases rapidly with increasing $r$. Table 6.4 summarizes the resistance of the standard implementation for $1 \le r \le 10$.

r     E_r
1     2.57
2     2.57 · 10^-2
3     2.58 · 10^-4
4     2.58 · 10^-6
5     2.59 · 10^-8
6     2.59 · 10^-10
7     2.60 · 10^-12
8     2.61 · 10^-14
9     2.61 · 10^-16
10    2.62 · 10^-18

Table 6.4: The resistance of the standard implementation

E.g., we expect that a single measurement taken over 2 rounds of the encryption allows A to sort out approximately 0.0257 key candidates.

Efficiency The standard implementation uses some time consuming operations such as matrix multiplication over the finite field $F_{256}$. Hence, on a 32 bit processor the efficiency of the standard implementation is obviously lower than the efficiency of the fast implementation that avoids these inefficient operations. Our timing experiments on a 32 bit processor have shown that the standard implementation is about 3 times slower than the fast implementation.

Fast Implementation

The fast implementation as described in Section 2.4 (page 16) is the reference implementation for virtually all AES implementations in software on 32 bit platforms. Its performance is based on the clever merge of the round functions SubBytes, ShiftRows and MixColumns into 5 specially constructed sboxes $T_0, \dots, T_4$. Each of these sboxes holds 256 entries of size 4 bytes. Hence, a cache line can store $v = 16$ sbox elements and we need $m = 16$ cache lines to store an sbox $T_i$ in the cache. As described above, each of the sboxes $T_0, \dots, T_3$ is applied 4 times in every round $1, \dots, 9$ of the encryption. In the last round $T_4$ is applied 16 times. We consider both a CBA based on table lookups to one of the sboxes $T_0, \dots, T_3$ in the first round, like the one described in Section 6.3.1, and a CBA based on the sbox $T_4$ of the last round as described in Section 6.3.2.

Information leakage The access driven CBA of (Osvik et al. 2006) on the first round of AES, as described in Section 6.3.1, shows that in this case the fast implementation reveals half of the key bits, even with an arbitrary number of measurements. As we have seen in Section 6.3.2 (page 87), a CBA based on the table lookups to $T_4$ in the last round lets A determine the secret key completely.

Resistance Due to the bigger size of the sboxes and the lower number of sbox lookups per round, the resistance of the fast implementation is significantly lower than that of the standard implementation, i.e., $E_r$ is significantly larger. If the attack is based on sbox $T_4$, then every measurement is implicitly restricted to the last round because $T_4$ is only used in that round. Hence, the resistance does not change for measurements restricted to a different number of rounds. We expect that A can rule out approximately

$$E_r = \left( \frac{15}{16} \right)^{16} \cdot 16 \cdot 16 \approx 91$$

wrong key candidates of a key byte of the last round key after a single measurement. If the access driven CBA is based on sbox lookups of the first round, things are different. Each sbox $T_0, \dots, T_3$ is used 4 times in every round $1, \dots, 9$. In this case, the expected numbers of wrong key candidates that can be ruled out after a single measurement taken over $r$ rounds are given in Table 6.5.

Efficiency As the name suggests, the fast implementation is very efficient, especially on 32 bit computers. It only consists of sbox lookups, shifts and XOR operations. It omits complex operations such as matrix multiplication and uses precomputed tables to compute operations in finite fields.

r     E_r
1     198.0
2     153.0
3     118.0
4     91.2
5     70.4
6     54.4
7     42.0
8     32.5
9     25.1

Table 6.5: The resistance of the fast implementation against access driven CBAs based on sboxes $T_0, \dots, T_3$.
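The entries of Table 6.5 follow from equation (6.1) with $m = 16$, $v = 16$ and $w = 4$. The following sketch recomputes them; it is merely a numerical check of the formula, and the helper name `resistance` is hypothetical.

```python
def resistance(m, v, w, r):
    """E_r from equation (6.1)."""
    return ((m - 1) / m) ** (r * w) * m * v


# One of the sboxes T_0, ..., T_3: 16 cache lines of 16 entries each,
# 4 lookups per round, used in rounds 1 to 9.
table_6_5 = {r: resistance(16, 16, 4, r) for r in range(1, 10)}
for r, e_r in table_6_5.items():
    print(f"{r:2d}  {e_r:6.1f}")
```

The recomputed values agree with the table up to rounding.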

Fast Implementation Using Standard Sbox in the Last Round (fast-1)

To improve the security, the authors of (Brickell et al. 2006) suggested replacing the sbox $T_4$ with the standard sbox in the last round. In the case of a CBA that is based on sbox lookups of the first round, this implementation provides the same information leakage and resistance as the fast implementation. Therefore, in the sequel we only consider a CBA that is based on the table lookups of the last round.

Information leakage As explained above for the standard implementation, an access driven CBA based on the standard sbox used in the last round reveals the complete secret key.

Resistance The resistance of this approach against an access driven CBA based on the standard sbox is better than that of the fast implementation against an access driven CBA based on the sbox $T_4$. A single measurement lets A rule out approximately

$$E_r = \left( \frac{3}{4} \right)^{16} \cdot 4 \cdot 64 \approx 2.57$$

wrong key candidates. The resistance remains constant because the standard sbox is only used in one round.

Efficiency Timing experiments with our implementation of this approach showed that using the standard sbox in the last round does not slow down the encryption significantly.

Fast Implementation Using only Sbox $T_0$ (fast-2)

We consider another modification of the fast implementation of AES. The description of AES in Section 2.4 (page 16) shows that the $i$-th entry of each of the sboxes $T_1, \dots, T_3$ is equal to the $i$-th entry of the sbox $T_0$ cyclically shifted by 1, 2 and 3 bytes to the right, respectively. Hence, we propose to use only sbox $T_0$ in the encryption and to shift the result as needed to compute the correct AES encryption. E.g., to compute the sbox lookup $T_1[i]$ using the sbox $T_0$ we simply cyclically shift the value $T_0[i]$ by 1 byte to the right. In the last round, we recommend using the standard sbox. Since we already analyzed the information leakage and resistance of the standard sbox, we focus on a CBA based on the sbox $T_0$.
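The replacement of the lookups to $T_1, \dots, T_3$ by a lookup to $T_0$ followed by a rotation can be sketched as follows. The sketch assumes that table entries are stored as 32 bit words; the helper names `rotr32` and `lookup_via_t0` are hypothetical, and the one-element table in the example is a placeholder rather than real AES table content.

```python
def rotr32(word: int, nbytes: int) -> int:
    """Cyclically shift a 32-bit word by nbytes bytes to the right."""
    bits = 8 * (nbytes % 4)
    return ((word >> bits) | (word << (32 - bits))) & 0xFFFFFFFF


def lookup_via_t0(t0, j: int, i: int) -> int:
    """Emulate the lookup T_j[i] for j = 0, ..., 3 using only table T_0."""
    return rotr32(t0[i], j)


# Placeholder table with a single entry, used only to show the rotation.
t0_demo = [0x11223344]
print([hex(lookup_via_t0(t0_demo, j, 0)) for j in range(4)])
```

All four emulated lookups touch the same table, so a measurement sees $w = 16$ accesses to $T_0$ per round instead of 4 accesses to each of four tables.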

Information leakage Using only the sbox $T_0$ does not change the amount of information that leaks compared to the fast implementation. Hence, this implementation also leaks the complete secret key.

Resistance The sbox $T_0$ needs $m = 16$ cache lines, each of which stores $v = 16$ elements. The difference to the fast implementation is that $T_0$ is applied $w = 16$ times in each round $1, \dots, 10$. Due to the increased number of sbox lookups per round, the resistance against access driven CBAs is better than the resistance of the fast implementation. Table 6.6 (page 96) shows the resistance $E_r$ for all values of $r$.

r     E_r
1     91.2
2     32.5
3     11.6
4     4.12
5     1.47
6     5.22 · 10^-1
7     1.86 · 10^-1
8     6.62 · 10^-2
9     2.36 · 10^-2
10    8.39 · 10^-3

Table 6.6: The resistance of the fast implementation using only $T_0$

Efficiency We implemented this approach and did timing measurements to estimate the

running time. Compared to the fast implementation we could not measure any dif-

ferences in the running time. Hence, this implementation is as efficient as the fast

implementation.

Splitted Sboxes (small-n)

As a simple but effective countermeasure against access driven CBAs we suggest splitting the sbox $S$ into $n$ smaller sboxes $S_0, \dots, S_{n-1}$ such that every small sbox $S_i$ fits completely into a single cache line (see footnote 5). An application $S_i[x]$ of sbox $S_i$ yields $d_i$ bits of the desired result $S[x]$. Hence, the correct result can be calculated by computing all bits separately and shifting them into the correct position.

We construct the small sboxes $S_i$ for $0 \le i \le n-1$ as follows:

$$S_i : \{0,1\}^8 \to \{0,1\}^{d_i}, \qquad x \mapsto \lfloor S[x] \rfloor_{\left( \sum_{j=0}^{i-1} d_j,\ \left( \sum_{j=0}^{i} d_j \right) - 1 \right)}$$

where $\lfloor y \rfloor_{(b,e)}$ are the bits $y_b \dots y_e$ of the binary representation of $y = (y_0, \dots, y_7)$. The small sboxes are shown in Appendix B (page 115). Instead of applying the sbox $S$ to $x$ directly, each $S_i$ is applied. The result is computed as

$$S[x] = \bigoplus_{i=0}^{n-1} S_i[x] \cdot 2^{\sum_{j=0}^{i-1} d_j}.$$

In the sequel, we assume that the size of the sbox is a multiple of the size of a cache line and that all $d_j$ are equal. Depending on the number of small sboxes we call this implementation small-n. E.g., let the size of a cache line be $\lambda = 512$ bits and for $0 \le i \le 3$ let each $S_i$ store the bits $\lfloor S[x] \rfloor_{(2i, 2i+1)}$. The result $S[x]$ is then computed as

$$S[x] = S_0[x] \oplus S_1[x] \cdot 4 \oplus S_2[x] \cdot 16 \oplus S_3[x] \cdot 64.$$

We call this implementation small-4.
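The construction of the small sboxes and the recombination of the result can be sketched as follows. The sketch assumes that all $d_j$ are equal and that bit $y_0$ is the least significant bit, as suggested by the formula for small-4. The function names are hypothetical, and the table `demo_sbox` is a stand-in permutation, not the AES sbox; any table of 256 bytes can be split in the same way. The sketch only illustrates the arithmetic and does not model how the tables are packed into cache lines.

```python
def split_sbox(sbox, n):
    """Split an 8-bit table into n small tables of d = 8 // n bits each."""
    d = 8 // n
    mask = (1 << d) - 1
    return [[(y >> (i * d)) & mask for y in sbox] for i in range(n)]


def lookup_split(small, x):
    """Recombine S[x] from the small tables; every table is used once."""
    d = 8 // len(small)
    result = 0
    for i, s_i in enumerate(small):
        result ^= s_i[x] << (i * d)
    return result


# Stand-in table (assumption): a simple permutation of 0..255.
demo_sbox = [(167 * x + 13) % 256 for x in range(256)]

small4 = split_sbox(demo_sbox, 4)
print(all(lookup_split(small4, x) == demo_sbox[x] for x in range(256)))
# Each small-4 table holds 256 entries of 2 bits, i.e. 512 bits in total.
print(256 * 2)
```

Since `lookup_split` reads every small table exactly once for every input, the sequence of accessed tables does not depend on `x`.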

Information leakage The amount of information that leaks depends on the number $n$ of small sboxes. Let us consider the variants small-2, small-4 and small-8. With cache lines of size 512 bits, computing $S[x]$ using variant small-4 or small-8 leaks 0 bits of information for two reasons:

1. Every $S_i$ fits completely into a single cache line.

2. For every $x$ each $S_i$ is used exactly once to compute $S[x]$.

Hence, the cache information remains constant for all inputs. An attacker will always get the information that every cache line has been accessed, even if he could restrict measurements to single sbox lookups. The only assumption involved is that A cannot distinguish between accesses to different elements within the same cache line (Section 6.2.4). The variant small-2 presumably leaks all key bits in our setting.

Resistance As we have shown above, the variants small-4 and small-8 leak no key bit. Hence, even an arbitrary number of measurements does not provide any information that lets A restrict the number of possible keys. This implies that small-4 and small-8 have resistance 0. The resistance of small-2 is listed in Table 6.7.

Footnote 5: Each sbox should fit into a single cache line at every cache level.

r     E_r
1     3.91 · 10^-3
2     5.96 · 10^-8
3     9.09 · 10^-13
4     1.39 · 10^-17
5     2.12 · 10^-22
6     3.23 · 10^-27
7     4.93 · 10^-32
8     7.52 · 10^-37
9     1.15 · 10^-41
10    1.75 · 10^-46

Table 6.7: The resistance of small-2

Efficiency Obviously, the performance depends on the number of involved sboxes and on the shifts needed to move bits into the right position. To estimate the efficiency we used the small-n variants in the last round of the fast implementation. Due to the inefficient bit manipulations on 32 bit processors, our ad hoc implementation that uses small-4 only in the last round incurs a penalty of about 60%. We expect that a more sophisticated implementation reduces this penalty significantly. However, we stress that access driven CBAs are very powerful attacks. Hence, it is not surprising that secure implementations are less efficient.

Table 6.8 shows the result of our timing measurements for the variants small-2, small-4 and small-8 applied only to the last round of the fast implementation of AES. Applying the small variants to more rounds will decrease the efficiency further.

Comparison of Implementations

To compare the implementations considered above with respect to information leakage (IL), resistance ($E_r$) and efficiency (Eff.) we summarize the important information in Table 6.9. The details were explained above.

# sboxes      fast   small-2   small-4   small-8
time factor   1      1.32      1.6       1.95

Table 6.8: Timings for small-2, small-4 and small-8 applied on the last round of AES

column   1              2             3      4        5              6              7             8
impl.    standard       fast          fast   fast-1   fast-2         small-2        small-4       small-8
sbox     S              T0,...,T3     T4     S        T0             S0,S1          S0,...,S3     S0,...,S7
IL       8              4/8           8      8        8              8              0             0
E_1      2.57           198.0         91.2   2.57     91.2           3.91 · 10^-3   0             0
E_2      2.57 · 10^-2   153.0         91.2   2.57     32.5           5.96 · 10^-8   0             0
E_3      2.58 · 10^-4   118.0         91.2   2.57     11.6           9.09 · 10^-13  0             0
E_4      2.58 · 10^-6   91.2          91.2   2.57     4.12           1.39 · 10^-17  0             0
E_5      2.59 · 10^-8   70.4          91.2   2.57     1.47           2.12 · 10^-22  0             0
E_6      2.59 · 10^-10  54.4          91.2   2.57     5.22 · 10^-1   3.23 · 10^-27  0             0
E_7      2.60 · 10^-12  42.0          91.2   2.57     1.86 · 10^-1   4.93 · 10^-32  0             0
E_8      2.61 · 10^-14  32.5          91.2   2.57     6.62 · 10^-2   7.52 · 10^-37  0             0
E_9      2.61 · 10^-16  25.1          91.2   2.57     2.36 · 10^-2   1.15 · 10^-41  0             0
E_10     2.62 · 10^-18  25.1          91.2   2.57     8.39 · 10^-3   1.75 · 10^-46  0             0
Eff.     ~3             1             1      ~1       ~1             1.32           1.6           1.95

Table 6.9: Comparison of selected AES implementations with respect to information leakage (IL), resistance ($E_r$) and efficiency (Eff.)

The standard implementation leaks all key bits and provides good resistance but low

efficiency. The fast implementation also leaks all key bits and provides low resistance but

good efficiency. The modifications fast-1 and fast-2 inherit the information leakage and the

good efficiency but improve the resistance. Fast-1 improves the resistance against CBAs

that are based on the last round from 91.2 to 2.57. Using the fast-1 implementation a CBA

based on the sboxes T0,...,T3is much more efficient than a CBA based on the last round.

Fast-2 uses only one large sbox and hence improves the resistance against all CBAs that

comply with our basic structure of access driven CBAs. As the implementations mentioned

above, the implementation small-2 leaks all key bits. Its resistance is much better than

the resistance of the implementations mentioned above but its efficiency is rather low. The

implementations small-4 and small-8 do not leak a single key bit and hence provide the best

possible resistance. Like the implementation small-2, the implementations small-4 and small-8

suffer from low efficiency. See Table 6.10 for a simplified comparison of the implementations

considered above.

For applications that require high speed we propose to use the implementation fast-2

because its efficiency is comparable to the efficiency of the fast implementation. However,

one should keep in mind that fast-2 does not thwart access driven CBAs completely but

only increases the complexity of a CBA. In high security applications where it is essential

to thwart CBAs we propose to use the small-4 implementation. It suffers from rather low

efficiency but prevents the leakage of key bits.

100 Chapter 6. Cache Behavior Attacks (CBAs)

implementation   info leakage   resistance   efficiency
standard         8 bit / Byte   +            −
fast             8 bit / Byte   −            +
fast-1           8 bit / Byte   0            +
fast-2           8 bit / Byte   +            +
small-2          8 bit / Byte   ++           −−
small-4          0 bit / Byte   ++           −−
small-8          0 bit / Byte   ++           −−

Table 6.10: Simplified Comparison of Implementations

6.7 Countermeasures Based on Permutations

Another class of countermeasures that was already proposed but not analyzed in (Brickell
et al. 2006) is to use secret random permutations to randomize the accesses to the sbox.
In this section we present a CBA against an implementation of AES secured by a random
permutation that needs roughly 2300 measurements to reveal the complete key (Blömer and
Krummel 2007). This shows that the increase of the complexity of CBAs induced by random
permutations is not as high as one would expect. In particular, the uncertainty of the
permutation is not a good measure to estimate the gain of security. A random permutation
has an uncertainty of $\log_2(256!) \approx 1684$ bits and the uncertainty of the induced
partition on the cache lines is $\log_2\left(256!/(16!)^{16}\right) \approx 976$ bits.
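The two uncertainty figures can be checked numerically. The following is a minimal sketch; the helper name `log2_factorial` is ours and not part of the original text.

```python
import math

def log2_factorial(n: int) -> float:
    """Return log2(n!) computed via the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)

# Uncertainty of a uniformly random permutation of the 256 byte values.
perm_bits = log2_factorial(256)

# Uncertainty of the induced partition into 16 cache lines with 16 entries
# each: the order of the entries inside a cache line does not matter.
partition_bits = perm_bits - 16 * log2_factorial(16)

print(round(perm_bits), round(partition_bits))
```

Dividing by $(16!)^{16}$ only removes the ordering inside each cache line, which is why the partition still carries close to a thousand bits of uncertainty.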

On the other hand, we present a subset of permutations, so-called distinguished
permutations, that reduce the information leakage from 8 bits to 4 bits per key byte. Hence,
the remaining bits must be determined by an additional attack, thereby increasing the
complexity. In our standard scenario this is the best one can achieve.

We focus only on the protection of the last round of AES and we assume that the output
$x$ of the 9th round is randomized using some secret random permutation $\pi$. To be more
precise, each byte $x_i$ of the state $x = x_0,\ldots,x_{15}$ is substituted by $\pi(x_i)$.
To execute the last round of AES a modified sbox $T'_4$ that depends on $\pi$, fulfilling
\[ T'_4[\pi(x_i)] = T_4[x_i], \]
is applied to every byte $x_i$. This ensures that the resulting ciphertext
$c = c_0,\ldots,c_{15}$ is correct. We denote the $\ell$-th cache line used for the table
lookups for $T'_4$ by $CL_\ell$, $\ell = 0,\ldots,15$. Hence, $CL_\ell$ contains the 4-tuples
\[ \{(S[\pi^{-1}(x)], S[\pi^{-1}(x)], S[\pi^{-1}(x)], S[\pi^{-1}(x)]) \mid x = 16\ell, \ldots, 16\ell + 15\}. \]
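The relation between $T'_4$, $T_4$ and $\pi$ can be illustrated with a short sketch. It rests on assumptions that are ours: a seeded random byte permutation stands in for the AES sbox $S$, and the names `T4p` and `cache_line` are hypothetical.

```python
import random

rng = random.Random(1)

# Stand-in for the AES sbox S (assumption: any byte permutation shows the idea).
S = list(range(256))
rng.shuffle(S)

# Last-round table T4: the sbox output replicated into all four bytes of a word.
T4 = [(s << 24) | (s << 16) | (s << 8) | s for s in S]

# Secret random permutation pi and its inverse.
pi = list(range(256))
rng.shuffle(pi)
pi_inv = [0] * 256
for x, y in enumerate(pi):
    pi_inv[y] = x

# Permuted table: T4p[pi(x)] = T4[x], i.e. T4p[y] = T4[pi_inv[y]].
T4p = [T4[pi_inv[y]] for y in range(256)]

def cache_line(index: int) -> int:
    """Cache line holding table entry `index` (16 entries per cache line)."""
    return index >> 4
```

With this layout a lookup for state byte `x` touches cache line `cache_line(pi[x])`, so an observer learns something about `pi[x]` but nothing about `x` directly.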

Using a permutation $\pi$, information leaking through accessed cache lines does not depend
directly on $x_i$ but only on the permuted value $\pi(x_i)$. Since $\pi$ is unknown to $A$,
the application of $\pi$ prevents him from deducing information about the last round key
$k^{10} = k^{10}_0,\ldots,k^{10}_{15}$ directly. However, in the sequel we will show how to
bypass random permutations by using CBAs.

6.7.1 An Access Driven CBA on a Permuted Sbox

We assume that we have a fast implementation of AES that is protected by a random
permutation $\pi$ as described above. We also assume that the adversary $A$ has access to
the AES decryption algorithm. This assumption can be avoided. However, the exposition
becomes easier if we allow $A$ access to the decryption. We show how an adversary $A$ can
compute the bytes $k^{10}_0,\ldots,k^{10}_{15}$ of the last round key.

Let $\hat{k}_0$ denote a candidate for byte $k^{10}_0$ of the last round key. In a first
step, for each possible value $\hat{k}_0$ the adversary $A$ determines the assignment
$P_{\hat{k}_0}$ of bytes to cache lines induced by $\pi$ under the assumption that
$\hat{k}_0 = k^{10}_0$. To be more precise, $A$ computes a function
\[ P_{\hat{k}_0} : \{0,1\}^8 \to \{0,\ldots,15\} \]
such that if $\hat{k}_0$ is correct then for all $x$:
\[ \pi(x) \in \{16 \cdot P_{\hat{k}_0}(x), \ldots, 16 \cdot P_{\hat{k}_0}(x) + 15\}. \]
I.e., if $\hat{k}_0$ is correct then $P_{\hat{k}_0}$ is the correct assignment of values
$\pi(x)$ to cache lines.

Let us fix some $x$ and a candidate $\hat{k}_0$ for $k^{10}_0$. We set
$c_0 = S[x] \oplus \hat{k}_0$ and let $\hat{M}_0 = \{0,\ldots,15\}$ denote the set of indices
of possible cache lines. The adversary $A$ repeats the following steps for
$j = 1, 2, \ldots, n$ until $\hat{M}_0$ contains a single element.

1. $A$ chooses a ciphertext $c^j$ whose first byte is $c_0$, while the remaining bytes of
   $c^j$ are chosen independently and uniformly at random.

2. Using his access to the decryption algorithm, $A$ computes the plaintext $p^j$
   corresponding to $c^j$.

3. $A$ triggers an encryption of $p^j$ by the crypto process and obtains cache information.
   I.e., $A$ obtains the set $D^j_0$ of cache lines that were accessed when applying sbox
   $T'_4$ during the encryption of $p^j$.

4. $A$ sets $\hat{M}_0 := \hat{M}_0 \cap D^j_0$.

If $\hat{M}_0 = \{y\}$, then $A$ sets $P_{\hat{k}_0}(x) = y$. Repeating this process for all
$x$ yields the function $P_{\hat{k}_0}$ which has the desired property.
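The four steps above can be sketched as a small simulation. Several things here are assumptions of ours rather than part of the original exposition: a seeded random byte permutation stands in for the AES sbox, the measurement oracle `observe_lines` is idealized and noise-free, and the decrypt-then-encrypt round trip of steps 2 and 3 is collapsed into that oracle. All function names are hypothetical.

```python
import random

rng = random.Random(7)

# Stand-in for the AES sbox S and its inverse (assumption).
S = list(range(256))
rng.shuffle(S)
S_inv = [0] * 256
for x, y in enumerate(S):
    S_inv[y] = x

pi = list(range(256))                          # secret permutation
rng.shuffle(pi)
key = [rng.randrange(256) for _ in range(16)]  # secret last round key

def observe_lines(ciphertext):
    """Idealized oracle: cache lines of T4' touched in the last round."""
    return {pi[S_inv[c ^ k]] >> 4 for c, k in zip(ciphertext, key)}

def line_of(x, k0_guess, max_measurements=1000):
    """Determine P_{k0_guess}(x) by intersecting observed cache-line sets."""
    c0 = S[x] ^ k0_guess
    candidates = set(range(16))
    for _ in range(max_measurements):
        # Step 1: fix the first ciphertext byte, randomize the others.
        ciphertext = [c0] + [rng.randrange(256) for _ in range(15)]
        # Steps 2-4: obtain the accessed cache lines and intersect.
        candidates &= observe_lines(ciphertext)
        if len(candidates) == 1:
            return candidates.pop()
    return None  # not resolved within the measurement budget
```

For the correct guess `key[0]`, `line_of(x, key[0])` returns `pi[x] >> 4`. For a wrong guess the intersection still converges, but to the cache line of a different state byte, which is why the resulting function is only meaningful under the hypothesis that the guess is correct.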

Under the assumption that the guess $\hat{k}_0$ was correct, the function $P_{\hat{k}_0}$
is the correct partition of values $\pi(x)$ into cache lines. Remember that the permutation
$\pi$ is also used to scramble the bytes on the other positions. In particular, the mapping
of bytes to cache lines is the same for all positions of the state. Hence, it is not
difficult to see that the information provided by $P_{\hat{k}_0}$ enables the adversary to
mount a CBA on the last round similar to the one described in Section 6.3.2 (page 87). This
attack can be used to determine for each possible candidate $\hat{k}_0$ a set of vectors
$\hat{k}_1,\ldots,\hat{k}_{15}$ of hypotheses for the other key bytes. To determine a
candidate $\hat{k}_i$ that arises from the value of $\hat{k}_0$ the attacker $A$ performs the
following steps:

1. $A$ chooses $n \in \mathbb{N}$ plaintexts $p^{(1)},\ldots,p^{(n)}$.

2. $A$ obtains the ciphertexts and the measurements
   $m^{(j)} = (D^{(j)}_0, D^{(j)}_1, c^{(j)})$ for $1 \le j \le n$.

3. Let $x_i$ denote the $i$-th byte of the intermediate state after the 9-th round. $A$
   concludes that
   \[ x_i \in \hat{X}^{(j)}_i = \bigcup_{\ell \in D^{(j)}} \{\hat{x}_i \mid P_{\hat{k}_0}(\hat{x}_i) = \ell\}. \]

4. $A$ computes the sets
   \[ \hat{K}^{(j)}_i = \left\{ c^{(j)}_i \oplus S\left[\hat{x}^{(j)}_i\right] \;\middle|\; \hat{x}^{(j)}_i \in \hat{X}^{(j)}_i \right\} \]
   for all $1 \le j \le n$.

5. $A$ computes the set
   \[ K_i = \bigcap_{j=1}^{n} \hat{K}^{(j)}_i \]
   of candidates for $k_i$.
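Steps 3 to 5 amount to a filter-and-intersect loop, sketched below under assumptions of ours: random byte permutations stand in for the sbox and for $\pi$, the measurements are idealized (only the last-round cache lines and the ciphertext are modeled), and the assignment `P` is taken to be the correct one. The helper names are hypothetical.

```python
import random

rng = random.Random(11)

# Stand-ins (assumptions): random byte permutations for the sbox S and for pi.
S = list(range(256))
rng.shuffle(S)
pi = list(range(256))
rng.shuffle(pi)
key = [rng.randrange(256) for _ in range(16)]

# Assignment of bytes to cache lines, here assumed to be the correct one.
P = [pi[x] >> 4 for x in range(256)]

def measure():
    """One idealized measurement: accessed cache lines and the ciphertext."""
    state = [rng.randrange(256) for _ in range(16)]
    lines = {pi[x] >> 4 for x in state}
    cipher = [S[x] ^ k for x, k in zip(state, key)]
    return lines, cipher

def key_candidates(i, measurements):
    """Intersect the per-measurement candidate sets for key byte i."""
    result = set(range(256))
    for lines, cipher in measurements:
        # Step 3: state bytes compatible with the accessed cache lines.
        possible_x = [x for x in range(256) if P[x] in lines]
        # Steps 4 and 5: derive key candidates and intersect.
        result &= {cipher[i] ^ S[x] for x in possible_x}
    return result

measurements = [measure() for _ in range(200)]
candidates = [key_candidates(i, measurements) for i in range(16)]
```

The true key byte always survives the intersection. Whether wrong candidates are eventually eliminated depends on the permutation, which is exactly the separability question treated in Section 6.7.2.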

For the time being, we assume that $\pi$ has the property that for each $\hat{k}_0$ there
remains only a single vector of hypotheses for the other key bytes. Hence, in the end there
are only 256 AES keys left and a simple brute force attack reveals the correct one. In
general, a random permutation has this property. For a mathematically precise definition and
analysis of that property see Section 6.7.2.

Cost Analysis. Experiments show that in the first step of the attack $A$ needs on average
9 measurements, each consisting of a pair $(p^i, c^i)$ and the corresponding cache
information $D^i_0$, such that the intersection $\hat{M}_0 := \bigcap_i D^i_0$ contains only
a single element $y = P_{\hat{k}_0}(x)$. We need to determine the mapping $P_{\hat{k}_0}(x)$
for every key candidate $\hat{k}_0$ and every argument $x \in \{0,1\}^8$. Hence, a
straightforward implementation of the attack needs roughly $256 \cdot 256 \cdot 9$
measurements to determine the function $P_{\hat{k}_0}(x)$ for all arguments
$x \in \{0,1\}^8$ and all key candidates $\hat{k}_0 \in \{0,1\}^8$. However, one can reuse
measurements for different key candidates $\hat{k}_0, \hat{k}'_0$ to reduce the number of
measurements to roughly $256 \cdot 9 = 2304$. To determine the vector of hypotheses based on
the candidate $\hat{k}_0$ we can reuse the measurements obtained by determining the function
$P_{\hat{k}_0}$. Hence, the expected number of measurements of this attack is 2304.


6.7.2 Separability and Distinguished Permutations

From a security point of view, it is desirable to reduce the information leakage. E.g., a cache

attack alone should reveal as little information as possible, in particular it should not reveal the

complete key. Then the adversary is forced to either mount a refined and more complex CBA

based on other intermediate results or combine the cache attack with some other method to

determine the key bytes uniquely. In this case, the situation is similar to the attack of (Osvik

et al. 2006), where a cache attack on the first round only reveals 4 bits of each key byte.

Hence Osvik et al. combine cache attacks on the first and second round of AES.

First, we present the property a permutation applied to the result of the 9-th round should
have such that $A$ cannot determine the key bytes uniquely using only a cache attack on the
last round. We denote the $\ell$-th cache line by $CL_\ell$ and the elements of $CL_\ell$ by
$a^{(\ell)}_0,\ldots,a^{(\ell)}_{15}$. Hence, the underlying permutation used to define this
cache line is given by
\[ \pi^{-1}(16\ell + j) = S^{-1}\left[a^{(\ell)}_j\right] \tag{6.2} \]
for $j = 0,\ldots,15$.

We say that a key candidate $\hat{k}_0$ is separable from the first key byte $k^{10}_0$ of
the last round if there exists a measurement that proves $\hat{k}_0$ to be wrong. Conversely,
a key candidate $\hat{k}_0$ is inseparable from the key byte $k^{10}_0$ if there does not
exist a measurement that proves $\hat{k}_0$ to be wrong. More precisely, writing
$\hat{k}_0 = k^{10}_0 \oplus \delta$, the bytes $k^{10}_0$ and $k^{10}_0 \oplus \delta$ are
inseparable if and only if
\[ \forall \ell \in \{0,\ldots,15\}\ \forall a \in CL_\ell : a \oplus \delta \in CL_\ell. \tag{6.3} \]
Notice that this property only depends on the difference $\delta$ and not on the value of
$k_0$. In our setting there are 16 elements of the sbox in every cache line and therefore
property (6.3) can only be satisfied by at most 16 differences.

It turns out that for $|\Delta| = 16$ the set
\[ \Delta := \{\delta \mid \text{for all } k_0 \in \{0,1\}^8 \text{ the bytes } k_0 \text{ and } k_0 \oplus \delta \text{ are inseparable}\} \]
forms a 4 dimensional subspace of $\mathbb{F}_{2^8}$ viewed as an 8 dimensional vector space
over $\mathbb{F}_2$. It is obvious that the neutral element 0 is an element of $\Delta$ and
that every $\delta \in \Delta$ is its own inverse. It remains to show that $\Delta$ is closed
with respect to addition. Consider $\delta, \delta' \in \Delta$ and an arbitrary
$a \in CL_\ell$. Then $a' = a \oplus \delta \in CL_\ell$ implies that
$a' \oplus \delta' = a \oplus \delta \oplus \delta' \in CL_\ell$ because of (6.3), and hence
$\delta \oplus \delta' \in \Delta$ holds.

Hence, any partition that has the maximal number of inseparable key candidates must
generate a subspace of dimension 4.
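Property (6.3) can be tested mechanically for any given partition of the byte values into cache lines. The sketch below is ours: the function name `inseparable_differences` and the two example partitions are hypothetical and chosen only for illustration.

```python
def inseparable_differences(cache_lines):
    """Return all delta such that a ^ delta lies in the cache line of a, for every a."""
    line_of = {a: idx for idx, line in enumerate(cache_lines) for a in line}
    return {
        delta
        for delta in range(256)
        if all(line_of[a ^ delta] == line_of[a] for a in line_of)
    }

# Example partition (assumption): cache line l holds the values 16*l, ..., 16*l + 15,
# so two values share a line exactly when they differ only in the low nibble.
aligned = [list(range(16 * l, 16 * l + 16)) for l in range(16)]
delta_aligned = inseparable_differences(aligned)

# Second example: cache line l holds all values whose low nibble equals l.
strided = [list(range(l, 256, 16)) for l in range(16)]
delta_strided = inseparable_differences(strided)
```

Both example partitions reach the maximum of 16 inseparable differences, and in both cases the resulting set is closed under XOR, in line with the subspace argument above.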

Using this observation we describe how to efficiently construct permutations such that the
set $\Delta$ of inseparable differences has size 16. In the sequel, we will call any such
permutation a distinguished permutation.

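Construction of the Subspace. We first construct a set $\Delta$ of 16 differences that is
closed with respect to addition over $\mathbb{F}_{256}$. We can do this in the following way:

1. set $\Delta := \{\delta_0 := 0\}$, choose $\delta_1$ uniformly at random from the set
   $\{1,\ldots,255\}$, set $\Delta := \Delta \cup \{\delta_1\}$

2. choose $\delta_2$ uniformly at random from $\{1,\ldots,255\} \setminus \Delta$,
   set $\Delta := \Delta \cup \{\delta_2, \delta_3 := \delta_1 \oplus \delta_2\}$

3. choose $\delta_4$ uniformly at random from $\{1,\ldots,255\} \setminus \Delta$,
   set $\Delta := \Delta \cup \{\delta_4, \delta_5 := \delta_4 \oplus \delta_1, \delta_6 := \delta_4 \oplus \delta_2, \delta_7 := \delta_4 \oplus \delta_3\}$

4. choose $\delta_8$ uniformly at random from $\{1,\ldots,255\} \setminus \Delta$,
   set $\Delta := \Delta \cup \{\delta_8, \delta_9 := \delta_8 \oplus \delta_1, \delta_{10} := \delta_8 \oplus \delta_2, \delta_{11} := \delta_8 \oplus \delta_3, \delta_{12} := \delta_8 \oplus \delta_4, \delta_{13} := \delta_8 \oplus \delta_5, \delta_{14} := \delta_8 \oplus \delta_6, \delta_{15} := \delta_8 \oplus \delta_7\}$

This construction ensures that $\Delta$ is closed with respect to addition and hence
$\Delta$ forms a subspace as desired.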
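The four steps follow one pattern: pick a new element outside the current set and add its XOR with every element already present. A minimal sketch of this pattern follows; the function name `random_subspace` and its parameters are ours.

```python
import random

def random_subspace(rng, dim=4, bits=8):
    """Pick a random `dim`-dimensional subspace of F_2^bits, as a list of ints."""
    delta = [0]
    for _ in range(dim):
        # Choose a new generator outside the set built so far ...
        outside = [v for v in range(1, 1 << bits) if v not in delta]
        g = rng.choice(outside)
        # ... and close the set under XOR with the new generator.
        delta += [g ^ d for d in delta]
    return delta

delta = random_subspace(random.Random(3))
```

The list is ordered like the text: index 1, 2, 4 and 8 hold the freshly chosen generators, and every other index holds the XOR of the generators selected by its binary digits.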

Construction of the Permutation. Now we can compute the function $P$ that maps
$S[x] \in \mathbb{F}_2^8$ to a cache line. We use the fact that 16 proper translations of a
4 dimensional subspace form a partition of the 8 dimensional vector space $\mathbb{F}_2^8$.
A basis $\{b_0,\ldots,b_3\}$ of the subspace $\Delta$ can be expanded by 4 vectors
$b_4,\ldots,b_7$ to a basis of $\mathbb{F}_2^8$. The 16 translations of $\Delta$ generated
by linear combinations of $b_4,\ldots,b_7$ form the quotient space
$\mathbb{F}_2^8/\Delta$ that is a partition of $\mathbb{F}_2^8$. To construct the function
$P$ we do the following:

1. for every cache line $CL_\ell$ do

2. choose $a^{(\ell)}$ uniformly at random from
   $\mathbb{F}_{256} \setminus \{a^{(j)} \oplus \delta \mid j < \ell,\ \delta \in \Delta\}$

3. fill $CL_\ell$ with the values of the set $\{a^{(\ell)} \oplus \delta \mid \delta \in \Delta\}$

Using (6.2) this partition into cache lines defines the corresponding permutation.
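The coset-filling loop and the use of (6.2) can be sketched as follows. The assumptions are ours: a seeded random byte permutation stands in for the AES sbox, the four basis vectors of $\Delta$ are hypothetical example values, and the order of the entries inside a cache line is chosen at random.

```python
import random

rng = random.Random(5)

# Stand-in for the AES sbox S and its inverse (assumption).
S = list(range(256))
rng.shuffle(S)
S_inv = [0] * 256
for x, y in enumerate(S):
    S_inv[y] = x

# A 4-dimensional subspace Delta spanned by four hypothetical basis vectors.
basis = [0x01, 0x12, 0x24, 0x48]
delta = [0]
for b in basis:
    delta += [b ^ d for d in delta]

# Fill the 16 cache lines with pairwise disjoint cosets a ^ Delta.
unused = set(range(256))
cache_lines = []
for _ in range(16):
    a = rng.choice(sorted(unused))
    coset = [a ^ d for d in delta]
    rng.shuffle(coset)              # the order within a cache line is free
    cache_lines.append(coset)
    unused -= set(coset)

# Equation (6.2): pi_inv(16*l + j) = S_inv[a_j^(l)].
pi_inv = [S_inv[a] for line in cache_lines for a in line]
pi = [0] * 256
for y, x in enumerate(pi_inv):
    pi[x] = y
```

Because every used value removes its whole coset from `unused`, each newly chosen representative automatically starts a fresh coset, and after 16 rounds all 256 values are assigned.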

Analysis of the Countermeasure. The security of using a distinguished permutation as
defined above rests on two facts.

1. Using a distinguished permutation where the set $\Delta$ of inseparable differences has
   size 16, a cache attack on the last round of AES will reveal only four bits of each key
   byte $k^{10}_i$. Overall 64 of the 128 bits of the last round key remain unknown.
   Therefore, the adversary has to combine his cache attack on the last round with some
   other method to determine the remaining 64 unknown bits. For example, he could try a
   modified cache attack on the 9-th round exploiting his partial knowledge of the last
   round key. Or he could use a brute force search to determine the last round key
   completely.

2. There are several distinguished permutations and each of these permutations leads to
   16! different functions mapping elements to 16 lines. If we choose randomly one of these
   functions, before an adversary can mount a cache attack on the last round as described
   in Section 6.3.2, he first has to use some method like the one described in Section 6.7.1
   to determine the function $P$ that is actually used.

We stress that we consider the first fact to be the more important security feature. We saw

already in Section 6.7.1 that determining a random permutation used for mapping elements to

cache lines is not as secure as one might expect. Since we are using permutations of a special

form the attack described in Section 6.7.1 can be improved somewhat. In the remainder of

this section we briefly describe this improvement. To do so, first we have to determine the

number of subspaces leading to distinguished permutations.

As before, view $\mathbb{F}_2^n := \{0,1\}^n$ as an $n$-dimensional $\mathbb{F}_2$ vector
space. For $0 \le k \le n$ we define $D_{n,k}$ to be the number of $k$-dimensional subspaces
of $\mathbb{F}_2^n$. To determine $D_{n,k}$, for $V$ an arbitrary $m$-dimensional subspace of
$\mathbb{F}_2^n$ we define
\[ N_{m,k} := |\{(v_1,\ldots,v_k) \mid v_i \in V,\ v_1,\ldots,v_k \text{ are linearly independent}\}|. \]
The number $N_{m,k}$ is independent of the particular $m$-dimensional subspace $V$; it only
depends on the two parameters $m$ and $k$. Then
\[ D_{n,k} = \frac{N_{n,k}}{N_{k,k}}. \]
Next we observe that
\[ N_{m,k} = \prod_{j=0}^{k-1} (2^m - 2^j) = 2^{k(k-1)/2} \prod_{j=0}^{k-1} (2^{m-j} - 1). \]
Hence, we obtain that
\[ D_{n,k} = \frac{\prod_{j=0}^{k-1} (2^{n-j} - 1)}{\prod_{j=0}^{k-1} (2^{k-j} - 1)}. \]
In our special case we have $n = 8$ and $k = 4$ and hence the number of 4 dimensional
subspaces is
\[ D_{8,4} = \frac{255 \cdot 127 \cdot 63 \cdot 31}{15 \cdot 7 \cdot 3 \cdot 1} = 200787. \]
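The count can be reproduced directly from the definitions of $N_{m,k}$ and $D_{n,k}$. This is a minimal sketch; the function names are ours.

```python
from math import prod

def independent_tuples(m: int, k: int) -> int:
    """N_{m,k}: ordered k-tuples of linearly independent vectors in F_2^m."""
    return prod(2**m - 2**j for j in range(k))

def count_subspaces(n: int, k: int) -> int:
    """D_{n,k}: number of k-dimensional subspaces of F_2^n."""
    return independent_tuples(n, k) // independent_tuples(k, k)

print(count_subspaces(8, 4))  # 200787
```

The integer division is exact because every $k$-dimensional subspace is counted once per ordered basis, and each such subspace has exactly $N_{k,k}$ ordered bases.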

As mentioned above, each subspace leads to 16! different distinguished permutations.
Hence, overall we have $200787 \cdot 16! \approx 2^{62}$ distinguished permutations. On the
other hand, because of the special structure of our permutations, determining the function
$P$ by cache attacks can be done more efficiently than determining an arbitrary function
mapping elements to cache lines (see Section 6.7.1). In particular, $A$ only needs to
observe about 7 accesses of a single but arbitrary cache line. With high probability this
will be enough to determine a basis of the subspace being used. In addition, $A$ needs at
least one access for every other cache line in order to determine the function $P$. The
corresponding probability experiment follows the multinomial distribution. We did not
calculate the expected number of tries exactly. Experiments show that if we can determine
the accessed cache line exactly, on average 62 measurements suffice to compute the function
$P$ exactly. However, a single measurement only yields a set of accessed cache lines. But
arguments similar to the ones used for the first part of the attack in Section 6.7.1 show
that we need on average 9 measurements to uniquely determine an accessed cache line.
Therefore, on average we need $9 \cdot 62 = 558$ experiments to determine the function $P$.

Hence, compared to the results of Section 6.7.1 we have reduced the number of measurements

used to determine the function $P$ by a factor of 3. However, we want to stress again,

that the main security enhancement of using distinguished permutations instead of arbitrary

permutations is the fact, that with distinguished permutations the last round key cannot

be determined by a cache attack on the last round alone. To improve the security, one can

choose larger key sizes such as 192 bits or 256 bits. Since distinguished permutations protect

half of the key bits, the remaining uncertainty about the secret key after cache attacks can

be increased from 64 bits to 96 bits or 128 bits, respectively.

Separability and Random Permutations. In our CBA on an implementation protected by a random
permutation (Section 6.7.1) we assumed that fixing a candidate $\hat{k}_0$ determines the
candidates for all other key bytes. With sufficiently many measurements for a fixed
$\hat{k}_0$ we can determine the function $P_{\hat{k}_0}$ as defined in Section 6.7.1.
Furthermore, we saw that the separability of candidates $\hat{k}, \hat{k}'$ depends only on
their difference $\delta = \hat{k} \oplus \hat{k}'$. Hence, to be able to rule out all but
one candidate $\hat{k}_i$ at position $i$ for a fixed $\hat{k}_0$ the permutation $\pi$ must
have the following property:
\[ \forall \delta \neq 0\ \exists j \in \{0,\ldots,15\}\ \exists a \in CL_j : a \oplus \delta \notin CL_j. \]
There are approximately $2^{844}$ of the $256! \approx 2^{1684}$ permutations that do not
have this property. Hence, a random permutation satisfies this condition with probability
$1 - 2^{844}/2^{1684} = 1 - 2^{-840}$.

6.8 Summary of Countermeasures and Open Problems

In this chapter we presented and analyzed the security of several different implementations

of AES. Moreover, we analyzed countermeasures based on permutations: random permutations
and distinguished permutations. We give a short overview of the advantages and
disadvantages of the countermeasures:

countermeasure               # measurements   information leakage   security   efficiency
small-4                      ∞                0 bits                high       slow
random permutation           2300             128 bits              low        fast
distinguished permutations   560              64 bits               medium     fast

The second column shows the expected number of measurements an attacker has to perform

in order to get the amount of information shown in the third column.

Small-4 (see Section 6.6) prevents information leakage in a cache attack. However, the
efficiency depends on the size of a cache line and is rather low. In contrast, random
permutations (see Section 6.7) provide only low security. About 2300 measurements are
sufficient to reveal the complete 128 bit AES key. If realized via table lookups, random
permutations are fast. But to increase the security offered by random permutations they
have to be changed frequently. Changing a permutation may cause problems with respect to
efficiency and security. So far, we have no precise analysis of these issues.

Distinguished permutations (see Section 6.7.2) protect half of the key bits and hence
provide a medium level of security. Using distinguished permutations, no frequent changes
of permutations are required to achieve a medium level of security. Hence, they do not
suffer from the above mentioned problems of random permutations. Therefore, distinguished
permutations provide a better ratio of efficiency and security than random permutations but
still leak half of the key bits.

Random permutations and distinguished permutations have to be realized as tables for
efficiency reasons. Hence, a straightforward implementation of the application of a
permutation would render the whole implementation susceptible to cache attacks. A possible
solution to this problem is to realize permutations via small sboxes that completely fit
into a cache line. Following the description of the small-4 variant of Section 6.6, $\pi$
is split into smaller tables $\pi_0,\ldots,\pi_3$ each of which is applied to the input $x$.
Obviously, this does not make sense if the standard sbox $S$ is used because both $\pi$ and
$S$ map from $\{0,1\}^8$ to $\{0,1\}^8$. Hence, it takes as many table lookups to apply
$\pi$ realized with small sboxes as it takes to apply $S$ realized with small sboxes
directly. Moreover, realizing $S$ via small tables has the advantage of not leaking
information via the cache behavior.

The situation is different if the large sboxes of the fast implementation are used. Again
$\pi$ maps from $\{0,1\}^8$ to $\{0,1\}^8$ but a large sbox maps from $\{0,1\}^8$ to
$\{0,1\}^{32}$. Therefore, it takes 4 times as many table lookups to realize the large sbox
via small sboxes as to realize $\pi$ via small tables.

Hence, first applying $\pi$ to an input via small tables and then applying a large permuted
sbox, as shown in Figure 6.7, makes sense if this technique is faster than realizing the
standard sbox $S$ via small sboxes. Here, one has to take into account the technical problem
that on 32-bit platforms the byte oriented structure of the standard sbox $S$ leads to a
time consuming post processing to incorporate the output of the sbox into the encryption
state.

Figure 6.7: Combining small tables with permutation π (the small tables π0, ..., π3 yield π(x0), which is fed into the permuted sbox, T′4[π(x0)] = T4[x0])

Note that realizing $\pi$ via small tables does not leak any information in cache attacks. Only

the application of the permuted sbox leaks information about intermediate states. Hence,

this scenario is exactly the scenario of our attack in Section 6.7.1 where we assumed that

only the application of the sbox leaks information.

As mentioned in Section 6.6 one can scale the sizes of the smaller tables to improve

efficiency. But it is essential to determine whether the amount of information that leaks

with this method is acceptable or not. Summing up, the analysis given above shows that

permutations as a countermeasure to thwart cache based attacks do not provide as much

security as one would expect. However, we have shown that using distinguished permutations

one can reduce the information leakage via CBAs. That means that even with an arbitrary

number of measurements a CBA based on the last round cannot determine certain bits of

the secret key. Since we consider the reduction of information leakage as a preferred goal,

distinguished permutations constitute an interesting way to improve the security gain of

permutations.

Appendix A

Sbox Tables T0,...,T4 of AES

   0           1           2           3           4           5           6           7
00 C6 63 63 A5 F8 7C 7C 84 EE 77 77 99 F6 7B 7B 8D FF F2 F2 0D D6 6B 6B BD DE 6F 6F B1 91 C5 C5 54
01 60 30 30 50 02 01 01 03 CE 67 67 A9 56 2B 2B 7D E7 FE FE 19 B5 D7 D7 62 4D AB AB E6 EC 76 76 9A
02 8F CA CA 45 1F 82 82 9D 89 C9 C9 40 FA 7D 7D 87 EF FA FA 15 B2 59 59 EB 8E 47 47 C9 FB F0 F0 0B
03 41 AD AD EC B3 D4 D4 67 5F A2 A2 FD 45 AF AF EA 23 9C 9C BF 53 A4 A4 F7 E4 72 72 96 9B C0 C0 5B
04 75 B7 B7 C2 E1 FD FD 1C 3D 93 93 AE 4C 26 26 6A 6C 36 36 5A 7E 3F 3F 41 F5 F7 F7 02 83 CC CC 4F
05 68 34 34 5C 51 A5 A5 F4 D1 E5 E5 34 F9 F1 F1 08 E2 71 71 93 AB D8 D8 73 62 31 31 53 2A 15 15 3F
06 08 04 04 0C 95 C7 C7 52 46 23 23 65 9D C3 C3 5E 30 18 18 28 37 96 96 A1 0A 05 05 0F 2F 9A 9A B5
07 0E 07 07 09 24 12 12 36 1B 80 80 9B DF E2 E2 3D CD EB EB 26 4E 27 27 69 7F B2 B2 CD EA 75 75 9F
08 12 09 09 1B 1D 83 83 9E 58 2C 2C 74 34 1A 1A 2E 36 1B 1B 2D DC 6E 6E B2 B4 5A 5A EE 5B A0 A0 FB
09 A4 52 52 F6 76 3B 3B 4D B7 D6 D6 61 7D B3 B3 CE 52 29 29 7B DD E3 E3 3E 5E 2F 2F 71 13 84 84 97
0A A6 53 53 F5 B9 D1 D1 68 00 00 00 00 C1 ED ED 2C 40 20 20 60 E3 FC FC 1F 79 B1 B1 C8 B6 5B 5B ED
0B D4 6A 6A BE 8D CB CB 46 67 BE BE D9 72 39 39 4B 94 4A 4A DE 98 4C 4C D4 B0 58 58 E8 85 CF CF 4A
0C BB D0 D0 6B C5 EF EF 2A 4F AA AA E5 ED FB FB 16 86 43 43 C5 9A 4D 4D D7 66 33 33 55 11 85 85 94
0D 8A 45 45 CF E9 F9 F9 10 04 02 02 06 FE 7F 7F 81 A0 50 50 F0 78 3C 3C 44 25 9F 9F BA 4B A8 A8 E3
0E A2 51 51 F3 5D A3 A3 FE 80 40 40 C0 05 8F 8F 8A 3F 92 92 AD 21 9D 9D BC 70 38 38 48 F1 F5 F5 04
0F 63 BC BC DF 77 B6 B6 C1 AF DA DA 75 42 21 21 63 20 10 10 30 E5 FF FF 1A FD F3 F3 0E BF D2 D2 6D
10 81 CD CD 4C 18 0C 0C 14 26 13 13 35 C3 EC EC 2F BE 5F 5F E1 35 97 97 A2 88 44 44 CC 2E 17 17 39
11 93 C4 C4 57 55 A7 A7 F2 FC 7E 7E 82 7A 3D 3D 47 C8 64 64 AC BA 5D 5D E7 32 19 19 2B E6 73 73 95
12 C0 60 60 A0 19 81 81 98 9E 4F 4F D1 A3 DC DC 7F 44 22 22 66 54 2A 2A 7E 3B 90 90 AB 0B 88 88 83
13 8C 46 46 CA C7 EE EE 29 6B B8 B8 D3 28 14 14 3C A7 DE DE 79 BC 5E 5E E2 16 0B 0B 1D AD DB DB 76
14 DB E0 E0 3B 64 32 32 56 74 3A 3A 4E 14 0A 0A 1E 92 49 49 DB 0C 06 06 0A 48 24 24 6C B8 5C 5C E4
15 9F C2 C2 5D BD D3 D3 6E 43 AC AC EF C4 62 62 A6 39 91 91 A8 31 95 95 A4 D3 E4 E4 37 F2 79 79 8B
16 D5 E7 E7 32 8B C8 C8 43 6E 37 37 59 DA 6D 6D B7 01 8D 8D 8C B1 D5 D5 64 9C 4E 4E D2 49 A9 A9 E0
17 D8 6C 6C B4 AC 56 56 FA F3 F4 F4 07 CF EA EA 25 CA 65 65 AF F4 7A 7A 8E 47 AE AE E9 10 08 08 18
18 6F BA BA D5 F0 78 78 88 4A 25 25 6F 5C 2E 2E 72 38 1C 1C 24 57 A6 A6 F1 73 B4 B4 C7 97 C6 C6 51
19 CB E8 E8 23 A1 DD DD 7C E8 74 74 9C 3E 1F 1F 21 96 4B 4B DD 61 BD BD DC 0D 8B 8B 86 0F 8A 8A 85
1A E0 70 70 90 7C 3E 3E 42 71 B5 B5 C4 CC 66 66 AA 90 48 48 D8 06 03 03 05 F7 F6 F6 01 1C 0E 0E 12
1B C2 61 61 A3 6A 35 35 5F AE 57 57 F9 69 B9 B9 D0 17 86 86 91 99 C1 C1 58 3A 1D 1D 27 27 9E 9E B9
1C D9 E1 E1 38 EB F8 F8 13 2B 98 98 B3 22 11 11 33 D2 69 69 BB A9 D9 D9 70 07 8E 8E 89 33 94 94 A7
1D 2D 9B 9B B6 3C 1E 1E 22 15 87 87 92 C9 E9 E9 20 87 CE CE 49 AA 55 55 FF 50 28 28 78 A5 DF DF 7A
1E 03 8C 8C 8F 59 A1 A1 F8 09 89 89 80 1A 0D 0D 17 65 BF BF DA D7 E6 E6 31 84 42 42 C6 D0 68 68 B8
1F 82 41 41 C3 29 99 99 B0 5A 2D 2D 77 1E 0F 0F 11 7B B0 B0 CB A8 54 54 FC 6D BB BB D6 2C 16 16 3A

Table A.1: Sbox T0


   0           1           2           3           4           5           6           7
00 A5 C6 63 63 84 F8 7C 7C 99 EE 77 77 8D F6 7B 7B 0D FF F2 F2 BD D6 6B 6B B1 DE 6F 6F 54 91 C5 C5
01 50 60 30 30 03 02 01 01 A9 CE 67 67 7D 56 2B 2B 19 E7 FE FE 62 B5 D7 D7 E6 4D AB AB 9A EC 76 76
02 45 8F CA CA 9D 1F 82 82 40 89 C9 C9 87 FA 7D 7D 15 EF FA FA EB B2 59 59 C9 8E 47 47 0B FB F0 F0
03 EC 41 AD AD 67 B3 D4 D4 FD 5F A2 A2 EA 45 AF AF BF 23 9C 9C F7 53 A4 A4 96 E4 72 72 5B 9B C0 C0
04 C2 75 B7 B7 1C E1 FD FD AE 3D 93 93 6A 4C 26 26 5A 6C 36 36 41 7E 3F 3F 02 F5 F7 F7 4F 83 CC CC
05 5C 68 34 34 F4 51 A5 A5 34 D1 E5 E5 08 F9 F1 F1 93 E2 71 71 73 AB D8 D8 53 62 31 31 3F 2A 15 15
06 0C 08 04 04 52 95 C7 C7 65 46 23 23 5E 9D C3 C3 28 30 18 18 A1 37 96 96 0F 0A 05 05 B5 2F 9A 9A
07 09 0E 07 07 36 24 12 12 9B 1B 80 80 3D DF E2 E2 26 CD EB EB 69 4E 27 27 CD 7F B2 B2 9F EA 75 75
08 1B 12 09 09 9E 1D 83 83 74 58 2C 2C 2E 34 1A 1A 2D 36 1B 1B B2 DC 6E 6E EE B4 5A 5A FB 5B A0 A0
09 F6 A4 52 52 4D 76 3B 3B 61 B7 D6 D6 CE 7D B3 B3 7B 52 29 29 3E DD E3 E3 71 5E 2F 2F 97 13 84 84
0A F5 A6 53 53 68 B9 D1 D1 00 00 00 00 2C C1 ED ED 60 40 20 20 1F E3 FC FC C8 79 B1 B1 ED B6 5B 5B
0B BE D4 6A 6A 46 8D CB CB D9 67 BE BE 4B 72 39 39 DE 94 4A 4A D4 98 4C 4C E8 B0 58 58 4A 85 CF CF
0C 6B BB D0 D0 2A C5 EF EF E5 4F AA AA 16 ED FB FB C5 86 43 43 D7 9A 4D 4D 55 66 33 33 94 11 85 85
0D CF 8A 45 45 10 E9 F9 F9 06 04 02 02 81 FE 7F 7F F0 A0 50 50 44 78 3C 3C BA 25 9F 9F E3 4B A8 A8
0E F3 A2 51 51 FE 5D A3 A3 C0 80 40 40 8A 05 8F 8F AD 3F 92 92 BC 21 9D 9D 48 70 38 38 04 F1 F5 F5
0F DF 63 BC BC C1 77 B6 B6 75 AF DA DA 63 42 21 21 30 20 10 10 1A E5 FF FF 0E FD F3 F3 6D BF D2 D2
10 4C 81 CD CD 14 18 0C 0C 35 26 13 13 2F C3 EC EC E1 BE 5F 5F A2 35 97 97 CC 88 44 44 39 2E 17 17
11 57 93 C4 C4 F2 55 A7 A7 82 FC 7E 7E 47 7A 3D 3D AC C8 64 64 E7 BA 5D 5D 2B 32 19 19 95 E6 73 73
12 A0 C0 60 60 98 19 81 81 D1 9E 4F 4F 7F A3 DC DC 66 44 22 22 7E 54 2A 2A AB 3B 90 90 83 0B 88 88
13 CA 8C 46 46 29 C7 EE EE D3 6B B8 B8 3C 28 14 14 79 A7 DE DE E2 BC 5E 5E 1D 16 0B 0B 76 AD DB DB
14 3B DB E0 E0 56 64 32 32 4E 74 3A 3A 1E 14 0A 0A DB 92 49 49 0A 0C 06 06 6C 48 24 24 E4 B8 5C 5C
15 5D 9F C2 C2 6E BD D3 D3 EF 43 AC AC A6 C4 62 62 A8 39 91 91 A4 31 95 95 37 D3 E4 E4 8B F2 79 79
16 32 D5 E7 E7 43 8B C8 C8 59 6E 37 37 B7 DA 6D 6D 8C 01 8D 8D 64 B1 D5 D5 D2 9C 4E 4E E0 49 A9 A9
17 B4 D8 6C 6C FA AC 56 56 07 F3 F4 F4 25 CF EA EA AF CA 65 65 8E F4 7A 7A E9 47 AE AE 18 10 08 08
18 D5 6F BA BA 88 F0 78 78 6F 4A 25 25 72 5C 2E 2E 24 38 1C 1C F1 57 A6 A6 C7 73 B4 B4 51 97 C6 C6
19 23 CB E8 E8 7C A1 DD DD 9C E8 74 74 21 3E 1F 1F DD 96 4B 4B DC 61 BD BD 86 0D 8B 8B 85 0F 8A 8A
1A 90 E0 70 70 42 7C 3E 3E C4 71 B5 B5 AA CC 66 66 D8 90 48 48 05 06 03 03 01 F7 F6 F6 12 1C 0E 0E
1B A3 C2 61 61 5F 6A 35 35 F9 AE 57 57 D0 69 B9 B9 91 17 86 86 58 99 C1 C1 27 3A 1D 1D B9 27 9E 9E
1C 38 D9 E1 E1 13 EB F8 F8 B3 2B 98 98 33 22 11 11 BB D2 69 69 70 A9 D9 D9 89 07 8E 8E A7 33 94 94
1D B6 2D 9B 9B 22 3C 1E 1E 92 15 87 87 20 C9 E9 E9 49 87 CE CE FF AA 55 55 78 50 28 28 7A A5 DF DF
1E 8F 03 8C 8C F8 59 A1 A1 80 09 89 89 17 1A 0D 0D DA 65 BF BF 31 D7 E6 E6 C6 84 42 42 B8 D0 68 68
1F C3 82 41 41 B0 29 99 99 77 5A 2D 2D 11 1E 0F 0F CB 7B B0 B0 FC A8 54 54 D6 6D BB BB 3A 2C 16 16

Table A.2: Sbox T1

   0           1           2           3           4           5           6           7
00 63 A5 C6 63 7C 84 F8 7C 77 99 EE 77 7B 8D F6 7B F2 0D FF F2 6B BD D6 6B 6F B1 DE 6F C5 54 91 C5
01 30 50 60 30 01 03 02 01 67 A9 CE 67 2B 7D 56 2B FE 19 E7 FE D7 62 B5 D7 AB E6 4D AB 76 9A EC 76
02 CA 45 8F CA 82 9D 1F 82 C9 40 89 C9 7D 87 FA 7D FA 15 EF FA 59 EB B2 59 47 C9 8E 47 F0 0B FB F0
03 AD EC 41 AD D4 67 B3 D4 A2 FD 5F A2 AF EA 45 AF 9C BF 23 9C A4 F7 53 A4 72 96 E4 72 C0 5B 9B C0
04 B7 C2 75 B7 FD 1C E1 FD 93 AE 3D 93 26 6A 4C 26 36 5A 6C 36 3F 41 7E 3F F7 02 F5 F7 CC 4F 83 CC
05 34 5C 68 34 A5 F4 51 A5 E5 34 D1 E5 F1 08 F9 F1 71 93 E2 71 D8 73 AB D8 31 53 62 31 15 3F 2A 15
06 04 0C 08 04 C7 52 95 C7 23 65 46 23 C3 5E 9D C3 18 28 30 18 96 A1 37 96 05 0F 0A 05 9A B5 2F 9A
07 07 09 0E 07 12 36 24 12 80 9B 1B 80 E2 3D DF E2 EB 26 CD EB 27 69 4E 27 B2 CD 7F B2 75 9F EA 75
08 09 1B 12 09 83 9E 1D 83 2C 74 58 2C 1A 2E 34 1A 1B 2D 36 1B 6E B2 DC 6E 5A EE B4 5A A0 FB 5B A0
09 52 F6 A4 52 3B 4D 76 3B D6 61 B7 D6 B3 CE 7D B3 29 7B 52 29 E3 3E DD E3 2F 71 5E 2F 84 97 13 84
0A 53 F5 A6 53 D1 68 B9 D1 00 00 00 00 ED 2C C1 ED 20 60 40 20 FC 1F E3 FC B1 C8 79 B1 5B ED B6 5B
0B 6A BE D4 6A CB 46 8D CB BE D9 67 BE 39 4B 72 39 4A DE 94 4A 4C D4 98 4C 58 E8 B0 58 CF 4A 85 CF
0C D0 6B BB D0 EF 2A C5 EF AA E5 4F AA FB 16 ED FB 43 C5 86 43 4D D7 9A 4D 33 55 66 33 85 94 11 85
0D 45 CF 8A 45 F9 10 E9 F9 02 06 04 02 7F 81 FE 7F 50 F0 A0 50 3C 44 78 3C 9F BA 25 9F A8 E3 4B A8
0E 51 F3 A2 51 A3 FE 5D A3 40 C0 80 40 8F 8A 05 8F 92 AD 3F 92 9D BC 21 9D 38 48 70 38 F5 04 F1 F5
0F BC DF 63 BC B6 C1 77 B6 DA 75 AF DA 21 63 42 21 10 30 20 10 FF 1A E5 FF F3 0E FD F3 D2 6D BF D2
10 CD 4C 81 CD 0C 14 18 0C 13 35 26 13 EC 2F C3 EC 5F E1 BE 5F 97 A2 35 97 44 CC 88 44 17 39 2E 17
11 C4 57 93 C4 A7 F2 55 A7 7E 82 FC 7E 3D 47 7A 3D 64 AC C8 64 5D E7 BA 5D 19 2B 32 19 73 95 E6 73
12 60 A0 C0 60 81 98 19 81 4F D1 9E 4F DC 7F A3 DC 22 66 44 22 2A 7E 54 2A 90 AB 3B 90 88 83 0B 88
13 46 CA 8C 46 EE 29 C7 EE B8 D3 6B B8 14 3C 28 14 DE 79 A7 DE 5E E2 BC 5E 0B 1D 16 0B DB 76 AD DB
14 E0 3B DB E0 32 56 64 32 3A 4E 74 3A 0A 1E 14 0A 49 DB 92 49 06 0A 0C 06 24 6C 48 24 5C E4 B8 5C
15 C2 5D 9F C2 D3 6E BD D3 AC EF 43 AC 62 A6 C4 62 91 A8 39 91 95 A4 31 95 E4 37 D3 E4 79 8B F2 79
16 E7 32 D5 E7 C8 43 8B C8 37 59 6E 37 6D B7 DA 6D 8D 8C 01 8D D5 64 B1 D5 4E D2 9C 4E A9 E0 49 A9
17 6C B4 D8 6C 56 FA AC 56 F4 07 F3 F4 EA 25 CF EA 65 AF CA 65 7A 8E F4 7A AE E9 47 AE 08 18 10 08
18 BA D5 6F BA 78 88 F0 78 25 6F 4A 25 2E 72 5C 2E 1C 24 38 1C A6 F1 57 A6 B4 C7 73 B4 C6 51 97 C6
19 E8 23 CB E8 DD 7C A1 DD 74 9C E8 74 1F 21 3E 1F 4B DD 96 4B BD DC 61 BD 8B 86 0D 8B 8A 85 0F 8A
1A 70 90 E0 70 3E 42 7C 3E B5 C4 71 B5 66 AA CC 66 48 D8 90 48 03 05 06 03 F6 01 F7 F6 0E 12 1C 0E
1B 61 A3 C2 61 35 5F 6A 35 57 F9 AE 57 B9 D0 69 B9 86 91 17 86 C1 58 99 C1 1D 27 3A 1D 9E B9 27 9E
1C E1 38 D9 E1 F8 13 EB F8 98 B3 2B 98 11 33 22 11 69 BB D2 69 D9 70 A9 D9 8E 89 07 8E 94 A7 33 94
1D 9B B6 2D 9B 1E 22 3C 1E 87 92 15 87 E9 20 C9 E9 CE 49 87 CE 55 FF AA 55 28 78 50 28 DF 7A A5 DF
1E 8C 8F 03 8C A1 F8 59 A1 89 80 09 89 0D 17 1A 0D BF DA 65 BF E6 31 D7 E6 42 C6 84 42 68 B8 D0 68
1F 41 C3 82 41 99 B0 29 99 2D 77 5A 2D 0F 11 1E 0F B0 CB 7B B0 54 FC A8 54 BB D6 6D BB 16 3A 2C 16

Table A.3: Sbox T2

0 1 2 3 4 5 6 7

00 63 63 A5 C6 7C 7C 84 F8 77 77 99 EE 7B 7B 8D F6 F2 F2 0D FF 6B 6B BD D6 6F 6F B1 DE C5 C5 54 91

01 30 30 50 60 01 01 03 02 67 67 A9 CE 2B 2B 7D 56 FE FE 19 E7 D7 D7 62 B5 AB AB E6 4D 76 76 9A EC

02 CA CA 45 8F 82 82 9D 1F C9 C9 40 89 7D 7D 87 FA FA FA 15 EF 59 59 EB B2 47 47 C9 8E F0 F0 0B FB

03 AD AD EC 41 D4 D4 67 B3 A2 A2 FD 5F AF AF EA 45 9C 9C BF 23 A4 A4 F7 53 72 72 96 E4 C0 C0 5B 9B

04 B7 B7 C2 75 FD FD 1C E1 93 93 AE 3D 26 26 6A 4C 36 36 5A 6C 3F 3F 41 7E F7 F7 02 F5 CC CC 4F 83

05 34 34 5C 68 A5 A5 F4 51 E5 E5 34 D1 F1 F1 08 F9 71 71 93 E2 D8 D8 73 AB 31 31 53 62 15 15 3F 2A

06 04 04 0C 08 C7 C7 52 95 23 23 65 46 C3 C3 5E 9D 18 18 28 30 96 96 A1 37 05 05 0F 0A 9A 9A B5 2F

07 07 07 09 0E 12 12 36 24 80 80 9B 1B E2 E2 3D DF EB EB 26 CD 27 27 69 4E B2 B2 CD 7F 75 75 9F EA

08 09 09 1B 12 83 83 9E 1D 2C 2C 74 58 1A 1A 2E 34 1B 1B 2D 36 6E 6E B2 DC 5A 5A EE B4 A0 A0 FB 5B

09 52 52 F6 A4 3B 3B 4D 76 D6 D6 61 B7 B3 B3 CE 7D 29 29 7B 52 E3 E3 3E DD 2F 2F 71 5E 84 84 97 13

0A 53 53 F5 A6 D1 D1 68 B9 00 00 00 00 ED ED 2C C1 20 20 60 40 FC FC 1F E3 B1 B1 C8 79 5B 5B ED B6

0B 6A 6A BE D4 CB CB 46 8D BE BE D9 67 39 39 4B 72 4A 4A DE 94 4C 4C D4 98 58 58 E8 B0 CF CF 4A 85

0C D0 D0 6B BB EF EF 2A C5 AA AA E5 4F FB FB 16 ED 43 43 C5 86 4D 4D D7 9A 33 33 55 66 85 85 94 11

0D 45 45 CF 8A F9 F9 10 E9 02 02 06 04 7F 7F 81 FE 50 50 F0 A0 3C 3C 44 78 9F 9F BA 25 A8 A8 E3 4B

0E 51 51 F3 A2 A3 A3 FE 5D 40 40 C0 80 8F 8F 8A 05 92 92 AD 3F 9D 9D BC 21 38 38 48 70 F5 F5 04 F1

0F BC BC DF 63 B6 B6 C1 77 DA DA 75 AF 21 21 63 42 10 10 30 20 FF FF 1A E5 F3 F3 0E FD D2 D2 6D BF

10 CD CD 4C 81 0C 0C 14 18 13 13 35 26 EC EC 2F C3 5F 5F E1 BE 97 97 A2 35 44 44 CC 88 17 17 39 2E

11 C4 C4 57 93 A7 A7 F2 55 7E 7E 82 FC 3D 3D 47 7A 64 64 AC C8 5D 5D E7 BA 19 19 2B 32 73 73 95 E6

12 60 60 A0 C0 81 81 98 19 4F 4F D1 9E DC DC 7F A3 22 22 66 44 2A 2A 7E 54 90 90 AB 3B 88 88 83 0B

13 46 46 CA 8C EE EE 29 C7 B8 B8 D3 6B 14 14 3C 28 DE DE 79 A7 5E 5E E2 BC 0B 0B 1D 16 DB DB 76 AD

14 E0 E0 3B DB 32 32 56 64 3A 3A 4E 74 0A 0A 1E 14 49 49 DB 92 06 06 0A 0C 24 24 6C 48 5C 5C E4 B8

15 C2 C2 5D 9F D3 D3 6E BD AC AC EF 43 62 62 A6 C4 91 91 A8 39 95 95 A4 31 E4 E4 37 D3 79 79 8B F2

16 E7 E7 32 D5 C8 C8 43 8B 37 37 59 6E 6D 6D B7 DA 8D 8D 8C 01 D5 D5 64 B1 4E 4E D2 9C A9 A9 E0 49

17 6C 6C B4 D8 56 56 FA AC F4 F4 07 F3 EA EA 25 CF 65 65 AF CA 7A 7A 8E F4 AE AE E9 47 08 08 18 10

18 BA BA D5 6F 78 78 88 F0 25 25 6F 4A 2E 2E 72 5C 1C 1C 24 38 A6 A6 F1 57 B4 B4 C7 73 C6 C6 51 97

19 E8 E8 23 CB DD DD 7C A1 74 74 9C E8 1F 1F 21 3E 4B 4B DD 96 BD BD DC 61 8B 8B 86 0D 8A 8A 85 0F

1A 70 70 90 E0 3E 3E 42 7C B5 B5 C4 71 66 66 AA CC 48 48 D8 90 03 03 05 06 F6 F6 01 F7 0E 0E 12 1C

1B 61 61 A3 C2 35 35 5F 6A 57 57 F9 AE B9 B9 D0 69 86 86 91 17 C1 C1 58 99 1D 1D 27 3A 9E 9E B9 27

1C E1 E1 38 D9 F8 F8 13 EB 98 98 B3 2B 11 11 33 22 69 69 BB D2 D9 D9 70 A9 8E 8E 89 07 94 94 A7 33

1D 9B 9B B6 2D 1E 1E 22 3C 87 87 92 15 E9 E9 20 C9 CE CE 49 87 55 55 FF AA 28 28 78 50 DF DF 7A A5

1E 8C 8C 8F 03 A1 A1 F8 59 89 89 80 09 0D 0D 17 1A BF BF DA 65 E6 E6 31 D7 42 42 C6 84 68 68 B8 D0

1F 41 41 C3 82 99 99 B0 29 2D 2D 77 5A 0F 0F 11 1E B0 B0 CB 7B 54 54 FC A8 BB BB D6 6D 16 16 3A 2C

Table A.4: Sbox T3

0 1 2 3 4 5 6 7

00 63 63 63 63 7C 7C 7C 7C 77 77 77 77 7B 7B 7B 7B F2 F2 F2 F2 6B 6B 6B 6B 6F 6F 6F 6F C5 C5 C5 C5

01 30 30 30 30 01 01 01 01 67 67 67 67 2B 2B 2B 2B FE FE FE FE D7 D7 D7 D7 AB AB AB AB 76 76 76 76

02 CA CA CA CA 82 82 82 82 C9 C9 C9 C9 7D 7D 7D 7D FA FA FA FA 59 59 59 59 47 47 47 47 F0 F0 F0 F0

03 AD AD AD AD D4 D4 D4 D4 A2 A2 A2 A2 AF AF AF AF 9C 9C 9C 9C A4 A4 A4 A4 72 72 72 72 C0 C0 C0 C0

04 B7 B7 B7 B7 FD FD FD FD 93 93 93 93 26 26 26 26 36 36 36 36 3F 3F 3F 3F F7 F7 F7 F7 CC CC CC CC

05 34 34 34 34 A5 A5 A5 A5 E5 E5 E5 E5 F1 F1 F1 F1 71 71 71 71 D8 D8 D8 D8 31 31 31 31 15 15 15 15

06 04 04 04 04 C7 C7 C7 C7 23 23 23 23 C3 C3 C3 C3 18 18 18 18 96 96 96 96 05 05 05 05 9A 9A 9A 9A

07 07 07 07 07 12 12 12 12 80 80 80 80 E2 E2 E2 E2 EB EB EB EB 27 27 27 27 B2 B2 B2 B2 75 75 75 75

08 09 09 09 09 83 83 83 83 2C 2C 2C 2C 1A 1A 1A 1A 1B 1B 1B 1B 6E 6E 6E 6E 5A 5A 5A 5A A0 A0 A0 A0

09 52 52 52 52 3B 3B 3B 3B D6 D6 D6 D6 B3 B3 B3 B3 29 29 29 29 E3 E3 E3 E3 2F 2F 2F 2F 84 84 84 84

0A 53 53 53 53 D1 D1 D1 D1 00 00 00 00 ED ED ED ED 20 20 20 20 FC FC FC FC B1 B1 B1 B1 5B 5B 5B 5B

0B 6A 6A 6A 6A CB CB CB CB BE BE BE BE 39 39 39 39 4A 4A 4A 4A 4C 4C 4C 4C 58 58 58 58 CF CF CF CF

0C D0 D0 D0 D0 EF EF EF EF AA AA AA AA FB FB FB FB 43 43 43 43 4D 4D 4D 4D 33 33 33 33 85 85 85 85

0D 45 45 45 45 F9 F9 F9 F9 02 02 02 02 7F 7F 7F 7F 50 50 50 50 3C 3C 3C 3C 9F 9F 9F 9F A8 A8 A8 A8

0E 51 51 51 51 A3 A3 A3 A3 40 40 40 40 8F 8F 8F 8F 92 92 92 92 9D 9D 9D 9D 38 38 38 38 F5 F5 F5 F5

0F BC BC BC BC B6 B6 B6 B6 DA DA DA DA 21 21 21 21 10 10 10 10 FF FF FF FF F3 F3 F3 F3 D2 D2 D2 D2

10 CD CD CD CD 0C 0C 0C 0C 13 13 13 13 EC EC EC EC 5F 5F 5F 5F 97 97 97 97 44 44 44 44 17 17 17 17

11 C4 C4 C4 C4 A7 A7 A7 A7 7E 7E 7E 7E 3D 3D 3D 3D 64 64 64 64 5D 5D 5D 5D 19 19 19 19 73 73 73 73

12 60 60 60 60 81 81 81 81 4F 4F 4F 4F DC DC DC DC 22 22 22 22 2A 2A 2A 2A 90 90 90 90 88 88 88 88

13 46 46 46 46 EE EE EE EE B8 B8 B8 B8 14 14 14 14 DE DE DE DE 5E 5E 5E 5E 0B 0B 0B 0B DB DB DB DB

14 E0 E0 E0 E0 32 32 32 32 3A 3A 3A 3A 0A 0A 0A 0A 49 49 49 49 06 06 06 06 24 24 24 24 5C 5C 5C 5C

15 C2 C2 C2 C2 D3 D3 D3 D3 AC AC AC AC 62 62 62 62 91 91 91 91 95 95 95 95 E4 E4 E4 E4 79 79 79 79

16 E7 E7 E7 E7 C8 C8 C8 C8 37 37 37 37 6D 6D 6D 6D 8D 8D 8D 8D D5 D5 D5 D5 4E 4E 4E 4E A9 A9 A9 A9

17 6C 6C 6C 6C 56 56 56 56 F4 F4 F4 F4 EA EA EA EA 65 65 65 65 7A 7A 7A 7A AE AE AE AE 08 08 08 08

18 BA BA BA BA 78 78 78 78 25 25 25 25 2E 2E 2E 2E 1C 1C 1C 1C A6 A6 A6 A6 B4 B4 B4 B4 C6 C6 C6 C6

19 E8 E8 E8 E8 DD DD DD DD 74 74 74 74 1F 1F 1F 1F 4B 4B 4B 4B BD BD BD BD 8B 8B 8B 8B 8A 8A 8A 8A

1A 70 70 70 70 3E 3E 3E 3E B5 B5 B5 B5 66 66 66 66 48 48 48 48 03 03 03 03 F6 F6 F6 F6 0E 0E 0E 0E

1B 61 61 61 61 35 35 35 35 57 57 57 57 B9 B9 B9 B9 86 86 86 86 C1 C1 C1 C1 1D 1D 1D 1D 9E 9E 9E 9E

1C E1 E1 E1 E1 F8 F8 F8 F8 98 98 98 98 11 11 11 11 69 69 69 69 D9 D9 D9 D9 8E 8E 8E 8E 94 94 94 94

1D 9B 9B 9B 9B 1E 1E 1E 1E 87 87 87 87 E9 E9 E9 E9 CE CE CE CE 55 55 55 55 28 28 28 28 DF DF DF DF

1E 8C 8C 8C 8C A1 A1 A1 A1 89 89 89 89 0D 0D 0D 0D BF BF BF BF E6 E6 E6 E6 42 42 42 42 68 68 68 68

1F 41 41 41 41 99 99 99 99 2D 2D 2D 2D 0F 0F 0F 0F B0 B0 B0 B0 54 54 54 54 BB BB BB BB 16 16 16 16

Table A.5: Sbox T4
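
The printed rows suggest how each four-byte cell of Tables A.3 to A.5 follows from the sbox output s = S[x]: a cell holds s together with its multiples 2·s and 3·s in GF(2^8), and the tables differ only in the byte order. The following Python sketch reproduces single cells. The byte orders in LAYOUT are read off the rows printed here, which is an assumption about the table layout and not a statement about the implementation used in the thesis; the names xtime, LAYOUT and t_cell are purely illustrative.

```python
def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # reduce by the AES polynomial
    return b & 0xFF


# Multipliers of s per byte position, as they appear in the printed rows
# (assumed layout; only the tables shown in this part of the appendix).
LAYOUT = {
    2: (1, 3, 2, 1),  # Table A.3
    3: (1, 1, 3, 2),  # Table A.4
    4: (1, 1, 1, 1),  # Table A.5
}


def t_cell(s: int, k: int) -> list[int]:
    """Return the four bytes of the cell of table T_k for sbox output s."""
    multiples = {1: s, 2: xtime(s), 3: xtime(s) ^ s}
    return [multiples[c] for c in LAYOUT[k]]


if __name__ == "__main__":
    # First cell of row 00 in Table A.4 and first cell of row 13 in Table A.3.
    print([f"{b:02X}" for b in t_cell(0x63, 3)])
    print([f"{b:02X}" for b in t_cell(0x46, 2)])
```

With s = 0x63 the sketch yields the cell 63 63 A5 C6, and with s = 0x46 the cell 46 CA 8C 46, as printed above.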

Appendix B

Decompositions of the AES Sbox

In the sequel, the standard AES sbox is decomposed into a number of smaller sboxes, as described in Section 6.6. For each decomposition, the function that computes S[x] from x is shown.

The standard AES Sbox

The standard sbox is an efficient realization of the mapping

$$\{0,1\}^8 \to \{0,1\}^8, \qquad x \mapsto S[x].$$







63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76

CA 82 C9 7D FA 59 47 F0 AD D4 A2 AF 9C A4 72 C0

B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15

04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75

09 83 2C 1A 1B 6E 5A A0 52 3B D6 B3 29 E3 2F 84

53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF

D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8

51 A3 40 8F 92 9D 38 F5 BC B6 DA 21 10 FF F3 D2

CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73

60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB

E0 32 3A 0A 49 06 24 5C C2 D3 AC 62 91 95 E4 79

E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08

BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A

70 3E B5 66 48 03 F6 0E 61 35 57 B9 86 C1 1D 9E

E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF

8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16







Table B.1: The standard sbox S
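
Table B.1 can be regenerated from the algebraic definition of the sbox in FIPS-197: inversion in GF(2^8), computed here as u^254 by repeated squaring, followed by the affine transformation. The Python sketch below is only an illustration with hypothetical function names; it makes no attempt to be resistant against side channel attacks.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(u: int) -> int:
    """Return u^254, i.e. the inverse of u for u != 0, and 0 for u = 0."""
    result, base, e = 1, u, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)  # repeated squaring
        e >>= 1
    return result


def rotl8(b: int, n: int) -> int:
    """Rotate a byte left by n bit positions."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def sbox_entry(x: int) -> int:
    """Inversion followed by the affine transformation with constant 0x63."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]

if __name__ == "__main__":
    # Print the table row by row; row i holds S[16*i] .. S[16*i + 15].
    for i in range(16):
        print(" ".join(f"{s:02X}" for s in SBOX[16 * i:16 * i + 16]))
```

Read this way, the entry in row i and column j of Table B.1 is S[16·i + j].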


Decomposition of the sbox S into 2 smaller sboxes

The standard sbox S is split into 2 smaller sboxes $S^{(2)}_0$ and $S^{(2)}_1$, each mapping from $\{0,1\}^8$ to $\{0,1\}^4$. The application of the sbox is then realized as

$$\{0,1\}^8 \to \{0,1\}^4 \times \{0,1\}^4, \qquad x \mapsto 16 \cdot S^{(2)}_1[x] \oplus S^{(2)}_0[x].$$

$S^{(2)}_0 =$







3 C 7 B 2 B F 5 0 1 7 B E 7 B 6

A 2 9 D A 9 7 0 D 4 2 F C 4 2 0

7 D 3 6 6 F 7 C 4 5 5 1 1 8 1 5

4 7 3 3 8 6 5 A 7 2 0 2 B 7 2 5

9 3 C A B E A 0 2 B 6 3 9 3 F 4

3 1 0 D 0 C 1 B A B E 9 A C 8 F

0 F A B 3 D 3 5 5 9 2 F 0 C F 8

1 3 0 F 2 D 8 5 C 6 A 1 0 F 3 2

D C 3 C F 7 4 7 4 7 E D 4 D 9 3

0 1 F C 2 A 0 8 6 E 8 4 E E B B

0 2 A A 9 6 4 C 2 3 C 2 1 5 4 9

7 8 7 D D 5 E 9 C 6 4 A 5 A E 8

A 8 5 E C 6 4 6 8 D 4 F B D B A

0 E 5 6 8 3 6 E 1 5 7 9 6 1 D E

1 8 8 1 9 9 E 4 B E 7 9 E 5 8 F

C 1 9 D F 6 2 8 1 9 D F 0 4 B 6







$S^{(2)}_1 =$







6 7 7 7 F 6 6 C 3 0 6 2 F D A 7

C 8 C 7 F 5 4 F A D A A 9 A 7 C

B F 9 2 3 3 F C 3 A E F 7 D 3 1

0 C 2 C 1 9 0 9 0 1 8 E E 2 B 7

0 8 2 1 1 6 5 A 5 3 D B 2 E 2 8

5 D 0 E 2 F B 5 6 C B 3 4 4 5 C

D E A F 4 4 3 8 4 F 0 7 5 3 9 A

5 A 4 8 9 9 3 F B B D 2 1 F F D

C 0 1 E 5 9 4 1 C A 7 3 6 5 1 7

6 8 4 D 2 2 9 8 4 E B 1 D 5 0 D

E 3 3 0 4 0 2 5 C D A 6 9 9 E 7

E C 3 6 8 D 4 A 6 5 F E 6 7 A 0

B 7 2 2 1 A B C E D 7 1 4 B 8 8

7 3 B 6 4 0 F 0 6 3 5 B 8 C 1 9

E F 9 1 6 D 8 9 9 1 8 E C 5 2 D

8 A 8 0 B E 4 6 4 9 2 0 B 5 B 1
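
The splitting itself is a matter of shifting and masking. The following Python sketch applies it to the first row of Table B.1 only (the entries for x = 0, ..., 15) and recombines the parts by XOR. The helper names decompose and recombine are hypothetical, and the convention that part 0 holds the least significant bits is inferred from the printed tables.

```python
# First row of Table B.1, i.e. S[x] for x = 0 .. 15.
ROW0 = [0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
        0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76]


def decompose(table: list[int], k: int) -> list[list[int]]:
    """Split 8-bit entries into k tables with 8 // k bits per entry.

    Part 0 holds the least significant bits (assumed convention).
    """
    width = 8 // k
    mask = (1 << width) - 1
    return [[(s >> (j * width)) & mask for s in table] for j in range(k)]


def recombine(parts: list[list[int]], x: int) -> int:
    """Compute S[x] as the XOR of the shifted entries of all parts."""
    width = 8 // len(parts)
    out = 0
    for j, part in enumerate(parts):
        out ^= part[x] << (j * width)  # shift by 4*j bits equals 16^j * entry for k = 2
    return out


s0, s1 = decompose(ROW0, 2)

if __name__ == "__main__":
    print(" ".join(f"{v:X}" for v in s0))  # low nibbles
    print(" ".join(f"{v:X}" for v in s1))  # high nibbles
```

For k = 2 the two outputs coincide with the first rows of the two tables above: the first table holds the low nibbles and the second the high nibbles.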







Decomposition of the sbox S into 4 smaller sboxes

The standard sbox S is split into 4 smaller sboxes $S^{(4)}_0, \ldots, S^{(4)}_3$, each mapping from $\{0,1\}^8$ to $\{0,1\}^2$. The application of the sbox is then realized as

$$x \mapsto 64 \cdot S^{(4)}_3[x] \oplus 16 \cdot S^{(4)}_2[x] \oplus 4 \cdot S^{(4)}_1[x] \oplus S^{(4)}_0[x].$$

$S^{(4)}_0 =$







3033233101332332

2211213010230020

3132233001111011

0333021232023321

1302322023231330

3101001323212003

0323313111230030

1303210102210332

1030330303210113

0130220022002233

0222120023021101

3031112102021220

2012020201033132

0212032211312112

1001112032312103

0111322011130032







$S^{(4)}_1 =$







0312023100123121

2023221031033100

1301131311100201

1100211210002101

2032232002102031

0003030222322323

0322030112030332

0003032131200300

3303311111331320

0033020213213322

0022211300300112

1213313231121232

2213311123132322

0311201301121033

0220223123123123

3023310202330121







$S^{(4)}_2 =$







2333322030223123

0003310321221230

3312333032233131

0020110101022233

0021121213132220

1102233120330010

1223003003031312

1200113333121331

0012110102332113

2001221002311101

2330002101221123

2032010221322320

3322123021310300

3332003023130011

2311210111020121

0200320201203131







$S^{(4)}_3 =$







1111311300103321

3231311323222213

2320003302331300

0303020200233021

0200011210320302

1303032113201113

3323110213011022

1212220322300333

3003121032101101

1213002213203103

3000100133212231

3301231211331120

2100022333101222

1021103010122302

3320132220233103

2220231112002120
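
As a quick check of the recombination formula, take x = 01, the second entry in the first row of each of the four tables above. Assuming the tables are printed in the order $S^{(4)}_0, \ldots, S^{(4)}_3$ (which is consistent with the printed values), the four entries are 0, 3, 3 and 1, so that

$$S[\mathtt{01}] = 64 \cdot 1 \oplus 16 \cdot 3 \oplus 4 \cdot 3 \oplus 0 = \mathtt{40} \oplus \mathtt{30} \oplus \mathtt{0C} \oplus \mathtt{00} = \mathtt{7C},$$

in agreement with the second entry of Table B.1.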







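Decomposition of the sbox S into 8 smaller sboxes

The standard sbox S is split into 8 smaller sboxes $S^{(8)}_0, \ldots, S^{(8)}_7$, each mapping from $\{0,1\}^8$ to $\{0,1\}^1$. The application of the sbox is then realized as

$$\{0,1\}^8 \to \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\}$$

$$x \mapsto 128 \cdot S^{(8)}_7[x] \oplus 64 \cdot S^{(8)}_6[x] \oplus 32 \cdot S^{(8)}_5[x] \oplus 16 \cdot S^{(8)}_4[x] \oplus 8 \cdot S^{(8)}_3[x] \oplus 4 \cdot S^{(8)}_2[x] \oplus 2 \cdot S^{(8)}_1[x] \oplus S^{(8)}_0[x].$$

$S^{(8)}_0 =$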







1011011101110110

0011011010010000

1110011001111011

0111001010001101

1100100001011110

1101001101010001

0101111111010010

1101010100010110

1010110101010111

0110000000000011

0000100001001101

1011110100001000

0010000001011110

0010010011110110

1001110010110101

0111100011110010







$S^{(8)}_1 =$







1011111000111111

1100101000110010

1011111000000000

0111010111011110

0101111011110110

1000000111101001

0111101000110010

0101100001100111

0010110101100001

0010110011001111

0111010011010000

1010001001010110

1001010100011011

0101011100101001

0000001011101001

0000111000010011







$S^{(8)}_2 =$







0110001100101101

0001001011011100

1101111111100001

1100011010000101

0010010000100011

0001010000100101

0100010110010110

0001010111000100

1101111111111100

0011000011011100

0000011100100110

1011111011101010

0011111101110100

0111001101101011

0000001101101101

1001110000110101







$S^{(8)}_3 =$







0101011000011010

1011110010011000

0100010100000100

0000100100001000

1011111001001010

0001010111111111

0111010001010111

0001011010100100

1101100000110110

0011010101101111

0011100100100001

0101101110010111

1101100011011111

0100100100010011

0110111011011011

1011100101110010







$S^{(8)}_4 =$







0111100010001101

0001110101001010

1110111010011111

0000110101000011

0001101011110000

1100011100110010

1001001001011110

1000111111101111

0010110100110111

0001001000111101

0110000101001101

0010010001100100

1100101001110100

1110001001110011

0111010111000101

0000100001001111







$S^{(8)}_5 =$







1111111010111011

0001100110110110

1101111011111010

0010000000011111

0010010101011110

0001111010110000

0111001001010101

0100001111010110

0001000001111001

1000110001100000

1110001000110011

1011000110111110

1111011010100100

1111001011010000

1100100000010010

0100110100101010







$S^{(8)}_6 =$







1111111100101101

1011111101000011

0100001100111100

0101000000011001

0000011010100100

1101010111001111

1101110011011000

1010000100100111

1001101010101101

1011000011001101

1000100111010011

1101011011111100

0100000111101000

1001101010100100

1100110000011101

0000011110000100







$S^{(8)}_7 =$







0000100100001110

1110100111111101

1110001101110100

0101010100111010

0100000100110101

0101011001100001

1111000101000011

0101110111100111

1001010011000000

0101001101101001

1000000011101110

1100110100110010

1000011111000111

0010001000011101

1110011110111001

1110110001001010
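
Tables with one-bit entries only save memory if several entries share a byte. The following Python sketch shows one possible packing, eight entries per byte, applied to the first printed row of the first one-bit table above. The layout (entry x stored at bit x mod 8 of byte x div 8) and the names pack_bits and lookup_bit are assumptions made for illustration; the packing used in the implementations measured in the thesis may differ.

```python
# First printed row of the first one-bit table: entries for x = 0 .. 15.
BITS_ROW0 = "1011011101110110"


def pack_bits(bits: str) -> bytes:
    """Pack one-bit entries, eight per byte (hypothetical layout).

    Entry x is stored at bit position x % 8 of byte x // 8.
    """
    out = bytearray((len(bits) + 7) // 8)
    for x, ch in enumerate(bits):
        if ch == "1":
            out[x // 8] |= 1 << (x % 8)
    return bytes(out)


def lookup_bit(packed: bytes, x: int) -> int:
    """Return the one-bit entry for input x from the packed table."""
    return (packed[x >> 3] >> (x & 7)) & 1


packed = pack_bits(BITS_ROW0)

if __name__ == "__main__":
    print(packed.hex().upper())
    print([lookup_bit(packed, x) for x in range(16)])
```

Under this layout a complete table with 256 one-bit entries occupies 32 bytes, compared with 256 bytes for the standard sbox.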







List of Tables

4.1 Computation of $(u^{254} \oplus r_{1,13})$ using repeated squaring

4.2 Computation of $(u^{254} \oplus r_1)$ using repeated squaring (simplified version)

4.3 Hardware costs of different inversion circuits

5.1 All possible differences of $p_0, p'_0$

5.2 Overview over the fault based collision attacks

6.1 The memory hierarchy

6.2 Comparing properties of different CBAs

6.3 Experimental environment

6.4 The resistance of the standard implementation

6.5 The resistance of the fast implementation

6.6 The resistance of the fast implementation using only T0

6.7 The resistance of small-2

6.8 Timings for small-2, small-4 and small-8 applied on the last round of AES

6.9 Information Leakage, Resistance and Efficiency of AES implementations

6.10 Simplified Comparison of Implementations

A.1 Sbox T0

A.2 Sbox T1

A.3 Sbox T2

A.4 Sbox T3

A.5 Sbox T4

B.1 The standard sbox S


List of Figures

2.1 Mapping the plaintext p into a state

2.2 The SubBytes transformation

2.3 The ShiftRows transformation

2.4 The MixColumns transformation

2.5 The AddRoundKey transformation

3.1 Black box model of classical cryptography

3.2 Extended black box model that incorporates side channels

5.1 Model of an enhanced smartcard with memory encryption mechanism (MEM)

6.1 Partitioning the address of requested data

6.2 Different types of cache memory

6.3 Basic structure of a time driven CBA

6.4 Basic structure of a trace driven CBA

6.5 Prime-and-Probe method

6.6 Formal outline of an access driven CBA

6.7 Combining small tables with permutation π


Bibliography

Acıiçmez, O., and Ç. K. Koç. 2006. Trace Driven Cache Attack on AES. In ICICS. P. Ning,

S. Qing, and N. Li. (Eds.). Vol. 4307 of Lecture Notes in Computer Science. Springer

Verlag. Pp. 112–121.

Acıiçmez, O., W. Schindler, and Ç. K. Koç. 2005. Improving Brumley and Boneh timing

attack on unprotected SSL implementations. In ACM Conference on Computer and

Communications Security. V. Atluri, C. Meadows, and A. Juels. (Eds.). ACM. Pp. 139–

146.

Akkar, M.-L., and C. Giraud. 2001. An Implementation of DES and AES, Secure against

Some Attacks. In Koç, Naccache and Paar (2001). Pp. 309–318.

Akkar, M.-L., and L. Goubin. 2003. A generic protection against high-order differential power

analysis. In Johansson (2003). Pp. 192–205.

Akkar, M.-L., R. Bévan, and L. Goubin. 2004. Two Power Analysis Attacks against One-Mask

Methods. In 11th International Workshop on Fast Software Encryption, FSE

2004. B. K. Roy, and W. Meier. (Eds.). Vol. LNCS 3017 of Lecture Notes in Computer

Science. Springer-Verlag.

Anderson, R. 2001. Security Engineering: A Guide to Building Dependable Distributed Sys-

tems. Wiley & Sons.

Anderson, R. J., and M. G. Kuhn. 1996. Tamper resistance – a cautionary note. Proceedings

of the second USENIX Workshop on Electronic Commerce. USENIX Association. Pp. 1–

11.

Bar-El, H., H. Choukri, D. Naccache, M. Tunstall, and C. Whelan. 2006. The Sorcerer’s

Apprentice Guide to Fault Attacks. Proceedings of the IEEE. Vol. 94. Pp. 370–382.

Baudron, O., F. Boudot, P. Bourel, E. Bresson, J. Corbel, L. Frisch, H. Gilbert, M. Girault,

L. Goubin, J.-F. Misarsky, P. Nguyen, J. Patarin, D. Pointcheval, J. Stern, J. Traoré,

and G. Poupard. 2000. GPS - An Asymmetric Identification Scheme for on the fly

Authentication of Low Cost Smart Cards.


Bernstein, D. J. 2005. Cache-timing attacks on AES.

http://cr.yp.to/papers.html#cachetiming.

Bertoni, G., V. Zaccaria, L. Breveglieri, M. Monchiero, and G. Palermo. 2005. AES Power

Attack Based on Induced Cache Miss and Countermeasure. International Symposium

on Information Technology: Coding and Computing (ITCC 2005), Volume 1, 4-6 April

2005, Las Vegas, Nevada, USA. IEEE Computer Society. Pp. 586–591.

Biham, E., and A. Shamir. 1997. Differential fault analysis of secret key cryptosystems.

In Advances in Cryptology - CRYPTO ’97, 17th Annual International Cryptology Con-

ference, Santa Barbara, California, USA, August 17-21, 1997, Proceedings. Burton S.

Kaliski Jr. (ed.). Vol. 1294 of Lecture Notes in Computer Science. Springer. Pp. 513–525.

Biham, E., and A. Shamir. 1999. Power Analysis of the Key Scheduling of the AES Candi-

dates. Proceedings of the Second AES Candidate Conference (AES2). Rome, Italy.

Blömer, J., and J.-P. Seifert. 2003. Fault Based Cryptanalysis of the Advanced Encryption

Standard (AES). In Financial Cryptography, 7th International Conference, FC 2003,

Guadeloupe, French West Indies, January 27-30, 2003, Revised Papers. R. N. Wright

(ed.). Vol. 2742 of Lecture Notes in Computer Science. Springer-Verlag. Pp. 162–181.

Blömer, J., and V. Krummel. 2006. Fault based collision attacks on AES. In Fault Diagnosis

and Tolerance in Cryptography, Third International Workshop, FDTC 2006, Yokohama,

Japan, October 10, 2006, Proceedings. L. Breveglieri, I. Koren, D. Naccache, and J.-P.

Seifert. (Eds.). Vol. 4236 of Lecture Notes in Computer Science. Springer. Pp. 106–120.

Blömer, J., and V. Krummel. 2007. Analysis of countermeasures against access driven cache

attacks on AES. In Selected Areas in Cryptography, 14th International Workshop, SAC

2007, Ottawa, Canada, August 16-17, 2007, Revised Selected Papers. C. Adams, A. Miri,

and M. Wiener. (Eds.). Lecture Notes in Computer Science. to appear.

Blömer, J., J. Guajardo, and V. Krummel. 2004. Provably secure masking of AES. In Selected

Areas in Cryptography, 11th International Workshop, SAC 2004, Waterloo, Canada,

August 9-10, 2004, Revised Selected Papers. H. Handschuh, and M. A. Hasan. (Eds.).

Vol. 3357 of Lecture Notes in Computer Science. Springer Verlag.

Boneh, D., R. A. DeMillo, and R. J. Lipton. 1997. On the importance of checking crypto-

graphic protocols for faults (extended abstract). In Advances in Cryptology - EURO-

CRYPT ’97, International Conference on the Theory and Application of Cryptographic

Techniques, Konstanz, Germany, May 11-15, 1997, Proceeding. W. Fumy (ed.). Vol.

1233 of Lecture Notes in Computer Science. Springer. Pp. 37–51.

Brickell, E. F., G. Graunke, M. Neve, and J.-P. Seifert. 2006. Software mitigations to

hedge AES against cache-based software side channel vulnerabilities. Technical Report

2006/052. Cryptology ePrint Archive. http://eprint.iacr.org/2006/052.

Brumley, D., and D. Boneh. 2005. Remote timing attacks are practical. Computer Networks

48, 701–716.

Cathalo, J., F. Koeune, and J.-J. Quisquater. 2003. A New Type of Timing Attack:

Application to GPS. In Walter, Koç and Paar (2003). Pp. 291–303.

Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi. 1999. Towards sound approaches to

counteract power-analysis attacks. In Wiener (1999). Pp. 398–412.

Chen, C.-N., and S.-M. Yen. 2003. Differential fault analysis on AES key schedule and some

countermeasures. In Information Security and Privacy, 8th Australasian Conference,

ACISP 2003, Wollongong, Australia, July 9-11, 2003, Proceedings. R. Safavi-Naini, and

J. Seberry. (Eds.). Vol. 2727 of Lecture Notes in Computer Science. Springer. Pp. 118–

129.

Clavier, C., J. Coron, and N. Dabbous. 2000. Differential Power Analysis in the Presence of

Hardware Countermeasures. In Koç and Paar (2000). Pp. 252–263.

Daemen, J., and V. Rijmen. 1999. Resistance against Implementa-

tion Attacks: A comparative Study of the AES Proposals. Sec-

ond Advanced Encryption Standard (AES) Candidate Conference.

http://csrc.nist.gov/encryption/aes/round1/conf2/aes2conf.htm.

Daemen, J., and V. Rijmen. 2002. The Design of Rijndael: AES - The Advanced Encryption

Standard. Information Security and Cryptography. Springer Verlag.

Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater, and J.-L. Willems. 1998.

A practical implementation of the timing attack. In CARDIS. J.-J. Quisquater, and

B. Schneier. (Eds.). number 1820 in Lecture Notes in Computer Science. Springer Verlag.

Diffie, W., and M. E. Hellman. 1976. New directions in cryptography. IEEE Transactions on

Information Theory 22, 644–654.

Drolet, G. 1998. A New Representation of Elements of Finite Fields GF(2m) Yielding Small

Complexity Arithmetic Circuits. IEEE Transactions on Computers 47, 938–946.

Dusart, P., G. Letourneux, and O. Vivolo. 2003. Differential fault analysis on A.E.S. In Applied

Cryptography and Network Security, First International Conference, ACNS 2003.

Kunming, China, October 16-19, 2003, Proceedings. J. Zhou, M. Yung, and Y. Han.

(Eds.). Vol. 2846 of Lecture Notes in Computer Science. Springer. Pp. 293–306.

van Eck, W. 1985. Electromagnetic radiation from video display units: an eavesdropping

risk? Computers & Security 4, 269–286.

ElGamal, T. 1985. A public key cryptosystem and a signature scheme based on discrete

logarithms. In Advances in Cryptology, Proceedings of CRYPTO ’84, Santa Barbara,

California, USA, August 19-22, 1984, Proceedings. G. R. Blakley, and D. Chaum. (Eds.).

Vol. 196 of Lecture Notes in Computer Science. Springer-Verlag New York, Inc.. New

York, NY, USA. Pp. 10–18.

Ferguson, N., and B. Schneier. 2003. Practical Cryptography. John Wiley & Sons.

Fournier, J., S. Moore, H. Li, R. Mullins, and G. Taylor. 2003. Security Evaluation of

Asynchronous Circuits. In Walter et al. (2003). Pp. 125–136.

Gandolfi, K., C. Mourtel, and F. Olivier. 2001. Electromagnetic analysis: Concrete results.

In Koç et al. (2001). Pp. 251–261.

von zur Gathen, J., and J. Gerhard. 2003. Modern Computer Algebra. 2nd ed. Cambridge

University Press, Cambridge, UK.

von zur Gathen, J., and M. Nöcker. 1997. Exponentiation in Finite Fields: Theory and

Practice. In Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, 12th

International Symposium, AAECC-12, Toulouse, France, June 23-27, 1997, Proceedings.

T. Mora, and H. F. Mattson. (Eds.). Vol. 1255 of Lecture Notes in Computer Science.

Springer. Pp. 88–113.

Giraud, C. 2004. DFA on AES. In Advanced Encryption Standard - AES, 4th International

Conference, AES 2004, Bonn, Germany, May 10-12, 2004, Revised Selected and Invited

Papers. H. Dobbertin, V. Rijmen, and A. Sowa. (Eds.). Vol. 3373 of Lecture Notes in

Computer Science. Springer. Pp. 27–41.

Golić, J. D. 2003. DeKaRT: A New Paradigm for Key-Dependent Reversible Circuits. In

Walter et al. (2003). Pp. 98–112.

Golić, J. D., and C. Tymen. 2002. Multiplicative masking and power analysis of AES. In

Kaliski Jr., Koç and Paar (2003). Pp. 198–212.

Goubin, L., and J. Patarin. 1999. DES and Differential Power Analysis, ”The Duplication

Method”. In Workshop on Cryptographic Hardware and Embedded Systems — CHES

1999. Ç. K. Koç, and C. Paar. (Eds.). Vol. LNCS 1717 of Lecture Notes in Computer

Science. Springer-Verlag. Pp. 158–172.

Guajardo, J., and C. Paar. 2002. Itoh-Tsujii Inversion in Standard Basis and Its Application

in Cryptography and Codes. Design, Codes, and Cryptography 25, 207–216.

Handy, J. 1998. The Cache Memory Book: The authoritative reference on cache design. 2nd

ed. Academic Press.

Hennessy, J., and D. Patterson. 2002. Computer Architecture: A Quantitative Approach. 3rd

ed. Morgan Kaufmann.

Hevia, A., and M. A. Kiwi. 1999. Strength of two data encryption standard implementations

under timing attacks. ACM Transactions on Information and System Security (TISSEC)

2, 416–437.

Hu, W.-M. 1992. Lattice scheduling and covert channels. IEEE Symposium on Security and

Privacy. IEEE Press. Pp. 52–61.

Intel 1997. Using the RDTSC Instruction for Performance Monitoring. Intel Corporation

1997.

Intel 2006. Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System

Programming Guide.

ISO 2002. International Organization for Standardization, ISO/IEC 7816-3: Electronic sig-

nals and transmission protocols.

Itoh, T., and S. Tsujii. 1988. A Fast Algorithm for Computing Multiplicative Inverses in

GF(2m) Using Normal Bases. Information and Computation 78, 171–177.

Johansson, T. (Ed.) 2003. Fast Software Encryption, 10th International Workshop, FSE

2003, Lund, Sweden, February 24-26, 2003, Revised Papers. Vol. 2887 of Lecture Notes

in Computer Science. Springer.

Kahn, D. 1996. The Codebreakers. Scribner.

Kaliski Jr., B. S., Ç. K. Koç, and C. Paar. (Eds.) 2003. Cryptographic Hardware and Embedded

Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August

13-15, 2002, Revised Papers. Vol. 2523 of Lecture Notes in Computer Science. Springer.

Kelsey, J., B. Schneier, D. Wagner, and C. Hall. 1998. Side channel cryptanalysis of product

ciphers. In Computer Security - ESORICS 98, 5th European Symposium on Research in

Computer Security, Louvain-la-Neuve, Belgium, September 16-18, 1998, Proceedings. J.-

J. Quisquater, Y. Deswarte, C. Meadows, and D. Gollmann. (Eds.). Vol. 1485 of Lecture

Notes in Computer Science. Springer. Pp. 97–110.

Kerckhoffs, A. 1883. La cryptographie militaire. Journal des sciences militaires IX, 5–83 &

161–191.

Koç, Ç. K., and C. Paar. (Eds.) 2000. Cryptographic Hardware and Embedded Systems

- CHES 2000, Second International Workshop, Worcester, MA, USA, August 17-18,

2000, Proceedings. Vol. 1965 of Lecture Notes in Computer Science. Springer.

Koç, Ç., D. Naccache, and C. Paar. (Eds.) 2001. Cryptographic Hardware and Embedded

Systems - CHES 2001, Third International Workshop, Paris, France, May 14-16, 2001,

Proceedings. Vol. 2162 of Lecture Notes in Computer Science. Springer.

Kocher, P. C. 1996. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and

other systems. In Advances in Cryptology - CRYPTO ’96, 16th Annual International

Cryptology Conference, Santa Barbara, California, USA, August 18-22, 1996, Proceed-

ings. N. Koblitz (ed.). Vol. 1109 of Lecture Notes in Computer Science. Springer. Pp. 104–

113.

Kocher, P. C., J. Jaffe, and B. Jun. 1998. Introduction to Differential Power Analysis and

Related Attacks. Technical Report. Cryptography Research, Inc.

Kocher, P. C., J. Jaffe, and B. Jun. 1999. Differential power analysis. In Wiener (1999).

Pp. 388–397.

Koeune, F., and J.-J. Quisquater. 1999. A timing attack against Rijndael. Technical Report

CG-1999/1. Université Catholique de Louvain.

Kömmerling, O., and M. G. Kuhn. 1999. Design principles for tamper-resistant smartcard

processors. Proceedings of the USENIX Workshop on Smartcard Technology — Smart-

card ’99. USENIX Association. Pp. 9–20.

Kuhn, M. G. 2002. Optical Time-Domain Eavesdropping Risks of CRT Displays. IEEE

Symposium on Security and Privacy. Pp. 3–18.

Kuhn, M. G. 2003. Compromising emanations: eavesdropping risks of computer displays.

Technical Report UCAM-CL-TR-577. University of Cambridge.

Lenstra, H. W. 2002. Rijndael for algebraists.

http://www.math.berkeley.edu/~hwl/papers/rijndael0.pdf.

Lidl, R., and H. Niederreiter. 1983. Finite Fields. number 20 in Encyclopedia of Mathematics

and its Applications. Addison Wesley.

Mangard, S. 2002. A Simple Power-Analysis (SPA) Attack on Implementations of the AES

Key Expansion. Proceedings of the 5th International Conference on Information Security

and Cryptology (ICISC 2002). Vol. LNCS 2587. Springer-Verlag. Pp. 343–358.

May, M., H. Muller, and N. Smart. 2001a. Non-Deterministic Processors. 6th Australian

Conference On Information Security and Privacy (ACISP). Pp. 115–129.

May, M., H. Muller, and N. Smart. 2001b. Random Register Renaming to Foil DPA. In Koç

et al. (2001).

Menezes, A. J., P. C. van Oorschot, and S. A. Vanstone. 1997. Handbook of Applied Cryp-

tography. CRC Press.

Messerges, T. S. 2000. Securing the AES finalists against power analysis attacks. In Fast

Software Encryption, 7th International Workshop, FSE 2000, New York, NY, USA,

April 10-12, 2000, Proceedings. B. Schneier (ed.). Vol. 1978 of Lecture Notes in Computer

Science. Springer. Pp. 150–164.

Montgomery, P. L. 1985. Modular multiplication without trial division. Mathematics of

Computation 44, 519–521.

Moore, S., R. Anderson, R. Mullins, G. Taylor, and J. Fournier. 2003. Balanced Self-Checking

Asynchronous Logic for Smart Card Applications. Journal of Microprocessors and Mi-

crosystems 27, 421–430.

Neve, M., and J.-P. Seifert. 2006. Advances on access-driven cache attacks on AES. In Selected

Areas in Cryptography, 13th International Workshop, SAC 2006, Montreal, Quebec,

Canada, August 17 & 18, 2006, Revised Selected Papers. E. Biham, and A. Youssef.

(Eds.). to appear.

NIST 2001. Announcing the ADVANCED ENCRYPTION STANDARD (AES). FIPS-PUB

197. National Institute for Standards and Technology (NIST).

OpenSSL Project 2005. http://www.openssl.org.

Ors, S., F. Gürkaynak, E. Oswald, and B. Preneel. 2004. Power-Analysis Attack on an ASIC

AES Implementation. Proceedings of the 2004 International Symposium on Information

Technology (ITCC 2004). IEEE Computer Society.

Osvik, D. A., A. Shamir, and E. Tromer. 2006. Cache Attacks and Countermeasures: The

Case of AES. In Topics in Cryptology - CT-RSA 2006, The Cryptographers’ Track at

the RSA Conference 2006, San Jose, CA, USA, February 13-17, 2006, Proceedings.

D. Pointcheval (ed.). Vol. 3860 of Lecture Notes in Computer Science. Springer. Pp. 1–

20.

Otto, M. 2005. Fault Attacks and Countermeasures. PhD thesis. University of Paderborn.

Page, D. 2002. Theoretical use of cache memory as a cryptanalytic side-channel. Technical

Report CSTR-02-003. Department of Computer Science, University of Bristol.

Page, D. 2003. Defending against cache based side-channel attacks. Information Security

Technical Report 8, 30–44.

Percival, C. 2005. Cache missing for fun and profit.

www.daemonology.net/papers/htt.pdf.

Piret, G., and J.-J. Quisquater. 2003. A differential fault attack technique against SPN

structures, with application to the AES and KHAZAD. In Walter et al. (2003). Pp. 77–88.

Quisquater, J.-J., and D. Samyde. 2001. ElectroMagnetic Analysis (EMA): Measures and

Counter-Measures for Smart Cards. In Smart Card Programming and Security,

International Conference on Research in Smart Cards, E-smart 2001, Cannes, France,

September 19-21, 2001, Proceedings. I. Attali, and T. P. Jensen. (Eds.). Vol. 2140 of Lecture

Notes in Computer Science. Springer. Pp. 200–210.

Quisquater, J.-J., and D. Samyde. 2002. Eddy current for magnetic analysis with active

sensor. E-Smart 2002, Nice, France.

Rankl, W., and W. Effing. 2002. Handbuch der Chipkarten. 4th ed. Carl Hanser Verlag.

Rivest, R. L., A. Shamir, and L. M. Adleman. 1978. A method for obtaining digital signatures

and public-key cryptosystems. Communications of the ACM (CACM) 21, 120–126.

Satoh, A., S. Morioka, K. Takano, and S. Munetoh. 2001. A Compact Rijndael Hardware

Architecture with S-Box Optimization. In Advances in Cryptology - ASIACRYPT 2001,

7th International Conference on the Theory and Application of Cryptology and

Information Security, Gold Coast, Australia, December 9-13, 2001, Proceedings. C. Boyd (ed.).

Vol. 2248 of Lecture Notes in Computer Science. Springer-Verlag. Pp. 239–254.

Schindler, W. 2000. A timing attack against RSA with the Chinese Remainder Theorem. In

Koç and Paar (2000). Pp. 109–124.

Schneier, B. 1996. Applied Cryptography. John Wiley & Sons.

Schramm, K., G. Leander, P. Felke, and C. Paar. 2004. A collision-attack on AES:

Combining side channel- and differential-attack. In Cryptographic Hardware and Embedded

Systems - CHES 2004: 6th International Workshop Cambridge, MA, USA,

August 11-13, 2004. Proceedings. M. Joye, and J.-J. Quisquater. (Eds.). Vol. 3156 of Lecture Notes

in Computer Science. Springer. Pp. 163–175.

Schramm, K., T. J. Wollinger, and C. Paar. 2003. A new class of collision attacks and its

application to DES. In Johansson (2003). Pp. 206–222.

Shamir, A., and E. Tromer. 2004. Acoustic cryptanalysis - on noisy people and noisy

machines. http://theory.csail.mit.edu/~tromer/acoustic/.

Shoup, V. 2005. A Computational Introduction to Number Theory and Algebra. Cambridge

University Press.

Skorobogatov, S. P., and R. J. Anderson. 2002. Optical fault induction attacks. In Kaliski

Jr. et al. (2003). Pp. 2–12.

Stallings, W. 2005. Operating Systems: Internals and Design Principles. 5th ed. Prentice

Hall.

Stephenson, N. 1999. Cryptonomicon. 1st ed. Eos (HarperCollins).

Tiri, K., and I. Verbauwhede. 2003. Securing Encryption Algorithms against DPA at the

Logic Level: Next Generation Smart Card Technology. In Walter et al. (2003). Pp. 125–136.

Tiri, K., M. Akmal, and I. Verbauwhede. 2002. A Dynamic and Differential CMOS Logic

with Signal Independent Power Consumption to Withstand Differential Power Analysis

on Smart Cards. 28th European Solid-State Circuits Conference (ESSCIRC 2002).

Tiu, C. C. 2005. A new frequency-based side channel attack for embedded systems. Master’s

thesis. University of Waterloo.

Trichina, E. 2003. Combinational Logic Design For AES SubByte Transformation on Masked

Data. Cryptology eprint archive: Report 2003/236. IACR.

Trichina, E., D. D. Seta, and L. Germani. 2002. Simplified Adaptive Multiplicative Masking

for AES. In Kaliski Jr. et al. (2003). Pp. 187–197.

Trostle, J. T. 1998. Timing attacks against trusted path. IEEE Symposium on Security and

Privacy. IEEE Press. Pp. 125–134.

Tsunoo, Y., E. Tsujihara, K. Minematsu, and H. Miyauchi. 2002. Cryptanalysis of Block

Ciphers Implemented on Computers with Cache. International Symposium on Information

Theory and Its Applications (ISITA).

Tsunoo, Y., E. Tsujihara, M. Shigeri, H. Kubo, and K. Minematsu. 2006. Improving cache

attacks by considering cipher structure. International Journal of Information Security

(IJIS) 5, 166–176.

Tsunoo, Y., H. Kubo, M. Shigeri, E. Tsujihara, and H. Miyauchi. 2003a. Timing attack

on AES using cache delay in S-boxes. Symposium on Cryptography and Information

Security.

Tsunoo, Y., T. Kawabata, E. Tsujihara, K. Minematsu, and H. Miyauchi. 2003b.

Timing attack on KASUMI using cache delay in S-boxes. Symposium on Cryptography and

Information Security.

Tsunoo, Y., T. Saito, T. Suzaki, M. Shigeri, and H. Miyauchi. 2003c. Cryptanalysis of DES

Implemented on Computers with Cache. In Walter et al. (2003). Pp. 62–76.

Tsunoo, Y., T. Suzaki, T. Saito, T. Kawabata, and H. Miyauchi. 2003d. Timing attack on

Camellia using cache delay in S-boxes. Symposium on Cryptography and Information

Security.

Voigtländer, P. 2003. Entwicklung einer Hardwarearchitektur für einen AES-Coprozessor.

Diplomarbeit. Fachbereich Informatik, Mathematik und Naturwissenschaften, Technische

Informatik, HTWK Leipzig. Germany.

Walter, C. D., Ç. K. Koç, and C. Paar. (Eds.) 2003. Cryptographic Hardware and Embedded

Systems - CHES 2003, 5th International Workshop, Cologne, Germany, September 8-10,

2003, Proceedings. Vol. 2779 of Lecture Notes in Computer Science. Springer.

Wiener, M. J. (Ed.) 1999. Advances in Cryptology - CRYPTO ’99, 19th Annual

International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 1999,

Proceedings. Vol. 1666 of Lecture Notes in Computer Science. Springer.

Wright, P. 1987. Spy Catcher: The Candid Autobiography of a Senior Intelligence Officer.

Viking Adult.

Yang, B., K. Wu, and R. Karri. 2004. Scan based side channel attack on dedicated hardware

implementations of data encryption standard. ITC ’04: Proceedings of the International

Test Conference on International Test Conference. IEEE Computer Society. Washington,

DC, USA. Pp. 339–344.