Tamper Resistance of AES
–
Models, Attacks and Countermeasures
A dissertation submitted to the
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF PADERBORN
for the degree of
Doktor der Naturwissenschaften
presented by
VOLKER KRUMMEL
accepted on the recommendation of
Prof. Dr. Johannes Blömer, examiner
Prof. Dr. Joachim von zur Gathen, co-examiner
2007
≪Timmy & Finn – sunshine children who laugh even in the rain≫
Acknowledgments
I am deeply grateful to my supervisor, Prof. Dr. Johannes Blömer, for his great support
and continuous encouragement in writing this thesis. Among other topics, he introduced
me to the field of tamper resistance and side channel attacks and supplied me with new,
interesting and challenging problems and ideas. Johannes allowed me great freedom in
my research, and he always took time to discuss the ongoing progress. His comments and
suggestions were always very helpful in improving my work.
I am also truly indebted to my second supervisor, Prof. Dr. Joachim von zur Gathen, who
sparked my interest in cryptography. The opportunity to join his working group allowed me
to deepen my research in this fascinating area.
Furthermore, I would like to thank Dr. Jean-Pierre Seifert, the coordinator of our joint
project with the Intel Corporation. The cooperation with Intel not only provided financial
support for my research but also gave valuable insights into recent cryptographic problems.
This thesis would not have been possible without the generous support of the “Institut
für Industriemathematik” of the University of Paderborn. Special thanks go to Tanja Bürger
and Dr. Robert Preis, who were very helpful in handling all the administrative obstacles.
For proofreading parts of my thesis, I would like to thank Marcel R. Ackermann, Dr.
Valentina Damerow and Stefanie Naewe.
Contents
1 Introduction
2 The Advanced Encryption Standard (AES)
2.1 Symmetric Block Ciphers
2.2 Basic Algebraic Structures of AES
2.2.1 Representation of Data
2.2.2 The Finite Field $\mathbb{F}_2[x]/\langle x^8+x^4+x^3+x+1\rangle$
2.2.3 The Ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$
2.2.4 The Ring $R=\mathbb{F}_{256}[y]/\langle y^4+1\rangle$
2.3 The Standard Implementation of AES
2.3.1 State Transformations
2.3.2 Encryption
2.3.3 Key Expansion
2.3.4 Decryption
2.4 The Fast Implementation of AES
3 Security and Side Channel Attacks
3.1 General Principles of Side Channel Attacks
3.2 Side Channels
3.2.1 Timing Attack
3.2.2 Power Analysis
3.2.3 Fault Attacks
3.2.4 Cache Attacks
3.2.5 Other Side Channel Attacks
3.3 Countermeasures
4 Provably Secure Randomization of Cryptographic Algorithms
4.1 Security Model
4.1.1 Discussion of the Security Notion
4.2 Masking AES
4.3 Perfectly Masking AES against Order-1 Adversaries
4.3.1 Idea
4.3.2 Method
4.3.3 Security Analysis
4.3.4 Simplified Version
4.4 Implementation and Costs
4.4.1 Efficient Hardware Implementation over $GF(((2^2)^2)^2)$
4.4.2 Cost and Comparison to Previous Countermeasures
4.5 Order-$d$ Perfectly Masking
4.5.1 Perfect Mask Change
4.5.2 Squaring
4.5.3 Multiplication
4.6 Conclusions
5 Fault Based Collision Attacks
5.1 The Concept of Fault Attacks
5.1.1 Methods to Induce Faults
5.1.2 Fault Models
5.2 The Concept of Collision Attacks
5.3 New Fault Model
5.3.1 Notation
5.3.2 Model
5.4 Fault Based Collision Attacks on AES
5.4.1 Basic Attack
5.4.2 Second Attack
5.4.3 Third Attack
5.4.4 Fourth Attack
5.4.5 Fifth Attack
5.5 Conclusion
6 Cache Behavior Attacks (CBAs)
6.1 Cache Mechanism and Technical Background
6.2 Security Models for CBAs
6.2.1 Fundamental Model for CBAs
6.2.2 Time Driven CBA
6.2.3 Trace Driven CBA
6.2.4 Access Driven CBA
6.2.5 Extending the Threat Model for Access Driven CBAs
6.3 Access Driven CBAs on AES
6.3.1 Access Driven CBA on the First Round
6.3.2 Access Driven CBA on the Last Round
6.4 General Methods to Thwart CBAs
6.5 Information Leakage and Resistance
6.6 Information Leakage and Resistance of Selected Implementations
6.7 Countermeasures Based on Permutations
6.7.1 An Access Driven CBA on a Permuted Sbox
6.7.2 Separability and Distinguished Permutations
6.8 Summary of Countermeasures and Open Problems
A Sbox Tables $T_0,\dots,T_4$ of AES
B Decompositions of the AES Sbox
Chapter 1
Introduction
Security, in whatever context or meaning, has been a goal of human beings since the dawn
of mankind. One aspect of security is secret communication, i.e., preventing others
from reading private messages. The oldest known approach to making texts hard to read dates
back about 4000 years. At that time in Egypt, a master scribe used unusual hieroglyphs
to obfuscate the meaning of an inscription in the tomb of Khnumhotep II (Kahn 1996).
Cryptography – the science of secret writing – was born.
In the course of time, people invented many systems for keeping messages secret. Most
of them were broken because of a lack of thorough analysis and invalid assumptions made by
their inventors. A famous example is the Enigma cipher machine used by German forces in the
Second World War. Several weaknesses of the Enigma, combined with a bad protocol and
procedural failures of its operators, helped Polish and British experts to break the cipher.
Since the Second World War, people have understood the importance of cryptography, and the theory of
the design and analysis of encryption algorithms has made enormous progress. Nowadays
we have a large number of strong algorithms whose security was analyzed independently by
crypto researchers all over the world, e.g., see (Menezes, van Oorschot and Vanstone 1997)
and (Schneier 1996).
But cryptography has expanded from the science of secret writing to the science of arbitrary
security problems like authentication or data integrity. Cryptography can solve some very
difficult problems concerning security. Hence, cryptographic algorithms are the main building
blocks of security systems like access control or electronic payments. However, it was known
right from the beginning that using strong cryptographic algorithms does not necessarily
lead to a secure system. On the contrary, in many systems even weak cryptography would not
be the weakest point, because several other components of these systems allow
even easier attacks. We encounter this kind of problem in the security
failures that arise from the human factor or from implementation mistakes such as buffer
overflows. Securing a system can be compared to protecting a house against burglars.
Further strengthening the front door with sophisticated locks does not improve the security
if the window at the back is still open. An attacker is not fair. He would not spend his
(life)time trying to pick the locks of the front door but would simply slip in through the open
window. The same is true for security systems and even for cryptographic algorithms. We
cannot expect an attacker to do what we expect him to do. He will take every chance he
can get to break the system. Since a system is only as secure as its weakest link, to improve
the security of a system one has to perform the following steps according to (Ferguson and
Schneier 2003):
1. detect all links
2. determine weak links
3. strengthen weak links
These steps are easily written down but very hard to perform. In particular, detecting all links and
determining the weak ones is difficult. The problem is that an attacker does not stick to
any rules.
Peter Wright was the first to publish details about operation “ENGULF”, an example
of such an “unfair” attack (Wright 1987). Wright was a scientist at MI5, one of the secret
services of the United Kingdom. During the Suez crisis in 1956, MI5 was interested in
the messages of the Egyptian embassy that were encrypted by a Hagelin rotor machine. To
improve secrecy, a new key was set up every day. Although MI5 had exactly the
same model of the cipher machine, they could not break the encryption efficiently. Therefore,
Wright suggested placing a microphone close to the cipher machine to determine the key
settings by listening to the sound that occurs when setting up a new key. The sound enabled
MI5 to figure out the daily key and read all the messages.
In 1985 van Eck published a different approach, later called “van Eck phreaking”, to obtain
private information (van Eck 1985). He showed how to exploit the electromagnetic emanations
of computer displays to reconstruct the content of the display even from a large distance.
Attacks that bypass security mechanisms by exploiting additional information or by
manipulating the environment are called tampering attacks. These attacks show that security
engineering – the science of developing reliable and secure systems – is a much wider field
than cryptography, e.g., see (Anderson 2001).
But cryptography itself became a target of security concerns when Kocher published
an attack that determines a secret RSA key by analyzing the running times of encryptions
(Kocher 1996). Only a few years later, he also showed how to break cryptographic schemes
by analyzing the power consumption (Kocher, Jaffe and Jun 1999). Several similar methods
– so called side channel attacks – were developed to break cryptographic algorithms very
efficiently. Like strengthening the links of a security system, protecting cryptographic algorithms
against side channel attacks is quite tricky.
Organization of the Thesis and Main Results. In this thesis we focus on analyzing
the tamper resistance of cryptographic algorithms. More precisely, we examine the security
of today's most important symmetric encryption scheme, the Advanced Encryption Standard
(AES), against side channel attacks.
The first goal was to develop a general and strong model in which the effectiveness of
countermeasures to thwart side channel attacks can be analyzed. This goal was motivated by
the search for a secure implementation of AES, a problem that had not been solved satisfactorily before.
In Chapter 4 we present our strong and general model, which covers adversaries of different
power. After that, we develop a general method to implement ciphers like AES in a way that is
provably secure in our model. We give the security proof together with a thorough analysis of the
costs of our AES implementation in hardware. The results of this chapter were published in
(Blömer, Guajardo and Krummel 2004).
A further goal was to analyze the effectiveness of countermeasures that were proposed
to thwart side channel attacks but were not analyzed thoroughly. We focus on the so called
memory encryption, a method that is based on encrypting the main memory to prevent
information leakage. At first sight, memory encryption provides a large improvement of
security and hence is used in many high security smartcards. In Chapter 5 we show that
this first impression is wrong. We present a new concept of fault attacks called fault based
collision attacks that defeats memory encryption using only a moderate number of faults.
The results of this chapter were published in (Blömer and Krummel 2006).
In the last part of the thesis we analyze a different kind of side channel attack, so called
cache based attacks. Cache based attacks have proven to be very powerful and have turned
out to be one of the biggest threats to cryptographic software implementations running on
computers with cache. In Chapter 6 we first strengthen the existing threat model to adapt
it to the recent methodology of cache based attacks. We introduce two security concepts:
information leakage and resistance. Information leakage measures the maximal amount of
information that leaks through an arbitrary number of cache based attacks. The resistance
estimates the information an attacker may get after a single cache based attack. We analyze
several implementations of AES, determining their information leakage and their resistance.
It turns out that all implementations proposed so far provide only poor resistance and leak all
key bits. Therefore, we propose a new implementation of AES using small sboxes that does
not leak a single key bit. Furthermore, we analyze a proposed countermeasure based on
random permutations. We show how to efficiently defeat this countermeasure using cache
based attacks. To improve the effectiveness of this countermeasure we develop a special class
of permutations, so called distinguished permutations. Using distinguished permutations we
can provably protect half of the key bits even for an unlimited number of cache attacks. The
results of this chapter were published in (Blömer and Krummel 2007).
Chapter 2
The Advanced Encryption
Standard (AES)
In 1977, the National Bureau of Standards (NBS) of the USA announced the first standardized
symmetric encryption algorithm called Data Encryption Standard (DES) which immediately
became the de facto standard worldwide. In 1997, the National Institute of Standards and
Technology (NIST), formerly named NBS, started to search for a successor to DES. NIST
arranged a public competition of proposed algorithms that were submitted by several
researchers of the cryptography community. These submissions were publicly analyzed by
crypto researchers all over the world. Five candidate algorithms made it to the final
decision. In the end Rijndael, an algorithm by the two Belgian cryptographers Joan Daemen and
Vincent Rijmen, was chosen as the successor of DES and named the Advanced Encryption
Standard (AES). In this chapter we first give the background of symmetric encryption
algorithms and then describe the AES in more detail. Further information about the AES can
be found in (Daemen and Rijmen 2002) and (NIST 2001). A more condensed description of
the AES can be found in (Lenstra 2002).
2.1 Symmetric Block Ciphers
Since the seminal paper (Diffie and Hellman 1976), encryption schemes (or ciphers) can be
classified as either symmetric or asymmetric ciphers. Asymmetric ciphers use a pair of keys,
a public key for encryption and a private key for decryption. For the security of this kind of
encryption system it is essential that the private key cannot be derived from the public key
efficiently. Using a key pair (a public and a private one) allows two parties to communicate
privately without sharing a common secret. Two famous examples of asymmetric ciphers are
RSA (Rivest, Shamir and Adleman 1978) and the ElGamal cryptosystem (ElGamal 1985).
Symmetric ciphers use a single key for both encryption and decryption. Hence,
before being able to communicate securely, both parties have to agree on a common secret
key. To be more precise, a symmetric encryption scheme is defined as follows.
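Definition 1 (symmetric encryption scheme) Let $\mathcal{P}$, $\mathcal{K}$ and $\mathcal{C}$ be the sets of valid
plaintexts, keys and ciphertexts, respectively. A symmetric encryption scheme consists of a pair of
algorithms $(\mathrm{enc}, \mathrm{dec})$. The algorithm enc computes the unique ciphertext $c \in \mathcal{C}$ given a valid
plaintext $p \in \mathcal{P}$ and a valid key $k \in \mathcal{K}$:
\[
\mathrm{enc} : \mathcal{P} \times \mathcal{K} \to \mathcal{C}, \qquad (p, k) \mapsto c = \mathrm{enc}_k(p).
\]
The algorithm dec computes the unique plaintext $p \in \mathcal{P}$ given a valid ciphertext $c \in \mathcal{C}$ and a
valid key $k \in \mathcal{K}$:
\[
\mathrm{dec} : \mathcal{C} \times \mathcal{K} \to \mathcal{P}, \qquad (c, k) \mapsto p = \mathrm{dec}_k(c).
\]
The algorithms enc and dec are related by the property that
\[
\forall p \in \mathcal{P} \;\; \forall k \in \mathcal{K} : \mathrm{dec}_k(\mathrm{enc}_k(p)) = p.
\]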
A symmetric encryption scheme that takes as input a plaintext block of a fixed size and
computes a ciphertext block of fixed length is called a block cipher. In a so called iterated block
cipher, a sequence of transformations is applied repeatedly.
AES is an iterated block cipher with a fixed block length of 128 bits. The key length
can be 128, 192 or 256 bits. Depending on the chosen key length AES is named AES-128,
AES-192 or AES-256, respectively. To simplify notation we only describe AES-128. Similar
descriptions of the other variants are given in (Daemen and Rijmen 2002) or (NIST 2001).
2.2 Basic Algebraic Structures of AES
The design of AES makes use of several algebraic structures. In this section we briefly describe
each of these structures together with their associated operations.
2.2.1 Representation of Data
The basic information unit of AES is a byte consisting of 8 bits. Depending on the underlying
algebraic structure AES deals with different representations of bytes. Firstly, a byte $b$ can be
written in the binary notation as $b = (b_7, \dots, b_0)$ where each $b_i \in \mathbb{F}_2$. We can also interpret
$b$ as a natural number $\sum_{i=0}^{7} b_i \cdot 2^i$ between 0 and 255 and represent it by its hexadecimal
notation $xy$ where $x, y \in \{0, 1, \dots, 9, A, B, C, D, E, F\}$. When working in a finite field or ring
the polynomial notation
\[
b = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0
\]
with coefficients in $\mathbb{F}_2$ is used.
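The three representations can be made concrete with a small sketch. The helper name `byte_views` is our own choice for illustration and is not part of the AES specification; it returns the bit tuple $(b_7,\dots,b_0)$, the hexadecimal notation and the polynomial notation of a byte.

```python
def byte_views(b: int):
    """Return (bit tuple, hex string, polynomial string) for a byte 0..255."""
    assert 0 <= b <= 255
    # Binary notation (b7, ..., b0), most significant bit first.
    bits = tuple((b >> i) & 1 for i in range(7, -1, -1))
    # Hexadecimal notation xy.
    hex_str = f"{b:02X}"
    # Polynomial notation with coefficients in F_2.
    terms = []
    for i in range(7, -1, -1):
        if (b >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    poly = " + ".join(terms) if terms else "0"
    return bits, hex_str, poly


bits, hex_str, poly = byte_views(0x57)
print(bits)     # (0, 1, 0, 1, 0, 1, 1, 1)
print(hex_str)  # 57
print(poly)     # x^6 + x^4 + x^2 + x + 1
```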
2.2.2 The Finite Field $\mathbb{F}_2[x]/\langle x^8+x^4+x^3+x+1\rangle$
One of the algebraic structures of AES is the finite field with 256 elements. To be more
precise, AES uses the polynomial
\[
m := x^8 + x^4 + x^3 + x + 1
\]
that is irreducible over $\mathbb{F}_2$ to define the finite field
\[
\mathbb{F}_{256} = \mathbb{F}_2[x]/\langle m \rangle.
\]
The polynomial representation of a byte $b$ is considered as an element of $\mathbb{F}_{256}$. In the
sequel, we briefly describe the associated operations.
The addition of two elements $a, b \in \mathbb{F}_{256}$ is computed as
\[
a + b = \sum_{i=0}^{7} (a_i \oplus b_i)\, x^i
\]
where $\oplus$ denotes the addition in $\mathbb{F}_2$.
The multiplication of two elements $a, b \in \mathbb{F}_{256}$ is computed as
\[
a \cdot b = \left(\sum_{i=0}^{7} a_i x^i\right) \cdot \left(\sum_{i=0}^{7} b_i x^i\right) \pmod{x^8 + x^4 + x^3 + x + 1}.
\]
Obviously, $a = 1 \in \mathbb{F}_{256}$ is the neutral element of the multiplication. Hence, a multiplication
by $a = 1$ is the identity. Furthermore, the multiplication by $a = x \in \mathbb{F}_{256}$ can be implemented
very efficiently. To do so, the coefficients of $b$ are shifted one position to the left, setting the
rightmost coefficient to 0:
\[
x \cdot b = b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x.
\]
To determine the correct reduced result we distinguish two cases: If $b_7 = 0$ then
\[
b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x
= b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x
\]
is already in the reduced form and we do not need a further reduction. If $b_7 = 1$ then we
have to reduce the result modulo the polynomial $m$ as defined above. We can compute the
correct reduced result by simply adding $m$ to the product $x \cdot b$:
\[
\begin{array}{rl}
  & x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x + 0 \\
+ & x^8 + x^4 + x^3 + x + 1 \\ \hline
= & b_6x^7 + b_5x^6 + b_4x^5 + (b_3 + 1)x^4 + (b_2 + 1)x^3 + b_1x^2 + (b_0 + 1)x + 1
\end{array}
\]
Algorithm 1 shows the computation of $x \cdot b \pmod{m}$, called xtime.
Algorithm 1 xtime
Input: $b = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 \in \mathbb{F}_2[x]/\langle m \rangle$
Output: $x \cdot b \in \mathbb{F}_2[x]/\langle m \rangle$
  $c \leftarrow b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$ {left shift of coefficients}
  if $b_7 = 0$ then
    return $c$ {already correct result}
  else
    return $c + m$ {reduce and return}
  end if
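Algorithm 1 can be sketched in Python as follows, with bytes stored as integers whose bit $i$ is the coefficient $b_i$. The function `gf_mul` is our own addition and not part of Algorithm 1: it illustrates, under the usual shift-and-add approach, how a general multiplication in $\mathbb{F}_{256}$ can be built from repeated calls to `xtime`.

```python
M = 0x11B  # m = x^8 + x^4 + x^3 + x + 1 as a bit mask


def xtime(b: int) -> int:
    """Compute x * b in F_2[x]/<m> (Algorithm 1)."""
    c = b << 1          # left shift of the coefficients
    if c & 0x100:       # b7 was 1, so the x^8 term is present
        c ^= M          # reduce by adding m
    return c


def gf_mul(a: int, b: int) -> int:
    """Multiply a and b in F_256 by shift-and-add (illustrative helper)."""
    result = 0
    while b:
        if b & 1:       # current coefficient of b is 1: add a * x^i
            result ^= a
        a = xtime(a)    # a * x^(i+1)
        b >>= 1
    return result


print(hex(xtime(0x57)))         # 0xae
print(hex(gf_mul(0x57, 0x83)))  # 0xc1
```

The two printed values agree with the worked multiplication example in (NIST 2001).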
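Inversion. For every element $a$ of the multiplicative group $\mathbb{F}_{256}^{\times}$ there exists a unique
element $b \in \mathbb{F}_{256}^{\times}$ such that $ab = 1$. The element $b = a^{-1}$ is called the inverse of $a$. We extend
the inversion to all elements of $\mathbb{F}_{256}$ by defining the function
\[
\mathrm{INV} : \mathbb{F}_{256} \to \mathbb{F}_{256}, \qquad
a \mapsto \begin{cases} a^{-1}, & \text{if } a \in \mathbb{F}_{256}^{\times} \\ 0, & \text{if } a = 0. \end{cases}
\]
By Lagrange's Theorem, $a^{255} = 1$ for every $a \in \mathbb{F}_{256}^{\times}$, so we can compute $\mathrm{INV}(a)$ in $\mathbb{F}_{256}$ by
raising $a$ to the 254th power:
\[
\mathrm{INV}(a) = a^{254} \in \mathbb{F}_{256}.
\]
We can use the repeated squaring algorithm to compute the power of an element efficiently.
See for example (von zur Gathen and Gerhard 2003) or (Shoup 2005) for a comprehensive
treatment of the topic.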
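A minimal sketch of the inversion by repeated squaring, assuming bytes are stored as integers. The helper names `gf_mul`, `gf_pow` and `inv` are our own; the square-and-multiply loop is one standard way to realize repeated squaring and is not taken from the cited references.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in F_256 = F_2[x]/<x^8+x^4+x^3+x+1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    """Compute a^e in F_256 by repeated squaring."""
    result = 1
    while e:
        if e & 1:                  # this bit of the exponent is set
            result = gf_mul(result, a)
        a = gf_mul(a, a)           # square for the next bit
        e >>= 1
    return result


def inv(a: int) -> int:
    """INV(a) = a^254; note that this yields INV(0) = 0 automatically."""
    return gf_pow(a, 254)


print(hex(inv(0x53)))  # 0xca
```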
2.2.3 The Ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$
Another algebraic structure that is used in AES is the ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$. Since
\[
x^8 + 1 = (x + 1)^8 \in \mathbb{F}_2[x]
\]
is not irreducible over $\mathbb{F}_2$, the ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$ does not form a field and we cannot invert
each of its elements $b \neq 0$. The representation of data bytes as elements of the ring is
again the polynomial representation as in $\mathbb{F}_{256}$ described above. Apart from computing the
reductions modulo $x^8 + 1$, addition and multiplication are defined as above. Hence, for two
elements $a, b \in \mathbb{F}_2[x]/\langle x^8+1\rangle$
\[
a + b = \sum_{i=0}^{7} (a_i \oplus b_i)\, x^i
\]
and
\[
a \cdot b = \left(\sum_{i=0}^{7} a_i x^i\right) \cdot \left(\sum_{i=0}^{7} b_i x^i\right) \pmod{x^8 + 1}.
\]
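Since $x^8 = 1$ in this ring, multiplying by $x^i$ is just a cyclic rotation of the coefficients by $i$ positions. The following sketch uses this observation; the names `rotl8` and `ring_mul` are our own. It also illustrates why the ring is not a field: the nonzero element $x+1$ satisfies $(x+1)^8 = 0$ and therefore has no inverse.

```python
def rotl8(b: int, i: int) -> int:
    """Rotate the 8 coefficients of b cyclically by i positions to the left."""
    i %= 8
    return ((b << i) | (b >> (8 - i))) & 0xFF


def ring_mul(a: int, b: int) -> int:
    """Multiply a and b in F_2[x]/<x^8+1>."""
    result = 0
    for i in range(8):
        if (a >> i) & 1:
            result ^= rotl8(b, i)   # x^i * b is a rotation because x^8 = 1
    return result


# (x + 1)^8 = x^8 + 1 = 0 in the ring, so x + 1 is a zero divisor.
p = 0x03
for _ in range(3):      # three squarings give the 8th power
    p = ring_mul(p, p)
print(p)  # 0
```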
2.2.4 The Ring $R=\mathbb{F}_{256}[y]/\langle y^4+1\rangle$
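AES also deals with 4-tuples of bytes. Here, each byte, considered as an element of $\mathbb{F}_{256}$ as
described above, is a coefficient of a polynomial
\[
\beta = \beta_3y^3 + \beta_2y^2 + \beta_1y + \beta_0 \pmod{y^4 + 1}
\]
of degree less than 4. The polynomials described above form the ring
\[
R := \mathbb{F}_{256}[y]/\langle y^4+1\rangle.
\]
For two elements $\alpha = \sum_{i=0}^{3} \alpha_i y^i \in R$ and $\beta = \sum_{i=0}^{3} \beta_i y^i \in R$ the sum $\alpha + \beta$ is computed
as
\[
(\alpha_3 + \beta_3)y^3 + (\alpha_2 + \beta_2)y^2 + (\alpha_1 + \beta_1)y + (\alpha_0 + \beta_0),
\]
where the addition of two coefficients is computed in $\mathbb{F}_{256}$.
The product $\alpha \cdot \beta$ is computed as
\[
\left(\sum_{i=0}^{3} \alpha_i y^i\right) \cdot \left(\sum_{i=0}^{3} \beta_i y^i\right) \pmod{y^4 + 1}.
\]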
2.3 The Standard Implementation of AES
After explaining the basic algebraic structures, we now describe the standard implementation
of AES as defined in (Daemen and Rijmen 2002) and (NIST 2001). As mentioned above the
basic information unit of AES is a byte. 16 bytes arranged in a $4 \times 4$ matrix form a so called
state.
To process a plaintext block $p$ of 128 bits, $p$ is transformed into a state. To do so $p$ is
divided into 16 bytes $p_0, p_1, \dots, p_{15}$. The bytes are mapped to a $4 \times 4$ array as shown in
Figure 2.1.
\[
p_0, \dots, p_{15} \;\longmapsto\;
\begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix}
\]
Figure 2.1: Mapping the plaintext $p$ into a state
AES is an iterated block cipher. During the AES encryption several different transformations
grouped in so called rounds are repeatedly applied to the state. In the sequel, we first
describe each of these transformations and then provide the complete encryption algorithm.
2.3.1 State Transformations
The SubBytes (SB) Transformation
SubBytes is the non-linear transformation of AES.
Figure 2.2: The SubBytes transformation (the sbox $S$ maps each state byte $s_{i,j}$ to $s'_{i,j}$)
It substitutes each byte of the state independently of the other bytes by applying a fixed
mapping. In the first step of this mapping each byte $b$, considered as an element of $\mathbb{F}_{256}$, is
substituted by its inverse $\mathrm{INV}(b)$. In the second step, $\mathrm{INV}(b)$ is interpreted as an element
of the ring $\mathbb{F}_2[x]/\langle x^8+1\rangle$. A fixed affine mapping in this ring is applied to $\mathrm{INV}(b)$:
\[
(x^4 + x^3 + x^2 + x + 1) \cdot \mathrm{INV}(b) + (x^6 + x^5 + x + 1) \pmod{x^8 + 1}. \tag{2.1}
\]
To apply the mapping efficiently it is usually precomputed for all 256 possible inputs
and the results are stored in a table of 256 bytes. This table is called the substitution box
(sbox) $S$. We denote the application of the mapping to a byte $b$ by $S[b]$. Figure 2.2 depicts
the application of the sbox.
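The two steps of the mapping can be sketched directly from Equation (2.1). In the sketch below the polynomials $x^4+x^3+x^2+x+1$ and $x^6+x^5+x+1$ are written as the bytes `0x1F` and `0x63`; the helper names are our own, and the slow exponentiation loop is chosen for brevity, not efficiency.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in F_256 = F_2[x]/<x^8+x^4+x^3+x+1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """INV(a) = a^254 in F_256, with INV(0) = 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def ring_mul(a: int, b: int) -> int:
    """Multiply in F_2[x]/<x^8+1>; x^i * b is a cyclic rotation of b."""
    result = 0
    for i in range(8):
        if (a >> i) & 1:
            result ^= ((b << i) | (b >> (8 - i))) & 0xFF
    return result


def sub_byte(b: int) -> int:
    """Apply Equation (2.1): inversion followed by the affine mapping."""
    return ring_mul(0x1F, gf_inv(b)) ^ 0x63


# Precompute the sbox S as a table of 256 bytes.
SBOX = [sub_byte(b) for b in range(256)]
print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
```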
The ShiftRows (SR) Transformation
The ShiftRows transformation performs a cyclic shift on each row of the state. Each row is
shifted by a fixed number of byte positions to the left. The first row is not shifted, the second row is
shifted one position to the left, the third row is shifted two positions to the left and the fourth
row is shifted three positions to the left. The ShiftRows operation is depicted in Figure 2.3.
Figure 2.3: The ShiftRows transformation (row $i$ of the state is rotated $i$ positions to the left)
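As a small illustration, ShiftRows can be written in one line if the state is held as a list of four rows. This row-wise layout and the name `shift_rows` are assumptions made for the sketch only.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state (list of rows) r positions to the left."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [
    ["s00", "s01", "s02", "s03"],
    ["s10", "s11", "s12", "s13"],
    ["s20", "s21", "s22", "s23"],
    ["s30", "s31", "s32", "s33"],
]
for row in shift_rows(state):
    print(row)
```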
The MixColumns (MC) Transformation
The MixColumns transformation performs a linear combination of the bytes of a column.
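Each byte of the state is interpreted as an element of $\mathbb{F}_{256}$. The four bytes $\beta_0, \beta_1, \beta_2, \beta_3$ of a
column are considered as the coefficients of a polynomial
\[
\beta = \beta_3y^3 + \beta_2y^2 + \beta_1y + \beta_0 \in R
\]
of degree less than 4 in the ring $R = \mathbb{F}_{256}[y]/\langle y^4+1\rangle$. The polynomial $\beta$ is then multiplied
with a fixed polynomial:
\[
c := 03\,y^3 + 01\,y^2 + 01\,y + 02 \in \mathbb{F}_{256}[y]/\langle y^4+1\rangle.
\]
MixColumns is depicted in Figure 2.4. Alternatively, we can represent the MixColumns
transformation as a matrix multiplication:
\[
\underbrace{\begin{pmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{pmatrix}}_{\in\, \mathbb{F}_{256}^{4 \times 4}}
\cdot
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
=
\begin{pmatrix} \beta'_0 \\ \beta'_1 \\ \beta'_2 \\ \beta'_3 \end{pmatrix}
\]
Figure 2.4: The MixColumns transformation (a column $\beta$ is multiplied by $c$, giving $\beta'$)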
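The following sketch computes MixColumns for one column as a product in $R$, using $y^4 = 1$ to fold the exponents. Polynomials are stored as coefficient lists $[\beta_0, \beta_1, \beta_2, \beta_3]$; this layout and the names `ring_r_mul` and `mix_column` are our own assumptions. The input column used below is a commonly quoted MixColumns test column.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in F_256 = F_2[x]/<x^8+x^4+x^3+x+1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def ring_r_mul(alpha, beta):
    """Multiply two elements of R = F_256[y]/<y^4+1> (coefficient lists)."""
    result = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # y^i * y^j = y^((i+j) mod 4) because y^4 = 1 in R
            result[(i + j) % 4] ^= gf_mul(alpha[i], beta[j])
    return result


C = [0x02, 0x01, 0x01, 0x03]  # c = 03 y^3 + 01 y^2 + 01 y + 02


def mix_column(column):
    """Apply MixColumns to one column [b0, b1, b2, b3]."""
    return ring_r_mul(C, column)


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

The first output byte equals $02 \cdot \beta_0 + 03 \cdot \beta_1 + 01 \cdot \beta_2 + 01 \cdot \beta_3$, i.e., the first row of the matrix form.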
The AddRoundKey Transformation
To introduce the secret key into the encryption, the AddRoundKey transformation is used.
The so called key schedule, explained in Section 2.3.3, takes the cipher key as input and
generates a so called round key for every round of AES. The round key is of the same size
as the encryption state, i.e., it forms a $4 \times 4$ byte matrix. The AddRoundKey transformation
combines a byte $b$ of the state with its corresponding byte $k$ of the round key by computing
the bitwise addition modulo 2 (XOR): $b \oplus k$. The AddRoundKey transformation is depicted in
Figure 2.5.
Figure 2.5: The AddRoundKey transformation ($s'_{i,j} = s_{i,j} \oplus k_{i,j}$)
2.3.2 Encryption
The AES encryption entirely consists of the four state transformations. A round of the AES
encryption is composed by consecutively applying the state transformations to the state in the
order shown in Algorithm 2. The complete encryption algorithm is shown in Algorithm 3.
Algorithm 2 A round of the AES encryption
  SubBytes
  ShiftRows
  MixColumns
  AddRoundKey
The complete encryption consists of an initial AddRoundKey followed by 9 applications of the
AES round described in Algorithm 2. After that, a truncated round is applied that consists only
of SubBytes, ShiftRows and AddRoundKey.
Algorithm 3 Complete AES encryption
Input: plaintext $p_0, \dots, p_{15} \in \{0,1\}^8$, key $k$
Output: ciphertext $c_0, \dots, c_{15} \in \{0,1\}^8$
  AddRoundKey
  for $i = 1$ to 9 do
    SubBytes
    ShiftRows
    MixColumns
    AddRoundKey
  end for
  SubBytes
  ShiftRows
  AddRoundKey
2.3.3 Key Expansion
AES-128 applies the AddRoundKey transformation eleven times to the intermediate state.
AddRoundKey is applied before the first round, in each of the nine rounds and in the truncated
last round. To generate different round keys for each of these applications of AddRoundKey
a so called expanded key $w$ is derived from the cipher key $k = k_0, \dots, k_{15} \in (\{0,1\}^8)^{16}$ as
follows. The cipher key $k$ is mapped to a $4 \times 4$ state matrix similar to the mapping of a
plaintext to a state as shown in Figure 2.1. The four bytes of each column of this
matrix form a so called word. We define two operations on words. The first operation is the
so called SubWord operation. SubWord applies the sbox to every byte of the word:
\[
\mathrm{SubWord} : (\{0,1\}^8)^4 \to (\{0,1\}^8)^4, \qquad
(\beta_0, \beta_1, \beta_2, \beta_3) \mapsto (S[\beta_0], S[\beta_1], S[\beta_2], S[\beta_3]).
\]
The second operation is RotWord, which cyclically shifts the 4 bytes of a word one position
to the left:
\[
\mathrm{RotWord} : (\{0,1\}^8)^4 \to (\{0,1\}^8)^4, \qquad
(\beta_0, \beta_1, \beta_2, \beta_3) \mapsto (\beta_1, \beta_2, \beta_3, \beta_0).
\]
Furthermore, for $i \geq 1$ let
\[
\mathrm{Rcon}[i] := (x^{i-1}, 0, 0, 0) \in (\mathbb{F}_2[x]/\langle m \rangle)^4
\]
be the so called round constant for the $i$th round key. The expanded key $w$ is then computed
according to Algorithm 4.
The round key for round $i$ is extracted from the expanded key $w$ by mapping the words
$w_{4i}, \dots, w_{4i+3}$ of the expanded key $w$ to the columns of a $4 \times 4$ byte matrix.
Algorithm 4 Key schedule of AES-128 in pseudocode
Input: cipher key $k = k_0, \dots, k_{15} \in \{0,1\}^8$
Output: expanded key $w = w_0, \dots, w_{43} \in \{0,1\}^{32}$
  for $i \leftarrow 0, \dots, 3$ do
    $w_i = (k_{4i}, k_{4i+1}, k_{4i+2}, k_{4i+3})$
  end for
  for $i \leftarrow 4, \dots, 43$ do
    $temp = w_{i-1}$
    if $i \equiv 0 \bmod 4$ then
      $temp = \mathrm{SubWord}(\mathrm{RotWord}(temp)) \oplus \mathrm{Rcon}[i/4]$
    end if
    $w_i = w_{i-4} \oplus temp$
  end for
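Algorithm 4 translates almost line by line into Python. In the sketch below words are lists of four bytes, the sbox is recomputed from Equation (2.1) so that the block is self-contained, and all function names are our own. The key used in the example is the AES-128 example key from (NIST 2001).

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in F_256 = F_2[x]/<x^8+x^4+x^3+x+1>."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(v: int, i: int) -> int:
    return ((v << i) | (v >> (8 - i))) & 0xFF


def sbox_entry(b: int) -> int:
    inv = 1
    for _ in range(254):          # INV(b) = b^254, INV(0) = 0
        inv = gf_mul(inv, b)
    # affine mapping of Equation (2.1), written with rotations
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(b) for b in range(256)]


def sub_word(word):
    return [SBOX[b] for b in word]


def rot_word(word):
    return word[1:] + word[:1]


def rcon(i: int):
    value = 1                     # x^(i-1) in F_256
    for _ in range(i - 1):
        value = gf_mul(value, 0x02)
    return [value, 0, 0, 0]


def xor_words(u, v):
    return [a ^ b for a, b in zip(u, v)]


def expand_key(key: bytes):
    """Return the expanded key w_0, ..., w_43 (Algorithm 4)."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = w[i - 1]
        if i % 4 == 0:
            temp = xor_words(sub_word(rot_word(temp)), rcon(i // 4))
        w.append(xor_words(w[i - 4], temp))
    return w


def round_key(w, i: int):
    """Round key i as a 4x4 matrix whose columns are w[4i], ..., w[4i+3]."""
    cols = w[4 * i:4 * i + 4]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(bytes(w[4]).hex(), bytes(w[43]).hex())  # a0fafe17 b6630ca6
```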
2.3.4 Decryption
The decryption of AES, that is determining the unique plaintext given the corresponding
ciphertext and the correct secret key, is done by reverting every transformation that was
applied in the encryption. In the sequel, we show how every single transformation can
be inverted. Hence, applying the inverse of each transformation in the reversed order will
compute the correct plaintext.
The InvSubBytes Transformation
To undo the SubBytes transformation that substituted a byte $b$ with
\[ (x^4 + x^3 + x^2 + x + 1) \cdot \mathrm{INV}(b) + (x^6 + x^5 + x + 1) \pmod{x^8 + 1} \]
we proceed in two steps. Firstly, notice that the function INV is self-inverse. Secondly, the
affine mapping in the ring $R$ is invertible, having the inverse
\[ (x^6 + x^3 + x) \cdot b + (x^2 + 1) \pmod{x^8 + 1}. \]
Hence, the inverse transformation InvSubBytes of the SubBytes transformation is given by
applying the mapping
\[ \mathrm{INV}\bigl( (x^6 + x^3 + x) \cdot b + (x^2 + 1) \pmod{x^8 + 1} \bigr) \qquad (2.2) \]
to every byte of the state.
To increase the efficiency one can precompute all 256 possible values and store them in a
table called the inverse sbox $S^{-1}$.
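As an illustration, the inverse sbox can be precomputed directly from mapping (2.2). The following Python sketch is written under the assumption that bytes are encoded as integers with bit $i$ holding the coefficient of $x^i$; the function names are chosen here for illustration. Multiplication by $x^n$ modulo $x^8 + 1$ is a rotation of the byte by $n$ bits, which makes both affine mappings simple XORs of rotations.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """INV: a -> a^254, the inverse for a != 0, and 0 -> 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(b, n):
    """Multiplication by x^n modulo x^8 + 1 is a rotation by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine(b):
    """(x^4 + x^3 + x^2 + x + 1) * b + (x^6 + x^5 + x + 1) mod x^8 + 1."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(b):
    """(x^6 + x^3 + x) * b + (x^2 + 1) mod x^8 + 1."""
    return rotl8(b, 1) ^ rotl8(b, 3) ^ rotl8(b, 6) ^ 0x05


SBOX = [affine(gf_inv(b)) for b in range(256)]          # SubBytes table S
INV_SBOX = [gf_inv(inv_affine(b)) for b in range(256)]  # mapping (2.2)

# The two tables undo each other on every byte.
assert all(INV_SBOX[SBOX[b]] == b for b in range(256))
```

Computing the tables once and storing them trades 256 bytes of memory per table for the cost of a field inversion on every byte.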
The InvShiftRows Transformation
The ShiftRows transformation is inverted by cyclically shifting the bytes of each row
by the appropriate number of positions to the right. That is, shifting the second row one position,
the third row two positions and the fourth row three positions to the right cancels the effect
of ShiftRows on a state.
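A small Python sketch of the two row shifts, assuming the state is stored as a list of four rows with four bytes each (this layout and the function names are illustrative choices):

```python
def shift_rows(state):
    """Rotate row r of the state by r positions to the left."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r of the state by r positions to the right."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]


state = [[4 * r + c for c in range(4)] for r in range(4)]
assert inv_shift_rows(shift_rows(state)) == state
```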
The InvMixColumns Transformation
In the MixColumns transformation each column of the state is interpreted as an element of
the ring $F_{256}[y]/\langle y^4 + 1 \rangle$ and is multiplied by the fixed polynomial $c = 03 \cdot y^3 + 01 \cdot y^2 + 01 \cdot y + 02$.
Since $\gcd(c, y^4 + 1) = 1$, the inverse of $c$ exists:
\[ c^{-1} := 0B \cdot y^3 + 0D \cdot y^2 + 09 \cdot y + 0E \in F_{256}[y]/\langle y^4 + 1 \rangle. \]
Multiplying each column, interpreted as an element of $F_{256}[y]/\langle y^4 + 1 \rangle$, with $c^{-1}$ cancels the effect
of the MixColumns operation on a state.
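That the two polynomials are indeed inverse to each other can be checked mechanically. The following Python sketch multiplies polynomials in $F_{256}[y]/\langle y^4 + 1 \rangle$, represented (as an assumption of this example) by coefficient lists $[a_0, a_1, a_2, a_3]$ with $a_i$ the coefficient of $y^i$. Reduction modulo $y^4 + 1$ simply wraps exponents around modulo 4.

```python
def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def poly_mul_mod(a, b):
    """Product of a and b in F_256[y]/(y^4 + 1); y^4 = 1, so exponents wrap mod 4."""
    r = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            r[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return r


C = [0x02, 0x01, 0x01, 0x03]      # c      = 03 y^3 + 01 y^2 + 01 y + 02
C_INV = [0x0E, 0x09, 0x0D, 0x0B]  # c^{-1} = 0B y^3 + 0D y^2 + 09 y + 0E

assert poly_mul_mod(C, C_INV) == [0x01, 0, 0, 0]

# MixColumns followed by InvMixColumns on one column restores the column.
column = [0xDB, 0x13, 0x53, 0x45]
assert poly_mul_mod(C_INV, poly_mul_mod(C, column)) == column
```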
The InvAddRoundKey Transformation
The round key is combined with the state by bitwise adding (XOR) the bytes of the round
key to the corresponding bytes of the state. Since the XOR operation is its own inverse,
adding the round key again cancels the effect of the AddRoundKey transformation.
After specifying the inverse of each individual transformation we can compute the
decryption of a ciphertext by applying the inverse transformations in the reversed order as
shown in Algorithm 5.
Algorithm 5 Complete AES decryption
Input: ciphertext $c_0, \dots, c_{15} \in \{0,1\}^8$, key $k$
Output: plaintext $p_0, \dots, p_{15} \in \{0,1\}^8$
InvAddRoundKey
InvShiftRows
InvSubBytes
for i = 9 down to 1 do
    InvAddRoundKey
    InvMixColumns
    InvShiftRows
    InvSubBytes
end for
AddRoundKey
2.4 The Fast Implementation of AES
Combining the transformations SubBytes, ShiftRows and MixColumns as described in
Section 4.2 of (Daemen and Rijmen 2002) leads to an alternative description of AES. Notice
that the operations SubBytes and ShiftRows can be exchanged: SubBytes substitutes the
bytes independently of their position, whereas ShiftRows changes the position of the bytes
independently of their values.
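Let
\[ s := \begin{pmatrix} s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\ s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\ s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\ s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3} \end{pmatrix} \]
be the state before it enters an encryption round. For $0 \leq j \leq 3$ consider the four bytes
$s_{0,j}, s_{1,j+1}, s_{2,j+2}, s_{3,j+3}$ of the state $s$, where the column indices are computed modulo 4. The four
bytes are transformed by SubBytes and ShiftRows such that they form the new $j$th column:
\[ \begin{pmatrix} S[s_{0,j}] \\ S[s_{1,j+1}] \\ S[s_{2,j+2}] \\ S[s_{3,j+3}] \end{pmatrix}. \]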
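The application of MixColumns and AddRoundKey leads to
\[ \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \cdot \begin{pmatrix} S[s_{0,j}] \\ S[s_{1,j+1}] \\ S[s_{2,j+2}] \\ S[s_{3,j+3}] \end{pmatrix} \oplus \begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}. \]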
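We rewrite the matrix multiplication as the linear combination of the column vectors
\[ S[s_{0,j}] \begin{pmatrix} 02 \\ 01 \\ 01 \\ 03 \end{pmatrix} \oplus S[s_{1,j+1}] \begin{pmatrix} 03 \\ 02 \\ 01 \\ 01 \end{pmatrix} \oplus S[s_{2,j+2}] \begin{pmatrix} 01 \\ 03 \\ 02 \\ 01 \end{pmatrix} \oplus S[s_{3,j+3}] \begin{pmatrix} 01 \\ 01 \\ 03 \\ 02 \end{pmatrix} \oplus \begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}. \]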
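Based on this linear combination we can construct new sboxes
\[ T_0, T_1, T_2, T_3 : \{0,1\}^8 \to (\{0,1\}^8)^4 \]
as follows:
\[ T_0[a] := \begin{pmatrix} S[a] \cdot 02 \\ S[a] \cdot 01 \\ S[a] \cdot 01 \\ S[a] \cdot 03 \end{pmatrix}, \quad T_1[a] := \begin{pmatrix} S[a] \cdot 03 \\ S[a] \cdot 02 \\ S[a] \cdot 01 \\ S[a] \cdot 01 \end{pmatrix}, \quad T_2[a] := \begin{pmatrix} S[a] \cdot 01 \\ S[a] \cdot 03 \\ S[a] \cdot 02 \\ S[a] \cdot 01 \end{pmatrix}, \quad T_3[a] := \begin{pmatrix} S[a] \cdot 01 \\ S[a] \cdot 01 \\ S[a] \cdot 03 \\ S[a] \cdot 02 \end{pmatrix}. \]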
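Each of the sboxes $T_0, T_1, T_2, T_3$ has 256 entries of four bytes each. The four bytes of the
$j$th column can be encrypted for one (full) round by computing
\[ T_0[s_{0,j}] \oplus T_1[s_{1,j+1}] \oplus T_2[s_{2,j+2}] \oplus T_3[s_{3,j+3}] \oplus \begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}. \]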
For the last (truncated) round, which does not have a MixColumns transformation, things
are simpler. We could simply apply the standard sbox $S$ to every byte of the state.
However, to increase the efficiency on 32 bit platforms (Daemen and Rijmen 2002) suggested
to use the sbox
\[ T_4 : \{0,1\}^8 \to (\{0,1\}^8)^4, \quad a \mapsto (S[a], S[a], S[a], S[a]). \]
Merging the transformations as described above leads to a description of AES that only
uses applications of the sboxes and key additions to compute the correct AES encryption.
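The table-based round can be sketched in Python and compared against the step-by-step computation of one output column. This is only an illustration under some assumptions made here: the state and the round key are lists of four rows, table entries are stored as 4-tuples of bytes instead of packed 32 bit words, and all function names are hypothetical.

```python
from functools import reduce
from operator import xor


def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    r = 1
    for _ in range(254):  # a^254 is the inverse for a != 0, and 0 for a = 0
        r = gf_mul(r, a)
    return r


def rotl8(b, n):
    return ((b << n) | (b >> (8 - n))) & 0xFF


SBOX = [
    v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63
    for v in (gf_inv(b) for b in range(256))
]

# Each table holds one column of the MixColumns matrix scaled by S[a].
T0 = [(gf_mul(s, 2), s, s, gf_mul(s, 3)) for s in SBOX]
T1 = [(gf_mul(s, 3), gf_mul(s, 2), s, s) for s in SBOX]
T2 = [(s, gf_mul(s, 3), gf_mul(s, 2), s) for s in SBOX]
T3 = [(s, s, gf_mul(s, 3), gf_mul(s, 2)) for s in SBOX]


def round_column_tables(state, round_key, j):
    """New j-th column after one full round, using four lookups and XORs."""
    key_column = tuple(round_key[r][j] for r in range(4))
    columns = (
        T0[state[0][j]],
        T1[state[1][(j + 1) % 4]],
        T2[state[2][(j + 2) % 4]],
        T3[state[3][(j + 3) % 4]],
        key_column,
    )
    return tuple(reduce(xor, entries) for entries in zip(*columns))


MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def round_column_reference(state, round_key, j):
    """Same column via SubBytes, ShiftRows, MixColumns, AddRoundKey."""
    col = [SBOX[state[r][(j + r) % 4]] for r in range(4)]
    return tuple(
        reduce(xor, (gf_mul(MIX[r][c], col[c]) for c in range(4))) ^ round_key[r][j]
        for r in range(4)
    )
```

In a real implementation each table entry would typically be packed into one 32 bit word, so that a full round costs 16 table lookups and 16 word XORs.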
Chapter 3
Security and Side Channel Attacks
Classical cryptography covers several different security notions, e.g., security against known
plaintext attacks or chosen plaintext attacks. But all the different security notions share at
least one assumption: The encryption function is a black box. I.e., the only information an
attacker can get or influence is the plaintext and the ciphertext of the encryption function
as depicted in Figure 3.1. Here Alice and Bob want to communicate confidentially over an
insecure channel. To protect their communication they encrypt the messages in a private
environment before sending them. The attacker named Eve wants to obtain information
about the messages or the key used for encryption.
Figure 3.1: Black box model of classical cryptography (diagram labels: Alice, encrypt, decrypt, Bob, Eve, c, p)
However, cryptographic algorithms have to be implemented either in hardware or software.
It turned out that implementations of cryptographic algorithms leak some information about
the cryptographic operations through so-called side channels. The information that leaks is
called side channel information, e.g., the time it takes to encrypt a plaintext or the power
consumption. Side channel information depends on the implementation and its inputs,
i.e., the plaintext and the secret key. An attack that uses side channel information is called
a side channel attack (SCA). It turns out that side channel attacks are much more efficient than
classical attacks for virtually every cryptographic algorithm. Hence, to analyze the security
of cryptographic algorithms it is essential that side channels are considered as a real threat
and are incorporated into the black box model. This leads to an extended black box model
like the one depicted in Figure 3.2.
Figure 3.2: Extended black box model that incorporates side channels (diagram labels: Alice, encrypt, decrypt, Bob, Eve, c, p; side channels shown for both the encryption and the decryption: time, power consumption, electromagnetic emanation, light, sound, fault, probing)
However, securing algorithms against side channel attacks is quite tricky. At least two problems occur:
1. It is unclear how to determine all side channels. So far, no security model that considers
all side channels is known.
2. It is difficult to prevent the leakage of information. Most of the countermeasures pro-
posed so far only thwart a certain way of exploiting side channel information.
We introduce a model for analyzing the security against side channel attacks in Chapter 4.
3.1 General Principles of Side Channel Attacks
In the sequel, we describe the general principle of side channel attacks and the assumptions
that are necessary for mounting a side channel attack on an implementation. The essential
assumptions concerning an attacker $A$ that exploits side channel information are:
Assumption 1 (Kerckhoffs' extended principle) $A$ knows all technical details about the
underlying cryptographic algorithm and its implementation.
This assumption is implicitly used in all side channel attacks. In the following we simply
refer to it as Kerckhoffs' extended principle.
Assumption 2 $A$ is able to get plaintexts (or ciphertexts) of encryptions. Furthermore, for
each encryption $A$ is able to obtain side channel information.
The general structure of a side channel attack consists of the following steps:
measurement step In the measurement step the adversary $A$ obtains the side channel
information of the implementation together with the corresponding plaintext and/or
ciphertext. To perform the measurement step, $A$ needs access to the implementation
of the algorithm. Therefore this step is also called the online step.
analysis step $A$ interprets the information collected in the measurement step and tries to
connect the side channel information to some property of an intermediate state of the
encryption. This analysis lets $A$ derive some information about the secret key. Depending
on the side channel attack, the analysis step either determines the secret key uniquely
or reduces the number of key candidates so far that a brute force attack becomes
applicable. The analysis can be performed without access to the implementation and
hence this step is also called the offline step.
3.2 Side Channels
In the sequel we give an overview of the most commonly analyzed side channels and specify
the common structure of side channel attacks.
3.2.1 Timing Attack
The first publication of a successful timing attack was Kocher's timing attack on modular
exponentiation as used in RSA (Kocher 1996). In the asymmetric cipher RSA, a ciphertext
$c$, viewed as an element of the multiplicative group $Z_N^*$, is decrypted by raising it to the $d$-th
power
\[ p = c^d \bmod N, \]
where $N \in Z$ is the public modulus and $d \in Z_{\varphi(N)}^*$ is the secret exponent. The exponentiation
can be computed efficiently using the repeated squaring algorithm or a variation of it. Kocher
showed how to determine $d$ efficiently by analyzing time measurements of decryptions of many
different ciphertexts.
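To see why the running time can depend on the secret exponent, consider the following Python sketch of left-to-right repeated squaring. It is an illustration only, not the implementation attacked by Kocher; the function name and the operation counters are assumptions of this example. The multiplication by $c$ is executed only for the one-bits of $d$, so the sequence of operations, and with it the running time, depends on $d$.

```python
def square_and_multiply(c, d, n):
    """Compute c^d mod n; return the result and the operation counts."""
    result = 1
    squarings = 0
    multiplications = 0
    for bit in bin(d)[2:]:            # bits of d, most significant first
        result = (result * result) % n
        squarings += 1
        if bit == "1":                # key-dependent branch
            result = (result * c) % n
            multiplications += 1
    return result, squarings, multiplications


value, squarings, multiplications = square_and_multiply(7, 0b101101, 3233)
assert value == pow(7, 0b101101, 3233)
assert (squarings, multiplications) == (6, 4)  # bit length and Hamming weight of d
```

In this simple variant the number of multiplications equals the Hamming weight of $d$. The attacks described in this section go further and exploit that the time of an individual modular multiplication depends on its operands.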
Quisquater’s Timing Attack on RSA
(Dhem, Koeune, Leroux, Mestré, Quisquater and Willems 1998) improved the timing attack
on RSA implementations that use a fast modular multiplication method called Montgomery multiplication
(Montgomery 1985). The structure of the attack is as follows. Let $[d_0, d_1, \dots, d_n]$ be the
binary representation of $d$. Knowing the bits $d_0, \dots, d_{i-1}$ of $d$, the bit $d_i$ can be determined
by computing the following steps for many ciphertexts $c$:
1. measure the running time $T$ of the decryption of $c$
2. compute $z = c^{[d_0, d_1, \dots, d_{i-1}, 0]}$
3. if computing $z \cdot c$ takes "long" then put $T$ into set $S_1$
4. else put $T$ into set $S_2$
After that, the attacker $A$ compares the average timings of $S_1$ and $S_2$. If they differ
significantly, $A$ assumes that $d_i = 1$. Otherwise he assumes that $d_i = 0$.
Before starting the attack, $A$ first implicitly assumes that $d_i = 1$, which implies that a
modular multiplication is computed in step $i$ of the decryption. Since $A$ knows all preceding
bits, he can compute the intermediate result of the decryption right before step $i$. He splits the
set of time measurements depending on the time it would take to compute the multiplication
in step $i$. The set $S_1$ stores all timing measurements of ciphertexts that would take a "long"
time for the multiplication. The set $S_2$ stores all timing measurements of ciphertexts that
would take a "short" time for the multiplication. The assumption is that if the multiplication
takes a long time, then the overall decryption time is more likely to be greater than the
decryption time of a ciphertext for which the multiplication takes a short time. Hence, if
$d_i = 1$ then the average running time of set $S_1$ should be significantly larger than the average
running time of the set $S_2$. On the other hand, if $d_i = 0$ then no modular multiplication is
computed in step $i$. In this case the splitting of the measurements into the two sets is assumed
to be random and we expect that the average running times do not differ significantly.
There are several different variants of timing attacks. (Schindler 2000) adapted the concept
of timing attacks to RSA using the Chinese Remainder Theorem. (Cathalo, Koeune and
Quisquater 2003) developed a different type of timing attack to break the identification
scheme GPS of (Baudron, Boudot, Bourel, Bresson, Corbel, Frisch, Gilbert, Girault, Goubin,
Misarsky, Nguyen, Patarin, Pointcheval, Stern, Traoré and Poupard 2000). Symmetric ciphers
are also susceptible to timing attacks. (Hevia and Kiwi 1999) showed how to determine the
secret DES key and (Koeune and Quisquater 1999) obtained secret AES keys by mounting
timing attacks.
The power of timing attacks goes far beyond local attacks. (Brumley and Boneh 2005)
demonstrated that remote timing attacks are possible. They determined the secret RSA
exponent of a web server running OpenSSL by remotely taking time measurements over a
computer network. This remote timing attack was improved by (Acıiçmez, Schindler and
Koç 2005).
3.2.2 Power Analysis
The idea of power analysis is that the power consumption of a cryptographic device is related
to intermediate results of an encryption algorithm and hence depends on the secret key. The
first successful power attacks are due to (Kocher, Jaffe and Jun 1998). Power analysis can be
divided into simple power analysis (SPA) and differential power analysis (DPA). In an SPA,
the attacker analyzes a single power trace to figure out which operation and operands were
executed at what time.
As the name suggests, differential power analysis is based on the differences of power
traces obtained from many different inputs. Similar to timing attacks, the attacker $A$ splits
a large set of power traces into two sets depending on a guess of a part of the key. For
each plaintext the attacker does the following. If the guess of the part of the key implies that
a certain operation during the encryption should consume a lot of power, then the obtained
power trace is put into set $S_1$. On the other hand, if the key guess implies that the operation
does not consume much power, the trace is put into set $S_2$. In the end, the attacker computes
the difference of the average traces of both sets. If there is a peak in the difference trace, then
the attacker assumes that the guess of the part of the key was correct. Otherwise he assumes
that the guess was wrong.
The underlying idea is similar to the one of the timing attack. If the key guess is correct,
then all power traces in the set $S_1$ show a high power consumption (peak) at the time when
the operation in question is executed. Hence, the average power trace of set $S_1$ also shows this
peak. The average trace of set $S_2$ does not have this peak. Therefore, the peak of $S_1$ will be
visible in the difference of the two average traces.
If the key guess is wrong, then the attacker wrongly decides whether the operation would
consume a lot of power or not. The assumption is that in this case the assignment of power
traces to the sets $S_1$, $S_2$ is random. Hence, we expect that when computing the average
traces the peaks in the power traces cancel out and we get a smooth difference trace.
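The difference-of-means test described above can be illustrated with a toy simulation in Python. Everything about the measurement is an assumption of this sketch: a hypothetical device whose traces consist of five sample points with Gaussian noise, of which only one point leaks the least significant bit of S[p ⊕ k] for plaintext byte p and key byte k. Real power traces are far noisier and leak in more complicated ways.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(b, n):
    return ((b << n) | (b >> (8 - n))) & 0xFF


SBOX = [
    v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63
    for v in (gf_inv(b) for b in range(256))
]


def simulate_traces(key_byte, plaintexts, rng):
    """Hypothetical leakage model: sample point 2 leaks bit 0 of S[p ^ k]."""
    traces = []
    for p in plaintexts:
        trace = [rng.gauss(0.0, 0.1) for _ in range(5)]
        trace[2] += SBOX[p ^ key_byte] & 1
        traces.append(trace)
    return traces


def mean_trace(traces):
    return [sum(samples) / len(traces) for samples in zip(*traces)]


def difference_trace(traces, plaintexts, guess):
    """Split the traces into S1/S2 by the predicted bit and subtract the averages."""
    s1 = [t for t, p in zip(traces, plaintexts) if SBOX[p ^ guess] & 1]
    s2 = [t for t, p in zip(traces, plaintexts) if not SBOX[p ^ guess] & 1]
    return [a - b for a, b in zip(mean_trace(s1), mean_trace(s2))]


def recover_key_byte(traces, plaintexts):
    """Return the guess whose difference trace shows the highest peak."""
    def peak(guess):
        return max(abs(v) for v in difference_trace(traces, plaintexts, guess))
    return max(range(256), key=peak)


plaintexts = list(range(256))
traces = simulate_traces(0x3C, plaintexts, random.Random(1))
recovered = recover_key_byte(traces, plaintexts)
```

For the correct guess the partition matches the leaking bit exactly, so the difference trace has a peak of height close to 1 at sample point 2. For a wrong guess the partition is nearly uncorrelated with the leaking bit and the difference trace stays comparatively flat.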
3.2.3 Fault Attacks
The main idea of fault attacks is to obtain information about the secret key by inducing
faults into the cryptographic operation. We deal with fault attacks in more detail in Chapter
5 (page 49).
3.2.4 Cache Attacks
A cache is a fast buffer memory that can be accessed faster than the main memory. Hence,
buffering data that is used more often in the cache increases the performance of a computer.
In a cache attack the attacker observes information about the cache behavior of an algorithm.
E.g., he figures out how many cache accesses happened or which operation caused a cache
access. We analyze cache attacks in more detail in Chapter 6 (page 71).
3.2.5 Other Side Channel Attacks
Besides the side channels described above there are several other ways for an attacker to
obtain information about the internal states of a cryptographic algorithm. (van Eck 1985)
shows how to reconstruct the content of a computer display by analyzing the electromagnetic
radiation of the monitor. Neal Stephenson treats this kind of attack, so-called van Eck phreaking,
in his novel "Cryptonomicon" (Stephenson 1999). The concept of using electromagnetic
radiation to attack cryptographic algorithms was demonstrated in (Quisquater and Samyde
2001), (Gandolfi, Mourtel and Olivier 2001) and (Kuhn 2003).
Another example of a side channel attack, proposed in (Shamir and Tromer 2004), is to
analyze the sound a computer generates while operating with the secret key. Further kinds
of side channel attacks are, among others, so-called frequency based attacks (Tiu 2005), visible
light attacks (Kuhn 2002) and scan based attacks (Yang, Wu and Karri 2004).
3.3 Countermeasures
In general, there are two strategies to thwart side channel attacks. The first strategy is to
prevent the information leakage. E.g., to thwart timing attacks one could build an imple-
mentation that uses constant execution time for all possible inputs. However, this approach
has several disadvantages. Firstly, building such an implementation is costly because it has
to consider all details of the underlying hardware and other parts of the environment. Sec-
ondly, missing one of the details could lead to an implementation that is susceptible to other
side channel attacks. The third disadvantage of this approach is that it leads to inefficient
implementations that have to be redesigned for every different environment.
The second strategy is to randomize the intermediate values of an implementation such
that the leaking information is useless for an attacker. Furthermore, the implementation
has to ensure that the correct ciphertext is computed in the end. Of course, this approach
needs random values for obfuscating intermediate values. But randomization has several
advantages over the strategy of preventing information leakage. The first advantage is that
one can define a general model to analyze the effectiveness of the randomization. Furthermore,
the randomization can be done independently of the underlying hardware. Therefore one can
reuse randomized algorithms on several different platforms.
In the next chapter, we will present such a randomization strategy to provably protect
the AES against side channel attacks in a strong model.
Chapter 4
Provably Secure Randomization of
Cryptographic Algorithms
The security of AES against Simple Power Analysis (SPA), Differential Power Analysis
(DPA), Higher Order Differential Power Analysis (HODPA) as published in (Kocher et al.
1998), (Kocher et al. 1999), and Timing Attacks (Kocher 1996) has received considerable
attention since the beginning of the AES selection process. (Koeune and Quisquater 1999)
describe timing attacks against careless implementations of AES. (Biham and Shamir 1999)
and (Daemen and Rijmen 1999) discuss DPA attacks on the AES candidates in software based
solutions. (Örs, Gürkaynak, Oswald and Preneel 2004) describe the first power analysis-based
attack on a dedicated ASIC implementation of AES and (Mangard 2002) discusses an SPA
attack on the key schedule of AES.
As a result of these attacks, numerous hardware and algorithmic countermeasures have
been proposed. Hardware methodologies were proposed right from the beginning including
randomized clocks, memory encryption schemes, see (Clavier, Coron and Dabbous 2000) and
(Golić 2003), power consumption randomization (Daemen and Rijmen 1999), and
decorrelating the external power supply from the internal power consumed by the chip. Moreover,
the use of different hardware logic, such as complementary logic (Daemen and Rijmen 1999),
sense amplifier based logic (SABL) and asynchronous logic (Fournier, Moore, Li, Mullins
and Taylor 2003) and (Moore, Anderson, Mullins, Taylor and Fournier 2003) has also been
proposed. Some of these methods soon proved to be ineffective while other more successful
countermeasures are very costly in terms of development, area and power consumption. For
example, the techniques in (Daemen and Rijmen 1999), (Tiri, Akmal and Verbauwhede 2002),
(Tiri and Verbauwhede 2003), (Fournier et al. 2003) and (Moore et al. 2003) require about
twice as much area and consume twice as much power as an implementation that is
not protected against power attacks. In addition, hardware countermeasures only protect
against known techniques and attacks. They cannot provide security in a precisely defined
mathematical sense. Hence, although hardware countermeasures are an important defense
against side channel attacks, they should be complemented by algorithmic countermeasures
that are provably secure in a mathematically precise sense.
In this chapter, we focus on algorithmic countermeasures against timing and power at-
tacks on AES. In general, efficient algorithmic countermeasures against timing and power
attacks are based on randomization techniques. Here the problem is to guarantee that all
information that is accessible via side channels is random and hence useless to the attacker.
Moreover, the randomization must be used in such a way that, at the end of the algorithm, the
correct encryption or signature corresponding to the input plaintext is obtained. Random-
ized algorithmic countermeasures against timing and power attacks include secret-sharing
schemes, independently proposed by (Goubin and Patarin 1999) and (Chari, Jutla, Rao and
Rohatgi 1999) as well as methods based on the idea of masking all data and intermediate re-
sults during an encryption operation, originally introduced by (Messerges 2000). This chapter
is organized as follows.
Section 4.1: Security Model
In this section we introduce and discuss our mathematically precise security notion in
which we discuss randomization techniques. For our security notion we only make some
inevitable assumptions: Firstly, we assume that some (small) part of the computation
runs in a protected environment. Secondly, we limit the number of intermediate results
that an adversary has access to. Note that previous methods made at least these as-
sumptions. On the other hand, we assume that arbitrary differences in the distribution
of an intermediate result that depends on the plaintext or secret key of the cryptosystem
can be used to break the system completely. Accordingly, our security notion requires
that the distribution of any intermediate result is stochastically independent of the
secret key being used and independent of the plaintext. Independent of our research,
Golić briefly sketched a similar requirement in (Golić 2003). In the sequel, we call an
algorithm order-$d$ perfectly masked if the joint distribution of any $d$ intermediate results
is independent of the secret key and the plaintext. This notion of security strengthens
the security notion proposed in (Chari et al. 1999). Their security notion only requires
that the distribution of some side channel information about an intermediate result
has to be indistinguishable by an adversary. Since our security notion assumes that
even tiny differences in the distribution of the values of intermediate results completely
break an implementation of a cryptosystem, this notion is strong and often unrealistic.
On the other hand, we will argue that our security notion implies security against most
side channel attacks.
Section 4.2: Masking AES
In this section we briefly describe the masking techniques proposed so far. The first
algorithmic countermeasure against power attacks customized for the AES was the
Transform Masking Method by (Akkar and Giraud 2001). This method was further
simplified by (Trichina, Seta and Germani 2002). It was noticed in (Trichina et al. 2002),
(Golić and Tymen 2002) and (Akkar and Goubin 2003) that the multiplicative masking
introduced in (Akkar and Giraud 2001) masks only non-zero values, i.e., a zero byte will
not get masked because of the multiplicative nature of the mask. This feature renders
the method of Akkar and Giraud vulnerable to DPAs. A second masking technique
for AES is the Random Representation Method of (Golić and Tymen 2002). Similar to
Akkar and Giraud, Golić and Tymen do not try to show that their technique randomizes
all intermediate results. Instead, the authors argue experimentally that using their
methods the Hamming weights of all intermediate results are distributed in roughly
the same way, independent of the plaintext and the secret key. We conclude that so
far customized randomization techniques for AES were based on empirical assumptions
about the power of potential adversaries. Then these assumptions were used to define
some ad-hoc-model in which to analyze and argue the security of the methods. We
believe that this is a potentially dangerous approach.
Section 4.3: Perfectly Masking AES against Order-1 Adversaries
Based on our security notion we develop an order-1 perfectly masked algorithm for AES.
Hence, this algorithm is secure against any adversary that gets plaintext/ciphertext
pairs and a single arbitrary intermediate result for each of those pairs. The main
problem here is to describe a secure algorithm for the inversion operation that is the
main ingredient of the AES SubBytes transformation. Our solution is based on a
general technique to turn an arbitrary algorithm using arithmetic operations defined
over some finite field into a randomized algorithm that securely computes the same
function.
Section 4.4: Implementation and Costs
We show that masking countermeasures are inexpensive to implement in hardware. Our
method amounts to only a 20% increase in the overall area required for an AES hardware
implementation when compared to dual-rail logic type countermeasures. To show this,
we provide a detailed cost comparison of the different methods. Because our method is
based on the usage of multipliers and adders over any binary field, designers might use
this method to implement DPA-safe circuits which utilize previously designed multiplier
and adder blocks. Moreover, the method is modular and encourages reusability.
Section 4.5: Order-$d$ Perfectly Masking
In this section we generalize our method of order-1 perfectly masked algorithms. We
show how to design order-$d$ perfectly masked algorithms that are secure against
adversaries that get the values of a fixed number $d$ of intermediate results.
Section 4.6: Conclusion
We conclude the chapter by giving a brief survey of our contribution in the area of
building reliable security models and developing provably secure algorithms.
4.1 Security Model
In this section we describe our model which we will use in the sequel to analyze the security
of algorithms against side channel attacks. We specify the underlying assumptions that
characterize the model.
Let $\mathcal{P}$, $\mathcal{K}$ and $\mathcal{C}$ denote the set of plaintexts, the set of keys and the set of ciphertexts,
respectively. We consider some encryption function
\[ \mathrm{enc} : \mathcal{P} \times \mathcal{K} \to \mathcal{C}, \quad (x, k) \mapsto c. \]
Given an algorithm E that evaluates the function enc, for each plaintext $x \in \mathcal{P}$ and key
$k \in \mathcal{K}$, we view the computation of $E(x, k)$ as a sequence of $t \in \mathbb{N}$ intermediate results
\[ I_1(x, k, R), \dots, I_t(x, k, R). \]
Each intermediate result $I_i$ may depend on the plaintext $x$, on the secret key $k$, and on some
$R \in \{0,1\}^\alpha$ for an appropriate constant $\alpha \in \mathbb{N}$. The element $R$ is used to randomize the
computation and is chosen uniformly at random from $\{0,1\}^\alpha$. For simplicity we assume that
we have a true random number generator (TRNG) and that the adversary is not able to
manipulate the random bits. Note that the ciphertext $\mathrm{enc}(x, k) = I_t(x, k, R)$ only depends
on $x$ and $k$ and not on $R$.
We consider an adversary $A$ that wants to derive information about the secret key $k$ by
using side channel information. To characterize the security model we make the following
assumptions:
Assumption 3
1. The adversary $A$ can choose an arbitrary number of plaintexts (or ciphertexts) and
obtains the corresponding ciphertexts (or plaintexts).
2. For each encryption (or decryption), $A$ gets the values of a constant number $d$ of
intermediate results.
In point 1 of Assumption 3 we allow the adversary to obtain an arbitrary number of
(adaptively) chosen plaintext/ciphertext pairs $(x, \mathrm{enc}(x, k))$. Furthermore, for each pair, the
adversary $A$ obtains the values of $d$ intermediate results of his choice. $A$ may get different
intermediate results for different plaintext/ciphertext pairs. The larger the number $d$ of
known intermediate results, the more powerful $A$ is. We call an adversary $A$ that can get
at most $d$ intermediate results for each pair $(x, \mathrm{enc}(x, k))$ an order-$d$ adversary.
So far we considered intermediate results without specifying the possible intermediate
results that an adversary may get. We consider an algorithm as a sequence of operations
that are treated as encapsulated modules. This leads to a classification of intermediate results
into different levels down to the bit level:
1. Text level: The whole algorithm is treated as a module. This level is the one of classical
cryptography. The only information available to the adversary is the plaintext and the
ciphertext.
2. Block level: Each part or subroutine of the algorithm is treated as a module. In the
case of a block cipher such as the AES, each transformation within a round is treated
as a module (SubBytes, ShiftRows, MixColumns and AddRoundKey).
3. Unit level: Each arithmetic operation is treated as a module. These operations work on
the atomic units of information in the cipher. For example, the AES units of information
are bytes; no operation acts on single bits or nibbles directly. In hardware terms this
level is based on the contents of registers.
4. Bit level: Each bit manipulation is treated as a module, for example XOR, shift etc.
Every output of such a module is an intermediate result. In this section we concentrate
on intermediate results at the unit level. For AES this seems to be a natural choice since
basically all operations in AES are arithmetic operations on bytes. Therefore timing, power
and fault attacks on AES have focused on these operations as well.
Assumption 4 Some of the operations of the algorithm E that evaluates the encryption
function are protected against A.
This assumption is inevitable to achieve a reasonable notion of security. To see this, note
that the secret key $k$ itself can be considered as an intermediate result. Letting $A$ obtain $k$
directly would render all algorithms and countermeasures insecure. Hence, we must assume
that some parts of the computation run in a guaranteed secure environment. I.e., some
intermediate results cannot be accessed by an adversary. At least implicitly, all previously
proposed countermeasures against side channel attacks have made the same assumption.
Note that modern smartcards are protected by different types of countermeasures like sensors
and shields. Hence, the assumption that at least some computations are done in a secure
environment is realistic. However, it is desirable to clearly specify and to limit the number
of those operations because their protection is expensive.
Assumption 5 If the joint distribution of $d$ intermediate results depends on the plaintext $x$
and on the secret key $k$ then $A$ can determine $k$.
This assumption strengthens the adversary. If the joint distribution of $d$ intermediate results
depends on the secret key then it provides $A$ with some information about $k$. To simplify and
strengthen our security model we assume that in this case $A$ can determine the entire key $k$.
Intuitively, we say that the algorithm computing enc is insecure if the joint distribution
of the intermediate results that are accessible for an adversary depends on the plaintext $x$
and on the secret key $k$. To formalize this, fix some $d$-tuple $I_1, \dots, I_d$ of intermediate results.
For a pair $(x, k)$ of plaintext and key we denote by $D_{x,k}(R)$ the joint distribution of $I_1, \dots, I_d$
induced by choosing $R$ uniformly at random in $\{0,1\}^\alpha$ for an appropriate constant $\alpha$. Now
we can define our notion of security called perfect masking:
Definition 2 (perfect masking) An algorithm that evaluates an encryption function enc
is order-$d$ perfectly masked if for all $d$-tuples $I_1, \dots, I_d$ of intermediate results we have that
\[ D_{x,k}(R) = D_{x',k'}(R) \quad \text{for all pairs } (x, k), (x', k'). \]
For $d = 1$ we say that an algorithm is perfectly masked.
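For a single byte-sized intermediate result the condition of Definition 2 can be checked by enumerating all random values. The following Python sketch is a toy illustration with assumptions made here: the intermediate results are a hypothetical additively masked key addition $(x \oplus r) \oplus k$ and a hypothetical multiplicatively masked value $(x \oplus k) \cdot r$ in $F_{256}$ with a nonzero mask $r$, and only a subset of the pairs $(x, k)$ is compared to keep the running time small.

```python
from collections import Counter


def gf_mul(a, b):
    """Multiply two bytes in F_256 = F_2[x]/(x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def distribution(intermediate, x, k, masks):
    """D_{x,k}(R) for one intermediate result, with R uniform over masks."""
    return Counter(intermediate(x, k, r) for r in masks)


def additive(x, k, r):
    """Key addition on an additively masked byte."""
    return (x ^ r) ^ k


def multiplicative(x, k, r):
    """Multiplicatively masked byte; a zero value stays zero for every mask."""
    return gf_mul(x ^ k, r)


def same_distribution(intermediate, masks, pairs):
    """True if D_{x,k}(R) is identical for all listed pairs (x, k)."""
    x0, k0 = pairs[0]
    reference = distribution(intermediate, x0, k0, masks)
    return all(distribution(intermediate, x, k, masks) == reference
               for x, k in pairs)


# Subset of plaintext/key pairs; it contains pairs with x ^ k == 0.
PAIRS = [(x, k) for x in range(0, 256, 17) for k in range(0, 256, 15)]

additive_ok = same_distribution(additive, range(256), PAIRS)
multiplicative_ok = same_distribution(multiplicative, range(1, 256), PAIRS)
```

Here `additive_ok` is True, since $x \oplus k \oplus r$ is uniformly distributed for every pair, while `multiplicative_ok` is False: whenever $x \oplus k = 0$ the masked value is 0 for every mask, so its distribution differs from that of nonzero values. A check on a subset of pairs can refute perfect masking but does not prove it.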
4.1.1 Discussion of the Security Notion
Our notion of security is very strong. Basically, we assume that an adversary can determine
the secret key even from tiny differences in the (joint) distribution of intermediate results. In
many realistic cases this may not be true. However, we do not want to base our security model
on assumptions about technical abilities or limitations adversaries currently have. Instead
we want to provide a precise mathematical notion that captures security against current
side channel attacks as well as future ones. Our notion of security strengthens the security
notion of (Chari et al. 1999). We require that for any two pairs $(x, k), (x', k')$ of plaintext
and key the joint distributions $D_{x,k}(R)$, $D_{x',k'}(R)$ of $d$ intermediate results induced by these
pairs must be identical. Chari et al., on the other hand, only demand that the distributions
$D_{x,k}(R)$, $D_{x',k'}(R)$ must be indistinguishable by an adversary. As Chari et al. point out, if
the joint distributions of $d$ intermediate results induced by different plaintext/key pairs are
indistinguishable for an adversary then power analysis and timing attacks using information
about at most $d$ intermediate results cannot be mounted. Clearly, identical distributions
are indistinguishable. Hence, an algorithm that is order-$d$ perfectly masked is secure against
timing and power analysis attacks using information about $d$ intermediate results.
In the sequel, we will concentrate on methods to achieve a perfectly masked algorithm to
compute AES. From the discussion above it follows that the perfectly masked algorithm for
AES that we describe in Section 4.3 (page 33) is secure against timing and power analysis
attacks using a single intermediate result. As can easily be seen, our algorithm is not secure
if an adversary has access to two or more intermediate results. Notice that most countermeasures
proposed so far also assume an adversary with access to a single intermediate result,
see (Akkar and Giraud 2001), (Golić and Tymen 2002) and (Trichina 2003).
4.2 Masking AES
(Messerges 2000) introduces the idea of masking all intermediate values of an encryption
operation as an effective countermeasure against Simple Power Attacks and Differential Power
Attacks. Randomizing the computation of a function $f$ is, thus, achieved as $f(u')$ where
$u' = u + r$ and $r$ is a randomly chosen mask. If the function is linear, one can recover the
desired value $f(u)$ from $f(u') = f(u) + f(r)$. A similar computation will recover $f(u)$ if the
function $f$ is affine. For non-linear functions, the previous equation does not hold true and it
is necessary to come up with a series of computations depending only on $r$ and $u'$ such that
we obtain the value of $f(u)$ without leaking any information.
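The recovery step for linear and affine functions can be illustrated with a minimal Python sketch. As an example function it uses the GF(2)-affine map of the AES S-box (a rotate-and-XOR map followed by the constant 0x63); the helper names `rotl8`, `linear_part`, `affine`, `recover_linear` and `recover_affine` are illustrative assumptions and do not appear in the text.

```python
def rotl8(b, n):
    """Rotate the byte b left by n bit positions."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def linear_part(b):
    """GF(2)-linear part of the AES S-box affine map."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4)

AFFINE_CONST = 0x63

def affine(b):
    """Affine map: linear part plus a constant."""
    return linear_part(b) ^ AFFINE_CONST

def recover_linear(f_of_masked, r):
    # Linear f: f(u XOR r) = f(u) XOR f(r), so f(u) = f(u XOR r) XOR f(r).
    return f_of_masked ^ linear_part(r)

def recover_affine(g_of_masked, r):
    # Affine g(x) = f(x) XOR c: g(u XOR r) = g(u) XOR g(r) XOR c.
    return g_of_masked ^ affine(r) ^ AFFINE_CONST
```

In both cases the correction term depends only on the mask `r`, never on the unmasked value `u`, which is why linear and affine layers are easy to mask.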
We notice that in the case of the AES, the only non-linear function in the algorithm is
the AES SubBytes transformation. As described in Section 2.3 (page 9), SubBytes consists
of the function
\[
\mathrm{INV}(x) =
\begin{cases}
x^{-1}, & \text{if } x \in \mathbb{F}_{256}^{\times} \\
0, & \text{if } x = 0
\end{cases}
\]
together with an affine mapping. In particular, most researchers have concentrated their
efforts on efficient methods to perform inversion over $\mathbb{F}_{256}$ in a secure manner via masking
countermeasures, i.e., computing $u^{-1} + r$ from $u + r$ without compromising the value of
$u$. In this context, three masking methods have been proposed: two of them, (Akkar and
Giraud 2001) and (Golić and Tymen 2002), are based on the idea of combining boolean and
multiplicative masking operations and the third one is based on the idea of masking the
individual logic operations required to compute an $\mathbb{F}_{256}$ inverse. A simplification of (Akkar
and Giraud 2001) was introduced in (Trichina et al. 2002) but it has been recently found
by (Akkar, Bévan and Goubin 2004) that the simplifications lead to further vulnerabilities
against DPA. Thus, we do not consider it any further. In the following, we briefly summarize
the previously proposed countermeasures.
The Transform Masking Method (TMM)
In (Akkar and Giraud 2001), Akkar and Giraud introduce the Transform Masking Method
(TMM) and algorithms to transform between boolean masking (XOR operation) and multiplicative
masking (multiplication in $\mathbb{F}_{256}$) which is compatible with inversion in $\mathbb{F}_{256}$. (Akkar
and Giraud 2001) solves the problem using Algorithm 6, where $r_1 \in \mathbb{F}_{256}$ is a random field
element and $r_2 \in \mathbb{F}_{256}^{\times}$ is a random element of the multiplicative group.
However, as noticed in (Trichina et al. 2002) and (Golić and Tymen 2002), this countermeasure
is susceptible to first-order DPA if $u = 0$ because zero cannot be masked with a
multiplicative mask. It is clear that because of the special nature of the zero value, multiplicative
masking cannot lead to perfect masking.
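Algorithm 6 Transform Masking Method
Input: $u' = u \oplus r_1 \in \mathbb{F}_{256}$, $r_1 \in \mathbb{F}_{256}$, $r_2 \in \mathbb{F}_{256}^{\times}$
Output: $\mathrm{INV}(u) \oplus r_1$
1: $t_1 \leftarrow u' \cdot r_2$ {$t_1 = (u \oplus r_1) \cdot r_2$}
2: $t_2 \leftarrow r_1 \cdot r_2$ {$t_2 = r_1 \cdot r_2$}
3: $t_1 \leftarrow t_1 \oplus t_2$ {$t_1 = u \cdot r_2$}
4: $t_3 \leftarrow r_2^{-1}$ {$t_3 = r_2^{-1}$}
5: $t_1 \leftarrow \mathrm{INV}(t_1)$ {$t_1 = \mathrm{INV}(u \cdot r_2)$}
6: $t_2 \leftarrow t_3 \cdot r_1$ {$t_2 = r_1 \cdot r_2^{-1}$}
7: $t_1 \leftarrow t_1 \oplus t_2$ {$t_1 = \mathrm{INV}(u \cdot r_2) \oplus (r_1 \cdot r_2^{-1})$}
8: $t_1 \leftarrow t_1 \cdot r_2$ {$t_1 = \mathrm{INV}(u) \oplus r_1$}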
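The zero-value weakness of Algorithm 6 can be made concrete with a small Python sketch. It follows the eight steps literally and additionally returns the value of $t_1$ after step 3; the function names and the returned trace value are illustrative assumptions, and the field arithmetic uses the AES polynomial $x^8 + x^4 + x^3 + x + 1$.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """INV(a) = a^254, which maps 0 to 0."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def tmm_inverse(u_masked, r1, r2):
    """Follow Algorithm 6 (r2 must be nonzero).

    Returns (INV(u) XOR r1, value of t1 after step 3).
    """
    t1 = gf_mul(u_masked, r2)      # step 1
    t2 = gf_mul(r1, r2)            # step 2
    t1 ^= t2                       # step 3: t1 = u * r2
    after_step3 = t1
    t3 = gf_inv(r2)                # step 4
    t1 = gf_inv(t1)                # step 5
    t2 = gf_mul(t3, r1)            # step 6
    t1 ^= t2                       # step 7
    t1 = gf_mul(t1, r2)            # step 8
    return t1, after_step3
```

For every nonzero `r2` the value after step 3 is zero exactly when `u` is zero, so observing this single intermediate result distinguishes `u = 0` from all other inputs.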
Embedded Multiplicative Masking (EMM)
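Let $m = x^8 + x^4 + x^3 + x + 1 \in \mathbb{F}_2[x]$ be the polynomial of the AES specification. The basic
idea of EMM as described in (Golić and Tymen 2002) is to embed the field $\mathbb{F}_{256} = \mathbb{F}_2[x]/\langle m \rangle$
in the ring
\[
R_n := \mathbb{F}_2[x]/(m \cdot q) \cong \mathbb{F}_{256} \times \mathbb{F}_{2^n},
\]
where $q \in \mathbb{F}_2[x]$ is another irreducible polynomial of degree $n$ that is co-prime to $m$. The
field $\mathbb{F}_{256}$ is a subring of the ring $R_n$ with the homomorphism defined by
\[
\mathbb{F}_{256} \to R_n, \qquad v \mapsto (v \bmod m,\; v \bmod q).
\]
(Golić and Tymen 2002), then, suggests to use a random mapping $\rho_k$ defined by
\[
\rho_k : \mathbb{F}_{256} \to R_n, \qquad v \mapsto v + r m \bmod mq
\]
where $r \in \mathbb{F}_2[x]$ is a randomly chosen polynomial of degree less than $n$. To compute $\mathrm{INV}(v)$
an adapted function
\[
\mathrm{INV}' : \mathbb{F}_{256} \to \mathbb{F}_{256}, \qquad v \mapsto v^{254} \bmod mq
\]
can be used.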
In this way, arithmetic operations remain compatible with $\mathbb{F}_{256}$ and the zero value gets
mapped to one of $2^n$ random values. Thus, it is harder to detect the zero value as $n$ becomes
larger. From a security point of view, however, the approach in (Golić and Tymen 2002) does
not yield perfect masking since the sets of representatives of different values are pairwise
disjoint. From an implementation point of view, we will show in Section 4.4.2 (page 39) that
this method is too expensive to implement in hardware. This is important since our method
can be implemented with less than half the hardware resources and, at the same time, yields
perfect masking.
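The disjointness of the representative sets can be checked with a short Python sketch. It encodes polynomials over $\mathbb{F}_2$ as integers and enumerates $v + r m$ for all $r$ of degree less than $n$; because $\deg(r m) < 8 + n$, the reduction modulo $mq$ never triggers, so no concrete $q$ is needed here. The helper names are illustrative assumptions.

```python
M = 0x11B  # AES polynomial m = x^8 + x^4 + x^3 + x + 1

def clmul(a, b):
    """Carry-less product of two polynomials over GF(2) encoded as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def poly_mod(a, m):
    """Remainder of polynomial a modulo polynomial m over GF(2)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

def representatives(v, n):
    """All 2^n values v + r*m with deg(r) < n that may represent v."""
    return {v ^ clmul(r, M) for r in range(1 << n)}
```

Reducing any representative modulo `M` returns the original value, so an adversary who sees a single representative learns which value it encodes; in particular, the representatives of zero are exactly the multiples of `m`.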
Combinational Logic Design for the AES Sbox on Masked Data
To the authors’ knowledge, (Trichina 2003) is the first to consider embedding a masking
countermeasure directly in hardware. (Trichina 2003) allows for a modified inversion function
which on input $u \oplus r_1$ outputs $u^{-1} \oplus r_2$, where $r_1$ and $r_2$ need not be the same. In addition,
(Trichina 2003) reduces the masking problem for inversion in $\mathbb{F}_{2^m}$ to the problem of masking
a logical AND operation since masking XOR operations is, in principle, trivial. In particular,
given masked bits $u' = u \oplus r_1$, $v' = v \oplus r_2$ and corresponding masks $r_1, r_2$, we compute
$(u \wedge v) \oplus r_3$, where $r_3$ is the output mask. According to (Trichina 2003) and setting $r_3 = r_1 \wedge r_2$
this can be accomplished as:
\[
(u \wedge v) \oplus r_3 = (u \wedge v) \oplus (r_1 \wedge r_2) = (u' \wedge v') \oplus (r_1 \wedge v') \oplus (r_2 \wedge u') \tag{4.1}
\]
where the parentheses indicate the order in which intermediate results are computed. Equation
(4.1) implies that we can compute the AND operation of two bits $u, v$ without using the
actual bits but rather their masked counterparts $u', v'$ and corresponding masks $r_1, r_2$. We
notice that if $u = v = 0$, the intermediate value $(r_1 \wedge v') \oplus (r_2 \wedge u')$ is always equal to zero
for any value of $r_1$ and $r_2$. This implies that (4.1) does not lead to perfect masking.
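Since only four mask combinations exist per input pair, the leak in (4.1) can be confirmed exhaustively. The following Python sketch evaluates the masked AND on single bits and tabulates the distribution of the intermediate value $(r_1 \wedge v') \oplus (r_2 \wedge u')$; the function names are illustrative assumptions.

```python
from itertools import product

def masked_and(u_masked, v_masked, r1, r2):
    """Evaluate Equation (4.1) on single bits.

    Returns (output, inner), where inner = (r1 AND v') XOR (r2 AND u').
    """
    inner = (r1 & v_masked) ^ (r2 & u_masked)
    out = (u_masked & v_masked) ^ inner
    return out, inner

def inner_distribution(u, v):
    """Count how often the inner value is 0 or 1 over all masks r1, r2."""
    counts = {0: 0, 1: 0}
    for r1, r2 in product((0, 1), repeat=2):
        out, inner = masked_and(u ^ r1, v ^ r2, r1, r2)
        assert out == (u & v) ^ (r1 & r2)   # output carries the mask r3 = r1 AND r2
        counts[inner] += 1
    return counts
```

For `u = v = 0` the inner value is constant, whereas for `u = v = 1` it takes both values equally often, so its distribution depends on the unmasked inputs.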
4.3 Perfectly Masking AES against Order-1 Adversaries
As mentioned before, in order to obtain a perfectly masked algorithm for AES we concentrate
on the problem of computing multiplicative inverses in $\mathbb{F}_{256}$ because this is the main step of
the SubBytes transformation. In this section we present an algorithm that is secure against
an adversary who is able to get the value of a single intermediate result. In Section 4.5 (page
41) we will show how to generalize this method to protect against order-$d$ adversaries for an
arbitrary but fixed $d \ge 1$.
Let $r, r'$ be independent and uniformly distributed random masks. We start with an
additively masked value $u \oplus r$ and would like to compute $\mathrm{INV}(u) \oplus r'$. However, a direct
application of INV leads to $\mathrm{INV}(u \oplus r)$ that is of no use because of the non-linearity of
inversion.
4.3.1 Idea
The basis of our idea is to compute $\mathrm{INV}(x)$ as $x^{254}$ in $\mathbb{F}_{256}$. For simplicity we only consider
the repeated squaring algorithm to compute the 254th power. However, to improve efficiency
one could use an optimal addition chain. For a thorough treatment of efficient exponentiation
methods see for example (von zur Gathen and Nöcker 1997, von zur Gathen and Gerhard
2003). In general the multiplicative inverse of an element over an arbitrary finite field $\mathbb{F}_{p^m}$
can always be computed by raising it to the $(p^m - 2)$-th power. Since our inputs are additively
masked values $(u \oplus r)$ we correct the result of every single operation in the repeated squaring
algorithm in order to obtain the desired result. Our invariant is that at the end of each step
our result has the form
\[
(u^e \oplus r') \tag{4.2}
\]
for some $e \in \mathbb{N}$ and $r' \in \mathbb{F}_{256}$ chosen uniformly at random. Hence, the problem is to correct
the intermediate results without revealing any information about $u$.
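Before any masking is added, the underlying exponentiation can be sketched in Python. The chain below alternates squarings and multiplications so that the exponent runs through 2, 3, 6, 7, ..., 127, 254 in 13 steps; the function names are illustrative assumptions, and the field is taken to be $\mathbb{F}_{256}$ with the AES polynomial.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def inv_by_repeated_squaring(x):
    """Compute x^254 with 7 squarings and 6 multiplications (unmasked)."""
    t = x
    for _ in range(6):
        t = gf_mul(t, t)   # squaring: exponent e -> 2e
        t = gf_mul(t, x)   # multiplication: exponent 2e -> 2e + 1
    return gf_mul(t, t)    # final squaring: exponent 127 -> 254
```

The result is the multiplicative inverse for every nonzero input and 0 for the input 0, matching the definition of INV.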
4.3.2 Method
We introduce some variables: We name $r_{j,i}$ the $j$th random mask used in step $i$ of the
repeated squaring algorithm. All $r_{j,i}$ are independent and uniformly distributed masks. The
direct result of a squaring or multiplication performed on some masked values is called $f_i$.
Furthermore, we need so called auxiliary terms $s_{1,i}$ and $s_{2,i}$ to transform the direct result $f_i$.
The variable $t_{1,i}$ is the intermediate result that appears during the correction and $t_i$ is the
final result which complies with our invariant (4.2), i.e., it is of the form $u^e \oplus r_{1,i}$ for some $e$.
The input to our modified inversion algorithm is the masked value $(u \oplus r_{1,0})$. Next, we
describe how to perform multiplications and squarings in a perfectly masked manner. The security
analysis is shown in Section 4.3.3. We distinguish between squaring and multiplication
because the former is linear and hence can be masked more efficiently.
Perfectly Masked Squaring (PMS) The perfectly masked squaring algorithm that is
used in step $i$ of the repeated squaring algorithm is described in Algorithm 7. The input
$t_{i-1} = u^e \oplus r_{1,i-1}$ is squared in step 1. In order to compute the output that respects our
invariant we have to change the mask to $r_{1,i}$. To do so in steps 2 and 3 we use the auxiliary
term $s_{1,i}$ and compute the desired output $t = u^{2e} \oplus r_{1,i}$.
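Algorithm 7 Perfectly Masked Squaring (PMS)
Input: $t_{i-1} = u^e \oplus r_{1,i-1}$, $r_{1,i-1}$, $r_{1,i} \in \mathbb{F}_{256}$
Output: $u^{2e} \oplus r_{1,i} \in \mathbb{F}_{256}$
1: $f_i \leftarrow t_{i-1}^2$ {$f_i = u^{2e} \oplus r_{1,i-1}^2$}
2: $s_{1,i} \leftarrow r_{1,i-1}^2 \oplus r_{1,i}$ {auxiliary term to correct $f_i$}
3: $t_i \leftarrow f_i \oplus s_{1,i}$ {$t_i = u^{2e} \oplus r_{1,i}$}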
Perfectly Masked Multiplication (PMM) Our perfectly masked multiplication method
is described in Algorithm 8. The inputs are two intermediate results: the output $x$ of the
previous step and a freshly masked value $x'$ derived by securely changing the masked value
from $u \oplus r_1$ to $u \oplus r_2$. In Step 1 we calculate the product $f_i$ of two intermediate results. The
variable $f_i$ contains the desired power of $u$ as well as some disturbing terms. In Steps 2-5
we compute the auxiliary terms $s_{1,i}$ and $s_{2,i}$. In the end (Steps 6 and 7) we eliminate the
disturbing parts of $f_i$ and transform it according to our invariant. This is done by simply
adding up the two auxiliary terms $s_{1,i}$, $s_{2,i}$ and $f_i$.
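Algorithm 8 Perfectly Masked Multiplication (PMM)
Input: $x = u^e \oplus r_{1,i-1}$, $x' = u \oplus r_{2,i}$, $r_{1,i-1}$, $r_{1,i}$, $r_{2,i} \in \mathbb{F}_{256}$
Output: $u^{e+1} \oplus r_{1,i} \in \mathbb{F}_{256}$
1: $f_i \leftarrow x \cdot x'$ {$f_i = u^{e+1} \oplus u^e \cdot r_{2,i} \oplus u \cdot r_{1,i-1} \oplus r_{1,i-1} \cdot r_{2,i}$}
2: $v_{1,i} \leftarrow x' \cdot r_{1,i-1}$ {$v_{1,i} = u \cdot r_{1,i-1} \oplus r_{1,i-1} \cdot r_{2,i}$}
3: $v_{2,i} \leftarrow v_{1,i} \oplus r_{1,i}$ {$v_{2,i} = u \cdot r_{1,i-1} \oplus r_{1,i-1} \cdot r_{2,i} \oplus r_{1,i}$}
4: $s_{1,i} \leftarrow v_{2,i} \oplus r_{1,i-1} \cdot r_{2,i}$ {$s_{1,i} = u \cdot r_{1,i-1} \oplus r_{1,i}$}
5: $s_{2,i} \leftarrow x \cdot r_{2,i}$ {$s_{2,i} = u^e \cdot r_{2,i} \oplus r_{1,i-1} \cdot r_{2,i}$}
6: $t_{1,i} \leftarrow f_i \oplus s_{1,i}$ {$t_{1,i} = u^{e+1} \oplus u^e \cdot r_{2,i} \oplus r_{1,i-1} \cdot r_{2,i} \oplus r_{1,i}$}
7: $t_i \leftarrow t_{1,i} \oplus s_{2,i}$ {$t_i = u^{e+1} \oplus r_{1,i}$}
Table 4.1 lists all intermediate results that occur during the computation of $x^{254}$.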
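Algorithms 7 and 8 and the 13-step chain of Table 4.1 can be put together in a minimal Python sketch. The mask source `rand_byte`, the function names, and the way the freshly masked value $x'$ is derived from the masked input (add the new mask first, then remove the old one) are assumptions of this sketch; it checks functional correctness only and makes no claim about side channel behaviour of Python itself.

```python
import secrets

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def pms(t_prev, r_prev, r_new):
    """Algorithm 7: (u^e XOR r_prev) -> (u^(2e) XOR r_new)."""
    f = gf_mul(t_prev, t_prev)              # u^(2e) XOR r_prev^2
    s1 = gf_mul(r_prev, r_prev) ^ r_new     # depends on masks only
    return f ^ s1

def pmm(x, x_prime, r_prev, r_new, r2):
    """Algorithm 8: x = u^e XOR r_prev, x' = u XOR r2 -> u^(e+1) XOR r_new."""
    f = gf_mul(x, x_prime)
    v1 = gf_mul(x_prime, r_prev)
    v2 = v1 ^ r_new
    s1 = v2 ^ gf_mul(r_prev, r2)
    s2 = gf_mul(x, r2)
    t1 = f ^ s1
    return t1 ^ s2

def masked_inverse(u_masked, r0, rand_byte=lambda: secrets.randbelow(256)):
    """Return (INV(u) XOR r, r) for the input u XOR r0 via the chain of Table 4.1."""
    t, r = u_masked, r0
    for step in range(1, 14):
        r_new = rand_byte()
        if step % 2 == 1:                    # odd steps are squarings
            t = pms(t, r, r_new)
        else:                                # even steps are multiplications
            r2 = rand_byte()
            x_prime = (u_masked ^ r2) ^ r0   # u XOR r2, new mask added before old one is removed
            t = pmm(t, x_prime, r, r_new, r2)
        r = r_new
    return t, r
```

A caller would unmask the result only at the very end, for example `t ^ r` after `t, r = masked_inverse(u ^ r0, r0)`.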
4.3.3 Security Analysis
As defined in our security model we have to look at all intermediate results. For Algorithm
7 and Algorithm 8 we only have to analyze the distributions of the following intermediate
results: $f_i, s_{1,i}, s_{2,i}, t_i, t_{1,i}, v_{1,i}, v_{2,i}$ where $1 \le i \le 13$. These are the results that depend on $u$.
We can neglect intermediate results such as $r_{1,i}^2$ since they do not depend on $u$.
Our security analysis is based on the following three lemmata that characterize the dis-
tributions of intermediate results.
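| $i$ | Op | $f_i$ | $s_{1,i}$ | $s_{2,i}$ | $t_{1,i}$ | $t_i$ |
|---|---|---|---|---|---|---|
| 1 | (S) | $u^2 \oplus r_{1,0}^2$ | $r_{1,0}^2 \oplus r_{1,1}$ | | | $u^2 \oplus r_{1,1}$ |
| 2 | (M) | $(u^2 \oplus r_{1,1})(u \oplus r_{2,2})$ | $u r_{1,1} \oplus r_{1,2}$ | $u^2 r_{2,2} \oplus r_{1,1} r_{2,2}$ | $u^3 \oplus u^2 r_{2,2} \oplus r_{1,1} r_{2,2} \oplus r_{1,2}$ | $u^3 \oplus r_{1,2}$ |
| 3 | (S) | $u^6 \oplus r_{1,2}^2$ | $r_{1,2}^2 \oplus r_{1,3}$ | | | $u^6 \oplus r_{1,3}$ |
| 4 | (M) | $(u^6 \oplus r_{1,3})(u \oplus r_{2,4})$ | $u r_{1,3} \oplus r_{1,4}$ | $u^6 r_{2,4} \oplus r_{1,3} r_{2,4}$ | $u^7 \oplus u^6 r_{2,4} \oplus r_{1,3} r_{2,4} \oplus r_{1,4}$ | $u^7 \oplus r_{1,4}$ |
| 5 | (S) | $u^{14} \oplus r_{1,4}^2$ | $r_{1,4}^2 \oplus r_{1,5}$ | | | $u^{14} \oplus r_{1,5}$ |
| 6 | (M) | $(u^{14} \oplus r_{1,5})(u \oplus r_{2,6})$ | $u r_{1,5} \oplus r_{1,6}$ | $u^{14} r_{2,6} \oplus r_{1,5} r_{2,6}$ | $u^{15} \oplus u^{14} r_{2,6} \oplus r_{1,5} r_{2,6} \oplus r_{1,6}$ | $u^{15} \oplus r_{1,6}$ |
| 7 | (S) | $u^{30} \oplus r_{1,6}^2$ | $r_{1,6}^2 \oplus r_{1,7}$ | | | $u^{30} \oplus r_{1,7}$ |
| 8 | (M) | $(u^{30} \oplus r_{1,7})(u \oplus r_{2,8})$ | $u r_{1,7} \oplus r_{1,8}$ | $u^{30} r_{2,8} \oplus r_{1,7} r_{2,8}$ | $u^{31} \oplus u^{30} r_{2,8} \oplus r_{1,7} r_{2,8} \oplus r_{1,8}$ | $u^{31} \oplus r_{1,8}$ |
| 9 | (S) | $u^{62} \oplus r_{1,8}^2$ | $r_{1,8}^2 \oplus r_{1,9}$ | | | $u^{62} \oplus r_{1,9}$ |
| 10 | (M) | $(u^{62} \oplus r_{1,9})(u \oplus r_{2,10})$ | $u r_{1,9} \oplus r_{1,10}$ | $u^{62} r_{2,10} \oplus r_{1,9} r_{2,10}$ | $u^{63} \oplus u^{62} r_{2,10} \oplus r_{1,9} r_{2,10} \oplus r_{1,10}$ | $u^{63} \oplus r_{1,10}$ |
| 11 | (S) | $u^{126} \oplus r_{1,10}^2$ | $r_{1,10}^2 \oplus r_{1,11}$ | | | $u^{126} \oplus r_{1,11}$ |
| 12 | (M) | $(u^{126} \oplus r_{1,11})(u \oplus r_{2,12})$ | $u r_{1,11} \oplus r_{1,12}$ | $u^{126} r_{2,12} \oplus r_{1,11} r_{2,12}$ | $u^{127} \oplus u^{126} r_{2,12} \oplus r_{1,11} r_{2,12} \oplus r_{1,12}$ | $u^{127} \oplus r_{1,12}$ |
| 13 | (S) | $u^{254} \oplus r_{1,12}^2$ | $r_{1,12}^2 \oplus r_{1,13}$ | | | $u^{254} \oplus r_{1,13}$ |

Table 4.1: Computation of $(u^{254} \oplus r_{1,13})$ using repeated squaring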
Lemma 1 Let $u \in \mathbb{F}_{256}$ be arbitrary. Let $r \in \mathbb{F}_{256}$ be uniformly distributed and independent
of $u$. Then $Z = u \oplus r$ is uniformly distributed.
Lemma 2 Let $u, u' \in \mathbb{F}_{256}$ and $r, r' \in \mathbb{F}_{256}$ be independent and uniformly distributed. Set
$I_1 = u \oplus r$ and $I_2 = u' \oplus r'$. Then the product $Z = I_1 \cdot I_2$ is distributed according to
\[
\Pr(Z = b) =
\begin{cases}
(2^9 - 1)/2^{16}, & \text{if } b = 0 \\
(2^8 - 1)/2^{16}, & \text{if } b \neq 0
\end{cases}
\]
We call this distribution $D_0$.
Lemma 3 In any finite field of characteristic 2, squaring is a one-to-one mapping.
The proofs of these lemmata are straightforward.
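For $\mathbb{F}_{256}$ the three lemmata can also be confirmed by exhaustive enumeration. The Python sketch below counts products over all $2^{16}$ pairs of field elements (which corresponds to two independent, uniformly distributed factors) and checks the XOR and squaring maps; the function names are illustrative assumptions.

```python
from collections import Counter

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def xor_mask_is_uniform(u):
    """Lemma 1: r -> u XOR r is a permutation of the 256 field elements."""
    return sorted(u ^ r for r in range(256)) == list(range(256))

def product_distribution():
    """Lemma 2: count how often each product value occurs over all pairs."""
    counts = Counter()
    for a in range(256):
        for b in range(256):
            counts[gf_mul(a, b)] += 1
    return counts

def squaring_is_bijective():
    """Lemma 3 for GF(2^8): x -> x^2 hits every element exactly once."""
    return len({gf_mul(x, x) for x in range(256)}) == 256
```

Dividing the counts returned by `product_distribution()` by $2^{16}$ gives the probabilities of the distribution $D_0$.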
In the sequel, we examine each of the intermediate results that occur in the PMS (Algorithm
7) and in the PMM (Algorithm 8). We show that the distribution of each of these
intermediate results is independent of the secret value $u$.
Analysis of $f_i$ We have to look at the intermediate result $f_i$ in the two cases of squaring
and multiplication.
Squaring: The computation is $f_i \leftarrow t_{i-1}^2 = u^{2e} \oplus r_{1,i-1}^2$ for some $2 \le e \le 254$. Since
$r_{1,i-1}$ is chosen uniformly at random, Lemma 3 together with Lemma 1 shows that $f_i$
is uniformly distributed for all $u$.
Multiplication: The variable is computed as
$f_i \leftarrow (u^e \oplus r_{1,i-1}) \cdot (u \oplus r_{2,i}) = u^{e+1} \oplus u^e r_{2,i} \oplus u r_{1,i-1} \oplus r_{1,i-1} r_{2,i}$.
Here the terms $u^e \oplus r_{1,i-1}$ and $u \oplus r_{2,i}$ are independent (because of
the independence of $r_{1,i-1}$ and $r_{2,i}$) and uniformly distributed (see Lemma 1). So by
Lemma 2, $f_i$ is distributed according to $D_0$ for all $u$.
Analysis of $s_{1,i}, s_{2,i}$ We examine the intermediate results $s_{1,i}, s_{2,i}$ for multiplication and
squaring.
Squaring: Here $s_{1,i}$ can be neglected since it does not depend on $u$.
Multiplication: The variable $s_{1,i}$ is calculated by adding or multiplying independent masks
on the term $(u \oplus r_{2,i})$ leading to the term $u r_{1,i-1} \oplus r_{1,i}$. So $s_{1,i}$ is obviously uniformly
distributed. The variable $s_{2,i} \leftarrow (u^e \oplus r_{1,i-1}) \cdot r_{2,i}$ is the product of two independent
and uniformly distributed variables that are both independent of $u$. So the variable $s_{2,i}$
is distributed according to $D_0$ independent of the value of $u$.
Analysis of $t_{1,i}, t_i$ All these intermediate results are sums of some part depending on $u$
and an independent additive mask. So all of them are uniformly distributed by Lemma 1.
Hence corresponding intermediate results are always identically distributed and independent
of the value of $u$. This implies that the whole computation is perfectly masked.
4.3.4 Simplified Version
Previously we assumed that for each step we generate new random masks. In the sequel,
we show how to improve the method described above in terms of the number of random
masks needed to achieve a perfectly masked exponentiation. The fact that an adversary only
obtains a single intermediate result allows us to reuse random masks in different steps of the
algorithm.
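Algorithm 9 Simplified Perfectly Masked Squaring (s-PMS)
Input: $x = u^e \oplus r_1$, $r_1 \in \mathbb{F}_{256}$
Output: $u^{2e} \oplus r_1 \in \mathbb{F}_{256}$
1: $f_i \leftarrow x^2$ {$f_i = u^{2e} \oplus r_1^2$}
2: $s_{1,i} \leftarrow r_1^2 \oplus r_1$ {auxiliary term to correct $f_i$}
3: $t_i \leftarrow f_i \oplus s_{1,i}$ {$t_i = u^{2e} \oplus r_1$}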
We call the improved version of the squaring and multiplication algorithm simplified Per-
fectly Masked Squaring (s-PMS) (Algorithm 9) and simplified Perfectly Masked Multiplication
(s-PMM) (Algorithm 10), respectively.
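Algorithm 10 Simplified Perfectly Masked Multiplication (s-PMM)
Input: $x = u^e \oplus r_1$, $x' = u \oplus r_2$, $r_1, r_2, r_3 \in \mathbb{F}_{256}$
Output: $u^{e+1} \oplus r_1 \in \mathbb{F}_{256}$
1: $f_i \leftarrow x \cdot x'$ {$f_i = u^{e+1} \oplus u^e \cdot r_2 \oplus u \cdot r_1 \oplus r_1 \cdot r_2$}
2: $t_1 \leftarrow r_1 \cdot r_2 \oplus r_3$ {$t_1 = r_1 \cdot r_2 \oplus r_3$}
3: $f'_i \leftarrow f_i \oplus t_1$ {$f'_i = u^{e+1} \oplus u^e \cdot r_2 \oplus u \cdot r_1 \oplus r_3$}
4: $s_{1,i} \leftarrow x \cdot r_2$ {$s_{1,i} = u^e \cdot r_2 \oplus r_1 \cdot r_2$}
5: $s_{2,i} \leftarrow x' \cdot r_1$ {$s_{2,i} = u \cdot r_1 \oplus r_1 \cdot r_2$}
6: $t_{1,i} \leftarrow f'_i \oplus s_{1,i}$ {$t_{1,i} = u^{e+1} \oplus u \cdot r_1 \oplus r_1 \cdot r_2 \oplus r_3$}
7: $t_{2,i} \leftarrow t_{1,i} \oplus s_{2,i}$ {$t_{2,i} = u^{e+1} \oplus r_3$}
8: $t_{3,i} \leftarrow t_{2,i} \oplus r_3 \oplus r_1$ {$t_{3,i} = u^{e+1} \oplus r_1$}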
Thus, we can reduce the number of random masks needed to only three masks (r1, r2, r3).
To achieve this we modify our computations such that after each step we switch back to our
original mask. This can be done by simply adding our original mask and then adding our
temporarily used mask. Because of the independence of the masks this has no impact on the
security. Table 4.2 lists all intermediate results that occur during the computation of $x^{254}$
using the simplified method.
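The simplified variant with only three masks can be sketched in Python as follows. The sketch follows Algorithms 9 and 10 step by step; in the last step of the multiplication it adds the original mask $r_1$ before removing the temporary mask $r_3$, as described in the paragraph above. The function names and the derivation of $x'$ from the masked input are assumptions of this sketch.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def s_pms(x, r1):
    """Algorithm 9: (u^e XOR r1) -> (u^(2e) XOR r1)."""
    f = gf_mul(x, x)
    s1 = gf_mul(r1, r1) ^ r1
    return f ^ s1

def s_pmm(x, x_prime, r1, r2, r3):
    """Algorithm 10: x = u^e XOR r1, x' = u XOR r2 -> u^(e+1) XOR r1."""
    f = gf_mul(x, x_prime)
    t1 = gf_mul(r1, r2) ^ r3
    f_corr = f ^ t1
    s1 = gf_mul(x, r2)
    s2 = gf_mul(x_prime, r1)
    t1_i = f_corr ^ s1
    t2_i = t1_i ^ s2              # u^(e+1) XOR r3
    return (t2_i ^ r1) ^ r3       # add r1 first, then remove r3

def simplified_masked_inverse(u_masked, r1, r2, r3):
    """Return INV(u) XOR r1 for the input u XOR r1, reusing r1, r2, r3."""
    x_prime = (u_masked ^ r2) ^ r1    # u XOR r2, never unmasked in between
    t = u_masked
    for step in range(1, 14):
        if step % 2 == 1:
            t = s_pms(t, r1)
        else:
            t = s_pmm(t, x_prime, r1, r2, r3)
    return t
```

Compared with the version that draws fresh masks in every step, this variant needs only the three random bytes passed in by the caller.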
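| $i$ | Op | $f_i$ | $s_{1,i}$ | $s_{2,i}$ | $t_{1,i}$ | $t_{2,i}$ | $t_{3,i}$ | $t_i$ |
|---|---|---|---|---|---|---|---|---|
| 1 | (S) | $u^2 \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^2 \oplus r_1$ |
| 2 | (M) | $(u^2 \oplus r_1)(u \oplus r_2)$ | $u r_1 \oplus r_3$ | $u^2 r_2 \oplus r_1 r_2$ | $u^3 \oplus u^2 r_2 \oplus r_1 r_2 \oplus r_3$ | $u^3 \oplus r_3$ | $u^3 \oplus r_3 \oplus r_1$ | $u^3 \oplus r_1$ |
| 3 | (S) | $u^6 \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^6 \oplus r_1$ |
| 4 | (M) | $(u^6 \oplus r_1)(u \oplus r_2)$ | $u r_1 \oplus r_3$ | $u^6 r_2 \oplus r_1 r_2$ | $u^7 \oplus u^6 r_2 \oplus r_1 r_2 \oplus r_3$ | $u^7 \oplus r_3$ | $u^7 \oplus r_3 \oplus r_1$ | $u^7 \oplus r_1$ |
| 5 | (S) | $u^{14} \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^{14} \oplus r_1$ |
| 6 | (M) | $(u^{14} \oplus r_1)(u \oplus r_2)$ | $u r_1 \oplus r_3$ | $u^{14} r_2 \oplus r_1 r_2$ | $u^{15} \oplus u^{14} r_2 \oplus r_1 r_2 \oplus r_3$ | $u^{15} \oplus r_3$ | $u^{15} \oplus r_3 \oplus r_1$ | $u^{15} \oplus r_1$ |
| 7 | (S) | $u^{30} \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^{30} \oplus r_1$ |
| 8 | (M) | $(u^{30} \oplus r_1)(u \oplus r_2)$ | $u r_1 \oplus r_3$ | $u^{30} r_2 \oplus r_1 r_2$ | $u^{31} \oplus u^{30} r_2 \oplus r_1 r_2 \oplus r_3$ | $u^{31} \oplus r_3$ | $u^{31} \oplus r_3 \oplus r_1$ | $u^{31} \oplus r_1$ |
| 9 | (S) | $u^{62} \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^{62} \oplus r_1$ |
| 10 | (M) | $(u^{62} \oplus r_1)(u \oplus r_2)$ | $u r_1 \oplus r_3$ | $u^{62} r_2 \oplus r_1 r_2$ | $u^{63} \oplus u^{62} r_2 \oplus r_1 r_2 \oplus r_3$ | $u^{63} \oplus r_3$ | $u^{63} \oplus r_3 \oplus r_1$ | $u^{63} \oplus r_1$ |
| 11 | (S) | $u^{126} \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^{126} \oplus r_1$ |
| 12 | (M) | $(u^{126} \oplus r_1)(u \oplus r_2)$ | $u r_1 \oplus r_3$ | $u^{126} r_2 \oplus r_1 r_2$ | $u^{127} \oplus u^{126} r_2 \oplus r_1 r_2 \oplus r_3$ | $u^{127} \oplus r_3$ | $u^{127} \oplus r_3 \oplus r_1$ | $u^{127} \oplus r_1$ |
| 13 | (S) | $u^{254} \oplus r_1^2$ | $r_1^2 \oplus r_1$ | | | | | $u^{254} \oplus r_1$ |

Table 4.2: Computation of $(u^{254} \oplus r_1)$ using repeated squaring (simplified version)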
4.4 Implementation and Costs
Throughout the chapter, we have only considered a theoretical implementation of the in-
version algorithm according to the square-and-multiply algorithm. However, our method is
compatible with any implementation that combines additions, multiplications, and squarings
in a field or ring. More precisely, an arbitrary straight-line program over some finite field
using only additions and multiplications can be transformed to an equivalent program that
is perfectly masked. We do not consider software implementations of the presented coun-
termeasures. However, we notice that for constrained environments previous publications
have based their software implementations of side channel countermeasures on table lookups.
From a hardware point of view, the most area efficient ASIC hardware implementation is the
one described in (Satoh, Morioka, Takano and Munetoh 2001) based on composite fields. We
will discuss a possible implementation of our countermeasure based on composite fields and
will provide area and delay estimates in the next section.
4.4.1 Efficient Hardware Implementation over $GF(((2^2)^2)^2)$
First we describe in some detail how to implement an inverter over $GF(((2^2)^2)^2)$, so that it
is clear how we obtained our area and delay estimates. This methodology is not new and it is
well known in the literature, e.g., see (Lidl and Niederreiter 1983). We assume a composite
field representation $GF(((2^2)^2)^2) \cong \mathbb{F}_{256}$ for the inverse transformation using the following
irreducible polynomials:
\[
\begin{aligned}
GF(2^2) &: \; P(x) = x^2 + x + 1, & P(\alpha) &= 0 \\
GF((2^2)^2) &: \; Q(y) = y^2 + y + \alpha, & Q(\beta) &= 0 \\
GF(((2^2)^2)^2) &: \; R(z) = z^2 + z + \lambda, & \lambda &= (\alpha + 1)\beta
\end{aligned}
\]
We use the s-PMM and s-PMS algorithms from Section 4.3 instead of the usual ones to build
our inversion circuit and, thus, render it secure against side channel attacks. Based on (Itoh
and Tsujii 1988) and (Guajardo and Paar 2002), (Satoh et al. 2001) notice that for $A \in
GF(((2^2)^2)^2)$, $A^{-1}$ can be computed as $A^{-1} = (A^{17})^{-1} A^{16}$, where $A^{17} \in GF((2^2)^2)$. See for
example (Lidl and Niederreiter 1983) for the proof. Notice that the Itoh and Tsujii algorithm
can be recursively applied to $B = A^{17} \in GF((2^2)^2)$, thus obtaining $B^{-1} = (B^4 \cdot B)^{-1} \cdot (B^4)$
where $B^5 \in GF(2^2)$. In the following, we write $B = B_1 \beta + B_0 \in GF((2^2)^2)$ with $B_i \in GF(2^2)$.
Then, we can minimize the area requirement of the implementation using the following facts:
1. $B^4 \in GF((2^2)^2)$ can be computed as $B^4 \equiv B_1 \beta + (B_1 + B_0)$, i.e., only one addition over
$GF(2^2)$.
2. $B^5 \in GF(2^2)$ can be computed as $B^5 \equiv B_0 \cdot B_1 + B_0^2 + B_1^2 \cdot \alpha$, where $B_1^2 \cdot \alpha$ requires
only wires for its implementation (no gates).
3. Given $C = c_1 \alpha + c_0 \in GF(2^2)$, $C^{-1} \equiv c_1 \alpha + (c_1 + c_0)$, i.e., it requires one $GF(2)$ adder.
Thus, computing $B^{-1} = B^{-5} \cdot B^4 \in GF((2^2)^2)$ requires 3 $GF(2^2)$ multipliers, 1 $GF(2^2)$
squarer, and 4 $GF(2^2)$ adders. The inversion in $GF(((2^2)^2)^2)$ can then be implemented
according to (Satoh et al. 2001) with 2 adders, 3 multipliers, 1 inverter, and 1 squarer
followed by multiplication with $\lambda = (\alpha + 1)\beta$, all over $GF((2^2)^2)$.
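The three facts about $GF(2^2)$ and $GF((2^2)^2)$ can be checked exhaustively in software. The Python sketch below encodes elements of $GF(2^2)$ as 2-bit integers $c_1 \alpha + c_0$ and elements of $GF((2^2)^2)$ as pairs $(B_1, B_0)$, using $P$ and $Q$ as given above; the encoding and the function names are assumptions of this sketch and say nothing about the actual gate-level circuit.

```python
ALPHA = 0b10  # alpha in GF(2^2); elements are 2-bit integers c1*alpha + c0

def gf4_mul(a, b):
    """Multiply in GF(2^2) = GF(2)[x]/(x^2 + x + 1)."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)   # uses alpha^2 = alpha + 1
    lo = (a0 & b0) ^ (a1 & b1)
    return (hi << 1) | lo

def gf4_inv(c):
    """Fact 3: C^{-1} = c1*alpha + (c1 + c0); maps 0 to 0."""
    c1, c0 = c >> 1, c & 1
    return (c1 << 1) | (c1 ^ c0)

def gf16_mul(a, b):
    """Multiply pairs (X1, X0) = X1*beta + X0 with beta^2 = beta + alpha."""
    a1, a0 = a
    b1, b0 = b
    p = gf4_mul(a1, b1)
    hi = p ^ gf4_mul(a1, b0) ^ gf4_mul(a0, b1)
    lo = gf4_mul(a0, b0) ^ gf4_mul(p, ALPHA)
    return (hi, lo)

def gf16_pow(a, e):
    """Naive exponentiation, used only to cross-check the shortcuts."""
    result = (0, 1)
    for _ in range(e):
        result = gf16_mul(result, a)
    return result

def gf16_inv(b):
    """B^{-1} = (B^5)^{-1} * B^4, built from facts 1 to 3."""
    b1, b0 = b
    b4 = (b1, b1 ^ b0)                                   # fact 1
    b5 = (gf4_mul(b0, b1) ^ gf4_mul(b0, b0)
          ^ gf4_mul(gf4_mul(b1, b1), ALPHA))             # fact 2
    c = gf4_inv(b5)                                      # fact 3
    return (gf4_mul(b4[0], c), gf4_mul(b4[1], c))
```

The same pattern, with $A^{16}$ and $A^{17}$ in place of $B^4$ and $B^5$, gives the inversion in the top-level field.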
The hardware implementation of the perfectly masked version can be implemented sim-
ilarly except that instead of using the usual adders, multipliers, squarers, and inverters, we
use circuits which implement the algorithms from Section 4.3 (page 33).
4.4.2 Cost and Comparison to Previous Countermeasures
Area and delay estimates for circuits with and without countermeasures are provided in Table
4.3. The estimates are given in terms of the area and delay of 2-input AND gates, 2-input
XOR gates, and NOT gates. The complexity and specific implementation of these circuits
is taken from (Voigtländer 2003). In addition, we provide complexity estimates in terms of
normalized area and delay. The normalization is done with respect to the area and delay
of a NOT gate. We have assumed that the areas of a 2-input AND gate and 2-input XOR
gate are twice and 3 times that of an inverter, respectively. Similarly, it is assumed that the
delays of NOT, AND, and XOR gates are equal. Notice that the assumptions regarding the
gates’ area and delay are not arbitrary but based on the actual sizes of several standard cell
libraries.
| Arithmetic Operation | $A$ | $A'$ | $T$ | $T'$ | $A' \cdot T'$ |
|---|---|---|---|---|---|
| Inversion over $GF(((2^2)^2)^2)$ (Satoh et al. 2001) | 312 | 1 | 17 | 1 | 1 |
| Inversion with DPA countermeasure from (Trichina 2003) according to (4.1) | 1071 | 3.4 | 26 | 1.5 | 5.1 |
| $GF(((2^2)^2)^2)$ PM inverter from this thesis (Blömer et al. 2004) | 1704 | 5.5 | 21 | 1.2 | 6.6 |
| Inversion with DPA countermeasure from (Trichina 2003) | 1341 | 4.3 | 34 | 2 | 8.6 |
| Inversion with countermeasure from (Akkar and Giraud 2001) | 1784 | 5.7 | 34 | 2 | 11.4 |

Table 4.3: Hardware cost comparison of area $A$ and delay $T$ for different inversion circuits
with side channel countermeasures. $A' := A/A_{\text{Normal Inv.}}$ and $T' := T/T_{\text{Normal Inv.}}$ are the
normalized area and delay, respectively.
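The normalized columns of Table 4.3 can be recomputed from the raw area and delay figures. The Python sketch below assumes that $A'$ and $T'$ are rounded to one decimal place and that the last column is the product of the rounded values; this rounding convention is an inference from the printed numbers, not something the text states. The row labels are shortened for readability.

```python
# (label, area A, delay T) taken from Table 4.3; the first row is the baseline.
ROWS = [
    ("Satoh et al. 2001 (no countermeasure)", 312, 17),
    ("Trichina 2003 according to (4.1)", 1071, 26),
    ("PM inverter (this chapter)", 1704, 21),
    ("Trichina 2003", 1341, 34),
    ("Akkar and Giraud 2001", 1784, 34),
]

def normalized(rows):
    """Return (label, A', T', A' * T') with one-decimal rounding."""
    base_area, base_delay = rows[0][1], rows[0][2]
    result = []
    for label, area, delay in rows:
        a_norm = round(area / base_area, 1)
        t_norm = round(delay / base_delay, 1)
        result.append((label, a_norm, t_norm, round(a_norm * t_norm, 1)))
    return result

if __name__ == "__main__":
    for label, a_norm, t_norm, product in normalized(ROWS):
        print(f"{label:40s} A'={a_norm:4.1f} T'={t_norm:3.1f} A'*T'={product:5.1f}")
```

Under this convention the recomputed values agree with the printed table.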
Finally, we point out that (Satoh et al. 2001) which describes AES ASIC implementations
over $GF(((2^2)^2)^2)$ does not provide the actual circuits used to implement the AES sbox.
Table 4.3 provides a cost comparison among the different masking countermeasures. We
did not consider the method from (Golić and Tymen 2002) briefly sketched in Section 4.2
(page 32) because its hardware implementation requires too many hardware resources. We
can estimate the cost of (Golić and Tymen 2002) if the degree of the polynomial $q$ is $n = 8$
by simply considering the cost of a multiplier and an inverter over $\mathbb{F}_2[x]/(mq) \cong \mathbb{F}_{256} \times \mathbb{F}_{2^n}$.
According to (Drolet 1998), such a multiplier requires 289 2-input AND gates and 272 2-input
XOR gates. The map $\mathrm{INV}'(v) = v^{254} \bmod mq$ can also be implemented with a multiplier (a
squarer requires only wires). Thus, we would need at least 1 multiplier and 1 inverter over
$\mathbb{F}_2[x]/(mq)$ and 3 multipliers and 1 inverter over $\mathbb{F}_{256}$. This results in a circuit which requires
at least 731 AND and 766 XOR gates or about twice as many gates as our method.
Table 4.3 shows that the countermeasure of (Trichina 2003) implemented according to
Equation (4.1) on page 33 has the best area/time product of all the implementations. How-
ever, as we have seen in Section 4.2, this countermeasure is susceptible to DPA attacks if
the input byte is zero and, thus, does not provide perfect masking. If we then consider the
best area/time product of the countermeasures that offer DPA resistance, the implementa-
tion presented in this chapter has the best area/time product. This result is mainly due
to the reduced critical path in the circuit. In addition, our design encourages re-usability
of previously designed blocks. In other words, since the masking method depends only on
multipliers and adders, if one has multiplier and adder blocks already designed, they can be
used immediately to build a perfectly masked circuit (with the work from (Trichina 2003),
implementation of the masking countermeasure would require a complete circuit redesign).
Finally, we estimate the cost that our masking countermeasure would have on an AES
hardware implementation. To do this, we assume that the implementation would follow the
architecture described in (Satoh et al. 2001) where the SubBytes transformation occupies
about 22% of the design with 4 sboxes in parallel. In SubBytes, the inverse transformation
accounts for 60% or about 14% of the total area. We also assume that the remaining cir-
cuits require twice as much area as an implementation without masking countermeasures.
Then, our new inversion circuit would need about 2.5 times the area that an AES hardware
implementation without countermeasures would need. Of this 31% would correspond to the
inverter circuit. The required area is only 20% larger than an implementation that uses hard-
ware countermeasures based on the usage of different hardware logic. Such methods double
the hardware resources when compared to an implementation using standard (single-rail)
logic.
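The figures in this estimate can be retraced with a short calculation. This is a reconstruction under stated assumptions rather than a derivation given in the text: it takes the inverter share as 14% of the unmasked design, the factor 2 for the remaining circuits, and the normalized area $A' = 5.5$ of the perfectly masked inverter from Table 4.3, with the unmasked design normalized to area 1.

```latex
\[
\underbrace{2 \cdot (1 - 0.14)}_{\text{remaining circuits}}
+ \underbrace{5.5 \cdot 0.14}_{\text{masked inverters}}
= 1.72 + 0.77 = 2.49 \approx 2.5,
\qquad
\frac{0.77}{2.49} \approx 0.31 .
\]
```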
In addition to time and area, other costs are also of importance. For example, the amount
of randomness is rather crucial since its generation is quite expensive. In our simplified
algorithm we only need 3 random masks in order to compute INV(x) in a secure manner.
Another important cost factor is the number of operations that have to be protected by
hardware means. Our approach needs this inevitable protection only for one intermediate
result. Hence it is optimal with respect to this cost measure.
4.5 Order-$d$ Perfectly Masking
For the sake of completeness, in this section we focus on generalizing the method of Section
4.3 to adversaries of arbitrary but fixed order $d$. However, adversaries that can obtain values
of two or more intermediate results are very powerful and assumed to be not realistic right
now. Moreover, for increasing $d$ an increasing amount of random bits is needed to achieve
such a high level of security. This, however, decreases efficiency considerably. In particular,
instead of using a single random byte $r$ as a mask one has to use masks of the form
\[
R = \sum_{i=1}^{d} r_i
\]
that are the sum of $d$ independent and uniformly distributed random bytes. Hence, our
invariant in the order-$d$ case is that every output of an operation is of the form
\[
u \oplus R = u \oplus \sum_{i=1}^{d} r_i \tag{4.3}
\]
for $d$ independent and uniformly distributed random bytes $r_i$.
4.5.1 Perfect Mask Change
However, simply substituting the mask $r$ by mask $R$ in the method described above is not
sufficient. To see this, consider the problem of changing masks of intermediate results in order
to introduce new randomness into the encryption. Note that changing masks is implicitly
done in the Perfectly Masked Squaring Algorithm (Algorithm 7 (page 34)) and the Perfectly
Masked Multiplication Algorithm (Algorithm 8 (page 35)). To change the mask $R = R^{(1)}$ of an
intermediate result $Z_1 := u \oplus R$ into the mask $R' = R^{(2)}$ the straightforward approach is to compute
1. $Z_1 := u \oplus R^{(1)}$
2. $Z_2 := Z_1 \oplus R^{(2)} = u \oplus R^{(1)} \oplus R^{(2)}$
3. $Z_3 := Z_2 \oplus R^{(1)} = u \oplus R^{(2)}$
However, for $d \ge 3$ an order-$d$ adversary $A$ can get the values of $Z_1$, $Z_2$ and $Z_3$. Hence, $A$
can compute the unmasked value $u = Z_1 \oplus Z_2 \oplus Z_3$.
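Written out, the cancellation that this attack exploits uses only the three values defined in the steps above:

```latex
\[
Z_1 \oplus Z_2 \oplus Z_3
= \bigl(u \oplus R^{(1)}\bigr) \oplus \bigl(u \oplus R^{(1)} \oplus R^{(2)}\bigr) \oplus \bigl(u \oplus R^{(2)}\bigr)
= u \oplus u \oplus u
= u .
\]
```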
In order to securely change the mask $R^{(1)} = \sum_{i=1}^{d} r^{(1)}_i$ of an intermediate result $u \oplus R^{(1)}$
to a different mask $R^{(2)} = \sum_{i=1}^{d} r^{(2)}_i$ we propose to use Algorithm 11.
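Algorithm 11 Perfect Mask Change (PMC)
Input: $Z_1 = u^e \oplus R^{(1)}$ for some $1 \le e \le 254$, $d \in \mathbb{N}$, $\underbrace{r_1, \dots, r_d}_{R^{(1)}}$, $\underbrace{r_{d+1}, \dots, r_{2d}}_{R^{(2)}}$
Output: $Z_{2d+1} = u^e \oplus R^{(2)}$
1: for $i = 2, 4, \dots, 2d$ do
2: $Z_i \leftarrow Z_{i-1} \oplus r_{d+i/2}$ {add $(i/2)$-th masking byte of $R^{(2)}$}
3: $Z_{i+1} \leftarrow Z_i \oplus r_{i/2}$ {remove $(i/2)$-th masking byte of $R^{(1)}$}
4: end for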
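Example 1 For $d = 3$ Algorithm 11 computes the following intermediate results:
\[
\begin{aligned}
Z_1 &= u^e \oplus r^{(1)}_1 \oplus r^{(1)}_2 \oplus r^{(1)}_3 \\
Z_2 &= u^e \oplus r^{(1)}_1 \oplus r^{(1)}_2 \oplus r^{(1)}_3 \oplus r^{(2)}_1 \\
Z_3 &= u^e \oplus r^{(1)}_2 \oplus r^{(1)}_3 \oplus r^{(2)}_1 \\
Z_4 &= u^e \oplus r^{(1)}_2 \oplus r^{(1)}_3 \oplus r^{(2)}_1 \oplus r^{(2)}_2 \\
Z_5 &= u^e \oplus r^{(1)}_3 \oplus r^{(2)}_1 \oplus r^{(2)}_2 \\
Z_6 &= u^e \oplus r^{(1)}_3 \oplus r^{(2)}_1 \oplus r^{(2)}_2 \oplus r^{(2)}_3 \\
Z_7 &= u^e \oplus r^{(2)}_1 \oplus r^{(2)}_2 \oplus r^{(2)}_3
\end{aligned}
\]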
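The interleaving shown in Example 1 can be sketched in Python and, for a toy setting, checked exhaustively. The sketch works on single bits instead of bytes so that all mask combinations can be enumerated; it treats $Z_1, \dots, Z_{2d+1}$ and the $2d$ mask bits as the observable values and tests whether every choice of $d$ of them has a joint distribution that does not depend on $u$. The function names and the 1-bit simplification are assumptions of this sketch; it illustrates the claim for small $d$ and is not a substitute for the proof.

```python
from collections import Counter
from itertools import combinations, product

def pmc_trace(u, masks):
    """Return [Z_1, ..., Z_{2d+1}] for masks = (r_1, ..., r_2d).

    r_1..r_d form the old mask R^(1), r_{d+1}..r_{2d} the new mask R^(2).
    """
    d = len(masks) // 2
    z = u
    for r in masks[:d]:
        z ^= r                    # build the input Z_1 = u XOR R^(1)
    trace = [z]
    for k in range(d):
        z ^= masks[d + k]         # add (k+1)-th part of R^(2)
        trace.append(z)
        z ^= masks[k]             # remove (k+1)-th part of R^(1)
        trace.append(z)
    return trace

def is_order_d_masked(d):
    """Check all d-subsets of observables (Z values and masks) for u in {0, 1}."""
    runs = {u: [tuple(pmc_trace(u, m)) + m for m in product((0, 1), repeat=2 * d)]
            for u in (0, 1)}
    n_obs = (2 * d + 1) + 2 * d
    for subset in combinations(range(n_obs), d):
        dist0 = Counter(tuple(run[i] for i in subset) for run in runs[0])
        dist1 = Counter(tuple(run[i] for i in subset) for run in runs[1])
        if dist0 != dist1:
            return False
    return True
```

For example, `is_order_d_masked(3)` enumerates all 64 mask combinations and all 3-element subsets of the 13 observable values.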
Security Analysis
We first introduce our notation that we use for the proof of security. Let $d$ be the number of
intermediate results an adversary can get. For $1 \le i \le 2d + 1$ let
\[
\delta_i = i + 1 \bmod 2
\]
indicate whether the intermediate result Ziis randomized by dor d+ 1 masks. Let
Si=i
2,...,i
2+d+δi
denote the set of indices of masks involved in the randomization of Zi. I.e.,
Zi=ue⊕X
j∈Si
rj.
Furthermore, let 1 ≤ℓ≤dand
I:= {i1,...,iℓ|i1< i2<···< iℓ}
be the set of indices of intermediate results known to the attacker and let
M:= {j1,...,jd−ℓ|j1< j2<···< jd−ℓ}
be the set of indices of masking bytes known to the attacker.
For i∈Ilet Si=Si\Mdenote the set of masks unknown to the attacker that randomize
the intermediate result Ziand let
Zi:= Zi⊕X
j∈M∩Si
rj=ue⊕X
j∈Si
rj
denote a known intermediate result after removing all known masks. Note that |Si| ≥ 1 for all
i∈Iholds by construction. Hence, all Ziare uniformly distributed by Lemma 1 (page 36).
Furthermore, depending on the set of known masks it is possible that Zi=Zjfor Zi6=Zj.
Lemma 4 Let Zi,Ziand rjbe defined as above. Then
Pr(Zi1,...,Ziℓ|rj1,...,rjd−ℓ) = Pr(Zi1, . . . , Ziℓ).
Proof. Let ζi1,...,ζiℓ∈F256 and ρj1,...,ρjd−ℓ∈F256.
Pr `Zi1=ζi1,...,Ziℓ=ζiℓ|rj1=ρj1,...,rjd−ℓ=ρjd−ℓ´
= Pr 0
@0
@Zi1⊕X
j∈Si1∩M
rj=ζi1
1
A,...,0
@Ziℓ⊕X
j∈Siℓ∩M
rj=ζiℓ
1
A
˛
˛
˛
˛
˛
˛
(rj1=ρj1),...,`rjd−ℓ=ρjd−ℓ´1
A
= Pr 0
@0
@Zi1=ζi1⊕X
j∈Si1∩M
ρj1
A,...,0
@Ziℓ=ζiℓ⊕X
j∈Siℓ∩M
ρj1
A
˛
˛
˛
˛
˛
˛
(rj1=ρj1),...,`rjd−ℓ=ρjd−ℓ´1
A
=
Pr ““Zi1=ζi1⊕Pj∈Si1∩Mρj”,...,“Ziℓ=ζiℓ⊕Pj∈Siℓ∩Mρj”,(rj1=ρj1),...,`rjd−ℓ=ρjd−ℓ´”
Pr `(rj1=ρj1),...,`rjd−ℓ=ρjd−ℓ´´
44 Chapter 4. Provably Secure Randomization of Cryptographic Algorithms
Since |Si|≥ 1 for all i∈Ithe variables Zi1, . . . , Ziℓand rj,...,rjd−ℓare stochastically
independent. Hence, we have that
Pr ““Zi1=ζi1⊕Pj∈Si1∩Mρj”,...,“Ziℓ=ζiℓ⊕Pj∈Siℓ∩Mρj”,(rj1=ρj1),...,`rjd−ℓ=ρjd−ℓ´”
Pr `(rj1=ρj1),...,`rjd−ℓ=ρjd−ℓ´´
=
Pr ““Zi1=ζi1⊕Pj∈Si1∩Mρj”,...,“Ziℓ=ζiℓ⊕Pj∈Siℓ∩Mρj””·Pr `(rj1=ρj1),...,(rjd−ℓ=ρjd−ℓ)´
Pr `(rj1=ρj1),...,(rjd−ℓ=ρjd−ℓ)´
= Pr 0
@0
@Zi1=ζi1⊕X
j∈Si1∩M
ρj1
A,...,0
@Ziℓ=ζiℓ⊕X
j∈Siℓ∩M
ρj1
A1
A
Since all Zifor 1 ≤i≤ℓare uniformly distributed, this proves the lemma. ⊓⊔
To prove the security of Algorithm 11 we also need the following lemma.
Lemma 5 For some 1≤b≤ℓlet
I=[
1≤c≤b
Ic
be a partition of the set Iinto subsets I1,...,Ibsuch that
Zi=Zj⇔ ∃ 1≤c≤b:i, j ∈Ic.
I.e., the indices i, j of two elements Ziare in the same subset Iciff Zi=Zj.
Then
Pr ^
i∈I
Zi!=Y
1≤c≤b
Pr ^
i∈Ic
Zi!.
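Proof. For $i \in I_c$ let $T_c$ denote the set of indices of masks that randomize $\tilde{Z}_i$. The
construction of the intermediate results of Algorithm 11 implies that for each $1 \le c < b$
$$\min\{j \mid j \in T_c\} < \min\Bigl\{j \Bigm| j \in \bigcup_{c < c' \le b} T_{c'}\Bigr\}$$
or
$$\max\{j \mid j \in T_c\} < \max\Bigl\{j \Bigm| j \in \bigcap_{c < c' \le b} T_{c'}\Bigr\}$$
holds. Hence, for each $1 \le c < b$ at least one of the following cases holds:
1. If
$$T_c \setminus \bigcup_{c+1 \le j \le b} T_j \neq \emptyset$$
it follows that all elements of $\{\tilde{Z}_i \mid i \in I_c\}$ are randomized by a uniformly distributed
mask that is not involved in randomizing elements of $\{\tilde{Z}_i \mid i \in \bigcup_{c+1 \le j \le b} I_j\}$. Hence, it
follows that
$$\Pr\Bigl(\bigwedge_{i \in \bigcup_{c \le j \le b} I_j} \tilde{Z}_i\Bigr) = \Pr\Bigl(\bigwedge_{i \in I_c} \tilde{Z}_i\Bigr) \cdot \Pr\Bigl(\bigwedge_{i \in \bigcup_{c+1 \le j \le b} I_j} \tilde{Z}_i\Bigr).$$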
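2. If
$$\bigcap_{c+1 \le j \le b} T_j \setminus T_c \neq \emptyset$$
it follows that all elements of $\{\tilde{Z}_i \mid i \in \bigcup_{c+1 \le j \le b} I_j\}$ are randomized by a uniformly
distributed mask that is not involved in randomizing elements of $\{\tilde{Z}_i \mid i \in I_c\}$. Hence, it
follows that
$$\Pr\Bigl(\bigwedge_{i \in \bigcup_{c \le j \le b} I_j} \tilde{Z}_i\Bigr) = \Pr\Bigl(\bigwedge_{i \in I_c} \tilde{Z}_i\Bigr) \cdot \Pr\Bigl(\bigwedge_{i \in \bigcup_{c+1 \le j \le b} I_j} \tilde{Z}_i\Bigr).$$
Applying this case differentiation inductively proves the lemma. □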
Lemma 4 shows that instead of analyzing the joint distribution of $\ell$ intermediate results
$Z_{i_1},\dots,Z_{i_\ell}$ together with $d-\ell$ masks $r_{j_1},\dots,r_{j_{d-\ell}}$ it is sufficient to analyze the joint
distribution of the $\ell$ variables $\tilde{Z}_{i_1},\dots,\tilde{Z}_{i_\ell}$ as defined above. Lemma 5 shows that the joint
distribution of $\tilde{Z}_{i_1},\dots,\tilde{Z}_{i_\ell}$ is in fact independent of the secret variable $u$. Hence, an adversary
that knows at most $d$ intermediate results and masks of Algorithm 11 does not learn
anything about the secret value $u$. Therefore, Lemma 4 together with Lemma 5 proves that
Algorithm 11 is order-$d$ perfectly masked.
Generalized Mask Changing
We can generalize the secure changing of masks to an intermediate result
$$u \oplus \sum_{i=1}^{l} R^{(i)} = u \oplus \sum_{i=1}^{l} \sum_{j=1}^{d} r^{(i)}_j$$
that is masked with $l$ masks $R^{(1)},\dots,R^{(l)}$, each consisting of the sum of $d$ random bytes.
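Algorithm 12 Perfect Multiple Mask Change (PMMC)
Input: $Z^{(1)} = u^e \oplus R^{(1)} \oplus \dots \oplus R^{(l)}$, $d, l \in \mathbb{N}$,
  $\underbrace{r^{(1)}_1,\dots,r^{(1)}_d}_{R^{(1)}}, \underbrace{r^{(2)}_1,\dots,r^{(2)}_d}_{R^{(2)}}, \dots, \underbrace{r^{(l)}_1,\dots,r^{(l)}_d}_{R^{(l)}}, \underbrace{r^{(l+1)}_1,\dots,r^{(l+1)}_d}_{R^{(l+1)}}$
Output: $Z^{(l)}_d = u^e \oplus R^{(l+1)}$
1: $Z^{(l)}_0 \leftarrow Z^{(1)}$
2: for $i = 1 \dots d$ do
3:   $Z^{(0)}_i \leftarrow Z^{(l)}_{i-1} \oplus r^{(l+1)}_i$   {add fresh mask $r^{(l+1)}_i$}
4:   for $j = 1 \dots l$ do
5:     $Z^{(j)}_i \leftarrow Z^{(j-1)}_i \oplus r^{(j)}_i$   {remove old mask $r^{(j)}_i$}
6:   end for
7: end for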
The security of Algorithm 12 can be shown using similar arguments as in the security
proof of Algorithm 11.
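The mask change of Algorithm 12 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: elements of $\mathbb{F}_{256}$ are modeled as integers 0..255 with XOR as addition, and the names `pmmc`, `random_shares` and `mask_value` as well as the argument layout are hypothetical, chosen for this example.

```python
import secrets
from functools import reduce
from operator import xor


def pmmc(z, old_masks, new_mask):
    """Sketch of Algorithm 12: replace the masks R(1)..R(l) on z by R(l+1).

    old_masks: list of l lists, each holding the d byte shares of one old mask.
    new_mask:  list holding the d byte shares of the fresh mask R(l+1).
    """
    for i in range(len(new_mask)):
        z ^= new_mask[i]          # step 3: add the fresh share first
        for shares in old_masks:  # steps 4-6: then remove the i-th old shares
            z ^= shares[i]
    return z


def random_shares(d):
    """Draw d uniformly random bytes (assumed mask generation)."""
    return [secrets.randbelow(256) for _ in range(d)]


def mask_value(shares):
    """A mask is the XOR sum of its shares."""
    return reduce(xor, shares, 0)
```

The order of operations inside the loop mirrors the algorithm: a fresh share is always added before the corresponding old shares are removed, so the value is never left with fewer masks than before.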
In the sequel, we propose methods for squaring and multiplication in a secure manner.
4.5.2 Squaring
The order-$d$ perfectly masked squaring algorithm is shown in Algorithm 13. The input
$u^e \oplus R^{(1)}$ is squared in Step 1. In the following steps we use the method PMC (Algorithm
11) to change the mask $(R^{(1)})^2$ to $R^{(2)}$. Since squaring in a finite field of characteristic 2 is
a one-to-one mapping, the security of the squaring step relies entirely on the security of the
mask change. We showed above that the algorithm PMC is in fact order-$d$ perfectly masked.
Hence, Algorithm 13 is also order-$d$ perfectly masked.
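Algorithm 13 Order-$d$ Perfectly Masked Squaring ($d$-PMS)
Input: $x = u^e \oplus R^{(1)}$, $\underbrace{r^{(1)}_1,\dots,r^{(1)}_d}_{R^{(1)}}, \underbrace{r^{(2)}_1,\dots,r^{(2)}_d}_{R^{(2)}}$
Output: $u^{2e} \oplus R^{(2)}$
1: $f \leftarrow x^2$   {$f = u^{2e} \oplus (R^{(1)})^2$}
2: $t \leftarrow \mathrm{PMC}(f, (r^{(1)}_1)^2,\dots,(r^{(1)}_d)^2, r^{(2)}_1,\dots,r^{(2)}_d)$   {$t = u^{2e} \oplus R^{(2)}$}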
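The following Python sketch illustrates the squaring step on bytes. It rests on several assumptions that are not taken from the text: the field is $\mathbb{F}_{256}$ with the AES reduction polynomial, `pmc` is a stand-in for Algorithm 11 modeled as the single-mask case of the mask change above, and all function names are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def pmc(z, old_shares, new_shares):
    """Assumed stand-in for Algorithm 11: add a fresh share, then remove an old one."""
    for r_old, r_new in zip(old_shares, new_shares):
        z ^= r_new
        z ^= r_old
    return z


def masked_square(x, r1, r2):
    """Sketch of Algorithm 13 for x = u xor R1; returns u^2 xor R2."""
    f = gf_mul(x, x)                         # step 1: f = u^2 xor (R1)^2
    r1_squared = [gf_mul(r, r) for r in r1]  # (R1)^2 is the XOR of the squared shares
    return pmc(f, r1_squared, r2)            # step 2: change mask (R1)^2 -> R2
```

The sketch relies on the fact that squaring is additive in characteristic 2, so the squared shares of $R^{(1)}$ sum to $(R^{(1)})^2$.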
4.5.3 Multiplication
The order-$d$ perfectly masked multiplication algorithm is presented in Algorithm 14. The
inputs $Z^{(1)} = u^e \oplus R^{(1)}$ and $Z^{(2)} = u^f \oplus R^{(2)}$ are multiplied in Step 1. The first loop (Steps
2-6) eliminates the first disturbing term $u^e \cdot R^{(2)}$, leaving the intermediate result
$$Z^{(1)}_d = u^{e+f} \oplus u^f \cdot R^{(1)} \oplus R^{(3)}.$$
The second disturbing term $u^f \cdot R^{(1)}$ is removed in the second loop (Steps 8-12). In the final
step the result is recomputed to comply with our invariant (4.3) on page 41.
We verified that Algorithm 14 is order-$d$ perfectly masked for $d = 1, 2, 3$. Due to the large
number of different distributions of the intermediate results we are not aware of an efficient
method to prove the security of Algorithm 14 for arbitrary $d > 3$. We believe that Algorithm
14 is also order-$d$ perfectly masked for $d > 3$. However, the security level provided by an
order-3 perfectly masked algorithm goes far beyond the security requirements of practical
applications.
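Algorithm 14 Perfectly Masked Multiplication (PMM)
Input: $Z^{(1)} = u^e \oplus R^{(1)}$, $Z^{(2)} = u^f \oplus R^{(2)}$, $d \in \mathbb{N}$,
  used masks: $r^{(1)}_1,\dots,r^{(1)}_d, r^{(2)}_1,\dots,r^{(2)}_d$,
  new masks: $r^{(3)}_1,\dots,r^{(3)}_d, r^{(4)}_1,\dots,r^{(4)}_d, r^{(5)}_1,\dots,r^{(5)}_d$
Output: $u^{e+f} \oplus R^{(5)}$
1: $Z^{(1)}_0 \leftarrow Z^{(1)} \cdot Z^{(2)}$   {$Z^{(1)}_0 = u^{e+f} \oplus u^e \cdot R^{(2)} \oplus u^f \cdot R^{(1)} \oplus R^{(1)} \cdot R^{(2)}$}
eliminate first disturbing term
2: for $i = 1 \dots d$ do
3:   $s^{(1)}_i \leftarrow Z^{(1)} \cdot r^{(2)}_i$   {$s^{(1)}_i = u^e \cdot r^{(2)}_i \oplus R^{(1)} \cdot r^{(2)}_i$}
4:   $s^{(2)}_i \leftarrow s^{(1)}_i \oplus r^{(3)}_i$   {$s^{(2)}_i = u^e \cdot r^{(2)}_i \oplus R^{(1)} \cdot r^{(2)}_i \oplus r^{(3)}_i$}
5:   $Z^{(1)}_i \leftarrow Z^{(1)}_{i-1} \oplus s^{(2)}_i$
     {$Z^{(1)}_i = u^{e+f} \oplus u^e \cdot \sum_{j=i+1}^{d} r^{(2)}_j \oplus u^f \cdot R^{(1)} \oplus R^{(1)} \cdot \sum_{j=i+1}^{d} r^{(2)}_j \oplus \sum_{j=1}^{i} r^{(3)}_j$}
6: end for   {$Z^{(1)}_d = u^{e+f} \oplus u^f \cdot R^{(1)} \oplus R^{(3)}$}
eliminate second disturbing term
7: $Z^{(2)}_0 \leftarrow Z^{(1)}_d$
8: for $i = 1 \dots d$ do
9:   $s^{(3)}_i \leftarrow Z^{(2)} \cdot r^{(1)}_i$   {$s^{(3)}_i = u^f \cdot r^{(1)}_i \oplus R^{(2)} \cdot r^{(1)}_i$}
10:  $s^{(4)}_i \leftarrow s^{(3)}_i \oplus r^{(4)}_i$   {$s^{(4)}_i = u^f \cdot r^{(1)}_i \oplus R^{(2)} \cdot r^{(1)}_i \oplus r^{(4)}_i$}
11:  $Z^{(2)}_i \leftarrow Z^{(2)}_{i-1} \oplus s^{(4)}_i$
     {$Z^{(2)}_i = u^{e+f} \oplus u^f \cdot \sum_{j=i+1}^{d} r^{(1)}_j \oplus R^{(2)} \cdot \sum_{j=1}^{i} r^{(1)}_j \oplus R^{(3)} \oplus \sum_{j=1}^{i} r^{(4)}_j$}
12: end for   {$Z^{(2)}_d = u^{e+f} \oplus R^{(1)} R^{(2)} \oplus R^{(3)} \oplus R^{(4)}$}
change mask
13: $Z^{(3)} \leftarrow \mathrm{PMMC}(Z^{(2)}_d, (r^{(1)}_1 r^{(2)}_1),\dots,(r^{(1)}_d r^{(2)}_d), r^{(3)}_1,\dots,r^{(3)}_d, r^{(4)}_1,\dots,r^{(4)}_d, r^{(5)}_1,\dots,r^{(5)}_d)$
    {$Z^{(3)} = u^{e+f} \oplus R^{(5)}$}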
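The two elimination loops can be checked numerically with a small Python sketch. It covers Steps 1-12 only and leaves out the final mask change; the field is assumed to be $\mathbb{F}_{256}$ with the AES reduction polynomial, the bytes `a` and `b` stand for $u^e$ and $u^f$, and the function names are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def xor_all(shares):
    """A mask is the XOR sum of its shares."""
    return reduce(xor, shares, 0)


def pmm_core(z1, z2, r1, r2, r3, r4):
    """Steps 1-12 of Algorithm 14; returns the values after each loop."""
    z = gf_mul(z1, z2)                   # step 1
    for r2_i, r3_i in zip(r2, r3):       # steps 2-6: first disturbing term
        z ^= gf_mul(z1, r2_i) ^ r3_i
    after_first_loop = z
    for r1_i, r4_i in zip(r1, r4):       # steps 8-12: second disturbing term
        z ^= gf_mul(z2, r1_i) ^ r4_i
    return after_first_loop, z
```

With `z1 = a ^ R1` and `z2 = b ^ R2`, the first returned value equals `gf_mul(a, b) ^ gf_mul(b, R1) ^ R3` and the second equals `gf_mul(a, b) ^ gf_mul(R1, R2) ^ R3 ^ R4`, which matches the comments after Steps 6 and 12.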
4.6 Conclusions
In this chapter we analyzed the security of cryptographic algorithms such as AES against
side channel attacks. We proposed a strong and general model to analyse the security.
Furthermore, we proposed a generic method to implement cryptographic algorithms that is
provably secure in our model. I.e., we showed that using our method, an adversary who can
determine the value of a single but arbitrary intermediate result in every encryption does not
derive any information about the secret key. Moreover, we analyzed the costs of our method
when implemented in hardware and compared it with the efficiency of other methods. In the
last part, we proposed a way to generalize our method to even more powerful adversaries
that can obtain the values of an arbitrary but fixed number $d$ of intermediate results.
Chapter 5
Fault Based Collision Attacks
In this chapter we examine the security threat caused by so called fault attacks. Fault attacks
are a special type of side channel attacks in which the attacker enforces the malfunction of a
cryptographic device. The output or reaction of the device is then used to derive information
about the secret key. Typical targets for fault attacks are smartcards (Rankl and Effing 2002).
A smartcard is a general purpose computer embedded in a plastic cover of a credit card’s size.
The main building blocks of a smartcard are a CPU, a ROM that contains for example the
operating system, an EEPROM containing among other things the secret key, and a RAM
to store intermediate results of computations. To communicate with the outside world the
smartcard has to be inserted into a so called card reader that also provides the energy the
smartcard needs for operating.
Smartcards are perfectly suited for storing private information such as cryptographic keys
because the corresponding cryptographic operations such as encryption or digital signature
are carried out directly on the smartcard. Therefore, the key never has to leave the smartcard
and hence seems to be protected very well, even in hostile environments. However, as
explained in Chapter 3 (page 19) physical instances of algorithms (in hardware or software)
may leak information about the computation through side channels.
Boneh, DeMillo and Lipton (1997) were the first to show that faults induced into the
encryption process of RSA can reveal the secret key. Biham and Shamir (1997) combined fault
attacks with the concept of differentials and mounted a differential fault attack (DFA) on DES.
Skorobogatov and Anderson (2002) showed that fault attacks are realizable with sufficient
precision in practice. Blömer and Seifert (2003), Bar-El, Choukri, Naccache, Tunstall and
Whelan (2006) and Otto (2005) give an overview of the physics of inducing faults.
In this chapter we focus on fault attacks on AES. The first fault attacks on AES reported
in the literature were due to Blömer and Seifert (2003), followed by improved attacks of
Dusart, Letourneux and Vivolo (2003), Giraud (2004), Chen and Yen (2003) and Piret and
Quisquater (2003). All these publications demonstrate the power of fault attacks. However,
these attacks either use the fault model of bit resets (Blömer and Seifert 2003), in which case
they do not need the faulty ciphertexts, or they only require the fault model of bit
flips, in which case, however, they need the faulty ciphertexts as described in (Dusart
et al. 2003), (Giraud 2004), (Chen and Yen 2003), (Piret and Quisquater 2003). The fault
attacks presented in this thesis use bit flips and, instead of faulty ciphertexts, the attacks only
use so-called collision information. This turns out to be a much weaker requirement than the
requirement that an attacker gets complete faulty ciphertexts. To obtain our new attacks,
we show how to combine fault attacks with so called collision attacks. In a collision attack
the adversary tries to detect identical intermediate results during the encryption of different
plaintexts, e.g., by using side channel information, and uses this information to derive the
secret key. Basically this idea was due to Dobbertin. Schramm et al. developed collision
attacks against DES (Schramm, Wollinger and Paar 2003) and AES (Schramm, Leander,
Felke and Paar 2004) and showed how to detect collisions using power traces.
We combine the concepts of fault and collision attacks by inducing faults to generate
collisions. This approach allows us to relax the requirement of getting faulty ciphertexts to the
requirement of detecting collisions in the encryption process. First we explain the basic idea
underlying our attacks by presenting an attack based on some rather strong assumptions.
After that we present an attack utilizing the same basic ideas that successfully attacks a
smartcard that is protected by a so called memory encryption mechanism (MEM). To the
best of our knowledge, this is the first fault attack on smartcards protected by memory
encryption.
To defend against side channel attacks the manufacturers of smartcards developed several
countermeasures. One type of countermeasure is intended to protect the card, e.g.,
shields, sensors or error detection. Another type is designed to render side channel attacks
useless using techniques to obfuscate the side channel information, e.g., by random masking
(Messerges 2000), (Golić and Tymen 2002), (Blömer et al. 2004). Yet another more efficient
approach is to use a so called memory encryption mechanism (MEM). Memory encryption
mechanisms encrypt an intermediate result directly after it leaves the processor and decrypt
data right before it enters the processor (see Figure 5.1). This guarantees that all data stored
in the RAM is encrypted. The intention is that memory encryption makes it harder for an
adversary to derive information about intermediate states of the encryption process by using
side channels of the smartcard. In general, it is assumed that unlike the RAM it is too difficult
to induce faults into the registers of the highly integrated processor with some reasonable
precision. Hence, memory encryption is widely believed to be a useful countermeasure against
side channel attacks, in particular fault attacks.

[Figure 5.1: Model of an enhanced smartcard with memory encryption mechanism (MEM). Labels in the figure: ROM, EEPROM, MEM, Processor, RAM, key, "encrypted", "protected against faults".]
Due to the limited computational power of smartcards the MEM has to be very fast.
So the manufacturers of smartcards use some light encryption algorithms that are very fast
but may not be secure against serious cryptanalysis. To increase the impact of the MEM
the manufacturers like to keep their algorithms secret. However, many manufacturers do
not analyze the impact of MEMs on security but simply present it as an improvement of
security. The strategy is to implement as many promising countermeasures as possible
without exceeding a certain cost threshold. Even a weak countermeasure should increase security.
Our attack, which works even in the presence of a MEM, shows that the security improvement
of the MEM as generally used is rather limited. In particular, we present an attack on
an AES implementation protected by MEM that determines the full AES key by inducing
only 285 faults and detecting collisions.
The chapter is organized as follows.
Section 5.1: The Concept of Fault Attacks
In this section we introduce the concept of fault attacks. We categorize the existing
fault attacks depending on their properties like the precision of time and location.
Furthermore, we give the basic methods known so far to analyze the output or the reaction
of the device in order to derive secret information.
Section 5.2: The Concept of Collision Attacks
To get a better understanding of fault based collision attacks we briefly sketch the idea
of so called collision attacks. Later we combine this concept with fault attacks to obtain
our novel concept of fault based collision attacks.
Section 5.3: New Fault Model
In Section 5.3 we present our model for analyzing fault based collision attacks as published
in (Blömer and Krummel 2006). Fault based collision attacks are an improvement
of classical fault attacks. On the one hand they do not need strong assumptions like the
ability to force bits to a certain value. On the other hand they do not need faulty
ciphertexts to derive information about the secret key. We explicitly specify the underlying
assumptions and justify why fault based collision attacks are realistic threats to
the security of cryptographic hardware.
Section 5.4: New Fault Attacks
In Section 5.4 we describe fault based collision attacks on AES and analyze their complexity.
Unlike the classical fault attacks using bit flips like the attacks of (Dusart
et al. 2003), (Giraud 2004), (Chen and Yen 2003) and (Piret and Quisquater 2003)
obtaining faulty ciphertexts is not essential for our attacks. Therefore our attacks are
applicable in scenarios where classical fault attacks do not work. On the other hand,
our new attacks need more faults than the classical fault attacks. We explain the basic
idea in our first attack. This attack is our basic attack and is based on rather strong
assumptions. However, in the sequel we show how to strengthen it and how to adapt it
to several other scenarios. The second attack we present is our strongest attack. This
attack shows how to successfully attack a smartcard that is protected by a MEM. To the
best of our knowledge this is the first successful attack against a smartcard protected
by a MEM.
Section 5.5: Conclusions
We finish this chapter by reflecting on the impact of fault based collision attacks on the
security of recent smartcards. Furthermore, we propose some ideas of how to protect
cryptographic hardware against such attacks.
5.1 The Concept of Fault Attacks
In the sequel, we briefly summarize methods commonly suggested to induce faults in an
encryption. Based on these methods we present the standard models to analyze fault attacks.
5.1.1 Methods to Induce Faults
Researchers developed a wide variety of methods to induce faults into electrical circuits. In
the sequel, we list some common fault induction methods to motivate the fault models we
give afterwards. These fault induction methods are the starting point for the theoretical models
used to develop and analyze fault attacks on cryptographic algorithms. Since we focus on the
theoretical analysis of fault attacks we only describe each method briefly. A more complete
list can be found in (Bar-El et al. 2006) and (Otto 2005).
Optical Fault Induction Exposing an electrical circuit to an intense light source will cause
photoelectric effects due to the current induced by photons. In turn, these photoelectric
effects cause faulty behavior of the circuit. If the circuit is laid open, intense light is
an easy way to induce faults. In (Skorobogatov and Anderson 2002) the authors showed
how to induce faults with some reasonable precision using only a low-cost flash light.
The precision of inducing faults this way can be improved using more sophisticated lab
instruments.
Power Spikes The power supply of a smartcard is always established by the smartcard
reader. To ensure that the smartcard works properly in common environments the
manufacturers agreed in (ISO 2002) that a smartcard must tolerate a variation of ±10%
of the standard supply voltage of 5V. Increasing or decreasing the voltage beyond
the specified limits is called a power spike. Power spikes may result in a transient
malfunction of the smartcard. E.g., if a power spike occurs during an encryption some
intermediate operation may not work properly and produce a faulty result.
Temperature Like the supply voltage, the operating temperature of a device is also restricted
to certain thresholds to ensure proper operation. Heating up or cooling down a smartcard
beyond these thresholds may result in malfunctions, for example modifications of
the content of RAM cells.
Clock Glitches Due to the lack of an internal clock the correct operation of a smartcard
entirely depends on the external clock signal that is given by the card reader. Disturbing
this clock signal may cause the card to spuriously skip operations.
X-rays and Focused Ion Beams There are two different ways to use X-rays or focused
ion beams in attacking a smartcard. Firstly, they can be used to drill holes through
a mechanical shield with high precision. Hence, the shield cannot prevent an attacker
from accessing the underlying hardware with some analysis tools, e.g., a probe. Secondly,
X-rays and focused ion beams can be used to induce faults without manipulating the
coating of the smartcard. Details can be found in (Kömmerling and Kuhn 1999).
Eddy Current The French physicist Léon Foucault discovered in 1855 that moving a conductor
through a magnetic field causes some current flow called eddy current. Using
eddy current to disturb the operations of a smartcard is one of the oldest proposed methods
of fault induction. See for example (Kocher 1996), (Anderson and Kuhn 1996) and
(Kömmerling and Kuhn 1999). However, it is difficult to focus the fault to a certain
area of the chip. In (Quisquater and Samyde 2002) the authors developed a refined
method of inducing eddy current.
5.1.2 Fault Models
To analyze fault based attacks we first have to develop adequate models that cover the
important aspects of real environments. Independent of the method to induce faults the
following properties are essential for the analysis of fault attacks:
Precision The precision of the fault induction is crucial for both the success and the complexity
of a fault based attack. We distinguish between the precision of time and
location. The precision of location defines the ability of the attacker to focus the fault
induction on a certain part of the hardware. To induce a fault into a specific intermediate
result an adversary must also be able to induce faults at a precise time depending
on the progress of the encryption.
Number of affected bits This property specifies how many bits are affected by the induced
faults. Precise fault injection techniques can modify single bits whereas other techniques
may change bytes or even a whole intermediate state.
Effect of the fault Our strongest model allows the adversary to set a bit of an intermediate
result to a certain value. I.e., if an adversary $A$ can force a bit to be 0 we call this a
bit reset. Moreover, if $A$ can force a bit to be 1 we call it a bit set. In weaker models
the adversary does not have such a strong influence on the value of a faulty intermediate
result. E.g., if the adversary can only modify a whole intermediate state it is very
unlikely that he can force the complete state to a chosen value. In such scenarios we
assume that he can change the intermediate state in a random and unpredictable way.
We call this a random fault.
Incidence of fault The incidence of a fault also plays an important role in the analysis of
fault based attacks. A fault that only changes the content of a memory cell once and
works properly during the rest of the encryption is called a transient fault. For example,
a focused ion beam changes the content of some bits in the RAM but does not destroy
any transistor. In contrast permanent faults are defective parts of the hardware that
do not work correctly after the fault is induced. E.g., this could be an interrupted wire
that prevents the information flow.
A fault attack can be divided into two steps. In the first step the adversary $A$ induces a
fault into the encryption, e.g., by using a method described above. We call this step the fault
induction step.
In the second step of a fault based attack, $A$ analyzes the impact of the induced fault on
the encryption. Depending on the implementation and the abilities of $A$ the analysis differs.
We distinguish two kinds of fault based attacks:
Fault Attacks Based on the Analysis of Faulty Output
If the encryption algorithm is not protected against fault attacks it does not react to the fault
directly. It simply continues its computation based on faulty intermediate results and outputs
a faulty ciphertext in the end. Giving $A$ access to both the faulty and the corresponding
correct ciphertext allows him to backtrack the encryption and deduce information about the
last round key. E.g., for ciphers like AES an adversary $A$ performs the following procedure:
1. $A$ guesses the $i$-th byte $\hat{k}^{10}_i$ of the last round key and computes some intermediate
results by tracing back the last round of the encryption for both the faulty and the
correct ciphertext.
2. $A$ verifies whether the difference of the corresponding intermediate results could be
caused by the induced fault. If this difference could not be caused by the fault, the
candidate $\hat{k}^{10}_i$ is proven to be wrong and discarded. In the other case, $A$ keeps $\hat{k}^{10}_i$ as
a possible key value.
See for example (Dusart et al. 2003) and (Giraud 2004) for fault attacks of this type on
AES.
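The two-step filtering procedure can be illustrated on a single byte. The Python sketch below is a simplified model, not one of the cited attacks: it assumes the fault is a single bit flip in one state byte directly before the final SubBytes, it ignores ShiftRows (which only moves bytes), and the helper names are hypothetical. The S-box is computed from its definition rather than copied from a table.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a):
    """Inverse in GF(2^8) computed as a^254, with 0 mapped to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0


def sbox_entry(x):
    """AES SubBytes: field inversion followed by the affine map."""
    y = gf_inv(x)
    s = 0x63
    for shift in range(5):
        s ^= ((y << shift) | (y >> (8 - shift))) & 0xFF
    return s


SBOX = [sbox_entry(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x


def key_candidates(c, c_faulty):
    """Keep every key byte guess for which the traced-back difference is one bit."""
    keep = []
    for guess in range(256):
        x = INV_SBOX[c ^ guess]               # step 1: trace back correct byte
        x_faulty = INV_SBOX[c_faulty ^ guess]  # step 1: trace back faulty byte
        if bin(x ^ x_faulty).count("1") == 1:  # step 2: consistent with a bit flip?
            keep.append(guess)
    return keep
```

Intersecting the candidate sets obtained from several faulty encryptions narrows down the key byte; the correct value always survives because it reproduces the true single-bit difference.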
Fault Attacks based on the Information whether the Output is Faulty or Not
If the implementation does not output the faulty ciphertext the analysis is more involved.
However, we assume that the adversary $A$ always notices if the induced fault falsifies the
encryption. E.g., a so-called security reset that puts the implementation into a specified state
after detecting a fault would reveal the information that a fault occurred. But even if the
implementation does not reveal the detection of a fault directly it has to react to the fault
somehow. For example, it might recompute the encryption. But such a special treatment
of faults could be detected by the adversary by simply measuring the time an encryption
takes. If a faulty encryption takes longer than the correct encryption the induced fault had
an impact on the encryption. If the faulty encryption is as fast as the correct encryption the
adversary concludes that the induced fault does not influence the encryption. We distinguish
two kinds of attacks:
try and error attack The so-called try and error attack works if $A$ can set or reset bits.
After forcing a bit of an intermediate result to 0 (or 1), $A$ determines if a fault
occurred or not. If the encryption is correct then the fault attack did not change
the value of that bit. Hence, $A$ concludes that the bit was 0 (or 1 respectively). If
the encryption is faulty then the fault attack changed the value of that bit. Hence,
$A$ concludes that the bit was 1 (or 0 respectively). By repeating fault attacks $A$ can
determine the values of several bits of an intermediate state. In turn, $A$ can use this
information to derive information about the secret key. See for example (Blömer and
Seifert 2003) for this kind of fault attacks on AES.
fail safe attack Like the try and error attack, the so-called fail safe attack does not need
a faulty ciphertext, but it also works with random faults. To illustrate the attack
consider the square and multiply always algorithm (Algorithm 15) to compute an RSA
signature.
This implementation was proposed to counteract side channel attacks like timing and
power analysis. However, it turned out that this implementation is susceptible to a fail
safe attack. The idea is as follows: The attacker induces a random fault into $t_1$ in the
$j$th execution of the loop in line 4 of Algorithm 15. He can then determine the $j$th
bit $d_j$ of the secret exponent as follows. If $d_j = 0$ then no multiplication is needed in
step $j$ to compute the signature and the variable $t_1$ does not influence the computation.
Hence the result would be correct. If $d_j = 1$ then $t_1$ is needed in step $j$ to compute
the signature and hence the result would be incorrect. $A$ can determine the complete
secret key by repeating this attack for all bits of $d$.

Algorithm 15 square and multiply always
Input: RSA modulus $N$, message $m \in \mathbb{Z}_N$, secret exponent $d = \sum_{i=0}^{\ell-1} d_i \cdot 2^i \in \mathbb{Z}^*_{\varphi(N)}$, $d_i \in \{0,1\}$
Output: $m^d \bmod N$
1: $t \leftarrow 1$
2: for $i = \ell - 1$ down to $0$ do
3:   $t_0 \leftarrow t^2$
4:   $t_1 \leftarrow t_0 \cdot m$   {multiply always}
5:   if $d_i = 0$ then
6:     $t \leftarrow t_0$
7:   else
8:     $t \leftarrow t_1$
9:   end if
10: end for
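The fail safe attack on Algorithm 15 can be simulated in Python. This is a toy sketch under assumptions that are not part of the text: the RSA parameters are a small textbook example, the fault is modeled by overwriting $t_1$ with 0 (one particular corrupted value, chosen so that the outcome of the simulation is deterministic), and the names `make_card` and `recover_exponent` are hypothetical. The adversary only learns whether the output is correct or not.

```python
def make_card(m, d, n, nbits):
    """Return a signing oracle that can be hit by a fault in one loop pass."""
    def sign(fault_index=None):
        t = 1
        for i in range(nbits - 1, -1, -1):
            t0 = (t * t) % n
            t1 = (t0 * m) % n       # multiply always
            if i == fault_index:
                t1 = 0              # modeled fault: t1 is corrupted
            t = t1 if (d >> i) & 1 else t0
        return t
    return sign


def recover_exponent(sign, nbits):
    """Recover d bit by bit from the correct/incorrect information alone."""
    reference = sign()
    recovered = 0
    for j in range(nbits):
        if sign(fault_index=j) != reference:  # output changed, so bit j is 1
            recovered |= 1 << j
    return recovered


# Toy parameters (assumed for illustration): N = 61 * 53.
N, M, D = 3233, 65, 2753
card = make_card(M, D, N, D.bit_length())
print(recover_exponent(card, D.bit_length()) == D)
```

One fault per exponent bit suffices in this model, so the number of induced faults equals the bit length of $d$.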
5.2 The Concept of Collision Attacks
The idea of collision attacks was due to Dobbertin and the first collision attacks were published
in (Schramm et al. 2003) and (Schramm et al. 2004). A collision is the occurrence of identical
intermediate results in the encryptions of different plaintexts. An adversary $A$ tries to detect
collisions and uses this information together with the plaintexts (or ciphertexts) to derive
information about the secret key. To detect collisions the authors of (Schramm et al. 2003)
and (Schramm et al. 2004) proposed to use side channel information, e.g., power traces,
and mounted successful collision attacks on DES and AES.
5.3 New Fault Model
5.3.1 Notation
In this chapter we focus on the AES-128 symmetric cipher and simply call it AES. However,
the attacks presented in this chapter can also be easily adapted to other versions of AES
having larger key sizes. As defined in Chapter 2, let $P := \{0,1\}^{128}$ be the set of plaintexts,
$C := \{0,1\}^{128}$ be the set of ciphertexts and $K := \{0,1\}^{128}$ be the set of keys. In the classical
model the AES encryption with a fixed key $k \in K$ is a bijective function:
$$\mathrm{AES}_k : P \to C, \qquad p \mapsto c := \mathrm{AES}_k(p).$$
Let
$$O := \{\mathrm{SB}, \mathrm{SR}, \mathrm{MC}, \mathrm{AR}\}$$
denote the set of round transformations SubBytes, ShiftRows, MixColumns and AddRoundKey.
Furthermore, let
$$p^{(r),(o)}_{i,j}$$
denote the $j$th bit of the $i$th byte of the encryption state of plaintext $p$ after the operation
$o \in O$ of round $1 \le r \le 10$. For example, $p^{(3),(\mathrm{SR})}_{5,3}$ is the 3rd bit of the 5th byte of the
encryption state of plaintext $p$ after the ShiftRows transformation of round 3. In the sequel,
we will omit the index $j$ that defines the bit position if it is not relevant in that context. The
$i$th byte of the round key $k^{(r)}$ of round $r$ is called $k^{(r)}_i$.
We can then define the set of bits that are results of a round transformation as
$$S := \bigl\{p^{(0),(\mathrm{AR})}_{i,j} \bigm| i \in \{0,\dots,15\},\ j \in \{0,\dots,7\}\bigr\} \cup \bigl\{p^{(r),(o)}_{i,j} \bigm| o \in O,\ r \in \{1,\dots,10\},\ i \in \{0,\dots,15\},\ j \in \{0,\dots,7\}\bigr\}.$$
5.3.2 Model
To model faults mathematically, we extend the AES function with a second variable $b$ that
specifies a bit position during the computation of $\mathrm{AES}_k$. The set of all realizable functions
via AES is extended by flipping bit $b$ during the computation of $\mathrm{AES}_k$. We call the extended
function FAES:
$$\mathrm{FAES}_k : P \times S \to C, \qquad (p, b) \mapsto c := \mathrm{FAES}_k(p, b).$$
However, the extended function $\mathrm{FAES}(p, b)$ is not bijective. There exist collisions such
that two intermediate states of computations of $\mathrm{FAES}(p, b)$ and $\mathrm{FAES}(p', b')$ with different
inputs $(p, b) \neq (p', b')$ are equal. An attacker wants to detect those collisions and then use
them to derive the secret key $k$.
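A minimal worked example shows that such collisions exist. It uses notation introduced only for this illustration: $e_{i,j} \in \{0,1\}^{128}$ denotes the string with a single 1 at bit $j$ of byte $i$, and $k^{(0)}$ denotes the round key of the initial AddRoundKey. Flipping the bit $b = p^{(0),(\mathrm{AR})}_{i,j}$ gives

```latex
\[
  \bigl(p \oplus k^{(0)}\bigr) \oplus e_{i,j}
  \;=\;
  \bigl(p \oplus e_{i,j}\bigr) \oplus k^{(0)} .
\]
```

Hence the faulty encryption of $p$ has the same state after round 0 as the fault-free encryption of $p' = p \oplus e_{i,j}$, and since all later transformations are deterministic, every subsequent intermediate state coincides as well. This particular collision is independent of the key and therefore reveals nothing about it; it only illustrates why the extended function cannot be injective.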
In our scenario we have a smartcard with an implementation of AES and a secret AES
key $k$ stored on it. An adversary $A$ has access to the smartcard and wants to determine
information about the secret key $k$. In our model we assume that the following holds:
1. $A$ is able to trigger the smartcard to encrypt chosen plaintexts.
2. $A$ can induce transient bit flips into the encryption process.
3. $A$ is able to detect collisions.
Discussion of the New Model
To simplify the description of our attacks we assume that the adversary $A$ is able to input
chosen plaintexts into the smartcard. However, our attacks can also be transformed to known
plaintext attacks. During the encryption $A$ can induce faults in terms of transient bit flips
into the result of a round transformation that is stored in the RAM. To be more precise,
$A$ can flip a single bit of some specified byte in the memory. Furthermore, $A$ can detect
collisions by obtaining some information about an internal state of the encryption process.
However, we do not assume that this information lets $A$ determine (parts of) the secret key
directly. Nevertheless, it enables $A$ to detect if a collision occurred or not. We call any kind
of information that lets $A$ detect collisions collision information of some intermediate state
of $\mathrm{FAES}(p, b)$. Later we will show examples how to derive collision information.
Modeling Collision Information We model collision information as the evaluation of an
injective function $f_k$ that depends on the concrete implementation of $\mathrm{AES}_k$ and the secret key
$k$. It gets as input the specification of a bit position that is flipped during the encryption of
a plaintext $p$. The output is some information about an intermediate state of the encryption.
According to the notation introduced above we denote the collision information of encrypting
plaintext $p$ and inducing a bit flip of bit $e$ of byte $i$ of the state after transformation $o$ in round
$r$ by $f_k(p^{(r),(o)}_i, e)$. E.g., $f_k(p^{(1),(\mathrm{SB})}_i, e)$ is the collision information $A$ obtains after flipping bit
$e$ of byte $i$ of the state after the SubBytes transformation of round 1. It is also possible to
derive the collision information without inducing a fault. We denote the evaluation of $f_k$
without inducing a fault in the encryption process by $f_k(p^{(r),(o)}_i, -)$.
Realizations of Collision Information Depending on the purpose of the smartcard $f_k$
can have different realizations. Given the ciphertexts the detection of collisions is easy because
the equality of ciphertexts implies equality of intermediate states. However, in many
cases the output of an encryption is not available to the attacker. For example, if the smartcard
computes a message authentication code (MAC) or a hash value using AES as a building
block, $f_k$ can simply be the MAC or the hash value. Remember that the MAC is the final result
of a number of interlinked AES encryptions and not the result of a single AES encryption.
The final ciphertext could also be used as collision information if the smartcard computes
multiple encryptions with different encryption algorithms. Finally, if the smartcard computes
a single encryption but does not output faulty ciphertexts, $f_k$ could be the measurement of
some side channel information, e.g., a power trace, that allows collisions to be detected.
Cost Analysis To analyze the costs of a fault based collision attack we simply count the
number of faults we have to induce. The evaluation of $f_k$ without inducing a fault is for free.
We also neglect the complexity of additional computations that can be performed offline since
in our cases they are obviously easy.
5.4 Fault Based Collision Attacks on AES
Now we describe fault based collision attacks on AES. For simplicity, we only show how to
compute byte $k_0$ of the secret key $k$. Similar approaches can be used to compute the other
key bytes.
We describe how to mount and analyze fault based collision attacks on AES in different
scenarios. Each scenario is characterized by abilities of the adversary and the properties of
the environment.
Precision of Fault Induction. The first characteristic defines the precision of the fault
induction. We consider two cases. In the first case the adversary A is able to flip a
specific bit of an intermediate state. In the second case the adversary can focus the
fault on a single byte of an intermediate state. However, we assume that he cannot
focus on a single bit of that byte; instead, each possible bit flip occurs with probability 1/8.
Memory Encryption Mechanism (MEM). The second characteristic specifies whether
the smartcard is protected by a MEM or not. The MEM encrypts every intermediate
result that leaves the processor and decrypts a value right before it enters the processor,
see Figure 5.1 (page 50). Since a smartcard has only restricted computational power
and memory, most manufacturers choose a byte oriented encryption function with a
fixed key that is used for encryption and decryption. In our approach we simply model
the memory encryption as an unknown but fixed function $h: \{0,1\}^8 \to \{0,1\}^8$. That
means that we do not rely on a weakness in the memory encryption itself. In particular,
we do not assume to have any information about how bit flips affect further processing of
that byte.
Validation of Collision Information. The last characteristic defines whether collision
information remains valid for a long period of time or not. If collision information does
not remain valid there is no reason for A to store it since he cannot
use it later in the attack. A is only able to compare the collision information of two recently
taken measurements and store the result. This effect could be caused by environments
that change frequently, such that collision information taken at different times is
hardly comparable, e.g., due to some countermeasure that induces noise into the
collision information. If, however, collision information remains valid over the time span
used for the attack it may be useful for A to store this information in a preprocessing
step to have it available once and for all. It will turn out later that stored information
helps to reduce the number of induced faults.
We denote the transformation of SubBytes applied to a single byte x of the state simply
as the application of the sbox to x and write it as S[x]. To simplify notation we define
\[ \Delta(p_i, q_i) = p_i \oplus q_i \]
to be the difference of two plaintext bytes $p_i$ and $q_i$. Then
\[ \Delta_{in}(p_i, q_i) = (p_i \oplus k^{(0)}_i) \oplus (q_i \oplus k^{(0)}_i) = p_i \oplus q_i \]
is the input difference of $(p_i, q_i)$ before the first application of the sbox and
\[ \Delta_{out}(p_i, q_i) = S[p_i \oplus k^{(0)}_i] \oplus S[q_i \oplus k^{(0)}_i] \]
is the output difference of $(p_i, q_i)$ after the first application of the sbox.
5.4.1 Basic Attack
First, we describe the scenario in which the attack takes place. We assume that A can flip
a specific bit at position e of the intermediate state $p^{(1),(SB)}$. We also assume that collision
information remains valid over the time span of the attack. Finally, we assume that the
smartcard is not protected by a MEM.
In a preprocessing step the adversary computes an array $B_e$ of length 256. In position
$B_e[y]$, $y \in \{0,\dots,255\}$, the array stores the following information:
\[ B_e[y] := \{\, \{s,t\} \mid s \oplus t = y,\ S[s] \oplus S[t] = 2^e \,\}, \]
i.e., $B_e[y]$ stores all (unordered) pairs of bytes with $\Delta_{in}(s,t) = y$ and
\[ \Delta_{out}(s,t) = S[s] \oplus S[t] = 2^e. \]
Furthermore, by $C_e[y]$ denote the union of the sets in $B_e[y]$. The sets $C_e[y]$ are pairwise disjoint.
As it turns out, for every $e \in \{0,1,\dots,7\}$ we have that 129 sets $C_e[y]$ are empty, 126 sets
$C_e[y]$ contain exactly two elements, and one set $C_e[y]$ contains exactly four elements.
Next, A collects a set B of collision information $f_k(p^{(1),(SB)}_0, -)$ for all 256 different values
of $p_0$ and arbitrary but fixed $p_1,\dots,p_{15}$. Then A chooses an arbitrary value $q_0$ and encrypts
the corresponding plaintext flipping an arbitrary bit e of $q^{(1),(SB)}_0$. If $f_k$ has the property that
\[ f_k(p^{(1),(SB)}_0, -) = f_k(q^{(1),(SB)}_0, e) \]
then A is able to find the corresponding plaintext $p_0$ satisfying
\[ S[p_0 \oplus k_0] = S[q_0 \oplus k_0] \oplus 2^e \]
by comparing the collision information with the elements of B. Given the pair $p_0, q_0$ the
adversary A knows the difference $p_0 \oplus k_0 \oplus q_0 \oplus k_0 = p_0 \oplus q_0$. Using array $B_e$ the adversary
A now concludes
\[ \{p_0 \oplus k_0,\ q_0 \oplus k_0\} \in B_e[p_0 \oplus q_0]. \]
Hence, A knows that the correct key byte $k_0$ satisfies
\[ k_0 \in \{\, p_0 \oplus s \mid s \in C_e[p_0 \oplus q_0] \,\}. \tag{5.1} \]
As mentioned above, $|C_e[y]| \le 4$ for all y, and $|C_e[y]| = 2$ for all but one of the non-empty
sets. Hence, at this point A has reduced the number of possible values for key byte $k_0$ to at most 4.
Next, A repeats the experiment described above with some value $q'_0$, such that
$q'_0 \oplus s \notin C_e[p_0 \oplus q_0]$ for all $s \in \{\, p_0 \oplus \bar{s} \mid \bar{s} \in C_e[p_0 \oplus q_0] \,\}$.
Using the collision information in set B, the adversary A determines $p'_0$ such that
$S[p'_0 \oplus k_0] = S[q'_0 \oplus k_0] \oplus 2^e$. As before A concludes that the key byte $k_0$ satisfies
\[ k_0 \in \{\, p'_0 \oplus s \mid s \in C_e[p'_0 \oplus q'_0] \,\}. \tag{5.2} \]
By choice of $q'_0$, the adversary A is guaranteed that $p_0 \oplus q_0 \neq p'_0 \oplus q'_0$. By elementary arithmetic
it follows that if $|C_e[p'_0 \oplus q'_0]| = |C_e[p_0 \oplus q_0]| = 2$, then (5.1) and (5.2) uniquely determine
the key byte $k_0$. By analyzing the structure of the arrays $B_e$ we verified that the key byte $k_0$
is also uniquely determined if one of the sets has size four.
Cost Analysis. To determine a single AES key byte A has to induce two faults. Thus 32
faults are enough to determine the complete 128-bit AES key.
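The basic attack can be simulated offline. The following is a minimal sketch under stated assumptions, not the implementation used for the thesis: the smartcard is replaced by the hypothetical helper `colliding_plaintext`, which models the collision information as the byte after SubBytes, and the names `build_aes_sbox`, `build_C` and `recover_key_byte` are illustrative. The loop follows the selection rule for $q'_0$ described above and simply continues with further values if more than one candidate is left.

```python
def build_aes_sbox():
    """Compute the AES sbox from its definition (inverse in GF(2^8), then affine map)."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
            b >>= 1
        return r

    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    sbox = []
    for x in range(256):
        b = s = inv[x]
        for r in range(1, 5):
            s ^= ((b << r) | (b >> (8 - r))) & 0xFF  # rotate b left by r bits
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def build_C(e):
    """C_e[y]: all bytes s with S[s] xor S[s xor y] == 2^e."""
    return [frozenset(s for s in range(256) if SBOX[s] ^ SBOX[s ^ y] == 1 << e)
            for y in range(256)]


def colliding_plaintext(k0, q0, e):
    """Simulated card (assumption): the p0 whose fault-free state equals the faulted q0 state."""
    target = SBOX[q0 ^ k0] ^ (1 << e)
    return next(p0 for p0 in range(256) if SBOX[p0 ^ k0] == target)


def recover_key_byte(k0, e=0):
    """Return (set of remaining candidates for k0, number of induced faults)."""
    C = build_C(e)
    candidates, seen, faults = set(range(256)), [], 0
    for q0 in range(256):
        # Skip q0 if some remaining candidate would map it into a set already used.
        if any((q0 ^ k) in c for c in seen for k in candidates):
            continue
        p0 = colliding_plaintext(k0, q0, e)
        faults += 1
        c = C[p0 ^ q0]
        seen.append(c)
        candidates &= {p0 ^ s for s in c}  # conditions (5.1), (5.2), ...
        if len(candidates) == 1:
            break
    return candidates, faults
```

For example, `recover_key_byte(0x2B, e=3)` returns the remaining candidate set together with the number of simulated faults; the true key byte is passed in only to drive the simulated card.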
5.4.2 Second Attack
The scenario for this attack is as follows. We assume that the adversary A can flip a specific
bit e of the intermediate state $p^{(0),(AR)}$. We also assume that collision information remains
valid over the time span of the attack. Finally, we assume that the smartcard is protected
by a MEM modelled as a function $h: \{0,1\}^8 \to \{0,1\}^8$. This implies that after a flip of bit e
the encryption continues using the value $h^{-1}(h(p_i \oplus k_i) \oplus 2^e)$ instead of $p_i \oplus k_i$. Therefore, we
assume that we have no information about the impact of bit flips on the encryption process.
The attack is divided into two steps. In the first step the adversary A collects the necessary
information to compute a function $g_0$ that is equal to h up to some constant. To
do so A selects a set S of 256 plaintexts p that take on all different values in byte $p_0$ and
that are equal in each other byte. A uses the smartcard to derive the collision information
for each of these plaintexts by evaluating $f_k(h(p^{(0),(AR)}_0), -)$ and stores it in the table B. Then
A encrypts plaintexts p of the set S, induces a bit fault into bit e, $0 \le e \le 7$, of $h(p^{(0),(AR)}_0)$
and compares the collision information $f_k(h(p^{(0),(AR)}_0), e)$ with the entries of table B to find
the corresponding plaintext $p'_0$. So A knows the difference
\[ h(p_0 \oplus k_0) \oplus h(p'_0 \oplus k_0) = 2^e \]
and stores the triple $(p_0, p'_0, e)$ in a difference table $\Delta B$. This step is repeated for different
plaintexts p and for different faulty bit positions until A has received enough information to
compute the differences
\[ h(p_0 \oplus k_0) \oplus h(p'_0 \oplus k_0) \]
of one byte $p_0$ with all other bytes $p'_0$. The details are given in the following lemma.
Lemma 6 Let $m: \{0,1\}^q \to \{0,1\}^q$ be an unknown function defined over $\mathbb{F}_{2^q}$. There exists
a set D of $2^q - 1$ pairs $(u,v) \in \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$ with the following property: If for all $(u,v) \in D$
we know $e \in \{0,\dots,q-1\}$ such that $m(u) \oplus m(v) = 2^e$, then one can determine a function
$g: \{0,1\}^q \to \{0,1\}^q$ such that $g \oplus c = m$ for some constant $c \in \mathbb{F}_{2^q}$.
Proof. Given some set $D \subseteq \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$ we construct a graph G whose set of vertices is $\mathbb{F}_{2^q}$
as follows. We connect two vertices u, v with an edge of weight e if $(u,v) \in D$.
If in G there exists a path between two vertices x, y then the difference $m(x) \oplus m(y)$ is
determined by the differences of pairs in D. Furthermore, if the graph G is connected we can
compute the difference $m(x) \oplus m(y)$ for all $(x,y) \in \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$. In particular, we can determine
all differences of the form $m(u) \oplus m(u_0)$ for an arbitrary but fixed input $u_0$. Using Lagrange
interpolation we can compute the function $g(u) = m(u) \oplus m(u_0)$. Setting $c := m(u_0)$ proves
the lemma.
Next we describe a set D of pairs (u, v) with known differences $m(u) \oplus m(v) = 2^e$,
such that the graph G as defined above is in fact connected. First we fix an arbitrary
$e_1 \in \{0,\dots,q-1\}$. Then there exists a set $D_1$ of $2^{q-1}$ distinct pairs $(u,v) \in \mathbb{F}_{2^q} \times \mathbb{F}_{2^q}$ such
that $m(u) \oplus m(v) = 2^{e_1}$. All pairs in $D_1$ will be elements of D. If we consider the graph
whose edges are defined by pairs in $D_1$ we get a graph $G_1$ on the vertex set $\mathbb{F}_{2^q}$ that consists
of $2^{q-1}$ connected components each consisting of exactly 2 vertices.
Next we choose $e_2 \neq e_1$. Then there exists a set $D_2$ of $2^{q-2}$ pairs of vertices (u, v) with
$m(u) \oplus m(v) = 2^{e_2}$ such that each pair in $D_2$ connects different connected components of $G_1$.
We call the resulting graph $G_2$. The set D will also contain all elements from $D_2$.
Continuing in this way with all possible $e_i \in \{0,\dots,q-1\}$ we get sets of pairs $D_1, D_2, \dots, D_q$
and graphs $G_1, G_2, \dots, G_q$ such that $G_i$ has $2^{q-i}$ connected components. In particular, $G_q$ is
connected. Moreover, the edges of $G_q$ are given by the pairs in $D := \bigcup_{i=1}^{q} D_i$. The size of D
is $2^{q-1} + 2^{q-2} + \dots + 1 = 2^q - 1$. This proves the lemma. □
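The layered construction in the proof can be traced with a small simulation. This is a sketch under assumptions that are not part of the lemma: the unknown function m is modelled by a random byte permutation `h`, a fault is simulated by looking up the partner v with $m(u) \oplus m(v) = 2^e$, and g is returned as a lookup table rather than via Lagrange interpolation. The names `layered_pairs` and `reconstruct` are illustrative.

```python
import random


def layered_pairs(h, q=8):
    """Build D_1, ..., D_q: in layer e keep a pair (u, v) with h[u] ^ h[v] == 2^e
    only if it joins two different connected components (union-find)."""
    n = 1 << q
    hinv = [0] * n
    for x, y in enumerate(h):
        hinv[y] = x
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    layers = []
    for e in range(q):
        layer = []
        for u in range(n):
            v = hinv[h[u] ^ (1 << e)]  # simulated fault: partner of u for bit e
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                layer.append((u, v, e))
        layers.append(layer)
    return layers


def reconstruct(layers, u0=0, q=8):
    """Walk the connected graph from u0 and return the table g with g[u] = m(u) ^ m(u0)."""
    n = 1 << q
    adj = {u: [] for u in range(n)}
    for layer in layers:
        for u, v, e in layer:
            adj[u].append((v, e))
            adj[v].append((u, e))
    g = {u0: 0}
    stack = [u0]
    while stack:
        u = stack.pop()
        for v, e in adj[u]:
            if v not in g:
                g[v] = g[u] ^ (1 << e)  # accumulate differences along the path
                stack.append(v)
    return [g[x] for x in range(n)]


if __name__ == "__main__":
    rng = random.Random(7)
    h = list(range(256))
    rng.shuffle(h)
    layers = layered_pairs(h)
    print([len(layer) for layer in layers], sum(len(layer) for layer in layers))
```

Each kept pair corresponds to one induced fault, so the total number of pairs is the fault count used in the cost analysis of this attack.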
We want to apply Lemma 6 to the function $h(x \oplus k_0)$. It is easy to see that A can compute
exactly the set of differences D described in the proof of Lemma 6 since he is able to flip a
specific bit. Hence, knowing D the adversary A can compute a function $g_0: \{0,1\}^8 \to \{0,1\}^8$
such that for all $x \in \mathbb{F}_{256}$ the difference $g_0(x) \oplus h(x \oplus k_0)$ is some constant $c_0 \in \mathbb{F}_{256}$. Since
A does not know the constant $c_0$ he does not get any information about the key byte $k_0$ at
this point.
A continues by computing for all other byte positions $1 \le i \le 15$ functions $g_1,\dots,g_{15}$ such
that for all $x \in \mathbb{F}_{256}$ the function $g_i: \{0,1\}^8 \to \{0,1\}^8$ has the property that
$g_i(x) \oplus h(x \oplus k_i) = c_i$ for some unknown constant $c_i \in \mathbb{F}_{256}$. None of the $g_i$ reveals any
information about the involved key byte $k_i$ because the constant $c_i$ can take on all possible
values and is unknown to A.
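To derive information about the key, A proceeds as follows. He guesses two candidates
$\hat{k}_0, \hat{k}_i$ for the key bytes $k_0, k_i$, respectively. To test this hypothesis on the key, A selects several
bytes x uniformly at random and computes
\[ g_0(x \oplus \hat{k}_0) = h(x \oplus \hat{k}_0 \oplus k_0) \oplus c_0 \]
and
\[ g_i(x \oplus \hat{k}_i) = h(x \oplus \hat{k}_i \oplus k_i) \oplus c_i. \]
Depending on the hypothesis $(\hat{k}_0, \hat{k}_i)$ the difference
\[ t_{0,i} := g_0(x \oplus \hat{k}_0) \oplus g_i(x \oplus \hat{k}_i) \]
computes to
\begin{align*}
& h(x) \oplus c_0 \oplus h(x) \oplus c_i = c_0 \oplus c_i, && \text{if } \hat{k}_0 \oplus k_0 = \hat{k}_i \oplus k_i \tag{5.3} \\
& h(x \oplus \hat{k}_0 \oplus k_0) \oplus c_0 \oplus h(x \oplus \hat{k}_i \oplus k_i) \oplus c_i, && \text{if } \hat{k}_0 \neq k_0 \text{ and } \hat{k}_i \neq k_i \tag{5.4} \\
& h(x) \oplus c_0 \oplus h(x \oplus \hat{k}_i \oplus k_i) \oplus c_i, && \text{if } \hat{k}_0 = k_0 \text{ and } \hat{k}_i \neq k_i \tag{5.5} \\
& h(x \oplus \hat{k}_0 \oplus k_0) \oplus c_0 \oplus h(x) \oplus c_i, && \text{if } \hat{k}_0 \neq k_0 \text{ and } \hat{k}_i = k_i \tag{5.6}
\end{align*}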
Now we assume that the function h has the following property. There do not exist constants
$a, c \in \mathbb{F}_{256}$ such that $h(x) \oplus a = h(x \oplus c)$ for all x. Note that this assumption does not
restrict the choice of h for two reasons. Firstly, a function used for memory encryption that
does not have this property contains too much structure and is probably easier to attack.
Secondly, most functions have this property. In fact, a random function has the property
with probability at least $1 - 2^{-127}$.
This assumption implies that unlike in case (5.3), in cases (5.4), (5.5), (5.6) the difference
$t_{0,i}$ is not constant. Moreover, if the guess $\hat{k}_0, \hat{k}_i$ was correct, that is $\hat{k}_0 = k_0$ and $\hat{k}_i = k_i$, then
A will always be in case (5.3). Now A can easily test the hypothesis $(\hat{k}_0, \hat{k}_i)$ by computing
$t_{0,i}$ for several bytes x. If $t_{0,i}$ varies for several different values of x then A knows that he is
not in case (5.3). It follows that the pair $(\hat{k}_0, \hat{k}_i)$ cannot be correct. On the other hand, if $t_{0,i}$
remains constant A concludes that he is in case (5.3) and keeps the pair $(\hat{k}_0, \hat{k}_i)$ as a potentially
correct candidate.
This implies that for every possible key byte $\hat{k}_0$ the adversary A obtains a single candidate
$\hat{k}_i$ for $1 \le i \le 15$ that fulfills condition (5.3). Guessing $\hat{k}_0$ the adversary A can compute a
vector $(\hat{k}_1,\dots,\hat{k}_{15})$ composed of unique candidates $\hat{k}_i$ that only depend on $\hat{k}_0$. To uniquely
determine the correct key, A simply mounts an exhaustive search attack on the 256 possible
values of $\hat{k}_0$.
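The hypothesis test on $t_{0,i}$ needs no further faults and can be sketched as follows. The sketch assumes that the tables $g_0$ and $g_i$ are already available (here they are generated directly from a random permutation `h` standing in for the memory encryption, with arbitrary constants), and it checks all 256 bytes x instead of only several random ones. The names `make_tables` and `consistent_guesses` are illustrative.

```python
import random


def make_tables(h, k0, ki, c0, ci):
    """Simulated result of the first attack step: g0(x) = h(x^k0)^c0, gi(x) = h(x^ki)^ci."""
    g0 = [h[x ^ k0] ^ c0 for x in range(256)]
    gi = [h[x ^ ki] ^ ci for x in range(256)]
    return g0, gi


def consistent_guesses(g0, gi, k0_hat):
    """All guesses ki_hat for which t_{0,i}(x) = g0(x^k0_hat) ^ gi(x^ki_hat) is constant in x."""
    result = []
    for ki_hat in range(256):
        t = g0[k0_hat] ^ gi[ki_hat]  # value of t_{0,i} at x = 0
        if all(g0[x ^ k0_hat] ^ gi[x ^ ki_hat] == t for x in range(256)):
            result.append(ki_hat)
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    h = list(range(256))
    rng.shuffle(h)
    k0, ki = 0x3C, 0xA7
    g0, gi = make_tables(h, k0, ki, c0=0x11, ci=0xEE)
    for k0_hat in (0x00, k0, 0xC8):
        print(hex(k0_hat), [hex(k) for k in consistent_guesses(g0, gi, k0_hat)])
```

In this simulation the surviving guess for a given $\hat{k}_0$ is the one satisfying the condition of case (5.3), which is why the attacker is left with one key vector per value of $\hat{k}_0$.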
Cost Analysis. A has to induce 255 faults to compute a function $g_i$ according to Lemma
6. To test a hypothesis on the key A does not need to induce faults. So the overall number
of faults is $16 \cdot 255 = 4080$.
Improvement. The previous attack can be improved with respect to the number of induced
faults as shown below. In the first step A computes the function $g_0$ such that $g_0(x) =
h(x \oplus k_0) \oplus c_0$, where $c_0 \in \mathbb{F}_{256}$ is unknown, as above. To determine the other functions
$g_1,\dots,g_{15}$ the adversary A uses the fact that each $g_i$ is related to $g_0$ by the following equation
\[ g_i(x) = h(x \oplus k_i) \oplus c_i = g_0(x \oplus \underbrace{k_i \oplus k_0}_{s_i}) \oplus c_i \oplus c_0. \]
So knowing $g_0$ (determined as above) A computes a list of all 256 functions $g_{0,s}(x) := g_0(x \oplus s)$,
$s \in \mathbb{F}_{256}$. To determine which of these functions equals $g_i$ (up to a constant) the adversary A chooses arbitrary
$p_i, q_i$ and evaluates $f_k(h(p^{(0),(AR)}_i), -)$ and $f_k(h(q^{(0),(AR)}_i), e)$ at byte position i. Using this
information A computes some differences $g_i(p_i) \oplus g_i(q_i)$ as described in the computation of
$g_0$ above.
To determine the correct function $g_i = g_{0,s_i}$, the adversary A simply checks which of the
functions $g_{0,s}$ fulfill these differences simultaneously until only one function remains. See
below for the required number of experiments. Then A knows the sum $s_i = k_0 \oplus k_i$ of two
AES key bytes. A repeats this procedure for all other byte positions $1 \le i \le 15$. As before,
guessing $\hat{k}_0$ the adversary A can determine a unique candidate $\hat{k}_i$. That means that A has
a vector $(\hat{k}_1,\dots,\hat{k}_{15})$ with fixed candidates $\hat{k}_i$ for each of the 256 candidates $\hat{k}_0$. As in the
original version of this attack this reduces the set of possible AES keys to only 256 candidates.
An exhaustive search reveals the full AES key.
Cost Analysis. To compute $g_0$ the adversary A has to induce 255 faults as in the original
version. To determine the further $g_i$, A has to collect a set of differences $g_i(p) \oplus g_i(q)$ that is
fulfilled simultaneously by only one of the 256 functions $g_{0,s}$. Notice that if the function $g_{0,s}$
fulfills a difference, i.e., $g_0(p \oplus s) \oplus g_0(q \oplus s) = g_i(p) \oplus g_i(q)$, then because of symmetry the
function $g_{0,s'}$ given by $s' := p \oplus q \oplus s$ also fulfills this difference since
\[ g_0(p \oplus (p \oplus q \oplus s)) \oplus g_0(q \oplus (p \oplus q \oplus s)) = g_0(q \oplus s) \oplus g_0(p \oplus s) = g_i(q) \oplus g_i(p). \]
Assuming that the 256 functions $g_{0,s}$ behave like random permutations (except for the
symmetry) we expect that A needs 2 differences to uniquely identify the correct one with high
probability. We tested this assumption by various experiments and in our experiments it
proved to be correct. Hence, we expect that A needs $255 + 15 \cdot 2 = 285$ faults to determine
the full 128-bit AES key.
As mentioned before we do not consider the complexity of the offline computations like
Lagrange interpolation etc. since all these computations can be performed efficiently without
access to the smartcard.
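The matching step of the improved attack can be sketched in the same simulated setting. The assumptions are as before: a random permutation `h` models the memory encryption, a fault at byte position i is simulated by a table lookup, and the names `shifts_matching` and `recover_shift` are illustrative. The loop keeps adding differences until a single shift s survives; because of the symmetry $s' = p \oplus q \oplus s$ a single difference can never suffice.

```python
import random


def shifts_matching(g0, diffs):
    """All shifts s with g0(p^s) ^ g0(q^s) == d for every recorded difference (p, q, d)."""
    return [s for s in range(256)
            if all(g0[p ^ s] ^ g0[q ^ s] == d for p, q, d in diffs)]


def recover_shift(h, k0, ki, rng, c0=0x5A, max_faults=64):
    """Return (s_i, number of faults) where s_i = k0 ^ ki is found by matching g_{0,s}."""
    hinv = [0] * 256
    for x, y in enumerate(h):
        hinv[y] = x
    g0 = [h[x ^ k0] ^ c0 for x in range(256)]  # known from the first step
    diffs = []
    for _ in range(max_faults):
        q, e = rng.randrange(256), rng.randrange(8)
        # Simulated fault at byte i: find p with h(p^ki) == h(q^ki) ^ 2^e.
        p = hinv[h[q ^ ki] ^ (1 << e)] ^ ki
        diffs.append((p, q, 1 << e))  # g_i(p) ^ g_i(q) = 2^e
        survivors = shifts_matching(g0, diffs)
        if len(survivors) == 1:
            return survivors[0], len(diffs)
    raise RuntimeError("no unique shift found within max_faults")


if __name__ == "__main__":
    rng = random.Random(3)
    h = list(range(256))
    rng.shuffle(h)
    print(recover_shift(h, k0=0x3C, ki=0xA7, rng=rng))
```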
5.4.3 Third Attack
First, we describe the scenario in which the attack takes place. We assume that A can
flip a specific bit at position e of the intermediate state $p^{(1),(SB)}$. We do not assume that
collision information remains valid over the time span of the attack. Hence, A is only able
to compare collision information of two recently obtained measurements. Finally, we assume
that the smartcard is not protected by a MEM. Because it is always clear from the context
we simplify notation by identifying elements of $\mathbb{F}_{256}$ with their canonical representation as
elements of the set $\{0,\dots,255\}$.
As a basis for his attack A fixes some input difference $\Delta_{in}$ and output difference $\Delta_{out}$ of
the application of the sbox in round 1. To be able to detect collisions with a single bit flip
we restrict $\Delta_{out}$ to be a power of 2.
The analysis of the sbox shows that there are many suitable values for $\Delta_{in}$ and $\Delta_{out}$.
E.g., A chooses $\Delta_{in} = 10$ and $\Delta_{out} = 4$. Only the two pairs
\[ Z_1 := (p_0 \oplus k_0 = 0,\ q_0 \oplus k_0 = 10) \]
and
\[ Z_2 := (p_0 \oplus k_0 = 244,\ q_0 \oplus k_0 = 254) \]
together with their commuted counterparts fulfill the chosen requirements. A fault that is
induced into bit 2 of $q^{(1),(SB)}_0$ after the application of the sbox results in a collision for one of
these pairs. In order to detect such a collision the collision information $f_k$ should have the
property that
\[ f_k(p^{(1),(SB)}_0, -) = f_k(q^{(1),(SB)}_0, 2). \]
If A finds such a collision he can conclude that the key byte $k_0$ is an element of the set
\[ \{p_0 \oplus 0,\ p_0 \oplus 10,\ p_0 \oplus 244,\ p_0 \oplus 254\}. \]
More precisely, the attack using $f_k$ with the property defined above works as follows.
First, A generates all 128 pairs of plaintexts (p, q) (without symmetry) that have difference
10 in byte 0 ($p_0 = q_0 \oplus 10$) and are equal in the other bytes, i.e.,
\[ \Delta(p_i, q_i) = \begin{cases} 10, & \text{if } i = 0 \\ 0, & \text{otherwise.} \end{cases} \]
A knows that exactly two of these pairs have output difference 4 in byte 0. The input
difference of the sbox is the same as the difference of $p_0$ and $q_0$ since AddRoundKey does not
change it. A checks all 128 pairs (p, q) until
\[ f_k(p^{(1),(SB)}_0, -) = f_k(q^{(1),(SB)}_0, 2). \]
Taking the symmetry into account it follows that either $p_0 \oplus k_0 = 0$, $p_0 \oplus k_0 = 10$, $p_0 \oplus k_0 = 244$
or $p_0 \oplus k_0 = 254$. So there are only 4 candidates for $k_0$ left. A can repeat this attack for all
byte positions of the state. This leaves $2^{2 \cdot 16} = 2^{32}$ possible keys. To determine the complete
128-bit AES key A mounts an exhaustive search attack.
Cost Analysis. In the first step A examines 128 pairs of plaintexts with difference 10.
Two of these pairs result in a collision so the expected number of faults A has to induce is
$(2/128)^{-1} = 64$. To compute a 128-bit AES key, A expects to induce $16 \cdot 64 = 1024$ faults
and to mount a brute force attack of size $2^{32}$.
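The sbox analysis behind the pairs $Z_1$ and $Z_2$ can be reproduced with a few lines of code. This is an illustrative sketch only; the function names are hypothetical, and the sbox is recomputed from its definition so that the block is self-contained.

```python
def build_aes_sbox():
    """Compute the AES sbox from its definition (inverse in GF(2^8), then affine map)."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
            b >>= 1
        return r

    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    sbox = []
    for x in range(256):
        b = s = inv[x]
        for r in range(1, 5):
            s ^= ((b << r) | (b >> (8 - r))) & 0xFF  # rotate b left by r bits
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def pairs_with(delta_in, delta_out):
    """Unordered pairs (s, s ^ delta_in) whose sbox outputs differ by delta_out."""
    return sorted({tuple(sorted((s, s ^ delta_in)))
                   for s in range(256)
                   if SBOX[s] ^ SBOX[s ^ delta_in] == delta_out})


def candidates_after_collision(p0, delta_in, delta_out):
    """Key byte candidates once a collision has been observed for plaintext byte p0."""
    values = {v for pair in pairs_with(delta_in, delta_out) for v in pair}
    return sorted(p0 ^ v for v in values)


if __name__ == "__main__":
    print(pairs_with(10, 4))
    print(candidates_after_collision(0xAE, 10, 4))
```

The second call lists the four candidates that remain for the key byte after one detected collision.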
Alternative. To determine the correct candidate of the key byte A could also repeat the
same procedure as above with another difference. We assume that $f_k$ lets A detect collisions
when flipping bit 3, i.e.
\[ f_k(p'^{(1),(SB)}_0, -) = f_k(q'^{(1),(SB)}_0, 3). \]
If we consider all pairs $(p', q')$ such that
\[ \Delta(p'_i, q'_i) = \begin{cases} 5, & \text{if } i = 0 \\ 0, & \text{otherwise} \end{cases} \]
the analysis of the sbox shows that
\[ Z_3 := (p'_0 \oplus k_0 = 0,\ q'_0 \oplus k_0 = 5) \]
and
\[ Z_4 := (p'_0 \oplus k_0 = 122,\ q'_0 \oplus k_0 = 127) \]
are the only pairs with $\Delta_{in} = 5$ and $\Delta_{out} = 8$. Detecting one of these pairs using $f_k$ yields
again a set of 4 candidates for $k_0$.
Next, A computes the difference of plaintexts $p_0$ and $p'_0$. The difference must be one of
the differences listed in Table 5.1. Since all possible differences are distinct, A can determine
$p_0 \oplus k_0$ and hence $k_0$.
Cost Analysis. Following the cost analysis above, this method determines the correct
candidate of each key byte with 1024 faults as in the previous method plus an additional 1024
faults.
                      p0 ⊕ k0
p'0 ⊕ k0        0      10     244     254
    0           0      10     244     254
    5           5      15     241     251
  122         122     112     142     132
  127         127     117     139     129
Table 5.1: All possible differences of p0, p'0
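The decoding step based on Table 5.1 amounts to a table lookup. A minimal sketch, with the hypothetical helper name `decode_key_byte`:

```python
FIRST = (0, 10, 244, 254)   # possible values of p0 ^ k0 (pairs Z1, Z2)
SECOND = (0, 5, 122, 127)   # possible values of p0' ^ k0 (pairs Z3, Z4)

# Table 5.1 as a dictionary: plaintext difference p0 ^ p0' -> (p0 ^ k0, p0' ^ k0)
TABLE = {a ^ b: (a, b) for a in FIRST for b in SECOND}


def decode_key_byte(p0, p0_prime):
    """Recover k0 from the two plaintext bytes for which collisions were detected."""
    a, _ = TABLE[p0 ^ p0_prime]
    return p0 ^ a


if __name__ == "__main__":
    k0 = 0x6D
    print(len(TABLE), hex(decode_key_byte(k0 ^ 244, k0 ^ 122)))
```

Because the 16 entries are pairwise distinct, the dictionary has 16 keys and the lookup is unambiguous.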
5.4.4 Fourth Attack
We assume that A can flip a bit of a specific byte of the intermediate state $p^{(1),(SB)}$. However,
he has no control over the bit position. Instead, we assume that all of the 8 possible bit flips
occur with the same probability 1/8. We also assume that collision information remains valid
over the time span of the attack. Finally, we assume that the smartcard is not protected by
a MEM.
The attack works as follows. In a first step A selects a set S of 256 plaintexts p that take
on all different values in byte $p_0$ and are equal in each other byte. A collects the collision
information $f_k(p^{(1),(SB)}_0, -)$ for all elements of S. Then he chooses an arbitrary plaintext q
and encrypts q inducing a fault into bit e of $q^{(1),(SB)}_0$. By comparing the collision information
$f_k(q^{(1),(SB)}_0, e)$ with the collision information collected in the first step A can determine the
corresponding plaintext $p_0$ such that
\[ S[p_0 \oplus k_0] = S[q_0 \oplus k_0] \oplus 2^e. \]
Note that e is unknown to A since he does not have any influence on the bit position. A can
test all candidates $\hat{k}_0$ of $k_0$ by simply checking if $S[p_0 \oplus \hat{k}_0] \oplus S[q_0 \oplus \hat{k}_0]$ is a power of 2. If this
condition is true A stores $\hat{k}_0$ as a possible key value and discards it otherwise. An analysis
of the AES sbox shows that after checking all candidates a set of at most 16 candidates will
remain. A repeats this procedure with different $q_0$ until only one candidate is left. Using a
refined method similar to the attack in Section 5.4.1 with approximately 3 different $q_0$ we
can determine the correct key byte with high probability. Hence, we expect that this attack
needs roughly $3 \cdot 16 = 48$ faults.
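The candidate filtering of this attack can be sketched as follows. The sketch makes simplifying assumptions: the smartcard is simulated directly from the sbox, $q_0$ and the (unknown) bit position are drawn at random, and the plain intersection of candidate sets is used instead of the refined selection of $q_0$ mentioned above. All function names are illustrative.

```python
import random


def build_aes_sbox():
    """Compute the AES sbox from its definition (inverse in GF(2^8), then affine map)."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
            b >>= 1
        return r

    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    sbox = []
    for x in range(256):
        b = s = inv[x]
        for r in range(1, 5):
            s ^= ((b << r) | (b >> (8 - r))) & 0xFF  # rotate b left by r bits
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def surviving_candidates(p0, q0):
    """Candidates k for which S[p0^k] ^ S[q0^k] is a power of 2 (Hamming weight 1)."""
    return {k for k in range(256)
            if bin(SBOX[p0 ^ k] ^ SBOX[q0 ^ k]).count("1") == 1}


def recover_key_byte(k0, rng, max_faults=1000):
    """Intersect candidate sets over random faults until one candidate is left."""
    candidates = set(range(256))
    for faults in range(1, max_faults + 1):
        q0, e = rng.randrange(256), rng.randrange(8)  # e is not known to the attacker
        target = SBOX[q0 ^ k0] ^ (1 << e)
        p0 = next(p for p in range(256) if SBOX[p ^ k0] == target)  # simulated collision
        candidates &= surviving_candidates(p0, q0)
        if len(candidates) == 1:
            return candidates.pop(), faults
    raise RuntimeError("no unique candidate within max_faults")


if __name__ == "__main__":
    print(recover_key_byte(0x2B, random.Random(0)))
```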
5.4.5 Fifth Attack
We assume that A can flip a bit of a specific byte of the intermediate state $p^{(1),(SB)}$. However,
he has no control over the bit position. Instead, we assume that all of the 8 possible bit flips
in a position $b \in \{0,\dots,7\}$ occur with the same probability 1/8. We do not assume that
collision information remains valid over the time span of the attack. Hence, A is only able
to compare collision information of two recently obtained measurements. Finally, we assume
that the smartcard is not protected by a MEM.
A chooses $\Delta_{in}$ of the sbox in round 1 in such a way that the number of pairs that have
input difference $\Delta_{in}$ and output difference with Hamming weight 1 is maximal. This choice reduces
the number of faults A has to induce as we will see later. An analysis of the sbox shows
that $\Delta_{in} = 216$ is the best choice since 8 is the maximum number of pairs that fulfill the
requirements.
A single bit flip induced into $q^{(1),(SB)}_0$ may produce a collision if and only if $p_0 \oplus k_0$ is one
of the following values:
0, 2, 8, 28, 29, 41, 111, 117, 173, 183, 196, 197, 208, 216, 218, 241.
To detect the collision $f_k$ should have the property that
\[ f_k(p^{(1),(SB)}_0, -) = f_k(q^{(1),(SB)}_0, b). \tag{5.7} \]
A collision implies that $k_0$ is an element of the set of 16 candidates
\[ L = \{p_0,\ p_0 \oplus 2,\ p_0 \oplus 8,\ p_0 \oplus 28,\ p_0 \oplus 29,\ p_0 \oplus 41,\ p_0 \oplus 111,\ p_0 \oplus 117,\ p_0 \oplus 173, \]
\[ \qquad p_0 \oplus 183,\ p_0 \oplus 196,\ p_0 \oplus 197,\ p_0 \oplus 208,\ p_0 \oplus 216,\ p_0 \oplus 218,\ p_0 \oplus 241\}. \]
To determine $k_0$ the adversary A first builds a list of all 128 pairs $(p_0, q_0)$ of plaintexts with
difference 216 in byte 0 and difference 0 in all other bytes. Then A selects an arbitrary
$q_0$, derives $f_k(q^{(1),(SB)}_0, b)$ of the corresponding plaintext and compares it with the collision
information $f_k(p^{(1),(SB)}_0, -)$ of the plaintext corresponding to $p_0$. A repeats this procedure
until he detects a collision. At this point A knows that $k_0$ is an element of the set L.
To identify the correct candidate A could start an exhaustive search or repeat the
procedure with a different combination of input and output differences. For example, A chooses
input difference 4 and output difference 32. Since (88, 92) is the only such pair A can use $f_k$
as a special case of (5.7) having the property
\[ f_k(p^{(1),(SB)}_0, -) = f_k(q^{(1),(SB)}_0, 5) \]
to test each candidate $\hat{k}_0 \in L$ of $k_0$.
To check whether a candidate $\hat{k}_0 \in L$ is equal to $k_0$, A derives the collision information
$f_k(p^{(1),(SB)}_0, -)$ and $f_k(q^{(1),(SB)}_0, b)$ for $p_0 = \hat{k}_0 \oplus 92$ and $q_0 = \hat{k}_0 \oplus 88$. Since (92, 88) is the only
pair with $\Delta_{in} = 4$ and Hamming weight of $\Delta_{out}$ equal to 1, the adversary A can check his hypothesis
$\hat{k}_0$. More precisely, if $\hat{k}_0 \neq k_0$ the Hamming weight of the output difference will always be
greater than 1 except for the case that $p^{(0),(AR)}_0 = 88$ and $q^{(0),(AR)}_0 = 92$. But this case implies
that $\hat{k}_0 \oplus 4 = k_0$ which is impossible since every difference of two of the sixteen candidates
is different from 4. So a wrong hypothesis cannot create a collision. On the other hand, if
$\hat{k}_0 = k_0$ then $p_0 \oplus k_0 = 92 \oplus \hat{k}_0 \oplus k_0 = 92$ and $q_0 \oplus k_0 = 88 \oplus \hat{k}_0 \oplus k_0 = 88$ is the demanded pair
and A will detect a collision using $f_k$.
Cost Analysis. The success probability of finding one of the 8 pairs in part one of the
attack choosing $p_0$ uniformly at random is $\frac{8}{128} \cdot \frac{1}{8} = \frac{1}{128}$. Hence 128 is the expected number
of faults A has to induce.
The success probability in the second step is $(1/8) \cdot (1/16) = 1/128$. So we expect that
A needs an additional 128 faults. Hence the total number of faults to determine a key byte is
$2 \cdot 128 = 256$.
To compute a complete 128-bit AES key we expect that A needs $16 \cdot 256 = 4096$ faults.
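The sbox properties used in this attack can be checked mechanically. The following sketch recomputes the list of 16 values for $\Delta_{in} = 216$, the pair used for $\Delta_{in} = 4$, and simulates the second stage on the candidate list L. The helper names are illustrative, and the second stage is modelled only by the condition under which a single bit flip can produce a collision.

```python
def build_aes_sbox():
    """Compute the AES sbox from its definition (inverse in GF(2^8), then affine map)."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
            b >>= 1
        return r

    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    sbox = []
    for x in range(256):
        b = s = inv[x]
        for r in range(1, 5):
            s ^= ((b << r) | (b >> (8 - r))) & 0xFF  # rotate b left by r bits
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def weight_one_inputs(delta_in):
    """All sbox inputs s for which S[s] ^ S[s ^ delta_in] has Hamming weight 1."""
    return sorted(s for s in range(256)
                  if bin(SBOX[s] ^ SBOX[s ^ delta_in]).count("1") == 1)


def second_stage(k0, candidates):
    """Keep the candidates k for which the plaintext bytes (k ^ 92, k ^ 88)
    can collide after a single bit flip, given the true key byte k0."""
    return [k for k in candidates
            if bin(SBOX[k ^ 92 ^ k0] ^ SBOX[k ^ 88 ^ k0]).count("1") == 1]


if __name__ == "__main__":
    values = weight_one_inputs(216)
    k0 = 0x3D
    p0 = k0 ^ 111                      # a plaintext byte for which a collision occurs
    L = [p0 ^ v for v in values]       # the 16 candidates for k0
    print(values)
    print([hex(k) for k in second_stage(k0, L)])
```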
5.5 Conclusion
In this chapter we introduced the concept of fault based collision attacks. We showed that
combining the concepts of fault attacks and collision attacks leads to powerful attacks. Fault
based collision attacks do not need faulty ciphertexts but only need collision information. It
turned out that this is a much weaker requirement.
Furthermore, we considered so called memory encryption mechanisms (MEM), an
elaborate countermeasure widely used to protect high-end security smartcards against side
channel attacks. We showed that using a MEM in a straightforward manner does not increase
security as much as one would expect. E.g., we presented a fault based collision attack on
AES that breaks an implementation protected by a MEM by inducing only about 285 faults.
Moreover, we showed how to mount further fault based collision attacks on AES in different
scenarios. Table 5.2 shows an overview of the 5 attacks presented in this chapter. The first
row shows the precision of the fault induction needed for each of our attacks. The second
row shows whether the collision information is valid over the whole time span of the attack
or if it changes after a short period of time. The third row shows if the target smartcard is
protected by a MEM. The expected number of faults needed for the attack is shown in the
last row.
                          basic attack   attack 2   attack 3   attack 4   attack 5
Precision                 high           high       high       loose      loose
coll. information valid?  yes            yes        no         yes        no
MEM                       no             yes        no         no         no
# faults                  32             285        1024       48         4096
Table 5.2: Overview of the fault based collision attacks
To thwart our attacks one has to be more careful. When using a MEM one has to ensure
that different memory encryption functions (keys) are used to protect different bytes of an
intermediate state. Furthermore, we suggest changing the keys of the memory encryption
frequently. Depending on the smartcard and the application one can also consider increasing
the block size of the memory encryption function, e.g., to 16 bits. This would increase the
complexity of fault based collision attacks.
For high-end security applications we suggest using a randomization strategy like the one
proposed in Chapter 4. Obviously, this approach is more expensive in terms of random bits.
However, it provides much better security that can be scaled to meet the desired security
level.
Chapter 6
Cache Behavior Attacks (CBAs)
The performance of recent computers benefits from the progress in chip design and computer
architecture. For example, the usage of fast but small buffers, so called cache memories, improves
the execution time of algorithms significantly. At first glance this helps to improve security
because even more complex cryptographic algorithms, e.g., encryption algorithms, could be
used without slowing down the system too much. However, performance improvements often
also open side channels that leak information about intermediate states of the encryption
process. In this chapter we analyze and formalize the information leakage due to cache
behavior.
It was first observed in (Hu 1992) and (Trostle 1998) that cache behavior opens a covert
channel. They did not focus on attacking cryptographic algorithms but analyzed the
multilevel security of complex systems. Later, (Kocher 1996) and (Kelsey, Schneier, Wagner and
Hall 1998) were the first to mention that cache behavior may be a possible point of
attack for cryptographic algorithms. During the selection process of AES the resistance of the
candidate algorithms against side channel attacks was investigated for example in (Daemen
and Rijmen 1999). At this time only time and power consuming operations like
multiplication were taken into consideration. Table lookups, e.g., for efficient application of sboxes, were
regarded as resistant against side channel attacks since they were supposed to take constant
time and consume constant power. However, this turned out to be wrong.
The first theoretical cache behavior attack (CBA) was mounted on DES and presented
in (Page 2002). Later the authors of (Tsunoo, Saito, Suzaki, Shigeri and Miyauchi 2003c)
proved that cache attacks are a realistic threat for cryptographic algorithms. They performed
a cache based attack on DES that successfully determined the secret key. Page extended the
theoretical concept of CBAs in (Page 2003). He started to classify CBAs into time driven
CBAs and trace driven CBAs depending on the attacker's abilities. The subsequent publications of
practical attacks against AES (Bernstein 2005), (Osvik, Shamir and Tromer 2006), (Brickell,
Graunke, Neve and Seifert 2006) and RSA (Percival 2005) revealed the full power of cache
behavior attacks. These attacks even justify introducing a new class of CBAs, so called
access driven CBAs.
In this chapter we give the background of CBAs and present the progress in the area of
CBAs up to now. After that we present a different view on how to counteract CBAs that
leads to novel countermeasures. A more detailed description of the structure of this chapter
is as follows:
Section 6.1: Cache Mechanism and Technical Background
We give a brief summary of the memory management, i.e., the cache mechanism of
recent computers. All technical details that are necessary to understand CBAs and
countermeasures are explained here.
Section 6.2: Security Models for CBAs
In this section we describe the theoretical foundations of CBAs. To analyze attacks and
countermeasures one has to define the abilities of the attacker and the properties of the
underlying implementation. We distinguish three different models: time driven, trace
driven, and access driven CBAs. We propose to use a strengthened variation of the
access driven model as a basis for security analysis and for developing countermeasures.
Section 6.3: Access Driven CBAs on AES
In this section we describe two concrete CBAs on AES. The first one is due to (Osvik
et al. 2006). It is based on the first round(s) of AES. The second attack is more efficient
and only focuses on the last round of AES. The differences of these attacks lead to a
new countermeasure that we present in Section 6.7.2.
Section 6.4: General Methods to Thwart CBAs
This section provides a list of methods to thwart cache behavior attacks proposed so
far.
Section 6.5: Information Leakage and Resistance
In this section we introduce the concept of information leakage and the concept of
resistance to estimate the susceptibility of an implementation. Information leakage
makes it possible to estimate the uncertainty of an attacker about the secret key that remains
after successfully mounting a CBA. The resistance is a measure that indicates the
expected effort for an adversary to derive some information about the secret key.
Section 6.6: Information Leakage and Resistance of Selected Implementations
In this section we examine the information leakage and the resistance, as defined in the
preceding section, of selected implementations of AES against access driven CBAs. Besides
well known implementations we also consider new implementations of AES designed to
counteract CBAs. We show that one of the new implementations is provably secure even in
our strengthened access driven CBA model.
Section 6.7: Countermeasures Based on Permutations
The usage of random permutations is one of the countermeasures proposed in the
literature. We analyze the security a random permutation provides by describing an
attack on an AES implementation protected by using a random permutation. In the
sequel, we introduce so called distinguished permutations. A distinguished permutation
is a permutation having a special property that ensures that some key bits are protected
unconditionally. This is an improvement over the usage of general permutations that
leak all bits of the secret key as our attack in Section 6.7.1 shows.
Section 6.8: Concluding Remarks
Finally, we recapitulate CBAs and countermeasures. We describe how to combine the
proposed countermeasures to improve the ratio of security and efficiency.
6.1 Cache Mechanism and Technical Background
In this section we introduce the technical background of cache based attacks. A thorough
treatment of computer architecture and memory management is given in (Hennessy and
Patterson 2002). (Handy 1998) addresses the topic of cache memories in even more depth from
a processor designer's view.
The processor (CPU) and the main memory (RAM) are the two main building blocks
of recent computers that play an important role in cache based attacks. The CPU has
only a few, but very fast, so-called CPU registers (short: registers), each having the size of a
processor word, e.g. 32 or 64 bits. To process data that is stored in the RAM, the data has
to be transferred to the CPU registers. Hence, RAM should have at least two properties:
1. RAM should be large so that it can store a lot of data.
2. RAM should be fast so that data can be accessed and processed quickly.
However, with recent technology these two properties are contradictory. Memory that has to
be fast is necessarily restricted to a small size, and memory that has to be large is necessarily
slow. To compensate for this discrepancy, modern computers use a hierarchy of typically
4 different levels of memory that differ in size and speed. The CPU registers are placed in
level 1 of the memory hierarchy. They have the shortest access time but are limited altogether
to less than 1 KB. To compensate for the rather slow accesses to the main memory placed in level
3, the so-called cache memory (short: cache) is placed in level 2. Cache is much faster than
the main memory but its size is restricted to a few megabytes. So, cache memory constitutes
a trade-off between the small but fast CPU registers and the large but slow main memory [1].
The hard drive is placed in level 4 of the memory hierarchy. It is orders of magnitude larger
but also orders of magnitude slower than the other types of memory. However, the hard
drive has no influence on CBAs. Table 6.1 shows the memory hierarchy of recent computers
and gives an overview of the typical sizes and the typical access times of the memories at the
different levels.
[1] Note that in recent CPUs the cache memory (level 2) is again split into different so-called
cache levels that differ in size and speed. However, in order to simplify descriptions we stick to
the simpler situation with only a single cache level.
The cache memory is divided into d so-called cache lines, each of size λ bits. The set of
cache lines is partitioned into m so-called cache sets, each containing exactly d/m cache lines [2].
Likewise, the memory is divided into so-called memory blocks of size λ bits. The memory
blocks are labeled with consecutive numbers referred to as the address of the memory block.
Every transfer of data from the main memory is redirected through the cache. Whenever
data should be transferred to the CPU it is checked whether the data is already in the cache
or not. If the requested data is not in the cache, the whole memory block B that contains
the data is first loaded from the main memory to the cache and then the data is transferred
to the processor. This is called a cache miss. To which cache set and cache line the data is
transferred depends on the address M of the requested data. M is split into t so-called tag
bits, s so-called set bits and b offset bits as depicted in Figure 6.1. The set bits determine
the cache set. According to a placement strategy, one of the cache lines in the determined
cache set is chosen to host the data of the memory block B. The tag bits are also stored as
meta information about the content of the cache line. Since the cache is much smaller than
the main memory, the previous content of the chosen cache line has to be overwritten. This
is called data eviction.
On the other hand, if the cache already contains the requested data, it is transferred directly
from the cache to the processor, avoiding the access to the slow main memory. This
is called a cache hit. Hence, if a process uses certain data more often, after the first access
the data resides in the cache and can be quickly transferred to the processor. To find the
requested data in the cache, the address M is split as above into set bits, tag bits and offset
bits. The set bits determine the correct cache set S. The data is contained in that cache line
of S whose tag bits match the tag bits of the requested data.
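The address split and the hit/miss behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real processor: the class name SetAssociativeCache, byte addressing, power-of-two parameters and the least-recently-used rule for choosing the line to evict are all assumptions made for illustration (the text only speaks of "a placement strategy").

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache with m = num_sets cache sets of d/m = ways lines each (hypothetical model)."""

    def __init__(self, num_sets, ways, line_bytes):
        # Assumption: num_sets and line_bytes are powers of two.
        self.num_sets = num_sets
        self.ways = ways
        self.offset_bits = line_bytes.bit_length() - 1  # b offset bits
        self.set_bits = num_sets.bit_length() - 1       # s set bits
        # One ordered dict per cache set, mapping tag -> None,
        # ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def split(self, address):
        """Split an address into (tag bits, set bits, offset bits)."""
        offset = address & ((1 << self.offset_bits) - 1)
        set_index = (address >> self.offset_bits) & (self.num_sets - 1)
        tag = address >> (self.offset_bits + self.set_bits)
        return tag, set_index, offset

    def access(self, address):
        """Return True on a cache hit; on a miss, load the block and return False."""
        tag, set_index, _ = self.split(address)
        lines = self.sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # data eviction (assumed LRU strategy)
        lines[tag] = None
        return False


# Direct mapped cache (one line per set) with 4 sets and 64-byte lines.
cache = SetAssociativeCache(num_sets=4, ways=1, line_bytes=64)
print(cache.split(0x1234))
print([cache.access(a) for a in (0x1234, 0x1238, 0x1334, 0x1234)])
```

In this sketch the second access hits because 0x1238 lies in the same memory block as 0x1234, while 0x1334 maps to the same cache set with a different tag and therefore evicts that block again.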
A cache that groups d/m cache lines in a cache set is called a (d/m)-way associative cache.
If d/m = 1 then the cache is called a direct mapped cache. If every memory block can be
hosted by every cache line the cache is called a fully associative cache. Figure 6.2 illustrates
the different types of caches. On the one hand, the larger the number d/m of cache lines per
cache set is, the higher is the chance to avoid eviction of data that is still needed. On the
other hand, the larger the number d/m is, the longer it takes to find the requested data in
the cache. Therefore, most recent processors use direct mapped caches. Some processors use
2- or 4-way associative caches.
[2] Note that d is always chosen as a multiple of m.

               register       cache           RAM          hard disc
level          1              2               3            4
typical size   < 1 KB         < 16 MB         < 16 GB      > 100 GB
access time    0.25-0.5 ns    0.5-25 ns       80-250 ns    5 ms
hit time       -              1-2 cycles      100 cycles   10,000,000 cycles
miss penalty   -              25-100 cycles   -            -
Table 6.1: The memory hierarchy

+----------------------------------------------------+
|                    data address                    |
+----------------------------------+-----------------+
|       memory block address       |   offset bits   |
+----------------+-----------------+-----------------+
|   t tag bits   |   s set bits    |  b offset bits  |
+----------------+-----------------+-----------------+
Figure 6.1: Partitioning the address of requested data
6.2 Security Models for CBAs
In this section we present general principles of how to exploit the knowledge about the
cache behavior to determine information about intermediate results of an algorithm. In turn,
this information can be used to derive information about the secret key of a cryptographic
algorithm. In the next section we describe the basic setting and the basic abilities of the
adversary, referred to as the fundamental model. After that we present three different threat
models for CBAs that are based on the fundamental model.
Figure 6.2: Different types of cache memory (panels: main memory, direct mapped cache,
2-way associative cache, fully associative cache)
6.2.1 Fundamental Model for CBAs
We consider a so-called crypto process running on a computer with cache memory. This
crypto process encrypts (or decrypts) a given plaintext (or ciphertext) using a secret key k.
The adversary A wants to derive information about k by analyzing plaintext/ciphertext pairs.
Depending on the underlying threat model, A gets some additional side channel information
that leaks due to the cache mechanism. That is, we focus on the security problems that arise from
sharing the cache between processes. Reading data of other processes directly is
prevented by the memory management. The only interaction that happens is the mutual
eviction of data.
To be more specific, in the fundamental threat model for CBAs we assume that the
following holds:
Assumption 11 A knows all technical details about the underlying cryptographic
algorithm and its implementation (Kerckhoffs’ extended principle).
Assumption 12 Every memory block of the sboxes is mapped to a different cache
line. I.e., the applications of the sboxes do not cause any data eviction of sbox data.
Assumption 13 During the attack only cache accesses caused by the encryption
occur.
Assumption 14 In the beginning of an encryption / decryption no sbox data is
stored in the cache.
Assumption 15 A can feed the crypto process with known (or chosen) plaintexts
(or ciphertexts) and obtains the corresponding ciphertexts (or plaintexts).
Discussion of the Fundamental Model
In the following we discuss and justify the
fundamental model for CBAs as given above. The possible variations of how to implement a
cryptographic algorithm efficiently are rather limited. Hence, the security of an implementation
should not rely on keeping implementation details secret. We call this Kerckhoffs’ extended principle
in reference to Kerckhoffs’ principle (Kerckhoffs 1883).
In this thesis we focus on the symmetric cipher AES [3]. Besides the standard implementation
we consider several variations of the fast implementation of AES as described in Section 2.4
(page 16). This implementation uses 5 sboxes $T_0, \dots, T_4$, each having 256 entries of size 4
bytes.
Since we focus on information leakage due to table lookups, we assume that A knows
the position of these sboxes in the memory and the possible cache lines they can be mapped
to. Recent processors possess several megabytes of cache memory that can hold the much
smaller sbox data completely. Hence, an access to some sbox data cannot evict other sbox
data. To simplify the description further, we assume that each sbox is mapped consecutively
into the cache memory. Let v be the number of sbox elements that can be stored in a single
cache line. For each sbox $T_j$, $0 \le j \le 4$, and $0 \le i \le \lceil 256/v \rceil - 1$, let $CL^j_i$ denote
the cache line that contains the following elements:
\[ CL^j_i = [\, T_j[i \cdot v], \dots, T_j[i \cdot v + v - 1] \,]. \]
If it is clear from the context which sbox is referred to, we simply write
\[ CL_i = [\, S[i \cdot v], \dots, S[i \cdot v + v - 1] \,]. \]
Furthermore, for an index x of an sbox entry let $\langle x \rangle$ denote the index of the cache line that
stores x. For example, $\langle T_j[i \cdot v + 1] \rangle = i$. Remember that A knows all technical details
about the implementation, in particular the position and the mapping of the sboxes to the
cache lines. Hence, A can compute $\langle x \rangle$ for every x efficiently.
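Under the simplifying assumption of a consecutive mapping, the cache line index of an sbox entry is just an integer division. The following sketch makes this concrete; the function names are hypothetical and chosen only for illustration.

```python
def num_cache_lines(v, entries=256):
    """Number of cache lines an sbox with `entries` elements occupies: ceil(entries / v)."""
    return -(-entries // v)


def cache_line_index(x, v):
    """Index <x> of the cache line that stores the sbox entry with index x."""
    return x // v


def cache_line_entries(i, v):
    """Indices of the sbox entries stored in cache line CL_i."""
    return list(range(i * v, i * v + v))


v = 16  # assumed number of sbox entries per cache line
print(num_cache_lines(v))
print(cache_line_index(1 * v + 1, v))
print(cache_line_entries(1, v))
```

Since the computation only needs the entry index and v, it illustrates why A can determine the cache line of every sbox entry efficiently.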
State-of-the-art encryption algorithms like AES are very fast. Therefore, it is very unlikely
that the encryption of a single plaintext is accidentally interrupted by another process. This
implies that during the encryption no other process causes cache accesses. Additionally, we
assume that in the beginning of every encryption (decryption) the cache does not hold any
data of an sbox. Hence, cache hits and cache misses only depend on the actual encryption
process. In particular, former encryptions do not have any influence on the cache content.
Note that Assumptions 12 through 14 of the fundamental CBA model above increase
the strength of an adversary. They allow a simpler analysis and a simpler description of CBAs
but are not essential for an attack to be successful in principle. However, if Assumptions
12 through 14 do not hold, the complexity of an attack increases.
In (Page 2003) two general approaches are given to classify models of cache behavior
attacks: the trace driven CBA and the time driven CBA.
[3] However, all the analysis can be adapted to implementations of virtually any block cipher that
uses table lookups.
6.2.2 Time Driven CBA
As described in Section 3.2.1 it is possible to use timings of encryptions to determine
information about the secret key. The classical timing attacks on RSA (Kocher 1996, Dhem
et al. 1998) and AES (Koeune and Quisquater 1999) are based on data-dependent timings
of certain operations during the encryption, e.g., multiplication. However, in modern block
ciphers like DES and AES, a complex function like the non-linear substitution is usually
realized via table lookups. In the AES selection phase it was not clear how to mount side channel
attacks based on table lookups. Table lookups were regarded as constant time operations
and therefore as resistant against timing attacks, see (Daemen and Rijmen 1999).
As it turned out, this is not true for implementations running on computers with cache.
On computers with cache, table lookups to some indices will cause a cache hit while table
lookups to other indices cause a cache miss. An element of an sbox that is already stored in
the cache can be accessed faster than an element that is not stored in the cache. The index
of such an sbox lookup depends on values of intermediate results that again depend on the
plaintext and the secret key.
Hence, values of intermediate results indirectly influence the running time of the
algorithm, even for table lookups. These data-dependent timings can be statistically analyzed
by an attacker A to derive information about intermediate states. In turn, information
about intermediate states lets A deduce information about the secret key k. Hence, there is
information leakage due to the cache behavior of the cryptographic algorithm.
Threat Model for Time Driven CBAs
The threat model for time driven CBAs is based
on the fundamental threat model presented in Section 6.2.1 (page 76). For a time driven CBA
to be successful, the following assumptions must be valid:
Assumption 18 It is more likely that an encryption of a plaintext that causes only
few cache misses has a short running time than an encryption of a plaintext that
causes more cache misses.
Assumption 19 A is able to measure the time an encryption takes with reasonable
precision.
Discussion of the Threat Model
Assumption 18 specifies the relation between the cache
behavior and the overall encryption time. The impact of a single cache hit or miss on the
encryption time depends on the underlying hardware. To mount a time driven CBA we
assume (Assumption 19) that the attacker can measure the encryption time and that the precision
of these measurements is sufficient for a statistical analysis of the timings to verify whether a
cache hit or miss occurred during a certain step of the encryption. In general, an attacker A
does not need complex equipment or techniques to measure the running time with reasonable
precision. For example, modern processors provide so-called performance registers, e.g., to
measure timings of processes with a resolution in the range of clock cycles. A description of
how to use the performance registers, e.g., for time measurements, is given in (Intel 2006).
Basic Structure of Time Driven CBAs
The basic structure of a time driven CBA
follows the structure of general side channel attacks. In order to determine information
about the i-th byte $k_i$ of the secret key k, the attack consists of the two steps shown in
Figure 6.3.
measurement step: An attacker A chooses a set S of $n \in \mathbb{N}$ arbitrary but different
   plaintexts $p^{(1)}, \dots, p^{(n)}$. For each of these plaintexts $p^{(j)}$, A measures the time $t^{(j)}$
   the crypto process needs to encrypt $p^{(j)}$.
analysis step: To test a hypothesis $\hat{k}_i$ for the i-th byte $k_i$ of the secret key k, the attacker
   A uses the following method, which is based on the method described in (Dhem et al. 1998).
   1. A reproduces a part of the encryption of $p^{(j)}$ assuming that $\hat{k}_i$ is correct. In
      particular, A computes a certain intermediate result $\hat{x}^{(j)}$ of the encryption
      of $p^{(j)}$ that only depends on the plaintext, the candidate $\hat{k}_i$ and possibly on
      other parts of the key that A already knows. E.g., in AES this could be a
      byte of the state after the first application of the sbox.
   2. Furthermore, A simulates the cache behavior of the encryption on that
      computer. Hence, A determines the number $z^{(j)}$ of cache misses that occur during
      the computation of $\hat{x}^{(j)}$. Let
      \[ M = \frac{1}{n} \cdot \sum_{j=1}^{n} z^{(j)} \]
      denote the average number of cache misses taken over all $z^{(j)}$.
   3. A partitions the set S of plaintexts into two sets $S_s$ and $S_l$ as follows. A
      plaintext $p^{(j)}$ is placed in set $S_s$ if the number of cache misses that occur
      during the computation of $\hat{x}^{(j)}$ is less than M. Otherwise $p^{(j)}$ is placed in set
      $S_l$.
   4. A computes the mean encryption times $M_s$ and $M_l$ of the plaintexts in $S_s$ and
      $S_l$ as
      \[ M_s = \frac{1}{|S_s|} \sum_{p^{(j)} \in S_s} t^{(j)} \]
      and
      \[ M_l = \frac{1}{|S_l|} \sum_{p^{(j)} \in S_l} t^{(j)}. \]
      If $M_s$ and $M_l$ differ significantly, A concludes that the candidate $\hat{k}_i$ is correct.
      Otherwise, A concludes that the candidate is wrong.
Figure 6.3: Basic structure of a time driven CBA
To see why the attack works we first consider the case that the candidate $\hat{k}_i$ is correct.
Hence, it is more likely that $z^{(j)}$ matches the number of cache misses that occur during
the computation of $x^{(j)}$ in the encryption. Due to Assumption 18, it is more likely that an
encryption of a plaintext $p^{(j)}$ that causes only few cache misses while computing $x^{(j)}$ has
a shorter running time than an encryption that causes many cache misses. Therefore, $M_l$
should be significantly larger than $M_s$.
If $\hat{k}_i$ is not correct it is likely that the $z^{(j)}$ are not the correct numbers of cache misses
that occur during the computation of $x^{(j)}$. Hence, the partition of the plaintexts into the
sets $S_s$ and $S_l$ is not entirely determined by the correct number of cache misses. We expect
that the mean times $M_s$ and $M_l$ of both sets do not differ significantly.
The success probability of the attack depends on the precision of time measurements and
on the number n of plaintexts. Improving the precision of the measurements and increasing
the number n of plaintexts increases the success probability.
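The measurement and analysis steps can be illustrated with a small simulation. The following is a sketch under strong simplifying assumptions, not an attack on AES: the "encryption" consists of only two lookups T[p0 ^ k0] and T[p1 ^ k1] into one table, the cache is empty at the start, the key byte k0 is assumed to be known already, the timing model (base time, miss penalty, Gaussian noise) is invented, plaintexts are drawn at random and may repeat, and instead of a significance test the candidate with the largest difference M_l - M_s is selected. All names are hypothetical.

```python
import random
from statistics import mean

V = 16             # assumed number of sbox entries per cache line
MISS_PENALTY = 10  # assumed extra time units per cache miss


def misses(p0, p1, k0, k1):
    """Cache misses of the lookups T[p0 ^ k0], T[p1 ^ k1], starting with an empty cache."""
    return 1 if (p0 ^ k0) // V == (p1 ^ k1) // V else 2


def measure(plaintexts, k0, k1, rng):
    """Measurement step: simulated noisy encryption times t^(j)."""
    return [100 + MISS_PENALTY * misses(p0, p1, k0, k1) + rng.gauss(0, 5)
            for p0, p1 in plaintexts]


def score(plaintexts, timings, k0, guess):
    """Analysis step for one candidate: the difference M_l - M_s of the mean timings."""
    z = [misses(p0, p1, k0, guess) for p0, p1 in plaintexts]
    m = mean(z)
    short = [t for t, zj in zip(timings, z) if zj < m]    # set S_s
    long_ = [t for t, zj in zip(timings, z) if zj >= m]   # set S_l
    if not short or not long_:
        return 0.0
    return mean(long_) - mean(short)


rng = random.Random(2007)
k0, k1 = 0x3A, 0xC5  # k0 is assumed known to the attacker, k1 is the target
plaintexts = [(rng.randrange(256), rng.randrange(256)) for _ in range(4000)]
timings = measure(plaintexts, k0, k1, rng)

# Only the upper bits of a candidate influence the cache line, so it suffices
# to test one candidate per cache line.
best = max(range(0, 256, V), key=lambda g: score(plaintexts, timings, k0, g))
print(hex(best), hex(k1 & 0xF0))
```

In this toy setting only the upper four bits of k1 can be recovered, because candidates that differ only in the lower bits predict exactly the same cache misses.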
The first time driven CBAs mounted on DES, AES and several other block ciphers were
published in (Tsunoo, Tsujihara, Minematsu and Miyauchi 2002), (Tsunoo, Kubo, Shigeri,
Tsujihara and Miyauchi 2003a), (Tsunoo, Kawabata, Tsujihara, Minematsu and Miyauchi
2003b), (Tsunoo et al. 2003c), (Tsunoo, Suzaki, Saito, Kawabata and Miyauchi 2003d) and
(Tsunoo, Tsujihara, Shigeri, Kubo and Minematsu 2006).
6.2.3 Trace Driven CBA
In a trace driven CBA the attacker A is more powerful. We assume that A is able to derive the
profile of the cache behavior. That means that for each memory access A gets the information
whether a cache hit or a cache miss occurred. Furthermore, it is assumed that A is able to relate
this information to operations of the encryption. The sequence of operations together with
the information whether a cache hit or miss occurred is called a cache trace. In the sequel,
we present a threat model for trace driven CBAs to formalize the abilities of the attacker.
Threat Model for Trace Driven CBAs
As for time driven CBAs, the threat model
for trace driven CBAs is based on the fundamental model of Section 6.2.1 (page 76). The
fundamental threat model is extended by the ability of the adversary to obtain cache traces
of an encryption.
Assumption 21 A is able to obtain the trace of cache activity.
In order to get a simpler description of the basic structure of trace driven CBAs we assume
that A always gets the correct trace without any distortion. This simplification reduces the
complexity but is not essential for a trace driven CBA to be successful.
Discussion of the Threat Model
Assumption 21 provides the basis of trace driven CBAs.
However, obtaining traces of encryptions is not as easy as simple time measurements. The
attacker needs more sophisticated tools to mount a trace driven CBA. For example, Page
(Page 2002) proposes power analysis or the analysis of electromagnetic radiation as means
to determine cache traces. In (Bertoni, Zaccaria, Breveglieri, Monchiero and Palermo 2005)
the authors show how to obtain cache traces via power analysis.
Basic Structure of Trace Driven Attacks
As for time driven CBAs, the basic structure
of trace driven CBAs follows the structure of general side channel attacks. In order to
determine information about the i-th byte $k_i$ of the secret key k, the attack consists of the 2 steps
shown in Figure 6.4.
measurement step: A chooses a set S of $n \in \mathbb{N}$ plaintexts $p^{(1)}, \dots, p^{(n)}$
   and obtains the cache trace of the encryption of each $p^{(j)}$ as explained
   above.
analysis step: To test a hypothesis $\hat{k}_i$ for a byte $k_i$ of the secret key k, the
   adversary uses the following method.
   1. A reproduces a part of the encryption of $p^{(j)}$ assuming that $\hat{k}_i$ is
      correct. In particular, A computes a certain intermediate result
      $\hat{x}^{(j)}$ of the encryption of $p^{(j)}$ that only depends on the plaintext,
      the key byte $\hat{k}_i$ and possibly on other parts of the key that A
      already knows. E.g., in AES this could be a byte of the state
      after the first application of the sbox.
   2. Furthermore, A simulates the cache behavior of the encryption
      on that computer. Hence, A determines the cache trace that
      occurs during the computation of $\hat{x}^{(j)}$. If the trace of the
      simulated cache behavior that occurs during the computation of
      $\hat{x}^{(j)}$ matches the obtained cache trace, the hypothesis may be
      correct. Otherwise the hypothesis is proven to be wrong.
Figure 6.4: Basic structure of a trace driven CBA
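The elimination of wrong hypotheses by trace matching can be sketched as follows. As before, this is a toy model and not an attack on AES: the traced computation consists of just two lookups T[p0 ^ k0] and T[p1 ^ k1] into one table with an initially empty cache, k0 is assumed to be known to the attacker, traces are assumed to be free of distortion, and all function names are hypothetical.

```python
V = 16  # assumed number of sbox entries per cache line


def trace(p0, p1, k0, k1):
    """Cache trace of the two lookups: 'M' for a cache miss, 'H' for a cache hit."""
    first = "M"  # the cache holds no sbox data at the beginning
    second = "H" if (p0 ^ k0) // V == (p1 ^ k1) // V else "M"
    return first + second


def surviving_candidates(measurements, k0):
    """Keep every candidate for k1 whose simulated trace matches all obtained traces."""
    return [g for g in range(256)
            if all(trace(p0, p1, k0, g) == observed
                   for (p0, p1), observed in measurements)]


k0, k1 = 0x3A, 0xC5  # k0 assumed known, k1 is the target
# Measurement step: one plaintext per cache line of the second lookup.
measurements = [((0x00, p1), trace(0x00, p1, k0, k1)) for p1 in range(0, 256, V)]

remaining = surviving_candidates(measurements, k0)
print([hex(g) for g in remaining])
```

A candidate that contradicts a single obtained trace is discarded for good, which reflects the statement that a mismatch proves the hypothesis wrong while a match only means it may be correct.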
Examples of trace driven CBAs are given in (Page 2002) and (Acıiçmez and Koç 2006).
6.2.4 Access Driven CBA
In this section we present a threat model that is stronger than the models presented above. In
addition to the plaintext/ciphertext pair, the adversary A gets the information which cache
lines were accessed during the encryption. Strengthening the threat model in this way is
justified by the attacks of (Bernstein 2005), (Osvik et al. 2006) and (Neve and Seifert 2006).
These attacks show that cache based attacks are indeed very powerful, even in practice.
Hence, a conservative attitude towards unclear aspects of A’s technical abilities is necessary
to get a reliable analysis.
Threat Model for Access Driven Attacks
Like the models described so far,
access driven CBAs are based on the fundamental threat model of Section 6.2.1 (page
76). We call this threat model the ad CBA model. We extend the fundamental model by
assuming that the following holds:
Assumption 24 A gets the indices of the cache lines that were accessed during the
encryption (decryption). We call this information cache information.
Assumption 25 We explicitly assume that A cannot distinguish between elements
in a single cache line.
The main point is that the adversary A is able to determine information about which
cache lines were accessed during the encryption of a plaintext. To build a strong model we
simplify the determination of accessed cache lines in the following way. We assume that
A simply gets the correct partition of the set D of all cache lines into the set $D_0$ of indices
of accessed cache lines and the set $D_1$ of indices of cache lines that were not accessed
during the encryption of the plaintext p into the ciphertext c. We call this partition cache
information. The triple $(p, D_0, D_1)$ (or $(c, D_0, D_1)$) is called a measurement.
Discussion of the Threat Model
Assumption 24 provides the basis of access driven
CBAs. In (Hu 1992) the author already presented a method to determine the indices of
cache lines that were accessed during a computation. Assuming that A has access to the
computer, he can measure the time it takes to access certain data with reasonable precision.
In contrast to the time driven CBA, A does not need to measure timings of the encryption
process. He only needs to measure the time it takes to access parts of his own data. See
(Intel 1997) for a description of how to do precise time measurements on a PC. To detect
which cache lines have been accessed during the encryption, A can use the Prime-and-Probe
method shown in Figure 6.5.
1. Flush the cache by accessing d memory blocks $B_1, \dots, B_d$ such that $B_i$ is
   mapped to cache line $CL_i$.
2. Trigger the crypto process to encrypt the plaintext p.
3. For each memory block $B_i$, $1 \le i \le d$, do
   (a) measure the time t to access $B_i$
   (b) if t is large then cache line $CL_i$ has been accessed during the encryption
   (c) else cache line $CL_i$ has not been accessed during the encryption
Figure 6.5: Prime-and-Probe method
If, on the one hand, the crypto process accesses the cache line $CL_i$ during the encryption, it
evicts the data block $B_i$ from the cache. Hence, accessing $B_i$ after the encryption causes a
cache miss, which in turn results in a larger access time. On the other hand, if the crypto
process does not access cache line $CL_i$, the data block $B_i$ remains in the cache. Hence,
accessing $B_i$ after the encryption of p causes a cache hit, which allows $B_i$ to be accessed very fast.
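The Prime-and-Probe method can be sketched on a simulated direct mapped cache. The sketch below does not perform real time measurements: access times are replaced by two invented constants for a cache hit and a cache miss, cache lines are numbered from 0, and the class and function names as well as the example victim are hypothetical.

```python
HIT_TIME, MISS_TIME = 2, 100  # assumed access times in cycles
THRESHOLD = 50                # assumed boundary between "small" and "large" times


class DirectMappedCache:
    """Toy direct mapped cache that only records which process owns each cache line."""

    def __init__(self, d):
        self.lines = [None] * d

    def access(self, line, owner):
        """Access a block mapped to `line`; return the simulated access time."""
        hit = self.lines[line] == owner
        self.lines[line] = owner  # on a miss, the previous content is evicted
        return HIT_TIME if hit else MISS_TIME


def prime_and_probe(cache, victim):
    """Return the set D_0 of indices of cache lines accessed by the victim."""
    d = len(cache.lines)
    for i in range(d):                 # 1. fill every cache line with attacker data
        cache.access(i, "attacker")
    victim(cache)                      # 2. trigger the crypto process
    accessed = set()
    for i in range(d):                 # 3. measure the time to access B_i again
        if cache.access(i, "attacker") > THRESHOLD:
            accessed.add(i)
    return accessed


def example_victim(cache):
    """Stand-in for the crypto process: touches cache lines 3, 7 (twice) and 12."""
    for line in (3, 7, 7, 12):
        cache.access(line, "crypto")


d0 = prime_and_probe(DirectMappedCache(16), example_victim)
d1 = set(range(16)) - d0
print(sorted(d0), sorted(d1))
```

The attacker only measures accesses to his own blocks, which matches the observation above that no timing of the encryption itself is needed. Repeated accesses to the same cache line cannot be distinguished from a single access.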
However, we assume that A cannot distinguish elements of a single cache line. Up to
now it is not clear if it is technically possible to distinguish accesses to elements within the
same cache line. No access driven CBA published so far requires this somewhat difficult and
unlikely ability of the adversary A. Obviously, the ability to distinguish elements within the
same cache line would allow even more powerful cache attacks than the attacks published so
far. As we will see, all efficient countermeasures are implicitly based on this assumption.
Basic Structure of Access Driven Attacks
Next we give the general structure of an
access driven CBA to show how an attacker A can use cache information to derive information
about the secret key. The attacker A performs the two steps shown in Figure 6.6.
measurement step: A gets $n \in \mathbb{N}$ measurements $m^{(1)}, \dots, m^{(n)}$ of encryptions of
   plaintexts $p^{(1)}, \dots, p^{(n)}$ with the secret key k. That means, for each plaintext
   $p^{(1)}, \dots, p^{(n)}$ the adversary A knows the partition of the set of all cache lines
   into the set $D_0$ of accessed cache lines and the set $D_1$ of cache lines that
   were not accessed during the encryption.
analysis step: For each measurement $m^{(j)}$ the attacker A analyzes the corresponding
   cache information to compute a set of possible values of an intermediate
   result $x_i^{(j)}$ of the encryption of $p^{(j)}$ that only depends on the plaintext (or
   ciphertext) and on the i-th byte $k_i$ of the (round-)key k. Then A computes
   a set $\hat{K}_i^{(j)}$ of candidates for $k_i$ that would produce one of the possible values
   for $x_i^{(j)}$ during the encryption. Finally, A combines the information of all
   measurements $m^{(1)}, \dots, m^{(n)}$ by computing
   \[ \hat{K}_i := \bigcap_{j=1}^{n} \hat{K}_i^{(j)}. \]
Figure 6.6: Formal outline of an access driven CBA
At this point A has computed a set $\hat{K}_i$ of possible key candidates for $k_i$. He knows that
one of the elements of $\hat{K}_i$ is the correct key byte $k_i$ because $k_i \in \hat{K}_i^{(j)}$ for all $1 \le j \le n$.
Hence, the correct value is also an element of the intersection of all sets $\hat{K}_i^{(j)}$.
Wrong key candidates occur for two reasons. Firstly, each access to a cache line does not
determine the intermediate result exactly but leaves v possible values where v is the number
of sbox elements that are stored in a single cache line. Secondly, there occur sbox lookups
during the encryption that do not compute $x^{(j)}$ directly but also induce cache accesses. We
call these sbox lookups perturbing lookups. Since an adversary cannot decide whether an
sbox lookup is perturbing or not, he has to consider all key candidates that cause an access
to a cache line of the set $D_0$.
The number of the remaining candidates depends on the number of measurements and,
as we will see later, on specific details of the attack. We present an access driven CBA that
can only determine half of the key bits whereas another attack that we present reveals the
complete key.
6.2.5 Extending the Threat Model for Access Driven CBAs
We present an extended threat model that strengthens the adversary compared to
the access driven CBA threat model. We call this model the ead CBA model. In addition to
the assumptions of access driven CBAs as described above, the following assumption holds:
Assumption 27 A can restrict cache information to certain rounds of the encryption.
We assume that the adversary can influence the start and end of a measurement, i.e.,
A can restrict cache information to certain rounds of the encryption. Hence, A can focus
on chosen rounds of the AES encryption (decryption). As we will see, restricting the cache
information to certain rounds decreases the expected number of accessed cache lines. In turn,
this significantly reduces the complexity of access driven CBAs but does not increase the
information that leaks through the cache behavior of the crypto process.
Restricting measurements to certain rounds is justified by the property of modern
multitasking operating systems to change the active process after a constant amount of running
time. For example, see (Stallings 2005) for further details. Hence, it is possible that the
encryption process is interrupted by the attacker's process, allowing A to access the cache
during an encryption (decryption). In (Bernstein 2005) Bernstein already warned that this
property may be exploitable, and the authors of (Brickell et al. 2006) managed to exploit it
to determine cache information of arbitrary rounds on a real PC with reasonable
precision. Later, we will use the ead CBA model to analyze the resistance of implementations
and countermeasures against CBAs.
Table 6.2 compares the three different types of CBAs described above. The first column
indicates how difficult it is to mount the attack. The second column lists how many
measurements have to be done. In the next section we give the descriptions of two access driven
CBAs on AES based on the first round(s) and on the last round.

type            difficulty   complexity
time driven     low          high
trace driven    high         medium
access driven   low          low
Table 6.2: Comparing properties of different CBAs

6.3 Access Driven CBAs on AES
To illustrate the general structure of access driven CBAs in the ead CBA model, in this
section we present two access driven CBAs on AES. The first attack, as presented in (Osvik
et al. 2006), is based on the first round(s) of AES. The second attack is based on the last
round of AES. The idea was mentioned in (Osvik et al. 2006) and (Brickell et al. 2006). In
the sequel we describe both attacks on the fast implementation of AES (see Section 2.4).
Although both attacks work for different sizes of cache lines, we simplify the descriptions by
fixing the size of a cache line to λ = 512 bits. Hence, each cache line can store v = 16 entries
of a large sbox $T_0, \dots, T_4$ and each sbox $T_j$ fits into m = 16 cache lines $CL^j_0, \dots, CL^j_{15}$. For
$0 \le \ell \le 15$ the sbox the attack focuses on is mapped into the cache lines as follows:
\[ C^j_\ell = \{ T_j[x] \mid x = \ell \cdot 16, \dots, \ell \cdot 16 + 15 \}. \]
6.3.1 Access Driven CBA on the First Round
The first CBA is based on intermediate results of the first round. To be more precise, A
focuses on the result of the first application of an sbox in the first round. Since the involved
sbox depends on the index i of the key byte, we only consider the output
\[ x_i = T_{(i \bmod 4)}[p_i \oplus k_i] \]
of the sbox $T_{(i \bmod 4)}$. To simplify notation we simply write
\[ x_i = T[p_i \oplus k_i]. \]
Structure of the Attack
To derive information about the i-th byte $k_i$ of the secret key k, the attacker performs the
following operations according to the basic structure of access driven CBAs shown in Section
6.2.4 (page 83):
1. A chooses $n \in \mathbb{N}$ plaintexts $p^{(1)}, \dots, p^{(n)}$ that are fixed in byte $p_i^{(j)}$ and are independent
   and uniformly distributed in the other bytes.
2. A obtains measurements $m^{(j)} = (D_0^{(j)}, D_1^{(j)}, p^{(j)})$ for $1 \le j \le n$.
3. A concludes that
   \[ x \in \hat{X}^{(j)} := \bigcup_{\ell \in D_0^{(j)}} \{ \ell \cdot 16, \dots, \ell \cdot 16 + 15 \}. \]
4. A computes the sets
   \[ \hat{K}_i^{(j)} = \{ p_i^{(j)} \oplus \hat{x}_i^{(j)} \mid \hat{x}_i^{(j)} \in \hat{X}^{(j)} \} \]
   for all $1 \le j \le n$.
5. A computes the set
   \[ \hat{K}_i = \bigcap_{j=1}^{n} \hat{K}_i^{(j)} \]
   of candidates for $k_i$.
Discussion of the Attack
Let us assume that A can restrict the measurements to the
first round. $D_0^{(j)}$ is the set of the indices of those of the 16 cache lines that were accessed during
the 4 applications of $T_{(i \bmod 4)}$ in round 1 of the encryption of the plaintext $p^{(j)}$. Hence,
the correct key byte $k_i$ is an element of every $\hat{K}_i^{(j)}$. Remember that a cache line can store
v = 16 elements of an sbox. Hence, depending on the plaintexts $p^{(1)}, \dots, p^{(n)}$, the remaining
set of key candidates $\hat{K}_i$ contains at most $4 \cdot 16 = 64$ elements if all n measurements cause
the access of the same 4 cache lines. However, fixing byte $p_i$ of the plaintexts and choosing
all other bytes uniformly at random lets A determine the cache line ℓ that is accessed while
computing x after only a few measurements. Knowing ℓ lets A reduce the number of possible
key candidates to 16. To see why at least 16 key candidates will survive this attack we look
at the structure of the elements of a set $\hat{K}_i^{(j)}$. The elements of $\hat{K}_i^{(j)}$ are always of the form
$p_i^{(j)} \oplus (\ell \cdot 16), \dots, p_i^{(j)} \oplus (\ell \cdot 16 + 15)$. That means that the elements of each $\hat{K}_i^{(j)}$ restricted to the 4
lower bits take on all 16 possible values. It follows that the attack is not able to determine
the lower 4 bits of the key byte and hence $2^4 = 16$ candidates for $k_i$ remain. In the case that
A cannot restrict the cache information to the first round, the set $D_0$ also contains indices of
cache lines that were accessed by the perturbing lookups in rounds 2 to 9. Hence, it
will take more measurements to determine information about the key. The total amount of
information that A gets is again the upper 4 bits of each key byte.
To determine the remaining bits of each key byte one can combine this attack with a
modified attack on the second round to compute the complete key.
6.3.2 Access Driven CBA on the Last Round
In this section we describe a CBA that is based on an intermediate result⁴
$$x_i = S^{-1}[c_i \oplus k_i^{10}]$$
where A uses cache information about the sbox lookup of the last round to determine the
secret key $k$.
Basing the attack on the last round has advantages over the attack on the first round.
First, cache information of the last round is sufficient to determine all bits of the secret key,
so A does not need to attack different rounds. Another advantage is that the sbox $T_4$ of the
last round is special and is only used in that round. This helps the attacker because cache
information is never perturbed by cache accesses of other rounds. The cache information is
restricted to the last round automatically.
For the sake of simplicity, we only show how to compute a single byte $k_i^{10}$ of the last round
key $k^{10}$. However, the same strategy can be applied to determine the other key bytes of $k^{10}$.
Knowing all key bytes of the last round key allows A to invert the key schedule and compute
the cipher key $k$. As mentioned above, we fix the size of a cache line to $\lambda = 512$ bits and only
consider the sbox $T_4$ of the fast implementation of AES as described in Section 2.4 (page 16)
since it is widely used in common crypto libraries like openssl (OpenSSL Project 2005). We
denote the $j$-th cache line used for the table lookups for $T_4$ by $CL_j$, $j = 0, \ldots, 15$. Hence,
$CL_j$ contains the 4-tuples
$$\{(S[x], S[x], S[x], S[x]) \mid x = 16 \cdot j, \ldots, 16 \cdot j + 15\}$$
as defined in Section 2.4 (page 16).
⁴ To simplify notation we omitted the ShiftRows operation.

Structure of the Attack The structure of the attack on the 10th round is similar to the
structure of the attack on the first round. To derive information about the $i$-th byte of the
last round key $k^{10}$ the attacker performs the following operations:
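1. A chooses $n \in \mathbb{N}$ plaintexts $p^{(1)}, \ldots, p^{(n)}$ uniformly at random.
2. A obtains the ciphertexts and the measurements $m^{(j)} = (D_0^{(j)}, D_1^{(j)}, c^{(j)})$ for $1 \le j \le n$.
3. A concludes that
$$x_i^{(j)} \in \widehat{X}_i^{(j)} := \bigcup_{\ell \in D_0^{(j)}} \{\ell \cdot 16, \ldots, \ell \cdot 16 + 15\}$$
4. A computes the sets
$$\widehat{K}_i^{(j)} = \left\{ c_i^{(j)} \oplus S\big[\hat{x}_i^{(j)}\big] \;\middle|\; \hat{x}_i^{(j)} \in \widehat{X}_i^{(j)} \right\}$$
for all $1 \le j \le n$.
5. A computes the set
$$\widehat{K}_i = \bigcap_{j=1}^{n} \widehat{K}_i^{(j)}$$
of candidates for $k_i^{10}$.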
If $\widehat{K}_i$ contains only a single element, the adversary has determined $k_i^{10}$. Now it is not hard
to see that the intersection of sets in step 5 eventually will contain only a single element if
every wrong key candidate is missing from at least one of the sets $\widehat{K}_i^{(j)}$. The big difference between the
attack on the first and on the last round is that in step 4 the sbox is involved in computing
the intermediate result. We verified that unlike in the attack on the first round the diffusion
on the bits caused by the sbox lets A detect wrong key candidates. That means that for
every wrong key candidate $\hat{k}$ there exist appropriate choices of plaintexts such that the
resulting set of key candidates does not contain the wrong candidate $\hat{k}$. We will consider
this property of the attack more closely in Section 6.7. Moreover, experiments show that on
average approximately 15 pairs $(p^{(j)}, c^{(j)})$ together with the cache information $D_0^{(j)}$ suffice to
determine the key byte $k_i^{10}$ uniquely.
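The last round attack can be simulated in the same spirit. The sketch below is an assumption-laden model rather than the experiment reported above: the sbox is generated from the AES definition, the 16 inputs of the last round lookups are drawn uniformly at random instead of coming from a real encryption, ShiftRows is omitted as in the text, and all function names are hypothetical.

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox():
    """AES sbox: field inversion (x^254) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)        # yields 0 for x = 0
        y = inv
        for s in range(1, 5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox

SBOX = build_sbox()

def last_round_candidates(k10_i, n, seed=0):
    """Steps 3 to 5 of the last round attack on simulated measurements."""
    rng = random.Random(seed)
    candidates = set(range(256))
    for _ in range(n):
        state = [rng.randrange(256) for _ in range(16)]  # inputs of the 16 T4 lookups
        c_i = SBOX[state[0]] ^ k10_i                      # observed ciphertext byte
        d0 = {x // 16 for x in state}                     # accessed cache lines of T4
        x_hat = {16 * line + t for line in d0 for t in range(16)}
        candidates &= {c_i ^ SBOX[x] for x in x_hat}
    return candidates

print(last_round_candidates(k10_i=0x4F, n=200))
```

The number of measurements (200) is chosen generously so that the simulated intersection collapses to the correct byte; it is not meant to reproduce the average of about 15 pairs stated above.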
6.4 General Methods to Thwart CBAs
In this section we give an overview of countermeasures to harden implementations of
cryptographic algorithms against CBAs as proposed for example in (Page 2003). We give a brief
description and assess each countermeasure with respect to performance and security.
remove cache A straightforward countermeasure to counteract CBAs is to remove or disable
the cache and hence the cache effects. On the one hand it is not clear how to do this
on recent processors. On the other hand, disabling the cache would have devastating
consequences on the performance of implementations.
minimize time accuracy Time driven CBAs depend on the ability of the attacker to mea-
sure timings with reasonable precision. Disturbing timing measurements, e.g., by in-
serting random dummy operations into the encryption process, would increase the effort
for an attacker. However, a time driven CBA may still be feasible.
maximize line size The size of a cache line determines the amount of information that
leaks by a CBA. The larger a cache line is, the lower is the amount of information that
leaks. This also increases the effort needed to perform a CBA but does not necessarily
prevent it.
perform cache warming Warming the cache, that is, loading the whole sbox into the
cache before starting the encryption, was first regarded as an effective countermeasure.
However, Bernstein (Bernstein 2005) questioned its effectiveness and the authors of
(Brickell et al. 2006) managed to defeat this countermeasure.
disable cache flushing Another point of defense could be to prevent an attacker from
flushing the cache. In combination with cache warming this would render all CBAs
useless. However, building this countermeasure needs additional hardware support
that would be very expensive.
cache flushing on every process switch In the analysis of the VAX security kernel (Hu
1992) the author proposes to clear the cache on every process switch. This approach
needs the support of the kernel of the operating system and would obviously close the
cache based side channel. However, even with hardware acceleration the impact on the
performance would be very high.
randomize the instruction order Randomizing the instruction order could also increase
the effort to mount CBAs. Because the attacker cannot associate side channel informa-
tion to certain operations, the number of measurements needed to deduce information
about intermediate states increases. See (May, Muller and Smart 2001a) and (May,
Muller and Smart 2001b).
randomize intermediate states Randomizing intermediate states as described in Chapter
4 obviously thwarts CBAs. Each intermediate result is completely randomized such
that it is independent of the plaintext and the secret key. Hence, even if table lookups
are used to compute intermediate results the information that leaks via a CBA is also
independent of the secret key.
6.5 Information Leakage and Resistance
CBAs are very powerful attacks. Although they seem to be unrealistic and hypothetical
at first sight they were proven to be a real threat for implementations of cryptographic
algorithms on computers with cache. Hence, a strong threat model is essential for a thorough
security analysis. The threat model described above is stronger than the threat models
published so far. The adversary is more powerful because A can restrict the cache information
to a smaller interval of encryption operations. This reduces the number of accessed cache
lines per measurement and increases the efficiency of cache based attacks. The main questions
when analysing the security against CBAs are information leakage and complexity of a CBA.
After giving a formal definition of information leakage we introduce the notion of the so
called resistance of an implementation as a measure that allows one to estimate the complexity
of a CBA.
Information Leakage The most important aspect of an implementation regarding the
security against access driven CBAs is to determine the maximal amount of information
that leaks via access driven CBAs. As we will see, the amount of leaking information about
the secret key varies depending on the details of the CBA and the implementation of the
cryptographic algorithm. We make the following definition:
Definition 3 (information leakage) We consider an adversary who can mount a CBA
using an arbitrary number of measurements. Let $\widehat{K}_i$ be the set of remaining key candidates
for a key byte $k_i^{10}$ at the end of the attack. Then the leaking information is
$$8 - \log_2 |\widehat{K}_i|$$
bits.
The amount of leaking information allows one to estimate the uncertainty of an attacker about
the secret key that remains after a successful access driven CBA. To quantify the maximal
amount of information A can obtain about the secret key by access driven CBAs, we define
$|CL|$ to be the size of a cache line in bits, $|S|$ the number of entries of the sbox and $s$ the size
of a single sbox element in bits. Hence, the number of elements that fits into a cache line is
$|CL|/s$ and the cache information of a single measurement leaks at most
$$\log_2(|S|) - \log_2\left(\frac{|CL|}{s}\right) = \log_2\left(\frac{|S|}{|CL|} \cdot s\right)$$
bits. Depending on the exact nature of an attack, the sets of measurements let the attacker
reduce the number of remaining key candidates after the attack. The information leakage
varies between 0 and 8 bits of information per byte. For example, the attack on the first
round of (Osvik et al. 2006) mounted on the fast implementation can determine at most 4
bits of every key byte regardless of the number of measurements. In contrast, the attack of
(Brickell et al. 2006) based on the last round allows an adversary to determine all key bits.
Furthermore, in Section 6.6 (page 96) we present an implementation that does not leak any
information in our model.
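As a small worked illustration of Definition 3 and of the per-measurement bound, the following sketch evaluates both formulas. The function names and the default line size of 512 bits are assumptions made for this example only.

```python
import math

def leaked_bits(num_candidates):
    """Definition 3: 8 - log2 |K_i| bits leak for one key byte."""
    return 8 - math.log2(num_candidates)

def max_bits_per_measurement(sbox_entries, entry_bits, line_bits=512):
    """Upper bound log2(|S| * s / |CL|) on the leakage of a single measurement."""
    return math.log2(sbox_entries * entry_bits / line_bits)

# First round attack on the fast implementation: 16 candidates remain.
print(leaked_bits(16))                     # 4 bits of the key byte
# One T table of the fast implementation: 256 entries of 32 bits each.
print(max_bits_per_measurement(256, 32))   # the line index reveals 4 index bits
# Standard sbox: 256 entries of 8 bits each.
print(max_bits_per_measurement(256, 8))    # the line index reveals 2 index bits
```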
Complexity of a CBA The information leakage as defined above measures the maximal
amount of information a CBA can provide using an arbitrary number of measurements.
Determining the expected number of measurements an attacker needs to obtain the complete
leaking information depends on the details of the implementation and on details of the CBA.
For simplification we introduce the notion of so called resistance. The resistance focuses on
the general structure of a CBA as shown in Section 6.2.4 (page 83) and does not consider
details of certain CBAs. It is a general measure to estimate the complexity of CBAs on
different implementations.
Definition 4 (Resistance) The resistance of an implementation is the expected number $E_r$
of key candidates that are proven to be wrong during a single measurement that is based on $r$
rounds of the encryption.
The larger $E_r$ the more susceptible is the implementation to access driven CBAs. In
particular, if an implementation does not leak any information then an adversary cannot rule
out key candidates and hence the resistance is 0. To compute $E_r$ we assume that all sbox
lookups are independently and uniformly distributed. This assumption is justified because an
attacker A usually does not have any information about the distribution of the sbox lookups.
Hence, the best he can do in an attack is to choose the parts of the plaintexts/ciphertexts
that are not relevant for the attack uniformly at random.
Let $m$ be the number of cache lines needed to store the complete sbox. Each cache line
can store $v$ elements of an sbox. Furthermore, let $w$ be the number of sbox lookups per
round and let $r$ be the number of rounds the attack focuses on. In an access driven CBA
a key candidate is proven to be incorrect if it causes an access of a cache line that was not
accessed during a measurement. Assuming that all sbox lookups are uniformly distributed
the probability that a cache line is not accessed in all $r \cdot w$ sbox lookups is
$$p_{\mathrm{miss}} := \left(\frac{m-1}{m}\right)^{r \cdot w}.$$
Hence,
$$E_r := \left(\frac{m-1}{m}\right)^{r \cdot w} \cdot m \cdot v \qquad (6.1)$$
is the expected number of key candidates that can be sorted out after a single measurement.
However, the maximal amount of information an arbitrary number of measurements
can reveal is limited by the information leakage. Further measurements will not reveal further
information. We verified by experiments that the number of measurements needed to achieve
the full information leakage only depends on $E_r$.
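Equation (6.1) is easy to evaluate numerically. The following sketch uses hypothetical function names and the parameter names $m$, $v$, $w$, $r$ from the text.

```python
def p_miss(m, w, r):
    """Probability that a fixed cache line is untouched by all r*w uniform lookups."""
    return ((m - 1) / m) ** (r * w)

def resistance(m, v, w, r):
    """Equation (6.1): expected number of key candidates ruled out per measurement."""
    return p_miss(m, w, r) * m * v

# Standard sbox: m = 4 lines, v = 64 entries per line, w = 16 lookups per round.
print(round(resistance(m=4, v=64, w=16, r=1), 2))
# One T table of the fast implementation: m = 16, v = 16, w = 4 lookups per round.
for r in range(1, 10):
    print(r, round(resistance(m=16, v=16, w=4, r=r), 1))
```

With these parameter choices the values agree, up to rounding, with the resistance figures tabulated for the standard and the fast implementation in Section 6.6.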
In the sequel, we focus on methods to counteract CBAs. In general, there are two
approaches to counteract such a side channel. The first approach is to use some kind of
randomization to ensure that the leaking information does not reveal information about the secret
key. Using randomization is a general strategy that protects against several kinds of side
channel attacks, see for example Chapter 4 (page 25). In Section 6.7 we analyze a more
efficient method to counteract CBAs based on random permutations. Before that, we consider
the second approach, which is to reduce the bandwidth of the side channel. We present several
implementations of AES and examine their information leakage and their resistance.
6.6 Information Leakage and Resistance of Selected Implementations
As Bernstein pointed out in (Bernstein 2005), to thwart cache attacks it is not sufficient to load
all sbox entries into the cache before accessing the sbox in order to compute an intermediate
result because A can get cache information at all times. Hence, loading the complete sbox
into the cache does not suffice to hide all cache information. Therefore, he advises to avoid the
usage of table lookups in cryptographic algorithms. Computing the AES SubBytes operation
according to its definition
$$f : \{0,1\}^8 \to \{0,1\}^8, \qquad x \mapsto a \cdot \mathrm{INV}(x) \oplus b$$
would cause virtually no cache accesses and hence seems to be secure against CBAs.
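To make the table-free definition concrete, the sketch below evaluates SubBytes for one byte without any lookup table. It relies on the standard AES facts that $\mathrm{INV}(x) = x^{254}$ in $\mathrm{GF}(2^8)$ with $\mathrm{INV}(0) = 0$, that multiplication by $a$ can be written as the XOR of the inverse with its rotations by 1 to 4 bits, and that $b$ is the constant 0x63. The function names are hypothetical, and Python arithmetic is not constant time, so this only illustrates the arithmetic and is not a hardened implementation.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(x):
    """INV(x) = x^254; this maps 0 to 0 as required by AES."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, x)
    return r

def rotl8(x, s):
    """Rotate a byte left by s bits."""
    return ((x << s) | (x >> (8 - s))) & 0xFF

def sub_byte(x):
    """f(x) = a * INV(x) XOR b, computed without a precomputed table."""
    inv = gf_inv(x)
    y = inv
    for s in range(1, 5):
        y ^= rotl8(inv, s)      # affine matrix a expressed through rotations
    return y ^ 0x63             # constant b

print(hex(sub_byte(0x53)))
```

The 254 field multiplications per byte indicate why such a direct evaluation is much slower than a single table lookup.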
However, implementing SubBytes like this would result in a very inefficient implementation
on a PC. To achieve a high level of efficiency, implementers prefer to use precomputed tables. In the
sequel, we analyze the security of some well known and some novel variations of
implementations of AES. For each of these implementations we consider access driven CBAs based on
different sboxes and examine the information leakage and the resistance as defined in (6.1).
To simplify notation we fix the size of a cache line to 512 bits as we did above. Furthermore,
we did timing experiments for each implementation to estimate its efficiency. The testing
environment for our timing experiments is shown in Table 6.3. For each implementation we
compare its timing with the timing of the fast implementation. Table 6.9 summarizes the
information leakage, resistance and efficiency for all considered implementations.
CPU       Intel Pentium M, 1400 MHz
OS        Linux, Kernel 2.6.18
Compiler  gcc 4.1.1
Table 6.3: Experimental environment
Standard Implementation
The standard implementation as described in Section 2.3 (page 9) uses only the standard sbox
$S$. Hence, an access driven CBA as described above is based on that sbox. The standard
sbox consists of 256 entries each of size one byte. Hence, the sbox can be stored in $m = 4$
cache lines each of which can hold $v = 64$ sbox entries. In each round the sbox is applied
$w = 16$ times. Next, we analyze the susceptibility to access driven CBAs as described above:
Information leakage To determine the number of leaking bits we performed experiments.
Due to the low number $m$ of cache lines and the relatively high number of sbox accesses
per round the probability that a cache line is not accessed in a part of the encryption
becomes very small with an increasing number $r$ of involved rounds. We verified by
experiments that measurements taken over $r \le 3$ rounds of the standard implementation
leak all key bits. Although the small probability $p_{\mathrm{miss}}$ prevents performing further
experiments we assume that even more rounds will leak all key bits.
Resistance As explained above, the probability that a cache line is not accessed during $r$
rounds of an encryption decreases rapidly with increasing $r$. Table 6.4 summarizes the
resistance of the standard implementation for $1 \le r \le 10$.
r     E_r
1     2.57
2     2.57 · 10^{-2}
3     2.58 · 10^{-4}
4     2.58 · 10^{-6}
5     2.59 · 10^{-8}
6     2.59 · 10^{-10}
7     2.60 · 10^{-12}
8     2.61 · 10^{-14}
9     2.61 · 10^{-16}
10    2.62 · 10^{-18}
Table 6.4: The resistance of the standard implementation
E.g., we expect that a single measurement taken over 2 rounds of the encryption allows
to sort out approximately 0.0257 key candidates.
Efficiency The standard implementation uses some time consuming operations such as
matrix multiplication over the finite field $\mathbb{F}_{256}$. Hence, on a 32 bit processor the efficiency
of the standard implementation is obviously lower than the efficiency of the fast
implementation that avoids these inefficient operations. Our timing experiments on a 32 bit
processor have shown that the standard implementation is about 3 times slower than
the fast implementation.
Fast Implementation
The fast implementation as described in Section 2.4 (page 16) is the reference implementation
for virtually all AES implementations in software on 32 bit platforms. Its performance is
based on the clever merge of the round functions SubBytes, ShiftRows and MixColumns into
5 specially constructed sboxes $T_0, \ldots, T_4$. Each of these sboxes holds 256 entries of size 4
bytes. Hence, a cache line can store $v = 16$ sbox elements and we need $m = 16$ cache lines to
store an sbox $T_i$ in the cache. As described above, each of the sboxes $T_0, \ldots, T_3$ is applied
4 times in every round $1, \ldots, 9$ of the encryption. In the last round $T_4$ is applied 16 times.
We consider both a CBA based on table lookups to one of the sboxes $T_0, \ldots, T_3$ in the first
round like the one described in Section 6.3.1 and a CBA based on the sbox $T_4$ of the last
round as described in Section 6.3.2.
Information leakage The access driven CBA of (Osvik et al. 2006) as described in Section
6.3.1 on the first round of AES shows that in this case the fast implementation
reveals only half of the key bits, even with an arbitrary number of measurements. As we have
seen in Section 6.3.2 (page 87) a CBA based on the table lookups to $T_4$ in the last
round lets A determine the secret key completely.
Resistance Due to the bigger size of the sboxes and the lower number of sbox lookups per
round the resistance of the fast implementation is significantly lower than that of the
standard implementation. If the attack is based on sbox $T_4$ then every measurement
is implicitly restricted to the last round because $T_4$ is only used in that round. Hence,
the resistance does not change for measurements restricted to a different number of
rounds. We expect that A can rule out approximately
$$E_r = \left(\frac{15}{16}\right)^{16} \cdot 16 \cdot 16 \approx 91$$
wrong key candidates of a key byte of the last round key after a single measurement.
If the access driven CBA is based on sbox lookups of the first round things are different.
Each sbox $T_0, \ldots, T_3$ is used 4 times in every round $1, \ldots, 9$. In this case, the expected
numbers of wrong key candidates that can be ruled out after a single measurement
taken over $r$ rounds are given in Table 6.5.
Efficiency As the name suggests, the fast implementation is very efficient especially on 32
bit computers. It only consists of sbox lookups, shifts and XOR operations and omits
the complex operations such as matrix multiplication and uses precomputed tables to
compute operations in finite fields.
r     E_r
1     198.0
2     153.0
3     118.0
4     91.2
5     70.4
6     54.4
7     42.0
8     32.5
9     25.1
Table 6.5: The resistance of the fast implementation against access driven CBAs based on
sboxes $T_0, \ldots, T_3$.
Fast Implementation Using Standard Sbox in the Last Round (fast-1)
To improve the security, the authors of (Brickell et al. 2006) suggested replacing the sbox
$T_4$ with the standard sbox in the last round. In the case of a CBA that is based on sbox
lookups of the first round this implementation provides the same information leakage and
resistance as the fast implementation. Therefore, in the sequel we only consider a CBA that
is based on the table lookups of the last round.
Information leakage As for the standard implementation explained above, an access driven
CBA based on the standard sbox used in the last round reveals the complete secret key.
Resistance The resistance of this approach against an access driven CBA based on the
standard sbox is better than that of the fast implementation against an access driven
CBA based on the sbox $T_4$. A single measurement lets A rule out approximately
$$E_r = \left(\frac{3}{4}\right)^{16} \cdot 4 \cdot 64 \approx 2.57$$
wrong key candidates. The resistance remains constant because the standard sbox is
only used in one round.
Efficiency Timing experiments with our implementation of this approach showed that using
the standard sbox in the last round does not slow down the encryption significantly.
Fast Implementation Using only Sbox $T_0$ (fast-2)
We consider another modification of the fast implementation of AES. The description of AES
in Section 2.4 (page 16) shows that the $i$-th entry of the sboxes $T_1, \ldots, T_3$ is equal to the $i$-th
entry of the sbox $T_0$ cyclically shifted by 1, 2 and 3 bytes to the right respectively. Hence, we
propose to use only sbox $T_0$ in the encryption and shift the result as needed to compute the
correct AES encryption. E.g., to compute the sbox lookup $T_1[i]$ using the sbox $T_0$ we simply
cyclically shift the value $T_0[i]$ by 1 byte to the right. In the last round, we recommend to
use the standard sbox. Since we already analyzed the information leakage and resistance of
the standard sbox we focus on a CBA based on the sbox $T_0$.
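The replacement of $T_1, \ldots, T_3$ by byte rotations of $T_0$ can be sketched as follows. The helper names are hypothetical, and the single table entry is an arbitrary stand-in value rather than a real entry of $T_0$; the byte order of the 32 bit words is an assumption of this example.

```python
MASK32 = 0xFFFFFFFF

def rotr32(word, nbytes):
    """Cyclically shift a 32 bit word to the right by nbytes bytes."""
    bits = 8 * (nbytes % 4)
    return ((word >> bits) | (word << (32 - bits))) & MASK32

def lookup(t0, k, i):
    """Emulate the lookup T_k[i] for k = 0, ..., 3 using only the table T_0."""
    return rotr32(t0[i], k)

# Stand-in table with one illustrative entry (not an actual T_0 value).
t0 = [0xAABBCCDD]
print([hex(lookup(t0, k, 0)) for k in range(4)])
```

All four emulated lookups touch the same table, so each round performs 16 lookups into $T_0$ instead of 4 lookups into each of four tables.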
Information leakage Using only the sbox $T_0$ does not change the amount of information
that leaks compared to the fast implementation. Hence, this implementation also
leaks the complete secret key.
Resistance The sbox $T_0$ needs $m = 16$ cache lines each of which stores $v = 16$ elements.
The difference to the fast implementation is that $T_0$ is applied $w = 16$ times in each
round $1, \ldots, 10$. Due to the increased number of sbox lookups per round the resistance
against access driven CBAs is better than the resistance of the fast implementation.
Table 6.6 (page 96) shows the resistance $E_r$ for all values of $r$.
r     E_r
1     91.2
2     32.5
3     11.6
4     4.12
5     1.47
6     5.22 · 10^{-1}
7     1.86 · 10^{-1}
8     6.62 · 10^{-2}
9     2.36 · 10^{-2}
10    8.39 · 10^{-3}
Table 6.6: The resistance of the fast implementation using only $T_0$
Efficiency We implemented this approach and did timing measurements to estimate the
running time. Compared to the fast implementation we could not measure any dif-
ferences in the running time. Hence, this implementation is as efficient as the fast
implementation.
Split Sboxes (small-n)
As a simple but effective countermeasure to counteract access driven CBAs we suggest to split
the sbox $S$ into $n$ smaller sboxes $S_0, \ldots, S_{n-1}$ such that every small sbox $S_i$ fits completely
into a single cache line⁵. An application $S_i[x]$ of sbox $S_i$ yields $d_i$ bits of the desired result
$S[x]$. Hence, the correct result can be calculated by computing all bits separately and shifting
them into the correct position.
We construct the small sboxes $S_i$ for $0 \le i \le n-1$ as follows:
$$S_i : \{0,1\}^8 \to \{0,1\}^{d_i}$$
mapping
$$x \mapsto \lfloor S[x] \rfloor_{\left(\sum_{j=0}^{i-1} d_j,\; \left(\sum_{j=0}^{i} d_j\right) - 1\right)}$$
where $\lfloor y \rfloor_{(b,e)}$ are the bits $y_b \ldots y_e$ of the binary representation of $y = (y_0, \ldots, y_7)$. The small
sboxes are shown in Appendix B (page 115). Instead of applying the sbox $S$ to $x$ directly
each $S_i$ is applied.
The result is computed as
$$S[x] = \sum_{i=0}^{n-1} S_i[x] \cdot 2^{\sum_{j=0}^{i-1} d_j}.$$
In the sequel, we assume that the size of the sbox is a multiple of the size of a cache line and
that all $d_j$ are equal. Depending on the number of small sboxes we call this implementation
small-n. E.g., let the size of a cache line be $\lambda = 512$ bits and for $0 \le i \le 3$ let each $S_i$ store
the bits $\lfloor S[x] \rfloor_{(2i, 2i+1)}$. The result $S[x]$ is then computed as
$$S[x] = S_0[x] \oplus S_1[x] \cdot 4 \oplus S_2[x] \cdot 16 \oplus S_3[x] \cdot 64.$$
We call this implementation small-4.
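The construction can be illustrated with a short sketch, assuming equal widths $d_j = 8/n$ and taking $y_0$ as the least significant bit. The function names are hypothetical, and the table used here is a random byte permutation serving as a stand-in for the AES sbox, since the splitting works for any 8 bit table. Python lists are not packed into cache lines, so the sketch only demonstrates the arithmetic of splitting and recombining.

```python
import random

def split_sbox(sbox, n):
    """Split an 8 bit table into n small tables of d = 8/n bits per entry."""
    d = 8 // n
    mask = (1 << d) - 1
    return [[(y >> (d * i)) & mask for y in sbox] for i in range(n)]

def lookup_split(small, x):
    """Recombine S[x]; every small table is read exactly once per lookup."""
    d = 8 // len(small)
    result = 0
    for i, s_i in enumerate(small):
        result ^= s_i[x] << (d * i)
    return result

# Stand-in table: a random permutation of the bytes, NOT the AES sbox.
stand_in = random.Random(1).sample(range(256), 256)
small4 = split_sbox(stand_in, 4)
print(all(lookup_split(small4, x) == stand_in[x] for x in range(256)))
```

For small-4 each small table holds 256 entries of 2 bits, that is 512 bits in total, which matches the assumed cache line size once the entries are packed.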
Information leakage The amount of information that leaks depends on the number $n$ of
small sboxes. Let us consider the variants small-2, small-4 and small-8. Computing
$S[x]$ using variant small-4 or small-8 leaks 0 bits of information with cache lines of
size 512 bits for two reasons:
1. Every $S_i$ fits completely into a single cache line.
2. For every $x$ each $S_i$ is used exactly once to compute $S[x]$.
Hence, the cache information remains constant for all inputs. An attacker will always
get the information that every cache line has been accessed even if he could restrict
measurements to single sbox lookups. The only assumption that is involved is that A
cannot distinguish between the accesses on different elements within the same cache
line (Section 6.2.4). The variant small-2 presumably leaks all key bits in our setting.
Resistance As we have shown above, the variants small-4 and small-8 leak no key bit.
Hence, even an arbitrary number of measurements does not provide any information
that lets A restrict the number of possible keys. This implies that small-4 and small-8
have resistance 0. The resistance of small-2 is listed in Table 6.7.
⁵ Each sbox should fit into a single cache line at every cache level.
r     E_r
1     3.91 · 10^{-3}
2     5.96 · 10^{-8}
3     9.09 · 10^{-13}
4     1.39 · 10^{-17}
5     2.12 · 10^{-22}
6     3.23 · 10^{-27}
7     4.93 · 10^{-32}
8     7.52 · 10^{-37}
9     1.15 · 10^{-41}
10    1.75 · 10^{-46}
Table 6.7: The resistance of small-2
Efficiency Obviously, the performance depends on the number of involved sboxes and shifts
to move bits into the right position. To estimate the efficiency we used the small-n
variants in the last round of the fast implementation. Due to the inefficient bit manip-
ulations on 32 bit processors our ad hoc implementation of using small-4 only in the
last round shows that the penalty is about 60%. We expect that a more sophisticated
implementation reduces this penalty significantly. However, we stress that access driven
CBAs are very powerful attacks. Hence, it is not astonishing that secure implementa-
tions are not that efficient.
Table 6.8 shows the result of our timing measurements for the variants small-2, small-4
and small-8 applied only on the last round of the fast implementation of AES. Applying
the small variants to more rounds will decrease the efficiency further.
Comparison of Implementations
To compare the implementations considered above with respect to information leakage (IL),
resistance ($E_r$) and efficiency (Eff.) we summarize the important information in Table 6.9.
The detailed explanations were given above.
# sboxes      fast   small-2   small-4   small-8
time factor   1      1.32      1.6       1.95
Table 6.8: Timings for small-2, small-4 and small-8 applied on the last round of AES
column   1                 2             3      4        5                6                 7             8
impl.    standard          fast          fast   fast-1   fast-2           small-2           small-4       small-8
sbox     S                 T_0,...,T_3   T_4    S        T_0              S_0,S_1           S_0,...,S_3   S_0,...,S_7
IL       8                 4/8           8      8        8                8                 0             0
E_1      2.57              198.0         91.2   2.57     91.2             3.91 · 10^{-3}    0             0
E_2      2.57 · 10^{-2}    153.0         91.2   2.57     32.5             5.96 · 10^{-8}    0             0
E_3      2.58 · 10^{-4}    118.0         91.2   2.57     11.6             9.09 · 10^{-13}   0             0
E_4      2.58 · 10^{-6}    91.2          91.2   2.57     4.12             1.39 · 10^{-17}   0             0
E_5      2.59 · 10^{-8}    70.4          91.2   2.57     1.47             2.12 · 10^{-22}   0             0
E_6      2.59 · 10^{-10}   54.4          91.2   2.57     5.22 · 10^{-1}   3.23 · 10^{-27}   0             0
E_7      2.60 · 10^{-12}   42.0          91.2   2.57     1.86 · 10^{-1}   4.93 · 10^{-32}   0             0
E_8      2.61 · 10^{-14}   32.5          91.2   2.57     6.62 · 10^{-2}   7.52 · 10^{-37}   0             0
E_9      2.61 · 10^{-16}   25.1          91.2   2.57     2.36 · 10^{-2}   1.15 · 10^{-41}   0             0
E_10     2.62 · 10^{-18}   25.1          91.2   2.57     8.39 · 10^{-3}   1.75 · 10^{-46}   0             0
Eff.     ∼3                1             1      ∼1       ∼1               1.32              1.6           1.95
Table 6.9: Comparison of selected AES implementations with respect to information leakage
(IL), resistance ($E_r$) and efficiency (Eff.)
The standard implementation leaks all key bits and provides good resistance but low
efficiency. The fast implementation also leaks all key bits and provides low resistance but
good efficiency. The modifications fast-1 and fast-2 inherit the information leakage and the
good efficiency but improve the resistance. Fast-1 improves the resistance against CBAs
that are based on the last round from 91.2 to 2.57. Using the fast-1 implementation a CBA
based on the sboxes $T_0, \ldots, T_3$ is much more efficient than a CBA based on the last round.
Fast-2 uses only one large sbox and hence improves the resistance against all CBAs that
comply with our basic structure of access driven CBAs. Like the implementations mentioned
above, the implementation small-2 leaks all key bits. Its resistance is much better than
the resistance of the implementations mentioned above but its efficiency is rather low. The
implementations small-4 and small-8 do not leak a single key bit and hence provide the best
possible resistance. Like the implementation small-2, the implementations small-4 and small-8
suffer from low efficiency. See Table 6.10 for a simplified comparison of the implementations
considered above.
For applications that require high speed we propose to use the implementation fast-2
because its efficiency is comparable to the efficiency of the fast implementation. However,
one should keep in mind that fast-2 does not thwart access driven CBAs completely but
only increases the complexity of a CBA. In high security applications where it is essential
to thwart CBAs we propose to use the small-4 implementation. It suffers from rather low
efficiency but prevents the leakage of key bits.
implementation   info leakage   resistance   efficiency
standard         8 bit / Byte   +            −
fast             8 bit / Byte   −            +
fast-1           8 bit / Byte   0            +
fast-2           8 bit / Byte   +            +
small-2          8 bit / Byte   ++           −−
small-4          0 bit / Byte   ++           −−
small-8          0 bit / Byte   ++           −−
Table 6.10: Simplified Comparison of Implementations
6.7 Countermeasures Based on Permutations
Another class of countermeasures that was already proposed but not analyzed in (Brickell
et al. 2006) is to use secret random permutations to randomize the accesses to the sbox.
In this section we present a CBA against an implementation of AES secured by a random
permutation that needs roughly 2300 measurements to reveal the complete key (Blömer and
Krummel 2007). This shows that the increase of the complexity of CBAs induced by random
permutations is not as high as one would expect. In particular, the uncertainty of the
permutation is not a good measure to estimate the gain of security. A random permutation
has uncertainty of $\log_2(256!) \approx 1684$ bits and the uncertainty of the induced partition on the
cache lines is $\log_2\left(256!/(16!)^{16}\right) \approx 976$ bits.
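The two uncertainty figures can be checked with a few lines of code. The variable names are chosen for this example only.

```python
import math

# Uncertainty of a uniformly random permutation of 256 byte values.
perm_bits = math.log2(math.factorial(256))

# Uncertainty of the induced partition into 16 cache lines of 16 values each:
# the order of the values inside a cache line is not visible to the attacker.
partition_bits = perm_bits - 16 * math.log2(math.factorial(16))

print(round(perm_bits), round(partition_bits))
```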
On the other hand, we present a subset of permutations, so called distinguished permu-
tations, that reduce the information leakage from 8 bits to 4 bits per key byte. Hence, the
remaining bits must be determined by an additional attack thereby increasing the complexity.
In our standard scenario this is the best one can achieve.
We focus only on the protection of the last round of AES and we assume that the output
$x$ of the 9th round is randomized using some secret random permutation $\pi$. To be more
precise, each byte $x_i$ of the state $x = x_0, \ldots, x_{15}$ is substituted by $\pi(x_i)$. To execute the last
round of AES a modified sbox $T_4'$ that depends on $\pi$ fulfilling
$$T_4'[\pi(x_i)] = T_4[x_i]$$
is applied to every byte $x_i$. This ensures that the resulting ciphertext $c = c_0, \ldots, c_{15}$ is correct.
We denote the $\ell$-th cache line used for the table lookups for $T_4'$ by $CL_\ell$, $\ell = 0, \ldots, 15$. Hence,
$CL_\ell$ contains the 4-tuples
$$\{(S[\pi^{-1}(x)], S[\pi^{-1}(x)], S[\pi^{-1}(x)], S[\pi^{-1}(x)]) \mid x = 16 \cdot \ell, \ldots, 16 \cdot \ell + 15\}.$$
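The defining property $T_4'[\pi(x)] = T_4[x]$ determines the permuted table directly, as the following sketch shows. The function name is hypothetical, and both the permutation and the table contents are random stand-ins; a real $T_4$ would hold the 4-tuples of sbox values.

```python
import random

def permute_table(table, pi):
    """Build T' with T'[pi(x)] = T[x], i.e. T'[y] = T[pi^{-1}(y)]."""
    permuted = [None] * len(table)
    for x, value in enumerate(table):
        permuted[pi[x]] = value
    return permuted

rng = random.Random(7)
pi = rng.sample(range(256), 256)                   # stand-in secret permutation
t4 = [rng.randrange(2 ** 32) for _ in range(256)]  # stand-in for the table T_4
t4_perm = permute_table(t4, pi)
print(all(t4_perm[pi[x]] == t4[x] for x in range(256)))
```

In this model the cache line touched by a lookup for state byte $x$ is `pi[x] // 16`, so the line index no longer reveals the upper bits of $x$ directly.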
Using a permutation $\pi$, information leaking through accessed cache lines does not depend
directly on $x_i$ but only on the permuted value $\pi(x_i)$. Since $\pi$ is unknown to A the application
of $\pi$ prevents him from deducing information about the last round key $k^{10} = k_0^{10}, \ldots, k_{15}^{10}$ directly.
However, in the sequel we will show how to bypass random permutations by using CBAs.
6.7.1 An Access Driven CBA on a Permuted Sbox
We assume that we have a fast implementation of AES that is protected by a random
permutation $\pi$ as described above. We also assume that the adversary A has access to the AES
decryption algorithm. This assumption can be avoided. However, the exposition becomes
easier if we allow A access to the decryption. We show how an adversary A can compute the
bytes $k_0^{10}, \ldots, k_{15}^{10}$ of the last round key.
Let $\hat{k}_0$ denote a candidate for byte $k_0^{10}$ of the last round key. In a first step for each possible
value $\hat{k}_0$ the adversary A determines the assignment $P_{\hat{k}_0}$ of bytes to cache lines induced by $\pi$
under the assumption that $\hat{k}_0 = k_0^{10}$. To be more precise A computes a function
$$P_{\hat{k}_0} : \{0,1\}^8 \to \{0, \ldots, 15\}$$
such that if $\hat{k}_0$ is correct then for all $x$:
$$\pi(x) \in \{16 \cdot P_{\hat{k}_0}(x), \ldots, 16 \cdot P_{\hat{k}_0}(x) + 15\}.$$
I.e., if $\hat{k}_0$ is correct then $P_{\hat{k}_0}$ is the correct assignment of values $\pi(x)$ to cache lines.
Let us fix some $x$ and a candidate $\hat{k}_0$ for $k^{10}_0$. We set $c_0 = S[x] \oplus \hat{k}_0$ and let
$\widehat{M}_0 = \{0, \dots, 15\}$
denote the set of indices of possible cache lines. The adversary $A$ repeats the following steps
for $j = 1, 2, \dots, n$ until $\widehat{M}_0$ contains a single element.
1. $A$ chooses a ciphertext $c^j$, whose first byte is $c_0$, while the remaining bytes of $c^j$ are
chosen independently and uniformly at random.
2. Using his access to the decryption algorithm, $A$ computes the plaintext $p^j$ corresponding
to $c^j$.
3. $A$ triggers an encryption of $p^j$ by the crypto process and obtains cache information.
I.e., $A$ obtains the set $D^j_0$ of cache lines that were accessed when applying sbox $T'_4$
during the encryption of $p^j$.
4. $A$ sets $\widehat{M}_0 := \widehat{M}_0 \cap D^j_0$.
If $\widehat{M}_0 = \{y\}$, then $A$ sets $P_{\hat{k}_0}(x) = y$. Repeating this process for all $x$ yields the function $P_{\hat{k}_0}$
which has the desired property.
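The intersection loop of steps 1 to 4 can be illustrated by a small simulation. This is only a sketch under simplifying assumptions that are ours: the key guess is correct, so the first state byte equals $x$; the other 15 state bytes are modelled as uniform; only the 16 lookups into the permuted table are observed; and the measurement is noise free. The names `measure_lines` and `identify_line` are hypothetical.

```python
import random

def measure_lines(pi, x, rng):
    """Set of cache lines touched by the 16 last-round lookups.

    Assumption: state byte 0 equals x (correct key guess); the remaining
    15 state bytes are modelled as uniformly random.
    """
    state = [x] + [rng.randrange(256) for _ in range(15)]
    return {pi[b] // 16 for b in state}

def identify_line(pi, x, rng):
    """Intersect the observed line sets until one candidate line remains."""
    candidates = set(range(16))
    measurements = 0
    while len(candidates) > 1:
        candidates &= measure_lines(pi, x, rng)
        measurements += 1
    return candidates.pop(), measurements

rng = random.Random(2007)
pi = list(range(256))
rng.shuffle(pi)  # secret random permutation

line, used = identify_line(pi, 0x3A, rng)
print(line, used)
```

Since the line of $\pi(x)$ is contained in every observed set, the surviving candidate is always the correct one; only the number of measurements varies from run to run.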
Under the assumption that the guess $\hat{k}_0$ was correct, the function $P_{\hat{k}_0}$ is the correct
partition of values $\pi(x)$ into cache lines. Remember that the permutation $\pi$ is also used to
scramble the bytes on the other positions. In particular, the mapping of bytes to cache lines
is the same for all positions of the state. Hence, it is not difficult to see that the information
provided by $P_{\hat{k}_0}$ enables the adversary to mount a CBA on the last round similar to the one
described in Section 6.3.2 (page 87). This attack can be used to determine for each possible
candidate $\hat{k}_0$ a set of vectors $\hat{k}_1, \dots, \hat{k}_{15}$ of hypotheses for the other key bytes. To determine
a candidate $\hat{k}_i$ that arises from the value of $\hat{k}_0$ the attacker $A$ performs the following steps:
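1. $A$ chooses $n \in \mathbb{N}$ plaintexts $p^{(1)}, \dots, p^{(n)}$.
2. $A$ obtains the ciphertexts and the measurements $m^{(j)} = (D^{(j)}_0, D^{(j)}_1, c^{(j)})$ for $1 \le j \le n$.
3. Let $x_i$ denote the $i$-th byte of the intermediate state after the 9-th round. $A$ concludes
that
\[ x_i \in \widehat{X}^{(j)}_i = \bigcup_{\ell \in D^{(j)}_0} \{ \hat{x}_i \mid P_{\hat{k}_0}(\hat{x}_i) = \ell \}. \]
4. $A$ computes the sets
\[ \widehat{K}^{(j)}_i = \bigl\{ c^{(j)}_i \oplus S[\hat{x}^{(j)}_i] \mid \hat{x}^{(j)}_i \in \widehat{X}^{(j)}_i \bigr\} \]
for all $1 \le j \le n$.
5. $A$ computes the set
\[ \widehat{K}_i = \bigcap_{j=1}^{n} \widehat{K}^{(j)}_i \]
of candidates for $k_i$.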
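Steps 3 to 5 translate directly into set operations. The sketch below is illustrative only: it assumes that the assignment $P$ has already been recovered correctly, models each measurement as one set of accessed cache lines together with the ciphertext byte $c_i$, ignores ShiftRows (as the indexing in the text does), and uses hypothetical names (`aes_sbox`, `key_candidates`). The sbox is computed from its algebraic definition so that the example is self-contained.

```python
import random

def aes_sbox():
    """AES sbox: inversion in GF(2^8) followed by the affine map."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
            b >>= 1
        return r

    def rotl8(v, s):
        return ((v << s) | (v >> (8 - s))) & 0xFF

    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    return [b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
            for b in inv]

def key_candidates(P, sbox, measurements):
    """Intersect the per-measurement candidate sets for one key byte."""
    candidates = set(range(256))
    for lines, c_i in measurements:
        # Step 3: state bytes compatible with the observed cache lines.
        X = [x for x in range(256) if P[x] in lines]
        # Steps 4 and 5: map to key candidates and intersect.
        candidates &= {c_i ^ sbox[x] for x in X}
    return candidates

# Simulated victim (assumption: uniform state bytes, noise-free cache data).
rng = random.Random(1)
S = aes_sbox()
pi = list(range(256))
rng.shuffle(pi)
P = [pi[x] // 16 for x in range(256)]  # assignment assumed already known
i, k_i = 5, 0xA7                       # position and true key byte (example)
measurements = []
for _ in range(200):
    state = [rng.randrange(256) for _ in range(16)]
    lines = {pi[b] // 16 for b in state}
    measurements.append((lines, S[state[i]] ^ k_i))

print(key_candidates(P, S, measurements))
```

The true key byte is never removed, because the line of $\pi(x_i)$ is always among the observed lines. For a random permutation the wrong candidates are eliminated quickly.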
For the time being, we assume that $\pi$ has the property that for each $\hat{k}_0$ there remains only
a single vector of hypotheses for the other key bytes. Hence, in the end there are only 256
AES keys left and a simple brute force attack reveals the correct one. In general, a random
permutation has this property. For a mathematically precise definition and analysis of that
property see Section 6.7.2.
Cost Analysis Experiments show that in the first step of the attack $A$ needs on average
9 measurements, each consisting of a pair $(p_i, c_i)$ and the corresponding cache information $D^i_0$,
such that the intersection $\widehat{M}_0 := \bigcap_i D^i_0$ contains only a single element $y = P_{\hat{k}_0}(x)$. We
need to determine the mapping $P_{\hat{k}_0}(x)$ for every key candidate $\hat{k}_0$ and every argument
$x \in \{0,1\}^8$. Hence, a straightforward implementation of the attack needs roughly $256 \cdot 256 \cdot 9$
measurements to determine the function $P_{\hat{k}_0}(x)$ for all arguments $x \in \{0,1\}^8$ and all key
candidates $\hat{k}_0 \in \{0,1\}^8$. However, one can reuse measurements for different key candidates
$\hat{k}_0, \hat{k}'_0$ to reduce the number of measurements to roughly $256 \cdot 9 = 2304$. To determine the
vector of hypotheses based on the candidate $\hat{k}_0$ we can reuse the measurements obtained by
determining the function $P_{\hat{k}_0}$. Hence, the expected number of measurements of this attack is
2304.
6.7.2 Separability and Distinguished Permutations
From a security point of view, it is desirable to reduce the information leakage. E.g., a cache
attack alone should reveal as little information as possible; in particular, it should not reveal the
complete key. Then the adversary is forced to either mount a refined and more complex CBA
based on other intermediate results or combine the cache attack with some other method to
determine the key bytes uniquely. In this case, the situation is similar to the attack of (Osvik
et al. 2006), where a cache attack on the first round only reveals 4 bits of each key byte.
Hence Osvik et al. combine cache attacks on the first and second round of AES.
First, we present the property a permutation applied to the result of the 9-th round should
have such that $A$ cannot determine the key bytes uniquely using only a cache attack on the
last round. We denote the $\ell$-th cache line by $CL_\ell$ and the elements of $CL_\ell$ by $a^{(\ell)}_0, \dots, a^{(\ell)}_{15}$.
Hence, the underlying permutation used to define this cache line is given by
\[ \pi^{-1}(16\ell + j) = S^{-1}[a^{(\ell)}_j] \tag{6.2} \]
for $j = 0, \dots, 15$.
We say that a key candidate $\hat{k}_0$ is separable from the first key byte $k^{10}_0$ of the last round
if there exists a measurement that proves $\hat{k}_0$ to be wrong. Conversely, a key candidate $\hat{k}_0$
is inseparable from the key $k^{10}_0$ if there does not exist a measurement that proves $\hat{k}_0$ to be
wrong. More precisely, writing $\hat{k}_0 = k^{10}_0 \oplus \delta$, the bytes $k^{10}_0$ and $k^{10}_0 \oplus \delta$ are inseparable if and
only if
\[ \forall \ell \in \{0, \dots, 15\}\ \forall a \in CL_\ell : a \oplus \delta \in CL_\ell. \tag{6.3} \]
Notice that this property only depends on the difference $\delta$ and not on the value of $k_0$. In our
setting there are 16 elements of the sbox in every cache line and therefore property (6.3) can
only be satisfied by at most 16 differences.
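It turns out that for $|\Delta| = 16$ the set
\[ \Delta := \{ \delta \mid \text{for all } k_0 \in \{0,1\}^8 \text{ the bytes } k_0 \text{ and } k_0 \oplus \delta \text{ are inseparable} \} \]
forms a 4-dimensional subspace of $\mathbb{F}_{2^8}$ viewed as an 8-dimensional vector space over $\mathbb{F}_2$. It
is obvious that the neutral element 0 is an element of $\Delta$ and that every $\delta \in \Delta$ is its own
inverse. It remains to show that $\Delta$ is closed with respect to addition. Consider $\delta, \delta' \in \Delta$ and
an arbitrary $a \in CL_\ell$. Then $a' = a \oplus \delta \in CL_\ell$ implies that $a' \oplus \delta' = a \oplus \delta \oplus \delta' \in CL_\ell$ because
of (6.3), and hence $\delta \oplus \delta' \in \Delta$ holds.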
Hence, any partition that has the maximal number of inseparable key candidates must
generate a subspace of dimension 4.
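Property (6.3) is easy to test mechanically for a given partition of the byte values into cache lines. The following sketch (the function name `inseparable_differences` and the two example partitions are our own illustration) computes the set of all differences that satisfy (6.3).

```python
import random

def inseparable_differences(lines):
    """All differences d with: a in CL_l implies a XOR d in CL_l, for every l."""
    line_of = {}
    for idx, line in enumerate(lines):
        for a in line:
            line_of[a] = idx
    return {d for d in range(256)
            if all(line_of[a ^ d] == line_of[a] for a in range(256))}

# Example 1: lines formed by the high nibble; every d < 16 keeps the line.
nibble_lines = [set(range(16 * l, 16 * l + 16)) for l in range(16)]

# Example 2: a random partition into 16 lines of 16 values.
rng = random.Random(0)
values = list(range(256))
rng.shuffle(values)
random_lines = [set(values[16 * l:16 * l + 16]) for l in range(16)]

print(sorted(inseparable_differences(nibble_lines)))
print(sorted(inseparable_differences(random_lines)))
```

The first partition attains the maximum of 16 inseparable differences, which form the subspace $\{0, \dots, 15\}$. For a random partition, with overwhelming probability only the trivial difference 0 remains.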
Using this observation we describe how to efficiently construct permutations such that the
set $\Delta$ of inseparable differences has size 16. In the sequel, we will call any such permutation
a distinguished permutation.
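Construction of the Subspace We first construct a set $\Delta$ of 16 differences that is closed
with respect to addition over $\mathbb{F}_{256}$. We can do this in the following way:
1. set $\Delta := \{\delta_0 := 0\}$, choose $\delta_1$ uniformly at random from the set $\{1, \dots, 255\}$,
set $\Delta := \Delta \cup \{\delta_1\}$
2. choose $\delta_2$ uniformly at random from $\{1, \dots, 255\} \setminus \Delta$,
set $\Delta := \Delta \cup \{\delta_2, \delta_3 := \delta_1 \oplus \delta_2\}$
3. choose $\delta_4$ uniformly at random from $\{1, \dots, 255\} \setminus \Delta$,
set $\Delta := \Delta \cup \{\delta_4, \delta_5 := \delta_4 \oplus \delta_1, \delta_6 := \delta_4 \oplus \delta_2, \delta_7 := \delta_4 \oplus \delta_3\}$
4. choose $\delta_8$ uniformly at random from $\{1, \dots, 255\} \setminus \Delta$,
set $\Delta := \Delta \cup \{\delta_8, \delta_9 := \delta_8 \oplus \delta_1, \delta_{10} := \delta_8 \oplus \delta_2, \delta_{11} := \delta_8 \oplus \delta_3, \delta_{12} := \delta_8 \oplus \delta_4,
\delta_{13} := \delta_8 \oplus \delta_5, \delta_{14} := \delta_8 \oplus \delta_6, \delta_{15} := \delta_8 \oplus \delta_7\}$
This construction ensures that $\Delta$ is closed with respect to addition and hence $\Delta$ forms a
subspace as desired.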
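The four steps above follow one doubling pattern: pick a new element outside the current set and add its sums with all elements chosen so far. A compact sketch of this pattern (the function name `random_subspace` and its parameters are hypothetical):

```python
import random

def random_subspace(dim=4, rng=random):
    """Random dim-dimensional subspace of F_2^8, elements encoded as bytes."""
    delta = [0]
    while len(delta) < 2 ** dim:
        # Choose a new element outside the span constructed so far ...
        new = rng.choice([d for d in range(1, 256) if d not in delta])
        # ... and add its sums with all previous elements (doubles the size).
        delta += [new ^ d for d in delta]
    return set(delta)

delta = random_subspace(rng=random.Random(3))
print(sorted(delta))
```

Each round doubles the size of the set, so four rounds yield the 16 elements $\delta_0, \dots, \delta_{15}$.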
Construction of the Permutation Now we can compute the function $P$ that maps
$S[x] \in \mathbb{F}_2^8$ to a cache line. We use the fact that 16 proper translations of a 4-dimensional
subspace form a partition of the 8-dimensional vector space $\mathbb{F}_2^8$. A basis $\{b_0, \dots, b_3\}$ of the
subspace $\Delta$ can be expanded by 4 vectors $b_4, \dots, b_7$ to a basis of $\mathbb{F}_2^8$. The 16 translations
of $\Delta$ generated by linear combinations of $b_4, \dots, b_7$ form the quotient space $\mathbb{F}_2^8/\Delta$ that is a
partition of $\mathbb{F}_2^8$. To construct the function $P$ we do the following:
1. for every cache line $CL_\ell$ do
2. choose $a^{(\ell)}$ uniformly at random from $\mathbb{F}_{256} \setminus \{a^{(j)} \oplus \delta \mid j < \ell, \delta \in \Delta\}$
3. fill $CL_\ell$ with the values of the set $\{a^{(\ell)} \oplus \delta \mid \delta \in \Delta\}$
Using (6.2) this partition into cache lines defines the corresponding permutation.
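The coset construction and the use of (6.2) can be sketched as follows. The names `span`, `build_cache_lines` and `permutation_from_lines` are hypothetical, the basis in the example is an arbitrary choice of four linearly independent bytes, and the inverse sbox table `inv_sbox` is assumed to be supplied by the caller.

```python
import random

def span(basis):
    """All XOR combinations of the given (linearly independent) basis bytes."""
    vecs = {0}
    for b in basis:
        vecs |= {v ^ b for v in vecs}
    return vecs

def build_cache_lines(delta, rng):
    """Fill each cache line with one coset a XOR delta of the subspace."""
    lines, used = [], set()
    for _ in range(256 // len(delta)):
        rep = rng.choice([a for a in range(256) if a not in used])
        coset = sorted(rep ^ d for d in delta)
        lines.append(coset)
        used.update(coset)
    return lines

def permutation_from_lines(lines, inv_sbox):
    """Apply (6.2): pi^{-1}(16*l + j) = S^{-1}[a_j^(l)], returned as pi."""
    pi = [None] * 256
    for l, line in enumerate(lines):
        for j, a in enumerate(line):
            pi[inv_sbox[a]] = 16 * l + j
    return pi

delta = span([0x01, 0x12, 0x44, 0x80])
lines = build_cache_lines(delta, random.Random(7))
print(lines[0])
```

By construction every line is closed under all differences in $\Delta$, so (6.3) holds for each $\delta \in \Delta$. The order of the entries inside a line (sorted here) is a free choice and does not affect this property.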
Analysis of the Countermeasure The security using a distinguished permutation as
defined above rests on two facts.
1. Using a distinguished permutation where the set $\Delta$ of inseparable differences has size
16, a cache attack on the last round of AES will reveal only four bits of each key byte
$k^{10}_i$. Overall 64 of the 128 bits of the last round key remain unknown. Therefore, the
adversary has to combine his cache attack on the last round with some other method to
determine the remaining 64 unknown bits. For example, he could try a modified cache
attack on the 9-th round exploiting his partial knowledge of the last round key. Or he
could use a brute force search to determine the last round key completely.
2. There are several distinguished permutations and each of these permutations leads to
16! different functions mapping elements to 16 lines. If we choose randomly one of these
functions, before an adversary can mount a cache attack on the last round as described
in Section 6.3.2, he first has to use some method like the one described in Section 6.7.1
to determine the function $P$ that is actually used.
We stress that we consider the first fact to be the more important security feature. We saw
already in Section 6.7.1 that determining a random permutation used for mapping elements to
cache lines is not as secure as one might expect. Since we are using permutations of a special
form the attack described in Section 6.7.1 can be improved somewhat. In the remainder of
this section we briefly describe this improvement. To do so, first we have to determine the
number of subspaces leading to distinguished permutations.
As before, view $\mathbb{F}_2^n := \{0,1\}^n$ as an $n$-dimensional $\mathbb{F}_2$ vector space. For $0 \le k \le n$ we
define $D_{n,k}$ to be the number of $k$-dimensional subspaces of $\mathbb{F}_2^n$. To determine $D_{n,k}$, for $V$ an
arbitrary $m$-dimensional subspace of $\mathbb{F}_2^n$ we define
\[ N_{m,k} := |\{ (v_1, \dots, v_k) \mid v_i \in V,\ v_1, \dots, v_k \text{ are linearly independent} \}|. \]
The number $N_{m,k}$ is independent of the particular $m$-dimensional subspace $V$; it only depends
on the two parameters $m$ and $k$. Then
\[ D_{n,k} = \frac{N_{n,k}}{N_{k,k}}. \]
Next we observe that
\[ N_{m,k} = \prod_{j=0}^{k-1} (2^m - 2^j) = 2^{k(k-1)/2} \prod_{j=0}^{k-1} (2^{m-j} - 1). \]
Hence, we obtain that
\[ D_{n,k} = \frac{\prod_{j=0}^{k-1} (2^{n-j} - 1)}{\prod_{j=0}^{k-1} (2^{k-j} - 1)}. \]
In our special case we have $n = 8$ and $k = 4$ and hence the number of 4-dimensional subspaces
is
\[ D_{8,4} = \frac{255 \cdot 127 \cdot 63 \cdot 31}{15 \cdot 7 \cdot 3 \cdot 1} = 200787. \]
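The count can be reproduced directly from the definition of $N_{m,k}$. A short sketch (the function names are ours):

```python
from math import prod

def count_independent_tuples(m, k):
    """N_{m,k}: ordered k-tuples of linearly independent vectors in F_2^m."""
    return prod(2 ** m - 2 ** j for j in range(k))

def count_subspaces(n, k):
    """D_{n,k} = N_{n,k} / N_{k,k}: number of k-dimensional subspaces of F_2^n."""
    return count_independent_tuples(n, k) // count_independent_tuples(k, k)

print(count_subspaces(8, 4))  # 200787
```

Here $N_{8,4} = 255 \cdot 254 \cdot 252 \cdot 248$ and $N_{4,4} = 15 \cdot 14 \cdot 12 \cdot 8 = 20160$; the common factor $2^{k(k-1)/2} = 64$ cancels in the quotient.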
As mentioned above, each subspace leads to 16! different distinguished permutations.
Hence, overall we have $200787 \cdot 16! \approx 2^{62}$ distinguished permutations. On the other hand,
because of the special structure of our permutations, determining the function $P$ by cache
attacks can be done more efficiently than determining an arbitrary function mapping elements
to cache lines (see Section 6.7.1). In particular, $A$ only needs to observe about 7 accesses of a
single but arbitrary cache line. With high probability this will be enough to determine a basis
of the subspace being used. In addition, $A$ needs at least one access for every other cache
line in order to determine the function $P$. The corresponding probability experiment follows
the multinomial distribution. We did not calculate the expected number of tries exactly.
Experiments show that if we can determine the accessed cache line exactly, on average 62
measurements suffice to compute the function $P$ exactly. However, a single measurement
only yields a set of accessed cache lines. But arguments similar to the ones used for the first
part of the attack in Section 6.7.1 show that we need on average 9 measurements to uniquely
determine an accessed cache line. Therefore, on average we need $9 \cdot 62 = 558$ experiments to
determine the function $P$.
Hence, compared to the results of Section 6.7.1 we have reduced the number of measurements
used to determine the function $P$ by a factor of about 4 (from roughly 2304 to roughly 558).
However, we want to stress again that the main security enhancement of using distinguished
permutations instead of arbitrary permutations is the fact that with distinguished permutations
the last round key cannot be determined by a cache attack on the last round alone. To improve
the security, one can choose larger key sizes such as 192 bits or 256 bits. Since distinguished
permutations protect half of the key bits, the remaining uncertainty about the secret key after
cache attacks can be increased from 64 bits to 96 bits or 128 bits, respectively.
Separability and Random Permutations In our CBA on an implementation protected
by a random permutation (Section 6.7.1) we assumed that fixing a candidate $\hat{k}_0$ determines
the candidates for all other key bytes. With sufficiently many measurements for a fixed $\hat{k}_0$
we can determine the function $P_{\hat{k}_0}$ as defined in Section 6.7.1. Furthermore, we saw that the
separability of candidates $\hat{k}, \hat{k}'$ depends only on their difference $\delta = \hat{k} \oplus \hat{k}'$. Hence, to be able
to rule out all but one candidate $\hat{k}_i$ at position $i$ for a fixed $\hat{k}_0$ the permutation $\pi$ must have
the following property:
\[ \forall \delta \ne 0\ \exists j \in \{0, \dots, 15\}\ \exists a \in CL_j : a \oplus \delta \notin CL_j. \]
There are approximately $2^{844}$ of the $256! \approx 2^{1684}$ permutations that do not have this property.
Hence, a random permutation satisfies this condition with probability $1 - 2^{844}/2^{1684} = 1 - 2^{-840}$.
6.8 Summary of Countermeasures and Open Problems
In this chapter we presented and analyzed the security of several different implementations
of AES. Moreover, we analyzed countermeasures based on permutations: random permuta-
tions and distinguished permutations. We give a short overview over the advantages and
disadvantages of the countermeasures:
countermeasure               # measurements   information leakage   security   efficiency
small-4                      ∞                0 bits                high       slow
random permutation           2300             128 bits              low        fast
distinguished permutations   560              64 bits               medium     fast
The second column shows the expected number of measurements an attacker has to perform
in order to get the amount of information shown in the third column.
Small-4 (see Section 6.6) prevents information leakage in a cache attack. However, the
efficiency depends on the size of a cache line and is rather low. In contrast, random permu-
tations (see Section 6.7) provide only low security. About 2300 measurements are sufficient
to reveal the complete 128 bit AES key. If realized via table lookups, random permuta-
tions are fast. But to increase the security offered by random permutations they have to be
changed frequently. Changing a permutation may cause problems with respect to efficiency
and security. So far, we have no precise analysis of these issues.
Distinguished permutations (see Section 6.7.2) protect half of the key bits and hence
provide a medium level of security. Using distinguished permutations, no frequent changes
of permutations are required to achieve a medium level of security. Hence, they do not
suffer from the above mentioned problems of random permutations. Therefore, distinguished
permutations provide a better ratio of efficiency and security than random permutations but
still leak half of the key bits.
Random permutations and distinguished permutations have to be realized as tables for
efficiency reasons. Hence, a straightforward implementation of the application of a permutation
would render the whole implementation susceptible to cache attacks. A possible solution
to this problem is to realize permutations via small sboxes that completely fit into a cache
line. Following the description of the small-4 variant of Section 6.6, $\pi$ is split into smaller
tables $\pi_0, \dots, \pi_3$ each of which is applied to the input $x$. Obviously, this does not make sense
if the standard sbox $S$ is used because both $\pi$ and $S$ map from $\{0,1\}^8$ to $\{0,1\}^8$. Hence,
it takes as many table lookups to apply $\pi$ realized with small sboxes as it takes to apply $S$
realized with small sboxes directly. Moreover, realizing $S$ via small tables has the advantage of
not leaking information via the cache behavior.
The situation is different if the large sboxes of the fast implementation are used. Again
$\pi$ maps from $\{0,1\}^8$ to $\{0,1\}^8$ but a large sbox maps from $\{0,1\}^8$ to $\{0,1\}^{32}$. Therefore, it
takes 4 times as many table lookups to realize the large sbox via small sboxes as to realize
$\pi$ via small tables.
Hence, first applying $\pi$ to an input via small tables and then applying a large permuted
sbox, as shown in Figure 6.7, makes sense if this technique is faster than realizing the standard
sbox $S$ via small sboxes. Here, one has to take into account the technical problem that on
32-bit platforms the byte oriented structure of the standard sbox $S$ leads to a time consuming
post processing to incorporate the output of the sbox into the encryption state.
[Figure 6.7: Combining small tables with permutation $\pi$. The input $x_0$ is passed through the
small tables $\pi_0, \dots, \pi_3$ to obtain $\pi(x_0)$, which indexes the permuted sbox: $T'_4[\pi(x_0)] = T_4[x_0]$.]
Note that realizing $\pi$ via small tables does not leak any information in cache attacks. Only
the application of the permuted sbox leaks information about intermediate states. Hence,
this scenario is exactly the scenario of our attack in Section 6.7.1 where we assumed that
only the application of the sbox leaks information.
As mentioned in Section 6.6 one can scale the sizes of the smaller tables to improve
efficiency. But it is essential to determine whether the amount of information that leaks
with this method is acceptable or not. Summing up, the analysis given above shows that
permutations as a countermeasure to thwart cache based attacks do not provide as much
security as one would expect. However, we have shown that using distinguished permutations
one can reduce the information leakage via CBAs. That means that even with an arbitrary
number of measurements a CBA based on the last round cannot determine certain bits of
the secret key. Since we consider the reduction of information leakage a preferred goal,
distinguished permutations constitute an interesting way to improve the security gain of
permutations.
Appendix A
Sbox Tables $T_0, \dots, T_4$ of AES
    0            1            2            3            4            5            6            7
00  C6 63 63 A5  F8 7C 7C 84  EE 77 77 99  F6 7B 7B 8D  FF F2 F2 0D  D6 6B 6B BD  DE 6F 6F B1  91 C5 C5 54
01  60 30 30 50  02 01 01 03  CE 67 67 A9  56 2B 2B 7D  E7 FE FE 19  B5 D7 D7 62  4D AB AB E6  EC 76 76 9A
02  8F CA CA 45  1F 82 82 9D  89 C9 C9 40  FA 7D 7D 87  EF FA FA 15  B2 59 59 EB  8E 47 47 C9  FB F0 F0 0B
03  41 AD AD EC  B3 D4 D4 67  5F A2 A2 FD  45 AF AF EA  23 9C 9C BF  53 A4 A4 F7  E4 72 72 96  9B C0 C0 5B
04  75 B7 B7 C2  E1 FD FD 1C  3D 93 93 AE  4C 26 26 6A  6C 36 36 5A  7E 3F 3F 41  F5 F7 F7 02  83 CC CC 4F
05  68 34 34 5C  51 A5 A5 F4  D1 E5 E5 34  F9 F1 F1 08  E2 71 71 93  AB D8 D8 73  62 31 31 53  2A 15 15 3F
06  08 04 04 0C  95 C7 C7 52  46 23 23 65  9D C3 C3 5E  30 18 18 28  37 96 96 A1  0A 05 05 0F  2F 9A 9A B5
07  0E 07 07 09  24 12 12 36  1B 80 80 9B  DF E2 E2 3D  CD EB EB 26  4E 27 27 69  7F B2 B2 CD  EA 75 75 9F
08  12 09 09 1B  1D 83 83 9E  58 2C 2C 74  34 1A 1A 2E  36 1B 1B 2D  DC 6E 6E B2  B4 5A 5A EE  5B A0 A0 FB
09  A4 52 52 F6  76 3B 3B 4D  B7 D6 D6 61  7D B3 B3 CE  52 29 29 7B  DD E3 E3 3E  5E 2F 2F 71  13 84 84 97
0A  A6 53 53 F5  B9 D1 D1 68  00 00 00 00  C1 ED ED 2C  40 20 20 60  E3 FC FC 1F  79 B1 B1 C8  B6 5B 5B ED
0B  D4 6A 6A BE  8D CB CB 46  67 BE BE D9  72 39 39 4B  94 4A 4A DE  98 4C 4C D4  B0 58 58 E8  85 CF CF 4A
0C  BB D0 D0 6B  C5 EF EF 2A  4F AA AA E5  ED FB FB 16  86 43 43 C5  9A 4D 4D D7  66 33 33 55  11 85 85 94
0D  8A 45 45 CF  E9 F9 F9 10  04 02 02 06  FE 7F 7F 81  A0 50 50 F0  78 3C 3C 44  25 9F 9F BA  4B A8 A8 E3
0E  A2 51 51 F3  5D A3 A3 FE  80 40 40 C0  05 8F 8F 8A  3F 92 92 AD  21 9D 9D BC  70 38 38 48  F1 F5 F5 04
0F  63 BC BC DF  77 B6 B6 C1  AF DA DA 75  42 21 21 63  20 10 10 30  E5 FF FF 1A  FD F3 F3 0E  BF D2 D2 6D
10  81 CD CD 4C  18 0C 0C 14  26 13 13 35  C3 EC EC 2F  BE 5F 5F E1  35 97 97 A2  88 44 44 CC  2E 17 17 39
11  93 C4 C4 57  55 A7 A7 F2  FC 7E 7E 82  7A 3D 3D 47  C8 64 64 AC  BA 5D 5D E7  32 19 19 2B  E6 73 73 95
12  C0 60 60 A0  19 81 81 98  9E 4F 4F D1  A3 DC DC 7F  44 22 22 66  54 2A 2A 7E  3B 90 90 AB  0B 88 88 83
13  8C 46 46 CA  C7 EE EE 29  6B B8 B8 D3  28 14 14 3C  A7 DE DE 79  BC 5E 5E E2  16 0B 0B 1D  AD DB DB 76
14  DB E0 E0 3B  64 32 32 56  74 3A 3A 4E  14 0A 0A 1E  92 49 49 DB  0C 06 06 0A  48 24 24 6C  B8 5C 5C E4
15  9F C2 C2 5D  BD D3 D3 6E  43 AC AC EF  C4 62 62 A6  39 91 91 A8  31 95 95 A4  D3 E4 E4 37  F2 79 79 8B
16  D5 E7 E7 32  8B C8 C8 43  6E 37 37 59  DA 6D 6D B7  01 8D 8D 8C  B1 D5 D5 64  9C 4E 4E D2  49 A9 A9 E0
17  D8 6C 6C B4  AC 56 56 FA  F3 F4 F4 07  CF EA EA 25  CA 65 65 AF  F4 7A 7A 8E  47 AE AE E9  10 08 08 18
18  6F BA BA D5  F0 78 78 88  4A 25 25 6F  5C 2E 2E 72  38 1C 1C 24  57 A6 A6 F1  73 B4 B4 C7  97 C6 C6 51
19  CB E8 E8 23  A1 DD DD 7C  E8 74 74 9C  3E 1F 1F 21  96 4B 4B DD  61 BD BD DC  0D 8B 8B 86  0F 8A 8A 85
1A  E0 70 70 90  7C 3E 3E 42  71 B5 B5 C4  CC 66 66 AA  90 48 48 D8  06 03 03 05  F7 F6 F6 01  1C 0E 0E 12
1B  C2 61 61 A3  6A 35 35 5F  AE 57 57 F9  69 B9 B9 D0  17 86 86 91  99 C1 C1 58  3A 1D 1D 27  27 9E 9E B9
1C  D9 E1 E1 38  EB F8 F8 13  2B 98 98 B3  22 11 11 33  D2 69 69 BB  A9 D9 D9 70  07 8E 8E 89  33 94 94 A7
1D  2D 9B 9B B6  3C 1E 1E 22  15 87 87 92  C9 E9 E9 20  87 CE CE 49  AA 55 55 FF  50 28 28 78  A5 DF DF 7A
1E  03 8C 8C 8F  59 A1 A1 F8  09 89 89 80  1A 0D 0D 17  65 BF BF DA  D7 E6 E6 31  84 42 42 C6  D0 68 68 B8
1F  82 41 41 C3  29 99 99 B0  5A 2D 2D 77  1E 0F 0F 11  7B B0 B0 CB  A8 54 54 FC  6D BB BB D6  2C 16 16 3A
Table A.1: Sbox T0
    0            1            2            3            4            5            6            7
00  A5 C6 63 63  84 F8 7C 7C  99 EE 77 77  8D F6 7B 7B  0D FF F2 F2  BD D6 6B 6B  B1 DE 6F 6F  54 91 C5 C5
01  50 60 30 30  03 02 01 01  A9 CE 67 67  7D 56 2B 2B  19 E7 FE FE  62 B5 D7 D7  E6 4D AB AB  9A EC 76 76
02  45 8F CA CA  9D 1F 82 82  40 89 C9 C9  87 FA 7D 7D  15 EF FA FA  EB B2 59 59  C9 8E 47 47  0B FB F0 F0
03  EC 41 AD AD  67 B3 D4 D4  FD 5F A2 A2  EA 45 AF AF  BF 23 9C 9C  F7 53 A4 A4  96 E4 72 72  5B 9B C0 C0
04  C2 75 B7 B7  1C E1 FD FD  AE 3D 93 93  6A 4C 26 26  5A 6C 36 36  41 7E 3F 3F  02 F5 F7 F7  4F 83 CC CC
05  5C 68 34 34  F4 51 A5 A5  34 D1 E5 E5  08 F9 F1 F1  93 E2 71 71  73 AB D8 D8  53 62 31 31  3F 2A 15 15
06  0C 08 04 04  52 95 C7 C7  65 46 23 23  5E 9D C3 C3  28 30 18 18  A1 37 96 96  0F 0A 05 05  B5 2F 9A 9A
07  09 0E 07 07  36 24 12 12  9B 1B 80 80  3D DF E2 E2  26 CD EB EB  69 4E 27 27  CD 7F B2 B2  9F EA 75 75
08  1B 12 09 09  9E 1D 83 83  74 58 2C 2C  2E 34 1A 1A  2D 36 1B 1B  B2 DC 6E 6E  EE B4 5A 5A  FB 5B A0 A0
09  F6 A4 52 52  4D 76 3B 3B  61 B7 D6 D6  CE 7D B3 B3  7B 52 29 29  3E DD E3 E3  71 5E 2F 2F  97 13 84 84
0A  F5 A6 53 53  68 B9 D1 D1  00 00 00 00  2C C1 ED ED  60 40 20 20  1F E3 FC FC  C8 79 B1 B1  ED B6 5B 5B
0B  BE D4 6A 6A  46 8D CB CB  D9 67 BE BE  4B 72 39 39  DE 94 4A 4A  D4 98 4C 4C  E8 B0 58 58  4A 85 CF CF
0C  6B BB D0 D0  2A C5 EF EF  E5 4F AA AA  16 ED FB FB  C5 86 43 43  D7 9A 4D 4D  55 66 33 33  94 11 85 85
0D  CF 8A 45 45  10 E9 F9 F9  06 04 02 02  81 FE 7F 7F  F0 A0 50 50  44 78 3C 3C  BA 25 9F 9F  E3 4B A8 A8
0E  F3 A2 51 51  FE 5D A3 A3  C0 80 40 40  8A 05 8F 8F  AD 3F 92 92  BC 21 9D 9D  48 70 38 38  04 F1 F5 F5
0F  DF 63 BC BC  C1 77 B6 B6  75 AF DA DA  63 42 21 21  30 20 10 10  1A E5 FF FF  0E FD F3 F3  6D BF D2 D2
10  4C 81 CD CD  14 18 0C 0C  35 26 13 13  2F C3 EC EC  E1 BE 5F 5F  A2 35 97 97  CC 88 44 44  39 2E 17 17
11  57 93 C4 C4  F2 55 A7 A7  82 FC 7E 7E  47 7A 3D 3D  AC C8 64 64  E7 BA 5D 5D  2B 32 19 19  95 E6 73 73
12  A0 C0 60 60  98 19 81 81  D1 9E 4F 4F  7F A3 DC DC  66 44 22 22  7E 54 2A 2A  AB 3B 90 90  83 0B 88 88
13  CA 8C 46 46  29 C7 EE EE  D3 6B B8 B8  3C 28 14 14  79 A7 DE DE  E2 BC 5E 5E  1D 16 0B 0B  76 AD DB DB
14  3B DB E0 E0  56 64 32 32  4E 74 3A 3A  1E 14 0A 0A  DB 92 49 49  0A 0C 06 06  6C 48 24 24  E4 B8 5C 5C
15  5D 9F C2 C2  6E BD D3 D3  EF 43 AC AC  A6 C4 62 62  A8 39 91 91  A4 31 95 95  37 D3 E4 E4  8B F2 79 79
16  32 D5 E7 E7  43 8B C8 C8  59 6E 37 37  B7 DA 6D 6D  8C 01 8D 8D  64 B1 D5 D5  D2 9C 4E 4E  E0 49 A9 A9
17  B4 D8 6C 6C  FA AC 56 56  07 F3 F4 F4  25 CF EA EA  AF CA 65 65  8E F4 7A 7A  E9 47 AE AE  18 10 08 08
18  D5 6F BA BA  88 F0 78 78  6F 4A 25 25  72 5C 2E 2E  24 38 1C 1C  F1 57 A6 A6  C7 73 B4 B4  51 97 C6 C6
19  23 CB E8 E8  7C A1 DD DD  9C E8 74 74  21 3E 1F 1F  DD 96 4B 4B  DC 61 BD BD  86 0D 8B 8B  85 0F 8A 8A
1A  90 E0 70 70  42 7C 3E 3E  C4 71 B5 B5  AA CC 66 66  D8 90 48 48  05 06 03 03  01 F7 F6 F6  12 1C 0E 0E
1B  A3 C2 61 61  5F 6A 35 35  F9 AE 57 57  D0 69 B9 B9  91 17 86 86  58 99 C1 C1  27 3A 1D 1D  B9 27 9E 9E
1C  38 D9 E1 E1  13 EB F8 F8  B3 2B 98 98  33 22 11 11  BB D2 69 69  70 A9 D9 D9  89 07 8E 8E  A7 33 94 94
1D  B6 2D 9B 9B  22 3C 1E 1E  92 15 87 87  20 C9 E9 E9  49 87 CE CE  FF AA 55 55  78 50 28 28  7A A5 DF DF
1E  8F 03 8C 8C  F8 59 A1 A1  80 09 89 89  17 1A 0D 0D  DA 65 BF BF  31 D7 E6 E6  C6 84 42 42  B8 D0 68 68
1F  C3 82 41 41  B0 29 99 99  77 5A 2D 2D  11 1E 0F 0F  CB 7B B0 B0  FC A8 54 54  D6 6D BB BB  3A 2C 16 16
Table A.2: Sbox T1
    0            1            2            3            4            5            6            7
00  63 A5 C6 63  7C 84 F8 7C  77 99 EE 77  7B 8D F6 7B  F2 0D FF F2  6B BD D6 6B  6F B1 DE 6F  C5 54 91 C5
01  30 50 60 30  01 03 02 01  67 A9 CE 67  2B 7D 56 2B  FE 19 E7 FE  D7 62 B5 D7  AB E6 4D AB  76 9A EC 76
02  CA 45 8F CA  82 9D 1F 82  C9 40 89 C9  7D 87 FA 7D  FA 15 EF FA  59 EB B2 59  47 C9 8E 47  F0 0B FB F0
03  AD EC 41 AD  D4 67 B3 D4  A2 FD 5F A2  AF EA 45 AF  9C BF 23 9C  A4 F7 53 A4  72 96 E4 72  C0 5B 9B C0
04  B7 C2 75 B7  FD 1C E1 FD  93 AE 3D 93  26 6A 4C 26  36 5A 6C 36  3F 41 7E 3F  F7 02 F5 F7  CC 4F 83 CC
05  34 5C 68 34  A5 F4 51 A5  E5 34 D1 E5  F1 08 F9 F1  71 93 E2 71  D8 73 AB D8  31 53 62 31  15 3F 2A 15
06  04 0C 08 04  C7 52 95 C7  23 65 46 23  C3 5E 9D C3  18 28 30 18  96 A1 37 96  05 0F 0A 05  9A B5 2F 9A
07  07 09 0E 07  12 36 24 12  80 9B 1B 80  E2 3D DF E2  EB 26 CD EB  27 69 4E 27  B2 CD 7F B2  75 9F EA 75
08  09 1B 12 09  83 9E 1D 83  2C 74 58 2C  1A 2E 34 1A  1B 2D 36 1B  6E B2 DC 6E  5A EE B4 5A  A0 FB 5B A0
09  52 F6 A4 52  3B 4D 76 3B  D6 61 B7 D6  B3 CE 7D B3  29 7B 52 29  E3 3E DD E3  2F 71 5E 2F  84 97 13 84
0A  53 F5 A6 53  D1 68 B9 D1  00 00 00 00  ED 2C C1 ED  20 60 40 20  FC 1F E3 FC  B1 C8 79 B1  5B ED B6 5B
0B  6A BE D4 6A  CB 46 8D CB  BE D9 67 BE  39 4B 72 39  4A DE 94 4A  4C D4 98 4C  58 E8 B0 58  CF 4A 85 CF
0C  D0 6B BB D0  EF 2A C5 EF  AA E5 4F AA  FB 16 ED FB  43 C5 86 43  4D D7 9A 4D  33 55 66 33  85 94 11 85
0D  45 CF 8A 45  F9 10 E9 F9  02 06 04 02  7F 81 FE 7F  50 F0 A0 50  3C 44 78 3C  9F BA 25 9F  A8 E3 4B A8
0E  51 F3 A2 51  A3 FE 5D A3  40 C0 80 40  8F 8A 05 8F  92 AD 3F 92  9D BC 21 9D  38 48 70 38  F5 04 F1 F5
0F  BC DF 63 BC  B6 C1 77 B6  DA 75 AF DA  21 63 42 21  10 30 20 10  FF 1A E5 FF  F3 0E FD F3  D2 6D BF D2
10  CD 4C 81 CD  0C 14 18 0C  13 35 26 13  EC 2F C3 EC  5F E1 BE 5F  97 A2 35 97  44 CC 88 44  17 39 2E 17
11  C4 57 93 C4  A7 F2 55 A7  7E 82 FC 7E  3D 47 7A 3D  64 AC C8 64  5D E7 BA 5D  19 2B 32 19  73 95 E6 73
12  60 A0 C0 60  81 98 19 81  4F D1 9E 4F  DC 7F A3 DC  22 66 44 22  2A 7E 54 2A  90 AB 3B 90  88 83 0B 88
13  46 CA 8C 46  EE 29 C7 EE  B8 D3 6B B8  14 3C 28 14  DE 79 A7 DE  5E E2 BC 5E  0B 1D 16 0B  DB 76 AD DB
14  E0 3B DB E0  32 56 64 32  3A 4E 74 3A  0A 1E 14 0A  49 DB 92 49  06 0A 0C 06  24 6C 48 24  5C E4 B8 5C
15  C2 5D 9F C2  D3 6E BD D3  AC EF 43 AC  62 A6 C4 62  91 A8 39 91  95 A4 31 95  E4 37 D3 E4  79 8B F2 79
16  E7 32 D5 E7  C8 43 8B C8  37 59 6E 37  6D B7 DA 6D  8D 8C 01 8D  D5 64 B1 D5  4E D2 9C 4E  A9 E0 49 A9
17  6C B4 D8 6C  56 FA AC 56  F4 07 F3 F4  EA 25 CF EA  65 AF CA 65  7A 8E F4 7A  AE E9 47 AE  08 18 10 08
18  BA D5 6F BA  78 88 F0 78  25 6F 4A 25  2E 72 5C 2E  1C 24 38 1C  A6 F1 57 A6  B4 C7 73 B4  C6 51 97 C6
19  E8 23 CB E8  DD 7C A1 DD  74 9C E8 74  1F 21 3E 1F  4B DD 96 4B  BD DC 61 BD  8B 86 0D 8B  8A 85 0F 8A
1A  70 90 E0 70  3E 42 7C 3E  B5 C4 71 B5  66 AA CC 66  48 D8 90 48  03 05 06 03  F6 01 F7 F6  0E 12 1C 0E
1B  61 A3 C2 61  35 5F 6A 35  57 F9 AE 57  B9 D0 69 B9  86 91 17 86  C1 58 99 C1  1D 27 3A 1D  9E B9 27 9E
1C  E1 38 D9 E1  F8 13 EB F8  98 B3 2B 98  11 33 22 11  69 BB D2 69  D9 70 A9 D9  8E 89 07 8E  94 A7 33 94
1D  9B B6 2D 9B  1E 22 3C 1E  87 92 15 87  E9 20 C9 E9  CE 49 87 CE  55 FF AA 55  28 78 50 28  DF 7A A5 DF
1E  8C 8F 03 8C  A1 F8 59 A1  89 80 09 89  0D 17 1A 0D  BF DA 65 BF  E6 31 D7 E6  42 C6 84 42  68 B8 D0 68
1F  41 C3 82 41  99 B0 29 99  2D 77 5A 2D  0F 11 1E 0F  B0 CB 7B B0  54 FC A8 54  BB D6 6D BB  16 3A 2C 16
Table A.3: Sbox T2
    0            1            2            3            4            5            6            7
00  63 63 A5 C6  7C 7C 84 F8  77 77 99 EE  7B 7B 8D F6  F2 F2 0D FF  6B 6B BD D6  6F 6F B1 DE  C5 C5 54 91
01  30 30 50 60  01 01 03 02  67 67 A9 CE  2B 2B 7D 56  FE FE 19 E7  D7 D7 62 B5  AB AB E6 4D  76 76 9A EC
02  CA CA 45 8F  82 82 9D 1F  C9 C9 40 89  7D 7D 87 FA  FA FA 15 EF  59 59 EB B2  47 47 C9 8E  F0 F0 0B FB
03  AD AD EC 41  D4 D4 67 B3  A2 A2 FD 5F  AF AF EA 45  9C 9C BF 23  A4 A4 F7 53  72 72 96 E4  C0 C0 5B 9B
04  B7 B7 C2 75  FD FD 1C E1  93 93 AE 3D  26 26 6A 4C  36 36 5A 6C  3F 3F 41 7E  F7 F7 02 F5  CC CC 4F 83
05  34 34 5C 68  A5 A5 F4 51  E5 E5 34 D1  F1 F1 08 F9  71 71 93 E2  D8 D8 73 AB  31 31 53 62  15 15 3F 2A
06  04 04 0C 08  C7 C7 52 95  23 23 65 46  C3 C3 5E 9D  18 18 28 30  96 96 A1 37  05 05 0F 0A  9A 9A B5 2F
07  07 07 09 0E  12 12 36 24  80 80 9B 1B  E2 E2 3D DF  EB EB 26 CD  27 27 69 4E  B2 B2 CD 7F  75 75 9F EA
08  09 09 1B 12  83 83 9E 1D  2C 2C 74 58  1A 1A 2E 34  1B 1B 2D 36  6E 6E B2 DC  5A 5A EE B4  A0 A0 FB 5B
09  52 52 F6 A4  3B 3B 4D 76  D6 D6 61 B7  B3 B3 CE 7D  29 29 7B 52  E3 E3 3E DD  2F 2F 71 5E  84 84 97 13
0A  53 53 F5 A6  D1 D1 68 B9  00 00 00 00  ED ED 2C C1  20 20 60 40  FC FC 1F E3  B1 B1 C8 79  5B 5B ED B6
0B  6A 6A BE D4  CB CB 46 8D  BE BE D9 67  39 39 4B 72  4A 4A DE 94  4C 4C D4 98  58 58 E8 B0  CF CF 4A 85
0C  D0 D0 6B BB  EF EF 2A C5  AA AA E5 4F  FB FB 16 ED  43 43 C5 86  4D 4D D7 9A  33 33 55 66  85 85 94 11
0D  45 45 CF 8A  F9 F9 10 E9  02 02 06 04  7F 7F 81 FE  50 50 F0 A0  3C 3C 44 78  9F 9F BA 25  A8 A8 E3 4B
0E  51 51 F3 A2  A3 A3 FE 5D  40 40 C0 80  8F 8F 8A 05  92 92 AD 3F  9D 9D BC 21  38 38 48 70  F5 F5 04 F1
0F  BC BC DF 63  B6 B6 C1 77  DA DA 75 AF  21 21 63 42  10 10 30 20  FF FF 1A E5  F3 F3 0E FD  D2 D2 6D BF
10  CD CD 4C 81  0C 0C 14 18  13 13 35 26  EC EC 2F C3  5F 5F E1 BE  97 97 A2 35  44 44 CC 88  17 17 39 2E
11  C4 C4 57 93  A7 A7 F2 55  7E 7E 82 FC  3D 3D 47 7A  64 64 AC C8  5D 5D E7 BA  19 19 2B 32  73 73 95 E6
12  60 60 A0 C0  81 81 98 19  4F 4F D1 9E  DC DC 7F A3  22 22 66 44  2A 2A 7E 54  90 90 AB 3B  88 88 83 0B
13  46 46 CA 8C  EE EE 29 C7  B8 B8 D3 6B  14 14 3C 28  DE DE 79 A7  5E 5E E2 BC  0B 0B 1D 16  DB DB 76 AD
14  E0 E0 3B DB  32 32 56 64  3A 3A 4E 74  0A 0A 1E 14  49 49 DB 92  06 06 0A 0C  24 24 6C 48  5C 5C E4 B8
15  C2 C2 5D 9F  D3 D3 6E BD  AC AC EF 43  62 62 A6 C4  91 91 A8 39  95 95 A4 31  E4 E4 37 D3  79 79 8B F2
16  E7 E7 32 D5  C8 C8 43 8B  37 37 59 6E  6D 6D B7 DA  8D 8D 8C 01  D5 D5 64 B1  4E 4E D2 9C  A9 A9 E0 49
17  6C 6C B4 D8  56 56 FA AC  F4 F4 07 F3  EA EA 25 CF  65 65 AF CA  7A 7A 8E F4  AE AE E9 47  08 08 18 10
18  BA BA D5 6F  78 78 88 F0  25 25 6F 4A  2E 2E 72 5C  1C 1C 24 38  A6 A6 F1 57  B4 B4 C7 73  C6 C6 51 97
19  E8 E8 23 CB  DD DD 7C A1  74 74 9C E8  1F 1F 21 3E  4B 4B DD 96  BD BD DC 61  8B 8B 86 0D  8A 8A 85 0F
1A  70 70 90 E0  3E 3E 42 7C  B5 B5 C4 71  66 66 AA CC  48 48 D8 90  03 03 05 06  F6 F6 01 F7  0E 0E 12 1C
1B  61 61 A3 C2  35 35 5F 6A  57 57 F9 AE  B9 B9 D0 69  86 86 91 17  C1 C1 58 99  1D 1D 27 3A  9E 9E B9 27
1C  E1 E1 38 D9  F8 F8 13 EB  98 98 B3 2B  11 11 33 22  69 69 BB D2  D9 D9 70 A9  8E 8E 89 07  94 94 A7 33
1D  9B 9B B6 2D  1E 1E 22 3C  87 87 92 15  E9 E9 20 C9  CE CE 49 87  55 55 FF AA  28 28 78 50  DF DF 7A A5
1E  8C 8C 8F 03  A1 A1 F8 59  89 89 80 09  0D 0D 17 1A  BF BF DA 65  E6 E6 31 D7  42 42 C6 84  68 68 B8 D0
1F  41 41 C3 82  99 99 B0 29  2D 2D 77 5A  0F 0F 11 1E  B0 B0 CB 7B  54 54 FC A8  BB BB D6 6D  16 16 3A 2C
Table A.4: Sbox T3
0 1 2 3 4 5 6 7
00 63 63 63 63 7C 7C 7C 7C 77 77 77 77 7B 7B 7B 7B F2 F2 F2 F2 6B 6B 6B 6B 6F 6F 6F 6F C5 C5 C5 C5
01 30 30 30 30 01 01 01 01 67 67 67 67 2B 2B 2B 2B FEFEFEFE D7 D7 D7 D7 ABABABAB 76 76 76 76
02 CACACACA 82 82 82 82 C9 C9 C9 C9 7D 7D 7D 7D FA FA FA FA 59 59 59 59 47 47 47 47 F0 F0 F0 F0
03 ADADADAD D4 D4 D4 D4 A2 A2 A2 A2 AFAFAFAF 9C 9C 9C 9C A4 A4 A4 A4 72 72 72 72 C0 C0 C0 C0
04 B7 B7 B7 B7 FDFDFDFD 93 93 93 93 26 26 26 26 36 36 36 36 3F 3F 3F 3F F7 F7 F7 F7 CCCCCCCC
05 34 34 34 34 A5 A5 A5 A5 E5 E5 E5 E5 F1 F1 F1 F1 71 71 71 71 D8 D8 D8 D8 31 31 31 31 15 15 15 15
06 04 04 04 04 C7 C7 C7 C7 23 23 23 23 C3 C3 C3 C3 18 18 18 18 96 96 96 96 05 05 05 05 9A 9A 9A 9A
07 07 07 07 07 12 12 12 12 80 80 80 80 E2 E2 E2 E2 EBEBEBEB 27 27 27 27 B2 B2 B2 B2 75 75 75 75
08 09 09 09 09 83 83 83 83 2C 2C 2C 2C 1A 1A 1A 1A 1B 1B 1B 1B 6E 6E 6E 6E 5A 5A 5A 5A A0 A0 A0 A0
09 52 52 52 52 3B 3B 3B 3B D6 D6 D6 D6 B3 B3 B3 B3 29 29 29 29 E3 E3 E3 E3 2F 2F 2F 2F 84 84 84 84
0A 53 53 53 53 D1 D1 D1 D1 00 00 00 00 EDEDEDED 20 20 20 20 FCFCFCFC B1 B1 B1 B1 5B 5B 5B 5B
0B 6A 6A 6A 6A CBCBCBCB BEBEBEBE 39 39 39 39 4A 4A 4A 4A 4C 4C 4C 4C 58 58 58 58 CFCFCFCF
0C D0 D0 D0 D0 EF EF EF EF AAAAAAAA FB FBFB FB 43 43 43 43 4D 4D 4D 4D 33 33 33 33 85 85 85 85
0D 45 45 45 45 F9 F9 F9 F9 02 02 02 02 7F 7F 7F 7F 50 50 50 50 3C 3C 3C 3C 9F 9F 9F 9F A8 A8 A8 A8
0E 51 51 51 51 A3 A3 A3 A3 40 40 40 40 8F 8F 8F 8F 92 92 92 92 9D 9D 9D 9D 38 38 38 38 F5 F5 F5 F5
0F BCBCBCBC B6 B6 B6 B6 DADADADA 21 21 21 21 10 10 10 10 FF FF FF FF F3 F3 F3 F3 D2 D2 D2 D2
10 CDCDCDCD 0C 0C 0C 0C 13 13 13 13 ECECECEC 5F 5F 5F 5F 97 97 97 97 44 44 44 44 17 17 17 17
11 C4 C4 C4 C4 A7 A7 A7 A7 7E 7E 7E 7E 3D 3D 3D 3D 64 64 64 64 5D 5D 5D 5D 19 19 19 19 73 73 73 73
12 60 60 60 60 81 81 81 81 4F 4F 4F 4F DC DC DC DC 22 22 22 22 2A 2A 2A 2A 90 90 90 90 88 88 88 88
13 46 46 46 46 EE EE EE EE B8 B8 B8 B8 14 14 14 14 DE DE DE DE 5E 5E 5E 5E 0B 0B 0B 0B DB DB DB DB
14 E0 E0 E0 E0 32 32 32 32 3A 3A 3A 3A 0A 0A 0A 0A 49 49 49 49 06 06 06 06 24 24 24 24 5C 5C 5C 5C
15 C2 C2 C2 C2 D3 D3 D3 D3 AC AC AC AC 62 62 62 62 91 91 91 91 95 95 95 95 E4 E4 E4 E4 79 79 79 79
16 E7 E7 E7 E7 C8 C8 C8 C8 37 37 37 37 6D 6D 6D 6D 8D 8D 8D 8D D5 D5 D5 D5 4E 4E 4E 4E A9 A9 A9 A9
17 6C 6C 6C 6C 56 56 56 56 F4 F4 F4 F4 EA EA EA EA 65 65 65 65 7A 7A 7A 7A AE AE AE AE 08 08 08 08
18 BA BA BA BA 78 78 78 78 25 25 25 25 2E 2E 2E 2E 1C 1C 1C 1C A6 A6 A6 A6 B4 B4 B4 B4 C6 C6 C6 C6
19 E8 E8 E8 E8 DD DD DD DD 74 74 74 74 1F 1F 1F 1F 4B 4B 4B 4B BD BD BD BD 8B 8B 8B 8B 8A 8A 8A 8A
1A 70 70 70 70 3E 3E 3E 3E B5 B5 B5 B5 66 66 66 66 48 48 48 48 03 03 03 03 F6 F6 F6 F6 0E 0E 0E 0E
1B 61 61 61 61 35 35 35 35 57 57 57 57 B9 B9 B9 B9 86 86 86 86 C1 C1 C1 C1 1D 1D 1D 1D 9E 9E 9E 9E
1C E1 E1 E1 E1 F8 F8 F8 F8 98 98 98 98 11 11 11 11 69 69 69 69 D9 D9 D9 D9 8E 8E 8E 8E 94 94 94 94
1D 9B 9B 9B 9B 1E 1E 1E 1E 87 87 87 87 E9 E9 E9 E9 CE CE CE CE 55 55 55 55 28 28 28 28 DF DF DF DF
1E 8C 8C 8C 8C A1 A1 A1 A1 89 89 89 89 0D 0D 0D 0D BF BF BF BF E6 E6 E6 E6 42 42 42 42 68 68 68 68
1F 41 41 41 41 99 99 99 99 2D 2D 2D 2D 0F 0F 0F 0F B0 B0 B0 B0 54 54 54 54 BB BB BB BB 16 16 16 16
Table A.5: Sbox T4
Appendix B
Decompositions of the AES Sbox
In the sequel, the standard AES sbox is decomposed into a number of smaller sboxes, as
described in Section 6.6 on page 97. For each decomposition, the function to compute $S[x]$
given $x$ is shown.
The standard AES Sbox
The standard sbox is an efficient realization of the mapping
$\{0,1\}^8 \to \{0,1\}^8, \quad x \mapsto S[x].$
63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76
CA 82 C9 7D FA 59 47 F0 AD D4 A2 AF 9C A4 72 C0
B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15
04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75
09 83 2C 1A 1B 6E 5A A0 52 3B D6 B3 29 E3 2F 84
53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF
D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8
51 A3 40 8F 92 9D 38 F5 BC B6 DA 21 10 FF F3 D2
CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73
60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB
E0 32 3A 0A 49 06 24 5C C2 D3 AC 62 91 95 E4 79
E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08
BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A
70 3E B5 66 48 03 F6 0E 61 35 57 B9 86 C1 1D 9E
E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF
8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16
Table B.1: The standard sbox S
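Table B.1 can be regenerated rather than copied by hand. The following is a minimal sketch, assuming the algebraic definition of the sbox from FIPS-197 (inversion in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, followed by the affine map with constant 0x63). The function names `gf_mul`, `gf_inv`, `rotl8` and `aes_sbox` are hypothetical and chosen only for this illustration; the inversion uses plain repeated multiplication for readability, not the repeated-squaring schedule discussed elsewhere in the thesis.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= 0x11B           # reduce modulo the AES polynomial
    return result


def gf_inv(a: int) -> int:
    """Return a^254, which is the inverse of a for a != 0 and 0 for a == 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(value: int, shift: int) -> int:
    """Rotate an 8-bit value left by `shift` positions."""
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def aes_sbox() -> list[int]:
    """Build the 256-entry sbox: field inversion followed by the affine map."""
    table = []
    for x in range(256):
        y = gf_inv(x)
        table.append(y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63)
    return table


if __name__ == "__main__":
    sbox = aes_sbox()
    # Print the table in the 16 x 16 layout of Table B.1.
    for row in range(16):
        print(" ".join(f"{sbox[16 * row + col]:02X}" for col in range(16)))
```

Entry `sbox[16 * row + col]` corresponds to the entry in row `row` and column `col` of Table B.1, both counted from zero.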
Decomposition of the sbox S into 2 smaller sboxes
The standard sbox $S$ is split into 2 smaller sboxes $S^{(2)}_0$ and $S^{(2)}_1$, each mapping from
$\{0,1\}^8$ to $\{0,1\}^4$. The application of the sbox is then realized as
$\{0,1\}^8 \to \{0,1\}^4 \times \{0,1\}^4, \quad x \mapsto 16 \cdot S^{(2)}_1[x] \oplus S^{(2)}_0[x]$
$S^{(2)}_0$:
3 C 7 B 2 B F 5 0 1 7 B E 7 B 6
A 2 9 D A 9 7 0 D 4 2 F C 4 2 0
7 D 3 6 6 F 7 C 4 5 5 1 1 8 1 5
4 7 3 3 8 6 5 A 7 2 0 2 B 7 2 5
9 3 C A B E A 0 2 B 6 3 9 3 F 4
3 1 0 D 0 C 1 B A B E 9 A C 8 F
0 F A B 3 D 3 5 5 9 2 F 0 C F 8
1 3 0 F 2 D 8 5 C 6 A 1 0 F 3 2
D C 3 C F 7 4 7 4 7 E D 4 D 9 3
0 1 F C 2 A 0 8 6 E 8 4 E E B B
0 2 A A 9 6 4 C 2 3 C 2 1 5 4 9
7 8 7 D D 5 E 9 C 6 4 A 5 A E 8
A 8 5 E C 6 4 6 8 D 4 F B D B A
0 E 5 6 8 3 6 E 1 5 7 9 6 1 D E
1 8 8 1 9 9 E 4 B E 7 9 E 5 8 F
C 1 9 D F 6 2 8 1 9 D F 0 4 B 6
$S^{(2)}_1$:
6 7 7 7 F 6 6 C 3 0 6 2 F D A 7
C 8 C 7 F 5 4 F A D A A 9 A 7 C
B F 9 2 3 3 F C 3 A E F 7 D 3 1
0 C 2 C 1 9 0 9 0 1 8 E E 2 B 7
0 8 2 1 1 6 5 A 5 3 D B 2 E 2 8
5 D 0 E 2 F B 5 6 C B 3 4 4 5 C
D E A F 4 4 3 8 4 F 0 7 5 3 9 A
5 A 4 8 9 9 3 F B B D 2 1 F F D
C 0 1 E 5 9 4 1 C A 7 3 6 5 1 7
6 8 4 D 2 2 9 8 4 E B 1 D 5 0 D
E 3 3 0 4 0 2 5 C D A 6 9 9 E 7
E C 3 6 8 D 4 A 6 5 F E 6 7 A 0
B 7 2 2 1 A B C E D 7 1 4 B 8 8
7 3 B 6 4 0 F 0 6 3 5 B 8 C 1 9
E F 9 1 6 D 8 9 9 1 8 E C 5 2 D
8 A 8 0 B E 4 6 4 9 2 0 B 5 B 1
Decomposition of the sbox S into 4 smaller sboxes
The standard sbox $S$ is split into 4 smaller sboxes $S^{(4)}_0, \ldots, S^{(4)}_3$, each mapping from
$\{0,1\}^8$ to $\{0,1\}^2$. The application of the sbox is then realized as
$x \mapsto 64 \cdot S^{(4)}_3[x] \oplus 16 \cdot S^{(4)}_2[x] \oplus 4 \cdot S^{(4)}_1[x] \oplus S^{(4)}_0[x]$
$S^{(4)}_0$:
3033233101332332
2211213010230020
3132233001111011
0333021232023321
1302322023231330
3101001323212003
0323313111230030
1303210102210332
1030330303210113
0130220022002233
0222120023021101
3031112102021220
2012020201033132
0212032211312112
1001112032312103
0111322011130032
$S^{(4)}_1$:
0312023100123121
2023221031033100
1301131311100201
1100211210002101
2032232002102031
0003030222322323
0322030112030332
0003032131200300
3303311111331320
0033020213213322
0022211300300112
1213313231121232
2213311123132322
0311201301121033
0220223123123123
3023310202330121
$S^{(4)}_2$:
2333322030223123
0003310321221230
3312333032233131
0020110101022233
0021121213132220
1102233120330010
1223003003031312
1200113333121331
0012110102332113
2001221002311101
2330002101221123
2032010221322320
3322123021310300
3332003023130011
2311210111020121
0200320201203131
$S^{(4)}_3$:
1111311300103321
3231311323222213
2320003302331300
0303020200233021
0200011210320302
1303032113201113
3323110213011022
1212220322300333
3003121032101101
1213002213203103
3000100133212231
3301231211331120
2100022333101222
1021103010122302
3320132220233103
2220231112002120
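Decomposition of the sbox S into 8 smaller sboxes
The standard sbox $S$ is split into 8 smaller sboxes $S^{(8)}_0, \ldots, S^{(8)}_7$, each mapping from
$\{0,1\}^8$ to $\{0,1\}^1$. The application of the sbox is then realized as
$\{0,1\}^8 \to \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\} \times \{0,1\}$
$x \mapsto 128 \cdot S^{(8)}_7[x] \oplus 64 \cdot S^{(8)}_6[x] \oplus 32 \cdot S^{(8)}_5[x] \oplus 16 \cdot S^{(8)}_4[x] \oplus 8 \cdot S^{(8)}_3[x] \oplus 4 \cdot S^{(8)}_2[x] \oplus 2 \cdot S^{(8)}_1[x] \oplus S^{(8)}_0[x]$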
$S^{(8)}_0$:
1011011101110110
0011011010010000
1110011001111011
0111001010001101
1100100001011110
1101001101010001
0101111111010010
1101010100010110
1010110101010111
0110000000000011
0000100001001101
1011110100001000
0010000001011110
0010010011110110
1001110010110101
0111100011110010
$S^{(8)}_1$:
1011111000111111
1100101000110010
1011111000000000
0111010111011110
0101111011110110
1000000111101001
0111101000110010
0101100001100111
0010110101100001
0010110011001111
0111010011010000
1010001001010110
1001010100011011
0101011100101001
0000001011101001
0000111000010011
$S^{(8)}_2$:
0110001100101101
0001001011011100
1101111111100001
1100011010000101
0010010000100011
0001010000100101
0100010110010110
0001010111000100
1101111111111100
0011000011011100
0000011100100110
1011111011101010
0011111101110100
0111001101101011
0000001101101101
1001110000110101
$S^{(8)}_3$:
0101011000011010
1011110010011000
0100010100000100
0000100100001000
1011111001001010
0001010111111111
0111010001010111
0001011010100100
1101100000110110
0011010101101111
0011100100100001
0101101110010111
1101100011011111
0100100100010011
0110111011011011
1011100101110010
$S^{(8)}_4$:
0111100010001101
0001110101001010
1110111010011111
0000110101000011
0001101011110000
1100011100110010
1001001001011110
1000111111101111
0010110100110111
0001001000111101
0110000101001101
0010010001100100
1100101001110100
1110001001110011
0111010111000101
0000100001001111
$S^{(8)}_5$:
1111111010111011
0001100110110110
1101111011111010
0010000000011111
0010010101011110
0001111010110000
0111001001010101
0100001111010110
0001000001111001
1000110001100000
1110001000110011
1011000110111110
1111011010100100
1111001011010000
1100100000010010
0100110100101010
$S^{(8)}_6$:
1111111100101101
1011111101000011
0100001100111100
0101000000011001
0000011010100100
1101010111001111
1101110011011000
1010000100100111
1001101010101101
1011000011001101
1000100111010011
1101011011111100
0100000111101000
1001101010100100
1100110000011101
0000011110000100
$S^{(8)}_7$:
0000100100001110
1110100111111101
1110001101110100
0101010100111010
0100000100110101
0101011001100001
1111000101000011
0101110111100111
1001010011000000
0101001101101001
1000000011101110
1100110100110010
1000011111000111
0010001000011101
1110011110111001
1110110001001010
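The three decompositions in this appendix follow one pattern: the j-th small table stores bits j·w to (j+1)·w−1 of every sbox entry, where w = 8/k is the output width, and a lookup recombines the pieces with the weights 2^(j·w). The sketch below illustrates this under stated assumptions: `S_ROW0` holds only the first row of Table B.1, and the helper names `split_sbox` and `lookup` are hypothetical and introduced here for illustration. It is not the implementation measured in Section 6.6.

```python
# First row of Table B.1 (S[0x00] .. S[0x0F]); enough to illustrate the idea.
S_ROW0 = [0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
          0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76]


def split_sbox(sbox: list[int], k: int) -> list[list[int]]:
    """Split a table of 8-bit entries into k tables of 8/k output bits each.

    Table j holds bits j*w .. (j+1)*w - 1 of every entry, with w = 8 // k.
    """
    if k not in (1, 2, 4, 8):
        raise ValueError("k must divide 8")
    width = 8 // k
    mask = (1 << width) - 1
    return [[(entry >> (j * width)) & mask for entry in sbox] for j in range(k)]


def lookup(parts: list[list[int]], x: int) -> int:
    """Recombine the small tables: XOR of parts[j][x] shifted by j * width bits."""
    width = 8 // len(parts)
    value = 0
    for j, table in enumerate(parts):
        value ^= table[x] << (j * width)
    return value


if __name__ == "__main__":
    for k in (2, 4, 8):
        parts = split_sbox(S_ROW0, k)
        # Every lookup through the small tables must reproduce the original entry.
        assert all(lookup(parts, x) == S_ROW0[x] for x in range(len(S_ROW0)))
        print(k, ["".join(format(v, "X") for v in table) for table in parts])
```

For k = 2 the shift amounts 0 and 4 correspond to the weights 1 and 16 in the formula for the 2-way decomposition, and the first printed table reproduces the first row listed for $S^{(2)}_0$. Each lookup touches all k small tables regardless of the value of x.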
List of Tables
4.1 Computation of $(u^{254} \oplus r_{1,13})$ using repeated squaring
4.2 Computation of $(u^{254} \oplus r_1)$ using repeated squaring (simplified version)
4.3 Hardware costs of different inversion circuits
5.1 All possible differences of $p_0, p'_0$
5.2 Overview over the fault based collision attacks
6.1 The memory hierarchy
6.2 Comparing properties of different CBAs
6.3 Experimental environment
6.4 The resistance of the standard implementation
6.5 The resistance of the fast implementation
6.6 The resistance of the fast implementation using only T0
6.7 The resistance of small-2
6.8 Timings for small-2, small-4 and small-8 applied on the last round of AES
6.9 Information Leakage, Resistance and Efficiency of AES implementations
6.10 Simplified Comparison of Implementations
A.1 Sbox T0
A.2 Sbox T1
A.3 Sbox T2
A.4 Sbox T3
A.5 Sbox T4
B.1 The standard sbox S
List of Figures
2.1 Mapping the plaintext p into a state
2.2 The SubBytes transformation
2.3 The ShiftRows transformation
2.4 The MixColumns transformation
2.5 The AddRoundKey transformation
3.1 Black box model of classical cryptography
3.2 Extended black box model that incorporates side channels
5.1 Model of an enhanced smartcard with memory encryption mechanism (MEM)
6.1 Partitioning the address of requested data
6.2 Different types of cache memory
6.3 Basic structure of a time driven CBA
6.4 Basic structure of a trace driven CBA
6.5 Prime-and-Probe method
6.6 Formal outline of an access driven CBA
6.7 Combining small tables with permutation π
Bibliography
Acıi¸cmez, O., and C¸. K. Ko¸c. 2006. Trace Driven Cache Attack on AES. In ICICS. P. Ning,
S. Qing, and N. Li. (Eds.). Vol. 4307 of Lecture Notes in Computer Science. Springer
Verlag. Pp. 112–121.
Acıi¸cmez, O., W. Schindler, and C¸. K. Ko¸c. 2005. Improving Brumley and Boneh timing
attack on unprotected SSL implementations. In ACM Conference on Computer and
Communications Security. V. Atluri, C. Meadows, and A. Juels. (Eds.). ACM. Pp. 139–
146.
Akkar, M.-L., and C. Giraud. 2001. An Implementation of DES and AES, Secure against
Some Attacks. In Ko¸c, Naccache and Paar (2001). Pp. 309–318.
Akkar, M.-L., and L. Goubin. 2003. A generic protection against high-order differential power
analysis. In Johansson (2003). Pp. 192–205.
Akkar, M.-L., R. B´evan, and L. Goubin. 2004. Two Power Analysis Attacks against One-
Mask Methods. In 11th International Workshop on Fast Software Encryption — FSE
2004. B. K. Roy, and W. Meier. (Eds.). Vol. LNCS 3017 of Lecture Notes in Computer
Science. Springer-Verlag.
Anderson, R. 2001. Security Engineering: A Guide to Building Dependable Distributed Sys-
tems. Wiley & Sons.
Anderson, R. J., and M. G. Kuhn. 1996. Tamper resistance – a cautionary note. Proceedings
of the second USENIX Workshop on Electronic Commerce. USENIX Association. Pp. 1–
11.
Bar-El, H., H. Choukri, D. Naccache, M. Tunstall, and C. Whelan. 2006. The Sorcerer’s
Apprentice Guide to Fault Attacks. Proceedings of the IEEE. Vol. 94. Pp. 370–382.
Baudron, O., F. Boudot, P. Bourel, E. Bresson, J. Corbel, L. Frisch, H. Gilbert, M. Girault,
L. Goubin, J.-F. Misarsky, P. Nguyen, J. Patarin, D. Pointcheval, J. Stern, J. Traor,
and G. Poupard. 2000. GPS - An Asymmetric Identification Scheme for on the fly
Authentication of Low Cost Smart Cards.
127
Bernstein, D. J. 2005. Cache-timing attacks on AES.
http://cr.yp.to/papers.html#cachetiming.
Bertoni, G., V. Zaccaria, L. Breveglieri, M. Monchiero, and G. Palermo. 2005. AES Power
Attack Based on Induced Cache Miss and Countermeasure. International Symposium
on Information Technology: Coding and Computing (ITCC 2005), Volume 1, 4-6 April
2005, Las Vegas, Nevada, USA. IEEE Computer Society. Pp. 586–591.
Biham, E., and A. Shamir. 1997. Differential fault analysis of secret key cryptosystems.
In Advances in Cryptology - CRYPTO ’97, 17th Annual International Cryptology Con-
ference, Santa Barbara, California, USA, August 17-21, 1997, Proceedings. Burton S.
Kaliski Jr. (ed.). Vol. 1294 of Lecture Notes in Computer Science. Springer. Pp. 513–525.
Biham, E., and A. Shamir. 1999. Power Analysis of the Key Scheduling of the AES Candi-
dates. Proceedings of the Second AES Candidate Conference (AES2). Rome, Italy.
Bl¨omer, J., and J.-P. Seifert. 2003. Fault Based Cryptanalysis of the Advanced Encryption
Standard (AES). In Financial Cryptography, 7th International Conference, FC 2003,
Guadeloupe, French West Indies, January 27-30, 2003, Revised Papers. R. N. Wright
(ed.). Vol. 2742 of Lecture Notes in Computer Science. Springer-Verlag. Pp. 162–181.
Bl¨omer, J., and V. Krummel. 2006. Fault based collision attacks on AES. In Fault Diagnosis
and Tolerance in Cryptography, Third International Workshop, FDTC 2006, Yokohama,
Japan, October 10, 2006, Proceedings. L. Breveglieri, I. Koren, D. Naccache, and J.-P.
Seifert. (Eds.). Vol. 4236 of Lecture Notes in Computer Science. Springer. Pp. 106–120.
Bl¨omer, J., and V. Krummel. 2007. Analysis of countermeasures against access driven cache
attacks on AES. In Selected Areas in Cryptography, 14th International Workshop, SAC
2007, Ottawa, Canada, August 16-17, 2007, Revised Selected Papers. C. Adams, A. Miri,
and M. Wiener. (Eds.). Lecture Notes in Computer Science. to appear.
Bl¨omer, J., J. Guajardo, and V. Krummel. 2004. Provably secure masking of AES. In Selected
Areas in Cryptography, 11th International Workshop, SAC 2004, Waterloo, Canada,
August 9-10, 2004, Revised Selected Papers. H. Handschuh, and M. A. Hasan. (Eds.).
Vol. 3357 of Lecture Notes in Computer Science. Springer Verlag.
Boneh, D., R. A. DeMillo, and R. J. Lipton. 1997. On the importance of checking crypto-
graphic protocols for faults (extended abstract). In Advances in Cryptology - EURO-
CRYPT ’97, International Conference on the Theory and Application of Cryptographic
Techniques, Konstanz, Germany, May 11-15, 1997, Proceeding. W. Fumy (ed.). Vol.
1233 of Lecture Notes in Computer Science. Springer. Pp. 37–51.
Brickell, E. F., G. Graunke, M. Neve, and J.-P. Seifert. 2006. Software mitigations to
hedge AES against cache-based software side channel vulnerabilities. Technical Report
2006/052. Cryptology ePrint Archive. http://eprint.iacr.org/2006/052.
Brumley, D., and D. Boneh. 2005. Remote timing attacks are practical. Computer Networks
48, 701–716.
Cathalo, J., F. Koeune, and J.-J. Quisquater. 2003. A New Type of Timing Attack: Appli-
cation to GPS. In Walter, Ko¸c and Paar (2003). Pp. 291–303.
Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi. 1999. Towards sound approaches to
counteract power-analysis attacks. In Wiener (1999). Pp. 398–412.
Chen, C.-N., and S.-M. Yen. 2003. Differential fault analysis on AES key schedule and some
countermeasures. In Information Security and Privacy, 8th Australasian Conference,
ACISP 2003, Wollongong, Australia, July 9-11, 2003, Proceedings. R. Safavi-Naini, and
J. Seberry. (Eds.). Vol. 2727 of Lecture Notes in Computer Science. Springer. Pp. 118–
129.
Clavier, C., J. Coron, and N. Dabbous. 2000. Differential Power Analysis in the Presence of
Hardware Countermeasures. In Ko¸c and Paar (2000). Pp. 252–263.
Daemen, J., and V. Rijmen. 1999. Resistance against Implementa-
tion Attacks: A comparative Study of the AES Proposals. Sec-
ond Advanced Encryption Standard (AES) Candidate Conference.
http://csrc.nist.gov/encryption/aes/round1/conf2/aes2conf.htm.
Daemen, J., and V. Rijmen. 2002. The Design of Rijndael: AES - The Advanced Encryption
Standard. Information Security and Cryptography. Springer Verlag.
Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestr´e, J.-J. Quisquater, and J.-L. Willems. 1998.
A practical implementation of the timing attack. In CARDIS. J.-J. Quisquater, and
B. Schneier. (Eds.). number 1820 in Lecture Notes in Computer Science. Springer Verlag.
Diffie, W., and M. E. Hellman. 1976. New directions in cryptography. IEEE Transactions on
Information Theory 22, 644–654.
Drolet, G. 1998. A New Representation of Elements of Finite Fields GF(2m) Yielding Small
Complexity Arithmetic Circuits. IEEE Transactions on Computers 47, 938–946.
Dusart, P., G. Letourneux, and O. Vivolo. 2003. Differential fault analysis on A.E.S.. In Ap-
plied Cryptography and Network Security, First International Conference, ACNS 2003.
Kunming, China, October 16-19, 2003, Proceedings. J. Zhou, M. Yung, and Y. Han.
(Eds.). Vol. 2846 of Lecture Notes in Computer Science. Springer. Pp. 293–306.
van Eck, W. 1985. Electromagnetic radiation from video display units: an eavesdropping
risk?. Computers & Security 4, 269–286.
ElGamal, T. 1985. A public key cryptosystem and a signature scheme based on discrete
logarithms. In Advances in Cryptology, Proceedings of CRYPTO ’84, Santa Barbara,
California, USA, August 19-22, 1984, Proceedings. G. R. Blakley, and D. Chaum. (Eds.).
Vol. 196 of Lecture Notes in Computer Science. Springer-Verlag New York, Inc.. New
York, NY, USA. Pp. 10–18.
Ferguson, N., and B. Schneier. 2003. Practical Cryptography. John Wiley & Sons.
Fournier, J., S. Moore, H. Li, R. Mullins, and G. Taylor. 2003. Security Evaluation of
Asynchronous Circuits. In Walter et al. (2003). Pp. 125–136.
Gandolfi, K., C. Mourtel, and F. Olivier. 2001. Electromagnetic analysis: Concrete results.
In Ko¸c et al. (2001). Pp. 251–261.
von zur Gathen, J., and J. Gerhard. 2003. Modern Computer Algebra. 2nd ed. Cambridge
University Press, Cambridge, UK,.
von zur Gathen, J., and M. N¨ocker. 1997. Exponentiation in Finite Fields: Theory and
Practice. In Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, 12th
International Symposium, AAECC-12, Toulouse, France, June 23-27, 1997, Proceedings.
T. Mora, and H. F. Mattson. (Eds.). Vol. 1255 of Lecture Notes in Computer Science.
Springer. Pp. 88–113.
Giraud, C. 2004. DFA on AES. In Advanced Encryption Standard - AES, 4th International
Conference, AES 2004, Bonn, Germany, May 10-12, 2004, Revised Selected and Invited
Papers. H. Dobbertin, V. Rijmen, and A. Sowa. (Eds.). Vol. 3373 of Lecture Notes in
Computer Science. Springer. Pp. 27–41.
Goli´c, J. D. 2003. DeKaRT: A New Paradigm for Key-Dependent Reversible Circuits. In
Walter et al. (2003). Pp. 98–112.
Goli´c, J. D., and C. Tymen. 2002. Multiplicative masking and power analysis of AES. In
Kaliski Jr., Ko¸c and Paar (2003). Pp. 198–212.
Goubin, L., and J. Patarin. 1999. DES and Differential Power Analysis, ”The Duplication
Method”. In Workshop on Cryptographic Hardware and Embedded Systems — CHES
1999. C¸. K. Ko¸c, and C. Paar. (Eds.). Vol. LNCS 1717 of Lecture Notes in Computer
Science. Springer-Verlag. Pp. 158–172.
Guajardo, J., and C. Paar. 2002. Itoh-Tsujii Inversion in Standard Basis and Its Application
in Cryptography and Codes. Design, Codes, and Cryptography 25, 207–216.
Handy, J. 1998. The Cache Memory Book: THE authorative reference on cache design. 2nd
ed. Academic Press.
Hennesey, J., and D. Patterson. 2002. Computer Architecture: A Quantitative Approach. 3rd
ed. Morgan Kaufmann.
Hevia, A., and M. A. Kiwi. 1999. Strength of two data encryption standard implementations
under timing attacks. ACM Transactions on Information and System Security (TISSEC)
2, 416–437.
Hu, W.-M. 1992. Lattice scheduling and covert channels. IEEE Symposium on Security and
Privacy. IEEE Press. Pp. 52–61.
Intel 1997. Using the RDTSC Instruction for Performance Monitoring. Intel Corporation
1997.
Intel 2006. Intel c
64 and IA-32 Architectures Software Developer’s Manual Volume 3: System
Programming Guide.
ISO 2002. International Organization for Standardization, ISO/IEC 7816-3: Electronic sig-
nals and transmission protocols.
Itoh, T., and S. Tsujii. 1988. A Fast Algorithm for Computing Multiplicative Inverses in
GF(2m) Using Normal Bases. Information and Computation 78, 171–177.
Johansson, T. (Ed.) 2003. Fast Software Encryption, 10th International Workshop, FSE
2003, Lund, Sweden, February 24-26, 2003, Revised Papers. Vol. 2887 of Lecture Notes
in Computer Science. Springer.
Kahn, D. 1996. The Codebreakers. Scribner.
Kaliski Jr., B. S., C¸ . K. Ko¸c, and C. Paar. (Eds.) 2003. Cryptographic Hardware and Embedded
Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August
13-15, 2002, Revised Papers. Vol. 2523 of Lecture Notes in Computer Science. Springer.
Kelsey, J., B. Schneier, D. Wagner, and C. Hall. 1998. Side channel cryptanalysis of product
ciphers. In Computer Security - ESORICS 98, 5th European Symposium on Research in
Computer Security, Louvain-la-Neuve, Belgium, September 16-18, 1998, Proceedings. J.-
J. Quisquater, Y. Deswarte, C. Meadows, and D. Gollmann. (Eds.). Vol. 1485 of Lecture
Notes in Computer Science. Springer. Pp. 97–110.
Kerckhoffs, A. 1883. La cryptographie militaire. Journal des sciences militaires IX, 5–83 &
161–191.
Ko¸c, C¸. K., and C. Paar. (Eds.) 2000. Cryptographic Hardware and Embedded Systems
- CHES 2000, Second International Workshop, Worcester, MA, USA, August 17-18,
2000, Proceedings. Vol. 1965 of Lecture Notes in Computer Science. Springer.
Ko¸c, C¸., D. Naccache, and C. Paar. (Eds.) 2001. Cryptographic Hardware and Embedded
Systems - CHES 2001, Third International Workshop, Paris, France, May 14-16, 2001,
Proceedings. Vol. 2162 of Lecture Notes in Computer Science. Springer.
Kocher, P. C. 1996. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and
other systems. In Advances in Cryptology - CRYPTO ’96, 16th Annual International
Cryptology Conference, Santa Barbara, California, USA, August 18-22, 1996, Proceed-
ings. N. Koblitz (ed.). Vol. 1109 of Lecture Notes in Computer Science. Springer. Pp. 104–
113.
Kocher, P. C., J. Jaffe, and B. Jun. 1998. Introduction to Differential Power Analysis and
Related Attacks. Technical Report. Cryptography Research, Inc.
Kocher, P. C., J. Jaffe, and B. Jun. 1999. Differential power analysis. In Wiener (1999).
Pp. 388–397.
Koeune, F., and J.-J. Quisquater. 1999. A timing attack against Rijndael. Technical Report
CG-1999/1. Universit´e Catholique de Louvain.
K¨ommerling, O., and M. G. Kuhn. 1999. Design principles for tamper-resistant smartcard
processors. Proceedings of the USENIX Workshop on Smartcard Technology — Smart-
card ’99. USENIX Association. Pp. 9–20.
Kuhn, M. G. 2002. Optical Time-Domain Eavesdropping Risks of CRT Displays. IEEE
Symposium on Security and Privacy. Pp. 3–18.
Kuhn, M. G. 2003. Compromising emanations: eavesdropping risks of computer displays.
Technical Report UCAM-CL-TR-577. University of Cambridge.
Lenstra, H. W. 2002. Rijndael for algebraists.
http://www.math.berkeley.edu/~hwl/papers/rijndael0.pdf.
Lidl, R., and H. Niederreiter. 1983. Finite Fields. number 20 in Encyclopedia of Mathematics
and its Applications. Addison Wesley.
Mangard, S. 2002. A Simple Power-Analysis (SPA) Attack on Implementations of the AES
Key Expansion. Proceedings of the 5th International Conference on Information Security
and Cryptology (ICISC 2002). Vol. LNCS 2587. Springer-Verlag. Pp. 343–358.
May, M., H. Muller, and N. Smart. 2001a. Non-Deterministic Processors. 6th Australian
Conference On Information Security and Privacy (ACISP). Pp. 115–129.
May, M., H. Muller, and N. Smart. 2001b. Random Register Renaming to Foil DPA. In Ko¸c
et al. (2001).
Menezes, A. J., P. C. van Oorschot, and S. A. Vanstone. 1997. Handbook of Applied Cryp-
tography. CRC Press.
Messerges, T. S. 2000. Securing the AES finalists against power analysis attacks. In Fast
Software Encryption, 7th International Workshop, FSE 2000, New York, NY, USA,
April 10-12, 2000, Proceedings. B. Schneier (ed.). Vol. 1978 of Lecture Notes in Computer
Science. Springer. Pp. 150–164.
Montgomery, P. L. 1985. Modular multiplication without trial division. Mathematics of
Computation 44, 519–521.
Moore, S., R. Anderson, R. Mullins, G. Taylor, and J. Fournier. 2003. Balanced Self-Checking
Asynchronous Logic for Smart Card Applications. Journal of Microprocessors and Mi-
crosystems 27, 421–430.
Neve, M., and J.-P. Seifert. 2006. Advances on access-driven cache attacks on AES. In Selected
Areas in Cryptography, 13th International Workshop, SAC 2006, Montreal, Quebec,
Canada, August 17 & 18, 2006, Revised Selected Papers. E. Biham, and A. Youssef.
(Eds.). to appear.
NIST 2001. Announcing the ADVANCED ENCRYPTION STANDARD (AES). FIPS-PUB
197. National Institute for Standards and Technology (NIST).
OpenSSL Project 2005. http://www.openssl.org.
¨
Ors, S., F. G¨urkaynak, E. Oswald, and B. Preneel. 2004. Power-Analysis Attack on an ASIC
AES Implementation. Proceedings of the 2004 International Symposium on Information
Technology (ITCC 2004). IEEE Computer Society.
Osvik, D. A., A. Shamir, and E. Tromer. 2006. Cache Attacks and Countermeasures: The
Case of AES. In Topics in Cryptology - CT-RSA 2006, The Cryptographers’ Track at
the RSA Conference 2006, San Jose, CA, USA, February 13-17, 2006, Proceedings.
D. Pointcheval (ed.). Vol. 3860 of Lecture Notes in Computer Science. Springer. Pp. 1–
20.
Otto, M. 2005. Fault Attacks and Countermeasures. PhD thesis. University of Paderborn.
Page, D. 2002. Theoretical use of cache memory as a cryptanalytic side-channel. Technical
Report CSTR-02-003. Department of Computer Science, University of Bristol.
Page, D. 2003. Defending against cache based side-channel attacks. Information Security
Technical Report 8, 30–44.
Percival, C. 2005. Cache missing for fun and profit.
www.daemonology.net/papers/htt.pdf.
Piret, G., and J.-J. Quisquater. 2003. A differential fault attack technique against SPN
structures, with application to the AES and KHAZAD. In Walter et al. (2003). Pp. 77–
88.
Quisquater, J.-J., and D. Samyde. 2001. ElectroMagnetic Analysis (EMA): Measures and
Counter-Measures for Smart Cards. In Smart Card Programming and Security, Interna-
tional Conference on Research in Smart Cards, E-smart 2001, Cannes, France, Septem-
ber 19-21, 2001, Proceedings. I. Attali, and T. P. Jensen. (Eds.). Vol. 2140 of Lecture
Notes in Computer Science. Springer. Pp. 200–210.
Quisquater, J.-J., and D. Samyde. 2002. Eddy current for magnetic analysis with active
sensor. E-Smart 2002, Nice, France.
Rankl, W., and W. Effing. 2002. Handbuch der Chipkarten. 4. ed. Carl Hanser Verlag.
Rivest, R. L., A. Shamir, and L. M. Adleman. 1978. A method for obtaining digital signatures
and public-key cryptosystems.. Communications of the ACM (CACM), 21, 120–126.
Satoh, A., S. Morioka, K. Takano, and S. Munetoh. 2001. A Compact Rijndael Hardware
Architecture with S-Box Optimization. In Advances in Cryptology - ASIACRYPT 2001,
7th International Conference on the Theory and Application of Cryptology and Informa-
tion Security, Gold Coast, Australia, December 9-13, 2001, Proceedings. C. Boyd (ed.).
Vol. LNCS 2248 of Lecture Notes in Computer Science. Springer-Verlag. Pp. 239–254.
Schindler, W. 2000. A timing attack against RSA with the Chinese Remainder Theorem. In
Ko¸c and Paar (2000). Pp. 109–124.
Schneier, B. 1996. Applied Cryptography. John Wiley & Sons.
Schramm, K., G. Leander, P. Felke, and C. Paar. 2004. A collision-attack on AES: Com-
bining side channel- and differential-attack. In Cryptographic Hardware and Embedded
Systems - CHES 2004: 6th International Workshop Cambridge, MA, USA, August 11-
13, 2004. Proceedings. M. Joye, and J.-J. Quisquater. (Eds.). Vol. 3156 of Lecture Notes
in Computer Science. Springer. Pp. 163–175.
Schramm, K., T. J. Wollinger, and C. Paar. 2003. A new class of collision attacks and its
application to DES. In Johansson (2003). Pp. 206–222.
Shamir, A., and E. Tromer. 2004. Acoustic cryptanalysis - on noisy people and noisy ma-
chines. http://theory.csail.mit.edu/∼tromer/acoustic/.
Shoup, V. 2005. A Computational Introduction to Number Theory and Algebra. Cambridge
University Press.
Skorobogatov, S. P., and R. J. Anderson. 2002. Optical fault induction attacks. In Kaliski
Jr. et al. (2003). Pp. 2–12.
Stallings, W. 2005. Operating Systems: Internals and Design Principles. 5th ed. Prentice
Hall.
Stephenson, N. 1999. Cryptonomicon. 1st ed. Eos (HarperCollins).
Tiri, K., and I. Verbauwhede. 2003. Securing Encryption Algorithms against DPA at the
Logic Level: Next Generation Smart Card Technology. In Walter et al. (2003). Pp. 125–
136.
Tiri, K., M. Akmal, and I. Verbauwhede. 2002. A Dynamic and Differential CMOS Logic
with Signal Independent Power Consumption to Withstand Differential Power Analysis
on Smart Cards. 28th European Solid-State Circuits Conference (ESSCIRC 2002).
Tiu, C. C. 2005. A new frequency-based side channel attack for embedded systems. Master’s
thesis. University of Waterloo.
Trichina, E. 2003. Combinational Logic Design For AES SubByte Transformation on Masked
Data. Cryptology eprint archive: Report 2003/236. IACR.
Trichina, E., D. D. Seta, and L. Germani. 2002. Simplified Adaptive Multiplicative Masking
for AES. In Kaliski Jr. et al. (2003). Pp. 187–197.
Trostle, J. T. 1998. Timing attacks against trusted path. IEEE Symposium on Security and
Privacy. IEEE Press. Pp. 125–134.
Tsunoo, Y., E. Tsujihara, K. Minematsu, and H. Miyauchi. 2002. Cryptanalysis of Block Ci-
phers Implemented on Computers with Cache. International Symposium on Information
Theory and Its Applications (ISITA).
Tsunoo, Y., E. Tsujihara, M. Shigeri, H. Kubo, and K. Minematsu. 2006. Improving cache
attacks by considering cipher structure. International Journal of Information Security
(IJIS) 5, 166–176.
Tsunoo, Y., H. Kubo, M. Shigeri, E. Tsujihara, and H. Miyauchi. 2003a. Timing attack
on AES using cache delay in S-boxes. Symposium on Cryptography and Information
Security.
Tsunoo, Y., T. Kawabata, E. Tsujihara, K. Minematsu, and H. Miyauchi. 2003b. Tim-
ing attack on KASUMI using cache delay in S-boxes. Symposium on Cryptography and
Information Security.
Tsunoo, Y., T. Saito, T. Suzaki, M. Shigeri, and H. Miyauchi. 2003c. Cryptanalysis of DES
Implemented on Computers with Cache. In Walter et al. (2003). Pp. 62–76.
Tsunoo, Y., T. Suzaki, T. Saito, T. Kawabata, and H. Miyauchi. 2003d. Timing attack on
Camellia using cache delay in S-boxes. Symposium on Cryptography and Information
Security.
Voigtl¨ander, P. 2003. Entwicklung einer Hardwarearchitektur f¨ur einen AES-Coprozessor.
Diplomarbeit. Fachbereich Informatik, Mathematik und Naturwissenshaften, Technische
Informatik, HTWK Leipzig. Germany.
Walter, C. D., C¸. K. Ko¸c, and C. Paar. (Eds.) 2003. Cryptographic Hardware and Embedded
Systems - CHES 2003, 5th International Workshop, Cologne, Germany, September 8-10,
2003, Proceedings. Vol. 2779 of Lecture Notes in Computer Science. Springer.
Wiener, M. J. (Ed.) 1999. Advances in Cryptology - CRYPTO ’99, 19th Annual Interna-
tional Cryptology Conference, Santa Barbara, California, USA, August 15-19, 1999,
Proceedings. Vol. 1666 of Lecture Notes in Computer Science. Springer.
Wright, P. 1987. Spy Catcher: The Candid Autobiography of a Senior Intelligence Officer.
Viking Adult.
Yang, B., K. Wu, and R. Karri. 2004. Scan based side channel attack on dedicated hardware
implementations of data encryption standard. ITC ’04: Proceedings of the International
Test Conference on International Test Conference. IEEE Computer Society. Washington,
DC, USA. Pp. 339–344.