scieee Science in your language
[en] (orig)
Soundes Marzougui, Ievgen Kabin, Juliane Krämer, Thomas
Aulbach, Jean-Pierre Seifert
On the Feasibility of Single-Trace Attacks on the
Gaussian Sampler Using a CDT
Open Access via institutional repository of Technische Universität Berlin
Document type
Conference paper | Accepted version
(i. e. final author-created version that incorporates referee comments and is the version accepted for
publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)
This version is available at
https://doi.org/10.14279/depositonce-21360
Citation details
Marzougui, S., Kabin, I., Krämer, J., Aulbach, T., & Seifert, J.-P. (2023). On the Feasibility of Single-Trace
Attacks on the Gaussian Sampler Using a CDT. In Lecture Notes in Computer Science (pp. 149–169).
Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-29497-6_8.
Terms of use
This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by
the copyright and related rights legislation that applies to your usage. For other uses, you must obtain
permission from the rights-holder(s).
On the Feasibility of Single-Trace Attacks on the
Gaussian Sampler using a CDT
Soundes Marzougui1, Ievgen Kabin2, Juliane Kr¨amer3, Thomas Aulbach3, and
Jean-Pierre Seifert1,4
1Technische Universit¨at Berlin, Germany
2IHP - Leibniz-Institut f¨ur innovative Mikroelektronik
3Universit¨at Regensburg, Regensburg, Germany
{thomas.aulbach, juliane.kraemer}@ur.de
4Fraunhofer Institute for Secure Information Technology, Germany
Abstract. We present a single-trace attack against lattice-based KEMs
using the cumulative distribution table for Gaussian sampling and ex-
ecute it in a real-world environment. Our analysis takes a single power
trace of the decapsulation algorithm as input and exploits leakage of the
Gaussian sampling subroutine to reveal the session key. We investigated
the feasibility of the attack on different boards and proved that the power
consumption traces become less informative with higher clock frequen-
cies. Therefore, we introduce a machine-learning denoising technique,
which enhances the accuracy of our attack and leverages its success rate
to 100%.
We accomplish the attack on FrodoKEM, a lattice-based KEM and third-
round alternate candidate. We execute it on a Cortex-M4 board equipped
with an STM32F4 micro-controller clocked at different frequencies.
Keywords: FrodoKEM, Gaussian sampler, Machine-Learning, Post-quantum
cryptography, Power analysis, Side-channel analysis
1 Introduction
Key encapsulation mechanisms (KEM) are widely adopted in Internet protocols
to establish a secure communication between two parties through the encryption
of the exchanged messages. Classical KEMs - relying on the intractability of fac-
torization or discrete-logarithm problems for large numbers - are considered to
be threatened by attacks with quantum computers [31] in the near future [21].
To address the potential threats from quantum attacks, the National Institute of
Standards and Technology (NIST) initiated a standardization process for post-
quantum schemes in 2016, i.e., for cryptographic schemes which are assumed
to be resistant towards attacks with quantum computers [23]. Post-quantum
schemes can be categorized in five different families. Of the five, lattice-based
cryptography has received significant research traction in recent years. Lattice-
based cryptography consists of schemes whose security can be reduced to the
hardness of lattice problems, for instance, the shortest vector problem (SVP),
the closest vector problem (CVP), and the learning with errors (LWE) prob-
lem. Lattice-based schemes have increasingly attracted attention owing to their
balanced performance in terms of sizes and speed and have been studied for real-
world deployment [8,18,24,34]. In mid-2022, the NIST standardization process
reached the end of the third round and both signature schemes and KEMs have
been chosen for standardization.
FrodoKEM is a well-known lattice-based key encapsulation mechanism [4]. It
was a third-round NIST alternate candidate and recommended by German Fed-
eral Office for Infa third-round NIST alternate candidate and recommended by
German Federal Office for Information Security (BSI) [9] and the Netherlands
National Communications Security Agency (NBV) [1] for achieving quantum-
safe communication.ormation Security (BSI) [9] and the Netherlands National
Communications Security Agency (NBV) [1] for achieving quantum-safe commu-
nication. Moreover, FrodoKEM was optimized and included in different libraries
such as pqm4 [12] and liboqs [33].
Lattice-based KEMs in general and FrodoKEM in specific are considered to
be secure against quantum attacks [31]. However, their practical implementations
succumb to side-channel analysis where an adversary has access to the victim’s
device. Regarding this, one might ask whether it will be easier for an attacker
with access to a victim’s device to directly extract the session key instead of
performing a tedious side-channel analysis. For instance, an employee with the
knowledge of the device’s technical details can easily extract the secret key.
Hence, high-end data protection has become a common place standard for every
major business entity. Moreover, the session keys generated during cryptographic
processes should be secured even against the manufacturer itself, thus deeming
such processes ‘maker-proof’. Not just that, these session keys are protected by
being stored in trusted physical modules [28,35] which are tamper- and intrusion-
resistant, highly-trusted, and meet security standards and regulations used, e.g.,
in banking systems.
Given that, extracting the session key directly from the device is challenging.
Hence, most attacks target a vulnerable routine, extract sensitive information via
a side-channel analysis, and build the long-term secret key. With the knowledge
of the long-term secret key, the session key can be easily calculated by decrypting
the exchanged ciphertext.
For example, an attacker having access to a device can retrieve the secret key
by observing the power consumption, the drop of voltage, or timing variation.
Related Work In [26], Ravi et al. identified vulnerabilities in the decapsulation
procedure, exploited them to gain information about the decrypted messages,
and recovered the long-term secret key within multiple side-channel information.
Another attack was proposed by Aysu et al. [3]. The attack targeted the matrix
and polynomial schoolbook multiplication used in these protocols. The crux of
their attack was to apply a horizontal attack that makes hypotheses on several
intermediate values within a single execution, all relating to the same long-term
secret, and combine their correlations for estimating the secret key. Despite the
fact that their attack needs a single trace, its success highly depends on the
accuracy of the used triggering techniques.
In [32], Bo-Yeon Sim et al. analyzed the message encoding operation in the
encapsulation phase of different lattice-based KEMs, i.e., CRYSTALS-KYBER,
SABER, and FrodoKEM, and obtained the session key with a single power
consumption trace [32]. Their experiments show that the success rate of the
attack for FrodoKEM can be low to 79%.
In [14], Kim et al. proposed a single trace attack against KEMs employing
the cumulative distribution table (CDT) for the error vector sampling. The CDT
sampler outputs a sample by sampling first a random value and iterating through
the CDT until the entry corresponding to the uniform random value is found.
Unlike most of attacks against KEMs in the literature, their attack reveals the
session key directly, without the need to obtain the long-term secret key. This
attack is convenient tract case ephemeral keys are used and can also be applied
to lattice-based encryption algorithms that use CDT for error sampling. The
purpose of their attack is to retrieve the sampled error and reduce each LWE
instance to a linear relation with the secret information.
In this paper, we investigate the practicability of the attack proposed in [14]
in real-world circumstances and provide a proof-of-concept implementation. For
that, we use different boards running on different frequencies. We perform our
side-channel analysis and show that as compared to the previous work of [14]
reading out the samples from a single power consumption trace of the FrodoKEM
decapsulation is not feasible. In fact, the measurement gets noisier and less
informative with higher frequency and different setup. As a solution, we employ
an offline-trained machine-learning classifier on a device similar to the victim’s,
intended to predict the Gaussian samples during the attack phase. Moreover, we
discuss the possible countermeasures to our attack based on literature work.
Contribution In this paper, we investigate the feasibility of single-trace at-
tacks against KEMs using Gaussian sampling and present a full-key recovery
for FrodoKEM. We confirm the practicality of our full-key recovery attack tar-
geting the NIST reference implementation and the optimized code of the pqm4
library for FrodoKEM when executed on a 32-bit Arm Cortex-M4 using machine-
learning filtering techniques. We prove that our attack is robust to real-world
conditions, such as noisy power measurements and high frequencies. The main
contributions of the paper are summarized as follows:
Evaluation of the feasibility of a single-trace attack against FrodoKEM. Our
experimental setup relies on different realistic conditions: different clock fre-
quencies and different target boards.
Proof-of-concept implementation of a single-trace attack against FrodoKEM.
Deployment of machine-learning tools to retrieve the sensitive information
in case of noisy and/or less informative measurements.
Discussion of a possible countermeasure implementation of Gaussian sam-
pling as suggested in the literature.
Organization: The remainder of this paper is organized in six sections. In
Section 2, we give preliminaries on the FrodoKEM scheme and the Gaussian
sampling. Then, in Section 3, we present our experimental setup and the tar-
geted implementations. In Section 4, we present a simple power analysis on the
Gaussian sampling routine. Subsequently, we present a detailed mathematical
description of our single trace power analysis attack in Section 5 and focus on
the influence of the introduced noise on the feasibility of our attack. Likewise,
in Section 6, we describe our machine-learning filtering techniques. We conclude
the paper with Section 7 where we discuss the possible countermeasure and call
attention to the urgent need of further protection against single trace attacks
targeting the Gaussian sampler.
2 Background
2.1 Lattices
Alattice Λis a discrete subgroup of Rn. Given mnlinearly independent
vectors b1, ..., bmRn, the lattice Λ(b1, ..., bm) is the set of all integer linear
combinations of the bi’s, i.e.,
Λ(b1, ..., bm) = nm
X
i=1
xibi
xiZo,
where b1, ..., bmform a basis of Λand mis the rank. In this paper, we consider
full-rank lattices, i.e., with m=n. An integer lattice is a lattice for which the
basis vectors are in Zn. Usually, we consider elements modulo q, i.e., the basis
vectors and coefficients are taken from Zq.
2.2 Learning with Errors
The Learning with Errors problem (LWE), which is a generalization of the classic
Learning Parities with Noise problem, was introduced by Regev [27]. We explain
in the following the Learning with Errors problem.
Definition 1. Let n,qbe positive integers, and let χbe a distribution over Z.
For sZn
q, the LWE distribution As,χ is the distribution over Zn
q×Zqobtained
by choosing aZn
quniformly at random and an integer error eZfrom χ.
The distribution outputs the pair (a,a,s+emod q)Zn
q×Zq.
There are two important computational LWE problems:
The search problem is to recover the secret sZn
qgiven a certain number
of samples drawn from the LWE distribution As.
The decision problem is to distinguish a certain number of samples drawn
from the LWE distribution from uniformly random samples.
2.3 Gaussian Sampler and CDTs
Several lattice-based schemes use the discrete Gaussian distribution as error dis-
tribution χin Definition 1. A centered discrete Gaussian distribution is used in
LWE to ensure that the noise added to the ciphertext is truly random and unbi-
ased, providing a secure and efficient encryption scheme. The discrete Gaussian
distribution over a lattice Λis defined as
DΛ,σ(x) = ρσ(x)
PyΛρσ(y),(1)
where
ρσ(x) = e
x2
2σ2(2)
represents the continuous Gaussian function. When sampling from the positive
integers, we simply write D+
σ, which is defined accordingly by
D+
σ(x) = ρσ(x)
P+
y=0 ρσ(y)(3)
There are different generic ways to sample from a discrete Gaussian distribu-
tion. One of the approaches employs the CDT for sampling as in Algorithm 1.
First, the CDT ψis precomputed using the cumulative distribution function
of D+
σ. The idea is that the sampler returns the index iof the table ψ, such that
ψ[i]< x ψ[i+ 1], where xis generated uniformly from the interval that is
covered by the table. The parameter τdenotes the tail-cut and is chosen such
that the probability for drawing from outside the interval is negligible.
Algorithm 1 Gaussian Sampler using CDT
Require: CDT ψof length l, following a distribution D+
σ, and having a tailcut τ
Ensure: Sampled value Sfollowing the targeted distribution Dσ
1: S0
2: rnd [0,τσ)Zuniformly at random
3: sign [0,1] Zuniformly at random
4: for (i= 0 ; i < l 1; i+ +) do
5: S+=(ψ[i]rnd)>> 15
6: end for
7: S((sign)S) + sign
8: return S
2.4 Description of FrodoKEM
FrodoKEM is a lattice-based KEM with security based on the standard LWE
problem (i.e., not the ring version of the LWE problem). It is a conservative
design with security proofs. KEM’s are defined as a triple of algorithms (KeyGen,
Encaps, Decaps). KeyGen is the algorithm responsible for public and secret
key generation. The encapsulation algorithm Encaps generates a random key
kand encrypts it using the public key pk to create a ciphertext cand derive
a shared secret ss. The decapsulation algorithm Decaps decrypts cusing the
secret key sk and returns the derived session key ss, or a random output in
case the re-encrypted ciphertext does not fully match the previous output of the
encapsulation algorithm.
We describe the decapsulation of FrodoKEM as it is the target of our attack,
see Algorithm 2. We introduce the following parameters:
n, ¯m, ¯ninteger matrix dimensions with n0 (mod 8)
Bis the number of the bits encoded in each matrix entry
lenseedAthe bit length of seeds used for pseudorandom matrix generation
lenseedSE the bit length of seeds used for pseudorandom bit generation for
error sampling
Gen pseudorandom matrix generation algorithm
Tχdistribution table for sampling
lensthe length of the bit vector sused for pseudorandom shared secret
generation in the event of decapsulation failure
lenzthe bit length of seeds used for pseudorandom generation of seedSE
lenkthe bit length of intermediate shared secret k
lenpkh the bit length of the hash of the public key
lenss the bit length of shared secret ss
The decapsulation starts with the calculation of the matrix M. When sim-
plifying M, we can write it as
M=Encode(µ) + SEES+E′′ ,
where S,S,E,E, and E′′ have small entries. Therefore, SEES+E′′ will
also result in a matrix with small entries, regarded as noise. The Decode (line
4, Algorithm 2) removes this noise and returns the seed µ. The decapsulation
then continues by doing a reencryption and comparing the ciphertexts. If the
ciphertexts (line 16, Algorithm 2) are equal, the correct shared key ss is returned.
Algorithm 2 FrodoKEM Key Decapsulation according to [23]
Require: Ciphertext c1c2 {0,1}( ¯m.n+ ¯m.¯n)D, and secret key sk=
(sseedAb,ST,pkh) {0,1}lens+lenseedA+D.n.¯n×Z¯n×n
q× {0,1}lenpkh
Ensure: Shared secret key ss {0,1}lenss
1: BUnpack(c1)
2: CUnpack(c2)
3: Compute MCBS
4: Compute µDecode(M)
5: Parse pk seedAb
6: Generate pseudorandom values
seedSEkSHAKE(pkhµ, lenseedSE +lenk)
7: Generate pseudorandom bit string
(r(0), r(1),...,r(2 ¯mn+ ¯mn1))SHAKE (0x96seedSE,(2 ¯mn + ¯mn).lenχ)
8: Sample error matrix SSampleMatrix(r(0), r(1),...,r( ¯mn1),¯m, n, Tχ)
9: Sample error matrix ESampleMatrix(r( ¯mn), r( ¯mn+1),...,r(2 ¯mn1),¯m, n, Tχ)
10: Generate AGen(seedA)
11: Compute B′′ SA+E
12: Sample error matrix E′′ SampleMatrix(r(2 ¯mn),...,r(2 ¯mn+ ¯mn1),¯m, ¯n, Tχ)
13: BUnpack(b, n, ¯n)
14: Compute VSB+E′′
15: Compute CV+ Encode(µ)
16: if BC=B′′ Cthen
17: return shared secret ss SHAKE (c1c2k, lenss)
18: else
19: return shared secret ss SHAKE (c1c2s, lenss)
20: end if
3 Experimental Setup
In this section, we present the experimental setup used for our side-channel anal-
ysis. Our attack targets the implementations taken from the open-source pqm4
library [12], running on the ARM Cortex-M4 and Harvard micro-controllers 5.
3.1 Implementations of FrodoKEM
The FrodoKEM source code [2] was provided as a portable C implementation.
The reference implementation of FrodoKEM having the smallest parameter set
(frodokem640shake/ frodokem640aes) requires almost a megabyte of RAM (in-
cluding messages and keys). This is mainly due to placing the entire matrix Ain
RAM. For larger parameter sets, more memory will be required. Therefore, the
reference implementations are not suitable for the target platforms considered for
this attack (and less interesting as side-channel target anyway). The optimized
5The implementation of our attack can be found at https://github.com/Soundes-M/
Soundes-M-FrodoKEMSingleTrace-/settings
implementations provided by pqm4 reduce the memory consumption, however,
the memory footprint remains large, and only the variant of NIST security level I
(frodokem640shake) fits on STM32F6target platforms consuming 117 KB of
RAM (including messages and keys). The larger parameter sets of FrodoKEM
consume between 181 KB (frodokem976shake) and 298 KB (frodokem1344aes).
This is due to the fact that the implementations of the AES parameter sets of
FrodoKEM use more memory than their SHAKE counterparts, and, thus, exceed
our memory limits by far [13]. Additionally, pqm4 [12] includes M4-optimized
assembly implementations for frodokem640shake and frodokem640aes by [5]. It
speeds up the polynomial multiplication and decreases the stack memory con-
sumption [5]. In that context, we mount our attack against the optimized [12]
implementation of frodokem640shake.
3.2 Experimental Workbench
To record the traces for the attack, we used two different target boards,i.e., 8-bit
Harvard and 32-bit Cortex, mounted on a ChipWhisperer Lite CW308 UFO as
in Figure 1. The ChipWhisperer is equipped with an analog-to-digital (ADC)
which converts the voltage input to a digital number representing the magnitude
of the voltage.
During recording, the ChipWhisperer and the micro-controller are synchro-
nized. The sampling rate of the analog-to-digital converter (ADC) was set to 4
samples/cycle with 10-bit resolution. We used a Python script running on the
PC to collect and store all relevant traces. We also used a high pass filter in the
first experiment to remove the low-frequency noise [10]. For higher frequency,
we used an external crystal Quarz oscillator.
The reason behind picking the two architectures 8-bit Harvard and 32-bit
Cortex is that we wanted to figure out the feasibility of the single trace attack
claimed in [14]. The findings in [14] need to be interpreted with cautions. In fact,
in [14], the authors used an 8-bit Harvard board equipped with an XMega micro-
controller which is especially common in educational embedded applications. In
contrast, in real world, Cortex-M boards have been embedded in tens of billions
of consumer devices.
Target Boards Setup Our setup is composed of a CW308 UFO Board which
is a board suitable for attacking different sorts of embedded targets. The UFO
board has three 20-pin female headers into which the target board fits. They
provide both electrical and mechanical connections for the board. The pin 1
in the right of the UFO board (Figure 1) corresponds to the low-side shunt
connection connected to the SMA cable, which is a coaxial cable responsible for
transmitting the signal from the target board to the ChipWhisperer. The power
consumption measurements are obtained by measuring the voltage drop across
the shunt resistor.
6We use in our experiments the STM32F4 target board which has 1 MB of Flash
memory and 192 KB of RAM
Figure 1: Experimental workbench used for our side-channel analysis; it contains
two target boards which can be mounted on the UFO board (red), a ChipWhis-
perer, an external clock oscillator, and a USB cable.
Clock Frequency Setup The CW308 has a crystal oscillator driver, which allows
the attacker to drive the victim board through the use of an external crystal.
In other words, it is possible to generate any frequency by simply putting an
appropriate crystal (as shown in Figure 1) into the socket. This step needs to be
followed by adjusting of the baudrate, and routing the specific crystal oscillator
to the victim external clock interface (CLKIN) using jumper J3.
4 Simple Side-Channel Analysis
4.1 Threat Model
Our threat model follows the power side-channel model of [19,20,22]. We assume
that an attacker has physical access to the victim’s device and is equipped with a
reasonable measurement setup that can synchronize the sampling rate within the
CPU clock period of the victim’s device such as the experimental setup described
in Section 3. The adversary in our model can record the power consumption
measurement of the key decapsulation (Algorithm 2).
Although we specify the points for our experiments where the execution of
the leaking routine (that is, the Gaussian sampling) starts in the power con-
sumption using triggering techniques, we note that for other devices, the at-
tacker might need engineering aspects of locating these cryptographic routine
sub-traces among the whole trace as explained in [6,11,37].
We indicate that our attack is a passive attack; by revealing the session key,
the adversary can also see the encrypted messages over a public channel and
has the public key of the victim which is stored in his certificate. However,
the adversary is not able to modify, drop, replay, or inject messages on the
public channel, nor use the retrieved session key to interact with the other party.
Basically, the attacker will be able only to decrypt the exchanged encrypted
messages between the two parties using the extracted session key.
4.2 Single-Trace Attack on the Gaussian Sampler
In the FrodoKEM reference and optimized implementation [2, 12], the values
are expressed in 16-bit integers, and nine bits are used for sampling. When the
value is negative and expressed in two’s complements, its significant bit is 1.
Listing 1.1: Gaussian Sampler Implementation in the reference and optimized
implementation of FrodoKEM [2,12]
1 void frodo sample n ( uint16 t s , const s i z e t n) {
2 unsigned i n t i , j ;
3 f o r ( i = 0 ; i <n ; ++i ) {
4 uint16 t sample = 0;
5 uint16 t prnd = s [ i ] >> 1 ;
6 uint16 t sign = s [ i ] & 0x1 ;
7 f o r ( j = 0 ; j <( unsigned int ) (CDF TABLE LEN 1)
8 ; j++) {
9 sample += ( uint16 t )(CDF TABLE[ j ] prnd ) >> 15;
10 }
11 s [ i ] = ((sign ) ˆ sample ) + sign ;
12 }
13 }
Hence, if the subtraction in line 9, Listing 1.1 yields a negative number, its most
significant bit is 1. When this number, expressed in 16 bits, is shifted to the right,
one is added to the value sample. However, if the subtraction outputs a positive
number, its most significant bit is 0. Owing to this, the value of sample will not
be incremented. By examining the power consumption trace during the iteration
through the CDT (Figure 2 (a)), we could distinguish between the addition of
zero and one.
After the iteration through the CDT, the sampler calculates a positive integer
called sample. Then, a bit-flipping operation is applied (line 11 in Listing 1.1). If
the sign bit is flipped and yields zero, then the sample’s sign remains the same.
Else, the sign is flipped. This can be observed clearly in Figure 2 (b).
Power Consumption Traces with Different Boards We acknowledge that
the observations in Figure 2 were already investigated by Kim and Hong in [14].
However, they are not always detectable, especially when taking the measure-
ments on boards with different architectures, such as Cortex-M4. We exemplify
this through the power consumption trace of one iteration through the CDT on
a Cortex-M4 equipped with an STM32F4 Micro-controller in Figure 3. The re-
sults presented in [14] do not apply to our setup. As in Figure 3 (a), one cannot
differentiate with the naked eye between the two colors corresponding to the
addition of zero and one. Surprisingly, the bit flipping is still vulnerable in this
setup as in Figure 3 (b).
(a) (b)
Figure 2: Figure (a) shows overlapped power consumption measurements during
the execution of line 7 - 10 of Listing 1.1, while Figure (b) shows overlapped
power consumption measurements during the sign bit flipping operation (line 11,
Listing 1.1). Both measurements are taken on an 8-bit Harvard board equipped
with an XMega micro-controller; the red color corresponds to the sampling of
the value 0 in Figure (a) and a flipped bit in Figure (b), while the blue color
corresponds to the sampling of the value 1 in Figure (a) and a non-flipped bit
in Figure (b)
Explanation For each target, the C code gets compiled and outputs a binary.
The binaries corresponding to the compilation of the C code for different targets
are identical, however, they are meant to be executed on targets that are in
reality each very different from one another. Each target is a distinct collection
of millions of transistors. Hence, this explains the differences in the power con-
sumption traces, which are the measurement of the power consumed by these
millions of transistors. The feasibility of side-channel analysis against one board
does not imply necessarily its feasibility on other boards. In the following, we
present power consumption measurements on different boards having different
frequencies, and we show that the results in [14] do neither apply for all types
of boards nor for different clock frequencies.
Power Consumption Traces with Different Frequencies We increased
the frequency from 7,327 MHz to 30 MHz and we took the power consump-
tion measurement as in Figure 4 and 5. The red color corresponds to the power
consumption traces taken at a frequency of 7,327 MHz, while the blue color indi-
cates those taken at 30 MHz. When the chip is clocked at 7,327 MHz, we notice
(as in the randomly zoomed area in Figure 4 and 5) that in every clock cycle
there are two high peaks. However, at a higher frequency we notice that only
one peak is occurring which minimized the fluctuation of the power consumption
measurement as compared to those taken at lower frequency.
Explanation We explain this behaviour based on [17]. There are two peaks (at
the low frequency 7,327 MHz). The high peak occurs when the clock edge rises
from low to high, smaller peaks take place when the clock signal falls down. At a
(a) (b)
Figure 3: Figure (a) shows overlapped power consumption measurements during
the execution of line 7 - 11 of Listing 1.1, while Figure (b) shows overlapped
power consumption measurements during the sign bit flipping operation (line
11, Listing 1.1). Both measurements are taken on on a Cortex-M4 equipped
with an STM32F4 micro-controller; the red color corresponds to the sampling
of the value 0 in Figure (a) and a flipped bit in Figure (b), while the blue color
corresponds the sampling of the value 1 in Figure (a) and a non-flipped bit in
Figure (b).
Figure 4: Overlapped Power consumption measurement during the execution of
line 9 of Listing 1.1 on an 32-bit Cortex board equipped with an STM32F4
micro-controller for two different clock frequencies
higher clock frequency, the small peak fades or even vanishes as it overlaps with
the highest peak. This yields a less informative measurement as the number of
peaks of the power consumption traces decreases with higher frequencies. To
this end, we came to the conclusion that single-trace attacks against the CDT
Gaussian sampler presented in [14] cannot be generalized to different boards and
different frequencies.
5 Description of the Attack and Error Tolerance
Once the victim starts executing the decapsulation algorithm, the attacker
records its power consumption. Then, the attacker can locate in the full-trace
the power consumption subtrace TSand TE′′ corresponding to the sampling of
Sand E′′ as in Algorithm 2. The experimental setup is detailed in Section 3.
Figure 5: Overlapped Power consumption measurement during the execution of
line 9 of Listing 1.1 on an 8-bit Harvard board equipped with an XMega micro-
controller for two different clock frequencies
The first subtrace of the Ssampling called TSis itself composed of 5120
subtraces, each corresponding to one iteration through the CDT. We call these
subtraces TS
i, with 0 i < 5120. The attacker performs the side-channel anal-
ysis (described in Section 4) of the collected subtraces to predict the Gaussian
samples which are the 5120 entries of the matrix S. We refer to those side-
channel analyses by the function side channel (line 4 and 7, Algorithm 3). Sim-
ilarly, the attacker gets the subtrace TE′′ corresponding to the sampling of the
64 entries of the matrix E′′. Again, the attacker analyses each subtrace TE′′
jto
predict the 64 entries of the matrix E′′, where 0 j < 64.
Having the values of Sand E′′ , the attacker computes the matrix Vas in
the decapsulation (Algorithm 2) through the following equation:
V=SB+E′′ (4)
It is known from Algorithm 2 that:
C=V+Encode(µ) (5)
Then, the attacker obtains the matrix Encode(µ) by plugging Equation 4 in
Equation 5 as below.
C=SB+E′′ +Encode(µ) (6)
Note that Encode is a function that takes a bit strings of length l=Bׯmׯn
as input and encodes it to a matrix of ¯mׯnentries. We refer to [4] for more
details. The Equation 6 gives the encoding of µ.
Encode(µ) = CSBE′′ (7)
To obtain µthe attacker applies the Decode on the right side of the Equation 7.
Note that Decode decodes an m-by-nmatrix into a bit string of length B×m×n.
This means, it extracts Bbits from each entry of the matrix. Hence, µcan be
written as:
µ=Decode(CSBE′′) (8)
The attacker then calculates the session key ss. The attack is summarized in
Algorithm 3.
Algorithm 3 Single Trace Attack
Require: Ciphertext c1c2 {0,1}( ¯m.n+ ¯m.¯n)Dand power consumption traces
TS= (TS0,...,TS¯mn1) and TE′′ = (TE0,...,TE¯m¯n1)
Ensure: The session key ss
1: BUnpack(c1)
2: CUnpack(c2)
3: for i {0. . . ¯mn}do
4: S
iside channel(TSi)
5: end for
6: for i {0. . . ¯m¯n}do
7: E′′
iside channel(TE′′ i)
8: end for
9: Compute V=SB+E′′
10: Compute Encode(µ) = CVand get µby applying Decode()
11: Generate pseudorandom values seedSEkSHAKE(pkhµ, lenSE +lenk)
12: shared secret ss SHAKE (c1c2k, lenss)
13: return ss
Noise Tolerance We notice here that the noise on the samples E′′ does not
affect the correctness of the attack because the Decode() function rounds the
term CSBE′′ to the (32 B) most significant bits. Hence, the Bleast
significant bits -even when guessed wrong- do not have consequences on the value
µ(Bis equal to 2 for the first security level of FrodoKEM).
On the other side, having an error on one coefficient of the matrix Swill
propagate when this latter is multiplied by Band will result in different erro-
neous matrix entries. This error can be located in the first bits of the coefficients
of the resulting matrix (SB) which will not be tolerated and result in a wrong
value of µ. We include in the following section, a machine-learning side chan-
nel analysis technique that leverage the accuracy of the single trace attack and
enhance the attacker capability to retrieve the session key.
6 Machine-Learning Side-Channel Analysis
Side-channel experiments are usually carried out in careful and non-realistic
circumstances. For example, the noise is minimized by using noise filtering tech-
niques, or non-commercial prototype boards are targeted. This yields easy-to-
analyse power consumption traces. However, in reality the real-world conditions
such as noise can prohibit the attack as explained in Section 5. To tackle this
problem, we write the power consumption 7at a specific point of time as the
following:
P=Pop +Pdata +Pnoise +Pconst (9)
7We mean here the power consumption of the device while running a cryptographic
operation.
The four entities Pop,Pdata,Pnoise, and Pconst are functions of time. The
entities Pop and Pdata are the power consumption depending on the executed
operation and the data, respectively. The entity Pnoise represents the noise added
to the power consumption measurement, and Pconst refers to constant power
consumption that occurs independently of the operation and the data.
Power analysis exploits the dependency between the power consumption and
the processed operation and data, i.e., Pop and Pdata. In Section 4.2, we fig-
ured out two scenarios affecting the ability of an attacker to analyse the power
consumption traces. In the first scenario, the noise level is extremely high, i.e.,
Pnoise >> Pop +Pdata, which prevents the attacker from detecting the leak-
age. To characterize the added noise Pnoise, we recorded the power consumption
while running the Gaussian sampling with fixed input, i.e., fixed random num-
bers. This yields the repeated execution of the same instructions with the same
input data. We mounted this experiment with 1000 repetitions on each board.
From each of these traces we took always the sample with the same index (i.e.,
index 120) corresponding to the first point of interest (POI) in the power con-
sumption trace. With these points we computed the two histograms in Figure 6
and 7.
For a low frequency, we point out that most of the points are concentrated
around zero. The shape of the histogram in Figure 6 indicates that the points in
the power traces follow a Gaussian distribution. However, for higher frequency
the noise distribution tends to be uniform for a cortex M4 board. For an XMega
board, the added noise follows a multidimensional Gaussian distribution.
In the second scenario, the high frequency leads to flattening the power con-
sumption trace, i.e., Pop +Pdata constant (observations in Section 4.1 demon-
strate that with higher frequencies the power consumption becomes less infor-
mative) which can often decrease the capability of an attacker to retrieve the
intermediate sensitive data with the naked eye or even with differential power
analysis.
Figure 6: Noise Distribution on
POI when the target board is
clocked at 7 MHz
Figure 7: Noise Distribution on
POI when the target board is
clocked at 30 MHz
Therefore, in both scenarios, a single-trace attack against FrodoKEM as
in [14] could not be mounted in real circumstances. To tackle these challenges,
one way is to use a spectrum analyser which displays a spectrum of signal am-
plitudes on different frequencies and determines the noise added to the signal
(power consumption of the device while running the cryptographic operation).
In view of the expensive price of a spectrum analyzer, other cheaper ways are
considered, e.g., template attacks based on a Gaussian assumption [7]. However,
template attacks have been subject to much criticism [16]. First, template at-
tacks hold true only if the assumption on the noise distribution is correct. This is
not always fulfilled in a real-world scenario. Moreover, they are useful as long as
a limited number of points of interest can be identified in leakage traces and con-
tain most of the information. If the number of useless samples in leakage traces
increases and/or the size of the profiling set becomes too limited, the template
attacks are useless [16]. Instead, machine-learning side-channel analysis is more
powerful in this case.
We propose a machine-learning technique composed of two phases: a profiling
phase and an attack phase.
6.1 Profiling Phase
To prepare the profiling, we executed the Gaussian sampling process (Algo-
rithm 1) with random input.
The training of our neural network proceeds as follows. First, we prepare
traces of the execution of multiple iterations through the CDT during the Gaus-
sian sampling function with random input (lines 7–13, Listing 1.1). The output
of each CDT iteration (i.e., the Gaussian sample) is assigned as the label. With
the prepared traces and their corresponding labels, we can build a list of exam-
ples (x, y)Rt×Y, where xRtare the power traces acting as the features,
and yYare the Gaussian samples acting as the labels. We assume that xleaks
information about y. The list of these noisy examples (x, y) is split into training,
validation, and a test set of the Multi-Layer Perceptron (MLP) machine-learning
classifier.
We emphasize that the attacker should train a classifier for each board type
(i.e., Cortex-M4, Harvard) and frequency. According to [36], ignoring the board’s
specifics and diversity can easily lead to an overestimation of the classification
accuracy.
Tuning the hyper-parameters of our machine-learning model is of particular
importance because it influences the accuracy of the trained machine-learning
classifier, hence the practicability of our attack. There are often general heuris-
tics or rules of thumb for configuring hyper-parameters. However, a better ap-
proach is to objectively search different values for model hyper-parameters and
choose a subset that maximizes the prediction accuracy of our classifier. The
so-called hyper-parameter optimization is available in the scikit-learn Python
machine-learning library [30]. The result of a hyper-parameter optimization is
a single set of well-performing hyper-parameters that is used to configure the
MLP classifiers.
We refer to each of our MLP classifiers as Classifier which is trained on traces
labeled by the output samples. We captured 20,000 power consumption traces.
We set 18,000 of them for training and testing and 2,000 for validation.
6.2 Attack Phase
Equipped with the trained MLP classifier, the attacker deduces information
about the samples from the power traces.
First, she (the attacker) lets the victim device run the decapsulation process.
Then, she extracts from the full power trace the parts of the trace corresponding
to the Gaussian sampling of the matrices Sand E′′. Each of those traces, i.e., S
and E′′ is split itself into snippets corresponding to the sampling of the matrix
entries. To this end, the attacker feeds those snippets to the classifier trained in
the profiling phase, and obtains a prediction of each entry in the matrices S
and E′′.
Once the matrices Sand E′′ are predicted by the attacker, this latter plugs
this information in Equation 8 and calculates the value µas described in Sec-
tion 5. Therefore, the session key can be calculated by hashing the value of µ
concatenated with the public known values of the ciphertext, k, and lenss as
the following: ss = SHAKE (c1c2k, lenss).
7 Countermeasures and Conclusion
It is important to note that CDT is not the only method of sampling, with bino-
mial sampling being used in modern lattice-based schemes such as Kyber. Our
attack may potentially affect the binomial distribution as similar instructions are
executed, and we anticipate it to be even more efficient due to the fact that bi-
nomial sampling counts the number of positive outcomes in binary experiments,
making bitwise operations more susceptible to attack through side-channel anal-
ysis. In the following, we discuss the possible countermeasures against the pro-
posed single-trace attack on the CDT sampler.
To avoid the presented single trace attack, the work [38] might be a coun-
termeasure. Instead of sampling directly from the target distribution Dσ, they
suggest to first start by generating a sample xfrom the base sampler D+
σ0, where
σ0=p1/2 ln (2).
Then, obtain the value yuuniformly at random from [0,· · · , K 1], and
compute z=yu+Kx, where K=jσ
σ0+ 1kis a constant. Finally, a Bernoulli
rejection sampling with acceptance rate
p= exp (y(y+ 2Kx)
2σ2)
is applied to ensure that zfollows the distribution D+
σ. In order to obtain also
negative samples, one can apply a random sign bit, but has to reject z= 0 with
probability 1/2.
This countermeasure does protect against the proposed single-trace attack
for schemes having relatively high deviation. However, for FrodoKEM the target
standard deviation is σ= 2.8. Therefore, the value of Kis small, i.e K= 4. An
attacker still can get the samples with a precision K, since she is able to reveal
the value of xwith the introduced single trace attack. She only misses out on the
value yu {0,· · · , K 1}, when trying to recover the final sample z=yu+Kx.
One of the common countermeasures is masking. In [29] Schneider et. al
presented the first protected (masked) binomial sampler claimed to be secure
against side-channel adversaries of arbitrary order. First-order masking counter-
measures are not enough to protect against ML attacks. Leveraging the masking
order is needed but it yields a performance overhead memory and time-wise.The
first-order masking technique has been, however, broken by [22] using machine-
learning side-channel analysis.
Another countermeasure was suggested by [14], where a look-up table is used
for sampling. One samples a 16-bit integer, and uses it as the index of the look-
up table and outputs the corresponding table value (or the sample). However,
this countermeasure needs to store a big look-up table. For FrodoKEM-640, the
Gaussian sample does not exceed one byte, and the random values are 16-bit
precision, where the last significant bit is used for the sign. Hence, a table of size
216 bits is needed.
Additionally, one can perform the Fisher-Yates random shuffle (or Knuth
shuffle) [15] in order to mask the relationship between the side channel and
the secret information (the samples). Specifically, after sampling the matrices
Sand E′′ their samples are shuffled. This countermeasure might be robust
against our attack, given that each session key is specific to a single session
establishment. However, in case the attacker succeeded to force the victim
reusing the same session key, the countermeasure of random shuffling is not
secure anymore and was proven to leak information [25]. An attacker needs
a marginally larger, yet still practical number of samples to rearrange the
coordinates and undo the shuffle.
To conclude, KEMs are important cryptographic routines for large-scale com-
munication protocols. As ephemeral secrets are used in these protocols, the risk
of being vulnerable to side-channel analysis can be underestimated. This is be-
cause the chance of an attack being successful decreases when the same key is
used only once in a single execution of the scheme. In this paper, we validate
that it is indeed crucial to examine the vulnerability of these schemes against
single-trace attacks targeting the session key. As lattice-based key exchange pro-
tocols are already deployed in practical applications, their side-channel evalua-
tion should play a role in the decision of their implementation choices. This pa-
per examines the feasibility of side-channel analysis against KEMs using CDT
Gaussian sampling and proves that the latter is still vulnerable to machine-
learning side-channel attacks even in real-world circumstances. We demonstrate
our results using FrodoKEM as an example, which is a NIST alternate candi-
date from the third round, and has been recommended by the German Federal
Office for Information Security(BSI) [9] and the Netherlands National Communi-
cations Security Agency (NBV) [1] for achieving quantum-safe communication.
Our single-trace attack leads to the recovery of the complete session key.
Acknowledgment
The work described in this paper has been supported by the German Federal
Ministry of Education and Research (BMBF) under the project Full Lifecycle
Post-Quantum PKI - FLOQI (ID 16KIS1074) and under the project Aquorypt
(ID 16KIS1022).
References
1. Netherlands National Communications Security Agency. Prepare for the
threat of quantum-computers, 2022. https://english.aivd.nl/publications/
publications/2022/01/18/prepare-for-the-threat-of-quantumcomputers.
2. Erdem Alkim, Joppe W. Bos, L´eo Ducas, Patrick Longa, Ilya Mironov, Michael
Naehrig, Valeria Nikolaenko, Chris Peikert, and Ananth Raghunathan Douglas
Stebil. Frodokem: Learning with errors key encapsulation. Github. https://
github.com/microsoft/PQCrypto-LWEKE.
3. Furkan Aydin, Aydin Aysu, Mohit Tiwari, Andreas Gerstlauer, and Michael Or-
shansky. Horizontal side-channel vulnerabilities of post-quantum key exchange and
encapsulation protocols. ACM Trans. Embed. Comput. Syst., 20(6), oct 2021.
4. Joppe Bos, Craig Costello, L´eo Ducas, Ilya Mironov, Michael Naehrig, Valeria
Nikolaenko, Ananth Raghunathan, and Douglas Stebila. Frodo: Take off the ring!
practical, quantum-secure key exchange from lwe. In Proceedings of the 2016 ACM
SIGSAC conference on computer and communications security, pages 1006–1018,
2016.
5. Joppe W Bos, Simon Friedberger, Marco Martinoli, Elisabeth Oswald, and Martijn
Stam. Fly, you fool! faster frodo for the arm cortex-m4. Cryptology ePrint Archive,
2018.
6. Wouter Castryck, Ilia Iliashenko, and Frederik Vercauteren. Provably weak in-
stances of ring-lwe revisited, 05 2016.
7. Suresh Chari, Josyula R. Rao, and Pankaj Rohatgi. Template attacks. In Burton S.
Kaliski, ¸cetin K. Ko¸c, and Christof Paar, editors, Cryptographic Hardware and
Embedded Systems - CHES 2002, pages 13–28, Berlin, Heidelberg, 2003. Springer
Berlin Heidelberg.
8. L´eo Ducas, Alain Durmus, Tancr`ede Lepoint, and Vadim Lyubashevsky. Bi-
modal lattice signature scheme (bliss). https://wiki.strongswan.org/projects/
strongswan/wiki/BLISS.
9. Federal Office for Information Security (BSI). Bsi tr-02102-1: ”cryptographic
mechanisms: Recommendations and key lengths” version: 2022-1, 2022.
https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/
TechGuidelines/TG02102/BSI-TR-02102-1.html.
10. NewAE Technology Inc. https://www.mouser.com/datasheet/2/894/
NAE-CW308-datasheet-1289269.pdf.
11. Mehmet Sinan Inci, Berk Gulmezoglu, Gorka Irazoqui, Thomas Eisenbarth, and
Berk Sunar. Cache attacks enable bulk key recovery on the cloud, 08 2016.
12. Matthias J. Kannwischer, Joost Rijneveld, Peter Schwabe, and Ko Stoffelen.
PQM4: Post-quantum crypto library for the ARM Cortex-M4. https://github.
com/mupq/pqm4.
13. Matthias J. Kannwischer, Joost Rijneveld, Peter Schwabe, and Ko Stoffelen. pqm4:
Testing and benchmarking NIST PQC on ARM cortex-m4. IACR Cryptol. ePrint
Arch., page 844, 2019.
14. Suhri Kim and Seokhie Hong. Single trace analysis on constant time cdt sampler
and its countermeasure. Applied Sciences, 8(10), 2018.
15. Donald E Knuth. Art of computer programming, volume 2: Seminumerical algo-
rithms. Addison-Wesley Professional, 2014.
16. Liran Lerman, Romain Poussier, Gianluca Bontempi, Olivier Markowitch, and
Fran¸cois-Xavier Standaert. Template attacks vs. machine learning revisited (and
the curse of dimensionality in side-channel analysis). In Stefan Mangard and
Axel Y. Poschmann, editors, Constructive Side-Channel Analysis and Secure De-
sign, pages 20–33, Cham, 2015. Springer International Publishing.
17. Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power Analysis Attacks:
Revealing the Secrets of Smart Cards (Advances in Information Security). Springer-
Verlag, Berlin, Heidelberg, 2007.
18. Soundes Marzougui and Juliane Kr¨amer. Post-quantum cryptography in embedded
systems, 2019.
19. Soundes Marzougui, Vincent Ulitzsch, Mehdi Tibouchi, and Jean-Pierre Seifert.
Profiling side-channel attacks on dilithium: A small bit-fiddling leak breaks it all.
Cryptology ePrint Archive, Paper 2022/106, 2022. https://eprint.iacr.org/
2022/106.
20. Soundes Marzougui, Nils Wisiol, Patrick Gersch, Juliane Kr¨amer, and Jean-Pierre
Seifert. Machine-learning side-channel attacks on the galactics constant-time im-
plementation of bliss, 2021.
21. Michele Mosca. Cybersecurity in an era with quantum computers: Will we be
ready? IEEE Security Privacy, 16(5):38–41, 2018.
22. Kalle Ngo, Elena Dubrova, Qian Guo, and Thomas Johansson. A side-channel
attack on a masked ind-cca secure saber kem implementation. IACR Transactions
on Cryptographic Hardware and Embedded Systems, pages 676–707, 2021.
23. National Institute of standards and technology. Nist pqc standardization process.
https://csrc.nist.gov/Projects/post-quantum-cryptography.
24. Sebastian Paul, Felix Schick, and Jan Seedorf. Tpm-based post-quantum cryp-
tography: A case study on quantum-resistant and mutually authenticated tls for
iot environments. In The 16th International Conference on Availability, Reliability
and Security, ARES 2021, New York, NY, USA, 2021. Association for Computing
Machinery.
25. Peter Pessl. Analyzing the shuffling side-channel countermeasure for lattice-based
signatures. In International Conference on Cryptology in India, pages 153–170.
Springer, 2016.
26. Prasanna Ravi, Sujoy Sinha Roy, Anupam Chattopadhyay, and Shivam Bhasin.
Generic side-channel attacks on cca-secure lattice-based pke and kems. IACR
Transactions on Cryptographic Hardware and Embedded Systems, 2020(3):307–335,
Jun. 2020.
27. Oded Regev. On lattices, learning with errors, random linear codes, and cryptog-
raphy. J. ACM, 56(6), sep 2009.
28. Rhode and Schwarz. Kryptoger¨ate. https://www.rohde-schwarz.com/de/
produkte/aerospace-verteidigung-sicherheit/kryptogeraete_230846.html.
29. Tobias Schneider, Clara Paglialonga, Tobias Oder, and Tim G¨uneysu. Efficiently
masking binomial sampling at arbitrary orders for lattice-based crypto. In IACR
International Workshop on Public Key Cryptography, pages 534–564. Springer,
2019.
30. scikit learn. scikit-learn machine learning in python. https://scikit-learn.org/
stable/.
31. Peter W. Shor. Polynomial-time algorithms for prime factorization and discrete
logarithms on a quantum computer. SIAM J. Comput., 26(5):1484–1509, oct 1997.
32. Bo-Yeon Sim, Jihoon Kwon, Joohee Lee, Il-Ju Kim, Tae-Ho Lee, Jaeseung Han,
Hyojin Yoon, Jihoon Cho, and Dong-Guk Han. Single-trace attacks on message
encoding in lattice-based kems. IEEE Access, 8:183175–183191, 2020.
33. Douglas Stebila and Michele Mosca. liboqs is an open source C library
for quantum-safe cryptographic algorithms., Cortex-M4. https://github.com/
open-quantum-safe/liboqs.
34. Vincent Quentin Ulitzsch, Shinjo Park, Soundes Marzougui, and Jean-Pierre
Seifert. A post-quantum secure subscription concealed identifier for 6g. In Proceed-
ings of the 15th ACM Conference on Security and Privacy in Wireless and Mobile
Networks, WiSec ’22, page 157–168, New York, NY, USA, 2022. Association for
Computing Machinery.
35. Utimaco. What is a hardware security module (hsm). https:
//utimaco.com/de/produkte/technologien/hardware-security-modules/
what-hardware-security-module-hsm.
36. H. Wang, M. Brisfors, S. Forsmark, and E. Dubrova. How diversity affects deep-
learning side-channel attacks. In 2019 IEEE Nordic Circuits and Systems Con-
ference (NORCAS): NORCHIP and International Symposium of System-on-Chip
(SoC), pages 1–7, 2019.
37. Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Cross-vm
side channels and their use to extract private keys. In Proceedings of the 2012 ACM
Conference on Computer and Communications Security, CCS ’12, page 305–316,
New York, NY, USA, 2012. Association for Computing Machinery.
38. Raymond K. Zhao, Ron Steinfeld, and Amin Sakzad. Facct: Fast, compact, and
constant-time discrete gaussian sampler over integers. IEEE Transactions on Com-
puters, 69(1):126–137, 2020.