Document [original]

This version is available at https://doi.org/10.14279/depositonce-6790

right to use is granted. This document is intended solely for

personal, non-commercial use.

redistribution. The definitive Version of Record was published in Proceedings of the 7th ACM international

conference on Computing frontiers,

http://dx.doi.org/10.1145/1787275.1787342.

Borodin, D., & Juurlink, B. H. H. (2010). Protective redundancy overhead reduction using instruction

vulnerability factor. In Proceedings of the 7th ACM international conference on Computing frontiers - CF

’10. ACM Press. https://doi.org/10.1145/1787275.1787342

Borodin, D., & Juurlink, B. H. H.

Protective redundancy overhead reduction

usin

instruction vulnerability factor

Accepted manuscript (Postprint)Conference paper |

Protective Redundancy Overhead Reduction Using

Instruction Vulnerability Factor ∗

Demid Borodin

Computer Engineering Laboratory

Faculty of Electrical Engineering, Mathematics,

and Computer Science

Delft University of Technology

Mekelweg 4, 2628 CD Delft, The Netherlands

Tel: +31 152787362, Fax: +31 152784898.

D[email protected]

B.H.H. (Ben) Juurlink†

Embedded Systems Architectures

Institute for Computer Engineering and

Micro-Electronics

Technische Universität Berlin

Franklinstraße 28/29, D-10587 Berlin, Germany

Tel: +49 3031425538, Fax: +49 3031422943

Juurlink@CS.TU-Berlin.de

ABSTRACT

Due to modern technology trends, fault tolerance (FT) is acquir-

ing an ever increasing research attention. To reduce the overhead

introduced by the FT features, several techniques have been pro-

posed. One of these techniques is Instruction-Level Fault Tolerance

Conﬁgurability (ILCOFT). ILCOFT enables application develop-

ers to protect different instructions at varying degrees, devoting

more resources to protect the most critical instructions, and sav-

ing resources by weakening protection of other instructions. It is,

however, not trivial to assign a proper protection level for every

instruction. This work introduces the notion of Instruction Vul-

nerability Factor (IVF), which evaluates how faults in every in-

struction affect the ﬁnal application output. The IVF is computed

off-line, and is then used by ILCOFT-enabled systems to assign

the appropriate protection level to every instruction. IVF releases

the programmer from the need to assign the necessary protection

level to every instruction by hand. Experimental results demon-

strate that IVF-based ILCOFT reduces the instruction duplication

performance penalty by up to 77%, while the maximum output

damage due to undetected faults does not exceed 0.6% of the to-

tal application output.

Categories and Subject Descriptors

C.4 [Performance of Systems]: Fault tolerance; D.2.4 [Software/

Program Veriﬁcation]: Reliability

∗This work was partially supported by the European Commission

in the context of the SARC integrated project #27648 (FP6), and

by the Dutch Ministry of Economic Affairs in the context of the

3DIM3 CATRENE project #CT105.

†The work was partially performed while the second author was

afﬁliated with Delft University of Technology.

General Terms

Design, Performance, Reliability, Veriﬁcation

Keywords

performance, redundancy, fault detection, instruction vulnerability,

selective protection

1. INTRODUCTION

Fault tolerance (FT) of computing systems is receiving an in-

stantly increasing attention since recently [1]. This is due to current

technology trends such as increasing the chip density by shrinking

the feature size and power supply minimization. When the transis-

tor feature size decreases, they become vulnerable to multiple ex-

ternal disturbances and internal problems. For example, radiation

which could not affect larger transistors and wires is able to cause

faults in smaller ones, internal crosstalk becomes more common,

etc. Minimization of the power supply increases the chance of bit

ﬂips. As a result, FT is becoming increasingly important. While

strong FT techniques used to be applied only to special-purpose

high-end computing systems, and only a few techniques such as Er-

ror Correcting Codes (ECC) [2] in memories were relatively widely

used, they are becoming common nowadays.

FT is often based on some form of redundancy, which means

that it introduces a certain hardware overhead, performance penalty

and/or energy consumption increase. While acceptable for expen-

sive critical computing systems, it should be avoided as much as

possible in commodity systems. Therefore, FT overhead mini-

mization is receiving a lot of attention. Many techniques, some-

times trading FT for overhead minimization, have been proposed.

One of these techniques, Instruction-Level Fault Tolerance Con-

ﬁgurability (ILCOFT) [3,4], requires the application developer to

assign the required protection level to every assembly instruction,

or every high-level programming language statement, or set of in-

structions/statements, and protects them accordingly. This is useful

for naturally error tolerant applications. For example, in multime-

dia applications, much of the computations do not strictly require

absolute correctness. Many faults can lead to slight output imper-

fections that in some cases are not even noticeable for a human.

For example, if a few pixels in a large image are wrong, they are

likely to be unnoticed. By enforcing the protection of more critical

instructions and/or weakening the protection of less critical ones,

ILCOFT achieves a higher overall reliability and/or minimizes the

protection overhead. Depending on the FT method used, this can

be performance, energy, and/or area overhead.

A disadvantage of the ILCOFT scheme as presented in [3,4]

is the manual protection level assignment required from the ap-

plication developer. To process (a large part of) a relatively large

application, signiﬁcant effort is required. In addition, this process

is error-prone because it entirely relies on the programmer’s judg-

ment of every instruction’s vulnerability. An alternative automatic

method is also proposed in [3,4] in which the compiler assigns

the required protection level. Because the compiler cannot eval-

uate the criticality of data processing instructions, it is assumed

that only control ﬂow instructions need strong protection. Thus, all

control ﬂow instructions (branches, jumps, function calls, etc.) and

instructions on which they depend (address calculations, condition

evaluations, etc.) are assigned a high protection level. Sundaram

et al. [5] discuss such compiler analysis in detail. This automatic

compiler-based method, however, is based on the assumption that

no data manipulating instructions are critical, which is not safe. In

many applications faults in data processing instructions can corrupt

the whole application output, in which case, even though the appli-

cation does not crash, it is not usable. This approach can only be

safely applied to a very limited set of applications. A more sophis-

ticated method to assess every instruction’s criticality is required

for most applications.

To address the instruction criticality assessment problem, this

work introduces the notion of Instruction Vulnerability Factor (IVF).

IVF is analogous to the Architecture Vulnerability Factor (AVF) [6],

but addresses application instructions instead of hardware struc-

tures. AVF estimates the probability that a fault in a particular

hardware structure will result in an error visible in the ﬁnal ap-

plication output. Experiments demonstrate that different processor

structures have different AVFs [6]. For example, unlike faults in

ALUs, faults in branch predictors can only result in a slightly re-

duced performance, but cannot damage the ﬁnal application output.

We suggest that a similar metric can be applied to the application’s

instructions. Faulty results of different instructions affect the ﬁnal

application output in different ways. IVF measures how much of

the ﬁnal output is corrupted due to faults in every instruction.

ILCOFT can be applied to different instructions based on their

IVF. Instructions with a higher IVF should be protected better, while

others can be assigned a weaker protection level to reduce the over-

head.

To compute the IVF for every instruction, off-line proﬁling is

performed. It can be done either in a simulation environment or

in real hardware capable of injecting faults. Fault(s) should be in-

jected into every instruction, one per experiment, and the resulting

application output be used to estimate the instruction IVF. Mul-

tiple experiments per instruction are preferable to obtain statisti-

cally valid results. To reduce the amount of work this process re-

quires, only the most time consuming application parts can be pro-

cessed. As shown in Section 3, this is able to provide signiﬁcant

application-level advantage. This procedure has to be performed

off-line, only once per application. The result (IVF value for all

the considered instructions) is then stored and distributed together

with the application binary code. At runtime, every instruction is

protected at the level required by its IVF value.

The remainder of this paper is structured as follows. Section 2

describes the IVF estimation performed in this work, and how it is

then used to reduce the overhead of the instruction duplication error

detection technique. Section 3presents the experimental results

evaluating the performance penalty reduction and fault coverage

of the proposed IVF-based ILCOFT technique. Finally, Section 4

draws conclusions and describes future work.

2. IVF AND IVF-BASED ILCOFT

This section introduces the IVF and demonstrates how it can be

used as a base for ILCOFT. Section 2.1 discusses how the IVF can

be estimated. Section 2.2 presents the instruction duplication error

detection scheme, which is adapted to support IVF-based ILCOFT

in Section 2.3.

2.1 IVF Estimation

IVF estimation requires monitoring how different faults in every

instruction propagate to the ﬁnal application output. In other words,

how much of the output is corrupted due to faults in every instruc-

tion. This can be achieved using fault injection experiments, either

in a simulation environment or on fault injection-enabled hardware.

Alternatively, this can also be done in software without simulations.

To simulate faults in software, the correct machine instructions can

be substituted with other instructions producing wrong results in

the same output registers. The simulator-free software solution can

be expected to signiﬁcantly speed up the IVF estimation process,

but it reduces the fault injection ﬂexibility (does not provide access

to simulated hardware structures where faults happen).

Ideally, all the possible faults have to be injected one at a time

into every executed instruction, and the corresponding output cor-

ruption measured. Moreover, some faults in dependent instructions

can affect each other, thus combinations of faults in different in-

structions should also ideally be examined. This, however, would

require an enormous number of experiments which is not feasible.

Experimental results (see Section 3) demonstrate that injecting a

random fault into every instruction provides a sufﬁciently accurate

IVF estimation. It is desirable to perform multiple experiments

with every instruction to achieve statistically valid results, because

different faults in the same instruction can affect the execution in

different ways.

In this work, the IVF estimation is performed on the sim-outorder

simulator from the SimpleScalar tool set [7]. During every simula-

tion, the output of one instruction execution is assigned a random

value, simulating the worst case (a multi-bit) transient fault. Faults

are only injected into instructions producing results, such as arith-

metic operations and memory loads. Memory stores and jumps are

not affected, because the error detection scheme used in this work

does not cover faults in these instructions. The ﬁnal application out-

put is then compared to the correct one, and the output corruption

is measured. Depending on the application, this can be the percent-

age of corrupted bytes in the output (in an image for instance), or

the percentage of corrupted output items, such as matrix elements.

The output corruption percentage is then saved as the instruction’s

IVF. When multiple experiments per instruction are performed, the

average IVF is saved for all the experiments.

The average IVF value is computed for every static instruction

within the considered code segment. This requires a large number

of simulations, especially if multiple experiments per instruction

are conducted (as is desirable). This, however, has to be done only

once per application, and can be done off-line. The collected statis-

tics are then saved together with the application binary code and

used at runtime. For large applications, it is not even necessary to

compute the IVF of every instruction. It can be estimated only for

the instructions within the most time consuming application parts,

such as multimedia kernels. Section 3.3 demonstrates that this ap-

proach achieves signiﬁcant application-level performance improve-

ments.

The IVF information can be saved in a program binary code as a

separate table. This table maps application instructions (identiﬁed

by their PC) to the corresponding IVF values, or to the required

protection levels. Storing the protection levels is likely to take less



    

 

 

 









Figure 1: Instruction duplication.

space (for example, only one bit is needed if two protection levels

are available), and no logic is needed to determine the protection

level from the IVF value. On the other hand, storing the IVF value

provides more ﬂexibility, in that the system can be conﬁgured at

runtime to deﬁne the threshold IVF values controlling the required

protection levels. The IVF information stored in a separate table

needs to be loaded into a special hardware buffer (possibly simul-

taneously assigning the protection levels and storing them instead

of IVF values to reduce the storage requirements). Then, the ta-

ble must be looked up to determine the proper protection level for

every executed instruction. The table can be compressed. For ex-

ample, with only two protection levels available in the system (of

which one is default), only PCs of instructions with the non-default

required protection level have to be stored. Alternatively, instead of

keeping the IVF data in a separate table, it might be preferable to

change the instruction format, to add the required protection levels

there.

2.2 Instruction Duplication

In this work a hardware ILCOFT scheme is used. Instruction du-

plication in the pipeline is used as a time-redundant error detection

technique, and its overhead is minimized using the IVF information

collected as described in Section 2.1. The instruction duplication

scheme is based on [8], but executes every instruction twice rather

than duplicating them in the dynamic scheduler. This requires less

space in the dynamic scheduler, but does not protect against faults

in it. Figure 1depicts the instruction execution steps with the cor-

responding activities performed by every instruction. A fetched

instruction proceeds normally until the execution stage. There the

instruction is kept in the RUU until it has been issued twice. When

the results of both executions are available, they are compared in

the Writeback stage, and the instruction commits if no errors are

detected.

What happens if the results do not match depends on the goals

of the system. A system targeting fail-safe operation would signal

an error and halt. If FT is required, another copy of the questioned

instruction could be created and executed, performing a majority

voting on all the obtained results according to the Triple Modular

Redundancy (TMR) scheme [9,10], or a different form of recovery

could be initiated. This, however, is outside the scope of this work.

2.3 IVF-Based Selective Instruction Duplica-

tion

To support ILCOFT, the FT technique(s) used in the system have

to be able to apply different protection levels to different instruc-

tions. The higher the IVF is, the stronger the protection technique

applied to the instruction should be. In the simplest case, only two

protection levels are available: an instruction is either protected

or left unprotected. This case is very practical, because more com-

plex protection schemes with multiple protection levels can be very

0% 6% 20%3% 9% 35% 50% 65% 80% 91% 94% 97% 100%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Image Addition Matrix Multiplication Fibonacci SAD

JPEG Enc. JPEG Dec. ADPCM Enc. ADPCM Dec.

IVF

% of static instructions

Figure 2: Histogram of IVF values for different applications

and kernels.

expensive. In this work such simple protection scheme based on

instruction duplication (see Section 2.2) is used, called IVF-Based

Selective Instruction Duplication (IVF-SID). Depending on its IVF,

an executed instruction is either duplicated and the result is veriﬁed,

or executed only once without error detection.

Note that two terms are used in this work: IVF-based ILCOFT

and IVF-SID. IVF-based ILCOFT is a general ILCOFT technique

which decides how critical instructions are based on their IVF val-

ues. IVF-based ILCOFT can be applied to many different FT tech-

niques able to protect individual instructions. IVF-based ILCOFT

applied to the instruction duplication error detection technique is

called IVF-SID.

The simple protected/unprotected scheme requires to establish

a threshold IVF value, further referred to as IVFthr . Instructions

with IVF above IVFthr should be protected, and instructions with

IVF below IVFthr are left unprotected. The IVFthr value should

be carefully chosen to guarantee that the amount of application out-

put corruption it represents is indeed tolerable. Ideally it should be

determined speciﬁcally for every application, because different ap-

plications have varying output damage tolerance.

Figure 2shows the histogram of IVF values for different ker-

nels (the kernels are described in Section 3.1). It clearly illustrates

that the majority of IVF values are at the extremes, they are either

larger than 99% or less than 1%. Some kernels (Image Addition

and SAD) do not have any instructions with IVF values in between.

Most instructions either corrupt (almost) the whole output or less

than 1% of it. This indicates that 1% is a good IVFthr value. Dam-

age of 1% of the output can be considered tolerable for many appli-

cation domains, including many multimedia applications. Increas-

ing the IVFthr to, for example, 10% would not lead to signiﬁcant

changes, because most kernels do not have any instructions with an

IVF in the range of 1% to 10% (see Figure 2). Thus, an IVFthr

value of 1% is used for IVF-SID in this work. Instructions whose

wrong results corrupt less than 1% of the application output are not

protected.

The IVF values distribution shown in Figure 2suggests that a

simple protected/unprotected ILCOFT scheme is sufﬁcient. Mul-

tiple available protection levels would not be very useful for most

of the benchmarks shown in Figure 2, because they have only a

few instructions with medium IVF values. On average 44% of the

instructions have an IVF below 1%, and 39% of the instructions

have an IVF above 99%. Most of the other instructions have an

IVF below 5% or above 95%. Only the ADPCM encoder and de-

coder have a signiﬁcant number of instructions (10% or more) with

Table 1: Processor conﬁguration.

Fetch/Dec./Issue Width 2, 4, or 8

# of Int. ALUs 2, 4, or 8

# of Int. Mult./Div. 1

# of FP ALUs 1

# of FP Mult./Div. 1

RUU Size 64

Memory Latency 112 cycles (ﬁrst chunk),

2 cycles (subsequent chunks)

L1 Data Cache 32 KB, 2-way set associative

L1 Instruction Cache 32 KB, 2-way set associative

L2 Uniﬁed Cache 512 KB, 4-way set associative

an IVF in the range 40-90%. For such applications, the following

scheme could be used: no or minimum protection for instructions

with an IVF below 1%, average protection for instructions with an

IVF between 1% and 99%, and maximum protection for instruc-

tions with an IVF above 99%.

3. EXPERIMENTAL EVALUATION

This section experimentally evaluates the proposed IVF-SID method.

Section 3.1 describes the used simulator and benchmarks. Sec-

tion 3.2 explains the details of the employed IVF estimation method.

Section 3.3 demonstrates the performance improvements IVF-SID

achieves compared to duplicating all instructions. Section 3.4 eval-

uates its fault coverage. Finally, Section 3.5 compares IVF-SID

with a manual ILCOFT method.

3.1 Experimental Setup

Simulations are performed using the sim-outorder simulator from

the SimpleScalar tool set [7]. The processor conﬁguration (not very

powerful because small kernels are considered) is shown in Table 1.

Four kernels and four applications are used as benchmarks. Im-

age Addition, Matrix Multiplication (with rather small input matri-

ces of the size 20×10 and 10×20 to reduce the simulation time),

and Sum of Absolute Differences (SAD) are kernels very often

used in multimedia applications. The fourth kernel computes the

Fibonacci numbers, which are widely used in science, and even in

ﬁnancial market trading and music [11]. Encoders and decoders

for JPEG image compression and ADPCM sound compression are

the full applications used, taken from the MediaBench benchmark

suite [12].

The kernels are chosen to represent a range of algorithms: from

(very) tolerable to faults to hardly tolerable to faults. Image Ad-

dition contains a large number of independent calculations, which

makes it tolerant to faults: a fault in most of the computational

instructions can only affect a single output element. Matrix Multi-

plication has many independent as well as many dependent opera-

tions. The output elements are, however, computed independently,

thus a fault in one of them does not affect the others. Every Fi-

bonacci number depends on all the previously computed numbers,

and thus, any fault leads to the corruption of all subsequent values.

SAD outputs a single value which depends on the whole compu-

tation sequence, and is either correct or wrong. Any fault is SAD

leads to a wrong result, thus, SAD is the most vulnerable kernel.

3.2 IVF Calculation

The IVF is estimated as described in Section 2.1. For the full

applications, to reduce the simulation time and demonstrate that it

still provides useful results, IVF-based ILCOFT is applied only to

the most time consuming functions. The applications are proﬁled

and the most demanding functions identiﬁed (for example, the for-

ward_DCT function in the JPEG encoder, and the jpeg_idct_islow

function in the JPEG decoder). The IVF values have been estimated

and are later used by ILCOFT only for the instructions within these

functions. In other parts of the applications all instructions have

been duplicated as described in Section 2.2.

To calculate the IVF of every instruction, ten fault injection ex-

periments per static instruction have been conducted for the ker-

nels, and three experiments per instruction for full applications. To

evaluate the accuracy of this method, an additional experiment is

performed with one of the kernels (Matrix Multiplication). This

experiment attempts to approach the ideal (exhaustive) case taking

into account all possible faults and even combinations of faults in

different instructions (see Section 2.1). First, for every evaluated

static instruction, one thousand experiments injecting a single ran-

dom fault into it are executed. Then, faults are injected in two dif-

ferent instructions at a time. This is achieved in the following way.

Each evaluated instruction is coupled with every other instruction

in the kernel whose originally estimated IVF (from the ﬁrst ex-

periment with ten injections per instruction) does not exceed its

own originally estimated IVF. The obtained output corruption con-

tributes to the IVF of the evaluated instruction. Instructions with

greater originally estimated IVF are not coupled, because they af-

fect the output more than the evaluated instruction. Fifty iterations

of this double-injection experiment are performed, resulting in 350

to 1500 simulations per instruction.

The results of the short Matrix Multiplication IVF estimation ex-

periment closely follow the results of the large experiment. The

average difference between the obtained IVF values is 0.5%. The

maximum difference is 7% (it is so large only for two instructions).

Hence, the short experiment performs with a sufﬁcient accuracy.

Fault injection experiments in Section 3.4 conﬁrm that a high level

of reliability is achieved using the short IVF estimation experi-

ments.

To provide an insight on the time the IVF estimation process

takes, the fault injection experiment simulation time has been mea-

sured for one randomly chosen kernel and one application. For

the Fibonacci kernel, an experiment injecting one fault into one in-

struction took 0.8 seconds, and for the JPEG Encoding application

it took 136.7 seconds. Note that this is for the case when simulation

produces a complete output (does not crash due to faults). Simula-

tions crashing (or producing incomplete outputs) due to faults take

less time. On the other hand, there are also faults substantially in-

creasing the simulation time. For example, a fault that signiﬁcantly

increases the value of a variable controlling the exit condition of a

large loop can have such effect.

3.3 Performance

Instruction duplication (see Section 2.2) requires that every dy-

namic instruction is executed twice. Since duplicated instructions

are independent, they can be executed in parallel, if sufﬁcient com-

putational resources are available. Thus, instruction duplication in-

creases the amount of instruction-level parallelism (ILP) available

in the application. Unless the application originally has a very lim-

ited amount of ILP, instruction duplication is likely to introduce a

signiﬁcant performance penalty due to the lack of computational

resources available to execute the instruction duplicates. By avoid-

ing the re-execution of instructions with an IVF smaller than 1%,

IVF-SID reduces the overhead of instruction duplication.

Figure 3shows the performance overhead of full instruction du-

plication and IVF-SID over the original execution without any pro-

tective redundancy. The benchmarks are executed on a system with

four integer ALUs, and a fetch/decode/issue width of four. Fig-

Image Addition

Matrix Multiplication

Fibonacci

SAD

JPEG Enc.

JPEG Dec.

ADPCM Enc.

ADPCM Dec.

10%

20%

30%

40%

50%

60%

70%

80%

90%

Full duplication

IVF-SID

Figure 3: Performance penalty of instruction duplication and

IVF-SID compared to redundancy-free execution. System with

4 integer ALUs, and fetch/decode/issue width of 4.

ure 3demonstrates that for all the benchmarks, IVF-SID recovers

a certain amount of the performance overhead due to instruction

duplication. For example, if the full duplication is 80% slower than

the redundancy free execution, and IVF-SID is 40% slower, than

the amount of performance it recovers is 50%. Figure 4shows the

amount of recovered performance for different benchmarks. Three

machine conﬁgurations are considered: with two, four and eight

integer ALUs. To balance the machine organization, the fetch, de-

code, and issue width matches the number of integer ALUs (see

Table 1).

For the kernels, the recovered performance varies from 1.1% (for

Fibonacci, on a system with eight integer ALUs) to 77.7% (for Ma-

trix Multiplication, on a system with two integer ALUs). Full ap-

plications recover from 1% (JPEG Encoder, system with two inte-

ger ALUs) to 58.3% (ADPCM Encoder, system with two integer

ALUs).

The amount of recovered performance depends on the IVF dis-

tribution of the application, and the amount of ILP available in the

redundancy-free application. The more instructions with low IVF

are found in the application, the fewer executed instructions are

duplicated, and more performance is recovered. The distribution of

instructions with different IVF values in the benchmarks is shown

in Figure 2.

In SAD, almost 86.4% of the static instructions have an IVF of

above 99% (and thus deﬁnitely need to be duplicated in IVF-SID),

and only 13.6% of the instructions have an IVF lower than 1% (and

thus do not need to be duplicated). This is because SAD produces

only one output, which can be either correct (100% of the output

is correct) or wrong (100% of the output is damaged). Any fault

which propagates to the ﬁnal SAD output damages 100% of it.

Therefore, only faults that do not propagate to the ﬁnal output at

all (damage unused data, or faults that are masked) can be tolerated

in SAD. We call these faults escapes. Due to these characteristics,

SAD is one of the worst performing benchmarks: only 3.4% to

10.8% of performance is recovered in IVF-SID.

A similar situation appears with the Fibonacci numbers genera-

tor, for which IVF-SID recovers from 1.1% to 8.8%. The generator

produces a series of numbers, every one of which depends on the

previous values. Thus, an error appearing in the beginning of the

sequence damages all the subsequent numbers. Only 6.3% of the

static instructions in the Fibonacci numbers generator have an IVF

of below 1%. 50% of the static instructions have an IVF of 99%

Image Addition

Matrix Multiplication

Fibonacci

SAD

JPEG Enc.

JPEG Dec.

ADPCM Enc.

ADPCM Dec.

Average

10%

20%

30%

40%

50%

60%

70%

80%

90%

2 Int. ALUs 4 Int. ALUs 8 Int. ALUs

Performance recovered by IVF-SID

Figure 4: Performance penalty reduction achieved by IVF-SID

over instruction duplication on different processor conﬁgura-

tions. Fetch/decode/issue width matches the number of integer

ALUs.

or 100%, and 43.8% have an IVF in the range 95%-99% (see Fig-

ure 2).

The highest performance improvements achieved by IVF-SID

are obtained for Image Addition (58.7% to 77.4%) and Matrix Mul-

tiplication (52.2% to 77.7%). In Image Addition, 55% of the static

instructions have an IVF of below 1%, and 45% have an IVF of

above 99%. In Matrix Multiplication, 67.7% of the static instruc-

tions have an IVF of below 1%, and 22.6% have an IVF of above

99%. Both these benchmarks produce many independent elements

in the output. When one of them is damaged, it often corrupts less

than 1% of the whole output, which is the used IVFthr value. For

large images with millions of pixels which are common nowadays,

one wrong pixel will most probably not even be visible.

Figure 4shows that IVF-SID improves the performance of the

ADPCM encoder much more than the other full applications. This

is due to the fact that the encoding function in the ADPCM encoder,

to which IVF-based ILCOFT is applied, is a substantial part of

the whole application. In contrast, the IVF-enabled forward_DCT

function in the JPEG encoder is only a part of the whole applica-

tion, which also calls many other functions. Thus, the ADPCM

encoder (and also decoder) functions affect the whole application

performance much more than the functions in the JPEG encoder

and decoder to which IVF-based ILCOFT is applied. The ADPCM

encoder, however, has signiﬁcantly more instructions with an IVF

below IVFthr (75.4% of the instructions, compared to 26% in the

decoder), and thus it achieves a much higher performance improve-

ment.

Figure 4indicates that in most cases, the performance recov-

ered by IVF-SID drops when the number of integer ALUs (and

the fetch/decode/issue width) increases. This is due to the larger

amount of computational resources available. The ILP introduced

by instruction duplication is utilized more effectively on systems

with more integer ALUs. Thus, the reduction of the number of ex-

ecuted instructions due to IVF-SID does not affect the performance

so much. In contrast, for Image Addition and the JPEG Encoder,

IVF-SID recovers less performance on a system with two integer

ALUs than with four of them. We attribute this to the issue width,

which is insufﬁcient and becomes a bottleneck in these cases. In-

creasing the fetch/decode/issue width from two to four (while the

number of integer ALUs is still two) increases the recovered per-

formance from 58.7% to 70.3% for Image Addition.

Table 2: Fault coverage statistics.

Benchmark Detected

(%)

Wrong

output

(%)

Escapes

(%)

Max. output

damage (%)

Min. output

damage (%)

Average output

damage (%)

Image Addition 43.16 48.42 8.42 0.06 0 0.05

Matrix Multiplication 59.14 18.28 22.58 0.25 0.25 0.25

Fibonacci 95.92 0 4.08 0 0 0

SAD 95 0 5 0 0 0

JPEG Encoding 76.77 0 23.23 0 0 0

JPEG Decoding 10 78 12 0.03 0 0.01

ADPCM Encoding 33.33 31.31 35.35 0.64 0 0.12

ADPCM Decoding 87.76 0 12.24 0 0 0

Note that the IVF values distribution in static application instruc-

tions cannot be expected to always perfectly indicate how useful the

IVF-based ILCOFT will be for the program. This is because per-

formance depends on the dynamic behavior, on how many times

every static instruction with a certain IVF will be executed.

3.4 Fault Coverage

To evaluate the fault coverage of IVF-SID with IVFthr equal 1%,

fault injection experiments have been conducted. Every benchmark

has been run around 100 times, with one fault in the output of a

random instruction within the kernel to which IVF-SID is applied.

To simulate the worst-case scenario, burst (multi-bit) faults are in-

jected rather than single-bit ones. This is achieved by changing

instruction output to a random value. There are two reasons to in-

ject only one fault per simulation. First, the current and near-future

fault rates are not very high, and thus, more than one fault in a rel-

atively small kernel is unlikely. Second, when injecting multiple

faults per experiment, the chance that at least one of them will be

detected increases. As a result, other faults that would not have

been detected alone are hidden, and their effect on the output is not

investigated.

Table 2depicts the collected statistics. The second column shows

the percentage of faults detected by instruction duplication. The

third column indicates the percentage of simulations ﬁnished with

wrong output (with an undetected fault). The fourth column shows

the percentage of escapes, that is, undetected faults that did not

propagate to the output. The subsequent columns demonstrate the

maximum, minimum, and average observed output corruption per-

centage due to undetected faults.

Table 2demonstrates that no undetected faults damaged more

than 0.64% of the output. For most benchmarks, the maximum

output damage is even smaller. The maximum output corruption

of 0.06% in Image Addition means that in the output image, 1152

pixels out of almost 2 million are wrong. A visual inspection of the

output image did not reveal any difference with the correct image.

We believe that the observed output damage in other benchmarks

is also tolerable.

SAD, Fibonacci, JPEG encoder and ADPCM decoder do not

have any output damage due to undetected faults. All faults in these

benchmarks were either detected or did not propagate to the output

(escapes). These benchmarks have only a few instructions with low

IVF values (see Section 3.3 and Figure 2). Most of their instruc-

tions have high IVF values, thus they are duplicated in IVF-SID,

and faults are detected. This high instruction coverage results, how-

ever, in a smaller performance improvement of IVF-SID compared

to full instruction duplication (see Figure 4).

Image Addition

Matrix Multiplication

Fibonacci

SAD

10%

20%

30%

40%

50%

60%

70%

80%

90%

IVF-SID

ILCOFT-enabled

EDDI

Performance recovered

Figure 5: Comparison of performance penalty reduction

achieved by IVF-SID over instruction duplication and by

ILCOFT-enabled EDDI over EDDI.

3.5 IVF-SID Compared to ILCOFT-Enabled

EDDI

In this section we compare IVF-SID to ILCOFT-enabled EDDI [4].

Error Detection by Duplicated Instructions (EDDI) [13] is a soft-

ware error detection technique which duplicates all instructions

in the program assembly code and inserts checks to determine if

the original instruction and its duplicate produce the same result.

ILCOFT-enabled EDDI duplicates only the critical instructions. In [4]

instructions are categorized manually by a programmer, and are

considered critical if they affect the control ﬂow.

Figure 5compares the amount of performance recovered by IL-

COFT in IVF-SID and in ILCOFT-enabled EDDI. Table 3com-

pares the maximum and average output damage observed when

these techniques are used. ILCOFT-enabled EDDI data is taken

from [4].

Note that this is not a straightforward comparison. First, these

techniques are very different, one is automatic and works in hard-

ware, while the other is manual and is applied in software. Second,

the processor organizations used in this work and in [4] differ in

some parameters such as memory access latency, cache sizes etc.

Both organizations, however, have an issue width of four and four

integer ALUs.

The main message Figure 5and Table 3illustrate is that IVF-

based ILCOFT is able to compete with a manual ILCOFT method.

This is clear from the Image Addition and Matrix Multiplication

performance and fault coverage: IVF-SID slightly improves perfor-

mance of Image Addition and reduces its average output damage,

and reduces performance of Matrix Multiplication and improves

Table 3: Output damage of IVF-SID and ILCOFT-enabled EDDI.

IVF-SID ILCOFT-enabled EDDI

Maximum Average Maximum Average

Image Addition 0.06 % 0.05 % 0.13 % 0.01 %

Matrix Multiplication 0.25 % 0.25 % 99 % 3.1 %

Fibonacci 0 0 96.67 % 38.23 %

SAD 0 0 100 % 100 %

its fault coverage. For Fibonacci and SAD, however, IVF-SID has

a serious performance disadvantage. This is due to the fact that

ILCOFT-enabled EDDI considers only instructions affecting the

control ﬂow to be critical. This leads to many undetected faults

causing an average output corruption of 38.2% for Fibonacci and

100% for SAD. IVF-SID does not allow any undetected faults in

these two kernels to reach the output, because it protects all the vul-

nerable instructions. Thus, IVF-SID is more suitable if reliability

is the primary concern, and ILCOFT-enabled EDDI is preferable

if performance is more important. Furthermore, another advantage

of IVF-SID is that it does not require the programmer to assign the

necessary protection level to every assembly instruction manually,

which demands a serious effort and is very error-prone.

4. CONCLUSIONS

This work introduces the concept of Instruction Vulnerability

Factor (IVF). IVF determines how much of the ﬁnal application

output is corrupted due to faults in particular instructions, and can

be estimated using fault injection experiments. It is shown that in

most applications, instructions have different IVF values. Depend-

ing on the nature of the application, it may have a large number of

instructions with low IVF values, which means that faults in these

instructions are tolerable. Many applications in the multimedia do-

main, for example, have such characteristics. This work proposes

to use the Instruction-Level Fault Tolerance Conﬁgurability (IL-

COFT) principle based on the application proﬁling data containing

every instruction’s IVF value. Instructions with higher IVF values

are protected better than instructions with lower IVF values.

A selective hardware instruction duplication scheme controlled

by the instruction’s IVF values (IVF-SID) is evaluated from the

performance and fault coverage points of view. It is shown that,

like in other ILCOFT schemes, both performance and fault cover-

age of IVF-SID depend on the nature of the application. Like other

ILCOFT schemes, IVF-SID recovers maximum performance (and

also saves energy) in applications that produce many independent

output elements. For these applications, faults in many instructions

are likely to affect only a small part of the ﬁnal output, and thus

many instructions have small IVF, and are not protected. The min-

imum performance is recovered in applications producing a single

output value, or a set of dependent values. Any fault propagating

to the ﬁnal output corrupts it completely in these applications, thus

most of the instructions need to be protected. On the other hand,

these applications have a better fault coverage, because most in-

structions are protected.

The experimental results demonstrate that for both these types of

applications, IVF is a useful measure which allows to determine the

optimal way to balance the performance and fault coverage. More-

over, IVF can be estimated automatically. It releases application

developers from the need to attribute code for ILCOFT manually.

This saves time, effort, and improves reliability, because manual

instruction FT level assignment is very error-prone. IVF-based IL-

COFT achieves performance improvement comparable to a manual

ILCOFT method. In addition, it provides much more accurate re-

sults than other automatic (compiler-based) instruction attribution

methods, which are based on very general assumptions (that only

instructions affecting the control ﬂow should be protected).

For future work, we plan to enhance our experimental setup.

Currently there is the following limitation. Both in IVF estimation

and fault injection, only the ﬁrst dynamic occurrence of every static

instruction is processed, because it is unknown if this instruction

will ever be executed later. In most cases this is not problematic. In

rare cases, however, this can lead to inaccurate IVF estimation. For

example, consider an instruction which calculates a condition for

a branch controlling a loop. If it produces any non-zero value, the

subsequent branch is taken (or not taken). At one loop iteration this

instruction produces a 1, which becomes any non-zero number due

to a fault, and the branch still performs correctly. However, at an-

other loop iteration, this instruction produces a 0, which becomes a

non-zero value due to a fault. In the latter case the branch behaves

in the wrong way. Taking into account the dynamic application

behavior would solve this problem when estimating IVF.

A possible future proﬁling procedure enhancement is to use a

timer to detect cases when faults lead to a signiﬁcant performance

degradation. During IVF proﬁling, we have observed several cases

when due to a fault, a critical loop iterates many orders of magni-

tude more times than it is supposed to. This leads to a correspond-

ing performance degradation. If such performance degradation is

unacceptable or if a signiﬁcant proﬁling duration increase is unac-

ceptable, a timer can be used to detect these cases. If the applica-

tion execution takes more time than permitted, it can be terminated,

assigning the maximum output damage to the resulting IVF value.

Both problems mentioned above can be solved by the follow-

ing approach. All the instructions affecting the control ﬂow can be

assigned the maximum IVF, because they can always lead to appli-

cation crashes. Then, the IVF is estimated using proﬁling only for

the remaining instructions. The instructions affecting the control

ﬂow can be identiﬁed by the compiler.

IVF estimation as presented in this work is suitable for appli-

cations with equally important output values. However, there ex-

ist applications whose output consists of different parts of varying

importance. For example, in a video sequence, bytes deﬁning in-

dividual pixel values are less important than bytes deﬁning frame

attributes. For such applications the IVF estimation procedure has

to be adapted to take into account the importance of individual out-

put elements. This can be achieved by assigning different weight

to the corruption of more and less important output elements.

5. REFERENCES

[1] P. Shivakumar, M. Kistler, S. Keckler, D. Burger, and

L. Alvisi, “Modeling the Effect of Technology Trends on the

Soft Error Rate of Combinational Logic,” in DSN-02: Proc.

2002 Int. Conf. on Dependable Systems and Networks,

Washington, DC, USA, 2002, pp. 389–398.

[2] T. Rao and E. Fujiwara, Error-Control Coding for Computer

Systems. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.,

1989.

[3] D. Borodin, B. Juurlink, and S. Vassiliadis,

“Instruction-Level Fault Tolerance Conﬁgurability,” in

IC-SAMOS VII: Proc. Int. Conf. on Embedded Computer

Systems: Architectures, Modeling, and Simulation, July

2007, pp. 110–117.

[4] D. Borodin, B. Juurlink, S. Hamdioui, and S. Vassiliadis,

“Instruction-Level Fault Tolerance Conﬁgurability,” Journal

of Signal Processing Systems, vol. 57, no. 1, pp. 89–105,

October 2009.

[5] A. Sundaram, A. Aakel, D. Lockhart, D. Thaker, and

D. Franklin, “Efﬁcient Fault Tolerance in Multi-Media

Applications through Selective Instruction Replication,” in

WREFT-08: Proc. of the 2008 workshop on Radiation effects

and fault tolerance in nanometer technologies. New York,

NY, USA: ACM, 2008, pp. 339–346.

[6] S. Mukherjee, C. Weaver, J. Emer, S. Reinhardt, and

T. Austin, “A Systematic Methodology to Compute the

Architectural Vulnerability Factors for a High-Performance

Microprocessor,” in MICRO-36: Proc. of the 36th Annual

IEEE/ACM Int. Symp. on Microarchitecture. Washington,

DC, USA: IEEE Computer Society, 2003, p. 29.

[7] T. Austin, E. Larson, and D. Ernst, “SimpleScalar: An

Infrastructure for Computer System Modeling,” Computer,

vol. 35, no. 2, pp. 59–67, 2002.

[8] M. Franklin, “A Study of Time Redundant Fault Tolerance

Techniques for Superscalar Processors,” Proc. IEEE Int.

Workshop on Defect and Fault Tolerance in VLSI Systems,

pp. 207–215, Nov 1995.

[9] J. von Neumann, “Probabilistic Logics and the Synthesis of

Reliable Organisms from Unreliable Components,” in

Automata Studies, ser. Annals of Mathematics Studies.

Princeton, NJ: Princeton University Press, 1956, vol. 34, pp.

43–98.

[10] B. Johnson, Design and Analysis of Fault-Tolerant Digital

Systems. Addison-Wesley, Jan 1989.

[11] “Fibonacci numbers at Wikipedia,”

http://en.wikipedia.org/wiki/Fibonacci_number.

[12] C. Lee, M. Potkonjak, and W. H. Mangione-Smith,

“MediaBench: A Tool for Evaluating and Synthesizing

Multimedia and Communicatons Systems,” in MICRO-30:

Proc. of the 30th Annual ACM/IEEE Int. Symp. on

Microarchitecture. Washington, DC, USA: IEEE Computer

Society, 1997, pp. 330–335.

[13] N. Oh, P. P. Shirvani, and E. J. McCluskey, “Error Detection

by Duplicated Instructions in Super-Scalar Processors,”

IEEE Transactions on Reliability, vol. 51, no. 1, pp. 63–75,

Mar 2002.