Document [original]

IDR EXPLAINED∗

MARTIN H. GUTKNECHT†

Dedicated to Richard S. Varga on the occasion of his 80th birthday.

Abstract. The Induced Dimension Reduction (IDR) method is a Krylov space method for

solving linear systems that was developed by Peter Sonneveld around 1979. It was only noticed

by few people, and mainly as the forerunner of Bi-CGSTAB, which was introduced a decade later.

In 2007 Sonneveld and van Gijzen reconsidered IDR and generalized it to IDR(s), claiming that

IDR(1) ≈IDR is equally fast but preferable to the closely related Bi-CGSTAB, and that IDR(s)

with s > 1 may be much faster than Bi-CGSTAB. It also turned out that when s > 1, IDR(s) is

related to ML(s)BiCGSTAB of Yeung and Chan, and that there is quite some flexibility in the IDR

approach. This approach differs completely from traditional approaches to Krylov space methods,

and therefore it requires an extra effort to get familiar with it and to understand the connections

as well as the differences to better known Krylov space methods. This expository paper aims at

providing some help in this and to make the method understandable even to non-experts. After

presenting the history of IDR and related methods we summarize some of the basic facts on Krylov

space methods. Then we present the original IDR(s) in detail and put it into perspective with other

methods. Specifically, we analyze the differences between the IDR method published 1980, IDR(1)

and Bi-CGSTAB. At the end, we discuss a recently proposed ingenious variant of IDR(s) whose

residuals fulfill extra orthogonality conditions. There we dwell on details that have been left out in

the publications of van Gijzen and Sonneveld.

Key words. Krylov space method, iterative method, induced dimension reduction, IDR, CGS,

Bi-CGSTAB, ML(k)BiCGSTAB, large nonsymmetric linear system

1. History. The Induced Dimension Reduction (IDR) method was introduced

by Wesseling and Sonneveld from Delft University at a symposium of the International

Union of Theoretical and Applied Mechanics in September 1979. In the proceedings

it is covered on just 31

2pages of a 20-page paper [38], and it is explicitly attributed to

the second author. It was labeled as a Lanczos-type method for nonsymmetric linear

systems which does not require A

T. The term Lanczos-type method meant that the

new method was related to the biconjugate gradient (BiCG) method of Lanczos [17],

which had been revived and reformulated by Fletcher [4] four years before. Up to

that time there had been little interest in Lanczos’ approach, despite the fact that

it was very closely related to the widely used conjugate gradient method [16, 21].

Popular alternative Krylov space solvers for nonsymmetric systems were methods like

Vinsome’s OrthoMin1[37] (now often referred to as GCR), its variants OrthoDir

and OrthoRes [40], as well as similar methods introduced by Axelsson [2]. GMRes

[22] was still five years away. Also popular were parameter-dependent Krylov space

methods, like Chebyshev iteration, and parameter-dependent iterative methods based

on matrix splitting, like SOR.

The IDR method received hardly any attention, probably because it was neither

presented at a conference of the core numerical analysis community nor published in

∗Version of March 31, 2009.

†Seminar for Applied Mathematics, ETH Zurich, CH-8092 Zurich, Switzerland

([email protected]). Work done while the author was visiting the TU Berlin, supported by

the DFG Forschungszentrum MATHEON and the Mercator Visiting Professorship Program of the

DFG.

1We write the acronyms for the various methods in a unified way, which sometimes differs from

the one in the original publication.

a widely read journal. Moreover, Sonneveld’s approach to designing a Krylov space

solver was very unusual and even for experts hard to fully understand. Although the

method was under certain regularity conditions clearly and uniquely defined in [38],

some of the details and in particular the proof of the connection to BiCG were left

for publication elsewhere. The basic result on this connection is that the IDR residual

polynomials are of the form

ρIDR

n(t) = (Ωj(t)ρj(t) if n= 2j ,

Ωj(t)bρj+1(t) if n= 2j+ 1 ,(1.1)

where Ω0(t) :≡1, Ωj(t) :≡(1−ω1t)···(1−ωjt), and where ρjdenotes the jth BiCG

residual polynomial, which is often referred to as a Lanczos polynomial, scaled by

ρj(0) = 1, while bρj+1 denotes another residual polynomial, which has degree j+ 1. A

new linear factor (1−ωj+1t) is appended to Ωjin every other step, and it was suggested

to choose it such that the norm of the new IDR residual is minimized among those

that lie on a certain straight line. This is a widely used type of minimization step.

For example, it is also found in the conjugate residual [33] method, but there it leads

to a global minimum solution (for a symmetric positive definite matrix). And it is a

key ingredient of BiCGStab. The publication [38] only mentioned that the first line

of (1.1) had been proven in the case where Ais symmetric positive definite.

In 1984 Sonneveld introduced another Lanczos-type method: the Conjugate Gra-

dient Squared (CGS) method [29]. It is also based on residual polynomials that are

a product of polynomials, but here he used simply the square of the BiCG residual

polynomials:2

ρCGS

n(t) = ρ2

n(t).(1.2)

Note that the indexing of the residual polynomials and residuals is different in IDR

and CGS: in the former the degree increases by one when the index grows by one, in

the latter the degree increases by two.

The CGS paper [29] was received by the SIAM Journal on Scientific and Statistical

Computing (SISSC) on April 24, 1984, but it took nearly five years to get published

in revised and extended form [30]. Nevertheless, the method was accepted quickly by

numerical analysts and engineers. In typical cases it converges nearly twice as fast as

BiCG, though often in a very erratic manner. Although its idea and derivation are

ingenious, they are easy to understand: starting from the standard recursions for the

BiCG residual polynomials one just derives recursions for their squares by defining

additional suitable products of pairs of polynomials.

Yet another similar method was presented at the Householder Symposium in

Tylosand in June 1990 by van der Vorst, then still also at Delft University. The title

of his talk and the corresponding paper coauthored by Sonneveld and submitted to

SISSC on May 21, 1990, was “CGSTAB: A more smoothly converging variant of CG-

S” [35]. As part of the revision process the title was later changed into “Bi-CGSTAB:

a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric

2Therefore the name Biconjugate Gradient Squared (BiCGS) method would also make sense.

However Sonneveld’s view was that CGS is derived from a CG-type algorithm for building up a

set of orthogonal (or formally orthogonal) polynomials [private communication]. Only after the

recursions are mapped into a Krylov space that is embedded into an inner product space the notion

of biorthogonality makes sense.

linear systems”, and Sonneveld resigned as a coauthor [34]. In this paper, van der

Vorst started from the first formula in (1.1), now written as

ρBiCGSTAB

n(t) = Ωn(t)ρn(t).(1.3)

So, he adopted the indexing from CGS, and he also adopted from CGS the derivation

based on directly finding recursion for these residual polynomials, that is, he abstained

from using the recursions imposed by the IDR approach.

In the following years, BiCGStab was generalized to BiCGStab2 [12] and

BiCGStab(`) [24, 27], where the polynomial Ωnis built up from factors of degree 2

and `, respectively, whose coefficients are determined by a two– or an `–dimensional

residual norm minimization, respectively. This allows a better approximation of com-

plex eigenvalues and yields typically faster convergence at the price of higher com-

plexity, but only slightly higher computational cost. Nevertheless, the simple original

BiCGStab became the method of choice for most users who apply a Lanczos-type

method for solving a nonsymmetric linear system.

Due to the structure of the residual polynomials, CGS and the methods of the

BiCGStab family are often referred to as Lanczos-type product methods (LTPMs)

[13] or as hybrid BiCG methods [26].3

A new dimension came into play when, in 1997, Yeung and Chan submitted the

paper ML(k)BiCGSTAB: a BiCGSTAB variant based on multiple Lanczos starting

vectors to the renamed SIAM Journal on Scientific Computing (SISC) [39]. In this

paper they first introduced with ML(k)BiCG a version of BiCG where the left Krylov

space (generated by A

Tfrom an arbitrary shadow residual er0), which is used for the

oblique projection of r0, is replaced by a block Krylov space generated from a matrix

R0with kcolumns. Then they generalized the transition from BiCG to BiCG-

Stab to the new situation. This led to very complicated formulas, which were then

meticulously modified to get a simpler and efficient code. The method was shown to

converge amazingly well for a large number of fairly ill-conditioned examples, handled

mostly without preconditioning and with rather large k, 25 ≤k≤100, however, which

meant high memory consumption and considerable computational cost.4Although

the essence of the paper is well explained and easy to understand, the complexity

of the formulas and, paradoxically, the treatment of all the details must have kept

people from reading the paper and applying the method — despite the very promising

numerical results. The authors were aware of the connection to nonsymmetric block

Lanczos methods [1, 3, 5, 6, 8, 19], but while these are based on generalizing the

Lanczos three-term recursions, Yeung and Chan generalized the two-term recursions

of BiCG, as was done before by Simoncini [23]. This was partly the reason for the

complex formulas.

In [38] Wesseling and Sonneveld had announced a further publication on IDR to

be in preparation, but only in 2007 such a paper was submitted, again to SISC; see

[31] and, for the final version, [32]. In the sequel of an enquiry by Jens-Peter Zemke

[private communication] Sonneveld and van Gijzen reconsidered IDR and generalized

it to IDR(s), where the original method is included as the case s= 1 (except for a

3Note that many other authors have introduced other classes of “hybrid” iterative methods for

linear systems.

4For one example, the matrix ORSIRR1, the dependence on kwas investigated for small k, where

dramatic improvements can be noticed for 2 ≤k≤10 already; see Fig. 3(b) of [39].

small but interesting detail). They also clarify the relation of their method to BiCG-

Stab (when s= 1) and, in the final version, to ML(s)BiCGStab (when s > 1), which

they did not know when submitting the paper. It turns out that the even indexed

IDR(1) residuals are (up to roundoff effects) exactly the BiCGStab residuals and

that likewise every (s+ 1)th IDR(s) residual is up to the possibly different choice of

the parameters ωjaML(s)BiCGStab residual. However, the way these residuals are

constructed differs, and the “intermediate” residuals do not exist in BiCGStab and

differ in ML(s)BiCGStab, respectively. The paper also features numerical examples

that are relevant for practice and demonstrate the power of the method even for small

values of swhere the cost per step in nis small.

In a follow-up publication, Sleijpen, Sonneveld, and van Gijzen [25] introduced

partly different notation and tried to explain IDR(s) and its connection to BiCGStab

from a somewhat different viewpoint, but this author prefers the presentation in [32].

They also introduce methods similar to ML(s)BiCG and ML(s)BiCGStab.

In the recent publication [36] van Gijzen and Sonneveld introduce yet another,

very ingenious algorithm that fits into the IDR framework and leads to an elegant code.

It uses recursions that are quite different from those of the original IDR(s) of [32] and

produces “intermediate residuals” satisfying additional orthogonality conditions that

lead to shorter recurrence relations for some nand an improved memory management.

The authors do not introduce a separate name for this new algorithm, but in some

of the figures they refer to it as IDR(s) Bi-ortho. We will discuss this IDR variant in

Section 5 and use the shorter acronym IDR(s)BiO here.

2. From BiCG to ML(k)BiCGStab.In this section we review some basic facts

on Krylov space solvers putting special emphasis on the biconjugate gradient (BiCG)

method, and then look at the transition from BiCG to BiCGStab. We also have a

quick look at Yeung and Chan’s [39] generalization of these methods to ML(k)BiCG

and ML(k)BiCGStab, respectively, which feature multiple initial left (or shadow)

residuals.

2.1. Krylov space solvers based on projection. Given a nonsingular linear

system Ax =b∈CNand an initial approximation x0along with its residual r0:≡

b−Ax0, a Krylov space solver constructs recursively approximate solutions xn(often

referred to as iterates) such that

xn∈x0+Kn,

where

Kn:≡ Kn(A,r0) :≡span {r0,Ar0, . . . , An−1r0}.

is the nth Krylov subspace generated by Afrom r0.5

Two of the basic theoretical facts on this setting are: (i) There is a minimal ν

such that Kνis invariant. (ii) For the solution x?holds that x?∈x0+Kν, and νis the

minimal index for which this is true. So, if we choose xnwell, the solver terminates

in νsteps. In particular, it suffices to choose the iterates so that the corresponding

5Kn(B,y) denotes in this paper a Krylov subspace generated by Bfrom y, while Knwithout an

argument is an abbreviation for Kn(A,r0).

residuals rnare linearly independent unless zero. In practice, νis typically large (close

to N), and therefore this finite termination property is irrelevant.

The true aim is to find xnvery close to x?in few (or at least not very many)

steps. Because of the limited computer memory, it is important to find solvers that

allow us to compute xnwith short recursions. The restriction xn∈x0+Knimplies

that

rn∈r0+AKn⊆ Kn+1 .(2.1)

Most methods produce residuals that have a component in the “new part of the

space”, that is rn6∈ r0+AKn−1; in others there may occur exceptional situations

with rn∈r0+AKn−1, which implies that the residuals are linearly dependent at this

moment.

We will refer to spaces of the form r0+AKnor the form x0+Knas affine Krylov

subspaces.

Since the goal is a small rn, we need to approximate r0by elements from AKn.

E.g., krnkis minimum if we choose rnas the perpendicular from r0to its orthogonal

projection into AKn. This is the basis of the conjugate residual (CR) method [33] for

Hermitian systems and its various generalizations for the non-Hermitian case, such as

GCR and GMRes. For these methods we have

rn∈r0+AKn,rn⊥AKn.

Some Krylov space solvers are based on other orthogonal or oblique projections. In

particular, for the biconjugate gradient (BiCG) method [18, 4], which is of special

importance here, we have

rn∈r0+AKn,rn⊥e

Kn:≡ Kn(A

?,er0).(2.2)

Here, the initial shadow residual er0can be chosen arbitrarily; preferably, it should be

in arbitrary position with respect to an eigenbasis of A

The most often used recursions for BiCG are the coupled two-term or BiOMin

recursions, which can be written as follows:

αn:= δn/δ0

n,(2.3a)

rn+1 := rn−Avnαn,(2.3b)

ern+1 := ern−A

vnαn,(2.3c)

xn+1 := xn+vnαn,(2.3d)

δn+1 := hern+1,rn+1i,(2.3e)

βn:= δn+1/δn,(2.3f)

vn+1 := rn+1 +vnβn,(2.3g)

vn+1 := ern+1 +e

vnβn,(2.3h)

δ0

n+1 := he

vn+1,Avn+1i.(2.3i)

In addition to the residuals rnand iterates xnthree other sets of vectors are con-

structed: shadow residuals (or left-hand side Lanczos vectors) ern∈er0+A

Kn, search

directions vn∈v0+A

?Kn, and shadow search directions e

vn∈e

v0+A

Kn. All these

vectors are updated by coupled two-term recursions.

2.2. Residual polynomials and Lanczos-type product methods. Let Pn

denote the space of polynomials of degree at most n, and let P◦

n:≡ {ρ∈ Pn;ρ(0) =

1}. The inclusion (2.1) implies that one can associate rnwith a residual polynomial

ρn∈ P◦

nsuch that rn=ρn(A)r0. Roughly, krnkis small if |ρn(t)|is small at those

eigenvalues of Athat are “active” when r0is written in terms of the eigenbasis of A.

(This statement needs to be modified when Ahas no eigenbasis or an ill-conditioned

one.) This observation motivates derivations of Krylov space methods via real (when

Ais Hermitian) or complex (when Ais non-Hermitian) approximation problems. It

must also have motivated Sonneveld’s CGS method [30], where, as noted in (1.2), the

residual polynomials are the squares of the BiCG residual polynomials ρn. Clearly,

whenever |ρn(t)| ¿ 1 at an eigenvalue, |ρ2

n(t)|is even much smaller there. But often

the residual norm of BiCG oscillates wildly as a function of n, and then the one of

CGS oscillates even more. Avoiding or at least damping this was the motivation for

van der Vorst [34] for the choice (1.3) in BiCGStab. Recall that there, at step n,ωn

is chosen to minimize the residual norm on a straight line.

Let ρndenote the polynomial obtained from ρnby complex conjugation of the

coefficients. Then, in BiCG, we have rn=ρn(A)r0and ern=ρn(A

?)er0. In addition,

the search direction polynomials σn∈ Pn\Pn−1associated with the search directions

vnplay an important role: since v0=r0and e

v0=er0we have vn=σn(A)r0and

vn=σn(A

?)er0. Hence, associated with the recursions (2.3b), (2.3c), (2.3g), and

(2.3h) there are the underlying coupled polynomial recursions

ρn+1(t) := ρn(t)−αntσn(t), σn+1(t) := ρn+1(t) + βnσn(t).(2.4)

They are fundamental for deriving CGS and BiCGStab, and also, as we will see in

Section 4, for understanding the difference between IDR(1) and BiCGStab.

For CGS it is easy to derive from (2.4) four coupled recursions for the polynomials

ρCGS

n:≡ρ2

n,σ2

n,ρnσn, and ρnσn−1, which can then be translated into recursions

for elements of Kν. Likewise, for BiCGStab one combines (2.4) with the trivial

recursion Ωn+1(t) = (1 −ωnt)Ωn(t) to derive three recursions for the three products

ρBiCGSTAB

n:≡ρnΩn,ρnΩn−1, and σnΩn. In both cases alternative recursions exist too.6

2.3. Multiple initial shadow residuals. Yeung and Chan [39] generalized

BiCG by replacing the left Krylov subspaces e

Knby block Krylov subspaces, which

are a sum of Krylov spaces for the same matrix A

?but with several different initial

shadow residuals, stored as the columns of an N×smatrix e

R0. Yeung and Chan

called the resulting method the ML(s)BiCG method (except that they used kinstead

of s). The residuals whose index is a multiple of ssatisfy

rsj ∈r0+AKsj ,rsj ⊥ Kj(A

?,e

R0) :≡

i=1

Kj(A

?,er(i)

0).(2.5)

6In [11] four sets of equivalent recursions for CGS derived from the BiORes and BiODir recur-

sions of BiCG are given; however, they are all more complicated than the original CGS recursions,

and therefore more costly in work and storage.

For the others, with index n=sj +`, where 1 < ` < s, we have analogously

rn∈r0+AKn,

rn⊥ Kj;`(A

?,e

R0) :≡

i=1

Kj+1(A

?,er(i)

0) +

i=`+1

Kj(A

?,er(i)

0).

(2.6)

These residuals could also be constructed by using a variant of the nonsymmetric block

Lanczos method, where the block size of the left block Krylov space is s, while the

one of the right (block) Krylov space is just one, that is, the right space is an ordinary

Krylov space; see [1, 3, 5, 6, 8, 19] for ways to construct bases for these spaces with

short recursions. Yeung and Chan rather generalize block BiCG, described before

by O’Leary [20] and Simoncini [23], but the latter authors assumed the same block

size in the left-hand and the right-hand side Krylov spaces. Unfortunately, in theory

and practice there can (and ultimately do) occur problems that must be addressed

by block size reduction (deflation). Moreover there may occur Lanczos breakdowns

and pivot breakdowns; see, e.g., [13] for a discussion of the breakdowns of BiCG.

Deflation and breakdowns have not been addressed in [39].

Yeung and Chan [39] continued by applying to ML(s)BiCG the same transforma-

tion that turns BiCG into BiCGStab. Unfortunately, this lead to rather complicated

formulas, which they were able to simplify and economize somewhat by algebraic ma-

nipulations. The resulting algorithm, called ML(s)BiCGStab, was shown to be very

effective for a large number of rather ill-conditioned test matrices.

Under the titles Bi-CG and Bi-CGSTAB Sleijpen, Sonneveld, and van Gijzen [25]

sketched two methods that are in spirit the same as ML(s)BiCG and ML(s)BiCG-

Stab, but in detail differ considerably. Firstly, they are not using the equivalent of

Lanczos’ coupled two-term recursions; secondly, their “intermediate” residuals satisfy

only a block biorthogonality not the stricter requirement of (2.6) that determines the

residuals of ML(s)BiCG uniquely.

3. IDR basics. In this section we review the basic facts on the IDR(s) method,

following essentially the presentation in [32]. One aspect that we stress more explicitly

than Sonneveld and van Gijzen is that IDR(s) is a Krylov space method, and therefore

the residuals lie in an affine space that is embedded in a Krylov subspace. We also

try to give more realistic figures, although we will see that they still do not reflect the

whole truth.

3.1. The IDR Theorem. The IDR approach is based on a finite series of nested

linear subspaces Gjof diminishing dimension with the property that for some increas-

ing index sequence {nj}the residuals rnwith n≥njall lie in Gj. Of course, all

residuals lie in the invariant Krylov space Kν:≡ Kν(A,r0); therefore, we can start

with n0:≡0 and G0:≡ Kν. The other spaces Gjare defined by the recursion

Gj:≡(I−ωjA)(Gj−1∩ S) , .(3.1)

Here, Sis a prescribed linear subspace of codimension s¿N, and the constants

ωj6= 0 will be suitably chosen to boost convergence. Let us denote the dimension

of Gjby dj. Clearly, Gj−1∩ S can be represented by N−dj−1+slinear equations,

and it is likely that these are linearly independent. However, as pointed out in [32],

G0=R3

G0∩ S =S

I−ω1A

G1∩ S

G2∩ S ={0}

I−ω2A

Fig. 3.1.Case s= 1: The spaces R3=K3=G0)G1(G2.

we cannot conclude easily that linear independence is here a generic property (valid

for almost all problems if data are chosen randomly) since Gj−1actually depends on

S. But, typically, Gj−1∩ S and its image Gjhave dimension dj=dj−1−s. We will

mostly take it for granted that this and other regularity assumptions are satisfied,

and we will refer to this then as the regular case.7One can see, however, by analogy

to the behavior of the related ML(s)BiCG and ML(s)BiCGStab methods, that we

cannot expect that dj=dj−1−sremains true for jup to N/s if s > 1.

The IDR Theorem given next says that the spaces Gjare nested and, under mild

assumptions on S, the inclusion is strict: Gj(Gj−1, two properties that are not

apparent. For an illustration see Fig. 3.1.

Theorem 1 (IDR Theorem [38, 32]). Assume that S ∩G0contains no eigenvector

of A. Then

Gj(Gj−1unless Gj−1={o}.

For the proof see [38, 32]. As a consequence of the strict inclusions, Gj={o}for some

j≤N, say, j≡:J. However, the bound J≤Nthat follows from the IDR Theorem

leads to a strong overestimation of the finite termination index, for the simple reason

that termination is characterized by dJ= 0, which we can expect for Jof the size

N/s.

Sonneveld and van Gijzen [32] also provide the Extended IDR Theorem, whose

main result is that the difference dj−dj+1 is monotonically non-increasing:

0≤dj−dj+1 ≤dj+1 −dj≤s .

Alternatively, this result could also be concluded from the connection to ML(s)BiCG.

Of course, neither G0nor the other spaces Gjare known in advance, in the sense

that we know a basis for them. In theory, the IDR algorithm would provide these

7The authors of [32] refer to it as the generic case, although this may not be fully consistent

with the common usage of the word generic.

G0=R3

r2= (I−ω1A)v1

G0∩ S =S

I−ω1A

G1∩ S

Fig. 3.2.Case s= 1: The first two steps: construction of r2and r3(for details see §4.1).

bases if we continued it until the exact solution of Ax =bis found, that is, rν=ois

attained; but this is not feasible unless νis very small, so that the linear system can

be solved exactly in a few steps.

IDR constructs typically only s+ 1 residuals in Gjbefore turning to the next

subspace Gj+1 (Gj. To accommodate exceptional situations we introduce a mono-

tonically growing index sequence {nj}defined implicitly by8

rn∈ Gj∩(r0+AKn), n ≥nj. (3.2)

In the normal case, nj= (s+ 1)j, so nj+1 −nj=s+ 1 for reasons we will see in a

moment, but nj+1 −nj> s + 1 may occur in exceptional situations. In the simplest

case s= 1, which is when IDR is closely related to BiCGStab, two new residuals are

computed for each j. This is depicted in Figure 3.2. The details of the construction

are discussed next.

3.2. Recursions. The recursion for the residuals builds upon the recursion (3.1)

for the spaces Gj:

rn+1 := (I−ωjA)vn,vn∈ Gj−1∩ S ∩ (r0+AKn),(3.3)

We suppose here that, for all n,vnlies in the affine Krylov subspace spanned by

r1, . . . , rnand shifted by r0. This means that vnand thus also rn+1 have “maximum

degree” in the sense that vn6∈ r0+AKn−1and vn6∈ r0+AKn. There may be

situations where the latter assumptions do not hold, and, according to [32], there are

in the IDR framework ways to recover from such situations, but we will not treat

that here. There is some vague analogy to look-ahead Lanczos [7] or look-ahead block

Lanczos [1] in such recovery procedures.

8It is conceivable that the inclusion in (3.2) holds by chance for some n<njtoo, but for n≥nj

the inclusion will be forced by construction.

To construct vectors or “points” rn+1 ∈ Gjwe need vectors vn∈ Gj−1∩ S,

and the first time we construct a point in Gjwe can choose ωj. To construct vnin

Gj−1∩ S we need to intersect an s-dimensional affine subspace of Gj−1(in theory it

could be represented by N−sinhomogeneous linear equations) with the subspace S

represented by shomogeneous linear equations that we may write as P?vn=owith

an N×smatrix Pwhose columns form a basis of S⊥. In theory we would end up

with Ninhomogeneous linear equation for vnthat will usually have a unique solution,

but this is of course not feasible in practice. Instead we represent the s-dimensional

affine subspace of Gj−1directly by an affine combination (a linear combination whose

coefficients sum up to one) of s+ 1 points in Gj−1. The natural choice for these

s+ 1 points are the last computed residuals rn−s,...,rn. Here we see why we need

nj+1 −nj≥s+ 1. A neat way to take the condition of an affine combination

into account is to introduce the differences of the residual vectors, and that is what

Sonneveld and van Gijzen do:

vn:= rn−

ι(n)

i=1

γ(n)

i∆rn−i=rn−∆Rncn,(3.4)

where

s≤ι(n)≤n−nj−1,

∆rn:≡rn+1 −rn,

∆Rn:≡£∆rn−1. . . ∆rn−ι(n)¤,

cn:≡hγ(n)

1. . . γ(n)

ι(n)iT

The restriction ι(n)≤n−nj−1ensures that ∆rn−i∈ Gj−1. Usually, ι(n) = s,

but, again, there may be exceptional situations not covered here, where one needs to

choose a bigger ι(n). Note that using the differences of the residuals leads to a vn

whose polynomial representation automatically inherits from rnthe value 1 at zero.

Therefore, indeed vn∈r0+AKn, and we may view vnas a residual, so there is

n∈x0+Knsuch that vn=b−Ax0

To enforce vn∈ S we enforce9vn⊥ S⊥=R(P), that is, P?vn=o. This means

that the term ∆Rncnin (3.4) must be the oblique projection of rninto R(∆Rn)

along S. In order that this projection is uniquely defined, we need P?∆Rnto be

nonsingular, in particular ι(n) = sto make the matrix square. Then,

cn:≡(P?∆Rn)−1P?rn,vn:= rn−∆Rncn.(3.5)

Otherwise, when ι(n)> s, we might choose cnas the minimum norm solution of an

underdetermined least squares problem.

For the initial phase, that is, for constructing r1, . . . , rs, we may apply a fairly

arbitrary starting procedure, e.g., GMRes.

We need not just one point rn+1 ∈ Gj, but we need at least s+ 1 of them before

we can continue to the next space Gj+1. At first one might expect to need 2s+ 1

points in Gj−1to repeat the above construction s+ 1 times. However, this is not the

9R(P) denotes the range of P.

case. Because Gj⊂ Gj−1, the just constructed rn+1 ∈ Gjalso qualifies as a point of

Gj−1and can be used when we replace nby n+ 1 in the above construction.10 So,

s+ 1 points in Gj−1\Gjwill usually be enough. However, we cannot fully exclude

degenerate situations, where the last s+ 1 points constructed in Gj−1do not span

an s-dimensional affine space and therefore ∆Rnis singular (or nearly so). A careful

implementation of the method will need to address such situations, which are also

reflected by a zero (or absolutely small) coefficient in the tnterm of the polynomial

representation of some of the vectors vn.

For the case s= 1, the first two steps, from given r0and r1to r2and r3, are

shown in Figure 3.2. Then the construction is continued till v5=r6=oin Figure 3.3.

However, our figures show actually one of the “exceptional situations” we just referred

to: the constructed residuals are not all linearly independent. In fact, since the figures

show a construction in R3(i.e., N= 3), linearly independent residuals would mean

convergence in at most 3 steps, that is, r3=o.

We could avoid using vnby inserting (3.4) in (3.3):

rn+1 := rn−∆Rncn−ωjA(rn−∆Rncn) (3.6)

This formula manifests that IDR(s) differs considerably from most commonly used

Krylov space solvers such as CG,BiCG, or GCR. The difference is that in (3.6) not

only rnis multiplied by Abut also rn−s, . . . , rn−1. This means that in the terminology

of [10] IDR(s) is a (s+ 1, s + 1)-step method, while, e.g., CG,BiCG are (2,1)-step

methods, and OrthoMin(k) is a (k, 1)-step method, while the untruncated GCR is

a (∞,1)-step method.

As mentioned before, ωjcan only be chosen when we construct the first point in

Gj, that is, rn+1 with n+ 1 = nj. The formula rn+1 = (I−ωjA)vnsuggest that we

10The IDR(s) variant of Section 5 will differ in the choice of points used in (3.4).

G0=R3

r2= (I−ω1A)v1

G0∩ S =S

I−ω1A

G1∩ S

G2∩ S ={0}={v5}

I−ω2A

Fig. 3.3.Case s= 1: Construction of r4and r5. Termination with v5=o(for details see §4.1).

choose ωjso that krn+1kis minimal among all rof the form r= (I−ωjA)vn, that

is, we choose it such that rn+1 ⊥Avn:

ωj:≡hAvn,vni

kAvnk2.(3.7)

Note that this value of ωjmay turn out to be zero or close to zero. As in BiCGStab

this is a source of breakdown or instability, but it is easily cured by choosing another

value that does not minimize the residual locally.

Finally we need to address the fact that it does not suffice to construct residuals

rnand vn, but that we also need the corresponding approximate solution xnand

n∈x0+Kn. It is readily verified that

vn:= rn−∆Rncn⇐⇒ x0

n:= xn−∆Xncn,(3.8)

rn+1 := (I−ωjA)vn⇐⇒ xn+1 := ωjvn+x0

n.(3.9)

There are several ways to rearrange these four recursions and to combine them with

the iterate-residual relationships; see [25]. Also in the “prototype algorithm” of [32]

a different, but equivalent set of recursions is used. It includes the analog of (3.6) for

xn+1,

xn+1 := xn−∆Xncn+ωj(rn−∆Rncn) (3.10)

and the relation ∆rn=−A∆xn.

Note that here, as in any competitive set of recursions, the major cost of comput-

ing xn∈x0+Knconsists in n+ 1 matrix-vector products (MVs) with A. Regarding

memory, one needs just to store scolumns of each P, ∆Xn, and ∆Rn, plus a few

single N-vectors.

3.3. Characterization by orthogonality. Clearly, the dimension of Gjgets

reduced due to taking the intersection with the (N−s)–dimensional space S. This

dimension reduction is viewed as the basic force behind IDR and gave the method its

name. However, dimension reduction in Krylov space solvers is not at all a unique

feature of IDR. In fact, projection based methods can be understood in a similar way.

For example, the characterization (2.2) of the BiCG residuals could be written as

rn∈ L⊥

n∩(r0+AKn),

where Ln=e

Kn=Kn(A

?,er0), and for CR,GCR, and GMRes the same is true with

Ln=AKn. What is different in IDR is that Gjis not an orthogonal complement of

a Krylov subspace. However, due to the form of the recursion for {Gj},Gjturns out

to be the image of an orthogonal complement of a Krylov subspace. This result is

implicit in Subsection 5.1 of [32] and has been explicitly formulated in [25]:

Gj=nΩj(A)w¯¯w⊥ Kj(A

?,P)o= Ωj(A) [Kj(A

?,P)]⊥.(3.11)

Here, as before, Ωj(t) :≡(1 −ω1t)· · · (1 −ωjt)∈ P◦

j, and Kj(A

?,P) is the jth block

Krylov subspace generated by A

?from the scolumns of P, which are assumed to be

a basis of S⊥. Note that when we choose e

R0=Pthis block Krylov subspace is the

same as in ML(k)BiCG, see (2.5).

Note also that the larger s, the larger is Kj(A

?,P), and thus the smaller is Gj.

To prove (3.11) let us repeat here the argument from [32] that is linked to the

recursions that we have just discussed and provides further insight. (The induction

proof of (3.11) in [25] is different.) We start from (3.3) and the observation that

vn∈ Gj−1must likewise be of the form

vn= (I−ωj−1A)v0

n,v0

n∈ Gj−2∩ S ∩ (r0+AKn−1),

n= (I−ωj−2A)v00

n,v00

n∈ Gj−3∩ S ∩ (r0+AKn−2),

v(j−2)

n= (I−ω1A)wn+1 ,wn+1 ∈ G0∩ S ∩ (r0+AKn−j+1).

Starting at the bottom, we can also write (with Ω0≡1):

wn+1 = Ω0(A)wn+1 ∈ G0∩ S ,

v(j−2)

n= Ω1(A)wn+1 ∈ G1∩ S ,

n= Ωj−2(A)wn+1 ∈ Gj−2∩ S ,

vn= Ωj−1(A)wn+1 ∈ Gj−1∩ S ,

rn+1 = Ωj(A)wn+1 ∈ Gj.

Since {Ωk}j−1

k=0 is a basis of Pj−1we see that

Ω(A)wn+1 ∈ S (∀Ω∈ Pj−1),

that is, P?Ω(A)wn+1 =o∈Csor, in other words, wn+1 ⊥Ω(A

?)P,∀Ω∈ Pj−1, or,

wn+1 ⊥ Kj(A

?,P). In summary, we conclude that any rn+1 ∈ Gjis of the form

rn+1 = Ωj(A)wn+1 ,wn+1 ∈ G0∩ S ∩ (r0+AKn−j+1),wn+1 ⊥ Kj(A

?,P).

(3.12)

This proves (3.11). For simplicity, we may replace n+1 by nhere and, for our records,

write any rn∈ Gj(n=nj, . . . , nj+1 −1) as

rn= Ωj(A)wn,wn∈ G0∩ S ∩ (r0+AKn−j),wn⊥ Kj(A

?,P).(3.13)

Note that for n=nj−1 and n=nj, the polynomials associated with wnhave

the same degree: wn∈r0+AKnj−j. (That is why we chose nas index for wn,

although this is not the degree of the associated polynomial.)

In the generic case, for fixed j, we will construct nj+1 −nj=s+ 1 linearly

independent vectors wnthat provide s+ 1 linearly independent vectors rn(with

nj≤n<nj+1). So, as long as we stay in the generic case, nj=j(s+ 1).

Moreover, generically, for n=nj=j(s+ 1) where wn∈r0+AKjs and wn⊥

Kj(A

?,P) with dim Kj(A

?,P) = js =dim AKjs, there is a unique wnsatisfying

(3.13), since it can be characterized as the solution of a linear system with a js ×js

matrix that can be assumed to be nonsingular in the generic case:

Theorem 2 ([32]). Assume nj=j(s+1),j= 1,2, . . . , J, and assume the iterates

xnand residuals rnof IDR(s)are for n≤nJuniquely constructible by the recursions

(3.3) and (3.4) with the choice ι(n) = sand the coefficients cnfrom (3.5). Then, for

j≤J, the residuals wnjand rnjare uniquely characterized by the conditions (3.13).

Corollary 3. Under the assumptions of Theorem 2, and if the same parameters

ωj,(1 ≤j≤J)have been chosen, the IDR(s)iterates xnjand residuals rnjare identi-

cal with the iterates xjand residuals rjof BiCGStab (if s= 1) or ML(s)BiCGStab

(if s= 1), respectively.

But the sother vectors wn(with nj< n < nj+1), and thus also the corresponding

residuals rnare not uniquely determined by (3.13). We cannot expect that they

appear in BiCGStab or ML(s)BiCGStab, and, in fact, they usually do not.

4. The case s= 1 and the comparison with BiCGStab.If s= 1, the

subspace Sis a hyperplane determined by a single vector p⊥ S. So the matrix P

consists of the single column pnow. By Corollary 3, when s= 1, every other set of

vectors {wn,rn,vn−1,xn, . . . }(with neven) is uniquely determined up to the choice

of the parameters ωj. If the latter are chosen as in BiCGStab (and they usually are),

and if er0:= pin BiCGStab, then

r2j=r

BiCGSTAB

j,x2j=x

BiCGSTAB

j,w2j=r

BICG

j.(4.1)

So there remains the question whether and how BiCGStab and IDR(1) differ.

In order to answer this question we will look at the polynomial recursions that mirror

the recursions for the Krylov space vectors generated by the two methods.

4.1. Recursions and orthogonality properties of IDR(1).When s= 1 the

recursions (3.8) and (3.9) of IDR(s) simplify to

vn:= rn−γn(rn−rn−1),x0

n:= xn−γn(xn−xn−1),

rn+1 := (I−ωjA)vn,xn+1 := x0

n+ωjvn,(4.2)

where n≥1, j=b(n+ 1)/2c. The first line can be written

vn:= (1 −γn)rn+γnrn−1,x0

n:= (1 −γn)xn+γnxn−1,

to manifest that the point represented by vnlies on the straight line through rnand

rn−1, and likewise, x0

nlies on the line through xnand xn−1. By (3.5), γn:≡γ(n)

hp,rni/hp,∆rn−1iis chosen such that vn∈ S, that is, vn⊥p. This is illustrated

in the Figures 3.2 and 3.3. The parameter ωjis usually chosen to make r2jas short

as possible; this means that r2jis orthogonal to Av2j−1; see (3.7). (This property is

not taken into account in the figures.)

From (4.1) and (3.13) we know that

w2j=rBICG

j=ρj(A)r0⊥e

Kj,(4.3)

where ρjis still the jth Lanczos polynomial, and where now e

Kj:≡ Kj(A

?,p). Ac-

cording to (3.13) w2j+1 is represented by a polynomial bρj+1 ∈ P◦

j+1 and

w2j+1 =bρj+1(A)r0⊥e

Kj.(4.4)

So, since rn= (I−ωjA)vn−1= Ωj(A)wn, we have

rn= Ωj(A)wn=(Ωj(A)ρj(A)r0if n= 2j ,

Ωj(A)bρj+1(A)r0if n= 2j+ 1 ,

vn= Ωj−1(A)wn+1 =(Ωj−1(A)ρj(A)r0if n= 2j−1,

Ωj−1(A)bρj+1(A)r0if n= 2j .

(4.5)

Inserting these formulas into vn= (1 −γn)rn+γnrn−1we get, after a short

calculation, for n= 2j+ 1 and n= 2j, respectively,

ρj+1(t) := (1 −γ2j+1)bρj+1(t) + γ2j+1 ρj(t) (j= 0,1,2, . . . ),

bρj+1(t) := (1 −γ2j) (1 −ωjt)ρj(t) + γ2jbρj(t) (j= 1,2, . . . ).(4.6)

4.2. Comparison with the recursions and orthogonality properties of

BiCGStab.The Lanczos (residual) polynomials ρjand the BiCG search direction

polynomials σjare formal orthogonal polynomials (FOPs) in the sense that, for i6=j,

ρi⊥ρj⇐⇒ hρi(A

?)er0, ρj(A)r0i= 0 ⇐⇒ Der

BICG

i,r

BICG

jE= 0 ,

σi⊥tσj⇐⇒ hσi(A

?)er0,Aσj(A)r0i= 0 ⇐⇒ De

BICG

i,Av

BICG

jE= 0 ,

where vBICG

jand e

vBICG

iare the search directions and the “shadow” search directions,

respectively, which appeared in the recursions (2.3a)–(2.3i). Since {ρ0, . . . , ρj−1}and

{σ0, . . . , σj−1}both span Pj−1, we actually have

ρj⊥ Pj−1, σj⊥tPj−1,⇐⇒ r

BICG

j⊥e

Kj,v

BICG

j⊥Ae

Kj.

Here, h., .iAdenotes a formal A–inner product. In summary, the basic BiCG recur-

sions (2.3b), (2.3c), (2.3g), and (2.3h) upon which BiCGStab builds too, are mirrored

by the following recursions for ρjand σj:

ρj+1(t)

| {z }

⊥Pj

:= ρj(t)

|{z}

⊥Pj−1

−αjtσj(t)

|{z}

⊥Pj−1

, σj+1(t)

| {z }

⊥tPj

:= ρj+1(t)

| {z }

⊥Pj

+βjσj(t)

|{z}

⊥tPj−1

.(4.7)

Here, both αjand βjare chosen so that the new polynomials ρj+1 and σj+1 feature

the indicated orthogonality properties, by which they are uniquely determined up to

a scalar multiple.

In contrast, in IDR(1), by (4.6), (4.3), and (4.4),

bρj+1(t)

| {z }

⊥Pj−1

:= (1 −γ2j) (1 −ωjt)ρj(t)

| {z }

⊥Pj−2

+γ2jbρj(t)

|{z}

⊥Pj−2

ρj+1(t)

| {z }

⊥Pj

:= (1 −γ2j+1)bρj+1(t)

| {z }

⊥Pj−1

+γ2j+1 ρj(t)

|{z}

⊥Pj−1

ρj+1

ρj

tσj

−αjt σj

ρj+1

−α0

jt σj

Fig. 4.1.Case s= 1: The connection between BiCGStab and IDR(1).

Comparing these recursions for (ρj,bρj) with (4.7) we easily see that

(1 −γ2j+1) (bρj+1(t)−ρj(t)) = −αjt σj(t).

So,

bρj+1(t) = ρj(t)−αj

1−γ2j+1

t σj(t),(4.8)

or, after multiplication by Ωj(t) and translation into the Krylov space,

r2j+1 =r2j−αj

(1 −γ2j+1)As

BiCGSTAB

j,where s

BiCGSTAB

j:≡Ωj(A)v

BICG

j.(4.9)

This formula expresses the odd indexed IDR(1) residuals r2j+1 in terms of quantities

from BiCGStab and the IDR coefficient γ2j+1. We illustrate the connection in Fig-

ure 4.1. While BiCGStab implicitly constructs ρj+1 by enforcing a biorthogonality

condition on a polynomial that lies on the line determined by ρjand t σj, IDR(1) first

generates by the second recursion in (4.6) the polynomial bρj+1 that lies on that line

and then also enforces this condition.

Let us finally note that the parameter ωj, which is in (3.7) chosen to make r2j

orthogonal to Av2j−1, is indeed the same in IDR(1) and BiCGStab, since v2j−1is

the same in both methods.

4.3. How does the original IDR differ from IDR(1)?. In contrast to IDR(1)

of [32], where (4.2) holds for all n > 1, the original IDR of [38] used for nodd the

recursions

vn:= rn−γ0

n(rn−1−rn−2),x0

n:= xn−γ0

n(xn−1−xn−2),

rn+1 := (I−ωjA)vn,xn+1 := x0

n+ωjvn,(4.10)

with γ0

n:≡ hp,rni/hp,∆rn−2i. So, here, when computing the “intermediate iterate”

none modifies xnby a step in the same direction as has been used in the previous

step for modifying xn−1.

Moreover, in discrepancy of what we have stated here, the new IDR(s) of [32]

computes the residual differences actually as ∆rn=−A∆xn. This couples the re-

cursions for xnand rnmore tightly and thus reduces the gap between the recursively

computed residual and the true residual. This gap is known to be closely linked to the

attainable accuracy that can be achieved with a Krylov space solver; see [9, 15, 27].

The IDR Theorem still applies, and still x2j=xBiCGSTAB

j. This follows from the

fact that the arguments of Subsection 3.3 are still applicable.

5. IDR(s)with locally biorthogonal residuals. Recently, van Gijzen and

Sonneveld [36] came up with a new version of IRD(s), in which, in the regular case

assumed throughout this section, each of sconsecutive “intermediate” residuals rnj+k

is orthogonal to a growing subset of the sprescribed columns pkof P:

rnj+k⊥ {p1, . . . , pk}, k = 1, . . . , s . (5.1)

For distinction we will call this algorithm IDR(s)BiO here.11

IDR(s)BiO still fits into the IDR framework in the sense that the IDR Theo-

rem 1 and the orthogonality result of Theorem 2 as well as its Corollary 3 still apply.

One important aspect where it differs from the original IDR(s) is in the ansatz for

recursively constructing rnj+kfrom previous residuals: while the original version uses

in the formula (3.4) for vnthe latest sresidual differences (i.e., the choice ι(n) = s)

for all n, in IDR(s)BiO that sum involves the sresidual differences

∆rnj−1≡∆rnj−s−1∈ Gj−1∩AKnj−1+1 ,

∆rnj−1+s−1≡∆rnj−2∈ Gj−1∩AKnj−1,

none of which relates to a residual that lies in Gjalready. So, in the case s= 1, there

is an analogy to the original IDR of Sonneveld [38]; see (4.10). Additionally, these

residual differences are actually replaced by another set of svectors

gnj−1≡gnj−s−1∈ Gj−1∩AKnj−1+1 ,

gnj−1+s−1≡gnj−2∈ Gj−1∩AKnj−1,

that are multiples of the residual differences and thus also orthogonal to a growing

subset of the sprescribed columns pk:

gnj−1+k⊥ {p1, . . . , pk}, k = 0, . . . , s −1.(5.2)

However, these residual differences are not defined as before, but undergo a linear

transformation to impose (5.2). Note that in (5.2) the range of the index kis shifted

by 1; so for k= 0 the orthogonality condition is empty.

To construct preliminary vectors in Gj, we now define, for n=nj+k(k=

0, . . . , s), vectors vn∈ S by the ansatz

vn:= rn−

i=1

γ(n)

ignj−1+i−1=rn−Gj−1cn,(5.3)

11Actually, the sets {rnj+k}and {pi}are not biorthogonal, but by a triangular linear transfor-

mation we could replace the basis {pi}of S⊥by {p0

i}so that {rnj+k}and {p0

i}are biorthogonal.

However, the transformation would depend on j.

where

Gj−1:≡£gnj−1. . . gnj−1¤,cn:≡hγ(n)

1. . . γ(n)

si.

cnis determined by the condition vn⊥ R(P). So we have, as in (3.5),

cn:= (P?Gj−1)−1P?rn,vn:= rn−Gj−1cn.(5.4)

Here, the projection along Sis on R(Gj−1), that is, on a space that only depends

on j−1 and therefore is the same for s+ 1 values of n. Consequently, the matrix

P?Gj−1in the s+ 1 linear systems for cnis also the same. (However, the systems

cannot be solved at once, because the vector rnin the right-hand side P?rnresults

from the previous system.)

The elegance of IDR(s)BiO comes from special features that result from the

imposed orthogonality conditions (5.1) and (5.2). Due to (5.2) the matrix

Mj−1≡ {µ(j−1)

i,k0}s

i,k0=1 ≡£mnj−1. . . mnj−1+s−1¤:≡P?Gj−1(5.5)

is lower triangular, and due (5.1) the matrix with the sright-hand sides P?rn(n=

nj, . . . , nj+s−1),

Fj≡ {φ(j)

i,k0}s

i,k=1 ≡£fnj. . . fnj+s−1¤:≡P?£rnj. . . rnj+s−1¤(5.6)

is lower triangular too. Consequently, the matrix with the ssolutions cnof Mjcn=fn

for n=nj, . . . , nj+s−1,

Cj≡ {γ(nj+k0−1)

i}s

i,k0=1 ≡£cnj. . . cnj+s−1¤:≡M−1

jFj(5.7)

is also lower triangular. So its k0th column cnj+k0−1only depends on the (s−k0+1)th

trailing principal submatrix (of order s−k0+ 1) of Mj−1, whereas its first k0−1

entries are zero. In other words, the possibly nonzero entries of cnj+k0−1result from

a (s−k0)×(s−k0) linear system. This means in particular that the recursion (5.3)

becomes shorter while kincreases: for12 n=nj+k,k= 0, . . . , s −1,

vn:= rn−

i=k+1

γ(n)

ignj−1+i−1=rn−Gj−1cn.(5.8)

This not only reduces the computational cost, but it allows us to overwrite Gj−1by

Gjand Mjby Mj+1 inside the loop over k; for details see the pseudocode in [36].

We still need to explain how we find s+ 1 residuals rnj+k∈ Gj(k= 0, . . . , s)

so that (5.1) holds for the last sof them, and how we construct a new set of s

vectors gnj+k∈ Gj(k= 0, . . . , s −1) satisfying (5.2). We may use the orthogonality

conditions (5.1) with jreplaced by j−1 and (5.2) as induction assumption. For the

initialization (j= 0) such sets can be constructed by a one-sided Lanczos process that

combines the generation of a basis for Ks+1 with orthogonalization with respect to

the columns of P. Of course, at the same time, approximate solutions x1, . . . xsneed

to be constructed too, but they are obtained using essentially the same recursions.

12Note that in the presentation of this and other formulas in [36] the notation vn+kwith n=

nj+1 −1 (k= 1,...,s) is used, while here n=nj+k(k= 0,...,s−1) and k0=k+ 1.

Among the s+ 1 residuals in Gj−1satisfying the orthogonality condition, the last

one, rnj−1is orthogonal to all columns of P, whence rnj−1∈ S. So, in accordance

with (5.8) for k=s, where the sum is empty, we can choose vnj−1=rnj−1and thus

rnj:= (I−ωjA)rnj−1.(5.9)

Next, for n=nj+k > nj, any vnobtained from (5.8) lies in Gj−1∩ S ∩ (r0+AKn).

So, by the recursive definition of Gj,

ern+1 := (I−ωjA)vn=rn−Gj−1cn−ωjAvn

is a tentative residual in Gj∩(r0+AKn+1). Since rn∈ Gj∩(r0+AKn),

gn:≡rn−ern+1 =Gj−1cn+ωjAvn∈ Gj∩AKn+1 (5.10)

too, but in order to serve as a column of Gjit needs to be replaced by gnsatisfying

the orthogonality condition

gn⊥ {p1, . . . , pk}, n =nj+k , k = 0, . . . , s −1.(5.11)

This can be achieved by applying a Gram-Schmidt-like process: recursively, the pro-

jection of e

gn≡e

gnj+kon the span of gnj,. . . ,gn−1along the span of p1,. . . ,

pkis subtracted from e

gnto yield gn≡gnj+k∈ Gj∩AKn+1. This can be ex-

pressed as follows: for k= 0 the condition (5.11) is empty, so gnj:= e

gnj; then, for

n=nj+k, k = 0, . . . , s −1,

gn:= e

gn−

i=1

α(j)

i,k gnj+i−1=e

gn−£gnj... gn−1¤a(j)

k,(5.12)

where

a(j)

k:≡³£p1. . . pk¤?£gnj. . . gn−1¤´−1£p1. . . pk¤?e

gn∈Ck.(5.13)

Here, the k×kmatrix £p1. . . pk¤?£gnj. . . gn−1¤is the kth leading principal

submatrix of P?Gj=Mj, and thus it is lower triangular. Therefore, a(j)

kcan be found

by forward substitution. Of course, the diagonal elements µ(j)

i,i =p?

ignj−1+ineed to

be nonzero. Otherwise the process breaks down and the orthogonality condition (5.11)

cannot be satisfied for some n. For the whole block this step is summarized by

Gj:≡£e

gnj. . . e

gnj+s−1¤=GjAO

j(5.14)

with AO

junit upper triangular and P?Gj=Mjlower triangular. Above the diagonal

jcontains the s−1 coefficients vectors a(j)

1(in column 2) to a(j)

s−1(in the last

column). Hence, MjAO

j= (P?Gj)AO

jis an LU decomposition of f

Mj:≡P?e

Gj.

Amazingly, when we replace here this classical Gram-Schmidt-like process by a

modified Gram-Schmidt-like process as it was suggested in [36], there is no need to

solve triangular linear systems.

In matrix notation the first sweep of the modified Gram-Schmidt-like process can

be summarized as

G(1)

j:≡e

GjB(1)

where the upper triangular matrix B(1)

jis given by

B(1)

j:≡





1−β(j)

12 −β(j)

13 . . . −β(j)

1 0 . . . 0

....

1 0







, β(j)

1,k+1 :≡p1,e

gnj+k®

p1,e

gnj®

(with k= 1, . . . , s −1), and has the effect that, by subtracting a multiple of the first

column e

gnjof e

Gjthe columns 2 to sof e

Gjare transformed into columns of G(1)

that are orthogonal to p1. Then, for `= 2, . . . , s −1, in further analogous sweeps, by

subtracting a multiple of the `th column g(`−1)

nj+`−1of G(`−1)

jthe columns `+ 1 to sof

G(`−1)

jare transformed into columns of G(`)

jthat are additionally orthogonal to p`:

G(`)

j:≡G(`−1)

jB(`)

j, ` = 2, . . . , s −1,

where B(`)

jis now for `= 2, . . . , s −1 given by

B(`)

j:≡







1 0 . . . . . . . . . . . . 0

...

1−β(j)

`,`+1 . . . . . . −β(j)

`,s

1 0 . . . 0

....

1 0







, β(j)

`,k+1 :≡Dp`,g(`−1)

nj+kE

Dp`,g(`−1)

nj+`−1E

(with k=`, . . . , s −1). Ultimately, the columns of

Gj:≡G(s)

j=e

GjB(1)

jB(2)

j· · · B(s−1)

j(5.15)

satisfy the orthogonality condition (5.11) and thus are identical to those obtained by

(5.12). Moreover, the comparison with (5.14) reveals that

j=³B(1)

jB(2)

j· · · B(s−1)

j´−1=b

B(s−1)

B(s−2)

j· · · b

B(1)

j,(5.16)

where b

B(`)

j:≡(B(`)

j)−1is obtained by replacing in B(`)

jthe coefficients −β(j)

`,k+1 by

+β(j)

`,k+1. In view of the special structure of the matrices b

B(`)

jone can conclude from

(5.16) that β(j)

`,k+1 =α(j)

`,k. Of course, the matrices e

Gj,G(1)

j, . . . , G(1)

j,Gjcan all be

stored in the same place.

Finally, since by induction (within a block) both rnand gnare orthogonal to

p1, . . . , pk, and moreover, rn∈r0+AKnand gn∈AKn+1, we get rn+1 ∈r0+AKn+1

satisfying (5.1) according to

rn+1 := rn−φ(j)

k+1,k+1

µ(j)

k+1,k+1

gn(n=nj+k;k= 0, . . . , s −1) ,(5.17)

where φ(j)

k,k and µ(j)

k,k are diagonal elements of Fjand Mj, which are enforcing that

rn+1 ⊥pk+1, as can be checked easily.

As has been noted in [36] and can be seen by premultiplying (5.17) with P?the

columns of Fjcan be updated in an elegant way too:

fn+1 := fn−φ(j)

k+1,k+1

µ(j)

k+1,k+1

mn+1 (n=nj+k, k = 0, . . . , s −1) .(5.18)

Sofar, we have concentrated on the possibility of constructing efficiently residuals

satisfying the orthogonality properties (5.1), but we still need to give formulas for

the recursive computation of the approximate solutions xn, and due to the relation

∆rn=−A∆xn, these formulas will lead to other options for updating the residuals.

In general, update formulas for xnare fairly easily obtained from those for rn,

and here this is true also. Let us define

un:≡A−1e

gn,

un:≡A−1gn,

Uj−1:≡£unj−1. . . unj−1¤=A−1Gj−1.

Then, from (5.10) we get

un:= Uj−1cn+ωjvn∈ Kn+1 ,(5.19)

which allows us to replace (5.10) by

gn:= Ae

un.(5.20)

This helps to couple the updates of xnand rnand thus to avoid a fast growth of the

residual gap mentioned before. Moreover, (5.12) translates into

un:= e

un−

i=1

α(j)

i,k unj+i−1(n=nj+k, k = 1, . . . , s −1) .(5.21)

Finally, from (5.17) we get

xn+1 := xn+φ(j)

k+1,k+1

µ(j)

k+1,k+1

un(n=nj+k, k = 0, . . . , s −1) .(5.22)

These are the formulas the IDR pseudocode in [36] is based on. But what makes it

so ingenious is the fact that it minimizes memory usage by systematically overwriting

data that is no longer used. On the other hand, this makes the code harder to

understand.

6. Comments and conclusions. The various IDR algorithms may still be not

as well understood as other algorithms that are directly derived from the Lanczos

process (be it symmetric or non-symmetric), but their close relationship to Lanczos

based algorithms certainly helps us to understand them.

6.1. The case s= 1.This case is easy, because in exact arithmetic the even in-

dexed IDR(1) iterates and residuals are exactly the BiCGStab iterates and residuals.

However, as we have seen, the recursions are not the same, and therefore, it is possible

that IDR(1) is more stable than BiCGStab or vice versa. The odd numbered IDR(1)

iterates and residuals have no counterpart in the original BiCGStab algorithm.

For sure is that the existence of all BiCGStab iterates, i.e., all even IDR(1)

iterates, requires that all BiCG residuals exists, and that therefore any serious Lanczos

breakdown and any so-called pivot breakdown cause BiCGStab and IDR(1) to break

down unless extra precautions against such breakdowns have been implemented. This

follows from the fact that the BiOMin version and the BiORes version of BiCG break

down at the same times: the existence of the residuals implies the existence of the

coupled two-term recursions [13]. Additionally, the choice of the parameters ωjis the

same in BiCGStab and IDR(1), so a breakdown due to ωj= 0 will occur at the same

time in both methods; but in both it is also easy to fix.

It is conceivable that there are further causes for breakdown in IDR(1). On

the other hand, the recovery procedure in case of a breakdown seems to be much

simpler in IDR(1); but so far it seems to be less understood and documented than for

BiCGStab [14].

While IDR(1) and BiCGStab produce in exact arithmetic essentially the same

results based on a common mathematical background, they are clearly different algo-

rithms obtained by different approaches.13

6.2. The case s > 1.The relation BiCG ÃBiCGStab ∼IDR(1) is matched by

the relation ML(s)BiCG ÃML(s)BiCGStab ∼IDR(s), but the similarity between

ML(s)BiCGStab and IDR(s) is weaker than between BiCGStab and IDR(1). In

IDR(s) there is some freedom in choosing the s“intermediate” iterates and residuals

because, unlike in ML(s)BiCGStab, the vectors wnin (3.13) need not satisfy the

strict condition (2.6) of an “intermediate” ML(s)BiCG residual, but only the weaker

block orthogonality condition wn⊥ Kj(A

?,P) of (3.13). With the IDR approach,

such vectors can be obtained with much simpler recursions, which, in addition, allow

considerable flexibility that may enable us to overcome breakdowns.

The many published numerical examples on IDR(s) [25, 32, 36] manifest the fast

convergence of this method and the superiority of the choice s > 1. Due to the careful

choice of the numerical examples and the restriction on small values of s, where the

method is less costly than for large s, the numerical results are more relevant than

those of Yeung and Chan [39], who applied their ML(s)BiCGStab with large sand

mostly without preconditioning to rather ill-conditioned test matrices. Heuristically

it is plausible that these methods are particularly effective for such examples. In

BiCG the construction of a basis of e

Knis prone to roundoff errors, which can be

expected to be larger when Ais ill-conditioned and nis large. When constructing

13From Proposition 5.1 in [25] one may get the impression that BiCGStab and IDR(1) are nearly

identical. But they do not compare the original BiCGStab recursions with the original IDR(1)

recursions.

in ML(s)BiCG a basis of the block Krylov subspace Kj(A?,e

R0) we can start with s

orthonormal basis vectors and work with the dimension jof order n/s of each single

Krylov subspace. Then, for the same value of n, we can expect the basis of the block

Krylov subspace to be better conditioned. A similar improvement of the condition

of the basis can be expected when we compare BiCGStab to ML(s)BiCGStab or

IDR(s), and it seems to be relevant also in preconditioned problems that are not

extremely ill-conditioned.

The effectiveness of this improvement due to changing the dual (“left”) space

and using a better conditioned basis is quite surprising. The discovery of this effect

is due to Yeung and Chan [39], but the equally strong improvement of IDR(s) over

IDR(1), which seems to rely partly on the same effect, was discovered independently

a decade later by Sonneveld and van Gijzen. Additionally, the left-hand side block

Krylov space that is implicitly used by both ML(s)BiCGStab and IDR(s) seems to

be more effective in capturing the spectrum. By choosing in IDR(s) this space (i.e.,

the matrix P) appropriately — and perhaps even adaptively — depending on some

knowledge on the spectrum of Aone may be able to further speed up convergence.

The other fundamental fact is that in the framework of Lanczos-type product

methods, multiple left projections can reduce the MV count. By the reduction of

the MV count we understand a smaller ratio between the search space dimension n

(nj≤n<nj+1) and the number js of orthogonality conditions satisfied by wn.

For n=nj=j(s+ 1) this ratio is 1 + 1

swhile for CGS and BiCGStab it is 2.

This also applies both to ML(s)BiCGStab and IDR(s), but not to ML(s)BiCG,

where building up the left block Krylov space costs sMVs per value of j, while

ML(s)BiCGStab and IDR(s) achieve the same effect with just one MV. Therefore,

ML(s)BiCG is not competitive with ML(s)BiCGStab or IDR(s), except perhaps

in situations where the Lanczos-type product methods fail due to roundoff problems.

(It is well known, that the recursion coefficients produced by BiCGStab are usually

less accurate than the same coefficients produced by BiCG, and there are problems

where BiCGStab fails to converge, while BiCG succeeds.) For this reason, Yeung

and Chan [39] introduced ML(s)BiCG only as a tool for deriving ML(s)BiCGStab.

IDR(s) inherits from BiCGStab also the disadvantage that for a problem with

real-valued data the parameters ωjare all real-valued (when chosen in the standard

way), and therefore the zeros ω−1

iof Ωjcannot approximate complex eigenvalues of

Awell. This problem has been addressed by BiCGStab2 [12] and later by BiCG-

Stab(`) [24, 27] by building up Ωjfrom polynomial factors of degree 2 and `, respec-

tively. Unfortunately, an adaptation of IDR(s) to include this idea in an efficient way

is not straightforward and requires to change the framework. This topic is addressed

in [28].

Acknowledgment. The author acknowledges the advice of and the discussions

with many colleagues, in particular Klaus G¨artner, Martin van Gijzen, Miroslav

Rozloˇzn´ık, Valeria Simoncini, Peter Sonneveld, Man-Chung Yeung, and Jens-Peter

Zemke. He is also indebted to the Technical University Berlin and the DFG Research

Center MATHEON for hosting him.

REFERENCES

[1] J. I. Aliaga, D. L. Boley, R. W. Freund, and V. Hern´

andez,A Lanczos-type method for

multiple starting vectors, Math. Comp., 69 (2000), pp. 1577–1601. Received Oct. 10, 1996,

electr. publ. May 20, 1999.

[2] O. Axelsson,Conjugate gradient type methods for unsymmetric and inconsistent systems of

linear equations, Linear Algebra Appl., 29 (1980), pp. 1–16.

[3] Z. Bai, D. Day, and Q. Ye,ABLE: an adaptive block Lanczos method for non-Hermitian

eigenvalue problems, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 1060–1082 (electronic).

Received Mar. 4, 1997.

[4] R. Fletcher,Conjugate gradient methods for indefinite systems, in Numerical Analysis,

Dundee, 1975, G. A. Watson, ed., vol. 506 of Lecture Notes in Mathematics, Springer,

Berlin, 1976, pp. 73–89.

[5] R. W. Freund,Band Lanczos method (Section 4.6), in Templates for the Solution of Algebraic

Eigenvalue Problems: A Practical Guide, Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and

H. van der Vorst, eds., SIAM, Philadelphia, 2000, pp. 80–88.

[6] ,Band Lanczos method (Section 7.10), in Templates for the Solution of Algebraic Eigen-

value Problems: A Practical Guide, Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and

H. van der Vorst, eds., SIAM, Philadelphia, 2000, pp. 205–216.

[7] R. W. Freund, G. H. Golub, and N. M. Nachtigal,Iterative solution of linear systems,

Acta Numerica, (1992), pp. 57–100.

[8] R. W. Freund and M. Malhotra,A block QMR algorithm for non-Hermitian linear systems

with multiple right-hand sides, Linear Algebra Appl., 254 (1997), pp. 119–157.

[9] A. Greenbaum,Estimating the attainable accuracy of recursively computed residual methods,

SIAM J. Matrix Anal. Appl., 18 (1997), pp. 535–551. Received Apr. 21, 1995.

[10] M. H. Gutknecht,Stationary and almost stationary iterative (k, l)-step methods for linear

and nonlinear systems of equations, Numer. Math., 56 (1989), pp. 179–213.

[11] ,The unsymmetric Lanczos algorithms and their relations to Pad´e approxi-

mation, continued fractions, and the qd algorithm. in Preliminary Proceed-

ings of the Copper Mountain Conference on Iterative Methods, April 1990.

http://www.sam.math.ethz.ch/∼mhg/pub/CopperMtn90.ps.gz and CopperMtn90-7.ps.gz.

[12] ,Variants of BiCGStab for matrices with complex spectrum, SIAM J. Sci. Comput., 14

(1993), pp. 1020–1033. Received Sep. 9, 1991.

[13] ,Lanczos-type solvers for nonsymmetric linear systems of equations, Acta Numerica, 6

(1997), pp. 271–397.

[14] M. H. Gutknecht and K. J. Ressel,Look-ahead procedures for Lanczos-type product methods

based on three-term Lanczos recurrences, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1051–

1078.

[15] M. H. Gutknecht and Z. Strakoˇ

s,Accuracy of two three-term and three two-term recurrences

for Krylov space solvers, SIAM J. Matrix Anal. Appl., 22 (2000), pp. 213–229.

[16] M. R. Hestenes and E. Stiefel,Methods of conjugate gradients for solving linear systems,

J. Res. Nat. Bureau Standards, 49 (1952), pp. 409–435.

[17] C. Lanczos,An iteration method for the solution of the eigenvalue problem of linear differential

and integral operators, J. Res. Nat. Bureau Standards, 45 (1950), pp. 255–281.

[18] ,Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bureau

Standards, 49 (1952), pp. 33–53.

[19] D. Loher,New block Lanczos solvers for linear systems with multiple right-hand side, PhD

thesis, Diss. no. 16337, ETH Zurich, Zurich, Switzerland, 2006.

[20] D. P. O’Leary,The block conjugate gradient algorithm and related methods, Linear Algebra

Appl., 29 (1980), pp. 293–322. Received Sep. 25, 1978.

[21] J. K. Reid,On the method of conjugate gradients for the solution of large sparse systems

of linear equations, in Large sparse sets of linear equations. Proceedings of the Oxford

Conference of the Institute of Mathematics and Its Applications held in April, 1970, J. K.

Reid, ed., Academic Press, London, 1971, pp. 231–254.

[22] Y. Saad and M. H. Schultz,Conjugate gradient-like algorithms for solving nonsymmetric

linear systems, Math. Comp., 44 (1985), pp. 417–424.

[23] V. Simoncini,A stabilized QMR version of block BICG, SIAM J. Matrix Anal. Appl., 18

(1997), pp. 419–434. Received Mar. 16, 1994.

[24] G. L. G. Sleijpen and D. R. Fokkema,BiCGstab(l) for linear equations involving unsymmet-

ric matrices with complex spectrum, Electronic Trans. Numer. Anal., 1 (1993), pp. 11–32.

Received Mar. 1993.

[25] G. L. G. Sleijpen, P. Sonneveld, and M. B. van Gijzen,Bi-CGSTAB as an induced di-

mension reduction method, Report 08-07, Department of Applied Mathematical Analysis,

Delft University of Technology, 2008.

[26] G. L. G. Sleijpen and H. A. van der Vorst,An overview of approaches for the stable

computation of hybrid BiCG methods, Appl. Numer. Math., 19 (1995), pp. 235–254.

[27] G. L. G. Sleijpen, H. A. van der Vorst, and D. R. Fokkema,BiCGstab(l) and other hybrid

Bi-CG methods, Numerical Algorithms, 7 (1994), pp. 75–109. Received Oct. 29, 1993.

[28] G. L. G. Sleijpen and M. B. van Gijzen,Exploiting BiCGstab(`)strategies to induce di-

mension reduction, Report 09-02, Department of Applied Mathematical Analysis, Delft

University of Technology, 2009.

[29] P. Sonneveld,CGS, a fast Lanczos-type solver for nonsymmetric linear systems, Report

84-16, Department of Mathematics and Informatics, Delft University of Technology, 1984.

[30] ,CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Statist.

Comput., 10 (1989), pp. 36–52. Received Apr. 24, 1984.

[31] P. Sonneveld and M. B. van Gijzen,IDR(s): a family of simple and fast algorithms for solv-

ing large nonsymmetric systems of linear equations, Report 07-07, Department of Applied

Mathematical Analysis, Delft University of Technology, 2007.

[32] P. Sonneveld and M. B. van Gijzen,IDR(s): A family of simple and fast algorithms for

solving large nonsymmetric systems of linear equations, SIAM Journal on Scientific Com-

puting, 31 (2008), pp. 1035–1062. Received Mar. 20, 2007.

[33] E. Stiefel,Relaxationsmethoden bester Strategie zur L¨osung linearer Gleichungssysteme,

Comm. Math. Helv., 29 (1955), pp. 157–179.

[34] H. A. van der Vorst,Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for

the solution of nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 13 (1992),

pp. 631–644. Received May 21, 1990.

[35] H. A. van der Vorst and P. Sonneveld,CGSTAB, a more smoothly converging variant

of CG-S, Report 90-50, Department of Mathematics and Informatics, Delft University of

Technology, 1990.

[36] M. B. van Gijzen and P. Sonneveld,An elegant IDR(s)variant that efficiently exploits

bi-orthogonality properties, Report 08-21, Department of Applied Mathematical Analysis,

Delft University of Technology, 2008.

[37] P. K. W. Vinsome,ORTHOMIN—an iterative method for solving sparse sets of simultaneous

linear equations, in Proc. Fourth SPE Symposium on Reservoir Simulation, Los Angeles,

1976, pp. 149–160.

[38] P. Wesseling and P. Sonneveld,Numerical experiments with a multiple grid and a pre-

conditioned Lanczos type method, in Approximation methods for Navier-Stokes problems

(Proc. Sympos., Univ. Paderborn, Paderborn, 1979), vol. 771 of Lecture Notes in Math.,

Springer, Berlin, 1980, pp. 543–562.

[39] M.-C. Yeung and T. F. Chan,ML(k)BiCGSTAB: a BiCGSTAB variant based on multiple

Lanczos starting vectors, SIAM J. Sci. Comput., 21 (1999), pp. 1263–1290. Received May

16, 1997, electr. publ. Dec. 15, 1999.

[40] D. M. Young and K. C. Jea,Generalized conjugate-gradient acceleration of nonsymmetrizable

iterative methods, Linear Algebra Appl., 34 (1980), pp. 159–194.