IDR EXPLAINED∗
MARTIN H. GUTKNECHT†
Dedicated to Richard S. Varga on the occasion of his 80th birthday.
Abstract. The Induced Dimension Reduction (IDR) method is a Krylov space method for
solving linear systems that was developed by Peter Sonneveld around 1979. It was only noticed
by few people, and mainly as the forerunner of Bi-CGSTAB, which was introduced a decade later.
In 2007 Sonneveld and van Gijzen reconsidered IDR and generalized it to IDR(s), claiming that
IDR(1) ≈IDR is equally fast but preferable to the closely related Bi-CGSTAB, and that IDR(s)
with s > 1 may be much faster than Bi-CGSTAB. It also turned out that when s > 1, IDR(s) is
related to ML(s)BiCGSTAB of Yeung and Chan, and that there is quite some flexibility in the IDR
approach. This approach differs completely from traditional approaches to Krylov space methods,
and therefore it requires an extra effort to get familiar with it and to understand the connections
as well as the differences to better known Krylov space methods. This expository paper aims at
providing some help in this and to make the method understandable even to non-experts. After
presenting the history of IDR and related methods we summarize some of the basic facts on Krylov
space methods. Then we present the original IDR(s) in detail and put it into perspective with other
methods. Specifically, we analyze the differences between the IDR method published 1980, IDR(1)
and Bi-CGSTAB. At the end, we discuss a recently proposed ingenious variant of IDR(s) whose
residuals fulfill extra orthogonality conditions. There we dwell on details that have been left out in
the publications of van Gijzen and Sonneveld.
Key words. Krylov space method, iterative method, induced dimension reduction, IDR, CGS,
Bi-CGSTAB, ML(k)BiCGSTAB, large nonsymmetric linear system
1. History. The Induced Dimension Reduction (IDR) method was introduced
by Wesseling and Sonneveld from Delft University at a symposium of the International
Union of Theoretical and Applied Mechanics in September 1979. In the proceedings
it is covered on just 31
2pages of a 20-page paper [38], and it is explicitly attributed to
the second author. It was labeled as a Lanczos-type method for nonsymmetric linear
systems which does not require A
T. The term Lanczos-type method meant that the
new method was related to the biconjugate gradient (BiCG) method of Lanczos [17],
which had been revived and reformulated by Fletcher [4] four years before. Up to
that time there had been little interest in Lanczos’ approach, despite the fact that
it was very closely related to the widely used conjugate gradient method [16, 21].
Popular alternative Krylov space solvers for nonsymmetric systems were methods like
Vinsome’s OrthoMin1[37] (now often referred to as GCR), its variants OrthoDir
and OrthoRes [40], as well as similar methods introduced by Axelsson [2]. GMRes
[22] was still five years away. Also popular were parameter-dependent Krylov space
methods, like Chebyshev iteration, and parameter-dependent iterative methods based
on matrix splitting, like SOR.
The IDR method received hardly any attention, probably because it was neither
presented at a conference of the core numerical analysis community nor published in
∗Version of March 31, 2009.
†Seminar for Applied Mathematics, ETH Zurich, CH-8092 Zurich, Switzerland
([email protected]). Work done while the author was visiting the TU Berlin, supported by
the DFG Forschungszentrum MATHEON and the Mercator Visiting Professorship Program of the
DFG.
1We write the acronyms for the various methods in a unified way, which sometimes differs from
the one in the original publication.
1
2
a widely read journal. Moreover, Sonneveld’s approach to designing a Krylov space
solver was very unusual and even for experts hard to fully understand. Although the
method was under certain regularity conditions clearly and uniquely defined in [38],
some of the details and in particular the proof of the connection to BiCG were left
for publication elsewhere. The basic result on this connection is that the IDR residual
polynomials are of the form
ρIDR
n(t) = (Ωj(t)ρj(t) if n= 2j ,
Ωj(t)bρj+1(t) if n= 2j+ 1 ,(1.1)
where Ω0(t) :≡1, Ωj(t) :≡(1−ω1t)···(1−ωjt), and where ρjdenotes the jth BiCG
residual polynomial, which is often referred to as a Lanczos polynomial, scaled by
ρj(0) = 1, while bρj+1 denotes another residual polynomial, which has degree j+ 1. A
new linear factor (1−ωj+1t) is appended to Ωjin every other step, and it was suggested
to choose it such that the norm of the new IDR residual is minimized among those
that lie on a certain straight line. This is a widely used type of minimization step.
For example, it is also found in the conjugate residual [33] method, but there it leads
to a global minimum solution (for a symmetric positive definite matrix). And it is a
key ingredient of BiCGStab. The publication [38] only mentioned that the first line
of (1.1) had been proven in the case where Ais symmetric positive definite.
In 1984 Sonneveld introduced another Lanczos-type method: the Conjugate Gra-
dient Squared (CGS) method [29]. It is also based on residual polynomials that are
a product of polynomials, but here he used simply the square of the BiCG residual
polynomials:2
ρCGS
n(t) = ρ2
n(t).(1.2)
Note that the indexing of the residual polynomials and residuals is different in IDR
and CGS: in the former the degree increases by one when the index grows by one, in
the latter the degree increases by two.
The CGS paper [29] was received by the SIAM Journal on Scientific and Statistical
Computing (SISSC) on April 24, 1984, but it took nearly five years to get published
in revised and extended form [30]. Nevertheless, the method was accepted quickly by
numerical analysts and engineers. In typical cases it converges nearly twice as fast as
BiCG, though often in a very erratic manner. Although its idea and derivation are
ingenious, they are easy to understand: starting from the standard recursions for the
BiCG residual polynomials one just derives recursions for their squares by defining
additional suitable products of pairs of polynomials.
Yet another similar method was presented at the Householder Symposium in
Tylosand in June 1990 by van der Vorst, then still also at Delft University. The title
of his talk and the corresponding paper coauthored by Sonneveld and submitted to
SISSC on May 21, 1990, was “CGSTAB: A more smoothly converging variant of CG-
S” [35]. As part of the revision process the title was later changed into “Bi-CGSTAB:
a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric
2Therefore the name Biconjugate Gradient Squared (BiCGS) method would also make sense.
However Sonneveld’s view was that CGS is derived from a CG-type algorithm for building up a
set of orthogonal (or formally orthogonal) polynomials [private communication]. Only after the
recursions are mapped into a Krylov space that is embedded into an inner product space the notion
of biorthogonality makes sense.
3
linear systems”, and Sonneveld resigned as a coauthor [34]. In this paper, van der
Vorst started from the first formula in (1.1), now written as
ρBiCGSTAB
n(t) = Ωn(t)ρn(t).(1.3)
So, he adopted the indexing from CGS, and he also adopted from CGS the derivation
based on directly finding recursion for these residual polynomials, that is, he abstained
from using the recursions imposed by the IDR approach.
In the following years, BiCGStab was generalized to BiCGStab2 [12] and
BiCGStab(`) [24, 27], where the polynomial Ωnis built up from factors of degree 2
and `, respectively, whose coefficients are determined by a two– or an `–dimensional
residual norm minimization, respectively. This allows a better approximation of com-
plex eigenvalues and yields typically faster convergence at the price of higher com-
plexity, but only slightly higher computational cost. Nevertheless, the simple original
BiCGStab became the method of choice for most users who apply a Lanczos-type
method for solving a nonsymmetric linear system.
Due to the structure of the residual polynomials, CGS and the methods of the
BiCGStab family are often referred to as Lanczos-type product methods (LTPMs)
[13] or as hybrid BiCG methods [26].3
A new dimension came into play when, in 1997, Yeung and Chan submitted the
paper ML(k)BiCGSTAB: a BiCGSTAB variant based on multiple Lanczos starting
vectors to the renamed SIAM Journal on Scientific Computing (SISC) [39]. In this
paper they first introduced with ML(k)BiCG a version of BiCG where the left Krylov
space (generated by A
Tfrom an arbitrary shadow residual er0), which is used for the
oblique projection of r0, is replaced by a block Krylov space generated from a matrix
e
R0with kcolumns. Then they generalized the transition from BiCG to BiCG-
Stab to the new situation. This led to very complicated formulas, which were then
meticulously modified to get a simpler and efficient code. The method was shown to
converge amazingly well for a large number of fairly ill-conditioned examples, handled
mostly without preconditioning and with rather large k, 25 ≤k≤100, however, which
meant high memory consumption and considerable computational cost.4Although
the essence of the paper is well explained and easy to understand, the complexity
of the formulas and, paradoxically, the treatment of all the details must have kept
people from reading the paper and applying the method — despite the very promising
numerical results. The authors were aware of the connection to nonsymmetric block
Lanczos methods [1, 3, 5, 6, 8, 19], but while these are based on generalizing the
Lanczos three-term recursions, Yeung and Chan generalized the two-term recursions
of BiCG, as was done before by Simoncini [23]. This was partly the reason for the
complex formulas.
In [38] Wesseling and Sonneveld had announced a further publication on IDR to
be in preparation, but only in 2007 such a paper was submitted, again to SISC; see
[31] and, for the final version, [32]. In the sequel of an enquiry by Jens-Peter Zemke
[private communication] Sonneveld and van Gijzen reconsidered IDR and generalized
it to IDR(s), where the original method is included as the case s= 1 (except for a
3Note that many other authors have introduced other classes of “hybrid” iterative methods for
linear systems.
4For one example, the matrix ORSIRR1, the dependence on kwas investigated for small k, where
dramatic improvements can be noticed for 2 ≤k≤10 already; see Fig. 3(b) of [39].
4
small but interesting detail). They also clarify the relation of their method to BiCG-
Stab (when s= 1) and, in the final version, to ML(s)BiCGStab (when s > 1), which
they did not know when submitting the paper. It turns out that the even indexed
IDR(1) residuals are (up to roundoff effects) exactly the BiCGStab residuals and
that likewise every (s+ 1)th IDR(s) residual is up to the possibly different choice of
the parameters ωjaML(s)BiCGStab residual. However, the way these residuals are
constructed differs, and the “intermediate” residuals do not exist in BiCGStab and
differ in ML(s)BiCGStab, respectively. The paper also features numerical examples
that are relevant for practice and demonstrate the power of the method even for small
values of swhere the cost per step in nis small.
In a follow-up publication, Sleijpen, Sonneveld, and van Gijzen [25] introduced
partly different notation and tried to explain IDR(s) and its connection to BiCGStab
from a somewhat different viewpoint, but this author prefers the presentation in [32].
They also introduce methods similar to ML(s)BiCG and ML(s)BiCGStab.
In the recent publication [36] van Gijzen and Sonneveld introduce yet another,
very ingenious algorithm that fits into the IDR framework and leads to an elegant code.
It uses recursions that are quite different from those of the original IDR(s) of [32] and
produces “intermediate residuals” satisfying additional orthogonality conditions that
lead to shorter recurrence relations for some nand an improved memory management.
The authors do not introduce a separate name for this new algorithm, but in some
of the figures they refer to it as IDR(s) Bi-ortho. We will discuss this IDR variant in
Section 5 and use the shorter acronym IDR(s)BiO here.
2. From BiCG to ML(k)BiCGStab.In this section we review some basic facts
on Krylov space solvers putting special emphasis on the biconjugate gradient (BiCG)
method, and then look at the transition from BiCG to BiCGStab. We also have a
quick look at Yeung and Chan’s [39] generalization of these methods to ML(k)BiCG
and ML(k)BiCGStab, respectively, which feature multiple initial left (or shadow)
residuals.
2.1. Krylov space solvers based on projection. Given a nonsingular linear
system Ax =b∈CNand an initial approximation x0along with its residual r0:≡
b−Ax0, a Krylov space solver constructs recursively approximate solutions xn(often
referred to as iterates) such that
xn∈x0+Kn,
where
Kn:≡ Kn(A,r0) :≡span {r0,Ar0, . . . , An−1r0}.
is the nth Krylov subspace generated by Afrom r0.5
Two of the basic theoretical facts on this setting are: (i) There is a minimal ν
such that Kνis invariant. (ii) For the solution x?holds that x?∈x0+Kν, and νis the
minimal index for which this is true. So, if we choose xnwell, the solver terminates
in νsteps. In particular, it suffices to choose the iterates so that the corresponding
5Kn(B,y) denotes in this paper a Krylov subspace generated by Bfrom y, while Knwithout an
argument is an abbreviation for Kn(A,r0).
5
residuals rnare linearly independent unless zero. In practice, νis typically large (close
to N), and therefore this finite termination property is irrelevant.
The true aim is to find xnvery close to x?in few (or at least not very many)
steps. Because of the limited computer memory, it is important to find solvers that
allow us to compute xnwith short recursions. The restriction xn∈x0+Knimplies
that
rn∈r0+AKn⊆ Kn+1 .(2.1)
Most methods produce residuals that have a component in the “new part of the
space”, that is rn6∈ r0+AKn−1; in others there may occur exceptional situations
with rn∈r0+AKn−1, which implies that the residuals are linearly dependent at this
moment.
We will refer to spaces of the form r0+AKnor the form x0+Knas affine Krylov
subspaces.
Since the goal is a small rn, we need to approximate r0by elements from AKn.
E.g., krnkis minimum if we choose rnas the perpendicular from r0to its orthogonal
projection into AKn. This is the basis of the conjugate residual (CR) method [33] for
Hermitian systems and its various generalizations for the non-Hermitian case, such as
GCR and GMRes. For these methods we have
rn∈r0+AKn,rn⊥AKn.
Some Krylov space solvers are based on other orthogonal or oblique projections. In
particular, for the biconjugate gradient (BiCG) method [18, 4], which is of special
importance here, we have
rn∈r0+AKn,rn⊥e
Kn:≡ Kn(A
?,er0).(2.2)
Here, the initial shadow residual er0can be chosen arbitrarily; preferably, it should be
in arbitrary position with respect to an eigenbasis of A
T.
The most often used recursions for BiCG are the coupled two-term or BiOMin
recursions, which can be written as follows:
αn:= δn/δ0
n,(2.3a)
rn+1 := rn−Avnαn,(2.3b)
ern+1 := ern−A
?e
vnαn,(2.3c)
xn+1 := xn+vnαn,(2.3d)
δn+1 := hern+1,rn+1i,(2.3e)
βn:= δn+1/δn,(2.3f)
vn+1 := rn+1 +vnβn,(2.3g)
e
vn+1 := ern+1 +e
vnβn,(2.3h)
δ0
n+1 := he
vn+1,Avn+1i.(2.3i)
In addition to the residuals rnand iterates xnthree other sets of vectors are con-
structed: shadow residuals (or left-hand side Lanczos vectors) ern∈er0+A
?e
Kn, search
directions vn∈v0+A
?Kn, and shadow search directions e
vn∈e
v0+A
?e
Kn. All these
vectors are updated by coupled two-term recursions.
6
2.2. Residual polynomials and Lanczos-type product methods. Let Pn
denote the space of polynomials of degree at most n, and let P◦
n:≡ {ρ∈ Pn;ρ(0) =
1}. The inclusion (2.1) implies that one can associate rnwith a residual polynomial
ρn∈ P◦
nsuch that rn=ρn(A)r0. Roughly, krnkis small if |ρn(t)|is small at those
eigenvalues of Athat are “active” when r0is written in terms of the eigenbasis of A.
(This statement needs to be modified when Ahas no eigenbasis or an ill-conditioned
one.) This observation motivates derivations of Krylov space methods via real (when
Ais Hermitian) or complex (when Ais non-Hermitian) approximation problems. It
must also have motivated Sonneveld’s CGS method [30], where, as noted in (1.2), the
residual polynomials are the squares of the BiCG residual polynomials ρn. Clearly,
whenever |ρn(t)| ¿ 1 at an eigenvalue, |ρ2
n(t)|is even much smaller there. But often
the residual norm of BiCG oscillates wildly as a function of n, and then the one of
CGS oscillates even more. Avoiding or at least damping this was the motivation for
van der Vorst [34] for the choice (1.3) in BiCGStab. Recall that there, at step n,ωn
is chosen to minimize the residual norm on a straight line.
Let ρndenote the polynomial obtained from ρnby complex conjugation of the
coefficients. Then, in BiCG, we have rn=ρn(A)r0and ern=ρn(A
?)er0. In addition,
the search direction polynomials σn∈ Pn\Pn−1associated with the search directions
vnplay an important role: since v0=r0and e
v0=er0we have vn=σn(A)r0and
e
vn=σn(A
?)er0. Hence, associated with the recursions (2.3b), (2.3c), (2.3g), and
(2.3h) there are the underlying coupled polynomial recursions
ρn+1(t) := ρn(t)−αntσn(t), σn+1(t) := ρn+1(t) + βnσn(t).(2.4)
They are fundamental for deriving CGS and BiCGStab, and also, as we will see in
Section 4, for understanding the difference between IDR(1) and BiCGStab.
For CGS it is easy to derive from (2.4) four coupled recursions for the polynomials
ρCGS
n:≡ρ2
n,σ2
n,ρnσn, and ρnσn−1, which can then be translated into recursions
for elements of Kν. Likewise, for BiCGStab one combines (2.4) with the trivial
recursion Ωn+1(t) = (1 −ωnt)Ωn(t) to derive three recursions for the three products
ρBiCGSTAB
n:≡ρnΩn,ρnΩn−1, and σnΩn. In both cases alternative recursions exist too.6
2.3. Multiple initial shadow residuals. Yeung and Chan [39] generalized
BiCG by replacing the left Krylov subspaces e
Knby block Krylov subspaces, which
are a sum of Krylov spaces for the same matrix A
?but with several different initial
shadow residuals, stored as the columns of an N×smatrix e
R0. Yeung and Chan
called the resulting method the ML(s)BiCG method (except that they used kinstead
of s). The residuals whose index is a multiple of ssatisfy
rsj ∈r0+AKsj ,rsj ⊥ Kj(A
?,e
R0) :≡
s
X
i=1
Kj(A
?,er(i)
0).(2.5)
6In [11] four sets of equivalent recursions for CGS derived from the BiORes and BiODir recur-
sions of BiCG are given; however, they are all more complicated than the original CGS recursions,
and therefore more costly in work and storage.
7
For the others, with index n=sj +`, where 1 < ` < s, we have analogously
rn∈r0+AKn,
rn⊥ Kj;`(A
?,e
R0) :≡
`
X
i=1
Kj+1(A
?,er(i)
0) +
s
X
i=`+1
Kj(A
?,er(i)
0).
(2.6)
These residuals could also be constructed by using a variant of the nonsymmetric block
Lanczos method, where the block size of the left block Krylov space is s, while the
one of the right (block) Krylov space is just one, that is, the right space is an ordinary
Krylov space; see [1, 3, 5, 6, 8, 19] for ways to construct bases for these spaces with
short recursions. Yeung and Chan rather generalize block BiCG, described before
by O’Leary [20] and Simoncini [23], but the latter authors assumed the same block
size in the left-hand and the right-hand side Krylov spaces. Unfortunately, in theory
and practice there can (and ultimately do) occur problems that must be addressed
by block size reduction (deflation). Moreover there may occur Lanczos breakdowns
and pivot breakdowns; see, e.g., [13] for a discussion of the breakdowns of BiCG.
Deflation and breakdowns have not been addressed in [39].
Yeung and Chan [39] continued by applying to ML(s)BiCG the same transforma-
tion that turns BiCG into BiCGStab. Unfortunately, this lead to rather complicated
formulas, which they were able to simplify and economize somewhat by algebraic ma-
nipulations. The resulting algorithm, called ML(s)BiCGStab, was shown to be very
effective for a large number of rather ill-conditioned test matrices.
Under the titles Bi-CG and Bi-CGSTAB Sleijpen, Sonneveld, and van Gijzen [25]
sketched two methods that are in spirit the same as ML(s)BiCG and ML(s)BiCG-
Stab, but in detail differ considerably. Firstly, they are not using the equivalent of
Lanczos’ coupled two-term recursions; secondly, their “intermediate” residuals satisfy
only a block biorthogonality not the stricter requirement of (2.6) that determines the
residuals of ML(s)BiCG uniquely.
3. IDR basics. In this section we review the basic facts on the IDR(s) method,
following essentially the presentation in [32]. One aspect that we stress more explicitly
than Sonneveld and van Gijzen is that IDR(s) is a Krylov space method, and therefore
the residuals lie in an affine space that is embedded in a Krylov subspace. We also
try to give more realistic figures, although we will see that they still do not reflect the
whole truth.
3.1. The IDR Theorem. The IDR approach is based on a finite series of nested
linear subspaces Gjof diminishing dimension with the property that for some increas-
ing index sequence {nj}the residuals rnwith n≥njall lie in Gj. Of course, all
residuals lie in the invariant Krylov space Kν:≡ Kν(A,r0); therefore, we can start
with n0:≡0 and G0:≡ Kν. The other spaces Gjare defined by the recursion
Gj:≡(I−ωjA)(Gj−1∩ S) , .(3.1)
Here, Sis a prescribed linear subspace of codimension s¿N, and the constants
ωj6= 0 will be suitably chosen to boost convergence. Let us denote the dimension
of Gjby dj. Clearly, Gj−1∩ S can be represented by N−dj−1+slinear equations,
and it is likely that these are linearly independent. However, as pointed out in [32],
8
G0=R3
G0∩ S =S
I−ω1A
G1
G1∩ S
G2∩ S ={0}
G2
I−ω2A
Fig. 3.1.Case s= 1: The spaces R3=K3=G0)G1(G2.
we cannot conclude easily that linear independence is here a generic property (valid
for almost all problems if data are chosen randomly) since Gj−1actually depends on
S. But, typically, Gj−1∩ S and its image Gjhave dimension dj=dj−1−s. We will
mostly take it for granted that this and other regularity assumptions are satisfied,
and we will refer to this then as the regular case.7One can see, however, by analogy
to the behavior of the related ML(s)BiCG and ML(s)BiCGStab methods, that we
cannot expect that dj=dj−1−sremains true for jup to N/s if s > 1.
The IDR Theorem given next says that the spaces Gjare nested and, under mild
assumptions on S, the inclusion is strict: Gj(Gj−1, two properties that are not
apparent. For an illustration see Fig. 3.1.
Theorem 1 (IDR Theorem [38, 32]). Assume that S ∩G0contains no eigenvector
of A. Then
Gj(Gj−1unless Gj−1={o}.
For the proof see [38, 32]. As a consequence of the strict inclusions, Gj={o}for some
j≤N, say, j≡:J. However, the bound J≤Nthat follows from the IDR Theorem
leads to a strong overestimation of the finite termination index, for the simple reason
that termination is characterized by dJ= 0, which we can expect for Jof the size
N/s.
Sonneveld and van Gijzen [32] also provide the Extended IDR Theorem, whose
main result is that the difference dj−dj+1 is monotonically non-increasing:
0≤dj−dj+1 ≤dj+1 −dj≤s .
Alternatively, this result could also be concluded from the connection to ML(s)BiCG.
Of course, neither G0nor the other spaces Gjare known in advance, in the sense
that we know a basis for them. In theory, the IDR algorithm would provide these
7The authors of [32] refer to it as the generic case, although this may not be fully consistent
with the common usage of the word generic.
9
G0=R3
r2= (I−ω1A)v1
r0
r1
v1
G0∩ S =S
I−ω1A
v2
r3
G1
G1∩ S
Fig. 3.2.Case s= 1: The first two steps: construction of r2and r3(for details see §4.1).
bases if we continued it until the exact solution of Ax =bis found, that is, rν=ois
attained; but this is not feasible unless νis very small, so that the linear system can
be solved exactly in a few steps.
IDR constructs typically only s+ 1 residuals in Gjbefore turning to the next
subspace Gj+1 (Gj. To accommodate exceptional situations we introduce a mono-
tonically growing index sequence {nj}defined implicitly by8
rn∈ Gj∩(r0+AKn), n ≥nj. (3.2)
In the normal case, nj= (s+ 1)j, so nj+1 −nj=s+ 1 for reasons we will see in a
moment, but nj+1 −nj> s + 1 may occur in exceptional situations. In the simplest
case s= 1, which is when IDR is closely related to BiCGStab, two new residuals are
computed for each j. This is depicted in Figure 3.2. The details of the construction
are discussed next.
3.2. Recursions. The recursion for the residuals builds upon the recursion (3.1)
for the spaces Gj:
rn+1 := (I−ωjA)vn,vn∈ Gj−1∩ S ∩ (r0+AKn),(3.3)
We suppose here that, for all n,vnlies in the affine Krylov subspace spanned by
r1, . . . , rnand shifted by r0. This means that vnand thus also rn+1 have “maximum
degree” in the sense that vn6∈ r0+AKn−1and vn6∈ r0+AKn. There may be
situations where the latter assumptions do not hold, and, according to [32], there are
in the IDR framework ways to recover from such situations, but we will not treat
that here. There is some vague analogy to look-ahead Lanczos [7] or look-ahead block
Lanczos [1] in such recovery procedures.
8It is conceivable that the inclusion in (3.2) holds by chance for some n<njtoo, but for n≥nj
the inclusion will be forced by construction.
10
To construct vectors or “points” rn+1 ∈ Gjwe need vectors vn∈ Gj−1∩ S,
and the first time we construct a point in Gjwe can choose ωj. To construct vnin
Gj−1∩ S we need to intersect an s-dimensional affine subspace of Gj−1(in theory it
could be represented by N−sinhomogeneous linear equations) with the subspace S
represented by shomogeneous linear equations that we may write as P?vn=owith
an N×smatrix Pwhose columns form a basis of S⊥. In theory we would end up
with Ninhomogeneous linear equation for vnthat will usually have a unique solution,
but this is of course not feasible in practice. Instead we represent the s-dimensional
affine subspace of Gj−1directly by an affine combination (a linear combination whose
coefficients sum up to one) of s+ 1 points in Gj−1. The natural choice for these
s+ 1 points are the last computed residuals rn−s,...,rn. Here we see why we need
nj+1 −nj≥s+ 1. A neat way to take the condition of an affine combination
into account is to introduce the differences of the residual vectors, and that is what
Sonneveld and van Gijzen do:
vn:= rn−
ι(n)
X
i=1
γ(n)
i∆rn−i=rn−∆Rncn,(3.4)
where
s≤ι(n)≤n−nj−1,
∆rn:≡rn+1 −rn,
∆Rn:≡£∆rn−1. . . ∆rn−ι(n)¤,
cn:≡hγ(n)
1. . . γ(n)
ι(n)iT
.
The restriction ι(n)≤n−nj−1ensures that ∆rn−i∈ Gj−1. Usually, ι(n) = s,
but, again, there may be exceptional situations not covered here, where one needs to
choose a bigger ι(n). Note that using the differences of the residuals leads to a vn
whose polynomial representation automatically inherits from rnthe value 1 at zero.
Therefore, indeed vn∈r0+AKn, and we may view vnas a residual, so there is
x0
n∈x0+Knsuch that vn=b−Ax0
n.
To enforce vn∈ S we enforce9vn⊥ S⊥=R(P), that is, P?vn=o. This means
that the term ∆Rncnin (3.4) must be the oblique projection of rninto R(∆Rn)
along S. In order that this projection is uniquely defined, we need P?∆Rnto be
nonsingular, in particular ι(n) = sto make the matrix square. Then,
cn:≡(P?∆Rn)−1P?rn,vn:= rn−∆Rncn.(3.5)
Otherwise, when ι(n)> s, we might choose cnas the minimum norm solution of an
underdetermined least squares problem.
For the initial phase, that is, for constructing r1, . . . , rs, we may apply a fairly
arbitrary starting procedure, e.g., GMRes.
We need not just one point rn+1 ∈ Gj, but we need at least s+ 1 of them before
we can continue to the next space Gj+1. At first one might expect to need 2s+ 1
points in Gj−1to repeat the above construction s+ 1 times. However, this is not the
9R(P) denotes the range of P.
11
case. Because Gj⊂ Gj−1, the just constructed rn+1 ∈ Gjalso qualifies as a point of
Gj−1and can be used when we replace nby n+ 1 in the above construction.10 So,
s+ 1 points in Gj−1\Gjwill usually be enough. However, we cannot fully exclude
degenerate situations, where the last s+ 1 points constructed in Gj−1do not span
an s-dimensional affine space and therefore ∆Rnis singular (or nearly so). A careful
implementation of the method will need to address such situations, which are also
reflected by a zero (or absolutely small) coefficient in the tnterm of the polynomial
representation of some of the vectors vn.
For the case s= 1, the first two steps, from given r0and r1to r2and r3, are
shown in Figure 3.2. Then the construction is continued till v5=r6=oin Figure 3.3.
However, our figures show actually one of the “exceptional situations” we just referred
to: the constructed residuals are not all linearly independent. In fact, since the figures
show a construction in R3(i.e., N= 3), linearly independent residuals would mean
convergence in at most 3 steps, that is, r3=o.
We could avoid using vnby inserting (3.4) in (3.3):
rn+1 := rn−∆Rncn−ωjA(rn−∆Rncn) (3.6)
This formula manifests that IDR(s) differs considerably from most commonly used
Krylov space solvers such as CG,BiCG, or GCR. The difference is that in (3.6) not
only rnis multiplied by Abut also rn−s, . . . , rn−1. This means that in the terminology
of [10] IDR(s) is a (s+ 1, s + 1)-step method, while, e.g., CG,BiCG are (2,1)-step
methods, and OrthoMin(k) is a (k, 1)-step method, while the untruncated GCR is
a (∞,1)-step method.
As mentioned before, ωjcan only be chosen when we construct the first point in
Gj, that is, rn+1 with n+ 1 = nj. The formula rn+1 = (I−ωjA)vnsuggest that we
10The IDR(s) variant of Section 5 will differ in the choice of points used in (3.4).
G0=R3
r2= (I−ω1A)v1
r0
r1
v1
G0∩ S =S
I−ω1A
v2
r3
G1
v3
G1∩ S
G2∩ S ={0}={v5}
G2
I−ω2A
v4
r4
r5
Fig. 3.3.Case s= 1: Construction of r4and r5. Termination with v5=o(for details see §4.1).
12
choose ωjso that krn+1kis minimal among all rof the form r= (I−ωjA)vn, that
is, we choose it such that rn+1 ⊥Avn:
ωj:≡hAvn,vni
kAvnk2.(3.7)
Note that this value of ωjmay turn out to be zero or close to zero. As in BiCGStab
this is a source of breakdown or instability, but it is easily cured by choosing another
value that does not minimize the residual locally.
Finally we need to address the fact that it does not suffice to construct residuals
rnand vn, but that we also need the corresponding approximate solution xnand
x0
n∈x0+Kn. It is readily verified that
vn:= rn−∆Rncn⇐⇒ x0
n:= xn−∆Xncn,(3.8)
rn+1 := (I−ωjA)vn⇐⇒ xn+1 := ωjvn+x0
n.(3.9)
There are several ways to rearrange these four recursions and to combine them with
the iterate-residual relationships; see [25]. Also in the “prototype algorithm” of [32]
a different, but equivalent set of recursions is used. It includes the analog of (3.6) for
xn+1,
xn+1 := xn−∆Xncn+ωj(rn−∆Rncn) (3.10)
and the relation ∆rn=−A∆xn.
Note that here, as in any competitive set of recursions, the major cost of comput-
ing xn∈x0+Knconsists in n+ 1 matrix-vector products (MVs) with A. Regarding
memory, one needs just to store scolumns of each P, ∆Xn, and ∆Rn, plus a few
single N-vectors.
3.3. Characterization by orthogonality. Clearly, the dimension of Gjgets
reduced due to taking the intersection with the (N−s)–dimensional space S. This
dimension reduction is viewed as the basic force behind IDR and gave the method its
name. However, dimension reduction in Krylov space solvers is not at all a unique
feature of IDR. In fact, projection based methods can be understood in a similar way.
For example, the characterization (2.2) of the BiCG residuals could be written as
rn∈ L⊥
n∩(r0+AKn),
where Ln=e
Kn=Kn(A
?,er0), and for CR,GCR, and GMRes the same is true with
Ln=AKn. What is different in IDR is that Gjis not an orthogonal complement of
a Krylov subspace. However, due to the form of the recursion for {Gj},Gjturns out
to be the image of an orthogonal complement of a Krylov subspace. This result is
implicit in Subsection 5.1 of [32] and has been explicitly formulated in [25]:
Gj=nΩj(A)w¯¯w⊥ Kj(A
?,P)o= Ωj(A) [Kj(A
?,P)]⊥.(3.11)
Here, as before, Ωj(t) :≡(1 −ω1t)· · · (1 −ωjt)∈ P◦
j, and Kj(A
?,P) is the jth block
Krylov subspace generated by A
?from the scolumns of P, which are assumed to be
a basis of S⊥. Note that when we choose e
R0=Pthis block Krylov subspace is the
same as in ML(k)BiCG, see (2.5).
13
Note also that the larger s, the larger is Kj(A
?,P), and thus the smaller is Gj.
To prove (3.11) let us repeat here the argument from [32] that is linked to the
recursions that we have just discussed and provides further insight. (The induction
proof of (3.11) in [25] is different.) We start from (3.3) and the observation that
vn∈ Gj−1must likewise be of the form
vn= (I−ωj−1A)v0
n,v0
n∈ Gj−2∩ S ∩ (r0+AKn−1),
v0
n= (I−ωj−2A)v00
n,v00
n∈ Gj−3∩ S ∩ (r0+AKn−2),
.
.
..
.
.
v(j−2)
n= (I−ω1A)wn+1 ,wn+1 ∈ G0∩ S ∩ (r0+AKn−j+1).
Starting at the bottom, we can also write (with Ω0≡1):
wn+1 = Ω0(A)wn+1 ∈ G0∩ S ,
v(j−2)
n= Ω1(A)wn+1 ∈ G1∩ S ,
.
.
.
v0
n= Ωj−2(A)wn+1 ∈ Gj−2∩ S ,
vn= Ωj−1(A)wn+1 ∈ Gj−1∩ S ,
rn+1 = Ωj(A)wn+1 ∈ Gj.
Since {Ωk}j−1
k=0 is a basis of Pj−1we see that
Ω(A)wn+1 ∈ S (∀Ω∈ Pj−1),
that is, P?Ω(A)wn+1 =o∈Csor, in other words, wn+1 ⊥Ω(A
?)P,∀Ω∈ Pj−1, or,
wn+1 ⊥ Kj(A
?,P). In summary, we conclude that any rn+1 ∈ Gjis of the form
rn+1 = Ωj(A)wn+1 ,wn+1 ∈ G0∩ S ∩ (r0+AKn−j+1),wn+1 ⊥ Kj(A
?,P).
(3.12)
This proves (3.11). For simplicity, we may replace n+1 by nhere and, for our records,
write any rn∈ Gj(n=nj, . . . , nj+1 −1) as
rn= Ωj(A)wn,wn∈ G0∩ S ∩ (r0+AKn−j),wn⊥ Kj(A
?,P).(3.13)
Note that for n=nj−1 and n=nj, the polynomials associated with wnhave
the same degree: wn∈r0+AKnj−j. (That is why we chose nas index for wn,
although this is not the degree of the associated polynomial.)
In the generic case, for fixed j, we will construct nj+1 −nj=s+ 1 linearly
independent vectors wnthat provide s+ 1 linearly independent vectors rn(with
nj≤n<nj+1). So, as long as we stay in the generic case, nj=j(s+ 1).
Moreover, generically, for n=nj=j(s+ 1) where wn∈r0+AKjs and wn⊥
Kj(A
?,P) with dim Kj(A
?,P) = js =dim AKjs, there is a unique wnsatisfying
(3.13), since it can be characterized as the solution of a linear system with a js ×js
matrix that can be assumed to be nonsingular in the generic case:
Theorem 2 ([32]). Assume nj=j(s+1),j= 1,2, . . . , J, and assume the iterates
xnand residuals rnof IDR(s)are for n≤nJuniquely constructible by the recursions
14
(3.3) and (3.4) with the choice ι(n) = sand the coefficients cnfrom (3.5). Then, for
j≤J, the residuals wnjand rnjare uniquely characterized by the conditions (3.13).
Corollary 3. Under the assumptions of Theorem 2, and if the same parameters
ωj,(1 ≤j≤J)have been chosen, the IDR(s)iterates xnjand residuals rnjare identi-
cal with the iterates xjand residuals rjof BiCGStab (if s= 1) or ML(s)BiCGStab
(if s= 1), respectively.
But the sother vectors wn(with nj< n < nj+1), and thus also the corresponding
residuals rnare not uniquely determined by (3.13). We cannot expect that they
appear in BiCGStab or ML(s)BiCGStab, and, in fact, they usually do not.
4. The case s= 1 and the comparison with BiCGStab.If s= 1, the
subspace Sis a hyperplane determined by a single vector p⊥ S. So the matrix P
consists of the single column pnow. By Corollary 3, when s= 1, every other set of
vectors {wn,rn,vn−1,xn, . . . }(with neven) is uniquely determined up to the choice
of the parameters ωj. If the latter are chosen as in BiCGStab (and they usually are),
and if er0:= pin BiCGStab, then
r2j=r
BiCGSTAB
j,x2j=x
BiCGSTAB
j,w2j=r
BICG
j.(4.1)
So there remains the question whether and how BiCGStab and IDR(1) differ.
In order to answer this question we will look at the polynomial recursions that mirror
the recursions for the Krylov space vectors generated by the two methods.
4.1. Recursions and orthogonality properties of IDR(1).When s= 1 the
recursions (3.8) and (3.9) of IDR(s) simplify to
vn:= rn−γn(rn−rn−1),x0
n:= xn−γn(xn−xn−1),
rn+1 := (I−ωjA)vn,xn+1 := x0
n+ωjvn,(4.2)
where n≥1, j=b(n+ 1)/2c. The first line can be written
vn:= (1 −γn)rn+γnrn−1,x0
n:= (1 −γn)xn+γnxn−1,
to manifest that the point represented by vnlies on the straight line through rnand
rn−1, and likewise, x0
nlies on the line through xnand xn−1. By (3.5), γn:≡γ(n)
1=
hp,rni/hp,∆rn−1iis chosen such that vn∈ S, that is, vn⊥p. This is illustrated
in the Figures 3.2 and 3.3. The parameter ωjis usually chosen to make r2jas short
as possible; this means that r2jis orthogonal to Av2j−1; see (3.7). (This property is
not taken into account in the figures.)
From (4.1) and (3.13) we know that
w2j=rBICG
j=ρj(A)r0⊥e
Kj,(4.3)
where ρjis still the jth Lanczos polynomial, and where now e
Kj:≡ Kj(A
?,p). Ac-
cording to (3.13) w2j+1 is represented by a polynomial bρj+1 ∈ P◦
j+1 and
w2j+1 =bρj+1(A)r0⊥e
Kj.(4.4)
15
So, since rn= (I−ωjA)vn−1= Ωj(A)wn, we have
rn= Ωj(A)wn=(Ωj(A)ρj(A)r0if n= 2j ,
Ωj(A)bρj+1(A)r0if n= 2j+ 1 ,
vn= Ωj−1(A)wn+1 =(Ωj−1(A)ρj(A)r0if n= 2j−1,
Ωj−1(A)bρj+1(A)r0if n= 2j .
(4.5)
Inserting these formulas into vn= (1 −γn)rn+γnrn−1we get, after a short
calculation, for n= 2j+ 1 and n= 2j, respectively,
ρj+1(t) := (1 −γ2j+1)bρj+1(t) + γ2j+1 ρj(t) (j= 0,1,2, . . . ),
bρj+1(t) := (1 −γ2j) (1 −ωjt)ρj(t) + γ2jbρj(t) (j= 1,2, . . . ).(4.6)
4.2. Comparison with the recursions and orthogonality properties of
BiCGStab.The Lanczos (residual) polynomials ρjand the BiCG search direction
polynomials σjare formal orthogonal polynomials (FOPs) in the sense that, for i6=j,
ρi⊥ρj⇐⇒ hρi(A
?)er0, ρj(A)r0i= 0 ⇐⇒ Der
BICG
i,r
BICG
jE= 0 ,
σi⊥tσj⇐⇒ hσi(A
?)er0,Aσj(A)r0i= 0 ⇐⇒ De
v
BICG
i,Av
BICG
jE= 0 ,
where vBICG
jand e
vBICG
iare the search directions and the “shadow” search directions,
respectively, which appeared in the recursions (2.3a)–(2.3i). Since {ρ0, . . . , ρj−1}and
{σ0, . . . , σj−1}both span Pj−1, we actually have
ρj⊥ Pj−1, σj⊥tPj−1,⇐⇒ r
BICG
j⊥e
Kj,v
BICG
j⊥Ae
Kj.
Here, h., .iAdenotes a formal A–inner product. In summary, the basic BiCG recur-
sions (2.3b), (2.3c), (2.3g), and (2.3h) upon which BiCGStab builds too, are mirrored
by the following recursions for ρjand σj:
ρj+1(t)
| {z }
⊥Pj
:= ρj(t)
|{z}
⊥Pj−1
−αjtσj(t)
|{z}
⊥Pj−1
, σj+1(t)
| {z }
⊥tPj
:= ρj+1(t)
| {z }
⊥Pj
+βjσj(t)
|{z}
⊥tPj−1
.(4.7)
Here, both αjand βjare chosen so that the new polynomials ρj+1 and σj+1 feature
the indicated orthogonality properties, by which they are uniquely determined up to
a scalar multiple.
In contrast, in IDR(1), by (4.6), (4.3), and (4.4),
bρj+1(t)
| {z }
⊥Pj−1
:= (1 −γ2j) (1 −ωjt)ρj(t)
| {z }
⊥Pj−2
+γ2jbρj(t)
|{z}
⊥Pj−2
,
ρj+1(t)
| {z }
⊥Pj
:= (1 −γ2j+1)bρj+1(t)
| {z }
⊥Pj−1
+γ2j+1 ρj(t)
|{z}
⊥Pj−1
.
16
b
ρj+1
ρj
tσj
−αjt σj
ρj+1
−α0
jt σj
Fig. 4.1.Case s= 1: The connection between BiCGStab and IDR(1).
Comparing these recursions for (ρj,bρj) with (4.7) we easily see that
(1 −γ2j+1) (bρj+1(t)−ρj(t)) = −αjt σj(t).
So,
bρj+1(t) = ρj(t)−αj
1−γ2j+1
t σj(t),(4.8)
or, after multiplication by Ωj(t) and translation into the Krylov space,
r2j+1 =r2j−αj
(1 −γ2j+1)As
BiCGSTAB
j,where s
BiCGSTAB
j:≡Ωj(A)v
BICG
j.(4.9)
This formula expresses the odd indexed IDR(1) residuals r2j+1 in terms of quantities
from BiCGStab and the IDR coefficient γ2j+1. We illustrate the connection in Fig-
ure 4.1. While BiCGStab implicitly constructs ρj+1 by enforcing a biorthogonality
condition on a polynomial that lies on the line determined by ρjand t σj, IDR(1) first
generates by the second recursion in (4.6) the polynomial bρj+1 that lies on that line
and then also enforces this condition.
Let us finally note that the parameter ωj, which is in (3.7) chosen to make r2j
orthogonal to Av2j−1, is indeed the same in IDR(1) and BiCGStab, since v2j−1is
the same in both methods.
4.3. How does the original IDR differ from IDR(1)?. In contrast to IDR(1)
of [32], where (4.2) holds for all n > 1, the original IDR of [38] used for nodd the
recursions
vn:= rn−γ0
n(rn−1−rn−2),x0
n:= xn−γ0
n(xn−1−xn−2),
rn+1 := (I−ωjA)vn,xn+1 := x0
n+ωjvn,(4.10)
with γ0
n:≡ hp,rni/hp,∆rn−2i. So, here, when computing the “intermediate iterate”
x0
none modifies xnby a step in the same direction as has been used in the previous
step for modifying xn−1.
Moreover, in discrepancy of what we have stated here, the new IDR(s) of [32]
computes the residual differences actually as ∆rn=−A∆xn. This couples the re-
cursions for xnand rnmore tightly and thus reduces the gap between the recursively
17
computed residual and the true residual. This gap is known to be closely linked to the
attainable accuracy that can be achieved with a Krylov space solver; see [9, 15, 27].
The IDR Theorem still applies, and still x2j=xBiCGSTAB
j. This follows from the
fact that the arguments of Subsection 3.3 are still applicable.
5. IDR(s)with locally biorthogonal residuals. Recently, van Gijzen and
Sonneveld [36] came up with a new version of IRD(s), in which, in the regular case
assumed throughout this section, each of sconsecutive “intermediate” residuals rnj+k
is orthogonal to a growing subset of the sprescribed columns pkof P:
rnj+k⊥ {p1, . . . , pk}, k = 1, . . . , s . (5.1)
For distinction we will call this algorithm IDR(s)BiO here.11
IDR(s)BiO still fits into the IDR framework in the sense that the IDR Theo-
rem 1 and the orthogonality result of Theorem 2 as well as its Corollary 3 still apply.
One important aspect where it differs from the original IDR(s) is in the ansatz for
recursively constructing rnj+kfrom previous residuals: while the original version uses
in the formula (3.4) for vnthe latest sresidual differences (i.e., the choice ι(n) = s)
for all n, in IDR(s)BiO that sum involves the sresidual differences
∆rnj−1≡∆rnj−s−1∈ Gj−1∩AKnj−1+1 ,
.
.
.
∆rnj−1+s−1≡∆rnj−2∈ Gj−1∩AKnj−1,
none of which relates to a residual that lies in Gjalready. So, in the case s= 1, there
is an analogy to the original IDR of Sonneveld [38]; see (4.10). Additionally, these
residual differences are actually replaced by another set of svectors
gnj−1≡gnj−s−1∈ Gj−1∩AKnj−1+1 ,
.
.
.
gnj−1+s−1≡gnj−2∈ Gj−1∩AKnj−1,
that are multiples of the residual differences and thus also orthogonal to a growing
subset of the sprescribed columns pk:
gnj−1+k⊥ {p1, . . . , pk}, k = 0, . . . , s −1.(5.2)
However, these residual differences are not defined as before, but undergo a linear
transformation to impose (5.2). Note that in (5.2) the range of the index kis shifted
by 1; so for k= 0 the orthogonality condition is empty.
To construct preliminary vectors in Gj, we now define, for n=nj+k(k=
0, . . . , s), vectors vn∈ S by the ansatz
vn:= rn−
s
X
i=1
γ(n)
ignj−1+i−1=rn−Gj−1cn,(5.3)
11Actually, the sets {rnj+k}and {pi}are not biorthogonal, but by a triangular linear transfor-
mation we could replace the basis {pi}of S⊥by {p0
i}so that {rnj+k}and {p0
i}are biorthogonal.
However, the transformation would depend on j.
18
where
Gj−1:≡£gnj−1. . . gnj−1¤,cn:≡hγ(n)
1. . . γ(n)
si.
cnis determined by the condition vn⊥ R(P). So we have, as in (3.5),
cn:= (P?Gj−1)−1P?rn,vn:= rn−Gj−1cn.(5.4)
Here, the projection along Sis on R(Gj−1), that is, on a space that only depends
on j−1 and therefore is the same for s+ 1 values of n. Consequently, the matrix
P?Gj−1in the s+ 1 linear systems for cnis also the same. (However, the systems
cannot be solved at once, because the vector rnin the right-hand side P?rnresults
from the previous system.)
The elegance of IDR(s)BiO comes from special features that result from the
imposed orthogonality conditions (5.1) and (5.2). Due to (5.2) the matrix
Mj−1≡ {µ(j−1)
i,k0}s
i,k0=1 ≡£mnj−1. . . mnj−1+s−1¤:≡P?Gj−1(5.5)
is lower triangular, and due (5.1) the matrix with the sright-hand sides P?rn(n=
nj, . . . , nj+s−1),
Fj≡ {φ(j)
i,k0}s
i,k=1 ≡£fnj. . . fnj+s−1¤:≡P?£rnj. . . rnj+s−1¤(5.6)
is lower triangular too. Consequently, the matrix with the ssolutions cnof Mjcn=fn
for n=nj, . . . , nj+s−1,
Cj≡ {γ(nj+k0−1)
i}s
i,k0=1 ≡£cnj. . . cnj+s−1¤:≡M−1
jFj(5.7)
is also lower triangular. So its k0th column cnj+k0−1only depends on the (s−k0+1)th
trailing principal submatrix (of order s−k0+ 1) of Mj−1, whereas its first k0−1
entries are zero. In other words, the possibly nonzero entries of cnj+k0−1result from
a (s−k0)×(s−k0) linear system. This means in particular that the recursion (5.3)
becomes shorter while kincreases: for12 n=nj+k,k= 0, . . . , s −1,
vn:= rn−
s
X
i=k+1
γ(n)
ignj−1+i−1=rn−Gj−1cn.(5.8)
This not only reduces the computational cost, but it allows us to overwrite Gj−1by
Gjand Mjby Mj+1 inside the loop over k; for details see the pseudocode in [36].
We still need to explain how we find s+ 1 residuals rnj+k∈ Gj(k= 0, . . . , s)
so that (5.1) holds for the last sof them, and how we construct a new set of s
vectors gnj+k∈ Gj(k= 0, . . . , s −1) satisfying (5.2). We may use the orthogonality
conditions (5.1) with jreplaced by j−1 and (5.2) as induction assumption. For the
initialization (j= 0) such sets can be constructed by a one-sided Lanczos process that
combines the generation of a basis for Ks+1 with orthogonalization with respect to
the columns of P. Of course, at the same time, approximate solutions x1, . . . xsneed
to be constructed too, but they are obtained using essentially the same recursions.
12Note that in the presentation of this and other formulas in [36] the notation vn+kwith n=
nj+1 −1 (k= 1,...,s) is used, while here n=nj+k(k= 0,...,s−1) and k0=k+ 1.
19
Among the s+ 1 residuals in Gj−1satisfying the orthogonality condition, the last
one, rnj−1is orthogonal to all columns of P, whence rnj−1∈ S. So, in accordance
with (5.8) for k=s, where the sum is empty, we can choose vnj−1=rnj−1and thus
rnj:= (I−ωjA)rnj−1.(5.9)
Next, for n=nj+k > nj, any vnobtained from (5.8) lies in Gj−1∩ S ∩ (r0+AKn).
So, by the recursive definition of Gj,
ern+1 := (I−ωjA)vn=rn−Gj−1cn−ωjAvn
is a tentative residual in Gj∩(r0+AKn+1). Since rn∈ Gj∩(r0+AKn),
e
gn:≡rn−ern+1 =Gj−1cn+ωjAvn∈ Gj∩AKn+1 (5.10)
too, but in order to serve as a column of Gjit needs to be replaced by gnsatisfying
the orthogonality condition
gn⊥ {p1, . . . , pk}, n =nj+k , k = 0, . . . , s −1.(5.11)
This can be achieved by applying a Gram-Schmidt-like process: recursively, the pro-
jection of e
gn≡e
gnj+kon the span of gnj,. . . ,gn−1along the span of p1,. . . ,
pkis subtracted from e
gnto yield gn≡gnj+k∈ Gj∩AKn+1. This can be ex-
pressed as follows: for k= 0 the condition (5.11) is empty, so gnj:= e
gnj; then, for
n=nj+k, k = 0, . . . , s −1,
gn:= e
gn−
k
X
i=1
α(j)
i,k gnj+i−1=e
gn−£gnj... gn−1¤a(j)
k,(5.12)
where
a(j)
k:≡³£p1. . . pk¤?£gnj. . . gn−1¤´−1£p1. . . pk¤?e
gn∈Ck.(5.13)
Here, the k×kmatrix £p1. . . pk¤?£gnj. . . gn−1¤is the kth leading principal
submatrix of P?Gj=Mj, and thus it is lower triangular. Therefore, a(j)
kcan be found
by forward substitution. Of course, the diagonal elements µ(j)
i,i =p?
ignj−1+ineed to
be nonzero. Otherwise the process breaks down and the orthogonality condition (5.11)
cannot be satisfied for some n. For the whole block this step is summarized by
e
Gj:≡£e
gnj. . . e
gnj+s−1¤=GjAO
j(5.14)
with AO
junit upper triangular and P?Gj=Mjlower triangular. Above the diagonal
AO
jcontains the s−1 coefficients vectors a(j)
1(in column 2) to a(j)
s−1(in the last
column). Hence, MjAO
j= (P?Gj)AO
jis an LU decomposition of f
Mj:≡P?e
Gj.
Amazingly, when we replace here this classical Gram-Schmidt-like process by a
modified Gram-Schmidt-like process as it was suggested in [36], there is no need to
solve triangular linear systems.
20
In matrix notation the first sweep of the modified Gram-Schmidt-like process can
be summarized as
G(1)
j:≡e
GjB(1)
j,
where the upper triangular matrix B(1)
jis given by
B(1)
j:≡
1−β(j)
12 −β(j)
13 . . . −β(j)
1s
1 0 . . . 0
....
.
.
1 0
1
, β(j)
1,k+1 :≡p1,e
gnj+k®
p1,e
gnj®
(with k= 1, . . . , s −1), and has the effect that, by subtracting a multiple of the first
column e
gnjof e
Gjthe columns 2 to sof e
Gjare transformed into columns of G(1)
j
that are orthogonal to p1. Then, for `= 2, . . . , s −1, in further analogous sweeps, by
subtracting a multiple of the `th column g(`−1)
nj+`−1of G(`−1)
jthe columns `+ 1 to sof
G(`−1)
jare transformed into columns of G(`)
jthat are additionally orthogonal to p`:
G(`)
j:≡G(`−1)
jB(`)
j, ` = 2, . . . , s −1,
where B(`)
jis now for `= 2, . . . , s −1 given by
B(`)
j:≡
1 0 . . . . . . . . . . . . 0
...
1−β(j)
`,`+1 . . . . . . −β(j)
`,s
1 0 . . . 0
....
.
.
1 0
1
, β(j)
`,k+1 :≡Dp`,g(`−1)
nj+kE
Dp`,g(`−1)
nj+`−1E
(with k=`, . . . , s −1). Ultimately, the columns of
Gj:≡G(s)
j=e
GjB(1)
jB(2)
j· · · B(s−1)
j(5.15)
satisfy the orthogonality condition (5.11) and thus are identical to those obtained by
(5.12). Moreover, the comparison with (5.14) reveals that
AO
j=³B(1)
jB(2)
j· · · B(s−1)
j´−1=b
B(s−1)
jb
B(s−2)
j· · · b
B(1)
j,(5.16)
where b
B(`)
j:≡(B(`)
j)−1is obtained by replacing in B(`)
jthe coefficients −β(j)
`,k+1 by
+β(j)
`,k+1. In view of the special structure of the matrices b
B(`)
jone can conclude from
(5.16) that β(j)
`,k+1 =α(j)
`,k. Of course, the matrices e
Gj,G(1)
j, . . . , G(1)
j,Gjcan all be
stored in the same place.
21
Finally, since by induction (within a block) both rnand gnare orthogonal to
p1, . . . , pk, and moreover, rn∈r0+AKnand gn∈AKn+1, we get rn+1 ∈r0+AKn+1
satisfying (5.1) according to
rn+1 := rn−φ(j)
k+1,k+1
µ(j)
k+1,k+1
gn(n=nj+k;k= 0, . . . , s −1) ,(5.17)
where φ(j)
k,k and µ(j)
k,k are diagonal elements of Fjand Mj, which are enforcing that
rn+1 ⊥pk+1, as can be checked easily.
As has been noted in [36] and can be seen by premultiplying (5.17) with P?the
columns of Fjcan be updated in an elegant way too:
fn+1 := fn−φ(j)
k+1,k+1
µ(j)
k+1,k+1
mn+1 (n=nj+k, k = 0, . . . , s −1) .(5.18)
Sofar, we have concentrated on the possibility of constructing efficiently residuals
satisfying the orthogonality properties (5.1), but we still need to give formulas for
the recursive computation of the approximate solutions xn, and due to the relation
∆rn=−A∆xn, these formulas will lead to other options for updating the residuals.
In general, update formulas for xnare fairly easily obtained from those for rn,
and here this is true also. Let us define
e
un:≡A−1e
gn,
un:≡A−1gn,
Uj−1:≡£unj−1. . . unj−1¤=A−1Gj−1.
Then, from (5.10) we get
e
un:= Uj−1cn+ωjvn∈ Kn+1 ,(5.19)
which allows us to replace (5.10) by
e
gn:= Ae
un.(5.20)
This helps to couple the updates of xnand rnand thus to avoid a fast growth of the
residual gap mentioned before. Moreover, (5.12) translates into
un:= e
un−
k
X
i=1
α(j)
i,k unj+i−1(n=nj+k, k = 1, . . . , s −1) .(5.21)
Finally, from (5.17) we get
xn+1 := xn+φ(j)
k+1,k+1
µ(j)
k+1,k+1
un(n=nj+k, k = 0, . . . , s −1) .(5.22)
These are the formulas the IDR pseudocode in [36] is based on. But what makes it
so ingenious is the fact that it minimizes memory usage by systematically overwriting
data that is no longer used. On the other hand, this makes the code harder to
understand.
22
6. Comments and conclusions. The various IDR algorithms may still be not
as well understood as other algorithms that are directly derived from the Lanczos
process (be it symmetric or non-symmetric), but their close relationship to Lanczos
based algorithms certainly helps us to understand them.
6.1. The case s= 1.This case is easy, because in exact arithmetic the even in-
dexed IDR(1) iterates and residuals are exactly the BiCGStab iterates and residuals.
However, as we have seen, the recursions are not the same, and therefore, it is possible
that IDR(1) is more stable than BiCGStab or vice versa. The odd numbered IDR(1)
iterates and residuals have no counterpart in the original BiCGStab algorithm.
For sure is that the existence of all BiCGStab iterates, i.e., all even IDR(1)
iterates, requires that all BiCG residuals exists, and that therefore any serious Lanczos
breakdown and any so-called pivot breakdown cause BiCGStab and IDR(1) to break
down unless extra precautions against such breakdowns have been implemented. This
follows from the fact that the BiOMin version and the BiORes version of BiCG break
down at the same times: the existence of the residuals implies the existence of the
coupled two-term recursions [13]. Additionally, the choice of the parameters ωjis the
same in BiCGStab and IDR(1), so a breakdown due to ωj= 0 will occur at the same
time in both methods; but in both it is also easy to fix.
It is conceivable that there are further causes for breakdown in IDR(1). On
the other hand, the recovery procedure in case of a breakdown seems to be much
simpler in IDR(1); but so far it seems to be less understood and documented than for
BiCGStab [14].
While IDR(1) and BiCGStab produce in exact arithmetic essentially the same
results based on a common mathematical background, they are clearly different algo-
rithms obtained by different approaches.13
6.2. The case s > 1.The relation BiCG ÃBiCGStab ∼IDR(1) is matched by
the relation ML(s)BiCG ÃML(s)BiCGStab ∼IDR(s), but the similarity between
ML(s)BiCGStab and IDR(s) is weaker than between BiCGStab and IDR(1). In
IDR(s) there is some freedom in choosing the s“intermediate” iterates and residuals
because, unlike in ML(s)BiCGStab, the vectors wnin (3.13) need not satisfy the
strict condition (2.6) of an “intermediate” ML(s)BiCG residual, but only the weaker
block orthogonality condition wn⊥ Kj(A
?,P) of (3.13). With the IDR approach,
such vectors can be obtained with much simpler recursions, which, in addition, allow
considerable flexibility that may enable us to overcome breakdowns.
The many published numerical examples on IDR(s) [25, 32, 36] manifest the fast
convergence of this method and the superiority of the choice s > 1. Due to the careful
choice of the numerical examples and the restriction on small values of s, where the
method is less costly than for large s, the numerical results are more relevant than
those of Yeung and Chan [39], who applied their ML(s)BiCGStab with large sand
mostly without preconditioning to rather ill-conditioned test matrices. Heuristically
it is plausible that these methods are particularly effective for such examples. In
BiCG the construction of a basis of e
Knis prone to roundoff errors, which can be
expected to be larger when Ais ill-conditioned and nis large. When constructing
13From Proposition 5.1 in [25] one may get the impression that BiCGStab and IDR(1) are nearly
identical. But they do not compare the original BiCGStab recursions with the original IDR(1)
recursions.
23
in ML(s)BiCG a basis of the block Krylov subspace Kj(A?,e
R0) we can start with s
orthonormal basis vectors and work with the dimension jof order n/s of each single
Krylov subspace. Then, for the same value of n, we can expect the basis of the block
Krylov subspace to be better conditioned. A similar improvement of the condition
of the basis can be expected when we compare BiCGStab to ML(s)BiCGStab or
IDR(s), and it seems to be relevant also in preconditioned problems that are not
extremely ill-conditioned.
The effectiveness of this improvement due to changing the dual (“left”) space
and using a better conditioned basis is quite surprising. The discovery of this effect
is due to Yeung and Chan [39], but the equally strong improvement of IDR(s) over
IDR(1), which seems to rely partly on the same effect, was discovered independently
a decade later by Sonneveld and van Gijzen. Additionally, the left-hand side block
Krylov space that is implicitly used by both ML(s)BiCGStab and IDR(s) seems to
be more effective in capturing the spectrum. By choosing in IDR(s) this space (i.e.,
the matrix P) appropriately — and perhaps even adaptively — depending on some
knowledge on the spectrum of Aone may be able to further speed up convergence.
The other fundamental fact is that in the framework of Lanczos-type product
methods, multiple left projections can reduce the MV count. By the reduction of
the MV count we understand a smaller ratio between the search space dimension n
(nj≤n<nj+1) and the number js of orthogonality conditions satisfied by wn.
For n=nj=j(s+ 1) this ratio is 1 + 1
swhile for CGS and BiCGStab it is 2.
This also applies both to ML(s)BiCGStab and IDR(s), but not to ML(s)BiCG,
where building up the left block Krylov space costs sMVs per value of j, while
ML(s)BiCGStab and IDR(s) achieve the same effect with just one MV. Therefore,
ML(s)BiCG is not competitive with ML(s)BiCGStab or IDR(s), except perhaps
in situations where the Lanczos-type product methods fail due to roundoff problems.
(It is well known, that the recursion coefficients produced by BiCGStab are usually
less accurate than the same coefficients produced by BiCG, and there are problems
where BiCGStab fails to converge, while BiCG succeeds.) For this reason, Yeung
and Chan [39] introduced ML(s)BiCG only as a tool for deriving ML(s)BiCGStab.
IDR(s) inherits from BiCGStab also the disadvantage that for a problem with
real-valued data the parameters ωjare all real-valued (when chosen in the standard
way), and therefore the zeros ω−1
iof Ωjcannot approximate complex eigenvalues of
Awell. This problem has been addressed by BiCGStab2 [12] and later by BiCG-
Stab(`) [24, 27] by building up Ωjfrom polynomial factors of degree 2 and `, respec-
tively. Unfortunately, an adaptation of IDR(s) to include this idea in an efficient way
is not straightforward and requires to change the framework. This topic is addressed
in [28].
Acknowledgment. The author acknowledges the advice of and the discussions
with many colleagues, in particular Klaus G¨artner, Martin van Gijzen, Miroslav
Rozloˇzn´ık, Valeria Simoncini, Peter Sonneveld, Man-Chung Yeung, and Jens-Peter
Zemke. He is also indebted to the Technical University Berlin and the DFG Research
Center MATHEON for hosting him.
24
REFERENCES
[1] J. I. Aliaga, D. L. Boley, R. W. Freund, and V. Hern´
andez,A Lanczos-type method for
multiple starting vectors, Math. Comp., 69 (2000), pp. 1577–1601. Received Oct. 10, 1996,
electr. publ. May 20, 1999.
[2] O. Axelsson,Conjugate gradient type methods for unsymmetric and inconsistent systems of
linear equations, Linear Algebra Appl., 29 (1980), pp. 1–16.
[3] Z. Bai, D. Day, and Q. Ye,ABLE: an adaptive block Lanczos method for non-Hermitian
eigenvalue problems, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 1060–1082 (electronic).
Received Mar. 4, 1997.
[4] R. Fletcher,Conjugate gradient methods for indefinite systems, in Numerical Analysis,
Dundee, 1975, G. A. Watson, ed., vol. 506 of Lecture Notes in Mathematics, Springer,
Berlin, 1976, pp. 73–89.
[5] R. W. Freund,Band Lanczos method (Section 4.6), in Templates for the Solution of Algebraic
Eigenvalue Problems: A Practical Guide, Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and
H. van der Vorst, eds., SIAM, Philadelphia, 2000, pp. 80–88.
[6] ,Band Lanczos method (Section 7.10), in Templates for the Solution of Algebraic Eigen-
value Problems: A Practical Guide, Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and
H. van der Vorst, eds., SIAM, Philadelphia, 2000, pp. 205–216.
[7] R. W. Freund, G. H. Golub, and N. M. Nachtigal,Iterative solution of linear systems,
Acta Numerica, (1992), pp. 57–100.
[8] R. W. Freund and M. Malhotra,A block QMR algorithm for non-Hermitian linear systems
with multiple right-hand sides, Linear Algebra Appl., 254 (1997), pp. 119–157.
[9] A. Greenbaum,Estimating the attainable accuracy of recursively computed residual methods,
SIAM J. Matrix Anal. Appl., 18 (1997), pp. 535–551. Received Apr. 21, 1995.
[10] M. H. Gutknecht,Stationary and almost stationary iterative (k, l)-step methods for linear
and nonlinear systems of equations, Numer. Math., 56 (1989), pp. 179–213.
[11] ,The unsymmetric Lanczos algorithms and their relations to Pad´e approxi-
mation, continued fractions, and the qd algorithm. in Preliminary Proceed-
ings of the Copper Mountain Conference on Iterative Methods, April 1990.
http://www.sam.math.ethz.ch/∼mhg/pub/CopperMtn90.ps.gz and CopperMtn90-7.ps.gz.
[12] ,Variants of BiCGStab for matrices with complex spectrum, SIAM J. Sci. Comput., 14
(1993), pp. 1020–1033. Received Sep. 9, 1991.
[13] ,Lanczos-type solvers for nonsymmetric linear systems of equations, Acta Numerica, 6
(1997), pp. 271–397.
[14] M. H. Gutknecht and K. J. Ressel,Look-ahead procedures for Lanczos-type product methods
based on three-term Lanczos recurrences, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1051–
1078.
[15] M. H. Gutknecht and Z. Strakoˇ
s,Accuracy of two three-term and three two-term recurrences
for Krylov space solvers, SIAM J. Matrix Anal. Appl., 22 (2000), pp. 213–229.
[16] M. R. Hestenes and E. Stiefel,Methods of conjugate gradients for solving linear systems,
J. Res. Nat. Bureau Standards, 49 (1952), pp. 409–435.
[17] C. Lanczos,An iteration method for the solution of the eigenvalue problem of linear differential
and integral operators, J. Res. Nat. Bureau Standards, 45 (1950), pp. 255–281.
[18] ,Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bureau
Standards, 49 (1952), pp. 33–53.
[19] D. Loher,New block Lanczos solvers for linear systems with multiple right-hand side, PhD
thesis, Diss. no. 16337, ETH Zurich, Zurich, Switzerland, 2006.
[20] D. P. O’Leary,The block conjugate gradient algorithm and related methods, Linear Algebra
Appl., 29 (1980), pp. 293–322. Received Sep. 25, 1978.
[21] J. K. Reid,On the method of conjugate gradients for the solution of large sparse systems
of linear equations, in Large sparse sets of linear equations. Proceedings of the Oxford
Conference of the Institute of Mathematics and Its Applications held in April, 1970, J. K.
Reid, ed., Academic Press, London, 1971, pp. 231–254.
[22] Y. Saad and M. H. Schultz,Conjugate gradient-like algorithms for solving nonsymmetric
linear systems, Math. Comp., 44 (1985), pp. 417–424.
[23] V. Simoncini,A stabilized QMR version of block BICG, SIAM J. Matrix Anal. Appl., 18
(1997), pp. 419–434. Received Mar. 16, 1994.
[24] G. L. G. Sleijpen and D. R. Fokkema,BiCGstab(l) for linear equations involving unsymmet-
ric matrices with complex spectrum, Electronic Trans. Numer. Anal., 1 (1993), pp. 11–32.
Received Mar. 1993.
[25] G. L. G. Sleijpen, P. Sonneveld, and M. B. van Gijzen,Bi-CGSTAB as an induced di-
25
mension reduction method, Report 08-07, Department of Applied Mathematical Analysis,
Delft University of Technology, 2008.
[26] G. L. G. Sleijpen and H. A. van der Vorst,An overview of approaches for the stable
computation of hybrid BiCG methods, Appl. Numer. Math., 19 (1995), pp. 235–254.
[27] G. L. G. Sleijpen, H. A. van der Vorst, and D. R. Fokkema,BiCGstab(l) and other hybrid
Bi-CG methods, Numerical Algorithms, 7 (1994), pp. 75–109. Received Oct. 29, 1993.
[28] G. L. G. Sleijpen and M. B. van Gijzen,Exploiting BiCGstab(`)strategies to induce di-
mension reduction, Report 09-02, Department of Applied Mathematical Analysis, Delft
University of Technology, 2009.
[29] P. Sonneveld,CGS, a fast Lanczos-type solver for nonsymmetric linear systems, Report
84-16, Department of Mathematics and Informatics, Delft University of Technology, 1984.
[30] ,CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Statist.
Comput., 10 (1989), pp. 36–52. Received Apr. 24, 1984.
[31] P. Sonneveld and M. B. van Gijzen,IDR(s): a family of simple and fast algorithms for solv-
ing large nonsymmetric systems of linear equations, Report 07-07, Department of Applied
Mathematical Analysis, Delft University of Technology, 2007.
[32] P. Sonneveld and M. B. van Gijzen,IDR(s): A family of simple and fast algorithms for
solving large nonsymmetric systems of linear equations, SIAM Journal on Scientific Com-
puting, 31 (2008), pp. 1035–1062. Received Mar. 20, 2007.
[33] E. Stiefel,Relaxationsmethoden bester Strategie zur L¨osung linearer Gleichungssysteme,
Comm. Math. Helv., 29 (1955), pp. 157–179.
[34] H. A. van der Vorst,Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for
the solution of nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 13 (1992),
pp. 631–644. Received May 21, 1990.
[35] H. A. van der Vorst and P. Sonneveld,CGSTAB, a more smoothly converging variant
of CG-S, Report 90-50, Department of Mathematics and Informatics, Delft University of
Technology, 1990.
[36] M. B. van Gijzen and P. Sonneveld,An elegant IDR(s)variant that efficiently exploits
bi-orthogonality properties, Report 08-21, Department of Applied Mathematical Analysis,
Delft University of Technology, 2008.
[37] P. K. W. Vinsome,ORTHOMIN—an iterative method for solving sparse sets of simultaneous
linear equations, in Proc. Fourth SPE Symposium on Reservoir Simulation, Los Angeles,
1976, pp. 149–160.
[38] P. Wesseling and P. Sonneveld,Numerical experiments with a multiple grid and a pre-
conditioned Lanczos type method, in Approximation methods for Navier-Stokes problems
(Proc. Sympos., Univ. Paderborn, Paderborn, 1979), vol. 771 of Lecture Notes in Math.,
Springer, Berlin, 1980, pp. 543–562.
[39] M.-C. Yeung and T. F. Chan,ML(k)BiCGSTAB: a BiCGSTAB variant based on multiple
Lanczos starting vectors, SIAM J. Sci. Comput., 21 (1999), pp. 1263–1290. Received May
16, 1997, electr. publ. Dec. 15, 1999.
[40] D. M. Young and K. C. Jea,Generalized conjugate-gradient acceleration of nonsymmetrizable
iterative methods, Linear Algebra Appl., 34 (1980), pp. 159–194.