Document [original]

SPECTRAL ERROR BOUNDS FOR HERMITIAN INEXACT

KRYLOV METHODS†

UTE KANDLER∗AND CHRISTIAN SCHR¨

ODER∗

Abstract. We investigate the convergence behavior of inexact Krylov methods for the approx-

imation of a few eigenvectors or invariant subspaces of a large, sparse Hermitian matrix. Bounds

on the distance between an exact invariant subspace and a Krylov subspace and between an exact

invariant subspace and a Ritz space are presented. Using the first bound we analyze the question:

if a few iteration steps have been taken without convergence, how many more iterations have to be

performed to achieve a preset tolerance. The second bound provides a measure on the approximation

quality of a computed Ritz space. Traditional bounds of these quantities are particularly sensitive

to the gap between the wanted eigenvalues and the remaining spectrum. Here this gap is allowed

to be small by considering how well the exact invariant subspace is contained in a slightly larger

approximated invariant subspace. Moreover, numerical experiments confirm the applicability of the

given bounds.

Key words. Hermitian eigenvalue problem, inexact Krylov method, convergence analysis,

Krylov relation, Ritz pair

AMS subject classifications. 65F15, 65G99

1. Introduction. The Hermitian eigenvalue problem consists of solving the

equation

Ax =λx

for λ∈R,x∈Cn\ {0}, where A=AH∈Cn×nis given. Here we consider the

case when Ais large and data-sparse and only a small subset of exterior eigenvalues

and the corresponding invariant subspace are desired. Under these conditions the

most prominent iterative methods are Krylov subspace methods that search in Krylov

subspaces

Kk:= Kk(A, v1) := span(v1, Av1, A2v1, . . . , Ak−1v1)

for approximations of eigenvectors or invariant subspaces.

The application we have in mind is the identification of ground states of quan-

tum systems. Mathematically this amounts to is an Hermitian eigenvalue problem.

A ground state is an eigenvector corresponding to the smallest eigenvalue, whereas

eigenvectors corresponding to the second (third, forth,...) smallest eigenvalues are

called first (second, third,...) excited states. Note that these eigenvalues can be very

close to each other which can lead to complications in the determination of the ground

state [25]. The dimension of these quantum systems is often extremely large; it is 2d

where dis the number of particles of the system (and d > 50 is not uncommon) [11,25].

Hence, already a single vector might not fit into memory of even a large computing

cluster when stored in standard element-by-element format. For that reason vectors

have to be stored in a data sparse way, such as, e.g., in the tensor train or the hierarchi-

cal tensor format [9,10,21]. These formats, however, entail the drawback that vector

operations like matrix-vector multiplication, vector addition and vector scaling are

†This work was supported by German research council, DFG under project “Scalable Numerical

Methods for Adiabatic Quantum Preparation”

∗TU Berlin, Str. des 17. Juni 136, 10623 Berlin, GERMANY,

({kandler,schroed}@math.tu-berlin.de)

2Ute Kandler Christian Schr¨oder

inexact, i.e., only approximations of the intended quantities are available. Using inex-

act vector operations inside Krylov methods result in inexact Krylov methods. Other

occurrences include inexact solves in a shift-invert setting (when A= (A−σI)−1) or

mixed precision arithmetic (when computations are carried out in double precision,

but vectors are stored in single precision).

Backward error analysis shows that inexact Krylov subspace methods can be

interpreted as exact Krylov methods applied to a perturbed matrix A+E[14,29,33].

In general, an inexact Krylov method will generate orthonormal Vk+1 = [Vk, vk+1] =

[v1, . . . , vk+1]∈Cn×k+1,bk+1 ∈Ck, and Bk=BH

k∈Ck×ksuch that a Krylov relation

of the form

(1.1) (A+E)Vk=VkBk+vk+1bH

k+1

holds. Equation (1.1) takes the role of the familiar Arnoldi relation AVk=VkHk+

hk+1,kvk+1eT

k. The backward error matrix E=EH∈Cn×nis unknown, only a bound

on its norm ∥E∥2is usually available. Note that using the Lanczos process Bkwould

be tridiagonal, but other methods generate a full Hermitian Bk[14]. Relation (1.1)

implies that Vkis a basis of a Krylov subspace

Kk:= Kk(A+E, ˜v1)

of the Hermitian matrix A+Eclose to Afor some ˜v1∈ran (Vk).

In this paper the following questions are addressed:

•How well is a desired invariant subspace Xof the original matrix Aapproxi-

mated by the Krylov subspace ˜

Kkof a perturbed matrix A+Eor by a Ritz

space ˜

Yin ˜

Kk?

In our notation a tilde indicates a quantity that corresponds to A+Einstead

of to A. E.g., ˜

Kkis a Krylov subspace of A+Eand ˜

Yis a Ritz space of A+E

in ˜

Kk.

•How can the sensitivity of the error bounds with respect to a small gap between

the wanted and the remaining eigenvalues be avoided?

We allow the approximation ˜

Yto be of larger dimension than X. In other

words, we alter the question from ”How close is Xto ˜

Y?” to ”How well is X

contained in ˜

Y?”. This enables us in contrast traditional bounds, e.g., [7,13,

31], to treat small gaps between the wanted and the remaining eigenvalues.

•What is a suitable measure for the quality of the approximate invariant sub-

space?

We use the angle of inclusion of Xin ˜

Yfor nonzero subspaces X,˜

Y ⊂ Cn

which is defined by

(1.2) ∠↷

max(X,˜

Y) := max

x∈X, x=0 ∠(x, ˜

Y).

A small angle ∠↷

max(X,˜

Y) does not mean that Xis close to ˜

Y, but indi-

cates that Xis almost contained in ˜

Y. In numerous applications this is all

that is needed [1]. When dim(X)≤dim( ˜

Y), the angle ∠↷

max(X,˜

Y) coincides

with the well-known maximal canonical angle [5,37], otherwise (i.e., when

dim(X)>dim( ˜

Y)) it is π/2. This unsymmetric formulation is reasonable,

because for example a 2-dimensional space can be approximately contained in

a 3-dimensional space but obviously not vice versa. As a nice consequence it

also means that our bounds hold in the trivial case when dim(X)>dim( ˜

Y).

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 3

The bounds presented in this paper use (1.1) as a starting point. We distinguish

between bounds on the distance to Krylov subspaces and to Ritz spaces. In the first

case we achieve an a priori result (requiring ∥E∥2only) and in the second case an a

posteriori result (requiring Vk, Bkand bk+1 in addition to ∥E∥2). In both cases the

bounds also require the exact eigenvalues of A. Next to being interesting in their own

right the bounds may also be beneficial inside algorithms as stopping criterion when

combined with certain spectral estimation techniques, cf. [2,22].

Our theory heavily relies on spectral perturbation theory. Classical works in this

field include [7,12,22,24,34]. The distance of an approximate invariant subspaces from

an exact one is considered in [13,20] (based on the residual) and in [31] (based on

the quality of a surrounding search space). For the a priori setting we will generalize

a classic result [23, Theorem 6.3] that bounds the angle between an eigenvector and

the kth Krylov subspace. Generalizations for block Krylov methods of this classic

result can be found in [18,27]. [3] presents bounds for the angle between an invariant

subspace of Aand the Krylov subspace Kk(A, v1) generated by a Krylov method

with polynomial restarts. Other works considering inexact Krylov subspace methods

include [19,28,29,36], where only the matrix vector multiplication is assumed to be

inexact. Finally we mention [33] where the backward error for an approximate Krylov

subspace is analyzed.

The paper is structured as follows: Section 2introduces more notation and basic

or preliminary results. The main results of the paper are presented in sections 3and 4.

In Section 3we present bounds on the distance to Krylov subspaces of a perturbed

matrix A+E, while in Section 4a bound on the approximation quality of Ritz spaces

is discussed. Numerical examples illustrating our findings are presented in Section 5.

Finally we oﬀer some concluding remarks in Section 6.

2. Notation and preliminary results. In this section we introduce some no-

tation and collect basic results, grouped by topic. Throughout this paper Cm×n

denotes the set of complex m×nmatrices and Cnthe n-dimensional complex vector

space. A matrix norm ∥ · ∥ is called unitarily invariant if ∥UAQ∥=∥A∥for any

matrix A∈Cn×mand any unitary matrices U∈Cm×mand Q∈Cn×n. Prominent

unitarily invariant matrix norms are the 2-norm denoted by ∥.∥2and the Frobenius

norm denoted by ∥.∥F.

For a set Λ ⊂Rand A∈Cn×nwe define

spread(Λ) := max

λ1,λ2∈Λ|λ1−λ2|and ran(A) := {Ax |x∈Cn},

where the range of Aequals the column space of the matrix A. We indicate by 0 the

null matrix, by Inthe n×nidentity matrix and by ekits kth column. The cardinality

of a discrete set S, denoted |S|, is the number of elements in S. The envelope env(M)

of a set M ⊂ R∪{∞,−∞} is defined by the smallest interval that contains M.

2.1. The angle of inclusion. For nonzero subspaces X,Y ⊂ Cnthe angle of

inclusion of Xin Yis defined by (1.2) (we replace ˜

Yby Yhere for ease of notation)

where

∠(x, Y) := min

y∈Y

y=0

∠(x, y),and ∠(x, y) := arccos (|xHy|

∥x∥2∥y∥2)∈[0,π

2].

For matrices X, Y we define ∠↷

max(X, Y ) := ∠↷

max(ran (X),ran (Y)).

4Ute Kandler Christian Schr¨oder

Intuitively, the angle of inclusion (1.2) provides a measure of how well Xis con-

tained in Y. If the angle of inclusion is small then for every x∈ X there is a y∈ Y that

is close to x, i.e., ∥x−y∥2/∥x∥2is small. In particular, we have ∠↷

max(X,Y) = 0 if and

only if X ⊂ Y and ∠↷

max(X,Y) reaches π/2 if Xcontains a direction orthogonal to Y.

In particular the latter is the case whenever dim(X)>dim(Y). Using these rationale

the angle of inclusion can be extended to zero-dimensional spaces: ∠↷

max({0},Y) := 0

for any (zero or non-zero) Y, and ∠↷

max(X,{0}) := π/2 for any non-zero X.

We also stress that while the angle of inclusion is non-symmetric in general , i.e.,

∠↷

max(X,Y)=∠↷

max(Y,X), it is symmetric whenever dim(X) = dim(Y). In the this

case ∠↷

max(X,Y) = ∠↷

max(Y,X) coincides with the maximal canonical angle between

Xand Y.

As a side note we mention that there is also a minimal angle, which is defined by

∠min(X,Y) := minx∈X ,x=0 ∠(x, Y), but this concept does not play a role in this paper.

For more details on subspace angles see [6,15,37]. We state some useful properties of

inclusion angles in the following lemma and theorem.

Lemma 2.1. Let X,Y,Zbe subspaces of Cn. Then

i. ∠↷

max(X,Z)≤∠↷

max(X,Y) + ∠↷

max(Y,Z)(triangle inequality);

ii. ∠↷

max(X,Y)≥∠↷

max(X,Z)

whenever Y ⊂ Z;

iii. ∠↷

max(X,Y) = ∠↷

max(Y⊥,X⊥)

where X⊥,Y⊥⊂Cnare the orthogonal complements of X,Yin Cn;

iv. ∠↷

max(X,Y) = 









0if dim(X) = 0

arccos (σdim(X)(XHY))if 1≤dim(X)≤dim(Y)

2if dim(X)>dim(Y)

where X, Y are any orthonormal bases of X,Yrespectively, i.e. X:= ran(X),

Y:= ran(Y), and σi(XHY)denotes the i-th largest singular value of XHY;

v. ∠↷

max(X,Y⊥∩Z)≤∠↷

max(X ⊕Y,Z)

whenever X ⊥ Y.

Proof.i. We distinguish three cases. a) It is easy to check that the inequality holds

whenever at least one of X,Y,Zis zero. b) If dim(X)>dim(Y) or dim(Y)>dim(Z)

at least one angle on the right-hand side reaches π

2and there is nothing to prove.

c) Otherwise, i.e., if 1 ≤dim(X)≤dim(Y)≤dim(Z) the definition of ∠↷

max(X,Y)

coincides with the definition of angles between subspaces in [37]. Hence in this case

the proof in [37, pp. 275] applies to our definition as well.

ii. This is a special case of part i. with ∠↷

max(Y,Z) = 0.

iii. We distinguish three cases. a) If X ⊂ Y or, equivalently, Y⊥⊂ X⊥then both

angles are zero. (This covers the cases that X= 0 or Y⊥= 0.) b) If dim(X)>dim(Y)

or, equivalently, dim(Y⊥)>dim(X⊥) then both angles are π/2. (This covers the cases

that Y= 0 or X⊥= 0.) c) If 1 ≤dim(X)≤dim(Y) see [16].

iv. The first case (dim(X) = 0) holds by definition. The last case (dim(X)>

dim(Y)) was discussed above. For the middle case (1 ≤dim(X)≤dim(Y)) see [17, p.

2010].

v. W.l.o.g. Xis non-zero (otherwise the angle on the left hand side is zero

and there is nothing to prove). Also, w.l.o.g. dim(X)≤dim(Y⊥∩ Z) (otherwise

dim(X)>dim(Y⊥∩Z) implies

dim(X ⊕Y) = dim(X) + dim(Y)>dim(Y⊥∩Z) + dim(Y)

≥(dim(Y⊥) + dim(Z)−n) + dim(Y) = dim(Z),

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 5

so both angles in the claimed inequality are π/2 and there is nothing to prove). So

1≤dim(X)≤dim(Y⊥∩Z) which implies that Zand Y⊥are non-zero. W.l.o.g., also

Yis non-zero (otherwise both angles in the claimed inequality reduce to ∠↷

max(X,Z)

and there is nothing to prove). Finally we can assume w.l.o.g. that dim(X ⊕Y) =

dim(X)+dim(Y)≤dim(Z) (otherwise the angle on the right-hand side in the claimed

inequality is π/2 and there is nothing to prove).

Let X∈Cn×dim(X),Y∈Cn×dim(Y),Z=∈Cn×dim(Z)be orthonormal bases of

X,Y,Z, respectively, where Z= [Z1, Z2] is chosen such that ran(Z1) = Y⊥∩Z. Then

[X, Y ] is an orthonormal basis of X⊕Y, since X ⊥ Y. Moreover, W:= [X, Y ]H[Z1, Z2]

is of the form [W11 W12

0W22 ], where W11 =XHZ1does not have less columns than rows

(since 1 ≤dim(X)≤dim(Y⊥∩Z)) and W22 =YHZ2has full column rank. Also W

does not have less columns than rows (since 1 ≤dim(X ⊕Y)≤dim(Z)). Hence, with

part iv. we have

cos ∠↷

max(X,Y⊥∩Z) = σdim(X)(W11) and σdim(X⊕Y)(W) = cos ∠↷

max(X ⊕Y,Z).

Using the monotonically decreasing behavior of the cosine in the interval [0,π

2] all

that remains to prove is σdim(X)(W11)≥σdim(X⊕Y)(W). We distinguish two cases. If

W22 has more rows than columns, then the rows of Wmust be linearly dependent,

i.e., σdim(X⊕Y)(W) = σmin(W) = 0. Otherwise W22 is square and we have by the

interlacing property of eigenvalues of Hermitian matrices [22] that

σdim(X)(W11)2=λdim(X)(WH

11W11)≥λdim(X)+dim(Y)([WH

11W11 ∗

∗ ∗])

=λdim(X)+dim(Y)(WHW) = σdim(X⊕Y)(W)2.

Theorem 2.2. [30, Theorem 2.7] Let X∈Cn×m, X⊥∈Cn×n−m, Z ∈Cn−m×m

such that [X, X⊥]is unitary. Let X:= ran(X)and ˜

X:= ran(X+X⊥Z). Then

tan ∠↷

max(X,˜

X) = ∥Z∥2.

2.2. The spectrum and its perturbation. The set of eigenvalues of a matrix

A, i.e., its spectrum, is denoted by eig(A). A subspace X ∈ Cnis called an invariant

subspace of A∈Cn×nif AX ⊂ X. When Ais Hermitian, choosing orthonormal bases

X∈Cn×m, X⊥∈Cn×n−mof an invariant subspace Xand its orthogonal complement

X⊥, respectively, gives rise to a block spectral decomposition of Aof the form

(2.1) [X, X⊥]HA[X, X⊥] = [A11 0

0A22]

with A11 ∈Cm×m,A22 ∈Cn−m×n−mand eig(A) = eig(A11)∪eig(A22). The invariant

subspace Xis called simple if eig(A11)∩eig(A22) = ∅.

The following result is known as Weyl’s theorem for the 2-norm, e.g., [34, Corollary

4.10], [12], and as Hoﬀman-Wielandt theorem for the Frobenius norm, e.g., [12,38].

Theorem 2.3. Let A,E∈Cn×nbe Hermitian. Let λ1≥λ2≥... ≥λnbe the

eigenvalues of Aand ˜

λ1≥˜

λ2≥. . . ≥˜

λnthe eigenvalues of A+E. For ∗ ∈ {2, F }

we have

∥diag(λ1, . . . , λn)−diag(˜

λ1, . . . , ˜

λn)∥∗≤ ∥E∥∗.

6Ute Kandler Christian Schr¨oder

The following theorem is a generalization of the well-known Davis-Kahan tan(θ)

theorem which imposes, compared to the original formulation [7, Theorem 6.3] relaxed

conditions on the spectrum.

Theorem 2.4. [20, Theorem 2] Let A∈Cn×nbe a Hermitian matrix and let

X= [X1, X2, X3]be unitary so that XHAX = diag(A11, A22, A33)is block diagonal,

where Xi∈Cn×ni,Aii ∈Cni×nifor i= 1,2,3with n1+n2+n3=n. Let ˜

X3∈Cn×n3

have orthogonal columns and let R=A˜

X3−˜

X3˜

A3, where ˜

A3=˜

3A˜

X3. Suppose

that eig(A11)lies in [a, b]and eig( ˜

A3)lies in the union of (−∞, a −δ]and [b+δ, ∞).

Then

(2.2) tan ∠↷

max(˜

X3,[X2, X3]) ≤∥R∥2

δ.

Remark 2.5. In [20, Theorem 2] XHAX was assumed diagonal (instead of just

block diagonal). Our formulation holds, because the change of bases that diagonalizes

A11,A22, and A33 does not influence the subspace angles.

The standard Davis-Kahan tan(θ) theorem is obtained for n2= 0.

2.3. Gap between eigenvalues and separation of subspaces. We define

the gap between closed sets Λ1, Λ2⊂Rand square matrices A,Brespectively, by

gap(Λ1,Λ2) := min

λ1∈Λ1,λ2∈Λ2|λ1−λ2|and gap(A, B) := gap(eig(A),eig(B)).

Note that with (2.1) gap(A11, A22) = gap (eig(A11),eig(A)\eig(A11)) if ran(X) is a

simple invariant subspace. The gap provides a natural way of describing the distance

between two spectra. A similar quantity, also measuring in some sense the distance

between the spectra of two arbitrary square matrices A∈Cn×n, B ∈Cm×m, is given

by the separation

(2.3) sep∗(A, B) := min

Z∈Cm×n

∥Z∥∗=1 ∥AZ −ZB∥∗,where ∗ ∈ {2, F }.

Furthermore, the separation between two subspaces X,Y ∈ Cnwith respect to a

matrix A∈Cn×nis defined by

sepA,∗(X,Y) := sep∗(XHAX, Y HAY ),

where Xand Yare any orthonormal bases for Xand Y. This quantity is well defined

because the norms used in (2.3) are unitarily invariant, cf. [31]. For Hermitian matrices

the sep and the gap operators are related as follows.

Lemma 2.6. Let A11 ∈Cn×nand A22 ∈Cm×mbe Hermitian. Then

πgap(A11, A22)≤sep2(A11, A22)≤gap(A11, A22) = sepF(A11, A22).

Proof. The first inequality is discussed in [4, p.15] leveraging a function-theoretical

result in [35]. The second inequality is obtained by restricting Zin (2.3) to the form

Z=uivH

j, where ui∈Cnand vj∈Cmare a unit (right) eigenvector of A11 and a

unit (left) eigenvector of A22, respectively. The last relation is stated in [34, Theorem

3.1].

Remark 2.7. The second inequality of Lemma 2.6 is also valid in the case of

non-Hermitian matrices A11 and A22. In that case the words “right” and “left” put

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 7

in parentheses in the above proof become important. The first and the last relation

in Lemma 2.6 also hold for normal A11 and A22 (if 2

πis replaced by 4

π), but do not

carry over to the non-normal case.

The following theorem is a corollary of Theorem 2.4 and provides a measure how

well the invariant subspace ran(X1) of Ais contained in the invariant subspace of

larger dimension ran([ ˜

X1,˜

X2]) of a perturbed matrix A+E. Note that it is even

applicable if the eigenvalues corresponding to ran(X1) are not well separated from

the remaining ones.

Theorem 2.8. Let A, E ∈Cn×nbe Hermitian. Let Xi,˜

Xi∈Cn×nifor i= 1,2,3

with n1+n2+n3=nbe such that X:= [X1, X2, X3],˜

X:= [ ˜

X1,˜

X2,˜

X3]are unitary

and

(2.4) XHAX = diag(A11, A22, A33),˜

XH(A+E)˜

X= diag( ˜

A11,˜

A22,˜

A33)

are block diagonal.

i) If gap(env(eig(A11)),eig( ˜

A33)) >∥E∥2then

tan ∠↷

max (X1,[˜

X1,˜

X2])≤∥E∥2

gap(A11,˜

A33)−∥E∥2

ii) If gap(env(eig(A11)),eig(A33)) >2∥E∥2and max

λ∈eig( ˜

A33)

min

λ∈eig(A33)|˜

λ−λ| ≤ ∥E∥2

then

tan ∠↷

max (X1,[˜

X1,˜

X2])≤∥E∥2

gap(A11, A33)−2∥E∥2

iii) If gap(env(eig( ˜

A11)),eig( ˜

A33)) >2∥E∥2and max

λ∈eig(A11)min

λ∈eig( ˜

A11)|˜

λ−λ| ≤

∥E∥2then

tan ∠↷

max (X1,[˜

X1,˜

X2])≤∥E∥2

gap( ˜

A11,˜

A33)−2∥E∥2

Proof. Defining ˜

A3:= ˜

3A˜

X3=˜

3(A+E)˜

X3−˜

3E˜

X3=˜

A33 −˜

3E˜

X3we

get

R:= A˜

X3−˜

X3˜

A3= (A+E)˜

X3−E˜

X3−˜

X3(˜

A33 −˜

3E˜

X3)

=−E˜

X3+˜

X3˜

3E˜

X3=−(I−˜

X3˜

3)E˜

X3,

which implies ∥R∥2≤ ∥E∥2. Since ˜

A3and ˜

A33 diﬀer by ˜

3E˜

X3their eigen-

values diﬀer by at most ∥E∥2by Theorem 2.3, implying gap(eig(A11),eig( ˜

A3)) ≥

gap(eig(A11),eig( ˜

A33)) −∥E∥2. Together with the assumption

gap (env(eig(A11)),eig( ˜

A33))>∥E∥2

we deduct that there is no eigenvalue of ˜

A3in the interval [λmin(A11)−gap(A11,˜

A3),

λmax(A11) + gap(A11,˜

A3)], where λmin(A11) := min(eig (A11)) and λmax(A11) :=

max (eig(A11)). Hence Theorem 2.4 is applicable and leads to

∠↷

max(˜

X3,[X2, X3]) ≤∥R∥2

gap(A11,˜

A3)≤∥E∥2

gap(A11,˜

A3)≤∥E∥2

gap(A11,˜

A33)−∥E∥2

8Ute Kandler Christian Schr¨oder

From Lemma 2.1.iii it follows that

∠↷

max(X1,[˜

X1,˜

X2]) = ∠↷

max([ ˜

X1,˜

X2]⊥, X⊥

1) = ∠↷

max(˜

X3,[X2, X3]),

which concludes the proof of part i).

For part ii) we use that by assumption max˜

λ∈eig( ˜

A33)minλ∈eig(A33)|˜

λ−λ| ≤ ∥E∥2

for every eigenvalue of ˜

A33 there exists an eigenvalue of A33 that is no further away

than ∥E∥2. This implies gap(A11, A33)≥gap(A11,˜

A33)−∥E∥2. Together with part

i) we obtain the assertion. Analogously, also part iii) follows from part i).

Remark 2.9. The assumption in part ii) that max˜

λ∈eig( ˜

A33)minλ∈eig(A33)|˜

λ−

λ| ≤ ∥E∥2and the similar condition in part iii) are not very restrictive. By Theo-

rem 2.3, the eigenvalues of Adiﬀer from the corresponding ones of A+Eby at most

∥E∥2. So the assumption is mainly a restriction on the distribution of eig(A+E) into

eig( ˜

A11), eig( ˜

A22), and eig( ˜

A33).

2.4. Ritz and Krylov subspaces. Let A∈Cn×n,V ⊂ Cnbe a subspace,

Y∈Cn×k, and M∈Ck×k. The pair (M, Y ) is called a Ritz pair of Awith respect to

Vif it satisfies the Galerkin condition

AY −Y M ⊥ V and ran(Y)⊂ V.

A subspace Yis called a Ritz space of Ain Vif for some basis Y∈Cn×kof Y

there is an M∈Cdim(Y)×dim(Y)such that (M, Y ) is a Ritz pair of Awith respect

to V. The eigenvalues of such an Mare well-defined and are called Ritz values of A

corresponding to Y.

AKrylov relation of Aof order kis a relation of the form

(2.5) AVk=VkBk+vk+1bH

k+1,

where Bk∈Ck×k,bk+1 ∈Ck, and the columns of [Vk, vk+1]∈Cn×k+1 are orthonor-

mal.

Relation (2.5) implies that ran(Vk) is a Krylov subspace of A. But it does not

imply that ran(Vi) = ran([v1, . . . , vi]) is a Krylov subspace of Afor i<k. It does

if Bkis Hessenberg and bk+1 =bk+1,kek. The following two theorems are the main

basis of our results.

Theorem 2.10. [24, Theorem 6.3] [23] Let the eigenvalues λiof the Hermitian

matrix Abe ordered decreasingly. Then the angle between the exact eigenvector zi

associated with λiand the k-th Krylov subspace Kksatisfies the inequality

(2.6) tan ∠(zi,Kk)≤θi

ψk−i(1 + 2ηi)tan ∠(v1, zi),

where

(2.7) θ1= 1, θi=

i−1

∏

j=1

λj−λn

λj−λi

for i > 1, ηi=λi−λi+1

λi+1 −λn

and ψk−iis the Chebychev polynomial of degree k−i.

Theorem 2.11. [31, Theorem 2] Let Xbe an eigenspace of A∈Cn×nand let

Kbe a subspace in Cn. Let Ybe a Ritz space in Kand let Y⊥be the orthogonal

complement of Yin K. Then

(2.8) sin ∠↷

max(X,Y)≤sin ∠↷

max(X,K)√1 + ∥P A(I−P)∥2

sepA,2(Y⊥,X)2,

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 9

where Pis the orthogonal projector onto K.

Remark 2.12. In the original formulation of Theorem 2.11 in [31] it is required

that Xand Yare of the same dimension. However, the proof given there is actually

valid for dim(X)= dim(Y).

3. Distance to inexact Krylov subspaces. Here we present our main result

concerning how well an invariant subspace X1of Ais contained in a search space ˜

which is a Krylov subspace of a perturbed matrix A+E. We consider a situation

where literations of an inexact Krylov subspace method have been carried out but

the desired eigenpair approximations cannot yet be considered converged. This leads

to the question of how many more iterations have to be carried out until convergence

can be expected. The theorem uses nested subsets of eigenvalues. An illustration is

provided in Figure 1.

Theorem 3.1. Let A∈Cn×nbe Hermitian with eigenvalues λ1≥λ2≥. . . ≥λn.

Let J1⊆J2⊆J3⊆J4be nested nonempty subsets of {1,2, . . . , n}so that max(J3)< n

and J2,J3consist of consecutive integers.

Denote by JL:= {1,2, . . . , min(J3)−1}the leading and by JT:= {max(J3)+1, . . . , n}

the trailing indices of J3.

Let Λi:= {λj:j∈Ji}and Λ−i={λj:j∈ {1, . . . n}\Ji}for i∈ {1,...,4, L, T}.

Let X1and X4be invariant subspaces of Acorresponding to Λ1and Λ4, respectively.

Let k > max(J3).

For j= 1, . . . , k let ˜

Kj:= Kj(A+E, ˜v1)for some Hermitian E∈Cn×nand some

˜v1∈Cn.

For i∈ {2,3}let ˜

Xibe an invariant subspace of A+Ecorresponding to {λj(A+E) :

j∈Ji}such that ˜

X2⊂˜

X3.

If 2∥E∥2<min {gap(Λ1,Λ−2),gap(Λ2,Λ−3),gap(Λ3,Λ−4)}then for every l= 1, . . . ,

k−|JL|, we have

∠↷

max (X1,˜

Kk)≤∠↷

max (˜

X2,˜

Kk)+δ1,2

≤arctan (ϱk,l ·tan ∠↷

max(˜

X3,˜

Kl))+δ1,2

(3.1)

≤arctan (ϱk,l ·tan≤π

2(∠↷

max(X4,˜

Kl) + δ3,4))+δ1,2

(3.2)

where tan≤π

2(α) := tan(min{α, π

2})and

ϱk,l := (∑

i∈J2

θ2

ψk−l−|ΛL|(1 + 2˜ηi)2)1

, δi,j := arctan (∥E∥2

gap(Λi,Λ−j)−2∥E∥2),

(3.3)

θi:= ∏

j∈JL

|λj−λn|+ 2∥E∥2

|λj−λi|−2∥E∥2

>0,˜ηi:= gap(λi,ΛT)−2∥E∥2

spread(ΛT)+2∥E∥2

>0,

(3.4)

where ψjdenotes the Chebychev polynomial of degree j.

Proof. The proof consists of three parts, proving that

i) ∠↷

max(X1,˜

Kk)≤∠↷

max(˜

X2,˜

Kk) + δ1,2,

ii) tan ∠↷

max(˜

X2,˜

Kk)≤ϱk,l ·tan ∠↷

max(˜

X3,˜

Kl), and

iii) ∠↷

max(˜

X3,˜

Kl)≤min(π

2,∠↷

max(X4,˜

Kl) + δ3,4), respectively.

Then the claim follows because of the monotonicity of the tangent in [0,π

2].

10 Ute Kandler Christian Schr¨oder

λ(A)

λ1

λn

Λ1

Λ2

Λ3

Λ4

ΛT⊂Λ−3ΛL⊂Λ−3

Λ−1Λ−1

Λ−2Λ−2

Λ−4Λ−4

gap−4,3

gapT,2

gap−2,1gap1,−2

gap2,L

gap3,−4

Fig. 1.Illustration of the nested spectral subsets, here gap(Λi,Λj) = min{gapij ,gapji}

Part i) Using Theorem 2.8 ii) and the definition of δ1,2, respectively, we have

(3.5) tan ∠↷

max(X1,˜

X2)≤∥E∥2

gap(Λ1,Λ−2)−2∥E∥2

= tan δ1,2.

Hence, by the triangle inequality, Lemma 2.1.i, we get

(3.6) ∠↷

max (X1,˜

Kk)≤∠↷

max(X1,˜

X2) + ∠↷

max (˜

X2,˜

Kk)≤δ1,2+∠↷

max (˜

X2,˜

Kk).

Part iii) The proof of part iii) is similar to that of part i) and also uses a triangle

inequality, this time in the form of

∠↷

max (˜

X3,˜

Kl)≤∠↷

max(˜

X3,X4) + ∠↷

max (X4,˜

Kl)≤δ3,4+∠↷

max (X4,˜

Kl)

where we used Theorem 2.8 iii) for the second inequality. Also, any subspace angle

can at most be π

2by definition.

Part ii) If dim( ˜

Kl)<dim( ˜

X3) = |J3|then ∠↷

max(˜

X3,˜

Kl) = π

2and the right-hand

side is infinite. Thus there is nothing to prove in this case. Hence we may assume

that dim( ˜

Kl)≥ |J3|in the following.

We start by considering the angles between ˜

Kkand eigenvectors of A+E. So,

let ˜x1, . . . , ˜xnbe unit eigenvectors of A+Ecorresponding to the eigenvalues ˜

λ1≥

λ2≥. . . ≥˜

λnsuch that ˜

X3= span{˜xj:j∈J3}and ˜

X2= span{˜xj:j∈J2}. Let

ΛT:= {˜

λj:j∈JT}.

For i∈J2let Vi:= span{˜xj:j∈J3, j =i}=˜

X3∩span(˜xi)⊥. We note

that ˜

Kl∩V⊥

i, being an intersection of a (≥ |J3|)-dimensional and an (n−|J3|+ 1)-

dimensional subspace, is at least one-dimensional. Thus there exists a nonzero vector

vi∈˜

Kl∩V⊥

isuch that ∠(˜xi, vi) = ∠(˜xi,˜

Kl∩V⊥

i). Let

Ai:= A+E−∑

j∈J3

j=i

(˜

λj−˜

λn)˜xj˜xH

Note that the matrices Aiand A+Ehave the same eigenvectors. Also the eigenvalues

λjfor j∈({1, . . . , n}\J3)∪ {i}coincide. The remaining eigenvalues of A+Eare

moved to ˜

λn∈˜

ΛT. Then Theorem 2.10, applied to Aiand vi, yields

(3.7) tan ∠(˜xi,Kk−l+1(Ai, vi)) ≤ρi:= θi

ψk−l−|JT|(1 + 2ηi)tan ∠(˜xi, vi),

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 11

with

(3.8) θi=∏

j∈JL

|˜

λj−˜

λn|

|˜

λj−˜

λi|, ηi=|˜

λi−˜

λmin(JT)|

|˜

λmin(JT)−˜

λn|=gap(˜

λi,˜

ΛT)

spread(˜

ΛT).

Let gi∈ Kk−l+1(Ai, vi) be such that ∠(˜xi, gi) = ∠(˜xi,Kk−l+1(Ai, vi)). W.l.o.g. gi

is scaled such that ˜xH

igi= 1 (otherwise, if ˜xH

igi= 0 we can rescale gi, if ˜xH

igi= 0

it follows with Lemma 2.1.vthat π/2 = ∠(˜xi, gi) = ∠(˜xi,˜

Kl∩V⊥

i)≤∠↷

max(˜

X3,˜

Kl),

i.e., ∠↷

max(˜

X3,˜

Kl) = π/2 such that the right-hand side of part ii) is infinite, i.e., in

this case is nothing to prove). Note that since vi⊥ Vi, we have Aivi= (A+E)vi

and Aivi⊥ Vi. Thus by induction Kj(Ai, vi) = Kj(A+E, vi) and Kj(Ai, vi)⊥ Vifor

every j= 1,2, . . .. Hence, gi⊥ Vithus there exist gL,i ∈C|JL|and gT,i ∈C|JT|such

that

gi= [˜x1, . . . , ˜x|JL|]gL,i + ˜xi+ [˜xmin(JT), . . . , ˜xn]gT,i.

Using Theorem 2.2, the definition of gi, and (3.7) leads to∥[gT

L,i, gT

T,i]T∥2=tan ∠(gi,˜xi)

≤ρi.Since the subspaces ˜

Kjare nested and vi∈˜

Kl, we have Kk−l+1(A+E, vi)⊂

Kk(A+E, ˜v1) = ˜

Kk. Thus gi∈˜

Kk. Since ˜

Kkis independent of i, it contains gifor

all i∈J2. Thus it contains the subspace ran(G) where

G:= [gmin(J2), . . . , gmax(J2)] = [˜x1, . . . , ˜xn]





I|J2|







with GL:= [gL,min(J2), . . . , gL,max(J2)]∈C|JL|×|J2|,GT:= [gT,min(J2), . . . , gT,max(J2)]

∈C|JT|×|J2|and the upper and lower zero blocks are of format (min(J2)−min(J3)) ×

|J2|and (max(J3)−max(J2))×|J2|, respectively. Using Lemma 2.1.ii and Theorem 2.2

we have

(3.9) tan ∠↷

max(˜

X2,˜

Kk)≤tan ∠↷

max(˜

X2,ran(G)) = 



[GL

GT]



2

Further transformations yield

(3.10) 



[GL

GT]



2≤



[GL

GT]



F≤(∑

i∈J2

ρ2

i)1

Applying Theorem 2.3, we have for i=j:|λi−λj|−2∥E∥2≤ |˜

λi−˜

λj| ≤ |λi−λj|+

2∥E∥2. Thus θi≤˜

θiand ηi≥˜ηi, for θi, ηias in (3.8) and ˜

θi,˜ηias in (3.4). Also, ˜

θi

and ˜ηiare positive by the assumed bound on ∥E∥2. Moreover, ψk−l−|JL|(1 + 2ηi)>

ψk−l−|JL|(1 + 2˜ηi), because Chebychev polynomials are positive and monotonically

increasing in [1; ∞). Furthermore, since ˜

X3= span(˜xi)⊕Viwe have by Lemma 2.1.v

that ∠(˜xi,˜

Kl∩V⊥

i)≤∠↷

max(˜

X3,˜

Kl), hence ∠(˜xi, vi)≤∠↷

max(˜

X3,˜

Kl). Thus

(3.11) ρi≤˜

θi

ψk−l−|JL|(1 + 2˜ηi)tan ∠↷

max(˜

X3,˜

Kl).

12 Ute Kandler Christian Schr¨oder

Combining (3.9), (3.10), and (3.11) we have

(3.12) tan ∠↷

max(˜

X2,˜

Kk)≤(∑

i∈J2

θ2

ψk−l−|JL|(1 + 2˜ηi)2)1

tan ∠↷

max(˜

X3,˜

Kl)

=ϱk,l tan ∠↷

max(˜

X3,˜

Kl).

This concludes the proof of part ii) and of the whole theorem.

The following remarks are in order.

Remark 3.2. Theorem 3.1 somewhat resembles Theorem 2.10, but holds several

improvements. i) The search space is chosen as Krylov subspace of a perturbed matrix

A+E(instead of Aitself). Thus the setting of inexact Krylov methods is covered.

ii) Instead of just eigenvectors we consider invariant subspaces. This allows to treat

clusters of eigenvalues as a whole. iii) The dimension lof the Krylov subspace on

the right-hand side is allowed to be larger than one. (Note that ∠(v1, zi) in (2.6)

could be written as ∠(vi,K1).) This is necessary for the theorem to be meaningful

as for l < dim(X4) the right-hand side is infinite. Moreover, this might be useful if

information about the angle ∠↷

max(˜

X3,˜

Kl) or ∠↷

max(X4,˜

Kl) is available for some l. iv)

Theorem 3.1 is still useful even if the wanted eigenvalues Λ1are not well separated

from the rest of the spectrum, i.e., if gap(Λ1,Λ−1)−2∥E∥2is tiny or even negative.

This is achieved by considering four possibly diﬀerent (but nested) sets of eigenvalues,

Λ1, . . . , Λ4. All that matters is that the gaps between Λiand the complement of Λi+1,

i.e., gap(Λ1,Λ−2), gap(Λ2,Λ−3), gap(Λ3,Λ−4), are well larger than 2∥E∥2. We stress

in particular that gap(Λi,Λ−i) may even be zero, i.e., Xiis not required to be simple.

Finally, Theorem 3.1 is a generalization of Theorem 2.10 and reduces to the latter in

case ∥E∥= 0, l= 1, J1=J2=J3=J4={i}.

Remark 3.3. The constant ϱk,l decreases exponentially fast as kgrows. Indeed,

from ψk(x) = cosh(k·arcosh(x)) for |x| ≥ 1, cosh(x) = 1

2(ex+e−x)>1

2ex, and

arcosh(x) = ln(x+√x2−1) for x≥1 we deduct that ψk(1 + 2η)>1

2(1 + 2η+

2√η+η2)k. This indicates at least linear convergence of ϱk,l towards zero.

Remark 3.4. In order to get an idea of the qualitative behavior of bound (3.2)

we consider the following. For small arguments 0 < a ≪1 we have arctan(a)≈aand

arctan(a)≤a. Since in the later stages of the iteration ϱk,l is small we may replace

arctan(·) by the identity in (3.2). Doing so, taking logarithms, and substituting

γi:= 1 + 2˜ηi+ 2√˜ηi+ ˜η2

i,τl:= tan≤π

2(∠↷

max(X4,˜

Kl) + δ3,4)results in

log (∠↷

max (X1,˜

Kk))≤log 

τl(∑

i∈J2

θ2

4γ2(k−l−|ΛL|)

i)1

+δ1,2

.

Moreover, using log(a+b)≈max(log(a),log(b)) and some algebraic transformations

the right hand side can be approximated by

max (log(δ1,2),log (τl)+max

i∈J2(log(˜

θi)−log (1

2)+(l+|ΛL|) log(γi)−klog(γi))).

On the other hand, in the beginning of the iteration the argument of the arctan

in (3.2) is usually large and the only save bound for ∠↷

max (X1,˜

Kk)is π

2. Together

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 13

init.

phase

convergence

phase

stagnation

phase

αi−kβi

log(δ1,2)

log(π

log ∠y

max(X1,˜

Kk)

Fig. 2.Qualitative shape of the bound (3.2), see Remark 3.4

with the previous consideration we get

(3.13) log (∠↷

max (X1,˜

Kk))⪅min (log (π

2),max (log(δ1,2),max

i∈J2

(αi−kβi)))

with αi= log (τl) + log(˜

θi)−log(1

2)+(l+|ΛL|) log(γi), βi= log(γi)>0, and a⪅b

meaning that “ais usually smaller than b, it can be a bit larger, but not much”.

So, the logarithm of ∠↷

max(X1,˜

Kk) can be approximately bounded by a set of

straight lines, two horizontal and a few decreasing ones, see Figure 2for an illustration.

We can identify three phases. i) During the first iteration steps all the decreasing lines

lie above the π

2-level. Hence the minimum is assumed by log (π

2). We call this phase

the initialization phase. ii) In the following iteration steps the decreasing lines fall

below the π

2-level and become the important terms. Here the largest of the decreasing

lines defines the bound. We obtain a strict monotonically decreasing, piecewise linear,

convex shape. This phase is called the convergence phase. It is well possible that only

one line dominates for the whole convergence phase, i.e., there is no bend, see the

numerical examples in Section 5. iii) In the last phase the maximum is assumed

by log(δ1,2). We denote the last section by stagnation phase because the angle of

inclusion does not decrease anymore.

Remark 3.5. Theorem 3.1 oﬀers some freedom in choosing the index sets Ji.

J1is fixed to the indices of the wanted eigenvalues. J2, J3, J4are free and could thus

be chosen to minimize the bounds (3.1), (3.2). Here the various relations between

the index sets J1, J2, J3, J4influence the constants δ12,˜

θi, ˜ηiand δ34 in diﬀerent ways.

For example extending J2leads to a larger gap(Λ1,Λ−2) and thus to a decreased δ12.

of (3.2). Similar extending J3with respect to J2will improve ˜

θiand ˜ηi. Finally

extending J4with respect to J3will improve δ34.

Remark 3.6. Theorem 3.1 still holds if the eigenvalues are sorted in increasing

order, λ1≤λ2≤. . . ≤λn. This can be seen by applying the theorem to −Aand using

that all the constants ˜

θi,˜ηi, δi,j and ϱk,l are invariant under negating the eigenvalues.

Remark 3.7. Although this case is irrelevant in practice, we mention that

Theorem 3.1 still holds if the denominator of ˜ηiis zero, i.e., if ∥E∥2= 0 = spread(ΛT).

In this case ˜ηi=∞for all i∈J2and ϱk,l = 0. Then ∠↷

max(X1,˜

Kk)≤δ1,2for all

14 Ute Kandler Christian Schr¨oder

k > l +|JL|.

Remark 3.8. Theorem 3.1 relates four angles to one another: ∠↷

max(X1,˜

Kk),

∠↷

max(˜

X2,˜

Kk), ∠↷

max(˜

X3,˜

Kl), and ∠↷

max(X4,˜

Kl). The relation ”looks nicest“ when

restricted to the first and the last of these. (Especially since the middle two angles

involve invariant subspaces of A+Ewhich seem hard to obtain in practice.) The

reason to include all four angles in Theorem 3.1 is that i) ∠↷

max(˜

X2,˜

Kk) appears in

the following Theorem 4.1 and ii) ∠↷

max(˜

X3,˜

Kl) may actually be bounded in practice.

Remark 3.9. The formulation of Theorem 3.1 and Figure 1may seem to

suggest that the desired eigenvalues can be in the center of the spectrum (and the

theorem does hold in this case). However, the constants ˜

θi,i∈J2rapidly grow for

an increasing number of leading eigenvalues |JL|. Hence the theorem is useful only

if fairly exterior eigenvalues are wanted (say, when |JL|is not larger than two or

three). For example considering the matrix constructed by the MATLAB command

A= diag(100 :−1:1), ∥E∥2= 10−10 and choose Λ2= Λ3= Λ4= Λ1.Then for

•Λ1={90,91, . . . , 100}we get ˜

θmax = 1,

•Λ1={90,91}we get ˜

θmax = 1.731 ·1012,

•Λ1={50,51}we get ˜

θmax = 5.045 ·1028

with ˜

θmax := maxi∈J2˜

θi. Hence, already if we are interested in the 9th and 10th

largest eigenvalue of ATheorem 3.1 becomes little meaningful. Note, if largest Λ3

contains all the largest eigenvalues then ˜

θmax = 1.

4. Distance to inexact Ritz spaces. So far we have considered how well the

exact eigenspace X1is contained in the search space ˜

Kk. If ∠↷

max(X1,˜

Kk) is small then

there is a subspace of ˜

Kkthat is close to X1. However, in practice this subspace in ˜

is not known and X1is approximated by a Ritz space ˜

Yin ˜

Kk. Hence, in this section

we address the question: How much worse is the distance to a Ritz space compared

to the distance to the whole search space?

Theorem 4.1. Let A, E, Λ1,Λ2,Λ−2,X1,˜

X2and δ1,2be defined as in Theo-

rem 3.1.

Let Bk∈Ck×k,bk+1 ∈Ck, and Vk∈Cn×kbe such that the Krylov relation (2.5)for

A+Eis satisfied and ˜

Kk= ran(Vk).

Let ˜

Ybe a Ritz space of A+Ein ˜

Kk.

Let ˜

Mbe the Ritz values corresponding to ˜

Yand let ˜

M−be the set of k−dim( ˜

remaining Ritz values, with ∥E∥2<gap( ˜

M−,Λ2).

Then

(4.1)

∠↷

max(X1,˜

Y)≤δ1,2+ arcsin≤1(√1 + π2∥bk+1∥2

4(gap( ˜

M−,Λ2)−∥E∥2)2sin ∠↷

max(˜

X2,˜

Kk))

where arcsin≤1(x) := arcsin(min{1, x}).

Proof. By Lemma 2.1.iwe have

(4.2) ∠↷

max(X1,˜

Y)≤∠↷

max(X1,˜

X2) + ∠↷

max(˜

X2,˜

Y).

Now we treat each term of the right-hand side separately. Using Theorem 2.8 ii)

and (3.3), respectively, as in Theorem 3.1, the first one is bounded by

(4.3) ∠↷

max(X1,˜

X2)≤arctan ∥E∥2

gap(Λ1,Λ−2)−2∥E∥2

=δ1,2.

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 15

Applying Theorem 2.11 to the second right-hand side angle in (4.2) results in

(4.4) ∠↷

max(˜

X2,˜

Y)≤arcsin≤1(√1 + ∥P(A+E)(I−P)∥2

sepA+E,2(˜

Y⊥,˜

X2)2sin ∠↷

max(˜

X2,˜

Kk)),

where Pis the orthogonal projector onto ˜

Kkand ˜

Y⊥is the orthogonal complement

of ˜

Yin ˜

Kk. Choosing V⊥∈Cn×n−ksuch that [Vk, V⊥] is unitary, we have for the

numerator ∥P(A+E)(I−P)∥2

2that

(4.5) ∥P(A+E)(I−P)∥2

2=∥VH

k(A+E)V⊥∥2

2=∥VH

⊥(A+E)Vk∥2

=∥VH

⊥VkBk+VH

⊥vk+1bH

k+1∥2

2=∥bk+1∥2

where relation (1.1) and the fact that A+Eis Hermitian was used. Moreover, applying

Lemma 2.6 to the denominator sepA+E,2(˜

Y⊥,˜

X2)2leads to

sepA+E,2(˜

Y⊥,˜

X2)2≥4

π2gap( ˜

M−,˜

Λ2)2≥4

π2(gap( ˜

M−,Λ2)−∥E∥2)2,

with ˜

Λ2:= {λj(A+E) : j∈J2}where J2is defined as in Theorem 3.1. Together

with (4.5) we obtain

(4.6)

∠↷

max(˜

X2,˜

Y)≤arcsin≤1



v

t1 + π2∥bk+1∥2

4(gap( ˜

M−,Λ2)−∥E∥2)2sin ∠↷

max(˜

X2,˜

Kk)



.

Finally, inserting (4.3) and (4.6) into (4.2) completes the proof.

Remark 4.2. Again, in order to deal with small gaps between the wanted and

unwanted eigenvalues, we have to allow the Ritz space ˜

Yto be of larger dimension

than the wanted invariant subspace X1.

Remark 4.3. One distinction between the Theorems 3.1 and 4.1 is the amount

of information necessary to evaluate the bounds corresponding to the kth step of the

Krylov method. Theorem 3.1 is an a priori result as it only needs the information of

the lth step (with l < k). In contrast, Theorem 4.1 is an a posteriori result, because

the vector bk+1 and the Ritz values at the kth step are required. It is unclear how an

a priori result bounding ∠↷

max(X1,˜

Y) could look like.

5. Numerical results. In this section we verify the spectral error bounds pre-

sented in sections 3and 4using two diﬀerent test matrices. The first one is constructed

to evaluate how changes in the eigenvalue distribution influence the behavior of the

bounds. The second test matrix comes from the application mentioned in the intro-

duction.

To obtain the Krylov relation (1.1) we apply an exact Arnoldi method, cf. [8,32],

to a matrix A+E, where Eis chosen as a Hermitian, random matrix of prescribed

norm.

5.1. Academic problem. We use a diagonal matrix Abuilt by the MATLAB

command

(5.1) A=diag([9,9−g,7,6,5,4,4−g,2,rand(1,992)]),

where g∈Ris a parameter. This is a 1000 ×1000 matrix with the eigenvalues

9,9−g, 7,6,5,4,4−g, 2 and additionally 992 eigenvalues in the interval (0,1). We

16 Ute Kandler Christian Schr¨oder

0 5 10 15 20 25 30 35 40 45

10−13

10−11

10−9

10−7

10−5

10−3

10−1

101

number of Arnoldi steps k

true value, ε=10−4

true value, ε=10−7

true value, ε=10−11

bound (3.2), ε=10−4

bound (3.2), ε=10−7

bound (3.2), ε=10−11

(a) case (A), Λ2= Λ3= Λ4= Λ1

0 5 10 15 20 25 30 35 40 45

10−13

10−11

10−9

10−7

10−5

10−3

10−1

101

number of Arnoldi steps k

true value, ε=10−4

true value, ε=10−7

true value, ε=10−11

bound (3.2), ε=10−4

bound (3.2), ε=10−7

bound (3.2), ε=10−11

(b) case (B), Λ1= Λ2⊂Λ3= Λ4

Fig. 3.Experiment 1: True angle of inclusion ∠↷

max(X1,˜

Kk)and corresponding bound (3.2)

with ε:= ∥E∥2/∥A∥2∈ {10−3,10−7,10−11}

are interested in the eigenspace corresponding to the eigenvalues Λ1={9−g, 7,6,5,4}.

Thus for g∈[0,2], gis the gap between the wanted and the unwanted eigenvalues

and if gis small Λ1is not well separated from the remaining spectrum of A.

Experiment 1. The first experiment explores the influence of the norm ∥E∥2of

the perturbation on the development of ∠↷

max(X1,˜

Kk) over the course of the Arnoldi

iteration. We choose g= 1 in (5.1), so the wanted and the unwanted eigenvalues

are well separated. We distinguish two cases (A) Λ2= Λ3= Λ4= Λ1and (B)

Λ1= Λ2⊂Λ3= Λ4.

For case (A) ∠↷

max(X1,˜

Kk) is plotted as black lines for ∥E∥2∈ {10−3,10−7,10−11}·

∥A∥2in Figure 3(a) . It can be observed that the angle decreases during the iteration,

although we search for eigenspaces of Ain a Krylov subspace of A+E(and not of A).

Convergence sets in after an initial phase and is linear with convergence rate indepen-

dent of ∥E∥2. But ∠↷

max(X1,˜

Kk) decreases not to zero but only to a limiting accuracy

that is on the order of ∥E∥2/∥A∥2. We see that all the curves agree in the sense that

the three phases mentioned in Remark 3.4 can be identified. The lighter lines show

the bounds of Theorem 3.1, where we choose l= 10, and Λ2= Λ3= Λ4= Λ1be-

cause the gap(Λ1,Λ−1) is large. The bound (3.2) starts to exist after the initialization

phase and correctly predicts the convergence and stagnation phases. In the stagna-

tion phase the bound overestimates the limiting accuracy only by a value of about 6.

The convergence rate (the αin ∠↷

max(X1,˜

Kk)≤α·∠↷

max(X1,˜

Kk+1) ) is overestimated

to be 0.41 (true value 0.12, both roughly computed from the observed values). This

amounts to a slow down factor of 2.5≈log(0.12)/log(0.41), i.e., reducing the angle

by a certain amount takes two and a half times less iterations than predicted by the

bound.

The convergence rate of bound (3.2) is determined by ˜ηi. According to Remark 3.5

the constant ˜ηican be improved by extending subset J3with respect to J2. This is

confirmed by Figure 3(b) where ∠↷

max(X1,˜

Kk) is plotted for Λ1= Λ2and Λ3= Λ4=

Λ1∪{9,4−a, 2}and for the same values of ∥E∥2as before. We see that during the

convergence phase the convergence rate of the true value and of the bound almost

coincide (true: 0.11, bound: 0.12).

Experiment 2. Here we investigate the sensitivity of bound (3.2) with respect

to the gap between the wanted and the remaining eigenvalues. We use the same Λ1,

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 17

case gin (5.1)δ1,2

(lower

is better)

δ3,4

(lower

is better)

θmax

(lower

is better)

˜ηmin

(higher

is better)

overest.

of lim.

accuracy

slow

down

factor

(A) 104·∥E∥210−410−49·1062·10−70.33 1.96

(A) 102·∥E∥210−210−29·1082·10−90.40 2.41

(A) 2.1·∥E∥21.47 1.47 9 ·1011 2·10−12 1.00 1.5·105

(B) 2.1·∥E∥25·10−11 5·10−11 1.00 1.00 0.23 1.47

(B) 10−2·∥E∥25·10−11 5·10−11 1.00 1.00 0.23 1.49

Table 1

Experiment 2: Constants of the bound (3.2)for case (A) (Λ1= Λ2= Λ3= Λ4), case (B)

(Λ1⊂Λ2= Λ3= Λ4) and case (C) Λ1⊂Λ2⊂Λ3= Λ4(the numbers are correct to leading digit)

g∈ {10−2,2.1,102,104}·∥E∥2and distinguish between the three cases (A) Λ1= Λ2=

Λ3= Λ4, (B) Λ1⊂Λ2= Λ3= Λ4and (C) Λ1⊂Λ2⊂Λ3= Λ4.

For the first case because of the small gap between the wanted and the unwanted

eigenvalues the bound (3.2) can be expected to be very loose. This is confirmed by

Figure 4(a), where the true angle ∠↷

max(X1,˜

Kk) and the bound (3.2) are depicted

for ∥E∥2= 10−11∥A∥2,l= 20 and g∈ {2.1,102,104} ·∥E∥2for g= 10−2∥E∥2the

assumptions of Theorem 3.1 are not fulfilled and the bound does not exist. To be

precise, as we consider three diﬀerent matrices we would have to plot three diﬀerent

true angles. However, (apart from a minimal deviation in the starting point of the

convergence phase by 2 −3 iteration steps) the three true angles behave the same. So

for the sake of readability, only one true angle (for g= 2.1·∥E∥2) is depicted. The

figure illustrates that in case of a small gap choosing Λ2,Λ3,Λ4to coincide with Λ1

results in a rather loose bound. In particular, we see that the smaller the gap between

the wanted and the remaining eigenvalues the less meaningful bound (3.2) becomes.

The reason for the poor quality of the bound are the large constants δ1,2,δ3,4,˜

θmax

and the small ˜ηmin (see Table 5.1 where ˜

θmax := maxi∈J2{˜

θi}and ˜ηmin := mini∈J2{˜ηi}.

As mentioned in Remark 3.5 Λ2, . . . , Λ4may be chosen to optimize δ1,2,δ3,4,˜

θmax

and ˜ηmin in (3.2).

In case (B) we consider again g= 2.1·∥E∥2, but choose now Λ2= Λ3= Λ4=

Λ1∪{9,4−g} ⊃ Λ1. The constants stated in the second part of Table 5.1 improve

substantially and lead to a much sharper bound (3.2) compared to case (A), see

Figure 4(b). The bound predicts the actual behavior of ∠↷

max(X1,˜

Kk), i.e., all three

phases are reflected by the bound. We use this setting to analyze the influence of the

parameter lon our bound. Choosing l∈ {15,22,30}we see that the larger the number

of iteration steps lthat already have been carried out, the sharper the corresponding

bound in the convergence phase. In this phase the convergence rate of the bound

underestimates the true value (≈0.11) to be 0.23, which amounts to a slow down

factor of 1.47. Afterwards in the stagnation phase the bounds overestimate the actual

value only by a constant factor of about 3 (independent of l).

The constants of bound (3.2), stated in the second row of case (B) in Table 5.1,

indicate that even for a matrix where the gap between the wanted and the remaining

eigenvalues is smaller than the norm of the perturbation, i.e., g= 10−2∥E∥2the

bound (3.2) is meaningful.

We note that in case (C) the bound is further improved choosing Λ1,Λ2as in

18 Ute Kandler Christian Schr¨oder

0 5 10 15 20 25 30 35 40 45 50

10−11

10−9

10−7

10−5

10−3

10−1

101

number of Arnoldi steps k

true value, ε=10−11

bound (3.2), a=2.1 · kEkk2

bound (3.2), a=102· kEkk2

bound (3.2), a=104· kEkk2

(a) g∈ {2.1,102,104} · ∥E∥2,l= 20 and Λ2=

Λ3= Λ4= Λ1

0 5 10 15 20 25 30 35 40 45 50

10−11

10−9

10−7

10−5

10−3

10−1

101

number of Arnoldi steps k

true value, ε=10−11

bound (3.2), l=15

bound (3.2), l=22

bound (3.2), l=30

(b) g= 2.1· ∥E∥2,l∈ {15,22,30}and Λ2=

Λ3= Λ4⊃Λ1.

Fig. 4.Experiment 2: True angle of inclusion ∠↷

max(X1,˜

Kk)and its bounds (3.2)

case (B) and Λ3= Λ4= Λ2∪{2}. As stated in Remark 3.5 this improves ˜ηiwhich

leads to more accurate convergence rate of the bound. Here the convergence rate of

the bound and of the true value almost coincide, i.e., the slow down factor is about

1.00.

Experiment 3. In the third experiment we investigate how well the exact invariant

subspace X1is included in a Ritz space ˜

Yusing Theorem 4.1, i.e., we are interested

in the quality of the computed subspace ˜

Y. The subspace of interest X1still corre-

sponds to the eigenvalues Λ1:= {9−g, 7,6,5,4}. We choose Λ2:= Λ1∪ {9,4−g},

∥E∥2= 10−13∥A∥2and g= 10−11, i.e., the gap between the wanted and the re-

maining eigenvalues is very small. The approximate eigenpair ( ˜

M, ˜

Y) is obtained via

calculating a Ritz pair of A+E.

In this experiment we again distinguish between two cases: In case (A) we consider

the Ritz space corresponding to the 2nd to 6th largest Ritz values, i.e., the subspace

is of the same dimension as X1and the wanted eigenvalues Λ1are not well separated

from the unselected Ritz values ˜

M−. This leads to a very slow convergence of the

angle ∠↷

max(X1,˜

Y). Referring to Figure 5∠↷

max(X1,˜

Y) does not start to converge until

the 27th iteration step and decreases only to 10−3afterwards. Moreover, the angle

∠↷

max(X1,˜

Y) wiggles around after the convergence phase, which is also caused by the

insuﬃcient separation of the wanted and remaining Ritz values. The bound (4.1)

captures all three phases correctly, although it lags behind 5 iteration steps. After

the convergence phase the bound overestimates the true value by a factor of about

1.5.

In case (B) the subspace ˜

Ycorresponds to the 8 largest Ritz values. Now, there

is a suﬃciently large gap between the wanted eigenvalues Λ1und the unselected Ritz

values ˜

M−. We see in Figure 5that now the angle ∠↷

max(X1,˜

Y) starts to converge

earlier after 22 iteration steps and decreases afterwards to limiting accuracy on the

order of ∥E∥2/∥A∥2. The bound (4.1) captures all three phases initialization, con-

vergence and stagnation very well. During the convergence phase the convergence

rates of the bound and the true value diﬀer only by 0.005 and afterwards the bound

overestimates the true value by a constant factor of about 35.

5.2. Ising model. Returning to the motivation of this paper we consider as

a second example the one-dimensional quantum Ising model in the presence of a

transverse field as defined in [25,26]. The Hamiltonian of a system of dtwo-level

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 19

0 5 10 15 20 25 30 35 40 45 50

10−18

10−16

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

number of Arnoldi steps k

true ∠y

max(X1,˜

Y) (dim(X1) = dim(˜

Y))

true ∠y

max(X1,˜

Y) (dim(X1)<dim(˜

Y))

bound (4.1) of ∠y

max(X1,˜

Y) (dim(X1) = dim(˜

Y))

bound (4.1) of ∠y

max(X1,˜

Y) (dim(X1)<dim(˜

Y))

Fig. 5.Experiment 3: True angles of inclusion ∠↷

max(X1,˜

Y)and corresponding bounds (4.1)

subsystems (qubits) is defined by

(5.2) A:= (1 −s)

∑

i=1

Ai+s

∑

i=1

BiBi+1

where s∈[0,1] and for i= 1, . . . , d

Ai:= −I2(i−1) ⊗[0 1

1 0]⊗I2(d−i), Bi:= −I2(i−1) ⊗[1 0

0−1]⊗I2(d−i)

and Bd+1 := B1. The dimension of the Hamiltonian Ais n= 2dgrowing exponentially

in the number of qubits. In the following experiments we choose d= 10, n= 1024

and s= 0.4. The spectrum for these values is depicted in Figure 6. We are interested

in the ground state as well as in the first and second excited state, i.e., we want to

approximate the subspace X1corresponding to Λ1consisting of the three smallest

eigenvalues of A. So J1={1,2,3}.

In Figure 6(b), we see that there is no gap between the third smallest and the

fourth smallest eigenvalue of A, i.e. Λ1is not well separated from the remaining

eigenvalues.

Experiment 4. In this experiment we illustrate the bound of Theorem 3.1. In the

first case we choose Λ2= Λ3= Λ4to consist of the four smallest eigenvalues of A,

i.e., J2=J3=J4={1, . . . , 4}and ∥E∥2= 10−11∥A||2. In Figure 7(a) the true angle

of inclusion ∠↷

max(X1,˜

Kk) and the bound (3.2) for l∈ {60,75,90}are plotted.

Qualitatively we obtain the same result as in Experiment 2 (B). The bound (3.2)

predicts all three phases initialization, convergence and stagnation of ∠↷

max(X1,˜

Kk).

As for the first test matrix in Experiment 2 we see that the larger the number of

previous iterations l, the sharper the corresponding bound in the first iteration steps.

The convergence rate is overestimated to be 0.77 (true value: 0.55). This amounts a

slow down factor of 2.31. After the convergence phase all three bounds overestimates

the true value by a constant factor of approximately 30.

20 Ute Kandler Christian Schr¨oder

0100 200 300 400 500 600 700 800 900 1,0001,100

−10

−8

−6

−4

−2

(a) all eigenvalues

0 2 46 8 10 12 14 16 18 20 22 24 26 28 30

−7

−6.5

−6

−5.5

−5

−4.5

−4

(b) the 30 smalles eigenvalues

Fig. 6.Eigenvalue distribution of the Ising model (5.2)with d= 10 and s= 0.4

0 20 40 60 80 100 120 140

10−11

10−9

10−7

10−5

10−3

10−1

101

number of Arnoldi steps k

true ε=10−11

bound (3.2) l=60

bound (3.2) l=75

bound (3.2) l=90

(a) Λ1⊂Λ2= Λ3= Λ4and l∈ {60,75,90}

0 20 40 60 80 100 120 140

10−11

10−9

10−7

10−5

10−3

10−1

101

number of Arnoldi steps k

true ε=10−11

bound (3.2) l=40

(b) Λ1⊂Λ2⊂Λ3⊂Λ4and l= 40

Fig. 7.Experiment 4: The angle of inclusion ∠↷

max(X1,˜

Kk)and its bounds (3.2)

In the second case we improve the bound by choosing l= 40 and Λ1⊂Λ2⊂

Λ3⊂Λ4, more precisely Λ1, Λ2, Λ3, Λ4consists of the 3, 4, 7, 11 smallest eigenvalues

of A, respectively. With this choice of the subsets the bound (3.2) is much sharper

compared to the first case, see Figure 7(b). Also the convergence rate of the bound

improved and the slow down factor is reduced to 1.31.

Experiment 5. In the last experiment we examine how well the exact subspace

X1of Ais included in the computed Ritz space ˜

Y, using bound (4.1). Again we

distinguish between dim(X1) = dim( ˜

Y) and dim(X1)<dim( ˜

Y), where in the first

case the Ritz space corresponds to the three smallest and in the second case to the

four smallest Ritz values.

In the first case there is an insuﬃcient separation between the wanted eigenvalues

und the unwanted Ritz values. Figure 8shows that for dim(X1) = dim( ˜

Y) the angle

∠↷

max(X1,˜

Y) does not converge and starts to wiggle after 100 steps of the algorithm.

This behavior is caused by the dense distribution of the eigenvalues of the matrix A.

More precisely, since the third and the fourth eigenvalue of Ais multiple the exact

subspace corresponding to the three smallest eigenvalues can not be approximated by

a Ritz space of the perturbed matrix A+E.

Since the third and the fourth eigenvalue of Ais multiple in general the condition

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 21

0 20 40 60 80 100 120 140

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

102

number of Arnoldi steps k

angle of inclusion

∠y

max(X1,˜

Y) (dim(X1) = dim(˜

Y))

∠y

max(X1,˜

Y) (dim(X1)<dim(˜

Y))

bound (3.13) of ∠y

max(X1,˜

Y) (dim(X1) = dim(˜

Y))

bound (3.13) of ∠y

max(X1,˜

Y) (dim(X1)<dim(˜

Y))

Fig. 8.Experiment 5: True angles of inclusion ∠↷

max(X1,˜

Y)and corresponding bounds (4.1)

∥E∥2<gap( ˜

M−,Λ2) of bound (4.1) is violated. Only during the initial phase the

bound (4.1) can be computed because there the approximation of Ritz values is still

very inaccurate such that for a few iteration steps the condition on gap( ˜

M−,Λ2)

holds.

In the second case, the subspace ( ˜

Y) is slightly enlarged, which leads to an im-

provement of the convergence behavior of ∠↷

max(X1,˜

Y). Referring to Figure 8the

angle ∠↷

max(X1,˜

Y) starts to converge after 83 iteration. The bound (4.1) captures

this behavior correctly and overestimates the true value after the convergence phase

by a constant factor of 15.

In summary, also for the bound (4.1) we obtain qualitatively the same result as

for the first test matrix.

6. Conclusions. We have investigated the convergence of inexact Krylov sub-

space methods, where we bound the distance of an exact invariant subspace to an

inexact Krylov subspace and to an inexact Ritz space therein. The first bound ad-

dresses the question: If literations have been carried out without convergence, how

many more iteration steps of an inexact Krylov subspace method are necessary to

ensure convergence to a certain tolerance.

The second bound addresses the question: How much worse is the distance of

an invariant subspace to a Ritz space compared to its distance to the whole search

space? In particular, we have bounded the angle of inclusion of Xin an inexact Ritz

space ˜

Yin an a posteriori setting.

As special features our bounds can handle the presence of perturbations and a

small gap between the wanted and the unwanted eigenvalues. This bound needs the

information of the l-th step of the iteration, the 2-norm of the perturbation and the

exact eigenvalues of A. Here the key idea was to enlarge the Ritz space in order to

guarantee a suﬃciently large gap between the remaining Ritz values and the wanted

22 Ute Kandler Christian Schr¨oder

eigenvalues. Finally, in the numerical tests the bounds have been confirmed to cor-

rectly predict the trends of the convergence curves.

Acknowledgments. We thank Volker Mehrmann, Michael Karow, and Ag-

nieszka Miedlar (all TU Berlin) for insightful discussions on the topic and constructive

comments on early versions of the manuscript.

REFERENCES

[1] A. C. Antoulas,Approximation of large-scale dynamical systems, vol. 6 of Advances in Design

and Control, SIAM, Philadelphia, PA, 2005.

[2] C. A. Beattie,Harmonic Ritz and Lehmann bounds, Electron. Trans. Numer. Anal., 7 (1998),

pp. 18–39. Large scale eigenvalue problems (Argonne, IL, 1997).

[3] C. A. Beattie, M. Embree, and D. C. Sorensen,Convergence of polynomial restart Krylov

methods for eigenvalue computations, SIAM Rev., 47 (2005), pp. 492–515.

[4] R. Bhatia and P. Rosenthal,How and why to solve the operator equation AX −XB =Y,

Bull. London Math. Soc., 29 (1997), pp. 1–21.

[5] ˚

A. Bj¨

orck,Numerical methods for least squares problems, SIAM, Philadelphia, 1996.

[6] S. B¨

orm and C. Mehl,Numerical Methods for Eigenvalue Problems, De Gruyter Graduate

Lectures, Berlin/Boston, 2012.

[7] C. Davis and W. M. Kahan,The rotation of eigenvectors by a perturbation. III, SIAM J.

Numer. Anal., 7 (1970), pp. 1–46.

[8] G. H. Golub and C. F. Van Loan,Matrix computations, Johns Hopkins Studies in the

Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, third ed., 1996.

[9] L. Grasedyck, D. Kressner, and C. Tobler,A literature survey of low-rank tensor approx-

imation techniques, GAMM-Mitt., 36 (2013), pp. 53–78.

[10] S. Holtz, T. Rohwedder, and R. Schneider,On manifolds of tensors of fixed TT-rank,

Numer. Math., 120 (2012), pp. 701–731.

[11] T. Huckle, K. Waldherr, and T. Schulte-Herbr¨

uggen,Computations in quantum tensor

networks, Linear Algebra Appl., 438 (2013), pp. 750–781.

[12] I. C. F. Ipsen,Relative perturbation results for matrix eigenvalues and singular values, in

Acta numerica, 1998, vol. 7 of Acta Numer., Cambridge Univ. Press, Cambridge, 1998,

pp. 151–201.

[13] Z. Jia and G. W. Stewart,An analysis of the Rayleigh-Ritz method for approximating

eigenspaces, Math. Comp., 70 (2001), pp. 637–647.

[14] U. Kandler and C. Schr¨

oder,Backward error analysis of an inexact Arnoldi method using

a certain Gram Schmidt variant, Preprint 10-2013, TU Berlin, 2013.

[15] A. Kie lbasi´

nski and H. Schwetlick,Numerische lineare Algebra, vol. 18 of Mathematik

f¨ur Naturwissenschaft und Technik [Mathematics for Science and Technology], VEB

Deutscher Verlag der Wissenschaften, Berlin, 1988. Eine computerorientierte Einf¨uhrung.

[A computer-oriented introduction].

[16] A. Knyazev, A. Jujunashvili, and M. Argentati,Angles between infinite dimensional sub-

spaces with applications to the Rayleigh-Ritz and alternating projectors methods, J. Funct.

Anal., 259 (2010), pp. 1323–1345.

[17] A. V. Knyazev and M. E. Argentati,Principal angles between subspaces in an A-based

scalar product: algorithms and perturbation estimates, SIAM J. Sci. Comput., 23 (2002),

pp. 2008–2040 (electronic).

[18] R. B. Lehoucq,Implicitly restarted Arnoldi methods and subspace iteration, SIAM J. Matrix

Anal. Appl., 23 (2001), pp. 551–562 (electronic).

[19] R. B. Lehoucq and K. Meerbergen,Using generalized Cayley transformations within an

inexact rational Krylov sequence method, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 131–

148 (electronic).

[20] Y. Nakatsukasa,The tan θtheorem with relaxed conditions, Linear Algebra Appl., 436 (2012),

pp. 1528–1534.

[21] I. V. Oseledets,Tensor-train decomposition, SIAM J. Sci. Comput., 33 (2011), pp. 2295–2317.

[22] B. N. Parlett,The symmetric eigenvalue problem, Prentice-Hall Inc., Englewood Cliﬀs, N.J.,

1980. Prentice-Hall Series in Computational Mathematics.

[23] Y. Saad,On the rates of convergence of the Lanczos and the block-Lanczos methods, SIAM J.

Numer. Anal., 17 (1980), pp. 687–706.

[24] , Numerical methods for large eigenvalue problems, Classics in Applied Mathematics,

Society for Industrial and Applied Mathematics, rev. ed. ed., 2011.

Analysis of inexact Arnoldi methods for Hermitian eigenvalue problems 23

[25] G. Schaller,Adiabatic preparation without quantum phase transitions, Phys. Rev. A, 78

(2008), p. 032328.

[26] R. Sch¨

utzhold and G Schaller,Adiabatic quantum algorithms as quantum phase transitions:

First versus second order, Phys. Rev. A, 74 (2006), p. 060304.

[27] V. Simoncini,Ritz and pseudo-Ritz values using matrix polynomials, in Proceedings of

the Fourth Conference of the International Linear Algebra Society (Rotterdam, 1994),

vol. 241/243, 1996, pp. 787–801.

[28] , Variable accuracy of matrix-vector products in projection methods for eigencomputa-

tion, SIAM J. Numer. Anal., 43 (2005), pp. 1155–1174.

[29] V. Simoncini and D. B. Szyld,Theory of inexact Krylov subspace methods and applications

to scientific computing, SIAM J. Sci. Comput., 25 (2003), pp. 454–477 (electronic).

[30] G. W. Stewart,Error and perturbation bounds for subspaces associated with certain eigen-

value problems, SIAM Rev., 15 (1973), pp. 727–764.

[31] , A generalization of Saad’s theorem on Rayleigh-Ritz approximations, Linear Algebra

Appl., 327 (2001), pp. 115–119.

[32] , Matrix algorithms. Vol. II, Society for Industrial and Applied Mathematics (SIAM),

Philadelphia, PA, 2001. Eigensystems.

[33] , Backward error bounds for approximate Krylov subspaces, Linear Algebra Appl., 340

(2002), pp. 81–86.

[34] G. W. Stewart and J.-G. Sun,Matrix perturbation theory, Computer Science and Scientific

Computing, Academic Press Inc., Boston, MA, 1990.

[35] B. Sz.-Nagy,¨

Uber die Ungleichung von H. Bohr, Math. Nachr., 9 (1953), pp. 255–259.

[36] J. van den Eshof and G. L. G. Sleijpen,Inexact Krylov subspace methods for linear systems,

SIAM J. Matrix Anal. Appl., 26 (2004), pp. 125–153 (electronic).

[37] P. ˚

A. Wedin,On angles between subspaces of a finite dimensional inner product space, in

Matrix Pencils, vol. 973 of Lecture Notes in Mathematics, Springer Berlin Heidelberg,

1983, pp. 263–285.

[38] H. Wielandt,An extremum property of sums of eigenvalues., Proc. Am. Math. Soc., 6 (1955),

pp. 106–110.