On a perturbation bound for invariant subspaces of matrices [original]

SIAM J. MATRIX ANAL. APPL.c

2014 Society for Industrial and Applied Mathematics

Vol. 35, No. 2, pp. 599–618

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES OF

MATRICES∗

MICHAEL KAROW†AND DANIEL KRESSNER‡

Abstract. Given a nonsymmetric matrix A, we investigate the eﬀect of perturbations on an

invariant subspace of A. The result derived in this paper diﬀers from Stewart’s classical result and

sometimes yields tighter bounds. Moreover, we provide norm estimates for the remainder terms in

well-known perturbation expansions for invariant subspaces, eigenvectors, and eigenvalues.

Key words. invariant subspaces, perturbation theory, pseudospectra, quadratic matrix equation

AMS subject classifications. Author must provide

DOI. 10.1137/130912372

1. Introduction. A subspace X⊂Cnof a matrix A∈Cn×nis called invariant

if it satisﬁes

(1.1) AX⊂X.

In this paper, we reconsider the classical question of estimating the impact of pertur-

bations in Aon X.

Suppose that the columns X∈Cn×kform an orthonormal basis of X. Then (1.1)

implies the existence of A11 ∈Ck×ksuch that AX =XA11. The eigenvalues of A11 are

independent of the choice of basis and constitute the spectrum of the restriction of Ato

X. Extending Xto a unitary matrix [X, X⊥] leads to the block Schur decomposition

(1.2) AX, X⊥=X,X⊥A11 A12

0A22.

Note that this implies Λ(A)=Λ(A11)∪Λ(A22), where Λ(·) denotes the spectrum of

a matrix. Throughout this paper, we will assume

(1.3) Λ(A11)∩Λ(A22)=∅.

This is a necessary and suﬃcient condition for the Lipschitz continuity of Xwith

respect to perturbations in A[9, Thm. 15.5.1]. (Note that continuity requires a sub-

stantially weaker condition [9, Thm. 15.2.1].) Hence, if (1.3) holds, adding a small

perturbation A→ A+Eimplies a change in the invariant subspace that is asymp-

totically proportional to E.

Various bounds on the change of invariant subspaces under perturbations of A

have been derived, notably by Davis and Kahan [6], Stewart [18, 19], Demmel [8],

and Sun [24]. In the general nonsymmetric case, these bounds are valid only as long

as Eremains suﬃciently small. A minimal requirement is that the separation con-

dition (1.3) remains valid under perturbations. In the language of pseudospectra,

this means that Eshould stay below the critical perturbation level εfor which

∗Received by the editors March 7, 2013; accepted for publication (in revised form) by B. Sutton

March 17, 2014; published electronically May 13, 2014.

http://www.siam.org/journals/simax/35-2/91237.html

†Institut f¨ur Mathematik, TU Berlin, 10623 Berlin, Germany (k[email protected]berlin.de).

‡MATHICSE, EPF Lausanne, CH-1015 Lausanne, Switzerland (daniel.kressner@epﬂ.ch).

599

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

600 MICHAEL KAROW AND DANIEL KRESSNER

the components of the ε-pseudospectrum containing Λ(A11)andΛ(A22)ﬁrstmeet

each other [1]. It turns out that some existing perturbation results are unnecessarily

restrictive and require Eto stay signiﬁcantly below this critical level. The main con-

tribution of this paper consists of a novel perturbation bound for invariant subspaces;

see Theorem 3.1 below. To derive this bound, we employ pseudospectral techniques

in the analysis of a quadratic matrix equation.

The rest of this paper is organized as follows. In section 2, we introduce the basic

tools for the perturbation analysis of invariant subspaces and recall some existing

results. Section 3 contains the statement and proof of our main result, a new pertur-

bation bound for invariant subspaces. In section 3.2, it is shown that this bound is

sharp for a 2 ×2 example. Section 3.3 discusses a variation of the main result based

on the block diagonalization of A, while section 3.5 provides a comparison to existing

perturbation bounds. In section 4, the former is used to quantify existence conditions

and remainder terms for well-known eigenvalue and eigenvector expansions.

2. Preliminaries and existing results. The goal of this section is to summa-

rize some existing perturbation results for invariant subspaces and introduce notation,

needed in the rest of the paper, on the way. Let us ﬁrst recall some basic tools used

in the perturbation analysis.

2.1. Separation between matrices. The condition (1.3) can be quantiﬁed

by the separation between A11 and A22. Based on Varah’s original deﬁnition [27],

Demmel [8] has proposed

sepλ(A11,A

22):=sup{ε>0|Λ(A11 +E11)∩Λ(A22 +E22)=∅

∀E11,E

22 with max{E112,E222}≤ε}.

(2.1)

This deﬁnition has an important interpretation in terms of ε-pseudospectra, deﬁned as

Λε(M)=z∈C|z∈Λ(M+E)forsomeE∈Cn×nwith E2≤ε

for a matrix M∈Cn×n[26]. The separation (2.1) is the minimum value of εsuch

that Λε(A11)∩Λε(A22) is nonempty. This interpretation yields the expression

sepλ(A11,A

22) = inf

λ∈Cmax σmin(A11 −λI),σ

min(A22 −λI),

where σmin(·) denotes the smallest singular value of a matrix. Based on this expres-

sion, an algorithm for computing sepλhas been developed by Gu and Overton [11].

Although sepλis not an appropriate measure to quantify the perturbation of

invariant subspaces, it will still play a role in our derivations. Stewart [19] has intro-

duced a diﬀerent notion of separation based on the observation that (1.3) is satisﬁed

if and only if the Sylvester operator

T:C(n−k)×k→C(n−k)×k,T:Z→ ZA11 −A22Z

is nonsingular. The separation of A11 and A22 with respect to an arbitrary norm ·

is deﬁned as

(2.2) sep(A11,A

22):= min

Z=1 T(Z)=min

Z=1 ZA11 −A22Z.

Then sep(A11,A

22)= 0 if and only if (1.3) holds. Examples in [8, 27] show that the

quantity sep(A11,A

22) can be magnitudes smaller than sepλ(A11,A

22).

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 601

In the following, sepF(A11,A

22)=min

ZF=1 ZA11 −A22ZFdenotes the sep-

aration with respect to the Frobenius norm. An eﬃcient algorithm for estimating

sepF(A11,A

22) can be derived from the inverse power method [4]. Various lower and

upper bounds for sepF(A11,A

22) can be found in [22, 23, 28]. Lower bounds for other

norms are discussed in [29].

Proposition 2.1 below relates sep with sepλand the distance between the spectra

of A11 and A22. Most of its statements are well known; we have included the proof for

the convenience of the reader. Recall that a norm ·is said to be unitarily invariant

if UZV=Zfor all unitary matrices U, V of compatible size. A norm is unitarily

invariant if and only if

(2.3) XZY≤X2ZY2

for all matrices of compatible size.

Proposition 2.1. Let A11,E

11 ∈Ck×k,A22,E

22 ∈C(n−k)×(n−k),andλ∈C.

Let δ=min{|λ−μ||λ∈Λ(A11),μ∈Λ(A22)}denote the distance between the spectra

Λ(A11)and Λ(A22).

(a) For any norm, sep(A11,A

22)≤δ.

(b) For any unitarily invariant norm,

(i) sep(λI, A

22)=σmin(λI −A22),

(ii) sep(A11 +E11,A

22 +E22)≥sep(A11,A

22)−E112−E222,

(iii) sep(A11,A

22)≤2·sepλ(A11,A

22)≤δ.

22)=

2·sepλ(A11,A

22)=δ.

Proof. Consider vectors xand y∗such that x2=y2=1,y∗A11 =λy∗,and

A22x=μxfor λ∈Λ(A11), μ∈Λ(A22). Let Z=xy∗.ThenZA11−A22Z=(λ−μ)Z.

This implies (a). To show the statements of (b), let E11 =1

2(μ−λ)yy∗and E22 =

2(λ−μ)xx∗.Theny∗(A11 +E11)=1

2(λ+μ)y∗and (A22 +E22)x=1

2(λ+μ)x.Thus,

Λ(A11 +E11)∩Λ(A22 +E22)=∅for perturbations satisfying E112=E222=

|λ−μ|/2. This yields the second inequality in (b)(iii). The statement of (b)(ii) follows

from the inequality

Z(A11 +E11)−(A22 +E22)Z≥ZA11 −A22Z−(E112+E222)Z.

In particular, if max{E112,E22} <sep(A11,A

22)/2, then sep(A11 +E11,A

22 +

E22)>0. Thus, Λ(A11 +E11)∩Λ(A22 +E22)=∅. This yields the ﬁrst inequality of

(b)(iii).

Next, we show (c). Suppose that A11 and A22 are normal, and let U, V be unitary

matrices such that A11 =Udiag(λ1,...,λ

k)U∗,A22 =Vdiag(μ1,...,μ

n−k)V∗,where

λjand μiare the eigenvalues of A11 and A22, respectively. Let W=V∗ZU =[wij ].

Then

ZA11 −A22Z2

F=V∗(ZA11 −A22Z)U2

=Wdiag(λ1,...,λ

k)−diag(μ1,...,μ

n−k)W)2

=[(λj−μi)wij ]2

F=

ij |λi−μj|2|wij |2

≥δ2W2

F=δ2Z2

Thus, sepF(A11,A

22)≥δ. Combined with (b)(iii), this yields (c).

It remains to prove (b)(i). Let (u, v) be a pair of normalized singular vectors be-

longing to the smallest singular value σmin of λI−A22 such that (λI−A22)v=σminu,

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

602 MICHAEL KAROW AND DANIEL KRESSNER

and u2=v2.Thenvu∗=uu∗,sincevu∗and uu∗have the same singular val-

ues 1,0,...,0. Setting Z=vu∗/uu∗we obtain Z=1and(λI−A22)Z=σmin.

Thus sep(λI,A

22)≤σmin. On the other hand,

sep(λI,A

22)=min

Z=0 (λI−A22)Z

Z=min

W=0 W

(λI−A22)−1W

≥min

W=0 W

(λI−A22)−12W=σmin.

2.2. Invariant subspaces and a quadratic matrix equation. Let us con-

sider a general matrix A∈Cn×nand partition

A=A11 A12

A21 A22,A

11 ∈Ck×k,A

22 ∈C(n−k)×(n−k).

For given Z∈C(n−k)×k, consider the similarity transformation

(2.4) I0

−ZI

AI0

=A11 +A12ZA

A21 +A22Z−ZA11 −ZA12ZA

22 −ZA12

which becomes block upper triangular if and only if the quadratic matrix equation

(2.5) 0 = f(A, Z):=A21 +A22Z−ZA11 −ZA12Z

is satisﬁed. This implies Lemma 2.2 below. Note that a subspace Yof row vectors is

called a left invariant subspace if YA⊂Y.

Lemma 2.2. Using the notation introduced above, the following statements are

equivalent:

(i) The columns of [IZ

]span a right invariant subspace of Asuch that

AI

Z=I

Z(A11 +A12Z).

(ii) The rows of [−ZI]span a left invariant subspace of Asuch that

−ZI

A=(A22 −ZA12)−ZI

.

(iii) The quadratic matrix equation (2.5) is satisﬁed.

2.3. An asymptotic result. Perturbation bounds that are asymptotically valid

as E→0 can be obtained in a relatively straightforward way from truncating per-

turbation expansions. For invariant subspaces, such expansions have been discussed

in [5, 14, 25]. In the following, we will illustrate this approach.

Lemma 2.3. Given A∈Cn×n, suppose that there exists Z∈C(n−k)×ksuch that

f(A, Z)=0,with fdeﬁned as in (2.5).IfΛ(A11 +A12Z)∩Λ(A22 −ZA12)=∅,

then there exist an open neighborhood E⊂Cn×nof 0and an open neighborhood

Z⊂C(n−k)×kof Zsuch that for each E∈Ethe equation f(A+E,ZE)=0has a

unique solution ZE∈Z. Moreover, ZEdepends holomorphically on Eand admits

the ﬁrst-order expansion

ZE=Z+T−1

ZE21+O(E2)

with the Sylvester operator TZ:ΔZ→ ΔZ(A11 +A12Z)−(A22 −ZA12)ΔZ.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 603

Proof. Clearly, fis a holomorphic function in the entries of Aand Z.The

derivative of fwith respect to the variable Zequals the Sylvester operator −TZ.

Since A22 −ZA12 and A11 +A12Zhave disjoint spectra, the operator TZis invertible.

Thus, the lemma follows from the implicit function theorem [15].

The way Lemma 2.3 is stated will be convenient for later purposes. However, for

the sake of an asymptotic result, we may assume without loss of generality that the

unperturbed matrix Ais already in block triangular form,

(2.6) A=A11 A12

0A22,A

11 ∈Ck×k,A

22 ∈C(n−k)×(n−k).

By Lemma 2.2, this is equivalent to requiring that X=[

0] spans an invariant

subspace of A. Combining the statement of Lemma 2.3 with Lemma 2.2 then yields

the following result.

Corollary 2.4. Let Abe in block triangular form (2.6) and assume Λ(A11)∩

Λ(A22)=∅. Then, for every Ewith Esuﬃciently small, there exists an invariant

subspace XE= span[ I

ZE]of A+Esuch that ZEadmits the ﬁrst-order expansion

(2.7) ZE=T−1E21+O(E2),T:Z→ ZA11 −A22Z.

Recently, Stewart [20] derived bounds on Efor which ZE, as a function of E,

is Fr´echet diﬀerentiable.

Once we have obtained ZE, there are diﬀerent ways of comparing the two invariant

subspaces

X=spanI

0,XE=spanI

ZE

of the matrices Aand A+E, respectively. If σ1≥σ2≥ ··· ≥ σkdenote the

singular values of ZE, then the ith canonical angle between Xand XEis given by

θi(X,XE) = arctan σi.Deﬁning Θ(X,XE) := diag(θ1,...,θ

k), it is well known [21,

sect. II.4] that sin(Θ(X,XE))generates a metric on the k-dimensional subspaces

of Cnfor any unitarily invariant matrix norm ·. However, it is sometimes more

convenient to simply use

(2.8) ZE=tan(Θ(X,XE))

for measuring the distance, which remains close to sin(Θ(X,XE))as long as ZE

is small. The ﬁrst-order result

(2.9) tan(Θ(X,XE))=ZE=E21

sep(A11,A

22)+O(E2)

is now readily obtained from Corollary 2.4. This also conﬁrms that sep(A11,A

22)−1

is the condition number of X[2].

2.4. Nonasymptotic results. The derivation of nonasymptotic results requires

a more careful study of the quadratic matrix equation f(A+E,ZE) = 0 with fas

in (2.5) and

(2.10) A+E=A11 +E11 A12 +E12

E21 A22 +E22.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

604 MICHAEL KAROW AND DANIEL KRESSNER

In particular, Ais assumed to be block upper triangular. Then Stewart’s result [18,

Thm. 4.1] see also [21, Thm. 2.7]) reads as follows.

Theorem 2.5. Let ·denote a consistent family of norms (i.e. XY≤

XYwhenever the matrix product XY is deﬁned). Let A+E∈Cn×nbe parti-

tioned as in (2.10) and set

sE:= sep(A11,A

22)−E11−E22.

If sE>0and

(2.11) E21(A12+E12)<s

E/4,

then there exists a unique solution ZEof f(A+E,ZE)=0satisfying

(2.12) ZE≤ 2E21

sE+s2

E−4E21(A12+E12)<2E21

Interestingly, Theorem 2.5 can be derived directly from the Newton–Kantorovich

theorem; see Appendix A.

Remark 2.6. For a unitarily invariant family of norms ·, the result of Theo-

rem 2.5 still holds if sEis replaced with ˜sE=sep(A11,A

22)−E112−E222,E12

is replaced with E122,andA12is replaced with A122. See Appendix A.

A diﬀerent analysis for the case of the Frobenius norm has been given by

Demmel [8]; see also [16].

Theorem 2.7 (see [16, Thm. 1.15]). Using the notation of Theorem 2.5, assume

that

(2.13) EF<sepF(A11,A

22)

4P2

where Pdenotes the spectral projector of Abelonging to Λ(A11). Then there exists a

unique solution ZEof f(A+E,ZE)=0satisfying

ZEF<4EF

sepF(A11,A

22)−4P2EF

A comparison of Theorems 2.5 and 2.7 with our results is given in section 3.5.

3. Main result. Our main result admits the use of diﬀerent norms for measuring

the perturbation Eand the solution Z. More speciﬁcally, we consider two norms

·:C(n−k)×k→Rand |||·|

||:Cn×n→Rthat satisfy the inequalities

(N1) Z2≤Z,(N2) UZV≤U2ZV2,

(N3) E2≤|

||E|||,(N4) XEY≤X2||

|E|||Y2

for all E∈Cn×n,Z∈C(n−k)×k,X∈Ck×n,Y ∈Cn×k,U∈C(n−k)×(n−k),and

V∈Ck×k. Condition (N2) is equivalent to requiring that ·is unitarily invariant.

In particular, there is a symmetric gauge function Φ such that

Z=Φ(σ1(Z),...,σ

min{k,n−k}(Z)),

where σ1(Z)≥ ··· ≥ σmin{k,n−k}(Z) are the singular values of Zin nonincreas-

ing order; see [3, Thm. IV.2.1]. Condition (N1) is equivalent to requiring that

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 605

Φ(1,0,...,0) ≥1. If we now deﬁne |||E|||for E∈Cn×nby the same formula as

Z,thatis

||E|||=Φ(σ1(E),...,σ

min{k,n−k}(E)),

then conditions (N1)–(N4) are all satisﬁed. It is important to note that only min{k,n−

k}of the nsingular values of Eare involved in the deﬁnition of |||E|||.

We remark that ||

|·|

||need not be unitarily invariant. For instance, the conditions

(N1)–(N4) are also valid if ·=·Fis the Frobenius norm and |||E|||=i,j |eij|.

Theorem 3.1. Consider the block triangular matrix

A=A11 A12

0A22,A

11 ∈Ck×k,A

22 ∈C(n−k)×(n−k),

and assume that Λ(A11)∩Λ(A22)=∅.Let·and |||·|

||be norms on C(n−k)×kand

Cn×n, respectively, that satisfy conditions (N1)–(N4).Let

s:= sep(A11,A

22)= min

Z=1 ZA11 −A22Z.

For ε≥0deﬁne g(ε)=ε(ε+A122)and let ρ≥0be such that g(ρ)=s

2, i.e.,

ρ=1

2(s2+A122

2−A122).Finally,letBρ:= {E∈Cn×n||

||E|||<ρ}. Then the

following statements hold:

(a) For all ε≥0,Λε(A)⊆Λg(ε)(A11)∪Λg(ε)(A22).

(b) If ε<ρ,thenΛg(ε)(A11)∩Λg(ε)(A22)=∅.

BρE−→ ZE∈C(n−k)×k

with the following properties:

(i) The columns of [IZ



E]span a right invariant subspace XEof A+E.

(ii) The rows of [−ZEI]span a left invariant subspace YEof A+E.

(iii) The spectrum of the restriction of A+Eto XEis contained in the

pseudospectrum Λg(E2)(A11). The spectrum of the restriction of A+E

to YEis contained in the pseudospectrum Λg(E2)(A22).

(iv) The matrix ZEsatisﬁes

(3.1) ZE≤ 2|||E|||

s+s2−4|||E|||(|||E|||+A122)≤2

s|||E|||

as well as

(3.2) ZE−T−1E21≤ 6

s2E2||

|E|||,

where T:Z→ ZA11 −A22Z.

Remark 3.2. Inequality (3.2) gives a bound for the remainder O(E2)inthe

ﬁrst-order expansion (2.7).

Remark 3.3. The ﬁrst bound in (3.1) looks quite complicated. A slightly weaker

but more appealing estimate for ZEis

(3.3) ZE≤|||E|||

s1+|||E|||

ρ,

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

606 MICHAEL KAROW AND DANIEL KRESSNER

which can be derived as follows. Setting ψ()=2/(s+s2−4(+A12)) the ﬁrst

bound in (3.1) can be written as ZE≤ψ(|||E|||)|||E|||. A direct computation yields

ψ()≥0. Thus, ψis a convex function. It follows that ψ()≤ψ(0) + (/ρ)(ψ(ρ)−

ψ(0)) = s−1(1+(/ρ)).This shows (3.3).

3.1. Proof of Theorem 3.1. Statement (a) of Theorem 3.1 has been shown

by Grammont and Largillier [10, Proposition 3.1]; see also [13]. Statement (b) is a

consequence of Proposition 2.1(b)(iii). It remains to prove statement (c), in particular

the upper bound (3.1). For this purpose, we will ﬁrst derive an auxiliary result showing

that this upper bound and a certain lower bound are mutually exclusive. The main

part of the proof then consists of ruling out the lower bound by a continuity argument.

In the following, we will write Zinstead of ZEfor notational convenience.

Lemma 3.4. With the notation and assumptions stated in Theorem 3.1, deﬁne

r+(ε)=2ε/s+s2−4ε(ε+A122),r

−(ε)=2ε/s−s2−4ε(ε+A122).

If |||E|||≤ρand f(A+E,Z)=0,thenZ≤r+(|||E|||)or Z≥r−(|||E|||).

Proof. By direct computation, it can be veriﬁed that r−(ε)andr+(ε)arethe

zeros of the quadratic polynomial pε(r):=(ε+A122)r2−sr+ε. Since its leading

coeﬃcient is positive, we have

(3.4) pε(r)<0⇔r+(ε)<r<r

−(ε).

The assumption Λ(A11)∩Λ(A22)=∅of Theorem 3.1 implies that the Sylvester

operator T:Z→ ZA11 −A22Zis nonsingular, and hence s=T−1−1. The equation

f(A+E,Z) = 0 can be written as

T(Z)=[−ZI]EI

Z−ZA12Z

or, equivalently,

Z=T−1[−ZI]EI

Z−ZA12Z.

Using (2.3) and the fact that [−ZI]2=[IZ

]2=1+Z2

2≤1+Z2,

we conclude that Z≤|||E|||(1 + Z2)+A122Z2/s, which is equivalent to

0≤p|

|E|

|(Z). Thus, the claim follows from (3.4). Note that the upper bound

Z≤r+(|||E|||) in Lemma 3.4 is equivalent to the ﬁrst inequality in (3.1).

Lemma 3.5. For E∈B

ρlet IEdenote the set of t∈[0,1] such that there exists

amatrixZwith the following properties:

(α)f(A+tE, Z)=0;

(β)Λ

1(t)⊆Λg(E2)(A11),whereΛ1(t):=Λ(A11 +t(E11 +E12Z));

(γ)Λ

2(t)⊆Λg(E2)(A22),whereΛ2(t):=Λ(A22 +t(E22 −ZE21));

(δ)Z≤r+(|||E|||).

Then 1∈IE.

Proof. The proof of the statement proceeds via analytic continuation in three

steps.

Step 1. Since the conditions (α)–(δ) hold for t=0andZ= 0, it follows that

0∈IE.

Step 2. We now claim that there exists ε>0 such that [ˆ

t, ˆ

t+ε)⊂IEfor any

t∈IEwith ˆ

t<1.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 607

Let ˆ

Zbe such that f(A+ˆ

tE, ˆ

Z) = 0. The pseudospectra Λg(E2)(A11)and

Λg(E2)(A22) are disjoint by Theorem 3.1(b). Thus Λ1(ˆ

t)andΛ

2(ˆ

t)aredisjoint,too.

Hence, Lemma 2.3 applied to A+ˆ

tE implies that there exist ε>0 and a holomorphic

function

[ˆ

t, ˆ

t+ε)t−→ Zt∈Cm×l

such that f(A+tE, Zt)=0andZˆ

t=ˆ

Z. We may assume that ˆ

t+ε<1. The set

Λ1(t) is the spectrum of A+tE restricted to the right invariant subspace [IZ



t].

Thus,

Λ1(t)⊂Λ(A+tE)⊂Λg(|

|E|

|)(A11)∪Λg(|

|E|

|)(A22).

However, since the latter pseudospectra are disjoint closed sets it follows from Λ1(ˆ

t)⊆

Λg(|

|E|

|)(A11) and the continuity of eigenvalues that Λ1(t)⊆Λg(|

|E|

|)(A11) for all t∈

[ˆ

t, ˆ

t+ε). Analogously, we conclude that Λ2(t)⊆Λg(|

|E|

|)(A22) for all t∈[ˆ

t, ˆ

t+ε).

It remains to verify property (δ). By Lemma 3.4, we have for every tthat Zt≤

r+(t|||E|||)≤r+(|||E|||)orZt≥r−(t|||E|||)≥r−(|||E|||). Since the former inequality holds

for t=ˆ

tand r+(|||E|||)<r

−(|||E|||), the continuity of t→ Ztimplies that Zt≤r+(|||E|||)

for all t∈[ˆ

t, ˆ

t+ε). This establishes the claim.

Step 3. We now claim that the set IEis closed.

Let (tj) be a sequence in IEwith limit t∗. Then there exists a sequence (Zj)

such that the pairs (tj,Z

j) satisfy (α)–(δ). In particular Zj≤r+(|||E|||) for all j.

By compactness the sequence (Zj) has a convergent subsequence. Let Z∗denote its

limit. Then (t∗,Z

∗)satisﬁes(α)–(δ). In particular, the properties (β)and(γ)for

(t∗,Z

∗) are consequences of the continuity of eigenvalues and the closedness of the

pseudospectra Λg(E2)(A11),Λg(E2)(A22). This establishes the claim.

From Step 3, together with Steps 1 and 2, it follows that 1 = sup IE∈IE.

ProofofTheorem3.1(c). By applying Lemma 2.2 to A+E, each of the conditions

(c)(i) and (c)(ii) of Theorem 3.1 is equivalent to property (α) of Lemma 3.5 for t=1.

Again by Lemma 2.2, condition (c)(iii) is equivalent to the properties (β)and(γ)for

t= 1. Thus, Lemma 3.5 establishes the existence of Zsatisfying (c)(i)–(c)(iii) as well

as the ﬁrst inequality in (3.1).

Next we prove uniqueness of any such matrix Z. For this purpose, suppose that

f(A+E,Z)=f(A+E, ˜

Z)=0. Then

0=f(A+E,Z)−f(A+E, ˜

=E21 +(A22 +E22)Z−Z(A11 +E11)−Z(A12 +E12)Z

−E21 +(A22 +E22)˜

Z−˜

Z(A11 +E11)−˜

Z(A12 +E12)˜

Z

=(A22 +E22 −˜

Z(A12 +E12))

 

=: ˜

A22

(Z−˜

Z)−(Z−˜

Z)(A11 +E11 +(A12 +E12)Z)

 

=: ˜

A11

Since both Zand ˜

Zsatisfy properties (β)and(γ)fort= 1, the spectra Λ( ˜

A22)and

Λ( ˜

A11) are disjoint. Thus, Z−˜

Z=0.

In summary, we have shown the existence and uniqueness of a function E→ ZE

satisfying conditions (i)–(iii) in Theorem 3.1. Lemma 2.3 implies that this function is

holomorphic. It remains to prove the inequality (3.2). The relation f(A+E,ZE)=0

is equivalent to

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

608 MICHAEL KAROW AND DANIEL KRESSNER

ZE=T−1(E21 +E22ZE−ZEE11 −ZEE12ZE).

Furthermore, by (3.1) we have ZE≤2|||E|||/s < 2ρ/s ≤1. Thus,

ZE−T−1E21=T−1(E22ZE−ZEE11 −ZEE12ZE)

≤1

s(2E2ZE+E2ZE2)

≤3

sE2ZE≤ 6

s2E2||

|E|||.

This concludes the proof of Theorem 3.1.

3.2. The 2×2case. In this section we show for a 2×2 example that the bounds

in Theorem 3.1 are sharp. Let

A=−s/2c

0s/2,E=0ε

−ε0,s>0,c,ε≥0.

Then sis the separation of the diagonal elements of Aand E2=ε.Furthermore,

let ρ=1

2(√s2+c2−c). Then ρ(ρ+c)=s2/4. The ε-pseudospectrum of Acan be

calculated as

Λε(A)={z∈C|σmin(zI −A)≤ε}

=z∈C|1

2z−s

2+z+s

22+c2−z−s

2−z+s

22+c2≤ε

;

see also [13]. By Theorem 3.1(a),

Λε(A)⊆D

√ε(ε+c)(−s/2) ∪D

√ε(ε+c)(s/2),(3.5)

where Dr(z)⊂Cdenotes the closed disk of radius r≥0 around z∈C.Ifε<ρ,

then ε(ε+c)<s/2 and the disks in (3.5) are disjoint. Hence, the pseudospectrum

Λε(A) has two connected components. For ε=ρthe disks in (3.5) touch each other

at 0 ∈C. From (3.5) it follows that also 0 ∈σρ(A). Hence, σε(A) has only one

connected component for ε≥ρ. The eigenvalues of A+Eare

λ±(ε)=±1

2s2−4ε(ε+c).

These eigenvalues lie close to the boundary of Λε(A). The situation is illustrated in

Figure 1, where the shaded regions represent pseudospectra, the circles represent the

boundaries of the disks D√ε(ε+c)(±s/2), and the dots mark the eigenvalues λ±(ε).

A right eigenvector to the eigenvalue λ−(ε) is given by [1 zε],wherezε=2ε/(s+

s2−4ε(ε+c)). If ε<ρ, then the eigenvalues are real and distinct. If ε=ρ,then

A+Eis similar to a Jordan block. More speciﬁcally, in this case we have

A+E=1−2/s

2ε/s 001

 1−2/s

2ε/s 0−1

If ε>ρ, then the eigenvalues λ±(ε) are purely imaginary. Note that the function

ε→ zεis not diﬀerentiable at ε=ρ.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 609

−s/2 0 s/2

−s/2

s/2

(ε(ε+c))1/2

Case ε < 0.5*((s2+c2)1/2−c)

−s/2 0 s/2

−s/2

s/2

(ε(ε+c))1/2

Case ε = 0.5*((s2+c2)1/2−c)

−s/2 0 s/2

−s/2

s/2

(ε(ε+c))1/2

Case ε > 0.5*((s2+c2)1/2−c)

Fig. 1.ε-pseudospectra of a 2×2matrix for three values of ε.

3.3. A bound in terms of the spectral decomposition. The purpose of this

section is to derive a bound based on the block diagonalization of A. We assume the

setting of Theorem 3.1. In particular, Ais block triangular and Λ(A11)∩Λ(A22)=∅.

Then there exists a unique R∈Ck×(n−k)such that RA22 −A11R=A12.Wehave

AR

I=R

IA22.

Hence, the columns of [RI]span a right invariant subspace Xcof Awhich is

complementary to X= range([I0]). The projector onto Xalong Xcis the spectral

projector P=I−R

.Let

(3.6) p=1+R2

2,κ=p+R2=p+p2−1,G=IR/p

0I/p.

Then p=P2,κ=G2G−12is the condition number of G[8] and

A=Gdiag(A11,A

22)G−1,G

−1=I−R

0pI.

Furthermore, we have

R2= (tan ϕ)−1,p=(sinϕ)−1,κ= (tan ϕ

2)−1,

where ϕis the smallest angle between the subspaces Xand Xc; see [7].

With these preparations we are in a position to state and prove the following

theorem.

Theorem 3.6. Let Aand sbe deﬁned as in Theorem 3.1,andlet·and |||·|

be unitarily invariant norms on C(n−k)×kand Cn×n, respectively, satisfying the con-

ditions (N1)–(N4).LetRand κbe deﬁned as above. If |

||E|||<s/(2κ), then there

exists a unique WE∈C(n−k)×k, depending holomorphically on Ewith the following

properties:

(i) The columns of [IR

0I][ I

WE]span a right invariant subspace XEof A+E.

(ii) The rows of [−WEI][ I−R

0I]span a left invariant subspace YEof A+E.

(iii) The spectrum of the restriction of A+Eto XEis contained in the pseu-

dospectrum ΛκE2(A11). The spectrum of the restriction of A+Eto YEis

contained in the pseudospectrum ΛκE2(A22).

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

610 MICHAEL KAROW AND DANIEL KRESSNER

(iv) The matrix WEsatisﬁes

(3.7) WE≤2κ

sp|||E|||

and

(3.8) WE−T−1(E21)≤6κ2

2E2|

||E|||,

where T:Z→ ZA11 −A22Z.

(v) Let ZE=WE(I+RWE)−1=(I+WER)−1WE. Then the columns of

[IZ



E]span XE, the rows of [−ZEI]span the subspace YE,and

(3.9) ZE≤

2κ

p|||E|||

s−2κ

pR2|

||E|||.

Proof.Letˆ

A= diag(A11,A

22)and ˆ

E=G−1EG.ThenA+E=G(ˆ

A+ˆ

E)G−1.

Furthermore, the equivalences

(3.10) (ˆ

A+ˆ

E)U=UL ⇔(A+E)(GU)=(GU)L,

V(ˆ

A+ˆ

E)=MV ⇔(VG

−1)(A+E)=M(VG

−1)

hold for any U∈Cn×k,L∈Ck×k,V∈C(n−k)×n,M∈C(n−k)×(n−k).Thus,the

columns of GU span a right invariant subspace of A+Eif and only if the columns of

Uspan a right invariant subspace ˆ

A+ˆ

E. Furthermore, the rows of VG

−1span a left

invariant subspace of A+Eif and only if the rows of Vspan a left invariant subspace

A+ˆ

E. Suppose ||

|E|||<s/(2κ). Then, since |||·|

||is unitarily invariant,

(3.11) |

||ˆ

||≤κ|||E|||<s/2.

Hence, according to Theorem 3.1 there exists a unique Zˆ

E∈C(n−k)×kdepending

holomorphically on ˆ

Ewith the following properties:

(i)Thecolumnsof[IZ



E]span a right invariant subspace Xˆ

Eof ˆ

A+ˆ

(ii)Therowsof[−Zˆ

EI] span a left invariant subspace Yˆ

Eof ˆ

A+ˆ

(iii) The spectrum of the restriction of ˆ

A+ˆ

Eto Xˆ

Eis contained in the pseu-

dospectrum Λˆ

E2(A11). The spectrum of the restriction of A+Eto Yˆ

Eis

contained in the pseudospectrum Λˆ

E2(A22).

(iv)ThematrixZˆ

Esatisﬁes

Zˆ

E≤2

s|||ˆ

E|||as well as Zˆ

E−T−1(ˆ

E21)≤ 6

s2ˆ

E2|||ˆ

E|||.

Let U=[IZ



E],V=[−Zˆ

EI], and WE=Zˆ

E/p.Then

IR

0II

WE=GU, −WEII−R

0I=VG

−1.

Hence, the claims (i)–(iv) follow from (i)–(iv), the equivalences (3.10), the inequality

(3.11), and the fact that ˆ

E21 =pE

21.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 611

Now, redeﬁne ZEas ZE:= WE(I+RWE)−1=(I+RWE)−1WE. The ﬁrst

statements of claim (v) follow from the identities

IR

0II

WE=I

ZE(I+RWE),−WEII−R

0I=(I+WER)−ZEI.

Since ZE=WE(I+RWE)−1≤WE/(1 −R2WE) the inequality (3.9) is a

consequence of (3.7).

Remark 3.7. If Eand the underlying norms fulﬁll the assumptions of both

Theorem 3.1 and Theorem 3.6, then the solution matrices ZEestablished in these

theorems are identical. This follows from the uniqueness statement in Theorem 3.1.

Note that the bound (3.1) is tighter than the bound (3.9).

3.4. The case of the Frobenius norm. We now specialize our bounds to the

case that the underlying norm is the Frobenius norm ·Fon C(n−k)×k. To this end,

we deﬁne a unitarily invariant norm on Cn×nby

(3.12) EF,k =⎛

⎝

min{k,n−k}



j=1

σj(E)2⎞

⎠

1/2

where σ1(E)≥σ2(E)≥ ··· ≥ σn(E) denote the singular values of E∈Cn×nin

nonincreasing order. Note that EF,k ≤EFwith equality if and only if rank E≤

k. Since conditions (N1)–(N4) are satisﬁed for ·=·Fand |||·|

||=·F,k,wehave

the following corollary to Theorems 3.1 and 3.6.

Corollary 3.8. Let A∈Cn×nbe partitioned as in Theorem 3.1.LetR,

κ,andpbe deﬁned as above, and let s=sep

F(A11,A

22). Furthermore, let ρ=

2(s2+A122

2−A122).

(i) If EF,k <ρ, then then the matrix ZEin Theorem 3.1 satisﬁes

(3.13) ZEF≤s

2EF,k.

(ii) If EF,k <s

2κ, then the matrix ZEin Theorem 3.6 satisﬁes

(3.14) ZEF≤

2κ

pEF,k

s−2κ

pR2EF,k

If Esatisﬁes both conditions, EF,k <ρand EF,k <s

2κ, then the matrices ZEin

(i) and (ii) are identical.

3.5. Comparison with existing results.

Comparison with Theorem 2.7.Since always EF,k ≤EF,R2≤p,and

κ≤2p, the bound (3.14) is an improvement of Demmel’s bound in Theorem 2.7.

For the particular case that Ais block diagonal (i.e., A12 =0)wehavethat

κ=p=1,R= 0, and (3.14) as well as (3.13) state

ZEF≤2EF,k

sif EF,k ≤s

where s=sep

F(A11,A

22). In contrast, Theorem 2.7 yields

ZEF≤4EF

sif EF≤s

On the other hand, if R2is large, then the constants in the bound (3.14) and

in Demmel’s bound are nearly identical since then R2≈pand 2κ/p ≈4.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

612 MICHAEL KAROW AND DANIEL KRESSNER

Comparison with Theorem 2.5.It is not that easy to compare our bound (3.1)

with Stewart’s bound (2.12) from Theorem 2.5. First, the bound (3.1) holds under

the condition |||E|||(|||E|||+A12)<s

2/4, while Stewart’s bound requires E21(E12+

A12)<s

E/4, where s=sep(A11,A

22)andsE=s−E11−E22. Hence, these

bounds have a diﬀerent range of applicability. The advantage of the bound in Theo-

rem 3.1 over Stewart’s result is that it has the separation sin the denominator instead

of the smaller number sE.

On the other hand, (3.1) works with the norm of the whole matrix Eand Stewart’s

bound involves the norms of the blocks Eij , which are smaller than |||E|||.Inparticular,

Stewart’s bound properly predicts ZE=0whenE21 = 0. Our bound (3.1) does not

reﬂect this fact.

Following advice by Stewart, we performed several numerical experiments with

random matrices and perturbations. Not surprisingly, in these experiments Stewart’s

bound is observed to be tighter than ours in most cases when |||E|||is small compared

to s/2. The same observation is made when A12is not signiﬁcantly smaller than s.

It turns out that our new bound becomes advantageous when A12 =0,theper-

turbations are measured in the spectral norm, and |||E|||is not small compared with s/2.

To demonstrate this, let us consider an n×nblock diagonal matrix Awith k×kand

(n−k)×(n−k) diagonal blocks. With A12 = 0, the perturbation bounds depend on

Aonly via the separation sand they scale nearly proportionally with 1/s. It therefore

suﬃces to consider one value for s,say,s= 1. We have then chosen ε∈(0,s/2] and

generated a large set of random perturbations (10,000 for n= 7 and 1000 for n= 50)

having normally distributed entries and norm ε. The plots in Figure 2 show the

percentage of cases our bound is smaller than Stewart’s bound. Note that “strong”

refers to the stronger bounds (i.e., the ﬁrst inequalities) while “weak” refers to the

weaker but simpler bounds (i.e., the second inequalities) in (2.12) and (3.1). We have

considered the spectral norm as well as the Frobenius norm for measuring pertur-

bations. In the latter case, we have used the tighter bound (A.4) instead of (2.12)

and set |||E|||:= EF,k =σ1(E)2+···+σ2

min{k,n−k}(E). For the spectral norm,

Figures 2(a) and (c) show that our new bound performs better for |

||E|||close to s/2,

especially when kis larger. For the Frobenius norm, the new bound seems to be

better on average only for k=1andwhen|

||E|||is close to s/2.

The following example illustrates that Stewart’s bound can become very conser-

vative for |

||E|||≈s/2.

Example 3.9. For A11,A

22 ∈C2×2with disjoint spectra consider

A=A11 0

0A22,E

t=st

⎡

⎢

⎣

1000

0000

0100

0001

⎤

⎥

⎦=Et

11 Et

21 Et

22,

where 0 ≤t<1ands=sep

2(A11,A

22) is the separation with respect to the spectral

norm. Then Et2=Et

112=Et

222=Et

212=st/2, Et

122=0,andsE=

s−Et

112−Et

222=s(1−t). If ·=|||·|

||=·2, then our bound (3.1) and Stewart’s

bound in Theorem 2.5 are both applicable for t<1. As ttends to 1, Stewart’s bound

tends to inﬁnity, while our bound (3.1) tends to 1. If s=sep

F(A11,A

22) denotes the

separation with respect to the Frobenius norm and |||·|

||=·F,2, then our bound (3.1)

and Stewart’s bound in Theorem 2.5 are both applicable for t<1/√2. As ttends

to 1/√2, Stewart’s bound tends to 1/(23/2−2) ≈1.207, while our bound (3.1) tends

to 1.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 613

0 0.1 0.2 0.3 0.4 0.5

100

norm of random perturbation E

%new bound is better

strong (k=1)

weak (k=1)

strong (k=2)

weak (k=2)

strong (k=3)

weak (k=3)

(a) n= 7, spectral norm

0 0.1 0.2 0.3 0.4 0.5

100

norm of random perturbation E

%new bound is better

strong (k=1)

weak (k=1)

strong (k=2)

weak (k=2)

strong (k=3)

weak (k=3)

(b) n= 7, Frobenius norm

0 0.1 0.2 0.3 0.4 0.5

100

norm of random perturbation E

%new bound is better

strong (k=1)

weak (k=1)

strong (k=2)

weak (k=2)

strong (k=20)

weak (k=20)

0 0.1 0.2 0.3 0.4 0.5

100

norm of random perturbation E

%new bound is better

strong (k=1)

weak (k=1)

strong (k=2)

weak (k=2)

(d) n= 50, Frobenius norm

Fig. 2.Performance of new perturbation bound (3.1) compared to Stewart’s bounds (2.12)

(spectral norm) or (A.4) (Frobenius norm) for block diagonal matrices and random perturbations.

4. The case of a simple eigenvalue. The theorem below gives the second-

order expansion of a simple eigenvalue as well as the ﬁrst-order expansion of the

associated eigenvector. These expansions are well known and can be found, e.g.,

in [17]. Our novel contributions consist of the existence regions and the bounds for

the remainders in the expansion. Below, M†and Mdenote the Moore–Penrose

inverse and the Drazin inverse of M∈Cn×n, respectively. Furthermore, θ(x, y)=

arccos(|x∗y|/(x2y2)) denotes the angle between the one-dimensional subspaces

Cxand Cy.

Theorem 4.1. Let x0∈Cnbe a normalized right eigenvector of A∈Cn×n

belonging to a simple eigenvalue λ0∈C(i.e., Ax0=λ0x0,x02=1). Let y0∈C

be a left eigenvector such that y∗

0A=λ0y∗

0and y∗

0x0=1. Moreover, let

c=x∗

0(λ0I−A)2,

P0=I−x0x∗

p0=y02,

κ0=y02+y02

2−1,

s0=σn−1(P0(λ0I−A)),

where σn−1(·)denotes the second smallest singular value.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

614 MICHAEL KAROW AND DANIEL KRESSNER

(i) If E2<1

2(s2

0+c2−c), then there exists an eigenvector xEof A+E

depending holomorphically on Esuch that x∗

0xE=1and

xE=(I+[P0(λI −A)]†E)x0+ξE

for some ξE∈Cnwith

ξE2≤6

0E2

Furthermore,

xE−x02=tan(θ(x0,x

E)) ≤2E2/s0.

(ii) If E2<s

0/(2κ0), then there exists an eigenvector ˜xEof A+Edepending

holomorphically on Esuch that y∗

0˜xE=1and

˜xE=x0+(λ0I−A)Ex0+˜

ξE

for some ˜

ξE∈Cnwith

(4.1) ˜

ξE2≤6κ2

0E2

Furthermore,

˜xE−x02≤2κ0

s0E2and tan(θ(x0,˜xE)) ≤

2κ0

p0E2

s0−2κ0

p0p2

0−1E2

(4.2)

The corresponding eigenvalue of A+Esatisﬁes

λE=λ0+y∗

0E˜xE

(4.3)

=λ0+y∗

0Ex0+E

=λ0+y∗

0Ex0+y∗

0E(λ0I−A)Ex0+˜

E

with E=y∗

0(˜xE−x0)and ˜

E=y∗

0E˜

ξE. We have

(4.4) |E|≤2p0κ0

s0E2

2and |˜

E|≤6p0κ2

0E3

Proof. After a unitary similarity transformation we may assume that

A=λ0A12

0A22,A

22 −λ0Inonsingular,x

0=1

0,y

∗

0=[1 r]∈C1×n,

where r=A12(λ0I−A22)−1.Furthermore,

(λ0I−A)=0A12(λ0I−A22)−2

0(λ0I−A22)−1=0r(λ0I−A22)−1

0(λ0I−A22)−1,

P0(λ0I−A22)†=00

0λ0I−A22†

=00

0(λ0I−A22)−1.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 615

The statements of the theorem are obtained by specializing Theorems 3.1 and 3.6 to

the case A11 =λ0,·=|||·|

||=·2. In this case we have the following identities:

T(z)=(λI −A22)z,

T−1(E21)=(λI−A22)−1E21,

A122=x∗

0(λ0I−A)2=c,

p=1+r2

2=y02=p0,

κ=1+r2

2+r2=y02+y02

2−1=κ0,

sep2(λ0I,A22)=σmin(λ0I−A22)=σn−1(P0(λ0I−A22)) = s0.

Let E∈Cn×nwith E2<1

2(s2

0+c2−c). According to Theorem 3.1 there exists

a vector zE∈Cn−1depending holomorphically on Esuch that xE=[1 z

E]is an

eigenvector of A+E,zE2≤2

s0E2and zE−T−1(E21)2≤6E2

2/s2

0. Clearly,

x∗

0xE=1andzE2=tan(θ(x0,x

E)). It is straightforward to verify that

ξE2=xE−(I+[P0(λI −A)]†E)x02=zE−T−1(E21)2.

This concludes the proof of (i).

To show (ii), suppose that E2<κ

0/(2s0). According to Theorem 3.6 there ex-

ists a vector wE∈Cn−1depending holomorphically on Esuch that ˜xE=1r

0I1

wE

is an eigenvalue of A+Eand

wE2≤2κ0

p0s0E2as well as wE−T−1(E21)2≤6κ2

p0s2

0E2

It is easily veriﬁed that

˜xE−x0=r

IwEand

ξE=˜xE−(x0+(λ0I−A)Ex0)=r

I(wE−T−1(E21)).

This yields (4.1) and the ﬁrst inequality in (4.2), since [rI]2=p0.Wehave

˜xE=(1+rwE)1(1+rwE)−1w

E.Thus, tan(θ(x0,˜xE)) = (1+ rwE)−1wE2≤

wE2/(1 −r2wE2). This implies the second inequality in (4.2). Equation (4.3)

follows by multiplying the identity (A+E)˜xE=λE˜xEwith y∗

0. The estimates in

(4.4) are then obvious.

Let A∈Cn×nbe a normal matrix. Then we can take y0=x0in Theorem

4.1. Moreover, the separation s0=σn−1(P0(λ0I−A)) = σn−1(λ0I−A)equals

the distance of λ0to the set Λ(A)\{λ0}, and the identities a=0,p0=κ0=1,

[P0(λ0I−A)]†=(λ0I−A)†=(λ0I−A)hold. We thus have the following corollary

to Theorem 4.1.

Corollary 4.2. Let A∈Cn×nbe a normal matrix and consider a normalized

eigenvector x0∈Cnbelonging to a simple eigenvalue λ0of A.Lets0denote the

distance of λ0to the rest of the spectrum of A0, that is, s0=min{|λ0−ν|:ν∈

Λ(A),ν =λ0}.LetE∈Cn×nwith E2<s

0/2. Then there exists an eigenvector

xEof A+Edepending holomorphically on Esuch that x∗

0xE=1and

xE=(I+(λI −A)†E)x0+ξE

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

616 MICHAEL KAROW AND DANIEL KRESSNER

for some ξE∈Cnwith ξE2≤6E2

2/s2

0.Furthermore, xE−x02=tan(θ(x0,x

E)) ≤

2E2/s0.The associated eigenvalue satisﬁes

λE=λ0+x∗

0ExE=λ0+x∗

0Ex0+E=λ0+y∗

0Ex0+x∗

0E(λ0I−A)†Ex0+˜

E

with |E|≤2E2

2/s0and |˜

E|≤6E3

2/s2

5. Conclusions. By establishing a link to the coalescence of pseudospectral

components, we have derived a new perturbation bound for invariant subspaces. As

the bound turns out to be sharp for a 2×2 example, no further obvious improvement

of the bound seems to be possible. Moreover, we establish a novel bound for the

remainder term of a well-known perturbation expansion. Even the (modiﬁed) special-

ization of this remainder bound to the case of individual eigenvectors and eigenvalues

appears to be new. We believe that such bounds for remainder terms are important;

e.g., they can be used for quantifying the validity for condition numbers frequently

used in practice, e.g., in MATLAB and LAPACK [2].

As a side result, we have shown that Stewart’s classical result on the perturbation

of invariant subspaces is a direct consequence of the Newton–Kantorovich theorem.

We believe that there is some interest in this observation, as it may more easily allow

for extensions of Stewart’s result to diﬀerent settings.

Appendix A. Stewart’s result via the Newton--Kantorovich theorem. In

this section, we show that Theorem 2.5 is a special case of the Newton–Kantorovich

theorem formulated in [12, p. 536].

Theorem A.1. Let E,Zbe Banach spaces and let f:Z→Ebe twice continu-

ously diﬀerentiable in a suﬃciently large neighborhood Ωof Z∈Z. Suppose that there

exists a linear operator T:Z→Ehaving a continuous inverse T−1and satisfying the

following conditions:

T−1(F(Z))≤η,(A.1)

T−1◦F(Z)−I≤δ,(A.2)

T−1◦F(˜

Z)≤K∀˜

Z∈Ω.(A.3)

If δ<1and h:= ηK

(1−δ)2<1

2, then there exists a solution ZEof F(ZE)=0such that

ZE−Z≤r0with r0:= 2η

(1 −δ)(1 + √1−2h).

We apply Theorem A.1 to the setting of section 2.4:

0=F(Z):=−f(A+E,Z)=−E21 +Z(A11 +E11)−(A22 +E22)Z+Z(A12 +E12)Z

with E=Z=C(n−k)×k,Z=0,andT:Z→ ZA11 −A22Z. In the following, ·

denotes a consistent family of matrix norms.

Condition (A.1). We have

T−1(F(0))=T−1(E21)≤E21

s=: η,

where s=sep(A11,A

22).

Condition (A.2). From

F(0) : Z→ (A22 +E22)Z−Z(A11 +E11)=T(Z)+T(Z)

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 617

with T(Z):=E22 ·Z−Z·E11, it follows that

T−1◦F(0) −I=T−1◦T≤E11+E22

s=: δ.

Condition (A.3). Since the second derivative of fis constant, it immediately

follows that

T−1◦F(˜

Z)≤2A12 +E12

s≤2A12+E12

s=: K.

Summary. Setting sE=s−E11−E22, we ﬁnally obtain

h=ηK

(1 −δ)2=2E21(A12+E12)

r0=2η

(1 −δ)(1 + √1−2h)=2E21

sE+s2

E−4E21(A12+E12).

Theorem A.1 now states the existence of a solution ZEto F(Z)=−f(A+E,ZE)=0

with ZE≤r0if δ<1andh<1

2. This coincides precisely with the statement of

Theorem 2.5.

Extension. The statement of Theorem 2.5 can be improved when we assume that

·is unitarily invariant. In this case, the quantities δ, K in the derivation above can

be replaced by the potentially smaller quantities

δ=E112+E222

s,K=2

A122+E122

Consequently, the bound of Theorem 2.5 becomes

(A.4) ZE≤ 2E21

sE+s2

E−4E21(A122+E122)<2E21

under the condition E21(A122+E122)<s

E/4 with sE=s−E112−E222.

Acknowledgments. The authors thank Pete Stewart for very helpful discus-

sions. Parts of this work were prepared while the ﬁrst author was visiting FIM (In-

stitute for Mathematical Research) at ETH Zurich. The generous hospitality of FIM

is gratefully acknowledged.

REFERENCES

[1] R. Alam and S. Bora,On stable eigendecompositions of matrices,SIAMJ.MatrixAnal.

Appl., 26 (2005), pp. 830–848.

[2] Z. Bai, J. W. Demmel, and A. McKenney,On computing condition numbers for the non-

symmetric eigenproblem, ACM Trans. Math. Software, 19 (1993), pp. 202–223.

[3] R. Bhatia,Matrix Analysis, Springer-Verlag, New York, 1997.

[4] R. Byers,A LINPACK-style condition estimator for the equation AX −XBT=C, IEEE

Trans. Automat. Control, 29 (1984), pp. 926–928.

[5] F. Chatelin,Spectral Approximation of Linear Operators, Academic Press, New York, 1983.

[6] C. Davis and W. M. Kahan,The rotation of eigenvectors by a perturbation. III, SIAM J.

Numer. Anal., 7 (1970), pp. 1–46.

[7] J. W. Demmel,The condition number of equivalence transformations that block diagonalize

matrix pencils, SIAM J. Numer. Anal., 20 (1983), pp. 599–610.

[8] J. W. Demmel,Computing stable eigendecompositions of matrices, Linear Algebra Appl.,

79 (1986), pp. 163–193.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

618 MICHAEL KAROW AND DANIEL KRESSNER

[9] I. Gohberg, P. Lancaster, and L. Rodman,Invariant Subspaces of Matrices with Applica-

tions, Classics in Appl. Math. 51, SIAM, Philadelphia, 2006.

[10] L. Grammont and A. Largillier,On -spectra and stability radii, J. Comput. Appl. Math.,

147 (2002), pp. 453–469.

[11] M. Gu and M. L. Overton,An algorithm to compute Sepλ, SIAM J. Matrix Anal. Appl.,

28 (2006), pp. 348–359.

[12] L. V. Kantorovich and G. P. Akilov,Functional Analysis, 2nd ed., Pergamon Press, Oxford,

UK, 1982.

[13] M. Karow,Inclusion theorems for pseudospectra of block triangular matrices, in preparation.

[14] T. Kato,Perturbation Theory for Linear Operators, Classics in Math., Springer-Verlag, Berlin,

1995.

[15] S. G. Krantz and H. R. Parks,The Implicit Function Theorem,Birkh¨auser, Boston, 2003.

[16] D. Kressner,Numerical Methods for General and Structured Eigenvalue Problems, of Lect.

Notes Comput. Sci. Eng. 16, Springer-Verlag, Berlin, 2005.

[17] C. D. Meyer and G. W. Stewart,Derivatives and perturbations of eigenvectors,SIAMJ.

Numer. Anal., 25 (1988), pp. 679–691.

[18] G. W. Stewart,Error bounds for approximate invariant subspaces of closed linear operators,

SIAM J. Numer. Anal., 8 (1971), pp. 796–808.

[19] G. W. Stewart,Error and perturbation bounds for subspaces associated with certain eigen-

value problems, SIAM Rev., 15 (1973), pp. 727–764.

[20] G. W. Stewart,Smooth local bases for perturbed eigenspaces, UMIACS-TR-2012-08, CS-TR-

4010, May 2012, Department of Computer Science, University of Maryland.

[21] G. W. Stewart and J.-G. Sun,Matrix Perturbation Theory, Academic Press, New York,

1990.

[22] J.-G. Sun,Estimation of the separation of two matrices, J. Comput. Math., 2 (1984),

pp. 189–200.

[23] J.-G. Sun,Estimation of the separation of two matrices. II, J. Comput. Math., 3 (1985),

pp. 19–26.

[24] J.-G. Sun,Perturbation expansions for invariant subspaces, Linear Algebra Appl., 153 (1991),

pp. 85–97.

[25] J.-G. Sun,Stability and Accuracy: Perturbation Analysis of Algebraic Eigenproblems, Techni-

cal report UMINF 98-07, Department of Computing Science, University of Ume˚a, Ume˚a,

Sweden, 1998.

[26] L. N. Trefethen and M. Embree,Spectra and Pseudospectra. The Behavior of Nonnormal

Matrices and Operators, Princeton University Press, Princeton, NJ, 2005.

[27] J. M. Varah,On the separation of two matrices, SIAM J. Numer. Anal., 16 (1979),

pp. 216–222.

[28] H. Xu,Bounds on the separation of two matrices, J. Fudan Univ. Nat. Sci., 33 (1994), pp. 413–

420.

[29] S.-F. Xu,Lower bound estimation for the separation of two matrices, Linear Algebra Appl.,

262 (1997), pp. 67–82.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php