On a perturbation bound for invariant subspaces of matrices [original]

SIAM J. MATRIX ANAL. APPL.c

2014 Society for Industrial and Applied Mathematics

Vol. 35, No. 2, pp. 599–618

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES OF

MATRICES∗

MICHAEL KAROW†AND DANIEL KRESSNER‡

Abstract. Given a nonsymmetric matrix A, we investigate the eﬀect of perturbations on an

invariant subspace of A. The result derived in this paper diﬀers from Stewart’s classical result and

sometimes yields tighter bounds. Moreover, we provide norm estimates for the remainder terms in

well-known perturbation expansions for invariant subspaces, eigenvectors, and eigenvalues.

Key words. invariant subspaces, perturbation theory, pseudospectra, quadratic matrix equation

AMS subject classifications. Author must provide

DOI. 10.1137/130912372

1. Introduction. A subspace X⊂Cnof a matrix A∈Cn×nis called invariant

if it satisﬁes

(1.1) AX⊂X.

In this paper, we reconsider the classical question of estimating the impact of pertur-

bations in Aon X.

Suppose that the columns X∈Cn×kform an orthonormal basis of X. Then (1.1)

implies the existence of A11 ∈Ck×ksuch that AX =XA11. The eigenvalues of A11 are

independent of the choice of basis and constitute the spectrum of the restriction of Ato

X. Extending Xto a unitary matrix [X, X⊥] leads to the block Schur decomposition

(1.2) AX, X⊥=X,X⊥A11 A12

0A22.

Note that this implies Λ(A)=Λ(A11)∪Λ(A22), where Λ(·) denotes the spectrum of

a matrix. Throughout this paper, we will assume

(1.3) Λ(A11)∩Λ(A22)=∅.

This is a necessary and suﬃcient condition for the Lipschitz continuity of Xwith

respect to perturbations in A[9, Thm. 15.5.1]. (Note that continuity requires a sub-

stantially weaker condition [9, Thm. 15.2.1].) Hence, if (1.3) holds, adding a small

perturbation A→ A+Eimplies a change in the invariant subspace that is asymp-

totically proportional to E.

Various bounds on the change of invariant subspaces under perturbations of A

have been derived, notably by Davis and Kahan [6], Stewart [18, 19], Demmel [8],

and Sun [24]. In the general nonsymmetric case, these bounds are valid only as long

as Eremains suﬃciently small. A minimal requirement is that the separation con-

dition (1.3) remains valid under perturbations. In the language of pseudospectra,

this means that Eshould stay below the critical perturbation level εfor which

∗Received by the editors March 7, 2013; accepted for publication (in revised form) by B. Sutton

March 17, 2014; published electronically May 13, 2014.

http://www.siam.org/journals/simax/35-2/91237.html

†Institut f¨ur Mathematik, TU Berlin, 10623 Berlin, Germany (k[email protected]berlin.de).

‡MATHICSE, EPF Lausanne, CH-1015 Lausanne, Switzerland (daniel.kressner@epﬂ.ch).

599

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

600 MICHAEL KAROW AND DANIEL KRESSNER

the components of the ε-pseudospectrum containing Λ(A11)andΛ(A22)ﬁrstmeet

each other [1]. It turns out that some existing perturbation results are unnecessarily

restrictive and require Eto stay signiﬁcantly below this critical level. The main con-

tribution of this paper consists of a novel perturbation bound for invariant subspaces;

see Theorem 3.1 below. To derive this bound, we employ pseudospectral techniques

in the analysis of a quadratic matrix equation.

The rest of this paper is organized as follows. In section 2, we introduce the basic

tools for the perturbation analysis of invariant subspaces and recall some existing

results. Section 3 contains the statement and proof of our main result, a new pertur-

bation bound for invariant subspaces. In section 3.2, it is shown that this bound is

sharp for a 2 ×2 example. Section 3.3 discusses a variation of the main result based

on the block diagonalization of A, while section 3.5 provides a comparison to existing

perturbation bounds. In section 4, the former is used to quantify existence conditions

and remainder terms for well-known eigenvalue and eigenvector expansions.

2. Preliminaries and existing results. The goal of this section is to summa-

rize some existing perturbation results for invariant subspaces and introduce notation,

needed in the rest of the paper, on the way. Let us ﬁrst recall some basic tools used

in the perturbation analysis.

2.1. Separation between matrices. The condition (1.3) can be quantiﬁed

by the separation between A11 and A22. Based on Varah’s original deﬁnition [27],

Demmel [8] has proposed

sepλ(A11,A

22):=sup{ε>0|Λ(A11 +E11)∩Λ(A22 +E22)=∅

∀E11,E

22 with max{E112,E222}≤ε}.

(2.1)

This deﬁnition has an important interpretation in terms of ε-pseudospectra, deﬁned as

Λε(M)=z∈C|z∈Λ(M+E)forsomeE∈Cn×nwith E2≤ε

for a matrix M∈Cn×n[26]. The separation (2.1) is the minimum value of εsuch

that Λε(A11)∩Λε(A22) is nonempty. This interpretation yields the expression

sepλ(A11,A

22) = inf

λ∈Cmax σmin(A11 −λI),σ

min(A22 −λI),

where σmin(·) denotes the smallest singular value of a matrix. Based on this expres-

sion, an algorithm for computing sepλhas been developed by Gu and Overton [11].

Although sepλis not an appropriate measure to quantify the perturbation of

invariant subspaces, it will still play a role in our derivations. Stewart [19] has intro-

duced a diﬀerent notion of separation based on the observation that (1.3) is satisﬁed

if and only if the Sylvester operator

T:C(n−k)×k→C(n−k)×k,T:Z→ ZA11 −A22Z

is nonsingular. The separation of A11 and A22 with respect to an arbitrary norm ·

is deﬁned as

(2.2) sep(A11,A

22):= min

Z=1 T(Z)=min

Z=1 ZA11 −A22Z.

Then sep(A11,A

22)= 0 if and only if (1.3) holds. Examples in [8, 27] show that the

quantity sep(A11,A

22) can be magnitudes smaller than sepλ(A11,A

22).

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 601

In the following, sepF(A11,A

22)=min

ZF=1 ZA11 −A22ZFdenotes the sep-

aration with respect to the Frobenius norm. An eﬃcient algorithm for estimating

sepF(A11,A

22) can be derived from the inverse power method [4]. Various lower and

upper bounds for sepF(A11,A

22) can be found in [22, 23, 28]. Lower bounds for other

norms are discussed in [29].

Proposition 2.1 below relates sep with sepλand the distance between the spectra

of A11 and A22. Most of its statements are well known; we have included the proof for

the convenience of the reader. Recall that a norm ·is said to be unitarily invariant

if UZV=Zfor all unitary matrices U, V of compatible size. A norm is unitarily

invariant if and only if

(2.3) XZY≤X2ZY2

for all matrices of compatible size.

Proposition 2.1. Let A11,E

11 ∈Ck×k,A22,E

22 ∈C(n−k)×(n−k),andλ∈C.

Let δ=min{|λ−μ||λ∈Λ(A11),μ∈Λ(A22)}denote the distance between the spectra

Λ(A11)and Λ(A22).

(a) For any norm, sep(A11,A

22)≤δ.

(b) For any unitarily invariant norm,

(i) sep(λI, A

22)=σmin(λI −A22),

(ii) sep(A11 +E11,A

22 +E22)≥sep(A11,A

22)−E112−E222,

(iii) sep(A11,A

22)≤2·sepλ(A11,A

22)≤δ.

22)=

2·sepλ(A11,A

22)=δ.

Proof. Consider vectors xand y∗such that x2=y2=1,y∗A11 =λy∗,and

A22x=μxfor λ∈Λ(A11), μ∈Λ(A22). Let Z=xy∗.ThenZA11−A22Z=(λ−μ)Z.

This implies (a). To show the statements of (b), let E11 =1

2(μ−λ)yy∗and E22 =

2(λ−μ)xx∗.Theny∗(A11 +E11)=1

2(λ+μ)y∗and (A22 +E22)x=1

2(λ+μ)x.Thus,

Λ(A11 +E11)∩Λ(A22 +E22)=∅for perturbations satisfying E112=E222=

|λ−μ|/2. This yields the second inequality in (b)(iii). The statement of (b)(ii) follows

from the inequality

Z(A11 +E11)−(A22 +E22)Z≥ZA11 −A22Z−(E112+E222)Z.

In particular, if max{E112,E22} <sep(A11,A

22)/2, then sep(A11 +E11,A

22 +

E22)>0. Thus, Λ(A11 +E11)∩Λ(A22 +E22)=∅. This yields the ﬁrst inequality of

(b)(iii).

Next, we show (c). Suppose that A11 and A22 are normal, and let U, V be unitary

matrices such that A11 =Udiag(λ1,...,λ

k)U∗,A22 =Vdiag(μ1,...,μ

n−k)V∗,where

λjand μiare the eigenvalues of A11 and A22, respectively. Let W=V∗ZU =[wij ].

Then

ZA11 −A22Z2

F=V∗(ZA11 −A22Z)U2

=Wdiag(λ1,...,λ

k)−diag(μ1,...,μ

n−k)W)2

=[(λj−μi)wij ]2

F=

ij |λi−μj|2|wij |2

≥δ2W2

F=δ2Z2

Thus, sepF(A11,A

22)≥δ. Combined with (b)(iii), this yields (c).

It remains to prove (b)(i). Let (u, v) be a pair of normalized singular vectors be-

longing to the smallest singular value σmin of λI−A22 such that (λI−A22)v=σminu,

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

602 MICHAEL KAROW AND DANIEL KRESSNER

and u2=v2.Thenvu∗=uu∗,sincevu∗and uu∗have the same singular val-

ues 1,0,...,0. Setting Z=vu∗/uu∗we obtain Z=1and(λI−A22)Z=σmin.

Thus sep(λI,A

22)≤σmin. On the other hand,

sep(λI,A

22)=min

Z=0 (λI−A22)Z

Z=min

W=0 W

(λI−A22)−1W

≥min

W=0 W

(λI−A22)−12W=σmin.

2.2. Invariant subspaces and a quadratic matrix equation. Let us con-

sider a general matrix A∈Cn×nand partition

A=A11 A12

A21 A22,A

11 ∈Ck×k,A

22 ∈C(n−k)×(n−k).

For given Z∈C(n−k)×k, consider the similarity transformation

(2.4) I0

−ZI

AI0

=A11 +A12ZA

A21 +A22Z−ZA11 −ZA12ZA

22 −ZA12

which becomes block upper triangular if and only if the quadratic matrix equation

(2.5) 0 = f(A, Z):=A21 +A22Z−ZA11 −ZA12Z

is satisﬁed. This implies Lemma 2.2 below. Note that a subspace Yof row vectors is

called a left invariant subspace if YA⊂Y.

Lemma 2.2. Using the notation introduced above, the following statements are

equivalent:

(i) The columns of [IZ

]span a right invariant subspace of Asuch that

AI

Z=I

Z(A11 +A12Z).

(ii) The rows of [−ZI]span a left invariant subspace of Asuch that

−ZI

A=(A22 −ZA12)−ZI

.

(iii) The quadratic matrix equation (2.5) is satisﬁed.

2.3. An asymptotic result. Perturbation bounds that are asymptotically valid

as E→0 can be obtained in a relatively straightforward way from truncating per-

turbation expansions. For invariant subspaces, such expansions have been discussed

in [5, 14, 25]. In the following, we will illustrate this approach.

Lemma 2.3. Given A∈Cn×n, suppose that there exists Z∈C(n−k)×ksuch that

f(A, Z)=0,with fdeﬁned as in (2.5).IfΛ(A11 +A12Z)∩Λ(A22 −ZA12)=∅,

then there exist an open neighborhood E⊂Cn×nof 0and an open neighborhood

Z⊂C(n−k)×kof Zsuch that for each E∈Ethe equation f(A+E,ZE)=0has a

unique solution ZE∈Z. Moreover, ZEdepends holomorphically on Eand admits

the ﬁrst-order expansion

ZE=Z+T−1

ZE21+O(E2)

with the Sylvester operator TZ:ΔZ→ ΔZ(A11 +A12Z)−(A22 −ZA12)ΔZ.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 603

Proof. Clearly, fis a holomorphic function in the entries of Aand Z.The

derivative of fwith respect to the variable Zequals the Sylvester operator −TZ.

Since A22 −ZA12 and A11 +A12Zhave disjoint spectra, the operator TZis invertible.

Thus, the lemma follows from the implicit function theorem [15].

The way Lemma 2.3 is stated will be convenient for later purposes. However, for

the sake of an asymptotic result, we may assume without loss of generality that the

unperturbed matrix Ais already in block triangular form,

(2.6) A=A11 A12

0A22,A

11 ∈Ck×k,A

22 ∈C(n−k)×(n−k).

By Lemma 2.2, this is equivalent to requiring that X=[

0] spans an invariant

subspace of A. Combining the statement of Lemma 2.3 with Lemma 2.2 then yields

the following result.

Corollary 2.4. Let Abe in block triangular form (2.6) and assume Λ(A11)∩

Λ(A22)=∅. Then, for every Ewith Esuﬃciently small, there exists an invariant

subspace XE= span[ I

ZE]of A+Esuch that ZEadmits the ﬁrst-order expansion

(2.7) ZE=T−1E21+O(E2),T:Z→ ZA11 −A22Z.

Recently, Stewart [20] derived bounds on Efor which ZE, as a function of E,

is Fr´echet diﬀerentiable.

Once we have obtained ZE, there are diﬀerent ways of comparing the two invariant

subspaces

X=spanI

0,XE=spanI

ZE

of the matrices Aand A+E, respectively. If σ1≥σ2≥ ··· ≥ σkdenote the

singular values of ZE, then the ith canonical angle between Xand XEis given by

θi(X,XE) = arctan σi.Deﬁning Θ(X,XE) := diag(θ1,...,θ

k), it is well known [21,

sect. II.4] that sin(Θ(X,XE))generates a metric on the k-dimensional subspaces

of Cnfor any unitarily invariant matrix norm ·. However, it is sometimes more

convenient to simply use

(2.8) ZE=tan(Θ(X,XE))

for measuring the distance, which remains close to sin(Θ(X,XE))as long as ZE

is small. The ﬁrst-order result

(2.9) tan(Θ(X,XE))=ZE=E21

sep(A11,A

22)+O(E2)

is now readily obtained from Corollary 2.4. This also conﬁrms that sep(A11,A

22)−1

is the condition number of X[2].

2.4. Nonasymptotic results. The derivation of nonasymptotic results requires

a more careful study of the quadratic matrix equation f(A+E,ZE) = 0 with fas

in (2.5) and

(2.10) A+E=A11 +E11 A12 +E12

E21 A22 +E22.

Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Loading more pages...