
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
SIAM J. MATRIX ANAL. APPL.c
2014 Society for Industrial and Applied Mathematics
Vol. 35, No. 2, pp. 599–618
ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES OF
MATRICES∗
MICHAEL KAROW†AND DANIEL KRESSNER‡
Abstract. Given a nonsymmetric matrix A, we investigate the effect of perturbations on an
invariant subspace of A. The result derived in this paper differs from Stewart’s classical result and
sometimes yields tighter bounds. Moreover, we provide norm estimates for the remainder terms in
well-known perturbation expansions for invariant subspaces, eigenvectors, and eigenvalues.
Key words. invariant subspaces, perturbation theory, pseudospectra, quadratic matrix equation
AMS subject classifications. Author must provide
DOI. 10.1137/130912372
1. Introduction. A subspace X⊂Cnof a matrix A∈Cn×nis called invariant
if it satisfies
(1.1) AX⊂X.
In this paper, we reconsider the classical question of estimating the impact of pertur-
bations in Aon X.
Suppose that the columns X∈Cn×kform an orthonormal basis of X. Then (1.1)
implies the existence of A11 ∈Ck×ksuch that AX =XA11. The eigenvalues of A11 are
independent of the choice of basis and constitute the spectrum of the restriction of Ato
X. Extending Xto a unitary matrix [X, X⊥] leads to the block Schur decomposition
(1.2) AX, X⊥=X,X⊥A11 A12
0A22.
Note that this implies Λ(A)=Λ(A11)∪Λ(A22), where Λ(·) denotes the spectrum of
a matrix. Throughout this paper, we will assume
(1.3) Λ(A11)∩Λ(A22)=∅.
This is a necessary and sufficient condition for the Lipschitz continuity of Xwith
respect to perturbations in A[9, Thm. 15.5.1]. (Note that continuity requires a sub-
stantially weaker condition [9, Thm. 15.2.1].) Hence, if (1.3) holds, adding a small
perturbation A→ A+Eimplies a change in the invariant subspace that is asymp-
totically proportional to E.
Various bounds on the change of invariant subspaces under perturbations of A
have been derived, notably by Davis and Kahan [6], Stewart [18, 19], Demmel [8],
and Sun [24]. In the general nonsymmetric case, these bounds are valid only as long
as Eremains sufficiently small. A minimal requirement is that the separation con-
dition (1.3) remains valid under perturbations. In the language of pseudospectra,
this means that Eshould stay below the critical perturbation level εfor which
∗Received by the editors March 7, 2013; accepted for publication (in revised form) by B. Sutton
March 17, 2014; published electronically May 13, 2014.
http://www.siam.org/journals/simax/35-2/91237.html
†Institut f¨ur Mathematik, TU Berlin, 10623 Berlin, Germany (k[email protected]berlin.de).
‡MATHICSE, EPF Lausanne, CH-1015 Lausanne, Switzerland (daniel.kressner@epfl.ch).
599
Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
600 MICHAEL KAROW AND DANIEL KRESSNER
the components of the ε-pseudospectrum containing Λ(A11)andΛ(A22)firstmeet
each other [1]. It turns out that some existing perturbation results are unnecessarily
restrictive and require Eto stay significantly below this critical level. The main con-
tribution of this paper consists of a novel perturbation bound for invariant subspaces;
see Theorem 3.1 below. To derive this bound, we employ pseudospectral techniques
in the analysis of a quadratic matrix equation.
The rest of this paper is organized as follows. In section 2, we introduce the basic
tools for the perturbation analysis of invariant subspaces and recall some existing
results. Section 3 contains the statement and proof of our main result, a new pertur-
bation bound for invariant subspaces. In section 3.2, it is shown that this bound is
sharp for a 2 ×2 example. Section 3.3 discusses a variation of the main result based
on the block diagonalization of A, while section 3.5 provides a comparison to existing
perturbation bounds. In section 4, the former is used to quantify existence conditions
and remainder terms for well-known eigenvalue and eigenvector expansions.
2. Preliminaries and existing results. The goal of this section is to summa-
rize some existing perturbation results for invariant subspaces and introduce notation,
needed in the rest of the paper, on the way. Let us first recall some basic tools used
in the perturbation analysis.
2.1. Separation between matrices. The condition (1.3) can be quantified
by the separation between A11 and A22. Based on Varah’s original definition [27],
Demmel [8] has proposed
sepλ(A11,A
22):=sup{ε>0|Λ(A11 +E11)∩Λ(A22 +E22)=∅
∀E11,E
22 with max{E112,E222}≤ε}.
(2.1)
This definition has an important interpretation in terms of ε-pseudospectra, defined as
Λε(M)=z∈C|z∈Λ(M+E)forsomeE∈Cn×nwith E2≤ε
for a matrix M∈Cn×n[26]. The separation (2.1) is the minimum value of εsuch
that Λε(A11)∩Λε(A22) is nonempty. This interpretation yields the expression
sepλ(A11,A
22) = inf
λ∈Cmax σmin(A11 −λI),σ
min(A22 −λI),
where σmin(·) denotes the smallest singular value of a matrix. Based on this expres-
sion, an algorithm for computing sepλhas been developed by Gu and Overton [11].
Although sepλis not an appropriate measure to quantify the perturbation of
invariant subspaces, it will still play a role in our derivations. Stewart [19] has intro-
duced a different notion of separation based on the observation that (1.3) is satisfied
if and only if the Sylvester operator
T:C(n−k)×k→C(n−k)×k,T:Z→ ZA11 −A22Z
is nonsingular. The separation of A11 and A22 with respect to an arbitrary norm ·
is defined as
(2.2) sep(A11,A
22):= min
Z=1 T(Z)=min
Z=1 ZA11 −A22Z.
Then sep(A11,A
22)= 0 if and only if (1.3) holds. Examples in [8, 27] show that the
quantity sep(A11,A
22) can be magnitudes smaller than sepλ(A11,A
22).
Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 601
In the following, sepF(A11,A
22)=min
ZF=1 ZA11 −A22ZFdenotes the sep-
aration with respect to the Frobenius norm. An efficient algorithm for estimating
sepF(A11,A
22) can be derived from the inverse power method [4]. Various lower and
upper bounds for sepF(A11,A
22) can be found in [22, 23, 28]. Lower bounds for other
norms are discussed in [29].
Proposition 2.1 below relates sep with sepλand the distance between the spectra
of A11 and A22. Most of its statements are well known; we have included the proof for
the convenience of the reader. Recall that a norm ·is said to be unitarily invariant
if UZV=Zfor all unitary matrices U, V of compatible size. A norm is unitarily
invariant if and only if
(2.3) XZY≤X2ZY2
for all matrices of compatible size.
Proposition 2.1. Let A11,E
11 ∈Ck×k,A22,E
22 ∈C(n−k)×(n−k),andλ∈C.
Let δ=min{|λ−μ||λ∈Λ(A11),μ∈Λ(A22)}denote the distance between the spectra
Λ(A11)and Λ(A22).
(a) For any norm, sep(A11,A
22)≤δ.
(b) For any unitarily invariant norm,
(i) sep(λI, A
22)=σmin(λI −A22),
(ii) sep(A11 +E11,A
22 +E22)≥sep(A11,A
22)−E112−E222,
(iii) sep(A11,A
22)≤2·sepλ(A11,A
22)≤δ.
(c) If A11 and A22 are normal matrices, then sepF(A11,A
22)=
2·sepλ(A11,A
22)=δ.
Proof. Consider vectors xand y∗such that x2=y2=1,y∗A11 =λy∗,and
A22x=μxfor λ∈Λ(A11), μ∈Λ(A22). Let Z=xy∗.ThenZA11−A22Z=(λ−μ)Z.
This implies (a). To show the statements of (b), let E11 =1
2(μ−λ)yy∗and E22 =
1
2(λ−μ)xx∗.Theny∗(A11 +E11)=1
2(λ+μ)y∗and (A22 +E22)x=1
2(λ+μ)x.Thus,
Λ(A11 +E11)∩Λ(A22 +E22)=∅for perturbations satisfying E112=E222=
|λ−μ|/2. This yields the second inequality in (b)(iii). The statement of (b)(ii) follows
from the inequality
Z(A11 +E11)−(A22 +E22)Z≥ZA11 −A22Z−(E112+E222)Z.
In particular, if max{E112,E22} <sep(A11,A
22)/2, then sep(A11 +E11,A
22 +
E22)>0. Thus, Λ(A11 +E11)∩Λ(A22 +E22)=∅. This yields the first inequality of
(b)(iii).
Next, we show (c). Suppose that A11 and A22 are normal, and let U, V be unitary
matrices such that A11 =Udiag(λ1,...,λ
k)U∗,A22 =Vdiag(μ1,...,μ
n−k)V∗,where
λjand μiare the eigenvalues of A11 and A22, respectively. Let W=V∗ZU =[wij ].
Then
ZA11 −A22Z2
F=V∗(ZA11 −A22Z)U2
F
=Wdiag(λ1,...,λ
k)−diag(μ1,...,μ
n−k)W)2
F
=[(λj−μi)wij ]2
F=
ij |λi−μj|2|wij |2
≥δ2W2
F=δ2Z2
F.
Thus, sepF(A11,A
22)≥δ. Combined with (b)(iii), this yields (c).
It remains to prove (b)(i). Let (u, v) be a pair of normalized singular vectors be-
longing to the smallest singular value σmin of λI−A22 such that (λI−A22)v=σminu,
Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
602 MICHAEL KAROW AND DANIEL KRESSNER
and u2=v2.Thenvu∗=uu∗,sincevu∗and uu∗have the same singular val-
ues 1,0,...,0. Setting Z=vu∗/uu∗we obtain Z=1and(λI−A22)Z=σmin.
Thus sep(λI,A
22)≤σmin. On the other hand,
sep(λI,A
22)=min
Z=0 (λI−A22)Z
Z=min
W=0 W
(λI−A22)−1W
≥min
W=0 W
(λI−A22)−12W=σmin.
2.2. Invariant subspaces and a quadratic matrix equation. Let us con-
sider a general matrix A∈Cn×nand partition
A=A11 A12
A21 A22,A
11 ∈Ck×k,A
22 ∈C(n−k)×(n−k).
For given Z∈C(n−k)×k, consider the similarity transformation
(2.4) I0
−ZI
AI0
ZI
=A11 +A12ZA
12
A21 +A22Z−ZA11 −ZA12ZA
22 −ZA12
which becomes block upper triangular if and only if the quadratic matrix equation
(2.5) 0 = f(A, Z):=A21 +A22Z−ZA11 −ZA12Z
is satisfied. This implies Lemma 2.2 below. Note that a subspace Yof row vectors is
called a left invariant subspace if YA⊂Y.
Lemma 2.2. Using the notation introduced above, the following statements are
equivalent:
(i) The columns of [IZ
]span a right invariant subspace of Asuch that
AI
Z=I
Z(A11 +A12Z).
(ii) The rows of [−ZI]span a left invariant subspace of Asuch that
−ZI
A=(A22 −ZA12)−ZI
.
(iii) The quadratic matrix equation (2.5) is satisfied.
2.3. An asymptotic result. Perturbation bounds that are asymptotically valid
as E→0 can be obtained in a relatively straightforward way from truncating per-
turbation expansions. For invariant subspaces, such expansions have been discussed
in [5, 14, 25]. In the following, we will illustrate this approach.
Lemma 2.3. Given A∈Cn×n, suppose that there exists Z∈C(n−k)×ksuch that
f(A, Z)=0,with fdefined as in (2.5).IfΛ(A11 +A12Z)∩Λ(A22 −ZA12)=∅,
then there exist an open neighborhood E⊂Cn×nof 0and an open neighborhood
Z⊂C(n−k)×kof Zsuch that for each E∈Ethe equation f(A+E,ZE)=0has a
unique solution ZE∈Z. Moreover, ZEdepends holomorphically on Eand admits
the first-order expansion
ZE=Z+T−1
ZE21+O(E2)
with the Sylvester operator TZ:ΔZ→ ΔZ(A11 +A12Z)−(A22 −ZA12)ΔZ.
Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.
ON A PERTURBATION BOUND FOR INVARIANT SUBSPACES 603
Proof. Clearly, fis a holomorphic function in the entries of Aand Z.The
derivative of fwith respect to the variable Zequals the Sylvester operator −TZ.
Since A22 −ZA12 and A11 +A12Zhave disjoint spectra, the operator TZis invertible.
Thus, the lemma follows from the implicit function theorem [15].
The way Lemma 2.3 is stated will be convenient for later purposes. However, for
the sake of an asymptotic result, we may assume without loss of generality that the
unperturbed matrix Ais already in block triangular form,
(2.6) A=A11 A12
0A22,A
11 ∈Ck×k,A
22 ∈C(n−k)×(n−k).
By Lemma 2.2, this is equivalent to requiring that X=[
Ik
0] spans an invariant
subspace of A. Combining the statement of Lemma 2.3 with Lemma 2.2 then yields
the following result.
Corollary 2.4. Let Abe in block triangular form (2.6) and assume Λ(A11)∩
Λ(A22)=∅. Then, for every Ewith Esufficiently small, there exists an invariant
subspace XE= span[ I
ZE]of A+Esuch that ZEadmits the first-order expansion
(2.7) ZE=T−1E21+O(E2),T:Z→ ZA11 −A22Z.
Recently, Stewart [20] derived bounds on Efor which ZE, as a function of E,
is Fr´echet differentiable.
Once we have obtained ZE, there are different ways of comparing the two invariant
subspaces
X=spanI
0,XE=spanI
ZE
of the matrices Aand A+E, respectively. If σ1≥σ2≥ ··· ≥ σkdenote the
singular values of ZE, then the ith canonical angle between Xand XEis given by
θi(X,XE) = arctan σi.Defining Θ(X,XE) := diag(θ1,...,θ
k), it is well known [21,
sect. II.4] that sin(Θ(X,XE))generates a metric on the k-dimensional subspaces
of Cnfor any unitarily invariant matrix norm ·. However, it is sometimes more
convenient to simply use
(2.8) ZE=tan(Θ(X,XE))
for measuring the distance, which remains close to sin(Θ(X,XE))as long as ZE
is small. The first-order result
(2.9) tan(Θ(X,XE))=ZE=E21
sep(A11,A
22)+O(E2)
is now readily obtained from Corollary 2.4. This also confirms that sep(A11,A
22)−1
is the condition number of X[2].
2.4. Nonasymptotic results. The derivation of nonasymptotic results requires
a more careful study of the quadratic matrix equation f(A+E,ZE) = 0 with fas
in (2.5) and
(2.10) A+E=A11 +E11 A12 +E12
E21 A22 +E22.
Downloaded 12/14/17 to 130.149.176.172. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
Loading more pages...