Linear Algebra

Linear algebra deals with finite-dimensional vector spaces. Notes on linear algebra follows [@Horn & Johnson, 1990. Matrix Analysis.]

Scanned course notes: Notes on linear algebra; Notes on advanced algebra; Notes on vector space;

Symbols

$\bar{\mathbb{F}}$: algebraically closed field, e.g. $\bar{\mathbb{Q}}$, $\bar{\mathbb{R}}$, and $\bar{\mathbb{C}}$.
$\mathbb{R}^n, \mathbb{C}^n$: n-dimentional vector spaces based on $\mathbb{R}$ and $\mathbb{C}$.
$\mathcal{B}$, $\{ e_i \}_{i=1}^n$: a basis of a vector space; the standard basis of $\mathbb{R}^n$ or $\mathbb{C}^n$.
$[v]_{\mathcal{B}}$, $_{\mathcal{B}_1}[T]_{\mathcal{B}_2}$: coordinate representation of a vector in basis $\mathcal{B}$ / a linear operator in bases $\mathcal{B}_1$ - $\mathcal{B}_2$.
$A$, $[A]_{I,J}$: a matrix; a submatrix with rows and columns from index sets $I$ and $J$ (default to $I = J$).
$M_{m,n}(\mathbb{F})$, $M_n(\mathbb{F})$: the set of all m-by-n / n-by-n matrices over field $\mathbb{F}$, which defaults to $\mathbb{C}$.
$\mathcal{F}$: a family of matrices;
$\text{GL}(n, \mathbb{F})$, $\text{SL}(n, \mathbb{F})$: the general / special linear group formed by nonsingular / unit-determinant matrices in $M_n(\mathbb{F})$;
$O(n)$, $SO(n)$, the (special) orthogonal group formed by (unit-determinant) orthogonal matrices in $\text{GL}(n, \mathbb{R})$ / $\text{SL}(n, \mathbb{R})$;
$U(n)$, $SU(n)$, the (special) unitary group formed by (unit-determinant) unitary matrices in $\text{GL}(n, \mathbb{C})$ / $\text{SL}(n, \mathbb{C})$;
$A \succeq 0$, $A \ge 0$: positive semi-definite matrix; nonnegative matrix.
$|A|$: matrix of absolute values of entries of a matrix.
$A^T, \bar{A}, A^H, A^\dagger$: transpose / conjugate / Hermitian adjoint (conjugate transpose) / Moore-Penrose pesudoinverse of a matrix.
$A^{-1}$: inverse of a nonsingular matrix.
$A^{1/2}$: the unique positive semidefinite square root of a positive semidefinite matrix.
$\sigma(A)$, $\sigma_i(A)$: the set of singular values of a matrix; the i-th largest singular value.
$\lambda(A)$, $\lambda_i(A)$: spectrum of a square matrix; the i-th eigenvalue, in increasing order if the matrix is Hermitian.
$\rho(A)$: spectral radius of a square matrix, the largest modulus of its eigenvalues.
$\text{tr} A$, $\det A$; $\text{adj} A$: trace / determinant / adjugate (classical adjoint) of a square matrix.
$p_A(t), q_A(t)$: the characteristic / minimal polynomial of a square matrix.

Vector Space and Product Space of Scalars

Vector space or linear space $(V, (+, \cdot_{\mathbb{F}}))$ over a field $(\mathbb{F}, (+, \cdot))$ is a set $V$ endowed with a binary operation $+: V^2 \mapsto V$ and a map $\cdot_{\mathbb{F}}: \mathbb{F} \times V \mapsto V$. For example, Cartesian powers of real numbers $\mathbb{R}^n$, the set $\mathbb{R}^X$ of real-valued functions with a common domain, and the set $C^0(X)$ of continuous real-valued functions with a common domain become real vector spaces when endowed with component-/point-wise addition and scalar multiplication. We call $+$ vector addition and $\cdot_{\mathbb{F}}$ scalar multiplication. Vector space structure $(+, \cdot_{\mathbb{F}})$ is the vector addition and scalar multiplication maps of a vector space.

Linear subspace $(S, (+, \cdot_{\mathbb{F}}))$ of a vector space is a vector space consisting of a subset of the vector space and the same vector space structure: $+: S^2 \mapsto S$, $\cdot_{\mathbb{F}}: \mathbb{F} \times S \mapsto S$. Linear combination of vectors is a finite sum of scaled vectors: $\sum_{i=1}^k a^i v_i$. Linear span (线性生成空间) $\text{Span}(A)$ of a set $A$ of vectors is the subspace that consists of all linear combinations of the set.

Linearly independent subset of a vector space is a subset such that every non-zero linear combination of its elements is non-zero: $n \in \mathbb{N}$, $a \in \mathbb{R}^n$, $a \ne 0$, then $\forall \{v_i\}_{i=1}^n$, $\sum_{i=1}^n a^i v_i \ne 0$. Algebraic/Hamel basis $\mathcal{B}$ of a vector space is a linearly independent subset that spans the space: $\text{Span}(\mathcal{B}) = V$. Every vector space has a basis. Dimension $\dim V$ of a vector space is the size of any basis of the vector space. The basis may be finite, countably infinite, or uncountably infinite; so is the dimension of the vector space.

Table: The intersection of linearly independent sets and span-to-space sets are bases.

Elements	$1, \cdots, \dim V$	$\dim V$	$\dim V, \cdots$
Sets	linearly independent	basis	span to $V$

The sum $v + S$ of a vector and a subspaces of a vector space is the set of all vectors that can be written as the addition of the vector and a vector from the subspace: $v + S = \{v + w : w \in S\}$. We call $v + S$ an affine subspace of the vector space parallel to the subspace, and also the coset (陪集) of the subspace determined by the vector. Dimension of an affine subspace is the dimension of the associated subspace. Quotient $V / S$ of a vector space by a subspace is the set of all cosets of the subspace: $V / S = \{v + S : v \in V\}$. Quotient space $(V / S, (+, \cdot_{\mathbb{F}}))$ of a vector space is a vector space of dimension $\dim V - \dim S$, consisting of a quotient of the vector space, and vector addition and scalar multiplication of cosets defined by $(v + S) + (w + S) = (v + w) + S$ and $c (v + S) = (c v) + S$. The natural projection $\pi: V \mapsto V/S$ associated with a quotient space is defined by $\pi(v) = v + S$.

The sum $S + T$ of two subspaces of a vector space is the set of all vectors that can be written as the addition of two vectors, one from each subspace: $S + T = \{v + w : v \in S, w \in T\}$. The sum of two subspaces equals the linear span of their union: $S + T = \text{Span}(S \cup T)$. Internal direct sum $S \oplus T$ of two subspaces of a vector space that intersect only at zero is their sum: $S \oplus T = S + T$. Complementary subspaces are two linear subspaces that intersect only at zero and whose (direct) sum equals the full space. Projection $\pi: V \mapsto S$ from a vector space onto a subspace, given a complementary subspace $T$, is the linear map that takes each vector to the vector in the subspace that appears in its direct sum decomposition: $\pi(v + w) = v$ where $v \in S, w \in T$.

Direct product or product space $(\prod_{\alpha \in A} V_\alpha, (+, \cdot_{\mathbb{F}}))$ of an indexed family of vector spaces over the same field is the vector space consisting of the Cartesian product and componentwise addition and scalar multiplication. The direct product and the external direct sum of a finite indexed family of vector spaces are identical.

Vector space isomorphism is an invertible linear map between two vector spaces. Finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension: $V \cong W \iff \dim V = \dim W$. In particular, any n-dimensional vector space over a field $\mathbb{F}$ is isomorphic the product space of the field to the n-th power: $V \cong \mathbb{F}^n \iff \dim V = n$. For any basis $\mathcal{B}$ of an n-dimensional vector space $V$ over field $\mathbb{F}$, the mapping $f: v \rightarrow [v]_{\mathcal{B}}$ from a vector to its coordinates is an isomorphism between $V$ and $\mathbb{F}^n$. It justifies the identification of all n-dimensional vector spaces with $\mathbb{F}^n$, up to a specific basis for each space. Scalar-valued function space $(\mathbb{F}^X, (+, \cdot_{\mathbb{F}}))$ on a set $X$ over a field $(\mathbb{F}, (+, \cdot))$ is the vector space over the field consisting of all scalar-valued functions on set, endowed with pointwise addition and scalar multiplication. A function space has a finite dimension if and only if the underlying set is finite: $|X| = n$ then $\mathbb{F}^X \cong \mathbb{F}^n$.

Linear Operator and Matrix

Operator (算子) $\omega: V \mapsto W$ is a mapping from a vector space to another; its application to / action on a vector is commonly denoted as $\omega v = w$. Linear operator $\omega: V \to W$ is an operator between vector spaces over the same field, such that it is compatible with their vector space structures: $\omega (x + y) = \omega x + \omega y$, $\omega (a x) = a \omega x$. Examples of linear operators include the coefficient matrices of systems of linear equations. A linear operator is nonsingular if it does not map nonzero vectors to zero. The set $\mathcal{L}(V, W)$ of all linear operators between two given vector spaces over the same field, endowed with pointwise addition and scalar multiplication, is a vector space over the same field. Linear transformation $\upsilon: V \to V$ is a linear operator from a vector space to itself. Examples of linear transformations include coordinate transformations and the Fourier integral transform. Linear operator theory concerns linear operators between (infinite-dimensional) vector spaces; the study of linear operators between finite-dimensional vector spaces is called Matrix Analysis. An important question in linear operator theory is classifying linear transformations $\mathcal{L}(V, V)$ w.r.t. some equivalence relation, e.g. similarity, unitary equivalence, topological equivalence.

Functional (泛函) $\alpha: V \mapsto \mathbb{F}$ is an operator from a vector space to its underlying scalar field, especially when the vector space is (a subspace of) a scalar-valued function space $\mathbb{F}^X$. Linear functional or covector (余向量) $\alpha: V \to \mathbb{F}$ is a functional that is also a linear operator. Examples of linear functionals include differentiation and integration. The vector space $\mathcal{L}(V, \mathbb{F})$ of linear functionals on a vector space is also written as $\mathcal{L}(V)$. Sublinear functional $p: V \mapsto \mathbb{R}$ on a real vector space is a functional that is subadditive and positive-homogeneous: $p(v + w) \le p(v) + p(w)$, $p(|a| v) = |a| p(v)$. Hahn–Banach Theorem (extension of linear functionals) [@Hahn1927; @Banach1929]: Every linear functional on a subspace of a real vector space that is bounded from above by a sublinear functional on the space has a linear extension to the space that remains bounded from above by the sublinear functional: $\omega \in S^∗$, $\omega \le p$, then $\exists \tilde\omega \in V^∗$, $\tilde\omega \le p$.

Algebraic dual space $V^∗$ of a vector space is the vector space of its linear functionals: $V^∗ = \mathcal{L}(V)$. Dual basis $(\varepsilon^i)_{i=1}^n$ to a basis $(e_i)_{i=1}^n$ of a finite-dimensional vector space is the n-tuple of covectors defined by $\varepsilon^i(e_j) = \delta^i_j$, which is a basis for the dual space. A finite-dimensional vector space and its dual space have the same dimension, and thus isomorphic. The action of a covector on a vector equals the sum of products of their coordinate representations in any basis and the dual basis: $\omega = \omega_i \varepsilon^i$, $v = v^j e_j$, then $\omega(v) = \omega_i v^i$; we write basis covectors with upper indices, and components of a covector with lower indices, so that the Einstein summation convention applies. Second dual space $V^{∗∗}$ of a vector space is the dual space of its dual space: $V^{∗∗} = (V^∗)^∗$. Canonical embedding $\xi: V \mapsto V^{∗∗}$ of a vector space into its second dual space is the injective linear operator that maps each vector to the evaluation map of covectors at that vector: $\forall v \in V$, $\forall \omega \in V^∗$, $\xi(v)(\omega) = \omega(v)$. This embedding is called canonical because its definition has no reference to any basis (see invariant definition). Algebraically reflexive vector space is a vector space whose canonical embedding is surjective, and thus a canonical vector space isomorphism: $\xi(V) = V^{∗∗}$. Finite-dimensional vector spaces are reflexive. Although a finite-dimensional vector space is also isomorphic to its dual space, there is no canonical isomorphism. Because of the canonical embedding, the action of a covector on a vector sometimes uses a symmetric angle bracket notation: $\langle w, v \rangle := \omega(v)$, $\langle v, w \rangle := \xi(v)(\omega) = \omega(v)$; this notation causes no confusion with an inner product because the latter operates on two vectors. With this symmetric notation, any basis is the dual basis to its dual basis, and thus the dual of a basis in a finite-dimensional vector space is an involution.

Matrix $(a^i_j)^{i \in I}_{j \in J}$ is a rectangular array of scalars in a field $\mathbb{F}$. Matrix addition $A + B$ of two matrices of the same size is the matrix formed by entrywise addition: $[A + B]_{i,j} = a^i_j + b^i_j$. Scalar matrix $a I$ is a scalar multiple of the identity matrix. The set $M_{m,n}(\mathbb{F})$ of all m-by-n matrices with entries from a field $\mathbb{F}$, endowed with entrywise addition and scalar multiplication, is a vector space over the same field. Matrix transpose $A^T$ is the flipped matrix: $[A^T]_{i,j} = a^j_i$. Matrix multiplication $A B$ of two matrices $A \in M_{l,m}(\mathbb{F})$ and $B \in M_{m,n}(\mathbb{F})$ is the matrix in $M_{l,n}(\mathbb{F})$ defined by $[A B]_{i,j} = a^i_k b^k_j$. Hadamard product or Schur product $A \circ B$ or $A \odot B$ of matrices of the same size is the matrix formed by entrywise product: $[A \circ B]_{i,j} = a^i_j b^i_j$. Matrix direct sum $A \oplus B$ is the block diagonal matrix $\text{diag}(A, B)$. Kronecker product or tensor product $A \otimes B$ of two matrices $A \in M_{m,n}(\mathbb{F})$ and $B \in M_{p,q}(\mathbb{F})$ is the matrix in $M_{mp,nq}(\mathbb{F})$ defined by $[A \otimes B]_{(i-1)p+k,(j-1)q+l} = a^i_j b^k_l$. Kronecker sum or tensor sum $A \oplus B$ (same symbol as direct sum) of square matrices $A \in M_m(\mathbb{F})$ and $B \in M_n(\mathbb{F})$ is the square matrix in $M_{mn}(\mathbb{F})$ defined by $A \oplus B = A \otimes I_n + I_m \otimes B$.

Coordinate representation $[A]$ of a linear operator $A \in \mathcal{L}(V, W)$ w.r.t. a basis $(v_j)_{j=1}^n$ of the domain and a basis $(w_i)_{i=1}^m$ of the codomain is the matrix $[A] \in M_{m,n}(\mathbb{F})$ defined by $a^i_j = \omega^i A v_j$, where $(\omega^i)_{i=1}^m$ is the dual basis of the given basis of the codomain. For any bases $\mathcal{B}_V$ and $\mathcal{B}_W$ of n- and m-dimensional vector spaces $V$ and $W$ over field $\mathbb{F}$, the mapping $f: A \rightarrow _{\mathcal{B}_W}[T]_{\mathcal{B}_V}$ from a linear operator to its matrix representation is a vector space isomorphism between $\mathcal{L}(V, W)$ and $M_{m,n}(\mathbb{F})$. Given such isomorphism/identification determined by two bases: matrix addition $[A] + [B]$ corresponds to the addition $A + B$ of linear operators; matrix multiplication $[A] [B]$ corresponds to the composition $A B$ of linear operators. matrix direct sum $[A] \oplus [B]$ corresponds to the direct sum $A \oplus B$ of linear operators, such that $(A \oplus B)(v, v') = (Av) \oplus (Bv')$.

Dual operator or transpose $A^∗ \in \mathcal{L}(W^∗, V^∗)$ of a linear operator $A \in \mathcal{L}(V, W)$ is the linear operator defined by $\forall \omega \in W^∗$, $\forall v \in V$, $(A^∗ \omega)(v) = \omega (A v)$. Given two bases of the domain and the codomain, the matrix representation of the dual operator equals the transpose of that of the original linear operator: $_{\mathcal{B}_V}[A^∗]_{\mathcal{B}_W} = (_{\mathcal{B}_W}[A]_{\mathcal{B}_V})^T$. Fundamental subspaces associated with a rank-r linear operator $A: V \mapsto W$ between real vector spaces of dimensions n and m are the four subspaces of its domain or codomain defined as follows. Let its matrix representation in some bases has a full SVD: $[A] = U \Sigma V^T$. Image (像) $\text{im}(A)$ or column space is the r-subspace of the codomain whose underlying set is the range of the operator: $\text{im}(A) = \text{span}(u_i)_{i=1}^r$. Kernel (核) $\text{ker}(A)$ or null space $A^{-1}(0)$ is the (n-r)-subspace of the domain whose underlying set is the zero set of the operator: $\text{ker}(A) = \text{span}(v_i)_{i=r+1}^n$. Coimage (余像) $\text{im}(A^T)$ or row space is the r-subspace of the domain whose underlying set is the range of the dual operator: $\text{im}(A^T) = \text{span}(v_i)_{i=1}^r$. Cokernel (余核) $\text{ker}(A^T)$ or left null space $(A^T)^{-1}(0)$ is the (m-r)-subspace of the codomain whose underlying set is the zero set of the dual operator: $\text{ker}(A^T) = \text{span}(u_i)_{i=r+1}^m$. Fundamental Theorem of Linear Algebra (not the Fundamental Theorem of Algebra): Kernel and coimage are complementary subspaces: $V = \text{ker}(A) \oplus \text{im}(A^T)$. Cokernel and image are complementary subspaces: $W = \text{ker}(A^T) \oplus \text{im}(A)$. If the underlying vector spaces are inner product spaces, the said two pairs of fundamental subspaces are orthogonal complements: $\text{im}(A^T) = \text{ker}(A)^\perp$; $\text{ker}(A^T) = \text{im}(A)^\perp$. Properties: $\text{im}(A B) \subset \text{im}(A)$; $\text{im}(A B) = \text{im}(A)$ if and only if $\text{im}(B) + \text{ker}(A) = \text{dom}(A)$.

Trace $\text{tr}$ of a square matrix is the sum of its diagonal entries: $\forall A \in M_n$, $\text{tr}~A = \sum_{i=1}^n A_{ii}$. Trace is invariant under cyclic permutations: $\forall A \in M_{m,n}$, $\forall B \in M_{n,m}$, $\text{tr}(A B) = \text{tr}(B A)$. Any linear functional on a space linear endomorphisms is proportional to the trace if it is invariant under cyclic permutations. As a corollary, trace is similarity invariant: $\forall A \in M_n$, $\forall P \in \text{GL}_n$, $\text{tr}(P^{-1} A P) = \text{tr}(A)$. Trace $\text{tr}$ of an endomorphism of a vector space is the sum of the diagonal entries of any matrix representation of the endomorphism, which is independent of the basis: $T \in \mathcal{L}(V, V)$, $\forall B = (b_i)_{i=1}^n$, $\text{Span}(B) = V$, $\text{tr}~T = \text{tr}~[T]_B$.

Determinant of a square matrix is the scalar-valued function $\det: M_n(\mathbb{F}) \mapsto \mathbb{F}$ defined as the sum of products of Latin hypercube samples of matrix entries, each product signed by the the sign of the associated permutation: $\det(A) := \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n a_{i\sigma(i)}$, where $S_n$ is the symmetric group. The determinant is an order-n polynomial of the matrix entries with n! terms, and it is affine in each matrix entry. Minor of a square matrix is the determinant of a square submatrix. The (i, j) cofactor of a square matrix is the signed determinant of its submatrix formed by deleting the i-th row and the j-th column: $c_{ij} := (-1)^{i+j} \det(A_{-i, -j})$. Equivalently, the determinant can be defined as follows: if n = 1, it equals its sole entry, $\det([a]) = a$; if n > 1, it is recursively defined by the Laplace expansion by cofactors along any row or column, $\det(A) = \sum_{j=1}^n a_{ij} c_{ij} = \sum_{i=1}^n a_{ij} c_{ij}$, $\forall i, j \in n$. Properties. The identity matrix has a unit determinant: $\det(I) = 1$. The determinant is multiplicative: $\det(A B) = \det(A) \det(B)$. (Corollary) The determinant is the product of the repeated eigenvalues: $\det(A) = \det(J) = \prod_{i=1}^n \lambda_i$, where $A = P J P^{-1}$ is a Jordan decomposition. The determinant is the alternating n-tensor (aka n-covector) on the Euclidean n-space, unique up to scaling.

Inverse of a square matrix, if exists, is the matrix $A^{-1}$ such that $A A^{-1} = I$. Invertible matrix is a square matrix whose inverse exists. Invertible matrices and square nonsingular matrices are equivalent. A square matrix is invertible if and only if zero is not one of its eigenvalues (or singular values). Square matrices are invertible in general: the set of singular matrices has measure zero, $\mu(M_n \setminus \text{GL}_n) = 0$. Adjugate or classical adjoint of a square matrix is the transpose of its cofactor matrix: $\text{adj}(A) := C^T$, where $[C]_{i,j} = c_{ij}$. The inverse of a nonsingular square matrix is its adjugate divided by its determinant: $A^{-1} = \det(A)^{-1} \text{adj}(A)$; in entry form, $\overline{a}_{ij} = (-1)^{i+j} \det(A_{-j, -i}) / \det(A)$. Matrix inversion $\text{inv}: \text{GL}(n, \mathbb{F}) \mapsto \text{GL}(n, \mathbb{F})$ is a rational function of the matrix entries, with order-(n-1) polynomials on the numerator and a common order-n polynomial on the denominator.

Matrix Factorization

Matrix factorization or matrix decomposition is a representation of a matrix as the product of (typically two or three) matrices. The factor matrices may be smaller or more structured than the original matrix.

Typically a matrix factorization is not unique. Consider an unstructured bi-factorization of a matrix: $M = A B^T$. This is equivalent to expressing A as a sum of rank-one matrices, defined by the paired columns of M and N: $A = \sum_{i=1}^s a_i b_i^T$. (1) The sum is invariant under re-indexing: we may think of M and N as two decks of cards, with the same depth and perhaps different heights; the equation $A B^T = \sum_{i=1}^s a_i b_i^T$ resembles a riffle (dovetail shuffle) which exactly pairs the cards in the two decks; simultaneous permutation of the two decks does not change the outcome; pairs can be added to or removed from anywhere in the two decks, $A B^T + a b^T = [A~a] [B~b]^T$. (2) If the number of columns in A (and B) equals the rank of A, i.e. s = r, then the product is invariant under any change of basis, defined by $(A, B) \to (A P, B P^{-T})$. Simultaneous permutation (case 1) is a special case of this, because $P^{-T} = P$. (3) If s > r, then apparently the factorization is redundant.

Common matrix factorizations (and rank-one representations):

LU decomposition ~ Gaussian elimination for solving systems of linear equations: $A = L U$, where $\forall k \le n$, $[A]_{1 \dots k} \in \text{GL}_k$, $L$ is lower-triagular, and $U$ is upper-triangular; $A = \sum_{i=1}^n l_i u_i^∗$, sequentially reduce residual order from n to 1; (We use row vector notation $a_i^∗ := e_i^T A$.)
- $A = P^T L U$, LU decomposition with row pivoting;
- Cholesky decomposition: $A = L L^H$, where $A \succeq 0$ and $L$ is lower-triangular;
QR decomposition ~ Gram–Schmidt orthogonalization process, least squares: $A = Q R$, where $A \in M_{m,n}$, $Q \in M_{m,n}$ is orthogonal/unitary, $R \in M_n$ is upper-triangular; $\{ a_1, \cdots, a_n \} \to \{ q_1, \cdots, q_n \}$; $A = \sum_{i=r}^n q_i r_i^∗$, sequentially building up column space from 1 to r dimensions;
Jordan decomposition ~ eigenvalue problem: $A = P J P^{-1}$, where $A \in M_n$, $J$ is a Jordan matrix, and $P$ is invertible; if $A$ is real with only real eigenvalues, then $P$ can be real;
- Eigendecomposition: $A = P \Lambda P^{-1}$, where $\Lambda$ is diagonal, if $A$ has $n$ linearly independent eigenvectors; $A = \sum_{i=1}^r \lambda_i p_i q_i^T$, $q_i = P^{-T} e_i$, $\text{mod}(\lambda) \in \mathbb{R}^n_{+\downarrow}$, sequentially reducing sensitivity;
  - $A = U \Lambda U^H$, where $U$ is orthogonal/unitary, if and only if $A$ is a normal matrix, e.g. orthogonal/unitary, (skew-)symmetric/Hermitian;
QZ decomposition / generalized Schur decomposition ~ generalized eigenvalue problem: $A = Q S Z^H, B = Q T Z^H$, where $A, B \in M_n$, $Q, Z$ orthogonal/unitary, and $S, T$ (1-2 block) upper triangular, such that ratio of the diagonals are the generalized eigenvalues $\lambda_i = S_{ii} / T_{ii}$, $A v = \lambda B v$;
- Schur decomposition, $A = Q U Q^H$, where $A \in M_n$, $Q$ unitary, and $U$ upper triangular (and complex), aka Schur form, where the diagonal are the eigenvalues;
Singular value decomposition (SVD) ~ principal components analysis, latent factor analysis: $A = V \Sigma W^H$, where $A \in M_{m,n}$, $V \in M_m, W \in M_n$ are orthogonal/unitary, and $\Sigma \in M_{m,n}$ is non-negative and has diagonal entries only; $A = \sum_{i=1}^r \sigma_i v_i w_i^H$, $\sigma \in \mathbb{R}^n_{+\downarrow}$, sequentially minimizing residual norm;

Similarity and the Eigenvalue Problem

Two square matrices $A, B \in M_n$ are similar if they are matrix representations of the same linear operator in different bases: $\exists S \in \text{GL}(n): A = S B S^{-1}$. The transformation $B \to S B S^{-1}$ is called a similarity transformation by similarity matrix $S$.

Eigenvector of a linear transformation on a finite-dimensional vector space is a vector that gets scaled by the transformation: $A x = \lambda x$. Eigenvalue (本征值) $\lambda$ associated with an eigenvector is the scaling factor when the eigenvector is applied to the linear transformation. Eigenspace associated with an eigenvalue is the subspace of eigenvectors with the eigenvalue: $\{x : A x = \lambda x\}$. Spectrum $\lambda(A)$ of a linear transformation on a finite-dimensional vector space is the set of its eigenvalues. Resolvent set $\mathbb{C} \setminus \lambda(A)$ of a linear transformation is the complement of its spectrum in the complex plane.

Jordan block $J_k(\lambda_i)$ is a k-by-k matrix with $\lambda_i$ on the diagonal and 1 on the super diagonal: $J_k(\lambda_i) = \lambda_i I_k + U_k$, where $U_k$ is the upper/backward shift matrix $U_k = [0, I_{k-1}; 0, 0]$; specially, $J_1(\lambda_i) = [\lambda_i]$. Jordan matrix is a direct sum of Jordan blocks, i.e. a block diagonal matrix consisting of Jordan blocks: $J = \text{diag}\{ J_{n_i}(\lambda_i) \}_{i=1}^k$, $\sum_{i=1}^k n_i = n$. Jordan Decomposition Theorem: Any square matrix is similar to a (complex) Jordan matrix, unique up to permutation of Jordan blocks: $A = P J P^{-1}$, $P \in S_n$. In this form, $\{ p_{1 + n_i} \}_{i = 0}^{k-1}$ is a basis for the direct sum of all the eigenspaces of $A$, which correspond to eigenvalues $\{\lambda_i\}_{i = 1}^k$ respectively. A family of similar matrices, i.e. an equivalence class under similarity, have the same Jordan canonical form.

Algebraic multiplicity of eigenvalue $\lambda_i$ is the multiplicity of it as a zero of the characteristic polynomial $p_A(t)$, i.e. the number of diagonal entries in the Jordan matrix that equal to $\lambda_i$. Geometric multiplicity of eigenvalue $\lambda_i$ is the maximum number of linearly independent eigenvectors associated with it, i.e. the number of Jordan blocks in the Jordan matrix that correspond to $\lambda_i$.

A square matrix is diagonalizable (aka diagonable, nondefective, or semisimple) if it is similar to a diagonal matrix (Jordan blocks all size one). A square matrix is diagonalizable if and only if its eigenvectors can form a basis. Square matrices are diagonable in general: the set of defective matrices has measure zero.

Two diagonable matrices are simultaneously diagonable if they share a same set of eigenvectors: $A = P \Lambda P^{-1}$, $B = P \widetilde{\Lambda} P^{-1}$. Simultaneously diagonable family is a set of square matrices that can be diagonalized by the same similarity transformation: $\mathcal{F} \subset M_n$, $\exists P \in \text{GL}_n$: $\forall A \in \mathcal{F}$, $A = P \Lambda P^{-1}$. The set of matrices simultaneously diagonable with a given diagonable matrix is n-dimensional: $\mathcal{F} = \{P \text{diag}(\lambda) P^{-1} : \lambda \in \mathbb{C}^n \}$. Commuting family is a set of square matrices where each pair of matrices commutes (under multiplication): $\mathcal{F} \subset M_n$, $\forall A, B \in \mathcal{F}$, $A B = B A$. Two diagonable matrices are simultaneously diagonable if and only if they commute; (Thm 1.3.19) A family of diagonable matrices is a simultaneously diagonable family if and only if it is a commuting family. Commutability with a given diagonable matrix restricts $M_n$ to $\mathbb{C}^n$; from another perspective, the solutions to the matrix equation $A X = X A$, where A is diagonable, forms an n-dim vector space.

Location and perturbation of eigenvalues.

Miscellaneous Topics

Matrix Polynomial
Matrix-variate and Matrix-valued Functions
Matrix Manifold
Quadratic form: Quadratic Equation
Lambda matrices
Linear Programming
Minimax Theorem

Principal angles

Principal angles $\theta$ between two k-dim subspaces are defined recursively: Let [⋅] be an alias for $\text{Span}: M^∗_{n, k} \mapsto G_{k,n}$. With $M \in M^∗_{n, m}$, $N \in M^∗_{n, m'}$, and $m, m' \in \{0, \dots, n\}$, denote $S(M, N) = \mathbb{S}^{n-1} \cap [M] \cap [N]^\bot$ as the sphere in [M] and orthogonal to [N]. Given $U, V \in V_{k,n}$ as Stiefel representations for $[U], [V] \in G_{k,n}$, for $i = 1, \dots, k$, define $(\tilde{u}_i, \tilde{v}_i, \theta_i)$ such that $\cos \theta_i = \max_{\tilde{u}_i \in S(U, (\tilde{u}_j)_{j=1}^{i-1})} \max_{\tilde{v}_i \in S(V, (\tilde{v}_j)_{j=1}^{i-1})} \langle \tilde{u}_i, \tilde{v}_i \rangle$. Note that the principal angles are in non-decreasing order, and at most $\min(k, \lfloor n / 2 \rfloor)$ principal angles can be positive: let $l = \max(0, k - \lfloor n / 2 \rfloor)$, $(\theta_i)_{i=1}^l = 0$, and $0 \le \theta_{l+1} \le \dots \le \theta_k \le \pi / 2$; or more compactly, $(\theta_i)_{i=l+1}^k \in [0, \pi / 2]^u_{\uparrow}$, where $u = \min(k, \lfloor n / 2 \rfloor)$. We may call $\tilde{U} = (\tilde{u}_i)_{i=1}^k$ and $\tilde{V} = (\tilde{v}_i)_{i=1}^k$ the principal bases for the pair of subspaces ([U], [V]), which is uniquely determined by the subspaces up to simultaneous orthogonal transformations of the paired principal subspaces (for repeated principal angles). The principal bases may be seen as a canonical representation for this pair of subspaces. Note that the principal bases are orthonormal, "cross orthogonal", and at principal angles to each other: $\tilde{U}, \tilde{V} \in V_{k,n}$ and $\tilde{U}^T \tilde{V} = \text{diag}(\cos \theta)$. The above discussion on principal angles also applies to subspaces of different dimsensions.

Computing principal bases and principal angles. Denote singular value decomposition $U^T V = P \Sigma Q^T$, where $P, Q \in O(k)$, $\Sigma = \text{diag}(\sigma)$, and $\sigma \in \mathbb{R}^k_{+\downarrow}$, then principal bases $\tilde{U} = U P$ and $\tilde{V} = V Q$, and principal angles $\theta = \arccos \sigma$. However, small principal angles computed this way only has single precision due to arccos(). One may correct these principal angles (up to $\theta < 0.1 \pi/2$) via the algorithm for Grassmann logarithm per [@Zimmermann2019]: $\theta = \arcsin \sigma(U P Q^T - V Q \Sigma Q^T)$. As a note, the algorithm per [@Absil2004] are very inaccurate for principal angles away from $\pi/2$ when there are principal angles close to $\pi/2$, which is not worth using.

🏷 Category=Algebra