Linear algebra deals with finite-dimensional vector spaces. These notes on linear algebra follow [@Horn & Johnson, 1990. Matrix Analysis.]

Scanned course notes: Notes on linear algebra; Notes on advanced algebra; Notes on vector space;

Symbols

  • $\bar{\mathbb{F}}$: algebraically closed field, e.g. $\bar{\mathbb{Q}}$, $\bar{\mathbb{R}}$, and $\bar{\mathbb{C}}$.
  • $\mathbb{R}^n, \mathbb{C}^n$: n-dimensional vector spaces over $\mathbb{R}$ and $\mathbb{C}$.
  • $\mathcal{B}$, $\{ e_i \}_{i=1}^n$: a basis of a vector space; the standard basis of $\mathbb{R}^n$ or $\mathbb{C}^n$.
  • $[v]_{\mathcal{B}}$, $_{\mathcal{B}_1}[T]_{\mathcal{B}_2}$: coordinate representation of a vector in basis $\mathcal{B}$ / of a linear operator w.r.t. domain basis $\mathcal{B}_2$ and codomain basis $\mathcal{B}_1$.
  • $A$, $[A]_{I,J}$: a matrix; a submatrix with rows and columns from index sets $I$ and $J$ (default to $I = J$).
  • $M_{m,n}(\mathbb{F})$, $M_n(\mathbb{F})$: the set of all m-by-n / n-by-n matrices over field $\mathbb{F}$, which defaults to $\mathbb{C}$.
  • $\mathcal{F}$: a family of matrices;
  • $\text{GL}(n, \mathbb{F})$, $\text{SL}(n, \mathbb{F})$: the general / special linear group formed by nonsingular / unit-determinant matrices in $M_n(\mathbb{F})$;
  • $O(n)$, $SO(n)$: the (special) orthogonal group formed by orthogonal matrices in $\text{GL}(n, \mathbb{R})$ / $\text{SL}(n, \mathbb{R})$;
  • $U(n)$, $SU(n)$: the (special) unitary group formed by unitary matrices in $\text{GL}(n, \mathbb{C})$ / $\text{SL}(n, \mathbb{C})$;
  • $A \succeq 0$, $A \ge 0$: positive semi-definite matrix; nonnegative matrix.
  • $|A|$: matrix of absolute values of entries of a matrix.
  • $A^T, \bar{A}, A^∗, A^\dagger$: transpose / conjugate / Hermitian adjoint (conjugate transpose) / Moore-Penrose pseudoinverse of a matrix.
  • $A^{-1}$: inverse of a nonsingular matrix.
  • $A^{1/2}$: the unique positive semidefinite square root of a positive semidefinite matrix.
  • $\sigma(A)$, $\sigma_i(A)$: the set of singular values of a matrix; the i-th largest singular value.
  • $\lambda(A)$, $\lambda_i(A)$: spectrum, the set of eigenvalues of a square matrix; the i-th eigenvalue, in increasing order if $A$ is Hermitian.
  • $\rho(A)$: spectral radius of a square matrix, the largest modulus of its eigenvalues.
  • $\text{tr} A$, $\det A$, $\text{adj} A$: trace / determinant / classical adjoint (adjugate) of a square matrix.
  • $p_A(t), q_A(t)$: the characteristic / minimal polynomial of a square matrix.

Vector Space and Product Space of Scalars

Vector space or linear space $(V, (+, \cdot_{\mathbb{F}}))$ over a field $(\mathbb{F}, (+, \cdot))$ is a set $V$ endowed with a binary operation $+: V^2 \mapsto V$ and a map $\cdot_{\mathbb{F}}: \mathbb{F} \times V \mapsto V$, such that $(V, +)$ is an Abelian group and scalar multiplication is compatible with field multiplication ($a (b v) = (a b) v$, $1 v = v$) and distributes over both additions. For example, Cartesian powers of real numbers $\mathbb{R}^n$ and the set of continuous functions on the unit interval $C^0(I)$ become real vector spaces when endowed with component-/point-wise addition and scalar multiplication. We call $+$ vector addition and $\cdot_{\mathbb{F}}$ scalar multiplication. Vector space structure $(+, \cdot_{\mathbb{F}})$ refers to the vector addition and scalar multiplication maps of a vector space.

Linear subspace $(S, (+, \cdot_{\mathbb{F}}))$ of a vector space is a vector space consisting of a subset of the vector space and the same vector space structure: $+: S^2 \mapsto S$, $\cdot_{\mathbb{F}}: \mathbb{F} \times S \mapsto S$. Linear combination of vectors is a finite sum of scaled vectors: $\sum_{i=1}^k a^i v_i$. Linear span (线性生成空间) $\text{Span}(A)$ of a set $A$ of vectors is the subspace whose underlying set is the set of all linear combinations of the set.

A set of vectors is linearly independent if every nontrivial linear combination of the set (one with at least one nonzero coefficient) is non-zero. Algebraic basis $\mathcal{B}$ of a vector space $V$ is a set of linearly independent vectors that span the space: $\text{Span}(\mathcal{B}) = V$. Dimension $\dim V$ of a vector space is the size of any basis of the vector space. The basis may be finite, countably infinite, or uncountably infinite; so is the dimension of the vector space.
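
As a quick numerical check (a sketch using NumPy; the rank computation is floating-point, so it only indicates independence up to a numerical tolerance), vectors stacked as columns of a matrix are linearly independent exactly when the matrix has full column rank:

```python
import numpy as np

# Columns are the candidate vectors in R^3 (coordinates w.r.t. the standard basis).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

rank = np.linalg.matrix_rank(A)      # dimension of the span of the columns
independent = rank == A.shape[1]     # full column rank <=> linearly independent
print(rank, independent)             # 2 False: the third column is the sum of the first two
```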

Table: The intersection of linearly independent sets and spanning sets consists of the bases.

| Number of elements | $1, \cdots, \dim V$  | $\dim V$ | $\dim V, \cdots$ |
|--------------------|----------------------|----------|------------------|
| Sets               | linearly independent | basis    | span to $V$      |

The sum $v + S$ of a vector and a subspace of a vector space is the set of all vectors that can be written as the addition of the vector and a vector from the subspace: $v + S = \{v + w : w \in S\}$. We call $v + S$ an affine subspace of the vector space parallel to the subspace, and also the coset (陪集) of the subspace determined by the vector. Dimension of an affine subspace is the dimension of the associated subspace. Quotient $V / S$ of a vector space by a subspace is the set of all cosets of the subspace: $V / S = \{v + S : v \in V\}$. Quotient space $(V / S, (+, \cdot_{\mathbb{F}}))$ of a vector space is a vector space of dimension $\dim V - \dim S$, consisting of a quotient of the vector space, and vector addition and scalar multiplication of cosets defined by $(v + S) + (w + S) = (v + w) + S$ and $c (v + S) = (c v) + S$. The natural projection $\pi: V \mapsto V/S$ associated with a quotient space is defined by $\pi(v) = v + S$.

The sum $S + T$ of two subspaces of a vector space is the set of all vectors that can be written as the addition of two vectors, one from each subspace: $S + T = \{v + w : v \in S, w \in T\}$. The sum of two subspaces equals the linear span of their union: $S + T = \text{Span}(S \cup T)$. Internal direct sum $S \oplus T$ of two subspaces of a vector space that intersect only at zero is their sum: $S \oplus T = S + T$. Complementary subspaces are two linear subspaces that intersect only at zero and whose (direct) sum equals the full space. Projection $\pi: V \mapsto S$ from a vector space onto a subspace, given a complementary subspace $T$, is the linear map that takes each vector to the vector in the subspace that appears in its direct sum decomposition: $\pi(v + w) = v$ where $v \in S, w \in T$.
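
A minimal numerical sketch of the projection onto $S$ along $T$ (assuming NumPy and that the given columns really span complementary subspaces of $\mathbb{R}^3$): express a vector in the combined basis of $S \oplus T$ and keep only the $S$-part.

```python
import numpy as np

S = np.array([[1.0], [0.0], [0.0]])                   # basis of S (one column)
T = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 1.0]])    # basis of T (two columns)
v = np.array([2.0, 3.0, 4.0])

B = np.hstack([S, T])              # basis of S ⊕ T = R^3
c = np.linalg.solve(B, v)          # coordinates of v in that basis
proj_v = S @ c[:S.shape[1]]        # keep the S-components: π(v)
print(proj_v)                      # [-2.  0.  0.]
```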

Direct product or product space $(\prod_{\alpha \in A} V_\alpha, (+, \cdot_{\mathbb{F}}))$ of an indexed family of vector spaces over the same field is the vector space consisting of the Cartesian product and componentwise addition and scalar multiplication. The external direct sum is the subspace of the direct product whose elements have only finitely many nonzero components; hence the direct product and the external direct sum of a finite indexed family of vector spaces are identical.

Vector space isomorphism is an invertible linear map between two vector spaces. Finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension: $V \cong W \iff \dim V = \dim W$. In particular, any n-dimensional vector space over a field $\mathbb{F}$ is isomorphic to the n-th Cartesian power of the field: $V \cong \mathbb{F}^n \iff \dim V = n$. For any basis $\mathcal{B}$ of an n-dimensional vector space $V$ over field $\mathbb{F}$, the mapping $f: v \mapsto [v]_{\mathcal{B}}$ from a vector to its coordinates is an isomorphism between $V$ and $\mathbb{F}^n$. It justifies the identification of all n-dimensional vector spaces with $\mathbb{F}^n$, up to a specific basis for each space. The set $\mathbb{F}^X$ of all functions $f: X \mapsto \mathbb{F}$ from a set to a field $(\mathbb{F}, (+, \cdot))$, endowed with pointwise addition and scalar multiplication, is a vector space $(\mathbb{F}^X, (+, \cdot_{\mathbb{F}}))$ over the field, which may be called a scalar-valued function space. A function space has finite dimension if and only if the underlying set is finite: $|X| = n \implies \mathbb{F}^X \cong \mathbb{F}^n$.
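
In coordinates, the isomorphism $v \mapsto [v]_{\mathcal{B}}$ can be computed by solving a linear system. A sketch (NumPy assumed, with an arbitrary example basis given as the columns of an invertible matrix):

```python
import numpy as np

# Basis B of R^2, written as columns in standard coordinates.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([3.0, 4.0])

coords = np.linalg.solve(B, v)   # [v]_B: coefficients such that v = B @ coords
assert np.allclose(B @ coords, v)
print(coords)                    # [1. 2.]
```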

Linear Operator and Matrix

Operator (算子) $\omega: V \mapsto W$ is a mapping from a vector space to another; its application to / action on a vector is commonly denoted as $\omega v = w$. Linear operator $\omega: V \to W$ is an operator between vector spaces over the same field that is compatible with their vector space structures: $\omega (x + y) = \omega x + \omega y$, $\omega (a x) = a \omega x$. Examples of linear operators include the coefficient matrices of systems of linear equations. The set $\mathcal{L}(V, W)$ of all linear operators between two given vector spaces over the same field, endowed with pointwise addition and scalar multiplication, is a vector space over the same field. Linear transformation $\upsilon: V \to V$ is a linear operator from a vector space to itself. Examples of linear transformations include coordinate transformations and the Fourier integral transform. The vector space of linear transformations on a vector space is also written as $\mathcal{L}(V)$. Linear operator theory concerns linear operators between (infinite-dimensional) vector spaces; the study of linear operators between finite-dimensional vector spaces is called matrix theory. An important question in linear operator theory is classifying the linear transformations in $\mathcal{L}(V)$ w.r.t. some equivalence relation, e.g. similarity, unitary equivalence, topological equivalence.

Functional (泛函) $\alpha: V \mapsto \mathbb{F}$ is an operator from a vector space to its underlying scalar field, especially when the vector space is (a subspace of) a scalar-valued function space $\mathbb{F}^X$. Linear functional or covector (余向量) $\alpha: V \to \mathbb{F}$ is a functional that is also a linear operator. Examples of linear functionals include differentiation and integration.

Dual space $V^∗$ of a vector space is the vector space of its linear functionals: $V^∗ = \mathcal{L}(V, \mathbb{R})$. Any finite-dimensional vector space is isomorphic to its second dual space $V^{∗∗} = ( V^∗ )^∗$ via the canonical isomorphism $\xi: V \mapsto V^{∗∗}$ defined by: $\forall v \in V$, $\forall \omega \in V^∗$, $\xi(v)(\omega) = \omega(v)$. The action of a covector on a vector sometimes uses a symmetric angle bracket notation: $\langle w, v \rangle = \omega(v)$, $\langle v, w \rangle = \xi(v)(\omega)$, which is different from the angle bracket notation for inner products because the latter operate on two vectors. Kronecker delta $\delta^i_j$ for indices $i, j \in I$ is a symbol for whether the indices are equal: $\delta: I^2 \mapsto \{0, 1\}$; $\delta^i_j = 1$ iff $i = j$. Dual basis $(\varepsilon^i)_{i=1}^n$ to a basis $(e_i)_{i=1}^n$ of a vector space $V$ is the basis for the dual space $V^∗$ defined by $\varepsilon^i(e_j) = \delta^i_j$. Therefore, a finite-dimensional vector space and its dual space have the same dimension and are isomorphic; but there is no canonical isomorphism, i.e. without reference to any basis. By the canonical isomorphism $V \cong V^{∗∗}$, the dual of a basis vector is an involution, i.e. any basis is the dual basis to its dual basis. The action of a covector on a vector equals the sum of products of their coordinate representation in any basis and its dual basis: $\omega = \omega_i \varepsilon^i$, $v = v^j e_j$, then $\omega(v) = \omega_i v^i$; we write basis covectors with upper indices, and components of a covector with lower indices, so that the Einstein summation convention applies.
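
In coordinates, if the basis vectors $e_j$ are the columns of an invertible matrix $E$, then the dual basis covectors $\varepsilon^i$ are the rows of $E^{-1}$, since $(E^{-1} E)^i_j = \delta^i_j$. A small sketch (NumPy assumed):

```python
import numpy as np

E = np.array([[2.0, 1.0],
              [0.0, 1.0]])       # columns e_1, e_2: a basis of R^2
dual = np.linalg.inv(E)          # rows are the dual basis covectors ε^1, ε^2

# ε^i(e_j) = δ^i_j
print(np.round(dual @ E, 12))    # identity matrix

# The action ω(v) of ω = ω_i ε^i on v = v^j e_j equals ω_i v^i in matching coordinates.
omega_coords = np.array([3.0, -1.0])       # ω_i
v_coords = np.array([2.0, 5.0])            # v^j
omega = omega_coords @ dual                # ω in standard (row) coordinates
v = E @ v_coords                           # v in standard coordinates
print(omega @ v, omega_coords @ v_coords)  # both ≈ 1.0
```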

Matrix $(a^i_j)^{i \in I}_{j \in J}$ is a rectangular array of scalars in a field $\mathbb{F}$. Matrix addition $A + B$ of two matrices of the same shape is the matrix formed by entrywise addition: $[A + B]_{i,j} = a^i_j + b^i_j$. Scalar matrix $a I$ is a scalar multiple of the identity matrix. The set $M_{m,n}(\mathbb{F})$ of all m-by-n matrices with entries from a field $\mathbb{F}$, endowed with entrywise addition and scalar multiplication, is a vector space over the same field. Matrix transpose $A^T$ is the flipped matrix: $[A^T]_{i,j} = a^j_i$. Matrix multiplication $A B$ of two matrices $A \in M_{l,m}(\mathbb{F})$ and $B \in M_{m,n}(\mathbb{F})$ is the matrix in $M_{l,n}(\mathbb{F})$ defined by $[A B]_{i,j} = a^i_k b^k_j$. Matrix direct sum $A \oplus B$ is the block diagonal matrix $\text{diag}(A, B)$. Hadamard product or Schur product $A \circ B$ or $A \odot B$ of matrices of the same shape is the matrix formed by entrywise product: $[A \circ B]_{i,j} = a^i_j b^i_j$. Kronecker product or tensor product $A \otimes B$ of two matrices $A \in M_{m,n}(\mathbb{F})$ and $B \in M_{p,q}(\mathbb{F})$ is the matrix in $M_{mp,nq}(\mathbb{F})$ defined by $[A \otimes B]_{(i-1)p+k,(j-1)q+l} = a^i_j b^k_l$. Kronecker sum or tensor sum $A \oplus B$ (same symbol as direct sum) of square matrices $A \in M_m(\mathbb{F})$ and $B \in M_n(\mathbb{F})$ is the square matrix in $M_{mn}(\mathbb{F})$ defined by $A \oplus B = A \otimes I_n + I_m \otimes B$.
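
A sketch in NumPy of the entrywise and tensor operations above (the Kronecker sum is assembled by hand from `np.kron`, since NumPy has no built-in for it):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

hadamard = A * B                      # Hadamard/Schur product: entrywise
kron = np.kron(A, B)                  # Kronecker/tensor product, shape (4, 4)
direct_sum = np.block([[A, np.zeros_like(B)],
                       [np.zeros_like(A), B]])   # matrix direct sum diag(A, B)

m, n = A.shape[0], B.shape[0]
kron_sum = np.kron(A, np.eye(n)) + np.kron(np.eye(m), B)   # Kronecker sum A ⊕ B
print(hadamard, kron.shape, direct_sum.shape, kron_sum.shape)
```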

Coordinate representation $[A]$ of a linear operator $A \in \mathcal{L}(V, W)$ w.r.t. a basis $(v_j)_{j=1}^n$ of the domain and a basis $(w_i)_{i=1}^m$ of the codomain is the matrix $[A] \in M_{m,n}(\mathbb{F})$ defined by $a^i_j = \omega^i (A v_j)$, where $(\omega^i)_{i=1}^m$ is the dual basis of the given basis of the codomain. For any bases $\mathcal{B}_V$ and $\mathcal{B}_W$ of n- and m-dimensional vector spaces $V$ and $W$ over field $\mathbb{F}$, the mapping $f: A \mapsto {}_{\mathcal{B}_W}[A]_{\mathcal{B}_V}$ from a linear operator to its matrix representation is a vector space isomorphism between $\mathcal{L}(V, W)$ and $M_{m,n}(\mathbb{F})$. Given such an isomorphism/identification determined by two bases: matrix addition $[A] + [B]$ corresponds to the addition $A + B$ of linear operators; matrix multiplication $[A] [B]$ corresponds to the composition $A B$ of linear operators; matrix direct sum $[A] \oplus [B]$ corresponds to the direct sum $A \oplus B$ of linear operators, defined by $(A \oplus B)(v, v') = (A v, B v')$.
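
In coordinates, if the domain basis vectors are the columns of $V$ and the codomain basis vectors are the columns of $W$ (both written in standard coordinates), the representation of an operator with standard-basis matrix $A$ is $[A] = W^{-1} A V$. A sketch (NumPy assumed, with an arbitrary example operator and bases):

```python
import numpy as np

A_std = np.array([[1.0, 2.0],
                  [0.0, 1.0]])        # the operator in the standard bases
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # domain basis (columns)
W = np.array([[2.0, 0.0],
              [0.0, 1.0]])            # codomain basis (columns)

A_rep = np.linalg.solve(W, A_std @ V)   # [A] = W^{-1} A V, entry a^i_j = ω^i(A v_j)
print(A_rep)
```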

Dual operator or transpose $A^∗ \in \mathcal{L}(W^∗, V^∗)$ of a linear operator $A \in \mathcal{L}(V, W)$ is the linear operator defined by $\forall \omega \in W^∗$, $\forall v \in V$, $(A^∗ \omega)(v) = \omega (A v)$. Given bases of the domain and the codomain (and their dual bases), the matrix representation of the dual operator equals the transpose of that of the original linear operator: ${}_{\mathcal{B}_V^∗}[A^∗]_{\mathcal{B}_W^∗} = ({}_{\mathcal{B}_W}[A]_{\mathcal{B}_V})^T$. Fundamental subspaces associated with a rank-$r$ linear operator $A: V \mapsto W$ between real vector spaces of dimensions $\dim V = n$ and $\dim W = m$ are the four subspaces of its domain or codomain defined as follows. Let its matrix representation in some bases have singular value decomposition $[A] = U \Sigma V^T$; a coordinate basis for each subspace is also provided. Image (像) $\text{im}(A)$ or column space is the r-dimensional subspace of the codomain whose underlying set is the range of the operator: $\text{im}(A) = A V$; it has a basis $\{u_i\}_{i=1}^r$. Kernel (核) $\text{ker}(A)$ or null space $A^{-1}(0)$ is the (n-r)-dimensional subspace of the domain whose underlying set is the zero set of the operator; it has a basis $\{v_i\}_{i=r+1}^n$. Coimage (余像) $\text{im}(A^T)$ or row space is the r-dimensional subspace of the domain whose underlying set is the range of the dual operator: $\text{im}(A^T) = A^T W$; it has a basis $\{v_i\}_{i=1}^r$. Cokernel (余核) $\text{ker}(A^T)$ or left null space $(A^T)^{-1}(0)$ is the (m-r)-dimensional subspace of the codomain whose underlying set is the zero set of the dual operator; it has a basis $\{u_i\}_{i=r+1}^m$. Fundamental Theorem of Linear Algebra (not to be confused with the Fundamental Theorem of Algebra): kernel and coimage are complementary subspaces, $V = \text{ker}(A) \oplus \text{im}(A^T)$; cokernel and image are complementary subspaces, $W = \text{ker}(A^T) \oplus \text{im}(A)$. If the underlying vector spaces are inner product spaces, these two pairs of fundamental subspaces are orthogonal complements: $\text{ker}(A)^\perp = \text{im}(A^T)$; $\text{ker}(A^T) = \text{im}(A)^\perp$.
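
The four fundamental subspaces can be read off a numerical SVD exactly as described above. A sketch with NumPy (the rank is determined by an assumed tolerance on the singular values):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])                 # a rank-1 map R^3 -> R^2

U, s, Vt = np.linalg.svd(A)                     # A = U Σ V^T
r = int(np.sum(s > 1e-10))                      # numerical rank

col_space  = U[:, :r]        # basis of im(A)      (r columns of U)
left_null  = U[:, r:]        # basis of ker(A^T)   (m - r columns of U)
row_space  = Vt[:r, :].T     # basis of im(A^T)    (r columns of V)
null_space = Vt[r:, :].T     # basis of ker(A)     (n - r columns of V)

print(np.allclose(A @ null_space, 0), np.allclose(left_null.T @ A, 0))  # True True
```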

Matrix Decomposition/Factorization

Matrix decompositions/factorizations express a matrix as the product of smaller or simpler matrices; a numerical sketch follows the list below.

  • QR decomposition ~ Gram–Schmidt orthogonalization process: $A = Q R$, where $A \in M_{m,n}$ with $m \ge n$, $Q \in M_{m,n}$ has orthonormal columns (orthogonal/unitary when $m = n$), and $R \in M_n$ is upper-triangular; $\{ a_1, \cdots, a_n \} \to \{ q_1, \cdots, q_n \}$;
  • LU (LUP) decomposition ~ Gaussian elimination for solving systems of linear equations: $A = L U$ where $L$ is lower-triangular and $U$ is upper-triangular, if $A \in M_n$ and $\det[A]_{1 \dots i} \ne 0, i = 1, \cdots, n$;
    • Cholesky decomposition: $A = L L^∗$ where $L$ is lower-triangular, if $A \succeq 0$;
  • Jordan decomposition ~ eigenvalue problem: $A = P J P^{-1}$ where $A \in M_n$, $J$ is a Jordan matrix, and $P$ is invertible; if $A$ is real with only real eigenvalues, then $P$ can be real;
    • Eigendecomposition (spectral decomposition): $A = P \Lambda P^{-1}$ where $\Lambda$ is diagonal, if $A$ has $n$ linearly independent eigenvectors;
      • $A = U \Lambda U^∗$ where $U$ is orthogonal/unitary, if $A$ is a normal matrix (e.g. orthogonal/unitary, (skew-)symmetric/Hermitian);
  • QZ decomposition ~ generalized eigenvalue problem;
  • Singular value decomposition (SVD) ~ principal components analysis, latent factor analysis: $A = V \Sigma W^∗$, where $A \in M_{m,n}$, $V \in M_m, W \in M_n$ are orthogonal/unitary, and $\Sigma \in M_{m,n}$ has nonnegative entries on its main diagonal and zeros elsewhere;
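
The decompositions above are all available numerically; a sketch using NumPy and SciPy (the SciPy import is an assumption, needed only for the LU factorization):

```python
import numpy as np
from scipy.linalg import lu            # assumed available; NumPy itself has no LU routine

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])             # a symmetric positive definite example

Q, R = np.linalg.qr(A)                 # QR:       A = Q R
P, L, U = lu(A)                        # LUP:      A = P L U (P is a permutation)
C = np.linalg.cholesky(A)              # Cholesky: A = C C^T, since A ≻ 0
w, E = np.linalg.eig(A)                # eigendecomposition: A = E diag(w) E^{-1}
Vs, s, Wh = np.linalg.svd(A)           # SVD:      A = V Σ W^*

for name, X in [("QR", Q @ R), ("LU", P @ L @ U), ("Cholesky", C @ C.T),
                ("Eigen", E @ np.diag(w) @ np.linalg.inv(E)), ("SVD", Vs @ np.diag(s) @ Wh)]:
    print(name, np.allclose(X, A))     # every reconstruction recovers A
```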

Similarity and the Eigenvalue Problem

Two square matrices $A, B \in M_n$ are similar if they are matrix representations of the same linear operator in different bases: $\exists S \in \text{GL}(n): A = S B S^{-1}$. The transformation $B \to S B S^{-1}$ is called a similarity transformation by similarity matrix $S$.

An eigenvector of a square matrix is a non-zero vector that remains in its own linear span under the linear transformation: $A x = \lambda x$, $x \ne 0$. The eigenvalue (本征值) $\lambda$ that corresponds to an eigenvector is the factor by which the eigenvector is scaled under the linear transformation. The eigenspace that corresponds to an eigenvalue is the subspace on which the linear transformation acts as scaling by the eigenvalue: $\{x \mid A x = \lambda x\}$.
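
A numerical sketch of the eigenvalue problem (NumPy assumed); `np.linalg.eig` returns the eigenvalues and a matrix whose columns are corresponding eigenvectors:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
for lam, x in zip(eigvals, eigvecs.T):
    # Each eigenpair satisfies A x = λ x: x stays in its own span, scaled by λ.
    print(lam, np.allclose(A @ x, lam * x))
```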

Jordan Canonical Form: Any square matrix is similar to a (complex) Jordan matrix, unique up to permutation of Jordan blocks: $A = P J P^{-1}$, $P \in \text{GL}(n, \mathbb{C})$. In this form, the columns of $P$ at the positions where each Jordan block starts, $\{ p_{1 + \sum_{j < i} n_j} \}_{i = 1}^{k}$, form a basis for the direct sum of all the eigenspaces of $A$, corresponding to the eigenvalues $\{\lambda_i\}_{i = 1}^k$ respectively. A family of similar matrices, i.e. an equivalence class under similarity, share the same Jordan canonical form.

A Jordan block $J_k(\lambda_i)$ is a $k \times k$ matrix with $\lambda_i$ on the diagonal and $1$ on the super diagonal: $J_k(\lambda_i) = \lambda_i I_k + U_k$, where $U_k$ is the upper/backward shift matrix $U_k = [0, I_{k-1}; 0, 0]$; specially, $J_1(\lambda_i) = [\lambda_i]$. A Jordan matrix is a direct sum of Jordan blocks, i.e. a block diagonal matrix consisting of Jordan blocks: $J = \text{diag}\{ J_{n_i}(\lambda_i) \}_{i=1}^k$, $\sum_{i=1}^k n_i = n$.
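
Because the Jordan form is discontinuous in the matrix entries, it is usually computed symbolically rather than in floating point. A sketch with SymPy (assuming `sympy.Matrix.jordan_form`, which returns $P$ and $J$ with $A = P J P^{-1}$):

```python
import sympy as sp

# A 3x3 matrix with eigenvalue 2 of algebraic multiplicity 3 and geometric multiplicity 2.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])

P, J = A.jordan_form()        # A = P * J * P**-1; J = J_2(2) ⊕ J_1(2)
sp.pprint(J)
assert sp.simplify(P * J * P.inv() - A) == sp.zeros(3, 3)
```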

The (algebraic) multiplicity of eigenvalue $\lambda_i$ is its multiplicity as a zero of the characteristic polynomial $p_A(t)$, i.e. the number of diagonal entries of the Jordan matrix that equal $\lambda_i$. The geometric multiplicity of eigenvalue $\lambda_i$ is the maximum number of linearly independent eigenvectors associated with it, i.e. the number of Jordan blocks in the Jordan matrix that correspond to $\lambda_i$.
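
The two multiplicities can also be computed directly: the algebraic multiplicity from the characteristic polynomial, and the geometric multiplicity as $\dim \ker(A - \lambda_i I) = n - \text{rank}(A - \lambda_i I)$. A sketch continuing the SymPy example above:

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])
n = A.rows
lam = sp.symbols("lambda")

p = A.charpoly(lam)                                   # characteristic polynomial (t - 2)^3
for root, alg_mult in sp.roots(p.as_expr(), lam).items():
    geo_mult = n - (A - root * sp.eye(n)).rank()      # number of Jordan blocks for this root
    print(root, alg_mult, geo_mult)                   # 2 3 2
```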

A square matrix is diagonalizable if it is similar to a diagonal matrix (all Jordan blocks of size one). A square matrix is invertible iff its eigenvalues do not include zero. A square matrix in $M_n$ is diagonalizable iff it has $n$ linearly independent eigenvectors, i.e. its eigenvectors can form a basis.

Location and perturbation of eigenvalues.

Miscellaneous Topics


🏷 Category=Algebra