Linear algebra deals with finite-dimensional vector spaces. These notes on linear algebra follow [@Horn & Johnson, 1990. Matrix Analysis.]

Scanned course notes: Notes on linear algebra; Notes on advanced algebra; Notes on vector space;

## Symbols

• $\bar{\mathbb{F}}$: algebraic closure of a field $\mathbb{F}$, which is algebraically closed, e.g. $\bar{\mathbb{Q}}$ (the algebraic numbers) and $\bar{\mathbb{R}} = \bar{\mathbb{C}} = \mathbb{C}$.
• $\mathbb{R}^n, \mathbb{C}^n$: n-dimensional vector spaces over $\mathbb{R}$ and $\mathbb{C}$.
• $\mathcal{B}$, $\{ e_i \}_{i=1}^n$: a basis of a vector space; the standard basis of $\mathbb{R}^n$ or $\mathbb{C}^n$.
• $[v]_{\mathcal{B}}$, $_{\mathcal{B}_1}[T]_{\mathcal{B}_2}$: coordinate representation of a vector in basis $\mathcal{B}$ / a linear operator in bases $\mathcal{B}_1$ - $\mathcal{B}_2$.
• $A$, $[A]_{I,J}$: a matrix; a submatrix with rows and columns from index sets $I$ and $J$ (default to $I = J$).
• $M_{m,n}(\mathbb{F})$, $M_n(\mathbb{F})$: the set of all m-by-n / n-by-n matrices over field $\mathbb{F}$, which defaults to $\mathbb{C}$.
• $\mathcal{F}$: a family of matrices;
• $\text{GL}(n, \mathbb{F})$, $\text{SL}(n, \mathbb{F})$: the general / special linear group formed by nonsingular / unit-determinant matrices in $M_n(\mathbb{F})$;
• $O(n)$, $SO(n)$, the (special) orthogonal group formed by (unit-determinant) orthogonal matrices in $\text{GL}(n, \mathbb{R})$ / $\text{SL}(n, \mathbb{R})$;
• $U(n)$, $SU(n)$, the (special) unitary group formed by (unit-determinant) unitary matrices in $\text{GL}(n, \mathbb{C})$ / $\text{SL}(n, \mathbb{C})$;
• $A \succeq 0$, $A \ge 0$: positive semi-definite matrix; nonnegative matrix.
• $|A|$: matrix of absolute values of entries of a matrix.
• $A^T, \bar{A}, A^∗, A^\dagger$: transpose / conjugate / Hermitian adjoint (conjugate transpose) / Moore-Penrose pseudoinverse of a matrix.
• $A^{-1}$: inverse of a nonsingular matrix.
• $A^{1/2}$: the unique positive semidefinite square root of a positive semidefinite matrix.
• $\sigma(A)$, $\sigma_i(A)$: the set of singular values of a matrix; the i-th largest singular value.
• $\lambda(A)$, $\lambda_i(A)$: spectrum of a square matrix; the i-th eigenvalue, in increasing order if the matrix is Hermitian.
• $\rho(A)$: spectral radius of a square matrix, the largest modulus of its eigenvalues.
• $\text{tr} A$, $\det A$, $\text{adj} A$: trace / determinant / classical adjoint (adjugate) of a square matrix.
• $p_A(t), q_A(t)$: the characteristic / minimal polynomial of a square matrix.

## Vector Space and Product Space of Scalars

Vector space or linear space $(V, (+, \cdot_{\mathbb{F}}))$ over a field $(\mathbb{F}, (+, \cdot))$ is a set $V$ endowed with a binary operation $+: V^2 \mapsto V$ and a map $\cdot_{\mathbb{F}}: \mathbb{F} \times V \mapsto V$, such that $(V, +)$ is an abelian group, and $\cdot_{\mathbb{F}}$ is compatible with field multiplication, has the field's multiplicative identity as identity, and distributes over both vector addition and field addition. We call $+$ vector addition and $\cdot_{\mathbb{F}}$ scalar multiplication. Vector space structure $(+, \cdot_{\mathbb{F}})$ is the pair of vector addition and scalar multiplication maps of a vector space. For example, Cartesian powers of real numbers $\mathbb{R}^n$, the set $\mathbb{R}^X$ of real-valued functions with a common domain, and the set $C^0(X)$ of continuous real-valued functions with a common domain become real vector spaces when endowed with component-/point-wise addition and scalar multiplication.

Linear subspace $(S, (+, \cdot_{\mathbb{F}}))$ of a vector space is a vector space consisting of a subset of the vector space and the same vector space structure: $+: S^2 \mapsto S$, $\cdot_{\mathbb{F}}: \mathbb{F} \times S \mapsto S$. Linear combination of vectors is a finite sum of scaled vectors: $\sum_{i=1}^k a^i v_i$. Linear span (线性生成空间) $\text{Span}(A)$ of a set $A$ of vectors is the subspace whose underlying set is the set of all linear combinations of the set.

Linearly independent subset $A$ of a vector space is a subset such that every linear combination of distinct elements with coefficients not all zero is non-zero: $\forall n \in \mathbb{N}_+$, $\forall a \in \mathbb{F}^n$, $a \ne 0$, for all distinct $\{v_i\}_{i=1}^n \subset A$, $\sum_{i=1}^n a^i v_i \ne 0$. Algebraic/Hamel basis $\mathcal{B}$ of a vector space is a linearly independent subset that spans the space: $\text{Span}(\mathcal{B}) = V$. Every vector space has a basis (assuming the axiom of choice). Dimension $\dim V$ of a vector space is the cardinality of any basis of the vector space, which is the same for every basis. The basis may be finite, countably infinite, or uncountably infinite; so is the dimension of the vector space.

Table: The intersection of linearly independent sets and span-to-space sets are bases.

| Elements | $1, \cdots, \dim V$ | $\dim V$ | $\dim V, \cdots$ |
|----------|---------------------|----------|------------------|
| Sets | linearly independent | basis | span to $V$ |
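The definitions of linear independence and basis can be checked mechanically for vectors in $\mathbb{Q}^n$. The following is a minimal sketch, assuming vectors are given as lists of rationals; the helper names `rank`, `is_linearly_independent`, and `is_basis` are illustrative and not part of the notes. It uses exact Gaussian elimination, so a set is independent when its rank equals its size, and a basis when that rank also equals the dimension.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from the rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """Independent iff no vector is redundant, i.e. rank equals the count."""
    return rank(vectors) == len(vectors)

def is_basis(vectors, dim):
    """A basis of a dim-dimensional space is independent and spanning."""
    return is_linearly_independent(vectors) and rank(vectors) == dim
```

For instance, `[[1, 0], [0, 1], [1, 1]]` spans $\mathbb{Q}^2$ but is not independent, matching the right-hand column of the table, while `[[1, 1], [1, -1]]` is a basis.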

The sum $v + S$ of a vector and a subspace of a vector space is the set of all vectors that can be written as the sum of the vector and a vector from the subspace: $v + S = \{v + w : w \in S\}$. We call $v + S$ an affine subspace of the vector space parallel to the subspace, and also the coset (陪集) of the subspace determined by the vector. Dimension of an affine subspace is the dimension of the associated subspace. Quotient $V / S$ of a vector space by a subspace is the set of all cosets of the subspace: $V / S = \{v + S : v \in V\}$. Quotient space $(V / S, (+, \cdot_{\mathbb{F}}))$ of a vector space by a subspace is the vector space consisting of the quotient $V / S$, and vector addition and scalar multiplication of cosets defined by $(v + S) + (w + S) = (v + w) + S$ and $c (v + S) = (c v) + S$; if $V$ is finite-dimensional, the quotient space has dimension $\dim V - \dim S$. The natural projection $\pi: V \mapsto V/S$ associated with a quotient space is defined by $\pi(v) = v + S$.

The sum $S + T$ of two subspaces of a vector space is the set of all vectors that can be written as the addition of two vectors, one from each subspace: $S + T = \{v + w : v \in S, w \in T\}$. The sum of two subspaces equals the linear span of their union: $S + T = \text{Span}(S \cup T)$. Internal direct sum $S \oplus T$ of two subspaces of a vector space that intersect only at zero is their sum: $S \oplus T = S + T$. Complementary subspaces are two linear subspaces that intersect only at zero and whose (direct) sum equals the full space. Projection $\pi: V \mapsto S$ from a vector space onto a subspace, given a complementary subspace $T$, is the linear map that takes each vector to the vector in the subspace that appears in its direct sum decomposition: $\pi(v + w) = v$ where $v \in S, w \in T$.

Direct product or product space $(\prod_{\alpha \in A} V_\alpha, (+, \cdot_{\mathbb{F}}))$ of an indexed family of vector spaces over the same field is the vector space consisting of the Cartesian product and componentwise addition and scalar multiplication. The direct product and the external direct sum of a finite indexed family of vector spaces are identical.

Vector space isomorphism is an invertible linear map between two vector spaces. Finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension: $V \cong W \iff \dim V = \dim W$. In particular, any n-dimensional vector space over a field $\mathbb{F}$ is isomorphic to the product space of the field to the n-th power: $V \cong \mathbb{F}^n \iff \dim V = n$. For any basis $\mathcal{B}$ of an n-dimensional vector space $V$ over field $\mathbb{F}$, the mapping $f: v \mapsto [v]_{\mathcal{B}}$ from a vector to its coordinates is an isomorphism between $V$ and $\mathbb{F}^n$. It justifies the identification of all n-dimensional vector spaces with $\mathbb{F}^n$, up to a specific basis for each space. Scalar-valued function space $(\mathbb{F}^X, (+, \cdot_{\mathbb{F}}))$ on a set $X$ over a field $(\mathbb{F}, (+, \cdot))$ is the vector space over the field consisting of all scalar-valued functions on the set, endowed with pointwise addition and scalar multiplication. A function space has a finite dimension if and only if the underlying set is finite: $|X| = n$ then $\mathbb{F}^X \cong \mathbb{F}^n$.

## Linear Operator and Matrix

Operator (算子) $\omega: V \mapsto W$ is a mapping from a vector space to another; its application to / action on a vector is commonly denoted as $\omega v = w$. Linear operator $\omega: V \to W$ is an operator between vector spaces over the same field, such that it is compatible with their vector space structures: $\omega (x + y) = \omega x + \omega y$, $\omega (a x) = a \omega x$. Examples of linear operators include the coefficient matrices of systems of linear equations. The set $\mathcal{L}(V, W)$ of all linear operators between two given vector spaces over the same field, endowed with pointwise addition and scalar multiplication, is a vector space over the same field. Linear transformation $\upsilon: V \to V$ is a linear operator from a vector space to itself. Examples of linear transformations include coordinate transformations and the Fourier integral transform. Linear operator theory concerns linear operators between (infinite-dimensional) vector spaces; the study of linear operators between finite-dimensional vector spaces is called Matrix Theory. An important question in linear operator theory is classifying linear transformations $\mathcal{L}(V, V)$ w.r.t. some equivalence relation, e.g. similarity, unitary equivalence, topological equivalence.

Functional (泛函) $\alpha: V \mapsto \mathbb{F}$ is an operator from a vector space to its underlying scalar field, especially when the vector space is (a subspace of) a scalar-valued function space $\mathbb{F}^X$. Linear functional or covector (余向量) $\alpha: V \to \mathbb{F}$ is a functional that is also a linear operator. Examples of linear functionals include evaluation of the derivative at a point and definite integration. The vector space $\mathcal{L}(V, \mathbb{F})$ of linear functionals on a vector space is also written as $\mathcal{L}(V)$. Sublinear functional $p: V \mapsto \mathbb{R}$ on a real vector space is a functional that is subadditive and positive-homogeneous: $p(v + w) \le p(v) + p(w)$, $p(|a| v) = |a| p(v)$. Hahn–Banach Theorem (extension of linear functionals) [@Hahn1927; @Banach1929]: Every linear functional on a subspace of a real vector space that is bounded from above by a sublinear functional on the space has a linear extension to the space that remains bounded from above by the sublinear functional: $\omega \in S^∗$, $\omega \le p|_S$, then $\exists \tilde\omega \in V^∗$, $\tilde\omega|_S = \omega$, $\tilde\omega \le p$.

Algebraic dual space $V^∗$ of a vector space is the vector space of its linear functionals: $V^∗ = \mathcal{L}(V)$. Dual basis $(\varepsilon^i)_{i=1}^n$ to a basis $(e_i)_{i=1}^n$ of a finite-dimensional vector space is the n-tuple of covectors defined by $\varepsilon^i(e_j) = \delta^i_j$, which is a basis for the dual space. A finite-dimensional vector space and its dual space have the same dimension, and are thus isomorphic. The action of a covector on a vector equals the sum of products of their coordinate representations in any basis and the dual basis: $\omega = \omega_i \varepsilon^i$, $v = v^j e_j$, then $\omega(v) = \omega_i v^i$; we write basis covectors with upper indices, and components of a covector with lower indices, so that the Einstein summation convention applies. Second dual space $V^{∗∗}$ of a vector space is the dual space of its dual space: $V^{∗∗} = (V^∗)^∗$. Canonical embedding $\xi: V \mapsto V^{∗∗}$ of a vector space into its second dual space is the injective linear operator that maps each vector to the evaluation map of covectors at that vector: $\forall v \in V$, $\forall \omega \in V^∗$, $\xi(v)(\omega) = \omega(v)$. This embedding is called canonical because its definition has no reference to any basis (see invariant definition). Algebraically reflexive vector space is a vector space whose canonical embedding is surjective, and thus a canonical vector space isomorphism: $\xi(V) = V^{∗∗}$. Finite-dimensional vector spaces are reflexive. Although a finite-dimensional vector space is also isomorphic to its dual space, there is no canonical isomorphism. Because of the canonical embedding, the action of a covector on a vector sometimes uses a symmetric angle bracket notation: $\langle \omega, v \rangle := \omega(v)$, $\langle v, \omega \rangle := \xi(v)(\omega) = \omega(v)$; this notation causes no confusion with an inner product because the latter operates on two vectors.
With this symmetric notation, any basis is the dual basis to its dual basis, and thus taking the dual of a basis in a finite-dimensional vector space is an involution.
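The defining relation $\varepsilon^i(e_j) = \delta^i_j$ can be made concrete in coordinates. The sketch below assumes a basis of $\mathbb{Q}^2$ stored as the columns of a matrix `E`, in which case the dual basis covectors are the rows of $E^{-1}$; the names `inverse_2x2`, `apply`, and the particular basis are illustrative choices, not from the notes.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Exact inverse of a 2-by-2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("columns do not form a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(covector, vector):
    """Action of a covector on a vector: sum of products of coordinates."""
    return sum(w * v for w, v in zip(covector, vector))

# Basis e_1 = (1, 1), e_2 = (1, -1), stored as the columns of E.
E = [[1, 1],
     [1, -1]]
basis = [[E[0][j], E[1][j]] for j in range(2)]  # e_j as coordinate lists
dual_basis = inverse_2x2(E)                     # row i represents epsilon^i

# pairing[i][j] = epsilon^i(e_j), which should be the Kronecker delta.
pairing = [[apply(eps, e) for e in basis] for eps in dual_basis]

# The components of a vector in the basis are read off by the dual basis.
v = [3, 1]
components = [apply(eps, v) for eps in dual_basis]  # v = 2 e_1 + 1 e_2
```

Applying the dual basis to a vector returns its coordinates in the original basis, which is the coordinate isomorphism $v \mapsto [v]_{\mathcal{B}}$ written with covectors.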

Matrix $(a^i_j)^{i \in I}_{j \in J}$ is a rectangular array of scalars in a field $\mathbb{F}$. Matrix addition $A + B$ of two matrices of the same shape is the matrix formed by entrywise addition: $[A + B]_{i,j} = a^i_j + b^i_j$. Scalar matrix $a I$ is a scalar multiple of the identity matrix. The set $M_{m,n}(\mathbb{F})$ of all m-by-n matrices with entries from a field $\mathbb{F}$, endowed with entrywise addition and scalar multiplication, is a vector space over the same field. Matrix transpose $A^T$ is the flipped matrix: $[A^T]_{i,j} = a^j_i$. Matrix multiplication $A B$ of two matrices $A \in M_{l,m}(\mathbb{F})$ and $B \in M_{m,n}(\mathbb{F})$ is the matrix in $M_{l,n}(\mathbb{F})$ defined by $[A B]_{i,j} = a^i_k b^k_j$ (summing over $k$). Matrix direct sum $A \oplus B$ is the block diagonal matrix $\text{diag}(A, B)$. Hadamard product or Schur product $A \circ B$ or $A \odot B$ of matrices of the same shape is the matrix formed by entrywise product: $[A \circ B]_{i,j} = a^i_j b^i_j$. Kronecker product or tensor product $A \otimes B$ of two matrices $A \in M_{m,n}(\mathbb{F})$ and $B \in M_{p,q}(\mathbb{F})$ is the matrix in $M_{mp,nq}(\mathbb{F})$ defined by $[A \otimes B]_{(i-1)p+k,(j-1)q+l} = a^i_j b^k_l$. Kronecker sum or tensor sum $A \oplus B$ (same symbol as direct sum) of square matrices $A \in M_m(\mathbb{F})$ and $B \in M_n(\mathbb{F})$ is the square matrix in $M_{mn}(\mathbb{F})$ defined by $A \oplus B = A \otimes I_n + I_m \otimes B$.
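The index formula for the Kronecker product is easier to read on a small case. The following sketch assumes matrices are stored as lists of rows and uses the standard definition of the Kronecker sum, $A \otimes I_n + I_m \otimes B$; the helper names `kron`, `identity`, `add`, and `kron_sum` are illustrative.

```python
def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(n):
    """n-by-n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def kron_sum(A, B):
    """Kronecker sum of square A (m-by-m) and square B (n-by-n)."""
    m, n = len(A), len(B)
    return add(kron(A, identity(n)), kron(identity(m), B))

A = [[1, 2],
     [3, 4]]
B = [[0, 5],
     [6, 7]]

K = kron(A, B)      # 4-by-4, made of the blocks 1*B, 2*B, 3*B, 4*B
S = kron_sum(A, B)  # 4-by-4
```

Here `K` has rows `[0, 5, 0, 10]`, `[6, 7, 12, 14]`, `[0, 15, 0, 20]`, `[18, 21, 24, 28]`, so the entry in row $(i-1)p+k$ and column $(j-1)q+l$ is $a^i_j b^k_l$.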

Coordinate representation $[A]$ of a linear operator $A \in \mathcal{L}(V, W)$ w.r.t. a basis $(v_j)_{j=1}^n$ of the domain and a basis $(w_i)_{i=1}^m$ of the codomain is the matrix $[A] \in M_{m,n}(\mathbb{F})$ defined by $a^i_j = \omega^i A v_j$, where $(\omega^i)_{i=1}^m$ is the dual basis of the given basis of the codomain. For any bases $\mathcal{B}_V$ and $\mathcal{B}_W$ of n- and m-dimensional vector spaces $V$ and $W$ over field $\mathbb{F}$, the mapping $f: A \mapsto _{\mathcal{B}_W}[A]_{\mathcal{B}_V}$ from a linear operator to its matrix representation is a vector space isomorphism between $\mathcal{L}(V, W)$ and $M_{m,n}(\mathbb{F})$. Given such isomorphism/identification determined by two bases: matrix addition $[A] + [B]$ corresponds to the addition $A + B$ of linear operators; matrix multiplication $[A] [B]$ corresponds to the composition $A B$ of linear operators; matrix direct sum $[A] \oplus [B]$ corresponds to the direct sum $A \oplus B$ of linear operators, such that $(A \oplus B)(v, v') = (Av) \oplus (Bv')$.

Dual operator or transpose $A^∗ \in \mathcal{L}(W^∗, V^∗)$ of a linear operator $A \in \mathcal{L}(V, W)$ is the linear operator defined by $\forall \omega \in W^∗$, $\forall v \in V$, $(A^∗ \omega)(v) = \omega (A v)$. Given two bases of the domain and the codomain, the matrix representation of the dual operator equals the transpose of that of the original linear operator: $_{\mathcal{B}_V}[A^∗]_{\mathcal{B}_W} = (_{\mathcal{B}_W}[A]_{\mathcal{B}_V})^T$. Fundamental subspaces associated with a rank-$r$ linear operator $A: V \mapsto W$ between real vector spaces of dimensions $\dim V = n$ and $\dim W = m$ are the four subspaces of its domain or codomain defined as follows. Let its matrix representation in some bases have singular value decomposition $[A] = U \Sigma V^T$ (here $V$ denotes an orthogonal matrix with columns $v_i$, not the domain); the coordinate representation of a basis for each subspace is then also provided. Image (像) $\text{im}(A)$ or column space is the r-dimensional subspace of the codomain whose underlying set is the range of the operator: $\text{im}(A) = A V$; it has a basis $\{u_i\}_{i=1}^r$. Kernel (核) $\text{ker}(A)$ or null space $A^{-1}(0)$ is the (n-r)-dimensional subspace of the domain whose underlying set is the zero set of the operator; it has a basis $\{v_i\}_{i=r+1}^n$. Coimage (余像) $\text{im}(A^T)$ or row space is the r-dimensional subspace of the domain whose underlying set is the range of the dual operator: $\text{im}(A^T) = A^T W$; it has a basis $\{v_i\}_{i=1}^r$. Cokernel (余核) $\text{ker}(A^T)$ or left null space $(A^T)^{-1}(0)$ is the (m-r)-dimensional subspace of the codomain whose underlying set is the zero set of the dual operator; it has a basis $\{u_i\}_{i=r+1}^m$. Fundamental Theorem of Linear Algebra (not the Fundamental Theorem of Algebra): Kernel and coimage are complementary subspaces: $V = \text{ker}(A) \oplus \text{im}(A^T)$. Cokernel and image are complementary subspaces: $W = \text{ker}(A^T) \oplus \text{im}(A)$.
If the underlying vector spaces are inner product spaces, the said two pairs of fundamental subspaces are orthogonal complements: $\text{ker}(A)^\perp = \text{im}(A^T)$; $\text{ker}(A^T) = \text{im}(A)^\perp$.
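The four fundamental subspaces and their orthogonality can be checked by hand on a rank-1 matrix. The sketch below uses a matrix and spanning vectors chosen here for illustration (they are not from the notes) and the standard inner product on $\mathbb{R}^2$; the helper names are likewise illustrative.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def dot(x, y):
    """Standard inner product."""
    return sum(a * b for a, b in zip(x, y))

A = [[1, 2],
     [3, 6]]            # rank r = 1, with n = m = 2

image_vec = [1, 3]      # spans im(A): both columns are multiples of it
kernel_vec = [2, -1]    # spans ker(A)
coimage_vec = [1, 2]    # spans im(A^T): both rows are multiples of it
cokernel_vec = [3, -1]  # spans ker(A^T)

in_kernel = matvec(A, kernel_vec)                  # should be the zero vector
in_cokernel = matvec(transpose(A), cokernel_vec)   # should be the zero vector
kernel_dot_coimage = dot(kernel_vec, coimage_vec)  # should vanish
cokernel_dot_image = dot(cokernel_vec, image_vec)  # should vanish
```

Each subspace has dimension 1, consistent with $r = 1$ and $n - r = m - r = 1$, and the kernel and coimage vectors together form a basis of the domain.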

Trace $\text{tr}$ of a square matrix is the sum of its diagonal entries: $\forall A \in M_n$, $\text{tr}~A = \sum_{i=1}^n A_{ii}$. Trace is invariant under cyclic permutations: $\forall A \in M_{m,n}$, $\forall B \in M_{n,m}$, $\text{tr}(A B) = \text{tr}(B A)$. Any linear functional on a space of linear endomorphisms is proportional to the trace if it is invariant under cyclic permutations. As a corollary of cyclic invariance, trace is similarity invariant: $\forall A \in M_n$, $\forall P \in \text{GL}_n$, $\text{tr}(P^{-1} A P) = \text{tr}(A)$. Trace $\text{tr}$ of an endomorphism of a finite-dimensional vector space is the sum of the diagonal entries of any matrix representation of the endomorphism, which is independent of the basis: $T \in \mathcal{L}(V, V)$, for any basis $B = (b_i)_{i=1}^n$ of $V$, $\text{tr}~T = \text{tr}~[T]_B$.
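Both invariance properties of the trace can be verified on small integer matrices. This is a minimal sketch with matrices chosen here for illustration; `matmul` and `trace` are hypothetical helper names, and the inverse of `P` is written out by hand rather than computed.

```python
def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

# Cyclic invariance with non-square factors: AB is 2-by-2, BA is 3-by-3.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [2, 2]]
tr_AB = trace(matmul(A, B))
tr_BA = trace(matmul(B, A))

# Similarity invariance: tr(P^{-1} M P) = tr(M).
M = [[2, 1],
     [0, 3]]
P = [[1, 1],
     [0, 1]]
P_inv = [[1, -1],
         [0, 1]]
M_similar = matmul(P_inv, matmul(M, P))
```

In this example `M_similar` happens to be diagonal, with the eigenvalues 2 and 3 of `M` on its diagonal, so the trace is also the sum of the eigenvalues.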

## Matrix Decomposition/Factorization

Matrix decompositions/factorizations express a matrix as the product of smaller or simpler matrices.

• QR decomposition ~ Gram–Schmidt orthogonalization process: $A = Q R$, where $A \in M_{m,n}$ with $m \ge n$, $Q \in M_{m,n}$ has orthonormal columns, $R \in M_n$ is upper-triangular; $\{ a_1, \cdots, a_n \} \to \{ q_1, \cdots, q_n \}$;
• LU (LUP) decomposition ~ Gaussian elimination for solving systems of linear equations: $A = L U$, where $\forall k \le n$, $[A]_{1 \dots k} \in \text{GL}_k$, $L$ is lower-triangular, and $U$ is upper-triangular;
• Cholesky decomposition: $A = L L^∗$, where $A \succeq 0$ and $L$ is lower-triangular;
• Jordan decomposition ~ eigenvalue problem: $A = P J P^{-1}$, where $A \in M_n$, $J$ is a Jordan matrix, and $P$ is invertible; if $A$ is real with only real eigenvalues, then $P$ can be real;
• Eigendecomposition (spectral decomposition): $A = P \Lambda P^{-1}$, where $\Lambda$ is diagonal, if $A$ has $n$ linearly independent eigenvectors;
• $A = U \Lambda U^∗$, where $U$ is orthogonal/unitary, if $A$ is a normal matrix (e.g. orthogonal/unitary, (skew-)symmetric/Hermitian);
• QZ decomposition (generalized Schur decomposition) ~ generalized eigenvalue problem: $A = Q S Z^∗, B = Q T Z^∗$, where $A, B \in M_n$, $Q, Z$ orthogonal/unitary, and $S, T$ (1-2 block) upper triangular, such that ratio of the diagonals are the generalized eigenvalues $\lambda_i = S_{ii} / T_{ii}$, $A v = \lambda B v$;
• Schur decomposition, $A = Q U Q^∗$, where $A \in M_n$, $Q$ unitary, and $U$ upper triangular (and complex), aka Schur form, where the diagonal are the eigenvalues;
• Singular value decomposition (SVD) ~ principal components analysis, latent factor analysis: $A = V \Sigma W^∗$, where $A \in M_{m,n}$, $V \in M_m, W \in M_n$ are orthogonal/unitary, and $\Sigma \in M_{m,n}$ is non-negative and has diagonal entries only;
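To make the link between LU decomposition and Gaussian elimination in the list above concrete, here is a minimal sketch of the Doolittle variant without pivoting, in exact rational arithmetic. It assumes every leading principal submatrix is nonsingular, as the LU entry requires; the function name `lu` and the example matrix are illustrative, and practical code would normally use a pivoted (LUP) routine from a numerical library instead.

```python
from fractions import Fraction

def lu(A):
    """Doolittle LU without pivoting: A = L U, with ones on the diagonal of L."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(x) for x in row] for row in A]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a leading principal submatrix is singular")
        for i in range(k + 1, n):
            # The elimination multiplier is stored in L; the row update builds U.
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [u_ij - L[i][k] * u_kj for u_ij, u_kj in zip(U[i], U[k])]
    return L, U

A = [[2, 1, 1],
     [4, 3, 3],
     [8, 7, 9]]
L, U = lu(A)
# L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]]
# U = [[2, 1, 1], [0, 1, 1], [0, 0, 2]]
```

The multipliers used to clear each column are exactly the subdiagonal entries of `L`, which is why the factorization comes for free from elimination.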

### Similarity and the Eigenvalue Problem

Two square matrices $A, B \in M_n$ are similar if they are matrix representations of the same linear operator in different bases: $\exists S \in \text{GL}(n): A = S B S^{-1}$. The transformation $B \to S B S^{-1}$ is called a similarity transformation by similarity matrix $S$.

Eigenvector of a linear transformation on a finite-dimensional vector space is a non-zero vector that gets scaled by the transformation: $A x = \lambda x$, $x \ne 0$. Eigenvalue (本征值) $\lambda$ associated with an eigenvector is the scaling factor when the linear transformation is applied to the eigenvector. Eigenspace associated with an eigenvalue is the subspace consisting of the zero vector and all eigenvectors with the eigenvalue: $\{x : A x = \lambda x\}$. Spectrum $\lambda(A)$ of a linear transformation on a finite-dimensional vector space is the set of its eigenvalues. Resolvent set $\mathbb{C} \setminus \lambda(A)$ of a linear transformation is the complement of its spectrum in the complex plane.

Jordan block $J_k(\lambda_i)$ is a k-by-k matrix with $\lambda_i$ on the diagonal and 1 on the superdiagonal: $J_k(\lambda_i) = \lambda_i I_k + U_k$, where $U_k$ is the upper/backward shift matrix $U_k = [0, I_{k-1}; 0, 0]$; specially, $J_1(\lambda_i) = [\lambda_i]$. Jordan matrix is a direct sum of Jordan blocks, i.e. a block diagonal matrix consisting of Jordan blocks: $J = \text{diag}\{ J_{n_i}(\lambda_i) \}_{i=1}^k$, $\sum_{i=1}^k n_i = n$. Jordan Canonical Form Theorem: Any square matrix is similar to a (complex) Jordan matrix, unique up to permutation of Jordan blocks: $A = P J P^{-1}$, $P \in \text{GL}(n, \mathbb{C})$. In this form, the columns of $P$ that start each Jordan block, $\{ p_{1 + \sum_{j=1}^{i-1} n_j} \}_{i = 1}^{k}$, form a basis for the direct sum of all the eigenspaces of $A$; they are eigenvectors for the eigenvalues $\{\lambda_i\}_{i = 1}^k$ respectively. A family of similar matrices, i.e. an equivalence class under similarity, share the same Jordan canonical form.
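The structure $J_k(\lambda) = \lambda I_k + U_k$ can be inspected directly: subtracting $\lambda I_k$ leaves the shift matrix, which is nilpotent of index $k$. The sketch below builds a 3-by-3 block with an eigenvalue chosen here for illustration; the helper names `jordan_block`, `matmul`, and `shift_part` are hypothetical.

```python
def jordan_block(lam, k):
    """k-by-k Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if j == i else 1 if j == i + 1 else 0 for j in range(k)]
            for i in range(k)]

def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def shift_part(J, lam):
    """J - lam * I, which is the upper shift matrix for a Jordan block."""
    return [[x - (lam if i == j else 0) for j, x in enumerate(row)]
            for i, row in enumerate(J)]

J = jordan_block(5, 3)
N = shift_part(J, 5)   # upper shift matrix U_3
N2 = matmul(N, N)      # a single 1 remains, in the top-right corner
N3 = matmul(N2, N)     # zero matrix: the shift is nilpotent of index 3

# The first standard basis vector is an eigenvector: the first column of J
# equals 5 times e_1.
first_column = [row[0] for row in J]
```

Since `N` has rank 2, the kernel of $J - 5I$ is one-dimensional, so a single Jordan block contributes exactly one linearly independent eigenvector.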

Algebraic multiplicity of eigenvalue $\lambda_i$ is the multiplicity of it as a zero of the characteristic polynomial $p_A(t)$, i.e. the number of diagonal entries in the Jordan matrix that equal $\lambda_i$. Geometric multiplicity of eigenvalue $\lambda_i$ is the maximum number of linearly independent eigenvectors associated with it, i.e. the number of Jordan blocks in the Jordan matrix that correspond to $\lambda_i$.

A square matrix is diagonalizable if it is similar to a diagonal matrix (Jordan blocks all size one). A square matrix is invertible iff its eigenvalues do not include zero. A square matrix is diagonalizable iff its eigenvectors can form a basis.
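A minimal worked example, with the matrix chosen here for illustration: the Jordan block $J_2(2)$ has a repeated eigenvalue whose geometric multiplicity is smaller than its algebraic multiplicity, so its eigenvectors cannot form a basis.

$$
A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \quad
p_A(t) = \det(t I - A) = (t - 2)^2, \quad
\text{ker}(A - 2 I) = \text{ker} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \text{Span}\{ (1, 0)^T \}
$$

The eigenvalue 2 has algebraic multiplicity 2 and geometric multiplicity 1, so $A$ is not diagonalizable. It is nevertheless invertible, because zero is not an eigenvalue ($\det A = 4$).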

Location and perturbation of eigenvalues.