(Lemma 7.3.1) For any matrix $M \in M_{m,n}$, $m \ge n$, let $M^* M = Y^* \text{diag}(\lambda) Y$ be an eigendecomposition. Here the non-negative eigenvalues λ are unique, and the unitary eigenvectors Y are unique up to rotation within eigenspaces. Given λ and Y, there is an $X \in M_{m,n}$ with orthonormal columns, such that $M = X \text{diag}(\sqrt{\lambda}) Y$. If M has full rank, then X is unique. If M is real, then Y and X can be real.
Polar decomposition (Theorem 7.3.2): For any matrix $M \in M_{m,n}$, $m \ge n$, let positive semidefmite matrix $P = (M^* M)^{1/2}$, there exists a $U \in M_{m,n}$ with orthonormal columns, such that $M = U P$. If M has full rank, then U is unique. If M is real, then P and U can be real.
(Theorem 7.3.4) Let M be a square matrix with polar decomposition M = U P. M is normal iff P U = U P.
Theorem (7.3.5) Singular Value Decomposition (SVD): Every matrix can be written as a pairing of orthonormal vectors in its domain and codomain, with nonnegative weights; in other words, every matrix maps the unit n-ball to a q-dim ellipsoid, q = rank(M), where an orthonormal basis is mapped to a set of principal semi-axes and, if n > q, the zero vector; the vectors can be real if the matrix is real:
Singular values $(\sigma_i)_{i=1}^q$ are the nonnegative values on the diagonal of $\Sigma$, which is uniquely determined as an unordered tuple. The singular values of $M$ are eigenvalues of polar matrix $P$. Left singular vectors $(v_i)_{i=1}^m$ are the columns of $V$ and right singular vectors $(w_i)_{i=1}^n$ are the columns of $W$; they are uniquely determined up to simultaneous unitary transforms for each repeated singular value: $(V_j Q, W_j Q)$, $\forall Q \in U(m_j)$.
SVD of square matrices:
Properties of SVD:
Full SVD refers to the form given in the theorem: orthogonal decompositions for the domain and codomain, paired up to the rank of the matrix. However, not all information is needed in applications, and the full SVD can be reduced to save computation and storage.
Without loss of generality, consider an m-by-n matrix M of rank r, with $m \ge n \ge r$. Thin SVD refers to a computational procedure that provides an orthonormal n-frame for the codomain and an orthogonal decomposition for the domain, paired up to the rank of the matrix. $M = V \Sigma Q^T$, where $V \in V_{n, m}$, $Q \in O(n)$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^n_{+\downarrow}$. This saves the cost of completing the larger orthogonal matrix.
Compact SVD refers to a computational procedure that provides paired orthonormal vectors for the domain and codomain, with r positive weights: $M = V \Sigma W^T$, where $V \in V_{r, m}$, $W \in V_{r, n}$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^r_{+\downarrow}$. This saves the cost of orthogonal decompositions of the kernel and the cokernel, which has no effect on the mapping sine they are associated with zero singular values.
Truncated SVD or partial SVD refers to a computational procedure that provides paired orthonormal vectors for the domain and codomain, with the k-largest weights, k < r: $M \approx M_k = V_k \Sigma_k W_k^T$, where $V_k = V (I_k; 0) \in V_{k, m}$, $W_k = W (I_k; 0) \in V_{k, n}$, and $\Sigma_k = \text{diag}(\sigma_i)_{i=1}^k$. This saves the cost of computing the smaller singular values and the paired left and right singular vectors. The outcome does not exactly represent the original matrix, but is the optimal rank-k approximation in Frobenius norm.
Full and thin SVD: (coefficients depend on output, see [@Golub2013, Fig 8.6.1] for details.)
Truncated SVD [@Halko2011, Sec 1.4, 3.3, 6]:
Truncated eigenvalue decomposition (EVD) of an Hermitian matrix [@Halko2011, Sec 5.3-5.5]: (general dense matrix that fits in fast memory)
Classical truncated EVD algorithms: (Krylov subspace methods)
Classical SVD algorithms for general dense matrix, two steps:
Jacobi SVD algorithm for general dense matrix, for improved accuracy;
Effective rank or numerical rank of a matrix is the number of singular values whose magnitudes are greater than measurement error.
(7.4.1; Nearest rank-$k$ matrix) Given $M \in M_{m,n}, \mathrm{rank}(M) = k$, then $\forall M_1 \in M_{m,n}, \mathrm{rank}(M) = k_1 \le k$, $\min \| M - M_1 \|_2 = \| M - V \Sigma_1 W^H \|_2$, where $\Sigma_1 = \mathrm{diag}\{\sigma_1, \dots, \sigma_{k_1}, 0, \dots, 0\}_{m,n}$.
Theorem (simultaneous singular value decomposition) [@Horn1990, p. 426, Q26]: Two matrices have the same pairs of singular vectors, $A = V \Sigma W^H$ and $B = V \Lambda W^H$ ($\sigma$ and $\lambda$ not necessarily ordered), if and only if $A B^H$ and $B^H A$ are both normal.
Theorem (7.4.10): If $A \in M_{m,n}, B \in M_{n,m}$, $A B$ and $B A$ are positive semi-definite, then $\mathrm{tr}(A B) = \mathrm{tr}(B A) = \sum_{i=1}^q \sigma_i(A) \sigma_{\tau(i)}(B)$.