Lemma (7.3.1): $M = X \Lambda Y$
Theorem (7.3.2; Polar decomposition): $M = P U$
Theorem (7.3.4): Given $M \in M_n, M = P U$, $M$ is normal iff $P U = U P$.
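The commutation criterion can be checked numerically. The following is a minimal sketch, assuming a square matrix and building the polar factors from NumPy's SVD as $P = V \Sigma V^∗$, $U = V W^∗$; the helper name `polar_via_svd` and the test matrices are illustrative choices, not part of the theorem.

```python
import numpy as np

def polar_via_svd(M):
    """Return (P, U) with M = P @ U, P Hermitian PSD and U unitary (square M)."""
    V, s, Wh = np.linalg.svd(M)
    P = (V * s) @ V.conj().T   # V diag(s) V^*
    U = V @ Wh                 # V W^*
    return P, U

rng = np.random.default_rng(0)

# A nonsingular normal matrix: unitary Q, complex diagonal, Q^*.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
lam = np.array([2.0, -1.0 + 1.0j, 0.5j, 3.0])
N = (Q * lam) @ Q.conj().T
P, U = polar_via_svd(N)
normal_commutes = np.allclose(P @ U, U @ P)

# A non-normal matrix (a Jordan block).
J = np.array([[1.0, 1.0], [0.0, 1.0]])
P2, U2 = polar_via_svd(J)
nonnormal_commutes = np.allclose(P2 @ U2, U2 @ P2)
```

Under these assumptions the factors should commute for `N` and fail to commute for `J`.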
Theorem (7.3.5; Singular Value Decomposition): Every matrix can be written as a pairing of orthonormal vectors in its domain and codomain, with nonnegative weights; the vectors can be real if the matrix is real: $M = V \Sigma W^∗$.
Left singular vectors $(v_i)_{i=1}^m$ are the columns of $V$. Right singular vectors $(w_i)_{i=1}^n$ are the columns of $W$. Singular values $(\sigma_i)_{i=1}^q$ are the nonnegative values on the diagonal of $\Sigma$. The singular values of $M$ are the eigenvalues of the polar matrix $P$.
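As an illustration of these definitions, here is a small sketch using `numpy.linalg.svd`, which returns $V$, the singular values, and $W^∗$ (not $W$). The matrix sizes and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
q = min(m, n)
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Full SVD: V is m-by-m, Wh = W^* is n-by-n, s holds the q singular values.
V, s, Wh = np.linalg.svd(M, full_matrices=True)

# Embed the singular values in an m-by-n "diagonal" matrix Sigma.
Sigma = np.zeros((m, n))
Sigma[:q, :q] = np.diag(s)
M_rebuilt = V @ Sigma @ Wh

# Eigenvalues of the Hermitian PSD matrix M M^*, sorted in descending order;
# the first q should equal the squared singular values, the rest should vanish.
evals = np.linalg.eigvalsh(M @ M.conj().T)[::-1]
```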
If $M$ is normal, then $M M^∗ = M^∗ M$ implies that $M M^∗$ and $M^∗ M$ have the same eigenvectors. This does not mean $V = W$ in $M = V \Sigma W^∗$, because corresponding left and right singular vectors differ by a phase factor $e^{i \theta_k}$.
If $M = U \Lambda U^∗$ with $\lambda_k = |\lambda_k| e^{i \theta_k}$ (define $\theta_k = 0$ if $\lambda_k = 0$), let $|\Lambda| = \mathrm{diag}\{|\lambda_1|, \dots, |\lambda_n|\}$ and $D = \mathrm{diag}\{e^{i \theta_1}, \dots, e^{i \theta_n}\}$; then the SVD of $M$ is $M = U |\Lambda| (U D^∗)^∗$.
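The construction can be sketched numerically as follows. The unitary factor and the eigenvalues are assumed example data; the eigenvalues are chosen with descending moduli so that the factorization is already in the usual SVD ordering.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Assumed example: a normal matrix M = U diag(lam) U^* with one zero eigenvalue.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
lam = np.array([3.0j, -2.0, 1.0 + 1.0j, 0.0])
M = (U * lam) @ U.conj().T

# Split each eigenvalue into modulus and phase (theta_k = 0 when lam_k = 0).
abs_lam = np.abs(lam)
theta = np.where(abs_lam > 0, np.angle(lam), 0.0)
D = np.diag(np.exp(1j * theta))

# SVD factors: left vectors U, singular values |lam|, right vectors W = U D^*.
W = U @ D.conj().T
M_svd = (U * abs_lam) @ W.conj().T

sigma = np.linalg.svd(M, compute_uv=False)
```

Here `W` differs from `U` only by the column phases $e^{-i \theta_k}$, which is why $V \ne W$ in general even though $M$ is normal.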
Theorem (7.3.7): Given $\tilde{M} = [0, M; M^∗, 0]$, the singular values of $M$ are $\{\sigma_1, \dots, \sigma_q\}$ iff the eigenvalues of $\tilde{M}$ are $\{\sigma_1, \dots, \sigma_q, -\sigma_1, \dots, -\sigma_q, 0, \dots, 0\}$.
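A quick numerical check of this correspondence, as a sketch with an arbitrary real $4 \times 2$ matrix (the sizes and seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 2
q = min(m, n)
M = rng.standard_normal((m, n))

# Hermitian block matrix [0, M; M^*, 0] of size (m + n).
M_tilde = np.block([[np.zeros((m, m)), M],
                    [M.conj().T, np.zeros((n, n))]])

eig = np.sort(np.linalg.eigvalsh(M_tilde))
sigma = np.linalg.svd(M, compute_uv=False)

# Expected spectrum: +sigma_i, -sigma_i, and m + n - 2q zeros.
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m + n - 2 * q)]))
```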
$M^∗, M^T, \bar{M}$ have the same singular values as $M$. If $U$ and $V$ are unitary, then $U M V$ and $M$ have the same singular values. $\forall c \in \mathbb{C}, \Sigma(c M) = |c| \Sigma(M)$
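These invariances are easy to confirm on a random example. The sketch below assumes an arbitrary complex matrix, random unitary factors obtained from QR, and an arbitrary scalar `c`; the helper `sv` is a local convenience name.

```python
import numpy as np

def sv(A):
    """Singular values of A in descending order."""
    return np.linalg.svd(A, compute_uv=False)

rng = np.random.default_rng(8)
m, n = 4, 3
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Random unitary matrices from QR factorizations of complex Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
c = 2.0 - 3.0j

same_adjoint = np.allclose(sv(M.conj().T), sv(M))
same_transpose = np.allclose(sv(M.T), sv(M))
same_conjugate = np.allclose(sv(M.conj()), sv(M))
same_unitary = np.allclose(sv(U @ M @ V), sv(M))
scaled = np.allclose(sv(c * M), abs(c) * sv(M))
```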
Theorem (7.3.9; Interlacing property for singular values)
Theorem (7.3.10; Analog of Courant-Fischer Theorem)
Full SVD refers to the form given in the theorem: orthogonal decompositions for the domain and codomain, paired up to the rank of the matrix. However, not all information is needed in applications, and the full SVD can be reduced to save computation and storage.
Without loss of generality, consider an $m$-by-$n$ matrix $M$ of rank $r$, with $m > n > r$. Thin SVD refers to a computational procedure that provides an orthonormal $n$-frame for the codomain and an orthogonal decomposition for the domain, paired up to the rank of the matrix. $M = V \Sigma Q^T$, where $V \in V_{n, m}$, $Q \in O(n)$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^n_{+\downarrow}$. This saves the cost of completing the larger orthogonal matrix.
Compact SVD refers to a computational procedure that provides paired orthonormal vectors for the domain and codomain, with $r$ positive weights: $M = V \Sigma W^T$, where $V \in V_{r, m}$, $W \in V_{r, n}$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^r_{+\downarrow}$. This saves the cost of orthogonal decompositions of the kernel and the cokernel, which have no effect on the mapping since they are associated with zero singular values.
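The difference between the full, thin, and compact forms shows up in the shapes of the factors. Below is a sketch with NumPy, where `full_matrices=False` yields the thin form; the compact form is obtained here by discarding singular values below a relative threshold, and the value `1e-10` is an assumption for illustration rather than a recommended default.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 6, 4, 2
# Assumed example: a real m-by-n matrix of rank r built as a product.
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Full SVD: V is m-by-m, W^T is n-by-n.
V_full, s_full, Wt_full = np.linalg.svd(M, full_matrices=True)

# Thin SVD: V is m-by-n (an n-frame), Q^T is n-by-n.
V_thin, s_thin, Qt = np.linalg.svd(M, full_matrices=False)

# Compact SVD: keep only the numerically positive singular values.
rel_tol = 1e-10  # assumed threshold
r_num = int(np.sum(s_thin > rel_tol * s_thin.max()))
V_c, s_c, Wt_c = V_thin[:, :r_num], s_thin[:r_num], Qt[:r_num, :]

M_thin = (V_thin * s_thin) @ Qt
M_compact = (V_c * s_c) @ Wt_c
```

Both reduced forms reproduce `M` up to rounding, while storing fewer columns of the singular vector matrices.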
Truncated SVD or partial SVD refers to a computational procedure that provides paired orthonormal vectors for the domain and codomain, with the $k$ largest weights, $k < r$: $M \approx M_k = V_k \Sigma_k W_k^T$, where $V_k = V (I_k; 0) \in V_{k, m}$, $W_k = W (I_k; 0) \in V_{k, n}$, and $\Sigma_k = \text{diag}(\sigma_i)_{i=1}^k$. This saves the cost of computing the smaller singular values and the paired left and right singular vectors. The outcome does not exactly represent the original matrix, but is the optimal rank-$k$ approximation in Frobenius norm.
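A sketch of the truncation and its approximation error follows. Note that this illustration computes a thin SVD and then discards factors, so it does not realize the computational savings described above; dedicated partial solvers (for example `scipy.sparse.linalg.svds`) are one way to get those. The helper name `truncated_svd` and the test matrix are assumptions.

```python
import numpy as np

def truncated_svd(M, k):
    """Return (V_k, s_k, Wt_k) for the k largest singular triplets of M."""
    V, s, Wt = np.linalg.svd(M, full_matrices=False)
    return V[:, :k], s[:k], Wt[:k, :]

rng = np.random.default_rng(5)
M = rng.standard_normal((8, 5))
k = 2

V_k, s_k, Wt_k = truncated_svd(M, k)
M_k = (V_k * s_k) @ Wt_k

# The Frobenius error of the rank-k truncation equals the root sum of squares
# of the discarded singular values.
s_all = np.linalg.svd(M, compute_uv=False)
err = np.linalg.norm(M - M_k, "fro")
tail = np.sqrt(np.sum(s_all[k:] ** 2))
```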
Truncated SVD [@Halko2011, Sec. 6]:
Truncated eigenvalue decomposition (EVD) of an Hermitian matrix:
Classical SVD algorithms for general dense matrix, two steps:
Jacobi SVD algorithm for general dense matrix, for improved accuracy;
For a matrix $M \in M_{m,n}$, a pseudoinverse or generalized inverse is a matrix $M^+ \in M_{n,m}$ that has some properties analogous to the inverse of an invertible matrix. Left inverse of an injective matrix is a generalized inverse whose composition with the matrix is the identity map on the domain: $\text{rank}~M = n$, $\exists M^+ \in M_{n,m}$: $M^+ M = I_n$. Right inverse of a surjective matrix is a generalized inverse such that the composition of the matrix with it is the identity map on the codomain: $\text{rank}~M = m$, $\exists M^+ \in M_{n,m}$: $M M^+ = I_m$.
Moore-Penrose inverse of a matrix [@Moore1920; @Bjerhammar1951; @Penrose1955] is its Hermitian adjoint with positive singular values replaced by their reciprocals: $M \in M_{m,n}(\mathbb{C})$, given $M = V \Sigma W^∗$, define $M^\dagger = W \Sigma^\dagger V^∗$, where $\Sigma^\dagger = \text{diag}\{\sigma_1^{-1}, \dots, \sigma_r^{-1}, 0, \dots, 0\}_{n,m}$ and $r = \text{rank}~M$.
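The definition translates directly into code. The following is a minimal sketch, assuming a relative cutoff `rtol` to decide which singular values count as positive (a hypothetical parameter of this illustration); NumPy's own `numpy.linalg.pinv` is used only as a cross-check in the full-rank case.

```python
import numpy as np

def pinv_via_svd(M, rtol=1e-12):
    """Moore-Penrose inverse from the thin SVD; rtol is an assumed cutoff."""
    V, s, Wh = np.linalg.svd(M, full_matrices=False)
    s_inv = np.zeros_like(s)
    pos = s > rtol * s.max()
    s_inv[pos] = 1.0 / s[pos]          # invert only the positive singular values
    return (Wh.conj().T * s_inv) @ V.conj().T   # W Sigma^+ V^*

rng = np.random.default_rng(6)

# Full column rank: the Moore-Penrose inverse is a left inverse.
A = rng.standard_normal((5, 3))
A_pinv = pinv_via_svd(A)
left_identity = A_pinv @ A

# Rank-deficient case: B has rank 2, so B^+ B is a projector, not the identity.
B = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))
B_pinv = pinv_via_svd(B)
```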
Properties:
Theorem: The Moore-Penrose inverse is the only matrix that satisfies the following conditions:
Theorem (p. 426 Q26; simultaneous singular value decomposition)
Effective rank or numerical rank of a matrix is the number of singular values whose magnitudes are greater than measurement error.
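As a sketch of this idea, the example below perturbs an exactly rank-2 matrix with small noise and counts the singular values above a tolerance. The noise level `1e-6` and the tolerance `1e-4` are assumed values standing in for a known measurement error, and `numerical_rank` is a hypothetical helper name.

```python
import numpy as np

def numerical_rank(M, tol):
    """Count singular values of M strictly greater than tol."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol))

rng = np.random.default_rng(7)
exact = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))  # rank 2
noise_level = 1e-6                                                 # assumed
noisy = exact + noise_level * rng.standard_normal(exact.shape)

# With a machine-precision tolerance the noise inflates the computed rank.
rank_machine = int(np.linalg.matrix_rank(noisy))

# With a tolerance above the noise level the underlying rank is recovered.
rank_effective = numerical_rank(noisy, tol=1e-4)
```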
(7.4.1; Nearest rank-$k$ matrix) Given $M \in M_{m,n}, \mathrm{rank}(M) = k$, then over all $M_1 \in M_{m,n}$ with $\mathrm{rank}(M_1) = k_1 \le k$, $\min \| M - M_1 \|_2 = \| M - V \Sigma_1 W^∗ \|_2$, where $\Sigma_1 = \mathrm{diag}\{\sigma_1, \dots, \sigma_{k_1}, 0, \dots, 0\}_{m,n}$.
Theorem (7.4.10): If $A \in M_{m,n}, B \in M_{n,m}$, $A B$ and $B A$ are positive semi-definite, then $\mathrm{tr}(A B) = \mathrm{tr}(B A) = \sum_{i=1}^q \sigma_i(A) \sigma_{\tau(i)}(B)$ for some permutation $\tau$ of $\{1, \dots, q\}$.