Polar Decomposition

Lemma (7.3.1): $M = X \Lambda Y$

Theorem (7.3.2; Polar decomposition): $M = P U$, where $P$ is positive semidefinite and $U$ is unitary.

Theorem (7.3.4): Given $M \in M_n$ with polar decomposition $M = P U$, $M$ is normal iff $P U = U P$.
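
As a quick numerical illustration (not from the text; the construction assumes $M$ is nonsingular so that $U = P^{-1} M$ is well defined), a minimal numpy sketch of the decomposition and of Theorem 7.3.4:

```python
# Compute the polar factors of a nonsingular square matrix: P = (M M*)^{1/2}
# from the eigendecomposition of M M*, then U = P^{-1} M; check Theorem 7.3.4.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

w, Q = np.linalg.eigh(M @ M.conj().T)       # M M* = Q diag(w) Q*, w >= 0
P = Q @ np.diag(np.sqrt(w)) @ Q.conj().T    # positive semidefinite square root
U = np.linalg.solve(P, M)                   # U = P^{-1} M, unitary for nonsingular M

print(np.allclose(M, P @ U))                   # True: M = P U
print(np.allclose(U @ U.conj().T, np.eye(4)))  # True: U is unitary
print(np.allclose(P @ U, U @ P))               # False: this M is not normal

N = M + M.conj().T                             # Hermitian, hence normal
wN, QN = np.linalg.eigh(N @ N.conj().T)
PN = QN @ np.diag(np.sqrt(wN)) @ QN.conj().T
UN = np.linalg.solve(PN, N)
print(np.allclose(PN @ UN, UN @ PN))           # True: the factors commute
```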

Singular Value Decomposition

Theorem (7.3.5; Singular Value Decomposition): Every matrix can be written as a pairing of orthonormal vectors in its domain and codomain, with nonnegative weights; the vectors can be real if the matrix is real:

  1. $\forall M \in M_{m,n}(\mathbb{C})$, $\exists V \in U(m)$, $W \in U(n)$, $\Sigma = \text{diag}\{(\sigma_i)_{i=1}^q\}_{m,n}$ with $q = \min\{m, n\}$: $M = V \Sigma W^∗$.
  2. $\forall M \in M_{m,n}(\mathbb{R})$, $\exists V \in O(m)$, $W \in O(n)$, and the rest follows.

Left singular vectors $(v_i)_{i=1}^m$ are the columns of $V$. Right singular vectors $(w_i)_{i=1}^n$ are the columns of $W$. Singular values $(\sigma_i)_{i=1}^q$ are the nonnegative values on the diagonal of $\Sigma$. The singular values of $M$ are the eigenvalues of the polar factor $P$.

If $M$ is normal, then $M M^∗ = M^∗ M$ implies that $M M^∗$ and $M^∗ M$ have the same eigenvectors. This does not mean $V = W$ in $M = V \Sigma W^∗$, because each pair of corresponding singular vectors may differ by a phase factor $e^{i \theta_k}$.

If $M = U \Lambda U^∗$, $\lambda_k = |\lambda_k| e^{i \theta_k}$ (define $\theta_k = 0$ if $\lambda_k = 0$), let $|\Lambda| = \mathrm{diag}\{|\lambda_1|, \dots, |\lambda_n|\}$ and $D = \mathrm{diag}\{e^{i \theta_1}, \dots, e^{i \theta_n}\}$, then the SVD of $M$ is $M = U |\Lambda| (U D^∗)^∗$.
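
A minimal numpy sketch of this construction (the random normal test matrix is illustrative), building an SVD directly from the eigendecomposition:

```python
# SVD of a normal matrix from its unitary eigendecomposition M = U diag(lambda) U*.
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
lam = rng.standard_normal(4) + 1j * rng.standard_normal(4)
M = Q @ np.diag(lam) @ Q.conj().T               # normal by construction

theta = np.angle(lam)                           # lambda_k = |lambda_k| e^{i theta_k}
D = np.diag(np.exp(1j * theta))
V, Sigma, W = Q, np.diag(np.abs(lam)), Q @ D.conj().T   # W = U D*

print(np.allclose(M, V @ Sigma @ W.conj().T))   # True: M = U |Lambda| (U D*)*
print(np.allclose(W.conj().T @ W, np.eye(4)))   # True: W is unitary
# (the diagonal of Sigma is not sorted here; reordering columns fixes that)
```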

Theorem (7.3.7): Given $\tilde{M} = [0, M; M^∗, 0]$, the singular values of $M$ are $\{\sigma_1, \dots, \sigma_q\}$ iff the eigenvalues of $\tilde{M}$ are $\{\sigma_1, \dots, \sigma_q, -\sigma_1, \dots, -\sigma_q, 0, \dots, 0\}$, with $m + n - 2q$ zeros.

$M^∗$, $M^T$, and $\bar{M}$ have the same singular values as $M$. If $U$ and $V$ are unitary, then $U M V$ and $M$ have the same singular values. $\forall c \in \mathbb{C}$, $\Sigma(c M) = |c| \Sigma(M)$.
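
Both the block-matrix relation and these invariance properties are easy to confirm numerically; a minimal numpy sketch with a random $3 \times 5$ complex matrix:

```python
# Check Theorem 7.3.7 and the invariance of singular values stated above.
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
sigma = np.linalg.svd(M, compute_uv=False)       # q = min(m, n) values

# Eigenvalues of [0, M; M*, 0] are {+sigma_i, -sigma_i} plus m + n - 2q zeros.
Mt = np.block([[np.zeros((m, m)), M], [M.conj().T, np.zeros((n, n))]])
ref = np.sort(np.concatenate([sigma, -sigma, np.zeros(m + n - 2 * len(sigma))]))
print(np.allclose(np.sort(np.linalg.eigvalsh(Mt)), ref))                 # True

# Invariance under conjugate transpose, unitary multiplication, and scaling.
U, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
c = 2 - 3j
print(np.allclose(np.linalg.svd(M.conj().T, compute_uv=False), sigma))   # M*
print(np.allclose(np.linalg.svd(U @ M @ V, compute_uv=False), sigma))    # U M V
print(np.allclose(np.linalg.svd(c * M, compute_uv=False), abs(c) * sigma))
```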

Theorem (7.3.9; Interlacing property for singular values)

Theorem (7.3.10; Analog of Courant-Fischer Theorem)

Reduced SVDs

Full SVD refers to the form given in the theorem: orthogonal decompositions for the domain and codomain, paired up to the rank of the matrix. However, not all information is needed in applications, and the full SVD can be reduced to save computation and storage.

Without loss of generality, consider an m-by-n matrix M of rank r, with m > n > r. Thin SVD refers to a computational procedure that provides an orthonormal n-frame for the codomain and an orthogonal decomposition for the domain, paired up to the rank of the matrix. $M = V \Sigma Q^T$, where $V \in V_{n, m}$, $Q \in O(n)$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^n_{+\downarrow}$. This saves the cost of completing the larger orthogonal matrix.

Compact SVD refers to a computational procedure that provides paired orthonormal vectors for the domain and codomain, with r positive weights: $M = V \Sigma W^T$, where $V \in V_{r, m}$, $W \in V_{r, n}$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^r_{+\downarrow}$. This saves the cost of orthogonal decompositions of the kernel and the cokernel, which have no effect on the mapping since they are associated with zero singular values.

Truncated SVD or partial SVD refers to a computational procedure that provides paired orthonormal vectors for the domain and codomain, with the k largest weights, k < r: $M \approx M_k = V_k \Sigma_k W_k^T$, where $V_k = V (I_k; 0) \in V_{k, m}$, $W_k = W (I_k; 0) \in V_{k, n}$, and $\Sigma_k = \text{diag}(\sigma_i)_{i=1}^k$. This saves the cost of computing the smaller singular values and the paired left and right singular vectors. The outcome does not exactly represent the original matrix, but it is the optimal rank-k approximation in the Frobenius norm.
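
The three reduced forms are easy to produce with a standard SVD routine. A minimal numpy sketch, with illustrative sizes $m > n > r > k$:

```python
# Thin, compact, and truncated SVD of an m-by-n matrix of rank r.
import numpy as np

m, n, r, k = 8, 5, 3, 2
rng = np.random.default_rng(3)
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r

# Thin SVD: V is an orthonormal n-frame in R^m, Q is n-by-n orthogonal.
V, s, Qt = np.linalg.svd(M, full_matrices=False)   # V: (m, n), s: (n,), Qt: (n, n)
print(np.allclose(M, V @ np.diag(s) @ Qt))

# Compact SVD: keep only the r positive singular values.
print(np.allclose(M, V[:, :r] @ np.diag(s[:r]) @ Qt[:r, :]))

# Truncated SVD: keep the k < r largest singular values.
Mk = V[:, :k] @ np.diag(s[:k]) @ Qt[:k, :]
print(np.isclose(np.linalg.norm(M - Mk, 2), s[k]))  # spectral-norm error is sigma_{k+1}
```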

Algorithms and Computational Complexity

Truncated SVD [@Halko2011, Sec. 6]:

  1. general dense matrix that fits in fast memory:
    • classical algorithms: rank-revealing QR factorization + manipulate the factors, $O(m n k)$;
    • randomized algorithms: $O(m n \log(k) + (m+n) k^2)$ (see the sketch after this list);
  2. general dense matrix stored in slow memory or streamed: cost dominated by transferring the matrix from slow memory;
    • standard techniques for low-rank approximation: $O(k)$ passes;
    • randomized algorithms: one pass, or very few (2-4) additional passes for improved accuracy;
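
A minimal sketch of the basic randomized scheme of [@Halko2011] (Gaussian range sampling with oversampling, no power iterations; the function name and parameters are illustrative):

```python
# Randomized truncated SVD: sample the range with a Gaussian test matrix,
# orthonormalize, project to a small matrix, and take its SVD.
import numpy as np

def randomized_svd(M, k, p=10, rng=np.random.default_rng()):
    m, n = M.shape
    Omega = rng.standard_normal((n, k + p))    # Gaussian test matrix
    Q, _ = np.linalg.qr(M @ Omega)             # orthonormal basis for the sampled range
    B = Q.T @ M                                # (k+p)-by-n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(4)
m, n, k = 1000, 400, 20
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
M = (U0 * 2.0 ** -np.arange(n)) @ V0.T         # rapidly decaying spectrum

U, s, Vt = randomized_svd(M, k, rng=rng)
s_exact = np.linalg.svd(M, compute_uv=False)[:k]
print(np.max(np.abs(s - s_exact) / s_exact))   # small relative error on leading values
```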

Truncated eigenvalue decomposition (EVD) of a Hermitian matrix:

  • direct method: $O(n^2 k)$;
  • row extraction: less accurate, $O(n k^2)$;

Classical SVD algorithms for general dense matrix, two steps:

  1. reduce the matrix to bidiagonal form: $M = P B Q^T$, where $P$ and $Q$ are products of Householder matrices and $B$ is upper bidiagonal;
    • dense matrix: Golub-Kahan, $4 m n^2 - \tfrac{4}{3} n^3$ flops, $O(m n^2)$;
    • large matrix: Golub-Kahan-Lanczos, built on Lanczos iterations (see the sketch after this list);
  2. SVD of a bidiagonal matrix, to high relative accuracy:
    • singular values only: square-root-free method; bisection method, $O(k n)$;
    • singular values and singular vectors: multiple relatively robust representations, $O(k n)$;
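
A minimal sketch of the Lanczos variant mentioned in step 1 (Golub-Kahan-Lanczos bidiagonalization with full reorthogonalization; names and sizes are illustrative):

```python
# After k steps, M V_k = U_k B_k with B_k upper bidiagonal; the singular values
# of B_k (Ritz values) approximate the extreme singular values of M.
import numpy as np

def lanczos_bidiag(M, k, rng=np.random.default_rng()):
    m, n = M.shape
    U = np.zeros((m, k)); V = np.zeros((n, k))
    alpha = np.zeros(k); beta = np.zeros(k)
    v = rng.standard_normal(n); v /= np.linalg.norm(v)
    for j in range(k):
        V[:, j] = v
        p = M @ v - (beta[j - 1] * U[:, j - 1] if j > 0 else 0.0)
        p -= U[:, :j] @ (U[:, :j].T @ p)            # reorthogonalize against previous u's
        alpha[j] = np.linalg.norm(p); U[:, j] = p / alpha[j]
        r = M.T @ U[:, j] - alpha[j] * v
        r -= V[:, :j + 1] @ (V[:, :j + 1].T @ r)    # reorthogonalize against previous v's
        beta[j] = np.linalg.norm(r); v = r / beta[j]
    B = np.diag(alpha) + np.diag(beta[:-1], 1)      # upper bidiagonal
    return U, B, V

rng = np.random.default_rng(5)
M = rng.standard_normal((300, 200))
U, B, V = lanczos_bidiag(M, 60, rng=rng)
print(np.allclose(M @ V, U @ B))               # the recurrence M V_k = U_k B_k
print(np.linalg.svd(B, compute_uv=False)[:3])  # leading Ritz values approximate...
print(np.linalg.svd(M, compute_uv=False)[:3])  # ...the leading singular values of M
```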

Jacobi SVD algorithms for a general dense matrix, for improved accuracy:

  • One-sided Jacobi SVD algorithm [@Hestenes1958] (see the sketch after this list);
  • Preconditioned Jacobi SVD algorithm;
  • Rank-revealing decomposition: structured matrices;
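
A minimal sketch of the one-sided (Hestenes) variant: orthogonalize pairs of columns by plane rotations until all columns are mutually orthogonal; the column norms are then the singular values. Function and parameter names are illustrative, and the sketch assumes a real matrix with full column rank:

```python
# One-sided Jacobi SVD for a real matrix with full column rank.
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    A = np.array(A, dtype=float)
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0
        for p in range(n - 1):
            for q in range(p + 1, n):
                app, aqq, apq = A[:, p] @ A[:, p], A[:, q] @ A[:, q], A[:, p] @ A[:, q]
                off = max(off, abs(apq) / np.sqrt(app * aqq))
                if abs(apq) < tol * np.sqrt(app * aqq):
                    continue
                zeta = (aqq - app) / (2.0 * apq)
                t = 1.0 if zeta == 0 else np.sign(zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t); s = c * t
                # rotate columns p, q of A and accumulate the rotation in V
                Ap, Aq = A[:, p].copy(), A[:, q].copy()
                A[:, p], A[:, q] = c * Ap - s * Aq, s * Ap + c * Aq
                Vp, Vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * Vp - s * Vq, s * Vp + c * Vq
        if off < tol:                      # all column pairs numerically orthogonal
            break
    sigma = np.linalg.norm(A, axis=0)
    order = np.argsort(sigma)[::-1]
    return (A / sigma)[:, order], sigma[order], V[:, order]

rng = np.random.default_rng(6)
M = rng.standard_normal((6, 4))
U, s, V = one_sided_jacobi_svd(M)
print(np.allclose(M, U @ np.diag(s) @ V.T))                     # M = U Sigma V^T
print(np.allclose(s, np.linalg.svd(M, compute_uv=False)))       # matches LAPACK's values
```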

Pseudo-inverse

For a matrix $M \in M_{m,n}$, a pseudoinverse or generalized inverse is a matrix $M^+ \in M_{n,m}$ that has some properties analogous to the inverse of an invertible matrix. A left inverse of an injective matrix is a generalized inverse whose composition with the matrix is the identity map on the domain: if $\text{rank}~M = n$, then $\exists M^+ \in M_{n,m}$: $M^+ M = I_n$. A right inverse of a surjective matrix is a generalized inverse such that the composition of the matrix with it is the identity map on the codomain: if $\text{rank}~M = m$, then $\exists M^+ \in M_{n,m}$: $M M^+ = I_m$.

Moore-Penrose inverse of a matrix [@Moore1920; @Bjerhammar1951; @Penrose1955] is its Hermitian adjoint with positive singular values replaced by their reciprocals: for $M \in M_{m,n}(\mathbb{C})$ with $M = V \Sigma W^∗$, define $M^\dagger = W \Sigma^\dagger V^∗$, where $\Sigma^\dagger = \text{diag}\{(\sigma_i^+)_{i=1}^q\}_{n,m}$, with $\sigma_i^+ = \sigma_i^{-1}$ if $\sigma_i > 0$ and $\sigma_i^+ = 0$ otherwise.

Properties:

  • $M^\dagger$ is unique.
  • If $M$ is square and invertible, then $M^\dagger = M^{-1}$.
  • If $\text{rank}~M = n$, then $M^\dagger = (M^∗ M)^{-1} M^∗$, which is a left inverse.
    • $M^\dagger b$ is the least-squares solution to $M x = b$.
  • If $\text{rank}~M = m$, then $M^\dagger = M^∗ (M M^∗)^{-1}$, which is a right inverse.
    • $M^\dagger b$ is the minimum Euclidean norm solution to $M x = b$ (see the sketch after this list).
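
A minimal numpy sketch of these two cases with real random test matrices; `np.linalg.pinv` computes the Moore-Penrose inverse via the SVD:

```python
# Full column rank: pinv is a left inverse and gives the least-squares solution.
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 3)); b = rng.standard_normal(6)
print(np.allclose(np.linalg.pinv(M), np.linalg.inv(M.T @ M) @ M.T))
x_lstsq, *_ = np.linalg.lstsq(M, b, rcond=None)
print(np.allclose(np.linalg.pinv(M) @ b, x_lstsq))

# Full row rank: pinv is a right inverse and gives the minimum-norm solution.
A = rng.standard_normal((3, 6)); c = rng.standard_normal(3)
y = np.linalg.pinv(A) @ c
print(np.allclose(np.linalg.pinv(A), A.T @ np.linalg.inv(A @ A.T)))
print(np.allclose(A @ y, c))                      # y solves A y = c
z = y + np.linalg.svd(A)[2][-1]                   # add a kernel direction: still a solution
print(np.allclose(A @ z, c), np.linalg.norm(y) < np.linalg.norm(z))
```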

Theorem: The Moore-Penrose inverse is the only matrix that satisfies the following conditions:

  1. $M M^\dagger$ and $M^\dagger M$ are Hermitian.
  2. $M M^\dagger M = M$ and $M^\dagger M M^\dagger = M^\dagger$.

Theorem (p. 426 Q26; simultaneous singular value decomposition)

Applications

Effective rank or numerical rank of a matrix is the number of its singular values that exceed the level of measurement error.

(7.4.1; Nearest rank-$k$ matrix) Given $M \in M_{m,n}$ with $\mathrm{rank}(M) = k$, then $\forall M_1 \in M_{m,n}$ with $\mathrm{rank}(M_1) = k_1 \le k$, $\min \| M - M_1 \|_2 = \| M - V \Sigma_1 W^∗ \|_2$, where $\Sigma_1 = \mathrm{diag}\{\sigma_1, \dots, \sigma_{k_1}, 0, \dots, 0\}_{m,n}$.
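
A minimal numpy sketch of this statement in the spectral norm (random test matrix; the comparison against random rank-$k_1$ matrices is only a spot check):

```python
# Truncated SVD as the nearest matrix of rank at most k1.
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((6, 5))
V, s, Wt = np.linalg.svd(M, full_matrices=False)

k1 = 2
M1 = V[:, :k1] @ np.diag(s[:k1]) @ Wt[:k1, :]        # keep the k1 largest singular values
print(np.isclose(np.linalg.norm(M - M1, 2), s[k1]))  # the error equals sigma_{k1+1}

# no random rank-k1 matrix gets closer than the truncated SVD
dists = [np.linalg.norm(M - rng.standard_normal((6, k1)) @ rng.standard_normal((k1, 5)), 2)
         for _ in range(100)]
print(min(dists) >= s[k1] - 1e-12)
```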

Theorem (7.4.10): If $A \in M_{m,n}$, $B \in M_{n,m}$, and $A B$ and $B A$ are positive semidefinite, then $\mathrm{tr}(A B) = \mathrm{tr}(B A) = \sum_{i=1}^q \sigma_i(A) \sigma_{\tau(i)}(B)$ for some permutation $\tau$ of $\{1, \dots, q\}$.


🏷 Category=Algebra Category=Matrix Theory