Polar Decomposition

Lemma (7.3.1): $A = X \Lambda Y$

Theorem (7.3.2; Polar decomposition): Every $A \in M_n$ can be written as $A = P U$, where $P$ is positive semidefinite and $U$ is unitary.

Theorem (7.3.4): Given $A \in M_n$ with polar decomposition $A = P U$, $A$ is normal if and only if $P U = U P$.

Singular Value Decomposition

Theorem (7.3.5; Singular Value Decomposition): Every matrix can be written as a pairing of orthonormal vectors in its domain and range, with nonnegative weights; the vectors can be real if the matrix is real:

  1. $\forall A \in M_{m,n}(\mathbb{C})$, $\exists V \in U(m)$, $W \in U(n)$, $\Sigma = \text{diag}\{(\sigma_i)_{i=1}^q\}_{m,n}$ with $q = \min\{m, n\}$: $A = V \Sigma W^∗$.
  2. $\forall A \in M_{m,n}(\mathbb{R})$, $\exists V \in O(m)$, $W \in O(n)$, and $\Sigma$ as above: $A = V \Sigma W^T$.

Left singular vectors $(v_i)_{i=1}^m$ are the columns of $V$. Right singular vectors $(w_i)_{i=1}^n$ are the columns of $W$. Singular values $(\sigma_i)_{i=1}^q$ are the nonnegative values on the diagonal of $\Sigma$, conventionally arranged in descending order $\sigma_1 \ge \dots \ge \sigma_q \ge 0$. The singular values of $A$ are the eigenvalues of the polar factor $P$.
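
As a concrete illustration, here is a minimal numerical sketch (assuming NumPy and a randomly generated test matrix) of extracting the polar factors from an SVD, $P = V \Sigma V^∗$ and $U = V W^∗$, and checking that the eigenvalues of $P$ are the singular values of $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# SVD: A = V @ diag(s) @ Wh, where Wh = W^*
V, s, Wh = np.linalg.svd(A)

# Polar factors: P = V diag(s) V^* is positive semidefinite, U = V W^* is unitary
P = V @ np.diag(s) @ V.conj().T
U = V @ Wh

assert np.allclose(A, P @ U)                                     # A = P U
assert np.allclose(U @ U.conj().T, np.eye(4))                    # U is unitary
assert np.allclose(np.sort(np.linalg.eigvalsh(P)), np.sort(s))   # eig(P) = singular values
```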

If $A$ is normal, then $A A^∗ = A^∗ A$ implies that $A A^∗$ and $A^∗ A$ have the same eigenvectors. This does not mean $V = W$ in $A = V \Sigma W^∗$, because corresponding singular vectors may differ by a phase factor $e^{i \theta_k}$.

If $A = U \Lambda U^∗$ and $\lambda_k = |\lambda_k| e^{i \theta_k}$ (define $\theta_k = 0$ if $\lambda_k = 0$), let $|\Lambda| = \mathrm{diag}\{|\lambda_1|, \dots, |\lambda_n|\}$ and $D = \mathrm{diag}\{e^{i \theta_1}, \dots, e^{i \theta_n}\}$; then an SVD of $A$ is $A = U |\Lambda| (U D^∗)^∗$.
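
A small numerical check of this construction (a sketch assuming NumPy; the normal matrix below is built from a random unitary and random eigenvalues): factor the eigenvalues into moduli and phases and verify that $U |\Lambda| (U D^∗)^∗$ reproduces $A$ with unitary outer factors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a normal matrix A = U diag(lam) U^* with complex eigenvalues
U, _ = np.linalg.qr(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
lam = rng.standard_normal(5) + 1j * rng.standard_normal(5)
A = U @ np.diag(lam) @ U.conj().T

absL = np.diag(np.abs(lam))               # |Lambda|
D = np.diag(np.exp(1j * np.angle(lam)))   # phases e^{i theta_k}

W = U @ D.conj().T                        # W = U D^*, so A = U |Lambda| W^*
assert np.allclose(A, U @ absL @ W.conj().T)     # reproduces A
assert np.allclose(W.conj().T @ W, np.eye(5))    # W is unitary
# This is an SVD of A up to reordering the diagonal of |Lambda| into descending order.
```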

Theorem (7.3.7): Given $A \in M_{m,n}$ and $\tilde{A} = \begin{bmatrix} 0 & A \\ A^∗ & 0 \end{bmatrix}$, the singular values of $A$ are $\{\sigma_1, \dots, \sigma_q\}$ iff the eigenvalues of $\tilde{A}$ are $\{\sigma_1, \dots, \sigma_q, -\sigma_1, \dots, -\sigma_q, 0, \dots, 0\}$, where the number of zeros is $|m - n|$.
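
A quick numerical illustration of 7.3.7 (a sketch assuming NumPy and a random rectangular matrix): the spectrum of $\tilde{A}$ consists of $\pm\sigma_i(A)$ padded with zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Hermitian (m + n) x (m + n) block matrix [0, A; A^*, 0]
A_tilde = np.block([[np.zeros((m, m)), A],
                    [A.conj().T, np.zeros((n, n))]])

sigma = np.linalg.svd(A, compute_uv=False)    # q = min(m, n) singular values
expected = np.concatenate([sigma, -sigma, np.zeros(abs(m - n))])

assert np.allclose(np.sort(np.linalg.eigvalsh(A_tilde)), np.sort(expected))
```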

$A^∗$, $A^T$, and $\bar{A}$ have the same singular values as $A$. If $U$ and $V$ are unitary, then $U A V$ and $A$ have the same singular values. $\forall c \in \mathbb{C}$, $\Sigma(c A) = |c| \Sigma(A)$.
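
These invariances are easy to check numerically; a sketch assuming NumPy, with a random matrix, random unitaries obtained from QR factorizations, and an arbitrary scalar $c$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6)))
c = 2 - 3j

def sv(M):
    """Singular values of M in descending order."""
    return np.linalg.svd(M, compute_uv=False)

assert np.allclose(sv(A), sv(A.conj().T))      # A^* has the same singular values as A
assert np.allclose(sv(A), sv(A.T))             # so does A^T (and likewise conj(A))
assert np.allclose(sv(A), sv(U @ A @ V))       # unitary equivalence preserves them
assert np.allclose(sv(c * A), abs(c) * sv(A))  # scaling by c scales them by |c|
```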

Theorem (7.3.9; Interlacing property for singular values)

Theorem (7.3.10; Analog of Courant-Fischer Theorem)

Thin SVD or compact SVD of a rectangular matrix refers to computing the SVD without completing the larger orthogonal matrix: suppose $X \in M_{n,k}$ and $n > k$; then a thin SVD gives $X = V \Sigma W^T$, where $V \in V_{k, n}$ (i.e., $V$ is $n \times k$ with orthonormal columns), $W \in O(k)$, and $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^k_{+\downarrow}$.
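
In NumPy this corresponds to `np.linalg.svd(X, full_matrices=False)`; a sketch assuming a random $X \in M_{n,k}$ with $n > k$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 5
X = rng.standard_normal((n, k))

# Thin SVD: V is n x k with orthonormal columns, sigma is descending, Wt is k x k orthogonal
V, sigma, Wt = np.linalg.svd(X, full_matrices=False)

assert V.shape == (n, k) and Wt.shape == (k, k)
assert np.allclose(X, V @ np.diag(sigma) @ Wt)   # X = V Sigma W^T
assert np.allclose(V.T @ V, np.eye(k))           # columns of V are orthonormal
assert np.all(np.diff(sigma) <= 0)               # sigma is in descending order
```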

Pseudo-inverse

For a matrix $A \in M_{m,n}$, a pseudoinverse or generalized inverse is a matrix $A^+ \in M_{n,m}$ that has some properties analogous to the inverse of an invertible matrix. Left inverse of an injective matrix is a generalized inverse whose composition with the matrix is the identity map on the domain: $\text{rank}~A = n$, $\exists A^+ \in M_{n,m}$: $A^+ A = I_n$. Right inverse of a surjective matrix is a generalized inverse such that the composition of the matrix with it is the identity map on the codomain: $\text{rank}~A = m$, $\exists A^+ \in M_{n,m}$: $A A^+ = I_m$.

Moore-Penrose inverse of a matrix [@Moore1920; @Bjerhammar1951; @Penrose1955] is its Hermitian adjoint with positive singular values replaced by their reciprocals: $A \in M_{m,n}(\mathbb{C})$, given $A = V \Sigma W^∗$, define $A^\dagger = W \Sigma^\dagger V^∗$, where $\Sigma^\dagger \in M_{n,m}$ is diagonal with $\sigma_i^{-1}$ in place of each positive $\sigma_i$ and zeros elsewhere.
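
A minimal sketch of this definition (assuming NumPy and a random full-rank test matrix): reciprocate the positive singular values, keep the zeros, and swap the unitary factors; the result agrees with `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

V, s, Wh = np.linalg.svd(A, full_matrices=False)    # A = V diag(s) W^*

# Sigma^dagger: reciprocals of the positive singular values, zeros left as zeros
s_dag = np.array([1.0 / x if x > 1e-12 else 0.0 for x in s])

A_dag = Wh.conj().T @ np.diag(s_dag) @ V.conj().T   # A^dagger = W Sigma^dagger V^*
assert np.allclose(A_dag, np.linalg.pinv(A))
```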

Properties:

  • $A^\dagger$ is unique.
  • If $A$ is square and invertible, then $A^\dagger = A^{-1}$.
  • If $\text{rank}~A = n$, then $A^\dagger = (A^∗ A)^{-1} A^∗$, which is a left inverse.
    • $A^\dagger b$ is the least-squares solution to $A x = b$.
  • If $\text{rank}~A = m$, then $A^\dagger = A^∗ (A A^∗)^{-1}$, which is a right inverse.
    • $A^\dagger b$ is the minimum Euclidean norm solution to $A x = b$ (see the sketch after this list).
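
A sketch of both cases above (assuming NumPy, with random test systems): an overdetermined full-column-rank system solved in the least-squares sense, and an underdetermined full-row-rank system solved with minimum Euclidean norm.

```python
import numpy as np

rng = np.random.default_rng(6)

# Overdetermined, full column rank: A^+ b is the least-squares solution
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_ls = np.linalg.pinv(A) @ b
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])

# Underdetermined, full row rank: B^+ c is the minimum-norm solution of B x = c
B = rng.standard_normal((3, 8))
c = rng.standard_normal(3)
x_mn = np.linalg.pinv(B) @ c
assert np.allclose(B @ x_mn, c)                                    # it solves the system
# Any other solution differs by a null-space component and has a larger norm
z = (np.eye(8) - np.linalg.pinv(B) @ B) @ rng.standard_normal(8)   # z in null(B)
assert np.allclose(B @ (x_mn + z), c)
assert np.linalg.norm(x_mn) <= np.linalg.norm(x_mn + z)
```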

Theorem: The Moore-Penrose inverse is the only matrix that satisfies the following conditions (a numerical check is sketched after the list):

  1. $A A^\dagger$ and $A^\dagger A$ are Hermitian.
  2. $A A^\dagger A = A$ and $A^\dagger A A^\dagger = A^\dagger$.
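
A numerical check of these four Penrose conditions (a sketch assuming NumPy and a random rectangular matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
A_dag = np.linalg.pinv(A)

assert np.allclose(A @ A_dag, (A @ A_dag).conj().T)    # A A^dagger is Hermitian
assert np.allclose(A_dag @ A, (A_dag @ A).conj().T)    # A^dagger A is Hermitian
assert np.allclose(A @ A_dag @ A, A)                   # A A^dagger A = A
assert np.allclose(A_dag @ A @ A_dag, A_dag)           # A^dagger A A^dagger = A^dagger
```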

Theorem (p. 426 Q26; simultaneous singular value decomposition)

Applications

Effective rank or numerical rank of a matrix is the number of singular values whose magnitudes are greater than the measurement error.

(7.4.1; Nearest rank-$k$ matrix) Given $A \in M_{m,n}$ with $\mathrm{rank}(A) = k$ and SVD $A = V \Sigma W^∗$, then for any $k_1 \le k$, $\min_{\mathrm{rank}(A_1) \le k_1} \| A - A_1 \|_2 = \| A - V \Sigma_1 W^∗ \|_2$, where $A_1 \in M_{m,n}$ and $\Sigma_1 = \mathrm{diag}\{\sigma_1, \dots, \sigma_{k_1}, 0, \dots, 0\}_{m,n}$.
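
A truncated-SVD sketch of 7.4.1 (assuming NumPy and a random matrix): zeroing all but the first $k_1$ singular values gives the nearest rank-$k_1$ matrix in the spectral norm, with error equal to the first discarded singular value.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 6))
k1 = 2

V, s, Wh = np.linalg.svd(A, full_matrices=False)

# Keep the k1 largest singular values, zero the rest
s_trunc = np.concatenate([s[:k1], np.zeros(len(s) - k1)])
A1 = V @ np.diag(s_trunc) @ Wh

assert np.linalg.matrix_rank(A1) == k1
# The spectral-norm error is the first discarded singular value
assert np.isclose(np.linalg.norm(A - A1, ord=2), s[k1])
```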

Theorem (7.4.10): If $A \in M_{m,n}$, $B \in M_{n,m}$, and $A B$ and $B A$ are positive semi-definite, then $\mathrm{tr}(A B) = \mathrm{tr}(B A) = \sum_{i=1}^q \sigma_i(A) \sigma_{\tau(i)}(B)$ for some permutation $\tau$ of $\{1, \dots, q\}$.


🏷 Category=Algebra Category=Matrix Theory