Vector norms

Vector norm $\|\cdot\|: V \to \mathbb{R}_{\ge 0}$ is a non-negative function on a vector space that is positive for non-zero vectors, absolutely homogeneous, and satisfies the triangle inequality: (1) $\|x\| > 0$, $\forall x \ne 0$; (2) $\|a x\| = |a| \|x\|$; (3) $\|x + y\| \le \|x\| + \|y\|$.

Vector p-norm (or Hölder p-norm) is the vector norm on the Euclidean n-space, defined by: $\|x\|_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$, $p \in [1, \infty]$. Vector 2-norm or Euclidean norm is the norm induced from the Euclidean inner product: $\|x\|_2 = (\sum_{i=1}^n |x_i|^2)^{1/2}$. Euclidean norm is the default vector norm, and is often simply denoted as $\|\cdot\|$ or $|\cdot|$. A vector norm is unitarily invariant if it does not change under unitary transforms: $\forall U \in U(n)$, $\|U x\| = \|x\|$. The Euclidean norm is unitarily invariant.
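The p-norms and the unitary invariance of the Euclidean norm can be checked numerically; a minimal sketch with NumPy (the vector and the rotation angle are arbitrary choices for illustration):

```python
import numpy as np

x = np.array([3.0, -4.0])

# p-norms for a few values of p; the norm decreases as p increases
norm1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7
norm2 = np.linalg.norm(x, 2)         # sqrt(9 + 16) = 5
norminf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4
assert norm1 >= norm2 >= norminf

# unitary invariance of the 2-norm: rotating x leaves its length unchanged
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a rotation (unitary) matrix
assert np.isclose(np.linalg.norm(U @ x, 2), norm2)
```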

Vector norms on matrices

Because the space $M_{m,n}(\mathbb{F})$ of m-by-n matrices is a vector space, vector norms can be defined on it. Unitarily invariant norm on a matrix space is a vector norm that does not change under left and right unitary transforms: $\forall U \in U(m), V \in U(n)$, $\|U A V\| = \|A\|$. For any unitarily invariant norm on a matrix space, $\|A\| = \|A^H\| = \|\Sigma(A)\|$, where $\Sigma(A)$ is the matrix with the singular values of A on the diagonal. A vector norm on a matrix space is self-adjoint if it is invariant under Hermitian adjoint: $\|A\| = \|A^H\|$. All unitarily invariant norms are self-adjoint.
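A quick numerical check of the identity $\|U A V\| = \|A\| = \|\Sigma(A)\|$, using the Frobenius norm as the unitarily invariant norm and random orthogonal factors from QR decompositions (the sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# random orthogonal (real unitary) factors via QR decomposition
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A

# Frobenius norm is unitarily invariant and depends only on the singular values
assert np.isclose(np.linalg.norm(U @ A @ V, 'fro'), np.linalg.norm(A, 'fro'))
assert np.isclose(np.linalg.norm(A, 'fro'), np.linalg.norm(sigma, 2))
```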

Mixed subordinate norm is the subordinate operator norm on a space of matrices induced from (or subordinate to) mixed vector norms on the Euclidean spaces: $\|A\|_{p,q} = \max_{\|x\|_p = 1} \|A x\|_q$. The mixed subordinate norm need not be a matrix norm: sub-multiplicativity may not hold. If p = q, the mixed subordinate norm is the matrix p-norm. If p = 1 and q = ∞, it is the max norm, which is not a matrix norm: $\|A\|_{1,\infty} = \max_{i \in m, j \in n} |a_{ij}|$.
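The failure of sub-multiplicativity for the max norm can be seen on a tiny example; a sketch (the matrices are arbitrary choices for illustration):

```python
import numpy as np

A = np.array([[1.0, -5.0],
              [2.0,  3.0]])

# ||A||_{1,inf}: maximize ||A x||_inf over ||x||_1 = 1; the maximum is
# attained at a standard basis vector, giving the largest entry in modulus
max_norm = np.abs(A).max()  # 5.0

# sub-multiplicativity fails: with B the all-ones matrix,
# ||B B||_{1,inf} = 2 > ||B||_{1,inf} * ||B||_{1,inf} = 1
B = np.ones((2, 2))
assert np.abs(B @ B).max() > np.abs(B).max() ** 2
```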

$L_{p,q}$ norm, $p,q \in [1, \infty]$, is a function defined as: $\|A\|_{p,q} = \|(\|a_j\|_p)_{j=1}^n\|_q = (\sum_{j=1}^n (\sum_{i=1}^m |a_{ij}|^p)^{q/p})^{1/q}$. The $L_{p,q}$ norm need not be a matrix norm. The $L_{p,p}$ norm of a matrix equals the p-norm of its vectorization: $\|A\|_{p,p} = \|\text{vec}(A)\|_p$. The $L_{1,1}$ norm of a matrix is the 1-norm of its vectorization, $\|A\|_{1,1} = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|$, which is a matrix norm. The $L_{2,2}$ norm of a matrix is the Frobenius norm, aka the Euclidean norm: $\|A\|_{2,2} = \|A\|_F$. The $L_{\infty,\infty}$ norm is not a matrix norm. The $L_{2,1}$ norm of a matrix is the sum of the 2-norms of its columns: $\|A\|_{2,1} = \sum_{j=1}^n (\sum_{i=1}^m |a_{ij}|^2)^{1/2}$. The $L_{2,1}$ norm is useful if the matrix represents a data set, where each column vector is an observation.
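The column-then-aggregate structure of the $L_{p,q}$ norms is easy to compute directly; a sketch with NumPy (the matrix is an arbitrary choice for illustration):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 2.0]])

# L_{p,q}: take the p-norm down each column, then the q-norm across columns
col_2norms = np.linalg.norm(A, axis=0)  # [5.0, 2.0]
l21 = col_2norms.sum()                  # L_{2,1}: sum of column 2-norms = 7.0
l22 = np.linalg.norm(col_2norms, 2)     # L_{2,2}: equals the Frobenius norm

assert np.isclose(l22, np.linalg.norm(A, 'fro'))
```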

Matrix norms

Matrix norm $\|\cdot\|: M_{m,n} \mapsto \mathbb{R}_{\ge 0}$ is any vector norm on a vector space of matrices that is sub-multiplicative: $\forall A \in M_{m,n}$, $\forall B \in M_{n,l}$, $\|A B\| \le \|A\| \|B\|$. The most commonly used matrix norms are the matrix 2-norm, the Frobenius norm (F), and the nuclear norm (N).

Matrix p-norms

Matrix p-norm (or subordinate matrix norm) is the subordinate operator norm on a space of matrices induced from (or subordinate to) the p-norm on the input and output Euclidean spaces: $\|A\|_p = \max_{\|x\|_p = 1} \|A x\|_p$. Matrix 2-norm or spectral norm of a matrix is its largest singular value: $\|A\|_2 = \sigma_1(A)$. The spectral norm is unitarily invariant. The spectral norm satisfies: $\|A^H A\|_2 = \|A\|_2^2$. Matrix 1-norm or maximum column sum norm is the largest column sum: $\|A\|_1 = \max_{j \in n} \sum_{i \in m} |a_{ij}|$. Matrix ∞-norm or maximum row sum norm is the largest row sum: $\|A\|_\infty = \max_{i \in m} \sum_{j \in n} |a_{ij}|$.
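The three common matrix p-norms, and the identity $\|A^H A\|_2 = \|A\|_2^2$, can be verified numerically; a sketch (the matrix is an arbitrary real example, so $A^H = A^T$):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# matrix 1-norm: maximum absolute column sum
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # 6
# matrix inf-norm: maximum absolute row sum
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # 7
# matrix 2-norm (spectral norm): largest singular value
assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
# ||A^H A||_2 = ||A||_2^2
assert np.isclose(np.linalg.norm(A.T @ A, 2), np.linalg.norm(A, 2) ** 2)
```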

Schatten p-norms

Schatten p-norm of a matrix is the p-norm of its singular values: $\|A\|_p = |\sigma(A)|_p$, $p \in [1, \infty]$, usually written with three vertical lines to distinguish from matrix p-norms. Schatten p-norms are special cases of symmetric gauge functions (see e.g. [@Horn1990, Sec 3.4-3.5]); they are unitarily invariant since they only depend on the singular values. Schatten p-norms are decreasing in p: $\forall 1 \le p < q$, $\|A\|_p \ge \|A\|_q$; in particular, the nuclear norm (p=1) ≥ the Frobenius norm (p=2) ≥ the spectral norm (p=∞). Schatten 1-norm, nuclear norm, or "trace norm", is the sum of the singular values: $\|A\|_* = \sum_{i=1}^r \sigma_i(A)$. Schatten 2-norm is the 2-norm of the singular values, which is equivalent to the Frobenius norm, the Euclidean norm / 2-norm of its vectorization: $\|A\|_F = (\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2)^{1/2} = (\text{tr}(A A^H))^{1/2}$. One can easily show the equivalence using $\text{tr}(A A^H) = \sum_{i=1}^r \lambda_i(A A^H)$ and $\lambda(A A^H) = \sigma^2(A)$. Schatten ∞-norm is equivalent to the matrix 2-norm, aka the spectral norm, which is the largest singular value.
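The monotonicity of the Schatten norms and their agreement with the Frobenius and spectral norms can be checked directly from the singular values; a sketch (the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A

nuclear = sigma.sum()                 # Schatten 1-norm
frobenius = np.linalg.norm(sigma, 2)  # Schatten 2-norm
spectral = sigma.max()                # Schatten inf-norm

# Schatten p-norms are decreasing in p
assert nuclear >= frobenius >= spectral
# Schatten 2 = Frobenius, Schatten inf = spectral (matrix 2-norm)
assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))
assert np.isclose(spectral, np.linalg.norm(A, 2))
```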

Ky Fan k-norms

Ky Fan k-norm is the sum of the top-$k$ singular values: $\|A\|_k = \sum_{i=1}^k \sigma_i(A)$, $k = 1, \cdots, \min(m, n)$. Ky Fan k-norms are derived from a symmetric gauge function (see [@Horn1990, 7.4.43]); they are unitarily invariant since they only depend on the singular values. Ky Fan k-norms are increasing in k, which is apparent from their definition: in particular, spectral norm (k = 1) ≤ Ky Fan k-norm (1 < k < min(m, n)) ≤ nuclear norm (k = min(m, n)).
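The whole family of Ky Fan norms is a cumulative sum of the sorted singular values, with the spectral and nuclear norms at the endpoints; a sketch (the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
sigma = np.linalg.svd(A, compute_uv=False)  # sorted in decreasing order

ky_fan = np.cumsum(sigma)  # ky_fan[k-1] is the Ky Fan k-norm

# increasing in k; endpoints are the spectral and nuclear norms
assert np.all(np.diff(ky_fan) >= 0)
assert np.isclose(ky_fan[0], np.linalg.norm(A, 2))
assert np.isclose(ky_fan[-1], sigma.sum())
```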


Relative distance to singularity from a nonsingular matrix w.r.t. a vector norm is its distance to the set of order-n singular matrices, normalized by its norm: $\text{dist}(A) := d(A, M_n \setminus \text{GL}_n) / \|A\|$; that is, $\text{dist}(A) = \min\{\|B\| / \|A\| : \text{rank}(A + B) < n\}$.
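For the spectral norm, the distance from a nonsingular $A$ to the nearest singular matrix is $\sigma_n(A)$ (a consequence of the Eckart-Young theorem), so the relative distance to singularity is $\sigma_n / \sigma_1 = 1 / \kappa_2(A)$, the reciprocal of the 2-norm condition number. A numerical sketch on a diagonal example:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
sigma = np.linalg.svd(A, compute_uv=False)

# relative distance to singularity w.r.t. the spectral norm:
# sigma_min / sigma_max = 1 / cond_2(A)
rel_dist = sigma[-1] / sigma[0]
assert np.isclose(rel_dist, 1.0 / np.linalg.cond(A, 2))

# an explicit rank-reducing perturbation of spectral norm sigma_min
B = np.array([[0.0, 0.0],
              [0.0, -0.5]])
assert np.isclose(np.linalg.norm(B, 2), sigma[-1])
assert np.linalg.matrix_rank(A + B) < 2
```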

Spectral radius and square matrix norms

Spectral radius of a square matrix is the largest modulus of its eigenvalues [Def 5.6.8]: $\rho(A) = \max(|\lambda_i|)_{i=1}^n$. The spectral radius is not a vector norm (nor a matrix norm) on square matrices, because: (1) $\rho(A) = 0$ for any nilpotent matrix, which need not be zero; (2) $\rho(A + B) > \rho(A) + \rho(B)$ is true for $A = B^T = J_2(0)$; (3) $\rho(A B) > \rho(A) \rho(B)$ is true for the same example.
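The Jordan-block counterexample is easy to verify numerically; a sketch:

```python
import numpy as np

def rho(M):
    """Spectral radius: largest modulus of the eigenvalues."""
    return np.abs(np.linalg.eigvals(M)).max()

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])  # Jordan block J_2(0): nilpotent but nonzero
B = A.T

assert np.isclose(rho(A), 0) and np.isclose(rho(B), 0)  # positivity fails
assert rho(A + B) > rho(A) + rho(B)  # triangle inequality fails: rho(A+B) = 1
assert rho(A @ B) > rho(A) * rho(B)  # sub-multiplicativity fails: rho(AB) = 1
```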

The spectral radius is the infimum for all matrix norms of given matrix: $\forall A \in M_n$, (Thm 5.6.9) $\forall \|\cdot\|$, $\rho(A) \le \|A\|$, (Lem 5.6.10) $\forall \epsilon > 0$, $\exists \|\cdot\|$, $\|A\| \le \rho(A) + \epsilon$; or equivalently, $\inf_{\|\cdot\|} \|A\| = \rho(A)$.

A square matrix is convergent if its powers tend to zero: $\lim_{k \to \infty} A^k = 0$. (Thm 5.6.12) A square matrix is convergent if and only if its spectral radius is less than one, $\rho(A) < 1$. (Cor 5.6.14) For any matrix norm, the norm of a large power of a matrix is dominated by the same power of the spectral radius: $\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$.
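Both facts can be illustrated on a non-normal example where $\|A\|_2 > 1$ but $\rho(A) < 1$; a sketch using the spectral norm (any matrix norm works in the limit):

```python
import numpy as np

A = np.array([[0.5, 1.0],
              [0.0, 0.5]])  # rho(A) = 0.5 < 1, so A is convergent

rho = np.abs(np.linalg.eigvals(A)).max()

# powers tend to zero, even though ||A||_2 > 1
assert np.linalg.norm(A, 2) > 1
assert np.linalg.norm(np.linalg.matrix_power(A, 200), 2) < 1e-12

# ||A^k||^(1/k) approaches rho(A) for large k
k = 200
approx = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
assert abs(approx - rho) < 0.05
```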

(Cor 5.6.35) Among unitarily invariant matrix norms on square matrices, the spectral norm is (1) the global lower bound: $\|A\|_2 \le \|A\|$, $\forall A \in M_n$; and (2) the only induced norm. Among matrix norms on square matrices, the spectral norm is the only one that is both induced and self-adjoint [Thm 5.6.36].


Refs: [@Horn1990]

🏷 Category=Algebra Category=Matrix Analysis