Vector **norm** $\|\cdot\|: V \to \mathbb{R}_{\ge 0}$
is any non-negative function on a vector space
that is positive for non-zero vectors, absolutely homogeneous, and satisfies the triangle inequality:
(1) $\forall x \ne 0$, $\|x\| > 0$;
(2) $\|a x\| = |a| \|x\|$;
(3) $\|x + y\| \le \|x\| + \|y\|$.

Vector **p-norm** (or Hölder p-norm) is the vector norm on the Euclidean n-space, defined by:
$\|x\|_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$, $p \in [1, \infty)$;
for $p = \infty$, $\|x\|_\infty = \max_{i \in n} |x_i|$.
Vector **2-norm** or **Euclidean norm** is the norm induced from the Euclidean inner product:
$\|x\|_2 = (\sum_{i=1}^n |x_i|^2)^{1/2}$.
Euclidean norm is the default vector norm,
and is often simply denoted as $\|\cdot\|$ or $|\cdot|$.
A vector norm is **unitarily invariant** if it does not change under unitary transforms:
$\forall U \in U(n)$, $\|U x\| = \|x\|$.
The Euclidean norm is unitarily invariant.
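These definitions are easy to check numerically; a minimal sketch using NumPy (the test vector, seed, and unitary factor below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# Holder p-norms by the definition vs. np.linalg.norm
for p in (1, 2, 3):
    assert np.isclose((np.abs(x) ** p).sum() ** (1 / p), np.linalg.norm(x, p))
# p = infinity: the largest absolute entry
assert np.isclose(np.abs(x).max(), np.linalg.norm(x, np.inf))

# Unitary invariance of the Euclidean norm: Q from a QR factorization is unitary
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```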

Because the space $M_{m,n}(\mathbb{F})$ of m-by-n matrices is a vector space,
vector norms can be defined on it.
**Unitarily invariant** norm on a matrix space is a vector norm
that does not change under left and right unitary transforms:
$\forall U \in U(m), V \in U(n)$, $\|U A V\| = \|A\|$.
For any unitarily invariant norm on a matrix space,
$\|A\| = \|A^H\| = \|\Sigma(A)\|$,
where $\Sigma(A)$ is the matrix with the singular values of A on the diagonal.
A vector norm on a matrix space is **self-adjoint**
if it is invariant under Hermitian adjoint: $\|A\| = \|A^H\|$.
All unitarily invariant norms are self-adjoint.
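A numeric sketch of these invariances for the Frobenius norm, a unitarily invariant norm (the random matrix and factors are arbitrary; real orthogonal matrices serve as the unitary transforms):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values of A

fro = np.linalg.norm(A, 'fro')
assert np.isclose(fro, np.linalg.norm(A.conj().T, 'fro'))  # self-adjoint
assert np.isclose(fro, np.linalg.norm(s))                  # depends only on Sigma(A)

# Invariance under left/right unitary transforms (real orthogonal is unitary)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.isclose(np.linalg.norm(U @ A @ V, 'fro'), fro)
```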

**Mixed subordinate norm** is the subordinate operator norm on a space of matrices
induced from (or subordinate to) mixed vector norms on the Euclidean spaces:
$\|A\|_{p,q} = \max_{\|x\|_p = 1} \|A x\|_q$.
The mixed subordinate norm need not be a matrix norm: sub-multiplicativity may not hold.
If p = q, the mixed subordinate norm is the matrix p-norm.
If p = 1 and q = ∞, it is the **max norm**, which is not a matrix norm:
$\|A\|_{1,\infty} = \max_{i \in m, j \in n} |a_{ij}|$.
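The max-norm formula can be verified by brute force: $\|Ax\|_\infty$ is convex in $x$, so its maximum over the unit 1-ball is attained at an extreme point, i.e. a signed standard basis vector. A sketch (the matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))

# A e_j is column j, so maximizing over the basis vectors suffices
over_basis = max(np.linalg.norm(A[:, j], np.inf) for j in range(A.shape[1]))
assert np.isclose(over_basis, np.abs(A).max())

# Sub-multiplicativity fails: for the all-ones 2x2 matrix J, ||JJ||_max = 2 > 1
J = np.ones((2, 2))
assert np.abs(J @ J).max() > np.abs(J).max() * np.abs(J).max()
```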

**L_{p,q} norm**, $p,q \in [1, \infty]$, is a function defined as:
$\|A\|_{p,q} = \|(\|a_j\|_p)_{j=1}^n\|_q =
(\sum_{j=1}^n (\sum_{i=1}^m |a_{ij}|^p)^{q/p})^{1/q}$.
The $L_{p,q}$ norm need *not* be a matrix norm.
$L_{p,p}$ norm of a matrix equals the p-norm of its vectorization:
$\|A\|_{p,p} = \|\text{vec}(A)\|_p$.
**L_{1,1}-norm** of a matrix is the 1-norm of its vectorization:
$\|A\|_{1,1} = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|$, which is a matrix norm.
**L_{2,2}-norm** of a matrix is the Frobenius norm, aka the Euclidean norm.
$\|A\|_{2,2} = \|A\|_F$.
**L_{∞,∞}-norm** equals the max norm $\max_{i \in m, j \in n} |a_{ij}|$, and is not a matrix norm.
**L_{2,1} norm** of a matrix is the sum of the 2-norms of its columns:
$\|A\|_{2,1} = \sum_{j=1}^n (\sum_{i=1}^m |a_{ij}|^2)^{1/2}$.
The $L_{2,1}$ norm is useful if the matrix represents a data set,
where each column vector is an observation.
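These entrywise norms are straightforward to compute from the definition; a sketch with a hypothetical helper `lpq_norm` (columns as the inner vectors, matching the definition above):

```python
import numpy as np

def lpq_norm(A, p, q):
    """q-norm of the vector of column p-norms (hypothetical helper)."""
    return np.linalg.norm(np.linalg.norm(A, ord=p, axis=0), ord=q)

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))

assert np.isclose(lpq_norm(A, 2, 2), np.linalg.norm(A.ravel(), 2))     # Frobenius
assert np.isclose(lpq_norm(A, 1, 1), np.abs(A).sum())                  # L_{1,1}
assert np.isclose(lpq_norm(A, 2, 1), np.linalg.norm(A, axis=0).sum())  # L_{2,1}
```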

**Matrix norm** $\|\cdot\|: M_{m,n} \mapsto \mathbb{R}_{\ge 0}$
is any vector norm on a vector space of matrices that is sub-multiplicative:
$\forall A \in M_{m,n}$, $\forall B \in M_{n,l}$, $\|A B\| \le \|A\| \|B\|$.
The most commonly used matrix norms are the matrix 2-norm,
the Frobenius norm (F), and the nuclear norm (N).

**Matrix p-norm** (or subordinate matrix norm)
is the subordinate operator norm on a space of matrices
induced from (or subordinate to) the p-norm on the input and output Euclidean spaces:
$\|A\|_p = \max_{\|x\|_p = 1} \|A x\|_p$.
**Matrix 2-norm** or **spectral norm** of a matrix is its largest singular value:
$\|A\|_2 = \sigma_1(A)$.
The spectral norm is unitarily invariant.
The spectral norm satisfies: $\|A^H A\|_2 = \|A\|_2^2$.
**Matrix 1-norm** or **maximum column sum norm** is the largest column sum:
$\|A\|_1 = \max_{j \in n} \sum_{i \in m} |a_{ij}|$.
**Matrix ∞-norm** or **maximum row sum norm** is the largest row sum:
$\|A\|_\infty = \max_{i \in m} \sum_{j \in n} |a_{ij}|$.
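NumPy's `np.linalg.norm` implements the induced 1-, 2-, and ∞-norms for matrix arguments; checking them against the characterizations above (the matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))

assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # max column sum
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # max row sum
assert np.isclose(np.linalg.norm(A, 2),
                  np.linalg.svd(A, compute_uv=False)[0])                   # sigma_1

# ||A^H A||_2 = ||A||_2^2 (A is real here, so A^H = A^T)
assert np.isclose(np.linalg.norm(A.T @ A, 2), np.linalg.norm(A, 2) ** 2)
```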

**Schatten p-norm** of a matrix is the p-norm of its singular values:
$\|A\|_p = \|\sigma(A)\|_p$, $p \in [1, \infty]$,
usually written with three vertical lines to distinguish from matrix p-norms.
Schatten p-norms are special cases of symmetric gauge functions (see e.g. [@Horn1990, Sec 3.4-3.5]);
they are unitarily invariant since they only depend on the singular values.
Schatten p-norms are decreasing in p:
$\forall 1 \le p < q$, $\|A\|_p \ge \|A\|_q$;
in particular, the nuclear norm (p=1) ≥ the Frobenius norm (p=2) ≥ spectral norm (p=∞).
Schatten 1-norm, **nuclear norm**, or "trace norm", is the sum of the singular values:
$\|A\|_* = \sum_{i=1}^r \sigma_i(A)$.
Schatten 2-norm is the 2-norm of the singular values,
which is equivalent to the **Frobenius norm**, the Euclidean norm / 2-norm of its vectorization:
$\|A\|_F = (\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2)^{1/2} = (\text{tr}(A A^H))^{1/2}$.
One can easily show the equivalence using $\text{tr}(A A^H) = \sum_{i=1}^r \lambda_i(A A^H)$
and $\lambda(A A^H) = \sigma^2(A)$.
Schatten ∞-norm equals the matrix 2-norm, aka the spectral norm,
which is the largest singular value.
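A numeric sketch of the special cases and the monotonicity in p, with a hypothetical helper `schatten` (the matrix is an arbitrary choice):

```python
import numpy as np

def schatten(A, p):
    """p-norm of the singular values of A (hypothetical helper)."""
    return np.linalg.norm(np.linalg.svd(A, compute_uv=False), ord=p)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

assert np.isclose(schatten(A, 1), np.linalg.svd(A, compute_uv=False).sum())  # nuclear
assert np.isclose(schatten(A, 2), np.linalg.norm(A, 'fro'))                  # Frobenius
assert np.isclose(schatten(A, np.inf), np.linalg.norm(A, 2))                 # spectral
assert schatten(A, 1) >= schatten(A, 2) >= schatten(A, np.inf)               # decreasing in p
```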

**Ky Fan k-norm** is the sum of the top-$k$ singular values:
$\|A\|_k = \sum_{i=1}^k \sigma_i(A)$, $k = 1, \cdots, \min(m, n)$.
Ky Fan k-norms are derived from a symmetric gauge function (see [@Horn1990, 7.4.43]);
they are unitarily invariant since they only depend on the singular values.
Ky Fan k-norms are increasing in k, which is apparent from their definition; in particular,
spectral norm (k = 1) ≤ Ky Fan k-norm (1 < k < min(m, n)) ≤ nuclear norm (k = min(m, n)).
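The whole family is a cumulative sum of the sorted singular values; a numeric sketch checking the two endpoint cases and the monotonicity (the matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # sorted in decreasing order

ky_fan = np.cumsum(s)                    # ky_fan[k-1] = sum of the top-k values
assert np.isclose(ky_fan[0], np.linalg.norm(A, 2))       # k = 1: spectral norm
assert np.isclose(ky_fan[-1], np.linalg.norm(A, 'nuc'))  # k = min(m, n): nuclear
assert np.all(np.diff(ky_fan) >= 0)                      # increasing in k
```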

**Relative distance to singularity** from a nonsingular matrix w.r.t. a vector norm
is its distance to the set of order-n singular matrices, normalized by its norm:
$\text{dist}(A) := d(A, M_n \setminus \text{GL}_n) / \|A\|$; that is,
$\text{dist}(A) = \min\{\|B\| / \|A\| : \text{rank}(A + B) < n\}$.
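For the spectral norm this distance has a closed form: by the Eckart–Young theorem, $d(A, M_n \setminus \text{GL}_n) = \sigma_n(A)$, so $\text{dist}(A) = \sigma_n / \sigma_1 = 1/\kappa_2(A)$, the reciprocal condition number. A numeric sketch (the matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))
U, s, Vh = np.linalg.svd(A)          # s sorted in decreasing order

# dist(A) w.r.t. the 2-norm is the reciprocal condition number
assert np.isclose(s[-1] / s[0], 1 / np.linalg.cond(A, 2))

# The minimizing perturbation B = -sigma_n u_n v_n^H makes A + B singular
B = -s[-1] * np.outer(U[:, -1], Vh[-1])
assert np.isclose(np.linalg.norm(B, 2), s[-1])
assert np.linalg.matrix_rank(A + B) < 4
```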

**Spectral radius** of a square matrix is the largest modulus of its eigenvalues [Def 5.6.8]:
$\rho(A) = \max(|\lambda_i|)_{i=1}^n$.
The spectral radius is not a vector norm (nor a matrix norm) on square matrices, because:
(1) $\rho(A) = 0$ for any nilpotent matrix, which need not be zero;
(2) $\rho(A + B) > \rho(A) + \rho(B)$ for $A = B^T = J_2(0)$;
(3) $\rho(A B) > \rho(A) \rho(B)$ for the same example.
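The Jordan-block counterexamples are easy to verify numerically: both matrices are nilpotent (spectral radius 0), yet their sum and product have spectral radius 1.

```python
import numpy as np

def rho(M):
    """Spectral radius: largest eigenvalue modulus."""
    return np.abs(np.linalg.eigvals(M)).max()

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # J_2(0), nilpotent
B = A.T

assert np.isclose(rho(A), 0) and np.isclose(rho(B), 0)
assert np.isclose(rho(A + B), 1.0)   # > rho(A) + rho(B) = 0
assert np.isclose(rho(A @ B), 1.0)   # > rho(A) * rho(B) = 0
```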

The spectral radius is the infimum of all matrix norms of a given matrix: $\forall A \in M_n$,
(Thm 5.6.9) $\forall \|\cdot\|$, $\rho(A) \le \|A\|$;
(Lem 5.6.10) $\forall \epsilon > 0$, $\exists \|\cdot\|$, $\|A\| \le \rho(A) + \epsilon$;
or equivalently, $\inf_{\|\cdot\|} \|A\| = \rho(A)$.

A square matrix is **convergent** if its powers tend to zero: $\lim_{k \to \infty} A^k = 0$.
(Thm 5.6.12) A square matrix is convergent if and only if its spectral radius is less than one,
$\rho(A) < 1$.
(Cor 5.6.14, Gelfand's formula) For any matrix norm, the norms of the powers of a matrix
grow or decay at the rate set by the spectral radius:
$\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$.
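A numeric sketch of these spectral-radius facts using NumPy and the Frobenius norm (the matrix, seed, power, and tolerance are arbitrary choices; the matrix is normalized so its spectral radius is exactly 1):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
rho = np.abs(np.linalg.eigvals(A)).max()

# Thm 5.6.9: the spectral radius lower-bounds every matrix norm
assert rho <= min(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf),
                  np.linalg.norm(A, 'fro'))

# Gelfand: ||B^k||^(1/k) -> rho(B); normalize so rho(B) = 1
B = A / rho
k = 200
approx = np.linalg.norm(np.linalg.matrix_power(B, k), 'fro') ** (1 / k)
assert abs(approx - 1.0) < 0.1

# Thm 5.6.12: scaling the spectral radius below one makes the powers vanish
C = A / (2 * rho)                        # rho(C) = 0.5 < 1
assert np.allclose(np.linalg.matrix_power(C, 60), 0)
```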

(Cor 5.6.35) Among unitarily invariant matrix norms on square matrices, the spectral norm is
(1) the global lower bound: $\forall A \in M_n$, $\|A\|_2 \le \|A\|$; and
(2) the only induced norm.
Among matrix norms on square matrices, the spectral norm is
the only one that is both induced and self-adjoint [Thm 5.6.36].

Notes:

Refs: [@Horn1990]