Introduction to numerical analysis

Numerical error: truncation, round off, error propagation,

## Numerical Linear Algebra

Numerical linear algebra (NLA).

### Linear equations

Classical methods:

• Direct methods for small and dense matrices:
• Gaussian elimination: LU decomposition with pivoting ($2/3 n^3$ flops), forward substitution for a lower triangular system ($n^2$ flops), and back substitution for an upper triangular system ($n^2$ flops);
• Cholesky or LDL decomposition with symmetric pivoting ($1/3 n^3$ flops), forward and back substitutions; (Triangular inverse takes $1/3 n^3$ flops; positive semi-definite inverse takes $n^3$ flops [@Hunger2007]; general inverse takes $5/3 n^3$ flops. )
• Iteration methods for large and sparse matrices: Jacobi iteration, Gauss-Seidel iteration, successive over relaxation, Chebyshev;

Relative condition number for a function $f: X \mapsto Y$ between two normed spaces is its maximum relative sensitivity at a point [John Rice, 1966]: $\text{cond}(f): X \mapsto \mathbb{R}_{\ge 0}$, $\text{cond}(f, x) = \lim_{\epsilon \to 0+} \sup_{\|\Delta x\| \le \epsilon} \frac{\|f(x + \Delta x)\| / \|f(x)\|}{\|\Delta x\| / \|x\|}$; it equals the induced norm of the Fréchet derivative (e.g. Jacobian matrix for vector-valued multivariate functions), normalized by input and output sizes: $\text{cond}(f, x) = \frac{\|Df(x)\|}{\|f(x)\| / \|x\|}$. A function is well-conditioned if the condition number is small and ill-conditioned if the condition number is large. For a range of functions including matrix inverse, matrix eigenvalues, and a root of a polynomial, the condition number is the reciprocal of the relative distance to the nearest singular problem. Absolute condition number is related to relative condition number by $\text{cond}_\text{abs}(f, x) = \text{cond}_\text{rel}(f, x) \cdot \|f(x)\| / \|x\|$.

The relative condition number for matrix inversion $\text{inv}: \text{GL}_n \mapsto \text{GL}_n$ w.r.t. a norm is denoted as $\kappa(A) := \text{cond}(\text{inv}, A)$. Properties: (1) scale-invariant: $\forall c \ne 0$, $\kappa(c A) = \kappa(A)$, which follows from the homogeneity of vector norms. (2) with a matrix norm, it is lower bounded by that of the identity matrix: $\kappa(A) \ge \kappa(I)$, which follows from the sub-multiplicativity of matrix norm. With the mixed subordinate norm, it equals the product of the {p,q}-norm of the matrix and the {q,p}-norm of its inverse: $\kappa_{p,q}(A) = \|A\|_{p,q} \cdot \|A^{-1}\|_{q,p}$. Properties: it is the reciprocal of the relative distance to the nearest singular matrix: $\kappa_{p,q}(A) = \text{dist}_{p,q}(A)^{-1}$. Spectral condition number of an invertible matrix is the condition number for matrix inversion w.r.t. the spectral norm: $\kappa_2(A) = \|A\|_2 \cdot \|A^{-1}\|_2$; it equals the ratio of the largest and the smallest singular value of the matrix: $\kappa_2(A) = \sigma_1(A) / \sigma_n(A)$.

Krylov [kri-lov] subspace methods: (symmetric Lanczos process + LDLt factorization + clever recursions)

• symmetric positive-definite: conjugate gradient (CG);
• symmetric: MINRES (minimum residual), SYMMLQ;
• square: GMRES (general minimum residual), QMR (quasi-minimum residual), BiCG (biconjugate gradient), CGS (CG squared), BiCGStab (BiCG stablilized);
• least squares: LSQR, LSMR (least-squares minimum residual);

Krylov subspace methods are successful only if they can be effectively preconditioned.

Notes: 用高斯消元法求解时，比较计算后和计算前的主对角元，比值越小、解的精度就少多少(?) (Using double precision floating point number) 系数矩阵条件数在百亿量级（1e10）时， 解的精度大概只有1e-3或1e-4。——陈璞

### Eigenvalues and eigenvectors

• power method: acceleration, orthonormalization;
• inverse iteration;
• QR decomposition, Jacobi's method;

Efficient updates in NLA:

• matrix inverses: inverse of a matrix under additive rank-k perturbation (Sherman–Morrison–Woodbury formula, aka matrix inversion lemma; see [@Hager1989] in SIAM Review); Moore-Penrose pseudo-inverse under rank-1 update [@Meyer1973]; inverse of a Gram matrix with an added column (see e.g. [@Petersen2012]);
• Cholesky factor of a Gram matrix (see CHOLMOD);
• singular value decomposition (SVD): with added row/column [@Bunch and Nielsen, 1978], cost about one iteration of a thin SVD; with additive rank-one update [@Brand2006], cost $O(\min(m, n) r)$;
• determinant (matrix determinant lemma);
• subspace: geodesic on the Grassmann manifold (see e.g. [@Balzano2010]);

## Root Finding

Merit function for solving nonlinear equations is a scalar-valued function that indicates whether a new iterate is better or worse than the current iterate, in the sense of making progress toward a root. Merit functions are often obtained by combining the components of the vector field in some way, e.g. the sum of squares; all of which have some drawbacks.

Line search and trust-region techniques play an equally important role in optimization, but one can argue that trust-region algorithms have certain theoretical advantages in solving nonlinear equations.

## Function Approximation

Types of function approximation:

Least squares solution of over-determined systems of equations. [@Golub2013, Ch. 5]

QR decomposition (for an m-by-n matrix A):

• modified Gram-Schmidt (mGS): thin QR, $2 m n^2$ flops, for full-rank matrices;
• Householder QR: find n Householder matrices and R, $2 (m - n/3) n^2$ flops [@Golub2013, Sec 5.2.2], get the first n columns of Q, $2 (m - n/3) n^2$ flops; in total $4 (m - n/3) n^2$ flops, but is more accurate than mGS when A is rank-deficient [@Golub2013, Sec 5.2.9];
• Householder QR with column pivoting: find Householder matrices and R, $4 m n r - 2 r^2 (m + n) + 4/3 r^3$ flops for a rank-r matrix; flops increases with r, with max equal to Householder QR [@Golub2013, Sec 5.4.2]; almost always reveals rank;

Householder matrix, Householder reflection, or Householder transformation is a matrix of the form: $P = I - \beta v v^T$, where Householder vector $v$ is non-zero and $\beta = 2 / |v|^2$. When applied to a vector, $P x$ is the reflection of $x$ in the hyperplane orthogonal to $v$.

QR decomposition with pivoting $A \Pi = Q [R; 0]$

Numerical rank of a matrix given a threshold $\tau$ is its smallest rank under perturbations of spectral norm no greater than the threshold: $k_\tau = \min_{|E|_2 \le \tau} \text{rank}(A+E)$. The numerical rank equals the number of singular values greater than the threshold.

[@Hogben2013, Sections 52.9] Rank revealing decomposition is a two-sided orthogonal decomposition of the form: $A = U [R; 0] V^T$, where $U \in O(m)$, $V \in O(n)$, $R$ is upper triangular, and small singular values of A are revealed by correspondingly small diagonal entries of $R$. Examples of rank revealing decompositions include RRQR, and UTV (URV or ULV) decompositions. Rank revealing QR (RRQR) decomposition is a pivoted QR decomposition such that for all leading principal submatrices $R_i$, the following interlacing inequalities hold: $c_i^{-1} \sigma_i(A) \le \sigma_i(R_i) \le \sigma_i(A) \le \sigma_1(R_i) \le c_i \sigma_i(A)$, where $c_i^2 = i (n - i) + \min(i, n-i)$. RRQR factorization is not unique. URV decomposition is a rank revealing decomposition $A = U [R; 0] V^T$ such that for all $k \in {1, \cdots, n}$ and for all $i \le k < j$, leading principal submatrices $R_i$ and trailing principal submatrices $R_{-k}$ (rows and columns from $k+1$ to $n$) satisfy the following sandwich inequalities: $\breve{c}_k \sigma_i(A) \le \sigma_i(R_i) \le \sigma_i(A)$ and $\sigma_j(A) \le \sigma_{j-k}(R_{-k}) \le \breve{c}_k^{-1} \sigma_j(A)$, where $\breve{c}_k^2 = 1 - \sigma_1^2(R_{-k}) / (\sigma_k^2(R_k) - \sigma_1^2(R_{-k}))$. ULV decomposition is similar to URV, but with a lower triangular matrix.

• rank-revealing QR (RRQR), refinements of QR to determine the rank of a matrix.