Numerical Analysis

Numerical error: truncation, round off, error propagation,

Numerical Linear Algebra

Numerical linear algebra (NLA).

Linear equations

Classical methods:

Direct methods for small and dense matrices:
- Gaussian elimination: LU decomposition with pivoting ($2/3 n^3$ flops), forward substitution for a lower triangular system ($n^2$ flops), and back substitution for an upper triangular system ($n^2$ flops);
- Cholesky or LDL decomposition with symmetric pivoting ($1/3 n^3$ flops), forward and back substitutions; (Triangular inverse takes $1/3 n^3$ flops; positive semi-definite inverse takes $n^3$ flops [@Hunger2007]; general inverse takes $5/3 n^3$ flops. )
Iteration methods for large and sparse matrices: Jacobi iteration, Gauss-Seidel iteration, successive over relaxation, Chebyshev;

Relative condition number for a function $f: X \mapsto Y$ between two normed spaces is its maximum relative sensitivity at a point [@Rice1966]: $\text{cond}(f): X \mapsto \mathbb{R}_{\ge 0}$, $\text{cond}(f, x) = \lim_{\epsilon \to 0+} \sup_{\|\Delta x\| \le \epsilon} \frac{\|f(x + \Delta x) - f(x)\| / \|f(x)\|}{\|\Delta x\| / \|x\|}$; it equals the induced norm of the Fréchet derivative (e.g. Jacobian matrix for vector-valued multivariate functions), normalized by input and output sizes: $\text{cond}(f, x) = \frac{\|Df(x)\|}{\|f(x)\| / \|x\|}$. A function is well-conditioned if the condition number is small and ill-conditioned if the condition number is large. For a range of functions including matrix inverse, matrix eigenvalues, and a root of a polynomial, the condition number is the reciprocal of the relative distance to the nearest singular problem. Absolute condition number is the maximum sensitivity of a function at a point, and is related to relative condition number by $\text{cond}_\text{abs}(f, x) = \text{cond}_\text{rel}(f, x) \cdot \|f(x)\| / \|x\|$.

The relative condition number for matrix inversion $\text{inv}: \text{GL}_n \mapsto \text{GL}_n$ w.r.t. a norm is denoted as $\kappa(A) := \text{cond}(\text{inv}, A)$. Properties: (1) scale-invariant: $\forall c \ne 0$, $\kappa(c A) = \kappa(A)$, which follows from the homogeneity of vector norms. (2) with a matrix norm, it is lower bounded by that of the identity matrix: $\kappa(A) \ge \kappa(I)$, which follows from the sub-multiplicativity of matrix norm. With the mixed subordinate norm, it equals the product of the {p,q}-norm of the matrix and the {q,p}-norm of its inverse: $\kappa_{p,q}(A) = \|A\|_{p,q} \cdot \|A^{-1}\|_{q,p}$. Properties: it is the reciprocal of the relative distance to the nearest singular matrix: $\kappa_{p,q}(A) = \text{dist}_{p,q}(A)^{-1}$. Spectral condition number of an invertible matrix is the condition number for matrix inversion w.r.t. the spectral norm: $\kappa_2(A) = \|A\|_2 \cdot \|A^{-1}\|_2$; it equals the ratio of the largest and the smallest singular value of the matrix: $\kappa_2(A) = \sigma_1(A) / \sigma_n(A)$.

Krylov [kri-lov] subspace methods: (symmetric Lanczos process + LDLt factorization + clever recursions)

symmetric positive-definite: conjugate gradient (CG);
symmetric: MINRES (minimum residual), SYMMLQ;
square: GMRES (general minimum residual), QMR (quasi-minimum residual), BiCG (biconjugate gradient), CGS (CG squared), BiCGStab (BiCG stablilized);
least squares: LSQR, LSMR (least-squares minimum residual);

Krylov subspace methods are successful only if they can be effectively preconditioned.

Notes: 用高斯消元法求解时，比较计算后和计算前的主对角元，比值越小、解的精度就少多少(?) (Using double precision floating point number) 系数矩阵条件数在百亿量级（1e10）时，解的精度大概只有1e-3或1e-4。——陈璞

Eigenvalues and eigenvectors

power method: acceleration, orthonormalization;
inverse iteration;
QR decomposition, Jacobi's method;

Efficient updates in NLA:

matrix inverses: inverse of a matrix under additive rank-k perturbation (Sherman–Morrison–Woodbury formula, aka matrix inversion lemma; see [@Hager1989] in SIAM Review); Moore-Penrose pseudo-inverse under rank-1 update [@Meyer1973]; inverse of a Gram matrix with an added column (see e.g. [@Petersen2012]);
Cholesky factor of a Gram matrix (see CHOLMOD);
singular value decomposition (SVD): with added row/column [@Bunch and Nielsen, 1978], cost about one iteration of a thin SVD; with additive rank-one update [@Brand2006], cost $O(\min(m, n) r)$;
determinant (matrix determinant lemma);
subspace: geodesic on the Grassmann manifold (see e.g. [@Balzano2010]);

Root Finding

Solving Nonlinear Equations:

Newton method, etc.

Merit function for solving nonlinear equations is a scalar-valued function that indicates whether a new iterate is better or worse than the current iterate, in the sense of making progress toward a root. Merit functions are often obtained by combining the components of the vector field in some way, e.g. the sum of squares; all of which have some drawbacks.

Line search and trust-region techniques play an equally important role in optimization, but one can argue that trust-region algorithms have certain theoretical advantages in solving nonlinear equations.

Optimization

Optimization:

unconstrained optimization (for machine learning): quasi-Newton methods, etc.;
Convex Optimization;
smooth optimization;

Function Approximation

Types of function approximation:

Interpolation: polynomial, spline, radial basis function; notes on interpolation & numerical differentiation;
Fitting: least squares, polynomial fitting, smoothing spline;

Least squares solution of over-determined systems of equations. [@Golub2013, Ch. 5]

QR decomposition (for an m-by-n matrix A):

modified Gram-Schmidt (mGS): thin QR, $2 m n^2$ flops, for full-rank matrices;
Householder QR: find n Householder matrices and R, $2 (m - n/3) n^2$ flops [@Golub2013, Sec 5.2.2], get the first n columns of Q, $2 (m - n/3) n^2$ flops; in total $4 (m - n/3) n^2$ flops, but is more accurate than mGS when A is rank-deficient [@Golub2013, Sec 5.2.9];
Householder QR with column pivoting: find Householder matrices and R, $4 m n r - 2 r^2 (m + n) + 4/3 r^3$ flops for a rank-r matrix; flops increases with r, with max equal to Householder QR [@Golub2013, Sec 5.4.2]; almost always reveals rank;

Householder matrix, Householder reflection, or Householder transformation is a matrix of the form: $P = I - \beta v v^T$, where Householder vector $v$ is non-zero and $\beta = 2 / |v|^2$. When applied to a vector, $P x$ is the reflection of $x$ in the hyperplane orthogonal to $v$.

QR decomposition with pivoting $A \Pi = Q [R; 0]$

Numerical rank of a matrix given a threshold $\tau$ is its smallest rank under perturbations of spectral norm no greater than the threshold: $k_\tau = \min_{|E|_2 \le \tau} \text{rank}(A+E)$. The numerical rank equals the number of singular values greater than the threshold.

[@Hogben2013, Sections 52.9] Rank revealing decomposition is a two-sided orthogonal decomposition of the form: $A = U [R; 0] V^T$, where $U \in O(m)$, $V \in O(n)$, $R$ is upper triangular, and small singular values of A are revealed by correspondingly small diagonal entries of $R$. Examples of rank revealing decompositions include RRQR, and UTV (URV or ULV) decompositions. Rank revealing QR (RRQR) decomposition is a pivoted QR decomposition such that for all leading principal submatrices $R_i$, the following interlacing inequalities hold: $c_i^{-1} \sigma_i(A) \le \sigma_i(R_i) \le \sigma_i(A) \le \sigma_1(R_i) \le c_i \sigma_i(A)$, where $c_i^2 = i (n - i) + \min(i, n-i)$. RRQR factorization is not unique. URV decomposition is a rank revealing decomposition $A = U [R; 0] V^T$ such that for all $k \in {1, \cdots, n}$ and for all $i \le k < j$, leading principal submatrices $R_i$ and trailing principal submatrices $R_{-k}$ (rows and columns from $k+1$ to $n$) satisfy the following sandwich inequalities: $\breve{c}_k \sigma_i(A) \le \sigma_i(R_i) \le \sigma_i(A)$ and $\sigma_j(A) \le \sigma_{j-k}(R_{-k}) \le \breve{c}_k^{-1} \sigma_j(A)$, where $\breve{c}_k^2 = 1 - \sigma_1^2(R_{-k}) / (\sigma_k^2(R_k) - \sigma_1^2(R_{-k}))$. ULV decomposition is similar to URV, but with a lower triangular matrix.

rank-revealing QR (RRQR), refinements of QR to determine the rank of a matrix.

References

Isaacson and Keller, 1966. Analysis of Numerical Methods.
Hamming, 1962, 1973. Numerical Methods for Scientists and Engineers.
Golub and Van Loan, 1983, 2013. Matrix Computations.
Trefethen and Bau, 1997. Numerical Linear Algebra.
Dennis and Schnabel, 1983, 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations.
Nocedal and Wright, 1999, 2006. Numerical Optimization.
Suli and Mayers, 2003. An Introduction to Numerical Analysis.
Hogben (ed.), 2006, 2013. Handbook of Linear Algebra.

🏷 Category=Computation

Numerical Analysis

Numerical Linear Algebra

Linear equations

Eigenvalues and eigenvectors

Root Finding

Optimization

Function Approximation

Numerical Calculus

Numerical integration

Numerical ordinary differential equations

Numerical partial differential equations

References