Introduction to numerical analysis

Error

Functions and Calculus

Interpolation & numerical differentiation

Interpolation

Fitting

Numerical integration

Solving Equations

Numerical linear algebra

Notes: When solving with Gaussian elimination, compare the main diagonal entries after elimination with those before; the smaller the ratio, the more accuracy the solution loses (roughly that many digits?). (Using double-precision floating-point numbers) when the condition number of the coefficient matrix is on the order of 1e10, the solution is accurate to only about 1e-3 or 1e-4. — 陈璞
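
A minimal numerical illustration of the note above, assuming NumPy; the test matrix (constructed to have condition number about 1e10) and right-hand side are made up for the example:

```python
import numpy as np

# Build a symmetric positive-definite matrix with cond(A) ~ 1e10
# by prescribing its eigenvalues on a random orthogonal basis.
rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthogonal matrix
eigvals = np.logspace(0, -10, n)                     # eigenvalues from 1 down to 1e-10
A = (Q * eigvals) @ Q.T

x_true = rng.standard_normal(n)
b = A @ x_true

x = np.linalg.solve(A, b)   # LU factorization (Gaussian elimination) in double precision
rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"cond(A) = {np.linalg.cond(A):.1e}, relative error = {rel_err:.1e}")
# The relative error grows roughly in proportion to cond(A) times
# machine epsilon (~1e-16), so digits of accuracy are lost as cond(A) grows.
```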

Nonlinear equations

Newton method
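
A minimal Newton-iteration sketch for a scalar equation f(x) = 0; the function, derivative, and starting point below are illustrative:

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / df(x)
    return x

# Example: the positive root of x^2 - 2 = 0, i.e. sqrt(2).
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))  # ~1.41421356...
```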

Ordinary differential equations

Optimization

Line search methods select a descent direction and then a step size at each iteration. Trust region methods approximate the objective in a region around the current iterate with a (usually quadratic) model, and then select a step by minimizing the model within the region; the region is enlarged if the objective decrease is satisfactory, and shrunk otherwise.
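
A minimal line-search sketch: steepest descent with a backtracking (Armijo) step size; the quadratic objective and the parameters are assumptions for the example.

```python
import numpy as np

def steepest_descent_backtracking(f, grad, x0, alpha0=1.0, rho=0.5, c=1e-4,
                                  tol=1e-8, max_iter=500):
    """At each iteration: direction = negative gradient; step size chosen by
    backtracking until the Armijo sufficient-decrease condition holds."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g
        a = alpha0
        while f(x + a * d) > f(x) + c * a * (g @ d):
            a *= rho                 # shrink the step until sufficient decrease
        x = x + a * d
    return x

# Illustrative objective: f(x) = 0.5 x'Ax - b'x with A symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(steepest_descent_backtracking(f, grad, np.zeros(2)))  # ~ [0.6, -0.8] = solve(A, b)
```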

Unconstrained optimization:

  • gradient/steepest descent [@Cauchy1847];
  • gradient methods with momentum: conjugate gradient (CG), heavy ball method [@Polyak1964]; accelerated gradient methods [@Nesterov1983];
  • coordinate descent (CD);

Unconstrained optimization of finite sums / expectations:

  • stochastic gradient descent (SGD) [@Robbins1951] (a sketch of the SGD and Adam updates follows this list);
  • noise reduction methods:
    • iterate averaging [@Polyak1992];
    • gradient aggregation: stochastic average gradient (SAG) [@LeRoux2012]; stochastic variance-reduced gradient (SVRG) [@Johnson2013]; SAGA (stochastic average gradient augmented) [@Defazio2014];
    • dynamic sample size [@Byrd2012; @Hashemi2014];
  • second-order methods:
    • natural gradient method, Hessian-free inexact Newton methods, Gauss-Newton methods, quasi-Newton methods;
    • diagonal scaling: adaptive gradient (AdaGrad) [@Duchi2011], root mean square propagation (RMSprop) [@Tieleman2012], adaptive moment estimation (Adam) [@Kingma2015];
  • randomized CD [@Nesterov2012], parallel stochastic CD [@LiuJ2013], stochastic dual coordinate ascent (SDCA) [@Shalev-Shwartz2013];
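
A minimal sketch contrasting the plain SGD update with the Adam update on an illustrative least-squares finite sum; the data, step sizes, and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)   # finite sum of squared residuals

def grad_i(w, i):
    """Gradient of the i-th term 0.5 * (x_i . w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

def sgd(steps=10000, lr=0.01):
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * grad_i(w, rng.integers(n))   # step along one sampled gradient
    return w

def adam(steps=10000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    w = np.zeros(d)
    m = np.zeros(d)   # first-moment (mean) estimate of the gradient
    v = np.zeros(d)   # second-moment estimate, used for diagonal scaling
    for t in range(1, steps + 1):
        g = grad_i(w, rng.integers(n))
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)   # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

print(np.linalg.norm(sgd() - w_true), np.linalg.norm(adam() - w_true))
```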

Constrained optimization:

  • Simplex algorithm [@Dantzig1947];
  • Interior-point methods [@Karmarkar1984];
  • Alternating direction method of multipliers (ADMM) (see [@Boyd2011]);

Conditional gradient, aka Frank-Wolfe [@Frank1956], for convex optimization: minimize the linearized objective over the constraint set, then move toward the minimizer with a diminishing step size.
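
A minimal Frank-Wolfe sketch on the probability simplex, where the linearized subproblem is solved by picking a single vertex; the objective is illustrative.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, steps=500):
    """Frank-Wolfe on the simplex {x >= 0, sum(x) = 1}: minimize the linearized
    objective over the simplex (its minimizer is a vertex), then move toward
    that vertex with the diminishing step size 2 / (k + 2)."""
    x = x0.copy()
    for k in range(steps):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0        # vertex minimizing the linear model
        gamma = 2.0 / (k + 2.0)      # diminishing step size
        x = (1 - gamma) * x + gamma * s
    return x

# Illustrative objective: f(x) = 0.5 * ||x - p||^2, i.e. project p onto the simplex.
p = np.array([0.2, 1.5, -0.3, 0.6])
grad = lambda x: x - p
print(frank_wolfe_simplex(grad, np.full(4, 0.25)))
```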

Projected gradient descent (PGD) [@Goldstein1964; @Levitin1966], for smooth nonconvex optimization over a convex set, e.g. bound constraints.
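
A minimal projected gradient descent sketch for bound (box) constraints, where the projection is a clip; the objective, bounds, and step size are assumptions for the example.

```python
import numpy as np

def projected_gradient_descent(grad, x0, lower, upper, lr=0.1, steps=500):
    """Gradient step followed by projection onto the box [lower, upper] (a clip)."""
    x = np.clip(x0, lower, upper)
    for _ in range(steps):
        x = np.clip(x - lr * grad(x), lower, upper)
    return x

# Illustrative problem: minimize ||x - c||^2 subject to 0 <= x <= 1.
c = np.array([1.8, -0.4, 0.5])
grad = lambda x: 2.0 * (x - c)
print(projected_gradient_descent(grad, np.zeros(3), 0.0, 1.0))  # -> [1.0, 0.0, 0.5]
```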

Projected conjugate gradient method, for reduced systems (e.g. the reduced KKT systems arising from equality-constrained quadratic programs).

Others:

  • sampling-based (zeroth-order): genetic algorithms, simulated annealing;
  • Bayesian optimization;

References

  • Golub and Van Loan, 1983, 2013. Matrix Computations.
  • Dennis and Schnabel, 1983, 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations.
  • Nocedal and Wright, 1999, 2006. Numerical Optimization.

🏷 Category=Computation