Introduction to numerical analysis
Error
Functions and Calculus
Interpolation
Solving Equations
Numerical linear algebra
Notes:
When solving by Gaussian elimination, compare the main-diagonal entries after elimination with those before;
the smaller the ratio, the more accuracy the solution loses (?)
(Using double-precision floating-point numbers) when the condition number of the coefficient matrix
is on the order of 1e10, the solution is accurate to only about 1e-3 or 1e-4. (per 陈璞)
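A quick way to see this loss of accuracy is to solve a deliberately ill-conditioned system in double precision. The sketch below uses a Hilbert matrix as the test case; the matrix and its size are illustrative choices, not from the note above.

```python
import numpy as np

# Hilbert matrix: a standard example of severe ill-conditioning.
n = 10
i = np.arange(1, n + 1)
A = 1.0 / (i[:, None] + i[None, :] - 1.0)

x_true = np.ones(n)            # known solution
b = A @ x_true
x = np.linalg.solve(A, b)      # LU-based solve in double precision

print(f"cond(A)   ~ {np.linalg.cond(A):.1e}")
print(f"max error ~ {np.max(np.abs(x - x_true)):.1e}")
# Rule of thumb: double precision carries ~16 significant digits,
# and roughly log10(cond(A)) of them are lost in the computed solution.
```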
Newton's method
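A minimal sketch of Newton's method for a scalar root-finding problem f(x) = 0; the tolerance, iteration cap, and example function are illustrative choices.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeatedly linearize f at the current iterate and solve."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / fprime(x)    # Newton step
    return x

# Example: sqrt(2) as the positive root of x**2 - 2 = 0.
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
```

Convergence is quadratic near a simple root, but only local: a poor starting point can diverge.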
Optimization
Line search methods select a descent direction and then a step size at each iteration.
Trust region methods approximate the objective with a (usually quadratic) model
in a region around the current iterate, and then select a step by approximately minimizing the model over that region;
the region is enlarged if the actual objective decrease is satisfactory, and shrunk otherwise.
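As a concrete instance of the line-search template, below is a minimal sketch of steepest descent with Armijo backtracking; the step-size constants and the quadratic test problem are illustrative assumptions.

```python
import numpy as np

def gradient_descent_backtracking(f, grad, x0, alpha0=1.0, beta=0.5, c=1e-4,
                                  tol=1e-8, max_iter=500):
    """Steepest descent with an Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                   # steepest-descent direction
        t = alpha0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x + t * d) > f(x) + c * t * (g @ d):
            t *= beta
        x = x + t * d
    return x

# Example: minimize f(x) = 0.5 * ||x||^2.
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x_min = gradient_descent_backtracking(f, grad, np.array([3.0, -4.0]))
```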
Unconstrained optimization:
- gradient/steepest descent [@Cauchy1847];
- gradient methods with momentum (see the heavy-ball sketch after this list):
conjugate gradient (CG), heavy ball method [@Polyak1964];
accelerated gradient methods [@Nesterov1983];
- coordinate descent (CD);
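A minimal sketch of the heavy-ball momentum update referenced above; the learning rate, momentum coefficient, and test quadratic are illustrative assumptions.

```python
import numpy as np

def heavy_ball(grad, x0, lr=0.1, momentum=0.9, max_iter=200):
    """Gradient descent with Polyak's heavy-ball momentum term."""
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    for _ in range(max_iter):
        x_new = x - lr * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_new
    return x

# Example: an ill-scaled quadratic f(x) = 0.5 * x' A x.
A = np.diag([1.0, 10.0])
x_min = heavy_ball(lambda x: A @ x, np.array([5.0, 5.0]))
```

Nesterov's accelerated method differs in that the gradient is evaluated at the extrapolated point x + momentum * (x - x_prev) rather than at x.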
Unconstrained optimization of finite sums / expectations:
- stochastic gradient descent (SGD) [@Robbins1951];
- noise reduction methods:
- iterate averaging [@Polyak1992];
- gradient aggregation: stochastic average gradient (SAG) [@LeRoux2012];
stochastic variance-reduced gradient (SVRG) [@Johnson2013];
SAGA (stochastic average gradient augmented) [@Defazio2014];
- dynamic sample size [@Byrd2012; @Hashemi2014];
- second-order methods:
- natural gradient method, Hessian-free inexact Newton methods, Gauss-Newton methods,
quasi-Newton methods;
- diagonal scaling (see the Adam sketch after this list): adaptive gradient (AdaGrad) [@Duchi2011],
root mean square propagation (RMSprop) [@Tieleman2012],
adaptive moment estimation (Adam) [@Kingma2015];
- randomized CD [@Nesterov2012], parallel stochastic CD [@LiuJ2013],
stochastic dual coordinate ascent (SDCA) [@Shalev-Shwartz2013];
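A minimal sketch of the Adam update from the diagonal-scaling item above; hyperparameters follow the commonly quoted defaults, and the noisy quadratic stands in for a stochastic objective.

```python
import numpy as np

def adam(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam: gradient steps scaled by running estimates of first and second moments."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first moment (mean of gradients)
    v = np.zeros_like(x)   # second moment (mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)                    # stochastic gradient (e.g. from a minibatch)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)     # bias correction
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Example: noisy gradients of f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x_min = adam(noisy_grad, np.array([2.0, -3.0]), lr=0.05)
```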
Constrained optimization:
- Simplex algorithm [@Dantzig1947];
- Interior-point methods [@Karmarkar1984];
- Alternating direction method of multipliers (ADMM) (see [@Boyd2011]; a lasso sketch follows this list);
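A minimal ADMM sketch for the lasso problem, minimize 0.5*||Ax - b||^2 + lam*||x||_1; the splitting, penalty parameter, and iteration count are illustrative choices (see [@Boyd2011] for the standard treatment).

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, max_iter=200):
    """ADMM for the lasso with the splitting x = z.

    x absorbs the least-squares term, z absorbs the l1 term via
    soft-thresholding, and u is the scaled dual variable.
    """
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    # Factor once: every x-update solves (A'A + rho*I) x = A'b + rho*(z - u).
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(max_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft(x + u, lam / rho)
        u = u + x - z
    return z

# Example: sparse recovery from a small random design (illustrative data).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = admm_lasso(A, b, lam=0.1)
```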
Conditional gradient, aka Frank-Wolfe [@Frank1956], for convex optimization over a compact convex set:
at each iteration, minimize the linearized objective over the constraint set
and move toward the minimizer with a diminishing step size.
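A minimal sketch of conditional gradient over the probability simplex, where the linear minimization step just returns a vertex; the constraint set and the 2/(k+2) step-size rule are illustrative choices.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, max_iter=200):
    """Conditional gradient (Frank-Wolfe) over the probability simplex."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # vertex minimizing the linearized objective
        gamma = 2.0 / (k + 2.0)          # classic diminishing step size
        x = (1 - gamma) * x + gamma * s  # convex combination stays in the simplex
    return x

# Example: minimize ||x - c||^2 over the simplex, with c inside the simplex.
c = np.array([0.2, 0.5, 0.3])
x_min = frank_wolfe_simplex(lambda x: 2 * (x - c), np.full(3, 1 / 3))
```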
Projected gradient descent (PGD) [@Goldstein1964; @Levitin1966],
for smooth (possibly nonconvex) optimization over a convex set, e.g. bound constraints.
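A minimal sketch of projected gradient descent for bound constraints, where the projection is a componentwise clip; the step size and test problem are illustrative assumptions.

```python
import numpy as np

def projected_gradient_descent(grad, x0, lower, upper, lr=0.1, max_iter=300):
    """Gradient step followed by projection onto the box [lower, upper]."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x = np.clip(x - lr * grad(x), lower, upper)
    return x

# Example: minimize ||x - c||^2 subject to 0 <= x <= 1, with c outside the box.
c = np.array([1.5, -0.5])
x_min = projected_gradient_descent(lambda x: 2 * (x - c), np.zeros(2), 0.0, 1.0)
# The minimizer is the projection of c onto the box, i.e. [1.0, 0.0].
```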
Projected conjugate gradient method, for reduced systems.
Others:
- sampling-based (zeroth-order): genetic algorithms, simulated annealing;
- Bayesian optimization;
References
- Golub and Van Loan, 1983, 2013. Matrix Computations.
- Dennis and Schnabel, 1983, 1996.
Numerical Methods for Unconstrained Optimization and Nonlinear Equations.
- Nocedal and Wright, 1999, 2006. Numerical Optimization.
🏷 Category=Computation