**Newton's method** [@Simpson1740]
for the unconstrained optimization of a twice-differentiable real-valued function
applies Newton's method for root finding to the gradient of the function,
which requires computing the gradient and the Hessian:
given $f \in C^2(\mathbb{R}^n)$, then $\nabla f \in C^1(\mathbb{R}^n, \mathbb{R}^n)$;
update by $x_{k+1} = x_k - [H_f(x_k)]^{-1} \nabla f(x_k)$.
In practice, matrix inversion is converted to solving linear equations
to reduce computational complexity: $H_f(x_k) \Delta x_k = -\nabla f(x_k)$.
With an initial value near a local extremum where the Hessian is invertible and Lipschitz continuous,
Newton's method converges to that extremum quadratically,
which is much faster than gradient descent.
Damping the step by a factor less than one makes Newton's method more robust far from the solution,
but the damped iteration converges only linearly.
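A minimal NumPy sketch of the iteration above, solving the linear system rather than inverting the Hessian; the function names and the quadratic test problem are illustrative assumptions, not from any library:

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for unconstrained minimization.

    Solves H_f(x_k) dx = -grad f(x_k) instead of forming the inverse.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)  # linear solve, no explicit inverse
        x = x + dx
    return x

# Illustrative quadratic: f(x, y) = (x - 1)^2 + 10 (y + 2)^2, minimum at (1, -2)
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
x_star = newton_minimize(grad, hess, [5.0, 5.0])
```

On a quadratic the Hessian is exact everywhere, so a single Newton step lands on the minimizer.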

**Quasi-Newton methods** approximate the Hessian in Newton's method
by differences of cached gradient values.

Algorithm (BFGS): initial guess $\mathbf{x}_0$, initial Hessian approximation $B_0 = I$.

- Obtain a direction $\mathbf{p}_k$: $B_k \mathbf{p}_k = -\nabla f(\mathbf{x}_k)$;
- Line search for an acceptable step size $\alpha_k$;
- $\mathbf{s}_k = \alpha_k \mathbf{p}_k$;
- $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k$;
- $\mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$;
- Update formula for $B_k$;
- Iterate until the norm of the gradient falls under a threshold.
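The steps above can be sketched as follows; this is a minimal NumPy implementation with a simple backtracking (Armijo) line search, and the function names and Rosenbrock test problem are illustrative assumptions:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=500):
    """BFGS with backtracking line search, updating B_k directly."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                 # initial Hessian approximation B_0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:   # stop when gradient norm is small
            break
        p = np.linalg.solve(B, -g)    # direction: B_k p_k = -grad f(x_k)
        alpha = 1.0                   # backtrack until Armijo condition holds
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                 # s_k = alpha_k p_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                 # y_k = grad f(x_{k+1}) - grad f(x_k)
        if y @ s > 1e-12:             # curvature condition keeps B_k positive definite
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
        x, g = x_new, g_new
    return x

# Illustrative test: Rosenbrock function, minimum at (1, 1)
f = lambda v: (1 - v[0])**2 + 100 * (v[1] - v[0]**2)**2
grad = lambda v: np.array([
    -2 * (1 - v[0]) - 400 * v[0] * (v[1] - v[0]**2),
    200 * (v[1] - v[0]**2),
])
x_star = bfgs(f, grad, [-1.2, 1.0])
```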

**Secant method** (also called the chord method) can be thought of as a finite-difference approximation
of Newton's method, although it was developed long before the latter:
$$x_k = x_{k-1} - f(x_{k-1}) \frac{x_{k-1} - x_{k-2}}{f(x_{k-1}) - f(x_{k-2})}$$
The secant method converges if the initial values are sufficiently close to the root.
If the function is twice continuously differentiable and the root in question has multiplicity 1,
the order of convergence is the golden ratio $(1+\sqrt{5}) / 2 \approx 1.618$, between linear and quadratic.
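A minimal sketch of the recurrence above for root finding (the function name and the test equation $x^2 = 2$ are illustrative):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant method: replaces f'(x) in Newton's method by the
    finite difference (f(x1) - f(x0)) / (x1 - x0)."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                 # flat secant line: cannot proceed
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x1 - x0) < tol:
            break
    return x1

root = secant(lambda x: x * x - 2.0, 1.0, 2.0)  # converges to sqrt(2)
```

Note that each iteration needs only one new function evaluation, since $f(x_{k-1})$ can be cached from the previous step.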

**Davidon–Fletcher–Powell formula** (DFP) is the first quasi-Newton method
to generalize the secant method to a multidimensional problem.
[@Davidon1959; @Fletcher1963]

Update formula: $$B_{k+1} = \left (I-\frac {y_k \, \Delta x_k^T} {y_k^T \, \Delta x_k} \right ) B_k \left (I-\frac {\Delta x_k y_k^T} {y_k^T \, \Delta x_k} \right )+\frac{y_k y_k^T} {y_k^T \, \Delta x_k}$$ Or with $H_{k+1} = B_{k+1}^{-1}$: $$H_{k+1} = H_k + \frac {\Delta x_k \Delta x_k^T}{\Delta x_k^{T} \, y_k} - \frac {H_k y_k y_k^T H_k} {y_k^T H_k y_k}$$
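A quick sanity check of the inverse form is that the updated $H_{k+1}$ satisfies the secant equation $H_{k+1} y_k = \Delta x_k$; a NumPy sketch, with vectors chosen only so that the curvature condition $\Delta x_k^T y_k > 0$ holds:

```python
import numpy as np

def dfp_update_H(H, s, y):
    """DFP update of the inverse-Hessian approximation
    (s = Delta x_k, y = y_k)."""
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)

H = np.eye(3)
s = np.array([1.0, 2.0, 0.5])   # illustrative step, with s @ y > 0
y = np.array([0.5, 1.0, 2.0])   # illustrative gradient difference
H1 = dfp_update_H(H, s, y)      # H1 @ y recovers s exactly
```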

**Broyden–Fletcher–Goldfarb–Shanno algorithm** (BFGS)
[@Broyden1970; @Fletcher1970; @Goldfarb1970; @Shanno1970]

Update formula: $$B_{k+1} = B_k + \frac {y_k y_k^T}{y_k^{T} \Delta x_k} - \frac {B_k \Delta x_k (B_k \Delta x_k)^T} {\Delta x_k^{T} B_k \, \Delta x_k}$$ Or with $H_{k+1} = B_{k+1}^{-1}$: $$H_{k+1} = \left (I-\frac {\Delta x_k y_k^T } {y_k^T \Delta x_k} \right ) H_k \left (I-\frac { y_k \Delta x_k^T} {y_k^T \Delta x_k} \right )+\frac{\Delta x_k \Delta x_k^T} {y_k^T \, \Delta x_k}$$
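Analogously, the direct BFGS form can be checked against the secant equation $B_{k+1} \Delta x_k = y_k$ (NumPy sketch with illustrative vectors):

```python
import numpy as np

def bfgs_update_B(B, s, y):
    """BFGS update of the Hessian approximation (s = Delta x_k, y = y_k)."""
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

B = np.eye(3)
s = np.array([1.0, 2.0, 0.5])   # illustrative step, with s @ y > 0
y = np.array([0.5, 1.0, 2.0])   # illustrative gradient difference
B1 = bfgs_update_B(B, s, y)     # B1 @ s recovers y exactly
```

Note the duality with DFP: the BFGS update of $B_k$ has the same form as the DFP update of $H_k$, with $\Delta x_k$ and $y_k$ swapped.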

**Nonlinear conjugate gradient methods**