**Newton's method**, **Newton–Raphson method**, or **method of tangents** [@Simpson1740]
is an iterative root-finding algorithm for differentiable functions.
At each iteration, the derivative of the function is calculated to obtain a linear approximation
of the function, whose root is used as a new approximation to a root of the original function.

Given a nonlinear function $f \in C^1(\mathbb{R})$, we would like to find a solution to the equation $f(x) = 0$. With an initial approximate solution $x_0$, Newton's method updates by $x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$, $k \in \mathbb{N}$.

If $f \in C^2(\mathbb{R})$ and $x_0$ is sufficiently close to a simple root $x^*$, i.e. $f(x^*) = 0$ and $f'(x^*) \ne 0$, then Newton's method converges quadratically: $|x_{k+1} - x^*| \le c |x_k - x^*|^2$ for some constant $c > 0$. The initial approximate solution can be obtained by methods with better convergence guarantees but perhaps slower convergence rates, e.g. the bisection method or the gradient method.
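The update rule can be sketched in a few lines of Python; this is a minimal illustration (the function names and the example equation $x^2 - 2 = 0$ are chosen for demonstration, not taken from the text):

```python
# Minimal sketch of Newton's method in one dimension.
# Example: solve f(x) = 0 for f(x) = x^2 - 2, whose simple root is sqrt(2).

def newton(f, df, x0, tol=1e-14, max_iter=50):
    """Iterate x_{k+1} = x_k - f(x_k)/f'(x_k) until |f(x_k)| <= tol."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) <= tol:
            return x
        x = x - f(x) / df(x)
    return x

root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.5)
print(root)  # close to 1.4142135623730951
```

Starting from $x_0 = 1.5$, only a handful of iterations reach machine precision, consistent with quadratic convergence (the number of correct digits roughly doubles per step).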

(Proof of local convergence rate.)

**Modified Newton method** computes the derivative only at the initial point
and uses it in all subsequent iterations:
$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_0)}$.
The modified Newton method saves derivative computations in the ensuing iterations,
but has linear convergence instead of quadratic convergence.
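A minimal sketch of the modified iteration, on the same illustrative equation $x^2 - 2 = 0$ (names are illustrative):

```python
# Sketch of the modified Newton method: the derivative is evaluated once,
# at x_0, and reused; convergence is linear rather than quadratic.

def modified_newton(f, dfx0, x0, tol=1e-12, max_iter=200):
    """Iterate x_{k+1} = x_k - f(x_k)/f'(x_0); return (solution, iterations)."""
    x = x0
    for k in range(max_iter):
        if abs(f(x)) <= tol:
            return x, k
        x = x - f(x) / dfx0
    return x, max_iter

f = lambda x: x * x - 2
x, iters = modified_newton(f, dfx0=2 * 1.5, x0=1.5)
print(x, iters)  # converges to sqrt(2), in more iterations than Newton
```

The error contracts by a roughly constant factor $|1 - f'(x^*)/f'(x_0)|$ per step, the signature of linear convergence.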

Given a nonlinear transformation of the Euclidean n-space, $f \in C^1(\mathbb{R}^n, \mathbb{R}^n)$, we would like to find a solution to the system of nonlinear equations $f(x) = 0$. With an initial approximate solution $x_0$, Newton's method updates by $x_{k+1} = x_k - [\nabla f(x_k)]^{-1} f(x_k)$. To reduce computational complexity, the matrix inversion is replaced by solving the linear system $\nabla f(x_k) \delta_k = - f(x_k)$ via Gaussian elimination:

- LU decomposition: $\nabla f(x_k) = L_k U_k$;
- solve: $L_k y_k = - f(x_k)$;
- solve: $U_k \delta_k = y_k$;
- update: $x_{k+1} = x_k + \delta_k$.
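The four steps above can be sketched in Python; the example system is illustrative (not from the text), and SciPy's `lu_factor`/`lu_solve` perform the LU decomposition and the two triangular solves:

```python
# Sketch of Newton's method for a system via LU decomposition.
# Illustrative system: f(x, y) = (x^2 + y^2 - 1, x - y),
# with solution (1/sqrt(2), 1/sqrt(2)).
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def f(v):
    x, y = v
    return np.array([x**2 + y**2 - 1.0, x - y])

def jac(v):
    x, y = v
    return np.array([[2 * x, 2 * y],
                     [1.0, -1.0]])

x = np.array([1.0, 0.5])                # initial approximate solution
for _ in range(20):
    lu, piv = lu_factor(jac(x))         # LU decomposition of the Jacobian
    delta = lu_solve((lu, piv), -f(x))  # forward and back substitution
    x = x + delta                       # update
print(x)  # close to [0.70710678, 0.70710678]
```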

LU decomposition can be computed with the same asymptotic time complexity as the underlying algorithm for multiplying two square matrices [@BunchHopcroft1974]. Matrix multiplication algorithms have time complexity $\mathcal{O}(n^a)$ for matrices of order $n$, where $a \in (2, 3]$: the Strassen algorithm (1969) has asymptotic rate $a = \log_2 7 \approx 2.807$; the Coppersmith–Winograd algorithm (1990) has asymptotic rate $a \approx 2.375$, although it is not used for matrices of practical size. Solving a system of linear equations with a triangular coefficient matrix takes $\mathcal{O}(n^2)$ scalar multiplications and additions.

(Local convergence rate.)

Given an immersion from the Euclidean n-space into the Euclidean m-space, $m > n$,
i.e. a differentiable map whose differentials are injective everywhere:
$f \in C^1(\mathbb{R}^n, \mathbb{R}^m)$, $\text{rank}~\nabla f(x) = n$.
This gives an overdetermined system of nonlinear equations:
$f(x) = 0$, which may not have a solution.
But it can be turned into a non-linear least squares problem: $\min_x \|f(x)\|^2$.
Then it can be solved by the **Gauss–Newton algorithm** [@Gauss1809],
which replaces the inverse of the Jacobian in the update formula with the Moore–Penrose inverse:
$x_{k+1} = x_k - J_k^\dagger f(x_k)$, where $J^\dagger = (J^T J)^{-1} J^T$ and $J_k = \nabla f(x_k)$.
Note that $\text{rank}~(J^T J) = n$, and thus $J^T J$ is invertible.
In practice, it is implemented by solving linear equations:
(normal equations) $J_k^T J_k \delta_k = - J_k^T f(x_k)$.
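A minimal sketch of the Gauss–Newton iteration via the normal equations; the exponential model and data below are illustrative (a zero-residual fitting problem with known true parameters):

```python
# Sketch of the Gauss-Newton algorithm: solve J^T J delta = -J^T f at each step.
# Illustrative example: fit p0 * exp(p1 * t) to data generated from
# true parameters (2.0, -1.0), so the least-squares residual vanishes.
import numpy as np

t = np.linspace(0.0, 2.0, 10)
y = 2.0 * np.exp(-1.0 * t)                        # synthetic, noise-free data

def residual(p):
    return p[0] * np.exp(p[1] * t) - y

def jacobian(p):
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])     # 10 x 2, full column rank

p = np.array([1.0, 0.0])                          # initial guess
for _ in range(50):
    J, r = jacobian(p), residual(p)
    delta = np.linalg.solve(J.T @ J, -J.T @ r)    # normal equations
    p = p + delta
print(p)  # close to [2.0, -1.0]
```

For zero-residual problems like this one, the iteration recovers the fast local convergence of Newton's method near the solution.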

The rate of convergence of the Gauss–Newton algorithm can approach quadratic [@Björck1996].

Given a submersion from the Euclidean n-space into the Euclidean m-space, $m < n$, i.e. a differentiable map whose differentials are surjective everywhere: $f \in C^1(\mathbb{R}^n, \mathbb{R}^m)$, $\text{rank}~\nabla f(x) = m$.

This gives an underdetermined system of nonlinear equations: $f(x) = 0$, which may have infinitely many solutions. In fact, by the submersion level set theorem (see Topology), the zero set $f^{-1}(0)$ is a submanifold of dimension $n - m$.

We find a unique solution by the "least-change" criterion, which also replaces the inverse of the Jacobian in the update formula with the Moore–Penrose inverse [@Ben-Israel1966]: $x_{k+1} = x_k - J_k^\dagger f(x_k)$, where $J^\dagger = J^T (J J^T)^{-1}$.
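A minimal sketch of the least-change iteration; the one-equation, two-unknown example (the unit circle) is illustrative:

```python
# Sketch of Newton's method for an underdetermined system using the
# Moore-Penrose inverse J^T (J J^T)^{-1} (the "least-change" step).
# Illustrative example: f(x) = x0^2 + x1^2 - 1, whose zero set is the
# unit circle, a 1-dimensional submanifold of R^2.
import numpy as np

def f(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0])

def jac(x):
    return np.array([[2 * x[0], 2 * x[1]]])       # 1 x 2, full row rank

x = np.array([2.0, 1.0])
for _ in range(30):
    J = jac(x)
    delta = -J.T @ np.linalg.solve(J @ J.T, f(x))  # minimum-norm Newton step
    x = x + delta
print(x, np.linalg.norm(x))  # lands on the circle: norm close to 1
```

Each step is the minimum-norm solution of the linearized equation, so the iterate moves as little as possible while correcting the residual, and it converges to the point of the circle picked out by the starting point.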

Newton's method for underdetermined systems may also use a left inverse ($B A = I_n$) [@Tapia1969], a right inverse ($A B = I_m$) [@Paardekooper1990], or an outer inverse ($B A B = B$) [@NashedChen1993].

**Convergence region** or **basin of attraction** $B_\zeta$ of Newton's method associated with a zero $\zeta$
is the set of points that converge to that zero under the Newton map:
$B_\zeta = \{x \in \mathbb{R}^n : \lim_{k \to \infty} N_f^k(x) = \zeta\}$.
**Convergence region** or **basin of attraction** $B$ of Newton's method
is the set of points that converge to some zero under the Newton map:
$B = \bigcup_{\zeta \in f^{-1}(0)} B_\zeta$.
We may also use the notation for stable manifold: $W^s_\zeta$ and $W^s$.
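Basins of attraction can be explored numerically; the classic illustrative example below iterates the Newton map of $f(z) = z^3 - 1$ in the complex plane, whose three basins have fractal boundaries:

```python
# Sketch: classify points by basin of attraction under the Newton map
# N(z) = z - (z^3 - 1)/(3 z^2) for f(z) = z^3 - 1 (an illustrative choice).
import numpy as np

roots = [np.exp(2j * np.pi * k / 3) for k in range(3)]  # cube roots of unity

def basin(z, iters=100):
    """Return the index of the root that z converges to, or None."""
    for _ in range(iters):
        z = z - (z**3 - 1) / (3 * z**2)   # Newton map N_f
    for i, r in enumerate(roots):
        if abs(z - r) < 1e-8:
            return i
    return None

print(basin(2.0 + 0.0j))    # the positive real axis lies in the basin of 1
print(basin(-1.0 + 1.0j))   # a point in the basin of a complex root
```

Evaluating `basin` over a grid and coloring by the returned index produces the well-known Newton fractal.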

**Approximate zero** of a function
is a point whose Newton sequence exists and converges to a zero quadratically,
in the sense that (see e.g. [@Dedieu2001]):
$\forall k \ge 0$, $\|x_k - \zeta\| \le 2^{-2^k + 1} \|x_0 - \zeta\|$;
where $\zeta$ is called the associated zero.

(Alpha-certified point.)

*Kantorovich theorem* [@Kantorovich1964]:
For a twice continuously Fréchet differentiable map between subsets of Banach spaces,
starting at a point where the Newton step exists,
let $\beta = \|J^{-1}_0 f_0\|$,
$\gamma = \sup_{x \in X} \|J^{-1}_0 (D^2 f)_x\|$, $\alpha = \beta \gamma$.
If $\alpha \le \frac{1}{2}$ and the domain contains a closed ball of radius
$r_0 = \frac{1 - \sqrt{1 - 2 \alpha}}{\alpha} \beta$ centered at the starting point,
then the Newton sequence exists and converges quadratically
at rate $-\log_2 (2 \alpha)$ to the unique zero in the ball:
$f \in C^2(X, W)$, $X \subset V$,
$x_0 \in X$, if $\alpha \le \frac{1}{2}$, $\bar B_{r_0}(x_0) \subset X$,
then $\bar B_{r_0}(x_0) \cap f^{-1}(0) = \{\zeta\}$, $\forall k \in \mathbb{N}$,
$\|x_k - \zeta\| \le \gamma^{-1} 2^{-k} (2 \alpha)^{2^k}$.
Furthermore, this is the only zero within distance
$r_1 = \frac{1 + \sqrt{1 - 2 \alpha}}{\alpha} \beta$ of the starting point:
$f^{-1}(0) \cap B_{r_1}(x_0) = \{\zeta\}$.
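The Kantorovich quantities are easy to check numerically; the scalar example below ($f(x) = x^2 - 2$, $x_0 = 1.5$) is illustrative:

```python
# Numerical check of the Kantorovich conditions for an illustrative example:
# f(x) = x^2 - 2, x0 = 1.5. Here J0 = f'(x0) = 3 and D^2 f = 2 (constant).
import math

x0 = 1.5
beta = abs((x0**2 - 2) / (2 * x0))   # ||J0^{-1} f0|| = 1/12
gamma = abs(2 / (2 * x0))            # sup ||J0^{-1} (D^2 f)_x|| = 2/3
alpha = beta * gamma                 # 1/18 <= 1/2: hypothesis holds
r0 = (1 - math.sqrt(1 - 2 * alpha)) / alpha * beta
print(alpha, r0)
# The closed ball [x0 - r0, x0 + r0] contains the zero sqrt(2); in fact,
# for this quadratic the bound is tight: x0 - r0 equals sqrt(2) exactly.
```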

*Kantorovich theorem* [@Tapia1971]:
For a Fréchet differentiable map with $K$-Lipschitz derivative,
from an open convex subset of a Banach space to another Banach space,
starting at a point where the Newton step exists,
let $\beta = \|J^{-1}_0 f_0\|$, $B = \|J^{-1}_0\|$, $\gamma = B K$, $\alpha = \beta \gamma$,
if $\alpha \le \frac{1}{2}$ and the domain contains a closed ball of radius
$r_0 = \frac{1 - \sqrt{1 - 2 \alpha}}{\alpha} \beta$ centered at the starting point,
then the Newton sequence exists and converges quadratically at rate
$-\log_2 (1 - \sqrt{1 - 2 \alpha})$ to a zero in the ball:
$f \in C^1(X, W)$, $X \subset V$ is open and convex,
$\forall x, x' \in X$, $\|J_x - J_{x'}\| \le K \|x - x'\|$,
$x_0 \in X$, if $\alpha \le \frac{1}{2}$, $\bar B_{r_0}(x_0) \subset X$,
then $\exists \zeta \in \bar B_{r_0}(x_0) \cap f^{-1}(0)$: $\forall k \in \mathbb{N}$,
$\|x_k - \zeta\| \le \gamma^{-1} 2^{-k} (1 - \sqrt{1 - 2 \alpha})^{2^k}$.
Note that this result gives a sharper estimate for the rate of convergence:
$\forall \alpha \in (0, \frac{1}{2})$, $1 - \sqrt{1 - 2 \alpha} < 2 \alpha$.

*Smale's alpha theorem* [@Smale1986, Thm A]:
For an analytic map between subsets of Banach spaces,
denote the norm of the Newton step $\beta(f, x) = \|Df(x)^{-1} f(x)\|$,
$\gamma(f, x) = \sup_{k \ge 2} \sqrt[k-1]{\left\|(Df)_x^{-1} \frac{(D^k f)_x}{k!} \right\|}$,
and $\alpha(f, x) = \beta(f, x) \, \gamma(f, x)$,
where $(D^k f)_x$ is the k-th derivative as a k-linear map,
$(Df)^{-1} (D^k f)$ denotes the composition, and the norm is the operator norm of a multilinear map.
There is a constant $\alpha_0 \approx 0.130707$ such that
if the initial point satisfies $\alpha(f, x) < \alpha_0$ then it is an approximate zero,
and the associated zero is within distance $c_0 \beta(f, x)$,
where constant $c_0 = \sum_{k=0}^\infty 2^{-2^k + 1} \approx 1.63284$.
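The constant $c_0$ can be checked numerically; the series converges doubly exponentially, so a few terms suffice:

```python
# Numerical check of c0 = sum_{k>=0} 2^(-2^k + 1) from Smale's alpha theorem.
# The tail decays doubly exponentially, so 10 terms give full precision.
c0 = sum(2.0 ** (-2**k + 1) for k in range(10))
print(c0)  # about 1.6328
```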

*Smale's gamma theorem* [@Smale1986, Thm C]:
For an analytic map between subsets of Banach spaces, every point in the ball of radius
$u_0 / \gamma(f, \zeta)$ centered at a zero $\zeta$ is an approximate zero,
where constant $u_0 = \frac{3 - \sqrt{7}}{2} \approx 0.17712$.

**Normal flow algorithm**: the Newton steps are asymptotically normal to the Davidenko flow [@Ben-Israel1966].

**Augmented Jacobian algorithm** [@Walker1990; @Galantai2000].

Convergence radius of a polynomial system: $u_0 / \gamma(f, \zeta)$, with constant $u_0 \approx 0.05992$ [@Dedieu2006, Def 129; @BeltranPardo2007, Thm 2.1].

- Kantorovich, L. V. and Akilov, G. P., 1964 (2nd ed. 1982). *Functional Analysis in Normed Spaces*.
- Blum, L., Cucker, F., Shub, M., and Smale, S., 1998. *Complexity and Real Computation*.