Newton's method, Newton–Raphson method, or method of tangents [@Simpson1740] is an iterative root finding algorithm for differentiable functions. At each iteration, the derivative of the function is calculated to obtain a linear approximation of the function, whose root is used as a new approximation to a root of the original function.

## Real Function

Given a nonlinear function $f \in C^1(\mathbb{R})$, we would like to find a solution to the equation $f(x) = 0$. With an initial approximate solution $x_0$, Newton's method updates by $x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$, $k \in \mathbb{N}$.

If $f \in C^2(\mathbb{R})$ and $x_0$ is close to a simple root $x^∗$, i.e. $f'(x^∗) \ne 0$, then Newton's method has quadratic convergence: $|x_{k+1} - x^∗| \le c |x_k - x^∗|^2$, where $c > 0$. The initial approximate solution can be obtained by methods with better convergence guarantee but perhaps slower convergence rates, e.g. bisection method or gradient method.

(Proof of local convergence rate.)

Modified Newton method only computes the derivative at the initial step and use it in all iterations: $x_{k+1} = x_k - \frac{f(x_k)}{f'(x_0)}$. Modified Newton method saves gradient computation in ensuing iterations, but has linear convergence instead of quadratic convergence.

## Transformation

Given a nonlinear transformation of the Euclidean n-space, $f \in C^1(\mathbb{R}^n, \mathbb{R}^n)$, we would like to find a solution to the system of nonlinear equations $f(x) = 0$. With an initial approximate solution $x_0$, Newton's method updates by $x_{k+1} = x_k - [\nabla f(x_k)]^{-1} f(x_k)$. To reduce computational complexity, matrix inversion is converted to solving linear equations $\nabla f(x_k) \delta_k = - f(x_k)$ by Gaussian elimination:

1. LU decomposition: $\nabla f(x_k) = L_k U_k$;
2. solve: $L_k y_k = - f(x_k)$;
3. solve: $U_k \delta_k = y_k$;
4. update: $x_{k+1} = x_k + \delta_k$.

LU decomposition can be computed in the same time complexity as the algorithm it uses to multiply two square matrices [@Bunch and Hopcroft, 1974]. Matrix multiplication algorithms have time complexity $\mathcal{O}(n^a)$ for matrices of order $n$, where $a \in (2, 3]$: the Strassen algorithm (1969) has an asymptotic rate $a = \log_2 7 \approx 2.807$; the Coppersmith–Winograd algorithm (1990) has an asymptotic rate $a \approx 2.375$, although it is not used for matrices of a practical size. Solving a system of linear equations with a triangular coefficient matrix takes $\mathcal{O}(n^2)$ steps of scalar multiplications and additions.

(Local convergence rate.)

## Immersion

Given an immersion from the Euclidean n-space into the Euclidean m-space, $m > n$, i.e. a differentiable map whose differentials are injective everywhere: $f \in C^1(\mathbb{R}^n, \mathbb{R}^m)$, $\text{rank}~\nabla f(x) = n$. This gives an overdetermined system of nonlinear equations: $f(x) = 0$, which may not have a solution. But it can be turned into a non-linear least squares problem: $\min_x \|f(x)\|^2$. Then it can be solved by the Gauss-Newton algorithm [@Gauss1809] which replaces the inverse of Jacobian in the update formula with the Moore-Penrose inverse: $x_{k+1} = x_k - J_k^\dagger f(x_k)$, where $J^\dagger = (J^T J)^{-1} J^T$ and $J_k = \nabla f(x_k)$. Note that $\text{rank}~(J^T J) = n$, and thus $J^T J$ is invertible. In practice, it is implemented by solving linear equations: (normal equations) $J_k^T J_k \delta_k = - J_k^T f(x_k)$.

The rate of convergence of the Gauss–Newton algorithm can approach quadratic [@Björck1996].

## Submersion

Given a submersion from the Euclidean n-space into the Euclidean m-space, $m < n$, i.e. a differentiable map whose differentials are surjective everywhere: $f \in C^1(\mathbb{R}^n, \mathbb{R}^m)$, $\text{rank}~\nabla f(x) = m$.

This gives an underdetermined system of nonlinear equations: $f(x) = 0$, which may have infinitely many solutions. In fact, by submersion level set theorem (see Topology), the zero set $f^{-1}(0)$ is a submanifold of dimension $n - m$.

We find a unique solution by the "least-change" criterium, which also replaces the inverse of Jacobian in the update formula with the Moore-Penrose inverse [@Ben-Israel1966]: $x_{k+1} = x_k - J_k^\dagger f(x_k)$, where $J^\dagger = J^T (J J^T)^{-1}$.

Newton's method for under-determined systems may also use left inverse ($B A = I_n$) [@Tapia1969], right inverse ($A B = I_m$) [@Paardekooper1990], or outer inverse ($B A B = B$) [@Nashed and Chen, 1993].

## Convergence Analysis

Convergence region or basin of attraction $B_\zeta$ of Newton's method associated with a zero is the set of points that converge to the zero under Newton map: $B_\zeta = \{x \in \mathbb{R}^n : \lim_{n \to \infty} N_f^n(x) = \zeta\}$. Convergence region or basin of attraction $B$ of Newton's method is the set of points that converge to any zero under Newton map: $B = \cup_{\zeta \in f^{-1}(0)} B_\zeta$. We may also use the notation for stable manifold: $W^s_\zeta$ and $W^s$.

Approximate zero of a function is a point whose Newton sequence exists and converges to a zero quadratically such that (see e.g. [@Dedieu2001]): $\forall k \ge 0$, $\|x_k - \zeta\| = 2^{-2^k + 1} \|x_0 - \zeta\|$; where $\zeta$ is called the associated zero.

alpha certified point.

### Well-determined system

Kantorovich theorem [@Kantorovich1964]: For a Fréchet twice continuously differentiable map between subsets of Banach spaces, starting at a point where the Newton step exists, let $\beta = \|J^{-1}_0 f_0\|$, $\gamma = \sup_{x \in X} \|J^{-1}_0 (D^2 f)_x\|$, $\alpha = \beta \gamma$. If $\alpha \le \frac{1}{2}$ and the domain contains a closed ball of radius $r_0 = \frac{1 - \sqrt{1 - 2 \alpha}}{\alpha} \beta$ centered at the starting point, then the Newton sequence exists and converges quadratically at rate $-\log_2 (2 \alpha)$ to the unique zero in the ball: $f \in C^2(X, W)$, $X \subset V$, $x_0 \in X$, if $\alpha \le \frac{1}{2}$, $\bar B_{r_0}(x_0) \subset X$, then $\bar B_{r_0}(x_0) \cap f^{-1}(0) = \{\zeta\}$, $\forall k \in \mathbb{N}$, $\|x_k - \zeta\| \le \gamma^{-1} 2^{-k} (2 \alpha)^{2^k}$. Furthermore, this is the only zero in within distance $r_1 = \frac{1 + \sqrt{1 - 2 \alpha}}{\alpha} \beta$ to the starting point: $f^{-1}(0) \cap B_{r_1}(x_0) = \{\zeta\}$.

Kantorovich theorem [@Tapia1971]: For a Fréchet $K$-Lipschitz differentiable map from an open convex subset of a Banach space to another Banach space, starting at a point where the Newton step exists, let $\beta = \|J^{-1}_0 f_0\|$, $B = \|J^{-1}_0\|$, $\gamma = B K$, $\alpha = \beta \gamma$, if $\alpha \le \frac{1}{2}$ and the domain contains a closed ball of radius $r_0 = \frac{1 - \sqrt{1 - 2 \alpha}}{\alpha} \beta$ centered at the starting point, then the Newton sequence exists and converges quadratically at rate $-\log_2 (1 - \sqrt{1 - 2 \alpha})$ to a zero in the ball: $f \in C^1(X, W)$, $X \subset V$ is open and convex, $\forall x, x' \in X$, $\|J_x - J_{x'}\| \le K \|x - x'\|$, $x_0 \in X$, if $\alpha \le \frac{1}{2}$, $\bar B_{r_0}(x_0) \subset X$, then $\exists \zeta \in \bar B_{r_0}(x_0) \cap f^{-1}(0)$: $\forall k \in \mathbb{N}$, $\|x_k - \zeta\| \le \gamma^{-1} 2^{-k} (1 - \sqrt{1 - 2 \alpha})^{2^k}$. Note that, this result gives an optimum estimate for the rate of convergence: $\forall \alpha \in (0, \frac{1}{2})$, $1 - \sqrt{1 - 2 \alpha} < 2 \alpha$.

Smale's alpha theorem [@Smale1986, Thm A]: For an analytic map betwen subsets of Banach spaces, denote the norm of the Newton step $\beta(f, x) = \|Df(x)^{-1} f(x)\|$, and $\gamma(f, x) = \sup_{k \ge 2} \sqrt[k-1]{\left\|(Df)_x^{-1} \frac{(D^k f)_x}{k!} \right\|}$, where $D^k f(z)$ is the k-th derivative as a k-linear map, $(Df)^{-1} (D^k f)$ denotes the composition, and the norm is the operator norm of a multilinear map. There is a constant $\alpha_0 \approx 0.130707$ such that if the initial point satifies $\alpha(f, x) < \alpha_0$ then it is an approximate zero, and the associated zero is within distance $c_0 \beta(f, x)$ where constant $c_0 = \sum_{k=0}^\infty 2^{-2^k + 1} \approx 1.63281$.

Smale's gamma theorem [@Smale1986, Thm C]: For an analytic map betwen subsets of Banach spaces, every point in the ball of radius $u_0 / \gamma(f, \zeta)$ centered at a zero $\zeta$ is an approximate zero, where constant $u_0 = \frac{3 - \sqrt{7}}{2} \approx 0.17712$.

### Under-determined system

normal flow algorithm the Newton steps are asymptotically normal to the Davidenko flow.

[@Ben-Israel1966]

augmented Jacobian algorithm

[@Walker1990; @Galantai2000]

Convergence radius of a polynomial system $u_0 / \gamma(f, \zeta)$, constant $u_0 \approx 0.05992$. [@Dedieu2006, Def 129; @Beltran and Pardo 2007, Thm 2.1]