General form of optimization on manifolds.
General form of iterative optimization methods: Euclidean, $x_{t+1} = x_t + v_t$, where $v_t$ is the update/step vector; Riemannian, $x_{t+1} = R(x_t, v_t)$, where $R$ is a retraction and $v_t \in T_{x_t} M$. Riemannian gradient descent on a submanifold with projection as the retraction: $x_{t+1} = \Pi(x_t - c_t \nabla f(x_t))$, where $c_t > 0$ is the stepsize coefficient and $\Pi$ is the projection onto the manifold.
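A minimal sketch of the projected update on a concrete submanifold. The example problem (minimizing $f(x) = x^\top A x$ over the unit sphere, which converges to an eigenvector of the smallest eigenvalue of $A$), the matrix, the stepsize, and the iteration count are all assumptions of this sketch, not fixed by the text:

```python
import numpy as np

# Hypothetical example: minimize f(x) = x^T A x on the unit sphere S^{n-1}
# with the projected update x_{t+1} = Pi(x_t - c_t * grad f(x_t)),
# where Pi(y) = y / ||y|| is the projection onto the sphere.
rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = B + B.T                      # symmetric; min of f on the sphere = lambda_min(A)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)           # start on the manifold

c = 0.05                         # fixed stepsize coefficient c_t (assumed)
for _ in range(2000):
    g = 2 * A @ x                # Euclidean gradient of f(x) = x^T A x
    x = x - c * g                # Euclidean step v_t = -c_t * grad f(x_t)
    x /= np.linalg.norm(x)       # projection Pi back onto the sphere (retraction)

f_val = x @ A @ x
lam_min = np.linalg.eigvalsh(A)[0]   # eigvalsh returns eigenvalues in ascending order
print(f_val, lam_min)                # f_val should be close to lam_min
```

Note that the iterate stays on the manifold at every step by construction; the Euclidean step is free to leave it, and the projection pulls it back.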
Retraction $R(x, v)$ around a point $x_0$ in a $C^k$ manifold $M$, $k \ge 2$, is a $C^{k-1}$ mapping from a neighborhood $U$ of $(x_0, 0)$ in the tangent bundle to the manifold, such that (a) it maps every zero tangent vector to its point of tangency, and (b) the differential of every restricted retraction evaluated at the zero tangent vector is the identity map of the tangent space: $\forall (x, 0) \in U$, (a) $R(x, 0) = x$; (b) $d(R_x)_0 = \text{Id}$, where $R_x = R|_{U \cap T_x M}$ and $d(R_x)_0 : T_0 (T_x M) \cong T_x M \to T_x M$, under the usual identification of $T_0 (T_x M)$ with $T_x M$. Retraction $R(x, v)$ on a $C^k$ manifold $M$, $k \ge 2$, is a $C^{k-1}$ mapping from an open subset of the tangent bundle to the manifold that is a retraction around every point of the manifold: $R: \mathscr{D} \to M$, $\mathscr{D} \subset T M$, $\{(x, 0) : x \in M\} \subset \mathscr{D}$, and there exist open sets $(U_i)_{i \in I}$ with $\mathscr{D} = \cup_{i \in I} U_i$ such that for every $i \in I$, $R|_{U_i}$ is a retraction around some point $x \in M$. Restricted retraction $R_x(v)$ at a point $x \in M$ is the restriction of the retraction to the intersection of its domain with the tangent space at the point: $R_x = R|_{\mathscr{D} \cap T_x M}$.
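Both axioms can be checked numerically for a concrete retraction. The choice of the normalization retraction $R_x(v) = (x + v)/\|x + v\|$ on the unit sphere is an assumption of this sketch; axiom (b) is checked via a finite-difference approximation of $d(R_x)_0$:

```python
import numpy as np

# Numerical check of the two retraction axioms, using (as an assumed example)
# the normalization retraction R_x(v) = (x + v) / ||x + v|| on the unit sphere.
def R(x, v):
    y = x + v
    return y / np.linalg.norm(y)

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
x /= np.linalg.norm(x)           # a point on the sphere

# (a) the zero tangent vector maps to its point of tangency: R(x, 0) = x
assert np.allclose(R(x, np.zeros_like(x)), x)

# (b) d(R_x)_0 = Id on T_x M: for a tangent vector v (v orthogonal to x),
# (R(x, t v) - x) / t -> v as t -> 0.
v = rng.standard_normal(4)
v -= (v @ x) * x                 # project onto the tangent space T_x S^{n-1}
t = 1e-6
finite_diff = (R(x, t * v) - x) / t
print(np.linalg.norm(finite_diff - v))   # error shrinks like O(t)
```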
Retraction of order $k$: a retraction whose curves agree with geodesics up to order $k$, i.e. $\operatorname{dist}(R_x(tv), \exp_x(tv)) = O(t^{k+1})$ as $t \to 0$, where $\exp$ is the exponential map.
The retraction as defined earlier is a first-order retraction: axioms (a) and (b) give agreement with the geodesic to first order in $t$.
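A hedged numerical illustration of the order of approximation, assuming the usual convention that order $k$ means $\|R_x(tv) - \exp_x(tv)\| = O(t^{k+1})$. On the unit sphere, the normalization retraction in fact agrees with the exponential map to second order, so halving $t$ should divide the error by about $2^3 = 8$:

```python
import numpy as np

# Compare the normalization retraction on the sphere (assumed example) with
# the exponential map exp_x(t v) = cos(t) x + sin(t) v for a unit tangent v.
def R(x, v):
    y = x + v
    return y / np.linalg.norm(y)

def exp_map(x, v):
    t = np.linalg.norm(v)
    return np.cos(t) * x + np.sin(t) * (v / t)

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
x /= np.linalg.norm(x)
v = rng.standard_normal(3)
v -= (v @ x) * x
v /= np.linalg.norm(v)           # unit tangent vector at x

err = lambda t: np.linalg.norm(R(x, t * v) - exp_map(x, t * v))
ratio = err(0.1) / err(0.05)     # error ~ t^3, so the ratio should be near 8
print(ratio)
```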
Retractions for the same manifold can be compared by (1) their order of approximation to the exponential map and (2) their computational efficiency.
Retractions can be constructed for:
Constrained optimization (projected updates; uses projection onto the feasible set as the retraction):
Riemannian optimization (on an intrinsic manifold; the exponential map is the default retraction):
Riemannian optimization is mostly applied to matrix manifolds; see [@Absil2008] for a book-length treatment.
Escaping saddle points (in smooth optimization):