[@Lafon2004] uses the language of functional analysis.
The geometry of a set $Γ$ is a set of rules that describe the relationships between its elements. An intrinsic geometry is one that does not refer to a superset or the structures therein; an extrinsic geometry is one induced from the geometry of a superset.
A dual perspective from functional analysis, arising in inverse problems such as inverse scattering, potential theory, and spectral geometry: the geometry of a set $Γ$ can be understood through the geometry of the space of functions defined on $Γ$ (and of the operators acting on these functions).
(Manifold learning.)
Spectral graph theory analyzes the spectrum (eigenvalues and eigenvectors) of a matrix representing a graph. Spectral clustering/partitioning: construct a matrix representation of the graph; eigendecompose the matrix and embed the vertices into a Euclidean space using one or more eigenvectors; group the points based on their coordinates.
The Laplacian matrix of a graph, aka the graph Laplacian, is its degree matrix minus its adjacency matrix, $L = D - A$, where $D = \mathrm{diag}\{d_i\}_{i=1}^n$. The Laplacian matrix is a discrete analog of the Laplacian operator $Δ$ in multivariable calculus. The Laplacian matrix is symmetric, so its eigenvectors can form an orthogonal basis; it is also positive semi-definite, $x^\top L x = \frac{1}{2} \sum_{i,j} a_{ij} (x_i - x_j)^2 \ge 0$, so all eigenvalues are non-negative. The (symmetric) normalized Laplacian is $L' = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$.
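A minimal numerical sketch of the two paragraphs above, where the toy adjacency matrix and the crude sign-based grouping are illustrative assumptions, not from the text:

```python
import numpy as np

# Toy symmetric adjacency matrix: two triangles joined by one bridge edge.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

d = A.sum(axis=1)
D = np.diag(d)                                        # degree matrix D = diag{d_i}
L = D - A                                             # graph Laplacian L = D - A
L_norm = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)  # L' = I - D^{-1/2} A D^{-1/2}

# Spectral embedding: eigenvectors of the symmetric normalized Laplacian give
# Euclidean coordinates for the vertices; all eigenvalues are non-negative.
eigvals, eigvecs = np.linalg.eigh(L_norm)
coords = eigvecs[:, 1:3]              # skip the trivial eigenvector, keep 2 coordinates

# Grouping: here simply by the sign of the first nontrivial coordinate (a 2-way cut).
labels = (coords[:, 0] > 0).astype(int)
print(eigvals.round(3), labels)
```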
Diffusion processes (or random walks, Markov processes) can be useful for the geometric descriptions of data sets. @Coifman2006a approximates (anisotropic) diffusion processes $\exp(-t H_\alpha)$ on submanifolds, where $H_\alpha = \Delta - \Delta(p^{1-\alpha}) / p^{1-\alpha}$ is a symmetric Schrödinger operator and parameter $\alpha \in \mathbb{R}$, with special cases: $\alpha = 1$, heat diffusion; $\alpha = 1/2$, a Fokker–Planck diffusion; $\alpha = 0$, normalized graph Laplacian with isotropic weights. In comparison, "kernel eigenmaps" are diffusion processes $\exp(-t Q_2^{-1} Q_1)$.
An integral kernel $k(x,y)$ defines an integral transform/operator $K$ for functions on a measure space $(\Gamma, \mathrm{d}\mu)$, such that $K f(x) = \int_\Gamma k(x, y) f(y)~\mathrm{d}\mu(y)$. Here we call an integral kernel admissible if it is symmetric, positivity-preserving, and positive semi-definite: $k(x, y) = k(y, x)$; $k \ge 0$; $\langle f, K f \rangle \ge 0$. Admissible kernels can be designed to represent a local metric (e.g. degree of similarity) or to exhibit local behavior.
A Markov/stochastic kernel $\tilde{a}(x, y)$ defines a Markov process (an averaging operator) $\tilde{A}$, where $\tilde{a}(x, \cdot)$ is a probability measure. Every positivity-preserving integral kernel can be transformed into a Markov kernel $\tilde{a}(x, y)= k(x, y) / v^2(x)$, where $v^2(x) = K 1$. Assume the Markov kernel has eigendecomposition $\tilde{a}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \psi_i(x) \hat{\psi}_i(y)$. The right eigenvectors $\{\psi_i\}$ (and left eigenvectors $\{\hat{\psi}_i\}$) need not be orthogonal.
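A sketch of the Markov normalization $\tilde{a} = k / v^2$ on a finite sample, where the Gaussian kernel and the random sample are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # sample points in R^3 (assumed data)

# Admissible kernel: symmetric, positivity-preserving, positive semi-definite (Gaussian).
eps = 0.5
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / eps)

# Markov normalization: a~(x, y) = k(x, y) / v^2(x), where v^2(x) = K 1.
v2 = K.sum(axis=1)
A_markov = K / v2[:, None]             # each row a~(x, .) is a probability measure
assert np.allclose(A_markov.sum(axis=1), 1.0)
```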
A diffusion kernel $a(x, y)$ is the symmetric conjugate of a Markov kernel $\tilde{a}$ by $v$, $a(x, y) = v(x) \tilde{a}(x, y) / v(y)$. The diffusion operator $A$ with kernel $a$ is positive semi-definite, bounded, with the largest eigenvalue 1 and a corresponding eigenfunction $v(x)$. Assume that $A$ is compact (not just bounded), then its spectrum is discrete and its eigendecomposition can be written as $a(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \phi_i(x) \phi_i(y)$, where $A \phi_i(x) = \lambda_i \phi_i(x)$. The diffusion operator $A$ and the averaging operator $\tilde{A}$ have the same spectrum $\lambda$, and the eigenfunctions of $A$ and the right/left eigenfunctions of $\tilde{A}$ are related by conjugation by $v$: $\psi_i = \mathrm{diag}(v^{-1}) \phi_i$, $\hat{\psi}_i = \mathrm{diag}(v) \phi_i$. The t-step diffusion operator $A^t$ aggregates the local information, and its kernel $a^{(t)}$ has eigendecomposition $a^{(t)}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t \phi_i(x) \phi_i(y)$.
The diffusion metric/distance $D_t(x, y)$ at time step $t$ is defined as $D_t^2 (x, y) \equiv a^{(t)}(x, x) + a^{(t)}(y, y) - 2 a^{(t)}(x, y)$, which is equivalent to $D_t^2 (x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t [\phi_i(x) - \phi_i(y)]^2$ and $D_t^2(x,y) = \| a^{(t/2)}(x,\cdot) - a^{(t/2)}(y,\cdot) \|^2$. Thus $D_t$ is a semi-metric on the set, which becomes a metric if the diffusion kernel $a$ is positive definite. Unlike the geodesic distance, the diffusion distance is an average over all paths connecting two points, and thus robust to noise and topological short-circuits. A diffusion map $\Phi: \Gamma \to l^2(\mathbb{N})$ is a mapping of the data to a Euclidean space, based on the eigenfunctions of the diffusion kernel: $\Phi(x) = \left( \phi_i(x) \right)_{i \in \mathbb{N}}$. Diffusion map $\Phi$ defines an embedding of the data that preserves the diffusion distance $D_t$ via a weighted Euclidean metric $D_t^2(x,y) = \langle \Phi(x) - \Phi(y), \Lambda^t (\Phi(x) - \Phi(y)) \rangle$. A truncated diffusion map $\Phi_k: \Gamma \to \mathbb{R}^k$ with $\Phi_k(x) = \left( \phi_i(x) \right)_{i=1}^k$ preserves the diffusion distance with certain accuracy. The dimension $k$ of the new representation depends on the diffusion process, and is in general greater than the dimension $d$ of the set/manifold.
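A sketch of the symmetric conjugate $a(x,y) = k(x,y)/(v(x)v(y))$, the truncated diffusion map, and a numerical check that the truncated eigen-expansion approximates the diffusion distance; the Gaussian kernel, the sample, and the choices of $t$ and $k$ are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))                       # assumed sample
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 0.5)

v = np.sqrt(K.sum(axis=1))                          # v^2 = K 1
A_sym = K / np.outer(v, v)                          # diffusion kernel a = k / (v(x) v(y))

lam, phi = np.linalg.eigh(A_sym)                    # a(x,y) = sum_i lambda_i phi_i(x) phi_i(y)
lam, phi = lam[::-1], phi[:, ::-1]                  # sort by decreasing eigenvalue

t, k = 2, 10
# Truncated diffusion map Phi_k(x) = (phi_i(x))_{i<=k}, with weighted metric Lambda^t.
def D2_truncated(i, j):
    return np.sum(lam[:k] ** t * (phi[i, :k] - phi[j, :k]) ** 2)

# Direct definition D_t^2(x,y) = a^(t)(x,x) + a^(t)(y,y) - 2 a^(t)(x,y).
A_t = np.linalg.matrix_power(A_sym, t)
def D2_direct(i, j):
    return A_t[i, i] + A_t[j, j] - 2 * A_t[i, j]

print(D2_truncated(0, 1), D2_direct(0, 1))          # close when k captures the leading spectrum
```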
(Inverse mapping.)
Let $\Gamma$ be a compact smooth submanifold of dimension $d$ in $\mathbb{R}^n$, i.e. a Riemannian manifold with the Riemannian metric induced from the ambient space $\mathbb{R}^n$. Let $\mu$ be a probability measure on $\Gamma$ and $\mathrm{d}x$ the Riemannian measure on $\Gamma$; the probability density $p(x)$ is then given by $p(x) = \mathrm{d}\mu(x) / \mathrm{d}x$. Let $\{e_i\}_{i=1}^d$ be an orthonormal basis of the tangent plane $T_x$ to $\Gamma$ at $x$; the exponential map $\mathrm{exp}_x$ then maps a neighborhood of the origin in $T_x$ to a chart on $\Gamma$ around $x$, which provides normal coordinates $(s_i)_{i=1}^d$. In these coordinates the Laplace-Beltrami operator $\Delta$ on the Riemannian manifold can be written as $\Delta f = - \sum_{i=1}^d \partial^2 f / \partial s_i^2$, where $f \in C^\infty(\Gamma)$ is a smooth function on the manifold. The Neumann heat operator on the manifold (heat diffusion on the manifold with the Neumann boundary condition) is $e^{-t \Delta}$, which corresponds to the Neumann heat kernel $p_t(x, y)$ when seen as an integral operator.
Consider rotation-invariant kernels $k_\varepsilon (x, y) = h(\|x-y\|^2 / \varepsilon)$ with scale parameter $\varepsilon$, where $h$ is smooth and decays exponentially, and $\|\cdot\|$ is the Euclidean metric. As $\varepsilon$ goes to 0, the local geometry specified by $k_\varepsilon$ coincides with that of the manifold. Given the averaging operator $\tilde{A}_\varepsilon f(x) = K_\varepsilon f(x) / v_\varepsilon^2 (x)$ where $v_\varepsilon^2 (x) = K_\varepsilon 1$, the weighted graph Laplacian operator is $\tilde{\Delta}_\varepsilon = (I - \tilde{A}_\varepsilon) / \varepsilon$. On certain subspaces $E_K$ of $L^2(\Gamma)$, as $\varepsilon$ goes to 0, the limit operator of $\tilde{\Delta}_\varepsilon$ is $H f = c (\Delta f - 2 \langle \nabla p / p , \nabla f \rangle)$, where $c = m_2 / (2 m_0)$, $m_0 = \int_{\mathbb{R}^d} h(\|u\|^2)~\mathrm{d}u$, and $m_2 = \int_{\mathbb{R}^d} u_i^2 h(\|u\|^2)~\mathrm{d}u$. Under conjugation by the density $p$, this becomes a Schrödinger-type operator: $p H(g/p) = c (\Delta g - g \Delta p / p)$, where $g = p f$. Thus, when the density $p$ is uniform, the weighted graph Laplacian $\tilde{\Delta}_\varepsilon$ converges to (a multiple of) the Laplace-Beltrami operator $\Delta$ on the manifold [@Belkin2003]; but when the density is non-uniform, this reconstruction does not hold.
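As a quick consistency check (a sketch, not from the text): with the sign convention $\Delta f = -\sum_i \partial^2 f / \partial s_i^2$ above, the product rule reads $\Delta(uv) = u\,\Delta v + v\,\Delta u - 2\langle \nabla u, \nabla v \rangle$, so

$$
H f = c\Big(\Delta f - 2\Big\langle \frac{\nabla p}{p}, \nabla f \Big\rangle\Big)
    = \frac{c}{p}\big(\Delta(fp) - f\,\Delta p\big),
\qquad
p\, H\Big(\frac{g}{p}\Big) = c\Big(\Delta g - g\,\frac{\Delta p}{p}\Big), \quad g = pf,
$$

which is, up to the constant $c$, the Schrödinger operator $H_0 = \Delta - \Delta p / p$ of the $\alpha = 0$ case above.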
To separate the distribution $p$ on the manifold from its intrinsic geometry, use the kernel $\tilde{k}_\varepsilon(x, y) = k_\varepsilon(x, y)/(p_\varepsilon(x) p_\varepsilon(y))$ where $p_\varepsilon(x) = K_\varepsilon 1$. This leads to an averaging operator $\bar{A}_\varepsilon f(x) = \tilde{K}_\varepsilon f(x) / \tilde{v}_\varepsilon^2 (x)$ where $\tilde{v}_\varepsilon^2 (x) = \tilde{K}_\varepsilon 1$. Define a Laplace operator $\Delta_\varepsilon = (I - \bar{A}_\varepsilon) / \varepsilon$ acting on the subspaces $E_K$ of $L^2(\Gamma)$; its limit operator $\lim_{\varepsilon \to 0} \Delta_\varepsilon \equiv \Delta_0 = \Delta / c$ is a multiple of the Laplace-Beltrami operator. The limit operator of $\bar{A}_\varepsilon^{t/\varepsilon}$ is the Neumann heat operator $e^{-t \Delta_0}$; in other words, the Markov kernel $\bar{a}_\varepsilon^{(t/\varepsilon)}(x, y)$ with small $\varepsilon$, close to a fine-scale Gaussian kernel, approximates the (symmetric) Neumann heat kernel $p_t (x, y)$, without requiring knowledge of the boundary. The averaging operator $\bar{A}_\varepsilon$ is compact and its eigendecomposition can be written as $\bar{A}_\varepsilon = \sum_{i \in \mathbb{N}} \lambda_{\varepsilon, i} P_{\varepsilon, i}$, where $P_{\varepsilon, i}$ is the orthogonal projector on the eigenspace corresponding to eigenvalue $\lambda_{\varepsilon, i}$. If the eigendecomposition of the Laplace-Beltrami operator is $\Delta = \sum_{i \in \mathbb{N}} \nu_i^2 P_i$, where $P_i$ is the orthogonal projector on the eigenspace corresponding to eigenvalue $\nu_i^2$, then the eigenfunctions and eigenvalues of $\bar{A}_\varepsilon^{t/\varepsilon}$ converge to those of the heat operator $e^{-t\Delta}$: $\lim_{\varepsilon \to 0} \lambda_{\varepsilon, i}^{t/\varepsilon} = e^{-t\nu_i^2}$, $\lim_{\varepsilon \to 0} P_{\varepsilon, i} = P_i$. Let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal eigenfunctions of the Laplace-Beltrami operator (also of the heat kernel) on the manifold $\Gamma$; then the eigendecomposition of the heat kernel is $p_t(x, y) = \sum_{i \in \mathbb{N}} e^{-t \nu_i^2} \phi_i(x) \phi_i(y)$. Although $\{\phi_i\}_{i \in \mathbb{N}}$ is usually viewed as a Hilbert basis of $L^2(\Gamma)$, it also forms a set of coordinates on the submanifold $\Gamma$ via the diffusion map $\Phi$, which preserves the heat diffusion distance to a given accuracy with dimension $k \ge d$.
Numerical procedure for approximating the Neumann heat diffusion (note that skipping steps 3-4 yields the weighted graph Laplacian instead); a sketch is given below.
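The numbered steps are not reproduced here; the following is a minimal reconstruction from the preceding paragraphs and the anisotropic family of @Coifman2006a, in which the Gaussian kernel, the sample, $\varepsilon$, and the step numbering are my assumptions. Setting `alpha=0` (equivalently, skipping steps 3-4) gives the weighted graph Laplacian normalization; `alpha=1` targets the Laplace-Beltrami operator / Neumann heat diffusion; `alpha=1/2` a Fokker-Planck diffusion.

```python
import numpy as np

def diffusion_eigenfunctions(X, eps=0.05, alpha=1.0, n_funcs=4):
    """Sketch: approximate eigenfunctions/eigenvalues of the diffusion operator
    built from a sample X of a submanifold (alpha=1 aims at Laplace-Beltrami)."""
    # 1. Rotation-invariant kernel k_eps(x, y) = h(||x - y||^2 / eps), here Gaussian.
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / eps)

    # 2. Density estimate p_eps = K_eps 1.
    p = K.sum(axis=1)

    # 3-4. Density normalization k~ = k / (p_eps(x)^alpha p_eps(y)^alpha);
    #      skipping these steps (alpha = 0) yields the weighted graph Laplacian.
    K = K / np.outer(p ** alpha, p ** alpha)

    # 5. Symmetric Markov normalization a = k~ / (v(x) v(y)), with v^2 = K~ 1.
    v = np.sqrt(K.sum(axis=1))
    A_sym = K / np.outer(v, v)

    # 6. Eigendecomposition; right eigenfunctions of the Markov kernel are phi_i / v.
    lam, phi = np.linalg.eigh(A_sym)
    order = np.argsort(lam)[::-1][: n_funcs + 1]
    lam, phi = lam[order], phi[:, order]
    psi = phi / v[:, None]
    return lam[1:], psi[:, 1:]         # drop the trivial constant eigenfunction

# Illustrative check: on a circle, the leading eigenfunctions should resemble
# cos/sin of the arclength coordinate, even under non-uniform sampling.
theta = np.random.default_rng(0).beta(2, 5, 400) * 2 * np.pi   # non-uniform sampling density
X = np.c_[np.cos(theta), np.sin(theta)]
lam, psi = diffusion_eigenfunctions(X, eps=0.05, alpha=1.0)
```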
(for differential and dynamical systems)
Diffusion processes are also useful for harmonic analysis of functions on data sets.
Geometric harmonics are a set of functions that allow out-of-sample extension of empirical functions on the data set.
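A sketch of the extension idea via a Nyström-type formula, where the Gaussian kernel, $\varepsilon$, the truncation rule, and the function names are assumptions rather than the paper's prescription:

```python
import numpy as np

def extend_function(X_on, f_on, X_off, eps=0.5, n_harmonics=20, cond=1e-6):
    """Out-of-sample extension of an empirical function f_on, defined on the data
    set X_on, to new points X_off, using eigenfunctions of a kernel on X_on
    (a sketch of the geometric-harmonics idea)."""
    # Kernel restricted to the data set, and its leading eigenfunctions.
    K_on = np.exp(-((X_on[:, None, :] - X_on[None, :, :]) ** 2).sum(-1) / eps)
    lam, psi = np.linalg.eigh(K_on)
    order = np.argsort(lam)[::-1][:n_harmonics]
    lam, psi = lam[order], psi[:, order]
    keep = lam > cond * lam[0]                     # drop ill-conditioned directions
    lam, psi = lam[keep], psi[:, keep]

    # Project f onto the harmonics on the set, then extend each harmonic off the set:
    # psi_j(x_new) = (1/lambda_j) * sum_y k(x_new, y) psi_j(y).
    coeffs = psi.T @ f_on
    K_off = np.exp(-((X_off[:, None, :] - X_on[None, :, :]) ** 2).sum(-1) / eps)
    return (K_off @ psi / lam) @ coeffs
```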
Geometric harmonics provide a simple solution to the relaxed distortion problem, an embedding $\Psi$ that is bi-Lipschitz with a small distortion.
Atlas Computation:
Eigenfunction selection:
Intrinsic geometry: Fourier analysis of the manifold (eigenfunctions of the Laplace-Beltrami operator). Extrinsic geometry: Fourier analysis of the ambient space.
Restriction.
Extension (a version of the Heisenberg uncertainty principle): extend the eigenfunctions to band-limited functions of band $\mathcal{O}(\nu_i)$, localized in a tube of radius $\mathcal{O}(1/\nu_i)$ around the manifold.
(Extension of $f = 1$ for description of the manifold?)
Multiscale extension: empirical functions are decomposed into frequency bands, and each band is extended to a certain distance so that it satisfies some version of the Heisenberg principle.
(Intrinsic and extrinsic diffusion.)
Multiscale extension scheme: