[@Lafon2004] uses the language of functional analysis.
The geometry of a set $Γ$ is a set of rules that describe the relationships between its elements. Intrinsic geometry is one that does not refer to a superset or the structures therein; extrinsic geometry is one that is induced from the geometry of a superset.
A dual perspective comes from functional analysis, via inverse problems such as inverse scattering, potential theory, and spectral geometry: the geometry of a set can be understood through the geometry of the space of functions on the set (and of the spaces of functionals and operators).
(Manifold learning.)
Spectral graph theory analyzes the spectrum (eigenvalues and eigenvectors) of a matrix representing a graph. Spectral clustering/partitioning: construct a matrix representation of the graph; eigendecompose the matrix and embed the vertices to a Euclidean space using one or more eigenvectors; group the points based on their coordinates.
The Laplacian matrix of a graph, aka graph Laplacian, is its degree matrix minus its adjacency matrix, $L = D - A$, where $D = \mathrm{diag}\{d_i\}_{i=1}^n$. The Laplacian matrix is a discrete analog of the Laplacian operator $Δ$ in multivariable calculus. The Laplacian matrix is symmetric and positive semi-definite, so its eigenvectors form an orthogonal basis and all its eigenvalues are non-negative. The (symmetric) normalized Laplacian is $L' = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$.
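A minimal sketch of this recipe using the normalized Laplacian (the Gaussian similarity kernel, its scale, and the two-blob toy data are assumed choices, not from the sources above):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, n_clusters=2, eps=0.5):
    """Cluster rows of X using the normalized Laplacian L' = I - D^{-1/2} A D^{-1/2}."""
    # Adjacency from a Gaussian similarity kernel (an assumed choice of weights).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / eps)
    np.fill_diagonal(A, 0.0)                    # no self-loops
    d = A.sum(axis=1)                           # degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(X)) - D_inv_sqrt @ A @ D_inv_sqrt
    # Embed each vertex using the eigenvectors of the smallest eigenvalues.
    eigvals, eigvecs = eigh(L_norm)
    embedding = eigvecs[:, :n_clusters]
    # Group the embedded points based on their coordinates.
    _, labels = kmeans2(embedding, n_clusters, minit="++")
    return labels

# Two noisy blobs: the low-eigenvalue coordinates separate them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
print(spectral_clustering(X))
```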
Diffusion processes (or random walks, Markov processes) can be useful for the geometric descriptions of data sets. @Coifman2006a approximates (anisotropic) diffusion processes $\exp(-t H_\alpha)$ on submanifolds, where $H_\alpha = \Delta - \Delta(p^{1-\alpha}) / p^{1-\alpha}$ is a symmetric Schrödinger operator and parameter $\alpha \in \mathbb{R}$, with special cases: $\alpha = 1$, heat diffusion; $\alpha = 1/2$, a Fokker–Planck diffusion; $\alpha = 0$, normalized graph Laplacian with isotropic weights. In comparison, "kernel eigenmaps" are diffusion processes $\exp(-t Q_2^{-1} Q_1)$.
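On a finite sample, the $\alpha$-dependent normalization of [@Coifman2006a] amounts to dividing the kernel by a density estimate raised to the power $\alpha$ before row-normalizing; a minimal sketch (the Gaussian kernel and scale are assumed choices):

```python
import numpy as np

def anisotropic_markov_matrix(X, eps=0.5, alpha=1.0):
    """Markov matrix of the alpha-normalized kernel: alpha=1 ~ heat diffusion,
    alpha=1/2 ~ Fokker-Planck diffusion, alpha=0 ~ normalized graph Laplacian weights."""
    K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)
    q = K.sum(axis=1)                                 # kernel density estimate
    K_alpha = K / np.outer(q**alpha, q**alpha)        # divide out the density
    return K_alpha / K_alpha.sum(axis=1, keepdims=True)   # row-stochastic matrix
```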
An integral kernel $k(x,y)$ defines an integral transform/operator $K$ for functions on measure space $(\Gamma, \mathrm{d}\mu)$, such that $K f(x) = \int_\Gamma k(x, y) f(y)~\mathrm{d}\mu(y)$. Here we call an integral kernel admissible, if it is symmetric, positivity-preserving, and positive semi-definite: $k(x, y) = k(y, x)$; $k \ge 0$; $\langle f, K f \rangle \ge 0$. Admissible kernels can be designed to represent a local metric (e.g. degree of similarity) or exhibit a local behavior.
A Markov/stochastic kernel $\tilde{a}(x, y)$ defines a Markov process (an averaging operator) $\tilde{A}$, where $\tilde{a}(x, \cdot)$ is a probability measure. Every positivity-preserving integral kernel can be transformed into a Markov kernel $\tilde{a}(x, y)= k(x, y) / v^2(x)$, where $v^2(x) = K 1$. Assume the Markov kernel has eigendecomposition $\tilde{a}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \psi_i(x) \hat{\psi}_i(y)$. The right eigenvectors $\{\psi_i\}$ (and left eigenvectors $\{\hat{\psi}_i\}$) need not be orthogonal.
A diffusion kernel $a(x, y)$ is the symmetric conjugate of a Markov kernel $\tilde{a}$ by $v$, $a(x, y) = v(x) \tilde{a}(x, y) / v(y)$. The diffusion operator $A$ with kernel $a$ is positive semi-definite, bounded, with the largest eigenvalue 1 and a corresponding eigenfunction $v(x)$. Assume that $A$ is compact (not just bounded), then its spectrum is discrete and its eigendecomposition can be written as $a(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \phi_i(x) \phi_i(y)$, where $A \phi_i(x) = \lambda_i \phi_i(x)$. The diffusion operator $A$ and the averaging operator $\tilde{A}$ have the same spectrum $\lambda$, and the eigenfunctions of $A$ and the right/left eigenfunctions of $\tilde{A}$ are related by conjugation by $v$: $\psi_i = \mathrm{diag}(v^{-1}) \phi_i$, $\hat{\psi}_i = \mathrm{diag}(v) \phi_i$. The t-step diffusion operator $A^t$ aggregates the local information, and its kernel $a^{(t)}$ has eigendecomposition $a^{(t)}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t \phi_i(x) \phi_i(y)$.
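A minimal finite-sample sketch of these relations (Gaussian kernel and toy data assumed): the row-stochastic matrix $\tilde{A} = \mathrm{diag}(v^2)^{-1} K$ and its symmetric conjugate $A = \mathrm{diag}(v)^{-1} K \, \mathrm{diag}(v)^{-1}$ share the same spectrum, the top eigenvalue is 1 with eigenvector $v$, and the eigenvectors are related by conjugation by $v$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))                       # toy data set
eps = 1.0                                          # assumed scale parameter

K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)  # admissible kernel
v2 = K.sum(axis=1)                                 # v^2 = K 1
v = np.sqrt(v2)

A_markov = K / v2[:, None]                         # Markov kernel a~(x,y) = k(x,y) / v^2(x)
A_diff = K / np.outer(v, v)                        # diffusion kernel a(x,y) = v(x) a~(x,y) / v(y)

# A_diff is symmetric PSD; its largest eigenvalue is 1 with eigenfunction v.
lam, phi = eigh(A_diff)
lam, phi = lam[::-1], phi[:, ::-1]                 # sort eigenvalues in decreasing order
print(np.isclose(lam[0], 1.0))
print(np.allclose(np.abs(phi[:, 0]), v / np.linalg.norm(v)))

# Same spectrum as the Markov matrix; right eigenvectors via conjugation by v.
lam_markov = np.sort(np.linalg.eigvals(A_markov).real)[::-1]
print(np.allclose(lam_markov, lam))
psi = phi / v[:, None]                             # psi_i = diag(v^-1) phi_i
print(np.allclose(A_markov @ psi, psi * lam))
```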
The diffusion metric/distance $D_t(x, y)$ at time step $t$ is defined as $D_t^2 (x, y) \equiv a^{(t)}(x, x) + a^{(t)}(y, y) - 2 a^{(t)}(x, y)$, which is equivalent to $D_t^2 (x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t [\phi_i(x) - \phi_i(y)]^2$ and $D_t^2(x,y) = \| a^{(t/2)}(x,\cdot) - a^{(t/2)}(y,\cdot) \|^2$. Thus $D_t$ is a semi-metric on the set, which becomes a metric if the diffusion kernel $a$ is positive definite. Unlike the geodesic distance, the diffusion distance is an average over all paths connecting two points, and thus robust to noise and topological short-circuits. A diffusion map $\Phi: \Gamma \to l^2(\mathbb{N})$ is a mapping of the data to a Euclidean space, based on the eigenfunctions of the diffusion kernel: $\Phi(x) = \left( \phi_i(x) \right)_{i \in \mathbb{N}}$. Diffusion map $\Phi$ defines an embedding of the data that preserves the diffusion distance $D_t$ via a weighted Euclidean metric $D_t^2(x,y) = \langle \Phi(x) - \Phi(y), \Lambda^t (\Phi(x) - \Phi(y)) \rangle$. A truncated diffusion map $\Phi_k: \Gamma \to \mathbb{R}^k$ with $\Phi_k(x) = \left( \phi_i(x) \right)_{i=1}^k$ preserves the diffusion distance with certain accuracy. The dimension $k$ of the new representation depends on the diffusion process, and is in general greater than the dimension $d$ of the set/manifold.
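The three expressions for $D_t^2$ and the embedding identity can be checked directly on a finite sample (a sketch with assumed kernel, scale, and time step):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
eps, t = 1.0, 4                                    # assumed scale and (even) time step

K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)
v = np.sqrt(K.sum(axis=1))
A = K / np.outer(v, v)                             # symmetric diffusion kernel a(x,y)
lam, phi = eigh(A)

A_t = phi @ np.diag(lam**t) @ phi.T                # a^(t) = sum_i lam_i^t phi_i phi_i^T
x, y = 3, 17                                       # two sample points

# D_t^2(x,y) = a^(t)(x,x) + a^(t)(y,y) - 2 a^(t)(x,y)
D2_kernel = A_t[x, x] + A_t[y, y] - 2 * A_t[x, y]
# ... = sum_i lam_i^t (phi_i(x) - phi_i(y))^2
D2_spectral = np.sum(lam**t * (phi[x] - phi[y]) ** 2)
# ... = || a^(t/2)(x,.) - a^(t/2)(y,.) ||^2
A_half = phi @ np.diag(lam ** (t // 2)) @ phi.T
D2_paths = np.sum((A_half[x] - A_half[y]) ** 2)
print(np.allclose(D2_kernel, D2_spectral), np.allclose(D2_kernel, D2_paths))

# Diffusion map Phi(x) = (phi_i(x))_i preserves D_t via the weighted metric Lambda^t.
diff = phi[x] - phi[y]
print(np.allclose(D2_kernel, diff @ (lam**t * diff)))
```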
(Inverse mapping.)
Let $\Gamma$ be a compact smooth submanifold of dimension $d$ in $\mathbb{R}^n$, i.e. a Riemannian manifold with the Riemannian metric induced from the ambient space $\mathbb{R}^n$. Let $\mu$ be a probability measure on $\Gamma$ and $\mathrm{d}x$ the Riemannian measure on $\Gamma$; the probability density $p(x)$ is then given by $p(x) = \mathrm{d}\mu(x) / \mathrm{d}x$. Let $\{e_i\}_{i=1}^d$ be an orthonormal basis of the tangent plane $T_x$ to $Γ$ at $x$; the exponential map $\mathrm{exp}_x$ applied to the coordinates of the tangent plane forms a chart on $Γ$ around $x$, which provides normal coordinates $(s_i)_{i=1}^d$. The Laplace-Beltrami operator $\Delta$ on the Riemannian manifold can be written as $\Delta f = - \sum_{i=1}^d \partial^2 f / \partial s_i^2$, where $f \in C^\infty(\Gamma)$ is a smooth function on the manifold. The Neumann heat operator on the manifold (heat diffusion on the manifold with the Neumann boundary condition) is $e^{−t \Delta}$, which corresponds to the Neumann heat kernel $p_t (x, y)$ when viewed as an integral operator.
Consider rotation-invariant kernels $k_\varepsilon (x, y) = h(\|x-y\|^2 / \varepsilon)$ with scale parameter $\varepsilon$, where $h$ is smooth and decays exponentially, and $\|\cdot\|$ is the Euclidean metric. As $\varepsilon$ goes to 0, the local geometry specified by $k_\varepsilon$ coincides with that of the manifold. Given an averaging operator $\tilde{A}_\varepsilon f(x) = K_\varepsilon f(x) / v_\varepsilon^2 (x)$ where $v_\varepsilon^2 (x) = K_\varepsilon 1$, the weighted graph Laplacian operator is $\tilde{\Delta}_\varepsilon = (I - \tilde{A}_\varepsilon) / \varepsilon$. On certain subspaces $E_K$ of $L^2(\Gamma)$, as $\varepsilon$ goes to 0, the limit operator of $\tilde{\Delta}_\varepsilon$ is $H f = c (\Delta f - 2 \langle \nabla p / p , \nabla f \rangle)$, where $c = m_2 / (2 m_0)$, $m_0 = \int_{\mathbb{R}^d} h(\|u\|^2)~\mathrm{d}u$, and $m_2 = \int_{\mathbb{R}^d} u_1^2 h(\|u\|^2)~\mathrm{d}u$. Under a conjugation by the density $p$, we get $p H(g/p) = c (\Delta g - g \Delta p / p)$, where $g = p f$. Thus, when the density $p$ is uniform, the weighted graph Laplacian $\tilde{\Delta}_\varepsilon$ converges to (a multiple of) the Laplace-Beltrami operator $\Delta$ on the manifold [@Belkin2003]; but when the density is non-uniform, such reconstruction does not hold.
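With the sign convention $\Delta f = - \sum_i \partial^2 f / \partial s_i^2$ above and the product rule $\Delta(uw) = u \Delta w + w \Delta u - 2 \langle \nabla u, \nabla w \rangle$, the conjugation identity can be checked directly:

$$
\begin{aligned}
\Delta\!\left(\frac{g}{p}\right) &= \frac{\Delta g}{p} - g\,\frac{\Delta p}{p^2} - \frac{2 g\,\|\nabla p\|^2}{p^3} + \frac{2 \langle \nabla g, \nabla p \rangle}{p^2}, \\
-\frac{2}{p}\left\langle \nabla p, \nabla\!\left(\frac{g}{p}\right) \right\rangle &= -\frac{2 \langle \nabla p, \nabla g \rangle}{p^2} + \frac{2 g\,\|\nabla p\|^2}{p^3}, \\
p\, H\!\left(\frac{g}{p}\right) = c\,p \left[ \Delta\!\left(\frac{g}{p}\right) - \frac{2}{p}\left\langle \nabla p, \nabla\!\left(\frac{g}{p}\right) \right\rangle \right] &= c \left( \Delta g - g\,\frac{\Delta p}{p} \right),
\end{aligned}
$$

i.e. the $\alpha = 0$ Schrödinger operator $H_0 = \Delta - \Delta p / p$ of [@Coifman2006a], applied to $g$ and scaled by $c$.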
To separate the distribution $p$ on the manifold from its intrinsic geometry $k_\varepsilon$, use kernel $\tilde{k}_\varepsilon(x, y) = k_\varepsilon(x, y)/(p_\varepsilon(x) p_\varepsilon(y))$ where $p_\varepsilon(x) = K_\varepsilon 1$. This leads to an averaging operator $\bar{A}_\varepsilon f(x) = \tilde{K}_\varepsilon f(x) / \tilde{v}_\varepsilon^2 (x)$ where $\tilde{v}_\varepsilon^2 (x) = \tilde{K}_\varepsilon 1$. Define a Laplace operator $\Delta_\varepsilon = (I - \bar{A}_\varepsilon) / \varepsilon$ acting on the subspaces $E_K$ of $L^2(\Gamma)$; its limit operator $\lim_{\varepsilon \to 0} \Delta_\varepsilon \equiv \Delta_0 = c \Delta$ is a multiple of the Laplace-Beltrami operator (and equals $\Delta$ when the kernel is normalized so that $c = 1$). The limit operator of $\bar{A}_\varepsilon^{t/\varepsilon}$ is the Neumann heat operator $e^{−t \Delta_0}$; in other words, the Markov kernel $\bar{a}_\varepsilon^{(t/\varepsilon)}(x, y)$ with small $\varepsilon$, close to a fine-scale Gaussian kernel, approximates the (symmetric) Neumann heat kernel $p_t (x, y)$ without requiring knowledge of the boundary. The averaging operator $\bar{A}_\varepsilon$ is compact and its eigendecomposition can be written as $\bar{A}_\varepsilon = \sum_{i \in \mathbb{N}} \lambda_{\varepsilon, i} P_{\varepsilon, i}$, where $P_{\varepsilon, i}$ is the orthogonal projector on the eigenspace corresponding to eigenvalue $\lambda_{\varepsilon, i}$. If the eigendecomposition of the Laplace-Beltrami operator is $\Delta = \sum_{i \in \mathbb{N}} \nu_i^2 P_i$, where $P_i$ is the orthogonal projector on the eigenspace corresponding to eigenvalue $\nu_i^2$, then the eigenfunctions and eigenvalues of $\bar{A}_\varepsilon^{t/\varepsilon}$ converge to those of the heat operator $e^{-t\Delta}$: $\lim_{\varepsilon \to 0} \lambda_{\varepsilon, i}^{t/\varepsilon} = e^{-t\nu_i^2}$, $\lim_{\varepsilon \to 0} P_{\varepsilon, i} = P_i$. Let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal eigenfunctions of the Laplace-Beltrami operator (also of the heat kernel) on the manifold $\Gamma$; then the eigendecomposition of the heat kernel is $p_t(x, y) = \sum_{i \in \mathbb{N}} e^{-t \nu_i^2} \phi_i(x) \phi_i(y)$. Although $\{\phi_i\}_{i \in \mathbb{N}}$ is usually viewed as a Hilbert basis of $L^2(Γ)$, it also forms a set of coordinates on the submanifold $Γ$ via diffusion maps $\Phi$, which accurately preserve the heat diffusion distance with dimension $k \ge d$.
Numerical procedure for approximate Neumann heat diffusion: (Note that skipping steps 3-4 provides the weighted graph Laplacian.)
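A minimal sketch of this construction (the Gaussian kernel, the non-uniformly sampled circle, and the parameter values are assumed choices; the density normalization is marked, and skipping it gives the weighted graph Laplacian $\tilde{\Delta}_\varepsilon$ instead):

```python
import numpy as np

# Non-uniformly sampled unit circle: an assumed toy submanifold of R^2.
u = np.linspace(0, 2 * np.pi, 400, endpoint=False)
theta = u + 0.4 * np.sin(u)                         # smooth non-uniform density
X = np.column_stack([np.cos(theta), np.sin(theta)])
eps, t = 0.02, 0.1                                  # assumed scale and diffusion time

# Rotation-invariant kernel k_eps(x,y) = exp(-|x-y|^2 / eps).
K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)

# Density normalization k~ = k / (p_eps(x) p_eps(y)); skipping these two lines
# yields the weighted graph Laplacian instead of the Laplace-Beltrami limit.
p_eps = K.sum(axis=1)
K = K / np.outer(p_eps, p_eps)

# Markov normalization and the Laplace operator Delta_eps = (I - A_bar) / eps.
A_bar = K / K.sum(axis=1, keepdims=True)
Delta_eps = (np.eye(len(X)) - A_bar) / eps

# Low eigenvalues of Delta_eps approximate those of the Laplace-Beltrami operator on
# the circle (0, 1, 1, 4, 4, 9, 9, ... up to a kernel-dependent constant), and
# A_bar^(t/eps) approximates the Neumann heat operator exp(-t Delta_0).
lam = np.sort(np.linalg.eigvals(Delta_eps).real)[:7]
print(lam / lam[1])                                 # ratios should be near 0, 1, 1, 4, 4, 9, 9
heat_t = np.linalg.matrix_power(A_bar, int(round(t / eps)))
print(np.sort(np.linalg.eigvals(heat_t).real)[::-1][:3], np.exp(-t * lam[:3]))
```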
(for differential and dynamical systems)
Diffusion processes are also useful for harmonic analysis of functions on data sets.
Geometric harmonics is a set of functions that allows out-of-sample extension of empirical functions on the data set.
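The mechanism can be sketched as a Nyström-type extension: eigenfunctions of the kernel restricted to the data set extend off the set through the kernel itself, and an empirical function is extended via its expansion in the well-conditioned eigenfunctions (a minimal single-scale sketch; the kernel, scale $\varepsilon$, and cutoff $\delta$ are assumed parameters):

```python
import numpy as np
from scipy.linalg import eigh

def extend_function(X, f, X_new, eps=1.0, delta=1e-3):
    """Extend f, sampled on X, to new points X_new via kernel eigenfunctions
    (a Nystrom-type sketch; eps and the cutoff delta are assumed parameters)."""
    K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)
    lam, psi = eigh(K)
    keep = lam > delta * lam.max()        # keep well-conditioned eigenfunctions only
    lam, psi = lam[keep], psi[:, keep]
    # Extend each eigenfunction off the set: psi_j(x) = (1/lam_j) sum_y k(x,y) psi_j(y).
    K_new = np.exp(-np.sum((X_new[:, None] - X[None, :]) ** 2, axis=-1) / eps)
    psi_new = K_new @ psi / lam
    # Expand f in the retained eigenfunctions and evaluate the extension.
    coeffs = psi.T @ f
    return psi_new @ coeffs

# Toy usage: extend a smooth empirical function from scattered samples to new points.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
f = np.sin(3 * X[:, 0]) * X[:, 1]
X_new = rng.uniform(-1, 1, (5, 2))
print(extend_function(X, f, X_new))
print(np.sin(3 * X_new[:, 0]) * X_new[:, 1])      # compare with the true values
```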
Geometric harmonics provide a simple solution to the relaxed distortion problem: an embedding $\Psi$ that is bi-Lipschitz with small distortion.
Atlas Computation:
Eigenfunction selection:
Intrinsic geometry corresponds to Fourier analysis on the manifold (eigenfunctions of the Laplace-Beltrami operator); extrinsic geometry corresponds to Fourier analysis of the ambient space.
Restriction.
Extension (a version of the Heisenberg uncertainty principle): each eigenfunction extends to a band-limited function of band $\mathcal{O}(\nu_i)$, localized in a tube of radius $\mathcal{O}(1/\nu_i)$ around the manifold.
(Extension of f = 1 for description of the manifold?)
Multiscale extension: empirical functions are decomposed into frequency bands, and each band is extended to a certain distance so that it satisfies some version of the Heisenberg principle.
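A rough sketch of one way to realize such a scheme (an assumed implementation iterating the single-scale extension above on the residual at successively halved scales; parameter names and values are assumptions, not the exact procedure of the source):

```python
import numpy as np
from scipy.linalg import eigh

def multiscale_extension(X, f, X_new, eps0=4.0, delta=1e-3, tol=1e-3, max_levels=8):
    """At each scale, extend the part of f captured by well-conditioned kernel
    eigenfunctions, then recurse on the residual at half the scale (assumed parameters)."""
    residual, extension, eps = f.copy(), np.zeros(len(X_new)), eps0
    for _ in range(max_levels):
        K = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1) / eps)
        lam, psi = eigh(K)
        keep = lam > delta * lam.max()             # band limited by the condition cutoff
        lam, psi = lam[keep], psi[:, keep]
        coeffs = psi.T @ residual                  # band of the residual at this scale
        K_new = np.exp(-np.sum((X_new[:, None] - X[None, :]) ** 2, axis=-1) / eps)
        extension += (K_new @ psi / lam) @ coeffs  # extend this band off the set
        residual -= psi @ coeffs                   # higher-frequency part remains
        if np.linalg.norm(residual) <= tol * np.linalg.norm(f):
            break
        eps /= 2.0                                 # finer scale, shorter extension range
    return extension
```

Each pass captures the band of the residual that is well represented at the current scale and extends it; the remaining high-frequency part is handled at finer scales, consistent with the band/localization trade-off above.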
(Intrinsic and extrinsic diffusion.)
Multiscale extension scheme: