This article is about Diffusion processes and harmonic analysis on manifolds, in continuous and finite versions.
Geometry of a set Γ is a set of rules that describe the relationship between its elements. Intrinsic geometry (本征几何) is one that does not refer to a superset or the structures therein. Extrinsic geometry (表征几何) is one that is induced from the geometry of a superset. The geometry of a set can be understood through the geometry of the space of functions on the set (and spaces of functionals and operators). This is a dual perspective in functional analysis, from inverse problems such as inverse scattering, potential theory, and spectral geometry. [@Lafon2004] uses the language of functional analysis.
Integral kernel k(x,y) defines an integral transform or integral operator K for functions on measure space (\Gamma, \mathrm{d}\mu), such that K f(x) = \int_\Gamma k(x, y) f(y)~\mathrm{d}\mu(y). Here we call an integral kernel admissible, if it is symmetric, positivity-preserving, and positive semi-definite: k(x, y) = k(y, x); k \ge 0; \langle f, K f \rangle \ge 0. Admissible kernels can be designed to represent a local metric (e.g. degree of similarity) or exhibit a local behavior.
Markov kernel or stochastic kernel \tilde{a}(x, y) defines a Markov process (an averaging operator) \tilde{A}, where \tilde{a}(x, \cdot) is a probability measure. Every positivity-preserving integral kernel can be transformed into a Markov kernel \tilde{a}(x, y)= k(x, y) / v^2(x), where v^2(x) = K 1. Assume the Markov kernel has eigendecomposition \tilde{a}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \psi_i(x) \hat{\psi}_i(y). The right eigenvectors \{\psi_i\} (and left eigenvectors \{\hat{\psi}_i\}) need not be orthogonal.
Diffusion kernel a(x, y) is the symmetric conjugate of a Markov kernel \tilde{a} by v, a(x, y) = v(x) \tilde{a}(x, y) / v(y). Diffusion operator A with kernel a is positive semi-definite, bounded, with the largest eigenvalue 1 and a corresponding eigenfunction v(x). Assume that A is compact (not just bounded), then its spectrum is discrete and its eigendecomposition can be written as a(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \phi_i(x) \phi_i(y), where A \phi_i(x) = \lambda_i \phi_i(x). The diffusion operator A and the averaging operator \tilde{A} have the same spectrum \lambda, and the eigenfunctions of A and the right/left eigenfunctions of \tilde{A} are related by conjugation by v: \psi_i = \mathrm{diag}(v^{-1}) \phi_i, \hat{\psi}_i = \mathrm{diag}(v) \phi_i. t-step diffusion operator A^t aggregates the local information, and its kernel, the t-step diffusion kernel a^{(t)}, has eigendecomposition a^{(t)}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t \phi_i(x) \phi_i(y).
Given \Gamma is a compact smooth submanifold of dimension d in \mathbb{R}^n, i.e. a Riemannian manifold with Riemannian metric induced from the ambient space \mathbb{R}^n. Let \mu be a probability measure on \Gamma and \mathrm{d}x the Riemannian measure on \Gamma, then probability density p(x) is given by p(x) = \mathrm{d}\mu(x) / \mathrm{d}x. Let \{e_i\}_{i=1}^d be an orthonormal basis of the tangent plane T_x to Γ at x, and the exponential map \mathrm{exp}_x of the coordinates of the tangent plane forms a chart on Γ around x, which provides normal coordinates (s_i)_{i=1}^d. Laplace-Beltrami operator \Delta on the Riemannian manifold can be written as \Delta f = - \sum_{i=1}^d \partial^2 f / \partial s_i^2, where f \in C^\infty(\Gamma) is a smooth function on the manifold. Neumann heat operator on the manifold (heat diffusion on the manifold with the Neumann boundary condition) is e^{−t \Delta}, which corresponds to the Neumann heat kernel p_t (x, y) when seen as an integral operator.
For the Euclidean n-space, the heat kernel is the Gaussian probability density function with scalar variance twice the diffusion time: \bar{p}_t (d) = (4 \pi t)^{-n/2} e^{- d^2 / (4t)}, where distance d = |x - y|; in short, \bar{p}_t = \phi(0, 2 t I_n), where \phi is the PDF of a Gaussian random vector.
For the n-sphere, the heat kernel is a function of the spherical distance \theta = \arccos \langle x, y \rangle. It does not have a closed form, but can be written in series approximations. The heat kernel can be bounded as [@Nowak2019]: given T > 0, \forall \theta \in [0, \pi], \forall t \in (0, T], \mathring{p}_t(\theta) \asymp \mathcal{O}\left((t+ \pi - \theta)^{-(n-1)/2} t^{-n/2} e^{-\theta^2 / (4t)}\right), where the constants depend only on n and T; in short, \mathring{p}_t(\theta) \asymp \mathcal{O}\left((t+ \pi - \theta)^{-(n-1)/2} \bar{p}_t (\theta) \right). Note that \asymp denotes order equivalence.
For the hyperbolic n-space, the heat kernel is [@Davies1988]: (1) if n = 2m + 1, \breve p_t (d) = \frac{(-1)^m e^{-(n-1)^2 t / 4}}{\sqrt{(2\pi)^{n} (2 t)}} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m e^{- d^2 / (4t)}; in short, \breve p_t (d) = (-2 t)^{m} e^{-m^2 t} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m \bar p_t (d); (2) if n = 2m + 2, \breve p_t (d) = \frac{(-1)^m e^{-(n-1)^2 t / 4}}{\sqrt{(2\pi)^{n+1} (4 t^3)}} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m \int_d^\infty \frac{s e^{- s^2 / (4t)}}{\sqrt{\cosh s - \cosh d}} \text{d}s. The heat kernels satisfy the recurrence relation: \breve p_{n+2} = \frac{-e^{-nt}}{2\pi} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right) \breve p_n.
Diffusion processes (or random walks, Markov processes) can be useful for the geometric descriptions of data sets. [@Coifman2006a] approximates (anisotropic) diffusion processes \exp(-t H_\alpha) on submanifolds, where H_\alpha = \Delta - \Delta(p^{1-\alpha}) / p^{1-\alpha} is a symmetric Schrödinger operator and parameter \alpha \in \mathbb{R}, with special cases: \alpha = 1, heat diffusion; \alpha = 1/2, a Fokker–Planck diffusion; \alpha = 0, normalized graph Laplacian with isotropic weights. In comparison, "kernel eigenmaps" are diffusion processes \exp(-t Q_2^{-1} Q_1).
Diffusion distance or diffusion metric D_t(x, y) at time step t is the semi-metric defined as: D_t^2 (x, y) \equiv a^{(t)}(x, x) + a^{(t)}(y, y) - 2 a^{(t)}(x, y), which is equivalent to D_t^2 (x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t [\phi_i(x) - \phi_i(y)]^2 and D_t^2(x,y) = \| a^{(t/2)}(x,\cdot) - a^{(t/2)}(y,\cdot) \|^2. The diffusion distance becomes a metric if the diffusion kernel a is positive definite. Unlike the geodesic distance, the diffusion distance is an average over all paths connecting two points, and thus robust to noise and topological short-circuits.
Diffusion map \Phi: \Gamma \to l^2(\mathbb{N}) is a mapping of the data to a Euclidean space, based on the eigenfunctions of the diffusion kernel: \Phi(x) = \left( \phi_i(x) \right)_{i \in \mathbb{N}}. Diffusion map \Phi defines an embedding of the data that preserves the diffusion distance D_t via a weighted Euclidean metric D_t^2(x,y) = \langle \Phi(x) - \Phi(y), \Lambda^t (\Phi(x) - \Phi(y)) \rangle. A truncated diffusion map \Phi_k: \Gamma \to \mathbb{R}^k with \Phi_k(x) = \left( \phi_i(x) \right)_{i=1}^k preserves the diffusion distance with certain accuracy. The dimension k of the new representation depends on the diffusion process, and is in general greater than the dimension d of the set/manifold.
(Inverse mapping.)
Spectral graph theory analyzes the spectrum (eigenvalues and eigenvectors) of a matrix representing a graph. For example, spectral clustering or spectral partitioning: construct a matrix representation of the graph; eigendecompose the matrix and embed the vertices to a Euclidean space using one or more eigenvectors; group the points based on their coordinates. Laplacian matrix of a graph, aka graph Laplacian, is its degree matrix minus its adjacency matrix, L = D - A, where D = \mathrm{diag}\{d_i\}_{i=1}^n. The Laplacian matrix is a discrete analog of the Laplacian operator Δ in multivariable calculus. The Laplacian matrix is symmetric, so its eigenvectors can form an orthogonal basis; additionally, all eigenvalues are non-negative. Symmetric normalized graph Laplacian is L' = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}. Consider rotation-invariant kernels k_\varepsilon (x, y) = h(\|x-y\|^2 / \varepsilon) with scale parameter \varepsilon, where h is smooth and decays exponentially, and \|\cdot\| is the Euclidean metric. As \varepsilon goes to 0, the local geometry specified by k_\varepsilon coincide with that of the manifold. Given an averaging operator \tilde{A}_\varepsilon f(x) = K_\varepsilon f(x) / v_\varepsilon^2 (x) where v_\varepsilon^2 (x) = K_\varepsilon 1, the weighted graph Laplacian operator is \tilde{\Delta}_\varepsilon = (I - \tilde{A}_\varepsilon) / \varepsilon. On certain subspaces E_K of L^2(\Gamma), as \varepsilon goes to 0, the limit operator of \tilde{\Delta}_\varepsilon is H f = c (\Delta f + 2 \langle \nabla p / p , \nabla f \rangle), where c = m_2 / (2 m_0), m_0 = \int_{\mathbb{R}^d} h(\|u\|^2)~\mathrm{d}u, and m_2 = \int_{\mathbb{R}^d} u_i^2 h(\|u\|^2)~\mathrm{d}u. Under a conjugation by the density p, we get p H(g/p) = c (\Delta g - g \nabla p / p), where g = p f. Thus, when the density p is uniform, the weighted graph Laplacian \tilde{\Delta}_\varepsilon converges to (a multiple of) the Laplace-Beltrami operator \Delta on the manifold [@Belkin2003]; but when the density is non-uniform, such reconstruction does not hold.
To separate the distribution p on the manifold from its intrinsic geometry k_\varepsilon, use kernel \tilde{k}_\varepsilon(x, y) = k_\varepsilon(x, y)/(p_\varepsilon(x) p_\varepsilon(y)) where p_\varepsilon(x) = K_\varepsilon 1. This leads to an averaging operator \bar{A}_\varepsilon f(x) = \tilde{K}_\varepsilon f(x) / \tilde{v}_\varepsilon^2 (x) where \tilde{v}_\varepsilon^2 (x) = \tilde{K}_\varepsilon 1. Define a Laplace operator \Delta_\varepsilon = (I - \bar{A}_\varepsilon) / \varepsilon acting on the subspaces E_K of L^2(\Gamma), its limit operator \lim_{\varepsilon \to 0} \Delta_\varepsilon \equiv \Delta_0 = \Delta / c is a multiple of the Laplace-Beltrami operator. The limit operator of \bar{A}_\varepsilon^{t/\varepsilon} is the Neumann heat operator e^{−t \Delta_0}; in other words, Markov kernel \bar{a}_\varepsilon^{(t/\varepsilon)}(x, y) with small \varepsilon, close to a fine scale Gaussian kernel, approximates the (symmetric) Neumann heat kernel p_t (x, y), regardless of knowledge of the boundary. The averaging operator \bar{A}_\varepsilon is compact and its eigendecomposition can be written as \bar{A}_\varepsilon = \sum_{i \in \mathbb{N}} \lambda_{\varepsilon, i} P_{\varepsilon, i}, where P_{\varepsilon, i} is the orthogonal projector on the eigenspace corresponding to eigenvalue \lambda_{\varepsilon, i}. If the eigendecomposition of the Laplace-Beltrami operator is \Delta = \sum_{i \in \mathbb{N}} \nu_i^2 P_i, where P_i is the orthogonal projector on the eigenspace corresponding to eigenvalue \nu_i^2. Then the eigenfunctions and eigenvalues of \bar{A}_\varepsilon^{t/\varepsilon} converge to those of the heat operator e^{-t\Delta}: \lim_{\varepsilon \to 0} \lambda_{\varepsilon, i}^{t/\varepsilon} = e^{-t\nu_i^2}, \lim_{\varepsilon \to 0} P_{\varepsilon, i} = P_i. Let \{\phi_i\}_{i \in \mathbb{N}} be the orthonormal eigenfunctions of the Laplace-Beltrami operator (also of the heat kernel) on the manifold \Gamma, then the eigendecomposition of the heat kernel is p_t(x, y) = \sum_{i \in \mathbb{N}} e^{-t \nu_i^2} \phi_i(x) \phi_i(y). Although \{\phi_i\}_{i \in \mathbb{N}} is usually viewed as a Hilbert basis of L^2(Γ), it also forms a set of coordinates on the submanifold Γ via diffusion maps \Phi, which accurately preserves the heat diffusion distance with dimension k \ge d.
Numerical procedure for approximate Neumann heat diffusion: (Note that skipping steps 3-4 provides the weighted graph Laplacian.)
For differential and dynamical systems.
Diffusion processes are also useful for harmonic analysis of functions on data sets. Geometric harmonics is a set of functions that allows out-of-sample extension of empirical functions on the data set.
Geometric harmonics provide a simple solution to the relaxed distortion problem, an embedding \Psi that is bi-Lipschitz with a small distortion.
Atlas Computation:
Eigenfunction selection:
Intrinsic geometry, Fourier analysis of the manifold (eigenfunctions of the Laplace-Beltrami operator). Extrinsic geometry, Fourier analysis of the ambient space.
Restriction.
Extension (a version of the Heisenberg uncertainty principle): extend the eigenfunctions to a band-limited function of band \mathcal{O}(\nu_i), localized in a tube of radius \mathcal{O}(1/\nu_i) around the manifold.
(Extension of f = 1 for discription of the manifold?)
Multiscale extension: empirical functions are decomposed into frequency bands, and each band is extended to a certain distance so that it satisfies some version of the Heisenberg principle.
(Intrinsic and extrinsic diffusion.)
Multiscale extension scheme: