This article is about diffusion processes and harmonic analysis on manifolds, in continuous and finite versions.
Geometry of a set Γ is a set of rules that describe the relationships between its elements. Intrinsic geometry is one that does not refer to a superset or the structures therein; extrinsic geometry is one that is induced from the geometry of a superset. The geometry of a set can be understood through the geometry of the space of functions on the set (and the spaces of functionals and operators). This is a dual perspective from functional analysis, familiar from inverse problems such as inverse scattering, potential theory, and spectral geometry. [@Lafon2004] uses the language of functional analysis.
Integral kernel k(x, y) defines an integral transform or integral operator K for functions on a measure space (\Gamma, \mathrm{d}\mu), such that K f(x) = \int_\Gamma k(x, y) f(y) \, \mathrm{d}\mu(y). Here we call an integral kernel admissible if it is symmetric, positivity-preserving, and positive semi-definite: k(x, y) = k(y, x); k \ge 0; \langle f, K f \rangle \ge 0. Admissible kernels can be designed to represent a local metric (e.g. a degree of similarity) or to exhibit a local behavior.
Markov kernel or stochastic kernel \tilde{a}(x, y) defines a Markov process (an averaging operator) \tilde{A}, where \tilde{a}(x, \cdot) is a probability measure. Every positivity-preserving integral kernel can be transformed into a Markov kernel \tilde{a}(x, y) = k(x, y) / v^2(x), where v^2(x) = K 1. Assume the Markov kernel has eigendecomposition \tilde{a}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \psi_i(x) \hat{\psi}_i(y). The right eigenvectors \{\psi_i\} (and left eigenvectors \{\hat{\psi}_i\}) need not be orthogonal.
Diffusion kernel a(x, y) is the symmetric conjugate of a Markov kernel \tilde{a} by v: a(x, y) = v(x) \tilde{a}(x, y) / v(y). Diffusion operator A with kernel a is positive semi-definite and bounded, with largest eigenvalue 1 and corresponding eigenfunction v(x). Assume that A is compact (not just bounded); then its spectrum is discrete and its eigendecomposition can be written as a(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \phi_i(x) \phi_i(y), where A \phi_i(x) = \lambda_i \phi_i(x). The diffusion operator A and the averaging operator \tilde{A} have the same spectrum \{\lambda_i\}, and the eigenfunctions of A and the right/left eigenfunctions of \tilde{A} are related by conjugation by v: \psi_i = \mathrm{diag}(v^{-1}) \phi_i, \hat{\psi}_i = \mathrm{diag}(v) \phi_i. The t-step diffusion operator A^t aggregates the local information, and its kernel, the t-step diffusion kernel a^{(t)}, has eigendecomposition a^{(t)}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t \phi_i(x) \phi_i(y).
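As a concrete illustration of these constructions on a finite data set, here is a minimal NumPy sketch (the sample, the Gaussian kernel, and the bandwidth are arbitrary choices for illustration, not taken from the cited references): it builds an admissible kernel, its Markov normalization, and the symmetric diffusion kernel, and checks that the two operators share the same spectrum with largest eigenvalue 1.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))          # finite data set: 200 points in R^3

# Admissible kernel: symmetric, positivity-preserving, positive semi-definite
# (Gaussian kernel, chosen here for illustration).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
k = np.exp(-sq_dists / 0.5)

# Markov normalization: a_tilde(x, y) = k(x, y) / v2(x), with v2 = K 1.
v2 = k.sum(axis=1)
a_tilde = k / v2[:, None]                  # each row sums to 1 (a probability measure)

# Diffusion kernel: symmetric conjugate of a_tilde by v = sqrt(v2),
# i.e. a(x, y) = v(x) a_tilde(x, y) / v(y) = k(x, y) / (v(x) v(y)).
v = np.sqrt(v2)
a = v[:, None] * a_tilde / v[None, :]

# a is symmetric, so its eigendecomposition is orthogonal; the spectrum of
# a_tilde (computed as a general matrix) agrees with it, as stated above.
lam, phi = np.linalg.eigh(a)
lam_tilde = np.sort(np.linalg.eigvals(a_tilde).real)
assert np.allclose(np.sort(lam), lam_tilde, atol=1e-8)
assert np.isclose(lam.max(), 1.0)          # largest eigenvalue 1, eigenfunction ~ v
```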
Suppose \Gamma is a compact smooth submanifold of dimension d in \mathbb{R}^n, i.e. a Riemannian manifold with the Riemannian metric induced from the ambient space \mathbb{R}^n. Let \mu be a probability measure on \Gamma and \mathrm{d}x the Riemannian measure on \Gamma; then the probability density p(x) is given by p(x) = \mathrm{d}\mu(x) / \mathrm{d}x. Let \{e_i\}_{i=1}^d be an orthonormal basis of the tangent plane T_x to \Gamma at x; the exponential map \exp_x of the tangent-plane coordinates forms a chart on \Gamma around x, which provides normal coordinates (s_i)_{i=1}^d. Laplace-Beltrami operator \Delta on the Riemannian manifold can be written at x, in normal coordinates, as \Delta f = -\sum_{i=1}^d \partial^2 f / \partial s_i^2, where f \in C^\infty(\Gamma) is a smooth function on the manifold. Neumann heat operator on the manifold (heat diffusion on the manifold with the Neumann boundary condition) is e^{-t\Delta}, which corresponds to the Neumann heat kernel p_t(x, y) when seen as an integral operator.
For the Euclidean n-space, the heat kernel is the Gaussian probability density function with scalar variance twice the diffusion time: \bar{p}_t(d) = (4\pi t)^{-n/2} e^{-d^2 / (4t)}, where d = |x - y| is the distance; in short, \bar{p}_t = \phi(0, 2t I_n), where \phi is the PDF of a Gaussian random vector.
For the n-sphere, the heat kernel is a function of the spherical distance \theta = \arccos \langle x, y \rangle. It does not have a closed form, but can be written as a series approximation. The heat kernel can be bounded as [@Nowak2019]: given T > 0, for all \theta \in [0, \pi] and all t \in (0, T], \mathring{p}_t(\theta) \asymp (t + \pi - \theta)^{-(n-1)/2} \bar{p}_t(\theta), where the constants depend only on n and T. Note that \asymp denotes order equivalence (two-sided bounds up to constant factors).
For the hyperbolic n-space, the heat kernel is given by [@Davies1988]: (1) if n = 2m + 1, \breve p_t (d) = \frac{(-1)^m e^{-(n-1)^2 t / 4}}{\sqrt{(2\pi)^{n} (2 t)}} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m e^{- d^2 / (4t)}; in short, \breve p_t (d) = (-2 t)^{m} e^{-m^2 t} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m \bar p_t (d); (2) if n = 2m + 2, \breve p_t (d) = \frac{(-1)^m e^{-(n-1)^2 t / 4}}{\sqrt{(2\pi)^{n+1} (4 t^3)}} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m \int_d^\infty \frac{s e^{- s^2 / (4t)}}{\sqrt{\cosh s - \cosh d}} \,\mathrm{d}s. Across dimensions, the heat kernels satisfy the recurrence relation \breve p_{t, n+2} = \frac{-e^{-nt}}{2\pi} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right) \breve p_{t, n}, where the second subscript denotes the dimension.
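As a small numeric illustration (a sketch with my own function names, not code from the cited references), note that for H^3 (n = 3, m = 1) the odd-dimensional formula above simplifies to \breve p_t(d) = e^{-t} \frac{d}{\sinh d} \bar p_t(d), since applying \frac{1}{\sinh d}\frac{\partial}{\partial d} once to \bar p_t multiplies it by -\frac{d}{2t \sinh d}. The sketch below evaluates the Euclidean and H^3 kernels.

```python
import numpy as np

def heat_kernel_euclidean(d, t, n):
    """Euclidean heat kernel: (4 pi t)^(-n/2) * exp(-d^2 / (4t))."""
    return (4 * np.pi * t) ** (-n / 2) * np.exp(-d**2 / (4 * t))

def heat_kernel_h3(d, t):
    """Hyperbolic heat kernel on H^3 (n = 3, m = 1):
    exp(-t) * (d / sinh d) * Euclidean heat kernel in dimension 3."""
    d = np.asarray(d, dtype=float)
    # d / sinh(d) with the limit value 1 at d = 0 (avoids a 0/0 warning).
    ratio = np.where(d == 0, 1.0, d / np.sinh(np.where(d == 0, 1.0, d)))
    return np.exp(-t) * ratio * heat_kernel_euclidean(d, t, 3)

d = np.linspace(0.0, 5.0, 6)
t = 0.25
print(heat_kernel_euclidean(d, t, 3))   # Gaussian decay in d
print(heat_kernel_h3(d, t))             # extra exp(-t) * d/sinh(d) damping
```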
Diffusion processes (or random walks, Markov processes) can be useful for the geometric description of data sets. [@Coifman2006a] approximates (anisotropic) diffusion processes \exp(-t H_\alpha) on submanifolds, where H_\alpha = \Delta - \Delta(p^{1-\alpha}) / p^{1-\alpha} is a symmetric Schrödinger operator with parameter \alpha \in \mathbb{R}, with special cases: \alpha = 1, heat diffusion; \alpha = 1/2, a Fokker–Planck diffusion; \alpha = 0, the normalized graph Laplacian with isotropic weights. In comparison, "kernel eigenmaps" are diffusion processes \exp(-t Q_2^{-1} Q_1).
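For reference, the anisotropic family of [@Coifman2006a] is obtained by renormalizing an admissible kernel with a power of the density before the Markov normalization; as a sketch in the notation above (the precise limit statement is in the reference): k^{(\alpha)}(x, y) = k(x, y) / (p^\alpha(x) \, p^\alpha(y)) and \tilde{a}^{(\alpha)}(x, y) = k^{(\alpha)}(x, y) / v_\alpha^2(x), with v_\alpha^2(x) = K^{(\alpha)} 1. As the kernel scale goes to 0, the resulting diffusion approximates \exp(-t H_\alpha) (up to conjugation by p^{1-\alpha}), recovering the three special cases listed above.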
Diffusion distance or diffusion metric D_t(x, y) at time step t is the semi-metric defined as: D_t^2 (x, y) \equiv a^{(t)}(x, x) + a^{(t)}(y, y) - 2 a^{(t)}(x, y), which is equivalent to D_t^2 (x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t [\phi_i(x) - \phi_i(y)]^2 and D_t^2(x,y) = \| a^{(t/2)}(x,\cdot) - a^{(t/2)}(y,\cdot) \|^2. The diffusion distance becomes a metric if the diffusion kernel a is positive definite. Unlike the geodesic distance, the diffusion distance is an average over all paths connecting two points, and thus robust to noise and topological short-circuits.
Diffusion map \Phi: \Gamma \to l^2(\mathbb{N}) is a mapping of the data to a Euclidean (sequence) space, based on the eigenfunctions of the diffusion kernel: \Phi(x) = \left( \phi_i(x) \right)_{i \in \mathbb{N}}. Diffusion map \Phi defines an embedding of the data that preserves the diffusion distance D_t via a weighted Euclidean metric D_t^2(x,y) = \langle \Phi(x) - \Phi(y), \Lambda^t (\Phi(x) - \Phi(y)) \rangle, where \Lambda = \mathrm{diag}(\lambda_i). A truncated diffusion map \Phi_k: \Gamma \to \mathbb{R}^k with \Phi_k(x) = \left( \phi_i(x) \right)_{i=1}^k preserves the diffusion distance to a certain accuracy. The dimension k of the new representation depends on the diffusion process, and is in general greater than the dimension d of the set/manifold.
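A minimal NumPy sketch (the data, kernel, bandwidth, and time step are illustrative choices, not from the cited references) of the truncated diffusion map; here the eigenvalue weights \lambda_i^{t/2} are folded into the coordinates, so that the plain Euclidean distance in the embedding reproduces the diffusion distance D_t.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
k = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 0.5)
v = np.sqrt(k.sum(axis=1))
a = k / (v[:, None] * v[None, :])            # symmetric diffusion kernel, as above

def diffusion_map(a, t, k_dims):
    """Truncated diffusion map: columns are lambda_i^(t/2) * phi_i."""
    lam, phi = np.linalg.eigh(a)             # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:k_dims]     # keep the k_dims largest
    return phi[:, idx] * lam[idx] ** (t / 2)

t = 4                                        # even integer time step
emb = diffusion_map(a, t, k_dims=10)

# Diffusion distance from the kernel: D_t^2 = a^(t)(x,x) + a^(t)(y,y) - 2 a^(t)(x,y).
a_t = np.linalg.matrix_power(a, t)
x, y = 0, 1
D2_kernel = a_t[x, x] + a_t[y, y] - 2 * a_t[x, y]
D2_embed = ((emb[x] - emb[y]) ** 2).sum()    # Euclidean distance in the embedding
print(D2_kernel, D2_embed)                   # close when k_dims captures the spectrum
```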
(Inverse mapping.)
Spectral graph theory analyzes the spectrum (eigenvalues and eigenvectors) of a matrix representing a graph. For example, spectral clustering or spectral partitioning: construct a matrix representation of the graph; eigendecompose the matrix and embed the vertices into a Euclidean space using one or more eigenvectors; group the points based on their coordinates. Laplacian matrix of a graph, aka graph Laplacian, is its degree matrix minus its adjacency matrix, L = D - A, where D = \mathrm{diag}\{d_i\}_{i=1}^n. The Laplacian matrix is a discrete analog of the Laplacian operator Δ in multivariable calculus. The Laplacian matrix is symmetric, so its eigenvectors can form an orthogonal basis; additionally, all its eigenvalues are non-negative. Symmetric normalized graph Laplacian is L' = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}.

Consider rotation-invariant kernels k_\varepsilon (x, y) = h(\|x-y\|^2 / \varepsilon) with scale parameter \varepsilon, where h is smooth and decays exponentially, and \|\cdot\| is the Euclidean metric. As \varepsilon goes to 0, the local geometry specified by k_\varepsilon coincides with that of the manifold. Given an averaging operator \tilde{A}_\varepsilon f(x) = K_\varepsilon f(x) / v_\varepsilon^2 (x) where v_\varepsilon^2 (x) = K_\varepsilon 1, the weighted graph Laplacian operator is \tilde{\Delta}_\varepsilon = (I - \tilde{A}_\varepsilon) / \varepsilon. On certain subspaces E_K of L^2(\Gamma), as \varepsilon goes to 0, the limit operator of \tilde{\Delta}_\varepsilon is H f = c (\Delta f - 2 \langle \nabla p / p , \nabla f \rangle), where c = m_2 / (2 m_0), m_0 = \int_{\mathbb{R}^d} h(\|u\|^2)~\mathrm{d}u, and m_2 = \int_{\mathbb{R}^d} u_i^2 h(\|u\|^2)~\mathrm{d}u. Under conjugation by the density p, we get p H(g/p) = c (\Delta g - g \Delta p / p), where g = p f; that is, c times the Schrödinger operator H_0 = \Delta - \Delta p / p (the \alpha = 0 case above). Thus, when the density p is uniform, the weighted graph Laplacian \tilde{\Delta}_\varepsilon converges to (a multiple of) the Laplace-Beltrami operator \Delta on the manifold [@Belkin2003]; but when the density is non-uniform, such a reconstruction does not hold.
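A minimal NumPy sketch (the sample, the Gaussian kernel, and the scale are illustrative) of these objects: the graph Laplacian L = D - A, its symmetric normalization, and the kernel-based weighted graph Laplacian operator \tilde{\Delta}_\varepsilon = (I - \tilde{A}_\varepsilon) / \varepsilon.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))            # sample points (illustrative)

# Weighted adjacency from a rotation-invariant (Gaussian) kernel at scale eps.
eps = 0.5
A = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / eps)

d = A.sum(axis=1)                            # degrees
D = np.diag(d)
L = D - A                                    # graph Laplacian (symmetric, PSD)
L_sym = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))   # I - D^{-1/2} A D^{-1/2}

# Weighted graph Laplacian operator: (I - A_tilde) / eps, with A_tilde = D^{-1} A.
A_tilde = A / d[:, None]
Delta_eps = (np.eye(len(A)) - A_tilde) / eps

# Spectral embedding for clustering: eigenvectors of the smallest nontrivial
# eigenvalues of L_sym serve as coordinates for grouping the vertices.
w, U = np.linalg.eigh(L_sym)
embedding = U[:, 1:3]
```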
To separate the distribution p on the manifold from the intrinsic geometry represented by k_\varepsilon, use the kernel \tilde{k}_\varepsilon(x, y) = k_\varepsilon(x, y)/(p_\varepsilon(x) p_\varepsilon(y)), where p_\varepsilon(x) = K_\varepsilon 1. This leads to an averaging operator \bar{A}_\varepsilon f(x) = \tilde{K}_\varepsilon f(x) / \tilde{v}_\varepsilon^2 (x), where \tilde{v}_\varepsilon^2 (x) = \tilde{K}_\varepsilon 1. Define a Laplace operator \Delta_\varepsilon = (I - \bar{A}_\varepsilon) / \varepsilon acting on the subspaces E_K of L^2(\Gamma); its limit operator \lim_{\varepsilon \to 0} \Delta_\varepsilon \equiv \Delta_0 = c \Delta is a multiple of the Laplace-Beltrami operator (below, the constant c is absorbed into the diffusion time t, so that \Delta_0 is identified with \Delta). The limit operator of \bar{A}_\varepsilon^{t/\varepsilon} is the Neumann heat operator e^{-t \Delta_0}; in other words, the Markov kernel \bar{a}_\varepsilon^{(t/\varepsilon)}(x, y) with small \varepsilon, close to a fine-scale Gaussian kernel, approximates the (symmetric) Neumann heat kernel p_t (x, y), without requiring knowledge of the boundary.

The averaging operator \bar{A}_\varepsilon is compact and its eigendecomposition can be written as \bar{A}_\varepsilon = \sum_{i \in \mathbb{N}} \lambda_{\varepsilon, i} P_{\varepsilon, i}, where P_{\varepsilon, i} is the orthogonal projector onto the eigenspace corresponding to eigenvalue \lambda_{\varepsilon, i}. If the eigendecomposition of the Laplace-Beltrami operator is \Delta = \sum_{i \in \mathbb{N}} \nu_i^2 P_i, where P_i is the orthogonal projector onto the eigenspace corresponding to eigenvalue \nu_i^2, then the eigenfunctions and eigenvalues of \bar{A}_\varepsilon^{t/\varepsilon} converge to those of the heat operator e^{-t\Delta}: \lim_{\varepsilon \to 0} \lambda_{\varepsilon, i}^{t/\varepsilon} = e^{-t\nu_i^2}, \lim_{\varepsilon \to 0} P_{\varepsilon, i} = P_i. Let \{\phi_i\}_{i \in \mathbb{N}} be the orthonormal eigenfunctions of the Laplace-Beltrami operator (also of the heat kernel) on the manifold \Gamma; then the eigendecomposition of the heat kernel is p_t(x, y) = \sum_{i \in \mathbb{N}} e^{-t \nu_i^2} \phi_i(x) \phi_i(y). Although \{\phi_i\}_{i \in \mathbb{N}} is usually viewed as a Hilbert basis of L^2(\Gamma), it also forms a set of coordinates on the submanifold \Gamma via diffusion maps \Phi, which accurately preserves the heat diffusion distance with dimension k \ge d.
Numerical procedure for approximate Neumann heat diffusion (a NumPy sketch follows the list; note that skipping steps 3-4 provides the weighted graph Laplacian instead):

1. Fix a scale \varepsilon and construct a rotation-invariant kernel on the data set, e.g. k_\varepsilon(x, y) = \exp(-\|x - y\|^2 / \varepsilon).
2. Estimate the density: p_\varepsilon(x) = K_\varepsilon 1(x).
3. Remove the influence of the density: \tilde{k}_\varepsilon(x, y) = k_\varepsilon(x, y) / (p_\varepsilon(x) p_\varepsilon(y)).
4. Recompute the normalization factor: \tilde{v}_\varepsilon^2(x) = \tilde{K}_\varepsilon 1(x).
5. Construct the averaging (Markov) operator \bar{a}_\varepsilon(x, y) = \tilde{k}_\varepsilon(x, y) / \tilde{v}_\varepsilon^2(x); without steps 3-4 this reduces to \tilde{a}_\varepsilon(x, y) = k_\varepsilon(x, y) / v_\varepsilon^2(x), whose associated operator (I - \tilde{A}_\varepsilon) / \varepsilon is the weighted graph Laplacian.
6. Eigendecompose (the symmetric conjugate of) \bar{a}_\varepsilon; the power \bar{a}_\varepsilon^{(t/\varepsilon)} approximates the Neumann heat kernel p_t(x, y), and the leading eigenfunctions give the diffusion map.
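A minimal NumPy sketch of this procedure (assuming a finite sample; the data set, \varepsilon, t, and function names are illustrative, and the eigendecomposition is done on the symmetric conjugate so that it is orthogonal):

```python
import numpy as np

def neumann_heat_approx(X, eps, t):
    """Approximate Neumann heat diffusion on a finite sample X (one point per row),
    following the density-normalized kernel construction above."""
    # 1. Rotation-invariant (Gaussian) kernel at scale eps.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    k = np.exp(-sq / eps)
    # 2. Density estimate p_eps = K_eps 1.
    p = k.sum(axis=1)
    # 3. Remove the influence of the density: k_tilde = k / (p(x) p(y)).
    k_tilde = k / np.outer(p, p)
    # 4. New normalization factor v_tilde^2 = K_tilde 1.
    v2 = k_tilde.sum(axis=1)
    # 5. Averaging (Markov) operator a_bar = k_tilde / v_tilde^2; work with its
    #    symmetric conjugate so that the eigendecomposition is orthogonal.
    v = np.sqrt(v2)
    a_sym = k_tilde / np.outer(v, v)
    # 6. Powers a_bar^(t/eps) approximate the Neumann heat operator; compute them
    #    through the eigendecomposition of the symmetric conjugate.
    lam, phi = np.linalg.eigh(a_sym)
    lam = np.clip(lam, 0, None)              # guard against tiny negative eigenvalues
    heat = (phi * lam ** (t / eps)) @ phi.T  # symmetric approximation of p_t(x, y)
    return heat, lam, phi

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta)]      # sample from the unit circle
heat, lam, phi = neumann_heat_approx(X, eps=0.05, t=0.1)
```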
For differential and dynamical systems.
Diffusion processes are also useful for harmonic analysis of functions on data sets. Geometric harmonics is a set of functions that allows out-of-sample extension of empirical functions on the data set.
Geometric harmonics provide a simple solution to the relaxed distortion problem: an embedding \Psi that is bi-Lipschitz with a small distortion.
Atlas Computation:
Eigenfunction selection:
Intrinsic geometry: Fourier analysis on the manifold (eigenfunctions of the Laplace-Beltrami operator). Extrinsic geometry: Fourier analysis of the ambient space.
Restriction.
Extension (a version of the Heisenberg uncertainty principle): extend the eigenfunctions to band-limited functions of band \mathcal{O}(\nu_i), localized in a tube of radius \mathcal{O}(1/\nu_i) around the manifold.
(Extension of f = 1 for description of the manifold?)
Multiscale extension: empirical functions are decomposed into frequency bands, and each band is extended to a certain distance so that it satisfies some version of the Heisenberg principle.
(Intrinsic and extrinsic diffusion.)
Multiscale extension scheme: