Loading [MathJax]/jax/element/mml/optable/GreekAndCoptic.js

This article is about Diffusion processes and harmonic analysis on manifolds, in continuous and finite versions.

Geometry of a set Γ is a set of rules that describe the relationship between its elements. Intrinsic geometry (本征几何) is one that does not refer to a superset or the structures therein. Extrinsic geometry (表征几何) is one that is induced from the geometry of a superset. The geometry of a set can be understood through the geometry of the space of functions on the set (and spaces of functionals and operators). This is a dual perspective in functional analysis, from inverse problems such as inverse scattering, potential theory, and spectral geometry. [@Lafon2004] uses the language of functional analysis.

Symbols

  • \Gamma: a set, either finite (graph) or continuous (manifold).
  • \mathrm{d}x: the Lebesgue measure (\mathbb{R}^n); the Riemannian volume (Riemannian manifold).
  • (\Gamma, \mathrm{d}\mu): a measure space with point set \Gamma and measure \mathrm{d}\mu.
  • f, F: a function (restricted to) a manifold \Gamma; a function (extended to) the ambient space \mathbb{R}^n;
  • L^2(\Gamma, \mathrm{d}\mu): the space of square integrable functions on \Gamma. f \in L^2(\Gamma, \mathrm{d}\mu) \Leftrightarrow \int_\Gamma |f(x)|^2~\mathrm{d}\mu(x) < \infty.
  • H_s(\Gamma, \mathrm{d}\mu): a Sobolev space of functions on \Gamma. f \in H_s(\Gamma, \mathrm{d}\mu) \Leftrightarrow \forall |r| \le s, \partial^r f \in L^2(\Gamma,\mathrm{d}\mu). Here |r| = \sum_i r_i, r_i \in \mathbb{N}.
  • \langle f, g \rangle_\Gamma: inner product of functions in L^2(\Gamma, \mathrm{d}\mu). \langle f,g \rangle_\Gamma = \int_\Gamma \overline{f(x)} g(x)~\mathrm{d}\mu(x).
  • \|f\|_s: norm of a function in H_s(\Gamma, \mathrm{d}\mu). \|f\|_s^2 = \sum_{|r| \le s} \int_\Gamma |\partial^r f(x)|^2~\mathrm{d}\mu(x).
  • k(x, y), p_t(x, y): an integral kernel; the Neumann heat kernel on a manifold.
  • v^2(x): integral transform of 1 with kernel k(x, y).
  • \tilde{a}(x, y): a Markov kernel, transition probability of a diffusion process.
  • a(x, y): a diffusion kernel, the symmetric conjugate to \tilde{a} by v(x).
  • K, \tilde{A}, A: an integral operator; an averaging operator; a diffusion operator.
  • \Delta: the Laplacian (\mathbb{R}^n); the Laplace-Beltrami operator (Riemannian manifold). Signs are defined so that \Delta is positive semi-definite.
  • \{\phi_i\}, \{\nu_i^2\}: eigenfunctions and eigenvalues of \Delta, i.e. \Delta \phi_i = \nu_i^2 \phi_i.
  • \Phi, \Phi_k: diffusion map; diffusion map truncated to dimension k.

Diffusion Process on Manifolds

Integral kernel k(x,y) defines an integral transform or integral operator K for functions on measure space (\Gamma, \mathrm{d}\mu), such that K f(x) = \int_\Gamma k(x, y) f(y)~\mathrm{d}\mu(y). Here we call an integral kernel admissible, if it is symmetric, positivity-preserving, and positive semi-definite: k(x, y) = k(y, x); k \ge 0; \langle f, K f \rangle \ge 0. Admissible kernels can be designed to represent a local metric (e.g. degree of similarity) or exhibit a local behavior.

Markov kernel or stochastic kernel \tilde{a}(x, y) defines a Markov process (an averaging operator) \tilde{A}, where \tilde{a}(x, \cdot) is a probability measure. Every positivity-preserving integral kernel can be transformed into a Markov kernel \tilde{a}(x, y)= k(x, y) / v^2(x), where v^2(x) = K 1. Assume the Markov kernel has eigendecomposition \tilde{a}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \psi_i(x) \hat{\psi}_i(y). The right eigenvectors \{\psi_i\} (and left eigenvectors \{\hat{\psi}_i\}) need not be orthogonal.

Diffusion kernel a(x, y) is the symmetric conjugate of a Markov kernel \tilde{a} by v, a(x, y) = v(x) \tilde{a}(x, y) / v(y). Diffusion operator A with kernel a is positive semi-definite, bounded, with the largest eigenvalue 1 and a corresponding eigenfunction v(x). Assume that A is compact (not just bounded), then its spectrum is discrete and its eigendecomposition can be written as a(x, y) = \sum_{i \in \mathbb{N}} \lambda_i \phi_i(x) \phi_i(y), where A \phi_i(x) = \lambda_i \phi_i(x). The diffusion operator A and the averaging operator \tilde{A} have the same spectrum \lambda, and the eigenfunctions of A and the right/left eigenfunctions of \tilde{A} are related by conjugation by v: \psi_i = \mathrm{diag}(v^{-1}) \phi_i, \hat{\psi}_i = \mathrm{diag}(v) \phi_i. t-step diffusion operator A^t aggregates the local information, and its kernel, the t-step diffusion kernel a^{(t)}, has eigendecomposition a^{(t)}(x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t \phi_i(x) \phi_i(y).

Given \Gamma is a compact smooth submanifold of dimension d in \mathbb{R}^n, i.e. a Riemannian manifold with Riemannian metric induced from the ambient space \mathbb{R}^n. Let \mu be a probability measure on \Gamma and \mathrm{d}x the Riemannian measure on \Gamma, then probability density p(x) is given by p(x) = \mathrm{d}\mu(x) / \mathrm{d}x. Let \{e_i\}_{i=1}^d be an orthonormal basis of the tangent plane T_x to Γ at x, and the exponential map \mathrm{exp}_x of the coordinates of the tangent plane forms a chart on Γ around x, which provides normal coordinates (s_i)_{i=1}^d. Laplace-Beltrami operator \Delta on the Riemannian manifold can be written as \Delta f = - \sum_{i=1}^d \partial^2 f / \partial s_i^2, where f \in C^\infty(\Gamma) is a smooth function on the manifold. Neumann heat operator on the manifold (heat diffusion on the manifold with the Neumann boundary condition) is e^{−t \Delta}, which corresponds to the Neumann heat kernel p_t (x, y) when seen as an integral operator.

Heat Kernels on Manifolds

For the Euclidean n-space, the heat kernel is the Gaussian probability density function with scalar variance twice the diffusion time: \bar{p}_t (d) = (4 \pi t)^{-n/2} e^{- d^2 / (4t)}, where distance d = |x - y|; in short, \bar{p}_t = \phi(0, 2 t I_n), where \phi is the PDF of a Gaussian random vector.

For the n-sphere, the heat kernel is a function of the spherical distance \theta = \arccos \langle x, y \rangle. It does not have a closed form, but can be written in series approximations. The heat kernel can be bounded as [@Nowak2019]: given T > 0, \forall \theta \in [0, \pi], \forall t \in (0, T], \mathring{p}_t(\theta) \asymp \mathcal{O}\left((t+ \pi - \theta)^{-(n-1)/2} t^{-n/2} e^{-\theta^2 / (4t)}\right), where the constants depend only on n and T; in short, \mathring{p}_t(\theta) \asymp \mathcal{O}\left((t+ \pi - \theta)^{-(n-1)/2} \bar{p}_t (\theta) \right). Note that \asymp denotes order equivalence.

For the hyperbolic n-space, the heat kernel is [@Davies1988]: (1) if n = 2m + 1, \breve p_t (d) = \frac{(-1)^m e^{-(n-1)^2 t / 4}}{\sqrt{(2\pi)^{n} (2 t)}} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m e^{- d^2 / (4t)}; in short, \breve p_t (d) = (-2 t)^{m} e^{-m^2 t} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m \bar p_t (d); (2) if n = 2m + 2, \breve p_t (d) = \frac{(-1)^m e^{-(n-1)^2 t / 4}}{\sqrt{(2\pi)^{n+1} (4 t^3)}} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right)^m \int_d^\infty \frac{s e^{- s^2 / (4t)}}{\sqrt{\cosh s - \cosh d}} \text{d}s. The heat kernels satisfy the recurrence relation: \breve p_{n+2} = \frac{-e^{-nt}}{2\pi} \left(\frac{1}{\sinh d} \frac{\partial}{\partial d}\right) \breve p_n.

Diffusion Maps

Diffusion processes (or random walks, Markov processes) can be useful for the geometric descriptions of data sets. [@Coifman2006a] approximates (anisotropic) diffusion processes \exp(-t H_\alpha) on submanifolds, where H_\alpha = \Delta - \Delta(p^{1-\alpha}) / p^{1-\alpha} is a symmetric Schrödinger operator and parameter \alpha \in \mathbb{R}, with special cases: \alpha = 1, heat diffusion; \alpha = 1/2, a Fokker–Planck diffusion; \alpha = 0, normalized graph Laplacian with isotropic weights. In comparison, "kernel eigenmaps" are diffusion processes \exp(-t Q_2^{-1} Q_1).

Diffusion distance or diffusion metric D_t(x, y) at time step t is the semi-metric defined as: D_t^2 (x, y) \equiv a^{(t)}(x, x) + a^{(t)}(y, y) - 2 a^{(t)}(x, y), which is equivalent to D_t^2 (x, y) = \sum_{i \in \mathbb{N}} \lambda_i^t [\phi_i(x) - \phi_i(y)]^2 and D_t^2(x,y) = \| a^{(t/2)}(x,\cdot) - a^{(t/2)}(y,\cdot) \|^2. The diffusion distance becomes a metric if the diffusion kernel a is positive definite. Unlike the geodesic distance, the diffusion distance is an average over all paths connecting two points, and thus robust to noise and topological short-circuits.

Diffusion map \Phi: \Gamma \to l^2(\mathbb{N}) is a mapping of the data to a Euclidean space, based on the eigenfunctions of the diffusion kernel: \Phi(x) = \left( \phi_i(x) \right)_{i \in \mathbb{N}}. Diffusion map \Phi defines an embedding of the data that preserves the diffusion distance D_t via a weighted Euclidean metric D_t^2(x,y) = \langle \Phi(x) - \Phi(y), \Lambda^t (\Phi(x) - \Phi(y)) \rangle. A truncated diffusion map \Phi_k: \Gamma \to \mathbb{R}^k with \Phi_k(x) = \left( \phi_i(x) \right)_{i=1}^k preserves the diffusion distance with certain accuracy. The dimension k of the new representation depends on the diffusion process, and is in general greater than the dimension d of the set/manifold.

(Inverse mapping.)

Heat Diffusion

Spectral graph theory analyzes the spectrum (eigenvalues and eigenvectors) of a matrix representing a graph. For example, spectral clustering or spectral partitioning: construct a matrix representation of the graph; eigendecompose the matrix and embed the vertices to a Euclidean space using one or more eigenvectors; group the points based on their coordinates. Laplacian matrix of a graph, aka graph Laplacian, is its degree matrix minus its adjacency matrix, L = D - A, where D = \mathrm{diag}\{d_i\}_{i=1}^n. The Laplacian matrix is a discrete analog of the Laplacian operator Δ in multivariable calculus. The Laplacian matrix is symmetric, so its eigenvectors can form an orthogonal basis; additionally, all eigenvalues are non-negative. Symmetric normalized graph Laplacian is L' = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}. Consider rotation-invariant kernels k_\varepsilon (x, y) = h(\|x-y\|^2 / \varepsilon) with scale parameter \varepsilon, where h is smooth and decays exponentially, and \|\cdot\| is the Euclidean metric. As \varepsilon goes to 0, the local geometry specified by k_\varepsilon coincide with that of the manifold. Given an averaging operator \tilde{A}_\varepsilon f(x) = K_\varepsilon f(x) / v_\varepsilon^2 (x) where v_\varepsilon^2 (x) = K_\varepsilon 1, the weighted graph Laplacian operator is \tilde{\Delta}_\varepsilon = (I - \tilde{A}_\varepsilon) / \varepsilon. On certain subspaces E_K of L^2(\Gamma), as \varepsilon goes to 0, the limit operator of \tilde{\Delta}_\varepsilon is H f = c (\Delta f + 2 \langle \nabla p / p , \nabla f \rangle), where c = m_2 / (2 m_0), m_0 = \int_{\mathbb{R}^d} h(\|u\|^2)~\mathrm{d}u, and m_2 = \int_{\mathbb{R}^d} u_i^2 h(\|u\|^2)~\mathrm{d}u. Under a conjugation by the density p, we get p H(g/p) = c (\Delta g - g \nabla p / p), where g = p f. Thus, when the density p is uniform, the weighted graph Laplacian \tilde{\Delta}_\varepsilon converges to (a multiple of) the Laplace-Beltrami operator \Delta on the manifold [@Belkin2003]; but when the density is non-uniform, such reconstruction does not hold.

To separate the distribution p on the manifold from its intrinsic geometry k_\varepsilon, use kernel \tilde{k}_\varepsilon(x, y) = k_\varepsilon(x, y)/(p_\varepsilon(x) p_\varepsilon(y)) where p_\varepsilon(x) = K_\varepsilon 1. This leads to an averaging operator \bar{A}_\varepsilon f(x) = \tilde{K}_\varepsilon f(x) / \tilde{v}_\varepsilon^2 (x) where \tilde{v}_\varepsilon^2 (x) = \tilde{K}_\varepsilon 1. Define a Laplace operator \Delta_\varepsilon = (I - \bar{A}_\varepsilon) / \varepsilon acting on the subspaces E_K of L^2(\Gamma), its limit operator \lim_{\varepsilon \to 0} \Delta_\varepsilon \equiv \Delta_0 = \Delta / c is a multiple of the Laplace-Beltrami operator. The limit operator of \bar{A}_\varepsilon^{t/\varepsilon} is the Neumann heat operator e^{−t \Delta_0}; in other words, Markov kernel \bar{a}_\varepsilon^{(t/\varepsilon)}(x, y) with small \varepsilon, close to a fine scale Gaussian kernel, approximates the (symmetric) Neumann heat kernel p_t (x, y), regardless of knowledge of the boundary. The averaging operator \bar{A}_\varepsilon is compact and its eigendecomposition can be written as \bar{A}_\varepsilon = \sum_{i \in \mathbb{N}} \lambda_{\varepsilon, i} P_{\varepsilon, i}, where P_{\varepsilon, i} is the orthogonal projector on the eigenspace corresponding to eigenvalue \lambda_{\varepsilon, i}. If the eigendecomposition of the Laplace-Beltrami operator is \Delta = \sum_{i \in \mathbb{N}} \nu_i^2 P_i, where P_i is the orthogonal projector on the eigenspace corresponding to eigenvalue \nu_i^2. Then the eigenfunctions and eigenvalues of \bar{A}_\varepsilon^{t/\varepsilon} converge to those of the heat operator e^{-t\Delta}: \lim_{\varepsilon \to 0} \lambda_{\varepsilon, i}^{t/\varepsilon} = e^{-t\nu_i^2}, \lim_{\varepsilon \to 0} P_{\varepsilon, i} = P_i. Let \{\phi_i\}_{i \in \mathbb{N}} be the orthonormal eigenfunctions of the Laplace-Beltrami operator (also of the heat kernel) on the manifold \Gamma, then the eigendecomposition of the heat kernel is p_t(x, y) = \sum_{i \in \mathbb{N}} e^{-t \nu_i^2} \phi_i(x) \phi_i(y). Although \{\phi_i\}_{i \in \mathbb{N}} is usually viewed as a Hilbert basis of L^2(Γ), it also forms a set of coordinates on the submanifold Γ via diffusion maps \Phi, which accurately preserves the heat diffusion distance with dimension k \ge d.

Numerical procedure for approximate Neumann heat diffusion: (Note that skipping steps 3-4 provides the weighted graph Laplacian.)

  1. Choose scale parameter \varepsilon to be (on the order of) the larger of: (1) the mean square distance within the set, \sum_x m(x) / N \min_{y \ne x} \|x - y\|^2, where m(x) is the number of observations identical to x in the data; and (2) the square of the perturbation size \|\eta\|^2, i.e. trace of the covariance matrix of the perturbation. Note that small \varepsilon may result in a disconnected graph.
  2. Construct operator [K_\varepsilon] with entries [K_\varepsilon]_{xy} = k_\varepsilon(x, y), where k_\varepsilon(x, y) = e^{-\|x-y\|^2/\varepsilon} is a Gaussian kernel.
  3. Compute approximate density p_\varepsilon = [K_\varepsilon] \mathbf{1} by substituting true density p(x) with the empirical measure.
  4. Compute density-neutralizing operator [\tilde{K}_\varepsilon] = \mathrm{diag}(p_\varepsilon^{-1}) [K_\varepsilon] \mathrm{diag}(p_\varepsilon^{-1}). It can be implemented as [K_\varepsilon] / [p_\varepsilon p_\varepsilon^T], where / is element-wise division.
  5. Compute normalizing vector \tilde{v}_\varepsilon^2 = [\tilde{K}_\varepsilon] \mathbf{1}. Take a square root to get \tilde{v}_\varepsilon.
  6. Compute diffusion operator [A_\varepsilon] = \mathrm{diag}(\tilde{v}_\varepsilon^{-1}) [\tilde{K}_\varepsilon] \mathrm{diag}(\tilde{v}_\varepsilon^{-1}). It can be implemented as [\tilde{K}_\varepsilon] / [\tilde{v}_\varepsilon \tilde{v}_\varepsilon^T].
  7. Eigendecompose [A_\varepsilon] = \Phi \Lambda_\varepsilon \Phi^T. The Neumann heat operator has approximate spectrum \lambda_\varepsilon and eigenvectors \{\phi_i / \phi_1\}_{i=1}^N. Note that A_\varepsilon \tilde{v}_\varepsilon = \tilde{v}_\varepsilon, so \phi_1 and \tilde{v}_\varepsilon are proportional.

Anisotropic Diffusion

For differential and dynamical systems.

Geometric Harmonics

Diffusion processes are also useful for harmonic analysis of functions on data sets. Geometric harmonics is a set of functions that allows out-of-sample extension of empirical functions on the data set.

Bi-Lipschitz Parametrization

Geometric harmonics provide a simple solution to the relaxed distortion problem, an embedding \Psi that is bi-Lipschitz with a small distortion.

Atlas Computation:

  1. ...

Eigenfunction selection:

  1. ...

Extension and Restriction

Intrinsic geometry, Fourier analysis of the manifold (eigenfunctions of the Laplace-Beltrami operator). Extrinsic geometry, Fourier analysis of the ambient space.

Restriction.

Extension (a version of the Heisenberg uncertainty principle): extend the eigenfunctions to a band-limited function of band \mathcal{O}(\nu_i), localized in a tube of radius \mathcal{O}(1/\nu_i) around the manifold.

(Extension of f = 1 for discription of the manifold?)

Multiscale extension: empirical functions are decomposed into frequency bands, and each band is extended to a certain distance so that it satisfies some version of the Heisenberg principle.

(Intrinsic and extrinsic diffusion.)

Multiscale extension scheme:

  1. ...

🏷 Category=Manifold