Probabilistic learning on manifolds.

The dataset is modeled as samples of a probability distribution concentrated on a subset of Euclidean space, which is viewed as a manifold.

Probability Density Estimation and Sampling

A synthesis of methods:

  1. kernel density estimation (KDE) for the probability distribution of the sample matrix;
  2. diffusion maps for the "local" geometric structure (manifold) of the dataset;
    • reduced-order representation of the sample matrix by top eigenvectors of the diffusion map;
  3. Markov chain Monte Carlo (MCMC) for generating realizations for the sample matrix;

Procedure:

  1. scale each attribute of the data to (ϵ, 1): \([x_0]\), a matrix of n attributes by N observations;
  2. normalize the data by principal component analysis (PCA): \([η] = [μ]^{−1/2} [φ]^T [x_0]\)
    • \([μ]\) is the diagonal matrix of the v positive eigenvalues of the covariance matrix \([c] = 1/(N-1) [x_0] [x_0]^T\);
    • \([φ]\) is the matrix of the corresponding orthonormal eigenvectors;
  3. kernel density estimation (KDE) of the probability density of the random vector \(H\): \(p_H(η) = 1/N \sum_i π(η; η^i s'/s, s')\),
    • \(π(η; m, σ) = \exp( - \|η - m\|^2 / (2σ^2) ) / (\sqrt{2 \pi} σ)^v \) is the Gaussian kernel;
    • \(s = \left( 4 / ((v + 2)N) \right)^{1 / (v + 4)}\) is the optimal Silverman bandwidth,
    • \(s' = s / \sqrt{s^2 + (N − 1)/N}\) is the modified bandwidth (with the shrunken means \(η^i s'/s\), the KDE retains the zero mean and identity covariance of the normalized data);
  4. characterize the manifold using diffusion maps: \([η] = [z] [g]^T\),
    • \( [z] = [η] [a], [a] = [g] ([g]^T [g])^{-1} \) (so that \([z] [g]^T = [η] P_{[g]}\));
    • \([g] = [P]^κ [ψ]\) is the "diffusion maps basis", where \([P] = \mathrm{diag}\{[K] 1\}^{−1} [K]\) is a transition matrix with right eigenvectors \([ψ]\);
    • \([K]_{ij} = k_ε(η^i, η^j)\) are the transition likelihoods, where \(k_ε(x, y)\) is a kernel (a symmetric, non-negative function) with a smoothing parameter ε, such as the Gaussian kernel \(k_ε(x, y) = \exp(− \|x − y\|^2 / (4ε))\);
    • κ is the analysis scale of the local geometric structure of the dataset;
    • if only the first m diffusion maps basis vectors \([g](m)\) are retained, then \([η](m) = [η] P_{[g](m)}\) is a reduced-order representation that projects \([η]\) onto \([g](m)\);
  5. sample by solving a reduced-order Itô stochastic differential equation (ISDE) numerically: \([η^l] = [Z(lρ)] [g](m)^T\), \(l = 1, 2, ...\) and \(ρ = M_0 Δt\), where m satisfies the mean-square convergence criterion \(\|[c](m) - [c]\|_F < ε_0 \|[c]\|_F\) for a given tolerance \(ε_0\); \([Z]\) satisfies the reduced-order ISDE (so that \([Z] [g](m)^T\) admits \(p_{[H](m)}(η)\) as its invariant distribution): \[\begin{cases} d[Z] = [Y] dr \\ d[Y] = [L]([Z] [g](m)^T) [a](m) dr − f_0/2 [Y] dr + \sqrt{f_0} d[W] [a](m) \\ [Z](0) = [H] [a](m);\quad [Y](0) = [N] [a](m) \end{cases}\], \(Δt = 2 \pi s' / \text{Fac}\) is the sampling step of the integration scheme (oversampled if Fac > 1), and \(M_0\) is a multiplier such that \(ρ \gg 4 / f_0\), the relaxation time of the dynamical system;
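
Steps 1–4 above can be sketched in numpy; the data, the smoothing parameter ε, the scale κ, and the reduced order m below are all assumed for illustration (the symmetric-normalization trick for the eigenvectors of \([P]\) is a standard numerical choice, not prescribed by the procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 200))            # hypothetical data: n = 3 attributes, N = 200 observations

# Step 1: scale each attribute to (eps, 1).
eps = 1e-3
x0 = eps + (1 - eps) * (x - x.min(axis=1, keepdims=True)) / np.ptp(x, axis=1, keepdims=True)

# Step 2: PCA normalization [eta] = [mu]^{-1/2} [phi]^T [x0] (after centering).
x0c = x0 - x0.mean(axis=1, keepdims=True)
c = x0c @ x0c.T / (x0.shape[1] - 1)          # covariance matrix [c]
mu, phi = np.linalg.eigh(c)
keep = mu > 1e-12 * mu.max()                 # retain the v positive eigenvalues
mu, phi = mu[keep], phi[:, keep]
eta = np.diag(mu ** -0.5) @ phi.T @ x0c      # v x N, zero mean, identity covariance

# Step 3: Silverman bandwidth s and modified bandwidth s'.
v, N = eta.shape
s = (4 / ((v + 2) * N)) ** (1 / (v + 4))
s_hat = s / np.sqrt(s**2 + (N - 1) / N)

# Step 4: diffusion-maps basis [g] from the transition matrix [P].
d2 = np.sum((eta[:, :, None] - eta[:, None, :]) ** 2, axis=0)
smooth = 4.0                                 # assumed smoothing parameter epsilon
K = np.exp(-d2 / (4 * smooth))
d = K.sum(axis=1)
P = K / d[:, None]                           # row-stochastic transition matrix
Ks = K / np.sqrt(np.outer(d, d))             # symmetric matrix similar to P: real spectrum
lam, u = np.linalg.eigh(Ks)
lam, u = lam[::-1], u[:, ::-1]               # descending eigenvalues
psi = u / np.sqrt(d)[:, None]                # right eigenvectors of P
kappa, m = 1, 5                              # assumed analysis scale and reduced order
g = psi[:, :m] * lam[:m] ** kappa            # [g](m) = [P]^kappa [psi], first m vectors

# Reduced-order representation: project [eta] onto the span of [g](m).
a = g @ np.linalg.inv(g.T @ g)
z = eta @ a
eta_m = z @ g.T
```

The projection `eta_m` is the reduced-order representation \([η](m)\); increasing m tightens the mean-square criterion of step 5.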

The Störmer–Verlet discretization scheme is symplectic and nearly preserves energy for non-dissipative Hamiltonian dynamical systems; with the dissipative term it reads: \[\begin{cases} [Z_{l+1/2}] = [Z_l] + Δt/2 [Y_l] \\ [Y_{l+1}] = (1-b)/(1+b) [Y_l] + Δt/(1+b) [L_{l+1/2}] [a](m) + \sqrt{f_0}/(1+b) [ΔW_{l+1}] [a](m) \\ [Z_{l+1}] = [Z_{l+1/2}] + Δt/2 [Y_{l+1}] \end{cases}\], where \([L_{l+1/2}] = [L]([Z_{l+1/2}] [g](m)^T)\) and \(b = f_0 Δt/4\).
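
A minimal numpy sketch of this scheme, assuming an orthonormal basis \([g](m)\) (so \([a](m) = [g](m)\)) and a hypothetical quadratic potential with drift \([L](u) = -u\) standing in for the KDE drift; the sizes and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
v, N, m = 2, 50, 4
g, _ = np.linalg.qr(rng.standard_normal((N, m)))   # assumed orthonormal basis [g](m)
a = g @ np.linalg.inv(g.T @ g)                     # reduction matrix [a](m) (= g here)

f0, dt = 1.5, 0.05                                 # dissipation parameter and step
b = f0 * dt / 4

def L(u):
    # hypothetical drift: quadratic potential nu(u) = |u|^2 / 2, so [L](u) = -u
    return -u

Z = rng.standard_normal((v, N)) @ a                # [Z](0) = [H] [a](m)
Y = rng.standard_normal((v, N)) @ a                # [Y](0) = [N] [a](m)

for _ in range(200):
    Zh = Z + dt / 2 * Y                            # half step in position
    dW = np.sqrt(dt) * rng.standard_normal((v, N)) # Wiener increments, std sqrt(dt)
    Y = ((1 - b) / (1 + b)) * Y \
        + dt / (1 + b) * L(Zh @ g.T) @ a \
        + np.sqrt(f0) / (1 + b) * dW @ a
    Z = Zh + dt / 2 * Y                            # full step in position
```

Each retained state `Z @ g.T` (every \(M_0\) steps) would be one additional realization \([η^l]\).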

Markov stochastic process of a nonlinear second-order dynamical system (dissipative Hamiltonian system): \[\begin{cases} d[U] = [V] dr \\ d[V] = [L]([U]) dr − f_0/2 [V] dr + \sqrt{f_0} d[W] \\ [U](0) = [H];\quad [V](0) = [N] \end{cases}\], where \([L]([u]) = ( -∇ν(u^j) )_j\) and \(ν(u) = - \text{LogSumExp}_i\{ -\|u - (s'/s) η^i\|^2 / (2 s'^2) \}\); \([W]\) is a matrix of N independent v-dimensional normalized Wiener processes (increments are standard Gaussian); \([N]\) is a matrix of N independent v-dimensional normalized Gaussian vectors; \([H]\) is a random matrix with realization \([η]\); \(f_0\) is a dissipation parameter chosen so that the transient response of the ISDE is rapidly damped. The ISDE has a unique invariant measure and a unique solution that is a second-order diffusion stochastic process, stationary and ergodic, such that the probability distribution of the random matrix \([U]\) is \(p_{[H]}(η)\);
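
Since ν is a negative log-sum-exp of Gaussian exponents, its gradient is a softmax-weighted average and can be evaluated stably with the usual max shift. A sketch, where the function name `L_of_u` and the array shapes are assumptions:

```python
import numpy as np

def L_of_u(U, eta, s, s_hat):
    """Columns [L](u^j) = -grad nu(u^j) for the log-sum-exp potential above.

    U     : v x M matrix whose columns are evaluation points u^j.
    eta   : v x N normalized sample matrix.
    s, s_hat : Silverman and modified bandwidths (s_hat plays the role of s').
    """
    means = (s_hat / s) * eta                           # kernel centers (s'/s) eta^i
    # d2[i, j] = ||u^j - means[:, i]||^2
    d2 = np.sum((U[:, None, :] - means[:, :, None]) ** 2, axis=0)
    e = -d2 / (2 * s_hat**2)
    w = np.exp(e - e.max(axis=0, keepdims=True))        # shifted (stable) softmax over i
    w /= w.sum(axis=0, keepdims=True)
    # -grad nu(u^j) = (sum_i w_ij means_i - u^j) / s_hat^2, since sum_i w_ij = 1
    return (means @ w - U) / s_hat**2
```

The shift by the column maximum leaves the softmax weights unchanged but avoids underflow when all exponents are strongly negative, which happens as soon as an iterate drifts away from the kernel centers.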

Parameters: \((ε, κ, m, f_0, Δt, M_0)\) (or replace m with \(ε_0\));

(TODO: for discrete distributions)

Polynomial Chaos representation

Probabilistic non-convex optimization