Probabilistic learning on manifolds.

A probability distribution concentrated on a subset of the Euclidean space, which is viewed as a manifold.

Probability Density Estimation and Sampling

A synthesis of methods:

  1. kernel density estimation (KDE) for the probability distribution of the sample matrix;
  2. diffusion maps for the "local" geometric structure (manifold) of the dataset;
    • reduced-order representation of the sample matrix by top eigenvectors of the diffusion map;
  3. Markov chain Monte Carlo (MCMC) for generating realizations of the sample matrix;

Procedure:

  1. scale the data to (ϵ, 1), yielding [x_0], a matrix of n attributes by N observations;
  2. normalize the data by principal component analysis (PCA): [η] = [μ]^{−1/2} [φ]^T [x_0], where [μ] is the diagonal matrix of the ν positive eigenvalues of the covariance matrix [c] = 1/(N−1) [x_0] [x_0]^T, and [φ] collects the corresponding orthonormal eigenvectors;
  3. kernel density estimation (KDE) of the random vector: p_{[H]}(η) = 1/N \sum_{i=1}^N π_{s'}(η − (s'/s) η^i), where π_{s'} is the ν-dimensional Gaussian kernel of bandwidth s', s = (4 / ((ν + 2)N))^{1/(ν + 4)} is the optimal Silverman bandwidth, and s' = s / \sqrt{s^2 + (N − 1)/N} is the modified bandwidth that preserves the unit sample covariance (steps 1–3 are sketched after this list);
  4. characterize the manifold using diffusion maps: [η] = [z] [g]^T, where [z] = [η] [a] and [a] = [g] ([g]^T [g])^{−1}, so that [z] [g]^T = [η] P_{[g]} (see the diffusion-maps sketch after this list);
    • {g} = [P]^κ {ψ} is the "diffusion-maps basis", where [P] = \diag([K] 1)^{−1} [K] is a transition matrix with right eigenvectors {ψ}, and κ is the analysis scale of the local geometric structure of the dataset;
    • [K]_{ij} = k_ε(η^i, η^j) are the transition likelihoods, where k_ε(x, y) is a kernel (a symmetric, non-negative function) with smoothing parameter ε, e.g. the Gaussian kernel k_ε(x, y) = \exp(−‖x − y‖^2 / (4ε));
    • if only the first m diffusion-maps basis vectors [g](m) are retained, then [η](m) = [η] P_{[g](m)} is a reduced-order representation that projects [η] onto [g](m);
  5. sample by numerically solving a reduced-order Itô stochastic differential equation (ISDE): [η^l] = [Z(lρ)] [g](m)^T, l = 1, 2, …, with ρ = M_0 Δt (the integrator is sketched below);
    • m satisfies the mean-square convergence criterion \|[c](m) − [c]\|_F < ε_0 \|[c]\|_F for a given tolerance ε_0;
    • [Z] satisfies the reduced-order ISDE (so that [Z] [g](m)^T admits the density p_{[H](m)}(η)): d[Z] = [Y] dr; d[Y] = [L]([Z] [g](m)^T) [a](m) dr − f_0/2 [Y] dr + \sqrt{f_0} d[W] [a](m), with initial conditions [Z](0) = [H] [a](m) and [Y](0) = [N] [a](m);
    • Δt = 2π s' / Fac is the sampling step of the integration scheme (over-sampled if Fac > 1), and M_0 is a multiplier such that ρ ≫ 4 / f_0, the relaxation time of the dynamical system;
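A minimal NumPy sketch of steps 1–3, under stated assumptions: the min-max reading of the (ϵ, 1) scaling, the centering of [x_0] before the covariance, and the names scale_data, pca_normalize, kde_bandwidths are illustrative, not part of the method's specification.

```python
import numpy as np

def scale_data(x, eps=1e-3):
    """Step 1, one plausible reading: min-max scale each attribute to (eps, 1)."""
    xmin = x.min(axis=1, keepdims=True)
    xmax = x.max(axis=1, keepdims=True)
    return eps + (1.0 - eps) * (x - xmin) / (xmax - xmin)  # assumes non-constant attributes

def pca_normalize(x0, tol=1e-9):
    """Step 2: [eta] = [mu]^(-1/2) [phi]^T [x_0] for an n-by-N matrix [x_0]."""
    N = x0.shape[1]
    x0 = x0 - x0.mean(axis=1, keepdims=True)      # centering assumed, so [c] is a covariance
    c = x0 @ x0.T / (N - 1)                       # covariance matrix [c]
    mu, phi = np.linalg.eigh(c)                   # ascending eigenvalues, orthonormal eigenvectors
    keep = mu > tol * mu.max()                    # retain the nu positive eigenvalues
    mu, phi = mu[keep], phi[:, keep]
    return (mu ** -0.5)[:, None] * (phi.T @ x0)   # nu-by-N normalized matrix [eta]

def kde_bandwidths(nu, N):
    """Step 3: Silverman bandwidth s and the modified bandwidth s'."""
    s = (4.0 / ((nu + 2) * N)) ** (1.0 / (nu + 4))
    s_mod = s / np.sqrt(s ** 2 + (N - 1) / N)
    return s, s_mod
```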
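A sketch of step 4 along the same lines; the dense eigen-solver and the ordering of eigenpairs by the real part of the eigenvalues are implementation choices, and ε, κ, m are left to the caller.

```python
import numpy as np

def diffusion_basis(eta, eps, kappa, m):
    """Step 4: diffusion-maps basis [g](m), [a](m), and the projection [eta](m)."""
    d2 = ((eta[:, :, None] - eta[:, None, :]) ** 2).sum(axis=0)  # ||eta^i - eta^j||^2
    K = np.exp(-d2 / (4.0 * eps))                 # Gaussian kernel matrix [K]
    P = K / K.sum(axis=1, keepdims=True)          # [P] = diag([K] 1)^(-1) [K]
    lam, psi = np.linalg.eig(P)                   # right eigenpairs of [P]
    order = np.argsort(-lam.real)[:m]             # keep the m dominant eigenvectors
    lam, psi = lam.real[order], psi[:, order].real
    g = psi * lam ** kappa                        # {g} = [P]^kappa {psi}, columnwise
    a = g @ np.linalg.inv(g.T @ g)                # [a] = [g] ([g]^T [g])^(-1)
    z = eta @ a                                   # [z] = [eta] [a]
    return g, a, z @ g.T                          # [eta](m) = [z] [g](m)^T
```

Per step 5, m can be grown until \|[c](m) − [c]\|_F < ε_0 \|[c]\|_F is met.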

The Störmer–Verlet discretization scheme, which preserves energy for non-dissipative Hamiltonian dynamical systems, is used to integrate the ISDE: [Z_{l+1/2}] = [Z_l] + Δt/2 [Y_l]; [Y_{l+1}] = (1 − b)/(1 + b) [Y_l] + Δt/(1 + b) [L_{l+1/2}] [a](m) + \sqrt{f_0}/(1 + b) [ΔW_{l+1}] [a](m); [Z_{l+1}] = [Z_{l+1/2}] + Δt/2 [Y_{l+1}], where [L_{l+1/2}] = [L]([Z_{l+1/2}] [g](m)^T) and b = f_0 Δt / 4.
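A sketch of this integrator, assuming grad_potential implements the columnwise map [L] defined in the next paragraph and rng is a numpy.random.Generator (the name stormer_verlet is illustrative).

```python
import numpy as np

def stormer_verlet(Z0, Y0, g_m, a_m, f0, dt, n_steps, grad_potential, rng):
    """Integrate the reduced-order ISDE with the scheme above; returns the final (Z, Y)."""
    b = f0 * dt / 4.0
    Z, Y = Z0.copy(), Y0.copy()
    for _ in range(n_steps):
        Zh = Z + 0.5 * dt * Y                             # [Z_{l+1/2}]
        L = grad_potential(Zh @ g_m.T)                    # [L_{l+1/2}] = [L]([Z_{l+1/2}][g](m)^T)
        dW = np.sqrt(dt) * rng.standard_normal(L.shape)   # Wiener increments [DeltaW_{l+1}]
        Y = ((1 - b) * Y + dt * (L @ a_m) + np.sqrt(f0) * (dW @ a_m)) / (1 + b)
        Z = Zh + 0.5 * dt * Y                             # [Z_{l+1}]
    return Z, Y
```

Per step 5, realizations [η^l] are read off every M_0 steps as Z @ g_m.T.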

The underlying Markov stochastic process is that of a nonlinear second-order dynamical system (a dissipative Hamiltonian system): d[U] = [V] dr; d[V] = [L]([U]) dr − f_0/2 [V] dr + \sqrt{f_0} d[W], with initial conditions [U](0) = [H] and [V](0) = [N], where [L]([u]) = ( −∇ν(u^j) )_j with potential ν(u) = −\text{LogSumExp}_i\{ −‖u − (s'/s) η^i‖^2 / (2 s'^2) \}, [W] is a matrix of N independent ν-dimensional normalized Wiener processes (whose increments are standard Gaussian), [N] is a matrix of N independent ν-dimensional normalized Gaussian vectors, [H] is a random matrix admitting [η] as a realization, and f_0 is a dissipation parameter chosen so that the transient response of the ISDE is rapidly damped. The ISDE has a unique invariant measure and a unique solution, a second-order diffusion stochastic process that is stationary and ergodic and such that the probability distribution of the random matrix [U] is p_{[H]}(η).
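Since ν(u) = −log \sum_i \exp(q_i(u)) with q_i(u) = −‖u − (s'/s) η^i‖^2 / (2 s'^2), its gradient is −\sum_i w_i ∇q_i(u) with softmax weights w = softmax(q), so −∇ν(u) = (\sum_i w_i (s'/s) η^i − u) / s'^2. The sketch below evaluates this columnwise with a numerically stable exponentiation; make_grad_potential is an assumed name.

```python
import numpy as np

def make_grad_potential(eta, s, s_mod):
    """Build [L]([u]) = (-grad nu(u^j))_j for the LogSumExp potential nu above."""
    centers = (s_mod / s) * eta                    # shifted samples (s'/s) eta^i, nu-by-N
    def grad_potential(U):
        L = np.empty_like(U)
        for j in range(U.shape[1]):
            d = U[:, [j]] - centers                # u^j - (s'/s) eta^i for every i
            q = -(d * d).sum(axis=0) / (2.0 * s_mod ** 2)
            w = np.exp(q - q.max())                # stable softmax = LogSumExp gradient weights
            w /= w.sum()
            L[:, j] = (centers @ w - U[:, j]) / s_mod ** 2   # = -grad nu(u^j)
        return L
    return grad_potential
```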

Parameters: (ε, κ, m, f_0, Δt, M_0) (or replace m with ε_0);

(TODO: for discrete distributions)

Polynomial Chaos representation

Probabilistic non-convex optimization