This article covers the general theory of statistics on manifolds. For density estimation, density approximation, and sampling on manifolds, see Probabilistic Learning on Manifolds. For parametric probabilistic models on common matrix manifolds, see Probabilistic Models on Matrix Manifolds.
Main references: [@Pennec2006; @Chikuse2003].
Hausdorff measure $\mathcal{H}^d$ on a Riemannian d-manifold is the integral of the Riemannian density over measurable sets: $\mathcal{H}^d(A) = \int_A d V_g$ for measurable $A \subset \mathcal{M}$. Hausdorff measure generalizes the Lebesgue measure on Euclidean spaces to Riemannian manifolds. A probability measure on a Riemannian manifold is absolutely continuous w.r.t. the Hausdorff measure if and only if it is the product of the Hausdorff measure with a non-negative unit-integral function: $\mu \ll \mathcal{H}^d$ iff $\exists f \in L^1(\mathcal{M}, \mathbb{R}_{\ge 0})$, $\int_{\mathcal{M}} f d V_g = 1$, $\mu = f \mathcal{H}^d$. Such a function f is the probability density function (PDF) of μ w.r.t. the Hausdorff measure. In particular, if the manifold is compact, the Hausdorff measure normalized by the full integral (i.e. the volume if the manifold is oriented) defines a reference probability measure on the manifold: $\mu_0 = \mathcal{H}^d / \int_{\mathcal{M}} d V_g$; it corresponds to the uniform distribution on a compact domain of a Euclidean space.
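As a worked example (an illustration added here, not taken from the cited references), consider the unit sphere $S^2$ with the round metric, in spherical coordinates $(\theta, \varphi)$ with $\theta$ the colatitude:

$$
d V_g = \sin\theta \, d\theta \, d\varphi, \qquad \mathcal{H}^2(S^2) = \int_0^{2\pi} \int_0^{\pi} \sin\theta \, d\theta \, d\varphi = 4\pi, \qquad \mu_0 = \frac{\mathcal{H}^2}{4\pi}.
$$

The uniform distribution on $S^2$ therefore has the constant PDF $f = 1/(4\pi)$ w.r.t. the Hausdorff measure.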
Given a homeomorphism $f: \mathcal{M} \cong \mathcal{M}$ of a topological space, an invariant measure under f is a measure that coincides with its pushforward under f: for every measurable $A \subset \mathcal{M}$, $\mu(f^{-1}(A)) = \mu(A)$. Theorem [@Chikuse2003, Thm 1.2.2]: If a compact topological group acts continuously and transitively on a topological space, then there exists an invariant measure on the space under the transformations of the group, and it is unique up to a positive constant factor (hence unique once normalized to a probability measure). Corollary: Every homogeneous space (aka homogeneous manifold) has a unique normalized invariant measure under the associated Lie group, if the group is compact. A PDF can also be defined w.r.t. the invariant measure under the group. Invariant differential forms are defined for smooth d-manifolds similarly to invariant measures [@Chikuse2003, Sec 1.2.2]. Because every nonvanishing differential d-form determines a positive density, whose integral defines a measure, invariance of the form, of the density, and of the measure are equivalent, and we can define invariant measures via invariant differential d-forms. For example, see [@Chikuse2003, 1.4.2, 1.4.3, 1.4.5, 1.4.6] for common matrix manifolds (orthogonal group, Stiefel manifold, Grassmann manifold, fixed-rank symmetric projection manifold).
Because Riemannian manifolds generally do not have a linear structure, the mean / expectation of a probability measure cannot be carried over directly from the Euclidean setting. However, because variance depends only on the norm of a difference, that is, on a distance, it generalizes to any metric space, in particular to a connected Riemannian manifold with its Riemannian distance function.
Pointwise variance $\sigma^2(y)$ of a probability measure on a connected Riemannian manifold, w.r.t. a point on the manifold, is the expectation of the squared Riemannian distance from the point to a random point: given $(\mathcal{M}, d_g, \mu)$ and a random point $\chi \sim \mu$, for $y \in \mathcal{M}$, $\sigma^2(y) = \mathbb{E} d_g^2(y, \chi) = \int_{\mathcal{M}} d_g^2(y, x) \mu(d x)$. Variance $\sigma^2$ of a probability measure on a connected Riemannian manifold is the minimum pointwise variance, if the minimum exists: $\sigma^2 = \min_{y \in \mathcal{M}} \sigma^2(y)$. Standard deviation $\sigma$ of the probability measure is the (principal) square root of its variance.
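The definition can be made concrete for an empirical measure. The following is a minimal sketch, assuming the unit sphere $S^2 \subset \mathbb{R}^3$ with the great-circle distance and a measure supported on finitely many weighted points; the function names `sphere_dist` and `pointwise_variance` are hypothetical and chosen only for illustration.

```python
import math

def sphere_dist(x, y):
    """Great-circle (Riemannian) distance between unit vectors x and y on S^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def pointwise_variance(y, points, weights=None):
    """sigma^2(y) = sum_i w_i * d_g(y, x_i)^2 for a weighted empirical measure."""
    if weights is None:
        weights = [1.0 / len(points)] * len(points)  # uniform weights by default
    return sum(w * sphere_dist(y, x) ** 2 for w, x in zip(weights, points))

# Two equally weighted atoms a quarter of a great circle apart.
north = (0.0, 0.0, 1.0)
east = (1.0, 0.0, 0.0)
midpoint = (math.sqrt(0.5), 0.0, math.sqrt(0.5))
points = [north, east]

var_at_north = pointwise_variance(north, points)   # (0 + (pi/2)^2) / 2
var_at_mid = pointwise_variance(midpoint, points)  # (pi/4)^2
```

Here the geodesic midpoint has a smaller pointwise variance than either atom, which is consistent with the variance being the minimum of $\sigma^2(y)$ over the manifold.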
Fréchet mean (IPA French: [fʁeʃe]) of a probability measure on a connected Riemannian manifold is the set of points with the minimum pointwise variance, if the minimum exists [@Fréchet 1944, 1948]: $\mathbb{E} \chi = \arg\min_{y \in \mathcal{M}} \sigma^2(y)$. Each point in this set is called an expected point or a mean point. Riemannian center of mass or Karcher mean of a probability measure on a connected Riemannian manifold is the set of points which minimize the pointwise variance locally [@Karcher1977]: $\mathbb{E} \chi = \{y \in \mathcal{M} : \exists \varepsilon > 0, y = \arg\min \sigma^2|_{\exp_y(B_\varepsilon(0))}\}$.
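The Fréchet mean is a set, and it can contain more than one point. The sketch below illustrates this on the unit circle with two equally weighted antipodal atoms, using a brute-force grid search; the grid resolution, the tolerance, and the function names are assumptions made for this illustration only.

```python
import math

def circle_dist(a, b):
    """Arc-length distance between angles a and b (radians) on the unit circle."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def frechet_mean_grid(atoms, n_grid=360, tol=1e-9):
    """Grid indices whose pointwise variance is within tol of the grid minimum."""
    grid = [2.0 * math.pi * k / n_grid for k in range(n_grid)]
    variances = [
        sum(circle_dist(y, a) ** 2 for a in atoms) / len(atoms) for y in grid
    ]
    best = min(variances)
    return [k for k, v in enumerate(variances) if v - best <= tol], best

# Atoms at angles 0 and pi; with n_grid=360 the grid index equals the angle in degrees.
minimizers, sigma2 = frechet_mean_grid([0.0, math.pi])
```

On this grid the minimizers are the two points at 90 and 270 degrees, each with pointwise variance $\pi^2 / 4$, so the mean is not unique for this measure.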
Existence and uniqueness of the Riemannian center of mass. Regular geodesic ball is a geodesic ball whose radius is less than $C^{-1/2} \pi / 2$, where $C = \max \{\text{sec}(\Pi) : \Pi \in G_2(T_{q} \mathcal{M}), q \in \exp_p(B_r(0))\}$ is the maximum sectional curvature in this geodesic ball. Theorem [@Karcher1977]: If the measure is supported in a regular geodesic ball and the ball of double radius is still a regular geodesic ball, then the pointwise variance $\sigma^2(y)$ is a geodesically convex function and has only one critical point which is the unique Riemannian center of mass. Theorem [@Kendall1990]: If the measure is supported in a regular geodesic ball, then it has a unique Riemannian center of mass.
"Pullback measure" on the tangent space of a point in a connected complete Riemannian manifold, given a measure on the manifold, is the measure induce on the injectivity domain of the point by the exponential map: for $T \subset \text{ID}(x) \subset T_x \mathcal{M}$, define $\mu(T) = \mu(\exp_x(T))$. The PDF of the pullback measure is $p(v) = p(y) \omega_g(d \exp_{x}(v) (e_i))_{i=1}^n$. Note that we use the same notation for the objects on the manifold and on the injectivity domain.
Theorem (differentiability of pointwise variance) [@Pennec2006; @Pr Maillot, 1997]: The pointwise variance of a probability measure on a connected complete Riemannian manifold is differentiable at any point whose cut locus has measure zero and at which the pointwise variance is finite: if $\mu(\text{Cut}(y)) = 0$ and $\sigma^2(y) < \infty$, then $\text{grad}~\sigma^2 (y) = -2 \int_{\text{ID}(y)} v~\mu(d v)$, where $\mu(d v)$ denotes the pullback measure on the injectivity domain of $y$.
If the pointwise variance is differentiable, then any Riemannian center of mass must be a critical point: $\mathbb{E} \chi \subset (\text{grad}~\sigma^2)^{-1}(0)$.
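The critical-point condition suggests a fixed-point iteration: move from $y$ along the mean of the pulled-back sample, $y \leftarrow \exp_y(\frac{1}{N} \sum_i \log_y x_i)$, which is a gradient step on $\sigma^2$ with step size $1/2$. The following is a minimal sketch of that idea, assuming the unit sphere $S^2$, an empirical measure, and a sample concentrated well inside a regular geodesic ball; it is not presented as the method of the cited references, and the helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def log_map(y, x):
    """Tangent vector at y pointing toward x, with length d_g(y, x), on the unit sphere."""
    c = max(-1.0, min(1.0, dot(y, x)))
    u = [xi - c * yi for xi, yi in zip(x, y)]  # part of x orthogonal to y
    norm_u = math.sqrt(dot(u, u))
    if norm_u < 1e-15:
        return [0.0, 0.0, 0.0]  # x == y; antipodal x is on the cut locus and is not handled
    theta = math.acos(c)
    return [theta * ui / norm_u for ui in u]

def exp_map(y, v):
    """Point reached from y along the geodesic with initial velocity v."""
    r = math.sqrt(dot(v, v))
    if r < 1e-15:
        return list(y)
    z = [math.cos(r) * yi + math.sin(r) * vi / r for yi, vi in zip(y, v)]
    norm_z = math.sqrt(dot(z, z))
    return [zi / norm_z for zi in z]  # renormalize to limit numerical drift

def karcher_mean(points, y0, steps=100, tol=1e-12):
    """Iterate y <- exp_y(mean_i log_y(x_i)) until the mean tangent vector vanishes."""
    y = list(y0)
    for _ in range(steps):
        logs = [log_map(y, x) for x in points]
        mean_log = [sum(col) / len(points) for col in zip(*logs)]  # equals -grad(sigma^2)/2
        if math.sqrt(dot(mean_log, mean_log)) < tol:
            break
        y = exp_map(y, mean_log)
    return y

# Three points at colatitude 0.3, equally spaced in longitude around the north pole.
colat = 0.3
points = [
    (math.sin(colat) * math.cos(lon), math.sin(colat) * math.sin(lon), math.cos(colat))
    for lon in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
]
mean = karcher_mean(points, y0=points[0])
```

By symmetry the Riemannian center of mass of this sample is the north pole, and the iteration started at one of the sample points converges to it. For widely spread samples the fixed step may need damping, which this sketch does not attempt.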
The covariance matrix of a random vector can be naturally extended to a tangent space. Covariance of a probability measure on a connected complete Riemannian manifold, w.r.t. a point whose cut locus has measure zero, is the covariant 2-tensor $\text{Cov}(y) = \int_{\text{ID}(y)} v v^T \mu(d v)$. The trace of the covariance equals the pointwise variance: $(\text{tr}_g \text{Cov})(y) = \sigma^2(y)$.
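The trace identity can be checked numerically. The sketch below assumes the unit sphere $S^2 \subset \mathbb{R}^3$ and an empirical measure, and represents tangent vectors in ambient coordinates; because the metric is induced from $\mathbb{R}^3$, the ordinary matrix trace of the resulting $3 \times 3$ matrix coincides with $\text{tr}_g$ in this representation. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def log_map(y, x):
    """Tangent vector at y pointing toward x, with length d_g(y, x), on the unit sphere."""
    c = max(-1.0, min(1.0, dot(y, x)))
    u = [xi - c * yi for xi, yi in zip(x, y)]  # part of x orthogonal to y
    norm_u = math.sqrt(dot(u, u))
    if norm_u < 1e-15:
        return [0.0, 0.0, 0.0]  # x == y; antipodal points are not handled
    theta = math.acos(c)
    return [theta * ui / norm_u for ui in u]

def covariance(y, points):
    """Empirical Cov(y) = (1/N) sum_i v_i v_i^T with v_i = log_y(x_i), ambient coordinates."""
    logs = [log_map(y, x) for x in points]
    n = len(points)
    return [[sum(v[i] * v[j] for v in logs) / n for j in range(3)] for i in range(3)]

y = (0.0, 0.0, 1.0)
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (math.sin(0.5), 0.0, math.cos(0.5))]

cov = covariance(y, points)
trace = cov[0][0] + cov[1][1] + cov[2][2]
# Pointwise variance computed independently from the distances.
variance = sum(math.acos(dot(y, x)) ** 2 for x in points) / len(points)
```

The normal direction at $y$ lies in the kernel of the matrix (here the third row and column vanish), so only the tangential block carries information.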