This article looks at some parametric families of probability distributions on matrix manifolds. The default reference is [@Chikuse2003]; other main references include [@Dryden2016; @Bhattacharya2012; @Patrangenaru2015; @Srivastava2016]. In this article, |A| denotes the determinant of a square matrix A; $N_n$ denotes the distribution of an n-dimensional Gaussian random vector; (dX) denotes the Lebesgue measures for rectangular, symmetric, or skew-symmetric matrices, or the unnormalized invariant measure on the Stiefel or Grassmann manifold; [dX] denotes the normalized invariant measure on the Stiefel or Grassmann manifold. Following [@Chikuse2003], all PDFs on the Stiefel or Grassmann manifold are defined w.r.t. the normalized measures.

Left and right orthogonal invariance... $M \in M_{m,n}$, $p(M) = p(M Q)$, $\forall Q \in O(n)$. See orthogonal-invariant function.

**Matrix-variate standard Gaussian** distribution $N_{n,k}(0; I_n, I_k)$
on the n-by-k matrix manifold $M_{n,k}$ [Sec 1.5]
is the nk-dimensional standard Gaussian vector stacked into a matrix:
let $\text{vec}(Z) \sim N_{nk}(0, I_{nk})$, then $Z \sim N_{n,k}(0; I_n, I_k)$.
PDF: (1) $\phi(Z) = z^{-1} \exp(-\text{tr}(Z^T Z) / 2)$,
where normalizing constant $z = (2 \pi)^{kn/2}$;
(2) because the Frobenius norm is related to the trace via $\|Z\|_F^2 = \text{tr}(Z^T Z)$,
we have $\phi(Z) = z^{-1} \exp(-\|Z\|_F^2 / 2)$.
Its characteristic function is also log-quadratic: $\Phi_Z(T) = \exp(-\text{tr}(T^T T)/2)$.

**Matrix-variate Gaussian** distribution $N_{n,k}(M; \Sigma_n, \Sigma_k)$
on the n-by-k matrix manifold,
where $M \in M_{n,k}$, $\Sigma_n \in \mathcal{S}_+(n)$, and $\Sigma_k \in \mathcal{S}_+(k)$,
is the distribution of a transformed matrix-variate standard Gaussian:
$Y = \Sigma_n^{1/2} Z \Sigma_k^{1/2} + M$, where $Z \sim N_{n,k}(0; I_n, I_k)$.
This can be written in a vectorized form as:
$\text{vec}(Y) = (\Sigma_k^{1/2} \otimes \Sigma_n^{1/2}) \text{vec}(Z) + \text{vec}(M)$,
where $\otimes$ is the Kronecker product.
Note that this is a special form of the nk-dimensional Gaussian vector $N_{nk}(\mu, \Sigma)$,
where covariance matrix $\Sigma = \Sigma_k \otimes \Sigma_n \in \mathcal{S}_+(nk)$.
PDF: (1) relation with the matrix-variate standard Gaussian PDF:
$p(Y; M, \Sigma_n, \Sigma_k) = |\Sigma_n|^{-k/2} |\Sigma_k|^{-n/2}
\phi(\Sigma_n^{-1/2} (Y - M) \Sigma_k^{-1/2})$;
(2) explicit form: $p(Y; M, \Sigma_n, \Sigma_k) = z^{-1} \exp\{-\text{tr}[\Sigma_n^{-1} (Y - M)
\Sigma_k^{-1} (Y - M)^T] / 2\}$,
where normalizing constant $z = (2 \pi)^{kn/2} |\Sigma_n|^{k/2} |\Sigma_k|^{n/2}$.

**Symmetric standard Gaussian** distribution $N_{n,n}(0; I_n)$
on symmetric matrix manifold $\mathcal{S}(n)$
is the matrix-variate standard Gaussian restricted to the symmetric matrix manifold:
let $M \sim N_{n,n}(0; I_n, I_n)$, then $M|_{\mathcal{S}(n)} \sim N_{n,n}(0; I_n)$.
PDF: (1) $\phi(S) = z^{-1} \exp(-\text{tr}(S^2)/2)$,
where normalizing constant $z = 2^{n/2} \pi^{n(n+1)/4}$;
(2) norm version, $\phi(S) = z^{-1} \exp(-\|S\|_F^2 / 2)$.
Its characteristic function is also log-quadratic: $\Phi_S(T) = \exp(-\text{tr}(T^2)/2)$.

**Symmetric Gaussian** distribution $N_{n,n}(M; \Sigma)$
on symmetric matrix manifold $\mathcal{S}(n)$,
where $M \in \mathcal{S}(n)$ and $\Sigma \in \mathcal{S}_+(n)$,
is the distribution of a transformed symmetric standard Gaussian:
$Y = \Sigma^{1/2} Z \Sigma^{1/2} + M$, where $Z \sim N_{n,n}(0; I_n)$.
PDF: (1) relation with the symmetric standard Gaussian PDF:
$p(Y; M, \Sigma) = |\Sigma|^{-(n+1)/2} \phi(\Sigma^{-1/2} (Y - M) \Sigma^{-1/2})$;
(2) explicit form: $p(Y; M, \Sigma) = z^{-1} \exp\{-\text{tr}[\Sigma^{-1} (Y - M)
\Sigma^{-1} (Y - M)] / 2\}$,
where normalizing constant $z = 2^{n/2} \pi^{n(n+1)/4} |\Sigma|^{(n+1)/2}$.

**Wishart** distribution $W_n(m, \Sigma)$ on positive-definite manifold $\mathcal{S}_+(n)$,
where degree of freedom $m \in \{n, n+1, \cdots\}$
and covariance matrix $\Sigma \in \mathcal{S}_+(n)$,
is the sampling distribution of the (unscaled) covariance matrix
of a zero-mean Gaussian random vector [@Wishart1928]:
let $y \sim N_n(0, \Sigma)$, if $Y = \{y_i\}_{i=1}^m$ is a random sample,
then $Y Y^T = \sum_{i=1}^m y_i y_i^T \sim W_n(m, \Sigma)$.
Notice that $Y \sim N_{n,m}(0; \Sigma, I_m)$ and $Y^T \sim N_{m,n}(0; I_m, \Sigma)$
are special forms of matrix-variate Gaussians.
PDF: (1) $p(S; m, \Sigma) = z^{-1} |S|^{(m-n-1)/2} \exp(-\text{tr}(\Sigma^{-1} S)/2)$,
where normalizing constant $z = 2^{mn/2} \Gamma_n(m/2) |\Sigma|^{m/2}$,
and $\Gamma_n(a)$ is the multivariate gamma function;
(2) norm version,
$p(S; m, \Sigma) = z^{-1} |S|^{(m-n-1)/2} \exp(-\|\Sigma^{-1/2} S^{1/2}\|_F^2 / 2)$.
Its characteristic function is: $\Phi_S(T) = |I_n - 2i \Sigma T|^{-m/2}$.
The Wishart distribution converges to the symmetric standard Gaussian
in a specific limit [@James1976]:
$\lim_{u \to \infty} p(u I_n + \sqrt{u} S; n+1+2u, I_n/2) = z^{-1} \exp(-\text{tr}(S^2)/2)$.

**Noncentral Wishart** distribution $W_n(m, \Sigma; \Omega)$
on positive-definite manifold $\mathcal{S}_+(n)$,
where non-centrality matrix $\Omega \in \mathcal{S}_{\ge 0}(n)$,
is the distribution of the sum of m squared equal-variance Gaussian random vectors [@Muirhead1982]:
let $y_i \sim N_n(m_i, \Sigma)$, and $Y = \{y_i\}_{i=1}^m$ is jointly independent,
then $Y Y^T = \sum_{i=1}^m y_i y_i^T \sim W_n(m, \Sigma, \Sigma^{-1} M M^T)$,
where $M = \{m_i\}_{i=1}^m$.
Notice that $Y \sim N_{n,m}(M; \Sigma, I_m)$ and $Y^T \sim N_{m,n}(M^T; I_m, \Sigma)$
are special forms of matrix-variate Gaussians.
PDF: $p(S; m, \Sigma, \Omega) = z^{-1} {}_0 F_{1}(m/2; \Omega \Sigma^{-1} S / 4)\ p(S; m, \Sigma)$,
where normalizing constant $z = \exp(\text{tr}(\Omega)/2)$,
and ${}_0 F_{1}(m/2; \Omega \Sigma^{-1} S / 4)$ is a hypergeometric function with a matrix argument,
which equals $\int_{O(m)} \exp(\text{tr}((\Omega \Sigma^{-1} S)^{1/2} H)) [d H]$.
Note that the noncentral Wishart generalizes the Wishart distribution:
$W_n(m, \Sigma; 0) = W_n(m, \Sigma)$.

An unnamed family of distributions on the positive-definite manifold is given by [@Hayakawa1966], see [Thm 2.4.2] below. Its exact computation is intractable. It reduces to the simplest Wishart distribution: $p(\cdot; I_n) = W_k(n, I_k)$.

**von Mises distribution** on the 1-shpere (i.e. circle)
is a closed-form approximation to the wrapped Gaussian distribution.
**von Mises–Fisher distribution** (vMF) [@Fisher1953]
generalizes the von Mises distribution to n-spheres,
and is the most commonly used distribution in directional/circular/spherical statistics:
$f(x; a, b) \propto \exp(b \cos(x-a))$, where $x, a \in [0, 2\pi), b \ge 0$.

**Bingham** [@Bingham1974]

**Kent** [@Kent1982]

**Angular central Gaussian** (ACG)

**Bivariate von Mises** on the 2-torus [@Mardia1975]

**Uniform** distribution on the Stiefel manifold w.r.t. the invariant measure
is the normalized invariant measure, which is possible because the invariant measure is finite:
$[d X] = (d X) / v(k, n)$, where $v(k, n) = 2^k \pi^{kn/2} \Gamma_k^{-1}(n/2)$.
Note that $v(k, n) := \int_{V_{k,n}} (d X)$
should not be called the volume or mass of the Stiefel manifold
because the invariant differential n-form is defined differently from the Riemannian volume form.
PDF: $p(X) = 1$.

**Matrix Langevin** $L_{k,n}(F)$ or **matrix von Mises-Fisher** (matrix vMF), where $F \in M_{n,k}$,
is an exponential-linear family on the Stiefel manifold $V_{k,n}$ [@Downs1972].
PDF: $p(X; F) = z^{-1} \exp(\text{tr}(F^T X))$,
where normalizing constant $z = {}_{0} F_{1} (n/2; F^T F / 4)$.
It is analgous to the distribution $\exp(c x)$,
but on a compact domain and can have non-negative coefficients.
The von Mises-Fisher (vMF) for n-spheres and matrix vMF for Stiefel manifolds
are in the larger family of maximum entropy distributions with moment constraints [@Pennec2006].
A matrix-variate Gaussian restricts to matrix Langevin on the Stiefel:
if $X \sim N_{n,k}(M; I_n, \Sigma)$, then $X|_{V_{k,n}} \sim L_{k,n}(M \Sigma^{-1})$.
Matrix Langevin on the Stiefel contains the uniform distribution: $L_{k,n}(0) = U(V_{k,n})$.

**Matrix Bingham** $B_{k,n}(B)$, where $B \in \mathcal{S}(n)$,
is an exponential-quadratic family on the Stiefel manifold $V_{k,n}$.
PDF: $p(X; B) = z^{-1} \exp(\text{tr}(X^T B X))$,
where normalizing constant $z = {}_{1} F_{1} (k/2; n/2; B/2)$.
It is analgous to $\exp(c x^2)$, but on a compact domain and can have non-negative coefficients.
It is right orthogonal invariant, and therefore explicitly defines a distribution
on the Grassmann manifold (see matrix Langenvin on the Grassmann).
Matrix Bingham contains the uniform distribution: $B_{k,n}(I_n) = U(V_{k,n})$.

Consider the orthonormalization or projection $\pi: M^∗_{n,k} \mapsto V_{k,n}$
defined by $\pi(M) = U V^T$, where $M = U \Sigma V^T$ is a thin SVD.
Note that although SVD is not unique, the mapping is uniquely defined.
*Theorem* (Orthonormalization induced distribution) [Thm 2.4.1]:
Let a random n-by-k matrix $M \sim \tilde{f}$, denote $\pi(M) \sim f$ and $M^T M \sim g$,
then $f(X) = c(n,k) \int_{\mathcal{S}_+(k)} \tilde{f}(X S^{1/2}) |S|^{(n-k-1)/2} (d S)$,
and $g(S) = c(n,k) |S|^{(n-k-1)/2} \int_{V_{k,n}} \tilde{f}(X S^{1/2}) [d X]$,
with coefficient $c(n,k) = \pi^{kn/2} \Gamma_k^{-1}(n/2)$.
*Theorem* (Orthonormalization of left orthogonal invariant Gaussian) [Thm 1.5.5, 2.4.3]:
Let $M \sim N_{n,k}(0; I_n, \Sigma)$,
then $\pi(M) \sim U(V_{k,n})$, $M^T M \sim W_k(n, \Sigma)$, and $\pi(M) \perp M^T M$.
*Theorem* (Orthonormalization of right orthogonal invariant Gaussian) [Thm 2.4.2]:
Let $M \sim N_{n,k}(0; \Sigma, I_k)$,
then $\pi(M) \sim \text{MACG}(\Sigma)$, and
$M^T M \sim p(S; \Sigma) = z^{-1} |S|^{(n-k-1)/2} {}_{0} F_{0}^{(n)}(-\Sigma^{-1}/2, S)$,
where normalizing constant $z = 2^{nk/2} \Gamma_{k}(n/2) |\Sigma|^{k/2}$,
and ${}_{0} F_{0}^{(n)}(S, T) := \int_{O(n)} \exp(\text{tr}(S H T H^T)) [d H]$,
with $S, T \in \mathcal{S}(n)$ [Eq A.6.6].

**Matrix angular central Gaussian** $\text{MACG}(\Sigma)$ on the Stiefel manifold $V_{k,n}$,
where $\Sigma \in \mathcal{S}_+(n)$, is the distribution induced by orthonormalization
of a right orthogonal invariant Gaussian [@Tyler1987]:
if $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \sim \text{MACG}(\Sigma)$.
Equivalently, if a random n-by-k matrix has a PDF of the form
$p(M; \Sigma) \propto g(M^T \Sigma^{-1} M)$ and is right orthogonal invariant,
then $\pi(M) \sim \text{MACG}(\Sigma)$.
Note that for $N_{n,k}(0; \Sigma, I_k)$, we have $g(S) = \exp(-\text{tr}(S)/2)$.
More generally, g() may build on matrix-variate functions (trace, determinant, spectral radius),
real functions, and (optionally) matrix functions (matrix polynomial / exponential / logarithm).
PDF: $p(X; \Sigma) = z^{-1} |X^T \Sigma^{-1} X|^{-n/2}$,
where normalizing constant $z = |\Sigma|^{k/2}$.
The parameter is determined up to scaling:
$\text{MACG}(\Sigma) = \text{MACG}(c \Sigma)$, $\forall c > 0$.
Evaluating the PDF with a diagonal $\Sigma$:
matrix multiplication takes $n k^2$ FLOPs; determinant takes $\mathcal{O}(k^3)$ FLOPs.
It is (also) right orthogonal invariant, and therefore explicitly defines a distribution
on the Grassmann manifold (see MACG on the Grassmann).
MACG contains the uniform distribution: $\text{MACG}(I_n) = U(V_{k,n})$.
If $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(B M) \sim \text{MACG}(B \Sigma B^T)$;
if $B^T B = \Sigma^{-1}$, then $B \Sigma B^T = I_n$ and therefore $\pi(B M) \sim U(V_{k,n})$.

An unnamed general family $\mathcal{F}^{(V)}(B; d)$ on the Stiefel manifold $V_{k,n}$, where $B \in \mathcal{S}(n)$ and $d: \mathbb{Z}_{+\downarrow}^\infty \mapsto \mathbb{R}$, is defined by PDF of the form [Eq 2.3.9]: $p(X; B; d) = z^{-1} g(X^T B X)$, where normalizing constant $z = g_0(B)$. Here, $g(S) = \sum_{l=0}^\infty (l!)^{-1} \sum_{\lambda \vdash l} d_{\lambda} C_{\lambda}(S)$, and $g_0(B) = \sum_{l=0}^\infty (l!)^{-1} \sum_{\lambda \vdash l} d_{\lambda} (k/2)_{\lambda} (n/2)_{\lambda}^{-1} C_{\lambda}(B)$. For the definitions of normalized zonal polynomial $C_{\lambda}(S)$ and generalized hypergeometric coefficient $(a)_{\lambda}$, see hypergeometric function with a matrix argument. Every distribution in this family is right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see unnamed general family on the Grassmann). It can be generalized using invariant polynomials with mutiple matrix arguments [Eq 2.3.17]. Special caes have a simpler form [Eq 2.3.12-16], which include matrix Bingham and MACG.

Sampling: geodesic Monte Carlo [@Byrne2013] is applicable.

**Uniform** distribution on the Grassmann manifold w.r.t. the invariant measure
is the normalized invariant measure, which is possible because the invariant measure is finite:
$[d P] = (d P) / g(k, n)$, where $g(k, n) = \pi^{k(n-k)/2} \Gamma_k(k/2) \Gamma_k^{-1}(n/2)$.
Note that $g(k, n) := \int_{\mathcal{P}(k,n)} (d P)$
should not be called the volume or mass of the Grassmann manifold,
because the invariant differential n-form is defined differently from the Riemannian volume form,
and the Grassmann manifold is not orientable when n is odd.
PDF: $p(P) = 1$.

Any PDF on the Stiefel manifold that is right orthogonal invariant explicitly defines a distribution on the Grassmann manifold: let $X \in V_{k,n}$, $X \sim p(X)$, and $p(X) = p(X Q)$, $\forall Q \in O(k)$, then $X X^T \sim p(X)$. This applies to matrix Langevin and matrix angular central Gaussian.

**Matrix Langevin** $L^{(P)}_{k,n}(B)$, where $B \in \mathcal{S}(n)$,
is an exponential-linear family on the rank-k symmetric projection matrix manifold
$\mathcal{P}(k,n)$ [@Watson1991; @Chikuse and Watson, 1995].
PDF: $p(P; B) = z^{-1} \exp(\text{tr}(B P))$,
where normalizing constant $z = {}_{1} F_{1} (k/2; n/2; B/2)$.
It is analgous to the distribution $\exp(c x)$,
but on a compact domain and can have non-negative coefficients.
Its PDF is the same as matrix Bingham, with $P = X X^T$.
Matrix Bingham induces matrix Langevin on the Grassmann by submersion:
if $X \sim B_{k,n}(B)$, then $X X^T \sim L^{(P)}_{k,n}(B)$.
A symmetric Gaussian restricts to matrix Langevin on the Grassmann:
if $S \sim N_{n,n}(B, I_n)$, then $S|_{\mathcal{P}(k,n)} \sim L^{(P)}_{k,n}(B)$.
Matrix Langevin on the Grassmann contains the uniform distribution:
$L^{(P)}_{k,n}(0) = U(\mathcal{P}(k,n))$.

**Matrix angular central Gaussian** $\text{MACG}(\Sigma)$ on the Grassmann manifold
is the distribution induced by the orthonormalized projection
of a right orthogonal invariant Gaussian [Rmk 2.4.11]:
if $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \pi(M)^T \sim \text{MACG}(\Sigma)$.
PDF: (1) Stiefel representation: $p(X X^T; \Sigma) = z^{-1} |X^T \Sigma^{-1} X|^{-n/2}$,
where normalizing constant $z = |\Sigma|^{k/2}$ (This is exactly the same as MACG for the Stiefel);
(2) projection matrix: $p(P; \Sigma) = z^{-1} |I_n - (I_n - \Sigma^{-1}) P|^{-n/2}$.
Note that the two forms are equivalent: by Sylvester's determinant theorem,
$|I_n - (I_n - \Sigma^{-1}) X X^T| = |I_k - X^T (I_n - \Sigma^{-1}) X|$,
the right-hand side equals $|I_k - X^T X + X^T \Sigma^{-1} X|$ where the first two terms cancels,
and therefore $|I_n - (I_n - \Sigma^{-1}) X X^T| = |X^T \Sigma^{-1} X|$.
MACG on the Stiefel induces MACG on the Grassmann by submersion:
if $X \sim \text{MACG}(\Sigma)$, then $X X^T \sim \text{MACG}(\Sigma)$.
All properties of MACG on Stiefel apply.

**Orthogonal projective Gaussian** $\text{OPG}(B)$
on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$,
where $B \in \cup_{i=0}^k \mathcal{S}_{\ge 0}(i, n)$,
is the distribution induced by the orthonormalized projection
of a (right-transformed) matrix-variate Gaussian:
if $M \sim N_{n,k}(M; I_n, \Sigma)$, then $\pi(M) \pi(M)^T \sim \text{OPG}(M \Sigma^{-1} M^T)$.
Note that $\text{MACG}(\Sigma)$ can be induced in the same way
with $M \sim N_{n,k}(0; \Sigma, I_k)$.
PDF: $p(P; B) = z^{-1} {}_{1} F_{1}(n/2; k/2; BP/2)$,
where normalizing constant $z = \exp(\text{tr}(B)/2)$.
Compared with matrix Langevin $L^{(P)}_{k,n}(B)$,
the PDF uses ${}_{1} F_{1}$ instead of ${}_{0} F_{0}$,
and therefore its exact computation is intractable.

An unnamed general family $\mathcal{F}^{(P)}(B; d)$ on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$, is defined by PDF of the form [Eq 2.3.20]: $p(P; B; d) = z^{-1} g(B P)$, where normalizing constant $z = g_0(B)$. See the unnamed general family $\mathcal{F}^{(V)}(B; d)$ on Stiefel for defintions of the notations. All properties of the unnamed general family on Stiefel apply.

Riemannian symmetric spaces: Gaussian-like distribution $f(p) \propto \exp[- d^2(p, \mu) / (2 \sigma^2)]$ [@Said2018],

- positive definite matrices [@Said2017] with a certain structure, e.g. complex, Toeplitz, or block-Toeplitz;
- Grassmannian

Heat kernels:

- 1-sphere: wrapped Gaussian.

**Wrapped Gaussian distribution** on the 1-sphere is its heat kernel,
which "wraps" the Gaussian distribution around the unit circle: $f_{WN} (\theta; \mu, \sigma) =
\sum_{k \in \mathbb{Z}} \exp[-(\theta-\mu+2\pi k)^2/(2\sigma^2)] / \sqrt{2 \pi \sigma^2}$.
Heat kernels of complete Riemannian manifolds do not have closed forms in general,
see Geometric Diffusion.