This article looks at some parametric families of probability distributions on matrix manifolds. The default reference is [@Chikuse2003]; other main references include [@Dryden2016; @Bhattacharya2012; @Patrangenaru2015; @Srivastava2016]. In this article, |A| denotes the determinant of a square matrix A; $N_n$ denotes the distribution of an n-dimensional Gaussian random vector; (dX) denotes the Lebesgue measures for rectangular, symmetric, or skew-symmetric matrices, or the unnormalized invariant measure on the Stiefel or Grassmann manifold; [dX] denotes the normalized invariant measure on the Stiefel or Grassmann manifold. Following [@Chikuse2003], all PDFs on the Stiefel or Grassmann manifold are defined w.r.t. the normalized measures.

## Concepts

Left and right orthogonal invariance... $M \in M_{m,n}$, $p(M) = p(M Q)$, $\forall Q \in O(n)$. See orthogonal-invariant function.

## Matrix manifolds

### General matrix manifold

Matrix-variate standard Gaussian distribution $N_{n,k}(0; I_n, I_k)$ on the n-by-k matrix manifold $M_{n,k}$ [Sec 1.5] is the nk-dimensional standard Gaussian vector stacked into a matrix: let $\text{vec}(Z) \sim N_{nk}(0, I_{nk})$, then $Z \sim N_{n,k}(0; I_n, I_k)$. PDF: (1) $\phi(Z) = z^{-1} \exp(-\text{tr}(Z^T Z) / 2)$, where normalizing constant $z = (2 \pi)^{kn/2}$; (2) because the Frobenius norm is related to the trace via $\|Z\|_F^2 = \text{tr}(Z^T Z)$, we have $\phi(Z) = z^{-1} \exp(-\|Z\|_F^2 / 2)$. Its characteristic function is also log-quadratic: $\Phi_Z(T) = \exp(-\text{tr}(T^T T)/2)$.

Matrix-variate Gaussian distribution $N_{n,k}(M; \Sigma_n, \Sigma_k)$ on the n-by-k matrix manifold, where $M \in M_{n,k}$, $\Sigma_n \in \mathcal{S}_+(n)$, and $\Sigma_k \in \mathcal{S}_+(k)$, is the distribution of a transformed matrix-variate standard Gaussian: $Y = \Sigma_n^{1/2} Z \Sigma_k^{1/2} + M$, where $Z \sim N_{n,k}(0; I_n, I_k)$. This can be written in a vectorized form as: $\text{vec}(Y) = (\Sigma_k^{1/2} \otimes \Sigma_n^{1/2}) \text{vec}(Z) + \text{vec}(M)$, where $\otimes$ is the Kronecker product. Note that this is a special form of the nk-dimensional Gaussian vector $N_{nk}(\mu, \Sigma)$, where covariance matrix $\Sigma = \Sigma_k \otimes \Sigma_n \in \mathcal{S}_+(nk)$. PDF: (1) relation with the matrix-variate standard Gaussian PDF: $p(Y; M, \Sigma_n, \Sigma_k) = |\Sigma_n|^{-k/2} |\Sigma_k|^{-n/2} \phi(\Sigma_n^{-1/2} (Y - M) \Sigma_k^{-1/2})$; (2) explicit form: $p(Y; M, \Sigma_n, \Sigma_k) = z^{-1} \exp\{-\text{tr}[\Sigma_n^{-1} (Y - M) \Sigma_k^{-1} (Y - M)^T] / 2\}$, where normalizing constant $z = (2 \pi)^{kn/2} |\Sigma_n|^{k/2} |\Sigma_k|^{n/2}$.

### Symmetric manifold

Symmetric standard Gaussian distribution $N_{n,n}(0; I_n)$ on symmetric matrix manifold $\mathcal{S}(n)$ is the matrix-variate standard Gaussian restricted to the symmetric matrix manifold: let $M \sim N_{n,n}(0; I_n, I_n)$, then $M|_{\mathcal{S}(n)} \sim N_{n,n}(0; I_n)$. PDF: (1) $\phi(S) = z^{-1} \exp(-\text{tr}(S^2)/2)$, where normalizing constant $z = 2^{n/2} \pi^{n(n+1)/4}$; (2) norm version, $\phi(S) = z^{-1} \exp(-\|S\|_F^2 / 2)$. Its characteristic function is also log-quadratic: $\Phi_S(T) = \exp(-\text{tr}(T^2)/2)$.

Symmetric Gaussian distribution $N_{n,n}(M; \Sigma)$ on symmetric matrix manifold $\mathcal{S}(n)$, where $M \in \mathcal{S}(n)$ and $\Sigma \in \mathcal{S}_+(n)$, is the distribution of a transformed symmetric standard Gaussian: $Y = \Sigma^{1/2} Z \Sigma^{1/2} + M$, where $Z \sim N_{n,n}(0; I_n)$. PDF: (1) relation with the symmetric standard Gaussian PDF: $p(Y; M, \Sigma) = |\Sigma|^{-(n+1)/2} \phi(\Sigma^{-1/2} (Y - M) \Sigma^{-1/2})$; (2) explicit form: $p(Y; M, \Sigma) = z^{-1} \exp\{-\text{tr}[\Sigma^{-1} (Y - M) \Sigma^{-1} (Y - M)] / 2\}$, where normalizing constant $z = 2^{n/2} \pi^{n(n+1)/4} |\Sigma|^{(n+1)/2}$.

### Positive-definite manifold

Wishart distribution $W_n(m, \Sigma)$ on positive-definite manifold $\mathcal{S}_+(n)$, where degree of freedom $m \in \{n, n+1, \cdots\}$ and covariance matrix $\Sigma \in \mathcal{S}_+(n)$, is the sampling distribution of the (unscaled) covariance matrix of a zero-mean Gaussian random vector [@Wishart1928]: let $y \sim N_n(0, \Sigma)$, if $Y = \{y_i\}_{i=1}^m$ is a random sample, then $Y Y^T = \sum_{i=1}^m y_i y_i^T \sim W_n(m, \Sigma)$. Notice that $Y \sim N_{n,m}(0; \Sigma, I_m)$ and $Y^T \sim N_{m,n}(0; I_m, \Sigma)$ are special forms of matrix-variate Gaussians. PDF: (1) $p(S; m, \Sigma) = z^{-1} |S|^{(m-n-1)/2} \exp(-\text{tr}(\Sigma^{-1} S)/2)$, where normalizing constant $z = 2^{mn/2} \Gamma_n(m/2) |\Sigma|^{m/2}$, and $\Gamma_n(a)$ is the multivariate gamma function; (2) norm version, $p(S; m, \Sigma) = z^{-1} |S|^{(m-n-1)/2} \exp(-\|\Sigma^{-1/2} S^{1/2}\|_F^2 / 2)$. Its characteristic function is: $\Phi_S(T) = |I_n - 2i \Sigma T|^{-m/2}$. The Wishart distribution converges to the symmetric standard Gaussian in a specific limit [@James1976]: $\lim_{u \to \infty} p(u I_n + \sqrt{u} S; n+1+2u, I_n/2) = z^{-1} \exp(-\text{tr}(S^2)/2)$.

Noncentral Wishart distribution $W_n(m, \Sigma; \Omega)$ on positive-definite manifold $\mathcal{S}_+(n)$, where non-centrality matrix $\Omega \in \mathcal{S}_{\ge 0}(n)$, is the distribution of the sum of m squared equal-variance Gaussian random vectors [@Muirhead1982]: let $y_i \sim N_n(m_i, \Sigma)$, and $Y = \{y_i\}_{i=1}^m$ is jointly independent, then $Y Y^T = \sum_{i=1}^m y_i y_i^T \sim W_n(m, \Sigma, \Sigma^{-1} M M^T)$, where $M = \{m_i\}_{i=1}^m$. Notice that $Y \sim N_{n,m}(M; \Sigma, I_m)$ and $Y^T \sim N_{m,n}(M^T; I_m, \Sigma)$ are special forms of matrix-variate Gaussians. PDF: $p(S; m, \Sigma, \Omega) = z^{-1} {}_0 F_{1}(m/2; \Omega \Sigma^{-1} S / 4)\ p(S; m, \Sigma)$, where normalizing constant $z = \exp(\text{tr}(\Omega)/2)$, and ${}_0 F_{1}(m/2; \Omega \Sigma^{-1} S / 4)$ is a hypergeometric function with a matrix argument, which equals $\int_{O(m)} \exp(\text{tr}((\Omega \Sigma^{-1} S)^{1/2} H)) [d H]$. Note that the noncentral Wishart generalizes the Wishart distribution: $W_n(m, \Sigma; 0) = W_n(m, \Sigma)$.

An unnamed family of distributions on the positive-definite manifold is given by [@Hayakawa1966], see [Thm 2.4.2] below. Its exact computation is intractable. It reduces to the simplest Wishart distribution: $p(\cdot; I_n) = W_k(n, I_k)$.

## Spheres

von Mises distribution on the 1-shpere (i.e. circle) is a closed-form approximation to the wrapped Gaussian distribution. von Mises–Fisher distribution (vMF) [@Fisher1953] generalizes the von Mises distribution to n-spheres, and is the most commonly used distribution in directional/circular/spherical statistics: $f(x; a, b) \propto \exp(b \cos(x-a))$, where $x, a \in [0, 2\pi), b \ge 0$.

Bingham [@Bingham1974]

Kent [@Kent1982]

Angular central Gaussian (ACG)

Bivariate von Mises on the 2-torus [@Mardia1975]

## Stiefel manifold

Uniform distribution on the Stiefel manifold w.r.t. the invariant measure is the normalized invariant measure, which is possible because the invariant measure is finite: $[d X] = (d X) / v(k, n)$, where $v(k, n) = 2^k \pi^{kn/2} \Gamma_k^{-1}(n/2)$. Note that $v(k, n) := \int_{V_{k,n}} (d X)$ should not be called the volume or mass of the Stiefel manifold because the invariant differential n-form is defined differently from the Riemannian volume form. PDF: $p(X) = 1$.

Matrix Langevin $L_{k,n}(F)$ or matrix von Mises-Fisher (matrix vMF), where $F \in M_{n,k}$, is an exponential-linear family on the Stiefel manifold $V_{k,n}$ [@Downs1972]. PDF: $p(X; F) = z^{-1} \exp(\text{tr}(F^T X))$, where normalizing constant $z = {}_{0} F_{1} (n/2; F^T F / 4)$. It is analgous to the distribution $\exp(c x)$, but on a compact domain and can have non-negative coefficients. The von Mises-Fisher (vMF) for n-spheres and matrix vMF for Stiefel manifolds are in the larger family of maximum entropy distributions with moment constraints [@Pennec2006]. A matrix-variate Gaussian restricts to matrix Langevin on the Stiefel: if $X \sim N_{n,k}(M; I_n, \Sigma)$, then $X|_{V_{k,n}} \sim L_{k,n}(M \Sigma^{-1})$. Matrix Langevin on the Stiefel contains the uniform distribution: $L_{k,n}(0) = U(V_{k,n})$.

Matrix Bingham $B_{k,n}(B)$, where $B \in \mathcal{S}(n)$, is an exponential-quadratic family on the Stiefel manifold $V_{k,n}$. PDF: $p(X; B) = z^{-1} \exp(\text{tr}(X^T B X))$, where normalizing constant $z = {}_{1} F_{1} (k/2; n/2; B/2)$. It is analgous to $\exp(c x^2)$, but on a compact domain and can have non-negative coefficients. It is right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see matrix Langenvin on the Grassmann). Matrix Bingham contains the uniform distribution: $B_{k,n}(I_n) = U(V_{k,n})$.

Consider the orthonormalization or projection $\pi: M^∗_{n,k} \mapsto V_{k,n}$ defined by $\pi(M) = U V^T$, where $M = U \Sigma V^T$ is a thin SVD. Note that although SVD is not unique, the mapping is uniquely defined. Theorem (Orthonormalization induced distribution) [Thm 2.4.1]: Let a random n-by-k matrix $M \sim \tilde{f}$, denote $\pi(M) \sim f$ and $M^T M \sim g$, then $f(X) = c(n,k) \int_{\mathcal{S}_+(k)} \tilde{f}(X S^{1/2}) |S|^{(n-k-1)/2} (d S)$, and $g(S) = c(n,k) |S|^{(n-k-1)/2} \int_{V_{k,n}} \tilde{f}(X S^{1/2}) [d X]$, with coefficient $c(n,k) = \pi^{kn/2} \Gamma_k^{-1}(n/2)$. Theorem (Orthonormalization of left orthogonal invariant Gaussian) [Thm 1.5.5, 2.4.3]: Let $M \sim N_{n,k}(0; I_n, \Sigma)$, then $\pi(M) \sim U(V_{k,n})$, $M^T M \sim W_k(n, \Sigma)$, and $\pi(M) \perp M^T M$. Theorem (Orthonormalization of right orthogonal invariant Gaussian) [Thm 2.4.2]: Let $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \sim \text{MACG}(\Sigma)$, and $M^T M \sim p(S; \Sigma) = z^{-1} |S|^{(n-k-1)/2} {}_{0} F_{0}^{(n)}(-\Sigma^{-1}/2, S)$, where normalizing constant $z = 2^{nk/2} \Gamma_{k}(n/2) |\Sigma|^{k/2}$, and ${}_{0} F_{0}^{(n)}(S, T) := \int_{O(n)} \exp(\text{tr}(S H T H^T)) [d H]$, with $S, T \in \mathcal{S}(n)$ [Eq A.6.6].

Matrix angular central Gaussian $\text{MACG}(\Sigma)$ on the Stiefel manifold $V_{k,n}$, where $\Sigma \in \mathcal{S}_+(n)$, is the distribution induced by orthonormalization of a right orthogonal invariant Gaussian [@Tyler1987]: if $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \sim \text{MACG}(\Sigma)$. Equivalently, if a random n-by-k matrix has a PDF of the form $p(M; \Sigma) \propto g(M^T \Sigma^{-1} M)$ and is right orthogonal invariant, then $\pi(M) \sim \text{MACG}(\Sigma)$. Note that for $N_{n,k}(0; \Sigma, I_k)$, we have $g(S) = \exp(-\text{tr}(S)/2)$. More generally, g() may build on matrix-variate functions (trace, determinant, spectral radius), real functions, and (optionally) matrix functions (matrix polynomial / exponential / logarithm). PDF: $p(X; \Sigma) = z^{-1} |X^T \Sigma^{-1} X|^{-n/2}$, where normalizing constant $z = |\Sigma|^{k/2}$. The parameter is determined up to scaling: $\text{MACG}(\Sigma) = \text{MACG}(c \Sigma)$, $\forall c > 0$. Evaluating the PDF with a diagonal $\Sigma$: matrix multiplication takes $n k^2$ FLOPs; determinant takes $\mathcal{O}(k^3)$ FLOPs. It is (also) right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see MACG on the Grassmann). MACG contains the uniform distribution: $\text{MACG}(I_n) = U(V_{k,n})$. If $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(B M) \sim \text{MACG}(B \Sigma B^T)$; if $B^T B = \Sigma^{-1}$, then $B \Sigma B^T = I_n$ and therefore $\pi(B M) \sim U(V_{k,n})$.

An unnamed general family $\mathcal{F}^{(V)}(B; d)$ on the Stiefel manifold $V_{k,n}$, where $B \in \mathcal{S}(n)$ and $d: \mathbb{Z}_{+\downarrow}^\infty \mapsto \mathbb{R}$, is defined by PDF of the form [Eq 2.3.9]: $p(X; B; d) = z^{-1} g(X^T B X)$, where normalizing constant $z = g_0(B)$. Here, $g(S) = \sum_{l=0}^\infty (l!)^{-1} \sum_{\lambda \vdash l} d_{\lambda} C_{\lambda}(S)$, and $g_0(B) = \sum_{l=0}^\infty (l!)^{-1} \sum_{\lambda \vdash l} d_{\lambda} (k/2)_{\lambda} (n/2)_{\lambda}^{-1} C_{\lambda}(B)$. For the definitions of normalized zonal polynomial $C_{\lambda}(S)$ and generalized hypergeometric coefficient $(a)_{\lambda}$, see hypergeometric function with a matrix argument. Every distribution in this family is right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see unnamed general family on the Grassmann). It can be generalized using invariant polynomials with mutiple matrix arguments [Eq 2.3.17]. Special caes have a simpler form [Eq 2.3.12-16], which include matrix Bingham and MACG.

Sampling: geodesic Monte Carlo [@Byrne2013] is applicable.

## Grassmann manifold

Uniform distribution on the Grassmann manifold w.r.t. the invariant measure is the normalized invariant measure, which is possible because the invariant measure is finite: $[d P] = (d P) / g(k, n)$, where $g(k, n) = \pi^{k(n-k)/2} \Gamma_k(k/2) \Gamma_k^{-1}(n/2)$. Note that $g(k, n) := \int_{\mathcal{P}(k,n)} (d P)$ should not be called the volume or mass of the Grassmann manifold, because the invariant differential n-form is defined differently from the Riemannian volume form, and the Grassmann manifold is not orientable when n is odd. PDF: $p(P) = 1$.

Any PDF on the Stiefel manifold that is right orthogonal invariant explicitly defines a distribution on the Grassmann manifold: let $X \in V_{k,n}$, $X \sim p(X)$, and $p(X) = p(X Q)$, $\forall Q \in O(k)$, then $X X^T \sim p(X)$. This applies to matrix Langevin and matrix angular central Gaussian.

Matrix Langevin $L^{(P)}_{k,n}(B)$, where $B \in \mathcal{S}(n)$, is an exponential-linear family on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$ [@Watson1991; @Chikuse and Watson, 1995]. PDF: $p(P; B) = z^{-1} \exp(\text{tr}(B P))$, where normalizing constant $z = {}_{1} F_{1} (k/2; n/2; B/2)$. It is analgous to the distribution $\exp(c x)$, but on a compact domain and can have non-negative coefficients. Its PDF is the same as matrix Bingham, with $P = X X^T$. Matrix Bingham induces matrix Langevin on the Grassmann by submersion: if $X \sim B_{k,n}(B)$, then $X X^T \sim L^{(P)}_{k,n}(B)$. A symmetric Gaussian restricts to matrix Langevin on the Grassmann: if $S \sim N_{n,n}(B, I_n)$, then $S|_{\mathcal{P}(k,n)} \sim L^{(P)}_{k,n}(B)$. Matrix Langevin on the Grassmann contains the uniform distribution: $L^{(P)}_{k,n}(0) = U(\mathcal{P}(k,n))$.

Matrix angular central Gaussian $\text{MACG}(\Sigma)$ on the Grassmann manifold is the distribution induced by the orthonormalized projection of a right orthogonal invariant Gaussian [Rmk 2.4.11]: if $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \pi(M)^T \sim \text{MACG}(\Sigma)$. PDF: (1) Stiefel representation: $p(X X^T; \Sigma) = z^{-1} |X^T \Sigma^{-1} X|^{-n/2}$, where normalizing constant $z = |\Sigma|^{k/2}$ (This is exactly the same as MACG for the Stiefel); (2) projection matrix: $p(P; \Sigma) = z^{-1} |I_n - (I_n - \Sigma^{-1}) P|^{-n/2}$. Note that the two forms are equivalent: by Sylvester's determinant theorem, $|I_n - (I_n - \Sigma^{-1}) X X^T| = |I_k - X^T (I_n - \Sigma^{-1}) X|$, the right-hand side equals $|I_k - X^T X + X^T \Sigma^{-1} X|$ where the first two terms cancels, and therefore $|I_n - (I_n - \Sigma^{-1}) X X^T| = |X^T \Sigma^{-1} X|$. MACG on the Stiefel induces MACG on the Grassmann by submersion: if $X \sim \text{MACG}(\Sigma)$, then $X X^T \sim \text{MACG}(\Sigma)$. All properties of MACG on Stiefel apply.

Orthogonal projective Gaussian $\text{OPG}(B)$ on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$, where $B \in \cup_{i=0}^k \mathcal{S}_{\ge 0}(i, n)$, is the distribution induced by the orthonormalized projection of a (right-transformed) matrix-variate Gaussian: if $M \sim N_{n,k}(M; I_n, \Sigma)$, then $\pi(M) \pi(M)^T \sim \text{OPG}(M \Sigma^{-1} M^T)$. Note that $\text{MACG}(\Sigma)$ can be induced in the same way with $M \sim N_{n,k}(0; \Sigma, I_k)$. PDF: $p(P; B) = z^{-1} {}_{1} F_{1}(n/2; k/2; BP/2)$, where normalizing constant $z = \exp(\text{tr}(B)/2)$. Compared with matrix Langevin $L^{(P)}_{k,n}(B)$, the PDF uses ${}_{1} F_{1}$ instead of ${}_{0} F_{0}$, and therefore its exact computation is intractable.

An unnamed general family $\mathcal{F}^{(P)}(B; d)$ on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$, is defined by PDF of the form [Eq 2.3.20]: $p(P; B; d) = z^{-1} g(B P)$, where normalizing constant $z = g_0(B)$. See the unnamed general family $\mathcal{F}^{(V)}(B; d)$ on Stiefel for defintions of the notations. All properties of the unnamed general family on Stiefel apply.

## Misc

Riemannian symmetric spaces: Gaussian-like distribution $f(p) \propto \exp[- d^2(p, \mu) / (2 \sigma^2)]$ [@Said2018],

• positive definite matrices [@Said2017] with a certain structure, e.g. complex, Toeplitz, or block-Toeplitz;
• Grassmannian

Heat kernels:

• 1-sphere: wrapped Gaussian.

Wrapped Gaussian distribution on the 1-sphere is its heat kernel, which "wraps" the Gaussian distribution around the unit circle: $f_{WN} (\theta; \mu, \sigma) = \sum_{k \in \mathbb{Z}} \exp[-(\theta-\mu+2\pi k)^2/(2\sigma^2)] / \sqrt{2 \pi \sigma^2}$. Heat kernels of complete Riemannian manifolds do not have closed forms in general, see Geometric Diffusion.