Probabilistic Models on Matrix Manifolds

This article looks at some parametric families of probability distributions on matrix manifolds. The default reference is [@Chikuse2003]; other main references include [@Dryden2016; @Bhattacharya2012; @Patrangenaru2015; @Srivastava2016]. In this article, |A| denotes the determinant of a square matrix A; $N_n$ denotes the distribution of an n-dimensional Gaussian random vector; (dX) denotes the Lebesgue measures for rectangular, symmetric, or skew-symmetric matrices, or the unnormalized invariant measure on the Stiefel or Grassmann manifold; [dX] denotes the normalized invariant measure on the Stiefel or Grassmann manifold. Following [@Chikuse2003], all PDFs on the Stiefel or Grassmann manifold are defined w.r.t. the normalized measures.

Concepts

Left and right orthogonal invariance... $M \in M_{m,n}$, $p(M) = p(M Q)$, $\forall Q \in O(n)$. See orthogonal-invariant function.

Matrix manifolds

General matrix manifold

Matrix-variate standard Gaussian distribution $N_{n,k}(0; I_n, I_k)$ on the n-by-k matrix manifold $M_{n,k}$ [Sec 1.5] is the nk-dimensional standard Gaussian vector stacked into a matrix: let $\text{vec}(Z) \sim N_{nk}(0, I_{nk})$, then $Z \sim N_{n,k}(0; I_n, I_k)$. PDF: (1) $\phi(Z) = z^{-1} \exp(-\text{tr}(Z^T Z) / 2)$, where normalizing constant $z = (2 \pi)^{kn/2}$; (2) because the Frobenius norm is related to the trace via $\|Z\|_F^2 = \text{tr}(Z^T Z)$, we have $\phi(Z) = z^{-1} \exp(-\|Z\|_F^2 / 2)$. Its characteristic function is also log-quadratic: $\Phi_Z(T) = \exp(-\text{tr}(T^T T)/2)$.

Matrix-variate Gaussian distribution $N_{n,k}(M; \Sigma_n, \Sigma_k)$ on the n-by-k matrix manifold, where $M \in M_{n,k}$, $\Sigma_n \in \mathcal{S}_+(n)$, and $\Sigma_k \in \mathcal{S}_+(k)$, is the distribution of a transformed matrix-variate standard Gaussian: $Y = \Sigma_n^{1/2} Z \Sigma_k^{1/2} + M$, where $Z \sim N_{n,k}(0; I_n, I_k)$. This can be written in a vectorized form as: $\text{vec}(Y) = (\Sigma_k^{1/2} \otimes \Sigma_n^{1/2}) \text{vec}(Z) + \text{vec}(M)$, where $\otimes$ is the Kronecker product. Note that this is a special form of the nk-dimensional Gaussian vector $N_{nk}(\mu, \Sigma)$, where covariance matrix $\Sigma = \Sigma_k \otimes \Sigma_n \in \mathcal{S}_+(nk)$. PDF: (1) relation with the matrix-variate standard Gaussian PDF: $p(Y; M, \Sigma_n, \Sigma_k) = |\Sigma_n|^{-k/2} |\Sigma_k|^{-n/2} \phi(\Sigma_n^{-1/2} (Y - M) \Sigma_k^{-1/2})$; (2) explicit form: $p(Y; M, \Sigma_n, \Sigma_k) = z^{-1} \exp\{-\text{tr}[\Sigma_n^{-1} (Y - M) \Sigma_k^{-1} (Y - M)^T] / 2\}$, where normalizing constant $z = (2 \pi)^{kn/2} |\Sigma_n|^{k/2} |\Sigma_k|^{n/2}$.

Symmetric manifold

Symmetric standard Gaussian distribution $N_{n,n}(0; I_n)$ on symmetric matrix manifold $\mathcal{S}(n)$ is the matrix-variate standard Gaussian restricted to the symmetric matrix manifold: let $M \sim N_{n,n}(0; I_n, I_n)$, then $M|_{\mathcal{S}(n)} \sim N_{n,n}(0; I_n)$. PDF: (1) $\phi(S) = z^{-1} \exp(-\text{tr}(S^2)/2)$, where normalizing constant $z = 2^{n/2} \pi^{n(n+1)/4}$; (2) norm version, $\phi(S) = z^{-1} \exp(-\|S\|_F^2 / 2)$. Its characteristic function is also log-quadratic: $\Phi_S(T) = \exp(-\text{tr}(T^2)/2)$.

Symmetric Gaussian distribution $N_{n,n}(M; \Sigma)$ on symmetric matrix manifold $\mathcal{S}(n)$, where $M \in \mathcal{S}(n)$ and $\Sigma \in \mathcal{S}_+(n)$, is the distribution of a transformed symmetric standard Gaussian: $Y = \Sigma^{1/2} Z \Sigma^{1/2} + M$, where $Z \sim N_{n,n}(0; I_n)$. PDF: (1) relation with the symmetric standard Gaussian PDF: $p(Y; M, \Sigma) = |\Sigma|^{-(n+1)/2} \phi(\Sigma^{-1/2} (Y - M) \Sigma^{-1/2})$; (2) explicit form: $p(Y; M, \Sigma) = z^{-1} \exp\{-\text{tr}[\Sigma^{-1} (Y - M) \Sigma^{-1} (Y - M)] / 2\}$, where normalizing constant $z = 2^{n/2} \pi^{n(n+1)/4} |\Sigma|^{(n+1)/2}$.

Positive-definite manifold

Wishart distribution $W_n(m, \Sigma)$ on positive-definite manifold $\mathcal{S}_+(n)$, where degree of freedom $m \in \{n, n+1, \cdots\}$ and covariance matrix $\Sigma \in \mathcal{S}_+(n)$, is the sampling distribution of the (unscaled) covariance matrix of a zero-mean Gaussian random vector [@Wishart1928]: let $y \sim N_n(0, \Sigma)$, if $Y = \{y_i\}_{i=1}^m$ is a random sample, then $Y Y^T = \sum_{i=1}^m y_i y_i^T \sim W_n(m, \Sigma)$. Notice that $Y \sim N_{n,m}(0; \Sigma, I_m)$ and $Y^T \sim N_{m,n}(0; I_m, \Sigma)$ are special forms of matrix-variate Gaussians. PDF: (1) $p(S; m, \Sigma) = z^{-1} |S|^{(m-n-1)/2} \exp(-\text{tr}(\Sigma^{-1} S)/2)$, where normalizing constant $z = 2^{mn/2} \Gamma_n(m/2) |\Sigma|^{m/2}$, and $\Gamma_n(a)$ is the multivariate gamma function; (2) norm version, $p(S; m, \Sigma) = z^{-1} |S|^{(m-n-1)/2} \exp(-\|\Sigma^{-1/2} S^{1/2}\|_F^2 / 2)$. Its characteristic function is: $\Phi_S(T) = |I_n - 2i \Sigma T|^{-m/2}$. The Wishart distribution converges to the symmetric standard Gaussian in a specific limit [@James1976]: $\lim_{u \to \infty} p(u I_n + \sqrt{u} S; n+1+2u, I_n/2) = z^{-1} \exp(-\text{tr}(S^2)/2)$.

Noncentral Wishart distribution $W_n(m, \Sigma; \Omega)$ on positive-definite manifold $\mathcal{S}_+(n)$, where non-centrality matrix $\Omega \in \mathcal{S}_{\ge 0}(n)$, is the distribution of the sum of m squared equal-variance Gaussian random vectors [@Muirhead1982]: let $y_i \sim N_n(m_i, \Sigma)$, and $Y = \{y_i\}_{i=1}^m$ is jointly independent, then $Y Y^T = \sum_{i=1}^m y_i y_i^T \sim W_n(m, \Sigma, \Sigma^{-1} M M^T)$, where $M = \{m_i\}_{i=1}^m$. Notice that $Y \sim N_{n,m}(M; \Sigma, I_m)$ and $Y^T \sim N_{m,n}(M^T; I_m, \Sigma)$ are special forms of matrix-variate Gaussians. PDF: $p(S; m, \Sigma, \Omega) = z^{-1} {}_0 F_{1}(m/2; \Omega \Sigma^{-1} S / 4)\ p(S; m, \Sigma)$, where normalizing constant $z = \exp(\text{tr}(\Omega)/2)$, and ${}_0 F_{1}(m/2; \Omega \Sigma^{-1} S / 4)$ is a hypergeometric function with a matrix argument, which equals $\int_{O(m)} \exp(\text{tr}((\Omega \Sigma^{-1} S)^{1/2} H)) [d H]$. Note that the noncentral Wishart generalizes the Wishart distribution: $W_n(m, \Sigma; 0) = W_n(m, \Sigma)$.

An unnamed family of distributions on the positive-definite manifold is given by [@Hayakawa1966], see [Thm 2.4.2] below. Its exact computation is intractable. It reduces to the simplest Wishart distribution: $p(\cdot; I_n) = W_k(n, I_k)$.

Spheres

von Mises distribution on the 1-shpere (i.e. circle) [@von-Mises1918] is a closed-form approximation to the wrapped Gaussian distribution: $p(x; m, \kappa) = z^{-1} \exp(\kappa \cos(x - m))$, where $x \in [0, 2\pi)$, with mode (and intrinsic mean) $m \in [0, 2\pi)$, concentration parameter $\kappa \ge 0$, and $z = 2 \pi I_0(\kappa)$. Here $I_v$ is the modified Bessel function of the first kind of order v. It is uniform when $\kappa = 0$, and deterministic when $\kappa \to \infty$. von Mises–Fisher (vMF) distribution [@Fisher1953], aka Langevin distribution $L_n(c)$ (attributing to [@Langevin1905]), is the exponential-linear family on the n-spheres: $p(x; c) = z^{-1} \exp(c^T x)$, where $x \in \mathbb{S}^{n-1}$, $c \in \mathbb{R}^n$, and $z = (2 \pi) (2 \pi / \|c\|)^{n/2 - 1} I_{n/2-1}(\|c\|)$. The vMF distribution generalizes the von Mises distribution to n-spheres, and is the most commonly used distribution in directional/circular/spherical statistics. It has a mode (and intrinsic mean) $\bar{c} = c / \|c\|$, and it is rotationally symmetric around the mode.

Bingham distribution $B_n(B)$ [@Bingham1974] is an exponential-quadratic family on the n-sphere: $p(x; B) = z^{-1} \exp(x^T B x)$, where $B \in S(n)$ and $z = {}_{1}F_{1}(1/2; n/2; \Lambda(B))$. The matrix parameter is determined up to scalar shifts: $B_n(B) = B_n(B + c I_n)$, $\forall c \in \mathbb{R}$. The Bingham distribution is antipodally symmetric: $p(x) = p(-x)$. Let $B = V \Lambda V^T$ where $\lambda \in \mathbb{R}^n_{\downarrow}$, then the Bingham distribution has modes (and intrinsic means) $\pm v_1$.

Angular central Gaussian $\text{ACG}(\Sigma)$ (see e.g. [@Watson1983; @Tyler1987]) is a polynomial-quadratic family on the n-sphere, defined as: $p(x; \Sigma) = z^{-1} (x^T \Sigma^{-1} x)^{-n/2}$, where $\Sigma \in S_+(n)$ and $z = \sqrt{\det(\Sigma)}$. Similar to the Bingham distribution, the ACG distribution: (1) is antipodally symmetric; (2) let $\Sigma = V \Lambda V^T$ where $\lambda \in \mathbb{R}^n_{+\downarrow}$, then it has modes (and intrinsic means) $\pm v_1$.

Miscellaneous: Langevin-Bingham [@Mardia1975] combines the Langevin and the Bingham distributions, and is the general exponential-quadratic family on the n-sphere, with PDF proportional to $\exp(x^T B x + c^T x)$. Bivariate von Mises is the exponential-bilinear family on the 2-torus [@Mardia1975], see also [@Mardia1999, Sec 3.7.1]: $p(x, y; A, b, c) \propto \exp(x^T A y + b^T x + c^T y)$, where $x, y \in \mathbb{S}^1$, $A \in M_2$, and $b, c \in \mathbb{R}^2$. Despite the name, its marginal distributions are not von Mises in general: this happens if and only if x and y are either independent (A = 0) or uniform ($b_2 = c_2 = 0$).

Stiefel manifold

Uniform distribution on the Stiefel manifold w.r.t. the invariant measure is the normalized invariant measure, which is possible because the invariant measure is finite: $[d X] = (d X) / v(k, n)$, where $v(k, n) = 2^k \pi^{kn/2} \Gamma_k^{-1}(n/2)$. Note that $v(k, n) := \int_{V_{k,n}} (d X)$ should not be called the volume or mass of the Stiefel manifold because the invariant differential n-form is defined differently from the Riemannian volume form. PDF: $p(X) = 1$.

Matrix Langevin $L_{k,n}(F)$ or matrix von Mises-Fisher (matrix vMF), where $F \in M_{n,k}$, is an exponential-linear family on the Stiefel manifold $V_{k,n}$ [@Downs1972]. PDF: $p(X; F) = z^{-1} \exp(\text{tr}(F^T X))$, where normalizing constant $z = {}_{0} F_{1} (n/2; F^T F / 4)$. It is analgous to the distribution $\exp(c x)$, but on a compact domain and can have non-negative coefficients. The von Mises-Fisher (vMF) for n-spheres and matrix vMF for Stiefel manifolds are in the larger family of maximum entropy distributions with moment constraints [@Pennec2006]. A matrix-variate Gaussian restricts to matrix Langevin on the Stiefel: if $X \sim N_{n,k}(M; I_n, \Sigma)$, then $X|_{V_{k,n}} \sim L_{k,n}(M \Sigma^{-1})$. Matrix Langevin on the Stiefel contains the uniform distribution: $L_{k,n}(0) = U(V_{k,n})$.

Matrix Bingham $B_{k,n}(B)$, where $B \in \mathcal{S}(n)$, is an exponential-quadratic family on the Stiefel manifold $V_{k,n}$. PDF: $p(X; B) = z^{-1} \exp(\text{tr}(X^T B X))$, where normalizing constant $z = {}_{1} F_{1} (k/2; n/2; B/2)$. It is analgous to $\exp(c x^2)$, but on a compact domain and can have non-negative coefficients. It is right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see matrix Langenvin on the Grassmann). Matrix Bingham contains the uniform distribution: $B_{k,n}(I_n) = U(V_{k,n})$.

Consider the orthonormalization or projection $\pi: M^∗_{n,k} \mapsto V_{k,n}$ defined by $\pi(M) = U V^T$, where $M = U \Sigma V^T$ is a thin SVD. Note that although SVD is not unique, the mapping is uniquely defined. Theorem (Orthonormalization induced distribution) [Thm 2.4.1]: Let a random n-by-k matrix $M \sim \tilde{f}$, denote $\pi(M) \sim f$ and $M^T M \sim g$, then $f(X) = c(n,k) \int_{\mathcal{S}_+(k)} \tilde{f}(X S^{1/2}) |S|^{(n-k-1)/2} (d S)$, and $g(S) = c(n,k) |S|^{(n-k-1)/2} \int_{V_{k,n}} \tilde{f}(X S^{1/2}) [d X]$, with coefficient $c(n,k) = \pi^{kn/2} \Gamma_k^{-1}(n/2)$. Theorem (Orthonormalization of left orthogonal invariant Gaussian) [Thm 1.5.5, 2.4.3]: Let $M \sim N_{n,k}(0; I_n, \Sigma)$, then $\pi(M) \sim U(V_{k,n})$, $M^T M \sim W_k(n, \Sigma)$, and $\pi(M) \perp M^T M$. Theorem (Orthonormalization of right orthogonal invariant Gaussian) [Thm 2.4.2]: Let $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \sim \text{MACG}(\Sigma)$, and $M^T M \sim p(S; \Sigma) = z^{-1} |S|^{(n-k-1)/2} {}_{0} F_{0}^{(n)}(-\Sigma^{-1}/2, S)$, where normalizing constant $z = 2^{nk/2} \Gamma_{k}(n/2) |\Sigma|^{k/2}$, and ${}_{0} F_{0}^{(n)}(S, T) := \int_{O(n)} \exp(\text{tr}(S H T H^T)) [d H]$, with $S, T \in \mathcal{S}(n)$ [Eq A.6.6].

Matrix angular central Gaussian $\text{MACG}(\Sigma)$ on the Stiefel manifold $V_{k,n}$, where $\Sigma \in \mathcal{S}_+(n)$, is the distribution induced by orthonormalization of a right orthogonal invariant Gaussian [@Tyler1987]: if $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \sim \text{MACG}(\Sigma)$. Equivalently, if a random n-by-k matrix has a PDF of the form $p(M; \Sigma) \propto g(M^T \Sigma^{-1} M)$ and is right orthogonal invariant (i.e., $g(A) = g(\Lambda(A))$ for all $A \ge 0$), then $\pi(M) \sim \text{MACG}(\Sigma)$. Note that for $N_{n,k}(0; \Sigma, I_k)$, we have $g(S) = \exp(-\text{tr}(S)/2)$. More generally, g() may build on matrix-variate functions (trace, determinant, spectral radius), real functions, and (optionally) matrix functions (matrix polynomial / exponential / logarithm). PDF: $p(X; \Sigma) = z^{-1} |X^T \Sigma^{-1} X|^{-n/2}$, where normalizing constant $z = |\Sigma|^{k/2}$. The parameter is determined up to scaling: $\text{MACG}(\Sigma) = \text{MACG}(c \Sigma)$, $\forall c > 0$. Evaluating the PDF with a diagonal $\Sigma$: matrix multiplication takes $n k^2$ FLOPs; determinant takes $\mathcal{O}(k^3)$ FLOPs. It is (also) right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see MACG on the Grassmann). MACG contains the uniform distribution: $\text{MACG}(I_n) = U(V_{k,n})$. If $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(B M) \sim \text{MACG}(B \Sigma B^T)$; if $B^T B = \Sigma^{-1}$, then $B \Sigma B^T = I_n$ and therefore $\pi(B M) \sim U(V_{k,n})$.

An unnamed general family $\mathcal{F}^{(V)}(B; d)$ on the Stiefel manifold $V_{k,n}$, where $B \in \mathcal{S}(n)$ and $d: \mathbb{Z}_{+\downarrow}^\infty \mapsto \mathbb{R}$, is defined by PDF of the form [Eq 2.3.9]: $p(X; B; d) = z^{-1} g(X^T B X)$, where normalizing constant $z = g_0(B)$. Here, $g(S) = \sum_{l=0}^\infty (l!)^{-1} \sum_{\lambda \vdash l} d_{\lambda} C_{\lambda}(S)$, and $g_0(B) = \sum_{l=0}^\infty (l!)^{-1} \sum_{\lambda \vdash l} d_{\lambda} (k/2)_{\lambda} (n/2)_{\lambda}^{-1} C_{\lambda}(B)$. For the definitions of normalized zonal polynomial $C_{\lambda}(S)$ and generalized hypergeometric coefficient $(a)_{\lambda}$, see hypergeometric function with a matrix argument. Every distribution in this family is right orthogonal invariant, and therefore explicitly defines a distribution on the Grassmann manifold (see unnamed general family on the Grassmann). It can be generalized using invariant polynomials with mutiple matrix arguments [Eq 2.3.17]. Special caes have a simpler form [Eq 2.3.12-16], which include matrix Bingham and MACG.

Sampling: geodesic Monte Carlo [@Byrne2013] is applicable.

Grassmann manifold

Uniform distribution on the Grassmann manifold w.r.t. the invariant measure is the normalized invariant measure, which is possible because the invariant measure is finite: $[d P] = (d P) / g(k, n)$, where $g(k, n) = \pi^{k(n-k)/2} \Gamma_k(k/2) \Gamma_k^{-1}(n/2)$. Note that $g(k, n) := \int_{\mathcal{P}(k,n)} (d P)$ should not be called the volume or mass of the Grassmann manifold, because the invariant differential n-form is defined differently from the Riemannian volume form, and the Grassmann manifold is not orientable when n is odd. PDF: $p(P) = 1$. Distributions of the principal angles between two random subspaces: [@Absil2006; derivation based on @Muirhead1982] let $X, Y \sim U(G_{k,n})$, $n \ge 2 k$, and $\theta = \theta(X, Y) \in \mathbb{R}^k_{\uparrow}$, (1) joint PDF (eq 3), $p(\theta) = z^{-1} \prod_{i=1}^k \sin^{n-2k}\theta_i \prod_{i < j} (\cos^2 \theta_i - \cos^2 \theta_j)$, where $z = (2 \pi^{k^2/2})^{-1}~\Gamma_k^2(k/2)~\Gamma_k((n-k)/2)~\Gamma_k^{-1}(n/2)$ and $\Gamma_k$ is the multivariate Gamma function. (2) marginal PDF (Thm 1), $p(\theta_k) = z^{-1} (\sin \theta_k)^{k(n-k)-1} ~{}_{2} F_{1}((n-k-1)/2, 1/2; (n+1)/2; \sin^2\theta_k I_{k-1})$, where $z = [k (n-k)]^{-1} ~\Gamma(1/2) ~\Gamma((n+1)/2)~\Gamma^{-1}((k+1)/2) ~\Gamma^{-1}((n-k+1)/2)$. Note that the joint PDF is zero when $\theta_i = \theta_j$ and at $\theta_1 = 0$ when n > 2 k. With a fixed k, the unnormalized $p(\theta_k)$ changes by a factor $(\sin \theta_k)^{k n}$, which means the largest principal angle quickly concentrates to $\pi/2$. It remains an open problem to find a close-form expression for the distribution of the Riemannian distance, $p(\|\theta\|)$.

Any PDF on the Stiefel manifold that is right orthogonal invariant explicitly defines a distribution on the Grassmann manifold: let $X \in V_{k,n}$, $X \sim p(X)$, and $p(X) = p(X Q)$, $\forall Q \in O(k)$, then $X X^T \sim p(X)$. This applies to matrix Langevin and matrix angular central Gaussian.

Matrix Langevin $L^{(P)}_{k,n}(B)$, where $B \in \mathcal{S}(n)$, is an exponential-linear family on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$ [@Watson1991; @Chikuse and Watson, 1995]. PDF: $p(P; B) = z^{-1} \exp(\text{tr}(B P))$, where normalizing constant $z = {}_{1} F_{1} (k/2; n/2; B/2)$. It is analgous to the distribution $\exp(c x)$, but on a compact domain and can have non-negative coefficients. Its PDF is the same as matrix Bingham, with $P = X X^T$. Matrix Bingham induces matrix Langevin on the Grassmann by submersion: if $X \sim B_{k,n}(B)$, then $X X^T \sim L^{(P)}_{k,n}(B)$. A symmetric Gaussian restricts to matrix Langevin on the Grassmann: if $S \sim N_{n,n}(B, I_n)$, then $S|_{\mathcal{P}(k,n)} \sim L^{(P)}_{k,n}(B)$. Matrix Langevin on the Grassmann contains the uniform distribution: $L^{(P)}_{k,n}(0) = U(\mathcal{P}(k,n))$.

Matrix angular central Gaussian $\text{MACG}(\Sigma)$ on the Grassmann manifold is the distribution induced by the orthonormalized projection of a right orthogonal invariant Gaussian [Rmk 2.4.11]: if $M \sim N_{n,k}(0; \Sigma, I_k)$, then $\pi(M) \pi(M)^T \sim \text{MACG}(\Sigma)$. PDF: (1) Stiefel representation: $p(X X^T; \Sigma) = z^{-1} |X^T \Sigma^{-1} X|^{-n/2}$, where normalizing constant $z = |\Sigma|^{k/2}$ (This is exactly the same as MACG for the Stiefel); (2) projection matrix: $p(P; \Sigma) = z^{-1} |I_n - (I_n - \Sigma^{-1}) P|^{-n/2}$. Note that the two forms are equivalent: by Sylvester's determinant theorem, $|I_n - (I_n - \Sigma^{-1}) X X^T| = |I_k - X^T (I_n - \Sigma^{-1}) X|$, the right-hand side equals $|I_k - X^T X + X^T \Sigma^{-1} X|$ where the first two terms cancels, and therefore $|I_n - (I_n - \Sigma^{-1}) X X^T| = |X^T \Sigma^{-1} X|$. MACG on the Stiefel induces MACG on the Grassmann by submersion: if $X \sim \text{MACG}(\Sigma)$, then $X X^T \sim \text{MACG}(\Sigma)$. All properties of MACG on Stiefel apply. Its Fréchet mean is $\text{span}(V_k)$, the top-k eigenspace of $\Sigma$: Let $\Sigma = V \Lambda V^T$ be an EVD, denote $V_k = [v_1, \cdots, v_k]$. Its uncertainty is compactly described by the eigenvalues of $\Sigma$: the larger an eigenvalue $\lambda_i$ is, the more important is the associated eigenspace $\text{span}(v_i)$.

Orthogonal projective Gaussian $\text{OPG}(B)$ on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$, where $B \in \cup_{i=0}^k \mathcal{S}_{\ge 0}(i, n)$, is the distribution induced by the orthonormalized projection of a (right-transformed) matrix-variate Gaussian: if $M \sim N_{n,k}(M; I_n, \Sigma)$, then $\pi(M) \pi(M)^T \sim \text{OPG}(M \Sigma^{-1} M^T)$. Note that $\text{MACG}(\Sigma)$ can be induced in the same way but with $M \sim N_{n,k}(0; \Sigma, I_k)$. PDF: $p(P; B) = z^{-1} {}_{1} F_{1}(n/2; k/2; BP/2)$, where normalizing constant $z = \exp(\text{tr}(B)/2)$. Compared with matrix Langevin $L^{(P)}_{k,n}(B)$, the PDF uses ${}_{1} F_{1}$ instead of ${}_{0} F_{0}$, and therefore its exact computation is intractable. Joint distributions can be defined on products of Grassmann manifolds, $\prod_{i=1}^m G_{k_i,n}$, where the marginal and conditional distributions are OPG (see @Chikuse2003, p.47). Its Fréchet mean is $\text{span}(B)$. Its uncertainty is also compactly described by the (nonzero) eigenvalues $(\lambda_1, \cdots, \lambda_k)$ of B. OPG on the Grassmann contains the uniform distribution: $\text{OPG}(0) = U(\mathcal{P}(k,n))$.

An unnamed general family $\mathcal{F}^{(P)}(B; d)$ on the rank-k symmetric projection matrix manifold $\mathcal{P}(k,n)$, is defined by PDF of the form [Eq 2.3.20]: $p(P; B; d) = z^{-1} g(B P)$, where normalizing constant $z = g_0(B)$. See the unnamed general family $\mathcal{F}^{(V)}(B; d)$ on Stiefel for defintions of the notations. All properties of the unnamed general family on Stiefel apply.

Misc

Riemannian symmetric spaces: Gaussian-like distribution $p(x) \propto \exp[- d^2(x, \mu) / (2 \sigma^2)]$ [@Said2018],

positive definite matrices [@Said2017] with a certain structure, e.g. complex, Toeplitz, or block-Toeplitz;
Grassmannian

Heat kernels:

1-sphere: wrapped Gaussian.

Wrapped Gaussian distribution on the 1-sphere is its heat kernel, which "wraps" the Gaussian distribution around the unit circle: $f_{WN} (\theta; \mu, \sigma) = \sum_{k \in \mathbb{Z}} \exp[-(\theta-\mu+2\pi k)^2/(2\sigma^2)] / \sqrt{2 \pi \sigma^2}$. Heat kernels of complete Riemannian manifolds do not have closed forms in general, see Geometric Diffusion.

[@Yuchi2022, Prop 3.1] If a singular matrix-variate Gaussian distribution $N_{m,n}(0; \sigma^2 P_U, \sigma^2 P_V)$ has its row and column subspace parameters uniformly distributed, $P_U \sim U(G_{k,m})$ and $P_V \sim U(G_{k,n})$, then its left and right singular bases are uniformly distributed, $U \sim U(V_{k,m})$ and $V \sim U(V_{k,n})$, and its unordered singular values follow the repulsed normal distribution $RN(0, \sigma^2)$.

🏷 Category=Statistics