Certain sets of matrices may be endowed a manifold structure, which can help us understand or solve problems. This article looks at some common matrix manifolds.
Riemannian geometric structures are endowed on matrix manifolds usually from an ambient Riemannian manifold or by a Riemannian submersion; in case tangent vectors can be unambiguously identified, e.g. in the case of the positive-definite manifold, Riemannian metrics can be directly constructed to satisfy certain properties.
The generic, m-by-n matrix manifold $M_{m, n}(\mathbb{R})$ is the (m n)-dimensional real vector space of m-by-n real matrices, endowed with the inner product $\langle A, B \rangle = \text{tr}(A B^T)$. It is diffeomorphic to, and often identified with, the Euclidean (m n)-space: $M_{m, n}(\mathbb{R}) \cong \mathbb{R}^{mn}$. Probability models: matrix-variate Gaussian distribution $N_{n,k}(M; \Sigma_n, \Sigma_k)$.
Full-rank manifold $M^∗_{m, n}(\mathbb{R})$ or $\mathbb{R}_∗^{mn}$ is the set of m-by-n full-rank real matrices, as an open subset of the m-by-n matrix manifold: $M^∗_{m, n}(\mathbb{R}) = \{X \in M_{m, n}(\mathbb{R}) : \text{rank}(X) = \min(m, n)\}$; or equivalently, $M^∗_{m, n}(\mathbb{R}) = M_{m, n}(\mathbb{R}) \setminus \bigcup_{k=0}^{\min(m,n)} \mathcal{M}(k, m \times n)$. It is a disconnected Riemannian (m n)-manifold.
Rank-k manifold $\mathcal{M}(k, m \times n)$ is the set of m-by-n real matrices of rank $k \in \{0, \dots, \min(m, n)\}$, as an embedded Riemannian $k (m + n - k)$-submanifold of the m-by-n matrix manifold: $\mathcal{M}(k, m \times n) = \{X \in M_{m, n}(\mathbb{R}) : \text{rank}(X) = k\}$; its Riemannian metric is induced from the Euclidean metric of the m-by-n matrix manifold. Alternatively, it can be defined as a quotient manifold, with the order-k general linear group acting on the full-rank manifold of (m+n)-by-k matrices: $\mathcal{M}(k, m \times n) = M^∗_{m+n,k} / \text{GL}_k$, with equivalence class $[(M, N)] = \{(M Q, N Q^{-T}) : Q \in \text{GL}_k\}$, where $M \in M^∗_{m,k}$ and $N \in M^∗_{n,k}$. From this definition, it is easy to see that it has dimension $k (m + n - k)$. Riemannian metrics can be induced on it via the Riemannian submersion theorem: let $\pi: M^∗_{m,k} \times M^∗_{n,k} \mapsto \mathcal{M}(k, m \times n)$, $\pi(M, N) = M N^T$, which is a surjective smooth submersion, consider the fiber bundle $(M^∗_{m+n,k}, \pi)$ with the (aforementioned) action of group $\text{GL}_k$ and an arbitrary Riemannian metric, there is a unique Riemannian metric on $\mathcal{M}(k, m \times n)$ such that π is a Riemannian submersion (see e.g. [@Absil2014]).
General linear group $\text{GL}_n(\mathbb{R})$ is the Lie group of nonsingular order-n real matrices. Alternatively, GL_n can be defined as the product manifold of the orthogonal group and the positive-definite manifold: $\text{GL}_n = O(n) \times \mathcal{S}_+(n)$, because every nonsingular matrix has a unique polar decompostion, A = U P. Riemannian metric on GL_n...
Symmetric matrix manifold $\mathcal{S}(n)$ is the set of order-n real symmetric matrices, as the n(n+1)/2-dimensional subspace of the order-n matrix manifold: $\mathcal{S}(n) = \{X \in M_n(\mathbb{R}) : X = X^T\}$. Probability models: symmetric Gaussian distribution $N_{n,n}(M; \Sigma)$. Positive-semidefinite manifold $\mathcal{S}_{\ge 0}(n)$ is the order-n positive semi-definite cone, as a regular domain of the symmetric matrix manifold: $\mathcal{S}_{\ge 0}(n) = \{X \in \mathcal{S}(n) : X \ge 0\}$. The $L^2$ Wasserstein metric on Gaussian measures (see Measure-Space) reduces to a distance function on the positive-semidefinite manifold: $W_2^2(\Sigma, \tilde \Sigma) = \text{tr}(\Sigma) + \text{tr}(\tilde \Sigma) - 2 \text{tr}[(\Sigma^{1/2} \tilde \Sigma \Sigma^{1/2})^{1/2}]$. It equals the Bures distance in quantum information between density matrices (PSD matrix of trace one) defining quantum states of a physical system (see e.g. [@Bhatia2019]).
Skew symmetric matrix manifold $\Omega(n)$ is the set of order-n real skew symmetric matrices, as the embedded Riemannian n(n-1)/2-submanifold of the order-n matrix manifold: $\Omega(n) = \{X \in M_n(\mathbb{R}) : X = - X^T\}$.
Orthogonal group $O(n)$ is the Lie group of order-n real orthogonal matrices: $O(n) = \{Q \in M_n(\mathbb{R}) : Q Q^T = I_n \}$. As an embedded n(n-1)/2-submanifold of the order-n matrix manifold: its tangent space is $T_Q O(n) = \{Q \Omega : \Omega \in \Omega(n)\}$; the induced Riemannian metric is $\bar{g}_Q(Z_1, Z_2) = \text{tr}(Z_1^T Z_2) = \text{tr}(\Omega_1^T \Omega_2)$.
Positive-definite manifold $\mathcal{S}_+(n)$, aka positive symmetric cone, is the smooth manifold of order-n positive-definite matrices, as an open convex subset of the symmetric matrix manifold: $\mathcal{S}_+(n) = \{X \in \mathcal{S}(n) : X > 0\}$. The tangent space is the symmetric manifold $\mathcal{S}(n)$. It becomes a homogeneous space $(\mathcal{S}_+(n), \text{GL}_n)$ with the general linear group acting via matrix congruence (合同) $A \cdot C := A C A^T$. Following the homogeneous space characterization theorem, it can be identified with the left coset space $(\text{GL}_n / O(n), \text{GL}_n)$, and $F: \text{GL}_n / O(n) \mapsto \mathcal{S}_+(n)$ defined by $F([A]) = A \cdot I = A A^T$ is an equivariant diffeomorphism. Here, orthogonal group O(n) is the isotropy group of the identity matrix w.r.t. matrix congruence, and equivalence class $[A] = \{A Q : Q \in O(n)\}$. One may further show that it is a reductive homogeneous space (see e.g. [@Bonnabel2010, A.1]). Several other Riemannian metrics have been proposed for the positive-definite manifold; see below. Probability models (see Probabilistic-Models-on-Matrix-Manifolds): Wishart distribution $W_n(m, \Sigma)$; noncentral Wishart distribution $W_n(m, \Sigma; \Omega)$.
The natural metric on the positive-definite manifold is the unique Riemannian metric that is invariant under matrix congruence [@Faraut1994; @Bhatia2007]: $g_C(T_1, T_2) = \text{tr}(T_1 C^{-1} T_2 C^{-1})$; this metric coincide with the Euclidean metric at the identity, $g_I(T_1, T_2) = \text{tr}(T_1 T_2)$. Up to a scaling factor, it is also known as the affine-invariant metric [@Pennec2006] or the Siegel metric in symplectic geometry; in statistics, it is the Fisher information metric for the multivariate Gaussian [@Smith2005]; in optimization, it is the metric defined by the natural self-concordant logarithmically homogeneous barrier [@Boyd2004]. Properties of the natural metric: (1) geodesically complete, the exponential map is $\exp_C(T) = C^{1/2} \exp(C^{-1/2} T C^{-1/2}) C^{1/2}$; in particular, the exponential map at I is the matrix exponential, $\exp_I(T) = \exp(T)$; the logarithm map is $\log_C(D) = C^{1/2} \log(C^{-1/2} D C^{-1/2}) C^{1/2}$; (2) the Riemannian distance is $d_g(C, D) = \|\log(C^{-1/2} D C^{-1/2})\|_F$, computed as |log λ| where λ are the generalized eigenvalues of C - λ D, and is invariant under matrix inversion $d_g(C, D) = d_g(C^{-1}, D^{-1})$; (3) a Riemannian symmetric space, where every point C has a point reflection defined by $\sigma^C(D) = C D^{-1} C$ [@Bhatia2007, p. 226]; (4) sectional curvature is nonpositive, and therefore every compact set has a unique center of mass; (5) the Riemannian center of mass of two points, aka their geometric mean, is the midpoint of the unique geodesic connecting them, which can be computed as: $C \circ D = C^{1/2} (C^{-1/2} D C^{-1/2})^{1/2} C^{1/2}$.
The $L^2$-Wasserstein metric on $\mathcal{S}_{\ge 0}(k, n)$ is also a Riemannian metric on $\mathcal{S}_+(n)$ when $k = n$. (In fact, this applies to any consistent family of Riemannian metrics on $\mathcal{S}_{\ge 0}(k, n)$.) See the next section for properties of the $L^2$-Wasserstein geometry. In particular, the local injectivity radius at any matrix C is the square root of the smallest eigenvalue of C [@Massart2020, Thm 6.3, p = n].
[@Arsigny2007] introduced the log-Euclidean metric, and the induced geometric mean is the matrix exponential of the arithmetic mean of matrix logarithms, which is easier to compute than the geometric mean under the natural metric.
The positive-definite manifold can also be identified with the manifold $\mathcal{L}_{+}(n)$ of (lower) triangular matrices with positive diagonal entries, due to the uniquenss of Cholesky decomposition; and [@LinZH2019] introduced the log-Cholesky metric induced from a bi-invariant metric on $\mathcal{L}_+(n)$, which is even easier to compute and numerically more stable, and the parallel transport has a closed form and is easy to compute.
Rank-k positive-semidefinite manifold $\mathcal{S}_{\ge 0}(k, n)$ is the set of rank-k order-n positive-semidefinite matrices, as a Riemannian $k (2n - k + 1) / 2$-submanifold of the order-n matrix manifold: $\mathcal{S}_{\ge 0}(k, n) = \mathcal{S}_{\ge 0}(n) \cap \mathcal{M}(k, n \times n)$. For the embedded submanifold geometry, see e.g. [@Vandereycken2009]. The exponential map has a closed form of some tangent vectors. It is related to the product manifold of the Stiefel manifold and the non-increasing positive Euclidean k-space by a surjective map: $f: V_{k, n} \times \mathbb{R}^k_{+\downarrow} \mapsto \mathcal{S}_{\ge 0}(k, n)$, $f(V, \lambda) = V \text{diag}(\lambda) V^T$, where $\mathbb{R}^k_{+\downarrow} = \{x \in \mathbb{R}^k_+ : \forall i < j, x_i \ge x_j\}$. Notice that the domain and codomain have the same manifold dimension. This also applies to the positive-definite manifold with k = n.
Alternatively, it can be defined as a quotient manifold, in various ways. [@Bonnabel2010] uses $V_{k,n} \times S_+(k) / O(k)$, with submersion $\pi: V_{k,n} \times S_+(k) \mapsto \mathcal{S}_{\ge 0}(k, n)$ defined by $\pi(V, P) = (V P) (V P)^T$. The equivalence class is $[(V, P)] = V P O(k) = \{V P Q : Q \in O(k)\}$. It extends the natural metric on $\mathcal{S}_+(n)$to $\mathcal{S}_{\ge 0}(k, n)$, as a sum of a weighted natural metric on $\mathcal{S}_+(k)$ and the standard metric of $G_{k,n}$. This class of metrics are invariant w.r.t. transformations that preserve angles (e.g. orthogonal transformations, scalings, and pseudo-inversion), and the resulting Riemannian manifold is geodesically complete [ibid. eq 4.2]. The Riemannian distance can be approximated via SVD in $\mathcal{O}(n k^2)$ time [ibid. eq 5.5]. The geometric mean preserves the rank, and is easy to compute.
[@Vandereycken2013] uses $\text{GL}_n^+ / \text{Stab}(P_k)$, where GL_n+ is the connected subgroup of GL_n with postive determinant, and stability group Stab($P_k$) is the maximal subgroup of GL_n+ that statisfies $A P_k A^T = P_k$. With the natural right-invariant metric on GL_n+, the induced metric makes $\mathcal{S}_{\ge 0}(k, n)$ a complete Riemannian manifold, with a closed-form expression for the exponential map.
[@Absil2009; @Journee2010] use $M^∗_{n, k} / O(k)$, with submersion $\pi: M^∗_{n, k} \mapsto \mathcal{S}_{\ge 0}(k, n)$ defined by $\pi(Y) = Y Y^T$; the equivalence class is $[Y] = Y O(k) = \{Y Q : Q \in O(k)\}$. With the Euclidean metric on the full-rank matrix manifold, the induced metric coincides with the $L^2$-Wasserstein metric on the positive-definite manifold when k = n, and the Riemannian distance function coincides with the $L^2$ Wasserstein distance on the positive-semidefinite manifold (see e.g. [@Massart2020]). Therefore, we may call this Riemannian metric the $L^2$-Wasserstein metric on fixed-rank positive-semidefinite manifolds, and this geometry the $L^2$-Wasserstein geometry. This Riemannian manifold is not geodesically complete, but it has easy to compute exponential map [@Vandereycken2013, sec. 7.2] and logarithm map [@Massart2020]. The injectivity radius at C is the square root of the k-th largest eigenvalue of C, i.e. the square root of the smallest positive eigenvalue [ibid.]. For a summary of the geometric operations, see [@Massart2020, Tbl 1]. The $L^2$-Wasserstein distance between two fixed-rank positive-semidefinite matrices equals the Procrustes distance between any pair of their (rectangular) representatives: $W_2(C, \widetilde{C}) = d_P(Y, \widetilde{Y})$.
Rank-k symmetric projection manifold $\mathcal{P}(k, n)$ is the set of rank-k symmetric projection matrices, as an embedded Riemannian k(n-k)-submanifold of the order-n matrix manifold: $\mathcal{P}(k, n) = \{P \in \mathcal{S}(n) : P^2 = P, \text{rank}(P) = k\}$. Its smooth manifold structure is derived from Lie group theory: because O(n) acts smoothly on $\mathcal{S}(n)$ by the map $\phi: O(n) \times \mathcal{S}(n) \mapsto \mathcal{S}(n)$ defined by $\phi(Q, S) = Q S Q^T$, the orbit $O(n) \cdot P_k = \mathcal{P}(k,n)$ of the point $P_k = I_{n,k} I_{k,n} = (I_k, 0; 0, 0)$ is a properly embedded submanifold of $\mathcal{S}(n)$, and it is diffeomorphic to the quotient manifold $O(n) / (O(n - k) \times O(k))$. Here $O(n - k) \times O(k)$ is used in place of the isotropy group of $P_k$, i.e., $\{\text{diag}(R_1, R_2) : R_1 \in O(k), R_2 \in O(n-k)\}$, because they are diffeomorphic. The smooth action $\phi$ restricts to a surjective smooth submersion $\pi: O(n) \mapsto \mathcal{P}(k,n)$ defined by $\pi(Q) = \phi(Q, P_k) = Q_k Q_k^T$. In fact, this is a Riemannian submersion when O(n) is seen as a Riemannian submanifold of the order-n matrix manifold.
Manifold property. Integral of the invariant measure on the manifold [@Chikuse2003, Eq 1.4.11]: $g(k, n) := \int_{\mathcal{P}(k,n)} (d P) = \pi^{k(n-k)/2} \Gamma_k(k/2) \Gamma_k^{-1}(n/2)$.
Differential constructs. Let Q be an orthogonal representation of P such that $\pi(Q) = P$. (1) tangent space, by way of an orthonormal basis Q, $T_P \mathcal{P(k,n)} = \{Q (0, K^T; K, 0) Q^T : K \in M_{n-k,k}\}$; note that the tangent vectors are symmetric matrices (i.e. in the tangent space of $\mathcal{S}(n)$). (2) Horizontal tangent space at Q, $H^\pi_Q O(n) = \{Q (0, -K^T; K, 0) : K \in M_{n-k,k}\}$. (3) Vertical tangent space at Q, $V^\pi_Q O(n) = \{Q (\Omega, 0; 0, \bar{\Omega}) : \Omega \in \Omega(k), \bar{\Omega} \in \Omega(n-k)\}$; note that the right-hand side matrices are anti-symmetric (i.e. the outcome is in the tangent space of O(n)). (4) Horizontal lift of tangent vector $Z \in T_P \mathcal{P}(k,n)$ to Q can be calculated as: $Z_Q = [Z, P] Q = (Z P - P Z) Q$; with factorization $Z = Q (0, K^T; K, 0) Q^T$, we have $Z_Q = Q (0, -K^T; K, 0)$. (5) With the surjective smooth submersion $\pi: V_{k,n} \mapsto \mathcal{P}(k,n)$ defined by $\pi(X) = X X^T$, the horizontal lift of tangent vector Z to X can be calculated as $Z_X = Z X$.
Geometric properties: (0) Riemannian metric: $\bar{g}_P(Z_1, Z_2) = \text{tr}(Z_1^T Z_2) = 2 \text{tr}(K_1^T K_2)$. (1) It is a Riemannian symmetric space, and every point P has a point reflection $\sigma^P: \mathcal{P}(k,n) \mapsto \mathcal{P}(k,n)$ defined by $\sigma^P(\tilde{P}) = S \tilde{P} S^T$, where $S = Q S_k Q^T$, $P = Q P_k Q^T$ is an eigen-decomposition, and $S_k = \text{diag}(I_k, -I_{n-k})$. (2) Its Riemann (1,3)-curvature tensor at $P_k$ is $R(x, y) z = b$, where $x, y, z, b \in T_{P_k} \mathcal{P}(n,k)$ have the form $x = (0, X^T; X, 0)$ with $X \in M_{n-k,k}$, and $B = Z X^T Y - Z Y^T X - X Y^T Z + Y X^T Z$. (3) When $k (n-k) \ge 2$, its sectional curvature at a point P is: $K_P(Z_1, Z_2) = 4 (\text{tr}(Z_1^2 Z_2^2) - \text{tr}((Z_1 Z_2)^2)) / (\text{tr}(Z_1^2) \text{tr}(Z_2^2) - \text{tr}^2(Z_1 Z_2)))$; or more compactly, using the Euclidean metric and the bracket, $K_P(Z_1, Z_2) = 2 \| [Z_1, Z_2] \|_F^2 / (\|Z_1\|_F^2 \|Z_2\|_F^2 - \langle Z_1, Z_2 \rangle_0^2)$. For n = 2 and k = 1, the sectional curvature is undefined. For n > 2 and k = 1 or n-1, the sectional curvature is always 1. For all other cases, the sectional curvature has a range [0, 2], see e.g. [@WongYC1968].
Stiefel manifold $V_{k, n}$ is the collection of orthonormal k-frames in the Euclidean n-space, as an embedded Riemannian submanifold of the m-by-k matrix manifold: $V_{k, n} = \{X \in M_{n,k}(\mathbb{R}) : X^T X = I_k\}$, $k \in \{1, \dots, n\}$. It is a submersion level set $F(X) = X^T X - I_k = 0$, and therefore an embedded submanifold [@Absil2008, 3.3.2]. The smooth manifold structure can also be derived from Lie group theory: because O(n) acts smoothly on $M_{n,k}$, the orbit $O(n) \cdot I_{n,p} = V_{k,n}$ of the point $I_{n,p} = (I_p; 0)$ is a properly embedded submanifold of $M_{n,k}$, and it is diffeomorphic to the quotient manifold $O(n) / O(n - k)$; here O(n - k) is used in place of the isotropy group $\{\text{diag}(I_k, R) : R \in O(n-k)\}$ because they are diffeomorphic. If k = 1, it is the (n-1)-sphere: $V_{1,n} = \mathbb{S}^{n-1}$, with dimension (n - 1). If k = n, it coincides with the orthogonal group: $V_{n,n} = O(n)$, with dimension n (n - 1) / 2.
Manifold property. Dimension, k (2n - k - 1) / 2. For O(n), n (n-1) / 2; for (n-1)-sphere, n-1. Compact, because it is diffeomorphic to the quotient manifold $O(n) / O(n-k)$. Connectedness: (n-k-1)-connected (see n-connected space). When k = n, it is O(n), which has two components, with determinant 1 for SO(n) and -1 for the other component. When k = n-1, it is path-connected. When k < n-1, it is simply connected. Homogeneous, because O(n) acts transitively on the left. Complete (if k = n, both components are metrically complete). Isometry group includes O(n) acting on the left, and O(k) acting on the right. Orientable. Integral of the invariant measure on the Stiefel manifold [@Chikuse2003, Eq 1.4.8]: $v(k, n) := \int_{V_{k,n}} (d X) = 2^k \pi^{kn/2} \Gamma_k^{-1}(n/2)$.
Differential constructs. Tangent space has the following equivalent forms: (1) as linear constraints, $T_X V_{k, n} = \{Z \in M_{n,k} : X^T Z + Z^T X = 0\}$; (2) in a "constructive" form, $T_X V_{k, n} = \{X \Omega + X_\perp K : \Omega \in \Omega(k), K \in M_{n-k,k} \}$, given an arbitrary orthogonal completion $(X, X_\perp) \in O(n)$; (3) in a projective form (aka tangential projection), $T_X V_{k, n} = \{X~\text{skew}(X^T M) + (I_n - X X^T) M : M \in M_{n,k}\}$, where $\text{skew}(A) = (A - A^T) / 2$. Normal space (w.r.t. the canonical metric): (2) in a constructive form, $N_X V_{k, n} = \{X S : S \in \mathcal{S}(k)\}$; (3) in a projective form, $N_X V_{k, n} = \{X~\text{sym}(X^T M) : M \in M_{n,k}\}$, where $\text{sym}(A) = (A + A^T) / 2$.
Geometric operations. (1) Riemannian metric: $\bar{g}_X(Z_1, Z_2) = \text{tr}(Z_1^T Z_2)$; or in orthonormal coordinates, $\bar{g}_X(Z_1, Z_2) = \text{tr}(\Omega_1^T \Omega_2 + K_1^T K_2)$, where $Z_i = X \Omega_i + X_\perp K_i$. (2) Covariant derivative (w.r.t. the Levi-Civita / tangential connection): $\nabla_Z W(X) = P_X (\bar{\nabla}_Z W(X))$, where $Z \in T_X V_{k, n}$, $W \in \mathfrak{X}(V_{k, n})$. (3) Riemannian distance function: closed form unknown? (4) Exponential map: $\exp_X(Z) = (X, Z) \exp([X^T Z, -Z^T Z; I_k, X^T Z]) (I_k; 0) \exp(-X^T Z)$ (uses matrix exponential); see also [@Edelman1998] for an efficient algorithm. (5) Parallel transport: closed form unknown. (6) Retractions (mostly based on matrix decompositions): (i) QR decomposition: $R(X, Z) = Q$, where $X + Z = Q R$, R has positive diagonal elements (modified Gram-Schmidt, about $2nk^2$ FLOPs). (ii) Polar decomposition: $R(X, Z) = (X + Z) (I_k + Z^T Z)^{-1/2}$ (iterative $O(k^3)$ for eigen-decomposition + $O(nk^2)$ scalar additions and multiplications). (7) Vector transports (associated with a retraction): (i) by differentiated retraction: $T_W(Z) = R \rho(R^T Z R') + (I - R R^T) Z R'$, where $R = R(X, W)$ is the QR-based retraction, $R' = (R^T (X + W))^{-1}$, and $\rho(A) = L - L^T$ (L is the lower triangle of A); (ii) by tangential projection (as a submanifold): $T_W(Z) = (I - Y Y^T) Z + Y~\text{skew}(Y^T Z)$, where $Y = R(X, W)$ is any retraction; (8) Logarithm map: no known closed-form expression but can be computed numerically @Zimmermann2017 Sectional curvature has a range [0, 5/4].
Canonical metric (an alternative geometry for the Stiefel manifold). In some conventions the Riemannian metric of the orthogonal group is scaled by 1/2 [@Zimmermann2017; @Bendokat2020]: $g_Q(Z_1, Z_2) = 1/2~\text{tr}(Z_1^T Z_2)$, where $Z_i \in T_Q O(n)$. This does not change the geometry of O(n): the tangent vectors are scaled by $1/\sqrt{2}$, and so is the Riemannian distance function. The "canonical metric" on the Stiefel manifold is induced from the orthogonal group (with the halved metric) via a smooth submersion $\pi_k: O(n) \mapsto V_{k,n}$, which takes an order-n matrix to its first k columns: $\pi_k(Q) = Q I_{n,k} = (q_i)_{i=1}^k$. The horizontal tangent space at Q can be written as: $H^{\pi_k}_Q O(n) = \{Q (\Omega, -K^T; K, 0) : \Omega \in O(k), K \in M_{n-k,k}\}$. The vertical tangent space at Q can be written as: $V^{\pi_k}_Q O(n) = \{Q (0, 0; 0, \bar{\Omega}) :\bar{\Omega} \in O(n-k)\}$. For a tangent vector $Z = X \Omega + X_\perp K \in T_X V_{k,n}$, its horizontal lift to $Q \in O(n)$ via $\pi_k$ can be written as $Z_Q = Q \tilde{\Omega}$, where $\tilde{\Omega} = (\Omega, -K^T; K, 0)$. Therefore the canonical metric of the Stiefel manifold is $g_X(Z_1, Z_2) = g_Q(Z_{1,Q}, Z_{2,Q})$, which equals $1/2~\text{tr}(\Omega_1^T \Omega_2) + \text{tr}(K_1^T K_2)$, and can be computed as $g_X(Z_1, Z_2) = \text{tr}(Z_1^T (I_n - 1/2 X X^T) Z_2)$. Note that the canonical metric of the Stiefel manifold is consistent with the halved metric on O(n). Since $g_X \ne \bar{g}_X$, the canonical metric gives the Stiefel manifold a different geometry than that of a Riemannian submanifold of a Euclidean space. Sectional curvature has a tight upper bound, $\max \kappa(Z_1, Z_2) = 5/4$; injectivity radius has a lower bound, $\text{inj}(V_{k,n}) \ge 2 \pi / \sqrt{5}$ [@Rentmeesters2013, eq 5.13].
Probability models [@Chikuse2003]: matrix Langevin $L_{k,n}(F)$, aka matrix von Mises-Fisher (vMF), exponential-linear; matrix Bingham $B_{k,n}(B)$, exponential-quadratic; matrix angular central Gaussian $\text{MACG}(\Sigma)$; max entropy with moment constraints [@Pennec2006]. Sampling: geodesic Monte Carlo is applicable [@Byrne2013].
Grassmann manifold or Grassmannian $G_{k, n}$ or $G_k(\mathbb{R}^n)$ is an (abstract) Riemannian manifold, defined as follows: its underlying set consists of k-subspaces of the Euclidean n-space, $G_{k, n} = \{\text{Span}(M) : M \in M^∗_{n, k}\}$, $k \in \{1, \dots, n\}$; its smooth manifold structure is uniquely determined such that group $\text{GL}_n$ is a smooth left action on $G_{k,n}$ [@Lee2012, Ex 21.21]; its Riemannian metric is uniquely determined such that $\text{Span}: V_{k,n} \mapsto G_{k,n}$ is a Riemannian submersion, because the orthogonal group O(k) is an isometric right action on the Stiefel manifold $V_{k,n}$ (w.r.t. the Euclidean-induced metric not the canonical metric) [@Lee2018, Prob 2-7]. If k = 1, it is the (n-1)-dimensional projective space, i.e. the quotient manifold of lines in the Euclidean n-space: $G_{1,n} = \mathbb{RP}^{n-1}$. If k = n, it is a singleton consisting of the Euclidean n-space: $G_{n,n} = \{\mathbb{R}^n\}$. Note that $G_{k,n}$ and $G_{n-k,n}$ can be identified, because orthgonal complement $\perp: G_{k,n} \mapsto G_{n-k,n}$ is an isometry. Although the Grassmann manifold consists of subspaces rather than matrices, it is closely related to matrix manifolds and sometimes identified with $\mathcal{P}(k,n)$.
Manifold property. Dimension, k (n - k). Compact, because it is diffeomorphic to the quotient manifolds $V_{k, n} / O(k)$ [@Lee2012, Prob 21-13] and $O(n) / (O(n-k) \times O(k))$. Simply connected, except $G_{1, 2} \cong \mathbb{S}^1$; (see e.g. this paper) Symmetric, isotropic (and therefore homogeneous) [@Lee2018, Prob 3-20]. Complete. Isometry group includes O(n) on the left, and $N_G(H)/H$ on the right, where $N_G(H)$ denotes the normalizer of H in G, with G = O(n) and isotropy group $H = O(n-k) \times O(k)$. Orientable if and only if n is even. (Note that "oriented Grassmann manifold" means the set of oriented subspaces, which does not mean the Grassmann manifold itself is oriented.)
Representations of the abstract manifold. Because $\text{Span}: V_{k,n} \mapsto G_{k,n}$ and $\text{Span}: M^∗_{n, k} \mapsto G_{k,n}$ are Riemannian submersions, one may use $M \in M^∗_{n, k}$ as a representation of Span(M) to carry out all the computations related to the Grassmann manifold. Its equivalence class is $[M] = \{M A : A \in \text{GL}_k\}$; for $X \in V_{k, n}$, $[X] = \{X Q : Q \in O(k)\}$. For notational simplicity, we may use the equivalence class [M] and abstract element Span(M) interchangeably, when there is no ambiguity. Horizontal tangent space (represents the abstract tangent space; normal space to the fiber at the representation) has the following equivalent forms: (1) as linear constraints, $H_M = \{Z \in M_{n, k} : M^T Z = 0\}$; (2) in a projective form (aka horizontal projection), $H_M = \{(I_n - M (M^T M)^{-1} M^T) Z : Z \in M_{n,k}\}$, and for Stiefel representations, $H_X = \{(I_n - X X^T) Z : Z \in M_{n,k}\}$; (3) in a "constructive" form, $H_X = \{X_\perp B : B \in M_{n-k,k}\}$ given an arbitrary orthogonal completion $(X, X_\perp) \in O(n)$. With two representations M and M' of the same abstract element [M] such that M' = M A where $A \in \text{GL}_k$, if $Z_M, Z_{M'}$ represent the same abstract tangent vector $Z \in T_{[M]} G_{k, n}$, then $Z_{M'} = Z_M A$. Vertical tangent space (analogous to normal space of a submanifold, tangent space to the fiber at the representation) $V_M = \{M A : A \in M_{k, k}\}$; for Stiefel representations, $V_X = \{X \Omega : \Omega \in \Omega(k)\}$. Note that by definition, $V_X = T_X [X]$ and $H_X = V_X^\perp$ w.r.t. $T_X V_{k,n}$.
Representations of an explicit identification. One may identify the underlying sets of the Grassmannian and the rank-k symmetric projection manifold, because $\text{Range}: \mathcal{P}(k, n) \mapsto G_{k,n}$ is a bijection, and so is the map $P: G_{k,n} \mapsto \mathcal{P}(k, n)$ that takes a subspace to its orthogonal projection operator. Let quotient map $\pi: V_{k,n} \mapsto \mathcal{P}(k,n)$ with $\pi(X) = X X^T$. Let quotient map $\pi_k: O(n) \mapsto V_{k,n}$ with $\pi_k(Q) = Q I_{n,k} = (x_i)_{i=1}^k$, which takes the first k columns of a matrix. Without ambiguity, let $\pi: O(n) \mapsto \mathcal{P}(k,n)$ with $\pi(Q) = \pi(\pi_k(Q))$; note that $\pi(Q) = \phi(Q, P_k)$. These quotient maps π are surjective smooth submersions, and as Riemannian submersions, the induced Riemannian metric coincides with (1/2 of) the (restricted) Euclidean metric. One may use $X \in V_{k,n}$ or $Q \in O(n)$ as a representation of $\pi(X) = \pi(Q) \in \mathcal{P}(k,n)$, see e.g. [@Bendokat2020, Fig 2.1].
Geometric operations. In the following, an abstract element of the Grassmannnian (e.g. p) is replaced by a basis or Stiefel representation (e.g. M, X) of the subspace (e.g. [M], [X]), and an abstract tangent vector (e.g. v, Δ) is replaced by its horizontal lift to the corresponding representation (e.g. Z, W). Riemannian metric $g_M(Z, W) = \text{tr}((M^T M)^{-1} Z^T W)$, where $Z, W \in H_M$; for a Stiefel representation, $g_X(Z, W) = \text{tr}(Z^T W)$, which is the same as the Euclidean metric [@Absil2008, Prop 3.4.6, Sec 3.6.2]. Covariant derivative (w.r.t. the Levi-Civita connection) $\overline{\nabla_Z W(X)} = P_X (\bar{\nabla}_Z W(X))$. Exponential map: $\exp_M(Z) = \pi_p(M) W \cos(\Sigma) + U \sin(\Sigma)$; where $Z = U \Sigma W^T$ is a thin SVD, and $\pi_p(M) = M (M^T M)^{-1/2}$ is the projection based on polar decomposition, which can be computed as $\pi_p(M) = A B$ via thin SVD $M = A S B$; for a Stiefel representation, $\exp_X(Z) = X W \cos(\Sigma) + U \sin(\Sigma)$. Parallel transport along geodesics: given $X \in V_{k, n}$ and $Z \in H_X$, denote $Y = \exp_X(Z)$, the parallel transport map $P^Z_{XY}: H_X \cong H_Y$ can be computed as $P^Z_{XY} = (-X W \sin(\Sigma) + U \cos(\Sigma)) U^T + (I - U U^T)$, where $Z = U \Sigma W^T$ is a thin SVD. Retractions: $R_X(Z) = X + Z$. Vector transports (associated with a retraction): $T^W_{XY} = P_Y$ where $Y = X + W$; the same result by differentiated retraction and by horizontal projection. Principal angles between two subspaces: (1) given Stiefel representations X and Y, $\theta = \arccos(\sigma(X^T Y))$; (2) given the horizontal lift $Z = \log_X(Y) \in H_X$ of the Grassmann logarithm, $\theta = \text{rev}(\sigma(Z))$, where the singular values are reversed because $\theta$ is in non-decreasing order by convention. Riemannian distance function: $d_g([X], [Y]) = \|\theta([X], [Y])\|$. Other distance functions on the Grassmann manifold (see e.g. [@Edelman1998, Sec. 4.3]): (1) projection 2-metric (or gap metric), $d_{p2} := \|X X^T - Y Y^T\|_2 = \sin \theta_k$; (2) projection F-metric, $d_{pF} := \|X X^T - Y Y^T\|_F / \sqrt{2} = \|\sin \theta\|$; (3) chordal 2-metric: $d_{c2} := \|\widetilde{X} - \widetilde{Y}\|_2$ where $\widetilde{X}$ and $\widetilde{Y}$ are the principal bases, or equivalently, $d_{c2} = 2 \sin(\theta_k / 2)$; (4) chordal F-metric (or Procrustes metric): $d_{cF} := \|\widetilde{X} - \widetilde{Y}\|_F$, or equivalently, $d_{c2} = \|2 \sin(\theta / 2)\|$; (5) Fubini-Study metric: $d_{FS} := \arccos(\text{abs}(\det X^T Y))$, or equivalently, $d_{FS} = \arccos(\text{prod}(\sigma(X^T Y)))$ and $d_{FS} = \arccos(\text{prod}(\cos\theta))$. (6) Finsler distance function, $d_F = \theta_k$, induced from the Finsler metric defined by the operator 2-norm on the tangent space [@Weinstein2000, Apx. A]. Notice that the projection 2-metric, chordal 2-metric, and Finsler distance only depend on $\theta_k$, while the projection F-metric, chordal F-metric, Fubini-Study, and the Riemannian distance depend on $\theta$. In general, every symmetry gauge function defines a metric by $\Phi(\theta)$, which are unitarily invariant since they only depend on $\theta$ [@QiuL2005]. Curvature: see $\mathcal{P}(k, n)$.
Cut locus and conjugate locus. The tangent cut locus and the tangent conjugate locus of any point p on a Grassmann manifold have simple characterizations on the horizontal tangent space of a Stiefel representation X, both expressed in terms of their singular values. The horizontal lift of the tangent cut locus of p is a sphere of radius π/2 in spectral norm: $\text{TCL}(p)_X = \{Z \in H_X : \sigma_1(Z) = \pi / 2\}$. The horizontal lift of the tangent conjugate locus of p corresponds to a countable collection of planes in the space of singular values: (1) if k ≠ n/2, then $\text{Conj}(p)_X = \{Z \in H_X : \sigma(Z) \in S_1 \cup S_2 \}$, where $S_1 = \cup_{i=1}^u \cup_{m=1}^\infty \{\sigma \in S : \sigma_i = m \pi\}$, $S_2 = \cup_{1 \le i < j \le u} \cup_{m=1}^\infty \{\sigma \in S : \sigma_i - \sigma_j = m \pi\}$, and $S = \mathbb{R}^u_{+\downarrow}$, where u = min(k, n-k); (2) if k = n/2, $S_1$ should be removed. Note that this is essentially proved in [@Bendokat2020, Thm 7.2], but they use conjugate points without reference to geodesics or tangent vectors, which is very confusing. For Grassmann manifolds, an abstract tangent vector has the same length as its horizotnal lift: let $\Delta \in T_p G_{k,n}$, then $\|\Delta\| = \|\Delta_X\|_F = \|\sigma(\Delta_X)\|$. Therefore, length is preserved if we map the horizontal tangent space $H_X$ to the space $S = \mathbb{R}^u_{+\downarrow}$ of singular values. Within this space, the (tangent) cut locus corresponds exactly to the (u-1)-plane, $\sigma_1 = \pi/2$; and the tangent conjugate locus corresponds exactly to the collection of planes, $S_1 \cup S_2$. In particular, the tangent cut locus of a Grassmann manifold always comes before the tangent conjugate locus, if the latter exists. Therefore, the Grassmann manifold at any point has injectivity radius: inj(p) = π/2; and the maximum distance between any two points is: $\max d_g(p, p') = \sqrt{u} \pi / 2$. The cardinality of shortest geodesic to a cut point is |O(r)|, where r denotes the number of principal angles equal to π/2.
Riemannian logarithm. Given Stiefel representations X and Y for $p, p' \in G_{k,n}$, let $\Delta_X \in H_X$ be the horizontal lift of the generalized Grassmann logarithm $\Delta = \log_p(p')$ to X. A method from [@Absil2004, Sec 3.8]: $Y (X^T Y)^{-1} - X = U \Sigma V^T$, an n-by-k SVD; then $\Delta_X = U \arctan(\Sigma) V^T$, which is inaccurate when there are large principal angles. A modified algorithm for the (generalized) Grassmann logarithm computes $\Delta_X$, first described in [@Zimmermann2019] with properties in [@Bendokat2020, Thm 5.4, 5.5]: $Y^T X = \tilde{Q} \tilde{S} \tilde{R}^T$, an order-k SVD; $\overline{Y} = Y (\tilde{Q} \tilde{R}^T)$ and $\overline{X} = X (\tilde{R} \tilde{S} \tilde{R}^T)$; $\overline{Y} - \overline{X} = \hat{Q} \hat{S} \hat{R}^T$, an n-by-k thin SVD. Output: let $r = \sum_{i=1}^k 1(\hat{s}_i = 1)$, if r = 0, then $p' \notin \text{Cut}(p)$, and $\Delta_X = \hat{Q} \arcsin(\hat{S}) \hat{R}^T$, which retains the Procrustes Stiefel representative: $\overline{Y} = \exp_X(\Delta_X)$; if r > 0, then $p' \in \text{Cut}(p)$, and $\Delta_X = \{\hat{Q} \arcsin(\hat{S}) \text{diag}(W, I_{k−r}) \hat{R}^T : W \in O(r)\}$. Notice that many other quantities can be computed with intermeidate results in this algorithm: (1) principal angles $\theta = \arccos \tilde{s} = \text{rev}(\arcsin \hat{s})$, where rev() denotes reversing the order of a vector; in particular, the singular values of the horizontal lift $\Delta_X$ of the (generalized) Grassmann logarithm are the principal angles (in non-increasing order), $\theta = \text{rev}(\sigma(\Delta_X))$; (2) principal bases $\tilde{X} = X \tilde{Q}$ and $\tilde{Y} = Y \tilde{R}$; (3) Riemannian distance $d_g(p, p') = \|\theta\|$.
Probability models [@Chikuse2003]: matrix Langevin $L^{(P)}_{k,n}(B)$, uses trace; matrix angular central Gaussian $\text{MACG}(\Sigma)$, uses (order-k) determinant.