Certain sets of matrices may be endowed with a manifold structure, which can help us understand or solve problems involving such matrices. This article looks at some common matrix manifolds.

Matrix Manifolds

The generic m-by-n matrix manifold $M_{m, n}(\mathbb{R})$ is the $(m n)$-dimensional real vector space of m-by-n real matrices, endowed with the inner product $\langle A, B \rangle = \text{tr}(A B^T)$. It is diffeomorphic to, and often identified with, the Euclidean $(m n)$-space: $M_{m, n}(\mathbb{R}) \cong \mathbb{R}^{mn}$. The full-rank manifold $M^∗_{m, n}(\mathbb{R})$ or $\mathbb{R}_∗^{mn}$ is the set of m-by-n full-rank real matrices, as an open subset of the m-by-n matrix manifold: $M^∗_{m, n}(\mathbb{R}) = \{X \in M_{m, n}(\mathbb{R}) : \text{rank}(X) = \min(m, n)\}$; or equivalently, $M^∗_{m, n}(\mathbb{R}) = M_{m, n}(\mathbb{R}) \setminus \bigcup_{k=0}^{\min(m,n)-1} \mathcal{M}(k, m \times n)$. It is an open Riemannian $(m n)$-submanifold; it is disconnected (two components, by the sign of the determinant) when $m = n$, and connected when $m \ne n$. If $m = n$, its underlying set coincides with the general linear group of nonsingular order-n real matrices: $M^∗_n(\mathbb{R}) = \text{GL}(n, \mathbb{R})$.
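
As a quick numerical illustration (a minimal NumPy sketch; all names are ad hoc), the Frobenius inner product $\text{tr}(A B^T)$ coincides with the entrywise Euclidean inner product under the identification $M_{m,n}(\mathbb{R}) \cong \mathbb{R}^{mn}$, and membership in the full-rank manifold can be checked by a rank computation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A, B = rng.standard_normal((m, n)), rng.standard_normal((m, n))

inner = np.trace(A @ B.T)                      # <A, B> = tr(A B^T)
assert np.isclose(inner, np.sum(A * B))        # equals the entrywise inner product on R^{mn}

in_full_rank_manifold = np.linalg.matrix_rank(A) == min(m, n)   # A in M*_{m,n}(R)?
print(inner, in_full_rank_manifold)
```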

Rank-k manifold $\mathcal{M}(k, m \times n)$ is the set of m-by-n real matrices of rank $k \in \{0, \dots, \min(m, n)\}$, as an embedded Riemannian $k (m + n - k)$-submanifold of the m-by-n matrix manifold: $\mathcal{M}(k, m \times n) = \{X \in M_{m, n}(\mathbb{R}) : \text{rank}(X) = k\}$; its Riemannian metric is induced from the Euclidean metric of the m-by-n matrix manifold. Alternatively, it can be defined as a quotient manifold, with the order-k general linear group acting on the product of full-rank manifolds: $\mathcal{M}(k, m \times n) = (M^∗_{m,k} \times M^∗_{n,k}) / \text{GL}_k$, with equivalence class $[(M, N)] = \{(M Q, N Q^{-T}) : Q \in \text{GL}_k\}$, where $M \in M^∗_{m,k}$ and $N \in M^∗_{n,k}$. From this definition, its dimension is easily seen to be $(m + n) k - k^2 = k (m + n - k)$. Riemannian metrics can be induced on it via the Riemannian submersion theorem: the map $\pi: M^∗_{m,k} \times M^∗_{n,k} \mapsto \mathcal{M}(k, m \times n)$, $\pi(M, N) = M N^T$, is a surjective smooth submersion whose fibers are the $\text{GL}_k$-orbits above; given a Riemannian metric on the total space under which $\text{GL}_k$ acts by isometries, there is a unique Riemannian metric on $\mathcal{M}(k, m \times n)$ such that $\pi$ is a Riemannian submersion (see e.g. [@Absil2014]).
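
To illustrate the quotient picture (a hedged NumPy sketch, not tied to any particular library), a point of the rank-k manifold is represented by a factor pair $(M, N)$ via $\pi(M, N) = M N^T$, and right-multiplying the pair by $(Q, Q^{-T})$ with $Q \in \text{GL}_k$ leaves the represented point unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 5, 4, 2
M, N = rng.standard_normal((m, k)), rng.standard_normal((n, k))  # full rank almost surely
Q = rng.standard_normal((k, k))                                  # generically in GL_k

X1 = M @ N.T                                   # pi(M, N)
X2 = (M @ Q) @ (N @ np.linalg.inv(Q).T).T      # pi(M Q, N Q^{-T})
assert np.allclose(X1, X2)                     # same point of M(k, m x n)
assert np.linalg.matrix_rank(X1) == k
```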

Square matrix manifolds

Symmetric matrix manifold $\mathcal{S}(n)$ is the set of order-n real symmetric matrices, as the Riemannian $n(n+1)/2$-submanifold of the order-n matrix manifold: $\mathcal{S}(n) = \{X \in M_n(\mathbb{R}) : X = X^T\}$. Positive-semidefinite manifold $\mathcal{S}_{\ge 0}(n)$ is the order-n positive semi-definite cone, as a regular domain of the symmetric matrix manifold: $\mathcal{S}_{\ge 0}(n) = \{X \in \mathcal{S}(n) : X \ge 0\}$. Positive-definite manifold $\mathcal{S}_+(n)$ is the set of order-n positive-definite matrices, as an open subset of the symmetric matrix manifold: $\mathcal{S}_+(n) = \{X \in \mathcal{S}(n) : X > 0\}$.

Rank-k positive-semidefinite manifold $\mathcal{S}_{\ge 0}(k, n)$ is the set of rank-k order-n positive-semidefinite matrices, as a Riemannian $k (2n - k + 1) / 2$-submanifold of the order-n matrix manifold: $\mathcal{S}_{\ge 0}(k, n) = \mathcal{S}_{\ge 0}(n) \cap \mathcal{M}(k, n \times n)$. It is related to the product manifold of the Stiefel manifold and the non-increasing positive Euclidean k-space by a surjective map: $f: V_{k, n} \times \mathbb{R}^k_{+\downarrow} \mapsto \mathcal{S}_{\ge 0}(k, n)$, $f(V, \lambda) = V \text{diag}(\lambda) V^T$, where $\mathbb{R}^k_{+\downarrow} = \{x \in \mathbb{R}^k_+ : \forall i < j, x_i \ge x_j\}$. Notice that the domain and codomain have the same manifold dimension. This also applies to $\mathcal{S}_+(n)$ with $k = n$.
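
A small NumPy sketch of the surjection $f(V, \lambda) = V \operatorname{diag}(\lambda) V^T$ (names are ad hoc): a Stiefel point and a positive, non-increasing vector of eigenvalues yield a rank-k positive-semidefinite matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 2
V, _ = np.linalg.qr(rng.standard_normal((n, k)))     # V in V_{k,n} (orthonormal columns)
lam = np.sort(rng.uniform(0.5, 2.0, size=k))[::-1]   # lambda in R^k_{+, non-increasing}

X = V @ np.diag(lam) @ V.T                           # f(V, lambda)
assert np.allclose(X, X.T)                           # symmetric
assert np.linalg.matrix_rank(X) == k                 # rank k
assert np.all(np.linalg.eigvalsh(X) >= -1e-12)       # positive semidefinite
```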

Rank-k symmetric projection manifold $\mathcal{P}(k, n)$ is the set of rank-k symmetric projection matrices, as a $k(n-k)$-submanifold: $\mathcal{P}(k, n) = \{P \in \mathcal{S}(n) : P^2 = P, \text{rank}(P) = k\}$.

Skew symmetric matrix manifold $\Omega(n)$ is the set of order-n real skew symmetric matrices, as the Riemannian $n(n-1)/2$-submanifold of the order-n matrix manifold: $\Omega(n) = \{X \in M_n(\mathbb{R}) : X = - X^T\}$.

Other matrix manifolds:

  • Special Euclidean group $\text{SE}(3)$ is, as a smooth manifold, the Cartesian product of the order-3 special orthogonal group and the Euclidean 3-space: $\text{SE}(3) \cong \text{SO}(3) \times \mathbb{R}^3$ (as a group, it is a semidirect product).
  • Oblique manifold $\text{OB}_{m,n}$ is the set of m-tuples of points on the $(n-1)$-sphere that span the Euclidean n-space: $\text{OB}_{m,n} = \{X \in M^∗_{m, n}: \text{diag}(X X^T) = I\}$, or equivalently $\text{OB}_{m,n} = \{X \in \prod_{i=1}^m \mathbb{S}^{n-1} : \text{Span}~X = \mathbb{R}^n\}$.
  • Essential manifold is the set of essential matrices, i.e. the product of a skew-symmetric matrix and an orthogonal matrix: $E_n = \{\Omega Q : \Omega \in \Omega(n), Q \in O(n)\}$.
  • Flag manifold $F_K(\mathbb{R}^n)$ is the set of all flags of type K in the Euclidean n-space, i.e. a nested sequence of linear subspaces: given $K = (k_i)_{i=1}^m$, $F_K(\mathbb{R}^n) = \{(S_i)_{i=1}^m : S_i \in G_{k_i,n}, \forall i < j, S_i \subset S_j \}$.

Stiefel manifold

Stiefel manifold $V_{k, n}$ is the collection of orthonormal k-frames in the Euclidean n-space, as an embedded Riemannian submanifold of the n-by-k matrix manifold: $V_{k, n} = \{X \in M_{n,k}(\mathbb{R}) : X^T X = I_k\}$, $k \in \{1, \dots, n\}$. It is the zero level set of the submersion $F(X) = X^T X - I_k$, and therefore an embedded submanifold [@Absil2008, 3.3.2]. Its Riemannian metric is induced from the Euclidean metric: $g_X(Z, W) = \text{tr}(Z^T W)$. If $k = 1$, it is the (n-1)-sphere: $V_{1,n} = \mathbb{S}^{n-1}$, with dimension $(n - 1)$. If $k = n$, it coincides with the orthogonal group: $V_{n,n} = O(n)$, with dimension $n (n - 1) / 2$.
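
For concreteness, a minimal NumPy sketch (variable names are ad hoc): a Stiefel point can be obtained as the Q factor of a generic n-by-k matrix, and the defining constraint $X^T X = I_k$ checked directly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # X in V_{k,n}, almost surely
assert np.allclose(X.T @ X, np.eye(k))             # F(X) = X^T X - I_k = 0
```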

Manifold property. Dimension $k (2n - k - 1) / 2$. For $O(n)$, $n (n-1) / 2$; for (n-1)-sphere, $n-1$. Compact, because it is homeomorphic to the quotient space $O(n) / O(n-k)$. Connectedness: $(n-k-1)$-connected (see n-connected space). When $k = n$, it is $O(n)$, which has two components, with determinant 1 ($SO(n)$) and -1. When $k = n-1$, it is path-connected. When $k < n-1$, it is simply connected. Homogeneous (need reference). Complete (if $k = n$, both components are metrically complete).

Constructs as a submanifold. Tangent space $T_X V_{k, n} = \{Z \in M_{n,k} : X^T Z + Z^T X = 0\}$; or, $T_X V_{k, n} = \{X \Omega + X_\perp K : \Omega \in \Omega(k), K \in M_{n-k,k}\}$. For orthogonal group, $T_X O(n) = \{X \Omega : \Omega \in \Omega(n)\}$. Normal space: $N_X V_{k, n} = \{X S : S \in \mathcal{S}(k)\}$. Projection onto tangent space: $P_X A = (I - X X^T) A + X~\text{skew}(X^T A)$, $\text{skew}(M) = (M - M^T) / 2$; Projection onto normal space: $P_X^\perp A = X~\text{sym}(X^T A)$, $\text{sym}(M) = (M + M^T) / 2$.
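
The tangent and normal projections above are easy to verify numerically; a hedged NumPy sketch (helper names are ad hoc), checking that the two projections decompose an arbitrary matrix and that the tangent part satisfies $X^T Z + Z^T X = 0$:

```python
import numpy as np

def sym(M):
    return (M + M.T) / 2

def skew(M):
    return (M - M.T) / 2

def proj_tangent(X, A):          # P_X A = (I - X X^T) A + X skew(X^T A)
    return (np.eye(X.shape[0]) - X @ X.T) @ A + X @ skew(X.T @ A)

def proj_normal(X, A):           # P_X^perp A = X sym(X^T A)
    return X @ sym(X.T @ A)

rng = np.random.default_rng(4)
n, k = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
A = rng.standard_normal((n, k))

Z = proj_tangent(X, A)
assert np.allclose(Z + proj_normal(X, A), A)   # the two projections decompose A
assert np.allclose(sym(X.T @ Z), 0)            # Z is tangent: X^T Z + Z^T X = 0
```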

Geometric operations. Riemannian metric: $g_X(Z, W) = \text{tr}(\Omega_Z^T \Omega_W + K_Z^T K_W)$, which coincides with the Euclidean metric $\text{tr}(Z^T W)$. Covariant derivative (w.r.t. the Levi-Civita / tangential connection): $\nabla_Z W(X) = P_X (\bar{\nabla}_Z W(X))$, where $Z \in T_X V_{k, n}$, $W \in \mathfrak{X}(V_{k, n})$. Riemannian distance function: no closed form is known. Exponential map: $\exp_X(Z) = \begin{bmatrix} X & Z \end{bmatrix} \exp\begin{bmatrix} X^T Z & -Z^T Z \\ I_k & X^T Z \end{bmatrix} \begin{bmatrix} I_k \\ 0 \end{bmatrix} \exp(-X^T Z)$, which uses the matrix exponential. Parallel transport: no closed form is known. Retractions (mostly based on matrix decompositions): (1) QR decomposition: $R(X, Z) = Q$, where $X + Z = Q R$ and $R$ has positive diagonal elements (modified Gram-Schmidt; a finite number of additions, multiplications, divisions, and square roots). (2) Polar decomposition: $R(X, Z) = (X + Z) (I_k + Z^T Z)^{-1/2}$ (an iterative $O(k^3)$ eigendecomposition plus $O(nk^2)$ scalar additions and multiplications). Vector transports (associated with a retraction): (1) by differentiated retraction: $T_W(Z) = R \rho(R^T Z R') + (I - R R^T) Z R'$, where $R = R(X, W)$ is the QR-based retraction, $R' = (R^T (X + W))^{-1}$, and $\rho(A) = L - L^T$ with $L$ the strictly lower triangle of $A$; (2) by tangential projection (as a submanifold): $T_W(Z) = (I - Y Y^T) Z + Y~\text{skew}(Y^T Z)$, where $Y = R(X, W)$ for any retraction $R$.
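
A hedged NumPy sketch of the two retractions listed above (assuming $Z$ is a tangent vector at $X$; function names are ad hoc). The polar retraction uses an order-k eigendecomposition to form $(I_k + Z^T Z)^{-1/2}$:

```python
import numpy as np

def retract_qr(X, Z):
    Q, R = np.linalg.qr(X + Z)                 # reduced QR factorization
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1
    return Q * signs                           # flip columns so R has positive diagonal

def retract_polar(X, Z):
    k = X.shape[1]
    w, V = np.linalg.eigh(np.eye(k) + Z.T @ Z) # symmetric positive definite
    return (X + Z) @ (V @ np.diag(w ** -0.5) @ V.T)

rng = np.random.default_rng(5)
n, k = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
A = rng.standard_normal((n, k))
Z = (np.eye(n) - X @ X.T) @ A + X @ (X.T @ A - A.T @ X) / 2   # project A to T_X V_{k,n}

for R in (retract_qr(X, Z), retract_polar(X, Z)):
    assert np.allclose(R.T @ R, np.eye(k))     # both retractions land on V_{k,n}
```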

Probability models: matrix von Mises-Fisher (vMF) [@Chikuse2003], max entropy with moment constraints [@Pennec2006]. Sampling: geodesic Monte Carlo is applicable [@Byrne2013].

Grassmann manifold

Grassmann manifold or Grassmannian $G_{k, n}$ or $G_k(\mathbb{R}^n)$ is an (abstract) Riemannian manifold, defined as follows: its underlying set consists of the k-subspaces of the Euclidean n-space, $G_{k, n} = \{\text{Span}(X) : X \in M^∗_{n, k}\}$, $k \in \{1, \dots, n\}$; its smooth manifold structure is uniquely determined by requiring that the natural left action of the group $\text{GL}_n$ on $G_{k,n}$ be smooth [@Lee2012, Ex 21.21]; its Riemannian metric is uniquely determined by requiring that $\text{Span}: V_{k,n} \mapsto G_{k,n}$ be a Riemannian submersion, which is possible because the orthogonal group $O(k)$ acts isometrically on the right on the Stiefel manifold $V_{k,n}$ [@Lee2018, Prob 2-7]. If $k = 1$, it is the $(n-1)$-dimensional projective space, i.e. the quotient manifold of lines through the origin in the Euclidean n-space: $G_{1,n} = \mathbb{RP}^{n-1}$. If $k = n$, it is a singleton consisting of the Euclidean n-space: $G_{n,n} = \{\mathbb{R}^n\}$.

Manifold property. Dimension $k (n - k)$. Compact, because it is homeomorphic to the quotient space $V_{k, n} / O(k)$ [@Lee2012, Prob 21-13]. Connected but not simply connected: the fundamental group is $\mathbb{Z}/2$ for $0 < k < n$, except $G_{1, 2} \cong \mathbb{S}^1$, whose fundamental group is $\mathbb{Z}$. Symmetric, isotropic (and therefore homogeneous) [@Lee2018, Prob 3-20]. Complete.

Representations of the abstract manifold. Because $\text{Span}: V_{k,n} \mapsto G_{k,n}$ and $\text{Span}: M^∗_{n, k} \mapsto G_{k,n}$ are Riemannian submersions, one may use $X \in M^∗_{n, k}$ as a representation of $\text{Span}(X)$ to carry out all computations related to the Grassmann manifold. The equivalence class of $X \in V_{k, n}$ is $[X] = \{X Q : Q \in O(k)\}$; the equivalence class of $X \in M^∗_{n, k}$ is $[X] = \{X M : M \in \text{GL}_k\}$. For notational simplicity, we may use the equivalence class $[X]$ and the abstract element $\text{Span}(X)$ interchangeably, when there is no ambiguity. Horizontal tangent space (representing the abstract tangent space): $H_X = \{Z \in M_{n, k} : X^T Z = 0\}$. Given two representations $Y, X \in M^∗_{n, k}$ of the same abstract element $[X]$ with $Y = X M$ and $M \in \text{GL}_k$, if $Z_Y, Z_X$ represent the same abstract tangent vector $Z \in T_{[X]} G_{k, n}$, then $Z_Y = Z_X M$. Projection onto the horizontal space: $P_X Z = (I - X (X^T X)^{-1} X^T) Z$. Vertical tangent space (analogous to the normal space of a submanifold): $V_X = \{X M : M \in M_{k, k}\}$; note that $V_X = T_X [X]$ and $[X] \subset V_X$.
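
A small NumPy sketch of the quotient representation (names are ad hoc): the horizontal projection at a representative $X \in M^∗_{n,k}$, and the fact that horizontal lifts of the same abstract tangent vector transform by the same change of representative:

```python
import numpy as np

def proj_horizontal(X, Z):       # P_X Z = (I - X (X^T X)^{-1} X^T) Z
    return Z - X @ np.linalg.solve(X.T @ X, X.T @ Z)

rng = np.random.default_rng(6)
n, k = 6, 2
X = rng.standard_normal((n, k))                 # a representative of Span(X), full rank a.s.
ZX = proj_horizontal(X, rng.standard_normal((n, k)))
assert np.allclose(X.T @ ZX, 0)                 # ZX is horizontal at X

M = rng.standard_normal((k, k))                 # generically in GL_k
Y, ZY = X @ M, ZX @ M                           # same subspace, lift of the same tangent vector
assert np.allclose(Y.T @ ZY, 0)                 # still horizontal at the new representative
```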

Representations of an explicit identification. One may identify the underlying sets of the Grassmannian and the rank-k symmetric projection manifold, because $\text{Range}: \mathcal{P}(k, n) \mapsto G_{k,n}$ is a bijection, and so is the map $P: G_{k,n} \mapsto \mathcal{P}(k, n)$ that takes a subspace to its orthogonal projection operator. Let quotient map $\pi: V_{k,n} \mapsto \mathcal{P}(k,n)$ with $\pi(X) = X X^T$. Let quotient map $\pi_k: O(n) \mapsto V_{k,n}$ with $\pi_k(X) = (x_i)_{i=1}^k$, which takes the first k columns of a matrix. Without ambiguity, let $\pi: O(n) \mapsto \mathcal{P}(k,n)$ with $\pi(X) = \pi(\pi_k(X))$. (Not sure if the quotient maps $\pi$ are Riemannian submersions.) One may use $X \in V_{k,n}$ or $X \in O(n)$ as a representation of $\pi(X) \in \mathcal{P}(k,n)$, see e.g. [@Bendokat2020, Fig 2.1].

Geometric operations. Riemannian metric: $g_{[X]}([Z], [W]) = \text{tr}((X^T X)^{-1} Z^T W)$, where $Z, W \in H_X$; if $X \in V_{k, n}$, then $g_{[X]}([Z], [W]) = \text{tr}(Z^T W)$, the same as the Euclidean metric [@Absil2008, Prop 3.4.6, Sec 3.6.2]; it is scaled by a half in some conventions [@Bendokat2020]. Covariant derivative (w.r.t. the Levi-Civita connection): $\overline{\nabla_Z W(X)} = P_X (\bar{\nabla}_{Z_X} W_X(X))$. Exponential map: $\exp_X(Z_X) = [X (X^T X)^{-1/2} V \cos(\Sigma) + U \sin(\Sigma)]$, where $Z_X = U \Sigma V^T$ is a thin SVD, i.e. $U \in V_{k, n}$, $\Sigma = \text{diag}(\sigma)$, $\sigma \in \mathbb{R}^k_{+\downarrow}$, and $V \in O(k)$. Parallel transport along geodesics: $(P_W(Z))_Y = [-X V \sin(\Sigma) + U \cos(\Sigma)] U^T Z_X + (I - U U^T) Z_X$, where $X \in V_{k, n}$, $Y = \exp_X(W_X)$, and $W_X = U \Sigma V^T$ is a thin SVD. Retraction: $R_{[X]}(Z) = [X + Z_X]$. Vector transport (associated with this retraction): $(T_W(Z))_Y = P_Y Z_X$, where $Y = X + W_X$; differentiated retraction and horizontal projection give the same result. Riemannian distance function: $d_g(\text{Span}(X), \text{Span}(Y)) = \|\theta\|$, where $\theta$ is the vector of the k (non-decreasing) principal angles between the two subspaces. Note that the gap metric $\|X X^T - Y Y^T\|_2$, where $X, Y \in V_{k, n}$, equals $\sin(\theta_k)$, the sine of the largest principal angle.
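
A hedged NumPy sketch of the exponential map and the distance via principal angles, using orthonormal representatives so that $(X^T X)^{-1/2} = I_k$ (function names are ad hoc; the trailing $V^T$ in the code only changes the representative, not the subspace):

```python
import numpy as np

def grass_exp(X, Z):                           # X in V_{k,n}, Z horizontal at X
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return X @ Vt.T @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt

def grass_dist(X, Y):                          # ||theta||, theta = principal angles
    c = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.linalg.norm(np.arccos(np.clip(c, -1.0, 1.0)))

rng = np.random.default_rng(7)
n, k = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
Z = (np.eye(n) - X @ X.T) @ rng.standard_normal((n, k))     # horizontal lift at X
Z *= 0.3 / np.linalg.norm(Z)                                # stay within the injectivity domain

Y = grass_exp(X, Z)
assert np.allclose(Y.T @ Y, np.eye(k))                      # Y is again an orthonormal representative
assert np.isclose(grass_dist(X, Y), np.linalg.norm(Z))      # geodesic distance equals ||Z||_F
```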

Curvature tensor...

Riemannian logarithm. Every point of the Grassmannian can be mapped to the tangent space at any reference point: if the point is in the injectivity domain of the reference point, the tangent vector is unique; if it is in the cut locus of the reference point, there may be two or a continuum of such tangent vectors. With $P, F \in G_{k,n}$, let the Grassmann logarithm be $\log_P(F) = \Delta$, where $\Delta \in T_P G_{k,n}$; note that $\Delta$ is not unique when $F$ is in the cut locus of $P$. Given representations $U, Y \in V_{k,n}$, let $\Delta_U \in H_U$ be the horizontal lift of $\Delta$ to $U$. A modified algorithm for the Grassmann logarithm [@Bendokat2020]: $Y^T U = \tilde{Q} \tilde{S} \tilde{R}^T$, an order-k SVD; $\bar{Y} = Y (\tilde{Q} \tilde{R}^T)$ and $\bar{U} = U (\tilde{R} \tilde{S} \tilde{R}^T)$; $\bar{Y} - \bar{U} = \hat{Q} \hat{S} \hat{R}^T$, an n-by-k thin SVD; then $\Delta_U = \hat{Q} \arcsin(\hat{S}) \hat{R}^T$. Notice that many other quantities can be computed from intermediate results of this algorithm: principal angles $\theta = \arccos \tilde{s}$; principal bases $\tilde{U} = U \tilde{R}$ and $\tilde{Y} = Y \tilde{Q}$; Riemannian distance $d_g(P, F) = \|\theta\|$.
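
A hedged NumPy sketch of this modified logarithm (variable names follow the tilde/hat notation above; not a transcription of any particular library routine). The check uses a random second subspace, whose principal angles relative to $U$ are almost surely below $\pi/2$, and verifies that $\Delta_U$ is horizontal with singular values equal to the principal angles, so that $\|\Delta_U\| = d_g(P, F)$:

```python
import numpy as np

def grass_log(U, Y):                                   # U, Y in V_{k,n} represent [U], [Y]
    Qt, st, RtT = np.linalg.svd(Y.T @ U)               # Y^T U = Qt diag(st) Rt^T
    Y_bar = Y @ (Qt @ RtT)                             # align Y: U^T Y_bar becomes symmetric PSD
    Qh, sh, RhT = np.linalg.svd(Y_bar - U @ (U.T @ Y_bar), full_matrices=False)
    return Qh @ np.diag(np.arcsin(np.clip(sh, -1.0, 1.0))) @ RhT

rng = np.random.default_rng(8)
n, k = 6, 2
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))

D = grass_log(U, Y)
theta = np.arccos(np.clip(np.linalg.svd(U.T @ Y, compute_uv=False), -1.0, 1.0))
assert np.allclose(U.T @ D, 0)                         # Delta_U is horizontal at U
assert np.allclose(np.sort(np.linalg.svd(D, compute_uv=False)),
                   np.sort(theta))                     # singular values = principal angles
```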

Probability models: matrix Bingham [@Chikuse2003].

Matrix-valued Mapping

One may consider matrix-valued mappings $f: X \mapsto M_{m,n}(\mathbb{F})$. A matrix pencil, or pencil of matrices, is a one-parameter family of matrices, $f: \mathbb{F} \mapsto M_{m,n}(\mathbb{F})$. The most common form is a linear family of square matrices, $f(\lambda) = A + \lambda B$ with $A, B \in M_n$, sometimes written as $(A, B)$. Since $M_{m,n}(\mathbb{R}) \cong \mathbb{R}^{mn}$, the linear matrix pencil $A + \lambda B$ (with $B \ne 0$) corresponds to a line in the matrix manifold $M_{m,n}$. More generally, a matrix pencil of degree l is a degree-l polynomial family of matrices: $f(\lambda) = \sum_{i=0}^l \lambda^i A_i$, where $A_l \ne 0$. Matrix pencils have applications in numerical linear algebra, control theory, etc.

A regular matrix pencil is a square pencil whose value is nonsingular for some $\lambda$, i.e. $\det f(\lambda)$ does not vanish identically. A singular matrix pencil is one that is not regular. An eigenvalue of a matrix pencil is a complex number at which the value of the pencil is singular: $f(\lambda) \notin \text{GL}_n$. This may be the reason that matrix pencils use $\lambda$ as the variable.
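
A minimal NumPy sketch of pencil eigenvalues for a regular linear pencil, under the extra assumption that $B$ is invertible: $\det(A + \lambda B) = 0$ exactly when $\lambda$ is an eigenvalue of $-B^{-1} A$ (names are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))   # B invertible almost surely

pencil_eigs = np.linalg.eigvals(-np.linalg.solve(B, A))           # eigenvalues of -B^{-1} A
for lam in pencil_eigs:
    print(abs(np.linalg.det(A + lam * B)))                        # ~0: f(lambda) is singular there
```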


🏷 Category=Algebra Category=Topology