Matrix-variate and Matrix-valued Functions

This article discusses mappings whose domain or codomain contains a space of matrices.

Matrix-variate function

Invariant matrix-variate function

Invariant theory of matrices studies matrix-variate functions that are invariant under the transformations of a given linear group. Common invariant functions are polynomials in the matrix entries; note that these are different from matrix polynomials.

Conjugation-invariant function, aka similarity-invariant function, of a square matrix is a function that is invariant under similarity transforms: $f: M_{n,n}(\mathbb{C}) \mapsto \mathbb{C}$, $f(A) = f(P A P^{-1})$, $\forall P \in \text{GL}_n$. If the function is continuous, then it is conjugation-invariant if and only if its value does not depend on the order of a product: $f \in C^0(M_{n,n}(\mathbb{C}))$, $f(A B) = f(B A)$. A continuous function of square matrices is conjugation-invariant if and only if it can be written as a symmetric function of the eigenvalues (counted with algebraic multiplicity): for any $A \in M_{n,n}(\mathbb{C})$, let $A = P J P^{-1}$ be a Jordan canonical form, denote eigenvalues $\lambda = \text{diag}(J)$, then there is an n-variate function $g: \mathbb{C}^n \mapsto \mathbb{C}$ such that $f(A) = g(\lambda)$. Continuity matters here: without it, a conjugation-invariant function may also depend on the Jordan structure, as the rank does. Some common similarity/conjugation-invariant functions: (1) trace, $\text{tr}(A) = \sum_{i=1}^n \lambda_i$, computed in n-1 additions; (2) determinant, $\det(A) = \prod_{i=1}^n \lambda_i$, computed in $\mathcal{O}(n^\alpha)$ flops, where $\alpha \in [2, 3]$ is the exponent of matrix multiplication, which has an asymptotic upper bound of about 2.373 [@LeGall2014]; (3) trace of a power, $\text{tr}(A^k) = \sum_{i=1}^n \lambda_i^k$, where $k \in \mathbb{N}$, computed in $\mathcal{O}(k n^\alpha)$ flops due to (k-1) matrix multiplications; (4) spectral radius, $\rho(A) = \max_{1 \le i \le n} |\lambda_i|$, estimated iteratively (e.g. by power iteration) in $\mathcal{O}(n^2)$ flops per iteration.
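
The invariance of the functions listed above can be checked numerically on a small case. The following is a minimal sketch in pure Python, assuming 2-by-2 real matrices stored as lists of rows; the helper names (`matmul`, `det2`, `inv2`, `trace_power`, `entry_sum`) and the matrices `A` and `P` are illustrative choices, not part of any library.

```python
# Illustrative check of similarity invariance on a 2x2 example.
# All helper names and the test matrices are assumptions made for this sketch.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    """Sum of the diagonal entries (n - 1 additions)."""
    return sum(A[i][i] for i in range(len(A)))

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of a nonsingular 2x2 matrix."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def trace_power(A, k):
    """tr(A^k), using k - 1 matrix multiplications."""
    power = A
    for _ in range(k - 1):
        power = matmul(power, A)
    return trace(power)

def entry_sum(A):
    """Sum of all entries: a function that is NOT similarity-invariant."""
    return sum(sum(row) for row in A)

A = [[2.0, 1.0], [0.0, 3.0]]          # eigenvalues 2 and 3
P = [[1.0, 2.0], [1.0, 3.0]]          # invertible, det(P) = 1
B = matmul(matmul(P, A), inv2(P))     # B = P A P^{-1}

print(trace(A), trace(B))                     # equal
print(det2(A), det2(B))                       # equal
print(trace_power(A, 3), trace_power(B, 3))   # equal: 2^3 + 3^3
print(entry_sum(A), entry_sum(B))             # differ
```

The sum of all entries is included as a contrast: it is a polynomial in the matrix entries but changes under similarity, so it cannot be written as a function of the eigenvalues.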

Orthogonal-invariant function of a real matrix, or unitary-invariant function of a complex matrix, is a function that is invariant under left and right actions of orthogonal (unitary) matrices: $f: M_{m,n} \mapsto \mathbb{F}$, $f(M) = f(Q_m M Q_n)$, $\forall Q_m \in O(m)$, $\forall Q_n \in O(n)$. A matrix-variate function is orthogonal/unitary-invariant if and only if it can be written as a function of singular values: for any $M \in M_{m,n}(\mathbb{F})$, $m \ge n$, let $M = U \Sigma V^*$ be a full singular value decomposition, denote singular values $\sigma = \text{diag}(\Sigma) \in \mathbb{R}^n_{+\downarrow}$, then there is an n-variate function $g: \mathbb{R}^n_{+\downarrow} \mapsto \mathbb{R}$ such that $f(M) = g(\sigma)$. Some common orthogonal-invariant functions: (1) Frobenius norm, $\|M\|_F = (\sum_{i=1}^n \sigma_i^2)^{1/2}$, computed from the entries in (m n) multiplications, (m n - 1) additions, and one square root; (2) spectral norm, $\|M\|_2 = \sigma_1$, estimated iteratively (e.g. by power iteration) in $\mathcal{O}(m n)$ flops per iteration. Note that the condition number of a nonsingular matrix w.r.t. the spectral norm, $\kappa(A) = \|A\|_2 \|A^{-1}\|_2$ where $A \in \text{GL}_n$, equals the ratio of the largest and the smallest singular values: $\kappa(A) = \sigma_1 / \sigma_n$, and therefore it is orthogonal-invariant. If A is a normal matrix, then $\sigma_1 = \rho(A)$ and $\sigma_n = 1 / \rho(A^{-1})$, and therefore its condition number depends only on its eigenvalues: $\kappa(A) = \rho(A) \rho(A^{-1})$.
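
As a numerical illustration of orthogonal invariance, the sketch below multiplies a 3-by-2 matrix by rotation matrices on both sides and compares the Frobenius and spectral norms before and after. It is a minimal pure-Python sketch: the helper names, the test matrix, the rotation angles, and the use of power iteration on $M^T M$ for the spectral norm are all assumptions made for this example.

```python
import math

# Illustrative check that the Frobenius and spectral norms are unchanged by
# orthogonal transformations. Helper names and test data are assumptions.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius(M):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in M for x in row))

def spectral_norm(M, iters=200):
    """Largest singular value, via power iteration on the Gram matrix M^T M.

    Assumes M is nonzero and the start vector is not orthogonal to the
    dominant eigenvector of M^T M.
    """
    G = matmul(transpose(M), M)
    v = [1.0] * len(G)
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in G]
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    Gv = [sum(g * x for g, x in zip(row, v)) for row in G]
    return math.sqrt(sum(a * b for a, b in zip(v, Gv)))  # sqrt of Rayleigh quotient

def rotation2(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rotation3_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

M = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
M2 = matmul(matmul(rotation3_z(0.3), M), rotation2(0.7))   # Q_m M Q_n

print(frobenius(M), frobenius(M2))          # equal up to rounding
print(spectral_norm(M), spectral_norm(M2))  # equal up to rounding
```

For this matrix the squared Frobenius norm is 91, which is also the trace of $M^T M$ and hence the sum of the squared singular values.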

Hypergeometric function

Hypergeometric function with a matrix argument ${}_{p}F_{q}: \mathbb{R}^p \times \mathbb{R}^q \times \mathcal{S}(n) \mapsto \mathbb{R}$, where $p, q \in \mathbb{N}$ and $\mathcal{S}(n)$ is the space of order-n real symmetric matrices, is a family of functions recursively defined via the multivariate Laplace and inverse Laplace transforms [@Herz1955]: let ${}_{0}F_{0}(S) = \exp(\text{tr}(S))$, $a = (a_i)_{i=1}^p$, and $b = (b_j)_{j=1}^q$, define ${}_{p+1}F_{q}((a, c), b, Y) = \Gamma_n(c)^{-1} \int_{\mathcal{S}_+(n)} |S|^{c - (n+1)/2} ~{}_{0}F_{0}(-S) ~{}_{p}F_{q}(a, b, Y S) (d S)$ and ${}_{p}F_{q+1}(a, (b, c), S) = (2 \pi i)^{-n(n+1)/2} 2^{n(n-1)/2} \Gamma_n(c) \int_{Y_0 + i \mathcal{S}(n)} |Y|^{-c} ~{}_{0}F_{0}(Y) ~{}_{p}F_{q}(a, b, S Y^{-1}) (d Y)$, where $\mathcal{S}_+(n)$ is the cone of positive definite matrices, $\Gamma_n$ is the multivariate gamma function, $|\cdot|$ denotes the determinant, (d S) denotes the Lebesgue measure, and $Y_0$ is an arbitrary positive definite matrix. It can also be defined for complex arguments and values. Properties: (1) The function depends on the symmetric matrix argument only through its (real) eigenvalues, and therefore it is invariant under orthogonal similarity: ${}_{p}F_{q}(a, b, S) = {}_{p}F_{q}(a, b, \Lambda(S))$; (2) For any $S \in \mathcal{S}(n)$ and $A \in M_n$, let $S = V \Lambda V^T$ be an eigendecomposition and $\tilde{A} = V^T A V$, we have ${}_{p}F_{q}(a, b, S A) = {}_{p}F_{q}(a, b, \Lambda \tilde{A})$.

Special cases: ${}_{1}F_{0}(a, S) = |I_n - S|^{-a}$; ${}_{0}F_{1}(m/2, X X^T) = \int_{O(m)} \exp(\text{tr}(2 X H)) [d H]$, where [d H] denotes the normalized invariant measure on O(m).

It has a series representation using zonal polynomials [@Constantine1963]: Let $\lambda \vdash l$ denote that λ is a partition of a natural number l, written in non-increasing order: $\lambda \in \mathbb{Z}^m_{+\downarrow} \cap (l \Delta^{m-1})$, $m \le l \in \mathbb{N}$; note that with $\lambda = (l_i)_{i=1}^m$, we have $l_1 \ge \cdots \ge l_m > 0$ and $\sum_{i=1}^m l_i = l$. Define the generalized hypergeometric coefficient $(a)_\lambda = \prod_{i=1}^m (a - (i-1)/2)_{l_i}$, where a is a scalar, $(a)_l = \prod_{i=0}^{l-1} (a + i)$ is the rising factorial, and $(a)_0 = 1$; for a positive integer a, $(a)_l = {}_{a+l-1} P_{l}$ equals the number of l-permutations of a+l-1. A hypergeometric function with a matrix argument can be written as: ${}_{p}F_{q}(a, b, S) = \sum_{l=0}^\infty \sum_{\lambda \vdash l} \prod_{i=1}^p (a_i)_\lambda \prod_{j=1}^q (b_j)_\lambda^{-1} C_\lambda(S) / l!$, where $C_\lambda: \mathcal{S}_+(n) \mapsto \mathbb{R}$ is a normalized zonal polynomial. For more on this topic, see [@Chikuse2003, A.6-7].
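
The combinatorial ingredients of the series, namely the partitions $\lambda \vdash l$, the rising factorial $(a)_l$, and the generalized coefficient $(a)_\lambda$, are easy to compute exactly. The following is a minimal sketch with hypothetical function names (`pochhammer`, `partitions`, `gen_pochhammer`); it does not evaluate zonal polynomials, which are the hard part of the series.

```python
from fractions import Fraction

# Illustrative helpers for the coefficients of the zonal-polynomial series.
# Function names are assumptions made for this sketch.

def pochhammer(a, l):
    """Rising factorial (a)_l = a (a+1) ... (a+l-1), with (a)_0 = 1."""
    result = Fraction(1)
    for i in range(l):
        result *= a + i
    return result

def partitions(l, max_part=None):
    """Yield the partitions of l as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = l
    if l == 0:
        yield ()
        return
    for first in range(min(l, max_part), 0, -1):
        for rest in partitions(l - first, first):
            yield (first,) + rest

def gen_pochhammer(a, lam):
    """Generalized coefficient (a)_lambda = prod_{i=1}^m (a - (i-1)/2)_{l_i}."""
    result = Fraction(1)
    for i, l_i in enumerate(lam, start=1):
        result *= pochhammer(Fraction(a) - Fraction(i - 1, 2), l_i)
    return result

print(list(partitions(3)))                    # the three partitions of 3
print(pochhammer(3, 2))                       # 3 * 4
print(gen_pochhammer(2, (2, 1)))              # (2)_2 * (3/2)_1
print(sum(1 for _ in partitions(10)))         # number of partitions of 10
```

Exact rational arithmetic is used because the shifts $(i-1)/2$ produce half-integers; floating point would work equally well for moderate l.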

The computation of hypergeometric functions with a matrix argument is expensive in general: with an order-n Hermitian matrix argument, its m-th order truncation can be computed in $\mathcal{O}(P_{mn}^2 n)$ time, where $P_{mn}$ is a count of the partitions involved and $P_{mn}^2 = \mathcal{O}(\exp(2\pi \sqrt{2m/3}))$ is sub-exponential in m [@Koev2006].

Matrix function

Matrix function is a transformation on a space of square matrices: $f: M_{n,n} \mapsto M_{n,n}$. Every analytic function can be used to define a matrix function, see e.g. @Higham2008. Matrix polynomial $p(A)$ is a polynomial evaluated at a square matrix: $p(A) = \sum_{i=0}^d c_i A^i$ with scalar coefficients $c_i$ and $A^0 = I$.

Matrix exponential $e^A$ of an order-n matrix is the order-n matrix defined by the Taylor series of the scalar exponential function, substituting the variable with the matrix: $\exp: M_n(\mathbb{C}) \mapsto \text{GL}_n(\mathbb{C})$, or $\exp: M_n(\mathbb{R}) \mapsto \text{GL}_n^+(\mathbb{R})$, $e^A = \sum_{k=0}^{\infty} A^k / k!$. The series converges for all square matrices, and the resulting matrix is invertible. The matrix exponential shares many, but not all, properties of the scalar exponential. Some properties: (1) $\exp(A^*) = \exp(A)^*$, where $*$ can be the transpose, the conjugate, or the conjugate transpose; (2) if $A B = B A$ then $\exp(A + B) = \exp(A) \exp(B)$; (3; as a corollary) $\exp(A) \exp(-A) = I$; (4) $\exp(0) = I$; (5) $\det(\exp(A)) = \exp(\text{tr}(A))$; (6) $\exp(A \oplus B) = \exp(A) \otimes \exp(B)$, where $\oplus$ and $\otimes$ are the Kronecker sum and product, respectively.
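
A small worked example shows why property (2) needs commutativity. The two nilpotent matrices below are chosen purely for illustration:

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad A B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \ne \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = B A.$$

Since $A^2 = B^2 = 0$, the series terminate: $e^A = I + A$ and $e^B = I + B$. Since $(A + B)^2 = I$, the series for $e^{A+B}$ splits into even and odd powers. Therefore

$$e^A e^B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad e^{A+B} = \cosh(1) I + \sinh(1) (A + B) = \begin{pmatrix} \cosh 1 & \sinh 1 \\ \sinh 1 & \cosh 1 \end{pmatrix}.$$

The two results differ, yet both have determinant $1 = \exp(\text{tr}(A + B))$, as property (5) requires.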

The exponential map is analytic: it is holomorphic (complex analytic) in the complex case, $\exp \in C^\omega(M_n(\mathbb{C}), \text{GL}_n(\mathbb{C}))$, and real analytic in the real case, $\exp \in C^\omega(M_n(\mathbb{R}), \text{GL}_n^+(\mathbb{R}))$. In fact, a more general result is that the exponential map of a Lie group is analytic: $\exp \in C^\omega(\mathfrak{g}, G)$, where $\mathfrak{g}$ is the Lie algebra associated with the group $G$. Since $\text{GL}_n(\mathbb{C})$ is a Lie group with Lie algebra $M_n(\mathbb{C})$, and its exponential map is the matrix exponential, it follows that the matrix exponential is analytic.

Given a Jordan decomposition of a matrix, its matrix exponential can be reduced to the exponential of its Jordan canonical form: if $A = T J T^{-1}$ then $e^A = T e^J T^{-1}$. Note that, for a Jordan block, $\exp\left[t \begin{pmatrix}\lambda & 1 \\ 0 & \lambda \end{pmatrix}\right] = e^{\lambda t} \begin{pmatrix}1 & t \\ 0 & 1 \end{pmatrix}$, and for the real form of a complex-conjugate eigenvalue pair $a \pm ib$, which is similar to $\begin{pmatrix}a+ib & 0 \\ 0 & a-ib \end{pmatrix}$, $\exp\left[t \begin{pmatrix}a & -b \\ b & a \end{pmatrix}\right] = e^{a t} \begin{pmatrix}\cos bt & -\sin bt \\ \sin bt & \cos bt \end{pmatrix}$. Computation. The matrix exponential is often computed with the scaling and squaring method [@Higham2005; @Al-Mohy2010]. The computational cost is $\mathcal{O}(n^3)$, with a coefficient (usually on the scale of 50) that depends on approximation order and matrix norm.
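
The idea behind scaling and squaring is the identity $e^A = (e^{A / 2^s})^{2^s}$: scale the matrix until its norm is small, approximate the exponential of the scaled matrix, then square the result s times. The following is a minimal pure-Python sketch of that idea, assuming a truncated Taylor series for the inner approximation and a simple norm-based choice of s. Production implementations typically use Padé approximants and a more careful choice of parameters, so this is not the algorithm of the cited papers. The function names and parameters are assumptions.

```python
import math

# Simplified scaling-and-squaring sketch (Taylor-based, for illustration only).

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, terms=18):
    """Approximate exp(A) for a square matrix A given as a list of rows."""
    n = len(A)
    # Scaling: choose s so that the infinity norm of A / 2^s is at most 1/2.
    norm = max(sum(abs(x) for x in row) for row in A)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in A]
    # Truncated Taylor series: sum_{k < terms} scaled^k / k!.
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    # Squaring: undo the scaling with s repeated squarings.
    for _ in range(s):
        result = matmul(result, result)
    return result

# Jordan block: exp(t [[lam, 1], [0, lam]]) = e^{lam t} [[1, t], [0, 1]].
lam, t = 0.5, 2.0
E_jordan = expm([[lam * t, t], [0.0, lam * t]])

# Real form of a + ib: exp(t [[a, -b], [b, a]]) = e^{a t} * rotation by b t.
a, b = 0.1, 1.5
E_rot = expm([[a * t, -b * t], [b * t, a * t]])

print(E_jordan)
print(E_rot)
```

Each squaring and each Taylor term costs one matrix multiplication, which is where the $\mathcal{O}(n^3)$ cost with a moderately large constant comes from.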

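Matrix logarithm $\log(A)$ is an inverse of the matrix exponential: a logarithm of $A$ is any matrix $X$ such that $e^X = A$. Because the matrix exponential is not injective, logarithms are not unique; for a matrix with no eigenvalues on the closed negative real axis, the principal logarithm is the unique logarithm whose eigenvalues have imaginary parts in $(-\pi, \pi)$, see e.g. @Higham2008. The matrix logarithm is often computed with the inverse scaling and squaring method, which essentially carries out the scaling and squaring method for the matrix exponential in reverse order [@Al-Mohy2012].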

Matrix-valued function

One may consider matrix-valued mappings $f: X \mapsto M_{m,n}(\mathbb{F})$. Matrix pencil or pencil of matrices is a one-parameter family of matrices, $f: \mathbb{F} \mapsto M_{m,n}(\mathbb{F})$. The most common form is a linear family of square matrices: $f(\lambda) = A - \lambda B$, where $A, B \in M_n$; sometimes it is written as (A, B) and called a matrix pair. Since $M_{m,n}(\mathbb{R}) \cong \mathbb{R}^{mn}$ as vector spaces, a linear matrix pencil $A - \lambda B$ with $B \ne 0$ is a line in the matrix space $M_{m,n}$. Quadratic matrix pencil has the form $f(\lambda) = \lambda^2 A + \lambda B + C$, where $A \ne 0$. More generally, matrix pencil of degree l is a degree-l polynomial family of matrices: $f(\lambda) = \sum_{i=0}^l \lambda^i A_i$, where $A_l \ne 0$. Matrix pencils have applications in numerical linear algebra, control theory, etc.

Regular matrix pencil is a square matrix pencil whose value is not always singular, i.e., $\det f(\lambda)$ is not identically zero. Singular matrix pencil is one that is not regular, i.e., whose value is always singular. Eigenvalue of a matrix pencil is a complex number where the value of the matrix pencil is singular: $f(\lambda) \notin \text{GL}_n$. This may be the reason that matrix pencils use $\lambda$ as the variable. Eigenvector associated with an eigenvalue of a matrix pencil is a nonzero vector x that satisfies: $f(\lambda) x = 0$.
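
For a 2-by-2 linear pencil $A - \lambda B$, these definitions can be made concrete: $\det(A - \lambda B)$ is a polynomial of degree at most two in $\lambda$, its roots are the finite eigenvalues, and the pencil is singular exactly when all coefficients vanish. The following is a minimal sketch under that 2-by-2 assumption; the function name `pencil_eigenvalues_2x2` is hypothetical, and general pencils call for a numerically stable method such as the QZ algorithm instead of root-finding on the determinant.

```python
import cmath

# Illustrative 2x2 pencil eigenvalue solver via the characteristic polynomial
# det(A - lam*B) = det(B) lam^2 + c1 lam + det(A). Names are assumptions.

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def pencil_eigenvalues_2x2(A, B):
    """Finite eigenvalues of the pencil A - lam*B for 2x2 matrices A and B.

    Raises ValueError if the pencil is singular, i.e. if det(A - lam*B)
    vanishes for every lam.
    """
    c2 = det2(B)
    c1 = -(A[0][0] * B[1][1] + A[1][1] * B[0][0]
           - A[0][1] * B[1][0] - A[1][0] * B[0][1])
    c0 = det2(A)
    if c2 == 0 and c1 == 0:
        if c0 == 0:
            raise ValueError("singular pencil: det(A - lam*B) is identically zero")
        return []                      # regular pencil with no finite eigenvalue
    if c2 == 0:
        return [-c0 / c1]              # one finite eigenvalue (B is singular)
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return [(-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
diag23 = [[2.0, 0.0], [0.0, 3.0]]
rank_one = [[1.0, 0.0], [0.0, 0.0]]

print(pencil_eigenvalues_2x2(diag23, identity))    # ordinary eigenvalues of diag(2, 3)
print(pencil_eigenvalues_2x2(identity, rank_one))  # a single finite eigenvalue
```

With $B = I$ the pencil eigenvalues reduce to the ordinary eigenvalues of A. With a singular B the determinant drops in degree, so fewer finite eigenvalues exist.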

Generalized eigenvalue problem concerns finding the eigenvalues $\lambda$ of a matrix pair (A, B) and the associated nonzero eigenvectors x such that $A x = \lambda B x$. A generalized eigenvalue may also be defined as a pair of complex numbers $(\alpha, \beta) \ne (0, 0)$ such that $\beta A x = \alpha B x$, which also covers infinite eigenvalues when B is singular (see e.g. [@Stewart1990], [@Saad2011, Ch. 9]). Quadratic eigenvalue problem concerns finding eigenpairs of a quadratic matrix pencil. This arises in structural dynamics, where one is interested in finding $\lambda$ and $x \ne 0$ such that $(\lambda^2 M + \lambda C + K) x = 0$, with M, C, and K the mass, damping, and stiffness matrices. In practice, the damping term C is often neglected or simplified, and only the generalized eigenvalue problem $(\mu M + K) x = 0$, with $\mu = \lambda^2$, is solved.
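
When the damping term is kept, the quadratic eigenvalue problem can still be reduced to a generalized eigenvalue problem of twice the size. One standard choice, shown here as a sketch (other linearizations exist), introduces the stacked vector $z = (x, \lambda x)$:

$$\begin{pmatrix} 0 & I \\ -K & -C \end{pmatrix} \begin{pmatrix} x \\ \lambda x \end{pmatrix} = \lambda \begin{pmatrix} I & 0 \\ 0 & M \end{pmatrix} \begin{pmatrix} x \\ \lambda x \end{pmatrix}.$$

The first block row is the identity $\lambda x = \lambda x$, and the second block row reads $-K x - \lambda C x = \lambda^2 M x$, which is exactly $(\lambda^2 M + \lambda C + K) x = 0$. An order-n quadratic pencil therefore corresponds to an order-2n matrix pair and has 2n eigenvalues, counting infinite ones when M is singular.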