Circular statistics concerns statistical data analysis on the unit circle.

Basic Concepts

The unit circle $\mathbb{S}^1$ is isometric to the submanifold of the plane defined by $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$, where each point on the circle is identified with a pair of coordinates, $p \leftrightarrow (x, y)$. A point on the unit circle is often represented as an angle relative to a reference direction, such as the positive x axis, measured by counterclockwise rotation. This angle $\theta$ is uniquely defined in $[0, 2\pi)$, while $\theta + 2 b \pi$ represents the same point for any integer $b$. Alternatively, ignoring the geometry, the unit circle can be identified with the complex numbers of unit modulus, $\{z \in \mathbb{C} : |z| = 1\}$. In this case, each point is related to its angle representation via $z = e^{i \theta}$, so the angle $\theta$ can also be interpreted as a phase or complex argument. In summary, every point $p \in \mathbb{S}^1$ can be represented uniquely as Cartesian coordinates $(\cos\theta, \sin\theta)$ or as a complex number $e^{i \theta}$, or nonuniquely as a phase angle $\theta$.
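The three representations can be converted into one another with a few lines of code. The following is a minimal sketch using only Python's standard library, assuming angles in radians; the function names are illustrative and do not come from any particular package.

```python
import cmath
import math

TWO_PI = 2 * math.pi

def angle_to_point(theta):
    """Cartesian coordinates (cos(theta), sin(theta)) of the point at angle theta."""
    return (math.cos(theta), math.sin(theta))

def angle_to_complex(theta):
    """Unit-modulus complex number e^{i*theta}."""
    return cmath.exp(1j * theta)

def wrap_angle(theta):
    """Representative of theta in [0, 2*pi).

    Floating-point rounding can return 2*pi itself for tiny negative inputs.
    """
    return theta % TWO_PI

def point_to_angle(x, y):
    """Angle in [0, 2*pi) of a nonzero point (x, y), up to rounding."""
    return wrap_angle(math.atan2(y, x))
```

Angles that differ by a multiple of $2\pi$ map to the same point and the same complex number, which is the nonuniqueness noted above.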

By embedding the unit circle in the plane, a circular random variable becomes a random vector of unit length. We call the sum of multiple vectors in a vector space their resultant vector: $\mathbf{r} = \sum_{i=1}^n \mathbf{v}_i$. Thus, the resultant vector of a random sample $(\mathbf{x}_i)_{i=1}^n$ of an embedded circular random variable is $\mathbf{r} = \sum_{i=1}^n \mathbf{x}_i$, with coordinates $(C, S) := (\sum_{i=1}^n \cos\theta_i, \sum_{i=1}^n \sin\theta_i)$, where $\theta_i$ is the angle of $\mathbf{x}_i$. The resultant length $R = |\mathbf{r}|$ of the sample can be computed as $R = \sqrt{C^2 + S^2}$.
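As a small illustration of these definitions, the sketch below computes $C$, $S$, and $R$ for a list of angles. It assumes angles in radians, and the function name `resultant` is only illustrative.

```python
import math

def resultant(thetas):
    """Return (C, S, R): resultant coordinates and resultant length of a sample of angles."""
    C = sum(math.cos(t) for t in thetas)
    S = sum(math.sin(t) for t in thetas)
    R = math.hypot(C, S)  # sqrt(C**2 + S**2)
    return C, S, R
```

Two antipodal observations, such as $0$ and $\pi$, cancel each other and give a resultant length of zero up to rounding.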

The geodesic distance $d_g(p, q)$ between two points on the unit circle is the length of the shortest arc connecting them. Representing the points as directions or angles, it is the smallest amount of rotation required to align them: $d_g(\theta, \phi) = \min\{(\theta - \phi) \bmod 2 \pi, (\phi - \theta) \bmod 2 \pi\}$. If both angles are in $[0, 2\pi)$ and $d = |\theta - \phi|$, the geodesic distance can be computed as $d_g = \min(d, 2 \pi - d)$ or, equivalently, $d_g = \pi - |\pi - d|$.
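Both formulas are easy to check numerically. The sketch below assumes angles in radians; the first function reduces the difference modulo $2\pi$ so that it accepts any real angles, while the second assumes both angles already lie in $[0, 2\pi)$. The function names are illustrative.

```python
import math

TWO_PI = 2 * math.pi

def geodesic_distance(theta, phi):
    """Shortest arc length between two angles; any real representatives are accepted."""
    d = abs(theta - phi) % TWO_PI
    return min(d, TWO_PI - d)

def geodesic_distance_alt(theta, phi):
    """Equivalent form pi - |pi - d|, assuming both angles lie in [0, 2*pi)."""
    d = abs(theta - phi)
    return math.pi - abs(math.pi - d)
```

The result always lies in $[0, \pi]$, with $\pi$ attained exactly for antipodal points.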

Summary Statistics

Extrinsic measures of location and dispersion

The extrinsic center of mass $\bar{\mathbf{x}}$, or mean resultant vector $\bar{\mathbf{r}}$, is the resultant vector divided by the sample size: $\bar{\mathbf{x}} = \bar{\mathbf{r}} = \mathbf{r} / n$. Its length is the mean resultant length $\bar{R} = R / n$, and its direction is the mean direction $\bar{\theta} = \arg\bar{\mathbf{r}}$ (defined if $\bar{R} \ne 0$). The mean resultant vector has coordinates $(\bar{C}, \bar{S}) := n^{-1} (C, S)$, where the sample cosine moment is $\bar{C} = n^{-1} \sum_{i=1}^n \cos\theta_i$ and the sample sine moment is $\bar{S} = n^{-1} \sum_{i=1}^n \sin\theta_i$.
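A minimal sketch of these quantities follows, assuming angles in radians. The tolerance `tol` used to decide that $\bar{R}$ is numerically zero is an assumption of this example, not part of the definition, and the function name is illustrative.

```python
import math

def mean_direction_and_length(thetas, tol=1e-12):
    """Return (theta_bar, R_bar); theta_bar is None when R_bar is numerically zero."""
    n = len(thetas)
    C_bar = sum(math.cos(t) for t in thetas) / n  # sample cosine moment
    S_bar = sum(math.sin(t) for t in thetas) / n  # sample sine moment
    R_bar = math.hypot(C_bar, S_bar)              # mean resultant length
    if R_bar < tol:
        return None, R_bar                        # mean direction undefined
    theta_bar = math.atan2(S_bar, C_bar) % (2 * math.pi)
    return theta_bar, R_bar
```

The mean direction differs from the arithmetic mean of the angles: for the sample $\{0.1, 2\pi - 0.1\}$ the arithmetic mean is $\pi$, whereas the mean direction is $0$ modulo $2\pi$.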

The mean direction $\bar{\theta}$ is a measure of location. For Euclidean data, centering a sample eliminates the resultant: $\sum_{i=1}^n (x_i - \bar{x}) = 0$. For circular data, centering a sample rotates the resultant onto the positive x axis: $\left(\sum_{i=1}^n \cos(\theta_i - \bar{\theta}), \sum_{i=1}^n \sin(\theta_i - \bar{\theta})\right) = (R, 0)$.
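The circular identity follows from the angle-difference formulas together with the polar form of the resultant, $(C, S) = (R \cos\bar{\theta}, R \sin\bar{\theta})$:

$$\sum_{i=1}^n \cos(\theta_i - \bar{\theta}) = C \cos\bar{\theta} + S \sin\bar{\theta} = R (\cos^2\bar{\theta} + \sin^2\bar{\theta}) = R,$$

$$\sum_{i=1}^n \sin(\theta_i - \bar{\theta}) = S \cos\bar{\theta} - C \sin\bar{\theta} = R (\sin\bar{\theta} \cos\bar{\theta} - \cos\bar{\theta} \sin\bar{\theta}) = 0.$$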

The mean resultant length $\bar{R}$ is a measure of concentration. Since the extrinsic center of mass lies in the closed unit disk, we have $\bar{R} \in [0, 1]$. It is close to one if and only if the sample is tightly clustered; it is close to zero if the sample is widely dispersed or antipodally symmetric. Sample circular variance $V$ is a measure of dispersion, defined as the complement of the mean resultant length with respect to one: $V = 1 - \bar{R}$; clearly it also lies in $[0, 1]$. For Euclidean data, sample variance is the minimum over $a$ of the sample mean of the squared distance $(x_i - a)^2$, attained at $a = \bar{x}$. For circular data, sample circular variance is the minimum over $\alpha$ of the sample mean of the dissimilarity measure $1 - \cos(\theta_i - \alpha)$, attained at $\alpha = \bar{\theta}$. Sample circular standard deviation is defined as $v = \sqrt{-2 \log(\bar{R})}$, which lies in $[0, \infty]$.
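The two dispersion measures can be sketched as follows, assuming angles in radians. Clamping $\bar{R}$ to at most one is a guard against floating-point rounding added for this example, and the function name is illustrative.

```python
import math

def circular_variance_and_sd(thetas):
    """Return (V, v): sample circular variance and sample circular standard deviation."""
    n = len(thetas)
    C_bar = sum(math.cos(t) for t in thetas) / n
    S_bar = sum(math.sin(t) for t in thetas) / n
    # Rounding can push R_bar slightly above 1 for very tight samples.
    R_bar = min(math.hypot(C_bar, S_bar), 1.0)
    V = 1.0 - R_bar
    v = math.inf if R_bar == 0.0 else math.sqrt(-2.0 * math.log(R_bar))
    return V, v
```

For a tightly clustered sample both values are near zero; as $\bar{R}$ approaches zero, $V$ approaches one while $v$ grows without bound.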

Intrinsic measures of location and dispersion

Median direction $\tilde{\theta}$ of a sample of circular data is an intrinsic measure of location, defined as any direction for which (1) the same number of points lie in the left and right semi-circles from the point and (2) more points are closer to the point than to its opposite point. Equivalently, it is any direction that minimizes the mean geodesic distance to the sample: $\tilde{\theta} \in \arg\min_{\alpha} d_0(\alpha)$, where $d_0(\alpha) = n^{-1} \sum_{i=1}^n d_g(\theta_i, \alpha)$. The minimum value $d_0(\tilde{\theta})$ is called the circular mean deviation, which is the sample mean geodesic distance to any median direction. Note that the median direction is different from the Riemannian centers of mass (i.e., Fréchet mean and Karcher mean) because the latter use the squared geodesic distance.
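The minimization form lends itself to a brute-force sketch. The example below restricts the search to the observed angles, on the assumption that a minimizer can always be found among them; this should hold because $d_0(\alpha)$ is piecewise linear in $\alpha$ with its convex kinks at the observations. When the minimizer is not unique, only one of them is returned. Angles are in radians and the function names are illustrative.

```python
import math

TWO_PI = 2 * math.pi

def geodesic_distance(theta, phi):
    """Shortest arc length between two angles."""
    d = abs(theta - phi) % TWO_PI
    return min(d, TWO_PI - d)

def mean_geodesic_distance(thetas, alpha):
    """d_0(alpha): sample mean geodesic distance from alpha."""
    return sum(geodesic_distance(t, alpha) for t in thetas) / len(thetas)

def median_direction(thetas):
    """Return (median direction, circular mean deviation), searching the observations only."""
    best = min(thetas, key=lambda a: mean_geodesic_distance(thetas, a))
    return best % TWO_PI, mean_geodesic_distance(thetas, best)
```

The search costs $O(n^2)$ distance evaluations, which is acceptable for small samples; a sort-based approach would be preferable for large ones.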

Circular mean difference $\bar{D}_0$ of a sample of circular data is the average geodesic distance between all pairs of points: $\bar{D}_0 = n^{-2} \sum_{i,j=1}^n d_g(\theta_i, \theta_j)$. Circular range $w$ is the length of the shortest arc containing all observations. Let $(\theta_{(1)}, \dots, \theta_{(n)})$ be the linear order statistics of the sample, with all angles taken in $[0, 2\pi)$. The arc lengths between adjacent observations can be computed as $T_i = \theta_{(i+1)} - \theta_{(i)}$ for $i = 1, \dots, n-1$, and $T_n = 2 \pi + \theta_{(1)} - \theta_{(n)}$. The circular range can be computed as $w = 2 \pi - \max(T_i)_{i=1}^n$.
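Both statistics translate directly into code. The sketch below assumes angles in radians and wraps them into $[0, 2\pi)$ before sorting; the function names are illustrative.

```python
import math

TWO_PI = 2 * math.pi

def circular_mean_difference(thetas):
    """Average geodesic distance over all n**2 ordered pairs (including i == j)."""
    n = len(thetas)
    total = 0.0
    for a in thetas:
        for b in thetas:
            d = abs(a - b) % TWO_PI
            total += min(d, TWO_PI - d)
    return total / n**2

def circular_range(thetas):
    """Length of the shortest arc containing all observations."""
    s = sorted(t % TWO_PI for t in thetas)       # linear order statistics
    gaps = [b - a for a, b in zip(s, s[1:])]     # T_1, ..., T_{n-1}
    gaps.append(TWO_PI + s[0] - s[-1])           # T_n, the gap across the zero direction
    return TWO_PI - max(gaps)
```

The circular range is the full circumference minus the largest gap, so a sample that straddles the zero direction, such as $\{0.1, 0.3, 6.2\}$, still has a small range.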

Trigonometric moments

First trigonometric moment $m'_1$ about the zero direction of a sample of circular data is the complex number combining the sample cosine and sine moments: $m'_1 := \bar{C} + i \bar{S}$. It is the complex number representation of the mean resultant vector $\bar{\mathbf{r}}$, and we have $m'_1 = \bar{R} e^{i \bar{\theta}}$. First central trigonometric moment $m_1$ of a sample of circular data is the first trigonometric moment of the sample centered at its mean direction: $m_1(\boldsymbol{\theta}) = m'_1(\boldsymbol{\theta} - \bar{\theta})$. It equals the mean resultant length: $m_1 = \bar{R}$.

In general, for any positive integer $p$, define the following quantities: the $p$-th sample cosine moment $\bar{C}_p = n^{-1} \sum_{i=1}^n \cos(p \theta_i)$, the $p$-th sample sine moment $\bar{S}_p = n^{-1} \sum_{i=1}^n \sin(p \theta_i)$, the $p$-th mean resultant length $\bar{R}_p := \bar{R}(p \boldsymbol{\theta})$, and the $p$-th mean direction $\bar{\theta}_p := \bar{\theta}(p \boldsymbol{\theta})$. They are well defined because the angle representations $p \theta$ are unambiguous when $p$ is an integer: $p (\theta + 2 \pi) = p \theta + 2 p \pi$, which represents the same point as $p \theta$. The $p$-th trigonometric moment $m'_p$ of a sample of circular data is the complex number combining the $p$-th sample cosine and sine moments: $m'_p := \bar{C}_p + i \bar{S}_p$. It is the complex number representation of the mean resultant vector of the $p$-fold angles: $m'_p = \bar{R}_p e^{i \bar{\theta}_p}$. The $p$-th central trigonometric moment $m_p$ of a sample of circular data is the $p$-th trigonometric moment of the sample centered at its mean direction: $m_p(\boldsymbol{\theta}) = m'_p(\boldsymbol{\theta} - \bar{\theta})$. Trigonometric moments can be used to define measures of dispersion, skewness, and kurtosis. Sample circular dispersion is defined as $\hat{\delta} = (1 - \bar{R}_2) / (2 \bar{R}^2)$.
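Complex arithmetic makes the trigonometric moments short to compute. The sketch below uses Python's `cmath` module and assumes angles in radians and a sample with nonzero mean resultant length, so that the mean direction is defined; the function names are illustrative.

```python
import cmath

def trig_moment(thetas, p=1):
    """p-th trigonometric moment about the zero direction: C_bar_p + i * S_bar_p."""
    return sum(cmath.exp(1j * p * t) for t in thetas) / len(thetas)

def central_trig_moment(thetas, p=1):
    """p-th trigonometric moment of the sample centered at its mean direction."""
    theta_bar = cmath.phase(trig_moment(thetas, 1))
    return trig_moment([t - theta_bar for t in thetas], p)

def circular_dispersion(thetas):
    """Sample circular dispersion (1 - R_bar_2) / (2 * R_bar**2)."""
    R_bar = abs(trig_moment(thetas, 1))
    R_bar_2 = abs(trig_moment(thetas, 2))
    return (1.0 - R_bar_2) / (2.0 * R_bar**2)
```

Centering rotates every $p$-fold angle by the same amount $p \bar{\theta}$, so $|m_p| = |m'_p| = \bar{R}_p$; for $p = 1$ the central moment is real and equals $\bar{R}$ up to rounding.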

Correlation

Fisher-Lee circular correlation coefficient $\rho_T$. Its numerator is proportional to $R^2(\Theta - \Phi) - R^2(\Theta + \Phi)$.
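The text above does not spell out the formula. The sketch below assumes the usual pairwise-sine form of the sample coefficient, $\hat{\rho}_T = \sum_{i<j} \sin(\theta_i - \theta_j) \sin(\phi_i - \phi_j) / \sqrt{\sum_{i<j} \sin^2(\theta_i - \theta_j) \sum_{i<j} \sin^2(\phi_i - \phi_j)}$, and should be checked against the references before use. Angles are in radians, and all function names are illustrative.

```python
import math
from itertools import combinations

def fisher_lee_parts(thetas, phis):
    """Return (numerator, sum of sin^2 for thetas, sum of sin^2 for phis) over pairs i < j."""
    num = den_t = den_p = 0.0
    for i, j in combinations(range(len(thetas)), 2):
        st = math.sin(thetas[i] - thetas[j])
        sp = math.sin(phis[i] - phis[j])
        num += st * sp
        den_t += st * st
        den_p += sp * sp
    return num, den_t, den_p

def fisher_lee_correlation(thetas, phis):
    """Sample correlation under the assumed pairwise-sine form; requires nonzero denominators."""
    num, den_t, den_p = fisher_lee_parts(thetas, phis)
    return num / math.sqrt(den_t * den_p)

def resultant_length_sq(angles):
    """Squared resultant length R**2 of a sample of angles."""
    C = sum(math.cos(a) for a in angles)
    S = sum(math.sin(a) for a in angles)
    return C * C + S * S
```

Under this form, the product-to-sum identity gives a numerator equal to $\frac{1}{4} [R^2(\Theta - \Phi) - R^2(\Theta + \Phi)]$, with $R$ the resultant length of the paired differences and sums, which matches the proportionality stated above. A rotation $\phi_i = \theta_i + c$ gives a coefficient of $1$ and a reflection $\phi_i = -\theta_i$ gives $-1$.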

Fisher-Lee circular rank correlation coefficient $\hat{\Pi}_n$.

Mardia circular rank correlation coefficient.

Jammalamadaka-Sarma circular rank correlation coefficient.

References

Circular statistics:

  • [@Fisher1993] Fisher. Statistical Analysis of Circular Data. 1993.
  • [@Jammalamadaka2001] Jammalamadaka & SenGupta. Topics in Circular Statistics. 2001. (Not recommended.)
  • [@Pewsey2013] Pewsey, Neuhauser, & Ruxton. Circular Statistics in R. 2013.

Spherical statistics:

  • [@Watson1983] Watson. Statistics on Spheres. 1983.
  • [@Fisher1987] Fisher, Lewis, & Embleton. Statistical Analysis of Spherical Data. 1987.

Directional statistics (circular and spherical):

  • [@Mardia1972] Mardia. Statistics of Directional Data. 1972.
  • [@Mardia1999] Mardia & Jupp. Directional Statistics. 1999.

Shape analysis and object data analysis:

  • [@Dryden1998] Dryden & Mardia. Statistical Shape Analysis. 1998.
  • [@Kendall1999] Kendall, Barden, Carne, & Le. Shape and Shape Theory. 1999.
  • [@Bhattacharya2012] Bhattacharya & Bhattacharya. Nonparametric Inference on Manifolds: With Applications to Shape Spaces. 2012.
  • [@Patrangenaru2015] Patrangenaru & Ellingson. Nonparametric Statistics on Manifolds and Their Applications to Object Data Analysis. 2015.
  • [@Dryden2016] Dryden & Mardia. Statistical Shape Analysis: With Applications in R. 2016.
  • [@Srivastava2016] Srivastava & Klassen. Functional and Shape Data Analysis. 2016.

🏷 Category=Statistics Category=Manifold