Heavy-Tailed Distribution

Tail distribution function:

$$\overline{F}(x) \equiv \Pr[X>x] = 1 - F(x)$$

Heavy-tailed distributions are probability distributions whose tails are not exponentially bounded:

$$\forall \lambda>0,\quad \limsup_{x \to \infty} e^{\lambda x} \overline{F}(x) = \infty$$

A probability distribution is light-tailed if it is not heavy-tailed.
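
As a rough numerical illustration of the definition (a sketch, not taken from the source), the snippet below evaluates $\log\left(e^{\lambda x} \overline{F}(x)\right)$ for a Pareto tail and an exponential tail. The function names and the parameter values `alpha`, `x0`, `rate` and `lam` are illustrative assumptions.

```python
import math

def log_scaled_tail_pareto(x, lam, alpha=2.0, x0=1.0):
    """log(e^{lam*x} * Fbar(x)) for a Pareto tail Fbar(x) = (x0/x)^alpha, x >= x0."""
    return lam * x + alpha * (math.log(x0) - math.log(x))

def log_scaled_tail_exponential(x, lam, rate=1.0):
    """log(e^{lam*x} * Fbar(x)) for an exponential tail Fbar(x) = e^{-rate*x}."""
    return (lam - rate) * x

lam = 0.5  # assumed value; any 0 < lam < rate exposes the exponential bound
xs = [10.0, 100.0, 1000.0]
pareto_vals = [log_scaled_tail_pareto(x, lam) for x in xs]
expo_vals = [log_scaled_tail_exponential(x, lam) for x in xs]

for x, p, e in zip(xs, pareto_vals, expo_vals):
    print(f"x={x:7.0f}  Pareto: {p:9.2f}  exponential: {e:9.2f}")
```

Working in logs avoids overflow. The Pareto values grow without bound for every $\lambda > 0$, whereas the exponential values tend to $-\infty$ whenever $\lambda$ is below the rate, which is why the exponential distribution is light-tailed.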

There are three important subclasses of heavy-tailed distributions:

Fat-tailed distribution: regularly varying $$\exists \alpha, c > 0,\quad \lim_{x \to \infty} x^{\alpha} \overline{F}(x) = c$$
Subexponential distribution ("catastrophe principle"): for i.i.d. $X_1, \dots, X_n$ with maximum $X_{(n)}$, $$\forall n \ge 2, \lim_{x \to \infty} \overline{F}_{X_{(n)}}(x) / \overline{F}_{\sum X_i}(x) = 1$$
Long-tailed distribution: $$\forall t>0, \lim_{x \to \infty} \overline{F}(x+t) / \overline{F}(x) = 1$$

All subexponential distributions are long-tailed.

Examples of heavy-tailed distributions:

Fat-tailed: Pareto, log-logistic;
Subexponential: log-normal, Weibull (shape parameter less than 1);
Long-tailed distribution: ;

Others: Zipf, Cauchy, Student's t, Fréchet.

Properties

Scale invariance

Distribution $F$ is scale invariant if:

$$\exists x_0 > 0,\ g:\ \forall \lambda, x \text{ with } x \ge x_0 \text{ and } \lambda x \ge x_0,\quad \overline{F}(\lambda x) = g(\lambda) \overline{F}(x)$$

Theorem: A distribution is scale invariant if and only if it is Pareto.
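
One direction of the theorem can be checked directly. The following worked step is added for illustration and assumes the Pareto tail $\overline{F}(x) = (x_0 / x)^{\alpha}$ for $x \ge x_0$:

$$\overline{F}(\lambda x) = \left( \frac{x_0}{\lambda x} \right)^{\alpha} = \lambda^{-\alpha} \left( \frac{x_0}{x} \right)^{\alpha} = \lambda^{-\alpha}\, \overline{F}(x), \qquad x \ge x_0,\ \lambda x \ge x_0$$

so the definition holds with $g(\lambda) = \lambda^{-\alpha}$.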

Distribution $F$ is asymptotically scale invariant if:

$$\exists g \in C^0: \forall \lambda > 0, \lim_{x \to +\infty} \overline{F}(\lambda x) / \overline{F}(x) = g(\lambda)$$

Function $L$ is slowly varying if:

$$\forall y > 0,\quad \lim_{x \to +\infty} L(xy) / L(x) = 1$$

Distribution $F$ is regularly varying if there exists a slowly varying function $L$ such that, for some $\rho \ge 0$:

$$\overline{F}(x) = x^{-\rho} L(x)$$

Theorem: A distribution is asymptotically scale invariant if and only if it is regularly varying.

Regularly varying distributions behave like Pareto distributions in the tail, up to a slowly varying factor.
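
A minimal numerical sketch of asymptotic scale invariance, using the Lomax (Pareto type II) tail $\overline{F}(x) = (1+x)^{-\alpha}$ as an assumed example that is not in the source. It is regularly varying with $L(x) = (x/(1+x))^{\alpha}$, so the ratio $\overline{F}(\lambda x) / \overline{F}(x)$ should approach $\lambda^{-\alpha}$ without ever equalling it.

```python
def lomax_tail(x, alpha=1.5):
    """Tail of a Lomax distribution: Fbar(x) = (1 + x)^(-alpha), x >= 0."""
    return (1.0 + x) ** (-alpha)

def tail_ratio(x, lam, alpha=1.5):
    """Fbar(lam * x) / Fbar(x); approaches lam^(-alpha) as x grows."""
    return lomax_tail(lam * x, alpha) / lomax_tail(x, alpha)

alpha, lam = 1.5, 3.0          # illustrative values
limit = lam ** (-alpha)        # g(lam) for a regularly varying tail of index alpha
ratios = {x: tail_ratio(x, lam, alpha) for x in (1.0, 1e2, 1e4, 1e6)}

for x, r in ratios.items():
    print(f"x={x:9.0f}  ratio={r:.6f}  limit={limit:.6f}")
```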

The "catastrophe principle"

The "catastrophe principle": the principle of a single big jump.

The "conspiracy principle":

$$\forall n \ge 2, \lim_{x \to \infty} \overline{F}_{X_{(n)}}(x) / \overline{F}_{\sum X_i}(x) = 0$$

Subexponential distributions by definition satisfy the "catastrophe principle".
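
The contrast between the two principles can be seen in a small Monte Carlo experiment. This is a sketch under assumed settings ($n = 2$, Pareto with $\alpha = 1.5$, exponential with rate 1, finite thresholds `x`), so it only hints at the limiting behaviour; the helper name `max_vs_sum_ratio` is hypothetical.

```python
import random

def max_vs_sum_ratio(sampler, n, x, trials, rng):
    """Estimate Pr[max_i X_i > x] / Pr[sum_i X_i > x] for n i.i.d. non-negative draws.

    For non-negative draws, max > x implies sum > x, so the ratio is the
    fraction of "large sum" events that are caused by one large summand.
    """
    max_hits = sum_hits = 0
    for _ in range(trials):
        draws = [sampler(rng) for _ in range(n)]
        if sum(draws) > x:
            sum_hits += 1
            if max(draws) > x:
                max_hits += 1
    return max_hits / sum_hits if sum_hits else float("nan")

rng = random.Random(0)  # fixed seed so the run is repeatable
pareto_ratio = max_vs_sum_ratio(lambda r: r.paretovariate(1.5), 2, 50.0, 200_000, rng)
expo_ratio = max_vs_sum_ratio(lambda r: r.expovariate(1.0), 2, 8.0, 200_000, rng)

print(f"Pareto(1.5): {pareto_ratio:.3f}")   # close to 1: a single big jump
print(f"Exp(1):      {expo_ratio:.3f}")     # well below 1: summands conspire
```

For the exponential case with $n = 2$ the ratio is known in closed form, $(2 - e^{-x}) / (1 + x)$, which decays to 0 only slowly in $x$.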

The residual life "blows up"

The distribution of residual life given the current life $x$ is:

$$\overline{R}_x(t) = \overline{F}(x + t) / \overline{F}(x)$$

The residual life distribution of an exponential distribution does not depend on the current life; this property is known as memorylessness.

For a Pareto distribution, the residual life stochastically increases with the current life $x \ge x_0$:

$$\overline{R}_x(t) = \dfrac{(x_0 / (x+t))^{\alpha}}{(x_0 / x)^{\alpha}} = (1+t/x)^{-\alpha},\quad \alpha > 0$$

Mean residual life $m(x) = - \int_{\mathbb{R}_+} t~\mathrm{d} \overline{R}_x(t)$.

Hazard rate $q(x) \equiv f(x) / \overline{F}(x) = - \overline{R}'_x(0)$

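For long-tailed distributions the mean residual life is unbounded, $m(x) \to \infty$, and the hazard rate tends to zero whenever its limit exists. The Pareto distribution, for example, has a decreasing hazard rate and an increasing mean residual life.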
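
The quantities above have closed forms for the exponential and Pareto cases, which the following sketch evaluates. The function names and default parameters are assumptions for illustration; the Pareto mean residual life $m(x) = x / (\alpha - 1)$ follows from integrating $(1 + t/x)^{-\alpha}$ over $t \ge 0$ and requires $\alpha > 1$.

```python
import math

def residual_tail_exponential(x, t, rate=1.0):
    """R_x(t) = Fbar(x+t) / Fbar(x) for Exp(rate); x cancels (memoryless)."""
    return math.exp(-rate * (x + t)) / math.exp(-rate * x)

def residual_tail_pareto(x, t, alpha=2.0):
    """R_x(t) = (1 + t/x)^(-alpha) for a Pareto tail, valid for x >= x0."""
    return (1.0 + t / x) ** (-alpha)

def hazard_pareto(x, alpha=2.0):
    """q(x) = f(x) / Fbar(x) = alpha / x for x >= x0."""
    return alpha / x

def mean_residual_life_pareto(x, alpha=2.0):
    """m(x) = integral_0^inf (1 + t/x)^(-alpha) dt = x / (alpha - 1), alpha > 1."""
    return x / (alpha - 1.0)

t = 2.0
for x in (1.0, 10.0, 100.0):
    print(f"x={x:6.1f}  exp R_x(t)={residual_tail_exponential(x, t):.4f}  "
          f"Pareto R_x(t)={residual_tail_pareto(x, t):.4f}  "
          f"q(x)={hazard_pareto(x):.4f}  m(x)={mean_residual_life_pareto(x):.1f}")
```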

Emergence

Additive Processes

When the population does not have finite variance, the Generalized Central Limit Theorem states that the normalized sum, when it converges, tends to a stable distribution with stability parameter $0 < \alpha < 2$; all such distributions are heavy-tailed. See Limit Theorems.

Ruin time $T \equiv \inf \{t > 0 \mid x + c t - \sum_{i=1}^t X_i < 0 \}$ is always heavy-tailed. For the symmetric 1D random walk, $X \sim 2 \cdot \text{Bernoulli}(0.5) - 1$ with $c = 0$ and $x = 0$, the tail of the ruin time satisfies $\overline{F}_T(t) \sim \sqrt{2/(\pi t)}$ as $t \to \infty$.
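
For the symmetric walk the tail of the ruin time can be computed exactly. The sketch below assumes zero drift and zero initial capital ($c = 0$, $x = 0$), so ruin is the first time the partial sum of the $X_i$ exceeds 0; under that assumption the reflection principle gives $\Pr[T > t] = \binom{t}{\lfloor t/2 \rfloor} / 2^t$. The code checks this against brute-force enumeration and against the $\sqrt{2/(\pi t)}$ asymptotic.

```python
import math
from itertools import product

def survival_exact(t):
    """Pr[T > t] = C(t, floor(t/2)) / 2^t for the symmetric walk with x = c = 0."""
    return math.comb(t, t // 2) / 2 ** t

def survival_bruteforce(t):
    """Enumerate all 2^t step sequences; count those whose partial sums never exceed 0."""
    alive = 0
    for steps in product((-1, 1), repeat=t):
        total, ruined = 0, False
        for s in steps:
            total += s
            if total > 0:          # x + c*t - sum X_i < 0 with x = c = 0
                ruined = True
                break
        alive += not ruined
    return alive / 2 ** t

def survival_asymptotic(t):
    """Power-law approximation sqrt(2 / (pi * t))."""
    return math.sqrt(2.0 / (math.pi * t))

for t in (10, 100, 1000):
    print(f"t={t:5d}  exact={survival_exact(t):.5f}  asymptotic={survival_asymptotic(t):.5f}")
```

The tail decays like $t^{-1/2}$, so the ruin time has infinite mean even though ruin occurs with probability 1.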

Multiplicative Processes

Multiplicative processes almost always lead to heavy tails; examples include wealth, Twitter follower counts, and hyperlink counts.

For a positive population $Y$ with $\mu = \mathbb{E} \log Y$ and $\text{Var} (\log Y) = \sigma^2 < \infty$, the classical central limit theorem applied to $\log Y_i$ gives, as $n \to \infty$:

$$\left( \prod_{i=1}^{n} \frac{Y_i}{e^\mu} \right)^{1/\sqrt{n}} \xrightarrow{d} \text{LogNormal}(0, \sigma^2)$$
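
A small simulation can make the log-normal limit concrete. This is a sketch under assumed settings: $Y \sim \text{Uniform}(0.5, 2)$, for which $\mu$ and $\sigma^2$ have closed forms, with `n = 200` factors per product. It checks that the log of the normalized product has mean near 0 and variance near $\sigma^2$.

```python
import math
import random

def log_moments_uniform(a, b):
    """Mean and variance of log Y for Y ~ Uniform(a, b), 0 < a < b (closed form)."""
    def int_log(y):        # antiderivative of log y
        return y * math.log(y) - y
    def int_log_sq(y):     # antiderivative of (log y)^2
        return y * math.log(y) ** 2 - 2 * y * math.log(y) + 2 * y
    mu = (int_log(b) - int_log(a)) / (b - a)
    second = (int_log_sq(b) - int_log_sq(a)) / (b - a)
    return mu, second - mu ** 2

def normalized_log_product(n, a, b, mu, rng):
    """log of (prod_i Y_i / e^mu)^(1/sqrt(n)) = sum_i (log Y_i - mu) / sqrt(n)."""
    return sum(math.log(rng.uniform(a, b)) - mu for _ in range(n)) / math.sqrt(n)

a, b, n, trials = 0.5, 2.0, 200, 2000      # illustrative values
mu, sigma2 = log_moments_uniform(a, b)
rng = random.Random(1)
samples = [normalized_log_product(n, a, b, mu, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)

print(f"sigma^2 = {sigma2:.4f}, sample mean = {mean:.4f}, sample variance = {var:.4f}")
```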

Multiplicative process with a lower barrier $P_n = \max(P_{n-1} Y_n, \varepsilon)$: under minor technical conditions, $P_n$ converges in distribution to $F$, which is "nearly" regularly varying:

$$\lim_{x \to \infty} \dfrac{\log \overline{F}(x)}{\log x} = -s^*, \quad s^* = \sup\{ s \ge 0 \mid \mathbb{E}Y_1^s \le 1 \}$$
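
The supremum that sets the tail exponent can be found numerically. The sketch below assumes a discrete positive $Y$ with $\mathbb{E} \log Y < 0$ and $\Pr[Y > 1] > 0$, and uses bisection on the convex function $s \mapsto \mathbb{E} Y^s$; the function names are hypothetical.

```python
import math

def moment(s, values, probs):
    """E[Y^s] for a discrete positive random variable Y."""
    return sum(p * v ** s for v, p in zip(values, probs))

def tail_exponent(values, probs, tol=1e-12):
    """sup{s >= 0 : E[Y^s] <= 1}, found by bisection.

    Assumes E[log Y] < 0 and Pr[Y > 1] > 0, so that s -> E[Y^s] (convex,
    equal to 1 at s = 0) dips below 1 and then crosses 1 exactly once more.
    """
    if sum(p * math.log(v) for v, p in zip(values, probs)) >= 0:
        raise ValueError("need E[log Y] < 0")
    if max(values) <= 1:
        raise ValueError("need Pr[Y > 1] > 0, otherwise the supremum is infinite")
    hi = 1.0
    while moment(hi, values, probs) <= 1.0:   # grow until E[Y^hi] > 1
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment(mid, values, probs) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

# Y = 2 with probability 1/3 and Y = 1/2 with probability 2/3
s_star = tail_exponent([2.0, 0.5], [1 / 3, 2 / 3])
print(s_star)  # close to 1, since p*2^s + (1-p)*2^(-s) = 1 at 2^s = (1-p)/p = 2
```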

Multiplicative process with noise $P_n = P_{n-1} Y_n + Q_n$ also leads to distributions that are approximately power-law.

Extremal Processes

In the simple extremal process $M_n = \max_{i \le n} X_i$, both heavy-tailed and light-tailed limiting distributions can emerge. See Statistics of Extremes.

The time until a record is always heavy-tailed: let $T_k$ be the time between the $k$-th and $(k+1)$-th record,

$$P(T_k > t) \sim 2^{k-1} / t$$
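
The case $k = 1$ can be checked by simulation. This sketch assumes i.i.d. continuous observations (uniform draws here) and only covers $k = 1$: the first observation is always a record, and $T_1 > t$ means none of the next $t$ observations beats it, which by symmetry has probability exactly $1/(t+1) \sim 1/t$.

```python
import random

def first_gap_exceeds(t, rng):
    """True if none of the t observations after the first one sets a new record."""
    first = rng.random()
    return all(rng.random() <= first for _ in range(t))

rng = random.Random(2)
t, trials = 10, 50_000                     # illustrative values
estimate = sum(first_gap_exceeds(t, rng) for _ in range(trials)) / trials
exact = 1.0 / (t + 1)                      # the first of t+1 i.i.d. draws is the largest

print(f"simulated Pr[T_1 > {t}] = {estimate:.4f}, exact = {exact:.4f}")
```

Because $\Pr[T_1 > t]$ decays like $1/t$, the expected waiting time for the second record is infinite.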

Identification

Identifying power-law distributions:

Visual identification: plot of tail distribution function
- linear on log-linear scale: exponential tail;
- linear on log-log scale: power-law tail;
Estimate shape parameter: (both methods are useful)
1. Maximum likelihood estimator (MLE): for Pareto distribution $\overline{F}(x) = (x_0 / x)^\alpha$, with the sample minimum $x_{(1)}$ as the estimate of $x_0$, $$\hat{\alpha}_{\text{MLE}} = \dfrac{n}{\sum_i \log(x_i/x_{(1)})}$$
2. Weighted least squares (WLS) regression of $\log(\hat{r}_i / n)$ on $\log(x_i / x_0)$ with weight $w_i = 1/ \log(x_i / x_0)$, where $\hat{r}_i$ is the rank of $x_i$ in decreasing order, $$\hat{\alpha}_{\text{WLS}} = - \dfrac{\sum_i \log(\hat{r}_i / n)}{\sum_i \log(x_i/x_0)}$$
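
Both estimators can be tried on synthetic data. The following is a minimal sketch, assuming a pure Pareto sample with known $x_0 = 1$, true $\alpha = 2$, and no ties; the function names are hypothetical. With distinct observations the ranks $1, \dots, n$ each appear once, so the WLS numerator does not depend on the data ordering.

```python
import math
import random

def pareto_sample(n, alpha, x0, rng):
    """Inverse-transform sampling from Fbar(x) = (x0/x)^alpha, x >= x0."""
    return [x0 * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def alpha_mle(xs):
    """n / sum_i log(x_i / x_(1)), with the sample minimum standing in for x0."""
    x_min = min(xs)
    return len(xs) / sum(math.log(x / x_min) for x in xs)

def alpha_wls(xs, x0):
    """-sum_i log(r_i / n) / sum_i log(x_i / x0), with r_i the decreasing rank of x_i."""
    n = len(xs)
    num = sum(math.log(r / n) for r in range(1, n + 1))   # each rank occurs once
    den = sum(math.log(x / x0) for x in xs)
    return -num / den

rng = random.Random(3)
data = pareto_sample(20_000, alpha=2.0, x0=1.0, rng=rng)
mle, wls = alpha_mle(data), alpha_wls(data, x0=1.0)
print(f"MLE: {mle:.3f}  WLS: {wls:.3f}")
```

On a sample this size the two estimates nearly coincide, which fits the remark that both methods are useful.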

The tail distribution plot is preferred over the frequency plot because the empirical tail distribution is monotonically decreasing.

Identifying power-law tails: Hill plot (Hill estimator with varying cutoff $k$), with order statistics $x_{(1)} \le \dots \le x_{(n)}$:

$$\hat{\alpha}(k, n) = \dfrac{k}{\sum_{i=n-k+1}^{n} \log(x_{(i)}/x_{(n-k)})}$$
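
A Hill plot is the estimate traced over a range of $k$. The sketch below assumes the usual convention of ascending order statistics with the $(k+1)$-th largest observation $x_{(n-k)}$ as the threshold, and uses synthetic Pareto data with $\alpha = 2$; the function name is hypothetical.

```python
import math
import random

def hill_estimator(xs, k):
    """Hill estimate from the k largest observations; threshold is the (k+1)-th largest."""
    if not 1 <= k < len(xs):
        raise ValueError("need 1 <= k < n")
    ordered = sorted(xs)                    # ascending: ordered[i-1] = x_(i)
    n = len(ordered)
    threshold = ordered[n - k - 1]          # x_(n-k)
    return k / sum(math.log(x / threshold) for x in ordered[n - k:])

rng = random.Random(4)
data = [rng.paretovariate(2.0) for _ in range(20_000)]
hill_plot = {k: hill_estimator(data, k) for k in (100, 500, 2000, 10000)}

for k, a_hat in hill_plot.items():
    print(f"k={k:6d}  alpha_hat={a_hat:.3f}")
```

For exact Pareto data the estimate is stable across $k$. For data with only a power-law tail, one would look for a range of $k$ over which the plot is roughly flat.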

Resources

Jayakrishnan Nair, Adam Wierman, Bert Zwart. The Fundamentals of Heavy-Tails: Properties, Emergence, and Identification.
