Tail distribution function:
$$\overline{F}(x) \equiv \Pr[X>x] = 1 - F(x)$$
Heavy-tailed distributions are probability distributions whose tails are not exponentially bounded:
$$\forall \lambda>0,\quad \lim_{x \to \infty} e^{\lambda x} \overline{F}(x) = \infty$$
A probability distribution is light-tailed if it is not heavy-tailed.
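A quick numerical sketch of the definition (parameters are illustrative: a Pareto tail $x^{-2}$ versus an exponential tail $e^{-x}$): multiplying by $e^{\lambda x}$ blows up the heavy tail but not the light one.

```python
import math

# Illustrative tails: Pareto(x_m=1, alpha=2) is heavy-tailed, Exponential(1) is light-tailed.
pareto_tail = lambda x: x ** -2.0     # tail F(x) = x^{-2} for x >= 1
exp_tail = lambda x: math.exp(-x)     # tail F(x) = e^{-x}

lam = 0.1  # the heavy-tailed definition requires blow-up for every lambda > 0
for x in [10, 100, 1000]:
    # e^{lam*x} * tail(x): diverges for the Pareto tail, vanishes for the exponential one
    print(x, math.exp(lam * x) * pareto_tail(x), math.exp(lam * x) * exp_tail(x))
```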
There are three important subclasses of heavy-tailed distributions: regularly varying, subexponential, and long-tailed.
All regularly varying distributions are subexponential, and all subexponential distributions are long-tailed.
Examples of heavy-tailed distributions:
Zipf, Cauchy, Student's t, Fréchet
Distribution $F$ is scale invariant if:
$$\exists x_0 > 0,\ g: \forall x, \lambda \text{ with } x \ge x_0,\ \lambda x \ge x_0,\quad \overline{F}(\lambda x) = g(\lambda)\, \overline{F}(x)$$
Theorem: A distribution is scale invariant if and only if it is Pareto.
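As a sanity check on the "if" direction (a sketch with illustrative $\alpha$ and $x_m = 1$), the Pareto tail factors exactly as $\overline{F}(\lambda x) = \lambda^{-\alpha}\,\overline{F}(x)$, with $g(\lambda) = \lambda^{-\alpha}$ independent of $x$:

```python
# Pareto tail with x_m = 1: tail F(x) = x^{-alpha} for x >= 1 (alpha chosen for illustration).
alpha = 2.5
tail = lambda x: x ** -alpha

for lam in [2.0, 5.0]:
    # F(lam*x)/F(x) = lam^{-alpha} exactly, for every x -- scale invariance
    ratios = [tail(lam * x) / tail(x) for x in [1.0, 3.0, 10.0, 100.0]]
    print(lam, lam ** -alpha, ratios)
```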
Distribution $F$ is asymptotically scale invariant if:
$$\exists g \in C^0: \forall \lambda > 0, \lim_{x \to +\infty} \overline{F}(\lambda x) / \overline{F}(x) = g(\lambda)$$
Function $L$ is slowly varying if:
$$\forall y > 0,\quad \lim_{x \to +\infty} L(xy) / L(x) = 1$$
Distribution $F$ is regularly varying with index $\rho \ge 0$ if there exists a slowly varying function $L$ such that:
$$\overline{F}(x) = x^{-\rho} L(x)$$
Theorem: A distribution is asymptotically scale invariant if and only if it is regularly varying.
In the tail, regularly varying distributions behave essentially like Pareto distributions: the slowly varying factor $L$ perturbs the power law $x^{-\rho}$ only sub-polynomially.
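A numerical sketch of this Pareto-like behavior, with the illustrative regularly varying tail $\overline{F}(x) = x^{-2}\log(e + x)$ (normalization ignored; only the ratio matters): $\overline{F}(\lambda x)/\overline{F}(x)$ approaches $\lambda^{-\rho}$, though slowly, since $L$ converges only logarithmically.

```python
import math

# Regularly varying tail with index rho = 2 and slowly varying L(x) = log(e + x).
rho = 2.0
tail = lambda x: x ** -rho * math.log(math.e + x)

lam = 3.0
ratio = lambda x: tail(lam * x) / tail(x)
for x in [1e2, 1e4, 1e6]:
    print(x, ratio(x))  # slowly approaches lam^{-rho} = 1/9
```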
The "catastrophe principle" (heavy-tailed case): a large sum is most likely caused by a single big jump, i.e. the maximum and the sum have asymptotically equivalent tails.
The "conspiracy principle" (light-tailed case): a large sum requires all summands to conspire, so the maximum alone is negligible:
$$\forall n \in \mathbb{Z}^+,\quad \lim_{x \to \infty} \overline{F}_{X_{(n)}}(x) / \overline{F}_{\sum_{i=1}^n X_i}(x) = 0$$
Subexponential distributions by definition satisfy the "catastrophe principle": $\overline{F}_{\sum_{i=1}^n X_i}(x) \sim \overline{F}_{X_{(n)}}(x)$.
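A Monte Carlo sketch of the catastrophe principle (all parameters illustrative): for Pareto summands, conditional on the sum being large, the maximum is almost always large too, so the ratio below is well above 0 and approaches 1 as the threshold grows (light tails would give a ratio near 0).

```python
import random
random.seed(0)

# n i.i.d. Pareto(alpha=1.5) summands; compare P(max > x) with P(sum > x) at a large x.
n, alpha, x, trials = 10, 1.5, 200.0, 200_000
max_exceed = sum_exceed = 0
for _ in range(trials):
    xs = [random.paretovariate(alpha) for _ in range(n)]
    if max(xs) > x:
        max_exceed += 1
    if sum(xs) > x:
        sum_exceed += 1
print(max_exceed / sum_exceed)  # ratio approaches 1 for large x: one big jump dominates
```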
The distribution of residual life given the current life $x$ is:
$$\overline{R}_x(t) = \overline{F}(x + t) / \overline{F}(x)$$
The distribution of residual life of an exponential distribution does not depend on the current life; this is the memoryless property.
For Pareto distribution, the distribution of residual life increases with the current life:
$$\overline{R}_x(t) = \dfrac{(x_0 / (x+t))^{\alpha}}{(x_0 / x)^{\alpha}} = (1+t/x)^{-\alpha},\quad \alpha > 0$$
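The formula can be checked directly (illustrative $\alpha$ and $t$): the probability of surviving $t$ more time units increases with the current age $x$, unlike the exponential case where it would be constant.

```python
# Pareto residual life R_x(t) = (1 + t/x)^(-alpha), increasing in the current life x.
alpha, t = 2.0, 10.0
res = lambda x: (1 + t / x) ** -alpha
vals = [res(x) for x in [1.0, 10.0, 100.0, 1000.0]]
print(vals)  # increasing toward 1: old items are likelier to keep surviving
```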
Mean residual life $m(x) = - \int_{\mathbb{R}_+} t~\mathrm{d} \overline{R}_x(t)$.
Hazard rate $q(x) \equiv f(x) / \overline{F}(x) = - \overline{R}'_x(0)$
For long-tailed distributions, $\overline{R}_x(t) \to 1$ for every fixed $t$ as $x \to \infty$: the hazard rate vanishes asymptotically and the mean residual life grows without bound.
When the population in the Generalized Central Limit Theorem does not have finite variance, the normalized sum converges to a stable distribution with stability parameter $0 < \alpha < 2$, all of which are heavy-tailed. See Limit Theorems.
Ruin time $T \equiv \inf \{t > 0 : x + c t - \sum_{i=1}^t X_i < 0 \}$ is always heavy-tailed. In the case of the symmetric 1D random walk ($c = 0$, $X \sim 2 \cdot \text{Bernoulli}(0.5) - 1$), the ruin-time tail is $\overline{F}_T(t) \sim x \sqrt{2/(\pi t)}$.
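A Monte Carlo sketch of the random-walk case (started at 0 for simplicity, ruin when the walk first goes negative; parameters illustrative): the empirical survival probability at $t$ matches the $\sqrt{2/(\pi t)}$ asymptotic, a tail with infinite mean.

```python
import math, random
random.seed(1)

# Symmetric +-1 random walk started at 0; ruin time T = first time the walk goes below 0.
# Estimate P(T > t_max) and compare with the asymptotic sqrt(2/(pi*t)).
t_max, trials = 400, 100_000
survived = 0
for _ in range(trials):
    s, ruined = 0, False
    for _ in range(t_max):
        s += 1 if random.random() < 0.5 else -1
        if s < 0:
            ruined = True
            break
    if not ruined:
        survived += 1
print(survived / trials, math.sqrt(2 / (math.pi * t_max)))  # both close to 0.04
```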
Multiplicative processes almost always lead to heavy tails; examples include wealth, Twitter followers, and hyperlinks.
For a population $Y$ with $\mu = \mathbb{E} \log Y$ and $\operatorname{Var}(\log Y) = \sigma^2 < \infty$, the classical central limit theorem applied to $\log Y$ gives:
$$\left( \prod_{i=1}^n \frac{Y_i}{e^\mu} \right)^{1/\sqrt{n}} \xrightarrow{d} \text{LogNormal}(0, \sigma^2)$$
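A simulation sketch (distribution and parameters illustrative: $\log Y \sim \text{Uniform}(0,1)$, so $\mu = 1/2$, $\sigma^2 = 1/12$): taking logs turns the normalized product into a standard CLT sum, whose distribution is approximately $\text{Normal}(0, \sigma^2)$.

```python
import math, random, statistics
random.seed(4)

# log Y ~ Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12; by the CLT, the log of
# (prod_{i<=n} Y_i / e^mu)^(1/sqrt(n)) is approximately Normal(0, sigma^2).
mu, sigma, n, trials = 0.5, (1 / 12) ** 0.5, 400, 5_000
vals = []
for _ in range(trials):
    log_prod = sum(random.uniform(0.0, 1.0) for _ in range(n))  # sum of log Y_i
    vals.append((log_prod - n * mu) / math.sqrt(n))
print(statistics.mean(vals), statistics.stdev(vals))  # mean near 0, stdev near sigma
```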
Multiplicative process with a lower barrier, $P_n = \max(P_{n-1} Y_n, \varepsilon)$: under minor technical conditions, $P_n \to F$ where $F$ is "nearly" regularly varying:
$$\lim_{x \to \infty} \dfrac{\log \overline{F}(x)}{\log x} = -\sup\{ s \ge 0 : \mathbb{E}Y_1^s \le 1 \}$$
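A sketch with lognormal factors, where the supremum has a closed form (parameters illustrative): for $\log Y \sim \text{Normal}(\mu, \sigma^2)$ with $\mu < 0$, $\mathbb{E}Y^s = e^{\mu s + \sigma^2 s^2/2} \le 1$ exactly when $s \le -2\mu/\sigma^2$, and a simulated trajectory shows a log-survival slope roughly matching that exponent.

```python
import math, random
random.seed(2)

# Reflected multiplicative process P_n = max(P_{n-1} * Y_n, eps) with lognormal Y.
mu, sigma2, eps = -0.25, 0.5, 1.0
sigma = math.sqrt(sigma2)
# E[Y^s] = exp(mu*s + sigma^2*s^2/2) <= 1  iff  s <= -2*mu/sigma^2.
s_star = -2 * mu / sigma2
print(s_star)  # 1.0

samples = []
p = eps
for _ in range(200_000):
    p = max(p * math.exp(random.gauss(mu, sigma)), eps)
    samples.append(p)
samples.sort()
for q in [0.9, 0.99, 0.999]:
    x = samples[int(q * len(samples))]
    print(q, x, math.log(1 - q) / math.log(x))  # empirical log F(x)/log x, near -s_star
```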
Multiplicative process with noise $P_n = P_{n-1} Y_n + Q_n$ also leads to distributions that are approximately power-law.
In the simple extremal process $M_n = \max_{i \le n} X_i$, both heavy-tailed and light-tailed distributions can emerge. See Statistics of Extremes.
The time until the next record is always heavy-tailed: let $T_k$ be the time between the $k$-th and the $(k+1)$-th record; then
$$P(T_k > t) \sim 2^{k-1} / t$$
Identifying power-law distributions:
The tail distribution plot is preferred over the frequency plot because $\overline{F}$ is monotonically decreasing and requires no binning, so the far tail is not obscured by noisy, sparsely populated bins.
Identifying power-law tails: Hill plot (Hill estimator with varying cutoff $k$)
$$\hat{\alpha}(k, n) = \dfrac{k}{\sum_{i=n-k+1}^{n} \log\left(x_{(i)}/x_{(n-k)}\right)}$$
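A sketch of the Hill estimator on synthetic Pareto data (sample size, cutoffs, and $\alpha$ are illustrative): the estimate should be stable across cutoffs $k$ and close to the true index — the flat region one looks for in a Hill plot.

```python
import math, random
random.seed(3)

# n samples from Pareto(alpha): tail F(x) = x^{-alpha}, x >= 1.
alpha_true, n = 2.0, 20_000
xs = sorted(random.paretovariate(alpha_true) for _ in range(n))

def hill(xs, k):
    """Hill estimator from the k largest order statistics (xs sorted ascending)."""
    m = len(xs)
    s = sum(math.log(xs[i] / xs[m - k - 1]) for i in range(m - k, m))
    return k / s

for k in [100, 500, 2000]:
    print(k, hill(xs, k))  # a flat Hill plot near alpha_true = 2.0
```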
Jayakrishnan Nair, Adam Wierman, Bert Zwart. *The Fundamentals of Heavy-Tails: Properties, Emergence, and Identification*. Cambridge University Press, 2022.