Standard data analysis and model-fitting techniques work well where the data have greatest density, but can be severely biased in estimating tail probabilities.
**Extreme value theory** deals with the extreme deviations from the median of probability distributions.

Key issues in extreme value theory:

- Data sparsity: there are very few observations in the tail of the distribution;
- Extrapolation: estimates are often required beyond the largest observed data value;

Definition: **Sample maximum** is the maximum order statistic:

$$M_n = \max(X_1, \ldots , X_n)$$

Without rescaling, the sample maximum converges to the upper endpoint of the support of $F$ (which may be infinite), so its limiting distribution is degenerate:

$$\forall x, F(x)<1: \lim_{n \to \infty} P(M_n \leq x) = 0$$
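As a short check of this limit, assuming the $X_i$ are independent with common CDF $F$:

$$P(M_n \leq x) = P(X_1 \leq x, \ldots, X_n \leq x) = F(x)^n \to \begin{cases} 0, & F(x) < 1, \\ 1, & F(x) = 1. \end{cases}$$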

Under proper **linear rescaling**, some sequences of maxima have non-degenerate limiting distributions (CDF):

$$\exists \{ a_n, b_n: a_n > 0 \}, \text{ s.t. } (M_n - b_n) / a_n \xrightarrow{d} H$$

Examples:

- Exponential(1): $a_n = 1, b_n = \ln n, H = \exp\{-e^{-x}\} \sim \text{Gumbel}(0, 1)$;
- Normal(0,1): $a_n = 1 / b_n, b_n = Q(n), H = \exp\{-e^{-x}\} \sim \text{Gumbel}(0, 1)$, where $Q$ is the tail quantile function defined below;
- Pareto-type tail, $1 - F(x) \approx c x^{-a}$: $a_n = (nc)^{1/a}, b_n = 0, H = \exp\{-x^{-a}\} \sim \text{Fréchet}(a, 1, 0)$;
- Finite upper endpoint, $1 - F(x) \approx c (w-x)^a, x \rightarrow w^-$: $a_n = (nc)^{-1/a}, b_n = w, H = \exp\{-(-x)^a\} \sim \text{Weibull}(1, a)$ (the reversed Weibull distribution, supported on $x < 0$);
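The Exponential(1) example can be checked by simulation. The following is a minimal sketch using only the Python standard library; the function names, sample sizes, and seed are illustrative assumptions, not part of the original notes. It compares the empirical CDF of $M_n - \ln n$ with the standard Gumbel CDF.

```python
import math
import random


def gumbel_cdf(x):
    """Standard Gumbel CDF, H(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def rescaled_exponential_maxima(n, reps, seed=0):
    """Simulate `reps` values of M_n - ln(n) for Exponential(1) samples of size n."""
    rng = random.Random(seed)
    log_n = math.log(n)
    return [
        max(rng.expovariate(1.0) for _ in range(n)) - log_n
        for _ in range(reps)
    ]


def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(v <= x for v in sample) / len(sample)


if __name__ == "__main__":
    # n and reps are arbitrary illustrative choices.
    maxima = rescaled_exponential_maxima(n=200, reps=2000)
    for x in (-1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  empirical={empirical_cdf(maxima, x):.3f}  "
              f"Gumbel={gumbel_cdf(x):.3f}")
```

The two columns should agree up to Monte Carlo error, since here $P(M_n - \ln n \leq x) = (1 - e^{-x}/n)^n$ is already close to $\exp\{-e^{-x}\}$ for moderate $n$.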

Exceptions, for which no linear rescaling gives a non-degenerate limit:

- Poisson distribution;
- $F(x) = 1 - 1 / \ln x$ for $x > e$.

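Definition: A distribution $H$ is said to be **max-stable** if there exist sequences $\{a_k > 0\}$ and $\{b_k\}$ such that $H^k(x) = H(a_k x + b_k)$ for all $k \in \mathbb{N}$; that is, the maximum of $k$ iid draws from $H$ has the same distribution as a single draw, up to a linear change of location and scale.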

Lemma: If rescaled maxima have a non-degenerate limit distribution, the limiting distribution must be max-stable.
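As a worked check, reading max-stability as $H(x)^k = H(a_k x + b_k)$, the standard Gumbel and Fréchet limits from the examples above satisfy it with explicit constants:

$$\left[\exp\{-e^{-x}\}\right]^k = \exp\{-k e^{-x}\} = \exp\{-e^{-(x - \ln k)}\}, \quad a_k = 1, \ b_k = -\ln k$$

$$\left[\exp\{-x^{-a}\}\right]^k = \exp\{-k x^{-a}\} = \exp\{-(k^{-1/a} x)^{-a}\}, \quad a_k = k^{-1/a}, \ b_k = 0$$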

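The **generalised extreme value (GEV) distribution**, written $\text{GEV}(\mu, \sigma, \xi)$, is a three-parameter family with CDF

$$H(x) = \exp\left\{-\left[1 + \frac{\xi(x-\mu)}{\sigma} \right]_+^{-1/\xi}\right\}, \quad \mu \in \mathbb{R}, \ \sigma > 0, \ \xi \in \mathbb{R},$$

where $[z]_+ = \max(z, 0)$ and the case $\xi = 0$ is read as the limit $\exp\{-e^{-(x-\mu)/\sigma}\}$.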

Classification of GEV distributions by the shape parameter $\xi$:

- $\xi = 0$: Type I, Gumbel distribution (light-tailed);
- $\xi > 0$: Type II, Fréchet distribution (heavy-tailed);
- $\xi < 0$: Type III, (reversed) Weibull distribution (short-tailed, with finite upper endpoint $\mu - \sigma/\xi$);
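The three cases can be handled by a single CDF routine. The sketch below is a plain-Python illustration; the function name `gev_cdf` and the tolerance used to detect $\xi = 0$ are assumptions made here. It evaluates the GEV CDF, treating $\xi = 0$ as the Gumbel limit and returning 0 or 1 outside the support.

```python
import math


def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GEV(mu, sigma, xi) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case, the limit of the general formula as xi -> 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


if __name__ == "__main__":
    for xi in (-0.5, 0.0, 0.5):
        print(xi, [round(gev_cdf(x, xi=xi), 4) for x in (-1.0, 0.0, 1.0, 3.0)])
```

Library implementations may use a different sign convention for the shape parameter, so it is worth checking the convention before comparing values.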

The **extreme value theorem** (or **Fisher–Tippett–Gnedenko theorem** [@Gnedenko1948]) states that if linearly rescaled maxima have a non-degenerate limiting distribution, then that limit is a member of the generalised extreme value (GEV) family.

Extreme value theorem: The maximum of a sample of iid random variables after proper renormalization can only converge in distribution to one of three possible distributions, the Gumbel distribution, the Fréchet distribution, or the Weibull distribution, also known as type I, II and III extreme value distributions.

Rescaled maxima converge to a Fréchet distribution if and only if the tail of the population is regularly varying, that is, $1 - F(x) = x^{-a} L(x)$ for some $a > 0$ and a slowly varying function $L$.

Definition: For a sufficiently smooth distribution $F$ with density $f$ and upper endpoint $x_F$, the **reciprocal hazard function** is

$$r(x)= \frac{1-F(x)}{f(x)}, \quad x < x_F$$

Definition: **Tail quantile function**

$$Q(y) = F^{-1}(1 - 1/y), \quad y \in [1, \infty)$$
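As a worked example, both functions have closed forms for the Exponential(1) distribution and for an exact Pareto tail $1 - F(x) = c x^{-a}$ (assumed exact here for illustration, rather than approximate as in the earlier example):

$$\text{Exponential(1)}: \quad 1 - F(x) = e^{-x}, \quad f(x) = e^{-x}, \quad r(x) = 1, \quad Q(y) = \ln y$$

$$\text{Pareto}: \quad 1 - F(x) = c x^{-a}, \quad f(x) = c a x^{-a-1}, \quad r(x) = \frac{x}{a}, \quad Q(y) = (c y)^{1/a}$$

In the exponential case, $Q(n) = \ln n$ recovers the centring constant $b_n$ used in the examples; in the Pareto case, $Q(n) = (nc)^{1/a}$ is the scaling constant $a_n$.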

Theorem: (**von Mises condition**)

$$\xi = \lim_{x \to x_F} r'(x) \quad\Rightarrow\quad \dfrac{M_m - b_m}{a_m} \xrightarrow{d} \text{GEV}(0, 1, \xi), \quad b_m = Q(m), \ a_m = r(b_m)$$

Taking $\xi_m = r'(b_m)$, with $b_m = Q(m)$ and $a_m = r(b_m)$, may give a better approximation to the distribution of $(M_m - b_m)/a_m$ for finite $m$ than the limiting value $\xi$.

There has been a good deal of work on the speed of convergence of the rescaled maxima to the limiting regime, which depends on the underlying distribution $F$. For example, maxima of Gaussian variables converge very slowly.
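The slow Gaussian convergence can be seen through the finite-sample shape $\xi_m = r'(b_m)$. The sketch below is an illustration with assumed function names, using only the standard library. For the standard normal, $f'(x) = -x f(x)$ gives $r'(x) = x\, r(x) - 1$, which is evaluated at $b_m = Q(m)$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def reciprocal_hazard(x):
    """r(x) = (1 - F(x)) / f(x) for the standard normal."""
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - F(x), stable in the tail
    return survival / STD_NORMAL.pdf(x)


def tail_quantile(m):
    """Q(m) = F^{-1}(1 - 1/m) for the standard normal."""
    return STD_NORMAL.inv_cdf(1.0 - 1.0 / m)


def penultimate_shape(m):
    """xi_m = r'(b_m) with b_m = Q(m), using r'(x) = x * r(x) - 1."""
    b_m = tail_quantile(m)
    return b_m * reciprocal_hazard(b_m) - 1.0


if __name__ == "__main__":
    for m in (10, 100, 10_000, 1_000_000):
        print(f"m={m:>9}  b_m={tail_quantile(m):.4f}  xi_m={penultimate_shape(m):+.4f}")
```

The values of $\xi_m$ are negative and approach the limiting value $\xi = 0$ only slowly as $m$ grows, which is one way to read the remark that a GEV with a small negative shape can describe Gaussian maxima better than the Gumbel limit at realistic sample sizes.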

Using the GEV directly, rather than the three types separately, allows flexible modelling and sidesteps the question of which type is most appropriate: the data decide.

Diagnostics: Gumbel plot, QQ plot
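A Gumbel plot can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the plotting position $i/(n+1)$, the helper names, and the simulated data are assumptions made here. The ordered maxima are paired with the reduced variate $-\ln(-\ln p_i)$; for Gumbel data the points fall near a straight line whose slope and intercept approximate $\sigma$ and $\mu$.

```python
import math
import random


def simulate_gumbel(mu, sigma, size, seed=0):
    """Draw Gumbel(mu, sigma) values as mu - sigma * ln(E), with E ~ Exponential(1)."""
    rng = random.Random(seed)
    return [mu - sigma * math.log(rng.expovariate(1.0)) for _ in range(size)]


def gumbel_plot_points(maxima):
    """Return (reduced variate, ordered maximum) pairs for a Gumbel plot."""
    ordered = sorted(maxima)
    n = len(ordered)
    # Plotting position i / (n + 1) for the i-th smallest value (assumed convention).
    return [
        (-math.log(-math.log(i / (n + 1))), x)
        for i, x in enumerate(ordered, start=1)
    ]


def least_squares_line(points):
    """Ordinary least-squares slope and intercept of y on u."""
    n = len(points)
    mean_u = sum(u for u, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in points)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in points)
    slope = s_uy / s_uu
    return slope, mean_y - slope * mean_u


if __name__ == "__main__":
    sample = simulate_gumbel(mu=2.0, sigma=3.0, size=2000)
    slope, intercept = least_squares_line(gumbel_plot_points(sample))
    print(f"slope (approx. sigma) = {slope:.3f}, intercept (approx. mu) = {intercept:.3f}")
```

Systematic curvature in such a plot suggests a non-zero shape parameter, in which case a QQ plot against the fitted GEV is the more appropriate check.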

- Anthony Davison. Lecture slides on Statistics of Extremes, Autumn 2011. http://stat.epfl.ch
- Richard L. Smith, 2003. Statistics of Extremes, With Applications in Environment, Insurance and Finance.
- Gumbel, E. J. 1958, Statistics of Extremes.