Standard data analysis and model-fitting techniques work well where the data have the greatest density, but can be severely biased when estimating tail probabilities. Extreme value theory deals with extreme deviations from the median of a probability distribution.
Key issues in extreme value theory:
Definition: Sample maximum is the maximum order statistic:
$$M_n = \max(X_1, \dots, X_n)$$
If the support of the population is bounded above, the maximum order statistic converges to the supremum of the support; more generally, since $P(M_n \leq x) = F(x)^n$, the unnormalized maximum always has a degenerate limit:
$$\forall x, F(x)<1: \lim_{n \to \infty} P(M_n \leq x) = 0$$
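A quick numerical illustration of this collapse (my own sketch; the standard exponential population and scipy are choices not made in the source):

```python
from scipy import stats

# P(M_n <= x) = F(x)**n at a fixed x with F(x) < 1; here F is standard exponential.
x = 5.0
F_x = stats.expon.cdf(x)          # F(x) ~ 0.9933 < 1
for n in (10, 100, 1_000, 10_000):
    print(n, F_x ** n)            # heads rapidly towards 0
```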
Under proper linear rescaling, some sequences of maxima have non-degenerate limiting distributions (CDF):
$$\exists\, \{a_m > 0\}, \{b_m\} \ \text{ s.t. } \ (M_m - b_m)/a_m \xrightarrow{d} H$$
Examples: rescaled exponential maxima converge to a Gumbel limit, rescaled Pareto maxima to a Fréchet limit, and rescaled uniform maxima to a Weibull limit (see the simulation sketch below).
Exceptions: some discrete distributions, such as the Poisson and the geometric, admit no linear rescaling under which the maxima have a non-degenerate limit.
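A simulation sketch of the exponential case (my own; the block size, replication count and seed are arbitrary choices): with $a_m = 1$ and $b_m = \log m$, rescaled exponential maxima are approximately standard Gumbel.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, reps = 1_000, 5_000

# For the standard exponential, a_m = 1 and b_m = log(m) give a Gumbel limit.
maxima = rng.exponential(size=(reps, m)).max(axis=1)
rescaled = maxima - np.log(m)
print(stats.kstest(rescaled, stats.gumbel_r.cdf))   # small KS statistic indicates a good fit
```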
Definition: A distribution $H$ is said to be max-stable if for every $k \in \mathbb{N}$ there exist constants $a_k > 0$ and $b_k$ such that $H^k(a_k x + b_k) = H(x)$ for all $x$.
Lemma: If rescaled maxima have a non-degenerate limit distribution, the limiting distribution must be max-stable.
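A quick numerical check of max-stability (my own sketch, using the standard Gumbel CDF, which is not worked out in the source): with $a_k = 1$ and $b_k = \log k$, the definition reduces to $H(x)^k = H(x - \log k)$.

```python
import numpy as np

# Standard Gumbel CDF H(x) = exp(-exp(-x)) is max-stable:
# H(x)**k == H(x - log k), i.e. a_k = 1 and b_k = log k in the definition above.
H = lambda x: np.exp(-np.exp(-x))

x = np.linspace(-2.0, 8.0, 101)
for k in (2, 7, 50):
    print(k, np.allclose(H(x) ** k, H(x - np.log(k))))   # True for every k
```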
Generalised extreme value (GEV) distribution is a three-parameter family:
$$\text{GEV}(\mu, \sigma, \xi): \quad H(x) = \exp\left\{-\left[1 + \frac{\xi(x-\mu)}{\sigma} \right]^{-1/\xi}\right\}, \qquad 1 + \frac{\xi(x-\mu)}{\sigma} > 0,$$
with the $\xi = 0$ case interpreted as the limit $H(x) = \exp\{-\exp[-(x-\mu)/\sigma]\}$.
Classification of GEV distributions by the shape parameter $\xi$: $\xi > 0$ gives the Fréchet type (heavy, power-law upper tail); $\xi = 0$, taken as the limit, gives the Gumbel type (light, exponentially decaying tail); $\xi < 0$ gives the Weibull type (finite upper endpoint).
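For computation, scipy's genextreme implements this family, but with the sign of the shape flipped: its shape parameter c equals $-\xi$. That sign mapping is the only substantive claim below; the numerical values are arbitrary illustrations.

```python
from scipy import stats

# scipy.stats.genextreme uses shape c = -xi relative to GEV(mu, sigma, xi),
# so a heavy-tailed Frechet-type GEV (xi > 0) corresponds to c < 0.
xi, mu, sigma = 0.2, 0.0, 1.0        # arbitrary illustrative values
gev = stats.genextreme(c=-xi, loc=mu, scale=sigma)
print(gev.cdf(1.0))                  # CDF at x = 1
print(gev.ppf(0.99))                 # 1-in-100 return level
```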
The extreme value theorem (or Fisher–Tippett–Gnedenko theorem [@Gnedenko1948]) says, if linearly rescaled maxima have a limiting distribution, then that limit will be a member of the generalized extreme value (GEV) family.
Extreme value theorem: The maximum of a sample of iid random variables after proper renormalization can only converge in distribution to one of three possible distributions, the Gumbel distribution, the Fréchet distribution, or the Weibull distribution, also known as type I, II and III extreme value distributions.
Rescaled maxima converge to a Fréchet distribution if and only if the tail $1 - F$ of the population distribution is regularly varying at infinity.
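A simulation sketch of this statement (my own; the Pareto index, block size and replication count are arbitrary choices): for a standard Pareto, $1 - F(x) = x^{-\alpha}$ is regularly varying, and maxima rescaled by $a_m = m^{1/\alpha}$, $b_m = 0$ are approximately Fréchet($\alpha$), which scipy calls invweibull.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, m, reps = 3.0, 1_000, 5_000

# Standard Pareto(alpha) by inverse transform: U in (0, 1], X = U**(-1/alpha).
u = 1.0 - rng.random((reps, m))
maxima = (u ** (-1.0 / alpha)).max(axis=1)

# Rescale by a_m = m**(1/alpha); the limit is Frechet(alpha) = scipy's invweibull.
rescaled = maxima / m ** (1.0 / alpha)
print(stats.kstest(rescaled, stats.invweibull(alpha).cdf))   # small KS statistic indicates a good fit
```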
Definition: For a sufficiently smooth distribution F with upper terminal $x_F$, the reciprocal hazard function is
$$r(x) = \frac{1 - F(x)}{f(x)}$$
Definition: Tail quantile function
$$Q(y) = F^{-1}(1 - 1/y), \quad y \in [1, \infty)$$
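A worked example of both definitions (my own illustration, using the standard exponential, which is not computed in the source): for $F(x) = 1 - e^{-x}$,
$$r(x) = \frac{e^{-x}}{e^{-x}} = 1, \qquad Q(y) = F^{-1}(1 - 1/y) = \log y.$$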
Theorem: (von Mises condition)
$$\xi = \lim_{x \to x_F} r'(x) \quad\Rightarrow\quad \exists\, \mu, \sigma: \ \frac{M_m - Q(m)}{r(Q(m))} \xrightarrow{d} \text{GEV}(\mu, \sigma, \xi)$$
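Applying the condition to two standard cases (a worked sketch, not part of the source): for the exponential above, $r'(x) = 0$, so $\xi = 0$ with norming $b_m = Q(m) = \log m$ and $a_m = r(Q(m)) = 1$, matching the simulation earlier; for a standard Pareto with $1 - F(x) = x^{-\alpha}$,
$$r(x) = \frac{x^{-\alpha}}{\alpha x^{-\alpha - 1}} = \frac{x}{\alpha}, \qquad r'(x) = \frac{1}{\alpha} \quad\Rightarrow\quad \xi = \frac{1}{\alpha} > 0 \ \text{(Fréchet type)}.$$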
Taking $\xi_m = r'(b_m)$ may give a better approximation to the distribution of $(M_m - b_m)/a_m$ for finite $m$ than using the limiting value of $\xi$; see the sketch after the next paragraph.
There has been a good deal of work on the speed of convergence of $M_m$ to the limiting regime, which depends on the underlying distribution $F$. For example, maxima of Gaussian variables converge very slowly, at a rate of order $1/\log m$.
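Both points can be illustrated numerically. The sketch below is my own construction: the block size, replication count and the identity $r'(x) = x\,r(x) - 1$ for the standard normal are not taken from the source. It compares the limiting Gumbel fit with the penultimate GEV fit using $\xi_m = r'(b_m)$ for maxima of $m = 100$ Gaussians; the penultimate fit typically gives a noticeably smaller Kolmogorov–Smirnov distance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, reps = 100, 20_000

def r(x):
    # Reciprocal hazard of the standard normal: (1 - Phi(x)) / phi(x).
    return stats.norm.sf(x) / stats.norm.pdf(x)

# Norming constants from the tail quantile and reciprocal hazard: b_m = Q(m), a_m = r(b_m).
b_m = stats.norm.ppf(1 - 1 / m)
a_m = r(b_m)
xi_m = b_m * r(b_m) - 1          # r'(x) = x*r(x) - 1 for the standard normal

maxima = rng.standard_normal((reps, m)).max(axis=1)
z = (maxima - b_m) / a_m

# Limiting Gumbel (xi = 0) versus penultimate GEV(0, 1, xi_m); scipy uses c = -xi.
print("Gumbel      KS:", stats.kstest(z, stats.gumbel_r.cdf).statistic)
print("penultimate KS:", stats.kstest(z, stats.genextreme(-xi_m).cdf).statistic)
```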
Direct use of the GEV rather than the three types separately allows for flexible modelling, and ducks the question of which type is most appropriate — the data decide.
Diagnostics: Gumbel plot, QQ plot
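A minimal sketch of a Gumbel-plot helper, assuming numpy and matplotlib; the function name, plotting positions and styling are my own choices, not from the source. Ordered block maxima are plotted against Gumbel quantiles; an approximately straight line supports $\xi = 0$, while curvature suggests a Fréchet ($\xi > 0$) or Weibull ($\xi < 0$) tail.

```python
import numpy as np
import matplotlib.pyplot as plt

def gumbel_plot(maxima, ax=None):
    """Gumbel plot: ordered block maxima against Gumbel quantiles.

    An approximately straight line supports a Gumbel (xi = 0) model;
    systematic curvature suggests xi != 0.
    """
    if ax is None:
        ax = plt.gca()
    x = np.sort(np.asarray(maxima, dtype=float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n          # plotting positions
    ax.scatter(-np.log(-np.log(p)), x, s=8)
    ax.set_xlabel("-log(-log p_i)  (Gumbel quantile)")
    ax.set_ylabel("ordered block maxima")
    return ax
```

The same idea gives a QQ plot against a fitted GEV by replacing the Gumbel quantiles with genextreme.ppf evaluated at the fitted parameters.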