Key issues in extreme value theory:
Standard data analysis/model-fitting techniques work well where the data have greatest density, but can be severely biased in estimating tail probabilities.
Definition: Sample Maximum
\[ M_m = \max(X_1, \ldots , X_m) \]
If the support of the population is bounded above, the maximum order statistic converges to the supremum of the support; in either case the unnormalised maximum has a degenerate limit, since \( P(M_m \leq x) = F(x)^m \). Symbolically,
\[ \forall x \text{ s.t. } F(x) < 1: \lim_{m \to \infty} P(M_m \leq x) = 0 \]
Under proper linear rescaling, some sequences of maxima have non-degenerate limiting distributions. Symbolically, \[ \exists \{ a_m, b_m: a_m > 0 \}, \text{ s.t. } (M_m - b_m) / a_m \Rightarrow H \]
Examples: for the standard exponential distribution, \( b_m = \log m \) and \( a_m = 1 \) give a Gumbel limit; for the standard Cauchy, \( b_m = 0 \) and \( a_m = m/\pi \) give a Fréchet limit.
Exceptions: discrete distributions such as the Poisson and the geometric admit no non-degenerate limit for rescaled maxima.
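A minimal simulation sketch for the standard exponential distribution (block size, replication count, and seed are arbitrary choices): with \( b_m = \log m \) and \( a_m = 1 \), the rescaled maxima already track the Gumbel cdf \( \exp(-e^{-x}) \) closely.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_blocks = 1000, 10_000          # block size and number of simulated maxima

# maxima of m standard exponentials, rescaled with b_m = log m, a_m = 1
maxima = rng.exponential(size=(n_blocks, m)).max(axis=1)
z = maxima - np.log(m)

# compare empirical probabilities with the Gumbel limit exp(-e^{-x})
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  empirical={np.mean(z <= x):.4f}  Gumbel={np.exp(-np.exp(-x)):.4f}")
```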
Definition: A distribution \(H\) is said to be max-stable if, for every \(k \in \mathbb{N}\), there exist constants \(a_k > 0\) and \(b_k\) such that \( H^k(x) = H(a_k x + b_k) \) for all \(x\).
Lemma: If rescaled maxima have a non-degenerate limit distribution, the limiting distribution must be max-stable.
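As a numerical illustration of the definition, the standard Fréchet distribution \( H(x) = \exp(-x^{-\alpha}) \), \( x > 0 \), is max-stable with \( a_k = k^{-1/\alpha} \) and \( b_k = 0 \); a quick check (here with \( \alpha = 1 \)):

```python
import numpy as np

def frechet_cdf(x, alpha=1.0):
    """Standard Frechet cdf H(x) = exp(-x^{-alpha}), evaluated for x > 0."""
    return np.exp(-np.asarray(x, dtype=float) ** (-alpha))

# max-stability: H^k(x) = H(a_k x + b_k) with a_k = k^{-1/alpha}, b_k = 0
x = np.linspace(0.1, 20.0, 200)
for k in (2, 5, 10):
    assert np.allclose(frechet_cdf(x) ** k, frechet_cdf(x / k))   # alpha = 1, so a_k = 1/k
print("H^k(x) = H(x/k) for the standard Frechet: max-stable")
```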
The extreme value theorem says that if linearly rescaled maxima have a non-degenerate limiting distribution, then that limit must be a member of the generalized extreme value (GEV) family.
Theorem: The maximum of a sample of iid random variables, after proper renormalization, can only converge in distribution to one of three possible distributions: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution, also known as the type I, II and III extreme value distributions.
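In standard notation, the three types are (for \( \alpha > 0 \)):
\[ \text{Gumbel: } \Lambda(x) = \exp\{-e^{-x}\}, \; x \in \mathbb{R}; \]
\[ \text{Fréchet: } \Phi_\alpha(x) = \exp\{-x^{-\alpha}\}, \; x > 0 \ (\text{and } 0 \text{ for } x \leq 0); \]
\[ \text{Weibull: } \Psi_\alpha(x) = \exp\{-(-x)^{\alpha}\}, \; x < 0 \ (\text{and } 1 \text{ for } x \geq 0). \]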
Credit for the extreme value theorem, aka the Fisher–Tippett–Gnedenko theorem, goes to Fisher and Tippett (1928), who identified the three types, and to Gnedenko (1943), who gave the first rigorous proof.
Generalised Extreme Value (GEV) distribution:
\[ H(x) = \exp \left\{ - \left( 1 + \frac{\xi (x - \mu)}{\psi} \right)_{+}^{-1/\xi} \right\}, \]
where the case \( \xi = 0 \) is interpreted as the limit \( \exp\{-e^{-(x-\mu)/\psi}\} \) (Gumbel); \( \xi > 0 \) gives the Fréchet type and \( \xi < 0 \) the Weibull type.
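A minimal sketch evaluating this cdf directly (the function name and the handling choices are mine):

```python
import numpy as np

def gev_cdf(x, xi, mu=0.0, psi=1.0):
    """GEV cdf H(x) = exp{-[1 + xi (x - mu)/psi]_+^{-1/xi}}; xi = 0 is the Gumbel limit."""
    z = (np.asarray(x, dtype=float) - mu) / psi
    if xi == 0.0:
        return np.exp(-np.exp(-z))            # Gumbel form exp{-e^{-z}}
    t = np.maximum(1.0 + xi * z, 0.0)         # the (.)_+ truncation
    with np.errstate(divide="ignore"):        # t = 0 only outside the support
        # outside the support this gives 0 (below the lower endpoint, xi > 0)
        # or 1 (above the upper endpoint, xi < 0)
        return np.exp(-t ** (-1.0 / xi))
```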
Definition: For a sufficiently smooth distribution \(F\) with upper terminal \(x_F\), define the reciprocal hazard function as \( r(x) = \frac{1 - F(x)}{f(x)} \).
Definition: Tail quantile function \( Q(y) = F^{-1}(1 - 1/y), \; y \in [1, \infty] \).
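Two standard worked cases of these definitions: for the unit exponential, \( F(x) = 1 - e^{-x} \), so
\[ r(x) = \frac{e^{-x}}{e^{-x}} = 1, \qquad Q(y) = \log y; \]
for the Pareto with \( F(x) = 1 - x^{-\alpha} \) on \( x \geq 1 \),
\[ r(x) = \frac{x^{-\alpha}}{\alpha x^{-\alpha-1}} = \frac{x}{\alpha}, \qquad Q(y) = y^{1/\alpha}. \]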
von Mises conditions: If \( \xi = \lim_{x \to x_F} r'(x) \) exists, set \( b_m = Q(m) \) and \( a_m = r(b_m) \); then \( \frac{M_m - b_m}{a_m} \) converges in distribution to a GEV distribution with shape parameter \( \xi \).
Proof (sketch): \( P\left( \frac{M_m - b_m}{a_m} \leq x \right) = F(a_m x + b_m)^m = \exp\{ m \log F(a_m x + b_m) \} \approx \exp\{ -m[1 - F(a_m x + b_m)] \} \), since \( \log F \approx -(1 - F) \) near \( x_F \). Because \( 1 - F(b_m) = 1/m \) and \( a_m = r(b_m) \), an expansion of \( 1 - F \) about \( b_m \) using \( r'(x) \to \xi \) gives \( m[1 - F(a_m x + b_m)] \to (1 + \xi x)_{+}^{-1/\xi} \), which yields the GEV limit.
Taking \( \xi_m = r'(b_m) \) may give a better approximation to the distribution of \( \frac{M_m - b_m}{a_m} \) for finite \(m\) than does using the limiting approximation.
There has been a good deal of work on the speed of convergence of \(M_m\) to the limiting regime, which depends on the underlying distribution \(F\). For example, maxima of Gaussian variables converge very slowly: the error of the Gumbel approximation decays only at rate \( 1/\log m \).
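A simulation sketch of both points for the standard normal (block size, replication count, and seed are arbitrary): the von Mises constants are \( b_m = \Phi^{-1}(1 - 1/m) \) and \( a_m = r(b_m) = \{1 - \Phi(b_m)\}/\phi(b_m) \), and since \( \phi'(x) = -x\phi(x) \) the penultimate shape is \( \xi_m = r'(b_m) = b_m r(b_m) - 1 < 0 \). At \( m = 100 \) the empirical tail probabilities of the rescaled maxima still differ visibly from the Gumbel limit, and the penultimate GEV(\( \xi_m \)) approximation is usually closer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n_blocks = 100, 50_000                # block size and number of simulated maxima

# von Mises norming constants for the standard normal:
#   b_m = Q(m) = Phi^{-1}(1 - 1/m),  a_m = r(b_m) = (1 - Phi(b_m)) / phi(b_m)
b_m = stats.norm.ppf(1.0 - 1.0 / m)
a_m = (1.0 - stats.norm.cdf(b_m)) / stats.norm.pdf(b_m)
xi_m = b_m * a_m - 1.0                   # penultimate shape r'(b_m), negative here

# simulate rescaled maxima (M_m - b_m) / a_m
z = (rng.standard_normal((n_blocks, m)).max(axis=1) - b_m) / a_m

# empirical tail probabilities vs the Gumbel limit vs the penultimate GEV(xi_m)
for x in (0.0, 1.0, 2.0, 3.0):
    emp = np.mean(z > x)
    gumbel = 1.0 - np.exp(-np.exp(-x))
    penult = 1.0 - np.exp(-max(1.0 + xi_m * x, 0.0) ** (-1.0 / xi_m))
    print(f"x={x:.1f}  empirical={emp:.4f}  Gumbel={gumbel:.4f}  GEV(xi_m)={penult:.4f}")
```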
Direct use of the GEV rather than the three types separately allows for flexible modelling, and ducks the question of which type is most appropriate — the data decide.
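A sketch of such a fit on simulated block maxima (the data here are hypothetical; note that scipy's genextreme parametrises the shape with the opposite sign, \( c = -\xi \)):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# hypothetical block maxima: 50 "years", each the maximum of 365 unit-exponential values
annual_max = rng.exponential(size=(50, 365)).max(axis=1)

# maximum-likelihood GEV fit; scipy.stats.genextreme uses shape c = -xi
c_hat, mu_hat, psi_hat = stats.genextreme.fit(annual_max)
xi_hat = -c_hat
print(f"xi_hat={xi_hat:.3f}  mu_hat={mu_hat:.3f}  psi_hat={psi_hat:.3f}")

# the sign of xi_hat indicates which type the data favour:
# xi > 0 Frechet (heavy tail), xi = 0 Gumbel, xi < 0 Weibull (bounded upper tail)
```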