Laws of large numbers (LLN)

Idea: the sample mean converges to the population mean as the sample size goes to infinity.

Weak law of large numbers (WLLN)

Version 1: For a sequence of pairwise uncorrelated random variables with a common expectation, if the supremum of the variances does not grow too fast, then their average converges in probability to the common expectation. Symbolically, if $\forall X_i, X_j \in \{ X_n \}, i \neq j: \mathrm{Cov}(X_i, X_j) = 0$, $\mathbb{E} X_i = \mathbb{E} X_j = \mu$, and $\sup_{i \le n} \mathrm{Var} X_i = o(n)$, then $$\frac{1}{n} \sum_i X_i \overset{p}{\to} \mu$$ The proof is by Chebyshev's inequality: uncorrelatedness gives $\mathrm{Var}\big(\tfrac{1}{n}\sum_i X_i\big) = \tfrac{1}{n^2}\sum_i \mathrm{Var} X_i \le \tfrac{1}{n}\sup_{i \le n}\mathrm{Var} X_i \to 0$.

Version 2: For a random sample from a population with finite population mean, the sample mean converges in probability to the population mean. Symbolically, if $\{ X_i \} \text{ i.i.d. } X$ and $\mathbb{E}|X| < \infty$, then $$\frac{1}{n} \sum_i X_i \overset{p}{\to} \mathbb{E}X$$
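
A minimal Monte Carlo sketch of the WLLN, assuming NumPy and using an Exp(1) population purely for illustration: the probability that the sample mean deviates from the population mean by more than a tolerance $\varepsilon$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 2000      # Exp(1) population mean, tolerance, replications

for n in (10, 100, 1000):
    # Draw `reps` independent samples of size n and compute each sample mean.
    sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    # Monte Carlo estimate of P(|mean_n - mu| > eps); it shrinks toward 0 as n grows.
    print(n, np.mean(np.abs(sample_means - mu) > eps))
```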

Strong law of large numbers (SLLN) by Kolmogorov

For a random sample from a population with finite population mean, the sample mean converges almost surely to the population mean. Symbolically, if $\{ X_i \} \text{ i.i.d. } X$ and $\mathbb{E}|X| < \infty$, then $$\frac{1}{n} \sum_i X_i \overset{a.s.}{\to} \mathbb{E}X$$
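
In contrast with the WLLN, the SLLN concerns a single realization: along one sample path the running mean settles down to the population mean. A minimal sketch, again assuming NumPy and an Exp(1) population:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)             # one long i.i.d. Exp(1) path, EX = 1
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)   # running mean along this single path
for n in (100, 1_000, 10_000, 100_000):
    print(n, running_mean[n - 1])                        # checkpoints approach 1
```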

Central limit theorem (CLT)

Idea 1: the distribution of the sample mean, centered at the population mean and scaled by $\sqrt{n}$, is asymptotically Gaussian with variance equal to the population variance.

Idea 2: In many cases, the sum of (centered) independent random variables is approximately Gaussian in distribution. The necessary and sufficient conditions are:

  1. Each summand should be negligible compared to the dispersion of the sum, unless it itself has a distribution close to Gaussian.
  2. ... (refer to review paper of CLT)

Because of the CLT, Gaussian random variables are often used to approximate finite sums of random variables. With current computing capacity, however, the importance of approximations like the CLT is somewhat lessened.

Lindeberg-Lévy CLT

If $\{ X_i \}$ is a random sample from a population with finite mean $\mathbb{E}X$ and finite, positive variance $\mathrm{Var}X$, then $$\sqrt{n} \left( \frac{1}{n} \sum_i X_i - \mathbb{E}X \right) \Rightarrow N(0, \mathrm{Var}X )$$
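
A simulation sketch, assuming NumPy and SciPy and using an Exp(1) population for illustration: the standardized sample mean $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ is approximately standard normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 500, 5000
mu, sigma = 1.0, 1.0                                     # mean and sd of Exp(1)

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma     # sqrt(n) (mean - mu) / sigma
print(stats.kstest(z, "norm"))                           # compare z against N(0, 1)
```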

Lindeberg-Feller CLT

If $\{ X_i \}$ is a sequence of independent random variables in $L^2 (\Omega, \Sigma, P)$ with $\mu_i = \mathbb{E}X_i$, $\sigma_i^2 = \mathrm{Var}X_i$, and $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$, and the Lindeberg condition holds for every $\varepsilon > 0$: $$\lim_{n \to \infty} \frac{1}{s_n^2}\sum_{i = 1}^{n} \mathbb{E}\big[(X_i - \mu_i)^2 \cdot \mathbf{1}_{\{ | X_i - \mu_i | > \varepsilon s_n \}} \big] = 0$$ then $$\frac{1}{s_n} \sum_{i=1}^{n} ( X_i - \mu_i ) \Rightarrow N(0, 1)$$
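
A sketch of the Lindeberg-Feller setting with non-identically distributed summands, assuming NumPy and SciPy; the uniform scales below are an illustrative assumption. Bounded summands with $s_n \to \infty$ satisfy the Lindeberg condition, so $\frac{1}{s_n}\sum_i (X_i - \mu_i)$ is approximately $N(0, 1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 1000, 5000
half_widths = 1.0 + np.sqrt(np.arange(1, n + 1) % 10)    # bounded, heterogeneous scales a_i
sigma2 = half_widths ** 2 / 3.0                          # Var Uniform(-a, a) = a^2 / 3
s_n = np.sqrt(sigma2.sum())                              # s_n grows like sqrt(n), so Lindeberg holds

x = rng.uniform(-half_widths, half_widths, size=(reps, n))   # independent, not identically distributed
z = x.sum(axis=1) / s_n                                  # mu_i = 0, so no centering needed
print(stats.kstest(z, "norm"))                           # standardized sums look standard normal
```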

Berry-Esseen CLT

If $\{ X_i \}$ is a random sample from a population with finite $L^3$-norm, then $$\sup_{z\in\mathbb{R}} \lvert F_{Z_n}(z) - F_{Z}(z) \rvert \leq \frac{c}{\sqrt{n}} \frac{ \mathbb{E}|X-\mu|^3 }{\sigma^3}$$ where $Z_n = \sqrt{n} \frac{\bar{X}_n-\mu}{\sigma}$ and $Z \sim N(0,1)$. Here $c$ is an absolute constant with $c \in [\tfrac{1}{\sqrt{2\pi}}, 0.8)$.

The Berry-Esseen CLT quantifies the accuracy of the normal approximation.
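
A numerical sketch, assuming NumPy and SciPy, comparing the observed Kolmogorov distance with the Berry-Esseen bound for an Exp(1) population; it uses $c = 0.8$, the upper end of the constant range quoted above.

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 1.0, 50, 100_000

# Third absolute central moment E|X - mu|^3 of the Exp(1) population, by quadrature.
rho, _ = integrate.quad(lambda x: abs(x - mu) ** 3 * np.exp(-x), 0, np.inf)

z = np.sqrt(n) * (rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1) - mu) / sigma
sup_dist = stats.kstest(z, "norm").statistic             # observed sup_z |F_{Z_n}(z) - F_Z(z)|
bound = 0.8 * rho / (sigma ** 3 * np.sqrt(n))            # Berry-Esseen bound with c = 0.8
print(sup_dist, bound)                                   # the observed distance sits below the bound
```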

Generalized Central Limit Theorem (GCLT)

A distribution is stable if any positive linear combination of two independent copies has the same distribution, up to location and scale parameters:

A non-degenerate random variable $X$ has a stable distribution if, for independent copies $X_1, X_2$ of $X$, $$\forall a, b > 0, \exists c > 0, d \in \mathbb{R}: a X_1 + b X_2 \sim c X + d$$ The distribution is strictly stable if $d$ can always be taken to be $0$. For example, if $X \sim N(0,1)$ then $a X_1 + b X_2 \sim N(0, a^2 + b^2)$, so $c = \sqrt{a^2 + b^2}$ and $d = 0$: the standard normal is strictly stable.

Stable distributions have characteristic functions of the form:

$$\begin{aligned} \varphi(t; \alpha, \beta, c, \mu) &= \exp \{ i t \mu - |c t|^\alpha (1 - i \beta \text{sgn}(t) \Phi(\alpha, t) ) \} \\ \Phi(\alpha, t) &= \begin{cases} \tan(\pi \alpha / 2) & (\alpha \neq 1) \\ -2/\pi \log|t| & (\alpha = 1) \end{cases} \end{aligned}$$

Stable distributions form a four-parameter family, with stability parameter $\alpha$, skewness parameter $\beta$ (not the standardized 3rd moment), scale parameter $c$, and location parameter $\mu$.

The stability parameter takes values in $(0, 2]$ and roughly corresponds to concentration (larger $\alpha$ means lighter tails):

  1. $\alpha = 2$: normal distribution;
  2. $0 < \alpha < 2$: the variance is infinite;
  3. $0 < \alpha \le 1$: expectation undefined;

The skewness parameter takes values in $[-1, 1]$ and roughly corresponds to asymmetry:

  1. $\beta = 0$: the distribution is symmetric about $\mu$;
  2. $\beta = 1$ and $\alpha < 1$: the distribution has support $[\mu, +\infty)$;

Special cases:

  1. $\alpha = 1, \beta = 0$: Cauchy distribution (checked numerically in the sketch below);
  2. $\alpha = 0.5, \beta = 1$: Lévy distribution;
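
As a check on the first special case, a short sketch (assuming NumPy; the sample size and evaluation points are arbitrary choices) compares the characteristic function formula above at $\alpha = 1, \beta = 0$, which reduces to $\exp(i t \mu - c|t|)$, with the empirical characteristic function of standard Cauchy samples.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.standard_cauchy(size=200_000)                    # Cauchy(0, 1) samples
t = np.array([0.3, 1.0, 2.5])

def stable_cf(t, alpha, beta, c, mu):
    # Characteristic function of a stable law, as given above; with beta = 0 the
    # Phi(alpha, t) term drops out entirely.
    phi = np.tan(np.pi * alpha / 2) if alpha != 1 else -2 / np.pi * np.log(np.abs(t))
    return np.exp(1j * t * mu - np.abs(c * t) ** alpha * (1 - 1j * beta * np.sign(t) * phi))

empirical = np.exp(1j * np.outer(t, x)).mean(axis=1)     # empirical characteristic function
print(stable_cf(t, alpha=1, beta=0, c=1, mu=0))          # equals exp(-|t|) for Cauchy(0, 1)
print(empirical)                                         # should be close to the line above
```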

Generalized Central Limit Theorem (Gnedenko and Kolmogorov, 1954)

The sum of independent, identically distributed random variables with a symmetric probability density decreasing as the power law $|x|^{−\alpha − 1}$, where $0 < \alpha < 2$ (and therefore having infinite variance), suitably normalized, tends to a stable distribution $f(x; \alpha, 0, c, 0)$ as the number of summands grows; if $\alpha > 2$, the variance is finite and the classical central limit theorem applies.
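
A simulation sketch of the GCLT, assuming NumPy; the symmetric Pareto population and $\alpha = 1.5$ are illustrative choices. The summands have infinite variance, yet sums normalized by $n^{1/\alpha}$ stabilize in distribution as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, reps = 1.5, 4000

def normalized_sum(n):
    # Symmetric Pareto tails: density ~ |x|^(-alpha - 1), so the variance is infinite.
    magnitude = 1.0 + rng.pareto(alpha, size=(reps, n))  # classical Pareto(alpha) on [1, inf)
    signs = rng.choice([-1.0, 1.0], size=(reps, n))
    return (signs * magnitude).sum(axis=1) / n ** (1.0 / alpha)  # GCLT normalization n^(1/alpha)

qs = [0.1, 0.3, 0.5, 0.7, 0.9]
for n in (100, 400, 1600):
    # Quantiles of the normalized sums stay roughly the same across n,
    # consistent with convergence to a common stable limit.
    print(n, np.round(np.quantile(normalized_sum(n), qs), 3))
```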

Continuous mapping theorem (CMT)

Idea: a mapping that is continuous on the support of the stochastic limit preserves convergence of random variables and of their distributions.

Continuous mapping theorem

Thm: Let $h(\cdot)$ be a function that is continuous on the support of $X$. If $X_i \overset{p}{\to} X$, then $h(X_i) \overset{p}{\to} h(X)$.

The same theorem also holds for almost sure convergence and for convergence in distribution.

Cor: Slutsky's Theorem

Slutsky's Theorem is an important corollary of CMT:

If $X_i \Rightarrow X$, and $Y_i \overset{p}{\to} a$ where $a$ is a constant, then (a simulation sketch follows the list)

  1. $Y_i X_i \Rightarrow aX$
  2. $Y_i + X_i \Rightarrow a + X$
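
A standard application of Slutsky's theorem is the t-statistic: $\sqrt{n}(\bar{X}_n - \mu)/S_n \Rightarrow N(0, 1)$, because $\sqrt{n}(\bar{X}_n - \mu) \Rightarrow N(0, \sigma^2)$ while $S_n \overset{p}{\to} \sigma$. A simulation sketch, assuming NumPy and SciPy and an Exp(1) population:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps, mu = 500, 5000, 1.0

samples = rng.exponential(scale=mu, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)                 # S_n -> sigma in probability
t = np.sqrt(n) * (xbar - mu) / s                # Slutsky: replacing sigma by S_n keeps the limit
print(stats.kstest(t, "norm"))                  # t is approximately standard normal
```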

Cor: The Delta method

Idea: a continuously differentiable function preserves the asymptotic distribution, transformed linearly by its Jacobian at the limit.

Thm: Given $\mathbf{X}_n \overset{p}{\to} \mathbf{b}$ and $a_n ( \mathbf{X}_n - \mathbf{b} ) \Rightarrow \mathbf{X}$ with $a_n \to \infty$. If $g: \mathbb{R}^d \to \mathbb{R}^r$ is continuously differentiable at $\mathbf{b}$, then $$a_n [ g(\mathbf{X}_n) - g(\mathbf{b}) ] \Rightarrow \mathrm{D}g(\mathbf{b})\, \mathbf{X}$$ where $\mathrm{D}g(\mathbf{b})$ is the $r \times d$ Jacobian matrix of $g$ at $\mathbf{b}$.
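
A one-dimensional sketch of the delta method, assuming NumPy and SciPy; $g(x) = \log x$ and the Exp(1) population ($\mu = \sigma = 1$) are illustrative choices, so the limit is $N(0, g'(\mu)^2 \sigma^2) = N(0, 1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, reps, mu = 500, 5000, 1.0

xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (np.log(xbar) - np.log(mu))    # a_n [ g(X_n) - g(b) ] with g = log, a_n = sqrt(n)
print(stats.kstest(z, "norm"))                  # delta method limit: N(0, g'(mu)^2 sigma^2) = N(0, 1)
```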

Appendix


🏷 Category=Probability