Idea: sample mean converges to population mean, as sample size goes to infinity.
Version 1: For a sequence of pairwise uncorrelated random variables with a common expectation, if the variances do not grow too fast, then their average converges in probability to the expectation. Symbolically, if $\forall X_i, X_j \in \{ X_n \}, i \neq j$: $\mathrm{Cov}(X_i, X_j) = 0$, $\mathbb{E} X_i = \mathbb{E} X_j = \mu$, and $\sup_{i \leq n} \mathrm{Var}\, X_i = o(n)$, then $$\frac{1}{n} \sum_i X_i \overset{p}{\to} \mu$$ (Proof idea: by uncorrelatedness, $\mathrm{Var} \left( \frac{1}{n} \sum_i X_i \right) \leq \frac{1}{n} \sup_{i \leq n} \mathrm{Var}\, X_i \to 0$; then apply Chebyshev's inequality.)
Version 2: For a random sample from a population with finite population mean, the sample mean converges in probability to the population mean. Symbolically, if $\{ X_i \}$ i.i.d. $X$ and $\mathbb{E}|X| < \infty$, then $$\frac{1}{n} \sum_i X_i \overset{p}{\to} \mathbb{E}X$$
Version 3 (strong law): For a random sample from a population with finite population mean, the sample mean converges almost surely to the population mean. Symbolically, if $\{ X_i \}$ i.i.d. $X$ and $\mathbb{E}|X| < \infty$, then $$\frac{1}{n} \sum_i X_i \overset{a.s.}{\to} \mathbb{E}X$$
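The law of large numbers is easy to see in a short simulation (a minimal NumPy sketch; the exponential(1) population, sample size, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# One long i.i.d. sample from an exponential(1) population (true mean = 1);
# track the running sample mean as n grows.
x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

print(abs(running_mean[99] - 1.0))    # error at n = 100
print(abs(running_mean[-1] - 1.0))    # error at n = 100,000
```

The error typically shrinks at roughly the $1/\sqrt{n}$ rate, which is exactly what the CLT below quantifies.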
Idea 1: the distribution of the sample mean, centered and scaled by $\sqrt{n}$, is asymptotically Gaussian, with variance equal to the population variance.
Idea 2: In many cases, the sum of centered independent random variables is distributed approximately Gaussian. The Lindeberg condition below is sufficient; combined with Feller's condition (no single summand's variance dominates the total), it is also necessary.
Because of the CLT, Gaussian random variables are often used to approximate finite sums of random variables. With modern computing capacity, however, the practical importance of approximations like the CLT is somewhat lessened.
$\{ X_i \}$ is a random sample from a population with finite population mean $\mathbb{E}X$ and variance $\mathrm{Var}\,X$, then $$\sqrt{n} \left( \frac{1}{n} \sum_i X_i - \mathbb{E}X \right) \Rightarrow N(0, \mathrm{Var}X )$$
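A quick Monte Carlo check of the statement (assumptions: an exponential(1) population, so $\mathbb{E}X = \mathrm{Var}\,X = 1$; sample size and replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Population: exponential(1), so E X = 1 and Var X = 1.
# Z_n = sqrt(n) * (sample mean - 1), over many replications,
# should be approximately N(0, 1).
n, reps = 1_000, 10_000
samples = rng.exponential(1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

print(z.mean(), z.std())   # should be near 0 and 1 respectively
```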
$\{ X_i \}$ is a sequence of independent random variables in $L^2 (\Omega, \Sigma, P)$ with means $\mu_i$ and variances $\sigma_i^2$. If the Lindeberg condition holds for every $\varepsilon > 0$: $$\lim_{n \to \infty} \frac{1}{s_n^2}\sum_{i = 1}^{n} \mathbb{E}\big[(X_i - \mu_i)^2 \cdot \mathbf{1}_{\{ | X_i - \mu_i | > \varepsilon s_n \}} \big] = 0$$ , where $s_n^2 = \sum_{i=1}^n \sigma_i^2$, then $$\frac{1}{s_n} \sum_{i=1}^{n} ( X_i - \mu_i ) \Rightarrow N(0, 1)$$
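A sketch with non-identically distributed summands (the uniform scales $\sqrt{i}$ are an arbitrary choice; bounded summands with $\max_{i \leq n} \sigma_i / s_n \to 0$ satisfy the Lindeberg condition):

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent but NOT identically distributed summands:
# X_i ~ Uniform(-sqrt(i), sqrt(i)), so mu_i = 0, sigma_i^2 = i/3,
# and s_n^2 = n(n+1)/6.  Bounded summands with max_i sigma_i / s_n -> 0
# satisfy the Lindeberg condition, so (1/s_n) * sum_i (X_i - mu_i)
# should be approximately N(0, 1).
n, reps = 1_000, 10_000
half_widths = np.sqrt(np.arange(1, n + 1))
x = rng.uniform(-1.0, 1.0, size=(reps, n)) * half_widths
s_n = np.sqrt(n * (n + 1) / 6.0)
z = x.sum(axis=1) / s_n

print(z.mean(), z.std())   # near 0 and 1
```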
$\{ X_i \}$ is a random sample from a population with finite third absolute moment $\mathbb{E}|X|^3$, then $$\sup_{z\in\mathbb{R}} \lvert F_{Z_n}(z) - F_{Z}(z) \rvert \leq \frac{c}{\sqrt{n}} \frac{ \mathbb{E}|X-\mu|^3 }{\sigma^3}$$ , where $Z_n = \sqrt{n} \frac{\bar{X}_n-\mu}{\sigma}$ and $Z \sim N(0,1)$; $c$ is a universal constant with $c \in [\tfrac{1}{\sqrt{2\pi}}, 0.8)$.
The Berry–Esseen theorem quantifies the accuracy of the Gaussian approximation in the CLT.
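A numerical check of the bound, using a Bernoulli(1/2) population for which $\mathbb{E}|X-\mu|^3 / \sigma^3 = 1$; taking $c = 0.8$, the upper end of the range quoted above, the bound reads $c/\sqrt{n}$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

# Population: Bernoulli(1/2), for which E|X - mu|^3 / sigma^3 = 1,
# so the Berry-Esseen bound reads sup_z |F_{Z_n}(z) - Phi(z)| <= c / sqrt(n).
n, reps = 100, 50_000
x = rng.integers(0, 2, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 0.5) / 0.5

# Empirical Kolmogorov distance between Z_n and N(0, 1).
z_sorted = np.sort(z)
phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z_sorted])
ecdf_hi = np.arange(1, reps + 1) / reps
ecdf_lo = np.arange(0, reps) / reps
ks = max((ecdf_hi - phi).max(), (phi - ecdf_lo).max())

print(ks, 0.8 / np.sqrt(n))   # empirical distance vs. the bound with c = 0.8
```

For this lattice population the distance is dominated by the jump of the binomial CDF at its center, so the bound is not far from tight at moderate $n$.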
A distribution is stable if every linear combination of two independent copies of a random variable with that distribution has the same distribution, up to location and scale parameters:
A non-degenerate random variable $X$ has a stable distribution if, for independent copies $X_1, X_2$ of $X$, $$\forall a, b > 0, \exists c > 0, d \in \mathbb{R}: a X_1 + b X_2 \sim c X + d$$ The distribution is strictly stable if $d = 0$.
Stable distributions have characteristic function:
$$\begin{aligned} \varphi(t; \alpha, \beta, c, \mu) &= \exp \{ i t \mu - |c t|^\alpha (1 - i \beta \text{sgn}(t) \Phi(\alpha, t) ) \} \\ \Phi(\alpha, t) &= \begin{cases} \tan(\pi \alpha / 2) & (\alpha \neq 1) \\ -2/\pi \log|t| & (\alpha = 1) \end{cases} \end{aligned}$$
Stable distributions form a four-parameter family, with stability parameter $\alpha$, skewness parameter $\beta$ (not the standardized 3rd moment), scale parameter $c$, and location parameter $\mu$.
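The characteristic function above is easy to evaluate directly; for instance, $\alpha = 1, \beta = 0$ recovers the Cauchy characteristic function $e^{-|ct|}$, and $\alpha = 2$ the Gaussian one (a minimal sketch, implementing the formula exactly as written above):

```python
import numpy as np

def stable_cf(t, alpha, beta, c, mu):
    """Characteristic function of a stable law, as parameterized above."""
    t = np.asarray(t, dtype=float)
    if alpha == 1.0:
        # log|t| is undefined at t = 0, but the whole skewness term
        # vanishes there, so define Phi(1, 0) = 0.
        with np.errstate(divide="ignore"):
            phi = np.where(t == 0.0, 0.0, -(2.0 / np.pi) * np.log(np.abs(t)))
    else:
        phi = np.tan(np.pi * alpha / 2.0)
    return np.exp(1j * t * mu
                  - np.abs(c * t) ** alpha
                  * (1.0 - 1j * beta * np.sign(t) * phi))

t = np.linspace(-3.0, 3.0, 13)
print(np.allclose(stable_cf(t, 1.0, 0.0, 1.0, 0.0), np.exp(-np.abs(t))))  # Cauchy
print(np.allclose(stable_cf(t, 2.0, 0.0, 1.0, 0.0), np.exp(-t ** 2)))     # Gaussian
```

Note that $\alpha = 2$ gives $e^{-c^2 t^2}$, a Gaussian with variance $2c^2$, so the scale parameter $c$ is not the standard deviation.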
The stability parameter takes values in $(0, 2]$, and roughly corresponds to concentration (smaller $\alpha$ means heavier tails):
The skewness parameter takes values in $[-1, 1]$, and roughly corresponds to symmetry:
Special cases:
Generalized Central Limit Theorem: {Gnedenko and Kolmogorov, 1954}
The sum of a number of random variables with probability density decreasing as power law $|x|^{−\alpha − 1}$ where $0 < \alpha < 2$ (and therefore having infinite variance) will tend to a stable distribution $f(x; \alpha, 0, c, 0)$ as the number of summands grows; if $\alpha > 2$, the classical central limit theorem applies.
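The failure of classical averaging in the $\alpha < 2$ regime is visible with the standard Cauchy ($\alpha = 1$): the mean of $n$ i.i.d. Cauchy draws is again standard Cauchy, so its spread does not shrink as $n$ grows (sample sizes and replication count below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Standard Cauchy is stable with alpha = 1: the mean of n i.i.d. Cauchy
# draws is again standard Cauchy, so the sample mean never concentrates.
reps = 10_000
iqrs = []
for n in (10, 2_000):
    means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)
    q1, q3 = np.percentile(means, [25, 75])
    iqrs.append(q3 - q1)
    print(n, q3 - q1)   # IQR stays near 2, the standard Cauchy IQR
```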
Idea: Continuous mapping, on support of the stochastic limit, preserves convergence of random variables and distributions.
Thm: $h(\cdot)$ is a continuous function on the support of $X$. If $X_i \overset{p}{\to} X$, then $h(X_i) \overset{p}{\to} h(X)$
The same theorem also holds for almost sure convergence, convergence in $L^2$, and convergence of distribution functions.
Slutsky's Theorem is an important corollary of CMT:
If $X_i \Rightarrow X$ and $Y_i \overset{p}{\to} a$, then $X_i + Y_i \Rightarrow X + a$, $X_i Y_i \Rightarrow a X$, and, provided $a \neq 0$, $X_i / Y_i \Rightarrow X / a$.
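A typical application of Slutsky's Theorem is studentization (a sketch assuming an exponential(1) population; sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Studentization: with exponential(1) data, sqrt(n)*(mean - 1) => N(0, 1)
# and the sample standard deviation -> 1 in probability, so by Slutsky
# the t-like ratio also converges to N(0, 1).
n, reps = 2_000, 10_000
x = rng.exponential(1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)
s = x.std(axis=1, ddof=1)    # consistent estimator of sigma = 1
t_stat = z / s

print(t_stat.mean(), t_stat.std())   # near 0 and 1
```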
Idea: Continuously differentiable function preserves asymptotic distribution.
Thm: Given $\mathbf{X}_n \overset{p}{\to} \mathbf{b}$ and $a_n ( \mathbf{X}_n - \mathbf{b} ) \Rightarrow \mathbf{X}$. If $g: \mathbb{R}^d \to \mathbb{R}^r$ is continuously differentiable at $\mathbf{b}$, then $$a_n [ g(\mathbf{X}_n) - g(\mathbf{b}) ] \Rightarrow \nabla g(\mathbf{b}) \, \mathbf{X}$$ , where $\nabla g(\mathbf{b})$ is the $r \times d$ Jacobian matrix of $g$ at $\mathbf{b}$.
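A Monte Carlo sketch of the delta method in one dimension (assumptions: exponential(1) population and the arbitrary choice $g(x) = x^2$, so $g'(1) = 2$):

```python
import numpy as np

rng = np.random.default_rng(6)

# X_i ~ exponential(1), so sqrt(n)*(mean - 1) => N(0, 1).
# With g(x) = x**2 and g'(1) = 2, the delta method predicts
# sqrt(n)*(mean**2 - 1) => N(0, g'(1)^2 * 1) = N(0, 4).
n, reps = 2_000, 10_000
m = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
w = np.sqrt(n) * (m ** 2 - 1.0)

print(w.mean(), w.std())   # mean near 0, std near 2
```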