A sequence of estimators of a model parameter, based on a random sample, is **consistent** if the estimators converge in probability to the parameter, pointwise in the parameter space.

Symbolically, $W_n = W_n (X_1, \cdots, X_n)$ is consistent for $\boldsymbol{\theta}$, if

$$W_n \overset{p}{\to} \boldsymbol{\theta}, \forall \boldsymbol{\theta} \in \boldsymbol{\Theta}$$

Thm: For a sequence of estimators of a model parameter, if its expectation converges to the parameter and its variance converges to zero, both pointwise in the parameter space, then the sequence is consistent.

Symbolically, $W_n$ is consistent for $\boldsymbol{\theta}$, if

$$\mathbb{E}W_n \to \boldsymbol{\theta}, \quad \mathrm{Var}\, W_n \to 0, \quad \forall \boldsymbol{\theta} \in \boldsymbol{\Theta}$$
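In the scalar case, the theorem follows from Chebyshev's inequality applied to the mean squared error, which decomposes into variance plus squared bias:

$$P(|W_n - \theta| \ge \varepsilon) \le \frac{\mathbb{E}(W_n - \theta)^2}{\varepsilon^2} = \frac{\mathrm{Var}\, W_n + (\mathbb{E}W_n - \theta)^2}{\varepsilon^2} \to 0, \quad \forall \varepsilon > 0$$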

Example:

- For a location family of distributions with finite mean, the sample mean is a consistent estimator of the location parameter, or effectively the population mean; this is the weak law of large numbers.
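The example can be checked numerically. Below is a minimal Monte Carlo sketch (the normal location $\theta = 2$, tolerance $\varepsilon = 0.1$, and run counts are illustrative choices, not part of the notes): the fraction of runs with $|\bar{X}_n - \theta| > \varepsilon$ shrinks as $n$ grows, which is exactly convergence in probability.

```python
import random
import statistics

# Illustrative check of consistency: for a normal location family with
# theta = 2.0, estimate P(|sample mean - theta| > eps) by simulation.
random.seed(0)
theta, eps, runs = 2.0, 0.1, 500
probs = []

for n in [10, 100, 1000]:
    misses = sum(
        abs(statistics.fmean(random.gauss(theta, 1.0) for _ in range(n)) - theta) > eps
        for _ in range(runs)
    )
    probs.append(misses / runs)
    print(f"n={n:>4}  P(|mean - theta| > {eps}) ~ {probs[-1]:.3f}")
```

The printed probabilities decrease toward zero as $n$ increases, as the definition requires.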

Thm: Given a consistent sequence of estimators of a model parameter, any sequence of affine transformations whose coefficients approach the identity also yields a consistent sequence of estimators.

Symbolically, if $W_n$ is consistent for $\boldsymbol{\theta}$ and $a_n \to 1$, $b_n \to 0$, then $U_n = a_n W_n + b_n$ is also consistent for $\boldsymbol{\theta}$.
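A standard use of this corollary: the maximum-likelihood variance estimator under normality equals $a_n S_n^2$ with $a_n = (n-1)/n \to 1$, so it inherits consistency from the unbiased sample variance. A simulation sketch (the population $N(0, 4)$ is an illustrative choice):

```python
import random
import statistics

# Sketch: the ML variance estimator is a_n * S_n^2 with a_n = (n-1)/n -> 1,
# so by the corollary it is consistent whenever S_n^2 is.
random.seed(1)
true_var = 4.0  # illustrative population variance (sd = 2)

for n in [10, 100, 10000]:
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    s2 = statistics.variance(sample)   # unbiased sample variance S_n^2
    mle = (n - 1) / n * s2             # affine transform a_n * S_n^2
    print(f"n={n:>5}  S^2={s2:.3f}  MLE={mle:.3f}")
```

For large $n$ the two estimates agree closely with each other and with the population variance.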

The variance of the asymptotic distribution of an estimator is called the **asymptotic variance** of the estimator.

Note:

- The asymptotic variance is different from the limit of the variances of a sequence of estimators. The latter, a.k.a. the limiting variance, may fail to exist.
- The asymptotic variance is never larger than the limiting variance. (Lehmann & Casella, 1998)

A sequence of estimators is called **asymptotically efficient** for an induced parameter, if it is asymptotically normal with asymptotic variance achieving the Cramér-Rao lower bound, pointwise in the parameter space.

Symbolically, an estimator $W_n$ of the induced parameter $g(\boldsymbol{\theta})$ is asymptotically efficient, if

$$\sqrt{n} [W_n - g(\boldsymbol{\theta})] \Rightarrow N \left( 0, (\nabla g)' [\mathbb{E}(\nabla \log f)(\nabla \log f)']^{-1} (\nabla g) \right) , \quad \forall \boldsymbol{\theta} \in \boldsymbol{\Theta}$$
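As a check on the definition, consider the $N(\theta, \sigma^2)$ family with $\sigma^2$ known and $g(\theta) = \theta$. The score is $\partial_\theta \log f = (x - \theta)/\sigma^2$, so the Fisher information is

$$\mathbb{E}\left[ \left( \frac{X - \theta}{\sigma^2} \right)^2 \right] = \frac{1}{\sigma^2}$$

and the central limit theorem gives $\sqrt{n}(\bar{X}_n - \theta) \Rightarrow N(0, \sigma^2)$, so the sample mean attains the Cramér-Rao lower bound $\sigma^2$ and is asymptotically efficient.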

Estimators that cannot achieve asymptotic efficiency may still have other desirable properties, such as ease of calculation or robustness to violations of the underlying assumptions. An index of relative efficiency then becomes useful for quantifying the trade-off.

Given two asymptotically normal estimators of an induced parameter, the ratio of the asymptotic variance of the second to that of the first is called the **asymptotic relative efficiency** (ARE) of the first with respect to the second.

Symbolically, given $\sqrt{n} [V_n - g(\boldsymbol{\theta})] \Rightarrow N ( 0, \sigma^2_V )$, and $\sqrt{n} [W_n - g(\boldsymbol{\theta})] \Rightarrow N ( 0, \sigma^2_W )$, the asymptotic relative efficiency of $V_n$ with respect to $W_n$ is

$$\text{ARE}(V_n,W_n) = \frac{\sigma^2_W}{\sigma^2_V}$$
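A classical instance: under normality, the sample median has asymptotic variance $\pi \sigma^2 / 2$ against $\sigma^2$ for the sample mean, so $\text{ARE}(\text{median}, \text{mean}) = 2/\pi \approx 0.637$. A simulation sketch (sample size and run count are illustrative choices):

```python
import random
import statistics

# Monte Carlo sketch: estimate the ARE of the sample median (V_n) relative
# to the sample mean (W_n) for N(0, 1) data.  Theory: n * Var(median) -> pi/2
# and n * Var(mean) -> 1, so ARE = 2/pi ~ 0.637.
random.seed(2)
n, runs = 501, 4000
means, medians = [], []
for _ in range(runs):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

are = statistics.variance(means) / statistics.variance(medians)
print(f"estimated ARE(median, mean) ~ {are:.3f}  (theory: 2/pi ~ 0.637)")
```

An ARE below 1 quantifies the price paid, at the normal model, for the median's robustness discussed next.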

The term robustness can have many interpretations, but it is perhaps best summarized by Huber (1981, Sec. 1.2), describing what a robust procedure should satisfy:

- It should have a reasonably good (optimal or nearly optimal) efficiency at the assumed model.
- It should be robust in the sense that small deviations from the model assumptions should impair the performance only slightly.
- Somewhat larger deviations from the model should not cause a catastrophe.

A statistic based on a random sample has **breakdown value** $b$ if it remains bounded whenever at most a fraction $b$ of the sample diverges, but diverges whenever any larger fraction of the sample diverges.

Symbolically, a statistic $T_n$ of a sample of size $n$ has breakdown value $b \in [0,1]$, if

$$T_n < \infty \text{ as } X_{ ( \{ (1-b)n \} ) } \to \infty; \quad \forall \varepsilon >0, \ T_n \to \infty \text{ as } X_{ ( \{ (1-b-\varepsilon)n \} ) } \to \infty$$

Here, $\{ \cdot \}$ denotes the nearest integer function.

Note:

- The breakdown value of the sample mean is 0. That is, driving any fraction of the sample to infinity, even a single observation, drives the sample mean to infinity.
- The breakdown value of the sample median is 0.5: the median stays bounded as long as fewer than half of the observations diverge.
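Both notes can be seen on a toy sample (the sample of 21 integers and the stand-in value $10^{12}$ for "driven to infinity" are illustrative choices):

```python
import statistics

# Sketch of breakdown behaviour: corrupt a growing fraction of a small
# sample and watch which statistics explode.
sample = list(range(1, 22))   # 21 "clean" observations: 1, 2, ..., 21
big = 1e12                    # stand-in for an observation driven to infinity

one_bad = [big] + sample[1:]            # 1 of 21 observations corrupted
print(statistics.fmean(one_bad))        # mean explodes: breakdown value 0
print(statistics.median(one_bad))       # median barely moves

majority_bad = [big] * 11 + sample[11:]  # just over half corrupted
print(statistics.median(majority_bad))   # now the median explodes too
```

With one corrupted point the mean is already on the order of $10^{10}$ while the median shifts only slightly; once 11 of the 21 points are corrupted, the median breaks down as well.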