Let $Pf = \mathbb{E}f$ denote the integral (expectation) of a random variable $f$ under the probability measure $P$.

The **empirical measure** (empirical probability density function, EPDF)
is the discrete uniform measure on a random sample;
that is, each of the $n$ observations in the sample is assigned the same probability $1/n$.
The empirical measure is a random measure because it depends on a random sample.
Symbolically, given a sample $X_1, \cdots, X_n$, the empirical measure $P_n$ of a measurable set $A$ in the measurable space $(Ω,Σ)$ is defined as
$$P_n(A) = \frac{1}{n} \sum_{i=1}^n 1_A(X_i)$$

The **empirical distribution function** (ECDF) of a real-valued random variable
is the empirical measure indexed by a class of one-sided intervals:
$$P_n(x) = P_n (-\infty,x] = \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \leq x\}}$$
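
The definition above translates almost directly into code. The following is a minimal sketch in Python using only the standard library; the function name `ecdf` and the toy sample are illustrative choices, not part of the original text.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the function x -> P_n(x) = (1/n) * #{i : X_i <= x}."""
    xs = sorted(sample)
    n = len(xs)

    def P_n(x):
        # bisect_right returns the number of sorted observations that are <= x.
        return bisect_right(xs, x) / n

    return P_n

P_n = ecdf([3.0, 1.0, 2.0, 2.0])
print(P_n(0.5), P_n(2.0), P_n(3.0))  # 0.0 0.75 1.0
```

Sorting once and using binary search makes each evaluation cost $O(\log n)$ instead of a full pass over the sample.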

For an i.i.d. sample with population distribution function $F(x) = P(-\infty,x]$, the empirical distribution function has the following properties (pointwise in $x$):

- Unbiased: $P P_n(x) = F(x)$
- Consistent: $P_n(x) \overset{p}{\to} F(x)$
- Asymptotically normal: $\sqrt{n} ( P_n(x) - F(x) ) \Rightarrow N(0, F(x)(1-F(x)) )$

Theorem: (**Glivenko-Cantelli**)
The empirical distribution function of an i.i.d. sample converges uniformly to the population distribution function, almost surely:
$$\| P_n - F \|_{\infty} \overset{a.s.}{\to} 0$$
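
The uniform convergence can be observed numerically. The sketch below is an illustration under stated assumptions rather than a proof: it assumes a continuous $F$ (here the $U(0,1)$ distribution function, chosen only for convenience), and the helper names `sup_distance` and `uniform_cdf` are hypothetical. For a continuous $F$ the supremum is attained at a jump of $P_n$, so it can be computed exactly from the order statistics.

```python
import random

def uniform_cdf(x):
    """Distribution function of U(0, 1)."""
    return min(max(x, 0.0), 1.0)

def sup_distance(sample, cdf):
    """Exact sup_x |P_n(x) - F(x)| for a continuous distribution function F."""
    xs = sorted(sample)
    n = len(xs)
    # P_n jumps from (i-1)/n to i/n at the i-th order statistic, so the
    # largest gap occurs at a jump point or immediately before it.
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )

rng = random.Random(0)  # fixed seed, for reproducibility only
for n in (100, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    print(n, sup_distance(sample, uniform_cdf))
```

The printed distances should shrink as $n$ grows, roughly at the rate $n^{-1/2}$.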

For a parameter $\mathbf{h}(P \mathbf{g}(X))$ that is a function of expectations,
its **method-of-moments estimator** (substitution estimator) [@Pearson1894]
replaces the population measure with the empirical measure: $\mathbf{h}(P_n \mathbf{g}(X))$.
If the parameter to be estimated is a function of moments, $h(PX, \cdots, PX^k)$,
the method-of-moments estimator is $h( P_n X, \cdots, P_n X^k)$.

Empirical moments are consistent and asymptotically normal. If $PX^{2k} < \infty$, the central limit theorem gives
$$\sqrt{n} [ (P_n X, \cdots, P_n X^k) - (P X, \cdots, P X^k) ] \Rightarrow N(0,Σ)$$
where $Σ_{ij} = P X^{i+j} - P X^i P X^j$ for $i, j = 1, \cdots, k$.
If $h(\cdot)$ is continuously differentiable at $(PX, \cdots, PX^k)$, then by the Delta method
$$\sqrt{n} [ h( P_n X, \cdots, P_n X^k) - h(PX, \cdots, PX^k) ] \Rightarrow N(0, (∇h)'Σ(∇h))$$
where $∇h$ is evaluated at $(PX, \cdots, PX^k)$.
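
As a worked illustration (the choice of parameter is an assumption made here, not taken from the text), consider the variance $σ^2 = h(PX, PX^2)$ with $h(x, y) = y - x^2$, and assume $PX^4 < \infty$. Writing $μ = PX$, the gradient is $∇h = (-2μ, 1)'$, so the Delta method gives the asymptotic variance
$$(∇h)'Σ(∇h) = 4 μ^2 Σ_{11} - 4 μ Σ_{12} + Σ_{22}$$
where $Σ_{11} = PX^2 - μ^2$, $Σ_{12} = PX^3 - μ PX^2$, and $Σ_{22} = PX^4 - (PX^2)^2$. Expanding the moments around $μ$, this simplifies to
$$(∇h)'Σ(∇h) = P(X-μ)^4 - σ^4$$
For example, for the exponential distribution with unit mean, $μ = 1$, $Σ_{11} = 1$, $Σ_{12} = 4$, $Σ_{22} = 20$, and both expressions equal $8$.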

For the Gamma distribution $Γ(α, β)$ with shape $α$ and scale $β$:

- The parameters $(α, β) = g(PX, PX^2)$, where $g(x,y) = ( x^2/(y-x^2), (y-x^2)/x )$.
- The q-th order moment is $PX^q = α (α+1) \cdots (α+q-1) β^q$.
- The asymptotic covariance matrix of the empirical 1st and 2nd moments is $Σ = \begin{pmatrix} α β^2 & 2 α(α+1)β^3 \\ 2 α(α+1)β^3 & 2 α(α+1)(2α+3)β^4 \end{pmatrix}$
- The gradient of $g(x,y)$ at $(PX, PX^2)$, with one column per component of $g$, is $∇ g(PX, PX^2) = \begin{pmatrix} \frac{2(α+1)}{β} & -\frac{2α+1}{α} \\ -\frac{1}{β^2} & \frac{1}{αβ} \end{pmatrix}$
- The asymptotic covariance matrix of the method-of-moments estimator $g(P_n X, P_n X^2)$ is then $Σ' = (∇g)'Σ(∇g) = \begin{pmatrix} 2α(α+1) & -2(α+1)β \\ -2(α+1)β & \frac{(2α+3)β^2}{α} \end{pmatrix}$
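
The Gamma calculation can be checked numerically. The sketch below, in standard-library Python, is an independent sanity check rather than the author's derivation: it builds $Σ$ from the moment formula $PX^q$, approximates the derivatives of $g$ by central finite differences (an assumption made to avoid hand-coding the gradient), and compares the result with the closed form $Σ'$. All function names are hypothetical. Note that `random.gammavariate(alpha, beta)` uses `beta` as a scale parameter, matching the parametrization here.

```python
import math
import random

def g(x, y):
    """Map (PX, PX^2) to (alpha, beta); beta is the scale parameter."""
    v = y - x * x
    return (x * x / v, v / x)

def gamma_moment(q, alpha, beta):
    """q-th moment: alpha (alpha+1) ... (alpha+q-1) beta^q."""
    return math.prod(alpha + j for j in range(q)) * beta**q

def delta_method_cov(alpha, beta, eps=1e-6):
    """J Sigma J', with the Jacobian J of g from central finite differences."""
    m = [gamma_moment(q, alpha, beta) for q in range(5)]  # m[q] = PX^q
    S = [[m[i + j] - m[i] * m[j] for j in (1, 2)] for i in (1, 2)]
    x, y = m[1], m[2]
    hx, hy = eps * x, eps * y
    dx = [(p - q) / (2 * hx) for p, q in zip(g(x + hx, y), g(x - hx, y))]
    dy = [(p - q) / (2 * hy) for p, q in zip(g(x, y + hy), g(x, y - hy))]
    J = [[dx[0], dy[0]], [dx[1], dy[1]]]  # row r: partials of component r of g
    return [[sum(J[r][i] * S[i][j] * J[c][j] for i in range(2) for j in range(2))
             for c in range(2)] for r in range(2)]

def closed_form_cov(alpha, beta):
    """Closed-form asymptotic covariance of the method-of-moments estimator."""
    return [[2 * alpha * (alpha + 1), -2 * (alpha + 1) * beta],
            [-2 * (alpha + 1) * beta, (2 * alpha + 3) * beta**2 / alpha]]

def gamma_mom(sample):
    """Method-of-moments estimate g(P_n X, P_n X^2)."""
    n = len(sample)
    m1 = sum(sample) / n
    m2 = sum(x * x for x in sample) / n
    return g(m1, m2)

rng = random.Random(0)  # fixed seed, for reproducibility only
sample = [rng.gammavariate(2.0, 3.0) for _ in range(100_000)]
print(gamma_mom(sample))            # should be close to (2.0, 3.0)
print(delta_method_cov(2.0, 3.0))   # numerical J Sigma J'
print(closed_form_cov(2.0, 3.0))    # closed form for comparison
```

The two matrices should agree up to finite-difference error, which gives a quick way to catch algebra slips in $Σ$ or $∇g$.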

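For the uniform distribution $U(0, θ)$, because $θ = 2 PX$, the method-of-moments estimator is $2 P_n X$. This estimator is unbiased but inefficient: its variance is $θ^2/(3n)$, whereas the maximum-based estimator $\frac{n+1}{n} \max_i X_i$ is also unbiased and has variance $θ^2/(n(n+2))$, which is of order $n^{-2}$.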
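
A small simulation can illustrate the efficiency gap. The sketch below compares $2 P_n X$ with the maximum-based estimator $\frac{n+1}{n} \max_i X_i$, a standard unbiased alternative chosen here for comparison; the function names, sample size, and replication count are illustrative assumptions.

```python
import random
from statistics import pvariance

def mom_estimate(sample):
    """Method-of-moments estimator 2 * P_n X."""
    return 2 * sum(sample) / len(sample)

def max_estimate(sample):
    """Bias-corrected maximum: (n + 1) / n * max_i X_i."""
    n = len(sample)
    return (n + 1) / n * max(sample)

def simulate(theta=1.0, n=50, reps=2000, seed=0):
    """Return the simulated variances of the two estimators."""
    rng = random.Random(seed)  # fixed seed, for reproducibility only
    mom, mx = [], []
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        mom.append(mom_estimate(sample))
        mx.append(max_estimate(sample))
    return pvariance(mom), pvariance(mx)

var_mom, var_max = simulate()
print(var_mom, var_max)
```

With $θ = 1$ and $n = 50$, the theoretical variances are $θ^2/(3n) \approx 0.0067$ and $θ^2/(n(n+2)) \approx 0.00038$, so the simulated values should differ by more than an order of magnitude.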