Let $Pf = \mathbb{E}f$ denote the expectation (integral) of a random variable $f$ under the probability measure $P$.

Empirical Distribution

The empirical measure (empirical probability density function, EPDF) is the discrete uniform measure on a random sample; that is, each observation in the sample is assigned the same probability mass $1/n$. The empirical measure is a random measure because it depends on the random sample. Symbolically, the empirical measure $P_n$ of a measurable set $A$ in the sample space $(Ω,Σ)$ is defined as \[ P_n(A) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_A(X_i) \]
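As a minimal sketch of this definition (the sample, the set $A$, and the helper name `empirical_measure` below are illustrative choices, not taken from the text), the empirical measure of a set is simply the fraction of observations that fall in it:

```python
import numpy as np

def empirical_measure(sample, indicator_A):
    """Empirical measure P_n(A) = (1/n) * sum_i 1_A(X_i)."""
    sample = np.asarray(sample)
    return np.mean(indicator_A(sample))

# Example: P_n((0.5, inf)) for a standard normal sample of size n = 1000.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(empirical_measure(x, lambda t: t > 0.5))  # roughly 1 - Phi(0.5) ~ 0.31
```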

The empirical distribution function (ECDF) of a real-valued random variable is the empirical measure indexed by a class of one-sided intervals: \[ P_n(x) = P_n (-\infty,x] = \frac{1}{n} \sum_{i=1}^n \mathbf{1}(X_i \leq x) \]
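Below is a sketch of the ECDF as a callable function, built from a sorted copy of the sample (the helper name `ecdf` is illustrative, not a library routine assumed by the text):

```python
import numpy as np

def ecdf(sample):
    """Return P_n as a callable: P_n(x) = (1/n) * #{i : X_i <= x}."""
    xs = np.sort(np.asarray(sample))
    n = xs.size
    def P_n(x):
        # side="right" counts observations less than or equal to x
        return np.searchsorted(xs, x, side="right") / n
    return P_n

rng = np.random.default_rng(0)
P_n = ecdf(rng.standard_normal(500))
print(P_n(0.0))  # roughly 0.5 for a standard normal sample
```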

Empirical distribution functions have the following properties (pointwise in the argument $x$):

  1. Unbiased: \( P P_n(x) = F(x) \)
  2. Consistent: \( P_n(x) \overset{p}{\to} F(x) \)
  3. Asymptotically normal: \( \sqrt{n} ( P_n(x) - F(x) ) \Rightarrow N(0, F(x)(1-F(x)) ) \)
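Properties 1 and 3 can be checked at a fixed point by a quick Monte Carlo sketch (the standard normal population, the point $x = 0.5$, the sample size, and the replication count are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x0, n, reps = 0.5, 200, 5000
F_x0 = norm.cdf(x0)

# P_n(x0) computed over many independent samples of size n
Pn_x0 = np.array([np.mean(rng.standard_normal(n) <= x0) for _ in range(reps)])

print(Pn_x0.mean(), F_x0)                       # unbiasedness: the two should agree
print(np.var(np.sqrt(n) * (Pn_x0 - F_x0)),      # variance of sqrt(n)(P_n(x) - F(x))
      F_x0 * (1 - F_x0))                        # ... should be near F(x)(1 - F(x))
```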

Theorem (Glivenko–Cantelli): The empirical distribution function converges uniformly, almost surely, to the population distribution function: \[ \| P_n - F \|_{\infty} \overset{a.s.}{\to} 0 \]
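As a sketch of the Glivenko–Cantelli phenomenon, one can compute the Kolmogorov–Smirnov statistic $\| P_n - F \|_\infty$ for samples of increasing size and watch it shrink (standard normal population; the sample sizes are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
for n in (10, 100, 1000, 10000):
    xs = np.sort(rng.standard_normal(n))
    i = np.arange(1, n + 1)
    # sup_x |P_n(x) - F(x)| is attained at sample points; check both one-sided gaps
    d = max(np.max(i / n - norm.cdf(xs)), np.max(norm.cdf(xs) - (i - 1) / n))
    print(n, d)
```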

Method-of-Moments Estimators

For a parameter $\mathbf{h}(P \mathbf{g}(X))$ that is a function of expectations, its method-of-moments estimator (substitution estimator) [@Pearson1894] replaces the population measure with the empirical measure: \( \mathbf{h}(P_n \mathbf{g}(X)) \). If the parameter to be estimated is a function of moments, $h(PX, \cdots, PX^k)$, the method-of-moments estimator is \( h( P_n X, \cdots, P_n X^k) \).
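A minimal sketch of this substitution principle, with the variance $h(PX, PX^2) = PX^2 - (PX)^2$ as the target parameter (the helper name `substitution_estimate` and the exponential example are illustrative assumptions):

```python
import numpy as np

def substitution_estimate(sample, h, gs):
    """Estimate h(P g_1(X), ..., P g_m(X)) by plugging in empirical means P_n g_j(X)."""
    sample = np.asarray(sample)
    return h(*(np.mean(g(sample)) for g in gs))

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=10000)
# Variance as a function of the first two moments: h(m1, m2) = m2 - m1^2
var_hat = substitution_estimate(x, lambda m1, m2: m2 - m1**2,
                                [lambda t: t, lambda t: t**2])
print(var_hat)  # roughly 4, the variance of Exponential(scale=2)
```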

Empirical moments are consistent and asymptotically normal: by the central limit theorem, \[ \sqrt{n} [ (P_n X, \cdots, P_n X^k) - (P X, \cdots, P X^k) ] \Rightarrow N(0,Σ), \] where \( Σ_{ij} = P X^{i+j} - P X^i P X^j \). If $h(\cdot)$ is continuously differentiable at $(PX, \cdots, PX^k)$, then by the delta method \[ \sqrt{n} [ h( P_n X, \cdots, P_n X^k) - h(PX, \cdots, PX^k) ] \Rightarrow N(0, (∇h)'Σ(∇h)), \] where $∇h$ is evaluated at $(PX, \cdots, PX^k)$.

For the Gamma distribution $Γ(α, β)$ with shape $α$ and scale $β$ (so $PX = αβ$):

  • The parameters \((α, β) = g(PX, PX^2) \), where \(g(x,y) = ( x^2/(y-x^2), (y-x^2)/x )\).
  • The $q$-th order moment is \( PX^q = α (α+1) \cdots (α+q-1) β^q \).
  • The asymptotic covariance matrix of the empirical 1st and 2nd moments is \( Σ = \begin{pmatrix} α β^2 & 2 α(α+1)β^3 \\ 2 α(α+1)β^3 & 2 α(α+1)(2α+3)β^4 \end{pmatrix} \)
  • Gradient of $g(x,y)$ at $(PX, PX^2)$ is \( ∇ g(PX, PX^2) = \begin{pmatrix} \frac{2(α+1)}{β} & -\frac{2α+1}{α} \\ -\frac{1}{β^2} & \frac{1}{αβ} \end{pmatrix} \)
  • Then the asymptotic covariance matrix of the method-of-moments estimator \( g(P_n X, P_n X^2) \) is \( Σ' = (∇g)'Σ(∇g) = \begin{pmatrix} 2α(α+1) & -2(α+1)β \\ -2(α+1)β & \frac{(2α+3)β^2}{α} \end{pmatrix} \)
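These formulas can be checked by simulation, as in the sketch below (the shape/scale values, sample size, and replication count are arbitrary choices; `numpy`'s Gamma sampler uses the same shape/scale convention assumed above):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, n, reps = 3.0, 2.0, 2000, 2000

def mom_gamma(sample):
    """Method-of-moments estimates (alpha_hat, beta_hat) from the first two moments."""
    m1, m2 = np.mean(sample), np.mean(sample**2)
    v = m2 - m1**2                     # = P_n X^2 - (P_n X)^2
    return m1**2 / v, v / m1           # g(x, y) = (x^2/(y - x^2), (y - x^2)/x)

est = np.array([mom_gamma(rng.gamma(alpha, beta, size=n)) for _ in range(reps)])
emp_cov = np.cov(np.sqrt(n) * (est - [alpha, beta]).T)

Sigma_prime = np.array([[2 * alpha * (alpha + 1), -2 * (alpha + 1) * beta],
                        [-2 * (alpha + 1) * beta, (2 * alpha + 3) * beta**2 / alpha]])
print(emp_cov)       # empirical covariance of sqrt(n) * (estimate - truth)
print(Sigma_prime)   # ... should be close to the delta-method covariance above
```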

For the uniform distribution $U(0, θ)$, since $θ = 2 PX$, the method-of-moments estimator is $2 P_n X$. This estimator is consistent but not very efficient: its variance is $θ^2/(3n)$, whereas the maximum likelihood estimator $\max_i X_i$ has mean squared error of order $θ^2/n^2$.
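A small simulation contrasting the two estimators ($θ$, the sample size, and the replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 1.0, 100, 20000

samples = rng.uniform(0, theta, size=(reps, n))
mom = 2 * samples.mean(axis=1)        # method-of-moments: 2 * sample mean
mle = samples.max(axis=1)             # maximum likelihood: sample maximum

print(np.mean((mom - theta)**2))      # roughly theta^2 / (3n)  ~ 0.0033
print(np.mean((mle - theta)**2))      # roughly 2 theta^2 / n^2 ~ 0.0002
```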


🏷 Category=Statistics