Bootstrap

Asymptotic analysis applies when the sample size is large, and its results are limited to statistics that are analytically tractable. {Efron1979} introduced the bootstrap as an empirical alternative: it draws resamples from a single sample to estimate the sampling distribution of a statistic, in particular its standard error. Reference books include {Efron and Tibshirani 1993} and {Davison and Hinkley 1997}.

Notation: Sample size, \(N\); Number of bootstrap replications, \(B\); Coefficient of interest, \(\beta_1\); Null hypothesis, \(H_0: \beta_1 = \beta_1^0 \); Estimator, \(\hat{\beta}\); Bootstrap estimator in replication \(b\), \( \hat{\beta}_{1b}^* \), which may differ from the standard estimator; Restricted estimator, \(\hat{\beta}^R\); Residuals, \(\mathbf{u}\); Estimator of standard error, \(s\); Wald statistic, \(w = (\hat{\beta}_1 - \beta_1^0) / s_{\hat{\beta}_1}\); Significance level, \(\alpha\).

Implementation

A bootstrap procedure is defined by two choices: a resampling method and an inference procedure.

Bootstrap resampling methods

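Pairs bootstrap (case bootstrap, non-parametric bootstrap): observations are resampled with replacement, so each regressand always stays paired with its regressors;
Residual bootstrap: regressors are held fixed and the regressand is constructed from the fitted values plus residuals drawn with replacement from the sample residuals;
Wild bootstrap: regressors are held fixed and the regressand is constructed from the fitted values plus each observation's own residual, with its sign flipped with probability 1/2 (Rademacher weights);

The residual bootstrap assumes iid errors; the wild bootstrap remains applicable to heteroskedastic models.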
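
The three resampling methods above can be sketched as follows. This is a minimal illustration using NumPy, not a reference implementation; the function names (`ols`, `pairs_resample`, `residual_resample`, `wild_resample`) are hypothetical, and the residual and wild variants here resample around the unrestricted OLS fit.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def pairs_resample(X, y, rng):
    # Draw N row indices with replacement; each (x_i, y_i) pair stays together.
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

def residual_resample(X, y, rng):
    # Keep X fixed; rebuild y from fitted values plus residuals
    # drawn with replacement from the sample residuals.
    beta, u = ols(X, y)
    u_star = rng.choice(u, size=len(y), replace=True)
    return X, X @ beta + u_star

def wild_resample(X, y, rng):
    # Keep X fixed; each observation keeps its own residual,
    # multiplied by a Rademacher weight (+1 or -1 with probability 1/2).
    beta, u = ols(X, y)
    signs = rng.choice([-1.0, 1.0], size=len(y))
    return X, X @ beta + signs * u
```

Each function returns one bootstrap sample; repeating the draw \(B\) times with a generator such as `np.random.default_rng(0)` and re-estimating on each sample gives the bootstrap estimates \( \hat{\beta}_{1b}^* \).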

Bootstrap inference procedures

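Bootstrap-t (percentile-t) {Efron1981}: compute the Wald statistic with OLS standard errors in the sample and in each resample, and reject \(H_0\) when the sample statistic is extreme relative to the bootstrap distribution of the statistic;
Bootstrap-se: use the bootstrap estimate of the standard error, \(\hat{\sigma}_{\hat{\beta}_1} = s_{\hat{\beta}_{1B}^* }\) (the standard deviation of the \(B\) bootstrap estimates), in the Wald statistic, and reject using critical values from the normal distribution;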
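
As a rough sketch of how the two procedures differ, the following combines each with the pairs bootstrap. The helper names (`ols_fit`, `pairs_bootstrap_tests`) are hypothetical, the standard errors are the conventional homoskedastic OLS ones, and the bootstrap-t rejection rule is taken to be equal-tailed; these are assumptions for illustration only. Because pairs resampling does not impose \(H_0\), the bootstrap Wald statistics are centred at the sample estimate \(\hat{\beta}_1\) rather than at \(\beta_1^0\).

```python
import numpy as np
from statistics import NormalDist

def ols_fit(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    u = y - X @ beta
    s2 = u @ u / (n - k)
    return beta, np.sqrt(s2 * np.diag(XtX_inv))

def pairs_bootstrap_tests(X, y, j=1, beta0=0.0, B=999, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    beta, se = ols_fit(X, y)
    w = (beta[j] - beta0) / se[j]          # sample Wald statistic

    beta_star = np.empty(B)
    w_star = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # pairs resample
        bb, sb = ols_fit(X[idx], y[idx])
        beta_star[b] = bb[j]
        w_star[b] = (bb[j] - beta[j]) / sb[j]   # centred at the sample estimate

    # Bootstrap-se: bootstrap standard error, normal critical value.
    se_boot = beta_star.std(ddof=1)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject_se = abs((beta[j] - beta0) / se_boot) > z_crit

    # Bootstrap-t: critical values from the bootstrap distribution of w.
    lo, hi = np.quantile(w_star, [alpha / 2, 1 - alpha / 2])
    reject_t = (w < lo) or (w > hi)

    return {"w": w, "se_boot": se_boot,
            "reject_se": bool(reject_se), "reject_t": bool(reject_t)}
```

Only the bootstrap-t branch bootstraps the Wald statistic itself; the bootstrap-se branch uses the resamples solely to replace the standard error.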

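Imposing the null hypothesis: resamples are generated from the restricted OLS estimator \(\hat{\beta}^R\) and its residuals, so that \(H_0\) holds in the bootstrap data-generating process.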

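Asymptotic refinement: bootstrapping an asymptotically pivotal statistic, that is, one whose limiting distribution does not depend on unknown parameters (such as \(w\)), gives a rejection probability that converges to the nominal level \(\alpha\) faster than inference based on first-order asymptotic theory.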

Clustered Data

Observational units are grouped into clusters such that errors are independent across clusters but may be correlated within a cluster.

Number of clusters in sample, \(G\); Number of observations in cluster \(g\), \(N_g\); Subsample of cluster \(g\), \( (y_g, X_g) \); Covariance matrix of errors within cluster \(g\), \(\Sigma_g\); Individual \(i\) in cluster \(g\) carries the subscript \(ig\).

The covariance matrix of the OLS estimator on clustered data is: \[ \text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X}) = (X' X)^{-1} \left( \sum_{g = 1}^G X_g' \Sigma_g X_g \right) (X' X)^{-1}\]

The cluster-robust variance estimator (CRVE), \( \widehat{\text{Var}}_\text{CR}(\hat{\beta} ) \), replaces \(\Sigma_g\) with the sample estimate \( \tilde{u}_g \tilde{u}_g' \), where \(\tilde{u}_g\) is the vector of corrected residuals for cluster \(g\). The standard CRVE simply uses the OLS residuals \(u_g\); a common finite-sample correction rescales them, \( \tilde{u}_g = u_g \sqrt{G / (G-1)} \). {Bell and McCaffrey 2002} proposed the correction \( \tilde{u}_g = \sqrt{(G-1)/G} \, (I_{N_g} - H_{gg})^{-1} u_g \), with \( H_{gg} = X_g (X' X)^{-1} X_g' \), which generalizes the HC3 measure in {MacKinnon and White 1985} and is equivalent to the jackknife estimator of \(\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X})\). {Cameron2008} referred to this correction as CR3.
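
The sandwich form of the CRVE can be sketched in a few lines of NumPy. The function name `cluster_robust_vcov` and the labels `"CR0"` (uncorrected OLS residuals) and `"CR3"` are assumptions made for this illustration; the CR3 branch uses the residual transformation \( \tilde{u}_g = \sqrt{(G-1)/G}\,(I_{N_g} - H_{gg})^{-1} u_g \) with \( H_{gg} = X_g (X'X)^{-1} X_g' \), as given in Cameron, Gelbach and Miller (2008).

```python
import numpy as np

def cluster_robust_vcov(X, y, cluster, kind="CR0"):
    """OLS coefficients and a cluster-robust covariance estimate.

    kind="CR0": uncorrected OLS residuals.
    kind="CR3": sqrt((G-1)/G) * (I - H_gg)^{-1} u_g.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    u = y - X @ beta

    groups = np.unique(cluster)
    G = len(groups)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in groups:
        mask = cluster == g
        Xg, ug = X[mask], u[mask]
        if kind == "CR3":
            Hgg = Xg @ XtX_inv @ Xg.T
            ug = np.sqrt((G - 1) / G) * np.linalg.solve(np.eye(len(ug)) - Hgg, ug)
        score = Xg.T @ ug                 # X_g' u_g, a k-vector
        meat += np.outer(score, score)    # X_g' u_g u_g' X_g

    return beta, XtX_inv @ meat @ XtX_inv
```

When every observation forms its own cluster, the `"CR0"` result reduces to the heteroskedasticity-robust HC0 estimator, which is a convenient sanity check.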

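Resampling methods, each applied to whole clusters rather than to individual observations:

Pairs cluster bootstrap: entire clusters \( (y_g, X_g) \) are resampled with replacement;
Residual cluster bootstrap: regressors are held fixed and entire cluster residual vectors are resampled with replacement, which requires clusters of equal size;
Wild cluster bootstrap: regressors are held fixed and the residuals of each cluster are multiplied by one common Rademacher weight, so all signs within a cluster flip together.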
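
A wild cluster bootstrap-t with the null hypothesis imposed might look like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: the function names are hypothetical, the Wald statistic uses the uncorrected CRVE, one Rademacher weight is drawn per cluster, and the p-value is the symmetric one based on \(|w|\).

```python
import numpy as np

def cluster_t_stat(X, y, g_idx, G, j, beta0):
    """Wald statistic for beta_j = beta0 using the uncorrected CRVE."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    u = y - X @ beta
    scores = np.zeros((G, X.shape[1]))
    np.add.at(scores, g_idx, X * u[:, None])   # row g holds X_g' u_g
    V = XtX_inv @ (scores.T @ scores) @ XtX_inv
    return (beta[j] - beta0) / np.sqrt(V[j, j])

def wild_cluster_bootstrap_t(X, y, cluster, j=1, beta0=0.0, B=999, seed=0):
    rng = np.random.default_rng(seed)
    groups, g_idx = np.unique(cluster, return_inverse=True)
    G = len(groups)
    w = cluster_t_stat(X, y, g_idx, G, j, beta0)

    # Restricted OLS: impose beta_j = beta0 by moving beta0 * x_j to the
    # left-hand side and regressing on the remaining columns.
    X_r = np.delete(X, j, axis=1)
    b_r, *_ = np.linalg.lstsq(X_r, y - beta0 * X[:, j], rcond=None)
    fitted_r = X_r @ b_r + beta0 * X[:, j]
    u_r = y - fitted_r

    w_star = np.empty(B)
    for b in range(B):
        signs = rng.choice([-1.0, 1.0], size=G)   # one weight per cluster
        y_star = fitted_r + signs[g_idx] * u_r
        w_star[b] = cluster_t_stat(X, y_star, g_idx, G, j, beta0)

    p_value = float(np.mean(np.abs(w_star) >= abs(w)))
    return w, p_value
```

Here `cluster` is an array of cluster labels of length \(N\), and \(H_0\) is rejected at level \(\alpha\) when the returned p-value falls below \(\alpha\).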