Asymptotic analysis applies only when the sample size is large, and its results are limited to statistics that are analytically tractable. {Efron1979} introduced the bootstrap as an empirical alternative to asymptotic analysis, estimating the sampling distribution of a statistic by resampling the original sample with replacement. Reference books include {Efron and Tibshirani, 1993} and {Davison and Hinkley, 1997}.
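As a concrete illustration, here is a minimal sketch in Python/NumPy of the basic idea: a made-up sample, the median as the statistic of interest, and the bootstrap standard error computed by resampling with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=200)   # made-up sample, N = 200
B = 999                                        # number of bootstrap repetitions

stats = np.empty(B)
for b in range(B):
    # Resample the original sample with replacement and recompute the statistic.
    resample = rng.choice(y, size=y.size, replace=True)
    stats[b] = np.median(resample)

# The empirical distribution of `stats` approximates the sampling distribution
# of the median; its standard deviation is the bootstrap standard-error estimate.
se_boot = stats.std(ddof=1)
print(se_boot)
```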
Bootstrapping may also be used to construct hypothesis tests and to conduct inference in regression models.
Notation: Sample size, \(N\); Number of bootstrap repetitions, \(B\); Coefficient of interest, \(\beta_1\); Null hypothesis, \(H_0: \beta_1 = \beta_1^0\); Estimator, \(\hat{\beta}\); Bootstrap estimate from repetition \(b\), \(\hat{\beta}^*_{1b}\), which may come from an estimator different from the original one; Restricted estimator imposing \(H_0\), \(\hat{\beta}^R\); Residuals, \(\mathbf{u}\); Estimated standard error, \(s\); Wald test statistic, \(w = (\hat{\beta}_1 - \beta_1^0) / s_{\hat{\beta}_1}\); Significance level, \(\alpha\).
A bootstrap is implemented by combining a specific resampling method with a specific inference procedure.
Residual and wild bootstraps can impose the null hypothesis during resampling {Davidson and MacKinnon, 1999}: the bootstrap Wald statistics are centered on \(\beta_1^0\), and the residuals resampled are those from the restricted OLS estimator that imposes \(H_0\).
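A minimal sketch of a wild bootstrap test that imposes the null, assuming simulated data, Rademacher weights, and classical OLS standard errors for simplicity (all names are hypothetical):

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals, and classical standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    sigma2 = u @ u / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, u, se

rng = np.random.default_rng(1)
N, B, beta1_null = 50, 999, 0.0                 # hypothetical sizes; H0: beta_1 = 0
x = rng.normal(size=N)
y = 0.5 + 0.0 * x + rng.normal(size=N) * (1 + np.abs(x))   # heteroskedastic errors
X = np.column_stack([np.ones(N), x])

beta, _, se = ols(X, y)
w = (beta[1] - beta1_null) / se[1]              # Wald statistic from the original sample

# Restricted fit imposing H0 (here beta_1 = 0, so regress on the intercept only).
beta_r, u_r, _ = ols(X[:, :1], y)
fitted_r = X[:, :1] @ beta_r

w_star = np.empty(B)
for b in range(B):
    v = rng.choice([-1.0, 1.0], size=N)         # Rademacher weights for the wild bootstrap
    y_star = fitted_r + u_r * v                 # bootstrap DGP satisfies the null by construction
    b_star, _, se_star = ols(X, y_star)
    w_star[b] = (b_star[1] - beta1_null) / se_star[1]   # centered on beta_1^0

p_boot = np.mean(np.abs(w_star) >= np.abs(w))   # symmetric two-sided bootstrap p-value
print(w, p_boot)
```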
Asymptotic refinement refers to a convergence rate faster than that of first-order asymptotic theory. To achieve asymptotic refinement, the bootstrap must be applied to an asymptotically pivotal statistic, i.e., a statistic whose asymptotic distribution does not depend on unknown parameters. Bootstrap-t procedures can provide asymptotic refinement, while bootstrap-se procedures do not.
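In the notation above, the two procedures can be contrasted for a two-sided confidence interval (a sketch; the bootstrap statistics are centered on \(\hat{\beta}_1\) here because no null is imposed):
\[
\text{bootstrap-se:}\quad \hat{\beta}_1 \pm z_{1-\alpha/2}\, s^*_{\hat{\beta}_1}, \qquad s^*_{\hat{\beta}_1} = \text{sd}\bigl(\hat{\beta}^*_{11}, \dots, \hat{\beta}^*_{1B}\bigr),
\]
\[
\text{bootstrap-t:}\quad \bigl[\, \hat{\beta}_1 - w^*_{1-\alpha/2}\, s_{\hat{\beta}_1},\;\; \hat{\beta}_1 - w^*_{\alpha/2}\, s_{\hat{\beta}_1} \,\bigr],
\]
where \(w^*_q\) is the \(q\)-th quantile of the bootstrap Wald statistics \(w^*_b = (\hat{\beta}^*_{1b} - \hat{\beta}_1) / s^*_{\hat{\beta}_{1b}}\). The bootstrap-se interval still relies on the normal critical value \(z_{1-\alpha/2}\); the bootstrap-t interval uses the bootstrap distribution of the asymptotically pivotal statistic itself, which is the source of the refinement.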
A sample may contain clusters of observational units such that regression errors are independent across clusters but correlated within clusters. Such correlation effectively reduces the sample size toward the number of clusters for statistical inference that assumes errors are independent across observations. See {Cameron2015} for a good review of inference with clustered data.
Number of clusters in the sample, \(G\); Number of observations in cluster \(g\), \(N_g\); Subsample of cluster \(g\), \( (y_g, X_g) \); Covariance matrix of regression errors within cluster \(g\), \(\Sigma_g\); Individual \(i\) in cluster \(g\) carries subscript \(ig\).
Covariance matrix of the OLS estimator on clustered data is: \[ \text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X}) = (X' X)^{-1} \left( \sum_{g = 1}^G X_g' \Sigma_g X_g \right) (X' X)^{-1}\]
The cluster-robust variance estimator (CRVE), \( \widehat{\text{Var}}_\text{CR}(\hat{\boldsymbol{\beta}}) \), replaces \(\Sigma_g\) with the sample estimate \( \tilde{u}_g \tilde{u}_g' \). Here \(\tilde{u}_g\) denotes corrected residuals, and the standard CRVE simply uses the OLS residuals \(\hat{u}_g\). {Bell and McCaffrey, 2002} proposed the correction \( \tilde{u}_g = \sqrt{G / (G-1)}\, (I_{N_g} - H_{gg})^{-1} \hat{u}_g \), where \(H_{gg} = X_g (X'X)^{-1} X_g'\); this generalizes the HC3 measure in {MacKinnon and White, 1985} and is equivalent to the jackknife estimator of \(\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X})\). {Cameron2008} referred to this correction as CR3.
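A minimal sketch of the standard CRVE in Python/NumPy, using plain OLS residuals and no finite-cluster correction (`crve` and its arguments are hypothetical names):

```python
import numpy as np

def crve(X, y, cluster_ids):
    """Standard CRVE: (X'X)^{-1} (sum_g X_g' u_g u_g' X_g) (X'X)^{-1} with OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster_ids):
        idx = cluster_ids == g
        score_g = X[idx].T @ u[idx]            # X_g' u_hat_g, a k-vector
        meat += np.outer(score_g, score_g)     # X_g' u_hat_g u_hat_g' X_g
    return bread @ meat @ bread, beta

# Cluster-robust standard errors are the square roots of the diagonal; a finite-cluster
# correction (e.g. the CR3 residual correction above) can be applied to the residuals first.
```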
Resampling methods:
The residual cluster bootstrap resamples entire clusters of residuals with replacement and reattaches them to the fixed regressors; it requires balanced clusters (equal \(N_g\)) so that a resampled residual block conforms to the cluster it is assigned to, as sketched below.
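A minimal sketch in Python/NumPy (hypothetical names); note that `np.stack` fails unless every cluster has the same \(N_g\), which is exactly the balance requirement:

```python
import numpy as np

def residual_cluster_bootstrap(X, y, cluster_ids, B=999, rng=None):
    """Resample whole clusters of OLS residuals and rebuild y on the fixed X."""
    rng = np.random.default_rng() if rng is None else rng
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    groups = np.unique(cluster_ids)
    # np.stack raises an error unless all clusters have equal size: balanced clusters required.
    u_blocks = np.stack([u[cluster_ids == g] for g in groups])
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        draw = rng.integers(len(groups), size=len(groups))    # clusters drawn with replacement
        y_star = np.empty_like(y)
        for g, h in zip(groups, draw):
            idx = cluster_ids == g
            y_star[idx] = X[idx] @ beta + u_blocks[h]         # cluster g receives cluster h's residual block
        betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas    # B bootstrap coefficient vectors for subsequent inference
```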