Bootstrap is a nonparametric resampling method for estimating the variability of statistical estimates.
Asymptotic analysis applies when the sample size is large, and its results are limited to statistics that are analytically tractable. The bootstrap [@Efron1979] is an empirical alternative to asymptotic analysis: it estimates the sampling distribution of a statistic by resampling the original sample with replacement, that is, by sampling from the empirical distribution function (EDF).
As a nonparametric method, bootstrap should be the default inference procedure when we have no knowledge of the population or sampling distribution, or when the sample size is insufficient.
Notations: Sample size, $N$; Bootstrap repetition, $B$; Coefficient of interest, $\beta_1$; Null hypothesis, $H_0: \beta_1 = \beta_1^0$; Estimator, $\hat{\beta}$; Bootstrap estimator, $\hat{\beta}_{1b}^*$, which may be different from the standard estimator; Restricted estimator, $\hat{\beta}^R$; Residuals, $\mathbf{u}$; Estimator of standard error, $s$; Wald test statistic, $w = (\hat{\beta}_1 - \beta_1^0) / s_{\hat{\beta}_1}$; Significance level, $\alpha$.
Reference books: {Efron, Tibshirani, 1993}, {Davison, Hinkley, 1997}.
Assuming samples are IID.
The plug-in principle is the method of estimation of functionals of a population distribution by evaluating the same functionals at the empirical distribution based on a sample: $g\{X\} \dot \sim g\{x_I\}$.
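As a minimal sketch of the plug-in principle, take the variance functional $\text{Var}(X) = E[(X - E X)^2]$ and evaluate it at the empirical distribution, which puts mass $1/N$ on each observation. The helper names `expectation` and `plug_in_variance` are illustrative, not from any library.

```python
def expectation(values):
    """Expectation under the empirical distribution: mass 1/N on each point."""
    return sum(values) / len(values)

def plug_in_variance(sample):
    """Plug-in estimate of Var(X) = E[(X - E X)^2]."""
    mu = expectation(sample)
    return expectation([(x - mu) ** 2 for x in sample])

sample = [1.0, 2.0, 3.0, 4.0]
var_plug_in = plug_in_variance(sample)  # divides by N, not N - 1
```

Note that the plug-in variance divides by $N$, so it differs from the usual unbiased sample variance by a factor of $(N-1)/N$.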
bootstrap standard error;
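A minimal sketch of the bootstrap standard error, assuming IID data and using only the Python standard library: draw $B$ resamples of size $N$ with replacement, evaluate the statistic on each, and take the standard deviation of the replicates. The function name `bootstrap_se`, the toy data, and the default $B = 1000$ are illustrative choices.

```python
import random
import statistics

def bootstrap_se(sample, statistic, B=1000, seed=0):
    """Estimate the standard error of `statistic` by the bootstrap."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Draw N observations with replacement from the original sample.
        resample = rng.choices(sample, k=n)
        replicates.append(statistic(resample))
    # The bootstrap SE is the standard deviation of the B replicates.
    return statistics.stdev(replicates)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
se_mean = bootstrap_se(data, statistics.mean)
se_median = bootstrap_se(data, statistics.median)
```

The same function works for statistics without a simple analytic standard error, such as the median.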
bootstrap confidence interval: percentile bootstrap; bias-corrected and accelerated (BCa) bootstrap [@Efron1987];
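The percentile bootstrap interval can be sketched as follows: the endpoints are the $\alpha/2$ and $1 - \alpha/2$ empirical quantiles of the bootstrap replicates. The function name `percentile_ci` and the simple order-statistic rule used for the quantiles are assumptions of this sketch; BCa adds bias and acceleration adjustments that are not shown here.

```python
import random
import statistics

def percentile_ci(sample, statistic, alpha=0.05, B=2000, seed=0):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = random.Random(seed)
    n = len(sample)
    # Sorted bootstrap replicates of the statistic.
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    # Pick order statistics closest to the alpha/2 and 1 - alpha/2 quantiles.
    k_lo = round((alpha / 2) * (B - 1))
    k_hi = round((1 - alpha / 2) * (B - 1))
    return reps[k_lo], reps[k_hi]

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
ci_lo, ci_hi = percentile_ci(data, statistics.mean)
```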
Bootstrap confidence intervals such as BCa are asymptotically more accurate than the standard confidence intervals obtained using the sample variance and a normality assumption {DiCiccio, Efron, 1996}.
Residual and wild bootstraps can impose the null hypothesis in resampling {Davidson, MacKinnon, 1999}, where the bootstrap Wald statistics are centered on $\beta_1^0$ and the residuals bootstrapped are those from the restricted OLS estimator that imposes $H_0$.
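A hedged sketch of a wild bootstrap that imposes the null, for the simple regression $y = \beta_0 + \beta_1 x + u$. Several choices here are assumptions of the sketch rather than part of the cited method: Rademacher weights ($\pm 1$ with equal probability), an HC0 heteroskedasticity-robust standard error, a symmetric two-sided p-value, and the helper names `ols_slope` and `wild_bootstrap_pvalue`.

```python
import random

def ols_slope(x, y):
    """Return (slope, HC0 standard error) for y = b0 + b1*x + u."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    var_b1 = sum((d * u) ** 2 for d, u in zip(dx, resid)) / sxx ** 2
    return b1, var_b1 ** 0.5

def wild_bootstrap_pvalue(x, y, beta1_null=0.0, B=999, seed=0):
    """Bootstrap p-value for H0: beta_1 = beta1_null, imposing H0 in resampling."""
    rng = random.Random(seed)
    n = len(x)
    b1, se = ols_slope(x, y)
    w = (b1 - beta1_null) / se  # Wald statistic on the original sample
    # Restricted OLS: with the slope fixed at beta1_null, only the intercept is estimated.
    b0_r = sum(yi - beta1_null * xi for xi, yi in zip(x, y)) / n
    u_r = [yi - b0_r - beta1_null * xi for xi, yi in zip(x, y)]
    exceed = 0
    for _ in range(B):
        # Flip the sign of each restricted residual at random (Rademacher weights).
        y_star = [b0_r + beta1_null * xi + ui * rng.choice((-1.0, 1.0))
                  for xi, ui in zip(x, u_r)]
        b1_s, se_s = ols_slope(x, y_star)
        w_star = (b1_s - beta1_null) / se_s  # centered on the null value
        if abs(w_star) >= abs(w):
            exceed += 1
    return w, exceed / B

# Simulated heteroskedastic data (illustrative only).
gen = random.Random(1)
x = [gen.uniform(0, 10) for _ in range(30)]
y = [1.0 + 0.5 * xi + gen.gauss(0, 1 + 0.2 * xi) for xi in x]
w, p_value = wild_bootstrap_pvalue(x, y, beta1_null=0.0)
```

Because the bootstrap samples are generated under $H_0$, the bootstrap Wald statistics are centered on $\beta_1^0$ rather than on $\hat{\beta}_1$.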
Asymptotic refinement refers to a convergence rate faster than that given by first-order asymptotic theory. To have asymptotic refinement, a bootstrap needs to be applied to an asymptotically pivotal statistic, that is, a statistic whose asymptotic distribution does not contain unknown parameters. Bootstrap-t procedures provide asymptotic refinement, while bootstrap-se procedures do not.
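A minimal sketch of a bootstrap-t (percentile-t) interval for a mean, assuming IID data. Each resample is studentized with its own standard error, so the bootstrapped quantity is asymptotically pivotal; a bootstrap-se procedure would instead use the replicates only to estimate a standard error and then rely on normal critical values. The function name `bootstrap_t_ci` and the quantile rule are illustrative choices.

```python
import random
import statistics

def bootstrap_t_ci(sample, alpha=0.05, B=1999, seed=0):
    """Bootstrap-t confidence interval for the population mean."""
    rng = random.Random(seed)
    n = len(sample)
    est = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    t_stars = []
    while len(t_stars) < B:
        res = rng.choices(sample, k=n)
        se_star = statistics.stdev(res) / n ** 0.5
        if se_star == 0.0:
            continue  # degenerate resample (all values equal); redraw
        # Studentize around the original estimate using the resample's own SE.
        t_stars.append((statistics.mean(res) - est) / se_star)
    t_stars.sort()
    q_lo = t_stars[round((alpha / 2) * (B - 1))]
    q_hi = t_stars[round((1 - alpha / 2) * (B - 1))]
    # The upper quantile of t* sets the lower endpoint, and vice versa.
    return est - q_hi * se, est - q_lo * se

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
t_lo, t_hi = bootstrap_t_ci(data)
```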
Time series; point processes;
A sample may contain clusters of observational units such that the regression errors of the observations are independent across clusters but correlated within. Such correlation effectively reduces the sample size to the number of clusters in statistical inference, where errors are assumed to be independent across observations. See [@Cameron2015] for a good review of inference with clustered data.
Number of clusters in sample, $G$; Number of observations in cluster $g$, $N_g$; Subsample of cluster $g$, $(y_g, X_g)$; Covariance matrix of regression errors within cluster $g$, $\Sigma_g$; Individual $i$ in cluster $g$ has subscript $ig$.
Covariance matrix of the OLS estimator on clustered data is: $$\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X}) = (X' X)^{-1} \left( \sum_{g = 1}^G X_g' \Sigma_g X_g \right) (X' X)^{-1}$$
The cluster-robust variance estimator (CRVE), $\widehat{\text{Var}}_\text{CR}(\hat{\beta})$, replaces $\Sigma_g$ with the sample estimate $\tilde{u}_g \tilde{u}_g'$. Here $\tilde{u}_g$ is the vector of corrected residuals for cluster $g$, and the standard CRVE simply uses the OLS residuals $u_g$. {Bell and McCaffrey, 2002} proposed the correction $\tilde{u}_g = \sqrt{(G-1)/G} \, (I_{N_g} - H_{gg})^{-1} u_g$, with $H_{gg} = X_g (X' X)^{-1} X_g'$, which generalizes the HC3 measure in {MacKinnon and White, 1985} and is equivalent to the jackknife estimator of $\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X})$. [@Cameron2008] referred to this correction as CR3.
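To make the sandwich form concrete, here is a sketch of the standard CRVE (uncorrected OLS residuals) in the simplest case of a single regressor with no intercept, where every matrix reduces to a scalar: $(X'X)^{-1} = 1 / \sum x_{ig}^2$ and $X_g' u_g u_g' X_g = (\sum_{i \in g} x_{ig} u_{ig})^2$. The scalar restriction and the function name `crve_single_regressor` are assumptions made to keep the example dependency-free.

```python
from collections import defaultdict

def crve_single_regressor(x, y, cluster):
    """OLS slope and standard CRVE for y = beta * x + u (no intercept)."""
    sxx = sum(xi * xi for xi in x)                      # X'X
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx   # OLS estimate
    scores = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        scores[g] += xi * (yi - beta * xi)              # accumulates X_g' u_g
    meat = sum(s * s for s in scores.values())          # sum_g X_g' u_g u_g' X_g
    return beta, meat / sxx ** 2                        # bread * meat * bread

x = [1.0, 1.0, 1.0, 1.0]
y = [1.0, 3.0, 2.0, 4.0]
beta_hat, var_cr = crve_single_regressor(x, y, cluster=[0, 0, 1, 1])
# Treating every observation as its own cluster gives the HC0 estimator.
_, var_hc0 = crve_single_regressor(x, y, cluster=[0, 1, 2, 3])
```

In this toy data the residuals within each cluster partly cancel, so the clustered variance comes out smaller than the HC0 one; with positively correlated within-cluster errors the clustered variance is typically larger.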
Resampling methods:
Residual cluster bootstrap requires balanced clusters.
Smoothed bootstrap, or smooth bootstrap, adds random noise to each resampled observation. It is equivalent to sampling from a kernel density estimate of the data.
Smoothed bootstrap only has second-order asymptotic refinement over the standard bootstrap for statistics that are differentiable functions of vector means {Hall, DiCiccio, Romano, 1989}, but the improvement can still be substantial in small samples [@Efron1982]. First-order improvements are more likely for statistics of local properties of the PDF, such as mode [@Romano1988], quantiles {Hall, DiCiccio, Romano, 1989}, and least absolute values regression {De Angelis, Hall, Young, 1993}.
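A minimal sketch of the smoothed bootstrap with a Gaussian kernel: each resampled observation is perturbed by $N(0, h^2)$ noise, which is the same as drawing from a Gaussian kernel density estimate with bandwidth $h$. The default bandwidth below, the normal-reference rule $h = 1.06 \, \hat{\sigma} N^{-1/5}$, is an assumption of this sketch, as is the function name `smoothed_bootstrap_se`.

```python
import random
import statistics

def smoothed_bootstrap_se(sample, statistic, h=None, B=1000, seed=0):
    """Standard error of `statistic` from a Gaussian smoothed bootstrap."""
    rng = random.Random(seed)
    n = len(sample)
    if h is None:
        # Normal-reference rule of thumb for the bandwidth (an assumption).
        h = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    reps = []
    for _ in range(B):
        # Resample with replacement, then add kernel noise to each draw.
        res = [xi + rng.gauss(0.0, h) for xi in rng.choices(sample, k=n)]
        reps.append(statistic(res))
    return statistics.stdev(reps)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
se_median_smooth = smoothed_bootstrap_se(data, statistics.median)
```

Setting `h=0` removes the noise and recovers the ordinary bootstrap.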