Asymptotic analysis applies only when the sample size is large, and its results are limited to statistics that are analytically tractable. The bootstrap [@Efron1979] is an empirical, nonparametric alternative to asymptotic analysis: it estimates the sampling distribution of a statistic by resampling the original sample with replacement, i.e. sampling from the empirical distribution function (EDF). As a nonparametric method, the bootstrap is a sensible default inference procedure when we have no knowledge of the population or sampling distribution, or when the sample size is too small for asymptotic approximations to be reliable.
Notations: $N$, sample size; $B$, number of bootstrap repetitions; $\beta_1$, coefficient of interest; $H_0: \beta_1 = \beta_1^0$, null hypothesis; $\hat{\beta}$, estimator; $\hat{\beta}_{1b}^{*}$, bootstrap estimator from replicate $b$, which may differ from the standard estimator; $\hat{\beta}^R$, restricted estimator; $\mathbf{u}$, residuals; $s$, estimator of standard error; $w = (\hat{\beta}_1 - \beta_1^0) / s_{\hat{\beta}_1}$, Wald test statistic; $\alpha$, significance level.
Resampling refers to any method that creates replicate datasets from available data, so that a given data analysis procedure can be repeated, and the collection of outcomes can be summarized to quantify uncertainty of the original outcome, without any analytical calculation. The data analysis procedure can be estimating a population parameter (jackknife, bootstrap), testing randomness (permutation tests), validating prediction models (cross validation), etc. Sampling, in comparison, is a process of gathering observations from an idealized population to estimate properties of the population, e.g. MCMC.
The plug-in principle estimates a functional $t(F)$ of the population distribution $F$ by evaluating the same functional at the empirical distribution $\hat{F}$ based on a sample: $t(F) \approx t(\hat{F})$.
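A minimal illustration of the plug-in principle in NumPy, using a hypothetical sample: the plug-in estimate of the population variance evaluates the variance functional at the empirical distribution, which yields the average squared deviation with denominator $N$ rather than the unbiased $N - 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=500)  # hypothetical sample

# Plug-in estimate of the population variance: the variance functional
# evaluated at the empirical distribution puts mass 1/N on each point,
# so the denominator is N (not the unbiased N - 1).
plugin_var = np.mean((x - np.mean(x)) ** 2)   # same as np.var(x, ddof=0)
```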
Asymptotic refinement refers to a bootstrap approximation whose error converges to zero at a faster rate than that of first-order asymptotic theory. To achieve an asymptotic refinement, the bootstrap must be applied to an asymptotically pivotal statistic, i.e. a statistic whose asymptotic distribution does not depend on unknown parameters.
Smoothed bootstrap, or smooth bootstrap, adds random noise to each resampled observation, which is equivalent to sampling from a kernel density estimate of the data. The smoothed bootstrap offers only a second-order asymptotic refinement over the ordinary bootstrap for statistics that are differentiable functions of vector means [@Hall1989], though the improvement can still be substantial in small samples [@Efron1982]. First-order improvements are more likely for statistics of local properties of the PDF, such as the mode [@Romano1988], quantiles [@Hall1989], and least absolute deviations regression [@DeAngelis1993].
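A sketch of the smoothed bootstrap on a hypothetical sample: adding Gaussian noise with standard deviation `h` to each resampled observation is the same as drawing from a Gaussian KDE with bandwidth `h`. The bandwidth here is an arbitrary assumed value; in practice it would come from a KDE bandwidth rule. The sample median, a local property of the density, is the kind of statistic where smoothing helps.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100)  # hypothetical sample
B, h = 2000, 0.2                          # repetitions; assumed bandwidth

# Ordinary bootstrap: resample indices with replacement, B rows at once.
idx = rng.integers(0, x.size, size=(B, x.size))
plain = x[idx]

# Smoothed bootstrap: add N(0, h^2) noise to each resampled observation,
# i.e. sample from a Gaussian kernel density estimate of the data.
smooth = plain + rng.normal(scale=h, size=plain.shape)

# Compare bootstrap standard errors of the sample median.
se_plain = np.median(plain, axis=1).std(ddof=1)
se_smooth = np.median(smooth, axis=1).std(ddof=1)
```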
bootstrap standard error;
bootstrap confidence interval: percentile bootstrap; bias-corrected and accelerated (BCa) bootstrap [@Efron1987];
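The bootstrap standard error and percentile interval above can be sketched in a few lines of NumPy; the sample, statistic (the mean), and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(size=200)  # hypothetical skewed sample
B, alpha = 4000, 0.05

# Bootstrap distribution of the sample mean: B resamples with replacement.
boot = np.array([x[rng.integers(0, x.size, x.size)].mean() for _ in range(B)])

se_boot = boot.std(ddof=1)                              # bootstrap standard error
lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])  # percentile bootstrap CI
```

The BCa interval corrects the percentile interval for bias and skewness; SciPy's `scipy.stats.bootstrap` implements it via `method='BCa'`.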
Bootstrap confidence intervals are asymptotically more accurate than the standard intervals obtained using the sample variance and a normality assumption [@DiCiccio1996].
Residual and wild bootstraps can impose the null hypothesis in resampling [@Davidson1999]: the bootstrap Wald statistics are centered on $\beta_1^0$, and the residuals resampled are those from the restricted OLS estimator that imposes $H_0$.
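A sketch of a wild bootstrap test that imposes the null, under an assumed heteroskedastic DGP (all names and constants are illustrative): residuals from the restricted fit are multiplied by Rademacher weights, and each bootstrap Wald statistic is centered on $\beta_1^0$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, B, beta1_null = 100, 999, 0.0

# Hypothetical DGP: y = b0 + b1*x + u with heteroskedastic errors.
x = rng.normal(size=N)
y = 1.0 + beta1_null * x + np.abs(x) * rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

def ols(X, y):
    """OLS with HC1 heteroskedasticity-robust standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u ** 2)[:, None])
    V = XtX_inv @ meat @ XtX_inv * N / (N - X.shape[1])
    return beta, u, np.sqrt(np.diag(V))

beta, u, se = ols(X, y)
w = (beta[1] - beta1_null) / se[1]  # Wald statistic on the original sample

# Restricted OLS imposing H0 (intercept-only fit of y - beta1_null * x).
y_r = y - beta1_null * x
b0_r = y_r.mean()
u_r = y_r - b0_r
y_fit_r = b0_r + beta1_null * x

# Wild bootstrap: Rademacher weights on the restricted residuals.
w_boot = np.empty(B)
for b in range(B):
    v = rng.choice([-1.0, 1.0], size=N)
    y_b = y_fit_r + u_r * v
    beta_b, _, se_b = ols(X, y_b)
    w_boot[b] = (beta_b[1] - beta1_null) / se_b[1]  # centered on the null

# Symmetric bootstrap p-value.
p = np.mean(np.abs(w_boot) >= np.abs(w))
```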
Bootstrap-t procedures provide asymptotic refinement because the t statistic is asymptotically pivotal; bootstrap-se procedures do not.
A sample may contain clusters of observational units such that regression errors are independent across clusters but correlated within. Such correlation effectively reduces the sample size toward the number of clusters for statistical inference that assumes errors are independent across observations. See [@Cameron2015] for a good review of inference with clustered data.
Number of clusters in the sample, $G$; number of observations in cluster $g$, $N_g$; subsample of cluster $g$, $(y_g, X_g)$; covariance matrix of regression errors within cluster $g$, $\Sigma_g$; individual $i$ in cluster $g$ has subscript $ig$.
Covariance matrix of the OLS estimator on clustered data is: $$\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X}) = (X' X)^{-1} \left( \sum_{g = 1}^G X_g' \Sigma_g X_g \right) (X' X)^{-1}$$
Cluster-robust variance estimator (CRVE), $\widehat{\text{Var}}_\text{CR}(\hat{\boldsymbol{\beta}})$, replaces $\Sigma_g$ with the sample estimate $\tilde{u}_g \tilde{u}_g'$, where $\tilde{u}_g$ are corrected residuals; the standard CRVE simply uses the OLS residuals. [@Bell2002] proposed a correction $\tilde{u}_g = u_g \sqrt{G / (G-1)}$, which generalizes the HC3 measure of [@MacKinnon1985] and is equivalent to the jackknife estimator of $\text{Var}(\hat{\boldsymbol{\beta}} \mid \mathbf{X})$. [@Cameron2008] refer to this correction as CR3.
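A sketch of the CRVE sandwich formula above on a hypothetical DGP with a cluster random effect (cluster count, sizes, and coefficients are assumed): the middle term sums $X_g' u_g u_g' X_g$ over clusters, using the plain OLS residuals.

```python
import numpy as np

rng = np.random.default_rng(4)
G, Ng = 20, 30                   # clusters, observations per cluster
N = G * Ng
g = np.repeat(np.arange(G), Ng)  # cluster ids

# Hypothetical DGP: a common cluster effect makes errors correlate within clusters.
x = rng.normal(size=N)
u = rng.normal(size=G)[g] + rng.normal(size=N)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(N), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# CRVE sandwich: sum_g X_g' u_g u_g' X_g in the middle, (X'X)^{-1} as bread.
meat = np.zeros((2, 2))
for j in range(G):
    Xg, ug = X[g == j], resid[g == j]
    s = Xg.T @ ug                # X_g' u_g, a 2-vector
    meat += np.outer(s, s)
V_cr = XtX_inv @ meat @ XtX_inv
se_cr = np.sqrt(np.diag(V_cr))   # cluster-robust standard errors
```

Software commonly multiplies `meat` by a small-sample factor such as $\frac{G}{G-1}\frac{N-1}{N-k}$; the sketch omits it.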
Resampling methods:
Residual cluster bootstrap requires balanced clusters.
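The pairs cluster bootstrap, by contrast, resamples whole clusters $(y_g, X_g)$ with replacement and so handles unbalanced clusters; a minimal sketch on hypothetical unbalanced data (all names and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
G, B = 15, 1000

# Hypothetical unbalanced clustered data, stored per cluster as (y_g, X_g);
# a common cluster effect induces within-cluster error correlation.
clusters = []
for j in range(G):
    n = rng.integers(5, 25)
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal() + rng.normal(size=n)
    clusters.append((y, np.column_stack([np.ones(n), x])))

def slope(cs):
    """OLS slope on a list of (y_g, X_g) clusters."""
    y = np.concatenate([c[0] for c in cs])
    X = np.vstack([c[1] for c in cs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Pairs cluster bootstrap: draw G clusters with replacement, refit each time.
boot = np.array([slope([clusters[i] for i in rng.integers(0, G, G)])
                 for _ in range(B)])
se_cluster = boot.std(ddof=1)  # cluster-robust bootstrap-se for the slope
```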
Bootstrap for data with other dependency structures, e.g. time series; point processes.
[@Efron1993]
[@Davison1997]