Asymptotic analysis applies when the sample size is large, and its results are limited to statistics that are analytically tractable. The bootstrap [@Efron1979] is an empirical, nonparametric alternative to asymptotic analysis: it estimates the sampling distribution of a statistic by resampling the original sample with replacement, i.e. resampling from the empirical distribution function (EDF). As a nonparametric method, the bootstrap is a sensible default inference procedure when we have no knowledge of the population or sampling distribution, or when the sample size is too small for asymptotic approximations to be reliable.
Notations: $N$, sample size; $B$, number of bootstrap replications; $\beta_1$, coefficient of interest; $H_0: \beta_1 = \beta_1^0$, null hypothesis; $\hat{\beta}$, estimator; $\hat{\beta}_{1b}^*$, bootstrap estimator, which may differ from the standard estimator; $\hat{\beta}^R$, restricted estimator; $\mathbf{u}$, residuals; $s$, standard error estimator; $w = (\hat{\beta}_1 - \beta_1^0) / s_{\hat{\beta}_1}$, Wald test statistic; $\alpha$, significance level.
Resampling refers to any method that creates replicate datasets from the available data, so that a given data analysis procedure can be repeated and the collection of outcomes summarized to quantify the uncertainty of the original outcome, without any analytical calculation. The data analysis procedure can be estimating a population parameter (jackknife, bootstrap), testing randomness (permutation tests), validating prediction models (cross-validation), etc. Sampling or simulation, in comparison, is a process of gathering observations from an idealized population to estimate properties of the population, e.g. Monte Carlo methods.
The plug-in principle is the method of estimating a functional $t(F)$ of the population distribution $F$ by evaluating the same functional at the empirical distribution $\hat{F}$ of a sample: $t(F) \approx t(\hat{F})$.
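A minimal sketch of the plug-in principle, assuming NumPy; the choice of functional (coefficient of variation) and the exponential population are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)  # sample from an Exp(scale=2) population

# A functional of a distribution: the coefficient of variation, t(F) = sd/mean.
def coef_var(sample):
    return sample.std(ddof=0) / sample.mean()

# Plug-in estimate: evaluate the same functional at the empirical distribution,
# i.e. apply it directly to the observed sample.
plug_in = coef_var(x)

# For any exponential distribution the true coefficient of variation is exactly 1,
# so plug_in should be close to 1.
print(plug_in)
```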
Asymptotic refinement refers to a convergence rate faster than that of first-order asymptotic theory. To achieve an asymptotic refinement, a bootstrap needs to be applied to an asymptotically pivotal statistic, i.e. a statistic whose asymptotic distribution does not contain unknown parameters.
Smoothed bootstrap, or smooth bootstrap, adds random noise to each resampled observation; it is equivalent to sampling from a kernel density estimate of the data. For statistics that are differentiable functions of vector means, the smoothed bootstrap offers only a second-order asymptotic refinement over the ordinary bootstrap [@Hall1989], though the improvement can still be substantial in small samples [@Efron1982]. First-order improvements are more likely for statistics of local properties of the PDF, e.g. the mode [@Romano1988], quantiles [@Hall1989], and least absolute deviations regression [@Angelis1993].
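A smoothed-bootstrap sketch for the median (a quantile, where the text suggests smoothing is most likely to help), assuming NumPy; the Silverman rule-of-thumb bandwidth and the normal test data are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
n, B = len(x), 2000

# Silverman's rule-of-thumb bandwidth (an assumption; any reasonable KDE bandwidth works).
iqr = np.percentile(x, 75) - np.percentile(x, 25)
h = 0.9 * min(x.std(ddof=1), iqr / 1.349) * n ** -0.2

smoothed = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)      # ordinary resample from the EDF
    noise = rng.normal(scale=h, size=n)   # Gaussian kernel noise -> sample from the KDE
    smoothed[b] = np.median(x[idx] + noise)

se_median = smoothed.std(ddof=1)          # smoothed bootstrap SE of the median
print(se_median)
```

The only change relative to the ordinary bootstrap is the added `noise` term; dropping it recovers the standard resampling scheme.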
bootstrap standard error;
bootstrap confidence interval: percentile bootstrap; bias-corrected and accelerated (BCa) bootstrap [@Efron1987];
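The two items above can be sketched together in a few lines, assuming NumPy; the normal test data, sample size, and $B$ are illustrative choices (the BCa interval needs extra bias and acceleration corrections and is not shown):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=200)
n, B = len(x), 4000

# Nonparametric bootstrap: resample with replacement, recompute the statistic.
boot_means = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])

se_boot = boot_means.std(ddof=1)                       # bootstrap standard error
ci_lo, ci_hi = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile interval

print(se_boot, (ci_lo, ci_hi))
```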
Bootstrap confidence intervals are asymptotically more accurate than the standard intervals obtained using the sample variance and a normality assumption [@DiCiccio1996].
Pairs bootstrap is often acceptable if the data set is fairly large. But in regression problems the explanatory variables are often fixed (i.e. measured without error), or at least observed with more control than the response variable, and the range of the explanatory variables is itself informative. Each pairs bootstrap resample therefore loses some information.
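The fixed-design alternative is the residual bootstrap, which keeps the design matrix constant and resamples only the residuals. A minimal sketch assuming NumPy; the linear DGP and its coefficients are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = np.linspace(0, 10, n)             # fixed design: X is held constant across resamples
X = np.column_stack([np.ones(n), x1])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b_hat
resid = y - fitted

B = 2000
boot_slopes = np.empty(B)
for b in range(B):
    # Resample residuals only; reuse the original design matrix.
    y_star = fitted + rng.choice(resid, size=n, replace=True)
    boot_slopes[b] = np.linalg.lstsq(X, y_star, rcond=None)[0][1]

se_slope = boot_slopes.std(ddof=1)     # residual-bootstrap SE of the slope
print(se_slope)
```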
Residual and wild bootstraps can impose the null hypothesis in resampling [@Davidson1999], where the bootstrap Wald statistics are centered on $\beta_1^0$ and the resampled residuals are those from the restricted OLS estimator that imposes $H_0$.
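A residual-bootstrap sketch of this null-imposing test, assuming NumPy; the DGP, the null $\beta_1^0 = 0$ (under which the restricted fit is a regression on the constant only), and $B = 999$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, B = 80, 999
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 0.5 + 0.0 * x1 + rng.normal(size=n)     # data generated under H0: beta_1 = 0
beta1_0 = 0.0                                # null value

def ols_with_se(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    s2 = u @ u / (len(y) - X.shape[1])
    return b, np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

b_hat, se = ols_with_se(X, y)
w_obs = (b_hat[1] - beta1_0) / se[1]         # observed Wald statistic

# Restricted fit imposing H0 (beta_1 = 0 here means regressing on the constant only).
fitted_r = np.full(n, y.mean())
resid_r = y - fitted_r

w_star = np.empty(B)
for b in range(B):
    y_star = fitted_r + rng.choice(resid_r, size=n, replace=True)
    b_s, se_s = ols_with_se(X, y_star)
    w_star[b] = (b_s[1] - beta1_0) / se_s[1]  # bootstrap Wald, centered on the null

p_value = np.mean(np.abs(w_star) >= abs(w_obs))
print(w_obs, p_value)
```

Because the resampled residuals come from the restricted fit, the bootstrap distribution of $w^*$ is generated under $H_0$, which is what makes the comparison with $w$ valid.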
Bootstrap-t procedures provide asymptotic refinement, because the t statistic is asymptotically pivotal; bootstrap-se procedures do not.
A sample may contain clusters of observational units such that regression errors are independent across clusters but correlated within them. Such correlation effectively reduces the sample size toward the number of clusters for inference procedures that assume errors are independent across observations. See [@Cameron2015] for a good review of inference with clustered data.
Number of clusters in sample, $G$; number of observations in cluster $g$, $N_g$; subsample of cluster $g$, $(y_g, X_g)$; covariance matrix of regression errors within cluster $g$, $\Sigma_g$; individual $i$ in cluster $g$ has subscript $ig$.
Covariance matrix of the OLS estimator on clustered data is: $$\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X}) = (X' X)^{-1} \left( \sum_{g = 1}^G X_g' \Sigma_g X_g \right) (X' X)^{-1}$$
Cluster-robust variance estimator (CRVE), $\widehat{\text{Var}}_\text{CR}(\hat{\beta})$, replaces $\Sigma_g$ with the sample estimate $\tilde{\mathbf{u}}_g \tilde{\mathbf{u}}_g'$. Here $\tilde{\mathbf{u}}_g$ denotes corrected residuals; the standard CRVE simply uses the OLS residuals. [@Bell2002] proposed a correction $\tilde{\mathbf{u}}_g = \mathbf{u}_g \sqrt{G / (G-1)}$, which generalizes the HC3 measure in [@MacKinnon1985] and is equivalent to the jackknife estimator of $\text{Var}(\hat{\boldsymbol{\beta}} \mid \textbf{X})$. [@Cameron2008] referred to this correction as CR3.
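The sandwich formula above can be sketched directly, assuming NumPy; the clustered DGP (a shared cluster effect plus idiosyncratic noise) and the cluster sizes are invented for illustration, and the uncorrected OLS residuals are used:

```python
import numpy as np

rng = np.random.default_rng(5)
G, Ng = 30, 10                        # 30 clusters of 10 observations each
n = G * Ng
cluster = np.repeat(np.arange(G), Ng)
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
# Cluster-correlated errors: shared cluster effect plus idiosyncratic noise.
u = rng.normal(size=G)[cluster] + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + u

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_hat

XtX_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for g in range(G):
    m = cluster == g
    Xg, ug = X[m], resid[m]
    meat += Xg.T @ np.outer(ug, ug) @ Xg  # X_g' u_g u_g' X_g

V_cr = XtX_inv @ meat @ XtX_inv           # sandwich estimator
se_cr = np.sqrt(np.diag(V_cr))            # cluster-robust standard errors
print(se_cr)
```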
Resampling methods:
Residual cluster bootstrap requires balanced clusters.
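The pairs cluster bootstrap, which resamples whole clusters and has no balance requirement, can be sketched as follows, assuming NumPy; the clustered DGP is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
G, Ng = 25, 8
n = G * Ng
cluster = np.repeat(np.arange(G), Ng)
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 1.0 + 2.0 * x1 + rng.normal(size=G)[cluster] + rng.normal(size=n)

B = 1000
boot_slopes = np.empty(B)
for b in range(B):
    # Resample whole clusters with replacement, keeping each cluster's rows together
    # to preserve the within-cluster correlation.
    gs = rng.integers(0, G, size=G)
    idx = np.concatenate([np.flatnonzero(cluster == g) for g in gs])
    boot_slopes[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0][1]

se_cluster = boot_slopes.std(ddof=1)   # cluster-bootstrap SE of the slope
print(se_cluster)
```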
Bootstrap on data with other dependency structures, e.g. time series and point processes.
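For time series, one standard approach is the moving block bootstrap, which resamples overlapping blocks of consecutive observations to preserve short-range dependence. A sketch assuming NumPy; the AR(1) series and the block length are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
# AR(1) series: observations are serially dependent, so i.i.d. resampling is invalid.
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + e[t]

block = 10                               # block length (a tuning choice)
n_blocks = n // block
starts = np.arange(n - block + 1)        # all overlapping block start points

B = 1000
boot_means = np.empty(B)
for b in range(B):
    picks = rng.choice(starts, size=n_blocks, replace=True)
    xb = np.concatenate([x[s:s + block] for s in picks])
    boot_means[b] = xb.mean()

se_mbb = boot_means.std(ddof=1)          # block-bootstrap SE of the mean
print(se_mbb)
```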
Bootstrap becomes computationally prohibitive in multiple testing on genomes and in big data settings. The bag of little bootstraps (BLB) [@Kleiner2014] extends the idea of the bootstrap to large data sets.
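A BLB sketch for the standard error of a mean, assuming NumPy; the subset size $b = n^{0.6}$ follows the suggestion in [@Kleiner2014], while the number of subsets, replications, and the normal test data are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
x = rng.normal(loc=3.0, size=n)

s = 10                  # number of subsets ("little" bootstraps)
b = int(n ** 0.6)       # subset size
r = 50                  # bootstrap replications per subset

subset_ses = np.empty(s)
for j in range(s):
    sub = rng.choice(x, size=b, replace=False)     # one small subsample
    stats = np.empty(r)
    for k in range(r):
        # Draw a size-n resample from the subset via multinomial counts, so each
        # replicate has the full-data sample size without storing n points.
        counts = rng.multinomial(n, np.full(b, 1.0 / b))
        stats[k] = (counts @ sub) / n              # weighted mean of the subset
    subset_ses[j] = stats.std(ddof=1)

se_blb = subset_ses.mean()   # average the per-subset SE estimates
print(se_blb)
```

The multinomial-weight trick is what makes BLB cheap: each replicate touches only the $b$ subset points, yet mimics a bootstrap of the full sample size $n$.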
An introduction to the bootstrap. [@Efron1993]
Bootstrap Methods and their Application. [@Davison1997]