A model form, often simply called a model, specifies a set of probability distributions of an outcome variable conditioned on some predictors. Specifically, a (parametric) model is a family of conditional probability distributions indexed by its parameters.
In comparison, an estimator specifies the model parameters that are optimal under some criterion, as a function of a sample.
A fitted model specifies a single probability distribution of the outcome variable conditioned on the predictors; it is the element of the model form selected by the estimator given the observed sample.
linear regression: simple linear regression;
\[ Y \mid \mathbf{X} \sim \text{Normal}(\mathbf{X} \cdot \beta, \sigma^2) \]
Generalized linear models (GLMs): polynomial regression, splines; count data; logistic regression (LR) (K=2);
The estimator is defined by the cost function (optimization criterion) on the training data: least squares (LS), ridge, lasso.
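As a sketch of how different cost functions define different estimators on the same model form, the snippet below compares LS and ridge on simulated data (numpy only; ridge has a closed form, while lasso does not and is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Least squares: minimize ||y - X beta||^2
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: minimize ||y - X beta||^2 + lam * ||beta||^2 (closed form)
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The ridge penalty shrinks the coefficient vector toward zero, so its norm is smaller than that of the LS solution.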
R-squared \( R^2 \), or the coefficient of determination, is the proportion of sample variance explained by a fitted model; in other words, it is the ratio of the explained (regression) sum of squares (ESS) to the total sum of squares (TSS).
\[ R^2 = \frac{\sum_i (\hat{y}_i-\bar{y})^2}{\sum_i (y_i-\bar{y})^2} \]
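For a model with an intercept, ESS/TSS coincides with \( 1 - \text{RSS}/\text{TSS} \) (RSS being the residual sum of squares); a numerical check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 3 * x + rng.normal(size=50)

# fit simple linear regression with an intercept
X = np.column_stack([np.ones(50), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta

# R^2 as explained sum of squares over total sum of squares
r2 = np.sum((yhat - y.mean())**2) / np.sum((y - y.mean())**2)
# equivalently 1 - RSS/TSS, valid when the model has an intercept
r2_alt = 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)
```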
Properties:
In a linear model with a single regressor and a constant term, the coefficient of determination \( R^2 \) is the square of the correlation between the regressor and the dependent variable,
\[ R^2 = \left( \frac{ \widehat{\mathrm{cov}(X,Y)} }{\hat{\sigma}_X \hat{\sigma}_Y} \right)^2 = \frac{\left( \frac{1}{n} \sum\limits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y}) \right)^2} { \left( \frac{1}{n}\sum\limits_{i=1}^n (x_i-\bar{x})^2 \right) \left( \frac{1}{n}\sum\limits_{i=1}^n (y_i-\bar{y})^2 \right) } \]
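A quick numerical check of this identity on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=80)
y = 1.5 * x + rng.normal(size=80)

# simple linear regression with an intercept
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
r2 = 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)

# sample correlation between regressor and outcome
corr = np.corrcoef(x, y)[0, 1]
# r2 equals corr**2 in the single-regressor case
```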
If there is more than one regressor, \( R^2 \) can be seen as the square of the coefficient of multiple correlation,
\[ R^2 = \widehat{\mathrm{cov}(Y,\mathbf{X})} \left( \hat{\sigma}_Y^2 \, \widehat{\mathrm{cov}( \mathbf{X},\mathbf{X})} \right) ^{-1} \widehat{\mathrm{cov}(\mathbf{X},Y)} \]
\[ = (Y_0^{T} X_0) [(Y_0^{T} Y_0) (X_0^{T} X_0)]^{-1} (X_0^{T} Y_0) \]
where \( X_0 \) and \( Y_0 \) denote the column-centered predictor matrix and the centered outcome vector.
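A numerical check of the matrix form against the usual \( 1 - \text{RSS}/\text{TSS} \), on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 120, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

X0 = X - X.mean(axis=0)   # centered predictor matrix
y0 = y - y.mean()         # centered outcome vector

# R^2 via the multiple-correlation matrix formula
r2_mc = (y0 @ X0) @ np.linalg.inv((y0 @ y0) * (X0.T @ X0)) @ (X0.T @ y0)

# R^2 via a fitted linear model with an intercept
Xd = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
yhat = Xd @ beta
r2_fit = 1 - np.sum((y - yhat)**2) / np.sum((y - y.mean())**2)
```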
The F-statistic is the ratio of explained variance to unexplained variance; equivalently, the ratio of between-group variance to within-group variance.
Mean squared error (MSE) of prediction is the gold standard, and it decomposes into the bias-variance trade-off:
\[ \text{MSE} = \mathrm{Var}(\hat{f}) + \mathrm{Bias}(\hat{f})^2 + \sigma^2 \]
where \( \sigma^2 \) is the irreducible (noise) variance.
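The decomposition can be illustrated by Monte Carlo: deliberately underfit a quadratic truth with a straight line, then compare the empirical prediction MSE at one test point against the sum of the three terms. All data here are simulated, and the tolerance reflects Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):              # true regression function
    return x**2

sigma = 0.3            # irreducible noise standard deviation
x_tr = np.linspace(-1, 1, 30)
x0 = 0.8               # test point
reps = 5000

preds = []
for _ in range(reps):
    y_tr = f(x_tr) + rng.normal(scale=sigma, size=x_tr.size)
    b = np.polyfit(x_tr, y_tr, 1)     # underfit on purpose: a line vs x^2
    preds.append(np.polyval(b, x0))
preds = np.array(preds)

bias2 = (preds.mean() - f(x0))**2     # squared bias of the fitted line
var = preds.var()                     # variance over training samples
# empirical prediction MSE against fresh noisy observations at x0
y_new = f(x0) + rng.normal(scale=sigma, size=reps)
mse = np.mean((preds - y_new)**2)
```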
Model selection is the process of choosing the proper level of flexibility/complexity for a model. Given any data set, LS gives the fit with the lowest RSS within the default full model (the p-dimensional linear model). However, the lowest training error does not imply the lowest prediction error; fitting the training data too closely at the expense of prediction is called overfitting.
Types of LM model selection, each indexed by a tuning parameter: subset selection (subset size \(m\)); dimension reduction (number of components \(M\)); shrinkage (penalty \(\lambda\)).
One-standard-error rule for prediction error estimates (MSE): choose the simplest fitted model (lowest \(m\) or \(M\); highest \(\lambda\)) whose prediction MSE is within one standard error of the smallest prediction MSE.
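A minimal sketch of the rule; the function name `one_se_rule` and the CV numbers below are hypothetical:

```python
import numpy as np

def one_se_rule(complexity, cv_mse, cv_se):
    """Pick the simplest model whose CV MSE is within one standard
    error of the lowest CV MSE. `complexity` must be ordered from
    simplest to most complex."""
    best = np.argmin(cv_mse)
    threshold = cv_mse[best] + cv_se[best]
    for k, m in zip(complexity, cv_mse):
        if m <= threshold:
            return k      # first (i.e. simplest) model under threshold

# hypothetical CV results for subset sizes m = 1..5
m = np.array([1, 2, 3, 4, 5])
mse = np.array([2.0, 1.2, 1.0, 0.98, 1.01])
se = np.array([0.15, 0.12, 0.10, 0.10, 0.11])
chosen = one_se_rule(m, mse, se)   # m=3 is within 0.98 + 0.10 of the minimum
```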
Estimates of prediction error (MSE or RMSE):
Adjusted R-squared increases when one more predictor is added if and only if the t-ratio of the new predictor is greater than 1 in absolute value.
\[ \bar{R}^2 = 1 - \frac{\text{RSS}/(n-k)}{\text{TSS}/(n-1)} \]
where \(k\) is the number of estimated parameters, including the intercept.
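The if-and-only-if relation between adjusted R-squared and the t-ratio can be verified directly on simulated data; the helper `adj_r2` is illustrative, with `k` counting the intercept column:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x1 = rng.normal(size=n)
y = 2 * x1 + rng.normal(size=n)
x_new = rng.normal(size=n)     # a candidate extra predictor

def adj_r2(X, y):
    """Adjusted R-squared; X includes the intercept column (k = X.shape[1])."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta)**2)
    tss = np.sum((y - y.mean())**2)
    return 1 - (rss / (n - k)) / (tss / (n - 1))

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x1, x_new])

# t-ratio of the new predictor in the larger model
beta2 = np.linalg.lstsq(X2, y, rcond=None)[0]
rss2 = np.sum((y - X2 @ beta2)**2)
sigma2 = rss2 / (n - 3)
se_new = np.sqrt(sigma2 * np.linalg.inv(X2.T @ X2)[2, 2])
t = beta2[2] / se_new
```

Whatever the data, `adj_r2(X2, y) > adj_r2(X1, y)` holds exactly when `abs(t) > 1`.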
Cross-validation (CV) estimates prediction error; the bootstrap estimates estimator variation. Cross-validation is preferred to a single validation split.
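A minimal K-fold CV sketch for least squares on simulated data; `kfold_mse` is an illustrative helper, not a library function:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def kfold_mse(X, y, k=5):
    """K-fold cross-validation estimate of prediction MSE for LS.
    The simulated data are iid, so no shuffling is needed here."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        errs.append(np.mean((y[fold] - X[fold] @ beta)**2))
    return float(np.mean(errs))

cv_mse = kfold_mse(X, y)   # should be near the noise variance (1.0)
```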
The bootstrap is repeated resampling with replacement, used to estimate estimator variation (or any other population quantity).
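A bootstrap sketch estimating the standard error of a sample median on simulated data (`B` resamples):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=5.0, size=200)   # observed sample

# bootstrap: resample the data with replacement, recompute the statistic
B = 2000
stats = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                  for _ in range(B)])
boot_se = stats.std()   # bootstrap standard error of the median
```

For a normal sample the true SE of the median is about \(1.2533\,\sigma/\sqrt{n} \approx 0.089\) here, which the bootstrap estimate should roughly recover.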
References: {ISL2015}, {ESL2013}