Statistics has two aspects: algorithms and inference (justification).
In statistics, a model is a (joint) probability distribution.
Parametric Probabilistic Model
Parametric and nonparametric estimators are not essentially different, and neither is inherently superior to the other. Both take a random sample as their sole input, and both work over a collection of candidate models. A parametric estimator, such as the MLE, is an algorithm that selects a single probability model from a subspace of the space of probability models, where the subspace is indexed by a model parameter. A nonparametric method is an algorithm that selects a single probability model from another subspace of that space, only one without an index.
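The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian family stands in for the parametric subspace, the empirical distribution stands in for the nonparametric choice, and the function names (`gaussian_mle`, `gaussian_cdf`, `empirical_cdf`) and the simulated data are hypothetical.

```python
import bisect
import math
import random
import statistics

def gaussian_mle(sample):
    """Parametric: select one model from the Gaussian family, indexed by (mu, sigma)."""
    mu = statistics.fmean(sample)
    # The MLE of the variance divides by n, not by n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / len(sample))
    return mu, sigma

def gaussian_cdf(x, mu, sigma):
    """CDF of the selected Gaussian model."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def empirical_cdf(sample):
    """Nonparametric: select the empirical distribution (mass 1/n on each observation)."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect.bisect_right(ordered, x) / n

    return cdf

# Simulated data (an assumption for illustration): 500 draws from N(2, 1.5^2).
random.seed(0)
sample = [random.gauss(2.0, 1.5) for _ in range(500)]

mu_hat, sigma_hat = gaussian_mle(sample)   # model picked via its index (mu, sigma)
ecdf = empirical_cdf(sample)               # model picked without any index
```

Both functions take the same sample as their only input and each returns one probability model; when the data really are Gaussian, `gaussian_cdf(x, mu_hat, sigma_hat)` and `ecdf(x)` should roughly agree.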
Generally, nonparametric methods are non-mechanistic methods: they are essentially statistical in nature.
Random Sample: Statistic, Sampling Distributions from Gaussian Population, Order Statistic.
Notation: Use \( X \) to denote random variables; use \( x \) to denote samples.
Traditional statistics assumes "large n, small p" (n for observations, p for parameters measured).
In modern statistics, by contrast, the typical problem is "small n, large p".
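One consequence of the "small n, large p" regime can be sketched in code. Taking ordinary least squares as the method of interest (an assumption for illustration; the note does not name one), the Gram matrix of an n-by-p design matrix has rank at most min(n, p), so it is singular whenever p > n and the least-squares solution is no longer unique. The helper names `matrix_rank`, `gram`, and `design` are hypothetical, and the data are simulated.

```python
import random

def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

def gram(X):
    """Gram matrix X^T X (p by p) of an n-by-p design matrix X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

def design(n, p):
    """Simulated design matrix with independent standard normal entries."""
    return [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

random.seed(1)
rank_classical = matrix_rank(gram(design(n=50, p=3)))  # large n, small p: full rank p
rank_modern = matrix_rank(gram(design(n=5, p=20)))     # small n, large p: rank capped by n
```

In the second case the 20-by-20 Gram matrix cannot be inverted, which is one reason such problems call for additional structure such as regularization or variable selection.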
Asymptotic Analysis:
Statistical inference is a system of mathematical logic for guidance and correction. Classical inference methodologies: Bayesian, frequentist, and Fisherian.
Principles of Statistical Inference: Sufficiency principle, Conditionality principle, Likelihood principle
Point Estimation: methods of finding and evaluating estimators, UMVU estimators
Interval Estimation: Confidence Interval, Tolerance Interval
Simple linear regression, ordinary least squares (OLS) estimator
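For simple linear regression the least-squares estimates have a closed form: the slope is the ratio of the sample covariance of x and y to the sample variance of x, and the intercept makes the fitted line pass through the point of means. A minimal sketch follows; the function name `ols_simple` and the toy data are hypothetical.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for the model y = a + b * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x, so the fit should recover (1, 2).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_simple(x, y)
```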
Logistic Regression (logit)
Model Selection: information criteria.
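As a reminder of what such criteria look like, here are the two most common ones, AIC and BIC (chosen as examples; the note does not say which criteria are meant). Here \( \hat{L} \) is the maximized likelihood of the model, \( k \) is the number of estimated parameters, and \( n \) is the number of observations:

\[ \mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}. \]

Lower values are preferred. BIC penalizes each additional parameter more heavily than AIC once \( \ln n > 2 \), that is, for \( n \ge 8 \).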
See the main article about Hypothesis Testing.
Likelihood Ratio Test (LRT), Uniformly Most Powerful (UMP) Test
False discovery rate (FDR)
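One standard way to control the FDR is the Benjamini-Hochberg step-up procedure. It is named here as an example; the note does not specify a procedure. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure at level q."""
    m = len(p_values)
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max (step-up rule).
    return sorted(order[:k_max])

# Made-up p-values for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
```

Because the rule is step-up, a hypothesis whose p-value misses its own threshold is still rejected if a hypothesis with a larger p-value meets its threshold.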
Causal inference in statistics is hard.
Notes on Intuitive Biostatistics {Motulsky1995}
Table 1: Statistical Techniques
Purpose | Continuous Data | Count or Ranked Data | Arrival Time | Binary Data |
---|---|---|---|---|
(Examples) | (Height) | (Number of headaches in a week; Self-report score) | (Life expectancy of a patient; Minutes until REM sleep begins) | (Recurrence of infection) |
Describe one sample | Frequency distribution; Sample mean; Quantiles; Sample standard deviation | Frequency distribution; Quantiles | Kaplan-Meier survival curve; Median survival time; Five-year survival percentage | Proportion |
Distributional Test | Normality tests; Outlier tests | N/A | N/A | N/A |
Infer about one population | One-sample t test | Wilcoxon’s signed-rank test | Confidence bands around survival curve; CI of median survival | CI of proportion; Binomial test to compare observed distribution with a theoretical (expected) distribution |
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Log-rank test; Gehan-Breslow test; CI of ratio of median survival times; CI of hazard ratio | Fisher’s exact test |
Compare two paired groups | Paired t test | Wilcoxon’s matched-pairs test | Conditional proportional hazards regression | McNemar’s test |
Compare three or more unpaired groups | One-way ANOVA followed by multiple comparison tests | Kruskal-Wallis test; Dunn’s posttest | Log-rank test; Gehan-Breslow test | Chi-squared test (for trend) |
Compare three or more paired groups | Repeated-measures ANOVA followed by multiple comparison tests | Friedman’s test; Dunn’s posttest | Conditional proportional hazards regression | Cochran’s Q |
Quantify association between two variables | Pearson’s correlation | Spearman’s correlation | N/A | N/A |
Predict one variable from one or several others | Linear/nonlinear regression | N/A | Cox’s proportional hazards regression | Logistic regression |