Statistics

Statistics has two aspects: algorithms and inference (justification).

Core Concepts

Model

In statistics, a model is a (joint) probability distribution.

Parametric and nonparametric estimators do not differ in any essential way, and neither is inherently superior to the other. Both take a random sample as their sole input, and both work over a collection of models. A parametric estimator, such as the maximum likelihood estimator (MLE), is an algorithm that selects a unique probability model from a subspace of the space of probability models, where the subspace is indexed by a model parameter. A nonparametric method is an algorithm that selects a unique probability model from another subspace of that space, one that simply has no such index.
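
As a rough illustration of this point, the sketch below fits the same sample in two ways: a Gaussian MLE (a model picked from a family indexed by mean and standard deviation) and the empirical CDF (a model picked without any parameter index). The function names, the simulated sample, and the choice of the empirical CDF as the nonparametric method are assumptions made for this example only.

```python
import math
import random
from bisect import bisect_right


def gaussian_mle(sample):
    """Parametric: select one Gaussian, indexed by (mu, sigma)."""
    n = len(sample)
    mu = sum(sample) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    return mu, sigma


def gaussian_cdf(x, mu, sigma):
    """CDF of the selected Gaussian model."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def empirical_cdf(sample):
    """Nonparametric: select the empirical distribution of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf


# Simulated data (an assumption of this sketch, not real measurements).
random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(500)]

mu_hat, sigma_hat = gaussian_mle(sample)
ecdf = empirical_cdf(sample)

for x in (8.0, 10.0, 12.0):
    print(x, round(gaussian_cdf(x, mu_hat, sigma_hat), 3), round(ecdf(x), 3))
```

Both procedures take only the sample as input and return a single distribution; they differ only in which subspace of models they search.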

Generally, nonparametric methods are non-mechanistic: they are purely statistical in nature.

Sample

Random Sample: Statistic, Sampling Distributions from Gaussian Population, Order Statistic.

Notation: Use \( X \) to denote random variables; use \( x \) to denote samples.

Traditional statistics assumes "large n, small p" (n for the observations, p for the parameters measured). In modern statistics, the typical problem is instead "small n, large p".
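
One consequence of "small n, large p" can be seen in a toy linear model: with fewer observations than coefficients, several different coefficient vectors reproduce the data exactly, so the data alone cannot single one out. The sketch below uses a made-up design with n = 2 and p = 3; all names and numbers are illustrative assumptions.

```python
def predict(rows, beta):
    """Linear predictions: one dot product per observation."""
    return [sum(x * b for x, b in zip(row, beta)) for row in rows]


# Toy data (hypothetical): n = 2 observations, p = 3 coefficients.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

# Two different coefficient vectors.
beta_a = [1.0, 2.0, 0.0]
beta_b = [0.0, 1.0, 1.0]

fits_a = predict(X, beta_a)
fits_b = predict(X, beta_b)

# Both reproduce y exactly, so least squares has no unique solution here.
print(fits_a, fits_b)
```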

Asymptotic Analysis:

Statistical Inference

Statistical inference is a system of mathematical logic for guiding and correcting algorithms. Classical inference methodologies: Bayesian, frequentist, and Fisherian.

Principles of Statistical Inference: Sufficiency principle, Conditionality principle, Likelihood principle

Estimation

Point Estimation: methods of finding and evaluating estimators, UMVU estimators

Interval Estimation: Confidence Interval, Tolerance Interval
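
As a minimal sketch of a confidence interval, the code below builds a large-sample normal-approximation interval for a population mean and checks its coverage by simulation. The function name, the simulated Gaussian population, and the use of a normal quantile rather than a Student t quantile are assumptions of this example; tolerance intervals are not covered here.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def normal_ci_for_mean(sample, level=0.95):
    """Approximate CI for the mean, using a standard normal quantile."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(sample) / math.sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width


# Coverage check on simulated data (hypothetical population parameters).
random.seed(42)
true_mean = 5.0
trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, 2.0) for _ in range(50)]
    low, high = normal_ci_for_mean(sample)
    if low <= true_mean <= high:
        hits += 1

coverage = hits / trials
print(coverage)
```

The printed coverage should land near the nominal level; the interval is a statement about the procedure over repeated samples, not about any single interval.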

Regression

Simple linear regression, ordinary least squares (OLS) estimator
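
For simple linear regression, the least squares estimates have a closed form: the slope is the ratio of the sample covariance to the sample variance of the predictor, and the intercept makes the line pass through the point of means. The sketch below implements that formula; the function name and the toy data are assumptions for illustration.

```python
def simple_ols(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = simple_ols(xs, ys)
print(intercept, slope)
```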

Logistic Regression (logit)
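
The logit link maps a probability to its log-odds, and its inverse maps a linear predictor back to a probability. The sketch below shows only this link, not a fitting procedure; the function names and the coefficient values are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """P(y = 1 | x) under a one-predictor logistic model."""
    return inverse_logit(intercept + slope * x)


# Hypothetical coefficients, chosen so the linear predictor is 0 at x = 2.
intercept, slope = -1.0, 0.5
p = predict_probability(2.0, intercept, slope)
print(p)
```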

Model Selection: information criteria.
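
Two common information criteria are AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the maximized likelihood, k the number of estimated parameters, and n the sample size; lower values are preferred. The sketch below compares two nested Gaussian models on simulated data. The model definitions, the sample, and all names are assumptions of this example.

```python
import math
import random


def gaussian_log_likelihood(sample, mu, sigma):
    """Log-likelihood of an i.i.d. Gaussian sample."""
    n = len(sample)
    squared_error = sum((x - mu) ** 2 for x in sample)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_error / (2.0 * sigma ** 2))


def aic(log_lik, k):
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    return k * math.log(n) - 2.0 * log_lik


# Simulated data with a nonzero mean (hypothetical).
random.seed(1)
sample = [random.gauss(1.0, 1.0) for _ in range(200)]
n = len(sample)

# Model A: mean fixed at 0, only sigma estimated (k = 1).
sigma_a = math.sqrt(sum(x ** 2 for x in sample) / n)
ll_a = gaussian_log_likelihood(sample, 0.0, sigma_a)

# Model B: mean and sigma both estimated by maximum likelihood (k = 2).
mu_b = sum(sample) / n
sigma_b = math.sqrt(sum((x - mu_b) ** 2 for x in sample) / n)
ll_b = gaussian_log_likelihood(sample, mu_b, sigma_b)

print("AIC:", aic(ll_a, 1), aic(ll_b, 2))
print("BIC:", bic(ll_a, 1, n), bic(ll_b, 2, n))
```

Because the simulated mean is far from 0, the extra parameter in model B should pay for itself under both criteria.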

Hypothesis Testing

See the main article about Hypothesis Testing.

Likelihood Ratio Test (LRT), Uniformly Most Powerful (UMP) Test

False discovery rate (FDR)
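
One widely used way to control the FDR is the Benjamini-Hochberg step-up procedure: sort the m p-values, find the largest rank k with p_(k) <= (k / m) q, and reject the k hypotheses with the smallest p-values. The sketch below is a minimal version of that procedure; the function name and the p-values are made up for illustration, and the procedure is named here as an example, not as the method this page prescribes.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank whose p-value is under its threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Step-up: reject everything up to and including that rank.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
print(rejected)
```

Note the step-up behavior: a p-value that misses its own threshold is still rejected if a larger p-value meets its threshold.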

Causal Inference

Causal inference in statistics is hard.

Reference

Notes on Intuitive Biostatistics {Motulsky1995}

Table 1: Statistical Techniques

Purpose	Continuous Data	Count or Ranked Data	Arrival Time	Binary Data
(Examples)	(Height)	(Number of headaches in a week; Self-report score)	(Life expectancy of a patient; Minutes until REM sleep begins)	(Recurrence of infection)
Describe one sample	Frequency distribution; Sample mean; Quantiles; Sample standard deviation	Frequency distribution; Quantiles	Kaplan-Meier survival curve; Median survival time; Five-year survival percentage	Proportion
Distributional Test	Normality tests; Outlier tests	N/A	N/A	N/A
Infer about one population	One-sample t test	Wilcoxon’s signed-rank test	Confidence bands around survival curve; CI of median survival	CI of proportion; Binomial test to compare observed distribution with a theoretical (expected) distribution
Compare two unpaired groups	Unpaired t test	Mann-Whitney test	Log-rank test; Gehan-Breslow test; CI of ratio of median survival times; CI of hazard ratio	Fisher’s exact test
Compare two paired groups	Paired t test	Wilcoxon’s matched-pairs test	Conditional proportional hazards regression	McNemar’s test
Compare three or more unpaired groups	One-way ANOVA followed by multiple comparison tests	Kruskal-Wallis test; Dunn’s posttest	Log-rank test; Gehan-Breslow test	Chi-squared test (for trend)
Compare three or more paired groups	Repeated-measures ANOVA followed by multiple comparison tests	Friedman’s test; Dunn’s posttest	Conditional proportional hazards regression	Cochran’s Q
Quantify association between two variables	Pearson’s correlation	Spearman’s correlation	N/A	N/A
Predict one variable from one or several others	Linear/nonlinear regression	N/A	Cox’s proportional hazards regression	Logistic regression
