Statistics has two aspects: algorithms and inference.

Statistical inference is a system of mathematical logic for guidance and correction (or justification). The classical inference methodologies are frequentist, Bayesian, and Fisherian.

Principles of Statistical Inference: Sufficiency principle, Conditionality principle, Likelihood principle.

Core Concepts

The fundamental construct in probability is the random variable; the fundamental construct in statistics is the random sample.

Model

In statistics, a model is a probability distribution of one or more variables. Examples include univariate models and regression models.

Parametric and nonparametric methods do not differ in essence, and neither is superior to the other: both are collections of models, and in the frequentist setting both take a random sample as the sole input for estimation. Parametric methods are algorithms that select a unique probability model from a subspace of probability models indexed by a model parameter. Nonparametric methods are algorithms that select a unique probability model from another subspace of probability models, one without such an index. Generally, nonparametric methods are non-mechanistic methods, which are statistical in essence.
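As a rough illustration of the contrast above, the sketch below estimates \(P(X \le t)\) from the same sample in two ways. The parametric route assumes a normal family indexed by \((\mu, \sigma)\) and plugs in the maximum-likelihood estimates; the nonparametric route uses the empirical distribution function. The function names, the choice of the normal family, and the data are assumptions made for this example only.

```python
import math
import statistics


def normal_cdf_estimate(sample, t):
    """Parametric: select one normal model via the MLE of (mu, sigma)."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)  # the MLE uses the 1/n variance
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def ecdf_estimate(sample, t):
    """Nonparametric: empirical distribution function, no parameter index."""
    return sum(x <= t for x in sample) / len(sample)


# Hypothetical data, for illustration only.
sample = [4.1, 5.0, 5.3, 5.9, 6.2, 6.8, 7.4, 8.0]
p_parametric = normal_cdf_estimate(sample, 6.0)
p_nonparametric = ecdf_estimate(sample, 6.0)
```

Both functions take only the sample as input and return a single fitted probability; they differ only in the space of models searched.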

Sample

A random sample is a sampling process from a hypothetical population. Traditional statistics assumes "large n, small p", where \(n\) is the number of observations and \(p\) is the number of parameters measured. In modern statistics, the problem is typically "small n, large p".

Estimation

Point Estimation: methods of finding and evaluating estimators; UMVU estimators.

Interval Estimation: confidence interval, tolerance interval.
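A minimal sketch of a confidence interval for a population mean follows. It uses the large-sample normal approximation rather than the exact t interval, which is an assumption made here to keep the example within the Python standard library; the function name and data are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, level=0.95):
    """Approximate CI for the mean: mean +/- z * standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # 1/(n-1) variance
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * std_err, mean + z * std_err


# Hypothetical data, for illustration only.
sample = [4.1, 5.0, 5.3, 5.9, 6.2, 6.8, 7.4, 8.0]
low, high = mean_confidence_interval(sample)
```

For a sample this small, the t quantile would give a somewhat wider interval than the normal quantile used here.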

Regression: least squares, lasso, ridge.
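To make the three regression estimators concrete, here is a sketch for the simplest possible case: one predictor and no intercept, where each estimator has a closed form. The objectives are assumed to be \(\sum_i (y_i - x_i b)^2\) for least squares, with an added penalty \(\lambda b^2\) for ridge and \(\lambda |b|\) for lasso; the function names and data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_one_predictor(x, y, lam):
    """Closed-form slope estimates for a single predictor, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return {
        "ols": sxy / sxx,                             # least squares
        "ridge": sxy / (sxx + lam),                   # shrinks proportionally
        "lasso": soft_threshold(sxy, lam / 2) / sxx,  # shrinks, can hit zero
    }


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
fits = fit_one_predictor(x, y, lam=10.0)
```

Ridge scales the least-squares slope toward zero without reaching it, while lasso subtracts a fixed amount and sets the slope exactly to zero once the penalty is large enough.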

Hypothesis Testing

Likelihood Ratio Test (LRT), Uniformly Most Powerful (UMP) Test
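For reference, the usual textbook form of the likelihood ratio statistic is given here. The notation is an assumption, since these notes do not fix one: \(L(\theta \mid \mathbf{x})\) is the likelihood, \(\Theta\) the full parameter space, and \(\Theta_0 \subset \Theta\) the null hypothesis.

\[
\lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid \mathbf{x})}{\sup_{\theta \in \Theta} L(\theta \mid \mathbf{x})}, \qquad 0 \le \lambda(\mathbf{x}) \le 1 .
\]

The LRT rejects \(H_0: \theta \in \Theta_0\) when \(\lambda(\mathbf{x}) \le c\), with the cutoff \(c\) chosen to give the desired test size.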

False discovery rate (FDR)
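One common way to control the FDR is the Benjamini-Hochberg step-up procedure, sketched below. The notes do not name a specific procedure, so this choice, the function name, and the p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    # Reject the k smallest p-values, even those above their own threshold.
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, q=0.05)
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as a larger-ranked p-value passes.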

Miscellaneous Topics

Asymptotic Analysis:

Statistical learning is the attempt to explain techniques of learning from data in a statistical framework.

prediction, explanation

Before Fisher, statisticians didn’t really understand estimation. The same can be said now about prediction. {CASI2017}

Reference

  1. Statistical Tables: Normal, t, F, Chi-Squared
  2. Statistical Tables: Binomial
  3. Statistical Tables: Poisson
  4. Univariate Distribution Relationships

Notes on Intuitive Biostatistics {Motulsky1995}

Table 1: Statistical Techniques

| Purpose | Continuous Data | Count or Ranked Data | Arrival Time | Binary Data |
|---|---|---|---|---|
| (Examples) | (Height) | (Number of headaches in a week; Self-report score) | (Life expectancy of a patient; Minutes until REM sleep begins) | (Recurrence of infection) |
| Describe one sample | Frequency distribution; Sample mean; Quantiles; Sample standard deviation | Frequency distribution; Quantiles | Kaplan-Meier survival curve; Median survival time; Five-year survival percentage | Proportion |
| Distributional test | Normality tests; Outlier tests | N/A | N/A | N/A |
| Infer about one population | One-sample t test | Wilcoxon’s signed-rank test | Confidence bands around survival curve; CI of median survival | CI of proportion; Binomial test to compare observed distribution with a theoretical (expected) distribution |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Log-rank test; Gehan-Breslow test; CI of ratio of median survival times; CI of hazard ratio | Fisher’s exact test |
| Compare two paired groups | Paired t test | Wilcoxon’s matched pairs test | Conditional proportional hazards regression | McNemar’s test |
| Compare three or more unpaired groups | One-way ANOVA followed by multiple comparison tests | Kruskal-Wallis test; Dunn’s posttest | Log-rank test; Gehan-Breslow test | Chi-squared test (for trend) |
| Compare three or more paired groups | Repeated-measures ANOVA followed by multiple comparison tests | Friedman’s test; Dunn’s posttest | Conditional proportional hazards regression | Cochran’s Q |
| Quantify association between two variables | Pearson’s correlation | Spearman’s correlation | N/A | N/A |
| Predict one variable from one or several others | Linear/nonlinear regression | N/A | Cox’s proportional hazards regression | Logistic regression |

🏷 Category=Statistics