Statistics is not intuitive.
People frequently see patterns in random data and often jump to unwarranted conclusions.
Decisions about how to analyze data should be made in advance.
Beware of "HARKing": hypothesizing after results are known.
Statistical inference lets you make general conclusions from limited data. Statistical conclusions are always presented in terms of probability.
A confidence interval quantifies precision, and is easy to interpret.
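As a rough illustration of how a confidence interval quantifies precision, the sketch below computes a normal-approximation interval for a sample mean. The function name and the measurements are hypothetical, and for a sample this small a t-based interval would usually be preferred; the normal quantile is used only to keep the example within the standard library.

```python
import math
import statistics

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

# Hypothetical measurements, made up for illustration only.
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.1, 4.9]
low, high = mean_confidence_interval(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A narrower interval means a more precise estimate; asking for a higher confidence level widens the interval.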
All statistical tests are based on assumptions.
If your data are not representative of a larger set of data you could have collected (but did not), then statistical inference makes no sense.
A p-value tests a null hypothesis, and is hard to understand at first. "Statistically significant" does not mean the effect is large or scientifically important. "Not significantly different" does not mean the effect is absent, small or scientifically irrelevant.
The concept of statistical significance is designed to help make a decision based on one result.
If a difference is not statistically significant, you can conclude that the observed results are not inconsistent with the null hypothesis. You cannot conclude that the null hypothesis is true.
Multiple comparisons (tests) make it hard to interpret statistical results.
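One way to see why multiple comparisons complicate interpretation is to count how often at least one test comes out "significant" when every null hypothesis is true. The sketch below assumes independent tests and uses the fact that a p-value from a continuous test statistic is uniformly distributed under a true null; the function names are illustrative.

```python
import random

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests

def simulate_familywise_error(n_tests, alpha=0.05, n_experiments=20_000, seed=0):
    """Monte Carlo check: under a true null, a p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < alpha:  # at least one "significant" result
            hits += 1
    return hits / n_experiments

for k in (1, 5, 20):
    exact = familywise_error_rate(k)
    simulated = simulate_familywise_error(k)
    print(f"{k:>2} tests: exact = {exact:.3f}, simulated = {simulated:.3f}")
```

Under these assumptions, with 20 tests the chance of at least one false positive is roughly 64%, even though each individual test keeps its 5% error rate.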
Correlation does not mean causation.
Published statistics tend to be optimistic.
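One commonly cited mechanism for this optimism is selective reporting: if only statistically significant results get published, the published effect sizes overestimate the true effect. The simulation below is a minimal sketch under that assumption; the true effect, the standard error, and the "publish only if significant" filter are all hypothetical choices.

```python
import random
from statistics import fmean

def simulate_published_effects(true_effect=0.2, standard_error=0.15,
                               n_studies=10_000, z_crit=1.96, seed=1):
    """Return (all estimates, estimates that pass the significance filter)."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, standard_error) for _ in range(n_studies)]
    # Hypothetical filter: only positive, significant results get "published".
    published = [e for e in estimates if e / standard_error > z_crit]
    return estimates, published

estimates, published = simulate_published_effects()
print("true effect            : 0.200")
print(f"mean of all studies    : {fmean(estimates):.3f}")
print(f"mean of published ones : {fmean(published):.3f}")
print(f"share published        : {len(published) / len(estimates):.1%}")
```

Every published estimate must exceed 1.96 standard errors, so in this setup the published average is necessarily larger than the true effect, while the average over all studies stays close to it.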
Elements of experimental design:
In mathematical statistics, design of experiments (DOE) [@Fisher1935] deals with the optimal configuration of the variables used in an experiment that is subject to measurement error. Example designs: factorial (all possible combinations of factor levels), fractional factorial, one-factor-at-a-time; block (e.g. Latin square, Latin hypercube, orthogonal array); response surface (optimal design for regression models).
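A full factorial design simply enumerates every combination of factor levels. The sketch below shows one way to generate such a run list; the factor names and levels are hypothetical and chosen only for illustration.

```python
from itertools import product

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "temperature": [20, 40],
    "catalyst": ["A", "B", "C"],
    "stirring": ["slow", "fast"],
}
runs = full_factorial(factors)
print(len(runs))  # 2 * 3 * 2 = 12 runs
for run in runs[:3]:
    print(run)
```

The number of runs is the product of the numbers of levels, which is why fractional factorial designs become attractive as factors are added.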
Some principles in design of experiments:
(Factors are the independent variables.) (Nuisance factors are those that may affect the measured result, but are not of primary interest.)
Compared to most experiments, observational studies often require more complicated analyses and yield less certain results.
Observational studies can be useful but are rarely definitive.
Think of statistical significance (and the p-value) as the resolution of your observation of an uncertain quantity. It only indicates whether your sample size is large enough to distinguish an effect. For the effect to count as nontrivial, the effect size (the size of the difference, association, or correlation) must be compared to some predetermined reference value.
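To make the "resolution" analogy concrete, the sketch below holds a small observed difference fixed and varies only the sample size. It assumes a two-sample z-test with a known common standard deviation, which is a simplification; the effect size of 0.05 SD is a hypothetical value.

```python
import math
from statistics import NormalDist

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

effect = 0.05  # hypothetical observed difference, in units of one SD
for n in (100, 1_000, 10_000, 100_000):
    p = two_sample_z_p_value(effect, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>7}: p = {p:.4f}")
```

The same tiny difference moves from clearly non-significant to highly significant as the sample grows, so the p-value alone says nothing about whether the effect is large enough to matter.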
The null hypothesis is typically the one to be rejected, and hypothesis testing is useful only in this way.
Conclusions cannot be made on individuals when data are at group-level.
Classic regression, though, is concerned only with the conditional expectation function.
Inference about the difference between two differences needs to be based on a single test on that exact quantity, not tests on the component differences.
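The sketch below illustrates the point with made-up numbers: one effect is significant on its own, the other is not, yet a direct test of their difference is not significant. It assumes independent, approximately normal estimates with known standard errors; all values are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p(estimate, standard_error):
    """Two-sided p-value for a normal estimate against a null of zero."""
    z = estimate / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical effect estimates (e.g. two before/after differences).
effect_a, se_a = 25.0, 10.0
effect_b, se_b = 10.0, 10.0

p_a = two_sided_p(effect_a, se_a)  # component test on A
p_b = two_sided_p(effect_b, se_b)  # component test on B

# The quantity of interest: the difference between the two differences.
diff = effect_a - effect_b
se_diff = math.sqrt(se_a**2 + se_b**2)  # assumes independent estimates
p_diff = two_sided_p(diff, se_diff)

print(f"A alone:  p = {p_a:.3f}")
print(f"B alone:  p = {p_b:.3f}")
print(f"A minus B: p = {p_diff:.3f}")
```

Concluding that A and B differ because one component test is significant and the other is not would be unwarranted here; only the test on `diff` addresses that question.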
Two populations are distinct if they differ in population-level attributes, or in unit-level attributes that are homogeneous within each population, or if different mechanisms generate their observed attributes. Combining samples from such populations may confound population-specific trends or mechanisms.
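A small constructed example shows how pooling can confound population-specific trends. In the sketch below, each hypothetical population has a slope of +1, yet the pooled sample has a negative slope because the populations differ in their baseline levels. The data are made up, and `ols_slope` is an illustrative helper for the ordinary least-squares slope.

```python
from statistics import fmean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical population 1: low x, high baseline, y = x + 10.
x1 = [1, 2, 3, 4, 5]
y1 = [x + 10 for x in x1]

# Hypothetical population 2: high x, low baseline, y = x - 5.
x2 = [6, 7, 8, 9, 10]
y2 = [x - 5 for x in x2]

slope_1 = ols_slope(x1, y1)
slope_2 = ols_slope(x2, y2)
slope_pooled = ols_slope(x1 + x2, y1 + y2)

print(f"population 1 slope: {slope_1:+.2f}")
print(f"population 2 slope: {slope_2:+.2f}")
print(f"pooled slope:       {slope_pooled:+.2f}")
```

The pooled trend reflects the gap between the populations rather than the relationship within either one.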