[@Kolmogorov1956] laid the foundation of axiomatic probability theory.

Probability Space

The sample space $\Omega$ is the set of elementary events (outcomes). A sigma-algebra $\Sigma$ on the sample space consists of the events, i.e. subsets of $\Omega$, that can be measured using some instrument; it contains $\Omega$ and is closed under complements and countable unions. A probability measure $P$ on a measurable space $(\Omega, \Sigma)$ is a normalized finite measure: $P(\Omega) = 1$. Any nonzero finite measure $\mu$ can be normalized into a probability measure by dividing by its total mass: $P(A) = \mu(A) / \mu(\Omega)$. A probability space $(\Omega, \Sigma, P)$ is a sample space $\Omega$ equipped with a probability measure $P$ on its measurable sets $\Sigma$.
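
As a minimal sketch of normalization, assume a finite sample space where the measure is given by a nonnegative weight per outcome and every subset is measurable. The names `normalize`, `prob`, and the example weights are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction


def normalize(measure):
    """Rescale a finite measure (outcome -> mass) so that its total mass is 1."""
    total = sum(measure.values())
    if total <= 0:
        raise ValueError("the measure must have positive total mass")
    return {outcome: Fraction(mass) / total for outcome, mass in measure.items()}


def prob(p, event):
    """Probability of an event, given as an iterable of outcomes."""
    return sum(p[outcome] for outcome in event)


# Hypothetical finite measure on a three-point sample space.
weights = {"a": 2, "b": 3, "c": 5}
P = normalize(weights)

print(prob(P, weights))       # mass of the whole sample space
print(prob(P, {"a", "b"}))    # mass of the event {a, b}
```

Exact rational arithmetic is used here so that the normalization condition $P(\Omega) = 1$ holds without floating-point error.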

Joint Probability

The joint probability $P(A \cap B)$ of two events is the probability that both occur. Total probability theorem (aka law of total probability): the probability of any event equals the sum of its joint probabilities with the members of any partition mod 0 of the sample space, i.e. a family of pairwise disjoint events whose union has probability one. If $P(\sqcup_{i=1}^n A_i) = 1$, then $\forall B \in \Sigma$, $P(B) = \sum_{i=1}^n P(B \cap A_i)$.
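
The total probability theorem can be checked numerically on a small example. The sketch below assumes a fair six-sided die with the uniform measure; the partition and the event $B$ (an even roll) are chosen only for illustration.

```python
from fractions import Fraction

# Assumption: a fair die, so every outcome has probability 1/6.
omega = frozenset(range(1, 7))


def P(event):
    """Uniform probability of an event (a set of outcomes)."""
    return Fraction(len(event & omega), len(omega))


# Pairwise disjoint events whose union is the whole sample space.
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
B = frozenset({2, 4, 6})  # the roll is even

# Sum of the joint probabilities P(B ∩ A_i) over the partition.
total = sum(P(B & A) for A in partition)

print(total, P(B))
```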

The marginal probability $P(A)$ of an event refers to the probability of the event in the context of a collection of other events that partition the sample space mod 0. By the total probability theorem, marginal probability can be computed as: $P(A_i) = \sum_{j=1}^m P(A_i \cap B_j)$, where $P(\sqcup_{i=1}^n A_i) = 1$ and $P(\sqcup_{j=1}^m B_j) = 1$. The joint probability of two independent events equals the product of their marginal probabilities: $P(A_i \cap B_j) = P(A_i) P(B_j)$.
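
One way to picture marginalization is a joint probability table indexed by $(i, j)$, whose row and column sums give the marginals. The sketch below assumes such a table stored as a dictionary; `marginals`, `is_independent`, and both example tables are hypothetical.

```python
from fractions import Fraction as F


def marginals(joint):
    """Sum a joint table {(i, j): P(A_i ∩ B_j)} over each index."""
    pa, pb = {}, {}
    for (i, j), p in joint.items():
        pa[i] = pa.get(i, 0) + p
        pb[j] = pb.get(j, 0) + p
    return pa, pb


def is_independent(joint):
    """Check P(A_i ∩ B_j) == P(A_i) P(B_j) for every pair (i, j)."""
    pa, pb = marginals(joint)
    return all(joint.get((i, j), 0) == pa[i] * pb[j] for i in pa for j in pb)


# A table that factorizes into its marginals.
independent = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 8), (1, 1): F(3, 8)}
# A table where the two indices always agree, so it does not factorize.
dependent = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(0), (1, 1): F(1, 2)}

print(marginals(independent))
print(is_independent(independent), is_independent(dependent))
```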

Conditional Probability

The conditional probability $P(B \mid A)$ of an event given another event with $P(A) > 0$ is the probability of the event in the probability space restricted to the other event: $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$. From the definition of conditional probability, we can compute the joint probability as the product of the marginal probability of one event and the conditional probability of the other event given the former: $P(A \cap B) = P(A) P(B \mid A)$. The chain rule of probability, or general product rule of probability, generalizes this equality to the joint probability of multiple events, provided every conditioning event has positive probability: $P(\cap_{i=1}^n A_i) = P(A_1) \prod_{i=2}^n P(A_i \mid \cap_{j=1}^{i-1} A_j)$.
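
As an illustration of the chain rule, consider drawing three cards without replacement from a standard 52-card deck and asking for the probability that all three are aces. The sketch below compares the chain-rule product with a brute-force count over all ordered draws; labeling the aces as cards 0 to 3 is an arbitrary assumption.

```python
from fractions import Fraction
from itertools import permutations

deck = range(52)
aces = set(range(4))  # assumption: cards 0..3 stand for the four aces

# Chain rule: P(A1) * P(A2 | A1) * P(A3 | A1 ∩ A2),
# where A_i is the event that the i-th card drawn is an ace.
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct computation: count ordered three-card draws consisting only of aces.
draws = list(permutations(deck, 3))
favorable = sum(all(card in aces for card in draw) for draw in draws)
direct = Fraction(favorable, len(draws))

print(chain, direct)
```

Each factor in `chain` is a conditional probability on a deck that has already lost the previously drawn aces, which is what the restriction of the probability space amounts to in this example.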

Bayes' Theorem

Bayes' theorem, Bayes' law, or Bayes' rule combines the total probability theorem and the chain rule of probability: if $P(\sqcup_{i=1}^n A_i) = 1$, then $\forall B \in \Sigma$ with $P(B) > 0$, $P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^n P(B \mid A_j) P(A_j)}$.
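
A small numerical sketch of Bayes' theorem follows. It assumes a hypothetical factory with three machines that partition production (the events $A_i$) and an event $B$ that an item is defective; the function name `posterior` and all numbers are invented for illustration.

```python
from fractions import Fraction as F


def posterior(priors, likelihoods):
    """Return P(A_i | B) from the priors P(A_i) and likelihoods P(B | A_i)."""
    joint = {k: likelihoods[k] * priors[k] for k in priors}  # P(B ∩ A_i)
    evidence = sum(joint.values())  # P(B), by the total probability theorem
    if evidence == 0:
        raise ValueError("P(B) must be positive")
    return {k: p / evidence for k, p in joint.items()}


# Hypothetical production shares P(A_i) and defect rates P(B | A_i).
priors = {"m1": F(1, 2), "m2": F(3, 10), "m3": F(1, 5)}
likelihoods = {"m1": F(1, 100), "m2": F(2, 100), "m3": F(3, 100)}

post = posterior(priors, likelihoods)
print(post)
```

The denominator is computed once and shared by all posteriors, which also guarantees that they sum to one.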