[@Kolmogorov1956] laid the foundation of axiomatic probability theory.

## Probability Space

**Sample space** $\Omega$ is a set of elementary events.
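A **sigma-algebra** $\Sigma$ on a sample space is a collection of subsets of $\Omega$, called events,
that contains $\Omega$ and is closed under complement and countable union;
informally, it consists of all the events that can be measured using some instrument.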
**Probability measure** $P$ on a measurable space $(\Omega, \Sigma)$
is a normalized finite measure: $P(\Omega) = 1$.
Any nonzero finite measure $\mu$ can be normalized into a probability measure: $P(A) = \mu(A) / \mu(\Omega)$.
**Probability space** $(\Omega, \Sigma, P)$
is a sample space $\Omega$ equipped with a probability measure $P$ on its measurable sets $\Sigma$.
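
As an illustration of these definitions, here is a minimal sketch of a finite probability space in Python. The weighted die, the names `weights`, `measure`, and `prob`, and the choice of the power set as the sigma-algebra are all assumptions made for this example, not part of the definitions above.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a six-sided die with unnormalized weights.
omega = frozenset(range(1, 7))
weights = {1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}  # a finite measure on singletons

def measure(event):
    """Finite measure of an event (a subset of omega)."""
    return sum(weights[w] for w in event)

def prob(event):
    """Normalized measure: divide by the total mass so that P(omega) = 1."""
    return Fraction(measure(event), measure(omega))

# On a finite sample space the power set is a valid sigma-algebra.
sigma = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

even, low = frozenset({2, 4, 6}), frozenset({1, 3})  # two disjoint events
print(prob(omega))                                   # 1
print(prob(even | low) == prob(even) + prob(low))    # True (additivity)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the normalization and additivity checks are not blurred by floating-point rounding.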

## Joint Probability

**Joint probability** $P(A \cap B)$ of two events is the probability that both occur simultaneously.
**Total probability theorem** (aka **law of total probability**):
the probability of any event equals the sum of the joint probabilities of
the event and each member of any partition mod 0 of the sample space:
if $P(\sqcup_{i=1}^n A_i) = 1$, then $\forall B \in \Sigma$, $P(B) = \sum_{i=1}^n P(B \cap A_i)$.
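
The total probability theorem can be checked numerically on a small case. The sketch below assumes a fair six-sided die with the uniform measure; the partition and the event `B` are arbitrary choices for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))  # assumed: outcomes of a fair die

def prob(event):
    """Uniform probability of an event on omega."""
    return Fraction(len(event & omega), len(omega))

partition = [{1, 2}, {3, 4}, {5, 6}]  # disjoint sets whose union is omega
B = {2, 3, 5}                         # the event "roll is prime"

# Sum of the joint probabilities P(B ∩ A_i) over the partition.
total = sum(prob(B & A) for A in partition)
print(total, prob(B))  # 1/2 1/2
```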

**Marginal probability** $P(A)$ of an event refers to the probability of the event
in the context of a collection of other events that partition the sample space mod 0.
By the total probability theorem, marginal probability can be computed as:
$P(A_i) = \sum_{j=1}^m P(A_i \cap B_j)$,
where $P(\sqcup_{i=1}^n A_i) = 1$ and $P(\sqcup_{j=1}^m B_j) = 1$.
The joint probability of two independent events equals the product of their marginal probabilities:
$P(A_i \cap B_j) = P(A_i) P(B_j)$.
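
To make marginalization concrete, the following sketch sums a joint probability table along one axis and then tests the product rule for independence. The table values and the labels (`rain`, `late`, and so on) are invented for illustration; the helper `marginal` is a hypothetical name.

```python
from fractions import Fraction as F

# Hypothetical joint table P(A_i ∩ B_j) over two partitions.
joint = {
    ("rain", "late"): F(1, 12), ("rain", "on_time"): F(2, 12),
    ("dry", "late"): F(3, 12),  ("dry", "on_time"): F(6, 12),
}

def marginal(joint, axis):
    """Sum the joint probabilities over the other partition."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

p_a = marginal(joint, 0)  # rain: 1/4, dry: 3/4
p_b = marginal(joint, 1)  # late: 1/3, on_time: 2/3

# Every pair (A_i, B_j) is independent iff each cell equals the product of marginals.
independent = all(joint[a, b] == p_a[a] * p_b[b] for a in p_a for b in p_b)
print(independent)  # True
```

Changing a single cell of the table (and rebalancing so the total stays 1) would generally make `independent` false, since the product rule must hold in every cell.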

## Conditional Probability

**Conditional probability** $P(B \mid A)$ of an event given another event
is the probability of the event in the probability space restricted to the other event:
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, defined when $P(A) > 0$.
From the definition of conditional probability,
we can compute the joint probability as the product of the marginal probability of one event
and the conditional probability of the other event given the former:
$P(A \cap B) = P(A) P(B \mid A)$.
**Chain rule of probability** or **general product rule of probability**
generalizes this equality to the joint probability of multiple events:
$P(\cap_{i=1}^n A_i) = P(A_1) \prod_{i=2}^n P(A_i \mid \cap_{j=1}^{i-1} A_j)$.
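
The chain rule can be verified by brute-force enumeration. This sketch assumes an urn with three red and two blue balls, three draws without replacement, and equally likely ordered outcomes; the helper names `prob`, `cond`, and `red` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

balls = ["r1", "r2", "r3", "b1", "b2"]
omega = list(permutations(balls, 3))  # 60 equally likely ordered draws

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond(event, given):
    """Conditional probability P(event | given) = P(both) / P(given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

def red(i):
    """Event: the i-th draw (0-based) is red."""
    return lambda w: w[i].startswith("r")

A1, A2, A3 = red(0), red(1), red(2)
A1_and_A2 = lambda w: A1(w) and A2(w)

lhs = prob(lambda w: A1(w) and A2(w) and A3(w))          # joint probability
rhs = prob(A1) * cond(A2, A1) * cond(A3, A1_and_A2)      # chain rule product
print(lhs, rhs)  # 1/10 1/10
```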

## Bayes' Theorem

**Bayes' theorem**, **Bayes' law**, or **Bayes' rule**
combines the total probability theorem and the chain rule of probability:
if $P(\sqcup_{i=1}^n A_i) = 1$, then $\forall B \in \Sigma$ with $P(B) > 0$,
$P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_{j=1}^n P(B \mid A_j) P(A_j)}$.
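
A short worked example may help here. The sketch below applies Bayes' theorem to a two-set partition; the diagnostic-test scenario and all numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are assumed for illustration, and `bayes` is a hypothetical helper.

```python
from fractions import Fraction as F

def bayes(prior, likelihood):
    """Posterior P(A | B) for each A in a partition.

    prior[a]      = P(A)
    likelihood[a] = P(B | A)
    """
    # Denominator: P(B) by the total probability theorem.
    evidence = sum(likelihood[a] * prior[a] for a in prior)
    return {a: likelihood[a] * prior[a] / evidence for a in prior}

prior = {"sick": F(1, 100), "healthy": F(99, 100)}
likelihood = {"sick": F(99, 100), "healthy": F(5, 100)}  # P(positive | A)

posterior = bayes(prior, likelihood)
print(posterior["sick"])  # 1/6
```

Even with a sensitive test, the posterior stays low in this example because the prior probability of the `sick` set is small relative to the false-positive mass contributed by the `healthy` set.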

🏷 Category=Probability