Although parametric probabilistic models are often treated in statistics as interchangeable options for fitting data, probability theory establishes functional relations among them.

This article demonstrates the hierarchical structure among common probabilistic models:

- The uniform distribution is set apart from all others as the reference distribution.
- The two fundamental random processes in discrete and continuous time, the Bernoulli and Poisson processes, are built upon uniform distributions, and each leads to an important group of distributions.
- Asymptotic distributions arise from limiting processes and deserve a separate section.

Many more probability distributions come from pure conjecture,
e.g. **triangular, semicircle, Pareto** and **Zipf**.
Distributions of this type are not considered here.

(TODO: Make a hierarchical diagram when have time.)

Prior to any assumption made or knowledge acquired about the underlying structure of an uncertain quantity, every possible outcome is (always) assumed to be equally likely to occur. In other words, the reference distribution of any random entity is uniform over its sample space: $P_0 = U$.

For a finite sample space $\Omega_N$ of size $N$, constant likelihood means every outcome has equal probability mass: $\forall a \in \Omega_N$, $P_0(X = a) = \frac{1}{N}$.

Define a proper subset of the sample space, $A \subsetneq \Omega$, as the event of interest.
The occurrence of the event is a **Bernoulli** random variable,
with occurrence probability $p = P(U \in A)$.

$$X = 1_{\{U \in A\}} \sim \text{Bernoulli}(p)$$
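As a minimal sketch of this construction, the snippet below draws $U$ uniformly from a finite sample space and returns the indicator of the event. The function name `bernoulli_from_uniform`, the die example, and the fixed seed are illustrative assumptions, not part of the original text.

```python
import random

def bernoulli_from_uniform(sample_space, event, rng):
    """Draw U uniformly from a finite sample space; return the indicator 1{U in A}."""
    u = rng.choice(sample_space)
    return 1 if u in event else 0

rng = random.Random(0)          # fixed seed, chosen only for reproducibility
omega = list(range(1, 7))       # hypothetical sample space: a fair die, N = 6
event_a = {1, 2}                # proper subset A, so p = |A| / N = 1/3
draws = [bernoulli_from_uniform(omega, event_a, rng) for _ in range(60_000)]
p_hat = sum(draws) / len(draws) # empirical occurrence probability
```

The empirical frequency `p_hat` should settle near $p = 1/3$ as the number of draws grows.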

Built upon the Bernoulli experiment,
a **Bernoulli process** is a discrete random process $(X_1, X_2, \cdots)$ that satisfies:

- Constant occurrence probability over trials;
- Mutual independence of occurrence among trials.

The sum of a certain number of trials in a Bernoulli process is a **binomial** random variable,
with parameters being occurrence probability $p$ and number of trials $n$.

$$K_n = \sum_{i=1}^n X_i \sim \text{Binomial}(p,n)$$
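The relation can be checked numerically. The sketch below sums independent Bernoulli trials and compares the empirical frequencies of $K_n$ with the binomial probability mass function; the helper names and the values of `p`, `n`, and `reps` are assumptions made for illustration.

```python
import random
from math import comb

def binomial_draw(p, n, rng):
    """Sum of n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def binomial_pmf(k, p, n):
    """P(K_n = k) for K_n ~ Binomial(p, n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

rng = random.Random(1)
p, n, reps = 0.3, 10, 50_000
counts = [0] * (n + 1)
for _ in range(reps):
    counts[binomial_draw(p, n, rng)] += 1

# Largest gap between empirical frequency and theoretical mass over k = 0..n
max_gap = max(abs(counts[k] / reps - binomial_pmf(k, p, n)) for k in range(n + 1))
```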

The number of trials in a Bernoulli process until the first occurrence
is a **geometric** random variable, with parameter the occurrence probability $p$.
Note that $\text{Geometric}(p) = \text{NB}(p,1)$.

$$N_1 \sim \text{Geometric}(p)$$

The number of trials in a Bernoulli process until a certain number of occurrences
is a **negative binomial** random variable, with parameters being occurrence probability $p$
and number of occurrences $k$.

$$\sum_{i=1}^{N_k} X_i = k, X_{N_k} = 1 \Rightarrow N_k \sim \text{NB}(p,k)$$
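A small simulation, under the convention used here that $N_k$ counts trials (not failures), illustrates both waiting-time variables: setting `k = 1` gives the geometric case. The function name and parameter values are hypothetical.

```python
import random

def trials_until_kth_occurrence(p, k, rng):
    """Run Bernoulli(p) trials until the k-th occurrence; return the trial count N_k."""
    trials = occurrences = 0
    while occurrences < k:
        trials += 1
        if rng.random() < p:
            occurrences += 1
    return trials

rng = random.Random(2)
p, reps = 0.25, 40_000
geometric = [trials_until_kth_occurrence(p, 1, rng) for _ in range(reps)]
neg_binomial = [trials_until_kth_occurrence(p, 3, rng) for _ in range(reps)]
mean_n1 = sum(geometric) / reps      # theory: 1 / p = 4
mean_n3 = sum(neg_binomial) / reps   # theory: 3 / p = 12
```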

If the sample space is partitioned into multiple categories $A_1, A_2, \cdots$,
the category of an outcome is a **categorical** random variable,
with parameters being the probability of each category $p_i = P(U \in A_i)$.

$$\mathbf{X} \sim \text{Categorical}(\mathbf{p})$$

The sum of a certain number of such categorical trials is a **multinomial** random variable,
with parameters being the category probabilities $\mathbf{p}$ and number of trials $n$.

$$\mathbf{K}_n = \sum_{i=1}^n \mathbf{X}_i \sim \text{Multinomial}(\mathbf{p},n)$$
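The following sketch, assuming a one-hot encoding for $\mathbf{X}$, builds a categorical draw from a single uniform number and accumulates such draws into a multinomial count vector. All names and the probability vector are illustrative.

```python
import random

def categorical_draw(probs, rng):
    """One-hot vector: locate a uniform draw within the cumulative probabilities."""
    u = rng.random()
    cumulative = 0.0
    for i, p_i in enumerate(probs):
        cumulative += p_i
        if u < cumulative:
            return [1 if j == i else 0 for j in range(len(probs))]
    # Guard against floating-point round-off in the cumulative sum
    return [0] * (len(probs) - 1) + [1]

def multinomial_draw(probs, n, rng):
    """Component-wise sum of n independent categorical draws."""
    totals = [0] * len(probs)
    for _ in range(n):
        for j, x in enumerate(categorical_draw(probs, rng)):
            totals[j] += x
    return totals

rng = random.Random(3)
probs = [0.5, 0.3, 0.2]
counts = multinomial_draw(probs, 30_000, rng)
```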

If the trials are instead drawn from a finite population without replacement,
the sum of a certain number of trials in such a "lottery process"
is a **hypergeometric** random variable, with parameters being population size $N$,
size of event of interest $M$, and number of trials $n$.

$$K_n = \sum_{i=1}^n X_i \sim \text{Hypergeometric}(N,M,n)$$
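A brief sketch of the lottery process: mark $M$ members of a population of size $N$, draw $n$ without replacement, and count the marked ones. The helper name and numbers are assumptions for illustration; the sample mean is compared against the known mean $nM/N$.

```python
import random

def hypergeometric_draw(N, M, n, rng):
    """Count marked items among n draws without replacement from a population of N."""
    population = [1] * M + [0] * (N - M)   # 1 marks the event of interest
    return sum(rng.sample(population, n))

rng = random.Random(4)
N, M, n, reps = 50, 20, 10, 40_000
samples = [hypergeometric_draw(N, M, n, rng) for _ in range(reps)]
mean_k = sum(samples) / reps               # theory: n * M / N = 4
```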

The Poisson process is the continuous-time version of the Bernoulli process: within every infinitesimal time interval, the likelihood of an occurrence is constant and independent of observations in other time intervals.

A **Poisson process** is a continuous-time random process that satisfies:

- Constant occurrence rate over time: $\lambda = \lim_{\Delta t \to 0+} \frac{P(K_{\Delta t} \geq 1)}{\Delta t}$;
- Mutual independence of occurrence within non-overlapping intervals.

For any time interval of a certain duration,
the number of occurrences in a Poisson process is a **Poisson** random variable,
with parameter the expected number of occurrences $\lambda t$.

$$K_t = \sum_{i=1}^{\infty} 1_{\{X_{(i)} \in (t_0, t_0+t)\}} \sim \text{Poisson}(\lambda t)$$
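One way to see the link to the Bernoulli process is to slice the interval into many short slots, each holding an occurrence with a small probability. The sketch below uses this discretization as an approximation (an assumption of the example, not an exact Poisson sampler) and checks that the count has mean and variance near $\lambda t$ and a zero-count frequency near $e^{-\lambda t}$.

```python
import random
from math import exp

def count_in_interval(rate, duration, slots, rng):
    """Approximate K_t by a Bernoulli process over many short time slots."""
    p_slot = rate * duration / slots   # occurrence probability per slot
    return sum(1 for _ in range(slots) if rng.random() < p_slot)

rng = random.Random(5)
rate, duration, slots, reps = 2.0, 1.5, 500, 4_000
counts = [count_in_interval(rate, duration, slots, rng) for _ in range(reps)]
mean_count = sum(counts) / reps                                        # theory: 3
var_count = sum((c - mean_count) ** 2 for c in counts) / (reps - 1)    # theory: about 3
frac_zero = counts.count(0) / reps                                     # theory: about exp(-3)
```

Increasing `slots` tightens the approximation, at the cost of more random draws per replicate.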

Conditional on the number of occurrences in a finite time interval,
the distribution of occurrences regardless of order $\mathbf{X} = \{X_1, X_2, \cdots\}$
is **uniform** over the time interval.

$$\mathbf{X} | K_t \sim U(0,t)^{K_t}$$

Denote a Poisson process as an arrival process, $\mathbf{X} = (X_{(1)}, X_{(2)}, \cdots)$.
The waiting time until the next occurrence in a Poisson process is an **exponential** random variable,
regardless of occurrences at and before the current epoch,
with parameter the occurrence rate $\lambda$.
Any interarrival time is also such a random variable.

$$X_{(1)} \sim \text{Exp}(\lambda)$$

$$X_{(n+1)} - X_{(n)} \sim \text{Exp}(\lambda)$$
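As a short worked step, assuming the observation starts at time 0: the first arrival falls after $t$ exactly when no occurrence lands in $(0, t)$, so the exponential law can be read off from the Poisson probability of a zero count.

$$P(X_{(1)} > t) = P(K_t = 0) = \frac{(\lambda t)^0}{0!} e^{-\lambda t} = e^{-\lambda t}$$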

The occurrence epoch of a given order in a Poisson process is an **Erlang** random variable,
with parameters being the ordinal number $k$ and the occurrence rate $\lambda$.
The time spanned by $k$ consecutive arrivals in a Poisson process is also such a random variable,
regardless of occurrences at and before the starting epoch.
Similarly, the sum of mutually independent exponential random variables with a common rate
is again an **Erlang** random variable, with parameters the number of summands $k$
and the inherited occurrence rate $\lambda$.

$$X_{(k)} \sim \text{Erlang}(k, \lambda)$$

$$X_{(n+k)} - X_{(n)} \sim \text{Erlang}(k, \lambda)$$

$$\sum_{i=1}^k X_{(1),i} \sim \text{Erlang}(k, \lambda)$$
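The sum representation lends itself to a quick check. The sketch below adds $k$ independent exponential draws with a common rate and compares the sample mean and variance with the Erlang values $k/\lambda$ and $k/\lambda^2$; the function name and parameters are illustrative.

```python
import random

def erlang_draw(k, rate, rng):
    """Sum of k independent Exp(rate) interarrival times."""
    return sum(rng.expovariate(rate) for _ in range(k))

rng = random.Random(6)
k, rate, reps = 4, 2.0, 40_000
samples = [erlang_draw(k, rate, rng) for _ in range(reps)]
mean_x = sum(samples) / reps                                     # theory: k / rate = 2.0
var_x = sum((x - mean_x) ** 2 for x in samples) / (reps - 1)     # theory: k / rate**2 = 1.0
```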

Substituting the factorial with the gamma function,
the ordinal number $k$ is generalized to the positive real line,
which gives a **Gamma** random variable $\Gamma(k, \lambda)$.
Its two parameters are typically called the shape parameter $\alpha$ (here $k$) and the rate parameter $\beta$ (here $\lambda$).

Conditional on the occurrence count $K_t$ in a time interval of length $t$,
the ordered occurrence epochs, rescaled by $t$ to the unit interval, are **Beta** random variables,
with parameters being the order from left $k$ and the order from right $K_t +1-k$.

$$\frac{X_{(k)}}{t} \Big| K_t \sim B(k, K_t +1-k)$$
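A sketch of this relation, taking the epochs rescaled to the unit interval: given $n$ occurrences, place them uniformly on $(0, t)$, sort them, and compare the $k$-th epoch divided by $t$ with direct Beta draws. The helper name and the values of `n`, `k`, and `t` are assumptions for illustration.

```python
import random

def kth_epoch(n, k, t, rng):
    """k-th smallest of n occurrence epochs placed uniformly on (0, t)."""
    epochs = sorted(rng.uniform(0.0, t) for _ in range(n))
    return epochs[k - 1]

rng = random.Random(7)
n, k, t, reps = 5, 2, 10.0, 40_000
scaled = [kth_epoch(n, k, t, rng) / t for _ in range(reps)]       # rescaled to (0, 1)
beta_ref = [rng.betavariate(k, n + 1 - k) for _ in range(reps)]   # direct Beta(k, n+1-k) draws
mean_scaled = sum(scaled) / reps     # theory: k / (n + 1) = 1/3
mean_beta = sum(beta_ref) / reps     # same theoretical mean
```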

The difference between two independent, identically distributed exponential random variables
is governed by a **Laplace distribution**,
as is a Brownian motion evaluated at an exponentially distributed random time.
For the difference, the location parameter is 0 and the scale parameter is the inverse of the occurrence rate $\lambda$.

$$X_{(1),i} - X_{(1),j} \sim \text{Laplace}(0,\frac{1}{\lambda})$$
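To illustrate, the sketch below forms differences of independent exponential draws and checks three Laplace$(0, b)$ properties with $b = 1/\lambda$: zero mean, mean absolute value $b$, and variance $2b^2$. The rate and sample size are arbitrary choices.

```python
import random

rng = random.Random(8)
rate, reps = 2.0, 40_000
diffs = [rng.expovariate(rate) - rng.expovariate(rate) for _ in range(reps)]

mean_d = sum(diffs) / reps                                     # theory: 0
mean_abs_d = sum(abs(d) for d in diffs) / reps                 # theory: 1 / rate = 0.5
var_d = sum((d - mean_d) ** 2 for d in diffs) / (reps - 1)     # theory: 2 / rate**2 = 0.5
```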

The (1) sum of (2) many (3) weakly correlated and (4) variance-comparable random variables
has approximately a **normal distribution**.
Conditions (3) and (4) are equivalent to the covariance matrix being close to a scalar matrix.
With $\mu_i$ the mean of $X_i$ and $\sigma$ the common standard deviation:

$$\frac{1}{\sqrt{n} \sigma} \sum_{i=1}^{n} (X_i - \mu_i) \xrightarrow{d} \text{Normal}(0, 1) \quad (n \to \infty)$$
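A minimal numerical illustration, assuming independent Uniform(0, 1) summands (the simplest case of the conditions above): the standardized sum should place about 95% of its mass within $\pm 1.96$, as a standard normal does. Names and sizes are illustrative.

```python
import random
from math import sqrt

def standardized_sum(n, rng):
    """Centered sum of n Uniform(0, 1) draws, scaled by sqrt(n) * sigma."""
    mu, sigma = 0.5, sqrt(1 / 12)      # mean and std of Uniform(0, 1)
    total = sum(rng.random() for _ in range(n))
    return (total - n * mu) / (sqrt(n) * sigma)

rng = random.Random(9)
n, reps = 30, 20_000
z = [standardized_sum(n, rng) for _ in range(reps)]
mean_z = sum(z) / reps
frac_within = sum(1 for v in z if abs(v) < 1.96) / reps   # normal reference: about 0.95
```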

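The (1) product of (2) many (3) weakly correlated and (4) variance-comparable positive random variables
has approximately a **log-normal distribution**,
because the logarithm of the product is a sum to which the normal approximation applies.
With $\mu_i$ and $\sigma$ denoting the mean and standard deviation of $\ln X_i$:

$$\left( \prod_{i=1}^{n} \frac{X_i}{e^{\mu_i}} \right)^{\frac{1}{\sqrt{n} \sigma}} \xrightarrow{d} \text{LnN}(0, 1) \quad (n \to \infty)$$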
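The product case can be sketched by working on the log scale, where the product becomes a sum. The example assumes factors uniform on $(0, 1]$, for which $\ln X$ has mean $-1$ and standard deviation $1$; the normalization (dividing by $e^{n\mu}$ and taking the $1/(\sqrt{n}\sigma)$ power) is the choice made for this illustration.

```python
import random
from math import exp, log, prod, sqrt

def normalized_product(n, rng):
    """Product of n factors in (0, 1], normalized so that its log is roughly N(0, 1)."""
    mu, sigma = -1.0, 1.0                               # mean and std of ln X for uniform X
    factors = [1.0 - rng.random() for _ in range(n)]    # values in (0, 1], never zero
    return (prod(factors) / exp(n * mu)) ** (1.0 / (sqrt(n) * sigma))

rng = random.Random(10)
n, reps = 50, 20_000
y = [normalized_product(n, rng) for _ in range(reps)]
log_y = [log(v) for v in y]                             # should look standard normal
frac_within = sum(1 for v in log_y if abs(v) < 1.96) / reps
```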

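The suitably normalized maximum of many independent, identically distributed random variables
has approximately an **extreme value distribution**.
Depending on the upper tail type (exponential, power law, bounded),
there are three kinds of extreme value distributions, respectively:

- Type I: Gumbel distribution;
- Type II: Fréchet distribution;
- Type III: Weibull distribution (in reversed form, for maxima).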

The normal/Gaussian distribution is so commonly used that many random variables derived from it are thoroughly studied. The standard Gaussian random variable is denoted as $Z$, following convention.

The sum of squares of independent standard Gaussian random variables
is a **Chi-squared** random variable, whose parameter, the degrees of freedom, is the number of summands.
As a special case, the Chi-squared distribution with 2 degrees of freedom
coincides with an exponential distribution with occurrence rate $1/2$.

$$V_n = \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$$

$$V_2 \sim \text{Exp}(1/2)$$
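The special case is easy to probe by simulation. This sketch squares and adds two standard Gaussian draws, then compares the sample mean and one tail frequency with the Exp(1/2) values; the threshold 3.0 is an arbitrary choice.

```python
import random
from math import exp

rng = random.Random(11)
reps = 40_000
v2 = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(reps)]

mean_v2 = sum(v2) / reps                             # theory: 1 / (1/2) = 2
tail_at_3 = sum(1 for v in v2 if v > 3.0) / reps     # theory: exp(-3 / 2)
```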

The square root of a Chi-squared random variable is a **Chi** random variable,
with the same degrees of freedom.
As a special case, the magnitude of a 2-dimensional standard Gaussian vector
is a **Rayleigh** random variable, with scale parameter one.

$$\sqrt{V_n} \sim \chi(n)$$

$$\sqrt{V_2} \sim \text{Rayleigh}(1)$$

The ratio of two independent standard Gaussian r.v.'s is a **Cauchy** random variable.
A Student's t r.v. with one degree of freedom is also such a random variable.
The Cauchy distribution can be extended to a location-scale family.

$$\frac{Z_1}{Z_2} \sim \text{Cauchy}(0,1)$$
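Because the Cauchy distribution has no mean, a simulation check is better based on quantiles. The sketch below takes ratios of independent standard Gaussian draws and reads off the sample quartiles, which for Cauchy(0, 1) are $-1$, $0$, and $1$; the seed and sample size are arbitrary.

```python
import random

rng = random.Random(12)
reps = 40_000
# A denominator of exactly 0.0 is possible in principle but practically never drawn
ratios = sorted(rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(reps))

q1 = ratios[reps // 4]          # theory: -1
median = ratios[reps // 2]      # theory: 0
q3 = ratios[3 * reps // 4]      # theory: +1
```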

The ratio of a standard Gaussian r.v. to the square root of an independent Chi-squared r.v.
normalized by its degrees of freedom is a **Student's t** random variable.
Its parameter, the degrees of freedom, equals the degrees of freedom of the Chi-squared r.v.

$$\frac{Z}{\sqrt{V_n / n}} \sim t(n)$$
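A sketch of the construction with $n = 10$: build $t$ draws from Gaussian draws alone and check the mass inside $\pm 2.228$, the usual two-sided 95% critical value for 10 degrees of freedom. The function name is hypothetical.

```python
import random
from math import sqrt

def student_t_draw(n, rng):
    """Z / sqrt(V_n / n), with V_n an independent sum of n squared standard Gaussians."""
    z = rng.gauss(0, 1)
    v_n = sum(rng.gauss(0, 1) ** 2 for _ in range(n))
    return z / sqrt(v_n / n)

rng = random.Random(13)
n, reps = 10, 40_000
t_samples = [student_t_draw(n, rng) for _ in range(reps)]
frac_within = sum(1 for x in t_samples if abs(x) < 2.228) / reps   # theory: about 0.95
```

A standard Gaussian would place roughly 97% of its mass in the same range, which shows the heavier tails of the t distribution.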

The ratio of two independent Chi-squared r.v.'s, normalized by their respective degrees of freedom,
is a **Fisher–Snedecor** (or simply **F**) random variable,
with parameters being the two degrees of freedom of the Chi-squared r.v.'s.

$$\frac{V_m / m}{V_n / n} \sim F(m, n)$$
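As a final sketch, the ratio can be assembled from Gaussian draws and compared against the known mean of $F(m, n)$, which is $n/(n-2)$ for $n > 2$. The helper names and the choice $m = 4$, $n = 10$ are assumptions for illustration.

```python
import random

def chi_squared_draw(dof, rng):
    """Sum of dof squared standard Gaussian draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(dof))

def f_draw(m, n, rng):
    """Ratio of two independent Chi-squared draws, each divided by its degrees of freedom."""
    return (chi_squared_draw(m, rng) / m) / (chi_squared_draw(n, rng) / n)

rng = random.Random(14)
m, n, reps = 4, 10, 40_000
f_samples = [f_draw(m, n, rng) for _ in range(reps)]
mean_f = sum(f_samples) / reps      # theory: n / (n - 2) = 1.25
```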