This article demonstrates the hierarchical structure among common probabilistic models. Although (parametric) probabilistic models are often treated in statistics merely as alternatives for fitting data, functional relations among them can be established from a probability-theoretic perspective.
Many other probability distributions arise from pure conjecture: the triangular, semicircle, Pareto, and Zipf distributions, to name a few. Distributions of this type are not considered here.
(A hierarchical diagram will be added when time permits.)
Prior to any assumption or acquired knowledge about the underlying structure of an uncertain quantity, every possible outcome is assumed to be equally likely to occur. In other words, the reference distribution of any random entity is uniform over its sample space.
\[ P_0 = U \]
The two basic random processes, the Bernoulli process and the Poisson process, are shown below to be based on uniform distributions. Most commonly used probabilistic models can be derived from these two random processes.
For a finite sample space, constant likelihood means every outcome carries equal probability mass: \( \forall a \in \Omega_N, \; P_0(X=a) = \frac{1}{N} \)
Defining a proper subset of the sample space \( A \subsetneq \Omega \) as the event of interest, the occurrence of that event is a Bernoulli random variable, with occurrence probability \( p = P(U \in A) \).
\[ X = \mathbf{1}(U \in A) \sim \text{Bernoulli}(p) \]
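A minimal simulation sketch (the event \( A = [0, p) \) on the unit interval, the value \( p = 0.3 \), the seed, and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
u = rng.random(100_000)      # U ~ Uniform(0, 1), the reference distribution
x = (u < p).astype(int)      # X = 1(U in A), with A = [0, p)
print(x.mean())              # close to p = 0.3
```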
Built upon the Bernoulli experiment, a Bernoulli process is a discrete random process \( (X_1, X_2, \cdots) \) that satisfies:
1. the trials are mutually independent;
2. every trial has the same occurrence probability \(p\).
The sum of a certain number of trials in a Bernoulli process is a binomial random variable, with parameters being the occurrence probability \(p\) and the number of trials \(n\).
\[ K_n = \sum_{i=1}^n X_i \sim \text{Binomial}(p,n) \]
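A sketch under the same illustrative assumptions, summing \(n\) Bernoulli indicators and checking the binomial mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 20, 100_000
trials = rng.random((reps, n)) < p   # reps Bernoulli processes, n trials each
k = trials.sum(axis=1)               # K_n, the number of occurrences
print(k.mean(), n * p)               # empirical vs. theoretical mean
print(k.var(), n * p * (1 - p))      # empirical vs. theoretical variance
```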
The number of trials in a Bernoulli process until a certain number of occurrences is a negative binomial random variable, with parameters being the occurrence probability \(p\) and the number of occurrences \(k\).
\[ \sum_{i=1}^{N_k} X_i = k, X_{N_k} = 1 \Rightarrow N_k \sim \text{NB}(p,k) \]
The number of trials in a Bernoulli process until the first occurrence is a geometric random variable, with parameter the occurrence probability \(p\). Note that \( \text{Geometric}(p) \sim \text{NB}(p,1) \).
\[ N_1 \sim \text{Geometric}(p) \]
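A sketch of the trial-counting construction (the helper `trials_until_kth_success` and the parameter values are hypothetical choices for illustration); setting \(k = 1\) recovers the geometric case:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, reps = 0.3, 3, 20_000

def trials_until_kth_success(rng, p, k):
    """Run Bernoulli(p) trials until the k-th success; return the trial count."""
    n = successes = 0
    while successes < k:
        n += 1
        successes += rng.random() < p
    return n

n_k = np.array([trials_until_kth_success(rng, p, k) for _ in range(reps)])
print(n_k.mean(), k / p)   # E[N_k] = k / p
```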
If the trials are instead drawn without replacement from a finite population of size \(N\), of which \(M\) items belong to the event of interest, the trials are no longer independent.
The sum of a certain number of trials in such a "lottery process" is a hypergeometric random variable, with parameters being population size \(N\), size of event of interest \(M\), and number of trials \(n\).
\[ K_n = \sum_{i=1}^n X_i \sim \text{Hypergeometric}(N,M,n) \]
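A sketch of sampling without replacement (the population layout and the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n, reps = 50, 20, 10, 20_000
population = np.arange(N) < M    # first M items belong to the event of interest
k = np.array([rng.choice(population, size=n, replace=False).sum()
              for _ in range(reps)])
print(k.mean(), n * M / N)       # E[K_n] = n * M / N
```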
If the sample space is partitioned into multiple categories \( A_i \), the category of an outcome is a categorical random variable, with parameters being the probability of each category, \( p_i = P(U \in A_i) \).
\[ \mathbf{X} \sim \text{Categorical}(\mathbf{p}) \]
The sum of a certain number of such categorical trials is a multinomial random variable, with parameters being the category probabilities \( \mathbf{p} \) and number of trials \(n\).
\[ K_n = \sum_{i=1}^n X_i \sim \text{Multinomial}(\mathbf{p},n) \]
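A sketch encoding each categorical trial as a one-hot vector and summing \(n\) of them (the probability vector is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
n, reps = 30, 100_000
cats = rng.choice(len(p), size=(reps, n), p=p)   # categorical trials
counts = np.stack([(cats == i).sum(axis=1) for i in range(len(p))], axis=1)
print(counts.mean(axis=0), n * p)                # E[K_n] = n * p
```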
If the sample space is the unit interval, over which likelihood is uniformly distributed, then the order statistics of a certain number of such trials are Beta random variables, with parameters being the order from the left, \(k\), and the order from the right, \(n+1-k\), where \(n\) is the sample size.
\[ U_{(k)} \sim B(k,n+1-k) \]
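A sketch comparing the \(k\)-th smallest of \(n\) uniform draws with \( B(k, n+1-k) \) via a Kolmogorov-Smirnov test (the values \(n = 9\), \(k = 3\) are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, reps = 9, 3, 100_000
u = np.sort(rng.random((reps, n)), axis=1)
u_k = u[:, k - 1]                                        # k-th order statistic
print(stats.kstest(u_k, stats.beta(k, n + 1 - k).cdf))   # large p-value expected
```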
A Poisson process is a continuous-time random process that satisfies:
1. occurrences happen one at a time;
2. the numbers of occurrences in disjoint time intervals are mutually independent;
3. the expected number of occurrences in any interval is proportional to its duration, with a constant occurrence rate \( \lambda \).
Written as an arrival process, it is \( \mathbf{X} = (X_{(1)}, X_{(2)}, \cdots) \), where \( X_{(i)} \) is the epoch of the \(i\)-th occurrence.
Conditional on the number of occurrences in a certain time interval, the occurrence epochs are uniformly distributed over that interval. The Poisson process is related to the reference distribution in this sense.
\[ X|K_t \sim U(0,t)^{K_t} \]
For any time interval of a given duration \(t\), the number of occurrences in a Poisson process is a Poisson random variable, with parameter the expected number of occurrences \( \lambda t \).
\[ K_t = \sum_{i=1}^{\infty} \mathbf{1} \left( X_{(i)} \in (t_0, t_0+t) \right) \sim \text{Poisson}(\lambda t) \]
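A sketch building the process from exponential gaps (the standard construction, formalized just below) and checking both the count distribution and conditional uniformity; the rate, horizon, and gap budget are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, t, reps = 2.0, 5.0, 100_000
gaps = rng.exponential(1 / lam, size=(reps, 40))  # 40 gaps safely cover (0, t]
epochs = np.cumsum(gaps, axis=1)
k_t = (epochs <= t).sum(axis=1)                   # K_t per realization
print(k_t.mean(), lam * t)                        # E[K_t] = lambda * t
print(k_t.var(), lam * t)                         # Var[K_t] = lambda * t
# conditional uniformity: epochs from realizations with exactly 10 arrivals
sel = epochs[k_t == 10][:, :10].ravel() / t
print(stats.kstest(sel, "uniform"))
```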
The next occurrence epoch in a Poisson process, regardless of occurrences at or before the current epoch, is an exponential random variable, with parameter the occurrence rate \( \lambda \). Any interarrival time is also such a random variable.
\[ X_{(1)} \sim \text{Exp}(\lambda) \] \[ X_{(n+1)} - X_{(n)} \sim \text{Exp}(\lambda) \]
The occurrence epoch of a certain order in a Poisson process, counted from any starting epoch regardless of prior occurrences, is a Gamma random variable, with parameters being the ordinal number \(k\) and the occurrence rate \( \lambda \). The time between a certain number of arrivals in a Poisson process is also such a random variable.
\[ X_{(k)} \sim \Gamma(k,\lambda) \] \[ X_{(n+k)} - X_{(n)} \sim \Gamma(k,\lambda) \]
The sum of a certain number of mutually independent exponential random variables is a Gamma random variable, with parameters being the number of summands \(k\) and the inherited occurrence rate \(\lambda\).
\[ \sum_{i=1}^k X_{(1),i} \sim \Gamma(k,\lambda) \]
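A sketch comparing the sum of \(k\) independent Exp(\(\lambda\)) variables with the Gamma distribution (note that scipy parameterizes Gamma by shape \(k\) and scale \(1/\lambda\); the parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, k, reps = 2.0, 4, 100_000
s = rng.exponential(1 / lam, size=(reps, k)).sum(axis=1)
print(stats.kstest(s, stats.gamma(a=k, scale=1 / lam).cdf))
```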
The difference between two independent, identically distributed exponential random variables is governed by a Laplace distribution, as is a Brownian motion evaluated at an exponentially distributed random time. The location parameter is 0, and the scale parameter is the inverse of the occurrence rate \(\lambda\).
\[ X_{(1),i} - X_{(1),j} \sim \text{Laplace}(0,\frac{1}{\lambda}) \]
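A sketch of the difference construction (rate and sample size illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, reps = 2.0, 100_000
d = rng.exponential(1 / lam, reps) - rng.exponential(1 / lam, reps)
print(stats.kstest(d, stats.laplace(loc=0, scale=1 / lam).cdf))
```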
The (1) sum of (2) many (3) weakly correlated and (4) variance-comparable random variables is approximately normally distributed. Conditions (3) and (4) are roughly equivalent to the covariance matrix being close to a scalar matrix.
\[ \sum_{i=1}^{n} X_i \overset{\text{approx.}}{\sim} \text{Normal} \]
The (1) product of (2) many (3) weakly correlated, (4) variance-comparable, and positive random variables is approximately log-normally distributed, since its logarithm is a sum of the previous kind.
\[ \prod_{i=1}^{n} X_i \overset{\text{approx.}}{\sim} \text{Lognormal} \]
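A sketch using i.i.d. centered uniforms as summands and their exponentials as (positive) factors; independence is a simplifying assumption stronger than conditions (3) and (4):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 200, 50_000
u = rng.random((reps, n)) - 0.5        # centered uniforms, Var = 1/12
z = u.sum(axis=1) / np.sqrt(n / 12.0)  # standardized sum
print(stats.kstest(z, "norm"))         # approximately standard normal
prod = np.exp(u).prod(axis=1)          # product of positive variables
print(stats.kstest(np.log(prod) / np.sqrt(n / 12.0), "norm"))  # log approx normal
```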
The maximum of many independent, identically distributed random variables, suitably normalized, approaches one of three extreme value distributions, determined by the upper tail of the parent distribution: exponential, power law, or bounded.
Type I (exponential upper tail): Gumbel distribution
Type II (power-law upper tail): Fréchet distribution
Type III (bounded upper tail): Weibull distribution
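A sketch of the Type I case: the maximum of \(n\) i.i.d. Exp(1) variables, shifted by \( \log n \), approaches a standard Gumbel (the sample sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 1000, 50_000
m = rng.exponential(1.0, size=(reps, n)).max(axis=1) - np.log(n)
print(stats.kstest(m, stats.gumbel_r.cdf))
```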
The sum of squares of independent standard Gaussian random variables is a Chi-squared random variable, whose parameter, the degrees of freedom, is the number of summands.
\[ \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n) \]
The Chi-squared distribution with 2 degrees of freedom coincides with an exponential distribution with occurrence rate \(\frac{1}{2}\).
\[ \chi^2(2) \sim \text{Exp}(\frac{1}{2}) \]
The square root of a Chi-squared random variable is a Chi random variable, with the same degrees of freedom.
\[ X \sim \chi^2(n) \Rightarrow \sqrt{X} \sim \chi(n) \]
The magnitude of a 2-dimensional standard Gaussian vector is a Rayleigh random variable, with scale parameter one.
\[ \sqrt{Z_1^2 + Z_2^2 } \sim \text{Rayleigh}(1) \]
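A sketch covering the three preceding relations at once (the degrees of freedom \(n = 5\) and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000
z = rng.standard_normal((reps, n))
q = (z ** 2).sum(axis=1)
print(stats.kstest(q, stats.chi2(n).cdf))           # chi-squared, n dof
print(stats.kstest(np.sqrt(q), stats.chi(n).cdf))   # chi, same dof
r = np.hypot(z[:, 0], z[:, 1])                      # 2-d Gaussian magnitude
print(stats.kstest(r, stats.rayleigh.cdf))          # Rayleigh, unit scale
```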
The ratio of a standard Gaussian r.v. to the square root of an independent Chi-squared r.v. normalized by its degrees of freedom is a Student's t random variable. Its parameter, the degrees of freedom, equals that of the Chi-squared r.v.
\[ \frac{Z}{\sqrt{\frac{V}{\nu}}} \sim t(\nu) \]
The ratio of two independent standard Gaussian r.v.'s is a Cauchy random variable. A Student's t r.v. with one degree of freedom is also such a random variable. The Cauchy distribution can be extended to a location-scale family.
\[ \frac{Z_1}{Z_2} \sim \text{Cauchy}(0,1) \]
The ratio of two independent Chi-squared r.v.'s, each normalized by its degrees of freedom, is an F random variable, with parameters being the two degrees of freedom.
\[ \frac{V_1 / \nu_{1}}{V_2 / \nu_{2}} \sim F(\nu_{1}, \nu_{2}) \]
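A sketch of the three classical ratio constructions (the degrees of freedom are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reps, nu1, nu2 = 100_000, 5, 7
z = rng.standard_normal(reps)
v1 = rng.chisquare(nu1, reps)
v2 = rng.chisquare(nu2, reps)
print(stats.kstest(z / np.sqrt(v1 / nu1), stats.t(nu1).cdf))          # Student's t
print(stats.kstest(z / rng.standard_normal(reps), stats.cauchy.cdf))  # Cauchy
print(stats.kstest((v1 / nu1) / (v2 / nu2), stats.f(nu1, nu2).cdf))   # F
```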