Although (parametric) probabilistic models are often treated merely as interchangeable data-fitting tools in statistics, functional relations among them can be established from the perspective of probability theory.

This article demonstrates the hierarchical structure among common probabilistic models:

  1. Uniform distribution is set apart from all else as the reference distribution.
  2. The two fundamental random processes in discrete and continuous time, the Bernoulli process and the Poisson process, are built upon uniform distributions; each gives rise to an important family of distributions.
  3. Asymptotic distributions arise from limiting processes and deserve a separate section.

Many other probability distributions come from pure conjecture, e.g. triangular, semicircle, Pareto, and Zipf; such distributions are not considered here.

(TODO: Make a hierarchical diagram when have time.)

Reference Distribution

Prior to any assumption being made or knowledge acquired about the underlying structure of an uncertain quantity, every possible outcome is assumed to be equally likely to occur. In other words, the reference distribution of any random entity is uniform over its sample space: $P_0 = U$.

Distributions from Discrete Random Process

For a finite sample space, constant likelihood means every outcome has equal probability mass: $\forall a \in \Omega_N$, $P_0(X = a) = \frac{1}{N}$.

Bernoulli Process and Derived Distributions

Define a proper subset of the sample space $A \subsetneq \Omega$ as the event of interest; the occurrence of this event is then a Bernoulli random variable, with occurrence probability $p = P(U \in A)$.

$$X = 1_{\{U \in A\}} \sim \text{Bernoulli}(p)$$
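
A minimal simulation sketch of this construction, assuming Python with NumPy; the event $A$ is taken to be the interval $[0, p)$ of the unit sample space, and the parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                  # occurrence probability P(U in A)
u = rng.uniform(0.0, 1.0, size=100_000)  # reference uniform draws on [0, 1)
x = (u < p).astype(int)                  # indicator of the event A = [0, p)

print(x.mean())                          # empirical P(X = 1), close to p = 0.3
```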

Built upon the Bernoulli experiment, a Bernoulli process is a discrete-time random process $(X_1, X_2, \cdots)$ that satisfies:

  1. Constant occurrence probability over trials;
  2. Mutual independence of occurrence among trials.

The sum of a certain number of trials in a Bernoulli process is a binomial random variable, with parameters being occurrence probability $p$ and number of trials $n$.

$$K_n = \sum_{i=1}^n X_i \sim \text{Binomial}(p,n)$$
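
Continuing the same NumPy sketch, summing $n$ independent Bernoulli trials reproduces the binomial distribution (sample size and parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 20
trials = rng.uniform(size=(100_000, n)) < p   # Bernoulli process: n trials per row
k = trials.sum(axis=1)                        # K_n = sum of the n trials

# Compare with the binomial mean n*p and variance n*p*(1-p)
print(k.mean(), k.var())                      # ~6.0 and ~4.2
print(rng.binomial(n, p, 100_000).mean())     # NumPy's binomial sampler as reference
```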

The number of trials in a Bernoulli process until the first occurrence is a geometric random variable, with parameter the occurrence probability $p$. Note that $\text{Geometric}(p) \sim \text{NB}(p,1)$.

$$N_1 \sim \text{Geometric}(p)$$

The number of trials in a Bernoulli process until a certain number of occurrences is a negative binomial random variable, with parameters being the occurrence probability $p$ and the number of occurrences $k$.

$$\sum_{i=1}^{N_k} X_i = k, X_{N_k} = 1 \Rightarrow N_k \sim \text{NB}(p,k)$$
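
A small sketch of counting trials until the $k$-th occurrence (the helper function and parameters are illustrative; note that NumPy's `negative_binomial` counts failures rather than total trials, so the cross-check adds $k$ back):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, reps = 0.3, 3, 20_000

def trials_until_kth_success(p, k, rng):
    """Run Bernoulli trials until the k-th success; return the total trial count N_k."""
    n = successes = 0
    while successes < k:
        n += 1
        successes += rng.uniform() < p
    return n

n_k = np.array([trials_until_kth_success(p, k, rng) for _ in range(reps)])
print(n_k.mean())                                    # ~ k/p = 10
print(rng.negative_binomial(k, p, reps).mean() + k)  # failures + k = total trials, also ~10
```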

Categorical Trials and Derived Distributions

If the sample space is partitioned into multiple categories, the category of an outcome is a categorical random variable, with parameters being the probability of each category, $p_i = P(U \in A_i)$.

$$\mathbf{X} \sim \text{Categorical}(\mathbf{p})$$

The sum of a certain number of such categorical trials is a multinomial random variable, with parameters being the category probabilities $\mathbf{p}$ and number of trials $n$.

$$\mathbf{K}_n = \sum_{i=1}^n \mathbf{X}_i \sim \text{Multinomial}(\mathbf{p},n)$$

Finite Lottery and Hypergeometric Distribution

If the trials are instead drawn from a finite population without replacement, the sum of a certain number of trials in such a "lottery process" is a hypergeometric random variable, with parameters being population size $N$, size of event of interest $M$, and number of trials $n$.

$$K_n = \sum_{i=1}^n X_i \sim \text{Hypergeometric}(N,M,n)$$
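
A sketch of this "finite lottery": drawing without replacement from a population of size $N$ containing $M$ items of interest (parameters are illustrative; NumPy's hypergeometric sampler is used as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n, reps = 50, 15, 10, 20_000

population = np.zeros(N, dtype=int)
population[:M] = 1                       # M items of interest among N

# Each replication: shuffle the population and take the first n draws (no replacement).
draws = np.array([rng.permutation(population)[:n].sum() for _ in range(reps)])
print(draws.mean())                                  # ~ n*M/N = 3.0
print(rng.hypergeometric(M, N - M, n, reps).mean())  # reference sampler, also ~3.0
```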

Distributions from Continuous Random Process

The Poisson process is the continuous-time analogue of the Bernoulli process: within every infinitesimal time interval, the likelihood of an occurrence is constant and independent of observations in other time intervals.

Poisson Process and Derived Distributions

A Poisson process is a continuous-time random process that satisfies:

  1. Constant occurrence rate over time: $\lambda = \lim_{\Delta t \to 0^+} \frac{\mathrm{E}[K_{\Delta t}]}{\Delta t}$;
  2. Mutual independence of occurrence within non-overlapping intervals.

For any time interval of a given duration $t$, the number of occurrences in a Poisson process is a Poisson random variable, with parameter the expected number of occurrences $\lambda t$.

$$K_t = \sum_{i=1}^{\infty} 1_{\{X_{(i)} \in (t_0, t_0+t)\}} \sim \text{Poisson}(\lambda t)$$
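
A sketch illustrating the Bernoulli-process view: discretize $(0, t]$ into many small steps, let an occurrence happen in each step with probability $\lambda \, \Delta t$, and count occurrences (rate, horizon, and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, reps, dt = 2.0, 5.0, 5_000, 0.001
steps = int(t / dt)

# Fine Bernoulli approximation of the Poisson process on (0, t].
k_t = np.array([(rng.uniform(size=steps) < lam * dt).sum() for _ in range(reps)])
print(k_t.mean(), k_t.var())   # both close to lam*t = 10, as for Poisson(lam*t)
```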

Conditional on the number of occurrences in a finite time interval, the occurrence epochs regardless of order, $\mathbf{X} = \{X_1, X_2, \cdots\}$, are independently and uniformly distributed over the time interval.

$$\mathbf{X} | K_t \sim U(0,t)^{K_t}$$

Viewing the Poisson process as an arrival process $\mathbf{X} = (X_{(1)}, X_{(2)}, \cdots)$, the time to the next occurrence is an exponential random variable with parameter the occurrence rate $\lambda$, regardless of occurrences at and before the current epoch. Any interarrival time is also such a random variable.

$$X_{(1)} \sim \text{Exp}(\lambda)$$ $$X_{(n+1)} - X_{(n)} \sim \text{Exp}(\lambda)$$

The occurrence epoch of a given order in a Poisson process is an Erlang random variable, with parameters the ordinal number $k$ and the occurrence rate $\lambda$. The time between a given number of arrivals is also such a random variable, regardless of occurrences at and before the starting epoch. Similarly, the sum of a given number of mutually independent exponential random variables is again an Erlang random variable, with parameters the number of summands $k$ and the inherited occurrence rate $\lambda$.

$$X_{(k)} \sim \text{Erlang}(k, \lambda)$$ $$X_{(n+k)} - X_{(n)} \sim \text{Erlang}(k, \lambda)$$ $$\sum_{i=1}^k X_{(1),i} \sim \text{Erlang}(k, \lambda)$$
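
A short NumPy sketch of the last identity, with illustrative rate and order: the $k$-th arrival epoch is the sum of $k$ independent exponential interarrival times.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, k, reps = 2.0, 4, 100_000

# k-th arrival epoch = sum of k independent Exp(lam) interarrival times.
x_k = rng.exponential(scale=1.0 / lam, size=(reps, k)).sum(axis=1)
print(x_k.mean(), x_k.var())   # ~ k/lam = 2.0 and k/lam**2 = 1.0, the Erlang(k, lam) moments
```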

Substituting the factorial with the gamma function, the ordinal number $k$ is generalized to the positive real line, which gives a gamma random variable $\Gamma(k, \lambda)$, whose two parameters are typically called the shape parameter $\alpha$ and the rate parameter $\beta$.

Conditional on the occurrence count $K_t$ in a finite time interval $(0, t)$, each ordered occurrence epoch, normalized by the interval length, is a Beta random variable, with parameters the order from the left $k$ and the order from the right $K_t + 1 - k$.

$$X_{(k)}/t \,|\, K_t \sim B(k, K_t +1-k)$$
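
A sketch of this order-statistic fact under illustrative values (given $K_t = 7$, the occurrence epochs are i.i.d. uniform on $(0, t)$, so the $k$-th smallest, normalized by $t$, should follow a Beta law):

```python
import numpy as np

rng = np.random.default_rng(0)
t, count, k, reps = 5.0, 7, 3, 100_000

epochs = np.sort(rng.uniform(0.0, t, size=(reps, count)), axis=1)
x_k = epochs[:, k - 1] / t           # k-th order statistic, normalized by t

a, b = k, count + 1 - k              # Beta parameters: 3 and 5
print(x_k.mean(), a / (a + b))       # empirical mean vs. Beta mean 0.375
```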

The difference between two independent, identically distributed exponential random variables is governed by a Laplace distribution, as is a Brownian motion evaluated at an exponentially distributed random time. The location parameter is 0; the scale parameter is the inverse of the occurrence rate, $\frac{1}{\lambda}$.

$$X_{(1),i} - X_{(1),j} \sim \text{Laplace}(0,\frac{1}{\lambda})$$
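
A quick NumPy check of the first statement, with an illustrative rate:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, reps = 2.0, 100_000

d = (rng.exponential(scale=1.0 / lam, size=reps)
     - rng.exponential(scale=1.0 / lam, size=reps))   # difference of i.i.d. Exp(lam)
print(d.mean(), d.var())   # ~0 and ~2/lam**2 = 0.5, the Laplace(0, 1/lam) moments
```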

Asymptotic Distributions

The (1) sum of (2) many (3) weakly correlated and (4) variance-comparable random variables has approximately a normal distribution. Conditions (3) and (4) are roughly equivalent to the covariance matrix being close to a scalar matrix.

$$\lim_{n \to \infty} \sum_{i=1}^{n} \frac{X_i - \mu_i}{\sqrt{n} \sigma} \sim \text{Normal}(0, 1)$$
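
A sketch of the central limit effect, assuming i.i.d. Uniform(0, 1) summands (so $\mu_i = 0.5$ and $\sigma^2 = 1/12$); sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 10_000

x = rng.uniform(size=(reps, n))
z = (x - 0.5).sum(axis=1) / (np.sqrt(n) * np.sqrt(1.0 / 12.0))  # standardized sum
print(z.mean(), z.var())    # ~0 and ~1, as for the standard normal limit
```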

The (1) product of (2) many (3) positive, weakly correlated and (4) variance-comparable random variables has approximately a log-normal distribution, since its logarithm is a sum to which the previous result applies. Here $\mu_i$ and $\sigma$ denote the mean and (common) standard deviation of $\ln X_i$.

$$\lim_{n \to \infty} \prod_{i=1}^{n} \exp\!\left( \frac{\ln X_i - \mu_i}{\sqrt{n}\, \sigma} \right) \sim \text{LnN}(0, 1)$$

Depending on the upper tail type of the underlying distribution (exponential, power law, or bounded), the limiting distribution of the maximum is one of three extreme value distributions.

  • Type I: Gumbel distribution (exponential tail);
  • Type II: Fréchet distribution (power-law tail);
  • Type III: Weibull distribution (bounded tail).

Gaussian-related Distributions

The normal/Gaussian distribution is so commonly used that many random variables derived from it have been thoroughly studied. The standard Gaussian random variable is denoted $Z$, following convention.

The sum of squares of independent standard Gaussian random variables is a Chi-squared random variable, whose parameter, the degrees of freedom, equals the number of summands. As a special case, the Chi-squared distribution with 2 degrees of freedom coincides with an exponential distribution with occurrence rate $1/2$.

$$V_n = \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$$

$$V_2 \sim \text{Exp}(1/2)$$
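
A short NumPy sketch of both statements, with an illustrative degree of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

v_n = (rng.standard_normal(size=(reps, n)) ** 2).sum(axis=1)   # sum of n squared N(0,1)
print(v_n.mean(), v_n.var())   # ~n = 5 and ~2n = 10, the chi-squared(n) moments

v_2 = (rng.standard_normal(size=(reps, 2)) ** 2).sum(axis=1)
print(v_2.mean())              # ~2, matching Exp(rate 1/2), whose mean is 2
```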

The square root of a Chi-squared random variable is a Chi random variable, with the same degrees of freedom. As a special case, the magnitude of a 2-dimensional standard Gaussian vector is a Rayleigh random variable, with scale parameter one.

$$\sqrt{V_n} \sim \chi(n)$$

$$\sqrt{V_2} \sim \text{Rayleigh}(1)$$

The ratio of two independent standard Gaussian r.v.'s is a Cauchy random variable. A Student's t r.v. with one degree of freedom is also such a random variable. The Cauchy distribution can be extended to a location-scale family.

$$\frac{Z_1}{Z_2} \sim \text{Cauchy}(0,1)$$

The ratio of a standard Gaussian r.v. to the square root of an independent Chi-squared r.v. normalized by its degrees of freedom is a Student's t random variable. Its parameter, the degrees of freedom, equals the degrees of freedom of the Chi-squared r.v.

$$\frac{Z}{\sqrt{V_n / n}} \sim t(n)$$
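
A NumPy sketch of this construction, with an illustrative degree of freedom (the variance of Student's t with $n$ degrees of freedom is $n/(n-2)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000

z = rng.standard_normal(reps)
v_n = (rng.standard_normal(size=(reps, n)) ** 2).sum(axis=1)   # chi-squared(n)
t = z / np.sqrt(v_n / n)
print(t.var())   # ~ n/(n-2) = 1.25 for n = 10 degrees of freedom
```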

The ratio of two independent Chi-squared r.v.'s, each normalized by its degrees of freedom, is a Fisher–Snedecor (or simply F) random variable, with parameters the two degrees of freedom of the Chi-squared r.v.'s.

$$\frac{V_m / m}{V_n / n} \sim F(m, n)$$


🏷 Category=Probability