This article demonstrates the hierarchical structure among common probabilistic models. Although (parametric) probabilistic models are often treated in statistics merely as alternatives for fitting data, functional relations among them can be established from a probability-theoretic perspective.
Many other probability distributions arise from pure conjecture: the triangular, semicircle, Pareto, and Zipf distributions, to name a few. Distributions of this type are not considered here.
(A hierarchical diagram will be added when time permits.)
Prior to any assumption or acquired knowledge about the underlying structure of an uncertain quantity, every possible outcome is assumed to be equally likely to occur. In other words, the reference distribution of any random entity is uniform over its sample space.
\[ P_0 = U \]
The two basic random processes, the Bernoulli process and the Poisson process, are shown below to be based on uniform distributions. Most commonly used probabilistic models can be derived from these two random processes.
For a finite sample space, constant likelihood means every outcome carries equal probability mass: \( \forall a \in \Omega_N, \; P_0(X=a) = \frac{1}{N} \)
Defining a proper subset of the sample space \( A \subsetneq \Omega \) as the event of interest, the occurrence of that event is a Bernoulli random variable, with occurrence probability \( p = P(U \in A) \).
\[ X = \mathbf{1}(U \in A) \sim \text{Bernoulli}(p) \]
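A minimal simulation sketch (the event \( A = [0, p) \) on the unit interval, the value \( p = 0.3 \), the seed, and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
u = rng.random(100_000)      # U ~ Uniform(0, 1), the reference distribution
x = (u < p).astype(int)      # X = 1(U in A), with A = [0, p)
print(x.mean())              # close to p = 0.3
```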
Built upon the Bernoulli experiment, a Bernoulli process is a discrete random process \( (X_1, X_2, \cdots) \) that satisfies:
1. the trials are mutually independent;
2. every trial has the same occurrence probability \(p\).
The sum of a certain number of trials in a Bernoulli process is a binomial random variable, with parameters being the occurrence probability \(p\) and the number of trials \(n\).
\[ K_n = \sum_{i=1}^n X_i \sim \text{Binomial}(p,n) \]
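A sketch under the same illustrative assumptions, summing \(n\) Bernoulli indicators and checking the binomial mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 20, 100_000
trials = rng.random((reps, n)) < p   # reps Bernoulli processes, n trials each
k = trials.sum(axis=1)               # K_n, the number of occurrences
print(k.mean(), n * p)               # empirical vs. theoretical mean
print(k.var(), n * p * (1 - p))      # empirical vs. theoretical variance
```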
The number of trials in a Bernoulli process until a certain number of occurrences is a negative binomial random variable, with parameters being the occurrence probability \(p\) and the number of occurrences \(k\).
\[ \sum_{i=1}^{N_k} X_i = k, X_{N_k} = 1 \Rightarrow N_k \sim \text{NB}(p,k) \]
The number of trials in a Bernoulli process until the first occurrence is a geometric random variable, with parameter the occurrence probability \(p\). Note that \( \text{Geometric}(p) \sim \text{NB}(p,1) \).
\[ N_1 \sim \text{Geometric}(p) \]
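A sketch of the trial-counting construction (the helper `trials_until_kth_success` and the parameter values are hypothetical choices for illustration); setting \(k = 1\) recovers the geometric case:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, reps = 0.3, 3, 20_000

def trials_until_kth_success(rng, p, k):
    """Run Bernoulli(p) trials until the k-th success; return the trial count."""
    n = successes = 0
    while successes < k:
        n += 1
        successes += rng.random() < p
    return n

n_k = np.array([trials_until_kth_success(rng, p, k) for _ in range(reps)])
print(n_k.mean(), k / p)   # E[N_k] = k / p
```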
If the trials are instead drawn without replacement from a finite population of size \(N\), of which \(M\) items belong to the event of interest, the trials are no longer independent.
The sum of a certain number of trials in such a "lottery process" is a hypergeometric random variable, with parameters being population size \(N\), size of event of interest \(M\), and number of trials \(n\).
\[ K_n = \sum_{i=1}^n X_i \sim \text{Hypergeometric}(N,M,n) \]
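A sketch of sampling without replacement (the population layout and the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n, reps = 50, 20, 10, 20_000
population = np.arange(N) < M    # first M items belong to the event of interest
k = np.array([rng.choice(population, size=n, replace=False).sum()
              for _ in range(reps)])
print(k.mean(), n * M / N)       # E[K_n] = n * M / N
```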
If the sample space is partitioned into multiple categories \( A_i \), the category of an outcome is a categorical random variable, with parameters being the probability of each category, \( p_i = P(U \in A_i) \).
\[ \mathbf{X} \sim \text{Categorical}(\mathbf{p}) \]
The sum of a certain number of such categorical trials is a multinomial random variable, with parameters being the category probabilities \( \mathbf{p} \) and number of trials \(n\).
\[ K_n = \sum_{i=1}^n X_i \sim \text{Multinomial}(\mathbf{p},n) \]
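A sketch encoding each categorical trial as a one-hot vector and summing \(n\) of them (the probability vector is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
n, reps = 30, 100_000
cats = rng.choice(len(p), size=(reps, n), p=p)   # categorical trials
counts = np.stack([(cats == i).sum(axis=1) for i in range(len(p))], axis=1)
print(counts.mean(axis=0), n * p)                # E[K_n] = n * p
```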
If the sample space is the unit interval, over which likelihood is uniformly distributed, then the order statistics of a certain number of such trials are Beta random variables, with parameters being the order from the left, \(k\), and the order from the right, \(n+1-k\), where \(n\) is the sample size.
\[ U_{(k)} \sim B(k,n+1-k) \]
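A sketch comparing the \(k\)-th smallest of \(n\) uniform draws with \( B(k, n+1-k) \) via a Kolmogorov-Smirnov test (the values \(n = 9\), \(k = 3\) are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, reps = 9, 3, 100_000
u = np.sort(rng.random((reps, n)), axis=1)
u_k = u[:, k - 1]                                        # k-th order statistic
print(stats.kstest(u_k, stats.beta(k, n + 1 - k).cdf))   # large p-value expected
```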
A Poisson process is a continuous-time random process that satisfies:
1. occurrences happen one at a time;
2. the numbers of occurrences in disjoint time intervals are mutually independent;
3. the expected number of occurrences in any interval is proportional to its duration, with a constant occurrence rate \( \lambda \).
Written as an arrival process, it is \( \mathbf{X} = (X_{(1)}, X_{(2)}, \cdots) \), where \( X_{(i)} \) is the epoch of the \(i\)-th occurrence.
Conditional on the number of occurrences in a certain time interval, the occurrence epochs are uniformly distributed over that interval. The Poisson process is related to the reference distribution in this sense.
\[ X|K_t \sim U(0,t)^{K_t} \]
For any time interval of a given duration \(t\), the number of occurrences in a Poisson process is a Poisson random variable, with parameter the expected number of occurrences \( \lambda t \).
\[ K_t = \sum_{i=1}^{\infty} \mathbf{1} \left( X_{(i)} \in (t_0, t_0+t) \right) \sim \text{Poisson}(\lambda t) \]
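A sketch building the process from exponential gaps (the standard construction, formalized just below) and checking both the count distribution and conditional uniformity; the rate, horizon, and gap budget are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, t, reps = 2.0, 5.0, 100_000
gaps = rng.exponential(1 / lam, size=(reps, 40))  # 40 gaps safely cover (0, t]
epochs = np.cumsum(gaps, axis=1)
k_t = (epochs <= t).sum(axis=1)                   # K_t per realization
print(k_t.mean(), lam * t)                        # E[K_t] = lambda * t
print(k_t.var(), lam * t)                         # Var[K_t] = lambda * t
# conditional uniformity: epochs from realizations with exactly 10 arrivals
sel = epochs[k_t == 10][:, :10].ravel() / t
print(stats.kstest(sel, "uniform"))
```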
The next occurrence epoch in a Poisson process, regardless of occurrences at or before the current epoch, is an exponential random variable, with parameter the occurrence rate \( \lambda \). Any interarrival time is also such a random variable.
\[ X_{(1)} \sim \text{Exp}(\lambda) \] \[ X_{(n+1)} - X_{(n)} \sim \text{Exp}(\lambda) \]
The occurrence epoch of a certain order in a Poisson process, counted from any starting epoch regardless of prior occurrences, is a Gamma random variable, with parameters being the ordinal number \(k\) and the occurrence rate \( \lambda \). The time between a certain number of arrivals in a Poisson process is also such a random variable.
\[ X_{(k)} \sim \Gamma(k,\lambda) \] \[ X_{(n+k)} - X_{(n)} \sim \Gamma(k,\lambda) \]
The sum of a certain number of mutually independent exponential random variables is a Gamma random variable, with parameters being the number of summands \(k\) and the inherited occurrence rate \(\lambda\).
\[ \sum_{i=1}^k X_{(1),i} \sim \Gamma(k,\lambda) \]
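A sketch comparing the sum of \(k\) independent Exp(\(\lambda\)) variables with the Gamma distribution (note that scipy parameterizes Gamma by shape \(k\) and scale \(1/\lambda\); the parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, k, reps = 2.0, 4, 100_000
s = rng.exponential(1 / lam, size=(reps, k)).sum(axis=1)
print(stats.kstest(s, stats.gamma(a=k, scale=1 / lam).cdf))
```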
The difference between two independent, identically distributed exponential random variables is governed by a Laplace distribution, as is a Brownian motion evaluated at an exponentially distributed random time. The location parameter is 0, and the scale parameter is the inverse of the occurrence rate \(\lambda\).
\[ X_{(1),i} - X_{(1),j} \sim \text{Laplace}(0,\frac{1}{\lambda}) \]
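A sketch of the difference construction (rate and sample size illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, reps = 2.0, 100_000
d = rng.exponential(1 / lam, reps) - rng.exponential(1 / lam, reps)
print(stats.kstest(d, stats.laplace(loc=0, scale=1 / lam).cdf))
```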
The (1) sum of (2) many (3) weakly correlated and (4) variance-comparable random variables is approximately normally distributed. Conditions (3) and (4) are roughly equivalent to the covariance matrix being close to a scalar matrix.
\[ \sum_{i=1}^{n} X_i \overset{\text{approx.}}{\sim} \text{Normal} \]
The (1) product of (2) many (3) weakly correlated, (4) variance-comparable, and positive random variables is approximately log-normally distributed, since its logarithm is a sum of the previous kind.
\[ \prod_{i=1}^{n} X_i \overset{\text{approx.}}{\sim} \text{Lognormal} \]
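A sketch using i.i.d. centered uniforms as summands and their exponentials as (positive) factors; independence is a simplifying assumption stronger than conditions (3) and (4):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 200, 50_000
u = rng.random((reps, n)) - 0.5        # centered uniforms, Var = 1/12
z = u.sum(axis=1) / np.sqrt(n / 12.0)  # standardized sum
print(stats.kstest(z, "norm"))         # approximately standard normal
prod = np.exp(u).prod(axis=1)          # product of positive variables
print(stats.kstest(np.log(prod) / np.sqrt(n / 12.0), "norm"))  # log approx normal
```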
The maximum of many independent, identically distributed random variables, suitably normalized, approaches one of three extreme value distributions, determined by the upper tail of the parent distribution: exponential, power law, or bounded.
Type I (exponential upper tail): Gumbel distribution
Type II (power-law upper tail): Fréchet distribution
Type III (bounded upper tail): Weibull distribution
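A sketch of the Type I case: the maximum of \(n\) i.i.d. Exp(1) variables, shifted by \( \log n \), approaches a standard Gumbel (the sample sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 1000, 50_000
m = rng.exponential(1.0, size=(reps, n)).max(axis=1) - np.log(n)
print(stats.kstest(m, stats.gumbel_r.cdf))
```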
The sum of squares of independent standard Gaussian random variables is a Chi-squared random variable, whose parameter, the degrees of freedom, is the number of summands.
\[ \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n) \]
The Chi-squared distribution with 2 degrees of freedom coincides with an exponential distribution with occurrence rate \(\frac{1}{2}\).
\[ \chi^2(2) \sim \text{Exp}(\frac{1}{2}) \]
The square root of a Chi-squared random variable is a Chi random variable, with the same degrees of freedom.
\[ X \sim \chi^2(n) \Rightarrow \sqrt{X} \sim \chi(n) \]
The magnitude of a 2-dimensional standard Gaussian vector is a Rayleigh random variable, with scale parameter one.
\[ \sqrt{Z_1^2 + Z_2^2 } \sim \text{Rayleigh}(1) \]
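A sketch covering the three preceding relations at once (the degrees of freedom \(n = 5\) and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000
z = rng.standard_normal((reps, n))
q = (z ** 2).sum(axis=1)
print(stats.kstest(q, stats.chi2(n).cdf))           # chi-squared, n dof
print(stats.kstest(np.sqrt(q), stats.chi(n).cdf))   # chi, same dof
r = np.hypot(z[:, 0], z[:, 1])                      # 2-d Gaussian magnitude
print(stats.kstest(r, stats.rayleigh.cdf))          # Rayleigh, unit scale
```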
The ratio of a standard Gaussian r.v. to the square root of an independent Chi-squared r.v. normalized by its degrees of freedom is a Student's t random variable. Its parameter, the degrees of freedom, equals that of the Chi-squared r.v.
\[ \frac{Z}{\sqrt{\frac{V}{\nu}}} \sim t(\nu) \]
The ratio of two independent standard Gaussian r.v.'s is a Cauchy random variable. A Student's t r.v. with one degree of freedom is also such a random variable. The Cauchy distribution can be extended to a location-scale family.
\[ \frac{Z_1}{Z_2} \sim \text{Cauchy}(0,1) \]
The ratio of two independent Chi-squared r.v.'s, each normalized by its degrees of freedom, is an F random variable, with parameters being the two degrees of freedom.
\[ \frac{V_1 / \nu_{1}}{V_2 / \nu_{2}} \sim F(\nu_{1}, \nu_{2}) \]
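A sketch of the three classical ratio constructions (the degrees of freedom are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reps, nu1, nu2 = 100_000, 5, 7
z = rng.standard_normal(reps)
v1 = rng.chisquare(nu1, reps)
v2 = rng.chisquare(nu2, reps)
print(stats.kstest(z / np.sqrt(v1 / nu1), stats.t(nu1).cdf))          # Student's t
print(stats.kstest(z / rng.standard_normal(reps), stats.cauchy.cdf))  # Cauchy
print(stats.kstest((v1 / nu1) / (v2 / nu2), stats.f(nu1, nu2).cdf))   # F
```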