An analytic approach to probability is naturally established if we map the sample space to some mathematical structure suitable for analysis; this is the motivation of random variables. When extending a deterministic variable to a stochastic one, the "first order uncertainty" is variance, not expectation.

**Random variable** $X: \Omega \to \Psi$
is a measurable mapping on a probability space $(\Omega, \Sigma, P)$.
The sigma-algebra on the codomain $\Psi$ is denoted as $\Sigma_\Psi$.
The codomain of a random variable is typically the real line $\mathbb{R}$
with the Borel sigma-algebra $\mathcal{B}(\mathcal{T}_d)$ or the Lebesgue sigma-algebra $\mathcal{L}$,
or a Banach or Hilbert space with a sigma-algebra.
**Sigma-algebra induced by a random variable** is the class of preimages of measurable sets
in the codomain: $\Sigma_X := \{X^{-1}(B) : B \in \Sigma_\Psi\}$.
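
As a minimal sketch of the preimage construction, the following enumerates $\Sigma_X$ on a finite sample space. The four-point sample space, the parity map `X`, and the helper names `powerset` and `preimage` are assumptions chosen for illustration; the codomain $\{0, 1\}$ is given its power set as sigma-algebra.

```python
from itertools import chain, combinations

# Hypothetical finite sample space, chosen only for illustration.
omega = frozenset({1, 2, 3, 4})

def X(w):
    """Random variable: parity of the outcome (0 for even, 1 for odd)."""
    return w % 2

def powerset(s):
    """All subsets of s, used here as the sigma-algebra on the codomain."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

def preimage(X, B, omega):
    """X^{-1}(B): the outcomes that X maps into B."""
    return frozenset(w for w in omega if X(w) in B)

sigma_psi = powerset({0, 1})
sigma_X = {preimage(X, B, omega) for B in sigma_psi}

print(sorted(sorted(s) for s in sigma_X))
# [[], [1, 2, 3, 4], [1, 3], [2, 4]]
```

The result has four members rather than the sixteen subsets of the sample space: it contains only the events that can be described through the value of `X`.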

**Distribution** of a random variable $X: \Omega \to \Psi$ is the probability measure
$\mu: \Sigma_\Psi \to [0,1]$ induced on its codomain: $\mu(B) := P(X^{-1}(B))$.
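
The induced measure can be computed directly when the sample space is finite. The sketch below assumes two fair dice with the uniform measure and takes $X$ to be the sum of the faces; the names `distribution` and `mu_of` are illustrative, not taken from any library.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def distribution(X, P):
    """Pushforward of P under X, stored as point masses mu({x})."""
    mu = defaultdict(Fraction)
    for w, p in P.items():
        mu[X(w)] += p
    return dict(mu)

mu = distribution(X, P)

def mu_of(B):
    """mu(B) = P(X^{-1}(B)) for a set B of values."""
    return sum(p for x, p in mu.items() if x in B)

print(mu[7])           # 1/6
print(mu_of({2, 12}))  # 1/18
```

Exact rational arithmetic is used so that the total mass is exactly 1 and the values can be compared with hand calculations.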

**Cumulative distribution function** (CDF) of a real random variable $X: \Omega \to \mathbb{R}$
is the real function $F_X(x) := \mu((-\infty, x])$.
The cumulative distribution function $F_X$ is a convenient characterization of the distribution $\mu$:
the CDF always exists, and it determines the distribution uniquely
if the sigma-algebra on the real line is Borel.
**Probability density function** (PDF) of a real random variable
is the derivative of its cumulative distribution function, if the derivative exists:
$f_X (x)= \frac{\mathrm{d}}{\mathrm{d} x} F_X (x)$.
**Probability mass function** (PMF) of a discrete random variable
is the real function that assigns to each value of the random variable its induced measure:
$f_X (x) = P(X^{-1}(\{x\})), \forall x \in \mathbb{R}$.
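
The relation between the CDF and the PDF can be checked numerically. The sketch below assumes a standard normal random variable, whose CDF is expressed through `math.erf`, and compares a central finite difference of the CDF with the closed-form density. The step size `h` is an arbitrary choice for illustration.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Closed-form density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def central_difference(F, x, h=1e-5):
    """Numerical derivative of F at x."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    approx = central_difference(normal_cdf, x)
    exact = normal_pdf(x)
    print(f"x={x:+.1f}  dF/dx={approx:.8f}  f(x)={exact:.8f}")
```

The two columns should agree to several decimal places, which is all a finite difference can be expected to show.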

**Expectation** of a random variable is its Lebesgue integral:
$\mathbb{E} X := \int_\Omega X~\mathrm{d}P$.
The Lebesgue integral provides a uniform definition
for the expectation of discrete and continuous random variables,
and ensures that function spaces such as $L^p$ are complete, so that they form Banach and Hilbert spaces of functions.

Theorem (change of variables): The Lebesgue integral of a real random variable on a probability space equals the Stieltjes integral of the identity function w.r.t. the cumulative distribution function: $\int_\Omega X~\mathrm{d} P = \int_{\mathbb{R}} x~\mathrm{d} F_X$.
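
For a finite sample space both sides of the change-of-variables identity reduce to sums, so the theorem can be verified exactly. The sketch below assumes two fair dice and takes $X$ to be the larger of the two faces; `lhs` integrates over the sample space and `rhs` integrates the identity against the induced distribution. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the larger of the two faces."""
    return max(w)

# Left-hand side: integrate X over the sample space.
lhs = sum(X(w) * p for w, p in P.items())

# Right-hand side: integrate the identity against the induced distribution.
mu = {}
for w, p in P.items():
    mu[X(w)] = mu.get(X(w), Fraction(0)) + p
rhs = sum(x * m for x, m in mu.items())

print(lhs, rhs)  # 161/36 161/36
```

The second sum has only six terms, one per value of `X`, which is the practical benefit of working with the distribution instead of the underlying probability space.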

The **characteristic function** of a random variable $X$ with measure $\mu$ is

$$\begin{aligned} \varphi_X (t) &\equiv \mathbb{E} e^{itX} = \int_{\mathbb{R}} e^{itx} \mathrm{d} \mu && \text{(scalar form)} \\ \Phi_{\mathbf{X}}(\mathbf{w}) &\equiv \mathbb{E}e^{i \mathbf{w}^T \mathbf{X}} = \mathcal{F} f_{\mathbf{X}}(\mathbf{x}) && \text{(vector form)} \end{aligned}$$

The characteristic function can be thought of as the Fourier transform of the PDF. But unlike the PDF, the characteristic function of a distribution always exists.

The characteristic function uniquely determines the distribution of a random variable; when the inverse transform is well defined, the PDF is recovered as $f_{\mathbf{X}}(\mathbf{x}) = \mathcal{F}^{-1} \Phi_{\mathbf{X}}(\mathbf{w})$.
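
Uniqueness can be illustrated in the simplest setting, an integer-valued random variable, where the point masses are recovered by $P(X = k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itk} \varphi_X(t)~\mathrm{d}t$. This is the discrete counterpart of the inversion formula, not the general Fourier inversion. The sketch below assumes a hypothetical three-point distribution and approximates the integral with an `n`-point Riemann sum over one period; `pmf`, `phi`, and `recover_mass` are illustrative names.

```python
import cmath
import math

# Hypothetical integer-valued distribution, used only for illustration.
pmf = {0: 0.5, 1: 0.3, 3: 0.2}

def phi(t):
    """Characteristic function E[exp(itX)] of the discrete distribution."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def recover_mass(k, n=64):
    """Approximate (1/2pi) * integral over [-pi, pi] of exp(-itk) * phi(t) dt
    by an n-point Riemann sum over one period."""
    total = 0j
    for j in range(n):
        t = -math.pi + 2.0 * math.pi * j / n
        total += cmath.exp(-1j * t * k) * phi(t)
    return (total / n).real

for k in range(5):
    print(k, round(recover_mass(k), 12))
```

Up to rounding, the recovered masses match `pmf` at 0, 1, and 3, and vanish at 2 and 4, so the characteristic function alone is enough to reconstruct this distribution.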

Weak convergence of random variables implies pointwise convergence of the corresponding characteristic functions.

If a random variable $X$ has moments up to the $k$-th order, then the characteristic function is $k$ times continuously differentiable on the entire real line.

If a characteristic function has a $k$-th derivative at 0, then the random variable has moments up to the $k$-th order if $k$ is even, and up to the $(k-1)$-th order if $k$ is odd.

If the right-hand side is well defined, the $k$-th moment can be computed as:

$$\mathbb{E} X^k = (-i)^k \varphi_X^{(k)} (0)$$
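
The moment formula can be checked numerically on a distribution with a known characteristic function. The sketch below assumes an Exponential(1) random variable, for which $\varphi_X(t) = 1/(1 - it)$, $\mathbb{E} X = 1$, and $\mathbb{E} X^2 = 2$, and estimates the derivatives at 0 by central finite differences. The helper names and the step size `h` are illustrative choices.

```python
def phi(t):
    """Characteristic function of the Exponential(1) distribution."""
    return 1.0 / (1.0 - 1j * t)

def derivative_at_zero(phi, k, h=1e-3):
    """Central finite-difference estimate of the k-th derivative of phi at 0."""
    if k == 1:
        return (phi(h) - phi(-h)) / (2.0 * h)
    if k == 2:
        return (phi(h) - 2.0 * phi(0.0) + phi(-h)) / (h * h)
    raise ValueError("only k = 1 and k = 2 are implemented in this sketch")

def moment(phi, k):
    """k-th raw moment from the formula E[X^k] = (-i)^k * phi^(k)(0)."""
    return ((-1j) ** k * derivative_at_zero(phi, k)).real

print(moment(phi, 1))  # close to 1
print(moment(phi, 2))  # close to 2
```

The estimates differ from the exact moments only by the finite-difference error, which is of order `h**2` here.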

Table: Standard Form of Dominant Moments

Name | Definition | Interpretation | Dimension | Range† |
---|---|---|---|---|
mean | first raw moment | central tendency | as is | $(-\infty, \infty)$ |
standard deviation | square root of the second central moment | variation | as is | $[0,\infty)$ |
skewness | normalized third central moment | lopsidedness | dimensionless | $(-\infty, \infty)$ |
excess kurtosis | normalized fourth central moment minus 3, so that the normal distribution has value 0 | (for symmetric distributions) concentration of probability in the center and tails rather than around one standard deviation from the mean | dimensionless | $[-2, \infty)$ |

† If it exists.
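
The four quantities in the table can be computed directly from a PMF. The sketch below assumes a discrete distribution given as a dictionary of point masses, and `standard_moments` is an illustrative name. The fair Bernoulli distribution is included because it attains the lower bound $-2$ of the excess kurtosis.

```python
import math

def standard_moments(pmf):
    """Return (mean, standard deviation, skewness, excess kurtosis)
    of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in pmf.items())

    def central(k):
        """k-th central moment."""
        return sum(p * (x - mean) ** k for x, p in pmf.items())

    sd = math.sqrt(central(2))
    skewness = central(3) / sd ** 3
    excess_kurtosis = central(4) / sd ** 4 - 3.0
    return mean, sd, skewness, excess_kurtosis

fair = {0: 0.5, 1: 0.5}    # Bernoulli(1/2): symmetric, two-point
rare = {0: 0.9, 1: 0.1}    # Bernoulli(1/10): lopsided, heavier tail

print(standard_moments(fair))
print(standard_moments(rare))
```

For the fair coin the skewness is 0 and the excess kurtosis is $-2$. For the rare event both are positive, which matches the interpretations given in the table.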

Classification of positive random variables by concentration: [@Taleb2018]

- compact support;
- sub-Gaussian: $\exists a > 0: 1 - F(x) = O(e^{-ax^2})$;
- Gaussian;
- sub-exponential: no exponential moment; sum dominated by the maximum for large values [@Embrechts1979];
- power law (p>3): finite mean & variance;
- power law (2<p≤3): finite mean;
- power law (1<p≤2);