An analytic approach to probability is naturally established if we map the sample space to some mathematical structure suitable for analysis; this is the motivation for random variables. When extending a deterministic variable to a stochastic one, the "first-order uncertainty" is variance, not expectation.

Random Variable

A random variable $\chi$ is a measurable mapping from a probability space to a measurable space: given $(\Omega, \Sigma, P)$, $\chi \in \mathcal{M}(\Omega, \Sigma; X, \Sigma_X)$. The codomain $X$ of a random variable should have more structure than just a sigma-algebra, and it is usually a topological vector space, for example: (1) a Euclidean space with the Lebesgue sigma-algebra, $(\mathbb{R}^n, \mathcal{L})$; (2) a Banach or Hilbert space with the Borel sigma-algebra generated by its metric topology, $(H, \mathcal{B}(\mathcal{T}_d))$. A real random variable is a measurable mapping from a probability space to the real line with the Lebesgue sigma-algebra: $\chi \in \mathcal{M}(\Omega, \Sigma; \mathbb{R}, \mathcal{L})$. A discrete random variable is a measurable mapping from a probability space to a finite or countable set with its power set: $\chi \in \mathcal{M}(\Omega, \Sigma; n, 2^n)$, $n \in \mathbb{N}$; or $\chi \in \mathcal{M}(\Omega, \Sigma; \mathbb{N}, 2^{\mathbb{N}})$.

New constructs on the sample space. Induced sigma-algebra $\Sigma_\chi$ on the domain Ω by a random variable is the collection of preimages of measurable sets in the codomain: $\Sigma_\chi := \{\chi^{-1}(B) : B \in \Sigma_X\}$.
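
As a concrete illustration, the induced sigma-algebra of a finite random variable can be enumerated directly. A minimal Python sketch, using a hypothetical example (a die roll mapped to its parity; the names `omega` and `chi` are illustrative, not standard):

```python
from itertools import chain, combinations

# Hypothetical example: a fair-die sample space mapped to parity (0 = even, 1 = odd).
omega = {1, 2, 3, 4, 5, 6}
chi = {w: w % 2 for w in omega}

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

# Sigma_chi: preimages of every measurable set B in the codomain's power set.
codomain = set(chi.values())
sigma_chi = {frozenset(w for w in omega if chi[w] in B) for B in powerset(codomain)}
print(sorted(sorted(s) for s in sigma_chi))
# → [[], [1, 2, 3, 4, 5, 6], [1, 3, 5], [2, 4, 6]]
```

The induced sigma-algebra has only four elements, much coarser than $2^\Omega$: parity cannot distinguish outcomes within the evens or within the odds.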

New constructs on the "analysis space". Distribution $\mu$ of a random variable is the probability measure induced on its codomain: $\forall B \in \Sigma_X$, $\mu(B) = P(\chi^{-1}(B))$. Now we have a new probability space $(X, \Sigma_X, \mu)$, which comes with other structures for analysis.

Cumulative distribution function (CDF) $F_\chi(x)$ of a random variable is the function that assigns to each point the probability of its lower set, i.e. the point minus the nonnegative cone: $F_\chi(x) = \mu( \{y \in \mathbb{R}^n : y \le x\} )$. The CDF is an economical representation of the distribution: its domain is a Euclidean space, it always exists, and it is equivalent to the distribution if the codomain has a Borel sigma-algebra. However, the CDF can be less intuitive for vector-valued random variables, because the partial order (by nonnegative cones) on a Euclidean space is not immediately useful in most cases.

Probability density function (PDF) $f_\chi(x)$ of a random variable whose distribution is absolutely continuous w.r.t. the Lebesgue measure is the nonnegative function that relates the distribution to the Lebesgue measure: $f_\chi: \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, $\mu = f_\chi \lambda$. Equivalently, the PDF can be defined as the derivative of the CDF, if it exists: $f_\chi (x) = \mathrm{d} F_\chi (x) / \mathrm{d} x$.

Probability mass function (PMF) $f_\chi (x)$ of a discrete random variable is the function that assigns to each value of the random variable its induced measure: $f_\chi (x) = P(\chi^{-1}(\{x\})) = \mu(\{x\})$.
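
The relation between PDF and CDF can be checked numerically. A minimal Python sketch, using the Exponential(1) distribution, where $F(x) = 1 - e^{-x}$ and $f(x) = e^{-x}$:

```python
import math

# Exponential(1): CDF F(x) = 1 - exp(-x), PDF f(x) = exp(-x).
F = lambda x: 1.0 - math.exp(-x)
f = lambda x: math.exp(-x)

# A central difference of the CDF recovers the PDF: f(x) ≈ (F(x+h) - F(x-h)) / (2h).
h = 1e-6
errors = [abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) for x in (0.5, 1.0, 2.0)]
print(max(errors))  # tiny: numerically differentiating F recovers f
```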


Expectation $\mathbb{E} \chi$ of a random variable is its Lebesgue integral: $\mathbb{E} \chi := \int_\Omega \chi~\mathrm{d}P = \int_X x~\mathrm{d} \mu$. The Lebesgue integral provides a uniform definition for the expectation of discrete and continuous random variables, and ensures closure of function spaces, e.g. Banach and Hilbert spaces of functions. Theorem (change of variables): The Lebesgue integral of a real random variable on a probability space equals the Stieltjes integral of the identity function w.r.t. the cumulative distribution function: $\int_{\mathbb{R}} x~\mathrm{d} \mu = \int_{\mathbb{R}} x~\mathrm{d} F_\chi$.
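
The change-of-variables theorem can be checked by approximating the Stieltjes integral $\int x~\mathrm{d} F_\chi$ with a finite sum. A sketch for the Exponential(1) distribution, whose mean is 1 (the grid size and truncation point are arbitrary choices for this illustration):

```python
import math

# Exponential(1): F(x) = 1 - exp(-x), so the Stieltjes integral of x against F
# should equal the mean, which is 1.
F = lambda x: 1.0 - math.exp(-x)

# Left Riemann–Stieltjes sum of x against F on a truncated grid over [0, 20].
xs = [i * 1e-4 for i in range(200_001)]
mean = sum(x * (F(x_next) - F(x)) for x, x_next in zip(xs, xs[1:]))
print(mean)  # ≈ 1.0
```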

Expectation provides a summary of a random variable. The expectation operator can also be applied to mappings of a random variable, to obtain other summary information about the random variable. Moment of order $k$, or $k$-th moment, $\mathbb{E} \chi^k$ of a real random variable is the expectation of its $k$-th power. Characteristic function $\varphi_\chi(t)$ of a random variable can be thought of as the Fourier transform of the PDF, but unlike the PDF it always exists: (1) scalar version: $\varphi_\chi (t) := \mathbb{E} e^{it\chi} = \int_{\mathbb{R}} e^{itx} \mathrm{d} \mu$; for real random variables in particular, $\varphi_\chi (t) = \int_{\mathbb{R}} e^{itx} \mathrm{d} F_\chi(x)$; (2) vector version: $\Phi_{\chi}(\mathbf{w}) := \mathbb{E}e^{i \mathbf{w}^T \chi} = \mathcal{F} f_{\chi}(\mathbf{x})$. The characteristic function of a random variable uniquely determines its distribution: $f_{\chi}(\mathbf{x}) = \mathcal{F}^{-1} \Phi_{\chi}(\mathbf{w})$. Weak convergence of random variables implies pointwise convergence of the corresponding characteristic functions. If a real random variable has moments up to the $k$-th order, then its characteristic function is $k$ times continuously differentiable on the entire real line: $\mathbb{E} \chi^k \in \mathbb{R}$ implies $\varphi_\chi \in C^k(\mathbb{R})$. Conversely, if the characteristic function of a real random variable is $k$ times differentiable at 0, then the random variable has moments up to the $k$-th order if $k$ is even, and up to the $(k-1)$-th order if $k$ is odd, and these moments can be computed from the derivatives at zero: if $\varphi_\chi^{(k)}(0)$ exists, then $\mathbb{E} \chi^l = (-i)^l \varphi_\chi^{(l)} (0)$ for all $l \le 2 \lfloor k/2 \rfloor$.
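
The moment formula $\mathbb{E} \chi^l = (-i)^l \varphi_\chi^{(l)}(0)$ can be verified numerically. A sketch using the Exponential(1) distribution, whose characteristic function is $\varphi(t) = 1/(1 - it)$ and whose first two moments are 1 and 2:

```python
# Characteristic function of Exponential(1): phi(t) = 1 / (1 - i t).
phi = lambda t: 1.0 / (1.0 - 1j * t)

h = 1e-5
# First moment: E[X] = -i * phi'(0), via a central difference.
m1 = (-1j * (phi(h) - phi(-h)) / (2 * h)).real
# Second moment: E[X^2] = (-i)**2 * phi''(0), via a second central difference.
m2 = ((-1j) ** 2 * (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2).real
print(m1, m2)  # ≈ 1.0 and 2.0
```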

Table: Standard form of dominant moments.

| Name | Definition | Interpretation | Dimension | Range† |
|------|------------|----------------|-----------|--------|
| mean | first raw moment | central tendency | as is | $(-\infty, \infty)$ |
| standard deviation | square root of the second central moment | variation | as is | $[0, \infty)$ |
| skewness | normalized third central moment | lopsidedness | dimensionless | $(-\infty, \infty)$ |
| excess kurtosis | normalized fourth central moment, minus that of the normal distribution | (for symmetric distributions) probability concentration in the center and tails, relative to the standard deviation | dimensionless | $[-2, \infty)$ |

† If it exists.
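
The standardized moments in the table can be estimated from data. A minimal Python sketch (plain sample moments, not bias-corrected estimators); for Gaussian samples both skewness and excess kurtosis should be near 0:

```python
import math
import random

def standardized_moments(xs):
    """Mean, standard deviation, skewness, and excess kurtosis of a sample."""
    n = len(xs)
    m = sum(xs) / n
    central = lambda k: sum((x - m) ** k for x in xs) / n
    sd = math.sqrt(central(2))
    skew = central(3) / sd ** 3          # dimensionless
    ex_kurt = central(4) / sd ** 4 - 3   # 0 for the normal distribution
    return m, sd, skew, ex_kurt

random.seed(0)
m, sd, skew, ex_kurt = standardized_moments([random.gauss(0, 1) for _ in range(100_000)])
print(round(skew, 2), round(ex_kurt, 2))  # both near 0 for Gaussian samples
```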

Classification of positive random variables by concentration: [@Taleb2018]

  1. compact support;
  2. sub-Gaussian: $\exists a > 0: 1 - F(x) = \mathcal{O}(e^{-ax^2})$;
  3. Gaussian;
  4. sub-exponential: no finite exponential moment; sum dominated by the maximum for large values [@Embrechts1979];
  5. power law (p>3): finite mean & variance;
  6. power law (2<p≤3): finite mean;
  7. power law (1<p≤2);
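
The defining behavior of the sub-exponential class, a sum dominated by its maximum, can be demonstrated by simulation. A sketch comparing a power-law tail ($p = 1.5$, sampled by the inverse-transform method) with an exponential tail:

```python
import random

random.seed(0)
n = 100_000

# Power law with tail exponent p = 1.5 via inverse-transform sampling:
# if U ~ Uniform(0, 1], then U**(-1/1.5) has survival function x**(-1.5) for x >= 1.
heavy = [(1.0 - random.random()) ** (-1 / 1.5) for _ in range(n)]
light = [random.expovariate(1.0) for _ in range(n)]

heavy_ratio = max(heavy) / sum(heavy)
light_ratio = max(light) / sum(light)
print(heavy_ratio, light_ratio)  # the maximum dominates the sum far more under the power law
```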


Other summaries of a random variable. Mode of a random variable with a PDF is the set of local maxima of the PDF, if any exist. Mode of a discrete random variable is the set of (global) maxima of the PMF. p-quantile $F^{-1}(p)$ of a real random variable is the preimage of $p$ under the CDF. Median is the 0.5-quantile: $F^{-1}(0.5)$. Comparison of mean, mode, and median: (1) Expectation, or mean, applies to any type of random variable. It may not exist, but it is unique if it exists. (2) Mode applies to any random variable with a PDF or a PMF. For the PDF case, it may not exist and may not be unique. For the PMF case, it always exists but may not be unique. (3) Median applies to any real random variable. It always exists, but may not be unique. For a real random variable with a symmetric unimodal PDF, its mean (if it exists), median, and mode coincide.
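
The comparison of mean, median, and mode can be made concrete on a small discrete distribution; the PMF below is a hypothetical example:

```python
# Hypothetical right-skewed PMF on {0, 1, 2, 3, 4}.
pmf = {0: 0.05, 1: 0.40, 2: 0.30, 3: 0.15, 4: 0.10}

mean = sum(x * p for x, p in pmf.items())
mode = [x for x, p in pmf.items() if p == max(pmf.values())]  # global maxima of the PMF

# Median as a 0.5-quantile: the smallest x whose CDF reaches 0.5.
cum, median = 0.0, None
for x in sorted(pmf):
    cum += pmf[x]
    if median is None and cum >= 0.5:
        median = x

print(round(mean, 2), median, mode)  # 1.85 2 [1]
```

Here the three summaries disagree, as expected for an asymmetric distribution.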

🏷 Category=Probability