An analytic approach to probability is naturally established if we map the sample space to some mathematical structure suitable for analysis; this is the motivation of random variables. When extending a deterministic variable to a stochastic one, the "first order uncertainty" is variance, not expectation.

**Random variable** χ is a measurable mapping from a probability space to a measurable space:
given $(\Omega, \Sigma, P)$, $\chi \in \mathcal{M}(\Omega, \Sigma; X, \Sigma_X)$.
The codomain X of a random variable should carry more structure than a sigma-algebra alone,
and it is usually a topological vector space, for example:
(1) a Euclidean space with the Lebesgue sigma-algebra, $(\mathbb{R}^n, \mathcal{L})$;
(2) a Banach or Hilbert space with the Borel sigma-algebra, $(H, \mathcal{B}(\mathcal{T}_d))$.
**Real random variable** is a measurable mapping from a probability space to
the real line with the Lebesgue sigma-algebra:
$\chi \in \mathcal{M}(\Omega, \Sigma; \mathbb{R}, \mathcal{L})$.
**Discrete random variable** is a measurable mapping from a probability space to
a finite or countable set with its power set:
$\chi \in \mathcal{M}(\Omega, \Sigma; n, 2^n)$, $n \in \mathbb{N}$;
or $\chi \in \mathcal{M}(\Omega, \Sigma; \mathbb{N}, 2^{\mathbb{N}})$.
**Continuous random variable** is a random variable
that assigns no positive probability to any value: $\forall x \in X$, $P(\chi = x) = 0$.

New constructs on the sample space.
**Induced sigma-algebra** $\Sigma_\chi$ on the domain Ω by a random variable
is the collection of preimages of measurable sets in the codomain:
$\Sigma_\chi := \{\chi^{-1}(B) : B \in \Sigma_X\}$.
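
As a minimal sketch of the induced sigma-algebra on a finite sample space, the snippet below enumerates the preimages of all subsets of the codomain. The example (one roll of a die observed through its parity) and the names `omega`, `chi`, `powerset`, and `preimage` are illustrative assumptions, not part of the definitions above.

```python
from itertools import chain, combinations

# Hypothetical example: one roll of a die, observed only through its parity.
omega = frozenset(range(1, 7))
codomain = (0, 1)  # 0 = even, 1 = odd

def chi(w):
    """Random variable: parity of the face."""
    return w % 2

def powerset(items):
    """All subsets of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )

def preimage(B):
    """chi^{-1}(B) as a subset of omega."""
    return frozenset(w for w in omega if chi(w) in B)

# Sigma_chi := { chi^{-1}(B) : B in the power set of the codomain }
sigma_chi = {preimage(B) for B in powerset(codomain)}

for event in sorted(sigma_chi, key=lambda e: (len(e), sorted(e))):
    print(sorted(event))
```

The induced sigma-algebra here has only four events (empty set, odd faces, even faces, the whole space), which is coarser than the power set of Ω: events such as {1, 2} cannot be described through χ alone.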

New constructs on the "analysis space".
**Distribution** $\mu$ of a random variable
is the probability measure induced on its codomain:
$\forall B \in \Sigma_X$, $\mu(B) = P(\chi^{-1}(B))$.
Now we have a new probability space $(X, \Sigma_X, \mu)$,
which comes with other structures for analysis.
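
The pushforward construction can be sketched on a finite sample space, assuming two fair dice and the sum of faces as the random variable. The names `omega`, `P`, `chi`, and `distribution` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def chi(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def distribution(B):
    """mu(B) = P(chi^{-1}(B)) for a set B of values in the codomain."""
    preimage = [w for w in omega if chi(w) in B]
    return sum((P[w] for w in preimage), Fraction(0))

mu_7 = distribution({7})
mu_total = distribution(set(range(2, 13)))
print(mu_7)      # 1/6
print(mu_total)  # 1
```

Once `distribution` is known, the original sample space is no longer needed to answer questions about the sum; all computation happens on the codomain.
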
**Cumulative distribution function** (CDF) $F_\chi(x)$ of a random variable
is the function that gives the probability of the lower orthant at each point, i.e. the nonpositive cone translated to that point:
$F_\chi(x) = \mu( \{y \in \mathbb{R}^n : y \le x\} )$.
CDF is an economical representation of the distribution:
its domain is a Euclidean space, it always exists,
and it is equivalent to the distribution if the codomain has a Borel sigma-algebra.
However, CDF can be less intuitive for vector-valued random variables,
because the partial order (nonnegative cones) on a Euclidean space
is not immediately useful in most cases.
**Probability density function** (PDF) $f_\chi(x)$ of a random variable
whose distribution is absolutely continuous w.r.t. the Lebesgue measure
is the non-negative function that relates the distribution to the Lebesgue measure:
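$f_\chi: \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, $\mathrm{d}\mu = f_\chi~\mathrm{d}\lambda$, i.e. $\mu(B) = \int_B f_\chi~\mathrm{d}\lambda$ for all measurable $B$.
For a real random variable, the PDF equals the derivative of the CDF almost everywhere:
$f_\chi (x) = \mathrm{d} F_\chi (x) / \mathrm{d} x$.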
**Probability mass function** (PMF) $f_\chi (x)$ of a discrete random variable
is the function that assigns each value of the random variable its induced measure:
$f_\chi (x) = P(\chi^{-1}(\{x\}))$.
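
The relation between PDF and CDF can be checked numerically. The sketch below assumes an exponential distribution with rate 1 as the example, and approximates the derivative of its CDF by a central difference; the function names and the step size `h` are illustrative choices.

```python
import math

def cdf(x, rate=1.0):
    """CDF of an exponential distribution (assumed example)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def pdf(x, rate=1.0):
    """PDF of the same exponential distribution."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
approx = numerical_derivative(cdf, x)
exact = pdf(x)
print(abs(approx - exact))  # small discretization error
```

The check is only meaningful away from x = 0, where this CDF has a kink and the derivative does not exist.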

**Expectation** $\mathbb{E} \chi$ of a random variable is its Lebesgue integral:
$\mathbb{E} \chi := \int_\Omega \chi~\mathrm{d}P = \int_X x~\mathrm{d} \mu$.
Lebesgue integral provides a uniform definition
for the expectation of discrete and continuous random variables,
and ensures completeness of function spaces, e.g. Banach and Hilbert spaces of functions.
*Theorem* (change of variables):
The Lebesgue integral of a real random variable on a probability space
equals the Stieltjes integral of the identity function w.r.t. the cumulative distribution function:
$\int_{\mathbb{R}} x~\mathrm{d} \mu = \int_{\mathbb{R}} x~\mathrm{d} F_\chi$.
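
The identity $\int_\Omega \chi~\mathrm{d}P = \int_X x~\mathrm{d}\mu$ can be verified directly in the finite case. This sketch assumes two fair dice with the sum of faces as the random variable; `omega`, `P`, `chi`, and `mu` are hypothetical names for this example.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: two fair dice with the uniform measure.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def chi(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

# Integral over the sample space: sum of chi(w) * P({w}).
expectation_on_omega = sum(chi(w) * P[w] for w in omega)

# Integral over the codomain: build the distribution mu, then sum x * mu({x}).
mu = defaultdict(Fraction)
for w in omega:
    mu[chi(w)] += P[w]
expectation_on_codomain = sum(x * m for x, m in mu.items())

print(expectation_on_omega, expectation_on_codomain)  # 7 7
```

The first sum ranges over 36 outcomes, the second over 11 values; both give the same number because the second merely groups the terms of the first by the value of χ.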

Expectation provides a summary of a random variable.
The expectation operator can also be applied to mappings of a random variable,
to obtain other summary information of the random variable.
**Moment of order k** or **k-th moment** $\mathbb{E} \chi^k$ of a real random variable
is the expectation of its k-th power.
**Characteristic function** $\varphi_\chi(t)$ of a random variable
can be thought of as the Fourier transform of the PDF, but unlike PDF it always exists:
(1) scalar version:
$\varphi_\chi (t) := \mathbb{E} e^{it\chi} = \int_{\mathbb{R}} e^{itx} \mathrm{d} \mu$;
for real random variables in particular,
$\varphi_\chi (t) = \int_{\mathbb{R}} e^{itx} \mathrm{d} F_\chi(x)$;
(2) vector version:
$\Phi_{\mathbf{\chi}}(\mathbf{w}) := \mathbb{E}e^{i \mathbf{w}^T \mathbf{\chi}}
= \mathcal{F} f_{\mathbf{\chi}}(\mathbf{x})$.
The characteristic function of a random variable uniquely determines
the distribution of the random variable:
$f_{\mathbf{\chi}}(\mathbf{x}) = \mathcal{F}^{-1} \Phi_{\mathbf{\chi}}(\mathbf{w})$.
Weak convergence of random variables implies
pointwise convergence of the corresponding characteristic functions.
If a real random variable has finite absolute moments up to the k-th order,
then its characteristic function is k times continuously differentiable on the entire real line:
if $\mathbb{E} |\chi|^k < \infty$, then $\varphi_\chi \in C^k(\mathbb{R})$.
If the characteristic function of a real random variable is k times differentiable at 0,
then the random variable has moments up to the k-th order if k is even,
and up to the (k-1)-th order if k is odd, which can be computed from the characteristic function:
if $\varphi_\chi^{(k)}(0)$ exists, then $\mathbb{E} \chi^l = (-i)^l \varphi_\chi^{(l)} (0)$
for all $l \le 2 \lfloor k/2 \rfloor$.
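
As a rough numerical check of the moment formula $\mathbb{E} \chi^l = (-i)^l \varphi_\chi^{(l)}(0)$, the sketch below assumes a fair die as the random variable and approximates the first two derivatives of its characteristic function at 0 by finite differences. The step size `h` and all names are illustrative assumptions.

```python
import cmath

# Assumed example: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def phi(t):
    """Characteristic function E[exp(i t chi)] of the discrete distribution."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

h = 1e-4
# Central differences for the first and second derivatives at t = 0.
d1 = (phi(h) - phi(-h)) / (2 * h)
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2

# Moments recovered from the derivatives: E[chi^l] = (-i)^l * phi^(l)(0).
m1 = ((-1j) ** 1 * d1).real
m2 = ((-1j) ** 2 * d2).real

# Moments computed directly from the PMF, for comparison.
m1_direct = sum(x * p for x, p in pmf.items())
m2_direct = sum(x**2 * p for x, p in pmf.items())

print(m1, m1_direct)
print(m2, m2_direct)
```

The two pairs agree up to the finite-difference error; the direct moments are 3.5 and 91/6.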

Name | Definition | Interpretation | Dimension | Range† |
---|---|---|---|---|
mean | first raw moment | central tendency | as is | $(-\infty, \infty)$ |
standard deviation | square root of the second central moment | variation | as is | $[0,\infty)$ |
skewness | normalized third central moment | lopsidedness | dimensionless | $(-\infty, \infty)$ |
excess kurtosis | normalized fourth central moment minus 3, its value for the normal distribution | (for symmetric distributions) probability concentration in the center and tails relative to the standard deviation | dimensionless | $[-2, \infty)$ |

† If it exists.
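
The four summaries in the table can be computed from a finite PMF as follows. This is a minimal sketch: the helper `summaries` and the two example distributions (a fair coin and a fair die) are assumptions made for illustration.

```python
import math

def summaries(pmf):
    """Mean, standard deviation, skewness, and excess kurtosis of a finite PMF."""
    mean = sum(x * p for x, p in pmf.items())

    def central(k):
        # k-th central moment: E[(chi - mean)^k]
        return sum((x - mean) ** k * p for x, p in pmf.items())

    variance = central(2)
    std = math.sqrt(variance)
    skewness = central(3) / std**3
    excess_kurtosis = central(4) / variance**2 - 3.0
    return mean, std, skewness, excess_kurtosis

fair_coin = {0: 0.5, 1: 0.5}
fair_die = {x: 1 / 6 for x in range(1, 7)}

coin_stats = summaries(fair_coin)
die_stats = summaries(fair_die)
print(coin_stats)
print(die_stats)
```

The fair coin attains the lower bound of the excess kurtosis range, -2, since all of its mass sits exactly one standard deviation from the mean.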

Classification of positive random variables by concentration: [@Taleb2018]

- compact support;
- sub-Gaussian: $\exists a > 0: 1 - F(x) = \mathcal{O}(e^{-ax^2})$;
- Gaussian;
- sub-exponential: no exponential moment; sum dominated by the maximum for large values [@Embrechts1979];
- power law (p>3): finite mean & variance;
- power law (2<p≤3): finite mean;
- power law (1<p≤2): infinite mean.

Other summaries of a random variable.
**Mode** of a random variable with a PDF is the set of local maxima of the PDF, if any exist.
**Mode** of a discrete random variable is the set of (global) maxima of the PMF.
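**p-quantile** $F^{-1}(p)$ of a real random variable is a value x with $P(\chi \le x) \ge p$ and $P(\chi \ge x) \ge 1 - p$; if the CDF is continuous and strictly increasing, it is the preimage of p via the CDF.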
**Median** is the 0.5-quantile: $F^{-1}(0.5)$.
Comparison of mean, mode, and median:
(1) Expectation, or mean, applies to any type of random variable.
It may not exist, but it is unique if it exists.
(2) Mode applies to any random variable with a PDF or a PMF.
For the PDF case, it may not exist and may not be unique.
For the PMF case, it always exists but may not be unique.
(3) Median applies to any real random variable.
It always exists, but may not be unique.
For a real random variable with a symmetric unimodal PDF,
its mean (if it exists), median, and mode coincide.
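
The comparison can be made concrete on a small skewed PMF, where the three summaries differ and the median is not unique. This sketch assumes a hypothetical three-point distribution and uses the generalized inverse of the CDF (smallest x with F(x) ≥ p) as the working definition of a quantile; `lower_quantile` and `upper_quantile` are names introduced here.

```python
from fractions import Fraction

# Assumed example: a skewed distribution on three points.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 10: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(chi <= x)."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def lower_quantile(p):
    """Smallest support point x with F(x) >= p (generalized inverse of the CDF)."""
    return min(v for v in pmf if cdf(v) >= p)

def upper_quantile(p):
    """Smallest support point x with F(x) > p (requires p < 1)."""
    return min(v for v in pmf if cdf(v) > p)

mean = sum(v * p for v, p in pmf.items())
modes = {v for v, p in pmf.items() if p == max(pmf.values())}
half = Fraction(1, 2)
median_interval = (lower_quantile(half), upper_quantile(half))

print(mean)             # 7/2
print(modes)            # {1}
print(median_interval)  # (1, 2)
```

Every point between 1 and 2 is a median here, while the mode is unique and the mean is pulled toward the large value 10.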