An analytic approach to probability is naturally established if we map the sample space to some mathematical structure suitable for analysis; this is the motivation of random variables. When extending a deterministic variable to a stochastic one, the "first order uncertainty" is variance, not expectation.

## Random Variable

Random variable $\chi: \Omega \mapsto X$ is a measurable mapping from a probability space $(\Omega, \Sigma, P)$ to a measurable space $(X, \Sigma_X)$, where $\Sigma_X$ denotes the sigma-algebra on the codomain $X$. The codomain of a random variable is typically the real line $\mathbb{R}$ with the Borel sigma-algebra $\mathcal{B(T_d)}$ or the Lebesgue sigma-algebra $\mathcal{L}$, or a Banach or Hilbert space with a sigma-algebra. Sigma-algebra induced by a random variable is the class of preimages of measurable sets in the codomain: $\Sigma_\chi := \{\chi^{-1}(B) : B \in \Sigma_X\}$.
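On a finite sample space the induced sigma-algebra can be enumerated directly. The sketch below is only an illustration: it assumes the codomain carries the power set of the range of $\chi$ as its sigma-algebra, and the names `power_set`, `induced_sigma_algebra`, and the parity example are hypothetical.

```python
from itertools import combinations

def power_set(values):
    """All subsets of a finite collection, as frozensets."""
    items = list(values)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def induced_sigma_algebra(omega, chi):
    """Collect the preimages chi^{-1}(B) for every subset B of the range of chi."""
    codomain = {chi(w) for w in omega}
    return {frozenset(w for w in omega if chi(w) in B)
            for B in power_set(codomain)}

# Hypothetical example: one roll of a die, observed only through its parity.
omega = frozenset(range(1, 7))
parity = lambda w: w % 2
sigma_chi = induced_sigma_algebra(omega, parity)
```

Here `sigma_chi` contains only the empty set, the odd outcomes, the even outcomes, and the whole sample space: the random variable cannot distinguish outcomes of equal parity, so its sigma-algebra is coarser than the power set of $\Omega$.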

## Distribution

Distribution of a random variable $\chi: \Omega \mapsto X$ is the probability measure $\mu: \Sigma_X \mapsto [0,1]$ induced on its codomain: $\mu(B) := P(\chi^{-1}(B))$.

Cumulative distribution function (CDF) of a real random variable $\chi: \Omega \mapsto \mathbb{R}$ is the real function $F_\chi(x) := \mu((-\infty, x])$. Cumulative distribution function $F_\chi$ is a convenient characterization of the distribution $\mu$: CDF always exists, and is equivalent to the distribution if the sigma-algebra of the real line is Borel. Probability density function (PDF) of a real random variable is the derivative of its cumulative distribution function, if the derivative exists: $f_\chi (x) := \mathrm{d} F_\chi (x) / \mathrm{d} x$. Probability mass function (PMF) of a discrete random variable is the real function that assigns each value of the random variable its induced measure: $f_\chi (x) := P(\chi^{-1}(\{x\}))$.
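As a small illustration of these definitions, the following sketch computes the PMF and CDF of a discrete random variable directly from the underlying probability space. The example (the sum of two fair dice) and the names `omega`, `P`, `chi`, `pmf`, and `cdf` are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, len(omega)) for w in omega}

def chi(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

def pmf(x):
    """f(x) = P(chi^{-1}({x})): total mass of outcomes mapped to x."""
    return sum(p for w, p in P.items() if chi(w) == x)

def cdf(x):
    """F(x) = mu((-inf, x]) = P(chi <= x)."""
    return sum(p for w, p in P.items() if chi(w) <= x)
```

For instance, `pmf(7)` evaluates to `Fraction(1, 6)` and `cdf(7)` to `Fraction(7, 12)`, since 21 of the 36 outcomes have a sum of at most 7.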

## Expectation

Expectation of a random variable is its Lebesgue integral: $\mathbb{E} \chi := \int_\Omega \chi~\mathrm{d}P$. Lebesgue integral provides a uniform definition for the expectation of discrete and continuous random variables, and ensures completeness of the associated function spaces, e.g. Banach and Hilbert spaces of functions.

Theorem (change of variables): The Lebesgue integral of a real random variable on a probability space equals the Stieltjes integral of the identity function w.r.t. the cumulative distribution function: $\int_\Omega \chi~\mathrm{d} P = \int_X x~\mathrm{d} \mu = \int_{\mathbb{R}} x~\mathrm{d} F_\chi$.
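The theorem can be checked by hand on a finite probability space, where both integrals reduce to finite sums. The sketch below reuses a hypothetical two-dice example (an assumption, not part of the theorem): it integrates $\chi$ over $\Omega$ with respect to $P$, then integrates the identity over the codomain with respect to the induced measure $\mu$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space of two fair dice with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}
chi = lambda w: w[0] + w[1]

# Left-hand side: integrate chi over the sample space w.r.t. P.
e_omega = sum(chi(w) * p for w, p in P.items())

# Right-hand side: push P forward to mu, then integrate the identity w.r.t. mu.
mu = defaultdict(Fraction)
for w, p in P.items():
    mu[chi(w)] += p
e_codomain = sum(x * m for x, m in mu.items())
```

Both `e_omega` and `e_codomain` equal 7; the second sum has only 11 terms instead of 36, which is the practical benefit of working with the distribution rather than the sample space.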

## Moments and Characteristic Function

The characteristic function of a random variable $\chi$ with measure $\mu$ is

$$
\begin{aligned}
\varphi_\chi (t) &\equiv \mathbb{E} e^{it\chi} = \int_{\mathbb{R}} e^{itx}~\mathrm{d} \mu(x) && \text{(scalar form)} \\
\Phi_{\mathbf{\chi}}(\mathbf{w}) &\equiv \mathbb{E}e^{i \mathbf{w}^T \mathbf{\chi}} = \mathcal{F} f_{\mathbf{\chi}}(\mathbf{x}) && \text{(vector form)}
\end{aligned}
$$

The characteristic function can be thought of as the Fourier transform of the PDF. But unlike the PDF, the characteristic function of a distribution always exists.

The characteristic function uniquely determines the distribution of a random variable: $f_{\mathbf{\chi}}(\mathbf{x}) = \mathcal{F}^{-1} \Phi_{\mathbf{\chi}}(\mathbf{w})$.
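To make the Fourier-transform reading concrete, the following sketch approximates $\mathbb{E} e^{it\chi}$ by numerically integrating $e^{itx} f_\chi(x)$ for a standard normal density, whose characteristic function is known to be $e^{-t^2/2}$. The helper names `char_fn` and `normal_pdf`, the midpoint rule, and the truncation to $[-10, 10]$ are choices made for this illustration only.

```python
import cmath
import math

def char_fn(pdf, t, lo=-10.0, hi=10.0, n=20_000):
    """Approximate the integral of exp(i t x) pdf(x) dx with the midpoint rule."""
    dx = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * dx  # midpoint of the k-th subinterval
        total += cmath.exp(1j * t * x) * pdf(x)
    return total * dx

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

phi_0 = char_fn(normal_pdf, 0.0)  # should be close to 1 (total mass)
phi_1 = char_fn(normal_pdf, 1.0)  # should be close to exp(-1/2)
```

The value at $t = 0$ is always 1, since it is just the total probability mass; the imaginary part of `phi_1` is numerically zero because the density is symmetric.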

Weak convergence of random variables implies pointwise convergence of the corresponding characteristic functions.

If a random variable $\chi$ has moments up to the $k$-th order, then the characteristic function is $k$ times continuously differentiable on the entire real line.

If a characteristic function has a $k$-th derivative at 0, then the random variable has moments up to the $k$-th order if $k$ is even, and up to the $(k-1)$-th order if $k$ is odd.

If the right-hand side is well defined, the $k$-th moment can be computed as:

$$\mathbb{E} \chi^k = (-i)^k \varphi_\chi^{(k)} (0)$$
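As a worked example of the moment formula, consider an exponential random variable with rate $\lambda > 0$ (an example chosen here for illustration), whose characteristic function is $\varphi_\chi(t) = \lambda / (\lambda - it)$. Differentiating twice and evaluating at 0 gives the first two raw moments:

$$
\begin{aligned}
\varphi_\chi'(t) &= \frac{i \lambda}{(\lambda - it)^2}, & \mathbb{E} \chi &= (-i)\, \varphi_\chi'(0) = (-i) \frac{i}{\lambda} = \frac{1}{\lambda}, \\
\varphi_\chi''(t) &= \frac{-2 \lambda}{(\lambda - it)^3}, & \mathbb{E} \chi^2 &= (-i)^2 \varphi_\chi''(0) = (-1) \frac{-2}{\lambda^2} = \frac{2}{\lambda^2}.
\end{aligned}
$$

These agree with the known mean $1/\lambda$ and variance $\mathbb{E} \chi^2 - (\mathbb{E} \chi)^2 = 1/\lambda^2$ of the exponential distribution.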

Table: Standard Form of Dominant Moments

| Name | Definition | Interpretation | Dimension | Range† |
|---|---|---|---|---|
| mean | first raw moment | central tendency | as is | $(-\infty, \infty)$ |
| standard deviation | square root of the second central moment | variation | as is | $[0,\infty)$ |
| skewness | normalized third central moment | lopsidedness | dimensionless | $(-\infty, \infty)$ |
| excess kurtosis | normalized fourth central moment minus 3, so that the normal distribution has value 0 | (for symmetric distribution) probability concentration on center and tails against the standard deviations | dimensionless | $[-2, \infty)$ |

† If it exists.
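The four quantities in the table can be computed directly from a finite PMF. The sketch below is a minimal illustration with a hypothetical helper `standard_moments`; it assumes the distribution has nonzero variance and uses the usual normalizations (third and fourth central moments divided by the third and fourth powers of the standard deviation, with 3 subtracted for excess kurtosis).

```python
import math

def standard_moments(pmf):
    """Mean, standard deviation, skewness and excess kurtosis of a finite PMF.

    `pmf` maps each value to its probability; nonzero variance is assumed.
    """
    mean = sum(x * p for x, p in pmf.items())

    def central(k):
        """k-th central moment."""
        return sum((x - mean) ** k * p for x, p in pmf.items())

    std = math.sqrt(central(2))
    skewness = central(3) / std ** 3
    excess_kurtosis = central(4) / std ** 4 - 3.0
    return mean, std, skewness, excess_kurtosis

# Symmetric two-point distribution: attains the lower bound -2 of excess kurtosis.
coin = {-1: 0.5, 1: 0.5}
# Fair die, for comparison.
die = {k: 1 / 6 for k in range(1, 7)}

coin_moments = standard_moments(coin)
die_moments = standard_moments(die)
```

The two-point distribution puts all its mass exactly one standard deviation from the mean, which is why it reaches the lower end of the range $[-2, \infty)$; the fair die has excess kurtosis $-222/175 \approx -1.27$.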

Classification of positive random variables by concentration: [@Taleb2018]

1. compact support;
2. sub-Gaussian: $\exists a > 0: 1 - F(x) = O(e^{-ax^2})$;
3. Gaussian;
4. sub-exponential: no exponential moment; sum dominated by the maximum for large values [@Embrechts1979];
5. power law (p>3): finite mean & variance;
6. power law (2<p≤3): finite mean;
7. power law (1<p≤2);