Notes on Lebesgue measure and integration
Algebra of sets or set algebra $(\mathcal{A}, (\cup, \cap; \complement))$ is an algebraic system consisting of a non-empty subset $\mathcal{A}$ of a power set $\mathcal{P}(X)$ and two operations: (1) union $\cup$ and/or intersection $\cap$ and (2) complement $\complement$, such that the system is closed under any finite composition of these operations. We call X the underlying set of the set algebra. Any set algebra is a Boolean algebra. An algebra of sets is equivalently a ring of sets that contains the underlying set. Subalgebra $\mathcal{A}'$ of a set algebra is a subset of the set algebra that is also a set algebra: $\mathcal{A}' \subset \mathcal{A}$. The collection $\{\emptyset, X\}$ of the empty set and the underlying set is a subalgebra of any set algebra based on X, and thus is the smallest set algebra based on X. Any set algebra is a subalgebra of the power set of its underlying set, and thus the largest set algebra based on a set is its power set. The union of members of any set algebra equals its underlying set: $\bigcup \mathcal{A} = X$. Algebra of sets $\mathcal{A}(\mathcal{S})$ generated by a class $\mathcal{S}$ of subsets of a set X is the smallest set algebra based on X that contains the class: $\mathcal{A}(\mathcal{S}) = \bigcap \{\mathcal{A} : \mathcal{S} \subset \mathcal{A} \subset \mathcal{P}(X)\}$, where the intersection ranges over set algebras $\mathcal{A}$.
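On a finite underlying set, the generated algebra can be computed by brute force: start from the class, add the empty set and X, and keep closing under complement and union until nothing new appears (on a finite set the result is also the generated sigma-algebra). A minimal sketch, assuming a small finite X; the function name `generated_algebra` is hypothetical.

```python
from itertools import combinations

def generated_algebra(X, S):
    """Smallest family of subsets of the finite set X that contains every set
    in S and is closed under complement and finite union (hence intersection)."""
    X = frozenset(X)
    algebra = {frozenset(), X} | {frozenset(A) for A in S}
    while True:
        new = set(algebra)
        new |= {X - A for A in algebra}                       # complements
        new |= {A | B for A, B in combinations(algebra, 2)}   # pairwise unions
        if new == algebra:                                    # fixed point: closed
            return algebra
        algebra = new

A = generated_algebra({1, 2, 3, 4}, [{1}, {2}])
print(len(A))  # 8: all unions of the atoms {1}, {2}, {3, 4}
```

The fixed-point loop terminates because there are only finitely many subsets of X; it is exponential in general and meant only for small examples.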
Sigma-algebra $(\Sigma, (\cup_{\mathbb{N}}, \cap_{\mathbb{N}}; \complement))$ is a set algebra that is closed under countable unions $\cup_{\mathbb{N}}$ and/or countable intersections $\cap_{\mathbb{N}}$. By De Morgan's laws, closure under countable unions and closure under countable intersections are equivalent. Sigma-algebra $\Sigma(\mathcal{S})$ generated by a subset of a power set is the smallest sigma-algebra based on the underlying set that contains the collection: $\Sigma(\mathcal{S}) = \bigcap \{\Sigma : \mathcal{S} \subset \Sigma \subset \mathcal{P}(X)\}$. Borel sigma-algebra $\mathcal{B(T)}$ is the sigma-algebra generated by a topology: $\mathcal{B(T)} = \Sigma(\mathcal{T})$.
Field of sets $(X, \mathcal{A})$ is a set and a set algebra based on the set. Here, "field" is not in the same sense as "field" in abstract algebra. Measurable space $(X, \Sigma)$ is a set endowed with a sigma-algebra. Measurable set in a measurable space is a set in its sigma-algebra. Borel measurable space $(X, \mathcal{T}, \mathcal{B(T)})$ is a topological space endowed with its Borel sigma-algebra. Borel set in a Borel measurable space is a set in its sigma-algebra.
Measurable mapping is a mapping between two measurable spaces such that the preimage of any measurable set is measurable: $f: X \mapsto Y$, $\{f^{-1}(B) : B \in \Sigma_Y\} \subset \Sigma_X$. I will use $\mathcal{M}(X, \Sigma_X; Y, \Sigma_Y)$ to denote the class of measurable mappings between two measurable spaces; if the sigma-algebras are Borel, and the topologies are unambiguous, the notation is simplified to $\mathcal{M}(X, Y)$. The class of measurable mappings is closed under composition. Every continuous mapping between Borel measurable spaces is measurable: $C(X, Y) \subset \mathcal{M}(X, Y)$. Measurable space isomorphism $f: (X, \Sigma_X) \cong (Y, \Sigma_Y)$ is a measurable mapping with a measurable inverse: $f \in \mathcal{M}(X, \Sigma_X; Y, \Sigma_Y)$, $f^{-1} \in \mathcal{M}(Y, \Sigma_Y; X, \Sigma_X)$. Measurable function is a measurable mapping to the Borel measurable space of real numbers with the usual topology. The class $\mathcal{M}(X, \Sigma)$ of measurable functions is closed under algebraic operations (addition, scalar multiplication, and pointwise multiplication) and lattice operations (pointwise maximum and minimum, and countable suprema and infima wherever they are finite).
Measure $\mu: \Sigma \mapsto [0, \infty]$ is a mapping from a sigma-algebra to the extended nonnegative real numbers with $\mu(\emptyset) = 0$ that is countably additive over mutually disjoint sets: $\mu(\sqcup_{i \in \mathbb{N}} A_i) = \sum_{i=1}^\infty \mu(A_i)$. The sum, whether or not it converges, is independent of the order of the sets because the terms are nonnegative. Measure specifies the sizes of measurable sets of a measurable space. Inner measure $\mu_∗$ induced by a measure on a measurable space is the set function on the power set that maps each subset to the supremum of the measures of measurable subsets of the subset: $\forall A \in \mathcal{P}(X)$, $\mu_∗(A) = \sup\{\mu(B) : B \in [\emptyset, A] \cap \Sigma\}$. Outer measure $\mu^∗$ induced by a measure on a measurable space is the set function on the power set that maps each subset to the infimum of the measures of measurable subsets that include the subset: $\forall A \in \mathcal{P}(X)$, $\mu^∗(A) = \inf\{\mu(B) : B \in [A, X] \cap \Sigma\}$. In general neither is a measure on the power set, but both agree with μ on Σ.
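When the sigma-algebra of a finite set is generated by a partition into atoms, the measurable sets are exactly the unions of atoms, so the supremum and infimum above are attained: the inner measure sums the atoms inside A and the outer measure sums the atoms that meet A. A sketch under that assumption; the function name `inner_outer` and the example data are hypothetical.

```python
from fractions import Fraction

def inner_outer(atoms, weights, A):
    """Inner and outer measure of an arbitrary subset A of a finite set whose
    sigma-algebra is generated by the partition `atoms` (atom i has measure
    weights[i])."""
    A = set(A)
    inner = sum(w for atom, w in zip(atoms, weights) if set(atom) <= A)  # largest measurable subset
    outer = sum(w for atom, w in zip(atoms, weights) if set(atom) & A)   # smallest measurable superset
    return inner, outer

atoms = [{1, 2}, {3}, {4, 5}]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(inner_outer(atoms, weights, {1, 3, 4}))  # (Fraction(1, 4), Fraction(1, 1))
```

The set {1, 3, 4} is not measurable here, and the gap between its inner and outer measure shows it; for a union of atoms such as {3, 4, 5} the two values coincide.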
Finite measure is a measure that assigns the full set a finite value: $\mu(X) < \infty$. Normalized measure is a measure that assigns the full set measure one: $\mu(X) = 1$. Every nonzero finite measure can be normalized by dividing it by $\mu(X)$. Sigma-finite measure is a measure where the underlying set is a countable union of measurable sets with finite measure: $X = \cup_{i \in \mathbb{N}} A_i$, $\mu(A_i) < \infty$. Locally finite measure on a Borel measurable space is a measure such that every point has a neighborhood of finite measure.
Absolutely continuous measure $\nu \ll \mu$ w.r.t. a measure μ on a measurable space is a measure on the space such that any set of zero μ-measure is a set of zero ν-measure: $\mu(A) = 0$ then $\nu(A) = 0$. A finite measure ν is absolutely continuous w.r.t. μ if and only if its value can be made arbitrarily small by making the corresponding μ-measure sufficiently small: $\forall \varepsilon > 0$, $\exists \delta > 0$: $\mu^{-1}[0, \delta) \subset \nu^{-1}[0, \varepsilon)$. Singular measure $\nu \perp \mu$ w.r.t. a measure μ on a measurable space is a measure on the space such that there is a set of zero μ-measure whose complement is a set of zero ν-measure: $\exists A \in \Sigma$: $\mu(A) = 0$, $\nu(\complement A) = 0$. Lebesgue Decomposition Theorem [@Radon1919; @Nikodým1930]: Every sigma-finite measure on a measurable space can be uniquely represented as the sum of an absolutely continuous measure and a singular measure, both w.r.t. another sigma-finite measure: $\exists \nu_a \ll \mu$, $\exists \nu_s \perp \mu$: $\nu = \nu_a + \nu_s$.
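On a finite set where every point is measurable and measures are given by point masses (an assumption that makes everything computable), the decomposition theorem above becomes a simple split: $\nu_a$ keeps ν's mass on points that μ charges, and $\nu_s$ keeps the rest, which sits on a μ-null set. The dict representation and the name `lebesgue_decomposition` are illustrative.

```python
def lebesgue_decomposition(mu, nu):
    """Split nu into nu_a << mu and nu_s singular to mu on a finite set.
    Measures are dicts mapping points to nonnegative masses; missing keys
    mean mass 0."""
    points = set(mu) | set(nu)
    null = {x for x in points if mu.get(x, 0) == 0}          # the mu-null set
    nu_a = {x: nu.get(x, 0) for x in points if x not in null}
    nu_s = {x: nu.get(x, 0) for x in null}                   # concentrated on the null set
    return nu_a, nu_s

mu = {"a": 1, "b": 2, "c": 0}
nu = {"a": 3, "c": 5, "d": 1}
nu_a, nu_s = lebesgue_decomposition(mu, nu)
# nu_a == {"a": 3, "b": 0}, nu_s == {"c": 5, "d": 1}
```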
Measure space $(X, \Sigma, \mu)$ is a measurable space endowed with a measure. Probability space is a measurable space endowed with a normalized measure. Negligible subset in a measure space is a subset of zero measure: $\mu(A) = 0$. Full-measure subset in a measure space is a subset whose complement is negligible: $\mu(\complement A) = 0$. A property on a measure space holds almost everywhere (or a.e.) if it holds on a set of full measure, i.e. the set where it fails is negligible: $\mu(\lnot P) = 0$. Almost equality $\approx_\mu$ or equality mod 0 on a measure space is a binary relation on its power set such that two subsets are almost equal if and only if their symmetric difference is negligible: $A \approx_\mu B$ iff $\mu(A \Delta B) = 0$.
Borel measure space $(X, \mathcal{T}, \mathcal{B(T)}, \mu)$ is a Borel measurable space endowed with a measure. Some authors call a measure on a Borel measurable space "Borel" if it satisfies certain properties. Borel measure space $(\mathbb{R}, \mathcal{T_d}, \mathcal{B(T_d)}, \mu)$ of real numbers is the real line endowed with the usual topology, the Borel sigma-algebra, and the measure that assigns each interval its length: $\mu([a, b]) = b - a$. Radon measure on a Borel measurable space with a Hausdorff topology is a locally finite measure such that the measure of a measurable set equals the supremum of the measures of its compact subsets: $\mu(B) = \sup\{\mu(K) : K \subset B, K~\text{compact}\}$. Every finite Radon measure on a Borel measurable space with a locally compact Hausdorff topology is outer regular, i.e. the measure of a measurable set equals the infimum of the measures of its open supersets: $\mu(B) = \inf\{\mu(U) : U \in [B, X] \cap \mathcal{T}\}$. Luzin Criterion for measurability of real functions [@Luzin1912]: A real function defined on an interval except for a set of zero measure is measurable if and only if it is continuous except on a set of arbitrarily small measure: $X \subset [a, b]$, $\mu(X) = b - a$, then $f \in \mathcal{M}(X, \mathcal{B})$ iff $\exists (A_n)_{n \in \mathbb{N}} \subset \mathcal{B}$, $X_n = \cup_{i=1}^n A_i$: $\lim_{n \to \infty} \mu(X_n) = b - a$, $f|_{X_n} \in C(X_n, \mathbb{R})$. Luzin Criterion for measurability: A mapping from a finite Radon measure space to a second-countable Borel measurable space is measurable if and only if it is continuous except on an open subset of arbitrarily small measure: $f \in \mathcal{M}(X, \mathcal{B})$ iff $\exists (X_n)_{n \in \mathbb{N}} \subset \mathcal{T}_X^∗$: $\lim_{n \to \infty} \mu(X_n) = \mu(X)$, $f|_{X_n} \in C(X_n, Y)$.
Strict isomorphism or point isomorphism between two measure spaces is a measurable mapping with a measurable inverse that preserves the measures: $f: (X, \Sigma_X) \cong (Y, \Sigma_Y)$; $\forall A \in \Sigma_X$, $\mu(A) = \nu(f(A))$. Almost isomorphism or mod 0 isomorphism between two measure spaces is a strict isomorphism between some full-measure subspaces.
μ-measurable subset in a measure space is a subset almost equal to a measurable set, i.e. its symmetric difference with some measurable set has zero outer measure: $\exists B \in \Sigma$: $\mu^∗(A \Delta B) = 0$. μ-measurable sigma-algebra $\Sigma_\mu$ w.r.t. a measure space is the sigma-algebra consisting of all the μ-measurable subsets in the space. Every μ-measurable sigma-algebra includes the original sigma-algebra: $\forall \mu$, $\Sigma \subset \Sigma_\mu$. Almost equality is an equivalence relation on a μ-measurable sigma-algebra. μ-measurable space $(X, \Sigma_\mu)$ w.r.t. a measure space is the measurable space consisting of the underlying set and the μ-measurable sigma-algebra. The inner and outer measures agree on the μ-measurable sigma-algebra: $\mu_∗|_{\Sigma_\mu} = \mu^∗|_{\Sigma_\mu}$. Complete measure space is a measure space where every subset of every negligible set is measurable, i.e. negligible: $\cup_{\mu(N) = 0} \mathcal{P}(N) \subset \Sigma$; or equivalently, every μ-measurable set is measurable: $\Sigma_\mu = \Sigma$. Complete measure spaces of the same cardinality are strictly isomorphic if they are almost isomorphic and for every negligible set in one space there is a negligible set of the same cardinality in the other.
Completion $(X, \Sigma_\mu, \mu^∗)$ of a measure space is the complete measure space consisting of the underlying set, the μ-measurable sigma-algebra, and the outer or inner measure (considered as restricted to the μ-measurable sigma-algebra). The completion of a measure space is the smallest complete measure space that extends it: $\mu^∗|_\Sigma = \mu$, $\Sigma_{\mu^∗|_{\Sigma_\mu}} = \Sigma_\mu$, $\Sigma_\mu = \bigcap \{\Sigma' : \tilde \mu|_\Sigma = \mu, \Sigma_{\tilde \mu} = \Sigma'\}$. Lebesgue measure space $(\mathbb{R}, \mathcal{T_d}, \mathcal{L}, \lambda)$ of real numbers is the completion of the Borel measure space of real numbers: Lebesgue sigma-algebra $\mathcal{L} = \Sigma_\mu$, Lebesgue measure $\lambda = \mu^∗|_\mathcal{L}$. Standard probability space, Lebesgue–Rokhlin probability space, or Lebesgue space is a probability space isomorphic mod 0 to a probability space consisting of an interval with the Lebesgue measure and a finite or countable set with a discrete measure. Every separable complete metric space with the Borel sigma-algebra and a normalized measure completes to a Lebesgue space: $(X, \mathcal{B(T_d)}, \mu)$, then $(X,\Sigma_\mu,\mu^∗) \cong (I,\mathcal{L},\lambda) \sqcup (\mathbb{N},\mathcal{P}(\mathbb{N}),m)$ (mod 0).
Measurable rectangle of the product set $X \times Y$ of two measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$ is a product set $A \times B$ of two measurable sets $A \in \Sigma_X$ and $B \in \Sigma_Y$. Product sigma-algebra $\Sigma_X \times \Sigma_Y$ of the product set $X \times Y$ of two measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$ is the sigma-algebra generated by the class of measurable rectangles: $\Sigma_X \times \Sigma_Y = \Sigma(\mathcal{B})$, $\mathcal{B} = \{A \times B : A \in \Sigma_X, B \in \Sigma_Y\}$. Product measurable space $(X \times Y, \Sigma_X \times \Sigma_Y)$ of two measurable spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$ is the product set $X \times Y$ with the product sigma-algebra $\Sigma_X \times \Sigma_Y$.
Product measure $\mu \times \theta$ of two measure spaces $(X, \Sigma_X, \mu)$ and $(Y, \Sigma_Y, \theta)$ is a measure on the product measurable space $(X \times Y, \Sigma_X \times \Sigma_Y)$ such that $(\mu \times \theta)(A \times B) = \mu(A) \theta(B)$. By the Hahn–Kolmogorov theorem, product measures always exist. If the constituent measure spaces are sigma-finite, then the product measure is unique: $(\mu \times \theta)(Q) = \int_Y \text{d}\theta \int_X 1_Q(x, y)~\text{d}\mu = \int_X \text{d}\mu \int_Y 1_Q(x, y)~\text{d}\theta$. Product measure space of two measure spaces $(X, \Sigma_X, \mu)$ and $(Y, \Sigma_Y, \theta)$ is the measure space $(X \times Y, \Sigma_X \times \Sigma_Y, \mu \times \theta)$. Product measure spaces may be incomplete, even if both constituent measure spaces are complete. Completion of a product measure space $(X \times Y, (\Sigma_X \times \Sigma_Y)_{\mu \times \theta}, (\mu \times \theta)^∗)$ can be defined in a similar fashion. Lebesgue measure space $(\mathbb{R}^n, \mathcal{T_d}, \mathcal{L}, \lambda)$ of real n-tuples is the completion of the n-th product measure space of the Borel measure space $(\mathbb{R}, \mathcal{T_d}, \mathcal{B(T_d)}, \mu)$ of real numbers. Lebesgue measure on the n-th Cartesian power of real numbers, also known as n-volume, is given by coverings with boxes: $\lambda(A) = \inf\{\sum_{i\in\mathbb{N}} \lambda(I_i) : A \subset \cup_{i\in\mathbb{N}} I_i\}$, where $I_i$ are n-dimensional intervals.
Product measure space of infinitely many normalized measure spaces is well defined.
Indefinite integral of a real function $f: (a, b) \mapsto \mathbb{R}$ on an open interval $(a, b)$ is the set of its primitives, i.e. functions whose derivatives equal f: $\int f~\text{d}x = \{F : \forall x \in (a, b), \text{d}F = f \text{d}x\}$. Definite integral of a real function $f: [a, b] \mapsto \mathbb{R}$ on an interval [a, b] has a definition that evolved over time. Consider any finite partition $\{[x_{n-1}, x_n]\}_{n=1}^N$ of the domain [a, b], specified by a sequence $\{\Delta x_n\}_{n=1}^N$ of non-negative reals with sum $\sum_{n=1}^N \Delta x_n = b-a$, such that $x_0 = a$ and $x_n = a + \sum_{i=1}^n \Delta x_i$, so that $\Delta x_n = x_n - x_{n-1}$. Cauchy defined the definite integral as a limit of sums evaluated at left endpoints [@Cauchy1823]: $\int_a^b f~\text{d}x = \lim_{\max \Delta x_n \to 0} \sum_n f(x_{n-1}) \Delta x_n$. Continuous functions are Cauchy integrable. With arbitrary evaluation points $\{\xi_n \in [x_{n-1}, x_n]\}_{n=1}^N$ given a partition, Riemann defined the definite integral as the limit of Riemann sums [@Riemann1853]: $\int_a^b f~\text{d}x = \lim_{\max \Delta x_n \to 0} \sum_n f(\xi_n) \Delta x_n$, which must not depend on the choice of partitions and evaluation points. A real function $f: [a, b] \mapsto \mathbb{R}$ is Riemann integrable iff it is bounded and it is continuous except on a set of zero measure [@Lebesgue1902]. Assuming f is bounded on [a, b], let $m_n = \inf_{x_{n-1} \le x \le x_n} f(x)$ and $M_n = \sup_{x_{n-1} \le x \le x_n} f(x)$; [@Darboux1879] defined the definite integral as the common limit of the lower and upper Darboux sums: $\int_a^b f~\text{d}x = \lim_{\max \Delta x_n \to 0} \sum_n m_n \Delta x_n = \lim_{\max \Delta x_n \to 0} \sum_n M_n \Delta x_n$.
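The Riemann and Darboux sums can be computed directly on a uniform partition. A minimal sketch with hypothetical helper names; to keep the infimum and supremum on each piece exact, it assumes f is monotone on every piece, so they are attained at the endpoints (true for x² on [0, 1]).

```python
import math

def darboux_sums(f, a, b, n):
    """Lower and upper Darboux sums of f on a uniform partition of [a, b] into
    n pieces, ASSUMING f is monotone on each piece (inf/sup at endpoints)."""
    dx = (b - a) / n
    lower = upper = 0.0
    for k in range(n):
        left, right = f(a + k * dx), f(a + (k + 1) * dx)
        lower += min(left, right) * dx
        upper += max(left, right) * dx
    return lower, upper

def riemann_sum(f, a, b, n, t=0.5):
    """Riemann sum with evaluation points xi_k = x_{k-1} + t * dx, t in [0, 1]
    (t = 0 gives Cauchy's left-endpoint sum, t = 0.5 the midpoint rule)."""
    dx = (b - a) / n
    return sum(f(a + (k + t) * dx) for k in range(n)) * dx

lo, hi = darboux_sums(lambda x: x * x, 0.0, 1.0, 1000)
mid = riemann_sum(math.sin, 0.0, math.pi, 1000)
# lo <= 1/3 <= hi, and mid is close to 2, the integral of sin over [0, pi]
```

For a monotone f the gap between the upper and lower sums telescopes to $|f(b) - f(a)| \, \Delta x$, which is why it shrinks linearly with the mesh.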
Fundamental theorem of calculus: given a continuous real function $f: [a, b] \mapsto \mathbb{R}$, (1) its definite integral equals the difference of its primitive's values at the interval ends, i.e. the Newton–Leibniz formula holds: $\int_a^b f~\text{d}x = F(b) - F(a)$; (2) its indefinite integral can be written as the definite integral with variable upper limit plus an arbitrary constant: $\int f~\text{d}x = \int_a^x f~\text{d}t + C$.
Stieltjes integral (or Riemann-Stieltjes integral) of a bounded real function $f: [a, b] \mapsto \mathbb{R}$ w.r.t. another bounded real function G is the limit of Stieltjes sums [@Stieltjes1894]: $\int_a^b f~\text{d}G = \lim_{\max \Delta x_n \to 0} \sum_n f(\xi_n) \Delta G(x_n)$, where $\Delta G(x_n) = G(x_n) - G(x_{n-1})$. f is called the integrand and G the integrating function of the Stieltjes integral. Stieltjes integral generalizes Riemann integral, and if the integrating function G has a Riemann integrable derivative g, Stieltjes integral reduces to Riemann integral: $\int_a^b f~\text{d}G = \int_a^b f g~\text{d}x$. Stieltjes integral is useful for curvilinear (line) integrals and for expectations of real random variables.
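A Stieltjes sum only needs the increments of G, so the same code handles a smooth integrating function and a jump. A sketch with a hypothetical helper name, using midpoints as evaluation points: with $G(x) = x^2$ it should approach $\int_0^1 x \cdot 2x~\text{d}x = 2/3$, and with a unit step at 0.5 it should pick out roughly $f(0.5)$.

```python
import math

def stieltjes_sum(f, G, a, b, n):
    """Riemann-Stieltjes sum  sum_k f(xi_k) * (G(x_k) - G(x_{k-1}))  on a
    uniform partition of [a, b], with xi_k the midpoint of [x_{k-1}, x_k]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        x_prev, x_k = a + (k - 1) * dx, a + k * dx
        total += f((x_prev + x_k) / 2) * (G(x_k) - G(x_prev))
    return total

smooth = stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, 1000)  # ~ 2/3
step = lambda x: 1.0 if x >= 0.5 else 0.0                               # unit jump at 0.5
jump = stieltjes_sum(math.cos, step, 0.0, 1.0, 1000)                    # ~ cos(0.5)
```

The step example has no derivative to reduce to a Riemann integral, which is exactly the case where the Stieltjes form is needed (e.g. expectations of discrete random variables).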
Simple function is a mapping $f: X \mapsto \{y_n\}_{n \in \mathbb{N}}$ from a measurable space $(X, \Sigma)$ to a countable set of real numbers such that the preimages are measurable: $\forall n \in \mathbb{N}$, $f^{-1}\{y_n\} \in \Sigma$. Lebesgue integral of a real-valued function on a complete sigma-finite measure space $(X, \Sigma, \mu)$ is: (1) $\int_X f~\text{d}\mu = \sum_{n \in \mathbb{N}} y_n \mu(f^{-1}\{y_n\})$, if f is a simple function and the series is absolutely convergent; (2) $\int_X f~\text{d}\mu = \lim_{n \to \infty} \int_X f_n~\text{d}\mu$, if there is a sequence $\{f_n\}$ of Lebesgue integrable simple functions that uniformly converges to f almost everywhere; (3) $\int_X f~\text{d}\mu = \lim_{n \to \infty} \int_{A_n} f~\text{d}\mu$, if for every sequence $\{A_n\}$ of finite-measure sets successively expanding to X, the sequence of integrals converges to the same limit [@Lebesgue1902]. A measurable real-valued function is Lebesgue integrable if and only if its absolute value is Lebesgue integrable. The space of all Lebesgue integrable functions on a measure space is $L^1_\mu(X)$, see $L^p$ space. Lebesgue integral can be generalized to mappings into Banach spaces (the Bochner integral). Lebesgue–Stieltjes integral generalizes the Lebesgue integral to measures of variable sign: if a measure of variable sign can be decomposed into the difference of two non-negative measures under which a function is Lebesgue integrable, then Lebesgue–Stieltjes integral of the function is the difference of the Lebesgue integrals; $\mu = \mu_1 - \mu_2$, $\int_X f~\text{d}\mu = \int_X f~\text{d}\mu_1 - \int_X f~\text{d}\mu_2$.
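Definition (2) partitions the range rather than the domain. For a hypothetical example, take $f(x) = x^2$ on [0, 1] with Lebesgue measure: the simple functions $\varphi_n = \lfloor n f \rfloor / n$ converge uniformly to f (error at most 1/n), $\varphi_n$ takes the value $y_k = k/n$ on $\{x : y_k \le x^2 < y_{k+1}\} = [\sqrt{y_k}, \sqrt{y_{k+1}})$, and its integral has the form in (1). A sketch, relying on the closed-form preimage lengths for this particular f.

```python
import math

def lebesgue_lower_sum(n):
    """Integral of the simple function floor(n * x**2) / n on [0, 1]:
    sum over levels y_k = k / n of y_k times the Lebesgue measure of the
    preimage [sqrt(y_k), sqrt(y_{k+1}))."""
    total = 0.0
    for k in range(n):
        y_k, y_next = k / n, (k + 1) / n
        preimage_measure = math.sqrt(y_next) - math.sqrt(y_k)
        total += y_k * preimage_measure
    return total

approx = lebesgue_lower_sum(1000)  # increases toward 1/3 as n grows
```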
The integral concepts of Riemann, Stieltjes, and Lebesgue are very different: Riemann integral integrates a real function w.r.t. the length (or volume) of pieces of its domain X; Stieltjes integral integrates a real function w.r.t. an integrating function on its domain X, such as a distribution function $G: X \mapsto [0, 1]$; Lebesgue integral integrates a real- or vector-valued function w.r.t. a measure $\mu: \Sigma \mapsto [0, \infty]$ on its domain X.
Lebesgue integrable functions are Lebesgue integrable on any measurable subdomain: $L^1_\mu(X) = \cap_{A \in \Sigma} L^1_\mu(A)$. Every nonnegative Lebesgue integrable function $f \in L^1_\mu(X)$ on a complete sigma-finite measure space induces a finite measure $\mu_f = f \mu$ on the measurable space: $\forall A \in \Sigma$, $\mu_f(A) = \int_A f~\text{d}\mu$; a general $f \in L^1_\mu(X)$ induces a finite signed measure in the same way. Radon-Nikodym Theorem (representation of absolutely continuous measures): Any finite measure absolutely continuous w.r.t. the measure of a complete sigma-finite measure space can be represented as the product of a nonnegative Lebesgue integrable function and the measure, uniquely up to almost equality: $\forall \nu \ll \mu$, $\exists! f \in L^1_\mu(X)$: $\nu = f \mu$. Hence $L^1_\mu(X)$, with functions identified up to almost equality, is isomorphic to the space of finite signed measures absolutely continuous w.r.t. the measure: $L^1_\mu(X) \cong \{\nu: \nu \ll \mu\}$.
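On a finite set with point masses, the Radon-Nikodym density is just a ratio of masses, and the theorem's two caveats become visible: the density exists only when ν charges no μ-null point, and its values on μ-null points are arbitrary. A sketch under that finite-set assumption; `radon_nikodym` is a hypothetical name and setting the density to 0 on null points is one arbitrary choice.

```python
from fractions import Fraction

def radon_nikodym(nu, mu):
    """Density f = d(nu)/d(mu) for measures given as dicts of point masses.
    Raises ValueError if nu charges a mu-null point (nu not << mu). On
    mu-null points f is arbitrary; it is set to 0 here."""
    for x, mass in nu.items():
        if mass != 0 and mu.get(x, 0) == 0:
            raise ValueError("nu is not absolutely continuous w.r.t. mu")
    return {x: (nu.get(x, 0) / m if m > 0 else 0) for x, m in mu.items()}

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
nu = {"a": Fraction(1, 8), "b": Fraction(3, 8), "c": Fraction(1, 2)}
f = radon_nikodym(nu, mu)
# For every set A: nu(A) == sum of f[x] * mu[x] over x in A
```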
Passage to the Limit under the Lebesgue Integral [@Lebesgue1909]: If a sequence $\{f_n\}$ of measurable functions on $(X, \Sigma, \mu)$ that converges almost-everywhere to a function f is absolutely bounded above by a Lebesgue integrable function $\Phi \in L^1(X, \Sigma, \mu)$, $\sup_{n \in \mathbb{N}} |f_n| \le \Phi$, then the sequence and the limit are all Lebesgue integrable and the limit of the sequence of integrals equals the integral of the limit: $\lim_{n \to \infty} \int_X f_n~\text{d}\mu = \int_X f~\text{d}\mu$.
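A numerical contrast shows why the integrable bound Φ matters. On [0, 1], $f_n(x) = x^n \to 0$ a.e. with $|f_n| \le 1$, so the integrals $1/(n+1)$ go to 0; by contrast $g_n = n \cdot 1_{(0, 1/n)} \to 0$ pointwise but has no integrable dominating function, and every integral equals 1. A sketch using a midpoint-rule helper (hypothetical name) as a stand-in for the Lebesgue integral of these piecewise-smooth functions.

```python
def integral_01(f, n=100_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h

# Dominated by the constant 1: integrals 1/(n+1) tend to 0 = integral of the limit.
dominated = [integral_01(lambda x, n=n: x ** n) for n in (1, 10, 100)]
# Not dominated: g_n -> 0 pointwise, yet every integral stays at 1.
undominated = [integral_01(lambda x, n=n: n if x < 1 / n else 0) for n in (1, 10, 100)]
```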
Multiple Lebesgue integral is the Lebesgue integral of a multivariate function $f: \prod_i X_i \mapsto \mathbb{R}$, where the domain is the completion of the product measure space of complete sigma-finite measure spaces $\{(X_i, \Sigma_i, \mu_i)\}_{i=1}^n$. Multiple Integral as Repeated Integrals [@Fubini1907]: Given a measurable function $f: S \times T \mapsto \mathbb{R}$ on the product measure space $(S \times T, \mathcal{S \times T}, \mu \times \theta)$ of two sigma-finite measure spaces $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \theta)$, if $f \geq 0$ or $\int_S \text{d} \mu \int_T |f| \,\text{d} \theta < \infty$, then $\int_{S \times T} f~\text{d}(\mu \times \theta) = \int_S \text{d} \mu \int_T f~\text{d} \theta = \int_T \text{d} \theta \int_S f~\text{d} \mu$.
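A numerical check of the two iterated orders for the hypothetical integrand $f(x, y) = x e^{xy}$ on $[0, 1]^2$, where integrating over y first gives $\int_0^1 (e^x - 1)~\text{d}x = e - 2$. Note that the discretized double sums agree trivially, since finite sums can be reordered; the theorem is what guarantees that their limits, the iterated integrals, agree.

```python
import math

def iterated_midpoint(f, n, x_first):
    """Iterated midpoint rule on [0, 1]^2. If x_first, the inner integral is
    over x and the outer over y; otherwise the inner integral is over y."""
    h = 1.0 / n
    nodes = [(k + 0.5) * h for k in range(n)]
    if x_first:
        return sum(sum(f(x, y) for x in nodes) * h for y in nodes) * h
    return sum(sum(f(x, y) for y in nodes) * h for x in nodes) * h

f = lambda x, y: x * math.exp(x * y)
dy_then_dx = iterated_midpoint(f, 400, x_first=False)  # inner integral over y
dx_then_dy = iterated_midpoint(f, 400, x_first=True)   # inner integral over x
```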
L^p norm $\|\cdot\|_{p, \mu}$, where $p \in [1, \infty)$, of a measurable function on a complete sigma-finite measure space $(X, \Sigma, \mu)$ is the nonnegative p-th root of the Lebesgue integral of the p-th power of the absolute value of the function: $\|f\|_{p,\mu} = \sqrt[p]{\int_X |f|^p~\text{d} \mu}$. L^p metric $d_{p, \mu}(\cdot, \cdot)$ is the metric induced by an $L^p$ norm. The $L^2$ norm is induced by an inner product of functions, $\|f\|_{2,\mu}^2 = \langle f, f \rangle_\mu$, where $\langle f, g \rangle_\mu = \int_X f \bar g~\text{d} \mu$. L^p space $L^p_\mu(X)$ or $L^p(X, \Sigma, \mu)$ on a complete sigma-finite measure space is the $L^p$-normed space of measurable functions on the measure space: $L^p_\mu(X) = \{f : \int_X |f|^p~\text{d}\mu < \infty\}$. Note that the L in the name derives from Lebesgue, but do not confuse it with Lebesgue spaces (standard probability spaces). Every $L^p$ space, $p \in [1, \infty)$, is a Banach space. The $L^p$ space on a domain of integration in a Lebesgue measure space is the completion of the $L^p$-normed space of continuous real-valued functions with compact support on the domain: $L^p(D) = (\widehat C(D), \|\cdot\|_p)$. An $L^1$ space consists of all Lebesgue integrable functions on a measure space. Given a function in an $L^1$ space, the space also contains its absolute value, every function that differs from it only on a negligible set, and every measurable function absolutely bounded by it: $f \in L^1_\mu(X)$ then $|f|, f + g, h \in L^1_\mu(X)$, where g and h are measurable, $\mu(\{g \ne 0\}) = 0$, and $|h| \le |f|$. An $L^2$ space consists of all Lebesgue square integrable functions on a measure space, and is a Hilbert space when endowed with the inner product of functions. Equivalence = of functions in an $L^p$ space is defined by almost equality: $f, g \in L^p_\mu(X)$, $f = g$ iff $f \approx_\mu g$.
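For $f(x) = x$ on [0, 1] with Lebesgue measure the norms have closed forms, $\|f\|_p = (p+1)^{-1/p}$, which makes a convenient check for a quadrature-based sketch. The helper names are hypothetical, the midpoint rule stands in for the Lebesgue integral, and the inner product is the real one (no conjugation needed for real functions).

```python
def lp_norm(f, p, a=0.0, b=1.0, n=100_000):
    """Approximate ||f||_p on [a, b] w.r.t. Lebesgue measure (midpoint rule)."""
    h = (b - a) / n
    integral = sum(abs(f(a + (k + 0.5) * h)) ** p for k in range(n)) * h
    return integral ** (1.0 / p)

def l2_inner(f, g, a=0.0, b=1.0, n=100_000):
    """Approximate the real L^2 inner product <f, g> on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

identity = lambda x: x
# Exact values: ||x||_1 = 1/2, ||x||_2 = 3**-0.5, ||x||_3 = 4**(-1/3)
norms = [lp_norm(identity, p) for p in (1, 2, 3)]
```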
Essential supremum $\text{ess} \sup |f|$ of a measurable function on a measure space $(X, \Sigma, \mu)$ is the smallest bound of its absolute value that holds almost everywhere: $\text{ess} \sup |f| = \inf \{a : \mu(\{x: |f(x)| > a\}) = 0\}$. L^∞ norm $\|\cdot\|_{\infty, \mu}$ on a space of measurable functions is the norm that equals the essential supremum of the absolute value of the function: $\|f\|_{\infty, \mu} = \text{ess} \sup_{x \in X} |f(x)|$. In comparison, uniform norm, sup norm, or infinity norm $\|\cdot\|_{\infty}$ on a scalar-valued function space is the norm that equals the supremum of the absolute value of the function: $\|f\|_{\infty} = \sup_{x \in X} |f(x)|$. L^∞ space $L^\infty_\mu(X)$ or $L^\infty(X, \Sigma, \mu)$ on a complete sigma-finite measure space is the $L^\infty$-normed space of measurable functions on the measure space: $L^\infty_\mu(X) = \{f : \text{ess} \sup |f| < \infty \}$. Every $L^\infty$ space is a Banach space. The space of bounded continuous real-valued functions on a domain of integration in a Lebesgue measure space, endowed with the uniform norm, is a Banach subspace of the $L^\infty$ space on the domain: $(C(D), \|\cdot\|_\infty) \subset L^\infty(D)$. The $L^\infty$ space extends the family of $L^p$ spaces to $p = \infty$, but some of its properties differ from the finite-p case: for example, continuous functions with compact support are not dense in $L^\infty(D)$, and $L^\infty(D)$ is not separable.
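On a finite measure space the difference between the two norms is easy to see: the essential supremum ignores points of measure zero, while the uniform norm does not. A sketch with hypothetical names; returning 0.0 when every point is negligible is a convention chosen here.

```python
def ess_sup_abs(values, weights):
    """Essential supremum of |f| on a finite set with point masses `weights`:
    the largest |f(x)| over points of positive measure. Returns 0.0 if all
    points are negligible (a convention)."""
    charged = [abs(v) for v, w in zip(values, weights) if w > 0]
    return max(charged) if charged else 0.0

values = [1.0, -7.0, 3.0, 100.0]
weights = [0.5, 0.25, 0.25, 0.0]         # the point with value 100 is negligible
ess = ess_sup_abs(values, weights)       # 7.0
sup = max(abs(v) for v in values)        # 100.0
```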
Inclusion of L^p and L^∞ spaces [@Villani1985]: Given a complete sigma-finite measure space $(X, \Sigma, \mu)$, the following are equivalent: (1) the measure is finite: $\mu(X) < \infty$; (2) one of its $L^p$ spaces includes another of a higher power: $\exists 1 \le p < q \le \infty$: $L^p_\mu(X) \supset L^q_\mu(X)$; (3) its $L^p$ spaces form a descending chain under set inclusion: $\forall 1 \le p < q \le \infty$, $L^p_\mu(X) \supset L^q_\mu(X)$. And the following are also equivalent: (1) the measure is discrete, in the sense that the measures of non-negligible sets are bounded away from zero: $\inf_{\mu(A) > 0} \mu(A) > 0$; (2) one of its $L^p$ spaces is included in another of a higher power: $\exists 1 \le p < q \le \infty$: $L^p_\mu(X) \subset L^q_\mu(X)$; (3) its $L^p$ spaces form an ascending chain under set inclusion: $\forall 1 \le p < q \le \infty$, $L^p_\mu(X) \subset L^q_\mu(X)$. As a result, almost-everywhere absolutely bounded measurable functions are Lebesgue integrable on subdomains of finite measure; $L^p$ spaces on a finite continuous measure space form a strictly descending chain under set inclusion; $L^p$ spaces on an infinite discrete measure space form a strictly ascending chain under set inclusion.
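Both equivalences rest on elementary norm inequalities that can be spot-checked on finite sets: on a finite measure space Hölder's inequality gives $\|f\|_p \le \mu(X)^{1/p - 1/q} \|f\|_q$ for $p < q$ (so $L^q \subset L^p$), and under counting measure, where every non-empty set has measure at least 1, $\|f\|_q \le \|f\|_p$ (so $L^p \subset L^q$). The random data and the helper `lp` are illustrative assumptions; this is a sanity check, not a proof.

```python
import random

def lp(values, weights, p):
    """L^p norm of a function on a finite set with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

random.seed(0)
for _ in range(1000):
    vals = [random.uniform(-5, 5) for _ in range(6)]
    wts = [random.uniform(0.1, 2.0) for _ in range(6)]
    p, q = sorted(random.uniform(1, 6) for _ in range(2))
    total = sum(wts)  # mu(X), finite
    # Finite measure (Hoelder): ||f||_p <= mu(X)**(1/p - 1/q) * ||f||_q
    assert lp(vals, wts, p) <= total ** (1 / p - 1 / q) * lp(vals, wts, q) * (1 + 1e-12)
    # Counting measure: ||f||_q <= ||f||_p
    ones = [1.0] * 6
    assert lp(vals, ones, q) <= lp(vals, ones, p) * (1 + 1e-12)
```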
Integral operator $T_K$ between spaces of measurable functions on Hausdorff topological measure spaces, given a measurable function K on the product measure space, is the operator defined by the integral w.r.t. the second variable of the function: given $(X, \mathcal{T}_X, \mu)$, $(Y, \mathcal{T}_Y, \nu)$, and $K \in \mathcal{M}(X \times Y, \Sigma_X \times \Sigma_Y)$, define $T_K: \mathcal{M}(Y, \Sigma_Y) \mapsto \mathcal{M}(X, \Sigma_X)$ by $(T_K g)(x) := \int_Y K(x, y) g(y)~\text{d}\nu(y)$. Kernel $K(x, y)$ of an integral operator is the measurable bivariate function in the integrand.
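Applying an integral operator numerically amounts to replacing the integral over Y by a quadrature sum. A sketch assuming Y = [0, 1] with Lebesgue measure and the kernel $K(x, y) = \min(x, y)$, both chosen only for illustration; for $g \equiv 1$ the exact image is $(T_K g)(x) = x - x^2/2$.

```python
def apply_integral_operator(K, g, x, n=10_000):
    """Approximate (T_K g)(x) = integral over [0, 1] of K(x, y) g(y) dy by the
    midpoint rule (Y = [0, 1] with Lebesgue measure is an assumption)."""
    h = 1.0 / n
    return sum(K(x, (k + 0.5) * h) * g((k + 0.5) * h) for k in range(n)) * h

K = lambda x, y: min(x, y)   # hypothetical example kernel
one = lambda y: 1.0
xs = (0.0, 0.25, 0.5, 1.0)
approx = [apply_integral_operator(K, one, x) for x in xs]  # compare with x - x**2 / 2
```

Evaluating on a grid of x values and the same quadrature nodes in y turns $T_K$ into a matrix, which is the usual starting point for studying such operators numerically.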
Boundedness of an Integral Operator:
Sobolev norm $\|\cdot\|_{s,p}$, where $s \in \mathbb{N}$ and $p \in [1, \infty)$, of a real-valued function on a domain of integration in a Lebesgue measure space is the sum of the $L^p$ norms of its (weak) partial derivatives up to the s-th order: $\|f\|_{s,p} = \sum_{|I| \le s} \left\|\frac{\partial f}{\partial x^I}\right\|_p$, where I is a multi-index. Sobolev space $W^{s,p}(D)$ on a domain of integration in a Lebesgue measure space is the Sobolev-normed space of measurable functions on the domain: $W^{s,p}(D) = \{f: \forall |I| \le s, \int_D \left|\frac{\partial f}{\partial x^I}\right|^p dx < \infty\}$, or in a shorter form with perhaps a little abuse of notation, $W^{s,p}(D) = \bigcap_{|I| \le s} \frac{\partial L^p(D)}{\partial x^I}$. Every Sobolev space is a Banach space. A Sobolev space is the completion of the Sobolev-normed space of smooth real-valued functions: $W^{s,p}(D) = (\widehat{C^\infty}(D), \|\cdot\|_{s,p})$.
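For a smooth function the weak and classical derivatives coincide, so the norm $\|f\|_{1,p} = \|f\|_p + \|f'\|_p$ can be evaluated by quadrature. A sketch for the hypothetical example $f(x) = \sin(\pi x)$ on (0, 1) with p = 2, where $\|f\|_2 = 1/\sqrt{2}$ and $\|f'\|_2 = \pi/\sqrt{2}$; the derivative is supplied by hand and the helper name is illustrative.

```python
import math

def sobolev_1p_norm(f, df, p, n=100_000):
    """||f||_{1,p} on (0, 1) as the sum ||f||_p + ||f'||_p, with each L^p norm
    approximated by the midpoint rule. df is the derivative of f, supplied by
    hand (f is assumed smooth, so weak and classical derivatives agree)."""
    h = 1.0 / n
    def norm(g):
        return (sum(abs(g((k + 0.5) * h)) ** p for k in range(n)) * h) ** (1.0 / p)
    return norm(f) + norm(df)

val = sobolev_1p_norm(lambda x: math.sin(math.pi * x),
                      lambda x: math.pi * math.cos(math.pi * x), p=2)
# Exact: (1 + pi) / sqrt(2)
```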
Left-invariant Haar measure on a locally compact Hausdorff group G is a positive measure on the sigma-ring of subsets generated by the family $\mathcal{C}$ of all compact subsets, which takes finite values on all compact subsets and is left-invariant: let $\mathcal{C} = \{S \subset G : S~\text{is compact}\}$; $\forall S \in \Sigma(\mathcal{C})$, $\forall g \in G$, $\mu(S) = \mu(g S)$. Right-invariant Haar measure is similarly defined, but with right invariance: $\mu(S) = \mu(S g)$. Every locally compact Hausdorff group has a unique left-invariant Haar measure and a unique right-invariant Haar measure, both up to a positive factor [@Haar1933].
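A concrete instance: on the multiplicative group $(0, \infty)$, which is abelian so left and right invariance coincide, the measure $\text{d}x / x$ is a Haar measure, giving $\mu([a, b]) = \ln(b/a)$, which is unchanged when the interval is multiplied by any $g > 0$. Lebesgue measure on the same set is not invariant under this group operation. A small numeric check; the function names are hypothetical, and the normalization $\mu([1, e]) = 1$ is one choice among the positive multiples.

```python
import math

def haar_measure(a, b):
    """Measure of [a, b] in (0, inf) under dx/x, normalized so mu([1, e]) = 1."""
    return math.log(b / a)

def lebesgue_measure(a, b):
    return b - a

g, a, b = 3.0, 0.5, 2.0
haar_before, haar_after = haar_measure(a, b), haar_measure(g * a, g * b)        # equal
leb_before, leb_after = lebesgue_measure(a, b), lebesgue_measure(g * a, g * b)  # scaled by g
```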
improper integral. weak integral. strong integral.
Minimal L^p-metric $l_p$ or the p-th Wasserstein metric $W_p$, $p \in [1, \infty)$, on the space of probability measures with finite p-th moments on a Borel measurable Euclidean space is the metric defined as the infimum, over all couplings (joint probability measures with the two given marginals), of the p-th root of the p-th moment of the distance between the two coordinates: $l_p(P, \tilde P) = \inf_\mu \|d(x, y)\|_{p, \mu}$, where $\mu(A \times \mathbb{R}^n) = P(A)$ and $\mu(\mathbb{R}^n \times A) = \tilde P(A)$ for every Borel set A; more explicitly, $l_p(P, \tilde P) = \inf_\mu \sqrt[p]{\int_{\mathbb{R}^{2n}} \|x-y\|^p d\mu}$.
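For two uniform empirical measures with the same number n of points, the infimum over couplings is attained at a permutation: the couplings form the set of doubly stochastic matrices (scaled by 1/n), whose extreme points are permutation matrices, and the objective is linear in the coupling. That makes an exact brute-force sketch possible for tiny n. The function name is hypothetical, and the n! search is a teaching device, not a practical algorithm.

```python
import itertools
import math

def wasserstein_empirical(xs, ys, p=2):
    """W_p between uniform empirical measures on equal-length lists of points
    xs and ys (tuples of coordinates), by brute force over all matchings."""
    n = len(xs)
    assert n == len(ys)
    best = min(
        sum(math.dist(xs[i], ys[sigma[i]]) ** p for i in range(n)) / n
        for sigma in itertools.permutations(range(n))
    )
    return best ** (1.0 / p)

pts = [(0.0, 0.0), (1.0, 2.0), (-1.0, 0.5), (2.0, -1.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in pts]
w2 = wasserstein_empirical(pts, shifted, p=2)  # a rigid shift by (3, 4) costs exactly 5
```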
The minimal $L^1$-metric is also called the Kantorovich metric [@Kantorovich1940], the Wasserstein metric [@Vaseršteĭn1969], or the earth mover's distance [@Stolfi1994]. For probability distributions on the real line, the minimal $L^1$-metric equals the $L^1$-metric between CDFs: $l_1(P, \tilde P) = d_1(F, \tilde F)$, i.e. $l_1(P, \tilde P) = \int_\mathbb{R} |F(x) - \tilde F(x)| dx$.
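On the real line the two sides of this identity can be computed independently for empirical measures: sorting and matching order statistics gives the optimal coupling, while the CDF formula integrates a step function exactly. A sketch with hypothetical helper names; the sorted version assumes equal sample sizes, the CDF version does not.

```python
def w1_sorted(xs, ys):
    """W_1 between equal-size empirical measures on the real line: the
    optimal coupling matches order statistics."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def w1_cdf(xs, ys):
    """Integral of |F(t) - G(t)| dt for the empirical CDFs F and G. Both are
    constant between consecutive sample points, so the integral is exact."""
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        F = sum(x <= left for x in xs) / len(xs)
        G = sum(y <= left for y in ys) / len(ys)
        total += abs(F - G) * (right - left)
    return total

d_sorted = w1_sorted([0, 1, 3], [0.5, 2, 2.5])  # (0.5 + 1 + 0.5) / 3 = 2/3
d_cdf = w1_cdf([0, 1, 3], [0.5, 2, 2.5])        # same value via the CDFs
```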
The minimal $L^2$-metric is also called Mallows metric [@Mallows1972]. For classes of "related elliptically symmetric" probability measures, such as the class of all (regular and singular) n-dimensional Gaussian distributions, the minimal $L^2$-metric can be computed as [@Gelbrich1990]: $l_2^2(P, \tilde P) = |m - \tilde m|^2 + \text{tr}(\Sigma) + \text{tr}(\tilde \Sigma) - 2 \text{tr}[(\Sigma^{1/2} \tilde \Sigma \Sigma^{1/2})^{1/2}]$.
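The Gaussian formula translates directly into matrix code. A sketch using NumPy (an assumed dependency), with the matrix square roots of symmetric positive semidefinite matrices taken through an eigendecomposition; the function names are hypothetical. In one dimension the formula reduces to $(m - \tilde m)^2 + (\sigma - \tilde\sigma)^2$, which gives an easy check.

```python
import numpy as np

def psd_sqrt(S):
    """Square root of a symmetric positive semidefinite matrix via eigh;
    tiny negative eigenvalues from rounding are clipped to 0."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_w2_squared(m1, S1, m2, S2):
    """Squared minimal L^2 distance between N(m1, S1) and N(m2, S2):
    |m1 - m2|^2 + tr S1 + tr S2 - 2 tr[(S1^{1/2} S2 S1^{1/2})^{1/2}]."""
    r1 = psd_sqrt(S1)
    cross = psd_sqrt(r1 @ S2 @ r1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross))

# 1-D check: N(1, 4) vs N(3, 9) gives (1 - 3)**2 + (2 - 3)**2 = 5
d2 = gaussian_w2_squared(np.array([1.0]), np.array([[4.0]]), np.array([3.0]), np.array([[9.0]]))
```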
As metrics to compare probability measures, $W_1$ is the minimal cumulative distance to transport the probability mass, which is proportional to the true cost when fixed cost is zero and marginal cost is constant. For $W_2$, the optimal transport plan is determined by a pressureless potential flow from μ to ν as seen from a kinetic energy minimization formulation.
$W_2$ is much easier to analyze than $W_1$ and is often used in minimization objectives, but $W_1$ is usually more robust to outliers and noise. The optimal transport map between two absolutely continuous measures is unique for $W_2$, but generally not unique for $W_1$.
Computing Wasserstein metrics can be computationally expensive: the cost scales with the size of the finite metric space, and the empirical runtime of current algorithms scales super-quadratically, typically with an exponent between 2 and 2.5.
Hellinger distance $H(P, Q)$ [@Hellinger1909] between two probability measures absolutely continuous w.r.t. a reference probability measure μ is a bounded metric on the set of all probability measures absolutely continuous w.r.t. μ: let $\mathcal{P}_\mu = \{P : P \ll \mu, P(X) = 1\}$; then $H: \mathcal{P}_\mu^2 \mapsto [0, 1]$ is defined by $H^2(P, Q) = 2^{-1} \int_X \left(\sqrt{{dP}/{d\mu}} - \sqrt{{dQ}/{d\mu}}\right)^2 d\mu$. For random variables with PDFs, this becomes: $H^2(P, Q) = 1 - \int_{\mathbb{R}^n} \sqrt{f(x) g(x)} dx$. The Hellinger distance has analytical forms for many common parametric probability models, e.g. Gaussian, exponential, Beta, Poisson, Weibull. Mahalanobis distance $D_M(x, P)$ [@Mahalanobis1936] of a point from a probability measure is the Euclidean distance from the point to the mean after whitening, i.e. after a linear change of coordinates that turns the covariance matrix into the identity: $D_M^2(x, P) = (x-\mu)^T \Sigma^{-1} (x-\mu)$. In one dimension, it reduces to the distance from the mean in standard deviations. Mahalanobis distance $D_M(x, y)$ between two realizations of the same distribution P is the generalized Euclidean metric where the weight is the inverse covariance matrix: $D_M^2(x, y) = (x-y)^T \Sigma^{-1} (x-y)$. Bhattacharyya distance $D_B(P, Q)$ [@Bhattacharyya1943] between two probability measures is a discrepancy function related to the Hellinger distance such that $H^2 = 1 - \exp(-D_B)$. For random variables with PDFs, this means $D_B(P, Q) = - \ln \int_{\mathbb{R}^n} \sqrt{f(x) g(x)} dx$. The Bhattacharyya distance is not a metric because it does not satisfy the triangle inequality.
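For distributions on a common finite set, both quantities come from the Bhattacharyya coefficient $BC = \sum_i \sqrt{p_i q_i}$, with $H^2 = 1 - BC$ and $D_B = -\ln BC$; this is the discrete analogue of the PDF formulas above. A sketch with hypothetical function names and example vectors.

```python
import math

def bhattacharyya_coefficient(p, q):
    """BC = sum_i sqrt(p_i * q_i) for probability vectors on a common finite set."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    """H(P, Q) with H^2 = 1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2; lies in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

def bhattacharyya_distance(p, q):
    """D_B = -ln BC (infinite for disjoint supports, where BC = 0)."""
    return -math.log(bhattacharyya_coefficient(p, q))

p, q = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
h = hellinger(p, q)
d_b = bhattacharyya_distance(p, q)
# Identities from the text: h**2 == 1 - BC == 1 - exp(-d_b), up to rounding
```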
Total variation distance between two probability measures is the sup-norm of their difference over events: $\delta(P, Q) = |P - Q|_\infty = \sup_{A \in \Sigma} |P(A) - Q(A)|$. If the underlying set Ω is countable, then the total variation distance equals half of their $l_1$ distance, viewing the measures as functions on the underlying set: $\delta(P, Q) = 2^{-1} |P - Q|_1 = 2^{-1} \sum_{\omega \in \Omega} |P(\omega) - Q(\omega)|$.
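On a small finite set the supremum over all $2^n$ events can be enumerated, which gives a direct check of the half-$l_1$ identity; the supremum is attained at the event where P exceeds Q. A sketch with hypothetical helper names, using exact fractions so the two sides compare equal.

```python
from fractions import Fraction
from itertools import combinations

def tv_bruteforce(P, Q):
    """sup over all events A of |P(A) - Q(A)|, enumerating all subsets of the
    finite underlying set (P and Q are dicts with the same keys)."""
    omega = list(P)
    best = 0
    for r in range(len(omega) + 1):
        for A in combinations(omega, r):
            best = max(best, abs(sum(P[w] for w in A) - sum(Q[w] for w in A)))
    return best

def tv_half_l1(P, Q):
    return sum(abs(P[w] - Q[w]) for w in P) / 2

P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
# Both equal Fraction(1, 3); the maximizing event is {"a", "b"}
```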
Kullback-Leibler divergence $D_\text{KL}(P \mid\mid Q)$ of Q from P, relative entropy of Q w.r.t. P, or discrimination information for P over Q [@Kullback and Leibler, 1951], where reference P and query Q are probability measures on a Euclidean space, is the expected number of extra bits required to code samples from P when using a code based on Q: $D_\text{KL}(P \mid\mid Q) = \mathbb{E}_P [\log_2 f]$ if $P \ll Q$, where $f = \text{d}P/\text{d}Q$, i.e. $P = f Q$; otherwise, $D_\text{KL}(P \mid\mid Q) = \infty$. For continuous distributions with densities p and q, $D_\text{KL}(P \mid\mid Q) = \int_X - p(x) \log_2 \frac{q(x)}{p(x)} dx$. For discrete distributions with masses $P_i$ and $Q_i$, $D_\text{KL}(P \mid\mid Q) = \sum_{i \in I} -P_i \log_2 \frac{Q_i}{P_i}$. K-L divergence is non-degenerate (zero if $P = Q$, i.e. $f = 1$ a.e. $P$, and positive otherwise), but it is not a metric because it is asymmetric and does not satisfy the triangle inequality. Instead, K-L divergence is an upper bound for half the squared $L^1$ distance (Pinsker's inequality): $D_\text{KL}(P \mid\mid Q) \ge \frac{1}{2} \|P - Q\|_1^2$. With P fixed, minimizing K-L divergence over Q is equivalent to maximizing the expected log-likelihood $\mathbb{E}_P \log q$. When computing K-L divergence for distributions with different supports, it is common in practice to smooth them onto the union of supports.
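For discrete distributions the divergence is a finite sum, with two edge cases: terms with $P_i = 0$ contribute nothing, and a point with $P_i > 0$ but $Q_i = 0$ makes the divergence infinite (P is then not absolutely continuous w.r.t. Q). A sketch in bits with a hypothetical function name; it also spot-checks the Pinsker-type lower bound from the text on one example.

```python
import math

def kl_bits(p, q):
    """D_KL(P || Q) in bits for probability vectors on a common finite set."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue                  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf           # P charges a Q-null point: P is not << Q
        total += pi * math.log2(pi / qi)
    return total

p, q = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
d = kl_bits(p, q)
l1 = sum(abs(a - b) for a, b in zip(p, q))
# Pinsker-type bound: d >= 0.5 * l1**2; note kl_bits(p, q) != kl_bits(q, p) in general
```

The asymmetry also shows up in the infinite case: `kl_bits([1.0, 0.0], [0.5, 0.5])` is 1 bit, while swapping the arguments gives infinity.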
Books on measure theory: