Stochastic convergence refers to the convergence of random variables and of their distributions.
Convergence concepts are built on topologies, which are commonly specified by a metric; each mode of convergence thus corresponds to a metric space.
Each convergence mode below is characterized by its notation and an underlying metric space $(\mathcal{L}, d(\cdot,\cdot))$.
A simple sufficient condition for almost sure convergence is: $\forall \varepsilon >0,\ \sum_{n=1}^{\infty} P\{ |X_n-X| \geq \varepsilon \} < \infty$
Proof:
$$\begin{align} & X_n \xrightarrow{a.s.} X \\ \iff & P\{ \lim_{n \to \infty} X_n(\omega) \neq X(\omega) \} = 0 \\ \iff & \forall \varepsilon >0,\ P\Big( \bigcap_{n \geq 1} \bigcup_{m\geq n} \{ |X_m-X| \geq \varepsilon \} \Big) = 0 \\ \iff & \forall \varepsilon >0,\ \lim_{n\to \infty} P\Big( \bigcup_{m\geq n} \{ |X_m-X| \geq \varepsilon \} \Big) = 0 \quad \text{(continuity from above of } P \text{)} \\ \Longleftarrow\ & \forall \varepsilon >0,\ \lim_{n\to \infty} \sum_{m\geq n} P\{ |X_m-X| \geq \varepsilon \} = 0 \quad \text{(union bound)} \\ \iff & \forall \varepsilon >0,\ \sum_{n=1}^{\infty} P\{ |X_n-X| \geq \varepsilon \} < \infty \quad \text{(tail of a convergent series)} \end{align}$$
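As a quick illustration (a standard textbook example, not from the original notes), take independent $X_n$ with
$$P\{X_n = 1\} = \frac{1}{n^2}, \qquad P\{X_n = 0\} = 1 - \frac{1}{n^2}.$$
For any $\varepsilon \in (0,1]$ we get $\sum_{n} P\{ |X_n - 0| \geq \varepsilon \} = \sum_n \frac{1}{n^2} < \infty$, so $X_n \xrightarrow{a.s.} 0$. With probability $\frac{1}{n}$ instead, the series diverges and the criterion is silent; in fact, by the second Borel–Cantelli lemma, independence then forces $X_n = 1$ infinitely often with probability 1, so almost sure convergence fails (even though convergence in probability still holds).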
The typical definition of convergence in probability is: $\forall a>0,\ \forall \varepsilon>0,\ \exists N\in \mathbb{N}: \forall n>N,\ P\{ |X_n-X| \geq a \} < \varepsilon$.
An alternative definition is based on the metric space $\mathcal{L}_p$ (the subscript stands for "probability", not an exponent): the space of all r.v.'s equipped with the metric $d(X,Y) = \mathbb{E}\tfrac{|X-Y|}{1+|X-Y|}$. Note that $\mathbb{E}\tfrac{|\cdot|}{1+|\cdot|}$ is finite for every r.v. but is not a norm (it is not homogeneous), so this metric is not norm-induced. The $\mathcal{L}_p$ space is complete.
It can be shown using basic inequalities that the two definitions are equivalent, as sketched below.
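A minimal sketch of that equivalence, using only the monotonicity of $t \mapsto \frac{t}{1+t}$ and a split of the expectation on the event $\{ |X_n - X| \geq a \}$: for any $a > 0$,
$$\frac{a}{1+a}\, P\{ |X_n - X| \geq a \} \;\leq\; \mathbb{E}\frac{|X_n - X|}{1 + |X_n - X|} \;\leq\; P\{ |X_n - X| \geq a \} + \frac{a}{1+a}.$$
The left inequality shows that convergence in the metric forces $P\{ |X_n - X| \geq a \} \to 0$ for every $a > 0$; the right inequality gives the converse, since $\frac{a}{1+a}$ can be made arbitrarily small.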
Stochastic order notation
We denote $X_n = o_p(1)$ if the sequence of random variables $\{ X_n \}$ converges to 0 in probability. Symbolically, $X_n \overset{p}{\to} 0$.
We denote $X_n = O_p(1)$ if the sequence of random variables $\{ X_n \}$ is uniformly bounded in probability. Symbolically, $\forall \varepsilon >0, \exists M>0: \limsup_{n\to\infty} P \{ \lvert X_n \rvert > M \} \leq \varepsilon$
It can be proved that the manipulation rules for the order-in-probability notation directly parallel those for other big-O notations. For example,
$$o_p(1) + o_p(1) = o_p(1), \quad o_p(1) + O_p(1) = O_p(1), \quad O_p(1) + O_p(1) = O_p(1)$$
$$o_p(1)\, o_p(1) = o_p(1), \quad o_p(1)\, O_p(1) = o_p(1), \quad O_p(1)\, O_p(1) = O_p(1)$$
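As a sample derivation (spelled out here; the notes only state the rules), take $o_p(1)\, O_p(1) = o_p(1)$: let $X_n = o_p(1)$ and $Y_n = O_p(1)$. Fix $\delta > 0$ and $\varepsilon > 0$, and choose $M > 0$ with $\limsup_{n} P\{ |Y_n| > M \} \leq \varepsilon$. Then
$$P\{ |X_n Y_n| \geq \delta \} \;\leq\; P\{ |X_n| \geq \delta / M \} + P\{ |Y_n| > M \},$$
so $\limsup_n P\{ |X_n Y_n| \geq \delta \} \leq \varepsilon$; since $\varepsilon$ was arbitrary, $X_n Y_n \overset{p}{\to} 0$.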
The $\mathcal{L}_r$-norm of a r.v. is $\lVert \cdot \rVert_r = ( \mathbb{E}|\cdot|^r )^{\frac{1}{r}}$, for $r \geq 1$.
The $\mathcal{L}_r$ space of r.v.'s with finite $\mathcal{L}_r$-norm is almost a metric space: the metric $d(X,Y) = \lVert X - Y \rVert_r$ satisfies nonnegativity, symmetry, and the triangle inequality (Minkowski's inequality), but positivity fails, since $d(X,Y) = 0$ only implies $X = Y$ almost surely.
If we treat equivalence classes $[X] = \{ Y \in \mathcal{L}_r \mid Y=X \text{ almost surely} \}$ as fundamental points in the $\mathcal{L}_r$ space, then the $\mathcal{L}_r$ space becomes a metric space.
It can be shown that $\mathcal{L}_r (\Omega,\mathcal{F},P)$ is complete (the Riesz–Fischer theorem), and $\mathcal{L}_2 (\Omega,\mathcal{F},P)$ is a Hilbert space with inner product $\langle X, Y \rangle = \mathbb{E}[XY]$.
Hierarchy of r.v. spaces: $\mathcal{L}_p \supsetneq \mathcal{L}_1 \supsetneq \mathcal{L}_2 \supsetneq \cdots \supsetneq \mathcal{L}_{\infty}$, where $\mathcal{L}_p$ is the space of all r.v.'s from the convergence-in-probability metric above; on a probability space, a larger exponent gives a smaller space.
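The inclusions follow from Lyapunov's inequality (a consequence of Jensen's inequality): for $1 \leq r \leq s$,
$$\lVert X \rVert_r \leq \lVert X \rVert_s,$$
so a finite higher-order norm forces finite lower-order norms. Strictness can be checked on $\Omega = (0,1)$ with Lebesgue measure: $X(\omega) = \omega^{-1/s}$ satisfies $\mathbb{E}|X|^r = \int_0^1 \omega^{-r/s}\, d\omega < \infty$ exactly when $r < s$, so $X \in \mathcal{L}_r \setminus \mathcal{L}_s$.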
Convergence of distribution functions is pointwise convergence to some distribution function, except possibly at that function's discontinuity points. The underlying metric space (for each fixed evaluation point) is $([0,1],|\cdot|)$. Denoted as $\mathbf{X}_n \Rightarrow \mathbf{X}$.
The corresponding sequence of random variables is said to converge weakly, or to converge in distribution.
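Why discontinuity points must be excluded (a standard example): let $X_n \equiv \frac{1}{n}$ be the point mass at $\frac{1}{n}$, so $X_n \to 0$ in every stronger mode. The distribution functions satisfy
$$F_n(x) = \mathbf{1}\{x \geq 1/n\} \longrightarrow \begin{cases} 0, & x \leq 0, \\ 1, & x > 0, \end{cases}$$
which disagrees with the limit distribution function $F(x) = \mathbf{1}\{x \geq 0\}$ only at $x = 0$, the single discontinuity point of $F$. Excluding that point, $X_n \Rightarrow 0$ as expected.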
Note:
A sequence of random vectors converges in distribution iff all of its one-dimensional projections converge in distribution (the Cramér–Wold device): $\mathbf{X}_n \Rightarrow \mathbf{X}$ iff $\mathbf{t}^{\top} \mathbf{X}_n \Rightarrow \mathbf{t}^{\top} \mathbf{X}$ for every fixed vector $\mathbf{t}$.
Definition (the portmanteau theorem): Let $S$ be a metric space with its Borel σ-algebra $\Sigma$. We say that a sequence of probability measures $\{ P_n \}$ on $(S, \Sigma)$ converges weakly to the probability measure $P$ if and only if any of the following equivalent conditions is true:
- $\int_S f \, dP_n \to \int_S f \, dP$ for every bounded continuous $f: S \to \mathbb{R}$;
- $\limsup_n P_n(F) \leq P(F)$ for every closed set $F \subseteq S$;
- $\liminf_n P_n(G) \geq P(G)$ for every open set $G \subseteq S$;
- $P_n(A) \to P(A)$ for every $A \in \Sigma$ with $P(\partial A) = 0$ (a continuity set of $P$).
In the common case when $S=\mathbb{R}$ with its usual topology, weak convergence of probability measures is equivalent to convergence of the corresponding distribution functions as defined above.
Weak convergence of probability measures is denoted as $P_n \Rightarrow P$.
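A small simulation sketch of convergence of distribution functions (Python with NumPy/SciPy assumed; the function name, grid points, and sample sizes are illustrative choices, not from the notes): by the central limit theorem, the CDF of a standardized sum of i.i.d. uniforms converges pointwise to the standard normal CDF.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def standardized_sum_cdf(n, x, reps=100_000):
    """Monte Carlo estimate of F_n(x) = P(S_n <= x), where S_n is the
    standardized sum of n iid Uniform(0,1) variables."""
    u = rng.uniform(size=(reps, n))
    # Uniform(0,1) has mean 1/2 and variance 1/12.
    s = (u.sum(axis=1) - n / 2) / np.sqrt(n / 12)
    return (s <= x).mean()

# F_n(x) approaches the standard normal CDF Phi(x) as n grows (CLT),
# i.e. convergence in distribution: pointwise convergence of CDFs.
for x in [-1.0, 0.0, 1.0]:
    for n in [1, 5, 50]:
        print(f"x={x:+.1f} n={n:3d}  F_n(x)~{standardized_sum_cdf(n, x):.4f}"
              f"  Phi(x)={norm.cdf(x):.4f}")
```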
Relations of convergence modes
Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution; $\mathcal{L}_r$ convergence also implies convergence in probability (by Markov's inequality). As for almost sure convergence versus $\mathcal{L}_r$ convergence: neither of them implies the other.
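Two standard counterexamples behind that last non-implication (spelled out here for completeness): On $([0,1], \mathcal{B}, \text{Lebesgue})$, the "typewriter" sequence $X_n = \mathbf{1}_{I_n}$, with $I_1 = [0,1],\ I_2 = [0,\tfrac12],\ I_3 = [\tfrac12,1],\ I_4 = [0,\tfrac14], \ldots$ sweeping intervals of shrinking length, satisfies $P\{ |X_n| \geq \varepsilon \} = |I_n| \to 0$, so $X_n \overset{p}{\to} 0$ and $X_n \to 0$ in every $\mathcal{L}_r$; yet every $\omega$ lies in infinitely many $I_n$, so $X_n(\omega) \not\to 0$ for any $\omega$ and almost sure convergence fails. Conversely, $Y_n = n\,\mathbf{1}_{(0,1/n]}$ converges to 0 almost surely (each $\omega > 0$ is eventually outside $(0,1/n]$), yet $\mathbb{E}|Y_n|^r = n^{r-1} \not\to 0$ for every $r \geq 1$, so $\mathcal{L}_r$ convergence fails.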
Convergence of expectation is defined as: $$\lim_{n\to\infty} \mathbb{E}[Z_n - Z] = 0,$$ i.e., $\mathbb{E}Z_n \to \mathbb{E}Z$ by linearity (when both expectations exist).
Convergence of expectation is related to convergence in $\mathcal{L}_1$ but much weaker, since $|\mathbb{E}[Z_n - Z]| \leq \mathbb{E}|Z_n - Z|$. Still, it is not implied by convergence in probability. Counterexamples typically have an unbounded heavy tail. Consider the sequence $Z_n$ which takes the value $n$ with probability $\frac{1}{n}$ and equals $0$ otherwise. This sequence converges in probability to the degenerate r.v. $0$, but $\mathbb{E}Z_n = 1$ for every $n$, so the expectations converge to $1$, not $0$.
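A quick simulation sketch of this counterexample (NumPy assumed; the sample counts are illustrative): the event $\{Z_n = n\}$ becomes rare, pinning $Z_n$ near 0 in probability, while its expectation stays at 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Z_n = n with probability 1/n, else 0.
for n in [10, 100, 10_000]:
    reps = 1_000_000
    z = np.where(rng.random(reps) < 1.0 / n, n, 0)
    # P(|Z_n| >= eps) = 1/n -> 0: convergence in probability to 0 ...
    print(f"n={n:6d}  P(Z_n != 0)~{(z != 0).mean():.5f}", end="")
    # ... but E[Z_n] = n * (1/n) = 1 for every n: expectations stay at 1.
    print(f"  E[Z_n]~{z.mean():.3f}")
```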