Basic properties:

- Linearity: $\mathbb{E}[aX+bY|Z=z] = a \mathbb{E}[X|Z=z] + b \mathbb{E}[Y|Z=z]$
- Taking out what is known: $\mathbb{E}[Y \phi(X)|X=x] = \phi(x) \mathbb{E}[Y|X=x]$

Tower property (Law of Iterated Expectations): $$\mathbb{E} [ \mathbb{E}(Y|X) ] = \mathbb{E}Y$$

Conditional variance identity: $$\mathrm{Var}Y = \mathbb{E}[\mathrm{Var}(Y|X)] + \mathrm{Var}[ \mathbb{E}(Y|X) ]$$

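Generalization of the conditional variance identity: for any function $\eta$, $$\mathbb{E} [Y-\eta(X)]^2 = d_2^2(\mathbb{E}(Y|X),\eta(X)) + \mathbb{E}[\mathrm{Var}(Y|X)]$$ where $d_2^2(U,V)$ is read here as the squared $L^2$ distance $\mathbb{E}(U-V)^2$. Taking $\eta(X) = \mathbb{E}Y$ recovers the conditional variance identity.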
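The three identities above can be checked exactly on a small discrete example. The sketch below is illustrative only: the joint PMF `joint` and the choice $\eta(x) = x$ are made-up assumptions, and $d_2^2(U,V)$ is taken to mean $\mathbb{E}(U-V)^2$.

```python
from fractions import Fraction as F

# Hypothetical joint PMF of (X, Y), stored as {(x, y): probability}.
joint = {
    (0, 0): F(1, 8), (0, 1): F(3, 8),
    (1, 0): F(1, 4), (1, 2): F(1, 4),
}

def marginal_x(joint):
    """Marginal PMF of X."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

def cond_moments(joint):
    """Return {x: (E[Y|X=x], Var(Y|X=x))}."""
    out = {}
    for x0, p_x in marginal_x(joint).items():
        m1 = sum(p * y for (x, y), p in joint.items() if x == x0) / p_x
        m2 = sum(p * y * y for (x, y), p in joint.items() if x == x0) / p_x
        out[x0] = (m1, m2 - m1 ** 2)
    return out

px = marginal_x(joint)
cm = cond_moments(joint)

mean_y = sum(p * y for (_, y), p in joint.items())
var_y = sum(p * y * y for (_, y), p in joint.items()) - mean_y ** 2

# Tower property: E[E(Y|X)] should equal E[Y].
tower = sum(px[x] * cm[x][0] for x in px)

# Conditional variance identity: Var Y = E[Var(Y|X)] + Var[E(Y|X)].
exp_cond_var = sum(px[x] * cm[x][1] for x in px)
var_cond_mean = sum(px[x] * cm[x][0] ** 2 for x in px) - tower ** 2

# Generalization with the assumed choice eta(x) = x.
eta = lambda x: x
mse = sum(p * (y - eta(x)) ** 2 for (x, y), p in joint.items())
dist2 = sum(px[x] * (cm[x][0] - eta(x)) ** 2 for x in px)

print(tower == mean_y)
print(var_y == exp_cond_var + var_cond_mean)
print(mse == dist2 + exp_cond_var)
```

Because the arithmetic uses `Fraction`, the comparisons are exact rather than approximate.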

Hierarchical models have two advantages:

- Complicated processes may be modeled by a sequence of relatively simple models placed in a hierarchy.
- Hierarchical models often make calculations easier.

In theory, any hierarchy with more than two stages can be treated as a two-stage hierarchy.

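Hierarchical models lead to mixture distributions: the marginal distribution of a variable is obtained by averaging its conditional distribution over the distribution of the variable it depends on.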
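As an illustration (the particular hierarchy is an assumed example, not taken from the text above), consider $Y|N \sim \mathrm{Binomial}(N, p)$ with $N \sim \mathrm{Poisson}(\lambda)$. Summing the conditional PMF over $N$ gives the marginal of $Y$, which is known to be $\mathrm{Poisson}(\lambda p)$. A minimal numerical sketch, with the sum over $N$ truncated at a hypothetical `n_max`:

```python
import math

def poisson_pmf(k, lam):
    """Poisson PMF, computed on the log scale to avoid huge factorials."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binom_pmf(k, n, p):
    """Binomial(n, p) PMF at k."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def marginal_pmf(y, lam, p, n_max=80):
    """P(Y=y) = sum_n P(Y=y | N=n) P(N=n), truncated at n_max (an assumption)."""
    return sum(binom_pmf(y, n, p) * poisson_pmf(n, lam)
               for n in range(y, n_max + 1))

lam, p = 4.0, 0.3  # illustrative parameter values

# Largest gap between the mixture marginal and the Poisson(lam * p) PMF.
max_gap = max(abs(marginal_pmf(y, lam, p) - poisson_pmf(y, lam * p))
              for y in range(10))
print(max_gap)
```

The gap should be at the level of floating-point rounding, since the truncated tail of the sum is negligible for these parameter values.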

Independence originates from the idea that, for two events, the occurrence of either one does not affect the other.
In probability theory, independence means that knowing whether one event occurred does not change the probability of the other.
Symbolically, $P(A|B) = P(A)$ and $P(B|A) = P(B)$, which, when $P(A) > 0$ and $P(B) > 0$, is equivalent to $P(A \cap B) = P(A) \cdot P(B)$.
The latter equation is then generalized to the standard definition of **statistical independence** of random variables (which is weaker than the original idea):

$$f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y$$

Formally, **dependence** refers to any situation in which random variables do not satisfy the condition of statistical independence.
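For discrete random variables, the factorization condition can be checked point by point. The sketch below uses two fair dice as an assumed example; `joint_pmf` and `is_independent` are hypothetical helper names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(pairs):
    """Joint PMF of (x, y) when every listed outcome is equally likely."""
    w = Fraction(1, len(pairs))
    pmf = {}
    for xy in pairs:
        pmf[xy] = pmf.get(xy, 0) + w
    return pmf

def is_independent(pmf):
    """Check that the joint PMF equals the product of marginals everywhere."""
    fx, fy = {}, {}
    for (x, y), p in pmf.items():
        fx[x] = fx.get(x, 0) + p
        fy[y] = fy.get(y, 0) + p
    # Points missing from pmf have joint probability 0.
    return all(pmf.get((x, y), 0) == fx[x] * fy[y] for x in fx for y in fy)

rolls = list(product(range(1, 7), repeat=2))
two_dice = joint_pmf(rolls)                              # (first die, second die)
die_and_sum = joint_pmf([(a, a + b) for a, b in rolls])  # (first die, total)

print(is_independent(two_dice))      # the two dice factorize
print(is_independent(die_and_sum))   # the first die and the total do not
```

The second pair fails the check because, for instance, a first roll of 1 together with a total of 12 has probability zero while both marginal probabilities are positive.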

Definition: Random vectors $\mathbf{X}_1, \cdots, \mathbf{X}_n$ have joint PDF (or PMF) $f(\mathbf{x}_1, \cdots, \mathbf{x}_n)$ and marginal PDFs (or PMFs) $f_{\mathbf{X}_i} (\mathbf{x}_i)$. They are called **mutually independent random vectors** if

$$f(\mathbf{x}_1, \cdots, \mathbf{x}_n) = \prod_{i=1}^n f_{\mathbf{X}_i}(\mathbf{x}_i)$$

Theorem: Random vectors $\mathbf{X}_1, \cdots, \mathbf{X}_n$ are mutually independent, iff there exist functions $g_1, \cdots, g_n$, s.t. $f(\mathbf{x}_1, \cdots, \mathbf{x}_n) = \prod_{i=1}^n g_i(\mathbf{x}_i)$

Theorem: For mutually independent random vectors $\mathbf{X}_1, \cdots, \mathbf{X}_n$ and any functions $g_1, \cdots, g_n$, random variables (or vectors) $g_1(\mathbf{X}_1), \cdots, g_n(\mathbf{X}_n)$ are also mutually independent.

Properties:

- For mutually independent random vectors $\mathbf{X}_1, \cdots, \mathbf{X}_n$ and any real-valued functions $g_1, \cdots, g_n$ for which the expectations exist,

$$\mathbb{E}\left[ \prod_{i=1}^n g_i(\mathbf{X}_i) \right] = \prod_{i=1}^n \mathbb{E}\, g_i(\mathbf{X}_i)$$
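The product rule for expectations can be verified exactly in a small case with $n = 2$. In this sketch the marginal PMFs `fx`, `fy` and the functions `g1`, `g2` are arbitrary illustrative choices; independence holds by construction because the joint PMF is built as the product of the marginals.

```python
from fractions import Fraction as F

fx = {-1: F(1, 4), 0: F(1, 2), 2: F(1, 4)}  # hypothetical marginal PMF of X
fy = {1: F(1, 3), 3: F(2, 3)}               # hypothetical marginal PMF of Y

def g1(x):
    return x * x

def g2(y):
    return y + 1

def expect(pmf, g):
    """E[g(V)] for a univariate PMF given as {value: probability}."""
    return sum(p * g(v) for v, p in pmf.items())

# Independent by construction: the joint PMF is the product of the marginals.
joint = {(x, y): px * py for x, px in fx.items() for y, py in fy.items()}

lhs = sum(p * g1(x) * g2(y) for (x, y), p in joint.items())  # E[g1(X) g2(Y)]
rhs = expect(fx, g1) * expect(fy, g2)                        # E g1(X) * E g2(Y)
print(lhs, rhs)
```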

**Covariance** of two random variables generalizes the variance of a real random variable, since $\mathrm{cov}(X,X) = \mathrm{Var}X$.
$$\mathrm{cov}(X,Y) = \mathbb{E}[(X-\mu_X)(Y-\mu_Y)]$$

**Correlation** of two random variables is their covariance standardized by their standard deviations; it indicates the average tendency of the conditional mean of one random variable to change linearly with the other.
$$\mathrm{cor}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y}$$

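Correlation alone is not enough to determine the dependence structure between random variables: independent random variables with finite variances are uncorrelated, but uncorrelated random variables need not be independent.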
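A standard counterexample makes the limitation concrete. In the sketch below (all names and distributions are illustrative assumptions), $X$ is uniform on $\{-1, 0, 1\}$; $Y = 2X + 1$ gives correlation 1, while $Y = X^2$ gives covariance exactly 0 even though $Y$ is a function of $X$.

```python
from fractions import Fraction as F
import math

def expect(pmf, g):
    """E[g(X, Y)] under a joint PMF given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov(pmf):
    """cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]."""
    mx = expect(pmf, lambda x, y: x)
    my = expect(pmf, lambda x, y: y)
    return expect(pmf, lambda x, y: (x - mx) * (y - my))

def cor(pmf):
    """cor(X, Y) = cov(X, Y) / (sigma_X * sigma_Y), returned as a float."""
    mx = expect(pmf, lambda x, y: x)
    my = expect(pmf, lambda x, y: y)
    var_x = expect(pmf, lambda x, y: (x - mx) ** 2)
    var_y = expect(pmf, lambda x, y: (y - my) ** 2)
    return float(cov(pmf)) / math.sqrt(var_x * var_y)

# X is uniform on {-1, 0, 1} in both cases.
linear = {(x, 2 * x + 1): F(1, 3) for x in (-1, 0, 1)}   # Y = 2X + 1
quadratic = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}    # Y = X^2

cor_linear = cor(linear)
cov_quadratic = cov(quadratic)

# Dependence of (X, X^2): compare P(X=0, Y=0) with P(X=0) * P(Y=0).
p_joint = quadratic[(0, 0)]
p_x0 = sum(p for (x, _), p in quadratic.items() if x == 0)
p_y0 = sum(p for (_, y), p in quadratic.items() if y == 0)

print(cor_linear, cov_quadratic, p_joint == p_x0 * p_y0)
```

The zero covariance in the quadratic case reflects the symmetry of $X$ about 0, not an absence of dependence: the joint probability at $(0, 0)$ differs from the product of the marginals.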