Inference of a causal connection between the treatment and the outcome.

Problem Setup

Underlying problem:

  • A target population for the treatment of interest
  • Randomly selected units who are eligible for treatment
  • Randomly selected units who are treated
  • Nontreated units who serve as a potential control group


  • Outcome variable, $y$
    • Potential outcome in the treated state, $y_1$
    • Potential outcome in the untreated state, $y_0$
  • Observable variables, $\mathbf{x}$
  • Treatment variable (binary), $D$


  • Treatment effect on a unit, $\Delta_i = y_{1,i} - y_{0,i}$
  • Average treatment effect, $ATE = \mathbb{E}[y_1 - y_0]$
  • Average treatment effect on the treated, $ATET = \mathbb{E}[y_1 - y_0| D=1]$

Treatment effect on a unit is the (hypothetical) difference between the unit's outcomes in the treated and untreated states. Average treatment effect (ATE) is the population mean of the unit treatment effect. Average treatment effect on the treated (ATET) is the mean of the unit treatment effect over the treated subpopulation.
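Since only one of $y_{1,i}$ and $y_{0,i}$ is observed for any unit, these quantities can be computed directly only in a simulation. The sketch below assumes a hypothetical data-generating process (scalar $x$ uniform on $[0,1]$, unit gain $\Delta_i = 1 + 3x_i$, treatment probability $P[D=1|x] = x$) in which units with larger gains are more likely to be treated, so the population ATET (3) exceeds the population ATE (2.5).

```python
import random

random.seed(0)
N = 10_000
# Hypothetical data-generating process; both potential outcomes are known
# here only because the data are simulated.
x = [random.random() for _ in range(N)]                   # x ~ U(0, 1)
y0 = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]    # untreated outcome
y1 = [y0i + 1.0 + 3.0 * xi for y0i, xi in zip(y0, x)]     # gain Delta_i = 1 + 3 x_i
D = [1 if random.random() < xi else 0 for xi in x]        # P[D = 1 | x] = x

delta = [a - b for a, b in zip(y1, y0)]                   # unit effects Delta_i
ate = sum(delta) / N                                      # mean over everyone
atet = sum(d for d, di in zip(delta, D) if di == 1) / sum(D)  # mean over treated
print(f"ATE = {ate:.3f}, ATET = {atet:.3f}")              # near 2.5 and 3.0
```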

Selection bias arises when the treatment variable is correlated with the error in the outcome equation. Selection on observables refers to correlation of omitted observable variables with treatment assignment; selection on unobservables refers to correlation of unobserved factors with both treatment assignment and outcome determination. Average selection bias is the difference between program participants and nonparticipants in the mean base-state (untreated) outcome, $\mathbb{E}[y_0|D=1] - \mathbb{E}[y_0|D=0]$.
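Average selection bias is exactly the gap between the naive comparison of observed means and ATET, because in any sample

$$\bar{y}_{1}^{D=1} - \bar{y}_{0}^{D=0} = \left(\bar{y}_{1}^{D=1} - \bar{y}_{0}^{D=1}\right) + \left(\bar{y}_{0}^{D=1} - \bar{y}_{0}^{D=0}\right).$$

A minimal numerical check, assuming a hypothetical simulation (scalar $x$ uniform on $[0,1]$, $P[D=1|x] = x$, and a base-state outcome $y_0$ that rises with $x$), so treated units would have higher outcomes even without treatment. The population selection bias in this design is $2(2/3 - 1/3) \approx 0.67$.

```python
import random

random.seed(1)
N = 10_000
# Hypothetical simulation: units with larger x have higher y0 AND are more
# likely to be treated, which creates selection bias.
x = [random.random() for _ in range(N)]
y0 = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]
y1 = [y0i + 1.0 + 3.0 * xi for y0i, xi in zip(y0, x)]
D = [1 if random.random() < xi else 0 for xi in x]
y = [a if d == 1 else b for a, b, d in zip(y1, y0, D)]    # observed outcome


def mean(values):
    return sum(values) / len(values)


treated = [i for i in range(N) if D[i] == 1]
control = [i for i in range(N) if D[i] == 0]

naive = mean([y[i] for i in treated]) - mean([y[i] for i in control])
atet = mean([y1[i] - y0[i] for i in treated])
# Average selection bias: base-state (y0) gap between participants and nonparticipants
bias = mean([y0[i] for i in treated]) - mean([y0[i] for i in control])
print(f"naive = {naive:.3f}, ATET = {atet:.3f}, bias = {bias:.3f}")
```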


Conditional independence assumption: conditional on observable variables, the potential outcomes are independent of treatment.

$$(y_0, y_1) \perp\!\!\!\perp D \mid \mathbf{x}$$


  1. This assumption holds when treatment is randomly assigned.
  2. The participation decision does not affect the distribution of potential outcomes.
  3. D may be treated as an exogenous variable.

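Ignorability assumption: conditional on observable variables, the untreated potential outcome $y_0$ is independent of treatment. This weaker version of the conditional independence assumption is enough, together with overlap, to identify ATET.

$$y_0 \perp\!\!\!\perp D \mid \mathbf{x}$$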


  • There is no omitted variable bias once x is included in the regression, and hence there will be no confounding.

Conditional mean independence assumption: a still weaker condition that restricts only the conditional mean of $y_0$.

$$\mathbb{E}[y_0 | D, \mathbf{x}] = \mathbb{E}[y_0 | \mathbf{x}]$$

Matching (Overlap) assumption: for each value of x there are both treated and nontreated cases.

$$0 < P[D = 1|\mathbf{x}] < 1$$
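For a discrete $\mathbf{x}$, the sample analogue of the overlap condition can be checked cell by cell. A minimal sketch with made-up data; `overlap_violations` is a hypothetical helper. For ATET only the cells with treated units but no untreated units (here $x = 3$) are a problem, since those treated units have no match.

```python
from collections import defaultdict


def overlap_violations(x, D):
    """Return the x values at which only treated or only untreated units occur."""
    counts = defaultdict(lambda: [0, 0])   # x value -> [n_untreated, n_treated]
    for xi, di in zip(x, D):
        counts[xi][di] += 1
    return sorted(v for v, (n0, n1) in counts.items() if n0 == 0 or n1 == 0)


# Hypothetical discrete covariate (e.g. an education category) and treatment
x = [1, 1, 2, 2, 2, 3, 3, 4]
D = [0, 1, 0, 1, 1, 1, 1, 0]
print(overlap_violations(x, D))  # → [3, 4]
```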

Stable unit treatment value assumption (SUTVA): the treatment received by one unit does not affect the potential outcomes of other units.

Model Specification

$$\begin{matrix} y_1 = \mu_1 (\mathbf{x}) + u_1, \text{ if } D = 1 \\ y_0 = \mu_0 (\mathbf{x}) + u_0, \text{ if } D = 0 \\ \mathbb{E}[u_1|\mathbf{x}] = \mathbb{E}[u_0|\mathbf{x}] = 0 \end{matrix}$$


  • Average treatment effect for units with characteristics $\mathbf{x}_i$, $\mu_1 (\mathbf{x}_i) - \mu_0 (\mathbf{x}_i)$.
  • Individual-specific benefit, $u_1 - u_0$


If assignment to treatment and control is random, ATET (which then equals ATE) can be estimated as the difference between the average outcomes of the treated and control groups.
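Random assignment makes $D$ independent of $(y_0, y_1)$, so $\mathbb{E}[y_0|D=1] = \mathbb{E}[y_0|D=0]$: selection bias vanishes and ATET equals ATE. A minimal sketch with a hypothetical fair-coin assignment and an illustrative outcome model whose population ATE (and ATET) is 2.5:

```python
import random

random.seed(2)
N = 20_000
# Hypothetical outcome model: gain 1 + 3x with x ~ U(0, 1), so ATE = ATET = 2.5
x = [random.random() for _ in range(N)]
y0 = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]
y1 = [y0i + 1.0 + 3.0 * xi for y0i, xi in zip(y0, x)]
D = [1 if random.random() < 0.5 else 0 for _ in range(N)]  # fair coin, unrelated to x, y0, y1

# Each unit reveals only the outcome of the state it is in
treated_y = [y1[i] for i in range(N) if D[i] == 1]
control_y = [y0[i] for i in range(N) if D[i] == 0]
diff_means = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"difference in means = {diff_means:.3f}")
```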

A widely used ATET estimator is the propensity score matching estimator (Rosenbaum and Rubin, 1983).

Other estimators include:

  • Differences-in-differences estimator;
  • IV estimator for local average treatment effect (LATE);
  • Control function estimator.

Propensity Score Matching Estimator


Exact matching: each treated unit is matched with untreated units that have the same value of the observable variables $\mathbf{x}$; this is applicable when $\mathbf{x}$ takes values in a finite discrete set.
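A minimal sketch of exact matching for a discrete $\mathbf{x}$: within each cell the mean untreated outcome stands in for the treated units' missing $y_0$, and cell differences are weighted by the share of treated units in the cell. The helper `atet_exact` and the toy data are hypothetical; dropping cells with no untreated units and renormalizing the weights is one possible convention, not the only one.

```python
from collections import defaultdict


def atet_exact(x, D, y):
    """Exact-matching ATET for a discrete covariate x."""
    cells = defaultdict(lambda: ([], []))  # x value -> (untreated ys, treated ys)
    for xi, di, yi in zip(x, D, y):
        cells[xi][di].append(yi)
    total, n_treated = 0.0, 0
    for y_c, y_t in cells.values():
        if y_t and y_c:  # cells without untreated units cannot be matched
            # Weight each cell's mean difference by its number of treated units
            total += len(y_t) * (sum(y_t) / len(y_t) - sum(y_c) / len(y_c))
            n_treated += len(y_t)
    return total / n_treated


x = [0, 0, 0, 1, 1, 1]
D = [1, 0, 0, 1, 1, 0]
y = [5.0, 2.0, 4.0, 9.0, 7.0, 5.0]
print(round(atet_exact(x, D, y), 3))  # → 2.667
```

Cell $x = 0$ contributes one treated unit with difference $5 - 3 = 2$, and cell $x = 1$ contributes two treated units with difference $8 - 5 = 3$, so the estimate is $(1 \cdot 2 + 2 \cdot 3)/3 = 8/3$.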

Inexact matching: a treated unit is matched with untreated units in a neighborhood of its observable variables $\mathbf{x}$. Typically $\mathbf{x}$ is mapped into a lower-dimensional measure, e.g. the scalar propensity score $p(\mathbf{x})$.

The average outcome for the untreated matched group identifies the mean counterfactual outcome for the treated group in the absence of the treatment.


  1. [No Selection Bias] Unobservable variables play no role in the treatment assignment and outcome determination.
  2. The matching rule satisfies the matching (overlap) assumption, or at least every treated unit has a matched untreated unit with a similar $\mathbf{x}$.

For ATET, the weaker form of assumption 2 suffices: each treated unit needs a matched untreated unit, i.e. $P[D = 1|\mathbf{x}] < 1$.

Propensity score

The propensity score is the probability of treatment conditional on the observable variables.

$$p(\mathbf{x}) = P[D = 1|\mathbf{X} = \mathbf{x}]$$

Implications of the conditional independence assumption:

  1. Conditional independence given the propensity score: $(y_0, y_1) \perp\!\!\!\perp D \mid p(\mathbf{x})$
  2. If the data justify matching on $\mathbf{x}$, then matching on the propensity score is also justified.

Propensity score estimator:

  1. Logit prediction: $\hat{p}(\mathbf{x}) = \Lambda(\mathbf{x}' \hat{\beta})$, where $\Lambda$ is the logistic CDF
  2. Probit prediction: $\hat{p}(\mathbf{x}) = \Phi(\mathbf{x}' \hat{\beta})$, where $\Phi$ is the standard normal CDF

Although logit and probit regressions are usually used to estimate propensity scores, a better statistical fit for the propensity score is more likely to result from a flexible parametric or nonparametric model.
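As a baseline, the logit first stage can be sketched with one covariate plus an intercept, fitting $\hat{\beta}$ by Newton-Raphson written out by hand so the example has no dependencies. The helper names `fit_logit` and `propensity` and the true coefficients $(-0.5, 1)$ are hypothetical; in practice a statistics package's logit routine, or a more flexible model, would be used. A probit version would replace $\Lambda$ with $\Phi$ and change the score and Hessian accordingly.

```python
import math
import random


def fit_logit(x, D, iters=25):
    """Logit of D on (1, x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score g and information matrix H = sum p(1-p) z z' for z = (1, x)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, D):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += di - p
            g1 += (di - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step beta <- beta + H^{-1} g, with the 2x2 inverse written out
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
    return b0, b1


def propensity(b, xi):
    """Fitted propensity score Lambda(b0 + b1 * x)."""
    return 1.0 / (1.0 + math.exp(-(b[0] + b[1] * xi)))


random.seed(3)
x = [random.gauss(0, 1) for _ in range(5000)]
# Hypothetical true model: P[D = 1 | x] = Lambda(-0.5 + x)
D = [1 if random.random() < propensity((-0.5, 1.0), xi) else 0 for xi in x]
b = fit_logit(x, D)
print(b)  # close to (-0.5, 1.0)
```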

Matching ATET estimator

The matching estimator compares treated and untreated outcomes at specific values of $\mathbf{x}$, so it estimates ATET without functional form assumptions on $\mu_0(\mathbf{x})$ and $\mu_1(\mathbf{x})$.

General formula:

$$\text{ATET}^M = \frac{1}{N_T} \sum_{D_i=1} \left( y_{1,i} - \sum_{j \in A_i(\mathbf{x})} w(i,j) y_{0,j} \right)$$


  • Treated unit index, $i$.
  • Number of treated units, $N_T$
  • Characteristics neighborhood of $\mathbf{x}_i$, $c(\mathbf{x}_i)$.
  • Comparison group for treated unit $i$ based on $\mathbf{x}_i$, $A_i(\mathbf{x})= \{ j \mid D_j = 0, \mathbf{x}_j \in c(\mathbf{x}_i) \}$.
  • Weight of the $j$-th matched unit of the $i$-th treated unit, $w(i,j)$. It's required that $\sum_{j\in A_i(\mathbf{x})} w(i,j) = 1$.
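The general formula translates almost line by line into code. In this sketch the hypothetical `weights(i)` callable returns $\{j: w(i,j)\}$ over $A_i(\mathbf{x})$, so matching methods differ only in how that callable is built; the toy data are made up.

```python
def atet_matching(y, D, weights):
    """Generic matching estimator of ATET.

    y, D    : outcomes and 0/1 treatment indicators, indexed by unit
    weights : callable; weights(i) returns {j: w(i, j)} for the comparison
              group A_i of treated unit i (hypothetical interface)
    """
    treated = [i for i, d in enumerate(D) if d == 1]
    total = 0.0
    for i in treated:
        w = weights(i)
        if abs(sum(w.values()) - 1.0) > 1e-9:
            raise ValueError(f"weights for unit {i} do not sum to one")
        # y_{1,i} minus the weighted counterfactual built from matched controls
        total += y[i] - sum(w_ij * y[j] for j, w_ij in w.items())
    return total / len(treated)


y = [3.0, 1.0, 2.0, 5.0, 4.0]
D = [1, 0, 0, 1, 0]
toy_weights = {0: {1: 0.5, 2: 0.5}, 3: {4: 1.0}}
print(atet_matching(y, D, toy_weights.get))  # → 1.25
```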

Matching methods:

  1. Exact matching: $\text{ATET}^M = \sum_k w_k ( \bar{y}_{1,k} - \bar{y}_{0,k} )$, where $k$ indexes the possible values of $\mathbf{x}$ and $w_k$ is the fraction of treated units with the $k$-th value of $\mathbf{x}$.
  2. Characteristics neighborhood
    1. Nearest-neighbor matching: $A_i(\mathbf{x}) = \{ \arg\min_{j: D_j = 0} d(\mathbf{x}_i, \mathbf{x}_j) \}$, where $d$ is the Euclidean distance.
    2. Propensity score matching: any matching based on $p(\mathbf{x})$ in lieu of $\mathbf{x}$.
      1. Interval matching (stratification): divide the range of the propensity score into blocks within which the treated and control units have (approximately) the same average propensity score; compute for each block the difference between the average outcomes of the treated and control units; the estimator is the average of the block differences, weighted by the fraction of treated units in each block. $\text{ATET}^S = \frac{1}{N_T} \sum_b N_{T,b} ( \bar{y}_{1,b} - \bar{y}_{0,b} )$
      2. Radius matching: the matched group for the $i$-th treated unit consists of all control units with estimated propensity scores within radius $r$ of $p_i$. $A_i( p(\mathbf{x}) ) = \{ j | d(p_i, p_j) < r \}$
  3. Weighting
    1. Kernel matching: $w(i,j) = \frac{ K(\mathbf{x}_j - \mathbf{x}_i) }{ \sum_{k \in A_i(\mathbf{x})} K(\mathbf{x}_k - \mathbf{x}_i) }$
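A sketch of the propensity-score versions of these rules, assuming the scores $p_i$ have already been estimated. With a scalar score the Euclidean distance reduces to $|p_i - p_j|$. The Gaussian kernel, the bandwidth `h`, the radius `r`, the block edges, the helper names and the toy numbers are all illustrative choices; treated units with an empty radius neighborhood, and blocks lacking either group, are simply dropped here.

```python
import math


def nn_weights(i, p, controls):
    """Nearest-neighbor matching on the propensity score: one control, weight 1."""
    j = min(controls, key=lambda c: abs(p[c] - p[i]))
    return {j: 1.0}


def radius_weights(i, p, controls, r):
    """Radius matching: equal weights on all controls with |p_j - p_i| < r."""
    A = [c for c in controls if abs(p[c] - p[i]) < r]
    return {c: 1.0 / len(A) for c in A}   # empty dict if no control is in range


def kernel_weights(i, p, controls, h):
    """Kernel matching, Gaussian kernel with bandwidth h (tiny h may underflow)."""
    k = {c: math.exp(-0.5 * ((p[c] - p[i]) / h) ** 2) for c in controls}
    total = sum(k.values())
    return {c: v / total for c, v in k.items()}


def atet(y, D, p, weight_fn, **params):
    """ATET^M with weights from weight_fn; unmatched treated units are dropped."""
    controls = [j for j, d in enumerate(D) if d == 0]
    diffs = []
    for i, d in enumerate(D):
        if d != 1:
            continue
        w = weight_fn(i, p, controls, **params)
        if w:
            diffs.append(y[i] - sum(w_ij * y[j] for j, w_ij in w.items()))
    return sum(diffs) / len(diffs)


def atet_stratified(y, D, p, edges):
    """Interval matching over blocks [edges[b], edges[b+1]), weighted by N_T,b."""
    total, n_treated = 0.0, 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        yt = [yi for yi, di, pi in zip(y, D, p) if di == 1 and lo <= pi < hi]
        yc = [yi for yi, di, pi in zip(y, D, p) if di == 0 and lo <= pi < hi]
        if yt and yc:   # blocks lacking either group are skipped
            total += len(yt) * (sum(yt) / len(yt) - sum(yc) / len(yc))
            n_treated += len(yt)
    return total / n_treated


# Toy data: units 0, 1, 2 are treated; units 3, 4, 5, 6 are controls
p = [0.22, 0.25, 0.60, 0.10, 0.30, 0.55, 0.70]
D = [1, 1, 1, 0, 0, 0, 0]
y = [3.0, 4.0, 6.0, 1.0, 2.0, 4.0, 5.0]

print(round(atet(y, D, p, nn_weights), 4))                  # → 1.6667
print(round(atet(y, D, p, radius_weights, r=0.09), 4))      # → 1.6667
print(round(atet(y, D, p, kernel_weights, h=0.01), 4))      # → 1.6667
print(round(atet_stratified(y, D, p, [0.0, 0.4, 1.0]), 4))  # → 1.8333
```

With this small radius and bandwidth, radius and kernel matching collapse to nearest-neighbor matching on the toy data; larger `r` or `h` average over more controls, trading bias for variance.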

Matching without replacement means that each observation in the comparison group is matched to at most one treated observation (the one for which it is the closest match). Matching with replacement allows a comparison observation to serve as the match for several treated observations.
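A greedy one-to-one nearest-neighbor sketch on a scalar propensity score shows the difference. The helper `nn_match` and the toy scores are hypothetical, and processing treated units in index order is an assumption of this sketch; without replacement the result depends on that order.

```python
def nn_match(p, D, replace=True):
    """Greedy nearest-neighbor matching on p; returns {treated index: control index}."""
    available = [j for j, d in enumerate(D) if d == 0]
    matches = {}
    for i in (i for i, d in enumerate(D) if d == 1):
        if not available:
            break  # no controls left: remaining treated units stay unmatched
        j = min(available, key=lambda c: abs(p[c] - p[i]))
        matches[i] = j
        if not replace:
            available.remove(j)  # each control can be used only once
    return matches


p = [0.30, 0.32, 0.31, 0.60]
D = [1, 1, 0, 0]
print(nn_match(p, D, replace=True))   # → {0: 2, 1: 2}
print(nn_match(p, D, replace=False))  # → {0: 2, 1: 3}
```

With replacement both treated units share the close control 2; without replacement the second treated unit is forced onto the distant control 3, which lowers match quality but lets more distinct controls contribute.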


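Balancing condition: $D \perp\!\!\!\perp \mathbf{x} \mid p(\mathbf{x})$. Conditional on the propensity score, treatment assignment is independent of $\mathbf{x}$, so within a narrow propensity score block the treated and control units should have similar distributions of $\mathbf{x}$; checking this is a standard diagnostic for the specification of $p(\mathbf{x})$.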
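A crude balancing check compares the mean of $\mathbf{x}$ for treated and control units within propensity score blocks. The sketch below assumes a hypothetical design in which the true score is $p(x) = x$ with $x$ uniform on $[0,1]$: unconditionally the treated have much larger $x$ (population gap $1/3$), but within ten equal-width blocks the gaps are close to zero. The helper `balance_by_stratum` is hypothetical; applied work typically uses standardized differences or formal tests rather than raw mean gaps.

```python
import random


def balance_by_stratum(x, D, p, edges):
    """Treated-minus-control mean of x in each block [edges[b], edges[b+1]).

    Returns None for a block that lacks treated or control units.
    """
    gaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xt = [xi for xi, di, pi in zip(x, D, p) if di == 1 and lo <= pi < hi]
        xc = [xi for xi, di, pi in zip(x, D, p) if di == 0 and lo <= pi < hi]
        gaps.append(sum(xt) / len(xt) - sum(xc) / len(xc) if xt and xc else None)
    return gaps


random.seed(4)
x = [random.random() for _ in range(10_000)]
p = x[:]   # hypothetical: the true propensity score is p(x) = x
D = [1 if random.random() < pi else 0 for pi in p]

overall = balance_by_stratum(x, D, p, [0.0, 1.0])[0]            # no conditioning
gaps = balance_by_stratum(x, D, p, [k / 10 for k in range(11)])  # ten blocks
print(f"overall gap = {overall:.3f}")
print([None if g is None else round(g, 3) for g in gaps])
```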
