Applied Econometrics

Microeconometrics [@Cameron2005]

Online Resources on Cameron:2005

ECON 615 Final Cheatsheet

Research Design

Research design is the overall strategy for the collection, measurement, and analysis of data in the social sciences.

Types of research designs: (Ordered by levels of evidence as in evidence-based practices/medicine.)

Experiment: randomized controlled trial (RCT); controlled trial;
Quasi-experiment (natural experiment);
Observational study: cohort study; case-control study; case study;

Data dimension:

Cross-sectional;
Longitudinal (time-series);
Panel: cohort study;

Assumptions in statistical models for causal inference [@Holland1986]:

The unit homogeneity assumption;
The assumptions of temporal stability and causal transience;

Model Specification

Parametric Methods:

Linear models (OLS, WLS, IV);
Maximum likelihood (ML) and nonlinear least squares (NLS) estimation;
Generalized method of moments (GMM);

Semiparametric Methods:

Least absolute deviation (LAD) estimator;
- Maximum score (MS) estimator [@Manski1975];
  - Smoothed maximum score estimator [@Horowitz1992];
- Censored LAD estimator [@Powell1981; @Powell1983];
Symmetrically censored least squares (SCLS) estimator [@Powell1986];
Partially Linear Model;
- Robinson Difference Estimator [@Robinson1988] (16.5);
Single Index Models (9.7.4);
Generalized Additive Models (GAM);

Nonparametric Methods:

Kernel Density Estimation;
Conditional Density Estimation;
Nonparametric Regression;

Models for Cross-Section Data

Discrete Outcome/Choice Models

Binary outcome models:

MLEs as latent variable models;
1. Linear probability model;
2. Logit (logistic regression) model;
3. Probit model;
Grouped and Aggregated Data: Minimum chi-square estimator;

Multinomial outcome models:

Unordered outcomes:
1. conditional logit (CL), multinomial logit (MNL);
  - Independence of Irrelevant Alternatives;
2. nested logit (NL), three-level nested logit;
3. multinomial probit (MNP);
Ordered outcomes, and sequential decision:
1. ordered logit, ordered probit;
Specification test:
1. Likelihood ratio (LR);
2. Hausman test;
3. Hausman-McFadden test;
4. Small-Hsiao pseudo-likelihood ratio test;

Choice-based sampling: weighted MLE;

Model selection:

Akaike information criterion (AIC);
Bayesian information criterion (BIC);

Sample Selection Models

Sample Selection Models:

Tobit model (Censored model):
- MLE;
- Two-step estimator [Ahn and Powell, 1993];
Bivariate Sample Selection Model (Type 2 Tobit Model):
- Heckman two-step estimator (Heckman, 1979);
Roy models (Type 5 Tobit Model)

Simultaneous equations models:

simultaneous equations Tobit model;
simultaneous equations Probit model;
coherency condition;

Specification analysis:

Heteroscedasticity, serial correlations, and nonnormality;
- Nelson test, 1981;
- Hausman test;

Duration and Count Data Models

Duration Regression Models:

Proportional Hazard;
Left Censoring;
Markov Chain Models;

Count Data Models:

Poisson and Negative Binomial Models;
Simulated Maximum Likelihood (SML);

Count data models suit outcomes that take non-negative integer values not much greater than zero. They respect the lower bound and the discreteness of the variable without adding much complexity. Under over-dispersion (variance-to-mean ratio greater than one), the negative binomial and quasi-Poisson models are alternatives to the Poisson model: their variances increase quadratically and linearly in the mean, respectively. The binomial distribution is used when the variable is also bounded above.
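
The over-dispersion check described above can be sketched numerically. This is a minimal illustration on simulated data, not part of the original notes: the mean `mu`, the dispersion parameter `alpha`, and the helper `dispersion` are hypothetical, and the negative binomial is generated as a gamma-Poisson mixture (the NB2 form, with variance `mu + alpha * mu**2`).

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 200_000, 3.0   # hypothetical sample size and mean
alpha = 0.5            # hypothetical NB2 dispersion parameter

# Poisson counts: the variance equals the mean.
y_pois = rng.poisson(mu, size=n)

# Negative binomial counts as a gamma-Poisson mixture.
# The gamma rate has mean mu and variance alpha * mu**2,
# so Var(y) = mu + alpha * mu**2 (quadratic in the mean).
lam = rng.gamma(shape=1 / alpha, scale=alpha * mu, size=n)
y_nb = rng.poisson(lam)

def dispersion(y):
    """Sample variance-to-mean ratio; close to 1 under a Poisson model."""
    return y.var(ddof=1) / y.mean()

print("Poisson dispersion:", dispersion(y_pois))
print("Negative binomial dispersion:", dispersion(y_nb))
```

In theory the ratio is 1 for the Poisson sample and `1 + alpha * mu` for the mixture; a ratio well above one in real data is the usual cue to move away from the plain Poisson model.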

Models for Panel Data

Short Panels.

Multilevel models (hierarchical linear models, nested models) are statistical models with coefficients organized at more than one level (group). Depending on the model coefficients (effects), multilevel models can be classified into random effects models (variance components models), fixed effects models, and mixed models. Random effects are estimated with partial pooling, i.e. shrinkage (best linear unbiased prediction, BLUP); fixed effects are estimated using least squares or maximum likelihood. A statistical model that contains both fixed effects and random effects is called a mixed model.

Fixed effects are constant across individuals, while random effects vary. Fixed effects are treated as nuisance parameters; the marginal effects are of sole interest.

Static Panel Data Models

Simple Regression Models with Variable Intercepts: (dep = time-inv + time-var + individual + error)

$$y_{it} = z_i α + x_{it} β + u_i + ε_{it}$$

Pooled Model: disregard time periods;
- Pooled OLS estimator: OLS over panel;
Individual-specific Effects Model:
- Random Effects (RE) model (random intercept/components model; equicorrelated model): individual-specific effect uncorrelated with regressors, $\text{corr}(x_{it}, u_i) = 0$;
  - Between Estimator: OLS over individual time-averages;
  - Random Effects Estimator: Feasible GLS over panel; MLE;
- Fixed Effects (FE) Model: individual-specific effect correlates with regressors, $\text{corr}(x_{it}, u_i) \ne 0$;
  - Within Estimator (Fixed Effects Estimator, Least-squares Dummy-variable (LSDV) Estimator, Covariance Estimator): OLS over panel after subtracting individual time-averages;
  - First Differences (FD) Estimator: OLS over panel after first-differences in time;
Specification Analysis:
- Individual-specific effect;
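
The estimators listed above can be compared on simulated data. The following is a minimal sketch under assumed values (a single time-varying regressor, no time-invariant regressor, a true slope `beta = 2`, and an individual effect that is correlated with the regressor, i.e. the fixed effects setting); the helper `slope` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 2000, 5, 2.0                  # hypothetical panel size and true slope

u = rng.normal(size=(N, 1))                # individual-specific effect u_i
x = 0.8 * u + rng.normal(size=(N, T))      # regressor correlated with u_i
y = beta * x + u + rng.normal(size=(N, T))

def slope(xv, yv):
    """OLS slope of yv on xv with an intercept (single regressor)."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())

def demean(a):
    """Subtract individual time-averages (within transformation)."""
    return a - a.mean(axis=1, keepdims=True)

b_pooled = slope(x.ravel(), y.ravel())                    # OLS over the panel
b_between = slope(x.mean(axis=1), y.mean(axis=1))         # OLS over time-averages
b_within = slope(demean(x).ravel(), demean(y).ravel())    # OLS after demeaning
b_fd = slope(np.diff(x, axis=1).ravel(),
             np.diff(y, axis=1).ravel())                  # OLS after first differences

print(b_pooled, b_between, b_within, b_fd)
```

Under this design the pooled and between estimates absorb the correlation between the regressor and the individual effect, while the within and first-differences estimates remove it and should land near the true slope.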

Dynamic Panel Data Models

Dynamic Models with Variable Intercepts: (AR(1)) (dep = time-inv + time-var + lag_dep + individual + error)

$$y_{it} = z_i α + x_{it} β + γ y_{i,t-1} + u_i + ε_{it}$$

Random Effects Models:
- General FGLS Estimator (OLS residuals for error covariance matrix estimation, then FGLS);
- ML Estimator;
- GMM Estimator (IV estimator);
Fixed Effects Model:
- GMM Estimator (IV estimator);
- General FGLS;
  - Fixed Effect GLS Estimator (FEGLS);
  - First-difference GLS (FDGLS);
- Transformed ML Estimator;

GMM estimator. The model is specified with a three-part formula: `y ~ covariates | gmm instruments | "normal" instruments`. By default, all the variables of the model that are not used as GMM instruments are used as normal instruments, with the same lag structure as the one specified in the model. Transformation: difference GMM; system GMM.

General FGLS is based on a two-step estimation process: first a model is estimated by OLS (pooling), fixed effects (within) or first differences (fd), then its residuals are used to estimate an error covariance matrix for use in a feasible-GLS analysis. This framework allows the error covariance structure inside every group (individual time series) to be fully unrestricted and is therefore robust against any type of intragroup heteroskedasticity and serial correlation. Conversely, this structure is assumed identical across groups and thus general FGLS estimation is inefficient under groupwise heteroskedasticity. Efficiency requires $N \gg T$.
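
The two-step procedure can be sketched in a few lines of NumPy. This is an illustrative sketch on simulated data, assuming a pooled OLS first step, an intercept plus one regressor, and errors made of a random individual effect plus AR(1) noise; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 2000, 4
beta = np.array([1.0, 2.0])                # hypothetical intercept and slope

# Errors: random individual effect plus AR(1) noise, so the T x T
# within-group covariance is neither diagonal nor equicorrelated.
e = np.zeros((N, T))
e[:, 0] = rng.normal(size=N)
for t in range(1, T):
    e[:, t] = 0.6 * e[:, t - 1] + rng.normal(size=N)
err = rng.normal(size=(N, 1)) + e

X = np.stack([np.ones((N, T)), rng.normal(size=(N, T))], axis=2)  # (N, T, 2)
y = X @ beta + err                                                # (N, T)

# Step 1: pooled OLS, then residuals arranged by group.
b_ols = np.linalg.lstsq(X.reshape(N * T, 2), y.reshape(N * T), rcond=None)[0]
resid = y - X @ b_ols

# Step 2: unrestricted T x T error covariance, assumed common to all groups.
omega = resid.T @ resid / N
omega_inv = np.linalg.inv(omega)

# Feasible GLS: (sum_i X_i' W X_i)^{-1} sum_i X_i' W y_i with W = omega^{-1}.
A = np.einsum("itk,ts,isl->kl", X, omega_inv, X)
c = np.einsum("itk,ts,is->k", X, omega_inv, y)
b_fgls = np.linalg.solve(A, c)

print(b_ols, b_fgls)
```

Because `omega` has `T * (T + 1) / 2` free elements estimated from `N` groups, the estimate is only reliable when `N` is large relative to `T`, which is the requirement stated above.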

First difference with IV estimator.

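In the dynamic model, the pooled OLS estimate of $γ$ is biased upward and inconsistent, because the lagged dependent variable is positively correlated with the individual effect $u_i$. GLS and ML estimators are also generally biased. The within estimator is biased for fixed $T$, because eliminating the individual effect induces a correlation between the transformed error term and the transformed lagged dependent variable.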
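
These biases can be illustrated by simulation. The sketch below assumes a pure AR(1) panel without other regressors, a true `gamma = 0.5`, and a short panel; it takes the first-difference IV estimator to be the Anderson-Hsiao form, which instruments $Δy_{i,t-1}$ with $y_{i,t-2}$ (an assumption about which variant the note above refers to). All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, gamma, burn = 20_000, 6, 0.5, 50     # hypothetical design

# Simulate y_it = gamma * y_{i,t-1} + u_i + eps_it, discarding a burn-in.
u = rng.normal(size=N)
y = np.zeros((N, T + burn))
y[:, 0] = u / (1 - gamma) + rng.normal(size=N)
for t in range(1, T + burn):
    y[:, t] = gamma * y[:, t - 1] + u + rng.normal(size=N)
y = y[:, burn:]                            # keep the last T periods

y_lag, y_cur = y[:, :-1], y[:, 1:]

def slope(xv, yv):
    """OLS slope of yv on xv with an intercept (single regressor)."""
    xc, yc = xv - xv.mean(), yv - yv.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())

def demean(a):
    """Subtract individual time-averages (within transformation)."""
    return a - a.mean(axis=1, keepdims=True)

g_pooled = slope(y_lag.ravel(), y_cur.ravel())
g_within = slope(demean(y_lag).ravel(), demean(y_cur).ravel())

# First-difference IV: regress dy_t on dy_{t-1}, instrumented by y_{t-2}.
dy = np.diff(y, axis=1)
dep = dy[:, 1:].ravel()        # dy_t
endog = dy[:, :-1].ravel()     # dy_{t-1}, correlated with the differenced error
instr = y[:, :-2].ravel()      # y_{t-2}, uncorrelated with the differenced error
g_iv = float((instr * dep).sum() / (instr * endog).sum())

print(g_pooled, g_within, g_iv)
```

With these settings the pooled estimate should sit well above 0.5, the within estimate well below it, and the IV estimate close to it; difference and system GMM extend the same idea by using more lags as instruments.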

Complication: Limited Dependent Variable

For FE models:

Qualitative Choice Models (Discrete Data):
- Incidental Parameters Problem;
- Conditional MLE;
Sample Selection Models (Censored and Truncated Data):
- Trimmed LS estimator for FE model; [@Honore1992]

Bias-Adjusted Maximum Simulated Likelihood (MSL) Estimator [12.4];
Auxiliary Models [12.6];

Complication: Cross-Sectionally Dependent Panel Data

Spatial Approach;
Factor Approach;
Cross-sectional Mean Augmented Approach;
Test of Cross-Sectional Independence;

General FGLS is inefficient under cross-sectional correlation.

Panel Data Approach for Program Evaluation

Treatment Evaluation

Selection on observables and unobservables;
Propensity Score Matching Estimator (Rosenbaum and Rubin);
Other estimators:
- Differences-in-differences estimator;
- IV estimator for local average treatment effect (LATE) (under selection on unobservables);
- (Control function estimator);
Regression discontinuity (RD) design;

Climate Econometrics

Notes on @Hsiang2016.

Climate $C_{iτ}$ and weather $c_{iτ}$ are vectors of $K$ parameters specifying respectively the probability and empirical distributions of atmosphere-ocean states (temperature, rainfall, humidity, etc.) at location $i$ during period $τ$. Climate may affect an outcome directly through weather or through individual decision (belief): $Y(C) = Y(c(C), b(C))$, with (marginal) direct effect $∂Y/∂c$ and belief effect $∂Y/∂b$. Adaptation refers to the belief effects and the interactions between belief and direct effects $∂^2Y/∂b∂c$. Average treatment effect for climate change under current climatic and non-climatic factors: $β = E[Y|C+ΔC, x] - E[Y|C, x]$.

Non-experimental research designs to estimate $β$:

Cross-sectional (CS): exploit spatial variation; adopts the unit homogeneity assumption; subject to omitted variables bias (Estimates are biased if the model omits relevant variables; there is no systematic method to detect such omission.);
Time-series (TS): exploit temporal variation, with unit-specific fixed effects and time trends; adopts the marginal treatment comparability assumption, i.e. the same marginal change in weather and in climate has the same effect on the outcome;
Long differences (LD): cross-sectional comparison of changes over time, primarily used to test effects of gradual changes; a trade-off between the two assumptions;

Frequency-identification trade-off: Low observation frequency might capture belief effects, but the unit gets less comparable to itself between observations.

A partial test of marginal treatment comparability: If the estimates are stable across all temporal frequencies from unfiltered time-series to long differences to cross section, the marginal treatment comparability assumption is more plausible.

The marginal effects of climate and weather are identical if the agent adapts its beliefs/actions to the climate so as to consistently maximize the outcome, which is a differentiable function of beliefs/actions. The total effect of climate change is then the integral of the marginal effects of weather, which can be computed using time-series estimates.
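
A sketch of the reasoning behind this claim, which is an envelope-theorem argument. For exposition it treats $C$ as a scalar and writes $b^*(C)$ for the outcome-maximizing belief/action; both are simplifying assumptions and notation introduced here, not taken from the notes above.

$$\frac{dY}{dC} = \frac{∂Y}{∂c}\frac{dc}{dC} + \frac{∂Y}{∂b}\frac{db}{dC}, \qquad \left.\frac{∂Y}{∂b}\right|_{b = b^*(C)} = 0 \;\Rightarrow\; \frac{dY}{dC} = \frac{∂Y}{∂c}\frac{dc}{dC}$$

$$Y(C + ΔC) - Y(C) = \int_{C}^{C + ΔC} \frac{∂Y}{∂c}\frac{dc}{dC'}\, dC'$$

The first-order condition removes the belief term at the optimum, so only the direct effect of weather remains in the marginal effect of climate, and integrating it over the climate change gives the total effect.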

Climate should be parameterized into variables/measures that most strongly influence social or economic outcomes.

Important aspects in reduced-form econometric models of climate effects (dose-response function, regression function):

1. Nonlinear effects: nonlinear response functions at the resolution of weather data can be recovered despite aggregated outcome data;
2. Spatial and temporal displacement: distribution of the net effect in time (harvesting, i.e. advancing an expected event; delayed effect, i.e. effects dominate after the event) and in space (transmitted effect across locations; remote effect, i.e. effects dominate at other locations);
3. Statistical uncertainty: estimates of standard errors may be biased, due to spatial and temporal autocorrelation in climate data;
4. Measurement of adaptation:
  - if the adaptive action is known and observed directly as an outcome, the climate effect on adaptation can be estimated;
  - for an outcome influenced by adaptation, the cross-sectional estimate captures all belief effects along with all direct effects, while time-series estimates stratified by proxies of adaptive actions can measure the overall net effect of all adaptive actions;
5. Meta-analysis (cross-study comparison and synthesis): results across studies can be combined to fit a response function applicable to all populations;

Attribute historical impacts and project future impacts of climate change under different scenarios, typically using models of partial equilibrium responses. General equilibrium responses include factor reallocation and price changes, but are rarely studied.

Climate can affect an outcome via many mechanisms/pathways, and a specific mechanism/pathway may be isolated and estimated in a structural model.

Note that adaptation costs are almost never measured.
