ECON 452 (A & B) Winter 2001 C. Ferrall / A. Gregory

OUTLINE FOR PART II

  1. Introduction

    1. The goal of Part II is for you to learn three skills:
      1. How to carry out econometric studies using data from cross-sectional surveys (in particular, DLI surveys stored in the QED Data Archive).
      2. How to report results from your study so people will understand them
      3. How to model a kind of variable that often appears in survey data: limited-dependent variables.

    2. Some Questions About These Goals
      1. Are these good goals?
      2. Does the focus in Part II complement the focus in part I?
      3. Are the projects designed to achieve these goals?
      4. Why am I asked to find and read an article using similar data?
      5. Should I expect a certain amount of frustration in carrying out these projects?
      6. Are the deadlines firm?

  3. Getting and Reporting Econometric Results Based on Survey Data

    1. Example: The Economics of Abuse

    3. Survey data isn't always easy to work with
      1. Sometimes you have to worry about sampling weights
      2. DLI files contain many masked or censored variables
      3. Even when not masked or censored, pay attention to missing observations
      4. Most survey questions have qualitative and categorical answers
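To illustrate the first point above, here is a minimal Python sketch (the course itself uses Stata, and the incomes and weights below are made-up numbers) of how sampling weights change even a simple summary statistic:

```python
# Minimal sketch: applying survey sampling weights to a mean.
# All values are invented for illustration.
incomes = [30000, 45000, 52000, 28000]
weights = [1200.5, 850.0, 430.2, 2100.7]   # sampling weights

unweighted_mean = sum(incomes) / len(incomes)
weighted_mean = sum(w * x for w, x in zip(weights, incomes)) / sum(weights)

print(unweighted_mean)
print(weighted_mean)
```

In Stata you would not do this by hand; you would pass the weight variable through Stata's weight syntax on commands like summarize.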

    4. "Section II: The Data"
    5. See the annotations in the Bowlus and Seitz paper for more details.
      1. Tell your reader about your data sources.
      2. Tell your reader how you selected observations and manipulated the data
      3. Show your reader tables of summary statistics and in the text briefly discuss the patterns in the tables
      4. Show your reader some informative graphs
      5. Place supporting material or information that is difficult to display in prose into a data appendix
         
  5. A Quick Review of Multiple Linear Regression (MLRM)

    Based on: Multiple Regression Notes
    Supplementary material: Mike Abbott's Econ 351 notes (and the corresponding sections of Gujarati)
    1. Ordinary Least Squares (OLS) is perfect for the MLRM
      1. The MLRM is defined by eight assumptions, A0-A7
      2. The OLS estimator is easy to derive and its formula depends only on A0
      3. The statistical properties of OLS depend crucially on A0-A7
      4. Using OLS and all the assumptions of MLRM you can carry out hypothesis tests, construct confidence intervals, and make predictions
      5. Doing all this is a breeze in Stata
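As a sketch of the mechanics behind these points (in Python rather than Stata, with invented data), here is OLS for a single regressor, together with the conventional standard error and t-statistic that assumptions A0-A7 justify:

```python
# OLS sketch for y_i = b0 + b1*x_i + e_i (invented data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residual variance and the usual standard error of b1 (relies on A0-A7).
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_b1 = (s2 / sxx) ** 0.5
t_b1 = b1 / se_b1          # t-statistic for H0: b1 = 0

print(b0, b1, se_b1, t_b1)
```

Stata's regress command produces all of these quantities (and more) in one line; the point of the sketch is only to show what the numbers are made of.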

    2. "Section III: Empirical Results"
      1. Show your reader well-constructed tables of regressions
      2. Briefly discuss or interpret most of the regression coefficients
      3. Bury unimportant coefficients or models in appendices, footnotes, or even deeper!

    3. When the assumptions of the MLRM don't hold, neither do the nice properties of OLS
      1. It's not hard to correct for violations of constant variance (A6) using standard errors robust to heteroscedasticity
      2. And you already spent six weeks worrying about serial correlation (~A5), so we'll skip that
      3. It's tougher to deal with endogenous or error-correlated regressors (~A3), and we must skip this problem due to time constraints.
      4. So we will focus on the linear assumption in the MLRM.
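A minimal sketch of point 1 above, again in Python with invented data: the robust (White) standard error replaces the single estimated error variance with each observation's own squared residual:

```python
# Heteroscedasticity-robust (White, HC0-style) standard error for
# simple OLS; the data are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.6, 2.5, 5.9, 4.0, 8.3]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Conventional SE assumes constant error variance (A6) ...
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_conventional = (s2 / sxx) ** 0.5

# ... the robust SE uses each squared residual separately.
se_robust = (sum(((xi - xbar) ** 2) * e ** 2
                 for xi, e in zip(x, resid)) / sxx ** 2) ** 0.5

print(se_conventional, se_robust)
```

In Stata the same correction is a single option on the regression command (robust standard errors).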

  7. An Even Quicker Introduction to Maximum Likelihood Estimation

    1. Maximum Likelihood starts with a model of the whole population distribution that depends on some unknown population (or model) parameters
    2. The sample probability function treats data as variable and model parameters as fixed.  The likelihood function is really just the probability function, but it treats model parameters as variable and data as fixed

    4. Finding parameter values that maximize the likelihood function generally leads to statistically consistent and asymptotically normal estimates of the population parameters.

    6. Under A0-A7, the ML estimates of the normal MLRM (NMLRM) are the same as the OLS estimates!

    8. Let's see that using Stata's ml command ...
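The equivalence in point 6 can also be sketched numerically (Python with invented data, standing in for the Stata demonstration): after concentrating out the error variance, the normal log-likelihood is largest exactly at the OLS estimates, because both are maximized/minimized by the same sum of squared residuals.

```python
import math

# Sketch: ML = OLS under normality, via the concentrated log-likelihood.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

def concentrated_loglik(b0, b1):
    """Normal log-likelihood with sigma^2 replaced by SSR/n."""
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * (math.log(2 * math.pi * ssr / n) + 1)

# OLS estimates (closed form for a single regressor)
xbar, ybar = sum(x) / n, sum(y) / n
b1_ols = sum((xi - xbar) * (yi - ybar)
             for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0_ols = ybar - b1_ols * xbar

# The likelihood at the OLS estimates beats nearby parameter values.
best = concentrated_loglik(b0_ols, b1_ols)
print(best >= concentrated_loglik(b0_ols + 0.1, b1_ols))
print(best >= concentrated_loglik(b0_ols, b1_ols - 0.1))
```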

  9.  Limited Dependent Variables (LDVs)

    1. Introduction
      1. There are lots of Kinds and Examples of LDVs (Binary, Censored, Multinomial, and Truncated Outcomes)
      2. Economic Models of Individual Choices Often Lead to LDV Statistical Models

    2. Binary Outcomes are the simplest type of LDV
      1. The Linear Probability Model (LPM) just means running OLS with a binary dependent variable
      2. Logit and Probit are non-linear models for binary outcomes estimated using ML.
        1. Interpreting coefficients in these models is a little trickier than in the MLRM
        2. MLE of logit/probit is easy in Stata
        3. You can do hypothesis testing and prediction
      3. Present results of logit/probit models pretty much as you would OLS results, with a few differences
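The points above can be sketched end to end in Python (Stata's logit command does all of this for you; the data here are invented): Newton-Raphson maximizes the logit log-likelihood, and the fitted coefficients then give a predicted probability.

```python
import math

# ML estimation of a logit model with one regressor, by Newton-Raphson.
# Invented data: y is the binary outcome, x the single regressor.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0,   0,   0,   1,   0,   1,   1,   1]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = 0.0, 0.0
for _ in range(25):                      # Newton-Raphson iterations
    p = [logistic(b0 + b1 * xi) for xi in x]
    # Gradient of the log-likelihood
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
    # Negative Hessian (2x2), solved with its explicit inverse
    w = [pi * (1 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (-h01 * g0 + h00 * g1) / det

p_at_2 = logistic(b0 + b1 * 2.0)         # predicted P(y=1 | x=2)
print(b0, b1, p_at_2)
```

Note the interpretation point from above: b1 is not a marginal effect on the probability; the effect of x on P(y=1) depends on where you evaluate the logistic curve.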

    3. Multinomial Outcomes require more structure than binary outcomes
      1. If you want to model letter marks (A, B, C, etc.), use the Ordered Logit/Probit Model
      2. If you want to model something like how people get to work, use the Multinomial Logit/Probit Model
      3. MLE works more or less the same as before.
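A small sketch of what the multinomial logit does with its estimated coefficients (Python; the commute-mode labels and coefficient values below are invented): each alternative gets a linear index, and choice probabilities come from exponentiating and normalizing.

```python
import math

# Multinomial-logit choice probabilities for one person.
# The modes and index values (x'b for each mode) are invented.
modes = ["car", "bus", "walk"]
utilities = {"car": 1.2, "bus": 0.4, "walk": -0.3}

denom = sum(math.exp(v) for v in utilities.values())
probs = {m: math.exp(utilities[m]) / denom for m in modes}

print(probs)   # the three probabilities sum to one
```

This is why multinomial outcomes need "more structure": the model must rank every alternative at once, not just split one outcome from another.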