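But the variances of our predictions vary with the values of the regressors. For a 1xk row vector $x_0$ of regressor values:

$$\mathrm{Var}(\hat{y}_0) = \sigma^2 \, x_0 (X'X)^{-1} x_0'$$

which must be estimated by

$$\widehat{\mathrm{Var}}(\hat{y}_0) = \hat{\sigma}^2 \, x_0 (X'X)^{-1} x_0'$$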

Notice the sharp increase in the amount of computation required to compute the estimated variance of a prediction as we move from k=2 to the general LRM. Even after the estimation itself is done, computing a confidence interval for a prediction requires multiplying a 1xk vector by a kxk matrix and then multiplying the result (a 1xk vector) by a kx1 vector. For k=10, this would take hours by hand. It is therefore very important to learn how to use Stata's predict command and/or Stata's matrix algebra facilities.
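Stata's matrix facilities can carry out this product directly. The lines below are only a sketch and are not part of the original session: they assume a regress command has just been run with three regressors plus a constant (so the stored variance matrix is 4x4), and the values in the hypothetical row vector x0 are made up for illustration.

```stata
* Sketch only: assumes -regress- was just run with 3 regressors + constant.
* Estimated variance matrix of the coefficients (k x k)
matrix V = e(V)
* Hypothetical 1 x k row of regressor values; Stata stores the constant last
matrix x0 = (1, 0, 0, 1)
* (1 x k) * (k x k) * (k x 1) gives a 1 x 1 matrix: the estimated variance
matrix v0 = x0 * V * x0'
* Its square root is the standard error of the mean prediction at x0
display sqrt(v0[1,1])
```

For observations that are in the data set, predict with the stdp option should report the same quantity without any matrix commands.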

```
.
. * Let's compute 95% confidence intervals for expected
. * age for each religion - first, what is the critical t value?
. di invt(5590,.95)
1.9603884
. * By the way: notice that with 5590 degrees of freedom t=z
. predict agehat
. * grab estimated standard error of mean prediction
. predict ahse, stdp
. * here's one way to compute lower and upper bound
. gen alo = agehat - ahse * invt(5590,.95)
. gen ahi = agehat + ahse * invt(5590,.95)
. sort relig2
. by relig2: summ alo agehat ahi
-> relig2= none
Variable |     Obs       Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
     alo |     221   16.49089           0   16.49089   16.49089
  agehat |     221   16.79512           0   16.79512   16.79512
     ahi |     221   17.09935           0   17.09935   17.09935
-> relig2= protest
Variable |     Obs       Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
     alo |    3162   17.12446    4.17e-06   17.12446   17.12446
  agehat |    3162   17.20531    4.70e-06   17.20531   17.20531
     ahi |    3162   17.28615    4.73e-06   17.28615   17.28615
-> relig2= catholic
Variable |     Obs       Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
     alo |    2159   17.67134           0   17.67134   17.67134
  agehat |    2159   17.77229    3.80e-06   17.77229   17.77229
     ahi |    2159   17.87323           0   17.87323   17.87323
-> relig2= non_chrs
Variable |     Obs       Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
     alo |     721   17.40415           0   17.40415   17.40415
  agehat |     721   17.57853           0   17.57853   17.57853
     ahi |     721    17.7529           0    17.7529    17.7529
. *CONCLUSION: On average, Catholic girls start .57 to 1.38 years later than
. * girls with no religion, 19 times out of 20.
```
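The session above uses invt(), which belongs to older Stata releases. In current releases the same interval can probably be built with invttail() and the residual degrees of freedom that regress stores in e(df_r), which avoids typing 5590 by hand. This is a sketch under those assumptions, not a replay of the original session; the variable names agehat2, ahse2, alo2 and ahi2 and the scalar tcrit are hypothetical.

```stata
* Sketch only: run immediately after the same -regress- command.
* Fitted values and standard error of the mean prediction
predict agehat2
predict ahse2, stdp
* Two-sided 95% critical value: upper-tail probability of .025
scalar tcrit = invttail(e(df_r), 0.025)
* Lower and upper bounds of the confidence interval
generate alo2 = agehat2 - tcrit * ahse2
generate ahi2 = agehat2 + tcrit * ahse2
```

Because the regressors here are religion categories, every observation in a category gets the same prediction and the same bounds, which is why the Std. Dev. column in the summaries above is essentially zero.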

- Violation of A0
If the sample does not contain all of the required data, then one cannot compute all of the OLS estimates, so violation of this assumption is fatal to estimating the full model.

- Violation of A1
There are many reasons to think that the form of the Population Regression Equation may be wrong. Perhaps one is not sure whether the PRE should be linear, log-linear, or based on some other transformation of the original data. Even worse, perhaps the relation between Y and X can't be linearized at all. That is one way to think about the logit and probit models: as extensions of the OLS model in which the relationship between Y and X takes a specific form that cannot fit into the LRM framework.

- Violation of A2
If the PRE did not generate the sample information, then some other process did. Violation of this assumption is therefore in many respects equivalent to violation of A1.

- Violation of A3
As mentioned in the original discussion of the assumptions, A3 is really a technical assumption. If E[u] is not zero, then one can redefine the intercept to include the mean of u.

- Violation of A4
Recall that under A0-A4 OLS estimates are unbiased estimates of the LRM parameters. Violating Assumptions A0-A3 leads to either trivial changes (in the case of A0 and A3) or fairly esoteric statistical issues of what model seems to be generating the data. I call this esoteric because it tends to lead one away from the **economic content** of the LRM, unless one looks for other specifications that come from economic theory. However, most analyses of violations of A1 and A2 are not based on economic theory.

Violation of A4/A4* is another matter altogether. If cov(u,X) is not 0 (the A4* way of seeing it), or if one cannot think of X as "exogenous" to Y (the A4 way), then one is in trouble. And this trouble has a lot to do with economic theory. A whole chapter on IQ scores is devoted to explaining why A4 is a critical assumption.

- Violation of A5-A6
Traditionally one would go from this point to study ways to "fix" OLS estimates when A5 and A6 do not hold. But it is important to note that these assumptions only enter at the point of deriving the variances of the OLS estimates. This means that estimated standard errors and confidence intervals will not be reliable if A5 and/or A6 is not true. However, this is in many ways a much smaller problem than having either a non-linear model or not even having unbiased estimates. Therefore, our discussion of these assumptions is brief.

- Violation of A7
Since OLS estimates are linear in the Y's, violation of the normality assumption is really only a problem in small samples. That's because for values of N larger than 30 the Central Limit Theorem begins to kick in. That is, the OLS estimates will be asymptotically normally distributed even if u is not normally distributed.
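The remark under A3 can be made concrete with a short derivation. The notation here is an assumption made for illustration (a two-variable PRE with intercept $\beta_1$, slope $\beta_2$, and $E[u] = \mu \neq 0$) and may differ from the symbols used elsewhere in these notes.

```latex
\begin{aligned}
Y &= \beta_1 + \beta_2 X + u, \qquad E[u] = \mu \neq 0 \\
  &= (\beta_1 + \mu) + \beta_2 X + (u - \mu) \\
  &= \beta_1^{*} + \beta_2 X + u^{*}, \qquad E[u^{*}] = 0
\end{aligned}
```

The slope is unchanged and only the intercept absorbs the mean of u, which is why a violation of A3 is treated as trivial.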