multiple File 8

HyperMetricsNotes


H. Prediction and Analysis of Variance in the MLRM

Prediction

The OLS prediction for a particular Y or for the expected value of Y conditional on values of X follows the same logic as in the simple LRM. Now, however, we want to predict for a 1xk vector of X values, which would describe one observation. Let $X_0$ be the vector you want to predict for. Then, following the earlier discussion, the mean and individual predictions are the same,
$\eqalign{\hat E[ Y_0 | X_0] &= X_0\hat\beta\cr \hat Y_0|X_0 &= X_0\hat\beta\cr}$
But the variances of the two predictions differ (recall that $X_0$ is 1xk, so ${X_0}'$ is kx1):
$\eqalign{ Var(\hat E[Y_0]) &= X_0 Var(\hat\beta) {X_0}' = \sigma^2 X_0(X'X)^{-1}{X_0}'\cr Var(\hat Y_0) &= X_0 Var(\hat\beta) {X_0}' + \sigma^2\cr }$
which must be estimated by
$\eqalign{ \hat{Var} (\hat E[Y_0]) &= \hat\sigma^2 X_0 (X'X)^{-1} {X_0}'\cr \hat{Var} (\hat Y_0) &= \hat\sigma^2\bigl( X_0 (X'X)^{-1} {X_0}'+1\bigr) \cr }$
Notice the sharp increase in the amount of computation required to estimate the variance of a prediction as we move from k=2 to the general LRM. Computing a confidence interval for a prediction, after the estimation itself, now requires multiplying a 1xk vector by a kxk matrix and then multiplying the result (a 1xk vector) by a kx1 vector. For k=10 this is laborious and error-prone by hand. It is therefore very important to learn how to use Stata's predict command and/or Stata's matrix algebra facilities.
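As a rough illustration of the matrix arithmetic involved, the following Python sketch computes the prediction and both estimated variances for a small made-up data set. The data, the helper functions (transpose, matmul, inverse) and all variable names are assumptions introduced here for illustration; they do not come from the notes or from Stata.

```python
# Minimal sketch: OLS prediction and its estimated variances in pure Python.
# The data are invented for illustration only.

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# N = 5 observations, k = 2 (a constant and one regressor)
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [[2.0], [4.0], [5.0], [4.0], [5.0]]
N, k = len(X), len(X[0])

XtX_inv = inverse(matmul(transpose(X), X))                # (X'X)^{-1}, k x k
beta_hat = matmul(XtX_inv, matmul(transpose(X), y))       # k x 1

fitted = matmul(X, beta_hat)
resid = [yi[0] - fi[0] for yi, fi in zip(y, fitted)]
sigma2_hat = sum(e * e for e in resid) / (N - k)          # RSS / (N - k)

x0 = [[1.0, 6.0]]                                         # 1 x k row to predict for
y0_hat = matmul(x0, beta_hat)[0][0]                       # X_0 beta_hat
quad = matmul(matmul(x0, XtX_inv), transpose(x0))[0][0]   # X_0 (X'X)^{-1} X_0'
var_mean = sigma2_hat * quad                              # variance of mean prediction
var_indiv = sigma2_hat * (quad + 1.0)                     # variance of individual prediction
print(y0_hat, var_mean, var_indiv)
```

The individual-prediction variance always exceeds the mean-prediction variance by exactly the estimated error variance, which is the "+1" inside the parentheses of the formula.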


.
. * Let's compute 95% confidence intervals for expected
. * age for each religion - first, what is the critical t value?
. di invt(5590,.95)
1.9603884
. * By the way:  notice that with 5590 degrees of freedom t=z
. predict agehat
. * grab estimated standard error of mean prediction
. predict ahse, stdp
. * here's one way to compute lower and upper bound
. gen  alo = agehat - ahse * invt(5590,.95)
. gen  ahi = agehat + ahse * invt(5590,.95)
. sort relig2
. by relig2: summ alo agehat ahi

-> relig2=     none  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     alo |     221    16.49089          0   16.49089   16.49089  
  agehat |     221    16.79512          0   16.79512   16.79512  
     ahi |     221    17.09935          0   17.09935   17.09935  

-> relig2=  protest  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     alo |    3162    17.12446   4.17e-06   17.12446   17.12446  
  agehat |    3162    17.20531   4.70e-06   17.20531   17.20531  
     ahi |    3162    17.28615   4.73e-06   17.28615   17.28615  

-> relig2= catholic  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     alo |    2159    17.67134          0   17.67134   17.67134  
  agehat |    2159    17.77229   3.80e-06   17.77229   17.77229  
     ahi |    2159    17.87323          0   17.87323   17.87323  

-> relig2= non_chrs  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     alo |     721    17.40415          0   17.40415   17.40415  
  agehat |     721    17.57853          0   17.57853   17.57853  
     ahi |     721     17.7529          0    17.7529    17.7529  

. *CONCLUSION:  Comparing interval endpoints, Catholic girls start on average
. * .57 to 1.38 years later than girls with no religion, 19 times out of 20.
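The bounds in the log are simply the prediction plus or minus the critical value times the standard error of the mean prediction. The Python sketch below reproduces the interval for the "none" group under two assumptions that are not in the log itself: the standard error is backed out of the printed interval width, and the normal critical value is used in place of t (reasonable with 5590 degrees of freedom, as the log remarks). The function name mean_prediction_ci is hypothetical. Note also that the log appears to come from an older Stata release in which invt(df,p) returned the two-tailed critical value; in current releases the same number should be obtainable with invttail(5590,.025).

```python
from statistics import NormalDist

def mean_prediction_ci(pred, se, level=0.95):
    """Two-sided interval pred +/- z * se, using the normal approximation to t."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return pred - z * se, pred + z * se

# 'none' group: prediction taken from the log; the standard error is NOT shown
# in the log, so it is backed out of the printed interval width (an assumption).
se_none = (17.09935 - 16.49089) / (2 * 1.9603884)
lo, hi = mean_prediction_ci(16.79512, se_none)
print(round(lo, 3), round(hi, 3))
```

The result agrees with the alo and ahi values in the log to about three decimal places; the small remaining gap comes from using z rather than t.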

Analysis of Variance

The analysis of variance interpretation of OLS works exactly the same way in the MLRM as in the simple LRM. In particular, OLS estimates decompose the total variation in Y around the sample mean into two components: explained variation and unexplained or residual variation. So TSS = RSS + ESS. The only difference is that RSS, ESS, and TSS must be extended to matrix notation.
$\displaylines{ RSS = y'(I- X(X'X)^{-1}X')y\cr M \equiv \hbox{NxN matrix of ones}\cr (I -{1\over N}M)'(I-{1\over N}M) = I - {1\over N}M\cr TSS = y'(I - {1\over N}M)y\cr ESS = y' (X(X'X)^{-1}X' - {1\over N}M)'(X(X'X)^{-1}X' -{1\over N}M) y\cr R^2 \equiv {ESS\over TSS}\cr }$
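The decomposition can be checked numerically. The Python sketch below builds the matrices in the formulas above for a small made-up data set that includes a constant column, and confirms that TSS = RSS + ESS. The data, helper functions and names (P for $X(X'X)^{-1}X'$, quad_form, and so on) are assumptions introduced here for illustration.

```python
# Minimal sketch: the analysis-of-variance identity in matrix form (pure Python).

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def quad_form(y, a):
    """Return the scalar y' A y for an N x 1 column y."""
    return matmul(matmul(transpose(y), a), y)[0][0]

# Invented data: a constant and one regressor
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [[2.0], [4.0], [5.0], [4.0], [5.0]]
N = len(X)

I = [[float(i == j) for j in range(N)] for i in range(N)]
M_over_N = [[1.0 / N] * N for _ in range(N)]              # (1/N) M
P = matmul(matmul(X, inverse(matmul(transpose(X), X))), transpose(X))
D = subtract(P, M_over_N)                                  # P - (1/N) M

RSS = quad_form(y, subtract(I, P))
TSS = quad_form(y, subtract(I, M_over_N))
ESS = quad_form(y, matmul(transpose(D), D))
R2 = ESS / TSS
print(TSS, RSS, ESS, R2)
```

The identity relies on X containing a constant column; without one, the cross term between fitted values and residuals need not vanish.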

I. Violation of MLRM Assumptions

Violation of A0
If the data set is incomplete, one cannot compute all of the OLS estimates, so violation of this assumption is fatal to estimating the full model.
Violation of A1
There are many reasons to think that the form of the Population Regression Equation (PRE) may be wrong. Perhaps one is not sure whether the PRE should be linear, log-linear, or linear in some other transformation of the original data. Even worse, perhaps the relation between Y and X can't be linearized at all. That is one way to think about the logit and probit models: as extensions of the OLS model in which the relationship between Y and X takes a specific form that cannot fit into the LRM framework.
Violation of A2
If the PRE did not generate the sample information then some other process did. So violation of this assumption is in many respects equivalent to the violation of A1.
Violation of A3
As mentioned in the original discussion of the assumptions, A3 is really a technical assumption. If E[u] is not zero then one can re-define $\beta_1$ to include the mean of u.
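To see the redefinition explicitly, suppose $E[u]=\mu\neq 0$ (the symbol $\mu$ is introduced here only for illustration) and take $\beta_1$ to be the intercept of the PRE:
$\eqalign{ Y &= \beta_1 + \beta_2X_2 + \cdots + \beta_kX_k + u\cr &= (\beta_1+\mu) + \beta_2X_2 + \cdots + \beta_kX_k + (u-\mu)\cr }$
The new disturbance $u-\mu$ has mean zero, so A3 holds for the rewritten model. Only the interpretation of the intercept changes; the slope coefficients are untouched.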
Violation of A4
Recall that under A0-A4 OLS estimates are unbiased estimates of the LRM parameters. Violating Assumptions A0-A3 leads either to trivial changes (in the case of A0 and A3) or to fairly esoteric statistical questions about what model is generating the data (A1 and A2). I call these esoteric because they tend to lead one away from the economic content of the LRM, unless one looks for other specifications that come from economic theory. However, most analyses of violations of A1 and A2 are not based on economic theory.
Violation of A4/A4* is another matter altogether. If cov(u,X) is not 0 (the A4* way of seeing it), or if one cannot think of X as "exogenous" to Y (the A4 way), then one is in trouble. And this trouble has a lot to do with economic theory. A whole chapter on IQ scores is devoted to explaining why A4 is a critical assumption.
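A small simulation makes the trouble concrete. The Python sketch below is an illustration under invented assumptions (a true slope of 2, standard normal draws, and a regressor built as z + u so that it is correlated with the disturbance); none of these choices come from the notes. With cov(u,X) = 0 the OLS slope lands near the true value, while with cov(u,X) not 0 it settles near 2 + cov(x,u)/var(x) = 2.5 no matter how large the sample is.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x with a constant."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)          # fixed seed so the run is repeatable
n, beta = 20000, 2.0            # invented sample size and true slope
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]   # disturbance

# Case 1: regressor independent of u, so A4* holds
x_exog = z
y_exog = [1.0 + beta * xi + ui for xi, ui in zip(x_exog, u)]

# Case 2: regressor contains u, so cov(u, X) = 1 and var(X) = 2
x_endog = [zi + ui for zi, ui in zip(z, u)]
y_endog = [1.0 + beta * xi + ui for xi, ui in zip(x_endog, u)]

slope_exog = ols_slope(x_exog, y_exog)
slope_endog = ols_slope(x_endog, y_endog)
print(slope_exog, slope_endog)
```

Increasing n tightens both estimates around their respective limits, which is the point: more data does not cure the bias in the second case.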
Violation of A5-A6
Traditionally one would go from this point to study ways to "fix" OLS estimates when A5 and A6 do not hold. But it is important to note that these assumptions enter only at the point of deriving $Var(\hat\beta)$. This means that estimated standard errors and confidence intervals are unreliable if A5 and/or A6 fails, while the estimates themselves remain unbiased. This is in many ways a much smaller problem than having a non-linear model or biased estimates. Therefore, our discussion of these assumptions is brief.
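A short sketch of where the assumptions enter, assuming (the definitions are not restated in this file) that A5 and A6 together amount to $Var(u|X)=\sigma^2 I$, and writing $\Omega$ for a general error covariance matrix (a symbol introduced here for illustration):
$\eqalign{ \hat\beta &= (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u\cr E[\hat\beta|X] &= \beta + (X'X)^{-1}X'E[u|X] = \beta\cr Var(\hat\beta|X) &= (X'X)^{-1}X'\,\Omega\,X(X'X)^{-1}\cr }$
The second line uses only the mean of u given X, so unbiasedness survives. The third line collapses to the usual $\sigma^2(X'X)^{-1}$ only when $\Omega=\sigma^2 I$.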
Violation of A7
Since OLS estimates are linear in the Y's, violation of the normality assumption is really a problem only in small samples. As a rule of thumb, for N larger than about 30 the Central Limit Theorem begins to kick in. That is, $\hat\beta$ will be asymptotically normally distributed even if u is not normally distributed.
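The following Python sketch illustrates the point by simulation. All choices are assumptions made for illustration: the disturbances are centered exponential draws (mean 0, skewness 2), the regressor follows a fixed asymmetric design, and the sample skewness of the simulated slope estimates is used as a rough gauge of non-normality. The skewness of the slope estimates should be noticeably smaller at N = 200 than at N = 5.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x with a constant."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

def skewness(v):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

def slope_draws(n, reps, rng):
    """Simulate reps OLS slopes with skewed, non-normal disturbances."""
    x = [((i + 1) / n) ** 2 for i in range(n)]       # fixed, asymmetric design
    draws = []
    for _ in range(reps):
        u = [rng.expovariate(1.0) - 1.0 for _ in range(n)]   # mean 0, skewness 2
        y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
        draws.append(ols_slope(x, y))
    return draws

rng = random.Random(0)
skew_small = skewness(slope_draws(5, 2000, rng))     # tiny sample
skew_large = skewness(slope_draws(200, 2000, rng))   # moderate sample
print(skew_small, skew_large)
```

Because the slope estimate is a weighted sum of the disturbances, its skewness shrinks roughly in proportion to one over the square root of N under this design.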
