
OLS estimates b of the parameter vector minimize the scalar function SS(b) = (y - Xb)'(y - Xb). In class we derive the OLS estimator for the MLRM:

    b = (X'X)^(-1) X'y     (***)

This expression is only meaningful if the k x k matrix X'X is invertible. We rely on the following result, which we will not prove: X'X is invertible if and only if the columns of X are linearly independent.

Linear independence means that no column can be written as a linear
combination of the others. You might recall that this is the
same as saying that X has
*full rank*. You may be tempted to *distribute* the
inverse operator through the matrix multiplication:

    (X'X)^(-1) = X^(-1) (X')^(-1)

If this is possible, then (***) reduces to

    b = X^(-1) (X')^(-1) X'y = X^(-1) y.

But note that one can distribute the inverse operator *only if* each matrix in the product has an inverse. X is N x k, and can only have an inverse if k=N, because only square matrices have inverses. Therefore, if N is not equal to k, then it is not possible to simplify (***) any further.
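The general formula (***) can be sketched numerically. The data and variable names below are made up purely for illustration; the point is that solving the normal equations reproduces what a library least-squares routine computes.

```python
import numpy as np

# A minimal sketch of the OLS formula (***): b = (X'X)^(-1) X'y.
# All data here are simulated for illustration only.
rng = np.random.default_rng(0)
N, k = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])  # constant + 2 regressors
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=N)

# (***) via the normal equations (X'X) b = X'y (more stable than forming the inverse).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against a library least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

Solving the linear system rather than computing `(X'X)^(-1)` explicitly is the standard numerical practice, though the two are algebraically identical.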

- k=1 (only a constant term appears on the right hand side)
Now X is simply a column of 1's, which necessarily has full rank. X'X then equals the scalar N (because it equals 1*1 + 1*1 + ... + 1*1 = N). The matrix X'y simply equals the sum of the Y values in the sample. So when only a constant term is included in the regression, the OLS estimate reduces to the sample mean:

    b = (1/N) * sum(y_i) = y-bar
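A quick numerical check of the k=1 case, with made-up data: regressing y on a column of ones returns exactly the sample mean.

```python
import numpy as np

# Sketch of the k=1 case: X is a column of ones, X'X = N, X'y = sum of the y's,
# so (***) gives b = (1/N) * sum(y) = mean(y). Data are illustrative.
y = np.array([2.0, 5.0, 11.0, 6.0])
N = len(y)
X = np.ones((N, 1))

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (***) with k = 1
print(beta_hat[0], y.mean())                  # both print 6.0
```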

- k=2 (simple LRM)
This is the case studied as the Simple LRM. You should verify that:

    X'X = | N           sum(x_i)   |
          | sum(x_i)    sum(x_i^2) |

Then you should use the formula for the inverse of a 2x2 matrix to get (X'X)^(-1). With some manipulation it is possible to show that (***) gives back exactly the scalar formulas for the intercept and the slope in the case k=2.

- k=N (as many parameters as observations)
In this case we have as many parameters to estimate as we have observations: N equations in N unknowns. As long as X (which is now square, N x N and k x k since N=k) has an inverse, OLS finds the estimates that satisfy Xb = y, which is simply

    b = X^(-1) y.

But we have already seen that (***) collapses to this expression if X is invertible. The prediction error is identically 0 for each observation, because we can perfectly explain each observation. If X is not invertible, then we cannot find unique values for the N elements of b: we no longer have a system of linearly independent equations, because X is no longer of full rank.
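The k=N case can also be sketched numerically: with a square, invertible X (simulated data below), the OLS formula (***) collapses to X^(-1) y and every residual is zero.

```python
import numpy as np

# Sketch of the k=N case: square, invertible X gives b = X^(-1) y and a
# prediction error of exactly zero for every observation. Data are simulated.
rng = np.random.default_rng(1)
N = 4
X = rng.normal(size=(N, N))                   # square and (almost surely) invertible
y = rng.normal(size=N)

beta_direct = np.linalg.inv(X) @ y            # X^(-1) y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # (***) collapses to the same thing

residuals = y - X @ beta_ols
print(np.allclose(beta_direct, beta_ols))     # True
print(np.allclose(residuals, 0.0))            # True
```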

Note that each of these expressions except the last takes the form "some matrix that involves only X" times the vector y. In other words, each is a linear function of the Y observations in the sample. The final expression is different, since it includes both y' and y. The last step in deriving the matrix expression for the minimized sum of squares is proved in class.
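This linearity in y is easy to check numerically. Below, H denotes the matrix X(X'X)^(-1)X' that maps y to the fitted values; the name H and the simulated data are illustrative, not from the notes.

```python
import numpy as np

# The fitted values are H y with H = X (X'X)^(-1) X', a matrix built from X
# alone, so the fits are a linear function of y: doubling y doubles the fits.
rng = np.random.default_rng(2)
N = 30
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = rng.normal(size=N)

H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H @ (2 * y), 2 * (H @ y)))  # True: linear in y
```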

Exercise: use these formulas to derive **computational properties** of
OLS estimates for the MLRM that are analogous to those derived for the simple
LRM.
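As a starting point for the exercise, two computational properties that carry over from the simple LRM can be verified numerically (this sketch uses simulated data and does not replace the algebraic derivation asked for above): the residuals are orthogonal to every column of X, and, when a constant is included, they sum to zero.

```python
import numpy as np

# Computational properties of OLS residuals e = y - Xb:
#   X'e = 0 (orthogonality), and hence sum(e) = 0 when X contains a constant.
rng = np.random.default_rng(3)
N, k = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

print(np.allclose(X.T @ e, 0.0))  # True: X'e = 0
print(np.isclose(e.sum(), 0.0))   # True: residuals sum to zero
```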

Document Last revised: 1997/1/5