Queens University at Kingston


regress File 7

G. Prediction using OLS

    We began our discussion of OLS with the objective of fitting a line through sample data on pairs of observations (X and Y) as well as possible under the squared-error measure of distance. We then showed that, under some assumptions, the OLS estimates are good estimates of the underlying population parameters, which makes it possible to perform hypothesis tests and compute confidence intervals. Now we return to using the OLS estimates for prediction and fit.

    For a sample observation we have already defined the predicted value: $\hat{Y_i} \equiv \hat\beta_1+\hat\beta_2 X_i$ . Often we may use the OLS estimates to predict the value of Y for a value of X that is not in the sample. Furthermore, we will want to know how precise our prediction of the value of Y is, whether in sample or out of sample.

    Let us then consider the problem of predicting the value of Y for an arbitrary value of X denoted $X_0$ . This value of X may or may not be in the sample used to estimate the regression equation. We can actually think of two numbers we would want to predict for $X_0$ :

    Individual Value: $Y_0 = \beta_1 + \beta_2 X_0 + u_0$

    Mean Value: $E[Y_0 | X_0] = \beta_1 + \beta_2 X_0$

    Notice the difference between the two. The first is the actual value of Y for a particular observation, including that observation's error term $u_0$ . The second is the expected value of Y conditional on knowing the value of $X_0$ . For example, suppose you are a criminologist who has estimated the following regression:
    $$\hbox{Crime Rate} = \beta_1 + \beta_2 \hbox{Population} + u$$
    using data on city sizes and crime rates. You obtain estimates $\hat\beta_1$ and $\hat\beta_2$ and then wish to make predictions about crime rates for cities whose crime rate you do not know. You might want to predict the crime rate for a particular city, say Toronto, whose population would give it a value $X_0 = 3.2$ million. That would be an individual prediction. On the other hand, you may want to know what your model predicts the crime rate to be on average in cities of size 3.2 million. That would be a mean prediction. In effect, you want to average out the effect of the disturbance terms $u$ .

    The OLS prediction is the same for both mean and individual predictions:
    $$\eqalign{\hat Y_0 | X_0 \equiv \hat\beta_1 + \hat\beta_2 X_0\cr \hat{E}[ Y_0 | X_0] \equiv \hat\beta_1 + \hat\beta_2 X_0\cr }$$
    The predictions are the same because the expected value of $u_0$ (that is, the disturbance term for a particular observation such as Toronto) is 0. So one would use the OLS regression line to predict out of sample as well as in sample. We can think of the difference the following way:
    $$\hat Y_0 | X_0 = \hat{E}[Y_0 | X_0] + E[u_0 | X_0] = \hat{E}[Y_0 | X_0]$$
    This equation suggests that the difference between the predictions lies in their variance. The precision of an individual prediction is lower because the variance of the disturbance must be taken into account.
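    The identity above — the point prediction is the same whether we target the individual value or the mean value — can be sketched numerically. The data below are simulated for illustration; the variable names and parameter values are not from the text:

```python
import numpy as np

# Simulated sample (illustrative only): Y = 2 + 0.5 X + u
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

# OLS estimates in deviation form: beta2_hat = sum(x_i y_i) / sum(x_i^2)
x = X - X.mean()
beta2_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta1_hat = Y.mean() - beta2_hat * X.mean()

# The point prediction for X_0 is the same for the individual value Y_0
# and the mean value E[Y_0 | X_0], since E[u_0 | X_0] = 0.
X0 = 7.0
Y0_hat = beta1_hat + beta2_hat * X0
print(Y0_hat)
```

    Only the precision attached to the prediction differs between the two targets, which is what the variance formulas below capture.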
    $$\eqalign{Var(\hat{E}[Y_0 | X_0]) &= Var(\hat\beta_1 + \hat\beta_2 X_0) \cr &= Var(\bar Y) + (X_0-\bar X)^2Var(\hat\beta_2)\cr &= \sigma^2\biggl( {1\over N} + {(X_0-\bar X)^2 \over \sum x_i^2} \biggr)\cr} $$
    Notice that the variance increases the farther the value of $X_0$ is from the sample mean of X. It is at the sample mean of X that we have the most information about the relationship between X and Y; the farther we move from that point, the less information we have and the more unsure we are of the location of the population regression line.
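    A short sketch of this point, treating $\sigma^2$ as known for simplicity (simulated data, names illustrative): the variance of the mean prediction is smallest at $\bar X$ and grows quadratically as $X_0$ moves away.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
sigma2 = 1.0                       # true disturbance variance, assumed known here
X = rng.uniform(0, 10, size=N)
x = X - X.mean()
Sxx = (x ** 2).sum()

def var_mean_pred(X0):
    # Var(E_hat[Y_0 | X_0]) = sigma^2 * (1/N + (X_0 - Xbar)^2 / sum x_i^2)
    return sigma2 * (1.0 / N + (X0 - X.mean()) ** 2 / Sxx)

# Minimized at the sample mean of X, where it equals sigma^2 / N
v_at_mean = var_mean_pred(X.mean())
v_far = var_mean_pred(X.mean() + 5)
print(v_at_mean, v_far)
```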

    Since we assume that the disturbance term is unrelated to the value of $X$ , we can see that
    $$Var(\hat Y_0 | X_0 ) = \sigma^2 + Var(\hat E[Y_0 | X_0]) = \sigma^2\biggl( 1 + {1\over N} + {(X_0-\bar X)^2 \over \sum x_i^2} \biggr)$$
    Of course we can't directly compute the variance of our predictions, because it depends upon the value of $\sigma^2$ . As usual, we can only compute the estimated variance and standard deviation of the prediction.
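    In practice we replace $\sigma^2$ with $\hat\sigma^2$ , the sum of squared residuals divided by $N-2$ . A sketch of the estimated variance of an individual prediction (simulated data, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
X = rng.uniform(0, 10, size=N)
Y = 1.0 + 2.0 * X + rng.normal(0, 1.5, size=N)

# OLS fit
x = X - X.mean()
b2 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b1 = Y.mean() - b2 * X.mean()

# Estimated sigma^2 from the residuals, with N-2 degrees of freedom
resid = Y - (b1 + b2 * X)
sigma2_hat = (resid ** 2).sum() / (N - 2)

# Estimated variance of the individual prediction at X_0:
# sigma2_hat * (1 + 1/N + (X_0 - Xbar)^2 / sum x_i^2)
X0 = 4.0
var_ind = sigma2_hat * (1 + 1.0 / N + (X0 - X.mean()) ** 2 / (x ** 2).sum())
se_ind = np.sqrt(var_ind)
print(se_ind)
```

    Note that the leading 1 inside the parentheses guarantees the individual prediction is always estimated less precisely than the mean prediction.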

    Note that under A7 both predictions are normally distributed, so that we can compute confidence intervals and perform hypothesis tests on actual values of Y and E[Y].

    Once we know the formula for the variance of the prediction we are making, the formula for a confidence interval for the prediction is the same as usual:
    $$\eqalign{\hat{Y}_0 | X_0 &\pm \hat{se}(\hat Y_0) t_{N-2,1-\alpha}\cr \hat{E}[Y_0|X_0] &\pm \hat{se}(\hat E) t_{N-2,1-\alpha}\cr} $$
    $$\eqalign{ \hat{se}\bigl(\hat E(Y_0|X_0)\bigr) &\equiv \sqrt{ \hat\sigma^2 /N + (X_0-\bar X)^2 \hat{Var}(\hat\beta_2) }\cr \hat{se}\bigl(\hat Y_0 | X_0\bigr) &\equiv \sqrt{ \hat\sigma^2(1+1/N) + (X_0-\bar X)^2 \hat{Var}(\hat\beta_2) }\cr } $$
    To compute the confidence interval by hand using only the regression output requires five numbers:

    5 Pieces of Information to Compute Confidence Intervals for Predictions

    1. $X_0$ (the value to predict for, chosen by you)
    2. $\hat\sigma^2$ (reported as Mean Squared Error in the table)
    3. sample size $N$
    4. $\bar X$ (sample mean of X, which requires use of the summarize command)
    5. $\hat{Var}(\hat\beta_2)$ (square of the reported standard error of $\hat\beta_2$)
    All but $\bar X$ can be taken directly off the Stata regression output. Stata also has a built-in predict command that computes predictions and standard errors of predictions. Learn more about them in the Week 4 tutorial.
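    The by-hand calculation can be sketched as follows. All five input numbers below are hypothetical values standing in for what you would read off the regression output; the critical value $t_{38,\,0.975} \approx 2.024$ is taken from a standard t table for a two-sided 95% interval:

```python
import math

# Five pieces of information (hypothetical values for illustration)
X0 = 3.2            # value to predict for (chosen by you)
sigma2_hat = 0.81   # Mean Squared Error from the table
N = 40              # sample size
Xbar = 2.5          # sample mean of X (from summarize)
var_b2 = 0.04       # square of the reported standard error of beta2_hat

b1, b2 = 1.0, 0.7   # coefficient estimates (hypothetical)
Y0_hat = b1 + b2 * X0

# Standard errors of the mean and individual predictions
se_mean = math.sqrt(sigma2_hat / N + (X0 - Xbar) ** 2 * var_b2)
se_ind = math.sqrt(sigma2_hat * (1 + 1.0 / N) + (X0 - Xbar) ** 2 * var_b2)

tcrit = 2.024       # t_{N-2, 0.975} for N-2 = 38, from a t table
ci_mean = (Y0_hat - tcrit * se_mean, Y0_hat + tcrit * se_mean)
ci_ind = (Y0_hat - tcrit * se_ind, Y0_hat + tcrit * se_ind)
print(ci_mean, ci_ind)
```

    As expected, the interval for the individual value contains the interval for the mean value, since the former must also allow for the draw of $u_0$ .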

    This document was created using HTX, a (HTML/TeX) interlacing program written by Chris Ferrall.
    Document Last revised: 1997/1/5