(In both expressions, the second component is Y "bar", the sample mean of the Y observations.) Recall that one of our numerical properties of OLS is that the average predicted value equals the sample mean of Y. Therefore, ESS is also a "sum of squared deviations from the mean", as are TSS and RSS. In other words, ESS is how much the predicted values in the sample vary, while TSS is how much the actual values vary.

Let us derive a formula for ESS:
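
A sketch of that derivation, writing the fitted values as $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$. The symbols $\hat{\beta}_1$ (intercept estimate) and $\hat{\beta}_2$ (slope estimate) are assumed here for illustration and may differ from the notation used elsewhere in these notes. Because the OLS line passes through the point of sample means, $\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}$, so:

```latex
\[
\begin{aligned}
ESS &= \sum_{i=1}^{N} \left( \hat{Y}_i - \bar{Y} \right)^2 \\
    &= \sum_{i=1}^{N} \left( \hat{\beta}_1 + \hat{\beta}_2 X_i - \hat{\beta}_1 - \hat{\beta}_2 \bar{X} \right)^2 \\
    &= \hat{\beta}_2^{\,2} \sum_{i=1}^{N} \left( X_i - \bar{X} \right)^2
\end{aligned}
\]
```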

In other words, ESS depends only on the values of X and the OLS slope estimate.

With a little bit of algebra we can go on to show

$$TSS = ESS + RSS$$

That is, the OLS estimates of the LRM decompose the total variation in Y into an explained part (ESS) and a residual part (RSS).
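
The decomposition of total variation can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the data are made up, and the helper names `ols_fit` and `sums_of_squares` are hypothetical.

```python
# Hypothetical sample data, made up purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_fit(x, y):
    """Return (intercept, slope) of the simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) for the simple OLS regression of y on x."""
    intercept, slope = ols_fit(x, y)
    y_bar = sum(y) / len(y)
    fitted = [intercept + slope * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)               # actual vs. mean
    ess = sum((fi - y_bar) ** 2 for fi in fitted)          # predicted vs. mean
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # actual vs. predicted
    return tss, ess, rss


tss, ess, rss = sums_of_squares(X, Y)
print("TSS       =", tss)
print("ESS + RSS =", ess + rss)
```

The two printed values should agree up to floating-point rounding. The identity relies on the regression including an intercept, as it does here.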

Notice that under A7, each Y observation as well as the sample mean of Y
are normally distributed. Therefore $Y_i - \bar{Y}$ is normally
distributed. TSS is therefore a sum of squared normal variables with
mean 0. To calculate TSS, one needs to have calculated $\bar{Y}$, which
imposes one linear restriction on the deviations. The degrees of freedom
associated with TSS are therefore **N-1**. We have already
argued that calculating RSS requires two restrictions, embodied in the two
OLS normal equations. The degrees of freedom associated with RSS are
**N-2**. What about ESS? Under assumptions **A0-A7** the OLS slope
estimate is normally distributed and, given the values of X, ESS depends
only on its value. Even though ESS is defined as the sum of N squared
normal variables, there are N-1 dependencies among them. ESS is therefore
associated with a chi-squared random variable with **1 degree of freedom**.
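
The restrictions behind these degrees of freedom can be seen in a small sample. The sketch below uses made-up data and hypothetical variable names. It checks the one restriction on the deviations in TSS, the two normal equations behind RSS, and the fact that every term in ESS is pinned down by a single number, the slope estimate.

```python
# Hypothetical sample data, made up purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
N = len(X)

x_bar = sum(X) / N
y_bar = sum(Y) / N

# OLS slope and intercept for the simple regression of Y on X.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
         / sum((x - x_bar) ** 2 for x in X))
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

# TSS: one restriction, the deviations from the mean sum to zero.
tss_restriction = sum(y - y_bar for y in Y)

# RSS: two restrictions, the two OLS normal equations.
rss_restriction_1 = sum(residuals)
rss_restriction_2 = sum(x * e for x, e in zip(X, residuals))

# ESS: each fitted deviation equals the slope times the X deviation,
# so one number (the slope) determines all N of them.
ess_gaps = [(f - y_bar) - slope * (x - x_bar) for x, f in zip(X, fitted)]

print(tss_restriction, rss_restriction_1, rss_restriction_2)
print(max(abs(g) for g in ess_gaps))
```

All printed values should be zero up to floating-point rounding.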

This leads us to the conclusion that OLS not only decomposes the variance in the Y observations, it also decomposes the degrees of freedom in the variation:

$$\underbrace{N-1}_{\text{D. of F. for TSS}} = \underbrace{1}_{\text{D. of F. for ESS}} + \underbrace{N-2}_{\text{D. of F. for RSS}}$$

The preceding discussion of the analysis of variance leads quite easily into a measure of how well the OLS estimates fit the sample data. First, imagine a data set in which all the values of X and Y line up on a line.

In this case, OLS will minimize the squared errors and will recover the
line. The predicted values for each observation will be identical to
the actual values: $\hat{Y}_i = Y_i$
for all i. The OLS estimates would
*perfectly fit the data*. Or, in other words, RSS = 0. But then
this must mean that TSS = ESS: all the variation in Y is explained by
the linear regression model.

Now take the other extreme in which there is no correlation between X and
Y in the sample. The sample is simply a *flat cloud* of points.

In this case, the OLS estimates would lead to a slope estimate of zero and $\hat{Y}_i = \bar{Y}$ for all i. Notice that if the slope estimate is zero, then ESS = 0. That is, none of the variation in Y is explained by variation in X. Therefore TSS = RSS: all the variation is left unaccounted for by the model. This would be the worst fit of the data.

If we assign a "1" to a perfect fit and a "0" to the worst fit, then all
other cases should lie somewhere in between. X should explain some of
the variation in Y, but generally not all of it. A logical *measure of
fit* would therefore be:

$$R^2 = \text{Measure of fit} = \frac{ESS}{TSS}$$

This ratio is usually referred to as R² or "R-squared". The interpretation is simple:

R² is the proportion of total variation in Y in the sample explained by the OLS regression line. It is a rough measure of how close the sample data lie to the estimated regression line.
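
As a rough illustration of the two extremes and a case in between, here is a minimal sketch in plain Python. The function name `r_squared` and all three data sets are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def r_squared(x, y):
    """Return R-squared = ESS / TSS for the simple OLS regression of y on x.

    Assumes y is not constant (otherwise TSS = 0 and the ratio is undefined).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return ess / tss


# Hypothetical data sets, made up purely for illustration.
X = [1.0, 2.0, 3.0, 4.0]
on_a_line = [3.0, 5.0, 7.0, 9.0]    # Y = 1 + 2X exactly
flat_cloud = [5.0, 7.0, 7.0, 5.0]   # zero sample correlation with X
in_between = [3.0, 6.0, 6.0, 9.0]   # positive but imperfect relationship

r2_line = r_squared(X, on_a_line)
r2_flat = r_squared(X, flat_cloud)
r2_mid = r_squared(X, in_between)
print(r2_line, r2_flat, r2_mid)
```

The first value should come out as 1 (perfect fit), the second as 0 (worst fit), and the third strictly between the two.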