
$\sigma^2$. This is another unknown population parameter which must
be estimated from the data. Since the prediction error $\hat{u}_i$
is analogous to the actual disturbance term $u_i$, it might make sense
to estimate the variance of the u's with the (observable) variance of
the prediction errors.
We already know from the first normal equation,
$\sum_{i=1}^{N} \hat{u}_i = 0$,
that the OLS prediction errors have a (sample) mean of 0. Their sample
variance therefore equals
$$\hat{\sigma}^2 = \frac{1}{N-2} \sum_{i=1}^{N} \hat{u}_i^2 .$$
The reason for dividing by $N-2$ rather than $N$ will be made clear later.
Intuitively, the reason for dividing by $N-2$ is that two parameters must
be estimated to compute the prediction errors (the intercept and slope
estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$). Similarly, when
computing the sample variance of a variable, it is typical to divide by
$N-1$ because one parameter (the sample mean) must be estimated to
compute the sample variance. Each estimated parameter uses up one degree
of freedom in the sum of squares that defines the variance estimate
$\hat{\sigma}^2$.
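The degrees-of-freedom argument can be checked numerically. The sketch below is a small simulation, not part of the derivation: the true model (y = 1 + 2x + u with standard normal u), the five fixed x values, and the function names `ols_residuals` and `simulate` are all assumptions made up for illustration. It averages the sum of squared prediction errors over many simulated samples and divides by N-2 and by N in turn.

```python
import random

def ols_residuals(x, y):
    """Return the OLS prediction errors from regressing y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

def simulate(n_reps=20000, sigma=1.0, seed=0):
    """Average the two candidate variance estimates over simulated samples."""
    rng = random.Random(seed)
    x = [1.0, 2.0, 3.0, 4.0, 5.0]  # fixed regressor values, so N = 5
    n = len(x)
    total_ssr = 0.0
    for _ in range(n_reps):
        # Assumed true model (illustration only): y = 1 + 2x + u, u ~ N(0, sigma^2)
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]
        total_ssr += sum(e ** 2 for e in ols_residuals(x, y))
    mean_ssr = total_ssr / n_reps
    return mean_ssr / (n - 2), mean_ssr / n

avg_div_n_minus_2, avg_div_n = simulate()
print(avg_div_n_minus_2, avg_div_n)
```

With a true variance of 1 and N = 5, the first average should land near 1, while the second should land near (N-2)/N = 0.6, which is the sense in which dividing by N understates the variance of the u's.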
Each of the following properties is derived in class.
$E[\hat{\beta}_0] = \beta_0$,
$E[\hat{\beta}_1] = \beta_1$, and
$E[\hat{\sigma}^2] = \sigma^2$.
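Because $\sigma^2$ is unknown, it is necessary to estimate the variances
of $\hat{\beta}_0$ and $\hat{\beta}_1$ by inserting the estimate of
$\sigma^2$:
$$\widehat{\mathrm{Var}}(\hat{\beta}_0) = \hat{\sigma}^2 \, \frac{\sum_{i=1}^{N} X_i^2}{N \sum_{i=1}^{N} (X_i - \bar{X})^2}$$
and
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} .$$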
$\hat{\beta}_0$ and $\hat{\beta}_1$ are statistically independent
of $\hat{\sigma}^2$.
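The ratio $(\hat{\beta}_1 - \beta_1) / \widehat{se}(\hat{\beta}_1)$, where
$\widehat{se}(\hat{\beta}_1)$ is the square root of the estimated variance
of $\hat{\beta}_1$, follows a t distribution with $N-2$ degrees of freedom
(and likewise for $\hat{\beta}_0$).
This property follows directly from the three previous properties and
the definition of the t distribution.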
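To make these properties concrete, here is a minimal sketch that computes the estimated variances and a t ratio for the slope. It assumes the standard simple-regression formulas (the estimated variance of the slope is the variance estimate divided by the sum of squared deviations of X; the intercept's is the variance estimate times the sum of squared X over N times that same sum of squared deviations). The data, the hypothesized slope of 2, and the function names `ols_with_variances` and `slope_t_ratio` are made up for illustration.

```python
import math

def ols_with_variances(x, y):
    """Simple-regression OLS estimates and their estimated variances."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Variance estimate: sum of squared prediction errors divided by N-2
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)
    # Insert sigma2_hat where the unknown disturbance variance would appear
    var_b0 = sigma2_hat * sum(xi ** 2 for xi in x) / (n * sxx)
    var_b1 = sigma2_hat / sxx
    return b0, b1, var_b0, var_b1

def slope_t_ratio(x, y, beta1_null):
    """(slope estimate - hypothesized slope) / estimated standard error."""
    _, b1, _, var_b1 = ols_with_variances(x, y)
    return (b1 - beta1_null) / math.sqrt(var_b1)

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, var_b0, var_b1 = ols_with_variances(x, y)
t = slope_t_ratio(x, y, beta1_null=2.0)
print(b0, b1, var_b0, var_b1, t)
```

In this example N = 5, so the resulting ratio would be compared with a t distribution with N-2 = 3 degrees of freedom.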