We therefore need a single test statistic for a joint (multiple) hypothesis.
We will restrict attention to linear hypotheses, which are linear
restrictions on the values contained in the parameter vector $\beta$.
A linear restriction on $\beta$ takes the form

$$H_0: R\beta = c,$$

where $R$ is an $r \times k$ matrix of constants and $c$ is an $r \times 1$ vector of
constants. For example, if k = 5 then the joint hypothesis above could
be written:
Suppose $H_0: R\beta = c$ is true. In this world the true parameters actually satisfy the
restriction being imposed on the estimated parameters in the restricted model.
Imposing a true restriction should not make very much difference in the
results.
That is, imposing a true restriction should not change by much the amount of variation
in Y attributed to the residual u: $RSS_R$, the residual sum of squares of the
restricted model, would not be very different from $RSS_U$, the residual sum of
squares of the unrestricted model, if $H_0$ is true. The only reason why
they would differ at all if $H_0$ is true is that allowing $R\hat{\beta} \neq c$
lets the estimates pick up sampling variation in
the relationship between X and Y. (Note that $RSS_R$ can never be smaller than
$RSS_U$, because the restricted estimates are chosen from a smaller set of
candidate parameter vectors.) On the other hand,
imposing a false restriction on the estimated parameters
should impair OLS's ability to fit the Y observations.
That is, if $R\beta \neq c$, then this should be reflected in a much larger $RSS_R$,
and the RSS in the restricted and unrestricted models would be very different.
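The comparison of the two residual sums of squares is turned into a test statistic by scaling each by its degrees of freedom. Written with $r$ restrictions, $N$ observations and $k$ parameters, the statistic is the following; it is the same calculation carried out by hand in the Stata log later in this section:

$$F = \frac{(RSS_R - RSS_U)/r}{RSS_U/(N-k)} \sim F(r,\, N-k) \quad \text{under } H_0: R\beta = c.$$

Large values of $F$, which arise when imposing the restriction raises the RSS substantially, count as evidence against $H_0$.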
When the restriction matrix R contains one row (r = 1), the test
statistic is distributed F(1, N-k). A chi-square random variable
with one degree of freedom is simply a standard normal variable Z squared.
So an F(1, N-k) variable can be written
$\frac{Z^2/1}{C/(N-k)}$, where C is a chi-squared variable
with N-k degrees of freedom, independent of Z. But this expression is exactly the
square of $\frac{Z}{\sqrt{C/(N-k)}}$, which is a t variable with
N-k degrees of freedom. Hence a t-test is a special case of the
more general F-test: a two-sided t-test and the F-test of the same single
restriction reject in exactly the same samples.
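The same fact can be checked at the level of a fitted regression. The sketch below is in Python rather than Stata only so that it is self-contained; the data are simulated and the function names (`residualize`, `f_and_t_for_one_restriction`) are hypothetical. It tests the single restriction $\beta_3 = 0$ in $y = \beta_1 + \beta_2 x_1 + \beta_3 x_2 + u$ by comparing restricted and unrestricted RSS, and also computes the usual t statistic for $\hat\beta_3$. To avoid a matrix inverse it uses the Frisch-Waugh-Lovell theorem (partialling the constant and $x_1$ out of $y$ and $x_2$), a swapped-in technique that is not part of these notes.

```python
import random

def residualize(y, x):
    """Residuals from an OLS regression of y on a constant and one regressor x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

def f_and_t_for_one_restriction(y, x1, x2):
    """Test H0: beta_3 = 0 in y = b1 + b2*x1 + b3*x2 + u by F and by t."""
    n, k = len(y), 3
    # Partial the constant and x1 out of y and x2 (Frisch-Waugh-Lovell).
    ey = residualize(y, x1)   # also the residuals of the restricted model
    e2 = residualize(x2, x1)
    s22 = sum(v * v for v in e2)
    b3 = sum(a * b for a, b in zip(e2, ey)) / s22
    # Unrestricted residuals equal ey - b3 * e2 by the FWL theorem.
    rss_u = sum((a - b3 * b) ** 2 for a, b in zip(ey, e2))
    rss_r = sum(v * v for v in ey)
    s2 = rss_u / (n - k)                  # estimate of the error variance
    t = b3 / (s2 / s22) ** 0.5            # t statistic for b3
    f = ((rss_r - rss_u) / 1) / (rss_u / (n - k))  # F with r = 1
    return f, t

random.seed(1)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
y = [1.0 + 2.0 * a + 0.3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]
f, t = f_and_t_for_one_restriction(y, x1, x2)
print(f, t ** 2)
```

The two printed numbers agree up to floating-point rounding, whatever seed is used.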
Notice that the restricted model simply includes a constant. The OLS
estimate of the intercept $\beta_1$ in the restricted model would be simply
$\bar{Y}$. $RSS_R$ would then simply be TSS! The
restricted unexplained variation is simply the total variation in Y
around its sample mean. Recall that TSS = ESS + RSS, so
$RSS_R - RSS_U$ equals $TSS - RSS_U$, which equals ESS.

So the test statistic for a test of overall significance, which imposes the
r = k - 1 restrictions $\beta_2 = \cdots = \beta_k = 0$, reduces to

$$F = \frac{ESS/(k-1)}{RSS/(N-k)},$$

which is simply the ratio of the two MS (mean-squared) entries in the
analysis of variance table. The F statistic reported in
the Stata regression output is exactly this number.
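Dividing the numerator and the denominator of the overall F statistic by TSS gives an equivalent form in terms of $R^2 = ESS/TSS$, which can be handy when only $R^2$ is reported:

$$F = \frac{ESS/(k-1)}{RSS/(N-k)} = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(N-k)} = \frac{R^2/(k-1)}{(1-R^2)/(N-k)}.$$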
We can go even further here. If we have a simple LRM (k = 2), then the
test for overall significance collapses to $H_0: \beta_2 = 0$, which we
know can also be tested with the t test $t = \hat{\beta}_2 / \widehat{se}(\hat{\beta}_2)$.
But the F version of the test is F(1, N-k), which we know from example
1 is the square of a t variable. And, indeed, the overall F statistic is
exactly equal to the square of the t statistic for
$\hat{\beta}_2$ in a simple LRM. You should confirm this fact.
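One way to confirm it numerically is sketched below in Python, on simulated data (the function name `simple_ols_f_and_t` is hypothetical). It fits $y = \beta_1 + \beta_2 x + u$, builds the overall F statistic from ESS and RSS exactly as in the formula above (with the restricted RSS equal to TSS), and compares it with the squared t statistic for $\hat\beta_2$.

```python
import random

def simple_ols_f_and_t(x, y):
    """Overall F statistic and slope t statistic for y = b1 + b2*x + u (k = 2)."""
    n, k = len(y), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)   # restricted RSS: constant only
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    ess = tss - rss
    f = (ess / (k - 1)) / (rss / (n - k))     # MS(model) / MS(residual)
    se_b2 = ((rss / (n - k)) / sxx) ** 0.5    # standard error of the slope
    t = b2 / se_b2
    return f, t

random.seed(2)
x = [random.uniform(0, 10) for _ in range(30)]
y = [3.0 + 0.4 * xi + random.gauss(0, 2) for xi in x]
f, t = simple_ols_f_and_t(x, y)
print(f"F = {f:.6f}, t^2 = {t * t:.6f}")
```

The identity holds algebraically here, since $ESS = \hat\beta_2^2 S_{xx}$ and $\widehat{se}(\hat\beta_2)^2 = \frac{RSS/(N-2)}{S_{xx}}$, so the two printed values match for any data set with variation in x.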
. * Let's test whether expected age of first intercourse
. * differs significantly by religious background
. test none cath oth
( 1) none = 0.0
( 2) cath = 0.0
( 3) oth = 0.0
F( 3, 5590) = 31.10
Prob > F = 0.0000
. * Conclusion: We can reject the hypothesis that
. * expected age of first intercourse is the same for
. * all religious backgrounds
. *
. * Let's perform the F test directly rather than using "test"
. * The regress above is the Unrestricted Model
. qui regress
. di "Unrestricted RSS = " _result(4)
Unrestricted RSS = 27598.636
. local u = _result(4)
. * The Restricted Model imposes H0 on the estimates, which
. * in this case means none cath oth all have zero coefficients
. * So the Restricted model is
. regress age
Source | SS df MS Number of obs = 5594
---------+------------------------------ F( 0, 5593) = .
Model | 0.00 0 . Prob > F = .
Residual | 28059.2594 5593 5.0168531 R-squared = 0.0000
---------+------------------------------ Adj R-squared = 0.0000
Total | 28059.2594 5593 5.0168531 Root MSE = 2.2398
------------------------------------------------------------------------------
age | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
_cons | 17.42063 .0299471 581.714 0.000 17.36192 17.47934
------------------------------------------------------------------------------
. di "The Restricted RSS = " _result(4)
The Restricted RSS = 28059.259
. local r = _result(4)
. di "So the F statistic for the test is " ((`r'-`u')/3)/(`u'/(5594-4))
So the F statistic for the test is 31.099197
. * Notice that this is the F statistic reported by test, and
. * since this is a test of overall significance it is also
. * equal to the F statistic reported in the regression output
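The hand calculation in the log can be reproduced outside Stata. The Python sketch below uses a hypothetical helper `f_from_rss` and plugs in the numbers shown in the log: N = 5594 observations, k = 4 parameters (none, cath, oth and the constant), r = 3 restrictions, the unrestricted RSS as displayed by `di`, and the restricted RSS taken from the Total SS of `regress age`.

```python
def f_from_rss(rss_restricted, rss_unrestricted, r, n, k):
    """F statistic for r linear restrictions: ((RSS_R - RSS_U)/r) / (RSS_U/(N-k))."""
    numerator = (rss_restricted - rss_unrestricted) / r   # MS of the restriction
    denominator = rss_unrestricted / (n - k)              # unrestricted MS residual
    return numerator / denominator

# Values copied from the Stata log above.
rss_u = 27598.636    # unrestricted RSS, as displayed (rounded) by Stata
rss_r = 28059.2594   # restricted RSS = Total SS of "regress age"
f = f_from_rss(rss_r, rss_u, r=3, n=5594, k=4)
print(round(f, 4))
```

Because the unrestricted RSS is copied at the precision Stata displayed it, the result agrees with the 31.099197 in the log only up to that rounding.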