We therefore need a single test statistic for a multiple hypothesis.
We will restrict attention to linear hypotheses, which are linear
restrictions on the values contained in the parameter vector
$\beta$. A
linear restriction on $\beta$
takes the form $R\beta = c$,
where R is an r x k matrix of constants and c is an r x 1 vector of
constants. For example, if k = 5 then the joint hypothesis above could
be written:
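To make the notation concrete, here is an illustrative case with k = 5. The particular restrictions ($\beta_2 = 0$ and $\beta_3 = \beta_4$) and the indexing of the coefficients from 1 to 5 are assumptions chosen for illustration, and need not match the joint hypothesis referred to above. They give r = 2 rows, one per restriction:

```latex
\underbrace{\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0
\end{pmatrix}}_{R\ (2 \times 5)}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{pmatrix}
=
\underbrace{\begin{pmatrix} 0 \\ 0 \end{pmatrix}}_{c\ (2 \times 1)}
```

Each row of R picks out one linear combination of the parameters, and the matching entry of c is the value that combination takes under the hypothesis.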
In this world the true parameters actually satisfy the restriction being imposed on the estimated parameters in the restricted model. Imposing a true restriction should not make very much difference in the results. That is, imposing a true restriction should not change the amount of variation in Y attributed to the residual u: the restricted residual sum of squares $RSS_R$ would not be very different from the unrestricted $RSS_U$ if $H_0$ is true. The only reason they would differ at all if $H_0$ is true is that relaxing the restriction allows the estimates to pick up sampling variation in the relationship between X and Y.
On the other hand, imposing a false restriction on the estimated parameters should hurt the ability of OLS to fit the Y observations. That is, if $R\beta \neq c$, this would be reflected in $RSS_R$, and the RSS in the restricted and unrestricted models would differ substantially.
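One common way to turn this comparison into a single number, and the one consistent with the hand calculation in the Stata log later in these notes, is the ratio below. The subscripts R and U (restricted and unrestricted model) are notation assumed here, r is the number of restrictions, and the stated distribution relies on the usual classical assumption of normally distributed errors:

```latex
F = \frac{(RSS_R - RSS_U)/r}{RSS_U/(N-k)} \;\sim\; F(r,\,N-k) \quad \text{under } H_0 .
```

Since imposing a restriction can never improve the fit, the numerator is non-negative, and large values of F are evidence against the restriction.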
When the restriction matrix R contains one row (r=1), then the test statistic is F(1,N-k). A chi-square random variable with one degree of freedom is simply a standard normal Z variable squared. So F(1,N-k) can be written $Z^2 / (C/(N-k))$, where C is a chi-squared variable with N-k degrees of freedom, independent of Z. But this expression is exactly the square of a t variable with N-k degrees of freedom. Hence a t-test is a special case of the more general F-test.
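The step from the ratio to the t distribution can be written out as follows. This is a sketch using the standard definitions, with Z standard normal and C an independent chi-squared variable with N-k degrees of freedom:

```latex
t = \frac{Z}{\sqrt{C/(N-k)}} \sim t_{N-k}
\quad\Longrightarrow\quad
t^2 = \frac{Z^2/1}{C/(N-k)} \sim F(1,\,N-k).
```

One practical consequence is that the two-sided t test and the F test with r = 1 always give the same p-value.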
Notice that the restricted model simply includes a constant. The OLS
estimate of the constant
would in the restricted model be simply
the sample mean $\bar{Y}$. The
restricted residual sum of squares $RSS_R$ would then simply be TSS! The
restricted unexplained variation is simply the total variation in Y
around its sample mean. Recall that TSS = ESS + RSS, so
$RSS_R - RSS_U$ (the subscripts R and U denote the restricted and
unrestricted models) equals $TSS - RSS_U$, which equals
$ESS_U$, the explained sum of squares of the unrestricted model.
So the test statistic for a test of overall significance reduces to
$F = \frac{ESS_U/(k-1)}{RSS_U/(N-k)}$,
which is simply the ratio of the two MS (mean-squared) entries in the
analysis of variance table. The F statistic reported in
the Stata output table is exactly this number.
We can go even further here. If we have a simple LRM (k=2), then the test for overall significance collapses to a test that the single slope coefficient is zero, which we know can also be tested with the usual t test. But the F version of the test is F(1,N-k), which we know from example 1 is the square of a t variable. And, indeed, the overall F statistic is exactly equal to the square of the t statistic for the slope coefficient in a simple LRM. You should confirm this fact.
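One way to confirm this numerically is the short Python sketch below. The data are made up purely for illustration (they are not from the notes), and the variable names are hypothetical. It fits a simple regression by hand, computes the t statistic for the slope, and computes the overall F statistic from the restricted and unrestricted residual sums of squares.

```python
# Hypothetical data for a simple regression y = a + b*x + u (k = 2).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

n, k = len(x), 2
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for the simple regression.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Unrestricted RSS (constant and slope) and restricted RSS (constant only).
rss_u = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
rss_r = sum((yi - y_bar) ** 2 for yi in y)  # this is TSS

# t statistic for the slope.
s2 = rss_u / (n - k)              # estimated error variance
se_slope = (s2 / sxx) ** 0.5      # standard error of the slope
t_stat = slope / se_slope

# Overall F statistic with r = 1 restriction.
f_stat = ((rss_r - rss_u) / 1) / (rss_u / (n - k))

print(f"t = {t_stat:.6f}, t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The printed values of t^2 and F should agree up to floating-point rounding, whatever data are substituted for x and y.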
. * Let's test whether expected age of first intercourse
. * differs significantly by religious background
. test none cath oth
( 1) none = 0.0
( 2) cath = 0.0
( 3) oth = 0.0
F( 3, 5590) = 31.10
Prob > F = 0.0000
. * Conclusion: We can reject the hypothesis that
. * expected age of first intercourse is the same for
. * all religious backgrounds
. *
. * Let's perform the F test directly rather than using "test"
. * The regress above is the Unrestricted Model
. qui regress
. di "Unrestricted RSS = " _result(4)
Unrestricted RSS = 27598.636
. local u = _result(4)
. * The Restricted Model imposes H0 on the estimates, which
. * in this case means none cath oth all have zero coefficients
. * So the Restricted model is
. regress age
  Source |       SS       df       MS              Number of obs =    5594
---------+------------------------------           F(  0,  5593) =       .
   Model |        0.00     0           .           Prob > F      =       .
Residual |  28059.2594  5593   5.0168531           R-squared     =  0.0000
---------+------------------------------           Adj R-squared =  0.0000
   Total |  28059.2594  5593   5.0168531           Root MSE      =  2.2398
------------------------------------------------------------------------------
     age |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   _cons |   17.42063   .0299471    581.714   0.000       17.36192    17.47934
------------------------------------------------------------------------------
. di "The Restricted RSS = " _result(4)
The Restricted RSS = 28059.259
. local r = _result(4)
. di "So the F statistic for the test is " ((`r'-`u')/3)/(`u'/(5594-4))
So the F statistic for the test is 31.099197
. * Notice that this is the F statistic reported by test, and
. * since this is a test of overall significance it is also
. * equal to the F statistic reported in the regression output
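The hand calculation in the log above can also be reproduced outside Stata. The Python sketch below is only an illustration: the function name `f_statistic` and its argument names are hypothetical. The two RSS values are the ones displayed in the log, so they carry only the three decimals shown there.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params):
    """F statistic comparing restricted and unrestricted residual sums of squares."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - n_params)
    return numerator / denominator

# Values as displayed in the log above (rounded to three decimals there).
rss_u = 27598.636   # unrestricted RSS: constant plus none, cath, oth (k = 4)
rss_r = 28059.259   # restricted RSS: constant only, which equals TSS

f_stat = f_statistic(rss_r, rss_u, n_restrictions=3, n_obs=5594, n_params=4)
print(f"F(3, 5590) = {f_stat:.4f}")
```

Because the inputs are rounded, the result can differ from the 31.099197 that Stata displays in the later decimal places, but it should agree with the reported F( 3, 5590) = 31.10 after rounding.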