A3 would then appear to be difficult to take seriously. But all of our
results can be derived from much weaker assumptions. In particular,
the actual assumption that we need to make is
A5 and A6 are more important than A4. A5 says that the disturbance terms of different observations in the data set are uncorrelated: they have zero covariance. Notice that we didn't say "independent," because independence is a stronger assumption than zero covariance. In fact, independence between two random variables is very hard to test, but testing whether the disturbance terms have zero covariance is not difficult.
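To see why zero covariance is the weaker requirement, here is a minimal sketch using a made-up discrete distribution (it is not taken from the text): x takes the values -1, 0, 1 with equal probability and y = x**2. The two variables are perfectly dependent, yet their covariance is exactly zero.

```python
from fractions import Fraction

# Hypothetical toy distribution: x is -1, 0 or 1 with probability 1/3 each,
# and y is fully determined by x through y = x**2.
support = [(-1, 1), (0, 0), (1, 1)]
p = Fraction(1, 3)

def expect(f):
    """Expected value of f(x, y) under the toy distribution."""
    return sum(p * f(x, y) for x, y in support)

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
covariance = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence would require P(x=0, y=0) == P(x=0) * P(y=0).
p_joint = sum(p for x, y in support if x == 0 and y == 0)
p_x0 = sum(p for x, y in support if x == 0)
p_y0 = sum(p for x, y in support if y == 0)
factorizes_at_zero = (p_joint == p_x0 * p_y0)

print("covariance:", covariance)
print("joint probability factorizes at (0, 0):", factorizes_at_zero)
```

The covariance is zero even though knowing x pins down y exactly, so a zero-covariance check cannot by itself establish independence.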
A6 says that each of the disturbance terms has the same variance, $\sigma^2$. That is, each observation has equal variance around the PRE. This means each observation provides the same information (in a sense made clear later) about where the PRE is located. If, instead, some observations had lower variances than others, then the low-variance observations would tend to be closer to the PRE. When trying to find the PRE (by estimating its intercept and slope) we would want to put more weight on the low-variance observations. The term ordinary least squares really means equally weighted least squares, and Assumption A6 about the distribution of $u$ therefore relates to the performance of OLS.
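The idea that OLS is "equally weighted" least squares can be sketched in a few lines. The function below is a hypothetical helper, the data are made up, and the inverse-variance weights are one standard choice that the text itself does not specify. With equal weights the function reduces to OLS. With unequal weights the noisy observation pulls the fitted line less.

```python
def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimizing sum of w_i * (y_i - a - b * x_i)**2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Made-up illustrative data; the last observation is assumed to be much noisier.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 9.0]
sigma2 = [1.0, 1.0, 1.0, 9.0]  # hypothetical known disturbance variances

# Equal weights give OLS; the common scale of the weights does not matter.
ols = weighted_least_squares(x, y, [1.0] * len(x))
rescaled = weighted_least_squares(x, y, [5.0] * len(x))

# Inverse-variance weights put more weight on the low-variance observations.
wls = weighted_least_squares(x, y, [1.0 / s for s in sigma2])

print("OLS (intercept, slope):", ols)
print("WLS (intercept, slope):", wls)
```

Under A6 all the variances are equal, so the weights are equal and the weighted estimator is the same as OLS.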
Notice that we are taking the conditional expectation of $Y$,
conditional upon knowing the value of $X$.
Equation E7 is another way to think of the LRM. It says that we
are assuming a linear relationship between the exogenous
variable $X$ and the expected value $E[Y \mid X]$ of the random
variable $Y$. This is an ordinary linear relationship;
there are no random variables left in E7 because $E[Y \mid X]$ is a
population parameter. Taking expectations wipes out the influence of
the random factor $u$.
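A small simulation can illustrate how averaging removes the random factor. Everything in this sketch is assumed for illustration: the intercept and slope values, the normal distribution of the disturbance, and the helper names. For each fixed value of X it averages many draws of Y, and the averages line up on the underlying straight line.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population parameters, chosen only for this illustration.
INTERCEPT, SLOPE = 10.0, -2.0

def draw_y(x):
    """One draw of Y at a given X: linear part plus a mean-zero disturbance."""
    return INTERCEPT + SLOPE * x + random.gauss(0.0, 1.0)

def conditional_mean(x, n_draws=20000):
    """Approximate E[Y | X = x] by averaging many draws of Y at that x."""
    return sum(draw_y(x) for _ in range(n_draws)) / n_draws

means = {x: conditional_mean(x) for x in (1.0, 2.0, 3.0)}

# Rate of change of the averaged Y between X = 1 and X = 3.
rate = (means[3.0] - means[1.0]) / 2.0

print("approximate conditional means:", means)
print("approximate rate of change:", rate)
```

Individual draws of Y scatter around the line, but the averages at each X recover it closely, and the rate of change of those averages is close to the assumed slope.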
It is then perfectly legal and reasonable to take the derivative of E7
with respect to $X$. We interpret the slope coefficient as that
derivative, $dE[Y \mid X]/dX$: the rate of change in $E[Y \mid X]$ as the
exogenous variable $X$ changes. In our minimum wage example, the slope
coefficient would determine the rate at which expected employment
(across provinces) changes as the minimum wage changes.
Notice that the disturbance term $u$ has no effect on this interpretation. In
other words, the level of $Y$ is moved around by the disturbance terms, but
the slope coefficient measures how $E[Y \mid X]$ would change with $X$.
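The derivative step can be written out as a short worked equation. This is a sketch under an assumption: the symbols $\beta_1$ (intercept) and $\beta_2$ (slope) are chosen here for illustration, since the notation used for these parameters is not visible in this section, and E7 is assumed to take the usual two-parameter linear form.

```latex
% Assumed form of E7; \beta_1 and \beta_2 are illustrative symbol names.
E[Y \mid X] = \beta_1 + \beta_2 X
\quad\Longrightarrow\quad
\frac{d\,E[Y \mid X]}{dX} = \beta_2
```

Because $u$ does not appear on the right-hand side of the conditional expectation, it drops out of the derivative, which is why the disturbance term has no bearing on how the slope is interpreted.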