
The conclusion drawn from a regression or any other statistical
procedure should **NEVER** be an **answer**, but
rather a better **QUESTION**. One of the most important
episodes concerning the use of regression to give answers rather than
questions is the history of studying IQ tests.

The **Intelligence Quotient** (IQ) was developed by Binet as a way
to identify students who are falling behind their peers and could benefit
from extra attention. Like many useful tools, it was quickly used
incorrectly. Almost from the start, scientists (typically white male
scientists of European heritage who were themselves misinterpreting and
misapplying Darwin's theory of evolution) began trying to see whether IQ
test scores differed across races and the sexes.

Of course, they were convinced a priori that white males would have the
highest scores. The tools of regression and correlation being developed
at the time helped make the point. We can trace this story through a
series of regressions. While this is *not* historically accurate
because much of the notation, methodology, and rhetoric has changed over
the years, this way of telling the story makes the essential points and
uses our framework exclusively. (References
to other accounts are given at the end of the document).

$$\text{IQ} = b_1 + b_2\,\text{White} + b_3\,\text{Male} + u_1 \qquad (1)$$

where White is a dummy variable indicating the person is white, and
**Male** a dummy variable indicating a man. The error term $u_1$ is the
error associated with the specification (1). It came as no surprise that
the estimates of $b_2$ and $b_3$ were positive and significantly
different from 0. This was taken as support for the hypothesis that men
and white people have greater intellect than others.

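$$\text{IQ} = b_1 + b_2\,\text{White} + b_3\,\text{Male} + b_4\,\text{Educ} + u_2 \qquad (2)$$

where Educ is the number of years of school attended by the person before the test was administered. If (1) is estimated, the effect of Educ is included in $u_1$, but it is not included in $u_2$. Suppose Educ does have an effect on IQ scores, so that it is reasonable to assume $b_4 > 0$. How does excluding this variable affect estimates of $b_2$ and $b_3$? We know that the estimate of $b_2$ from equation (1), written here as $\tilde{b}_2$, would have the property:

$$E[\tilde{b}_2] = b_2 + b_4\,\hat{\delta}_2$$

where $\hat{\delta}_2$ is the estimated coefficient on White in the regression:

$$\text{Educ} = \delta_1 + \delta_2\,\text{White} + \delta_3\,\text{Male} + v$$

Given $b_4 > 0$ and $\hat{\delta}_2 > 0$ (whites went to school longer than blacks),
the estimate of $b_2$ from (1) is biased **upwards**:

$$E[\tilde{b}_2] > b_2$$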

The effect of race is overestimated when education is omitted. Again, the
point *isn't* just that Educ is related to IQ and is excluded from
equation (1). It is that Educ is related to IQ AND it is correlated with
White (and in the past Male as well). Of course, there were reasonable
white male scientists who saw through this issue at the time, so it was
only the ones who wanted to be convinced that were.
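
The omitted-variable argument can be checked numerically. The following is a minimal sketch on purely synthetic data, not a reconstruction of any historical dataset: every number, the variable names, and the helper `ols` are assumptions made for illustration. The data are generated so that the White dummy has no direct effect on the simulated score, yet the regression that omits Educ still produces a positive coefficient on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data; every number below is invented for illustration only.
white = rng.integers(0, 2, size=n).astype(float)
male = rng.integers(0, 2, size=n).astype(float)

# Years of schooling are correlated with the White dummy by construction.
educ = 10.0 + 2.0 * white + rng.normal(0.0, 2.0, size=n)

# Assumed "true" model: the dummies have NO direct effect; only Educ matters.
iq = 70.0 + 0.0 * white + 0.0 * male + 3.0 * educ + rng.normal(0.0, 10.0, size=n)


def ols(y, *regressors):
    """Return OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short = ols(iq, white, male)        # specification (1): Educ omitted
long_ = ols(iq, white, male, educ)  # specification (2): Educ included
aux = ols(educ, white, male)        # auxiliary regression of Educ on the dummies

# In-sample identity behind the bias formula:
# short coefficient = long coefficient + (Educ coefficient) * (auxiliary coefficient)
implied = long_[1] + long_[3] * aux[1]

print("White coefficient, Educ omitted: ", short[1])
print("White coefficient, Educ included:", long_[1])
print("Implied by the bias formula:     ", implied)
```

Under these assumed parameters the coefficient from the short regression should land near 3 × 2 = 6 even though the direct effect was set to zero, while the long regression should return something close to zero. The identity in the last step holds exactly in any sample, up to floating-point error.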

But perhaps instead we should make sure that there aren't other important variables at play that are correlated with the included variables. The most striking element of the education system in much of the U.S. until after 1954 was segregation by race. Blacks attended separate schools that were anything but equal. Teachers in black schools were paid less, trained less, and worked with poorer supplies. Maybe one year of education for a white person provided better training than one year of education for a black person.

In other words, school quality may be omitted in (2) because years of
education captures only quantity and not quality. Consider the population
regression model:

$$\text{IQ} = b_1 + b_2\,\text{White} + b_3\,\text{Male} + b_4\,\text{Educ} + b_5\,\text{PTR} + u_3 \qquad (3)$$

where PTR stands for the pupil-teacher ratio in the school system the person
attended. We might expect $b_5$ to be negative: the more pupils per
teacher, the worse the education and the lower IQ scores are, all else
constant. And PTR was negatively correlated with White because whites went
to schools with more teachers per student (and better books and better
equipment and ...). If $b_5 < 0$ and White is negatively correlated with
PTR, then the race indicator is picking up both the effect of race and the
indirect effect of school quality. This leads to an overestimate of $b_2$
using equation (2).
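
The sign of this bias can be written out with the same omitted-variable formula, now treating PTR as the variable left out of (2). The notation is assumed here for illustration: $b_5$ is the coefficient on PTR, $\tilde{b}_2$ is the estimate of the race coefficient from (2), and $\hat{\gamma}_2$ is the estimated coefficient on White in an auxiliary regression of PTR on White, Male, and Educ.

$$E[\tilde{b}_2] = b_2 + b_5\,\hat{\gamma}_2, \qquad b_5 < 0 \ \text{and} \ \hat{\gamma}_2 < 0 \;\Rightarrow\; b_5\,\hat{\gamma}_2 > 0 \;\Rightarrow\; E[\tilde{b}_2] > b_2$$

Two negative terms multiply to a positive bias, so the direction of the distortion is the same as when Educ was omitted from (1).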

where Smarts is some unmeasurable index of "intelligence". IQ is related to Smarts, but may also be culturally determined, as indicated by other factors in (4.2). (4.1) and (4.2) do not allow a direct link between Smarts and race and sex.

None of this implies the following:

- *There isn't a racial difference in intelligence.* Maybe there is a racial difference. But a definitive answer will not come out of regression analysis. Furthermore, we already know (from work like HW #3) that any racial differences that might exist are swamped by variation within races, so maybe it just isn't an important question.
- *Only white males misuse statistics.* The recent statistics on crime against women, bias against blacks in the U.S. legal system, and other politically correct "facts" are indeed alarming but may be distorted by the same *we know the right answer and the statistics will show it* attitude.
- *If not all the variables in the true model are in the estimated regression then the results are meaningless.* Far from it. The results of a good regression properly interpreted can focus future research in the right direction even if important variables are not available.

- Gould, Stephen Jay. *The Mismeasure of Man*. A highly readable account of the history of intelligence and genetics.
- Crossen, Cynthia. *Tainted Truth: The Manipulation of Fact in America*. (On the misuse and abuse of polls and statistics.)
- Herrnstein, Richard and Charles Murray. *The Bell Curve: Intelligence and Class Structure in American Life*.
- Review of The Bell Curve and Review of Gould's Review of the Bell Curve. Available through the WWW Chance News, which is referenced on the Economics 351 Homepage.
- Card, David and Alan Krueger. "Does School Quality Matter?" *Journal of Political Economy*, 1992.