## A. Introduction

A proper understanding of how violations of Assumption A4 (or A4*) in the LRM affect regression results and how they DO NOT is very important, particularly if you want to make sense of most economic and social research.

The conclusion drawn from a regression or any other statistical procedure should NEVER be an answer, but rather a better QUESTION. One of the most important episodes concerning the use of regression to give answers rather than questions is the history of studying IQ tests.

The Intelligence Quotient (IQ) was developed by Binet as a way to identify students who were falling behind their peers and could benefit from extra attention. Like many useful tools, it was quickly misused. Almost from the start, scientists (typically white male scientists of European heritage who were themselves misinterpreting and misapplying Darwin's theory of evolution) began trying to see whether IQ test scores differed across races and the sexes.

Of course, they were convinced a priori that white males would have the highest scores. The tools of regression and correlation being developed at the time helped make the point. We can trace this story through a series of regressions. This telling is not historically accurate, because much of the notation, methodology, and rhetoric has changed over the years, but it makes the essential points and uses our framework exclusively. (References to other accounts are given at the end of the document.)

## B. The First Wave

One of the biggest coups for "proving" this point came during World War I, when American psychologists convinced the U.S. Army to administer IQ tests to all recruits and to record control variables such as race. We can think of the first results as coming from a regression of the form

$$\text{IQ} = \beta_1 + \beta_2\,\text{White} + \beta_3\,\text{Male} + u_1 \tag{1}$$

where White is a dummy variable indicating the person is white, and Male is a dummy variable indicating a man. The error term $u_1$ is the error associated with specification (1). It came as no surprise that the estimates of $\beta_2$ and $\beta_3$ were positive and significantly different from 0. This appeared to support the hypothesis that men and white people have greater intellect than others.

## C. The Second Wave

But, of course, the ability to perform well on an IQ test depends upon many things, not simply (and maybe not at all!) the genetic factors White and Male. Consider the effect of race. In the early 1900s, most American blacks lived in the South and had access to almost no education. Equation (1) does not control for this fact: differences in education levels are captured in the error term $u_1$. Suppose we expand (1) to be of the form

$$\text{IQ} = \beta_1 + \beta_2\,\text{White} + \beta_3\,\text{Male} + \beta_4\,\text{Educ} + u_2 \tag{2}$$

where Educ is the number of years of school attended by the person before the test was administered. If (1) is estimated, the effect of Educ is included in $u_1$; in (2) it is not included in $u_2$. Suppose Educ does have an effect on IQ, even though the researchers tried to make the test a measure of only innate ability. (At the least, the IQ test had to be given orally if the person couldn't read, and reading is certainly not innate.)

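So it is reasonable to assume that $\beta_4 > 0$. How does excluding this variable affect estimates of $\beta_2$ and $\beta_3$? We know that the estimate of $\beta_2$ from equation (1), written $\tilde\beta_2$ here, would have the property (conditional on the sample values of White and Male):

$$E[\tilde\beta_2] = \beta_2 + \beta_4\,\hat\delta_2$$

where $\hat\delta_2$ is the estimated coefficient on White in the regression:

$$\text{Educ} = \delta_1 + \delta_2\,\text{White} + \delta_3\,\text{Male} + v$$

Given $\beta_4 > 0$ and $\hat\delta_2 > 0$ (whites went to school longer than blacks), the estimate of $\beta_2$ from (1) is biased upwards:

$$E[\tilde\beta_2] - \beta_2 = \beta_4\,\hat\delta_2 > 0$$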

The effect of race is overestimated when education is omitted. Again, the point isn't just that Educ is related to IQ and is excluded from equation (1). It is that Educ is related to IQ AND it is correlated with White (and in the past Male as well). Of course, there were reasonable white male scientists who saw through this issue at the time; only the ones who wanted to be convinced were.
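The upward bias described above can be checked with a small simulation. The sketch below is only an illustration: every number in it (sample size, coefficients, noise levels) is invented, and the data are generated so that White and Male have no direct effect on the score at all. The helper `ols` is a hypothetical convenience function, not part of the course materials. The script fits the regression without Educ, the regression with Educ, and the auxiliary regression of Educ on White and Male, then compares the coefficients on White.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process; all numbers are invented for illustration.
white = rng.integers(0, 2, size=n).astype(float)
male = rng.integers(0, 2, size=n).astype(float)

# Assumption: schooling is correlated with White (3 extra years in this sketch).
educ = 4.0 + 3.0 * white + 0.5 * male + rng.normal(0.0, 2.0, size=n)

# Assumption: only schooling moves the score; White and Male have zero direct effect.
iq = 80.0 + 0.0 * white + 0.0 * male + 2.0 * educ + rng.normal(0.0, 10.0, size=n)


def ols(y, *regressors):
    """Return OLS coefficients with the intercept first."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short = ols(iq, white, male)        # Educ omitted, as in equation (1)
long_ = ols(iq, white, male, educ)  # Educ included, as in equation (2)
aux = ols(educ, white, male)        # auxiliary regression of Educ on White and Male

# In-sample identity: the coefficient on White with Educ omitted equals the
# coefficient with Educ included plus (Educ coefficient) x (auxiliary White coefficient).
implied = long_[1] + long_[3] * aux[1]

print(f"White coefficient, Educ omitted:  {short[1]:.3f}")
print(f"White coefficient, Educ included: {long_[1]:.3f}")
print(f"Implied by the bias formula:      {implied:.3f}")
```

With these made-up inputs, the coefficient on White is large and positive when Educ is left out and close to zero once Educ is included, even though race plays no direct role in how the scores were generated. The last two lines show that the gap is exactly the product the bias formula predicts.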

## D. The Third Wave

If you were to run equation (2) using data from the 1920s in the U.S., you would still get a positive and significant coefficient on the White variable even after controlling for education. Perhaps we should conclude that there is still a genetic racial difference in IQ after all.

But perhaps instead we should make sure that there aren't other important variables at play that are correlated with the included variables. The most striking element of the education system in much of the U.S. until after 1954 was segregation by race. Blacks attended separate schools that were anything but equal. Teachers in black schools were paid less, trained less, and worked with poorer supplies. Maybe one year of education for a white person provided better training than one year of education for a black person.

In other words, school quality may be omitted in (2) because years of education captures only quantity and not quality. Consider the population regression model:

$$\text{IQ} = \beta_1 + \beta_2\,\text{White} + \beta_3\,\text{Male} + \beta_4\,\text{Educ} + \beta_5\,\text{PTR} + u_3 \tag{3}$$

where PTR stands for the pupil-teacher ratio in the school system the person attended. We might expect $\beta_5$ to be negative: the more pupils per teacher, the poorer the education and the lower the IQ scores, all else constant. And PTR was negatively correlated with White because whites went to schools with more teachers per student (and better books and better equipment and ...). If $\beta_5 < 0$ and White is negatively correlated with PTR, then the race indicator is picking up both the effect of race and the indirect effect of school quality. This leads to an overestimate of $\beta_2$, the coefficient on White, using equation (2).
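The argument in this section follows the same omitted-variable algebra as before, one level up. The sketch below writes it out under assumed notation that is not in the original: $\beta_2$ and $\beta_5$ are the coefficients on White and PTR in the model that includes PTR, $\tilde\beta_2$ is the estimate of the White coefficient when PTR is left out, and $\hat\pi_2$ is the estimated coefficient on White in an auxiliary regression of PTR on White, Male, and Educ. The expectation is conditional on the sample values of the included regressors.

```latex
% Assumed notation: the \pi_k and the error w belong to the auxiliary regression only.
\begin{align*}
\text{PTR} &= \pi_1 + \pi_2\,\text{White} + \pi_3\,\text{Male} + \pi_4\,\text{Educ} + w \\
E[\tilde\beta_2] &= \beta_2 + \beta_5\,\hat\pi_2 \\
\beta_5 < 0 \ \text{and}\ \hat\pi_2 < 0 \;&\Longrightarrow\; \beta_5\,\hat\pi_2 > 0
  \;\Longrightarrow\; E[\tilde\beta_2] > \beta_2
\end{align*}
```

Note that the relevant relationship between White and PTR is a partial one, holding Male and Educ fixed, because those variables are already in the regression. Two negative terms multiply to a positive bias, which is why the White coefficient is overstated again.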

## Meet the New Bias: Same as the Old Bias

Finally, one could bring the story up to the 1970s or so, by introducing the fact that IQ tests themselves are a biased measure of intelligence, because they are based on "white" culture. We could then think that the true (and politically correct) model looks something like:

where SMARTS is some unmeasurable index of "intelligence". IQ is related to SMARTS, but may also be culturally determined, as indicated by the other factors in (4.2). Equations (4.1) and (4.2) do not allow a direct link between SMARTS and race or sex.

## E. Summary

The point of this story IS NOT:
1. There isn't a racial difference in intelligence. Maybe there is a racial difference. But a definitive answer will not come out of regression analysis. Furthermore, we already know (from work like HW #3) that any racial differences that might exist are swamped by variation within races so maybe it just isn't an important question.
2. Only white males misuse statistics. The recent statistics on crime against women, bias against blacks in the U.S. legal system, and other politically correct "facts" are indeed alarming but may be distorted by the same "we know the right answer and the statistics will show it" attitude.
3. If not all the variables in the true model are in the estimated regression, then the results are meaningless. Far from it. The results of a good regression, properly interpreted, can focus future research in the right direction even if important variables are not available.

## F. References

• Gould, Stephen Jay. The Mismeasure of Man. A highly readable account of the history of intelligence and genetics.
• Crossen, Cynthia. Tainted Truth: the Manipulation of Fact in America. (On the misuse and abuse of polls and statistics.)
• Herrnstein, Richard and Charles Murray. The Bell Curve: Intelligence and Class Structure in American Life.
• Review of The Bell Curve and Review of Gould's Review of the Bell Curve. Available through the WWW Chance News which is referenced on the Economics 351 Homepage.
• Card, David and Alan Krueger. "Does School Quality Matter?" Journal of Political Economy, 1992.
