John Mullahy, "Heterogeneity, Excess Zeros, and the Structure of Count Data Models", Journal of Applied Econometrics, Vol. 12, No. 3, 1997, pp. 337-350.

The data are in column-separated, multiple-lines-per-record, ASCII format, arrayed as follows:

1. SEX byte
2. AGE float
3. INCOME float
4. ILLNESS byte
5. ACTDAYS byte
6. HSCORE byte
7. DOCTORCON byte
8. LEVYPLUS byte
9. FREEPOOR byte
10. FREEREPAT byte
11. CHCOND1 byte
12. CHCOND2 byte
13. AGESQ float

The dependent variable is the number of doctor visits (consultations) during the recall period (DOCTORCON). The covariates are SEX, AGE, AGE SQUARED (AGESQ), INCOME, three insurance status measures (LEVYPLUS, FREEPOOR, and FREEREPAT), ILLNESS (recent illness), ACTDAYS (number of reduced activity days), HSCORE (general health questionnaire score), and two measures of chronic conditions (CHCOND1 and CHCOND2).

The sample consists of 5,190 observations from the 1977-78 Australian Health Survey and contains information on health service utilization and on covariates describing factors that affect health care utilization propensities. The sample, whose distribution in its entirety is limited by agreement with its Australian source, was kindly provided to me by Prof. P.K. Trivedi. The dataset posted here contains only the variables used in the analysis. A detailed description is provided in:

Cameron, A.C. and P.K. Trivedi. 1986. "Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests." Journal of Applied Econometrics 1: 29-53.

The interested analyst might also wish to consult the dataset and associated description pertaining to the paper by Cameron and Johansson in the same issue of the JAE.

The data file mullahy.dat, which is in DOS format, is zipped into the ZIP archive jm-data.zip. The zipped size of the dataset is approximately 28k; unzipped it is about 740k.
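Because each record spans several physical lines, one way to read mullahy.dat (after unzipping it from jm-data.zip) is to ignore line breaks entirely and regroup the token stream into 13-field records in the order listed above. The sketch below assumes that fields are separated by whitespace; the description says only "column-separated", so check a few records by hand before relying on it. The function names `parse_records`, `split_outcome` and `load` are hypothetical helpers, not part of the distributed files.

```python
from pathlib import Path

# Field names in file order, as given in the variable list.
FIELDS = ["SEX", "AGE", "INCOME", "ILLNESS", "ACTDAYS", "HSCORE",
          "DOCTORCON", "LEVYPLUS", "FREEPOOR", "FREEREPAT",
          "CHCOND1", "CHCOND2", "AGESQ"]

# Fields listed as "float"; all others are listed as "byte" (small integers).
FLOAT_FIELDS = {"AGE", "INCOME", "AGESQ"}

# Covariates named in the description; DOCTORCON is the dependent variable.
COVARIATES = ["SEX", "AGE", "AGESQ", "INCOME", "LEVYPLUS", "FREEPOOR",
              "FREEREPAT", "ILLNESS", "ACTDAYS", "HSCORE", "CHCOND1", "CHCOND2"]


def parse_records(text):
    """Group whitespace-separated tokens into 13-field records.

    ASSUMPTION: fields are whitespace-separated, so a record may span
    several physical lines. str.split() with no argument also discards
    the '\r' characters left by DOS line endings.
    """
    tokens = text.split()
    n = len(FIELDS)
    if len(tokens) % n != 0:
        raise ValueError(f"{len(tokens)} tokens is not a multiple of {n}")
    records = []
    for start in range(0, len(tokens), n):
        row = {}
        for name, tok in zip(FIELDS, tokens[start:start + n]):
            # int(float(...)) accepts byte fields written as "1" or "1.0".
            row[name] = float(tok) if name in FLOAT_FIELDS else int(float(tok))
        records.append(row)
    return records


def split_outcome(records):
    """Return the DOCTORCON counts and a row-wise covariate matrix."""
    y = [r["DOCTORCON"] for r in records]
    X = [[float(r[c]) for c in COVARIATES] for r in records]
    return y, X


def load(path="mullahy.dat"):
    """Read and parse the unzipped data file."""
    return parse_records(Path(path).read_text())
```

If the whitespace assumption holds, `len(load("mullahy.dat"))` should match the 5,190 observations stated above; a `ValueError` or a different count suggests the file uses fixed column positions instead, in which case the token-based grouping would need to be replaced by slicing on those positions.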