Thomas A. Mroz, "A Simple, Flexible Estimator for Count and Other Ordered
Discrete Data", Journal of Applied Econometrics, Vol. 27, No. 4, 2012,
pp. 646-665.

There are two data files for this paper. The first is an extract from the
1992 National Health Interview Survey constructed by Mullahy (1998). The
second is a subset of the A.B.S. 1977-78 Australian Health Survey used by
Cameron and Trivedi in their 1998 book "Regression Analysis of Count Data".

All data sets and Stata .do files are ASCII files in DOS format. They are
zipped in the file mroz-data.zip. Unix/Linux users should use "unzip -a".


I. Data used in Mullahy (1998)

The data in the file mullahy_1998.dat are an extract from the 1992 National
Health Interview Survey constructed by John Mullahy for his paper

  Mullahy J. 1998. "Much Ado About Two: Reconsidering Retransformation and
  the Two-Part Model in Health Econometrics." Journal of Health Economics
  17: 247-281. DOI:10.1016/S0167-6296(98)00030-7

These data are space delimited and can be read into Stata with the command

  infile age educ poverty famsize docvisits12 male white married working
         school xcellent verygood good fair
         using mullahy_1998.dat ;
  /* this file's size is approximately 5,431Kb */

See the Stata do file Read_Mullahy_data.do, which reads in the data set and
estimates a simple Poisson model for the determinants of doctor visits over
the past 12 months. The variable names are mostly self-explanatory, and
there is one excluded health status category.

Summary statistics for the Mullahy data set:

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |     36111    41.76287     10.9192         25         64
        educ |     36111    12.96749    2.958055          0         18
     poverty |     36111    1.112736    .3162739          1          2
     famsize |     36111     2.88912     1.51095          1         15
 docvisits12 |     36111    4.910166    12.80193          0        370
-------------+--------------------------------------------------------
        male |     36111    .3780842    .4849155          0          1
       white |     36111    .8190025    .3850214          0          1
     married |     36111    .6808452    .4661556          0          1
     working |     36111    .7057683    .4557029          0          1
      school |     36111    .0212401    .1441856          0          1
-------------+--------------------------------------------------------
    xcellent |     36111    .3418349    .4743311          0          1
    verygood |     36111    .3004625    .4584655          0          1
        good |     36111    .2414223    .4279517          0          1
        fair |     36111    .0846556    .2783724          0          1


II. Data used in Cameron and Trivedi (1998)

The data set racd3.asc contains data used in Chapter 3 of

  Cameron AC, Trivedi PK. 1998. Regression Analysis of Count Data.
  Cambridge University Press: Cambridge

This data set is not a representative sample of Australians: it oversamples
the young and the old, so use of health services may be overstated. The
original sample of 40,650 individuals from the A.B.S. 1977-78 Australian
Health Survey is representative, but the sample used here is restricted to
single people over 18 years of age.

See

  A.C. Cameron, P.K. Trivedi, F. Milne and J. Piggott (1988), "A
  Microeconometric Model of the Demand for Health Care and Health Insurance
  in Australia", Review of Economic Studies, Vol. 55, pp. 85-106.

for more detailed discussion of the data than that given in the 1998 book.
Additional information about these data can be found at
http://www.econ.ucdavis.edu/faculty/cameron/racd/racd3.1st
(accessed August 3, 2010).

These data are space delimited and can be read into Stata with the command

  infile sex age agesq income levyplus freepoor freerepa illness actdays
         hscore chcond1 chcond2 doctorco nondocco hospadmi hospdays
         medicine prescrib nonpresc constant
         using racd3.asc ;
  /* this file's size is approximately 416Kb */

The file Read_Cameron_Trivedi_data.do reads in the data and estimates a
simple Poisson model of the number of doctor visits.

Summary statistics for this data set:

. summarize ;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5190    .5206166    .4996229          0          1
         age |      5190    .4063854    .2047818        .19        .72
       agesq |      5190    .2070766    .1856365      .0361      .5184
      income |      5190    .5831599    .3689067          0        1.5
    levyplus |      5190    .4427746    .4967623          0          1
-------------+--------------------------------------------------------
    freepoor |      5190    .0427746     .202368          0          1
    freerepa |      5190    .2102119    .4074983          0          1
     illness |      5190    1.431985    1.384152          0          5
     actdays |      5190    .8618497    2.887628          0         14
      hscore |      5190    1.217534    2.124266          0         12
-------------+--------------------------------------------------------
     chcond1 |      5190    .4030829    .4905644          0          1
     chcond2 |      5190    .1165703    .3209385          0          1
    doctorco |      5190    .3017341    .7981338          0          9
    nondocco |      5190    .2146435    .9652756          0         11
    hospadmi |      5190    .1736031    .5075236          0          5
-------------+--------------------------------------------------------
    hospdays |      5190    1.333719    6.120081          0         80
    medicine |      5190    1.218304    1.556643          0          8
    prescrib |      5190    .8626204    1.415375          0          8
    nonpresc |      5190     .355684     .712389          0          8
    constant |      5190           1           0          1          1


The web appendix for this paper contains four sections.

Web Appendix Table of Contents:

  Web Appendix Section 1: Unconditional distributions for the two data
      generating processes used in the Monte Carlo experiments

  Web Appendix Section 2: Extending the Estimation Approach To Right
      Censored and Left Truncated Data

  Web Appendix Section 3: Model Selection Frequencies for the "Hazard"
      Models in the Monte Carlo Experiments

  Web Appendix Section 4: Additional estimated effects in the Monte Carlo
      Studies and their empirical mean square errors and mean absolute
      deviations
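
Note for readers who do not use Stata: the two space-delimited files
described in Sections I and II can also be read with a few lines of Python.
The sketch below is an illustration only and is not part of the archive. The
function names read_free_format and column_mean are hypothetical, the two
sample records are made up, and treating a lone "." as a missing value is an
assumption borrowed from Stata's convention. Values are read sequentially in
groups the size of the variable list, in the manner of a free-format infile,
so the DOS line endings do not matter.

```python
import io

# Variable order copied from the two infile commands in Sections I and II.
MULLAHY_COLUMNS = [
    "age", "educ", "poverty", "famsize", "docvisits12",
    "male", "white", "married", "working", "school",
    "xcellent", "verygood", "good", "fair",
]
RACD3_COLUMNS = [
    "sex", "age", "agesq", "income", "levyplus",
    "freepoor", "freerepa", "illness", "actdays", "hscore",
    "chcond1", "chcond2", "doctorco", "nondocco", "hospadmi",
    "hospdays", "medicine", "prescrib", "nonpresc", "constant",
]


def read_free_format(stream, columns):
    """Read whitespace-delimited numbers into a list of dicts (hypothetical helper).

    Tokens are consumed in order, len(columns) at a time, regardless of where
    the line breaks fall. str.split() treats "\r\n" as whitespace, so files
    with DOS line endings need no conversion.
    """
    tokens = stream.read().split()
    width = len(columns)
    if len(tokens) % width != 0:
        raise ValueError(
            f"{len(tokens)} values cannot be split into records of {width}"
        )
    rows = []
    for start in range(0, len(tokens), width):
        values = [
            # Assumption: a lone "." marks a missing value, as in Stata.
            float("nan") if tok == "." else float(tok)
            for tok in tokens[start:start + width]
        ]
        rows.append(dict(zip(columns, values)))
    return rows


def column_mean(rows, name):
    """Arithmetic mean of one variable across all records (hypothetical helper)."""
    return sum(row[name] for row in rows) / len(rows)


# Two made-up records in the Mullahy layout, with DOS line endings.
sample = (
    "30 12 1 3 2 1 1 0 1 0 1 0 0 0\r\n"
    "45 16 1 2 0 0 1 1 1 0 0 1 0 0\r\n"
)
rows = read_free_format(io.StringIO(sample), MULLAHY_COLUMNS)
print(len(rows), column_mean(rows, "docvisits12"))
```

To read the real files, replace the StringIO object with
open("mullahy_1998.dat") or open("racd3.asc") and pass the matching column
list. If the files are as described above, the record counts should agree
with the Obs column of the summary tables (36111 and 5190).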