Brent Kreider and John V. Pepper, "Inferring Disability Status from
Corrupt Data", Journal of Applied Econometrics, Vol. 23, No. 3, 2008,
pp. 329-349.
The data used in this paper come from the Health and Retirement Study
(HRS) and are organized as follows:
1. "1_Dstats.raw" (ascii) contains the variables used to generate the
descriptive statistics presented in Table I. "1_Dstats.dta" is the
corresponding Stata file. There are 9,824 observations (rows) and 18
variables (columns):
   V1                CASE ID: WAVE 1
   limited           1 if limitation in kind/amount of work, 0 otherwise
   cantwork          1 if can't work, 0 otherwise
   age               respondent's age
   female            1 if female, 0 otherwise
   nonwhite          1 if nonwhite, 0 otherwise
   working           1 if working for pay, 0 otherwise
   apply_DI          1 if applied for SSDI/SSI, 0 otherwise
   grant_DI_still    1 if receiving DI, 0 otherwise
   grant_new         1 if receive any disability benefits
   health_general    1 if general health fair or poor
   health_emotional  1 if emotional health fair or poor
   died              1 if died in 1995 or earlier
   BMI_not           1 if BMI out of ideal range
   ADL_level_0       no functional limitations
   ADL_level_6       very diff/can't do a basic function
   C_number          number of reported conditions
   C_serious_obj     diabetes, cancer, lung, heart, stroke, psych
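
As an illustration of how the column list above maps onto the raw file, here is a minimal Python sketch that attaches the 18 variable names to each row. It assumes, without confirmation from this readme, that "1_Dstats.raw" is whitespace-delimited, has no header row, stores the columns in the order listed, and contains only numeric values. The function name read_raw is hypothetical, and the sample row is synthetic, not taken from the HRS data.

```python
import io

# Variable names in the order listed above for 1_Dstats.raw (18 columns).
DSTATS_COLUMNS = [
    "V1", "limited", "cantwork", "age", "female", "nonwhite", "working",
    "apply_DI", "grant_DI_still", "grant_new", "health_general",
    "health_emotional", "died", "BMI_not", "ADL_level_0", "ADL_level_6",
    "C_number", "C_serious_obj",
]


def read_raw(source, columns):
    """Parse whitespace-delimited numeric rows into a list of dicts.

    ASSUMPTION: no header row, one observation per line, all fields numeric.
    `source` is any iterable of text lines (an open file, io.StringIO, ...).
    """
    rows = []
    for line_no, line in enumerate(source, start=1):
        fields = line.split()  # split() also discards a trailing "\r" (DOS format)
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, (float(f) for f in fields))))
    return rows


# Synthetic one-row example with a DOS line ending (not real data).
sample = io.StringIO("1 1 0 55 1 0 1 0 0 0 1 0 0 1 0 0 2 1\r\n")
rows = read_raw(sample, DSTATS_COLUMNS)
print(rows[0]["age"], rows[0]["C_number"])
```

With the real file, the same function could be called as read_raw(open("1_Dstats.raw"), DSTATS_COLUMNS); the result should then have 9,824 rows if the layout assumptions hold.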
2. "2_Verification.raw" (ascii) contains the variables used to generate
the results in Tables II-IV. "2_Verification.dta" is the corresponding
Stata file. There are 9,824 observations (rows) and 24 variables
(columns). The y_* variables indicate whether the disability self-
reports, labeled X in the paper, are "verified" to be accurate (y_*=1
if verified, y_*=0 otherwise) under the various verification strategies
presented in the paper:
   V1        "CASE ID: WAVE 1"
   limited   "X in paper: 1 if reported limitation in kind or amount
             of work"
   cantwork  "X in paper: 1 if reported can't work"
   age_sort  "data sorted by age"
   dis_sort  "data sorted by ordered probit fitted values"
   y_1       "full verification of workers, disability beneficiaries,
             and those reporting no limitation" (not reported in paper)

   For Table II:
   y_2       "verify X: Model I"
   y_3       "verify X: Model II"

   For Table III, Model I:
   y_4       "verify X: y_2=1 & [(X=1&ADL>=0) or (X=0&ADL<=6)]"
   y_5       "verify X: y_2=1 & [(X=1&ADL>=1) or (X=0&ADL<=5)]"
   y_6       "verify X: y_2=1 & [(X=1&ADL>=2) or (X=0&ADL<=4)]"
   y_7       "verify X: y_2=1 & [(X=1&ADL>=3) or (X=0&ADL<=3)]"
   y_8       "verify X: y_2=1 & [(X=1&ADL>=4) or (X=0&ADL<=2)]"
   y_9       "verify X: y_2=1 & [(X=1&ADL>=5) or (X=0&ADL<=1)]"
   y_10      "verify X: y_2=1 & [(X=1&ADL>=6) or (X=0&ADL<=0)]"

   For Table III, Model II:
   y_11      "verify X: y_3=1 & [(X=1&ADL>=0) or (X=0&ADL<=6)]"
   y_12      "verify X: y_3=1 & [(X=1&ADL>=1) or (X=0&ADL<=5)]"
   y_13      "verify X: y_3=1 & [(X=1&ADL>=2) or (X=0&ADL<=4)]"
   y_14      "verify X: y_3=1 & [(X=1&ADL>=3) or (X=0&ADL<=3)]"
   y_15      "verify X: y_3=1 & [(X=1&ADL>=4) or (X=0&ADL<=2)]"
   y_16      "verify X: y_3=1 & [(X=1&ADL>=5) or (X=0&ADL<=1)]"
   y_17      "verify X: y_3=1 & [(X=1&ADL>=6) or (X=0&ADL<=0)]"

   For Table IV:
   yy_1      "verify X: Model I"
   yy_2      "verify X: Model II"
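
The Table III labels all follow one pattern: a threshold j runs from 0 (y_4, y_11) to 6 (y_10, y_17), and a report is verified when the base indicator equals 1 and either X=1 with ADL>=j or X=0 with ADL<=6-j. The Python sketch below restates that rule. It assumes ADL is an integer level from 0 to 6, which this file does not itself contain as a variable; the function name verify_with_adl and the example values are hypothetical.

```python
def verify_with_adl(y_base, x, adl, j):
    """Apply the Table III rule: y_base=1 & [(X=1 & ADL>=j) or (X=0 & ADL<=6-j)].

    y_base : 0/1 base verification indicator (y_2 for Model I, y_3 for Model II)
    x      : 0/1 disability self-report (X in the paper)
    adl    : ASSUMED integer ADL level, 0 (no limitations) to 6
    j      : threshold, 0..6 (j = 0 gives y_4 / y_11, j = 6 gives y_10 / y_17)
    """
    if not 0 <= j <= 6:
        raise ValueError("j must be between 0 and 6")
    consistent = (x == 1 and adl >= j) or (x == 0 and adl <= 6 - j)
    return int(y_base == 1 and consistent)


# Hypothetical respondent: base-verified, reports a limitation, ADL level 3.
for j in range(7):
    print(j, verify_with_adl(y_base=1, x=1, adl=3, j=j))
```

Under this reading, j = 0 imposes no extra restriction when ADL lies in 0..6, so y_4 would coincide with y_2 and y_11 with y_3, and the verified set shrinks as j increases.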
3. "3A_RUR_N_1082.raw" and "3B_RUR_N_233.raw" contain the variables
used to generate the results in Table V, Columns A and B, respectively.
"3A_RUR_N_1082.dta" and "3B_RUR_N_233.dta" are the corresponding Stata
files. There are 1,082 observations (rows) in the first file and 233
observations (rows) in the second. Both files contain 6 variables
(columns):
   V1                     "CASE ID: WAVE 1"
   cantwork               "X in paper: 1 if reported can't work"
   grant_still_receiving  "1 if still receiving benefits"
   age_sort               "data sorted by age"
   dis_sort               "data sorted by ordered probit fitted values"
   yy_2                   "verify cantwork: Model I"

The four raw files are ASCII files in DOS format. They are all zipped in the
file kp-raw.zip. The four Stata files are all zipped in the file kp-dta.zip.
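
Because the raw files are in DOS format, their lines end in CR+LF. The Python sketch below shows one way to read a file directly out of kp-raw.zip and normalize the line endings. It assumes the member names inside the archive match the file names given above, which this readme does not state explicitly; the function name open_raw_member is hypothetical, and the demonstration uses a small in-memory archive rather than the real one.

```python
import io
import zipfile


def open_raw_member(zip_source, member):
    """Return a text stream for one archive member, with CR+LF turned into LF.

    zip_source : path to the zip file or a binary file-like object
    member     : ASSUMED member name, e.g. "1_Dstats.raw"
    """
    with zipfile.ZipFile(zip_source) as archive:
        data = archive.read(member)  # raw bytes of the member
    text = data.decode("ascii")      # the readme describes the files as ASCII
    return io.StringIO(text.replace("\r\n", "\n"))


# Demonstration on a synthetic in-memory archive (not the real kp-raw.zip).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("1_Dstats.raw", b"1 2\r\n3 4\r\n")
buffer.seek(0)

stream = open_raw_member(buffer, "1_Dstats.raw")
lines = stream.read().splitlines()
print(lines)
```

With the real archive, the call would be open_raw_member("kp-raw.zip", "1_Dstats.raw"), provided the member name assumption holds.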