Brent Kreider and John V. Pepper, "Inferring Disability Status from
Corrupt Data", Journal of Applied Econometrics, Vol. 23, No. 3, 2008,
pp. 329-349.

The data used in this paper come from the Health and Retirement Study
(HRS) and are organized as follows:

1. "1_Dstats.raw" (ascii) contains the variables used to generate the
descriptive statistics presented in Table I. "1_Dstats.dta" is the
corresponding Stata file. There are 9,824 observations (rows) and 18
variables (columns):

  V1                CASE ID: WAVE 1
  limited           1 if limitation in kind/amount of work, 0 otherwise
  cantwork          1 if can't work, 0 otherwise
  age               respondent's age
  female            1 if female, 0 otherwise
  nonwhite          1 if nonwhite, 0 otherwise
  working           1 if working for pay, 0 otherwise
  apply_DI          1 if applied for SSDI/SSI, 0 otherwise
  grant_DI_still    1 if receiving DI, 0 otherwise
  grant_new         1 if receive any disability benefits
  health_general    1 if general health fair or poor
  health_emotional  1 if emotional health fair or poor
  died              1 if died in 1995 or earlier
  BMI_not           1 if BMI out of ideal range
  ADL_level_0       no functional limitations
  ADL_level_6       very diff/can't do a basic function
  C_number          number of reported conditions
  C_serious_obj     diabetes, cancer, lung, heart, stroke, psych

2. "2_Verification.raw" (ascii) contains the variables used to generate
the results in Tables II-IV. "2_Verification.dta" is the corresponding
Stata file. There are 9,824 observations (rows) and 24 variables
(columns).
The y_* variables indicate whether the disability self-reports,
labeled X in the paper, are "verified" to be accurate (y_*=1 if
verified, y_*=0 otherwise) under the various verification strategies
presented in the paper:

  V1        "CASE ID: WAVE 1"
  limited   "X in paper: 1 if reported limitation in kind or amount of work"
  cantwork  "X in paper: 1 if reported can't work"
  age_sort  "data sorted by age"
  dis_sort  "data sorted by ordered probit fitted values"
  y_1       "full verification of workers, disability beneficiaries, and
            those reporting no limitation" (not reported in paper)

  For Table II:
  y_2       "verify X: Model I"
  y_3       "verify X: Model II"

  For Table III, Model I:
  y_4       "verify X: y_2=1 & [(X=1&ADL>=0) or (X=0&ADL<=6)]"
  y_5       "verify X: y_2=1 & [(X=1&ADL>=1) or (X=0&ADL<=5)]"
  y_6       "verify X: y_2=1 & [(X=1&ADL>=2) or (X=0&ADL<=4)]"
  y_7       "verify X: y_2=1 & [(X=1&ADL>=3) or (X=0&ADL<=3)]"
  y_8       "verify X: y_2=1 & [(X=1&ADL>=4) or (X=0&ADL<=2)]"
  y_9       "verify X: y_2=1 & [(X=1&ADL>=5) or (X=0&ADL<=1)]"
  y_10      "verify X: y_2=1 & [(X=1&ADL>=6) or (X=0&ADL<=0)]"

  For Table III, Model II:
  y_11      "verify X: y_3=1 & [(X=1&ADL>=0) or (X=0&ADL<=6)]"
  y_12      "verify X: y_3=1 & [(X=1&ADL>=1) or (X=0&ADL<=5)]"
  y_13      "verify X: y_3=1 & [(X=1&ADL>=2) or (X=0&ADL<=4)]"
  y_14      "verify X: y_3=1 & [(X=1&ADL>=3) or (X=0&ADL<=3)]"
  y_15      "verify X: y_3=1 & [(X=1&ADL>=4) or (X=0&ADL<=2)]"
  y_16      "verify X: y_3=1 & [(X=1&ADL>=5) or (X=0&ADL<=1)]"
  y_17      "verify X: y_3=1 & [(X=1&ADL>=6) or (X=0&ADL<=0)]"

  For Table IV:
  yy_1      "verify X: Model I"
  yy_2      "verify X: Model II"

3. "3A_RUR_N_1082.raw" and "3A_RUR_N_233.raw" contain the variables
used to generate the results in Table V, Columns A and B, respectively.
"3A_RUR_N_233.dta" and "3B_RUR_N_233.dta" are the corresponding Stata
files. There are 1082 observations (rows) in the first case and 233
observations (rows) in the second case.
In both cases, there are 6 variables (columns):

  V1                     "CASE ID: WAVE 1"
  cantwork               "X in paper: 1 if report can't work"
  grant_still_receiving  "1 if still receiving benefits"
  age_sort               "data sorted by age"
  dis_sort               "data sorted by ordered probit fitted values"
  yy_2                   "verify cantwork: Model I"

The four raw files are ASCII files in DOS format. They are all zipped
in the file kp-raw.zip. The four Stata files are all zipped in the file
kp-dta.zip.
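
The labels of y_4 through y_17 in "2_Verification" follow a single
threshold pattern, which can be sketched in Python as below. This is
only an illustration of how to read those labels, not the authors'
code. The function names verify_at_threshold and threshold_for, the
argument names, and the treatment of ADL as an integer score from 0 to
6 are assumptions made for this sketch; the files described above do
not necessarily contain such a score as a single column.

```python
def verify_at_threshold(base_verified, x, adl, c):
    """Return 1 if self-report x counts as verified at ADL threshold c.

    Mirrors the label pattern of y_4..y_17:
        base=1 & [(X=1 & ADL>=c) or (X=0 & ADL<=6-c)]
    base_verified: y_2 (Model I) or y_3 (Model II), coded 0/1
    x:             disability self-report (X in the paper), coded 0/1
    adl:           ADL score, assumed here to be an integer from 0 to 6
    c:             threshold, 0 (weakest requirement) to 6 (strictest)
    """
    if c < 0 or c > 6:
        raise ValueError("threshold c must be between 0 and 6")
    consistent = (x == 1 and adl >= c) or (x == 0 and adl <= 6 - c)
    return int(base_verified == 1 and consistent)


def threshold_for(k):
    """Map the index k of y_k to (base variable name, threshold c)."""
    if 4 <= k <= 10:
        return ("y_2", k - 4)    # Table III, Model I
    if 11 <= k <= 17:
        return ("y_3", k - 11)   # Table III, Model II
    raise ValueError("y_k follows the threshold pattern only for k = 4..17")


if __name__ == "__main__":
    # Hypothetical respondent: verified under the base model, reports a
    # disability (X=1), and has an ADL score of 3.
    flags = [verify_at_threshold(1, 1, 3, c) for c in range(7)]
    print(flags)  # [1, 1, 1, 1, 0, 0, 0]
```

Under these assumptions, raising the threshold c can only turn a
verified report into an unverified one, so each of the two sequences
y_4..y_10 and y_11..y_17 is weakly decreasing for a given respondent.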