Brent Kreider and John V. Pepper, "Inferring Disability Status from 
Corrupt Data", Journal of Applied Econometrics, Vol. 23, No. 3, 2008, 
pp. 329-349.

The data used in this paper come from the Health and Retirement Study
(HRS) and are organized as follows:  

1. "1_Dstats.raw" (ascii) contains the variables used to generate the
descriptive statistics presented in Table I.  "1_Dstats.dta" is the
corresponding Stata file.  There are 9,824 observations (rows) and 18
variables (columns):  

V1               CASE ID: WAVE 1
limited          1 if limitation in kind/amount of work, 0 otherwise
cantwork         1 if can't work, 0 otherwise
age              respondent's age
female           1 if female, 0 otherwise
nonwhite         1 if nonwhite, 0 otherwise
working          1 if working for pay, 0 otherwise
apply_DI         1 if applied for SSDI/SSI, 0 otherwise
grant_DI_still   1 if receiving DI, 0 otherwise
grant_new        1 if receive any disability benefits
health_general   1 if general health fair or poor
health_emotional 1 if emotional health fair or poor
died             1 if died in 1995 or earlier
BMI_not          1 if BMI out of ideal range
ADL_level_0      no functional limitations
ADL_level_6      very diff/can't do a basic function
C_number         number of reported conditions
C_serious_obj    diabetes, cancer, lung, heart, stroke, psych
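
As a minimal sketch of how the ASCII file might be read, the Python below
assumes that "1_Dstats.raw" is whitespace-delimited, has no header row, has
no non-numeric missing-value codes, and stores its columns in the order
listed above. None of these assumptions is stated in this file, so check
the result against "1_Dstats.dta". The function name parse_dstats and the
sample row are illustrative only.

```python
DSTATS_COLUMNS = [
    "V1", "limited", "cantwork", "age", "female", "nonwhite", "working",
    "apply_DI", "grant_DI_still", "grant_new", "health_general",
    "health_emotional", "died", "BMI_not", "ADL_level_0", "ADL_level_6",
    "C_number", "C_serious_obj",
]


def parse_dstats(text):
    """Parse whitespace-delimited rows into dicts keyed by variable name.

    Assumes (not confirmed by this README) no header row and the column
    order listed in the README.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(DSTATS_COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(DSTATS_COLUMNS)} fields, "
                f"got {len(fields)}"
            )
        rows.append(dict(zip(DSTATS_COLUMNS, (float(f) for f in fields))))
    return rows


# Made-up row for illustration only; not taken from the HRS data.
sample = "1001 1 0 55 1 0 1 0 0 0 1 0 0 1 0 0 2 1\r\n"
records = parse_dstats(sample)
```

With the real file, parse_dstats(open("1_Dstats.raw").read()) should return
9,824 rows if the assumptions hold.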


2. "2_Verification.raw" (ascii) contains the variables used to generate
the results in Tables II-IV.  "2_Verification.dta" is the corresponding
Stata file.  There are 9,824 observations (rows) and 24 variables
(columns).  The y_* variables indicate whether the disability self-
reports, labeled X in the paper, are "verified" to be accurate (y_*=1
if verified, y_*=0 otherwise) under the various verification strategies
presented in the paper:

  V1         "CASE ID: WAVE 1"
  limited    "X in paper: 1 if reported limitation in kind or amount
              of work"
  cantwork   "X in paper: 1 if reported can't work"
  age_sort   "data sorted by age"
  dis_sort   "data sorted by ordered probit fitted values"
  y_1        "full verification of workers, disability beneficiaries,
              and those reporting no limitation" (not reported in paper)

  For Table II:
    y_2        "verify X: Model I"
    y_3        "verify X: Model II"

  For Table III, Model I:
    y_4        "verify X: y_2=1 & [(X=1&ADL>=0) or (X=0&ADL<=6)]" 
    y_5        "verify X: y_2=1 & [(X=1&ADL>=1) or (X=0&ADL<=5)]"  
    y_6        "verify X: y_2=1 & [(X=1&ADL>=2) or (X=0&ADL<=4)]"  
    y_7        "verify X: y_2=1 & [(X=1&ADL>=3) or (X=0&ADL<=3)]"  
    y_8        "verify X: y_2=1 & [(X=1&ADL>=4) or (X=0&ADL<=2)]"   
    y_9        "verify X: y_2=1 & [(X=1&ADL>=5) or (X=0&ADL<=1)]"  
    y_10       "verify X: y_2=1 & [(X=1&ADL>=6) or (X=0&ADL<=0)]"  

  For Table III, Model II:
    y_11       "verify X: y_3=1 & [(X=1&ADL>=0) or (X=0&ADL<=6)]" 
    y_12       "verify X: y_3=1 & [(X=1&ADL>=1) or (X=0&ADL<=5)]" 
    y_13       "verify X: y_3=1 & [(X=1&ADL>=2) or (X=0&ADL<=4)]" 
    y_14       "verify X: y_3=1 & [(X=1&ADL>=3) or (X=0&ADL<=3)]" 
    y_15       "verify X: y_3=1 & [(X=1&ADL>=4) or (X=0&ADL<=2)]" 
    y_16       "verify X: y_3=1 & [(X=1&ADL>=5) or (X=0&ADL<=1)]"
    y_17       "verify X: y_3=1 & [(X=1&ADL>=6) or (X=0&ADL<=0)]" 

  For Table IV: 	
    yy_1       "verify X: Model I"
    yy_2       "verify X: Model II" 
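
The Table III definitions follow one pattern: for a cutoff k = 0,...,6, a
self-report stays verified if the base indicator (y_2 or y_3) equals 1 and
either X=1 with ADL>=k or X=0 with ADL<=6-k. The Python sketch below
restates that rule. The file itself holds only the resulting y_* indicators,
so the adl argument (taken here to be a 0-6 score) and the function name are
hypothetical, used only to illustrate the logic.

```python
def verify_with_adl(base_verified, x, adl, k):
    """Return 1 if the self-report x stays verified at ADL cutoff k, else 0.

    base_verified: y_2 (Model I) or y_3 (Model II), coded 0/1
    x:             self-reported disability, coded 0/1
    adl:           hypothetical ADL limitation score, assumed to run 0-6
    k:             cutoff 0..6; k=0 gives y_4/y_11, k=6 gives y_10/y_17
    """
    if not 0 <= k <= 6:
        raise ValueError("k must be between 0 and 6")
    consistent = (x == 1 and adl >= k) or (x == 0 and adl <= 6 - k)
    return int(base_verified == 1 and consistent)


# Illustrative values only: verified under the base model, reports a
# limitation (x=1), ADL score of 3.
flags = [verify_with_adl(1, 1, 3, k) for k in range(7)]
```

Raising k tightens the requirement on both sides at once, so each
successive indicator verifies a subset of the reports verified by the
previous one.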


3. "3A_RUR_N_1082.raw" and "3B_RUR_N_233.raw" contain the variables
used to generate the results in Table V, Columns A and B, respectively.
"3A_RUR_N_1082.dta" and "3B_RUR_N_233.dta" are the corresponding Stata
files.  There are 1,082 observations (rows) in the first case and 233
observations (rows) in the second case.  In both cases, there are 6
variables (columns):

  V1                     "CASE ID: WAVE 1"
  cantwork               "X in paper: 1 if reported can't work"
  grant_still_receiving  "1 if still receiving benefits"  
  age_sort               "data sorted by age"
  dis_sort               "data sorted by ordered probit fitted values"
  yy_2                   "verify cantwork: Model I"

The four raw files are ASCII files in DOS format. They are all zipped in the
file kp-raw.zip. The four Stata files are all zipped in the file kp-dta.zip.
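
Because the raw files use DOS (CR/LF) line endings, a reader on another
platform may want to strip the carriage returns when extracting. The Python
sketch below reads one member of a zip archive and returns its lines without
line terminators. The function name is illustrative, and the member names
inside kp-raw.zip are assumed to match the file names given above. The
demonstration uses an in-memory archive with made-up contents.

```python
import io
import zipfile


def read_raw_member(zip_source, member):
    """Return the lines of one ASCII archive member, without CR/LF."""
    with zipfile.ZipFile(zip_source) as archive:
        raw_bytes = archive.read(member)
    # splitlines() treats "\r\n" as a single line terminator.
    return raw_bytes.decode("ascii").splitlines()


# Build a small in-memory archive with DOS line endings (made-up data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.raw", "1 0 1\r\n2 1 0\r\n")
buffer.seek(0)

lines = read_raw_member(buffer, "demo.raw")
```

For the real archive, a call such as
read_raw_member("kp-raw.zip", "1_Dstats.raw") would be the analogous usage.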