Peter Hudomiet, Gabor Kezdi, and Robert J. Willis, "Stock Market Crash 
and Expectations of American Households", Journal of Applied 
Econometrics, Vol. 26, No. 3, 2011, pp. 393-415.

INTRODUCTION

There are two txt data files zipped into hkw-data.zip. The first data file
is named "hrs_expectations_08.txt" and the second is called 
"stock_indices_08.txt". Both of these are ASCII files in DOS format.
Unix/Linux users should use "unzip -a".

Two sources of data were used in this project:

1. The main data source was the 2008 wave of the Health and Retirement 
   Study (HRS). "hrs_expectations_08.txt" contains data from this wave, 
   together with control variables from the 2004 and 2006 waves and 
   some stock market indices. Except for the last estimation presented
   in Table 5, we used only the publicly available HRS data. The last
   estimation used the "day of interview" data, which are not publicly
   available. In the public version of HRS, only the month of 
   interview is provided. To be able to match expectations to higher 
   frequency stock market indices, one needs to get access to the 
   restricted day of interview data. For more information, see Section C
   below.

2. Publicly available stock market indices. To reproduce the results
   in Table 5, one must have access to the restricted day of interview
   data in HRS and merge them with both the expectation data
   ("hrs_expectations_08.txt"), using the HRS person identifier, and
   "stock_indices_08.txt", using the day of interview variable.
   For those who do not have access to the day of interview data, we
   provide some more aggregate stock market measures in
   "hrs_expectations_08.txt" that rely only on publicly available HRS
   data: columns 18-20 of that file contain monthly averages of
   returns, volatility, and volume of trade in the Dow Jones
   Industrial Average.

This readme file is organized as follows. Section A describes the 
publicly available HRS data used in this project. Section B describes 
the stock market indices, and Section C describes the way one can use 
the restricted "day of interview" HRS data to reproduce the results of 
Table 5. 

*****

Section A: PUBLICLY AVAILABLE HRS DATA
(source: http://hrsonline.isr.umich.edu)

The first 17 columns of "hrs_expectations_08.txt" are derived from the 
2004, 2006 and 2008 waves of HRS. These variables are:

C1: unique household and person identifier

C2: month of interview in 2008 in "YMM" format

C3: subjective probability that returns will be positive next year 
    (question P047 in HRS) in percentage form in 2008. Value "998" 
    means "Do not know" and "999" indicates a refused answer.

C4: subjective probability that returns will be higher or lower than 
    a randomly assigned threshold value next year (question P150 in 
    HRS) in percentage form in 2008. Value "998" means "Do not know" 
    and "999" indicates a refused answer. It is automatically missing
    for those who answered "Do not know" or refused (DK/RF) on C3 and
    for epistemically uncertain respondents. For more details, see the
    questionnaire or Section 2 of our paper.

C5: The randomly assigned threshold value used for C4. The randomization
    was not perfect. For more details, see the questionnaire at
    
    http://hrsonline.isr.umich.edu/modules/meta/2008/core/qnaire/online/16hr08P.pdf

C6: Female dummy

C7: Single dummy. Indicates that the interviewee is not married and does
    not have a partner.

C8: Black dummy

C9: Hispanic dummy

C10: Age in 2008

C11: Years of education

C12: Above average "cognitive abilities" dummy. To create this dummy 
     variable, we first derived a cognition proxy variable. This index
     is the principal component based on 12 variables. The first 6 are
     indicators of how the respondents rated the aptitude of their own
     memories (question D101). The seventh and eighth variables are
     direct measures of memory. Interviewees were told 10 words and 
     then they were asked to repeat these words immediately. A little 
     later they were asked again to repeat these words. The seventh 
     component variable is the number of correct responses in the 
     immediate recall phase and the eighth corresponds to the number of 
     correct responses in the delayed recall phase (based on questions 
     D174 and D184). The ninth variable was based on questions
     D142-D146. Interviewees were asked to subtract 7 from 100, and then
     repeat it 4 more times. The ninth variable was the number of 
     correct subtractions. The last three variables were dummies 
     regarding whether the interviewees were able to correctly answer 
     three simple math questions (D178-D180).

C13: Indicator of whether the interviewee follows the stock market in 
     general. It was based on the 2004 values of P097. (2008 values are
     potentially endogenous, while 2006 values were not available).
     The variable is coded as 1 if the interviewee followed the stock
     market either "closely" or "somewhat closely".
 
C14: Indicator of whether the respondent or his/her spouse owned stocks 
     in 2006 either directly (in individual companies or mutual funds) 
     or indirectly (IRA/Keogh accounts at least partly invested in 
     stocks or mutual funds). It is based on Q316 and Q513.

C15: The subjective probability that the US economy will experience a 
     major depression in the next 10 years. Average of 2004 and 2006 
     values of P034.

C16: A standardized "depressive symptoms" index. It is based on the 
     2004 and 2006 values of questions D110-D118. The 9+9 questions are
     averaged and then standardized. Missing values were replaced with
     the sample average of the corresponding item.

C17: Fraction of fifty-fifty answers to all subjective probability 
     questions in 2004 and 2006.
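
The special codes in C3 and C4 need to be recoded before any analysis.
The following Python sketch shows one way to do this. It is not the
code used for the paper. It assumes that the file is whitespace
delimited and that a missing C4 is written as "."; neither detail is
stated in this readme, so both should be checked against the file. The
function and field names are hypothetical.

```python
import math

# Codes documented for C3 and C4: 998 = "Do not know", 999 = refused.
MISSING_CODES = {998, 999}


def recode_probability(raw):
    """Return the answer as a probability in [0, 1], or NaN if missing."""
    raw = raw.strip()
    if raw in {"", "."}:  # assumption: "." (or blank) marks a missing value
        return math.nan
    value = float(raw)
    if value in MISSING_CODES:
        return math.nan
    return value / 100.0  # answers are recorded in percentage form


def parse_row(line):
    """Read the first four columns of one record.

    Assumes whitespace-delimited columns in the order C1, C2, C3, C4.
    """
    fields = line.split()
    return {
        "hhidpn": fields[0],                            # C1
        "month": fields[1],                             # C2
        "p_positive": recode_probability(fields[2]),    # C3
        "p_threshold": recode_probability(fields[3]),   # C4
    }
```

Treating "Do not know" and refused answers as a single missing value is
a simplification; an analysis that distinguishes the two would keep the
original codes in a separate field.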

*****

Section B: STOCK MARKET INDICES
(sources: http://finance.yahoo.com and http://www.cboe.com/micro/vxd)

Columns 18-20 in "hrs_expectations_08.txt" and columns 3-5 in 
"stock_indices_08.txt" are the stock indices. Variables in 
"hrs_expectations_08.txt" are monthly averages and are linked to the 
publicly available HRS data. Variables in "stock_indices_08.txt" have 
daily frequency and they can be merged to the HRS data if one has 
access to the restricted "day of interview" variable. See Section C
for details.

Variables in "hrs_expectations_08.txt":

C18: Monthly log returns of the DJIA index. For a given month, we took
     the log difference between the average DJIA daily closing level
     in that month and in the previous month.

C19: Average daily volatility in a given month. We used the VXD implied
     volatility index. Details at http://www.cboe.com/micro/vxd

C20: Average daily log volume of trade in a given month
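
The definition of C18 can be illustrated with a short Python sketch. It
is not the code used to build the file. It assumes that the daily
closing levels are available as (month, close) pairs and that the
months present in the input are consecutive calendar months. The
function name and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def monthly_log_returns(daily_closes):
    """Compute log(mean close in month t) - log(mean close in month t-1).

    daily_closes: iterable of (month_key, close) pairs, where month_key
    sorts chronologically (for example "2008-01").
    Returns a dict {month_key: log return}; the first month has no entry
    because it has no previous month to compare with.
    """
    by_month = defaultdict(list)
    for month, close in daily_closes:
        by_month[month].append(close)

    months = sorted(by_month)
    means = {m: sum(by_month[m]) / len(by_month[m]) for m in months}

    returns = {}
    for prev, cur in zip(months, months[1:]):
        returns[cur] = math.log(means[cur]) - math.log(means[prev])
    return returns
```

Note that the average is taken over closing levels first and the log
difference second, which is the order described for C18.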

Variables in "stock_indices_08.txt":

C1: month of interview in 2008 in "YMM" format

C2: day of interview

C3: Monthly log returns, defined as the log difference between the
    average of the DJIA daily closing levels in the last 5 days and
    the same average one month earlier.

C4: Average daily volatility in the last 5 days. We used the VXD 
    implied volatility index. Details at http://www.cboe.com/micro/vxd

C5: Average daily log volume of trade in the last 5 days

*****

Section C: RESTRICTED HRS DATA

The day of interview data in HRS are not publicly available. To
analyze how the stock market affected expectations on a daily basis,
one needs access to the restricted day of interview data. For more
information about the restricted HRS data, please visit

  http://hrsonline.isr.umich.edu/index.php?p=reslis 

There are multiple options for accessing these data. One way is 
to apply for access to the MiCDA Data Enclave. More information about
the application procedure is at

  http://micda.psc.isr.umich.edu/enclave/

Once one has access to the day of interview data, one needs to merge
them with the expectations data using the unique household and person
identifier, which is called "hhidpn" in HRS. The resulting dataset can
then be merged with any stock market index based on the day of the
interview. We used the variables in "stock_indices_08.txt" to produce
the results in Table 5. 
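
The two merge steps can be sketched in Python as follows. This is an
illustration only, not the code used for the paper. The layout of the
restricted file is not described in this readme, so the sketch assumes
it has already been read into a mapping from "hhidpn" to the day of
interview. The function name and the dictionary layouts are
hypothetical.

```python
def merge_interview_data(expectations, interview_days, stock_indices):
    """Attach daily stock indices to each respondent's expectations.

    expectations:   {hhidpn: {"month": ..., ...}}  from hrs_expectations_08.txt
    interview_days: {hhidpn: day}                  restricted HRS data (assumed layout)
    stock_indices:  {(month, day): {...}}          from stock_indices_08.txt
    """
    merged = []
    for hhidpn, person in expectations.items():
        # Step 1: match on the person identifier.
        day = interview_days.get(hhidpn)
        if day is None:
            continue  # no restricted record for this respondent
        # Step 2: match on the interview date (month and day).
        indices = stock_indices.get((person["month"], day))
        if indices is None:
            continue  # no index row for that date
        merged.append({"hhidpn": hhidpn, "day": day, **person, **indices})
    return merged
```

The sketch keeps only respondents that match in both steps (an inner
join). Whether unmatched respondents should instead be kept with
missing index values is a choice for the analyst.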

*****

Please email Peter Hudomiet with any questions about the data:

Peter Hudomiet
University of Michigan
hudomiet [AT] umich.edu