Peter Hudomiet, Gabor Kezdi, and Robert J. Willis, "Stock Market Crash and Expectations of American Households", Journal of Applied Econometrics, Vol. 26, No. 3, 2011, pp. 393-415.

INTRODUCTION

There are two text data files zipped into hkw-data.zip. The first is named "hrs_expectations_08.txt" and the second "stock_indices_08.txt". Both are ASCII files in DOS format. Unix/Linux users should use "unzip -a".

Two sources of data were used in this project:

1. The main data source was the 2008 wave of the Health and Retirement Study (HRS). "hrs_expectations_08.txt" contains data from this wave, together with control variables from the 2004 and 2006 waves and some stock market indices. Except for the last estimation presented in Table 5, we used only the publicly available HRS data. The last estimation uses the "day of interview" data, which are not publicly available. The public version of HRS provides only the month of interview. To match expectations to higher-frequency stock market indices, one needs access to the restricted day of interview data. For more information, see Section C below.

2. Publicly available stock market indices. To reproduce the results in Table 5, one must have access to the restricted day of interview data in HRS and merge it both to the expectation data ("hrs_expectations_08.txt"), using the HRS person identifier, and to "stock_indices_08.txt", using the day of interview variable. For those who do not have access to the day of interview data, we provide more aggregate stock market measures in "hrs_expectations_08.txt" that use only publicly available HRS data: columns 18-20 of this file contain monthly averages of returns, volatility, and volume of trade in the Dow Jones Industrial Average (DJIA).

This readme file is organized as follows. Section A describes the publicly available HRS data used in this project.
Section B describes the stock market indices, and Section C describes how to use the restricted "day of interview" HRS data to reproduce the results of Table 5.

*****

Section A: PUBLICLY AVAILABLE HRS DATA (source: http://hrsonline.isr.umich.edu)

The first 17 columns of "hrs_expectations_08.txt" are derived from the 2004, 2006 and 2008 waves of HRS. These variables are:

C1: Unique household and person identifier

C2: Month of interview in 2008 in "YMM" format

C3: Subjective probability that returns will be positive next year (question P047 in HRS), in percentage form, in 2008. Value "998" means "Do not know" and "999" indicates a refused answer.

C4: Subjective probability that returns will be higher or lower than a randomly assigned threshold value next year (question P150 in HRS), in percentage form, in 2008. Value "998" means "Do not know" and "999" indicates a refused answer. It is automatically missing for those who answered "Do not know" or refused on C3, and for the epistemically uncertain respondents. For more details, see the questionnaire or Section 2 of our paper.

C5: The randomly assigned threshold value used for C4. The randomization was not perfect. For more details, see the questionnaire at http://hrsonline.isr.umich.edu/modules/meta/2008/core/qnaire/online/16hr08P.pdf

C6: Female dummy

C7: Single dummy. Indicates that the interviewee is not married and does not have a partner.

C8: Black dummy

C9: Hispanic dummy

C10: Age in 2008

C11: Years of education

C12: Above average "cognitive abilities" dummy. To create this dummy variable, we first derived a cognition proxy variable. This index is the principal component based on 12 variables. The first 6 are indicators of how the respondents rated the aptitude of their own memories (question D101). The seventh and eighth variables are direct measures of memory. Interviewees were told 10 words and were asked to repeat them immediately. A little later they were asked to repeat these words again.
The seventh component variable is the number of correct responses in the immediate recall phase and the eighth is the number of correct responses in the delayed recall phase (based on questions D174 and D184). The ninth variable was based on questions D142-D146. Interviewees were asked to subtract 7 from 100, and then to repeat the subtraction 4 more times. The ninth variable is the number of correct subtractions. The last three variables are dummies indicating whether the interviewees correctly answered three simple math questions (D178-D180).

C13: Indicator of whether the interviewee follows the stock market in general. It is based on the 2004 values of P097 (2008 values are potentially endogenous, and 2006 values were not available). The variable is coded as 1 if the interviewee followed the stock market either "closely" or "somewhat closely".

C14: Indicator of whether the respondent or his/her spouse owned stocks in 2006, either directly (in individual companies or mutual funds) or indirectly (IRA/Keogh accounts at least partly invested in stocks or mutual funds). It is based on Q316 and Q513.

C15: The subjective probability that the US economy will experience a major depression in the next 10 years. Average of the 2004 and 2006 values of P034.

C16: A standardized "depressive symptoms" index. It is based on the 2004 and 2006 values of questions D110-D118. The 9+9 questions are averaged and then standardized. Any missing value was replaced with the sample average of that item.

C17: Fraction of fifty-fifty answers to all subjective probability questions in 2004 and 2006.

*****

Section B: Stock market indices (sources: http://finance.yahoo.com and http://www.cboe.com/micro/vxd)

Columns 18-20 in "hrs_expectations_08.txt" and columns 3-5 in "stock_indices_08.txt" are the stock indices. Variables in "hrs_expectations_08.txt" are monthly averages and are linked to the publicly available HRS data.
Variables in "stock_indices_08.txt" have daily frequency and can be merged to the HRS data if one has access to the restricted "day of interview" variable. See Section C for more on this issue.

Variables in "hrs_expectations_08.txt":

C18: Monthly log returns in the DJIA index. For a given month, we took the log difference between the average DJIA daily closing level in that month and in the previous month.

C19: Average daily volatility in a given month. We used the VXD implied volatility index. Details at http://www.cboe.com/micro/vxd

C20: Average daily log volume of trade in a given month

Variables in "stock_indices_08.txt":

C1: Month of interview in 2008 in "YMM" format

C2: Day of interview

C3: Monthly log returns, defined as the log difference between the average DJIA daily closing level over the last 5 days and the same average one month earlier.

C4: Average daily volatility in the last 5 days. We used the VXD implied volatility index. Details at http://www.cboe.com/micro/vxd

C5: Average daily log volume of trade in the last 5 days

*****

Section C: Restricted HRS data

The day of interview data in HRS are not publicly available. To analyze how the stock market affected expectations on a daily basis, one needs access to the restricted day of interview data. For more information about the restricted HRS data, please visit http://hrsonline.isr.umich.edu/index.php?p=reslis

There are multiple options for accessing these data. One way is to apply for access to the MiCDA Data Enclave. More information about the application procedure is at http://micda.psc.isr.umich.edu/enclave/

Once one has access to the day of interview data, one needs to merge it to the expectations data using the unique household and person identifier, which is called "hhidpn" in HRS. Then one needs to merge the resulting dataset to any stock market index based on the day of the interview. We used the variables in "stock_indices_08.txt" to produce the results in Table 5.
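
The two-step merge described in Section C can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the paper. It assumes that the files are whitespace-delimited with no header row, that the restricted file has two columns (hhidpn, day of interview), and that "day of interview" is the day of the month. All sample rows and values below are made up, and the expectations rows are truncated to columns C1-C3 for brevity.

```python
# Sketch of the Section C merge using only the Python standard library.
# ASSUMPTIONS (not confirmed by this readme): whitespace-delimited columns,
# no header row, and a hypothetical two-column layout for the restricted file.
# Every data value below is invented for illustration.

MISSING_CODES = {"998", "999"}  # "Do not know" and refused, per Section A


def parse_rows(text):
    """Split whitespace-delimited text into lists of string fields."""
    return [line.split() for line in text.splitlines() if line.strip()]


def probability(field):
    """Return a subjective probability in percent, or None for 998/999."""
    return None if field in MISSING_CODES else float(field)


# Stand-in for hrs_expectations_08.txt, columns C1-C3 only (hypothetical).
expectations_txt = """\
100010 803 60
100020 803 998
100030 810 25
"""

# Stand-in for the restricted file: hhidpn, day of interview (hypothetical).
day_of_interview_txt = """\
100010 14
100020 27
100030 9
"""

# Stand-in for stock_indices_08.txt, columns C1-C5 (hypothetical values).
stock_indices_txt = """\
803 14 -0.021 24.5 19.3
803 27 0.013 22.1 19.1
810 9 -0.158 48.9 19.8
"""

# Step 1 lookup: hhidpn -> day of interview.
days = {hhidpn: int(day) for hhidpn, day in parse_rows(day_of_interview_txt)}

# Step 2 lookup: (month, day) -> (returns, volatility, log volume).
indices = {
    (month, int(day)): tuple(float(x) for x in rest)
    for month, day, *rest in parse_rows(stock_indices_txt)
}

merged = []
for hhidpn, month, p_positive in parse_rows(expectations_txt):
    day = days.get(hhidpn)
    if day is None:
        continue  # respondent not found in the restricted file
    index_row = indices.get((month, day))
    if index_row is None:
        continue  # no matching stock index row for that date
    returns, volatility, log_volume = index_row
    merged.append({
        "hhidpn": hhidpn,
        "month": month,
        "day": day,
        "p_positive": probability(p_positive),
        "returns": returns,
        "volatility": volatility,
        "log_volume": log_volume,
    })

for row in merged:
    print(row)
```

With the real files, the in-memory strings would be replaced by the file contents, and the column positions should be checked against Sections A and B before use.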
*****

Please email Peter Hudomiet with any questions about the data:

Peter Hudomiet
University of Michigan
hudomiet [AT] umich.edu