Giuseppe De Luca and Franco Peracchi, "Estimating Engel Curves under Unit and Item Nonresponse", Journal of Applied Econometrics, Vol. 27, No. 7, 2012, pp. 1076-1099.


DATA SOURCES

Our data are from Release 2.0.1 of the first wave of the Survey of Health, Aging and Retirement in Europe (SHARE), a multidisciplinary and cross-national bi-annual household panel survey coordinated by the Mannheim Research Institute for the Economics of Aging (MEA). The first wave, conducted in 2004, covers about 19,500 households and about 28,500 individuals in eleven European countries (Austria, Belgium, Denmark, France, Germany, Greece, Italy, the Netherlands, Spain, Sweden and Switzerland). We only consider the five countries (Denmark, Italy, the Netherlands, Spain and Sweden) for which the sampling frame contains basic information on the individuals selected for interview.

Our sample is obtained by matching three sources of data, namely the gross sample data, the CMS (Case Management System) data, and Release 2.0.1 of the CAPI (computer assisted personal interview) data.

-- The gross sample data provide a list of the original sample units drawn in each country. They contain sampling frame information on gender, year of birth and regional NUTS indicators, plus information on eligibility, the final status of the interview process and the main reasons for unit nonresponse.

-- The CMS data contain information on the mode, the date, the time and the result code of each contact attempt made by interviewers during the fieldwork period.

-- Release 2.0.1 of the CAPI data contains information on a wide range of topics including socio-demographic characteristics, consumption expenditure, socio-economic status, measures of cognitive ability, health, social and family networks, features of the data collection process and characteristics of the interviewers.
Additional details about survey organization, sampling design, response rates, weighting and imputation strategies can be found on the SHARE web page:

http://www.share-project.org/


DATA ACCESSIBILITY

Gross sample and CMS data are confidential data sources administered by MEA. Requests for permission to use these data should be directed to:

Professor Axel Börsch-Supan, Ph.D.
Mannheim Research Institute for the Economics of Aging
University of Mannheim
L13, 17
68131 Mannheim, Germany

Release 2.0.1 of the CAPI data can be downloaded free of charge from the SHARE Research Data Center:

http://www.share-project.org/

To get access to the data, researchers have to complete a statement concerning the use of the microdata.


SAMPLE COMPOSITION

After merging the three data sources (gross sample, CMS and CAPI), our estimation sample consists of 15,895 households, of which 8,750 agreed to participate in the survey and 2,805 provided the information needed to compute the food share. If we use a less restrictive definition of item nonresponse on food share that ignores item nonresponse on income from capital assets, then the number of complete cases increases from 2,805 to 4,180.


DATA DESCRIPTION

Below, we provide a description of the main variables used in our analysis.

-----
Contains data from sample_income.dta
  obs:        15,895
 vars:            65                          9 Nov 2010 12:11
 size:     2,241,195 (99.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sampid2         str13  %13s                   Household identifier
country         byte   %11.0g      cou        Country indicator
part            byte   %8.0g                  Dummy for household survey participation
w_food_obs_a    byte   %9.0g                  Dummy for item response on food share - definition a
w_food_obs_b    byte   %9.0g                  Dummy for item response on food share - definition b
w_food_a        float  %9.0g                  PPP-adj. food share - definition a
w_food_b        float  %9.0g                  PPP-adj. food share - definition b
w_food_imp1_a   float  %9.0g                  Imputed and PPP-adj. food share - definition a
w_food_imp1_b   float  %9.0g                  Imputed and PPP-adj. food share - definition b
hincome_obs_a   byte   %9.0g                  Dummy for item response household income - definition a
hincome_obs_b   byte   %9.0g                  Dummy for item response household income - definition b
ln_hincome_a    float  %9.0g                  PPP-adj. ln household income - definition a
ln_hincome_b    float  %9.0g                  PPP-adj. ln household income - definition b
ln_hincome_im~a float  %9.0g                  Imputed and PPP-adj. ln household income - definition a
ln_hincome_im~b float  %9.0g                  Imputed and PPP-adj. ln household income - definition b
foode_obs       byte   %9.0g                  Dummy for item response food expenditure
foode           double %10.0g                 PPP-adj. food expenditure
foode_imp1      double %10.0g                 Imputed and PPP-adj. food expenditure
wgtach          float  %9.0g                  Calibrated household weights main & vignette samples together
vignette        byte   %9.0g                  Dummy for vignette sample
suppl           byte   %9.0g                  Dummy for supplementary Swedish sample
gs_female       byte   %9.0g                  Dummy for sample person female (gross sample)
gs_age          int    %9.0g                  Age of sample person female (gross sample)
ans_mac         byte   %9.0g                  Dummy for answering machine (CMS)
delay           float  %9.0g                  Indicator for delay in the contact process (CMS)
iv_female       byte   %9.0g                  Dummy for interviewer female
iv_age          byte   %9.0g                  Age of interviewer
iv_edu          float  %9.0g                  Years of education of the interviewer
female          byte   %9.0g                  Dummy for household respondent female
age             int    %9.0g                  Age of the household respondent
education       float  %9.0g                  Years of education of the household respondent
single          byte   %9.0g                  Dummy for household respondent living as single
hsize           byte   %8.0g                  Household size
children        byte   %9.0g                  Number of young children
small_city      byte   %9.0g                  Dummy for living in a small city
partner_age     int    %9.0g                  Age of the partner of the household respondent
orient          byte   %9.0g                  Orientation in time score
math            byte   %9.0g                  Math score
fluency         byte   %9.0g                  Fluency score
recall_d        byte   %9.0g                  Delayed recall score
proxy           byte   %9.0g                  Dummy for proxy interview
int_out         byte   %9.0g                  Dummy for interview conducted outside the respondent home
int_clarif      byte   %9.0g                  Dummy for often asked clarifications during the interview
DK              byte   %8.0g                  Dummy for Denmark
ES              byte   %8.0g                  Dummy for Spain
IT              byte   %8.0g                  Dummy for Italy
NL              byte   %8.0g                  Dummy for Netherlands
SE              byte   %8.0g                  Dummy for Sweden
ES_nuts1        byte   %8.0g                  Dummy for Spain - NUTS1=1
ES_nuts2        byte   %8.0g                  Dummy for Spain - NUTS1=2
ES_nuts3        byte   %8.0g                  Dummy for Spain - NUTS1=3
ES_nuts4        byte   %8.0g                  Dummy for Spain - NUTS1=4
ES_nuts5        byte   %8.0g                  Dummy for Spain - NUTS1=5
ES_nuts6        byte   %8.0g                  Dummy for Spain - NUTS1=6
ES_nuts7        byte   %8.0g                  Dummy for Spain - NUTS1=7
IT_nuts1        byte   %8.0g                  Dummy for Italy - NUTS1=1
IT_nuts2        byte   %8.0g                  Dummy for Italy - NUTS1=2
IT_nuts3        byte   %8.0g                  Dummy for Italy - NUTS1=3
IT_nuts4        byte   %8.0g                  Dummy for Italy - NUTS1=4
IT_nuts5        byte   %8.0g                  Dummy for Italy - NUTS1=5
NL_nuts1        byte   %8.0g                  Dummy for Netherlands - NUTS1=1
NL_nuts2        byte   %8.0g                  Dummy for Netherlands - NUTS1=2
NL_nuts3        byte   %8.0g                  Dummy for Netherlands - NUTS1=3
NL_nuts4        byte   %8.0g                  Dummy for Netherlands - NUTS1=4
ppp             double %10.0g                 Purchasing power parity coefficient
------


SUMMARY STATISTICS

Below, we provide summary statistics of the main variables used in our analysis.

    country |      Freq.     Percent        Cum.
------------+-----------------------------------
     sweden |      4,491       28.25       28.25
netherlands |      3,174       19.97       48.22
      spain |      3,303       20.78       69.00
      italy |      3,179       20.00       89.00
    denmark |      1,748       11.00      100.00
------------+-----------------------------------
      Total |     15,895      100.00

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        part |     15895    .5504876    .4974601          0          1
w_food_obs_a |      8750    .3205714    .4667229          0          1
w_food_obs_b |      8750    .4777143    .4995316          0          1
    w_food_a |      2805     .228014    .2047741   .0092943   .9997157
    w_food_b |      4180    .2241338    .2049385   .0076703   .9997157
w_food_imp~a |      8750    .2279077    .2146392   .0026148   .9997157
w_food_imp~b |      8750    .2305438    .2178601   .0058553   .9997157
hincome_ob~a |      8750    .3602286     .480094          0          1
hincome_ob~b |      8750    .5492571    .4975963          0          1
ln_hincome_a |      3152    10.07515    .7792086   7.346604   12.71169
ln_hincome_b |      4806    10.10024     .792193   6.770937    12.8338
ln_hinco~1_a |      8750    10.29414    .8752015   6.819516   13.71127
ln_hinco~1_b |      8750    10.28288    .8731048   6.770937   13.71124
   foode_obs |      8750    .8180571    .3858195          0          1
       foode |      7158    5393.643    3920.783   251.7023   52346.57
  foode_imp1 |      8750    5507.243    3999.283   251.7023   52346.57
      wgtach |      8720    3247.276    3128.602   527.7827   31179.29
    vignette |     15895    .1608682    .3674209          0          1
       suppl |     15895    .0581315    .2339993          0          1
   gs_female |     15895    .5474048    .4977634          0          1
      gs_age |     15895    65.17609    10.73107         26        105
     ans_mac |     15895    .0242215    .1537409          0          1
       delay |     15895    .3248979    .1763373          0   .9867842
   iv_female |     15894    .7429219    .4370366          0          1
      iv_age |     15894    49.25878    11.80895         18         83
      iv_edu |     15828    13.45334    2.864352          0         22
      female |      8750    .5458286    .4979238          0          1
         age |      8750    64.90126    10.49058         26        102
   education |      8728     9.14379     4.53399          0         22
      single |      8740    .3187643    .4660241          0          1
       hsize |      8750    2.160914    1.052482          1          9
    children |      8743    .0910443    .3688774          0          4
  small_city |      8750    .2147429    .4106673          0          1
 partner_age |      8740    63.16911    8.262791         23        104
      orient |      8703    3.756521    .6664522          0          4
        math |      8690    3.205984    1.182082          1          5
     fluency |      8604    18.40493    7.417752          0         88
    recall_d |      8642    3.225179     2.04021          0         10
       proxy |      8750    .1059429    .3077821          0          1
     int_out |      8750    .0434286    .2038315          0          1
  int_clarif |      8750    .0843429    .2779172          0          1
          DK |     15895    .1099717    .3128643          0          1
          ES |     15895    .2078012    .4057465          0          1
          IT |     15895          .2    .4000126          0          1
          NL |     15895    .1996854    .3997765          0          1
          SE |     15895    .2825417    .4502495          0          1
    ES_nuts1 |     15895    .0169236    .1289891          0          1
    ES_nuts2 |     15895    .0205096      .14174          0          1
    ES_nuts3 |     15895    .0286883    .1669341          0          1
    ES_nuts4 |     15895    .0266751     .161137          0          1
    ES_nuts5 |     15895    .0552375    .2284504          0          1
    ES_nuts6 |     15895    .0478767     .213512          0          1
    ES_nuts7 |     15895    .0118905    .1083969          0          1
    IT_nuts1 |     15895    .0551117    .2282053          0          1
    IT_nuts2 |     15895    .0415225    .1995016          0          1
    IT_nuts3 |     15895    .0409563    .1981952          0          1
    IT_nuts4 |     15895    .0410821    .1984864          0          1
    IT_nuts5 |     15895    .0213275    .1444781          0          1
    NL_nuts1 |     15895    .0307644    .1726842          0          1
    NL_nuts2 |     15895    .0300094    .1706186          0          1
    NL_nuts3 |     15895    .0992765    .2990423          0          1
    NL_nuts4 |     15895    .0396351    .1951065          0          1
         ppp |      8750    1.029116    .1325509      .8501     1.2658


REQUIRED SOFTWARE

The software required to replicate the results in the paper is Stata 10.1 or higher. All files and (mostly empty) folders are contained in the zip file dp-analysis.zip. All files are ASCII files in DOS format, so Unix/Linux users should use "unzip -a".


STATA DO-FILES

We provide a set of 25 Stata do-files which are fully commented step by step. A brief description of each Stata do-file is provided below.

- Maste.do: Sets the user Stata section, the working directory and the data folders, and runs the remaining Stata do-files. After downloading the data and setting the data folders, all analyses in the paper can be replicated by running this file.
- From "1_Fieldwork.do" to "7_Budget_share.do": Extract the data from the gross sample database, the CMS database and the relevant CAPI modules of the SHARE interview, merge all data sources, and generate all variables needed in our analysis.
- 8_Descriptive.do: Provides the summary statistics reported in the paper.
- 9_FS_degree.do: Fits SNP models for the two different definitions of item nonresponse on food share using alternative starting values and orders of the Hermite polynomial expansion. Estimates are obtained separately by country and for the pooled sample of countries.
- 10_FS_estimates.do: Displays first-step estimates of the unit and item nonresponse equations for the selected specifications.
- 11_PL_degree.do: Uses cross-validation on the second and the third estimation steps to determine the degree of polynomial expansion to be used in semiparametric models with k=(3,3) in the first estimation step (Models 4).
- 12_PL_degree_base.do: Uses cross-validation to determine the degree of polynomial expansion to be used in the baseline models (Models 1 and 2).
- 13_PL_degree_44.do: Uses cross-validation on the second and the third estimation steps to determine the degree of polynomial expansion to be used in semiparametric models with k=(4,4) in the first estimation step (Models 4 with undersmoothing).
- 14_SS_estimates.do: Runs and displays the second-step estimates in Table 6.
- 15_TS_estimates.do: Runs and displays the third-step estimates in Table 7.
- 16_TS_USM.do: Runs the third-step estimates with undersmoothing.
- 17_TS_Country.do: Runs the third-step estimates separately by country.
- 18_TS_WAD.do: Computes the weighted average derivative in Table 8.
- 19_TS_Fig_1_4.do: Plots the food share derivatives in Figures 1 and 4.
- 20_TS_Fig_2.do: Plots the food share derivatives with undersmoothing in Figure 2.
- 21_TS_Fig_3.do: Plots the food share derivatives for the more and the less conservative definitions of item nonresponse on food share (Figure 3).
- 0_Predictors.do: Defines the lists of predictors to be used in each equation.
- 0_TRANSFORM_NUTS.do: Recodes the regional NUTS1 codes.
- dropping.do: Defines programs to drop missing values on a set of variables.


REQUIRED DATA FOLDERS

The dp-analysis.zip file contains some data folders in which the outputs from the analysis are stored. Before running the above do-files, researchers have to create the following folders within their working directory:

routine_ado : contains our set of Stata ado-files (see below).
routine_do  : contains the file dropping.do.
FS_DEGREE   : contains results from the first estimation step.
PL_DEGREE   : contains results from the cross-validation procedure.
SS_EST      : contains results from the second estimation step.
TS_EST      : contains results from the third estimation step.
TS_BOOT     : contains bootstrap replicates from the third estimation step.
TS_GRA      : contains graphs from the third estimation step.
TS_USM      : contains results from the third estimation step with undersmoothing.
TS_BOOT_USM : contains bootstrap replicates from the third estimation step with undersmoothing.


STATA ADO-FILES

In addition to the above set of Stata do-files, we also provide the following set of Stata ado-files, which are stored in the folder routine_ado. The main ado-files are:

snpbpsel_m.ado : Stata program for SNP estimation of a bivariate binary choice model with sample selection.
h2ssd.ado      : Stata program to run cross-validation on the second and the third estimation steps of the semiparametric model.
reg_cv2.ado    : Stata program to run cross-validation on baseline models.
plinear.ado    : Stata program for power series estimation of a partially linear model.
heck2s.ado     : Stata program for parametric three-step estimation of our sample selection model with two sequential selection equations and endogeneity of one of the predictors. The first step uses the ML estimator of the bivariate probit with sample selection. The second and the third steps use augmented linear regression models with bias correction terms for selectivity and endogeneity.
pl2se.ado      : Stata program for semiparametric three-step estimation of our sample selection model with two sequential selection equations and endogeneity of one of the predictors. The first step uses the SNP estimator of the bivariate binary choice model with sample selection. The second and the third steps use power series expansions to correct for both selectivity and endogeneity.


CONTACT

Questions can be addressed to:

Giuseppe De Luca
ISFOL
via G.M. Lancisi 29
00161 Roma
Email: g.deluca [AT] isfol.it
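
As a convenience for the folder setup described under REQUIRED DATA FOLDERS, the following is a minimal sketch of how the ten output folders could be created in one pass. It is not part of the replication package: it assumes Python 3 is available alongside Stata, and the function name create_required_folders is hypothetical. Only the folder names are taken from this readme.

```python
from pathlib import Path

# Folder names copied from the REQUIRED DATA FOLDERS list in this readme.
REQUIRED_FOLDERS = [
    "routine_ado", "routine_do", "FS_DEGREE", "PL_DEGREE", "SS_EST",
    "TS_EST", "TS_BOOT", "TS_GRA", "TS_USM", "TS_BOOT_USM",
]


def create_required_folders(working_dir):
    """Create each missing required folder under working_dir.

    Hypothetical helper (not shipped with the do-files). Folders that
    already exist are left untouched. Returns the names that were created.
    """
    base = Path(working_dir)
    created = []
    for name in REQUIRED_FOLDERS:
        folder = base / name
        if not folder.is_dir():
            folder.mkdir(parents=True)
            created.append(name)
    return created
```

Calling create_required_folders(".") from the Stata working directory creates whichever folders are missing. Because existing folders are skipped, running it after unzipping dp-analysis.zip should leave the shipped ado-files and do-files in place.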