Anil Kumar, "Nonparametric Estimation of the Impact of Taxes on Female Labor Supply", Journal of Applied Econometrics, Vol. 27, No. 3, 2012, pp. 415-439.

The data come from the 1985 and 1989 waves of the PSID, a publicly available longitudinal study of a representative sample of U.S. individuals and the family units in which they reside. The sample consists of married women. Women belonging to the Survey of Economic Opportunity (SEO) subsample were excluded from the analysis sample because they were non-randomly selected.

The PSID has 6335 observations on wives in 1985 and 1989. The reasons for sample exclusions, with the number of observations dropped in parentheses, are:

  SEO sample (2233)
  Age below 25 or above 60 (1427)
  Self-employed or spouse self-employed (892)

Excluding a further 12 observations with missing variables yields a sample of 1771. Another 3 observations were lost due to missing values on marginal tax rates. The final sample used in the paper consists of 1768 observations.

All data are zipped in the file ak-data.zip. It contains two ASCII files in DOS format. Unix/Linux users should use "unzip -a".

The data themselves are in the file psid_85_and_89_nonseo_wives.csv. It contains all the budget set variables, merged after obtaining marginal tax rates for a grid of wage incomes using NBER TAXSIM. The budget set variables provided with this data were constructed exactly as described in the paper, in particular as presented in Equations (6) and (10). Wages for women out of the labor force (dlfp1=0) have been replaced by the predicted wages from a selection-corrected nonparametric wage equation, as in Equation (9) of the paper. The dataset has 1768 observations and 200 variables.

The file variable_mapping_85_89.txt has the crosswalk from the variables in the dataset to the original PSID variables from which they were constructed.
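As a quick consistency check on the sample counts reported above, the following Python sketch subtracts each exclusion in turn from the 6335 starting observations. The counts are taken from this README; the function and variable names are illustrative only and are not part of the replication files.

```python
# Sample accounting from the README. Names are illustrative, not from the paper.
START_OBS = 6335
EXCLUSIONS = [
    ("SEO sample", 2233),
    ("Age below 25 or above 60", 1427),
    ("Self-employed or spouse self-employed", 892),
    ("Missing variables", 12),
    ("Missing marginal tax rates", 3),
]


def sample_trail(start, exclusions):
    """Return (reason, observations remaining) after each exclusion is applied."""
    remaining = start
    trail = []
    for reason, dropped in exclusions:
        remaining -= dropped
        trail.append((reason, remaining))
    return trail


if __name__ == "__main__":
    for reason, remaining in sample_trail(START_OBS, EXCLUSIONS):
        print(f"{reason:<40} {remaining:>5}")
```

The last two figures in the trail, 1771 and 1768, match the counts stated in the text.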
NOTE ON REPLICATING BUDGET SET VARIABLES USING NBER TAXSIM

To replicate the budget set variables used in the paper, use the accompanying dataset psid_85_and_89_nonseo_wives.csv to extract the following variables and run them through TAXSIM:

   1. taxsimid: Case ID (must be numeric)
   2. year: Tax year
   3. SOI_code: State
   4. taxmar: Marital Status (1. single 2. joint 3. head of household)
   5. deps: Dependent Exemptions (including children of all ages, but see #19 below)
   6. ageex: Number of taxpayers over 65 years of age
   7. wage_grid: Wage and salary income of Taxpayer
   8. spouse_wage: Wage and salary income of Spouse
   9. dividend: Dividend income
  10. assetinc: Other property income, including interest, rent, alimony
  11. hwpeninc: Taxable Pensions
  12. hwsocsec: Gross Social Security Income
  13. othertri: Other non-taxable transfer Income such as welfare
  14. rentpaid: Rent Paid
  15. proptax: Property and other taxes paid
  16. item: Itemized deductions
  17. childcare: Child care expenses
  18. hwworkplusuecomp: Unemployment compensation received

Note that some of the variables listed above are not available in psid_85_and_89_nonseo_wives.csv, but they are easy to create when preparing a file for TAXSIM. They are:

  (i)   taxsimid: a unique id variable that identifies the observation in the TAXSIM output.
  (ii)  SOI_code: when I ran TAXSIM it asked for the SOI code for the state, so convert the variable "state" to SOI code before sending it to TAXSIM.
  (iii) wage_grid: wage income. I ran TAXSIM for a grid of wage incomes ranging from 0 to 200000 for each observation in psid_85_and_89_nonseo_wives.csv in order to construct the budget set for each individual.
  (iv)  spouse_wage: wage income of the spouse, which was set to 0 for everyone because the budget set is constructed for a range of incomes.
  (v)   dividend: dividend income, which was also set to 0 for everyone.

IMPORTANT POINTS TO NOTE WHILE RUNNING TAXSIM

  (1) Check the order in which the current version of TAXSIM requires the variables. I ran TAXSIM interactively back in 2002. The order and the information required by the latest TAXSIM may have changed.
  (2) After getting the marginal tax rate from TAXSIM for each point on the grid of wage income, find the kinks and slopes for each person and merge the information back to psid_85_and_89_nonseo_wives.csv.
  (3) While constructing the budget set for the paper, I assumed that the employee bears the economic incidence of the entire payroll tax. Therefore, the marginal tax rate at each wage income level was calculated as the sum of the federal tax rate, the state tax rate, and the entire payroll tax (i.e. including the employer's portion).

THE FOLLOWING ARE THE VARIABLE DESCRIPTIONS IN THE ACCOMPANYING DATASET psid_85_and_89_nonseo_wives.csv

(1)       (2)                 (3)
position  variable name       variable label
1         caseid              Individual Identifier
2         year                Time Identifier
3         hrs1                Annual Hours of Work
4         wage1               Hourly Wage
5         nly                 Non labor Income
6         agi                 Adjusted Gross Income
7         dyr2                year== 89.0000
8         slope10yv11         slope1^0*yv1^1 (Eq (6): First Term)
9         slope10yv12         slope1^0*yv1^2 (Eq (6): First Term)
10        slope10yv13         slope1^0*yv1^3 (Eq (6): First Term)
11        slope10yv14         slope1^0*yv1^4 (Eq (6): First Term)
12        slope10yv15         slope1^0*yv1^5 (Eq (6): First Term)
13        slope11yv10         slope1^1*yv1^0 (Eq (6): First Term)
14        slope11yv11         slope1^1*yv1^1 (Eq (6): First Term)
15        slope11yv12         slope1^1*yv1^2 (Eq (6): First Term)
16        slope11yv13         slope1^1*yv1^3 (Eq (6): First Term)
17        slope11yv14         slope1^1*yv1^4 (Eq (6): First Term)
18        slope12yv10         slope1^2*yv1^0 (Eq (6): First Term)
19        slope12yv11         slope1^2*yv1^1 (Eq (6): First Term)
20        slope12yv12         slope1^2*yv1^2 (Eq (6): First Term)
21        slope12yv13         slope1^2*yv1^3 (Eq (6): First Term)
22        slope13yv10         slope1^3*yv1^0 (Eq (6): First Term)
23        slope13yv11         slope1^3*yv1^1 (Eq (6): First Term)
24        slope13yv12         slope1^3*yv1^2 (Eq (6): First Term)
25        slope14yv10         slope1^4*yv1^0 (Eq (6): First Term)
26        slope14yv11         slope1^4*yv1^1 (Eq (6): First Term)
27        slope15yv10         slope1^5*yv1^0 (Eq (6): First Term)
28        sumlj1dslope0dyv2   sum_j(kink_j)^1*(slope_j^0yv_j^2-slope_j+1^0yv_j+1^2) (Eq (6): Second Term)
29        sumlj1dslope1dyv1   sum_j(kink_j)^1*(slope_j^1yv_j^1-slope_j+1^1yv_j+1^1) (Eq (6): Second Term)
30        sumlj1dslope1dyv2   sum_j(kink_j)^1*(slope_j^1yv_j^2-slope_j+1^1yv_j+1^2) (Eq (6): Second Term)
31        sumlj1dslope2dyv0   sum_j(kink_j)^1*(slope_j^2yv_j^0-slope_j+1^2yv_j+1^0) (Eq (6): Second Term)
32        sumlj1dslope2dyv1   sum_j(kink_j)^1*(slope_j^2yv_j^1-slope_j+1^2yv_j+1^1) (Eq (6): Second Term)
33        sumlj1dslope2dyv2   sum_j(kink_j)^1*(slope_j^2yv_j^2-slope_j+1^2yv_j+1^2) (Eq (6): Second Term)
34        sumlj2dslope0dyv2   sum_j(kink_j)^2*(slope_j^0yv_j^2-slope_j+1^0yv_j+1^2) (Eq (6): Second Term)
35        sumlj2dslope1dyv1   sum_j(kink_j)^2*(slope_j^1yv_j^1-slope_j+1^1yv_j+1^1) (Eq (6): Second Term)
36        sumlj2dslope1dyv2   sum_j(kink_j)^2*(slope_j^1yv_j^2-slope_j+1^1yv_j+1^2) (Eq (6): Second Term)
37        sumlj2dslope2dyv0   sum_j(kink_j)^2*(slope_j^2yv_j^0-slope_j+1^2yv_j+1^0) (Eq (6): Second Term)
38        sumlj2dslope2dyv1   sum_j(kink_j)^2*(slope_j^2yv_j^1-slope_j+1^2yv_j+1^1) (Eq (6): Second Term)
39        sumlj2dslope2dyv2   sum_j(kink_j)^2*(slope_j^2yv_j^2-slope_j+1^2yv_j+1^2) (Eq (6): Second Term)
40        sumlj1dslope0dyv0   sum_j(kink_j)^1*(slope_j^0yv_j^0-slope_j+1^0yv_j+1^0) (Eq (6): Second Term)
41        sumlj1dslope0dyv1   sum_j(kink_j)^1*(slope_j^0yv_j^1-slope_j+1^0yv_j+1^1) (Eq (6): Second Term)
42        sumlj1dslope1dyv0   sum_j(kink_j)^1*(slope_j^1yv_j^0-slope_j+1^1yv_j+1^0) (Eq (6): Second Term)
43        sumlj2dslope0dyv0   sum_j(kink_j)^2*(slope_j^0yv_j^0-slope_j+1^0yv_j+1^0) (Eq (6): Second Term)
44        sumlj2dslope0dyv1   sum_j(kink_j)^2*(slope_j^0yv_j^1-slope_j+1^0yv_j+1^1) (Eq (6): Second Term)
45        sumlj2dslope1dyv0   sum_j(kink_j)^2*(slope_j^1yv_j^0-slope_j+1^1yv_j+1^0) (Eq (6): Second Term)
46        dsJ1s11             slopeJ^1-slope1^1 (Eq (6): Third Term)
47        dyvJ1yv11           yvJ^1-yv1^1 (Eq (6): Third Term)
48        dsJ2s12             slopeJ^2-slope1^2 (Eq (6): Third Term)
49        dyvJ2yv12           yvJ^2-yv1^2 (Eq (6): Third Term)
50        dsJ1yvJ1            slopeJ^1*yvJ^1-slope1^1*yv1^1 (Eq (6): Third Term)
51        dsJ1yvJ2            slopeJ^1*yvJ^2-slope1^1*yv1^2 (Eq (6): Third Term)
52        dsJ2yvJ1            slopeJ^2*yvJ^1-slope1^2*yv1^1 (Eq (6): Third Term)
53        dsJ2yvJ2            slopeJ^2*yvJ^2-slope1^2*yv1^2 (Eq (6): Third Term)
54        slope1              Slope: Segment 1
55        slope2              Slope: Segment 2
56        slope3              Slope: Segment 3
57        slope4              Slope: Segment 4
58        slope5              Slope: Segment 5
59        slope6              Slope: Segment 6
60        slope7              Slope: Segment 7
61        slope8              Slope: Segment 8
62        slope9              Slope: Segment 9
63        slope10             Slope: Segment 10
64        slope11             Slope: Segment 11
65        slope12             Slope: Segment 12
66        slope13             Slope: Segment 13
67        slope14             Slope: Segment 14
68        slope15             Slope: Segment 15
69        slope16             Slope: Segment 16
70        slope17             Slope: Segment 17
71        slope18             Slope: Segment 18
72        slope19             Slope: Segment 19
73        slope20             Slope: Segment 20
74        slope21             Slope: Segment 21
75        slope22             Slope: Segment 22
76        slope23             Slope: Segment 23
77        slopeJ              Last Segment Slope
78        hrskink1            Kink 1 (Hours)
79        hrskink2            Kink 2 (Hours)
80        hrskink3            Kink 3 (Hours)
81        hrskink4            Kink 4 (Hours)
82        hrskink5            Kink 5 (Hours)
83        hrskink6            Kink 6 (Hours)
84        hrskink7            Kink 7 (Hours)
85        hrskink8            Kink 8 (Hours)
86        hrskink9            Kink 9 (Hours)
87        hrskink10           Kink 10 (Hours)
88        hrskink11           Kink 11 (Hours)
89        hrskink12           Kink 12 (Hours)
90        hrskink13           Kink 13 (Hours)
91        hrskink14           Kink 14 (Hours)
92        hrskink15           Kink 15 (Hours)
93        hrskink16           Kink 16 (Hours)
94        hrskink17           Kink 17 (Hours)
95        hrskink18           Kink 18 (Hours)
96        hrskink19           Kink 19 (Hours)
97        hrskink20           Kink 20 (Hours)
98        hrskink21           Kink 21 (Hours)
99        hrskink22           Kink 22 (Hours)
100       mtr1                Marginal Tax Rate: Segment 1
101       mtr2                Marginal Tax Rate: Segment 2
102       mtr3                Marginal Tax Rate: Segment 3
103       mtr4                Marginal Tax Rate: Segment 4
104       mtr5                Marginal Tax Rate: Segment 5
105       mtr6                Marginal Tax Rate: Segment 6
106       mtr7                Marginal Tax Rate: Segment 7
107       mtr8                Marginal Tax Rate: Segment 8
108       mtr9                Marginal Tax Rate: Segment 9
109       mtr10               Marginal Tax Rate: Segment 10
110       mtr11               Marginal Tax Rate: Segment 11
111       mtr12               Marginal Tax Rate: Segment 12
112       mtr13               Marginal Tax Rate: Segment 13
113       mtr14               Marginal Tax Rate: Segment 14
114       mtr15               Marginal Tax Rate: Segment 15
115       mtr16               Marginal Tax Rate: Segment 16
116       mtr17               Marginal Tax Rate: Segment 17
117       mtr18               Marginal Tax Rate: Segment 18
118       mtr19               Marginal Tax Rate: Segment 19
119       mtr20               Marginal Tax Rate: Segment 20
120       mtr21               Marginal Tax Rate: Segment 21
121       mtr22               Marginal Tax Rate: Segment 22
122       mtr23               Marginal Tax Rate: Segment 23
123       mtrsim              Observed Marginal Tax Rate
124       fdtrsim             First Dollar Marginal Tax Rate
125       cpi                 Consumer Price Index
126       p_wage1             Predicted Wage (Nonparametric Selection Corrected)
127       p_dlfp1             Labor Force Propensity Score (Nonparametric Selection Corrected)
128       age                 Age
129       dhome               Dummy For Own Home
130       fsize               Family Size
131       k12                 Kids 1-2 Years
132       k35                 Kids 3-5 Years Old
133       nkids               Number of Kids
134       state               State Of Residence
135       educ                Years of Education
136       ddis                Dummy for Poor Health
137       occ3digit           3-Digit Occupation Code
138       ind3digit           3-Digit Industry Code
139       dunion              Union Dummy
140       dwhite              Dummy for White
141       mjocc               Major Occupation
142       mjind               Major Industry
143       dlfp1               Labor Force Participation Dummy
144       age2                Age Square
145       age3                Age Cube
146       age4                Age ^ Four
147       educ2               Education Square
148       educ3               Education Cube
149       educ4               Education ^ Four
150       ageeduc             Age X Education
151       age2educ            Age Square X Education
152       kidsu6              Kids Under Six
153       p_dlfp12            Labor Force Propensity Score Square
154       p_dlfp13            Labor Force Propensity Score Cube
155       p_dlfp14            Labor Force Propensity Score ^ Four
156       cpidef85            CPI Deflator 85
157       yv1                 Virtual Income: Segment 1
158       yv2                 Virtual Income: Segment 2
159       yv3                 Virtual Income: Segment 3
160       yv4                 Virtual Income: Segment 4
161       yv5                 Virtual Income: Segment 5
162       yv6                 Virtual Income: Segment 6
163       yv7                 Virtual Income: Segment 7
164       yv8                 Virtual Income: Segment 8
165       yv9                 Virtual Income: Segment 9
166       yv10                Virtual Income: Segment 10
167       yv11                Virtual Income: Segment 11
168       yv12                Virtual Income: Segment 12
169       yv13                Virtual Income: Segment 13
170       yv14                Virtual Income: Segment 14
171       yv15                Virtual Income: Segment 15
172       yv16                Virtual Income: Segment 16
173       yv17                Virtual Income: Segment 17
174       yv18                Virtual Income: Segment 18
175       yv19                Virtual Income: Segment 19
176       yv20                Virtual Income: Segment 20
177       yv21                Virtual Income: Segment 21
178       yv22                Virtual Income: Segment 22
179       yv23                Virtual Income: Segment 23
180       yvJ                 Last Segment Virtual Income
181       numkink             Number of Kinks
182       numseg              Number of Segments
183       dyr1                year== 85.0000
184       occind3digit        group(occ3digit ind3digit)
185       occind3digitstate   group(occ3digit ind3digit state)
186       mjoccind            group(mjocc mjind)
187       mjoccstate          group(mjocc state)
188       mjoccindstate       group(mjocc mjind state)
189       taxmar              Marital Status (1single 2joint 3head)
190       deps                Dependent Exemptions
191       ageex               Number of taxpayers over 65 years of age
192       assetinc            Other property income
193       hwpeninc            Taxable Pensions
194       hwsocsec            Gross Social Security Income
195       othertri            Other non-taxable transfer Income
196       rentpaid            Rent Paid
197       proptax             Property taxes paid
198       item                Itemized deductions
199       childcare           Child care expenses
200       hwworkplusuecomp    Unemployment Compensation

HOW TO READ THE DATASET INTO STATA

To read the dataset into STATA, put the file in a directory (e.g. C:/mydirectory), paste the following code into a STATA do file, and run the do file.
*******CODE TO READ FILE IN STATA******
#delimit ;
insheet
  caseid year hrs1 wage1 nly agi dyr2
  slope10yv11 slope10yv12 slope10yv13 slope10yv14 slope10yv15
  slope11yv10 slope11yv11 slope11yv12 slope11yv13 slope11yv14
  slope12yv10 slope12yv11 slope12yv12 slope12yv13
  slope13yv10 slope13yv11 slope13yv12
  slope14yv10 slope14yv11 slope15yv10
  sumlj1dslope0dyv2 sumlj1dslope1dyv1 sumlj1dslope1dyv2
  sumlj1dslope2dyv0 sumlj1dslope2dyv1 sumlj1dslope2dyv2
  sumlj2dslope0dyv2 sumlj2dslope1dyv1 sumlj2dslope1dyv2
  sumlj2dslope2dyv0 sumlj2dslope2dyv1 sumlj2dslope2dyv2
  sumlj1dslope0dyv0 sumlj1dslope0dyv1 sumlj1dslope1dyv0
  sumlj2dslope0dyv0 sumlj2dslope0dyv1 sumlj2dslope1dyv0
  dsJ1s11 dyvJ1yv11 dsJ2s12 dyvJ2yv12 dsJ1yvJ1 dsJ1yvJ2 dsJ2yvJ1 dsJ2yvJ2
  slope1 slope2 slope3 slope4 slope5 slope6 slope7 slope8 slope9 slope10
  slope11 slope12 slope13 slope14 slope15 slope16 slope17 slope18 slope19
  slope20 slope21 slope22 slope23 slopeJ
  hrskink1 hrskink2 hrskink3 hrskink4 hrskink5 hrskink6 hrskink7 hrskink8
  hrskink9 hrskink10 hrskink11 hrskink12 hrskink13 hrskink14 hrskink15
  hrskink16 hrskink17 hrskink18 hrskink19 hrskink20 hrskink21 hrskink22
  mtr1 mtr2 mtr3 mtr4 mtr5 mtr6 mtr7 mtr8 mtr9 mtr10 mtr11 mtr12 mtr13
  mtr14 mtr15 mtr16 mtr17 mtr18 mtr19 mtr20 mtr21 mtr22 mtr23
  mtrsim fdtrsim cpi p_wage1 p_dlfp1
  age dhome fsize k12 k35 nkids state educ ddis occ3digit ind3digit
  dunion dwhite mjocc mjind dlfp1
  age2 age3 age4 educ2 educ3 educ4 ageeduc age2educ kidsu6
  p_dlfp12 p_dlfp13 p_dlfp14 cpidef85
  yv1 yv2 yv3 yv4 yv5 yv6 yv7 yv8 yv9 yv10 yv11 yv12 yv13 yv14 yv15
  yv16 yv17 yv18 yv19 yv20 yv21 yv22 yv23 yvJ
  numkink numseg dyr1
  occind3digit occind3digitstate mjoccind mjoccstate mjoccindstate
  taxmar deps ageex assetinc hwpeninc hwsocsec othertri rentpaid proptax
  item childcare hwworkplusuecomp
using C:/mydirectory/psid_85_and_89_nonseo_wives.csv;
#delimit cr
***************************************

HOW TO MAP VARIABLES IN THE DATASET TO ORIGINAL PSID VARIABLES

The file variable_mapping_85_89.txt has the crosswalk from the variables in the dataset to the original PSID variables from which they were constructed.

REPLICATION OF RESULTS IN TABLES 2-5

Results in Tables 2 and 4 can be replicated by estimating an SCLS regression of hrs1 on the right-hand-side variables described in the paper, using the programs downloaded from Kenneth Chay's website:

  http://elsa.berkeley.edu/~kenchay/ftp/binresp/programs/all.zip

For details, see Chay K, Powell JL. 2001. Semiparametric Censored Regression Models. Journal of Economic Perspectives 15: 29-42.

Results in Tables 3 and 4 can be replicated by estimating OLS of hrs1 on the right-hand-side variables described in the paper, on the selected sample of wives in the labor force (dlfp1=1), and by including a quartic power series in the propensity score for labor force participation (p_dlfp1, p_dlfp12, p_dlfp13, p_dlfp14).

For any further details about the data and methodology, refer to the paper or contact the author.

Anil Kumar
Research Department
Federal Reserve Bank of Dallas
2200 N. Pearl St.
Dallas, TX-75201
Phone: +1 214-922-5856
Email: anil.kumar [AT] dal.frb.org
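
ILLUSTRATIVE SKETCH: FROM TAXSIM GRID OUTPUT TO KINKS, SLOPES AND VIRTUAL INCOMES

Points (2) and (3) under IMPORTANT POINTS TO NOTE WHILE RUNNING TAXSIM describe how the budget set is built from marginal tax rates on a wage-income grid. The Python sketch below (Python rather than Stata, for illustration only) shows one way this could be done for a single person. It is not the author's code, and it rests on the following assumptions, which are not stated in this README: a kink is placed at the first grid point where the total marginal tax rate changes, so its location is only as precise as the grid step; the slope of a segment is wage * (1 - total marginal tax rate); the first virtual income equals non-labor income; and later virtual incomes follow from continuity of the budget line at each kink. The exact construction used in the paper is given by its Equations (6) and (10) and may differ. All function and argument names are hypothetical.

```python
import math


def total_mtr(federal, state, payroll_employee, payroll_employer):
    """Total marginal rate when the employee bears the entire payroll tax."""
    return federal + state + payroll_employee + payroll_employer


def budget_segments(grid, mtr, wage, nonlabor_income):
    """Build a piecewise-linear budget set for one person.

    grid            ascending wage incomes sent to TAXSIM
    mtr             total marginal tax rate at each grid point
    wage            gross hourly wage
    nonlabor_income assumed virtual income of the first segment
    Returns (kinks in hours, segment tax rates, slopes, virtual incomes).
    """
    seg_mtr = [mtr[0]]
    kinks = []
    for income, rate in zip(grid[1:], mtr[1:]):
        # A change in the marginal rate starts a new segment (assumed rule).
        if not math.isclose(rate, seg_mtr[-1], abs_tol=1e-9):
            kinks.append(income / wage)  # convert income threshold to hours
            seg_mtr.append(rate)
    slopes = [wage * (1.0 - rate) for rate in seg_mtr]
    virtual = [nonlabor_income]
    for j, kink in enumerate(kinks):
        # Continuity at the kink: yv_j + s_j * k = yv_{j+1} + s_{j+1} * k
        virtual.append(virtual[-1] + kink * (slopes[j] - slopes[j + 1]))
    return kinks, seg_mtr, slopes, virtual


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the dataset.
    grid = [0, 10000, 20000, 30000]
    mtr = [0.1, 0.1, 0.2, 0.2]
    print(budget_segments(grid, mtr, wage=10.0, nonlabor_income=5000.0))
```

In the hypothetical example the rate rises from 0.1 to 0.2 at a wage income of 20000, which at a wage of 10 places a single kink at 2000 hours and yields two segments.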