James G. MacKinnon and Matthew D. Webb, "Wild Bootstrap Inference for Wildly Different Cluster Sizes", Journal of Applied Econometrics, Vol. 32, No. 2, 2017, pp. 233-254.

[Added 2019-08-06] Note that there were errors in some of the simulations in both the paper and the appendix. These have been corrected in an updated version of the online appendix, mackinnon-webb-appendix.pdf, which may be found in this directory.

*****

There are three types of file: Stata .dta files, Stata DO files, and CSV files. All .dta files are zipped in the file mw-dta.zip. The DO and CSV files are ASCII files in DOS format. They are zipped in the file mw-files.zip. Unix/Linux users should use "unzip -a" with mw-files.zip but *not* with mw-dta.zip.

************************
Section 7 - Placebo Laws
************************

The data for this section attempt to recreate (in most respects) the data used in the simulations in Bertrand, Duflo, and Mullainathan (2004). The data come from the Current Population Survey, specifically the Merged Outgoing Rotation Group. The data were downloaded from http://www.nber.org/morg/annual/ . The dataset combines the Merged Outgoing Rotation Group data from all states for 1979 to 1999.

The variables used are:

-- age
     keep observations aged 25-50
-- minsamp -- months in sample
     keep only those who have been in the sample for 4 months
-- age squared -- generated
-- state of residence
-- earnwke -- weekly earnings
     drop observations with weekly earnings less than 20
     transform to log earnings
-- gradeat -- highest grade attended (prior to 1992)
     transformed to four categorical variables for educational attainment:
     less than high school (1), high school graduate (2), some college (3), college graduate (4)
     0-11 --> 1; 12 --> 2; 13-15 --> 3; 16-18 --> 4
-- grade92 -- highest grade completed (from 1992 on)
     The definition of grade attained changed in 1992.
     The new transformation is: 31-38 --> 1; 39 --> 2; 40-42 --> 3; 43-46 --> 4
-- weight -- final weight (not used)

The dataset placebo-laws.dta contains the above transformed data. All variables have been demeaned by state to simplify the replications. The procedure for generating the -treatment- variable is described in the paper. Within each replication, the -treatment- variable is demeaned by state.

For each replication, we estimate the regression specified in equation (18) in the paper. The only difference between that equation and what is estimated is that the STATES dummy variables are not included, because the dataset has already been demeaned by state. The results should be numerically identical.

The ASCII file placebo-laws.csv contains the same data as placebo-laws.dta in CSV format.

The file pl-panel.csv contains data aggregated to the state-year level. These data were used in the experiments in Figures 10, A.10, A.12, and A.13. The variable names are slightly different, but should be self-explanatory. We do not provide any DO files that use these data.

******************************
Section 8 - Empirical Example
******************************

The empirical example in the paper comes from Angrist and Kugler (2008). The data used are publicly available on Josh Angrist's data archive, which can be found at:

http://economics.mit.edu/faculty/angrist/data1/data/angkug08 .

The DO files that we provide are adapted from the DO files that Prof. Angrist has provided on the archive.

The replication files here are divided into four parts. Two parts replicate Panel A in our table, one for the micro estimates and one for the aggregate estimates. The other two parts do the same for Panel B. Estimates for Panel C are obtained within the DO files for the micro estimates for the two panels.

In this specification, the rural departments are -treated- and the urban departments are -controls-, because coca production was concentrated in the rural areas. This is equation (19) in the paper.
One difference between the micro estimates and the aggregate estimates is the way in which weights are handled. In the micro estimates, all variables, including the constant, are transformed by the weight variable. Thus, for the micro estimates, the regressions have no separate constant term (the transformed constant takes its place), and the weight is not specified. For the aggregate regressions, a constant term is included, and the weight is specified.

With both the micro estimates and the aggregate estimates, many dummy variables are included, all of whose names start with a d. Many of these dummies are for departments (states) in Colombia, others are for ages (dage*), and others are for years. A full list of the variables can be found in the DO files. The variables of interest are urban95-97, urban98-00, rural95-97, and rural98-00.

The DO files calculate cluster-robust and wild cluster bootstrap P values for the coefficients on the variables of interest. The DO files also calculate wild cluster bootstrap P values for the F tests of whether both urban coefficients are 0 and whether both rural coefficients are 0.

Adult Men -- Column 6

Micro Estimates

This section uses the dataset AK_col6_micro.dta and the DO file AK_col6_micro.do.

Aggregate Estimates

The aggregate estimates repeat the analysis, but using the mean value for each state-year pair. This section uses the dataset AK_col6_agg.dta and the DO file AK_col6_agg.do.

Teenage Boys -- Column 9

Micro Estimates

This section uses the dataset AK_col9_micro.dta and the DO file AK_col9_micro.do. The analysis from above is essentially repeated, but using observations on teenage boys rather than adult men.

Aggregate Estimates

This section uses the dataset AK_col9_agg.dta and the DO file AK_col9_agg.do.

The four AK*.csv files contain the same data as the corresponding AK*.dta files.
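Why the micro regressions need no constant term can be illustrated with a small numerical check. This is a minimal sketch, assuming that "transformed by the weight variable" means multiplying every variable, including the constant, by the square root of the weight, which is the standard way of turning weighted least squares into ordinary least squares; the actual DO files may implement the transformation differently. It uses synthetic data, not the Angrist-Kugler data, and the function names are hypothetical. It shows that OLS with no intercept on the transformed data reproduces the coefficients of a weighted regression with a constant.

```python
import numpy as np


def wls_with_constant(y, x, w):
    """Weighted least squares with an intercept.

    Minimises sum_i w_i * (y_i - a - b * x_i)**2 by solving the
    normal equations (X'WX) beta = X'Wy directly.
    """
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # broadcasting gives X' diag(w) without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)


def ols_on_transformed(y, x, w):
    """OLS with no intercept on data multiplied by sqrt(w).

    Assumption: the weight transformation is multiplication by sqrt(w).
    The transformed constant, sqrt(w), enters as an ordinary regressor,
    so no separate intercept is added.
    """
    s = np.sqrt(w)
    Xs = np.column_stack([s, s * x])  # transformed constant, transformed x
    beta, *_ = np.linalg.lstsq(Xs, s * y, rcond=None)
    return beta


# Synthetic data for illustration only (hypothetical weights and outcomes).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
w = rng.uniform(0.5, 3.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

b_wls = wls_with_constant(y, x, w)
b_ols = ols_on_transformed(y, x, w)
print(b_wls)
print(b_ols)
print(np.allclose(b_wls, b_ols))  # True: the two sets of estimates coincide
```

Because the transformed constant sqrt(w) varies across observations, it cannot be absorbed by an ordinary intercept; adding one as well would change the model rather than reproduce the weighted regression.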