James G. MacKinnon and Matthew D. Webb, "Wild Bootstrap Inference for
Wildly Different Cluster Sizes", Journal of Applied Econometrics,
Vol. 32, No. 2, 2017, pp. 233-254.

[Added 2019-08-06] Note that there were errors in some of the 
simulations in both the paper and the appendix. These have been 
corrected in an updated version of the online appendix, 
mackinnon-webb-appendix.pdf, which may be found in this directory.

*****

There are three types of file: Stata .dta files, Stata DO files, and
CSV files.

All .dta files are zipped in the file mw-dta.zip. The DO and CSV files
are ASCII files in DOS format. They are zipped in the file
mw-files.zip. Unix/Linux users should use "unzip -a" with mw-files.zip
but *not* with mw-dta.zip.


************************
Section 7 - Placebo Laws
************************

The data for this section attempt to recreate (in most respects) the
data used in the simulations in Bertrand, Duflo, and Mullainathan
(2004). The data come from the Current Population Survey,
specifically the Merged Outgoing Rotation Group. The data were
downloaded from http://www.nber.org/morg/annual/ .

The dataset combines the Merged Outgoing Rotation Group data from all
states for 1979 to 1999. The variables used are:

-- age
   keep observations aged 25-50

-- minsamp -- months in sample
   keep only those who have been in the sample for 4 months

-- age squared -- generated

-- state of residence

-- earnwke -- weekly earnings
   drop observations with weekly earnings less than 20
   transform to log earnings

-- gradeat -- highest grade attended (prior to 1992)
   transformed to four categorical variables for educational attainment:
   less than high school (1), high school graduate (2), some college (3),
   college graduate (4)
   0-11 --> 1; 12 --> 2; 13-15 --> 3; 16-18 --> 4

-- grade92 -- highest grade completed (from 1992 on)
   the definition of grade attained changed in 1992; the new
   transformation is:
   31-38 --> 1; 39 --> 2; 40-42 --> 3; 43-46 --> 4

-- weight -- final weight (not used)
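
The recoding and sample restrictions listed above can be sketched as
follows. This is an illustrative Python sketch, not the code used to
build placebo-laws.dta; the function names education_category and
keep_observation are hypothetical, and reading the minsamp rule as
minsamp == 4 is an assumption based on the description above.

```python
def education_category(year, gradeat=None, grade92=None):
    """Map CPS education codes to the four categories listed above.

    Returns 1 (less than high school), 2 (high school graduate),
    3 (some college), or 4 (college graduate); None if out of range.
    """
    if year < 1992:
        # gradeat: highest grade attended (prior to 1992)
        code = gradeat
        bins = [(0, 11, 1), (12, 12, 2), (13, 15, 3), (16, 18, 4)]
    else:
        # grade92: highest grade completed (from 1992 on)
        code = grade92
        bins = [(31, 38, 1), (39, 39, 2), (40, 42, 3), (43, 46, 4)]
    if code is None:
        return None
    for low, high, category in bins:
        if low <= code <= high:
            return category
    return None


def keep_observation(age, minsamp, earnwke):
    """Apply the age, months-in-sample, and earnings restrictions."""
    return 25 <= age <= 50 and minsamp == 4 and earnwke >= 20


print(education_category(1985, gradeat=14))
print(education_category(1995, grade92=39))
print(keep_observation(age=30, minsamp=4, earnwke=350.0))
```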

The dataset placebo-laws.dta contains the transformed data described
above. All variables have been demeaned by state to simplify the
replications.

The procedure for generating the -treatment- variable is described in
the paper. Within each replication, the -treatment- variable is
demeaned by state. For each replication, we estimate the regression
specified in equation (18) of the paper, except that the STATES dummy
variables are omitted because the data have been demeaned by state.
The results should be numerically identical.
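
The equivalence between including state dummies and demeaning by state
follows from the Frisch-Waugh-Lovell theorem. A small Python sketch
with made-up data (three hypothetical states and one regressor, not
the placebo-laws data) shows that the two approaches give the same
slope coefficient. The helper names are illustrative only.

```python
def demean_by_group(values, groups):
    """Subtract the group mean from each value."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(columns, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
           for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


# Made-up data: three states, one regressor.
state = ["A", "A", "A", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 4.0, 0.5, 1.5, 3.0, 5.0, 6.0]
y = [2.1, 2.9, 5.2, 7.0, 8.3, 1.0, 2.8, 4.1]

# (a) Regression on x plus a full set of state dummies.
dummies = [[1.0 if s == name else 0.0 for s in state]
           for name in ("A", "B", "C")]
beta_dummies = ols([x] + dummies, y)[0]

# (b) Regression on state-demeaned data, with no dummies or constant.
x_dm = demean_by_group(x, state)
y_dm = demean_by_group(y, state)
beta_demeaned = ols([x_dm], y_dm)[0]

print(beta_dummies, beta_demeaned)
```

The sketch compares coefficient estimates only; it does not compute
standard errors.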

The ASCII file placebo-laws.csv contains the same data as
placebo-laws.dta in CSV format.

The file pl-panel.csv contains data aggregated to the state-year
level. These were used in the experiments in Figures 10, A.10, A.12,
and A.13. Variable names are slightly different, but should be
self-explanatory. We do not provide any DO files that use these data.


*****************************
Section 8 - Empirical Example
*****************************

The empirical example in the paper comes from Angrist and Kugler
(2008). The data used are publicly available on Josh Angrist's data
archive, which can be found at:

  http://economics.mit.edu/faculty/angrist/data1/data/angkug08 .

The DO files that we provide are adapted from the DO files that Prof.
Angrist has provided on the archive. The replication files here are
divided into four parts. Two parts replicate Panel A in our table, one
part for micro estimates and one part for aggregate estimates. The
other two parts do the same thing for Panel B. Estimates for Panel C
are obtained within the DO files for micro estimates for the two
panels. In this specification, the rural departments are -treated- and
the urban departments are -controls- because coca production was
concentrated in the rural areas. This is equation (19) in the paper.

One difference between the micro estimates and the aggregate estimates
is the way in which weights are dealt with. Within the micro
estimates, all variables including the constant are transformed by the
weight variable. Thus, for micro estimates, the regressions have no
constant term, and the weight is not specified. For the aggregate
regressions, a constant term is included, and the weight is specified.
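
The two ways of handling weights can give the same coefficients. The
Python sketch below assumes that "transformed by the weight variable"
means multiplying every variable, including the constant, by the
square root of the weight, which is the usual way to reproduce
weighted least squares; the exact transformation used in the DO files
is not stated here and should be checked there. The data and function
names are made up for illustration.

```python
import math


def wls_simple(x, y, w):
    """Weighted least squares of y on a constant and x: (const, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def ols_no_constant(c1, c2, y):
    """OLS of y on two columns, no added constant (2x2 normal equations)."""
    s11 = sum(p * p for p in c1)
    s12 = sum(p * q for p, q in zip(c1, c2))
    s22 = sum(q * q for q in c2)
    t1 = sum(p * v for p, v in zip(c1, y))
    t2 = sum(q * v for q, v in zip(c2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Made-up data and weights.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
w = [2.0, 1.0, 4.0, 0.5, 3.0, 1.5]

# Aggregate-style regression: constant included, weight specified.
const_wls, slope_wls = wls_simple(x, y, w)

# Micro-style regression: every variable, including the constant,
# is multiplied by sqrt(weight); no constant and no weight after that.
root = [math.sqrt(wi) for wi in w]
const_t = root
x_t = [r * xi for r, xi in zip(root, x)]
y_t = [r * yi for r, yi in zip(root, y)]
const_ols, slope_ols = ols_no_constant(const_t, x_t, y_t)

print(const_wls, slope_wls)
print(const_ols, slope_ols)
```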

Both the micro and the aggregate regressions include many dummy
variables, all of which have names that start with d. Some of these
dummies are for states in Colombia, others (dage*) are for ages, and
others are for years. A full list of the variables can be found in the
DO files. The variables of interest are urban95-97, urban98-00,
rural95-97, and rural98-00.

The DO files calculate cluster-robust and wild cluster bootstrap P
values for the coefficients on the variables of interest. They also
calculate wild cluster bootstrap P values for the F tests of whether
both urban coefficients are 0 and whether both rural coefficients
are 0.
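
For readers unfamiliar with the method, the following Python sketch
shows a restricted wild cluster bootstrap P value with Rademacher
weights for a single coefficient. It is a simplified illustration, not
a translation of the DO files: it assumes a model with one regressor
and no constant, tests the null that the coefficient is 0, uses
simulated data, and does not cover the F tests. The function names,
the number of replications, and the seeds are arbitrary choices.

```python
import math
import random


def cluster_t(x, y, cluster):
    """Slope and cluster-robust t statistic for y = beta*x + u."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Sum the scores x_i * residual_i within each cluster.
    scores = {}
    for xi, yi, g in zip(x, y, cluster):
        scores[g] = scores.get(g, 0.0) + xi * (yi - beta * xi)
    n, k, n_clusters = len(y), 1, len(scores)
    # Small-sample factor G/(G-1) * (N-1)/(N-k).
    factor = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    variance = factor * sum(s * s for s in scores.values()) / sxx ** 2
    return beta, beta / math.sqrt(variance)


def wild_cluster_p(x, y, cluster, reps=999, seed=12345):
    """Symmetric restricted wild cluster bootstrap P value, H0: beta = 0."""
    rng = random.Random(seed)
    _, t_hat = cluster_t(x, y, cluster)
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(reps):
        # One Rademacher draw per cluster, shared by its observations.
        sign = {g: rng.choice((-1.0, 1.0)) for g in groups}
        # Under H0 the restricted fit is 0, so the restricted
        # residual is y itself.
        y_star = [sign[g] * yi for yi, g in zip(y, cluster)]
        _, t_star = cluster_t(x, y_star, cluster)
        if abs(t_star) > abs(t_hat):
            exceed += 1
    return exceed / reps


# Simulated data: 8 clusters of very different sizes, true beta = 0.
data_rng = random.Random(1)
sizes = [3, 4, 5, 6, 8, 10, 15, 25]
x, y, cluster = [], [], []
for g, size in enumerate(sizes):
    shock = data_rng.gauss(0.0, 1.0)  # common within-cluster component
    for _ in range(size):
        x.append(data_rng.gauss(0.0, 1.0))
        y.append(shock + data_rng.gauss(0.0, 1.0))
        cluster.append(g)

p_value = wild_cluster_p(x, y, cluster)
print(p_value)
```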


Adult Men -- Column 6

Micro Estimates

This section uses the dataset AK_col6_micro.dta and the DO file
AK_col6_micro.do. 

Aggregate Estimates

The aggregate estimates repeat the analysis, but for the mean value
for each state-year pair. This section uses the dataset
AK_col6_agg.dta and the DO file AK_col6_agg.do.


Teenage Boys -- Column 9

Micro Estimates

This section uses the dataset AK_col9_micro.dta and the DO file
AK_col9_micro.do. The analysis for adult men is repeated using
observations on teenage boys instead.

Aggregate Estimates

This section uses the dataset AK_col9_agg.dta and the DO file
AK_col9_agg.do.


The four AK*.csv files contain the same data as the corresponding
AK*.dta files.