Boris Augurzky and Jochen Kluve, "Assessing the performance of matching
when selection into treatment is strong", Journal of Applied
Econometrics, Vol. 22, No. 3, 2007, pp. 533-557.

All program and data files are ASCII files in DOS format. Program files are
zipped in the file ak-progs.zip. Data files are zipped in the file
ak-data.zip. Unix users should use "unzip -a".

Estimation procedure

Overview of the optimal matching procedure:

(1) The raw data are extracted from the National Longitudinal Survey of
Youth 1979 and cover the period 1979 to 1994: "PRESPEC.ASC". Note that a
dictionary at the beginning of the ASCII data file explains the
variables.

(2) The STATA Do-File "SPEC04.DO" prepares the extracted data for the
matching algorithm (e.g. variable definitions and propensity score
estimation). Matching itself is done in GAUSS and in SAS. The relevant data
set that is used in the following step is "BMB.ASC". 

(3) The GAUSS file "PRE09.G" prepares the data such that they can be used by
the operations research procedure "netflow" in SAS, which is a general
routine to solve minimum cost flow problems. The main purpose of "PRE09.G"
is to define the matching parameters and choose the relevant subsample.
"PRE09.G" uses the ASCII data set "BMB.ASC". Data specifications other than
"BMB.ASC" are possible but not used in the paper. They are available from
the authors upon request or can be generated using "SPEC04.DO". The output
files of "PRE09.G" are

a. PRE09.OUT as log file,
b. ARCINF1.OUT to ARCINF10.OUT for POST12.G,
c. ARCS1.OUT to ARCS10.OUT for OPTMATCH02.SAS,
d. NODES1.OUT to NODES10.OUT for OPTMATCH02.SAS.

(4) The SAS file "OPTMATCH02.SAS" performs the optimal full matching based on
the SAS procedure "netflow" and the above input files. Output files are

a. ARCOUT1.TXT to ARCOUT10.TXT for POST12.G.

(5) Finally, the GAUSS file "POST12.G" calculates the treatment effects
based on the SAS output and writes the results to output files.

Note that the folder structure in the files mentioned above has to be
adapted to the user's folder structure.
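
To illustrate how a matching problem can be written as a minimum cost flow
problem with nodes and arcs, the following Python sketch may help. It is not
the authors' code and it does not reproduce the optimal full matching of the
paper: it covers only the simpler one-to-one case, assumes the absolute
difference in propensity scores as arc cost, and solves the network with a
basic successive shortest path routine rather than SAS "netflow". All
function and variable names are hypothetical.

```python
def min_cost_flow(n_nodes, arcs, source, sink, required_flow):
    """Successive shortest paths; arcs are (tail, head, capacity, cost) tuples."""
    edges = []                       # entries: [head, residual capacity, cost]
    adjacency = [[] for _ in range(n_nodes)]
    for tail, head, capacity, cost in arcs:
        adjacency[tail].append(len(edges))
        edges.append([head, capacity, cost])      # forward arc at an even index
        adjacency[head].append(len(edges))
        edges.append([tail, 0, -cost])            # its reverse at the next odd index
    flow, total_cost = 0, 0.0
    while flow < required_flow:
        # Bellman-Ford, because residual arcs can carry negative costs.
        dist = [float("inf")] * n_nodes
        prev_edge = [-1] * n_nodes
        dist[source] = 0.0
        for _ in range(n_nodes - 1):
            changed = False
            for u in range(n_nodes):
                if dist[u] == float("inf"):
                    continue
                for e in adjacency[u]:
                    head, capacity, cost = edges[e]
                    if capacity > 0 and dist[u] + cost < dist[head] - 1e-12:
                        dist[head] = dist[u] + cost
                        prev_edge[head] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == float("inf"):
            raise ValueError("required flow cannot be shipped")
        # Find the bottleneck along the path, then push that much flow.
        push, v = required_flow - flow, sink
        while v != source:
            push = min(push, edges[prev_edge[v]][1])
            v = edges[prev_edge[v] ^ 1][0]
        v = sink
        while v != source:
            e = prev_edge[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            total_cost += push * edges[e][2]
            v = edges[e ^ 1][0]
        flow += push
    return total_cost, edges


def optimal_pair_match(treated_scores, control_scores):
    """One-to-one matching without replacement that minimises total distance."""
    n_t, n_c = len(treated_scores), len(control_scores)
    source, sink = 0, n_t + n_c + 1
    # Nodes: source, one per treated unit, one per control unit, sink.
    arcs = [(source, 1 + i, 1, 0.0) for i in range(n_t)]
    pair_arcs = {}                   # arc position -> (treated index, control index)
    for i, p in enumerate(treated_scores):
        for j, q in enumerate(control_scores):
            pair_arcs[len(arcs)] = (i, j)
            arcs.append((1 + i, 1 + n_t + j, 1, abs(p - q)))  # assumed cost
    arcs += [(1 + n_t + j, sink, 1, 0.0) for j in range(n_c)]
    total_cost, edges = min_cost_flow(n_t + n_c + 2, arcs, source, sink, n_t)
    # A treated-control arc with no residual capacity carries one unit of flow.
    pairs = [ij for k, ij in pair_arcs.items() if edges[2 * k][1] == 0]
    return sorted(pairs), total_cost
```

For example, optimal_pair_match([0.55, 0.60], [0.62, 0.40]) pairs the first
treated unit with the second control and the second treated unit with the
first control, because that assignment has the smallest total distance.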

Overview of the greedy full matching procedure:

Steps (3) and (4) are both done in GAUSS by "GREEDY08.G". Since there is no
optimal matching, SAS is not required. All other steps are identical.

Overview of the greedy pair matching procedure:

Steps (3) and (4) are both done in GAUSS by "GPAIR07.G". Since there is no
optimal matching, SAS is not required. Step (5) is done in "GPPOST12.G".
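
As a rough illustration of what greedy pair matching means, the following
Python sketch assigns each treated unit, in the order given, to the nearest
control that has not been used yet. It is not a translation of "GPAIR07.G":
the processing order, matching without replacement, and the absolute
difference in propensity scores as distance are assumptions, and the function
name is hypothetical.

```python
def greedy_pair_match(treated_scores, control_scores):
    """Match each treated unit, in order, to the nearest unused control."""
    available = set(range(len(control_scores)))
    pairs = []
    for i, p in enumerate(treated_scores):
        if not available:
            break                    # no controls left for the remaining treated units
        # Nearest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda c: (abs(p - control_scores[c]), c))
        available.remove(j)
        pairs.append((i, j))
    return pairs


example_pairs = greedy_pair_match([0.55, 0.60], [0.62, 0.40])
# example_pairs is [(0, 0), (1, 1)]
```

The result depends on the order of the treated units: here the first treated
unit takes the control that would have been a closer match for the second
one, so the total distance is not necessarily the smallest possible.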

The data are in the file "prespec.asc":
- Original number of observations:      9202
- Original number of variables:          142 
- Size of original data set:          5.3 MB

The number of observations used for matching depends on the chosen parameter
specifications and the subsample; see Tables 2-5 in the paper.

Propensity score matching uses the following main variables.

(a) The hourly rate of pay, based on wages between 1979 and 1995, as the
treatment outcome.

(b) The treatment indicator, which equals 1 if the person received a
bachelor's degree (treatment) and 0 otherwise.

(c) The propensity score, estimated by a probit model with several
explanatory variables; see Table 7 of the paper.

Mahalanobis matching, in contrast, directly uses the variables of table 7
for calculating distances between treatment and control individuals. 
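
The Mahalanobis distance between two individuals can be sketched in Python as
follows. The example is restricted to two covariates so that the covariance
matrix can be inverted by hand; the paper uses the variables of table 7, and
which sample the covariance matrix is estimated from is an assumption here
(the sketch uses whatever rows are passed in). Function names are
hypothetical.

```python
import math


def covariance_2d(rows):
    """Sample variances and covariance of two covariates: (sxx, sxy, syy)."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mean_x) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mean_y) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """sqrt((a - b)' S^-1 (a - b)) for a 2x2 covariance matrix S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    quadratic_form = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quadratic_form)
```

With an identity covariance matrix the measure reduces to the Euclidean
distance; otherwise differences in a covariate with a large variance count
for less than equally large differences in a covariate with a small variance.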

We use the 1979 sample weights when calculating the treatment effect in
GAUSS.
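
How sample weights can enter a treatment effect estimate is sketched below
in Python for the pair matching case. This is not the calculation in
"POST12.G" or "GPPOST12.G": the sketch assumes that each matched pair
contributes the outcome difference between the treated individual and the
matched control, weighted by the treated individual's sample weight. Full
matching would additionally require averaging within matched sets. All names
and numbers are hypothetical.

```python
def weighted_effect_on_treated(pairs, outcome_treated, outcome_control, weight_treated):
    """Weighted mean of treated-minus-control outcome differences over matched pairs."""
    numerator = 0.0
    denominator = 0.0
    for i, j in pairs:
        w = weight_treated[i]        # assumed: weight of the treated individual
        numerator += w * (outcome_treated[i] - outcome_control[j])
        denominator += w
    if denominator == 0:
        raise ValueError("no matched pairs with positive weight")
    return numerator / denominator


# Made-up hourly wages and weights, for illustration only.
effect = weighted_effect_on_treated(
    pairs=[(0, 1), (1, 0)],
    outcome_treated=[12.0, 15.0],
    outcome_control=[10.0, 9.0],
    weight_treated=[100.0, 300.0],
)
# effect is (100 * 3.0 + 300 * 5.0) / 400 = 4.5
```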