Boris Augurzky and Jochen Kluve, "Assessing the performance of matching when selection into treatment is strong", Journal of Applied Econometrics, Vol. 22, No. 3, 2007, pp. 533-557.

All program and data files are ASCII files in DOS format. Program files are zipped in the file ak-progs.zip. Data files are zipped in the file ak-data.zip. Unix users should use "unzip -a".

Estimation procedure

Overview of the optimal matching procedure:

(1) The raw data are extracted from the National Longitudinal Survey of Youth 1979 and cover the time period 1979 to 1994: "PRESPEC.ASC". Note that there is a dictionary at the beginning of the ASCII data file that explains the variables.

(2) The STATA Do-File "SPEC04.DO" prepares the extracted data for the matching algorithm (e.g. variable definitions and propensity score estimation). Matching itself is done in GAUSS and in SAS. The relevant data set that is used in the following step is "BMB.ASC".

(3) The GAUSS file "PRE09.G" prepares the data so that they can be used by the operations research procedure "netflow" in SAS, which is a general routine for solving minimum cost flow problems. The main purpose of "PRE08.G" is to define the matching parameters and choose the relevant subsample. "PRE08.G" uses the ASCII data set "BMB.ASC". Data specifications other than "BMB.ASC" are possible but not used in the paper. They are available from the authors upon request or can be generated using "SPEC04.DO". The resulting data sets of "PRE09.G" are
a. PRE09.OUT as log file,
b. ARCINF1.OUT to ARCINF10.OUT for POST12.G,
c. ARCS1.OUT to ARCS10.OUT for OPTMATCH.SAS,
d. NODES1.OUT to NODES10.OUT for OPTMATCH.SAS.

(4) The SAS file "OPTMATCH02.SAS" performs the optimal full matching based on the SAS procedure "netflow" and the above input files. Output files are
a. ARCOUT1.TXT to ARCOUT10.TXT for POST12.G.

(5) Finally, the GAUSS file "POST12.G" calculates the treatment effects based on the SAS output and writes the results to output files.
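As a toy illustration of what "optimal" means in step (4), the following Python sketch finds the pair assignment that minimises the total propensity score distance. It is not the authors' code: it uses brute force instead of a minimum cost flow solver, it does pair matching rather than full matching, and the function name and all scores are hypothetical.

```python
from itertools import permutations


def optimal_pair_match(treated, controls):
    """Return (total_distance, pairs) minimising the summed |score difference|.

    Brute force over all assignments, so feasible only for toy inputs.
    Assumes len(controls) >= len(treated). Each pair is
    (treated_index, control_index).
    """
    best_cost, best_pairs = float("inf"), None
    for perm in permutations(range(len(controls)), len(treated)):
        cost = sum(abs(treated[i] - controls[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs


# Hypothetical propensity scores, chosen only for illustration.
treated_scores = [0.50, 0.60]
control_scores = [0.58, 0.10]

total, pairs = optimal_pair_match(treated_scores, control_scores)
print(pairs, round(total, 2))
```

Here the first treated unit is paired with the distant control so that the second treated unit can take the close one, which gives the smallest total distance. A minimum cost flow routine reaches the same kind of solution without enumerating every assignment.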
Note that the folder structure in the files mentioned above has to be adapted to the user's folder structure.

Overview of the greedy full matching procedure:

Steps (3) and (4) are both done in GAUSS by "GREEDY08.G". Since there is no optimal matching, SAS is not required. All other steps are identical.

Overview of the greedy pair matching procedure:

Steps (3) and (4) are both done in GAUSS by "GPAIR07.G". Since there is no optimal matching, SAS is not required. Step (5) is done in "GPPOST12.G".

The data are in the file "prespec.asc":
- Original number of observations: 9202
- Original number of variables: 142
- Size of original data set: 5.3 MB

The number of observations used for matching depends on the chosen parameter specifications and the subsample; see tables 2 to 5 in the paper.

In the propensity score matching there are the following main variables.
(a) The hourly rate of pay as treatment outcome, based on wages between 1979 and 1995.
(b) The treatment indicator, which equals 1 if the person received a bachelor's degree (treatment) and 0 otherwise.
(c) The propensity score, estimated by a probit model given several explanatory variables; see table 7 of the paper.

Mahalanobis matching, in contrast, directly uses the variables of table 7 for calculating distances between treatment and control individuals.

We use the 1979 sample weights when calculating the treatment effect in GAUSS.
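The greedy pair matching and the weighted treatment effect described above can be sketched in Python as follows. This is a minimal illustration and not a translation of "GPAIR07.G" or "GPPOST12.G". The function names, the processing order of the treated units, the rule of weighting each pair by the treated unit's sample weight, and all numbers are assumptions made for the example.

```python
def greedy_pair_match(treated, controls):
    """Match each treated unit, in input order, to the nearest unused control.

    Distances are absolute propensity score differences. Ties are broken
    by the lower control index. Each pair is (treated_index, control_index).
    """
    unused = set(range(len(controls)))
    pairs = []
    for i, score in enumerate(treated):
        if not unused:
            break  # no controls left to match
        j = min(unused, key=lambda k: (abs(score - controls[k]), k))
        unused.remove(j)
        pairs.append((i, j))
    return pairs


def weighted_effect(pairs, y_treated, y_control, weights):
    """Weighted mean of treated-minus-control outcome differences.

    Assumption: each pair is weighted by the treated unit's sample weight.
    """
    num = sum(weights[i] * (y_treated[i] - y_control[j]) for i, j in pairs)
    den = sum(weights[i] for i, _ in pairs)
    return num / den


# Hypothetical inputs, chosen only for illustration.
treated_scores = [0.50, 0.60]
control_scores = [0.58, 0.10]
wage_treated = [12.0, 15.0]   # hourly pay of treated units
wage_control = [10.0, 9.0]    # hourly pay of control units
sample_weights = [1.0, 3.0]   # one weight per treated unit

pairs = greedy_pair_match(treated_scores, control_scores)
effect = weighted_effect(pairs, wage_treated, wage_control, sample_weights)
print(pairs, effect)
```

Because the greedy rule fixes each match as soon as it is made, the result depends on the order in which treated units are processed, and it need not minimise the total distance across all pairs.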