Martin Huber and Blaise Melly, "A Test of the Conditional Independence Assumption in Sample Selection Models", Journal of Applied Econometrics, Vol. 30, No. 7, 2015, pp. 1144-1168.

The data used for the simulations in Section 4 of the paper are in the file simulations.txt, an ASCII file in DOS format. The same data are also in the Stata dataset simulations.dta.

The data used for the application in Section 5 of the paper are in the file application.txt, also an ASCII file. The same data are also in the Stata dataset application.dta.

The two text files are zipped in the file hm-data-ascii.zip. Unix/Linux users should use "unzip -a". The two .dta files are zipped in the file hm-data-stata.zip. Unix/Linux users should use "unzip".

The econometric results in this study were produced using the programming language R. The code used to generate Figure 1 is in figure1.R. The code used to generate Figure 2 is in figure2.R. The code used to generate the simulation results in Section 4 is in simulations.R. The code used to generate the empirical results in Section 5 is in application.R. All the R files are ASCII files in DOS format. They are zipped in the file hm-codes.zip. Unix/Linux users should use "unzip -a".

Data Description:

The original source for both datasets is the "merged outgoing rotation groups" extract of the CPS for the year 2011. This file (morg11.dta) can be downloaded from http://data.nber.org/morg/annual/. The dataset is documented in detail on the same NBER website.

The Stata do file "CPS-ORG-2011-data-preparation.do", which generated application.dta and simulations.dta starting from morg11.dta, is provided in hm-codes.zip. It is not strictly needed, because its only purpose is to produce these two datasets, which are already included.

simulations.dta contains 63,697 observations and 11 variables.
application.dta contains 45,296 observations and 23 variables.
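
For users who do not have an unzip program with the "-a" option, the DOS-to-Unix line-ending conversion can be approximated with a short script. The following is a minimal sketch in Python (standard library only), not part of the original archive. The function name extract_with_unix_newlines and the list of suffixes treated as text are assumptions made for illustration, based on the file names mentioned in this readme. Binary files such as the .dta datasets are copied unchanged.

```python
import zipfile
from pathlib import Path

# Suffixes treated as text files. This list is an assumption based on the
# file names mentioned in the readme (.txt data, .R code, .do Stata script).
TEXT_SUFFIXES = {".txt", ".r", ".do"}


def extract_with_unix_newlines(archive, dest="."):
    """Extract `archive` into `dest`, rewriting CRLF as LF in text members.

    A rough stand-in for `unzip -a`: members whose suffix is in
    TEXT_SUFFIXES are converted, all other members are written byte for byte.
    Returns the list of paths that were written.
    """
    dest_root = Path(dest).resolve()
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = (dest_root / info.filename).resolve()
            # Refuse members that would be written outside the destination.
            if dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            data = zf.read(info)
            if Path(info.filename).suffix.lower() in TEXT_SUFFIXES:
                data = data.replace(b"\r\n", b"\n")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            written.append(target)
    return written
```

With the three archives in the current directory, a call such as extract_with_unix_newlines("hm-data-ascii.zip") or extract_with_unix_newlines("hm-codes.zip") would unpack the text files with Unix line endings. For hm-data-stata.zip no conversion is needed, so plain "unzip" remains the simplest choice.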