Neus Herranz, Ricardo Mora, and Javier Ruiz-Castillo, "An Algorithm to
Reduce the Occupational Space in Gender Segregation Studies", Journal of
Applied Econometrics, Vol. 20, No. 1, 2005, pp. 25-37.

The source of the data is the "Encuesta de Poblacion Activa" (EPA), the
Spanish Labor Force Survey. The survey has been carried out quarterly
since 1975 and is collected by the National Bureau of Statistics (INE).
The EPA is a rotating panel in which each household is interviewed
during 8 consecutive quarters; thus, one eighth of the sample is renewed
every quarter. In this paper, data from the second quarter are taken as
representative of the year as a whole.

The data set has 258,451 individual records from the years 1977 (71,864
obs.), 1992 (62,663), 1994 (57,548), and 2000 (66,376).

The data set is stored in the file , an ASCII file in DOS format. Each
row represents an individual record. The variables are distributed
across the columns in fixed format:

NAME      COLUMNS   DESCRIPTION
sexo      1-2       0: male, 1: female
ocup      3-6       Before 1993: NCO 1979. After 1993: NCO 1994
act       7-9       Before 1993: NCI 1974. After 1993: NCI 1993
                    modified to ensure minimum cell size.
facele    10-17     Elevation factor
muj       18-25     sexo*facele
ocupind   26-33     Occupation/Industry Codes. Before 1993: 106
                    categories. After 1993: 301 categories. See
                    Appendix in reference.
grup      34-36     Major Occupation/Industry Codes. See Appendix.
year      37-42     Year of interview.

The data file is zipped in the file hmrc-data.zip. Unix users should use
"unzip -a".
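
The fixed-format layout described above can be read by slicing each row
at the listed column positions. The following Python sketch is only an
illustration and is not part of the original distribution. The function
names (parse_record, read_records, female_share) and the path argument
are hypothetical, and the numeric types are assumptions: sexo and year
are treated as integers, facele and muj as decimal numbers, and the
classification codes are kept as text.

```python
# Column layout copied from the variable table: (name, first, last),
# with 1-based inclusive column positions.
LAYOUT = [
    ("sexo", 1, 2),
    ("ocup", 3, 6),
    ("act", 7, 9),
    ("facele", 10, 17),
    ("muj", 18, 25),
    ("ocupind", 26, 33),
    ("grup", 34, 36),
    ("year", 37, 42),
]

# Assumed types; the README does not state how the values are written.
INT_FIELDS = {"sexo", "year"}
FLOAT_FIELDS = {"facele", "muj"}


def parse_record(line):
    """Split one fixed-format row into a dict keyed by variable name."""
    line = line.rstrip("\r\n")  # drop DOS or Unix line endings
    record = {}
    for name, first, last in LAYOUT:
        raw = line[first - 1:last].strip()
        if name in INT_FIELDS:
            # int(float(...)) tolerates values written with a decimal point
            record[name] = int(float(raw))
        elif name in FLOAT_FIELDS:
            record[name] = float(raw)
        else:
            record[name] = raw  # classification codes stay as text
    return record


def read_records(path):
    """Yield one parsed record per non-blank row of the unzipped data file."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line)


def female_share(records):
    """Weighted share of women: sum(muj) / sum(facele), as muj = sexo*facele."""
    total_weight = 0.0
    female_weight = 0.0
    for record in records:
        total_weight += record["facele"]
        female_weight += record["muj"]
    return female_weight / total_weight
```

Because muj is defined as sexo*facele, the ratio of its sum to the sum
of facele gives the weighted proportion of women in any subset of
records, for example female_share(r for r in read_records(path) if
r["year"] == 1994), where path points to the unzipped data file.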