Neus Herranz, Ricardo Mora, and Javier Ruiz-Castillo, "An Algorithm to
Reduce the Occupational Space in Gender Segregation Studies", Journal of
Applied Econometrics, Vol. 20, No. 1, 2005, pp. 25-37.

The data come from the "Encuesta de Población Activa" (EPA), the Spanish
Labor Force Survey. The survey has been carried out quarterly since 1975
by the Spanish National Statistics Institute (INE).
Statistics (INE). The EPA is a rotating panel in which each household is 
interviewed during 8 consecutive quarters; thus, one eighth of the sample 
is renewed every quarter. In this paper, data from the second quarter are
taken as representative of the year as a whole.
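Under an idealised version of this rotation scheme (exactly one eighth of
households replaced every quarter, with no attrition or nonresponse), the
share of the quarter-t sample that is also interviewed in quarter t + k is
(8 - k)/8 for k <= 8 and zero afterwards. The short Python sketch below
computes that share; the name `overlap_share` is hypothetical, and real EPA
samples will deviate from it because of attrition and nonresponse.

```python
# Idealised 8-quarter rotating panel: every household stays in the sample
# for 8 consecutive quarters and one eighth of the sample is replaced each
# quarter (assumption: no attrition, nonresponse or replacement).
ROTATION_QUARTERS = 8


def overlap_share(lag_quarters):
    """Share of the quarter-t sample that is also interviewed in quarter
    t + lag_quarters under the idealised rotation scheme."""
    lag = abs(lag_quarters)
    return max(ROTATION_QUARTERS - lag, 0) / ROTATION_QUARTERS


# Second quarters of consecutive years are 4 quarters apart.
print(overlap_share(4))  # 0.5
# Second quarters of 1992 and 1994 are 8 quarters apart.
print(overlap_share(8))  # 0.0
```

Under this idealisation, second-quarter samples one year apart share half
their households, while samples eight or more quarters apart, such as the
second quarters of the years used here, share none.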

The data set contains 258,451 individual records from the years 1977
(71,864 obs.), 1992 (62,663), 1994 (57,548), and 2000 (66,376). It is
stored in the file <hmrc1.txt>, an ASCII file in DOS format (CR/LF line
endings), with one individual record per row.
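As a sanity check after loading the file, the records can be tallied by
year and compared with the counts above. This Python sketch assumes the
year occupies columns 37-42, as listed in the column table, and that every
non-blank line is one record; `count_by_year` and `check_counts` are
hypothetical helper names, not part of the distributed files.

```python
from collections import Counter

# Per-year record counts and total as documented in this README.
DOCUMENTED_COUNTS = {1977: 71_864, 1992: 62_663, 1994: 57_548, 2000: 66_376}
DOCUMENTED_TOTAL = 258_451

# The per-year counts add up to the documented total.
assert sum(DOCUMENTED_COUNTS.values()) == DOCUMENTED_TOTAL


def count_by_year(lines):
    """Tally records by the year field (columns 37-42, 1-based inclusive)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")  # tolerate DOS (CR/LF) line endings
        if not line.strip():
            continue  # skip blank lines, e.g. at the end of the file
        # Slice [36:42] is columns 37-42; float() first in case the field
        # carries a decimal part (an assumption about the exact format).
        counts[int(float(line[36:42]))] += 1
    return counts


def check_counts(counts):
    """True if the tallies match the documented per-year counts."""
    return dict(counts) == DOCUMENTED_COUNTS
```

For example, `check_counts(count_by_year(open("hmrc1.txt")))` should
return `True` for a complete copy of the file.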

The variables are laid out in fixed-width columns (column positions are
1-based and inclusive):

NAME      COLUMNS   DESCRIPTION
sexo      1-2       Sex: 0 = male, 1 = female.
ocup      3-6       Occupation. Before 1993: NCO 1979; after 1993: NCO 1994.
act       7-9       Industry. Before 1993: NCI 1974; after 1993: NCI 1993
                    modified to ensure a minimum cell size.
facele    10-17     Elevation factor (sampling weight).
muj       18-25     sexo*facele (equals facele for women, 0 for men).
ocupind   26-33     Occupation/industry codes. Before 1993: 106 categories;
                    after 1993: 301 categories. See the Appendix of the
                    reference article.
grup      34-36     Major occupation/industry codes. See the Appendix.
year      37-42     Year of interview.
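A minimal Python sketch for reading records with the layout above. It
slices each line at the documented column positions, keeps the
classification codes (`ocup`, `act`, `ocupind`, `grup`) as strings so any
leading zeros survive, converts `sexo` and `year` to integers and the
weights `facele` and `muj` to floats, and uses `muj = sexo*facele` to
compute a weighted share of women. The number formatting inside each field
is not documented here, so these conversions are assumptions; the names
`parse_record`, `muj_consistent`, and `weighted_female_share` are
hypothetical.

```python
# Column layout from the table above: (name, first column, last column),
# 1-based and inclusive.
LAYOUT = [
    ("sexo", 1, 2),
    ("ocup", 3, 6),
    ("act", 7, 9),
    ("facele", 10, 17),
    ("muj", 18, 25),
    ("ocupind", 26, 33),
    ("grup", 34, 36),
    ("year", 37, 42),
]
INT_FIELDS = {"sexo", "year"}
FLOAT_FIELDS = {"facele", "muj"}  # weights
# ocup, act, ocupind and grup stay as strings (classification codes).


def parse_record(line):
    """Split one fixed-format record into a dict of typed fields."""
    line = line.rstrip("\r\n")
    record = {}
    for name, first, last in LAYOUT:
        raw = line[first - 1:last].strip()
        if name in INT_FIELDS:
            record[name] = int(float(raw))
        elif name in FLOAT_FIELDS:
            record[name] = float(raw)
        else:
            record[name] = raw
    return record


def muj_consistent(record, tol=0.5):
    """Check muj == sexo*facele; tol is an arbitrary rounding allowance
    (assumption), since the stored fields have limited precision."""
    return abs(record["muj"] - record["sexo"] * record["facele"]) <= tol


def weighted_female_share(records):
    """Weighted share of women: sum of muj divided by sum of facele."""
    records = list(records)
    women = sum(r["muj"] for r in records)
    total = sum(r["facele"] for r in records)
    return women / total
```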

The data file is compressed in the archive hmrc-data.zip. Unix users
should extract it with "unzip -a", which converts the DOS (CR/LF) line
endings to the Unix convention.