Bart Cockx, Matteo Picchio, and Stijn Baert, "Modeling the Effects of Grade Retention in High School", Journal of Applied Econometrics, Vol. 34, No. 3, 2019, pp. 403-424.

This paper uses the longitudinal SONAR dataset. The SONAR data were collected starting in 1999 by a Flemish inter-university research group and cover youths born in 1976, 1978, and 1980 and living in Flanders. Survey participants were drawn at random from the National Register, about 3,000 individuals per cohort.

From the original sample, we selected those born in 1978 and 1980. We removed pupils whose maternal grandmother had a foreign nationality (584 pupils deleted), pupils who needed special help, temporarily or permanently, and were therefore in special schools, and pupils who started high school when older than 15 (473 pupils deleted). We dropped students entering the arts track (103 students), those leaving school before the end of compulsory education (9 pupils), those ending in part-time education (183 students), those with inconsistent or missing information on the end-of-year evaluation and grade mobility (396 pupils), and those with missing values for some of the covariates used in the econometric analysis (146 students). Since only 42 students were retained in seventh grade and only 46 students made track transitions involving more than two steps, we deleted their records from our sample. After applying these selection criteria, we ended up with a sample of 3,933 pupils who were observed in each year of their high school career.

The MATLAB dataset is made available. The file data_wide.mat is in wide format and contains a matrix of dimension 3933x301. Each row is a student.
Each column contains a different variable, whose meaning is explained below:

id = data(:,1); % Individual identifier
coho = data(:,2); % Cohort (1978 or 1980)
fem = data(:,3); % =1 if female
bro = data(:,4); % Number of brothers at 14
sis = data(:,5); % Number of sisters at 14
ageduf = data(:,6); % Age father left education
agedum = data(:,7); % Age mother left education
unemf = data(:,8); % Father was unemployed when student was 14
unemm = data(:,9); % Mother was unemployed when student was 14
sta_pri = data(:,10); % Year started primary school
end_pri = data(:,11); % Year ended primary school
monthb = data(:,12); % Month of birth
int_26 = data(:,13); % Interviewed at 26
int_29 = data(:,14); % Interviewed at 29
birdate = data(:,15); % Date of birth in calendar time: =1 at January 1976
intdate = data(:,16); % Calendar time of the last interview: =1 at January 1976
ageint = data(:,17); % Age in months at the last interview
agespri = data(:,18); % Age at which primary school started
ageepri = data(:,19); % Age at which primary school ended
sta_sec = data(:,20); % Starting year of high school
agessec = data(:,21); % Age at which high school started
faele = data(:,22); % Number of failures at the end of primary school
delay = data(:,23); % Age at which secondary school started minus 6
fele = data(:,24); % At least one failure at the end of primary school
drop = data(:,25); % School drop out
teve = data(:,26); % Total number of years in high school
retention = data(:,27:37); % Retention dummy across max. 11 years of high school
bdown = data(:,38:48); % 2-step track downgrade across max. 11 years of high school
sdown = data(:,49:59); % 1-step track downgrade across max. 11 years of high school
downgrade = data(:,60:70); % Track downgrade across max. 11 years of high school
grade = data(:,71:81); % Grade across max. 11 years of high school
year = data(:,82:92); % Calendar year across max. 11 years of high school
diploma = data(:,93); % =1 if high school is completed and diploma attained
yexit = data(:,94:104); % High school exit at the end of the school year across max. 11 years of high school
noexit = data(:,105:115); % No school exit at the end of the school year across max. 11 years of high school
nodip = data(:,116:126); % High school exit without diploma at the end of the school year across max. 11 years of high school
dip = data(:,127:137); % High school exit with diploma at the end of the school year across max. 11 years of high school
cens = data(:,138:148); % High school exit to part-time education at the end of the school year across max. 11 years of high school
fuga = data(:,149:159); % High school exit before June across max. 11 years of high school
sam = data(:,160:170); % =1 if the student started the school year across max. 11 years of high school
course = data(:,171:181); % School track across max. 11 years of high school
pdrop = data(:,182:192); % Legally possible to drop at the end of the school year across max. 11 years of high school
resu = data(:,193:203); % Evaluation across max. 11 years of high school
eduf = data(:,204); % Father's education
edum = data(:,205); % Mother's education
sib = data(:,206); % Number of siblings at 14
sib0 = data(:,207); % Number of siblings is 0
sib1 = data(:,208); % Number of siblings is 1
sib2 = data(:,209); % Number of siblings is 2
sib3 = data(:,210); % Number of siblings is 3 or more
eduf1 = data(:,211); % Father's education: primary or missing
eduf2 = data(:,212); % Father's education: lower secondary
eduf3 = data(:,213); % Father's education: upper secondary
eduf4 = data(:,214); % Father's education: tertiary education
edum1 = data(:,215); % Mother's education: primary or missing
edum2 = data(:,216); % Mother's education: lower secondary
edum3 = data(:,217); % Mother's education: upper secondary
edum4 = data(:,218); % Mother's education: tertiary education
kid = data(:,219:229); % =1 if student has a kid across max. 11 years of high school
freta = data(:,230:240); % =1 if father has already retired across max. 11 years of high school
mreta = data(:,241:251); % =1 if mother has already retired across max. 11 years of high school
preta = data(:,252:262); % =1 if at least one parent has already retired across max. 11 years of high school
fdeth = data(:,263:273); % =1 if father has already passed away across max. 11 years of high school
mdeth = data(:,274:284); % =1 if mother has already passed away across max. 11 years of high school
pdeth = data(:,285:295); % =1 if at least one parent has already passed away across max. 11 years of high school
dayb = data(:,296); % Day of birth (from 1 to 365)
moedu = data(:,297); % Mother's education
faedu = data(:,298); % Father's education
moedu_im = data(:,299); % Mother's education is imputed
faedu_im = data(:,300); % Father's education is imputed
univ = data(:,301); % =1 if the student will go to university

The data are in the zipped folder cpb-data.zip, which contains the dataset both in MATLAB format (data_wide.mat) and as an ASCII file in DOS format (data_wide.csv).

The zipped folder cpb-estimation.zip contains the MATLAB files used to estimate the benchmark model. The file data.m loads the dataset data_wide.mat, creates the variables needed to construct the log-likelihood function, and starts the minimization of the function in g_function_r.m, which contains minus the log-likelihood, using analytical derivatives. The analytical derivatives in g_function_r.m were produced using ADiMat. The file function_r.m also contains minus the log-likelihood and can be used for minimization with fminunc with numerical derivatives. The file data.m stores the estimation output in the subfolder 'results'.
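As an illustration of the wide layout described above, the following is a minimal sketch of how the 11-column blocks could be stacked into a student-year (long) format. It is not part of the authors' code. It assumes the matrix is the first variable stored in data_wide.mat (the name of the stored variable is not stated here), and it falls back on a zero-filled placeholder of the documented size when the file is not found, so that the indexing can still be checked. The names idLong, yearIdx, keep, and long are hypothetical.

```matlab
% Load data_wide.mat if available; otherwise use a zero-filled placeholder
% with the documented dimensions (placeholder only, not real data).
if exist('data_wide.mat', 'file') == 2
    S = load('data_wide.mat');
    names = fieldnames(S);
    data = S.(names{1});      % assumption: the matrix is the first stored variable
else
    data = zeros(3933, 301);
end

[nStud, nCol] = size(data);
assert(nCol == 301, 'Expected 301 columns as documented.');

T = 11;                       % maximum number of years in high school

% Column positions taken from the variable list above.
id        = data(:, 1);       % individual identifier
retention = data(:, 27:37);   % retention dummy, one column per year
grade     = data(:, 71:81);   % grade, one column per year
sam       = data(:, 160:170); % =1 if the student started the school year

% Stack to long format, keeping only student-years that were started.
idLong  = repmat(id, 1, T);       % student id repeated for each year
yearIdx = repmat(1:T, nStud, 1);  % year index 1..11 within high school
keep    = (sam == 1);

% Logical indexing returns column vectors in the same (column-major) order,
% so the four columns below stay aligned row by row.
long = [idLong(keep), yearIdx(keep), grade(keep), retention(keep)];
```

With the real data loaded, each row of long would then be one started school year, with columns for the student identifier, the year index, the grade, and the retention dummy. Other 11-column blocks (for example course or resu) could be appended in the same way.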