Gregorio Caetano, Josh Kinsler and Hao Teng, "Towards Causal Estimates 
of Children's Time Allocation on Skill Development", Journal of Applied 
Econometrics, Vol. 34, No. 4, 2019, pp. 588-605.

Data used in this paper are from the Panel Study of Income Dynamics
(PSID) main study and the 1997, 2002 and 2007 waves of the Child
Development Supplements (CDS-I, CDS-II, and CDS-III).

Five raw data files are included in the zip file ckt-data.zip.
1997TD.csv, 2002TD.csv and 2007TD.csv are children's time diary
datasets from Wave I (1997), Wave II (2002), and Wave III (2007)
respectively. Child_TA.csv is the main dataset from the Child
Development Supplements.  PSID_family_select.csv is part of the PSID
Main Family Data. All five datasets can be downloaded from the PSID
website: https://simba.isr.umich.edu/VS/f.aspx. More details about the
data files and how we organize them are as follows. 

All data files are ASCII files in DOS format. Unix/Linux users should
use "unzip -a" so that line endings are converted on extraction.
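
For readers who load the files in Python instead of converting them with
unzip, the sketch below shows that the standard csv module parses DOS
(CRLF) line endings directly. The column names "id" and "minutes" are
hypothetical placeholders, not PSID variable names, and the in-memory
text stands in for a real file opened with open(path, newline="").

```python
import csv
import io

def read_rows(text_stream):
    """Parse CSV text into a list of dicts; csv accepts CRLF and LF endings."""
    return list(csv.DictReader(text_stream))

# Hypothetical two-row file with DOS line endings.
dos_text = "id,minutes\r\nA,30\r\nB,45\r\n"
parsed = read_rows(io.StringIO(dos_text, newline=""))
```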


**Data files**:

1.1 Time Diary Datasets:

    1997TD.csv (N = 131,060)
    2002TD.csv (N = 99,467)
    2007TD.csv (N = 57,813)

Each row represents an activity done by a CDS child during a
continuous time period. Variable names are unchanged from the PSID
originals. The PSID-CDS codebook provides detailed information about
each variable.

1.2 CDS Main Dataset:

    Child_TA.csv (N = 3,563)

The dataset includes information about 3,563 CDS children's
demographic characteristics, family background, cognitive and
non-cognitive skill measures, etc. Variable names are unchanged from
the PSID originals. The PSID-CDS codebook provides detailed
information about each variable.
    

1.3 PSID Main Family Dataset:

    PSID_family_select.csv (N = 34,004)

The dataset contains 95 family demographic variables selected from the
PSID Main Family Data. It covers 1997, 1999, 2001, 2003, 2005 and 2007.
Variable names are unchanged from the PSID originals. The PSID Main
Family Data codebook provides detailed information about each variable.


**Data organization**:

Following the steps below, one should be able to go from the five raw
data files to the final dataset used for estimation in this paper. 

Step 1. Drop atypical time diaries and recode the original time diary
data (i.e. 1997TD.csv, 2002TD.csv and 2007TD.csv) into fewer
categories. Details about the categories are described in Section 2.1
of the paper. As a result, three new time diary datasets are created,
one each for 1997, 2002 and 2007. They will be used in step 2.
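
A minimal sketch of the kind of filtering and recoding Step 1 describes,
not the code used for the paper. All column names ("child_id",
"typical", "activity_code", "minutes"), the rule that "1" marks a
typical diary, and the code-to-category mapping are hypothetical; the
real variables are in the PSID-CDS codebook and the real categories in
Section 2.1 of the paper.

```python
from collections import defaultdict

# Hypothetical mapping from detailed activity codes to coarse categories.
COARSE = {"101": "sleep", "201": "school", "202": "school", "301": "tv"}

def recode_diary(rows, typical_col="typical", code_col="activity_code",
                 id_col="child_id", minutes_col="minutes"):
    """Drop atypical diary rows, then total minutes per child and category."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if row[typical_col] != "1":  # assumption: "1" marks a typical diary
            continue
        category = COARSE.get(row[code_col], "other")
        totals[row[id_col]][category] += float(row[minutes_col])
    return {child: dict(cats) for child, cats in totals.items()}

# Hypothetical diary rows, shaped like the output of csv.DictReader.
rows = [
    {"child_id": "A", "typical": "1", "activity_code": "201", "minutes": "300"},
    {"child_id": "A", "typical": "1", "activity_code": "202", "minutes": "60"},
    {"child_id": "A", "typical": "1", "activity_code": "999", "minutes": "30"},
    {"child_id": "B", "typical": "0", "activity_code": "301", "minutes": "120"},
]
recoded = recode_diary(rows)
```

In this toy input, child B has only an atypical diary and so does not
appear in the result.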

Step 2. Merge the three new time diary datasets into the CDS main
dataset (i.e. Child_TA.csv). As a result, a new CDS dataset is created
and it will be used in step 3. 
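
The merge in Step 2 can be sketched as a left join of per-wave time
totals onto the child records. This is an illustration under
assumptions: the key "child_id", the variable "sex", and the
"<category>_<wave>" naming of the merged columns are hypothetical and
not taken from the PSID files.

```python
def merge_diary_into_cds(cds_rows, diary_by_wave, id_col="child_id"):
    """Left-join per-wave time totals onto CDS child records."""
    merged = []
    for child in cds_rows:
        record = dict(child)  # copy so the input rows stay untouched
        for wave, totals in diary_by_wave.items():
            # Children without a diary in this wave simply gain no columns.
            for category, minutes in totals.get(child[id_col], {}).items():
                record[f"{category}_{wave}"] = minutes
        merged.append(record)
    return merged

# Hypothetical inputs: two children and time totals for two waves.
cds_rows = [{"child_id": "A", "sex": "F"}, {"child_id": "B", "sex": "M"}]
diary_by_wave = {
    1997: {"A": {"school": 360.0}},
    2002: {"A": {"school": 400.0}, "B": {"tv": 90.0}},
}
cds_merged = merge_diary_into_cds(cds_rows, diary_by_wave)
```

Every child in the main dataset is kept, whether or not a diary exists
for a given wave.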

Step 3. Merge the new CDS dataset with the PSID Main Family Dataset
(i.e. PSID_family_select.csv). Then convert the merged dataset into a
panel dataset. As a result, the final dataset used for estimation is
created.
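
A hedged sketch of Step 3: attach family variables to each child and
reshape from one row per child to one row per child and wave. The
column names ("child_id", "family_id", "year", "income") are
hypothetical. The pairing of the 2002 CDS wave with the 2001 family
interview is also an assumption made only for illustration, since the
family data are biennial and this file does not state which year is
used.

```python
# Assumption: which family-data year is matched to each CDS wave.
WAVE_TO_FAMILY_YEAR = {1997: 1997, 2002: 2001, 2007: 2007}

def to_panel(cds_rows, family_rows, id_col="child_id", fam_col="family_id"):
    """Merge family variables and emit one row per child and wave."""
    family = {(f[fam_col], f["year"]): f for f in family_rows}
    panel = []
    for child in cds_rows:
        for wave, fam_year in WAVE_TO_FAMILY_YEAR.items():
            fam = family.get((child[fam_col], fam_year))
            if fam is None:  # this sketch skips waves with no family record
                continue
            row = {id_col: child[id_col], "wave": wave}
            suffix = f"_{wave}"
            # Move wave-specific columns (e.g. school_1997) into long form.
            for key, value in child.items():
                if key.endswith(suffix):
                    row[key[:-len(suffix)]] = value
            row.update({k: v for k, v in fam.items()
                        if k not in (fam_col, "year")})
            panel.append(row)
    return panel

# Hypothetical inputs in the wide layout produced by a Step 2 style merge.
cds_wide = [{"child_id": "A", "family_id": "F1",
             "school_1997": 360.0, "school_2002": 400.0}]
family_rows = [
    {"family_id": "F1", "year": 1997, "income": 30000},
    {"family_id": "F1", "year": 2001, "income": 35000},
]
panel = to_panel(cds_wide, family_rows)
```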