Matteo Barigozzi and Alessio Moneta, "Identifying the Independent
Sources of Consumption Variation", Journal of Applied Econometrics,
Vol. 31, No. 2, 2016, pp. 420-449.

The code and data files are zipped in the file bm-files.zip, which
contains four subdirectories (folders). All files are ASCII files in
DOS format. Unix/Linux users should use "unzip -a".

The code consists of two parts. Part I builds the dataset and is
written in R (see http://cran.r-project.org/). Part II estimates the
econometric model and is written partly in Matlab R2012a
(http://www.mathworks.com/) and partly in R.

This readme file explains how to replicate the results.

*****

PART I: Building the dataset

1. RAW DATA are to be downloaded from 

http://discover.ukdataservice.ac.uk/series/?sn=200016 

(Family Expenditure Survey: it covers the years 1977-2000)

and

http://discover.ukdataservice.ac.uk/series/?sn=2000028

(Expenditure and Food Survey: it covers the years 2001-2006).

Price indices data (RPI), which we obtained from
http://www.ons.gov.uk/, are made directly available in the file
"price_indices.csv".

2. SAVE DATA (for the years 1977-2006) in a folder called "datafes".

3. RUN the R code "prepare_data.r" (required package: "foreign"). Read
the instructions therein. This code will generate a csv file for each
year (from 1977 to 2006) in the folder "original_data".

4. RUN the R code "build_dataset.r" (required package: "sm"). Read
the instructions therein. This code will generate the dataset described
in Section 3 and Table 1 of the paper in a folder called "data". The
code can generate several sub-folders of "data", one for each time
window under consideration. For example, the folder "1997_2006"
contains the data for the time window 1997-2006. Each sub-folder
contains one dataset for each year and each demographic group under
consideration. For example, the file "dat1987_def_mem1.csv" is the
dataset for 1987 for single-member households. The code will also
generate the total expenditure data in the folder
"tot_exp_descriptive".
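
To check what step 4 has produced, the generated file names can be
listed and split into year and household size. The following is a
minimal sketch in base R. The helper name "parse_dataset_name" is
hypothetical (it is not part of bm-files.zip), and the file name pattern
is inferred only from the example "dat1987_def_mem1.csv" above, so
other naming variants are not handled.

```r
# Hypothetical helper (not part of the replication files): extract the
# year and the household size from file names of the assumed form
# "dat<year>_def_mem<size>.csv". Names that do not match yield NA.
parse_dataset_name <- function(fname) {
  pattern <- "^dat([0-9]{4})_def_mem([0-9]+)\\.csv$"
  ok <- grepl(pattern, fname)
  year <- rep(NA_integer_, length(fname))
  members <- rep(NA_integer_, length(fname))
  year[ok] <- as.integer(sub(pattern, "\\1", fname[ok]))
  members[ok] <- as.integer(sub(pattern, "\\2", fname[ok]))
  data.frame(file = fname, year = year, members = members,
             stringsAsFactors = FALSE)
}

# List the csv files of one time window (the folder name is taken from
# the example above; an empty table is returned if the folder is missing).
files <- list.files(file.path("data", "1997_2006"), pattern = "\\.csv$")
print(parse_dataset_name(files))
```

An empty table here suggests that "build_dataset.r" has not yet been
run for that time window, or that the working directory is not the one
containing the folder "data".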

*****

PART II: Estimating the model

DATA

The folder "data" contains the output of PART I in csv format. 

The folder "tot_exp_descriptive" contains the total expenditure data
and is also needed.

CODES

The results in Sections 4 and 7 of the paper are obtained as follows
(select the number of household members when asked):

A. Section 4

1. The distribution of total expenditures (Figure 1) is obtained using
the data and code in the folder called "totexp_descriptive".

2. The number of factors in single years is estimated using the code
"main_ldu.m". See Table 2.

3. The factor space in single years is estimated using the code
"main_blocks.m". See Table 3.

B. Section 7. All results except for the average derivatives (Table 9)
are obtained using the code "main.m". This file automatically produces
Figures 3 and 4, while Figure 2 can be created manually from the output
of step 3 below.

1. Average budget shares are computed in lines 69-73. See Table 4.

2. Number of factors is estimated in lines 83-91. See Table 5. (A
   figure for the ABC criterion is also produced).

3. Factors and loadings are estimated and identified in lines 94-124.
   See Table 6. 

4. Non-parametric estimates of basic Engel curves are in lines 208-225.

5. Parametric estimates of basic Engel curves are in lines 228-308.
   (min and max percentiles of total expenditure should be given when
   asked). See Tables 7-8.

NB. The file runs 1000 bootstrap replications to compute confidence
intervals. Modify line 177 to reduce the number of replications and
speed up estimation.
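
The trade-off between the number of replications and running time can
be illustrated with a generic percentile bootstrap. The sketch below is
in base R and is NOT the procedure implemented in "main.m". The function
name "boot_ci", its arguments and the sample values are assumptions
made for illustration only.

```r
# Illustrative percentile bootstrap for the mean of a numeric vector.
# This is not the authors' procedure; it only shows how the number of
# replications (n_boot) enters a bootstrap confidence interval.
boot_ci <- function(x, n_boot = 1000, level = 0.95, seed = 1) {
  set.seed(seed)  # fixed seed so that repeated runs are reproducible
  # Resample x with replacement n_boot times and store each mean.
  stats <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  alpha <- 1 - level
  # Lower and upper percentile bounds of the bootstrap distribution.
  quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Made-up numbers, used only to run the example.
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7)
print(boot_ci(x, n_boot = 1000))  # as many replications as in the paper
print(boot_ci(x, n_boot = 100))   # faster, but a noisier interval
```

Fewer replications shorten the run roughly in proportion, at the cost
of less stable interval end points.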

C. Average derivative estimates (Table 9) are obtained with the R
code "analysis_derivative.r". This code needs the code "funktionen.r"
(made available) and the packages "sm" and "KernSmooth".

D. Additional functions needed are also attached.
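
Before running the R codes, it may help to check that the packages
named in this readme ("foreign", "sm", "KernSmooth") are installed. A
minimal sketch in base R follows. The helper name "missing_packages" is
hypothetical and is not part of the replication files.

```r
# Hypothetical helper (not part of bm-files.zip): return the names in
# `pkgs` that are not installed in the current R library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Packages named in this readme for Part I and Part II.
needed <- c("foreign", "sm", "KernSmooth")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```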

*****

For questions, please contact Matteo Barigozzi
(m.barigozzi'at'lse.ac.uk) or Alessio Moneta (amoneta'at'sssup.it).