Michele Aquaro, Natalia Bailey, and M. Hashem Pesaran, "Estimation and Inference for Spatial Models with Heterogeneous Coefficients: An Application to U.S. House Prices", Journal of Applied Econometrics, Vol. 36, No. 1, 2021, pp. 18-44.

The data file `data_main.csv` is a comma-delimited ASCII file with CRLF line terminators. It is one of many files zipped in the file `abp-data-code.zip`. Unix/Linux users should *not* use `unzip -a`, because this file contains both text and binary files.

Please visit for details on the code in Python, R, and Stata.

Empirical Application
=====================

Data
----

All relevant data are stored under `./2_empirical_application/data1/`.

A second sub-folder called `data2/` is used to store temporary data which are then used to generate tables and figures.

In this paper we use an extended version of the panel dataset employed by Bailey, Holly and Pesaran (2016) and further augmented with population and per capita real income data by Yang (2020).

The main dataset is `data_main.csv`, which contains the following variables:

-- `quarter`: time, 1975Q1--2014Q4;
-- `msacode`: MSA FIPS code;
-- `regioncode`: region id (U.S. Bureau of Economic Analysis classification);
-- `cpi`: Consumer Price Index (CPI), nominal values;
-- `hp`: Freddie Mac House Price Index (HP);
-- `pop`: population at MSA level (POP);
-- `pipc`: nominal income per capita (INC).

Further details on data sources can be found in the online Appendix F of the paper.

Additionally, the following files contain data on (row-standardised) spatial weights matrices:

-- `W75.csv`: weights based on threshold distance of 75 miles;
-- `W100.csv`: weights based on threshold distance of 100 miles;
-- `W125.csv`: weights based on threshold distance of 125 miles.

Finally,

-- `msa_coord_4matlab.csv` is a data-frame with MSAs' coordinates (latitude and longitude, in decimal degrees).
-- The folder `shapefiles/` contains the shapefiles used to generate the thematic maps (Source: US Census Bureau). These shapes are then topologically simplified at 1% using the "Visvalingam weighted area" method in [mapshaper](https://mapshaper.org/); the data are then transformed into coordinate reference system EPSG 2163 (US National Atlas Equal Area), and stored in `shapes_msa.rds` and `shapes_sta.rds`.

Code
----

The simplest way to reproduce the tables and figures in the empirical section of the paper is to open a shell and execute the Bash script `main.sh`, stored in the folder `./2_empirical_application/`.

The following files are used to generate the results in Section 6 of the paper:

-- `sc_defactoring.m`: This Matlab script de-seasonalises and de-factors the data as explained in Appendix F of the paper. It saves the residuals in mat-format. Notice that this script redefines the region id in accordance with the exact U.S. Bureau of Economic Analysis classification; see [BEA Regions](https://apps.bea.gov/regional/docs/regions.cfm), last retrieved October 25, 2018.
-- `ABP_alpha_CD.prg`: This GAUSS script computes the CD statistic of Pesaran (2015) and the exponent of cross-sectional dependence, alpha, developed in Bailey, Kapetanios, and Pesaran (2016), for the original dataset on US real house price changes. It also computes the CD statistic for the residuals from the de-seasonalising and de-factoring procedure applied to the US real house price changes. References to the corresponding papers are provided in the prg-file.
-- `sh_hsar.m`: This Matlab script (a) loads the de-seasonalised and de-factored residuals generated by the code above, and (b) estimates Equation (30) of the paper by calling the Matlab function `fn_ml_Npsi_NKbeta_Nsgmsq.m`. Estimation results are then saved in mat-format.
-- `fn_ml_Npsi_NKbeta_Nsgmsq.m`: **Main part of the code**. This Matlab function estimates the Heterogeneous Spatial Autoregressive (HSAR) Model.
It returns the estimated coefficients and their estimated variance. Two versions of the estimated variance are returned: one based on the standard formula and one based on the sandwich formula; see Equation (24) of the paper. For more information on the inputs and outputs of this function, type in Matlab:

    >> help fn_ml_Npsi_NKbeta_Nsgmsq

-- `sc_tb_mg_byregion.m`: Matlab code generating Table 3 of the paper.
-- `sc_fg_map.r`: R script used to generate Figures 1, 2, and F1.
-- `sc_partial_effects.r`, `sc_spill_effects_by_msa.r`, `sc_spill_effects_by_region.r`, `sc_fg_map_spill_effects.r`, and `sc_tb_spill_effects_by_region_dir_ind.r`: Compute spill-over effects as in Figures F2-F4, and Table F2.

Monte Carlo Simulations
=======================

Simulation Experiments
----------------------

The following Matlab files are used to generate the results in Section 5 of the paper:

-- `mc_fixed_coefficients.m`. In this script file, (a) heterogeneous true coefficients are generated and then kept fixed across replications; (b) simulated data are generated; (c) the HSAR model is estimated; (d) estimates are stored in a mat-file. To replicate results in Tables 1 and 2, set `N=5` and `N=100`, respectively. Set `N` to 25, 50, 75, and 100 to replicate results in Figures G1--G10. To replicate results in Tables G1 and G2, set `v_sgmsq=(v_chisq_2df./8)+0.25`, in addition to the instructions above.
-- `mc_random_coefficients.m`. Same as above, except that the true heterogeneous coefficients now vary across replications in accordance with the random coefficient model.

Tables and Figures
------------------

The following Matlab files are used to generate tex-tables and eps/pdf-figures in the Monte Carlo section. Output is then compiled into a single LaTeX document; see `./3_latex_tables_and_figures/main.tex`.

-- `sc_tb_bias_rmse_size_power_unitspecific_allunits.m`. It generates the tex-files for Tables 1, 2, G1, and G2 by setting `v_N` to 5 or 100, and `parameter` to 1 or 3.
-- `sc_fg_boxplot_rmse.m`.
It generates the eps-files for Figures G1 and G2 by setting `parameter` to 1 or 3.
-- `sc_fg_boxplot_size.m`. It generates the eps-files for Figures G3 and G7 by setting `parameter` to 1 or 3.
-- `sc_fg_power_function_by_unit.m`. It generates the eps-files for Figures G4--G6 and G8--G10 by setting `parameter` to 1 or 3.
-- `sc_tb_bias_rmse_size_mg_estimator.m`. It generates the tex-files for Table G3.

Session Info
============

Matlab
------

```
< M A T L A B (R) >
Copyright 1984-2019 The MathWorks, Inc.
R2019b Update 3 (9.7.0.1261785) 64-bit (glnxa64)
November 27, 2019
```

R
-

```
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sf_0.8-0      ggplot2_3.2.1 magrittr_1.5  dplyr_0.8.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         units_0.6-5        tidyselect_0.2.5   munsell_0.5.0
 [5] colorspace_1.4-1   R6_2.4.1           rlang_0.4.5        tools_3.6.3
 [9] grid_3.6.3         gtable_0.3.0       KernSmooth_2.23-17 e1071_1.7-3
[13] DBI_1.1.0          withr_2.1.2        class_7.3-17       lazyeval_0.2.2
[17] assertthat_0.2.1   tibble_2.1.3       lifecycle_0.1.0    crayon_1.3.4
[21] purrr_0.3.3        glue_1.4.0         compiler_3.6.3     pillar_1.4.3
[25] scales_1.1.0       classInt_0.4-2     pkgconfig_2.0.3
```

References
==========

Bailey, N., S. Holly, and M. H. Pesaran (2016). A two-stage approach to spatio-temporal analysis with strong and weak cross-sectional dependence. Journal of Applied Econometrics 31 (1), 249-280.

Bailey, N., G. Kapetanios, and M. H. Pesaran (2016).
Exponent of cross-sectional dependence: Estimation and inference. Journal of Applied Econometrics 31 (6), 929-960.

Pesaran, M. H. (2015). Testing weak cross-sectional dependence in large panels. Econometric Reviews 34 (6-10), 1089-1117.

Yang, C. F. (2020). Common factors and spatial dependence: An application to US house prices. Econometric Reviews, 1-37.
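
As a supplementary illustration of the spatial weights files described in the Data section (`W75.csv`, `W100.csv`, `W125.csv`), the following Python sketch shows one way a row-standardised, threshold-distance weights matrix could be built from latitude/longitude coordinates such as those in `msa_coord_4matlab.csv`. It is not the authors' code: the function names `haversine_miles` and `threshold_weights`, the use of the haversine formula, the Earth radius value, and the choice to leave units with no neighbours as rows of zeros are all assumptions made for this example, and the coordinates are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def threshold_weights(coords, threshold_miles):
    """Return a row-standardised weights matrix as a list of lists.

    Units i and j (i != j) are neighbours when their distance is at most
    threshold_miles. Each non-empty row is divided by its sum so that it
    adds up to one; rows without neighbours are left as zeros (assumption).
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and haversine_miles(*coords[i], *coords[j]) <= threshold_miles:
                W[i][j] = 1.0
    for row in W:
        total = sum(row)
        if total > 0:
            row[:] = [w / total for w in row]
    return W


if __name__ == "__main__":
    # Hypothetical (latitude, longitude) pairs, not taken from the dataset.
    coords = [(40.0, -75.0), (40.5, -75.0), (40.0, -74.0), (35.0, -90.0)]
    for row in threshold_weights(coords, 75.0):
        print(row)
```

Under these assumptions, widening the threshold (75, 100, 125 miles) can only add neighbours, so each row spreads its unit weight over a weakly larger set of units.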