Sylvia Kaufmann and Christian Schumacher, "Identifying Relevant and Irrelevant Variables in Sparse Factor Models", Journal of Applied Econometrics, Vol. 32, No. 6, 2017, pp. 1123-1144.

The paper uses two datasets:

1) International GDP growth data: pwt.csv
2) CPI inflation data for the US: mmw09.csv

Both files are CSV files in DOS format. Entries are separated by semicolons because some variable names in the second dataset contain commas. See Appendix C.2 of the paper for the detailed variable names.

The files are zipped in the file sk-data.zip. Unix/Linux users should use "unzip -a" so that the DOS line endings are converted.

1) International GDP growth data:

The data are annual and cover 1961 to 2009, which yields 49 time series observations for each country. The dataset contains 57 countries. Each country belongs to one of seven geographical regions: Africa, Asia 1 (less developed), Asia 2 (more developed), Europe, Latin America, North America and Oceania. Country names are in the first row of the table and the region of each country is in the second row. The first column contains the year of each annual observation.

The source of the data is the Penn World Tables, version 7.0 (Heston A, Summers R, Aten B. 2011. Penn World Table Version 7.0. Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania, June). GDP levels are transformed into annual growth rates by taking first differences of the logarithm of the level series.

2) CPI inflation data for the US:

The data are monthly and cover February 1985 to May 2005, which yields 244 time series observations for each CPI subcomponent. The dataset contains 79 CPI subcomponents. Each subcomponent belongs to one of eight groups: food and beverages, housing, apparel, transportation, medical care, recreation, education and communication, and other goods and services.
CPI subcomponent names are in the first row of the table and the group of each subcomponent is in the second row. The first and second columns contain the year and the month, respectively, of each monthly observation.

The data are taken from Mackowiak, Mönch, and Wiederholt (2009). The seasonally adjusted CPI index levels are available on the journal homepage under the DOI of that paper, Appendix A (Mackowiak B, Mönch E, Wiederholt M. 2009. Sectoral price data and models of price setting. Journal of Monetary Economics 56: S78-S99. doi: 10.1016/j.jmoneco.2009.06.012). The data are transformed as in that paper: the CPI index levels are converted into month-on-month rates of change by taking first differences of the logarithm of the level series.
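A minimal Python sketch (standard library only) of reading either file, based on the layout described above. It rests on assumptions this README does not state: the cells of rows 1 and 2 that sit in the date columns hold labels (or nothing) rather than data, years and months are stored as integers (e.g. "1985;2"), and the decimal separator is a period. The function names `read_panel`, `log_diff` and `months_inclusive` are hypothetical. Because entries are separated by semicolons, a subcomponent name that contains commas is read as a single field.

```python
import csv
import math
from typing import Dict, List, TextIO, Tuple


def read_panel(stream: TextIO, n_date_cols: int):
    """Parse a semicolon-separated table laid out as described in this README.

    Row 1: series names (countries or CPI subcomponents).
    Row 2: region or group of each series.
    Remaining rows: n_date_cols date columns, then one value per series.
    Use n_date_cols=1 for pwt.csv (year) and n_date_cols=2 for mmw09.csv
    (year, month). Whatever is in the date columns of rows 1 and 2 is ignored.
    """
    # csv.reader accepts DOS (CRLF) line endings; empty rows are skipped.
    rows = [row for row in csv.reader(stream, delimiter=";") if row]
    names = [n.strip() for n in rows[0][n_date_cols:]]
    groups = dict(zip(names, (g.strip() for g in rows[1][n_date_cols:])))
    dates: List[Tuple[int, ...]] = []
    series: Dict[str, List[float]] = {n: [] for n in names}
    for row in rows[2:]:
        # Assumption: date cells are plain integers.
        dates.append(tuple(int(cell) for cell in row[:n_date_cols]))
        for name, cell in zip(names, row[n_date_cols:]):
            # Assumption: '.' is the decimal separator.
            series[name].append(float(cell))
    return names, groups, dates, series


def log_diff(levels: List[float]) -> List[float]:
    """First differences of the natural log of a level series."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]


def months_inclusive(start: Tuple[int, int], end: Tuple[int, int]) -> int:
    """Number of monthly observations from start to end, both included."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


# Example usage with a file from sk-data.zip:
# with open("mmw09.csv", newline="") as f:
#     names, groups, dates, series = read_panel(f, n_date_cols=2)
```

Since the files presumably already contain the growth and inflation rates, `log_diff` only illustrates the log-difference transformation described above for anyone starting from level data. As a consistency check, `months_inclusive((1985, 2), (2005, 5))` returns 244, and the annual span gives 2009 - 1961 + 1 = 49, matching the observation counts stated above.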