Gerald Carlino and Thorsten Drautzburg, "The Role of Startups for
Local Labor Markets", Journal of Applied Econometrics, Vol. 35,
No. 6, 2020, pp. 751-775.

All data and program files are stored in cd-files.zip. Since there are
both binary files and ASCII files, Unix/Linux users should *not* use
"unzip -a".

Raw data description:

The data are contained in VAR_DATA.csv and in VAR_DATA.mat. The
dataset covers 38 years and 354 MSAs. In VAR_DATA.mat, each data
series is a 38 x 354 matrix. In VAR_DATA.csv, each series is a 
(38*354) x 1 vector. The series are:
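For readers working outside Matlab, VAR_DATA.csv can be reshaped back into
the 38 x 354 year-by-MSA layout of VAR_DATA.mat. The Python sketch below is
illustrative only. It assumes the CSV has a header row whose column names
match the series names listed below (Year, MSA_FIPS, dlog_pop, ...) and that
missing entries are empty or non-numeric. The function names
panel_from_lines and load_panel are hypothetical. The sketch indexes rows by
the Year and MSA_FIPS columns rather than relying on the stacking order of
the vector.

```python
import csv
import math


def panel_from_lines(lines, series):
    """Rebuild a year-by-MSA matrix for one series from CSV text lines.

    Returns (years, msas, matrix), where matrix[t][j] is the value of
    `series` for the t-th year and the j-th MSA code, both sorted
    ascending. Cells absent from the input stay NaN.
    """
    rows = list(csv.DictReader(lines))
    keys = [(int(float(r["Year"])), int(float(r["MSA_FIPS"]))) for r in rows]
    years = sorted({y for y, _ in keys})
    msas = sorted({m for _, m in keys})
    row_of = {y: i for i, y in enumerate(years)}
    col_of = {m: j for j, m in enumerate(msas)}
    matrix = [[math.nan] * len(msas) for _ in years]
    for (year, msa), r in zip(keys, rows):
        try:
            value = float(r[series])
        except (TypeError, ValueError):  # empty/short field: treat as missing
            value = math.nan
        matrix[row_of[year]][col_of[msa]] = value
    return years, msas, matrix


def load_panel(path, series):
    """Read VAR_DATA.csv (assumed layout) and reshape one series."""
    with open(path, newline="") as f:
        return panel_from_lines(f, series)
```

For the full file, years, msas, pop = load_panel("VAR_DATA.csv", "dlog_pop")
would be expected to return 38 years and 354 MSA codes.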

- Year (vYear in .mat): Year.

- MSA_FIPS (vMSA in .mat): MSA FIPS code.

- dlog_pop: Population growth: We compute the log growth rate of
  Census population. Log growth rates are additive, so they can be
  summed to compute level changes, from which we can back out the
  change in the employment level.

- dlog_wage: Wage growth: We compute the log growth rate of the
  average wage rate in the County Business Patterns (CBP).

- v_migrant_rate_exm: Net migration rate: We define the net migration
  rate as the difference between inflows and outflows of IRS
  exemptions, divided by the population level in the prior period. 

- vfirm_entry_rate: Firm entry rate: We define the firm entry rate as
  the change in the number of firms aged 0, divided by the average of
  the number of firms of any age in the current and prior year.

- vfirm_exit_rate: Firm exit rate: We define the firm exit rate as the
  change in the number of firms aged 1 that exit, divided by the
  average of the number of firms of any age in the current and prior year.

- vfirm_exit_rate_all: Overall firm exit rate: We define the overall
  firm exit rate as the change in the number of firms of any age that
  exit, divided by the average of the number of firms of any age in the
  current and prior year.

- vjob_creation_rate_births: Job creation rate: We define the job
  creation rate as the change in job creation by firms aged 0, divided
  by the average of overall private employment in the current and
  prior year.

- vlog_age0_sz: Startup average size: We compute the average size of a
  startup as the log of startup employment in an MSA divided by the
  number of startups in that MSA.

- vlog_emp_pop: Employment-to-population ratio: We use overall
  employment from the County Business Patterns. This measure agrees
  closely with BDS employment. We use Census population to compute the
  employment-to-population ratio. It enters the analysis in logs.

- vpop: Employment-population ratio (log)

- vZit_Bartik: Overall labor demand shock proxy, varying base year
  weights. Source: CBP.

- vZit_Bartik1974: Overall labor demand shock proxy, constant base
  year weights. Source: CBP.

- vZit_Bartik_firm: Barriers to entry shock proxy, varying base year
  weights. Source: CBP and BDS.

- vZit_Bartik_firm1974: Barriers to entry shock proxy, constant base
  year weights. Source: CBP and BDS.

- vZit_Bartik_jc: Startup productivity shock proxy, varying base year
  weights. Source: CBP and BDS.

- vZit_Bartik_jc1974: Startup productivity shock proxy, constant base
  year weights. Source: CBP and BDS.
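The definitions above can be illustrated with a short Python sketch. It is a
hedged reconstruction of the stated formulas, not the authors' code (which is
in Matlab), and all function names are hypothetical. The bartik function
shows only the textbook shift-share construction: a weighted sum of national
industry growth rates, with local industry shares as weights. The exact
industry detail, any leave-out convention, and how the varying base-year
weights are updated in the paper are not described here and are assumptions.

```python
import math


def log_growth(prev, curr):
    """Log growth rate between two positive levels."""
    return math.log(curr) - math.log(prev)


def implied_employment(emp_prev, dlog_emp_pop, dlog_pop):
    """Back out the employment level from additive log growth rates.

    Because log(E) = log(E/P) + log(P), employment growth in logs is the
    change in vlog_emp_pop plus dlog_pop.
    """
    return emp_prev * math.exp(dlog_emp_pop + dlog_pop)


def net_migration_rate(inflows, outflows, pop_prev):
    """(inflows - outflows) of IRS exemptions over prior-period population."""
    return (inflows - outflows) / pop_prev


def averaged_rate(numerator, total_curr, total_prev):
    """Divide by the average of the current- and prior-year totals.

    `numerator` is whichever count the definition calls for (e.g. firms
    aged 0 for the entry rate); only the denominator is illustrated here.
    """
    return numerator / (0.5 * (total_curr + total_prev))


def log_startup_size(startup_emp, n_startups):
    """Log of average employment per startup."""
    return math.log(startup_emp / n_startups)


def bartik(shares, national_growth):
    """Generic shift-share proxy: sum over industries of share * growth.

    `shares` are an MSA's industry employment shares in a base year;
    `national_growth` are national industry growth rates. Both are
    dicts keyed by industry (hypothetical layout).
    """
    if shares.keys() != national_growth.keys():
        raise ValueError("shares and growth must cover the same industries")
    return sum(shares[k] * national_growth[k] for k in shares)
```

Under this reading, the constant-weight variants (suffix 1974) would pass the
same 1974 shares to bartik every year, while the varying-weight variants
would pass shares from a base year that moves over time.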


Omitted data:

1) The house price data in the paper are provided by CoreLogic
Solutions, see 
https://www.corelogic.com/insights-download/home-price-index.aspx. 

o The data are provided at the level of Core Based Statistical Area
  (CBSA)/Metro areas. 

o We follow that definition, except for the following cities, where we
  use the main division instead: Boston, Chicago, Dallas, Detroit,
  Los Angeles, Miami, NYC, Philadelphia, San Francisco, Seattle, and
  Washington DC.

o The data are monthly. Given that our main data are based on
  mid-March payroll, we use the data from March in every year.

o We use the house prices for the tier "Single Family Combined".
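The March selection step can be sketched as follows. The record layout
(year, month, value) and the function name march_values are hypothetical,
since the CoreLogic file format is not described here.

```python
def march_values(records):
    """Keep one observation per year: the March value.

    `records` is an iterable of (year, month, value) tuples with
    month numbered 1-12 (assumed layout).
    """
    out = {}
    for year, month, value in records:
        if month == 3:
            if year in out:
                raise ValueError(f"duplicate March entry for {year}")
            out[year] = value
    return out
```

Matching on March aligns the annual house price series with the mid-March
payroll timing of the main data.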


2) TFP data used in the Appendix are described in Appendix E.

Replication files -- General notes:

- The replication files use Matlab for the main results and Stata for
  the comparison of historical shocks with external data.

- The underlying house price data in the article are proprietary and
  omitted. Since house prices were included only in the "periphery",
  the main results here are unaffected and directly comparable to
  those in the article.

- The code is located in the "Code" folder and the data in the "Data"
  folder. The code saves results in the "Graphs" and "Tables"
  folders.

- A TeX file assembling all the results is contained in the subfolder
  "TeX". It may require minor adjustments to file names.

Data assembly: 

- Run MakeData_JAE.m to construct the data from raw series.
  Alternatively, the constructed data are provided as a Matlab
  MAT-file, VAR_DATA.mat, and as a CSV file, VAR_DATA.csv.

Figure 1:

- Run MotivatingScatter.m

Figure 2:

- Run BartikPlots.m

Figures 3 and 4, and Tables 1, 2, and 3:

- Run SPVAR_split_jae.m with VAR_SET=13 and WHICH_IV=1. 

Figure 5:

- Run SPVAR_split_jae.m with VAR_SET=130 and WHICH_IV=5. 

Figures 6 and 7:

- Run SPVAR_split_jae.m with VAR_SET=13 and WHICH_IV=1 (as for
  Figures 3 and 4).

- Then run my_Counterfactual.m

- Last, run VC_Shocks.do in Stata.