Shanjun Li, "Traffic Safety and Vehicle Choice: Quantifying the Effects of the 'Arms Race' on American Roads", Journal of Applied Econometrics, Vol. 27, No. 1, 2012, pp. 34-62.

All data files are ASCII files in DOS format. They are zipped in the file sl-data.zip. Unix/Linux users should use "unzip -a".

The empirical analysis in the paper has two parts: vehicle safety and vehicle demand. The vehicle safety analysis is based on four data files (car.txt, truck.txt, single.txt, and fatalities_msa.txt). The vehicle demand analysis draws on a variety of data sources, and two of its data files (demographics.txt and vehicle_data.txt) are provided here. Another key part of the analysis uses vehicle sales data from a confidential database maintained by R.L. Polk & Company. The purchase agreement between Polk and Duke University prohibits disclosure of these data to any party not affiliated with Duke University. Information on how to obtain these data is provided below.

The vehicle safety analysis relies on two data sources. The first is police-reported traffic accident data from the General Estimates System (GES), maintained by the National Highway Traffic Safety Administration (NHTSA) of the U.S. Department of Transportation. The original data sets are publicly available at ftp://ftp.nhtsa.dot.gov/ges/. Three data sets were generated from these original data and used in the analysis that produces Table 4 in the paper. The first file, car.txt, contains data on accidents involving at least one passenger car. It has 76,586 observations and 50 variables, and its first row lists the name of each variable. The file single.txt contains data on single-vehicle accidents. It has 48,813 observations and 41 variables. The file truck.txt contains data on accidents involving at least one light truck. It has 59,733 observations and 50 variables.
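The following is a minimal Python sketch (standard library only) that reads the three GES extracts and checks them against the observation and variable counts stated above. It rests on two assumptions that this README does not state. First, each file is taken to be whitespace-delimited, with one observation per line. Second, the first line of every file is taken to hold the variable names, which is confirmed only for car.txt. If a header uses names that contain spaces, the parsed header will be split incorrectly, but the counts are unaffected. The names parse_table, load_table, table_shape and EXPECTED_SHAPES are hypothetical.

```python
from pathlib import Path

# Observation and variable counts stated in this README for the GES extracts.
EXPECTED_SHAPES = {
    "car.txt": (76586, 50),
    "truck.txt": (59733, 50),
    "single.txt": (48813, 41),
}


def parse_table(lines, has_header=True):
    """Split whitespace-delimited lines into (header, rows) of strings.

    ASSUMPTION: fields are separated by spaces or tabs. str.split() with no
    argument also discards DOS line endings (CR LF).
    """
    header, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if has_header and header is None:
            header = fields  # first non-blank line is taken as variable names
        else:
            rows.append(fields)
    return header, rows


def load_table(path, has_header=True):
    """Read one of the ASCII data files from disk."""
    with open(path, encoding="ascii") as fh:
        return parse_table(fh, has_header)


def table_shape(rows):
    """Return (observations, variables); raise if rows differ in width."""
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have differing numbers of fields: {sorted(widths)}")
    return len(rows), (widths.pop() if widths else 0)


if __name__ == "__main__":
    for name, expected in EXPECTED_SHAPES.items():
        if Path(name).exists():
            _, rows = load_table(name)
            shape = table_shape(rows)
            print(name, shape, "OK" if shape == expected else f"expected {expected}")
```

Run the script from the directory where sl-data.zip was extracted. A shortfall of exactly one observation in truck.txt or single.txt would suggest that the file has no header row, in which case load it with has_header=False.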
The variables in car.txt and truck.txt are, in column order:

Column  Variable (car.txt and truck.txt)
 1  Year
 2  Case number
 3  Case weight
 4  Coded vehicle number
 5  Number of deaths among occupants
 6  Number of serious injuries among occupants
 7  Number of injuries among occupants
 8  Number of equivalent fatalities among occupants
 9  Vehicle model year
10  Body type
11  Region
12  Dummy for the most serious outcome being death
13  Dummy for the most serious outcome being serious injury
14  Dummy for the most serious outcome being injury
15  Dummy for the first driver being negligent
16  Dummy for the second driver being negligent
17  Dummy for both drivers being negligent
18  Dummy for the second vehicle being a truck where the first is a car
19  Dummy for small city
20  Dummy for medium city
21  Dummy for big city
22  Number of deaths in the accident
23  Number of serious injuries in the accident
24  Number of injuries in the accident
25  Number of occupants
26  Dummy for seat belt usage
27  Dummy for rain
28  Dummy for snow
29  Dummy for fog
30  Dummy for other weather conditions
31  Dummy for dark
32  Dummy for weekday
33  Dummy for the first driver being drunk
34  Dummy for the second driver being drunk
35  Dummy for the first driver being under the influence of drugs
36  Dummy for the second driver being under the influence of drugs
37  Dummy for the first driver being under 21
38  Dummy for the second driver being under 21
39  Dummy for the first driver being over 60
40  Dummy for the second driver being over 60
41  Dummy for the accident being on an interstate highway
42  Dummy for the accident being on a divided highway
43  Dummy for the first driver being male
44  Dummy for the second driver being male
45  Dummy for the first driver being a young male
46  Dummy for the second driver being a young male
47  Dummy for the first vehicle driving 10 miles over the speed limit
48  Dummy for the second vehicle driving 10 miles over the speed limit
49  Number of occupants in the first vehicle
50  Number of occupants in the second vehicle

The variables in single.txt are, in column order:

Column  Variable (single.txt)
 1  Year
 2  Case number
 3  Case weight
 4  Coded vehicle number
 5  Number of deaths among occupants
 6  Number of serious injuries among occupants
 7  Number of injuries among occupants
 8  Number of equivalent fatalities among occupants
 9  Dummy for the vehicle being a car
10  Dummy for the vehicle being a truck
11  Vehicle model year
12  Body type
13  Region
14  Dummy for the most serious outcome being death
15  Dummy for the most serious outcome being serious injury
16  Dummy for the most serious outcome being injury
17  Dummy for both drivers being negligent
18  Dummy for small city
19  Dummy for medium city
20  Dummy for big city
21  Number of deaths in the accident
22  Number of serious injuries in the accident
23  Number of injuries in the accident
24  Number of occupants
25  Dummy for seat belt usage
26  Dummy for rain
27  Dummy for snow
28  Dummy for fog
29  Dummy for other weather conditions
30  Dummy for dark
31  Dummy for weekday
32  Dummy for the driver being drunk
33  Dummy for the driver being under the influence of drugs
34  Dummy for the driver being under 21
35  Dummy for the driver being over 60
36  Dummy for the accident being on an interstate highway
37  Dummy for the accident being on a divided highway
38  Dummy for the driver being male
39  Dummy for the driver being a young male
40  Dummy for the vehicle driving 10 miles over the speed limit
41  Number of occupants in the vehicle

The second source for the vehicle safety analysis is national fatal crash statistics from the Fatality Analysis Reporting System (FARS), also maintained by NHTSA. The original data are available at ftp://ftp.nhtsa.dot.gov/fars/. One file, fatalities_msa.txt, was generated from these data. It contains 464,781 observations and 16 variables on fatal crashes in the 20 MSAs studied in the paper.
Together with the GES data, this data set was used to construct vehicle safety measures for passenger cars and light trucks in the 20 MSAs, as discussed in the appendix of the paper. These measures were then used in the demand analysis. The variables in fatalities_msa.txt are, in column order:

Column  Variable (fatalities_msa.txt)
 1  State ID
 2  Vehicle number
 3  Vehicle make
 4  Vehicle model
 5  Body type
 6  Number of occupants
 7  Number of deaths in the vehicle
 8  Case number of the state
 9  Model year
10  County ID
11  Number of deaths in the accident
12  Year
13  MSA ID
14  Dummy for multiple-vehicle crash
15  Dummy for light truck
16  Dummy for passenger car

The second part of the empirical analysis concerns vehicle demand. Two data files are provided.

The file demographics.txt contains a random draw of 250 households from Census 2000 for each of the 20 MSAs listed in Table 1 of the paper. Five household characteristics are recorded for each household: income (in $10,000), household size, a dummy for renter, a dummy for the presence of children, and average travel time to work. The households are stacked by MSA in blocks of 250 observations. Observations 1-250 (a 250 by 5 block) are for the first MSA (Albany, NY), and the last 250 by 5 block (observations 4,751-5,000) is for the 20th MSA (Syracuse, NY).

The file vehicle_data.txt contains 32,160 observations on 10 variables: year, MSA ID, vehicle price (in $1,000), a dummy for car, a dummy for van, a dummy for SUV, a dummy for pickup truck, vehicle size, horsepower, and fuel cost (in dollars per mile driven).

Another key data set for the analysis is vehicle sales data at the vehicle-model level for each of the 20 MSAs from 1999 to 2006. These data come from a proprietary database maintained by R.L. Polk & Company, a marketing research company based in Detroit (http://usa.polk.com/Company/WhoWeAre/). The data can be purchased from the company. Contact information for one marketing representative at Polk:

Raymond W. Alvarado
Account Executive, Automotive Information Solutions
R.L. Polk & Co.
248-728-7510 (direct line)
248-728-6843 (fax)
Ray_Alvarado@polk.com

Questions regarding data acquisition can be directed to Shanjun Li at li@rff.org.
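The demographics.txt file stores its households as consecutive blocks of 250 per MSA, with Albany, NY first and Syracuse, NY last. The following minimal Python sketch shows one way to pull out a single MSA's block. It assumes three things that the description does not state explicitly: the file has no header row, its five whitespace-separated numeric columns follow the order in which the characteristics are listed, and it holds exactly 20 x 250 = 5,000 rows. The names msa_block, load_demographics and CHARACTERISTICS are hypothetical.

```python
# Layout described in this README: 20 consecutive blocks of 250 households,
# one block per MSA (1 = Albany, NY; 20 = Syracuse, NY).
N_MSA = 20
HOUSEHOLDS_PER_MSA = 250

# ASSUMPTION: columns appear in the order the characteristics are listed.
CHARACTERISTICS = (
    "income ($10,000)",
    "household size",
    "renter dummy",
    "children present dummy",
    "average travel time to work",
)


def msa_block(rows, msa_index):
    """Return the 250 household rows for MSA number msa_index (1-based)."""
    if not 1 <= msa_index <= N_MSA:
        raise ValueError(f"msa_index must be between 1 and {N_MSA}")
    if len(rows) != N_MSA * HOUSEHOLDS_PER_MSA:
        raise ValueError(f"expected {N_MSA * HOUSEHOLDS_PER_MSA} rows, got {len(rows)}")
    start = (msa_index - 1) * HOUSEHOLDS_PER_MSA
    return rows[start:start + HOUSEHOLDS_PER_MSA]


def load_demographics(path):
    """Read demographics.txt as a list of 5-value float rows.

    ASSUMPTION: no header row and whitespace-separated numeric fields.
    """
    with open(path, encoding="ascii") as fh:
        rows = [[float(x) for x in line.split()] for line in fh if line.strip()]
    bad = [i for i, r in enumerate(rows) if len(r) != len(CHARACTERISTICS)]
    if bad:
        raise ValueError(f"{len(bad)} rows do not have {len(CHARACTERISTICS)} fields")
    return rows
```

For example, msa_block(load_demographics("demographics.txt"), 20) returns the 250 Syracuse households. Under the column-order assumption, averaging column 0 over that block gives their mean income in $10,000.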