Author ORCID Identifier

Year of Publication


Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation




Civil Engineering

First Advisor

Dr. Mei Chen


Rural two-lane highways account for 76% of the total paved road mileage in the US. In Kentucky, these roads represent 85% of state-maintained mileage. Crashes on these roads account for 40% of all crashes, 47% of injury crashes, and 66% of fatal crashes on state-maintained roads. These statistics highlight the need to investigate crashes on these roads. Several factors, such as road geometry, traffic volume, and human behavior, contribute to crashes. Recent studies have identified speed as a key factor in both crash occurrence and crash severity, and have indicated the need to incorporate speed into crash and severity prediction. Such studies are limited for rural two-lane highways because measured speed data were lacking in the past. This study fills this gap by using now widely available measured speed data to investigate the relationship between speed and crashes on rural two-lane highways. Crash, speed, traffic, and road geometric data were collected for rural two-lane highways in Kentucky. For speed, the study used GPS-based probe data, which were integrated with the crash data and road attributes. Speed measures calculated directly from the measured speed data were used to evaluate the effect of speed on crashes. First, the study investigated the effect of speed by incorporating average speed, along with traffic volume and segment length, in a crash prediction model for the total number of crashes. A zero-inflated negative binomial model was used to account for the overdispersion arising from excess zero-crash observations in the dataset. The model identified a negative relationship between average speed and the number of crashes. One possible explanation is that rural two-lane roads with higher speeds tend to be main corridors with better geometric conditions.
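As an illustration of the zero-inflated negative binomial form mentioned above, the following is a minimal sketch, not the dissertation's fitted model. It assumes the common NB2 parameterization (mean `mu`, dispersion `alpha`) and a log-linear mean in traffic volume, segment length, and average speed. All function names, argument names, and coefficient values are hypothetical.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of y crashes, given mean mu and dispersion alpha."""
    r = 1.0 / alpha          # shape parameter
    p = r / (r + mu)         # NB "success" probability
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)


def zinb_pmf(y, mu, alpha, pi_zero):
    """Zero-inflated NB: with probability pi_zero a segment is a structural zero."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, alpha)
    return pi_zero + base if y == 0 else base


def expected_crashes(aadt, length_mi, avg_speed, coef):
    """Log-linear mean of the count part (assumed form; coefficients are illustrative)."""
    b0, b_aadt, b_len, b_speed = coef
    return math.exp(b0 + b_aadt * math.log(aadt)
                    + b_len * math.log(length_mi) + b_speed * avg_speed)


# Hypothetical coefficients; the negative speed term mirrors the reported sign only.
coef = (-6.0, 0.8, 1.0, -0.02)
mu_slow = expected_crashes(aadt=2000, length_mi=1.0, avg_speed=35, coef=coef)
mu_fast = expected_crashes(aadt=2000, length_mi=1.0, avg_speed=55, coef=coef)
p_zero = zinb_pmf(0, mu=2.0, alpha=0.5, pi_zero=0.3)
```

The zero-inflation term raises the probability of observing no crashes above what the negative binomial alone would give, which is how this model form absorbs the excess zero-crash segments.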
Furthermore, the significance of speed in the model varies with the operating speed on these roads. This suggested using speed as a categorizer to develop separate models for different speed ranges. Separating models by speed range improved prediction performance compared to an overall model. Operating speed often reflects geometric conditions. Therefore, this study also evaluated how the change in 85th percentile speed from one road section to the next affects crashes. The analysis showed that more crashes tend to occur as the 85th percentile speed differential between consecutive segments increases. However, further investigation showed that speed differential may not be a suitable indicator for identifying locations with a high crash risk; rather, it can be applied to improving road design. Next, this study investigated spatial heterogeneity in the effect of speed and other factors using a geographically weighted regression model. The model accounted for the geographical location of the data and made it possible to examine the spatially varying effect of speed. The results showed that the significance of speed can vary by location, which is not observable in the global model. In some regions, speed reflects the local geometric conditions of the roads. On roads with poor geometric conditions, crashes tend to be more frequent. Safety improvement strategies for these roads can focus on improving geometric conditions, for example by providing shoulders and realigning sharp curves. In addition, speed appeared to increase crashes in some locations with good geometric conditions and low traffic volume. Speed was indeed a critical factor at these locations, and safety countermeasures should be recommended with the operating conditions in mind. Using measured speed data, this study also explored the effect of speed separately on KABC (fatal and injury) and PDO (property damage only) crashes for these roads.
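The 85th percentile speed differential between consecutive segments, discussed above, can be sketched as follows. This is a minimal illustration under stated assumptions: the percentile uses linear interpolation between order statistics, the differential is taken as an absolute difference, and the function names and sample speeds are hypothetical rather than taken from the study.

```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics (assumed method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one speed observation is required")
    rank = (pct / 100.0) * (len(ordered) - 1)
    lower = int(math.floor(rank))
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def v85_differentials(segment_speeds):
    """Absolute change in 85th percentile speed between each pair of consecutive segments."""
    v85 = [percentile(speeds, 85) for speeds in segment_speeds]
    return [abs(b - a) for a, b in zip(v85, v85[1:])]


# Hypothetical probe speeds (mph) for three consecutive segments.
segments = [
    [40, 45, 50, 55, 60],
    [30, 35, 40, 45, 50],
    [30, 35, 40, 45, 50],
]
diffs = v85_differentials(segments)  # one value per consecutive segment pair
```

A signed difference could be used instead if the direction of the speed change matters for the analysis.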
Separate models were developed for KABC and PDO crashes using a zero-inflated Poisson model form. The results showed that speed had a positive relationship with KABC crashes but a negative relationship with PDO crashes. More KABC crashes tend to occur on high-speed roads. In contrast, PDO crashes tend to be more frequent on low-speed roads with poor geometric conditions. Furthermore, this study separated the models for each severity level using speed as a categorizer. The models developed for individual speed ranges revealed that the effect of speed varies across speed ranges. For example, speed had a positive effect on KABC crashes on low- and medium-speed roads, whereas it had a negative effect on crashes on high-speed roads. Further investigation of the study data showed that most low- and medium-speed roads had poor geometric conditions (narrow shoulders and lanes, with sharp curves present), whereas high-speed roads had standard geometric conditions. On low-speed roads in particular, it is understandable that a crash can be severe when speed increases under such restrictive geometric conditions. In contrast, on high-speed roads, the number of severe crashes tends to be low under standard geometric conditions. Additionally, separating the models by speed range provided improvements of 19% and 6.5% for KABC and PDO crashes, respectively, compared to the overall models. Such models can help agencies adopt strategies for minimizing crashes at different severity levels based on the speed conditions of the road. This study further examined the effect of speed using a Random Forest model, since it can handle multicollinearity between explanatory variables and requires no assumptions about functional form. With all traffic and geometric variables included in the model, speed had a variable importance of 11.5%.
Compared to the traditional count model, the Random Forest model provided a better fit, with a 13% improvement in performance. For better predictive performance, planning-level safety analysis can use such machine learning models.

Digital Object Identifier (DOI)