TY - JOUR
T1 - Comparing prediction performance for crash injury severity among various machine learning and statistical methods
AU - Zhang, Jian
AU - Li, Zhibin
AU - Pu, Ziyuan
AU - Xu, Chengcheng
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 61620106002, Grant 71871057, and Grant 51608115, in part by the National Key R&D Program in China under Grant 2016YFB0100906 and Grant 2016YFE0108000, and in part by the Fundamental Research Funds for the Central Universities under Grant 2242018K30015, Grant 2242017K40130, and Grant 2242018K41009.
Publisher Copyright:
© 2018 IEEE.
Copyright:
Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2018
Y1 - 2018
N2 - Crash injury severity prediction is a promising research target in traffic safety. Traditionally, various statistical methods have been used to model crash injury severity. In recent years, machine learning-based methods have become popular because of their good predictive performance. However, machine learning-based models are often criticized for behaving like black boxes. In this paper, we compare the predictive performance, including prediction accuracy and estimation of variable importance, of various machine learning and statistical methods with distinct modeling logic for crash severity analysis. Crash severity, road geometry, and traffic flow data were collected at freeway diverge areas in Florida. We estimated two of the most commonly used statistical methods, the ordered probit (OP) model and the multinomial logit model, and four popular machine learning methods: K-Nearest Neighbor, Decision Tree, Random Forest (RF), and Support Vector Machine. The correct prediction rate for each crash severity level and the overall correct prediction rate were calculated. The results showed that the machine learning methods had higher prediction accuracy than the statistical methods, though they suffered from over-fitting. The RF method gave the best predictions both overall and for severe crashes, while OP was the weakest. We compared variable importance for crash severity via perturbation-based sensitivity analyses. The results showed that inferences of variable importance from different methods were not always consistent and should be interpreted with care.
AB - Crash injury severity prediction is a promising research target in traffic safety. Traditionally, various statistical methods have been used to model crash injury severity. In recent years, machine learning-based methods have become popular because of their good predictive performance. However, machine learning-based models are often criticized for behaving like black boxes. In this paper, we compare the predictive performance, including prediction accuracy and estimation of variable importance, of various machine learning and statistical methods with distinct modeling logic for crash severity analysis. Crash severity, road geometry, and traffic flow data were collected at freeway diverge areas in Florida. We estimated two of the most commonly used statistical methods, the ordered probit (OP) model and the multinomial logit model, and four popular machine learning methods: K-Nearest Neighbor, Decision Tree, Random Forest (RF), and Support Vector Machine. The correct prediction rate for each crash severity level and the overall correct prediction rate were calculated. The results showed that the machine learning methods had higher prediction accuracy than the statistical methods, though they suffered from over-fitting. The RF method gave the best predictions both overall and for severe crashes, while OP was the weakest. We compared variable importance for crash severity via perturbation-based sensitivity analyses. The results showed that inferences of variable importance from different methods were not always consistent and should be interpreted with care.
KW - accuracy
KW - crash severity
KW - machine learning
KW - statistical model
KW - variable importance
UR - https://www.scopus.com/pages/publications/85054608847
U2 - 10.1109/ACCESS.2018.2874979
DO - 10.1109/ACCESS.2018.2874979
M3 - Article
AN - SCOPUS:85054608847
SN - 2169-3536
VL - 6
SP - 60079
EP - 60087
JO - IEEE Access
JF - IEEE Access
ER -