TY - GEN
T1 - A comparative study of supervised learning algorithms for re-opened bug prediction
AU - Xia, Xin
AU - Lo, David
AU - Wang, Xinyu
AU - Yang, Xiaohu
AU - Li, Shanping
AU - Sun, Jianling
PY - 2013/5/13
Y1 - 2013/5/13
N2 - Bug fixing is a time-consuming and costly job that is performed throughout the life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified and closed. However, some bugs have to be reopened. Reopened bugs increase software development and maintenance costs, increase the workload of bug fixers, and might even delay the future delivery of software. Only a few studies have investigated the phenomenon of reopened bug reports. In this paper, we evaluate the effectiveness of various supervised learning algorithms in predicting whether a bug report will be reopened. We choose 7 state-of-the-art classical supervised learning algorithms from the machine learning literature, i.e., kNN, SVM, Simple Logistic, Bayesian Network, Decision Table, CART and LWL, and 3 ensemble learning algorithms, i.e., AdaBoost, Bagging and Random Forest, and evaluate their performance in predicting reopened bug reports. The experimental results show that among the 10 algorithms, Bagging and Decision Table (IDTM) achieve the best performance. They achieve accuracy scores of 92.91% and 92.80%, respectively, and reopened bug report F-Measure scores of 0.735 and 0.732, respectively. These results improve on the reopened bug report F-Measure of the state-of-the-art approaches proposed by Shihab et al. by up to 23.53%.
AB - Bug fixing is a time-consuming and costly job that is performed throughout the life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified and closed. However, some bugs have to be reopened. Reopened bugs increase software development and maintenance costs, increase the workload of bug fixers, and might even delay the future delivery of software. Only a few studies have investigated the phenomenon of reopened bug reports. In this paper, we evaluate the effectiveness of various supervised learning algorithms in predicting whether a bug report will be reopened. We choose 7 state-of-the-art classical supervised learning algorithms from the machine learning literature, i.e., kNN, SVM, Simple Logistic, Bayesian Network, Decision Table, CART and LWL, and 3 ensemble learning algorithms, i.e., AdaBoost, Bagging and Random Forest, and evaluate their performance in predicting reopened bug reports. The experimental results show that among the 10 algorithms, Bagging and Decision Table (IDTM) achieve the best performance. They achieve accuracy scores of 92.91% and 92.80%, respectively, and reopened bug report F-Measure scores of 0.735 and 0.732, respectively. These results improve on the reopened bug report F-Measure of the state-of-the-art approaches proposed by Shihab et al. by up to 23.53%.
KW - bug reports
KW - classification
KW - comparative study
KW - reopened reports
KW - supervised learning algorithms
UR - http://www.scopus.com/inward/record.url?scp=84877280823&partnerID=8YFLogxK
U2 - 10.1109/CSMR.2013.43
DO - 10.1109/CSMR.2013.43
M3 - Conference Paper
AN - SCOPUS:84877280823
SN - 9780769549484
T3 - Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR
SP - 331
EP - 334
BT - Proceedings of the 17th European Conference on Software Maintenance and Reengineering, CSMR 2013
PB - IEEE, Institute of Electrical and Electronics Engineers
T2 - 17th European Conference on Software Maintenance and Reengineering, CSMR 2013
Y2 - 5 March 2013 through 8 March 2013
ER -