An empirical study of classifier combination for cross-project defect prediction

Yun Zhang, David Lo, Xin Xia, Jianling Sun

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

51 Citations (Scopus)

Abstract

To help developers better allocate testing and debugging efforts, many software defect prediction techniques have been proposed in the literature. These techniques can be used to predict classes that are more likely to be buggy based on the past history of buggy classes. They work well as long as a sufficient amount of data is available to train a prediction model. However, there is rarely enough training data for new software projects. To deal with this problem, cross-project defect prediction, which transfers a prediction model trained using data from one project to another, has been proposed and is regarded as a new challenge for defect prediction. So far, only a few cross-project defect prediction techniques have been proposed. To advance the state of the art, in this work we investigate 7 composite algorithms, which integrate multiple machine learning classifiers, to improve cross-project defect prediction. To evaluate the performance of the composite algorithms, we perform experiments on 10 open-source software systems from the PROMISE repository, which together contain 5,305 instances labeled as defective or clean. We compare the composite algorithms with CODEP Logistic, the latest cross-project defect prediction algorithm proposed by Panichella et al., in terms of two standard evaluation metrics: cost effectiveness and F-measure. Our experimental results show that several algorithms outperform CODEP Logistic. Max performs the best in terms of F-measure: its average F-measure exceeds that of CODEP Logistic by 36.88%. Bagging J48 performs the best in terms of cost effectiveness: its average cost effectiveness exceeds that of CODEP Logistic by 15.34%.
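As a rough illustration of the kind of classifier combination and evaluation the abstract refers to, the sketch below combines per-classifier defect probabilities with a maximum rule and scores the result with the F-measure (the harmonic mean of precision and recall). It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the Max algorithm, so the maximum-of-probabilities rule, the 0.5 decision threshold, the function names, and all data values here are hypothetical.

```python
from typing import List, Sequence


def max_combine(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Combine classifier outputs by taking, per instance, the highest
    defect probability assigned by any classifier.

    probabilities[k][i] is the defect probability that classifier k
    assigns to instance i (assumed layout, for illustration only).
    """
    return [max(column) for column in zip(*probabilities)]


def f_measure(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F-measure for the 'defective' class: harmonic mean of precision and recall."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical defect probabilities from three classifiers on four instances.
scores = [
    [0.25, 0.75, 0.5, 0.0],
    [0.5, 0.25, 0.75, 0.25],
    [0.0, 0.5, 0.25, 0.25],
]
combined = max_combine(scores)

# Hypothetical threshold: flag an instance as defective above 0.5.
predicted = [p > 0.5 for p in combined]
actual = [True, True, True, False]  # made-up ground-truth labels
score = f_measure(predicted, actual)
print(combined, predicted, round(score, 2))
```

A maximum rule flags an instance as soon as any single classifier is confident, which tends to favor recall; whether that matches the behavior reported in the paper cannot be determined from the abstract alone.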

Original language: English
Title of host publication: Proceedings - 2015 IEEE 39th Annual Computer Software and Applications Conference, COMPSAC 2015
Editors: Sheikh Iqbal Ahamed, Carl K. Chang, William Chu, Ivica Crnkovic, Pao-Ann Hsiung, Gang Huang, Jingwei Yang
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 264-269
Number of pages: 6
Volume: 2
ISBN (Electronic): 9781467365635, 9781467365642
DOIs: https://doi.org/10.1109/COMPSAC.2015.58
Publication status: Published - 2015
Externally published: Yes
Event: International Computer Software and Applications Conference 2015 - Taichung, Taiwan
Duration: 1 Jul 2015 to 5 Jul 2015
Conference number: 39th
https://www.computer.org/web/compsac

Conference

Conference: International Computer Software and Applications Conference 2015
Abbreviated title: COMPSAC 2015
Country: Taiwan
City: Taichung
Period: 1/07/15 to 5/07/15

Keywords

  • Classifier Combination
  • Cross-Project
  • Defect Prediction

Cite this

Zhang, Y., Lo, D., Xia, X., & Sun, J. (2015). An empirical study of classifier combination for cross-project defect prediction. In S. I. Ahamed, C. K. Chang, W. Chu, I. Crnkovic, P-A. Hsiung, G. Huang, & J. Yang (Eds.), Proceedings - 2015 IEEE 39th Annual Computer Software and Applications Conference, COMPSAC 2015 (Vol. 2, pp. 264-269). [7273627] IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/COMPSAC.2015.58