A systematic literature review and meta-analysis on cross project defect prediction

Seyedrebvar Hosseini, Burak Turhan, Dimuthu Gunarathna

Research output: Contribution to journal › Article › Research › peer-review

28 Citations (Scopus)

Abstract

Background: Cross project defect prediction (CPDP) has recently gained considerable attention, yet there are no systematic efforts to analyse existing empirical evidence. Objective: To synthesise literature to understand the state-of-the-art in CPDP with respect to metrics, models, data approaches, datasets and associated performances. Further, we aim to assess the performance of CPDP vs. within project defect prediction (WPDP) models. Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematic, meta-analysis) to answer research questions. Results: We identified 30 primary studies passing quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most common measures. Models based on Nearest-Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular naive Bayes yields average performance. Performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges using row/column processing, which improve CPDP in terms of recall at the cost of precision. This is observed on multiple occasions, including the meta-analysis of CPDP vs. WPDP. NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently. Conclusion: CPDP is still a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research.

Original language: English
Article number: 8097045
Pages (from-to): 111-147
Number of pages: 37
Journal: IEEE Transactions on Software Engineering
Volume: 45
Issue number: 2
Publication status: Published - 1 Feb 2019
Externally published: Yes

Keywords

  • Bibliographies
  • Context modeling
  • Cross Project
  • Data models
  • Defect Prediction
  • Fault Prediction
  • Measurement
  • Meta-analysis
  • Object oriented modeling
  • Predictive models
  • Systematic Literature Review
  • Systematics
  • Within Project
