A systematic literature review and meta-analysis on cross project defect prediction

Seyedrebvar Hosseini, Burak Turhan, Dimuthu Gunarathna

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Background: Cross-project defect prediction (CPDP) has recently gained considerable attention, yet there have been no systematic efforts to analyse the existing empirical evidence. Objective: To synthesise the literature in order to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets, and associated performance. Further, we aim to assess the performance of CPDP versus within-project defect prediction (WPDP) models. Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematically and via meta-analysis) to answer our research questions. Results: We identified 30 primary studies that passed the quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most commonly reported measures. Models based on Nearest Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular Naive Bayes yields only average performance. The performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges through row/column processing, which improves CPDP recall at the cost of precision. This is observed on multiple occasions, including in the meta-analysis of CPDP vs. WPDP. The NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently. Conclusion: CPDP remains a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research.
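As a quick illustration of the evaluation measures the review aggregates (recall, precision, f-measure, and AUC), the sketch below computes them by hand on made-up defect-prediction labels. All labels, scores, and resulting numbers are hypothetical and are not taken from the study.

```python
# Toy illustration of the four evaluation measures named in the abstract,
# computed from scratch on invented defect-prediction data (1 = defective).

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def auc(y_true, scores):
    # AUC as the probability that a randomly chosen defective module is
    # ranked above a randomly chosen clean one (ties count as half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth, hard predictions, and prediction scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.1, 0.2, 0.7]

p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r, f, auc(y_true, scores))  # 0.75 0.75 0.75 0.9375
```

The recall/precision trade-off the abstract reports (data approaches raising recall at the cost of precision) is visible directly in these formulas: predicting more modules as defective can only keep or grow `tp` (helping recall) while also growing `fp` (hurting precision).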

Original language: English
Article number: 8097045
Pages (from-to): 111-147
Number of pages: 37
Journal: IEEE Transactions on Software Engineering
ISSN: 0098-5589
Volume: 45
Issue number: 2
DOI: 10.1109/TSE.2017.2770124
Publication status: Published - 1 Feb 2019
Externally published: Yes
Link: http://www.scopus.com/inward/record.url?scp=85034218670&partnerID=8YFLogxK

Keywords

  • Bibliographies
  • Context modeling
  • Cross Project
  • Data models
  • Defect Prediction
  • Fault Prediction
  • Measurement
  • Meta-analysis
  • Object oriented modeling
  • Predictive models
  • Systematic Literature Review
  • Systematics
  • Within Project

Cite this

@article{18dbf6a6ac93448fb466c2ba105dc86a,
title = "A systematic literature review and meta-analysis on cross project defect prediction",
abstract = "Background: Cross-project defect prediction (CPDP) has recently gained considerable attention, yet there have been no systematic efforts to analyse the existing empirical evidence. Objective: To synthesise the literature in order to understand the state of the art in CPDP with respect to metrics, models, data approaches, datasets, and associated performance. Further, we aim to assess the performance of CPDP versus within-project defect prediction (WPDP) models. Method: We conducted a systematic literature review. Results from primary studies are synthesised (thematically and via meta-analysis) to answer our research questions. Results: We identified 30 primary studies that passed the quality assessment. Performance measures, except precision, vary with the choice of metrics. Recall, precision, f-measure, and AUC are the most commonly reported measures. Models based on Nearest Neighbour and Decision Tree tend to perform well in CPDP, whereas the popular Naive Bayes yields only average performance. The performance of ensembles varies greatly across f-measure and AUC. Data approaches address CPDP challenges through row/column processing, which improves CPDP recall at the cost of precision. This is observed on multiple occasions, including in the meta-analysis of CPDP vs. WPDP. The NASA and Jureczko datasets seem to favour CPDP over WPDP more frequently. Conclusion: CPDP remains a challenge and requires more research before trustworthy applications can take place. We provide guidelines for further research.",
keywords = "Bibliographies, Context modeling, Cross Project, Data models, Defect Prediction, Fault Prediction, Measurement, Meta-analysis, Object oriented modeling, Predictive models, Systematic Literature Review, Systematics, Within Project",
author = "Seyedrebvar Hosseini and Burak Turhan and Dimuthu Gunarathna",
year = "2019",
month = "2",
day = "1",
doi = "10.1109/TSE.2017.2770124",
language = "English",
volume = "45",
pages = "111--147",
journal = "IEEE Transactions on Software Engineering",
issn = "0098-5589",
publisher = "IEEE",
number = "2",

}

