File-level defect prediction: Unsupervised vs. supervised models

Meng Yan, Yicheng Fang, David Lo, Xin Xia, Xiaohong Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

28 Citations (Scopus)

Abstract

Background: Software defect prediction models can help software quality assurance teams to allocate testing or code review resources. A variety of techniques have been used to build defect prediction models, including supervised and unsupervised methods. Recently, Yang et al. [1] reported the surprising finding that unsupervised models can perform statistically significantly better than supervised models in effort-aware change-level defect prediction. However, little is known about the relative performance of unsupervised and supervised models for effort-aware file-level defect prediction. Goal: Inspired by their work, we aim to investigate whether a similar finding holds in effort-aware file-level defect prediction. Method: We replicate Yang et al.'s study on the PROMISE dataset, which contains ten projects in total, and compare the effectiveness of unsupervised and supervised prediction models for effort-aware file-level defect prediction. Results: We find that the conclusion of Yang et al. [1] does not hold under the within-project setting but does hold under the cross-project setting for file-level defect prediction. In addition, following the recommendations of the best unsupervised model, developers need to inspect statistically significantly more files than with supervised models for the same inspection effort (i.e., LOC). Conclusions: (a) Unsupervised models do not perform statistically significantly better than the state-of-the-art supervised model under the within-project setting; (b) unsupervised models can perform statistically significantly better than the state-of-the-art supervised model under the cross-project setting; (c) we suggest that not only LOC but also the number of files to be inspected should be considered when evaluating effort-aware file-level defect prediction models.
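
As context for the effort-aware setup described above, the sketch below (not taken from the paper) shows one common way such evaluations are carried out: rank files by a model's score, "inspect" them in that order until a fixed fraction of the total LOC (here 20%) is spent, and report both the recall of defective files and the number of files inspected, the two aspects the conclusions recommend considering together. The 1/LOC scorer, the 20% effort cut-off, and the toy data are illustrative assumptions, not the paper's exact experimental design.

from typing import List, Tuple

# Each file is (loc, is_defective, score); a higher score means inspect earlier.
FileRecord = Tuple[int, bool, float]

def effort_aware_eval(files: List[FileRecord], effort_ratio: float = 0.2):
    """Return (recall of defective files, number of files inspected) at the given LOC budget."""
    total_loc = sum(loc for loc, _, _ in files)
    total_defective = sum(1 for _, defective, _ in files if defective)
    budget = effort_ratio * total_loc

    inspected_loc = 0
    inspected_files = 0
    found_defective = 0
    # Inspect files in descending score order until the LOC budget would be exceeded.
    for loc, defective, _ in sorted(files, key=lambda f: f[2], reverse=True):
        if inspected_loc + loc > budget:
            break
        inspected_loc += loc
        inspected_files += 1
        found_defective += int(defective)

    recall = found_defective / total_defective if total_defective else 0.0
    return recall, inspected_files

if __name__ == "__main__":
    # Toy data: (LOC, is_defective). The scorer ranks smaller files first
    # (score = 1 / LOC), a simple unsupervised baseline assumed for illustration.
    raw = [(120, True), (800, False), (45, True), (300, False), (60, False)]
    files = [(loc, bug, 1.0 / loc) for loc, bug in raw]
    recall, n_files = effort_aware_eval(files)
    print(f"Recall@20% LOC = {recall:.2f}, files inspected = {n_files}")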

Original language: English
Title of host publication: Proceedings - 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017
Subtitle of host publication: 9–10 November 2017, Toronto, Ontario, Canada
Editors: Lu Zhang, Thomas Zimmermann
Place of Publication: Piscataway, NJ, USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 344-353
Number of pages: 10
ISBN (Electronic): 9781509040391, 9781509040407
DOIs
Publication status: Published - 2017
Externally published: Yes
Event: International Symposium on Empirical Software Engineering and Measurement 2017 - Toronto, Canada
Duration: 9 Nov 2017 – 10 Nov 2017
Conference number: 11th
http://www.scs.ryerson.ca/eseiw2017/ESEM/

Conference

Conference: International Symposium on Empirical Software Engineering and Measurement 2017
Abbreviated title: ESEM 2017
Country: Canada
City: Toronto
Period: 9/11/17 – 10/11/17
Internet address

Keywords

  • Effort-aware Defect Prediction
  • Inspection Effort
  • Replication Study
