A study of redundant metrics in defect prediction datasets

Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Akinori Ihara, Kenichi Matsumoto

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Other

23 Citations (Scopus)


Defect prediction models can help Software Quality Assurance (SQA) teams understand the past pitfalls that lead to defective modules. However, conclusions derived from defect prediction models without mitigating redundant metrics may be misleading. In this paper, we investigate whether redundant metrics affect defect prediction studies, and examine the degree and causes of that redundancy. Through a case study of 101 publicly available defect datasets of systems that span both proprietary and open source domains, we observe that (1) 10%-67% of the metrics in the studied defect datasets are redundant, and (2) the redundancy of metrics is related to the aggregation functions of metrics. These findings suggest that researchers should be aware of redundant metrics prior to constructing a defect prediction model in order to maximize the internal validity of their studies.
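
To make the notion of a redundant metric concrete, the following is a minimal sketch, not the procedure used in the paper (the abstract does not specify it). It assumes a simple pairwise rule: a metric is flagged as redundant when its absolute Pearson correlation with an already-kept metric exceeds a threshold. The function names, the 0.9 threshold, and the toy values are all hypothetical and chosen only for illustration; the `loc_sum`/`loc_avg` pair mimics two aggregations of the same underlying measurement.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def find_redundant(metrics, threshold=0.9):
    """Greedy pass in insertion order: a metric is flagged as redundant
    if |r| with any already-kept metric exceeds the threshold.
    The threshold is an assumption, not a value from the paper."""
    kept, redundant = [], []
    for name, values in metrics.items():
        if any(abs(pearson(values, metrics[k])) > threshold for k in kept):
            redundant.append(name)
        else:
            kept.append(name)
    return kept, redundant


# Toy data (hypothetical, not from the studied datasets).
metrics = {
    "loc_sum": [100, 200, 300, 400, 500],
    "loc_avg": [10, 20, 30, 40, 50],  # a rescaled copy of loc_sum
    "churn": [5, 1, 4, 2, 3],
}

kept, redundant = find_redundant(metrics)
print("kept:", kept)
print("redundant:", redundant)
```

Because the pass is greedy, which member of a correlated pair gets flagged depends on the order of the metrics; a pairwise rule also misses metrics that are predictable only from a combination of several others.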

Original language: English
Title of host publication: Proceedings - 2016 IEEE 27th International Symposium on Software Reliability Engineering Workshops
Subtitle of host publication: ISSREW 2016
Editors: Jeremy Bradbury
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 2
ISBN (Electronic): 9781509036011
Publication status: Published - 2016
Externally published: Yes
Event: International Symposium on Software Reliability Engineering 2016 - Ottawa, Canada
Duration: 23 Oct 2016 to 27 Oct 2016
Conference number: 27th


Conference: International Symposium on Software Reliability Engineering 2016
Abbreviated title: ISSRE 2016


  • Defect prediction models
  • Redundant metrics
  • Software quality assurance
