AutoSpearman

automatically mitigating correlated software metrics for interpreting defect models

Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Christoph Treude

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

5 Citations (Scopus)

Abstract

The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly used feature selection techniques. Through a case study of 13 publicly available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2 percentage points. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonly used feature selection techniques.
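The abstract does not spell out how AutoSpearman works, so the following is only a minimal sketch of the general idea of correlation-based metric selection, not the authors' algorithm. It computes Spearman rank correlations between metrics and greedily drops one metric from the most correlated pair until no pair reaches a cutoff. The function names, the 0.7 threshold, and the tie-breaking rule are all assumptions made for illustration.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant metric carries no rank information
    return cov / (vx * vy) ** 0.5


def drop_correlated(metrics, threshold=0.7):
    """Greedily drop metrics until no pair has |rho| >= threshold.

    `metrics` maps a metric name to its list of values (one per module).
    The threshold of 0.7 is an assumed, illustrative cutoff.
    """
    kept = dict(metrics)
    while len(kept) > 1:
        rho = {(a, b): abs(spearman(kept[a], kept[b]))
               for a, b in combinations(kept, 2)}
        (a, b), worst = max(rho.items(), key=lambda item: item[1])
        if worst < threshold:
            break

        def mean_rho(name):
            related = [v for pair, v in rho.items() if name in pair]
            return sum(related) / len(related)

        # Assumed rule: drop the pair member that overlaps more with the
        # remaining metrics (the first member is dropped on a tie).
        del kept[a if mean_rho(a) >= mean_rho(b) else b]
    return sorted(kept)
```

Calling `drop_correlated({"loc": [...], "statements": [...], "authors": [...]})` returns the names of the metrics that survive. Because the procedure involves no randomness, the same input always yields the same subset, which is the kind of consistency the abstract is concerned with.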

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Software Maintenance and Evolution - ICSME 2018
Subtitle of host publication: 23–29 September 2018 Madrid, Spain
Editors: Foutse Khomh, David Lo
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 92-103
Number of pages: 12
ISBN (Electronic): 9781538678701
ISBN (Print): 9781538678718
DOI: https://doi.org/10.1109/ICSME.2018.00018
Publication status: Published - 2018
Externally published: Yes
Event: IEEE International Conference on Software Maintenance and Evolution 2018 - Madrid, Spain
Duration: 23 Sep 2018 to 29 Sep 2018
Conference number: 34th
https://icsme2018.github.io/

Conference

Conference: IEEE International Conference on Software Maintenance and Evolution 2018
Abbreviated title: ICSME 2018
Country: Spain
City: Madrid
Period: 23/09/18 to 29/09/18
Internet address: https://icsme2018.github.io/

Keywords

  • Correlated Metrics
  • Defect Prediction
  • Feature Selection
  • Model Interpretation
  • Software Analytics

Cite this

Jiarpakdee, J., Tantithamthavorn, C., & Treude, C. (2018). AutoSpearman: automatically mitigating correlated software metrics for interpreting defect models. In F. Khomh, & D. Lo (Eds.), Proceedings - 2018 IEEE International Conference on Software Maintenance and Evolution - ICSME 2018: 23–29 September 2018 Madrid, Spain (pp. 92-103). [8530020] Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICSME.2018.00018
@inproceedings{f57da6296a3143039aedb184bceaea0c,
title = "AutoSpearman: automatically mitigating correlated software metrics for interpreting defect models",
keywords = "Correlated Metrics, Defect Prediction, Feature Selection, Model Interpretation, Software Analytics",
author = "Jirayus Jiarpakdee and Chakkrit Tantithamthavorn and Christoph Treude",
year = "2018",
doi = "10.1109/ICSME.2018.00018",
language = "English",
isbn = "9781538678718",
pages = "92--103",
editor = "Khomh, Foutse and Lo, David",
booktitle = "Proceedings - 2018 IEEE International Conference on Software Maintenance and Evolution - ICSME 2018",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
address = "United States of America",

}


