On the application of machine learning models to assess and predict software reusability

Matthew Yit Hang Yeow, Chun Yong Chong, Mei Kuan Lim

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

1 Citation (Scopus)

Abstract

Software reuse has proven to be an effective strategy for developers to significantly increase software quality, reduce costs, and increase the effectiveness of software development. Research in software reuse typically addresses two main hurdles: reducing the time and effort required to identify reusable candidates, and avoiding the selection of low-quality software components that may lead to a higher cost of development (i.e., fixing bugs and errors, refactoring). Human judgment alone falls short in reliability and effectiveness. Hence, this paper investigates the applicability of Machine Learning (ML) algorithms to assessing software reuse. We collected more than 32k open-source projects and used GitHub forks as the ground truth for reuse. We developed ML classification pipelines based on both internal and external software metrics to perform software reuse prediction. Our best-performing ML classification model achieved an accuracy of 86%, outperforming existing research in prediction performance and data coverage. We then use these results to identify the key software characteristics that make software highly reusable. Our results show that size-related metrics (i.e., number of setters, methods, attributes) contribute the most to the reuse of software.
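
As a rough illustration of the setup the abstract describes, the sketch below derives binary reuse labels from fork counts and fits a one-feature threshold classifier on a size metric. The toy records, the median fork threshold, and the decision stump are all assumptions made for illustration. They are not the paper's dataset, labeling rule, or models, and the accuracy is measured on the training data only.

```python
from statistics import median

# Hypothetical toy records: (number of methods, fork count).
# These values are invented for illustration, not taken from the paper.
PROJECTS = [
    (12, 0), (30, 1), (45, 0), (80, 2),
    (150, 9), (220, 15), (310, 40), (500, 120),
]


def label_by_forks(forks, threshold):
    """Mark a project as reused (1) when its fork count exceeds the threshold."""
    return [1 if f > threshold else 0 for f in forks]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(values, labels):
    """Pick the cut on a single metric that maximises training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(values)):
        preds = [1 if v >= cut else 0 for v in values]
        acc = accuracy(labels, preds)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


methods = [m for m, _ in PROJECTS]
forks = [f for _, f in PROJECTS]

# Assumption: the median fork count splits "reused" from "not reused".
labels = label_by_forks(forks, median(forks))

cut = fit_stump(methods, labels)
preds = [1 if m >= cut else 0 for m in methods]
train_acc = accuracy(labels, preds)
print(f"cut at {cut} methods, training accuracy {train_acc:.2f}")
```

A realistic pipeline would use many internal and external metrics, a held-out test split, and stronger classifiers; the stump is only the smallest model that shows how a size-related metric can separate fork-based labels.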

Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Machine Learning Techniques for Software Quality Evaluation
Editors: Maxime Cordy, Xiaofei Xie, Bowen Xu, Bibi Stamatia
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 17-22
Number of pages: 6
ISBN (Electronic): 9781450394567
Publication status: Published - 2022
Event: International Workshop on Machine Learning Techniques for Software Quality Evaluation 2022: co-located with the 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022 - Singapore, Singapore
Duration: 18 Nov 2022 → 18 Nov 2022
Conference number: 6th
https://dl.acm.org/doi/proceedings/10.1145/3549034 (Proceedings)

Conference

Conference: International Workshop on Machine Learning Techniques for Software Quality Evaluation 2022
Abbreviated title: MaLTeSQuE 2022
Country/Territory: Singapore
City: Singapore
Period: 18/11/22 → 18/11/22

Keywords

  • Machine Learning
  • Software Metrics
  • Software Reusability
