Abstract
Software reuse has proven to be an effective strategy for developers to significantly increase software quality, reduce costs, and improve the effectiveness of software development. Research in software reuse typically addresses two main hurdles: reducing the time and effort required to identify reusable candidates, and avoiding the selection of low-quality software components that may lead to higher development costs (e.g., fixing bugs and errors, refactoring). Human judgment alone falls short in reliability and effectiveness. Hence, this paper investigates the applicability of Machine Learning (ML) algorithms in assessing software reuse. We collected more than 32k open-source projects and used GitHub forks as the ground truth for their reuse. We developed ML classification pipelines based on both internal and external software metrics to perform software reuse prediction. Our best-performing ML classification model achieved an accuracy of 86%, outperforming existing research in prediction performance and data coverage. We then use these results to identify the key software characteristics that make software highly reusable. Our results show that size-related metrics (number of setters, methods, and attributes) are the most impactful contributors to software reuse.
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th International Workshop on Machine Learning Techniques for Software Quality Evaluation |
Editors | Maxime Cordy, Xiaofei Xie, Bowen Xu, Bibi Stamatia |
Place of Publication | New York NY USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 17-22 |
Number of pages | 6 |
ISBN (Electronic) | 9781450394567 |
Publication status | Published - 2022 |
Event | International Workshop on Machine Learning Techniques for Software Quality Evaluation 2022, co-located with the 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), Singapore, Singapore. Duration: 18 Nov 2022 → 18 Nov 2022. Conference number: 6th. Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3549034 |
Conference
Conference | International Workshop on Machine Learning Techniques for Software Quality Evaluation 2022 |
---|---|
Abbreviated title | MaLTeSQuE 2022 |
Country/Territory | Singapore |
City | Singapore |
Period | 18/11/22 → 18/11/22 |
Keywords
- Machine Learning
- Software Metrics
- Software Reusability