Abstract
Although information theory has found success in other disciplines, the literature on its applications to software evolution is limited. We still lack artifacts that leverage the available data and tooling to measure how the information content of a project can serve as a proxy for its complexity. In this work, we explore two definitions of entropy, one structural and one textual, and apply them to the commit histories of 25 open source projects. We produce evidence that the two are generally highly correlated. We also observe that they display weak and unstable correlations with other complexity metrics. Our preliminary investigation of outliers shows an unexpectedly high frequency of events involving considerable change in the information content of a project, suggesting that such outliers may inform a definition of surprisal.
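As a rough illustration of what a textual entropy could look like, the sketch below computes the Shannon entropy of the whitespace-delimited tokens in a source snapshot and compares two consecutive snapshots. It is only a minimal sketch under stated assumptions: the abstract does not give the paper's actual definitions, so the tokenization, the `shannon_entropy` helper, and the example snapshots are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed token frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical contents of one file at two consecutive commits
# (assumption: whitespace tokenization; the paper may tokenize differently).
before = "a = 1\nb = 2\n".split()
after = "a = 1\nb = 2\nprint(a + b)\n".split()

# Change in entropy between the two snapshots; unusually large changes
# would be candidates for the outliers mentioned in the abstract.
delta = shannon_entropy(after) - shannon_entropy(before)
print(f"entropy before: {shannon_entropy(before):.3f} bits")
print(f"entropy after:  {shannon_entropy(after):.3f} bits")
print(f"delta:          {delta:.3f} bits")
```

Tracking such a per-commit delta across a project's history would give one time series; a structural entropy would give another, and the two could then be compared with a standard correlation coefficient.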
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering, NLBSE 2023 |
Editors | Sebastiano Panichella, Andrea Di Sorbo |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 48-55 |
Number of pages | 8 |
ISBN (Electronic) | 9798350301786 |
ISBN (Print) | 9798350301793 |
Publication status | Published - 2023 |
Event | IEEE/ACM International Workshop on Natural Language-Based Software Engineering 2023, Melbourne, Australia. Duration: 20 May 2023 → 20 May 2023. Conference number: 2nd. Proceedings: https://ieeexplore.ieee.org/xpl/conhome/10189115/proceeding |
Conference
Conference | IEEE/ACM International Workshop on Natural Language-Based Software Engineering 2023 |
---|---|
Abbreviated title | NLBSE 2023 |
Country/Territory | Australia |
City | Melbourne |
Period | 20/05/23 → 20/05/23 |
Keywords
- entropy
- information theory
- software engineering