Abstract
Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the usefulness of these latent SVs for SV prediction. To bridge these gaps, we conduct a large-scale study on the latent vulnerable functions in two commonly used SV datasets and their utilization for function-level and line-level SV predictions. Leveraging the state-of-the-art SZZ algorithm, we identify more than 100k latent vulnerable functions in the studied datasets. We find that these latent functions can increase the number of SVs by 4× on average and correct up to 5k mislabeled functions, yet they have a noise level of around 6%. Despite the noise, we show that the state-of-the-art SV prediction model can significantly benefit from such latent SVs. The improvements are up to 24.5% in the performance (F1-Score) of function-level SV predictions and up to 67% in the effectiveness of localizing vulnerable lines. Overall, our study presents the first promising step toward the use of latent SVs to improve the quality of SV datasets and enhance the performance of SV prediction tasks.
CCS Concepts
- Security and privacy → Software security engineering
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2024 IEEE/ACM 21st International Conference on Mining Software Repositories, MSR 2024 |
| Editors | Alberto Bacchelli, Eleni Constantinou |
| Place of Publication | New York NY USA |
| Publisher | IEEE, Institute of Electrical and Electronics Engineers |
| Pages | 716-727 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798400705878 |
| Publication status | Published - 2024 |
| Event | IEEE International Working Conference on Mining Software Repositories 2024 - Lisbon, Portugal; Duration: 15 Apr 2024 → 16 Apr 2024; Conference number: 21st; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3643991; Website: https://2024.msrconf.org/track/msr-2024-technical-papers? |
Conference
| Conference | IEEE International Working Conference on Mining Software Repositories 2024 |
|---|---|
| Abbreviated title | MSR 2024 |
| Country/Territory | Portugal |
| City | Lisbon |
| Period | 15/04/24 → 16/04/24 |
Keywords
- Data quality
- Deep learning
- Software security
- Software vulnerability
- SZZ algorithm