Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the usefulness of these latent SVs for SV prediction. To bridge these gaps, we conduct a large-scale study on the latent vulnerable functions in two commonly used SV datasets and their utilization for function-level and line-level SV predictions. Leveraging the state-of-the-art SZZ algorithm, we identify more than 100k latent vulnerable functions in the studied datasets. We find that these latent functions can increase the number of SVs by 4× on average and correct up to 5k mislabeled functions, yet they have a noise level of around 6%. Despite the noise, we show that the state-of-the-art SV prediction model can significantly benefit from such latent SVs. The improvements are up to 24.5% in the performance (F1-Score) of function-level SV predictions and up to 67% in the effectiveness of localizing vulnerable lines. Overall, our study presents the first promising step toward the use of latent SVs to improve the quality of SV datasets and enhance the performance of SV prediction tasks.

CCS CONCEPTS

  • Security and privacy → Software security engineering.

Original language: English
Title of host publication: Proceedings - 2024 IEEE/ACM 21st International Conference on Mining Software Repositories, MSR 2024
Editors: Alberto Bacchelli, Eleni Constantinou
Place of Publication: New York NY USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 716-727
Number of pages: 12
ISBN (Electronic): 9798400705878
Publication status: Published - 2024
Event: IEEE International Working Conference on Mining Software Repositories 2024 - Lisbon, Portugal
Duration: 15 Apr 2024 to 16 Apr 2024
Conference number: 21st
https://dl.acm.org/doi/proceedings/10.1145/3643991 (Proceedings)
https://2024.msrconf.org/track/msr-2024-technical-papers? (Website)

Conference

Conference: IEEE International Working Conference on Mining Software Repositories 2024
Abbreviated title: MSR 2024
Country/Territory: Portugal
City: Lisbon
Period: 15/04/24 to 16/04/24

Keywords

  • Data quality
  • Deep learning
  • Software security
  • Software vulnerability
  • SZZ algorithm
