
Combining software metrics and text features for vulnerable file prediction

Yun Zhang, David Lo, Xin Xia, Bowen Xu, Jianling Sun, Shanping Li

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

In recent years, to help developers reduce the time and effort required to build highly secure software, a number of prediction models built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose VULPREDICTOR, a novel approach to predicting vulnerable files that combines software metrics with text-mining features to build a composite prediction model. VULPREDICTOR first builds 6 underlying classifiers on a training set of vulnerable and non-vulnerable files represented by their software metrics and text features, and then constructs a meta classifier that processes the outputs of these 6 underlying classifiers. We evaluate our solution on datasets from three web applications, Drupal, PHPMyAdmin and Moodle, which together contain 3,466 files and 223 vulnerabilities. The experimental results show that VULPREDICTOR achieves F1 and EffectivenessRatio@20% scores of up to 0.683 and 75%, respectively. Averaged across the 3 projects, VULPREDICTOR improves on the F1 and EffectivenessRatio@20% scores of the best-performing state-of-the-art approaches proposed by Walden et al. by 46.53% and 14.93%, respectively.
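The two-level design described in the abstract (six underlying classifiers whose outputs feed a meta classifier) is a form of stacked generalization. The sketch below illustrates that idea in plain Python under explicit assumptions: the abstract does not name the six base learners, the exact text features, or the meta learner, so the choices here (nearest-centroid, k-nearest-neighbour and logistic-regression learners, each trained once on a software-metrics view and once on a bag-of-words view, with logistic regression as the meta classifier) are hypothetical, as are the names `fit_stack`, `featurize`, `BASE`, the file record format and the toy data. It is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]  # maps a feature vector to a score in [0, 1]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    # Score = relative closeness to the vulnerable-class centroid (needs both classes).
    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos, neg = centroid(1), centroid(0)
    def predict(x: Vector) -> float:
        dp, dn = math.dist(x, pos), math.dist(x, neg)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)
    return predict

def knn(X: List[Vector], y: List[int], k: int = 3) -> Model:
    # Score = fraction of vulnerable files among the k nearest training files.
    def predict(x: Vector) -> float:
        nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def logistic(X: List[Vector], y: List[int], epochs: int = 300, lr: float = 0.1) -> Model:
    # Logistic regression fitted by stochastic gradient descent.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical: 3 learner types x 2 feature views = 6 underlying classifiers.
BASE = [(view, learner) for view in ("metrics", "text")
        for learner in (nearest_centroid, knn, logistic)]

def featurize(f: Dict, vocab: Sequence[str]) -> Dict[str, Vector]:
    # Hypothetical file record: {"metrics": [numbers], "tokens": [terms]}.
    counts = Counter(f["tokens"])  # bag-of-words term counts
    return {"metrics": [float(v) for v in f["metrics"]],
            "text": [float(counts[t]) for t in vocab]}

def fit_base(files: List[Dict], y: List[int], vocab: Sequence[str]) -> List[Model]:
    feats = [featurize(f, vocab) for f in files]
    return [learner([x[view] for x in feats], y) for view, learner in BASE]

def base_scores(models: List[Model], f: Dict, vocab: Sequence[str]) -> Vector:
    x = featurize(f, vocab)
    return [m(x[view]) for m, (view, _) in zip(models, BASE)]

def fit_stack(files: List[Dict], y: List[int], k: int = 3) -> Callable[[Dict], float]:
    vocab = sorted({t for f in files for t in f["tokens"]})
    # Stratified round-robin folds (each class needs at least 2 files).
    fold: Dict[int, int] = {}
    for label in (0, 1):
        members = [i for i, t in enumerate(y) if t == label]
        for j, i in enumerate(members):
            fold[i] = j % k
    # Out-of-fold base-classifier scores become the meta classifier's training set.
    meta_X: List[Vector] = [[] for _ in files]
    for held_out in range(k):
        train = [i for i in range(len(files)) if fold[i] != held_out]
        models = fit_base([files[i] for i in train], [y[i] for i in train], vocab)
        for i in range(len(files)):
            if fold[i] == held_out:
                meta_X[i] = base_scores(models, files[i], vocab)
    meta = logistic(meta_X, y)             # meta classifier over the 6 scores
    final = fit_base(files, y, vocab)      # base classifiers refit on all files
    return lambda f: meta(base_scores(final, f, vocab))

# Made-up toy data for illustration only (not the paper's datasets); 1 = vulnerable.
FILES = [
    {"metrics": [3.0, 2.5], "tokens": ["$_GET", "query", "echo"]},
    {"metrics": [2.8, 3.0], "tokens": ["$_POST", "query", "exec"]},
    {"metrics": [3.5, 2.0], "tokens": ["$_GET", "echo", "exec"]},
    {"metrics": [3.2, 2.8], "tokens": ["query", "$_POST", "echo"]},
    {"metrics": [0.5, 0.4], "tokens": ["render", "css"]},
    {"metrics": [0.8, 0.6], "tokens": ["template", "render"]},
    {"metrics": [0.4, 0.5], "tokens": ["css", "template"]},
    {"metrics": [0.6, 0.3], "tokens": ["render", "template", "css"]},
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    predict = fit_stack(FILES, LABELS)
    print(predict({"metrics": [3.1, 2.6], "tokens": ["$_GET", "query"]}))
    print(predict({"metrics": [0.5, 0.5], "tokens": ["render", "css"]}))
```

Training the meta classifier on out-of-fold base scores, rather than on scores the base classifiers produce for their own training files, is a common stacking precaution that keeps the meta classifier from learning from overfit outputs; the abstract does not say whether VULPREDICTOR does this.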

Original language: English
Title of host publication: Proceedings - 2015 20th International Conference on Engineering of Complex Computer Systems, ICECCS 2015
Subtitle of host publication: 9–11 December 2015 Gold Coast, Australia
Editors: Yuan-Fang Li
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 40-49
Number of pages: 10
ISBN (Electronic): 9781467385817
Publication status: Published - 2015
Externally published: Yes
Event: IEEE International Conference on Engineering of Complex Computer Systems 2015 - Gold Coast, Australia
Duration: 9 Dec 2015 to 11 Dec 2015
Conference number: 20th
http://iceccs2015.monash.edu.au/2015/index.jsp
https://ieeexplore.ieee.org/xpl/conhome/7381588/proceeding (Proceedings)

Conference

Conference: IEEE International Conference on Engineering of Complex Computer Systems 2015
Abbreviated title: ICECCS 2015
Country/Territory: Australia
City: Gold Coast
Period: 9/12/15 to 11/12/15

Keywords

  • Machine Learning
  • Text Mining
  • Vulnerable File
