Combining software metrics and text features for vulnerable file prediction

Yun Zhang, David Lo, Xin Xia, Bowen Xu, Jianling Sun, Shanping Li

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

47 Citations (Scopus)

Abstract

In recent years, to help developers reduce the time and effort required to build highly secure software, a number of prediction models built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose a novel approach, VULPREDICTOR, to predict vulnerable files; it analyzes software metrics and text features together to build a composite prediction model. VULPREDICTOR first builds 6 underlying classifiers on a training set of vulnerable and non-vulnerable files represented by their software metrics and text features, and then constructs a meta classifier to process the outputs of the 6 underlying classifiers. We evaluate our solution on datasets from three web applications (Drupal, PHPMyAdmin and Moodle), which together contain 3,466 files and 223 vulnerabilities. The experimental results show that VULPREDICTOR can achieve F1 and EffectivenessRatio@20% scores of up to 0.683 and 75%, respectively. On average across the 3 projects, VULPREDICTOR improves the F1 and EffectivenessRatio@20% scores of the best-performing state-of-the-art approaches proposed by Walden et al. by 46.53% and 14.93%, respectively.
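The two-level design described in the abstract (underlying classifiers whose outputs feed a meta classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which six classifiers or which meta classifier VULPREDICTOR uses, so the sketch assumes just two logistic-regression base models (one per feature view) and a logistic-regression meta model. All feature values, and the names `train_logistic`, `base_outputs` and `predict_vulnerable`, are invented for illustration.

```python
import math

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_logistic(X, y, epochs=1000, lr=1.0):
    """Fit logistic regression with batch gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((weights, bias), xi) - yi
            grad_w = [g + err * v for g, v in zip(grad_w, xi)]
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

# Toy files: (metric features, text features, label); 1 = vulnerable.
# Every value here is made up; real features would come from the code base.
FILES = [
    ([0.9], [0.8], 1), ([0.8], [0.9], 1), ([0.1], [0.2], 0), ([0.2], [0.1], 0),
    ([0.95], [0.85], 1), ([0.85], [0.7], 1), ([0.05], [0.15], 0), ([0.15], [0.3], 0),
]
base_part, meta_part = FILES[:4], FILES[4:]

# Level 0: one base classifier per feature view, trained on the first part.
# (Assumption: two base models here, whereas the paper builds six.)
metric_clf = train_logistic([m for m, _, _ in base_part], [y for _, _, y in base_part])
text_clf = train_logistic([t for _, t, _ in base_part], [y for _, _, y in base_part])

def base_outputs(metrics, text):
    return [predict_proba(metric_clf, metrics), predict_proba(text_clf, text)]

# Level 1: the meta classifier is trained on base outputs for held-out files.
meta_clf = train_logistic(
    [base_outputs(m, t) for m, t, _ in meta_part],
    [y for _, _, y in meta_part],
)

def predict_vulnerable(metrics, text):
    """Composite score: meta classifier applied to the base classifiers' outputs."""
    return predict_proba(meta_clf, base_outputs(metrics, text))

risky = predict_vulnerable([0.9], [0.9])
clean = predict_vulnerable([0.1], [0.1])
print(risky > clean)
```

Training the meta model on files the base models have not seen is one common way to keep it from simply trusting overfitted base outputs; whether the paper uses a held-out split, cross-validation, or another scheme is not stated in the abstract.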

Original language: English
Title of host publication: Proceedings - 2015 20th International Conference on Engineering of Complex Computer Systems, ICECCS 2015
Subtitle of host publication: 9–11 December 2015 Gold Coast, Australia
Editors: Yuan-Fang Li
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 40-49
Number of pages: 10
ISBN (Electronic): 9781467385817
Publication status: Published - 2015
Externally published: Yes
Event: IEEE International Conference on Engineering of Complex Computer Systems 2015 - Gold Coast, Australia
Duration: 9 Dec 2015 to 11 Dec 2015
Conference number: 20th
http://iceccs2015.monash.edu.au/2015/index.jsp
https://ieeexplore.ieee.org/xpl/conhome/7381588/proceeding (Proceedings)

Conference

Conference: IEEE International Conference on Engineering of Complex Computer Systems 2015
Abbreviated title: ICECCS 2015
Country/Territory: Australia
City: Gold Coast
Period: 9/12/15 to 11/12/15

Keywords

  • Machine Learning
  • Text Mining
  • Vulnerable File
