Deep domain adaptation for vulnerable code function identification

Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier Devel, Paul Montague, Lizhen Qu, Dinh Phung

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

Abstract

Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues arise when using machine learning for SVD: i) how to learn features automatically so as to improve the predictive performance of vulnerability detection, and ii) how to overcome the scarcity of labeled vulnerabilities in projects, where labeling code requires laborious work by software security experts. In this paper, we address these two concerns by proposing a novel architecture that leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles of state-of-the-art deep domain adaptation methods and reapply them to show that deep domain adaptation for SVD is plausible and promising. Moreover, we propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently exploit the information carried in unlabeled target data by treating it as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. Experimental results on six real-world software project datasets show that our SCDAN method and the baselines using our architecture have better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, and the other baselines.

Original language: English
Title of host publication: International Joint Conference on Neural Networks (IJCNN) 2019
Editors: Plamen Angelov, Manuel Roveri
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 9781728119854
ISBN (Print): 9781728119861
DOIs: https://doi.org/10.1109/IJCNN.2019.8851923
Publication status: Published - 2019
Event: IEEE International Joint Conference on Neural Networks 2019 - Budapest, Hungary
Duration: 14 Jul 2019 to 19 Jul 2019
https://www.ijcnn.org/

Conference

Conference: IEEE International Joint Conference on Neural Networks 2019
Abbreviated title: IJCNN 2019
Country: Hungary
City: Budapest
Period: 14/07/19 to 19/07/19
Internet address: https://www.ijcnn.org/

Cite this

Nguyen, V., Le, T., Le, T., Nguyen, K., Devel, O., Montague, P., ... Phung, D. (2019). Deep domain adaptation for vulnerable code function identification. In P. Angelov & M. Roveri (Eds.), International Joint Conference on Neural Networks (IJCNN) 2019 [8851923]. IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/IJCNN.2019.8851923