Dual-component deep domain adaptation: a new approach for cross project software vulnerability detection

Van Nguyen, Trung Le, Olivier de Vel, Paul Montague, John Grundy, Dinh Phung

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

Abstract

Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects that require the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA), which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. The generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space, and it has emerged as a building block for developing deep DA approaches with state-of-the-art performance. However, deep DA approaches that use the GAN principle to close the gap are subject to the mode collapsing problem, which negatively impacts predictive performance. Our aim in this paper is to propose the Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD, and to resolve the mode collapsing problem faced in previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin.

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 24th Pacific-Asia Conference, PAKDD 2020 Singapore, May 11–14, 2020 Proceedings, Part I
Editors: Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, Sinno Jialin Pan
Place of Publication: Cham Switzerland
Publisher: Springer
Pages: 699-711
Number of pages: 13
ISBN (Electronic): 9783030474263
ISBN (Print): 9783030474256
DOIs: https://doi.org/10.1007/978-3-030-47426-3_54
Publication status: Published - 2020
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020 - Singapore, Singapore
Duration: 11 May 2020 – 14 May 2020
Conference number: 24th
https://pakdd2020.org (Website)
https://link.springer.com/book/10.1007/978-3-030-47426-3 (Conference Papers)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12084
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2020
Abbreviated title: PAKDD 2020
Country: Singapore
City: Singapore
Period: 11/05/20 – 14/05/20
Internet address: https://pakdd2020.org

Keywords

  • Cyber security
  • Deep learning
  • Domain adaptation
  • Machine learning
  • Software vulnerability detection

Cite this

Nguyen, V., Le, T., de Vel, O., Montague, P., Grundy, J., & Phung, D. (2020). Dual-component deep domain adaptation: a new approach for cross project software vulnerability detection. In H. W. Lauw, R. C-W. Wong, A. Ntoulas, E-P. Lim, S-K. Ng, & S. J. Pan (Eds.), Advances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020 Singapore, May 11–14, 2020 Proceedings, Part I (pp. 699-711). (Lecture Notes in Computer Science; Vol. 12084). Springer. https://doi.org/10.1007/978-3-030-47426-3_54