Duplicate bug report detection using Dual-Channel Convolutional Neural Networks

Jianjun He, Ling Xu, Meng Yan, Xin Xia, Yan Lei

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

1 Citation (Scopus)

Abstract

Developers rely on bug reports to fix bugs. The bug reports are usually stored and managed in bug tracking systems. Due to different expression habits, different reporters may use different expressions to describe the same bug in the bug tracking system. As a result, the bug tracking system often contains many duplicate bug reports. Automatically detecting these duplicate bug reports would save a large amount of effort for bug analysis. Prior studies have found that deep-learning techniques are effective for duplicate bug report detection. Inspired by recent Natural Language Processing (NLP) research, in this paper, we propose a duplicate bug report detection approach based on Dual-Channel Convolutional Neural Networks (DC-CNN). We present a novel bug report pair representation, i.e., a dual-channel matrix built by concatenating two single-channel matrices representing bug reports. Such bug report pairs are fed to a CNN model to capture the correlated semantic relationships between bug reports. Then, our approach uses the association features to classify whether a pair of bug reports is duplicate or not. We evaluate our approach on three large datasets from three open-source projects, namely Open Office, Eclipse, and Net Beans, and on a larger combined dataset; the classification accuracy reaches 0.9429, 0.9685, 0.9534, and 0.9552, respectively. This performance exceeds that of two state-of-the-art approaches that also use deep-learning techniques. The results indicate that our dual-channel matrix representation is effective for duplicate bug report detection.
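The pair representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the embedding method, matrix dimensions, or preprocessing, so the hash-based `toy_embedding`, the constants `EMBED_DIM` and `MAX_WORDS`, and the helper names below are all hypothetical stand-ins. The sketch only shows how two single-channel matrices, one per bug report, might be stacked into one dual-channel input for a CNN.

```python
import hashlib

EMBED_DIM = 4   # hypothetical; trained word embeddings are much wider
MAX_WORDS = 6   # hypothetical fixed report length (pad or truncate)

def toy_embedding(word, dim=EMBED_DIM):
    """Deterministic stand-in for a trained word-embedding lookup (assumption)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    # Map the first `dim` bytes of the hash to floats in [-1, 1].
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]

def single_channel(report, max_words=MAX_WORDS, dim=EMBED_DIM):
    """Turn one report into a max_words x dim matrix, zero-padded at the end."""
    words = report.split()[:max_words]
    rows = [toy_embedding(w, dim) for w in words]
    rows += [[0.0] * dim for _ in range(max_words - len(rows))]
    return rows

def dual_channel(report_a, report_b):
    """Stack two single-channel matrices into a 2 x max_words x dim tensor."""
    return [single_channel(report_a), single_channel(report_b)]

pair = dual_channel("App crashes when saving file", "Crash on save")
shape = (len(pair), len(pair[0]), len(pair[0][0]))
print(shape)  # (2, 6, 4)
```

In this sketch the two reports occupy separate channels of the same input, much like the color channels of an image, so a convolution filter sees aligned positions of both reports at once. Whether the paper aligns and pads reports this way is an assumption.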

Original language: English
Title of host publication: Proceedings - 2020 IEEE/ACM 28th International Conference on Program Comprehension, ICPC 2020
Editors: Yann-Gaël Guéhéneuc, Shinpei Hayashi
Place of Publication: New York NY USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 117-127
Number of pages: 11
ISBN (Electronic): 9781450379588
Publication status: Published - 2020
Event: International Conference on Program Comprehension 2020 - Seoul, Korea, Republic of (South)
Duration: 13 Jul 2020 to 15 Jul 2020
Conference number: 28th
https://dl.acm.org/doi/proceedings/10.1145/3387904 (Proceedings)
https://conf.researchr.org/home/icpc-2020 (Website)

Conference

Conference: International Conference on Program Comprehension 2020
Abbreviated title: ICPC 2020
Country: Korea, Republic of (South)
City: Seoul
Period: 13/07/20 to 15/07/20

Keywords

  • Convolutional Neural Networks
  • Dual-Channel
  • Duplicate Bug Report Detection
  • Software Maintenance
  • Software Quality Assurance
