Improving code search with co-attentive representation learning

Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, Yan Lei

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

51 Citations (Scopus)

Abstract

Searching and reusing existing code from a large-scale codebase, e.g., GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS), which significantly outperformed prior models. DeepCS embedded the codebase and natural language queries into vectors with two separate LSTM (long short-term memory) models, and returned to developers the code with the highest similarity to a code search query. However, such an embedding method learned two isolated representations for code and query and ignored their internal semantic correlations. As a result, the learned isolated representations of code and query may limit the effectiveness of code search.

To address the aforementioned issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, such a mechanism learns a correlation matrix between the embedded code and query, and co-attends to their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query can directly affect their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that the proposed CARLCS-CNN model significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times faster in testing.
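The following is a minimal PyTorch sketch of the co-attention pooling described in the abstract: a learned correlation matrix between embedded code and query tokens, followed by row/column-wise max-pooling to produce interdependent representations. The bilinear parameter W, the dimension names, and the softmax normalisation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoAttentionPooling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Bilinear parameter correlating code and query embeddings (assumed form).
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, code: torch.Tensor, query: torch.Tensor):
        # code:  (batch, m, dim) -- embedded code tokens
        # query: (batch, n, dim) -- embedded query tokens
        # corr[b, i, j] measures how code token i relates to query token j.
        corr = torch.tanh(code @ self.W @ query.transpose(1, 2))  # (batch, m, n)

        # Row-wise max-pooling: strongest query match for each code token.
        code_scores = corr.max(dim=2).values    # (batch, m)
        # Column-wise max-pooling: strongest code match for each query token.
        query_scores = corr.max(dim=1).values   # (batch, n)

        # Attention weights, then weighted sums yield representations in
        # which code and query directly influence each other.
        code_attn = F.softmax(code_scores, dim=1).unsqueeze(2)    # (batch, m, 1)
        query_attn = F.softmax(query_scores, dim=1).unsqueeze(2)  # (batch, n, 1)
        code_repr = (code * code_attn).sum(dim=1)     # (batch, dim)
        query_repr = (query * query_attn).sum(dim=1)  # (batch, dim)
        return code_repr, query_repr


# Usage: similarity between the co-attended representations ranks
# candidate code snippets against a natural language query.
pool = CoAttentionPooling(dim=128)
code = torch.randn(4, 50, 128)   # 4 snippets, 50 tokens each
query = torch.randn(4, 10, 128)  # 4 queries, 10 tokens each
c, q = pool(code, query)
sim = F.cosine_similarity(c, q, dim=1)
```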

Original language: English
Title of host publication: Proceedings - 2020 IEEE/ACM 28th International Conference on Program Comprehension, ICPC 2020
Editors: Yann-Gaël Guéhéneuc, Shinpei Hayashi
Place of Publication: New York NY USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 196-207
Number of pages: 12
ISBN (Electronic): 9781450379588
DOIs
Publication status: Published - 2020
Event: International Conference on Program Comprehension 2020 - Seoul, Korea, South
Duration: 13 Jul 2020 – 15 Jul 2020
Conference number: 28th
https://dl.acm.org/doi/proceedings/10.1145/3387904 (Proceedings)
https://conf.researchr.org/home/icpc-2020 (Website)

Conference

Conference: International Conference on Program Comprehension 2020
Abbreviated title: ICPC 2020
Country/Territory: Korea, South
City: Seoul
Period: 13/07/20 – 15/07/20

Keywords

  • Co-attention mechanism
  • Code search
  • Representation learning
