Improving code search with co-attentive representation learning

Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, Yan Lei

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

51 Citations (Scopus)


Searching and reusing existing code from a large-scale codebase, e.g., GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS), which significantly outperformed prior models. DeepCS embedded the codebase and natural language queries into vectors with two separate LSTM (long short-term memory) models, and returned to developers the code with the highest similarity to a code search query. However, this embedding method learned two isolated representations for code and query and ignored their internal semantic correlations. As a result, the learned isolated representations of code and query may limit the effectiveness of code search.

To address this issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, such a mechanism learns a correlation matrix between the embedded code and query, and co-attends their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query can directly affect their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that the proposed CARLCS-CNN model significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times faster in testing.
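The co-attention step described in the abstract (a correlation matrix between embedded code and query, followed by row/column-wise max-pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the `tanh` bilinear form of the correlation matrix, the softmax normalization, the weighted-sum pooling, the toy feature values, and all function names (`co_attend`, `weighted_sum`, and so on) are assumptions made for illustration, and the CNN feature extractor is replaced by hand-written feature rows.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, feats):
    """Combine per-position feature rows into a single vector."""
    return [sum(w * f for w, f in zip(weights, col)) for col in zip(*feats)]


def co_attend(code_feats, query_feats, bilinear):
    """Return co-attended code and query vectors plus their attention weights."""
    # Correlation matrix: entry [i][j] relates code position i to query position j.
    # The tanh(C * U * Q^T) form is an assumption of this sketch.
    query_t = [list(col) for col in zip(*query_feats)]
    raw = matmul(matmul(code_feats, bilinear), query_t)
    correlation = [[math.tanh(v) for v in row] for row in raw]

    # Row-wise max-pooling scores each code position by its best-matching
    # query position; column-wise max-pooling does the same for the query.
    code_weights = softmax([max(row) for row in correlation])
    query_weights = softmax([max(col) for col in zip(*correlation)])

    code_vec = weighted_sum(code_weights, code_feats)
    query_vec = weighted_sum(query_weights, query_feats)
    return code_vec, query_vec, code_weights, query_weights


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy inputs (hypothetical): 3 code positions and 2 query positions, 2 features each.
code_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_feats = [[1.0, 0.0], [0.9, 0.1]]
bilinear = [[1.0, 0.0], [0.0, 1.0]]  # identity stands in for a learned matrix

code_vec, query_vec, code_weights, query_weights = co_attend(
    code_feats, query_feats, bilinear
)
similarity = cosine(code_vec, query_vec)
print(code_weights, query_weights, similarity)
```

In this toy setting the first code position aligns best with the query, so it receives the largest attention weight. Because each side's weights are computed from the shared correlation matrix, the code vector depends on the query and vice versa, which is the interdependence the abstract contrasts with DeepCS's isolated embeddings.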

Original language: English
Title of host publication: Proceedings - 2020 IEEE/ACM 28th International Conference on Program Comprehension, ICPC 2020
Editors: Yann-Gaël Guéhéneuc, Shinpei Hayashi
Place of Publication: New York NY USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 12
ISBN (Electronic): 9781450379588
Publication status: Published - 2020
Event: International Conference on Program Comprehension 2020 - Seoul, Korea, South
Duration: 13 Jul 2020 → 15 Jul 2020
Conference number: 28th


Conference: International Conference on Program Comprehension 2020
Abbreviated title: ICPC 2020
Country/Territory: Korea, South


  • Co-attention mechanism
  • Code search
  • Representation learning
