Abstract
Searching and reusing existing code from a large-scale codebase, e.g., GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS), which significantly outperformed prior models. DeepCS embedded the codebase and natural language queries into vectors with two separate LSTM (long short-term memory) models, and returned to developers the code with higher similarity to a code search query. However, such an embedding method learned two isolated representations for code and query but ignored their internal semantic correlations. As a result, the learned isolated representations of code and query may limit the effectiveness of code search. To address the aforementioned issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, such a mechanism learns a correlation matrix between the embedded code and query, and co-attends to their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query can directly affect their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that the proposed CARLCS-CNN model significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times faster in testing.
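The co-attention idea in the abstract (a correlation matrix between code and query token embeddings, pooled row-wise and column-wise into attention weights) can be sketched roughly as follows. This is a minimal illustration with random weights and made-up dimensions, not the paper's implementation; `U`, `n_c`, `n_q`, and `d` are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes: a code snippet embedded as n_c token vectors and a
# query as n_q token vectors, each of dimension d.
rng = np.random.default_rng(0)
n_c, n_q, d = 6, 4, 8
C = rng.standard_normal((n_c, d))   # code token embeddings
Q = rng.standard_normal((n_q, d))   # query token embeddings
U = rng.standard_normal((d, d))     # bilinear weight (learned in practice)

# Correlation matrix: F[i, j] scores how well code token i matches
# query token j.
F = np.tanh(C @ U @ Q.T)            # shape (n_c, n_q)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Row-wise max-pooling gives one score per code token; column-wise
# max-pooling gives one score per query token. Softmax turns each set
# of scores into attention weights.
a_c = softmax(F.max(axis=1))        # attention over code tokens, (n_c,)
a_q = softmax(F.max(axis=0))        # attention over query tokens, (n_q,)

# Pooled representations: each side's vector now depends on how strongly
# its tokens correlate with the other side, so the two representations
# are interdependent rather than isolated.
c_vec = a_c @ C                     # (d,)
q_vec = a_q @ Q                     # (d,)

# Retrieval score: cosine similarity between code and query vectors.
score = c_vec @ q_vec / (np.linalg.norm(c_vec) * np.linalg.norm(q_vec))
```

At search time, candidates would be ranked by `score` against the query; the 26.72% MRR improvement reported above comes from the trained model, not this toy sketch.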
Original language | English |
---|---|
Title of host publication | Proceedings - 2020 IEEE/ACM 28th International Conference on Program Comprehension, ICPC 2020 |
Editors | Yann-Gaël Guéhéneuc, Shinpei Hayashi |
Place of Publication | New York NY USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 196-207 |
Number of pages | 12 |
ISBN (Electronic) | 9781450379588 |
DOIs | |
Publication status | Published - 2020 |
Event | International Conference on Program Comprehension 2020 - Seoul, Korea, South Duration: 13 Jul 2020 → 15 Jul 2020 Conference number: 28th https://dl.acm.org/doi/proceedings/10.1145/3387904 (Proceedings) https://conf.researchr.org/home/icpc-2020 (Website) |
Conference
Conference | International Conference on Program Comprehension 2020 |
---|---|
Abbreviated title | ICPC 2020 |
Country/Territory | Korea, South |
City | Seoul |
Period | 13/07/20 → 15/07/20 |
Internet address | |
Keywords
- Co-attention mechanism
- Code search
- Representation learning
Projects
- 1 Finished
- An Intelligent Programmer’s Assistant Using Data Mining
Xia, X.
Australian Research Council (ARC)
1/01/20 → 1/05/21
Project: Research