Automated query reformulation for efficient search based on query logs from Stack Overflow

Kaibo Cao, Chunyang Chen, Sebastian Baltes, Christoph Treude, Xiang Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

58 Citations (Scopus)

Abstract

As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the sheer number of questions and answers on Stack Overflow makes it difficult for developers to efficiently locate the information they are looking for. Two gaps lead to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. As a result, developers often have to reformulate their queries repeatedly, for example by correcting misspelled words or restricting the query to a particular programming language or platform. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. Using query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus of original queries paired with their reformulated versions. Our approach trains a Transformer model that automatically generates candidate reformulations of a user's original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, improving ExactMatch by 5.6% to 33.5% and GLEU by 4.8% to 14.4%.
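The abstract reports results in terms of ExactMatch and GLEU. The sketch below shows one common way to compute both for a generated query against a reference reformulation. It is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes whitespace tokenization, the sentence-level GLEU of Wu et al. (2016) (the variant implemented by NLTK's `sentence_gleu`, with n-gram orders 1 to 4), and scores averaged over all test pairs. The paper's exact metric variant and tokenization may differ, and the function names `sentence_gleu`, `exact_match`, and `evaluate` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_gleu(reference: Sequence[str], hypothesis: Sequence[str],
                  min_len: int = 1, max_len: int = 4) -> float:
    """Sentence-level GLEU (assumed Wu et al., 2016 variant):
    min(n-gram precision, n-gram recall), pooled over orders min_len..max_len."""
    matches = hyp_total = ref_total = 0
    for n in range(min_len, max_len + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped overlap: each n-gram counts at most min(hyp count, ref count) times.
        matches += sum((hyp_counts & ref_counts).values())
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(ref_counts.values())
    if hyp_total == 0 or ref_total == 0:
        return 0.0
    return min(matches / hyp_total, matches / ref_total)


def exact_match(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """1.0 if the generated query equals the reference token-for-token, else 0.0."""
    return 1.0 if list(reference) == list(hypothesis) else 0.0


def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Average ExactMatch and GLEU over (reference, generated) query strings.
    Assumption: plain whitespace tokenization."""
    if not pairs:
        return 0.0, 0.0
    em = gleu = 0.0
    for ref, hyp in pairs:
        ref_toks, hyp_toks = ref.split(), hyp.split()
        em += exact_match(ref_toks, hyp_toks)
        gleu += sentence_gleu(ref_toks, hyp_toks)
    return em / len(pairs), gleu / len(pairs)


if __name__ == "__main__":
    ref = "how to sort a list in python"
    hyp = "how to sort list in python"
    print(evaluate([(ref, ref), (ref, hyp)]))
```

For the illustrative pair above, the generated query drops the word "a", so it earns no ExactMatch credit but still shares 12 of the reference's 22 n-grams (orders 1 to 4), giving a GLEU of 12/22. This partial credit is why GLEU complements the stricter ExactMatch when judging reformulated queries.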

Original language: English
Title of host publication: Proceedings - 2021 IEEE/ACM 43rd International Conference on Software Engineering, ICSE 2021
Editors: Arie van Deursen, Tao Xie
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1273-1285
Number of pages: 13
ISBN (Electronic): 9780738113197
ISBN (Print): 9781665402965
Publication status: Published - 2021
Event: International Conference on Software Engineering 2021 - Online, Madrid, Spain
Duration: 25 May 2021 to 28 May 2021
Conference number: 43rd
https://conf.researchr.org/committee/icse-2021/icse-2021-organizing-committe
https://conf.researchr.org/home/icse-2021
https://ieeexplore.ieee.org/xpl/conhome/9401807/proceeding (Proceedings)

Publication series

Name: Proceedings - International Conference on Software Engineering
Publisher: The Institute of Electrical and Electronics Engineers, Inc.
ISSN (Print): 0270-5257
ISSN (Electronic): 1558-1225

Conference

Conference: International Conference on Software Engineering 2021
Abbreviated title: ICSE 2021
Country/Territory: Spain
City: Madrid
Period: 25/05/21 to 28/05/21

Keywords

  • Data Mining
  • Deep Learning
  • Query Logs
  • Query Reformulation
  • Stack Overflow
