Chatbot4QR: interactive query refinement for technical question retrieval

Neng Zhang, Qiao Huang, Xin Xia, Ying Zou, David Lo, Zhenchang Xing

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Technical Q&A sites (e.g., Stack Overflow (SO)) are important resources for developers searching for knowledge about technical problems. The search engines provided by Q&A sites and information retrieval approaches (e.g., word embedding-based ones) have limited capability to retrieve relevant questions when queries are imprecisely specified, for example when they omit important technical details such as the user's preferred programming language. Although many automatic query expansion approaches have been proposed to improve query quality by adding relevant terms, they do not identify the information missing from a query. Moreover, without user involvement, existing query expansion approaches may introduce unexpected terms and lead to undesired results. In this paper, we propose an interactive query refinement approach for question retrieval, named Chatbot4QR, which assists users in recognizing and clarifying technical details missing from their queries and thus retrieves more relevant questions. Chatbot4QR automatically detects missing technical details in a query and generates several clarification questions (CQs) to interact with the user and capture the details they overlooked. To ensure the accuracy of the CQs, we design a heuristic-based approach for CQ generation after building two kinds of technical knowledge bases: a manually categorized result of 1,841 technical tags in SO and the multiple version-frequency information of the tags.
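The idea of detecting a missing technical detail and turning it into a CQ can be illustrated with a minimal sketch. This is not the Chatbot4QR heuristic, which the abstract does not describe in detail: the tiny TAG_CATEGORIES table, the category names, and the functions tokenize, missing_categories, and clarification_questions are all hypothetical stand-ins for the paper's categorized tag knowledge base.

```python
import re

# Hypothetical, tiny stand-in for a categorized tag knowledge base.
# The real knowledge base covers 1,841 manually categorized SO tags.
TAG_CATEGORIES = {
    "programming language": {"java", "python", "c#", "javascript"},
    "database": {"mysql", "postgresql", "sqlite"},
    "operating system": {"linux", "windows", "macos"},
}


def tokenize(query):
    """Lowercase the query and keep characters common in tag names (e.g. 'c#')."""
    raw = re.findall(r"[a-z0-9#+.\-]+", query.lower())
    # Drop trailing sentence punctuation such as the dot in "java."
    return {token.strip(".") for token in raw}


def missing_categories(query, required):
    """Return the required categories for which the query mentions no known tag."""
    tokens = tokenize(query)
    return [c for c in required if not (TAG_CATEGORIES[c] & tokens)]


def clarification_questions(query, required):
    """Generate one clarification question per missing category."""
    questions = []
    for category in missing_categories(query, required):
        options = ", ".join(sorted(TAG_CATEGORIES[category]))
        questions.append(f"Which {category} are you using, e.g., {options}?")
    return questions
```

For a query such as "how to connect to mysql in java", this sketch would ask only about the operating system, since a programming language and a database are already named. Which categories are required for a given query is itself an assumption here and is passed in by the caller.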

We develop a Chatbot4QR prototype that uses 1.88 million SO questions as the repository for question retrieval. To evaluate Chatbot4QR, we conduct six user studies with 25 participants on 50 experimental queries. The results are as follows. (1) On average, 60.8% of the CQs generated for a query are useful for helping the participants recognize missing technical details. (2) Chatbot4QR responds to the participants within approximately 1.3 seconds of receiving a query. (3) The refined queries contribute to retrieving more relevant SO questions than nine baseline approaches. For more than 70% of the participants who have preferred techniques on the query tasks, Chatbot4QR significantly outperforms the state-of-the-art word embedding-based retrieval approach, with an improvement of at least 54.6% in terms of two measurements: Pre@k and NDCG@k. (4) For 48%-88% of the assigned query tasks, the participants obtain more desired results after interacting with Chatbot4QR than by searching directly with Web search engines (e.g., the SO search engine and Google) using the original queries.
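For readers unfamiliar with the two measurements, the following sketch computes them under their standard textbook definitions. It is an assumption that the paper uses exactly these forms; the function names and the choice to derive the ideal ranking from the retrieved list itself are illustrative only.

```python
import math


def precision_at_k(relevance, k):
    """Pre@k: fraction of the top-k results judged relevant (score > 0)."""
    top = relevance[:k]
    return sum(1 for r in top if r > 0) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain: each score is discounted by log2(rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


def ndcg_at_k(relevance, k):
    """NDCG@k: DCG of the ranking divided by the DCG of its ideal reordering.

    Assumption: the ideal ranking is built from the same list of scores.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0
```

Here relevance is a list of per-result relevance scores in ranked order, for example [1, 0, 1, 0, 0] for five retrieved questions of which the first and third are relevant. Pre@k ignores the position of relevant results within the top k, while NDCG@k rewards rankings that place them earlier.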

Original language: English
Number of pages: 26
Journal: IEEE Transactions on Software Engineering
Publication status: Accepted/In press - 12 Aug 2020

Keywords

  • Chatbot
  • Interactive Query Refinement
  • Question Retrieval
  • Stack Overflow
