DiffTech: differencing similar technologies from crowd-scale comparison discussions

Han Wang, Chunyang Chen, Zhenchang Xing, John Grundy

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Developers use different technologies for many software development tasks. However, when faced with several technologies with comparable functionality, it is not easy to select the most appropriate one, as trial-and-error comparisons among such technologies are time-consuming. Instead, developers can resort to expert articles, read official documentation, or ask questions on Q&A sites. However, it remains difficult to get a comprehensive comparison, as online information is often fragmented or contradictory. To overcome these limitations, we propose the DiffTech system, which exploits crowdsourced discussions from Stack Overflow and assists technology comparison with an informative summary of different aspects. We first build a large database of comparable technologies in software engineering by mining tags in Stack Overflow. We then locate comparative sentences about comparable technologies with natural language processing methods. We further mine prominent comparison aspects by clustering similar comparative sentences and representing each cluster with its keywords, and we aggregate the overall opinion towards the comparable technologies. Our evaluation demonstrates both the accuracy and usefulness of our model, and we have implemented our approach as a practical website for public use.
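
To make the sentence-location step more concrete, here is a minimal sketch of how comparative sentences about a pair of technologies might be picked out of a post. It is not the DiffTech implementation: the cue list `COMPARATIVE_CUES`, the naive sentence splitter, and the function names are all assumptions made for illustration, and the abstract does not say which patterns or NLP tools the authors use.

```python
import re

# Hypothetical cue list for illustration only; the abstract does not
# specify which comparative patterns the authors actually use.
COMPARATIVE_CUES = re.compile(
    r"\b(?:\w+er than|more \w+ than|less \w+ than|better|worse|faster|slower)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence, tech):
    """True if `tech` appears as a whole token (case-insensitive)."""
    pattern = r"(?<![\w-])" + re.escape(tech) + r"(?![\w-])"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def comparative_sentences(text, tech_a, tech_b):
    """Return sentences that name both technologies and contain a comparative cue."""
    return [
        s
        for s in split_sentences(text)
        if mentions(s, tech_a) and mentions(s, tech_b) and COMPARATIVE_CUES.search(s)
    ]


post = (
    "I tried both libraries last week. "
    "In my tests PostgreSQL was faster than MySQL for complex joins. "
    "MySQL is easy to install."
)
result = comparative_sentences(post, "postgresql", "mysql")
```

Here `result` holds only the second sentence, since it is the only one that names both technologies alongside a comparative cue. A surface pattern like this will also produce false positives (for example, "other than"), which is one reason a real system would need richer linguistic analysis.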

Original language: English
Number of pages: 17
Journal: IEEE Transactions on Software Engineering
Publication status: Accepted/In press - 17 Feb 2021

Keywords

  • Aggregates
  • comparing and differencing similar technology
  • Data mining
  • Libraries
  • natural language processing
  • NLP
  • Stack Overflow
  • Tagging
  • Task analysis
  • Tools
