It takes two to tango: deleted stack overflow question prediction with text and meta features

Xin Xia, David Lo, Denzil Correa, Ashish Sureka, Emad Shihab

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

13 Citations (Scopus)


Stack Overflow is a popular community-based Q&A website that caters to the technical needs of software developers. As of February 2015, Stack Overflow had more than 3.9M registered users, 8.8M questions, and 41M comments. Stack Overflow provides explicit and detailed guidelines on how to post questions, but some questions are still of very poor quality. Such questions are deleted by experienced community members and moderators. Deleted questions increase maintenance cost and have an adverse impact on the user experience. Therefore, predicting deleted questions is an important task. In this study, we propose a two-stage hybrid approach, DelPredictor, which combines text processing and classification techniques to predict deleted questions. In the first stage, DelPredictor converts text in the title, body, and tag fields of questions into numerical textual features via text processing and classification techniques. In the second stage, it extracts meta features that can be categorized into profile, community, content, and syntactic features. Next, it learns and combines two independent classifiers built on the textual and meta features. We evaluate DelPredictor on 5 years (2008 - 2013) of deleted questions from Stack Overflow. Our experimental results show that DelPredictor improves the F1-scores over baseline prediction, a prior approach [12], and a text-based approach by 29.50%, 9.34%, and 28.11%, respectively.
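The idea of training two independent classifiers and merging their outputs can be illustrated with a minimal sketch. The abstract does not say which classifiers DelPredictor uses or how it combines them, so everything below is an assumption made for illustration: a naive Bayes model stands in for the text classifier, a logistic regression stands in for the meta-feature classifier, and the two probabilities are merged by a weighted average. The names TextNB, MetaLogReg, predict_deleted, alpha, the three meta features, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TextNB:
    """Multinomial naive Bayes with add-one smoothing (stand-in text classifier)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_deleted(self, doc):
        total_docs = sum(self.n_docs.values())
        log_post = {}
        for y in (0, 1):
            # The extra +1 reserves probability mass for unseen tokens.
            denom = sum(self.counts[y].values()) + len(self.vocab) + 1
            lp = math.log((self.n_docs[y] + 1) / (total_docs + 2))
            for tok in tokenize(doc):
                lp += math.log((self.counts[y][tok] + 1) / denom)
            log_post[y] = lp
        m = max(log_post.values())  # subtract the max for numerical stability
        e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
        return e1 / (e0 + e1)


class MetaLogReg:
    """Logistic regression trained by plain SGD (stand-in meta classifier)."""

    def fit(self, X, y, lr=0.1, epochs=500):
        self.w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias term
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                err = self.prob_deleted(xi) - yi
                self.w[0] -= lr * err
                for j, v in enumerate(xi):
                    self.w[j + 1] -= lr * err * v
        return self

    def prob_deleted(self, x):
        z = self.w[0] + sum(w * v for w, v in zip(self.w[1:], x))
        return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true, y_pred):
    """F1 for the positive (deleted) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data (invented): text, [scaled reputation, scaled body length, has code], deleted?
questions = [
    ("plz help urgent code not working", [0.0, 0.1, 0.0], 1),
    ("how to fix this error asap", [0.1, 0.1, 0.0], 1),
    ("why does python list sort mutate in place", [0.8, 0.6, 1.0], 0),
    ("difference between mutex and semaphore in c", [0.9, 0.7, 1.0], 0),
]
texts = [q[0] for q in questions]
metas = [q[1] for q in questions]
labels = [q[2] for q in questions]

text_clf = TextNB().fit(texts, labels)
meta_clf = MetaLogReg().fit(metas, labels)


def predict_deleted(text, meta, alpha=0.5):
    """Weighted average of the two classifiers' probabilities (assumed rule)."""
    score = alpha * text_clf.prob_deleted(text) + (1 - alpha) * meta_clf.prob_deleted(meta)
    return score >= 0.5
```

Calling predict_deleted("urgent plz fix my code", [0.05, 0.1, 0.0]) flags the question as likely to be deleted on this toy data. The weight alpha controls how much the text classifier counts relative to the meta classifier; in practice it would be tuned on held-out data, and f1_score is the kind of metric the abstract reports for comparing approaches.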

Original language: English
Title of host publication: Proceedings - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016
Subtitle of host publication: 10–14 June 2016, Atlanta, Georgia
Editors: Sorel Reisman, Sheikh Iqbal Ahamed, Ling Liu, Dejan Milojicic, William Claycomb, Mihhail Matskin, Hiroyuki Sato, Zhiyong Zhang
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 10
ISBN (Electronic): 9781467388450
Publication status: Published - 2016
Externally published: Yes
Event: International Computer Software and Applications Conference 2016 - Atlanta, United States of America
Duration: 10 Jun 2016 to 14 Jun 2016
Conference number: 40th (Proceedings)


Conference: International Computer Software and Applications Conference 2016
Abbreviated title: COMPSAC 2016
Country: United States of America


  • Classification
  • Deleted Question
  • Stack Overflow
  • Text Processing
