An annotated dataset of Stack Overflow post edits

Sebastian Baltes, Markus Wagner

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

3 Citations (Scopus)

Abstract

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.

Original language: English
Title of host publication: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
Editors: Carlos A. Coello Coello
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 1923-1925
Number of pages: 3
ISBN (Electronic): 9781450371278
Publication status: Published - 2020
Externally published: Yes
Event: The Genetic and Evolutionary Computation Conference 2020 - Cancun, Mexico
Duration: 8 Jul 2020 to 12 Jul 2020
Conference number: 22nd
https://gecco-2020.sigevo.org/index.html/HomePage
https://dl.acm.org/doi/proceedings/10.1145/3377930 (Proceedings)

Conference

Conference: The Genetic and Evolutionary Computation Conference 2020
Abbreviated title: GECCO 2020
Country/Territory: Mexico
City: Cancun
Period: 8/07/20 to 12/07/20

Keywords

  • Mining software repositories
  • Patches
  • Software documentation
  • Software evolution
  • Stack Overflow
