On reliability of patch correctness assessment

Dinh Xuan Bach Le, Lingfeng Bao, David Lo, Xin Xia, Shanping Li, Corina Pasareanu

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

4 Citations (Scopus)

Abstract

Current state-of-the-art automatic software repair (ASR) techniques rely heavily on incomplete specifications, or test suites, to generate repairs. This, however, may cause ASR tools to generate repairs that are incorrect and hard to generalize. To assess patch correctness, researchers have been following two methods separately: (1) automated annotation, wherein patches are automatically labeled by an independent test suite (ITS) - a patch passing the ITS is regarded as correct or generalizable, and incorrect otherwise; and (2) author annotation, wherein authors of ASR techniques manually annotate the correctness labels of patches generated by their own and competing tools. While automated annotation cannot ascertain that a patch is actually correct, author annotation is prone to subjectivity. This concern has caused an ongoing debate on the appropriate ways to assess the effectiveness of the numerous ASR techniques proposed recently. In this work, we propose to assess the reliability of author and automated annotations for patch correctness assessment. We do this by first constructing a gold set of correctness labels for 189 randomly selected patches generated by 8 state-of-the-art ASR techniques, through a user study involving 35 professional developers as independent annotators. By measuring inter-rater agreement as a proxy for annotation quality - as commonly done in the literature - we demonstrate that our constructed gold set is on par with other high-quality gold sets. We then compare labels generated by author and automated annotations with this gold set to assess the reliability of the two patch assessment methodologies. We subsequently report several findings and highlight implications for future studies.
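The two measurements named in the abstract, inter-rater agreement among annotators and agreement of a labeling method with the gold set, can be illustrated with a minimal sketch. The abstract does not say which agreement statistic the authors used; Cohen's kappa for two annotators is chosen here purely as an assumption for illustration. The function names and the six-patch toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def agreement_with_gold(labels, gold):
    """Fraction of patches whose label matches the gold-set label."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(l == g for l, g in zip(labels, gold)) / len(gold)


# Hypothetical toy data (NOT from the paper): labels for six patches.
annotator_1 = ["correct", "incorrect", "incorrect", "correct", "incorrect", "correct"]
annotator_2 = ["correct", "incorrect", "correct", "correct", "incorrect", "correct"]
gold = ["correct", "incorrect", "incorrect", "correct", "incorrect", "correct"]
its_labels = ["correct", "correct", "incorrect", "correct", "correct", "correct"]

kappa = cohen_kappa(annotator_1, annotator_2)
its_agreement = agreement_with_gold(its_labels, gold)
print(f"kappa={kappa:.3f}, ITS agreement with gold={its_agreement:.3f}")
```

With more than two annotators per patch, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the more natural choice; the two-rater form is used here only to keep the sketch short.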

Original language: English
Title of host publication: Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering, ICSE 2019
Editors: Tevfik Bultan, Jon Whittle
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 524-535
Number of pages: 12
ISBN (Electronic): 9781728108698
ISBN (Print): 9781728108704
DOI: https://doi.org/10.1109/ICSE.2019.00064
Publication status: Published - 2019
Event: International Conference on Software Engineering 2019 - Fairmont The Queen Elizabeth Hotel, Montreal, Canada
Duration: 25 May 2019 to 31 May 2019
Conference number: 41st
https://2019.icse-conferences.org/home
https://2019.icse-conferences.org/

Conference

Conference: International Conference on Software Engineering 2019
Abbreviated title: ICSE 2019
Country: Canada
City: Montreal
Period: 25/05/19 to 31/05/19
Other: New Ideas and Emerging Results 2019

Keywords

  • Automated program repair
  • empirical study
  • test case generation

Cite this

Le, D. X. B., Bao, L., Lo, D., Xia, X., Li, S., & Pasareanu, C. (2019). On reliability of patch correctness assessment. In T. Bultan, & J. Whittle (Eds.), Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering, ICSE 2019 (pp. 524-535). [8812054] Piscataway NJ USA: IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICSE.2019.00064