Automatic generation of pull request descriptions

Zhongxin Liu, Xin Xia, Christoph Treude, David Lo, Shanping Li

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

81 Citations (Scopus)

Abstract

Enabled by the pull-based development model, developers can easily contribute to a project through pull requests (PRs). When creating a PR, developers can add a free-form description to describe what changes are made in this PR and/or why. Such a description is helpful for reviewers and other developers to gain a quick understanding of the PR without touching the details and may reduce the possibility of the PR being ignored or rejected. However, developers sometimes neglect to write descriptions for PRs. For example, in our collected dataset with over 333K PRs, more than 34% of the PR descriptions are empty. To alleviate this problem, we propose an approach to automatically generate PR descriptions based on the commit messages and the added source code comments in the PRs. We regard this problem as a text summarization problem and solve it using a novel sequence-to-sequence model. To cope with out-of-vocabulary words in software artifacts and bridge the gap between the training loss function of the sequence-to-sequence model and the evaluation metric ROUGE, which has been shown to correspond to human evaluation, we integrate the pointer generator and directly optimize for ROUGE using reinforcement learning and a special loss function. We build a dataset with over 41K PRs and evaluate our approach on this dataset through ROUGE and a human evaluation. Our evaluation results show that our approach outperforms two baselines by significant margins.
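The abstract names ROUGE as both the evaluation metric and the target of direct optimization. As a rough illustration of what that metric measures, the following is a minimal sketch of ROUGE-N as n-gram overlap between a generated description and a reference one. It is not the authors' implementation: the function names `ngrams` and `rouge_n`, the lowercasing, and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two strings.

    Assumption: text is lowercased and split on whitespace; real ROUGE
    tooling may tokenize and stem differently.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so repeated n-grams are only credited as often as the reference has them.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    generated = "fix null pointer in parser"
    written = "fix null pointer exception in the parser"
    print(rouge_n(generated, written, n=1))
```

A score of this kind can be computed for any generated description against the developer-written one, which is why it can serve as a reward signal as well as an evaluation measure.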

Original language: English
Title of host publication: Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Editors: Julia Lawall, Darko Marinov
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 176-188
Number of pages: 13
ISBN (Electronic): 9781728125084
ISBN (Print): 9781728125091
Publication status: Published - 2019
Event: Automated Software Engineering Conference 2019 - San Diego, United States of America
Duration: 10 Nov 2019 to 15 Nov 2019
Conference number: 34th
https://2019.ase-conferences.org/ (Conference website)
https://dl.acm.org/doi/proceedings/10.5555/3382508 (Proceedings)

Conference

Conference: Automated Software Engineering Conference 2019
Abbreviated title: ASE 2019
Country/Territory: United States of America
City: San Diego
Period: 10/11/19 to 15/11/19

Keywords

  • Document Generation
  • Pull Request
  • Sequence to Sequence Learning
