Neural-machine-translation-based commit message generation: how far are we?

Zhongxin Liu, David Lo, Xin Xia, Zhenchang Xing, Ahmed E. Hassan, Xinyu Wang

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

178 Citations (Scopus)

Abstract

Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes and are hence useful for program comprehension and software maintenance. However, due to a lack of time and direct motivation, developers sometimes neglect commit messages. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT), which leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why it performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.'s dataset are noisy, either because they were automatically generated or because they describe repetitive trivial changes; and (3) the performance of NMT declines by a large amount after removing such noisy commit messages. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we also discuss some observations for the road ahead for automated commit message generation to inspire other researchers.
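
The nearest-neighbor idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how diffs are tokenized or compared, so the whitespace tokenizer, the bag-of-words cosine similarity, and the names `tokenize`, `cosine`, and `nearest_neighbor_message` are all assumptions made for illustration. The sketch simply reuses the commit message of the training diff that is most similar to the new diff at the token level.

```python
from collections import Counter
from math import sqrt


def tokenize(text):
    """Split on whitespace (assumed stand-in for a real diff tokenizer)."""
    return text.split()


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_message(new_diff, training_pairs):
    """Return the message of the training diff most similar to new_diff.

    training_pairs is a list of (diff, message) tuples; returns None
    when the list is empty.
    """
    query = Counter(tokenize(new_diff))
    best_message, best_score = None, -1.0
    for diff, message in training_pairs:
        score = cosine(query, Counter(tokenize(diff)))
        if score > best_score:
            best_message, best_score = message, score
    return best_message


# Hypothetical toy data, not taken from Jiang et al.'s dataset.
training_pairs = [
    ("- version = 1.0 + version = 1.1", "Bump version to 1.1"),
    ("+ import logging + logging . info ( msg )", "Add logging"),
]
print(nearest_neighbor_message("- version = 1.1 + version = 1.2", training_pairs))
```

Because no model is trained, all of the work here is a similarity search over the training diffs, which is consistent with the abstract's description of the approach as simpler and faster than NMT.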

Original language: English
Title of host publication: ASE'18 - Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
Subtitle of host publication: September 3–7, 2018 Montpellier, France
Editors: Gordon Fraser, Christian Kastner
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 373-384
Number of pages: 12
ISBN (Electronic): 9781450359375
Publication status: Published - 2018
Event: Automated Software Engineering Conference 2018 - Corum Conference Center, Montpellier, France
Duration: 3 Sept 2018 to 7 Sept 2018
Conference number: 33rd
http://www.ase2018.com/ (Conference website)
https://dl.acm.org/doi/proceedings/10.1145/3238147 (Proceedings)

Conference

Conference: Automated Software Engineering Conference 2018
Abbreviated title: ASE 2018
Country/Territory: France
City: Montpellier
Period: 3/09/18 to 7/09/18

Keywords

  • Commit message generation
  • Nearest neighbor algorithm
  • Neural machine translation
