谷歌浏览器插件
订阅小程序
在清言上使用

Combining Code Context and Fine-grained Code Difference for Commit Message Generation

13TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2022(2022)

引用 2|浏览35
暂无评分
摘要
Generating natural language messages for source code changes is an essential task in software development and maintenance. Existing solutions mainly treat a piece of code difference as natural language, and adopt seq2seq learning to translate it into a commit message. The basic assumption of such solutions lies in the naturalness hypothesis, i.e., source code written by programming languages is to some extent similar to natural language text. However, compared with natural language, source code also bears syntactic regularities. In this paper, we propose to simultaneously model the naturalness and syntactic regularities of source code changes for commit message generation. Specifically, to model syntactic regularities, we first enlarge the input with additional context information, i.e., the code statements that have dependency with the variables in the code difference, and then extract the paths in the corresponding ASTs. Moreover, to better model code difference, we align the two versions of code before and after the committed code change at token level, and annotate their differences with fine-grained edit operations. The context and difference are simultaneously encoded in a learning framework to generate the commit messages. We collected from GitHub a large dataset containing 480 Java projects with over 160k commits, and the experimental results demonstrate the effectiveness of the proposed approach.
更多
查看译文
关键词
Commit message generation,software naturalness,code regularity
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要