KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation
CoRR (2024)
Abstract
Commit messages are natural language descriptions of code changes, which are
important for software evolution such as code understanding and maintenance.
However, previous methods are trained on the entire dataset without considering
the fact that a portion of commit messages adhere to good practice (i.e.,
good-practice commits), while the rest do not. Our empirical study shows that
training on good-practice commits contributes significantly to commit message
generation. Motivated by this finding, we
propose a novel knowledge-aware denoising learning method called KADEL.
Considering that good-practice commits constitute only a small proportion of
the dataset, we align the remaining training samples with these good-practice
commits. To achieve this, we propose a model that learns commit knowledge
by training on good-practice commits. This knowledge model supplies
additional information for training samples that do not conform to good
practice. However, since the supplementary information may contain noise or
prediction errors, we propose a dynamic denoising training method. This method
combines a distribution-aware confidence function with a dynamic distribution
list to make the training process more effective. Experimental
results on the whole MCMD dataset demonstrate that our method achieves overall
state-of-the-art performance compared with previous methods. Our source code
and data are available at https://github.com/DeepSoftwareAnalytics/KADEL
Keywords
commit message generation, knowledge introducing, denoising training