IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion

Bolun Li, Zhihong Sun, Tao Huang, Hongyu Zhang, Yao Wan, Ge Li, Zhi Jin, Chen Lyu

Proceedings of the ACM on Software Engineering (2024)

Abstract

Code completion aims to enhance programming productivity by predicting potential code based on the current programming context. Recently, pretrained language models (LMs) have become prominent in this field. Various approaches have been proposed to fine-tune LMs using supervised fine-tuning (SFT) techniques for code completion. However, the inherent exposure bias of these models can cause errors to accumulate early in the sequence completion, leading to even more errors in subsequent completions. To address this problem, deep reinforcement learning (DRL) is an alternative technique for fine-tuning LMs for code completion, which can improve the generalization capabilities and overall performance. Nevertheless, integrating DRL-based strategies into code completion faces two major challenges: 1) The dynamic nature of the code context requires the completion model to quickly adapt to changes, which poses difficulties for conventional DRL strategies that focus on delayed rewarding of the final code state. 2) It is difficult to evaluate the correctness of partial code, thus the reward redistribution-based strategies cannot be adapted to code completion. To tackle these challenges, we propose IRCoCo, a code completion-specific DRL-based fine-tuning framework. This framework is designed to provide immediate rewards as feedback for detecting dynamic context changes arising from continuous edits during code completion. With the aid of immediate feedback, the fine-tuned LM can gain a more precise understanding of the current context, thereby enabling effective adjustment of the LM and optimizing code completion in a more refined manner. Experimental results demonstrate that fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the code completion task, outperforming both SFT-based and other DRL-based baselines.
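The contrast the abstract draws between delayed and immediate rewards can be illustrated with a generic reinforcement-learning sketch. This is not IRCoCo's implementation: the abstract does not say how its rewards are computed, so the per-step scores, the function names, and the choice of the mean as the final reward are all assumptions made for illustration. The sketch only shows how the per-step learning signal (the discounted return) differs when the reward arrives once at the end versus at every generated token.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step t (standard RL return)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def delayed_rewards(step_scores: List[float]) -> List[float]:
    """Delayed scheme: only the final step is rewarded.

    Assumption: the final reward is the mean of the hypothetical per-step
    scores, standing in for a metric computed on the finished completion.
    """
    if not step_scores:
        return []
    final = sum(step_scores) / len(step_scores)
    return [0.0] * (len(step_scores) - 1) + [final]


def immediate_rewards(step_scores: List[float]) -> List[float]:
    """Immediate scheme: every generated token receives its own reward."""
    return list(step_scores)


# Hypothetical per-token quality scores for a four-token completion;
# the third token is the faulty one.
scores = [1.0, 1.0, 0.0, 1.0]
delayed = discounted_returns(delayed_rewards(scores), gamma=0.5)
immediate = discounted_returns(immediate_rewards(scores), gamma=0.5)
```

Under the delayed scheme the faulty third token is only reflected in a single averaged end-of-sequence signal that is discounted back through every step, whereas under the immediate scheme the zero reward lands on the step that produced it.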

Keywords

code completion, immediate rewards, reinforcement learning
