
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2024)

Abstract
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality. However, the lengthy code generated by LLMs in response to complex human requirements makes RL exploration a challenge. Also, since the unit tests may not cover the complicated code, optimizing LLMs by using these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation, consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization. In addition, we construct the APPS+ dataset for RL training, which is manually verified to ensure the correctness of unit tests. Experimental results show that our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks. Our dataset APPS+ and StepCoder are available online.
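To illustrate the FGO idea described above, here is a minimal sketch of how a policy-gradient loss could mask tokens from unexecuted code segments. This is not the authors' implementation: the `executed_mask` input is assumed to come from a unit-test coverage trace, and names such as `policy_logits` and `advantages` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def fgo_policy_loss(policy_logits: torch.Tensor,   # (T, vocab) logits for the generated tokens
                    actions: torch.Tensor,         # (T,) sampled token ids
                    advantages: torch.Tensor,      # (T,) per-token advantage estimates
                    executed_mask: torch.Tensor    # (T,) 1.0 if the token lies in code executed by the tests
                    ) -> torch.Tensor:
    """Policy-gradient loss that ignores tokens in unexecuted code segments (FGO-style)."""
    log_probs = F.log_softmax(policy_logits, dim=-1)
    action_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    per_token_loss = -(advantages * action_log_probs)
    # Zero out losses for unexecuted tokens so they contribute no gradient,
    # then average only over the tokens the unit tests actually exercised.
    masked = per_token_loss * executed_mask
    return masked.sum() / executed_mask.sum().clamp(min=1.0)
```

Under this sketch, segments the unit tests never reach carry no learning signal, which matches the abstract's claim that optimizing on unexecuted snippets is ineffective.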