
Prompt-Based Length Controlled Generation with Reinforcement Learning

ICLR 2024

Huawei Noah's Ark Lab

Abstract
Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length-controlled generation with LLMs has emerged as an important topic, as it enables users to fully leverage the capability of LLMs in more real-world scenarios, such as generating an answer or essay of a desired length. In addition, autoregressive generation in LLMs is extremely time-consuming, and the ability to control the generated length can reduce inference cost by limiting it. We therefore propose a prompt-based length control method to achieve high-accuracy length-controlled generation. In particular, we adopt reinforcement learning with reward signals given by either trainable or rule-based reward models, which further enhances the length-control ability of LLMs by rewarding outputs that follow a pre-defined control instruction. To enable rule-based inference, we also introduce a standard prompt extractor that collects the standard control information from users' input. Experiments show that our method significantly improves the accuracy of prompt-based length control on the summarization task using popular datasets like CNNDM and NYT. Both the standard prompt extractor and the RL-tuned model show strong generalization to unseen control prompt templates.
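As a minimal illustration of the rule-based reward idea described in the abstract, a reward model can simply score an output by how closely its length matches the instructed target. The exact reward shape below (a tolerance band with linear decay) is an assumption for illustration, not the paper's actual formulation:

```python
def length_reward(output_tokens: int, target_tokens: int, tolerance: int = 5) -> float:
    """Hypothetical rule-based length reward: 1.0 inside a tolerance band
    around the target length, decaying linearly with excess deviation."""
    deviation = abs(output_tokens - target_tokens)
    if deviation <= tolerance:
        return 1.0
    # Linear decay to 0 over a window of `target_tokens` extra deviation.
    return max(0.0, 1.0 - (deviation - tolerance) / target_tokens)

# Example: an instruction like "summarize in about 50 tokens".
print(length_reward(52, 50))   # within tolerance
print(length_reward(80, 50))   # moderately over budget
print(length_reward(150, 50))  # far over budget
```

In an RL fine-tuning loop, such a scalar reward would be combined with a quality signal and fed to a policy-gradient method (e.g. PPO) so the model learns to respect the length instruction.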
Key words
Text Generation, Length Control, GPT, Large Language Models, Prompt, Reinforcement Learning

Highlights: The paper proposes a prompt-based length control method for generation that uses reinforcement learning with reward models to guide outputs, significantly improving the accuracy of prompt-based length control on summarization tasks.

Method: The approach adopts a prompt-based length-control strategy, guiding the reinforcement learning process with either trainable or rule-based reward models.

Experiments: Experiments on popular datasets such as CNNDM and NYT show that the method effectively improves length-control accuracy, meets real-world needs for generating text of a specified length, and achieves savings in inference cost.