Limits of Detecting Text Generated by Large-Scale Language Models

2020 Information Theory and Applications Workshop (ITA)

Abstract
Some consider large-scale language models that can generate long and coherent pieces of text dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models, among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
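
To make the hypothesis-testing framing concrete, here is a minimal sketch in standard information-theoretic notation; the symbols P, Q, H, D, and PPL below are ours for illustration (under an i.i.d. simplification; the paper itself works with stationary ergodic processes) and are not quoted from the paper. Let P be the distribution of genuine human text and Q the language model's distribution, and test

$$H_0 : X^n \sim P \ (\text{genuine}) \qquad \text{vs.} \qquad H_1 : X^n \sim Q \ (\text{generated}).$$

By the Chernoff–Stein lemma, with the false-alarm probability (flagging genuine text as generated) held fixed, the best achievable miss probability $\beta_n$ decays with exponent

$$\lim_{n\to\infty} -\frac{1}{n}\log \beta_n = D(P \,\|\, Q).$$

Since the per-token cross-entropy decomposes as $H(P,Q) = H(P) + D(P\,\|\,Q)$ and perplexity (with base-2 logarithms) is $\mathrm{PPL}(Q) = 2^{H(P,Q)}$, it follows that

$$D(P \,\|\, Q) = \log_2 \mathrm{PPL}(Q) - H(P),$$

so as a model's perplexity approaches the entropy rate of human text, the achievable error exponent of any detector shrinks toward zero — consistent with the abstract's claim that error exponents are bounded in terms of perplexity.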
Keywords
large-scale language model output detection,language generation performance,human language,maximum likelihood language models,text detection,k-order Markov approximations,error probabilities,semantic side information