Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights
arXiv (2024)
Abstract
This study explores the challenge of sentence-level AI-generated text
detection within human-AI collaborative hybrid texts. Existing studies of
AI-generated text detection for hybrid texts often rely on synthetic datasets.
These typically involve hybrid texts with a limited number of boundaries. We
contend that studies of detecting AI-generated content within hybrid texts
should cover different types of hybrid texts generated in realistic settings to
better inform real-world applications. Therefore, our study utilizes the
CoAuthor dataset, which includes diverse, realistic hybrid texts generated
through the collaboration between human writers and an intelligent writing
system in multi-turn interactions. We adopt a two-step, segmentation-based
pipeline: (i) detect segments within a given hybrid text where each segment
contains sentences of consistent authorship, and (ii) classify the authorship
of each identified segment. Our empirical findings highlight that (1) detecting
AI-generated sentences in hybrid texts is overall a challenging task because
(1.1) human writers select and even edit AI-generated sentences based on
personal preferences, which makes the authorship of segments harder to identify;
(1.2) the frequent change of authorship between neighboring sentences within
the hybrid text creates difficulties for segment detectors in identifying
authorship-consistent segments; (1.3) the short length of text segments within
hybrid texts provides limited stylistic cues for reliable authorship
determination; (2) before embarking on the detection process, it is beneficial
to assess the average length of segments within the hybrid text. This
assessment aids in deciding whether (2.1) to employ a text segmentation-based
strategy for hybrid texts with longer segments, or (2.2) to adopt a direct
sentence-by-sentence classification strategy for those with shorter segments.
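
The strategy choice in finding (2) can be illustrated with a minimal sketch. It assumes per-sentence authorship labels are available for a sample of the hybrid text (for example, from an annotated subset), and it uses a hypothetical cutoff of 3.0 sentences per segment. The function names, the label strings, and the cutoff are illustrative assumptions, not values taken from the study.

```python
from itertools import groupby
from typing import List, Tuple


def segments_from_labels(labels: List[str]) -> List[Tuple[str, int]]:
    """Collapse per-sentence labels into (author, run_length) segments.

    A segment is a maximal run of neighboring sentences with the same author.
    """
    return [(author, len(list(run))) for author, run in groupby(labels)]


def average_segment_length(labels: List[str]) -> float:
    """Mean number of sentences per authorship-consistent segment."""
    segments = segments_from_labels(labels)
    if not segments:
        return 0.0
    return sum(length for _, length in segments) / len(segments)


def choose_strategy(labels: List[str], threshold: float = 3.0) -> str:
    """Pick a detection strategy from the average segment length.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    if average_segment_length(labels) >= threshold:
        return "segmentation-based"
    return "sentence-by-sentence"


# Frequent authorship changes: segments are short.
frequent_switching = ["human", "human", "ai", "human", "ai", "ai", "ai"]
# Few authorship changes: segments are long.
long_runs = ["human"] * 4 + ["ai"] * 4

print(segments_from_labels(frequent_switching))
print(choose_strategy(frequent_switching))
print(choose_strategy(long_runs))
```

In practice the true labels are unknown at detection time, so the average segment length would have to be estimated, for instance from a small labeled sample of texts produced in the same setting.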