The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts.

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
We present the Project Dialogism Novel Corpus, or PDNC, an annotated dataset of quotations for English literary texts. PDNC contains annotations for 35,978 quotations across 22 full-length novels, and is by an order of magnitude the largest corpus of its kind. Each quotation is annotated for the speaker, addressees, type of quotation, referring expression, and character mentions within the quotation text. The annotated attributes allow for a comprehensive evaluation of models of quotation attribution and coreference for literary texts.
Keywords
quotation attribution, literature, coreference
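
To make the annotated attributes listed in the abstract concrete, the following is a minimal sketch of how one quotation record might be represented and how speaker attribution could be scored against it. The class name, field names, and the `attribution_accuracy` helper are assumptions chosen for illustration; they are not the corpus's actual file format or the authors' evaluation code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Quotation:
    """One annotated quotation.

    Field names are hypothetical; they mirror the attributes named in the
    abstract, not the real PDNC schema.
    """
    quote_id: str
    text: str
    speaker: str                                          # annotated speaker
    addressees: List[str] = field(default_factory=list)   # annotated addressees
    quote_type: str = ""                                  # type of quotation
    referring_expression: str = ""                        # e.g. "said she"
    mentions: List[str] = field(default_factory=list)     # characters mentioned in the quote


def attribution_accuracy(gold: List[Quotation], predicted: Dict[str, str]) -> float:
    """Fraction of quotations whose predicted speaker matches the annotation.

    `predicted` maps quote_id to a speaker name; missing ids count as errors.
    """
    if not gold:
        return 0.0
    correct = sum(1 for q in gold if predicted.get(q.quote_id) == q.speaker)
    return correct / len(gold)
```

Given a list of such records and a model's predictions keyed by `quote_id`, `attribution_accuracy` returns the share of quotations attributed to the annotated speaker. Comparable scores could be computed per quotation type by filtering the list on `quote_type` first.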