Mad Libs Are All You Need: Augmenting Cross-Domain Document-Level Event Argument Data
CoRR (2024)
Abstract
Document-Level Event Argument Extraction (DocEAE) is an extremely difficult
information extraction problem, and existing approaches are significantly limited
in low-resource, cross-domain settings. To address this problem, we introduce Mad Lib Aug (MLA),
a novel generative DocEAE data augmentation framework. Our approach leverages
the intuition that Mad Libs, which are categorically masked documents used as a
part of a popular game, can be generated and solved by LLMs to produce data for
DocEAE. Using MLA, we achieve a 2.6-point average improvement in overall F1
score. Moreover, this approach achieves 3.9- and 5.2-point average increases on
zero-shot and few-shot event roles, respectively, compared to augmentation-free
baselines across all experiments.
To better facilitate analysis of cross-domain DocEAE, we additionally
introduce a new metric, Role-Depth F1 (RDF1), which uses statistical depth to
identify roles in the target domain that are semantic outliers with respect to
roles observed in the source domain. Our experiments show that MLA augmentation
can boost RDF1 performance by an average of 5.85 points compared to
non-augmented datasets.
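
The notion of a "categorically masked document" can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify MLA's prompts or pipeline, and the function name `make_mad_lib`, the example sentence, and the role labels below are all hypothetical. The sketch only shows how argument spans in a document could be replaced by role-typed slots to form a Mad Lib template; the LLM steps that generate and solve such templates are omitted.

```python
from typing import List, Tuple

# Hypothetical helper: turn a document plus labeled argument spans into a
# Mad Lib style template by replacing each span with a [ROLE] slot.
def make_mad_lib(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """spans are (start, end, role) character offsets, end exclusive."""
    pieces = []
    cursor = 0
    for start, end, role in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:start])    # unmasked text before the span
        pieces.append(f"[{role.upper()}]")   # categorical slot for the role
        cursor = end
    pieces.append(text[cursor:])             # remaining unmasked text
    return "".join(pieces)

# Illustrative input only; roles and sentence are assumptions, not paper data.
doc = "Acme Corp acquired Widget Inc in Boston."
spans = [(0, 9, "buyer"), (19, 29, "target"), (33, 39, "place")]
template = make_mad_lib(doc, spans)
print(template)
```

In an augmentation setting, filling the slots of such a template with new values would yield a document whose argument spans and roles are known by construction, which is one plausible reading of why Mad Libs suit DocEAE data generation.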