A Cross-document Coreference Dataset for Longitudinal Tracking across Radiology Reports

Surabhi Datta, Hio Cheng Lam, Atieh Pajouhi, Sunitha Mogalla, Kirk Roberts

International Conference on Language Resources and Evaluation (LREC) (2022)

Abstract

This paper proposes a new cross-document coreference resolution (CDCR) dataset for identifying co-referring radiological findings and medical devices across a patient's radiology reports. Our annotated corpus contains 5872 mentions (findings and devices) spanning 638 MIMIC-III radiology reports from 60 patients, covers multiple imaging modalities and anatomies, and contains a total of 2292 mention chains. We describe the annotation process in detail, highlighting the complexities involved in creating a sizable and realistic dataset for radiology CDCR. We apply two baseline methods, string matching and transformer language models (BERT), to identify cross-report coreferences. Our results indicate that further model development, targeting a better understanding of domain language and context, is required to address this challenging and unexplored task. This dataset can serve as a resource for developing more advanced natural language processing CDCR methods. This is one of the first efforts to focus on CDCR in the clinical domain, and it has the potential to benefit physicians and clinical research through long-term tracking of radiology findings.
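
The abstract names string matching as one baseline but does not give its details. The sketch below shows one plausible form such a baseline could take, assuming each mention carries a report ID, its surface text, and a category (finding or device), and assuming that mentions of the same category with identical normalized text form one chain. The `Mention` record and the functions `normalize`, `string_match_chains`, and `cross_report_pairs` are hypothetical names for illustration, not the authors' implementation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    # Hypothetical mention record (assumption, not the dataset's schema):
    # source report, surface text, and category ("finding" or "device").
    report_id: str
    text: str
    kind: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def string_match_chains(mentions: list[Mention]) -> list[list[int]]:
    """Group mention indices whose category and normalized text are identical.

    Each group is treated as one coreference chain; singletons are kept so
    that every mention belongs to exactly one chain.
    """
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    for i, m in enumerate(mentions):
        groups[(m.kind, normalize(m.text))].append(i)
    return list(groups.values())


def cross_report_pairs(
    mentions: list[Mention], chains: list[list[int]]
) -> list[tuple[int, int]]:
    """List coreferent index pairs whose mentions come from different reports."""
    pairs = []
    for chain in chains:
        for pos, a in enumerate(chain):
            for b in chain[pos + 1:]:
                if mentions[a].report_id != mentions[b].report_id:
                    pairs.append((a, b))
    return pairs


# Toy example with made-up mentions from three hypothetical reports.
mentions = [
    Mention("r1", "Endotracheal tube", "device"),
    Mention("r2", "endotracheal  tube.", "device"),
    Mention("r2", "pleural effusion", "finding"),
    Mention("r3", "Pleural effusion", "finding"),
    Mention("r3", "ET tube", "device"),
]
chains = string_match_chains(mentions)
print(chains)  # [[0, 1], [2, 3], [4]]
print(cross_report_pairs(mentions, chains))  # [(0, 1), (2, 3)]
```

In this toy example, "ET tube" stays a singleton even though it likely refers to the same device. This shows how exact matching misses abbreviations and synonyms, which is one reason a baseline like this leaves room for models with a better understanding of domain language.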

Keywords

tracking, cross-document coreference resolution, radiology