Lessons learned building a legal inference dataset

Artificial Intelligence and Law (2023)

Abstract

Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Korean Natural Language Inference dataset for the legal domain, focusing on criminal court verdicts. We developed an adversarial hypothesis collection tool that challenges annotators and gives us a deeper understanding of the data, and a hypothesis network construction tool with visualized graphs that demonstrates a use case for the resulting model. Because crowd-sourcing may not be an option for datasets containing sensitive data, we augment the data using a combination of Easy Data Augmentation approaches and round-trip translation. We discuss at length the challenges we encountered, such as annotators' limited domain knowledge, issues in the data augmentation process, and problems with handling long contexts, and we suggest possible solutions. Our work shows that creating legal inference datasets with limited resources is feasible and calls for further research in this area.
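
The augmentation step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes whitespace tokenization (a real Korean pipeline would likely use a morphological analyzer), implements only two of the Easy Data Augmentation operations (random swap and random deletion), and takes the two translation steps as caller-supplied functions. The names `augment`, `to_pivot`, and `from_pivot` are hypothetical.

```python
import random
from typing import Callable, List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA random swap: exchange two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA random deletion: drop each token with probability p, keep at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def round_trip(text: str,
               to_pivot: Callable[[str], str],
               from_pivot: Callable[[str], str]) -> str:
    """Round-trip translation: source -> pivot language -> source.

    Both translators are assumptions supplied by the caller; no
    particular translation service is implied.
    """
    return from_pivot(to_pivot(text))


def augment(hypothesis: str,
            to_pivot: Callable[[str], str],
            from_pivot: Callable[[str], str],
            seed: int = 0) -> List[str]:
    """Return distinct variants of a hypothesis (hypothetical helper)."""
    rng = random.Random(seed)
    tokens = hypothesis.split()  # assumption: whitespace tokenization
    candidates = [
        " ".join(random_swap(tokens, 1, rng)),
        " ".join(random_deletion(tokens, 0.1, rng)),
        round_trip(hypothesis, to_pivot, from_pivot),
    ]
    seen = {hypothesis}
    variants = []
    for candidate in candidates:
        # Skip variants identical to the original or to an earlier variant.
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants
```

Edits of this kind may change the entailment label of a premise and hypothesis pair (deleting a negation, for example), so augmented pairs would likely need to be re-checked before use.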

Keywords

Legal inference, Natural language inference, Criminal court data, Data augmentation, Korean dataset
