Lessons learned building a legal inference dataset
Artificial Intelligence and Law (2023)
Abstract
Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Korean Natural Language Inference dataset for the legal domain, focusing on criminal court verdicts. We developed an adversarial hypothesis collection tool that challenges the annotators and gives us a deeper understanding of the data, and a hypothesis network construction tool with visualized graphs that demonstrates a use case for the developed model. The data is augmented using a combination of Easy Data Augmentation approaches and round-trip translation, as crowd-sourcing might not be an option for datasets containing sensitive data. We extensively discuss the challenges we encountered, such as the annotators’ limited domain knowledge, issues in the data augmentation process, and problems with handling long contexts, and suggest possible solutions. Our work shows that creating legal inference datasets with limited resources is feasible and proposes further research in this area.
Keywords
Legal inference, Natural language inference, Criminal court data, Data augmentation, Korean dataset
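
The abstract names Easy Data Augmentation (EDA) as one of the augmentation approaches. As a rough illustration only, two of the standard EDA operations, random swap and random deletion, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the whitespace tokenization, and the default parameters are all hypothetical, and the abstract does not say which EDA operations or settings were used. Round-trip translation is not shown because it requires a translation model.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_swaps=1, p_delete=0.1, seed=0):
    """Return two perturbed variants of a sentence.

    Assumption: whitespace tokenization, which is a simplification
    (Korean text would normally call for a proper tokenizer).
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    return [
        " ".join(random_swap(tokens, n_swaps, rng)),
        " ".join(random_deletion(tokens, p_delete, rng)),
    ]
```

In an inference dataset, a perturbed hypothesis may no longer carry its original label, so augmented pairs from a sketch like this would still need to be checked.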