MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property
CoRR (2024)
Abstract
Large language models (LLMs) have demonstrated impressive performance in
various natural language processing (NLP) tasks. However, there is limited
understanding of how well LLMs perform in specific domains (e.g., the
intellectual property (IP) domain). In this paper, we contribute a new
benchmark, the first Multilingual-oriented quiZ on Intellectual Property
(MoZIP), for the evaluation of LLMs in the IP domain. The MoZIP benchmark
includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question
answering (IPQA), and patent matching (PatentMatch). In addition, we
develop a new IP-oriented multilingual large language model (called MoZi),
a BLOOMZ-based model trained with supervised fine-tuning on
multilingual IP-related text data. We evaluate our proposed MoZi model and four
well-known LLMs (i.e., BLOOMZ, BELLE, ChatGLM and ChatGPT) on the MoZIP
benchmark. Experimental results demonstrate that MoZi outperforms BLOOMZ, BELLE
and ChatGLM by a noticeable margin, while scoring lower than ChatGPT. Notably,
the performance of current LLMs on the MoZIP benchmark leaves much room for
improvement, and even ChatGPT, the strongest model evaluated, does not reach
the passing level. Our source code, data, and models are available at
.