Can ChatGPT Replace Traditional KBQA Models? An In-Depth Analysis of the Question Answering Performance of the GPT LLM Family

Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, Guilin Qi

The Semantic Web – ISWC 2023, Part I (2023)

Abstract
ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Consequently, there is growing interest in whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although some works have analyzed the question answering performance of ChatGPT, there is still a lack of large-scale, comprehensive testing across various types of complex questions to analyze the model's limitations. In this paper, we present a framework that follows the black-box testing specifications of CheckList proposed by [38]. We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets, comprising six English datasets and two multilingual datasets, with approximately 190,000 test cases in total. In addition to the GPT family of LLMs, we also evaluate the well-known FLAN-T5 to identify commonalities between the GPT family and other LLMs. The dataset and code are available at https://github.com/tan92hl/Complex-Question-Answering-Evaluation-of-GPT-family.git.
Keywords
Large language model, Complex question answering, Knowledge base, ChatGPT, Evaluation, Black-box testing