SportQA: A Benchmark for Sports Understanding in Large Language Models
CoRR (2024)
Abstract
A deep understanding of sports, a field rich in strategic and dynamic
content, is crucial for advancing Natural Language Processing (NLP). This holds
particular significance in the context of evaluating and advancing Large
Language Models (LLMs), given the existing gap in specialized benchmarks. To
bridge this gap, we introduce SportQA, a novel benchmark specifically designed
for evaluating LLMs in the context of sports understanding. SportQA encompasses
over 70,000 multiple-choice questions across three distinct difficulty levels,
each targeting different aspects of sports knowledge from basic historical
facts to intricate, scenario-based reasoning tasks. We conducted a thorough
evaluation of prevalent LLMs, mainly utilizing few-shot learning paradigms
supplemented by chain-of-thought (CoT) prompting. Our results reveal that while
LLMs exhibit competent performance in basic sports knowledge, they struggle
with more complex, scenario-based sports reasoning, lagging behind human
expertise. The introduction of SportQA marks a significant step forward in NLP,
offering a tool for assessing and enhancing sports understanding in LLMs.
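The few-shot evaluation with chain-of-thought (CoT) prompting mentioned above can be sketched as a prompt-construction step: one or more worked exemplars are prepended to the target multiple-choice question, ending with a cue for the model to reason before answering. This is a minimal illustration only; the exemplar, question text, and formatting are hypothetical and not drawn from SportQA itself.

```python
# Hypothetical exemplar in the few-shot CoT style; not an actual
# SportQA item.
FEW_SHOT_EXEMPLAR = (
    "Question: How many players does each team field in indoor volleyball?\n"
    "Options: A) 5  B) 6  C) 7  D) 11\n"
    "Reasoning: Indoor volleyball is played six-a-side, so each team "
    "fields six players.\n"
    "Answer: B\n"
)

def build_cot_prompt(question: str, options: list[str]) -> str:
    """Assemble a few-shot CoT prompt: exemplar(s), then the target
    question with lettered options, ending with a reasoning cue."""
    lettered = "  ".join(
        f"{chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)
    )
    return (
        FEW_SHOT_EXEMPLAR
        + "\n"
        + f"Question: {question}\n"
        + f"Options: {lettered}\n"
        + "Reasoning:"
    )

prompt = build_cot_prompt(
    "Which country hosted the first modern Olympic Games?",
    ["France", "Greece", "England", "Italy"],
)
print(prompt)
```

The completed prompt would then be sent to the LLM under evaluation, and the letter following "Answer:" in its continuation compared against the gold option.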