The FinBen: An Holistic Financial Benchmark for Large Language Models
CoRR (2024)
Abstract
LLMs have transformed NLP and shown promise in various fields, yet their
potential in finance is underexplored due to a lack of thorough evaluations and
the complexity of financial tasks. This, along with the rapid development of
LLMs, highlights the urgent need for a systematic financial evaluation
benchmark for LLMs. In this paper, we introduce FinBen, the first comprehensive
open-sourced evaluation benchmark, specifically designed to thoroughly assess
the capabilities of LLMs in the financial domain. FinBen encompasses 35
datasets across 23 financial tasks, organized into three spectrums of
difficulty inspired by the Cattell-Horn-Carroll theory, to evaluate LLMs'
cognitive abilities in inductive reasoning, associative memory, quantitative
reasoning, crystallized intelligence, and more. Our evaluation of 15
representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals
insights into their strengths and limitations within the financial domain. The
findings indicate that GPT-4 leads in quantification, extraction, numerical
reasoning, and stock trading, while Gemini shines in generation and
forecasting; however, both struggle with complex extraction and forecasting,
showing a clear need for targeted enhancements. Instruction tuning boosts
simple task performance but falls short in improving complex reasoning and
forecasting abilities. FinBen seeks to continuously evaluate LLMs in finance,
fostering AI development with regular updates of tasks and models.