A Large-Scale Evaluation of Speech Foundation Models
IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024)
Abstract
The foundation model paradigm leverages a shared foundation model to achieve
state-of-the-art (SOTA) performance for various tasks, requiring minimal
downstream-specific modeling and data annotation. This approach has proven
crucial in the field of Natural Language Processing (NLP). However, the speech
processing community lacks a similar setup to explore the paradigm
systematically. In this work, we establish the Speech processing Universal
PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for
speech. We propose a unified multi-tasking framework to address speech
processing tasks in SUPERB using a frozen foundation model followed by
task-specialized, lightweight prediction heads. Combining our results with
community submissions, we verify that the foundation model paradigm is
promising for speech, and our multi-tasking framework is simple yet effective,
as the best-performing foundation model shows competitive generalizability
across most SUPERB tasks. For reproducibility and extensibility, we have
developed a long-term maintained platform that enables deterministic
benchmarking, allows for result sharing via an online leaderboard, and promotes
collaboration through a community-driven benchmark database to support new
development cycles. Finally, we conduct a series of analyses to offer an
in-depth understanding of SUPERB and speech foundation models, including
information flows across tasks inside the models, the correctness of the
weighted-sum benchmarking protocol, and the statistical significance and
robustness of the benchmark.
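The weighted-sum protocol referenced above combines the frozen foundation model's per-layer hidden states into a single representation for each lightweight prediction head, using softmax-normalized learnable layer weights. A minimal pure-Python sketch of that combination step (the function name is illustrative, not from the SUPERB codebase):

```python
import math

def layer_weighted_sum(hidden_states, layer_logits):
    """Softmax-normalize learnable layer logits and mix frozen
    per-layer feature vectors into one downstream representation."""
    m = max(layer_logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in layer_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    return [
        sum(weights[i] * hidden_states[i][d] for i in range(len(weights)))
        for d in range(dim)
    ]

# Toy example: 3 frozen layers, 4-dim features per layer.
feats = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 3.0, 4.0, 5.0],
         [3.0, 4.0, 5.0, 6.0]]
rep = layer_weighted_sum(feats, [0.0, 0.0, 0.0])
# Equal logits give uniform weights, i.e. approximately the plain
# layer average [2.0, 3.0, 4.0, 5.0].
```

During downstream training only the layer logits and the small prediction head receive gradients; the foundation model's parameters stay frozen, which is what keeps the per-task cost low.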
Keywords
speech, foundation model, self-supervised learning, representation learning, task generalization, benchmark, evaluation