Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

Interspeech 2024 (2024)

Abstract

Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% [...] Rate and Speaker Confusion Rate, respectively, compared to previous speaker diarization methods. In addition, we benchmark and evaluate the speaker diarization results of the speech foundation models while varying the input audio window size, speaker demographics, and training data ratio. Our results highlight promising pathways for understanding and adopting speech foundation models to facilitate child speech understanding.
