MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction
CoRR(2024)
Abstract
Automatic Mean Opinion Score (MOS) prediction is used to evaluate the
quality of synthetic speech. This study extends predicted MOS to the task
of Fake Audio Detection (FAD), on the expectation that MOS indicates how
close synthesized speech is to the natural human voice. We propose MOS-FAD,
which leverages MOS at two key points in FAD: training data selection and
model fusion. For training data selection, we show that MOS enables effective
filtering of samples from unbalanced datasets. For model fusion, our results
show that incorporating MOS as a gating mechanism in FAD model fusion
improves overall performance.
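The abstract names two uses of predicted MOS but does not give the exact selection rule or gate. The sketch below is a minimal illustration under stated assumptions, not the authors' method. `Sample`, `select_by_mos`, `mos_gated_fusion`, the 1 to 5 MOS scale, the threshold-and-cap filtering rule, and the logistic gate with its `center` and `sharpness` parameters are all hypothetical choices made here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    path: str
    label: str   # "bonafide" or "spoof"
    mos: float   # predicted MOS, assumed to lie on a 1-5 scale


def select_by_mos(samples, spoof_mos_min, max_spoof_per_real=1.0):
    """Hypothetical MOS-based training data selection.

    Keeps every bona fide sample, keeps only spoofed samples whose predicted
    MOS is at least `spoof_mos_min` (the more natural-sounding fakes), and caps
    the number of kept spoofed samples to rebalance an unbalanced dataset,
    preferring the highest-MOS ones.
    """
    real = [s for s in samples if s.label == "bonafide"]
    spoof = sorted(
        (s for s in samples if s.label == "spoof" and s.mos >= spoof_mos_min),
        key=lambda s: s.mos,
        reverse=True,
    )
    cap = int(len(real) * max_spoof_per_real)
    return real + spoof[:cap]


def mos_gated_fusion(score_a, score_b, mos, center=3.0, sharpness=2.0):
    """Hypothetical MOS gate for fusing two FAD model scores.

    A logistic function of predicted MOS yields a weight g in (0, 1);
    high-MOS inputs lean on model A (assumed here to be the model that is
    stronger on high-quality fakes), low-MOS inputs lean on model B.
    """
    g = 1.0 / (1.0 + math.exp(-sharpness * (mos - center)))
    return g * score_a + (1.0 - g) * score_b


if __name__ == "__main__":
    data = [
        Sample("r1.wav", "bonafide", 4.2),
        Sample("s1.wav", "spoof", 3.9),
        Sample("s2.wav", "spoof", 2.1),
        Sample("s3.wav", "spoof", 3.5),
    ]
    kept = select_by_mos(data, spoof_mos_min=3.0)
    print([s.path for s in kept])            # ['r1.wav', 's1.wav']
    print(mos_gated_fusion(0.8, 0.2, mos=3.0))  # equal weighting at the gate center
```

With one bona fide sample and a 1:1 cap, only the highest-MOS spoof above the threshold (`s1.wav`) survives. A soft logistic gate is used instead of a hard MOS switch so that the fused score varies smoothly with predicted quality; a learned gate would be an equally plausible reading of the abstract.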
Keywords
MOS prediction, self-supervised learned (SSL) model, model fusion, fake audio detection (FAD)