MDRT: Multi-Domain Synthetic Speech Localization

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
Recent advances in speech synthesis have made tools that generate high-quality synthetic speech impersonating any human speaker easily available. Several incidents of misuse have been reported, in which high-quality synthetic speech was used to spread misinformation and commit large-scale financial fraud. Many methods have been proposed for detecting synthetic speech; however, there is limited work on localizing the synthetic segments within a speech signal. In this work, our goal is to localize the synthetic speech segments in a partially synthetic speech signal. Most existing methods for synthetic speech localization obtain features from either the time-domain waveform or the spectrogram representation of the speech signal. We propose the Multi-Domain ResNet Transformer (MDRT), which obtains multi-domain features from both the time domain and the spectrogram representation of a speech signal to localize synthetic speech segments. MDRT uses transformer neural networks to obtain multi-domain features and processes them using a ResNet-style neural network. We use the PartialSpoof dataset to examine the performance of MDRT on localizing synthetic speech segments of varying duration. Our results show that MDRT performs better than several existing synthetic speech localization methods.
Keywords
Synthetic speech localization, speech forensics, deepfake speech, PartialSpoof, anti-spoofing
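
Illustrative sketch
The abstract distinguishes localization from detection: rather than one label per utterance, the output marks which parts of a partially synthetic signal are synthetic. The Python sketch below shows one common way such an output could be formed, by thresholding per-frame scores and merging consecutive synthetic frames into time segments. It is not taken from the paper and does not describe how MDRT works internally; the function name `frames_to_segments`, the 20 ms default frame shift, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def frames_to_segments(
    scores: List[float],
    frame_shift_s: float = 0.02,  # assumed frame shift in seconds (hypothetical)
    threshold: float = 0.5,       # assumed decision threshold (hypothetical)
) -> List[Tuple[float, float]]:
    """Merge consecutive frames scored >= threshold into (start, end) segments in seconds."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the currently open segment, if any
    for i, score in enumerate(scores):
        is_synthetic = score >= threshold
        if is_synthetic and start is None:
            # A synthetic run begins at this frame.
            start = i
        elif not is_synthetic and start is not None:
            # The run ended at the previous frame; close the segment.
            segments.append((start * frame_shift_s, i * frame_shift_s))
            start = None
    if start is not None:
        # The signal ended while a synthetic run was still open.
        segments.append((start * frame_shift_s, len(scores) * frame_shift_s))
    return segments


if __name__ == "__main__":
    # Hypothetical per-frame scores from some localization model.
    example_scores = [0.1, 0.9, 0.8, 0.2, 0.7]
    print(frames_to_segments(example_scores, frame_shift_s=0.25))
```

With the hypothetical scores above and a 0.25 s frame shift, frames 1 to 2 and frame 4 are marked synthetic, so the call returns two segments. In practice the frame shift and threshold would depend on the model and the evaluation protocol used.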