On the use of convolutional neural networks for speech presentation attack detection

Pavel Korshunov, Andre R. Goncalves, Ricardo P. V. Violato, Flávio O. Simões, Sébastien Marcel

2018 IEEE 4th International Conference on Identity, Security, and Behavior Analysis (ISBA) (2018)

Abstract

Research in automatic speaker verification (ASV) has advanced enough for industry to start using ASV systems in practical applications. However, these systems are highly vulnerable to spoofing, or presentation attacks (PAs), which limits their wide deployment. Several speech-based presentation attack detection (PAD) methods have been proposed recently, but most of them rely on hand-crafted frequency- or phase-based features. Although convolutional neural networks (CNNs) have already shown breakthrough results in face recognition, it is not well understood whether CNNs are as effective at detecting presentation attacks in speech. In this paper, to investigate the applicability of CNNs to PAD, we consider shallow and deep examples of CNN architectures implemented using TensorFlow and compare their performance with the state-of-the-art MFCC with GMM-based system on two large databases with presentation attacks: the publicly available voicePA and the proprietary BioCPqD-PA. We study the impact of increasing CNN depth on performance, and examine how the networks perform on unknown attacks by training on one database and evaluating on the other. The results demonstrate that CNNs are able to learn a database significantly better than hand-crafted features (increasing depth also improves performance). However, CNN-based PAD systems still lack the ability to generalize across databases and are unable to detect unknown attacks well.
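The cross-database protocol mentioned in the abstract (tune on one database, evaluate on another) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scores are invented toy values, the function names are hypothetical, and the decision threshold is chosen by minimizing the half total error (the mean of the attack acceptance rate and the bona fide rejection rate), which may differ from the criterion the authors used.

```python
def error_rates(bona_fide, attacks, threshold):
    """Return (attack acceptance rate, bona fide rejection rate).

    Scores at or above the threshold are accepted as bona fide speech.
    """
    accepted_attacks = sum(s >= threshold for s in attacks) / len(attacks)
    rejected_bona_fide = sum(s < threshold for s in bona_fide) / len(bona_fide)
    return accepted_attacks, rejected_bona_fide


def half_total_error(bona_fide, attacks, threshold):
    """Mean of the two error rates at the given threshold."""
    far, frr = error_rates(bona_fide, attacks, threshold)
    return (far + frr) / 2


def fit_threshold(bona_fide, attacks):
    """Pick the observed score that minimizes the half total error."""
    candidates = sorted(set(bona_fide) | set(attacks))
    return min(candidates, key=lambda t: half_total_error(bona_fide, attacks, t))


# Invented toy PAD scores (higher means "more likely bona fide").
# db_a stands in for the database used for tuning; db_b contains
# attacks the detector was never tuned on.
db_a = {"bona_fide": [0.9, 0.8, 0.85, 0.7], "attacks": [0.1, 0.2, 0.3, 0.25]}
db_b = {"bona_fide": [0.9, 0.8, 0.6, 0.75], "attacks": [0.7, 0.8, 0.2, 0.65]}

# Tune the threshold on db_a only, then apply it unchanged to both databases.
threshold = fit_threshold(db_a["bona_fide"], db_a["attacks"])
within = half_total_error(db_a["bona_fide"], db_a["attacks"], threshold)
cross = half_total_error(db_b["bona_fide"], db_b["attacks"], threshold)

print(f"threshold={threshold}, within-database error={within}, "
      f"cross-database error={cross}")
```

In this toy setting the threshold separates the tuning database perfectly but misclassifies several samples of the second database, which mirrors the kind of generalization gap the abstract describes without reproducing any of its actual results.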

Keywords

convolutional neural networks, speech presentation attack detection, automatic speaker verification, ASV systems, PAD, CNN, unknown attacks, proprietary BioCPqD-PA, publicly available voicePA, TensorFlow, spoofing attacks
