MSU-AVIS Dataset: Fusing Face and Voice Modalities for Biometric Recognition in Indoor Surveillance Videos

2018 24th International Conference on Pattern Recognition (ICPR), 2018

Abstract
Indoor video surveillance systems often use the face modality to establish the identity of a person of interest. However, the face image may not offer sufficient discriminatory information in many scenarios due to substantial variations in pose, illumination, expression, resolution, and distance between the subject and the camera. In such cases, the inclusion of an additional biometric modality can benefit the recognition process. In this regard, we consider the fusion of voice and face modalities for enhancing the recognition accuracy. The main contribution of this work is assembling a multimodal (face and voice), semi-constrained, indoor video surveillance dataset, referred to as the MSU Audio-Video Indoor Surveillance (MSU-AVIS) dataset. We use a consumer-grade camera with a built-in microphone to acquire data for this purpose. We use current state-of-the-art deep-learning-based methods to perform face and speaker recognition on the collected dataset to establish baseline performance. We also explore multiple fusion schemes that combine face and speaker recognition to perform effective person recognition on audio-video surveillance data. Experiments demonstrate the efficacy of the proposed multimodal fusion scheme (face and voice) over unimodal approaches in surveillance scenarios. The collected dataset is being made available for research purposes.
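
The abstract does not say which fusion schemes were evaluated. As an illustration only, the following minimal sketch shows one common form of score-level fusion: each modality's match scores against a gallery are min-max normalized and then combined with a weighted sum. The function names, the `face_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(
    face_scores: List[float],
    voice_scores: List[float],
    face_weight: float = 0.5,  # hypothetical weight, not from the paper
) -> List[float]:
    """Weighted-sum fusion of per-identity face and voice match scores."""
    if len(face_scores) != len(voice_scores):
        raise ValueError("both modalities must score the same gallery")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    face_n = min_max_normalize(face_scores)
    voice_n = min_max_normalize(voice_scores)
    return [
        face_weight * f + (1.0 - face_weight) * v
        for f, v in zip(face_n, voice_n)
    ]


def identify(gallery_ids: List[str], fused: List[float]) -> str:
    """Return the gallery identity with the highest fused score."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return gallery_ids[best]


# Illustrative scores for three enrolled subjects (made-up values).
gallery = ["subject_a", "subject_b", "subject_c"]
fused = fuse_scores([0.2, 0.9, 0.4], [0.8, 0.7, 0.1])
predicted = identify(gallery, fused)
```

Normalizing before summing keeps one modality from dominating merely because its matcher produces scores on a larger scale; the weight could in principle be tuned on held-out data.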
Keywords
MSU Audio-Video Indoor Surveillance dataset, speaker recognition, audio-video surveillance data, surveillance scenarios, MSU-AVIS dataset, voice modalities, biometric recognition, indoor surveillance videos, indoor video surveillance systems, face modality, face image, face modalities, indoor video surveillance dataset, person recognition, face fusion, deep-learning