Dynamic TF-TDNN: Dynamic Time Delay Neural Network Based on Temporal-Frequency Attention for Dialect Recognition

Chao Liao, Jinwen Huang, Huan Yuan, Peng Yao, Jianchao Tan, Dawei Zhang, Feng Deng, Xiaorui Wang, Chengru Song

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract

Dialect recognition aims to identify the dialect category of an utterance and has been applied in many audio applications. Recently, various Time Delay Neural Network (TDNN) based models, such as D-TDNN, DMC-TDNN, and ECAPA-TDNN, have been proposed for dialect recognition. However, most of them apply temporal attention only in the final statistics pooling layer of the network, which ignores the importance of simultaneously capturing key frequency and temporal information in utterances under different receptive fields. In contrast, we introduce a hybrid attention mechanism over both the temporal and frequency domains, called the TF-attention module, which adaptively focuses on the truly important frames and on the important information within each frame under different receptive fields. Moreover, we are the first to introduce a dynamic architecture mechanism to the field of dialect recognition, dynamically reducing both the computational cost and the number of model parameters. We evaluate the proposed Dynamic TF-TDNN on the AP20-OLR-dialect task of the Oriental Language Recognition (OLR) challenge and achieve state-of-the-art (SOTA) performance with fewer model parameters.
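The abstract does not describe how the TF-attention module is built internally. As a rough illustration only, the sketch below applies one sigmoid gate per frequency bin and one per frame to a feature map of shape (channels, frequency bins, frames), in the spirit of squeeze-and-excitation gating, so that both axes are reweighted jointly rather than only the time axis at the pooling stage. The function name `tf_attention`, the mean-pooled descriptors, and the per-channel projection weights `w_t` and `w_f` are hypothetical choices made for this sketch, not the authors' design.

```python
import math
from typing import List, Tuple

# Hypothetical layout: x[c][f][t] = channel c, frequency bin f, frame t
Tensor3D = List[List[List[float]]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tf_attention(
    x: Tensor3D, w_t: List[float], w_f: List[float]
) -> Tuple[Tensor3D, List[float], List[float]]:
    """Sketch of joint temporal-frequency gating (assumed design, not the paper's).

    w_t, w_f: hypothetical per-channel weights (length C) for the two branches.
    Returns the reweighted map plus the per-frame and per-bin gates.
    """
    C, F, T = len(x), len(x[0]), len(x[0][0])
    # Temporal branch: average each channel over frequency, then mix channels
    # into one score per frame and squash it to a gate in (0, 1).
    a_t = [
        sigmoid(sum(w_t[c] * sum(x[c][f][t] for f in range(F)) / F for c in range(C)))
        for t in range(T)
    ]
    # Frequency branch: average each channel over time, then mix channels
    # into one score per frequency bin.
    a_f = [
        sigmoid(sum(w_f[c] * sum(x[c][f][t] for t in range(T)) / T for c in range(C)))
        for f in range(F)
    ]
    # Reweight every element by its frequency gate and its frame gate.
    y = [[[x[c][f][t] * a_f[f] * a_t[t] for t in range(T)] for f in range(F)] for c in range(C)]
    return y, a_t, a_f


if __name__ == "__main__":
    ones = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]  # C=2, F=3, T=4
    y, a_t, a_f = tf_attention(ones, [0.0, 0.0], [0.0, 0.0])
    print(a_t)         # [0.5, 0.5, 0.5, 0.5]
    print(y[0][0][0])  # 0.25
```

With zero weights every gate equals 0.5, so each element is scaled by 0.25; with nonzero weights, frames and bins with larger pooled activations receive larger gates. Because both gates come from pooled descriptors, the same weights work for any utterance length. Applying such a gate after TDNN layers with different dilations is one plausible way to obtain attention "under different receptive fields", but that placement is an assumption here.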

Keywords

dialect recognition, temporal-frequency attention, dynamic architectures, TDNN
