
Hybrid Attention Transformer Based on Dual-Path for Time-Domain Single-Channel Speech Separation

2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)(2023)

Abstract
The Transformer allows each position to interact with all other positions in the input sequence, enabling powerful capture of global interaction information. However, in speech separation tasks, fine-grained local information in the speech sequence is crucial, and relying solely on self-attention may not extract this local detail effectively. To address this limitation, this paper proposes a dual-path hybrid attention transformer network (DPHAT-Net) for time-domain single-channel speech separation. Specifically, the hybrid attention transformer (HA-Transformer) module is designed to capture both global and local information in speech sequences. Furthermore, a Simple Recurrent Unit (SRU) is introduced to replace traditional positional encoding, better exploiting the temporal position information in speech sequences. Experimental evaluations on the WSJ0-2mix benchmark dataset show that the proposed DPHAT-Net achieves state-of-the-art speech separation performance while maintaining a relatively small model size.
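As an illustration of the ideas in the abstract, the following is a minimal sketch of a dual-path block that combines a global self-attention branch with a local convolutional branch, and injects temporal position information with a recurrent layer instead of positional encoding. It is not the authors' implementation: the module names, dimensions, and the use of a GRU as a stand-in for the paper's Simple Recurrent Unit are assumptions made purely for illustration.

```python
# Illustrative sketch only; not the DPHAT-Net reference implementation.
# Assumptions: dim/heads/kernel sizes are arbitrary, and nn.GRU stands in
# for the paper's Simple Recurrent Unit (SRU).
import torch
import torch.nn as nn


class HybridAttentionBlock(nn.Module):
    """Combines global self-attention with a local 1-D convolution branch."""

    def __init__(self, dim=64, heads=4, local_kernel=5):
        super().__init__()
        # Recurrent layer replacing positional encoding (SRU in the paper).
        self.pos_rnn = nn.GRU(dim, dim, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # global branch
        self.local = nn.Conv1d(dim, dim, local_kernel,
                               padding=local_kernel // 2)                  # local branch
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                    # x: (batch, time, dim)
        x = x + self.pos_rnn(x)[0]           # inject temporal position information
        h = self.norm1(x)
        g, _ = self.attn(h, h, h)            # global interactions
        l = self.local(h.transpose(1, 2)).transpose(1, 2)  # fine-grained local details
        x = x + g + l
        return x + self.ffn(self.norm2(x))


def dual_path(x, intra, inter, chunk=100):
    """Dual-path processing: apply one block within chunks and one across chunks."""
    b, t, d = x.shape
    pad = (chunk - t % chunk) % chunk
    x = nn.functional.pad(x, (0, 0, 0, pad))
    n = x.shape[1] // chunk
    x = x.view(b, n, chunk, d)
    x = intra(x.reshape(b * n, chunk, d)).view(b, n, chunk, d)   # intra-chunk path
    x = x.transpose(1, 2).reshape(b * chunk, n, d)
    x = inter(x).view(b, chunk, n, d).transpose(1, 2)            # inter-chunk path
    return x.reshape(b, n * chunk, d)[:, :t]


if __name__ == "__main__":
    block_intra, block_inter = HybridAttentionBlock(), HybridAttentionBlock()
    feats = torch.randn(2, 803, 64)          # (batch, frames, features)
    out = dual_path(feats, block_intra, block_inter, chunk=100)
    print(out.shape)                         # torch.Size([2, 803, 64])
```

In this sketch the attention branch models long-range dependencies across the whole chunk while the convolution branch supplies the fine-grained local context; the actual HA-Transformer design and its hyperparameters are described in the paper itself.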
Key words
hybrid attention, speech separation, dual-path