SPGM: Prioritizing Local Features for enhanced speech separation performance
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2023)
Abstract
Dual-path is a popular architecture for speech separation models (e.g.
Sepformer) which splits long sequences into overlapping chunks for its intra-
and inter-blocks that separately model intra-chunk local features and
inter-chunk global relationships. However, it has been found that inter-blocks,
which comprise half a dual-path model's parameters, contribute minimally to
performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to
replace inter-blocks. SPGM is named after its structure consisting of a
parameter-free global pooling module followed by a modulation module comprising
only 2% of the model's total parameters. The SPGM block allows all transformer
layers in the model to be dedicated to local feature modelling, making the
overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4
dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and
0.3 dB, respectively, and matches the performance of recent SOTA models with up
to 8 times fewer parameters. Model and weights are available at
huggingface.co/yipjiaqi/spgm
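
The pipeline the abstract describes (overlapping chunks, a parameter-free global pooling step, then a lightweight modulation of local features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the mean pooling, the sigmoid scale-and-shift form of the modulation, the function names `chunk`, `global_pool` and `modulate`, and the weights `w_scale` and `w_shift` are all assumptions made for illustration only.

```python
import numpy as np

def chunk(x, size, hop):
    """Split a (T, D) sequence into overlapping chunks of shape (N, size, D).
    The tail is zero-padded so the last chunk is full length."""
    T, D = x.shape
    n = max(1, int(np.ceil((T - size) / hop)) + 1)
    padded = np.zeros(((n - 1) * hop + size, D), dtype=x.dtype)
    padded[:T] = x
    return np.stack([padded[i * hop : i * hop + size] for i in range(n)])

def global_pool(chunks):
    """Parameter-free pooling (assumed here to be a plain mean):
    average each chunk over its time axis, giving (N, D)."""
    return chunks.mean(axis=1)

def modulate(chunks, pooled, w_scale, w_shift):
    """Hypothetical modulation: scale and shift every local feature using
    two small projections of a sequence-level summary vector."""
    g = pooled.mean(axis=0)                       # (D,) summary of all chunks
    scale = 1.0 / (1.0 + np.exp(-(g @ w_scale)))  # sigmoid gate, shape (D,)
    shift = g @ w_shift                           # shape (D,)
    return chunks * scale + shift                 # broadcasts over (N, size, D)

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 16))               # toy encoded mixture (T=1000, D=16)
chunks = chunk(x, size=250, hop=125)              # 50% overlap between chunks
pooled = global_pool(chunks)
w_scale = 0.1 * rng.standard_normal((16, 16))     # illustrative weights only
w_shift = 0.1 * rng.standard_normal((16, 16))
out = modulate(chunks, pooled, w_scale, w_shift)
```

In this sketch the only learned parameters sit in the two small projections, which mirrors the abstract's point that global context can be injected cheaply while the transformer layers (omitted here) handle intra-chunk modelling.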
Keywords
speech separation, transformer, attentive pooling, feature modulation