Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
arXiv (2024)
Abstract
Conformer-based attention models have become the de facto backbone for Automatic Speech Recognition (ASR) tasks. For CTC or RNN-T models, a blank symbol is usually introduced to align the input and output sequences. Unfortunately, long input sequences inflate the computational cost and memory consumption of the attention mechanism, both of which scale quadratically with sequence length. In this work, we propose a "Skip-and-Recover" Conformer architecture, named Skipformer, that shrinks the input sequence length dynamically and inhomogeneously. Skipformer uses an intermediate CTC output as the criterion to split frames into three groups: crucial, skipping, and ignoring. Only the crucial group is fed into the subsequent Conformer blocks, and its output is joined with the skipping group in the original temporal order to form the final encoder output. Experiments show that our model reduces the input sequence length by a factor of 31 on the Aishell-1 corpus and 22 on the Librispeech corpus. At the same time, the model achieves better recognition accuracy and faster inference speed than recent baseline models. Our code is open-sourced and available online.
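
To make the Skip-and-Recover idea concrete, below is a minimal PyTorch sketch of one skip step. It assumes the three-way split is driven by the per-frame blank posterior of the intermediate CTC head; the threshold values, the function name skip_and_recover, and the use of a plain module as a stand-in for the remaining Conformer blocks are illustrative assumptions, not the paper's exact criterion.

    import torch
    import torch.nn as nn

    def skip_and_recover(frames, ctc_log_probs, blocks, blank_id=0,
                         skip_thresh=0.90, ignore_thresh=0.99):
        # frames:        (T, D) intermediate encoder frames
        # ctc_log_probs: (T, V) log-posteriors from the intermediate CTC head
        # blocks:        remaining encoder blocks, any (1, T', D) -> (1, T', D) module
        blank_prob = ctc_log_probs[:, blank_id].exp()   # per-frame blank probability
        crucial  = blank_prob < skip_thresh             # refined by the remaining blocks
        ignoring = blank_prob >= ignore_thresh          # dropped from the output entirely
        skipping = ~crucial & ~ignoring                 # bypasses the blocks, kept in output

        # Skip: only crucial frames pass through the remaining blocks.
        refined = blocks(frames[crucial].unsqueeze(0)).squeeze(0)

        # Recover: merge refined crucial frames and untouched skipping frames
        # back into their original temporal order.
        keep = crucial | skipping
        out = frames[keep].clone()
        out[crucial[keep]] = refined                    # refined frames fill crucial slots
        return out                                      # (T', D) with T' <= T

A usage example with hypothetical sizes (4233 roughly matches the Aishell-1 character vocabulary), using a small feed-forward stack as a placeholder for the remaining Conformer blocks:

    T, D, V = 200, 256, 4233
    frames = torch.randn(T, D)
    log_probs = torch.log_softmax(torch.randn(T, V), dim=-1)
    blocks = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))
    print(skip_and_recover(frames, log_probs, blocks).shape)  # (T', 256), T' <= 200

Because the skipping frames are re-inserted by boolean masks over the original time axis, the recovered sequence preserves the original frame order while the ignoring frames (near-certain blanks) never reach the final encoder output.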