Efficient Deep Speech Understanding at the Edge
CoRR(2023)
Abstract
Contemporary Speech Understanding (SU) involves a sophisticated pipeline: it
captures real-time voice input and processes it with a deep neural network
that has an encoder-decoder architecture enhanced by beam search. During
autoregressive decoding, this network periodically evaluates attention and
Connectionist Temporal Classification (CTC) scores for its output.
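To make the scoring step concrete, here is a minimal sketch of one beam-search step that combines attention and CTC scores with a fixed weight. This follows the common hybrid CTC/attention decoding recipe and is an assumption, not necessarily the paper's exact formulation. The names `joint_beam_step`, `ctc_weight`, `toy_att`, and `toy_ctc` are hypothetical. In real decoders the CTC term is an increment of a CTC prefix score computed by a dedicated prefix-scoring algorithm; here a stub callable stands in for it.

```python
import math
from typing import Callable, Dict, List, Tuple

# A hypothesis is the tuple of token ids emitted so far.
Hyp = Tuple[int, ...]
# Hypothetical scorer: maps a hypothesis to one log-score per vocabulary token.
Scorer = Callable[[Hyp], List[float]]


def joint_beam_step(
    beams: Dict[Hyp, float],
    att_logp: Scorer,
    ctc_logp: Scorer,
    ctc_weight: float = 0.3,
    beam_size: int = 4,
) -> Dict[Hyp, float]:
    """Extend each hypothesis by one token and keep the `beam_size` best.

    A candidate's increment is
        (1 - ctc_weight) * attention score + ctc_weight * CTC score,
    added to the parent hypothesis's accumulated score.
    """
    candidates: List[Tuple[float, Hyp]] = []
    for hyp, score in beams.items():
        att, ctc = att_logp(hyp), ctc_logp(hyp)
        for tok, (a, c) in enumerate(zip(att, ctc)):
            joint = (1.0 - ctc_weight) * a + ctc_weight * c
            candidates.append((score + joint, hyp + (tok,)))
    # Keep the highest-scoring candidates (the sort is stable, so ties keep insertion order).
    candidates.sort(key=lambda item: item[0], reverse=True)
    return {hyp: s for s, hyp in candidates[:beam_size]}


# Toy scorers over a 3-token vocabulary that ignore the prefix.
def toy_att(hyp: Hyp) -> List[float]:
    return [math.log(0.6), math.log(0.3), math.log(0.1)]


def toy_ctc(hyp: Hyp) -> List[float]:
    return [math.log(0.2), math.log(0.7), math.log(0.1)]


def decode(ctc_weight: float, steps: int = 2) -> Hyp:
    """Run a few joint beam-search steps and return the best hypothesis."""
    beams: Dict[Hyp, float] = {(): 0.0}
    for _ in range(steps):
        beams = joint_beam_step(beams, toy_att, toy_ctc, ctc_weight, beam_size=2)
    return max(beams, key=beams.get)


print(decode(0.3))  # (0, 0): the attention preference dominates
print(decode(0.5))  # (1, 1): a heavier CTC weight flips the ranking
```

The toy run shows why the weight matters: the attention scorer prefers token 0 while the CTC scorer prefers token 1, so the interpolation weight decides which hypothesis survives the beam.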
This paper aims to improve SU performance on resource-constrained edge
devices. It pursues two intertwined goals: accelerating on-device execution
and efficiently handling inputs that exceed the on-device model's capacity.
Although these objectives are well established, we introduce novel solutions
that specifically address SU's distinctive challenges:
1. Late contextualization, which enables parallel execution of the model's
attentive encoder during input ingestion.
2. Pilot decoding, which alleviates temporal load imbalances.
3. Autoregression offramps, which support offloading decisions based on
partial output sequences.
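The abstract does not specify how an offramp reaches its decision. Purely as an illustration of what "an offloading decision based on a partial output sequence" can look like, the sketch below assumes a simple confidence rule, which is not the paper's criterion. After a minimum number of decoded tokens, it offloads when the mean per-token log-probability drops below a threshold. The function `should_offload` and its parameters `min_steps` and `threshold` are hypothetical.

```python
import math
from typing import Sequence


def should_offload(
    token_logps: Sequence[float],
    min_steps: int = 3,
    threshold: float = math.log(0.5),
) -> bool:
    """Hypothetical offramp rule applied to a partial output sequence.

    After at least `min_steps` tokens, offload when the mean per-token
    log-probability falls below `threshold`; otherwise keep decoding locally.
    """
    if len(token_logps) < min_steps:
        return False  # too little evidence yet; continue on device
    mean_logp = sum(token_logps) / len(token_logps)
    return mean_logp < threshold


confident = [math.log(0.9)] * 3
uncertain = [math.log(0.2)] * 3
print(should_offload(confident))  # False
print(should_offload(uncertain))  # True
```

The point of deciding on a partial sequence is that a low-confidence input can be sent off-device after a few decoding steps instead of only after the local model has finished the whole utterance.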
Our techniques integrate seamlessly with existing SU models, pipelines, and
frameworks, and can be applied independently or in combination. Together, they
constitute a hybrid solution for edge SU, exemplified by our prototype, XYZ.
Evaluated on platforms with 6-8 Arm cores, our system achieves
state-of-the-art (SOTA) accuracy while reducing end-to-end latency by 2x and
halving offloading requirements.