DyConvMixer: Dynamic Convolution Mixer Architecture for Open-Vocabulary Keyword Spotting

Conference of the International Speech Communication Association (INTERSPEECH) (2022)

Abstract
Research on user-defined keyword spotting has been gaining popularity in recent years. Building an open-vocabulary keyword spotting system with high accuracy and low power consumption remains a challenging problem. In this paper, we propose the DyConvMixer model to tackle this problem. By leveraging dynamic convolution alongside a convolutional equivalent of the MLP-Mixer architecture, we obtain an efficient and effective model that has fewer than 200K parameters and uses fewer than 11M MACs. Although our model is less than half the size of state-of-the-art RNN and CNN models, it shows competitive results on the publicly available Hey-Snips and Hey-Snapdragon datasets. In addition, we discuss the importance of designing an effective evaluation system and detail our evaluation pipeline for comparison with future work.
Keywords
Dynamic Convolution, Open-vocabulary Keyword Spotting, User-defined Keyword Spotting, Query-by-Example, ConvMixer
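
To make the term "dynamic convolution" in the abstract concrete, the sketch below shows one common formulation: several candidate kernels are blended with input-dependent attention weights, and the blended kernel is then applied to the input. This is an illustration only and is not taken from the paper. The function names, the single-channel 1-D setting, the average-pooling gate, and the zero padding are all assumptions made for the example; the DyConvMixer model itself may differ in every one of these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(x, kernels, gate_weights):
    """Hypothetical single-channel dynamic convolution (illustration only).

    x            : input sequence, a list of floats
    kernels      : K candidate kernels, each a list of the same odd length
    gate_weights : K (weight, bias) pairs that turn the pooled input into
                   one attention score per kernel (an assumed, minimal gate)
    """
    # 1. Summarize the input with global average pooling.
    pooled = sum(x) / len(x)

    # 2. Compute input-dependent attention over the K kernels.
    attention = softmax([w * pooled + b for w, b in gate_weights])

    # 3. Blend the kernels into one kernel using the attention weights.
    size = len(kernels[0])
    mixed = [sum(a * k[i] for a, k in zip(attention, kernels))
             for i in range(size)]

    # 4. Apply the blended kernel (cross-correlation, zero padding so the
    #    output has the same length as the input).
    pad = size // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = [sum(mixed[j] * padded[t + j] for j in range(size))
           for t in range(len(x))]
    return out, attention


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    candidates = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
    gate = [(0.0, 0.0), (0.0, 0.0)]  # equal scores, so kernels are averaged
    print(dynamic_conv1d(signal, candidates, gate))
```

In this formulation only one convolution is executed per input regardless of the number of candidate kernels, so the extra compute comes mainly from the small gating step, while the parameter count does grow with the number of kernels. Whether the paper makes the same trade-off is not stated in the abstract.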