Comparing LSTM Recurrent Networks with Spiking Recurrent Networks on the Recognition of Spoken Digits

Abstract
One advantage of spiking recurrent neural networks (SNNs) is the ability to categorise data using a synchrony-based latching mechanism. This is particularly useful in problems where timewarping is encountered, such as speech recognition. Differentiable recurrent neural networks (RNNs), by contrast, fail at tasks involving difficult timewarping, despite having sequence learning capabilities superior to those of SNNs. In this paper we demonstrate that Long Short-Term Memory (LSTM) is an RNN capable of robustly categorising timewarped speech data, thus combining the most useful features of both paradigms. We compare its performance to SNNs on two variants of a spoken digit identification task, using data from an international competition. The first task (described in Nature [15]) required the categorisation of spoken digits with only a single training exemplar, and was specifically designed to test robustness to timewarping. Here LSTM performed better than all the SNNs in the competition. The second task was to predict spoken digits using a larger training set. Here LSTM greatly outperformed an SNN-like model found in the literature. These results suggest that LSTM has a place in domains that require the learning of large timewarped datasets, such as automatic speech recognition.
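
For readers wanting a concrete starting point, the sketch below shows a minimal LSTM sequence classifier of the kind the abstract describes: an utterance is consumed frame by frame and mapped to one of ten digit classes. This is not the paper's implementation; the framework (PyTorch), the feature dimensionality (13 coefficients per frame, MFCC-like), the hidden size, and the random toy data are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class DigitLSTM(nn.Module):
    """Minimal LSTM classifier: one utterance in, one digit class out.
    Hyperparameters are illustrative assumptions, not the paper's setup."""
    def __init__(self, n_features=13, n_hidden=100, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.fc = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # x: (batch, frames, features). Keep only the final hidden state,
        # which summarises the whole (possibly timewarped) sequence.
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])

# Toy usage with random tensors standing in for acoustic feature frames.
model = DigitLSTM()
x = torch.randn(8, 50, 13)            # 8 utterances, 50 frames, 13 coefficients
labels = torch.randint(0, 10, (8,))   # digit labels 0-9
loss = nn.CrossEntropyLoss()(model(x), labels)
loss.backward()
print(loss.item())
```

Classifying from the final hidden state is one simple design choice; because LSTM's gated memory can bridge stretched or compressed spans of input, the same read-out works regardless of how the utterance is timewarped, which is the property the paper's experiments test.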