A Resource-Saving Energy-Efficient Reconfigurable Hardware Accelerator for BERT-based Deep Neural Network Language Models using FFT Multiplication

ISCAS (2022)

Abstract
Bidirectional Encoder Representations from Transformers (BERT) based language models are a class of deep neural networks built on the attention mechanism. They have emerged as a better alternative to traditional recurrent neural networks for sequence representation and have achieved state-of-the-art performance on various natural language processing (NLP) tasks. Nevertheless, their intensive computation, energy, and memory requirements pose a major challenge for deployment on resource-constrained platforms and edge devices. To mitigate these limitations, this paper proposes a novel hardware accelerator design dedicated to BERT-based architectures, with reconfigurable functionality that improves circuit reusability and reduces hardware resource utilization. To the best of our knowledge, it is the first holistic design and implementation of a reconfigurable hardware accelerator for BERT-based deep neural network language models. The proposed design leverages Fast Fourier Transform (FFT)-based multiplication on block-circulant matrices to accelerate BERT weight-matrix multiplications. It is evaluated on different BERT-based model configurations over popular mainstream benchmarks while achieving state-of-the-art performance, and on distinct batch sizes to study the impact of batch size on energy efficiency. A cross-platform comparative analysis shows that the proposed hardware accelerator achieves 6x, 27x, 3.18x, and 8x improvements over a CPU, and up to 1.17x, 1.77x, 5x, and 86x improvements over a GPU, in latency, throughput, power consumption, and energy efficiency, respectively. This design is suitable for efficient NLP on resource-constrained platforms where low latency and high throughput are critical.
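The compute trick the abstract names — replacing dense weight-matrix multiplication with FFT-based multiplication on block-circulant matrices — can be summarized in a short sketch. The snippet below is a minimal NumPy illustration of the general technique, not the paper's hardware implementation: each b x b circulant block is represented by its defining first column, so a block-vector product reduces to an elementwise product in the frequency domain (O(b log b) instead of O(b^2), with b times fewer stored weights per block). The function name block_circulant_matvec and the data layout are assumptions made for illustration.

```python
import numpy as np

def block_circulant_matvec(blocks, x, b):
    """Multiply a block-circulant matrix by a vector via FFTs.

    blocks[i][j] is the defining first column of the b x b circulant
    block at block position (i, j); x has length (block columns) * b.
    Each circulant block-vector product is a circular convolution,
    computed as IFFT(FFT(c) * FFT(v)).
    """
    p, q = len(blocks), len(blocks[0])       # block rows / block columns
    X = np.fft.fft(x.reshape(q, b), axis=1)  # FFT of each input chunk
    Y = np.zeros((p, b), dtype=complex)
    for i in range(p):
        for j in range(q):
            Y[i] += np.fft.fft(blocks[i][j]) * X[j]
    return np.fft.ifft(Y, axis=1).real.reshape(p * b)

def circulant(c):
    # Column k of a circulant matrix is its first column rotated down by k.
    return np.stack([np.roll(c, k) for k in range(len(c))], axis=1)

# Sanity check against the explicit dense block-circulant matrix.
rng = np.random.default_rng(0)
b, p, q = 4, 2, 3
blocks = [[rng.standard_normal(b) for _ in range(q)] for _ in range(p)]
x = rng.standard_normal(q * b)
dense = np.block([[circulant(blocks[i][j]) for j in range(q)] for i in range(p)])
assert np.allclose(block_circulant_matvec(blocks, x, b), dense @ x)
```

The same structure is what makes the approach hardware-friendly: only b values per block need to be stored, and the FFT/elementwise-multiply/IFFT pipeline maps naturally onto reusable, reconfigurable compute units.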
Keywords
reconfigurable, resource-saving, energy-efficient, BERT-based