Speedy - An Accelerator for Sparse Convolutional Neural Networks on FPGAs.

FPGA (2019)

Cited 5 | Views 67
Abstract
Deep convolutional neural networks (CNNs) have achieved remarkable performance at the cost of enormous computation, and the current trend is toward deeper and more complex topologies. Compressing CNNs into sparse models has emerged as the most attractive approach to reducing both computation and memory requirements; this compression is achieved by pruning redundant connections in the network. FPGAs have been an effective platform for accelerating CNN inference thanks to their high parallelism, flexibility, and energy efficiency. Although existing FPGA architectures process dense CNN models well, they cannot benefit from the computation reduction when accelerating sparse CNN models, because most of the arithmetic operations are additions and multiplications with zero operands; moreover, accelerating sparse CNN models incurs significant data encoding and decoding overhead. In this paper, we propose Speedy, an FPGA accelerator that efficiently exploits sparsity in CNN models. We first investigate the dataflow design space to explore the achievable performance under different parallelization strategies. The result of this exploration is the Speedy dataflow, which provides sufficient parallel multiplications and maximizes weight reuse. We then propose a novel data representation combined with a memory partitioning technique to increase on-chip bandwidth. Finally, we present the Speedy FPGA architecture, which applies a line-buffer design and high-throughput processing elements (PEs). In the experiments, we evaluate Speedy on contemporary neural networks; Speedy provides flexible parameters for different FPGA scales. We first evaluate resource utilization and hardware efficiency under different design configurations, and then compare our design with previous FPGA implementations. Overall, Speedy achieves 11.3x-20.8x and 1.5x-6.8x speedups for AlexNet and VGGNet, respectively, with 90% weight sparsity.
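The abstract contrasts the computation reduction promised by pruning with the encoding/decoding overhead of sparse formats. The paper itself describes a hardware dataflow and data representation, not software; the sketch below is only an illustrative Python/NumPy analogue (all function names are hypothetical) showing why a compressed values-plus-indices representation skips zero-operand multiplications but adds indexing work that the accelerator must hide.

```python
import numpy as np

def compress_weights(weights):
    """Pack a pruned 2D weight matrix into a CSR-like form:
    per output row, keep only nonzero values and their column indices."""
    values, indices, row_ptr = [], [], [0]
    for row in weights:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        indices.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(indices), np.array(row_ptr)

def sparse_matvec(values, indices, row_ptr, x):
    """Multiply the compressed matrix by a dense activation vector,
    performing only the multiplications whose weight operand is nonzero."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        start, end = row_ptr[r], row_ptr[r + 1]
        # The index lookup x[indices[...]] is the "decoding" cost that
        # a sparse accelerator must pay for skipping zero operands.
        y[r] = np.dot(values[start:end], x[indices[start:end]])
    return y

# A matrix with ~90% of its weights pruned performs ~10% of the
# multiplications of the dense computation.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * (rng.random((64, 64)) > 0.9)
x = rng.standard_normal(64)
vals, idx, ptr = compress_weights(W)
assert np.allclose(sparse_matvec(vals, idx, ptr, x), W @ x)
print(f"multiplications: {len(vals)} vs dense {W.size}")
```

In hardware, the irregular gathers in the inner loop translate into on-chip bandwidth pressure, which is the motivation the abstract gives for Speedy's data representation and memory partitioning.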