Data Optimization CNN Accelerator Design on FPGA

2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)

Abstract
Image understanding is becoming a vital feature in ever more applications, ranging from medical diagnostics to autonomous vehicles. Many applications demand embedded solutions that integrate into existing systems under tight power and real-time constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Modern high-end FPGA generations feature hundreds of thousands of configurable logic blocks and additionally include an abundance of hardened functional units that enable fast and efficient implementations of common functions. Many researchers have proposed CNN accelerator prototypes on FPGAs, but one problem of the state-of-the-art designs is that they have not solved the data dependence problem well. Data dependence is an important factor affecting accelerator performance. Current designs address it by adding hardware modules on the FPGA, but this approach has little effect and increases hardware complexity. In this paper, we propose an optimization of the data arrangement in CNNs, which resolves the data dependence by rearranging the data and storing it in a hardware-friendly form. In this way, our accelerator can exploit pipelining better than current designs. We validate our approach on a Xilinx Zynq XC-7Z045 board. The experimental results show that our approach has clear advantages in hardware resource consumption and bandwidth compared with state-of-the-art designs.
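The abstract does not disclose the exact layout the authors use, so the following is only a minimal illustrative sketch of the general idea of rearranging feature-map data into a hardware-friendly, stream-ordered form so a pipeline can read consecutive addresses without dependence stalls. The channel tile size TC, the function name rearrange_ifmap, and the tiled destination layout are all assumptions made for illustration, not the paper's actual method.

```c
/* Illustrative sketch only: assumes a simple channel-tiled rearrangement in
 * which an H x W x C input feature map is reordered into contiguous TC-channel
 * tiles, so a streaming datapath can consume one tile per pass sequentially.
 * All names and the tile size are hypothetical. */
#include <stddef.h>

#define TC 4  /* assumed channel tile size, matched to the datapath width */

/* Reorder src[h][w][c] (row-major H x W x C) into
 * dst[c/TC][h][w][c%TC]; out-of-range channels are zero-padded. */
void rearrange_ifmap(const float *src, float *dst, size_t H, size_t W, size_t C)
{
    size_t tiles = (C + TC - 1) / TC;
    for (size_t t = 0; t < tiles; ++t)
        for (size_t h = 0; h < H; ++h)
            for (size_t w = 0; w < W; ++w)
                for (size_t c = 0; c < TC; ++c) {
                    size_t ch = t * TC + c;
                    float v = (ch < C) ? src[(h * W + w) * C + ch] : 0.0f;
                    dst[((t * H + h) * W + w) * TC + c] = v;
                }
}
```

Done once off-line (or by a lightweight DMA stage), such a reordering trades a one-time copy for purely sequential reads inside the compute pipeline, which is the kind of benefit the abstract attributes to its data rearrangement.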
Keywords
CNN accelerator, FPGA, full pipeline, data rearrangement