A Scalable Architecture For Accelerating Multi-Operation And Continuous Floating-Point Matrix Computing On FPGAs

IEEE Access (2020)

Abstract
Matrix computing is a basic operational model broadly used in science and engineering applications. In this study, we first propose a novel optimization method to obtain a high-performance, scalable architecture for matrix multiplication, which reduces data transmission, optimizes data flow, improves resource utilization, and dynamically changes the length of the linear array. Based on the optimized architecture, we present a multi-operation floating-point matrix computing unit (design-I), which extends the function of matrix computing from single matrix multiplication to matrix addition, matrix subtraction, matrix-vector multiplication, and matrix-scalar multiplication. With low storage demand and high computing efficiency, design-I can compute dense matrices of arbitrary sizes. Moreover, we propose a continuous floating-point matrix computing unit (design-II), which offers the same multi-operation functionality while also meeting the requirement of continuous matrix computing in practical engineering and avoiding a large amount of intermediate data transfer. Finally, we adopt the above-mentioned unit cores to build matrix computing acceleration systems according to different engineering requirements. Experiments on a Xilinx 585T FPGA device show that the accelerator achieves a maximum frequency of 195 MHz with 256 processing elements (PEs) and delivers 99.8 GFLOPS. Compared with state-of-the-art methods, the architecture offers a broader application scope and better prospects.
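The reported 99.8 GFLOPS is consistent with a simple peak-throughput estimate, under the common assumption (not stated explicitly in the abstract) that each PE completes one floating-point multiply-accumulate, i.e. 2 FLOPs, per clock cycle. A minimal sketch of the arithmetic:

```python
# Back-of-envelope peak throughput for a linear-array matrix accelerator.
# Assumption: each processing element (PE) performs one multiply-accumulate
# (2 floating-point operations) per cycle; names here are illustrative.

def peak_gflops(num_pes: int, freq_mhz: float, flops_per_pe_per_cycle: int = 2) -> float:
    """Peak throughput in GFLOPS = PEs x frequency (Hz) x ops/cycle / 1e9."""
    return num_pes * freq_mhz * 1e6 * flops_per_pe_per_cycle / 1e9

# 256 PEs at 195 MHz -> 99.84 GFLOPS peak, matching the ~99.8 GFLOPS reported.
print(peak_gflops(256, 195))
```

This suggests the design sustains close to its theoretical peak utilization on the benchmarked workloads.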
Keywords
Acceleration, Field programmable gate arrays, Arrays, Data transfer, Registers, Matrix computing, accelerator, floating-point, FPGAs, scalability