Montgomery Multiplication Scalable Systolic Designs Optimized for DSP48E2

ACM Transactions on Reconfigurable Technology and Systems (2024)

Abstract
This article presents an extensive study of the use of DSP48E2 slices in UltraScale FPGAs to build hardware implementations of the Montgomery Multiplication algorithm for accelerating modular multiplication. Our fully scalable systolic architectures yield a parallelized, DSP48E2-optimized scheduling of operations analogous to the FIOS block variant of the Montgomery Multiplication. We explore the impact of different pipelining strategies within DSP blocks, operation scheduling, processing element configurations, and global design structures, along with their tradeoffs in performance and resource cost. We discuss how our methodology applies to multiple types of DSP primitives. We provide ready-to-use, fast, efficient, and fully parametrizable designs that can adapt to a wide range of requirements and applications. Implementations are scalable to any operand width. Our most efficient designs perform 128-, 256-, 512-, 1024-, 2048-, and 4096-bit Montgomery modular multiplications in 0.0992 μs, 0.2032 μs, 0.3952 μs, 0.7792 μs, 1.550 μs, and 3.099 μs using 4, 6, 11, 21, 41, and 82 DSP blocks, respectively.
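For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal software sketch of a word-serial Montgomery multiplication that scans the operands in the finely integrated (FIOS-like) order, folding the multiplication and reduction steps into a single inner loop. It is not the systolic architecture described in the article. The function name `montgomery_multiply`, the 16-bit word size, and the use of a single merged carry are assumptions made purely for illustration.

```python
def montgomery_multiply(a, b, n, word_bits=16):
    """Return a * b * R^-1 mod n, where R = 2**(word_bits * s).

    Illustrative sketch only: word_bits=16 is an assumed word size,
    not the DSP48E2 operand width used in the article.
    Requires n odd and 0 <= a, b < n. Needs Python 3.8+ for pow(x, -1, m).
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    mask = (1 << word_bits) - 1
    s = -(-n.bit_length() // word_bits)  # number of words, rounded up

    def to_words(x):
        # Little-endian list of s words.
        return [(x >> (word_bits * j)) & mask for j in range(s)]

    a_w, b_w, n_w = to_words(a), to_words(b), to_words(n)
    # n0_inv = -n^-1 mod 2^word_bits, so that t[0] + m * n[0] == 0 mod 2^word_bits.
    n0_inv = (-pow(n, -1, 1 << word_bits)) & mask

    t = [0] * (s + 1)  # running accumulator, s words plus one overflow word
    for i in range(s):
        # Quotient digit that zeroes the lowest word of t + a * b[i].
        m = ((t[0] + a_w[0] * b_w[i]) * n0_inv) & mask
        carry = 0
        for j in range(s):
            # Multiplication and reduction terms are added in the same step.
            acc = t[j] + a_w[j] * b_w[i] + m * n_w[j] + carry
            if j > 0:
                t[j - 1] = acc & mask  # shift right by one word while storing
            carry = acc >> word_bits
        acc = t[s] + carry
        t[s - 1] = acc & mask
        t[s] = acc >> word_bits

    result = sum(word << (word_bits * j) for j, word in enumerate(t))
    # The accumulator stays below 2n, so one conditional subtraction suffices.
    return result - n if result >= n else result
```

The result is in Montgomery form: to recover an ordinary product, the inputs are typically converted by multiplying with R mod n beforehand, and the output is converted back by a Montgomery multiplication with 1.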
Keywords
Montgomery Multiplication, hardware acceleration, FPGA, DSP, systolic architecture