Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA.

FPGA '15: The 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Monterey California USA February, 2015(2015)

引用 87|浏览353
暂无评分
摘要
Parallel sorting networks are widely employed in hardware implementations for sorting due to their high data parallelism and low control overhead. In this paper, we propose an energy and memory efficient mapping methodology for implementing bitonic sorting network on FPGA. Using this methodology, the proposed sorting architecture can be built for a given data parallelism while supporting continuous data streams. We propose a streaming permutation network (SPN) by "folding" the classic Clos network. We prove that the SPN is programmable to realize all the interconnection patterns in the bitonic sorting network. A low cost design for sorting with minimal resource usage is obtained by reusing one SPN . We also demonstrate a high throughput design by trading off area for performance. With a data parallelism of p (2 ≤ p ≤ N/ log2N), the high throughput design sorts an N-key sequence with latency O(N/p), throughput (# of keys sorted per cycle) O(p) and uses O(N) memory. This achieves optimal memory efficiency (defined as the ratio of throughput to the amount of on-chip memory used by the design) of O(p/N). Another noteworthy feature of the high throughput design is that only single-port memory rather than dual-port memory is required for processing continuous data streams. This results in 50% reduction in memory consumption. Post place-and-route results show that our architecture demonstrates 1.3x ∼1.6x improvment in energy efficiency and 1.5x ∼ 5.3x better memory efficiency compared with the state-of-the-art designs.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要