TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAs

2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)(2023)

Abstract
The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, existing sorter designs cannot be scaled directly at the rate at which available off-chip bandwidth is increasing, because their on-chip resource usage grows much faster and in turn becomes the bound on sorting performance. Second, designers need an in-depth understanding of HBM's characteristics to utilize the HBM bandwidth effectively. To tackle these challenges, we present TopSort, a novel two-phase sorting solution optimized for HBM-based FPGAs. In the first phase, 16 merge trees work in parallel to fully utilize the bandwidth of 32 HBM channels. In the second phase, TopSort reuses the logic from phase one to form a wider merge tree that merges the partially sorted results from phase one. TopSort also adopts HBM-specific optimizations to reduce resource overhead and improve bandwidth utilization. TopSort can sort up to 4 GB of data using all 32 HBM channels, with an overall sorting performance of 15.6 GB/s. TopSort is 6.7x and 2.7x faster than state-of-the-art CPU and FPGA sorters, respectively.
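The two-phase flow described in the abstract can be illustrated with a small software analogy. This is only a sketch, not the hardware design: the function names (`phase_one`, `phase_two`, `two_phase_sort`) are hypothetical, Python's built-in `sorted` stands in for each phase-one merge tree, and `heapq.merge` stands in for the wider phase-two merge tree. HBM channels, bandwidth, and logic reuse are not modeled.

```python
import heapq
from typing import List

NUM_TREES = 16  # number of phase-one merge trees stated in the abstract


def phase_one(data: List[int], num_trees: int = NUM_TREES) -> List[List[int]]:
    """Split the input into num_trees chunks and sort each independently.

    Assumption: sorted() stands in for one hardware merge tree; in the
    accelerator these chunks would be processed in parallel.
    """
    chunk = -(-len(data) // num_trees) if data else 0  # ceiling division
    return [sorted(data[i * chunk:(i + 1) * chunk]) for i in range(num_trees)]


def phase_two(runs: List[List[int]]) -> List[int]:
    """Merge the partially sorted runs into one fully sorted sequence.

    Assumption: heapq.merge (a k-way merge) stands in for the wider
    merge tree formed in the second phase.
    """
    return list(heapq.merge(*runs))


def two_phase_sort(data: List[int], num_trees: int = NUM_TREES) -> List[int]:
    """Run both phases back to back."""
    return phase_two(phase_one(data, num_trees))
```

Calling `two_phase_sort` on any list of integers should return the same result as `sorted`; the point of the sketch is only to show that phase one produces 16 independently sorted runs and phase two combines them in a single wide merge.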
Keywords
Field programmable gate arrays, Bandwidth, Sorting, Hardware, Throughput, System-on-chip, Optimization, merge sort, hardware acceleration, high-bandwidth memory, memory-centric design, FPGA, floorplan