Efficient data supply for hardware accelerators with prefetching and access/execute decoupling.

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture Taipei Taiwan October, 2016(2016)

引用 51|浏览69
暂无评分
摘要
This paper presents an architecture framework to easily design hardware accelerators that can effectively tolerate long and variable memory latency using prefetching and access/execute decoupling. Hardware accelerators are becoming increasingly popular in modern computing systems as a promising approach to achieve higher performance and energy efficiency when technology scaling is slowing down. However, today's high-performance accelerators require significant manual efforts to design, in large part due to the need to carefully orchestrate data transfers between external memory and an accelerator. Instead, the proposed framework utilizes automated program analysis along with High-Level Synthesis (HLS) tools to enable prefetching and access/execute decoupling with minimal manual efforts. The framework adds tags to accelerator memory accesses so that hardware prefetching can effectively preload data for accesses with regular patterns. To handle irregular memory accesses, the framework generates an accelerator with decoupled access/execute architecture using program slicing. Experimental results show that the proposed optimizations can significantly improve performance of HLS-generated accelerators (average speedup of 2.28x across eight accelerators) and often reduce energy consumption (average of 15%).
更多
查看译文
关键词
data supply,architecture framework,hardware accelerators design,memory latency,data transfer,external memory,automated program analysis,high-level synthesis tools,accelerator memory access,hardware prefetching,data preloading,decoupled access/execute architecture,performance improvement,HLS-generated accelerators,energy consumption reduction,program slicing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要