Accelerating divergent applications on SIMD architectures using neural networks

Computer Design (2015)

Abstract

The purpose of this research is to find a neural-network-based solution to the well-known problem of branch divergence in Single Instruction Multiple Data (SIMD) architectures. Our approach differs from existing techniques for handling branch (or control-flow) divergence, which rely on costly hardware modifications, low-utilization masking techniques, or static prediction methods. As we examine divergent applications, we characterize the degree of data-dependent control flow in each and isolate the code regions (or "kernels") that cause the most performance degradation due to branch divergence. We then train neural networks (NNs) offline to approximate these kernels and inject the NN computations directly into the applications as substitutes for the kernels they approximate. This essentially translates control flow into nondivergent computation, trading off precision for performance. Because our methodology manipulates application source code directly, it is inherently platform agnostic and can be adopted as a general means of accelerating divergent applications on data-parallel architectures. In this article, we present the Neuralizer, an automated software flow for kernel identification, NN training, and NN integration, as well as supplementary user-controlled optimization techniques. Evaluating our approach on a variety of divergent applications run on a Graphics Processing Unit (GPU), we achieve average performance gains of 13.6× and energy savings of 14.8× with 96% accuracy.
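
The idea of replacing a branchy kernel with nondivergent NN arithmetic can be illustrated with a minimal Python sketch. It is not the Neuralizer flow itself: the kernel `branchy_kernel`, the 1-2-1 ReLU network, and the weights `W1`, `B1`, `W2`, `B2` are all hypothetical, and the weights are hand-picked here rather than trained offline. For this toy kernel the network happens to be exact, whereas a trained NN would generally only approximate its kernel.

```python
import numpy as np


def branchy_kernel(x):
    """Illustrative scalar kernel with data-dependent control flow.

    On a SIMD machine, lanes taking different branches would diverge.
    """
    if x < 0.0:
        return 0.0
    elif x < 1.0:
        return x
    return 1.0


# Hypothetical parameters of a 1-2-1 ReLU network. They are hand-picked
# for this sketch (an assumption), standing in for offline-trained weights.
W1 = np.array([1.0, 1.0])    # input -> hidden weights
B1 = np.array([0.0, -1.0])   # hidden biases
W2 = np.array([1.0, -1.0])   # hidden -> output weights
B2 = 0.0                     # output bias


def nn_kernel(xs):
    """Evaluate the network on a whole batch with one instruction stream.

    Every element goes through the same multiply, add, and max operations,
    so there is no per-element branching.
    """
    xs = np.asarray(xs, dtype=np.float64)
    hidden = np.maximum(np.outer(xs, W1) + B1, 0.0)  # shape (n, 2)
    return hidden @ W2 + B2                          # shape (n,)


xs = np.linspace(-2.0, 3.0, 11)
reference = np.array([branchy_kernel(v) for v in xs])
approx = nn_kernel(xs)
max_error = float(np.max(np.abs(approx - reference)))
```

Comparing `approx` against `reference` (here summarized by `max_error`) mirrors the precision-versus-performance check that any such substitution would need before it replaces the original kernel.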

Keywords

approximation theory, flow control, graphics processing units, neural nets, parallel processing, source code (software), gpu applications, nn approximations, simd architectures, accelerating divergent applications, branch divergence, degradation performance, energy gains, energy savings, neural-network-based solutions, neuralizer, nondivergent computation, platform-agnostic methodology, single instruction multiple data architectures, source code region isolation, trading-off precision, approximate computing, hardware acceleration, neural networks, simd
