Using Per-Loop CPU Clock Modulation for Energy Efficiency in OpenMP Applications

2015 44th International Conference on Parallel Processing (2015)

Abstract
As the HPC community moves into the exascale computing era, application energy is becoming as large a concern as performance. Optimizing for energy will be essential in the effort to overcome the limited power envelope. Existing efforts to optimize energy in applications employ Dynamic Frequency and Voltage Scaling (DVFS) to maximize energy savings in less compute-intensive regions or non-critical execution paths. However, we found that DVFS has high power state switching overhead, preventing its use when a more fine-grained technique is necessary. In this work, we take advantage of the low transition overhead of CPU clock modulation and apply it to fine-grained OpenMP parallel loops. The energy behavior of OpenMP parallel regions is first characterized by changing the effective frequency using clock modulation. The clock modulation setting that achieves the best energy efficiency is then determined for each region. Finally, different CPU clock modulation settings are applied to the different loops within the same application. The resulting multi-frequency execution of OpenMP applications achieves a better energy-delay trade-off than any single frequency setting. In the best-case scenario, the multi-frequency approach achieved 8.6% energy savings with less than 1.5% execution time increase. Concurrency throttling (i.e., reducing the number of hardware threads used by an application) saves more energy and can be combined with CPU clock modulation. Using both, we see savings of 21% energy and an improvement of energy-delay product (EDP) by 16%.
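The per-region selection step described in the abstract can be illustrated with a minimal sketch. It assumes that energy and execution time have already been measured for each loop at each clock modulation setting, and it uses EDP (energy multiplied by time) as the selection criterion, which is one plausible reading of "best energy efficiency" rather than the paper's confirmed metric. The loop names, duty-cycle values, and measurements are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Tuple

# Hypothetical layout: loop name -> {duty-cycle fraction: (energy in joules, time in seconds)}
Measurements = Dict[str, Dict[float, Tuple[float, float]]]


def edp(energy_j: float, time_s: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_j * time_s


def best_setting_per_loop(measurements: Measurements) -> Dict[str, float]:
    """For each loop, return the duty cycle whose measurements minimize EDP."""
    return {
        loop: min(settings, key=lambda duty: edp(*settings[duty]))
        for loop, settings in measurements.items()
    }


# Made-up numbers for illustration only (not taken from the paper).
example: Measurements = {
    "loop_a": {1.0: (100.0, 10.0), 0.75: (90.0, 10.5), 0.5: (85.0, 14.0)},
    "loop_b": {1.0: (50.0, 5.0), 0.75: (48.0, 6.5), 0.5: (47.0, 9.0)},
}

if __name__ == "__main__":
    for loop, duty in best_setting_per_loop(example).items():
        print(f"{loop}: run at duty cycle {duty}")
```

In this made-up data set the two loops end up with different settings, which is the multi-frequency situation the abstract describes: a single application-wide setting would be suboptimal for at least one of them.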
Keywords
per-loop CPU clock modulation, HPC community, application energy, energy optimization, power envelope, dynamic frequency-and-voltage scaling, DVFS, energy saving, compute-intensive regions, noncritical execution paths, power state switching overhead, transition overhead, fine-grained OpenMP parallel loops, energy behavior, OpenMP parallel regions, energy efficiency, multifrequency execution, energy-delay, energy savings, execution time, concurrency throttling, hardware threads, energy-delay product improvement, EDP improvement