AgilePkgC: An Agile System Idle State Architecture for Energy Proportional Datacenter Servers

2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)(2022)

引用 4|浏览18
暂无评分
摘要
Modern user-facing applications deployed in datacenters use a distributed system architecture that exacerbates the latency requirements of their constituent microservices (30-250$\mu$s). Existing CPU power-saving techniques degrade the performance of these applications due to the long transition latency (order of 100$\mu$s) to wake up from a deep CPU idle state (C-state). For this reason, server vendors recommend only enabling shallow core C-states (e.g., CC1) for idle CPU cores, thus preventing the system from entering deep package C-states (e.g., PC6) when all CPU cores are idle. This choice, however, impairs server energy proportionality since power-hungry resources (e.g., IOs, uncore, DRAM) remain active even when there is no active core to use them. As we show, it is common for all cores to be idle due to the low average utilization (e.g., 5-20%) of datacenter servers running user-facing applications. We propose to reap this opportunity with AgilePkgC (APC), a new package C-state architecture that improves the energy proportionality of server processors running latency-critical applications. APC implements PC 1A (package C l agile), a new deep package C-state that a system can enter once all cores are in a shallow C-state (i.e., CC1) and has a nanosecond-scale transition latency. PC 1A is based on four key techniques. First, a hardware-based agile power management unit (APMU) rapidly detects when all cores enter a shallow core C-state (CC1) and triggers the system-level power savings control flow. Second, an IO Standby Mode (IOSM) places IO interfaces (e.g., PCIe, DMI, UPI, DRAM) in shallow (nanosecond-scale transition latency) low-power modes. Third, a CLM Retention (CLMR) mode rapidly reduces the CLM (Cache-and-home-agent, Last-level-cache, and Mesh network-on-chip) domain’s voltage to its retention level, drastically reducing its power consumption. Fourth, APC keeps all system PLLs active in PC 1A to allow nanosecond-scale exit latency by avoiding PLL re-locking overhead. Combining these techniques enables significant power savings while requiring less than 200ns transition latency, $\gt250\times$ faster than existing deep package C-states (e.g., PC6), making PC 1A practical for datacenter servers. Our evaluation based on an Intel Skylake-based server shows that APC reduces the energy consumption of Memcached by up to 41% (25% on average) with <0.1% performance degradation. APC provides similar benefits for other representative workloads.
更多
查看译文
关键词
power,energy,thermal management
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要