4.4 Some run it hot, others do not


Date: Tuesday 10 March 2020
Time: 17:00 - 18:30
Location / Room: Stendhal

Chair:
Pascal Vivet, CEA-Leti, FR

Co-Chair:
Daniele J. Pagliari, Politecnico di Torino, IT

Temperature management is a must-have in modern computing systems. The session presents a set of techniques for smart cooling systems, both active and proactive, and for thermal control policies. The techniques are applied vertically to different components, such as computing and communication sub-systems, and use orthogonal modeling and optimization strategies, such as machine learning.

Time  Label  Presentation Title / Authors
17:00  4.4.1  A LEARNING-BASED THERMAL SIMULATION FRAMEWORK FOR EMERGING TWO-PHASE COOLING TECHNOLOGIES
Speaker:
Ayse Coskun, Boston University, US
Authors:
Zihao Yuan1, Geoffrey Vaartstra2, Prachi Shukla1, Zhengmao Lu2, Evelyn Wang2, Sherief Reda3 and Ayse Coskun1
1Boston University, US; 2Massachusetts Institute of Technology, US; 3Brown University, US
Abstract
Future high-performance chips will require new cooling technologies that can extract heat efficiently. Two-phase cooling is a promising processor cooling solution owing to its high heat transfer rate and potential benefits in cooling power. Two-phase cooling mechanisms, including microchannel-based two-phase cooling or two-phase vapor chambers (VCs), are typically modeled by computing the temperature-dependent heat transfer coefficient (HTC) of the evaporator or coolant using an iterative simulation framework. Precomputed HTC correlations are specific to a given cooling system design and cannot be applied to even the same cooling technology with different cooling parameters (such as different geometries). Another challenge is that HTC correlations are typically calculated with computational fluid dynamics (CFD) tools, which induce long design and simulation times. This paper introduces a learning-based temperature-dependent HTC simulation framework that is used to model a two-phase cooling solution with a wide range of cooling design parameters. In particular, the proposed framework includes a compact thermal model (CTM) of two-phase VCs with hybrid wick evaporators (of nanoporous membrane and microchannels). We build a new simulation tool to integrate the proposed simulation framework and CTM. We validate the proposed simulation framework as well as the new CTM through comparisons against a CFD model. Our simulation framework and CTM achieve a speedup of 21X with an average error of 0.98 °C (and a maximum error of 2.59 °C). We design an optimization flow for hybrid wicks to select the most beneficial nanoporous membrane and microchannel geometries. Our flow is capable of finding a geometry-coolant combination that results in a lower (or similar) maximum chip temperature compared to that of the best coolant-geometry pair selected by grid search, while providing a speedup of 9.4X.
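The iterative coupling between temperature and HTC mentioned in the abstract can be illustrated with a minimal fixed-point sketch. This is not the authors' framework: the linear `htc` correlation, its coefficients, and the single-node energy balance are assumptions chosen only to show why the simulation has to iterate.

```python
def htc(temp_c):
    """Hypothetical temperature-dependent HTC in W/(m^2*K); not a real correlation."""
    return 20000.0 + 150.0 * (temp_c - 40.0)


def solve_surface_temperature(heat_flux, coolant_temp_c, tol=1e-6, max_iter=200):
    """Iterate T = T_coolant + q / h(T) until the temperature stops changing.

    heat_flux is in W/m^2; returns (temperature in degC, iterations used).
    """
    temp = coolant_temp_c + 10.0  # arbitrary initial guess
    for iteration in range(1, max_iter + 1):
        new_temp = coolant_temp_c + heat_flux / htc(temp)
        if abs(new_temp - temp) < tol:
            return new_temp, iteration
        temp = new_temp
    raise RuntimeError("fixed-point iteration did not converge")


surface_temp, iterations = solve_surface_temperature(heat_flux=5.0e5, coolant_temp_c=40.0)
print(f"surface temperature: {surface_temp:.2f} degC after {iterations} iterations")
```

Because the HTC depends on the temperature it helps determine, each evaluation needs several passes; this is the loop that a learned HTC model would aim to make cheaper.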

17:30  4.4.2  LIGHTWEIGHT THERMAL MONITORING IN OPTICAL NETWORKS-ON-CHIP VIA ROUTER REUSE
Speaker:
Mengquan Li, Nanyang Technological University, SG
Authors:
Mengquan Li1, Jun Zhou2 and Weichen Liu2
1Nanyang Technological University, CN; 2Nanyang Technological University, SG
Abstract
Optical network-on-chip (ONoC) is an emerging communication architecture for manycore systems due to its low latency, high bandwidth, and low power dissipation. However, a major concern lies in its thermal susceptibility -- under on-chip temperature variations, functional nanophotonic devices, especially microring resonator (MR)-based devices, suffer from significant thermally induced optical power loss, which may counteract the power advantages of ONoCs and even cause functional failures. Because temperature gradients are typically found on many-core systems, effective thermal monitoring, which serves as the foundation of thermal-aware management, is critical for ONoCs. In this paper, a lightweight thermal monitoring scheme is proposed for ONoCs. We first design a temperature measurement module based on generic optical routers. It introduces trivial chip area overhead by reusing the components in routers. A major problem with reusing optical routers is that it may interfere with the normal communications in ONoCs. To address this, we then propose a time allocation strategy to schedule thermal sensing operations in the time intervals between communications. Evaluation results show that our scheme exhibits an untrimmed inaccuracy of 1.0070 K with a low energy consumption of 656.38 pJ/Sa. It occupies an extremely small area of 0.0020 mm^2, reducing the area cost by 83.74% on average compared to the state-of-the-art optical thermal sensor design.
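The idea of placing sensing operations in the intervals between communications can be sketched as a simple gap search. This is an illustrative simplification, not the paper's time allocation strategy: the function names, the abstract time units, and the rule of one sensing operation per sufficiently long gap are all assumptions.

```python
def idle_gaps(busy_intervals, horizon):
    """Return the idle (start, end) gaps in [0, horizon) given busy (start, end) intervals."""
    gaps = []
    cursor = 0
    for start, end in sorted(busy_intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < horizon:
        gaps.append((cursor, horizon))
    return gaps


def schedule_sensing(busy_intervals, horizon, sense_duration):
    """Place one sensing operation at the start of every gap that can hold it."""
    return [
        (start, start + sense_duration)
        for start, end in idle_gaps(busy_intervals, horizon)
        if end - start >= sense_duration
    ]


# Hypothetical router communication windows, in arbitrary time units.
communications = [(0, 40), (55, 90), (92, 120)]
slots = schedule_sensing(communications, horizon=150, sense_duration=10)
print(slots)  # [(40, 50), (120, 130)]
```

The gap between 90 and 92 is skipped because it is shorter than the sensing duration, so the router is never borrowed while a communication needs it.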

18:00  4.4.3  A SPECTRAL APPROACH TO SCALABLE VECTORLESS THERMAL INTEGRITY VERIFICATION
Speaker:
Zhuo Feng, Stevens Institute of Technology, US
Authors:
Zhiqiang Zhao1 and Zhuo Feng2
1Michigan Technological University, US; 2Stevens Institute of Technology, US
Abstract
Existing chip thermal analysis and verification methods require a detailed distribution of power densities or modeling of the underlying input workloads (vectors), which may not always be feasible at early design stages. This paper introduces the first vectorless thermal integrity verification framework that allows computing worst-case temperature (gradient) distributions across the entire chip under a set of local and global workload (power density) constraints. To address the computational challenges introduced by large 3D mesh-structured thermal grids, we propose a novel spectral approach for highly scalable vectorless thermal verification of large chip designs. Our approach is based on emerging spectral graph theory and graph signal processing techniques, and consists of a thermal grid topology sparsification phase, an edge weight scaling phase, and a solution refinement procedure. The effectiveness and efficiency of our approach have been demonstrated through extensive experiments.
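To make the vectorless problem statement concrete, the sketch below maximizes the temperature rise at one node under per-block (local) power caps and a single total (global) power budget. It does not implement the paper's spectral method. It assumes a linear thermal model in which each block's power contributes to the node temperature through a known non-negative sensitivity in K/W, and all numbers are invented. With only one global budget, the resulting linear program can be solved greedily.

```python
def worst_case_temperature(sensitivities, local_caps, global_budget):
    """Maximize sum(s_j * p_j) subject to 0 <= p_j <= local_caps[j] and sum(p_j) <= global_budget.

    Every watt costs the same against the budget, so spending it on the most
    sensitive blocks first is optimal for this simplified problem.
    """
    order = sorted(range(len(sensitivities)), key=lambda j: sensitivities[j], reverse=True)
    remaining = global_budget
    powers = [0.0] * len(sensitivities)
    for j in order:
        if remaining <= 0 or sensitivities[j] <= 0:
            break
        powers[j] = min(local_caps[j], remaining)
        remaining -= powers[j]
    rise = sum(s * p for s, p in zip(sensitivities, powers))
    return rise, powers


# Hypothetical sensitivities (K/W) of one observed node to three power blocks.
rise, powers = worst_case_temperature(
    sensitivities=[0.8, 0.5, 0.2],
    local_caps=[2.0, 3.0, 4.0],
    global_budget=4.0,
)
print(rise, powers)
```

A full verification would repeat this for every node of a large 3D grid and for several overlapping global constraints, which is where the scalability problem addressed in the abstract arises.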

18:15  4.4.4  DYNAMIC THERMAL MANAGEMENT WITH PROACTIVE FAN SPEED CONTROL THROUGH REINFORCEMENT LEARNING
Speaker:
Arman Iranfar, EPFL, CH
Authors:
Arman Iranfar1, Federico Terraneo2, Gabor Csordas1, Marina Zapater1, William Fornaciari2 and David Atienza1
1EPFL, CH; 2Politecnico di Milano, IT
Abstract
Dynamic Thermal Management (DTM) in submicron technology has become a major challenge since it directly affects the performance, power consumption, and lifetime reliability of Multiprocessor Systems-on-Chip (MPSoCs). For proper DTM, thermal simulators play a significant role as they allow chip temperature to be safely studied. Nonetheless, state-of-the-art thermal simulators do not support transient fan models. As a result, adaptive fan speed control, which is an important runtime parameter, cannot be well utilized in DTM. Therefore, in this work, we first propose and integrate a transient fan model into a state-of-the-art thermal simulator, enabling adaptive fan speed control simulation for efficient DTM. We then validate our simulation framework through a thermal test chip, achieving less than 2 °C error in the worst case. With multiple fan speeds, however, the DTM design space grows significantly, which can ultimately make conventional solutions, such as grid search, infeasible, impractical, or insufficient due to the large runtime overhead. Therefore, we address this challenge through a reinforcement learning-based solution that proactively determines the number of active cores, the operating frequency, and the fan speed. The proposed solution is able to reduce fan power by up to 40% compared to a DTM with constant fan speed, with less than 1% performance degradation. Also, compared to a state-of-the-art DTM technique, our solution improves performance by up to 19% for the same fan power.
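A reinforcement learning controller over (active cores, frequency, fan speed) can be sketched with tabular Q-learning. The abstract does not specify the learning algorithm, state encoding, or reward, so everything here is an assumption: the action grid, the string-valued thermal states, and the reward value are hypothetical placeholders.

```python
import random
from collections import defaultdict

# Hypothetical action space: (active cores, frequency in GHz, fan speed level).
ACTIONS = [(cores, freq, fan) for cores in (2, 4) for freq in (1.0, 2.0) for fan in (0, 1, 2)]

q_table = defaultdict(float)  # maps (state, action) to an estimated return


def choose_action(state, epsilon, rng):
    """Epsilon-greedy selection over the joint core/frequency/fan action space."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: q_table[(state, action)])


def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])


# One illustrative step: the reward is assumed to penalize fan power and slowdown.
update("warm", ACTIONS[0], reward=-1.0, next_state="cool")
greedy = choose_action("warm", epsilon=0.0, rng=random.Random(0))
print(q_table[("warm", ACTIONS[0])], greedy)
```

In a real controller the state would come from temperature sensors and workload counters, and the reward would be measured on the simulator or the chip rather than supplied by hand.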

18:30  IP2-5, 362  A HEAT-RECIRCULATION-AWARE VM PLACEMENT STRATEGY FOR DATA CENTERS
Authors:
Hao Feng1, Yuhui Deng2 and Yi Zhou3
1Jinan University, CN; 2Chinese Academy of Sciences; Jinan University, CN; 3Columbus State University, US
Abstract
Data centers consist of a great number of IT devices (e.g., servers and switches), which generate a massive amount of heat. Due to the special arrangement of racks in the data center, heat recirculation often occurs between nodes. It can cause a sharp rise in equipment temperature, coupled with local hot spots in data centers. Existing VM placement strategies can minimize the energy consumption of data centers by optimizing the allocation of multiple physical resources (e.g., memory, bandwidth, and CPU). However, existing strategies ignore the role of heat recirculation in the data center. To address this problem, we propose a heat-recirculation-aware VM placement strategy and design a Simulated Annealing Based Algorithm (SABA) to lower the energy consumption of data centers. Unlike the existing SA algorithm, SABA optimizes the distribution of the initial solution and the iteration procedure. We quantitatively evaluate SABA's performance in terms of algorithm efficiency, the number of activated servers, and energy savings against the XINT-GA algorithm (a thermal-aware task scheduling strategy), FCFS (First-Come First-Served), and SA. Experimental results indicate that our heat-recirculation-aware VM placement strategy provides a powerful solution for improving the energy efficiency of data centers.
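A plain simulated annealing loop for VM placement with a recirculation term in the cost can be sketched as follows. This is generic SA, not SABA: the abstract does not give SABA's initial-solution or iteration scheme, and the cost model, the recirculation matrix, and all parameter values below are assumptions for illustration.

```python
import math
import random


def placement_cost(placement, recirculation, idle_power=100.0, power_per_vm=20.0):
    """Hypothetical cost: power of active servers plus the heat they recirculate."""
    n = len(recirculation)
    load = [0] * n
    for server in placement:
        load[server] += 1
    power = [idle_power + power_per_vm * vms if vms else 0.0 for vms in load]
    recirculated = sum(recirculation[i][j] * power[i] for i in range(n) for j in range(n))
    return sum(power) + recirculated


def anneal(num_vms, recirculation, capacity, steps=2000, t0=50.0, cooling=0.995, seed=0):
    """Generic simulated annealing; assumes num_vms <= len(recirculation) * capacity."""
    rng = random.Random(seed)
    n = len(recirculation)
    current = [vm % n for vm in range(num_vms)]  # round-robin starting placement
    current_cost = placement_cost(current, recirculation)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(num_vms)] = rng.randrange(n)  # move one VM
        if max(candidate.count(s) for s in range(n)) <= capacity:
            delta = placement_cost(candidate, recirculation) - current_cost
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost


# Hypothetical 3-server recirculation matrix: entry [i][j] is the fraction of
# server i's heat that reaches the inlet of server j.
recirc = [[0.00, 0.10, 0.05], [0.10, 0.00, 0.20], [0.05, 0.20, 0.00]]
best, best_cost = anneal(num_vms=4, recirculation=recirc, capacity=4)
print(best, round(best_cost, 2))
```

Under this toy cost, consolidating VMs onto the servers with the least recirculation lowers both the number of active servers and the recirculated heat, which is the trade-off the abstract describes.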

18:31  IP2-6, 826  ENERGY OPTIMIZATION IN NCFET-BASED PROCESSORS
Authors:
Sami Salamin1, Martin Rapp1, Hussam Amrouch1, Andreas Gerstlauer2 and Joerg Henkel1
1Karlsruhe Institute of Technology, DE; 2University of Texas at Austin, US
Abstract
Energy consumption is a key optimization goal for all modern processors. Negative Capacitance Field-Effect Transistors (NCFETs) are a leading emerging technology that promises outstanding performance in addition to better energy efficiency. The thickness of the additional ferroelectric layer, frequency, and voltage are the key parameters in NCFET technology that impact the power and frequency of processors. However, their joint impact on energy optimization has not been investigated yet. In this work, we are the first to demonstrate that conventional (i.e., NCFET-unaware) dynamic voltage/frequency scaling (DVFS) techniques to minimize energy are sub-optimal when applied to NCFET-based processors. We further demonstrate that state-of-the-art NCFET-aware voltage scaling for power minimization is also sub-optimal when it comes to energy. This work provides the first NCFET-aware DVFS technique that optimizes the processor's energy through optimal runtime frequency/voltage selection. In NCFETs, energy-optimal frequency and voltage are dependent on the workload and technology parameters. Our NCFET-aware DVFS technique considers these effects to perform optimal voltage/frequency selection at runtime depending on workload characteristics. Results show up to 90% energy savings compared to conventional DVFS techniques. Compared to state-of-the-art NCFET-aware power management, our technique provides up to 72% energy savings along with 3.7x higher performance.
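The distinction between minimizing power and minimizing energy can be shown with a small selection sketch. It is not the authors' NCFET-aware technique and uses no NCFET device model: the operating-point table is invented, and energy is assumed to be power multiplied by execution time for a fixed cycle count.

```python
def energy_optimal_point(operating_points, cycles):
    """Return the (voltage, frequency, power) point minimizing power * cycles / frequency."""
    def energy(point):
        _voltage, freq_hz, power_w = point
        return power_w * cycles / freq_hz  # joules for a workload of `cycles` cycles
    return min(operating_points, key=energy)


# Hypothetical operating points: (voltage in V, frequency in Hz, power in W).
points = [
    (0.5, 1.0e9, 0.40),
    (0.6, 2.0e9, 0.70),
    (0.7, 3.0e9, 1.50),
]

lowest_energy = energy_optimal_point(points, cycles=3.0e9)
lowest_power = min(points, key=lambda point: point[2])
print("energy-optimal:", lowest_energy)
print("power-optimal: ", lowest_power)
```

With these made-up numbers the lowest-power point runs so long that it consumes more energy than the middle point, which illustrates why a power-minimizing policy can be sub-optimal for energy.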

18:30  End of session