2.4 Temperature and Variability Driven Modeling and Runtime Management


Time

Label

Presentation Title
Authors

11:30

2.4.1

HOT SPOT IDENTIFICATION AND SYSTEM PARAMETERIZED THERMAL MODELING FOR MULTI-CORE PROCESSORS THROUGH INFRARED THERMAL IMAGING
Speaker:
Sheldon Tan, University of California, Riverside, US
Authors:
Sheriff Sadiqbatcha¹, Hengyang Zhao¹, Hussam Amrouch², Joerg Henkel² and Sheldon Tan³
¹University of California, Riverside, US; ²Karlsruhe Institute of Technology, DE; ³University of California at Riverside, US
Abstract
Accurate thermal models suitable for system-level dynamic thermal, power and reliability regulation and management are vital for many commercial multi-core processors. However, developing such accurate thermal models and identifying the related thermal-power relevant spatial locations for commercial processors is a challenging task due to the lack of information and available tools. Existing tools such as HotSpot-like thermal models may suffer from inaccuracy or inefficiency for online applications, primarily because most rely on parameters that cannot be precisely quantified, such as power traces, while others are numerical methods not suitable for runtime use. In this work, we propose a novel approach to automatically detecting the major heat sources on a commercial multi-core microprocessor using an infrared thermal imaging setup. Our approach involves several steps, including a 2D discrete cosine transform filter for noise reduction on the measured thermal maps, and a Laplacian transformation followed by K-means clustering for heat-source identification. Since the identified heat sources are the thermally vulnerable areas of the die, we propose a novel approach to deriving a thermal model capable of predicting their temperatures during runtime. We apply Long Short-Term Memory (LSTM) networks to build a dynamic thermal model which uses system-level variables such as chip frequency, voltage and instruction count as inputs. The model is trained and tested exclusively using measured thermal data from a commercial multi-core processor. Experimental results show that the proposed thermal model achieves high accuracy (root-mean-square error: 2.04 °C to 2.57 °C) in predicting the temperature of all the identified heat sources on the chip.
Download Paper (PDF; Only available from the DATE venue WiFi)
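
As a rough illustration of the Laplacian step named in the abstract of 2.4.1, the sketch below builds a small synthetic thermal map and flags the cells where the discrete Laplacian has its most negative local minima as candidate heat sources. This is not the authors' pipeline: the DCT filtering and K-means stages are omitted, and the grid size, the synthetic data, and the helper names `synthetic_map`, `laplacian` and `candidate_sources` are all assumptions made for illustration.

```python
import math

def synthetic_map(rows, cols, sources, ambient=40.0):
    """Build a thermal map as ambient plus Gaussian bumps (hypothetical data)."""
    grid = [[ambient] * cols for _ in range(rows)]
    for sr, sc, peak, sigma in sources:
        for r in range(rows):
            for c in range(cols):
                d2 = (r - sr) ** 2 + (c - sc) ** 2
                grid[r][c] += peak * math.exp(-d2 / (2.0 * sigma ** 2))
    return grid

def laplacian(grid):
    """5-point discrete Laplacian on interior cells; border cells stay at 0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r - 1][c] + grid[r + 1][c]
                         + grid[r][c - 1] + grid[r][c + 1]
                         - 4.0 * grid[r][c])
    return out

def candidate_sources(grid, count):
    """Return the `count` strict local minima of the Laplacian, most negative first."""
    lap = laplacian(grid)
    rows, cols = len(lap), len(lap[0])
    minima = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = [lap[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if lap[r][c] < min(neighbours):
                minima.append((lap[r][c], (r, c)))
    minima.sort()
    return [pos for _, pos in minima[:count]]

# Two assumed heat sources: (row, col, peak rise in degrees C, spread in cells).
thermal = synthetic_map(12, 12, [(3, 3, 20.0, 1.5), (8, 9, 15.0, 1.5)])
hot = candidate_sources(thermal, 2)
print(hot)
```

A heat source shows up as a sharp local peak in temperature, which is where the Laplacian is most strongly negative; on measured data the noise-reduction step would matter, because the Laplacian amplifies pixel noise.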

12:00

2.4.2

LITHO-GPA: GAUSSIAN PROCESS ASSURANCE FOR LITHOGRAPHY HOTSPOT DETECTION
Speaker:
David Z. Pan, University of Texas at Austin, US
Authors:
Wei Ye¹, Mohamed Baker Alawieh¹, Meng Li², Yibo Lin¹ and David Z. Pan¹
¹University of Texas at Austin, US; ²University of Texas, Austin, US
Abstract
Lithography hotspot detection is one of the fundamental steps in physical verification. Due to increasingly complicated design patterns, early and quick feedback on lithography hotspots is desired to guide design closure in early stages. Machine learning approaches have been successfully applied to hotspot detection while demonstrating a remarkable capability of generalization to unseen hotspot patterns. However, most of the proposed machine learning approaches are not yet able to answer one critical question: how much can a hotspot predicted by a trained model be trusted? In this work, we present Litho-GPA, a lithography hotspot detection framework with Gaussian Process assurance to provide confidence in each prediction. The framework also incorporates a data selection scheme with a sequence of weak classifiers to sample representative data and eventually reduce the amount of training data and lithography simulations needed. Experimental results demonstrate that Litho-GPA is able to achieve state-of-the-art accuracy while obtaining, on average, a 28% reduction in false alarms.
Download Paper (PDF; Only available from the DATE venue WiFi)
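
To give a feel for how a Gaussian process can attach a confidence to each prediction, as the abstract of 2.4.2 describes, here is a minimal sketch of Gaussian process regression on a one-dimensional feature. It is not the Litho-GPA framework: the RBF kernel, the noise level, the scalar feature, and the toy scores are assumptions chosen for illustration, and the functions `rbf`, `solve` and `gp_predict` are hypothetical helpers.

```python
import math

def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_predict(xs, ys, x_star, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)        # (K + noise*I)^-1 y
    v = solve(K, k_star)        # (K + noise*I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Toy training set: a scalar layout feature and a hotspot score (made up).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0.1, 0.2, 0.9, 0.8, 0.95]

mean_near, var_near = gp_predict(xs, ys, 0.5)   # inside the training range
mean_far, var_far = gp_predict(xs, ys, 10.0)    # far from any training data
print(var_near, var_far)
```

The predictive variance stays small near the training data and returns to the prior variance far from it, which is the property that lets a detector flag predictions on unfamiliar patterns as low confidence.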

12:30

2.4.3

(Best Paper Award Candidate)
PINT: POLYNOMIAL IN TEMPERATURE DECODE WEIGHTS IN A NEUROMORPHIC ARCHITECTURE
Speaker:
Scott Reid, Stanford University, US
Authors:
Scott Reid¹, Antonio Montoya¹ and Kwabena Boahen²
¹Stanford University, US; ²Stanford, US
Abstract
We present Polynomial in Temperature (PinT) decode weights, a novel approach to approximating functions with an ensemble of silicon neurons that increases thermal robustness. In mixed-signal neuromorphics, computing accurately across a wide range of temperatures is challenging because of individual silicon neurons' thermal sensitivity. To compensate for the resulting changes in the neurons' tuning curves, weights in the PinT framework change continuously as a polynomial function of temperature. We validate PinT across a 38 °C range by applying it to tuning curves measured for ensembles of 64 to 1936 neurons on Braindrop, a mixed-signal neuromorphic chip fabricated in 28-nm FDSOI CMOS. LinT, the Linear in Temperature version of PinT, reduces error by a small margin on test data, relative to an ensemble with temperature-independent weights. LinT and higher-order models show much greater promise on training data, suggesting that performance can be further improved. When implemented on-chip, LinT's performance is very similar to the performance with temperature-independent decode weights. SpLinT and SpLSAT, the Sparse variants of LinT and LSAT, are promising avenues for efficiently reducing error. In the SpLSAT model, up to 90% of neurons on chip can be deactivated while maintaining the same function-approximation error.
Download Paper (PDF; Only available from the DATE venue WiFi)
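
The core idea in the abstract of 2.4.3, decode weights that vary as a polynomial in temperature, can be sketched in a few lines. The example below assumes each neuron i has coefficients c_i0, c_i1, ... so that its weight is w_i(T) = c_i0 + c_i1 (T - T_ref) + ..., and the decoded value is the weighted sum of firing rates. The reference temperature, the coefficient values and the function names `poly_weight` and `decode` are hypothetical, and nothing here reflects how the weights are fitted or implemented on Braindrop.

```python
def poly_weight(coeffs, temp, temp_ref=25.0):
    """Evaluate c0 + c1*dT + c2*dT**2 + ... with Horner's rule, dT = temp - temp_ref."""
    dt = temp - temp_ref
    result = 0.0
    for c in reversed(coeffs):
        result = result * dt + c
    return result

def decode(rates, weight_coeffs, temp, temp_ref=25.0):
    """Weighted sum of neuron firing rates using temperature-dependent weights."""
    return sum(poly_weight(c, temp, temp_ref) * r
               for c, r in zip(weight_coeffs, rates))

# Two neurons with first-order (linear-in-temperature) weights; values are made up.
coeffs = [[0.5, -0.005],   # w0(T) = 0.5 - 0.005 * (T - 25)
          [0.25, 0.0]]     # w1(T) = 0.25, temperature-independent
rates = [10.0, 20.0]

value_ref = decode(rates, coeffs, temp=25.0)   # weights at the reference temperature
value_hot = decode(rates, coeffs, temp=35.0)   # first weight reduced by 0.05
print(value_ref, value_hot)
```

A coefficient list of length one gives a temperature-independent weight, length two gives the linear case that the abstract calls LinT, and longer lists give higher-order variants.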

12:45

2.4.4

ENHANCING TWO-PHASE COOLING EFFICIENCY THROUGH THERMAL-AWARE WORKLOAD MAPPING FOR POWER-HUNGRY SERVERS
Speaker:
Arman Iranfar, EPFL, CH
Authors:
Arman Iranfar¹, Ali Pahlevan², Marina Zapater³ and David Atienza⁴
¹EPFL, CH; ²Embedded Systems Lab (ESL), Electrical Engineering Department, EPFL, CH; ³Ecole Polytechnique Federale de Lausanne, CH; ⁴École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The power density and, consequently, power hungriness of server processors are growing by the day. Traditional air cooling systems fail to cope with such high heat densities, whereas single-phase liquid cooling still requires high mass flow rate, high pumping power, and large facility size. In contrast, in a micro-scale gravity-driven thermosyphon attached on top of a processor, the refrigerant absorbs the heat and turns into a two-phase mixture. The vapor-liquid mixture exchanges heat with a coolant at the condenser side, turns back into liquid, and descends thanks to gravity, eliminating the need for pumping power. However, similar to other cooling technologies, thermosyphon efficiency can vary considerably with workload performance requirements and thermal profile, in addition to platform features such as packaging and die floorplan. In this work, we first address the workload- and platform-aware design of a two-phase thermosyphon. Then, we propose a thermal-aware workload mapping strategy considering the potential and limitations of a two-phase thermosyphon to further minimize hot spots and spatial thermal gradients. Our experiments, performed on an 8-core Intel Xeon E5 CPU, reveal, on average, up to 10 °C reduction in thermal hot spots, and 45% reduction in the maximum spatial thermal gradient on the die. Moreover, our design and mapping strategy are able to decrease the chiller cooling power by at least 45%.
Download Paper (PDF; Only available from the DATE venue WiFi)

13:00

IP1-5, 711

(Best Paper Award Candidate)
ADAPTIVE TRANSIENT LEAKAGE-AWARE LINEARISED MODEL FOR THERMAL ANALYSIS OF 3-D ICS
Speaker:
Milan Mihajlovic, University of Manchester, GB
Authors:
Chao Zhang, Milan Mihajlovic and Vasilis Pavlidis, The University of Manchester, GB
Abstract
Physics-based models for thermal simulation that involve numerical solution of the heat equation are well placed to accurately capture the heterogeneity of materials and structures in modern 3-D integrated circuits (ICs). The introduction of non-linear effects in thermal coefficients and leakage power significantly improves the accuracy of thermal models. However, this non-linearity significantly increases the complexity and computational time of the analysis. In this paper, we introduce a linearised thermal model by demonstrating that the weak temperature dependence of the specific heat and the thermal conductivity of silicon-based materials has only a minor effect on computed temperature profiles. Thus, these parameters can be considered constant in the working temperature ranges of modern ICs. The non-linearity in leakage power is approximated by a piecewise linear least-squares fit, and the resulting model is linearised by the exact Newton's method, in contrast to previous works that employ either simple iterative or inexact Newton's methods. The method is implemented in the context of transient thermal analysis with adaptive time step selection, where we demonstrate that it is essential to apply Newton corrections to obtain the right time step size selection. The resulting method is up to 2x faster than a full non-linear method, typically introducing a global relative error of less than 1%.
Download Paper (PDF; Only available from the DATE venue WiFi)
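
To illustrate why temperature-dependent leakage makes the thermal problem non-linear, and how Newton's method handles it, the sketch below solves a steady-state energy balance for a single lumped thermal node. This is far simpler than the transient 3-D model in the abstract of IP1-5: the exponential leakage law (used here instead of a piecewise linear fit), the conductance, the power values and the function names are all assumptions made for illustration.

```python
import math

# Assumed lumped parameters (hypothetical values, not from the paper).
G = 2.0          # thermal conductance to ambient, W/K
T_AMB = 25.0     # ambient temperature, degrees C
P_DYN = 50.0     # temperature-independent dynamic power, W
P_LEAK0 = 5.0    # leakage power at T_AMB, W
BETA = 0.02      # exponential leakage sensitivity, 1/K

def residual(temp):
    """Heat removed minus heat generated; zero at the steady-state temperature."""
    leak = P_LEAK0 * math.exp(BETA * (temp - T_AMB))
    return G * (temp - T_AMB) - P_DYN - leak

def residual_derivative(temp):
    """Exact derivative of the residual with respect to temperature."""
    return G - P_LEAK0 * BETA * math.exp(BETA * (temp - T_AMB))

def newton_steady_state(temp0, tol=1e-10, max_iter=50):
    """Exact Newton iteration: temp <- temp - F(temp) / F'(temp)."""
    temp = temp0
    for _ in range(max_iter):
        step = residual(temp) / residual_derivative(temp)
        temp -= step
        if abs(step) < tol:
            return temp
    raise RuntimeError("Newton iteration did not converge")

t_steady = newton_steady_state(T_AMB)
print(t_steady, residual(t_steady))
```

Each Newton step linearises the leakage term around the current temperature estimate, so the update uses the exact slope of the residual rather than a fixed or approximate one.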

13:01

IP1-6, 363

FASTCOOL: LEAKAGE AWARE DYNAMIC THERMAL MANAGEMENT OF 3D MEMORIES
Speaker:
Lokesh Siddhu, IIT Delhi, IN
Authors:
Lokesh Siddhu¹ and Preeti Ranjan Panda²
¹Indian Institute of Technology, Delhi, IN; ²IIT Delhi, IN
Abstract
3D memory systems offer several advantages in terms of area, bandwidth, and energy efficiency. However, thermal issues arising out of higher power densities have limited their widespread use. While prior works have looked at reducing dynamic power through reduced memory accesses, leakage and dynamic power consumption are comparable in these memories. Furthermore, as the temperature rises, the leakage power increases, creating a thermal-leakage loop. We study the impact of leakage power on 3D memory temperature and propose turning OFF hot channels to meet thermal constraints. Data is migrated to a 2D memory before closing a 3D channel. We introduce an analytical model to assess the 2D memory delay and use the model to guide data migration decisions. Our experiments show that the proposed optimization improves performance by 27% on average (up to 66%) over state-of-the-art strategies.
Download Paper (PDF; Only available from the DATE venue WiFi)

13:00

End of session
Lunch Break in Lunch Area

Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Breaks (Lunch Area)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the Lunch Area to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 26, 2019

Coffee Break 10:30 - 11:30
Lunch Break 13:00 - 14:30
Keynote Lecture "Leonardo da Vinci, Humanism and Engineering between Florence and Milan" by Claudio Giorgione in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Wednesday, March 27, 2019

Coffee Break 10:00 - 11:00
Lunch Break 12:30 - 14:30
Keynote Lecture "Heterogeneous, High Scale Computing in the Era of Intelligent, Cloud-Connected" by David Pellerin, Amazon, US in room 1 13:50 - 14:20
Coffee Break 16:00 - 17:00

Thursday, March 28, 2019

Coffee Break 10:00 - 11:00
University Booth Best Demo Award Presentation at the University Booth 10:30
Lunch Break 12:30 - 14:00
Keynote Lecture "A Fundamental Look at Models and Intelligence" by Edward A. Lee, University of California, Berkeley, US in room 1 13:20 - 13:50
Coffee Break 15:30 - 16:00