6.3 Emerging Low Power Techniques

Printer-friendly version PDF version

Date: Wednesday 11 March 2015
Time: 11:00 - 12:30
Location / Room: Stendhal

Chair:
Guillermo Payá Vayá, Leibniz Universität Hannover, DE

Co-Chair:
Alberto Garcia-Ortiz, U. Bremen, DE

Technology improvements towards the nanometric era are inducing new challenges which need to be addressed at all the abstration levels. This section focuses on the latest emerging approaches to cope with those challenges, as for example approximate computing, compressed sensing, asymmetric underlapped FinFET, etc.

TimeLabelPresentation Title
Authors
11:006.3.1(Best Paper Award Candidate)
ASYMMETRIC UNDERLAPPED FINFET BASED ROBUST SRAM DESIGN AT 7NM NODE
Speaker:
Rangharajan Venkatesan, NVIDIA Corporation, US
Authors:
Arun Goud Akkala1, Rangharajan Venkatesan2, Anand Raghunathan1 and Kaushik Roy1
1Purdue University, US; 2NVIDIA Corporation, US
Abstract
Robust 6T SRAM design in 7nm technology node, at low supply voltage and rising leakage, requires ingenious design of FinFETs capable of providing reasonable Ion/Ioff ratio and acceptable short channel effects even under new leakage mechanisms such as direct source to drain tunneling. In this work, we explore asymmetric underlapped FinFET design with the help of quantum mechanical device simulations considering both the bit-cell and cache design constraints. We show that our optimized FinFET achieves a significant improvement in on-current over conventional symmetrically underlapped FinFETs. Through circuit simulations using compact models, we demonstrate that when such asymmetric underlapped n-FinFETs are used as bit-line access transistors, read/write conflict can be mitigated with simultaneous reduction in 6T SRAM bit-cell leakage. Improvement in write noise margin as well as access time can also be achieved under iso-read stability condition. Based on these technology and bit-cell models, we have developed a CACTI-based simulator for evaluating asymmetric FinFET based SRAM cache at 7nm node. Using this device-circuit-system level framework and optimized asymmetric underlapped FinFETs, we demonstrate significant energy savings and performance improvements for an 8KB L1 cache and a 4MB last-level cache.

Download Paper (PDF; Only available from the DATE venue WiFi)
11:306.3.2QUALITY CONFIGURABLE REDUCE-AND-RANK FOR ENERGY EFFICIENT APPROXIMATE COMPUTING
Speakers:
Arnab Raha, Swagath Venkataramani, Vijay Raghunathan and Anand Raghunathan, Purdue University, US
Abstract
Approximate computing is an emerging design paradigm that exploits the intrinsic ability of applications to produce acceptable outputs even when their computations are executed approximately. In this work, we explore approximate computing for a key computation pattern, Reduce-and-Rank (RnR), which is prevalent in a wide range of workloads including video processing, recognition, search and data mining. An RnR kernel performs a reduction operation (e.g., distance computation, dot product, L1-norm) between an input vector and each of a set of reference vectors, and ranks the reduction outputs to select the top reference vectors for the current input. We propose two complementary approximation strategies for the RnR computation pattern. The first is interleaved reduction-and-ranking, wherein the vector reductions are decomposed into multiple partial reductions and interleaved with the rank computation. Leveraging this transformation, we propose the use of intermediate reduction results and ranks to identify future computations that are likely to have low impact on the output, and can hence be approximated. The second strategy, input similarity based approximation, exploits the spatial or temporal correlation of inputs (e.g., pixels of an image or frames of a video) to identify computations that are amenable to approximation. These strategies address a key challenge in approximate computing - identification of which computations to approximate - and may be used to drive any approximation mechanism such as computation skipping and precision scaling to realize performance or energy improvements. A second key challenge in approximate computing is that the extent to which computations can be approximated varies significantly from application to application, and across inputs for even a single application. Hence, quality configurability, or the ability to automatically modulate the degree of approximation at runtime is essential. To enable quality configurability in RnR kernels, we propose a kernel-level quality metric that correlates well to application-level quality, and identify key parameters that can be used to tune the proposed approximation strategies dynamically. We develop a runtime framework that modulates the identified parameters during execution of RnR kernels to minimize their energy while meeting a given target quality. To evaluate the proposed concepts, we designed quality-configurable hardware implementations of 6 RnR-based applications from the recognition, mining, search and video processing application domains in 45nm technology. Our experiments demonstrate 1.06X-2.18X reduction in energy consumption with virtually no loss in output quality (<0.5%) at the application-level. The energy benefits further improve up to 2.38X and 2.5X when the quality constraints are relaxed to 2.5% and 5% respectively.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:006.3.3ULTRA-LOW-POWER ECG FRONT-END DESIGN BASED ON COMPRESSED SENSING
Speakers:
Hossein Mamaghanian1 and Pierre Vandergheynst2
1EPFL, CH; 2École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
Ultra-low-power design has been a challenging area for design of the sensor front-ends especially in the area of Wireless Body Sensor Nodes (WBSN), where a limited amount of power budget and hardware resources is available. Since introduction of CS, there has been a huge challenge to design CS readout devices for different applications and among all for biomedical signals. Till now, different proposed realizations of the digital CS prove the suitability of using CS as a compression technique for compressible biomedical signals. However, these works mainly take advantages of only one aspect of the benefits of the CS. In this type of works, CS is usually used as a very low cost and easy to implement compression technique. This means that we should acquire the signal with traditional limitations on the bandwidth (BW) and later compresses it. However, the main power of the CS, which lies on the efficient data acquisition, remains untouched. Building on our previous work [1], where the suitability of the CS is proven for the compression of the ECG signals, and our investigation on ultra-low-power CS-based A2I devices [2] , here in this paper we propose a fully redesigned complete CS-based "Analog-to-information'' (A/I) front-end for ECG signals. Our results show that proposed hybrid design easily outperforms the traditional implementation of CS with more than 11 times fold reduction in power consumption in high compression ratio.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:156.3.4GTFUZZ: A NOVEL ALGORITHM FOR ROBUST DYNAMIC POWER OPTIMIZATION VIA GATE SIZING WITH FUZZY GAMES
Speakers:
Tony Casagrande and Nagarajan Ranganathan, University of South Florida, US
Abstract
As CMOS technology continues to scale, the effects of variation inject a greater proportion of error and uncertainty into the design process. Ultra-deep submicron circuits require accurate modeling of gate delay in order to meet challenging timing constraints.With the lack of statistical data, designers are faced with a arduous task to optimize a circuit which is greatly affected by variability due to the mechanical and chemical manufacturing process. Discrete gate sizing is a complex problem which requires (1) accurate models that take into account random parametric variation and (2) a fair allocation of resources to maximize the solution in the delay-energy space. The GTFUZZ algorithm is presented which handles both of these tasks. Fuzzy games are used to model the problem of gate sizing as a resource allocation problem. In fuzzy games, delay is considered a fuzzy goal with fuzzy parameters to capture the imprecision of gate delay early in the design phase when empirical data is absent. Dynamic power is normalized as a fuzzy goal without varying coefficients. The fuzzy goals also provide a flexible platform for multimetric optimization. The robust GTFUZZ algorithm is compared against fuzzy linear programming (FLP) and deterministic worst-case FLP (DWCFLP) algorithms. Benchmark circuits are first synthesized, placed, routed, and optimized for performance using the Synopsys University 32/28nm standard cell library and technology files. Operating at the optimized clock frequency, results show an average power reduction of about 20% versus DWCFLP and 9% against variation-aware gate sizing with FLP. Timing and timing yield are verified by both Synopsys PrimeTime and Monte Carlo simulations of the most critical paths using HSPICE.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30IP3-2, 1051SPATIAL AND TEMPORAL GRANULARITY LIMITS OF BODY BIASING IN UTBB-FDSOI
Speakers:
Johannes Maximilian Kühn1, Dustin Peterson1, Hideharu Amano2, Oliver Bringmann1 and Wolfgang Rosenstiel1
1Eberhard Karls Universität Tübingen, DE; 2Keio University, JP
Abstract
Advances in SOI technology such as STMicro's 28nm UTBB-FDSOI enabled a renaissance of body biasing. Body biasing is a fast and efficient technique to change power and performance characteristics. As the electrical task to change the substrate potential is small compared to Dynamic Voltage Scaling, much finer island sizes are conceivable. This however creates new challenges in regard to design partitioning into body bias islands and body bias combinations across such designs. These combinations should be chosen so that energy efficiency improves while maintaining timing constraints. We introduce a combination based analysis tool to find optimized body bias island partitions and body biasing levels. For such partitions, optimized body bias assignments for static, programmable and dynamic body biasing can be computed. The overheads incurred by dynamically switching body biases are estimated to yield actual improvements and to give an upper bound for the power consumption of required additional circuitry. Based on these partitionings and the switching overheads, optimized application specific switching strategies are computed. The effectiveness of this method is demonstrated in a frequency scaling scenario using forward body biasing on a Dynamic Reconfigurable Processor (DRP) design. We show that leakage can be greatly reduced using the proposed methods and that dynamic body biasing can be beneficial even at small time periods.

Download Paper (PDF; Only available from the DATE venue WiFi)
12:30End of session
Lunch Break, Keynote lectures from 1250 - 1420 (Room Oisans) in front of the session room Salle Oisans and in the Exhibition area

Coffee Break in Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area.

Lunch Break

On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only).

Tuesday, March 10, 2015

Coffee Break 10:30 - 11:30

Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics

Coffee Break 16:00 - 17:00

Wednesday, March 11, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans)

Coffee Break 16:00 - 17:00

Thursday, March 12, 2015

Coffee Break 10:00 - 11:00

Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50

Coffee Break 15:30 - 16:00