7.7 Energy-efficient Computing

Date: Wednesday 11 March 2015
Time: 14:30 - 16:00
Location / Room: Les Bans

Chair:
Damien Querlioz, CNRS-IEF, FR

Co-Chair:
Swaroop Ghosh, University of South Florida, US

The papers in this session are all focused on energy efficient computing. In the first talk, the authors present approaches for accelerating learning algorithms for resistive cross-point arrays. The next paper considers what training schemes are most suitable when RRAM arrays are used to realize spiking neural networks. Finally, the last presentation will discuss how devices that offer the potential for non-volatile state retention can be employed in power gating architectures.

Time	Label	Presentation Title Authors
14:30	7.7.1	TECHNOLOGY-DESIGN CO-OPTIMIZATION OF RESISTIVE CROSS-POINT ARRAY FOR ACCELERATING LEARNING ALGORITHMS ON CHIP Speakers: Pai-Yu Chen, Deepak Kadetotad, Zihan Xu, Abinash Mohanty, Binbin Lin, Jieping Ye, Sarma Vrudhula, Jae-sun Seo, Yu Cao and Shimeng Yu, Arizona State University, US Abstract Technology-design co-optimization methodologies of the resistive cross-point array are proposed for implementing machine learning algorithms on a chip. A novel read and write scheme is designed to accelerate the training process, which realizes fully parallel operations of the weighted sum and the weight update. Furthermore, technology and design parameters of the cross-point array are co-optimized to enhance the array performance in learning tasks, including learning accuracy, latency and energy consumption. In contrast to the conventional memory design, a set of reverse scaling rules is proposed on the resistive cross-point array to achieve high learning accuracy. These include 1) larger wire width to reduce the IR drop on interconnects thereby increasing the learning accuracy; 2) use of multiple cells for each weight element to alleviate the impact of the device variations, at an affordable expense of area, energy and latency. The optimized resistive cross-point array with peripheral circuitry is implemented at the 65 nm node. Its performance is benchmarked for handwritten digit recognition on the MNIST database using gradient-based sparse coding. Compared to state-of-the-art software approach running on CPU, it achieves >1E3 speed-up and >1E6 energy efficiency improvement, enabling real-time image feature extraction and learning. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.7.2	SPIKING NEURAL NETWORK WITH RRAM: CAN WE USE IT FOR REAL-WORLD APPLICATION? Speakers: Tianqi Tang¹, Lixue Xia¹, Boxun Li¹, Rong Luo¹, Yiran Chen², Yu Wang¹ and Huazhong Yang¹ ¹Tsinghua University, CN; ²University of Pittsburgh, US Abstract The spiking neural network (SNN) provides a promising solution to drastically promote the performance and efficiency of computing systems. Previous work of SNN mainly focus on increasing the scalability and level of realism in a neural simulation, while few of them support practical cognitive applications with acceptable performance. At the same time, based on the traditional CMOS technology, the efficiency of SNN systems is also unsatisfactory. In this work, we explore different training algorithms of SNN for real-world applications, and demonstrate that the Neural Sampling method is much more effective than Spiking Time Dependent Plasticity (STDP) and Remote Supervision Method (ReSuMe). We also propose an energy efficient implementation of SNN with the emerging metal-oxide resistive random access memory (RRAM) devices, which includes an RRAM crossbar array works as network synapses, an analog design of the spike neuron, and an input encoding scheme. A parameter mapping algorithm is also introduced to configure the RRAM-based SNN. Simulation results illustrate that the system achieves 91.2\% accuracy on the MNIST dataset with an ultra-low power consumption of 3.5mW. Moreover, the RRAM-based SNN system demonstrates great robustness to 20\% process variation with less than 1\% accuracy decrease, and can tolerate 20\% signal fluctuation with about 2\% accuracy loss. These results reveal that the RRAM-based SNN will be quite easy to be physically realized. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.7.3	COMPARATIVE STUDY OF POWER-GATING ARCHITECTURES FOR NONVOLATILE FINFET-SRAM USING SPINTRONICS-BASED RETENTION TECHNOLOGY Speakers: Yusuke Shuto¹, Shuu'ichirou Yamamoto¹ and Satoshi Sugahara² ¹Tokyo Institute of Technology, JP; ²Tokyo Institute of Technology, Abstract Power-gating (PG) architectures employing nonvolatile state/data retention are expected to be a highly efficient energy reduction technique for high-performance CMOS logic systems. Recently, two types of PG architectures using nonvolatile retention have been proposed: One architecture is nonvolatile PG (NVPG) using nonvolatile bistable circuits such as nonvolatile SRAM (NV-SRAM) and nonvolatile flip-flop (NV-FF), in which nonvolatile retention is not utilized during the normal SRAM/FF operation mode and it is used only when there exist energetically meaningful shutdown periods given by break-even time (BET). In contrast, the other architecture employs nonvolatile retention during the normal SRAM/FF operation mode. In this type of architecture, an even short standby period can be replaced by a shutdown period, and thus this architecture is also called normally-off (NOF) rather than PG. In this paper, these two PG architectures for a FinFET-based high-performance NV-SRAM cell employing spintronics-based nonvolatile retention were systematically analyzed using HSPICE with a magnetoresistive-device macromodel. The NVPG architecture shows effective reduction of energy dissipation without performance degradation, whereas the NOF architecture causes severe performance degradation and the energy efficiency of the NOF architecture cannot be superior to that of the NVPG architecture. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00	IP3-15, 483	ANALOG NEUROMORPHIC COMPUTING ENABLED BY MULTI-GATE PROGRAMMABLE RESISTIVE DEVICES Speakers: Vehbi Calayir, Mohamed Darwish, Jeffrey Weldon and Larry Pileggi, Carnegie Mellon University, US Abstract Analog neural networks represent a massively parallel computing paradigm by mimicking the human brain. Two important functions that are not efficiently built by CMOS technology for their practical hardware implementations are weighting for synapse circuits and summing for neuron circuits. In this paper we propose the use of tunable analog resistances, such as multi-gate graphene devices, to efficiently enable these two functions. We design and demonstrate a complete analog neuromorphic circuitry enabled by such devices. Simulation results based on Verilog-A compact models for graphene devices confirm its functionality. We also provide experimental demonstration of our proposed graphene device along with projected circuit performance based on scaling targets. Our proposed design is suitable not only for the device example shown in this paper, but also for any beyond-CMOS technology that exhibits similar device characteristics. Download Paper (PDF; Only available from the DATE venue WiFi)
16:01	IP3-16, 582	AN ENERGY-EFFICIENT NON-VOLATILE IN-MEMORY ACCELERATOR FOR SPARSE-REPRESENTATION BASED FACE RECOGNITION Speakers: Yuhao Wang¹, Hantao Huang¹, Leibin Ni¹, Hao Yu¹, Mei Yan¹, Chuliang Weng², Wei Yang² and Junfeng Zhao² ¹Nanyang Technological University, SG; ²Shannon Laboratory, Huawei Technologies Co., Ltd, CN Abstract Data analytics such as face recognition involves large volume of image data, and hence leads to grand challenge on mobile platform design with strict power requirement. Emerging non-volatile STT-MRAM has the minimum leakage power and comparable speed to SRAM, and hence is considered as a promising candidate for data-oriented mobile computing. However, there exists significantly higher write-energy for STT-MRAM when compared to the SRAM. Based on the use of STT- MRAM, this paper introduces an energy-efficient non-volatile in-memory accelerator for a sparse-representation based face recognition algorithm. We find that by projecting high-dimension image data to much lower dimension, the current scaling for STT-MRAM write operation can be applied aggressively, which leads to significant power reduction yet maintains quality-of-service for face recognition. Specifically, compared to a baseline with SRAM, leakage power and dynamic power are reduced by 91.4% and 79% respectively with only slight compromise on recognition rate. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session Coffee Break in Exhibition Area Coffee Break in Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area. Lunch Break On Tuesday and Wednesday, lunch boxes will be served in front of the session room Salle Oisans and in the exhibition area for fully registered delegates (a voucher will be given upon registration on-site). On Thursday, lunch will be served in Room Les Ecrins (for fully registered conference delegates only). Tuesday, March 10, 2015 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30; Keynote session from 13:20 - 14:20 (Room Oisans) sponsored by Mentor Graphics Coffee Break 16:00 - 17:00 Wednesday, March 11, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30, Keynote lectures from 12:50 - 14:20 (Room Oisans) Coffee Break 16:00 - 17:00 Thursday, March 12, 2015 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00, Keynote lecture from 13:20 - 13:50 Coffee Break 15:30 - 16:00