IP3 Interactive Presentations

Date: Wednesday 11 March 2015
Time: 16:00 - 16:30
Location / Room: Exhibition Area

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session the award 'Best IP of the Day' is given.

Label	Presentation Title Authors
IP3-1	STT MRAM-BASED PUFS Speakers: Elena Ioana Vatajelu¹, Giorgio Di Natale², Marco Indaco¹ and Paolo Prinetto¹ ¹Politecnico di Torino, IT; ²LIRMM, FR Abstract Physical Unclonable Functions (PUFs) are emerging cryptographic primitives used to implement low-cost device authentication and secure secret key generation. Weak PUFs (i.e., devices able to generate a single signature or able to deal with a limited number of challenges) are widely discussed in literature. Nowadays, the most promising solution is based on SRAMs. In this paper we propose an innovative PUF design based on STT-MRAM memory. We exploit the high variability affecting the electrical resistance of the MTJ device in anti-parallel magnetization. We will show that the proposed solution is robust, unclonable and unpredictable. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-2	SPATIAL AND TEMPORAL GRANULARITY LIMITS OF BODY BIASING IN UTBB-FDSOI Speakers: Johannes Maximilian Kühn¹, Dustin Peterson¹, Hideharu Amano², Oliver Bringmann¹ and Wolfgang Rosenstiel¹ ¹Eberhard Karls Universität Tübingen, DE; ²Keio University, JP Abstract Advances in SOI technology such as STMicro's 28nm UTBB-FDSOI enabled a renaissance of body biasing. Body biasing is a fast and efficient technique to change power and performance characteristics. As the electrical task to change the substrate potential is small compared to Dynamic Voltage Scaling, much finer island sizes are conceivable. This however creates new challenges in regard to design partitioning into body bias islands and body bias combinations across such designs. These combinations should be chosen so that energy efficiency improves while maintaining timing constraints. We introduce a combination based analysis tool to find optimized body bias island partitions and body biasing levels. For such partitions, optimized body bias assignments for static, programmable and dynamic body biasing can be computed. The overheads incurred by dynamically switching body biases are estimated to yield actual improvements and to give an upper bound for the power consumption of required additional circuitry. Based on these partitionings and the switching overheads, optimized application specific switching strategies are computed. The effectiveness of this method is demonstrated in a frequency scaling scenario using forward body biasing on a Dynamic Reconfigurable Processor (DRP) design. We show that leakage can be greatly reduced using the proposed methods and that dynamic body biasing can be beneficial even at small time periods. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-3	A HARDWARE IMPLEMENTATION OF A RADIAL BASIS FUNCTION NEURAL NETWORK USING STOCHASTIC LOGIC Speakers: Yuan Ji¹, Feng Ran¹, Cong Ma² and David Lilja² ¹Shanghai University, CN; ²University of Minnesota - Twin Cities, US Abstract Hardware implementations of artificial neural networks typically require significant amounts of hardware resources. This paper proposes a novel radial basis function artificial neural network using stochastic computing elements, which greatly reduces the required hardware. The Gaussian function used for the radial basis function is implemented with a two-dimensional finite state machine. The norm between the input data and the center point is optimized using simple logic gates. Results from two pattern recognition case studies, the standard Iris flower and the MICR font benchmarks, show that the difference of the average mean squared error between the proposed stochastic network and the corresponding traditional deterministic network is only 1.3% when the stochastic stream length is 10kbits. The accuracy of the recognition rate varies depending on the stream length, which gives the designer tremendous flexibility to tradeoff speed, power, and accuracy. From the FPGA implementation results, the hardware resource requirement of the proposed stochastic hidden neuron is only a few percent of the hardware requirement of the corresponding deterministic hidden neuron. The proposed stochastic network can be expanded to larger scale networks for complex tasks with simple hardware architectures. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-4	SODA: SOFTWARE DEFINED FPGA BASED ACCELERATORS FOR BIG DATA Speakers: Chao Wang, Xi Li and Xuehai Zhou, University of Science and Technology of China, CN Abstract FPGA has been an emerging field in novel big data architectures and systems, due to its high efficiency and low power consumption. It enables the researchers to deploy massive accelerators within one single chip. In this paper, we present a software defined FPGA based accelerators for big data, named SODA, which could reconstruct and reorganize the acceleration engines according to the requirement of the various data-intensive applications. SODA decomposes large and complex applications into coarse grained single-purpose RTL code libraries that perform specialized tasks in out-of-order hardware. We built a prototyping system with constrained shortest path Finding (CSPF) case studies to evaluate SODA framework. SODA is able to achieve up to 43.75X speedup at 128 node application. Furthermore, hardware cost of the SODA framework demonstrates that it can achieve high speedup with moderate hardware utilization. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-5	DYNAMIC RECONFIGURABLE PUNCTURING FOR SECURE WIRELESS COMMUNICATION Speakers: Liang Tang¹, Jude Angelo Ambrose², Akash Kumar¹ and Sri Parameswaran² ¹National University of Singapore, SG; ²University of New South Wales, AU Abstract The ubiquity of wireless devices has created security concerns on the information being transferred. It is critical to protect the secret information in every layer of wireless communication to thwart any type of attacks. A dynamic reconfigurable puncturing based security mechanism, named RePunc, is proposed in this paper to provide an extra level of security at the physical layer. RePunc utilizes the puncturing feature of Forward Error Correction (FEC) to insert the secure information in the punctured positions of the standard information encoded data. The punctured patterns are dynamically changed and passed as a secret key from the sender to the receiver. An eavesdropper will not be able to detect the transmission of the secure information since the inserted secure information will be processed as channel noise by the eavesdropper's receiver. However, the rightful receiver will be able to successfully decode the secure packets by knowingly differentiating the secure information and the standard information before the FEC decoding. A case study of RePunc implementation for WiFi communication is presented in this paper, showing the extreme high security complexity with low hardware overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-6	QR-DECOMPOSITION ARCHITECTURE BASED ON TWO-VARIABLE NUMERIC FUNCTION APPROXIMATION Speakers: Jochen Rust, Frank Ludwig and Steffen Paul, University of Bremen, DE Abstract This paper presents a new approach for hardware-based QR-decomposition using an efficient computation scheme of the Givens-Rotation. In detail, the angle of rotation and its application to the Givens-Matrix are processed in a direct, straight-forward manner. High-performance signal processing is achieved by piecewise approximation of the arctangent and sine function. In order to identify appropriate function approximations, several designs with varying constraints are automatically generated and analyzed. Physical and logical synthesis is performed in a 130nm CMOS-technology. The application of our proposal in a multi-antenna mobile communication scenario highlights our work to be very efficient in terms of calculation accuracy and computation performance. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-7	IN-PLACE MEMORY MAPPING APPROACH FOR OPTIMIZED PARALLEL HARDWARE INTERLEAVER ARCHITECTURES Speakers: Saeed Ur Rehman¹, Cyrille Chavet², Philippe Coussy² and Awais Sani¹ ¹Lab-STICC / Université de Bretagne Sud, PK; ²Lab-STICC / Université de Bretagne Sud, FR Abstract Due to their impressive error correction performances, turbo-codes or LDPC (Low Density Parity Check) architectures are now widely used in communication system and are one of the most critical parts of decoders. In order to achieve high throughput requirements these decoders are based on parallel architecture, which results in a major problem to be solved: parallel memory access conflicts. To solve these conflicts, different approaches have been proposed in state of the art resulting in a lot of different architectural solutions. In this article, we introduce a new class of memory mapping approach that can solve the conflicts with an optimized architecture based on in-place memory mapping for any application. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-8	MAXIMIZING COMMON IDLE TIME ON MULTI-CORE PROCESSORS WITH SHARED MEMORY Speakers: Chenchen Fu¹, Yingchao Zhao², Minming Li¹ and Jason Xue³ ¹Department of Computer Science, City University of Hong Kong, HK; ²Department of Computer Science, Caritas Institute of Higher Education, Hong Kong, HK; ³City University of Hong Kong, HK Abstract Reducing energy consumption is a critical problem in most of the computing systems today. This paper focuses on reducing the energy consumption of the shared main memory in multi-core processors by putting it into sleep state when all the cores are idle. Based on this idea, this work presents systematic analysis of different assignment and scheduling models and proposes a series of scheduling schemes to maximize the common idle time of all cores. An optimal scheduling scheme is proposed assuming the number of cores is unbounded. When the number of cores is bounded, an efficient heuristic algorithm is proposed. The experimental results show that the heuristic algorithm works efficiently and can save as much as 25.6% memory energy compared to a conventional multi-core scheduling scheme. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-9	MAXIMIZING IO PERFORMANCE VIA CONFLICT REDUCTION FOR FLASH MEMORY STORAGE SYSTEMS Speakers: Qiao Li¹, Liang Shi², Congming Gao¹, Kaijie Wu¹, Jason Chun Xue³, Qingfeng Zhuge¹ and H.-M. Edwin Sha⁴ ¹Chongqing University, CN; ²College of Computer Science, Chongqing University, CN; ³City University of Hong Kong, HK; ⁴Chongqing University and University of Texas at Dallas, CN Abstract Flash memory has been widely deployed during the recent years with the improvement of bit density and technology scaling. However, a significant performance degradation is also introduced with the development trend. The latency of IO requests on flash memory storage systems is composed of access conflict latency, data transfer latency, flash chip access latency and ECC encoding/decoding latency. Studies show that the access conflict latency, which is mainly induced by the slow transfer latency and access latency, has become the dominate part of the IO latency, especially for IO intensive applications. This paper proposes to reduce the flash access conflict latency through the reduction of the transfer and flash access latencies. A latency model is built to construct the relationship among the transfer latency and access latency based on the reliability characteristics of flash memory. Simulation experiments show that the proposed approach achieves significant performance improvement. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-10	A HYBRID PACKET/CIRCUIT-SWITCHED ROUTER TO ACCELERATE MEMORY ACCESS IN NOC-BASED CHIP MULTIPROCESSORS Speakers: Yassin Mazloumi and Mehdi Modarressi, University of Tehran, IR Abstract Modern chip multiprocessors will feature a large shared last-level cache (LLC) that is decomposed into smaller slices and physically distributed throughout the chip area. These architectures rely on a network-on-chip (NoC) to handle remote cache access and hence, NoCs play a critical role in optimizing memory access latency and power consumption. Circuit-switching is the most power- and performance-efficient switching mechanism in NoCs, but is not advantageous when the packet transmission time is not long enough compared to the circuit setup time. In this paper, we propose a zero-latency circuit setup scheme to make circuit-switching applicable in transferring individual data packets. The design leverages the fact that in CMPs with distributed LLC (where a considerable portion of the on-chip traffic is composed of remote LLC access requests and data responses), every response packet is sent in reply to a request packet and traverses the same path as its corresponding request, but at the backward direction. The short request packets, then, are responsible to reserve a path for their corresponding response packets. This NoC tries to reduce conflict among circuit paths by considering conflicts in backward direction during request packet routing, backed by a run-time technique to resolve conflicts when circuits are actually set up. Experimental results show that the proposed NoC architecture considerably reduces average packet latency that directly translates to faster memory access Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-11	SEMIAUTOMATIC IMPLEMENTATION OF A BIOINSPIRED RELIABLE ANALOG TASK DISTRIBUTION ARCHITECTURE FOR MULTIPLE ANALOG CORES Speakers: Julius von Rosen¹, Markus Meissner¹ and Lars Hedrich² ¹Goethe Universität Frankfurt, DE; ²Goethe-Universitat Frankfurt a. M., DE Abstract In this paper we present a silicon implementation of a bioinspired analog task distribution system for enabling reliable analog multi-core systems. The increase in reliability is achieved by a dependable task distribution architecture using a hormone based mechanism. The specifications are generated by a feasibility analysis of the algebraic description of the architecture. Starting from the specifications, an automated analog synthesis framework is used to fasten the time-consuming design of the needed analog amplifiers. The complete system with the designed amplifiers has been layouted and fabricated. We present measurements of two different architectures of task distribution system on silicon showing the full functionality of the system and the design methodology. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-12	POWER-EFFICIENT ACCELERATOR ALLOCATION IN ADAPTIVE DARK SILICON MANY-CORE SYSTEMS Speakers: Muhammad Usman Karim Khan, Muhammad Shafique and Joerg Henkel, Karlsruhe Institute of Technology (KIT), DE Abstract Modern many-core systems in the dark silicon era face the predicament of underutilized resources of the chip due to power constraints. Therefore, hardware accelerators are becoming popular as they can overcome this problem by exercising a part of the program on dedicated custom logic in an energy efficient way. However, efficient accelerator usage poses numerous challenges, like adaptations for accelerator's sharing schedule on the many-core systems under run-time varying scenarios. In this work, we propose a power-efficient accelerator allocation scheme for adaptive many-core systems that maximally utilizes and dynamically allocates a shared accelerator to competing cores, such that deadlines of the executing applications are met and the total power consumption of the overall system is minimized. The experimental results demonstrate power minimization and high accelerator utilization for a many-core system. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-13	THERMAL-AWARE FLOORPLANNING FOR PARTIALLY-RECONFIGURABLE FPGA-BASED SYSTEMS Speakers: Davide Pagano, Mikel Vuka, Marco Rabozzi, Riccardo Cattaneo, Donatella Sciuto and Marco D. Santambrogio, Politecnico di Milano, IT Abstract Field Programmable Gate Arrays (FPGAs) systems are being more and more frequent in high performance applications. Temperature affects both reliability and performance, therefore its optimization has become challenging for system designers. In this work we present a novel thermal aware floorplanner based on both Simulated Annealing (SA) and Mixed- Integer Linear Programming (MILP). The proposed method takes into account an accurate description of heterogeneous resources and partially reconfigurable constraints of recent FPGAs. Our major contribution is to provide a high level formulation for the problem, without resorting to low level consideration about FPGAs resources. Within our approach we combine the benefits of SA and MILP to handle both linear and non-linear optimization metrics while providing an effective exploration of the solution space. Experimental results show that, for several designs, it is possible to reduce the peak temperature by taking into account power consumption during the floorplanning stage. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-14	FEEDBACK-BUS OSCILLATION RING: A GENERAL ARCHITECTURE FOR DELAY CHARACTERIZATION AND TEST OF INTERCONNECTS Speakers: Shi-Yu Huang¹, Meng-Ting Tsai¹, Kun-Han Tsai² and Wu-Tung Cheng² ¹National Tsing Hua University, TW; ²Mentor, US Abstract In this paper we propose a flexible delay characterization and test method for arbitrary die-to-die interconnects in a 3D IC. As compared to previous works, it is unique in its ability to streamline the characterization/test operations for a set of arbitrary interconnects with multiple pins sprawling multiple dies. During the Design-for-Testability stage, one common feedback-bus (connected to all dies in the IC under characterization/test) is inserted. Through the feedback-bus, a oscillation ring can be formed dynamically and the Variable-Output-Threshold (VOT) technique can be applied to characterize the delay of a selected interconnect segment at a time. Experimental results indicate that this method is not only flexible and scalable, but requiring only a small area overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-15	ANALOG NEUROMORPHIC COMPUTING ENABLED BY MULTI-GATE PROGRAMMABLE RESISTIVE DEVICES Speakers: Vehbi Calayir, Mohamed Darwish, Jeffrey Weldon and Larry Pileggi, Carnegie Mellon University, US Abstract Analog neural networks represent a massively parallel computing paradigm by mimicking the human brain. Two important functions that are not efficiently built by CMOS technology for their practical hardware implementations are weighting for synapse circuits and summing for neuron circuits. In this paper we propose the use of tunable analog resistances, such as multi-gate graphene devices, to efficiently enable these two functions. We design and demonstrate a complete analog neuromorphic circuitry enabled by such devices. Simulation results based on Verilog-A compact models for graphene devices confirm its functionality. We also provide experimental demonstration of our proposed graphene device along with projected circuit performance based on scaling targets. Our proposed design is suitable not only for the device example shown in this paper, but also for any beyond-CMOS technology that exhibits similar device characteristics. Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-16	AN ENERGY-EFFICIENT NON-VOLATILE IN-MEMORY ACCELERATOR FOR SPARSE-REPRESENTATION BASED FACE RECOGNITION Speakers: Yuhao Wang¹, Hantao Huang¹, Leibin Ni¹, Hao Yu¹, Mei Yan¹, Chuliang Weng², Wei Yang² and Junfeng Zhao² ¹Nanyang Technological University, SG; ²Shannon Laboratory, Huawei Technologies Co., Ltd, CN Abstract Data analytics such as face recognition involves large volume of image data, and hence leads to grand challenge on mobile platform design with strict power requirement. Emerging non-volatile STT-MRAM has the minimum leakage power and comparable speed to SRAM, and hence is considered as a promising candidate for data-oriented mobile computing. However, there exists significantly higher write-energy for STT-MRAM when compared to the SRAM. Based on the use of STT- MRAM, this paper introduces an energy-efficient non-volatile in-memory accelerator for a sparse-representation based face recognition algorithm. We find that by projecting high-dimension image data to much lower dimension, the current scaling for STT-MRAM write operation can be applied aggressively, which leads to significant power reduction yet maintains quality-of-service for face recognition. Specifically, compared to a baseline with SRAM, leakage power and dynamic power are reduced by 91.4% and 79% respectively with only slight compromise on recognition rate. Download Paper (PDF; Only available from the DATE venue WiFi)

< Return to last page

Visit us at DATE 2015

Booth: 2+3

Booth: 11

Booth: 1

Booth: 5

Booth: 4

Booth: 9

Booth: 8

Booth: 10

Booth: 14

Submissions

IP3 Interactive Presentations

Visit us at DATE 2015