IP5 Interactive Presentations

Printer-friendly version PDF version

Date: Thursday 12 March 2015
Time: 15:30 - 16:00
Location / Room: Exhibition Area

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the morning. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session the award 'Best IP of the Day' is given.

LabelPresentation Title
Authors
IP5-1TOWARDS SYSTEMATIC DESIGN OF 3D PNML LAYOUTS
Speakers:
Robert Perricone1, Yining Zhu2, Katherine Sanders1, X. Sharon Hu1 and Michael Niemier1
1University of Notre Dame, US; 2Zhejiang University, CN
Abstract
Nanomagnetic logic (NML) is a ``beyond-CMOS'' technology that uses bistable magnets to store, process, and move binary information. Compared to CMOS, NML has several advantages such as non-volatility, lower power consumption, and radiation hardness. Recently, NML devices with perpendicular magnetic anisotropy (pNML) have been experimentally demonstrated to perform logic operations in three dimensions. 3D pNML layouts provide additional benefits such as simplified signal routing and greater integration density. However, designing functional 3D pNML circuits can be challenging as one must consider the effects of fringing magnetic fields in three dimensions. Furthermore, the current process of designing 3D pNML layouts is little more than a trial-and-error-based approach, which is infeasible for larger, more complex designs. In this paper, we propose a systematic approach to designing 3D pNML layouts. Our design process leverages a machine learning-inspired prediction approach that examines the effects of varying individual device parameters (e.g., length, width, etc.) and predicts functional configurations.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-2DESTINY: A TOOL FOR MODELING EMERGING 3D NVM AND EDRAM CACHES
Speakers:
Matt Poremba1, Sparsh Mittal2, Dong Li2, Jeffrey Vetter3 and Yuan Xie4
1Pennsylvania State University, US; 2Oak Ridge National Lab, US; 3Oak Ridge National Lab and Georgia Institute of Technology, US; 4University of California, Santa Barbara, US
Abstract
The continuous drive for performance has pushed the researchers to explore novel memory technologies (e.g. non-volatile memory) and novel fabrication approaches (e.g. 3D stacking) in the design of caches. However, a comprehensive tool which models both conventional and emerging memory technologies for both 2D and 3D designs has been lacking. We present DESTINY, a microarchitecture-level tool for modeling 3D (and 2D) cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM) and phase change RAM (PCM). DESTINY facilitates design-space exploration across several dimensions, such as optimizing for a target (e.g. latency or area) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e. 2D v/s 3D) for a desired optimization target etc. DESTINY has been validated against industrial cache prototypes. We believe that DESTINY will drive architecture and system-level studies and will be useful for researchers and designers.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-3BIG-DATA STREAMING APPLICATIONS SCHEDULING WITH ONLINE LEARNING AND CONCEPT DRIFT DETECTION
Speakers:
Karim Kanoun1 and Mihaela van der Schaar2
1École Polytechnique Fédérale de Lausanne (EPFL), CH; 2University of California, Los Angeles, US
Abstract
Several techniques have been proposed to adapt Big-Data streaming applications to resource constraints. These techniques are mostly implemented at the application layer and make simplistic assumptions about the system resources and they are often agnostic to the system capabilities. Moreover, they often assume that the data streams characteristics and their processing needs are stationary, which is not true in practice. In fact, data streams are highly dynamic and may also experience concept drift, thereby requiring continuous online adaptation of the throughput and quality to each processing task. Hence, existing solutions for Big-Data streaming applications are often too conservative or too aggressive. To address these limitations, we propose an online energy-efficient scheduler which maximizes the QoS (i.e., throughput and output quality) of Big-Data streaming applications under energy and resources constraints. Our scheduler uses online adaptive reinforcement learning techniques and requires no offline information. Moreover, our scheduler is able to detect concept drifts and to smoothly adapt the scheduling strategy. Our experiments realized on a chain of tasks modeling real-life streaming application demonstrate that our scheduler is able to learn the scheduling policy and to adapt it such that it maximizes the targeted QoS given energy constraint as the Big-Data characteristics are dynamically changing.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-4(Best Paper Award Candidate)
DESIGN FLOW AND RUN-TIME MANAGEMENT FOR COMPRESSED FPGA CONFIGURATIONS
Speakers:
Christophe Huriaux1, Antoine Courtay1 and Olivier Sentieys2
1University of Rennes 1 - IRISA, FR; 2INRIA, FR
Abstract
The aim of partially and dynamically reconfigurable hardware is to provide an increased flexibility through the load of multiple applications on the same reconfigurable fabric at the same time. However, a configuration bit-stream loaded at runtime should be created offline for each task of the application. Moreover, modern applications use a lot of specialized hardware blocks to perform complex operations, which tends to cancel the "single bit-stream for a single application" paradigm, as the logic content for different locations of the reconfigurable fabric may be different. In this paper we propose a design flow for generating compressed configuration bit-streams abstracted from their final position on the logic fabric. Those configurations will then be decoded and finalized in real-time and at run-time by a dedicated reconfiguration controller to be placed at a given physical location. Our experiments show that densely routed applications gain the most with a compression factor of more than 2× using the finest cluster size, but coarser coding can be implemented to achieve a compression factor up to 10×.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-5EMPIRICAL MODELLING OF FDSOI CMOS INVERTER FOR SIGNAL/POWER INTEGRITY SIMULATION
Speakers:
Wael Dghais and Jonathan Rodriguez, Instituto de Telecomunicações, PT
Abstract
This paper presents a multiport empirical model based on artificial neural network for I/O memory interface (e.g. inverter) designed based on fully depleted silicon on isolator (FDSOI) CMOS 28 nm process for signal and power integrity assessments. The analog mixed-signal identification signals that carry the information about the I/O interface's nonlinear dynamic behavior are recorded from large signal simulation setup. The model's functions are extracted based on a nonlinear optimization algorithm and then implemented in Simulink software. The performance of the resulted model is validated in typical power and ground switching noise scenario. The developed empirical model accurately predicts the timing signal waveforms at the power, ground, and at the output port.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-6ON-CHIP MEASUREMENT OF BANDGAP REFERENCE VOLTAGE USING A SMALL FORM FACTOR VCO BASED ZOOM-IN ADC
Speakers:
Osman Erol1, Sule Ozev1, Chandra K. H. Suresh2, Rubin Parekhji3 and Lakshmanan Balasubramanian3
1ASU, US; 2NYU-Abu Dhabi, AE; 3TI, IN
Abstract
A robust and highly scalable technique for measuring the output voltage of a band-gap reference (BGR) circuit is described. The proposed technique is based on an ADC architecture that uses a voltage controlled oscillator (VCO) for voltage to frequency conversion. During production testing, an external voltage reference is used to approximate the voltage/frequency characteristics of the VCO with 5ms test time. The proposed zoom-in ADC approach is manufactured with 0.5um single well CMOS process. Measurement results indicate that 13 bits of resolution within the measurement range can be achieved with the zoom-in approach. Worst-case INL for the ADC is less than 0.25LSB (50V).

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-7LOGICAL EQUIVALENCE CHECKING OF ASYNCHRONOUS CIRCUITS USING COMMERCIAL TOOLS
Speakers:
Arash Saifhashemi1, Hsin-Ho Huang2, Priyanka Bhalerao3 and Peter Beerel2
1Intel, US; 2University of Southern California, US; 3yahoo, US
Abstract
We propose a method for logical equivalence check (LEC) of asynchronous circuits using commercial synchronous tools. In particular, we verify the equivalence of asynchronous circuits which are modeled at the CSP-level in SystemVerilog as well as circuits modeled at the micro-architectural level using conditional communication library primitives. Our approach is based on a novel three-valued logic model that abstracts the detailed handshaking protocol and is thus agnostic to different gate-level implementations, making it applicable to a variety of different design styles. Our experimental results with commercial LEC tools on a variety of computational blocks and an asynchronous microprocessor demonstrate the applicability and limitations of the proposed approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-8MAY-HAPPEN-IN-PARALLEL ANALYSIS OF ELECTRONIC SYSTEM LEVEL MODELS USING UPPAAL MODEL CHECKING
Speakers:
Che-Wei Chang and Rainer Doemer, University of California Irvine, US
Abstract
In this paper, we propose an approach for May- Happen-in-Parallel (MHP) analysis of electronic system level (ESL) design which models parallel discrete event simulation with concurrent automaton processes and formally identify those MHP states. Our MHP analysis utilizes formal verification by use of the UPPAAL model checker. The proposed approach converts the system model in SpecC SLDL into an UPPAAL model and generates a set of queries that automatically and completely finds all possible MHP pairs. The experimental results show our approach can report more precise MHP analysis results compared to other works at the cost of extended analysis run time.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-9VERIFYING SYNCHRONOUS REACTIVE SYSTEMS USING LAZY ABSTRACTION
Speakers:
Kumar Madhukar1, Mandayam Srivas2, Bjorn Wachter3, Daniel Kroening3 and Ravindra Metta1
1Tata Research Development and Design Center, IN; 2Chennai Mathematical Institute, IN; 3University of Oxford, GB
Abstract
Embedded software systems are frequently modeled as a set of synchronous reactive processes. The transitions performed by the processes are given as sequential, atomic code blocks. Most existing verifiers flatten such programs into a global transition system, to be able to apply off-the-shelf verification methods. However, this monolithic approach fails to exploit the lock-step execution of the processes, severely limiting scalability. We present a novel formal verification technique that analyses synchronous concurrency explicitly rather than encoding it. We present a variant of Lazy Abstraction with Interpolants (LAWI), a technique successfully used in software verification, and tailor it to synchronous reactive concurrency. We exploit the synchronous communication structure by fixing an execution schedule, circumventing the exponential blow-up of state space caused by simulating synchronous behaviour by means of interleavings. The technique is implemented in SYMPARA, a verification tool for synchronous reactive systems. To evaluate the effectiveness of our technique, we compare SYMPARA with Bounded Model Checking and k-induction, and a LAWI-based verifier for multi-threaded (asynchronous) software. On several realistic examples SYMPARA outperforms the other tools by an order of magnitude.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-10SPINTASTIC: SPIN-BASED STOCHASTIC LOGIC FOR ENERGY-EFFICIENT COMPUTING
Speakers:
Rangharajan Venkatesan1, Swagath Venkataramani2, Xuanyao Fong2, Kaushik Roy2 and Anand Raghunathan2
1NVIDIA Corporation, US; 2Purdue University, US
Abstract
Spintronics is one of the leading technologies under consideration for the post-CMOS era. While spintronic memories have demonstrated great promise due to their density, non-volatility and low leakage, efforts to realize spintronic logic have been much less fruitful. Recent studies project the performance and energy efficiency of spintronic logic to be considerably inferior to CMOS. In this work, we explore Stochastic Computing (SC) as a new direction for the realization of energy-efficient logic using spintronic devices. We establish the synergy between stochastic computing and spintronics by demonstrating that (i) the peripheral circuits required for SC to convert to/from stochastic domains, which incur significant energy overheads in CMOS, can be efficiently realized by exploiting the characteristics of spintronic devices, and (ii) the low logic complexity and fine-grained parallelism in SC circuits can be leveraged to alleviate the shortcomings of spintronic logic. We propose SPINTASTIC, a new design approach in which all the components of stochastic circuits — stochastic number generators, stochastic arithmetic units, and stochastic-to-binary converters — are realized using spintronic devices. Our experiments on a range of benchmarks from different application domains demonstrate that SPINTASTIC achieves 2.8X improvement in energy over CMOS stochastic implementations and 1.9X over a CMOS binary baseline.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-11LEAKAGE POWER REDUCTION FOR DEEPLY-SCALED FINFET CIRCUITS OPERATING IN MULTIPLE VOLTAGE REGIMES USING FINE-GRAINED GATE-LENGTH BIASING TECHNIQUE
Speakers:
Ji Li, Qing Xie, Yanzhi Wang, Shahin Nazarian and Massoud Pedram, University of Southern California, US
Abstract
With the aggressive downscaling of the process technologies and importance of battery-powered systems, reducing leakage power consumption has become one of the most crucial design challenges for IC designers. This paper presents a device-circuit cross-layer framework to utilize fine-grained gate-length biased FinFETs for circuit leakage power reduction in the near- and super-threshold operation regimes. The impacts of Gate-Length Biasing (GLB) on circuit speed and leakage power are first studied using one of the most advanced technology nodes - a 7nm FinFET technology. Then multiple standard cell libraries using different leakage reduction techniques, such as GLB and Dual-VT, are built in multiple operating regimes at this technology node. It is demonstrated that, compared to Dual-VT, GLB is a more suitable technique for the advanced 7nm FinFET technology due to its capability of delivering a finer-grained trade-off between the leakage power and circuit speed, not to mention the lower manufacturing cost. The circuit synthesis results of a variety of ISCAS benchmark circuits using the presented GLB 7nm FinFET cell libraries show up to 70% leakage improvement with zero degradation in circuit speed in the near- and super-threshold regimes, respectively, compared to the standard 7nm FinFET cell library.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-12SUBHUNTER: A HIGH-PERFORMANCE AND SCALABLE SUB-CIRCUIT RECOGNITION METHOD WITH PRüFER-ENCODING
Speakers:
Hong-Yan Su, Chih-Hao Hsu and Yih-Lang Li, National Chiao Tung University, TW
Abstract
Sub-circuit recognition (SR) is a problem of recognizing sub-circuits within a given circuit and is a fundamental component in simulation, verification and testing of computer-aided design. The SR problem can be formulated as subgraph isomorphism problem. Performance of previous works is not scalable as the complexities of modern designs increase. In this paper we propose a novel Prüfer-encoding based SR algorithm that performs scalable and high-performance sub-circuit matching. Several techniques including tree structure partition, tree cutting and circuit graph encoding are proposed herein to decompose the SR problem into several small sub-sequence matching problems. A pre-filtering strategy is applied before matching to remove the sub-circuits that are not likely to be matched. A fast branch and bound approach is developed to identify all the sub-circuits within the given circuit. Experimental results show that SubHunter can achieve better performance than SubGemini and detect all the sub-circuits as well. As the circuit size increases, we can also achieve near linear runtime growth that outperforms the exponential growth for SubGemini, showing the scalability of the proposed algorithm.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-13TIMING VERIFICATION FOR ADAPTIVE INTEGRATED CIRCUITS
Speakers:
Rohit Kumar1, Bing Li2, Yiren Shen1, Ulf Schlichtmann2 and Jiang Hu1
1Texas A&M University, College Station, US; 2Technische Universität München, DE
Abstract
An adaptive circuit can perform built-in self-detection of timing variations and accordingly adjust itself to avoid timing violations. Compared with conventional over-design approach, adaptive circuit design is conceptually advantageous in terms of power-efficiency. Although the advantage has been witnessed in numerous previous works including test chips, adaptive design is far from being widely used in practice. A key reason is the lack of corresponding timing veri fication support. We develop new timing analysis techniques to fi ll this void. A main challenge is the large runtime complexity due to numerous adaptivity configurations. We propose several pruning and reduction techniques and apply them in conjunction with statistical static timing analysis (SSTA). The proposed method is validated on benchmark circuits including the recent ISPD'13 suite, which has circuit as large as 150K gates. The results show that our method can achieve orders of magnitude speedup over Monte Carlo simulation with about the same accuracy. It is also several times faster than an exhaustive application of SSTA.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-14A ROBUST APPROACH FOR PROCESS VARIATION AWARE MASK OPTIMIZATION
Speakers:
Jian Kuang, Wing-Kai Chow and Evangeline Young, The Chinese University of Hong Kong, HK
Abstract
As the minimum feature size continues to shrink, whereas the wavelength of light used for lithography remains constant, Resolution Enhancement Techniques are widely used to optimize mask, so as to improve the subwavelength printability. Besides correcting for error between the printed image and target shape, a mask optimization method also needs to consider process variation. In this paper, a robust mask optimization approach is proposed to optimize the process window as well as the Edge Placement Error (EPE) of the printed image. Experiments results on the public benchmarks are encouraging.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-15FASTTREE: A HARDWARE KD-TREE CONSTRUCTION ACCELERATION ENGINE FOR REAL-TIME RAY TRACING
Speakers:
Xingyu Liu, Yangdong Deng, Yufei Ni and Zonghui Li, Institute of Microelectronics, Tsinghua University, CN
Abstract
The ray tracing algorithm is well-known for its ability to generate photo-realistic rendering effects. Recent years have witnessed a renewed momentum in pushing it to real-time for better user experience. Today the construction of acceleration structures, e.g., kd-tree, has become the bottleneck of ray tracing. A dedicated hardware architecture, FastTree, was proposed for kd-tree construction by adopting a fully parallel construction algorithm. FastTree was validated by an FPGA prototype and evaluated as an ASIC implementation. Experiment result shows FastTree outperforms existing hardware construction engines by a factor of nearly 4X at a similar area and power budget.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-16REVERSE LONGSTAFF-SCHWARTZ AMERICAN OPTION PRICING ON HYBRID CPU/FPGA SYSTEMS
Speakers:
Christian Brugger, Javier Alejandro Varela, Norbert Wehn, Songyin Tang and Ralf Korn, University of Kaiserslautern, DE
Abstract
In today's markets, high-speed and energy-efficient computations are mandatory in the financial and insurance industry. At the same time, the gradual convergence of high-performance computing with embedded systems is having a huge impact on the design methodologies, where dedicated accelerators are implemented to increase performance and energy efficiency. This paper follows this trend and presents a novel way to price high-dimensional American options using techniques of the embedded community. The proposed architecture targets heterogeneous CPU/FPGA systems, and it exploits the FPGA reconfiguration to deliver high-throughput. With a bit-true algorithmic transformation based on recomputation, it is possible to eliminate the memory bottleneck and access costs. The result is a pricing system that is 16x faster and 268x more energy-efficient than an optimized Intel CPU implementation.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-17ACCURATE ELECTROTHERMAL MODELING OF THERMOELECTRIC GENERATORS
Speakers:
Mohammad Javad Dousti1, Antonio Petraglia2 and Massoud Pedram1
1University of Southern California, US; 2Federal University of Rio de Janeiro, BR
Abstract
Thermoelectric generators (TEGs) provide a unique way for harvesting thermal energy. These devices are compact, durable, inexpensive, and scalable. Unfortunately, the conversion efficiency of TEGs is low. This requires careful design of energy harvesting systems including the interface circuitry between the TEG module and the load, with the purpose of minimizing power losses. In this paper, it is analytically shown that the traditional approach for estimating the internal resistance of TEGs may result in a significant loss of harvested power. This drawback comes from ignoring the dependence of the electrical behavior of TEGs on their thermal behavior. Accordingly, a systematic method for accurately determining the TEG input resistance is presented. Next, through a case study on automotive TEGs, it is shown that compared to prior art, more than 11% of power losses in the interface circuitry that lies between the TEG and the electrical load can be saved by the proposed modeling technique. In addition, it is demonstrated that the traditional approach would have resulted in a deviation from the target regulated voltage by as much as 59%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-18EFFICIENCY-DRIVEN DESIGN TIME OPTIMIZATION OF A HYBRID ENERGY STORAGE SYSTEM WITH NETWORKED CHARGE TRANSFER INTERCONNECT
Speakers:
Qing Xie1, Younghyun Kim2, Donkyu Baek3, Yanzhi Wang1, Massoud Pedram1 and Naehyuck Chang4
1University of Southern California, US; 2Purdue University, US; 3Korea Advanced Institute of Science and Technology, KR; 4Seoul National University, KR
Abstract
This paper targets at the state-of-art hybrid energy storage systems (HESSs) with a networked charge transfer interconnect and solves a node placement problem in the HESS, where a node refers to a storage bank, a power source, or a load device, with its distributed power converter. In particular, the node placement problem is formulated as how to place the nodes in a HESS such that the optimal total charge transfer efficiency is achieved, with accurate modelings of all kinds of different components in the HESS. The methodology of FPGA placement problem is adopted to solve the node placement in HESS by properly defining a cost function that strongly relates the charge transfer efficiency to the node placement, properties of HESS components, as well as applications of the HESS. An algorithm that combines a quadratic programming method to generate an initial placement and a simulated annealing method to converge to the optimal placement result is presented in this paper. Experimental results demonstrate the efficacy of the placement algorithm and improvements in the charge transfer efficiency for various problem setups and scales.

Download Paper (PDF; Only available from the DATE venue WiFi)