Date: Wednesday 21 March 2018
Time: 10:00 - 10:30
Location / Room: Conference Level, Foyer
Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.
Label | Presentation Title, Speaker, Authors and Abstract |
---|---|
IP2-1 | IN-GROWTH TEST FOR MONOLITHIC 3D INTEGRATED SRAM Speaker: Yixun Zhang, Shanghai Jiao Tong University, CN Authors: Pu Pang¹, Yixun Zhang¹, Tianjian Li¹, Sung Kyu Lim², Quan Chen¹, Xiaoyao Liang¹ and Li Jiang¹ ¹Shanghai Jiao Tong University, CN; ²Georgia Tech, US Abstract Monolithic three-dimensional integration (M3I) directly fabricates tiers of integrated circuits on top of each other and provides millions of vertical interconnections through interlayer vias (ILVs). It thus offers higher integration density and communication capability than three-dimensional stacked integration (3D-SI). However, the Known-Good-Die problem that haunts 3D-SI, where a single faulty tier causes the failure of the entire stack, also occurs in M3I. Lacking efficient test methodologies comparable to the pre-bond testing of 3D-SI, M3I may suffer a more significant yield drop, and its cost may therefore be unacceptable for mainstream adoption. This paper introduces a novel In-growth test method for M3I SRAM. We propose a novel Design-for-Test (DfT) methodology that enables the proposed In-growth test on cell-level partitioned, incomplete SRAM cells. We also build a statistical cost model and derive a prospective judgement that determines whether or not to stop fabrication, so as to avoid the cost of fabricating more tiers on top of irreparable ones. We find that a "sweet point" exists in this judgement that minimizes the overall cost. Experimental results show the effectiveness of the proposed test methodology. |
IP2-2 | A CO-DESIGN METHODOLOGY FOR SCALABLE QUANTUM PROCESSORS AND THEIR CLASSICAL ELECTRONIC INTERFACE Speaker: Jeroen van Dijk, Delft University of Technology, NL Authors: Jeroen van Dijk¹, Andrei Vladimirescu², Masoud Babaie¹, Edoardo Charbon¹ and Fabio Sebastiano¹ ¹Delft University of Technology, NL; ²University of California, Berkeley, US Abstract A quantum computer fundamentally comprises a quantum processor and a classical controller. The classical electronic controller is used to correct and manipulate the qubits, the core components of a quantum processor. Scaling quantum computers to the millions of qubits required by practical applications calls for the simultaneous optimization of both the classical electronic and the quantum systems. In this paper, a co-design methodology is proposed for obtaining optimized qubit performance while considering practical trade-offs in the control circuits, such as power consumption, complexity, and cost. The SPINE (SPIN Emulator) toolset is introduced for the co-design and co-optimization of electronic/quantum systems. It comprises a circuit simulator enhanced with a Verilog-A model that emulates the quantum behavior of single-electron spin qubits. Design examples show the effectiveness of the proposed methodology in the optimization, design and verification of a whole electronic/quantum system. |
IP2-3 | APPROXIMATE QUATERNARY ADDITION WITH THE FAST CARRY CHAINS OF FPGAS Speaker: Philip Brisk, University of California, Riverside, US Authors: Sina Boroumand¹, Hadi P. Afshar² and Philip Brisk³ ¹University of Tehran, IR; ²Qualcomm Research, US; ³University of California, Riverside, US Abstract A heuristic is presented to efficiently synthesize approximate adder trees on Altera and Xilinx FPGAs using their carry chains. The mapper constructs approximate adder trees using an approximate quaternary adder as the fundamental building block. The approximate adder trees are smaller than exact adder trees, allowing more operators to fit into a fixed-area device and thus trading off arithmetic accuracy for higher throughput. |
IP2-4 | NN COMPACTOR: MINIMIZING MEMORY AND LOGIC RESOURCES FOR SMALL NEURAL NETWORKS Speaker: Seongmin Hong, Hongik University, KR Authors: Seongmin Hong¹, Inho Lee¹ and Yongjun Park² ¹Hongik University, KR; ²Hanyang University, KR Abstract Specialized neural accelerators are an appealing hardware platform for machine learning systems because they provide both high performance and energy efficiency. Although various neural accelerators have recently been introduced, they are difficult to adapt to embedded platforms because they require high memory capacity and bandwidth for the fast preparation of synaptic weights. Embedded platforms often cannot meet these memory requirements because of their limited resources. In FPGA-based IoT (Internet of Things) systems, the problem becomes even worse, since computation units generated from logic blocks cannot be fully utilized due to the small size of block memory. To overcome this problem, we propose a novel dual-track quantization technique that reduces synaptic weight width based on the magnitude of each value while minimizing accuracy loss. In this value-adaptive technique, large and small weights are quantized differently. In this paper, we present a fully automatic framework called NN Compactor that generates a compact neural accelerator by minimizing the memory requirements of synaptic weights through dual-track quantization and minimizing the logic requirements of the processing units (PUs), with minimal loss of recognition accuracy. For three widely used datasets, MNIST, CNAE-9, and Forest, experimental results demonstrate that our compact neural accelerator achieves an average performance improvement of 6.4x over a baseline embedded system while using minimal resources and incurring minimal accuracy loss. |
IP2-5 | IMPROVING FAST CHARGING EFFICIENCY OF RECONFIGURABLE BATTERY PACKS Speaker: Alexander Lamprecht, TUM CREATE, SG Authors: Alexander Lamprecht¹, Swaminathan Narayanaswamy¹ and Sebastian Steinhorst² ¹TUM CREATE, SG; ²Technical University of Munich, DE Abstract Reconfigurable battery packs, which can dynamically modify the electrical connection topology of their individual cells, have recently been gaining importance. While several circuit architectures and management algorithms have been proposed in the literature, the electrical characteristics of these reconfiguration circuit architectures have not yet been sufficiently studied. In this paper, we derive a detailed analytical model for a state-of-the-art reconfiguration architecture that captures the losses introduced by the parasitic resistances of the circuit components. For the first time, we propose a fast charging strategy using the reconfiguration architecture that significantly reduces power losses in comparison to conventional battery packs. Moreover, using the analytical model, we highlight the challenges faced by existing reconfiguration architectures built with state-of-the-art components, and we derive the switch specifications that are essential for improving the energy efficiency of such reconfigurable battery packs. |
IP2-6 | CLOUD-ASSISTED CONTROL OF GROUND VEHICLES USING ADAPTIVE COMPUTATION OFFLOADING TECHNIQUES Speaker: Soheil Samii, General Motors R&D, Warren, MI 48090, US Authors: Arun Adiththan¹, Ramesh S² and Soheil Samii² ¹City University of New York, US; ²General Motors R&D, US Abstract Existing approaches to designing efficient safety-critical control applications are constrained by limited in-vehicle sensing and computational capabilities. In the context of automated driving, we argue that there is a need to leverage resources "out-of-the-vehicle" to meet the sensing and heavy processing requirements of sophisticated algorithms (e.g., deep neural networks). Meeting this need requires a computation offloading technique that satisfies the vehicle's safety and stability requirements even in the presence of an unreliable communication network. In this work, we propose an adaptive technique for offloading control computations to the cloud. The proposed approach considers both current network conditions and control application requirements to determine the feasibility of leveraging remote computation and storage resources. As a case study, we describe a cloud-based path-following controller application that leverages crowdsensed data for path planning. |
IP2-7 | FUSIONCACHE: USING LLC TAGS FOR DRAM CACHE Speaker: Evangelos Vasilakis, Chalmers University of Technology, SE Authors: Evangelos Vasilakis¹, Vassilis Papaefstathiou², Pedro Trancoso¹ and Ioannis Sourdis¹ ¹Chalmers University of Technology, SE; ²FORTH-ICS, GR Abstract DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D-stacked DRAM. Although they can capture the spatial and temporal data locality of applications, their access latency is still substantially higher than that of conventional on-chip SRAM caches. Moreover, their tag access latency and storage overheads are excessive. Storing the tags of a large DRAM cache in SRAM is impractical, as they would occupy a significant fraction of the processor chip. Storing them in the DRAM itself incurs high access overheads, and caching the DRAM tags on the processor adds a constant delay to the access time. In this paper, we introduce FusionCache, a DRAM cache that offers more efficient tag accesses by fusing DRAM cache tags with the tags of the on-chip Last Level Cache (LLC). We observe that, in an inclusive cache model where DRAM cachelines are multiples of on-chip SRAM cachelines, LLC tags can be repurposed to access a large part of the DRAM cache contents. As a result, accessing DRAM cache tags incurs zero additional latency in the common case. |
IP2-8 | IMPROVED SYNTHESIS OF CLIFFORD+T QUANTUM FUNCTIONALITY Speaker: Philipp Niemann, German Research Center for Artificial Intelligence (DFKI GmbH), DE Authors: Philipp Niemann¹, Robert Wille² and Rolf Drechsler³ ¹Cyber-Physical Systems, DFKI GmbH, DE; ²Johannes Kepler University Linz, AT; ³University of Bremen/DFKI GmbH, DE Abstract The Clifford+T library provides robust and fault-tolerant realizations for quantum computations. Consequently, (logic) synthesis of Clifford+T quantum circuits has become an important research problem. However, previously proposed solutions are either applicable only to very small quantum systems or lead to circuits that are far from optimal, mainly because they consider the underlying transformation matrix to be synthesized only locally, i.e., column by column. In this paper, we suggest an improved approach that considers the matrix globally and thereby overcomes many of these drawbacks. Preliminary evaluations show the promise of this direction. |
IP2-9 | ENERGY-EFFICIENT CHANNEL ALIGNMENT OF DWDM SILICON PHOTONIC TRANSCEIVERS Speaker: Yuyang Wang, University of California, Santa Barbara, US Authors: Yuyang Wang¹, M. Ashkan Seyedi², Rui Wu¹, Jared Hulme², Marco Fiorentino², Raymond G. Beausoleil² and Kwang-Ting Cheng³ ¹University of California, Santa Barbara, US; ²Hewlett Packard Labs, US; ³Hong Kong University of Science and Technology, HK Abstract Comb laser-driven, microring-based dense wavelength division multiplexing (DWDM) silicon photonics is a promising candidate for next-generation optical interconnects. However, existing solutions for exploring the power-performance trade-off of such systems have been restricted to a limited design space, owing to two unnecessary constraints: using identical spacing for laser comb lines and microring channels, and using consecutive laser comb lines for data transmission. We propose an energy-efficient channel alignment scheme that aligns the microring channels to a subset of laser comb lines that are non-uniformly distributed in the free spectral range of the microrings. Based on a well-established process variation model, our simulations show that the proposed scheme significantly reduces microring tuning power in the presence of denser comb lines. The power saved on microring tuning can improve overall system energy efficiency despite some power being wasted in unused laser comb lines. We further conduct a design space exploration case study using the proposed channel alignment scheme, seeking the most energy-efficient configuration that achieves a target aggregate data rate. |
IP2-10 | A PHYSICAL SYNTHESIS FLOW FOR EARLY TECHNOLOGY EVALUATION OF SILICON NANOWIRE BASED RECONFIGURABLE FETS Speaker: Shubham Rai, Chair for Processor Design, CFAED, Technische Universität Dresden, Dresden, DE Authors: Shubham Rai¹, Ansh Rupani², Dennis Walter¹, Michael Raitza¹, André Heinzig³, Christian Mayr¹, Walter Weber⁴ and Akash Kumar¹ ¹Technische Universität Dresden, DE; ²Birla Institute of Technology and Science Pilani, Hyderabad Campus, IN; ³NaMLab GmbH, DE; ⁴NaMLab gGmbH and CfAED, DE Abstract Silicon nanowire (SiNW) based reconfigurable transistors (RFETs) provide an additional gate terminal, called the program gate, which allows the same device to be programmed for p-type or n-type functionality at runtime. This enables circuit designers to pack more functionality into each computational unit. It also saves processing costs, as only one device type is required: no doping or associated lithography steps are needed for this technology. In this paper, we present a complete design flow, including both logic and physical synthesis, for circuits based on SiNW RFETs. We propose layouts of logic gates, as well as Liberty and LEF (Library Exchange Format) files for the physical synthesis flow, and make them available under an open-source license to enable further research on these novel, functionally enhanced transistors. We develop a table model based on a transistor cell with relaxed dimensions, following an SOI-based 22 nm technology with a gate pitch of 110 nm, and model our logic gates on dual-gate RFETs. For comparison, we use the same tool flow for CMOS. In this first-of-its-kind comparison for fully symmetrical reconfigurable transistors, we show that the area after placement and routing of SiNW-based circuits is 17% larger than that of CMOS for the MCNC benchmarks. Further, we discuss areas of improvement, from a fabrication and technology point of view, for obtaining better area results from silicon nanowire based RFETs. The future use of self-aligned techniques to structure two independent gates within a smaller pitch holds the promise of substantial area reduction. |
IP2-11 | ETISS-ML: A MULTI-LEVEL INSTRUCTION SET SIMULATOR WITH RTL-LEVEL FAULT INJECTION SUPPORT FOR THE EVALUATION OF CROSS-LAYER RESILIENCY TECHNIQUES Speaker: Martin Dittrich, Technical University of Munich, DE Authors: Daniel Mueller-Gritschneder¹, Martin Dittrich¹, Josef Weinzierl¹, Eric Cheng², Subhasish Mitra² and Ulf Schlichtmann¹ ¹Technical University of Munich, DE; ²Stanford University, US Abstract ETISS is an instruction set simulator (ISS) for Virtual Prototypes (VPs) modeled with SystemC/TLM. In this paper, we propose the extension ETISS-ML, which enables multi-level simulation that switches between the ISS level and the register transfer level (RTL) to accurately evaluate the impact of soft errors in the pipeline of a RISC processor. ETISS-ML achieves close-to-RTL-accurate fault injection results at close-to-ISS simulation performance, with a speedup of up to 100x compared to RTL simulation. To this end, we propose an approach that dynamically determines the length of the RTL simulation period. The high simulation performance of ETISS-ML enables an ultra-efficient and accurate evaluation of cross-layer resiliency techniques for embedded applications, which requires running a large number of fault injections over long simulation scenarios. This is demonstrated in a case study of a Microcontroller Unit (MCU) executing a control algorithm for adaptive cruise control. |
IP2-12 | PRECISE EVALUATION OF THE FAULT SENSITIVITY OF OOO SUPERSCALAR PROCESSORS Speaker: Antonio Carlos Schneider Beck, Federal University of Rio Grande do Sul, BR Authors: Rafael Tonetto¹, Gabriel Luca Nazar² and Antonio Carlos Schneider Beck² ¹Federal University of Rio Grande do Sul, BR; ²Universidade Federal do Rio Grande do Sul, BR Abstract Since superscalar processors lead the market, evaluating their resiliency by means of fault injection is growing in importance. Fault injection strategies usually trade off accuracy against other properties. Low-level hardware-based methods are accurate but very expensive, need special equipment and the actual hardware, and lack controllability. High-level simulation-based strategies are flexible, fast, easily accessible and highly controllable, but they are not accurate, since they rely on models that do not always reflect the low-level implementation, particularly for complex designs such as out-of-order multiple-issue processors. In this work, we propose a cycle-accurate fault injection platform for superscalar processors with a smart checkpointing mechanism that accelerates injection time, mitigating the shortcomings of the aforementioned fault injection methods while providing the same level of abstraction as detailed RTL models. Leveraging this new platform, we evaluate a complex and parameterizable Out-of-Order processor (BOOM) by experimenting with different issue widths and analyzing the sensitivity of several of its hardware structures. |
IP2-13 | STREAMFTL: STREAM-LEVEL ADDRESS TRANSLATION SCHEME FOR MEMORY CONSTRAINED FLASH STORAGE Speaker: Dongkun Shin, Sungkyunkwan University, KR Authors: Hyukjoong Kim, Kyuhwa Han and Dongkun Shin, Sungkyunkwan University, KR Abstract Although much research effort has been devoted to reducing the size of the address mapping table, which consumes DRAM space in solid state drives (SSDs), most SSDs still use page-level mapping for high performance in their firmware, called the flash translation layer (FTL). In this paper, we propose a novel FTL scheme called StreamFTL. To reduce the size of the mapping table in SSDs, StreamFTL maintains one mapping entry per stream, where a stream consists of several logical pages written to contiguous physical pages. Unlike an extent, as used by previous FTL schemes, the logical pages in a stream do not need to be contiguous. We show that StreamFTL can reduce the size of the mapping table by up to 90% compared to a page-level mapping scheme. |
IP2-14 | ONLINE CONCURRENT WORKLOAD CLASSIFICATION FOR MULTI-CORE ENERGY MANAGEMENT Speaker: Karunakar Reddy Basireddy, University of Southampton, GB Authors: Karunakar Reddy Basireddy¹, Amit Kumar Singh², Geoff V. Merrett¹ and Bashir M. Al-Hashimi¹ ¹University of Southampton, GB; ²University of Essex, GB Abstract Modern embedded multi-core processors are organized as clusters of cores, where all cores in a cluster operate at a common voltage-frequency (V-f) setting. Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g., compute- and memory-intensive) depending on the instruction mix and resource sharing. Under such workload variability, runtime adaptation is key to achieving energy savings without trading off application performance. In this paper, we propose an online energy management technique that performs concurrent workload classification using the Memory Reads Per Instruction (MRPI) metric and proactively selects an appropriate V-f setting through workload prediction. It then monitors the workload prediction error and the performance loss, quantified by Instructions Per Second (IPS), at runtime and adjusts the chosen V-f setting to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches. |
IP2-15 | AIM: FAST AND ENERGY-EFFICIENT AES IN-MEMORY IMPLEMENTATION FOR EMERGING NON-VOLATILE MAIN MEMORY Speaker: Jingtong Hu, University of Pittsburgh, US Authors: Mimi Xie¹, Shuangchen Li², Alvin Glova², Jingtong Hu¹, Yuangang Wang³ and Yuan Xie² ¹University of Pittsburgh, US; ²University of California, Santa Barbara, US; ³Huawei Technologies, China, CN Abstract Non-volatile main memory-based systems give an attacker the opportunity to readily access sensitive information in memory because of the memory's long retention time. While real-time memory encryption with a dedicated AES engine can address this vulnerability, it incurs extra performance and energy overheads. As an alternative, we propose AIM, an AES in-memory implementation that encrypts all or part of the memory only when necessary. We leverage the benefits of the in-memory computing architecture to address the challenges of this bandwidth-intensive encryption application, taking advantage of the NVM's intrinsic logic operation capability to implement the AES task. Embracing the massive parallelism inside the memory, AIM outperforms existing mechanisms with higher throughput and lower energy consumption. Compared with a state-of-the-art AES engine running at 2.1 GHz, AIM can speed up the encryption process by 80 times for a 1 GB NVM. |
IP2-16 | SAT-BASED BIT-FLIPPING ATTACK ON LOGIC ENCRYPTIONS Speaker: Hai Zhou, Northwestern University, US Authors: Yuanqi Shen, Amin Rezaei and Hai Zhou, Northwestern University, US Abstract Logic encryption is a hardware security technique that uses extra key inputs to prevent unauthorized use of a circuit. Following the discovery of the SAT-based attack, new encryption techniques such as SARLock and Anti-SAT have been proposed and further combined with traditional logic encryption techniques to guarantee both high error rates and resilience to the SAT-based attack. In this paper, the SAT-based bit-flipping attack is presented. It first separates the two groups of keys via SAT-based bit flipping, and then attacks the traditional encryption with the conventional SAT-based attack and the SAT-resilient encryption with a bypassing attack. Experimental results show that the bit-flipping attack successfully returns a circuit with the correct functionality and significantly reduces execution time compared with other advanced attacks. |
IP2-17 | AMS VERIFICATION METHODOLOGY REGARDING SUPPLY MODULATION IN RF SOCS INDUCED BY DIGITAL STANDARD CELLS Speaker: Fabian Speicher, RWTH Aachen University, DE Authors: Fabian Speicher, Jonas Meier, Soheil Aghaie, Ralf Wunderlich and Stefan Heinen, RWTH Aachen University, DE Abstract Nanoscale CMOS enables and forces the use of digital-centric RF architectures, where timing resolution is traded for analog resolution. At the same time, digital circuits act as aggressors that endanger the performance of the time-continuous digital and analog parts. The switching activity of logic cells results in power supply variations, which lead to jitter in the digital signal paths and cause interferers coupling into the analog paths, appearing, e.g., as phase noise, crosstalk, or unwanted frequency conversion. Since today's commonly used AMS simulation methods are limited to register-transfer level (RTL) models for the digital domain, the electrical behavior caused by digital switching is not considered. Here, a method for modeling logic cells with regard to power supply noise is presented that uses the available characterization data of a standard cell library. It covers the influence of switching on the supply voltage as well as the influence of supply variations on the digital path delay and their feedthrough to blocks of the RF domain. This enables fast event-driven simulation of an entire AMS system with respect to these aspects. The method is demonstrated on a digital-centric transmitter to detect the effects at the system level. |