IP2 Interactive Presentations

Label	Presentation Title Authors

IP2-1

IN-GROWTH TEST FOR MONOLITHIC 3D INTEGRATED SRAM
Speaker:
Yixun Zhang, Shanghai Jiao Tong University, CN
Authors:
Pu Pang¹, Yixun Zhang¹, Tianjian Li¹, Sung Kyu Lim², Quan Chen¹, Xiaoyao Liang¹ and Li Jiang¹
¹Shanghai Jiao Tong University, CN; ²Georgia Tech, US
Abstract
Monolithic three-dimensional integration (M3I) directly fabricates tiers of integrated circuits on top of each other and provides millions of vertical interconnections through interlayer vias (ILVs). It thus offers higher integration density and communication capability than three-dimensional stacked integration (3D-SI). However, the Known-Good-Die problem that haunts 3D-SI (a single faulty tier causes the failure of the entire stack) also occurs in M3I. M3I lacks efficient test methodologies such as the pre-bond testing used in 3D-SI. It may therefore suffer a more significant yield drop, and its cost may be unacceptable for mainstream adoption. This paper introduces a novel In-growth test method for M3I SRAM. We propose a novel Design-for-Test (DfT) methodology that enables the proposed In-growth test on cell-level partitioned, incomplete SRAM cells. We also build a statistical cost model and derive a prospective judgement that determines whether or not to stop fabrication. This avoids the cost of fabricating more tiers on top of irreparable ones. We find that a "sweet point" exists in this judgement that minimizes the overall cost. Experimental results show the effectiveness of the proposed test methodology.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-2

A CO-DESIGN METHODOLOGY FOR SCALABLE QUANTUM PROCESSORS AND THEIR CLASSICAL ELECTRONIC INTERFACE
Speaker:
Jeroen van Dijk, Delft University of Technology, NL
Authors:
Jeroen van Dijk¹, Andrei Vladimirescu², Masoud Babaie¹, Edoardo Charbon¹ and Fabio Sebastiano¹
¹Delft University of Technology, NL; ²University of California, Berkeley, US
Abstract
A quantum computer fundamentally comprises a quantum processor and a classical controller. The classical electronic controller is used to correct and manipulate the qubits, the core components of a quantum processor. Practical applications require quantum computers that scale to millions of qubits. To enable this, the classical electronic and quantum systems must be optimized simultaneously. In this paper, a co-design methodology is proposed for obtaining optimized qubit performance while considering practical trade-offs in the control circuits, such as power consumption, complexity, and cost. The SPINE (SPIN Emulator) toolset is introduced for the co-design and co-optimization of electronic/quantum systems. It comprises a circuit simulator enhanced with a Verilog-A model that emulates the quantum behavior of single-electron spin qubits. Design examples show the effectiveness of the proposed methodology in the optimization, design, and verification of a complete electronic/quantum system.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-3

APPROXIMATE QUATERNARY ADDITION WITH THE FAST CARRY CHAINS OF FPGAS
Speaker:
Philip Brisk, University of California, Riverside, US
Authors:
Sina Boroumand¹, Hadi P. Afshar² and Philip Brisk³
¹University of Tehran, IR; ²Qualcomm Research, US; ³University of California, Riverside, US
Abstract
A heuristic is presented to efficiently synthesize approximate adder trees on Altera and Xilinx FPGAs using their fast carry chains. The mapper constructs approximate adder trees using an approximate quaternary (four-input) adder as the fundamental building block. The approximate adder trees are smaller than exact adder trees. This allows more operators to fit into a fixed-area device, trading off arithmetic accuracy for higher throughput.
Download Paper (PDF; Only available from the DATE venue WiFi)
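
The abstract does not specify how the approximate quaternary adder gives up accuracy. The sketch below illustrates the general idea of an adder tree built from approximate four-input adders. It assumes a generic lower-part-OR approximation, which is a different, well-known approximate-adder style swapped in here and not the authors' design. The names `approx_quaternary_add` and `adder_tree`, and the parameter `k` (the number of approximated low-order bits), are hypothetical.

```python
from typing import List


def exact_quaternary_add(a: int, b: int, c: int, d: int) -> int:
    """Exact sum of four unsigned operands (reference for comparison)."""
    return a + b + c + d


def approx_quaternary_add(a: int, b: int, c: int, d: int, k: int) -> int:
    """Hypothetical approximate 4-input adder (lower-part-OR style, an assumption).

    The low k bits of the four operands are OR-ed together, so no carries are
    generated there. The upper bits are added exactly.
    """
    mask = (1 << k) - 1
    low = (a | b | c | d) & mask
    high = ((a >> k) + (b >> k) + (c >> k) + (d >> k)) << k
    return high + low


def adder_tree(values: List[int], k: int) -> int:
    """Reduce operands with a tree of approximate 4-input adders.

    Each level is padded with zeros to a multiple of four inputs.
    """
    level = list(values)
    while len(level) > 1:
        while len(level) % 4:
            level.append(0)
        level = [approx_quaternary_add(*level[i:i + 4], k)
                 for i in range(0, len(level), 4)]
    return level[0] if level else 0
```

Setting `k = 0` recovers the exact tree. With `k > 0`, each adder underestimates the exact sum by at most `3 * (2**k - 1)`. In hardware, the OR-ed low bits would need no carry logic.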

IP2-4

NN COMPACTOR: MINIMIZING MEMORY AND LOGIC RESOURCES FOR SMALL NEURAL NETWORKS
Speaker:
Seongmin Hong, Hongik University, KR
Authors:
Seongmin Hong¹, Inho Lee¹ and Yongjun Park²
¹Hongik University, KR; ²Hanyang University, KR
Abstract
Specialized neural accelerators are an appealing hardware platform for machine learning systems because they provide both high performance and energy efficiency. Various neural accelerators have recently been introduced. However, they are difficult to adapt to embedded platforms because they require high memory capacity and bandwidth to supply synaptic weights quickly. Embedded platforms often cannot meet these memory requirements because of their limited resources. In FPGA-based IoT (Internet of Things) systems, the problem is even worse: computation units generated from logic blocks cannot be fully utilized because the block memory is small. To overcome this problem, we propose a novel dual-track quantization technique. It reduces the synaptic weight width based on the magnitude of each value while minimizing accuracy loss. In this value-adaptive technique, large and small weights are quantized differently. We present a fully automatic framework called NN Compactor that generates a compact neural accelerator. It minimizes the memory requirements of synaptic weights through dual-track quantization and minimizes the logic requirements of processing units (PUs), with minimal loss in recognition accuracy. For three widely used datasets (MNIST, CNAE-9, and Forest), experimental results demonstrate that our compact neural accelerator achieves an average performance improvement of 6.4x over a baseline embedded system, while using minimal resources and incurring minimal accuracy loss.
Download Paper (PDF; Only available from the DATE venue WiFi)
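
The abstract states only that large and small weights are quantized differently. The sketch below illustrates that idea under several assumptions, none of which are details from the paper:

- a magnitude threshold separates the two tracks;
- each weight carries a one-bit track flag;
- both tracks use the same signed code width.

`dual_track_quantize`, `dequantize`, `threshold`, and `w_max` are hypothetical names.

```python
from typing import List, Tuple


def dual_track_quantize(weights: List[float], threshold: float, bits: int,
                        w_max: float) -> List[Tuple[int, int]]:
    """Quantize each weight to a (track, code) pair (illustrative assumption).

    Track 0: |w| < threshold, uniform step over [-threshold, threshold].
    Track 1: otherwise, uniform step over [-w_max, w_max].
    Both tracks share the same signed code width, so small weights get a
    finer step than large ones.
    """
    levels = (1 << (bits - 1)) - 1          # e.g. bits=4 -> codes in [-7, 7]
    fine_step = threshold / levels
    coarse_step = w_max / levels
    out = []
    for w in weights:
        track, step = (0, fine_step) if abs(w) < threshold else (1, coarse_step)
        code = max(-levels, min(levels, round(w / step)))
        out.append((track, code))
    return out


def dequantize(q: List[Tuple[int, int]], threshold: float, bits: int,
               w_max: float) -> List[float]:
    """Reconstruct approximate weights from (track, code) pairs."""
    levels = (1 << (bits - 1)) - 1
    steps = (threshold / levels, w_max / levels)
    return [code * steps[track] for track, code in q]
```

Because both tracks share the code width, weights below the threshold use a step of `threshold / levels` rather than `w_max / levels`. This is what reduces the error on small weights in this sketch.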

IP2-5

IMPROVING FAST CHARGING EFFICIENCY OF RECONFIGURABLE BATTERY PACKS
Speaker:
Alexander Lamprecht, TUM CREATE, SG
Authors:
Alexander Lamprecht¹, Swaminathan Narayanaswamy¹ and Sebastian Steinhorst²
¹TUM CREATE, SG; ²Technical University of Munich, DE
Abstract
Reconfigurable battery packs, which can dynamically modify the electrical connection topology of their individual cells, have recently been gaining importance. Several circuit architectures and management algorithms have been proposed in the literature. However, the electrical characteristics of the reconfiguration circuit architectures have not yet been sufficiently studied. In this paper, we derive a detailed analytical model for a state-of-the-art reconfiguration architecture that captures the losses introduced by the parasitic resistances of the circuit components. We propose, for the first time, a fast charging strategy using the reconfiguration architecture that significantly reduces power losses compared to conventional battery packs. Moreover, using the analytical model, we highlight the challenges faced by existing reconfiguration architectures built with state-of-the-art components. We also derive the switch specifications that are essential for improving the energy efficiency of such reconfigurable battery packs.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-6

CLOUD-ASSISTED CONTROL OF GROUND VEHICLES USING ADAPTIVE COMPUTATION OFFLOADING TECHNIQUES
Speaker:
Soheil Samii, General Motors R&D, Warren, MI 48090, US
Authors:
Arun Adiththan¹, Ramesh S² and Soheil Samii²
¹City University of New York, US; ²General Motors R&D, US
Abstract
Existing approaches to designing efficient safety-critical control applications are constrained by limited in-vehicle sensing and computational capabilities. In the context of automated driving, we argue that resources "out of the vehicle" must be leveraged to meet the sensing and processing requirements of sophisticated algorithms (e.g., deep neural networks). To meet this need, a suitable computation offloading technique must be identified. It must satisfy the vehicle's safety and stability requirements even when the communication network is unreliable. In this work, we propose an adaptive technique for offloading control computations to the cloud. The proposed approach considers both current network conditions and control application requirements to determine whether remote computation and storage resources can feasibly be used. As a case study, we describe a cloud-based path-following controller application that leverages crowdsensed data for path planning.
Download Paper (PDF; Only available from the DATE venue WiFi)
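
The abstract says the offloading decision weighs current network conditions against the control application's requirements. The sketch below shows one possible admission rule. The `NetworkEstimate` fields, the loss-rate limit, and the deadline safety margin are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class NetworkEstimate:
    rtt_ms: float      # recently measured round-trip time to the cloud (assumed metric)
    loss_rate: float   # fraction of recent requests that received no reply (assumed metric)


def choose_execution_site(net: NetworkEstimate, cloud_compute_ms: float,
                          deadline_ms: float, max_loss_rate: float = 0.05,
                          margin: float = 0.8) -> str:
    """Hypothetical offloading rule.

    Offload only if (1) the expected remote response time fits inside a safety
    margin of the control deadline, and (2) the link is reliable enough.
    Otherwise, keep running the local controller.
    """
    remote_ms = net.rtt_ms + cloud_compute_ms
    if net.loss_rate <= max_loss_rate and remote_ms <= margin * deadline_ms:
        return "cloud"
    return "local"
```

This rule covers only the admission decision. A complete design would also need the local controller to take over whenever a cloud response misses its deadline.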

IP2-7

FUSIONCACHE: USING LLC TAGS FOR DRAM CACHE
Speaker:
Evangelos Vasilakis, Chalmers University of Technology, SE
Authors:
Evangelos Vasilakis¹, Vassilis Papaefstathiou², Pedro Trancoso¹ and Ioannis Sourdis¹
¹Chalmers University of Technology, SE; ²FORTH-ICS, GR
Abstract
DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D-stacked DRAM. They can capture the spatial and temporal data locality of applications, but their access latency is still substantially higher than that of conventional on-chip SRAM caches. Moreover, their tag access latency and storage overheads are excessive. Storing tags for a large DRAM cache in SRAM is impractical, as they would occupy a significant fraction of the processor chip. Storing them in the DRAM itself incurs high access overheads. Caching the DRAM tags on the processor adds a constant delay to the access time. In this paper, we introduce FusionCache, a DRAM cache that offers more efficient tag accesses by fusing DRAM cache tags with the tags of the on-chip Last Level Cache (LLC). We observe that, in an inclusive cache model where DRAM cachelines are multiples of on-chip SRAM cachelines, LLC tags can be repurposed to access a large part of the DRAM cache contents. As a result, accessing DRAM cache tags incurs zero additional latency in the common case.
Download Paper (PDF; Only available from the DATE venue WiFi)
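
Consider an inclusive hierarchy in which each DRAM-cache line spans several LLC lines. Here, a resident LLC line proves that its enclosing DRAM-cache line is cached. The sketch below is one plausible reading of that observation, not FusionCache's actual tag organization, which the abstract does not describe. The 64-byte and 512-byte line sizes and the `lookup` helper are assumptions.

```python
from typing import Set

LLC_LINE = 64     # bytes per on-chip SRAM (LLC) cacheline, assumed
DRAM_LINE = 512   # bytes per DRAM-cache line, assumed to be a multiple of LLC_LINE


def lookup(addr: int, llc_lines: Set[int]) -> str:
    """Classify an access using only LLC tag information (illustrative).

    llc_lines holds the line numbers (addr // LLC_LINE) resident in the LLC.
    """
    line = addr // LLC_LINE
    if line in llc_lines:
        return "llc_hit"
    per_dram_line = DRAM_LINE // LLC_LINE
    first_sibling = (addr // DRAM_LINE) * per_dram_line
    # Inclusion: if any LLC line inside the same DRAM-cache line is resident,
    # that DRAM-cache line must be present, so no DRAM tag read is needed.
    if any(first_sibling + i in llc_lines for i in range(per_dram_line)):
        return "dram_hit_known"
    return "dram_tag_lookup"
```

The remaining `dram_tag_lookup` case is where a conventional tag access would still be needed. This example does not model how often that case occurs.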

IP2-8

IMPROVED SYNTHESIS OF CLIFFORD+T QUANTUM FUNCTIONALITY
Speaker:
Philipp Niemann, German Research Center for Artificial Intelligence (DFKI GmbH), DE
Authors:
Philipp Niemann¹, Robert Wille² and Rolf Drechsler³
¹Cyber-Physical Systems, DFKI GmbH, DE; ²Johannes Kepler University Linz, AT; ³University of Bremen/DFKI GmbH, DE
Abstract
The Clifford+T library provides robust and fault-tolerant realizations of quantum computations. Consequently, (logic) synthesis of Clifford+T quantum circuits has become an important research problem. However, previously proposed solutions are either applicable only to very small quantum systems or lead to circuits that are far from optimal. This is mainly because they consider the underlying transformation matrix to be synthesized locally, i.e., column by column. In this paper, we suggest an improved approach that considers the matrix globally and thereby overcomes many of these drawbacks. Preliminary evaluations show the promise of this direction.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-9

ENERGY-EFFICIENT CHANNEL ALIGNMENT OF DWDM SILICON PHOTONIC TRANSCEIVERS
Speaker:
Yuyang Wang, University of California, Santa Barbara, US
Authors:
Yuyang Wang¹, M. Ashkan Seyedi², Rui Wu¹, Jared Hulme², Marco Fiorentino², Raymond G. Beausoleil² and Kwang-Ting Cheng³
¹University of California, Santa Barbara, US; ²Hewlett Packard Labs, US; ³Hong Kong University of Science and Technology, HK
Abstract
Comb-laser-driven, microring-based dense wavelength division multiplexing (DWDM) silicon photonics is a promising candidate for next-generation optical interconnects. However, existing solutions for exploring the power-performance trade-off of such systems have been restricted to a limited design space by two unnecessary constraints: using identical spacing for laser comb lines and microring channels, and using consecutive laser comb lines for data transmission. We propose an energy-efficient channel alignment scheme that aligns the microring channels to a subset of laser comb lines that are non-uniformly distributed in the free spectral range of the microrings. Based on a well-established process variation model, our simulations show that the proposed scheme significantly reduces microring tuning power when denser comb lines are available. The power saved on microring tuning can improve overall system energy efficiency, despite some power wasted in unused laser comb lines. We further conduct a design space exploration case study using the proposed channel alignment scheme. It seeks the most energy-efficient configuration that achieves a target aggregate data rate.
Download Paper (PDF; Only available from the DATE venue WiFi)
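
The abstract does not give the alignment algorithm. The sketch below shows why a denser set of comb lines, from which rings may pick non-consecutive lines, can lower the required tuning. It rests on several assumptions, which are not the authors' method:

- thermal tuning can only red-shift resonances;
- each ring resonates again one FSR later;
- total wavelength shift stands in for tuning power;
- a brute-force search replaces whatever optimization the authors use.

`tuning_shift` and `best_alignment` are hypothetical names.

```python
from itertools import permutations
from typing import Sequence, Tuple


def tuning_shift(ring_nm: float, line_nm: float, fsr_nm: float) -> float:
    """Red-shift needed to bring a ring resonance onto a comb line.

    Assumes heaters can only increase the resonant wavelength and that
    resonances repeat every FSR, so the required shift wraps modulo the FSR.
    """
    return (line_nm - ring_nm) % fsr_nm


def best_alignment(rings_nm: Sequence[float], comb_nm: Sequence[float],
                   fsr_nm: float) -> Tuple[float, Tuple[float, ...]]:
    """Assign each ring to a distinct comb line, minimizing total shift.

    Total shift is used as a proxy for tuning power. The exhaustive search is
    exponential and only meant for a handful of rings.
    """
    best_cost, best_lines = float("inf"), ()
    for lines in permutations(comb_nm, len(rings_nm)):
        cost = sum(tuning_shift(r, l, fsr_nm) for r, l in zip(rings_nm, lines))
        if cost < best_cost:
            best_cost, best_lines = cost, lines
    return best_cost, best_lines
```

Take hypothetical rings at 0.3, 1.4 and 2.2 nm within a 4 nm FSR. Comb lines every 1 nm require a total shift of 2.1 nm, while lines every 0.5 nm require only 0.6 nm. The lines left unused correspond to the wasted laser power the abstract mentions.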

IP2-10

A PHYSICAL SYNTHESIS FLOW FOR EARLY TECHNOLOGY EVALUATION OF SILICON NANOWIRE BASED RECONFIGURABLE FETS
Speaker:
Shubham Rai, Chair For Processor Design, CFAED, Technische Universität Dresden, Dresden, DE
Authors:
Shubham Rai¹, Ansh Rupani², Dennis Walter¹, Michael Raitza¹, André Heinzig³, Christian Mayr¹, Walter Weber⁴ and Akash Kumar¹
¹Technische Universität Dresden, DE; ²Birla Institute of Technology and Science Pilani, Hyderabad Campus, IN; ³NaMLab GmbH, DE; ⁴NaMLab gGmbH and CfAED, DE
Abstract
Silicon nanowire (SiNW) based reconfigurable transistors (RFETs) provide an additional gate terminal, called the program gate. It allows the same device to be programmed for p-type or n-type functionality at runtime. This enables circuit designers to pack more functionality into each computational unit. It also saves processing costs, as only one device type is required: no doping and associated lithography steps are needed for this technology. In this paper, we present a complete design flow, including both logic and physical synthesis, for circuits based on SiNW RFETs. We propose layouts of logic gates, together with Liberty and LEF (Library Exchange Format) files for the physical synthesis flow. We make them available under an open-source license to enable further research on these novel, functionally enhanced transistors. We develop a table model based on a transistor cell with relaxed dimensions following an SOI-based 22 nm technology with a gate pitch of 110 nm, and we model our logic gates on dual-gate RFETs. For comparison, we use the same tool flow for CMOS. This is the first comparison of its kind for fully symmetrical reconfigurable transistors. It shows that the area after placement and routing of SiNW-based circuits is 17% larger than that of CMOS for the MCNC benchmarks. Further, we discuss areas of improvement, from a fabrication and technology point of view, for obtaining better area results with silicon nanowire based RFETs. The future use of self-aligned techniques to structure two independent gates within a smaller pitch holds the promise of substantial area reduction.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-11

ETISS-ML: A MULTI-LEVEL INSTRUCTION SET SIMULATOR WITH RTL-LEVEL FAULT INJECTION SUPPORT FOR THE EVALUATION OF CROSS-LAYER RESILIENCY TECHNIQUES
Speaker:
Martin Dittrich, Technical University of Munich, DE
Authors:
Daniel Mueller-Gritschneder¹, Martin Dittrich¹, Josef Weinzierl¹, Eric Cheng², Subhasish Mitra² and Ulf Schlichtmann¹
¹Technical University of Munich, DE; ²Stanford University, US
Abstract
ETISS is an instruction set simulator (ISS) for Virtual Prototypes (VPs) modeled with SystemC/TLM. In this paper, we propose the extension ETISS-ML, which enables multi-level simulation. It switches between the ISS level and the register transfer level (RTL) to accurately evaluate the impact of soft errors in the pipeline of a RISC processor. ETISS-ML achieves close-to-RTL-accurate fault injection results with close-to-ISS simulation performance, a speedup of up to 100x compared to RTL. To this end, we propose an approach to dynamically determine the length of the RTL simulation period. The high simulation performance of ETISS-ML enables ultra-efficient and accurate evaluation of cross-layer resiliency techniques for embedded applications. Such evaluation requires running a large number of fault injections over long simulation scenarios. This is demonstrated in a case study of a Microcontroller Unit (MCU) executing a control algorithm for adaptive cruise control.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-12

PRECISE EVALUATION OF THE FAULT SENSITIVITY OF OOO SUPERSCALAR PROCESSORS
Speaker:
Antonio Carlos Schneider Beck, Federal University of Rio Grande do Sul, BR
Authors:
Rafael Tonetto¹, Gabriel Luca Nazar² and Antonio Carlos Schneider Beck²
¹Federal University of Rio Grande do Sul, BR; ²Universidade Federal do Rio Grande do Sul, BR
Abstract
Since superscalar processors lead the market, evaluating their resiliency by means of fault injection is growing in importance. Fault injection strategies usually trade accuracy against other qualities. Low-level, HW-based methods are accurate but very expensive: they require special equipment and the actual hardware, and they lack controllability. High-level, simulation-based strategies are flexible, fast, easily accessible, and highly controllable. However, they are not accurate, since they rely on models that do not always reflect the low-level implementation. This is especially true for complex designs such as out-of-order multiple-issue processors. In this work, we propose a cycle-accurate fault injection platform for superscalar processors. It includes a smart checkpointing mechanism that reduces injection time. The platform mitigates the shortcomings of the aforementioned fault injection methods while providing the same level of abstraction as detailed RTL models. Using this new platform, we evaluate a complex, parameterizable Out-of-Order processor (BOOM) by experimenting with different issue widths and analyzing the sensitivity of several of the processor's hardware structures.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-13

STREAMFTL: STREAM-LEVEL ADDRESS TRANSLATION SCHEME FOR MEMORY CONSTRAINED FLASH STORAGE
Speaker:
Dongkun Shin, Sungkyunkwan University, KR
Authors:
Hyukjoong Kim, Kyuhwa Han and Dongkun Shin, Sungkyunkwan University, KR
Abstract
In solid state drives (SSDs), the address mapping table consumes DRAM space. Much research effort has been devoted to reducing its size. Nevertheless, most SSDs still use page-level mapping for high performance in their firmware, called the flash translation layer (FTL). In this paper, we propose a novel FTL scheme called StreamFTL. To reduce the size of the mapping table in SSDs, StreamFTL maintains a mapping entry for each stream, which consists of several logical pages written at contiguous physical pages. Unlike an extent, which is used by previous FTL schemes, the logical pages in a stream do not need to be contiguous. We show that StreamFTL can reduce the size of the mapping table by up to 90% compared to a page-level mapping scheme.
Download Paper (PDF; Only available from the DATE venue WiFi)
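
The abstract defines a stream as logical pages, not necessarily contiguous, written to contiguous physical pages. With one mapping entry per stream, the translation rule reduces to "first physical page plus offset within the stream". The sketch below illustrates only that rule.

For clarity, it keeps each stream's logical page list in memory. That list is itself a per-page cost, so the sketch does not show how StreamFTL achieves its memory savings, which the abstract does not describe. The `Stream` and `StreamMap` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stream:
    """One mapping entry: logical pages written to consecutive physical pages."""
    start_ppn: int                                  # first physical page of the stream
    lpns: List[int] = field(default_factory=list)   # logical pages in write order


class StreamMap:
    """Hypothetical stream-level mapping table (illustration only)."""

    def __init__(self) -> None:
        self.streams: List[Stream] = []

    def append_stream(self, start_ppn: int, lpns: List[int]) -> None:
        # The i-th logical page of the stream lives at physical page start_ppn + i.
        self.streams.append(Stream(start_ppn, list(lpns)))

    def lookup(self, lpn: int) -> Optional[int]:
        # Search the newest streams first so that rewritten pages resolve to
        # their latest location; within a stream, the last occurrence wins.
        for s in reversed(self.streams):
            for offset in range(len(s.lpns) - 1, -1, -1):
                if s.lpns[offset] == lpn:
                    return s.start_ppn + offset
        return None

    def entry_count(self) -> int:
        """Number of mapping entries (one per stream, not one per page)."""
        return len(self.streams)
```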

IP2-14

ONLINE CONCURRENT WORKLOAD CLASSIFICATION FOR MULTI-CORE ENERGY MANAGEMENT
Speaker:
Karunakar Reddy Basireddy, University of Southampton, GB
Authors:
Karunakar Reddy Basireddy¹, Amit Kumar Singh², Geoff V. Merrett¹ and Bashir M. Al-Hashimi¹
¹University of Southampton, GB; ²University of Essex, GB
Abstract
Modern embedded multi-core processors are organized as clusters of cores, where all cores in a cluster operate at a common voltage-frequency (V-f) setting. Such processors often need to execute applications concurrently. These exhibit varying and mixed workloads (e.g., compute- and memory-intensive) depending on the instruction mix and resource sharing. Under such workload variability, runtime adaptation is key to achieving energy savings without trading off application performance. In this paper, we propose an online energy management technique that performs concurrent workload classification using the Memory Reads Per Instruction (MRPI) metric. It proactively selects an appropriate V-f setting through workload prediction. At runtime, it then monitors the workload prediction error and the performance loss, quantified by Instructions Per Second (IPS), and adjusts the chosen V-f setting to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.
Download Paper (PDF; Only available from the DATE venue WiFi)
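
The control loop described in the abstract has four steps:

1. Measure MRPI.
2. Predict the next interval's MRPI.
3. Map it to a V-f setting.
4. Correct the setting if IPS drops.

The sketch below walks through this loop. The MRPI thresholds, frequency levels, moving-average predictor, and IPS tolerance are placeholders chosen for illustration, not values or methods reported in the paper. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical MRPI thresholds mapped to cluster frequencies in MHz.
# These numbers are placeholders, not values reported in the paper.
MRPI_BINS: List[Tuple[float, int]] = [
    (0.005, 2000),         # compute-intensive: highest frequency
    (0.020, 1600),
    (0.050, 1200),
    (float("inf"), 800),   # memory-intensive: lowering frequency costs little
]
FREQS: List[int] = sorted({freq for _, freq in MRPI_BINS})


def mrpi(memory_reads: int, instructions: int) -> float:
    """Memory Reads Per Instruction for one sampling interval."""
    return memory_reads / instructions if instructions else 0.0


def predict_next(history: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average, used here as a stand-in predictor."""
    estimate = history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def select_frequency(predicted_mrpi: float) -> int:
    """Map a predicted MRPI to a V-f level via the bins above."""
    for upper, freq in MRPI_BINS:
        if predicted_mrpi < upper:
            return freq
    return MRPI_BINS[-1][1]


def compensate(freq: int, measured_ips: float, target_ips: float,
               tolerance: float = 0.05) -> int:
    """Raise the frequency one level if IPS falls more than `tolerance` below target."""
    if measured_ips < (1 - tolerance) * target_ips:
        higher = [f for f in FREQS if f > freq]
        return min(higher) if higher else freq
    return freq
```

Memory-bound intervals map to lower frequencies. Cores in such intervals spend much of their time waiting on memory, so slowing them down costs relatively little performance. The IPS check then steps the frequency back up when that assumption fails.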

IP2-15

AIM: FAST AND ENERGY-EFFICIENT AES IN-MEMORY IMPLEMENTATION FOR EMERGING NON-VOLATILE MAIN MEMORY
Speaker:
Jingtong Hu, University of Pittsburgh, US
Authors:
Mimi Xie¹, Shuangchen Li², Alvin Glova², Jingtong Hu¹, Yuangang Wang³ and Yuan Xie²
¹University of Pittsburgh, US; ²University of California, Santa Barbara, US; ³Huawei Technologies, China, CN
Abstract
Systems based on non-volatile main memory (NVM) give attackers an opportunity to readily access sensitive information in memory because of NVM's long retention time. Real-time memory encryption with a dedicated AES engine can address this vulnerability, but it incurs extra performance and energy overheads. As an alternative, we propose AIM, an AES in-memory implementation that encrypts all or part of the memory only when necessary. We leverage the benefits of the in-memory computing architecture to address the challenges of this bandwidth-intensive encryption application. To implement the AES task, we take advantage of NVM's intrinsic logic operation capability. By exploiting the massive parallelism inside the memory, AIM achieves higher throughput and lower energy consumption than existing mechanisms. Compared with a state-of-the-art AES engine running at 2.1 GHz, AIM speeds up the encryption process by 80 times for a 1 GB NVM.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-16

SAT-BASED BIT-FLIPPING ATTACK ON LOGIC ENCRYPTIONS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Yuanqi Shen, Amin Rezaei and Hai Zhou, Northwestern University, US
Abstract
Logic encryption is a hardware security technique that uses extra key inputs to prevent unauthorized use of a circuit. Following the discovery of the SAT-based attack, new encryption techniques such as SARLock and Anti-SAT were proposed. These were further combined with traditional logic encryption techniques to guarantee both high error rates and resilience to the SAT-based attack. In this paper, the SAT-based bit-flipping attack is presented. It first separates the two groups of keys via SAT-based bit flipping. It then attacks the traditional encryption with the conventional SAT-based attack and the SAT-resilient encryption with a bypass attack. Experimental results show that the bit-flipping attack successfully returns a circuit with the correct functionality. It also significantly reduces execution time compared with other advanced attacks.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-17

AMS VERIFICATION METHODOLOGY REGARDING SUPPLY MODULATION IN RF SOCS INDUCED BY DIGITAL STANDARD CELLS
Speaker:
Fabian Speicher, RWTH Aachen University, DE
Authors:
Fabian Speicher, Jonas Meier, Soheil Aghaie, Ralf Wunderlich and Stefan Heinen, RWTH Aachen University, DE
Abstract
Nanoscale CMOS both enables and forces the use of digital-centric RF architectures, in which timing resolution is traded for analog resolution. At the same time, digital circuits act as aggressors that endanger the performance of the time-continuous digital and analog parts. The switching activity of logic cells causes power supply variations. These lead to jitter in the digital signal paths and cause interferers to couple into the analog paths, where they appear as, e.g., phase noise, crosstalk, or unwanted frequency conversion. Today's commonly used AMS simulation methods are limited to register-transfer level (RTL) models for the digital domain. As a result, the electrical behavior caused by digital switching is not considered. Here, we present a method for modeling logic cells with regard to power supply noise, using the available characterization data of a standard cell library. It covers three effects: the influence of switching on the supply voltage, the influence of supply variations on digital path delay, and their feedthrough to blocks of the RF domain. This enables fast event-driven simulation of an entire AMS system that accounts for these effects. The method is demonstrated on a digital-centric transmitter to detect the effects at the system level.
Download Paper (PDF; Only available from the DATE venue WiFi)

Label

Presentation Title
Authors

IP2-1

IN-GROWTH TEST FOR MONOLITHIC 3D INTEGRATED SRAM
Speaker:
Yixun Zhang, Shanghai Jiao Tong University, CN
Authors:
Pu Pang¹, Yixun Zhang¹, Tianjian Li¹, Sung Kyu Lim², Quan Chen¹, Xiaoyao Liang¹ and Li Jiang¹
¹Shanghai Jiao Tong University, CN; ²Georgia Tech, US
Abstract
Monolithic three-dimensional integration (M3I) directly fabricates tiers of integrated circuits upon each other and provides millions of vertical interconnections with interlayer vias (ILVs). It thus brings higher integration density and communication capability compared with three-dimensional stacked integration (3D-SI). However, the Known-Good-Die problem haunting 3D-SI-a faulty tier causes the failure of the entire stack-also occurs in M3I. Lack of efficient test methodologies such as the pre-bond testing in 3D-SI, M3I may have a more significant yield drop and thus its cost may be unacceptable for main-stream adoption. This paper introduces a novel In-growth test method for M3I SRAM. We propose a novel Design-for- Test (DfT) methodology to enable the proposed In-growth test on cell-level partitioned incomplete SRAM cells. We also build a statistical model of cost and discover a prospective judgement to determine whether or not to stop the fabrication, in order to prevent from raising the cost of fabricating more tiers upon the irreparable tiers. We find that a "sweet point" exists in the judgement, which can minimize the overall cost. Experimental results show the effectiveness of our proposed test methodology.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-2

A CO-DESIGN METHODOLOGY FOR SCALABLE QUANTUM PROCESSORS AND THEIR CLASSICAL ELECTRONIC INTERFACE
Speaker:
Jeroen van Dijk, Delft University of Technology, NL
Authors:
Jeroen van Dijk¹, Andrei Vladimirescu², Masoud Babaie¹, Edoardo Charbon¹ and Fabio Sebastiano¹
¹Delft University of Technology, NL; ²University of California, Berkeley, US
Abstract
A quantum computer fundamentally comprises a quantum processor and a classical controller. The classical electronic controller is used to correct and manipulate the qubits, the core components of a quantum processor. To enable quantum computers scalable to millions of qubits, as required in practical applications, the simultaneous optimization of both the classical electronic and quantum systems is needed. In this paper, a co-design methodology is proposed for obtaining an optimized qubit performance while considering practical trade-offs in the control circuits, such as power consumption, complexity, and cost. The SPINE (SPIN Emulator) toolset is introduced for the co-design and co-optimization of electronic/quantum systems. It comprises a circuit simulator enhanced with a Verilog-A model emulating the quantum behavior of single-electron spin qubits. Design examples show the effectiveness of the proposed methodology in the optimization, design and verification of a whole electronic/quantum system.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-3

APPROXIMATE QUATERNARY ADDITION WITH THE FAST CARRY CHAINS OF FPGAS
Speaker:
Philip Brisk, University of California, Riverside, US
Authors:
Sina Boroumand¹, Hadi P. Afshar² and Philip Brisk³
¹University of Tehran, IR; ²Qualcomm Research, US; ³University of California, Riverside, US
Abstract
A heuristic is presented to efficiently synthesize approximate adder trees on Altera and Xilinx FPGAs using their carry chains. The mapper constructs approximate adder trees using an approximate quaternary adder as the fundamental building block. The approximate adder trees are smaller than exact adder trees, allowing more operators to fit into a fixed-area device, trading off arithmetic accuracy for higher throughput.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-4

NN COMPACTOR: MINIMIZING MEMORY AND LOGIC RESOURCES FOR SMALL NEURAL NETWORKS
Speaker:
Seongmin Hong, Hongik University, KR
Authors:
Seongmin Hong¹, Inho Lee¹ and Yongjun Park²
¹Hongik University, KR; ²Hanyang University, KR
Abstract
Special neural accelerators are an appealing hardware platform for machine learning systems because they provide both high performance and energy efficiency. Although various neural accelerators have recently been introduced, they are difficult to adapt to embedded platforms because current neural accelerators require high memory capacity and bandwidth for the fast preparation of synaptic weights. Embedded platforms are often unable to meet these memory requirements because of their limited resources. In FPGA-based IoT (internet of things) systems, the problem becomes even worse since computation units generated from logic blocks cannot be fully utilized due to the small size of block memory. In order to overcome this problem, we propose a novel dual-track quantization technique to reduce synaptic weight width based on the magnitude of the value while minimizing accuracy loss. In this value-adaptive technique, large and small value weights are quantized differently. In this paper, we present a fully automatic framework called NN Compactor that generates a compact neural accelerator by minimizing the memory requirements of synaptic weights through dual-track quantization and minimizing the logic requirements of PUs with minimum recognition accuracy loss. For the three widely used datasets of MNIST, CNAE-9, and Forest, experimental results demonstrate that our compact neural accelerator achieves an average performance improvement of 6.4x over a baseline embedded system using minimal resources with minimal accuracy loss.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-5

IMPROVING FAST CHARGING EFFICIENCY OF RECONFIGURABLE BATTERY PACKS
Speaker:
Alexander Lamprecht, TUM CREATE, SG
Authors:
Alexander Lamprecht¹, Swaminathan Narayanaswamy¹ and Sebastian Steinhorst²
¹TUM CREATE, SG; ²Technical University of Munich, DE
Abstract
Recently, reconfigurable battery packs that can dynamically modify the electrical connection topology of their individual cells are gaining importance. While several circuit architectures and management algorithms are proposed in the literature, the electrical characteristics of the reconfiguration circuit architectures are not sufficiently studied so far. In this paper, we derive a detailed analytical model for a state-of-the-art reconfiguration architecture capturing the losses introduced by the parasitic resistances of the circuit components. For the first time, we propose a novel fast charging strategy using the reconfiguration architecture that significantly reduces the power losses in comparison to conventional battery packs. Moreover, using the analytical model, we highlight the challenges faced by existing reconfiguration architectures using state-of-the-art components and we derive the specifications for the switches which are essential for improving the energy efficiency of such reconfigurable battery packs.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-6

CLOUD-ASSISTED CONTROL OF GROUND VEHICLES USING ADAPTIVE COMPUTATION OFFLOADING TECHNIQUES
Speaker:
Soheil Samii, General Motors R&D, Warren, MI 48090, US
Authors:
Arun Adiththan¹, Ramesh S² and Soheil Samii²
¹City University of New York, US; ²General Motors R&D, US
Abstract
The existing approaches to design efficient safety-critical control applications is constrained by limited in-vehicle sensing and computational capabilities. In the context of automated driving, we argue that there is a need to leverage resources "out-of-the-vehicle" to meet the sensing and powerful processing requirements of sophisticated algorithms (e.g., deep neural networks). To realize the need, a suitable computation offloading technique that meets the vehicle safety and stability requirements, even in the presence of unreliable communication network, has to be identified. In this work, we propose an adaptive offloading technique for control computations into the cloud. The proposed approach considers both current network conditions and control application requirements to determine the feasibility of leveraging remote computation and storage resources. As a case study, we describe a cloud-based path following controller application that leverages crowdsensed data for path planning.

IP2-7

FUSIONCACHE: USING LLC TAGS FOR DRAM CACHE
Speaker:
Evangelos Vasilakis, Chalmers University of Technology, SE
Authors:
Evangelos Vasilakis¹, Vassilis Papaefstathiou², Pedro Trancoso¹ and Ioannis Sourdis¹
¹Chalmers University of Technology, SE; ²FORTH-ICS, GR
Abstract
DRAM caches have been shown to be an effective way to utilize the bandwidth and capacity of 3D stacked DRAM. Although they can capture the spatial and temporal data locality of applications, their access latency is still substantially higher than that of conventional on-chip SRAM caches. Moreover, their tag access latency and storage overheads are excessive. Storing the tags of a large DRAM cache in SRAM is impractical, as it would occupy a significant fraction of the processor chip. Storing them in the DRAM itself incurs high access overheads. Attempting to cache the DRAM tags on the processor adds a constant delay to the access time. In this paper, we introduce FusionCache, a DRAM cache that offers more efficient tag accesses by fusing the DRAM cache tags with the tags of the on-chip Last Level Cache (LLC). We observe that, in an inclusive cache model where DRAM cachelines are multiples of on-chip SRAM cachelines, LLC tags can be repurposed to access a large part of the DRAM cache contents. In the common case, accessing the DRAM cache tags then incurs zero additional latency.
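The inclusion argument can be made concrete with a little address arithmetic. The minimal sketch below assumes hypothetical 64 B SRAM lines and 512 B DRAM cache lines, so each DRAM cache line covers eight SRAM lines. In an inclusive hierarchy, an LLC hit on any of those sibling lines proves that the enclosing DRAM cache line is resident, so the DRAM tags need not be read. The LLC is modelled as a plain set of line numbers; this is not FusionCache's actual tag organization.

```python
SRAM_LINE = 64    # bytes per on-chip (LLC) cache line (assumed)
DRAM_LINE = 512   # bytes per DRAM cache line, a multiple of SRAM_LINE (assumed)
SUBLINES = DRAM_LINE // SRAM_LINE


def sibling_sram_lines(addr):
    """SRAM line numbers that fall inside the DRAM cache line containing addr."""
    first = (addr // DRAM_LINE) * SUBLINES
    return range(first, first + SUBLINES)


def dram_line_known_resident(addr, llc_lines):
    """True if the LLC alone proves the DRAM cache line holding addr is present.

    Relies on inclusion: every line in the LLC is also in the DRAM cache, so any
    resident sibling implies the whole enclosing DRAM line is resident. A False
    result means "unknown", i.e. the DRAM cache tags must still be consulted.
    """
    return any(line in llc_lines for line in sibling_sram_lines(addr))


llc = {8}  # the LLC holds SRAM line 8, i.e. bytes 512..575
print(dram_line_known_resident(600, llc))   # addr 600 is in DRAM line 1 -> True
print(dram_line_known_resident(1100, llc))  # DRAM line 2 -> False (check DRAM tags)
```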

IP2-8

IMPROVED SYNTHESIS OF CLIFFORD+T QUANTUM FUNCTIONALITY
Speaker:
Philipp Niemann, German Research Center for Artificial Intelligence (DFKI GmbH), DE
Authors:
Philipp Niemann¹, Robert Wille² and Rolf Drechsler³
¹Cyber-Physical Systems, DFKI GmbH, DE; ²Johannes Kepler University Linz, AT; ³University of Bremen/DFKI GmbH, DE
Abstract
The Clifford+T library provides robust and fault-tolerant realizations for quantum computations. Consequently, (logic) synthesis of Clifford+T quantum circuits has become an important research problem. However, previously proposed solutions are either applicable only to very small quantum systems or lead to circuits that are far from optimal, mainly because they consider the underlying transformation matrix to be synthesized only locally, i.e., column by column. In this paper, we suggest an improved approach that considers the matrix globally and thereby overcomes many of these drawbacks. Preliminary evaluations show the promise of this direction.

IP2-9

ENERGY-EFFICIENT CHANNEL ALIGNMENT OF DWDM SILICON PHOTONIC TRANSCEIVERS
Speaker:
Yuyang Wang, University of California, Santa Barbara, US
Authors:
Yuyang Wang¹, M. Ashkan Seyedi², Rui Wu¹, Jared Hulme², Marco Fiorentino², Raymond G. Beausoleil² and Kwang-Ting Cheng³
¹University of California, Santa Barbara, US; ²Hewlett Packard Labs, US; ³Hong Kong University of Science and Technology, HK
Abstract
Comb-laser-driven, microring-based dense wavelength division multiplexing (DWDM) silicon photonics is a promising candidate for next-generation optical interconnects. However, existing explorations of the power-performance trade-off of such systems have been restricted to a limited design space because of two unnecessary constraints: using an identical spacing for the laser comb lines and the microring channels, and using consecutive laser comb lines for data transmission. We propose an energy-efficient channel alignment scheme that aligns the microring channels to a subset of laser comb lines that are non-uniformly distributed within the free spectral range (FSR) of the microrings. Based on a well-established process variation model, our simulations show that the proposed scheme significantly reduces microring tuning power when denser comb lines are available. The power saved on microring tuning can improve the overall system energy efficiency despite some power being wasted in unused laser comb lines. We further conduct a design space exploration case study using the proposed channel alignment scheme to find the most energy-efficient configuration that achieves a target aggregate data rate.

IP2-10

A PHYSICAL SYNTHESIS FLOW FOR EARLY TECHNOLOGY EVALUATION OF SILICON NANOWIRE BASED RECONFIGURABLE FETS
Speaker:
Shubham Rai, Chair For Processor Design, CFAED, Technische Universität Dresden, Dresden, DE
Authors:
Shubham Rai¹, Ansh Rupani², Dennis Walter¹, Michael Raitza¹, André Heinzig³, Christian Mayr¹, Walter Weber⁴ and Akash Kumar¹
¹Technische Universität Dresden, DE; ²Birla Institute of Technology and Science Pilani, Hyderabad Campus, IN; ³NaMLab GmbH, DE; ⁴NaMLab gGmbH and CfAED, DE
Abstract
Silicon nanowire (SiNW) based reconfigurable transistors (RFETs) provide an additional gate terminal, called the program gate, which makes it possible to program p-type or n-type functionality for the same device at runtime. This enables circuit designers to pack more functionality into each computational unit. It also saves processing costs, as only one device type is required: no doping and associated lithography steps are needed for this technology. In this paper, we present a complete design flow, including both logic and physical synthesis, for circuits based on SiNW RFETs. We propose layouts of logic gates, as well as Liberty and LEF (Library Exchange Format) files for the physical synthesis flow, and make these available under an open-source license to enable further research on these novel, functionally enhanced transistors. We develop a table model based on a transistor cell with relaxed dimensions following an SOI-based 22 nm technology with a gate pitch of 110 nm, and we model our logic gates on dual-gate RFETs. For comparison, we use the same tool flow for CMOS. In this first-of-its-kind comparison for these fully symmetrical reconfigurable transistors, we show that the area after placement and routing of SiNW-based circuits is 17% larger than that of CMOS for the MCNC benchmarks. Further, we discuss areas of improvement for obtaining better area results from SiNW-based RFETs from a fabrication and technology point of view. The future use of self-aligned techniques to structure two independent gates within a smaller pitch holds the promise of substantial area reduction.

IP2-11

ETISS-ML: A MULTI-LEVEL INSTRUCTION SET SIMULATOR WITH RTL-LEVEL FAULT INJECTION SUPPORT FOR THE EVALUATION OF CROSS-LAYER RESILIENCY TECHNIQUES
Speaker:
Martin Dittrich, Technical University of Munich, DE
Authors:
Daniel Mueller-Gritschneder¹, Martin Dittrich¹, Josef Weinzierl¹, Eric Cheng², Subhasish Mitra² and Ulf Schlichtmann¹
¹Technical University of Munich, DE; ²Stanford University, US
Abstract
ETISS is an instruction set simulator (ISS) for virtual prototypes (VPs) modeled with SystemC/TLM. In this paper, we propose the extension ETISS-ML, which enables multi-level simulation that switches between the ISS level and the register transfer level (RTL) to accurately evaluate the impact of soft errors in the pipeline of a RISC processor. ETISS-ML achieves close-to-RTL-accurate fault injection results at close-to-ISS simulation performance, with a speedup of up to 100x compared to RTL simulation. To this end, we propose an approach that dynamically determines the length of the RTL simulation period. The high simulation performance of ETISS-ML enables an ultra-efficient and accurate evaluation of cross-layer resiliency techniques for embedded applications, which requires running a large number of fault injections over long simulation scenarios. This is demonstrated in a case study of a microcontroller unit (MCU) executing a control algorithm for adaptive cruise control.
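One way to picture a dynamically sized RTL window is a loop that keeps simulating at RTL only while the injected fault still lives in microarchitectural state that the ISS cannot represent. Control returns to the ISS once the fault has been masked or has fully propagated into architectural state. The sketch below assumes two hypothetical callbacks, `step_rtl` and `microarch_state_differs` (a comparison against a fault-free reference), plus a cycle cap. It is not ETISS-ML's actual criterion.

```python
from typing import Callable


def run_rtl_window(step_rtl: Callable[[], None],
                   microarch_state_differs: Callable[[], bool],
                   max_cycles: int) -> int:
    """Simulate at RTL until the fault leaves pipeline-only state; return cycles used.

    Stops when microarch_state_differs() reports no remaining difference
    (fault masked or committed to architectural state) or after max_cycles.
    """
    cycles = 0
    while cycles < max_cycles and microarch_state_differs():
        step_rtl()
        cycles += 1
    return cycles


# Toy stand-in: the corrupted pipeline entry drains after 5 cycles.
pending = {"cycles_left": 5}


def step():
    pending["cycles_left"] -= 1


def differs():
    return pending["cycles_left"] > 0


print(run_rtl_window(step, differs, max_cycles=100))  # 5
```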

IP2-12

PRECISE EVALUATION OF THE FAULT SENSITIVITY OF OOO SUPERSCALAR PROCESSORS
Speaker:
Antonio Carlos Schneider Beck, Federal University of Rio Grande do Sul, BR
Authors:
Rafael Tonetto¹, Gabriel Luca Nazar² and Antonio Carlos Schneider Beck²
¹Federal University of Rio Grande do Sul, BR; ²Universidade Federal do Rio Grande do Sul, BR
Abstract
Since superscalar processors lead the market, evaluating their resiliency by means of fault injection is growing in importance. Fault injection strategies usually trade off accuracy against cost and flexibility: low-level hardware-based methods are accurate, but they are very expensive, require special equipment and the actual hardware, and lack controllability; high-level simulation-based strategies are flexible, fast, easily accessible, and highly controllable, but they are not accurate, since they rely on models that do not always reflect the low-level implementation, especially for complex designs such as out-of-order multiple-issue processors. In this work, we propose a cycle-accurate fault injection platform for superscalar processors with a smart checkpointing mechanism that accelerates injection, mitigating the shortcomings of the aforementioned methods while providing the same level of abstraction as detailed RTL models. Leveraging this new platform, we evaluate a complex, parameterizable out-of-order processor (BOOM) by experimenting with different issue widths and analyzing the sensitivity of several of the processor's hardware structures.
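Checkpointing speeds up fault injection because each injection run can start from the latest saved state before the injection cycle instead of from reset. The minimal sketch below assumes checkpoints are stored at known cycle numbers. It picks that checkpoint with a binary search and reports how many cycles still have to be simulated before the fault is injected. The checkpoint placement and the function name are illustrative assumptions, not the platform's actual mechanism.

```python
from bisect import bisect_right


def restore_point(checkpoint_cycles, inject_cycle):
    """Return (checkpoint_cycle, cycles_to_simulate) for one injection run.

    checkpoint_cycles must be sorted and include cycle 0 (the reset state).
    """
    idx = bisect_right(checkpoint_cycles, inject_cycle) - 1
    if idx < 0:
        raise ValueError("no checkpoint at or before the injection cycle")
    start = checkpoint_cycles[idx]
    return start, inject_cycle - start


checkpoints = [0, 1000, 2000, 3000]
print(restore_point(checkpoints, 2750))  # (2000, 750)
```

With evenly spaced checkpoints, the cycles simulated before each injection are bounded by the checkpoint interval rather than by the injection cycle itself.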

IP2-13

STREAMFTL: STREAM-LEVEL ADDRESS TRANSLATION SCHEME FOR MEMORY CONSTRAINED FLASH STORAGE
Speaker:
Dongkun Shin, Sungkyunkwan University, KR
Authors:
Hyukjoong Kim, Kyuhwa Han and Dongkun Shin, Sungkyunkwan University, KR
Abstract
Although much research effort has been devoted to reducing the size of the address mapping table, which consumes DRAM space in solid state drives (SSDs), most SSDs still use page-level mapping for high performance in their firmware, called the flash translation layer (FTL). In this paper, we propose a novel FTL scheme called StreamFTL. To reduce the size of the mapping table, StreamFTL maintains one mapping entry per stream, where a stream consists of several logical pages written to contiguous physical pages. Unlike an extent, which is used by previous FTL schemes, a stream does not require its logical pages to be contiguous. We show that StreamFTL can reduce the size of the mapping table by up to 90% compared to a page-level mapping scheme.

IP2-14

ONLINE CONCURRENT WORKLOAD CLASSIFICATION FOR MULTI-CORE ENERGY MANAGEMENT
Speaker:
Karunakar Reddy Basireddy, University of Southampton, GB
Authors:
Karunakar Reddy Basireddy¹, Amit Kumar Singh², Geoff V. Merrett¹ and Bashir M. Al-Hashimi¹
¹University of Southampton, GB; ²University of Essex, GB
Abstract
Modern embedded multi-core processors are organized as clusters of cores, where all cores in a cluster operate at a common voltage-frequency (V-f) setting. Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g., compute- and memory-intensive) depending on the instruction mix and resource sharing. Given such workload variability, runtime adaptation is key to achieving energy savings without sacrificing application performance. In this paper, we propose an online energy management technique that performs concurrent workload classification using the Memory Reads Per Instruction (MRPI) metric and proactively selects an appropriate V-f setting through workload prediction. It then monitors the workload prediction error and the performance loss, quantified by Instructions Per Second (IPS), at runtime and adjusts the chosen V-f setting to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.
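The classify, predict, and compensate loop described in this abstract can be sketched as follows, under explicit assumptions: the MRPI thresholds, the frequency table, and the exponentially weighted moving average used as the predictor are hypothetical choices, not the paper's calibrated values or prediction model. The underlying intuition is that memory-intensive (high-MRPI) workloads lose little performance at lower frequencies, while the IPS check raises the frequency when the prediction turns out to be too aggressive.

```python
# Hypothetical cluster frequency levels, lowest to highest.
FREQ_LEVELS_MHZ = [800, 1200, 1600, 2000]
# Hypothetical MRPI thresholds: above 0.05 is treated as strongly memory-bound.
MRPI_THRESHOLDS = [0.05, 0.02, 0.01]


def mrpi(memory_reads, instructions):
    """Memory Reads Per Instruction for one sampling interval."""
    return memory_reads / instructions


def predict_mrpi(previous_prediction, observed, alpha=0.5):
    """Exponentially weighted moving average as a simple workload predictor."""
    return alpha * observed + (1 - alpha) * previous_prediction


def select_level(predicted_mrpi):
    """Map predicted MRPI to a frequency level index (more memory-bound -> lower)."""
    for level, threshold in enumerate(MRPI_THRESHOLDS):
        if predicted_mrpi > threshold:
            return level
    return len(FREQ_LEVELS_MHZ) - 1


def compensate(level, measured_ips, target_ips):
    """Step the frequency up one level if the measured IPS falls short of the target."""
    if measured_ips < target_ips and level < len(FREQ_LEVELS_MHZ) - 1:
        return level + 1
    return level


pred = predict_mrpi(previous_prediction=0.02, observed=mrpi(4_000_000, 100_000_000))
level = select_level(pred)                                     # MRPI ~0.03 -> level 1
level = compensate(level, measured_ips=0.9e9, target_ips=1.0e9)  # IPS short -> level 2
print(FREQ_LEVELS_MHZ[level])  # 1600
```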

IP2-15

AIM: FAST AND ENERGY-EFFICIENT AES IN-MEMORY IMPLEMENTATION FOR EMERGING NON-VOLATILE MAIN MEMORY
Speaker:
Jingtong Hu, University of Pittsburgh, US
Authors:
Mimi Xie¹, Shuangchen Li², Alvin Glova², Jingtong Hu¹, Yuangang Wang³ and Yuan Xie²
¹University of Pittsburgh, US; ²University of California, Santa Barbara, US; ³Huawei Technologies, China, CN
Abstract
Non-volatile main memory-based systems give an attacker the opportunity to readily access sensitive information in memory because of the memory's long retention time. While real-time memory encryption with a dedicated AES engine can address this vulnerability, it incurs extra performance and energy overheads. As an alternative, we propose an AES in-memory implementation, AIM, that encrypts all or part of the memory only when necessary. We leverage the benefits offered by the in-memory computing architecture to address the challenges of this bandwidth-intensive encryption application, taking advantage of NVM's intrinsic logic operation capability to implement the AES task. By exploiting the massive parallelism inside the memory, AIM outperforms existing mechanisms with higher throughput and lower energy consumption. Compared with a state-of-the-art AES engine running at 2.1 GHz, AIM can speed up the encryption process by 80 times for a 1 GB NVM.

IP2-16

SAT-BASED BIT-FLIPPING ATTACK ON LOGIC ENCRYPTIONS
Speaker:
Hai Zhou, Northwestern University, US
Authors:
Yuanqi Shen, Amin Rezaei and Hai Zhou, Northwestern University, US
Abstract
Logic encryption is a hardware security technique that uses extra key inputs to prevent unauthorized use of a circuit. Following the discovery of the SAT-based attack, new encryption techniques such as SARLock and Anti-SAT have been proposed and further combined with traditional logic encryption techniques to guarantee both high error rates and resilience to the SAT-based attack. In this paper, we present the SAT-based bit-flipping attack. It first separates the two groups of keys via SAT-based bit flipping, and then attacks the traditional encryption and the SAT-resilient encryption with the conventional SAT-based attack and a bypassing attack, respectively. Experimental results show that the bit-flipping attack successfully returns a circuit with the correct functionality and significantly reduces execution time compared with other advanced attacks.

IP2-17

AMS VERIFICATION METHODOLOGY REGARDING SUPPLY MODULATION IN RF SOCS INDUCED BY DIGITAL STANDARD CELLS
Speaker:
Fabian Speicher, RWTH Aachen University, DE
Authors:
Fabian Speicher, Jonas Meier, Soheil Aghaie, Ralf Wunderlich and Stefan Heinen, RWTH Aachen University, DE
Abstract
Nanoscale CMOS both enables and forces the use of digital-centric RF architectures, in which timing resolution takes the place of analog resolution. At the same time, digital circuits act as aggressors that endanger the performance of the time-continuous digital and analog parts. The switching activity of logic cells results in power supply variations, which lead to jitter in the digital signal paths and cause interference that couples into the analog paths, appearing as, e.g., phase noise, crosstalk, or unwanted frequency conversion. Since today's commonly used AMS simulation methods are limited to register-transfer level (RTL) models for the digital domain, the electrical behavior caused by digital switching is not considered. Here, we present a method for modeling logic cells with regard to power supply noise using the available characterization data of a standard cell library. It covers the influence of switching on the supply voltage, as well as the influence of supply variations on the digital path delay and their feedthrough to blocks of the RF domain. This enables a fast event-driven simulation of an entire AMS system that accounts for these effects. The method is demonstrated on a digital-centric transmitter to detect the effects at the system level.
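To illustrate how characterization data can expose a cell's delay to supply noise, the sketch below linearly interpolates a propagation delay between supply-voltage corners. This is one simple way to let an event-driven simulator update gate delays as the local supply fluctuates. The corner voltages and delay values are hypothetical placeholders rather than data from any real library, and linear interpolation between corners is an assumption, not necessarily the model used in this work.

```python
from bisect import bisect_left

# Hypothetical characterization corners for one cell arc: supply (V) -> delay (ps).
VDD_CORNERS = [0.90, 1.00, 1.10]
DELAY_PS = [30.0, 25.0, 21.0]


def delay_at_supply(vdd):
    """Linearly interpolate the delay at supply vdd, clamping outside the corners."""
    if vdd <= VDD_CORNERS[0]:
        return DELAY_PS[0]
    if vdd >= VDD_CORNERS[-1]:
        return DELAY_PS[-1]
    i = bisect_left(VDD_CORNERS, vdd)  # VDD_CORNERS[i-1] < vdd <= VDD_CORNERS[i]
    v0, v1 = VDD_CORNERS[i - 1], VDD_CORNERS[i]
    d0, d1 = DELAY_PS[i - 1], DELAY_PS[i]
    return d0 + (d1 - d0) * (vdd - v0) / (v1 - v0)


# A supply droop from 1.00 V to 0.95 V lengthens this arc's delay,
# which appears as jitter on the signal edge passing through the cell.
nominal = delay_at_supply(1.00)
drooped = delay_at_supply(0.95)
print(nominal, drooped)
```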