IP1 Interactive Presentations

Printer-friendly version PDF version

Date: Tuesday 10 March 2020
Time: 16:00 - 16:30
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

LabelPresentation Title
Authors
IP1-1DYNUNLOCK: UNLOCKING SCAN CHAINS OBFUSCATED USING DYNAMIC KEYS
Speaker:
Nimisha Limaye, New York University, US
Authors:
Nimisha Limaye1 and Ozgur Sinanoglu2
1New York University, US; 2New York University Abu Dhabi, AE
Abstract
Outsourcing in semiconductor industry opened up venues for faster and cost-effective chip manufacturing. However, this also introduced untrusted entities with malicious intent, to steal intellectual property (IP), overproduce the circuits, insert hardware Trojans, or counterfeit the chips. Recently, a defense is proposed to obfuscate the scan access based on a dynamic key that is initially generated from a secret key but changes in every clock cycle. This defense can be considered as the most rigorous defense among all the scan locking techniques. In this paper, we propose an attack that remodels this defense into one that can be broken by the SAT attack, while we also note that our attack can be adjusted to break other less rigorous (key that is updated less frequently) scan locking techniques as well.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-2CMOS IMPLEMENTATION OF SWITCHING LATTICES
Speaker:
Levent Aksoy, Istanbul TU, TR
Authors:
Ismail Cevik, Levent Aksoy and Mustafa Altun, Istanbul TU, TR
Abstract
Switching lattices consisting of four-terminal switches are introduced as area-efficient structures to realize logic functions. Many optimization algorithms have been proposed, including exact ones, realizing logic functions on lattices with the fewest number of four-terminal switches, as well as heuristic ones. Hence, the computing potential of switching lattices has been justified adequately in the literature. However, the same thing cannot be said for their physical implementation. There have been conceptual ideas for the technology development of switching lattices, but no concrete and directly applicable technology has been proposed yet. In this study, we show that switching lattices can be directly and efficiently implemented using a standard CMOS process. To realize a given logic function on a switching lattice, we propose static and dynamic logic solutions. The proposed circuits as well as the compared conventional ones are designed and simulated in the Cadence environment using TSMC 65nm CMOS process. Experimental post layout results on logic functions show that switching lattices occupy much smaller area than those of the conventional CMOS implementations, while they have competitive delay and power consumption values.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-3A TIMING UNCERTAINTY-AWARE CLOCK TREE TOPOLOGY GENERATION ALGORITHM FOR SINGLE FLUX QUANTUM CIRCUITS
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Soheil Nazar Shahsavani, Bo Zhang and Massoud Pedram, University of Southern California, US
Abstract
This paper presents a low-cost, timing uncertainty-aware synchronous clock tree topology generation algorithm for single flux quantum (SFQ) logic circuits. The proposed method considers the criticality of the data paths in terms of timing slacks as well as the total wirelength of the clock tree and generates a (height-) balanced binary clock tree using a bottom-up approach and an integer linear programming (ILP) formulation. The statistical timing analysis results for ten benchmark circuits show that the proposed method improves the total wirelength and the total negative hold slack by 4.2% and 64.6%, respectively, on average, compared with a wirelength-driven state-of-the-art balanced topology generation approach.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-4SYMMETRY-BASED A/M-S BIST (SYMBIST): DEMONSTRATION ON A SAR ADC IP
Speaker:
Antonios Pavlidis, Sorbonne Université, CNRS, LIP6, FR
Authors:
Antonios Pavlidis1, Marie-Minerve Louerat1, Eric Faehn2, Anand Kumar3 and Haralampos-G. Stratigopoulos1
1Sorbonne Université, CNRS, LIP6, FR; 2STMicroelectronics, FR; 3STMicroelectronics, IN
Abstract
In this paper, we propose a defect-oriented Built-In Self-Test (BIST) paradigm for analog and mixed-signal (A/MS) Integrated Circuits (ICs), called symmetry-based BIST (Sym-BIST). SymBIST exploits inherent symmetries into the design to generate invariances that should hold true only in defect-free operation. Violation of any of these invariances points to defect detection. We demonstrate SymBIST on a 65nm 10-bit Successive Approximation Register (SAR) Analog-to-Digital Converter (ADC) IP by ST Microelectronics. SymBIST does notresult in any performance penalty, it incurs an area overhead of less than 5%, the test time equals about 16x the time to convert an analog input sample, it can be interfaced with a 2-pin digital access mechanism, and it covers the entire A/M-S part of the IP achieving a likelihood-weighted defect coverage higher than 85%.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-5RANGE CONTROLLED FLOATING-GATE TRANSISTORS: A UNIFIED SOLUTION FOR UNLOCKING AND CALIBRATING ANALOG ICS
Speaker:
Yiorgos Makris, University of Texas at Dallas, US
Authors:
Sai Govinda Rao Nimmalapudi, Georgios Volanis, Yichuan Lu, Angelos Antonopoulos, Andrew Marshall and Yiorgos Makris, University of Texas at Dallas, US
Abstract
Analog Floating-Gate Transistors (AFGTs) are commonly used to fine-tune the performance of analog integrated circuits (ICs) after fabrication, thereby enabling high yield despite component mismatch and variability in semiconductor manufacturing. In this work, we propose a methodology that leverages such AFGTs to also prevent unauthorized use of analog ICs. Specifically, we introduce a locking mechanism that limits programming of AFGTs to a range which is inadequate for achieving the desired analog performance. Accordingly, our solution entails a two-step unlock-&-calibrate process. In the first step, AFGTs must be programmed through a secret sequence of voltages within that range, called waypoints. Successfully following the waypoints unlocks the ability to program the AFGTs over their entire range. Thereby, in the second step, the typical AFGT-based post-silicon calibration process can be applied to adjust the performance of the IC within its specifications. Protection against brute-force or intelligent attacks attempting to guess the unlocking sequence is ensured through the vast space of possible waypoints in the continuous (analog) domain. Feasibility and effectiveness of the proposed solution is demonstrated and evaluated on an Operational Transconductance Amplifier (OTA). To our knowledge, this is the first solution which leverages the power of analog keys and addresses both unlocking and calibration needs of analog ICs in a unified manner.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-6TESTING THROUGH SILICON VIAS IN POWER DISTRIBUTION NETWORK OF 3D-IC WITH MANUFACTURING VARIABILITY CANCELLATION
Speaker:
Koutaro Hachiya, Teikyo Heisei University, JP
Authors:
Koutaro Hachiya1 and Atsushi Kurokawa2
1Teikyo Heisei University, JP; 2Hirosaki University, JP
Abstract
To detect open defects of power TSVs (Through Silicon Vias) in PDNs (Power Distribution Networks) of stacked 3D-ICs, a method was proposed which measures resistances between power micro-bumps connected to PDN and detects defects of TSVs by changes of the resistances. It suffers from manufacturing variabilities and must place one micro-bump directly under each TSV (direct-type placement style) to maximize its diagnostic performance, but the performance was not enough for practical applications. A variability cancellation method was also devised to improve the diagnostic performance. In this paper, a novel middle-type placement style is proposed which places one micro-bump between each pair of TSVs. Experimental simulations using a 3D-IC example show that the diagnostic performances of both the direct-type and the middle-type examples are improved by the variability cancellation and reach the practical level. The middle-type example outperforms the direct-type example in terms of number of micro-bumps and number of measurements.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-7TFAPPROX: TOWARDS A FAST EMULATION OF DNN APPROXIMATE HARDWARE ACCELERATORS ON GPU
Speaker:
Zdenek Vasicek, Brno University of Technology, CZ
Authors:
Filip Vaverka, Vojtech Mrazek, Zdenek Vasicek and Lukas Sekanina, Brno University of Technology, CZ
Abstract
Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on CPU or GPU and operating with standard floating point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs and these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for approximate circuits utilized in a given DNN accelerator which is emulated on GPU. All relevant approximate circuits are implemented as look-up tables and accessed through a texture memory mechanism of CUDA capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-8BINARY LINEAR ECCS OPTIMIZED FOR BIT INVERSION IN MEMORIES WITH ASYMMETRIC ERROR PROBABILITIES
Speaker:
Valentin Gherman, CEA, FR
Authors:
Valentin Gherman, Samuel Evain and Bastien Giraud, CEA, FR
Abstract
Many memory types are asymmetric with respect to the error vulnerability of stored 0's and 1's. For instance, DRAM, STT-MRAM and NAND flash memories may suffer from asymmetric error rates. A recently proposed error-protection scheme consists in the inversion of the memory words with too many vulnerable values before they are stored in an asymmetric memory. In this paper, a method is pro-posed for the optimization of systematic binary linear block error-correcting codes in order to maximize their impact when combined with memory word inversion.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-9BELDPC: BIT ERRORS AWARE ADAPTIVE RATE LDPC CODES FOR 3D TLC NAND FLASH MEMORY
Speaker:
Meng Zhang, Huazhong University of Science & Technology, CN
Authors:
Meng Zhang, Fei Wu, Qin Yu, Weihua Liu, Lanlan Cui, Yahui Zhao and Changsheng Xie, Huazhong University of Science & Technology, CN
Abstract
Three-dimensional (3D) NAND flash memory has high capacity and cell storage density by using the multi-bit technology and vertical stack architecture, but degrading data reliability due to high raw bit error rates (RBER) caused by program/erase (P/E) cycles and retention periods. Low-density parity-check (LDPC) codes become more popular error-correcting technologies to improve data reliability due to strong error correction capability, but introducing more decoding iterations at higher RBER. To reduce decoding iterations, this paper proposes BeLDPC: bit errors aware adaptive rate LDPC codes for 3D triple-level cell (TLC) NAND flash memory. Firstly, bit error characteristics in 3D charge trap TLC NAND flash memory are studied on a real FPGA testing platform, including asymmetric bit flipping and temporal locality of bit errors. Then, based on these characteristics, a high-efficiency LDPC code is designed. Experimental results show BeLDPC can reduce decoding iterations under different P/E cycles and retention periods.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-10POISONING THE (DATA) WELL IN ML-BASED CAD: A CASE STUDY OF HIDING LITHOGRAPHIC HOTSPOTS
Speaker:
Kang Liu, New York University, US
Authors:
Kang Liu, Benjamin Tan, Ramesh Karri and Siddharth Garg, New York University, US
Abstract
Machine learning (ML) provides state-of-the-art performance in many parts of computer-aided design (CAD) flows. However, deep neural networks (DNNs) are susceptible to various adversarial attacks, including data poisoning to compromise training to insert backdoors. Sensitivity to training data integrity presents a security vulnerability, especially in light of malicious insiders who want to cause targeted neural network misbehavior. In this study, we explore this threat in lithographic hotspot detection via training data poisoning, where hotspots in a layout clip can be "hidden" at inference time by including a trigger shape in the input. We show that training data poisoning attacks are feasible and stealthy, demonstrating a backdoored neural network that performs normally on clean inputs but misbehaves on inputs when a backdoor trigger is present. Furthermore, our results raise some fundamental questions about the robustness of ML-based systems in CAD.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-11SOLOMON: AN AUTOMATED FRAMEWORK FOR DETECTING FAULT ATTACK VULNERABILITIES IN HARDWARE
Speaker:
Milind Srivastava, IIT Madras, IN
Authors:
Milind Srivastava1, PATANJALI SLPSK1, Indrani Roy1, Chester Rebeiro1, Aritra Hazra2 and Swarup Bhunia3
1IIT Madras, IN; 2IIT Kharagpur, IN; 3University of Florida, US
Abstract
Fault attacks are potent physical attacks on crypto-devices. A single fault injected during encryption can reveal the cipher's secret key. In a hardware realization of an encryption algorithm, only a tiny fraction of the gates is exploitable by such an attack. Finding these vulnerable gates has been a manual and tedious task requiring considerable expertise. In this paper, we propose SOLOMON, the first automatic fault attack vulnerability detection framework for hardware designs. Given a cipher implementation, either at RTL or gate-level, SOLOMON uses formal methods to map vulnerable regions in the cipher algorithm to specific locations in the hardware thus enabling targeted countermeasures to be deployed with much lesser overheads. We demonstrate the efficacy of the SOLOMON framework using three ciphers: AES, CLEFIA, and Simon.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-12FORMAL SYNTHESIS OF MONITORING AND DETECTION SYSTEMS FOR SECURE CPS IMPLEMENTATIONS
Speaker:
Ipsita Koley, IIT Kharagpur, IN
Authors:
Ipsita Koley1, Saurav Kumar Ghosh1, Dey Soumyajit1, Debdeep Mukhopadhyay1, Amogh Kashyap K N2, Sachin Kumar Singh2, Lavanya Lokesh2, Jithin Nalu Purakkal2 and Nishant Sinha2
1IIT Kharagpur, IN; 2Robert Bosch Engineering and Business Solutions Private Limited, IN
Abstract
We consider the problem of securing a given control loop implementation of a cyber-physical system (CPS) in the presence of Man-in-the-Middle attacks on data exchange between plant and controller over a compromised network. To this end, there exists various detection schemes which provide mathematical guarantees against such attacks for the theoretical control model. However, such guarantees may not hold for the actual control software implementation. In this article, we propose a formal approach towards synthesizing attack detectors with varying thresholds which can prevent performance degrading stealthy attacks while minimizing false alarms.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-13ASCELLA: ACCELERATING SPARSE COMPUTATION BY ENABLING STREAM ACCESSES TO MEMORY
Speaker:
Bahar Asgari, Georgia Tech, US
Authors:
Bahar Asgari, Ramyad Hadidi and Hyesoon Kim, Georgia Tech, US
Abstract
Sparse computations dominate a wide range of applications from scientific problems to graph analytics. The main characterization of sparse computations, indirect memory accesses, prevents them from effectively achieving high performance on general-purpose processors. Therefore, hardware accelerators have been proposed for sparse problems. For these accelerators, the storage format and the decompression mechanism is crucial but have seen less attention in prior work. To address this gap, we propose Ascella, an accelerator for sparse computations, which besides enabling a smooth stream of data and parallel computation, proposes a fast decompression mechanism. Our implementation on a ZYNQ FPGA shows that on average, Ascella executes sparse problems up to 5.1x as fast as prior work.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-14ACCELERATION OF PROBABILISTIC REASONING THROUGH CUSTOM PROCESSOR ARCHITECTURE
Speaker:
Nimish Shah, KU Leuven, BE
Authors:
Nimish Shah, Laura I. Galindez Olascoaga, Wannes Meert and Marian Verhelst, KU Leuven, BE
Abstract
Probabilistic reasoning is an essential tool for robust decision-making systems because of its ability to explicitly handle real-world uncertainty, constraints and causal relations. Consequently, researchers are developing hybrid models by combining Deep Learning with Probabilistic reasoning for safety-critical applications like self-driving vehicles, autonomous drones, etc. However, probabilistic reasoning kernels do not execute efficiently on CPUs or GPUs. This paper, therefore, proposes a custom programmable processor to accelerate sum-product networks, an important probabilistic reasoning execution kernel. The processor has an optimized datapath architecture and memory hierarchy optimized for sum-product networks execution. Experimental results show that the processor, while requiring fewer computational and memory units, achieves a 12x throughput benefit over the Nvidia Jetson TX2 embedded GPU platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-15A PERFORMANCE ANALYSIS FRAMEWORK FOR REAL-TIME SYSTEMS SHARING MULTIPLE RESOURCES
Speaker:
Shayan Tabatabaei Nikkhah, Eindhoven University of Technology, NL
Authors:
Shayan Tabatabaei Nikkhah1, Marc Geilen1, Dip Goswami1 and Kees Goossens2
1Eindhoven University of Technology, NL; 2Eindhoven university of technology, NL
Abstract
Timing properties of applications strongly depend on resources that are allocated to them. Applications often have multiple resource requirements, all of which must be met for them to proceed. Performance analysis of event-based systems has been widely studied in the literature. However, the proposed works consider only one resource requirement for each application task. Additionally, they mainly focus on the rate at which resources serve applications (e.g., power, instructions or bits per second), but another aspect of resources, which is their provided capacity (e.g., energy, memory ranges, FPGA regions), has been ignored. In this work, we propose a mathematical framework to describe the provisioning rate and capacity of various types of resource. Additionally, we consider the simultaneous use of multiple resources. Conservative bounds on response times of events and their backlog are computed. We prove that the bounds are monotone in event arrivals and in required and provided rate and capacity, which enables verification of real-time application performance based on worst-case characterizations. The applicability of our framework is shown in a case study.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-16SCALING UP THE MEMORY INTERFERENCE ANALYSIS FOR HARD REAL-TIME MANY-CORE SYSTEMS
Speaker:
Matheus Schuh, Verimag / Kalray, FR
Authors:
Matheus Schuh1, Maximilien Dupont de Dinechin2, Matthieu Moy3 and Claire Maiza4
1Verimag / Kalray, FR; 2ENS Paris / ENS Lyon / LIP, FR; 3ENS Lyon / LIP, FR; 4Grenoble INP / Verimag, FR
Abstract
In RTNS 2016, Rihani et al. proposed an algorithm to compute the impact of interference on memory accesses on the timing of a task graph. It calculates a static, time-triggered schedule, i.e. a release date and a worst-case response time for each task. The task graph is a DAG, typically obtained by compilation of a high-level dataflow language, and the tool assumes a previously determined mapping and execution order. The algorithm is precise, but suffers from a high O(n^4) complexity, n being the number of input tasks. Since we target many-core platforms with tens or hundreds of cores, applications likely to exploit the parallelism of these platforms are too large to be handled by this algorithm in reasonable time. This paper proposes a new algorithm that solves the same problem. Instead of performing global fixed-point iterations on the task graph, we compute the static schedule incrementally, reducing the complexity to O(n^2). Experimental results show a reduction from 535 seconds to 0.90 seconds on a benchmark with 384 tasks, i.e. 593 times faster.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-17LIGHTWEIGHT ANONYMOUS ROUTING IN NOC BASED SOCS
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Subodha Charles, Megan Logan and Prabhat Mishra, University of Florida, US
Abstract
System-on-Chip (SoC) supply chain is widely acknowledged as a major source of security vulnerabilities. Potentially malicious third-party IPs integrated on the same Network-on-Chip (NoC) with the trusted components can lead to security and trust concerns. While secure communication is a well studied problem in computer networks domain, it is not feasible to implement those solutions on resource-constrained SoCs. In this paper, we present a lightweight anonymous routing protocol for communication between IP cores in NoC based SoCs. Our method eliminates the major overhead associated with traditional anonymous routing protocols while ensuring that the desired security goals are met. Experimental results demonstrate that existing security solutions on NoC can introduce significant (1.5X) performance degradation, whereas our approach provides the same security features with minor (4%) impact on performance.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-18A NON-INVASIVE WEARABLE BIOIMPEDANCE SYSTEM TO WIRELESSLY MONITOR BLADDER FILLING
Speaker:
Michele Magno, ETH Zurich, CH
Authors:
Markus Reichmuth, Simone Schuerle and Michele Magno, ETH Zurich, CH
Abstract
Monitoring of renal function can be crucial for patients in acute care settings. Commonly during postsurgical surveillance, urinary catheters are employed to assess the urine output accurately. However, as with any external device inserted into the body, the use of these catheters carries a significant risk of infection. In this paper, we present a non-invasive method to measure the fill rate of the bladder, and thus the rate of renal clearance, via an external bioimpedance sensor system to avoid the use of urinary catheters, thereby eliminating the risk of infections and improving patient comfort. We design and propose a 4-electrode front-end and the whole wearable and wireless system with low power and accuracy in mind. The results demonstrate the accuracy of the sensors and low power consumption of only 80µW with a duty cycling of 1 acquisition every 5 minutes, which makes this battery-operated wearable device a long-term monitor system.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP1-19INFINIWOLF: ENERGY EFFICIENT SMART BRACELET FOR EDGE COMPUTING WITH DUAL SOURCE ENERGY HARVESTING
Speaker:
Michele Magno, ETH Zurich, CH
Authors:
Michele Magno1, Xiaying Wang1, Manuel Eggimann1, Lukas Cavigelli1 and Luca Benini2
1ETH Zurich, CH; 2Università di Bologna and ETH Zurich, IT
Abstract
This work presents InfiniWolf, a novel multi-sensor smartwatch that can achieve self-sustainability exploiting thermal and solar energy harvesting, performing computationally high demanding tasks. The smartwatch embeds both a System-on-Chip (SoC) with an ARM Cortex-M processor and Bluetooth Low Energy (BLE) and Mr. Wolf, an open-hardware RISC-V based parallel ultra-low-power processor that boosts the processing capabilities on board by more than one order of magnitude, while also increasing energy efficiency. We demonstrate its functionality based on a sample application scenario performing stress detection with multi-layer artificial neural networks on a wearable multi-sensor bracelet. Experimental results show the benefits in terms of energy efficiency and latency of Mr. Wolf over an ARM Cortex-M4F micro-controllers and the possibility, under specific assumptions, to be self-sustainable using thermal and solar energy harvesting while performing up to 24 stress classifications per minute in indoor conditions.

Download Paper (PDF; Only available from the DATE venue WiFi)