IP2 Interactive Presentations

Label	Presentation Title Authors
IP2-1	SAMPLING FROM DISCRETE DISTRIBUTIONS IN COMBINATIONAL HARDWARE WITH APPLICATION TO POST-QUANTUM CRYPTOGRAPHY Speaker: Michael Lyons, George Mason University, US Authors: Michael Lyons and Kris Gaj, George Mason University, US Abstract Random values from discrete distributions are typically generated from uniformly-random samples. A common technique is to use a cumulative distribution table (CDT) lookup for inversion sampling, but it is also possible to use Boolean functions to map a uniformly-random bit sequence into a value from a discrete distribution. This work presents a methodology for deriving such functions for any discrete distribution, encoding them in VHDL for implementation in combinational hardware, and (for moderate precision and sample space size) confirming the correctness of the produced distribution. The process is demonstrated using a discrete Gaussian distribution with a small sample space, but it is applicable to any discrete distribution with fixed parameters. Results are presented for sampling schemes from several submissions to the NIST PQC standardization process, comparing this method to CDT lookups on a Xilinx Artix-7 FPGA. The process produces compact solutions for distributions up to moderate size and precision. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-2	ON THE PERFORMANCE OF NON-PROFILED DIFFERENTIAL DEEP LEARNING ATTACKS AGAINST AN AES ENCRYPTION ALGORITHM PROTECTED USING A CORRELATED NOISE HIDING COUNTERMEASURE Speaker: Amir Alipour, Grenoble INP Esisar, FR Authors: Amir Alipour¹, Athanasios Papadimitriou², Vincent Beroulle³, Ehsan Aerabi³ and David Hely³ ¹University Grenoble Alpes, Grenoble INP ESISAR, LCIS Laboratory, FR; ²University Grenoble Alpes, Grenoble INP ESISAR, ESYNOV, FR; ³University Grenoble Alpes, Grenoble INP ESISAR, LSIC Laboratory, FR Abstract Recent works in the field of cryptography focus on Deep Learning based Side Channel Analysis (DLSCA) as one of the most powerful attacks against common encryption algorithms such as AES. As a common case, profiling DLSCA have shown great capabilities in revealing secret cryptographic keys against the majority of AES implementations. In a very recent study, it has been shown that Deep Learning can be applied in a non-profiling way (non-profiling DLSCA), making this method considerably more practical, and able to break powerful countermeasures for encryption algorithms such as AES including masking countermeasures, requiring considerably less power traces than a first order CPA attack. In this work, our main goal is to apply the non-profiling DLSCA against a hiding-based AES countermeasure which utilizes correlated noise generation so as to hide the secret encryption key. We show that this AES, with correlated noise generation as a lightweight countermeasure, can provide equivalent protection under CPA and under non- profiling DLSCA attacks, in terms of the required power traces to obtain the secret key. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-3	FAST AND ACCURATE PERFORMANCE EVALUATION FOR RISC-V USING VIRTUAL PROTOTYPES Speaker: Vladimir Herdt, University of Bremen, DE Authors: Vladimir Herdt¹, Daniel Grosse² and Rolf Drechsler² ¹University of Bremen, DE; ²University of Bremen / DFKI, DE Abstract RISC-V is gaining huge popularity in particular for embedded systems. Recently, a SystemC-based Virtual Prototype (VP) has been open sourced to lay the foundation for providing support for system-level use cases such as design space exploration, analysis of complex HW/SW interactions and power/timing/performance validation for RISC-V based systems. In this paper, we propose an efficient core timing model and integrate it into the VP core to enable fast and accurate performance evaluation for RISC-V based systems. As a case-study we provide a timing configuration matching the RISC-V HiFive1 board from SiFive. Our experiments demonstrate that our approach allows to obtain very accurate performance evaluation results while still retaining a high simulation performance. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-4	AUTOMATED GENERATION OF LTL SPECIFICATIONS FOR SMART HOME IOT USING NATURAL LANGUAGE Speaker: Shiyu Zhang, Nanjing University, CN Authors: Shiyu Zhang¹, Juan Zhai¹, Lei Bu¹, Mingsong Chen², Linzhang Wang¹ and Xuandong Li¹ ¹Nanjing University, CN; ²East China Normal University, CN Abstract Ordinary inexperienced users can build their smart home IoT system easily nowadays, but such user-customized systems could be error-prone. Using formal verification to prove the correctness of such systems is necessary. However, to conduct formal proof, formal specifications such as Linear Temporal Logic (LTL) formulas have to be provided, but ordinary users cannot author LTL formulas but only natural language. To address this problem, this paper presents a novel approach that can automatically generate formal LTL specifications from natural language requirements based on domain knowledge and our proposed ambiguity refining techniques. Experimental results show that our approach can achieve a high correctness rate of 95.4% in converting natural language sentences into LTL formulas from 481 requirements of real examples. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-5	A HEAT-RECIRCULATION-AWARE VM PLACEMENT STRATEGY FOR DATA CENTERS Authors: Hao Feng¹, Yuhui Deng² and Yi Zhou³ ¹Jinan University, CN; ²Chinese Academy of Sciences; Jinan University, CN; ³Columbus State University, US Abstract Data centers consisted of a great number of IT devices (e.g., servers, switches and etc.) which generates a massive amount of heat emission. Due to the special arrangement of racks in the data center, heat recirculation often occurs between nodes. It can cause a sharp rise in temperature of the equipment coupled with local hot spots in data centers. Existing VM placement strategies can minimize energy consumption of data centers by optimizing resource allocation in terms of multiple physical resources (e.g., memory, bandwidth, cpu and etc.). However, existing strategies ignore the role of heat recirculation in the data center. To address this problem, in this study, we propose a heat-recirculation-aware VM placement strategy and design a Simulated Annealing Based Algorithm (SABA) to lower the energy consumption of data centers. Different from the existing SA algorithm, SABA optimize the distribution of the initial solution and the way of iteration. We quantitatively evaluate SABA's performance in terms of algorithm efficiency, the activated servers and the energy saving against with XINT-GA algorithm (Thermal-aware task scheduling Strategy), FCFS (First-Come First-Served), and SA. Experimental results indicate that our heat-recirculation-aware VM placement strategy provides a powerful solution for improving energy efficiency of data centers. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-6	ENERGY OPTIMIZATION IN NCFET-BASED PROCESSORS Authors: Sami Salamin¹, Martin Rapp¹, Hussam Amrouch¹, Andreas Gerstlauer² and Joerg Henkel¹ ¹Karlsruhe Institute of Technology, DE; ²University of Texas at Austin, US Abstract Energy consumption is a key optimization goal for all modern processors. Negative Capacitance Field-Effect Transistors (NCFETs) are a leading emerging technology that promises outstanding performance in addition to better energy efficiency. The thickness of the additional ferroelectric layer, frequency, and voltage are the key parameters in NCFET technology that impact the power and frequency of processors. However, their joint impact on energy optimization has not been investigated yet. In this work, we are the first to demonstrate that conventional (i.e., NCFET-unaware) dynamic voltage/frequency scaling (DVFS) techniques to minimize energy are sub-optimal when applied to NCFET-based processors. We further demonstrate that state-of-the-art NCFET-aware voltage scaling for power minimization is also sub-optimal when it comes to energy. This work provides the first NCFET-aware DVFS technique that optimizes the processor's energy through optimal runtime frequency/voltage selection. In NCFETs, energy-optimal frequency and voltage are dependent on the workload and technology parameters. Our NCFET-aware DVFS technique considers these effects to perform optimal voltage/frequency selection at runtime depending on workload characteristics. Results show up to 90 % energy savings compared to conventional DVFS techniques. Compared to state-of-the-art NCFET-aware power management, our technique provides up to 72 % energy savings along with 3:7x higher performance. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-7	TOWARDS A MODEL-BASED MULTI-OBJECTIVE OPTIMIZATION APPROACH FOR SAFETY-CRITICAL REAL-TIME SYSTEMS Speaker: Emmanuel Grolleau, LIAS / ISAE-ENSMA, FR Authors: Soulimane Kamni¹, Yassine OUHAMMOU², Antoine Bertout³ and Emmanuel Grolleau⁴ ¹LIAS/ENSMA, FR; ²LIAS / ISAE-ENSMA, FR; ³LIAS, Université de Poitiers, ISAE-ENSMA, FR; ⁴LIAS, ISAE-ENSMA, Universite de Poitiers, FR Abstract In safety-critical real-time systems domain, obtaining the appropriate operational model which meets the temporal (e.g. deadlines) and business (e.g. redundancy) requirements while being optimal in terms of several metrics is a primordial process in the design life-cycle. Recently, several researches have proposed to explore cross-domain trade-offs for a higher behaviour performance. Indeed, this process represents the first step in the deployment phase, which is very sensitive because it could be error-prone and time consuming. This paper is a work in progress proposing an approach aiming to help real-time system architects to take benefit from existing works, overcome their limits, and capitalize the efforts. Furthermore, the approach is based on the model-driven engineering paradigm and suggests to ease the usage of methods and tools thanks to repositories gathering them as a sort of a shared knowledge. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-8	CURRENT-MODE CARRY-FREE MULTIPLIER DESIGN USING A MEMRISTOR-TRANSISTOR CROSSBAR ARCHITECTURE Speaker: Shengqi Yu, Newcastle University, GB Authors: Shengqi Yu¹, Ahmed Soltan², Rishad Shafik¹, Thanasin Bunnam¹, Domenico Balsamo¹, Fei Xia¹ and Alex Yakovlev¹ ¹Newcastle University, GB; ²Nile University, EG Abstract Traditional multipliers consist of complex logic components. They are a major energy and performance contributor of modern compute-intensive applications. As such, designing multipliers with reduced energy and faster speed has remained a thoroughgoing challenge. This paper presents a novel, carry-free multiplier, which is suitable for new-generation of energy-constrained applications. The multiplier circuit consists of an array of memristor-transistor cells that can be selected (i.e., turned ON or OFF) using a combination of DC bias voltages based on the operand values. When a cell is selected it contributes to current in the array path, which is then amplified by current mirrors with variable transistor gate sizes. The different current paths are connected to a node for analogously accumulating the currents to produce the multiplier output directly, removing the carry propagation stages, typically seen in traditional digital multipliers. An essential feature of this multiplier is autonomous survivability, i.e., when the power is below this threshold the logic state automatically retains at a zero-cost due to the non-volatile properties of memristors. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-9	N-BIT DATA PARALLEL SPIN WAVE LOGIC GATE Speaker: Abdulqader Mahmoud, TU Delft, NL Authors: Abdulqader Mahmoud¹, Frederic Vanderveken², Florin Ciubotaru², Christoph Adelmann², Sorin Cotofana¹ and Said Hamdioui¹ ¹TU Delft, NL; ²IMEC, BE Abstract Due to their very nature, Spin Waves (SWs) created in the same waveguide, but with different frequencies, can coexist while selectively interacting with their own species only. The absence of inter-frequency interferences isolates input data sets encoded in SWs with different frequencies and creates the premises for simultaneous data parallel SW based processing without hardware replication or delay overhead. In this paper we leverage this SW property by introducing a novel computation paradigm, which allows for the parallel processing of n-bit input data vectors on the same basic SW based logic gate. Subsequently, to demonstrate the proposed concept, we present 8-bit parallel 3-input Majority gate implementation and validate it by means of Object Oriented MicroMagnetic Framework (OOMMF) simulations. To evaluate the potential benefit of our proposal we compare the 8-bit data parallel gate with equivalent scalar SW gate based implementation. Our evaluation indicates that 8-bit data 3-input Majority gate implementation requires 4.16x less area than the scalar SW gate based equivalent counterpart while preserving the same delay and energy consumption figures. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-10	HIGH-SPEED ANALOG SIMULATION OF CMOS VISION CHIPS USING EXPLICIT INTEGRATION TECHNIQUES ON MANY-CORE PROCESSORS Speaker: Tom Kazmierski, University of Southampton, GB Authors: Gines Domenech-Asensi¹ and Tom J Kazmierski² ¹Universidad Politecnica de Cartagena, ES; ²University of Southampton, GB Abstract This work describes a high-speed simulation technique of analog circuits which is based on the use of state-space equations and an explicit integration method parallelised on a multiprocessor architecture. The integration step of such method is smaller than the one required by an implicit simulation technique based on Newton-Raphson iterations. However, given that explicit methods do not require the computation of time-consuming matrix factorizations, the overall simulation time is reduced. The technique described in this work has been implemented on a NVIDIA general purpose GPU and has been tested simulating the Gaussian filtering operation performed by a smart CMOS image sensor. Such devices are used to perform computation on the edge and include built-in image processing functions. Among those, the Gaussian filtering is one of the most common functions, since it is a basic task for early vision processing. These smart sensors are increasingly complex and hence the time required to simulate them during their design cycle is also larger and larger. From a certain imager size, the proposed simulation method yields simulation times two order of magnitude faster that an implicit method based tool such us SPICE. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-11	A 100KHZ-1GHZ TERMINATION-DEPENDENT HUMAN BODY COMMUNICATION CHANNEL MEASUREMENT USING MINIATURIZED WEARABLE DEVICES Speaker: Shreyas Sen, Purdue University, US Authors: Shitij Avlani, Mayukh Nath, Shovan Maity and Shreyas Sen, Purdue University, US Abstract Human Body Communication has shown great promise to replace wireless communication for information exchange between wearable devices of a body area network. However, there are very few studies in literature, that systematically study the channel loss of capacitive HBC for wearable devices over a wide frequency range with different terminations at the receiver, partly due to the need for miniaturized wearable devices for an accurate study. This paper, for the first time, measures the channel loss of capacitive HBC from 100KHz to 1GHz for both high-impedance and 50 ohm terminations using wearable, battery powered devices; which is mandatory for accurate measurement of the HBC channel-loss, due to ground coupling effects. Results show that high impedance termination leads to a significantly lower channel loss (40 dB improvement at 1MHz), as compared to 50 ohm termination at low frequencies. This difference steadily decreases with increasing frequency, until they become similar near 80MHz. Beyond 100MHz inter-device coupling dominates, thereby preventing accurate measurements of channel loss of the human body. The measured results provide a consistent wearable, wide-frequency HBC channel loss data and could serve as a backbone for the emerging field of HBC by aiding in the selection of an appropriate operation frequency and termination. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-12	FROM DRUP TO PAC AND BACK Speaker: Daniela Kaufmann, Johannes Kepler University Linz, AT Authors: Daniela Kaufmann, Armin Biere and Manuel Kauers, Johannes Kepler University Linz, AT Abstract Currently the most efficient automatic approach to verify gate-level multipliers combines SAT solving and computer algebra. In order to increase confidence in the verification, proof certificates are generated. However, due to different solving techniques, these certificates require two different proof formats, namely DRUP and PAC. A combined proof has so far been missing. Correctness of this approach can thus only be trusted up to the correctness of compositional reasoning. In this paper we show how to generate a single proof in one proof format, which then allows to certify correctness using one simple proof checker. We further investigate empirically the effect on proof generation and checking time as well as on proof size. It turns out that PAC proofs are much more compact and faster to check. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-13	VERIFIABLE SECURITY TEMPLATES FOR HARDWARE Speaker: Bill Harrison, Oak Ridge National Laboratory, US Authors: William Harrison¹ and Gerard Allwein² ¹Oak Ridge National Laboratory, US; ²Naval Research Laboratory, US Abstract But HLS has, with a few notable exceptions, not focused on transferring ideas and techniques from high assurance software formal methods to hardware development, despite there being a sophisticated and mature body of research in that area. Just as it has introduced software engineering virtues, we believe HLS can also become a vector for retrofitting software formal methods to the challenge of high assurance security in hardware. This paper introduces the Device Calculus and its mechanization in the Agda proof checking system. The Device Calculus is a starting point for exploring formal methods and security within high-level synthesis flows. We illustrate the Device Calculus with a number of examples of formally verifiable security templates---i.e., functions in the Device Calculus that express common security structures at a high-level of abstraction. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-14	IFFSET: IN-FIELD FUZZING OF INDUSTRIAL CONTROL SYSTEMS USING SYSTEM EMULATION Speaker: Dimitrios Tychalas, New York University, US Authors: Dimitrios Tychalas¹ and Michail Maniatakos² ¹New York University, US; ²New York University Abu Dhabi, AE Abstract Industrial Control Systems (ICS) have evolved in the last decade, shifting from proprietary software/hardware to contemporary embedded architectures paired with open-source operating systems. In contrast to the IT world, where continuous updates and patches are expected, decommissioning always-on ICS for security assessment can incur prohibitive costs to their owner. Thus, a solution for routinely assessing the cybersecurity posture of diverse ICS without affecting their operation is essential. Therefore, in this paper we introduce IFFSET, a platform that leverages full system emulation of Linux-based ICS firmware and utilizes fuzzing for security evaluation. Our platform extracts the file system and kernel information from a live ICS device, building an image which is emulated on a desktop system through QEMU. We employ fuzzing as a security assessment tool to analyze ICS specific libraries and find potential security threatening conditions. We test our platform with commercial PLCs, showcasing potential threats with no interruption to the control process. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-15	FANNET: FORMAL ANALYSIS OF NOISE TOLERANCE, TRAINING BIAS AND INPUT SENSITIVITY IN NEURAL NETWORKS Speaker: Mahum Naseer, TU Wien, AT Authors: Mahum Naseer¹, Mishal Fatima Minhas², Faiq Khalid¹, Muhammad Abdullah Hanif¹, Osman Hasan² and Muhammad Shafique¹ ¹TU Wien, AT; ²National University of Sciences and Technology, PK Abstract With a constant improvement in the network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on "known inputs", these NNs can fail absurdly on the "unseen inputs", especially if these real-time inputs deviate from the training dataset distributions, or contain certain types of input noise. This indicates the low noise tolerance of NNs, which is a major reason for the recent increase of adversarial attacks. This is a serious concern, particularly for safety-critical applications, where inaccurate results lead to dire consequences. We propose a novel methodology that leverages model checking for the Formal Analysis of Neural Network (FANNet) under different input noise ranges. Our methodology allows us to rigorously analyze the noise tolerance of NNs, their input node sensitivity, and the effects of training bias on their performance, e.g., in terms of classification accuracy. For evaluation, we use a feed-forward fully-connected NN architecture trained for the Leukemia classification. Our experimental results show 11% noise tolerance for the given trained network, identify the most sensitive input nodes, confirm the biasness of the available training dataset, and indicate that the proposed methodology is much more rigorous and yet comparable to validation testing in terms of time and computational resources for larger noise ranges. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-16	A SCALABLE MIXED SYNTHESIS FRAMEWORK FOR HETEROGENEOUS NETWORKS Speaker: Max Austin, University of Utah, US Authors: Max Austin¹, Scott Temple¹, Walter Lau Neto¹, Luca Amaru², Xifan Tang¹ and Pierre-Emmanuel Gaillardon¹ ¹University of Utah, US; ²Synopsys, US Abstract We present a new logic synthesis framework which produces efficient post-technology mapped results on heterogeneous networks containing a mix of different types of logic. This framework accomplishes this by breaking down the circuit into sections using a hypergraph k-way partitioner and then determines the best-fit logic representation for each partition between two Boolean networks, And-Inverter Graphs(AIG) and Majority-Inverter Graphs(MIG), which have been shown to perform better over each other on different types of logic. Experimental results show that over a set of Open Piton DesignBenchmarks(OPDB) and OpenCores benchmarks, our proposed methodology outperforms state-of-the-art academic tools inArea-Delay Product(ADP), Power-Delay Product(PDP), and Energy-Delay Product(EDP) by 5%, 2%, and 15% respectively after performing Application Specific Integrated Circuits(ASIC) technology mapping as well as showing a 54% improvement in runtime over conventional MIG optimization Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-17	DISCERN: DISTILLING STANDARD CELLS FOR EMERGING RECONFIGURABLE NANOTECHNOLOGIES Speaker: Shubham Rai, TU Dresden, DE Authors: Shubham Rai¹, Michael Raitza², Siva Satyendra Sahoo¹ and Akash Kumar¹ ¹TU Dresden, DE; ²TU Dresden and CfAED, DE Abstract Logic gates and circuits based on reconfigurable nanotechnologies demonstrate runtime-reconfigurability, where a single logic gate can exhibit more than one functionality. Recent attempts on circuits based on emerging reconfigurable nanotechnologies have primarily focused on using the traditional CMOS design flow involving similar-styled standard-cells. These CMOS-centric standard-cells fail to utilize the exciting properties offered by these nanotechnologies. In the present work, we explore the boolean properties that define the reconfigurable properties of a logic gate. By analyzing the truth-table in detail, we find that there is a common boolean rule which dictates why a logic gate is reconfigurable. Such logic gates can be efficiently implemented using reconfigurable nanotechnologies. We propose an algorithm which analyses the truth-tables of nodes in a circuit to list all such potential reconfigurable logic gates for a particular circuit. Technology mapping with these new logic gates (or standard-cells) leads to a better mapping in terms of area and delay. Experiments employing our methodology over EPFL benchmarks, show average improvements of around 13%, 16% and 11.5% in terms of area, number of edges and delay respectively as compared to the conventional CMOS-centric standard-cell based mapping. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-18	A 16×128 STOCHASTIC-BINARY PROCESSING ELEMENT ARRAY FOR ACCELERATING STOCHASTIC DOT-PRODUCT COMPUTATION USING 1-16 BIT-STREAM LENGTH Speaker: Hyunjoon Kim, Nanyang Technological University, SG Authors: Qian Chen, Yuqi Su, Hyunjoon Kim, Taegeun Yoo, Tony Tae-Hyoung Kim and Bongjin Kim, Nanyang Technological University, SG Abstract This work presents 16×128 stochastic-binary processing elements for energy/area efficient processing of artificial neural networks. A processing element (PE) with all-digital components consists of an XNOR gate as a bipolar stochastic multiplier and an 8bit binary adder with 8× registers for accumulating partial-sums. The PE array comprises 16× dot-product units, each with 128 PEs cascaded in a single row. The latency and energy of the proposed dot-product unit is minimized by reducing the number of bit-streams required for minimizing the accuracy degradation induced by the approximate stochastic computing. A 128-input dot-product operation requires the bit-stream length (N) of 1-to-16, which is two orders of magnitude smaller than the baseline stochastic computation using MUX-based adders. The simulated dot-product error is 6.9-to-1.5% for N=1-to-16, while the error from the baseline stochastic method is 5.9-to-1.7% with N=128-to-2048. A mean MNIST classification accuracy is 96.11% (which is 1.19% lower than 8b binary) using a three-layer MLP at N=16. The measured energy from a 65nm test-chip is 10.04pJ per dot-product, and the energy efficiency is 25.5TOPS/W at N=16. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-19	TOWARDS EXPLORING THE POTENTIAL OF ALTERNATIVE QUANTUM COMPUTING ARCHITECTURES Speaker: Arighna Deb, Kalinga Institute of Industrial Technology, IN Authors: Arighna Deb¹, Gerhard W. Dueck² and Robert Wille³ ¹Kalinga Institute of Industrial Technology, IN; ²University of New Brunswick, CA; ³Johannes Kepler University Linz, AT Abstract The recent advances in the physical realization of Noisy Intermediate Scale Quantum (NISQ) computers have motivated research on design automation that allows users to execute quantum algorithms on them. Certain physical constraints in the architectures restrict how logical qubits used to describe the algorithm can be mapped to physical qubits used to realize the corresponding functionality. Thus far, this has been addressed by inserting additional operations in order to overcome the physical constrains. However, all these approaches have taken the existing architectures as invariant and did not explore the potential of changing the quantum architecture itself—a valid option as long as the underlying physical constrains remain satisfied. In this work, we propose initial ideas to explore this potential. More precisely, we introduce several schemes for the generation of alternative coupling graphs (and, by this, quantum computing architectures) that still might be able to satisfy physical constraints but, at the same time, allow for a more efficient realization of the desired quantum functionality. Download Paper (PDF; Only available from the DATE venue WiFi)
IP2-20	ACCELERATING QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM USING MACHINE LEARNING Speaker: Swaroop Ghosh, Pennsylvania State University, US Authors: Mahabubul Alam, Abdullah Ash- Saki and Swaroop Ghosh, Pennsylvania State University, US Abstract We propose a machine learning based approach to accelerate quantum approximate optimization algorithm (QAOA) implementation which is a promising quantum-classical hybrid algorithm to prove the so-called quantum supremacy. In QAOA, a parametric quantum circuit and a classical optimizer iterates in a closed loop to solve hard combinatorial optimization problems. The performance of QAOA improves with an increasing number of stages (depth) in the quantum circuit. However, two new parameters are introduced with each added stage for the classical optimizer increasing the number of optimization loop iterations. We note a correlation among parameters of the lower-depth and the higher-depth QAOA implementations and, exploit it by developing a machine learning model to predict the gate parameters close to the optimal values. As a result, the optimization loop converges in a fewer number of iterations. We choose graph MaxCut problem as a prototype to solve using QAOA. We perform a feature extraction routine using 100 different QAOA instances and develop a training data-set with 13,860 optimal parameters. We present our analysis for 4 flavors of regression models and 4 flavors of classical optimizers. Finally, we show that the proposed approach can curtail the number of optimization iterations by on average 44.9% (up to 65.7%) from an analysis performed with 264 flavors of graphs. Download Paper (PDF; Only available from the DATE venue WiFi)

Label

Presentation Title
Authors

IP2-1

SAMPLING FROM DISCRETE DISTRIBUTIONS IN COMBINATIONAL HARDWARE WITH APPLICATION TO POST-QUANTUM CRYPTOGRAPHY
Speaker:
Michael Lyons, George Mason University, US
Authors:
Michael Lyons and Kris Gaj, George Mason University, US
Abstract
Random values from discrete distributions are typically generated from uniformly-random samples. A common technique is to use a cumulative distribution table (CDT) lookup for inversion sampling, but it is also possible to use Boolean functions to map a uniformly-random bit sequence into a value from a discrete distribution. This work presents a methodology for deriving such functions for any discrete distribution, encoding them in VHDL for implementation in combinational hardware, and (for moderate precision and sample space size) confirming the correctness of the produced distribution. The process is demonstrated using a discrete Gaussian distribution with a small sample space, but it is applicable to any discrete distribution with fixed parameters. Results are presented for sampling schemes from several submissions to the NIST PQC standardization process, comparing this method to CDT lookups on a Xilinx Artix-7 FPGA. The process produces compact solutions for distributions up to moderate size and precision.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-2

ON THE PERFORMANCE OF NON-PROFILED DIFFERENTIAL DEEP LEARNING ATTACKS AGAINST AN AES ENCRYPTION ALGORITHM PROTECTED USING A CORRELATED NOISE HIDING COUNTERMEASURE
Speaker:
Amir Alipour, Grenoble INP Esisar, FR
Authors:
Amir Alipour¹, Athanasios Papadimitriou², Vincent Beroulle³, Ehsan Aerabi³ and David Hely³
¹University Grenoble Alpes, Grenoble INP ESISAR, LCIS Laboratory, FR; ²University Grenoble Alpes, Grenoble INP ESISAR, ESYNOV, FR; ³University Grenoble Alpes, Grenoble INP ESISAR, LSIC Laboratory, FR
Abstract
Recent works in the field of cryptography focus on Deep Learning based Side Channel Analysis (DLSCA) as one of the most powerful attacks against common encryption algorithms such as AES. As a common case, profiling DLSCA have shown great capabilities in revealing secret cryptographic keys against the majority of AES implementations. In a very recent study, it has been shown that Deep Learning can be applied in a non-profiling way (non-profiling DLSCA), making this method considerably more practical, and able to break powerful countermeasures for encryption algorithms such as AES including masking countermeasures, requiring considerably less power traces than a first order CPA attack. In this work, our main goal is to apply the non-profiling DLSCA against a hiding-based AES countermeasure which utilizes correlated noise generation so as to hide the secret encryption key. We show that this AES, with correlated noise generation as a lightweight countermeasure, can provide equivalent protection under CPA and under non- profiling DLSCA attacks, in terms of the required power traces to obtain the secret key.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-3

FAST AND ACCURATE PERFORMANCE EVALUATION FOR RISC-V USING VIRTUAL PROTOTYPES
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Vladimir Herdt¹, Daniel Grosse² and Rolf Drechsler²
¹University of Bremen, DE; ²University of Bremen / DFKI, DE
Abstract
RISC-V is gaining huge popularity in particular for embedded systems. Recently, a SystemC-based Virtual Prototype (VP) has been open sourced to lay the foundation for providing support for system-level use cases such as design space exploration, analysis of complex HW/SW interactions and power/timing/performance validation for RISC-V based systems. In this paper, we propose an efficient core timing model and integrate it into the VP core to enable fast and accurate performance evaluation for RISC-V based systems. As a case-study we provide a timing configuration matching the RISC-V HiFive1 board from SiFive. Our experiments demonstrate that our approach allows to obtain very accurate performance evaluation results while still retaining a high simulation performance.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-4

AUTOMATED GENERATION OF LTL SPECIFICATIONS FOR SMART HOME IOT USING NATURAL LANGUAGE
Speaker:
Shiyu Zhang, Nanjing University, CN
Authors:
Shiyu Zhang¹, Juan Zhai¹, Lei Bu¹, Mingsong Chen², Linzhang Wang¹ and Xuandong Li¹
¹Nanjing University, CN; ²East China Normal University, CN
Abstract
Ordinary inexperienced users can build their smart home IoT system easily nowadays, but such user-customized systems could be error-prone. Using formal verification to prove the correctness of such systems is necessary. However, to conduct formal proof, formal specifications such as Linear Temporal Logic (LTL) formulas have to be provided, but ordinary users cannot author LTL formulas but only natural language. To address this problem, this paper presents a novel approach that can automatically generate formal LTL specifications from natural language requirements based on domain knowledge and our proposed ambiguity refining techniques. Experimental results show that our approach can achieve a high correctness rate of 95.4% in converting natural language sentences into LTL formulas from 481 requirements of real examples.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-5

A HEAT-RECIRCULATION-AWARE VM PLACEMENT STRATEGY FOR DATA CENTERS
Authors:
Hao Feng¹, Yuhui Deng² and Yi Zhou³
¹Jinan University, CN; ²Chinese Academy of Sciences; Jinan University, CN; ³Columbus State University, US
Abstract
Data centers consisted of a great number of IT devices (e.g., servers, switches and etc.) which generates a massive amount of heat emission. Due to the special arrangement of racks in the data center, heat recirculation often occurs between nodes. It can cause a sharp rise in temperature of the equipment coupled with local hot spots in data centers. Existing VM placement strategies can minimize energy consumption of data centers by optimizing resource allocation in terms of multiple physical resources (e.g., memory, bandwidth, cpu and etc.). However, existing strategies ignore the role of heat recirculation in the data center. To address this problem, in this study, we propose a heat-recirculation-aware VM placement strategy and design a Simulated Annealing Based Algorithm (SABA) to lower the energy consumption of data centers. Different from the existing SA algorithm, SABA optimize the distribution of the initial solution and the way of iteration. We quantitatively evaluate SABA's performance in terms of algorithm efficiency, the activated servers and the energy saving against with XINT-GA algorithm (Thermal-aware task scheduling Strategy), FCFS (First-Come First-Served), and SA. Experimental results indicate that our heat-recirculation-aware VM placement strategy provides a powerful solution for improving energy efficiency of data centers.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-6

ENERGY OPTIMIZATION IN NCFET-BASED PROCESSORS
Authors:
Sami Salamin¹, Martin Rapp¹, Hussam Amrouch¹, Andreas Gerstlauer² and Joerg Henkel¹
¹Karlsruhe Institute of Technology, DE; ²University of Texas at Austin, US
Abstract
Energy consumption is a key optimization goal for all modern processors. Negative Capacitance Field-Effect Transistors (NCFETs) are a leading emerging technology that promises outstanding performance in addition to better energy efficiency. The thickness of the additional ferroelectric layer, frequency, and voltage are the key parameters in NCFET technology that impact the power and frequency of processors. However, their joint impact on energy optimization has not been investigated yet. In this work, we are the first to demonstrate that conventional (i.e., NCFET-unaware) dynamic voltage/frequency scaling (DVFS) techniques to minimize energy are sub-optimal when applied to NCFET-based processors. We further demonstrate that state-of-the-art NCFET-aware voltage scaling for power minimization is also sub-optimal when it comes to energy. This work provides the first NCFET-aware DVFS technique that optimizes the processor's energy through optimal runtime frequency/voltage selection. In NCFETs, energy-optimal frequency and voltage are dependent on the workload and technology parameters. Our NCFET-aware DVFS technique considers these effects to perform optimal voltage/frequency selection at runtime depending on workload characteristics. Results show up to 90 % energy savings compared to conventional DVFS techniques. Compared to state-of-the-art NCFET-aware power management, our technique provides up to 72 % energy savings along with 3:7x higher performance.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-7

TOWARDS A MODEL-BASED MULTI-OBJECTIVE OPTIMIZATION APPROACH FOR SAFETY-CRITICAL REAL-TIME SYSTEMS
Speaker:
Emmanuel Grolleau, LIAS / ISAE-ENSMA, FR
Authors:
Soulimane Kamni¹, Yassine OUHAMMOU², Antoine Bertout³ and Emmanuel Grolleau⁴
¹LIAS/ENSMA, FR; ²LIAS / ISAE-ENSMA, FR; ³LIAS, Université de Poitiers, ISAE-ENSMA, FR; ⁴LIAS, ISAE-ENSMA, Universite de Poitiers, FR
Abstract
In safety-critical real-time systems domain, obtaining the appropriate operational model which meets the temporal (e.g. deadlines) and business (e.g. redundancy) requirements while being optimal in terms of several metrics is a primordial process in the design life-cycle. Recently, several researches have proposed to explore cross-domain trade-offs for a higher behaviour performance. Indeed, this process represents the first step in the deployment phase, which is very sensitive because it could be error-prone and time consuming. This paper is a work in progress proposing an approach aiming to help real-time system architects to take benefit from existing works, overcome their limits, and capitalize the efforts. Furthermore, the approach is based on the model-driven engineering paradigm and suggests to ease the usage of methods and tools thanks to repositories gathering them as a sort of a shared knowledge.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-8

CURRENT-MODE CARRY-FREE MULTIPLIER DESIGN USING A MEMRISTOR-TRANSISTOR CROSSBAR ARCHITECTURE
Speaker:
Shengqi Yu, Newcastle University, GB
Authors:
Shengqi Yu¹, Ahmed Soltan², Rishad Shafik¹, Thanasin Bunnam¹, Domenico Balsamo¹, Fei Xia¹ and Alex Yakovlev¹
¹Newcastle University, GB; ²Nile University, EG
Abstract
Traditional multipliers consist of complex logic components. They are a major energy and performance contributor of modern compute-intensive applications. As such, designing multipliers with reduced energy and faster speed has remained a thoroughgoing challenge. This paper presents a novel, carry-free multiplier, which is suitable for new-generation of energy-constrained applications. The multiplier circuit consists of an array of memristor-transistor cells that can be selected (i.e., turned ON or OFF) using a combination of DC bias voltages based on the operand values. When a cell is selected it contributes to current in the array path, which is then amplified by current mirrors with variable transistor gate sizes. The different current paths are connected to a node for analogously accumulating the currents to produce the multiplier output directly, removing the carry propagation stages, typically seen in traditional digital multipliers. An essential feature of this multiplier is autonomous survivability, i.e., when the power is below this threshold the logic state automatically retains at a zero-cost due to the non-volatile properties of memristors.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-9

N-BIT DATA PARALLEL SPIN WAVE LOGIC GATE
Speaker:
Abdulqader Mahmoud, TU Delft, NL
Authors:
Abdulqader Mahmoud¹, Frederic Vanderveken², Florin Ciubotaru², Christoph Adelmann², Sorin Cotofana¹ and Said Hamdioui¹
¹TU Delft, NL; ²IMEC, BE
Abstract
Due to their very nature, Spin Waves (SWs) created in the same waveguide, but with different frequencies, can coexist while selectively interacting with their own species only. The absence of inter-frequency interferences isolates input data sets encoded in SWs with different frequencies and creates the premises for simultaneous data parallel SW based processing without hardware replication or delay overhead. In this paper we leverage this SW property by introducing a novel computation paradigm, which allows for the parallel processing of n-bit input data vectors on the same basic SW based logic gate. Subsequently, to demonstrate the proposed concept, we present 8-bit parallel 3-input Majority gate implementation and validate it by means of Object Oriented MicroMagnetic Framework (OOMMF) simulations. To evaluate the potential benefit of our proposal we compare the 8-bit data parallel gate with equivalent scalar SW gate based implementation. Our evaluation indicates that 8-bit data 3-input Majority gate implementation requires 4.16x less area than the scalar SW gate based equivalent counterpart while preserving the same delay and energy consumption figures.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-10

HIGH-SPEED ANALOG SIMULATION OF CMOS VISION CHIPS USING EXPLICIT INTEGRATION TECHNIQUES ON MANY-CORE PROCESSORS
Speaker:
Tom Kazmierski, University of Southampton, GB
Authors:
Gines Domenech-Asensi¹ and Tom J Kazmierski²
¹Universidad Politecnica de Cartagena, ES; ²University of Southampton, GB
Abstract
This work describes a high-speed simulation technique of analog circuits which is based on the use of state-space equations and an explicit integration method parallelised on a multiprocessor architecture. The integration step of such method is smaller than the one required by an implicit simulation technique based on Newton-Raphson iterations. However, given that explicit methods do not require the computation of time-consuming matrix factorizations, the overall simulation time is reduced. The technique described in this work has been implemented on a NVIDIA general purpose GPU and has been tested simulating the Gaussian filtering operation performed by a smart CMOS image sensor. Such devices are used to perform computation on the edge and include built-in image processing functions. Among those, the Gaussian filtering is one of the most common functions, since it is a basic task for early vision processing. These smart sensors are increasingly complex and hence the time required to simulate them during their design cycle is also larger and larger. From a certain imager size, the proposed simulation method yields simulation times two order of magnitude faster that an implicit method based tool such us SPICE.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-11

A 100KHZ-1GHZ TERMINATION-DEPENDENT HUMAN BODY COMMUNICATION CHANNEL MEASUREMENT USING MINIATURIZED WEARABLE DEVICES
Speaker:
Shreyas Sen, Purdue University, US
Authors:
Shitij Avlani, Mayukh Nath, Shovan Maity and Shreyas Sen, Purdue University, US
Abstract
Human Body Communication has shown great promise to replace wireless communication for information exchange between wearable devices of a body area network. However, there are very few studies in literature, that systematically study the channel loss of capacitive HBC for wearable devices over a wide frequency range with different terminations at the receiver, partly due to the need for miniaturized wearable devices for an accurate study. This paper, for the first time, measures the channel loss of capacitive HBC from 100KHz to 1GHz for both high-impedance and 50 ohm terminations using wearable, battery powered devices; which is mandatory for accurate measurement of the HBC channel-loss, due to ground coupling effects. Results show that high impedance termination leads to a significantly lower channel loss (40 dB improvement at 1MHz), as compared to 50 ohm termination at low frequencies. This difference steadily decreases with increasing frequency, until they become similar near 80MHz. Beyond 100MHz inter-device coupling dominates, thereby preventing accurate measurements of channel loss of the human body. The measured results provide a consistent wearable, wide-frequency HBC channel loss data and could serve as a backbone for the emerging field of HBC by aiding in the selection of an appropriate operation frequency and termination.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-12

FROM DRUP TO PAC AND BACK
Speaker:
Daniela Kaufmann, Johannes Kepler University Linz, AT
Authors:
Daniela Kaufmann, Armin Biere and Manuel Kauers, Johannes Kepler University Linz, AT
Abstract
Currently the most efficient automatic approach to verify gate-level multipliers combines SAT solving and computer algebra. In order to increase confidence in the verification, proof certificates are generated. However, due to different solving techniques, these certificates require two different proof formats, namely DRUP and PAC. A combined proof has so far been missing. Correctness of this approach can thus only be trusted up to the correctness of compositional reasoning. In this paper we show how to generate a single proof in one proof format, which then allows to certify correctness using one simple proof checker. We further investigate empirically the effect on proof generation and checking time as well as on proof size. It turns out that PAC proofs are much more compact and faster to check.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-13

VERIFIABLE SECURITY TEMPLATES FOR HARDWARE
Speaker:
Bill Harrison, Oak Ridge National Laboratory, US
Authors:
William Harrison¹ and Gerard Allwein²
¹Oak Ridge National Laboratory, US; ²Naval Research Laboratory, US
Abstract
But HLS has, with a few notable exceptions, not focused on transferring ideas and techniques from high assurance software formal methods to hardware development, despite there being a sophisticated and mature body of research in that area. Just as it has introduced software engineering virtues, we believe HLS can also become a vector for retrofitting software formal methods to the challenge of high assurance security in hardware. This paper introduces the Device Calculus and its mechanization in the Agda proof checking system. The Device Calculus is a starting point for exploring formal methods and security within high-level synthesis flows. We illustrate the Device Calculus with a number of examples of formally verifiable security templates---i.e., functions in the Device Calculus that express common security structures at a high-level of abstraction.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-14

IFFSET: IN-FIELD FUZZING OF INDUSTRIAL CONTROL SYSTEMS USING SYSTEM EMULATION
Speaker:
Dimitrios Tychalas, New York University, US
Authors:
Dimitrios Tychalas¹ and Michail Maniatakos²
¹New York University, US; ²New York University Abu Dhabi, AE
Abstract
Industrial Control Systems (ICS) have evolved in the last decade, shifting from proprietary software/hardware to contemporary embedded architectures paired with open-source operating systems. In contrast to the IT world, where continuous updates and patches are expected, decommissioning always-on ICS for security assessment can incur prohibitive costs to their owner. Thus, a solution for routinely assessing the cybersecurity posture of diverse ICS without affecting their operation is essential. Therefore, in this paper we introduce IFFSET, a platform that leverages full system emulation of Linux-based ICS firmware and utilizes fuzzing for security evaluation. Our platform extracts the file system and kernel information from a live ICS device, building an image which is emulated on a desktop system through QEMU. We employ fuzzing as a security assessment tool to analyze ICS specific libraries and find potential security threatening conditions. We test our platform with commercial PLCs, showcasing potential threats with no interruption to the control process.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-15

FANNET: FORMAL ANALYSIS OF NOISE TOLERANCE, TRAINING BIAS AND INPUT SENSITIVITY IN NEURAL NETWORKS
Speaker:
Mahum Naseer, TU Wien, AT
Authors:
Mahum Naseer¹, Mishal Fatima Minhas², Faiq Khalid¹, Muhammad Abdullah Hanif¹, Osman Hasan² and Muhammad Shafique¹
¹TU Wien, AT; ²National University of Sciences and Technology, PK
Abstract
With a constant improvement in the network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on "known inputs", these NNs can fail absurdly on the "unseen inputs", especially if these real-time inputs deviate from the training dataset distributions, or contain certain types of input noise. This indicates the low noise tolerance of NNs, which is a major reason for the recent increase of adversarial attacks. This is a serious concern, particularly for safety-critical applications, where inaccurate results lead to dire consequences. We propose a novel methodology that leverages model checking for the Formal Analysis of Neural Network (FANNet) under different input noise ranges. Our methodology allows us to rigorously analyze the noise tolerance of NNs, their input node sensitivity, and the effects of training bias on their performance, e.g., in terms of classification accuracy. For evaluation, we use a feed-forward fully-connected NN architecture trained for the Leukemia classification. Our experimental results show 11% noise tolerance for the given trained network, identify the most sensitive input nodes, confirm the biasness of the available training dataset, and indicate that the proposed methodology is much more rigorous and yet comparable to validation testing in terms of time and computational resources for larger noise ranges.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-16

A SCALABLE MIXED SYNTHESIS FRAMEWORK FOR HETEROGENEOUS NETWORKS
Speaker:
Max Austin, University of Utah, US
Authors:
Max Austin¹, Scott Temple¹, Walter Lau Neto¹, Luca Amaru², Xifan Tang¹ and Pierre-Emmanuel Gaillardon¹
¹University of Utah, US; ²Synopsys, US
Abstract
We present a new logic synthesis framework which produces efficient post-technology mapped results on heterogeneous networks containing a mix of different types of logic. This framework accomplishes this by breaking down the circuit into sections using a hypergraph k-way partitioner and then determines the best-fit logic representation for each partition between two Boolean networks, And-Inverter Graphs(AIG) and Majority-Inverter Graphs(MIG), which have been shown to perform better over each other on different types of logic. Experimental results show that over a set of Open Piton DesignBenchmarks(OPDB) and OpenCores benchmarks, our proposed methodology outperforms state-of-the-art academic tools inArea-Delay Product(ADP), Power-Delay Product(PDP), and Energy-Delay Product(EDP) by 5%, 2%, and 15% respectively after performing Application Specific Integrated Circuits(ASIC) technology mapping as well as showing a 54% improvement in runtime over conventional MIG optimization
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-17

DISCERN: DISTILLING STANDARD CELLS FOR EMERGING RECONFIGURABLE NANOTECHNOLOGIES
Speaker:
Shubham Rai, TU Dresden, DE
Authors:
Shubham Rai¹, Michael Raitza², Siva Satyendra Sahoo¹ and Akash Kumar¹
¹TU Dresden, DE; ²TU Dresden and CfAED, DE
Abstract
Logic gates and circuits based on reconfigurable nanotechnologies demonstrate runtime-reconfigurability, where a single logic gate can exhibit more than one functionality. Recent attempts on circuits based on emerging reconfigurable nanotechnologies have primarily focused on using the traditional CMOS design flow involving similar-styled standard-cells. These CMOS-centric standard-cells fail to utilize the exciting properties offered by these nanotechnologies. In the present work, we explore the boolean properties that define the reconfigurable properties of a logic gate. By analyzing the truth-table in detail, we find that there is a common boolean rule which dictates why a logic gate is reconfigurable. Such logic gates can be efficiently implemented using reconfigurable nanotechnologies. We propose an algorithm which analyses the truth-tables of nodes in a circuit to list all such potential reconfigurable logic gates for a particular circuit. Technology mapping with these new logic gates (or standard-cells) leads to a better mapping in terms of area and delay. Experiments employing our methodology over EPFL benchmarks, show average improvements of around 13%, 16% and 11.5% in terms of area, number of edges and delay respectively as compared to the conventional CMOS-centric standard-cell based mapping.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-18

A 16×128 STOCHASTIC-BINARY PROCESSING ELEMENT ARRAY FOR ACCELERATING STOCHASTIC DOT-PRODUCT COMPUTATION USING 1-16 BIT-STREAM LENGTH
Speaker:
Hyunjoon Kim, Nanyang Technological University, SG
Authors:
Qian Chen, Yuqi Su, Hyunjoon Kim, Taegeun Yoo, Tony Tae-Hyoung Kim and Bongjin Kim, Nanyang Technological University, SG
Abstract
This work presents 16×128 stochastic-binary processing elements for energy/area efficient processing of artificial neural networks. A processing element (PE) with all-digital components consists of an XNOR gate as a bipolar stochastic multiplier and an 8bit binary adder with 8× registers for accumulating partial-sums. The PE array comprises 16× dot-product units, each with 128 PEs cascaded in a single row. The latency and energy of the proposed dot-product unit is minimized by reducing the number of bit-streams required for minimizing the accuracy degradation induced by the approximate stochastic computing. A 128-input dot-product operation requires the bit-stream length (N) of 1-to-16, which is two orders of magnitude smaller than the baseline stochastic computation using MUX-based adders. The simulated dot-product error is 6.9-to-1.5% for N=1-to-16, while the error from the baseline stochastic method is 5.9-to-1.7% with N=128-to-2048. A mean MNIST classification accuracy is 96.11% (which is 1.19% lower than 8b binary) using a three-layer MLP at N=16. The measured energy from a 65nm test-chip is 10.04pJ per dot-product, and the energy efficiency is 25.5TOPS/W at N=16.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-19

TOWARDS EXPLORING THE POTENTIAL OF ALTERNATIVE QUANTUM COMPUTING ARCHITECTURES
Speaker:
Arighna Deb, Kalinga Institute of Industrial Technology, IN
Authors:
Arighna Deb¹, Gerhard W. Dueck² and Robert Wille³
¹Kalinga Institute of Industrial Technology, IN; ²University of New Brunswick, CA; ³Johannes Kepler University Linz, AT
Abstract
The recent advances in the physical realization of Noisy Intermediate Scale Quantum (NISQ) computers have motivated research on design automation that allows users to execute quantum algorithms on them. Certain physical constraints in the architectures restrict how logical qubits used to describe the algorithm can be mapped to physical qubits used to realize the corresponding functionality. Thus far, this has been addressed by inserting additional operations in order to overcome the physical constrains. However, all these approaches have taken the existing architectures as invariant and did not explore the potential of changing the quantum architecture itself—a valid option as long as the underlying physical constrains remain satisfied. In this work, we propose initial ideas to explore this potential. More precisely, we introduce several schemes for the generation of alternative coupling graphs (and, by this, quantum computing architectures) that still might be able to satisfy physical constraints but, at the same time, allow for a more efficient realization of the desired quantum functionality.
Download Paper (PDF; Only available from the DATE venue WiFi)

IP2-20

ACCELERATING QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM USING MACHINE LEARNING
Speaker:
Swaroop Ghosh, Pennsylvania State University, US
Authors:
Mahabubul Alam, Abdullah Ash- Saki and Swaroop Ghosh, Pennsylvania State University, US
Abstract
We propose a machine learning based approach to accelerate quantum approximate optimization algorithm (QAOA) implementation which is a promising quantum-classical hybrid algorithm to prove the so-called quantum supremacy. In QAOA, a parametric quantum circuit and a classical optimizer iterates in a closed loop to solve hard combinatorial optimization problems. The performance of QAOA improves with an increasing number of stages (depth) in the quantum circuit. However, two new parameters are introduced with each added stage for the classical optimizer increasing the number of optimization loop iterations. We note a correlation among parameters of the lower-depth and the higher-depth QAOA implementations and, exploit it by developing a machine learning model to predict the gate parameters close to the optimal values. As a result, the optimization loop converges in a fewer number of iterations. We choose graph MaxCut problem as a prototype to solve using QAOA. We perform a feature extraction routine using 100 different QAOA instances and develop a training data-set with 13,860 optimal parameters. We present our analysis for 4 flavors of regression models and 4 flavors of classical optimizers. Finally, we show that the proposed approach can curtail the number of optimization iterations by on average 44.9% (up to 65.7%) from an analysis performed with 264 flavors of graphs.
Download Paper (PDF; Only available from the DATE venue WiFi)