IP3 Interactive Presentations


Date: Wednesday 11 March 2020
Time: 16:00 - 16:30
Location / Room: Poster Area

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Label Presentation Title
Authors
IP3-1 CNT-CACHE: AN ENERGY-EFFICIENT CARBON NANOTUBE CACHE WITH ADAPTIVE ENCODING
Speaker:
Kexin Chu, School of Electronic Science & Applied Physics, Hefei University of Technology, Anhui, CN
Authors:
Dawen Xu1, Kexin Chu1, Cheng Liu2, Ying Wang2, Lei Zhang2 and Huawei Li2
1School of Electronic Science & Applied Physics, Hefei University of Technology, Anhui, CN; 2Chinese Academy of Sciences, CN
Abstract
The carbon nanotube field-effect transistor (CNFET), which promises both higher clock speed and energy efficiency, has become an attractive alternative to the conventional power-hungry CMOS cache. We observe that a CNFET-based cache constructed with typical 9T SRAM cells has distinct energy consumption when reading/writing 0 and 1 from/to it. The energy consumption of reading 0 is around 3X higher than that of reading 1. The energy consumption of writing 1 is almost 10X higher than that of writing 0. Based on this observation, we propose an energy-efficient cache design called CNT-Cache to take advantage of this feature. It includes an adaptive data encoding module that can convert the coding of each cache line to match the cache reading and writing preferences. It also has a cache line encoding direction predictor that selects the encoding direction according to the cache line access history. Combined, the two optimizations reduce the overall dynamic power consumption significantly. According to our experiments, the optimized CNFET-based L1 D-Cache reduces the dynamic power consumption by 22% on average compared to the baseline CNFET cache.
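
The encoding idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' design: the cost tables use arbitrary units and only reproduce the ratios quoted above (reading 0 about 3X reading 1, writing 1 about 10X writing 0), and the names `line_energy`, `encode_line`, `decode_line`, and the `reads`/`writes` counts (standing in for a predictor's view of the access history) are hypothetical.

```python
# Relative per-bit energy costs in arbitrary units. Only the ratios come
# from the abstract (read 0 ~3x read 1, write 1 ~10x write 0).
READ_COST = {0: 3.0, 1: 1.0}
WRITE_COST = {0: 1.0, 1: 10.0}


def line_energy(bits, reads, writes):
    """Estimated energy of `reads` reads and `writes` writes of this bit pattern."""
    read_energy = sum(READ_COST[b] for b in bits)
    write_energy = sum(WRITE_COST[b] for b in bits)
    return reads * read_energy + writes * write_energy


def encode_line(bits, reads, writes):
    """Return (inverted, stored_bits), picking the cheaper of plain and inverted storage."""
    flipped = [1 - b for b in bits]
    if line_energy(flipped, reads, writes) < line_energy(bits, reads, writes):
        return True, flipped
    return False, list(bits)


def decode_line(inverted, stored_bits):
    """Recover the original bits from the stored pattern and its inversion flag."""
    return [1 - b for b in stored_bits] if inverted else list(stored_bits)


# Hypothetical read-heavy line that is mostly zeros: storing it inverted
# turns expensive reads of 0 into cheaper reads of 1.
line = [0, 0, 0, 0, 0, 0, 1, 0]
inverted, stored = encode_line(line, reads=10, writes=1)
assert decode_line(inverted, stored) == line
```

Under this toy model a read-heavy, mostly-zero line is stored inverted, while a write-heavy line with the same contents is stored as is; the inversion flag is the extra state such a scheme would need to keep per cache line.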

Download Paper (PDF; Only available from the DATE venue WiFi)
IP3-2 ENHANCING MULTITHREADED PERFORMANCE OF ASYMMETRIC MULTICORES WITH SIMD OFFLOADING
Speaker:
Antonio Schneider Beck, Universidade Federal do Rio Grande do Sul, BR
Authors:
Jeckson Dellagostin Souza1, Madhavan Manivannan2, Miquel Pericas2 and Antonio Carlos Schneider Beck1
1Universidade Federal do Rio Grande do Sul, BR; 2Chalmers, SE
Abstract
Single-ISA asymmetric multicore architectures can accelerate multithreaded applications by running code that does not execute concurrently (i.e., the serial region) on a big core and the parallel region on a larger number of smaller cores. Nevertheless, in such architectures the big core still implements resource-expensive application-specific instruction extensions that are rarely used while running the serial region, such as Single Instruction Multiple Data (SIMD) and Floating-Point (FP) operations. In this work, we propose a design in which these extensions are not implemented in the big core, thereby freeing up area and resources to increase the number of small cores in the system and potentially enhance thread-level parallelism (TLP). To address the case in which missing instruction extensions are required while running on the big core, we devise an approach to automatically offload these operations to the execution units of the small cores, where the extensions are implemented and can be executed. Our evaluation shows that, on average, the proposed architecture provides a 1.76x speedup compared to a traditional single-ISA asymmetric multicore processor with the same area, for a variety of parallel applications.

IP3-3 HARDWARE ACCELERATION OF CNN WITH ONE-HOT QUANTIZATION OF WEIGHTS AND ACTIVATIONS
Speaker:
Gang Li, Chinese Academy of Sciences, CN
Authors:
Gang Li, Peisong Wang, Zejian Liu, Cong Leng and Jian Cheng, Chinese Academy of Sciences, CN
Abstract
In this paper, we propose a novel one-hot representation for weights and activations in CNN models and demonstrate its benefits for hardware accelerator design. Specifically, rather than merely reducing the bitwidth, we quantize both weights and activations into n-bit integers that contain only one non-zero bit per value. In this way, the massive multiply-and-accumulate operations (MACs) are equivalent to additions of powers of two, which can be efficiently calculated with histogram-based computations. Experiments on the ImageNet classification task show that our proposed One-Hot Networks (OHN) achieve accuracy comparable to conventional fixed-point networks. As case studies, we evaluate the efficacy of the one-hot data representation on two state-of-the-art CNN accelerators on FPGA; our preliminary results show that 50% and 68.5% resource savings can be achieved on DaDianNao and Laconic, respectively. In addition, the one-hot optimized Laconic can further achieve an average speedup of 4.94x on AlexNet.
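
The equivalence between MACs and histogram-based additions of powers of two can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' accelerator datapath: values are assumed to be unsigned and given directly by the position of their single non-zero bit, and the function name `one_hot_dot` is hypothetical.

```python
from collections import Counter


def one_hot_dot(weight_exps, act_exps):
    """Dot product of values 2**w and 2**a, computed from a histogram of exponent sums."""
    # 2**w * 2**a == 2**(w + a), so each product is fully described by w + a.
    histogram = Counter(w + a for w, a in zip(weight_exps, act_exps))
    # Each bin contributes count * 2**exponent; the shift replaces a multiply.
    return sum(count << exp for exp, count in histogram.items())


# Weights 1, 2, 4 and activations 2, 2, 1, given as bit positions.
weights = [0, 1, 2]
activations = [1, 1, 0]
reference = sum((1 << w) * (1 << a) for w, a in zip(weights, activations))
assert one_hot_dot(weights, activations) == reference
```

The point of the sketch is that the inner loop only adds exponents and counts occurrences; the powers of two are applied once per histogram bin rather than once per product.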

IP3-4 BNNSPLIT: BINARIZED NEURAL NETWORKS FOR EMBEDDED DISTRIBUTED FPGA-BASED COMPUTING SYSTEMS
Speaker:
Luca Stornaiuolo, Politecnico di Milano, IT
Authors:
Giorgia Fiscaletti, Marco Speziali, Luca Stornaiuolo, Marco D. Santambrogio and Donatella Sciuto, Politecnico di Milano, IT
Abstract
In the past few years, Convolutional Neural Networks (CNNs) have seen a massive improvement, outperforming other visual recognition algorithms. Since they are playing an increasingly important role in fields such as face recognition, augmented reality, and autonomous driving, there is a growing need for a fast and efficient system to perform the redundant and heavy computations of CNNs. This trend has led researchers towards heterogeneous systems equipped with hardware accelerators, such as GPUs and FPGAs. The vast majority of CNNs are implemented with floating-point parameters and operations, but research has shown that high classification accuracy can also be obtained by reducing the floating-point activations and weights to binary values. This context is well suited to FPGAs, which are known to stand out in terms of performance when dealing with binary operations, as demonstrated in Finn, the state-of-the-art framework for building Binarized Neural Network (BNN) accelerators on FPGAs. In this paper, we propose a framework that extends Finn to a distributed scenario, enabling BNN implementation on embedded multi-FPGA systems.

IP3-5 L2L: A HIGHLY ACCURATE LOG_2_LEAD QUANTIZATION OF PRE-TRAINED NEURAL NETWORKS
Speaker:
Salim Ullah, TU Dresden, DE
Authors:
Salim Ullah1, Siddharth Gupta2, Kapil Ahuja2, Aruna Tiwari2 and Akash Kumar1
1TU Dresden, DE; 2IIT Indore, IN
Abstract
Deep neural networks are a machine learning technique increasingly used in a variety of applications. However, the high memory and computation demands of deep neural networks often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different types of data quantization schemes. However, most of these techniques either require post-quantization retraining of deep neural networks or incur a significant loss in output accuracy. In this paper, we propose a novel quantization technique for parameters of pre-trained deep neural networks. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Compared to the single-precision floating-point implementation, our proposed 8-bit quantization technique incurs only ∼1% and ∼0.4% loss in the top-1 and top-5 accuracies, respectively, for the VGG16 network on the ImageNet dataset.

IP3-6 FAULT DIAGNOSIS OF VIA-SWITCH CROSSBAR IN NON-VOLATILE FPGA
Speaker:
Ryutaro Doi, Osaka University, JP
Authors:
Ryutaro DOI1, Xu Bai2, Toshitsugu Sakamoto2 and Masanori Hashimoto1
1Osaka University, JP; 2NEC Corporation, JP
Abstract
An FPGA that exploits via-switches, a kind of non-volatile resistive RAM, for crossbar implementation is attracting attention due to its high integration density and energy efficiency. The via-switch crossbar is responsible for signal routing by changing the on/off-states of via-switches. To verify the via-switch crossbar functionality after manufacturing, fault testing that checks whether via-switches can be turned on/off normally is essential. This paper confirms that a general differential pair comparator successfully discriminates the on/off-states of via-switches, and clarifies the fault modes of a via-switch by transistor-level SPICE simulation that injects stuck-on/off faults into the atom switches and varistors, where a via-switch consists of two atom switches and two varistors. We then propose a fault diagnosis methodology that diagnoses the fault modes of each via-switch using the difference in comparator response between normal and faulty via-switches. The proposed method achieves 100% fault detection by checking the comparator responses after turning the via-switch on/off. When the number of faulty components in a via-switch is one, the fault diagnosis ratio, which measures how often the faulty varistor and atom switch inside the faulty via-switch are exactly identified, is 100%; with up to two faults, the fault diagnosis ratio is 79%.

IP3-7 APPLYING RESERVATION-BASED SCHEDULING TO A µC-BASED HYPERVISOR: AN INDUSTRIAL CASE STUDY
Speaker:
Dirk Ziegenbein, Robert Bosch GmbH, DE
Authors:
Dakshina Dasari1, Paul Austin2, Michael Pressler1, Arne Hamann1 and Dirk Ziegenbein1
1Robert Bosch GmbH, DE; 2ETAS GmbH, GB
Abstract
Existing software scheduling mechanisms do not suffice for emerging applications in the automotive space, which have the conflicting needs of performance and predictability. We need mechanisms that lend themselves naturally to this requirement, by virtue of their design. As a concrete case, we consider the ETAS light-weight hypervisor (LWHVR), a commercially viable solution in the automotive industry, deployed on multicore microcontrollers. We describe the architecture of the hypervisor and its current scheduling mechanisms based on Time Division Multiplexing. We next show how Reservation-based Scheduling (RBS) can be implemented in the ETAS LWHVR to use resources efficiently while also providing freedom from interference, and we explore design choices towards an efficient implementation of such a scheduler. With experiments from an industry use case, we also compare the performance of RBS and the existing scheduler in the hypervisor.

IP3-8 REAL-TIME ENERGY MONITORING IN IOT-ENABLED MOBILE DEVICES
Speaker:
Nitin Shivaraman, TUMCREATE, SG
Authors:
Nitin Shivaraman1, Seima Suriyasekaran1, Zhiwei Liu2, Saravanan Ramanathan1, Arvind Easwaran2 and Sebastian Steinhorst3
1TUMCREATE, SG; 2Nanyang Technological University, SG; 3TU Munich, DE
Abstract
With rapid advancements in the Internet of Things (IoT) paradigm, every electrical device in the near future is expected to have IoT capabilities. This enables fine-grained tracking of individual energy consumption data of such devices, offering location-independent per-device billing and demand management. Hence, it abstracts from the location-based metering of state-of-the-art infrastructure, which traditionally aggregates on a building or household level, defining the entity to be billed. However, such in-device energy metering is susceptible to manipulation and fraud. As a remedy, we propose a secure decentralized metering architecture that enables devices with IoT capabilities to measure their own energy consumption. In this architecture, the device-level consumption is additionally reported to a system-level aggregator that verifies distributed information from our decentralized metering systems and provides secure data storage using Blockchain, preventing data manipulation by untrusted entities. Through experimental evaluation, we show that the proposed architecture supports device mobility and enables location-independent monitoring of energy consumption.

IP3-9 TOWARDS SPECIFICATION AND TESTING OF RISC-V ISA COMPLIANCE
Speaker:
Vladimir Herdt, University of Bremen, DE
Authors:
Vladimir Herdt1, Daniel Grosse2 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen / DFKI, DE
Abstract
Compliance testing for RISC-V is very important. Therefore, an official hand-written compliance test-suite is being actively developed. However, this requires significant manual effort, in particular to achieve high test coverage. In this paper, we propose a test-suite specification mechanism in combination with a first set of instruction constraints and coverage requirements for the base RISC-V ISA. In addition, we present an automated method to generate a test-suite that satisfies the specification. Our evaluation demonstrates the effectiveness and potential of our method.

IP3-10 POST-SILICON VALIDATION OF THE IBM POWER9 PROCESSOR
Speaker:
Hillel Mendelson, IBM, IL
Authors:
Tom Kolan1, Hillel Mendelson1, Vitali Sokhin1, Kevin Reick2, Elena Tsanko2 and Gregory Wetli2
1IBM Research, IL; 2IBM Systems, US
Abstract
Due to the complexity of designs, post-silicon validation remains a major challenge with few systematic solutions. We provide an overview of the state-of-the-art post-silicon validation process used by IBM to verify its latest IBM POWER9 processor. During the POWER9 post-silicon validation, we detected and handled 30% more logic bugs in 80% of the time, as compared to the previous IBM POWER8 bring-up. This improvement is the result of lessons learned from previous designs, leading to numerous innovations. We provide bug analysis data and compare it to POWER8 results. We demonstrate our methodology by describing several bugs from fail detection to root cause.

IP3-11 ON THE TASK MAPPING AND SCHEDULING FOR DAG-BASED EMBEDDED VISION APPLICATIONS ON HETEROGENEOUS MULTI/MANY-CORE ARCHITECTURES
Speaker:
Nicola Bombieri, Università di Verona, IT
Authors:
Stefano Aldegheri1, Nicola Bombieri1 and Hiren Patel2
1Università di Verona, IT; 2University of Waterloo, CA
Abstract
In this work, we show that applying the heterogeneous earliest finish time (HEFT) heuristic to the task scheduling of embedded vision applications can improve system performance by up to 70% with respect to state-of-the-art scheduling solutions. We propose an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between application primitives to improve load balancing. We show that XEFT can improve system performance by up to 33% over HEFT and 82% over the state-of-the-art approaches. We present results on different benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with the NVIDIA object detection application based on deep learning.

IP3-12 ARE CLOUD FPGAS REALLY VULNERABLE TO POWER ANALYSIS ATTACKS?
Speaker:
Ognjen Glamocanin, EPFL, CH
Authors:
Ognjen Glamocanin1, Louis Coulon1, Francesco Regazzoni2 and Mirjana Stojilovic1
1EPFL, CH; 2ALaRI, CH
Abstract
Recent works have demonstrated the possibility of extracting secrets from a cryptographic core running on an FPGA by means of remote power analysis attacks. To mount these attacks, an adversary implements a voltage fluctuation sensor in the FPGA logic, records the power consumption of the target cryptographic core, and recovers the secret key by running a power analysis attack on the recorded traces. Despite showing that the power analysis could also be performed without physical access to the cryptographic core, these works were mostly carried out on dedicated FPGA boards in a controlled environment, leaving open the question about the possibility to successfully mount these attacks on a real system deployed in the cloud. In this paper, we demonstrate, for the first time, a successful key recovery attack on an AES cryptographic accelerator running on an Amazon EC2 F1 instance. We collect the power traces using a delay-line based voltage drop sensor, adapted to the Xilinx Virtex Ultrascale+ architecture used on Amazon EC2 F1, where CARRY8 blocks do not have a monotonic delay increase at their outputs. Our results demonstrate that security concerns raised by multitenant FPGAs are indeed valid and that countermeasures should be put in place to mitigate them.

IP3-13 EFFICIENT TRAINING ON EDGE DEVICES USING ONLINE QUANTIZATION
Speaker:
Michael Ostertag, University of California, San Diego, US
Authors:
Michael Ostertag1, Sarah Al-Doweesh2 and Tajana Rosing1
1University of California, San Diego, US; 2King Abdulaziz City of Science and Technology, SA
Abstract
Sensor-specific calibration functions offer superior performance over global models and single-step calibration procedures but require prohibitive levels of sampling in the input feature space. Sensor self-calibration, which gathers training data through collaborative calibration or self-analysis of predictive results, allows these sensors to gather sufficient information. Resource-constrained edge devices are then stuck between high communication costs for transmitting training data to a centralized server and high memory requirements for storing data locally. We propose online dataset quantization, which maximizes the diversity of input features while maintaining a representative set of data from a larger stream of training data points. We test the effectiveness of online dataset quantization on two real-world datasets: air quality calibration and power prediction modeling. Online dataset quantization outperforms reservoir sampling and performs on par with offline methods.
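
For context on the baseline named in this abstract, the following is a minimal Python sketch of classic reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. It illustrates only the baseline, not the proposed online dataset quantization, and the function name `reservoir_sample` and its parameters are chosen here for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream of 1000 readings reduced to a 5-item buffer.
sample = reservoir_sample(range(1000), k=5, rng=random.Random(0))
assert len(sample) == 5
```

Reservoir sampling bounds memory to k items but selects them without regard to their feature values, which is the property a diversity-maximizing selection scheme is meant to improve on.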

IP3-14 MULTI-AGENT ACTOR-CRITIC METHOD FOR JOINT DUTY-CYCLE AND TRANSMISSION POWER CONTROL
Speaker:
Sota Sawaguchi, CEA-Leti, FR
Authors:
Sota Sawaguchi1, Jean-Frédéric Christmann2, Anca Molnos2, Carolynn Bernier2 and Suzanne Lesecq2
1CEA, FR; 2CEA-Leti, FR
Abstract
Energy-harvesting Internet of Things (EH-IoT) wireless networks have gained attention due to their infinite operation and maintenance-free properties. However, maintaining energy neutral operation (ENO) of EH-IoT devices, such that the harvested and consumed energy are matched during a certain time period, is crucial. Guaranteeing this ENO condition and an optimal power-performance trade-off under various workloads and transient wireless channel quality is particularly challenging. This paper proposes a multi-agent actor-critic method for modulating both the transmission duty-cycle and the transmitter output power, using the state-of-buffer (SoB) and state-of-charge (SoC) information as the state. Thanks to these buffers, system uncertainties, especially in harvested energy and wireless link conditions, are addressed effectively. Unlike the state of the art, our solution does not require any model of the wireless transceiver or any measurement of wireless channel quality. Simulation results of a solar-powered EH-IoT node using real-life outdoor solar irradiance data show that the proposed method achieves better performance without system failures throughout a year, compared to the state of the art, which suffers some system downtime. Our approach also predicts almost no system failures during five years of operation. This proves that our approach can adapt to changes in energy harvesting and wireless channel quality, all without direct observations.
