Date: Wednesday 27 March 2019
Time: 16:00 - 16:30
Location / Room: Poster Area
Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session
Label | Presentation Title Authors |
---|---|
IP3-1 | NON-INTRUSIVE SELF-TEST LIBRARY FOR AUTOMOTIVE CRITICAL APPLICATIONS: CONSTRAINTS AND SOLUTIONS Speaker: Davide Piumatti, Politecnico di Torino, IT Authors: Paolo Bernardi1, Riccardo Cantoro1, Andrea Floridia1, Davide Piumatti1, Cozmin Pogonea1, Annachiara Ruospo1, Ernesto Sanchez1, Sergio De Luca2 and Alessandro Sansonetti2 1Politecnico di Torino, IT; 2STMicroelectronics, IT Abstract Today, safety-critical applications require self-tests and self-diagnosis approaches to be applied during the lifetime of the device. In general, the fault coverage values required by the standards (like ISO 26262) in the whole System-on-Chip (SoC) are very high. Therefore, different strategies are adopted. In the case of the processor core, the required fault coverage can be achieved by scheduling the periodical execution of a set of test programs or Software-Test Library (STL). However, the STL for in-field testing should be able to comply with the operating system specifications without affecting the mission operation of the device application. In this paper, the most relevant problems for the development of the STL are first discussed. Then, it presents a set of strategies and solutions oriented to produce an efficient and non-intrusive STL to be used exclusively during the in-field testing of automotive processor cores. The proposed approach was experimented on an automotive SoC developed by STMicroelectronics. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-2 | DEPENDENCY-RESOLVING INTRA-UNIT PIPELINE ARCHITECTURE FOR HIGH-THROUGHPUT MULTIPLIERS Speaker: Dae Hyun Kim, Washington State University, US Authors: Jihee Seo and Dae Hyun Kim, Washington State University, US Abstract In this paper, we propose two dependency-resolving intra-unit pipeline architectures to design high-throughput multipliers. Simulation results show that the proposed multipliers achieve approximately 2.3× to 3.1× execution time reduction at a cost of 4.4% area and 3.7% power overheads for highly-dependent multiplications. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-3 | A HARDWARE-EFFICIENT LOGARITHMIC MULTIPLIER WITH IMPROVED ACCURACY Authors: Mohammad Saeed Ansari, Bruce Cockburn and Jie Han, University of Alberta, CA Abstract Logarithmic multipliers take the base-2 logarithm of the operands and perform multiplication by only using shift and addition operations. Since computing the logarithm is often an approximate process, some accuracy loss is inevitable in such designs. However, the area, latency, and power consumption can be significantly improved at the cost of accuracy loss. This paper presents a novel method to approximate log_2N that, unlike the existing approaches, rounds N to its nearest power of two instead of the highest power of two smaller than or equal to N. This approximation technique is then used to design two improved 16x16 logarithmic multipliers that use exact and approximate adders (ILM-EA and ILM-AA, respectively). These multipliers achieve up to 24.42% and 9.82% savings in area and power-delay product, respectively, compared to the state-of-the-art design in the literature with similar accuracy. The proposed designs are evaluated in the Joint Photographic Experts Group (JPEG) image compression algorithm and their advantages over other approximate logarithmic multipliers are shown. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-4 | LIGHTWEIGHT HARDWARE SUPPORT FOR SELECTIVE COHERENCE IN HETEROGENEOUS MANYCORE ACCELERATORS Speaker: Alessandro Cilardo, CeRICT, IT Authors: Alessandro Cilardo, Mirko Gagliardi and Vincenzo Scotti, University of Naples Federico II, IT Abstract Shared memory coherence is a key feature in manycore accelerators, ensuring programmability and application portability. Most established solutions for coherence in homogeneous systems cannot be simply reused because of the special requirements of accelerator architectures. This paper introduces a low-overhead hardware coherence system for heterogeneous accelerators, with customizable granularity and noncoherent region support. The coherence system has been demonstrated in operation in a full manycore accelerator, exhibiting significant improvements in terms of network load, execution time, and power consumption. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-5 | FUNCTIONAL ANALYSIS ATTACKS ON LOGIC LOCKING Speaker: Pramod Subramanyan, Indian Institute of Technology Kanpur, IN Authors: Deepak Sirone and Pramod Subramanyan, Indian Institute of Technology Kanpur, IN Abstract This paper proposes Functional Analysis attacks on state of the art Logic Locking algorithms (FALL attacks). FALL attacks use structural and functional analyses of locked circuits to identify the locking key. In contrast to past work, FALL attacks can often (90% of successful attempts in our experiments) fully defeat locking by only analyzing the locked netlist, without oracle access to an activated circuit. Experiments show that FALL attacks succeed against 65 out of 80 (81%) of circuits locked using Secure Function Logic Locking (SFLL), the only combinational logic locking algorithm resilient to known attacks. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-6 | SIGATTACK: NEW HIGH-LEVEL SAT-BASED ATTACK ON LOGIC ENCRYPTIONS Speaker: Hai Zhou, Northwestern University, US Authors: Yuanqi Shen1, You Li1, Shuyu Kong2, Amin Rezaei1 and Hai Zhou1 1Northwestern University, US; 2northwestern university, CN Abstract Logic encryption is a powerful hardware protection technique that uses extra key inputs to lock a circuit from piracy or unauthorized use. The recent discovery of the SAT-based attack with Distinguishing Input Pattern (DIP) generation has rendered all traditional logic encryptions vulnerable, and thus the creation of new encryption methods. However, a critical question for any new encryption method is whether security against the DIP-generation attack means security against all other attacks. In this paper, a new high-level SAT-based attack called SigAttack has been discovered and thoroughly investigated. It is based on extracting a key-revealing signature in the encryption. A majority of all known SAT-resilient encryptions are shown to be vulnerable to SigAttack. By formulating the condition under which SigAttack is effective, the paper also provides guidance for the future logic encryption design. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-7 | ZEROPOWERTOUCH: ZERO-POWER SMART RECEIVER FOR TOUCH COMMUNICATION AND SENSING FOR INTERNET OF THING AND WEARABLE APPLICATIONS Speaker: Michele Magno, ETH Zurich, CH Authors: Philipp Mayer, Raphael Strebel and Michele Magno, ETH Zurich, CH Abstract The human body can be used as a transmission medium for electric fields. By applying an electric field with a frequency of decades of megahertz to isolated electrodes on the human body, it is possible to send energy and data. Extra body and intra-body communication is an interesting alternative way to communicate in a wireless manner in the new era of wearable device and internet of things. In fact, this promising communication works without the need to design a dedicate radio hardware and with a lower power consumption. We designed and implemented a novel zero-power receiver targeting intra-body and extra-body wireless communication and touch sensing. To achieve zero-power and always-on working, we combined ultra-low power design and an energy-harvesting subsystem, which extracts energy directly from the received message. This energy is then employed to supply the whole receiver to demodulate the message and to perform data processing with a digital logic. The main goal of the proposed design is ideal to wake up external logic only when a specific address is received. Moreover, due to the presence of the digital logic, the designed zero-power receiver can implement identification and security algorithms. The zero-power receiver can be used either as an always-on touch sensor to be deployed in the field or as a body communication wake up smart and secure devices. A working prototype demonstrates the zero-power working, the communication intra-body, and extra-body, and the possibility to achieve more than 1.75m in intra-body without the use of any external battery. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-8 | TAILORING SVM INFERENCE FOR RESOURCE-EFFICIENT ECG-BASED EPILEPSY MONITORS Speaker: Lorenzo Ferretti, Università della Svizzera italiana, CH Authors: Lorenzo Ferretti1, Giovanni Ansaloni1, Laura Pozzi1, Amir Aminifar2, David Atienza2, Leila Cammoun3 and Philippe Ryvlin3 1USI Lugano, CH; 2École Polytechnique Fédérale de Lausanne (EPFL), CH; 3Centre Hospitalier Universitaire Vaudois, CH Abstract Event detection and classification algorithms are resilient towards aggressive resource-aware optimisations. In this paper, we leverage this characteristic in the context of smart health monitoring systems. In more detail, we study the attainable benefits resulting from tailoring Support Vector Machine (SVM) inference engines devoted to the detection of epileptic seizures from ECG-derived features. We conceive and explore multiple optimisations, each effectively reducing resource budgets while minimally impacting classification performance. These strategies can be seamlessly combined, which results in 12.5X and 16X gains in energy and area, respectively, with a negligible loss, 3.2% in classification performance. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-9 | AN INDOOR LOCALIZATION SYSTEM TO DETECT AREAS CAUSING THE FREEZING OF GAIT IN PARKINSONIANS Speaker: Graziano Pravadelli, Dept. of Computer Science, Univ. of Verona, IT Authors: Florenc Demrozi1, Vladislav Bragoi1, Federico Tramarin2 and Graziano Pravadelli1 1Department of Computer Science, University of Verona, IT; 2Department of Information Engineering, University of Padua, IT Abstract People affected by the Parkinson's disease are often subject to episodes of Freezing of Gait (FoG) near specific areas within their environment. In order to prevent such episodes, this paper presents a low-cost indoor localization system specifically designed to identify these critical areas. The final aim is to exploit the output of this system within a wearable device, to generate a rhythmic stimuli able to prevent the FoG when the person enters a risky area. The proposed localization system is based on a classification engine, which uses a fingerprinting phase for the initial training. It is then dynamically adjusted by exploiting a probabilistic graph model of the environment. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-10 | ASSEMBLY-RELATED CHIP/PACKAGE CO-DESIGN OF HETEROGENEOUS SYSTEMS MANUFACTURED BY MICRO-TRANSFER PRINTING Speaker: Tilman Horst, Dresden University of Technology, DE Authors: Robert Fischbach, Tilman Horst and Jens Lienig, Dresden University of Technology, DE Abstract Technologies for heterogeneous integration have been promoted as an option to drive innovation in the semiconductor industry. However, adoption by designers is lagging behind and market shares are still low. Alongside the lack of appropriate design tools, high manufacturing costs are one of the main reasons. µTP is a novel and promising micro-assembly technology that enables the heterogeneous integration of dies originating from different wafers. This technology uses an elastomer stamp to transfer dies in parallel from source wafers to their target positions, indicating a high potential for reducing manufacturing time and cost. In order to achieve the latter, the geometrical interdependencies between source, target and stamp and the resulting wafer utilization must be considered during design. We propose an approach to evaluate a given µTP design with regard to the manufacturing costs. We achieve this by developing a model that integrates characteristics of the assembly process into the cost function of the design. Our approach can be used as a template how to tackle other assembly-related co-design issues -- addressing an increasingly severe cost optimization problem of heterogeneous systems design. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-11 | VISUAL INERTIAL ODOMETRY AT THE EDGE - A HARDWARE-SOFTWARE CO-DESIGN APPROACH FOR ULTRA-LOW LATENCY AND POWER Speaker: Dipan Kumar Mandal, Intel Corporation, IN Authors: Dipan Kumar Mandal1, Srivatsava Jandhyala1, Om J Omer1, Gurpreet S Kalsi1, Biji George1, Gopi Neela1, Santhosh Kumar Rethinagiri1, Sreenivas Subramoney1, Hong Wong2, Lance Hacking2 and Belliappa Kuttanna2 1Intel Corporation, IN; 2Intel Corporation, US Abstract Visual Inertial Odometry (VIO) is used for estimating pose and trajectory of a system and is a foundational requirement in many emerging applications like AR/VR, autonomous navigation in cars, drones and robots. In this paper, we analyze key compute bottlenecks in VIO and present a highly optimized VIO accelerator based on a hardware-software co-design approach. We detail a set of novel micro-architectural techniques that optimize compute, data movement, bandwidth and dynamic power to make it possible to deliver high quality of VIO at ultra-low latency and power required for budget constrained edge devices. By offloading the computation of the critical linear algebra algorithms from the CPU, the accelerator enables high sample rate IMU usage in VIO processing while acceleration of image processing pipe increases precision, robustness and reduces IMU induced drift in final pose estimate. The proposed accelerator requires a small silicon footprint (1.3 mm2 in a 28nm process at 600 MHz), utilizes a modest on-chip shared SRAM (560KB) and achieves 10x speedup over a software-only implementation in terms of image sample-based pose update latency while consuming just 2.2 mW power. In a FPGA implementation, using the EuRoC VIO dataset (VGA 30fps images and 100Hz IMU) the accelerator design achieves pose estimation accuracy (loop closure error) comparable to a software based VIO implementation. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-12 | CAPSACC: AN EFFICIENT HARDWARE ACCELERATOR FOR CAPSULENETS WITH DATA REUSE Speaker: Alberto Marchisio, Vienna University of Technology (TU Wien), AT Authors: Alberto Marchisio, Muhammad Abdullah Hanif and Muhammad Shafique, Vienna University of Technology (TU Wien), AT Abstract Recently, CapsuleNets have overtaken traditional Deep Convolutional Neural Networks (CNNs), because of their improved generalization ability due to the multi-dimensional capsules, in contrast to the single-dimensional neurons. Consequently, CapsuleNets also require extremely intense matrix computations, making it a gigantic challenge to achieve high performance. In this paper, we propose CapsAcc, the first specialized CMOS-based hardware architecture to perform CapsuleNets inference with high performance and energy efficiency. State-of-the-art convolutional CNN accelerators would not work efficiently for CapsuleNets, as their designs do not account for unique processing nature of CapsuleNets involving multi-dimensional matrix processing, squashing and dynamic routing. Our architecture exploits the massive parallelism by flexibly feeding the data to a specialized systolic array according to the operations required in different layers. It also avoids extensive load and store operations on the on-chip memory, by reusing the data when possible. We synthesized the complete CapsAcc architecture in a 32nm CMOS technology using Synopsys design tools, and evaluated it for the MNIST benchmark (as also done by the original CapsuleNet paper) to ensure consistent and fair comparisons. This work enables highly-efficient CapsuleNets inference on embedded platforms. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-13 | SDCNN: AN EFFICIENT SPARSE DECONVOLUTIONAL NEURAL NETWORK ACCELERATOR ON FPGA Speaker: Suk-Ju Kang, Sogang University, KR Authors: Jung-Woo Chang, Keon-Woo Kang and Suk-Ju Kang, Sogang University, KR Abstract Generative adversarial networks (GANs) have shown excellent performance in image generation applications. GAN typically uses a new type of neural network called deconvolutional neural network (DCNN). To implement DCNN in hardware, the state-of-the-art DCNN accelerator optimizes the dataflow using DCNN-to-CNN conversion method. However, this method still requires high computational complexity because the number of feature maps is increased when converted from DCNN to CNN. Recently, pruning has been recognized as an effective solution to reduce the high computational complexity and huge network model size. In this paper, we propose a novel sparse DCNN accelerator (SDCNN) combining these approaches on FPGA. First, we propose a novel dataflow suitable for the sparse DCNN acceleration by loop transformation. Then, we introduce a four stage pipeline for generating the SDCNN model. Finally, we propose an efficient architecture based on SDCNN dataflow. Experimental results on DCGAN show that SDCNN achieves up to 2.63 times speedup over the state-of-the-art DCNN accelerator. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-14 | A FINE-GRAINED SOFT ERROR RESILIENT ARCHITECTURE UNDER POWER CONSIDERATIONS Speaker: Sajjad Hussain, Chair for Embedded Systems, KIT, Karlsruhe, DE Authors: Sajjad Hussain1, Muhammad Shafique2 and Joerg Henkel1 1Karlsruhe Institute of Technology, DE; 2Vienna University of Technology (TU Wien), AT Abstract Besides the limited power budgets and the dark-silicon issue, soft error is one of the most critical reliability issues in computing systems fabricated using nano-scale devices. During the execution, different applications have varying performance, power/energy consumption and vulnerability properties. Different trade-offs can be devised to provide required resiliency within the allowed power constraints. To exploit this behavior, we propose a novel soft error resilient architecture and the corresponding run-time system that enables power-aware fine-grained resiliency for different processor components. It selectively determines the reliability state of various components, such that the overall application reliability is improved under a given power budget. Our architecture saves power up to 16% and reliability degradation up to 11% compared to state-of-the-art techniques. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-15 | FINE-GRAINED HARDWARE MITIGATION FOR MULTIPLE LONG-DURATION TRANSIENTS ON VLIW FUNCTION UNITS Speaker: Angeliki Kritikakou, University of Rennes 1 - IRISA/INRIA, FR Authors: Rafail Psiakis1, Angeliki Kritikakou1 and Olivier Sentieys2 1Univ Rennes/IRISA/INRIA, FR; 2INRIA, FR Abstract Technology scaling makes hardware more susceptible to radiation, which can cause multiple transient faults with long duration. In these cases, the affected function unit is usually considered as faulty and is not further used. To reduce this performance degradation, the proposed hardware mechanism detects the faults that are still active during execution and re-schedules the instructions to use the fault-free components of the affected function units. The results show multiple long-duration fault mitigation with low performance, area, and power overhead Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-16 | ADAPTIVE WORD REORDERING FOR LOW-POWER INTER-CHIP COMMUNICATION Speaker: Eleni Maragkoudaki, University of Manchester, GB Authors: Eleni Maragkoudaki1, Przemyslaw Mroszczyk2 and Vasilis Pavlidis3 1University of Manchester, GB; 2Qualcomm, IE; 3The University of Manchester, GB Abstract The energy for data transfer has an increasing effect on the total system energy as technology scales, often overtaking computation energy. To reduce the power of inter-chip interconnects, an adaptive encoding scheme called Adaptive Word Reordering (AWR) is proposed that effectively decreases the number of signal transitions, leading to a significant power reduction. AWR outperforms other adaptive encoding schemes in terms of decrease in transitions, yielding up to 73% reduction in switching. Furthermore, complex bit transition computations are represented as delays in the time domain to limit the power overhead due to encoding. The saved power outweighs the overhead beyond a moderate wire length where the I/O voltage is assumed equal to the core voltage. For a typical I/O voltage, the decrease in power is significant reaching 23% at just 1 mm. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-17 | MACHINE-LEARNING-DRIVEN MATRIX ORDERING FOR POWER GRID ANALYSIS Speaker: Wenjian Yu, Tsinghua University, CN Authors: Ganqu Cui1, Wenjian Yu1, Xin Li2, Zhiyu Zeng3 and Ben Gu3 1Tsinghua University, CN; 2Duke University, US; 3Cadence Design Systems, Inc., US Abstract A machine-learning-driven approach for matrix ordering is proposed for power grid analysis based on domain decomposition. It utilizes support vector machine or artificial neural network to learn a classifier to automatically choose the optimal ordering algorithm, thereby reducing the expense of solving the subdomain equations. Based on the feature selection considering sparse matrix properties, the proposed method achieves superior efficiency in runtime and memory usage over conventional methods, as demonstrated by industrial test cases. Download Paper (PDF; Only available from the DATE venue WiFi) |
IP3-18 | ASSERTION-BASED VERIFICATION THROUGH BINARY INSTRUMENTATION Speaker: Laurence Pierre, Univ. Grenoble Alpes, FR Authors: Enzo Brignon and Laurence Pierre, TIMA Lab (Univ. Grenoble Alpes, CNRS, Grenoble INP), FR Abstract Verifying the correctness and the reliability of C or C++ embedded software is a crucial issue. To alleviate this verification process, we advocate runtime assertion-based verification of formal properties. Such logic and temporal properties can be specified using the IEEE standard PSL (Property Specification Language) and automatically translated into software assertion checkers. A major issue is the instrumentation of the embedded program so that those assertion checkers will be triggered upon specific events during execution. This paper presents an automatic instrumentation solution for object files, which enables such an event-driven property evaluation. It also reports experimental results for different kinds of applications and properties. Download Paper (PDF; Only available from the DATE venue WiFi) |