Date: Thursday 27 March 2014
Time: 10:00 - 10:30
Location / Room: Conference Level, foyer
Interactive Presentations run simultaneously during a 30-minute slot. A poster associated to the IP paper is on display throughout the morning. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. At the end of each afternoon Interactive Presentations session the award 'Best IP of the Day' is given.
Label | Presentation Title Authors |
---|---|
IP4-1 | A MULTIPLE FAULT INJECTION METHODOLOGY BASED ON CONE PARTITIONING TOWARDS RTL MODELING OF LASER ATTACKS Speakers: Athanasios Papadimitriou1, David Hely1, Vincent Beroulle1, Paolo Maistri2 and Regis Leveugle3 1LCIS Laboratory - Grenoble INP, FR; 2TIMA Laboratory / CNRS, FR; 3TIMA Laboratory / Grenoble INP, FR Abstract Laser attacks, especially on circuits manufactured with recent deep submicron semiconductor technologies, pose a threat to secure integrated circuits due to the multiplicity of errors induced by a single attack. An efficient way to neutralize such effects is the design of appropriate countermeasures, according to the circuit implementation and characteristics. Therefore tools which allow the early evaluation of security implementations are necessary. Our efforts involve the development of an RTL fault injection approach more representative of laser attacks than random multi-bit fault injections and the utilization and evolution of state of the art emulation techniques to reduce the duration of the fault injection campaigns. This will ultimately lead to the design and validation of new countermeasures against laser attacks, on ASICs implementing cryptographic algorithms. |
IP4-2 | ENERGY EFFICIENT DATA FLOW TRANSFORMATION FOR GIVENS ROTATION BASED QR DECOMPOSITION Speakers: Namita Sharma1, Preeti Ranjan Panda1, Min Li2, Prashant Agrawal2 and Francky Catthoor2 1Indian Institute of Technology Delhi, IN; 2IMEC, BE Abstract QR Decomposition (QRD) is a typical matrix decomposition algorithm that shares many common features with other algorithms such as LU and Cholesky decomposition. The principle can be realized in a large number of valid processing sequences that differ significantly in the number of memory accesses and computations, and hence, the overall implementation energy. With modern low power embedded processors evolving towards register files with wide memory interfaces and vector functional units (FUs), the data flow in matrix decomposition algorithms needs to be carefully devised to achieve energy efficient implementation. In this paper, we present an efficient data flow transformation strategy for the Givens Rotation based QRD that optimizes data memory accesses. We also explore different possible implementations for QRD of multiple matrices using the SIMD feature of the processor. With the proposed data flow transformation, a reduction of up to 36% is achieved in the overall energy over conventional QRD sequences. |
IP4-3 | MODE-CONTROLLED DATAFLOW BASED MODELING & ANALYSIS OF A 4G-LTE RECEIVER Speakers: Hrishikesh Salunkhe1, Orlando Moreira2 and Kees van Berkel3 1PhD Candidate, NL; 2Principal DSP Systems Engineer, NL; 3Prof. Dr., NL Abstract Today's smartphones and tablets contain multiple cellular modems to support 2G/3G/4G standards, including Long Term Evolution (LTE). They run on complex multi-processor hardware platforms and have to meet hard real-time constraints. Dataflow modeling can be used to design an LTE receiver. Static dataflow allows a rich set of analysis techniques, but is too restrictive to model the dynamic behavior in many realistic applications, including LTE receivers. Dynamic dataflow allows modeling of many realistic applications, but does not support rigorous temporal analysis. Mode-Controlled Dataflow (MCDF) is a restricted form of dynamic dataflow, and allows the same analysis techniques as static dataflow, in principle. We prove that MCDF is sufficiently expressive to handle the dynamic behavior of a realistic LTE receiver, by systematically and stepwise developing a complete MCDF model for an LTE receiver. |
IP4-4 | MODEL-BASED ACTOR MULTIPLEXING WITH APPLICATION TO COMPLEX COMMUNICATION PROTOCOLS Speakers: Christian Zebelein1, Christian Haubelt1, Joachim Falk2, Tobias Schwarzer2 and Jürgen Teich2 1University of Rostock, DE; 2University of Erlangen-Nuremberg, DE Abstract We propose a dynamic scheduling approach for the concurrent execution of logical actor instances on a single synthesized actor instance. Based on a formal dataflow model of computation, the proposed approach can be applied to a wide range of applications in a model-based design flow. As case-study, we evaluate a bus-cycle-accurate SystemC RTL model based on an InfiniBand network adapter in a PCI Express system. |
IP4-5 | A NOVEL MODEL FOR SYSTEM-LEVEL DECISION MAKING WITH COMBINED ASP AND SMT SOLVING Speakers: Alexander Biewer1, Jens Gladigau1 and Christian Haubelt2 1Robert Bosch GmbH, DE; 2University of Rostock, DE Abstract In this paper, we present a novel model enabling system-level decision making for time-triggered many-core architectures in automotive systems. The proposed application model includes shared data entities that need to be bound to memories during decision making. As a key enabler to our approach, we explicitly separate computation and shared memory communication over a network-on-chip (NoC). To deal with contention on a NoC, we model the necessary basis to implement a time-triggered schedule that guarantees freedom of interference. We compute fundamental design decisions, namely (a) spatial binding, (b) multi-hop routing, and (c) time-triggered scheduling, by a novel coupling of answer set programming (ASP) with satisfiability modulo theories (SMT) solvers. First results of an automotive case study demonstrate the applicability of our method for complex real-world applications. |
IP4-6 | DESPERATE: SPEEDING-UP DESIGN SPACE EXPLORATION BY USING PREDICTIVE SIMULATION SCHEDULING Speakers: Giovanni Mariani, Gianluca Palermo, Vittorio Zaccaria and Cristina Silvano, Politecnico di Milano, IT Abstract Design Space Exploration (DSE) is the problem to find the best architecture configuration in a platform based design problem. To accurately evaluate a configuration, computational expensive simulations are required. A common approach to reduce DSE execution time is to use analytic performance prediction models to approximate some of the required simulations, thus to prune the design space by removing bad configuration candidates. In this paper we will demonstrate that state of the art analytic techniques to speedup the DSE process are not capable to fully exploit the potentialities of a parallel simulation environment. We will demonstrate that, when different simulations can be run in parallel, predicting simulation time to better schedule the simulations on the parallel simulation environment is a more profitable approach with a speedup of more than 2x when compared to state of the art approaches. |
IP4-7 | COMIK: A PREDICTABLE AND CYCLE-ACCURATELY COMPOSABLE REAL-TIME MICROKERNEL Speakers: Andrew Nelson1, Ashkan Beyranvand Nejad1, Anca Molnos2, Martijn Koedam3 and Kees Goossens3 1TU Delft, NL; 2CEA Leti, FR; 3TU Eindhoven, NL Abstract The functionality of embedded systems is ever increasing. This has lead to mixed time-criticality systems, where applications with a variety of real-time requirements co-exist on the same platform and share resources. Due to inter-application interference, verifying the real-time requirements of such systems is generally non trivial. In this paper, we present the CoMik microkernel that provides temporally predictable and composable processor virtualisation. CoMik's virtual processors are cycle-accurately composable, i.e. their timing cannot affect the timing of co-existing virtual processors by even a single cycle. Real-time applications executing on dedicated virtual processors can therefore be verified and executed in isolation, simplifying the verification of mixed time-criticality systems. We demonstrate these properties through experimentation on an FPGA prototyped hardware platform. |
IP4-8 | UTILIZATION-AWARE LOAD BALANCING FOR THE ENERGY EFFICIENT OPERATION ON THE BIG.LITTLE PROCESSOR Speakers: Myungsun Kim1, Kibeom Kim2, James Geraci1 and Seongsoo Hong3 1Samsung Electronics, KR; 2SAMSUNG Electronics, KR; 3Seoul National University, KR Abstract ARM's big.LITTLE architecture introduces the opportunity to optimize power consumption by selecting the core type most suitable for a level of processing demand. To take advantage of this new axis of optimization, we introduce the processor utilization factor into the Linux kernel's load balancing algorithm after carefully analyzing the power management mechanism of the big.LITTLE processor's port of Linux and deriving its state diagram representation. Our mechanism improves the Linux kernel's ability to assign tasks to cores in an energy efficient manner without having to make it directly aware of the available core types. Our experiments with a real test bed show that our algorithm improves energy consumption over the standard Linux scheduler up to 11.35% with almost no corresponding reduction in performance. |
IP4-9 | HEVCDTM: APPLICATION-DRIVEN DYNAMIC THERMAL MANAGEMENT FOR HIGH EFFICIENCY VIDEO CODING Speakers: Daniel Palomino1, Muhammad Shafique2, Hussam Amrouch2, Altamiro Susin3 and Jörg Henkel2 1Karlsruhe Institute of Technology (KIT), BR; 2Karlsruhe Institute of Technology (KIT), DE; 3Federal University of Rio Grande do Sul, BR Abstract This paper presents an application-driven algorithm for Dynamic Thermal Management (DTM) for the High Efficiency Video Coding (HEVC). For efficient design of such a DTM policy, we perform an offline thermal analysis of an HEVC encoder and demonstrate the impact of different video sequences and different coding configurations on the processor temperature. Our thermal analysis is leveraged to develop an efficient application-driven DTM policy that performs temperature-aware coding along with an application-driven control of DTM knobs (e.g., frequency scaling) in order to meet the temperature constraints while still providing high video quality (i.e. PSNR loss < 0.01dB). For accurate thermal analysis and evaluation, we deploy an infrared camera-based thermal measurement setup that, on the contrary to state-of-the-art setups, does not require adding any extra layer on top of the measured chip, thus allowing the camera to accurately capture the infrared emissions from the die. |
IP4-10 | IMPROVING EFFICIENCY OF EXTENSIBLE PROCESSORS BY USING APPROXIMATE CUSTOM INSTRUCTIONS Speakers: Mehdi Kamal1, Amin Ghasem Azar1, Ali Afzali-Kusha1 and Massoud Pedram2 1University of Tehran, IR; 2University of Southern California, US Abstract In this paper, we propose to move the conventional extensible processor design flow to the approximate computing domain to gain more speedup. In this domain, the instruction set architecture (ISA) design flow selects both exact and approximate custom instructions (CIs). The proposed approach could be used for the applications where imprecise results may be tolerated. In the CI identification phase of the flow, the CIs which do not satisfy the maximum propagation delay but can provide approximate results also may be included in the CI candidate set. Next, in the selection phase, we propose a merit function which selects CIs with higher cycle savings and small error rates. The efficacy of the proposed approximate design flow is investigated using the case studies of the discrete cosine transform (DCT) and inverse DCT (iDCT) of the MPEG2 application. Also, the impact of the process variation on the impreciseness of the results is investigated. |
IP4-11 | PROBABILISTIC STANDARD CELL MODELING CONSIDERING NON-GAUSSIAN PARAMETERS AND CORRELATIONS Speakers: André Lange1, Christoph Sohrmann1, Roland Jancke1, Joachim Haase1, Ingolf Lorenz2 and Ulf Schlichtmann3 1Fraunhofer Institute for Integrated Circuits (IIS), Design Automation Division (EAS), DE; 2GLOBALFOUNDRIES Inc., DE; 3Technische Universität München, DE Abstract Variability continues to pose challenges to integrated circuit design. With statistical static timing analysis and high-yield estimation methods, solutions to particular problems exist, but they do not allow a common view on performance variability including potentially correlated and non-Gaussian parameter distributions. In this paper, we present a probabilistic approach for variability modeling as an alternative: model parameters are treated as multi-dimensional random variables. Such a fully multivariate statistical description formally accounts for correlations and non-Gaussian random components. Statistical characterization and model application are introduced for standard cells and gate-level digital circuits. Example analyses of circuitry in a 28 nm industrial technology illustrate the capabilities of our modeling approach. |
IP4-12 | DYNAMIC CONSTRUCTION OF CIRCUITS FOR REACTIVE TRAFFIC IN HOMOGENEOUS CMPS Speakers: Marta Ortín-Obón1, Darío Suárez-Gracia Suárez-Gracia1, María Villaroya-Gaudó1, Cruz Izu2 and Víctor Viñals-Yúfera1 1University of Zaragoza, ES; 2University of Adelaide, AU Abstract Networks on Chip (NoCs) have a large impact on system performance, area and energy. Considering the characteristics of the memory subsystem while designing the NoC helps identify improvement opportunities and build more efficient designs. Leveraging the frequent request-reply pattern, our proposal dynamically builds the reply path in advance, is able to share circuits between messages, and even removes some implicit replies, significantly reducing NoC latency. A careful implementation of this circuit reservation mechanism achieves an average 17% reduction in router energy consumption, 8% smaller router area and a 2% system performance increase, compared with its baseline counterpart. |
IP4-13 | IMPROVING HAMILTONIAN-BASED ROUTING METHODS FOR ON-CHIP NETWORKS: A TURN MODEL APPROACH Speakers: Poona Bahrebar and Dirk Stroobandt, Ghent University, BE Abstract The overall performance of Multi-Processor System-on-Chip (MPSoC) platforms depends highly on the efficient communication among their cores in the Network-on-Chip (NoC). Routing algorithms are responsible for the on-chip communication and traffic distribution through the network. Hence, designing efficient and high-performance routing algorithms is of significant importance. In this paper, a deadlock-free and highly adaptive path-based routing method is proposed without using virtual channels. This method strives to exploit the maximum number of minimal paths between any source and destination pair. The simulation results in terms of performance and power consumption demonstrate that the proposed method significantly outperforms the other adaptive and non-adaptive schemes. This efficiency is achieved by reducing the number of hotspots and smoothly distributing the traffic across the network. |
IP4-14 | EDA TOOLS TRUST EVALUATION THROUGH SECURITY PROPERTY PROOFS Speaker: Yier Jin, The University of Central Florida, US Abstract The security concerns of EDA tools have long been ignored because IC designers and integrators only focus on their functionality and performance. This lack of trusted EDA tools hampers hardware security researchers' efforts to design trusted integrated circuits. To address this concern, a novel EDA tools trust evaluation framework has been proposed to ensure the trustworthiness of EDA tools through its functional operation, rather than scrutinizing the software code. As a result, the newly proposed framework lowers the evaluation cost and is a better fit for hardware security researchers. To support the EDA tools evaluation framework, a new gate-level information assurance scheme is developed for security property checking on any gate-level netlist. Helped by the gate-level scheme, we expand the territory of proof-carrying based IP protection from RT-level designs to gate-level netlist, so that most of the commercially trading third-party IP cores are under the protection of proof-carrying based security properties. Using a sample AES encryption core, we successfully prove the trustworthiness of Synopsys Design Compiler in generating a synthesized netlist. |
IP4-15 | (Best Paper Award Candidate) ANALYSIS AND EVALUATION OF PER-FLOW DELAY BOUND FOR MULTIPLEXING MODELS Speakers: Yanchen Long1, Zhonghai Lu2 and Xiaolang Yan3 1Zhejiang University and KTH Royal Institute of Technology, SE; 2KTH Royal Institute of Technology, SE; 3Zhejiang University, CN Abstract Multiplexing models are common in resource sharing communication media such as buses, crossbars and networks. While sending packets over a multiplexing node, the packet delay bound can be computed using network calculus models. The tightness of such delay bound remains an open problem. This paper studies the multiplexing models for weighted round robin scheduling with different traffic arrival curves, and analyzes per-flow packet delay bounds with different service properties. We empirically evaluate the tightness of the delay bounds. Our results show the quality of different analysis models, and how influential each parameter is to tightness. |
IP4-17 | AGING-AWARE STANDARD CELL LIBRARY DESIGN Speakers: Saman Kiamehr1, Farshad Firouzi1, Mojtaba Ebrahimi2 and Mehdi Tahoori2 1Karlsruhe Institute of Technology (KIT), DE; 2Karlsruhe Institute of Technology, DE Abstract Transistor aging, mostly due to Bias Temperature Instability (BTI), is one of the major unreliability sources at nano-scale technology nodes. BTI causes the circuit delay to increase and eventually leads to a decrease in the circuit lifetime. Typically, standard cells in the library are optimized according to the design time delay, however, due to the asymmetric effect of BTI, the rise and fall delays might become significantly imbalanced over the lifetime. In this paper, the BTI effect is mitigated by balancing the rise and fall delays of the standard cells at the excepted lifetime. We find an optimal trade-off between the increase in the size of the library and the lifetime improvement (timing margin reduction) by non-uniform extension of the library cells for various ranges of the input signal probabilities. The simulation results reveal that our technique can prolong the circuit lifetime by around 150% with a negligible area overhead. |
IP4-18 | PASS-XNOR LOGIC: A NEW LOGIC STYLE FOR P-N JUNCTION BASED GRAPHENE CIRCUITS Speakers: Valerio Tenace, Andrea Calimera, Enrico Macii and Massimo Poncino, Politecnico di Torino, IT Abstract In this work we introduce a new logic style for p-n junctions based digital graphene circuits: the pass-XNOR logic style. The latter enables the realization of compact, energy efficient circuits that better exploit the characteristics of graphene. We first show how a single p-n junction can be conceived as a pass-XNOR gate, i.e., a transmission gate with embedded logic functionality, the XNOR Boolean operator. Secondly, we propose a smart integration strategy in which series/parallel connections of pass-XNOR gates allow to implement AND/OR logical conjunctions, and, therefore, all possible truth tables. Experimental results conducted on a set of representative logic functions show the superior of pass-XNOR logic circuits w.r.t. standard CMOS circuits and graphene circuits that use p-n junctions in a complementary-like structure. |
IP4-19 | MIXED ALLOCATION OF ADJUSTABLE DELAY BUFFERS COMBINED WITH BUFFER SIZING IN CLOCK TREE SYNTHESIS OF MULTIPLE POWER MODE DESIGNS Speakers: Kitae Park, Geunho Kim and Taewhan Kim, Seoul National University, KR Abstract Recently, many works have shown that adjustable delay buffer (ADB) whose delay is adjustable dynamically can effectively solve the clock skew variation problem in the designs with multiple power modes. However, all the previous works of ADB allocation inherently entail two critical limitations, which are the adjusted delays by ADB are always increments and the low cost buffer sizing has never been or not been primarily taken into account. To demonstrate how much overcoming the two limitations is effective in resolving the clock skew constraint, we characterize the two types of ADBs called CADB (capacitor based ADB) and IADB (inverter based ADB) and show that the adjusted delays by IADB can be decremented, and show that the clock skew violation in some clock trees of multiple power modes can be resolved by applying buffer sizing together with using only a small number of IADBs and CADBs. |