8.7 Compilers and Tools for Performance

Printer-friendly version PDF version

Date: Wednesday 11 March 2015
Time: 17:00 - 18:30
Location / Room: Les Bans

Chair:
Frank Hannig, Friedrich-Alexander-Universität Erlangen-Nürnberg, DE

Co-Chair:
Christian Haubelt, University of Rostock, DE

This session introduces different aspects of optimizing compiler technology in the context of embedded systems. The first paper shows that the performance of Android applications can be significantly improved by ahead-of-time compilation to C. The second paper shows how to generate efficient vector code from domain-specific (linear algebra) specifications. The first interactive presentation presents a technique to speed up the QEMU machine emulator by dynamic binary translation of vector instructions. The second one proposes a trace-based reuse distance profiler to help improving the memory profile of applications.

TimeLabelPresentation Title
Authors
17:008.7.1BYTECODE-TO-C AHEAD-OF-TIME COMPILATION FOR ANDROID DALVIK VIRTUAL MACHINE
Speakers:
Hyeong-Seok Oh, Ji Hwan Yeo and Soo-Mook Moon, Seoul National University, KR
Abstract
Android employs Java for programming its apps which is executed by its own virtual machine called the Dalvik VM (DVM). One problem of the DVM is its performance. Its just-in-time compiler (JITC) cannot generate high-performance code due to its trace-based compilation with short traces and modest optimizations, compared to JVM's method-based compilation with ample optimziations. This paper proposes a bytecode-to-C ahead-of-time compilation (AOTC) for the DVM to accelerate pre-installed apps. We translated the bytecode of some of the hot methods used by these apps to C code, which is then compiled together with the DVM source code. AOTC-generated code works with the existing Android zygote mechanism, with corrects garbage collection and exception handling. Due to off-line, method-based compilation using existing compiler with full optimizations and Java-specific optimizations, AOTC can generate quality code while obviating runtime compilation overhead. For benchmarks, AOTC can improve the performance by 10% to 500%. When we compare this result with the recently-introduced ART, which also performs ahead-of-time compilation, our AOTC performs better.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:308.7.2A BASIC LINEAR ALGEBRA COMPILER FOR EMBEDDED PROCESSORS
Speakers:
Nikolaos Kyrtatas, Daniele Giuseppe Spampinato and Markus Püschel, Swiss Federal Institute of Technology in Zurich (ETHZ), CH
Abstract
Many applications in signal processing, control, and graphics on embedded devices require efficient linear algebra computations. On general-purpose computers, program generators have proven useful to produce such code, or important building blocks, automatically. An example is LGen, a compiler for basic linear algebra computations of fixed size. In this work, we extend LGen towards the embedded domain using as example targets Intel Atom, ARM Cortex-A8, ARM Cortex-A9, and ARM1176 (Raspberry Pi). To efficiently support these processors we introduce support for the NEON vector ISA and a methodology for domain-specific load/store optimizations. Our experimental evaluation shows that the new version of LGen produces code that performs in many cases considerably better than well-established, commercial and non-commercial libraries (Intel MKL and IPP), software generators (Eigen and ATLAS), and compilers (icc, gcc, and clang).

Download Paper (PDF; Only available from the DATE venue WiFi)
18:008.7.3VARSHA: VARIATION AND RELIABILITY-AWARE APPLICATION SCHEDULING WITH ADAPTIVE PARALLELISM IN THE DARK-SILICON ERA
Speakers:
Nishit Kapadia and Sudeep Pasricha, Colorado State University, US
Abstract
With deeper technology scaling accompanied by a worsening power-wall, an increasing proportion of chip area on a chip multiprocessor (CMP) is expected to be occupied by dark-silicon. At the same time, design challenges due to process variations and soft-errors in integrated circuits are projected to become even more severe. In this work, we propose a novel framework that leverages the knowledge of variations on the chip to perform runtime application mapping and dynamic voltage scaling to optimize system performance and energy, while satisfying dark-silicon power constraints of the chip as well as application-specific performance and reliability constraints. Our experimental results show average savings of 35%-80% in application service-times and 13%-15% in energy consumption, compared to the state-of-the-art.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30IP4-7, 356IMPROVING SIMD CODE GENERATION IN QEMU
Speakers:
Sheng-Yu Fu1, Jan-Jan Wu2 and Wei-Chung Hsu1
1Department of Computer Science National Taiwan University, TW; 2Institute of Information Science Academia Sinica, TW
Abstract
Modern processors are often enhanced with SIMD instructions. For examples, the MMX, SSE, and AVX instruction set in the x86 architecture, and the Neon instruction set in the ARM architecture are SIMD instructions. Using these SIMD instructions could significantly increase the performance of applications, hence application binaries are likely to have a good fraction of instructions that are SIMD instructions. However, SIMD instruction translation has not attacked much attention in Dynamic Binary Translation (DBT). For example, in the popular QEMU system emulator, guest SIMD instructions are often emulated with a sequence of scalar instructions even when the host machines do have SIMD instructions to support such parallel computation, leaving a large potential for performance enhancement. In this paper, we propose two approaches, one to leverage the existing helper function implementation in QEMU, and the other to use a newly introduced vector IR (Intermediate Representation) to enhance the performance of SIMD instructions translation in DBT of QEMU. The approaches have been implemented in the QEMU to support ARM and IA32 frontend and x86-64 backend. Our preliminary experiments show that adding vector IR can significantly enhance the performance of guest applications containing SIMD instructions for both ARM and IA32 architectures when running with QEMU on the x86-64 platform.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:31IP4-8, 442REUSE DISTANCE ANALYSIS FOR LOCALITY OPTIMIZATION IN LOOP-DOMINATED APPLICATIONS
Speakers:
Christakis Lezos, Grigoris Dimitroulakos and Konstantinos Masselos, University of Peloponnese, GR
Abstract
This paper discusses MemAddIn, a compiler assisted dynamic code analysis tool that analyzes C code and exposes critical parts for memory related optimizations on embedded systems that can heavily affect systems performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several of data reuse distance measurement algorithms have been implemented leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments offering a unified environment for application development and optimization. The novelties of our work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance for a number of different applications. Experimental results prove the effectiveness of the tool through the analysis and optimization of a realistic image processing application.

Download Paper (PDF; Only available from the DATE venue WiFi)
18:30End of session
19:30DATE Party in Museum of Grenoble (Musée de Grenoble, 5 Place de Lavalette, 38000 Grenoble, France)

Musée de Grenoble

As one of the main networking opportunities during the DATE week, the DATE Party states a perfect occasion to meet friends and colleagues in a relaxed atmosphere while enjoying local amenities. It will take place on March 11, 2015, from 19:30 to 23:00 in the renowned "Musée de Grenoble" (Grenoble Museum). This painting museum features a unique collection of ancient, modern and contemporary art including major masterpieces of classical Flemish, Dutch, Italian and Spanish painting and all the great pot-1945 contemporary art-trends, right up to the most recent artwork of the 2000s.

During this evening, you can enjoy the famous French Cuisine and outstanding wines. Discover the region of the French Alps through ist cheese and wine specialties. The dinner will be accompanied by jazz songs and instrumental music from Anna Cruz and her vocal band. Another highlight will be the show waders "THE INSEPARABLES", sweet and ephemeral characters walking through the premises, releasing dreams and laughter. Furthermore, at the very beginning of the evening, from 20h00 to 21h30, you will have the opportunity to visit parts of the permanent collection of the museum (ninetieth and twenties century).

Please kindly note that it is not a seated dinner.

All delegates, exhibitors and their guests are invited to attend the party. Please be aware that entrance is only possible with a valid party ticket. Each full conference registration includes a ticket for the DATE Party (which needs to be booked during the online registration process though). Additional tickets can be purchased on-site at the registration desk (subject to availability of tickets). Price for extra ticket: 60 € per person.

How to get there: The tram B has a stop called "Notre Dame Musee". That stop is next to the Museum. Attendees would take the tram A from Alpexpo and change for Tram B in one of the stations between "Gares" and "Maison du Tourisme" to get to the museum. The trip takes about 30 minutes.