Enhancing Multithreaded Performance of Asymmetric Multicores with SIMD Offloading

Jeckson Dellagostin Souza (1, a), Madhavan Manivannan (2, b), Miquel Pericàs (2, c) and Antonio Carlos Schneider Beck (1, d)

(1) Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
(2) Chalmers University of Technology, Gothenburg, Sweden
(a) jeckson.souza@inf.ufrgs.br
(b) madhavan@chalmers.se
(c) miquelp@chalmers.se
(d) caco@inf.ufrgs.br

ABSTRACT

Single-ISA asymmetric multicore architectures can accelerate multithreaded applications by running code that does not execute concurrently (i.e., the serial region) on a big core and the parallel region on a larger number of smaller cores. Nevertheless, in such architectures the big core still implements resource-expensive application-specific instruction extensions, such as Single Instruction Multiple Data (SIMD) and Floating-Point (FP) operations, that are rarely used while running the serial region. In this work, we propose a design in which these extensions are not implemented in the big core, thereby freeing up area and resources to increase the number of small cores in the system and potentially enhance thread-level parallelism (TLP). To handle the case in which the missing instruction extensions are required while running on the big core, we devise an approach that automatically offloads these operations to the execution units of the small cores, where the extensions are implemented and can be executed. Our evaluation shows that, for a variety of parallel applications, the proposed architecture provides an average speedup of 1.76× over a traditional single-ISA asymmetric multicore processor with the same area.

Keywords: Functional unit sharing, Offloading, SIMD, Heterogeneity, Multicore.
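
The area trade-off described in the abstract can be illustrated with a simple Amdahl-style model. This is only a sketch under stated assumptions, not the evaluation methodology of the paper: the `Chip` fields, the `offload_slowdown` factor, and all numeric values are hypothetical and chosen for illustration. Area is measured in small-core equivalents, and the serial region is assumed to run only on the big core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical chip description; every value is an illustrative assumption."""
    area_budget: float       # total area, in small-core equivalents
    big_core_area: float     # big core area, including its SIMD/FP units
    extension_area: float    # share of the big core area used by SIMD/FP units
    big_core_speedup: float  # serial speed of the big core relative to one small core


def small_core_count(chip: Chip, strip_extensions: bool) -> int:
    """Number of small cores that fit next to the big core."""
    big = chip.big_core_area
    if strip_extensions:
        big -= chip.extension_area  # area freed by removing SIMD/FP from the big core
    return int(chip.area_budget - big)


def run_time(chip: Chip, serial_fraction: float, ext_fraction: float,
             offload_slowdown: float, strip_extensions: bool) -> float:
    """Normalized execution time (1.0 = whole program on one small core)."""
    serial = serial_fraction / chip.big_core_speedup
    if strip_extensions:
        # Assumption: the part of the serial region that needs SIMD/FP is
        # offloaded to small-core units and runs `offload_slowdown` times slower.
        serial *= (1.0 - ext_fraction) + ext_fraction * offload_slowdown
    parallel = (1.0 - serial_fraction) / small_core_count(chip, strip_extensions)
    return serial + parallel


def relative_speedup(chip: Chip, serial_fraction: float, ext_fraction: float,
                     offload_slowdown: float) -> float:
    """Speedup of the stripped design over the baseline at equal area."""
    base = run_time(chip, serial_fraction, ext_fraction, offload_slowdown, False)
    prop = run_time(chip, serial_fraction, ext_fraction, offload_slowdown, True)
    return base / prop


if __name__ == "__main__":
    chip = Chip(area_budget=16, big_core_area=8, extension_area=4, big_core_speedup=2)
    print(relative_speedup(chip, serial_fraction=0.1, ext_fraction=0.05,
                           offload_slowdown=3.0))
```

In this model the proposed design wins when the extra small cores shorten the parallel region by more than offloading lengthens the serial region. A serial region that relies heavily on SIMD/FP, or a high offloading cost, reverses the outcome.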


