LASER: A Hardware/Software Approach to Accelerate Complicated Loops on CGRAs
Mahesh Balasubramanian [1,a], Shail Dave [1,b], Aviral Shrivastava [1,c] and Reiley Jeyapaul [2]
[1] Arizona State University, Tempe, AZ
[a] mbalasubramanian@asu.edu
[b] shail.dave@asu.edu
[c] aviral.shrivastava@asu.edu
[2] ARM, Cambridge, United Kingdom
reiley.jeyapaul@arm.com
ABSTRACT
Coarse-Grained Reconfigurable Arrays (CGRAs) are popular accelerators used predominantly in streaming, filtering, and decoding applications. Because of their high performance and high power efficiency, CGRAs can also be a promising solution for accelerating the loops of general-purpose applications. However, loops in general-purpose applications are often complicated: they include perfectly and imperfectly nested loops, as well as loops with nested if-then-else constructs (conditionals). We argue that existing hardware-software solutions for executing branches and conditionals are inefficient. To execute complicated loops efficiently on CGRAs, we present LASER, a hybrid hardware-software solution and a comprehensive technique for accelerating the compute-intensive loops of applications. In LASER, the compiler transforms complex loops, maps them onto the CGRA, and lays them out in memory in a specific manner so that the hardware can fetch and execute the instructions from the correct path at runtime. LASER achieves a geometric-mean performance improvement of 40.91% and a utilization of 43.43%, with 46% lower energy consumption.
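The contrast between executing both sides of a conditional and fetching only the correct path at runtime can be illustrated with a toy model. The sketch below is not LASER's compiler or hardware. It is a hypothetical Python model (the names `Instr`, `lay_out`, `run_predicated`, and `run_path_selective` are illustrative assumptions) that compares full predication, a common baseline in which every iteration issues the instructions of both branches and discards one result, with a layout in which each path's instructions sit contiguously in an instruction memory and only the taken path is fetched once the condition is known.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]


@dataclass
class Instr:
    """A toy 'instruction': a mnemonic plus its effect on a register file."""
    name: str
    op: Callable[[Regs], None]


# Hypothetical loop body: if (x is even) { y = 2*x; y = y + 1; } else { y = x - 3; }
THEN_BLOCK = [Instr("mul", lambda r: r.update(y=r["x"] * 2)),
              Instr("add", lambda r: r.update(y=r["y"] + 1))]
ELSE_BLOCK = [Instr("sub", lambda r: r.update(y=r["x"] - 3))]


def is_even(x: int) -> bool:
    return x % 2 == 0


def execute(block: List[Instr], x: int) -> Regs:
    """Run one basic block on a fresh register file and return it."""
    regs: Regs = {"x": x, "y": 0}
    for ins in block:
        ins.op(regs)
    return regs


def run_predicated(data, cond, then_block, else_block) -> Tuple[List[int], int]:
    """Full predication: both paths are issued every iteration; only the
    result of the path whose predicate holds is committed."""
    results, issued = [], 0
    for x in data:
        then_regs = execute(then_block, x)
        else_regs = execute(else_block, x)
        issued += len(then_block) + len(else_block)
        results.append(then_regs["y"] if cond(x) else else_regs["y"])
    return results, issued


def lay_out(paths: Dict[str, List[Instr]]):
    """Store each path's instructions contiguously in one instruction memory
    and record (start, length) per path so a fetch unit can jump to it."""
    memory: List[Instr] = []
    table: Dict[str, Tuple[int, int]] = {}
    for label, block in paths.items():
        table[label] = (len(memory), len(block))
        memory.extend(block)
    return memory, table


def run_path_selective(data, cond, memory, table) -> Tuple[List[int], int]:
    """Resolve the condition first, then fetch and issue only the taken path."""
    results, issued = [], 0
    for x in data:
        start, length = table["then" if cond(x) else "else"]
        results.append(execute(memory[start:start + length], x)["y"])
        issued += length
    return results, issued


if __name__ == "__main__":
    data = [0, 1, 2, 3]
    memory, table = lay_out({"then": THEN_BLOCK, "else": ELSE_BLOCK})
    print(run_predicated(data, is_even, THEN_BLOCK, ELSE_BLOCK))  # ([1, -2, 5, 0], 12)
    print(run_path_selective(data, is_even, memory, table))       # ([1, -2, 5, 0], 6)
```

For this four-iteration loop both schemes compute the same results, but the path-selective version issues half as many instructions. The actual savings on a CGRA would depend on path lengths, the mapping, and the cost of resolving the condition before fetch, none of which this model captures.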