7.2 Reconfigurable Systems and Architectures

Time	Label	Presentation Title Authors
14:30	7.2.1	A FRAMEWORK FOR ADDING LOW-OVERHEAD, FINE-GRAINED POWER DOMAINS TO CGRAS Speaker: Ankita Nayak, Stanford University, US Authors: Ankita Nayak, Keyi Zhang, Raj Setaluri, Alex Carsello, Makai Mann, Stephen Richardson, Rick Bahr, Pat Hanrahan, Mark Horowitz and Priyanka Raina, Stanford University, US Abstract To effectively minimize static power for a wide range of applications, power domains for a coarse-grained reconfigurable array (CGRA) need to be finer-grained than a typical ASIC. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that intrinsically provides boundary protection. This technique reduces the area overhead of boundary protection between power domains for the CGRA from around 9% to less than 1% and removes the delay from the isolation cells. However, with this design choice, we cannot leverage the conventional UPF-based flow to introduce power domain boundary protection. We create compiler-like passes that iteratively introduce the needed design transformations, and formally verify the passes with satisfiability modulo theories (SMT) methods. These passes also allow us to optimize how we handle test and debug signals through the off tiles. We use our framework to insert power domains into an SoC with an ARM Cortex M3 processor and a CGRA with 32x16 processing element (PE) and memory tiles and 4MB secondary memory. Depending on the size of the applications mapped, our CGRA achieves up to an 83% reduction in leakage power and 26% reduction in total power versus a CGRA without multiple power domains, for a range of image processing and machine learning applications. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	7.2.2	BLASTFUNCTION: AN FPGA-AS-A-SERVICE SYSTEM FOR ACCELERATED SERVERLESS COMPUTING Speaker: Rolando Brondolin, Politecnico di Milano, IT Authors: Marco Bacis, Rolando Brondolin and Marco D. Santambrogio, Politecnico di Milano, IT Abstract Heterogeneous computing platforms are now a valuable solution to continue to meet Service Level Agreements (SLAs) for compute intensive cloud workloads. Field Programmable Gate Arrays (FPGAs) effectively accelerate cloud workloads, however, these workloads have a spiky behavior as well as long periods of underutilization. Sharing the FPGA with multiple tenants then helps to increase the board's time utilization. In this paper we present BlastFunction, a distributed FPGA sharing system for the acceleration of microservices and serverless applications in cloud environments. BlastFunction includes a Remote OpenCL Library to access the shared devices transparently; multiple Device Managers to time-share and monitor the FPGAs and a central Accelerators Registry to allocate the available devices. BlastFunction reaches higher utilization and throughput w.r.t. a native execution thanks to device sharing, with minimal differences in latency given by the concurrent accesses. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	7.2.3	ENERGY-AWARE PLACEMENT FOR SRAM-NVM HYBRID FPGAS Speaker: Seongsik Park, Seoul National University, KR Authors: Seongsik Park, Jongwan Kim and Sungroh Yoon, Seoul National University, KR Abstract Field-programmable gate arrays (FPGAs) have been widely used in many applications due to their reconfigurability. Especially, the short development time makes the FPGAs one of the promising reconfigurable architectures for emerging applications, such as deep learning. As CMOS technology advances, however, conventional SRAM-based FPGAs have approached their limitations. To overcome these obstacles, NVM-based FPGAs have been introduced. Although NVM-based FPGAs have the features of high area density, low static power consumption, and non-volatility, they are struggling to reduce energy consumption. Their challenge is mainly caused by the access speed of NVM, which is relatively slower than SRAM. In this paper, for compensating this limitation, we suggest SRAM-NVM hybrid FPGA architecture with SRAM- and NVM-based CLBs. In addition, we propose an energy-aware placement for efficient use of the SRAM-NVM hybrid FPGAs. As a result of our experiments, we were able to reduce the average energy consumption of SRAM-NVM hybrid FPGA by 22.23% and 21.94% compared to SRAM-based FPGA on the MCNC and VTR benchmarks, respectively. Download Paper (PDF; Only available from the DATE venue WiFi)
16:00		End of session

Time

Label

Presentation Title
Authors

14:30

7.2.1

A FRAMEWORK FOR ADDING LOW-OVERHEAD, FINE-GRAINED POWER DOMAINS TO CGRAS
Speaker:
Ankita Nayak, Stanford University, US
Authors:
Ankita Nayak, Keyi Zhang, Raj Setaluri, Alex Carsello, Makai Mann, Stephen Richardson, Rick Bahr, Pat Hanrahan, Mark Horowitz and Priyanka Raina, Stanford University, US
Abstract
To effectively minimize static power for a wide range of applications, power domains for a coarse-grained reconfigurable array (CGRA) need to be finer-grained than a typical ASIC. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that intrinsically provides boundary protection. This technique reduces the area overhead of boundary protection between power domains for the CGRA from around 9% to less than 1% and removes the delay from the isolation cells. However, with this design choice, we cannot leverage the conventional UPF-based flow to introduce power domain boundary protection. We create compiler-like passes that iteratively introduce the needed design transformations, and formally verify the passes with satisfiability modulo theories (SMT) methods. These passes also allow us to optimize how we handle test and debug signals through the off tiles. We use our framework to insert power domains into an SoC with an ARM Cortex M3 processor and a CGRA with 32x16 processing element (PE) and memory tiles and 4MB secondary memory. Depending on the size of the applications mapped, our CGRA achieves up to an 83% reduction in leakage power and 26% reduction in total power versus a CGRA without multiple power domains, for a range of image processing and machine learning applications.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:00

7.2.2

BLASTFUNCTION: AN FPGA-AS-A-SERVICE SYSTEM FOR ACCELERATED SERVERLESS COMPUTING
Speaker:
Rolando Brondolin, Politecnico di Milano, IT
Authors:
Marco Bacis, Rolando Brondolin and Marco D. Santambrogio, Politecnico di Milano, IT
Abstract
Heterogeneous computing platforms are now a valuable solution to continue to meet Service Level Agreements (SLAs) for compute intensive cloud workloads. Field Programmable Gate Arrays (FPGAs) effectively accelerate cloud workloads, however, these workloads have a spiky behavior as well as long periods of underutilization. Sharing the FPGA with multiple tenants then helps to increase the board's time utilization. In this paper we present BlastFunction, a distributed FPGA sharing system for the acceleration of microservices and serverless applications in cloud environments. BlastFunction includes a Remote OpenCL Library to access the shared devices transparently; multiple Device Managers to time-share and monitor the FPGAs and a central Accelerators Registry to allocate the available devices. BlastFunction reaches higher utilization and throughput w.r.t. a native execution thanks to device sharing, with minimal differences in latency given by the concurrent accesses.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:30

7.2.3

ENERGY-AWARE PLACEMENT FOR SRAM-NVM HYBRID FPGAS
Speaker:
Seongsik Park, Seoul National University, KR
Authors:
Seongsik Park, Jongwan Kim and Sungroh Yoon, Seoul National University, KR
Abstract
Field-programmable gate arrays (FPGAs) have been widely used in many applications due to their reconfigurability. Especially, the short development time makes the FPGAs one of the promising reconfigurable architectures for emerging applications, such as deep learning. As CMOS technology advances, however, conventional SRAM-based FPGAs have approached their limitations. To overcome these obstacles, NVM-based FPGAs have been introduced. Although NVM-based FPGAs have the features of high area density, low static power consumption, and non-volatility, they are struggling to reduce energy consumption. Their challenge is mainly caused by the access speed of NVM, which is relatively slower than SRAM. In this paper, for compensating this limitation, we suggest SRAM-NVM hybrid FPGA architecture with SRAM- and NVM-based CLBs. In addition, we propose an energy-aware placement for efficient use of the SRAM-NVM hybrid FPGAs. As a result of our experiments, we were able to reduce the average energy consumption of SRAM-NVM hybrid FPGA by 22.23% and 21.94% compared to SRAM-based FPGA on the MCNC and VTR benchmarks, respectively.
Download Paper (PDF; Only available from the DATE venue WiFi)

16:00

End of session