8.4 Applications of Reconfigurable Computing

Time

Label

Presentation Title
Authors

17:00

8.4.1

ADAPTIVE VEHICLE DETECTION FOR REAL-TIME AUTONOMOUS DRIVING SYSTEM
Speaker:
Maryam Hemmati, The University of Auckland, NZ
Authors:
Maryam Hemmati¹, Morteza Biglari-Abhari¹ and Smail Niar²
¹University of Auckland, NZ; ²University of Valenciennes and Hainaut-Cambresis, FR
Abstract
Modern cars are being equipped with powerful computational resources for autonomous driving systems (ADS), one of their major components, to make road travel safer. The high accuracy and real-time requirements of ADS are addressed by a HW/SW co-design methodology, which offloads the computationally intensive tasks to hardware. However, limited hardware resources can be a constraint in complex systems. This paper presents a dynamically reconfigurable system for ADS that is capable of real-time vehicle and pedestrian detection. Our approach employs different vehicle detection methods in different lighting conditions to achieve better results. A novel deep learning method is presented for detecting vehicles in dark conditions, where road lighting is very limited or unavailable. We present a partial reconfiguration (PR) controller that accelerates the reconfiguration process on a Zynq SoC for seamless detection in real-time applications. By partially reconfiguring the vehicle detection block on the Zynq SoC, resource requirements are kept low enough to leave room for other ADS functions in hardware, which can complete their tasks without interruption. The presented system detects pedestrians and vehicles in different lighting conditions at 50 fps (frames per second) for HDTV (1080x1920) frames.
Download Paper (PDF; Only available from the DATE venue WiFi)

17:30

8.4.2

AN EFFICIENT FPGA-BASED FLOATING RANDOM WALK SOLVER FOR CAPACITANCE EXTRACTION USING SDACCEL
Speaker:
Xin Wei, Fudan University, CN
Authors:
Xin Wei¹, Changhao Yan¹, Hai Zhou², Dian Zhou¹ and Xuan Zeng¹
¹Fudan University, CN; ²Northwestern University, US
Abstract
The floating random walk (FRW) algorithm is an important method widely used in the capacitance extraction of very large-scale integration (VLSI) interconnects. FRW can be both time-consuming and power-consuming as the circuit scale grows. However, its highly parallel nature prompts us to accelerate it with FPGAs, which have shown great performance and energy efficiency potential compared to other computing architectures. In this paper, we propose a scalable FPGA/CPU heterogeneous framework for FRW using SDAccel. Large-scale circuits are first partitioned by the CPU into several segments, and these segments are then sent to the FPGA for random walking one by one. The framework addresses the challenge of limited FPGA on-chip resources and combines the merits of FPGAs and CPUs by mapping separate parts of the algorithm to the most suitable architecture, and the FPGA bitstream is built once and for all. Several kernel optimization strategies are used to maximize FPGA performance. In addition, the FRW algorithm we use is the naive version with walking on spheres (WOS), which is much simpler and easier to implement than the heavily optimized version with walking on cubes (WOC). The implementation on AWS EC2 F1 (Xilinx VU9P FPGA) shows up to 6.1x performance and 42.6x energy efficiency over a quad-core CPU, and 5.2x energy efficiency over the state-of-the-art WOC implementation on an 8-core CPU.
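As an illustration of the walking-on-spheres idea named in the abstract, the following is a minimal Python sketch, not the authors' FPGA kernel. It assumes a toy geometry that does not come from the paper: two infinite parallel plates at z = 0 (0 V) and z = 1 (1 V) in a homogeneous dielectric. Each walk hops to a random point on the largest sphere that touches no conductor and stops once it is within `eps` of a plate; the average of the plate potentials reached estimates the potential at the start point. All function names and parameters are hypothetical, and a real capacitance extractor would additionally launch walks from a Gaussian surface and weight them.

```python
import math
import random


def random_unit_vector(rng):
    """Return a uniformly distributed direction on the unit sphere."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return x / norm, y / norm, z / norm


def walk_on_spheres(z0, rng, eps=1e-4):
    """Run one walk between the plates; return the potential of the plate hit."""
    z = z0
    while True:
        # Radius of the largest conductor-free sphere centered at the walker.
        radius = min(z, 1.0 - z)
        if radius < eps:
            return 0.0 if z < 0.5 else 1.0
        # For infinite plates only the z coordinate of the hop matters.
        _, _, dz = random_unit_vector(rng)
        z += radius * dz


def estimate_potential(z0, n_walks=20000, seed=0):
    """Monte Carlo estimate of the potential at height z0 (toy assumption)."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(z0, rng) for _ in range(n_walks))
    return total / n_walks
```

For this toy geometry the exact potential at height z0 is z0 itself, so `estimate_potential(0.3)` should land close to 0.3 up to Monte Carlo noise. Every walk is independent of the others, which is the parallelism the abstract refers to.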
Download Paper (PDF; Only available from the DATE venue WiFi)

18:00

8.4.3

ACCELERATING ITEMSET SAMPLING USING SATISFIABILITY CONSTRAINTS ON FPGA
Speaker:
Mael Gueguen, Univ Rennes, Inria, CNRS, IRISA, FR
Authors:
Mael Gueguen¹, Olivier Sentieys² and Alexandre Termier¹
¹Univ Rennes, CNRS, IRISA, FR; ²INRIA, FR
Abstract
Finding recurrent patterns within a data stream is important for fields as diverse as cybersecurity and e-commerce. This requires pattern mining techniques. However, pattern mining suffers from two issues. The first one, known as "pattern explosion", comes from the large combinatorial space explored and produces too many results for them to be useful. Recent techniques called output space sampling solve this problem by outputting only a sampled set of all the results, with a target size provided by the user. The second issue is that most algorithms are designed to operate on static datasets or low-throughput streams. In this paper, we propose a contribution that tackles both issues by designing an FPGA accelerator for pattern mining with output space sampling, and we show that our accelerator can outperform a state-of-the-art implementation on a server-class CPU using a modest FPGA product.
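The idea of returning only a sample of user-chosen size can be sketched in Python with classic reservoir sampling. This is a swapped-in, generic technique used purely for illustration: the abstract does not describe the accelerator's sampling scheme, and the title suggests it relies on satisfiability constraints instead. The function name and parameters below are hypothetical.

```python
import random


def reservoir_sample(pattern_stream, target_size, seed=None):
    """Keep a uniform random sample of at most target_size patterns.

    Illustrative stand-in (Algorithm R), not the paper's method: every
    pattern seen so far has the same chance of being in the reservoir.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, pattern in enumerate(pattern_stream):
        if index < target_size:
            # Fill the reservoir with the first target_size patterns.
            reservoir.append(pattern)
        else:
            # Replace a stored pattern with probability target_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < target_size:
                reservoir[slot] = pattern
    return reservoir
```

Note that this sketch still visits every pattern, so it bounds only the size of the output, not the mining work itself.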
Download Paper (PDF; Only available from the DATE venue WiFi)

18:30

IP4-1, 492

AN EFFICIENT MAPPING APPROACH TO LARGE-SCALE DNNS ON MULTI-FPGA ARCHITECTURES
Speaker:
Jiaxi Zhang, Peking University, CN
Authors:
Wentai Zhang¹, Jiaxi Zhang¹, Minghua Shen², Guojie Luo¹ and Nong Xiao³
¹Peking University, CN; ²Sun Yat-sen University, CN; ³Sun Yat-sen University, CN
Abstract
FPGAs are very attractive for accelerating deep neural networks (DNNs). While a single FPGA can provide good performance for small-scale DNNs, support for large-scale DNNs is limited due to their higher resource demand. In this paper, we propose an efficient mapping approach for accelerating large-scale DNNs on asymmetric multi-FPGA architectures. In this approach, the neural network mapping is formulated as a resource allocation problem. We design a dynamic programming-based partitioning to solve this problem optimally. Experimental results using the large-scale ResNet-152 demonstrate that our approach deploys sixteen FPGAs to provide an advantage of 16.4x GOPS over the state-of-the-art work.
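A dynamic programming-based partitioning of this kind can be sketched in Python under a simplified cost model that is an assumption here, not the paper's formulation: the network is a chain of layers, each FPGA receives a contiguous range of layers in a fixed board order, each board has its own speed (the asymmetric part), and the goal is to minimize the time of the slowest pipeline stage. The names `partition_layers`, `layer_ops`, and `fpga_speed` are hypothetical.

```python
def partition_layers(layer_ops, fpga_speed):
    """Split a chain of layers across FPGAs, minimizing the slowest stage.

    layer_ops[i]  : work of layer i (hypothetical unit, e.g. operations)
    fpga_speed[j] : throughput of FPGA j (same unit per second)
    Returns (bottleneck_time, cuts), where cuts[j] = (start, end) is the
    half-open layer range assigned to FPGA j.
    """
    n, k = len(layer_ops), len(fpga_speed)
    prefix = [0]
    for ops in layer_ops:
        prefix.append(prefix[-1] + ops)

    inf = float("inf")
    # best[j][i]: minimal bottleneck when the first i layers use the first j FPGAs.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    choice = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(n + 1):
            for s in range(i + 1):  # FPGA j-1 takes layers s..i-1 (may be empty)
                if best[j - 1][s] == inf:
                    continue
                stage_time = (prefix[i] - prefix[s]) / fpga_speed[j - 1]
                candidate = max(best[j - 1][s], stage_time)
                if candidate < best[j][i]:
                    best[j][i] = candidate
                    choice[j][i] = s

    # Walk the recorded choices backwards to recover the layer ranges.
    cuts = []
    i = n
    for j in range(k, 0, -1):
        s = choice[j][i]
        cuts.append((s, i))
        i = s
    cuts.reverse()
    return best[k][n], cuts
```

For example, `partition_layers([4, 2, 6, 3, 5], [1.0, 2.0])` returns `(7.0, [(0, 2), (2, 5)])`: the slower board takes the first two layers and the faster board takes the remaining three. The table has (k+1)(n+1) entries and each is filled in O(n) time, so the optimum under this model is found without enumerating every possible split.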
Download Paper (PDF; Only available from the DATE venue WiFi)

18:30

End of session