8.4 Applications of Reconfigurable Computing


Date: Wednesday 27 March 2019
Time: 17:00 - 18:30
Location / Room: Room 4

Chair:
Suhaib Fahmy, University of Warwick, UK

Co-Chair:
Marco Platzner, Paderborn University, DE

This session presents three papers that advance the state of the art in FPGA-based applications for autonomous driving, circuit analysis, and time-series data processing, and one interactive presentation on mapping deep neural networks to multi-FPGA platforms.

Time | Label | Presentation Title
Authors
17:00 | 8.4.1 | ADAPTIVE VEHICLE DETECTION FOR REAL-TIME AUTONOMOUS DRIVING SYSTEM
Speaker:
Maryam Hemmati, The University of Auckland, NZ
Authors:
Maryam Hemmati1, Morteza Biglari-Abhari1 and Smail Niar2
1University of Auckland, NZ; 2University of Valenciennes and Hainaut-Cambresis, FR
Abstract
Modern cars are being equipped with powerful computational resources for autonomous driving systems (ADS), one of their major components, to make road travel safer. The high accuracy and real-time requirements of ADS are addressed by a HW/SW co-design methodology, which offloads computationally intensive tasks to hardware. However, limited hardware resources can become a bottleneck in complex systems. This paper presents a dynamically reconfigurable system for ADS that is capable of real-time vehicle and pedestrian detection. Our approach employs different vehicle detection methods in different lighting conditions to achieve better results. A novel deep learning method is presented for detecting vehicles in dark conditions, where road lighting is very limited or unavailable. We present a partial reconfiguration (PR) controller which accelerates the reconfiguration process on the Zynq SoC for seamless detection in real-time applications. By partially reconfiguring the vehicle detection block on the Zynq SoC, resource requirements are kept low enough to leave room in hardware for other ADS functions, which can complete their tasks without interruption. The presented system detects pedestrians and vehicles in different lighting conditions at a rate of 50 fps (frames per second) for HDTV (1080x1920) frames.

17:30 | 8.4.2 | AN EFFICIENT FPGA-BASED FLOATING RANDOM WALK SOLVER FOR CAPACITANCE EXTRACTION USING SDACCEL
Speaker:
Xin Wei, Fudan University, CN
Authors:
Xin Wei1, Changhao Yan1, Hai Zhou2, Dian Zhou1 and Xuan Zeng1
1Fudan University, CN; 2Northwestern University, US
Abstract
The floating random walk (FRW) algorithm is an important method widely used in the capacitance extraction of very large-scale integration (VLSI) interconnects. FRW can be both time-consuming and power-consuming as the circuit scale grows. However, its highly parallel nature prompts us to accelerate it with FPGAs, which have shown great performance and energy efficiency potential compared to other computing architectures. In this paper, we propose a scalable FPGA/CPU heterogeneous framework for FRW using SDAccel. Large-scale circuits are first partitioned by the CPU into several segments, and these segments are then sent to the FPGA one by one for random walking. The framework addresses the challenge of limited FPGA on-chip resources and combines the merits of FPGAs and CPUs by mapping separate parts of the algorithm to the architecture that suits them, and the FPGA bitstream is built only once. Several kernel optimization strategies are used to maximize FPGA performance. In addition, the FRW algorithm we use is the naive version with walking on spheres (WOS), which is much simpler and easier to implement than the heavily optimized version with walking on cubes (WOC). The implementation on AWS EC2 F1 (Xilinx VU9P FPGA) shows up to 6.1x performance and 42.6x energy efficiency over a quad-core CPU, and 5.2x energy efficiency over the state-of-the-art WOC implementation on an 8-core CPU.
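
To illustrate the walking-on-spheres idea named in the abstract, the following is a minimal sketch and not the authors' solver. It assumes a toy problem: Laplace's equation on the 2D unit disk with known boundary values, rather than capacitance extraction on a VLSI layout. The function names, the `eps` stopping threshold, and the walk count are all hypothetical choices made for this example.

```python
import math
import random


def walk_on_spheres(x, y, boundary_value, eps=1e-3, rng=random):
    """Run one walk inside the unit disk and return the boundary value where it ends.

    Toy setting (an assumption of this sketch): the domain is the unit disk,
    so the distance to the boundary is simply 1 - |p|.
    """
    while True:
        r = math.hypot(x, y)
        d = 1.0 - r  # radius of the largest circle around (x, y) inside the disk
        if d < eps:
            # Close enough: project onto the boundary and read the value there.
            return boundary_value(x / r, y / r)
        # Jump to a uniformly random point on that largest circle.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += d * math.cos(theta)
        y += d * math.sin(theta)


def estimate_potential(x, y, boundary_value, n_walks=20000, seed=0):
    """Average many independent walks to estimate the harmonic function at (x, y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        total += walk_on_spheres(x, y, boundary_value, rng=rng)
    return total / n_walks


if __name__ == "__main__":
    # With boundary values g(x, y) = x, the exact solution inside the disk is u = x.
    print(estimate_potential(0.3, 0.2, lambda bx, by: bx))
```

Each walk is independent of the others, which is the kind of parallelism the abstract refers to when motivating FPGA acceleration.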

18:00 | 8.4.3 | ACCELERATING ITEMSET SAMPLING USING SATISFIABILITY CONSTRAINTS ON FPGA
Speaker:
Mael Gueguen, Univ Rennes, Inria, CNRS, IRISA, FR
Authors:
Mael Gueguen1, Olivier Sentieys2 and Alexandre Termier1
1Univ Rennes, CNRS, IRISA, FR; 2INRIA, FR
Abstract
Finding recurrent patterns within a data stream is important for fields as diverse as cybersecurity and e-commerce. This requires pattern mining techniques. However, pattern mining suffers from two issues. The first, known as "pattern explosion", stems from the large combinatorial space explored and produces too many results for them to be useful. Recent techniques called output space sampling solve this problem by outputting only a sampled subset of the results, with a target size provided by the user. The second issue is that most algorithms are designed to operate on static datasets or low-throughput streams. In this paper, we tackle both issues by designing an FPGA accelerator for pattern mining with output space sampling, and we show that our accelerator can outperform a state-of-the-art implementation on a server-class CPU using a modest FPGA product.
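
As a rough illustration of output space sampling, the sketch below thins the set of frequent itemsets with random parity (XOR) constraints until at most a user-given target number remain. The abstract does not describe the constraints the accelerator uses, so the choice of XOR constraints is an assumption of this example, as are all function names. The sketch also enumerates every frequent itemset by brute force and filters afterwards, which is only feasible for tiny inputs; it shows the sampling idea, not an efficient or hardware-oriented design.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand_set = set(candidate)
            support = sum(1 for t in transactions if cand_set <= t)
            if support >= min_support:
                result.append(frozenset(candidate))
    return result


def random_xor_constraint(items, rng):
    """Pick each item with probability 1/2, plus a random required parity bit."""
    subset = frozenset(i for i in items if rng.random() < 0.5)
    parity = rng.randrange(2)
    return subset, parity


def satisfies(itemset, constraint):
    """An itemset passes if it shares the required parity of items with the constraint."""
    subset, parity = constraint
    return len(itemset & subset) % 2 == parity


def sample_patterns(transactions, min_support, target_size, seed=0):
    """Add random XOR constraints until at most target_size patterns survive.

    Each constraint removes about half of the survivors on average, so the
    result may hold fewer than target_size patterns (possibly none).
    """
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    survivors = frequent_itemsets(transactions, min_support)
    while len(survivors) > target_size:
        constraint = random_xor_constraint(items, rng)
        survivors = [p for p in survivors if satisfies(p, constraint)]
    return survivors


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(sample_patterns(data, min_support=2, target_size=3))
```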

18:30 | IP4-1, 492 | AN EFFICIENT MAPPING APPROACH TO LARGE-SCALE DNNS ON MULTI-FPGA ARCHITECTURES
Speaker:
Jiaxi Zhang, Peking University, CN
Authors:
Wentai Zhang1, Jiaxi Zhang1, Minghua Shen2, Guojie Luo1 and Nong Xiao3
1Peking University, CN; 2Sun Yat-sen University, CN; 3Sun Yat-sen University, CN
Abstract
FPGAs are attractive for accelerating deep neural networks (DNNs). While a single FPGA can provide good performance for small-scale DNNs, support for large-scale DNNs is limited by their higher resource demand. In this paper, we propose an efficient mapping approach for accelerating large-scale DNNs on asymmetric multi-FPGA architectures. In this approach, the neural network mapping is formulated as a resource allocation problem. We design a dynamic programming-based partitioning algorithm to solve this problem optimally. Experimental results using the large-scale ResNet-152 demonstrate that our approach deploys sixteen FPGAs to achieve a 16.4x GOPS advantage over the state-of-the-art work.
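
The abstract does not give the exact formulation, so the following is only a sketch of how a dynamic programming-based partitioning could look under simplifying assumptions: the network is treated as a linear chain of layers with known compute costs, each FPGA receives one contiguous, non-empty group of layers in a fixed device order, and each device has a relative speed factor to model asymmetry. The objective chosen here, minimising the slowest pipeline stage, is likewise an assumption, and the names `partition_layers`, `costs`, and `speeds` are hypothetical.

```python
def partition_layers(costs, speeds):
    """Split a chain of layers into len(speeds) contiguous stages.

    costs[i]  -- assumed compute cost of layer i
    speeds[j] -- assumed relative speed of FPGA j (stage j runs on FPGA j)
    Returns (bottleneck_time, [(start, end), ...]) with half-open layer ranges.
    """
    n, k = len(costs), len(speeds)
    if not 1 <= k <= n:
        raise ValueError("need at least one FPGA and at least one layer per FPGA")

    # prefix[i] is the total cost of layers 0..i-1.
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[j][i]: smallest achievable bottleneck for the first i layers on the first j FPGAs.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for m in range(j - 1, i):
                # Layers m..i-1 form stage j; earlier layers use the first j-1 FPGAs.
                stage_time = (prefix[i] - prefix[m]) / speeds[j - 1]
                candidate = max(best[j - 1][m], stage_time)
                if candidate < best[j][i]:
                    best[j][i] = candidate
                    cut[j][i] = m

    # Walk the stored cut points backwards to recover the stage boundaries.
    stages = []
    i = n
    for j in range(k, 0, -1):
        m = cut[j][i]
        stages.append((m, i))
        i = m
    stages.reverse()
    return best[k][n], stages


if __name__ == "__main__":
    print(partition_layers([4, 2, 3, 7, 1, 5], [1, 1, 1]))
    print(partition_layers([2, 4, 6], [1, 2]))
```

The triple loop costs O(k n^2) time for n layers and k devices, which is small for a chain of a few hundred layers.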

18:30 | End of session