A Time‐Multiplexed FPGA Overlay with Linear Interconnect
Xiangwei Li1,a, Abhishek Kumar Jain2, Douglas L. Maskell1,b and Suhaib A. Fahmy3
1Nanyang Technological University, Singapore
a xli045@ntu.edu.sg
b asdouglas@ntu.edu.sg
2Lawrence Livermore National Laboratory, United States
jain7@llnl.gov
3School of Engineering, University of Warwick, United Kingdom
s.fahmy@warwick.ac.uk
ABSTRACT
Coarse‐grained overlays improve FPGA design productivity by providing fast compilation and software‐like programmability. Soft‐processor‐based overlays with well‐defined ISAs are attractive to application developers because they are easy to use, but they incur significant FPGA resource overheads. Time‐multiplexed (TM) CGRA‐like overlays are an interesting alternative because they can change their behavior on a cycle‐by‐cycle basis while the compute kernel executes. This reduces the FPGA resources needed, but at the cost of a higher initiation interval (II) and hence reduced throughput. Moreover, the fully flexible routing network of current CGRA‐like overlays results in high FPGA resource usage, even though many application kernels are acyclic and can be implemented using a much simpler linear feed‐forward routing network. This paper examines a DSP‐block‐based TM overlay with linear interconnect, in which the overlay architecture takes account of both the characteristics of the application kernels and the underlying FPGA architecture so as to minimize the II and the FPGA resource usage. We examine a number of architectural extensions to the DSP‐block‐based functional unit that improve the II, throughput and latency. The results show an average 70% reduction in II, with corresponding improvements in throughput and latency.
Keywords: Reconfigurable system, overlay architecture, FPGA.
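The link between II and throughput stated in the abstract can be illustrated with a minimal sketch. It assumes that one kernel iteration is issued every II clock cycles and that the clock frequency is unchanged by the architectural extensions; both are simplifying assumptions, not results from this paper. The function names and the 300 MHz clock are hypothetical and chosen only for illustration.

```python
def throughput(f_clk_hz: float, ii: int) -> float:
    """Kernel iterations issued per second, assuming one iteration every `ii` cycles."""
    if ii < 1:
        raise ValueError("II must be at least 1 cycle")
    return f_clk_hz / ii


def speedup_from_ii_reduction(reduction: float) -> float:
    """Throughput gain at a fixed clock when II shrinks by the fraction `reduction`."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    f_clk = 300e6  # hypothetical overlay clock, not a figure from the paper
    print(throughput(f_clk, ii=10))         # iterations per second at II = 10
    print(speedup_from_ii_reduction(0.70))  # gain for a 70% lower II, fixed clock
```

Under these assumptions, throughput scales as 1/II, so a 70% reduction in II corresponds to roughly a 3.3x throughput gain at the same clock frequency. Any change in achievable frequency would shift this figure.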