A Reconfigurable Multiple-Precision Floating-Point Dot Product Unit for High-Performance Computing

Wei Mao1, Kai Li1, Xinang Xie1, Shirui Zhao1, He Li2 and Hao Yu1
1School of Microelectronics, Southern University of Science and Technology, Shenzhen, China
2Department of Computer Science and Technology, University of Cambridge, London, UK

ABSTRACT


There is an emerging need to optimize floating-point (FP) dot product units (DPUs) for high-performance scientific computing as well as for training deep learning models. Because applications differ in their precision requirements, a reconfigurable multiple-precision DPU can substantially reduce area and power cost. However, existing methods not only introduce redundant bits in the unit multipliers but also leave hardware resources idle when operating in different precisions. In this paper, a reconfigurable multiple-precision FP DPU design is proposed for high-performance computing (HPC) applications. The FP DPU can be reconfigured as follows. A bit-partitioning method is provided to minimize the redundant bits with a configurable mixed-precision multiplier for three-mode operations: 20 half-precision dot product (DP), 5 single-precision DP, and 1 double-precision DP operations. Any of the modes can be executed in two successive clock cycles without idle hardware resources. The proposed design is implemented in the UMC 55-nm process and evaluated with simulation results. Compared with existing multiple-precision FP methods, the proposed DPU achieves area savings of 88.9% and 35.8% for FP16 and FP32 operations, respectively. Moreover, on benchmarked HPC applications where multiple precisions can be used, the proposed reconfigurable DPU achieves maximum throughput gains of up to 4× and 20× over fixed FP32 and FP64 operations, respectively.
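As a rough illustration of the two ideas above, the following Python sketch first derives the peak throughput ratios from the per-mode operation counts (20, 5, and 1) and then shows the general principle of building a wide significand product from narrow partial products. It is a minimal software sketch, not the hardware design: the names `OPS_PER_PASS`, `peak_speedup`, `split_chunks`, and `partitioned_multiply` are hypothetical, and the 12-bit chunk width is an assumption chosen only for illustration, since the partition widths are not stated here. The significand widths (11, 24, and 53 bits including the hidden bit) follow IEEE 754.

```python
# Operations completed per two-cycle pass in each mode (from the abstract).
OPS_PER_PASS = {"FP16": 20, "FP32": 5, "FP64": 1}

# IEEE 754 significand widths, including the hidden leading bit.
SIGNIFICAND_BITS = {"FP16": 11, "FP32": 24, "FP64": 53}


def peak_speedup(low_precision, high_precision):
    """Ratio of operations per pass between two modes (peak, not measured)."""
    return OPS_PER_PASS[low_precision] / OPS_PER_PASS[high_precision]


def split_chunks(value, chunk_bits):
    """Split a non-negative integer into chunks, least significant first."""
    mask = (1 << chunk_bits) - 1
    chunks = []
    while True:
        chunks.append(value & mask)
        value >>= chunk_bits
        if value == 0:
            break
    return chunks


def partitioned_multiply(a, b, chunk_bits):
    """Multiply two significands by summing shifted narrow partial products.

    chunk_bits is a hypothetical unit-multiplier width, not the paper's value.
    """
    total = 0
    for i, a_chunk in enumerate(split_chunks(a, chunk_bits)):
        for j, b_chunk in enumerate(split_chunks(b, chunk_bits)):
            total += (a_chunk * b_chunk) << ((i + j) * chunk_bits)
    return total


if __name__ == "__main__":
    print(peak_speedup("FP16", "FP32"))  # FP16 mode versus fixed FP32
    print(peak_speedup("FP16", "FP64"))  # FP16 mode versus fixed FP64

    # A 53-bit FP64 significand product assembled from 12-bit pieces.
    a = (1 << SIGNIFICAND_BITS["FP64"]) - 1
    b = (1 << 52) + 12345
    print(partitioned_multiply(a, b, 12) == a * b)
```

With an assumed 12-bit chunk, a 53-bit significand needs five chunks and leaves 7 unused bits in the top chunk. This kind of padding is the redundancy that a bit-partitioning scheme tries to minimize by choosing chunk widths that fit all three formats well.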

Keywords: Multiple-Precision, Partitioning, Floating Point, SIMD, Dot Product Unit, Mixed-Precision, Multiplier.


