Triple Fixed-Point MAC Unit for Deep Learning

Madis Kerner (1,a), Kalle Tammemäe (1,b), Jaan Raik (1,c) and Thomas Hollstein (1,2,d)
(1) Tallinn University of Technology, Tallinn, Estonia
(2) Frankfurt University of Applied Sciences, Frankfurt, Germany
(a) madis.kerner@taltech.ee
(b) kalle.tammemae@taltech.ee
(c) jaan.raik@taltech.ee
(d) hollstein@fb2.fra-uas.de

ABSTRACT


Deep Learning (DL) algorithms have proved successful in various domains. Typically, the models use Floating Point (FP) numeric formats and are executed on Graphics Processing Units (GPUs). However, Field Programmable Gate Arrays (FPGAs) are more energy-efficient and are therefore a better platform for resource-constrained devices. Because an FP design consumes many FPGA resources, state-of-the-art implementations replace it with quantized fixed-point arithmetic. The resulting loss of precision is mitigated by dynamically adjusting the radix point across network layers, by reconfiguration, and by re-training. In this paper, we present the first Triple Fixed-Point (TFxP) architecture, which provides the computational precision of FP while using significantly fewer hardware resources and requiring no network re-training. The novel TFxP format is introduced based on a comparison of FP and existing Fixed-Point (FxP) implementations, combined with a detailed precision analysis of YOLOv2 weights and activation values.
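For readers unfamiliar with the fixed-point multiply-accumulate (MAC) arithmetic the abstract refers to, the C sketch below illustrates a generic fixed-point MAC with a configurable radix point. This is an illustration only: the TFxP format itself is not specified in this abstract, so the 32-bit operand width, the frac_bits parameter, and the fxp_mac name are assumptions, not the paper's design.

    #include <stdint.h>

    /* Illustrative only: a generic Q-format fixed-point value, scaled by
     * 2^frac_bits. The actual TFxP operand widths are defined in the paper,
     * not here. */
    typedef int32_t fxp_t;

    /* Multiply-accumulate with a configurable radix point. The operands are
     * widened before multiplying to avoid overflow, and the product is
     * rescaled (arithmetic right shift, assuming two's complement) so it
     * keeps the same number of fractional bits as the inputs. */
    static inline int64_t fxp_mac(int64_t acc, fxp_t a, fxp_t b, int frac_bits)
    {
        int64_t prod = (int64_t)a * (int64_t)b;
        return acc + (prod >> frac_bits);
    }

    /* Example: with frac_bits = 16, accumulating 0.75 * 0.5 yields 0.375,
     * i.e. acc == (int64_t)(0.375 * (1 << 16)). */

Choosing frac_bits per network layer is what "dynamically adjusting the radix point" amounts to in such designs: the hardware stays the same while the scaling of weights and activations changes between layers.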


