Paradigm On Approximate Compute for Complex Perception-Based Neural Networks

Andre Guntoro and Cecilia De la Parra
Robert Bosch GmbH, DE

ABSTRACT

The rise of machine learning drives massive compute power requirements, especially on edge devices performing real-time inference. One established approach for reducing power usage is integer inference (e.g., 8-bit) instead of the higher numerical precision offered by floating-point arithmetic. Squeezing into even lower bit widths, as in binary weight networks or binary neural networks, requires complex training methods and additional effort to recover the accuracy loss, and typically works only on simple classification tasks. A promising alternative for further reducing power consumption and computation latency is the use of approximate compute units, a paradigm that mitigates the computational demand of neural networks by exploiting their inherent error resilience. Thanks to a decade of progress in approximate computing, abundant approximate units are readily available, without the need to re-develop or re-design them. Nonetheless, adaptations in the training phase are required. First, we need to adapt the training methods for neural networks to take into account the inaccuracy introduced by approximate compute, without sacrificing training speed (considering that training is performed on GPUs in floating point). Second, we need to define a new metric for assessing and selecting the best-fitting approximation units on a per-use-case basis. Lastly, we need to exploit the benefits that approximation brings to neural networks, such as overfitting mitigation by design and increased resiliency, so that networks trained for and designed with approximation can outperform their exact-computing counterparts. For each of these steps, we evaluate on small tasks first and then validate on complex tasks that are more relevant to the automotive domain.
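To make the first step concrete, below is a minimal sketch (our illustration, not taken from the paper) of how an approximate signed 8-bit multiplier can be emulated during training or evaluation on standard hardware: the multiplier's full truth table is precomputed as a 256 x 256 lookup table (LUT), and the exact integer multiply inside a matrix product is replaced by vectorized table lookups. All function names are hypothetical, and the placeholder multiplier simply computes exact products; in practice the LUT would hold the outputs of the chosen approximate multiplier design.

```python
import numpy as np

def build_lut(mul):
    """Tabulate all products of signed 8-bit operands, shape (256, 256)."""
    ops = np.arange(-128, 128, dtype=np.int32)
    return mul(ops[:, None], ops[None, :])

def placeholder_mul(a, b):
    # Stand-in for a model of an actual approximate multiplier circuit.
    return a * b

LUT = build_lut(placeholder_mul)

def approx_matmul(x_q, w_q):
    """int8 matrix product using the LUT instead of exact multiplication.

    x_q: (n, k) int8 activations; w_q: (k, m) int8 weights.
    Accumulation stays exact in wide integers, as in typical
    int8 inference pipelines.
    """
    # Shift operands so that -128 maps to LUT index 0.
    xi = x_q.astype(np.int32) + 128
    wi = w_q.astype(np.int32) + 128
    prods = LUT[xi[:, :, None], wi[None, :, :]]   # (n, k, m) products
    return prods.sum(axis=1, dtype=np.int64)      # (n, m) accumulations

x = np.array([[1, -2, 3], [4, 5, -6]], dtype=np.int8)
w = np.array([[7, -8], [9, 10], [-11, 12]], dtype=np.int8)
print(approx_matmul(x, w))   # with the placeholder, matches exact matmul
```

The full table occupies only 256 KiB, so vectorized lookups keep such a simulation fast enough to embed in a training loop; the same idea carries over to GPU training with the LUT resident in device memory.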