Opportunities for Analog Acceleration of Deep Learning With Phase Change Memory

Pritish Narayanan, Geoffrey W. Burr, Stefano Ambrogio, Hsinyu Tsai, Charles Mackin, Katherine Spoon, An Chen, Alexander Friz, and Andrea Fasoli
IBM Research, US

ABSTRACT

Storage Class Memory and High Bandwidth Memory technologies are already reshaping systems architecture in interesting ways, by bringing cheap, high-density memory closer and closer to processing. Extrapolating on this trend, a new class of in-memory computing solutions is emerging, where some or all of the computing happens at the location of the data. Within the landscape of in-memory computing approaches, non-von Neumann architectures seek to eliminate most of the data movement associated with computing, erasing the demarcation between compute and memory. While such non-von Neumann architectures could offer orders-of-magnitude performance improvements on certain workloads, they are neither as general-purpose nor as easily programmable as von Neumann architectures. Therefore, well-defined use cases need to exist to justify the hardware investment. Fortunately, acceleration of deep learning, which is both compute- and memory-intensive, is one such use case. Today, the training of deep learning networks is done primarily in the cloud and can take days or weeks even when using many GPUs. Specialized hardware for training is thus primarily focused on speedup, with energy/power a secondary concern. On the other hand, 'inference', the deployment and use of pre-trained models for real-world tasks, is done both in the cloud and on edge devices, and presents hardware opportunities at both high-speed and low-power design points. In this presentation, we describe some of the opportunities and challenges in building accelerators for deep learning using analog volatile and non-volatile memory. We review our group's recent progress towards achieving software-equivalent accuracies on deep learning tasks in the presence of real-device imperfections such as non-linearity, asymmetry, variability, and conductance drift. We present novel techniques and optimizations across device, circuit, and neural network design that achieve high accuracy with existing devices. We then discuss challenges for peripheral circuit design and conclude with an outlook on the prospects for analog memory-based DNN accelerators.
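
As a purely illustrative companion to the device imperfections named above (not code from this work), the following minimal NumPy sketch models an analog matrix-vector multiply on a PCM crossbar: each weight is encoded as a differential pair of conductances, programming variability is modeled as Gaussian multiplicative noise, and the well-known power-law conductance drift G(t) = G0 * (t / t0)^(-nu) is applied before readout. All parameter values (g_max, sigma, nu) and helper names are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def program_weights(W, g_max=25e-6, sigma=0.03):
        """Map weights in [-1, 1] onto differential PCM conductance pairs (G+, G-).
        Multiplicative Gaussian noise stands in for programming variability
        (sigma is an illustrative value, not one reported in this work)."""
        g_plus = np.clip(W, 0.0, None) * g_max
        g_minus = np.clip(-W, 0.0, None) * g_max
        noisy = lambda g: g * (1.0 + sigma * rng.standard_normal(g.shape))
        return noisy(g_plus), noisy(g_minus)

    def drift(g, t, t0=1.0, nu=0.05):
        """Power-law conductance drift G(t) = G0 * (t / t0)^(-nu), typical of PCM."""
        return g * (t / t0) ** (-nu)

    def analog_mvm(x, g_plus, g_minus, g_max=25e-6):
        """In-memory multiply-accumulate: Ohm's-law currents summed along columns,
        with positive and negative arrays subtracted and rescaled to weight units."""
        return (g_plus @ x - g_minus @ x) / g_max

    # Compare the ideal product against the analog readout after 1000 s of drift.
    W = rng.uniform(-1.0, 1.0, size=(4, 8))
    x = rng.uniform(0.0, 1.0, size=8)
    gp, gm = program_weights(W)
    print("ideal :", W @ x)
    print("analog:", analog_mvm(x, drift(gp, 1000.0), drift(gm, 1000.0)))

Under these assumptions, after 1000 seconds the analog output shrinks almost uniformly by a factor of (1000)^(-0.05), roughly 0.71x; this systematic decay, on top of per-device noise, is representative of the error sources that the device-, circuit-, and network-level techniques described above must compensate.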