Storage Class Memory with Computing Row Buffer: A Design Space Exploration
Valentin Egloff1,a, Jean-Philippe Noel1,b, Maha Kooli1,c, Bastien Giraud1,d, Lorenzo Ciampolini1,e, Roman Gauchi1,f, César Fuguet1,g, Éric Guthmuller1,h, Mathieu Moreau2,i, Jean-Michel Portal2,j
1Univ. Grenoble Alpes, CEA, List
aValentin.Egloff@cea.fr
bJean-Philippe.Noel@cea.fr
cMaha.Kooli@cea.fr
dBastien.Giraud@cea.fr
eLorenzo.Ciampolini@cea.fr
fRoman.Gauchi@cea.fr
gCesar.Fuguet@cea.fr
hEric.Guthmuller@cea.fr
2IM2NP, Univ. Aix-Marseille et Toulon, CNRS, France
iMathieu.Moreau@cea.fr
jJean-Michel.Portal@cea.fr
ABSTRACT
Today, computing-centric von Neumann architectures face strong limitations in data-intensive applications such as deep learning. One of these limitations is the well-known von Neumann bottleneck. To overcome this bottleneck, the concepts of In-Memory Computing (IMC) and Near-Memory Computing (NMC) have been proposed. IMC solutions based on volatile memories such as SRAM and DRAM, which have nearly infinite endurance, only partially solve the problem of data transfer from the Storage Class Memory (SCM). Computing in the SCM itself is severely limited by the intrinsically poor endurance of Non-Volatile Memory (NVM) technologies. In this paper, we propose to take the best of both solutions by introducing a Computing Row Buffer (C-RB), based on a Computing SRAM (C-SRAM) model, in place of the standard Row Buffer (RB) in the SCM. The principle is to keep operations on large vectors in the C-RB of the SCM, minimizing data movement to and from the CPU and thus drastically reducing the energy consumption of the overall system. To evaluate the proposed architecture, we use an instruction-accurate platform based on the Intel Pin software. Pin instruments binaries at run time to obtain the full memory traces of applications running on our solution. Compared to a 512-bit SIMD architecture, we achieve an average energy reduction of up to 7.9x (up to 45x in the best case), an average speedup of up to 3.8x (up to 13x in the best case), and a reduction of write accesses to the SCM of up to 18%.
Keywords: Near-Memory Computing, In-Memory Computing, von Neumann bottleneck, Storage Class Memory, row-buffer, C-SRAM.

