12.6 Special Session: Computing with Emerging Memories: How Good can it be?


Date: Thursday 22 March 2018
Time: 16:00 - 17:30
Location / Room: Konf. 4

Chair:
Pierre-Emmanuel Gaillardon, University of Utah, US

Co-Chair:
Ian O’Connor, Ecole Centrale de Lyon, FR

With the recent evolution of nanometer transistor technologies, power consumption has emerged as the most critical limitation. In advanced processors and computing architectures, processor-memory communication accounts for a significant part of the energy requirement. While alternative design approaches, such as optimized accelerators or advanced power management techniques, are successfully employed in contemporary designs, the trend keeps worsening due to the ever-increasing gap between on-chip and off-chip memory data rates. This trend, known as the von Neumann bottleneck, not only limits system performance but nowadays also limits energy scaling. The quest for greater energy efficiency requires solutions that overcome the von Neumann bottleneck by tightly intertwining computing with memory. In this hot topic session, we elaborate on in-memory computing by identifying and comparing the latest computing models in light of conventional memory technologies, e.g., SRAMs, and emerging ones, e.g., RRAMs and STT-MRAMs. In-memory computing is considered here in the general sense of computing information locally within large data storage.

Time / Label / Presentation Title
Authors
16:00 / 12.6.1 / PRACTICAL CHALLENGES IN DELIVERING THE PROMISES OF REAL PROCESSING-IN-MEMORY MACHINES
Speaker:
Nishil Talati, Technion - Israel Institute of Technology, IL
Authors:
Nishil Talati1, Ameer Haj Ali2, Rotem Ben Hur2, Nimrod Wald2, Ronny Ronen2, Pierre-Emmanuel Gaillardon3 and Shahar Kvatinsky1
1Technion, IL; 2Technion - Israel Institute of Technology, IL; 3University of Utah, US
Abstract
Processing-in-Memory (PiM) machines promise to overcome the von Neumann bottleneck in order to further scale performance and energy efficiency of computing systems by reducing the extent of data transfer and offering ample parallelism. In this paper, we take the memristive Memory Processing Unit (mMPU) as a case study of a PiM machine and scrutinize it in practical scenarios. Specifically, we explore the limitations of parallelism and data transfer elimination. We argue that lack of operand locality and arrangement might make data transfer inevitable in the mMPU. We then devise techniques to move data within the mMPU, without transferring it off-chip, and quantify their costs. Additionally, we present electrical parameters that might limit the parallelism offered by the mMPU and evaluate their impact. Using benchmarks from the LGsynth91 suite, their vector extensions, and a few synthetic data-parallel workloads, we show that the internal data transfer results in an increase of up to 1.5x in the execution time, while the limited parallelism increases it by 1.1x to 2x.

Download Paper (PDF; Only available from the DATE venue WiFi)
16:30 / 12.6.2 / SMART INSTRUCTION CODES FOR IN-MEMORY COMPUTING ARCHITECTURES COMPATIBLE WITH STANDARD SRAM INTERFACES
Speaker:
Maha Kooli, CEA-Leti, FR
Authors:
Maha Kooli1, Henri-Pierre CHARLES2, Bastien Giraud3 and Jean-Philippe Noel2
1CEA/LETI, FR; 2CEA, FR; 3CEA LETI, FR
Abstract
This paper presents a computing model for an In-Memory Computing architecture based on an SRAM memory that embeds computing abilities. This memory concept offers significant performance gains in terms of energy consumption and execution time. To handle the interaction between the memory and the CPU, new memory instruction codes were designed. These instructions are communicated by the CPU to the memory using standard SRAM buses. This implementation makes it possible (1) to embed In-Memory Computing capabilities in a system without Instruction Set Architecture (ISA) modification, and (2) to finely interleave CPU instructions and in-memory computing instructions.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:00 / 12.6.4 / MEMRISTIVE DEVICES FOR COMPUTATION-IN-MEMORY
Speaker:
Said Hamdioui, Delft University of Technology, NL
Authors:
Jintao Yu, Hoang Anh Du Nguyen, Mottaqiallah Taouil and Said Hamdioui, TU Delft, NL
Abstract
CMOS technology and its continuous scaling have made electronics and computers accessible and affordable for almost everyone on the globe; in addition, they have enabled solutions to a wide range of societal problems and applications. Today, however, both the technology and the computer architectures are facing severe challenges, or walls, that make them incapable of providing the demanded computing power under tight constraints. This motivates the exploration of novel architectures based on new device technologies, not only to sustain the financial benefit of technology scaling, but also to develop solutions for extremely demanding emerging applications. This paper presents two computation-in-memory-based accelerators making use of emerging memristive devices: the Memristive Vector Processor and the RRAM Automata Processor. The preliminary results of these two accelerators show significant improvements in latency, energy and area as compared to today's architectures and designs.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:15 / 12.6.3 / COMPUTING-IN-MEMORY WITH SPINTRONICS
Speaker:
Shubham Jain, Purdue University, US
Authors:
Shubham Jain1, Sachin Sapatnekar2, Jian-Ping Wang2, Kaushik Roy1 and Anand Raghunathan1
1Purdue University, US; 2Department of Electrical and Computer Engineering, University of Minnesota, US
Abstract
In-memory computing is a promising approach to alleviating the processor-memory data transfer bottleneck in computing systems. While spintronics has attracted great interest as a non-volatile memory technology, recent work has shown that its unique properties can also enable in-memory computing. We summarize efforts in this direction and describe three different designs that enhance STT-MRAM to perform logic, arithmetic, and vector operations and to evaluate transcendental functions within memory arrays.

Download Paper (PDF; Only available from the DATE venue WiFi)
17:30 / End of session