Prometheus: Processing‐in‐memory Heterogeneous Architecture Design From a Multi‐layer Network Theoretic Strategy

Yao Xiao, Shahin Nazarian and Paul Bogdan
Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
xiaoyao@usc.edu
shahin.nazarian@usc.edu
pbogdan@usc.edu

ABSTRACT


With increasing demand for distributed intelligent physical systems that perform big data analytics in the field and in real time, processing-in-memory (PIM) architectures that integrate 3D-stacked memory and logic layers could provide higher performance and energy efficiency. To this end, PIM design requires principled and rigorous optimization strategies to identify interactions and manage data movement across different vaults. In this paper, we introduce Prometheus, a novel PIM-based framework that constructs a comprehensive model of computation and communication (MoCC) based on static and dynamic compilation of an application. First, by adopting the low level virtual machine (LLVM) intermediate representation (IR), an input application is modeled as a two-layered graph consisting of (i) a computation layer, in which nodes denote computation IR instructions and edges denote data dependencies among instructions, and (ii) a communication layer, in which nodes denote memory operations (e.g., load/store) and edges represent memory dependencies detected by alias analysis. Second, we develop an optimization framework that partitions the multi-layer network into processing communities, maximizing the computational workload within each community while balancing the load among computational clusters. Third, we propose a community-to-vault mapping algorithm for designing a scalable hybrid memory cube (HMC)-based system in which vaults are interconnected through a network-on-chip (NoC) rather than a crossbar. This ensures scalability to hundreds of vaults in each cube. Experimental results demonstrate that Prometheus, with 64 HMC-based vaults, improves system performance by 9.8× and achieves a 2.3× energy reduction compared to conventional systems.
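The abstract does not spell out the partitioning objective or the mapping algorithm, so the following is only a toy sketch of the three-step pipeline under stated assumptions, not the Prometheus implementation. The graph is written by hand rather than extracted from LLVM IR, and the memory-dependency edges stand in for an alias-analysis result. The names `build_two_layer_graph`, `partition`, `traffic_matrix`, `mesh_cost`, `map_to_vaults` and the `mem_weight` knob are all hypothetical. Two techniques are swapped in for illustration: a greedy, capacity-capped community assignment (the cap keeps communities load-balanced) and an exhaustive placement of communities onto a small 2D-mesh NoC that minimizes traffic weighted by hop count.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy input (not from the paper): node ids are IR instructions in
# program order. Computation-layer edges are data (def-use) dependencies;
# communication-layer edges are memory dependencies, assumed to be reported by
# an alias analysis between load/store nodes.
NUM_NODES = 8
DATA_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
MEM_EDGES = [(0, 7), (2, 5)]


def build_two_layer_graph(n, data_edges, mem_edges, mem_weight=0.5):
    """Merge both layers into one weighted, undirected adjacency map.

    mem_weight is an assumed knob for how strongly a memory dependency
    pulls two instructions into the same community.
    """
    adj = [defaultdict(float) for _ in range(n)]
    for layer_edges, w in ((data_edges, 1.0), (mem_edges, mem_weight)):
        for u, v in layer_edges:
            adj[u][v] += w
            adj[v][u] += w
    return adj


def partition(adj, k):
    """Greedy balanced partition: each node joins the non-full community it
    is most strongly connected to (ties: least loaded, then lowest index)."""
    n = len(adj)
    cap = -(-n // k)  # ceil(n / k) nodes per community keeps the load balanced
    comm = [-1] * n
    load = [0] * k
    for v in range(n):  # program order stands in for a topological order
        gain = [0.0] * k
        for u, w in adj[v].items():
            if comm[u] >= 0:
                gain[comm[u]] += w
        open_c = [c for c in range(k) if load[c] < cap]
        best = max(open_c, key=lambda c: (gain[c], -load[c], -c))
        comm[v] = best
        load[best] += 1
    return comm


def traffic_matrix(adj, comm, k):
    """Sum the weights of edges cut between each pair of communities."""
    t = [[0.0] * k for _ in range(k)]
    for u in range(len(adj)):
        for v, w in adj[u].items():
            if u < v and comm[u] != comm[v]:
                t[comm[u]][comm[v]] += w
                t[comm[v]][comm[u]] += w
    return t


def mesh_cost(t, mapping, cols):
    """Traffic weighted by Manhattan hop count on a mesh with `cols` columns;
    mapping[c] is the vault id assigned to community c."""
    total = 0.0
    for a in range(len(t)):
        for b in range(a + 1, len(t)):
            ra, ca = divmod(mapping[a], cols)
            rb, cb = divmod(mapping[b], cols)
            total += t[a][b] * (abs(ra - rb) + abs(ca - cb))
    return total


def map_to_vaults(t, rows, cols):
    """Exhaustively try every community-to-vault assignment on a rows x cols
    mesh and keep the cheapest (only feasible for a handful of vaults)."""
    candidates = permutations(range(rows * cols), len(t))
    best = min(candidates, key=lambda m: mesh_cost(t, m, cols))
    return best, mesh_cost(t, best, cols)


if __name__ == "__main__":
    adj = build_two_layer_graph(NUM_NODES, DATA_EDGES, MEM_EDGES)
    comm = partition(adj, k=4)
    t = traffic_matrix(adj, comm, k=4)
    mapping, cost = map_to_vaults(t, rows=2, cols=2)
    print(comm)           # [0, 0, 1, 1, 2, 2, 3, 3]
    print(mapping, cost)  # (0, 1, 3, 2) 4.0
```

On this toy graph the communities form a ring of inter-community traffic. The naive assignment of community i to vault i costs 6.0 hop-weighted units because two communicating pairs land on diagonal vaults, while the searched placement puts every communicating pair on adjacent vaults. Exhaustive search grows factorially with the number of vaults, so a system with 64 vaults would need a heuristic or an optimization solver instead.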