2.3 Fueling the future of computing: 3D, TFT, or disruptive memories?


Date: Tuesday 10 March 2020
Time: 11:30 - 13:00
Location / Room: Autrans

Chair:
Yvain Thonnart, CEA-Leti, FR

Co-Chair:
Marco Vacca, Politecnico di Torino, IT

In the post-CMOS era, the future of computing relies increasingly on emerging technologies such as resistive memories, TFTs, and 3D integration, alone or in combination, to sustain performance improvements. The session ranges from a novel acceleration solution for deep neural networks based on ferroelectric transistor technology to a physical design methodology for face-to-face 3D ICs that enables commercial-quality IC layouts. Furthermore, the advantage of monolithic 3D integration obtained by combining TFT and RRAM technology is quantified using a novel open-source CAD flow.

Time Label Presentation Title
Authors
11:30 2.3.1 TERNARY COMPUTE-ENABLED MEMORY USING FERROELECTRIC TRANSISTORS FOR ACCELERATING DEEP NEURAL NETWORKS
Speaker:
Sandeep Krishna Thirumala, Purdue University, US
Authors:
Sandeep Krishna Thirumala, Shubham Jain, Sumeet Gupta and Anand Raghunathan, Purdue University, US
Abstract
Ternary Deep Neural Networks (DNNs), which employ ternary precision for weights and activations, have recently been shown to attain accuracies close to full-precision DNNs, raising interest in their efficient hardware realization. In this work we propose a Non-Volatile Ternary Compute-Enabled memory cell (TeC-Cell) based on ferroelectric transistors (FEFETs) for in-memory computing in the signed ternary regime. In particular, the proposed cell enables storage of ternary weights and employs multi-word-line assertion to perform massively parallel signed dot-product computations between ternary weights and ternary inputs. We evaluate the proposed design at the array level and show 72% and 74% higher energy efficiency for multiply-and-accumulate (MAC) operations compared to standard near-memory computing designs based on SRAM and FEFET, respectively. Furthermore, we evaluate the proposed TeC-Cell in an existing ternary in-memory DNN accelerator. Our results show 3.3X-3.4X reduction in system energy and 4.3X-7X improvement in system performance over SRAM and FEFET based near-memory accelerators, across a wide range of DNN benchmarks including both deep convolutional and recurrent neural networks.
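As a purely functional illustration of the signed ternary dot product mentioned in the abstract (not a model of the TeC-Cell circuit or its multi-word-line assertion), the following minimal sketch assumes weights and inputs are encoded as the integers -1, 0, and +1. The function name `ternary_dot` and the split into positive and negative partial counts are hypothetical choices made for this example.

```python
from typing import Sequence

TERNARY_VALUES = (-1, 0, 1)


def ternary_dot(weights: Sequence[int], inputs: Sequence[int]) -> int:
    """Signed dot product of two ternary vectors (illustrative only)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    for value in (*weights, *inputs):
        if value not in TERNARY_VALUES:
            raise ValueError(f"non-ternary value: {value}")

    # Each elementwise product is itself ternary, so the dot product is
    # the number of +1 products minus the number of -1 products.
    products = [w * x for w, x in zip(weights, inputs)]
    positives = sum(1 for p in products if p == 1)
    negatives = sum(1 for p in products if p == -1)
    return positives - negatives


if __name__ == "__main__":
    print(ternary_dot([1, -1, 0, 1], [1, 1, -1, -1]))
```

Because every product is again ternary, the accumulation reduces to counting, which is why the sketch tallies positive and negative products separately instead of performing general multiplications.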

12:00 2.3.2 MACRO-3D: A PHYSICAL DESIGN METHODOLOGY FOR FACE-TO-FACE-STACKED HETEROGENEOUS 3D ICS
Speaker:
Lennart Bamberg, University of Bremen, DE / GrAi Matter Labs, NL
Authors:
Lennart Bamberg¹, Lingjun Zhu², Sai Pentapati², Da Eun Shim², Alberto Garcia-Ortiz³ and Sung Kyu Lim²
¹GrAi Matter Labs, NL; ²Georgia Tech, US; ³University of Bremen, DE
Abstract
Memory-on-logic and sensor-on-logic face-to-face stacking are emerging design approaches that promise a significant increase in the performance of modern systems-on-chip at reasonable cost. In this work, a netlist-to-layout design flow for such heterogeneous 3D systems is proposed. The proposed technique overcomes the severe limitations of existing 3D physical design methodologies. A RISC-V-based multi-core system, implemented in a commercial technology, is used as a case study to evaluate the proposed design flow. The case study is performed for both modern/large and small cache sizes to show the superiority of the proposed methodology for a broad set of systems. While previous 3D design flows fail to improve performance over 2D baseline designs for processor systems with a significant memory area occupation, the proposed flow improves performance by 20.4-28.2% and power by 3.2-3.8%.

12:30 2.3.3 QUANTIFYING THE BENEFITS OF MONOLITHIC 3D COMPUTING SYSTEMS ENABLED BY TFT AND RRAM
Speaker:
Abdallah Felfel, Zewail City of Science and Technology, EG
Authors:
Abdallah M Felfel¹, Kamalika Datta¹, Arko Dutt¹, Hasita Veluri², Ahmed Zaky¹, Aaron Thean² and Mohamed M Sabry Aly¹
¹Nanyang Technological University, SG; ²National University of Singapore, SG
Abstract
Current data-centric workloads, such as deep learning, expose the memory-access inefficiencies of current computing systems. Monolithic 3D integration can overcome this limitation by leveraging fine-grained and dense vertical connectivity to enable massively concurrent accesses between compute and memory units. Thin-Film Transistors (TFTs) and Resistive RAM (RRAM) naturally enable monolithic 3D integration because they are fabricated at low temperature (a crucial requirement). In this paper, we explore ZnO-based TFTs and HfO2-based RRAM to build a 1TFT-1R memory subsystem in the upper tiers. The TFT-based memory subsystem is stacked on top of a Si-FET bottom tier that can include compute units and SRAM. System-level simulations for various deep learning workloads show that our TFT-based monolithic 3D system achieves up to 11.4x system-level energy-delay product benefits compared to a 2D baseline with off-chip DRAM, 5.8x over interposer-based 2.5D integration, and 1.25x over 3D stacking of RRAM on silicon using through-silicon vias. These gains are achieved despite the low density of TFT-based RRAM and its higher energy consumption versus 3D stacking with RRAM, both due to inherent TFT limitations.

12:45 2.3.4 ORGANIC-FLOW: AN OPEN-SOURCE ORGANIC STANDARD CELL LIBRARY AND PROCESS DEVELOPMENT KIT
Speaker:
Ting-Jung Chang, Princeton University, US
Authors:
Ting-Jung Chang, Zhuozhi Yao, Barry P. Rand and David Wentzlaff, Princeton University, US
Abstract
Organic thin-film transistors (OTFTs) are drawing increasing attention due to their unique advantages of mechanical flexibility, low-cost fabrication, and biodegradability, enabling diverse applications that were not achievable using traditional inorganic transistors. With a growing number of complex applications being proposed, there is an increasing need to expedite the design process and ensure the yield of large-scale designs in organic technology. A complete digital standard cell library plays a crucial role in integrating the emerging organic technology into existing computer-aided-design (CAD) flows. In this paper, we present the design, fabrication, and characterization of a standard cell library based on bottom-gate, top-contact pentacene OTFTs. We also propose a commercial-tool-compatible RTL-to-GDS flow along with a new organic process design kit (PDK) developed based on our process. To the best of our knowledge, this is the first open-source organic standard cell library, enabling the community to explore this emerging technology.

13:00 IP1-2, 130 CMOS IMPLEMENTATION OF SWITCHING LATTICES
Speaker:
Levent Aksoy, Istanbul TU, TR
Authors:
Ismail Cevik, Levent Aksoy and Mustafa Altun, Istanbul TU, TR
Abstract
Switching lattices consisting of four-terminal switches were introduced as area-efficient structures for realizing logic functions. Many optimization algorithms, both exact and heuristic, have been proposed to realize logic functions on lattices with the fewest four-terminal switches. Hence, the computing potential of switching lattices has been adequately justified in the literature. However, the same cannot be said for their physical implementation. There have been conceptual ideas for the technology development of switching lattices, but no concrete and directly applicable technology has been proposed yet. In this study, we show that switching lattices can be directly and efficiently implemented using a standard CMOS process. To realize a given logic function on a switching lattice, we propose static and dynamic logic solutions. The proposed circuits, as well as the conventional ones they are compared against, are designed and simulated in the Cadence environment using a TSMC 65nm CMOS process. Experimental post-layout results on logic functions show that switching lattices occupy a much smaller area than conventional CMOS implementations, while offering competitive delay and power consumption.

13:01 IP1-3, 327 A TIMING UNCERTAINTY-AWARE CLOCK TREE TOPOLOGY GENERATION ALGORITHM FOR SINGLE FLUX QUANTUM CIRCUITS
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Soheil Nazar Shahsavani, Bo Zhang and Massoud Pedram, University of Southern California, US
Abstract
This paper presents a low-cost, timing uncertainty-aware synchronous clock tree topology generation algorithm for single flux quantum (SFQ) logic circuits. The proposed method considers the criticality of the data paths in terms of timing slacks as well as the total wirelength of the clock tree and generates a (height-) balanced binary clock tree using a bottom-up approach and an integer linear programming (ILP) formulation. The statistical timing analysis results for ten benchmark circuits show that the proposed method improves the total wirelength and the total negative hold slack by 4.2% and 64.6%, respectively, on average, compared with a wirelength-driven state-of-the-art balanced topology generation approach.

13:00 End of session