11.6 Memory: new technologies and reliability-related issues

Time	Label	Presentation Title Authors
14:00	11.6.1	XNOR-RRAM: A SCALABLE AND PARALLEL RESISTIVE SYNAPTIC ARCHITECTURE FOR BINARY NEURAL NETWORKS Speaker: Shimeng Yu, Arizona State University, CN Authors: Xiaoyu Sun, Shihui Yin, Xiaochen Peng, Rui Liu, Jae-sun Seo and Shimeng Yu, Arizona State University, US Abstract Recent advances in deep learning have shown that Binary Neural Networks (BNNs) are capable of providing a satisfying accuracy on various image datasets with significant reduction in computation and memory cost. With both weights and activations binarized to +1 or -1 in BNNs, the high-precision multiply-and-accumulate (MAC) operations can be replaced by XNOR and bit-counting operations. In this work, we propose a RRAM synaptic architecture (XNOR-RRAM) with a bit-cell design of complementary word lines that implements equivalent XNOR and bit-counting operation in a parallel fashion. For large-scale matrices in fully connected layers or when the convolution kernels are unrolled in multiple channels, the array partition is necessary. Multi-level sense amplifiers (MLSAs) are employed as the intermediate interface for accumulating partial weighted sum. However, a low bit-level MLSA and intrinsic offset of MLSA may degrade the classification accuracy. We investigate the impact of sensing offsets on classification accuracy and analyze various design options with different sub-array sizes and sensing bit-levels. Experimental results with RRAM models and 65nm CMOS PDK show that the system with 128×128 sub-array size and 3-bit MLSA can achieve accuracies of 98.43% for MLP on MNIST and 86.08% for CNN on CIFAR-10, showing 0.34% and 2.39% degradation respectively compared to the accuracies of ideal BNN algorithms. The projected energy-efficiency of XNOR-RRAM is 141.18 TOPS/W, showing ~33X improvement compared to the conventional RRAM synaptic architecture with sequential row-by-row read-out. Download Paper (PDF; Only available from the DATE venue WiFi)
14:30	11.6.2	A NOVEL FAULT TOLERANT CACHE ARCHITECTURE BASED ON ORTHOGONAL LATIN SQUARES THEORY Speaker: Georgios Keramidas, Think Silicon S.A., GR Authors: Filippos Filippou¹, Georgios Keramidas², Michail Mavropoulos¹ and Dimitris Nikolos¹ ¹University of Patras, GR; ²Think Silicon S.A./Technological Educational Institute of Western Greece, GR Abstract Aggressive dynamic voltage and frequency scaling is widely used to reduce the power consumption of microprocessors. Unfortunately, voltage scaling increases the impact of process variations on memory cells resulting in an exponential increase in the number of malfunctioning memory cells. As a result, various cache fault-tolerant (CFT) techniques have been proposed. In this work, we propose a new CFT technique which applies a systematic redistribution (permutation) of the cache blocks (assuming various block granularity levels) within the cache structure using the orthogonal Latin Square concept and taking as input the location of the malfunctioning cells in the cache array. The aim of the redistribution is twofold. First, to uniformly distribute the faulty blocks to sets and second, to gather the faulty subblocks to a minimum number of blocks, so as the fault free blocks are maximized. Our evaluation results using the benchmarks of SPEC2006 suite, 100 memory fault maps, and four percentages of malfunctioning cells show that our proposal exhibits strong capability to reduce cache performance degradation especially in situations with high percentages of faulty cells and compares favorably to already known techniques. Download Paper (PDF; Only available from the DATE venue WiFi)
15:00	11.6.3	TECHNOLOGY-AWARE LOGIC SYNTHESIS FOR RERAM BASED IN-MEMORY COMPUTING Speaker: Debjyoti Bhattacharjee, Nanyang Technological University, SG Authors: Debjyoti Bhattacharjee¹, Luca Amaru² and Anupam Chattopadhyay¹ ¹Nanyang Technological University, SG; ²Synopsys, US Abstract Resistive RAMs (ReRAMs) have gained prominence for design of logic-in-memory circuits and architectures due to fast read/write speeds, high endurance, density and logic operation capabilities. ReRAM crossbar arrays allow constrained bit-level parallel operations. In this paper, for the first time, we propose optimization techniques during logic synthesis, which are specifically targeted for leveraging the parallelism offered by ReRAM crossbar arrays. Our method uses Majority-Inverter Graph (MIG) for the internal representation of the Boolean functions. The novel optimization techniques, when applied to the MIG, exposes the bit-level parallelism, and is further coupled with an efficient technology mapping flow. The entire synthesis process is benchmarked exhaustively over large arithmetic functions using a representative ReRAM crossbar architecture, while varying the crossbar dimensions. For the hard benchmarks, we obtained 10% reduction in the number of nodes with 16% reduction in delay on average. Download Paper (PDF; Only available from the DATE venue WiFi)
15:15	11.6.4	SMARTAG: ERROR CORRECTION IN CACHE TAG ARRAY BY EXPLOITING ADDRESS LOCALITY Speaker: Hamed Farbeh, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR Authors: Seyedeh Golsana Ghaemi¹, Iman Ahmadpour², Mehdi Ardebili³ and Hamed Farbeh⁴ ¹Sharif University of Technology, IR; ²Sharif University of technology, IR; ³Tehran University, IR; ⁴School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR Abstract Soft errors in on-chip caches are the major cause of processors failure. Partitioning the cache into data and tag arrays, recent reports show that the vulnerability of the latter is as high as or even higher than that of the former. Although Error-Correcting Codes (ECCs) are widely used to protect the data array, their overheads are not affordable in the tag array and its protection is conventionally limited to parity code. In this paper, we propose Similarity-Managed Robust Tag (SMARTag) technique to provide the error correction capability in parity-protected tags. SMARTag exploits the inherent similarity between the upper parts of the tags in a cache set to share these parts between addresses and ECCs. Using SMARTag, the cache access time is intact since the ECC part is bypassed in normal cache operation and no extra memory is required since ECCs are stored in available tag space. The simulation results show that SMARTag is capable of correcting more than 98% of errors in the tag array, on average, and its energy consumption, area, and performance overhead is less than 0.2%. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30	IP5-8, 287	IMPROVING THE ERROR BEHAVIOR OF DRAM BY EXPLOITING ITS Z-CHANNEL PROPERTY Speaker: Kira Kraft, University of Kaiserslautern, DE Authors: Kira Kraft¹, Matthias Jung², Chirag Sudarshan¹, Deepak M. Mathew¹, Christian Weis¹ and Norbert Wehn¹ ¹University of Kaiserslautern, DE; ²Fraunhofer IESE, DE Abstract In this paper, we present a new communication theoretic channel model for Dynamic Random Access Memory (DRAM) retention errors, that relies on the fully asymmetric retention error behavior of DRAM cells. This new model shows that the traditional approach is over pessimistic and we confirm this with real measurements of DDR3 and DDR4 DRAM devices. Together with an exploitation of the vendor specific true- and anti-cell structure, a low complexity bit-flipping approach is presented, that can largely increase DRAM's reliability with minimum overhead. Download Paper (PDF; Only available from the DATE venue WiFi)
15:31	IP5-9, 668	ARCHITECTURE AND OPTIMIZATION OF ASSOCIATIVE MEMORIES USED FOR THE IMPLEMENTATION OF LOGIC FUNCTIONS BASED ON NANOELECTRONIC 1S1R CELLS Speaker: Arne Heittmann, RWTH-Aachen University, DE Authors: Arne Heittman and Tobias G. Noll, RWTH Aachen University, DE Abstract A neuromorphic architecture based on Binary Associative memories and nanoelectronic resistive switches is proposed for the realization of arbitrary logic/arithmetic functions. Subsets of non-trivial code sets based on error detecting 2-out-of-n-codes are thoroughly used to encode operands, results, and intermediate states in order to enhance the circuit reliability by mitigating the impact of device variability. 2-ary functions can be implemented by cascading a mixer memory, a correlator memory, and a response memory. By introduction of a new cost function based on class-specific word-line-coverage, stochastic optimization is applied with the aim to minimize the overall number of active amplifiers. For various exemplary functions optimized architectures are compared against solutions obtained using a standard-cost function. It is shown that the consideration of word-line-coverage results in a significant circuit compaction. Download Paper (PDF; Only available from the DATE venue WiFi)
15:32	IP5-17, 62	EXPLORING NON-VOLATILE MAIN MEMORY ARCHITECTURES FOR HANDHELD DEVICES Speaker: Virendra Singh, Indian Institute of Technology Bombay, IN Authors: Sneha Ved and Manu Awasthi, Indian Institute of Technology Gandhinagar, IN Abstract As additional functionality is being added to contemporary handheld devices, the SoCs inside these devices are becoming increasingly complex. Similarly, the applications executing on these handhelds are beginning to exhibit an ever increasing memory footprint. To support these trends, main memory capacity of these SoCs has been increasing over time. Due to these developments, memory system's contribution to the overall system power has increased dramatically. Non-volatile memories have been used in server architectures to increase capacity as well as keep memory system's power consumption in check. However, in the handheld domain, where user experience and battery life are of paramount importance, the applicability of such technologies has not been widely studied. In this paper, we propose and evaluate a number of hybrid memory architectures using mobile DRAM and PCM. We show that intelligent memory architectures, cognizant of workload's memory access patterns can provide significant energy savings without compromising on user experience. Using proposed approach, we can devise architectures that exhibit significant energy savings with only a 2.8% performance loss. Download Paper (PDF; Only available from the DATE venue WiFi)
15:30		End of session Coffee Break in Exhibition Area Coffee Breaks in the Exhibition Area On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD). Lunch Breaks (Großer Saal + Saal 1) On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area. Tuesday, March 20, 2018 Coffee Break 10:30 - 11:30 Lunch Break 13:00 - 14:30 Awards Presentation and Keynote Lecture in "Saal 2" 13:50 - 14:20 Coffee Break 16:00 - 17:00 Wednesday, March 21, 2018 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:30 Awards Presentation and Keynote Lecture in "Saal 2" 13:30 - 14:20 Coffee Break 16:00 - 17:00 Thursday, March 22, 2018 Coffee Break 10:00 - 11:00 Lunch Break 12:30 - 14:00 Keynote Lecture in "Saal 2" 13:20 - 13:50 Coffee Break 15:30 - 16:00

Time

Label

Presentation Title
Authors

14:00

11.6.1

XNOR-RRAM: A SCALABLE AND PARALLEL RESISTIVE SYNAPTIC ARCHITECTURE FOR BINARY NEURAL NETWORKS
Speaker:
Shimeng Yu, Arizona State University, CN
Authors:
Xiaoyu Sun, Shihui Yin, Xiaochen Peng, Rui Liu, Jae-sun Seo and Shimeng Yu, Arizona State University, US
Abstract
Recent advances in deep learning have shown that Binary Neural Networks (BNNs) are capable of providing a satisfying accuracy on various image datasets with significant reduction in computation and memory cost. With both weights and activations binarized to +1 or -1 in BNNs, the high-precision multiply-and-accumulate (MAC) operations can be replaced by XNOR and bit-counting operations. In this work, we propose a RRAM synaptic architecture (XNOR-RRAM) with a bit-cell design of complementary word lines that implements equivalent XNOR and bit-counting operation in a parallel fashion. For large-scale matrices in fully connected layers or when the convolution kernels are unrolled in multiple channels, the array partition is necessary. Multi-level sense amplifiers (MLSAs) are employed as the intermediate interface for accumulating partial weighted sum. However, a low bit-level MLSA and intrinsic offset of MLSA may degrade the classification accuracy. We investigate the impact of sensing offsets on classification accuracy and analyze various design options with different sub-array sizes and sensing bit-levels. Experimental results with RRAM models and 65nm CMOS PDK show that the system with 128×128 sub-array size and 3-bit MLSA can achieve accuracies of 98.43% for MLP on MNIST and 86.08% for CNN on CIFAR-10, showing 0.34% and 2.39% degradation respectively compared to the accuracies of ideal BNN algorithms. The projected energy-efficiency of XNOR-RRAM is 141.18 TOPS/W, showing ~33X improvement compared to the conventional RRAM synaptic architecture with sequential row-by-row read-out.
Download Paper (PDF; Only available from the DATE venue WiFi)

14:30

11.6.2

A NOVEL FAULT TOLERANT CACHE ARCHITECTURE BASED ON ORTHOGONAL LATIN SQUARES THEORY
Speaker:
Georgios Keramidas, Think Silicon S.A., GR
Authors:
Filippos Filippou¹, Georgios Keramidas², Michail Mavropoulos¹ and Dimitris Nikolos¹
¹University of Patras, GR; ²Think Silicon S.A./Technological Educational Institute of Western Greece, GR
Abstract
Aggressive dynamic voltage and frequency scaling is widely used to reduce the power consumption of microprocessors. Unfortunately, voltage scaling increases the impact of process variations on memory cells resulting in an exponential increase in the number of malfunctioning memory cells. As a result, various cache fault-tolerant (CFT) techniques have been proposed. In this work, we propose a new CFT technique which applies a systematic redistribution (permutation) of the cache blocks (assuming various block granularity levels) within the cache structure using the orthogonal Latin Square concept and taking as input the location of the malfunctioning cells in the cache array. The aim of the redistribution is twofold. First, to uniformly distribute the faulty blocks to sets and second, to gather the faulty subblocks to a minimum number of blocks, so as the fault free blocks are maximized. Our evaluation results using the benchmarks of SPEC2006 suite, 100 memory fault maps, and four percentages of malfunctioning cells show that our proposal exhibits strong capability to reduce cache performance degradation especially in situations with high percentages of faulty cells and compares favorably to already known techniques.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:00

11.6.3

TECHNOLOGY-AWARE LOGIC SYNTHESIS FOR RERAM BASED IN-MEMORY COMPUTING
Speaker:
Debjyoti Bhattacharjee, Nanyang Technological University, SG
Authors:
Debjyoti Bhattacharjee¹, Luca Amaru² and Anupam Chattopadhyay¹
¹Nanyang Technological University, SG; ²Synopsys, US
Abstract
Resistive RAMs (ReRAMs) have gained prominence for design of logic-in-memory circuits and architectures due to fast read/write speeds, high endurance, density and logic operation capabilities. ReRAM crossbar arrays allow constrained bit-level parallel operations. In this paper, for the first time, we propose optimization techniques during logic synthesis, which are specifically targeted for leveraging the parallelism offered by ReRAM crossbar arrays. Our method uses Majority-Inverter Graph (MIG) for the internal representation of the Boolean functions. The novel optimization techniques, when applied to the MIG, exposes the bit-level parallelism, and is further coupled with an efficient technology mapping flow. The entire synthesis process is benchmarked exhaustively over large arithmetic functions using a representative ReRAM crossbar architecture, while varying the crossbar dimensions. For the hard benchmarks, we obtained 10% reduction in the number of nodes with 16% reduction in delay on average.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:15

11.6.4

SMARTAG: ERROR CORRECTION IN CACHE TAG ARRAY BY EXPLOITING ADDRESS LOCALITY
Speaker:
Hamed Farbeh, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Authors:
Seyedeh Golsana Ghaemi¹, Iman Ahmadpour², Mehdi Ardebili³ and Hamed Farbeh⁴
¹Sharif University of Technology, IR; ²Sharif University of technology, IR; ³Tehran University, IR; ⁴School of Computer Science, Institute for Research in Fundamental Sciences (IPM), IR
Abstract
Soft errors in on-chip caches are the major cause of processors failure. Partitioning the cache into data and tag arrays, recent reports show that the vulnerability of the latter is as high as or even higher than that of the former. Although Error-Correcting Codes (ECCs) are widely used to protect the data array, their overheads are not affordable in the tag array and its protection is conventionally limited to parity code. In this paper, we propose Similarity-Managed Robust Tag (SMARTag) technique to provide the error correction capability in parity-protected tags. SMARTag exploits the inherent similarity between the upper parts of the tags in a cache set to share these parts between addresses and ECCs. Using SMARTag, the cache access time is intact since the ECC part is bypassed in normal cache operation and no extra memory is required since ECCs are stored in available tag space. The simulation results show that SMARTag is capable of correcting more than 98% of errors in the tag array, on average, and its energy consumption, area, and performance overhead is less than 0.2%.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:30

IP5-8, 287

IMPROVING THE ERROR BEHAVIOR OF DRAM BY EXPLOITING ITS Z-CHANNEL PROPERTY
Speaker:
Kira Kraft, University of Kaiserslautern, DE
Authors:
Kira Kraft¹, Matthias Jung², Chirag Sudarshan¹, Deepak M. Mathew¹, Christian Weis¹ and Norbert Wehn¹
¹University of Kaiserslautern, DE; ²Fraunhofer IESE, DE
Abstract
In this paper, we present a new communication theoretic channel model for Dynamic Random Access Memory (DRAM) retention errors, that relies on the fully asymmetric retention error behavior of DRAM cells. This new model shows that the traditional approach is over pessimistic and we confirm this with real measurements of DDR3 and DDR4 DRAM devices. Together with an exploitation of the vendor specific true- and anti-cell structure, a low complexity bit-flipping approach is presented, that can largely increase DRAM's reliability with minimum overhead.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:31

IP5-9, 668

ARCHITECTURE AND OPTIMIZATION OF ASSOCIATIVE MEMORIES USED FOR THE IMPLEMENTATION OF LOGIC FUNCTIONS BASED ON NANOELECTRONIC 1S1R CELLS
Speaker:
Arne Heittmann, RWTH-Aachen University, DE
Authors:
Arne Heittman and Tobias G. Noll, RWTH Aachen University, DE
Abstract
A neuromorphic architecture based on Binary Associative memories and nanoelectronic resistive switches is proposed for the realization of arbitrary logic/arithmetic functions. Subsets of non-trivial code sets based on error detecting 2-out-of-n-codes are thoroughly used to encode operands, results, and intermediate states in order to enhance the circuit reliability by mitigating the impact of device variability. 2-ary functions can be implemented by cascading a mixer memory, a correlator memory, and a response memory. By introduction of a new cost function based on class-specific word-line-coverage, stochastic optimization is applied with the aim to minimize the overall number of active amplifiers. For various exemplary functions optimized architectures are compared against solutions obtained using a standard-cost function. It is shown that the consideration of word-line-coverage results in a significant circuit compaction.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:32

IP5-17, 62

EXPLORING NON-VOLATILE MAIN MEMORY ARCHITECTURES FOR HANDHELD DEVICES
Speaker:
Virendra Singh, Indian Institute of Technology Bombay, IN
Authors:
Sneha Ved and Manu Awasthi, Indian Institute of Technology Gandhinagar, IN
Abstract
As additional functionality is being added to contemporary handheld devices, the SoCs inside these devices are becoming increasingly complex. Similarly, the applications executing on these handhelds are beginning to exhibit an ever increasing memory footprint. To support these trends, main memory capacity of these SoCs has been increasing over time. Due to these developments, memory system's contribution to the overall system power has increased dramatically. Non-volatile memories have been used in server architectures to increase capacity as well as keep memory system's power consumption in check. However, in the handheld domain, where user experience and battery life are of paramount importance, the applicability of such technologies has not been widely studied. In this paper, we propose and evaluate a number of hybrid memory architectures using mobile DRAM and PCM. We show that intelligent memory architectures, cognizant of workload's memory access patterns can provide significant energy savings without compromising on user experience. Using proposed approach, we can devise architectures that exhibit significant energy savings with only a 2.8% performance loss.
Download Paper (PDF; Only available from the DATE venue WiFi)

15:30

End of session
Coffee Break in Exhibition Area

Coffee Breaks in the Exhibition Area

On all conference days (Tuesday to Thursday), coffee and tea will be served during the coffee breaks at the below-mentioned times in the exhibition area (Terrace Level of the ICCD).

Lunch Breaks (Großer Saal + Saal 1)

On all conference days (Tuesday to Thursday), a seated lunch (lunch buffet) will be offered in the rooms "Großer Saal" and "Saal 1" (Saal Level of the ICCD) to fully registered conference delegates only. There will be badge control at the entrance to the lunch break area.

Tuesday, March 20, 2018