# Three-Terminal MTJ-Based Nonvolatile Logic Circuits with Self-Terminated Writing Mechanism for Ultra-Low-Power VLSI Processor

Takahiro Hanyu<sup>\*1)</sup>, Daisuke Suzuki<sup>\*2)</sup>, Naoya Onizawa<sup>\*2)</sup>, and Masanori Natsui<sup>\*1)</sup>

<sup>\*1)</sup> Research Institute of Electrical Communication, Tohoku University, Japan <sup>\*2)</sup> Frontier Research Institute for Interdisciplinary Science, Tohoku University, Japan

Abstract: Magnetic-tunnel-junction (MTJ)-based nonvolatile logic circuits have the potential to solve the serious power-dissipation problem facing present CMOS-only VLSI processors. Three-terminal MTJ devices are a promising candidate for the nonvolatile storage device in such circuits. However, their write energy is still high in comparison with conventional CMOS-only logic circuits. In this paper, a new MTJ-based nonvolatile logic circuit with a self-terminated write mechanism is proposed, and its energy efficiency is evaluated in comparison with the corresponding previous work. In addition, some recent research topics related to MTJ-based nonvolatile logic-circuit design and its applications, such as a computer-aided-design (CAD) tool that considers stochastic MTJ-switching behavior and an application to a resilient "die-hard" VLSI processor that tolerates sudden power-supply outages, are also presented.

## I. INTRODUCTION

In the Internet of Everything (IoE) era, it is essential to establish an ultra-low-power VLSI architecture while continuing to increase processing performance. Present CMOS-only VLSI, however, suffers from several serious problems, such as the communication bottleneck between memory and logic modules inside a chip, increasing power dissipation (especially standby power dissipation), and the effect of device-characteristic variation. Meanwhile, several emerging storage devices are being developed to overcome the weak points of ordinary semiconductor memories, namely dynamic random-access memory (DRAM) and static random-access memory (SRAM). In particular, magneto-resistive random-access memory (MRAM), which has already undergone a few incarnations, is now converging on a scheme for upending the memory business. Spin-transfer torque (STT) MRAM promises speed and reliability comparable to those of SRAM, the quick-access memory embedded inside microprocessors, along with the "non-volatility" of flash, the storage of smartphones and other portables [1-2]. Since the magnetic tunnel junction (MTJ) device, the key element of MRAM, can easily be distributed over a logic-circuit plane by using a three-dimensional (3D) stack structure as shown in Fig. 1, performance degradation due to intra-chip global wires could be drastically mitigated, which could lead to high-performance, ultra-low-power, and highly reliable (or highly resilient, "die-hard"-like) logic LSIs [3-5].

One of the most useful methods for cutting leakage power is power gating. If power gating is applied to a conventional CMOS-only logic LSI, part of the standby power can certainly be eliminated, but two additional operations, the "back-up" and "boost-up" procedures, must be performed before and after power gating, respectively, which may discourage use of the power-gating technique. In contrast, the use of nonvolatile devices as storage elements maximizes the power saving obtained from power gating. Since a judicious combination of nonvolatile devices and the power-gating technique is realized naturally in the MTJ-based nonvolatile VLSI-processor architecture, the wasted power dissipation could ideally be eliminated.

In order to accelerate the implementation of nonvolatile logic LSIs, key technologies must be developed at every level of the LSI-design hierarchy, from the device/material level to the system-architecture/application level. In the following sections, some concrete efforts to develop nonvolatile logic LSIs are presented. First, a self-termination technique is introduced to minimize the write energy of 3-terminal MTJs as well as to verify the write operation [6-9]. In addition, two more recent research topics related to MTJ-based nonvolatile logic-circuit design and its applications are presented: a computer-aided-design (CAD) tool that considers stochastic MTJ-switching behavior [10-12], and an application to a resilient "die-hard" VLSI processor that tolerates sudden power-supply outages [13-16].

## II. DESIGN OF A 3-TERMINAL MTJ-BASED FLIP-FLOP WITH SELF-TERMINATED WRITE MECHANISM

A nonvolatile flip-flop (NV-FF) is an essential component of the nonvolatile logic LSI since the temporal data of each function block must be clock-synchronized, backed up before power-off, and recalled just after power-on. An MTJ device is the best candidate for implementing the NV-FF owing to its virtually unlimited endurance, CMOS compatibility, and 3-dimensional stacking capability. While various types of MTJ-based NV-FFs have been proposed, important design issues remain. The first issue is the stochastic nature of MTJ switching. Even in the same MTJ device, the actual time needed to complete the write operation varies dramatically. Therefore, a reliable write operation requires a write current pulse that is longer than the average switching time, which results in a large amount of energy consumption. This large write energy is critical for the backup operation because the temporal data in each NV-FF must be stored into the MTJ device before power-off. Thus, the break-even time, that is, the power-off time at which the energy saved by the power-gating technique equals the energy lost in the backup/recall operations, becomes long. As a result, fine-grained power gating cannot be applied, and the potential of the nonvolatile logic LSI is not fully utilized. To overcome this issue, a self-terminated NV-FF, which minimizes the write energy of the MTJ device by monitoring the voltage change caused by MTJ switching and then terminating the write current, has been reported [6-9]. In particular, this work focuses on the 3-terminal MTJ (3T-MTJ) device, which is suited to high-speed nonvolatile memories and high-performance logic gates. Since the write current path is separated from the read current path in the 3T-MTJ device, the sense circuit for reading the MTJ resistance and the write driver for applying the write current can be optimized individually.
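The break-even time defined above can be illustrated with a minimal sketch. It assumes a constant standby power that power gating removes entirely, so the break-even time is simply the backup plus recall energy divided by that power. The function name `break_even_time_s`, the recall energy, and the standby power are hypothetical values chosen for illustration; only the two backup energies are taken from Fig. 4(b).

```python
def break_even_time_s(backup_energy_j, recall_energy_j, standby_power_w):
    """Power-off time at which the standby energy saved by power gating
    equals the energy spent on the backup and recall operations."""
    if standby_power_w <= 0:
        raise ValueError("standby power must be positive")
    return (backup_energy_j + recall_energy_j) / standby_power_w


# Hypothetical operating point (NOT from the paper): 1 pJ recall energy
# and 1 uW of standby power removed by power gating.
RECALL_ENERGY_J = 1e-12
STANDBY_POWER_W = 1e-6

# Average backup energies reported in Fig. 4(b).
BACKUP_ENERGY_J = {
    "worst-case-oriented": 11.5e-12,
    "self-terminated": 3.57e-12,
}

for label, backup_j in BACKUP_ENERGY_J.items():
    t_be = break_even_time_s(backup_j, RECALL_ENERGY_J, STANDBY_POWER_W)
    print(f"{label}: break-even time = {t_be * 1e6:.2f} us")
```

Under these assumptions a lower backup energy shortens the break-even time proportionally, which is why reducing write energy makes finer-grained power gating worthwhile.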

Figure 2(a) shows the schematic diagram of the self-terminated NV-FF, which is composed of a master latch, a slave latch, a write driver, a nonvolatile storage cell, and a self-termination circuit. Figure 2(b) compares nonvolatile storage cells using the 2T-MTJ device and the 3T-MTJ device. In the 3T-MTJ-based nonvolatile storage cell, the read-operation path is separated from the write-operation path, which relaxes the constraints on design-space exploration.

Figure 3 shows the basic behavior of the self-terminated NV-FF. Three backup operations (denoted as *BCK*) are performed, one just after each basic operation (denoted as *OP*). In the first backup operation, Nq=0 is stored into the MTJ device as *M*. After the MTJ switches, *DONE* quickly goes high and *STR* quickly goes low. Then, the write current $I_{WR}$ is terminated. In the second backup operation, Nq=1 is stored into the MTJ device in a similar manner. In the third backup operation, in contrast, no write current $I_{WR}$ is applied and the backup operation is skipped because both Nq and *M* are already '1'.
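The sequence in Fig. 3 can be mimicked by a small behavioral model. This is only a sketch, not the circuit of Fig. 2: the names `BackupResult` and `self_terminated_backup`, the switching times, the initial MTJ state, and the upper bound on the pulse width are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupResult:
    stored: int           # value M held in the MTJ after the backup
    write_time_ns: float  # time for which the write current I_WR flowed
    skipped: bool         # True when Nq already equals M


def self_terminated_backup(nq, m, switching_time_ns, max_pulse_ns=12.0):
    """Behavioral model of one backup operation.

    switching_time_ns is the (stochastic) time the MTJ needs to switch;
    max_pulse_ns is an assumed upper bound on the write pulse.
    """
    if nq == m:
        # Data already stored: no write current is applied.
        return BackupResult(stored=m, write_time_ns=0.0, skipped=True)
    if switching_time_ns <= max_pulse_ns:
        # Switching detected (DONE goes high, STR goes low): stop I_WR.
        return BackupResult(stored=nq, write_time_ns=switching_time_ns, skipped=False)
    # MTJ did not switch within the bound: M keeps its old value.
    return BackupResult(stored=m, write_time_ns=max_pulse_ns, skipped=False)


# Replay of the three backups in Fig. 3 (Nq = 0, 1, 1), assuming M starts
# at 1 and using hypothetical switching times.
m = 1
for nq, t_sw in ((0, 3.0), (1, 5.0), (1, 4.0)):
    result = self_terminated_backup(nq, m, t_sw)
    m = result.stored
    print(nq, result)
```

In this model the write current flows only for the actual switching time, whereas a worst-case-oriented driver would apply the full pulse width on every backup.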

Figure 4(a) compares 20 cases of backup energy consumption in a conventional 3T-MTJ-based NV-FF using the worst-case-oriented method with those of the self-terminated NV-FF. By utilizing the self-terminated mechanism, the backup energy consumption is greatly reduced. In fact, the average backup energy is reduced by 69% compared with that of the conventional non-self-terminated method, as shown in Fig. 4(b).
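The 69% figure follows directly from the two average energies listed in Fig. 4(b). The short check below uses only those two values; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(conventional, proposed):
    """Fractional reduction of `proposed` relative to `conventional`."""
    return 1.0 - proposed / conventional


# Average backup energies from Fig. 4(b), in pJ.
CONVENTIONAL_PJ = 11.5
SELF_TERMINATED_PJ = 3.57

reduction = relative_reduction(CONVENTIONAL_PJ, SELF_TERMINATED_PJ)
print(f"average backup energy reduction: {reduction:.1%}")
```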

## III. CONTENT-AWARE WRITE ERROR MASKING TECHNIQUE CONSIDERING STOCHASTIC MTJ-SWITCHING BEHAVIOR

In order to design practical-scale MTJ/MOS-hybrid logic LSIs and to broaden the scope of their application, it is important to establish a sophisticated design environment that faithfully reflects the physical behavior of the MTJ device while retaining good compatibility with standard EDA tools for CMOS-based VLSI. Especially in such a novel logic-LSI design, the stochastic behavior of MTJ devices due to unavoidable thermal fluctuations of magnetization [17] must be reproduced since it often causes unexpected fatal errors in logic operations. Some approaches based on Verilog-A models have been reported; however, their results were limited to primitive logic gates or logic circuits with a regular structure, such as a logic-array structure. From this point of view, a new design flow for MTJ/MOS-hybrid LSIs that considers the stochastic behavior of MTJ devices has been reported; it combines new supplementary design libraries that reflect the physical behavior of the MTJ device with de facto standard EDA tools [10]. By utilizing the proposed flow, various MTJ-based nonvolatile logic-in-memory (NV-LIM) circuits can be designed with the Verilog Hardware Description Language (Verilog-HDL). Their operation, including the effect of the MTJ's stochastic switching behavior, can be verified by analog-mixed-signal (AMS) simulation.

Figure 5 shows a layout of the microprocessor in 90 nm MOS/100 nm perpendicular MTJ technology generated by the proposed flow; the validity of the layout can be completely verified through DRC and LVS using standard EDA tools. The processor is based on a general-purpose 32-bit microprocessor (ARM Cortex-M0), in which all flip-flops are replaced with nonvolatile flip-flops to add non-volatility. Figure 6 shows a block diagram of the test-bench structure that monitors the stochastic behavior of MTJ devices. The number of write errors is counted by digital simulation while the difference in the resistance values of each MTJ device is calculated by analog simulation; both are executed simultaneously by fully utilizing the AMS simulation environment.
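The idea of counting write errors caused by stochastic switching can be illustrated outside the AMS environment with a plain Monte-Carlo sketch. This is not the authors' CAD flow or device model: the exponential switching-probability law, the time constant `tau_ns`, and the function names are assumptions chosen only to show how a write-error count depends on the pulse width.

```python
import math
import random


def switching_probability(pulse_ns, tau_ns):
    """Assumed model: probability that an MTJ has switched after pulse_ns."""
    return 1.0 - math.exp(-pulse_ns / tau_ns)


def count_write_errors(num_cells, pulse_ns, tau_ns, seed=0):
    """Count cells that fail to switch within the write pulse."""
    rng = random.Random(seed)  # seeded for reproducible runs
    p_switch = switching_probability(pulse_ns, tau_ns)
    errors = 0
    for _ in range(num_cells):
        # A cell fails when its random draw falls outside p_switch.
        if rng.random() >= p_switch:
            errors += 1
    return errors


# Hypothetical sweep: longer pulses give fewer write errors.
for pulse_ns in (2.0, 6.0, 12.0):
    n_err = count_write_errors(num_cells=10_000, pulse_ns=pulse_ns, tau_ns=2.0)
    print(f"pulse = {pulse_ns:4.1f} ns -> {n_err} write errors out of 10000")
```

Because the same seed is reused for every pulse width, the error count in this sketch can only decrease as the pulse gets longer, which makes the trend easy to inspect.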

As a concrete example of nonvolatile logic-LSI design using the above CAD environment, MTJ-based video-coding hardware for motion-vector (MV) prediction with an MTJ-write-error-rate relaxation scheme is demonstrated in a 90 nm MOS and 75 nm perpendicular MTJ process. Figure 7 shows the circuit structure of a processing element (PE) with a dynamic error masking function. The basic function of the PE is to calculate the absolute difference between the input pixel value and the pixel value stored in an 8-bit nonvolatile memory. The error masking function is embedded compactly into each PE by partially sharing the circuit for the absolute-difference function. The PE has two modes: test mode and operation mode. If non-negligible errors are found in the test mode, a 1-bit nonvolatile flip-flop (NVFF) is set to 1, and the PE outputs 0 in the operation mode, independent of the input data, until the next write operation occurs. Figure 8 shows the effect of the proposed error checking function on relaxation of the acceptable MTJ write error rate. Three different ways of handling errors are compared by evaluating the acceptable MTJ write error rate that keeps the average root-mean-square error in the predicted MVs below 0.5 pixel. The proposed dynamic error-checking scheme is quite simple but effectively mitigates the impact of failed memories regardless of the content of the video sequences, and achieves an improvement of up to 7.8 times. This technique can be applied to LIM-style circuitry in general and would contribute to the design of highly reliable and low-power NV-LIM LSIs.
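The PE behavior described above can be summarized in a short behavioral sketch. The class and method names, the bit-flip fault mask used to emulate a write error, and the tolerance used to decide whether an error is non-negligible are all assumptions; the source does not specify how the test mode judges an error.

```python
class ProcessingElement:
    """Behavioral model of a PE with dynamic write-error masking."""

    def __init__(self):
        self.stored_pixel = 0  # contents of the 8-bit nonvolatile memory
        self.err = 0           # 1-bit NVFF holding the ERR flag

    def write(self, pixel, fault_mask=0):
        """Store a pixel; fault_mask flips bits to emulate MTJ write errors."""
        self.stored_pixel = (pixel ^ fault_mask) & 0xFF
        self.err = 0  # a new write clears the previous mask

    def test(self, expected_pixel, tolerance=0):
        """Test mode: flag the PE if the stored value deviates too much."""
        diff = abs(expected_pixel - self.stored_pixel)
        self.err = 1 if diff > tolerance else 0
        return self.err

    def operate(self, input_pixel):
        """Operation mode: absolute difference, or 0 when masked."""
        if self.err:
            return 0
        return abs(input_pixel - self.stored_pixel)


pe = ProcessingElement()
pe.write(100, fault_mask=0x80)  # emulate an error in the most significant bit
pe.test(expected_pixel=100, tolerance=3)
print(pe.err, pe.operate(90))
```

Reusing the absolute-difference computation in both `test` and `operate` mirrors the statement that the masking function partially shares the absolute-difference circuit.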

## IV. SUDDEN POWER-OUTAGE RESILIENT IN-PROCESSOR CHECKPOINTING TECHNIQUE

In energy-harvesting applications, the power supply generated from a renewable power source is unstable and may induce frequent sudden power outages. These outages cause inconsistency among distributed nonvolatile flip-flops (NVFFs) and hence failed rollbacks in conventional nonvolatile processors. Checkpointing techniques [18] using nonvolatile storage devices are an effective solution for realizing continuous operation under frequent power outages, as shown in Fig. 9. Nonvolatile-storage-based checkpointing can reduce the rollback time by avoiding access to external storage, but consistency among the data stored in different nonvolatile elements must be maintained. So far, a few papers [19-20] have reported methods that use nonvolatile memories, such as MRAMs and ferroelectric RAMs, instead of SRAM to store checkpoints. However, the within-processor consistency of the stored data has not been considered.

The proposed in-processor checkpointing technique resolves the inconsistency using time-reminding redundant nonvolatile flip-flops (TM-RNVFFs). Fig. 10 shows a circuit diagram of the TM-RNVFFs, where a redundancy of r=3 is used. The TM-RNVFFs store the current data and the past few data together with timing information about when they were stored. If several NV-FFs fail to store the current data because of a sudden power outage, the proposed technique exploits the timing information to find the newest state common to the distributed NV-FFs, leading to a correct rollback to a consistent state.
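The search for the newest common state can be sketched in software as follows. This is an illustrative model rather than the circuit of Fig. 10: each flip-flop is represented as a dictionary mapping a checkpoint timestamp to the stored bit, holding at most r entries, and the function names are hypothetical.

```python
def newest_common_timestamp(flip_flops):
    """Return the latest timestamp stored in every flip-flop, or None."""
    if not flip_flops:
        return None
    common = set(flip_flops[0])
    for ff in flip_flops[1:]:
        common &= set(ff)  # keep only timestamps present in all flip-flops
    return max(common) if common else None


def rollback_state(flip_flops):
    """Recover a consistent state from redundant, time-stamped entries."""
    t = newest_common_timestamp(flip_flops)
    if t is None:
        raise RuntimeError("no consistent checkpoint found")
    return t, [ff[t] for ff in flip_flops]


# Example with r=3: the second flip-flop failed to store checkpoint 7
# during a power outage, so it still holds checkpoints 4, 5, and 6.
ffs = [
    {5: 1, 6: 0, 7: 1},
    {4: 0, 5: 1, 6: 1},
    {5: 0, 6: 0, 7: 0},
]
print(rollback_state(ffs))
```

Without the redundant entries, the flip-flops in this example would hold a mixture of checkpoint 6 and checkpoint 7 data, which is the inconsistency the technique is meant to avoid.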

The effect of sudden power outages is modeled in order to perform design-space exploration over different configurations, such as the redundancy and the checkpointing period. Nonvolatile ARM Cortex-M0 processors are designed using hybrid 90 nm CMOS and 70 nm magnetic tunnel junction (MTJ) technologies, as shown in Fig. 11. Table 1 summarizes the performance of the nonvolatile ARM Cortex-M0 processors. Based on the design-space exploration, the proposed nonvolatile processor achieves a reduction of several orders of magnitude in rollback error probability, with a power dissipation overhead of 11.6% and an area overhead of 52.1% in comparison with the conventional nonvolatile processor.
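The quoted overheads can be reproduced from Table 1. The text does not state which configuration they refer to; the check below assumes the dual-redundant design (r=2) with k=40, using the n=20 power figures, because those entries match the quoted percentages.

```python
def overhead(proposed, conventional):
    """Relative overhead of `proposed` with respect to `conventional`."""
    return proposed / conventional - 1.0


# Values from Table 1 (assumed configuration: r=2, k=40, n=20).
power_overhead = overhead(proposed=0.77, conventional=0.69)       # mW
area_overhead = overhead(proposed=129_681, conventional=85_282)   # um^2

print(f"power overhead: {power_overhead:.1%}")
print(f"area overhead:  {area_overhead:.1%}")
```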

## V. CONCLUSION

Some key techniques for realizing practical-scale nonvolatile logic LSIs have been introduced, and their usefulness has been demonstrated in comparison with the corresponding conventional approaches. As a future prospect, it is also important to extend the nonvolatile MTJ-based computing technique toward non-standard VLSI-computing paradigms, such as brain-inspired computing.

## ACKNOWLEDGMENT

A part of this research was supported by JST ImPACT Program, R&D for Next-Generation Info. Tech. of MEXT, JST COI program, and JSPS KAKENHI Grant No. 15H02254 in Japan.

## REFERENCES

- [1] S. Ikeda, J. Hayakawa, Y. M. Lee, F. Matsukura, Y. Ohno, T. Hanyu, and H. Ohno, "Magnetic Tunnel Junctions for Spintronic Memories and Beyond," IEEE Trans. Electron Devices, vol.54, no.5, pp.991-1002, May 2007.
- [2] R. Courtland, "Spin memory shows its might," IEEE Spectrum, pp.11-12, Aug. 2014.
- [3] T. Hanyu, T. Endoh, D. Suzuki, H. Koike, Y. Ma, N. Onizawa, M. Natsui, S. Ikeda, and H. Ohno, "Standby Power-Free Integrated Circuits Using Spintronics-Based VLSI Computing," Proc. IEEE, vol.104, no.10, pp.1844-1863, Oct. 2016.
- [4] M. Masuduzzaman and M. A. Alam, "Emergence of Memtronics: from Memory to Sensor, Logic and Display," IEEE Electron Devices Society Newsletters, Technical Briefs, vol.23, no.4, pp.1-5, Oct. 2016.
- [5] W. Kang et al., "Spintronic logic design methodology based on spin Hall effect driven magnetic tunnel junctions," J. Phys. D: Appl. Phys., vol.49, no.6, p.065008, 2016.
- [6] D. Chabi, et al., "Ultra Low Power Magnetic Flip-Flop Based on Checkpointing/Power Gating and Self-Enable Mechanisms," IEEE Trans. on Circuits and Systems I, vol.61, no.6, pp.1755-1765, 2014.
- [7] D. Suzuki, M. Natsui, A. Mochizuki, and T. Hanyu, "Cost-Efficient Self-Terminated Write Driver for Spin-Transfer-Torque RAM and Logic," IEEE Trans. Magn., vol. 50, no. 11, pp. 3402104, Nov. 2014.
- [8] D. Suzuki and T. Hanyu, "A Greedy Power Saving of MTJ-Based Nonvolatile FPGA with Self-Terminated Logic-In-Memory Structure," Proc. of Int. Conf. on Field-Programmable Logic and Applications (FPL), pp.1-4, Aug. 2016.
- [9] D. Suzuki and T. Hanyu, "Design of a Self-Terminated Low-Power Nonvolatile Flip-Flop Using 3-Terminal Magnetic-Tunnel-Junction-Based Self-Terminated Mechanism," Japanese Journal of Applied Physics (JJAP) (to appear).
- [10] M. Natsui, A. Tamakoshi, A. Mochizuki, H. Koike, H. Ohno, T. Endoh, and T. Hanyu, "Stochastic Behavior-Considered VLSI CAD Environment for MTJ/MOS-Hybrid Microprocessor Design," 2016 IEEE International Symposium on Circuits and Systems (ISCAS2016), pp.1878-1881, May 2016.
- [11] M. Natsui, A. Tamakoshi, T. Endoh, H. Ohno, and T. Hanyu, "Highly Reliable MTJ-Based Motion-Vector Prediction Unit with Dynamic Write Error Masking Scheme," Proc. of 2016 International Conference on Solid State Devices and Materials (SSDM2016), pp.77-78, Sept. 2016.
- [12] M. Natsui, A. Tamakoshi, T. Endoh, H. Ohno, and T. Hanyu, "Fabrication of an MTJ-Based Nonvolatile Logic-in-Memory LSI with Content-Aware Write Error Masking Scheme Achieving 92% Storage Capacity and 79% Power Reduction," Japanese Journal of Applied Physics (JJAP) (to appear).
- [13] N. Onizawa, A. Mochizuki, A. Tamakoshi, and T. Hanyu, "A Sudden Power-Outage Resilient Nonvolatile Microprocessor for Immediate System Recovery," Proc. 11th IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), pp.39-44, July 2015.
- [14] N. Onizawa and T. Hanyu, "A Soft/Write-Error Resilient CMOS/MTJ Nonvolatile Flip-Flop Based on Majority-Decision Shared Writing," Proc. of 2016 International Conference on Solid State Devices and Materials (SSDM2016), pp.79-80, Sept. 2016.
- [15] N. Onizawa, A. Mochizuki, A. Tamakoshi, and T. Hanyu, "Sudden Power-Outage Resilient In-Processor Checkpointing for Energy-Harvesting Nonvolatile Processors," IEEE Trans. on Emerging Topics in Computing (to appear; DOI 10.1109/TETC.2016.2604083).
- [16] N. Onizawa and T. Hanyu, "A Soft/Write-Error Resilient CMOS/MTJ Nonvolatile Flip-Flop Based on Majority-Decision Shared Writing," Japanese Journal of Applied Physics (JJAP) (to appear).
- [17] M. Marins de Castro et al., "Precessional spin-transfer switching in a magnetic tunnel junction with a synthetic anti-ferromagnetic perpendicular polarizer," J. Appl. Phys., vol.111, p.07C912, 2012.
- [18] R. Oldfield, et al., "Modeling the impact of checkpoints on next-generation systems," in 24th MSST, Sept 2007, pp. 30–46.
- [19] P. Chi, et al., "Using multi-level cell STT-RAM for fast and energy-efficient local checkpointing," in 2014 ICCAD, Nov 2014, pp. 301–308.
- [20] X. Mimi, et al., "Fixing the broken time machine: Consistency-aware checkpointing for energy harvesting powered non-volatile processor," in 52nd DAC, June 2015, pp. 1–6.



Figure 1: MTJ-based Nonvolatile logic-LSI architecture with logic-in-memory structure.



Figure 3: Basic behavior, where three backup operations (denoted as BCK) are performed, one just after each basic operation (denoted as OP).



[Figure 2(a) schematic. Recoverable block labels: master latch (CLK, D), slave latch (CLK, Nq), write driver, self-termination circuit (STR, BCK, CMP), storage cell.]



Figure 2: Design of a self-terminated NV-FF; (a) overall schematic, and (b) nonvolatile storage cell configurations.

|                                     | Worst-case-oriented (conventional) \*3) | Self-terminated (proposed) |
|-------------------------------------|------------------------------------------|----------------------------|
| Average backup energy [pJ] \*1, 2)  | 11.5                                     | 3.57                       |

\*1) The number of iterations is 100.

\*2) Random patterns are used for the logic inputs and initial states of 3T-MTJ devices.

\*3) The width of the write current pulse is fixed to 12 ns.

(b)

Figure 4: Monte-Carlo simulation result; (a) comparison of write energy consumptions with 20 benchmark circuits, (b) comparison of average energies.



microprocessor.

Figure 8: Effect of an error masking function on relaxation of acceptable MTJ write error rate.



Figure 6: Block diagram of the test-bench structure for simulation.






Figure 7: Processing element with error detection/masking functions. The existence of error is represented by the signal ERR.



Table 1: Performance comparison of nonvolatile ARM Cortex-M0 processors.

| Metric | n | Conventional | Proposed, r=3, k=10 | Proposed, r=3, k=20 | Proposed, r=3, k=40 | Proposed, r=2, k=10 | Proposed, r=2, k=20 | Proposed, r=2, k=40 |
|---|---|---|---|---|---|---|---|---|
| Total power dissipation $P_t$ [mW] @10 MHz | 1 | 3.04 | 5.42 | 4.97 | 4.75 | 3.89 | 3.51 | 3.29 |
| | 5 | 1.06 | 1.99 | 1.75 | 1.63 | 1.46 | 1.28 | 1.16 |
| | 10 | 0.82 | 1.56 | 1.35 | 1.24 | 1.16 | 1.00 | 0.90 |
| | 20 | 0.69 | 1.35 | 1.15 | 1.05 | 1.01 | 0.86 | 0.77 |
| Rollback error probability $p_E$ @ $\sigma_G$=1 and $\sigma_L$=0.015 | 1 | 0.3173 | 0.0027 | 0.0027 | 0.0027 | 0.0455 | 0.0455 | 0.0455 |
| | 5 | 0.3173 | ~0 | ~0 | 4.71E-08 | ~0 | ~0 | 5.55E-12 |
| | 10 | 0.3173 | ~0 | ~0 | 4.71E-08 | ~0 | ~0 | 5.55E-12 |
| | 20 | 0.3173 | ~0 | ~0 | 4.71E-08 | ~0 | ~0 | 5.55E-12 |
| Area [$\mu m^2$] | - | 85,282 | 197,073 | 186,835 | 177,307 | 150,492 | 136,858 | 129,681 |
| Write energy per checkpointing [pJ] | - | 247 | 428 | 403 | 390 | 304 | 278 | 265 |
| Re-computation energy $E_r$ [pJ] | 10 | 256 | 511 | 426 | 383 | 385 | 326 | 288 |

Triple-redundant NVFFs correspond to r=3 and dual-redundant NVFFs to r=2; k is the configuration parameter.