# STT-RAM Designs Supporting Dual-port Accesses

Xiuyuan Bi<sup>1</sup>, Mohamed Anis Weldon<sup>2</sup> and Hai Li<sup>1</sup>

<sup>1</sup>Department of Electrical & Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA

<sup>2</sup>Department of Electrical & Computer Engineering, Polytechnic Institute of NYU, New York, NY, USA

Email: {xib5, hal66}@pitt.edu, mweldo01@students.poly.edu

Abstract—The spin-transfer torque random access memory (STT-RAM) has been widely investigated as a promising candidate to replace static random access memory (SRAM) in on-chip caches. However, existing STT-RAM cell designs support only single-port accesses, which limits the memory access bandwidth and constrains system performance. In this work, we propose design solutions that provide dual-port accesses for STT-RAM. The area increase caused by the additional port is reduced by leveraging the shared source-line structure. We analyze in detail the performance and reliability degradation caused by dual-port accesses and perform the corresponding design optimization. We propose two types of dual-port STT-RAM cell structures, with two read/write ports (2RW) or one read port and one write port (1R/1W), respectively. Comparison shows that a 2RW STT-RAM cell consumes only 42% of the area of a dual-port SRAM cell. The 1R/1W design further reduces the cell area by 7.7% under the same performance target.

#### I. INTRODUCTION

The continuously increasing demand for system performance and functionality in recent years has greatly stimulated the development of Chip-Multiprocessors (CMP) and Systems-on-Chip (SoC). Consequently, the large volume of instruction and data exchange among different levels of the memory hierarchy makes memory accesses more and more critical. Often a memory array receives multiple requests from one or many cores at the same time. A single-port memory, which grants access to one request and stalls all the others, can lead to significant performance degradation. Therefore, dual-port or multi-port memory, which reduces access conflicts and provides high memory bandwidth, has become a popular approach [1][2][3].

The traditional static random access memory (SRAM) used as on-chip memory is facing severe challenges at scaled technology nodes, such as large cell size, higher leakage power and vulnerability to soft errors [4]. On the other hand, spin-transfer torque random access memory (STT-RAM) has demonstrated great potential to replace SRAM in the near future and has attracted much attention from both academia and industry [5][6]. By storing data as the relative magnetization direction of a magnetic tunnel junction (MTJ), STT-RAM provides high density, fast access speed, zero standby power, as well as hardness to radiation-induced soft errors.

However, all previous STT-RAM designs support only single-port access [7][8]. For example, the popular one-transistor-one-MTJ cell structure contains only one set of word-line (WL), bit-line (BL), and source-line (SL), which makes dual-port access impossible. Since writing to an STT-RAM cell takes longer than programming an SRAM cell, the stall of pending accesses in STT-RAM becomes even more severe, especially when the port is occupied by write operations. Therefore, a dual-port or multi-port STT-RAM cell design is necessary to enhance system performance.

In dual-port SRAM designs [1][9], the additional port access is implemented by adding two extra access transistors and one set of WL/BL to the six-transistor cell design. However, as we shall show in Section III, the same design method cannot be applied to STT-RAM because of its extremely large area overhead. By leveraging the shared source-line array structure [10][11], we propose an STT-RAM design solution that supports dual-port accesses at a small cost in cell area. In our design, each STT-RAM cell has two BLs and a memory array shares a single grounded SL. *To the best of the authors' knowledge, this is the first STT-RAM design that enables dual-port accesses*.

To meet the different access requirements of various applications, two types of designs are presented. In a *2RW* STT-RAM cell, both data access ports support read and write operations. In contrast, a *1R1W* STT-RAM cell has one read-only port and one write-only port. Separating the read and write accesses reduces the size requirement of the access transistors, so an even smaller cell area can be achieved. Furthermore, we analyze the reliability of the proposed structures and present design and layout optimization techniques for density improvement. Our results show that the area of a 2RW STT-RAM cell is only 42% of a dual-port SRAM cell's area. Compared to a single-port STT-RAM cell, the area overhead introduced by the extra port is 39%. The 1R1W design, with its constrained access flexibility, further saves 7.7% of cell area and reduces the probability of read disturbance.

## **II. STT-RAM BASICS**

The basic storage element in STT-RAM is the magnetic tunneling junction (MTJ). Conceptually, an MTJ contains three layers as shown in Figure 1(a): two ferromagnetic layers, named the reference layer and the free layer, are separated by an oxide barrier, *e.g.* MgO. The magnetization direction of the reference layer is fixed, but the magnetization direction of the free layer can be switched by a spin-polarized current [12]. For example, a large current injected from the free layer to the reference layer can switch the magnetization direction of the free layer to be parallel to that of the reference layer, and vice versa. When the magnetization directions of the two ferromagnetic layers are *parallel* (P) or *anti-parallel* (AP), the MTJ demonstrates a low- or high-resistance state, representing logic '0' or '1', respectively.

Fig. 1. (a) MTJ in parallel and anti-parallel states; (b) 1T-1J STT-RAM cell.

Figure 1(b) illustrates the most popular STT-RAM cell structure, consisting of one NMOS transistor and one MTJ (1T-1J) [12][7]. The NMOS transistor, called the access transistor, connects to the MTJ's reference layer and controls the accessibility of the MTJ. Since there is only one set of WL, BL and SL, this cell structure can only be used for single-port memory design. The MTJ pillar has a very small area, so the NMOS transistor determines the area of an STT-RAM cell. In other words, a small transistor is desirable for high density. However, the MTJ switching performance relies strongly on the switching current [13]. Reducing the transistor size reduces the switching current through the MTJ and hence degrades the write performance.

In this work, we use 65nm CMOS technology [14] with a  $65nm \times 130nm$  in-plane MTJ model calibrated against the experimental data [8]. The switching behavior of the MTJ is modeled based on the Landau-Lifshitz-Gilbert equation [13]. The detailed parameters are listed in Table I.
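As a quick sanity check on the device parameters used throughout, the sketch below collects the Table I values and derives two ratios from them. The class and field names are illustrative and not from the original work, and the tunneling magnetoresistance (TMR) ratio $(R_{AP}-R_P)/R_P$ is a standard figure of merit that the text itself does not quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MtjParams:
    """Device parameters copied from Table I (field names are illustrative)."""
    r_p_ohm: float = 1880.0       # parallel (low) resistance, logic '0'
    r_ap_ohm: float = 3770.0      # anti-parallel (high) resistance, logic '1'
    i_ap_to_p_ua: float = 112.0   # write-0 switching current at 10 ns
    i_p_to_ap_ua: float = 142.0   # write-1 switching current at 10 ns

    def tmr(self) -> float:
        """Tunneling magnetoresistance ratio (R_AP - R_P) / R_P."""
        return (self.r_ap_ohm - self.r_p_ohm) / self.r_p_ohm

    def write_asymmetry(self) -> float:
        """Ratio of write-1 to write-0 switching current."""
        return self.i_p_to_ap_ua / self.i_ap_to_p_ua


mtj = MtjParams()
print(f"TMR ratio: {mtj.tmr():.3f}")
print(f"write-1 / write-0 current: {mtj.write_asymmetry():.3f}")
```

The write asymmetry above is the reason the write-1 operation, rather than write-0, ends up dictating the access transistor size in the later sections.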

## III. DUAL-PORT STT-RAM DESIGN CHALLENGES

Figure 2(a) illustrates a typical SRAM design with two sets of read/write ports [1][9]. Compared to a single-port SRAM cell with six transistors, two more transistors (M1 and M2) associated with the wordline control (WLB) and the data access connections (BLB and  $\overline{\text{BLB}}$ ) of the second port, are inserted.

By following the same design concept, Figure 2(b) shows a dual-port STT-RAM cell with four transistors and one MTJ (4T-1J). Here, a duplicate pair of BL and SL provides access through the second port. Compared to the single-port STT-RAM cell in Figure 1(b), three additional transistors (M1, M3, and M4) are needed for access control. Note that in an STT-RAM array, the BL and SL are usually shared by an entire column. For single-port cells, only one memory cell within a column can be activated at a time. Therefore, one transistor at the SL terminal is sufficient to control the accessibility of one cell per column. In contrast, a dual-port array may simultaneously access two cells within one column through Port-A and Port-B, respectively, as illustrated in Figure 3(a). Depending on the operation type and data pattern, the two concurrent accesses can have different BL voltages. Thus, M1 and M3 are necessary to isolate BLA and BLB from each other.

TABLE I SIMULATION PARAMETERS

| Parameter                                         | Value                               |
|---------------------------------------------------|-------------------------------------|
| Technology <sup>1</sup>                           | 65nm                                |
| VDD                                               | 1.2V                                |
| MTJ geometry                                      | $65 \text{nm} \times 130 \text{nm}$ |
| $R_P/R_{AP}$                                      | $1.88/3.77$ k$\Omega$               |
| $AP \rightarrow P$ Switching Current <sup>2</sup> | $112\mu A$                          |
| $P \rightarrow AP$ Switching Current <sup>2</sup> | $142\mu A$                          |

<sup>1</sup> The minimum channel length is 60nm.

<sup>2</sup> At 10ns switching time.

Fig. 2. (a) A typical dual-port SRAM. (b) A 4T-1J dual-port STT-RAM.

The 4T-1J dual-port STT-RAM cell is functionally correct, but because of its degraded biasing condition it pays a significant area overhead compared to the 1T-1J single-port design. As shown in Figure 1(b), a conventional 1T-1J STT-RAM cell encounters $V_{GS}$ degradation, induced by the voltage drop on the MTJ, only in write-1 operations, which constrains the switching current through the MTJ. The 4T-1J dual-port STT-RAM has a symmetric cell structure: along an access path, e.g., from BLA to SLA, two transistors M1 and M2 are turned on and connected on either side of the MTJ. In both write-1 and write-0 operations, one of them suffers from V<sub>GS</sub> degradation, as shown in Figure 3(b). In other words, the biasing condition of the access transistors in the 4T-1J cell is much worse than that of a 1T-1J design, and all the access transistors have to be enlarged to provide sufficient MTJ switching current in write operations.
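A rough first-order estimate illustrates why this degradation matters. It rests on three simplifying assumptions that are not part of the simulation setup: the word line is driven to $V_{DD} = 1.2$V, the far terminal of the MTJ is held at 0V (the 1T-1J write-1 case), and the MTJ resistance takes its Table I values regardless of bias. The transistor whose source sits on the MTJ side then loses the MTJ voltage drop from its gate drive:

$$
V_{GS} = V_{DD} - I_{W1}\,R_{MTJ}, \qquad
I_{W1} R_P = 142\,\mu\text{A} \times 1.88\,\text{k}\Omega \approx 0.27\,\text{V}, \qquad
I_{W1} R_{AP} = 142\,\mu\text{A} \times 3.77\,\text{k}\Omega \approx 0.54\,\text{V}.
$$

Under these assumptions the degraded transistor sees roughly 0.67V to 0.93V of gate drive instead of 1.2V while the cell switches from P to AP. In the 4T-1J cell the voltage drop across the second series transistor would reduce this further.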

Figure 4(a) shows the relation between the write-1 current and the size of the access transistors in the 4T-1J design, assuming all four transistors are of the same size. Here, the write-1 operation dominates the transistor size selection because of the asymmetric $P \rightarrow AP$ and $AP \rightarrow P$ switching currents of the MTJ device used in this work (see Table I). To achieve a write time of 10ns, the access transistor width must be approximately 1400nm. Integrating four such large transistors into one memory cell leads to a cell area of $575F^2$, which is even bigger than that of the dual-port SRAM design (*e.g.*, $233F^2$ reported in [9]). Such a large STT-RAM design is not acceptable for on-chip applications.

Fig. 3. (a) Two cells within a column accessed by two ports. (b) Biasing condition for 4T-1J.

#### IV. STT-RAM DESIGN WITH TWO READ/WRITE PORTS

#### A. Design Concept

Previously, the shared SL for single-port STT-RAM arrays was proposed by Zhao *et al.* to increase array density [10]. It has also been used to balance the write-0 and write-1 performance [11]. The basic design concept is that all the cells on the same row share the same SL, and all the SLs are then connected together and grounded (GND/0V).

In this work, we propose to reduce the cell area of the dual-port STT-RAM design by utilizing the shared SL structure. Figure 5 depicts the STT-RAM design with two read/write ports (2RW). Note that in [10] the grounded (0V) SL is connected to the transistor, whereas in the proposed 2RW design the grounded SL is connected to the MTJ in order to support dual-port access. The write-0 operation requires a switching current from SL (GND) to BL, so a negative voltage (V<sub>BLN</sub>) needs to be applied to BL, consistent with the write-0 adjustment of V<sub>BLN</sub> in Section IV-C. Such a V<sub>BLN</sub> can be generated using a level converter [15].

The 2RW cell design can significantly reduce the cell area compared to 4T-1J. First, since the SL is always connected to GND, isolating the SLs of different memory cells is no longer necessary. The transistors used for SL access control in the STT-RAM cell can be removed. Only two transistors, M1 and M2, remain to enable/disable the access to Port-A and Port-B, respectively. Thus, the number of transistors is half that of the 4T-1J dual-port design. Second, the width of the access transistors can decrease greatly because only one transistor exists along the current path between BL and SL. Figure 4(b) shows the relation between the write-1 current and the size of the access transistors in the proposed 2RW STT-RAM design.



Fig. 6. Illustration of how the access pattern affects $V_{\rm S}$.

The required transistor width to achieve the 10ns write time is 585nm, which is only  $\sim 42\%$  of the access transistor size of the 4T-1J STT-RAM cell. The area of a proposed 2RW cell is approximately 21% of the 4T-1J design.
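The two ratios quoted above follow from simple arithmetic if, as a simplifying assumption, cell area is taken to scale with the total access-transistor width in the cell (two transistors of 585nm versus four of 1400nm):

$$
\frac{585}{1400} \approx 0.418, \qquad \frac{2 \times 585}{4 \times 1400} = \frac{1170}{5600} \approx 0.209 .
$$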

## B. Reliability Analysis

The voltage of the shared SL (V<sub>S</sub>) in the single-port STT-RAM array may not be an ideal 0V due to the parasitic resistance ($R_S$) [10]. Figure 6 illustrates the scenario. When WL is turned on and a certain voltage is applied to BL, V<sub>S</sub> deviates from 0V, which degrades both read and write performance. For example, if V<sub>S</sub> is higher than the ideal 0V, the actual voltage drop between BL and SL is reduced. Consequently, the write-1 current becomes lower than the value projected under the ideal condition. In read operations, a higher V<sub>S</sub> decreases the read-0 current and a lower V<sub>S</sub> increases the read-1 current. The reduced difference between read-0 and read-1 currents can result in more read errors. V<sub>S</sub> variation can also lead to a higher possibility of read disturbance, i.e., an unwanted '0' $\rightarrow$ '1' switch when reading a cell that stores '0' [7]. A negative $V_S$ increases the read-0 current ($I_{R0}$) and brings it closer to the $P \rightarrow AP$ switching current (I<sub>W1</sub>).

For the proposed 2RW STT-RAM design, the impact of the $V_{\rm S}$ variation becomes even more severe. First, the $V_{\rm S}$ variation increases as the number of cells being accessed grows. When both ports access cells on the same row, as illustrated in Figure 6, the number of accessed cells doubles compared to that of a single-port STT-RAM array, so a larger $V_{\rm S}$ variation is expected. Second, in the single-port STT-RAM the read operations see a lower $V_S$ variation than the write operations. This is because a write requires a larger voltage amplitude applied to BL (|V<sub>B</sub>|) and the only port can perform either a write or a read access. In the 2RW STT-RAM design, however, a read and a write can be conducted simultaneously through the two sets of ports, and the interaction between them worsens the V<sub>S</sub> variation seen by read operations. Third, the value of $V_{\rm S}$ is also affected by the MTJ resistance states of the cells being accessed. When the MTJ is in the high-resistance state, $V_{\rm S}$ is less easily disturbed by V<sub>B</sub>.
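The three trends above can be reproduced qualitatively with a small nodal-analysis sketch. It is a deliberately simplified linear model, not the simulation used in this work: each accessed cell is treated as a resistor between its bit line and the shared SL node, and the SL node reaches ground through $R_S$. The transistor on-resistance `R_T` and the write voltage `V_WRITE` are assumed values chosen only for illustration, so the absolute numbers are not meant to match Table II.

```python
R_S = 27.5                   # shared-SL parasitic resistance in ohms, per [10]
R_P, R_AP = 1880.0, 3770.0   # MTJ resistances from Table I, in ohms
R_T = 1000.0                 # ASSUMED access-transistor on-resistance, in ohms
V_WRITE = 1.2                # ASSUMED bit-line voltage during a write, in volts
V_READ = 0.14                # read voltage quoted with Table II, in volts


def shared_sl_voltage(branches, r_s=R_S):
    """Solve Kirchhoff's current law at the shared SL node.

    branches is a list of (v_bl, r_mtj) pairs, one per accessed cell.
    Each cell is modeled as a resistor (r_mtj + R_T) from its bit line
    to the SL node; the SL node connects to ground through r_s.
    """
    conductances = [1.0 / (r_mtj + R_T) for _, r_mtj in branches]
    injected = sum(v_bl * g for (v_bl, _), g in zip(branches, conductances))
    return injected / (1.0 / r_s + sum(conductances))


one_port = shared_sl_voltage([(V_WRITE, R_P)] * 8)
two_ports = shared_sl_voltage([(V_WRITE, R_P)] * 16)
read_only = shared_sl_voltage([(V_READ, R_P)] * 8)
read_plus_write = shared_sl_voltage([(V_READ, R_P)] * 8 + [(V_WRITE, R_P)] * 8)
high_res = shared_sl_voltage([(V_WRITE, R_AP)] * 8)

for label, v_s in [
    ("8 writes (one port)", one_port),
    ("16 writes (two ports)", two_ports),
    ("8 reads alone", read_only),
    ("8 reads + 8 writes", read_plus_write),
    ("8 writes, MTJs in high-R state", high_res),
]:
    print(f"{label:32s} V_S = {v_s * 1e3:6.1f} mV")
```

Even in this crude model, doubling the number of accessed cells raises $V_S$, a concurrent write pulls the $V_S$ seen by a read far from its read-only value, and high-resistance cells disturb $V_S$ less.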

Here, we use $n_{\{A/B,R/W1/W0,H/L\}}$ to represent the number of cells under a certain access pattern. The subscript A/B indicates Port-A or Port-B access. R/W1/W0 describes the operation mode: read, write-1, or write-0. H/L represents the high or low resistance state of the MTJ. For example, $n_{\{B,W1,L\}}$ is the number of cells that have low MTJ resistance and are conducting write-1 operations through Port-B.

TABLE II WORST-CASE ANALYSIS OF THE 2RW CELL. TRANSISTOR WIDTH=585nm; $V_{READ} = 0.14V$; $V_{BLN} = 0.50V$.

|                   | Ideal Current <sup>1</sup> | Dual-Port: Worst-Case Pattern <sup>2</sup>                   | Dual-Port: Worst Current | Dual-Port: Worst V<sub>S</sub> | Single-Port: Worst-Case Pattern <sup>2</sup> | Single-Port: Worst Current | Single-Port: Worst V<sub>S</sub> |
|-------------------|----------------------------|--------------------------------------------------------------|--------------------------|--------------------------------|----------------------------------------------|----------------------------|----------------------------------|
| Write-1           | $142\mu A$                 | $n_{\{A,W1,L\}}=8$, $n_{\{B,W1,L\}}=8$                       | $130\mu A$               | 57.5mV                         | $n_{\{A,W1,L\}}=8$                           | $138\mu A$                 | 30.4mV                           |
| Write-0           | $112\mu A$                 | $n_{\{A,W0,H\}}=1$, $n_{\{A,W0,L\}}=7$, $n_{\{B,W0,L\}}=8$   | $96\mu A$                | -71.4mV                        | $n_{\{A,W0,H\}}=1$, $n_{\{A,W0,L\}}=7$       | $104\mu A$                 | -37.4mV                          |
| Read-1            | $29.0\mu A$                | $n_{\{A,R,H\}}=8$, $n_{\{B,W0,L\}}=8$                        | $35.7\mu A$              | -32.1mV                        | $n_{\{A,R,H\}}=8$                            | $27.8\mu A$                | 6.1mV                            |
| Read-0            | $47.6\mu A$                | $n_{\{A,R,L\}}=8$, $n_{\{B,W1,L\}}=8$                        | $34.9\mu A$              | 37.7mV                         | $n_{\{A,R,L\}}=8$                            | $44.3\mu A$                | 9.8mV                            |
| $I_{W1} - I_{R0}$ | $94.4\mu A$                | $n_{\{A,R,L\}}=1$, $n_{\{A,R,H\}}=7$, $n_{\{B,W0,L\}}=8$     | $83.7\mu A$              | -31.5mV                        | $n_{\{A,R,L\}}=1$, $n_{\{A,R,H\}}=7$         | $96.6\mu A$                | 6.6mV                            |

<sup>1</sup> V<sub>S</sub> = 0V for the ideal case.

<sup>2</sup> Unlisted n indicates the corresponding value is 0.

Without loss of generality, we studied the current through a 2RW STT-RAM cell when it is accessed through Port-A. Table II summarizes its worst-case current and the corresponding access patterns in read and write operations. In the experiment, we assume an SL is shared by 32 columns and each port accesses only 8 cells by using column selection, which is a very common arrangement for supporting set-associative caches. The R<sub>S</sub> of this setup is set to $27.5\Omega$ according to [10]. The worst-case scenario happens when all 16 cells being accessed fall on the same row. For comparison, the currents under the ideal condition, when V<sub>S</sub> is exactly 0V, are also presented.

The simulation results show that the write-1 and write-0 currents drop from the $142\mu A$ and $112\mu A$ projected under the ideal condition to $130\mu A$ and $96\mu A$ in the worst-case scenario, respectively. This write current degradation means the design cannot meet the target of a 10ns switching time. The situation for read operations is even worse: ideally the read-0 current is $18.6\mu A$ more than the read-1 current. However, in the worst-case combination, the read-0 current becomes smaller than the read-1 current, which can result in read decision errors. One possible way to solve this is to increase the read voltage (Section IV-C). The margin between the read-0 current and the $P \to AP$ switching current $(I_{\rm W1}-I_{\rm R0})$ shrinks from $94.4\mu A$ to $83.7\mu A$ in the worst case, which indicates a higher possibility of read disturbance. Note that the worst case for $I_{W1} - I_{R0}$ occurs when $I_{R0}$ reaches its highest value.

In Table II, we also show the results when Port-B is disabled, which is equivalent to single-port access. The results show that the second set of access ports causes an $8\mu A$ degradation in both write-1 and write-0 currents in the worst-case condition. The difference between read-0 and read-1 currents drops by $17.3\mu$A due to the interaction between read and write in dual-port accesses.
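The figures quoted in the last two paragraphs can be recomputed directly from the Table II currents. The sketch below does only that arithmetic; the dictionary layout and function name are illustrative, and, as in the text, the worst-case read-0 and read-1 currents are compared even though they arise under different Port-B access patterns.

```python
# Port-A currents from Table II, in microamps.
TABLE_II = {
    "ideal":       {"write1": 142.0, "write0": 112.0, "read1": 29.0, "read0": 47.6},
    "dual_port":   {"write1": 130.0, "write0": 96.0,  "read1": 35.7, "read0": 34.9},
    "single_port": {"write1": 138.0, "write0": 104.0, "read1": 27.8, "read0": 44.3},
}


def read_margin(case):
    """Sensing margin: read-0 current minus read-1 current, in microamps."""
    row = TABLE_II[case]
    return round(row["read0"] - row["read1"], 1)


margins = {case: read_margin(case) for case in TABLE_II}

# Extra degradation caused by enabling the second port.
margin_loss = round(margins["single_port"] - margins["dual_port"], 1)
write_loss = {
    op: round(TABLE_II["single_port"][op] - TABLE_II["dual_port"][op], 1)
    for op in ("write1", "write0")
}

print("read margins (uA):", margins)
print("read margin lost to the second port (uA):", margin_loss)
print("write current lost to the second port (uA):", write_loss)
```

A negative dual-port read margin is the sign inversion described above: the worst-case read-0 current falls below the worst-case read-1 current.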

## C. The Cell Configuration and The Operating Setup

The previous section demonstrated that the variation of $V_{\rm S}$ is exacerbated by dual-port access. This must be considered when determining the access transistor size in the cell design and when setting up the operating conditions, *i.e.*, the read and write voltages.

For the given MTJ device in Table I, the write-1 operation is critical in transistor size selection. To compensate for the current degradation under the worst-case access pattern, we have to further increase the transistor width. The simulation result in Figure 7(a) shows that to maintain the write-1 current at $142\mu A$ in the worst-case condition, the access transistor width grows to 715nm.

The negative voltage (V<sub>BLN</sub>) for write-0 operations also needs to be adjusted to compensate for the impact of V<sub>S</sub> variation. With the access transistor width of 715nm, Figure 7(b) shows that $|V_{BLN}|$ should increase to 0.58V to obtain the $112\mu$A write-0 current in the worst-case condition. Figure 7(c) demonstrates the relation between the read voltage (V<sub>Read</sub>) and the current difference between read-1 and read-0 operations. A negative current difference indicates that read-0 produces a smaller current than read-1, which inevitably results in read decision errors. Increasing V<sub>Read</sub> can significantly improve the read current difference. On the other hand, the higher read-0 current increases the chance of read disturbance.

## D. Layout Design

Figure 8(a) shows the layout of the proposed 2RW STT-RAM cell, where  $\lambda$  is half of the feature size (F). Based on the analysis in the previous section, the access transistor width is 715nm (11F). The two access transistors in one cell can share the diffusion area, which is connected to the MTJ.





Fig. 8. (a) 2RW layout. (b) The directly tiled layout. (c) The optimized layout with shared diffusion.

Fig. 9. Access pattern of 1R1W.

Figure 8(b) shows that when the cells are directly tiled along a column, the diffusion area of two adjacent cells cannot be shared. Because WLA and WLB are driven by two separate decoders, WLB<0> and WLA<1> could be turned on at the same time. In that situation, sharing the diffusion area can result in current flowing through the two MTJs, which is not allowable. In contrast, we can safely share the diffusion area by vertically flipping the bottom cell as shown in Figure 8(c). The shared diffusion is then controlled by WLB<0> and WLB<1>, which are driven by the same decoder and are never turned on simultaneously. As a result, the height of a memory cell is greatly reduced from $25\lambda$ to $16\lambda$.
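The sharing rule can be stated as a simple check: a diffusion region between two vertically adjacent cells may be shared only if the two word lines on either side of it belong to the same port, and therefore to the same decoder. The sketch below applies that rule to a column of cells. The tuple representation and function names are hypothetical, and extending the flip to every other cell in a longer column (so that WLA pairs also meet) is an assumption that generalizes the two-cell case of Figure 8(c).

```python
def column_wordlines(n_cells, flip_alternate):
    """Return the top-to-bottom word-line order of a column as (port, row) pairs.

    An unflipped cell places WLA above WLB. With flip_alternate=True,
    every odd row is mirrored vertically so WLB sits above WLA.
    """
    order = []
    for row in range(n_cells):
        ports = ("B", "A") if (flip_alternate and row % 2 == 1) else ("A", "B")
        order.extend((port, row) for port in ports)
    return order


def unsafe_boundaries(order):
    """List cell boundaries whose two word lines come from different decoders.

    Boundaries between adjacent cells fall between indices 1|2, 3|4, ...
    Word lines of different ports can be on simultaneously, so a shared
    diffusion there could conduct through two MTJs at once.
    """
    return [
        (order[i], order[i + 1])
        for i in range(1, len(order) - 1, 2)
        if order[i][0] != order[i + 1][0]
    ]


tiled = column_wordlines(4, flip_alternate=False)
flipped = column_wordlines(4, flip_alternate=True)
print("directly tiled, unsafe boundaries:", unsafe_boundaries(tiled))
print("alternately flipped, unsafe boundaries:", unsafe_boundaries(flipped))
```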

The area of the optimized cell in Figure 8(c) is $100F^2$, which is about 42% of the area of a 2RW SRAM design $(233F^2)$ reported in [9]. Compared to the single-port 1T-1J cell that obtains the same write performance with our MTJ parameters $(72F^2)$ [16], the area overhead of introducing an additional port is about 39%.
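These percentages follow directly from the quoted cell areas:

$$
\frac{100F^2}{233F^2} \approx 0.429, \qquad \frac{100F^2 - 72F^2}{72F^2} = \frac{28}{72} \approx 0.39 .
$$

As a side observation that assumes a rectangular cell outline, a height of $16\lambda = 8F$ together with an area of $100F^2$ implies a cell width of $100F^2 / 8F = 12.5F$, which leaves about $1.5F$ beyond the $11F$ transistor width.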

## V. STT-RAM DESIGN WITH 1-READ/1-WRITE PORT

## A. Design Concept

Some dual-port SRAM designs restrict the port functionality [2][3]: one port supports read operations only and the other supports writes only. Such designs with 1-read/1-write port (1R1W) can alleviate the degradation of static noise margin compared to the 2RW design.

Similarly, the 1R1W design concept can be applied to the dual-port STT-RAM to reduce the impact of $V_S$ variation. Figure 9 illustrates the access pattern when the port functionality is constrained to 1R1W. Unlike the 2RW STT-RAM, where writes through both ports aggravate the $V_S$ variation, a write in the 1R1W design can be accompanied only by a read through the other port. Since the read voltage is much lower than the write voltage, the $V_S$ variation is smaller than in the 2RW case. Moreover, the positive $V_{READ}$ tends to pull $V_S$ in the positive direction, which actually improves the write-0 current strength. Therefore, the worst-case access patterns for the write operations in the 1R1W STT-RAM are redefined as shown in Table III. The patterns for read operations remain the same as for the 2RW design in Table II.

## B. Transistor Sizing and Operating Voltage

Benefiting from the improved worst-case write current, the 1R1W design can shrink the transistor sizes while achieving the same write performance as the 2RW design. For example, assuming the two access transistors have the same dimensions and setting $V_{READ}$ to 0.24V, our simulation shows that the transistor width can be reduced to 670nm.

Moreover, if different sizes are used for the read access transistor ($W_R$) and the write access transistor ($W_W$), the design can be reduced further. On one hand, the increased resistance induced by a smaller $W_R$ helps reduce the $V_S$ variation, which in turn relaxes the sizing requirement for $W_W$. On the other hand, a smaller read access transistor degrades the read current difference $I_{Rdiff}$, which could lead to more read errors. To maintain $I_{Rdiff}$ when decreasing transistor sizes, we can increase $V_{READ}$, which however exacerbates the $V_S$ variation.

For a given $W_R$, we propose the following design flow to obtain the minimum $W_W$ and hence the most area-efficient configuration:

Step 0: Randomly choose an initial $V_{READ}$.

Step 1: With the given $W_R$ and $V_{READ}$, sweep $W_W$ until it meets the write-1 current target under the worst-case pattern.

Step 2: Find the $V_{BLN}$ that achieves the write-0 current target under the worst-case pattern, with $W_R$, $V_{READ}$ and $W_W$ fixed.

Step 3: Find the $V_{READ}$ that achieves the $I_{Rdiff}$ target for the given $W_W$, $V_{BLN}$ and $W_R$.

Step 4: Repeat Step 1 to Step 3 until $W_W$ and $V_{READ}$ converge.
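The control structure of Steps 0 to 4 can be sketched as a fixed-point iteration. In the sketch below, the three `toy_*` functions are hypothetical stand-ins for the worst-case circuit simulations: their linear forms and coefficients are arbitrary choices made only so the loop runs and converges, and they do not reproduce the results of this work. Sweep ranges, step sizes, and all function names are likewise assumptions; voltages are kept as integer millivolts to avoid floating-point drift in the sweeps.

```python
def sweep_up(start, stop, step, meets_target):
    """Return the smallest swept value that satisfies the predicate."""
    for value in range(start, stop, step):
        if meets_target(value):
            return value
    raise ValueError("target not reachable within the sweep range")


def size_1r1w(w_r_nm, i_w1, i_w0, i_rdiff, targets, v_read_mv=200, max_iter=50):
    """Iterate Steps 1-3 until W_W and V_READ stop changing (Step 4).

    i_w1, i_w0 and i_rdiff are callables returning worst-case currents in uA.
    v_read_mv is the Step 0 starting guess for V_READ, in millivolts.
    """
    w_w_nm = None
    for _ in range(max_iter):
        # Step 1: smallest W_W meeting the write-1 target.
        new_w_w = sweep_up(
            100, 3000, 5, lambda w: i_w1(w, w_r_nm, v_read_mv) >= targets["w1"])
        # Step 2: smallest |V_BLN| meeting the write-0 target.
        v_bln_mv = sweep_up(
            0, 1200, 10,
            lambda v: i_w0(new_w_w, w_r_nm, v_read_mv, v) >= targets["w0"])
        # Step 3: smallest V_READ meeting the read-difference target.
        new_v_read = sweep_up(
            10, 1200, 10,
            lambda v: i_rdiff(new_w_w, w_r_nm, v, v_bln_mv) >= targets["rdiff"])
        # Step 4: stop once neither W_W nor V_READ moved in this pass.
        if new_w_w == w_w_nm and new_v_read == v_read_mv:
            return {"w_w_nm": new_w_w, "v_read_mv": new_v_read,
                    "v_bln_mv": v_bln_mv}
        w_w_nm, v_read_mv = new_w_w, new_v_read
    raise RuntimeError("design flow did not converge")


# HYPOTHETICAL current models (uA); coefficients are arbitrary placeholders.
def toy_i_w1(w_w, w_r, v_read_mv):
    return 0.2 * w_w - 0.1 * v_read_mv          # concurrent read hurts write-1


def toy_i_w0(w_w, w_r, v_read_mv, v_bln_mv):
    return 0.15 * v_bln_mv + 0.05 * w_w


def toy_i_rdiff(w_w, w_r, v_read_mv, v_bln_mv):
    return 8e-5 * v_read_mv * w_r - 0.002 * w_w  # larger W_W disturbs V_S more


TARGETS = {"w1": 142.0, "w0": 112.0, "rdiff": 10.0}  # uA, from the text
result = size_1r1w(540, toy_i_w1, toy_i_w0, toy_i_rdiff, TARGETS)
print(result)
```

Repeating the call for several values of `w_r_nm` and keeping the configuration with the smallest `w_w_nm` would mirror the search over $W_R$; with real simulations in place of the toy models, convergence is not guaranteed and `max_iter` acts as a safeguard.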

Figure 10 shows the minimum $W_W$ and the corresponding $V_{READ}$ under different $W_R$. Here, we set the target $I_{Rdiff}$ to $10\mu$A and the write time to 10ns for both write-0 and write-1. The result shows that reducing $W_R$ from 660nm to 540nm relaxes the sizing requirement of $W_W$ due to the increased equivalent resistance of the read access transistor. However, $W_W$ starts to increase when $W_R$ decreases further, because the higher $V_{READ}$ becomes the dominating factor. As the width of the cell layout is determined by $W_W$, the smallest 1R1W STT-RAM cell is obtained when $W_W = 660$nm and $W_R = 540$nm. The corresponding $V_{READ}$ and $V_{BLN}$ are 0.27V and -0.53V, respectively.

TABLE III WORST-CASE ACCESS PATTERNS FOR WRITE OPERATIONS IN 1R1W.

|         | Worst-Case Pattern <sup>1</sup>                       |
|---------|-------------------------------------------------------|
| Write-1 | $n_{\{A,W1,L\}}=8$, $n_{\{B,R,L\}}=8$                 |
| Write-0 | $n_{\{A,W0,H\}}=1$, $n_{\{A,W0,L\}}=7$, Port-B idle   |

<sup>1</sup> Assuming Port-A is write only and Port-B is read only.



Fig. 10. The minimum  $W_{\rm W}$  and  $V_{\rm READ}$  under different  $W_{\rm R}.$ 

## C. Comparison of 2RW and 1R1W STT-RAM Designs

We compared the proposed 2RW and 1R1W STT-RAM designs by following the worst-case design methodology, and the results are summarized in Table IV. Thanks to the smaller transistors, the area of a 1R1W STT-RAM cell is only 92.3% of that of the 2RW design. The amplitude of V<sub>BLN</sub> is smaller too. The smaller transistors and $|V_{BLN}|$ indicate that the 1R1W design draws less write current in non-worst-case conditions and hence consumes less write energy than the 2RW design. Interestingly, although $V_{READ}$ is higher for 1R1W, the worst-case difference between the read-0 current and the $P \rightarrow AP$ switching current $(I_{W1} - I_{R0})$ is still improved, which indicates a lower possibility of read disturbance. This is because the smaller $W_W$ and $|V_{BLN}|$ reduce the $V_S$ drift in the negative direction, which is the main cause of the excessive I<sub>R0</sub>. In summary, the 1R1W cell achieves smaller area, lower write energy and a lower possibility of read disturbance, at the cost of restricted port functionality.
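The headline comparisons can be recomputed from the cell sizes and margins reported in Table IV and the single-port reference area of $72F^2$ from Section IV-D. The sketch below only performs that arithmetic; the variable and function names are illustrative.

```python
SINGLE_PORT_AREA_F2 = 72.0   # 1T-1J cell with the same write performance [16]

# Cell area (in F^2) and worst-case I_W1 - I_R0 margin (in uA) from Table IV.
designs = {
    "2RW":  {"area_f2": 100.0, "margin_ua": 77.8},
    "1R1W": {"area_f2": 92.3,  "margin_ua": 82.4},
}


def overhead_vs_single_port(area_f2):
    """Fractional area overhead relative to the single-port 1T-1J cell."""
    return area_f2 / SINGLE_PORT_AREA_F2 - 1.0


area_ratio = designs["1R1W"]["area_f2"] / designs["2RW"]["area_f2"]
margin_gain_ua = designs["1R1W"]["margin_ua"] - designs["2RW"]["margin_ua"]

for name, d in designs.items():
    print(f"{name}: overhead over single-port = "
          f"{overhead_vs_single_port(d['area_f2']):.0%}")
print(f"1R1W / 2RW cell area = {area_ratio:.3f}")
print(f"read-disturbance margin gained by 1R1W = {margin_gain_ua:.1f} uA")
```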

## VI. THE RELATED WORKS

Dual-port SRAM design has been widely used to satisfy the increasing demand for memory bandwidth. The 2RW SRAM is a common style [1][9]. The major design challenge is the degradation of static noise margin caused by the additional WL during "common-row-different-column" access. One common solution is to isolate the read port and the write port, *i.e.*, 1R1W, as presented in [2][3].

Nearly all previous works on STT-RAM focused on single-port designs, such as the most popular 1T-1J STT-RAM [12][7]. A cell structure with two transistors (2T-1J) has been presented by Chung [8]. However, its main motivation was to enhance the writability and array density. The two transistors are controlled by the same WL, and hence the design still has only one port.

TABLE IV COMPARISON BETWEEN 2RW AND 1R1W

|                                    | 2RW        | 1R1W                          |
|------------------------------------|------------|-------------------------------|
| Worst-case write time              | 10ns       | 10ns                          |
| Worst-case I<sub>Rdiff</sub>       | $10\mu A$  | $10\mu A$                     |
| Transistor Width                   | both 715nm | $W_W = 660nm$, $W_R = 540nm$  |
| Cell Size                          | $100F^{2}$ | $92.3F^{2}$                   |
| Area overhead over Single-Port STT | 39%        | 28%                           |
| V<sub>READ</sub>                   | 0.24V      | 0.27V                         |
| V<sub>BLN</sub>                    | -0.58V     | -0.53V                        |
| I<sub>W1</sub> - I<sub>R0</sub>    | 77.8µA     | 82.4µA                        |

#### VII. CONCLUSION

In this work, we propose the first dual-port STT-RAM design, which can provide higher data bandwidth. We leverage the shared SL design to simplify the cell structure and reduce the memory cell area. Two types of dual-port STT-RAM design, 2RW and 1R1W, are presented. Furthermore, the related design issues, including reliability, cell configuration, operating setup, and layout techniques, have been considered and discussed.

#### VIII. ACKNOWLEDGEMENT

This material is based upon work supported by the National Science Foundation under Grant No. CNS-1116684, CNS-1116171, and CNS-1149654. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

#### REFERENCES

- [1] K. Nii et al., "A 90nm dual-port SRAM with 2.04µm<sup>2</sup> 8T-thin cell using dynamically-controlled column bias scheme," in *IEEE International Solid-State Circuits Conference (ISSCC)*, vol. 1, 2004, pp. 508–543.
- [2] T. Suzuki et al., "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 43, no. 9, pp. 2109–2119, 2008.
- [3] S. Ishikura *et al.*, "A 45 nm 2-port 8T-SRAM using hierarchical replica bitline technique with immunity from simultaneous r/w access issues," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 43, no. 4, pp. 938–945, 2008.
- [4] "The International Technology Roadmap for Semiconductors," http://www.itrs.net, 2010.
- [5] W. Xu et al., "Design of last-level on-chip cache using Spin-Torque Transfer RAM (STT-RAM)," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)*, vol. 19, no. 3, pp. 483–493, 2011.
- [6] X. Dong et al., "Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement," in *ACM/IEEE Design Automation Conference (DAC)*, 2008, pp. 554–559.
- [7] T. Kawahara *et al.*, "2 Mb SPRAM (Spin-Transfer Torque RAM) with bit-by-bit bi-directional current write and parallelizing-direction current read," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 43, no. 1, pp. 109–120, 2008.
- [8] S. Chung et al., "Fully integrated 54nm STT-RAM with the smallest bit cell dimension for high density memory application," in *IEEE International Electron Devices Meeting (IEDM)*, 2010, pp. 12.7.1–12.7.4.
- [9] K. Nii et al., "Synchronous ultra-high-density 2RW dual-port 8T-SRAM with circumvention of simultaneous common-row-access," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 44, no. 3, pp. 977–986, 2009.
- [10] B. Zhao et al., "Architecting a common-source-line array for bipolar non-volatile memory devices," in *Design, Automation Test in Europe Conference Exhibition (DATE)*, 2012, pp. 1451–1454.
- [11] D. Lee *et al.*, "High-performance low-energy STT MRAM based on balanced write scheme," in *ACM/IEEE international symposium on Low power electronics and design (ISLPED)*, 2012, pp. 9–14.
- [12] M. Hosomi et al., "A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM," in *International Electron Devices Meeting (IEDM)*, 2005, pp. 459–462.
- [13] X. Wang *et al.*, "Thermal fluctuation effects on spin torque induced switching: Mean and variations," *Journal of Applied Physics*, vol. 103, no. 3, p. 034507, 2008.
- [14] "SMIC 65nm logic low leakage & RF Cadence PDK," http://service.smics.com, 2011.
- [15] F. Ishihara et al., "Level conversion for dual-supply systems," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI)*, vol. 12, no. 2, pp. 185–195, 2004.
- [16] S. Gupta *et al.*, "Layout-aware optimization of STT MRAMS," in *Design, Automation Test in Europe Conference Exhibition (DATE)*, 2012, pp. 1455–1458.