# Optimized Multi-Memristor Model based Low Energy and Resilient Current-Mode Multiplier Design

Shengqi Yu<sup>†</sup>, Rishad Shafik<sup>†</sup>, Thanasin Bunnam<sup>†</sup>, Kaiyun Chen<sup>†</sup>, Alex Yakovlev<sup>†</sup> <sup>†</sup>Microsystems Research Group, Newcastle University, Newcastle upon Tyne, UK E-mail: s.yu10@ncl.ac.uk, Rishad.Shafik@ncl.ac.uk

*Abstract*-Multipliers are central to modern compute-intensive applications, such as signal processing and artificial intelligence (AI). However, the complex logic chain in conventional multipliers, particularly due to cascaded carry propagation circuits, contributes to high energy and performance costs. This paper proposes a novel current-mode multiplier design that reduces the carry propagation chain and improves the current amplification. Fundamental to this design is a one-transistor multi-memristor (1TxM) cell architecture. In each cell, the transistor is switched ON/OFF to select the cell, while the high/low resistive states of the memristors determine the cell output current when it is selected. The memristor states and the biasing configuration of each memristor are optimized through a new memristor model. The number of memristors implementing this model in each cell is determined by the cell significance to achieve the required amplification. Consequently, the design avoids the need for a current mirror in each current path, while also ensuring high resilience to transitional bias voltages. Parallel cell currents are then directed to a common current accumulation path to generate the multiplier output without any carry propagation chain. We carried out a wide range of experiments in the Cadence Virtuoso analogue design environment to validate the functional and parametric properties of our multiplier design. The results show that the proposed multiplier reduces latency by up to 85% and energy by up to 99% when compared with recently proposed approaches.

*Index Terms*—Multiplier, in-memory multiplication, memristive multiplier, 1TxM, energy efficiency, current-mode design.

### I. INTRODUCTION

Arithmetic-heavy applications, such as signal processing and artificial intelligence (AI), are fundamental enablers of the fourth industrial revolution (Industry 4.0) [1]. Multiplication is a crucial component of these applications, with significant impact on performance and energy efficiency, because the underlying circuits require complex partial product generation as well as a carry propagation logic chain [2]. As such, reducing the energy consumption of multipliers has remained a persistent design challenge.

Over the years, researchers have investigated methods of optimizing multiplier performance and energy efficiency. To reduce circuit complexity, a number of approximate and speculative circuit design methods have been proposed. Their key aim is to prune the carry chains to a minimum while accepting minor accuracy losses [3], which significantly improves performance.

However, the energy proportionality gained through digital logic pruning in these approaches is limited. This is because they are primarily based on Landauer's logic boundaries for the operating voltages (i.e. the digital 0 and 1 logic biasing voltages), which are paired with a set of operating frequencies [4]. Under the performance demands of modern compute-intensive applications, the number of these pairs is further cut down to satisfy the conflicting tradeoffs between energy and performance. Moreover, as inaccuracies and errors accumulate in cascaded computational workloads, mitigation strategies are required that lengthen the logic chains, making energy-efficient multiplication challenging [5].

To provide better elasticity as well as energy efficiency, current-mode arithmetic design methods have recently been proposed [6]. In these methods the operands are expressed as currents (within a dynamic range from nA to mA). As current-based arithmetic, such as addition, subtraction, multiplication and division, is inherently simple, the circuits are less complex [7]. For instance, addition is equivalent to directing a current path into a node, while subtraction is equivalent to taking a current path away from a node. Additionally, since the circuits can transition their internal states much faster without requiring any biasing boundaries or voltage/frequency coupling, they exhibit a high slew rate [8].
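As a minimal, idealised sketch of the KCL view of addition and subtraction described above, the snippet below encodes operands as multiples of a hypothetical unit current `I_UNIT` and treats a node as summing its incoming branches and subtracting the branches diverted away from it. `I_UNIT`, `node_sum` and the encode/decode helpers are illustrative names, and real circuits add non-idealities (finite output resistance, leakage) that are ignored here.

```python
I_UNIT = 1e-6  # hypothetical unit current (1 uA) that represents the value 1

def to_current(value):
    """Encode an integer operand as a current."""
    return value * I_UNIT

def node_sum(inflows, outflows=()):
    """KCL at one node: the current delivered onward is sum(in) - sum(diverted out)."""
    return sum(inflows) - sum(outflows)

def to_value(current):
    """Decode a current back to the nearest integer multiple of I_UNIT."""
    return round(current / I_UNIT)

total = node_sum([to_current(3), to_current(5)], [to_current(2)])
print(to_value(total))  # 6, i.e. 3 + 5 - 2
```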

Existing memristor-transistor cells are founded on a transistor in series with a memristor (1T1M) structure. The output current from each cell requires a separate current mirror (i.e. an amplifier) to reflect the significance of the current path. In a binary crossbar architecture, the amplification factors typically increase by powers of 2 from the least significant to the most significant binary lines. As the amplification factor increases, the current mirrors skew the transistor dimensions rather unconventionally, often requiring transistors with a disproportionately high width-to-length ratio. Such unconventional transistor sizing is a significant challenge for design validation as well as manufacturing [9]. Moreover, it can contribute to unexpected parasitic behavior, which can pose further challenges in load resistance matching [10], [11].
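To make the sizing problem concrete, the sketch below assumes an ideal saturated current mirror, whose gain equals the ratio of output to reference width at equal channel length, and borrows the 500nm/60nm NMOS size from Section IV as a hypothetical reference device. It lists the output-transistor width needed on each line of a 7-line (4-bit) binary crossbar; `mirror_output_width` is an illustrative helper, not part of any design flow.

```python
W_REF = 500e-9  # reference transistor width (hypothetical, borrowed from Sec. IV)
L_REF = 60e-9   # channel length, kept equal for reference and output devices

def mirror_output_width(gain, w_ref=W_REF):
    """Ideal saturated mirror with equal L: I_out / I_ref = W_out / W_ref."""
    return gain * w_ref

for line in range(1, 8):  # CL1 (LSB) to CL7 (MSB)
    gain = 2 ** (line - 1)
    w_out = mirror_output_width(gain)
    print(f"line {line}: gain {gain:>2}x, W = {w_out * 1e6:5.1f} um, W/L = {w_out / L_REF:5.0f}")
```

Under these assumptions the most significant line needs a 32µm wide output transistor (W/L of about 533), which illustrates the disproportionate sizing discussed above.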

In this paper, we present a novel multiplier design with a one-transistor multi-memristor (1TxM) crossbar architecture. Each cell consists of a transistor in series with multiple memristors connected in parallel, the number of memristors being a power of 2. The cell is turned ON or OFF by suitable transistor biasing. The parallel memristors, which are designed using a fast memristor model with a high resistance-state margin, are switched to the high-resistance state (HRS) or low-resistance state (LRS) by applying the same DC bias voltages, depending on the cell operand value. Each cell current is amplified by the parallel memristors within the cell according to the significance of the path. When the cells in a column are turned ON, their currents add according to Kirchhoff's current law (KCL), forming a sum-of-products term without any carry propagation. By replacing the conventional current mirror with the multi-memristor cell structure, the circuit benefits from substantially improved performance, energy efficiency and resilience. Specifically, this work makes the following *contributions*:

- an optimized current-mode one-transistor multi-memristor (1TxM) crossbar cell structure with in-situ current amplification;
- a mixed-signal carry-free multiplier architecture using the principle of current-mode operation; and
- low-level circuit design, validation and analysis demonstrating the energy efficiency and latency improvement.

The rest of this paper is organized as follows. Section II analyzes the multi-memristor model. Section III describes the proposed multiplier design using a crossbar architecture. Section IV discusses the experimental results with comparative analysis between multiplier architectures. Finally, Section V concludes the paper.

## II. PROPOSED MULTI-MEMRISTOR MODEL

In a current-mode multiplier implemented using a memristor-transistor crossbar architecture, the current level in each cell defines its binary value and significance. The current within each cell is programmed by the combination of transistor and memristor bias voltages. Each combination generates a unique circuit resistance, and thereby a current output, whose logic value equivalents are shown in Table I.

TABLE I CURRENT-MODE LOGIC DEFINITIONS.

| Cell bias | Cell memristance | Output logic value |
|-----------|------------------|--------------------|
| high      | high             | 0                  |
| low       | low              | 0                  |
| low       | high             | 0                  |
| high      | low              | 1                  |

It is important that the output current levels are distinguishable between the logic states. This requires the difference between the memristor ON resistance ($R_{MON}$) and OFF resistance ($R_{MOFF}$) to be sufficiently large (i.e. $R_{MOFF} \gg R_{MON}$). Existing memristor models [12] do not consider this carefully, particularly for a multi-memristor structure such as the one used in our work.

In a multiplication cell (MC), the transistor paired with the memristors also contributes parasitic resistance and thereby affects the output current. This is an important consideration in deriving a suitable margin between $R_{MON}$ and $R_{MOFF}$.

Existing memristor-transistor based models do not take this balance into consideration to provide resilient multi-memristor based logic operation, which motivates our work. In an MC, the current levels for logic 0 and 1 are predetermined. The choice of current levels limits the largest amplification possible within the cell, because a large amplification of the logic 0 current can potentially exceed the logic 1 current level. As such, a key challenge is to retain the current margin between the logic 0 and logic 1 states within each MC. Moreover, as the required amplification ratio increases by powers of 2 for higher-significance MCs, the overall circuit must also maintain a higher margin between logic 0 and logic 1 currents. For example, in a 4-bit multiplier circuit this margin must be at least $2^6$, as there are 7 parallel MC columns in total.
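The column count and margin quoted above follow from simple bookkeeping. Assuming, as in Fig. 1, that an n-bit by n-bit crossbar has $2n-1$ columns and that each cell holds as many memristors as the significance of its column, the hypothetical helpers below compute the number of columns, the largest significance, and the total memristor count.

```python
def crossbar_layout(n_bits):
    """Column count and per-column significance (CL1 first) of an n-bit by n-bit crossbar."""
    columns = 2 * n_bits - 1
    significance = [2 ** c for c in range(columns)]
    return columns, significance

def memristor_count(n_bits):
    """Total memristors if cell (row i, bit j) sits in column i + j and holds 2**(i + j) devices."""
    return sum(2 ** (i + j) for i in range(n_bits) for j in range(n_bits))

columns, significance = crossbar_layout(4)
print(columns, significance[-1], memristor_count(4))  # 7 64 225
```

The total grows as $(2^n - 1)^2$, since the double sum factorises into $(\sum_i 2^i)^2$.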

In an MC, different amplification ratios ($r$) are obtained by adjusting the cell memristance: an $r\times$ amplification requires $r$ parallel memristors. As $r$ increases, the cell resistance $R_C$ decreases as $R_C = \frac{R_M}{r} + R_T$, where $R_M$ is the resistance of a single memristor and $R_T$ that of the transistor. For resilient and accurate cell operation, the transistor resistance and memristance should satisfy:

$$R_{TOFF} \gg \frac{R_{MON}}{r} \gg R_{TON} \tag{1}$$

$$\frac{R_{MOFF}}{r} > R_{TOFF} \gg R_{TON} \tag{2}$$

Therefore, $r$ within each MC is approximately given by $r \approx S \frac{R_{MOFF}}{R_{MON}}$, where $S$ is the significance of the cell. For designing a resilient multiplier circuit, two conflicting aspects are crucial. Firstly, the memristor ON/OFF ratio must be high enough to ensure that the sum of logic 0 currents from all cells is still significantly smaller than the logic 1 current. Secondly, for higher-significance cells with a high memristor ON/OFF ratio, $r$ will be quite large. This requires biasing adjustments, i.e. the minimum biasing latency must be longer to ensure logic stability.
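Eqs. (1) and (2) can be turned into explicit bounds on the transistor resistances. The sketch below is a derivation under stated assumptions rather than the authors' procedure: it takes $r$ up to the largest column significance (r equal to the significance, as in Fig. 1), uses the Table II values for $R_{MON}$ and $R_{MOFF}$, and replaces each $\gg$ with a hypothetical factor `k` (default 10), since the text does not quantify it. `transistor_window` and `cell_resistance` are illustrative helpers.

```python
R_MON = 150e3    # memristor ON resistance in ohms (Table II)
R_MOFF = 150e6   # memristor OFF resistance in ohms (Table II)

def cell_resistance(r_m, r, r_t):
    """R_C = R_M / r + R_T: r parallel memristors in series with the transistor."""
    return r_m / r + r_t

def transistor_window(r_max, k=10.0):
    """Bounds implied by Eqs. (1)-(2) for every r in 1..r_max.

    k is a hypothetical factor standing in for '>>'.
    Returns (min R_TOFF, max R_TOFF, max R_TON) in ohms.
    """
    r_toff_min = k * R_MON           # Eq. (1) at r = 1:     R_TOFF >> R_MON
    r_toff_max = R_MOFF / r_max      # Eq. (2) at r = r_max: R_MOFF / r > R_TOFF
    r_ton_max = R_MON / (k * r_max)  # Eq. (1) at r = r_max: R_MON / r >> R_TON
    return r_toff_min, r_toff_max, r_ton_max

for n_bits in (4, 5):
    r_max = 2 ** (2 * n_bits - 2)
    lo, hi, ton = transistor_window(r_max)
    status = "feasible" if lo < hi else "empty"
    rc_on = cell_resistance(R_MON, r_max, ton)  # ON resistance of the most significant cell
    print(f"{n_bits}-bit, r_max={r_max}: R_TOFF in [{lo / 1e6:.2f}, {hi / 1e6:.2f}) MOhm "
          f"({status}), R_TON <= {ton:.1f} Ohm, R_C(ON) = {rc_on:.0f} Ohm")
```

Under these assumptions the window is non-empty for the 4-bit case ($r_{max} = 64$) but closes once $r_{max}$ exceeds $R_{MOFF}/(k R_{MON}) = 100$, which mirrors the conflicting requirements listed above.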

#### III. CARRY-FREE CURRENT-MODE MULTIPLIER DESIGN

Our multiplier design, which is based on the model in the previous section, consists of a number of current-mode 1TxM MCs, which we explain below. We then elaborate on the organization of the MCs in the multiplier design.

Fig. 1 shows the crossbar architecture, fundamentally based on the 1TxM multiplication cells (MCs), using a 4-bit multiplication as an example. According to Ohm's law, the MC current is given by $I = V/R$, where $V$ is a fixed bias voltage and $I$ is the variable current that depends on the resistance $R$.

From the 4-bit multiplier example in Fig. 1, it can be seen that the first operand is the input voltage vector $x$, which defines the biases. The other operand, $w$, determines the memristor state vector of a single row. Both operands can be expressed as below:

$$x = [in1 \quad in2 \quad in3 \quad in4]' \tag{3}$$

$$w = \begin{bmatrix} R_{m4} & R_{m3} & R_{m2} & R_{m1} \end{bmatrix} \tag{4}$$

$$= \begin{bmatrix} R_{m8} & R_{m7} & R_{m6} & R_{m5} \end{bmatrix} \tag{5}$$

$$= \begin{bmatrix} R_{m12} & R_{m11} & R_{m10} & R_{m9} \end{bmatrix} \tag{6}$$

$$= \begin{bmatrix} R_{m16} & R_{m15} & R_{m14} & R_{m13} \end{bmatrix} \tag{7}$$



Fig. 1. Proposed 4-bit multiplier architecture; the multiplication cell has one transistor in series with multiple parallel memristors. The number of memristors in each cell equals the amplification ratio (r), pre-set according to the significance of the cell; the row line (RL) writes or reads the cells on the same row; the column line (CL) directs the current path from the cells on the same column to the output node. The gate line (GL) biases the switching transistors of the cells on the same column. The significance of the operands, i.e. the LSB/MSB arrangement of the input voltage vector x and the resistance state vector w, is shown using arrows.

For illustration purposes, consider the operands x = 0011 and w = 0101, for which the binary output should be 1111 (i.e. decimal 15). Referring to the crossbar in Fig. 1 and the multiplication procedure presented before, the input vector is $x = [in1 \ in2 \ in3 \ in4]'$ and the transistor control vector is $v = [V_{g7} \ V_{g6} \ V_{g5} \ V_{g4} \ V_{g3} \ V_{g2} \ V_{g1}]$.

In the first write operation, $x = [-1.4V \ -1.4V \ -1.4V \ -1.4V]'$ and $v = [1.2V \ 1.2V \ 1.2V \ 1.2V \ 1.2V \ 1.2V \ 1.2V]$, whereby all cells are set to HRS (logic 0). In the following write operation, $x = [1.6V \ 0V \ 0V \ 0V]'$ and $v = [0V \ 0V \ 0V \ 0V \ 1.2V \ 0V \ 1.2V]$ commit 0101 on the first row. In the remaining write operations, $x$ shifts by 1 bit to the right for the next row, while $v$ shifts by 1 bit to the left.

After writing is complete, $x$ is set to $[0.4V \ 0.4V \ 0V \ 0V]'$ for reading the entire crossbar with $v = [1.2V \ 1.2V \ 1.2V \ 1.2V \ 1.2V \ 1.2V \ 1.2V]$. As a result, the output currents on CL1, CL2, CL3 and CL4 are $1\times$, $2\times$, $4\times$ and $8\times$ the logic 1 current generated by the least significant cell (m1, encircled in red), while CL5, CL6 and CL7 carry logic 0 currents. The CL currents accumulate at the output line to approximately $I = 15 \times I_0$, where $I_0$ is the basic logic 1 current of the least significant MC. A current sensor then maps this to the binary value 1111, matching the expected outcome.
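The write and read procedure above can be mimicked with an idealised behavioural model. The sketch below is an assumption-laden illustration, not the authors' Cadence simulation: it treats a gated cell on a row driven at 1.6V as switching to LRS and at -1.4V as switching to HRS, ignores transistor resistance and device dynamics, places row $i$'s cells in columns $i$ to $i+3$ with $r = 2^c$ memristors in column $c$ (0-indexed), and takes per-memristor resistances from Table III. All function names are hypothetical, and gate vectors are listed LSB first (Vg1 first), the reverse of the printed order above.

```python
N = 4                    # operand width
COLS = 2 * N - 1         # column lines CL1..CL7
V_RESET, V_SET, V_READ, V_GATE = -1.4, 1.6, 0.4, 1.2  # voltages from the worked example
R_LRS, R_HRS = 150.793e3, 152.43e6                    # per-memristor values from Table III
I0 = V_READ / R_LRS      # basic logic 1 current of the least significant cell

def bits(value, n=N):
    """Bits of value, least significant first."""
    return [(value >> k) & 1 for k in range(n)]

def write_steps(w):
    """Write schedule as (x, v) pairs: x = [in1..in4], v = [Vg1..Vg7] (LSB first)."""
    steps = [([V_RESET] * N, [V_GATE] * COLS)]  # reset every cell to HRS
    for i in range(N):                           # commit w on row i, shifted by i columns
        x = [V_SET if k == i else 0.0 for k in range(N)]
        v = [0.0] * COLS
        for j, b in enumerate(bits(w)):
            if b:
                v[i + j] = V_GATE
        steps.append((x, v))
    return steps

def program(w):
    """Idealised write: a gated cell on a driven row goes to LRS at V_SET, HRS at V_RESET."""
    lrs = {(i, i + j): False for i in range(N) for j in range(N)}  # (row, column) -> in LRS?
    for x, v in write_steps(w):
        for (i, c) in lrs:
            if v[c] == V_GATE and x[i] == V_SET:
                lrs[(i, c)] = True
            elif v[c] == V_GATE and x[i] == V_RESET:
                lrs[(i, c)] = False
    return lrs

def column_currents(lrs, x_value):
    """Read with every gate on; a cell in column c holds r = 2**c parallel memristors."""
    x = [V_READ * b for b in bits(x_value)]
    cols = [0.0] * COLS
    for (i, c), on in lrs.items():
        cols[c] += x[i] * 2 ** c / (R_LRS if on else R_HRS)  # transistor resistance ignored
    return cols

def multiply(x_value, w):
    """KCL sums the column currents; the sensed total is rounded to a multiple of I0."""
    return round(sum(column_currents(program(w), x_value)) / I0)

print(multiply(3, 5))  # 15
```

The schedule has one reset and four row writes; together with the single read this gives six steps, consistent with the step count reported in Section IV.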

## IV. EXPERIMENTAL RESULTS & EVALUATION

The proposed design is implemented in UMC 65nm technology, integrating our proposed model, which is implemented using the VTEAM library [12]. All transistors in our design are N-type, with a width and length of 500nm and 60nm, respectively, so that every current path sees the same channel resistance. The VTEAM parameters are shown in Table II; they are extracted from the physical devices reported in [12], [13].

To extract power, delay and area results from our experiments using the above parameters, we design the circuit schematic in the Cadence Virtuoso design environment. Using a number of simulation scripts, we monitor the outcomes to evaluate the parametric as well as functional properties. Below we discuss the outcomes of specific experimental results.

TABLE II CU:ZNO MEMRISTOR MODEL PARAMETERS FROM [12], [13].

| Parameter            | Value | Parameter           | Value |
|----------------------|-------|---------------------|-------|
| $\alpha_{OFF}$       | 7     | $\alpha_{ON}$       | 5     |
| $V_{OFF}$ (V)        | 0.9   | $V_{ON}$ (V)        | -0.85 |
| $R_{OFF}$ ($\Omega$) | 150M  | $R_{ON}$ ($\Omega$) | 150k  |
| $k_{OFF}$ (m/s)      | 40    | $k_{ON}$ (m/s)      | -80   |
| $w_{OFF}$ (nm)       | 3     | $w_{ON}$ (nm)       | 0     |
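For reference, the VTEAM model [12] describes a state variable $w$ whose derivative is zero between the thresholds $V_{ON}$ and $V_{OFF}$ and, in its simplest form, maps $w$ linearly onto resistance. The sketch below applies that form with the Table II parameters, assuming window functions equal to 1 and forward-Euler integration. It uses device-terminal voltages; how the cell bias voltages of Section III map onto the device terminals (memristor orientation, transistor drop) is not modelled. It mainly shows that a 0.4V read bias lies between $V_{ON}$ and $V_{OFF}$ and so leaves the state unchanged. `VteamParams`, `dw_dt`, `resistance` and `apply_pulse` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class VteamParams:
    # Table II (Cu:ZnO device), converted to SI units
    alpha_off: float = 7.0
    alpha_on: float = 5.0
    v_off: float = 0.9     # V
    v_on: float = -0.85    # V
    r_off: float = 150e6   # ohm
    r_on: float = 150e3    # ohm
    k_off: float = 40.0    # m/s
    k_on: float = -80.0    # m/s
    w_off: float = 3e-9    # m
    w_on: float = 0.0      # m

def dw_dt(v, p):
    """VTEAM state derivative, with window functions assumed equal to 1."""
    if v > p.v_off:
        return p.k_off * (v / p.v_off - 1.0) ** p.alpha_off
    if v < p.v_on:
        return p.k_on * (v / p.v_on - 1.0) ** p.alpha_on
    return 0.0  # between the thresholds the state is retained

def resistance(w, p):
    """Linear VTEAM mapping: R_ON at w = w_on, R_OFF at w = w_off."""
    return p.r_on + (p.r_off - p.r_on) * (w - p.w_on) / (p.w_off - p.w_on)

def apply_pulse(w, v, duration, p, dt=1e-12):
    """Forward-Euler integration of a constant device voltage, clipping w to its bounds."""
    for _ in range(int(round(duration / dt))):
        w = min(max(w + dw_dt(v, p) * dt, p.w_on), p.w_off)
    return w

p = VteamParams()
w = apply_pulse(p.w_on, 1.6, 2e-9, p)   # above V_OFF: drives the state toward R_OFF
w = apply_pulse(w, 0.4, 2e-9, p)        # between V_ON and V_OFF: no change
print(f"{resistance(w, p):.3g} ohm")
w = apply_pulse(w, -1.4, 2e-9, p)       # below V_ON: drives the state toward R_ON
print(f"{resistance(w, p):.3g} ohm")
```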

To validate the switching behavior of individual MCs, several cells are switched in an ON-OFF-ON pattern and their delay and energy are recorded. The results are illustrated in Fig. 2.



Fig. 2. Delay and energy comparison of a single cell with ON-OFF-ON switching operation: (a) the delay of our cell is 4.725ns, that of the MAD 1TxM approach [14] is 4.925ns, the IMPLY shift-and-add approach [15] 12.5ns, and the MAD shift-and-add approach [15] 2ns; (b) the proposed 1TxM cell consumes 0.0148pJ, while the MAD 1TxM approach [14] consumes 0.118pJ, the IMPLY shift-and-add approach 450pJ and the MAD shift-and-add approach 0.72pJ.

As can be seen, the proposed 1TxM cell has the lowest energy consumption under the given switching pattern, and a lower delay than the IMPLY and MAD 1TxM cells. It reduces delay by up to 72.2% and energy by up to 97.95% when compared with the IMPLY and MAD approaches. This is because the memristors in our approach react much faster under the same biasing voltage, owing to a higher ion transfer speed [13].

TABLE III MULTIPLIER OPERANDS WITH ARRAY CONFIGURATIONS.

| Quantity                 | Logic 1                   | Logic 0                                                      |
|--------------------------|---------------------------|--------------------------------------------------------------|
| $R_m$                    | 150.793kΩ                 | 152.43MΩ                                                     |
| $V_{in}$                 | 0.4V                      | 0V                                                           |
| Multiplication result    | 0.4V/150.793kΩ = 2.65µA   | 0V/150.793kΩ = 0A<br>0V/150MΩ = 0A<br>0.4V/152.43MΩ = 2.62nA |
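The Table III entries follow directly from Ohm's law. The sketch below recomputes them and, assuming that a cell holding $r$ parallel memristors conducts $r$ times the single-memristor current (transistor resistance ignored) and that $r$ equals the cell significance as in Fig. 1, estimates the worst-case accumulated logic 0 current of a 4-bit crossbar, i.e. every row driven at 0.4V with every memristor in HRS. `cell_current` and `worst_case_leakage` are illustrative helpers.

```python
V_READ = 0.4        # read bias of an active row in volts (Table III)
R_LRS = 150.793e3   # memristance for logic 1 in ohms (Table III)
R_HRS = 152.43e6    # memristance for logic 0 in ohms (Table III)

def cell_current(v_in, r_mem, r=1):
    """Ohm's law for r identical memristors in parallel; transistor resistance ignored."""
    return v_in / (r_mem / r)

I0 = cell_current(V_READ, R_LRS)      # basic logic 1 current
I_LEAK = cell_current(V_READ, R_HRS)  # logic 0 current of a 1x cell

def worst_case_leakage(n_bits):
    """All rows at V_READ and all memristors in HRS; cell (row i, bit j) has r = 2**(i + j)."""
    return sum(cell_current(V_READ, R_HRS, 2 ** (i + j))
               for i in range(n_bits) for j in range(n_bits))

print(f"I0 = {I0 * 1e6:.2f} uA, I_leak = {I_LEAK * 1e9:.2f} nA, ratio = {I0 / I_LEAK:.0f}")
print(f"4-bit worst-case logic 0 total = {worst_case_leakage(4) / I0:.3f} x I0")
```

Under these assumptions the worst-case logic 0 total stays below $I_0/2$ for 4-bit operands, so rounding the output to the nearest multiple of $I_0$ still recovers the product; because the total grows as $(2^n-1)^2$, wider operands quickly erode this margin, in line with Section II.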

The multiplier results are compared with MAD 1TxM [14], IMPLY shift-and-add [15] and MAD shift-and-add [15] multipliers. Fig. 3 depicts the comparative results.



Fig. 3. Delay and energy comparisons between multiplication approaches using x = 0011 and w = 0101: (a) the delay of the proposed multiplier design is 6.152ns, that of the MAD 1TxM approach [14] is 11.072ns, the IMPLY shift-and-add approach [15] 40.625ns, and the MAD shift-and-add approach 6.5ns; (b) the proposed multiplier consumes 17.587pJ, while MAD 1TxM [14] consumes 112.76pJ, IMPLY shift-and-add 1200pJ and MAD shift-and-add 1.92pJ.

It can be seen that the proposed multiplier has the lowest delay and the second lowest energy consumption. Compared with the IMPLY multiplier, delay and energy consumption are reduced by up to 85% and 99%, respectively. This is because our multiplier design needs fewer operating steps than the shift-and-add multipliers, owing to row-scale adjustment and the absence of current mirrors: a 4-bit multiplication takes 6 steps in the proposed multiplier, 6 steps in MAD 1TxM, 120 steps in IMPLY shift-and-add, 20 steps in MAD shift-and-add, and 136 steps in CMOS shift-and-add multipliers. Compared with the MAD 1TxM approach, which has a similar number of steps, our design reduces delay by 44% and energy by 84%. This can be attributed to the faster memristor model used in our design, with a much higher ON/OFF memristance margin (Table II). We envisage that these delay and energy benefits will scale significantly for larger multiplier crossbar designs. However, this will require the model to include delay adjustments for higher-significance MCs.
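The percentages quoted above can be reproduced from the Fig. 3 caption values. The short script below computes the relative reduction $100\,(1 - x_{\text{proposed}}/x_{\text{other}})$ for delay and energy against each baseline; `reduction` is an illustrative helper.

```python
# (delay in ns, energy in pJ) as reported in the Fig. 3 caption
FIG3 = {
    "MAD 1TxM [14]": (11.072, 112.76),
    "IMPLY shift-and-add [15]": (40.625, 1200.0),
    "MAD shift-and-add [15]": (6.5, 1.92),
}
PROPOSED = (6.152, 17.587)

def reduction(ours, theirs):
    """Percentage reduction of `ours` relative to `theirs` (negative means ours is larger)."""
    return 100.0 * (1.0 - ours / theirs)

for name, (delay, energy) in FIG3.items():
    print(f"vs {name}: delay {reduction(PROPOSED[0], delay):6.1f}%, "
          f"energy {reduction(PROPOSED[1], energy):7.1f}%")
```

Rounded, these give the 85%/99% (IMPLY) and 44%/84% (MAD 1TxM) figures quoted above; the negative energy figure against MAD shift-and-add reflects its lower energy, consistent with the proposed design ranking second in energy.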

#### V. CONCLUSIONS

In this paper, a novel multiplier design is presented, which features carry-free, in-memory multiplication using current-mode principles. By suitably determining a large ON/OFF ratio between memristor states, our proposed design offers high resilience in the presence of transitional bias voltages. Also, by using a multi-memristor configuration, we sidestep the need for dedicated current mirrors, thereby reducing energy and area substantially. The functional as well as parametric properties are extensively validated and compared with recently proposed approaches.

The proposed multiplier is a promising alternative for micro-edge applications with limited and variable power. The non-volatile properties of resistive memory can provide autonomous survivability under extreme power conditions. In the future, we plan to develop synchronized biasing for the multiplier architecture and validate it in machine learning applications.

#### REFERENCES

- [1] R. S. et al., "Real-power computing," *IEEE Trans. Computers*, 2018.
- [2] G. T. et al., "Always-on motion detection with application-level error control on a near-threshold approximate computing platform," in *ICECS*. IEEE, 2016, pp. 552–555.
- [3] A. C. et al., "High speed speculative multipliers based on speculative carry-save tree," *IEEE Transactions on Circuits & Systems I: Regular Papers*, vol. 61, no. 12, pp. 3426–3435, 2014.
- [4] R. S. et al., "Learning transfer-based adaptive energy minimization in embedded systems," *IEEE Trans. on Comp.-Aided Des. of Integ. Circuits and Systems (TCAD)*, vol. 35, no. 6, pp. 877–890, 2016.
- [5] A. Yakovlev, "Enabling survival instincts in electronic systems: An energy perspective," in *Transforming Reconfigurable Systems: A Festschrift Celebrating the 60th Birthday of Professor Peter Cheung*. World Scientific, 2015, pp. 237–263.
- [6] Y. P. et al., "Fully hardware-implemented memristor convolutional neural network," *Nature*, vol. 577, no. 7792, pp. 641–646, 2020.
- [7] C. L. et al., "Analogue signal and image processing with large memristor crossbars," *Nature Electronics*, vol. 1, no. 1, p. 52, 2018.
- [8] S. Y. et al., "Current-mode carry-free multiplier design using a memristor-transistor crossbar architecture," in *2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)*. IEEE, 2020, pp. 638–641.
- [9] P. Wilson and R. Wilcock, "Optimal sizing of configurable devices to reduce variability in integrated circuits," in *Design, Automation and Test in Europe, DATE*, 2009.
- [10] J. R. et al., "Drain-induced barrier lowering and parasitic resistance induced instabilities in short-channel InSnZnO TFTs," *IEEE Electron Device Letters*, vol. 35, no. 7, pp. 756–758, 2014.
- [11] G. Li, J. Mathew, R. A. Shafik, D. K. Pradhan, M. Ottavi, and S. Pontarelli, "Lifetime reliability analysis of complementary resistive switches under threshold and doping interface speed variations," *IEEE Transactions on Nanotechnology*, vol. 14, no. 1, pp. 130–139, 2015.
- [12] S. K. et al., "VTEAM: A general model for voltage-controlled memristors," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 62, no. 8, pp. 786–790, 2015.
- [13] B. S. et al., "Realizing spike-timing dependent plasticity learning rule in Pt/Cu:ZnO/Nb:STO memristors for implementing single spike based denoising autoencoder," *Journal of Micromechanics & Microengineering*, 2019.
- [14] S. Y. et al., "Self-Amplifying Current-Mode Multiplier Design using a Multi-Memristor Crossbar Cell Structure," in *International Conference on Electronics Circuits and Systems, ICECS*, 2020.
- [15] L. G. et al., "Optimized memristor-based multipliers," *IEEE Transactions on Circuits & Systems I: Regular Papers*, vol. PP, no. 2, pp. 1–13, 2017.