# Current-Mode Carry-Free Multiplier Design using a Memristor-Transistor Crossbar Architecture

Shengqi Yu<sup>†</sup>, Ahmed Soltan<sup>‡</sup>, Rishad Shafik<sup>†</sup>, Thanasin Bunnam<sup>†</sup>, Fei Xia<sup>†</sup>, Domenico Balsamo<sup>†</sup>, Alex Yakovlev<sup>†</sup>

<sup>†</sup>Microsystems Research Group, Newcastle University, Newcastle upon Tyne, UK

<sup>‡</sup>NISC Group, Nile University, Al Sheikh Zayed, Giza, Egypt

E-mail: s.yu10@ncl.ac.uk, asoltan@nu.edu.eg, Rishad.Shafik@ncl.ac.uk

Abstract-Multipliers are a major energy and delay contributor in modern compute-intensive applications due to their complex logic architecture. As such, designing multipliers with reduced energy and faster speed has remained a persistent challenge. This paper presents a novel, carry-free multiplier, which is suitable for a new generation of energy-constrained applications. The multiplier circuit consists of an array of memristor-transistor cells that can be selected (i.e., turned ON or OFF) using a combination of DC bias voltages based on the operand values. When a cell is selected, it contributes to the current in the array path, which is then amplified by current mirrors with variable transistor gate sizes. The different current paths are connected to a node that accumulates the currents in the analogue domain to produce the multiplier output directly. This removes the need for latency-sensitive carry propagation stages, typically seen in traditional multipliers. We conduct a number of experiments to validate the functional and parametric properties. Our experiments show that the proposed multiplier achieves 51.44% savings in energy at a similar accuracy when compared with recently proposed approaches.

*Index Terms*—Mixed-signal, current-mode multiplier, memristor-transistor crossbar, energy efficiency.

## I. INTRODUCTION

Continued developments in microelectronics technology have led to a myriad of new compute-intensive applications at the micro-edge, such as artificial intelligence and signal and image processing. Multiplication is a crucial arithmetic process in such applications. However, large logic complexities typically seen in traditional multipliers generate combinatorial blocks with long chains of cascaded carry addition. As such, energy efficiency has remained a primary design challenge for these applications, when powered by batteries or emerging energy harvesters [1], [3].

Over the years, researchers have investigated various design methods to minimize the energy and latency of multiplication, such as approximate [2], [4] and speculative circuit designs [5], [6]. These are based on marginally shortening the carry chains or truncating them completely. However, both performance and accuracy are limited by the constrained length of the carry chains, which is determined by the design method. Most of these methods use Landauer's logic boundaries defined in the voltage domain, and as such their permissible energy points are defined by a set of pre-defined voltage/frequency pairs above the threshold [7]. To improve the energy proportionality, analogue current-mode arithmetic circuit designs have recently gained momentum [8]. These circuits operate with a dynamic range of currents (from $\mu$A to several mA), providing considerably higher energy efficiency leverage than voltage-mode circuits, with the added advantage of high slew rate and simpler circuitry. For example, using current mirror networks, concurrent additions can be performed by directing current paths into a node, and subtractions can be carried out by controlling current paths away from a node. Due to reduced circuit complexity, these networks can also operate faster with significantly reduced energy consumption [9], [10].

For energy efficiency, the programming or switching networks will need to offer low conductance with low biasing voltages for the current-mode circuits. This is challenging for traditional transistor-based current networks, as the typical switching current tends to vary marginally based on the biasing voltages. Besides, to reduce the conductance, large resistor networks need to be integrated, which can contribute to unexpected parasitic behaviour of the circuits. Recently, memristor-transistor cells have shown excellent current-mode characteristics, which can be positively exploited to accurately program current networks for ultra-low-power applications [11].

This paper presents a novel multiplier design using a memristive crossbar architecture for current path controls. The array consists of memristor-transistor cells, which can be turned ON or OFF using a combination of DC bias voltages based on the operand values. When a number of cells in a column are turned ON, they form a sum of product terms without involving any carry propagation at all. The resulting current is then amplified by current mirrors with variable transistor gate sizes to suitably multiply these paths with powers of 2. The different current paths are then directed to a node which accumulates the currents according to Kirchhoff's current law (KCL), without the need for carry propagation, unlike traditional digital multipliers.

Specifically, this work makes the following *contributions*:

- 1) a mixed-signal carry-free multiplier design using current-mode principles in a crossbar architecture; and
- 2) extensive validation and analysis demonstrating the multiplier's improved latency and energy efficiency.

The rest of the paper is organized as follows. Section II provides background and motivation. Section III describes the proposed multiplier design using a crossbar architecture. Section IV presents the results of two different multiplication cases. Finally, Section V concludes the paper.

## II. BACKGROUND AND MOTIVATION

In a traditional  $(N \times N)$  binary multiplier, two N-bit unsigned integers can be multiplied using  $N^2$  logic AND operations followed by up to 2N ADD operations. For example, consider the multiplication of two 4-bit unsigned integers, where the multiplier is  $M_1 : \{m_3m_2m_1m_0\}$  and the multiplicand is  $M_2 : \{n_3n_2n_1n_0\}$ , as illustrated in Fig. 1.

|       |          |          |          | $m_3$    | $m_2$    | $m_1$    | $m_0$    |                 |
|-------|----------|----------|----------|----------|----------|----------|----------|-----------------|
|       |          |          | ×        | $n_3$    | $n_2$    | $n_1$    | $n_0$    |                 |
|       |          |          |          | $m_3n_0$ | $m_2n_0$ | $m_1n_0$ | $m_0n_0$ | $\leftarrow PP$ |
|       |          |          | $m_3n_1$ | $m_2n_1$ | $m_1n_1$ | $m_0n_1$ | 0        | $\leftarrow PP$ |
|       |          | $m_3n_2$ | $m_2n_2$ | $m_1n_2$ | $m_0n_2$ | 0        | 0        | $\leftarrow PP$ |
|       | $m_3n_3$ | $m_2n_3$ | $m_1n_3$ | $m_0n_3$ | 0        | 0        | 0        | $\leftarrow PP$ |
| $P_7$ | $P_6$    | $P_5$    | $P_4$    | $P_3$    | $P_2$    | $P_1$    | $P_0$    | $\leftarrow FP$ |

Fig. 1: Binary multiplication algorithm with 4-bit operands

As can be seen, the  $N^2$  logic AND operations produce partial product (PP) terms (i.e. bits), which can be generated in parallel. These terms are then column-wise added with variable numbers of PP terms. For the given example, the column-wise sums of the product terms can be expressed as follows:

$$\begin{aligned}
P_{0} &= m_{0}n_{0};\\
P_{1} &= m_{1}n_{0} + m_{0}n_{1};\\
P_{2} &= m_{2}n_{0} + m_{1}n_{1} + m_{0}n_{2};\\
P_{3} &= m_{3}n_{0} + m_{2}n_{1} + m_{1}n_{2} + m_{0}n_{3};\\
P_{4} &= m_{3}n_{1} + m_{2}n_{2} + m_{1}n_{3};\\
P_{5} &= m_{3}n_{2} + m_{2}n_{3};\\
P_{6} &= m_{3}n_{3}.
\end{aligned} \tag{1}$$
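The column-wise sums of Eq. (1) can be sketched in a few lines of Python. This is an illustrative behavioural model only; the function name `column_sums` and its arguments are assumptions introduced here and do not come from the design itself.

```python
def column_sums(m: int, n: int, bits: int = 4) -> list[int]:
    """Return the column sums P_0 .. P_{2*bits-2} of Eq. (1), without any carries."""
    m_bits = [(m >> i) & 1 for i in range(bits)]  # m_0 .. m_{bits-1}
    n_bits = [(n >> j) & 1 for j in range(bits)]  # n_0 .. n_{bits-1}
    sums = [0] * (2 * bits - 1)
    for i, mb in enumerate(m_bits):
        for j, nb in enumerate(n_bits):
            # Each partial-product bit m_i AND n_j lands in column i + j.
            sums[i + j] += mb & nb
    return sums


print(column_sums(0b1111, 0b1111))  # [1, 2, 3, 4, 3, 2, 1]
print(column_sums(0b0011, 0b0011))  # [1, 2, 1, 0, 0, 0, 0]
```

Each entry is an integer count of set partial-product bits in that column, so any entry larger than 1 is a column that would generate a carry in a conventional binary adder tree.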

From Eq. (1), note that when the number of bits in a column is two or more, carry propagation becomes more likely depending on the operand bit values. For example, if  $m_1 = m_0 = n_1 = n_0 = 1$ ,  $P_1$  is expected to produce a carry into  $P_2$ . When both operands have all bits set to 1, i.e.,  $M_1$ ={1111} and  $M_2$ ={1111}, the multiplier experiences the longest chain of carry propagation between the columns, starting from the least significant and ending at the most significant bits of the multiplier output.

The final product (FP) in binary multiplication is the sum of all partial products. If the result can be generated directly from all the partial products, each weighted by its bit position, the carries that arise in the multiplication procedure need not be propagated explicitly.
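As a worked check of this observation, treat each $P_k$ in Eq. (1) as an integer column sum rather than a single bit (an interpretation assumed here for illustration). Weighting each column sum by $2^k$ and adding the results recovers the product. For the worst case $M_1$={1111} and $M_2$={1111}, the column sums are $\{1, 2, 3, 4, 3, 2, 1\}$ for $P_0$ to $P_6$, and

$$\sum_{k=0}^{6} 2^{k} P_{k} = 1\cdot 1 + 2\cdot 2 + 4\cdot 3 + 8\cdot 4 + 16\cdot 3 + 32\cdot 2 + 64\cdot 1 = 225 = 15 \times 15.$$

No carry is passed between columns in this sum; the positional weights $2^k$ take over the role of the carry chain.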

## III. PROPOSED MULTIPLIER ARCHITECTURE

The building block of our crossbar array solution is a one-memristor one-transistor (1M1T) cell, as shown in Fig. 2. The memristor values represent the bits of one operand, while the voltage signals in the row lines represent the bits of the other operand [11]. The 1M1T logic cell uses a memristor as the memory unit and a transistor as the switching unit.

Fig. 2: Multiplier product generation and accumulation circuits

The memristor retains its resistance state even when there is no power supply [13]. When the voltage across the memristor exceeds its threshold, the set voltage (SV) biases the memristor to the low resistance state (LRS) and the reset voltage (RSV) biases it to the high resistance state (HRS). We denote LRS as logic '1' and HRS as logic '0'. Fig. 3 shows how these bias voltages affect the operations. The logic cell supports three operations, each with its own bias conditions and durations:  $\alpha$ (applying the multiplication operand),  $\beta$ (tuning the resistance state of the memristor to the low level, i.e., writing logic '1'), and  $\gamma$ (tuning the resistance state of the memristor to the high level, i.e., writing logic '0'). For example, in  $\alpha$, 0.4V denotes logic '1' and 0V denotes logic '0' in the multiplier. Fig. 4 shows a 4-bit multiplier design using the proposed memristor-transistor crossbar array.



Fig. 3: Source line (SL) bias values and durations

In the multiplier circuit shown in Fig. 4, a basic 1M1T cell is placed at each cross point via a mapping procedure. This design provides a combination of fast operation and accurate selection. Both the input and control signals are applied in the form of a single bar source (SBS) [15].

Fig. 4:  $4 \times 4$  1M1T crossbar circuit with a 3-line setting, one row line and two parallel column lines defined to give the circuit the ability to select any cell.

The nonvolatile resistive memory cell follows the threshold-voltage memristor model VTEAM [12], with model parameters from [13] that are extracted from a physical device. This supports the practical implementability of our design; Table I lists the parameters.

TABLE I: VTEAM model parameters from [13]

| Parameter            | Value | Parameter           | Value  |
|----------------------|-------|---------------------|--------|
| $\alpha_{off}$       | 4     | $\alpha_{on}$       | 4      |
| $V_{off}$ (V)        | 0.3   | $V_{on}$ (V)        | -1.5   |
| $R_{off}$ ($\Omega$) | 300k  | $R_{on}$ ($\Omega$) | 1k     |
| $k_{off}$ (m/s)      | 0.091 | $k_{on}$ (m/s)      | -216.2 |
| $w_{off}$ (nm)       | 3     | $w_{on}$ (nm)       | 0      |
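To make the role of these parameters concrete, the following is a minimal sketch of the VTEAM state equation [12] using the values in Table I. It assumes both window functions are equal to 1 and uses the linear state-to-resistance mapping of the model; the constant and function names are illustrative and do not come from the simulation setup used in this work.

```python
# Table I parameters in SI units (names are illustrative).
ALPHA_OFF, ALPHA_ON = 4, 4
V_OFF, V_ON = 0.3, -1.5        # threshold voltages (V)
R_OFF, R_ON = 300e3, 1e3       # bounding resistances (ohm)
K_OFF, K_ON = 0.091, -216.2    # rate constants (m/s)
W_OFF, W_ON = 3e-9, 0.0        # state-variable bounds (m)


def state_derivative(v: float) -> float:
    """dw/dt of the VTEAM model, with window functions set to 1 (assumption)."""
    if v > V_OFF:
        return K_OFF * (v / V_OFF - 1.0) ** ALPHA_OFF
    if v < V_ON:
        return K_ON * (v / V_ON - 1.0) ** ALPHA_ON
    return 0.0  # between the thresholds the state does not change


def resistance(w: float) -> float:
    """Linear mapping from state w to resistance: R_ON at W_ON, R_OFF at W_OFF."""
    return R_ON + (R_OFF - R_ON) * (w - W_ON) / (W_OFF - W_ON)


for v in (-3.0, 0.2, 0.6):
    print(f"v = {v:+.1f} V -> dw/dt = {state_derivative(v):.4g} m/s")
```

In this sketch the state variable only moves when the applied voltage lies outside the interval between $V_{on}$ and $V_{off}$, which is the threshold behaviour that lets the cell retain its stored bit between write operations.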

The proposed design is implemented in UMC 65nm technology. The transistors are divided into two groups, logic cell (LC) and current mirror (CM), as shown in Fig. 2. All LC transistors have the same size, with a width of 1000nm and a length of 60nm. At the output terminal, NMOS and PMOS CMs are serially connected to perform high-ratio amplification. As the CMs work as amplifiers with their respective gains, their transistor sizes differ, as shown in Table II.

TABLE II: Transistor sizes of the current mirrors

| Group | NMOS M1, W/L (nm) | NMOS M2, W/L (nm) | PMOS M3, W/L (nm) | PMOS M4, W/L (nm) |
|-------|-------------------|-------------------|-------------------|-------------------|
| 1     | 1520/60           | 400/60            | 80/60             | 240/60            |
| 2     | 2720/60           | 1600/60           | 80/60             | 260/60            |
| 3     | 3840/60           | 2400/60           | 80/60             | 720/60            |
| 4     | 5440/60           | 3200/60           | 80/60             | 1680/60           |
| 5     | 4080/60           | 4800/60           | 80/60             | 1920/60           |
| 6     | 2720/60           | 4800/60           | 80/60             | 2680/60           |
| 7     | 1520/60           | 1840/60           | 80/60             | 5120/60           |
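As a rough illustration of how transistor sizing sets the mirror gain, the sketch below applies the textbook first-order relation for a simple current mirror, in which the current gain equals the ratio of the output and input W/L values. It assumes that M1 and M3 are the input (diode-connected) devices and M2 and M4 the output devices of the NMOS and PMOS mirrors, respectively. That topology is an assumption made here and is not stated in the table. Channel-length modulation and the terminal-voltage dependence of the gain are ignored, so the numbers are not the simulated gains of the actual circuit.

```python
# Widths in nm of (M1, M2, M3, M4) for groups 1..7, copied from Table II.
# All lengths are 60 nm, so each W/L ratio reduces to a ratio of widths.
WIDTHS_NM = {
    1: (1520, 400, 80, 240),
    2: (2720, 1600, 80, 260),
    3: (3840, 2400, 80, 720),
    4: (5440, 3200, 80, 1680),
    5: (4080, 4800, 80, 1920),
    6: (2720, 4800, 80, 2680),
    7: (1520, 1840, 80, 5120),
}


def ideal_cascade_gain(m1: float, m2: float, m3: float, m4: float) -> float:
    """First-order gain of an NMOS mirror (M1 -> M2) cascaded with a PMOS mirror (M3 -> M4)."""
    return (m2 / m1) * (m4 / m3)


for group, widths in WIDTHS_NM.items():
    print(f"group {group}: ideal gain ~ {ideal_cascade_gain(*widths):.2f}")
```

Under these assumptions the first-order gain grows from group to group, which is consistent with each group serving a more significant output column; the deviation from exact powers of 2 cannot be interpreted without the actual circuit topology and operating points.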

## IV. EXPERIMENTAL RESULTS AND DISCUSSION

In our experiments, a  $4 \times 4$  crossbar multiplier is simulated to validate the method's effectiveness. For demonstration purposes, the multiplication of two 4-bit binary numbers is performed using the circuit shown in Fig. 4. The input 'x' ranges from 0 (0000) to 15 (1111), while the weight 'w' is held at a fixed value of 0 (0000).

Fig. 5: Expected and simulated results of the 4-bit multiplier with w=0. The memristor-transistor array generates approximate results, corresponding to the output currents.

The multiplication operands are listed in Table III. The output logic '0' has two possible current values, 0A and  $1.33\mu$A; both are small enough to be distinguished from logic '1', which is 0.4mA.

TABLE III: Multiplier operands with array configurations

| State                     | w (memristor resistance)                                        | x (input voltage) |
|---------------------------|-----------------------------------------------------------------|-------------------|
| 1                         | 1 kΩ                                                            | 0.4 V             |
| 0                         | 300 kΩ                                                          | 0 V               |
| **Multiplication result** | **Cell current**                                                |                   |
| Logic '1'                 | 0.4 V / 1 kΩ = 0.4 mA                                           |                   |
| Logic '0'                 | 0 V / 300 kΩ = 0 A; 0 V / 1 kΩ = 0 A; 0.4 V / 300 kΩ = 1.33 µA  |                   |
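The cell currents in Table III follow from Ohm's law applied to a single cell. The sketch below reproduces them under the convention of Section III (LRS stores logic '1', HRS stores logic '0') and ignores the voltage drop across the access transistor; the names `cell_current`, `R_LRS`, `R_HRS` and `V_HIGH` are illustrative assumptions.

```python
R_LRS = 1e3     # ohm, memristor storing logic '1'
R_HRS = 300e3   # ohm, memristor storing logic '0'
V_HIGH = 0.4    # V, input voltage for logic '1' (logic '0' is 0 V)


def cell_current(w_bit: int, x_bit: int) -> float:
    """Ideal current of one 1M1T cell: input voltage divided by memristor resistance."""
    voltage = V_HIGH if x_bit else 0.0
    resistance = R_LRS if w_bit else R_HRS
    return voltage / resistance


for w in (0, 1):
    for x in (0, 1):
        print(f"w={w}, x={x}: {cell_current(w, x) * 1e6:.2f} uA")
```

Only the combination w=1, x=1 produces the logic '1' current. The case w=0, x=1 leaves a small leakage current through the HRS memristor, which is the non-zero logic '0' level listed in the table.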

In Fig. 4, the least significant bits (LSBs) are organized as follows. For the input voltage array, the LSB is 'input1' in the circuit; for the final product, the LSB is 'out1'; and for the memory, the LSBs are 'M1', 'M5', 'M9', and 'M13'. Likewise, the most significant bits (MSBs) for the input voltage array, the final product, and the memory are 'input4', 'out7', and 'M4', 'M8', 'M12', and 'M16', respectively. The binary value of the input 'x' increases from '0000' to '1111', corresponding to the respective input voltage biases, and generates a specific current that represents the calculation result.

The expected and simulated multiplication results are compared in Fig. 5. The comparison in Fig. 6 indicates that, for 4-bit multiplication, the proposed design may save over 50% energy compared with Kulkarni's truncated multiplier design [14] and, for lower operand values, has an 84.22% lower error rate than Qiqieh's significance-driven logic compression approach [2]. More detailed descriptions of the comparative analysis as well as the experimental results can be found in [15].

Although our proposed multiplier shows a good tradeoff between power and accuracy, a number of issues remain at this early stage.

Fig. 6: Comparative power, delay and accuracy analysis: (a) shows that the power consumption of the proposed design is  $403\mu$W, Qiqieh's approach [2] is  $7.1\mu$W, and Kulkarni's approach [14] is  $830\mu$W; (b) illustrates that the proposed multiplier has a delay of 0.851ns, while Qiqieh's approach has a delay of 0.96ns; (c) shows that the proposed design has the lowest error rate (ER), 2.004%, for lower operand values, followed by Kulkarni's (2.6%) and Qiqieh's (12.7%). However, for higher operand values, the proposed design exhibits the highest error rate (71.7%).

For example, the output current for a smaller product can be higher than the output current for a larger product. This is because an LC in the LRS causes a higher voltage drop than one in the HRS. Thus, the current generated for logic '1' is lower than the expected value, and the logic '0' current is higher. Moreover, the output current errors can be amplified by the CM circuit. The gain of a CM is also affected by its terminal voltages: the higher voltage drop in the logic '1' case decreases the CM gain, while the logic '0' current, with its lower voltage drop, receives a higher gain than it should. As a result, the current level of an LSB logic '1' in a small multiplication can fall below that of an MSB logic '0' in a large multiplication. These effects can get worse after amplification, as seen in Fig. 5. Fig. 6c and Fig. 6d show more directly the difference in error rates between multiplications with high and low operand values. The delay of the proposed multiplier is the smallest among the three designs, which means that speed is an advantage of the proposed solution.

## V. CONCLUSION

In this paper, a mixed-signal current-mode multiplier has been proposed. The proposed multiplier features carry-free operation using current-mode principles. By reducing circuit complexity, computation latency and power consumption are significantly reduced. The efficacy of the proposed multiplier is validated using Cadence VHDL-AMS. When compared with existing multipliers, the proposed crossbar array shows deterministic precision and consumes much less power (in some cases showing power savings of up to 51.44%). This makes the proposed multiplier more relevant for applications in which the computation units at the edge are powered by limited energy sources with unpredictable and sporadic supply power. The use of memristors enables the state of the switches to be retained naturally under power cuts, which we aim to study in greater detail in future. Our planned further work includes the development of a fully featured crossbar together with in-situ power delivery and control. We will aim to extend the functional capability of the crossbar to a hierarchical multiply-accumulate unit, suitable for emerging machine learning applications.

## REFERENCES

- [1] R. Shafik, A. Yakovlev and S. Das, "Real-Power Computing," IEEE Trans. Computers, 2018.
- [2] I. Qiqieh, R. Shafik, G. Tarawneh, D. Sokolov, S. Das and A. Yakovlev, "Significance-Driven Logic Compression for Energy-Efficient Multiplier Design," IEEE J. Emerging & Selected Topics in Circuits and Systems, 2018.
- [3] G. Tagliavini, A. Marongiu, D. Rossi and L. Benini, "Always-on motion detection with application-level error control on a near-threshold approximate computing platform," ICECS, 2016, pp. 552–555.
- [4] Y. Kim, B. Song, J. Grosspietsch and S.F. Gillig, "A carry-free 54b×54b multiplier using equivalent bit conversion algorithm," JSSC, 2001, pp.1538-1545.
- [5] T. Juang and S. Hsiao, "Low-error carry-free fixed-width multipliers with low-cost compensation circuits," J. IEEE Trans. Circuits and Systems II: Express Briefs, 2005, pp.299-303.
- [6] S. Getzlaff and R. Schuffny, "A mixed signal multiplier principle for massively parallel analog VLSI systems," CSCC'99, Greece, 1999.
- [7] A. Yakovlev, "Ch.Enabling Survival Instincts in Electronic Systems: An Energy Perspective, in Transforming Reconfigurable Systems," Imperial College Press, 2015, pp.237-263.
- [8] M.A. Eldeeb, Y.H. Ghallab, Y. Ismail and H. El-Ghitani, "A 0.4-V miniature CMOS current mode instrumentation amplifier," J. IEEE Trans. Circuits and Systems II: Express Briefs, 2018, pp.261-265.
- [9] G.G.E. Gielen and R.A. Rutenbar, "Computer-aided design of analog and mixed-signal integrated circuits," J. IEEE Proceedings, 2000, pp.1825-1854.
- [10] F. Yuan, "Low-voltage CMOS current-mode circuits: topology and characteristics," J. IEE PCDS, 2006, pp.219-230.
- [11] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Davila, C.E. Graves and others, "Analogue signal and image processing with large memristor crossbars," J. Nature Electronics, 2018, pp.52.
- [12] S. Kvatinsky, M. Ramadan, E. G. Friedman and A. Kolodny, "VTEAM: A General Model for Voltage-Controlled Memristors," J. IEEE Trans. Circuits and Systems II: Express Briefs, 2015, pp.786-790.
- [13] S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny and U. C. Weiser, "MAGIC - Memristor-Aided Logic," J. IEEE Trans. Circuits and Systems II: Express Briefs, 2014, pp.895-899.
- [14] P. Kulkarni, P. Gupta and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," B. VLSI Design, 2011, pp.346-351.
- [15] S. Yu, A. Soltan, R. Shafik, T. Bunnam, D. Balsamo, F. Xia and A. Yakovlev, "Current-Mode Carry-Free Multiplier Design using a Memristor-Transistor Crossbar Architecture," NCL-EEE-MICRO-TR-2019-216, Technical Report, µSystems Research Group, School of Engineering, Newcastle University. Accessed on: November.2019. [Online]. Available: https://tinyurl.com/s6agu40.