# Inexact Designs for Approximate Low Power Addition by Cell Replacement

Haider A. F. Almurib & T. Nandha Kumar, Faculty of Engineering, The University of Nottingham, Semenyih, Selangor, Malaysia (haider.abbas; nandhakumar.t@nottingham.edu.my)

Abstract— This paper proposes three designs of an inexact adder cell for approximate computing. These cells require substantially fewer transistors than an exact full adder cell as well as known inexact designs. The inexact cells are simulated at 45 nm and compared with respect to circuit-based metrics (such as energy consumption, delay, complexity and energy-delay product) as well as error metrics (such as error rate). The replacement of exact cells in a ripple carry adder with inexact cells, such as the ones proposed in this manuscript, is evaluated by exhaustive simulation to assess different metrics for approximate computing; image addition is then pursued as an application. The results show that, among existing inexact cells found in the technical literature, the proposed designs consume the least power and have superior performance in terms of delay, switching capacitance and error measures for image quality and processing.

Keywords—Inexact adder, approximate adder, Inexact computing, approximate computing, error distance, adder, low power DSP.

# I. INTRODUCTION

A handheld/portable system usually processes a large amount of information in a way that is computationally and power intensive. Processing of image and video is also highly error tolerant; moreover, human senses often cannot perceive degradation in performance, such as in the quality of visual and audio information. Hence, approximate rather than accurate computing has been advocated for this type of application. Approximate computing introduces inaccuracy in the computational process to improve specific figures of merit, such as power dissipation, delay and circuit complexity.

Digital Signal Processing (DSP) systems are widely used to process image and video information. Hence, low energy DSP circuits have been investigated to maintain a good output quality of an image by approximate computing. One approach is to scale the supply voltage beyond the critical voltage that is required to meet the critical path delay; this leads to a degradation in the algorithmic performance, which is compensated using algorithmic noise tolerance schemes [1]. Another scheme that simultaneously targets low power and process tolerance using voltage scaling is presented in [2]; this approximate computing method uses logic complexity reduction at the algorithmic level to achieve low power by using voltage over scaling (VOS).

Fabrizio Lombardi, Department of ECE, Northeastern University, Boston, MA 02115, USA (lombardi@ece.neu.edu)

The reduction of circuit complexity in approximate computing at the logic/gate level is dealt with in [3]. In [3], a logic synthesis approach is proposed to design circuits that implement an approximate version of a given function by considering the so-called error rate (ER) as the metric for error tolerance; the objective of [3] is to design a circuit with a lower number of transistors such that a reduction in power dissipation is also accomplished.

Arithmetic circuits are well suited to approximate computing; addition has been extensively analyzed in the technical literature. A reduction in circuit complexity at transistor level of an adder circuit provides a reduction in power dissipation higher than conventional low power design techniques [4]; in addition to the ER, new figures of merit for estimating the error introduced in an approximate adder have been presented in [5]. Approximate adder designs have been evaluated in [6]: approximation has been introduced by either replacing the accurate cell of a modular adder design (such as the ripple carry adder, RCA) with an approximate cell of lower circuit complexity, or by modifying the generation and propagation of the carry in the addition process.

Cell replacement usually provides a shorter critical path; it also enables voltage scaling and reduces the switching capacitance. An approximate mirror adder (AMA) circuit is presented in [4]; it utilizes cell replacement by reducing the adder cell circuit complexity compared to a traditional mirror adder (MA) scheme. Approximate adder cells (AXA) based on XOR and XNOR implementations are proposed in [7]; node capacitances and power are also reduced in the AXA designs of [7].

In this paper, three new inexact (approximate) adder cell designs (denoted as InXA) are presented; these cells have both electrical and error features that are very favorable for approximate computing. The proposed InXAs are then used to design an approximate ripple carry adder (RCA). The results (based on an exhaustive simulation) of the InXA based RCAs are then compared with existing approximate schemes as well as an exact RCA [8]. A detailed analysis of approximate addition for an image processing application is presented to show that the RCAs using the proposed inexact cells offer improved performance metrics over existing approximate cells found in the technical literature. In conclusion, the proposed adder cells have the following advantageous features over previous designs: (i) a very small number of transistors; (ii) a very small number of erroneous values at the two outputs (i.e. Sum and Carry); (iii) smaller switching capacitances, thus yielding a substantial reduction in both delay and energy dissipation (and their product as a combined metric); (iv) improved peak signal to noise ratio (PSNR) for image addition.

#### II. PROPOSED CELLS

Three inexact adder cells with a lower number of transistors than previous approximate designs [4] [7] are presented; these designs are denoted as InXA1, InXA2 and InXA3.

# A. First Inexact Adder (InXA1) Cell

This adder is designed by retaining an exact Sum, while approximating the Carry. Figures 1 and 2 show its gate and circuit level diagrams. The truth table for the Sum and Carry is shown in Table 1 (columns 6 and 7); the first three columns give the input values (X, Y and Cin), while columns 4 and 5 give the Sum and Carry of the exact full adder. InXA1 introduces an error in Carry at rows 2 and 7; as the error is in Carry, it may propagate to subsequent cells when InXA1 is utilized in a multi-bit adder.





#### B. Second Inexact Adder (InXA2) Cell

InXA2 has an exact Carry and an approximate Sum. The gate and circuit level diagrams are shown in Figures 3 and 4 respectively. The truth table for Sum and Carry of this circuit is shown in Table 1 (columns 8 and 9). InXA2 introduces an error in Sum at rows 4 and 6.



Figure 4. Transistor circuit diagram of InXA2.

## C. Third Inexact Adder (InXA3) Cell

InXA3 has an exact Carry, while approximation is introduced into the Sum. Errors are introduced in Sum at rows 1 and 8; the gate and circuit level diagrams are shown in Figures 5 and 6 respectively. The truth table for Sum and Carry of this circuit is shown in Table 1 (columns 10 and 11). When compared with the exact full adder cell, InXA3 reduces a gate delay (an XOR gate is replaced by a NOT gate, as the Sum in Table 1 is the complement of the Carry). When compared to AXA, a transistor is eliminated, while providing the same logic operation.



Figure 6. Transistor circuit diagram for InXA3

Table 1. Truth table for the proposed inexact adder cells

| X | Y | $C_{\text{in}}$ | Exact Sum | Exact Cout | InXA1 Sum | InXA1 Cout | InXA2 Sum | InXA2 Cout | InXA3 Sum | InXA3 Cout |
|---|---|-----------------|-----------|------------|-----------|------------|-----------|------------|-----------|------------|
| 0 | 0 | 0               | 0         | 0          | 0√        | 0√         | 0√        | 0√         | 1×        | 0√         |
| 0 | 0 | 1               | 1         | 0          | 1√        | 1×         | 1√        | 0√         | 1√        | 0√         |
| 0 | 1 | 0               | 1         | 0          | 1√        | 0√         | 1√        | 0√         | 1√        | 0√         |
| 0 | 1 | 1               | 0         | 1          | 0√        | 1√         | 1×        | 1√         | 0√        | 1√         |
| 1 | 0 | 0               | 1         | 0          | 1√        | 0√         | 1√        | 0√         | 1√        | 0√         |
| 1 | 0 | 1               | 0         | 1          | 0√        | 1√         | 1×        | 1√         | 0√        | 1√         |
| 1 | 1 | 0               | 0         | 1          | 0√        | 0×         | 0√        | 1√         | 0√        | 1√         |
| 1 | 1 | 1               | 1         | 1          | 1√        | 1√         | 1√        | 1√         | 0×        | 1√         |
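The entries of Table 1 can be cross-checked with a short Python sketch. The Boolean expressions below are inferred from the truth table alone (the gate-level figures are not reproduced here), so they are an assumption about one logic form that matches Table 1 and not necessarily the authors' netlists; all function names are illustrative.

```python
from itertools import product

def exact_fa(x, y, cin):
    """Exact full adder: returns (sum, cout)."""
    return x ^ y ^ cin, (x & y) | (cin & (x ^ y))

# Boolean forms inferred from Table 1 (assumption, not the authors' netlists).
def inxa1(x, y, cin):
    return x ^ y ^ cin, cin                        # exact Sum, Cout taken as Cin

def inxa2(x, y, cin):
    return (x ^ y) | cin, exact_fa(x, y, cin)[1]   # approximate Sum, exact Cout

def inxa3(x, y, cin):
    cout = exact_fa(x, y, cin)[1]
    return 1 - cout, cout                          # Sum is the complement of Cout

def error_rows(cell):
    """1-based rows of Table 1 where the cell differs from the exact adder."""
    rows = product((0, 1), repeat=3)               # (X, Y, Cin) from 000 to 111
    return [i + 1 for i, bits in enumerate(rows) if cell(*bits) != exact_fa(*bits)]

for name, cell in (("InXA1", inxa1), ("InXA2", inxa2), ("InXA3", inxa3)):
    print(name, error_rows(cell))
```

The erroneous rows reported by this sketch are rows 2 and 7 for InXA1, rows 4 and 6 for InXA2, and rows 1 and 8 for InXA3, in agreement with the descriptions of the three cells.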

The evaluation of the proposed inexact adder cells is pursued with respect to the exact and previously proposed approximate designs. The following performance metrics (as defined in [5]) are used:

NAB: For an n-bit number, NAB is the number of bits, starting at the LSB, that utilize approximate cells.

Error Distance (ED): ED is defined as the absolute arithmetic difference between the exact result ($R$) and the approximate result ($\hat{R}$), i.e.,

$$ED = \left| R - \widehat{R} \right| \tag{1}$$

Mean Error Distance (MED): MED is the average ED for a set of outputs.

Normalized Mean Error Distance (NMED): NMED is the normalized value of MED, i.e.

$$NMED = \frac{MED}{R_{max}} \tag{2}$$

where  $R_{max}$  is the maximum magnitude of the output value of the exact adder.

Mean Relative Error Distance (MRED): MRED is the average of the Relative Error Distance (RED) for the same set of outputs, where RED is defined as:

$$RED = \frac{ED}{R} \tag{3}$$

Error Rate (ER): ER is defined as the percentage of erroneous outputs among all outputs.
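As an illustration of how these definitions combine, the following Python sketch computes MED, NMED, MRED and ER from paired lists of exact and approximate results. The function name, the toy data and the handling of an exact result of zero in RED (such pairs are skipped) are assumptions made for this example and are not specified in the text.

```python
def error_metrics(exact, approx, r_max):
    """Return MED, NMED, MRED and ER for paired exact/approximate results.

    r_max is the maximum magnitude of the exact adder output (Eq. 2).
    Pairs whose exact result is 0 are skipped in MRED to avoid a division
    by zero; this handling is an assumption of the sketch.
    """
    eds = [abs(r - r_hat) for r, r_hat in zip(exact, approx)]   # Eq. (1)
    med = sum(eds) / len(eds)
    nmed = med / r_max                                          # Eq. (2)
    reds = [ed / r for ed, r in zip(eds, exact) if r != 0]      # Eq. (3)
    mred = sum(reds) / len(reds) if reds else 0.0
    er = 100.0 * sum(ed != 0 for ed in eds) / len(eds)          # in percent
    return {"MED": med, "NMED": nmed, "MRED": mred, "ER": er}

# Hypothetical toy data: four additions, two of them erroneous.
m = error_metrics(exact=[4, 8, 10, 16], approx=[4, 9, 8, 16], r_max=16)
print(m)
```

For the toy data the EDs are 0, 1, 2 and 0, so MED is 0.75 and ER is 50%.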

#### D. Error Analysis

Next, the number of errors introduced by the proposed inexact adder cells is compared with previous approximate adders [4] [7] for the 8 possible input patterns. The results are presented in Table 2. Columns 2 and 3 give the number of erroneous values at the Sum and Carry outputs of the approximate adder cells. Columns 4 and 5 present the error rates at Sum and Carry. Each proposed inexact adder cell introduces only two erroneous values, giving a reduced error rate (the same as AMA2 and AXA).

Table 2. Number of erroneous entries in truth table for previous and proposed approximate full adder cells

| Approximate Adder Cell | No. of Erroneous Values (Sum) | No. of Erroneous Values (Cout) | Error Rate in % (Sum) | Error Rate in % (Cout) |
|------------------------|-------------------------------|--------------------------------|-----------------------|------------------------|
| AMA1                   | 2                             | 1                              | 25                    | 12.5                   |
| AMA2                   | 2                             | 0                              | 25                    | 0                      |
| AMA3                   | 3                             | 1                              | 37.5                  | 12.5                   |
| AMA4                   | 3                             | 2                              | 37.5                  | 25                     |
| AXA                    | 2                             | 0                              | 25                    | 0                      |
| InXA1                  | 0                             | 2                              | 0                     | 25                     |
| InXA2                  | 2                             | 0                              | 25                    | 0                      |
| InXA3                  | 2                             | 0                              | 25                    | 0                      |
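The InXA rows of Table 2 follow directly from Table 1. The sketch below recounts them in Python, assuming that the Carry of InXA1 in row 7 and the Sum of InXA3 in row 8 are the erroneous 0 entries described in the text; the truth tables of the AMA and AXA cells are not given in this paper, so those rows are not recomputed. Variable and function names are illustrative.

```python
# Output columns copied from Table 1, rows ordered (X, Y, Cin) = 000 ... 111.
EXACT = {"Sum": [0, 1, 1, 0, 1, 0, 0, 1], "Cout": [0, 0, 0, 1, 0, 1, 1, 1]}
CELLS = {
    "InXA1": {"Sum": [0, 1, 1, 0, 1, 0, 0, 1], "Cout": [0, 1, 0, 1, 0, 1, 0, 1]},
    "InXA2": {"Sum": [0, 1, 1, 1, 1, 1, 0, 1], "Cout": [0, 0, 0, 1, 0, 1, 1, 1]},
    "InXA3": {"Sum": [1, 1, 1, 0, 1, 0, 0, 0], "Cout": [0, 0, 0, 1, 0, 1, 1, 1]},
}

def count_errors(cell):
    """Return (erroneous Sum values, erroneous Cout values, ER Sum %, ER Cout %)."""
    n_sum = sum(a != e for a, e in zip(cell["Sum"], EXACT["Sum"]))
    n_cout = sum(a != e for a, e in zip(cell["Cout"], EXACT["Cout"]))
    rows = len(EXACT["Sum"])
    return n_sum, n_cout, 100.0 * n_sum / rows, 100.0 * n_cout / rows

table2 = {name: count_errors(cell) for name, cell in CELLS.items()}
for name, row in table2.items():
    print(name, row)
```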

#### E. Number of Transistors & Input Node Capacitance

The second column of Table 3 shows the number of transistors in the proposed and previous approximate designs; the proposed inexact cells require fewer transistors than the AMA designs [4] and AXA [7], as well as the Exact Full Adder (EFA), which needs 10 transistors [8].

Table 3. Input node capacitances of previous and proposed approximate adder cells

| Approximate Adder Cell | No. of Transistors | Node Capacitance at X (in $C_{gn}$) | Node Capacitance at Y (in $C_{gn}$) | Node Capacitance at Cin (in $C_{gn}$) |
|------------------------|--------------------|-------------------------------------|-------------------------------------|---------------------------------------|
| AMA1                   | 20                 | 12                                  | 15                                  | 13                                    |
| AMA2                   | 14                 | 12                                  | 12                                  | 8                                     |
| AMA3                   | 11                 | 8                                   | 11                                  | 8                                     |
| AMA4                   | 15                 | 12                                  | 8                                   | 9                                     |
| AXA                    | 7                  | 8                                   | 6                                   | 0                                     |
| InXA1                  | 6                  | 6                                   | 6                                   | 8                                     |
| InXA2                  | 8                  | 6                                   | 6                                   | 2                                     |
| InXA3                  | 6                  | 6                                   | 6                                   | 0                                     |

Next, the input node capacitance is found using the approach of [4]. Let $C_{gn}$ and $C_{gp}$ denote the gate capacitances of minimum size NMOS and PMOS transistors respectively, and let $C_{dn}$ and $C_{dp}$ denote the drain diffusion capacitances of the NMOS and PMOS transistors. Based on the assumption that the width of a PMOS transistor is three times the width of an NMOS transistor, $C_{gp} \approx 3C_{gn}$ and $C_{dp} \approx 3C_{dn}$. The total capacitance at a node is given by the sum of the gate and diffusion capacitances connected to it. The node capacitances are calculated in terms of $C_{gn}$ [4] and the results are presented in Table 3. Among all approximate adder cells, the proposed inexact designs have the lowest input node capacitances, which implies faster charging/discharging of these nodes as well as lower power dissipation [6].
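As a worked illustration of this bookkeeping (the fan-outs are hypothetical and are not taken from any cell in Table 3), consider one input that drives the gates of one minimum size NMOS transistor and one PMOS transistor, and a second input that drives two NMOS gates and one PMOS gate. Using only the width assumption stated above:

$$C_{in,1} = C_{gn} + C_{gp} \approx C_{gn} + 3C_{gn} = 4C_{gn}$$

$$C_{in,2} = 2C_{gn} + C_{gp} \approx 2C_{gn} + 3C_{gn} = 5C_{gn}$$

An input that is also connected to a source/drain terminal would add the corresponding diffusion capacitance ($C_{dn}$ or $C_{dp}$) to this sum.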

# F. Performance Metrics

Next, the previous and proposed approximate cells and the EFA cell are evaluated at 45 nm (with the PTM of [9]) by simulation using LTSPICE IV. The exhaustive input set (eight combinations) is utilized; input patterns are supplied at an interval of 2 ns and the charging and discharging durations of Sum and Carry are measured.



Figure 7. Average and worst case delay for approximate adder cells.



Figure 8. Average and worst case energy dissipation for approximate adder cells.

Delay, energy dissipation and the energy-delay product (EDP) for previous and proposed approximate cells are presented in Figures 7, 8 and 9 for the average and worst cases; InXA1 and InXA2 outperform the previous approximate adder cells in all of these metrics. As for InXA3, although it dissipates less energy than the previous approximate adder cells, it incurs a larger delay.

Table 4. Average and Worst Delay, Energy dissipated and EDP for EFA [8]

| Average delay (ns) | Average energy dissipation (fJ) | Average EDP (ns·fJ) | Worst delay (ns) | Worst energy dissipation (fJ) | Worst EDP (ns·fJ) |
|--------------------|---------------------------------|---------------------|------------------|-------------------------------|-------------------|
| 0.174419           | 0.926751                        | 0.161643            | 0.424647         | 2.366840                      | 1.005072          |

Table 4 shows the delay, energy dissipation and EDP for the EFA of [8]; as expected, the EFA incurs the largest delay and the highest energy dissipation (hence, the largest value of EDP) for both the average and worst cases when compared with the approximate adder cells.
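The EDP entries of Table 4 are consistent with the product of the tabulated delay and energy values. Treating EDP as exactly this product is an assumption about how the figures were obtained; the short Python check below only multiplies the numbers copied from Table 4.

```python
# Values copied from Table 4 (EFA of [8]).
avg_delay_ns, avg_energy_fj = 0.174419, 0.926751
worst_delay_ns, worst_energy_fj = 0.424647, 2.366840

# EDP taken as energy x delay (assumption), in ns.fJ.
avg_edp = avg_delay_ns * avg_energy_fj
worst_edp = worst_delay_ns * worst_energy_fj

print(f"average EDP = {avg_edp:.6f} ns.fJ, worst EDP = {worst_edp:.6f} ns.fJ")
```

Both products agree with the tabulated EDP values to within rounding of the last printed digit.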



Figure 9. Average and worst case EDP for approximate adder cells.

#### III. APPROXIMATE RIPPLE CARRY ADDERS

In this section, the approximate cells are utilized for designing an approximate ripple carry adder (RCA); EFA cells are replaced with approximate cells in an n-bit RCA starting from the LSB, with NAB increasing from 1 until all exact cells are replaced by inexact cells. The exhaustive simulation for n = 12 is presented (consistent results are found by changing n). The results are shown in Figures 10, 11 and 12; the x-axis gives NAB as a percentage of the length of the 12-bit RCA, i.e., (NAB/12) × 100%, and results are presented for NAB values that correspond to 25%, 50%, 75% and 100% of the 12-bit RCA.
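The cell replacement scheme can be sketched behaviorally in Python as follows. This is a bit-level functional model only (no timing or energy), the InXA2 expression is inferred from Table 1 rather than taken from the gate-level figures, and the function names are illustrative assumptions.

```python
def exact_cell(x, y, cin):
    """Exact full adder cell: returns (sum, cout)."""
    return x ^ y ^ cin, (x & y) | (cin & (x ^ y))

def inxa2_cell(x, y, cin):
    # Form inferred from Table 1 (assumption): approximate Sum, exact Cout.
    return (x ^ y) | cin, (x & y) | (cin & (x ^ y))

def approx_rca(a, b, n, nab, approx_cell):
    """Add two n-bit unsigned integers with a ripple carry adder whose
    nab least significant cells are approximate and the rest exact."""
    carry, result = 0, 0
    for i in range(n):
        cell = approx_cell if i < nab else exact_cell
        s, carry = cell((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result | (carry << n)   # the final carry becomes bit n of the result

# NAB = 3 in a 12-bit RCA corresponds to (3/12) x 100% = 25% approximate cells.
print(approx_rca(3, 1, n=12, nab=0, approx_cell=inxa2_cell))
print(approx_rca(3, 1, n=12, nab=3, approx_cell=inxa2_cell))
```

With `nab=0` the model reduces to exact addition (3 + 1 = 4), whereas with `nab=3` the InXA2 cells return 6 for the same operands, i.e. an error distance of 2.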



Compared with all previous approximate adders, the InXA2-based RCA has the lowest error rate; the error rates of the InXA1 and InXA3 based RCAs are nearly the same and correspond to the least error rate among approximate adders. This is due to the absence of error propagation in the InXA2 cell: its Carry is exact, so the error that it introduces (an error distance of 1) does not propagate from the LSB to the higher bit cells of the adder. Moreover, the overall NMED of the InXA2 based RCA is the lowest among all approximate adders, including those based on InXA1 and InXA3. However, the MRED of the InXA1 based RCA is the worst among the approximate adders; the MRED of the InXA2 based RCA is the lowest among the proposed and previous approximate adders (except for AMA).
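The effect of carry exactness on error propagation can be explored with an exhaustive behavioral sketch in Python. A small adder (n = 6, NAB = 3) is assumed here instead of the 12-bit RCA so that the pure Python loop stays fast; the numbers it produces therefore do not reproduce Figures 10 to 12. The cell expressions are inferred from Table 1 and all names are illustrative.

```python
def exact_cell(x, y, c):
    return x ^ y ^ c, (x & y) | (c & (x ^ y))

# Forms inferred from Table 1 (assumption; not the authors' netlists).
def inxa1(x, y, c):
    return x ^ y ^ c, c                          # exact Sum, approximate Cout

def inxa2(x, y, c):
    return (x ^ y) | c, exact_cell(x, y, c)[1]   # approximate Sum, exact Cout

def inxa3(x, y, c):
    cout = exact_cell(x, y, c)[1]
    return 1 - cout, cout                        # approximate Sum, exact Cout

def rca(a, b, n, nab, cell):
    """n-bit ripple carry adder with the nab least significant cells approximate."""
    carry, out = 0, 0
    for i in range(n):
        s, carry = (cell if i < nab else exact_cell)((a >> i) & 1, (b >> i) & 1, carry)
        out |= s << i
    return out | (carry << n)

def evaluate(cell, n=6, nab=3):
    """Exhaustively compare the approximate RCA with exact addition."""
    eds = [abs((a + b) - rca(a, b, n, nab, cell))
           for a in range(2 ** n) for b in range(2 ** n)]
    r_max = 2 * (2 ** n - 1)                     # largest exact sum of two n-bit operands
    return {"ER": 100 * sum(e > 0 for e in eds) / len(eds),
            "NMED": sum(eds) / len(eds) / r_max,
            "maxED": max(eds)}

results = {name: evaluate(cell)
           for name, cell in (("InXA1", inxa1), ("InXA2", inxa2), ("InXA3", inxa3))}
for name, metrics in results.items():
    print(name, metrics)
```

Because InXA2 and InXA3 keep the Carry exact, the error in this model is confined to the NAB approximate Sum bits, so the ED cannot exceed $2^{NAB} - 1$; the approximate Carry of InXA1 allows larger error distances.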



#### IV. APPLICATION: IMAGE ADDITION

The inexact adders are evaluated for image addition; arithmetic is widely used in image processing to perform a variety of processing tasks, such as masking/enhancing parts of an image or determining motion (through subtraction). In this paper, two images are added to generate a new image. Lena and Tulips (Figs. 13(a) and 13(b)) are the first two images to be added; these images have the same size, i.e. m = n = 256 and were selected due to their opposite features as pertaining to image analysis. The added images with a NAB of 5 for accurate and InXA2 based RCAs are shown in Figs. 13(c) and 13(d) respectively (accurate addition is on a bit-by-bit basis).



Figure 13. Image addition; (a) Lena, (b) Tulips, (c) Accurate addition, (d) InXA2.

The following measures (as related to image analysis) are used to evaluate the addition results:

- Mean Square Error (MSE): $MSE = \frac{1}{m \times n} \sum_{j=1}^{m} \sum_{k=1}^{n} (p_{j,k} - \hat{p}_{j,k})^2$, where $p_{j,k}$ is the accurate pixel value at row j and column k of the image, $\hat{p}_{j,k}$ is the approximate value of the same pixel, and m and n are the numbers of rows and columns of the image, respectively.
- Peak Signal to Noise Ratio (PSNR): $PSNR = 10 \log \frac{(2^n - 1)^2}{MSE}$
- Normalized Cross Correlation (NK): $NK = \sum_{j=1}^{m} \sum_{k=1}^{n} (p_{j,k} \cdot \hat{p}_{j,k}) / \sum_{j=1}^{m} \sum_{k=1}^{n} p_{j,k}^{2}$
- Mean Absolute Error (MAE): $MAE = \frac{1}{m \times n} \sum_{j=1}^{m} \sum_{k=1}^{n} |p_{j,k} - \hat{p}_{j,k}|$
- Normalized Absolute Error (NAE): $NAE = \frac{\sum_{j=1}^{m} \sum_{k=1}^{n} |p_{j,k} - \hat{p}_{j,k}|}{\sum_{j=1}^{m} \sum_{k=1}^{n} p_{j,k}^{2}}$
- Average Difference (AD): $AD = \frac{1}{m \times n} \sum_{j=1}^{m} \sum_{k=1}^{n} (p_{j,k} - \hat{p}_{j,k})$
- Maximum Absolute Difference (MD): $MD = \max_{j,k} \{ |p_{j,k} - \hat{p}_{j,k}| \}$
- Structural Content (SC): $SC = \sum_{j=1}^{m} \sum_{k=1}^{n} p_{j,k}^{2} / \sum_{j=1}^{m} \sum_{k=1}^{n} \hat{p}_{j,k}^{2}$
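The measures listed above can be sketched in plain Python as follows. The function name, the `bits` parameter (pixel bit width used for the PSNR peak value), the use of a base-10 logarithm and the 2 × 2 toy images are assumptions of this example; NAE follows the formula as printed above.

```python
import math

def image_metrics(p, p_hat, bits=8):
    """Quality measures between an accurate image p and an approximate image p_hat.

    Both images are equal-size lists of rows of non-negative integers.
    `bits` sets the PSNR peak value 2**bits - 1 (assumption of this sketch).
    """
    ref = [v for row in p for v in row]
    apx = [v for row in p_hat for v in row]
    count = len(ref)
    diffs = [a - b for a, b in zip(ref, apx)]
    mse = sum(d * d for d in diffs) / count
    sum_sq_ref = sum(a * a for a in ref)
    return {
        "MSE": mse,
        # PSNR is reported as infinite when the images are identical (MSE = 0).
        "PSNR": math.inf if mse == 0 else 10 * math.log10((2 ** bits - 1) ** 2 / mse),
        "NK": sum(a * b for a, b in zip(ref, apx)) / sum_sq_ref,
        "MAE": sum(abs(d) for d in diffs) / count,
        "NAE": sum(abs(d) for d in diffs) / sum_sq_ref,   # denominator as printed
        "AD": sum(diffs) / count,
        "MD": max(abs(d) for d in diffs),
        "SC": sum_sq_ref / sum(b * b for b in apx),
    }

# Hypothetical 2 x 2 images used only to exercise the function.
m = image_metrics([[10, 20], [30, 40]], [[10, 22], [29, 40]])
print(m)
```

For the toy images the pixel differences are 0, -2, 1 and 0, which gives an MSE of 1.25, an MAE of 0.75, an AD of -0.25 and an MD of 2.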

Figure 14 shows the MSE of the addition process using a 16-bit adder; the InXA2 based RCA generates the least MSE, thus confirming for this application the conclusions of the previous section about the superior performance of the proposed design.



Figure 14. MSE results of adding Lena and Tulips using all approximate adders at different values of NAB.



Figure 15. PSNR results of adding Lena and Tulips using all approximation methods with different values of NAB.

Figure 15 plots the PSNR at different NAB values; for NAB values in the interval of 1-3 and for 8 and higher, an InXA2 based RCA performs the best. For NAB values of 4-7, the AMA1 based RCA produces the best PSNR, followed by InXA2. The AXA and InXA3 based RCAs produce the lowest PSNR values for all NAB values.

Figures 16 and 17 show the NAE and SC respectively. Figure 16 confirms the PSNR results by showing that the InXA2 based RCA performs the best, followed by AMA1 for NABs greater than 7. In Figure 17, SC shows that the InXA2 based RCA performs better than AMA1 for all NAB values. However, it also shows that the AMA3, AXA and InXA3 based RCAs perform better than the remaining approximate adders for NAB values of 8 and above.



Figure 16. NAE of adding Lena and Tulips using all approximate adders at different values of NAB.



Figure 17. SC of adding Lena and Tulips using all approximate adders at different values of NAB

Figure 18 depicts MD, while Figure 19 shows the AD for all approximate adders. Again, the InXA2 based RCA produces the least error (Figure 18). The normalized cross correlation is shown in Figure 20.



Figure 18. MD results of adding Lena and Tulips using all approximation methods with different values of NAB.



Figure 19. AD of adding Lena and Tulips using all approximate adders at different values of NAB



Figure 20. NK of adding Lena and Tulips using all approximate adders at different values of NAB



Figure 21. Image addition; (a) Cameraman, (b) Rice, (c) Accurate addition, (d) InXA2.



Figure 22. Cameraman and Rice image addition errors; (a) Accurate addition, (b) Accurate error, (c) AMA1, (d) AMA2, (e) AMA3, (f) AMA4, (g) InXA1, (h) InXA2, and (i) AXA and InXA3.

To further validate the above results, the Cameraman (Figure 21(a)) and Rice (Figure 21(b)) images are also considered. The results of an accurate addition are shown in Figure 21(c); Figure 21(d) shows the resulting image using InXA2, the most efficient inexact adder proposed in this paper. These results confirm the findings previously presented for adding Lena and Tulips, namely that InXA2 yields an excellent image quality. This is pictorially shown in Figure 22 by considering the absolute error generated by all inexact adders; also in this case, it is evident that InXA2 produces the least values of absolute errors (corresponding to a darker image).

# V. CONCLUSION

This paper has presented three new inexact adder cell designs (InXA) that have a lower circuit complexity (in the number of transistors) than other approximate circuits found in the technical literature. The simulation results show that the proposed InXA2 adder outperforms the other proposed inexact adders and previous approximate adders in terms of average energy dissipated and average EDP; the proposed InXA1 adder has the least delay. As for InXA3, although it dissipates less energy than the previous approximate adder cells, it incurs a larger delay.

The analysis is then extended to RCAs using the proposed approximate cells, with image addition as an application. Compared with all previous approximate adders, the InXA2-based RCA has the lowest error rate; the error rates of the InXA1 and InXA3 based RCAs are nearly the same and correspond to the least error rate among approximate adders. The NMED of an InXA2 based RCA is the lowest among all approximate adders, including those based on InXA1 and InXA3; however, the NMED of an InXA1 based RCA is the worst among approximate adders. The mean error distance of the InXA2 based RCA is the lowest among the proposed and previous approximate adders. In conclusion, at both the cell and RCA levels, InXA2 outperforms the other proposed and previous approximate adders.

#### REFERENCES

- [1] R. Hegde and N. R. Shanbhag, "Soft digital signal processing," IEEE Trans. on Very Large Scale Integration Systems, vol. 9, no. 6, pp. 813–823, Jun. 2001.
- [2] N. Banerjee, G. Karakonstantis, and K. Roy, "Process variation tolerant low power DCT architecture," in Proc. Design, Automat. Test Europe, 2007, pp. 1–6.
- [3] D. Shin and S. K. Gupta, "Approximate logic synthesis for error tolerant applications," in Proc. Design, Automat. Test Europe, 2010, pp. 957–960.
- [4] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 1, pp. 124–137, Jan. 2013.
- [5] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Transactions on Computers, vol. 62, no. 9, pp. 1760–1771, 2013.
- [6] H. Jiang, J. Han and F. Lombardi, "A Comparative Review and Evaluation of Approximate Adders," Proc. ACM/IEEE Great Lakes Symposium on VLSI, pp. 343-348, Pittsburgh, May 2015.
- [7] Z. Yang, A. Jain, J. Liang, J. Han, and F. Lombardi, "Approximate xor/xnor-based Adders for Inexact Computing," Proceedings of the IEEE International Conference on Nanotechnology, Beijing, China, August 2013.
- [8] J.-F. Lin, Y.-T. Hwang, M.-H. Sheu, and C.-C. Ho, "A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design," IEEE Transactions on Circuits and Systems—I: Regular Papers, vol. 54, no. 5, May 2007.
- [9] Predictive Technology Model (PTM), http://ptm.asu.edu