# Dynamic Bit-width Adaptation in DCT: Image Quality versus Computation Energy Trade-off \*

Jongsun Park, Jung Hwan Choi, and Kaushik Roy

School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA

{jongsun, choi56, kaushik}@ecn.purdue.edu

## Abstract

We present a dynamic bit-width adaptation scheme for DCT applications that efficiently trades image quality for computation energy. Based on the differing sensitivities of the 64 DCT coefficients, different operand bit-widths are used for different frequency components to reduce the computation energy of the DCT. Numerical results show that our DCT architecture achieves power savings of 36% to 75% compared with normal operation.

### 1 Background

VLSI implementations of digital signal processing algorithms generally use fixed-point arithmetic, because floating-point arithmetic requires more area and power. When choosing the word length of a system, the designer must balance system performance against implementation cost [1]. Careful bit-width selection is therefore required to meet power, area, and performance constraints [2].

In this paper, we propose a dynamic bit-width adaptation scheme for DCT applications that efficiently trades image quality for computation energy. Figure 1(a) shows an example $8 \times 8$ block of image data in the spatial domain, and Figure 1(b) shows the corresponding 2-D DCT output, an $8 \times 8$ block of 64 DCT coefficients. Because of the energy compaction property of the DCT, the signal energy of the output is concentrated in a few low-frequency components, while most of the other components are negligibly small. Figure 1(c) shows the data after quantization: the high-frequency coefficients produced by the forward DCT (FDCT) become even smaller.
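Since the data of Figure 1 is not reproduced here, the following minimal Python sketch illustrates energy compaction and the effect of quantization on a hypothetical smooth block (a diagonal ramp). It computes an orthonormal 8-point DCT-II, builds the 2-D DCT by row-column decomposition, and quantizes with the JPEG luminance table of ITU-T T.81 Annex K. The paper does not say which quantization table it used, so that table, the test block, and all function names are assumptions.

```python
import math

N = 8

# JPEG luminance quantization table (ITU-T T.81, Annex K, Table K.1).
# Assumption: the paper does not specify its quantization table.
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_1d(x):
    """Orthonormal 8-point DCT-II of a length-8 sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """2-D DCT by row-column decomposition; returns F[u][v]."""
    rows = [dct_1d(r) for r in block]  # rows[i][v]: horizontal frequency v
    cols = [dct_1d([rows[i][v] for i in range(N)]) for v in range(N)]  # cols[v][u]
    return [[cols[v][u] for v in range(N)] for u in range(N)]


def quantize(coeffs, table=JPEG_LUMA_Q):
    """Divide each coefficient by its step size and round to an integer."""
    return [[int(round(coeffs[u][v] / table[u][v])) for v in range(N)] for u in range(N)]


# Hypothetical smooth, zero-mean test block (a diagonal ramp).
block = [[16 * (i + j) - 112 for j in range(N)] for i in range(N)]
F = dct_2d(block)
Q = quantize(F)

total = sum(c * c for row in F for c in row)
low = F[0][1] ** 2 + F[1][0] ** 2
print(f"energy in F[0][1] and F[1][0]: {low / total:.1%}")
print("nonzero quantized coefficients:", sum(1 for row in Q for q in row if q != 0))
```

For this ramp, about 98.8% of the energy lands in the two lowest AC coefficients, and only four of the 64 coefficients remain nonzero after quantization.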

Conventionally, the same operand bit-width is used to compute both low-frequency and high-frequency DCT coefficients. However, given the precision already lost in quantization and the differing sensitivities of the DCT coefficients, arithmetic at the maximum operand bit-width is not always required. In our approach, smaller bit-widths are used to compute the high-frequency components and larger bit-widths to compute the low-frequency components. This saves computation power on the perceptually less significant high-frequency coefficients at the expense of slight image quality degradation.

Figure 1. (a) Normalized $8 \times 8$ block of image data. (b) Output of the forward DCT (64 DCT coefficients). (c) Output of the quantization operation.
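Reducing an operand's bit-width by forcing its least significant bits to zero (the mechanism described in Section 3) bounds the error it introduces. The Python sketch below, with hypothetical function names, keeps the `width` most significant bits of a signed `full_width`-bit operand and exhaustively measures the worst-case error. It assumes two's-complement operands and treats a width of 0 as "operand unused".

```python
def zero_lsbs(x, full_width, width):
    """Keep the `width` most significant bits of a `full_width`-bit
    two's-complement operand by forcing its low bits to zero.
    A width of 0 means the operand is not used at all."""
    if not 0 <= width <= full_width:
        raise ValueError("width must lie in [0, full_width]")
    if width == 0:
        return 0
    drop = full_width - width
    # Python ints behave as infinite two's complement, so this floors x
    # to a multiple of 2**drop for negative values as well.
    return x & ~((1 << drop) - 1)


def max_truncation_error(full_width, width):
    """Largest |x - zero_lsbs(x)| over the whole signed input range."""
    lo, hi = -(1 << (full_width - 1)), (1 << (full_width - 1)) - 1
    return max(abs(x - zero_lsbs(x, full_width, width)) for x in range(lo, hi + 1))


for w in (9, 6, 4):
    print(f"9-bit operand kept at {w} bits: max error {max_truncation_error(9, w)}")
```

The error grows as $2^{\text{full\_width}-\text{width}} - 1$ (0, 7, and 31 here). The scheme relies on such errors landing on coefficients that the quantizer divides by large step sizes anyway.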

### 2 Low power dynamic bit-width adaptation in DCT

An $8 \times 8$ 2-D DCT can be implemented with two 1-D DCT units using the row-column decomposition method. In the $8 \times 1$ row DCT, each column of the input data is transformed and the outputs are stored in the transposition memory. A second $8 \times 1$ column DCT then produces the desired 64 DCT coefficients. In normal operation, the row DCT uses 9-bit inputs and the column DCT uses 12-bit inputs. To reduce computation energy under an image quality constraint, we reduce the input bit-widths of the arithmetic units that compute the visually less sensitive high-frequency coefficients. As the operand bit-widths decrease, the power consumed by the DCT decreases, while image quality degradation increases. Based on simulations with 20 different color images, we propose three image quality / computation energy trade-off levels. Figure 2 shows the input bit-widths of the row and column DCTs for the three levels: at higher trade-off levels, the input bit-widths of both the row and column DCTs are reduced, starting from the less sensitive high-frequency components. Table 1 shows the PSNRs of four images at each trade-off level.

| Image   | Normal operation | Level 1 | Level 2 | Level 3 |
|---------|------------------|---------|---------|---------|
| lena    | 34.97            | 34.85   | 33.79   | 31.51   |
| peppers | 36.16            | 35.55   | 33.02   | 30.60   |
| monarch | 36.05            | 35.88   | 34.00   | 31.08   |
| sail    | 34.40            | 34.15   | 32.75   | 30.02   |

Table 1. PSNR (dB) of four color images at each trade-off level.

<sup>\*</sup>This research was funded by Semiconductor Research Corporation.
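Table 1 reports image quality as PSNR. As a reference point, the sketch below computes PSNR from the mean squared error between two images. It assumes 8-bit samples (peak value 255) in a single channel; how the paper combined the three color channels is not stated, and the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows.

    Assumes a single channel of 8-bit samples (peak = 255).
    """
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, reconstructed):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# A PSNR drop of d dB means the MSE grows by a factor of 10**(d / 10).
# For lena, normal operation (34.97 dB) versus level 3 (31.51 dB) in Table 1:
mse_ratio = 10 ** ((34.97 - 31.51) / 10)
print(f"MSE grows by a factor of {mse_ratio:.1f}")
```

The 3.46 dB drop for lena at level 3 therefore corresponds to roughly 2.2 times the mean squared error of normal operation.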

| Output | Row DCT: normal | Row DCT: level 1 | Row DCT: level 2 | Row DCT: level 3 | Column DCT: normal | Column DCT: level 1 | Column DCT: level 2 | Column DCT: level 3 |
|--------|-----------------|------------------|------------------|------------------|--------------------|---------------------|---------------------|---------------------|
| $Z_0$  | 9               | 9                | 9                | 9                | 12                 | 12                  | 12                  | 9                   |
| $Z_1$  | 9               | 9                | 6                | 4                | 12                 | 12                  | 9                   | 6                   |
| $Z_2$  | 9               | 6                | 4                | 4                | 12                 | 9                   | 6                   | 4                   |
| $Z_3$  | 9               | 6                | 4                | 0                | 12                 | 9                   | 6                   | 0                   |
| $Z_4$  | 9               | 6                | 0                | 0                | 12                 | 9                   | 0                   | 0                   |
| $Z_5$  | 9               | 4                | 0                | 0                | 12                 | 6                   | 0                   | 0                   |
| $Z_6$  | 9               | 0                | 0                | 0                | 12                 | 0                   | 0                   | 0                   |
| $Z_7$  | 9               | 0                | 0                | 0                | 12                 | 0                   | 0                   | 0                   |

Figure 2. Input operand bit-widths (bits) in the 2-D DCT for normal operation and the three trade-off levels.
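The bit-width settings of Figure 2 can be held as plain configuration data. The sketch below does that and derives one consequence, assuming a width of 0 means the corresponding 1-D output is forced to zero: a row-DCT output $v$ of width 0 leaves column $v$ of the intermediate block all zero, and a column-DCT output $u$ of width 0 is itself zero, so a 2-D coefficient $F[u][v]$ can be nonzero only if both widths are nonzero. The level-1 column-DCT width for $Z_0$ is taken as 12, the maximum, since $Z_1$ already uses 12 bits at that level. The dictionary and function names are hypothetical.

```python
# Input operand bit-widths from Figure 2, one entry per 1-D DCT output Z0..Z7.
# Level 0 is normal operation. Assumption: width 0 means the output is zero.
# Assumption: the level-1 column-DCT width for Z0 is 12 (the maximum).
ROW_WIDTHS = {
    0: [9, 9, 9, 9, 9, 9, 9, 9],
    1: [9, 9, 6, 6, 6, 4, 0, 0],
    2: [9, 6, 4, 4, 0, 0, 0, 0],
    3: [9, 4, 4, 0, 0, 0, 0, 0],
}
COL_WIDTHS = {
    0: [12, 12, 12, 12, 12, 12, 12, 12],
    1: [12, 12, 9, 9, 9, 6, 0, 0],
    2: [12, 9, 6, 6, 0, 0, 0, 0],
    3: [9, 6, 4, 0, 0, 0, 0, 0],
}


def computed_coefficients(level):
    """Count 2-D coefficients F[u][v] that can be nonzero at a trade-off level.

    F[u][v] survives only if row-DCT output v and column-DCT output u
    both have a nonzero input bit-width.
    """
    rows = sum(1 for w in ROW_WIDTHS[level] if w > 0)
    cols = sum(1 for w in COL_WIDTHS[level] if w > 0)
    return rows * cols


for level in range(4):
    print(f"level {level}: {computed_coefficients(level)} of 64 coefficients computed")
```

Under these assumptions, levels 1, 2, and 3 leave at most 36, 16, and 9 of the 64 coefficients nonzero, each a low-frequency corner of the block.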

### 3 Low power reconfigurable DCT

The proposed image quality / power consumption trade-off was implemented using carry-save adder trees. Figure 3(a) shows the carry-save adder tree that computes $Z_1$, and Figure 3(b) shows its inputs. In the initial configuration, the full-precision bit-width (9 bits) is used. To reduce DCT power consumption, reduced input bit-widths (6 or 4 bits) are used instead of 9 bits by forcing the least significant bits to zero. At these reduced bit-widths, the adders that would process the zeroed bits, shown as the colored part of Figure 3(b), are turned off.
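Why zeroed LSBs let adders be switched off can be seen at the bit level. The following is a minimal sketch of carry-save (3:2) reduction on Python integers, not the exact tree of Figure 3: the grouping order, the operand values, and the helper names are hypothetical. When every operand has its low bits forced to zero, every intermediate sum and carry vector keeps those bits at zero, so the full adders in those bit positions only ever see zero inputs.

```python
def csa(a, b, c):
    """One carry-save (3:2) stage: a + b + c == s + carry, no carry propagation."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def csa_tree(operands):
    """Reduce operands with 3:2 stages until two vectors remain, then do one
    carry-propagate addition. Returns the total and every intermediate vector."""
    vals = list(operands)
    intermediates = []
    while len(vals) > 2:
        nxt = []
        while len(vals) >= 3:
            s, c = csa(vals.pop(), vals.pop(), vals.pop())
            nxt += [s, c]
        nxt += vals  # leftover operands pass through to the next level
        vals = nxt
        intermediates += vals
    return sum(vals), intermediates


def clear_low_bits(x, drop):
    """Force the `drop` least significant bits of x to zero (two's complement)."""
    return x & ~((1 << drop) - 1)


# Hypothetical 9-bit operands for one output; keep 6 bits by zeroing 3 LSBs.
full = [37, -12, 100, 5, -64, 255, 18, -3]
reduced = [clear_low_bits(x, 3) for x in full]

total, vectors = csa_tree(reduced)
print(total == sum(reduced))             # True
# Every sum/carry vector keeps its 3 LSBs at zero, so the full adders in
# those bit positions never toggle and could be gated off.
print(all(v % 8 == 0 for v in vectors))  # True
```

The same argument extends to 4-bit operation (5 zeroed LSBs) and to the 12-bit column DCT.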

Table 2 shows the power consumption of the whole 2-D DCT architecture at each trade-off level. Higher trade-off levels yield larger power savings at the cost of image quality: power drops from 94.05 mW in normal operation to 60.54 mW, 37.70 mW, and 23.8 mW at levels 1, 2, and 3, respectively. The proposed scheme thus allows the bit-width configuration of the DCT architecture to be selected according to the required power saving, achieving considerable power reduction in exchange for a controlled loss of image quality.
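The relative power figures in Table 2, and the 36% to 75% savings quoted in the abstract, follow directly from the measured powers. A small Python check (the names are hypothetical):

```python
# Power figures from Table 2, in mW.
POWER_MW = {"normal": 94.05, "level 1": 60.54, "level 2": 37.70, "level 3": 23.8}


def savings(power, baseline="normal"):
    """Return {level: (relative power, saving)} as fractions of the baseline."""
    ref = power[baseline]
    return {name: (p / ref, 1 - p / ref) for name, p in power.items()}


for name, (rel, save) in savings(POWER_MW).items():
    print(f"{name}: {rel:.0%} of normal power, {save:.0%} saving")
```

This prints 64%, 40%, and 25% of normal power for levels 1 to 3, that is, savings of 36%, 60%, and 75%.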

Figure 3. (a) Balanced adder tree architecture for calculating $Z_1$. (b) Inputs of the carry-save adder tree.

|                              | Normal operation | Trade-off level 1 | Trade-off level 2 | Trade-off level 3 |
|------------------------------|------------------|-------------------|-------------------|-------------------|
| Power (mW)                   | 94.05            | 60.54             | 37.70             | 23.8              |
| Relative to normal operation | 100%             | 64%               | 40%               | 25%               |

Table 2. Power consumption for different trade-off levels.

### 4 Conclusion

We have proposed a low power reconfigurable DCT architecture based on a trade-off between image quality and computation energy. The architecture uses dynamic bit-width adaptation, in which operand bit-widths are changed according to image quality and/or power consumption requirements. Three trade-off levels are defined, and the architecture can be dynamically reconfigured from one level to another, reducing 2-D DCT power consumption by 36% to 75% relative to normal operation. The ideas presented in this paper can assist in the design of DCT algorithms and their implementation for low power applications.

### References

- [1] T. Xanthopoulos and A. Chandrakasan, "A Low-Power DCT Core using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization", IEEE Journal of Solid-State Circuits (JSSC), Vol. 35, No. 5, May 2000.
- [2] C. Nikol, P. Larsson, K. Azadet, and N. O'Neill, "A Low-Power 128-tap Digital Adaptive Equalizer for Broadband Modems", IEEE Journal of Solid-State Circuits (JSSC), Vol. 32, pp. 1777-1789, Nov. 1997.