# Scan Power Minimization Through Stimulus and Response Transformations

Ozgur Sinanoglu and Alex Orailoglu

Computer Science and Engineering Department University of California, San Diego La Jolla, CA 92093

{ozgur, alex}@cs.ucsd.edu

# ABSTRACT

Scan-based cores impose considerable test power challenges due to excessive switching activity during shift cycles. The consequent test power constraints force SOC designers to sacrifice parallelism among core tests, as exceeding power thresholds may damage the chip being tested. Reduction of test power for SOC cores can thus increase the number of cores that can be tested in parallel, improving significantly SOC test application time. In this paper, we propose a scan chain modification technique that inserts logic gates on the scan path. The consequent beneficial test data transformations are utilized to reduce the scan chain transitions during shift cycles and hence test power. We introduce a matrix band algebra that models the impact of logic gate insertion between scan cells on the test stimulus and response transformations realized. As we have successfully modeled the response transformations as well, the methodology we propose is capable of truly minimizing the overall test power. The test vectors and responses are analyzed in an intertwined manner, identifying the best possible scan chain modification, which is realized at minimal area cost. Experimental results justify the efficacy of the proposed methodology as well.

## 1. INTRODUCTION

Attaining parallelism among core tests translates into SOC test application time reductions; yet increased power dissipation elevates the risk of damaging the chip under test. The test power problem is especially acute in a scan-based environment as the shift operations for the delivery of test stimuli and the collection of responses create toggling in scan cells. The consequent switching activity in the core internal logic magnifies test power dissipation.

Test power dissipation during shift cycles can be reduced by decreasing the number of transitions that occur inside the scan cells. The shifting of complementary values in consecutive cycles results in the toggling of each scan cell through which complementary adjacent stimulus bits are to pass. An analogous argument can be drawn in a similar manner for the consecutive bits of responses observed through the scan-out pin. High correlation between consecutive inserted stimulus bits and consecutive observed response bits reflects into reduced scan chain transitions, consequently.

In traditional scan-based test, the stimulus inserted to the scan chain is identical to the test vector delivered into the scan cells; similarly, the response transmitted to the tester is identical to the response captured in the scan cells. In the proposed methodology, we break this equivalence by modifying the scan chain; we propose the insertion of logic gates between the scan cells, transforming over numerous shift cycles the stimulus inserted to the actual test vector and the response captured to the response transmitted, with the number of shift cycles equaling the number of scan cells. As the introduction of the logic gates is confined to within the scan path, it creates no interference with the functional operation of the chip timing-wise, fully preserving SOC performance. The proposed scan chain modification and the consequent test data transformation can thus be utilized to reduce the number of scan chain transitions, subsiding the test power dissipation significantly.

The transformations utilized are restricted to bijective ones, such as XOR gates and inverters, as the delivery of any test vector and the observation of any captured fault effect should be guaranteed. In this paper, we introduce a matrix band algebra to model the impact of any possible XOR/inverter insertion on test data transformation. Such a modeling enables the realization of any possible bijective test data transformation through the insertion of logic gates in appropriate locations. As the complete transformation space can thus be explored globally, optimal power reductions can be attained by the proposed methodology.

As each scan chain modification imposes distinct transformations on stimuli and responses, the proposed algorithm should search for the scan chain modification that minimizes the overall test power. Modeling response transformations though is more complicated compared to modeling stimulus transformations, which have been thoroughly analyzed by previously suggested techniques; these schemes have overlooked the highly challenging modeling issues associated with response transformations, and hence failing to handle scan-out power issues. The increased complexity of response transformation modeling stems from the fact that transformed response bits depend not only on the captured response bits but also on the inserted stimulus bits as well; transformed stimulus bits, on the other hand, depend only on the test vector bits to be delivered. Identifying the connection with the XOR/inverter insertion is more challenging in the case of response transformations, consequently. In this paper, we present a complete analysis, investigating not only the stimulus transformations but also the response transformations, resolving the associated modeling challenges successfully. Based on the outcome of our analysis, we develop an algorithm which handles the stimuli and the responses in an intertwined manner, identifying the best possible scan chain modification and hence minimizing the overall test power.

## 2. PREVIOUS WORK

Numerous methodologies that aim at test power reduction in a scanbased environment have been proposed recently. The utilization of externally controlled gates [1, 2] has been shown to reduce test power drastically, albeit at the expense of functional performance degradation due to additional gate delays introduced on functional paths. Appropriate primary input assignments during the shift cycles [3, 4] help reduce transition propagation from the scan chain to the circuit under test; however, the effectiveness of such techniques is limited, as typically circuits are controlled mostly by scan chains rather than primary inputs. Scan chain partitioning techniques [5, 6] have also been proposed for test power reduction; the scan chain is decomposed into several partitions so as to have only one of the partitions active at a time, reducing scan chain rippling. Test vector ordering and scan-latch ordering techniques [7], modification of test cube compaction [8] and test generation [9] procedures constitute a set of alternative techniques for reducing scan power dissipation. These techniques extract test power reductions yet at the expense of prolonged test application time [8, 9], performance degradation [1, 2],



Figure 1: Impact of inverter insertion

or possible layout constraint violations [7].

A number of scan chain modification techniques for test power reduction [10, 11, 12] have been proposed. These techniques essentially rely on scan chain modifications with block-contained impact on the test data transitions. Constraining the impact of scan chain modifications within blocks delivers algorithmic benefits but necessitates the imposition of certain restrictions on XOR gate insertions, resulting in modifications that are locally optimal but may stray considerably from the global optimum. The technique in [11] modifies the scan chain by utilizing only inverter insertions, while in [12] X(N)OR gate insertions with blockcontained impact are employed. Even though the techniques outlined in [10, 11, 12] exhibit algorithmic simplicity, the restriction to a subset of XOR gate insertion configurations significantly hampers optimality in power reductions, failing to reap appreciable additional power reductions possible.

Scan chain modifications with the block-contained impact constraint eliminated have been proposed in [13, 14]. In these techniques, the impact of any possible XOR/inverter has been successfully modeled, enabling the exploitation of any possible bijective transformation. These schemes have investigated only stimulus transformations; although they have attained optimal scan-in power reductions, they have overlooked the responses, leaving the scan-out power issues unhandled and hence failing to minimize overall test power.

#### 3. MOTIVATION

The identification of the optimal scan chain modification hinges on the modeling of the impact of XOR/inverter insertion on test data transformation. Specifically, the following questions need to be answered. How can the optimal mapping that transforms the given test vectors and responses be identified? How can this transformation be realized through the appropriate scan chain modification? To answer these questions, one needs to examine the impact of logic gate insertion between scan cells.

Inverters have local impact on the stimulus transitions, as the insertion of an inverter between two scan cells complements all the test vector bits that are to pass through it while keeping intact the bits prior to the insertion point. The clustered complementation of the bits preserves the transitions between them except for the single transition between the two stimulus bits that are to be delivered into the scan cells connected through the inverter. An analogous argument can be made for the impact of inverters on the response transitions; the only transition that gets complemented is the one between the two response bits that are captured in the scan cells connected through the inverter. The overall effect is limited to a complementation of the transition frequency at that point only.

A quick look at XOR gate insertion on a scan path reveals the challenges associated with the modeling of the impact of XOR gate insertion though. The example in Figure 2 illustrates the impact of the insertion of XOR gates on the scan path; the actual test vectors which happen to be the stimuli to be inserted to the unmodified scan chain are given in Figure 2.A, while Figures 2.B and 2.C illustrate the modified scan chains along with the stimuli to be inserted. It can be seen that the insertion of an XOR gate as in Figure 2.B eliminates all the transitions between the fourth and the fifth bit positions. The transitions between the first two stimuli bits can be eliminated by the insertion of yet another XOR gate



Figure 2: Scan chain modification

between the first two scan cells as shown in Figure 2.C; however, this modification eradicates partially the benefit of the previous modification as some transitions are re-introduced between the fourth and the fifth bit positions. An algorithm that identifies the optimal scan chain modification necessitates an understanding and modeling not only of the precise impact of the insertion of XOR gates but also of their interference.

Let us examine how a stimulus is transformed into a test vector over a number of shift cycles; in the example in Figure 3, a stimulus of "111011" is transformed into "100101" over 6 shift cycles. This transformation operation can also be formulated in binary matrix multiplication algebra; an upper triangular binary square transformation matrix with each dimension equaling the number of scan cells can be rightmultiplied by a stimulus matrix, to produce the test vector matrix.<sup>1</sup> Each non-zero entry in the transformation matrix denotes the test vector bits to be XORed in forming the transformed stimulus bits. We denote this matrix as *I2D*, as this transformation represents the mapping from the Inserted stimulus to the Delivered test vector. In the same figure, the I2D matrix embeds three discontinuities. It is interesting to note that the ones on the first upper off-diagonal band right before the second and the fifth columns account for the XOR gates inserted before the second and the fifth scan cells, respectively; the discontinuity on the second upper off-diagonal band however cannot be accounted for as straightforwardly.

The transformation mapping the test vectors to the inserted stimuli and the one mapping the captured responses to the observed responses should somehow be related to the XOR/inverter insertions. The former mapping, which we denote as D2I, is simply the inverse of I2D, which we have mentioned in the previous paragraph; D2I is an upper triangular square matrix that can be left-multiplied by a test vector matrix to produce the inserted stimulus matrix. The Captured to Observed response transformation, which we denote as C2O, however, necessitates further analysis, as the observed response bits depend not only on the captured response bits but also on the test vector bits that reside inside the scan cells during shift operations. The C2O matrix is not a square matrix, consequently; the number of rows in C2O equals one less than twice the number of scan cells, while the number of columns equals the number of scan cells. C2O contains one row for each of the stimulus bits except for the leftmost bit, and one row for each response bit. In the upcoming sections, we will show how a relationship can be established between C2Oand I2D, thoroughly explaining the impact of XOR/inverter insertions on the test data transformations realized.

The inverter insertion impact is simpler to model compared to XOR gates, as the inverter insertion impact consists simply of transition fre-

<sup>&</sup>lt;sup>1</sup>A simple transposition argument can be used to show that a lower triangular matrix can be left-multiplied by the stimulus matrix, equivalently. The test stimulus and vector will be denoted in column form in that case.



#### Figure 3: Stimulus transformation operation

quency complementation. Inverters enable handling of XNOR insertions also, by utilizing XOR transformations that map to high transition frequencies, which are then complemented through inverter insertion. The identification of the optimal logic gate insertions necessitates a two phase algorithm. The XOR insertions aim at skewing the transition frequency as close to 0.0 or 1.0 as possible, with the latter case being resolved through an inverter insertion; *maximal transition frequency skew* is the aim of XOR transformations, consequently.

## 4. MATRIX BAND ALGEBRA

Modeling the impact of XOR gate insertion hinges on an in-depth matrix-based analysis. In this section, we introduce an algebraic formulation based on *matrix bands*, engendering such a modeling. The algebraic formulation we present focuses on triangular matrices, as constrained by the unidirectionality of the scan shift operations.

We represent a contiguous subset of the  $d^{th}$  off-diagonal band for an  $n \times n$  matrix composed of a consecutive set of 1's from the *init*<sup>th</sup> column to the rightmost edge of the matrix as the  $B_{init}^d$  band.  $B_{d+1}^d$ , for example, denotes the  $d^{th}$  upper off-diagonal band with a string of 1's starting from the intersection of the  $d + 1^{th}$  column and the first row. Similarly,  $B_1^0$  denotes the full diagonal band, while  $B_n^{n-1}$  denotes the band composed of the single top right corner entry. A  $B_{init}^d$  designation with  $d \ge n$  or *init* > n signifies a null band.

One may be tempted to surmise, albeit incorrectly, that the representational power of this band denotation falls short of being able to represent any upper-triangular matrix, as the band definition constrains the string of 1's to run contiguously until the rightmost matrix column. By taking the XOR of a number of bands, however, any intermixed string of 0's and 1's can be represented; a *band list*, denoted as *BL*, can therefore be introduced to represent any upper-triangular matrix:

$$BL^d_{(l_1, l_2, \dots, l_n)} = \bigoplus_i B^d_{l_i} \tag{1}$$

The band list shown above denotes that strings of 1's start on the positions designated by the odd numbered elements of the list while strings of 0's start on the even numbered ones. Any upper triangular matrix can hence be represented as the collection of various band lists; the *I*2*D* matrix in Figure 3, for instance, can be represented as  $I \oplus BL_{(2,5)}^1 \oplus BL_{(5)}^2$ .

Taking the XOR of  $B_{l_i}^d$  with a collection of other bands results in the complementation of all the off-diagonal band entries between the  $l_i^{th}$  and the rightmost columns. If  $l_i$  exists in a band list, it creates a discontinuity  $(0 \rightarrow 1 \text{ or } 1 \rightarrow 0)$  on the band between the  $(l_i - 1)^{th}$  and  $l_i^{th}$  columns. Consequently, each discontinuity on an off-diagonal band corresponds to a distinct band; the *I2D* matrix in Figure 3 consists of 3 discontinuities, resulting in 3 distinct bands, namely,  $B_2^1$ ,  $B_5^1$ , and  $B_5^2$ , in its band representation.

## 5. MODELING THE IMPACT OF XOR GATES

In this section we present the relationship between XOR gate insertion and the consequent test data transformation through the matrix band



algebra defined in the previous section. First we introduce the matrixbased transformations and explain the relationships between these transformations. We follow this up with an exposition of the relationship between the I2D matrix bands and XOR gate insertion, as this matrix displays the modified scan chain characteristics more explicitly.

#### 5.1 Test Data Transformations

The impact of XOR gate insertion can be modeled as transformation functions. The D2I function maps the test vectors to the stimuli needed at the input of the scan chain; I2D, on the other hand, maps the inserted stimuli to the data delivered into the scan cells, i.e., the test vectors to be applied. The C2O function maps the captured responses to the responses observed by the tester; the observed response bits depend both on the captured response bits and on the subsequent stimulus bits. Figure 4 illustrates these transformations for a scan chain modified through XOR gate insertions; the subsequent stimulus bits are denoted by  $s_i$  in the response transformation in this figure. In this example, the leftmost captured response bit,  $t_1$ , is transformed into  $t_1 \oplus s_5$ , with  $s_5$  denoting the rightmost bit of the subsequent stimulus. Figure 5 provides the corresponding matrix representations for I2D, D2I and C2O in Figure 4.

|                                                                                                                                       |                                                                                                                                       | 020.                                                 |
|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
| I2D:                                                                                                                                  | D2I:                                                                                                                                  |                                                      |
| $\begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$ | $\begin{bmatrix} 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$ | $\begin{array}{cccccccccccccccccccccccccccccccccccc$ |
|                                                                                                                                       |                                                                                                                                       | $0 \ 0 \ 0 \ 0 \ 1$                                  |

Figure 5: Matrix representation for I2D, D2I and C2O in Figure 4 The product of the I2D and D2I matrices equals the identity matrix, denoted by I, as one can verify by performing the multiplication<sup>2</sup> in Figure 5.

$$\mathbb{I}2\mathbb{D} \cdot \mathbb{D}2\mathbb{I} = \mathbb{I} \tag{2}$$

C20

Finding the connection between I2D and the response transformation, C2O, is slightly more complicated though. To understand this relationship, we need to define the Scan Chain Characteristic, denoted as SCC, which represents the transformation function for the complete left-toright traversal of the modified scan chain; SCC denotes the value that a bit shifted all the way from the scan-in pin to the scan-out pin transforms into. It is apparent that the rightmost column of the I2D matrix yields SCC. For any bit position, the composition of the associated I2D and C2O transformations is the same and equals the SCC of the modified scan chain. Intuitively, one can understand this statement by considering the  $i^{th}$  captured response bit which is to be shifted out by traversing all the scan cells at positions greater than i; the consequent transformation function constitutes the C2O function corresponding to the  $i^{th}$  bit.

<sup>&</sup>lt;sup>2</sup>As these matrix multiplications model XOR operations, the inner products are to be computed through XORing the product terms.



Figure 6: The relationship between I2D and C2O

The traversal of the remainder of the scan chain, namely the scan cells at positions smaller than i, corresponds to the I2D of the same bit; the composition of the two transformation functions yields the traversal of the complete scan chain, namely SCC. In the example in Figure 4, the SCC of the modified scan chain constitutes the XOR of a bit with the previous bit. The third captured response bit, for instance, transforms into the XOR of itself and the first response bit. The first stimulus bit is delivered into the first scan cell intact, while the third stimulus bit transforms into the XOR of itself with the first and the second stimulus bits. The result of the composition operation for the third bit itself with the second bit which yields the SCC. The composition operation can be formulated by another matrix multiplication as in Figure 6, establishing the relationship between C2O and I2D. The matrices in Figure 5.

## 5.2 XOR Gate Insertion and Transformations

Mapping a transformation function to the appropriate scan chain modification necessitates an understanding of the impact of XOR gate insertions in the scan chain. In this subsection, we analyze the relationship between XOR gate insertion and the resulting I2D implemented by the modified scan chain.

We first start with the impact of a single XOR gate insertion. The stimulus bits that are delivered into the scan cells without passing through the XOR gates remain intact. The stimulus bits that pass through the XOR gate however are transformed, all identically, if a single XOR gate is being inserted. A single m-input XOR gate insertion between the  $(s-1)^{th}$  and  $s^{th}$  scan cells results in a transformation denoted by:

$$\mathbb{I}2\mathbb{D} = \mathbb{I} \oplus \bigoplus_{i=1}^{m} B_s^{d_i} \tag{3}$$

wherein  $d_i$  denotes the number of scan cells between the XOR gate and the  $i^{th}$  input tap and I, the identity matrix.

In multiple XOR gate insertions, two distinct cases need to be consid-



Figure 7: I2D computation



Figure 8: Computation of D2I and C2O entries

ered. In the first case, the XOR gates inserted overlap; the consequent I2D can be obtained by taking the sum of the individual bands corresponding to each XOR gate. In the second case, the XOR gates are non-overlapping; due to the interference of the XOR gates, the consequent I2D includes the product<sup>3</sup> of the two bands in addition to the individual bands. Figure 7 illustrates a scan chain modified through four XOR gate insertions. In this example, the modifications corresponding to  $B_3^1$  and  $B_4^1$  overlap; there is no other overlapping among the four XOR gates, resulting in a number of product bands with no more than three terms each. The band list representation and the resulting I2D matrix are provided in figure 7 as well.

# 6. ALGORITHMIC FRAMEWORK

Having identified the impact of inverter and XOR gate insertion on test data transformations, in this section we present algorithms for test power minimization through the implementation of the transformation that optimally maps any given set of test vectors and responses. Specifically, the proposed test power minimization algorithms consist of two subroutines; the first one identifies the optimal test data transformation based on the test set, while the second one identifies the scan chain modification with minimal area that implements the optimal transformation.

In the following subsections, IS represents the matrix whose rows are composed of the transformed stimuli and can be obtained by the rightmultiplication of TV, the test vector matrix, by D2I. It can be seen that  $TV \cdot D2I = IS$  and equivalently  $TV = IS \cdot I2D$ . Similarly, OR represents the matrix whose rows are composed of the transformed responses and can be obtained by the right-multiplication of TVR, an augmented matrix consisting of the concatenation of test vector and response bits, by C2O, so as to account for the fact that observed response bits depend on test stimulus and captured response bits. As the transformed response bits depend on all the bits of the subsequent test vector except the leftmost bit, TVR does not contain the leftmost bit of the test vectors. Furthermore, in the augmented matrix, each test response vector is located in the same row with the subsequent test vector.

The transitions between two consecutive IS(OR) columns can be computed by XORing the two columns bitwise, forming the corresponding *transition column*; each transition column entry denotes the boolean expression that represents whether a transition exists between the associated transformed test data bits.

## 6.1 Optimal Test Data Transformation

In this subsection, we present the proposed algorithm for identifying the optimal test data transformation; the algorithm searches for the I2D matrix that yields the transformed test data, IS and OR, with maximized

<sup>&</sup>lt;sup>3</sup>Simple matrix algebra manipulations yield  $B_{i_1}^{d_1} \cdot B_{i_2}^{d_2} = B_{i_2}^{d_1+d_2}$  for the product of two bands.



Figure 9: Computation of transition columns

transition frequency skew. The maximization of the transition frequency skew in IS and OR is performed in an intertwined manner, computing an I2D matrix that minimizes overall test power.

To enable an efficient search for the optimal I2D matrix, the D2I and C2R entries are represented in terms of the I2D entries; the computation of these matrix entries is performed based on the relationships presented in section 5.1. An example that illustrates this computation is provided in Figure 8 for a scan chain consisting of 4 scan cells; the reader can verify that the relationships provided in Equation 2 and Figure 6 are satisfied by the matrices in Figure 8.

The proposed algorithm then proceeds by transforming the test vectors and responses through matrix multiplication operations. The transformed test data entries are computed in terms of I2D entries. The transition columns are subsequently obtained by taking the column-wise XOR of the transformed test data columns. The example in Figure 9 illustrates the computation of the first two transition columns associated with the transformed test data, IS and OR, for the given set of test vectors and responses.

As the optimization criterion consists of transition frequency skew maximization, the I2D entries should be assigned so as to set the entries of each transition column to all 0's or all 1's. Two systems of equations are formed, consequently, one for minimizing and one for maximizing the transition frequency in a transition column. The stimulus and response transition columns are handled in an interleaved manner. To account for the fact that distinct transition columns have distinct contribution to the power cost, the transition columns with larger contributions are handled earlier; a stimulus transition at bit position *i* toggles i - 1 scan cells, while a response transition at the same location toggles N - i - 1 scan cells, with N denoting the number of scan cells. The rightmost stimulus transition column and the leftmost response transition column are therefore handled initially. Subsequent to the assignment of the I2D entries in handling a transition column, the other transition columns are updated.

In each step, the number of equations equals the number of test vectors (responses). The variables in the system consist of the I2D entries corresponding to the column being handled. The solution of a system of equations can be identified through the Gaussian elimination technique [15] if a solution exists. If both systems of equations fail, the I2D entries are assigned, by using a linear dependency based heuristic, so as

| Circuit | Power Reduction (%) | Area Cost (%) |
|---------|---------------------|---------------|
| s713    | 66.3                | 14.2          |
| s953    | 45.2                | 12.1          |
| s1423   | 71.3                | 8.9           |
| s5378   | 69.3                | 9.2           |
| s9234   | 68.7                | 7.3           |
| s13207  | 70.4                | 12.1          |
| s15850  | 73.4                | 11.0          |
| s35932  | 84.5                | 7.2           |
| s38417  | 76.3                | 9.8           |
| s38584  | 78.8                | 10.5          |
| Avg.    | 70.4                | 10.2          |

Table 1: Power reductions attained by the proposed methodology

to satisfy the maximal number of equations; the equations that cannot be satisfied contribute to the power cost. The system of equations which has the largest number of equations satisfied is selected for assigning the I2D column entries.

To account for the stringent area and layout constraints, the proposed algorithm can be slightly modified. Additional constraints may be incorporated into the algorithm by restraining the number of discontinuities in I2D bands incurred when handling the transition columns. As these discontinuities reflect into XOR taps, the incorporation of such a constraint enables the exploitation of the power-area tradeoff.

#### 6.2 Minimal Area Implementation

In this subsection, we provide an algorithm for implementing the I2Dmatrix with minimal area overhead. In Section 5.2, we have shown that overlapping and non-overlapping XOR gate insertions have distinct impact, as an additional product term is introduced in the latter case. The implementation of the I2D matrix at minimal area cost necessitates the selection of the best possible configuration at every step. Yet predicting whether the existence of a product band is beneficial in realizing higher degree bands is a challenging issue; not only the individual impact of the product band but furthermore its interference with other bands should be accounted for as well. We therefore utilize decision variables whenever a decision is to be made regarding whether an XOR tap is to be configured as overlapping or nonoverlapping. As the consequent decision variables become a part of the I2D bands implemented, the existence of some XOR taps introduced in the subsequent steps depends on how these decision variables are assigned. As many conditional XOR taps as possible are eliminated by appropriately setting the decision variables, realizing the minimal area implementation.

The algorithm processes a single I2D band in each step. The bands closer to the diagonal are handled earlier, as the product of a number of lower-indexed bands results in a higher indexed one. After each step, the *I2D* matrix implemented so far by the XOR insertions is updated so as to account for the product bands as well. The band that the algorithm is to implement in the  $i^{th}$  step therefore consists of the difference of the  $i^{th}$  I2D band and the  $i^{th}$  band of the matrix that has been implemented so far by the XOR gate insertions, with the difference computed simply as an XOR operation of the band lists. In every step, the XOR gates that help implement the band being handled are inserted into the scan chain. Whenever a decision is to be made regarding whether an XOR tap is to be configured as overlapping or nonoverlapping, a decision variable is bound to this XOR configuration. The bands implemented by the XOR insertions therefore depend on the decision variables. Subsequent to handling all the bands, the decision variables are assigned so as to minimize the number of XOR taps.

### 7. EXPERIMENTAL RESULTS

The proposed test power reduction scheme has been applied to several fully-scanned circuits in ISCAS89. The fully specified test sets that are used to compute the test power reductions achieved by the proposed

|         |                | Consecutive  | Scan Chain       | Optimal      |          |
|---------|----------------|--------------|------------------|--------------|----------|
| Circuit | Inverters [11] | X(N)ORs [12] | Partitioning [6] | Scan-in [13] | Proposed |
| s713    | 16.0           | 18.7         | 45.5             | 41.3         | 66.3     |
| s953    | 9.7            | 11.6         | 40.3             | 16.3         | 45.2     |
| s1423   | 8.5            | 10.4         | 51.9             | 44.2         | 71.3     |
| s5378   | 8.4            | 10.2         | 38.0             | 43.8         | 69.3     |
| s9234   | 11.2           | 12.5         | 42.6             | 36.1         | 68.7     |
| s13207  | 9.6            | 10.3         | 39.1             | 45.2         | 70.4     |
| s15850  | 13.5           | 16.2         | 53.9             | 47.3         | 73.4     |
| s35932  | 10.1           | 11.3         | 47.9             | 49.2         | 84.5     |
| s38417  | 10.6           | 12.4         | 49.4             | 46.3         | 76.3     |
| s38584  | 11.2           | 13.2         | 50.2             | 48.8         | 78.8     |
| Avg.    | 10.9           | 12.7         | 45.9             | 41.9         | 70.4     |

Table 2: Comparison of test power reduction percentages

methodology are generated by ATALANTA [16].

Table 1 demonstrates the test power<sup>4</sup> reductions attained by our methodology along with the area costs. The second and the third columns depict the test power reduction percentages for fully specified test vectors and the associated area costs, respectively. On the average, more than 70% reduction is attained by the proposed methodology. An interesting observation is that the test power reduction percentages increase as a function of circuit size, as can be seen in the three largest circuits, all exceeding 75% reductions, boding well for real industry SOCs.

Table 2 presents the test power reduction comparisons in the case of fully specified test vectors against various previously proposed techniques [11, 12, 6, 13]. The scan chain modification methodologies in [11, 12] constrain the impact of scan chain modifications within blocks. The experimental results confirm the benefit of eliminating the block-contained impact constraint on XOR gate insertions; the ability to use any possible scan chain modification siginificantly enhances the test power reductions. The proposed methodology is also compared against a scan chain partitioning methodology [6], which attains appreciable test power reductions at the expense of increased test control complexity. This technique constitutes an orthogonal approach that can be applied in conjunction with the methodology we propose. Finally, the proposed methodology is compared against the scan chain modification technique in [13], which attains optimal scan-in power reductions; as scan-out power is not handled in [13], the proposed methodology significantly outperforms [13] in terms of overall test power reductions.

# 8. CONCLUSION

In this paper, we propose a test power reduction methodology for SOC cores, enhancing parallelism among core tests and hence cutting down SOC test time. The methodology we propose is based on the power-wise efficient transformation of a given set of test vectors and responses into a new set of inserted stimuli and observed responses. Such a transformation is realized by inserting bijective gates between the scan cells.

We have individually analyzed the impact of XOR/inverter gate insertion on test stimulus and response transformations realized and the consequent interrelationships between these transformations. We have formulated these relationships through matrix-based equations, which in turn give rise to the implementation of an algorithm that reduces both scan-in and scan-out power simultaneously. The algorithm we propose handles stimulus and response transformations in an intertwined manner, minimizing overall test power.

The methodology we propose has the capability of realizing any bijective test data transformation, as the novel matrix band based algebra we have developed models the impact of XOR/inverter insertion on the actual test data transformation realized. The implementation of the best possible test data transformation through the insertion of logic gates at appropriate locations is thus enabled. The hardware implementation can be performed cost-effectively as the algorithm we have developed is capable of identifying the minimal area cost solution.

To demonstrate the efficacy of the proposed approach, we have applied it on several ISCAS89 benchmark circuits. The experimental results indicate the benefits of the proposed scan chain modification. Especially for larger benchmark circuits, test power reductions up to 85% for fully specified test vectors are attained, strongly favoring the proposed methodology over the previously proposed techniques.

## 9. **REFERENCES**

- H. J. Wunderlich and S. Gerstendorfer, "Minimized power consumption for scan based BIST", in *ITC*, pp. 85–94, 1999.
- [2] R. Sankaralingam and N.A. Touba, "Inserting Test Points to Control Peak Power During Scan Testing", in *DFT*, pp. 138–146, 2002.
- [3] T. Huang and K. Lee, "An input control technique for power reduction in scan circuits during test application", in ATS, pp. 315–320, 1999.
- [4] N. Nicolici, B. M. Al-Hashimi and A. C. Williams, "Minimisation of power dissipation during test application in full-scan sequential circuits using primary input freezing", *IEEE TCOMP*, vol. 47, n. 2, pp. 256–262, 1998.
- [5] L. Whetsel, "Adapting scan architectures for low power operation", in *ITC*, pp. 863–872, 2000.
- [6] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch and H.J. Wunderlich, "A modified clock scheme for a low power BIST test pattern generator", in VTS, pp. 306–311, 2001.
- [7] V. Dabholkar, S. Chakravarty, I. Pomeranz and S. M. Reddy, "Techniques for minimizing power dissipation in scan and combinational circuits during test application", *IEEE TCAD*, vol. 17, n. 12, pp. 1325–1333, 1998.
- [8] R. Sankaralingam, R. R. Oruganti and N. A. Touba, "Adapting scan architectures for low power operation", in VTS, pp. 35–40, 2000.
- [9] R. Sankaralingam, B. Pouya and N.A. Touba, "Reducing Power Dissipation During Test Using Scan Chain Disable", in VTS, pp. 319–324, 2001.
- [10] O. Sinanoglu, I. Bayraktaroglu and A. Orailoglu, "Test Power Reduction Through Minimization of Scan Chain Transitions", in VTS, pp. 166–171, 2002.
- [11] O. Sinanoglu, I. Bayraktaroglu and A. Orailoglu, "Scan Power Reduction Through Test Data Transition Frequency Analysis", in *ITC*, pp. 844–850, 2002.
- [12] O. Sinanoglu, I. Bayraktaroglu and A. Orailoglu, "Reducing Average and Peak Test Power Through Scan Chain Modification", *JETTA*, vol. 19, n. 4, pp. 457–467, 2003.
- [13] O. Sinanoglu and A. Orailoglu, "Modeling Scan Chain Modifications for Scan-in Test Power Minimization", in *ITC*, 2003.
- [14] O. Sinanoglu and A. Orailoglu, "Aggressive Test Power Reduction Through Test Stimuli Transformation", in *ICCD*, 2003.
- [15] G. W. Stewart, Introduction to Matrix Computations, Academic Press, 1973.
- [16] H. K. Lee and D. S. Ha, On the Generation of Test Patterns for Combinational Circuits, Technical Report 12-93, Department of Electrical Eng., Virginia Polytechnic Institute and State University.

<sup>&</sup>lt;sup>4</sup>Test power reductions are computed based on the number of scan chain transitions; it has been shown in [8] that the number of scan chain transitions and the actual test power dissipation are strongly correlated.