# High Speed GaAs Subsystem Design using Feed Through Logic

J. A. Montiel-Nelson, V. de Armas, R. Sarmiento & A. Núñez, Centre for Applied Microelectronics, Univ. of Las Palmas de Gran Canaria, E35017, Spain, montiel@cma.ulpgc.es

S. Nooshabadi, School of Electrical Engineering, Northern Territory University, NT 0909, Australia, saeid@cs.ntu.edu.au

## Abstract

This paper presents the design of fast arithmetic circuits using the GaAs-based Feed Through Logic (FTL) family [1]. A modified version of FTL, termed Differential FTL (DFTL), is introduced, and basic aspects of design methodologies using FTL are discussed. A 4-bit ripple-carry adder is designed and its performance is evaluated against other reported works in terms of device count, chip area, delay, clock rate, and power consumption. The arithmetic circuits based on FTL are shown to outperform the other reported designs. A 4-bit magnitude comparator is also designed, and its performance is evaluated against four cascaded 1-bit comparators.

## 1. Introduction

High performance arithmetic circuits constitute the heart of any digital processor. The performance of the arithmetic circuits is usually determined not only by architectural issues but also by the underlying logic construction. In recent years a number of Gallium Arsenide based arithmetic circuits which employ dynamic logic structures have been designed. These dynamic logic structures have been proposed as an alternative to static logic, mainly to reduce the power dissipation and also as a means to improve the speed of operation.

Designs based upon dynamic domino logic precharge the gate output node to a high voltage during the precharging phase and conditionally discharge it during the evaluation phase [2]-[9]. Charge redistribution, leakage current from the output node and threshold voltage variation are the major problems associated with this class of logic. To circumvent the leakage current problem, various compensating mechanisms have been proposed. They include a Metal-Insulator capacitor [3], a trickle transistor [5], or a weak DMESFET and diode [9].

The optimization of these dynamic logic families, in terms of the sizing of the logic gate and compensating transistors and of the sensitivity to threshold and temperature variation, is a major design task. There are other difficulties with the dynamic designs as well. For example, the dynamic design in [2] needs both negative and positive power supplies. The TDFL design in [6] works on two clock phases, with added design complexity due to sensitivity to clock skew and an area penalty for routing two clock signals. The TTDL design in [5] uses a trickle transistor to compensate for the leakage from the precharged node. The SPDL design in [8] relies on a reference voltage for compensating the leakage current. This requires multiple power supplies, which complicates the design. Furthermore, cascading of domino dynamic blocks requires inverters at the outputs of the logic blocks. Not only do we pay for the extra area, but we also suffer longer circuit delays because of these inverters.

The concept of *Feed Through Logic* (FTL) using GaAs, as introduced in [1], overcomes the common problems associated with the dynamic logic families. In FTL the gate outputs are reset to low during the high phase of the clock. Therefore the problems of leakage current and charge redistribution are completely removed. Furthermore, the need for the output inverters common to domino structures is eliminated. This paper presents the design, implementation and performance evaluation of several arithmetic circuits (a 4-bit ripple-carry adder and 1-bit and 4-bit comparators) using FTL and a modified version termed *Differential FTL* (DFTL). The designs are compared with other published works, demonstrating the superior performance of our novel concept.

## 2. FTL Design Methodology

In this section some of the basic aspects of FTL design methodologies are discussed, and guidelines are provided as to the approach that might be pursued. The best results, however, can be obtained by mixing various techniques. Since FTL gates are fully compatible with DCFL [1], they are included in a DCFL standard library.



Figure 1. (a) Feed Through Logic and (b) Differential Feed Through Logic Structures.

### 2.1. Inverting Feed Through Logic and Differential Feed Through Logic

The block diagrams of FTL and DFTL are illustrated in Figures 1(a) and (b), respectively. FTL and DFTL blocks can be cascaded in a domino fashion without the intervening inverters required in the classical domino logic structures. This makes faster signal propagation with less chip area possible. The circuit latency and the maximum clock frequency change with the number of domino stages. The frequency can be improved by reducing the number of domino stages and inserting pipelining latches between them. Furthermore, since both are inverting logic, they are capable of realizing all logical functions.

However, it is sometimes more convenient to implement a function and its complement together. The extension of FTL to DFTL is straightforward. It also has the added advantage of providing functional blocks for delay-insensitive asynchronous circuits [12]. As seen in Figure 1(b), the true and complement branches can share some logic. This sharing of transistors might result in some area saving and speed improvement, provided the cross connections between the transistors are not excessive.

As described in [1], when the clock signal ( $\phi$ ) goes low, all the FTL outputs (gate outputs Out) charge towards the gate switching threshold voltage ( $V_{TH}$ ), and the effect of the external inputs (INs) is propagated through the gate outputs. Therefore, when valid inputs to each gate are asserted, the gates need only make a partial transition from $V_{TH}$ to low or high. In a similar fashion, the differential outputs in DFTL charge towards $V_{TH}$ . However, a difference in the capacitive loading at the differential outputs (Out and $\overline{Out}$ ) causes one of the differential outputs to rise faster than the other.

In a cascaded gate structure this effect is cumulative and can cause one of the differential outputs to cross $V_{TH}$ and make a full logic swing to the wrong side; it then has to be restored through a full logic swing to its final value at the other end, as depicted in Figure 2. This also retards the rise of the other differential output, which falls back to a voltage below $V_{TH}$ and subsequently has to be restored to its proper logic value through a full logic swing.

Figure 2. Effect of Dissimilar Capacitive Loading in a Cascaded DFTL Chain.

The degree of dissimilarity in the rise times at the differential outputs depends directly on the ratio of the capacitances at the two outputs. The closer this ratio is to unity, the smaller the capacitive imbalance.



Figure 3. Multiple Output Logic Function Iterative Network in DFTL.

### 2.2. Multiple Outputs Logic Function with DFTL

FTL can also be employed in much more complex iterative network structures. An iterative network consists of a one-dimensional array of identical modules [13, 14]. An iterative function S of n vector variables $\{\mathbf{a}_0, \mathbf{a}_1, \dots, \mathbf{a}_{n-1}\}$ can be decomposed, iteratively, into a function P of only 2 variables. Function P is much simpler than S, and the intermediate results are also available as outputs. An iterative array network can be built by organizing several identical FTL blocks in a one-dimensional array. The concept of a *Multiple Outputs Logic Function* (MOLF) [15] using iterative networks is depicted in Figure 3.
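The decomposition of S into repeated applications of a two-input function P can be illustrated with a purely behavioral sketch. The names `iterative_network` and `and2` are hypothetical and chosen only for illustration; the example function (a running AND) is an assumption and is not taken from Figure 3.

```python
from itertools import accumulate
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def iterative_network(p: Callable[[T, T], T], inputs: Iterable[T]) -> List[T]:
    """Apply the two-input cell p along a one-dimensional array of inputs.

    Element k of the result is the output of cell k, so every intermediate
    result is available as an output, as in a multiple-output network.
    """
    return list(accumulate(inputs, p))


def and2(x: int, y: int) -> int:
    """Hypothetical cell function P: a two-input AND."""
    return x & y


# S = AND of all inputs, built from repeated applications of P.
outputs = iterative_network(and2, [1, 1, 0, 1])
print(outputs)      # outputs[k] is the AND of inputs 0..k
print(outputs[-1])  # the last cell delivers S itself
```

The sketch only captures the logical structure; it says nothing about the electrical behaviour (noise margin, loading) of a real EMESFET tree.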

A design based on an iterative network is generally more regular, uses fewer transistors, takes up less area and results in more efficient interconnection routing. Testing of iterative networks is also simpler [16]. These advantages make iterative networks using MOLF attractive in VLSI. In MOLF the deterioration in the noise margin is the limiting factor in determining the length of the EMESFET trees. The noise margin can be improved using an additional cross-coupled regenerative difference amplifier, as shown in the shaded region of Figure 3.

## 3. Design Examples

In order to demonstrate the usefulness of FTL, several circuits were designed and their performance evaluated. These circuits include both cascaded simple functional blocks, such as a ripple-carry adder, and a single complex iterative functional block, such as a magnitude comparator.

### 3.1. 4-bit Ripple-Carry Adder

The circuit diagrams for the carry and sum functions of a 1-bit full adder are illustrated in Figure 4(a) and (b), respectively. As seen, the sum is an FTL block whereas the carry is a DFTL block, because both the carry and its complement are needed. A 4-bit ripple-carry adder was obtained by cascading four such 1-bit adder units; its layout is shown in Figure 5.
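The logical function of the cascaded adder can be summarized with a short behavioral model. This is a sketch of the Boolean behaviour only, under the assumption of a standard full-adder truth table; it does not model the inverting FTL/DFTL gate polarities or the transistor networks of Figure 4, and all function names are hypothetical.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, c_in: int) -> Tuple[int, int, int]:
    """Return (sum, carry, carry_complement) for one bit position."""
    s = a ^ b ^ c_in
    carry = (a & b) | (c_in & (a ^ b))
    return s, carry, carry ^ 1  # the differential block supplies both polarities


def ripple_carry_add(a_bits: List[int], b_bits: List[int],
                     c_in: int = 0) -> Tuple[List[int], int]:
    """Add two little-endian bit lists; return (sum_bits, carry_out)."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry, _carry_n = full_adder(a, b, carry)  # carry ripples to next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int = 4) -> List[int]:
    """Little-endian bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_carry_add(to_bits(9), to_bits(8))
print(from_bits(sum_bits), carry_out)
```

Because the carry of each stage feeds the next, the last sum and carry bits lie at the end of the longest path, which is why the delays quoted for the adder refer to those outputs.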



Figure 4. FTL Circuit Diagrams for 1-bit Full Adder (a) DFTL Carry and  $\overline{Carry}$  Logic and (b) FTL Sum Logic. ( $T_L$ : (10.0  $\mu m \times 1.0 \mu m$ ),  $T_{PU}$ : (2.0  $\mu m \times 2.0 \mu m$ ),  $T_R$ : (8.0  $\mu m \times 1.0 \mu m$ )).

All EMESFET transistors in the logic gates have dimensions (width $\times$ length) of 10.0 $\mu$m $\times$ 1.0 $\mu$m. The pull-up DMESFET transistors are 2.0 $\mu$m $\times$ 2.0 $\mu$m and the reset transistors are 8.0 $\mu$m $\times$ 1.0 $\mu$m. For the performance characterization, all the sum outputs, as well as the last carry and carry-complement outputs, were loaded with two inverters (pull-up EMESFET of 2.0 $\mu$m $\times$ 2.0 $\mu$m and pull-down EMESFET of 10.0 $\mu$m $\times$ 1.0 $\mu$m). In addition, post-layout extracted interstage capacitances (back-annotated wiring parasitic capacitances) of 17 fF and 21 fF for the sum and carry outputs, respectively, were included in the simulation.

The total area was $0.028 \text{ mm}^2$ . The propagation delays for the last sum and carry bits were 195 ps and 120 ps, respectively. The circuit was simulated at frequencies up to 1 GHz and found to be operative at the typical process corner. The power dissipation was 2.7 mW.



Figure 5. Layout for the 4-bit Ripple Carry Adder. The Technology is  $0.6 \ \mu$ m H-GaAsIII from Vitesse.

The 4-bit adder circuit was also simulated with a 100 fF output pad load connected to the sum outputs, as well as to the last carry output. The simulated propagation delays for the last sum and carry bits were 527 ps and 455 ps, respectively. In simulation the circuit was operative up to 0.833 GHz at the typical process corner.

Table 1 compares the performance of FTL with other reported works. As can be observed, FTL has the best performance in terms of delay (195 ps), chip area (0.028 $mm^2$ ) and gate count (8), and its power dissipation is only marginally more than that of TDFL [7].

| Characteristics $\downarrow$ / Logic Family $\rightarrow$ | DCFL [3] | BFL [3] | CCDL [3] | TTDL [5] | TDFL [7] | SPDL [8] | [9]  | FTL   |
|-----------------------------------------------------------|----------|---------|----------|----------|----------|----------|------|-------|
| Area ( $mm^2$ )                                           | 0.32     | 0.75    | 0.70     | 0.55     | 0.16     | 0.45     | -    | 0.028 |
| Gate Count                                                | 62       | 62      | 28       | 15       | 68       | 34       | -    | 8     |
| Device Count                                              | -        | -       | -        | -        | -        | -        | -    | 88    |
| Delay (ns)                                                | 1.40     | 2.0     | 1.1      | 0.8      | -        | 0.5      | 1.34 | 0.195 |
| Frequency (GHz)                                           | -        | -       | -        | -        | 0.770    | -        | -    | 1.0   |
| Power (mW)                                                | 47       | 190     | 96       | 130      | 1.7      | 128      | 4.8  | 2.7   |
| $\Gamma$ ( $mm^2 \times pJ$ )                             | 21       | 285     | 74       | 57       | -        | 29       | -    | 0.01  |

Table 1. Performance Comparisons for the 4-bit Ripple-Carry Adder.

To provide a method for comparison between the various logic families, we define a Figure of Merit ( $\Gamma$ ) as the product of three performance parameters: *chip area*, *power dissipation* and *delay*. The figure of merit for FTL is 0.01 $mm^2 \times pJ$ , approximately 2000 times smaller than that of DCFL.
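The figure of merit can be recomputed directly from the area, power and delay rows of Table 1. The sketch below is only an arithmetic check; the function name `figure_of_merit` and the dictionary layout are hypothetical, and only the families for which all three quantities are reported are included.

```python
def figure_of_merit(area_mm2: float, power_mw: float, delay_ns: float) -> float:
    """Gamma = area x power x delay, in mm^2 x pJ (since mW x ns = pJ)."""
    return area_mm2 * power_mw * delay_ns


# (area in mm^2, power in mW, delay in ns), copied from Table 1.
table1 = {
    "DCFL": (0.32, 47, 1.40),
    "BFL": (0.75, 190, 2.0),
    "CCDL": (0.70, 96, 1.1),
    "TTDL": (0.55, 130, 0.8),
    "SPDL": (0.45, 128, 0.5),
    "FTL": (0.028, 2.7, 0.195),
}

gamma = {name: figure_of_merit(*values) for name, values in table1.items()}
for name, value in gamma.items():
    print(f"{name}: {value:.3g} mm^2 x pJ")
```

The products for DCFL, BFL, CCDL, TTDL and SPDL round to the tabulated 21, 285, 74, 57 and 29. For FTL the unrounded product is about 0.015 $mm^2 \times pJ$ , which Table 1 lists as 0.01; the quoted ratio of roughly 2000 relative to DCFL follows from the rounded entry, and the unrounded values give a ratio closer to 1400.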

Since the high phase of the clock is only used for resetting the output through $T_R$ , we can use an asymmetric clock (< 50% duty cycle) in which the high phase is much shorter than the low phase, thereby increasing the frequency. Simulation results show that it is possible to reduce the high phase of $\phi$ from 400 ps to 200 ps, increasing the frequency to 1.25 GHz.
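The frequency gain from shortening the reset phase follows from simple period arithmetic. The sketch below assumes that the 400 ps high phase belongs to the 1 GHz clock reported earlier, so that the low (evaluation) phase is 600 ps and stays unchanged; this split is an assumption, and the function names are hypothetical.

```python
def clock_frequency_ghz(high_ps: float, low_ps: float) -> float:
    """Clock frequency in GHz for the given high and low phase durations in ps."""
    return 1000.0 / (high_ps + low_ps)


def duty_cycle(high_ps: float, low_ps: float) -> float:
    """Fraction of the period spent in the high (reset) phase."""
    return high_ps / (high_ps + low_ps)


# Assumed split of the 1000 ps period at 1 GHz: 400 ps reset, 600 ps evaluation.
low_ps = 1000.0 - 400.0

f_before = clock_frequency_ghz(400.0, low_ps)  # symmetric-ish clock
f_after = clock_frequency_ghz(200.0, low_ps)   # shortened reset phase

print(f_before, f_after, duty_cycle(200.0, low_ps))
```

Under this assumption the period shrinks from 1000 ps to 800 ps, which matches the 1.25 GHz figure, and the resulting duty cycle is 25%.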

### 3.2. FTL 4-bit Magnitude Comparator

To evaluate the performance of DFTL for the implementation of MOLF structures, an iterative 4-bit comparator was designed and simulated. The results are compared with those for four cascaded 1-bit comparators. Figure 6 is the schematic of the 1-bit comparator in FTL. It has been cascaded, recursively, into the iterative 4-bit comparator of Figure 7. Since the $T_M$ devices are not in the critical path, their size has been made smaller. The performance comparison between the iterative and cascaded versions of the 4-bit magnitude comparator is given in Table 2. Post-layout extracted capacitance values of 23 fF and 34 fF for the cascaded and iterative networks, respectively, were used in the simulation. In addition, each output node was loaded with an inverter (2.0 $\mu$m × 2.0 $\mu$m pull-up and 10.0 $\mu$m × 1.0 $\mu$m pull-down transistors).
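The way a 1-bit comparator cell extends to a 4-bit comparison can be shown with a behavioral model. This is a sketch under assumptions: the cell equations, the most-significant-bit-first scan order and the names `compare_cell` and `compare` are illustrative and are not derived from the transistor networks of Figures 6 and 7.

```python
from typing import Sequence, Tuple


def compare_cell(state: Tuple[int, int], a: int, b: int) -> Tuple[int, int]:
    """One 1-bit comparator cell.

    state = (gt, lt) as decided by the more significant bits; the cell only
    changes the state while the comparison is still undecided.
    """
    gt, lt = state
    undecided = (gt | lt) ^ 1
    gt_out = gt | (undecided & a & (b ^ 1))
    lt_out = lt | (undecided & (a ^ 1) & b)
    return gt_out, lt_out


def compare(a_bits: Sequence[int], b_bits: Sequence[int]) -> Tuple[int, int, int]:
    """Compare two MSB-first bit sequences; return (a > b, a < b, a == b)."""
    state = (0, 0)
    for a, b in zip(a_bits, b_bits):
        state = compare_cell(state, a, b)  # identical cell repeated per bit
    gt, lt = state
    return gt, lt, (gt | lt) ^ 1


def msb_first(value: int, width: int = 4) -> Tuple[int, ...]:
    """MSB-first bit tuple of an unsigned integer."""
    return tuple((value >> i) & 1 for i in reversed(range(width)))


print(compare(msb_first(10), msb_first(6)))
```

The same cell is repeated for every bit position, which is the regularity that the iterative network exploits.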

As seen, the iterative network performs much better than the cascaded form in terms of area and power dissipation; however, due to the larger capacitive load at the output, it is not as fast as the cascaded form. The Figure of Merit for the iterative network is 6.1 times better than that of the cascaded form. The cross-coupled differential network is quite insensitive to variations in the geometrical and technological parameters. A variation of 20% in the threshold voltage, width and length of the devices increases the delay by only 0.23%.



Figure 6. Feed Through Logic 1-bit Comparator. The shaded area is the logic for 1-bit comparator. ( $T_L$ : (10.0  $\mu$ m × 1.0  $\mu$ m),  $T_{PU}$ : (2.0  $\mu$ m × 2.0  $\mu$ m),  $T_R$ : (8.0  $\mu$ m × 1.0  $\mu$ m)).



Figure 7. Differential Feed Through Logic Iterative 4-bit Comparator. The shaded area is the logic for 1-bit comparator. ( $T_L$ : (10.0  $\mu$ m  $\times$  1.0  $\mu$ m),  $T_M$ : (4.0  $\mu$ m  $\times$  1.0  $\mu$ m),  $T_{PU}$ : (2.0  $\mu$ m  $\times$  2.0  $\mu$ m),  $T_R$ : (8.0  $\mu$ m  $\times$  1.0  $\mu$ m))

Table 2. Performance Comparisons for the Iterative and Cascaded Versions of the Magnitude Comparator. The technology is  $0.6 \,\mu$ m H-GaAsIII from Vitesse.

| Characteristics $\downarrow$ / Design $\rightarrow$ | Iterative | Cascaded |
|-----------------------------------------------------|-----------|----------|
| Active Area ( $\mu m \times \mu m$ )                | 166 × 1   | 416 × 1  |
| Area ( $mm^2$ ) $\times 10^{-4}$                    | 85        | 167      |
| Gate Count                                          | 1         | 4        |
| Device Count                                        | 34        | 48       |
| Delay (ps)                                          | 297       | 207      |
| Frequency (GHz)                                     | 0.833     | 1.0      |
| Power (mW)                                          | 0.45      | 1.8      |
| Noise Margin (mV)                                   | 170       | 168      |
| $\Gamma$ ( $mm^2 \times pJ$ ) $\times 10^{-4}$      | 10        | 61       |

## 4. Conclusions

In this paper the concept of GaAs-based Feed Through Logic (FTL) and a modified version of FTL termed *Differential FTL* (DFTL) were employed to design arithmetic circuits. Basic aspects of design methodologies using FTL were discussed, and guidelines were provided as to the approach that might be pursued. The best results, however, can be obtained by mixing various techniques.

A 4-bit ripple-carry adder was designed and its performance evaluated. It was demonstrated that the FTL-based adder design is superior in terms of chip area and speed, with comparable power dissipation, when compared with other logic families.

A 4-bit magnitude comparator was designed and its performance evaluated. The results were compared with those for four cascaded 1-bit comparators. The iterative network performs much better than the cascaded form in terms of area and power dissipation; however, due to the larger capacitive load at the output, it is not as fast as the cascaded form.

## 5. Acknowledgement

The support provided by both the Spanish Interministerial Commission of Science and Technology (CICYT), under the DSIPS (TIC97-0953) project, and the Australian Research Council is gratefully acknowledged.

## References

- [1] J. A. Montiel-Nelson, S. Nooshabadi, and K. Eshraghian, "Gallium Arsenide Based Fast Feed Through Logic (FTL)", in Proc. IEEE Int. Sym. on Cir. & Sys., vol. 3, pp. 1884–1887, June 1997.

- [2] S. I. Long and S. E. Butner, Gallium Arsenide Digital Integrated Circuit Design, McGraw-Hill, 1990.
- [3] D. H. K. Hoe and A. T. Salama, "Dynamic GaAs Capacitively Coupled Domino Logic (CCDL)", *IEEE Jo. of Solid State Circuits*, vol. 26, no. 6, pp. 844–849, June 1991.
- [4] J. H. Pasternak and A. T. Salama, "GaAs MESFET Differential Pass-Transistor Logic", *IEEE Jo. of Solid State Circuits*, vol. 26, no. 9, pp. 1309–1316, Sept. 1991.
- [5] D. H. K. Hoe and A. T. Salama, "GaAs Trickle Transistor Dynamic Logic", *IEEE Jo. of Solid State Circuits*, vol. 26, no. 10, pp. 1441–1448, Oct. 1991.
- [6] K. R. Nary and S. I. Long, "GaAs Two-Phase Dynamic FET Logic: A low Power Logic Family for VLSI", *IEEE Jo. of Solid State Circuits*, vol. 27, no. 10, pp. 1364–1371, Oct. 1992.
- [7] P. S. Lassen, S. I. Long, and K. R. Nary, "Ultralow-Power GaAs MESFET MSI Circuits Using Two-Phase Dynamic FET Logic", *IEEE Jo. of Solid State Circuits*, vol. 28, no. 10, pp. 1038–1043, Oct. 1993.
- [8] O. M. K. Law and C. A. T. Salama, "GaAs Split Phase Dynamic Logic", *IEEE Jo. of Solid State Circuits*, vol. 29, no. 5, pp. 617–622, May 1994.
- [9] V. Chandramouli, N. Michell, and K. F. Smith, "A New, Precharged, Low-Power Logic Family for GaAs Circuits", *IEEE Jo. of Solid State Circuits*, vol. 30, no. 2, pp. 140–143, Feb. 1995.
- [10] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 1996.
- [11] H. Statz, P. Newman, I. W. Smith, R. A. Pucel, and H. A. Haus, "GaAs FET Devices and Circuit Simulation in SPICE", *IEEE Transactions on Electron Devices*, vol. 25, pp. 160–169, 1987.
- [12] G. M. Jacobs and R. W. Brodersen, "Self-Timed Integrated Circuits for Digital Signal Processing Applications", in VLSI Signal Processing III, R. W. Brodersen, Ed. 1988, pp. 197–208, IEEE Press.
- [13] M. D. Ercegovac and T. Lang, Digital Systems and Hardware/Firmware Algorithms, John Wiley, 1985.
- [14] S. L. Lu and M. D. Ercegovac, "Evaluation of Two-Summand Adders Implementation in ECDL CMOS Differential Logic", *IEEE Jo. of Solid State Circuits*, vol. 26, no. 8, pp. 1152–1160, Aug. 1991.
- [15] I. S. Hwang and A. L. Fisher, "Ultrafast Compact 32-bit CMOS Adders in Multiple-Output Domino Logic", *IEEE Jo. of Solid State Circuits*, vol. 24, no. 2, pp. 358–369, April 1989.
- [16] N. J. Jha and Q. Tong, "Testing of Multiple-Output Domino Logic MODL CMOS Circuits", *IEEE Jo. of Solid State Circuits*, vol. 25, no. 3, pp. 800–805, June 1990.
- [17] K. R. Nary and S. I. Long, "Dynamic Latch for High Speed GaAs Domino Circuits", *IEE Electronics Letters*, vol. 28, no. 1, pp. 36–37, January 1992.
- [18] D. H. K. Hoe and A. T. Salama, "Pipelining of GaAs Dynamic Circuits", in *IEEE Int. Sym. on Circuits and Systems*, May 1992, vol. 1.
- [19] Vitesse, Foundry Design Manual, Vitesse Semiconductor Corporation, Mar. 1992.