# **Two-Phase Resonant Clocking for Ultra-Low-Power Hearing Aid Applications**

Flavio Carbognani, Felix Buergin, Norbert Felber, Hubert Kaeslin, Wolfgang Fichtner

Integrated Systems Laboratory, ETH Zurich, Switzerland

carbo@iis.ee.ethz.ch

### Abstract

Resonant clocking holds the promise of trading speed for energy in CMOS circuits that can afford to operate at low frequency, such as hearing aids. An experimental chip with 110k transistors and more than 2500 latches has been designed, fabricated and tested. The measured energy consumption of the design at 0.8 V is 62 $\mu$W/MHz, about 7.5% less than the conventional single-edge-triggered benchmark. Closer analysis reveals that much of the energy saving brought about by resonant clocking at low supply voltages is lost when a CMOS circuit is operated at higher voltages. This is because of the crossover currents that persist for much of a clock period when a circuit is driven from a sine-type clock waveform.

### 1. Introduction

Various portable applications, especially audio-oriented ones like hearing aids, impose tight constraints on both area occupation and energy consumption, but relax the timing requirements. Therefore, low-power VLSI techniques, intended for this market, often go into the direction of trading speed for energy.

As a significant amount of power is spent in the toggling of the clock distribution network, several papers have proposed clocking strategies that are more energy-efficient than traditional single-edge-triggered (SET) one-phase clocking. In [1] a DSP for hearing aids is described, where a non-negligible proportion of energy is saved thanks to the intrinsic skew insensitivity of two-phase level-sensitive clocking compared with SET clocking. Yet, as the parasitic contribution to the overall consumption keeps increasing generation by generation, a turning point is soon expected, where saving buffers for skew balancing will no longer pay off against the additional load of a second clock signal.

In this regard, resonant clocking [2] certainly represents an attractive solution: by recycling energy between an oscillator and the clock net load, it reaches minimum clock distribution consumption.

From the first pioneering publications, like [3], to the most recent works, like [4], the group of W. Athas has brought resonant clocking up to the level of a promising research topic. At the same time, different clock generator circuits have been compared and investigated ([5, 6] among others) to reach reasonable compromises between the energy gains and the unavoidable driver overheads.

Yet few publications deal with application-oriented designs; they often present simple proofs of concept. This work tries to bridge the gap between the aforementioned and other research papers and the actual feasibility of resonant clocking on silicon in VLSI implementations.

After a comprehensive overview of adiabatic switching in Sec. 2, a new theoretical approach to resonant clocking for VLSI design is presented in Sec. 3. A simple set of equations for a straightforward feasibility test is proposed, to determine whether resonant clocking can be applied efficiently to a given design. After that, in Sec. 4 the resonant clocking and the reference implementations of an audio-oriented FIR filter are discussed. A section describing the clock driver and the measurement set-up follows. The last three sections present the results on silicon, a brief outlook on possible improvements and the conclusions.

### 2. Adiabatic vs Conventional Switching

The main cause of dynamic energy dissipation in CMOS circuits is the charging/discharging of the internal node capacitances. In conventional CMOS logic, the power dissipated for charging and discharging a node's capacitance $C$ with frequency $f$ is

$$P_{conv} = fCV^2 = \frac{\omega}{2\pi}CV^2 \tag{1}$$

where $\omega = 2\pi f$ denotes the angular frequency and $V$ the fixed supply voltage. In the case of a clock net, $f$ is the clock frequency $f_{cp}$. The dissipated power is thus largely independent of the waveforms and ramp times involved, and also of the transistors' on-resistances.

Figure 1. Conventional charging (a) and adiabatic charging (b) of a capacitance *C*.

Yet, [7] demonstrates that eq. 1 is only a special case of the following equation, which refers to an arbitrary current generator charging the node capacitance (Fig. 1b):

$$P = \xi \omega \frac{RC}{2\pi T} C V^2 \tag{2}$$

where R is now the sum of the switch and the connection resistances, supposed constant for simplicity, T is the charging time, and  $\xi$  is the source voltage shape factor. According to eq. 2, once a voltage waveform is given, the longer the charging, the more efficient the operation. This is actually what the term "adiabatic" stands for.
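The relation between eq. 1 and eq. 2 can be made concrete with a short numerical sketch. The function names are illustrative, the charging times are hypothetical, and the values of $f$, $R$, $C$ and $V$ are borrowed from figures quoted later in the paper purely as plausible orders of magnitude; $\xi = 1$ assumes the ideal constant-current source.

```python
import math

def p_conventional(f_hz, c_farad, v_volt):
    """Eq. 1: power for charging/discharging C at frequency f from a fixed supply."""
    return f_hz * c_farad * v_volt ** 2

def p_adiabatic(f_hz, r_ohm, c_farad, v_volt, t_charge_s, xi=1.0):
    """Eq. 2: power when C is charged through R over a time T (shape factor xi)."""
    omega = 2.0 * math.pi * f_hz
    return xi * omega * (r_ohm * c_farad) / (2.0 * math.pi * t_charge_s) * c_farad * v_volt ** 2

# Illustrative operating point (assumption): 670 kHz, 120 ohm, 31 pF, 0.8 V.
F, R, C, V = 670e3, 120.0, 31e-12, 0.8

p1 = p_conventional(F, C, V)
for t_charge in (10e-9, 100e-9, 500e-9):  # hypothetical charging times
    p2 = p_adiabatic(F, R, C, V, t_charge)
    # The ratio P / P_conv reduces to xi * RC / T, so it shrinks as T grows.
    print(f"T = {t_charge * 1e9:5.0f} ns  P/P_conv = {p2 / p1:.4f}")
```

Under these assumptions the ratio of eq. 2 to eq. 1 is simply $\xi RC/T$: the dissipation falls in proportion to the charging time, which is the behaviour the term "adiabatic" refers to.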

Yet, in order to exploit the benefits suggested by eq. 2, the voltage source cannot be constant, otherwise it reduces to eq. 1; more than that, standard CMOS cells would require re-design, to be compliant with adiabatic switching.

To sum up, pure adiabatic switching is not feasible and any approximation requires careful cell re-design; when aiming at more conservative solutions, resonant clocking can represent a promising alternative.

### 3. Resonant Clocking in VLSI

Resonant clocking consists in the application of the adiabatic switching concept to the bare clock net. What is needed is only some care in the clock net design, a specific clock generator and the re-design of the sequential cells.

#### 3.1. New Approach to Resonant Clocking

In order to get easy equations to deal with, the whole clock net is simplified to an R-C low-pass filter. This is not far from reality, as all the major contributions to the capacitance come from the input pins of sequential cells. Assuming that the clock net driver is an ideal oscillator, regardless of its realization, the related first-order differential equation can be easily solved and the following set of equations is immediately derived:

$$V_C = \frac{V}{\sqrt{1 + \omega^2 R^2 C^2}} \tag{3}$$

$$\Delta \phi = \arctan[\omega \Delta(RC)] \tag{4}$$

$$P_{sin} = \frac{RV^2}{4(R^2 + \frac{1}{\omega^2 C^2})} \xrightarrow{\omega R C \ll 1} \frac{\omega^2 R C^2 V^2}{4} \tag{5}$$

$$\frac{P_{sin}}{P_{conv}} \stackrel{\omega RC \ll 1}{\longrightarrow} \pi \omega RC \tag{6}$$

where  $V_C$  is the maximum voltage over the capacitance,  $\Delta \phi$  represents the phase shift, and  $P_{sin}$  is the power dissipated in presence of an oscillator.

Eq. 3 serves as a feasibility test: the product $\omega RC$ must be appreciably lower than one, otherwise the clock sine would be filtered out.

The phase shift in eq. 4 corresponds to the clock skew: to keep it low, variations in R or C along different clock tree branches must be limited. The fulfillment of eq. 4 can be particularly critical when the number of sequential cells is large, because no clock buffers are allowed.

Eq. 5 and especially eq. 6 show that the product $\omega RC$ governs not only the feasibility but also the power efficiency of resonant clocking: it must be kept low for resonant clocking to dissipate less than conventional clocking. This imposes a limitation on the achievable clock rate.

Clock tree design for resonant clocking should therefore guarantee low resistive, low capacitive, balanced branches up to the leaf cells, without buffers.
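The feasibility test of eq. 3 to eq. 6 lends itself to a few lines of code. The sketch below is only an illustration: the function names, the example values ($R$ = 100 $\Omega$, $C$ = 50 pF, a 45 ps mismatch in $RC$) and the target ratio of 0.5 are hypothetical and not taken from the chip.

```python
import math

def resonant_feasibility(f_hz, r_ohm, c_farad, delta_rc_s=0.0):
    """Evaluate eq. 3, eq. 4 and eq. 6 for a clock net lumped into one R-C stage."""
    omega = 2.0 * math.pi * f_hz
    omega_rc = omega * r_ohm * c_farad
    delta_phi = math.atan(omega * delta_rc_s)            # eq. 4, in radians
    return {
        "omega_rc": omega_rc,
        "amplitude_ratio": 1.0 / math.sqrt(1.0 + omega_rc ** 2),  # eq. 3: V_C / V
        "skew_s": delta_phi / omega,                     # phase shift expressed as time
        "power_ratio": math.pi * omega_rc,               # eq. 6, valid for omega_rc << 1
    }

def max_clock_frequency(r_ohm, c_farad, target_power_ratio):
    """Largest f for which eq. 6 stays at or below the target P_sin / P_conv."""
    return target_power_ratio / (2.0 * math.pi ** 2 * r_ohm * c_farad)

# Hypothetical clock net: 100 ohm, 50 pF, 1 MHz, 45 ps spread in RC between branches.
metrics = resonant_feasibility(1e6, 100.0, 50e-12, delta_rc_s=45e-12)
f_max = max_clock_frequency(100.0, 50e-12, target_power_ratio=0.5)
print(metrics)
print(f"f_max for P_sin/P_conv <= 0.5: {f_max / 1e6:.2f} MHz")
```

For small phase shifts the skew returned by the sketch is essentially $\Delta(RC)$ itself, which is why balanced branches matter more than the absolute value of $RC$ for timing, whereas the absolute value sets the power ratio.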

#### 3.2. An H-Clock-Tree

An H-clock-tree represents a straightforward solution to the requirements outlined in the previous paragraph. It is basically skew-free, without the need of any additional buffer. The overall resistance is also fundamentally low, thanks to the regular parallel connections in the H-structure. To reduce the capacitive load (and the resistance as well) the clock tree was designed before any other wire connection with the highest available metal layer (the fifth) in the technology.

Figure 2. Implemented H-tree geometric characteristics (upper-left corner of fig. 3).

As standard place&route tools do not allow the automatic routing of H-trees, an object-oriented recursive function was implemented in C++ to determine the coordinates of both the sequential cells and the tree branches. Some assumptions on the size of the tracks were made: they are summarized in fig. 2 on a detail of the die photo (fig. 3). The connection between the clock pad and the center of the H-tree does not affect the global capacitance much, but has a large impact on the resistance, being the only track without a parallel counterpart: for this reason, it has been designed wider than the rest, as shown in the upper right corner of fig. 2.
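The recursive construction can be sketched as follows. This is not the authors' C++ implementation, whose details are not given here: it is a minimal Python illustration of a textbook H-tree, with a hypothetical function name, hypothetical dimensions and no track widths. It also records the wire length from the tree center to every leaf, to show why the structure is skew-free by construction.

```python
def h_tree(cx, cy, half, depth, segments, leaves, wire_len=0.0):
    """Add one 'H' centred at (cx, cy), then recurse into its four tips.

    segments collects ((x0, y0), (x1, y1)) wires; leaves collects
    (x, y, wire length from the root centre) for every leaf position.
    """
    if depth == 0:
        leaves.append((cx, cy, wire_len))
        return
    # Horizontal bar of the H.
    segments.append(((cx - half, cy), (cx + half, cy)))
    for x in (cx - half, cx + half):
        # Vertical bar at each end of the horizontal bar.
        segments.append(((x, cy - half), (x, cy + half)))
        for y in (cy - half, cy + half):
            # Each tip is reached through half a horizontal and half a vertical bar.
            h_tree(x, y, half / 2.0, depth - 1, segments, leaves, wire_len + 2.0 * half)

segments, leaves = [], []
h_tree(0.0, 0.0, 400.0, 3, segments, leaves)  # hypothetical size, arbitrary units

print(len(segments), "segments,", len(leaves), "leaves")
print("distinct root-to-leaf wire lengths:", {length for _, _, length in leaves})
```

Every leaf ends up at the same wire length from the center, so in this idealized model all branches present the same $RC$ and the phase shift of eq. 4 vanishes; residual skew then comes only from layout non-idealities.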

**3.2.1. Residual Clock Skew Verification** Modern backend tools enable the export of a DSPF (Detailed Standard Parasitic Format) file, which contains the estimated parasitic information of the chip. The DSPF file contains a distributed-constant modelling of all the nets, which can be directly simulated in Spectre.

With this methodology, it was possible to verify that the residual worst-case clock skew in the clock net does not exceed a few tens of picoseconds.

The estimated total clock net load in the presented chip is around 31 pF.

#### 3.3. Two-Phase Clocking

Resonant clocking, as discussed in sec. 3.1, is not compatible with the use of flip-flops (FFs). The clock transition time seen by the leaf cells is determined, to a first approximation, by the slope (time derivative) of the sine waveform at the threshold point. Depending on the working frequency and on the amplitude, this transition time can easily exceed 100 ns. At such slow clock ramp times, cascaded FFs are prone to malfunctioning.
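The order of magnitude of the ramp time can be checked with a rough estimate. The sketch assumes a full sine $v(t) = \frac{V}{2}(1 + \sin\omega t)$ swinging between 0 and $V$, which is a simplification of the actual clock waveform, and defines the equivalent ramp time as the full swing divided by the slope at the threshold; the function name and this definition are illustrative choices, not the paper's.

```python
import math

def equivalent_ramp_time(f_hz, v_swing, v_threshold):
    """Full swing divided by the sine slope at the threshold crossing.

    Assumes v(t) = (v_swing / 2) * (1 + sin(omega * t)) and
    0 < v_threshold < v_swing (the slope is zero at the extremes).
    """
    omega = 2.0 * math.pi * f_hz
    s = 2.0 * v_threshold / v_swing - 1.0        # sin(omega * t) at the crossing
    slope = 0.5 * v_swing * omega * math.sqrt(1.0 - s * s)
    return v_swing / slope

# 670 kHz clock, threshold assumed at mid-swing (0.4 V out of 0.8 V).
t_ramp = equivalent_ramp_time(670e3, 0.8, 0.4)
print(f"equivalent ramp time: {t_ramp * 1e9:.0f} ns")
```

With the threshold at mid-swing the estimate reduces to $2/\omega$, several hundred nanoseconds at 670 kHz, and it only gets longer for thresholds away from mid-swing.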

As level-sensitive latches have been found to be fully compatible with these clock specifications, a two-phase clocking strategy has been chosen to implement the chip.



Figure 3. Resonant clocking chip photo, giving prominence to the H-clock-tree.

### 4. The Chip

A 100-tap 16-bit FIR filter, implementing a two-phase resonant clocking (Fig. 3), and a reference design, differing only in the clocking strategy (SET one-phase in Fig. 4), have been integrated in the same  $0.25 \,\mu\text{m}$  process. Being specifically designed for low-frequency portable applications, such as hearing aids, they make use of numerous low-power techniques: a hybrid number representation (sign-magnitude for the multiplication, two's complement for the addition), and latch-based memories among others.



Figure 4. The standard SET benchmark implementation is contoured.

Despite the two-phase clocking, only one H-clock tree has been implemented in the layout, see fig. 3. This was made possible by the fact that the circuit makes extensive use of latch-based register files. 97% of all latches are driven from the master phase. The small slave phase tree has been automatically routed by the back-end tool. The relatively large size of the chip (about 2550 latches and more than 110k transistors) made the clock tree design a challenging issue.

### 5. Clock Generators

The load associated with the two clock nets and with the input clock pins is mainly capacitive. Charging and discharging such capacitances adiabatically means charging and discharging them very slowly, as already stated in sec. 2.

The shape of the driving waveform is not that critical. In [7] it is demonstrated that a constant current, hence a linear voltage ramp, represents the most energy-efficient driver. With such an ideal generator the shape factor in eq. 2 would reduce to unity, whereas other driving waveforms have a $\xi$ greater than one ([7]).

Nevertheless, due to the difficult implementation of such a driver, many papers, [5] among others, have proposed the use of resonant oscillators, forcing sine or half-sine waveforms. These oscillators provide a controlled recycling of energy between the clock capacitance and an inductor, typically an external discrete component. This solution has been chosen to drive the presented chip too.

#### 5.1. 2-Phase Resonant Oscillators

The desired driver is therefore a non-overlapping two-phase resonant oscillator. Circuits that accomplish this task typically include two or four transistors and can be subdivided into two main families ([8]): asynchronous and synchronous drivers. In synchronous drivers, the transistor gates are driven by standard non-overlapping square-wave generators, as in fig. 5. Asynchronous drivers are self-sustained oscillators, as can be obtained from the circuit in fig. 5 by cross-connecting the slave phase with the resonant master phase and vice versa. [8] and other past works conclude that 2-transistor synchronous oscillators are more energy-efficient: for that reason the configuration depicted in fig. 5 has been preferred. The external inductors $L_S$ and $L_M$, and the clock capacitances $C_{S1}$ and $C_{M1}$ would be basically sufficient to get the energy recycling. The external capacitances $C_{S0}$ and $C_{M0}$ have been added to make the working frequency independent of the clock net loads, which are process dependent.

Figure 5. Implemented driver for two-phase resonant clocking.

The driver in fig. 5 operates as follows. When the input slave clock is high, the NMOS transistor M1 is on, the output slave phase is connected to ground and the corresponding inductor $L_S$ is re-charged at constant voltage $V_{dd}$ (and linearly increasing current). When the input slave clock is low, M1 is off and the energy is transferred from the inductor $L_S$ to the parallel capacitances $C_{S0}||C_{S1}$ (neglecting the resistance $R_S$) and vice versa: the output slave phase consequently follows, to a first approximation, an arc of a sine. Then the slave clock goes high again and the operation repeats periodically. The master phase works in a mirrored way. As $C_{S0}$ is chosen significantly larger than $C_{S1}$, $L_S$ and $C_{S0}$ must be such that:

$$\nu = \frac{1}{2\pi\sqrt{L_S C_{S0}}}.\tag{7}$$

where $\nu$ is the frequency of the input square waveform. The output phases are therefore half-sine non-overlapping waveforms. Fig. 6 shows what the real waveforms look like on the oscilloscope.
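Eq. 7 can be turned into a small sizing helper. The external capacitance of 1 nF below is a hypothetical value, since the component values are not stated in the text, and the 31 pF on-chip load is the estimate of sec. 3.2.1 used here as a stand-in for the net capacitance; the function names are illustrative.

```python
import math

def inductance_for(nu_hz, c_ext_farad):
    """Solve eq. 7 for the inductor, given the square-wave frequency and C_0."""
    return 1.0 / ((2.0 * math.pi * nu_hz) ** 2 * c_ext_farad)

def resonance_frequency(l_henry, c_farad):
    """Resonance frequency of an ideal L-C tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

NU = 670e3         # working frequency quoted in sec. 5.2
C_EXT = 1e-9       # hypothetical external capacitance
C_NET = 31e-12     # assumed on-chip clock net load (estimate of sec. 3.2.1)

l_needed = inductance_for(NU, C_EXT)
f_loaded = resonance_frequency(l_needed, C_EXT + C_NET)
print(f"L = {l_needed * 1e6:.1f} uH")
print(f"resonance with the net load included: {f_loaded / 1e3:.1f} kHz")
```

With these assumed values the on-chip load shifts the resonance by only about 1.5%, which illustrates why a large external capacitance makes the working frequency insensitive to the process-dependent net load.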

#### 5.2. Measurement Set-up

The prototype chip has been measured on a digital ASIC tester (HP83000) with the set-up of fig. 7. The tester generates the two non-overlapping 50%-duty-cycle square clock signals; the resonant oscillator "converts" them into half-sine equivalents (see fig. 6), which are fed into the chip. As it generates the driving clock signals, the tester itself can easily provide synchronous input data to the chip. The working frequency is 670 kHz.

Figure 6. Half-sine master (solid) and slave (dashed) phases at the oscilloscope.

### 6. Results

Measurement results show that, at low voltages, resonant clocking is more energy efficient than the benchmark. Fig. 8 shows the energy consumption of the SET one-phase benchmark (dashed curve $\mathbf{A}$) and the resonant clocking design (solid curve $\mathbf{H}$). The two curves cross at around 0.9 V: for larger voltages $\mathbf{A}$ dissipates less, for lower voltages $\mathbf{H}$ is more efficient. Resonant clocking in the test chip is therefore more energy efficient (-7.5%) only at low supply voltages (in this case below about 0.9 V). The reason for this behaviour is outlined in sec. 6.1.

Fig. 9 shows the measured energy dissipated during the charging/discharging of the master phase. Conventional clocking produces a typical staircase; bumps are due to parasitics. In contrast, in resonant clocking the energy is not a monotonically increasing function: it follows the oscillation of the source voltage, enabling extensive energy recycling.

Table 1. Measured energy consumption at 0.8 V.

| Clocking strategy           | Energy [µW/MHz] |
|-----------------------------|-----------------|
| A: SET one-phase reference  | 67              |
| H: resonant clocking design | 62              |

Figure 8. Measured energy consumption at different supply voltages.

During each charging operation, conventional clocking dissipates about 75 pJ: according to eq. 1, the clock load is around 37.5 pF, which is in good agreement with the estimation of sec. 3.2.1 (around 31 pF), considering a typical 6 pF to 9 pF probe parasitic capacitance.

Assuming  $120 \Omega$  series resistance including parasitics, eq. 6 predicts that the power necessary for driving the resonant clock should be lower by a factor of 17 compared with conventional clocking. Actual measurements indicate a ratio of 16. This quick calculation points out the validity of the set of eq. 3 to eq. 6 to evaluate resonant clocking feasibility and efficiency.
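The two figures above can be reproduced from the quoted measurements. The sketch assumes that each charging event dissipates half of the $CV^2$ per cycle given by eq. 1, and that the 75 pJ refers to the 2 V measurement of fig. 9; the variable names are illustrative.

```python
import math

E_CHARGE_J = 75e-12    # energy per charging operation, conventional clocking
V_MEAS = 2.0           # supply voltage of the fig. 9 measurement (assumption)
F_CLK_HZ = 670e3       # working frequency
R_SERIES_OHM = 120.0   # assumed series resistance including parasitics

# Half of C*V^2 is dissipated per charging event, hence C = 2E / V^2.
c_load = 2.0 * E_CHARGE_J / V_MEAS ** 2

# Eq. 6: P_sin / P_conv -> pi * omega * R * C; its inverse is the predicted gain.
omega_rc = 2.0 * math.pi * F_CLK_HZ * R_SERIES_OHM * c_load
gain_factor = 1.0 / (math.pi * omega_rc)

print(f"clock load: {c_load * 1e12:.1f} pF")
print(f"omega*R*C:  {omega_rc:.4f}")
print(f"predicted P_conv / P_sin: {gain_factor:.1f}")
```

The back-calculated load falls within the 37 pF to 40 pF window spanned by the 31 pF estimate plus the probe capacitance, and the predicted gain rounds to the factor of 17 stated above.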



Figure 9. Measured master net energy over time (@ 2 V, $T_{Clk}$ = 1493 ns).

#### 6.1. Supply Voltage and Energy Efficiency

When replacing conventional by resonant clocking, the energy savings are at most equal to the total energy dissipated while charging and discharging the clock nets. In the presented chip, for instance, with an estimated master phase load of about 31 pF and a supply voltage of 0.8 V, resonant clocking could spare, in principle, up to about 20 pJ.
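The upper bound can be set against the measured figures in a few lines. The sketch relies on the identity 1 $\mu$W/MHz = 1 pJ per clock cycle and on the measured consumption of the two designs at 0.8 V (67 and 62 $\mu$W/MHz); the variable names are illustrative.

```python
C_MASTER_F = 31e-12     # estimated master phase load
V_SUPPLY = 0.8          # supply voltage in volts

E_REFERENCE_PJ = 67.0   # design A, uW/MHz, i.e. pJ per clock cycle
E_RESONANT_PJ = 62.0    # design H, uW/MHz, i.e. pJ per clock cycle

# Upper bound on the saving: the full C*V^2 spent per cycle on the clock net.
e_bound_pj = C_MASTER_F * V_SUPPLY ** 2 * 1e12

gain_pj = E_REFERENCE_PJ - E_RESONANT_PJ
relative_saving_pct = 100.0 * gain_pj / E_REFERENCE_PJ
fraction_of_bound = gain_pj / e_bound_pj

print(f"upper bound on saving: {e_bound_pj:.2f} pJ per cycle")
print(f"measured gain: {gain_pj:.0f} pJ per cycle ({relative_saving_pct:.1f}% of design A)")
print(f"fraction of the bound actually realized: {fraction_of_bound:.2f}")
```

Only about a quarter of the theoretically recoverable clock energy shows up as a net saving, which motivates the analysis of the remaining losses.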

The reason why the gain is reduced to 5 $\mu$W/MHz, or 5 pJ per clock cycle (see tab. 1), is only partially due to the series resistance in the H-clock-tree and to the presence of parasitics and non-idealities in the clock driver and in the measurement set-up.

A large energy overhead is actually caused by the cross-over current consumption inside standard latches. A significant drawback of resonant clocking, which has never been addressed in the literature so far, is the inflated clock ramp time that triggers the leaf cells. Because of that, the inverters that are normally present at the clock input of standard latches draw large cross-over currents whenever the clock toggles.

The same reason is at the basis of the cross-point in the curves of fig. 8. What makes design **H** more sensitive to the supply voltage is precisely the very large cross-over current energy contribution inside latches, whose dependence on the supply voltage is more pronounced than that of the switching energy. In design **A**, conventional clocking with clock buffers minimizes the ramp time and the contribution of the cross-over currents to the dynamic dissipation. For this reason, design **A** consumption is less sensitive to supply voltage variations. More exhaustive explanations can be found in [9].

### 7. Outlook

In order to preserve the energy savings along the clock net (fig. 9), while reducing the cross-over current overhead discussed in the previous paragraph, latch re-design appears mandatory. A candidate circuit is presented in [9].

### 8. Conclusions

The following conclusions can be drawn:

- 1. A complete design flow featuring resonant clocking and including a skew-free H-clock-tree has been established. One of the largest designs (about 2550 latches and 110k transistors) implementing resonant clocking has been integrated on silicon.
- 2. A new consistent theoretical approach to resonant clocking design has been developed and tested on a prototype chip.

- 3. It has been shown (Fig. 8) that resonant clocking is more energy efficient (-7.5%) than SET one-phase clocking in a low-voltage low-frequency audio application.
- 4. It has been shown that cross-over currents do represent a significant limitation to the energy efficiency of resonant clocking, particularly when latch re-design is not undertaken.

### References

- [1] P. Mosch, G. van Oerle, S. Menzl, N. Rougnon-Glasson, K. van Nieuwenhove, and M. Wezelenburg, "A 660-μW 50-Mops 1-V DSP for a hearing aid chip set," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1705–1712, Nov. 2000.
- [2] A. J. Drake, K. J. Nowka, T. Y. Nguyen, J. L. Burns, and R. B. Brown, "Resonant clocking using distributed parasitic capacitance," *IEEE J. Solid-State Circuits*, vol. 39, pp. 1520–1528, Sept. 2004.
- [3] W. C. Athas, N. Tzartzanis, N. Svensson, and L. Peterson, "A low-power microprocessor based on resonant energy," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1693–1701, Aug. 1997.
- [4] W. Athas, N. Tzartzanis, W. Mao, L. Peterson, K. Lal, K. Chong, J. S. Moon, L. Svensson, and M. Bolotski, "The design and implementation of a low-power clock-powered microprocessor," *IEEE J. Solid-State Circuits*, vol. 35, pp. 1561– 1570, Nov. 2000.
- [5] J. S. Moon, W. C. Athas, S. D. Soli, J. T. Draper, and P. A. Beerel, "Voltage-pulse driven harmonic resonant rail drivers for low-power applications," *IEEE Trans. VLSI Syst.*, vol. 11, pp. 762–777, Oct. 2003.
- [6] M. Arsalan and M. Shams, "Charge-recovery power clock generator for adiabatic logic circuits," in *Proc. IEEE International Conference on VLSI Design (VLSID'05)*, Kolkata, India, Jan. 2005, pp. 171–174.
- [7] A. P. Chandrakasan and R. W. Brodersen, *Low Power Digital CMOS Design*. Norwell, MA: Kluwer Academic Publishers, 1995.
- [8] H. Mahmoodi-Meimand and A. Afzali-Kusha, "Efficient power clock generation for adiabatic logic," in *Proc. IEEE International Symposium on Circuits and Systems (ISCAS'01)*, Sydney, Australia, May 2001, pp. 642–645.
- [9] F. Carbognani, F. Buergin, N. Felber, H. Kaeslin, and W. Fichtner, "Two-phase clocking and a new latch design for low-power portable applications," in *Proc. Power and Timing Modeling, Optimization and Simulation (PATMOS'05)*, Leuven, Belgium, Sept. 2005, pp. 446–455.