# Common-Centroid Layouts for Analog Circuits: Advantages and Limitations

Arvind K. Sharma<sup>1</sup>, Meghna Madhusudan<sup>1</sup>, Steven M. Burns<sup>2</sup>, Parijat Mukherjee<sup>2</sup>, Soner Yaldiz<sup>2</sup>, Ramesh Harjani<sup>1</sup>, Sachin S. Sapatnekar<sup>1</sup>

<sup>1</sup>University of Minnesota, Minneapolis, MN

<sup>2</sup>Intel Corporation, Hillsboro, OR

Abstract—Common-centroid (CC) layouts are widely used in analog design to make circuits resilient to variations by matching device characteristics. However, CC layout may involve increased routing complexity and higher parasitics than other alternative layout schemes. This paper critically analyzes the fundamental assumptions behind the use of common-centroid layouts, incorporating considerations related to systematic and random variations as well as the performance impact of common-centroid layout. Based on this study, conclusions are drawn on when CC layout styles can reduce variation, improve performance (even if they do not reduce variation), and when non-CC layouts are preferable.

#### I. INTRODUCTION

Over the years, analog designers have developed a set of widely-followed best practices to make designs less sensitive to absolute variability (which is hard to control), and more sensitive to differential variability (which is easier to limit). An essential ingredient of these practices is the use of layout techniques [1] to reduce the process-induced differential mismatch between elements in a layout. Structures with both active (e.g., differential pairs, current mirrors) and passive (e.g., resistor or capacitor arrays) devices are matched for this reason.

For transistor arrays, we examine the technique of common-centroid (CC) layout, with a focus on FinFET technologies. We gauge the effectiveness of CC in combating process variations in transistor arrays, and examine the tradeoffs involved in using CC. We overview the nature of on-chip variations, along with a set of variation models. We examine the impact of CC and other layout styles on layout-dependent effects (LDEs), which have become increasingly important in recent technology generations [2]. We study the circuit-level impact on nominal performance and on performance variations, identifying scenarios where CC can be beneficial or suboptimal. We show that larger layouts can benefit from the use of CC. For smaller layouts, we demonstrate that while some designs may not need CC layout, in other cases, CC may result in improved performance compared to other layout patterns. Finally, we present an algorithm for selecting an optimized layout pattern from a given set of layout patterns.

## II. COMMON-CENTROID LAYOUT

A CC layout of k devices places the $s_i$ segments of each device i so that their centroids coincide. For example, the layout of the differential pair (DP) shown in Fig. 1 could be organized into an array of two devices, A and B (i.e., k = 2), with $s_A = s_B = 2$ segments,<sup>1</sup> each built as a unit cell with multiple FinFETs. The CC technique lays out devices in a 1-D or 2-D array such that the centroids match in each dimension. We denote the location of segment *i* of device *j* as $(x_i^j, y_i^j)$. For the DP in Fig. 1, this condition is met using the "ABBA" sequence. Similarly, a 2-D CC layout pattern is symmetric around both the X- and Y-axes.

Fig. 1: CC and ID FinFET layout of a $2 \times$ differential pair.

This work is supported in part by the DARPA IDEA program, as part of the ALIGN project, under SPAWAR Contract N660011824048.

*Interdigitated* (ID) layouts alternate the placement of unit cells of each device, as shown in the 1-D "ABAB" layout in Fig. 1. Interdigitated schemes do not have a common centroid for the devices: in the figure, the centroid for the A cells lies to the left of that for the B cells. CC layouts are widely used and are considered to be better for matching process-induced variations than alternatives such as ID patterns.

The rationale for using CC layouts is that they cancel out linear systematic variations due to first-order process gradients. A variation $\Delta p$ in process parameter p induces a small perturbation, $\Delta P$, in the circuit performance parameter P. This can be modeled using a first-order Taylor series expansion, $\Delta P = S_p \Delta p$, where $S_p = \partial P / \partial p$ is the sensitivity at the nominal point. Using the centroid as the origin, the variations are modeled by a plane $\Delta p = \alpha \cdot x$, where x is the horizontal coordinate measured from the center of the layout and the slope $\alpha$ is unknown, since the precise gradient of the variation during manufacturing cannot be predicted. Substituting this into the Taylor expansion gives

$$\Delta P = \alpha S_p \cdot x \tag{1}$$

i.e., for linear variations (constant $\alpha$), the performance shift $\Delta P$ of each device is a linear function of its location x.

<sup>1</sup>In analog design, devices are divided into segments called fingers [1]. For FinFET technologies, transistors may be split into *unit cells* of multiple FinFETs.

Under this linear assumption, the CC criterion ensures that the variations summed over the segments of each device cancel out. In Fig. 1, let us say that p represents the threshold voltage and P the drain current. Since $\Delta p = \alpha \cdot x$, the parameter p of device A is shifted by $-2\alpha$ for the leftmost segment and $+2\alpha$ for the rightmost segment with respect to the value at the centroid. From Eq. (1), the drain currents shift by $-2\alpha S_{p,A}$ and $+2\alpha S_{p,A}$, adding up to a net shift of zero. Similarly, the currents in the segments of B shift by $-\alpha S_{p,B}$ and $+\alpha S_{p,B}$, also creating a net shift of zero. A similar argument justifies CC in 2-D layouts.
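The cancellation argument can be checked numerically. The sketch below is illustrative only: it assumes unit cells on a uniform pitch with coordinates measured from the array center in units of that pitch, so the offsets are ±1.5 and ±0.5 rather than the ±2 and ±1 used in the text. The helper names (`positions`, `centroid`, `net_shift`) are hypothetical.

```python
def positions(pattern):
    """Map each device label to the x-coordinates of its unit cells.

    Assumption: cells sit on a uniform pitch and x = 0 is the array center.
    """
    n = len(pattern)
    coords = {}
    for i, label in enumerate(pattern):
        x = i - (n - 1) / 2  # center the array on x = 0
        coords.setdefault(label, []).append(x)
    return coords


def centroid(xs):
    """Centroid of the unit cells of one device."""
    return sum(xs) / len(xs)


def net_shift(xs, alpha=1.0, sensitivity=1.0):
    """Sum of Delta P = alpha * S_p * x (Eq. (1)) over one device's cells."""
    return sum(alpha * sensitivity * x for x in xs)


if __name__ == "__main__":
    for pattern in ("ABBA", "ABAB"):
        cells = positions(pattern)
        for label in sorted(cells):
            xs = cells[label]
            print(pattern, label, "centroid:", centroid(xs),
                  "net shift:", net_shift(xs))
```

Under these assumptions, both devices in "ABBA" have a centroid at the origin and a zero net shift for any slope, whereas in "ABAB" the centroids of A and B sit on opposite sides of the origin and the net shifts have opposite signs.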

# III. MODELING ON-CHIP VARIATIONS

# A. A Taxonomy of Process Variations

Process-induced variations can be categorized as either *systematic* variations, which can be modeled predictably, or *random* variations, which can only be represented statistically. These variations can also be classified according to their provenance: **Across-die** (or global) variations affect all devices on a chip similarly, and do not cause mismatch between devices on a die. These are well modeled using process corners.

**Within-die** variations affect devices differently based on their location on a chip and result in differential mismatch. Within-die systematic variations are often modeled by linear gradients [3], while random variations are modeled with distributions. Random variations have uncorrelated and spatially correlated components characterized by a correlation distance [4]–[7]. Uncorrelated variations, e.g., due to dopant fluctuations or line edge roughness, can be reduced using larger devices [8].

# B. Variation in Nanometer-Scale Technologies

Analog circuit performance is predicated on reducing the differential variability, or *mismatch* between devices. Pelgrom's model [8] quantifies the mismatch in a parameter P of two devices as the sum of random variables corresponding to the uncorrelated component, u, and a spatially correlated component, s. The variance of the mismatch is given by

$$\sigma_{\Delta P}^2 = \sigma_u^2 + \sigma_s^2 \text{ where } \sigma_u^2 = A_P^2/(WL) \text{ ; } \sigma_s^2 = S_P^2 r^2 \quad (2)$$

where  $A_P$  and  $S_P$  are technology-dependent proportionality constants, W and L are the device width and length, respectively, r is the distance between the devices, and  $\sigma^2$  denotes a variance. The first component depends on the transistor area and its impact can be reduced using large-sized transistors; the second can be mitigated by reducing the distance between devices.

We use a variation model [4], [5] for nanometer technologies, modeling a process parameter P as a sum of global (g), uncorrelated (u), and spatially-correlated (s) components.

$$\Delta P = g + u + s \tag{3}$$

The mean of  $\Delta P$  is zero, and its variance is  $\sigma_P^2 = \sigma_g^2 + \sigma_u^2 + \sigma_s^2$ . At a distance r, the correlation functions are:

$$\rho_g(r) = 1; \ \rho_u(r) = 0; \ \rho_s(r) = e^{-(r/R_L)^2} \tag{4}$$

where  $R_L$  is the **correlation distance** for the process, formally defined as the distance at which the correlation  $\rho_s(r)$  goes down to a factor of 1/e, or 37%, of  $\rho_s(0)$ . For devices *i* and *j* separated by a distance *r* [4],  $\operatorname{cov}(P_i, P_j) = \sigma_g^2 + \rho_s(r)\sigma_s^2$ . The correlation coefficient  $\rho(r)$  between  $P_i$  and  $P_j$  is:

$$\rho(r) = \frac{\operatorname{cov}(P_i, P_j)}{\sigma_{P_i}\sigma_{P_j}} = \frac{\sigma_g^2 + \rho_s(r)\sigma_s^2}{\sigma_P^2} \tag{5}$$

Thus, the global component, g, is fully correlated, regardless of distance, while the uncorrelated component, u, limits the maximum correlation at r = 0 to $(\sigma_g^2 + \sigma_s^2)/\sigma_P^2$, since $\rho_s(0) = 1$. As r increases, $\rho_s(r)$ decreases, and for large r, spatial correlations vanish and the correlation coefficient saturates at $\sigma_g^2/\sigma_P^2$. A plot of $\rho(r)$, similar to industry data [4], is shown in Fig. 2(a).
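As an illustration of the two limits described above, the following sketch evaluates Eqs. (4) and (5), reading the numerator of Eq. (5) as $\sigma_g^2 + \rho_s(r)\sigma_s^2$ to match the covariance expression. The variance values and $R_L$ used in the example are placeholders chosen for readability, not data from the paper.

```python
import math


def rho_s(r, r_l):
    """Spatial correlation function of Eq. (4)."""
    return math.exp(-(r / r_l) ** 2)


def rho(r, var_g, var_u, var_s, r_l):
    """Correlation coefficient of Eq. (5) for two devices a distance r apart."""
    var_p = var_g + var_u + var_s  # total variance sigma_P^2
    return (var_g + rho_s(r, r_l) * var_s) / var_p


if __name__ == "__main__":
    # Hypothetical variances (arbitrary units) and correlation distance (um).
    var_g, var_u, var_s, r_l = 1.0, 1.0, 2.0, 10.0
    for r in (0.0, 1.0, 10.0, 100.0):
        print(f"r = {r:6.1f} um  rho = {rho(r, var_g, var_u, var_s, r_l):.4f}")
```

With these placeholder values, the correlation starts at $(\sigma_g^2 + \sigma_s^2)/\sigma_P^2 = 0.75$ for r = 0 and approaches $\sigma_g^2/\sigma_P^2 = 0.25$ for $r \gg R_L$.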



Fig. 2: (a) Process correlation as a function of distance and (b) its corresponding semivariogram (adapted from [4]).



Fig. 3: Threshold voltage mismatch between devices i and j due to (a) uncorrelated and (b) spatially correlated variations.

We visualize the mismatch between two devices, i and j, due to variations. The random component, u, (Fig. 3(a)) is the first component of Pelgrom's model and shows no spatial pattern. Spatially correlated variations, s, (Fig. 3(b)) corresponding to the second term of Pelgrom's model, affect devices differently based on their location on the die. These maps are consistent with published industry data [5]. We discuss our approach to generate these maps in the next sub-section.

# C. Random Fields and Semivariograms

Based on the distributions of Section III-B, we analyze the impact of variations on analog layouts using Monte Carlo methods. Each instance of the Monte Carlo simulation is based on a sample of the distribution, and corresponds to a two-dimensional random field. We borrow ideas from geostatistics to use Kriging methods [9] to build random fields based on a correlation map similar to Fig. 3. For a spatial function  $F(\mathbf{p})$  with mean zero and constant variance  $\sigma^2$  over the region, a *semivariogram*  $\gamma$  is defined as half the average squared difference of two samples. For points at a distance r in a stationary spatial random variable in an isotropic region:

$$\gamma(r) = \frac{1}{2}E\left[(F(r) - F(0))^2\right] \tag{6}$$

For an ergodic spatial field F, it is easy to see that  $\gamma(r) \ge 0$  $\forall r$ , and  $\gamma(r=0) = 0$ . A semivariogram is described by its:

- sill, or its limiting value,  $\lim_{r\to\infty} \gamma(r)$ , which equals  $\sigma^2$ .
- nugget, the value at $r = 0^+$, i.e., $\gamma(r = 0^+) = \frac{1}{2} E\left[(F(0^+) - F(0))^2\right]$. If F has a spatially independent component, this value is nonzero.

The plot of  $\rho(r)$  in Fig. 2(a) is referred to as a correlogram, and can be related to the semivariogram as follows:

$$\gamma(r) = (1 - \rho(r))\sigma^2 \tag{7}$$

In our work, we use GSTools [10], a Python package that generates 2-D spatial random fields corresponding to a prescribed semivariogram, parameterized by its sill, nugget, and correlation length. Using the notation of Section III-B, we set  $F = \Delta P$  with mean zero and variance  $\sigma_P^2$  (Fig. 2(b)):

$$\text{Sill} = \lim_{r \to \infty} \gamma_P(r) = \sigma_P^2 - \sigma_g^2 = \sigma_u^2 + \sigma_s^2; \qquad \text{Nugget} = \sigma_u^2; \qquad \text{Correlation length} = R_L$$
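Substituting Eqs. (4) and (5) into Eq. (7) with $\sigma^2 = \sigma_P^2$ gives $\gamma_P(r) = \sigma_u^2 + \sigma_s^2\,(1 - e^{-(r/R_L)^2})$ for $r > 0$, from which the nugget and sill follow. The sketch below evaluates this expression directly. It does not call GSTools, whose covariance-model parameterization may differ, and the numerical values are placeholders.

```python
import math


def semivariogram(r, var_u, var_s, r_l):
    """gamma_P(r) obtained from Eq. (7) with the correlations of Eqs. (4)-(5).

    gamma_P(0) = 0 by definition; for r > 0 the uncorrelated component
    contributes the nugget var_u, and the spatial term grows toward var_s.
    """
    if r == 0:
        return 0.0
    return var_u + var_s * (1.0 - math.exp(-(r / r_l) ** 2))


if __name__ == "__main__":
    # Hypothetical variances (arbitrary units) and correlation distance (um).
    var_u, var_s, r_l = 1.0, 2.0, 10.0
    for r in (0.0, 1e-6, 5.0, 10.0, 50.0):
        print(f"r = {r:8.2e} um  gamma = {semivariogram(r, var_u, var_s, r_l):.4f}")
```

The value just above r = 0 equals the nugget $\sigma_u^2$, and for $r \gg R_L$ the function saturates at the sill $\sigma_u^2 + \sigma_s^2$.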

# IV. IMPACT OF LAYOUT ON PERFORMANCE

The use of CC layout has an impact on resistive parasitics. In FinFET technologies, where wire resistances can be significant, interconnect parasitics may alter circuit performance. CC layout aims to minimize the variance of circuit performance metrics, but we will show that CC may degrade absolute performance. We analyze two issues that affect analog performance: layout-dependent effects (LDEs) and routing parasitic mismatch.



Fig. 4: Clustered, CC, and ID layouts with routing connections.

# A. Layout-Dependent Effects

LDEs induce shifts in transistor performance parameters stemming from relative position in the layout. One of the most profound LDEs is caused by the length of diffusion (LOD) effect [2], [11], whereby the stress on a transistor, and hence its  $V_{th}$ , varies with the length of the diffusion region. In bulk technologies, stress mismatch is induced by shallow trench isolation (STI). In FinFETs, the distance of the transistor from the fin edge determines  $\Delta V_{th}$ , which is proportional to LOD.

The impact of LOD [2], [11] is described by two geometric parameters, SA and SB, defined as the distance from poly-gate to the diffusion/active edge on either side of the device. Fig. 4 illustrates SA and SB for the left-most unit cell A of the top layout. For a device of gate length  $L_g$ , and n unit cells [12]:

$$V_{th} \propto \frac{1}{\text{LOD}} = \sum_{i=1}^{n} \left( \frac{1}{\text{SA}_i + 0.5L_g} + \frac{1}{\text{SB}_i + 0.5L_g} \right) \qquad (8)$$

Fig. 4 shows three layouts (clustered, ID, and CC), each experiencing different LOD. Unit cells in the clustered layout are symmetric for LOD, i.e., SA [SB] for the leftmost unit cell A is the same as SB [SA] for the rightmost cell B, resulting in the same LOD. A similar observation is made for the ID layout, but in the CC layout, from Eq. (8), LOD for the inner B cells differs from that for the outer A cells, causing mismatch.
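The symmetry argument can be reproduced with a small numerical sketch of Eq. (8). The geometry here is an assumption made for illustration: four unit cells in a single row share one diffusion region, gates sit on a uniform pitch, and the `pitch`, `edge`, and `l_g` values (in nm) are placeholders rather than design-rule numbers.

```python
def inv_lod_terms(n, pitch, edge, l_g):
    """Per-cell summand of Eq. (8) for n cells sharing one diffusion region.

    Assumption: cell i has SA = edge + i * pitch to the left diffusion edge
    and SB = edge + (n - 1 - i) * pitch to the right diffusion edge.
    """
    terms = []
    for i in range(n):
        sa = edge + i * pitch
        sb = edge + (n - 1 - i) * pitch
        terms.append(1.0 / (sa + 0.5 * l_g) + 1.0 / (sb + 0.5 * l_g))
    return terms


def inv_lod_per_device(pattern, pitch=54.0, edge=30.0, l_g=14.0):
    """Sum the Eq. (8) terms over the unit cells of each device in a pattern."""
    terms = inv_lod_terms(len(pattern), pitch, edge, l_g)
    totals = {}
    for label, term in zip(pattern, terms):
        totals[label] = totals.get(label, 0.0) + term
    return totals


if __name__ == "__main__":
    for name, pattern in (("clustered", "AABB"), ("ID", "ABAB"), ("CC", "ABBA")):
        totals = inv_lod_per_device(pattern)
        print(f"{name:9s} 1/LOD(A) = {totals['A']:.5f}  1/LOD(B) = {totals['B']:.5f}")
```

Under these assumptions, the clustered and ID patterns give identical 1/LOD sums for A and B because each A cell has a mirror-image B cell, while in the CC pattern the outer A cells and inner B cells give different sums.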

## B. Routing Parasitic Mismatch

From Fig. 4, the CC layout inherently shows a mismatch between the length of the drain/source connections (and hence the wire parasitics) for devices A and B. No such mismatch is seen for the ID or clustered layout. This mismatch can also be seen in the CC vs. ID layout in Fig. 1. In FinFET technologies, where the wires have significant resistance, this can be a significant performance issue. Due to restricted design rules that may specify wire directions (horizontal or vertical) in a given layer, detours for parasitic matching are not possible, and moving to another layer involves vias that cause large resistance jumps, making resistance matching even harder.

The impact of parasitic mismatch and LOD is more critical for smaller devices. For larger devices, these effects can be avoided by changing device placement, e.g., in Fig. 4, mismatch can be reduced by using two rows of transistors, with A and B swapped in the second row, to ensure that both LOD and routing parasitics for A and B match even for CC.

To capture the impact of interconnect parasitics for differential structures such as DPs, a useful performance metric is the effective transconductance,  $G_m$ , defined as the sensitivity of the output current,  $I_{out}$ , to the input voltage,  $V_{in}$  [13]:

$$G_m = \frac{\partial I_{out}}{\partial V_{in}} = \frac{g_m(v_{in} - v_s)}{v_{in}} = \frac{g_m(v_{in} - i_{ac}R_s)}{v_{in}} \quad (9)$$

where  $v_{in}$  and  $v_s$  are the small-signal input and source voltages, respectively;  $g_m$  is the transistor transconductance;  $R_s$  is the parasitic resistance from the transistor source to its AC ground node (the point where small-signal currents cancel); and  $i_{ac}$  is the small-signal current through  $R_s$ .

For a DP, each unit cell of device A carries a positive small-signal current of magnitude $I_{UA}$ and device B carries a negative small-signal current of magnitude $I_{UB}$. The locations of AC ground and the AC currents in a DP are annotated in Fig. 4. For the clustered pattern, the current through $R_s$ increases from the leftmost/rightmost unit cell (A/B) to the AC ground, and is larger than for CC or ID. Consequently, $v_s$ is higher and $G_m$ is lower (from (9)) than for CC or ID.

The effective small-signal currents through  $R_s$  are very similar for CC and ID, but due to  $R_s$  mismatch between device A and B, the CC pattern is inferior to the ID pattern [13]. Thus, the ID layout provides the best  $G_m$ , the CC layout is next best, and the clustered layout is the worst of the three.
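To make the dependence of $G_m$ on the current through $R_s$ concrete, the sketch below evaluates Eq. (9) under a simplifying assumption that is not taken from the paper: the current through $R_s$ is written as $i_{ac} = m \, G_m v_{in}$, where the hypothetical multiplier `m` stands for how many times the device's own small-signal current flows through the shared source wire. Solving Eq. (9) then gives $G_m = g_m / (1 + m \, g_m R_s)$. The component values are placeholders.

```python
def eq9(g_m, r_s, v_in, i_ac):
    """Direct evaluation of Eq. (9): G_m = g_m * (v_in - i_ac * R_s) / v_in."""
    return g_m * (v_in - i_ac * r_s) / v_in


def effective_gm(g_m, r_s, m=1.0):
    """Closed form of Eq. (9) assuming i_ac = m * G_m * v_in (illustrative)."""
    return g_m / (1.0 + m * g_m * r_s)


if __name__ == "__main__":
    g_m = 1e-3   # transistor transconductance in siemens (placeholder)
    r_s = 50.0   # parasitic source resistance in ohms (placeholder)
    for m in (0.0, 1.0, 2.0, 4.0):
        print(f"m = {m:.0f}  G_m / g_m = {effective_gm(g_m, r_s, m) / g_m:.4f}")
```

In this simplified picture, a pattern that pushes more small-signal current through the same $R_s$ (larger `m`) has a lower $G_m$, which matches the qualitative ordering of the clustered pattern relative to CC and ID. The sketch does not model the $R_s$ mismatch between A and B that separates CC from ID.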

## C. Selection of a Layout Pattern

Having discussed the impact of CC, ID, and clustered layout patterns on performance, we now present a methodology, summarized in Algorithm 1, for selecting an optimized layout pattern for a given set of device sizes and a correlation distance. The inputs to the algorithm are the technology parameters,  $R_L$  and  $A_{V_{th}}$ , and a set of N candidate layout patterns (e.g., clustered, CC, ID) for placing M devices.

The core idea of the algorithm is to first sort the patterns according to the level of deterministic mismatch in Step 1; then to use the small-signal current through  $R_s$  as a tie-breaker in Step 2. This orders patterns from the most- to the least-preferred for deterministic mismatch. Finally, in Step 3, we traverse this list till we reach the first pattern where spatial variations s are a small fraction of random variations u: this is chosen as the optimal pattern. Next, we detail each step.

**Algorithm 1** Selection of an optimized pattern

1. Input: $R_L$; $A_{V_{th}}$; Threshold voltage gradient $\alpha_{V_{th}}$; M devices to be matched (sizes $(W_1, L_1), \dots, (W_M, L_M)$); N layout patterns $P = (P_1, P_2, \dots, P_N)$
2. Output: Optimal pattern in P
3. // Step 1: Sort the layout patterns based on mismatch
4. for i = 1 to N do
5. S = NULL; Z = NULL
6. for j = 1 to M do // Over all devices
7. $(C_X, C_Y)$ = CC point for the device j in $P_i$
8. $S.add(C_X, C_Y)$
9. Q = LOD for the device j in $P_i$ using (8)
10. $Z.add(Q)$
11. end for
12. $\Delta = \max(Z) - \min(Z)$ // Deterministic mismatch
13. $D_{max}$ = max(Distance between the CC points stored in S)
14. $U.add(P_i, \Delta, D_{max})$
15. end for
16. Sort U in ascending order of $\Delta$
17. // Step 2: Sort U for effective small-signal current
18. Calculate the effective small-signal current through $R_s$ for patterns in U
19. Sort patterns in U with same $\Delta$ in ascending order of effective small-signal current
20. // Step 3: Compare random and spatial variations
21. $\sigma_u^2 = \sum_{i=1}^M A_{V_{th}}^2 / (W_i L_i)$ // Calculate random variations
22. for T in sorted patterns U do
23. Calculate spatial variations $\sigma_s(r = D_{max})$ using (10)
24. if $\sigma_s(r = D_{max}) \leq \epsilon \cdot \sigma_u$ then
25. Pattern T is the best among the layout patterns in P
26. break
27. end if
28. end for

**Performance shift due to mismatch** For each layout pattern, the maximum distance between the common-centroid points of the devices (line 7) and the total LOD mismatch (Eq. (8)) for each device (line 9) are computed. The largest deterministic mismatch, $\Delta$, for a pattern (line 12), and the largest distance, $D_{max}$, between the common-centroid points (line 13) are then determined. The LOD matching criteria also ensure the parasitic matching at the source/drain terminals of the devices in a layout pattern. In principle, the mismatch due to process gradient (Eq. (1)) for the layout can also be incorporated into this shift, but in practice it is much smaller than the LOD effect.

Thereafter, the patterns are sorted (line 16) in increasing order of mismatch, $\Delta$. For the DP in the previous subsection, the clustered (P1) and ID (P3) patterns do not have LOD mismatch, but the CC pattern (P2) does. After sorting, the clustered and ID patterns come before CC at the end of Step 1.

**Small-signal current through** $R_s$ Next, as a tie-breaker, the layout patterns with the same mismatch are sorted based on the small-signal current through $R_s$ in the pattern (lines 18–19).

**Comparing variations** Next, we compute the variance, $\sigma_u^2$, of random variations using Pelgrom's model (line 21), and for the layout patterns sorted in Step 2, the spatial variations $\sigma_s$ are calculated using (10) (line 23). We traverse the sorted list of patterns and pick the first pattern for which the spatial variations are small compared to the random variations, i.e., within an allowable fraction $\epsilon$ (e.g., 1%) of the random variations. This pattern is chosen as the best in *P* (line 25).
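The three steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Pattern` record, its field names, and the choice to return `None` when no pattern passes the test are assumptions, and the per-device centroids, LOD values, and small-signal currents are taken as precomputed inputs rather than extracted from a layout.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Pattern:
    name: str
    centroids: list  # one (x, y) centroid per device, in um
    lod: list        # one LOD value per device, from Eq. (8)
    i_ac: float      # effective small-signal current through R_s


def select_pattern(patterns, sizes, a_vth, r_l, eps):
    """Pick a pattern following the three steps of Algorithm 1.

    sizes is a list of (W, L) per device; a_vth, r_l, and eps are the
    Pelgrom coefficient, correlation distance, and allowed ratio.
    """
    # Step 1: deterministic mismatch and largest centroid separation.
    ranked = []
    for p in patterns:
        delta = max(p.lod) - min(p.lod)
        d_max = max((math.dist(a, b)
                     for a, b in combinations(p.centroids, 2)), default=0.0)
        ranked.append((delta, p.i_ac, d_max, p))

    # Steps 1-2: sort by mismatch, breaking ties by small-signal current.
    ranked.sort(key=lambda entry: (entry[0], entry[1]))

    # Step 3: first pattern whose spatial variation is small next to sigma_u.
    var_u = sum(a_vth ** 2 / (w * l) for w, l in sizes)
    sigma_u = math.sqrt(var_u)
    for _, _, d_max, p in ranked:
        var_s = 2.9 * var_u * (1.0 - math.exp(-(d_max / r_l) ** 2))  # Eq. (10)
        if math.sqrt(var_s) <= eps * sigma_u:
            return p
    return None  # assumption: the source does not define this case


if __name__ == "__main__":
    # Hypothetical two-device example; all numbers are placeholders.
    candidates = [
        Pattern("clustered", [(-1.0, 0.0), (1.0, 0.0)], [1.0, 1.0], 2.0),
        Pattern("CC", [(0.0, 0.0), (0.0, 0.0)], [1.3, 1.0], 1.0),
        Pattern("ID", [(-0.5, 0.0), (0.5, 0.0)], [1.0, 1.0], 1.0),
    ]
    sizes = [(2.3, 0.014), (2.3, 0.014)]
    for r_l in (1000.0, 10.0):
        best = select_pattern(candidates, sizes, a_vth=1.0, r_l=r_l, eps=0.01)
        print(f"R_L = {r_l} um ->", best.name if best else None)
```

Sorting on the tuple (Δ, current) performs the primary sort and the tie-break in one pass. A pattern whose device centroids coincide has $D_{max} = 0$ and therefore always passes the Step 3 test, so with at least one CC candidate the function returns a pattern.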



Fig. 5: (a) A CM layout in two different patterns. (b) Copying current mismatch as a function of  $D (R_L = 10 \mu \text{m})$ .

Fig. 5(a) shows two layout patterns for a CM with two devices ( $W/L = 2.3\mu$ m/14nm) A and B. Both patterns show no LOD mismatch between A and B. The centroids for devices A and B coincide for the CC pattern (Fig. 5(a)), irrespective of spacing D, but not for the clustered pattern. For  $R_L = 10\mu$ m, Fig. 5(b) shows the trend of copying current variance with D. For  $D < 1\mu$ m =  $0.1R_L$ , a clustered pattern is preferred as it has lower parasitics and the same variance as CC. For  $D > 1\mu$ m, CC layout is necessary to cancel spatial variations.

# V. RESULTS AND DISCUSSION

#### A. Spatial Field Generation

We use public-domain models to model process variations and analyze the impact of layout patterns on performance. All circuit simulations use HSPICE and commercial 12nm FinFET device models. We use spatial random fields [10] for $V_{th}$ variations to generate the $V_{th}$ semivariogram as follows:

**Variances of the components of**  $\Delta P$ : We use data from [14] to obtain  $\sigma_u$ . From [4],  $\sigma_s/\sigma_u = 1.7$  at  $r \gg R_L$ . Thus,

$$\sigma_s^2(r) = 2.9 \ \sigma_u^2 (1 - e^{-(r/R_L)^2}) \tag{10}$$
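The coefficient in Eq. (10) follows from $1.7^2 \approx 2.9$. The sketch below evaluates the resulting ratio $\sigma_s(r)/\sigma_u$ at a few separations. Using a layout dimension such as 5.2µm as the separation r is only a rough proxy chosen for illustration.

```python
import math


def sigma_ratio(r, r_l):
    """sigma_s(r) / sigma_u from Eq. (10)."""
    return math.sqrt(2.9 * (1.0 - math.exp(-(r / r_l) ** 2)))


if __name__ == "__main__":
    for r_l in (1000.0, 10.0):      # the two correlation lengths used here
        for r in (1.0, 5.2, 20.0):  # illustrative separations in um
            print(f"R_L = {r_l:6.0f} um  r = {r:5.1f} um  "
                  f"sigma_s/sigma_u = {sigma_ratio(r, r_l):.4f}")
```

For separations far below $R_L$ the ratio is close to zero, so uncorrelated variations dominate, while for $r \gg R_L$ it approaches $\sqrt{2.9} \approx 1.7$.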

**Correlation length**: Data on the correlation length is inconsistent in the literature. Therefore, we show results using two values from commercial processes: $1000\mu$m [5], and $10\mu$m [6].

**Die-to-die variations**: Global variations *g* affect all devices on a die uniformly, and therefore do not affect mismatch.

We also model gradient-based variations, representing the gradient in the  $V_{th}$  across the chip, based on [15].

#### B. SPICE Simulations using Spatial Fields

We first apply our methodology to analyze DPs, CMs, and CM banks (CMBs). Device matching in these blocks is critical for performance and they have conventionally employed CC layouts. We analyze the performance of these building blocks with different layout patterns, and then move to larger circuits that use these building blocks: a five-transistor operational transconductance amplifier (OTA) and StrongARM comparator. Layouts for these circuits are generated using ALIGN [16]. To ensure that we accurately capture the impact of mismatch on performance, all simulations are based on post-layout RC-extracted netlists. For each layout, using these extracted RC parameters, we perform 1000 Monte Carlo simulations and show the mean  $\mu$  and standard deviation  $\sigma$  of circuit performance parameters over these trials.

*1) Basic Analog Blocks:* Fig. 6(a) shows a schematic of a DP, CM, and current mirror bank (CMB), sized as follows:

**DP and CM**: Transistors in the DP have $W/L = 46\mu m/14nm$ and those in the CM have $W/L = 18.4\mu m/14nm$. The overall layout areas are $3.1\mu m \times 5.2\mu m$ for the DP and $1.6\mu m \times 5.2\mu m$ for the CM: these sizes are based on the sizing of the OTA that they will be inserted into in Section V-C.

**CMB**: The structure has 256 transistors: 16 devices (M1-M16), each with 16 unit cells. Each transistor has  $W/L = 18.4 \mu m/112 nm$ , and the layout size is  $20 \mu m \times 21.5 \mu m$ .

Three layout patterns are used: a clustered (non-CC) layout, CC, and ID. The layouts for the DP and CM are shown in Fig. 6(b), where the unit cells/fingers of M1 and M2 are labeled A and B, respectively. For the CMB, the clustered pattern resembles P1 and the ID pattern interleaves the 16 devices (M1–M16), similar to P3. The CC layout scheme is taken from [17].



Fig. 6: (a) Schematics and (b) layout patterns of DP/CM/CMB.

*2) Performance Drift due to On-Chip Variations:* We simulate the performance of the DP, CM, and CMB layouts for each random trial. As representative performance metrics, for the DP we use the *input-referred offset* (i.e., $\Delta V_{gs}$ when $\Delta I = 0$) and the change in *copying current ratio* for the CM/CMB (for the CMB, taking the largest change over all devices).

TABLE I: Variability performance of the DP, CM, and CMB for three layout patterns (P1: Clustered, P2: CC and P3: ID).

|    | DP: Input-referred offset |  |  | CM: Copying current ratio |  |  | CMB: Copying current ratio |  |  |
|----|--------|--------|--------|--------|--------|--------|--------|--------|--------|
|    | $g + u + s$; $R_L = 1000\mu$m | $g + u + s$; $R_L = 10\mu$m | $u$ only | $g + u + s$; $R_L = 1000\mu$m | $g + u + s$; $R_L = 10\mu$m | $u$ only | $g + u + s$; $R_L = 1000\mu$m | $g + u + s$; $R_L = 10\mu$m | $u$ only |
| P1 | 1.4mV | 3.2mV | 1.3mV | 4.6% | 5.3% | 4.5% | 1.6% | 31.5% | 1.5% |
| P2 | 1.3mV | 1.3mV | 1.4mV | 4.6% | 4.6% | 4.5% | 1.5% | 3.1% | 1.5% |
| P3 | 1.4mV | 1.4mV | 1.4mV | 4.5% | 4.6% | 4.6% | 1.5% | 15.2% | 1.6% |

Table I shows the variation of these metrics for the two process correlation distances discussed above. Based on the confidence interval for 1000 Monte Carlo simulations, the data is rounded to one decimal place. For $R_L = 1000\mu$m, the CC, ID, and clustered layouts have similar stdev for the input-referred offset and copying current ratio when all components of variation (g + u + s) are considered. Examining each component of variation: global die-to-die variations, g, do not affect mismatch on a die. For this $R_L$, spatially correlated variations, s, affect all unit cells to the same degree because the layout size is well below $R_L$. Over the small layout size, gradient-based variation is also minimal [15]. Random uncorrelated variations, u, thus dominate all other components, and variations for the large $R_L$ match a separate Monte Carlo simulation considering u only, shown in the table.

At $R_L = 10 \mu m$, it is seen that the CC layouts have the same performance as at $R_L = 1000 \mu m$. However, the clustered layout shows a shift in the stdev of about $2.3 \times$ in the input-referred offset, and $1.2 \times$ and $20 \times$ in the copying current for the CM and CMB, respectively. This is understandable as their dimensions are comparable to $R_L$, and the absence of any cancellation in the clustered layout, unlike the CC layouts, leads to a larger stdev in performance. The ID layouts for the DP and CM are close to the CC layouts (the centroids of the devices are close) and therefore result in similar performance at $R_L = 10 \mu m$. However, for the CMB, the ID pattern results in a larger mismatch in the centroids of the devices (due to 16 different devices) and shows a shift of about $10 \times$ in the copying current.
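The quoted shift factors are consistent with the ratios of the $R_L = 10\mu$m and $R_L = 1000\mu$m entries of Table I for the same pattern. The short check below assumes that same-pattern baseline, which the text does not state explicitly. The dictionary name `table_i` and its keys are hypothetical labels for the table entries.

```python
# Stdev entries from Table I as (R_L = 1000 um, R_L = 10 um) pairs.
table_i = {
    ("DP", "P1"): (1.4, 3.2),    # input-referred offset, mV
    ("CM", "P1"): (4.6, 5.3),    # copying current ratio, %
    ("CMB", "P1"): (1.6, 31.5),  # copying current ratio, %
    ("CMB", "P3"): (1.5, 15.2),  # copying current ratio, %
}


def shift_factor(block, pattern):
    """Ratio of the stdev at R_L = 10 um to that at R_L = 1000 um."""
    large_rl, small_rl = table_i[(block, pattern)]
    return small_rl / large_rl


if __name__ == "__main__":
    for key in table_i:
        print(key, f"{shift_factor(*key):.2f}x")
```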

*3) Performance Drift due to LDE and Parasitics:* We examine the impact of parasitics and LOD on circuit performance.



Fig. 7: (a) Impact of parasitics on DP and CM performance (b) Improvements in DP  $G_m$  using multiple parallel wires.

**Current mirrors** Fig. 7(a) shows the copying current mismatch of clustered, CC, and ID layouts of a CM with four unit cells for each device and $W/L = 2.304 \mu m/14 nm$. (In Fig. 7(a), "Ideal" represents schematic simulation.) It is helpful to view the layouts in Fig. 4 to understand these results.

For the clustered layout P1, placing all A and B unit cells together results in the lowest routing parasitics, with a symmetric route connecting the sources/drains (Fig. 4) that ensures a good  $V_{GS}$  match for the devices. For the ID layout P3, the unit cells of A and B are distributed uniformly, resulting in equal wire lengths/parasitics at source/drain. However, since the unit cells are spread over a larger area, wire length/parasitics are higher than P1. As stated earlier, the clustered and ID layouts result in equal LOD for A and B, providing good performance.

For the CC layout P2, the unit cells of A are farther apart than those of B. This results in a mismatch in wire parasitics between A and B at the source and drain, causing  $V_{GS}$  mismatch. As stated earlier, it is difficult to create matched routes in FinFET technologies where only unidirectional routes are permitted and via resistances are significant. Moreover, the LOD is also different for A and B. Consequently, there is a significant mismatch in the copying current ratio.

**Differential pairs** The last column of the table in Fig. 7(a) shows the impact of layout patterns on the performance of a DP with device $W/L = 46\mu m/14nm$. We evaluate the effective transconductance, $G_m$, defined in (9). As described qualitatively in Section IV-B, $G_m$ is degraded to a greater extent in clustered layouts than in ID or CC layouts due to the larger small-signal current through $R_s$, and the ID layout has the lowest small-signal current through $R_s$. This is quantitatively confirmed in the table, where the ID layout (P3) comes closest to the ideal $G_m$ that assumes $R_s = 0$. Even when $G_m$ is improved by using multiple parallel wires at the source to reduce $R_s$, Fig. 7(b) shows that P3 is superior to P1 and P2.

## C. Evaluation of Circuit Level Performance

We now examine two circuit structures in Fig. 8. The 5T-OTA uses the DP and NMOS CM from Section V-B1 and a PMOS CM, with transistors M5 and M6 [$W/L = 2.3 \mu m/14 nm$]. We evaluate its input-referred offset, which is sensitive to device mismatch [18]. The StrongARM comparator uses a DP $(M_1, M_2)$ [$W/L = 6.1 \mu m/14 nm$], an NMOS cross-coupled pair (CCP) $(M_3, M_4)$ [$W/L = 3.1 \mu m/14 nm$], a PMOS CCP $(M_5, M_6)$ [$W/L = 1.6 \mu m/14 nm$], and switches. We evaluate its dynamic input offset, which is sensitive to the mismatch between X and Y [19].



Fig. 8: Schematic of a (a) 5T-OTA (b) StrongARM comparator

We optimize primitive layouts for both circuits using Algorithm 1. The optimized OTA layout uses a CC pattern for the DP (due to high $G_m$, Section V-B3) and the NMOS CM (both layout sizes are comparable to $R_L = 10 \mu m$). A clustered pattern is used for the PMOS CM, which has a small area. All blocks in the comparator are small, and the optimized layout uses the clustered pattern. For these blocks, CC will result in capacitance mismatch between nodes X and Y, and ID incurs higher parasitics at X and Y. Table II shows the input-referred offset for the two circuits for $R_L = 10 \mu m$ and $R_L = 1000 \mu m$, laid out using three options: clustered, CC, and optimized.

TABLE II: Performance of 5T-OTA, StrongARM comparator

|                          | 5T-OTA: Input-referred offset (mV) |  |  | StrongARM: Dynamic offset (mV) |  |  |
|--------------------------|---------|---------|---------|---------|---------|---------|
|                          | $g + u + s$; $R_L = 1000\mu$m | $g + u + s$; $R_L = 10\mu$m | $u$ only | $g + u + s$; $R_L = 1000\mu$m | $g + u + s$; $R_L = 10\mu$m | $u$ only |
| Clustered $(\mu/\sigma)$ | 0.9/3.1 | 1.0/4.0 | 1.1/2.9 | 3.1/2.4 | 4.5/3.4 | 3.2/2.4 |
| CC $(\mu/\sigma)$        | 3.1/3.1 | 3.1/3.1 | 3.0/3.0 | 4.6/3.2 | 4.9/3.4 | 4.8/3.2 |
| Optimized $(\mu/\sigma)$ | 1.1/3.0 | 1.2/3.0 | 1.1/2.8 | 3.1/2.4 | 4.5/3.4 | 3.2/2.4 |

5T-OTA: Input-referred offset: The mean of the offset is affected by layout parasitics and LDEs, and the CC layout is worse than the optimized layout. The main culprit in CC is the PMOS CM, with four unit cells in each device arranged in a single row: as discussed in Section V-B3, this arrangement creates high parasitics and LOD mismatch in the CC configuration. The DP and NMOS CM have 80 and 40 unit cells for each device, respectively, and the CC patterns for these have four rows that can match both LOD and parasitics.

The standard deviation of the offset is affected by both $u$ and $s$ variations. For $R_L = 1000\,\mu\text{m}$, the total variations are dominated by $u$, irrespective of the layout pattern, as the block sizes are well below $R_L$. For $R_L = 10\,\mu\text{m}$, the clustered pattern is clearly worse ($\sigma = 4.0$ mV, against 3.1 mV for CC and 3.0 mV for the optimized layout). The optimized layout has the best $\sigma$ and good $\mu$.
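
As a rough illustration of this trend, the sketch below computes the standard deviation of the mismatch between two devices A and B laid out as a single row of unit cells. It is not the variation model used in this paper: the exponential correlation function $e^{-d/R_L}$, the unit-cell pitch, the single-row patterns, and the values of `sigma_s` and `sigma_u` are all assumptions chosen for illustration, and `mismatch_sigma` is a hypothetical helper name.

```python
import math


def mismatch_sigma(pattern, pitch_um, r_l_um, sigma_s, sigma_u):
    """Standard deviation of (mean over A cells) - (mean over B cells).

    pattern  : string of 'A'/'B' unit cells placed in one row (hypothetical)
    pitch_um : unit-cell pitch in micrometers (assumed value)
    r_l_um   : correlation distance R_L in micrometers
    sigma_s  : std. dev. of the spatially correlated component per cell
    sigma_u  : std. dev. of the uncorrelated random component per cell
    """
    n_a = pattern.count("A")
    n_b = pattern.count("B")
    # Weight of each cell in the difference of the two device averages.
    weights = [1.0 / n_a if c == "A" else -1.0 / n_b for c in pattern]

    # Spatial part: w^T C w with the ASSUMED correlation exp(-d / R_L).
    quad = 0.0
    for i, w_i in enumerate(weights):
        for j, w_j in enumerate(weights):
            distance = abs(i - j) * pitch_um
            quad += w_i * w_j * math.exp(-distance / r_l_um)
    var_s = max(quad, 0.0) * sigma_s ** 2  # clamp tiny negative round-off

    # Random part: independent per cell, so it averages down with cell count.
    var_u = sigma_u ** 2 * (1.0 / n_a + 1.0 / n_b)
    return math.sqrt(var_s + var_u)


if __name__ == "__main__":
    clustered = "A" * 80 + "B" * 80   # AAAA...BBBB
    common_centroid = "ABBA" * 40     # centroids of A and B coincide
    for r_l in (10.0, 1000.0):
        for name, pattern in (("clustered", clustered), ("CC", common_centroid)):
            sigma = mismatch_sigma(pattern, 0.1, r_l, 1.0, 1.0)
            print(f"R_L = {r_l:6.0f} um  {name:9s}  sigma = {sigma:.3f}")
```

With these assumed values, the clustered row shows a clearly larger spread at $R_L = 10\,\mu\text{m}$ than at $R_L = 1000\,\mu\text{m}$, while the CC row stays close to the random-only value at both correlation distances. The absolute numbers carry no meaning beyond this qualitative comparison.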

StrongARM comparator: Dynamic offset is a nonlinear function of $V_{th}$ mismatch and parasitics [19]. Its mean is higher under CC due to parasitic mismatch and inherent LOD mismatch in the DP and CCP. As with the 5T-OTA, at $R_L = 1000\,\mu\text{m}$, $\mu$ and $\sigma$ are similar to the $u$-only case. At $R_L = 10\,\mu\text{m}$, for the clustered layout (which is also the optimized layout), spatial variations greatly impact mismatch. The nonlinear relationship between mismatch and dynamic offset causes both $\mu$ and $\sigma$ for the clustered case to degrade. For CC, spatial variations at both $R_L$ values have modest effects: $\mu$ and $\sigma$ are similar to $u$-only, but worse than the optimized case.
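
A generic second-order expansion shows how a nonlinear dependence lets a larger mismatch spread shift the mean as well as the standard deviation. The symbols here are illustrative and are not taken from the analysis in this paper: $x$ is a mismatch quantity with mean $\bar{x}$ and variance $\sigma_x^2$, and $f$ is a smooth function mapping it to the offset. For small $\sigma_x$,

$$
\mathrm{E}[f(x)] \approx f(\bar{x}) + \tfrac{1}{2} f''(\bar{x})\,\sigma_x^2, \qquad \mathrm{Var}[f(x)] \approx \left(f'(\bar{x})\right)^2 \sigma_x^2 .
$$

Under this approximation, added spatial variation (a larger $\sigma_x^2$) increases the spread through the first-derivative term and moves the mean through the curvature term. For a linear $f$, only the spread would change.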

#### VI. CONCLUSION: IS CC IMPORTANT?

We have carefully analyzed common-centroid layout to question the conventional wisdom that CC layouts are always good. Small layouts, where the size is much less than the correlation distance, are dominated by random variations and do not require CC for matching, but as layouts become larger, CC is required to cancel spatial variations and process gradients. Deterministic shifts and parasitics are also important: clustered and ID layouts have lower LDE, and ID layouts have better parasitics. For differential structures, CC layouts may be useful even for small structures as they reduce the impact of parasitics. Our algorithm for optimized layout pattern generation shows improvements over CC layouts on several circuits.

#### REFERENCES

- [1] A. Hastings, The Art of Analog Layout. Prentice-Hall, 2001.
- [2] A. L. S. Loke, et al., "Analog/mixed-signal design challenges in 7-nm CMOS and beyond," in Proc. CICC, 2019.
- [3] M. Orshansky, et al., "Impact of spatial intrachip gate length variability on the performance of high-speed digital circuits," IEEE T. Comput. Aid. D., vol. 21, no. 5, pp. 544-553, 2002.
- [4] J. Xiong, et al., "Robust extraction of spatial correlation," in Proc. ISPD, pp. 2–9, 2006.
- [5] Y. Abulafia and A. Kornfeld, "Estimation of FMAX and ISB in microprocessors," IEEE T. VLSI Syst, vol. 13, no. 10, pp. 1205-1209, 2005.
- [6] L. T. Pang, Measurement and Analysis of Variability in CMOS circuits. PhD thesis, University of California, Berkeley, Berkeley, CA, 2008.
- [7] P. Friedberg, et al., "Modeling within-die spatial correlation effects for process-design co-optimization," in Proc. ISQED, pp. 516-521, 2005.
- [8] M. J. Pelgrom, et al., "Matching properties of MOS transistors," IEEE J. Solid-St. Circ., vol. 24, no. 5, pp. 1433-1439, 1989.
- [9] N. Cressie, Statistics for spatial data. New York, NY: John Wiley, 1993.
- [10] S. Müller and L. Schüler, "GeoStat-Framework/GSTools," 2020.
- [11] K. W. Su, et al., "A scaleable model for STI mechanical stress effect on layout dependence of MOS electrical characteristics," in Proc. CICC, pp. 245-248, 2003.
- [12] P. G. Drennan, et al., "Implications of proximity effects for analog design," in Proc. CICC, pp. 169-176, 2006.
- [13] B. Razavi, Design of Analog CMOS Integrated Circuits. New York, NY: McGraw-Hill, 2nd ed., 2016.
- [14] M. D. Giles, et al., "High sigma measurement of random threshold voltage variation in 14nm Logic FinFET technology," in Proc. VLSI Tech., pp. T150-T151, 2015.
- [15] K. Kuhn, et al., "Managing process variation in Intel's 45nm CMOS technology," Intel Technol. J., vol. 12, pp. 93-109, May 2008.
- [16] K. Kunal, et al., "ALIGN: Open-source analog layout automation from the ground up," in Proc. DAC, pp. 77-80, 2019.
- [17] J. Deveugele and M. S. Steyaert, "A 10-bit 250-MS/s binary-weighted current-steering DAC," IEEE J. Solid-St. Circ., vol. 41, no. 2, pp. 320-329, 2006.
- [18] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," IEEE J. Solid-St. Circ., vol. 40, no. 6, pp. 1212-1224, 2005.
- [19] B. Razavi, "The StrongARM latch [a circuit for all seasons]," IEEE J. Solid-St. Circ. Mag., vol. 7, no. 2, pp. 12-17, 2015.