# 3D FPGA using high-density interconnect Monolithic Integration

Ogun Turkyilmaz, Gérald Cibrario, Olivier Rozeau, Perrine Batude, Fabien Clermidy CEA – LETI, MINATEC Campus Grenoble, France ogun.turkyilmaz@cea.fr

*Abstract*— A new 3D technology, called "Monolithic Integration", offers very dense 3D interconnect capabilities. In this paper, we propose a 3D FPGA architecture with a logic-on-memory approach based on this technology. The routing and computation blocks are split into two layers, with the logic placed on the top layer and the memory on the bottom layer. Using values extracted from layouts in 14nm FDSOI technology, typical benchmark circuits are evaluated in the VPR5 tool flow. The results show an area reduction of 55% compared to the 2D FPGA. More importantly, due to the lower routing congestion, the energy-delay product (EDP) of the 3D FPGA is improved by 47%.

## Keywords— FPGA, 3D, monolithic integration

## I. INTRODUCTION

With the exponentially increasing cost of new technology nodes, 3D integration has become an appealing solution for "low cost" scaling. Parallel integration, using through-silicon vias (TSVs), makes it possible to stack separately fabricated dies vertically. However, the recent ITRS roadmap [1] shows that TSV alignment accuracy will be limited to ~0.5-1 µm (Table I). Because of this alignment restriction, the pitch between two TSVs cannot be smaller than ~4-8 µm. In addition, the depth of a TSV (~20-50 µm) degrades performance when signals crossing the TSV lie on the chip's critical paths. Consequently, the number of TSVs between two layers can only support coarse-grain partitioning. In 3D Monolithic Integration (3DMI), on the other hand, transistor layers are fabricated one after another on the same die, which improves the alignment accuracy to ~10nm [2]. Therefore, vertical connections can be placed with a very small footprint, less than 100nm in diameter in 65nm technology [3]. Moreover, the distance between the top and bottom layers can be reduced to 100nm [2], which decreases the delay of a signal passing through the vertical via. As a result, efficient, very fine-grain partitioning is achievable with 3DMI.

FPGAs suffer from a high number of configuration memory nodes needed to support flexibility. With parallel integration, FPGAs can gain in terms of area and performance. However, the integration is limited to coarse-grain partitioning, where multiple layers of FPGA tiles are stacked using TSVs. In [4], different 3D switch box topologies are surveyed for TSV-based FPGAs with several layers. Since more complex switch boxes are required, the delay improvement is limited to 10% when two layers are considered. Several works focus on decreasing the number of TSVs in order to reduce area and delay [5], [6]. In addition, the TPR tool [7] has been presented for placement and routing of FPGA benchmarks with TSVs. Silicon interposers have been proposed as a low-cost intermediate solution between 2D and 3D. The FPGA vendor Xilinx has released an FPGA fabricated on an interposer [8], using a reduced TSV aspect ratio of 10 to increase yield.

TABLE I. 3DMI VS TSV VERTICAL CONNECTION COMPARISON

|                   | Alignment (µm) | Diameter (µm) | Pitch (µm) | Minimum Depth (µm) |
|-------------------|----------------|---------------|------------|--------------------|
| TSV               | 0.5 - 1        | 2 - 4         | 4 - 8      | 20 - 50            |
| 3DMI              | 0.01           | 0.1           | 0.2*       | 0.1                |
| 3DMI vs. TSV gain | 50x - 100x     | 20x - 40x     | 20x - 40x  | 200x - 500x        |

\* Pitch is assumed to be at least two times the diameter.

3DMI has attracted attention in the literature as a way to increase performance. To exploit the benefits of 3DMI, recent works focus on transistor-on-transistor and gate-on-gate 3D partitioning. In [9], various design tradeoffs in 3DMI are studied and compared with TSV-based integration. The power benefits of 3DMI are discussed in [10]. Several techniques have been proposed to design 3D circuits with existing 2D standard cell libraries [11], [12]. The transistor-on-transistor approach improves total area, critical path delay and power. However, well-balanced NMOS and PMOS transistor footprints are necessary to reach the highest benefits.

Recently, a number of works have focused on monolithically integrated FPGAs. In [13] and [14], the authors report improvements for several different stacking scenarios of logic and memory layers. In [15], a switch box with memory and logic separation is presented.

In this paper, we propose a 3D FPGA with a logic-on-memory approach. The memory and logic cells are separated and placed in the bottom and top layers, respectively. Since a large portion of the FPGA is built with NMOS-only MUXs, for which balanced NMOS and PMOS footprints cannot be obtained, gate-level partitioning is preferred over transistor-level partitioning for the proposed FPGA. Compared to previously published papers, this study adds:

- realistic technological assumptions, compared to [13], [14], by using a 14nm design kit and a 3D add-on;
- design-based experiments instead of the theoretical figures used in [13], [14];
- real layout-extracted figures for the 14nm technology node;
- a full-FPGA evaluation instead of the switch-box-only improvements of [15].

The paper is organized as follows: Section II briefly introduces 3D monolithic integration technology. The proposed 3D FPGA and the 3D MUX4, together with their performance figures, are presented in Section III. In Section IV, benchmark results for the entire FPGA are provided. Finally, conclusions are summarized in Section V.

## II. 3D MONOLITHIC INTEGRATION TECHNOLOGY

3D Monolithic Integration consists of the sequential fabrication of active layers on the same die. Fig. 1 shows a cross-sectional view of 3DMI. As demonstrated in [16], a low process temperature (600°C) is necessary for successful stacking of the upper layer. Since the inter-tier vias are fabricated as regular vias between metal layers, a very small interconnect footprint of 100nm can be achieved [3]. Therefore, compared to TSVs, 3DMI provides finer-grained and less capacitive vertical interconnects.

Evaluating the 3D monolithic approach through demonstrators on advanced nodes requires a full-custom design flow. For this purpose, the main goals are to define a Design Rules Manual (DRM) and to set up a predictive Process Design Kit (PDK) with tools for simulation, physical implementation and verification. All information in the DRM, such as the design rules, has been derived from technology projections and embedded into this PDK for benchmark studies. This study is based on a PDK developed for Fully Depleted SOI (FD-SOI). An additional "Add-on" dedicated to 14nm 3D monolithic integration was built to define the upper level, with a specific description of the 3D layers and intermediate metal levels.

## III. 3D FPGA DESIGN WITH MONOLITHIC INTEGRATION

## A. Baseline FPGA Architecture

In regular island-style FPGAs, logical operations are carried out in logic blocks (LBs). The I/Os of the LBs are connected to the routing channels (tracks) in connection boxes (CBs), and communication between different routing channels is performed in switch boxes (SBs), as shown in Fig. 2. An LB contains a cluster of basic logic elements (BLEs), each composed of one look-up table (LUT), one flip-flop (FF), and a multiplexer that connects either the sequential or the combinational result to the LB output. The logic functionality is implemented in SRAM-based LUTs inside the LBs, and the routing configuration is stored in the SRAMs of the SBs and CBs. In this typical FPGA architecture, memories occupy almost half of the total chip area [13].

Fig. 1. Cross-sectional view of 3D monolithic integration.

Fig. 2. Island style FPGA

An FPGA is a scalable and modular architecture. Several parameters affect its overall efficiency, and a balance must be reached between routing and computing elements. In [17], it is concluded that the highest area efficiency is achieved when N = 4 (number of BLEs per LB), K = 4 (number of LUT inputs) and I = 10 (number of LB inputs). In this paper, we assume the same parameters. The routing channel width is fixed to 32 in order to accommodate a large number of applications.
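To make the link between these parameters and the amount of configuration memory concrete, the sketch below counts the SRAM bits of one LB for N = 4, K = 4, I = 10. It is an illustration under stated assumptions, not the exact LB netlist used in this work: the rule of thumb I = (K/2)(N+1) is the cluster-input relation commonly attributed to [17], and the local crossbar (every LUT input selects among the I cluster inputs plus the N BLE feedbacks through a binary-encoded MUX) is a hypothetical choice made only for counting.

```python
import math


def betz_rose_inputs(k: int, n: int) -> int:
    """Cluster-input rule of thumb I = (K/2)(N+1), as commonly attributed to [17]."""
    return (k * (n + 1)) // 2


def lb_config_bits(n: int, k: int, i: int) -> dict:
    """Count SRAM configuration bits in one logic block (LB).

    Assumption (hypothetical): a fully connected local crossbar in which each
    LUT input picks one of the I cluster inputs or N BLE feedbacks through a
    binary-encoded multiplexer.
    """
    lut_bits = n * 2 ** k            # each K-input LUT stores 2**K truth-table bits
    ff_mux_bits = n                  # one bit per BLE: registered vs combinational output
    sel_bits = math.ceil(math.log2(i + n))
    crossbar_bits = n * k * sel_bits  # one encoded select per LUT input
    return {
        "lut": lut_bits,
        "ff_mux": ff_mux_bits,
        "crossbar": crossbar_bits,
        "total": lut_bits + ff_mux_bits + crossbar_bits,
    }


N, K = 4, 4
I = betz_rose_inputs(K, N)
print(I)                          # 10
print(lb_config_bits(N, K, I))    # {'lut': 64, 'ff_mux': 4, 'crossbar': 64, 'total': 132}
```

Even in this simplified model, well over a hundred SRAM cells sit in every LB before any CB or SB memory is counted, which is why moving all of them to a separate layer is attractive.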

## B. 3D Cell Design

To fully benefit from 3DMI, technological and cost-related challenges must be taken into consideration during design. Fabrication of the bottom layer is classical, but fabricating the top transistors, even with a cold (low-temperature) process, imposes technological difficulties on the intermediate metal layers. Moreover, for cost reasons, the total number of intermediate metal layers must be limited; typically, integrating one or two metal layers is a reasonable technological target. Additionally, 3D connections are made with vias, which can be fabricated at a small pitch. Here we assume a via pitch of 100nm (Fig. 3), based on realistic technological rules. As this figure illustrates, the density of 3D connections in 3DMI is at least one order of magnitude higher than that of TSVs.
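Because vias are laid out on a two-dimensional grid, the achievable connection density scales with the inverse square of the pitch. The short sketch below assumes an idealized square via grid (no keep-out zones) and uses the 100nm pitch assumed in this section together with the 4-8 µm TSV pitch range of Table I.

```python
def vias_per_mm2(pitch_um: float) -> float:
    """Vias per mm^2 on an idealized square grid (1 mm^2 = 1e6 um^2)."""
    return 1e6 / (pitch_um ** 2)


MONO_PITCH_UM = 0.1           # 3DMI via pitch assumed in this section (100 nm)
TSV_PITCHES_UM = (4.0, 8.0)   # TSV pitch range from Table I

mono_density = vias_per_mm2(MONO_PITCH_UM)
for tsv_pitch in TSV_PITCHES_UM:
    tsv_density = vias_per_mm2(tsv_pitch)
    ratio = mono_density / tsv_density
    print(f"TSV pitch {tsv_pitch} um: {tsv_density:,.0f} vias/mm^2, "
          f"3DMI/TSV density ratio ~{ratio:,.0f}x")
```

Under these assumptions the areal density ratio is (4/0.1)² = 1,600x to (8/0.1)² = 6,400x, comfortably above the one-order-of-magnitude bound quoted above, which refers to the linear pitch.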

As described before, almost half of the FPGA is made of memories. Considering this characteristic, we define the 3DMI partitioning as follows: the bottom layer contains the SRAM cells, while the top layer contains the computing and routing resources. To keep good overall performance, the SRAM cells must be entirely integrated on the bottom layer, which leads to the choice of two intermediate metal layers. A second target is on the design side: a good balance between the two layers must be reached while retaining the modularity and scalability of the FPGA, for performance and low-area purposes. To fulfill the first requirement, a coarse-grain co-design of the top and bottom layers must be carried out, while the second constraint leads us to keep the classical FPGA partitioning (LB, SB and CB) for design optimization. A multiplexer example is described next to demonstrate the co-design and the benefit of 3DMI.

## C. 3D MUX4

In the designed FPGA, multiplexers are used to create unidirectional routing blocks. The main component of the CB and SB is the 4-input MUX with 2 memory cells. Each memory cell integrated with the MUX is a traditional 6T SRAM. To minimize area, the MUX is designed with NMOS-only pass gates using 6 transistors. Buffers based on LEAP [18] are added after the MUX to restore the signal levels. Fig. 3 shows the designed cell; the vertical vias can be placed as close as 100nm apart (Fig. 3c). Thanks to the stacked NMOS placement in the MUX, a very compact layout is achieved. With this partitioning, the MUX4, the SRAM and the buffer occupy equal areas, which establishes a balanced area between the top and bottom layers, one of the major challenges in 3D design.
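A 4:1 NMOS pass-gate MUX with 6 transistors corresponds naturally to a two-level tree: four transistors in the first level and two in the second, each pair gated by a stored bit and its complement. The behavioral sketch below assumes that topology (the paper does not spell out the transistor arrangement) and that the 6T cell supplies both Q and Q-bar. It models only logic values, not the degraded high level of NMOS pass gates that the buffer restores.

```python
from itertools import product

SRAM_CELL_TRANSISTORS = 6   # traditional 6T SRAM cell
SRAM_CELLS_PER_MUX4 = 2     # two configuration bits select one of four inputs


def mux4_pass_tree(data, s1, s0):
    """Behavioral sketch of an assumed two-level NMOS pass-transistor MUX4 tree.

    data: four logic values (0/1); s1, s0: stored SRAM configuration bits.
    Returns (output, number of NMOS pass transistors in the tree).
    """
    if len(data) != 4:
        raise ValueError("MUX4 needs exactly four data inputs")
    # Level 1: two 2:1 stages, each with one NMOS gated by s0 and one by its
    # complement (taken from the SRAM's Q-bar node) -> 4 transistors.
    level1 = (data[1] if s0 else data[0],
              data[3] if s0 else data[2])
    # Level 2: one 2:1 stage gated by s1 / not s1 -> 2 transistors.
    out = level1[1] if s1 else level1[0]
    return out, 4 + 2


def verify_exhaustively():
    """Check the tree against out == data[2*s1 + s0] for all 64 input cases."""
    for bits in product((0, 1), repeat=6):
        data, s1, s0 = bits[:4], bits[4], bits[5]
        out, _ = mux4_pass_tree(data, s1, s0)
        if out != data[2 * s1 + s0]:
            return False
    return True


print(verify_exhaustively())                          # True
print(mux4_pass_tree((0, 1, 0, 0), s1=0, s0=1))       # (1, 6)
print(SRAM_CELLS_PER_MUX4 * SRAM_CELL_TRANSISTORS)    # 12
```

The counts illustrate the area balance argument: under this topology, the 12 SRAM transistors go to the bottom layer while the 6 pass transistors and the buffer stay on top.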

### D. Cell Performance Evaluation

For the MUX4 described in the previous section, the corresponding 2D cell is also designed. Parasitic extraction is carried out for each cell. The extracted netlists are simulated with ELDO, and delay and power metrics are reported. Dynamic power values assume a switching frequency of 2GHz.

Table II presents the performance metrics of the 2D and 3D MUX4s. Gains of 51%, 14% and 12% in area, delay and power, respectively, are achieved. Since the area is significantly reduced, the internal routing of the cell is simplified; as a result, the output is generated faster and with lower power consumption. Once the memory is configured, the values of the select inputs do not change. Therefore, the memory and the vertical connections do not affect the cell performance during operation.
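The gain row of Table II is the relative reduction (2D - 3D)/2D. The sketch below recomputes it from the two-decimal table entries; it yields about 51.7%, 14.5% and 13.1%. The integer gains reported in Table II appear to be truncated, and the slightly lower 12% reported for power presumably reflects the unrounded simulation values.

```python
TABLE_II = {  # (2D, 3D) values from Table II
    "area_um2": (1.18, 0.57),
    "delay_ps": (28.48, 24.35),
    "power_uW": (2.75, 2.39),
}


def gain_pct(v2d: float, v3d: float) -> float:
    """Relative reduction of the 3D cell with respect to the 2D cell, in percent."""
    return 100.0 * (v2d - v3d) / v2d


for metric, (v2d, v3d) in TABLE_II.items():
    print(f"{metric}: {gain_pct(v2d, v3d):.1f}%")
```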

## IV. 3D FPGA PERFORMANCE EVALUATION

All the necessary blocks (LB, CB and SB) are designed in both 2D and 3D. To evaluate the metrics, post-layout results are extracted and architecture files for the VPR5 tool [19] are created. Figs. 4 and 5 show the evaluation results.

An area reduction of 55% is observed, as shown in Fig. 4. The area benefits of 3DMI can be explained as follows. First, the memory is completely removed from the logic layer. Second, thanks to the fine-grained vertical connections, relocating the memory to the bottom layer does not cause any routing congestion. In particular, the intermediate metal layers on the bottom layer enable very flexible memory placement while keeping the memory close to the logic layer.

As for the final metric, Fig. 5 shows that the EDP (energy-delay product) can be reduced by 47%. The improvement in EDP is twofold. First, the intrinsic delays of the blocks are reduced with 3D integration thanks to simplified internal routing.

TABLE II. 4-INPUT MUX PERFORMANCE

| MUX4               | Area (µm²) | Delay (ps) | Power (µW) |
|--------------------|------------|------------|------------|
| 2D                 | 1.18       | 28.48      | 2.75       |
| 3D                 | 0.57       | 24.35      | 2.39       |
| 3D vs. 2D gain (%) | 51         | 14         | 12         |



Fig. 3. 3D 4-input MUX with SRAMs: a) Schematic view. b) 3D layout view. c) Top layer view. d) Bottom layer view. The vertical connecting VIAs (highlighted in yellow) can be placed with 100nm pitch.



Fig. 4. Area of FPGA benchmark circuits for 2D and 3D architectures. Area can be reduced by 55% on average when designed in 3D.



Fig. 5. EDP of FPGA benchmark circuits for 2D and 3D architectures. EDP can be reduced by 47% on average when designed in 3D.

Second, as a consequence of the shorter wirelength between blocks, the capacitance of the routing wires is decreased. Therefore, operations are carried out faster while consuming less energy, which results in an improved EDP for the 3D FPGA.
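Because the EDP is a product of energy and delay, independent reductions of the two factors compound multiplicatively: EDP_3D / EDP_2D = (1 - g_E)(1 - g_t), where g_E and g_t are the fractional energy and delay reductions. The sketch below encodes this relation; the equal 27% split in the example is purely hypothetical and is not a figure reported in this paper. It only shows that a 47% EDP gain does not require either factor alone to drop by 47%.

```python
def edp_gain(energy_gain: float, delay_gain: float) -> float:
    """Fractional EDP reduction from fractional energy and delay reductions.

    EDP = E * t, so EDP_3D / EDP_2D = (1 - g_E) * (1 - g_t).
    """
    return 1.0 - (1.0 - energy_gain) * (1.0 - delay_gain)


# Hypothetical split: equal ~27% reductions in energy and delay.
print(round(edp_gain(0.27, 0.27), 3))  # 0.467
```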

The memory and logic layers can be optimized separately: a low-leakage/high-Vt process can be applied to the memory layer and a high-performance/low-Vt process to the logic layer. We also expect the proposed partitioning to ease thermal difficulties. Since the memory layer holds the configuration information, the SRAM values do not switch once they are written. Therefore, heat generation in the bottom layer is minimized.

## V. CONCLUSION

In this paper, a 3D FPGA with monolithic integration is presented. Taking advantage of the very small vertical interconnects and the intermediate metal layers available in the 3D 14nm FDSOI technology, fine-grain partitioning is achieved. In each building block, memory cells are placed in the bottom layer and logic cells in the top layer. The results show that such a partitioning of the 3D FPGA, with a high number of vertical interconnects, yields a 55% smaller area and a 47% lower EDP.

## ACKNOWLEDGMENT

This work has been partially supported within the framework of the ST/IBM/CEA-Leti development alliance.

#### REFERENCES

- [1] ITRS, "Interconnect," International Technology Roadmap for Semiconductors, 2011. Online: http://www.itrs.net/Links/2011ITRS/2011Chapters/2011Interconnect.pdf
- [2] P. Batude et al., "Enabling 3D monolithic integration," Proceedings of the Electrochemical Society (ECS), vol. 16, p. 47, 2008.
- [3] S.-M. Jung et al., "Highly cost effective and high performance 65nm S<sup>3</sup> (stacked single-crystal Si) SRAM technology with 25F<sup>2</sup>, 0.16µm<sup>2</sup> cell and doubly stacked SSTFT cell transistors for ultra high density and high speed applications," Symposium on VLSI Technology, 2005.
- [4] A. Gayasen, V. Narayanan, M. Kandemir, and A. Rahman, "Designing a 3-D FPGA: Switch Box Architecture and Thermal Issues," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 16, no. 7, pp. 882-893, 2008.
- [5] V. Pangracious, H. Mehrez, and Z. Marrakchi, "Architecture level TSV count minimization methodology for 3D tree-based FPGA," *IEEE Cool Chips XVI*, pp. 1-3, 2013.
- [6] K. Siozios, V. F. Pavlidis, and D. Soudris, "A Novel Framework for Exploring 3-D FPGAs with Heterogeneous Interconnect Fabric," *ACM Transactions on Reconfigurable Technology and Systems*, 2012.
- [7] C. Ababei et al., "Placement and routing in 3D integrated circuits," *IEEE Design & Test of Computers*, vol. 22, no. 6, pp. 520-531, 2005.
- [8] B. Banijamali et al., "Advanced reliability study of TSV interposers and interconnects for the 28nm technology FPGA," *Electronic Components and Technology Conference (ECTC)*, pp. 285-290, 2011.
- [9] C. Liu and S. K. Lim, "A design tradeoff study with monolithic 3D integration," *International Symposium on Quality Electronic Design (ISQED)*, pp. 529-536, 2012.
- [10] Y.-J. Lee, D. Limbrick, and S. K. Lim, "Power benefit study for ultra-high density transistor-level monolithic 3D ICs," *Design Automation Conference (DAC)*, pp. 1-10, 2013.
- [11] S. Bobba, A. Chakraborty, O. Thomas, P. Batude, T. Ernst, O. Faynot, D. Z. Pan, and G. De Micheli, "CELONCEL: Effective design technique for 3-D monolithic integration targeting high performance integrated circuits," *Asia and South Pacific Design Automation Conference (ASP-DAC)*, pp. 336-343, 2011.
- [12] H. Sarhan et al., "3DCoB: A New Design Approach for Monolithic 3D Integrated Circuits," *ASP-DAC*, accepted.
- [13] M. Lin et al., "Performance Benefits of Monolithically Stacked 3-D FPGA," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 26, no. 2, pp. 216-229, 2007.
- [14] S. Wong et al., "Monolithic 3D Integrated Circuits," *VLSI Technology, Systems and Applications (VLSI-TSA)*, pp. 1-4, 2007.
- [15] P. Batude et al., "3-D Sequential Integration: A Key Enabling Technology for Heterogeneous Co-Integration of New Function With CMOS," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, vol. 2, no. 4, pp. 714-722, 2012.
- [16] P. Batude et al., "Advances, challenges and opportunities in 3D CMOS sequential integration," *IEEE International Electron Devices Meeting (IEDM)*, pp. 7.3.1-7.3.4, 2011.
- [17] V. Betz and J. Rose, "Cluster-based logic blocks for FPGAs: area-efficiency vs. input sharing and size," *IEEE Custom Integrated Circuits Conference (CICC)*, pp. 551-554, 1997.
- [18] K. Yano et al., "Top-down pass-transistor logic design," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 6, pp. 792-803, 1996.
- [19] P. Jamieson, W. Luk, S. J. E. Wilton, and G. A. Constantinides, "An Energy and Power Consumption Analysis of FPGA Routing Architectures," *International Conference on Field-Programmable Technology (FPT)*, pp. 324-327, 2009.