# Bitstream-Level Interconnect Fault Characterization for SRAM-based FPGAs

Christian Fibich, Dept. of Electronic Engineering, University of Applied Sciences Technikum Wien, Vienna, Austria, fibich@technikum-wien.at

Martin Horauer, Dept. of Electronic Engineering, University of Applied Sciences Technikum Wien, Vienna, Austria, horauer@technikum-wien.at

Roman Obermaisser, Chair for Embedded Systems, University of Siegen, Siegen, Germany, roman.obermaisser@uni-siegen.de

Abstract: A significant portion of the configuration memory of modern SRAM-based FPGAs is dedicated to configuring the interconnect. Understanding the effects of interconnect-related Single-Event Upsets (SEUs) on the circuit's behavior is critical for developing accurate reliability prediction and efficient fault mitigation approaches. This work describes an approach to classify the effects of single-bit interconnect faults into well-known fault models, and to characterize the electrical effects of these modeled faults. An experimental fault characterization for two families of Xilinx and Lattice FPGAs shows that different types of single-bit interconnect faults exhibit significantly different criticality. This may serve as a partial explanation for the large discrepancies reported in literature between the number of bits predicted to be critical by state-of-the-art methods ("essential bits") and the number of actually critical bits determined experimentally. It may also be used to improve prediction accuracy or reliability-aware routing approaches.

Index Terms: FPGA, Fault Injection, Interconnect Faults

## I. INTRODUCTION

The susceptibility of SRAM-based FPGAs to Single-Event Upsets (SEUs) affecting their configuration, thereby causing modifications to the implemented circuit, is a well-known problem. However, not all configuration bits in an FPGA configured with a particular design are equally critical to the design's correct operation. State-of-the-art FPGA tools allow the generation of a list of bits that are deemed "essential" to the design's correct functionality (see, e.g., [1]). However, fault injection experiments described in literature, which aim to determine the critical subset of these essential bits by flipping them one by one, often result in a far smaller number of critical bits. For example, in [2], fault injection experiments on eleven HLS-generated benchmark designs revealed that only between 0.5% and 20% of essential bits impact the design critically when flipped. This large discrepancy between predicted and experimentally confirmed critical bits raises the question of whether there are systematic factors unaccounted for in state-of-the-art criticality prediction approaches. As a large portion of the configuration of modern FPGAs is dedicated to the interconnect (e.g., 80%–90% of the configuration of Xilinx 7 Series FPGAs [3], [4]), improving the understanding of the effects of individual interconnect faults on a given circuit is relevant to answering this question.

## II. RELATED WORK

In [5], a set of five primary topological effects of interconnect faults in FPGAs is described: An Open fault disconnects a route's fan-out from its driver, a Conflict fault short-circuits two routes, an Input Antenna fault adds an unused routing segment to the fan-in of a route, an Output Antenna fault adds an unused routing segment to the fan-out, and a Bridge fault selects another driver for a route. Multiple recent works [3], [4], [6], [7] analyze the consequences of one or more of the fault effects identified in [5], and implicitly develop fault models for single-bit flips in the process. The scale of the experiments described in these works is often limited to a small number of routing resources, while this work describes an approach for larger-scale systematic characterization of a device's routing resources. This makes it possible to investigate the effects of different driver and wire types on the outcomes of interconnect faults. This work focuses on Conflict and Input Antenna faults, as these faults have the potential to disrupt a design's functionality (if, e.g., an erroneously connected stronger driver overpowers the original, weaker driver), but also to behave in a benign way under reversed circumstances, allowing the design to function until the bit flip is resolved by, for example, configuration scrubbing.

## III. BITSTREAM-LEVEL FAULT MODELLING AND EXPERIMENTAL CHARACTERIZATION

To model the faults potentially caused by a single bit flip, the following knowledge about the device's routing resources and routing structure is required:

- The configuration bits associated with each routing resource. In conjunction with the number of fan-in wires, this may inform assumptions about the implementation of the routing resource (e.g., decoded vs. binary encoded multiplexers, cf. [5]).
- The *valid* configuration patterns for configuring each of the possible routes, as well as which pattern is set if the routing resource is unused. Furthermore, assumptions about the effects of invalid patterns are required.
- The number of routing resources able to drive a single wire. If more than one resource can drive a given wire, a configuration bit or pattern in each resource must be dedicated to preventing an unused routing resource from acting as an additional driver (e.g., to "tri-state" the resource).

Fig. 1: Proposed Characterization Circuit(s)

This work has been supported by the Austrian Federal Ministry for Digital and Economic Affairs (BM:DW) and the National Foundation for Research, Technology and Development as related to the Josef Ressel Center "Innovative Platforms for Electronic-Based Systems" (INES) managed by the Christian Doppler Society (CDG).

Using the information above, two cases need to be considered for each resource: (1) single bit flips occurring in each configuration bit of an *unused* resource need to be analyzed for the potential to cause the resource to become active, and (2) bit flips occurring in each configuration bit of each valid configuration pattern of a *used* routing resource need to be analyzed for the potential to disconnect or otherwise impact the intended connection. Furthermore, the circumstances of each fault (for example, whether an erroneously connected wire is used by the design) must be considered to distinguish different types of faults. In this way, each bit flip can be associated with a well-known fault model from [5].
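The case analysis above can be sketched for a single routing multiplexer. This is a minimal sketch and not the tooling used in this work: it assumes a decoded (one-hot) multiplexer in which each configuration bit connects exactly one fan-in wire to the output, so a single bit flip can only add or remove one connection. *Bridge* faults, which require the selected driver to change, would need a binary-encoded model and are not covered here. The names `Fault` and `classify_flip` and all wire names are hypothetical.

```python
from enum import Enum


class Fault(Enum):
    NONE = "no topological effect"
    OPEN = "Open"
    CONFLICT = "Conflict"
    INPUT_ANTENNA = "Input Antenna"
    OUTPUT_ANTENNA = "Output Antenna"


def classify_flip(inputs, selected, flipped_bit, used_wires, output):
    """Classify a single bit flip in an assumed one-hot routing multiplexer.

    inputs      -- fan-in wire names; bit i connects inputs[i] to the output
    selected    -- index of the intentionally enabled input, None if unused
    flipped_bit -- index of the configuration bit hit by the upset
    used_wires  -- set of wire names that carry signals of the design
    output      -- name of the wire driven by this multiplexer
    """
    extra = inputs[flipped_bit]
    if selected is not None:
        # Case (2): the resource is used by the design.
        if flipped_bit == selected:
            return Fault.OPEN  # the intended connection is removed
        # A second fan-in wire is connected in addition to the intended one.
        return Fault.CONFLICT if extra in used_wires else Fault.INPUT_ANTENNA
    # Case (1): the resource is unused and the flip activates it.
    if output in used_wires:
        # The output wire is already driven by another routing resource.
        return Fault.CONFLICT if extra in used_wires else Fault.INPUT_ANTENNA
    # The output wire is unused: a used input gains an extra fan-out segment.
    return Fault.OUTPUT_ANTENNA if extra in used_wires else Fault.NONE


if __name__ == "__main__":
    inputs = ["A", "B", "C"]
    used = {"A", "B", "OUT"}
    for bit in range(len(inputs)):
        print(bit, classify_flip(inputs, 0, bit, used, "OUT").value)
```

Whether an invalid pattern with two enabled inputs really connects both wires depends on the device; as noted above, such behavior has to be assumed and then confirmed experimentally.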

To characterize the electrical consequences of these faults, we propose to place and route the schematics shown in Figure 1 via each routing resource of interest. The version using one wire-under-test may be used to analyze the effects of *Input Antenna* and *Open* faults by comparing the result with the unmodified "golden" route. The two-wire version drives logically complementary values onto two routes-under-test, which can be used to analyze the effects of *Bridge* and *Conflict* faults. As the effects of a fault may only manifest at one particular logic level, the experiments should be performed while driving the route with different signal values. To limit the influence of process, voltage, and temperature effects on the results, experiments should be conducted on multiple similar devices and under controlled temperature and supply conditions (cf. [7]).
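The need to drive the route with different signal values can be illustrated with a small sketch. It assumes that, for each driven logic level, the value observed at the sink of the route under test is recorded once for the golden route and once with the fault injected. The function `is_critical` and the example readings are hypothetical and do not reproduce measured data.

```python
def is_critical(golden, observed):
    """Return True if the observed sink value deviates from the golden
    value for at least one driven logic level.

    golden, observed -- dicts mapping the driven level (0 or 1) to the
                        value read back at the sink of the route
    """
    return any(observed[level] != expected for level, expected in golden.items())


if __name__ == "__main__":
    golden = {0: 0, 1: 1}
    # Hypothetical fault that pulls the route low: invisible while driving 0,
    # visible only while driving 1.
    pulled_low = {0: 0, 1: 0}
    print(is_critical(golden, pulled_low))
    print(is_critical(golden, dict(golden)))
```

A test that only drove a logic 0 would have missed the hypothetical fault in this example, which is why both levels are exercised.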

## IV. PRELIMINARY RESULTS

Development of bitstream-based fault models as well as characterization experiments for *Input Antenna* and *Conflict* faults as described in Section III have been performed on four Xilinx Artix 7 (xc7a35tcpg236-1) and ten Lattice iCE40 UltraPlus (ice40up5k) FPGAs. For both of these technologies, third-party bitstream documentation is available<sup>1</sup>. In the Xilinx experiments, the routable subset of all possible *Input Antenna* faults of 17 interconnect tiles neighboring different types of logic tiles (CLB, BRAM, DSP, etc.) formed the basis of the *Input Antenna* characterization, while all routable combinations of route source, fault source, and destination wires possible in one interconnect tile were characterized for *Conflict* faults. A similar approach was followed for Lattice iCE40.

TABLE I: Preliminary Characterization Results

| Device Family  | Fault Model   | Injected Faults | Critical on Any Device | Critical on All Devices |
|----------------|---------------|-----------------|------------------------|-------------------------|
| Xilinx Artix 7 | Input Antenna | 190067          | 11.6%                  | 11.0%                   |
| Xilinx Artix 7 | Conflict      | 17988           | 100.0%                 | 100.0%                  |
| Lattice iCE40  | Input Antenna | 29107           | 22.1%                  | 4.95%                   |
| Lattice iCE40  | Conflict      | 4223            | 98.5%                  | 86.3%                   |

Table I shows the outcome of this characterization process, detailing the number of faults that negatively impacted the logical value transported by the route(s) under test on both technologies. It can be seen that *Conflict* faults are far more consistently critical than *Input Antenna* faults on both technologies. Especially noteworthy is the large device-to-device variation in the Lattice *Input Antenna* case, where 22.1% of the faults are critical on at least one device but only 4.95% are critical on all devices. In summary, these preliminary results show that differentiating interconnect faults by their consequences may allow more precise prediction of the criticality of single-bit errors from a design's bitstream alone.
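The two "Critical on" columns of Table I can be read as simple aggregates over per-device outcomes. The following sketch shows one plausible way to compute them, assuming that every fault was injected on every device and that each injection yields a boolean criticality verdict. The function `summarize` and the example data are hypothetical and are not the measured results.

```python
def summarize(outcomes):
    """Aggregate per-device criticality verdicts.

    outcomes -- dict mapping a fault identifier to a list of booleans,
                one entry per device (True = critical on that device)
    Returns (percent critical on any device, percent critical on all devices).
    """
    total = len(outcomes)
    on_any = sum(any(verdicts) for verdicts in outcomes.values())
    on_all = sum(all(verdicts) for verdicts in outcomes.values())
    return 100.0 * on_any / total, 100.0 * on_all / total


if __name__ == "__main__":
    # Four hypothetical faults, each injected on three devices.
    example = {
        "fault0": [True, True, True],
        "fault1": [True, False, False],
        "fault2": [False, False, False],
        "fault3": [False, True, False],
    }
    print(summarize(example))
```

A large gap between the two percentages, as in the Lattice *Input Antenna* row, indicates faults whose outcome depends on the individual device.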

Analyzing the *Input Antenna* results for Xilinx Artix 7 in more detail revealed that faults connecting certain subsets of the output wires of function tiles neighboring interconnect tiles (LOGIC\_OUTS\*) as an additional fan-in are responsible for a vast majority of critical faults in this category. These subsets depend on the type of function tile and its vertical offset to the considered interconnect tile. For example, in interconnect tiles next to CLB tiles, LOGIC\_OUTS[8-15] (the CLB tile's LUT outputs) are especially critical. This may be useful for further improving prediction accuracy and for the implementation of technology-specific reliability-aware routing approaches.
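As an illustration of how this observation might feed a prediction heuristic, the sketch below flags *Input Antenna* faults whose fault source is one of the LOGIC_OUTS wires 8 to 15 in an interconnect tile next to a CLB tile. The wire-name format accepted by the regular expression (for example `LOGIC_OUTS8` or `LOGIC_OUTS_L12`) and the tile type string `"CLB"` are assumptions made for this example, as is the function name `likely_critical_antenna`. Critical subsets for other tile types are not listed here, so the sketch does not flag them.

```python
import re

# Assumed wire-name format: LOGIC_OUTS<n> or LOGIC_OUTS_L<n>.
_LOGIC_OUT = re.compile(r"^LOGIC_OUTS(?:_L)?(\d+)$")


def likely_critical_antenna(fault_source_wire, neighbor_tile_type):
    """Return True if an Input Antenna fault from this wire falls into the
    subset reported as especially critical next to CLB tiles.

    A return value of False only means "not flagged by this rule";
    it does not imply that the fault is benign.
    """
    match = _LOGIC_OUT.match(fault_source_wire)
    if match is None or neighbor_tile_type != "CLB":
        return False
    return 8 <= int(match.group(1)) <= 15


if __name__ == "__main__":
    for wire in ("LOGIC_OUTS7", "LOGIC_OUTS8", "LOGIC_OUTS15", "LOGIC_OUTS16"):
        print(wire, likely_critical_antenna(wire, "CLB"))
```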

## REFERENCES

- [1] Xilinx, Inc., "XAPP538 (v1.0): Soft Error Mitigation Using Prioritized Essential Bits," 2012, Accessed: 2023-01-26. [Online]. Available: https://web.archive.org/web/20200805232308/https://www.xilinx.com/support/documentation/application_notes/xapp538-soft-error-mitigation-essential-bits.pdf
- [2] S. T. Fleming and D. Thomas, "Injecting FPGA Configuration Faults in Parallel," in 2018 International Conference on Field-Programmable Technology (FPT), 2018, pp. 198–205.
- [3] L. Bozzoli, C. De Sio, L. Sterpone, and C. Bernardeschi, "PyXEL: An Integrated Environment for the Analysis of Fault Effects in SRAM-Based FPGA Routing," in 2018 International Symposium on Rapid System Prototyping (RSP), 2018, pp. 70–75.
- [4] M. Darvishi, Y. Audet, Y. Blaquière, C. Thibeault, and S. Pichette, "On the Susceptibility of SRAM-Based FPGA Routing Network to Delay Changes Induced by Ionizing Radiation," *IEEE Transactions on Nuclear Science*, vol. 66, no. 3, pp. 643–654, 2019.
- [5] N. Battezzati, L. Sterpone, and M. Violante, *Reconfigurable Field Programmable Gate Arrays: Basic Concepts*. New York, NY: Springer New York, 2011, pp. 7–35.
- [6] M. Cannon, A. Keller, and M. Wirthlin, "Improving the Effectiveness of TMR Designs on FPGAs with SEU-Aware Incremental Placement," in 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2018, pp. 141–148.
- [7] C. Fibich, M. Horauer, and R. Obermaisser, "Device- and Temperature Dependency of Systematic Fault Injection Results in Artix-7 and iCE40 FPGAs," in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021, pp. 1600–1605.

<sup>1</sup> https://f4pga.github.io/prjxray-db/ (Xilinx 7 Series) and http://bygone.clairexen.net/icestorm/ (Lattice iCE40), Accessed: 2023-01-26