# Evaluating the effects of SEUs affecting the configuration memory of an SRAM-based FPGA

M. Bellato<sup>3</sup>, P. Bernardi<sup>1</sup>, D. Bortolato<sup>2</sup>, A. Candelori<sup>3</sup>, M. Ceschia<sup>2,3</sup>, A. Paccagnella<sup>2,3</sup>, M. Rebaudengo<sup>1</sup>, M. Sonza Reorda<sup>1</sup>, M. Violante<sup>1</sup> and P. Zambolin<sup>2</sup>

<sup>1</sup> Politecnico di Torino, Torino, Italy
<sup>2</sup> DEI, Università di Padova, Padova, Italy
<sup>3</sup> Istituto Nazionale di Fisica Nucleare, Padova, Italy

## Abstract\*

This paper analyses the effects of Single Event Upsets in an SRAM-based FPGA, with special emphasis on the transient faults affecting the configuration memory. Two approaches are combined. On one side, by exploiting the available information and tools dealing with the device configuration memory, we were able to formulate hypotheses on the meaning of every bit in the configuration memory. On the other side, radiation testing was exploited to validate these hypotheses and to gather experimental evidence about the correctness of the obtained results. As a major result, we can provide detailed information about the effects of SEUs affecting the configuration memory of a commercial FPGA device. As a second contribution, we describe a method for obtaining the same result with similar devices. Finally, the obtained results are crucial to allow the possible usage of SRAM-based FPGAs in safety-critical environments, e.g., by working on the place and route strategies of the supporting tools.

## 1. Introduction

The size and complexity of commercial programmable logic devices allow them to replace ASICs in several applications. SRAM-based Field Programmable Gate Arrays (FPGAs) [1] offer high densities and in-system re-programmability that are very attractive for electronic design. Despite the benefits SRAM-based devices offer, dependability issues limit their widespread adoption in safety- or mission-critical applications. For example, the Xilinx Virtex family is fabricated on thin-epitaxial silicon wafers exploiting a 0.22 µm CMOS technology with 5 metal layers. This kind of technology is relatively sensitive to Single Event Upsets (SEUs) [2], which may be originated by high-energy particles hitting the sensitive silicon areas and which interact with the memory elements by changing their logic state. Since the behaviour of an SRAM-based FPGA is determined by the bitstream loaded and stored in the configuration memory, the effects of SEUs may drastically alter the correct operations of FPGAs [3][4], causing unexpected output results, usually called Single Event Functional Interrupts (SEFIs). Forecasting the effects of SEUs in the configuration memory of an SRAM-based FPGA is therefore becoming a primary concern, possibly starting from the initial design stages, when only a high-level model of the system is available, so that the designer can intervene to guarantee the desired degree of dependability.

1530-1591/04 \$20.00 (c) 2004 IEEE

For this reason, researchers investigated simulation-based approaches for predicting the SEU effects on the FPGA functionality. The methods proposed so far [5]-[8], although effective and accurate, are intended only for the analysis of circuits implemented as ASICs. When the target technology is shifted to SRAM-based FPGAs, two complementary aspects should be considered:

- SEUs may alter the memory elements the design embeds.
- SEUs may alter the content of the memory storing the device configuration information. For example, SEUs may alter the content of Look-Up Tables (LUTs) inside Configurable Logic Blocks (CLBs), or the routing of signals within CLBs or among CLBs.

As far as the first aspect is concerned, already available approaches can be used to characterize its effects. Conversely, the latter aspect demands much more complex and powerful analysis capabilities. The effects of SEUs in the device configuration memory are not limited to modifications in the memory elements, but they may also produce modifications in the interconnections inside CLBs and among different CLBs, thus giving rise to totally different circuits from those intended.

The paper presents an approach aiming at investigating the effects of SEUs affecting the configuration memory of SRAM-based FPGAs. A detailed analysis of the critical resources sensitive to SEUs is first described, based on an approach exploiting the available information and tools dealing with the configuration memory. Then, we validate and integrate the obtained results by means of a radiation-testing environment: in this case we can experimentally compute the effects of radiations on the FPGA configuration memory, and correlate faulty behaviours with bit-flips arising in it.

<sup>\*</sup> This work has been partially supported by the Italian Ministry for University through the project *Reconfigurable platforms for wideband wireless communications* (PRIMO).

A preliminary version of this work was presented in [9]. The novelty of this paper lies in the proposed methodology, which is able to investigate in depth the causes of SEFIs as far as SEUs affecting the configuration memory are considered. The proposed analysis has been validated by radiation-testing experiments executed on an application mapped on an SRAM-based FPGA. Thanks to this approach it is possible to identify the most critical bits within the configuration memory, i.e., those that may cause a SEFI if affected by a SEU. The proposed approach could be a valid support to the designer in two directions: improving the reliability of the application through an efficient placement and routing procedure, and improving the dependability parameters of the target application at the different stages of the design flow by reducing the number of critical configurations.

The remainder of this paper is organized as follows. Section 2 summarizes the available information on the configuration memory organization and content and describes our approach to determine the meaning of every bit. Section 3 describes the radiation-testing environment and the performed experiments; Section 4 describes the results we obtained by integrating results of radiation testing with those coming from the previous analysis, while Section 5 draws some conclusions.

## 2. SEU effects analysis

The main goal of the proposed technique is to analyse SEU effects in FPGA-based applications early in the design phase, in particular as soon as the placed and routed model of the designed circuit is available. We can thus investigate the effects of SEUs affecting the device configuration memory, aiming at identifying the modifications they introduce in the circuit implemented by the FPGA.

In the present paper we are considering the *Virtex* XCV300 FPGA from Xilinx, selected as a valid representative of the class of SRAM-based FPGAs: however, the same approach can be followed for similar devices of the same family.

The XCV300 FPGA is based on a SRAM configuration memory and it features a 32x48 TILE matrix with almost 7,000 equivalent logic cells, 320,000 system gates and 64 kbits of embedded RAM.

In order to analyze the effects of SEUs, we first decoded the information stored inside the device configuration memory, thus becoming able to precisely associate each bit in it with the corresponding FPGA resource. These bits define how the FPGA resources are used to form a netlist implementing the circuit mapped on the FPGA. In other words, these bits determine how the CLBs are connected and which functions the LUTs inside the CLBs implement. For the *Virtex* device we obtained a map where all the 864 configuration bits for each TILE are organized as follows:

- *North, Middle and South Switch Box*: they control the routing of IO signals between the considered CLB and the surrounding CLBs (as shown in Fig. 1);
- *Internal interconnections*: they control the routing of signals within each of the two slices composing a CLB;
- *Control resources*: they define the behaviour of the programmable resources within a CLB;
- *LUTs*: they store the truth table for the combinational functions implemented by the CLB.



Fig. 1: The TILE schematic, composed of two programmable Control Logic Blocks and an internal interconnection layer managed by the North, Middle and South Switch Boxes.

In order to perform the device configuration memory decoding, we identified all the possible configurations for a given resource by considering its configuration bits, modifying them one by one and recording the introduced modification of the resource configuration. By repeating this process for all the FPGA resources, we were able to identify all the possible effects of a SEU in the device configuration memory.

All the information about the configuration of FPGA resources implementing a given design, i.e., the configuration memory, is stored in the Native Circuit Description (NCD) file. The following proprietary Xilinx tools aim to analyze the different descriptions of the implemented circuits:

- NCD2XDL generates a high-level description of the circuit mapped onto the device, which can be edited to modify the internal resources and thus introduce all the possible modifications;
- XDL2NCD executes the reverse operation, generating an NCD file from a high-level description;
- BITGEN converts the original and the modified NCD files into bitstreams, which can be analysed in order to investigate the effects of the introduced modifications.

The approach we followed, shown in Fig. 2, started from the analysis of the modifications of the configuration memory due to a modification of a single programmable resource.



Fig. 2: Bit stream analysis flow.

Thanks to this preliminary analysis we were able to understand the correspondence between the configuration memory and the allocated FPGA resource, and then to know the effects of a bit stream modification. We omit the implementation details of this learning process.
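The comparison step of this flow can be illustrated with a minimal sketch. It assumes that the original and the modified bitstreams are available as equal-length byte strings and that bits are numbered MSB-first within each byte. Both conventions, as well as the names `diff_bits` and `learn_resource_bits`, are illustrative assumptions and are not part of the Xilinx tool flow.

```python
from typing import Dict, List


def diff_bits(reference: bytes, modified: bytes) -> List[int]:
    """Return the indices of the bits that differ between two bitstreams.

    Assumption: bits are numbered MSB-first inside each byte.
    """
    if len(reference) != len(modified):
        raise ValueError("bitstreams must have the same length")
    changed = []
    for byte_index, (a, b) in enumerate(zip(reference, modified)):
        delta = a ^ b  # a bit is set in delta where the two bytes differ
        for bit in range(8):
            if delta & (0x80 >> bit):
                changed.append(byte_index * 8 + bit)
    return changed


def learn_resource_bits(reference: bytes,
                        variants: Dict[str, bytes]) -> Dict[int, str]:
    """Associate each changed bit with the resource whose modification caused it.

    `variants` maps a (hypothetical) resource label to the bitstream obtained
    after modifying only that resource in the high-level description.
    """
    bit_to_resource = {}
    for resource, bitstream in variants.items():
        for index in diff_bits(reference, bitstream):
            bit_to_resource[index] = resource
    return bit_to_resource
```

Repeating the comparison for one modified resource at a time yields a table that maps configuration bits back to resources, which is the kind of correspondence the analysis relies on.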

In the following we will detail the results obtained analyzing the effects of SEUs affecting the configuration memory.

## 2.1. CLB resources

A part of the bitstream stored in the configuration memory is devoted to managing the CLB resources. This set of 192 bits is used to:

- describe the content of the LUTs,
- program the CLB internal routing by selecting a MUX via,
- decide how the CLB internal structure works (a LUT can be used as LUT or RAM or ROM, while the embedded Flip Flop can work like a Flip Flop or a latch with high or low set/reset and synchronous or asynchronous reset).

A SEU that modifies a bit corresponding to a CLB resource can produce an anomalous behavior of the mapped circuit, depending on the involved resource:

- *LUT defect*, a SEU affecting a bit controlling the LUT content implies a modification of the logic function implemented;
- *MUX defect*, a SEU affecting a MUX selection bit causes a new path to the exit points of the CLB;
- *Initialization defect*, a SEU affecting an initialization bit produces a modification of the behavior of the internal components of the CLB.
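As an illustration of the LUT defect, the following sketch models a LUT as a truth table packed into an integer and flips one of its bits. The 4-input size, the packing (input combination `a` selects bit `a`) and the function names are assumptions made for the example, not the actual bit layout of the device.

```python
def lut_eval(truth_table: int, address: int) -> int:
    """Output of a LUT whose truth table is packed into an integer.

    Assumption: input combination `address` selects bit `address`.
    """
    return (truth_table >> address) & 1


def flip_bit(truth_table: int, bit: int) -> int:
    """Model a SEU by inverting one bit of the stored truth table."""
    return truth_table ^ (1 << bit)


# 4-input AND: the output is 1 only when all inputs are 1 (address 15).
AND4 = 0x8000
faulty = flip_bit(AND4, 3)

# Input combinations for which the faulty LUT disagrees with the original.
affected = [a for a in range(16) if lut_eval(AND4, a) != lut_eval(faulty, a)]
```

A single upset changes the implemented function for exactly one input combination, so whether it is observed at the circuit outputs depends on whether the applied stimuli ever exercise that combination.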

## 2.2. Routing resources

In SRAM-based Xilinx devices, signal routing takes place through interconnection matrices named Programmable Interconnection Points (PIPs). The place and route tool may implement any net connecting two circuit modules by joining several PIPs, each belonging to an interconnection bridge (either the north or the south one). As a result, a SEU in the configuration bits of a north/south interconnection bridge may modify one PIP, possibly interrupting the signal propagation among CLBs and, on a larger scale, among circuit modules.



Fig. 3: Schematic representation of the interconnection matrix implemented by one PIP that may be used for connecting input signals (IN\_0 to IN\_11) coming from FPGA resources to output signals (OUT0 to OUT7). The figure depicts the fault-free situation where the PIP implements the routed nets Net\_1 and Net\_2 as defined by the place and route tool.

Starting from the fault-free configuration of the interconnection bridge presented in Fig. 3, we identified the fault effect scenarios presented in the following:

- *Open*: the PIP configuration corresponding to Net\_1 is set to the open state, so that IN\_0 and OUT1 are no longer connected. Fig. 4 reports two different cases where the SEU effect can be classified as Open: in Fig. 4.a the routed net Net\_1 is deleted, while in Fig. 4.b a new net Net\_X is inserted connecting an unused input node with a used output node. As a result, the CLBs (or the output pads) that were fed with the signal previously travelling over Net\_1 become dangling;



Fig. 4: The SEU deletes a routed net introducing an open connection.

- *Bridge*: a new PIP, called Net\_X, is enabled, while Net\_1 is deleted as in the Open case, as shown in Fig. 5. The new PIP may influence the behaviour of the implemented circuit, since the CLBs or output pads originally driven by the deleted net are now driven by an unknown logic value;



Fig. 5: The SEU introduces a new path between used nodes.

- *Input Antenna*: a new PIP, called Net\_X, starting from an unused input node is connected to a used output node, as shown in Fig. 6. The new PIP may influence the behaviour of the implemented circuit, since the CLBs or output pads are driven by an unknown logic value;



Fig. 6: The SEU introduces a new path between an unused input node and a used output one.

- *Output Antenna*: a new PIP, called Net\_X, starting from a used input node is connected to an unused output node, as shown in Fig. 7. The new PIP does not influence the behaviour of the implemented circuit, since the CLBs or output pads are unused;



Fig. 7: The SEU introduces a new path between a used input node and an unused output one.



Fig. 8: The SEU introduces a new path between used nodes.

- *Conflict*: a new PIP, called Net\_X, links an input and an output node, both used, as shown in Fig. 8. The new PIP creates a conflict, resulting in the propagation of unknown values to the CLBs (or output pads) fed with the output node;
- *None*: the PIP configuration is not affected, since the fault modified an unused portion of the device configuration memory;
- *Others*: the PIP modification cannot be classified in any of the above classes.
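The scenarios above can be expressed as a small decision procedure. The sketch below is one possible reading of the definitions, not the authors' tool: a PIP configuration is modelled as a set of (input node, output node) pairs, a node is considered used if it appears in the fault-free configuration, and faults that add or remove more than one connection are assigned to Others. The function name and the data model are assumptions.

```python
from typing import Set, Tuple

Connection = Tuple[str, str]  # (input node, output node)


def classify_pip_fault(golden: Set[Connection],
                       faulty: Set[Connection]) -> str:
    """Classify the difference between a fault-free and a faulty PIP."""
    used_inputs = {src for src, _ in golden}
    used_outputs = {dst for _, dst in golden}
    removed = golden - faulty
    added = faulty - golden

    if not removed and not added:
        return "None"
    if len(removed) > 1 or len(added) > 1:
        return "Others"  # assumption: multi-connection changes are not modelled

    if removed:
        if not added:
            return "Open"  # routed net simply deleted (Fig. 4.a)
        (src, dst), = added
        (_, old_dst), = removed
        if dst == old_dst and src not in used_inputs:
            return "Open"  # net replaced by one from an unused input (Fig. 4.b)
        if src in used_inputs and dst in used_outputs:
            return "Bridge"  # net deleted, used nodes connected instead
        return "Others"

    (src, dst), = added
    if src in used_inputs and dst in used_outputs:
        return "Conflict"
    if dst in used_outputs:
        return "Input Antenna"
    if src in used_inputs:
        return "Output Antenna"
    return "Others"


# Fault-free configuration with two routed nets (node names are illustrative).
golden = {("IN_0", "OUT1"), ("IN_5", "OUT4")}
```

For instance, `classify_pip_fault(golden, golden | {("IN_0", "OUT4")})` falls in the Conflict class, because both end points of the added connection already belong to routed nets.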

## 3. Radiation-testing Set-up

Radiation testing [10] is an effective solution for understanding the effects of SEUs affecting both the memory elements the design embeds and the configuration memory. Following this technique, a prototype of the system under analysis is exposed to a flux of highly energized particles, originated either by radioactive sources or by particle accelerators, which interacts with both the design memory elements and the configuration memory.

Throughout our irradiation experiments we have tested the same *Virtex* XCV300PQ240-4 FPGA model from Xilinx analyzed in the previous section. Radiation experiments have been carried out using various ion species, from 84 MeV Carbon to 210 MeV Nickel, featuring Linear Energy Transfer (LET) values between 1.6 and 30 MeV·cm<sup>2</sup>/mg.

Our test strategy was based on the continuous monitoring of the outputs of a circuit implemented on the FPGA under test, which was continuously stimulated with a given set of input vectors. As soon as a permanent mismatch on the output values was observed between the expected values and the read ones, i.e., when a SEFI was detected, the test was stopped and the configuration memory content read back.

This operation was performed by a Power PC-based (MPC860) microprocessor system and control hardware implemented in a second Virtex FPGA. In this control circuit we implemented four FIFO buffers and the circuit controlling the writing and reading-back of the configuration memory through the DUT's parallel SELECTmap interface (Fig. 9). The FIFOs were implemented mainly to decouple the data flow between the CPU and the DUT: two FIFOs were used for downloading the configuration data stream, and the other two for reading/writing the stimuli and output data streams. Both the CPU board and the control hardware board were installed very close to the DUT inside the irradiation vacuum chamber. In this way the only connections we had to route outside the chamber were the power supply and the Ethernet link to the Control Host (PC) in the control room. To avoid the destruction of the DUT due to Single Event Latchup (SEL), the DUT power supply current was continuously monitored by a protection circuit outside the chamber.

The target circuit implemented in the DUT was composed of four 16x16-bit binary multipliers. The inputs of the four multipliers were connected in parallel and the outputs were connected to an *XOR* gate array. The main feature of this circuit is that it is purely combinational and it uses a large part of the DUT's resources (about 65%). The absence of user Flip-Flops (FFs) ensures that a SEFI occurs only when a configuration memory register is modified, while sequential circuits could show SEFIs even when any of the user FFs is modified. This kind of circuit was particularly useful for investigating in depth the SEFI generation mechanism when a "critical" SEU (in the configuration memory) hits the device: we introduce the idea of "critical" SEU because not all SEUs necessarily induce a SEFI.



Fig. 9: Experimental set up. The control host is about 50 m away from the irradiation chamber that contains the Power PC-based microprocessor system (CPU), the control hardware (FPGA) and the DUT, all located within a distance of 10 cm.

The device was configured, exposed to the beam and continuously stimulated and monitored from the control hardware. As noted before, when a mismatch was detected on the output sequence, the device configuration memory was read back and stored on a file along with the sequence of output vectors the circuit produced. A number of SEUs depending on the area occupied by the circuit implemented in the DUT has to pile up before a SEFI occurs: the smaller the circuit, the higher this number because "critical" areas of the FPGA are smaller.

The described environment aims to obtain two relevant pieces of information: it allows both the evaluation of the susceptibility of the FPGA architecture, in terms of the SEFI cross section of the implemented circuit, and the analysis of the effects of single SEUs on the behavior of the circuit mapped on the FPGA.

In order to calculate the cross section of the implemented circuit, a first testing procedure has been used: we compared the erroneous configurations read back after each SEFI occurrence with the reference one obtained by the Xilinx place and route tools. For each read-back configuration we recorded 100-200 errors at most, but often fewer than 10 errors were observed, and in 10% of the cases only one corrupted configuration memory bit was detected. As reported in [9], the cross section is strictly dependent on the density of the implemented circuit, and the SEU/SEFI ratio falls proportionally with the number of involved resources.

As we have no chronological information about the order in which the SEUs occurred, the cases where a single error leads to a functional mismatch are particularly interesting, since they allow accurate identification of the errors in the device configuration memory that correspond to SEFIs in the user circuit.
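Selecting these single-error cases amounts to counting the bits that differ between each read-back configuration and the reference one. The sketch below assumes both are available as equal-length byte strings; the function names are illustrative.

```python
from typing import Iterable, List


def flipped_bit_count(reference: bytes, readback: bytes) -> int:
    """Number of bits that differ between the reference and a readback."""
    if len(reference) != len(readback):
        raise ValueError("configurations must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(reference, readback))


def single_upset_readbacks(reference: bytes,
                           readbacks: Iterable[bytes]) -> List[bytes]:
    """Keep only the readbacks that differ from the reference by one bit."""
    return [r for r in readbacks if flipped_bit_count(reference, r) == 1]


# Toy 4-byte configurations standing in for real readback files.
reference = bytes(4)
readbacks = [bytes([0, 0, 0x10, 0]), bytes([1, 1, 0, 0]), bytes(4)]
single = single_upset_readbacks(reference, readbacks)
```

Only the first toy readback is retained, since the second contains two upsets and the third none.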

Since the goal is to classify the effect of a single SEU, a second experimental procedure has been adopted, based on the following assumption: to ease the identification of the "critical" SEUs, a SEFI should occur only when the configuration memory has been corrupted by a small number of SEUs, ideally only one. To achieve this result the DUT was periodically reconfigured, and the length of the reconfiguration period was chosen in such a way that, on average, 1 or 2 SEUs could occur before the DUT was reconfigured. Thanks to this approach we obtained a large number of measurements having only one bit corrupted in the configuration memory.
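The choice of the reconfiguration period can be reasoned about with a simple statistical sketch. It assumes that SEUs arrive as a Poisson process with a constant rate (constant beam flux), which the text does not state explicitly. The rate value and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k upsets when `mean` are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def reconfiguration_period(seu_rate: float, target_mean: float) -> float:
    """Period (s) giving `target_mean` expected SEUs at `seu_rate` SEUs/s."""
    return target_mean / seu_rate


def single_upset_fraction(mean: float) -> float:
    """Among periods with at least one SEU, fraction with exactly one."""
    return poisson_pmf(1, mean) / (1.0 - poisson_pmf(0, mean))


# Hypothetical rate of 0.5 SEU/s and a target of 1.5 SEUs per period.
period = reconfiguration_period(seu_rate=0.5, target_mean=1.5)
```

Under this assumption, lowering the expected number of SEUs per period raises the share of corrupted configurations that contain a single upset, at the price of a larger number of reconfigurations.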

| Ion             | LET (MeV·cm<sup>2</sup>/mg) | SEU/SEFI |
|-----------------|-----------------------------|----------|
| <sup>12</sup>C  | 1.6                         | 33       |
| <sup>16</sup>O  | 3                           | 12       |
| <sup>19</sup>F  | 4.1                         | 8        |
| <sup>28</sup>Si | 8.5                         | 6        |
| <sup>58</sup>Ni | 30                          | 9        |

Table 1: SEU/SEFI ratio

Each of these procedures was applied to the test circuit in sequence for each ion, and Table 1 reports the ratio between the configuration SEU cross section and the SEFI cross section for each ion. This ratio corresponds to the average number of errors in the configuration memory needed to induce a SEFI in the user circuit. The average value for the device under test is 14, indicating that several errors must occur in the configuration memory to induce one error on the output of the circuit. This is an important result, because it underlines the necessity of correctly identifying which SEUs can induce an error in the implemented design.
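The quoted average is consistent with the arithmetic mean of the five ratios in Table 1, assuming each ion species carries equal weight (the weighting is not stated explicitly):

$$
\overline{\left(\frac{\sigma_{SEU}}{\sigma_{SEFI}}\right)} = \frac{33 + 12 + 8 + 6 + 9}{5} = \frac{68}{5} = 13.6 \approx 14
$$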

Throughout all the experimental runs we never recorded SEFIs without errors in the configuration memory. Finally, a failure of the controlling circuitry of the FPGA occurred only twice. In those conditions the only possible action for restoring the correct functionality of the FPGA was to switch it off and on again. These errors may have been caused by a SEU in the FPGA controller interfacing the device with our control system.

## 4. Experimental Results

Preliminary experimental results have been obtained by analyzing the effects of SEUs on a device under test composed of four 16x16-bit binary multipliers, mapped on the *Virtex* XCV300PQ240-4 FPGA from Xilinx. Radiation experiments have been carried out at the Tandem Van De Graaff Accelerator of INFN-LNL, Legnaro, Italy. The classification method has been applied to the set of faulty circuits generating a SEFI during the radiation-testing experiments. As described above, during the experiment the bit stream configuration was read back each time a SEFI occurred. The whole set of bit streams, stored in a set of files, has been processed by comparing the faulty and fault-free bitmaps. The differences between them were analyzed and the effect of each SEU was classified, exploiting the analysis described in Section 2.

The results we obtained, reported in Table 2, confirm that the routing resources are the most sensitive to SEU effects, while few faults have been observed inside CLB resources. This is a consequence of the number of bits devoted to managing the interconnections between logic elements and I/O blocks: for each TILE, 78% of the bits can define routing paths.
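This percentage is consistent with the bit counts given in Section 2, under the assumption that the 192 CLB bits of Section 2.1 are the only bits of a TILE that are not devoted to routing:

$$
\frac{864 - 192}{864} = \frac{672}{864} \approx 0.778 \approx 78\%
$$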

As far as CLBs are considered, the MUXs are the most sensitive resources, while, when routing resources are considered, the dominant effects are the Open and the Conflict ones.

These effects are a real challenge for those designers that are involved in devising solutions for hardening their FPGA-based circuits.

| Resource | Effect         | SEFIs [#] | SEFIs [%] |
|----------|----------------|-----------|-----------|
| CLB      | LUT            | 36        | 7.9       |
|          | MUX            | 54        | 11.9      |
|          | Initialization | 0         | 0         |
| Routing  | Open           | 108       | 23.8      |
|          | Bridge         | 66        | 14.5      |
|          | Output Antenna | 0         | 0         |
|          | Input Antenna  | 13        | 2.8       |
|          | Conflict       | 145       | 31.9      |
|          | None           | 0         | 0         |
|          | Others         | 32        | 7.0       |
|          | Total          | 454       |           |

Table 2: Classification of the radiation-testing experiments generating a SEFI.
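The aggregate shares discussed above can be recomputed directly from the counts of Table 2. The short sketch below does so. The dictionary layout and the function name are illustrative choices, while the numbers are those reported in the table.

```python
# SEFI counts from Table 2, keyed by (resource group, effect).
sefi_counts = {
    ("CLB", "LUT"): 36,
    ("CLB", "MUX"): 54,
    ("CLB", "Initialization"): 0,
    ("Routing", "Open"): 108,
    ("Routing", "Bridge"): 66,
    ("Routing", "Output Antenna"): 0,
    ("Routing", "Input Antenna"): 13,
    ("Routing", "Conflict"): 145,
    ("Routing", "None"): 0,
    ("Routing", "Others"): 32,
}

total = sum(sefi_counts.values())


def group_share(group: str) -> float:
    """Percentage of all classified SEFIs that belong to a resource group."""
    count = sum(n for (g, _), n in sefi_counts.items() if g == group)
    return 100.0 * count / total


routing_share = group_share("Routing")
clb_share = group_share("CLB")
```

With these counts, the routing group accounts for 364 of the 454 classified SEFIs (about 80%), and the CLB group for the remaining 90.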

## 5. Conclusions

In this paper we described a method for assessing the effects of SEUs in the device configuration memory of an SRAM-based FPGA. The method combines the results of radiation testing for technology characterization with those obtained by analyzing the meaning of every bit in the FPGA configuration memory. The radiation-testing set-up allowed us to experimentally identify the effects of SEUs in a device under test exposed to heavy ions, and the faulty bitmaps of the configuration memories corresponding to unexpected output results were used to validate the results of the performed analysis. The methodology presented in this paper makes it possible to investigate the critical bits responsible for a failure and to classify them according to the affected resource. The current analysis confirmed that an erroneous modification of the bits coding either the CLB or the interconnection resources can cause a failure in the application, and showed that the FPGA interconnection resources are the most sensitive to SEUs.

We are now in the position of forecasting the effects of any SEU affecting the configuration memory, and an automatic tool to perform this operation is under construction.

Future work will also consider the possibility of adopting the present methodology in order to tune the place and route algorithm and to introduce suitable redundancy, aiming at reducing the probability that a SEU could modify the behavior of the application.

## 6. References

- [1] P. Chow, Soon Ong Seo, J. Rose, K. Chung, G. Paez-Monzon, I. Rahardja, "The design of an SRAM-based field-programmable gate array. I. Architecture", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 7 Issue 2, June 1999, pp. 191-197
- [2] M. Nicolaidis, "Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies", IEEE 17th VLSI Test Symposium, April 1999, pp. 86-94
- [3] M. Ceschia, A. Paccagnella, S.-C. Lee, C. Wan, M. Bellato, M. Menichelli, A. Papi, A. Kaminski and J. Wyss, "Ion Beam Testing of ALTERA APEX FPGAs", NSREC 2002 Radiation Effects Data Workshop Record
- [4] R. Katz, K. LaBel, J.J. Wang, B. Cronquist, R. Koga, S. Penzin and G. Swift, "Radiation Effects on Current Field Programmable Technologies", IEEE Trans. on Nuclear Science, Vol. 44, No. 6, 1997, pp. 1945-1956
- [5] B. L. Bhuva, J. J. Paulos, R. S. Gyurcsik, S. E. Kerns, "Switch-Level Simulation of Total Dose Effects on CMOS VLSI Circuits", IEEE Transactions on Nuclear Science, Vol. 8, No. 9, 1989, pp. 933-938
- [6] N. Kaul, B. L. Bhuva, S. E. Kerns, "Simulation of SEU Transients in CMOS IC", IEEE Transactions on Nuclear Science, Vol. 38, No. 6, 1991, pp. 1514-1520
- [7] M. P. Baze, S. Buchner, W. G. Bartholet, T. A. Dao, "An SEU Analysis Approach for Error Propagation in Digital VLSI CMOS ASICs", IEEE Transactions on Nuclear Science, Vol. 42, No. 6, 1995, pp. 1863-1869
- [8] L. W. Massengill, A. E. Baranski, D. O. Van Nort, J. Meng, B. L. Bhuva, "Analysis of Single-Event Effects in Combinational Logic-Simulation of the AM2901 Bitslice Processor", IEEE Transactions on Nuclear Science, Vol. 47, No. 6, 2000, pp. 2609-2615
- [9] M. Violante, M. Ceschia, M. Sonza Reorda, A. Paccagnella, "Analyzing SEU Effects in SRAM-based FPGAs", IEEE On-Line Testing Symposium 2003, pp. 119-123
- [10] J.J. Wang, R.B. Katz, J.S. Sun, B.E. Cronquist, T.M. Speers, and W.C. Plants, "SRAM-based Re-programmable FPGA for Space Applications", IEEE Transactions on Nuclear Science, Vol. 46, No. 6, Dec. 1999, pp. 1728-1735