Understanding radiation effects in SRAM-based field programmable gate arrays for implementing instrumentation and control systems of nuclear power plants

1. Introduction

Field programmable gate arrays (FPGAs) are a well-established technology in applications such as aerospace, automotive, medical, and high-performance computing and data storage. However, FPGAs are not yet widely used in nuclear power plant (NPP) instrumentation and control (I&C) systems. The International Atomic Energy Agency recommends the use of FPGAs instead of analog- and microprocessor-based systems [1] in future and existing nuclear I&C systems to improve reliability and to overcome fast obsolescence. At present, only a few reactors in the world use FPGA-based systems for their I&C, as shown in Table 1 [2]; most of those systems are implemented using antifuse FPGAs. However, SRAM-based FPGAs benefit from the most up-to-date fabrication processes, on par with complementary metal–oxide–semiconductor (CMOS) process technology; they also offer much higher integration and logic capacity than flash- or antifuse-based FPGAs [3], [4]. In addition, SRAM-based FPGAs can be reconfigured a practically unlimited number of times without any degradation in their performance. These features make SRAM-based FPGAs more suitable for implementing complex designs. Because the defense in depth concept is applied in I&C architecture, programmable logic device–based designs vary in their applications and importance. For example, programmable logic device–based designs are used to develop instrumentation for shutdown systems (design assurance level: high) and instrumentation for simple data acquisition systems (design assurance level: moderate to low). SRAM-based FPGAs are primarily targeted at designs in which the required assurance level is moderate to low.

The typical cross-section data [5] for SRAM-based FPGAs suggest that the failure rate due to irradiation in installed locations, expressed in failures in time (FIT), is much lower than the overall target failure rate of the system; i.e., the selected device is not the weakest link in the structure and can be safely used. SRAM-based FPGAs provide the added advantage of configuration readback, which enables system-level diagnostics to reprogram the FPGA in case of an error, a feature that is missing in antifuse- and flash-based FPGAs. Flash- and antifuse-based FPGAs are mostly applied in safety systems, which are kept as simple as possible to enhance reliability. The capabilities of SRAM-based FPGAs for complex computations and for dynamic and partial reconfiguration at runtime [6] are rarely required for these systems. However, core temperature-monitoring systems in fast reactors, which are tasked with core supervision for early detection of core anomalies such as plugging of fuel subassemblies and errors in core loading, are a notable exception [7]. These systems require substantial input/output (I/O) handling capability and processing power and are usually implemented using a microprocessor-based system. SRAM-based FPGAs with large logic processing capacities are ideal candidates for hardware implementation of such a system and hence require a detailed study. Although SRAM-based FPGAs have numerous advantages, they are vulnerable to radiation effects due to either transient or cumulative radiation exposure [8].
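The comparison between a device failure rate and a system target can be sketched numerically. The snippet below is a minimal illustration, assuming the usual first-order relation (upset rate = per-bit cross-section × particle flux × number of bits, scaled to 10^9 device-hours). The function name and every input value are hypothetical placeholders, not data from [5] or from any plant location.

```python
def fit_rate(cross_section_cm2_per_bit, flux_per_cm2_per_hour, n_bits):
    """Return the upset rate in FIT (failures per 1e9 device-hours)."""
    upsets_per_hour = cross_section_cm2_per_bit * flux_per_cm2_per_hour * n_bits
    return upsets_per_hour * 1e9


# Hypothetical inputs, chosen only to make the arithmetic concrete.
SIGMA_BIT = 1e-14      # cm^2/bit, assumed per-bit SEU cross-section
FLUX = 13.0            # particles/(cm^2 h), assumed flux at the installed location
N_BITS = 20_000_000    # assumed number of configuration bits
TARGET_FIT = 10_000.0  # assumed system-level failure-rate budget

device_fit = fit_rate(SIGMA_BIT, FLUX, N_BITS)
print(f"device upset rate: {device_fit:.0f} FIT")
print("below system target" if device_fit < TARGET_FIT else "exceeds system target")
```

In practice the cross-section would come from irradiation tests of the specific device and the flux from a survey of the installed location.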

Table 1. FPGA-based I&C Systems in NPPs.

Nuclear reactor/company	Type & status	FPGA-based systems
Prototype Fast Breeder Reactor, India	Fast breeder reactor, under construction	Reactor core central subassembly temperature monitoring system, primary sodium pump reactivity meter, VME (Versa Module Europa) bus–based CPU card, analog I/O cards, and digital I/O cards
CANDU (CANadian Deuterium Uranium) Reactor	Pressurized heavy water reactor, operational	System implemented the logic for shutdown system No.1
Lungmen Nuclear Power Plant, Taiwan	Boiling water reactor, unfinished	Reactor protection system
Wolf Creek Generating Station, USA	Pressurized water reactor, operational	Main steam and feed-water isolation system
The Ukrainian RPC (Research and Production Company) Radiy for Ukrainian and Bulgarian NPPs	—	Reactor trip system, reactor power control and limitation system, power equipment for rods control system, and regulation and monitoring control and protection system for research reactors.
Rolls-Royce and Electricite de France	Pressurized water reactor	Rod control systems

FPGA, field programmable gate array; I&C, instrumentation and control; NPP, nuclear power plant.

The design to be implemented in an FPGA is converted into a bitstream and downloaded into the device. The bitstream is stored in the configuration memory, which holds the functionality and the routing of the design mapped onto the FPGA. The configuration memory, which consists of an array of SRAM memory cells, forms the configuration layer together with the configuration access ports and control logic. The user logic, user memory, and I/O resources form the application layer. The current state of the functionality is stored in the user memory [9], [10]. The configuration memory is organized as an array of frames; each bit is stored in a static RAM cell, as shown in Fig. 1. These configuration memory cells implement the lookup tables (LUTs), control the multiplexers, and drive other control elements. A LUT stores its truth table in configuration memory cells, which thereby implement the combinational logic function. The interconnection structure includes programmable interconnection points, mostly pass transistors that are controlled by the value stored in a configuration memory cell [11]. The selection line values of the multiplexers and other programmable elements are also stored in configuration memory cells. The registers [flip-flops (FFs) and latches] and on-chip memory [block RAM (BRAM)] bits hold the current state of the circuit [12]. Within the configuration layer, the configuration memory bits are highly prone to radiation effects; the bits dedicated to routing resources are more vulnerable than the bits dedicated to logic resources [13]. In the application layer, BRAM is highly susceptible to radiation effects, whereas registers and I/O resources have medium to low susceptibility [14].
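Because a LUT is simply a truth table held in configuration memory cells, a single flipped cell silently changes the implemented logic function. The following sketch models a 2-input LUT as an integer whose bit i is the output for input index i; the helper names and the choice of which bit is upset are illustrative assumptions, not part of any vendor tool.

```python
def lut_eval(truth_table, inputs):
    """Return the LUT output for a tuple of input bits (first input = LSB of the index)."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (truth_table >> index) & 1


def flip_bit(word, position):
    """Model an upset by inverting one configuration bit."""
    return word ^ (1 << position)


def truth_rows(truth_table, n_inputs=2):
    """List the outputs for input indices 0 .. 2**n_inputs - 1."""
    return [(truth_table >> i) & 1 for i in range(2 ** n_inputs)]


AND2 = 0b1000                   # 2-input AND: only index 3 (a=1, b=1) yields 1
upset_lut = flip_bit(AND2, 1)   # hypothetical upset in the cell for a=1, b=0

print("golden truth table:", truth_rows(AND2))
print("upset truth table: ", truth_rows(upset_lut))
```

After the flip, the table outputs the value of input a regardless of b, so the AND gate has effectively become a wire. The error persists until the configuration memory is rewritten.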

This article is organized as follows. Section 2 describes the major sources of radiation effects and the ways in which particle radiation interacts with matter. Section 3 discusses the radiation effects in metal-oxide-semiconductor (MOS) structures, especially SRAM-based FPGAs, with emphasis on single event upsets (SEUs). Section 4 explains the measurement of radiation upset sensitivity and the effects of various irradiation experiments on SRAM-based FPGAs. Section 5 presents the main SEU mitigation techniques for configuration memory and user logic and compares them based on their mitigation efficiency. Concluding remarks are given in Section 6.

2. Sources of radiation effects

FPGAs can be affected by gamma photons and by heavy particles such as neutrons and alpha particles. When electronic devices are exposed to gamma ray photons, the energy of the photons is deposited in the devices, mainly by ionization. The energy required to form an electron–hole pair is the ionization energy, whereas the cumulative energy absorbed by the circuit per unit mass during the whole exposure is the total ionizing dose (TID) [15]. Ionization can take place directly by the gamma photons themselves or indirectly by secondary recoil particles. The major damaging effects due to gamma photons are single event effects (SEEs) and TID effects caused by increased conductivity and trapped charges in the electronic devices. Neutron interaction with matter is dominated by collisions with nuclei, leading to either scattering or absorption. In elastic scattering, a neutron collides with a nucleus and is scattered in a different direction; the energy lost by the neutron is gained by the target nucleus. In inelastic scattering, the neutron strikes a nucleus, forming a compound nucleus, and the deexcitation of the nucleus produces gamma radiation.

Neutron absorption reactions include radiative capture and nuclear fission. A neutron can be captured by a nucleus through one of the following nuclear reactions: (n, p), (n, α), or (n, γ). Elastic scattering is more probable for high-energy neutrons, and capture is more likely for low-energy ones. The secondary particles generated by the neutron interaction can cause ionization in the target material. For example, an alpha particle generated in this way has a very high linear energy transfer and deposits its whole energy, ionizing the material. Neutrons generally cause displacement damage dose (DDD) effects and SEEs in the targeted devices. The types of particle interactions and the primary and secondary effects they cause are summarized in Table 2 [16], [17], [18], [19].

Table 2. Types of particle interaction.

Radiation type	Energy range	Type of interaction	Primary effects	Secondary effects
Photons	<0.1 MeV	Photoelectric effect	Ionizing phenomena	Displacement damage
	0.3–3 MeV	Compton effect
	>1.022 MeV	Pair production
Neutrons	∼0.025 eV	Slow diffusion and capture by nuclei	Displacement damage	Ionizing phenomena
	<10 MeV	Elastic scattering, capture, and nuclear excitation
	>10 MeV	Elastic, inelastic scattering, various nuclear reactions, and secondary charged reaction products
Alpha particles	Typical 4–8 MeV	Coulomb attraction	Ionization phenomena	--

3. Radiation effects in SRAM-based FPGA

3.1. TID effects

TID effects depend on the dose rate, the type of radiation applied, the internal electric field including space charge effects [20], device geometry [21], [22], operating temperature, time after irradiation (annealing or rebound) [23], [24], and so on. Ionizing radiation is responsible for the buildup of charge in the SiO2 and at the Si/SiO2 interface. These trapped charges affect the electrical parameters of the MOS transistor, of which the threshold voltage (Vth) is the most important [25]. Other effects include a decrease of transconductance, an increase of leakage current, a reduction of drain-source breakdown voltage, a deterioration of noise parameters, and a reduction in surface mobility [26], [27].

n-type metal-oxide-semiconductor (NMOS) transistors are more vulnerable to radiation and undergo threshold voltage shifts more readily than p-type metal-oxide-semiconductor (PMOS) transistors. The positive threshold voltage of NMOS transistors can either decrease or increase, as shown in Fig. 2 [28]. Initially, the charge sheet moves toward the interface due to the positive gate bias voltage; the threshold voltage decreases when the oxide trapped charge (Qot) effect dominates. The threshold voltage can shift back in the positive direction as the charge deposition increases. The threshold voltage shift in the PMOS transistor is shown in Fig. 3 [29]. PMOS transistors, which have holes as charge carriers, are slower and carry less current than NMOS transistors, which have electrons as carriers [30], so a PMOS transistor must be made larger than an NMOS transistor to deliver the same drive current. Given a constant area of influence of a radiation event, the percentage of the affected area of the NMOS is two to three times that of the PMOS, and hence, PMOS is more tolerant. In modern processes, short channel effects, such as velocity saturation, reduce this ratio to a much lower value [31]. In this context, an isolated NMOS will be more vulnerable than a PMOS. From another perspective, the change in threshold voltage of MOS devices depends on the electric field in the silicon dioxide [32]. Therefore, the biasing voltage has a significant influence on the generated and trapped charge. The threshold voltage shift can be expressed as the sum of two voltage changes, one caused by the increase of the oxide trapped charge (Qot) and the other by the interface trapped charge (Qit) [33]. The effects of the trapped charge and of the interface state formation are additive in PMOS devices; the source of the differential in NMOS devices likely lies in the difference in worst-case logic bias conditions for PMOS and NMOS transistors [34].

The position of the built-up charge strongly depends on the gate bias voltage, and thus, the smaller the distance between the gate terminal and the charge sheet, the less additional electric field is observed and the less the threshold voltage is shifted. The distance is greater for the PMOS transistor because of negative biasing; thus, PMOS is more radiation resistant than NMOS [15]. Charges trapped in the MOS oxide shift the threshold voltage negatively in NMOS, leading to unacceptable drain-source leakage current. In PMOS, the opposite occurs: the threshold increases and the leakage is reduced [35].
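The decomposition of the threshold voltage shift into an oxide-trapped and an interface-trapped contribution can be illustrated with a first-order charge-sheet estimate. This is a sketch under stated assumptions rather than the model used in [33]: both charge sheets are placed at the Si/SiO2 interface, so each contributes q·N/Cox; trapped oxide charge is taken as positive; and interface traps are taken as net negative in NMOS and net positive in PMOS. The trap densities and oxide thickness are hypothetical.

```python
Q_E = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-14  # vacuum permittivity, F/cm
EPS_R_SIO2 = 3.9         # relative permittivity of SiO2


def oxide_capacitance(t_ox_cm):
    """Gate oxide capacitance per unit area, F/cm^2."""
    return EPS0 * EPS_R_SIO2 / t_ox_cm


def threshold_shift(n_ot_cm2, n_it_cm2, t_ox_cm, device):
    """First-order threshold shift (V) = oxide-trap term + interface-trap term."""
    if device not in ("nmos", "pmos"):
        raise ValueError("device must be 'nmos' or 'pmos'")
    c_ox = oxide_capacitance(t_ox_cm)
    dv_ot = -Q_E * n_ot_cm2 / c_ox              # positive oxide charge: negative shift
    sign_it = 1.0 if device == "nmos" else -1.0  # assumed polarity of interface traps
    dv_it = sign_it * Q_E * n_it_cm2 / c_ox
    return dv_ot + dv_it


# Hypothetical trap densities (cm^-2) and a 10 nm oxide (1e-6 cm).
for device in ("nmos", "pmos"):
    shift = threshold_shift(2e11, 1e11, 1e-6, device)
    print(f"{device}: delta Vth = {shift * 1e3:.1f} mV")
```

Under these assumptions the two terms partly cancel in the NMOS case, so the net shift can have either sign, while in the PMOS case both terms are negative and add.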

3.2. DDD effects

The DDD quantifies the displacement damage to the semiconductor lattice due to the impact of energetic particles. If the transferred energy is higher than the displacement energy, a lattice atom is removed from its original position in the lattice and a defect is created [20]. A cascade of disruptions in the silicon lattice is possible with exposure to higher energy particles. The main types of displacement defects are vacancies, divacancies, interstitials, and Schottky and Frenkel defects, as shown in Fig. 4 [18].
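The displacement condition can be made concrete for elastic neutron scattering. The sketch below assumes non-relativistic two-body kinematics, in which a head-on collision transfers at most a fraction 4A/(1+A)^2 of the neutron energy to a nucleus of mass number A. The displacement energy used for silicon is an assumed round value (reported values differ between sources), and the function names are illustrative.

```python
def max_energy_fraction(mass_number):
    """Largest fraction of neutron energy transferable in one elastic collision."""
    return 4.0 * mass_number / (1.0 + mass_number) ** 2


def min_neutron_energy_ev(displacement_energy_ev, mass_number):
    """Neutron energy at which a head-on elastic hit just reaches the displacement energy."""
    return displacement_energy_ev / max_energy_fraction(mass_number)


SI_MASS_NUMBER = 28
SI_DISPLACEMENT_EV = 21.0  # assumed displacement energy for silicon

fraction = max_energy_fraction(SI_MASS_NUMBER)
threshold = min_neutron_energy_ev(SI_DISPLACEMENT_EV, SI_MASS_NUMBER)
print(f"max fraction transferred to Si: {fraction:.3f}")
print(f"neutron energy needed for a direct displacement: {threshold:.0f} eV")
```

With these assumptions, a neutron needs on the order of a hundred electronvolts or more to displace a silicon atom by elastic scattering alone; lower-energy neutrons would have to act through capture reactions and their secondary products.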

The displacement damage changes the arrangement of the atoms in the crystal lattice, creating lasting damage and increasing the number of recombination centers, depleting the minority carriers, and worsening the electronic properties of the affected semiconductor junctions.

3.3. Single event effects

An SEE is caused by a single energetic particle, which generates electrical charge in a material according to the amount of energy the ionizing particle transfers to the material per unit path length; this quantity is known as linear energy transfer (LET) [36]. LET is expressed in MeV μm⁻¹; it can also be expressed in MeV cm² g⁻¹ when it is normalized to the density of the absorbing material [37]. The critical LET, or LET threshold (LETth), is the maximum LET value deposited by a high-energy particle travelling through a semiconductor device for which failure is not yet observed. When the created electron–hole pairs are expressed as a charge, the minimum charge necessary to create an SEE is called the critical charge [38]. SEEs can be classified as soft errors and hard errors [39], as illustrated in Fig. 5. Hard errors are nonrecoverable and can permanently damage the hardware, as in the case of a burnout resulting from a short circuit. A soft error is a change in a signal or a data bit flip and can occur in logic modules, I/Os, routing resources, and block RAMs, i.e., in virtually any part of the FPGA. When a soft error occurs, the device may still function correctly or may exhibit partial functionality [40]. Unlike hard errors, soft errors can be detected and corrected through special design techniques without having to power-cycle the device.
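The link between LET and deposited charge can be sketched with a short unit conversion. The example assumes silicon with a density of 2.33 g/cm³ and a mean energy of 3.6 eV per electron–hole pair, takes LET in the commonly used unit MeV·cm²/mg, and treats the LET as constant along the track. The critical charge and track length in the usage lines are hypothetical.

```python
Q_E = 1.602176634e-19           # elementary charge, C
SI_DENSITY_MG_PER_CM3 = 2330.0  # assumed silicon density
EV_PER_PAIR_SI = 3.6            # assumed mean energy per electron-hole pair in Si


def charge_per_um_fc(let_mev_cm2_per_mg):
    """Charge generated per micrometre of track in silicon, in fC."""
    mev_per_cm = let_mev_cm2_per_mg * SI_DENSITY_MG_PER_CM3
    ev_per_um = mev_per_cm * 1e6 / 1e4  # MeV -> eV and cm -> um
    pairs_per_um = ev_per_um / EV_PER_PAIR_SI
    return pairs_per_um * Q_E * 1e15


def deposited_charge_fc(let_mev_cm2_per_mg, path_um):
    """Total charge generated along a track of the given length, in fC."""
    return charge_per_um_fc(let_mev_cm2_per_mg) * path_um


# Hypothetical strike: LET of 5 MeV cm^2/mg over a 1 um sensitive depth.
q_dep = deposited_charge_fc(5.0, 1.0)
Q_CRIT_FC = 2.0  # assumed critical charge of the struck node
print(f"deposited charge: {q_dep:.1f} fC")
print("upset possible" if q_dep > Q_CRIT_FC else "no upset expected")
```

Only part of the generated charge is actually collected at a node, so this estimate is an upper bound on the collected charge for the assumed path length.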

3.3.1. Soft errors

The capacitance and voltage levels of logic circuits play a significant role in the generation of soft errors; the higher these parameter values are, the lower the probability of a soft error. The critical charge value varies for each node in the FPGA. The capacitance of the internal nodes of SRAM cells is very low compared to that of FFs; therefore, less charge deposition is required to alter the value stored in an SRAM cell. Soft errors are mainly classified into two types: single event transients (SETs) and SEUs.
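The dependence on capacitance and voltage can be illustrated with the simplest first-order estimate, in which the critical charge is approximated as node capacitance times supply voltage. This estimate ignores the restoring current of the feedback transistors, and the capacitance values below are hypothetical, chosen only to contrast a small SRAM node with a larger flip-flop node.

```python
def critical_charge_fc(node_capacitance_ff, supply_voltage_v):
    """First-order critical charge in fC (fF x V = fC); ignores restoring current."""
    return node_capacitance_ff * supply_voltage_v


# Hypothetical node capacitances at a 1.0 V supply.
sram_qcrit = critical_charge_fc(1.0, 1.0)  # assumed SRAM internal node, 1 fF
ff_qcrit = critical_charge_fc(5.0, 1.0)    # assumed flip-flop node, 5 fF

print(f"SRAM node: {sram_qcrit:.1f} fC, flip-flop node: {ff_qcrit:.1f} fC")
```

Lowering either the capacitance or the supply voltage reduces the estimate proportionally, which matches the qualitative trend described above.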

The basic mechanism of soft error generation is illustrated in Fig. 6 [41]. When charged particles pass through the device material, they generate electron–hole pairs. The most susceptible parts are generally reverse-biased p-n junctions. The charge carriers are collected by the electric field and drift to the nearby node, where a current/voltage transient is created. The majority of the charge is collected by a rapid drift process, which is followed by a slower diffusion process, as shown in Fig. 7 [41]. A funnel-shaped extension of the depletion region enhances the drift collection; therefore, more charge can be collected at the node [42], [43].

3.3.1.1. Single event transient

A SET is a current or voltage spike generated by a particle strike. A SET may remain a harmless glitch in the circuit, or it may be captured in FFs or other memory elements and cause a functional error in the operation of the device [44]. SETs are therefore not always harmful to the device and may be transitory in nature. The probability of transient pulse capture increases with clock speed [45]. As SET captures are asynchronous, it is impossible to predict them by static timing analysis. The generation of a transient pulse and its capture are shown in Fig. 8.
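The dependence of capture probability on clock speed can be sketched with a simplified latching-window model. The assumptions are that the transient arrives at a uniformly random time within the clock period and is captured only if it covers the whole setup-plus-hold window of the flip-flop. The model and all numbers are illustrative and are not taken from [45].

```python
def latch_probability(pulse_width_ns, window_ns, clock_mhz):
    """Probability that a randomly timed transient is captured in one clock period."""
    period_ns = 1000.0 / clock_mhz
    effective_ns = max(0.0, pulse_width_ns - window_ns)  # pulse must span the window
    return min(1.0, effective_ns / period_ns)


# Hypothetical 0.5 ns transient and 0.1 ns setup+hold window.
for f_mhz in (100.0, 200.0, 400.0):
    p = latch_probability(0.5, 0.1, f_mhz)
    print(f"{f_mhz:.0f} MHz: capture probability {p:.3f}")
```

In this model the capture probability grows linearly with clock frequency until it saturates, and transients narrower than the latching window are never captured.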

3.3.1.2. Single event upset

An SEU is a soft error caused by a transient signal induced by a single energetic particle strike when the collected charge is greater than the critical charge required to change the state of a memory cell, register, latch, or FF. For 0.5 μm technology, the critical charge required to cause an SEU is roughly in the range of femtocoulombs. SEU sensitivity is measured by the cross-section and is expressed in cm²/bit or cm²/device. The most sensitive regions in SRAM cells are the reverse-biased drain junctions of transistors biased in the off state [46], [47].
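The cross-section follows directly from test data as the number of observed upsets divided by the particle fluence, optionally normalized per bit. The short sketch below shows the arithmetic; the upset count, fluence, and bit count are hypothetical and do not correspond to any measurement cited here.

```python
def device_cross_section(n_upsets, fluence_per_cm2):
    """SEU cross-section of the whole device, in cm^2/device."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return n_upsets / fluence_per_cm2


def bit_cross_section(n_upsets, fluence_per_cm2, n_bits):
    """SEU cross-section per memory bit, in cm^2/bit."""
    return device_cross_section(n_upsets, fluence_per_cm2) / n_bits


# Hypothetical test run: 250 upsets after 1e10 particles/cm^2 on 2e7 bits.
sigma_device = device_cross_section(250, 1e10)
sigma_bit = bit_cross_section(250, 1e10, 20_000_000)
print(f"sigma_device = {sigma_device:.2e} cm^2/device")
print(f"sigma_bit    = {sigma_bit:.2e} cm^2/bit")
```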

SEU generation depends on many factors, such as the LET, the particle strike location, and the charge collection and recovery processes; from a technology standpoint, it depends on the restoring transistor current drive and the minority carrier lifetimes in the substrate [48], [49], [50], [51]. A bit flip in an SRAM cell is illustrated in Fig. 9 [52].

3.3.1.3. Single bit and multiple bit upsets

A single particle strike can affect either a single memory cell or multiple memory cells, resulting in a single bit upset or a multiple bit upset, respectively. A single particle can pass through multiple adjacent cells and thereby cause multiple bit upsets. There are three major mechanisms by which multiple bit upsets originate: (a) a particle impact angle that allows the particle to pass through more cells; (b) a diameter of the cylinder in which the charge is deposited that spans several memory cells, such that SEUs may occur in each of them; (c) memory cells that are upset by the products of spallation reactions caused by the primary particle in the chip [53].

3.3.1.4. Single event functional interrupt

A single event functional interrupt (SEFI) interrupts the normal operation of the affected device [54]. A SEFI is a special case of SEU in which the upset occurs in control logic, so that control over the logic, and with it the device functionality, is lost. SEFIs in SRAM-based FPGAs are due to upsets in particular circuits, such as those involved in power-on reset, failures in the joint test action group (JTAG) or SelectMAP communication ports, loss of configuration capability, and others [55], [56].

3.3.2. Hard errors

3.3.2.1. Single event latchup

Single event latchup (SEL) occurs when the energy released by a particle strike activates the parasitic thyristor (PNPN structure) embedded in the CMOS architecture [57], [58]. When activated, this structure exhibits positive feedback, causing the involved transistors to draw high current [59]. Depending on the resistance, the latchup can be (a) fatal, when the current density exceeds safe current limits, or (b) temporary (soft), when the latchup generates heat that further increases current consumption but the device recovers after a power cycle [51]. An SEL typically requires power cycling of the device (when the latchup occurs between the supply voltage and ground) but can also occur between signals, in which case the latchup can be stopped by a change of signal values.

3.3.2.2. Single event burnout

Even when there is no P-N-P-N structure, an ion strike can turn on a real bipolar junction transistor (BJT) or a parasitic BJT structure in a (usually) n-channel metal-oxide-semiconductor field-effect transistor (MOSFET). The resulting second breakdown causes a high-current state and can cause thermal failure of the device. Due to particle strike, the substrate right under the source region gets forward biased, and the drain-source voltage is higher than the breakdown voltage of the parasitic structures. The resulting high current and overheating may then destroy the device. MOSFETs, BJTs, and some CMOS structures are very susceptible to single event burnouts [60], [61].

3.3.2.3. Single event gate rupture

In single event gate rupture, a local breakdown occurs in the insulating SiO2 layer, causing local overheating and destruction of the gate region [62]. Single event gate rupture only affects transistors when they are in their nonconducting states (VGS ≤ 0 V for n-channel devices or VGS ≥ 0 V for p-channel devices). Holes from the ion strike pile up under the gate, increasing the electric field across the MOSFET gate oxide to its dielectric breakdown point. The resulting current flow causes thermal failure of the gate oxide. These events represent localized breakdowns in the oxide and can also result in latent damage [63].

4. Measurement of radiation upset sensitivity

Before FPGA-based systems are deployed in NPP I&C systems, the radiation sensitivity of the device needs to be measured. For this purpose, the device has to be exposed to radiation sources and the consequences have to be analyzed. The main objectives of irradiation experiments on SRAM-based FPGAs are as follows [64]:

a. Measure SEU sensitivity of configuration memory and block RAM cells (with and without mitigation techniques).
b. Measure SEU sensitivity of input/output blocks (IOBs).
c. Measure SEFI modes (power-on reset, SelectMAP, IOB, etc.).
d. Measure the TID effects.
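The SEU sensitivity measurements listed above come down to counting the bits that differ between a reference ("golden") image and the data read back from the irradiated device. The sketch below shows such a comparison on raw bytes. The function name and the mask convention (a mask bit of 1 means "compare this bit") are assumptions made for illustration and do not reflect any vendor file format.

```python
from typing import List, Optional, Tuple


def find_upsets(golden: bytes, readback: bytes,
                mask: Optional[bytes] = None) -> List[Tuple[int, int]]:
    """Return (byte offset, bit index) for every bit that differs from the golden image."""
    if len(golden) != len(readback):
        raise ValueError("golden and readback images must have equal length")
    if mask is not None and len(mask) != len(golden):
        raise ValueError("mask must have the same length as the images")
    upsets = []
    for offset, (g, r) in enumerate(zip(golden, readback)):
        diff = g ^ r                 # 1 wherever the readback bit has flipped
        if mask is not None:
            diff &= mask[offset]     # ignore bits that legitimately change (assumed mask)
        for bit in range(8):
            if diff & (1 << bit):
                upsets.append((offset, bit))
    return upsets


# Toy data: two flipped bits in a 4-byte "configuration" image.
golden = bytes([0b10100101, 0x00, 0xFF, 0x3C])
readback = bytes([0b10100111, 0x00, 0x7F, 0x3C])
upsets = find_upsets(golden, readback)
print(f"{len(upsets)} upsets at {upsets}")
```

Dividing the resulting count by the accumulated fluence gives the cross-section; masking is needed in practice because some readback bits, such as user memory contents, change during normal operation.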

SEE evaluation can be classified into three main areas: (1) static: during irradiation, the FPGA design is tested in an unclocked state, and configuration memory upsets and SEFI failure modes are measured [65]; (2) dynamic: the FPGA design is tested in a clocked state, which mainly helps to measure SETs as well as SEFIs and IOB upsets; this process requires observation to measure the upsets during transient signal propagation [65]; (3) mitigation: after the error mitigation techniques have been implemented, the FPGA design is evaluated for upsets. The radiation test is conducted mainly to determine faults in the logic resources (LUT error, multiplexer (MUX) error, and FF error) [66] and in the routing resources (short error, open error, open/short error) [67]. A basic block diagram of the irradiation experimental setup is