I think HFT (hardware fault tolerance) is one of the most misunderstood and misused concepts in IEC 61508. People, and indeed other standards, often require HFT=1 according to IEC 61508 when what they really want is single fault tolerance or two-channel safety.
Let us start with the question “What is HFT?” HFT exists as a concept to allow a trade-off between redundancy and diagnostic coverage, as measured by SFF (safe failure fraction, a measure of diagnostic coverage which also gives credit for failures in the safe direction), if following Route 1H of IEC 61508. Route 2H, which doesn’t use SFF, instead has minimum HFT requirements depending on the SIL and demand mode. Route 1H is the most common route for IEC 61508 designs. The table below gives the Route 1H rules for the more complex items (type B), showing the ways you can trade off HFT and SFF. For most designs, this is therefore the only use of HFT in the standard.
Figure 1 - Table showing how SFF and HFT can be traded off against one another for complex items if following Route 1H
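For readers who prefer code to tables, the trade-off can be sketched as a small lookup. This is a minimal sketch based on my reading of the type B architectural constraints table in IEC 61508-2:2010 (Route 1H); the function name and the clamping of HFT values above 2 are my own assumptions, so check the values against the standard before relying on them.

```python
# Maximum SIL claimable for a type B element under Route 1H,
# indexed by SFF band and then by HFT (0, 1, 2). None means "not allowed".
# Values transcribed from the type B table in IEC 61508-2:2010; verify before use.
TYPE_B_MAX_SIL = [
    # (lower SFF bound, [HFT=0, HFT=1, HFT=2])
    (0.99, [3, 4, 4]),
    (0.90, [2, 3, 4]),
    (0.60, [1, 2, 3]),
    (0.00, [None, 1, 2]),
]


def max_sil_type_b(sff, hft):
    """Return the highest SIL the architectural constraints allow, or None."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be a fraction between 0 and 1")
    if hft < 0:
        raise ValueError("hft must be non-negative")
    # Assumption: the table stops at HFT = 2, so higher values use that column.
    column = min(hft, 2)
    for lower_bound, sils in TYPE_B_MAX_SIL:
        if sff >= lower_bound:
            return sils[column]
```

For example, `max_sil_type_b(0.95, 0)` gives 2 while `max_sil_type_b(0.95, 1)` gives 3, which is the trade-off in action: adding redundancy buys one SIL at the same SFF.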
As an aside, IEC 61508-2:2010 7.4.4.2.1 2) allows cross-comparison as a means to achieve a higher SFF by comparing the outputs of redundant portions of the architecture. This means that an element or sub-system could have a different SFF depending on how it is used: one value of SFF if used in a non-redundant configuration and a higher SFF if used in a redundant configuration.
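To see why the same element can end up with two SFF values, here is a minimal sketch of the SFF calculation, taking SFF as the safe plus dangerous-detected failure rate divided by the total failure rate. The failure rates and the two diagnostic coverage figures (60% on its own, 90% once cross-comparison is available) are hypothetical numbers picked purely for illustration.

```python
def safe_failure_fraction(lambda_safe, lambda_dd, lambda_du):
    """SFF = (safe + dangerous detected) / (safe + dangerous detected + dangerous undetected)."""
    total = lambda_safe + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_safe + lambda_dd) / total


def sff_with_coverage(lambda_safe, lambda_dangerous, diagnostic_coverage):
    """Split the dangerous rate into detected/undetected using DC, then compute SFF."""
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic_coverage must be between 0 and 1")
    lambda_dd = lambda_dangerous * diagnostic_coverage
    lambda_du = lambda_dangerous - lambda_dd
    return safe_failure_fraction(lambda_safe, lambda_dd, lambda_du)


# Hypothetical element: 400 FIT safe failures, 600 FIT dangerous failures.
standalone = sff_with_coverage(400, 600, 0.60)  # built-in diagnostics only (assumed DC)
redundant = sff_with_coverage(400, 600, 0.90)   # with cross-comparison (assumed DC)
```

With these made-up numbers the SFF moves from roughly 76% when the element is used alone to roughly 94% when it is used in a redundant pair with cross-comparison, which is enough to cross an SFF band boundary.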
And that is the extent of the use of HFT for most designs.
Otherwise, HFT is only used in the following IEC 61508-2:2010 sub-clauses:
- 7.4.4.1.3 and 7.4.4.1.4 when discussing time constraints on diagnostics and achieving the safe state
- 7.4.8 when setting out the required safety actions when a fault is detected, depending on the HFT
Now that we know how HFT is used, let’s look at its definition.
Figure 2 - Description of HFT from IEC 61508-2:2010
From the above, we can see that a redundant system will meet the requirements of HFT = 1. However, a non-redundant system will not meet the requirement even if it has 100% diagnostic coverage, since you can’t claim credit for a diagnostic when calculating the HFT. In other words, you get no credit for the fact that it is single fault tolerant in the sense that both the function itself and its diagnostic would have to fail before you get a dangerous failure of the safety function. This is in contrast to other standards, such as the machine safety standard ISO 13849, where the diagnostics can be considered as part of claiming single fault tolerance.
Figure 3 - An extract from ISO 13849-1:2015 6.2.1
Another implication of the above HFT definition (see Figure 2) is that safe faults are OK (they don’t impact your HFT claim) since they cannot cause a loss of the safety function. So, for instance, suppose you have a single ground pin and the item stops communicating (assumed to be an identified safe state) if the ground pin lifts. Having just that one ground pin need not prevent you from claiming HFT=1. If you accept that, then you should also accept that a wire break on a single-channel communications network doesn’t limit an HFT claim! Point c) on the list also allows fault exclusions for some faults, and such faults do not limit the HFT claim.
Let’s now have a look at the channels. A channel is defined in IEC 61508-4:2010.
Figure 4 - Channel definition from IEC 61508-4:2010
Channels become important when you want to calculate the reliability metrics: PFH for a high-demand safety function and PFD for a low-demand safety function. IEC 61508-6:2010 contains descriptions of various architectures such as 1oo2, 2oo3, etc. You can read MooN as M of the N implemented channels being needed to carry out the safety function, so N - M + 1 channels need to fail dangerously to cause a failure of the safety function (both channels for 1oo2, but only one for 2oo2). There are no SFF or HFT requirements directly associated with channels.
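The MooN arithmetic is easy to get backwards, so here is a minimal sketch of it, assuming the usual reading that M of the N channels must work for the safety function to be carried out. The function names are my own, and the result only counts channels: it says nothing about voters, power supplies, or anything else the channels share.

```python
def channel_failures_to_lose_function(m, n):
    """Dangerous channel failures needed before a MooN arrangement loses its function."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    # M channels must work, so the function is lost once fewer than M remain.
    return n - m + 1


def channel_level_fault_tolerance(m, n):
    """Channel failures that can be tolerated, looking at the channels alone."""
    return channel_failures_to_lose_function(m, n) - 1


for m, n in [(1, 1), (1, 2), (2, 2), (2, 3)]:
    print(f"{m}oo{n}: tolerates {channel_level_fault_tolerance(m, n)} channel failure(s)")
```

Note how 1oo2 and 2oo3 both tolerate one channel failure while 2oo2 tolerates none, even though all three are "redundant" in the everyday sense.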
You might think this means that a 1oo2 system will have an HFT of 1, but that is not necessarily true. What about the voting logic? If a single dangerous failure in the voting logic could cause a dangerous failure of the safety function, you don’t have HFT = 1; see the definition above to satisfy yourself of this. However, in most cases, it probably does meet the requirements of HFT = 1 since you would need a failure of at least one of the channels and a failure of the voting logic to bring about a dangerous failure of the safety function. Just be careful and think about it. While you are thinking about it, what about the power supply to the two channels? If there is only one power supply, you might not have HFT=1! You might say you have a single power supply and a power supply monitor, but remember you can’t take credit for diagnostics when calculating your HFT.
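One way to "think about it" is to brute-force the question: what is the smallest set of dangerous faults that defeats the safety function? The sketch below does that for two hypothetical 1oo2 designs, one with a supply per channel and one with a shared supply whose dangerous fault (say, an overvoltage) takes out both channels. The component names and the two models are assumptions made up for illustration, and diagnostics are deliberately left out because they earn no credit in an HFT calculation.

```python
from itertools import combinations


def hardware_fault_tolerance(components, function_works):
    """Smallest number of dangerous faults that defeats the function, minus one."""
    for count in range(1, len(components) + 1):
        for failed in combinations(components, count):
            if not function_works(set(failed)):
                return count - 1
    return len(components)  # no combination of faults defeats the function


def one_oo_two_independent_supplies(failed):
    """Hypothetical 1oo2: each channel has its own supply; either channel suffices."""
    a_ok = "channel_a" not in failed and "supply_a" not in failed
    b_ok = "channel_b" not in failed and "supply_b" not in failed
    return a_ok or b_ok


def one_oo_two_shared_supply(failed):
    """Hypothetical 1oo2 where one dangerous supply fault disables both channels."""
    if "shared_supply" in failed:
        return False
    return "channel_a" not in failed or "channel_b" not in failed


independent_hft = hardware_fault_tolerance(
    ["channel_a", "channel_b", "supply_a", "supply_b"],
    one_oo_two_independent_supplies,
)
shared_hft = hardware_fault_tolerance(
    ["channel_a", "channel_b", "shared_supply"],
    one_oo_two_shared_supply,
)
```

Under these assumptions the design with independent supplies comes out at HFT = 1 and the shared-supply design at HFT = 0, even though both are 1oo2 at the channel level. Adding a supply monitor to the second design would not change the result, since the monitor is a diagnostic.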
Figure 5 - 1oo2 description from IEC 61508-6:2010
Also, remember that redundancy can be used for availability as opposed to increased safety. For instance, in a 2oo2 system, both channels need to demand the safety function to go to the safe state. If one channel fails dangerously your safety function will not be executed.
Can you apply the HFT rules to items other than elements and sub-systems? The answer is yes but really all it means is that such components will then impose limits on the element or sub-system they are used to build.
Terms used in the literature which mean some form of redundancy is required or implemented include
- No single error
- Single fault tolerant
- 1oo2, 2oo3
- HFT=1
- A voted system
- Redundant
Hopefully, you now better understand that if following the IEC 61508 standard the main purpose of HFT is to provide a tradeoff against SFF.
The full list of blogs in this series can be found here.
A blog complaining about standards that have mandatory redundancy requirements can be found here.
For a blog discussing the 1oo2 architecture see here.
For a blog discussing the requirements for on-chip redundancy claims see here.