
The 5 Advantages of Hardware Fault Tolerance

In IEC 61508, HFT stands for “Hardware Fault Tolerance”. An HFT of one or higher means that no single hardware fault can cause the safety function to fail. Usually, this is achieved with two channels so that, even if channel A fails, channel B can still perform the safety-related task. Note: when following IEC 61508, no credit is given for diagnostics when determining HFT; see IEC 61508-2:2010, 7.4.4.1.1 a). I have previously discussed why I don’t support mandatory single-fault tolerance. See, for instance: Is Mandating a Category or HFT Best Practice

However, in this blog, I will focus on the 5 advantages of strategically leveraging hardware fault tolerance.

Advantage 1

This is the obvious one. HFT can be used to meet the hardware architectural constraints even when high diagnostic coverage is not possible, or when there are no diagnostics at all. Looking at the table below, if you need to implement a SIL 1 safety function with a single channel (HFT = 0), then that element needs an SFF (safe failure fraction, a measure similar to diagnostic coverage that also gives credit for failures to the safe state) of at least 60%. However, if you implement an architecture with HFT = 1, you can get away with zero diagnostics in either channel. For SIL 3, you can get away with an SFF of 90% per channel instead of the 99% needed with a single channel, and so on.


Figure 1: An extract from IEC 61508-2:2010 table 3
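To make the table lookup concrete, here is a minimal Python sketch. It computes SFF from the usual IEC 61508 definition, SFF = (λS + λDD) / (λS + λDD + λDU), and looks up the maximum SIL for a type B element using the route 1H values of IEC 61508-2:2010 Table 3. The values are transcribed by hand, so check them against your own copy of the standard; the function names and failure rates are hypothetical.

```python
# Hypothetical helper names; Table 3 values (type B elements, route 1H) transcribed by hand.

def safe_failure_fraction(lam_s: float, lam_dd: float, lam_du: float) -> float:
    """SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_DD + lambda_DU)."""
    total = lam_s + lam_dd + lam_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lam_s + lam_dd) / total


# (upper SFF bound of the band, max SIL for HFT = 0, 1, 2); None means "not allowed".
_TABLE_3 = [
    (0.60, (None, 1, 2)),        # SFF < 60 %
    (0.90, (1, 2, 3)),           # 60 % <= SFF < 90 %
    (0.99, (2, 3, 4)),           # 90 % <= SFF < 99 %
    (float("inf"), (3, 4, 4)),   # SFF >= 99 %
]


def max_sil_type_b(sff: float, hft: int):
    """Maximum SIL a type B element can claim for a given SFF and HFT."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("SFF must be between 0 and 1")
    if hft not in (0, 1, 2):
        raise ValueError("Table 3 covers HFT 0, 1 and 2 only")
    for upper, sils in _TABLE_3:
        if sff < upper:
            return sils[hft]


# A channel with no diagnostics and no credited safe failures (illustrative rate).
sff = safe_failure_fraction(lam_s=0.0, lam_dd=0.0, lam_du=100e-9)
print(max_sil_type_b(sff, hft=0))  # None
print(max_sil_type_b(sff, hft=1))  # 1
print(max_sil_type_b(0.95, hft=0), max_sil_type_b(0.95, hft=1))  # 2 3
```

With an SFF of 0, a single channel cannot be used for any SIL, while HFT = 1 allows SIL 1. Type A elements use the more lenient Table 2, which this sketch does not cover.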

Advantage 2

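Another obvious one: it makes it easier to achieve your PFH/PFD requirement. For any SIL, the safety function has a limit on its dangerous failure rate. To achieve SIL 3 in high or continuous demand mode, you need a PFH (average frequency of dangerous failure) of < 100e-9/h; for a single channel, this is essentially its dangerous undetected failure rate. Meeting this can be challenging, needing reliable components with high diagnostic coverage and a high diagnostic test rate. If you have a single channel with a dangerous undetected failure rate of 5e-7/h, it won’t meet the SIL 3 requirement. If you put two of those channels in parallel, however, the dangerous failure rate falls to approximately β × 5e-7/h = 5e-8/h (for an assumed β-factor, i.e. the fraction of failures that are common cause, of 10%), and you then meet the PFH requirement for SIL 3. This approximation neglects independent failures of both channels, which is reasonable when that contribution is much smaller than the common cause term.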

Figure 2: PFH/PFD requirements according to IEC 61508
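The arithmetic above can be checked with a short Python sketch. It uses a simplified 1oo2 equation in the style of IEC 61508-6 Annex B, assuming no detected dangerous failures (λDD = 0): PFH ≈ 2((1 − β)λDU)²(T1/2 + MRT) + βλDU. The proof-test interval T1 = 1 year, the mean repair time MRT = 8 h and the function names are assumptions for illustration, not a verified reliability model.

```python
# Hypothetical helper names; simplified equations, see the lead-in for assumptions.

def sil_from_pfh(pfh: float):
    """Highest SIL whose high/continuous demand PFH target (upper bound, per hour) is met."""
    for sil, upper in ((4, 1e-8), (3, 1e-7), (2, 1e-6), (1, 1e-5)):
        if pfh < upper:
            return sil
    return None  # does not meet even the SIL 1 target


def pfh_1oo1(lam_du: float) -> float:
    """Single channel: PFH is approximately its dangerous undetected failure rate."""
    return lam_du


def pfh_1oo2(lam_du: float, beta: float, t1_h: float, mrt_h: float) -> float:
    """Simplified 1oo2 PFH, assuming lambda_DD = 0.

    independent: both channels fail independently; T1/2 + MRT approximates how long
                 an undetected fault in one channel stays unrevealed and unrepaired
    common_cause: one cause defeats both channels (beta-factor model)
    """
    independent = 2.0 * ((1.0 - beta) * lam_du) ** 2 * (t1_h / 2.0 + mrt_h)
    common_cause = beta * lam_du
    return independent + common_cause


LAM_DU = 5e-7  # per hour, per channel (value from the text)
single = pfh_1oo1(LAM_DU)
dual = pfh_1oo2(LAM_DU, beta=0.10, t1_h=8760.0, mrt_h=8.0)  # T1 = 1 year, MRT = 8 h (assumed)
print(f"1oo1: {single:.2e}/h -> SIL {sil_from_pfh(single)}")  # 1oo1: 5.00e-07/h -> SIL 2
print(f"1oo2: {dual:.2e}/h -> SIL {sil_from_pfh(dual)}")      # 1oo2: 5.18e-08/h -> SIL 3
```

Under these assumptions, the independent-failure term is only about 1.8e-9/h, so the common cause term β × λDU dominates, which is why the simple β × 5e-7/h estimate works. The independent term grows with the square of λDU and linearly with T1, so it should be checked rather than assumed negligible.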

See a previous blog on quantifying common cause failures here.

Advantage 3

With HFT ≥ 1, you don’t have to run your diagnostics as often. IEC 61508-2:2010 7.4.5.3 specifies the diagnostic test interval/rate requirements for HFT = 0 applications. There are two options:

  •  Option 1 – as shown in the diagram below, run the diagnostics often enough that you detect a failure and reach your defined safe state before something bad can happen (i.e. within the process safety time).
  •  Option 2 – in high demand mode, run your diagnostics at least 100 times as often as demands occur.

Figure 3: Graphical interpretation of option 1 from IEC 61508-2:2010 7.4.5.3

For some applications, this can mean a diagnostic test interval well below 100 ms. As a result, you cannot claim credit in your safety case for diagnostics you have implemented if they cannot complete within that interval.
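The two HFT = 0 options can be expressed as simple checks. The following Python sketch assumes you already know the process safety time, the time to reach the safe state and, for option 2, the mean time between demands. The class, field names and numbers are hypothetical, and the 100x condition is read as the ratio of diagnostic test rate to demand rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HFT0Timing:
    """Hypothetical timing budget for a single-channel (HFT = 0) function, in seconds."""
    diagnostic_test_interval: float          # time between successive runs of the diagnostic
    reaction_time: float                     # time to reach the safe state after detection
    process_safety_time: float               # time from fault occurrence to hazard
    demand_interval: Optional[float] = None  # mean time between demands (high demand mode)


def option1_ok(t: HFT0Timing) -> bool:
    # Detect the fault and reach the safe state before the hazard can occur.
    return t.diagnostic_test_interval + t.reaction_time < t.process_safety_time


def option2_ok(t: HFT0Timing) -> bool:
    # Diagnostic test rate >= 100 x demand rate, i.e.
    # demand interval / diagnostic test interval >= 100.
    if t.demand_interval is None:
        return False
    return t.demand_interval / t.diagnostic_test_interval >= 100.0


fast_core_check = HFT0Timing(0.010, 0.020, 0.100)  # illustrative numbers only
full_ram_test = HFT0Timing(0.500, 0.020, 0.100)
print(option1_ok(fast_core_check), option1_ok(full_ram_test))  # True False
```

A quick core check passes option 1, but a diagnostic that needs half a second per pass does not, which is the situation described above.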

If, however, you go with a redundant (HFT = 1) architecture, the diagnostic test interval requirements are far more generous; see IEC 61508-2:2010 7.4.5.4. Some sources suggest a diagnostic test interval of one year for SIL 2 and one month for SIL 3 when the architecture has HFT ≥ 1. This allows a much wider range of diagnostics, such as an STL (software test library) to test a microcontroller core, or march testing of RAM.
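As an illustration of a diagnostic that benefits from a relaxed test interval, here is a sketch of the March C- RAM test on a simulated, bit-per-cell memory with an optional stuck-at fault. The `SimRam` model and function names are hypothetical; a real microcontroller RAM test works on words, uses several data backgrounds, and must preserve or reserve the memory under test, none of which is shown here. The operation counter shows that March C- needs 10 operations per cell.

```python
from typing import Optional


class SimRam:
    """Hypothetical bit-per-cell RAM model with an optional stuck-at fault."""

    def __init__(self, size: int, stuck_addr: Optional[int] = None, stuck_val: int = 0):
        self.size = size
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_val = stuck_val
        self.ops = 0  # counts every read and write, to show the cost of the test

    def write(self, addr: int, val: int) -> None:
        self.ops += 1
        self.cells[addr] = val

    def read(self, addr: int) -> int:
        self.ops += 1
        # A stuck-at cell always returns the same value, whatever was written.
        if addr == self.stuck_addr:
            return self.stuck_val
        return self.cells[addr]


def march_c_minus(ram: SimRam) -> bool:
    """March C-: ⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0).

    The ⇕ elements run in ascending order here. Returns False on the first bad read.
    """
    up = range(ram.size)
    down = range(ram.size - 1, -1, -1)
    elements = [
        (up, [("w", 0)]),
        (up, [("r", 0), ("w", 1)]),
        (up, [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up, [("r", 0)]),
    ]
    for order, operations in elements:
        for addr in order:
            for op, val in operations:
                if op == "w":
                    ram.write(addr, val)
                elif ram.read(addr) != val:
                    return False
    return True


good = SimRam(1024)
print(march_c_minus(good), good.ops)  # True 10240
print(march_c_minus(SimRam(1024, stuck_addr=5, stuck_val=1)))  # False
```

Because the cost grows linearly with memory size, a full pass over a large RAM can be hard to fit inside a tight HFT = 0 diagnostic test interval, but fits easily within monthly or yearly intervals.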

Advantage 4

It reassures nervous people who don’t trust your reliability metrics.

Advantage 5

This is related to advantage 4. While many recently revised standards have removed the requirement for mandatory HFT or single-fault tolerance, some standards, especially in the machinery sector, still include such requirements. Implementing an HFT = 1 safety system will satisfy these requirements. So, as I said at the start, I have often argued against mandatory HFT requirements, but that doesn’t mean I wouldn’t choose to implement such an architecture.

Related Blogs

  1.  HFT vs Two-channel Safety – The same or different
  2.  Is Mandating a Category or HFT Best Practice
  3.  How to describe a redundant system
  4.  On chip redundancy according to IEC 61508 Annex E
  5.  Combining functional safety and availability using redundancy

For previous blogs in this series, see here.

For the full suite of ADI blogs on the EngineerZone platform, see here.

For the full range of ADI products, see here.

  • There is an answer by the European Notified Bodies in a vertical "Recommendation for Use" sheet regarding the Machinery Directive 2006/42/EC. The question asked in this RfU was:

    "What are the minimum requirements concerning the frequency of tests for failure detection in a safety-related system with 2 channels with electromechanical outputs (relays or contactors)?"

    And the answer to this:

    "A functional test (automatic or manual) to detect failures shall be performed within the following intervals:

    a) at least every month for PL e with Category 3 or Category 4 (according to EN ISO 13849-1) or SIL 3 with HFT (hardware fault tolerance) = 1 (according to EN 62061);

    b) at least every 12 months for PL d with Category 3 (according to EN ISO 13849-1) or SIL 2 with HFT (hardware fault tolerance) = 1 (according to EN 62061)."

  • Very interesting article!

    For advantage 3, you write "Some sources suggest a diagnostic test rate of once/year for SIL 2 and once/month for SIL 3 when the architecture is HFT>=1" - Can you provide some concrete sources to reference?

    I have seen some material from TUV that describes 24 hrs as being a suitable interval, and I have worked with safety consultants who believe that once per work-shift (~8hrs) is appropriate, so I realize there is room for some interpretation.