Can you ever be half safe?
The topic today is the approximation given in IEC 61508-2:2010 Annex C that, for complex components (such as an IC), the failure rate of the component can be taken as 50% safe, 50% dangerous. I have heard people say that the 50% safe, 50% dangerous approximation is no longer acceptable practice, so I decided to look into it in more detail.
I hadn’t planned to do this blog next, but it turns out that getting this one done first simplifies two other blogs I was considering.
As you know, I work for a semiconductor vendor, Analog Devices, and I believe this approximation is especially valuable to our customers who wish to build a safety system out of standard components. For components such as the ADFS5758, which was developed to IEC 61508, you should have all the information you need in the safety manual, but for other components you may just have the standard datasheet and a reliability prediction based on the data available on www.analog.com/ReliabilityData.
Below is the text from IEC 61508-2:2010 Annex C:
Figure 1 - IEC 61508-6:2010 Annex C
This text appears unchanged since the 1998 version of IEC 61508 and survived the 2010 revision, and so far there have been no national committee comments calling for its removal from the third revision of IEC 61508, due out in 202X.
It was also in the 2011 version of ISO 26262 and is still used in an example in ISO 26262-5:2018, as shown below.
Figure 2 - Use in an example from ISO 26262-5:2018 Annex E
Therefore, it appears that this approximation is still considered valid by most safety experts and that there is very little will to remove it from the standards. Let’s now move on to the questions of why I believe it is useful and why I believe it is conservative.
The reason it is useful is that most industrial safety systems are still built using standard components. Since the components are standard COTS (commercial off-the-shelf) products, there are no safety artefacts, such as an FMEDA, available for them. Failure rate predictions can be obtained from somewhere like the ADI reliability site at www.analog.com/ReliabilityData or calculated according to IEC 62380 or SN29500 once you know a few things like the transistor count. If you don’t know the transistor count, you can estimate it conservatively: SN29500 uses bands of transistor counts, e.g. 50k to 500k and 500k to 5000k transistors, so you don’t need the exact value, just the band it might fall into. Even if you get the band wrong, the impact of going from the 50k to 500k band to the 500k to 5000k band is that the FIT might go from 60 FIT to 70 FIT, as shown in the SN29500 extract below.
Figure 3 - Extract from SN29500-2 showing impact of transistor count on FIT
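The conservative band choice can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the band table holds only the two bands and the approximate 60 FIT and 70 FIT figures quoted above (it is not the full SN29500-2 table), and `conservative_fit` is a hypothetical helper name. Given a low and a high guess for the transistor count, it returns the highest FIT of any band the count could fall into.

```python
# Hypothetical band table: only the two bands and the approximate FIT values
# quoted in the text are included. This is NOT the full SN29500-2 table.
BANDS = [
    # (min transistors, max transistors, illustrative FIT)
    (50_000, 500_000, 60.0),
    (500_000, 5_000_000, 70.0),
]


def conservative_fit(low_estimate: int, high_estimate: int) -> float:
    """Return the highest FIT of any band the uncertain transistor count could fall into."""
    if low_estimate > high_estimate:
        raise ValueError("low_estimate must not exceed high_estimate")
    candidates = [
        fit
        for band_min, band_max, fit in BANDS
        # A band is a candidate if the estimated range overlaps it.
        if low_estimate <= band_max and high_estimate >= band_min
    ]
    if not candidates:
        raise ValueError("transistor count outside the bands in this sketch")
    return max(candidates)


print(conservative_fit(200_000, 800_000))  # 70.0 (range straddles both bands)
print(conservative_fit(100_000, 300_000))  # 60.0 (range sits in the lower band)
```

Taking the maximum over every band the estimate could touch is what makes the guess conservative: an uncertain count never produces a lower FIT than the worst plausible band.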
This means that a reliability prediction should not be a problem, but how do you allocate it to the various on-chip blocks, and how do you claim a diagnostic coverage for it? Since the product is COTS, it is very much a black box. Therefore, being allowed to make the 50% safe, 50% dangerous approximation, as in the previous example from ISO 26262 and in the example from IEC 61508-6:2010 Annex C, allows the analysis to continue.
Figure 4 - Example showing the use of the approximation from IEC 61508-6:2010 Annex C
Working back through this example, it appears that a FIT of 744 was assigned to the IC. Four failure modes of the IC are then identified: open circuit, short circuit, drift and function. A common approximation when no other details are available is to allocate the failure rate equally across the failure modes (one justification is that the real value of an FMEDA lies in making you think through the analysis rather than in the actual number produced). In the above example, open-circuit and short-circuit failures are considered dangerous, drift failures are split 50% safe and 50% dangerous, and function failures are also split 50% safe and 50% dangerous. If the failure rate were divided equally across the four failure modes, this would give 25% + 25% + 12.5% + 12.5% = 75% dangerous and 25% safe overall.
However, that is not what the example does. Examining the figure, exactly 30% of the failures are opens plus shorts and 70% are drift plus function. I looked at several sources of failure mode distributions and couldn’t find anything matching this distribution, and since its exact derivation is not important to the main focus of this blog I will set it aside, but if anybody knows the source of the failure mode distribution I would love to hear it (note 3 at the end gives a possible source). The analysis then assumes a 90% DC for all the safe failures and a 90% DC for all the dangerous failures, giving an “SFF” for the IC of over 93% (I put “SFF” in inverted commas because in the standard SFF applies to an element as opposed to a component).
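The arithmetic above can be checked with a short Python sketch. It assumes the 30%/70% shares read from the figure and, as a further assumption, splits each pair evenly between its two failure modes (this does not change the result, because both modes in a pair have the same safe/dangerous split). The names `FailureMode`, `analyse_modes` and `make_modes` are illustrative. Only the DC on the dangerous failures enters the SFF, so the 90% DC on the safe failures does not appear.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    share: float           # fraction of the component FIT allocated to this mode
    dangerous_frac: float  # fraction of this mode's failures that are dangerous


def analyse_modes(total_fit, modes, dc_dangerous):
    """Return (lambda_S, lambda_D, lambda_DU, SFF) for one component."""
    lam_d = sum(total_fit * m.share * m.dangerous_frac for m in modes)
    lam_s = sum(total_fit * m.share for m in modes) - lam_d
    lam_du = lam_d * (1.0 - dc_dangerous)
    # SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_D) = 1 - lambda_DU / lambda
    sff = 1.0 - lam_du / (lam_s + lam_d)
    return lam_s, lam_d, lam_du, sff


def make_modes(open_short_share):
    """Opens/shorts fully dangerous; drift/function split 50% safe, 50% dangerous."""
    other_share = 1.0 - open_short_share
    return [
        FailureMode("open circuit", open_short_share / 2, 1.0),
        FailureMode("short circuit", open_short_share / 2, 1.0),
        FailureMode("drift", other_share / 2, 0.5),
        FailureMode("function", other_share / 2, 0.5),
    ]


equal = analyse_modes(744, make_modes(0.50), dc_dangerous=0.9)
figure = analyse_modes(744, make_modes(0.30), dc_dangerous=0.9)
print(f"equal split: dangerous {equal[1] / 744:.0%}, SFF {equal[3]:.1%}")
print(f"30/70 split: dangerous {figure[1] / 744:.0%}, SFF {figure[3]:.1%}")
# equal split: dangerous 75%, SFF 92.5%
# 30/70 split: dangerous 65%, SFF 93.5%
```

The equal-split case reproduces the 75%/25% figure, while the 30%/70% case gives about 484 FIT dangerous and the “SFF” of just over 93%.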
I note that the ISO 26262-5:2018 example applies the 50% safe, 50% dangerous approximation at the whole-IC level rather than to the individual failure modes. You could therefore argue that the IEC 61508 example is more nuanced (it applies the approximation to only some of the failure modes).
I said earlier that some experts now frown on the 50% safe, 50% dangerous approximation. One of the reasons I have heard is that it takes no account of no-effect failures. No-effect failures were introduced in the 2010 version of IEC 61508 to stop them being counted as safe failures, which had been a way to make the SFF metric look better (I believe this might still be possible in a system developed to ISO 26262).
To examine whether ignoring no-effect failures is conservative, let’s assume a failure rate of 100 FIT. If we assume 50% safe, 50% dangerous and ignore no-effect failures, then we have an SFF of 50% even with no diagnostics, and, continuing the analysis with no diagnostics, the λDU (the portion of the FIT which is dangerous undetected) will be 50 FIT. If instead we assume 50% of the failures are no-effect and then apply the 50% safe, 50% dangerous approximation to the rest, we still end up with an SFF of 50% (see note 1 below) but a λDU of only 25 FIT. So ignoring no-effect failures leaves the SFF unchanged and makes the λDU worse, which means it is conservative.
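The two scenarios can be compared with a small Python sketch. It is illustrative only: `half_split` is a hypothetical helper, the 50% no-effect fraction is the assumption used in the text, and the `count_no_effect_as_safe` flag models the 1998 rule described in note 1.

```python
def half_split(total_fit, no_effect_frac, dc=0.0, count_no_effect_as_safe=False):
    """Apply the 50% safe / 50% dangerous approximation after removing no-effect failures.

    Returns (SFF, lambda_DU). With count_no_effect_as_safe=True the no-effect
    failures are added to the safe failures, as the 1998 edition allowed.
    """
    lam_ne = total_fit * no_effect_frac
    lam_s = 0.5 * (total_fit - lam_ne)
    lam_d = 0.5 * (total_fit - lam_ne)
    if count_no_effect_as_safe:
        lam_s += lam_ne
    lam_du = lam_d * (1.0 - dc)
    # SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_D); no-effect failures excluded
    sff = (lam_s + lam_d - lam_du) / (lam_s + lam_d)
    return sff, lam_du


print(half_split(100, 0.0))                                # (0.5, 50.0)
print(half_split(100, 0.5))                                # (0.5, 25.0)
print(half_split(100, 0.5, count_no_effect_as_safe=True))  # (0.75, 25.0)
```

Ignoring no-effect failures leaves the SFF at 50% but doubles λDU from 25 FIT to 50 FIT, which is the conservative direction; the last line shows the 75% SFF that the 1998 rules would have produced.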
Other references to the 50% safe 50% dangerous approximation include:
- IEC 61508-6:2010 table B.1
- IEC 61800-5-2:2015 clause B.184.108.40.206, clause B.220.127.116.11, clause B.3.1.3
- ISO 13849-1:2008 clause C.5.1 and tables C.2, C.3, C.4, C.5, C.6, C.7
- IEC 61131-6 18.104.22.168.1
- ISO 26262-10:2018 8.1.8 g)
- “Functional Safety – An IEC 61508 SIL 3 Compliant Development Process”, section 9.10.2
I haven’t included a video link in a while, but I came across this over the weekend: https://www.youtube.com/watch?v=aDSHABsgkjA – it’s a hit from the 1980s and has no relevance other than it is called the “Safety Dance.”
Note 1 – Under the rules of IEC 61508-2:1998, no-effect failures would have been counted as safe failures and the SFF above would have been 75%.
Note 2 – If you use the 50% safe, 50% dangerous approximation, you get a neat relationship between SFF and DC: SFF = 50% + DC/2. So a DC of 80% gives you an SFF of 90%, a DC of 85% gives 92.5% and a DC of 90% gives 95%.
Note 3 – One possible source of the failure mode distributions for an IC is table 2-2 of FMD-91 (Failure Mode/Mechanism Distributions 1991, from the Reliability Analysis Center), which is referenced in the bibliography of IEC 61508-6:2010. For an SSI/MSI digital IC, this table gives opens as 18.7% of failures and shorts as 12.2%, a combined 30.9%. Drift is given as 13.2%, and everything else under the function heading as 55.6%. These are close to the 30%/70% split used in the example above.
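The relationship in note 2 follows from λS = λD = λ/2, so SFF = (λS + DC·λD)/λ = 1/2 + DC/2. A small Python sketch of it and its inverse is below; the function names are illustrative, and the 60%, 90% and 99% targets are used as examples because they are the SFF boundaries in the IEC 61508-2 route 1H tables.

```python
def sff_from_dc(dc):
    """SFF under the 50% safe / 50% dangerous approximation (no-effect failures excluded).

    SFF = (lambda_S + DC * lambda_D) / lambda with lambda_S = lambda_D = lambda / 2.
    """
    return 0.5 + dc / 2


def dc_for_target_sff(sff):
    """Minimum DC on the dangerous failures needed to reach a target SFF."""
    return 2 * sff - 1


for dc in (0.80, 0.85, 0.90):
    print(f"DC {dc:.0%} -> SFF {sff_from_dc(dc):.1%}")
# DC 80% -> SFF 90.0%
# DC 85% -> SFF 92.5%
# DC 90% -> SFF 95.0%

for target in (0.60, 0.90, 0.99):
    print(f"SFF {target:.0%} needs DC >= {dc_for_target_sff(target):.0%}")
# SFF 60% needs DC >= 20%
# SFF 90% needs DC >= 80%
# SFF 99% needs DC >= 98%
```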