To Assume Makes a Donkey of You and Me

There is a saying that to assume makes an A** of U and ME. In a safety case, an unexamined assumption can be especially dangerous.

Functional safety standards present numbers and analysis, but unless you know the assumptions underlying that analysis, you can misuse the standards. In this blog, I will examine some assumptions behind IEC 61508-6:2010 Annex B and ISO 13849-1 Annex K.

Starting with IEC 61508-6:2010 Annex B, this annex analyses architectures such as 1oo1 and 1oo2 for both low and high demand, and derives formulas for the PFD (average probability of dangerous failure on demand) and PFH (average frequency of dangerous failure per hour) of each architecture. I have analysed some of these architectures in these blogs and here.

Figure 1: PFH formula derivation for a 1oo2D architecture

But there is a long list of assumptions behind these formulas. The assumptions are found in IEC 61508-6:2010 B.3.1.

Some of the assumptions include:

  • Component failure rates are constant – this is not a surprise as this is a general assumption throughout most of IEC 61508
  • The channels in a voted group all have the same failure rates and diagnostic coverage – in many cases, this is not true
  • Perfect proof test and perfect repair – I wonder!
  • The expected interval between demands is at least an order of magnitude greater than the proof test interval – I need to think about this one. People often quote a proof test interval of 20 years as a way of saying that there is no proof testing over the lifetime of the part. Under this assumption the demand interval would then have to be at least 200 years, which is never true for machinery and robots. Does this alone make the equations useless for high demand? This is a blog for another day.
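The arithmetic behind that last bullet can be sketched in a few lines. This is only an illustration of the "order of magnitude" rule as I have paraphrased it above; the function names and the factor of 10 as a parameter are my own choices and do not come from the standard.

```python
def min_demand_interval_years(proof_test_interval_years, factor=10.0):
    """Smallest demand interval that still satisfies the assumption.

    `factor` = 10 is my reading of "at least an order of magnitude".
    """
    return factor * proof_test_interval_years


def assumption_holds(proof_test_interval_years, demands_per_year, factor=10.0):
    """True if the interval between demands is long enough for the assumption."""
    demand_interval_years = 1.0 / demands_per_year
    required = min_demand_interval_years(proof_test_interval_years, factor)
    return demand_interval_years >= required


# A 20 year proof test interval (i.e. no proof testing over the lifetime)
print(min_demand_interval_years(20))

# A machine with one demand per day
print(assumption_holds(20, demands_per_year=365))

# A yearly proof test with one demand every 100 years
print(assumption_holds(1, demands_per_year=0.01))
```

With a 20 year proof test interval the check fails for any realistic machinery demand rate, which is the concern raised in the bullet.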

Note – while writing this blog I noticed one assumption which states “the overall hardware failure rate of a channel of the subsystem is the sum of the dangerous failure rate and safe failure rate for that channel, which are assumed to be equal”. I think this is a mistake in the standard. The formulas use λDU and λDD directly, and because they work with DC rather than SFF, I don’t think the assumption is required.
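To illustrate the point, here is a minimal sketch of how λDD and λDU follow from the dangerous failure rate λD and the diagnostic coverage DC. The function names and the example numbers are hypothetical. The 50/50 safe/dangerous split only appears as an optional first step for the case where you start from a total failure rate instead of from λD.

```python
def dangerous_rate_from_total(lambda_total, dangerous_fraction=0.5):
    """Optional step: derive lambda_D from a total failure rate.

    dangerous_fraction=0.5 mirrors the quoted "assumed to be equal" wording.
    """
    return dangerous_fraction * lambda_total


def split_dangerous_rate(lambda_d, dc):
    """Split lambda_D into detected and undetected parts using DC (0..1)."""
    if not 0.0 <= dc <= 1.0:
        raise ValueError("dc must be between 0 and 1")
    lambda_dd = dc * lambda_d          # dangerous detected
    lambda_du = (1.0 - dc) * lambda_d  # dangerous undetected
    return lambda_dd, lambda_du


# Hypothetical channel: 1000 FIT total, 90 % diagnostic coverage
lambda_total = 1000e-9  # failures per hour
lambda_d = dangerous_rate_from_total(lambda_total)
lambda_dd, lambda_du = split_dangerous_rate(lambda_d, dc=0.9)
print(lambda_d, lambda_dd, lambda_du)
```

If λD is already known, for example from an FMEDA, `split_dangerous_rate` can be called directly and the safe failure rate never enters the calculation.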

Within some of the descriptions, such as that for 1oo2 in B.3.2.2.2, we find more assumptions such as “It is assumed that any diagnostic testing would only report the faults found and would not change any output states or change the output voting”. In other words, the system is assumed not to trip just because it has one faulty channel. Instead it continues to run and relies on the other channel to meet the safety requirements.

Another interesting assumption comes from Annex C of part 2, which states, “The calculations to obtain the diagnostic coverage, and the ways it is used, assume that the EUC can operate safely in the presence of an otherwise dangerous fault that is detected by the diagnostic tests”. I’m not sure exactly what this means. Why wouldn’t the EUC (equipment under control) continue to operate safely if there is a failure of the safety function? Is it some sort of independence requirement?

If the above assumptions don’t hold true in your case, you might have to do your own calculation, perhaps using Markov modeling.
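As a flavour of what such a calculation can look like, here is a deliberately small Markov sketch for two redundant channels with different dangerous undetected failure rates, which breaks the "same failure rates" assumption listed earlier. It is not the model from IEC 61508-6. It assumes no diagnostics, no repair within the proof test interval, no common cause failures and a perfect proof test, and all names and numbers are hypothetical.

```python
import math


def average_both_failed(lambda_a, lambda_b, proof_test_interval_h, steps=100_000):
    """Time-averaged probability that both channels have failed.

    States: 0 = both OK, 1 = only A failed, 2 = only B failed, 3 = both failed.
    The state equations are integrated with simple Euler steps.
    """
    dt = proof_test_interval_h / steps
    p0, p1, p2, p3 = 1.0, 0.0, 0.0, 0.0
    area = 0.0
    for _ in range(steps):
        d0 = -(lambda_a + lambda_b) * p0
        d1 = lambda_a * p0 - lambda_b * p1
        d2 = lambda_b * p0 - lambda_a * p2
        d3 = lambda_b * p1 + lambda_a * p2
        new_p3 = p3 + d3 * dt
        area += 0.5 * (p3 + new_p3) * dt  # trapezoidal average of state 3
        p0, p1, p2, p3 = p0 + d0 * dt, p1 + d1 * dt, p2 + d2 * dt, new_p3
    return area / proof_test_interval_h


def closed_form_independent(lambda_a, lambda_b, t):
    """Exact average of (1 - e^-at)(1 - e^-bt) over [0, t], as a cross-check."""
    def avg_exp(rate):
        return (1.0 - math.exp(-rate * t)) / (rate * t)
    return 1.0 - avg_exp(lambda_a) - avg_exp(lambda_b) + avg_exp(lambda_a + lambda_b)


T1 = 8760.0                 # one year proof test interval, in hours
la, lb = 1e-6, 5e-6         # unequal channels (hypothetical values)
mean = (la + lb) / 2.0

print(average_both_failed(la, lb, T1))      # unequal channels
print(average_both_failed(mean, mean, T1))  # both channels at the mean rate
print(closed_form_independent(la, lb, T1))  # cross-check for the first line
```

In this toy case, replacing the two rates with their mean gives a higher number than the unequal model, so how you collapse unequal channels into the "identical channels" formulas matters.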

Annex D of IEC 61508-2:2010, which covers safety manuals, requires you to make your assumptions explicit when it states:

  1. “Therefore, no claims shall be made in the safety manual, in respect of the hardware fault tolerance or the safe failure fraction or any other functional safety characteristic that is dependent on knowledge of safe and dangerous failure modes, unless the underlying assumptions, as to what constitute safe and dangerous failure modes, are clearly specified”.
  2. “Constraints on the use of the compliant item and/or assumptions on which analysis of the behaviour or failure rates of the item are based.”.

Now let’s look at ISO 13849-1 Annex K. ISO 13849 offers a simplified method to determine a PL (performance level) based on DC (diagnostic coverage), MTTFd (representing the average dangerous failure rate) and CAT (category or architecture). If you want to go a little less simplified, Annex K gives additional information that allows you to choose MTTFd and DC values outside the fixed values used in the body of the standard.

Figure 2: Extract from ISO 13849-1:2015 table K.1

The table was generated based on unpublished Markov modeling. I think they might be working to release this as ISO 13849-3 but I’m not 100% certain of what it will contain. However, the assumptions behind the calculations are not explicitly stated. From trying to reproduce the numbers, I believe the assumptions include:

  • The failure rate of the diagnostics is ½ the failure rate of the functional channel. If the failure rate of the diagnostics is < ½ the failure rate of the functional channel, then the calculations are conservative.
  • A beta (common cause failure fraction) of 2% for the CAT 3 and CAT 4 columns
  • A proof test interval of 20 years, i.e. no proof testing. The secondary assumption is that the lifetime of the machine is 20 years (see for instance the robot safety standard ISO 10218-1).
  • A demand rate of 1 per day. This seems a bit low for some machinery applications, and it would give optimistic results if the real demand rate is higher.
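To put rough numbers on these bullets, here is a small sketch that converts an MTTFd in years into a dangerous failure rate per hour, applies the 2% beta, and counts the demands over a 20 year lifetime. It is not the Markov model behind table K.1. The helper names are hypothetical, and the 8760 hours per year conversion and continuous operation are my own assumptions.

```python
HOURS_PER_YEAR = 8760.0  # assumed conversion, continuous operation


def dangerous_rate_per_hour(mttfd_years):
    """Constant dangerous failure rate implied by an MTTFd given in years."""
    return 1.0 / (mttfd_years * HOURS_PER_YEAR)


def common_cause_rate_per_hour(mttfd_years, beta=0.02):
    """Part of the dangerous failure rate attributed to common cause."""
    return beta * dangerous_rate_per_hour(mttfd_years)


def demands_over_lifetime(demands_per_day=1.0, lifetime_years=20.0):
    """Number of demands seen with no proof test over the lifetime."""
    return demands_per_day * 365.0 * lifetime_years


mttfd = 100.0  # years, example value
print(dangerous_rate_per_hour(mttfd))
print(common_cause_rate_per_hour(mttfd))
print(demands_over_lifetime())
print(demands_over_lifetime(demands_per_day=24.0))  # e.g. one demand per hour
```

Changing `demands_per_day` shows how quickly a real application can move away from the one demand per day that I believe the table assumes.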

Assuming you got to the end of this blog I invite you to check back next month on the second Tuesday of the month for the next blog in this series. Until then I hope to post “mini blogs” on the other Tuesdays in the month directly from my LinkedIn account. Please follow me on LinkedIn if interested.

For previous blogs in this series see here.

For the full suite of ADI blogs on the EngineerZone platform see here

For the full range of ADI products see here