Diagnostics on your Diagnostics

A question that often arises is whether IEC 61508 has any requirements related to diagnostics other than the coverage they give and the rate at which they must be run. For instance, do the diagnostics themselves need a diagnostic SFF (diagnostics on your diagnostics)? Do the elements implementing the diagnostics need a systematic capability (minimum design requirements)? And do the diagnostics have any specific reliability requirement (dangerous undetected failure rate)?

The main reason they might is that diagnostics allow you to claim a higher SIL. In fact, as you increase the SFF from below 60% to 99% or more for a non-redundant (hardware fault tolerance of 0) Type B element or subsystem, your SIL claim limit increases from SIL 0 (no safety claim according to IEC 61508) to SIL 3. Since the level of safety achieved increases 10x with each increase in SIL, this is a 1000x reduction in risk that you get from improving your diagnostics. Yes, you have to do more than simply add diagnostics to get this expected reduction in risk, but if you have done all the other things, then the maximum SIL you can claim depends directly on the diagnostics.

Figure 1 Maximum allowable safety integrity level for a safety function: improvement to the SIL claim limit you get from increasing the diagnostics.
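The relationship in Figure 1 can be sketched in a few lines. This is a minimal sketch, assuming the IEC 61508-2:2010 Route 1H limits for a Type B element with a hardware fault tolerance of 0, and SFF = (λS + λDD)/(λS + λDD + λDU). The function names and the example failure rates are hypothetical.

```python
# Sketch: SIL claim limit versus SFF for a non-redundant element.
# Assumes IEC 61508-2:2010 Route 1H, Type B element, hardware fault
# tolerance (HFT) = 0. Names and example failure rates are hypothetical.


def safe_failure_fraction(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_DD + lambda_DU)."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0.0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


def sil_claim_limit_type_b_hft0(sff: float) -> int:
    """Maximum SIL claim for a Type B element with HFT = 0 (0 means no claim)."""
    if sff < 0.60:
        return 0
    if sff < 0.90:
        return 1
    if sff < 0.99:
        return 2
    return 3


if __name__ == "__main__":
    lambda_s = 200e-9  # safe failure rate per hour (hypothetical)
    lambda_d = 800e-9  # dangerous failure rate per hour (hypothetical)
    for dc in (0.0, 0.6, 0.9, 0.995):  # diagnostic coverage of dangerous failures
        lambda_dd = dc * lambda_d
        lambda_du = lambda_d - lambda_dd
        sff = safe_failure_fraction(lambda_s, lambda_dd, lambda_du)
        print(f"DC={dc:.1%}  SFF={sff:.1%}  SIL claim limit={sil_claim_limit_type_b_hft0(sff)}")
```

With these hypothetical numbers, raising the diagnostic coverage alone moves the element from no claim (SFF of 20%) to a SIL 3 claim limit (SFF of 99.6%).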

So, let’s see what is in IEC 61508 revision 2.

The first thing I want to show you is that you must know the reliability of your diagnostics as evidenced by item e) below.

Figure 2 Extract from IEC 61508-2:2010 Annex D

So, you need to know the failure rate of your diagnostics. Of course, if you claim a dangerous failure rate for your diagnostics of less than 10⁻⁵/h, your diagnostics could automatically become safety related, as shown below. To avoid this, you might want to limit your claim for the dangerous failure rate of your diagnostics to no lower than 10⁻⁵/h.

Figure 3 Extract from IEC 61508-1:2010
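Applied to diagnostics in the way described above, the rule in Figure 3 reduces to a simple threshold check on the claimed dangerous failure rate. A minimal sketch; the constant and function names are hypothetical:

```python
# Sketch of the 1e-5/h threshold as the text applies it to diagnostics
# (IEC 61508-1:2010, Figure 3). Constant and function names are hypothetical.

THRESHOLD_PER_HOUR = 1e-5


def claim_makes_safety_related(claimed_dangerous_rate_per_hour: float) -> bool:
    """True if claiming this dangerous failure rate would make the diagnostics safety related."""
    return claimed_dangerous_rate_per_hour < THRESHOLD_PER_HOUR


if __name__ == "__main__":
    for claim in (1e-4, 1e-5, 1e-6):
        verdict = "safety related" if claim_makes_safety_related(claim) else "not safety related"
        print(f"claimed {claim:.0e}/h -> {verdict}")
```

A claim of exactly 10⁻⁵/h stays on the non-safety-related side of the threshold, which is why limiting the claim to that value avoids the problem.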

Item d) from Figure 2 would also seem to imply that you need to do an FMEDA or similar analysis to identify the failure modes of the diagnostics. It's unclear what the end user is supposed to do with this data, but you must still put it in the safety manual. My assumption is that the end user can use the data to detect a failure of the diagnostics, but that is only an assumption.

As regards the systematic capability of the diagnostics, it is obvious that if you cannot show sufficient separation between the diagnostics and the functionality of the safety function, the diagnostics become safety related. But what if they are sufficiently independent? IEC 61508:2010 contains limited guidance, and that guidance is somewhat hidden.

Figure 4 Extract from IEC 61508-6:2010 D.3

There doesn't seem to be any requirement in part 2 for diagnostics on your diagnostics, i.e. some equivalent of an SFF for the diagnostics themselves, only for monitoring the diagnostics.

Some sector-specific standards have this, such as ISO 26262 with its latent fault metric.

Figure 5 Latent fault metric from ISO 26262
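For comparison, the ISO 26262-5 latent fault metric is LFM = 1 − Σλ_MPF,L / Σ(λ − λ_SPF − λ_RF), summed over the safety-related hardware elements, with targets of 60%, 80% and 90% for ASIL B, C and D. The sketch below is illustrative only: the split into a functional channel plus the monitor that diagnoses it, the hypothetical start-up self-test, and all FIT values are assumptions. It shows how undetected (latent) failures of the monitor itself pull the metric down.

```python
from dataclasses import dataclass


@dataclass
class ElementRates:
    """Failure rates (FIT) of one safety-related hardware element."""
    total: float         # lambda: all failure modes
    single_point: float  # lambda_SPF
    residual: float      # lambda_RF
    latent_mpf: float    # lambda_MPF,L: latent multiple-point faults


# ISO 26262-5 LFM target values (no target is given for ASIL A).
LFM_TARGETS = {"B": 0.60, "C": 0.80, "D": 0.90}


def latent_fault_metric(elements: list[ElementRates]) -> float:
    """LFM = 1 - sum(lambda_MPF,L) / sum(lambda - lambda_SPF - lambda_RF)."""
    latent = sum(e.latent_mpf for e in elements)
    denominator = sum(e.total - e.single_point - e.residual for e in elements)
    if denominator <= 0.0:
        raise ValueError("denominator must be positive")
    return 1.0 - latent / denominator


def meets_lfm_target(lfm: float, asil: str) -> bool:
    """True if the LFM reaches the target for the given ASIL ("B", "C" or "D")."""
    return lfm >= LFM_TARGETS[asil]


if __name__ == "__main__":
    channel = ElementRates(total=100.0, single_point=0.0, residual=1.0, latent_mpf=2.0)
    # Monitor with no test of its own: half of its failures stay latent.
    monitor_untested = ElementRates(total=20.0, single_point=0.0, residual=0.0, latent_mpf=10.0)
    # Same monitor with a hypothetical start-up self-test catching most of its faults.
    monitor_tested = ElementRates(total=20.0, single_point=0.0, residual=0.0, latent_mpf=2.0)
    for label, monitor in (("untested monitor", monitor_untested), ("tested monitor", monitor_tested)):
        lfm = latent_fault_metric([channel, monitor])
        ok = [asil for asil in LFM_TARGETS if meets_lfm_target(lfm, asil)]
        print(f"{label}: LFM={lfm:.3f}, meets ASIL {ok}")
```

With these hypothetical numbers the untested monitor leaves the LFM just under the ASIL D target, and testing the monitor (diagnostics on the diagnostics) lifts it above.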

Our colleagues in Europe working in the machinery sector have definitely thought of this already (see the link at the bottom to access the entire document).

Figure 6 Extract from the vertical recommendations for use for the machinery sector

The machinery sector has clearly embraced this. In ISO 13849 (for Category 2 architectures), the assumption is made that the MTTFD of the diagnostics (the test channel) is at least half that of the functional channel, i.e. the dangerous failure rate of the diagnostics is no more than twice that of the safety function. To confirm this, you really need to calculate it, unless it is very obviously met.

In the other machinery safety standard, IEC 62061:2021, we find clear guidance in 7.4.4, including:

  • The diagnostic functions are considered as separate functions.
  • Applicable requirements for avoidance of systematic failures

These come with a requirement that either the SC (systematic capability) and PFH of the diagnostic function are the same as those of the safety function itself, or that the diagnostics are tested every two years (assuming a proof test interval of 10 years).

Before I finish, there is one caveat. Let's look at the definition of dangerous failure in IEC 61508.

Figure 7 Definition of a dangerous failure from IEC 61508-4:2010

This means that a failure of the diagnostics is not, in itself, a dangerous failure of the safety function. But that doesn't mean that a failure of a diagnostic function leaves the safety function just as likely to operate correctly when required to do so. A Markov model can show this.

Figure 8 Simple Markov model showing impact of an undetected failure of the diagnostics

The Markov model shows two ways to reach a dangerous undetected failure of the safety function: a dangerous failure of the safety function that is not covered by the diagnostics (state 0 to state 2), or a silent failure of the diagnostics (state 0 to state 1), after which any dangerous failure of the safety function takes you to the failed state. So once you are in state 1, the rate of moving from state 1 to state 2 can be up to 100 times the original rate of moving from state 0 to state 2 (for the 99% SFF needed for SIL 3). While the Markov modelling covers random hardware failures, the real world doesn't distinguish between systematic and random hardware failures.
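The three-state model described above can be solved in closed form. This is a minimal sketch under stated assumptions, not a reproduction of Figure 8: state 0 is all OK, state 1 is "diagnostics failed silently", state 2 is the dangerous undetected failure of the safety function; failure rates are constant; detected dangerous failures in state 0 are assumed to be repaired instantly, so they are left out; and there is no proof test within the interval. The function name and example rates are hypothetical.

```python
import math


def prob_failed_dangerous(t_hours: float, lambda_du: float, lambda_d: float,
                          lambda_diag: float) -> float:
    """Probability of being in state 2 at time t (closed-form CTMC solution).

    Transitions: 0->2 at lambda_du, 0->1 at lambda_diag, 1->2 at lambda_d.
    """
    a, b, c = lambda_du, lambda_diag, lambda_d
    p0 = math.exp(-(a + b) * t_hours)
    if math.isclose(c, a + b):
        # Degenerate case c == a + b: P1(t) = b * t * exp(-c t)
        p1 = b * t_hours * math.exp(-c * t_hours)
    else:
        # dP1/dt = b*P0 - c*P1 with P1(0) = 0
        p1 = b / (c - a - b) * (math.exp(-(a + b) * t_hours) - math.exp(-c * t_hours))
    return 1.0 - p0 - p1


if __name__ == "__main__":
    T = 8760.0                   # one-year interval (assumed)
    lambda_d = 1e-6              # dangerous failure rate of the safety function (assumed)
    lambda_du = 0.01 * lambda_d  # 99% diagnostic coverage (assumed)
    for lambda_diag in (0.0, 1e-6, 1e-5):
        p = prob_failed_dangerous(T, lambda_du, lambda_d, lambda_diag)
        print(f"lambda_diag={lambda_diag:.0e}/h  P(state 2 at T)={p:.2e}")
```

Setting the diagnostics' failure rate to zero recovers the usual 1 − exp(−λDU·t); with a diagnostics failure rate of 10⁻⁵/h the probability of being in state 2 after a year rises several-fold for these assumed rates.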

So, we wait with bated breath to see what will be agreed for IEC 61508 revision 3. I believe it will bring together some of the above and spell out the requirements more clearly.

The vertical recommendations for use for the machinery sector are available here.

For previous blogs in this series, see here.

For the full suite of ADI blogs on the EngineerZone platform, see here.

For the full range of ADI products, see here.