
Are Some Diagnostics Self-checking?

The new draft of IEC 61508 revision 3 contains requirements:

  • For diagnostics on your diagnostics.
  • That diagnostics have a systematic capability.

What I want to talk about today is the first of these. Can some diagnostics be considered self-checking? If they can, then they effectively already have a diagnostic on the diagnostic.

The primary concern that led to the requirement for diagnostics on diagnostics is that:

1. Your diagnostic circuitry fails. Initially, there is no issue since the main body of the safety function is still working fine. Next, the safety function fails, and since there is no diagnostic to detect the failure, bad things can happen.

2. If the safety function circuitry fails before the diagnostic circuitry fails, there is no issue, as the diagnostic circuitry will detect it and bring the system to a safe state.

Diagnostics on your diagnostics help with scenario 1 above by detecting that the diagnostics have failed and taking the system to a safe state. The question for this blog is whether some diagnostics are inherently self-checking and therefore don’t require diagnostics on the diagnostics, because a failure of the diagnostic quickly becomes apparent.

To state it differently, since the requirement is DSFF (diagnostic SFF), you get credit for safe failures of the diagnostic.

DSFF = 100.0 * (safe failure rate of the diagnostic)/(safe + dangerous failure rate of the diagnostic)

The most that will be required is a DSFF of 60% or 90%. So, if a large portion of the failures of the diagnostic lead to the system tripping, you would easily meet the new requirements from IEC 61508 revision 3.

Let’s take some examples to show the reasoning.

Example 1 – A CRC for an SPI Interface

A CRC is used, for instance, to check the data on a communications bus against corruption. The data bits are used to calculate a CRC value, which is transmitted along with the data. The receiver then performs the same calculation and verifies that the CRC is correct. If the values don’t match, the received data is corrupted, and the system should trip. This is shown in the graphic below.


Figure 1: Diagnostic for data corruption on an SPI bus – this could be the transmitter or the receiver

There is no need for an engine to feed the data bits to the CRC calculation logic, as the data “flows” past on the SPI bus.
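The following minimal Python sketch shows the transmit-and-check scheme. The blog doesn’t specify the polynomial or the framing, so this sketch assumes the CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF) and a CRC appended big-endian to the end of each frame. The function names are hypothetical.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8                  # bring the next byte into the top of the register
        for _ in range(8):                # one bit per step, as a serial shift register would
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(payload: bytes) -> bytes:
    """Transmitter: send the payload followed by its CRC (big-endian, assumed framing)."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def check_frame(frame: bytes) -> bool:
    """Receiver: recompute the CRC; False means corrupted data, so the system should trip."""
    payload, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    return crc16_ccitt(payload) == received


frame = append_crc(b"\x12\x34\x56")
print(check_frame(frame))                               # True
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]        # flip one bit in transit
print(check_frame(corrupted))                           # False
```

In hardware, the equivalent shift register can update by one bit on each SPI clock as the data goes past.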

Assuming a 16-bit CRC, there are 2^16 = 65536 possible values, and only one of them is correct. Therefore, if a faulty CRC calculation engine produces an effectively random output, there is only a 1/65536 chance that it is faulty but still calculates the correct value.
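A simple fault-injection sketch illustrates the self-checking argument. It models a hypothetical stuck-at-0 fault on one output bit of the receiver’s CRC engine and uses random 4-byte payloads. Such an engine still produces the correct CRC on any transfer where that bit happens to be 0. However, the fault is revealed, and the system trips, on the first transfer where the bit should be 1. Assuming each CRC bit is equally likely to be 0 or 1, that happens after about two transfers on average. The sketch uses Python’s `binascii.crc_hqx` with an initial value of 0xFFFF, which computes the same CRC-16/CCITT-FALSE value.

```python
import binascii
import random


def crc16(data: bytes) -> int:
    """CRC-16/CCITT-FALSE via the standard library (poly 0x1021, init 0xFFFF)."""
    return binascii.crc_hqx(data, 0xFFFF)


def faulty_receiver_crc(data: bytes, stuck_bit: int) -> int:
    """Receiver CRC engine with a hypothetical stuck-at-0 fault on one output bit."""
    return crc16(data) & ~(1 << stuck_bit)


def transfers_until_trip(stuck_bit: int, seed: int = 1, max_transfers: int = 1000) -> int:
    """Count transfers until the faulty receiver disagrees with the sent CRC (a trip)."""
    rng = random.Random(seed)
    for n in range(1, max_transfers + 1):
        payload = bytes(rng.randrange(256) for _ in range(4))
        sent_crc = crc16(payload)               # computed by the healthy transmitter
        if faulty_receiver_crc(payload, stuck_bit) != sent_crc:
            return n                            # mismatch: the system goes to its safe state
    return -1                                   # fault never revealed (not expected)


for bit in (0, 7, 15):
    print(bit, transfers_until_trip(bit))
```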

If we analyze this using a very simplified FMEDA, we obtain an SFF (in this case, essentially a DSFF) of over 99%.


Figure 2: A simplified FMEDA

The comparison logic is not self-checking, as its output could be stuck in the 'don’t trip' state regardless of the input data. The FMEDA assumes that the CRC calculation logic is 10 times the size of the comparison logic. So, even with no added diagnostics on the diagnostics, the DSFF requirement is met, even for a SIL 3 safety function.
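A few lines of Python can check the arithmetic behind that claim. This sketch doesn’t reproduce Figure 2, because the text doesn’t give its individual failure rates. Instead, it makes three assumptions:

  • The 10:1 size ratio is used as the ratio of relative failure rates.
  • A CRC-logic fault is dangerous only in the 1/65536 case where it still yields the correct CRC. Every other CRC-logic fault is treated as a safe trip.
  • Every comparison-logic fault is dangerous, which is the worst case.

```python
def dsff_percent(safe: float, dangerous: float) -> float:
    """DSFF as defined above: safe failure rate over total failure rate of the diagnostic."""
    return 100.0 * safe / (safe + dangerous)


# Assumed relative failure rates (arbitrary units), taken from the 10:1 size ratio
CRC_LOGIC_RATE = 10.0
COMPARE_LOGIC_RATE = 1.0

# CRC logic: a wrong CRC causes a trip (safe); a fault is dangerous only in the 1/65536 case
crc_dangerous = CRC_LOGIC_RATE / 2**16
crc_safe = CRC_LOGIC_RATE - crc_dangerous

# Worst case: every comparison-logic fault leaves the output stuck at "don't trip"
compare_dangerous = COMPARE_LOGIC_RATE
compare_safe = 0.0

dsff = dsff_percent(crc_safe + compare_safe, crc_dangerous + compare_dangerous)
print(f"DSFF = {dsff:.2f}%")  # DSFF = 90.91%
```

Even in this worst case, the result stays above the 90% needed for SIL 3. With these assumed rates, reaching the over-99% figure requires that fewer than roughly 11% of comparison-logic failures be dangerous.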

Note – the diagnostic is effectively checked on every data transfer, which makes it good enough for even a continuous mode safety function.

Example 2 – CRC used to Check a Flash Memory

This example is very similar to the SPI checker. However, in this case, there is an additional piece of logic to read the data bits from the memory and feed them to the CRC calculation engine. The CRC also checks this additional logic, which will have a higher failure rate than either the CRC calculation logic or the comparison logic. So the DSFF is likely to be even higher than before.


Figure 3: Diagnostic for data corruption in a flash memory

Note – using a CRC to monitor a flash memory results in a long diagnostic test interval, because the whole memory must be read to complete one check. It may therefore not be suitable for all applications.
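The flash check can be sketched in the same way. This minimal Python model makes two assumptions: the CRC is stored big-endian in the last two bytes of the image, and the CRC variant is CRC-16/CCITT-FALSE, computed with `binascii.crc_hqx`. The read logic is modelled as a function that walks every address. Injecting a hypothetical fault that holds one address bit at 0 shows that the CRC catches a failure of the read logic, not just corrupted data.

```python
import binascii


def crc16(data: bytes) -> int:
    """CRC-16/CCITT-FALSE via the standard library (poly 0x1021, init 0xFFFF)."""
    return binascii.crc_hqx(data, 0xFFFF)


def build_image(payload: bytes) -> bytes:
    """Flash image = payload followed by its CRC (big-endian), as stored at build time."""
    return payload + crc16(payload).to_bytes(2, "big")


def read_flash(image: bytes, stuck_low_mask: int = 0) -> bytes:
    """Read logic that walks every address; stuck_low_mask models address bits stuck at 0."""
    return bytes(image[addr & ~stuck_low_mask] for addr in range(len(image)))


def flash_check_ok(image: bytes, stuck_low_mask: int = 0) -> bool:
    """Recompute the CRC over what the read logic returns and compare with the stored CRC."""
    data = read_flash(image, stuck_low_mask)
    payload, stored = data[:-2], int.from_bytes(data[-2:], "big")
    return crc16(payload) == stored


image = build_image(bytes(range(256)))
print(flash_check_ok(image))                          # True: healthy memory and read logic
corrupted = bytearray(image)
corrupted[100] ^= 0x04                                # one flipped bit in the memory
print(flash_check_ok(bytes(corrupted)))               # False
print(flash_check_ok(image, stuck_low_mask=0x08))     # False: read logic fault detected
```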

Example 3 – ADC used to Monitor the Output of a 4/20mA DAC

I have previously blogged about 4/20mA systems; see here. It is usually challenging to implement a single-channel SIL 3 system because of the SFF requirements, but one application where it may be possible is a 4/20mA DAC. For SIL 3 with a hardware fault tolerance of zero (HFT=0), you need an SFF of 99% and a DSFF of 90%. Can the DSFF requirement be achieved with a self-checking argument? A possible architecture is shown below.


Figure 4: ADC used as a diagnostic for a DAC

Assuming:

  • The DAC output is 4mA to 20mA
  • The ADC input range is 0 to 5V
  • The sense resistor for the returned current is 250 ohm (250 ohm × 20 mA = 5 V)
  • The safety accuracy is +/-1%

Then, 99% of the time, a failure in the ADC (which is the diagnostic for the DAC) will give a reading that doesn’t correspond to the expected value. Even if the ADC output is stuck at one value, the failure of the ADC will become apparent once the DAC output changes. A potential issue remains in one scenario. Suppose the DAC output stays at one value for, say, a year. The ADC then fails with its output stuck at that same value, for instance because the loss of an on-chip clock stops the output register from updating. If the DAC also fails within that year, the failure goes undetected.
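The following minimal Python model shows the readback comparison. It uses the 250 ohm sense resistor from the list above. It also assumes that the ±1% safety accuracy means ±1% of the 5 V ADC full scale, which gives ±50 mV. The stuck ADC is modelled as always returning 3.0 V, which is the reading for 12 mA. It passes every check while the DAC sits at 12 mA and trips as soon as the DAC moves.

```python
from typing import Callable, Optional, Sequence

SENSE_OHMS = 250.0        # sense resistor from the text (20 mA x 250 ohm = 5 V)
TOL_V = 0.01 * 5.0        # assumption: +/-1% of the 5 V ADC full scale


def expected_volts(dac_ma: float) -> float:
    """Voltage the ADC should see for a given DAC loop current."""
    return dac_ma / 1000.0 * SENSE_OHMS


def readback_ok(dac_ma: float, adc_volts: float) -> bool:
    """True if the ADC readback agrees with the commanded DAC current."""
    return abs(adc_volts - expected_volts(dac_ma)) <= TOL_V


def first_trip(setpoints_ma: Sequence[float],
               adc_reading: Callable[[float], float]) -> Optional[int]:
    """Index of the first setpoint where the readback disagrees, or None if none do."""
    for i, ma in enumerate(setpoints_ma):
        if not readback_ok(ma, adc_reading(ma)):
            return i
    return None


def healthy_adc(ma: float) -> float:
    return expected_volts(ma)      # ideal conversion of the sensed voltage


def stuck_adc(ma: float) -> float:
    return 3.0                     # output stuck at the 12 mA reading


print(first_trip([4.0, 12.0, 20.0], healthy_adc))       # None
print(first_trip([12.0, 12.0, 12.0, 16.0], stuck_adc))  # 3
```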

Even if this means that the ADC might not require any diagnostics, it will still need a systematic capability (SC) according to the proposed new rules for IEC 61508 revision 3. A suitable ADC might be the ADFS7124.

You could even combine the DAC and ADC in one package, as found in the ADFS5758. In both cases, if the self-checking argument doesn’t convince you, the parts above also have additional diagnostics, even if these are not strictly required. For instance, these diagnostics could address the case where the DAC output doesn’t change for a year and the ADC output is stuck at the old value. You would switch other voltages onto the ADC input to confirm that its output is not stuck.
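Here is a sketch of switching known voltages onto the ADC input. The `read_with_input` callable is a stand-in for a hypothetical input multiplexer plus a conversion. It is not the ADFS7124 or ADFS5758 register interface. The sketch uses two test voltages that are further apart than twice the tolerance, so a reading stuck near one of them still fails at the other.

```python
from typing import Callable, Sequence


def adc_not_stuck(read_with_input: Callable[[float], float],
                  test_volts: Sequence[float] = (1.0, 4.0),
                  tol: float = 0.05) -> bool:
    """Return True only if the ADC reading follows every injected test voltage."""
    # A stuck output can match at most one of two widely spaced test voltages
    return all(abs(read_with_input(v) - v) <= tol for v in test_volts)


def healthy(v: float) -> float:
    return v        # ideal ADC: reads back what is applied


def stuck(v: float) -> float:
    return 3.0      # output register frozen at an old value


print(adc_not_stuck(healthy))  # True
print(adc_not_stuck(stuck))    # False
```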

Example 4 – A Watchdog Timer

This is an example where the logic has no way of knowing that the diagnostic has failed. The logic has to reset the watchdog timer before the watchdog asserts its reset signal to the logic. Because the logic always resets the watchdog first, it never sees the watchdog fire. It can only assume that the watchdog timer is still operating and would have reset the logic if it had not been serviced in time. A watchdog timer is therefore not self-checking.
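A small simulation with a hypothetical tick-based watchdog model makes the point. While the logic is healthy, a working watchdog and a dead one (for example, one with a stopped clock) look identical from the logic’s point of view, because neither ever asserts reset. The difference only appears once the logic hangs, which is exactly when the watchdog is needed.

```python
from typing import List, Optional


class Watchdog:
    """Hypothetical watchdog model: counts ticks and asserts reset on timeout."""

    def __init__(self, timeout: int, working: bool = True):
        self.timeout = timeout
        self.working = working      # False models a dead watchdog (e.g. stopped clock)
        self.count = 0

    def kick(self) -> None:
        """Service the watchdog: restart its count."""
        self.count = 0

    def tick(self) -> bool:
        """Advance one tick; return True if the watchdog asserts reset."""
        if not self.working:
            return False
        self.count += 1
        return self.count >= self.timeout


def run(wd: Watchdog, ticks: int, kick_every: Optional[int]) -> List[int]:
    """Ticks at which reset was asserted; kick_every=None models hung logic."""
    resets = []
    for t in range(1, ticks + 1):
        if kick_every is not None and t % kick_every == 0:
            wd.kick()
        if wd.tick():
            resets.append(t)
            wd.kick()               # the reset restarts the logic and the watchdog count
    return resets


# Healthy logic: working and dead watchdogs are indistinguishable
print(run(Watchdog(10), 100, 5), run(Watchdog(10, working=False), 100, 5))  # [] []
# Hung logic: only the working watchdog recovers the system
print(run(Watchdog(10), 35, None), run(Watchdog(10, working=False), 35, None))  # [10, 20, 30] []
```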

I had planned to look at other diagnostics such as:

  • Dual-core lockstep
  • ECC on a RAM

But the blog is already long enough. These are things that can be your homework.

Summary

Don’t panic because you see a new requirement in IEC 61508 revision 3. Do the calculations. Perhaps you will be able to show that, because your diagnostics are self-checking, you already meet the DSFF requirement. The requirement might not be as new as you think, either: our automotive colleagues have had a latent fault metric in ISO 26262 since 2011.

For a previous blog related to diagnostics on your diagnostics see here.

For previous blogs in this series, see here.

For the full suite of ADI blogs on the EngineerZone platform, see here.

For the full range of ADI products, see here.