On Chip Redundancy According to IEC 61508 Annex E

There are many advantages to claiming your safety function is implemented redundantly, including:

  • A relaxation in the diagnostic test interval for each channel, e.g. you might be able to run your diagnostics once per month instead of once every second
  • A relaxed diagnostic coverage requirement for each channel, e.g. 90% for each channel instead of 99%
  • A better PFH/PFD, e.g. reduced by roughly a factor of 10 if your common cause failure fraction (beta) is 10%
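To put a number on the third point, here is a minimal sketch of a simplified beta-factor model in which the PFH of a redundant pair is dominated by the common cause term, so that the redundant PFH is roughly beta times the single-channel PFH. The function name and the example single-channel PFH are my own assumptions for illustration, and the full equations in IEC 61508-6 contain further terms that are ignored here.

```python
def pfh_redundant_approx(pfh_single: float, beta: float) -> float:
    """Approximate PFH of a two-channel redundant system.

    Simplification: only the common cause term (beta * pfh_single) is kept;
    the independent double-failure term is neglected.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    return beta * pfh_single


pfh_single = 1e-7  # hypothetical single-channel PFH, per hour
beta = 0.10        # 10% common cause failure fraction

pfh_pair = pfh_redundant_approx(pfh_single, beta)
improvement = pfh_single / pfh_pair
print(f"single: {pfh_single:.1e}/h  redundant: {pfh_pair:.1e}/h  factor: {improvement:.0f}")
```

With these example numbers the improvement factor is simply 1/beta. A real calculation would also need the diagnostic coverage and test intervals of each channel.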

Some standards, especially in the machinery sector, mandate a minimum level of redundancy. Typically, these are standards which previously relied on equipment where the reliability was poor and high-rate diagnostics were hard to achieve.

However, redundancy obviously comes with costs, one of which is PCB area if you are building an electronics-based element from discrete ICs (integrated circuits).

Given the advantages and disadvantages above, an obvious question is whether you can achieve redundancy with a single integrated circuit, i.e. on-chip redundancy. Put another way, if everything else in your system is redundant, does a single IC limit you to claiming only HFT=0 or CAT 2 instead of HFT=1 or CAT 3?

Note: HFT = hardware fault tolerance from IEC 61508. HFT=0 indicates no redundancy; HFT=1 indicates redundancy. CAT 2 and CAT 3 are non-redundant and redundant architectures from ISO 13849, although ISO 13849 does allow diagnostics to be considered as redundancy in some cases.

Some might be surprised that I would even attempt to claim HFT=1 with a single IC shared by two channels, but IEC 61508:2010 gives guidance on how to achieve this.

Some might wonder whether it is worth the hassle. Won’t it upset your assessor? Short answer: it probably will, but if you are space constrained perhaps it is worth the hassle.

Concerns you and your assessor might have with claiming redundancy despite having what looks like a single point of failure could include:

  1. Both channels failing simultaneously due to common cause failures
  2. Isn’t the package still a CCF (common cause failure)?
  3. A systematic failure in both channels if they are identical

Item 1) will be covered in the rest of this blog.

Item 2) is not considered in the standard and I won’t comment on it here.

Item 3) is not relevant since HFT only deals with random hardware failures and the standard already contains lots of measures to prevent the introduction of systematic failure modes (see for instance last month’s blog in this series on IEC 61508-2 Annex F).

OK, let’s get started. I will highlight three interesting parts showing on-chip redundancy and then discuss the formal requirements given in IEC 61508 for on-chip redundancy.

The first part I will look at is the AD7902, which is a “Dual Pseudo Differential 16 bit” 1 million samples per second SAR (successive approximation register) ADC (analog-to-digital converter).

Figure 1: AD7902 SAR ADC

The good news is that the AD7902 has two completely separate dies in a single package and is therefore not subject to the requirements from IEC 61508-2:2010 Annex E which we will see later.

Another interesting ADC is the AD7770. The AD7770 has eight channels of 24-bit sigma-delta ADC and then at the bottom a single 12-bit SAR ADC. The bit that intrigues me from a functional safety point of view is that SAR and sigma-delta are very diverse architectures, so the probability of a single failure causing a SAR and a sigma-delta to fail in the same way at the same time would be low. In this case the SAR is also fast enough to convert the eight sigma-delta inputs to give a lower resolution “backup” or diagnostic measurement of those eight inputs. If you read my recent blog on safety accuracy you will see why 12-bit diagnostics are probably sufficient for a 24-bit ADC.

This part would need to meet the requirements we will see later from Annex E because all nine ADCs are on a single substrate. However, if you considered it as eight channels implementing eight separate safety functions, with the SAR ADC providing diagnostics to all eight safety functions, Annex E would not be relevant since there is no claim for HFT=1 and there is no requirement in IEC 61508:2010 to separate an item and its diagnostics. Similarly, if it was just a single safety function which needed to measure eight inputs, there is still no HFT=1 claim.
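The idea of using the 12-bit SAR as a diagnostic on a 24-bit sigma-delta reading can be sketched as a simple plausibility check. This is only an illustration under my own assumptions: both converters are treated as returning unsigned codes against the same full-scale range, the function names are hypothetical, and the default tolerance of four 12-bit LSBs is a placeholder rather than a value from the data sheet or the standard.

```python
SD_BITS = 24   # sigma-delta resolution
SAR_BITS = 12  # SAR resolution


def to_fraction(code: int, bits: int) -> float:
    """Convert an unsigned ADC code to a fraction of full scale (assumed coding)."""
    return code / (2 ** bits)


def readings_agree(sd_code: int, sar_code: int, tolerance_lsb_12: float = 4.0) -> bool:
    """Return True if the two readings agree within a tolerance given in 12-bit LSBs."""
    tolerance = tolerance_lsb_12 / (2 ** SAR_BITS)
    difference = abs(to_fraction(sd_code, SD_BITS) - to_fraction(sar_code, SAR_BITS))
    return difference <= tolerance


# Mid-scale on both converters: the diagnostic passes.
print(readings_agree(0x800000, 0x800))
# SAR reads 256 codes higher than the sigma-delta implies: the diagnostic fails.
print(readings_agree(0x800000, 0x900))
```

In a real design the tolerance would come from the safety accuracy budget, and the comparison would have to allow for the two converters sampling at different times.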

Figure 2: AD7770

The MAX32690 is another part I must study more in future. Its interesting on-chip redundancy feature is that it has an Arm Cortex-M4 and a RISC-V processor. Could this combination somehow be exploited for a claim of on-chip HFT? I think the issues would be very similar to those on the AD7770. A claim of HFT=1 will need Annex E compliance, but perhaps you could use the diversity to implement a diagnostic to detect failures in either core without claiming HFT=1, and to address systematic failures of either core.

Figure 3: MAX32690 microcontroller block diagram

If you do insist on claiming HFT=1, let’s look at the requirements in IEC 61508:2010. The formal requirements for claiming on-chip redundancy are given in IEC 61508-2:2010 Annex E. For now they apply only to digital ICs because Annex E states “The following requirements are related to digital ICs only. For mixed-mode and analogue ICs no general requirements can be given at the moment”.

The reason for restricting the annex to digital ICs is not that you can’t claim on-chip HFT for analog and mixed signal ICs, but rather a lack of experience with analog and mixed signal ICs on the team writing the guidance back before 2010. Given that analog and mixed signal ICs already use things like isolation wells and careful routing of power and ground for performance reasons rather than safety reasons, they will meet a lot of the requirements we will discuss below without any extra effort for safety.

The Annex E requirements are onerous, and this probably explains why not many open market parts have ever successfully made the claim. There may be parts designed for specific customers and not featured on the IC manufacturers’ websites, but I believe the numbers are still small.

Examples of the requirements from Annex E:

  • On-chip temperature needs to be considered since it could be a source of common cause failure
  • Separate physical blocks need to be used on-chip. This means no big sea-of-gates for the digital logic of both channels. However, the standard goes further and requires separate bond wires and no shared pins or tracks from one channel routed over the other channel. This would mean two separate sets of power and ground pins, one set for each channel.
  • Each of the redundant channels needs a DC (diagnostic coverage) of at least 60%

The requirement for separate pinouts and the requirement to use completely separate on-chip physical blocks are probably the most onerous of the requirements. There is also an interesting requirement to calculate a Beta IC for the redundant items based on a list of bonus and malus (opposite of bonus) attributes, and if you get a Beta IC of less than 25% you can claim HFT=1. I may cover this in more detail in a future blog.
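The bonus and malus bookkeeping can be sketched roughly as follows: start from a base value, add the malus items, subtract the bonus items and compare the result against the 25% limit. This is only a sketch under my own assumptions. The function names, the starting value of 33% and the individual adjustments below are placeholders for illustration, so take the real values from the tables in Annex E. The 60% figure is treated here as a minimum DC for each channel.

```python
from typing import Iterable

BETA_IC_LIMIT = 25.0  # percent, the threshold quoted above


def beta_ic(base: float, malus: Iterable[float], bonus: Iterable[float]) -> float:
    """Base value plus malus items minus bonus items, all in percent, floored at 0."""
    return max(0.0, base + sum(malus) - sum(bonus))


def can_claim_hft1(beta: float, channel_dcs: Iterable[float], min_dc: float = 60.0) -> bool:
    """Check the Beta IC limit and the per-channel DC requirement (both simplified)."""
    return beta < BETA_IC_LIMIT and all(dc >= min_dc for dc in channel_dcs)


# Placeholder numbers only; not taken from the standard's tables.
example_beta = beta_ic(base=33.0, malus=[5.0], bonus=[4.0, 6.0, 5.0])
print(f"Beta IC = {example_beta:.0f}%")
print("HFT=1 claim possible:", can_claim_hft1(example_beta, channel_dcs=[90.0, 90.0]))
```

Passing this check is not sufficient on its own; the physical separation requirements listed above still have to be met.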

Revision 3 of IEC 61508 will take a less prescriptive approach which is more similar to automotive. The CDV (committee draft for voting) should be available for public voting (contact your national committee for access) in autumn 2024.

About My Blog Series:

Check back next month on the second Tuesday of the month for the next blog in this series. Until then I hope to post “mini blogs” on the other Tuesdays in the month directly from my LinkedIn account. Please follow me on LinkedIn if interested.

For previous blogs in this series see here.

For the full suite of ADI blogs on the EngineerZone platform see here.

For the full range of ADI products see here.