Hardware Reliability Metrics - PFH and PFD

This blog follows on from the last one, which covered demand modes, particularly low and high demand mode. It is written to fit the length of a blog and is aimed at enthusiastic amateurs in functional safety. Experts might want more, so I have included some links at the end of the blog.

First the basics:

·         PFH = probability of dangerous failure per hour (IEC 62061 adds a very helpful “d” as in PFHd to remind us that it is for dangerous failures only)

·         PFDavg = probability of failure on demand average (its inverse is RRF or risk reduction factor)
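
The relationships between these quantities are simple arithmetic. As a minimal sketch in Python (the function names are made up for illustration, and FIT is taken as failures per 1e9 hours):

```python
def rrf_from_pfd(pfd_avg: float) -> float:
    """Risk reduction factor is the inverse of PFDavg."""
    if pfd_avg <= 0:
        raise ValueError("PFDavg must be positive")
    return 1.0 / pfd_avg


def fit_from_rate(rate_per_hour: float) -> float:
    """Convert a failure rate per hour to FIT (failures per 1e9 hours)."""
    return rate_per_hour * 1e9


print(rrf_from_pfd(0.001))   # RRF for a PFDavg of 0.001
print(fit_from_rate(1e-7))   # FIT for a rate of 1e-7 per hour
```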

PFH and PFDavg are the reliability targets for the hardware with respect to random hardware failures, and the target depends on the SIL. Tables 2 and 3 of IEC 61508-1:2010 give the actual requirements, as shown in Figure 1.

Figure 1 - link between PFH and PFDavg

A key point to remember is that both metrics are only concerned with dangerous failures that would prevent a system from maintaining safety or achieving a safe state. Dangerous failures detected by diagnostics are effectively safe and excluded from the metric.

Recall from the last blog that the divide between low and high demand mode is a demand rate of once/year. At that demand rate the requirements for PFH and PFDavg are the same, provided a year is taken as 10,000 hours instead of 8,760 hours.
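
A quick way to check that claim, as a sketch: multiply a PFH limit by the hours in a year to get a probability of dangerous failure per year, then compare it with the matching PFDavg limit. The SIL 3 limits of 1e-7/h and 1e-3 are used here, and the variable names are illustrative only.

```python
PFH_SIL3_LIMIT = 1e-7   # per hour, upper limit for SIL 3 in high demand mode
PFD_SIL3_LIMIT = 1e-3   # dimensionless, upper limit for SIL 3 in low demand mode

for hours_per_year in (8_760, 10_000):
    # At one demand per year, PFH x hours per year is comparable to PFDavg
    per_year = PFH_SIL3_LIMIT * hours_per_year
    print(f"{hours_per_year} h/year -> {per_year:.2e} per year "
          f"(PFDavg limit {PFD_SIL3_LIMIT:.0e})")
```

With 8,760 hours the two limits differ by about 12%, and with the rounded 10,000 hours they coincide.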

Reading the tables, if you have a SIL 3 high demand safety function, then the PFH needs to be < 1e-7/h (100 FIT). The PFH actually achieved could be determined using an FMEA (failure mode and effects analysis) or FTA (fault tree analysis). For low demand, a SIL 3 safety function needs to have an average probability of failure on demand of less than 0.001.
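
Reading the tables can be sketched as a simple band lookup. The band edges below are the decade ranges as I understand them from the IEC 61508-1:2010 tables, so treat them as an assumption and check them against the standard. The names PFH_BANDS, PFD_BANDS and sil_for are hypothetical.

```python
from typing import List, Optional, Tuple

# (SIL, lower bound inclusive, upper bound exclusive)
Band = Tuple[int, float, float]

# High demand or continuous mode: PFH in dangerous failures per hour
PFH_BANDS: List[Band] = [
    (4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5),
]
# Low demand mode: PFDavg, dimensionless
PFD_BANDS: List[Band] = [
    (4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1),
]


def sil_for(value: float, bands: List[Band]) -> Optional[int]:
    """Return the SIL whose band contains value, or None if no band matches."""
    for sil, lower, upper in bands:
        if lower <= value < upper:
            return sil
    return None


print(sil_for(0.99e-7, PFH_BANDS))  # a PFH just under 1e-7/h
print(sil_for(9e-3, PFD_BANDS))     # a PFDavg of 9e-3
```

One design choice to note: a value better than the SIL 4 band returns None in this sketch rather than SIL 4, so the caller has to decide how to treat it.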

Using approximations from IEC 61508-6:2010 the above leads to an interesting anomaly whereby it appears that the reliability requirement increases by a factor of 10 as the demand rate changes from 1.01/year to 0.99/year. This doesn’t make a lot of sense as the demand rate is falling.

To illustrate, suppose you have a single channel system with a lifetime of 20 years and no proof testing (proof test interval = 20 years). Further suppose a high demand rate of 1.01/year (call it 1/year to keep the maths simple). Then, to meet the requirements of IEC 61508-1:2010 table 3, a λDU of 0.99e-7/h allows a claim of SIL 3 in terms of the hardware metric.

However, suppose the demand rate falls to 0.99/year (again call it 1/year to keep the maths simple). It is surely a good thing to have the demand rate fall, so what SIL are we allowed to claim using the low demand rules? Based on IEC 61508-6:2010 clause B. we get a PFDavg of 9e-3, which is right at the top of the SIL 2 range, and the λDU apparently needs to fall by a factor of 10 to get within the SIL 3 range. As I said, it is confusing and doesn’t appear to make sense. Could the august ladies and gentlemen who wrote the standard have made a mistake? Fear not, they did not.
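
The arithmetic behind those numbers can be sketched as follows. I am assuming the simplified single channel approximation PFDavg ≈ λDU × T / 2, with no detected failures and negligible repair time. The full equations in IEC 61508-6:2010 have more terms, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 8_760


def pfd_avg_single_channel(lambda_du: float, proof_test_interval_h: float) -> float:
    """Simplified single channel approximation: PFDavg ~ lambda_DU * T / 2."""
    return lambda_du * proof_test_interval_h / 2.0


lambda_du = 0.99e-7                 # per hour, from the high demand example
t_proof = 20 * HOURS_PER_YEAR       # 20 years expressed in hours

pfd = pfd_avg_single_channel(lambda_du, t_proof)
# lambda_DU that would bring PFDavg down to the SIL 3 limit of 1e-3
lambda_du_for_sil3 = 2.0 * 1e-3 / t_proof

print(f"PFDavg = {pfd:.2e}")
print(f"lambda_DU needed for SIL 3 = {lambda_du_for_sil3:.2e} per hour")
print(f"reduction factor = {lambda_du / lambda_du_for_sil3:.1f}")
```

Under these assumptions the PFDavg comes out at roughly 8.7e-3 and the reduction factor at a little under 9, which matches the rounded figures of 9e-3 and a factor of 10 quoted above.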

Even if demand rates are used to calculate a RRF the apparent anomaly continues as shown below. Surely this can’t be right, what gives?

Figure 2 - apparent increase in required reliability as demand rate crosses 1/year

The above numbers assume a proof test interval of 20 years as stipulated for instance in ISO 10218-1.

The first warning that something is wrong is that the situation gets much worse as the demand rate increases further (into the high demand region) if you follow the low demand rules. This suggests something is very wrong with the low demand approximation at demand rates above 0.1/year, including the noted 1.0/year.

The second warning is given in IEC 61508-6:2010 clause B.3.1, where it states, “the expected interval between demands is at least an order of magnitude greater than the proof test interval.” However, you must read the standard very carefully to spot that. In this case, with a proof test interval of 20 years, the approximate equation would therefore only hold once the demand rate fell to once every 200 years!
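
That condition is easy to turn into a check. A minimal sketch, where the function name and the default margin of 10 (my reading of "an order of magnitude") are assumptions for illustration:

```python
def low_demand_approximation_holds(
    demand_rate_per_year: float,
    proof_test_interval_years: float,
    margin: float = 10.0,
) -> bool:
    """True if the expected interval between demands is at least
    `margin` times the proof test interval."""
    if demand_rate_per_year <= 0:
        raise ValueError("demand rate must be positive")
    interval_between_demands = 1.0 / demand_rate_per_year
    return interval_between_demands >= margin * proof_test_interval_years


for rate in (1.0, 0.1, 0.01, 0.001):
    ok = low_demand_approximation_holds(rate, proof_test_interval_years=20)
    print(f"{rate}/year with a 20 year proof test interval: {ok}")
```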

So how can you rationalize this? The picture below helps somewhat.

Figure 3 - How a low demand system fails

Looking at the above picture, you would estimate that the hazardous event frequency follows the red curve below, increasing as the demand rate increases. However, the safety function stands between the demands and the hazardous events, so the hazardous event rate can never exceed the dangerous failure rate of the safety function, which is independent of the demand rate and is shown by the purple line on the graph. When you crunch the maths, you get something like the dashed line below, which gives the real hazardous event frequency. If this dashed line is used, the anomaly around the 1/year demand rate disappears and the hazardous event frequency can never exceed that given by λDU.

Figure 4 - maximum hazardous event rate

In IEC 63161 the equation for the curve above is referred to as the Henley/Kumamoto equation.
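
The shape of such a curve can be sketched with a simple model. To be clear, this is not taken from IEC 63161 and may not be the form given there. It assumes a single channel with a constant λDU, demands arriving at random at a constant rate, a proof test that fully restores the channel every T hours, a demand on a failed channel always producing a hazardous event, and λDU × T small. Under those assumptions the hazardous event rate works out as λDU × (1 − (1 − e^(−D·T)) / (D·T)). The function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 8_760


def hazardous_event_rate(lambda_du: float, demand_rate: float,
                         proof_test_interval: float) -> float:
    """Assumed model: lambda_DU * (1 - (1 - exp(-D*T)) / (D*T)).

    All arguments use the same time unit (per hour and hours here).
    """
    if lambda_du < 0 or demand_rate < 0 or proof_test_interval < 0:
        raise ValueError("inputs must be non-negative")
    x = demand_rate * proof_test_interval
    if x < 1e-6:
        # Small-x limit avoids cancellation: equals D * lambda_DU * T / 2
        return lambda_du * x / 2.0
    return lambda_du * (1.0 + math.expm1(-x) / x)


LAMBDA_DU = 1e-7                    # per hour
T_PROOF = 20 * HOURS_PER_YEAR       # hours

for d_per_year in (0.001, 0.01, 0.1, 1.0, 10.0, 100.0):
    d = d_per_year / HOURS_PER_YEAR
    naive = d * LAMBDA_DU * T_PROOF / 2.0   # demand rate x PFDavg, unbounded
    model = hazardous_event_rate(LAMBDA_DU, d, T_PROOF)
    print(f"{d_per_year:>7} /year  naive {naive:.2e} /h  model {model:.2e} /h")
```

At low demand rates the model matches demand rate × PFDavg, and at high demand rates it flattens out at λDU instead of growing without limit.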

While this is a complicated topic and difficult to handle in a blog here are some takeaways:

·        High demand mode gives the maximum hazardous event rate for a given λDU

·        IC suppliers, and indeed module or sub-system suppliers, need to give a λDU and leave the PFH and PFD calculations to the safety function designer

·        An important reason to calculate the RRF/PFD for low demand is to determine the required SIL for the systematic requirements

·        As the demand rate falls from once/year towards once/thousand years, the required RRF and SIL (SC) fall

For more information on this topic please see the referenced papers and standards below.

I was stumped finding a relevant video, but these are at least somewhat relevant. See https://www.youtube.com/watch?v=3a_24tJ3YYk (apologies, the quality is poor), https://www.youtube.com/watch?v=zBZrXuHmrtM and finally https://www.youtube.com/watch?v=DfdGyzTa_gc .

The last two blogs have been complex so I am going to take the easy way out for the next few blogs and discuss some standard parts from ADI which I believe have interesting properties as regards functional safety. The first such part is the AD7124.

For more reading on this topic see:

1)     SIL determination – Dealing with the unexpected

2)     IEC 63161 – Assignment of a safety integrity – Basic Rationale

3)     ISO/TR 12489:2013 – Petroleum, petrochemical and natural gas industries – Reliability modelling and calculation of safety systems