
I Want It All - High Safety and Availability – Part 1

Previously I wrote a blog on the 1oo2 (1 out of 2) architecture. Another interesting architecture described and analysed in IEC 61508-6:2010 is the 1oo2D architecture, which promises both high safety and high availability. This blog contains some of my thoughts on it.

Firstly, let's discuss the difference between safety and availability, illustrated with a few examples. If I go out to your garage and cut the wire to the ignition switch on your car, your car will now be very safe (barring something unusual) but its availability will be zero (I guess you could still sleep in it).

Conversely, suppose you have a robot application protected by a light curtain. If you bypass the light curtain, the availability of the robot will improve because it no longer shuts down when the operator breaks the light curtain, but the safety will tend towards zero. Functional safety architectures such as 1oo2D and 2oo3 can give high safety and high availability. In a future blog I might cover 2oo3, but today it is 1oo2D.

The 1oo2D architecture uses the principle of degradation to achieve its aims. When all shiny and new out of the box, it starts life as a 2oo2 architecture (both channels need to demand the safety function to trip the system). Once a first failure is detected, and provided the failure can be allocated to one of the channels, it changes to a 1oo1D architecture. Another way to view this circuit is as two 1oo1D circuits in parallel (1oo1D is not shown in IEC 61508 part 6, but it is a standard 1oo1 circuit with a second diagnostic output). I figure 1oo2D has “1oo2” in its name to reflect the fact that it has a hardware fault tolerance (HFT) of 1 (see discussion below), but you might as well call it “2oo2D” since it starts as 2oo2. Anyway, ignoring the name, let's discuss its functionality as described in IEC 61508-6.

Figure 1 - 1oo2D diagram from IEC 61508-6

The drawing from IEC 61508 doesn’t describe its functionality, so below is my attempt at a version of 1oo2D which matches the description from IEC 61508. Note that other documents, such as IEC 61131-6 (functional safety for programmable controllers), give variations on the interpretation, so be careful.

In what follows, it is assumed that the actuator is some sort of solenoid and that, while power is available to it, the system will not trip.

With the 1oo2D architecture, each channel has its output switch controlled by the channel logic and, in series with it, a diagnostic switch controlled by the diagnostics for the channel. Channel 2 has the same arrangement, with the channel 2 functional and diagnostic switches in parallel with those for channel 1. The circuit starts as 2oo2 because both channel 1 and channel 2 would need to open their switches to achieve the safe state (power removal from the actuator). To properly implement the voting, each channel can also control the other's diagnostic switch.

Figure 2 - Tom's interpretation of 1oo2D

In reality, voting may be implemented in a PLC and there may not be any real switches as shown above. Each channel may report its data and status and a 2oo3 logic solver in the PLC implements the voting.

If you did somehow implement the voting using relays, note that the equations in IEC 61508 part 6 include a factor “K”, and this represents the failure rate of the voting logic. What happens if, instead of switching from 2oo2 to 1oo1D, it stays in 2oo2 mode? Then, when a demand occurs, the faulty channel may prevent the trip. You get your availability but not your safety.

It’s easy to see that this circuit gives high availability because, by its very description, it starts with two channels each of which must trip to take down the system. In addition, if one channel fails with a detected error it opens its diagnostic switch and the overall circuit continues to operate as a single-channel system until the failure is repaired.
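The switch arrangement described above can be sketched as a small truth function. This is only a minimal model under my own assumptions: the function and parameter names are hypothetical, each channel's output and diagnostic switches are treated as ideal, and the cross-control of the other channel's diagnostic switch is left out.

```python
def branch_closed(demands_trip: bool, fault_detected: bool) -> bool:
    """One channel's branch: output switch in series with its diagnostic switch."""
    output_switch_closed = not demands_trip        # opens when the channel logic demands a trip
    diagnostic_switch_closed = not fault_detected  # opens when the channel's diagnostics find a fault
    return output_switch_closed and diagnostic_switch_closed


def actuator_energized(ch1_trip: bool, ch1_fault: bool,
                       ch2_trip: bool, ch2_fault: bool) -> bool:
    """The two branches are in parallel: either closed branch keeps the solenoid powered."""
    return branch_closed(ch1_trip, ch1_fault) or branch_closed(ch2_trip, ch2_fault)


if __name__ == "__main__":
    # Healthy system, only channel 1 demands a trip: still powered (2oo2 behaviour).
    print(actuator_energized(True, False, False, False))
    # Channel 1 has a detected fault, channel 2 demands a trip: power removed (1oo1D behaviour).
    print(actuator_energized(False, True, True, False))
```

With no detected faults, both channels must demand a trip before power is removed. Once one channel's diagnostic switch is open, the remaining channel decides on its own.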

Deciding which channel the system should follow is referred to as voting.

That it is high safety can be seen from the extract below from IEC 61508-6:2010 B.3.3.3, where for DC = 90% and Beta = 10% a 1oo1 architecture has a PFH of 5e-9/h vs 2.3e-9/h for the 1oo2D arrangement. It may come as a surprise that it is only about 2x better in terms of safety than a 1oo1 architecture. It’s also not as safe (its PFH/PFD is not as low) as 1oo2, and this is the price for the availability.

Figure 3 - Extract from IEC 61508-6:2010 table B.10

If you look closely at this table, you might think you have spotted a problem. For the Beta = 2% column, the PFH gets worse as you increase the diagnostic coverage. This sounds crazy. The explanation is that, for low values of β, the equation for PFHG is dominated by the voting term, with K = 0.98, which is multiplied by λDD, and λDD rises as DC is increased. For higher values of β the last term, for the CCF, dominates; it is multiplied by λDU, which falls as DC is increased.
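To see the two competing terms in numbers, here is a rough sketch that keeps only the voting term and the CCF term. Treat it as an approximation resting on assumptions: the form 2(1-K)λDD + βλDU is my reading of the two terms discussed above, the independent double-failure term is ignored, and λD = 0.5e-7/h is assumed because it reproduces the 5e-9/h and 2.3e-9/h figures quoted earlier. Check it against the standard before relying on it.

```python
def pfh_1oo2d_dominant(lambda_d: float, dc: float, beta: float, k: float = 0.98) -> float:
    """Approximate 1oo2D PFH using only the voting and common cause terms (assumed form)."""
    lambda_dd = lambda_d * dc          # dangerous detected failure rate
    lambda_du = lambda_d * (1.0 - dc)  # dangerous undetected failure rate
    voting_term = 2.0 * (1.0 - k) * lambda_dd  # grows as DC increases
    ccf_term = beta * lambda_du                # shrinks as DC increases
    return voting_term + ccf_term


LAMBDA_D = 0.5e-7  # per hour; assumed value, not taken from the text

if __name__ == "__main__":
    for beta in (0.02, 0.10):
        for dc in (0.60, 0.90, 0.99):
            pfh = pfh_1oo2d_dominant(LAMBDA_D, dc, beta)
            print(f"beta={beta:.0%} DC={dc:.0%} PFH~{pfh:.2e}/h")
```

With these assumptions the β = 2% rows rise as DC goes up, while the β = 10% rows fall, which is the pattern described above.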

The diagnostics for 1oo2D will be a combination of diagnostics by comparison and per-channel diagnostics. This was not shown well in my earlier diagram. Diagnostics by comparison can often achieve almost 100% diagnostic coverage for random hardware failures, whereas individual channel diagnostics might only achieve 90%. That means that in 10% of the cases you won’t be able to tell which channel has failed, and the only alternative is to trip the system. But in 90% of cases, you will be able to move to 1oo1D.
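One way to picture how the two kinds of diagnostics combine is the decision sketch below. It is illustrative only: the `Action` enum and the `decide` function are hypothetical names of mine, and a real implementation would also have to handle timing, latching and repair.

```python
from enum import Enum


class Action(Enum):
    STAY_2OO2 = "stay in 2oo2"
    DEGRADE_TO_1OO1D = "follow the healthy channel (1oo1D)"
    TRIP = "trip the system"


def decide(comparison_mismatch: bool, ch1_self_fault: bool, ch2_self_fault: bool) -> Action:
    """Combine diagnostics by comparison with per-channel diagnostics (hypothetical logic)."""
    if ch1_self_fault and ch2_self_fault:
        return Action.TRIP              # no healthy channel left to follow
    if ch1_self_fault or ch2_self_fault:
        return Action.DEGRADE_TO_1OO1D  # failure allocated to exactly one channel
    if comparison_mismatch:
        return Action.TRIP              # channels disagree, but we cannot tell which one failed
    return Action.STAY_2OO2


if __name__ == "__main__":
    print(decide(comparison_mismatch=True, ch1_self_fault=True, ch2_self_fault=False).value)
    print(decide(comparison_mismatch=True, ch1_self_fault=False, ch2_self_fault=False).value)
```

The third branch is the 10% case from the paragraph above: the comparison catches the discrepancy, but the per-channel diagnostics cannot allocate it.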

Another thing to consider is the diagnostic test interval. Diagnostics by comparison operate at a very high rate, but if, when they discover a discrepancy, you don’t trip until you have completed the per-channel diagnostics, then the diagnostic test interval is the sum of the two times. This means that the process safety time you can claim for 1oo2D might not be as good as you can claim for 1oo2. The diagnostics by comparison don’t trip the system by themselves; tripping the system is left to the per-channel diagnostics. But you need to account for those 10% of failures that the diagnostics by comparison catch but the per-channel diagnostics don’t.
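The timing point can be made concrete with a little arithmetic. The numbers below are purely hypothetical, and adding a separate reaction time before comparing against the process safety time is my own simplification rather than something taken from the standard.

```python
def worst_case_detection_ms(comparison_interval_ms: float, per_channel_interval_ms: float) -> float:
    """If the trip waits for per-channel diagnostics, the two intervals add up."""
    return comparison_interval_ms + per_channel_interval_ms


def fits_process_safety_time(detection_ms: float, reaction_ms: float,
                             process_safety_time_ms: float) -> bool:
    """Detection plus reaction must complete within the process safety time (assumed criterion)."""
    return detection_ms + reaction_ms <= process_safety_time_ms


if __name__ == "__main__":
    # Hypothetical figures: fast comparison, slower per-channel self tests.
    detection = worst_case_detection_ms(comparison_interval_ms=10, per_channel_interval_ms=500)
    print(detection)
    print(fits_process_safety_time(detection, reaction_ms=50, process_safety_time_ms=1000))
```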

Of course, 1oo2D is a means to address random hardware failures. You still need to consider systematic failures, and since 1oo2D is generally implemented using identical redundancy, each of the redundant items would have to match the targeted SIL, i.e., SC 3 if targeting a safety function with a SIL 3 requirement.

Before moving on to look at the equations for 1oo2D, this might be a good place to stop. ADI’s social media team tells me that once a blog gets past around 800 words it's already too long. I’m already over 1000, and if I don’t split the blog into two parts it will be over 2000 words. Please come back next month for part 2. Who knows, it might even need a part 3. If you are reading this after January 2024, both parts should already be available at the link for all blogs in this series.