It is very hard to envisage a functional safety system without a communications network of some sort. You would therefore expect the available guidance from the standards to be clear and unambiguous. Some would say that it is clear, but when you debate it with people it turns out to be open to interpretation.
Part of the problem might be that there is a large variety of networks with vastly different properties.
Types of networks include
- A connection that might include the cloud
- A connection that runs along a railway track over 10s or 100s of kilometers
- A wireless network in an oil refinery
- A full field bus running around a factory floor
- A point-to-point network on a factory floor
- A connection between two PCBs within an element or sub-system
- A connection between ICs on a single PCB
- Internal communication networks within an IC
I have previously blogged on the topic of black channel networks, see here. The black channel network paradigm is well described and controlled via standards such as IEC 61784-3 (fieldbus) and IEC 62280/EN 50159 (rail). However, even then, is it appropriate to apply a fieldbus standard to the networks on the bottom half of the list above? IEC 61508 does identify another type of network, referred to as a white channel network, but there isn’t much written on white channel requirements. Most other standards defer to IEC 61508-2 7.4.11 for their networking requirements.
Firstly, where do the names white and black channels come from?
It is called a black channel because you can’t see into it. It's all dark in there, so you have no idea of the reliability of the routers, what type of switches are used, their EMI robustness, etc. A black channel could even include a link over the internet.
The white channel is exactly the opposite. It is a network designed in compliance with IEC 61508 and therefore you have full visibility and know the reliability of all the components, their EMC robustness, their failure modes, etc. Surely it would be better called a “clear” channel.
A grey channel is not mentioned in the standard but is something between a black and white channel. It's mostly black but you do know some of its properties and can perhaps fault exclude some failure modes.
Issues with the black channel approach include
- Need to use a penal BER (bit error rate)/BEP (bit error probability) of 0.01 to allow for poor EMI and the unreliability of the unknown components (remember it's black in there so you cannot see them)
- You need to calculate residual error rates for timeliness, corruption, masquerade, etc., and add them to your PFH (probability of dangerous failure per hour). This can be quite time-consuming to do and to agree with an assessor. You might also be surprised at how large the failure rate due to the residual error rates can be.
- Unless you can justify it somehow you need to implement a lot of defenses against threats that might not be real.
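To give a feel for the residual error rate bullet above, here is a deliberately crude Python sketch. It is not the calculation method defined in IEC 61784-3. The 128-bit message, 32-bit CRC, 1000 messages per second, the 1% share of the PFH budget, and the rule of thumb that a corrupted message slips past the CRC with probability 2^-r are all assumptions chosen for illustration, as are the function names.

```python
# Crude sketch of a black channel residual error rate estimate.
# Every number and name here is an illustrative assumption, and this is
# NOT the calculation method defined in IEC 61784-3.

BEP = 0.01               # penal bit error probability quoted in the text
SIL3_PFH_LIMIT = 1e-7    # SIL 3 upper bound for PFH, per hour
NETWORK_SHARE = 0.01     # assumed: network may use 1% of the PFH budget


def p_message_corrupted(message_bits: int, bep: float = BEP) -> float:
    """Probability that at least one bit of a message is flipped."""
    return 1.0 - (1.0 - bep) ** message_bits


def residual_error_rate(message_bits: int, crc_bits: int,
                        messages_per_hour: float, bep: float = BEP) -> float:
    """Estimated undetected corrupted messages per hour.

    Assumes a corrupted message slips past the CRC with probability
    2**-crc_bits (a rule of thumb, not a property of any given polynomial).
    """
    p_undetected = p_message_corrupted(message_bits, bep) * 2.0 ** -crc_bits
    return p_undetected * messages_per_hour


budget = SIL3_PFH_LIMIT * NETWORK_SHARE
rate = residual_error_rate(message_bits=128, crc_bits=32,
                           messages_per_hour=1000 * 3600)
print(f"budget {budget:.1e}/h, estimated residual error rate {rate:.1e}/h")
```

With these made-up figures, a CRC on its own leaves the estimate far above the assumed budget, which is the kind of surprise mentioned above.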
You can choose a standard network protocol such as PROFIsafe and then many of your problems go away as someone has already done all the calculations for you and identified the defenses required to detect the errors using the worst-case assumptions.
That’s the black channel covered.
So, what do the standards say about a white channel? Let's consult IEC 61508-2:2010 sub-clause 7.4.11.
Figure 1 - Extract from IEC 61508-2:2010 sub-clause 7.4.11
This is accompanied by the diagram shown below. Both are somewhat confusing in that they describe the white channel as being completely designed to IEC 61508 but then mention that it should also comply with one of the two black channel standards. Many white channel networks would be out of the scope of those two standards, e.g. a connection between two PCBs or any point-to-point connection such as 4/20mA. The diagram, with its thick black vertical lines on either end of the network element, also seems to suggest the defenses are built into the endpoints of the network rather than into every component from which the network is built.
Figure 2 - drawing of a white channel network
Ignoring the references to the black channel network standards, which I think are wrong, my interpretation of the requirements for a white channel network is
- You need to design the network to a specific SC (systematic capability) with all the right design reviews, EMI testing, verification, validation, selection of suitable components, etc. Any software used needs to be developed to the same SC.
- You need to do reliability predictions for all the components used.
- You need to use an FMEDA or similar to calculate a λDU (dangerous undetected failure rate), adding diagnostics to get it sufficiently low (how low depends on the required SIL).
- Not all the ICs used in the network need to have been developed to IEC 61508. Most industrial safety systems are built using standard components.
- EMI should be considered as always present and not as an intermittent event.
- Your EMI testing should be done with margin (see IEC 61508-2:2010 table A.16) so that you do not get EMI failures in the real application at the expected EMI levels. If a component in your design fails, such as a filter capacitor, you may then get failures due to EMI, but that is already included in your FMEDA and λDU.
- There is no need for a penal BER/BEP. The failure rate of all the components in the network is known from the FMEDA or other reliability prediction done as part of a design for IEC 61508 and failures due to EMI should be eliminated as EMI constitutes a systematic failure mode. The only way a packet should get lost or delayed is through a failure of a component. The only way a packet should get corrupted is through poor EMI performance or the failure of a component designed to give good EMI.
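The FMEDA bullet above can be sketched in a few lines of Python. This is a minimal illustration rather than a real FMEDA: the component list, FIT values, dangerous fractions, and diagnostic coverage figures are all hypothetical, and a real analysis works per failure mode, not per component.

```python
from dataclasses import dataclass

FIT = 1e-9  # 1 FIT = one failure per 10**9 hours


@dataclass
class Component:
    name: str
    fit: float                  # total failure rate in FIT (hypothetical)
    dangerous_fraction: float   # share of failures that are dangerous
    diagnostic_coverage: float  # share of dangerous failures that are detected


def lambda_du(components) -> float:
    """Sum of dangerous undetected failure rates, in failures per hour."""
    return sum(c.fit * FIT * c.dangerous_fraction * (1.0 - c.diagnostic_coverage)
               for c in components)


# Hypothetical white channel: two transceivers, the wiring, and an EMI filter.
network = [
    Component("transceiver A", 20.0, 0.5, 0.99),
    Component("transceiver B", 20.0, 0.5, 0.99),
    Component("connector and cable", 5.0, 0.5, 0.99),
    Component("EMI filter capacitor", 2.0, 0.5, 0.90),
]

print(f"network lambda_DU = {lambda_du(network):.2e} per hour")
```

The point of the sketch is that the network's contribution to the PFH comes from component failure rates you know and can defend. A penal BER appears nowhere, and the EMI filter capacitor shows up as just another component failure.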
Perhaps, the most controversial of the above is my assertion that you should not get failures due to EMI in a network designed for IEC 61508. Let’s look at some evidence.
Firstly, let’s assert that EMI is a systematic failure mode rather than a random hardware failure mode and then consider the below from part 6 of IEC 61508.
Figure 3 - Extract from IEC 61508-6:2010 A.1
So, if EMI is a systematic failure mode (evidence to follow) its effects should be reduced, through design reviews, careful design, and testing, so that the systematic failure rate due to EMI and other such failure modes is commensurate with that for random hardware failures at the target PFH for the target SIL.
I looked up a definition of commensurate and found “corresponding in size or degree, in proportion”. OK, so the measures used must get the failures down to the same level as the random hardware failures. But is EMI a systematic failure mode? Let’s look for the evidence that it is.
Below is a collage showing evidence that indicates that EMI is a systematic failure mode.
Figure 4 - Various bits of guidance from IEC 61508-2:2010 related to EMI (electromagnetic interference)
Some of the other standards state it even more clearly. Let’s start with IEC 61000-1-2 (Electromagnetic compatibility (EMC) - Part 1-2: General - Methodology for the achievement of functional safety of electrical and electronic systems including equipment with regard to electromagnetic phenomena).
Figure 5 - An extract from IEC 61000-1-2
And it is hard to be any clearer than IEC 61326-3-1 (Electrical equipment for measurement, control, and laboratory use - EMC requirements – Part 3-1: Immunity requirements for safety-related systems and for equipment intended to perform safety-related functions (functional safety) – General industrial applications).
Figure 6 - IEC 61326-3-1
The statement “it is not necessary to take into account the effect of electromagnetic phenomena in the quantification of hardware safety integrity” is surely the clincher for the argument.
Even when working on industrial stuff, I always like to look at automotive. ISO 26262 is the latest of the sector-specific functional safety standards and was developed by a large and diverse team, with a lot of detail in its 11 parts. Plus, almost everything in automotive is safety related, so those guys are pretty focused. Automotive imposes stringent EMI requirements with detailed testing. ISO 26262 makes no reference to the black channel standards IEC 61784-3 or IEC 62280, and there is no requirement to calculate residual error rates. You are, however, expected to add defenses against corruption, delay, masquerade, etc., but their impact is assessed qualitatively rather than quantified. This is in line with my understanding of how a white channel network should be implemented and consistent with the requirements in IEC 61508 for the control of systematic failures from IEC 61508-2:2010 7.4.7 and tables A.15 to A.17.
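As an illustration of what defenses against corruption, delay, and masquerade can look like in practice, here is a minimal Python sketch. The frame layout, the `pack_frame` and `Receiver` names, and the choice of CRC-32 are all my assumptions for this example. It is not taken from ISO 26262, PROFIsafe, or any other protocol.

```python
import struct
import zlib

HEADER = struct.Struct(">HI")  # hypothetical layout: sender id, sequence number
CRC = struct.Struct(">I")      # CRC-32 over header and payload


def pack_frame(sender_id: int, seq: int, payload: bytes) -> bytes:
    """Build a frame: header, payload, then a CRC-32 of both."""
    body = HEADER.pack(sender_id, seq) + payload
    return body + CRC.pack(zlib.crc32(body))


class Receiver:
    """Checks each frame for corruption, masquerade, sequence, and delay."""

    def __init__(self, expected_sender: int, timeout_s: float):
        self.expected_sender = expected_sender
        self.timeout_s = timeout_s
        self.last_seq = None
        self.last_rx_time = None

    def accept(self, frame: bytes, now: float) -> bytes:
        """Return the payload, or raise ValueError naming the failed check."""
        if len(frame) < HEADER.size + CRC.size:
            raise ValueError("corruption: frame too short")
        body = frame[:-CRC.size]
        (crc,) = CRC.unpack(frame[-CRC.size:])
        if zlib.crc32(body) != crc:
            raise ValueError("corruption: CRC mismatch")
        sender, seq = HEADER.unpack(body[:HEADER.size])
        if sender != self.expected_sender:
            raise ValueError("masquerade: unexpected sender id")
        if self.last_seq is not None and seq != (self.last_seq + 1) % 2**32:
            raise ValueError("sequence: lost, repeated, or reordered frame")
        if self.last_rx_time is not None and now - self.last_rx_time > self.timeout_s:
            raise ValueError("delay: frame arrived after the timeout")
        self.last_seq, self.last_rx_time = seq, now
        return body[HEADER.size:]


rx = Receiver(expected_sender=7, timeout_s=0.1)
print(rx.accept(pack_frame(7, 0, b"valve closed"), now=0.0))
```

A real receiver would also need a watchdog that trips when no frame arrives at all, since this sketch only notices a delay when the next frame turns up. Either way, the checks are there as qualitative defenses and no residual error rate is calculated for them.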
It’s hard to find examples of white channel networks in industrial applications. Networks I would classify as white channel include
- A network on a PCB where the circuitry on the PCB is designed to IEC 61508
- A 4/20mA network, see my previous blog.
- Potentially the network within a robot arm (more properly called a manipulator)
- A wired BMS (battery monitoring system) network, see for instance the ADBMS6815 datasheet
It is hoped that revision three of IEC 61508 will add clarity on the white channel and possibly the grey channel. However, it might be 2027 or later before it is published.
In summary, if everyone just agreed to agree with me then the problem would be solved.
Reasons I am interested in this topic include the fact that ADI makes some nice networking chips and solutions such as
GMSL2 – see here (Well worth reading if you are an industrial guy and have never heard of it)
4/20mA components – see here
10Base-T1L – see here
Ethernet APL – see here
IO-Link components – see here
Wireless BMS – see here
Well done to all who got to the end of this long blog. All comments are welcome whether in agreement or disagreement.
For the full series of Safety Matters blogs see here.