Functional safety is all about risk reduction. But how do you decide which SIL, ASIL, or PL gives the level of risk reduction you require? The process is referred to as SIL determination, and this blog gives a brief introduction to the topic. These processes are generally followed by the end user, which for ADI is generally our customer’s customer. ADI and our customers (who often make the equipment used to build new factories, such as PLCs, laser scanners, light curtains, motor controllers, and encoders) can only make assumptions of use and design accordingly. The end users then pick a suitable component from our customer's catalog that meets their safety needs. Our customers in turn choose a suitable part from ADI to meet their needs.
For the ADI industrial group the requirements are generally either SIL 2/PL d or SIL 3. If anybody asks for a SIL 4 IC or module they are probably wrong. If somebody wants SIL 1/PL c they probably just want evidence of a good, documented development process rather than safety as such.
Figure 1 - A long-established machinery safety standard that emphasizes a risk-based approach
One of the first things you must do when you start designing a safety system is a hazard analysis and risk assessment. Such an analysis looks at the ways in which your EUC (equipment under control) can cause harm to someone. Ideally, you would change the design to eliminate the risk, but sometimes this isn’t possible, so instead you identify a safety function to reduce the risk to an acceptable level. An example of a SIL 2/PL d safety function might be to stop a robot if someone approaches. But how do you identify the SIL (safety integrity level according to IEC 61508), ASIL (automotive safety integrity level) or PL (performance level according to ISO 13849) of that safety function? That is the job of the risk assessment. The hazard analysis tells you what safety functions you need, and the risk assessment tells you the “quality” of those safety functions, i.e. the level of risk reduction each must provide.
I have tried to capture the process of SIL determination in the graphic below.
Figure 2 - SIL determination
Alternatively, the cartoon from ISO 17305 puts it differently.
Figure 3 - SIL determination from ISO 17305
There are many ways to determine the required “quality”, “reliability” or “dependability” of that safety function. Typically, a basic safety standard (a standard not designed for any sector but applicable to many different ones) will allow the use of many methods to determine the required level of safety, but sector-specific standards will usually nominate a preferred method.
ISO 13849 (the machinery safety standard) advocates the use of a risk graph. It’s called a risk graph because you start at one side and follow a path through the graph.
Figure 4 - risk graph from ISO 13849
Starting on the left, you decide the severity of the hazard: is it more or less severe? If it is more severe you follow branch S2, and if less severe you follow branch S1. As a general principle, less severe accidents are tolerated at a higher frequency than more severe ones. Next, you must decide on the frequency or duration of exposure. If there is a higher frequency/duration of exposure you follow path F2, otherwise path F1. Finally, you consider the probability that someone can still avoid the hazard: if it is likely they can avoid it you follow path P1, otherwise path P2. By following the graph to its end, you arrive at the required performance level, PLr.
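Because each decision is a simple two-way choice, the risk graph can be written as a lookup table. The sketch below is my own illustration in Python, not something taken from the standard: the names `RISK_GRAPH` and `required_performance_level` are hypothetical, and the PLr values are my transcription of the risk graph in ISO 13849-1, so check them against your own copy of the standard before relying on them.

```python
# The ISO 13849-1 risk graph as a lookup table (illustrative transcription).
# Key: (severity, frequency/duration of exposure, possibility of avoidance).
RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}


def required_performance_level(severity, frequency, avoidance):
    """Follow one path through the risk graph and return the PLr letter."""
    key = (severity, frequency, avoidance)
    if key not in RISK_GRAPH:
        raise ValueError(f"not a valid path through the risk graph: {key}")
    return RISK_GRAPH[key]


# Severe harm, infrequent exposure, avoidance scarcely possible.
print(required_performance_level("S2", "F1", "P2"))  # prints: d
```

Note how every step towards the more onerous branch (S2, F2 or P2) raises the required performance level by one or two letters.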
The next method we will look at is a risk matrix. A standard that advocates the use of a risk matrix is ISO 26262 (the automotive functional safety standard). Here the severity of the accident is ranked from S1 to S3 (low to high severity), the exposure to the dangerous situation from E1 to E4 (low to high), and the controllability of the car if the bad thing happens from C1 to C3 (C3 meaning not generally controllable). Using the matrix, you can determine the necessary ASIL in the range ASIL A (lowest) to ASIL D (highest). A determination of QM (quality management) means that the special measures from ISO 26262 are not required and normal quality management practices during the design of the safety function should be sufficient.
Figure 5 - Risk matrix from ISO 26262
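A handy way to remember the matrix is that the ASIL follows the sum of the three class numbers. This is a commonly quoted shortcut rather than something the standard itself states, so treat the Python sketch below as an aid to memory and verify any result against the matrix. The function name `required_asil` is my own.

```python
def required_asil(severity, exposure, controllability):
    """Return 'QM', 'A', 'B', 'C' or 'D' for classes S1-S3, E1-E4, C1-C3.

    Uses the sum shortcut (S + E + C), assumed here to reproduce the
    ISO 26262 risk matrix: 10 -> D, 9 -> C, 8 -> B, 7 -> A, lower -> QM.
    """
    if not 1 <= severity <= 3:
        raise ValueError("severity must be 1..3")
    if not 1 <= exposure <= 4:
        raise ValueError("exposure must be 1..4")
    if not 1 <= controllability <= 3:
        raise ValueError("controllability must be 1..3")
    total = severity + exposure + controllability
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(total, "QM")


print(required_asil(3, 4, 3))  # prints: D
print(required_asil(1, 4, 1))  # prints: QM
```

Only the worst case on all three scales (S3, E4, C3) gives ASIL D, and each step down on any one scale lowers the result by one level.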
Often sector-specific standards will define what they mean by severity and the other parameters. Below is one such example from IEC 62061.
Figure 6 - an example of a severity scale
Before we go further, note that both risk graphs and risk matrices give you just a required SIL, PL or ASIL. Each of those levels comes with a band of allowed dangerous failure rates, so your PFH (average frequency of dangerous failure per hour, often loosely called the probability of dangerous failure per hour) can lie anywhere in that band. In practice the target is often taken as the maximum value for the band, e.g. 1e-6/h for SIL 2/PL d, where the band runs from a high of 1e-6/h to a low of 1e-7/h.
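To make the idea of a band concrete, here is a small Python sketch that maps a PFH value to its SIL for a high-demand or continuous-mode safety function. The names `PFH_BANDS` and `sil_for_pfh` are my own, and the band limits are my reading of the IEC 61508 table, so please confirm them against the standard.

```python
# PFH bands for high-demand/continuous-mode safety functions, per hour,
# as (lower limit inclusive, upper limit exclusive).
# Assumed values: transcribed from the IEC 61508 target failure measures.
PFH_BANDS = {
    1: (1e-6, 1e-5),
    2: (1e-7, 1e-6),
    3: (1e-8, 1e-7),
    4: (1e-9, 1e-8),
}


def sil_for_pfh(pfh_per_hour):
    """Return the SIL whose band contains the PFH, or None if outside all bands."""
    for sil, (lower, upper) in PFH_BANDS.items():
        if lower <= pfh_per_hour < upper:
            return sil
    return None


print(sil_for_pfh(5e-7))  # prints: 2
print(PFH_BANDS[2][1])    # upper limit of the SIL 2 band, often used as the target
```

Any PFH from 1e-7/h up to just under 1e-6/h lands in the same SIL 2 band, which is why a risk graph or matrix on its own cannot tell you where in the band you need to be.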
IEC 61508-5:2010 clause B.4 gives a warning about risk graphs, which also applies to risk matrices.
Figure 7 - Warning on risk graphs from IEC 61508-5:2010 clause B.4
My interpretation of the above is that risk graphs and risk matrices can be overly conservative, which probably explains why it is OK to use the highest PFH allowed for that SIL or PL band.
The next method we will look at is the quantitative method from IEC 61508. In this method, we will determine an actual required probability of failure rather than a band. First, you need to determine the maximum tolerable individual risk per annum. This is roughly aligned with your risk of dying in the course of a “normal” life. For an employee, the tolerable risk might be 1e-4/year. A lower figure is used for the public, as they are not deemed to have accepted the risk in the same way that a worker might be. Let’s use a member of the public and say the maximum tolerable risk is 1e-5/year. Let's suppose the hazard is an explosion and our expert judgment (see an interesting reference on expert judgment below) is that 1 in 100 explosions will kill someone. Therefore, the allowed explosion rate is 1e-5/1e-2 = 1e-3/year. Next, in our expert judgment, we decide that without a safety system such explosions would occur at a rate of 1 every 5 years (0.2/year). Therefore, we need a safety function with a probability of failure on demand of 1e-3/0.2 = 5e-3 (1 in 200).
From IEC 61508 we can use the table below and see that 1/200 is in the SIL 2 range, which for a low-demand safety function runs from 1e-3 to 1e-2.
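The arithmetic above is short enough to capture in a few lines of Python. This is only a sketch of the worked example: the function name `required_pfd` and its parameter names are my own, and the numbers are the illustrative ones used in this blog, not recommendations.

```python
def required_pfd(tolerable_risk_per_year, fatality_probability, unprotected_rate_per_year):
    """Required probability of failure on demand for the safety function.

    tolerable_risk_per_year:   maximum tolerable individual risk of death per year
    fatality_probability:      chance that one hazardous event kills someone
    unprotected_rate_per_year: hazardous event rate with no safety function
    """
    allowed_event_rate = tolerable_risk_per_year / fatality_probability
    return allowed_event_rate / unprotected_rate_per_year


# Member of the public, 1 in 100 explosions fatal, 1 explosion every 5 years.
pfd = required_pfd(1e-5, 1e-2, 0.2)
print(f"{pfd:.1e}")  # prints: 5.0e-03
```

Changing any one input shows how sensitive the result is to expert judgment. For example, using the employee figure of 1e-4/year instead relaxes the required probability of failure on demand by a factor of ten.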
Figure 8 - maximum allowed failure rates for safety functions according to IEC 61508
The difference from the risk graph methods, however, is that the target is now to achieve an RRF (risk reduction factor) of 200 (1/5e-3). We don’t use the upper end of the band as we did with the previous methods. The band is still important, though, because it determines which measures are required to protect against systematic failures. If the RRF is anywhere in the SIL 2 range, our design measures must meet the SC 2 (systematic capability 2) requirements from IEC 61508 parts 1, 2 and 3.
I promised an interesting reference on the topic of expert judgment. I am currently reading “Eliciting and analyzing Expert judgment”, a report from the Los Alamos National Laboratory that runs to over 400 pages. I refer to it because all of the above procedures require some level of expert judgment.
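As a rough sketch of the last two steps, the Python below converts a required probability of failure on demand into both the SIL band and the risk reduction factor. The names `PFD_BANDS`, `sil_for_pfd` and `risk_reduction_factor` are my own, and the low-demand band limits are my reading of the IEC 61508 table, so check them against the standard.

```python
# Average PFD bands for low-demand safety functions,
# as (lower limit inclusive, upper limit exclusive).
# Assumed values: transcribed from the IEC 61508 target failure measures.
PFD_BANDS = {
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}


def sil_for_pfd(pfd_avg):
    """Return the SIL whose low-demand band contains pfd_avg, or None."""
    for sil, (lower, upper) in PFD_BANDS.items():
        if lower <= pfd_avg < upper:
            return sil
    return None


def risk_reduction_factor(pfd_avg):
    """The risk reduction factor is the reciprocal of the PFD."""
    return 1.0 / pfd_avg


pfd = 5e-3
print(sil_for_pfd(pfd))                     # prints: 2
print(f"{risk_reduction_factor(pfd):.0f}")  # prints: 200
```

The SIL tells you which systematic capability measures apply, while the RRF of 200 remains the actual numerical target for the design.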
Figure 9 - An excerpt from the report "Eliciting and analyzing expert judgment"
For a previous blog on the differences between low and high-demand safety functions, see here.
For the full set of blogs in this series see here.