An Introduction to Machine Safety Standard ISO 13849

In the next couple of months, I will introduce the concept of functional safety and robots, including cobots (collaborative robots) and mobots (mobile robots). However, I haven’t covered ISO 13849 in previous Safety Matters blogs, and most robot safety systems use PL (performance level) from ISO 13849 instead of SIL (safety integrity level) from IEC 61508 or IEC 62061 to express the level of safety needed and achieved.

Therefore, in this blog I am going to give an introduction to ISO 13849. This should be of use to industrial functional safety people, and even to automotive functional safety people who want to work with things like autonomous farm machinery or mobile robots.

ISO 13849 is based on an older standard called EN 954, which dates back to the 1990s. The IEC 61508 FAQ has a good, concise history of ISO 13849-1.

Figure 1 - A snapshot from the IEC 61508 FAQ describing ISO 13849

For me, EN 954 was an application-level standard for building a system from components such as laser scanners, safety relays and sensors. A typical machine safety function is to check whether a door or guard is open or closed and to stop the machine if it is open. For safety, EN 954 relied mostly on the architecture of the system, and for the higher safety levels it required two-channel architectures.

ISO 13849, as the successor to EN 954, expresses the level of safety achieved in terms of PL (performance level). The achieved PL depends on a combination of reliability (MTTFd), diagnostic coverage (DC) and category (architecture). I will discuss each of these later. I still think some people like to rely on category (architecture), forgetting that different combinations of MTTFd, DC and CAT can be used to achieve the required PL. It is still unfortunately common to see a requirement given as PL d CAT 3, which limits the design options.

ISO 13849 comes as two parts, with part 2 containing information on verification and validation.

Correspondence between PL and SIL

The level of safety achieved is given by a SIL in IEC 61508 and by a PL in ISO 13849. PL d and SIL 2 have the same range of probability of dangerous failure per hour. PL e and SIL 3 also match. PL b and PL c together span SIL 1, while PL a is below the scale covered by IEC 61508. ISO 13849 doesn’t cover SIL 4 due to the limited number of people normally at risk from a machine.

Figure 2 - Correspondence between PL and SIL
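
The correspondence can also be written down as a small lookup. The sketch below is only illustrative: the function names are my own, and the band limits are the PFHd ranges as I understand them from ISO 13849-1 and the high demand table of IEC 61508, so they should be checked against the current editions before being relied on.

```python
# Bands of average probability of dangerous failure per hour (PFHd).
# Each entry is (lower bound inclusive, upper bound exclusive, label).
# Assumption: limits as commonly quoted from ISO 13849-1 and IEC 61508.
PL_BANDS = [
    (1e-5, 1e-4, "a"),
    (3e-6, 1e-5, "b"),
    (1e-6, 3e-6, "c"),
    (1e-7, 1e-6, "d"),
    (1e-8, 1e-7, "e"),
]

# SIL 4 is deliberately left out because ISO 13849 does not cover it.
SIL_BANDS = [
    (1e-6, 1e-5, 1),
    (1e-7, 1e-6, 2),
    (1e-8, 1e-7, 3),
]


def _lookup(pfhd, bands):
    """Return the label of the band containing pfhd, or None if no band matches."""
    for low, high, label in bands:
        if low <= pfhd < high:
            return label
    return None


def pl_from_pfhd(pfhd):
    """Performance level (a to e) for a given PFHd, or None if outside the scale."""
    return _lookup(pfhd, PL_BANDS)


def sil_from_pfhd(pfhd):
    """Corresponding SIL (1 to 3) for a given PFHd, or None if there is no match."""
    return _lookup(pfhd, SIL_BANDS)


if __name__ == "__main__":
    for pfhd in (5e-5, 5e-6, 2e-6, 5e-7, 5e-8):
        print(f"PFHd {pfhd:.0e}/h -> PL {pl_from_pfhd(pfhd)}, SIL {sil_from_pfhd(pfhd)}")
```

Running this shows PL a with no SIL, PL b and PL c both landing in SIL 1, PL d in SIL 2 and PL e in SIL 3.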

The chart below shows how PL d can be achieved using a CAT 2 architecture with a DC of low (60%) and MTTFd high. It can also be achieved with CAT 3 and a DC of low (60%) or medium (90%) with an MTTFd of medium or high.

Figure 3 - Chart showing how to combine MTTFd, DC and CAT to achieve a required PL

As mentioned previously, ISO 13849 allows you to trade off reliability vs DC vs category. However, this is ignored in some standards which reference ISO 13849, such as ISO 10218 (robot safety) and IEC 61496 (human presence detection), which explicitly require CAT 3 or CAT 4. Following ISO 13849, the measure of safety achieved is given by a PL, and this can be stated without a CAT, leaving the choice of architecture to achieve that PL up to the system designer. Some of the confusion relates to the fact that ISO 13849 includes mechanical, pneumatic and hydraulic components within its scope, and for many of these the low achievable diagnostic coverage and low reliability mean that a redundant architecture (CAT 3 or CAT 4) is typically required to reach PL d and above. Applying this same logic to electronic circuits, with their high reliability and their ability to run extensive diagnostics in a short time, is wrong.

Risk assessment per ISO 13849

ISO 13849 is a simplified standard compared to IEC 61508. This simplification was intended to allow it to be used easily down on the factory floor, but over the years ISO 13849 has increased in complexity and I wonder how easy it still is for anyone other than an expert to apply. One area that is still relatively simple is the risk graph suggested for the risk assessment to determine the required PL.

Figure 4 - Risk assessment per ISO 13849

Using this risk graph, you must first decide whether the possible injuries are severity S1 or S2. Then you decide how often someone is exposed, given by F1 or F2, and finally the possibility of avoiding the hazard, given by P1 or P2 (in effect, whether an operator is nimble enough to get out of the way). Following the path leads to a required PL in the range a through e. As an example, S2+F2+P1 leads to PL d.
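
Because the risk graph has only eight paths, it can be captured as a simple table. The sketch below is my own illustration, not something taken from the standard: the name `required_pl` is hypothetical, and the table is the risk graph as I read it from ISO 13849-1, so check it against the figure before using it.

```python
# Risk graph as a lookup table: (severity, frequency, avoidance) -> required PL.
# Assumption: transcribed from the ISO 13849-1 risk graph as I understand it.
RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}


def required_pl(severity, frequency, avoidance):
    """Return the required performance level for one path through the risk graph."""
    key = (severity.upper(), frequency.upper(), avoidance.upper())
    if key not in RISK_GRAPH:
        raise ValueError(f"Not a valid risk graph path: {key}")
    return RISK_GRAPH[key]


if __name__ == "__main__":
    # The example from the text: S2 + F2 + P1.
    print(required_pl("S2", "F2", "P1"))
```

For the example in the text, `required_pl("S2", "F2", "P1")` returns `"d"`.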

DC

In contrast to IEC 61508, IEC 62061 and ISO 26262, ISO 13849 only considers dangerous failures. Therefore, DC is actually the fraction of dangerous failures detected. There is no credit available for failures taking you to the safe state. This is therefore a more difficult metric than either the SFF (IEC 61508 and IEC 62061) or the single point fault metric (ISO 26262), with a DC of 90% corresponding to an SFF of 95% (assuming 50% of the failures are safe and 50% dangerous). Otherwise, the scale for DC matches that used in other standards such as IEC 61508.

Figure 5 - Diagnostic coverage ranges per ISO 13849
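
The comparison between DC and SFF is easy to check numerically. The sketch below assumes the usual definition SFF = (λS + λDD) / (λS + λD), with λDD = DC × λD; the function names are hypothetical, and the DC band limits are the ones I understand ISO 13849-1 to use.

```python
def sff_from_dc(dc, safe_fraction=0.5):
    """Safe failure fraction for a given DC and fraction of safe failures.

    With s = safe_fraction, SFF = s + (1 - s) * dc, which follows from
    SFF = (lambda_S + DC * lambda_D) / (lambda_S + lambda_D).
    """
    if not (0.0 <= dc <= 1.0 and 0.0 <= safe_fraction <= 1.0):
        raise ValueError("dc and safe_fraction must be between 0 and 1")
    return safe_fraction + (1.0 - safe_fraction) * dc


def dc_band(dc):
    """Classify a DC value (0 to 1) as none, low, medium or high."""
    if not 0.0 <= dc <= 1.0:
        raise ValueError("dc must be between 0 and 1")
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


if __name__ == "__main__":
    dc = 0.90
    print(f"DC {dc:.0%} is '{dc_band(dc)}', SFF = {sff_from_dc(dc):.0%}")
```

With a 50/50 split of safe and dangerous failures, a DC of 90% gives an SFF of 95%, and a DC of 60% gives an SFF of 80%.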

Category

ISO 13849 doesn’t mention HFT of 1 or 1oo2 etc, but rather uses categories to represent architectures.

Figure 6 - Definition of a category from ISO 13849-1:2015

A category 2 architecture is a single channel architecture with a separate test channel to implement the diagnostics. Interestingly according to ISO 13849 you can have CCF (common cause failures) between the functional and test channels whereas in IEC 61508 CCF is only of concern between two or more functional channels.

Figure 7 - Category 3 architecture from ISO 13849-1:2015

CAT 3 is a two-channel architecture as shown above, with the dashed lines between the two logic units representing diagnostics by comparison, including the sharing of data read back on the status of the output devices. However, sub-clause 6.2.1 of ISO 13849-1:2015 does state that “The designated architectures cannot be considered only as circuit diagrams but also as logical diagrams. For categories 3 and 4, this means that not all parts are necessarily physically redundant but that there are redundant means of assuring that a fault cannot lead to the loss of the safety function.” This means that, to a certain extent, the diagram can be ignored and you should concentrate on the text of the description. The key part of the text for me is shown below.

 

Figure 8 - Key text from the description of a category 3 system

This means that some parts of a circuit that are considered as single fault tolerant according to ISO 13849 are not single fault tolerant according to IEC 61508 or IEC 62061. ISO 13849 has the requirement for single fault tolerance taking the diagnostics into consideration, but IEC 61508-2 7.4.4.1.1 a) does not allow the consideration of diagnostics when calculating the HFT.

It is easy to see how a two-channel system meets the above single fault tolerance requirements. A single fault in either channel means that the other channel will still carry out the safety function. The fault in one channel only has to be detected where “reasonably practicable”, not as an absolute requirement, and therefore an accumulation of faults can lead to the loss of the safety function. Such an accumulation of faults can mean neither channel is able to respond when called upon. It is worth remembering that ISO 13849-1:2015 sub-clause 7.2 states “two or more separate faults having a common cause shall be considered as a single fault”. Therefore, ISO 13849-1:2015 Annex F is important to make sure sufficient measures have been taken to prevent such failures.

A single-channel system where all the dangerous failures are detected would also meet the single fault tolerance requirement of ISO 13849. With such a single-channel system, an accumulation of faults that will lead to the loss of the safety function occurs if the diagnostics fail, followed by the item they were designed to diagnose. The order of failure is important, since if the item being monitored fails first the diagnostics will detect the failure.

However, detecting such an accumulation of faults is a property of a category 4 architecture and not an absolute requirement for category 3 unless “reasonably practicable”. The fault detection interval for category 3 is not specified like it is for category 2, even though category 3 can also be implemented with a single channel (if diagnostics cover the non-redundant portions). There are two options given. Option 1 is to detect the failure before the next demand, which I would interpret as being similar to the CAT 2 requirement and so meaning a diagnostic test rate of 100x the demand rate. Option 2 is to run the diagnostics when a demand occurs. So, for instance, in a robot application this might mean that when someone enters a protected area the diagnostics are run as part of the presence algorithm, provided this still leaves time to achieve the safe state if a fault is detected.
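
The two options can be expressed as two simple checks. This is only a sketch of my interpretation above, not text from the standard: the function names, the 100x default and the timing parameters are all assumptions made for illustration.

```python
def meets_option_1(test_rate, demand_rate, ratio=100.0):
    """Option 1: diagnostics run often enough to find a fault before the next demand.

    Assumption: interpreted like the CAT 2 requirement, i.e. the test rate is
    at least `ratio` times the demand rate (both in the same unit, e.g. per hour).
    """
    if test_rate < 0 or demand_rate <= 0:
        raise ValueError("test_rate must be >= 0 and demand_rate must be > 0")
    return test_rate >= ratio * demand_rate


def meets_option_2(diagnostic_time_s, time_to_safe_state_s, available_time_s):
    """Option 2: diagnostics run on demand and still leave time to reach the safe state.

    Assumption: the diagnostics and the transition to the safe state happen one
    after the other and must both fit in the time available after the demand.
    """
    if min(diagnostic_time_s, time_to_safe_state_s, available_time_s) < 0:
        raise ValueError("times must not be negative")
    return diagnostic_time_s + time_to_safe_state_s <= available_time_s


if __name__ == "__main__":
    # Hypothetical numbers: 10 demands per hour, diagnostics run 1000 times per hour.
    print(meets_option_1(test_rate=1000, demand_rate=10))
    # Hypothetical numbers: 50 ms of diagnostics, 200 ms to stop, 500 ms available.
    print(meets_option_2(0.05, 0.2, 0.5))
```

In the robot example, option 2 amounts to checking that the time to run the diagnostics plus the time to stop the robot is less than the time it takes a person to get from the edge of the protected area to the hazard.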

Requirements to validate a category 3 system according to ISO 13849-2:2012 are shown below.  

Figure 9 - Validation requirements for category 3 from part 2 of ISO 13849

Category 4 is very similar to category 3 except there is now a requirement to protect against an accumulation of faults. If a fault cannot be detected, it must be analysed to see whether this failure, in combination with some other failure, could lead to the loss of the safety function. Again, a two-channel architecture should help here, and the standard states “in practice, the consideration of a fault combination of two faults may be sufficient”.

It is interesting to see that the standard implies that a high DC protects against the accumulation of faults. To me this is only partly true. Yes, if the first fault is detected that is good but I think the real subtlety is within the safe or no effect failures combining to lead to the loss of the safety function. However, ISO 13849 only defines dangerous failures with no definition of safe or no-effect failures. Either way if you wanted to implement a single channel CAT 4 system it means you would probably need diagnostics on your diagnostics to prevent the accumulation of faults.

MTTFd

Reliability in ISO 13849 is given by the MTTFd variable. This stands for mean time to dangerous failure. Assuming a constant failure rate MTTFd = 1/λD where λD is the dangerous failure rate.

Figure 10 - Measure MTTFd
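
As a quick worked example of MTTFd = 1/λD, the sketch below converts a dangerous failure rate given in FIT (failures per 10^9 hours) into an MTTFd in years and then classifies it. The function names are my own, the 8760 hours per year is a simplification, and the band limits (low from 3 years, medium from 10 years, high from 30 years) are as I understand them from ISO 13849-1.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours, ignoring leap years


def mttfd_years(lambda_d_fit):
    """MTTFd in years from a dangerous failure rate in FIT (failures per 1e9 hours).

    Assumes a constant failure rate, so MTTFd = 1 / lambda_D.
    """
    if lambda_d_fit <= 0:
        raise ValueError("failure rate must be positive")
    lambda_d_per_hour = lambda_d_fit * 1e-9
    return 1.0 / (lambda_d_per_hour * HOURS_PER_YEAR)


def mttfd_band(years):
    """Classify an MTTFd per channel as low, medium or high (None if below 3 years).

    Note: the standard also caps the MTTFd that may be claimed per channel,
    which this sketch does not model.
    """
    if years < 3:
        return None
    if years < 10:
        return "low"
    if years < 30:
        return "medium"
    return "high"


if __name__ == "__main__":
    years = mttfd_years(1000)  # hypothetical channel with 1000 FIT of dangerous failures
    print(f"MTTFd = {years:.1f} years ({mttfd_band(years)})")
```

A channel with 1000 FIT of dangerous failures therefore has an MTTFd of roughly 114 years, which falls in the high band.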

Systematic requirements

ISO 13849 doesn’t cover systematic failures in the detail found in IEC 61508 or ISO 26262. For instance, there are only a couple of pages of software requirements, and for PL e you are told to refer to IEC 61508. The lack of detail makes sense because for machinery there are in fact two standards. IEC 62061 is the machinery interpretation of IEC 61508 and uses the terminology of SIL and HFT, which will be familiar to anybody who has studied IEC 61508. The chart below, taken from IEC 62061, states that for more complex systems, which are the ones most prone to systematic failures, IEC 62061 or even IEC 61508 should be used. Machinery guys seem to avoid IEC 61508 like the plague, but the chart suggests to me that it should be used more often for robot, cobot and mobot safety.

Figure 11 - Guidance from IEC 62061 on which standard to use

Closing comments

  • For more information on ISO 13849, including categories, please see the excellent document from the IFA in Germany which is available for free here.
  • The IFA also provides free software to do many of the ISO 13849 calculations here – click on the hyperlink to download SISTEMA.
  • The free machinery safety book from Rockwell Automation available here is also helpful.

For all previous blogs in this Safety Matters series, please see here.

Anonymous