Combining Functional Safety and Availability Using Redundancy

Functional Safety is defined as the ability of an electrical/electronic (E/E) system to mitigate the effects of failures that could lead to harm. One way to achieve this goal is to use traditional fail-safe systems, which must transition to a safe state within a certain fault-tolerant time. While this approach is acceptable for most automotive applications, it may itself create a potential hazard for systems that require continuous operation. Adding redundancy can address this issue and increase availability.


Traditionally, Functional Safety and Reliability are treated separately. Whereas reliability refers to the probability that a system performs its function over a certain period, safety relies on the mitigation of failures no matter when they happen. The ISO 26262 standard has no requirements for minimum reliability, except that existing reliability techniques should be followed.


Most automotive systems are designed to be fail-safe, where the system is transferred to a safe state after a potential single point failure occurs. For many of those systems, this state has been associated with deactivating a whole sub-system. The problem with this approach is that a whole module would not be available anymore. Consider applications for autonomous vehicles that use only a single front camera, where a camera failure would prevent the system from making critical decisions affecting braking or steering. For such applications, availability needs to be maintained because there is no additional control by the human driver to rely on.

Improving Availability by Using Redundant Systems


To illustrate the differences between a fail-safe system and a fault-tolerant system, let’s look at three different examples consisting of a sensor, a serializer for transferring image data, and a de-serializer that forwards the video to a central System on a Chip (SoC).


1 – Linear system (1oo1)

Figure 1: RBD (Reliability Block Diagram) of Linear System


Assume that each of the sub-systems is subject to a constant failure rate λ, leading to the Reliability Function R(t) = e^(-λt).

Additionally, each of the 3 sub-systems is assumed to have a failure rate of 1000 FIT (1 FIT = 1 failure per 10^9 operating hours). To be functional, all 3 sub-systems need to work, leading to a system reliability function Rs = R1*R2*R3. For T = 100,000 operating hours, the system reliability is Rs(T) = 0.74: 74% of all parts would still be functional at this time and 26% would already have failed. Because the failure rates of serially connected sub-systems add up, the FIT rate of this system is 3000 FIT.
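The arithmetic for the linear system can be checked with a few lines of Python. This is only a minimal sketch under the assumptions stated above (constant failure rates, independent failures); the function and variable names are hypothetical and chosen for illustration. It also shows that multiplying the individual reliabilities gives the same result as summing the failure rates first.

```python
import math

FIT_PER_HOUR = 1e-9  # 1 FIT = 1 failure per 10^9 operating hours


def reliability(fit: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate given in FIT."""
    return math.exp(-fit * FIT_PER_HOUR * hours)


def series_reliability(fits, hours: float) -> float:
    """Series chain: every block must work, so the reliabilities multiply."""
    r = 1.0
    for fit in fits:
        r *= reliability(fit, hours)
    return r


# Assumed values from the example: sensor, serializer, de-serializer
component_fits = [1000.0, 1000.0, 1000.0]
T = 100_000  # mission time in operating hours

r_product = series_reliability(component_fits, T)
r_summed = reliability(sum(component_fits), T)  # summed rate: 3000 FIT

print(f"Rs(T) from product of reliabilities: {r_product:.2f}")
print(f"Rs(T) from summed failure rate:      {r_summed:.2f}")
```

Both lines should report a reliability of about 0.74 for the assumed mission time.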

Conclusion: The reliability of this system will always be lower than that of its least reliable component.

2 – Triple Modular Redundant System (2oo3)


In our second example, let’s consider a triple modular redundant system. For such a system, at least 2 of the 3 channels need to be functional.

Figure 2: RBD of redundant system

The reliability function for this system is Rs(t) = R(t)³ + 3 R(t)² * (1 - R(t)), where R(t) is the reliability of one channel, that is, of the linear system described in Figure 1. For the same mission time T, the reliability is Rs(T) = 0.83 (83%), from which we can derive an equivalent failure rate of 1823 FIT. Based on this equivalent rate, the Mean Time to Failure (MTTF) is improved by more than 40%.
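The 2oo3 figures can be reproduced with the short sketch below. It assumes, as in the text, that each of the three channels is a complete linear chain with 3000 FIT and that the channels fail independently. The "equivalent FIT" is taken here to mean the constant failure rate that would give the same reliability at the mission time T; that interpretation, like the function names, is an assumption made for illustration.

```python
import math


def two_out_of_three(r: float) -> float:
    """Probability that at least 2 of 3 identical, independent channels work."""
    return r**3 + 3 * r**2 * (1 - r)


def equivalent_fit(rs: float, hours: float) -> float:
    """Constant failure rate (in FIT) that yields reliability rs after `hours`."""
    return -math.log(rs) / hours * 1e9


T = 100_000                        # mission time in operating hours
CHANNEL_FIT = 3000.0               # assumed: one linear chain per channel
r_channel = math.exp(-CHANNEL_FIT * 1e-9 * T)

rs_2oo3 = two_out_of_three(r_channel)
fit_2oo3 = equivalent_fit(rs_2oo3, T)

print(f"Channel reliability R(T): {r_channel:.2f}")
print(f"2oo3 reliability Rs(T):   {rs_2oo3:.2f}")
print(f"Equivalent failure rate:  {fit_2oo3:.1f} FIT")
```

Under these assumptions the equivalent rate comes out between 1823 and 1824 FIT, in line with the value quoted above.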


Conclusion: The 2oo3 system has a better Hardware Fault Tolerance (HFT = 1) and improved availability compared to the linear system.

3 – Fully Redundant System (1oo3)


In our final example, let’s start with the same RBD as in Figure 2, but assume a parallel solution in which only one of the three channels needs to work to ensure functionality. For this system to fail, all three channels need to fail, which translates to a reliability function Rs(t) = 1 - (1 - R(t))³. For the same mission time T, this results in Rs(T) = 0.98, where 98% of all parts would be functional, and the equivalent FIT rate is 175 FIT.
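The same kind of check works for the 1oo3 case. The sketch below again assumes three independent channels of 3000 FIT each and derives the equivalent failure rate from the reliability at the mission time; the names are hypothetical. It also computes the ratio to the failure rate of the linear system.

```python
import math


def one_out_of_three(r: float) -> float:
    """The system fails only if all 3 independent channels have failed."""
    return 1 - (1 - r) ** 3


T = 100_000                  # mission time in operating hours
LINEAR_FIT = 3000.0          # assumed failure rate of one channel (linear chain)
r_channel = math.exp(-LINEAR_FIT * 1e-9 * T)

rs_1oo3 = one_out_of_three(r_channel)
# Constant failure rate (in FIT) that would give the same reliability at T
fit_1oo3 = -math.log(rs_1oo3) / T * 1e9
ratio_to_linear = fit_1oo3 / LINEAR_FIT

print(f"1oo3 reliability Rs(T):  {rs_1oo3:.2f}")
print(f"Equivalent failure rate: {fit_1oo3:.1f} FIT")
print(f"Share of linear rate:    {ratio_to_linear:.1%}")
```

With these inputs the equivalent rate lands between 175 and 176 FIT, roughly 6% of the 3000 FIT of the linear system.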


Conclusion: The failure rate is only 6% of that of the linear system, reliability is significantly improved, and it will always be better than that of its most reliable component.

Figure 3: Comparison of Reliability R(t) for different solutions: Linear, Triple Modular Redundant, Fully Redundant
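The comparison in Figure 3 can be approximated numerically with a general "M out of N" formula. The following is a sketch, not the calculation used to produce the figure: it assumes identical, independent channels of 3000 FIT each, and the chosen time points and function names are illustrative.

```python
import math

CHANNEL_RATE = 3000e-9  # assumed channel failure rate per hour (3000 FIT)


def m_out_of_n(m: int, n: int, r: float) -> float:
    """Probability that at least m of n identical, independent channels work."""
    return sum(
        math.comb(n, i) * r**i * (1 - r) ** (n - i)
        for i in range(m, n + 1)
    )


def compare(hours: float) -> dict:
    """Reliability of the three architectures after the given operating time."""
    r = math.exp(-CHANNEL_RATE * hours)
    return {
        "1oo1": r,
        "2oo3": m_out_of_n(2, 3, r),
        "1oo3": m_out_of_n(1, 3, r),
    }


print(f"{'hours':>8} {'1oo1':>6} {'2oo3':>6} {'1oo3':>6}")
for hours in (50_000, 100_000, 200_000, 300_000):
    row = compare(hours)
    print(f"{hours:>8} {row['1oo1']:>6.2f} {row['2oo3']:>6.2f} {row['1oo3']:>6.2f}")
```

One detail worth noting from the formula: the 2oo3 architecture only outperforms a single channel while the channel reliability R(t) is above 0.5. For very long operating times the 2oo3 value falls below the 1oo1 value, whereas 1oo3 stays above it.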

Paving the Way for Autonomous Vehicles


Both Automotive Safety Integrity Level (ASIL) and high availability can be achieved by implementing redundant systems to produce a lower failure rate. Redundant systems can help ensure that autonomous vehicles can continue to operate safely even if a component fails. This not only helps to protect passengers and drivers but also enhances public trust in autonomous technology, paving the way for wider adoption. While this approach comes with additional hardware costs, it is a necessary investment for the safe operation of autonomous vehicles in the future.


  • Labels have been fixed and the blog updated.

  • Jeff, thanks for the feedback and the comment on the labelling. The main intention was to highlight the differences in availability/reliability between a linear system (which should correctly be denoted as 1oo1), a 3-channel system requiring two functional channels, and a fully redundant system. Technically, it should be clear from the description within the text; nevertheless, the nomenclature should be consistent with existing standards, as you correctly noted.

  • In functional safety, the MooN (pronounced "M out of N") nomenclature is a way to describe a redundant safety architecture. It defines how many channels ('M') out of the total number of channels ('N') must function correctly for the safety system to perform its intended safety function.

    The RBD in Figure 1 is showing a 1oo1 system, not a 3oo3 system.  It is illustrating a single channel system made up of 3 serially connected components (inside the dashed box).  The failure rate of the 1oo1 system (as correctly demonstrated) will be determined by simply adding the failure rates of all single channel elements/components and entering into the reliability formula.  This has the same solution whether one adds the failure rates before entering into the reliability formula or multiplies the reliability of each serially connected element / part.

    The second section / figure illustrates a 2oo3 system (not 3oo2).  And the third section should be labeled as a 1oo3 system (not 3oo1).