What is Markov Modeling & What is it Used For?

My last blog was on CCF (common cause failures), and this one is on a handy technique for reliability modeling, including CCF, known as Markov modeling. As a refresher, a CCF generally involves all the channels in a redundant safety system failing at the same time so that a hazard occurs. Ideally, in a redundant system, a single failure still leaves at least one channel to keep you safe. A CCF generally takes down all the redundant channels.

I won’t bore you with the official definition of a Markov model but will instead give you some examples of what a Markov model looks like, especially in the context of modeling CCF. One thing I will say from the definition is that in a Markov model the next state depends only on the present state and not on anything that went before.

As my first example, below are two ways to model a two-channel safety system (repair not shown) using Markov modeling. The failure rates are shown above the arrows in all cases and in a practical system would actually be the dangerous undetected failure rates. Looking at the Markov model on the left, the system starts in the “ok” state with both channels working and then enters one of three states: channel A failed, channel B failed, or both channels failed simultaneously due to CCF. Once channel A has failed, you enter state 4 if channel B then fails; similarly, if you are in state 3 and channel A fails, you go to the state with channels A and B both failed. In the examples below the failure rate for both channels is λ.

Note – the path from S1 to S2 and from S2 to S3 should really read λ-λCC but that’s for the advanced class.

The diagram on the right shows another way to model the same circuit and I will leave it as an exercise for you to figure it out for yourself.

Figure 1 - two ways to represent a two channel safety system using a Markov model
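As a rough illustration, the left-hand model can be written down as a table of transition probabilities. The sketch below is in Python rather than Octave. The values of λ and λCC, the one-hour time step and the state names are all assumptions chosen for illustration, not values taken from the figure.

```python
# Hypothetical values for illustration only (per-hour rates, one-hour time step).
LAMBDA = 1e-6      # dangerous failure rate of each channel
LAMBDA_CC = 2e-8   # common cause failure rate that takes out both channels

STATES = ["ok", "A failed", "B failed", "A and B failed"]

# transitions[from_state][to_state] = probability of that move in one time step
transitions = {
    "ok": {"A failed": LAMBDA, "B failed": LAMBDA, "A and B failed": LAMBDA_CC},
    "A failed": {"A and B failed": LAMBDA},
    "B failed": {"A and B failed": LAMBDA},
    "A and B failed": {},  # repair is not modeled, so this state is never left
}

def to_matrix(states, transitions):
    """Build a square matrix; the diagonal holds the probability of staying put."""
    matrix = []
    for src in states:
        row = [transitions[src].get(dst, 0.0) for dst in states]
        row[states.index(src)] = 1.0 - sum(row)
        matrix.append(row)
    return matrix

P = to_matrix(STATES, transitions)
for name, row in zip(STATES, P):
    print(name, row)
```

Each row depends only on the state you are currently in, which is the Markov property mentioned earlier.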

Even from the model above you should now have figured out that Markov modeling requires identification of the system states and the probabilities of moving between them. As an example, below is a Markov model of a two-channel safety system from IEC 61800-5-2:2007 Annex B.

It contains eight states as follows:

S1 – the all ok state with both channels working

S2 – channel A has failed dangerous

S3 – channel B has failed dangerous

S4, S7 are states to represent whether the failure of channel B is dangerous detected or dangerous undetected

S5, S6 are states to represent whether the failure of channel A is dangerous detected or dangerous undetected

S8 – represents the hazardous state where both channels have failed undetected


Figure 2 - Markov model of a diverse two channel safety system from IEC 61800-5-2:2007 Annex B

If the eight states are arranged as a vector then the initial starting point is S=[1,0,0,0,0,0,0,0] indicating that at time 0 the probability of being in state 1 is 1 and the probability of being in any of the other states is 0. Remember the sum of the probabilities must be one.

You can then create an 8x8 transition matrix (P matrix) showing the probability of moving from any state to any other state. Many of the entries in the matrix will be zero, as generally there are only 1 or 2 paths out of any given state. For instance, the 6th row, representing the ways to exit state S6, will be [0, 0, 0, 0, 0, 1-λBD, 0, λBD]. Each row should sum to 1, which is a useful check to perform on your P matrix.
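The row-sum check is easy to automate. Here is a minimal sketch in Python, assuming a made-up value for λBD and a one-hour time step. The helper name bad_rows is hypothetical.

```python
import math

LAMBDA_BD = 2e-7  # hypothetical dangerous failure rate of channel B, per hour

# Row 6 of the 8x8 P matrix: stay in S6, or move to S8 when channel B fails.
row_s6 = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0 - LAMBDA_BD, 0.0, LAMBDA_BD]

def bad_rows(matrix, tol=1e-12):
    """Return the 1-based numbers of the rows that do not sum to 1."""
    return [i + 1 for i, row in enumerate(matrix)
            if not math.isclose(sum(row), 1.0, abs_tol=tol)]

# An empty list means every row passed the check.
print(bad_rows([row_s6]))
```

Running the same function over all eight rows will point straight at any row where a transition has been left out or entered twice.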

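To calculate the steady state probability of being in the various states, repeatedly calculate SN = SN-1*P, treating S as a row vector (if you prefer column vectors, use the transpose of P instead). After a number of iterations you should reach the steady state probabilities of being in the various states.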
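The iteration can be sketched in a few lines of Python, treating S as a row vector that multiplies P from the left. The two-state model and its numbers below are a made-up toy (it includes repair so that a steady state exists) and are not taken from the standard.

```python
def step(state, P):
    """One time step: new_state[j] = sum over i of state[i] * P[i][j]."""
    n = len(state)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]

def iterate(state, P, tol=1e-12, max_steps=100_000):
    """Repeat the step until the probabilities stop changing."""
    for count in range(1, max_steps + 1):
        new_state = step(state, P)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state, count
        state = new_state
    return state, max_steps

# Toy two-state model with made-up per-step probabilities: working <-> failed.
P = [[0.9, 0.1],   # working: 10% chance of failing in a step
     [0.5, 0.5]]   # failed: 50% chance of being repaired in a step

S0 = [1.0, 0.0]    # start in the working state with probability 1
S, steps = iterate(S0, P)
print(S, steps)
```

The probabilities in S always sum to one, which is another quick sanity check while iterating.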

Matlab, or the very similar but free Octave, can be used to do the math. I have also used Excel, but it takes a bit more work.

One interesting aspect of the Markov model in figure 2 is that it assumes diverse channels, each with its own failure rate. Therefore the common cause failure path from the all ok state to the all failed state is given by βA/B × min(λAD, λBD). The simplest rationale behind the use of the minimum is that even if βA/B=1 the common cause failure rate cannot exceed the failure rate of the channel with the lowest λ.
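As a small numerical illustration of that rule, read here as β times the smaller of the two channels' dangerous failure rates, here is a Python sketch. The function name and the two failure rates are hypothetical.

```python
def ccf_rate(beta_ab, lambda_ad, lambda_bd):
    """Common cause rate for diverse channels: beta times the smaller rate."""
    return beta_ab * min(lambda_ad, lambda_bd)

LAMBDA_AD = 1e-7   # hypothetical dangerous failure rate of channel A, per hour
LAMBDA_BD = 3e-7   # hypothetical dangerous failure rate of channel B, per hour

for beta in (0.02, 0.10, 1.0):
    print(beta, ccf_rate(beta, LAMBDA_AD, LAMBDA_BD))
```

With β set to 1 the result equals the failure rate of the better channel, never more.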

Another interesting aspect is how diagnostics are modeled. Note the transitions from S2 to S5 and S2 to S6 are modeled as DCA*rtest and (1-DCA)*rtest where rtest is the diagnostic test rate.
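The split between the two diagnostic branches can be sketched in Python as follows. The coverage and test rate are made-up numbers, and the function name is hypothetical.

```python
def diagnostic_split(dc, r_test):
    """Split the diagnostic test rate into detected and undetected branches."""
    if not 0.0 <= dc <= 1.0:
        raise ValueError("diagnostic coverage must be between 0 and 1")
    return dc * r_test, (1.0 - dc) * r_test

DC_A = 0.9         # hypothetical diagnostic coverage of channel A
R_TEST = 1.0 / 8   # hypothetical test rate: one diagnostic test every 8 hours

# to_s5 is the DCA*rtest branch, to_s6 is the (1-DCA)*rtest branch.
to_s5, to_s6 = diagnostic_split(DC_A, R_TEST)
print(to_s5, to_s6)
```

The two branches always add up to rtest, so every diagnostic test moves the system out of S2 one way or the other.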

As an example of how it might be done, table B.12 of IEC 61508-6:2010 can be reproduced using a simple Markov model in Matlab.

The Markov model for the 1oo2 system is shown below and has only 3 states once we ignore all bar dangerous undetected failures. (System being analyzed using Octave)

The Octave code to implement the above for β=2% and λD=0.5e-7 is shown below and should be reasonably self-explanatory. Some diagrams in books and papers will show a lot more states but if you get out a highlighter and highlight all the paths which can lead to a dangerous undetected failure of both channels you should get the above. The other states may be necessary to calculate unavailability but not for PFHd.

Figure 5 - Octave code for a 1oo2 system per IEC 61508-6:2010 table B.10
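For readers without Octave to hand, the same kind of calculation can be sketched in Python. This is only a sketch under stated assumptions and makes no claim to reproduce the table. The values of β and λD come from the text, but the zero diagnostic coverage, the proof test interval, the one-hour time step and the exact transition rates between the three states are assumptions made for illustration.

```python
BETA = 0.02          # common cause fraction, from the text
LAMBDA_D = 0.5e-7    # dangerous failure rate per hour, from the text
DC = 0.0             # ASSUMPTION: no diagnostic coverage
T_PROOF = 4380       # ASSUMPTION: proof test interval in hours (about six months)

lambda_du = LAMBDA_D * (1.0 - DC)                # dangerous undetected rate per channel
to_one_failed = 2.0 * (1.0 - BETA) * lambda_du   # ASSUMPTION: either channel fails alone
to_both_ccf = BETA * lambda_du                   # common cause takes out both channels
to_both_second = lambda_du                       # ASSUMPTION: the remaining channel fails

# Rows and columns: S1 both ok, S2 one channel failed, S3 both failed.
# Entries are probabilities per one-hour step; each row sums to 1.
P = [
    [1.0 - to_one_failed - to_both_ccf, to_one_failed, to_both_ccf],
    [0.0, 1.0 - to_both_second, to_both_second],
    [0.0, 0.0, 1.0],
]

def step(state, P):
    """One time step: new_state[j] = sum over i of state[i] * P[i][j]."""
    n = len(state)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]

state = [1.0, 0.0, 0.0]          # start with both channels working
for _ in range(T_PROOF):
    state = step(state, P)

pfh_estimate = state[2] / T_PROOF   # average probability per hour of reaching S3
print(f"P(both failed) after {T_PROOF} h: {state[2]:.3e}")
print(f"average per hour: {pfh_estimate:.3e}")
```

Changing DC, β or T_PROOF and rerunning is a quick way to see which parameter dominates the result.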

For systems with more than three or four components the state transition matrix can become unwieldy. As an example the model for a category 2 system given in BGIA report 2/2008 figure G.2 has 17 states and well over 50 transitions.

Figure 6 - Annex K of ISO 13849-1

The data in Annex K of ISO 13849-1 was generated using Markov analysis. Having a simple lookup table allows readers to estimate a PL (performance level) without having to do the analysis themselves. However, without knowing the reasoning behind the modeling you can be misled. For instance, it assumes identical channels, a β of 2% and diagnostics implemented using cross comparison. This might not match your system. My efforts so far to reproduce the data above have been close but unsuccessful.

Hopefully the above gives you enough information that you don’t panic when someone mentions Markov modeling.

You will get lots of hits on Markov modeling if you Google it, but most of them are not related to reliability modeling and some are highly technical. Some easy-to-read information on Markov modeling can be found in chapters 8 and 14 of “Control Systems Safety Evaluation and Reliability” by William M. Goble and also in

GNU Octave can be downloaded from https://www.gnu.org/software/octave/