Functional Safety for Software

I won’t go so far as to say that functional safety people hate software, but you would sometimes be forgiven for thinking so. From a safety point of view, software is deemed to have too many states to test exhaustively and so is said to be untestable.

Figure 1 - A book I look forward to reading

Because of this untestability there was no accepted way to demonstrate the safety of software, so in the past relatively simple logic was used instead, implemented with safety relays and the like.

However, given the flexibility and power that software brings to systems, its use in safety was inevitable. With the advent of standards such as IEC 61508-3, designers have a way to show that their software is sufficiently safe by following a set of techniques that have been shown to deliver safe software in the past.

Figure 2 - The power of software

Things which make software different to hardware include:

  • Software doesn’t tape out, so there is generally no hard deadline (to put it another way, “how does a project get to be a year late? One day at a time” – Fred Brooks, The Mythical Man-Month)
  • Features can be added after release – “and they looked upon the software and saw that it was good. But they had to have this one feature….” – Attributed to McCormick G.F.
  • Software can do almost anything and is often asked to – “the curse of flexibility”
  • Software runs on hardware

While software doesn’t wear out and doesn’t have random failures like hardware, it can contain systematic errors. Systematic errors are errors which can only be removed by a design change, i.e. by changing the code. They are always present but are only exposed when a certain set of conditions arises. Hardware reliability uses traditional reliability methods and is based on probabilities. You can attempt to use probabilities for software, but the probability of software failure is 1 when the right conditions arise to expose a bug.
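To make the idea of a systematic error concrete, here is a minimal Python sketch. The function percent_of_span and its inputs are hypothetical, invented purely for illustration. It contains a latent defect (no guard for a zero span) that ordinary inputs never expose, but that fails on every single attempt once the triggering condition arises.

```python
def percent_of_span(raw: float, lo: float, hi: float) -> float:
    """Convert a raw reading to percent of span.

    Latent systematic defect (deliberate, for illustration):
    there is no guard for the case hi == lo.
    """
    return 100.0 * (raw - lo) / (hi - lo)


def fails(raw: float, lo: float, hi: float) -> bool:
    """Return True if the conversion raises for these inputs."""
    try:
        percent_of_span(raw, lo, hi)
        return False
    except ZeroDivisionError:
        return True


# Ordinary calibration values never expose the defect...
normal_failures = sum(fails(float(r), 0.0, 10.0) for r in range(1000))

# ...but with the triggering condition (hi == lo) it fails every time.
trigger_failures = sum(fails(float(r), 5.0, 5.0) for r in range(1000))

print(normal_failures, trigger_failures)
```

No amount of repetition changes the outcome in either case, which is why the only fix is a design change such as adding the missing guard.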

Estimates of the number of errors per thousand lines of code (EPTLOC) vary, but are generally in the range of 1 to 10 for good code. The book “Software Assessments, Benchmarks, and Best Practices” by Capers Jones (Addison-Wesley) gives figures for the various CMM (Capability Maturity Model) levels: level 1: 7 EPTLOC, level 2: 6 EPTLOC, level 3: 5 EPTLOC, level 4: 2 EPTLOC, level 5: 1 EPTLOC. Other data sources give rates per 1,000 lines of code of 7 for office applications, 2 for industrial applications and 0.1 for space shuttle applications. All of this shows the challenge of making software safe.
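As a rough illustration of what these densities mean in practice, the sketch below multiplies them by a code size. The 50,000-line project size is a made-up assumption, and the function name expected_errors is mine. The densities are simply the figures quoted above.

```python
# Error densities quoted above, in errors per thousand lines of code (EPTLOC).
EPTLOC_BY_CMM_LEVEL = {1: 7, 2: 6, 3: 5, 4: 2, 5: 1}
EPTLOC_SPACE_SHUTTLE = 0.1

# Hypothetical project size, chosen only for illustration.
PROJECT_LOC = 50_000


def expected_errors(lines_of_code: int, eptloc: float) -> float:
    """Estimated number of errors for a given size and error density."""
    return lines_of_code / 1000 * eptloc


for level, density in EPTLOC_BY_CMM_LEVEL.items():
    estimate = expected_errors(PROJECT_LOC, density)
    print(f"CMM level {level}: about {estimate:.0f} errors")

shuttle = expected_errors(PROJECT_LOC, EPTLOC_SPACE_SHUTTLE)
print(f"Space shuttle density: about {shuttle:.0f} errors")
```

Even at the best density quoted, a code base of this size would still be expected to contain a handful of errors, each of them a systematic error waiting for its triggering conditions.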

The book “Software Safety primer” describes 5 types of software errors and estimates that 60% of the errors are related to specification and design with 40% related to coding.

  • Specification errors – some functionality omitted because it was not documented in the requirements
  • Design errors – use of incorrect algorithms, lack of self-tests….
  • Coding errors – endless loops, syntax errors….
  • Hardware induced errors – for example bit flips in the flash memory changing an instruction
  • Interface errors – a problem related to the software hardware interface

So, what do the safety standards say? They advocate a set of methods and processes designed to reduce the chances of introducing an undetected error into the code. The lifecycle model below is advocated by IEC 61508:2010 and I will come back to it in a later blog. The process is holistic, going the whole way from requirements to architecture to design and eventually to coding, with verification and validation steps to match each stage.

Figure 3 - Software V-model from IEC 61508-3:2010

Typically, the processes advocated by functional safety standards are rigorous compared to even the good software development practices advocated for non-safety domains. The table below shows estimates of the gaps between CMMI and the avionics DO-178 standard.

Source: page 172 of Avionics certification (see below)

The main gaps relate to things like

  • Independent safety assessment
  • Tool qualification
  • Very specific safety requirements related to a particular kind of analysis e.g. need to do fault tree analysis or an FMEDA

while tasks such as

  • Configuration management
  • Software planning
  • Coding
  • Functional testing

are well covered by the standard non-safety high quality development processes.

Software functional safety standards which are worth reading, regardless of the domain for which you are developing, include

  • IEC 61508-3:2010 – main non-sector specific functional safety for software consensus standard
  • DO-178C – Avionics software safety standard
  • EN 50128 – Rail software standard
  • ISO 26262-6:2011 – Automotive functional safety for software standard
  • IEC 62304 – medical device software
  • IEC 60880 – Nuclear software safety
  • UL 1998 – US standard for software in home appliances

While the end domain for each of the above is different, the intent of each is the same, and what is described badly in one standard is often described much better in another.

For those who want to read more on software safety for themselves, here are my recommendations.

  • Software safety primer by Clifton A. Ericson II
  • Mission-critical and safety-critical systems handbook by Kim Fowler
  • Embedded software development for safety-critical systems by Chris Hobbs
  • Better embedded system software by Philip Koopman
  • Software for dependable systems: sufficient evidence? from the National Academy of Sciences
  • Avionics certification – A complete guide to DO-178, DO-178C, DO-254 by Vance Hilderman and Tony Baghai

This week’s video is at https://www.youtube.com/watch?v=gp_D8r-2hwk and shows an Ariane 5 rocket launch which doesn’t go so well due to a numeric conversion overflow in re-used software. The cause was related to the use of software from the Ariane 4, which was slower to accelerate, so the value being converted had always stayed within range on that launcher.
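The inquiry report into the Ariane 5 failure traced it to an unprotected conversion of a 64-bit floating-point value to a 16-bit signed integer. The Python sketch below is not the flight code (which was written in Ada). The function names and sample values are hypothetical. It only illustrates three ways such a conversion can behave: silently wrapping, raising an error, or saturating at the limit.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(value: float) -> int:
    """Keep only the low 16 bits, as an unchecked conversion might."""
    raw = int(value) & 0xFFFF
    return raw - 0x10000 if raw >= 0x8000 else raw


def to_int16_checked(value: float) -> int:
    """Raise if the value does not fit in a 16-bit signed integer."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return result


def to_int16_saturating(value: float) -> int:
    """Clamp out-of-range values to the nearest representable limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


# Hypothetical readings: the first fits in 16 bits, the second does not.
in_range, out_of_range = 20000.0, 40000.0

print(to_int16_checked(in_range))         # fits, so no problem
print(to_int16_wrapping(out_of_range))    # silently corrupted value
print(to_int16_saturating(out_of_range))  # clamped to the upper limit
try:
    to_int16_checked(out_of_range)
except OverflowError as exc:
    print("overflow detected:", exc)
```

Which behaviour is right depends on the system. The point is that the choice has to be made deliberately for the range of values the new application can actually produce, not inherited from the old one.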

This is the first in a series of software-related blogs. Functional safety for software is a massive topic. It was hard to know where to go for the next blog, but in the end I decided that next time the discussion will be on “Software systematic capability”.