I recently acquired and read a copy of Software Reliability: Principles and Practices by Glenford J. Myers, which was published in 1976. To save you doing the math in your head, that is 45 years ago. I can’t remember how I found the book, but it was probably referenced in some standard or paper I was reading. Given the fast-changing nature of software engineering, could this book still be relevant after all that time? I would definitely answer yes. At the time of writing, copies are available on amazon.co.uk from around 15 pounds sterling including shipping.

This blog starts with some math before becoming less academic, because I actually started reading the book at chapter 18 on “Reliability modelling”, as I had a need related to a previous blog on “The Math behind proven in use”. One model discussed is one in which the failure rate of the software, z(t), is constant (the time between failures follows a negative exponential distribution) until an error is discovered and corrected, at which point z(t) again becomes constant but at a lower value. This leads to a staircase shape as errors are found and corrected. The author, however, points out that the failure rate of software is not generally related to time but is more likely a function of the number of remaining errors, the severities and locations of those errors, and the way in which the system is being used. As the saying goes, all models are wrong, but some are useful.

Figure 1 - Illustration of the reliability growth model
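The staircase can be sketched in a few lines of Python. The book’s exact formulation is not reproduced here: this sketch assumes a Jelinski-Moranda style model in which each correction lowers the failure rate by a fixed amount phi, so that z is proportional to the number of errors remaining (which echoes the author’s point that the rate depends on remaining errors rather than on time). The names n_errors, phi and both functions are hypothetical.

```python
import random


def staircase_failure_rates(n_errors: int, phi: float) -> list[float]:
    """Failure rate z on each step of the staircase.

    Assumes (Jelinski-Moranda style) that z is proportional to the number of
    errors still in the program: after i fixes, z = phi * (n_errors - i).
    """
    return [phi * (n_errors - i) for i in range(n_errors + 1)]


def simulate_failure_times(n_errors: int, phi: float, seed: int = 1) -> list[float]:
    """Cumulative times at which failures occur, assuming each error is fixed
    immediately after the failure it causes.

    While z is constant, the time to the next failure is exponentially
    distributed with rate z.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    # The last step has z = 0 (no errors left), so no further failures occur.
    for z in staircase_failure_rates(n_errors, phi)[:-1]:
        t += rng.expovariate(z)
        times.append(t)
    return times


print(staircase_failure_rates(4, 0.5))  # [2.0, 1.5, 1.0, 0.5, 0.0]
print(simulate_failure_times(4, 0.5))
```

Each value in the first list is the height of one step; the simulated times show the gaps between failures tending to grow as the rate steps down.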

Chapter 18 continues with a discussion of an error seeding model, in which known errors are inserted into the software and it is assumed that testing is equally likely to uncover the seeded and indigenous errors. By testing for a fixed amount of time, it should then be possible to estimate N, the number of indigenous errors, from the number of seeded errors discovered during testing. The equation N=n*s/v is given as the maximum likelihood estimate of N, where n is the number of indigenous errors uncovered, v is the number of seeded errors uncovered and s is the number of seeded errors inserted. So, if 20 errors are seeded and testing reveals 15 indigenous errors and 5 seeded errors, the estimate of N is 15*20/5 = 60.
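As a quick check of the arithmetic, here is a minimal Python sketch of the estimate N=n*s/v. The function name is hypothetical; it simply encodes the equation above under the stated assumption that seeded and indigenous errors are equally easy to find.

```python
def estimate_indigenous_errors(s: int, v: int, n: int) -> float:
    """Estimate N, the total number of indigenous errors, as N = n * s / v.

    s: seeded errors inserted, v: seeded errors uncovered,
    n: indigenous errors uncovered. Assumes seeded and indigenous errors
    are equally likely to be found by testing.
    """
    if v == 0:
        raise ValueError("no seeded errors found, so N cannot be estimated")
    return n * s / v


N = estimate_indigenous_errors(s=20, v=5, n=15)
remaining = N - 15  # indigenous errors estimated to be still in the program
print(N, remaining)  # 60.0 45.0
```

Since N counts all indigenous errors, including the 15 already found, the example suggests roughly 45 are still waiting to be discovered.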

Suppose you want to state a confidence level that the number of errors in the program is zero. The book offers a formula for this, in which s and n are as before (the number of seeded errors and the number of indigenous errors uncovered) and k is the number of errors we assert are in the program.

So, for instance, if we assert that the program has no errors (i.e. k=0), seed the program with 4 errors and testing reveals all 4 of them and no indigenous ones, then we are 80% confident that the program contains no other errors. To be 95% confident, we would need to seed 19 errors and find all 19.
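The formula itself is not reproduced in the text. The Python sketch below assumes it is the form commonly attributed to Mills, C = s/(s+k+1), valid when all s seeded errors have been found and no more than k indigenous errors have turned up; this assumption reproduces both the 80% and 95% figures. The function names are hypothetical, and exact fractions are used to avoid floating-point rounding at the boundaries.

```python
import math
from fractions import Fraction


def seeding_confidence(s: int, k: int = 0) -> Fraction:
    """Confidence that the program contains no more than k indigenous errors.

    Assumes all s seeded errors were found and at most k indigenous errors
    turned up: C = s / (s + k + 1).
    """
    return Fraction(s, s + k + 1)


def seeds_needed(confidence: str, k: int = 0) -> int:
    """Smallest number of seeded errors s (all of which must then be found)
    that gives at least the requested confidence.

    confidence is a decimal string such as "0.95", so it becomes an exact
    fraction instead of an inexact binary float.
    """
    c = Fraction(confidence)
    if not 0 <= c < 1:
        raise ValueError("confidence must be in [0, 1)")
    # s / (s + k + 1) >= c  rearranges to  s >= c * (k + 1) / (1 - c).
    return math.ceil(c * (k + 1) / (1 - c))


print(seeding_confidence(4))   # 4/5, i.e. 80%
print(seeding_confidence(19))  # 19/20, i.e. 95%
print(seeds_needed("0.95"))    # 19
```

The rearranged form makes the cost of higher confidence obvious: going from 95% to 99% with k=0 means seeding, and finding, 99 errors.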

The author points out that one of the psychological problems with testing is that the person doing the testing tends to assume every detected error is the last one, and stops testing. With the error seeding approach, you know you must continue because there are seeded errors still to be uncovered. Obvious problems with the method are that the program with the seeded errors is not the same as the program without them, and there is no guarantee that the seeded errors are not easier to uncover than the indigenous errors. Perhaps the seeded errors should be placed by your most cunning and devious colleague; someone who doesn’t like you might be best (although they might be tempted to tell you they had seeded errors without actually doing so, leaving you chasing your tail). I must say it is an interesting approach, and I like the fact that it can give a confidence level.

When thinking about error seeding in the context of IEC 61508 or ISO 26262, the first thing that comes to mind is fault insertion/injection. However, fault insertion is only used in the hardware part of IEC 61508, and then only to verify diagnostic coverage claims when the claimed DC is greater than or equal to 90%. “Error seeding” itself does appear in IEC 61508-3 table B.2 as a recommended technique for SIL 2 and higher, and its description in IEC 61508-7 gives the same warnings as above.

Figure 2 - table B.2 from IEC 61508-3

I then went back to the start of the book, which begins with a discussion of what a software error is. It uses the US Ballistic Missile Early Warning System as an example: an early version of the software mistook the rising moon for an incoming missile. Was this an error? The requirement was that any moving object appearing over the horizon that was not a known friendly object be identified as a threat. That the moon was not excluded sounds more like a requirements issue. The author goes on to point out that all software errors are systematic and that the reason a “software error appears at a particular time is that some unique sequence of inputs is being processed at that time”. Another example offered is that 18 errors were detected during the 10-day flight of Apollo 14, and the book gives this quotation from a 1973 paper.

Figure 3 - Comment on software errors in the Apollo program

Chapter 3 discusses four approaches to software reliability.

• Fault avoidance
• Fault detection
• Fault correction
• Fault tolerance

The author considers software errors to be largely caused by translation errors and advocates “minimizing the number of people”, as that “has a positive effect of reliability by reducing the number of communication paths and hence the number of translation errors”. On the topic of complexity, the author states “Complexity being a principle underlying cause of translation errors, is one of the major causes of unreliable software. Complexity is both difficult to define precisely and to quantify. However, we can say that the complexity of an object is some measure of the mental effort required to understand that object”.

On testing, the author has many observations spread throughout the book, including:

• People have a tendency to grossly underestimate the time needed for testing.
• A programmer who truly sees the program as an extension of his own ego is not going to be trying to find all the errors in that program. On the contrary, he is going to be trying to prove that the program is correct.
• Each piece of program is either right or wrong with no middle ground.
• Reliability cannot be tested into a program; a program’s reliability is established by the correctness of the design stages.
• Since errors come in clusters, as the number of detected errors in a piece of software increases, the probability that more undetected errors exist also increases.
• Negative completion criteria are preferable because they encourage the detection of as many errors as possible in the early testing phases; for example, function testing is not complete until the tester detects and corrects at least two errors per 1000 lines of code in the function.
• Managers tend to use the word “success” to describe a test case that has passed, but if the function of testing is to find bugs, then a successful test case is one that fails.
• The author states that “where programmers are normally rewarded based on the absence of errors in their product, test specialists must be rewarded for finding as many errors as possible, even to the extent of paying them on a commission basis (e.g. $100 for detecting an error of severity X)”.

Scott Adams, who writes the Dilbert cartoons, must have read the last one.

Figure 4 - Scott Adams’s take on paying for finding and fixing bugs
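To make the negative completion criterion from the list above concrete, here is a minimal Python sketch. The function names are hypothetical, and rounding a partial thousand lines up to a whole error is an assumption; the text only gives the rate of two errors per 1000 lines.

```python
import math


def min_errors_to_find(lines_of_code: int, errors_per_kloc: int = 2) -> int:
    """Errors the tester must detect and correct before function testing of a
    function can be declared complete. Partial thousands are rounded up
    (an assumption; only the per-1000-lines rate is given)."""
    return math.ceil(lines_of_code * errors_per_kloc / 1000)


def function_testing_complete(lines_of_code: int, errors_found: int) -> bool:
    """Negative criterion: testing is not complete until enough errors are found."""
    return errors_found >= min_errors_to_find(lines_of_code)


print(min_errors_to_find(1200))            # 3
print(function_testing_complete(1200, 2))  # False
```

The point of phrasing it this way is that “no failures seen” can never by itself end testing; only finding errors can.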

One statement that will get me a lot of respect from the test people out there is that “you should assign your most creative programmers to testing. Testing, particularly test case design is the area of software development that demands the most creativity.” He also points out that managers usually view design as the most creative task, and that many people with a talent for testing and debugging move into design to further their careers!

On the myth of path testing, the book offers the flow graph below, which consists of two DO loops separated by an “if” statement. The book says that there are 10^18 unique paths through this module. I have put it on my to-do list to confirm this for myself.

Figure 5 - path testing example from the book
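The reason path counts explode so quickly can be sketched in Python. The loop parameters below are hypothetical and are not taken from the book’s figure, so this does not attempt to reproduce the book’s number; it assumes each loop executes between 1 and max_iterations times, that every pass through the body can take any of body_paths routes, and that choices in sequence multiply.

```python
def loop_paths(body_paths: int, max_iterations: int) -> int:
    """Distinct paths through a loop that executes 1..max_iterations times,
    where each pass can take any of body_paths routes through the body."""
    return sum(body_paths ** i for i in range(1, max_iterations + 1))


def module_paths(loop1: int, if_branches: int, loop2: int) -> int:
    """Two loops in sequence separated by an if: the path counts multiply."""
    return loop1 * if_branches * loop2


# Hypothetical numbers, not taken from the book's flow graph.
first = loop_paths(body_paths=5, max_iterations=20)
print(first)  # 119209289550780, about 1.2 x 10^14
print(module_paths(first, 2, first))
```

Even a modest loop gives around 10^14 paths on its own, and placing two such loops in sequence multiplies the counts, which is why exhaustive path testing is impractical.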

In general, I found the book still relevant, with good examples, interesting insights and good descriptions of some topics that might be overlooked in a more modern text, which has to deal with operating systems, multi-threading, hypervisors, the cloud and so on. It still reads very well and, in my view, has stood the test of time. This being a blog, I couldn’t include everything from the book, but hopefully I have included enough to whet your appetite and the bits I decided to write about were of interest to you.