Last time I promised that my next blog would feature a deep dive into IEC 61508, the main functional safety standard, and I keep my promises. However, this will be the last of my introductory blogs covering basic topics for a while. I am keen to move on to more exciting topics such as the requirements for Cobots, AI, networking and cyber security, so keep tuning in because these topics will all be covered, beginning with my next installment.
Obviously, as a semiconductor manufacturer I am going to concentrate on the functional safety requirements for semiconductors, but anything here should be more widely applicable. Also, because of the nature of a blog, some poetic licence is taken to explain the concepts quickly.
The graphic below shows a path through the standard for a semiconductor device. Within Analog Devices this flow is captured in our ADI61508 process.
The first task is to understand the environment. This includes not only the EMC environment and the average and extreme temperatures at which the circuitry is expected to operate, but also which standards and regulations apply.
Next comes the hazard analysis where the safety functions are identified. Typically, you will need a safety function to address each hazard unless the item can be redesigned to eliminate the hazard.
The third box is where the safety integrity requirements for each of the safety functions are determined. Typically, this is done based on the severity of the harm and the frequency at which that harm may occur.
The next three vertical boxes show the various ways to address the systematic requirements. Systematic failures are failures not caused by random events. Examples of systematic failures include insufficient EMC robustness, missing requirements, and faults missed because of insufficient testing. Route 1S, based on meeting all the requirements in IEC 61508, is the most common option, but Route 2S, based on proven-in-use evidence, is also possible. Route 3S is only an option for software and involves retrospectively doing all the paperwork and analyses you should have done in the first place. For an IC, the requirements from IEC 61508-2:2010 Annex F show a means to achieve Route 1S.
Then you have two options for meeting the hardware integrity requirements. Route 1H allows a trade-off between diagnostic coverage and hardware fault tolerance (redundancy). For example, for SIL 3 you could use no redundancy, that is an HFT (hardware fault tolerance) of 0, with an SFF (safe failure fraction, a measure of diagnostic coverage) of 99%, or an HFT of 1 with 90% SFF in each channel. Route 2H is based on field experience and minimum levels of HFT.
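To make the Route 1H trade-off concrete, here is a minimal Python sketch of the lookup. The table values are as I recall them from IEC 61508-2:2010 Table 3 for Type B elements (complex parts such as most ICs), so treat them as an assumption and check them against the standard. The function and variable names are hypothetical and exist only for illustration.

```python
# Illustrative only: maximum SIL claimable under Route 1H for a Type B element.
# Values recalled from IEC 61508-2:2010 Table 3; verify against the standard.
# Each row: (lower bound of the SFF band, max SIL for HFT = 0, 1, 2).
# None means that combination is not allowed.
TYPE_B_MAX_SIL = [
    (0.99, (3, 4, 4)),
    (0.90, (2, 3, 4)),
    (0.60, (1, 2, 3)),
    (0.00, (None, 1, 2)),
]


def safe_failure_fraction(lambda_s, lambda_dd, lambda_du):
    """SFF = (safe + dangerous detected) / (all failures), any consistent unit."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


def max_sil_route_1h(sff, hft):
    """Return the highest SIL the table allows, or None if not allowed."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be between 0 and 1")
    if hft < 0:
        raise ValueError("hft must be non-negative")
    column = min(hft, 2)  # the table stops at HFT = 2
    for lower_bound, sils in TYPE_B_MAX_SIL:
        if sff >= lower_bound:
            return sils[column]
    return None


if __name__ == "__main__":
    # Hypothetical failure rates in FIT: 50 safe, 45 dangerous detected,
    # 5 dangerous undetected.
    sff = safe_failure_fraction(50.0, 45.0, 5.0)
    print(sff, max_sil_route_1h(sff, hft=0), max_sil_route_1h(sff, hft=1))
```

The two SIL 3 options from the text correspond to `max_sil_route_1h(0.99, 0)` and `max_sil_route_1h(0.90, 1)`.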
Next, if there is on-chip or off-chip redundancy, you need to consider CCF (common cause failures), which are the most common way to defeat a redundant system. Where on-chip redundancy is used, Annex E of IEC 61508-2:2010 gives guidance on minimizing the risk of on-chip CCF through the use of isolation wells, on-chip separation, etc.
Now the PFH (probability of dangerous failure per hour) or PFD (probability of failure on demand) needs to be calculated. Depending on the SIL, there will be maximum values for these metrics. Typically, an IC will be allocated only a fraction of that maximum.
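As a rough illustration of the PFH budget idea, the sketch below takes the per-hour PFH upper limits for high-demand or continuous mode (as I recall them from IEC 61508-1:2010, so verify before use) and allocates a fraction to the IC. The 10% share in the example is purely an assumption, the function names are hypothetical, and treating PFH as roughly equal to the dangerous undetected failure rate of a single channel is a deliberate simplification.

```python
# Illustrative only: PFH upper limits per hour for high-demand/continuous mode,
# recalled from IEC 61508-1:2010. Verify against the standard.
PFH_LIMIT_PER_HOUR = {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8}

FIT_PER_HOUR = 1e9  # 1 FIT = 1 failure per 10^9 device hours


def ic_pfh_budget(sil, ic_fraction):
    """PFH per hour allocated to the IC, given its share of the SIL limit."""
    if sil not in PFH_LIMIT_PER_HOUR:
        raise ValueError("sil must be 1, 2, 3 or 4")
    if not 0.0 < ic_fraction <= 1.0:
        raise ValueError("ic_fraction must be in (0, 1]")
    return PFH_LIMIT_PER_HOUR[sil] * ic_fraction


def per_hour_to_fit(rate_per_hour):
    """Convert a failure rate per hour to FIT."""
    return rate_per_hour * FIT_PER_HOUR


def fits_budget(dangerous_undetected_fit, sil, ic_fraction):
    """Simplified check: treat PFH as the dangerous undetected rate (one channel)."""
    pfh_estimate = dangerous_undetected_fit / FIT_PER_HOUR
    return pfh_estimate <= ic_pfh_budget(sil, ic_fraction)


if __name__ == "__main__":
    # Assumed allocation: the IC gets 10% of the SIL 3 limit.
    budget = ic_pfh_budget(sil=3, ic_fraction=0.10)
    print(budget, per_hour_to_fit(budget), fits_budget(5.0, 3, 0.10))
```

A real calculation would also account for diagnostic test intervals, redundancy and common cause failures, which this sketch ignores.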
Next, data communications need to be considered. Guidance says that perhaps 1% of the PFH budget should be allocated to interfaces. This might involve calculations based on the bit error rate of the transmission medium, the number of bits transferred per message, the number of messages per hour and the Hamming distance of any CRC used to detect failures. (There will be a blog on this topic.)
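The kind of calculation described above can be sketched in Python as follows. This is a conservative bound under stated assumptions, not the method of any particular standard: bit errors are taken as independent, and every message with at least as many bit errors as the CRC's Hamming distance is counted as undetected (in practice most of them are still caught). All names and the example numbers are hypothetical.

```python
from math import comb


def prob_at_least_k_bit_errors(n_bits, ber, k):
    """Probability of k or more bit errors in an n-bit message (binomial model).

    Intended for short messages (up to a few hundred bits), where the
    binomial coefficients still fit in a float.
    """
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must be between 0 and 1")
    return sum(
        comb(n_bits, i) * ber**i * (1.0 - ber) ** (n_bits - i)
        for i in range(k, n_bits + 1)
    )


def residual_error_rate_per_hour(n_bits, ber, hamming_distance, messages_per_hour):
    """Upper bound on undetected corrupted messages per hour.

    A CRC with Hamming distance d detects every pattern of fewer than d bit
    errors, so only messages with d or more errors can slip through.
    """
    p_bound = prob_at_least_k_bit_errors(n_bits, ber, hamming_distance)
    return p_bound * messages_per_hour


def within_interface_budget(rate_per_hour, pfh_limit_per_hour, share=0.01):
    """Compare against the share of the PFH budget given to interfaces (1% here)."""
    return rate_per_hour <= pfh_limit_per_hour * share


if __name__ == "__main__":
    # Hypothetical link: 64-bit messages, BER 1e-6, CRC Hamming distance 4,
    # one message per second, checked against a SIL 3 limit of 1e-7 per hour.
    rate = residual_error_rate_per_hour(64, 1e-6, 4, 3600)
    print(rate, within_interface_budget(rate, 1e-7))
```

The bound grows quickly with message length and bit error rate, which is why the Hamming distance of the CRC at the actual message length matters so much.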
Perhaps the end is the wrong place to put this, but if you have on-chip diagnostics you need to consider what you want to do when the diagnostics discover an error. For a motor control application you may want to remove the power, but for other applications you need to know a lot about the final application. For instance, in a nuclear power station cooling application you probably want to keep the coolant flowing, but in a system carrying gas you might want to stop the gas flowing.
There are lots of other sub-tasks not shown above, such as configuration management, change management, gathering evidence of competence and independent assessment. And remember, documentation is key: if it is not written down, it didn’t happen. Not only must the product be safe, but you must be able to demonstrate the reasoning behind its safety. There is a saying in avionics that when the weight of the paperwork equals the weight of the plane it is ready to fly.
Video of the day: this shows some of the testing required before an airplane can fly. My understanding is that this test was done in the dark, with half the exits blocked, and nobody knows in advance which half. Regardless of the size of the plane, everybody must be off in less than 90 seconds. See https://www.youtube.com/watch?v=_gqWeJGwV_U
Next time: the functional safety requirements for Robots, Cobots and Mobots.
Hi Leonfun, I am not sure I understand your comment, but I will reply as best I can. First, identify the hazards. If a hazard can be addressed using a circuit that looks something like sensor + logic + actuator, then you are talking about functional safety. If instead the hazard can be addressed by using one of our digital isolator parts, such as an ADuM1310, AD7401 or ADuM4135, you are probably talking about electrical safety as opposed to functional safety - Tom
Hey Tom, would you please explain more about how to define a safety function during the hazard analysis of an isolated chip?