
EngineerZone Spotlight


In my last blog on soft errors I promised that the next blog would be on the topic of PFH and PFD. However, since that promise it occurred to me that I should first cover low and high demand mode.

Within IEC 61508 there are basically two types of safety functions, high demand and low demand. A high demand safety function is for a demand which occurs more often than once per year (e.g. once per day) and a low demand is for something which has an expected demand rate of less than once per year (e.g. once every 10 years).
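As a minimal sketch of the rule of thumb above, the classification can be written as a one-line check. The function name `demand_mode` is my own, and treating a rate of exactly once per year as low demand is an assumption about where the boundary falls.

```python
def demand_mode(demands_per_year: float) -> str:
    """Classify a safety function by its expected demand rate.

    More than one demand per year is treated as high demand; anything
    at or below once per year is treated as low demand (assumed boundary).
    """
    if demands_per_year < 0:
        raise ValueError("demand rate cannot be negative")
    return "high demand" if demands_per_year > 1.0 else "low demand"


print(demand_mode(365))     # once per day
print(demand_mode(1 / 10))  # once every 10 years
```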

Determining whether a safety function is low or high demand has implications including:

·        The key reliability metric – could be PFH or PFD (see next blog in series)

·        The suitable methods to determine the required SIL of any safety function

·        The measures which must be taken to prevent the introduction of design errors (systematic errors)

·        The diagnostic rate

There is no definition of “demand” in IEC 61508, but IEC TR 63161 defines a demand as an “event that causes the safety control system to perform the safety control function.” Within the process industry, a demand may also be referred to as a process upset or a process deviation.

IEC 61508 defines a third mode of operation called continuous mode, but the requirements are similar to high demand mode. In low and high demand modes two things need to happen for someone to get hurt: 1) the safety system needs to fail and 2) a demand must occur while the safety system is in the failed state. In continuous mode, an incident happens as soon as the safety system fails dangerously, as it is the safety system which is maintaining safety.

Figure 1 - Continuous mode vs demand mode according to ISO/TR 12489:2013

While IEC 61508, as a basic standard, needs to cover both low and high demand modes, this is not the case for sector specific standards. For instance, machinery only has high demand and process control has mostly low demand. Something like an airbag sub-system has both high and low demand safety functions, although ISO 26262 does not have modes of operation at all, making all safety functions effectively high demand. (The low demand safety function is to deploy the airbag when you crash; the high demand safety function is to prevent inadvertent deployment.)

Figure 2 - how to calculate demand rate from IEC 63161

The above figure is from a draft of IEC 63161, where the demand rate is calculated as DR = IR × Pr × Fr × (1 − AV).

In the next blog I will deal with the PFH (high demand) and PFD (low demand) metrics.

The demand rate can be used to determine the SIL in terms of the systematic requirements. Suppose the maximum acceptable risk is deemed to be 1e-5/y, and suppose only 1 in 100 events leads to a fatality. The event can then be allowed to occur 100 times more frequently, i.e. 1e-3/y, without exceeding the 1e-5/y number. Further suppose that the EUC (equipment under control) will fail only once every 5 years (0.2/y). Then the average probability of failure on demand of the safety system needs to be a maximum of 1e-3/0.2 = 5e-3, which is 1 in 200 and falls in the SIL 2 range according to IEC 61508-1:2010 table 2.

Therefore, the system needs to achieve an RRF (risk reduction factor) of 200 and meet the systematic requirements for SIL 2. Note that an RRF of 100 to 999 would give a SIL 2 requirement in terms of the systematic requirements (also known as systematic capability, SC 1 to SC 4). However, the PFH and PFD (see next blog) must still be sufficient to achieve a risk reduction factor of 200.
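The arithmetic in the worked example above can be sketched in a few lines. The function names are my own, and the band limits are the low demand PFDavg ranges as I read them from IEC 61508-1:2010 table 2, so treat this as an illustration rather than a SIL determination tool.

```python
def required_pfd_avg(tolerable_fatality_rate: float,
                     fatalities_per_event: float,
                     euc_failure_rate: float) -> float:
    """Maximum average probability of failure on demand (all rates per year)."""
    tolerable_event_rate = tolerable_fatality_rate / fatalities_per_event
    return tolerable_event_rate / euc_failure_rate


def sil_for_pfd_avg(pfd_avg: float):
    """Map a PFDavg onto the low demand SIL bands (lower bound inclusive)."""
    bands = [(1e-5, 1e-4, 4), (1e-4, 1e-3, 3), (1e-3, 1e-2, 2), (1e-2, 1e-1, 1)]
    for low, high, sil in bands:
        if low <= pfd_avg < high:
            return sil
    return None  # outside the tabulated ranges


# Numbers from the example: 1e-5/y tolerable risk, 1 in 100 events fatal,
# EUC fails once every 5 years (0.2/y).
pfd = required_pfd_avg(1e-5, 1 / 100, 0.2)
rrf = 1 / pfd
print(f"PFDavg <= {pfd:.1e}, RRF >= {rrf:.0f}, SIL {sil_for_pfd_avg(pfd)}")
```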

With regard to setting the diagnostic test rate, see IEC 61508-2:2010. In effect, for a non-redundant system it states that the sum of the diagnostic test interval (the inverse of the diagnostic test rate) plus the time to achieve the safe state should be less than the process safety time OR the ratio of the diagnostic test rate to the demand rate should be at least 100. For low demand safety functions there is no minimum diagnostic test rate, but once/day or once/shift is generally taken as conservative and should allow the hardware reliability metric (PFD) to be met.
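The two alternative conditions for a non-redundant system can be sketched as a simple check. The function name, parameter names and units are my own choices, and the ratio test is implemented as "at least 100", which is my reading of the requirement.

```python
def diagnostics_fast_enough(diag_interval_s: float,
                            time_to_safe_state_s: float,
                            process_safety_time_s: float,
                            demand_rate_per_hour: float) -> bool:
    """Return True if either condition described above is satisfied."""
    if diag_interval_s <= 0:
        raise ValueError("diagnostic test interval must be positive")
    # Condition 1: detect and react within the process safety time.
    within_pst = diag_interval_s + time_to_safe_state_s < process_safety_time_s
    # Condition 2: diagnostics run at least 100 times per expected demand.
    diag_rate_per_hour = 3600.0 / diag_interval_s
    ratio_ok = diag_rate_per_hour >= 100.0 * demand_rate_per_hour
    return within_pst or ratio_ok


# 50 ms test interval + 20 ms reaction against a 100 ms process safety time
print(diagnostics_fast_enough(0.05, 0.02, 0.1, demand_rate_per_hour=10))
# One test per hour against one demand per hour fails both conditions
print(diagnostics_fast_enough(3600, 1.0, 0.5, demand_rate_per_hour=1))
```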

This blog’s video discusses airbags – see

For next time, the discussion will be on the “PFH and PFD”.

If you wish to read more on the topic the following papers are good but you will need motivation and stamina to persist:

1)     SIL Determination – Dealing with the unexpected by Alan G King

2)     Reliability of safety-critical systems: Theory and application – section 2.6

3)     IEC 63161

4)     ISO/TR 12489:2013 clause 3.2.13 and others

5)     Functional Safety – an IEC 61508 SIL 3 compliant development process section E.7


Lithium-ion batteries are produced in a variety of form factors and capacities and are used to provide power in a range of applications, from cell phones and laptop computers to electric vehicles and energy storage. The cells used in these applications can have different chemistries and widely varying capacities, which calls for a system capable of meeting different voltage and current requirements. The AD8452 Precision Integrated Analog Front End, Controller, and PWM for Battery Test and Formation Systems can provide a flexible and scalable platform to adapt to various cell requirements.


ADI has developed a number of reference designs that have enabled customers to more easily adopt new formation & test capabilities. Now, the AD8452 4-Channel System Board is being developed to help accelerate the design of battery formation and test equipment while highlighting the versatility of the AD8452. This system will incorporate the adaptable feature set of the AD8452, allowing customers to quickly evaluate the performance benefits of a flexible multichannel system and reduce the time needed to bring new products to market.


This new reference design will demonstrate software configurability that allows multiple different operating modes. The Independent mode of operation will provide bidirectional operation of four power channels that can be individually adjusted to different current and voltage levels. This mode will allow energy from one or more batteries to be directly recycled to charge other batteries connected to the system. Parallel channel operating modes will also be enabled through software providing increased current output while maintaining energy recycling capabilities. All modes will allow the user to select various clock sources for synchronization between channels as well as adjust the individual phase of each channel to take advantage of decreased switching ripple and reduced Electromagnetic Interference (EMI).


Safety features will also be an important consideration of the AD8452 4-Channel System Board. The reference design will be able to disconnect the input bus voltage and battery during fault conditions.  Faults such as overvoltage, overcurrent, reverse battery connections, and over-temperature will be detected and appropriate actions will be performed to protect the system.


Given the number of different requirements for the manufacturing and test of batteries, battery formation and test equipment manufacturers face significant challenges supporting the production demands of the industry and the wide variety of cell capacities. The AD8452 4-Channel System Board aims to deliver a flexible and scalable reference design that customers can leverage to develop products that meet the demanding requirements of the battery formation and test industry now and in the future.


If you’d like to be notified when this reference design is published, please leave a note in the comments section. We’ll be sure to update you.


Soft Errors - Hard Facts

Posted by Tom-M Employee Jul 17, 2018

I am long overdue to include a blog on the functional safety requirements for software, and indeed the functional safety requirements for Verilog code; however, this isn’t that blog. By soft errors I mean bit flips in RAM or flip-flops (FF) that are not caused by hard errors and therefore disappear when the power is cycled. Previously soft errors were largely ignored and reliability predictions concentrated on hard errors, but once IEC 61508-2:2010 mentioned soft errors they could no longer be ignored. This is good because in parts with significant RAM the soft error rate can easily exceed the hard error rate by three orders of magnitude. However, even in parts with no RAM there can be a large number of FF, so every part will have some level of soft errors. Even analog circuits such as those using switched cap architectures can suffer from soft errors, but this is largely ignored given the relative scale of the problems.


Soft errors are largely caused by alpha particles from the packaging materials and by neutrons caused by galactic sources. At ground level the two contribute roughly equally. While the alpha particles cannot penetrate deeply into silicon, they come from directly on top of the die, so they are hard to shield against, although the literature suggests polyimide can help. Neutrons, on the other hand, are hard to shield against without using several meters of concrete or lead. Therefore, mitigation is needed either at the CMOS device level, the module level on the IC, the system level on the IC or the top-level system level.


IEC 61508-7:2010 advocates a value of 1000 FIT/megabit if you don’t have better information. The widely accepted Siemens SN29500 series of standards advocates 1200 FIT/megabit. In practice 1000 FIT/megabit is widely accepted. The optimum would be to test every IC, but that is still not uncontroversial, as you run into issues related to the many different types of FF used in a typical part, issues related to accelerated testing vs testing on top of a mountain, and discussions over the AVF (architecture vulnerability factor), whereby many of the soft errors never propagate to create a system failure.
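To get a feel for what 1000 FIT/megabit means, here is a rough sketch that scales the figure to a memory size. It assumes 1 FIT is one failure per 1e9 device hours, that a megabit is 1e6 bits, and that the AVF is a simple multiplier; the function names and the 2 megabit example are mine, not from any standard.

```python
HOURS_PER_YEAR = 8760


def soft_error_fit(ram_bits: int, fit_per_megabit: float = 1000.0,
                   avf: float = 1.0) -> float:
    """Soft error rate in FIT for a RAM of the given size.

    avf (architecture vulnerability factor) derates the raw rate for
    errors that never propagate; 1.0 means no credit is taken.
    """
    return ram_bits / 1e6 * fit_per_megabit * avf


def fit_to_per_hour(fit: float) -> float:
    """Convert FIT to failures per hour."""
    return fit * 1e-9


fit = soft_error_fit(2_000_000)  # hypothetical part with 2 megabit of RAM
print(f"{fit:.0f} FIT = {fit_to_per_hour(fit):.1e} per hour")
print(f"about one upset every {1e9 / fit / HOURS_PER_YEAR:.0f} years per device")
```

On these assumptions a single device looks harmless, but across a fleet of thousands of devices the same rate means upsets every few days, which is why the mitigation measures matter.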


At the CMOS device level, you could use hardened devices (triple well, SOI, extra capacitance), but the most common way to cope with soft errors is at the silicon block level by adding parity or ECC to RAM. A parity bit will detect if one bit flips in a protected byte or word. However, it cannot detect if two bits flip. If parity is combined with the physical separation of logically contiguous bits then this problem is overcome, as one particle should no longer flip two bits in the same word. ECC on the other hand can typically detect all one-bit and two-bit errors and most higher bit errors. The big advantage of ECC over parity is that it can recover from one-bit errors with no intervention required. For parity errors it is generally required to reboot the system to clear the error, but it depends on the end application. If neither ECC nor parity is available, the application designer can still mitigate against soft errors by storing critical values in two memory locations and comparing the results before using the values. This, however, tends to mess up the application code. Other options include using a two-channel system with comparison. This is somewhat similar to a CAT 3 or CAT 4 architecture from ISO 13849 as it is typically drawn. A dual core lockstep architecture achieves similar benefits.
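Two of the measures above are easy to illustrate in software: a parity bit, which catches a single flipped bit but misses a double flip, and storing a critical value twice and comparing before use. This is only a sketch; the names `parity_bit`, `parity_ok` and `DuplicatedValue` are my own, and real parity is normally done in hardware on each RAM word.

```python
def parity_bit(word: int) -> int:
    """Even parity: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def parity_ok(word: int, stored_parity: int) -> bool:
    return parity_bit(word) == stored_parity


class DuplicatedValue:
    """Keep a critical value in two places and compare before use."""

    def __init__(self, value: int):
        self.copy_a = value
        self.copy_b = value

    def read(self) -> int:
        if self.copy_a != self.copy_b:
            raise RuntimeError("copies differ: possible soft error")
        return self.copy_a


word = 0b1011_0010
p = parity_bit(word)
print(parity_ok(word ^ 0b01, p))  # one bit flipped: detected (False)
print(parity_ok(word ^ 0b11, p))  # two bits flipped: missed (True)
```

The comparison in `read()` is the part that "messes up" application code, since every use of the value has to go through it.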



Parts such as the ADSP-CM417F from ADI facilitate several of the above solutions. The on-chip RAM has ECC and physical separation, the RAM is built from multiple separate 32k blocks, and it contains two cores with evidence of sufficient separation available. Parts such as the AD7124 (24 bit sigma-delta ADC) contain an on-chip state machine which, at the end of the configuration state, stores a golden CRC; thereafter the state machine recalculates the CRC at an interval of less than 500 µs to check if any of the configuration bits have flipped. Both of these also illustrate the value of Safety Datasheets, whereby the end user gets extra information to facilitate doing a safety analysis, e.g. information on the physical separation of the logically contiguous bits in a RAM, or the fact that a RAM isn’t implemented as one big block but rather as several smaller blocks.
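The golden CRC idea can be mimicked in software for any block of configuration data. The sketch below is not how the AD7124 implements it: the class name `ConfigMonitor` is hypothetical and `zlib.crc32` is simply a convenient stand-in for whatever polynomial the hardware uses.

```python
import zlib


class ConfigMonitor:
    """Store a golden CRC after configuration and re-check it periodically."""

    def __init__(self, registers: bytes):
        self.registers = bytearray(registers)
        self.golden_crc = zlib.crc32(self.registers)  # taken once, after setup

    def check(self) -> bool:
        """Recompute the CRC; False means a configuration bit has changed."""
        return zlib.crc32(self.registers) == self.golden_crc


monitor = ConfigMonitor(b"\x12\x34\x56\x78")
print(monitor.check())          # configuration intact
monitor.registers[2] ^= 0x08    # simulate a single bit flip
print(monitor.check())          # flip detected
```

In a real system `check()` would be called from a periodic task, with the period chosen to suit the diagnostic test interval the safety function needs.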


The most famous recent case of soft errors came from automotive where it was mooted that a single bit flip could cause unintended acceleration. However other cases have included voting machine errors and electricity sub-stations shutting down.


If I had time I would have discussed soft errors and FPGA but perhaps that will do for another day.


This video is inappropriate in the sense that neutrinos don’t really interact with matter, but it is impressive and it shows that high energy particles are passing through all matter – see. I thought this video used to be longer but you get the idea; perhaps the longer version has been taken down.


This week there is a bonus video explaining how Tesla have decided to be radiation tolerant instead of radiation hardened – see. It is also a good example of taking care of safety at the system level.

Good books on the topic of soft errors if you want to learn more include

1) “Soft errors in modern Electronic systems”

2) “Architecture design for soft errors”

For next time, the discussion will be on the “PFH and PFD”.


Is there a secret sauce for functional safety? I have been asked: if there was only one thing from functional safety that I think should be applied to every product development, what would it be?

My answer: rigorous requirements management.

Requirements management is important for any project but functional safety takes it to a new level and insists on minimum standards.

In theory, requirements capture and management are easy, and if you read the books on the topic it all looks good. However, when you try to apply it you will often run into problems. The advent of tools such as Jama and DOORS has taken some of the hassle out of requirements management, but there is still a lot of work to do.

All functional safety standards have requirements related to the properties of individual requirements and to the set of requirements.

Individual requirements need to have properties such as

  • Correct – technically and legally possible
  • Feasible – can be accomplished within cost and schedule
  • Clear – unambiguous and not confusing
  • Verifiable – it can be determined that the system meets the requirements
  • Traceable – can be uniquely identified and tracked
  • Abstract – does not impose a specific design solution on the layer below

While the set of requirements needs to have properties including

  • Complete – expresses a whole idea, all requirements are present
  • Consistent – no conflicting requirements
  • Unique – each requirement is only specified once
  • Modular – requirements that belong together are put close together
  • Structured – top down and traceable

There are differences between the standards, such as the avionics standard DO-178C looking for one requirement for every 25 lines of code and no unreachable or dead code, but in general they all require the above.

Aside from the properties of the requirements there are requirements in IEC 61508 that insist on requirements traceability such as the table below from IEC 61508-3:2010.

For an IC, forward traceability implies that you should be able to trace from the top-level requirements to the block on silicon which implements each one and the test on real silicon which verifies that it was met. The purpose is that if it was worth writing down a requirement then it is worth making sure it has been implemented. In turn, if you have design blocks which don’t trace back to top-level requirements, it might represent gold plating on the part of the designer or missing top-level requirements.

Backwards traceability means that you can trace back from a test performed on the silicon to the top-level requirement which required that test. Amongst the advantages of backwards traceability is the fact that if you discover a problem with a test you can immediately know which top-level requirements might be impacted. If you have a test that doesn’t trace back to a top-level requirement, it demonstrates that you are probably missing a top-level requirement; otherwise you are doing unnecessary testing. Tools such as Jama can provide compliance matrices to demonstrate that there are no untraceable requirements.
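The forward and backward checks described above boil down to a few set operations. The sketch below is not how Jama or any other tool works internally; the function name, the data layout and the example identifiers are all hypothetical.

```python
def trace_gaps(requirements, design_trace, test_trace):
    """Report traceability gaps in both directions.

    design_trace and test_trace map a design block or test name to the
    set of requirement IDs it claims to implement or verify.
    """
    reqs = set(requirements)
    implemented = set().union(*design_trace.values())
    verified = set().union(*test_trace.values())
    return {
        # Forward: every requirement needs a design block and a test.
        "requirements_without_design": sorted(reqs - implemented),
        "requirements_without_test": sorted(reqs - verified),
        # Backward: blocks or tests that trace to no known requirement.
        "design_blocks_without_requirement": sorted(
            b for b, r in design_trace.items() if not set(r) & reqs),
        "tests_without_requirement": sorted(
            t for t, r in test_trace.items() if not set(r) & reqs),
    }


gaps = trace_gaps(
    requirements=["REQ-1", "REQ-2", "REQ-3"],
    design_trace={"adc_core": {"REQ-1"}, "crc_engine": {"REQ-2"}, "spare_dac": set()},
    test_trace={"t_adc_accuracy": {"REQ-1"}, "t_legacy": {"REQ-9"}},
)
for name, items in gaps.items():
    print(name, items)
```

Here `spare_dac` is the possible gold plating and `t_legacy` is either unnecessary testing or evidence of a missing top-level requirement.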

One place where I believe most of the standards fall down in terminology is in mixing up the terms requirements and specifications. Requirements express what needs to be achieved and specifications express one means of achieving it. Many of the standards however use the terms interchangeably and even use terms such as requirements specifications.

An interesting future talk might be functional safety and Agile where requirements management needs to be considered.

Apologies if I have used the six red lines video already but it is good so I will take the risk -

For next time, the discussion will be on the “soft errors”.


I recently came across a series of standards for lifts (elevators), escalators and moving walkways. I thought it was a good example of a sector specific standard. The standard in question is ISO 22201-1 and was first published in 2017. It uses SIL levels, lists fifty-one safety functions and gives a SIL for each one, with a maximum SIL of 3. This is one of the nice things about a sector specific standard, if one exists. It takes the generic IEC 61508 standard and identifies the requirements which are applicable to that equipment type. It is much easier to read a standard related to one specific type of equipment, as it is only about 60 pages long, instead of a generic standard which can span up to 700 pages. Part 2 of ISO 22201 covers escalators and moving walkways.


Even if you have no interest in lifts/elevators hopefully you will find the details of what a sector specific standard looks like interesting.


The first part of any standard you should read is the scope and I note the scope of this standard includes both passenger and goods lifts and their use in hotels, homes, factories and hospitals. It refers to the safety systems as PESSRAL (programmable electronic systems in safety related applications for lifts).


Similar to the home appliance standards UL 1998 and IEC 60730 it gives various architectures and their suitability for various SIL: 


  • One channel with self-test for SIL 1
  • One channel with self-test and monitoring for SIL 2, where monitor means a separate diagnostic block
  • Two channels with comparison for SIL 2 and SIL 3


I note the table describing these architectures describes them as “possible measures for failure control”, which I like, as in theory IEC 61508 allows a single channel system to SIL 3 if the diagnostic coverage is high enough. Achieving 99% diagnostic coverage is difficult, but it is still good not to “tie a designer’s hands”.


However, there appears to be no such choice with the SIL. Other standards, such as the robot standard ISO 10218-1, require SIL 2 with an HFT of 1 or PL d, CAT 3 unless a risk assessment shows otherwise, but here there are no “ifs or buts”: a given safety integrity level is mandated.


Safety functions specified at SIL 3 include safety functions to check for:

  • Loss of tension in the compensation means
  • Working platform is fully retracted
  • Loss of tension in the governor rope or car safety rope
  • Car or landing door, or car or landing door panels are open


Safety functions at a SIL 1 level include:

  • Detects loss of DC hoist motor field running current
  • Detects if car safety gear is actuated
  • Detects an engaged clamping device


A separate table gives the safe state for each of the 51 safety functions.


The standard contains two annexes, one of which is normative and one of which is informative. (A normative annex gives requirements while an informative one contains only guidance.) The informative Annex A allows two routes: one using the measures from IEC 61508-2 and IEC 61508-3, or alternatively a tailored approach based on the contents of the Annex.


Items which came to my notice when reading Annex A include


  • No requirement to consider a combination of two or more faults
  • A means to claim a fault exclusion for shorts on a PCB
  • A minimum separation distance on a PCB of 3 mm (clearance) and 4 mm (creepage) if a safety and non-safety function are on the same PCB
  • Specific safety accuracies in the range of +/-1% to +/-5% for things like the measurement of masses, forces, distances, speeds, voltages, currents, temperature and accelerations
  • Requirements for protection against all odd bit, 2 bit and some 3 bit failures in variable memories for single channel safety functions even at SIL 1


I note there is no mention of cyber security which means for guidance you must fall back on IEC 61508 which in turn refers you to the IEC 62443 series which was covered in an earlier blog.


Also, not covered are network requirements. Therefore, the reader needs to revert to IEC 61508 and then IEC 61784-3. Typically, this means that 1% of the allowed PFH (probability of dangerous failure per hour) would apply to the network which is a failure rate of 1e-9/h for a SIL 3 safety function. See my previous blog on functional safety for networking for more information.
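The 1% network budget is simple arithmetic on the upper PFH limit for each SIL. In the sketch below the limits are the high demand band maxima as I read them from IEC 61508-1:2010, and the names `PFH_LIMIT_PER_HOUR` and `network_pfh_budget` are my own.

```python
# Upper PFH limit (per hour) for each SIL in high demand / continuous mode
PFH_LIMIT_PER_HOUR = {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8}


def network_pfh_budget(sil: int, network_share: float = 0.01) -> float:
    """Portion of the safety function's PFH allocated to the network."""
    return PFH_LIMIT_PER_HOUR[sil] * network_share


for sil in (1, 2, 3):
    print(f"SIL {sil}: network budget {network_pfh_budget(sil):.0e} per hour")
```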


My video for this blog shows a dangerous situation on a ski lift – I hope nobody was seriously injured as it looks bad-


For next time, the discussion will be on the “If I had to keep one thing from Functional Safety”.

While I am the functional safety guy for Analog Devices’ industrial products, I find it useful to read books related to many application areas. Often what is poorly explained in one book is very well explained in a book from another domain. I find a similar thing with the standards themselves. I imagine that some of the books I like, others will not find useful. It depends a lot on your level of knowledge, what you were hoping to find and your background.


I get my books from as I like to read my books on the Kindle app these days. With the Kindle app you can search for something and easily highlight important bits of text, and my book press stops growing. However, many of the books referenced from IEC 61508 are old and only paper copies are available, and then only second hand.


If I had to pick two of the below to start with it would be the two free Rockwell automation books.



The first functional safety book I read was Safety Critical Systems Handbook by David J. Smith and Kenneth G.L. Simpson. As a result of reading the book I attended David Smith’s training in the UK. If you read the first half of the book it gives a very quick and easy introduction to the topic.


The Functional Safety Lifecycle

Functional Safety – An IEC 61508 SIL 3 Compliant Development Process by Michael Medoff & Rainer Faller is an excellent book. Sections I particularly liked were those on derating and the quantitative analysis of failure rates on interfaces.


The System Safety-Lessons Learned in Safety Management and Engineering by Terry L. Hardy illustrates the importance of putting in the safety effort where it actually adds value.


CENELEC 50128 and IEC 62279 Standards by Jean-Louis Boulanger. While this concentrates on rail, I decided to put it in the functional safety process section. I found it had lots of good insights, and chapter 6 on “Data preparation” is good on parameter based systems.


The Checklist Manifesto – How to Get Things Right by Atul Gawande is not a functional safety book at all. Gawande expresses the value of checklists.


In the interests of Safety – The Absurd Rules that Blight our Lives and How We Can Change Them by Tracey Brown and Michael Hanlon. Not a functional safety book at all but it teaches you a lesson on how to use common sense as opposed to blindly following the letter of the rules.


Requirements Engineering by Elizabeth Hull, Ken Jackson and Jeremy Dick is a good introduction to the topic. I really like the example of requirements traceability involving A4 pages, a big room and lots of string.


Configuration Management – Best Practices by Bob Aiello and Leslie Sachs is a good explanation of a topic that is covered in the standards as if everybody already knew how to implement it.



Reliability Maintainability and Risk by David J. Smith is a great effort to explain the maths behind functional safety in as readable a way as possible for such a topic.


Control systems Safety Evaluation and Reliability by William M. Goble has nice big writing, lots of pictures and chapter 9 on diagnostics has the best explanation I have seen on Markov analysis.



Better Embedded System Software by Philip Koopman does not claim to be a functional safety book at all and is now hard to get. However it has great chapter names such as “Global variables are Evil” and all that is in it is very relevant to functional safety.


Software for Dependable Systems – Sufficient Evidence – a short but interesting book


Embedded Software Development for Safety-Critical Systems by Chris Hobbs – is also a good book with lots of interesting insights.


The Leprechauns of Software Engineering by Laurent Bossavit is a nice light book to read on an airplane and tries to find the source of many software myths.


Sector specific books

Process Safebook 1-Functional Safety in the Process Industry is a free book available in PDF or paper form from Rockwell automation.  It runs to 168 pages.


Safe Book 4 – Safety Related Control Systems for Machinery is another free book from Rockwell automation.


BGIA Report 2/2008e – Functional Safety of Machine Controls – Application of EN ISO 13849 is technically not a book but rather a free download. However it runs to over 400 pages and deals with everything related to ISO 13849 so I had to include it.


Functional Safety in Practice by Harvey T. Dearden is focused on automotive functional safety but has some good insights, even if the allusion to Russian roulette on the front cover is somewhat confusing.


Basic Guide to (Automotive) Functional Safety by Thorsten Langenhan has lots of English grammar mistakes but is still an insightful read.


Avionics Certification by Vance Hilderman and Tony Baghai is an encouraging book. If the requirements from functional safety seem impossible to achieve, have a read and you will feel better.


Cyber Security

If you are not secure then you can’t be safe. Therefore learning about cyber security is also important.


Embedded Systems Security by David Kleidermacher and Mike Kleidermacher is a book I want to read again.


Industrial Network Security by David J. Teumim is a short but good introduction.



Video of the Day: Finding a relevant video took a bit of thinking – this is my best effort:


For next time, the discussion will be on the “Functional Safety for Elevators”.

In my last post I discussed cyber security and functional safety and said if you are not secure, then you are not safe. The main non-sector specific functional safety standard is IEC 61508. Within IEC 61508 it references IEC 62443 for security. IEC 62443 is entitled “Security for industrial Automation and Control systems” or “Industrial communication networks – Network and system security” depending on where you look. At last count it consisted of 13 parts and almost 1000 pages. The standards are being developed and published via the ISA (International Society of Automation) committee ISA99 and the IEC (International Electrotechnical Commission) committee IEC TC 65. IEC TC 65/SC 65A also publishes the functional safety standards IEC 61511 and IEC 61508, which is our first clue that the two areas might be related.


The four parts of IEC 62443-1-X deal with general concepts including concepts and models and a glossary of terms and conditions. The four parts in IEC 62443-2-X deal with policies and procedures including patch management while IEC 62443-3-X has three parts dealing with system level topics including the choosing of the correct SL (security level). The two parts of IEC 62443-4-X are probably the most interesting to companies like Analog Devices and our customers as these relate to component suppliers, with one part covering the life cycle requirements and the other the technical requirements. 


A key concept within the IEC 62443 series is that of zones and conduits. Put in simple language a zone contains nodes with similar security requirements and a conduit is a link between zones.

A similarity with functional safety is that IEC 62443 nominates four SL (security levels) which sound very similar to the four SIL from IEC 61508 (another clue to the links).  However, there is no one to one correspondence between SL and SIL. The definitions of the SL are contained in IEC 62443-1-1 and are shown below.



The definitions concentrate more on what is required to hack the system than the likelihood or probability of the system being hacked. There are alternate definitions given in various articles such as one which states that SL 4 is designed to prevent a nation state level attack. The tables in part 3-2 of the standard expand somewhat on the above using a combination of impact and likelihood to determine the required SL.


IEC 62443-1-1 defines seven foundational requirements (FR) to achieve a given SL. These are

  • Identification and authentication control (IAC)
  • Use control (UC)
  • System integrity (SI)
  • Data confidentiality (DC)
  • Restricted data flow (RDF)
  • Timely response to events (TRE)
  • Resource availability (RA)


These seven FR can be expressed as a vector so that [1,1,1,1,1,1,1] represents each of the above seven FR implemented to a SL 1 level of rigour. From a purely functional safety point of view you can then argue that data confidentiality, restricted data flow and resource availability are not so important and a SL 1 implementation is sufficient for them. Therefore, the required security vector for a safety system becomes [X,X,X,1,1,X,1] where X represents a SL of at least one.
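The security vector idea maps naturally onto a small data structure. The sketch below builds the [X,X,X,1,1,X,1] pattern for a chosen X and checks a component's achieved vector against it; the function names and the element-by-element comparison are my own illustration, not something taken from IEC 62443.

```python
FOUNDATIONAL_REQUIREMENTS = ("IAC", "UC", "SI", "DC", "RDF", "TRE", "RA")
# FRs argued above to need only SL 1 from a purely functional safety view
RELAXED_FOR_SAFETY = {"DC", "RDF", "RA"}


def safety_security_vector(target_sl: int) -> list:
    """Build the [X,X,X,1,1,X,1] vector for a given X (1 to 4)."""
    if not 1 <= target_sl <= 4:
        raise ValueError("security level must be between 1 and 4")
    return [1 if fr in RELAXED_FOR_SAFETY else target_sl
            for fr in FOUNDATIONAL_REQUIREMENTS]


def meets_vector(achieved, required) -> bool:
    """True if every FR is implemented to at least the required SL."""
    return all(a >= r for a, r in zip(achieved, required))


required = safety_security_vector(2)
print(dict(zip(FOUNDATIONAL_REQUIREMENTS, required)))
print(meets_vector([2, 2, 1, 3, 3, 2, 3], required))  # SI falls short
```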


If developing an IC or a piece of equipment, once you have determined the required SL you then proceed to IEC 62443-4-1 and IEC 62443-4-2. IEC 62443-4-1 tells you the process steps necessary under eight headings, including security management and having a defense-in-depth strategy. The requirements are given independent of the SL. IEC 62443-4-2 gives you requirements under the heading of the seven FR, with additional requirements depending on whether it is an application, an embedded device, a host device or a networked device. According to IEC 62443-4-2 the necessary requirements depend on the SL.


Part 4-2 provides requirements for 4 types of components with 47 requirements in total depending on the SL.


There is now a certification scheme in place for IEC 62443 (see ISASecure), and the various TUV and Exida also offer certification.


Video of the Day: This video from Siemens highlights some of the issues and has dramatic music which I like in a video -


For next time, the topic will be functional safety: recommended reads.

Nearly all horse racing fans and even most casual sports fans know the name Secretariat. The thoroughbred shattered track records on the way to a Triple Crown sweep in 1973. What many people may not know is that Secretariat didn’t break the track record at the Preakness Stakes, at least not officially, until almost 40 years later.


Analog Devices engineer, Tom Westenburg, helped set the record straight. You may remember Tom from a February blog where he talked about his experience with ensuring the accuracy of the timing systems used at many of the Winter Olympic sliding tracks.


With this year’s Preakness Stakes just a few days away, Tom was kind enough to share his story about Secretariat and the record that almost wasn’t. Here’s Tom’s account:


When Secretariat won the 1973 Preakness Stakes, the official time was 1:55, which appeared to be around 2 seconds longer than what hand-held timers recorded (1:53.2). You can read more about what happened and why Secretariat didn’t hold the track record even though the horse should have here and here.


Back in 1973 I was a teenager mowing a woman’s lawn when this race happened. She insisted that I come inside and watch it. She had grown up in Kentucky and her family had raced horses, so she filled me in on all the “behind the scenes” details and how, if Secretariat won the Preakness, he’d probably be a Triple Crown winner. There was something different about the Preakness, I think length and/or surface of the track. As I watched her jump and scream as Secretariat won, I had no idea that I’d be working to correct a bad time 39 years later.


My work would involve reviewing the videotape of the race to determine what time I thought was correct, and why. I had a piece of software that could count fields of a video. A frame is made up of two fields, so using fields gave me twice the time resolution, or 1/59.94 Hz. My plan was to count frames, then calculate the worst possible time error, multiply it out and I’d be done. It wasn’t that easy. I also thought it’d be easy to find specifications for 1973 (or older) equipment, and locate the older NTSC standards. I wanted to find out how tight the 16.683 ms field rate was. I was able to find specifications from that era, but not specifically for 1973. However, when I went through the math on oscillator and thermal drift errors, that error was minimal. Trying to count frames was much more difficult than I expected. The cameras at the start and finish were not perpendicular to the track, so I had to estimate. I also had to interpolate between fields. As it turns out these were the largest error sources. I came up with 1:53.08, with a range of 1:53.00 to 1:53.15. There was no question in my mind that 1:55 was incorrect.    
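
The field-counting arithmetic described above can be sketched as follows. This is only an illustration: the field count of 6778 and the four-field uncertainty are hypothetical values chosen so the result lands near the published figure, not the numbers used in the testimony. The only fixed fact is the NTSC field rate of 60000/1001 fields per second (about 59.94 Hz, or 16.683 ms per field).

```python
from fractions import Fraction

# NTSC colour video runs at 60000/1001 fields per second (about 59.94 Hz),
# so one field lasts 1001/60000 s (about 16.683 ms).
FIELD_PERIOD_S = Fraction(1001, 60000)

def fields_to_seconds(field_count):
    """Elapsed time covered by a whole number of video fields."""
    return float(field_count * FIELD_PERIOD_S)

def time_window(field_count, uncertainty_fields):
    """Return (earliest, nominal, latest) time for a count known to +/- some fields."""
    nominal = fields_to_seconds(field_count)
    margin = fields_to_seconds(uncertainty_fields)
    return nominal - margin, nominal, nominal + margin

def format_race_time(seconds):
    """Format seconds as m:ss.ss, the way race times are usually written."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:05.2f}"

# Hypothetical count, for illustration only.
low, nominal, high = time_window(6778, 4)
print(format_race_time(low), format_race_time(nominal), format_race_time(high))
```

A symmetric window like this is a simplification. The range quoted above is not symmetric, because the dominant errors came from camera angle and interpolation between fields rather than from the field clock itself.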


While doing this I learned that a horse race does not start when the gate doors open; it starts down the track. This short stretch is called the “run-up.” It varies from track to track, and can be as long as 375 feet from the starting gate. At the Preakness the run-up is around 150 feet. I also learned that in the past, horse racing was timed to one fifth of a second, or 0.2 s resolution, (1:53 0/5th s). Tracks today are timed to 1/100 s, or 10 ms.


In my written testimony, I speculated on what could have caused the error. It seemed likely that the start-timing light started the timer early. It could have been from a bright glint of light (sun on a mirror or a camera flash) saturating the receiver and causing it to trip erroneously. After the testimony, someone who was there at the race in 1973 told me that a man ran out onto the track to pick up a piece of trash that blew onto the track about the time the gates opened. He left the track near the start-timing light. This is my revised theory as to what happened, but we’ll never know for sure.


This was the third hearing attempting to correct Secretariat’s time and track record. Penny Chenery (Secretariat’s owner) was getting up in years and she was determined to correct this before she died. I never met Penny during this, but she did write me a very nice letter after the time and record were corrected and thanked me for my assistance. I would have liked to have met her; she seemed like a very interesting and unique person. I received her thoughts and concerns through Leonard Lusky, who was working meticulously to put everything together. He did a great job of laying out and building the case to get the Maryland Racing Commission to understand and change Secretariat’s time. Penny died September 16, 2017 at 95 years of age. I’m very happy I could be part of this, and that everybody involved may have given her a little peace of mind that things were set straight before she passed.


Functional Safety & Security

Posted by Tom-M Employee May 8, 2018

Functional safety concentrates on protecting people, assets and the environment from inadvertent harm caused by non-malicious actors, for instance by bad planning, bad implementation, a bad set of requirements or random failures.


Cyber security on the other hand concentrates on harm caused by malicious actors. Somebody deliberately causes the system to fail in a way that brings some advantage to them.


Given that functional safety concentrates on “accidents” and “mishaps” and security deals with deliberate “hacks”, you do need to think about it somewhat differently. For instance, it is more important to think about what is possible as opposed to what is probable.


In many languages there is only one word to cover both safety and security. For instance, in German it is Sicherheit. Therefore, I generally try to remember to say “cyber security” instead of security to make it clear which I mean.


All systems with functional safety requirements have security requirements. At a minimum in functional safety you must protect against foreseeable misuse and somebody hacking the system comes under that category. There will be lots of systems with security requirements which are not safety relevant. Therefore, systems with functional safety requirements are a subset of systems with security requirements.



Sometimes the root cause of both safety and security concerns is the same. Suppose you have 1,000 lines of code and it contains a single design error. If you only consider safety then that buggy line of code may never be executed or may execute at a time when the bug doesn’t matter. However, a hacker becoming aware of such a bug can try to exploit the situation so that the dodgy line of code is always executed.


Like functional safety, cyber security comes with its own terminology. You have attack surfaces, PUFs (physically unclonable functions), and side channel and glitch attacks. Some of the security requirements, such as threat assessments, parallel things like a hazard analysis in functional safety. Also, there are procedures for setting a target security level which are quite like those for SIL determination. Perhaps the biggest similarity, though, is that both are emergent system level properties, and it is very hard if not impossible to add security or safety afterwards to an already designed system.


Within Analog Devices we are lucky in that some years ago we acquired the Cyber Security Solutions (CSS) business of Sypris Electronics, based in Tampa, Florida, which has become the Trusted Security Solutions group within Analog Devices. As the safety guy I do need to know something about security, but it is good to have the real experts on call.


  • Regular security patches are generally not possible on the factory floor for fear of upsetting production
  • Many of the nodes used in industrial systems are resource constrained, with RAM often << 1 megabyte
  • The equipment lifetimes can often be twenty years or more
  • Some of the controller equipment is dangerous and can cause harm
  • Much of the equipment is time critical and security can add big time overhead
  • Many of the protocols are proprietary


An interesting example: if you have a nuclear shutdown system, is it appropriate to lock out the safety guy from the shutdown system if he gets his password wrong three times?


Within industrial circles the most famous cyber security incident is the Stuxnet virus. It was designed to target the Iranian nuclear enrichment program via a Siemens S7 PLC. It is believed to have been written and deployed by state-level actors. There is an excellent documentary film on the topic called “Zero Days”.


This blog’s video is the trailer for Zero Days – see


IEC 61508 references the IEC 62443 series for cyber security requirements. Therefore, for my next blog, the discussion will be on “The IEC 62443 series of cyber security standards”.




3GPP declared a major milestone for 5G this past December by announcing the approval of the first 5G New Radio (NR) specifications. But even after that formal milestone, the members of 3GPP will spend at least the next six months finishing additional required details of the 5G specification.


While the specifications for the radio are approaching completeness, the test specifications were barely started when the announcement was made. Test specifications are an important part of the overall 3GPP output as they are adopted by certification bodies to certify user equipment (UE). RAN5 is the working group within 3GPP which has the task of detailing the UE test specifications, also known as conformance specifications. These specifications include the various well-known tests such as RF transmit and receive power, waveform quality, occupied bandwidth, adjacent channel leakage, etc. There are also protocol specifications, yet to be written, that define the behavioral performance of signaling between the phone and network.


As of March 2018, 3GPP RAN5 had established the skeletons of the test specifications as well as significant detailing of some aspects of the specifications. These test specifications are pre-release documents and can be seen as very early due to the very frequent use of “TBD” (to be determined) and “FFS” (for future study)—these are known unknowns that are placeholders for future values.    



  • User Equipment (UE) conformance specification; Part 1: Common test environment
  • Special conformance testing functions for User Equipment (UE)
  • User Equipment (UE) conformance specification; Applicability of RF and RRM test cases
  • UE conformance specification; Part 1: Protocol conformance specification; RAN5 doc
  • User Equipment (UE) conformance specification; Radio transmission and reception; Part 1: [Frequency] Range 1 Standalone
  • User Equipment (UE) conformance specification; Radio transmission and reception; Part 2: [Frequency] Range 2 Standalone
  • User Equipment (UE) conformance specification; Radio transmission and reception; Part 3: NR interworking between NR range1 + NR range2; and between NR and LTE
  • User Equipment (UE) conformance specification; Radio transmission and reception; Part 4: Performance requirements

5G UE test specifications produced by 3GPP RAN5 (source:


An interesting aspect of testing 5G, and a concern for the industry, is how to test beamforming in base stations and mobile phones while the system is actively scanning and tracking the 3-D sphere for energy. A new function defined for 5G NR that helps with this challenge is called “beamlock” (see specification 38.509). The beamlock function forces the UE to freeze its beamforming pattern so that testing can occur. The receive and transmit beam patterns can be frozen independently. This test function is not meant to be used in regular operation. The presence of this function reinforces that the world is quite different when it comes to testing a beam-formed millimeter wave system: the yet to be defined over-the-air (OTA) tests will be far different from those of past wireless generations. And it is certain to be a challenging task finding agreement on these tests given the highly varied opinions of RAN5 attendees and the complex technical nature of the problem.


The early nature of the test standards coupled with the complexity of OTA suggests a substantial amount of work in order to complete the test specifications by 3GPP’s goal of the end of 2018. In order to achieve the goal, 3GPP RAN5 will continue its world tour, hosting 2018 meetings in Korea, Sweden and the United States.  In addition, there will likely be many discussions over phone and email to complete the work.  


As we look forward to the future, we await 3GPP’s detailing of the test specification, especially in the aforementioned area of OTA. With beamforming and tightly integrated device antennas, we expect the R&D, type approval, and production tests to have a dramatic increase in the amount of OTA testing compared to prior generations.


Follow EngineerZone Spotlight to receive updates when new blogs about 5G or other interesting topics are published.


Functional Safety and Networking

Posted by Tom-M Employee Apr 25, 2018

After writing my blog on the functional safety requirements for robots, cobots and mobots I thought it would be interesting to tackle functional safety requirements for networking. The two topics are linked as most robots will be networked as robots are an important part of Industrie 4.0.


Mentions of networking within IEC 61508 are few, with only IEC 61508-2:2010 clause 7.4.11 offering much guidance. It describes a white channel and a black channel approach and refers the user to IEC 61784-3 or the IEC 62280 series. Using the white channel approach, the entire network, including the communication devices at both ends, is developed to the relevant functional safety standards. This would be a lot of work and limit the use of standard networking components. The more common approach is the use of the black channel, where no assumptions are made about the channel and safety is taken care of with an additional SCL (safety communication layer) in the application software. This SCL is developed to the safety standards but everything else in the communication system is just a standard component. The picture below is taken from the IEC 61784-3 standard.


IEC 61784-3 is a fieldbus standard and the IEC 62280 series (also known as EN 50159) covers trains. EN 50159 gives a series of threats and a list of possible defenses against those threats. For each threat the SCL must implement at least one defense, see below.
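
To make the black channel idea concrete, here is a minimal sketch of an SCL that wraps each application message with a sender identifier, a sequence number and a CRC, and rejects anything that fails a check on receipt. The frame layout, the names `scl_wrap` and `SclReceiver`, and the use of the CRC-32 from Python's `zlib` are all assumptions made for illustration. Real safety profiles define their own frame formats and safety codes, and sequence number wrap-around is ignored here.

```python
import struct
import zlib

HEADER = struct.Struct(">HI")   # hypothetical layout: 16-bit sender id, 32-bit sequence number
TRAILER = struct.Struct(">I")   # 32-bit CRC computed over header + payload

def scl_wrap(sender_id, seq, payload):
    """Add the safety header and CRC before handing the frame to the black channel."""
    body = HEADER.pack(sender_id, seq) + payload
    return body + TRAILER.pack(zlib.crc32(body))

class SclReceiver:
    def __init__(self, expected_sender):
        self.expected_sender = expected_sender
        self.last_seq = None

    def unwrap(self, frame):
        """Return the payload, or None if any safety check fails."""
        if len(frame) < HEADER.size + TRAILER.size:
            return None
        body, trailer = frame[:-TRAILER.size], frame[-TRAILER.size:]
        (crc,) = TRAILER.unpack(trailer)
        if crc != zlib.crc32(body):
            return None  # corruption detected by the safety code
        sender_id, seq = HEADER.unpack(body[:HEADER.size])
        if sender_id != self.expected_sender:
            return None  # message from an unexpected source
        if self.last_seq is not None and seq != self.last_seq + 1:
            return None  # repeated, lost, inserted or out-of-order message
        self.last_seq = seq
        return body[HEADER.size:]
```

Nothing in the sketch depends on how the frame travels between the two ends, which is the point of the black channel: the transport can be any standard component because every check lives in the SCL.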


For safety of machinery the time-out defense is of particular interest. It effectively implements a watchdog timer so that if, for instance, a robot receives no communications then after a specified interval the robot is taken to its safe state.
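
A time-out defense of this kind might look like the sketch below. The class name `CommsWatchdog`, the `control_step` helper and the idea of passing in an `enter_safe_state` callback are hypothetical, chosen only to show the shape of the logic. A real implementation would run on the safety controller and be developed to the relevant standard.

```python
import time

class CommsWatchdog:
    """Tracks how long it has been since the last valid message arrived."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested
        self.last_rx = clock()

    def message_received(self):
        # Call this only for messages that passed every other safety check.
        self.last_rx = self.clock()

    def expired(self):
        return self.clock() - self.last_rx > self.timeout_s

def control_step(watchdog, enter_safe_state):
    """Run once per control cycle; returns False once the safe state is commanded."""
    if watchdog.expired():
        enter_safe_state()
        return False
    return True
```

Using a monotonic clock is a deliberate choice: wall-clock time can jump when it is adjusted, which could either mask a genuine time-out or trigger a false one.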


Also, Table B.2 of EN 50159 is of interest. It lists various categories of networks and identifies each of the threats as either negligible, needing some protection or needing strong countermeasures. A Category 1 network might be considered as the closed network within a robot or cobot or perhaps the interface between an analog to digital converter and a local micro-controller. A Category 1 network has a known fixed maximum number of users and limited opportunity for unauthorized access. A Category 3 network on the other hand might be something like a wireless network, which typically has a lot more opportunities for unauthorized access than a wired network.


The white channel approach is not widely used, but I wonder whether new requirements such as those for TSN (time-sensitive networking) will change that. This might be a good topic for a future blog.


I have struggled to find a good video related to functional safety and networking – this one is even more tenuous than normal. For anyone who doesn’t spot the link – leave a comment in the comments section and I will get back to you – see


Actually, this week there is a bonus video which discusses how to decide if your CRC is good enough. It shows how to combine the Hamming distance of the CRC, the expected bit error rate of the network, the number of bits transferred per second and the required SIL to determine if your CRC is good enough to meet the PFH requirements from IEC 61508 or indeed ISO 13849 – see


Follow EngineerZone Spotlight to be notified of new safety blogs.

This might be my shortest blog yet. Artificial intelligence comes by many names including machine learning. Systems that understand handwriting are not referred to as AI but rather optical character recognition systems. Deep learning on the other hand is an AI technique. AI can be part of many systems but is not an end in itself.

Anyway, here is the key guidance from the generic functional safety standard IEC 61508-3.

The use of AI is not recommended at any SIL greater than SIL 1. At SIL 1 it is neither recommended nor not recommended. For guidance the definition of not recommended is given below from Annex A of IEC 61508-3:2010.

One of the main objections to AI is that it is overly complex. Functional safety loves simplicity. To quote the book “Software for dependable systems”. Actually, I searched the book but couldn’t find the quote. I googled it and found it again in the book “Code Complete” and attributed to C. A. R. Hoare – “There are two ways of constructing a software design: one way is to make it so simple there are obviously no deficiencies, and the other is to make it so complicated there are no obvious deficiencies”.  I see their point, when you consider that a deep learning algorithm might need to crash a car into a tree 50,000 times before it figures out it is a bad idea. A kid on a tricycle generally figures it out after the first or second crash. Non-determinism is hard to accept for safety. To give a second quote from the above book – “essential that developers are familiar with best practices and deviate from them only for good reasons”.

I can find no mention of AI in the automotive functional safety standard ISO 26262 and therefore in theory the guidance for automotive would fall back to IEC 61508. Yet there appears to be widespread use of AI within new automotive technology. I haven’t yet read all of ISO 26262 revision 2 (expected release 2018) but I must discuss this with my automotive functional safety colleagues within Analog Devices. Perhaps AI is only proposed for driver assist as opposed to safety applications. Perhaps it will be somehow covered by the new SOTIF standard (safety of intended functionality).  I feel the benefits of AI may become so great that the above guidance may have to change and in fact IEC 61508-7:2010 clause C.3.9 offers such hope when it states, “supervisory actions may be supported by artificial intelligence (AI) based systems in a very efficient way in diverse channels of a system”.

Today’s video selection had a lot of possibilities. I went with the SpaceX heavy launch and side booster landing which took place the week I was writing this article. Elon Musk is one of the people who actively warn about the dangers of AI (Google it for a long list of references). I should probably have gone with something like HAL from "2001: A Space Odyssey" but instead I selected the talkie toaster from "Red Dwarf". Perhaps not what Elon was warning about, but who knows, perhaps he does watch "Red Dwarf"; after all, he obviously watched "The Hitchhiker's Guide to the Galaxy".

For next time, the discussion will be on functional safety and security.


Robots, Cobots and Mobots

Posted by Tom-M Employee Mar 22, 2018

Most of my earlier blogs have been on the basics of functional safety because I wanted to cover the fundamentals. I feel now is a good time to cover some more interesting topics and today's topic will be industrial Robots, Cobots and Mobots.

I think everybody knows what a robot is. Robots are big scary machines that typically need to be kept in cages and the functional safety requirements generally involve door interlocks, laser scanners and such. The goal is to keep the robots separated from people. All the safety can be designed to the machinery safety standards ISO 13849 and IEC 62061 (machinery interpretation of IEC 61508).

COBOT stands for collaborative robot. These are robots which are designed to interact with people and where physical contact between the person and the robot may occur. Some people object to the term collaborative robot and say that there are no such robots but rather collaborative applications. The standards ISO 10218-1 and ISO 10218-2 (both parts of ISO 10218 also known as R15.06) give design and application requirements for robots and have some small bits on collaborative operation. In general, they advocate safety integrity requirements of SIL 2, HFT=1 according to IEC 62061 or PL d, CAT 3 according to ISO 13849 unless a risk assessment shows otherwise.


A suitable risk assessment could be done as per Annex A of ISO 13849:2006, but since 2016 R15.306 has been available as a robot specific risk assessment methodology. Risk assessments should be done assuming the user is not wearing any personal protection equipment and before the safety function is added.

Also available since 2016 is ISO/TS 15066. This technical specification is referenced from ISO 10218-1 and ISO 10218-2 and gives additional guidance for “collaborative robots” where “a robot system and people share the same work space.” Figure one is a good illustration of a robot system with a normally protected operating space and a collaborative operating space. This is also covered in the video below. One of the key topics in ISO/TS 15066 is “Power and force limiting”. In this mode of operation physical contact between the robot and the operator is expected, either deliberately or inadvertently. Risk reduction is achieved through inherently safe design (e.g. removal of pinch points or the use of padding) or using safety functions. Annex A gives limits for the maximum pressure and force allowed during contact for 30 different body locations. It gives no limits for “contact with face, skull and forehead, contact with these areas is not permissible”.

The graphic above shows some of the most relevant standards for robots, cobots and mobots. I never really got to mention mobots. A mobot is a mobile robot, more commonly known as an AGV (automated guided vehicle). There currently is no up to date standard that I know of that covers mobots or AGVs but I understand work is underway on a new one. Until then ISO 13849 would appear to be the most relevant standard and it would seem logical to use the force and pressure limits from ISO/TS 15066 Annex A for any potential mobot/human contact.

I note that, on review, several robots advertised as suitable for collaborative applications do not meet the suggested safety integrity requirements from ISO 10218-1. Some have PL d but only a CAT 2 architecture and some have a PL of b, which is the lowest level defined in ISO 13849. In addition, I fear that many end users are not doing a suitable risk assessment on their final applications. Perhaps as people become more aware of the latest standards things will change.

Today’s YouTube video comes from Yaskawa and is a very good introduction to Cobots and collaborative operation – see

For next time, the discussion will be on the functional safety requirements for industrial networks.


The Brave New World of DSPs

Posted by tvbsubbu Employee Mar 15, 2018

Imagine traveling in a time machine across 140 years, listening first to passive gramophones and then to the latest 16-channel audio video receiver (AVR): the results would be amazing. It could be a bit isolating, too. In the 19th century when the gramophone was playing, the neighbors and folks in the village and towns all gathered to listen and enjoy the sounds together. When it came to listening to a 16-channel AVR, I was the only one in my living room. Transformation in society aside, there was a major change in dynamic range and fidelity, an increased channel count and of course a decrease in noise levels. Processing power with higher resolution and accuracy is one of the major elements of this transformation.


Analog Devices integrated Digital Signal Processors in the mid-1980s and these were 16-bit fixed point processors. The Harvard Architecture used in these processors made them very efficient. The first audio products using these types of processors were players with 2-channel decoding and post processing. The 2-channel decoders running on these processors did use double precision mathematics and output 24-bit audio. As a software hobbyist, and probably because I was a novice in signal processing, I used to spend significant time tuning these fixed-point processors and getting the desired characteristics from the filters. The major problem was decimation and truncation errors, and the laborious trial and error tuning of filter coefficients was the only solution. Subsequently, some of the simulation software packages did generate coefficients for fixed point processors, but didn’t eliminate the hand tweaking process completely.
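
To give a feel for where the truncation errors come from, here is a small sketch that converts floating point filter coefficients to 16-bit Q15 fixed point and reports the error introduced. The Q15 format and the sample coefficients are assumptions chosen for illustration, not a description of any particular processor or filter design.

```python
import math

Q15_SCALE = 1 << 15  # Q15 format: 1 sign bit and 15 fractional bits

def to_q15(x, rounding=True):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating at the ends."""
    scaled = x * Q15_SCALE
    # Truncation in two's complement drops the low bits, which rounds towards minus infinity.
    q = round(scaled) if rounding else math.floor(scaled)
    return max(-Q15_SCALE, min(Q15_SCALE - 1, q))

def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_SCALE

def quantization_errors(coefficients, rounding=True):
    """Difference between each ideal coefficient and its Q15 representation."""
    return [c - from_q15(to_q15(c, rounding)) for c in coefficients]

coeffs = [0.1, -0.3, 0.7071]  # hypothetical filter coefficients
print(quantization_errors(coeffs, rounding=True))
print(quantization_errors(coeffs, rounding=False))
```

Rounding keeps each coefficient within half a least significant bit of the ideal value, while truncation can be off by almost a full bit and always in the same direction. Errors of that size shift the filter's poles and zeros slightly, which is why coefficients often needed tweaking by hand.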


Floating point digital signal processors were a boon and brought multiple advantages including better dynamic range, higher resolution, and lower noise. Soon enough the professional audio industry realized these benefits and used them in high end studio equipment with multiple processors on each board. Then equipment in movie theaters had audio decoders running on these DSPs. As one might expect they also migrated to AV receivers for decoding and post processing, bringing the experience of a theater into listeners' living rooms.


Good tool chains for these processors helped with writing code in C/C++ and with using some of the highly optimized libraries for FIR, IIR, FFT/IFFT, etc. Programming in C reduced the time to market and brought portability across processors without requiring deep knowledge of the processor architecture and its latencies. For example, IP holders may release multiple versions of a decoder to correct bugs or bring improvements, and provide new code in C/C++ with a few changes. Efficient compilers can create the new libraries for the processors with less effort and time than doing this task in assembly.


That was just a helicopter’s view of the advantages that came with time. In my next blog, I will attempt a deep dive into processor architectures and how this has helped the audio industry.


IEC 61508 A Deep Dive

Posted by Tom-M Employee Mar 13, 2018

Last time I promised my next blog would feature a deep dive into IEC 61508, the main functional safety standard. And I keep my promises, however, this will be the last of my introductory blogs covering basic topics for a while. I am keen to move on to more exciting topics such as requirements for Cobots, AI, networking and cyber security. So keep tuning in because these topics will all be covered beginning with my next installment.


Obviously as a semi-conductor manufacturer I am going to concentrate on the semi-conductor functional safety requirements but anything here should be more widely applicable. Also, obviously because of the nature of a blog some poetic licence is taken to quickly explain the concepts.


The graphic below shows a path through the standard for a semi-conductor device. Within Analog Devices this flow is captured in our ADI61508 process.



The first task is to understand the environment. This includes not only the EMC environment, the average and the extremes of the temperatures at which the circuitry is expected to operate but also what standards and regulations apply.


Next comes the hazard analysis where the safety functions are identified. Typically, you will need a safety function to address each hazard unless the item can be redesigned to eliminate the hazard.


The third box is where the safety integrity requirements for each of the safety functions are determined. Typically, this is done based on the severity of the harm and the frequency at which that harm may occur.


The next three vertical boxes show the various ways to address the systematic requirements. Systematic failures are failures not caused by random events. Examples of systematic failures are not having enough EMC robustness, missing requirements, or something missed because of insufficient testing. Route 1S, based on meeting all the requirements in IEC 61508, is the most common option but Route 2S, based on evidence of proven in use, is also possible. Route 3S is only an option for software and involves retrospectively doing all the paperwork and analyses you should have done in the first place. For an IC the requirements from IEC 61508-2:2010 Annex F show a means to achieve Route 1S.


Then you have two options on how to meet the hardware integrity requirements. Route 1H allows a trade-off between diagnostic coverage and hardware fault tolerance (redundancy). For example, for SIL 3 you could use no redundancy but have an SFF (safe failure fraction – a measure of diagnostic coverage) of 99%, or an HFT (hardware fault tolerance) of 1 and 90% SFF in each channel. Route 2H is based on field experience and minimum levels of HFT.
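
The Route 1H trade-off can be sketched as a lookup. The table below is the architectural constraints table for complex (Type B) elements as commonly quoted from IEC 61508-2:2010, and it agrees with the two SIL 3 examples above. Treat the values as something to check against the standard before relying on them. The function names and the failure rates in the usage lines are made up for illustration.

```python
# Maximum SIL claimable under Route 1H for Type B (complex) elements.
# Each row is (upper bound of the SFF band, max SIL for HFT 0, 1, 2).
# A SIL of 0 here means "not allowed".
TYPE_B_MAX_SIL = [
    (0.60, (0, 1, 2)),   # SFF < 60 %
    (0.90, (1, 2, 3)),   # 60 % <= SFF < 90 %
    (0.99, (2, 3, 4)),   # 90 % <= SFF < 99 %
    (None, (3, 4, 4)),   # SFF >= 99 %
]

def safe_failure_fraction(lambda_s, lambda_dd, lambda_du):
    """SFF = (safe + dangerous detected) / all failures, with rates in the same unit."""
    return (lambda_s + lambda_dd) / (lambda_s + lambda_dd + lambda_du)

def max_sil_route_1h(sff, hft):
    """Highest SIL the table permits for a given SFF and hardware fault tolerance."""
    column = min(hft, 2)
    for upper, sils in TYPE_B_MAX_SIL:
        if upper is None or sff < upper:
            return sils[column]

# Hypothetical failure rates in FIT: 50 safe, 45 dangerous detected, 5 dangerous undetected.
sff = safe_failure_fraction(50, 45, 5)
print(sff, max_sil_route_1h(sff, hft=0), max_sil_route_1h(sff, hft=1))
```

The lookup makes the trade-off visible: moving one column to the right (adding redundancy) buys the same SIL as moving one row down (improving diagnostics).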


Next, if there is on-chip or off-chip redundancy you need to consider CCF (common cause failures). CCF are the most common way for a redundant system to be defeated. Annex E gives guidance on minimizing the risk of on-chip CCF where on-chip redundancy is used, through the use of isolation wells, on-chip separation, etc.


Now the PFH (probability of dangerous failure per hour) or PFD (probability of failure on demand) needs to be calculated. Depending on the SIL there will be maximum values for these metrics. Typically, an IC will be allocated only a fraction of that maximum.
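
As a rough illustration of what a fraction of the maximum means in practice, the sketch below compares an IC's dangerous undetected failure rate against a share of the PFH limit for the target SIL. The limits are the upper bounds of the IEC 61508 high demand bands. The 10% share, the FIT values and the simplification that PFH is approximately the dangerous undetected failure rate of a single channel are assumptions for the example only.

```python
# Upper PFH bound per SIL for high demand or continuous mode, in failures per hour.
PFH_LIMIT = {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8}

def fit_to_per_hour(fit):
    """1 FIT is one failure per 10^9 device hours."""
    return fit * 1e-9

def within_allocation(dangerous_undetected_fit, sil, share):
    """True if the IC's dangerous undetected rate fits inside its share of the PFH limit."""
    budget = PFH_LIMIT[sil] * share
    return fit_to_per_hour(dangerous_undetected_fit) <= budget

# SIL 2 allows up to 1e-6 per hour (1000 FIT), so a 10 % share is 100 FIT.
print(within_allocation(20, sil=2, share=0.10))    # 20 FIT fits in the share
print(within_allocation(150, sil=2, share=0.10))   # 150 FIT does not
```

The share given to any one component is a system level decision, since sensors, logic, actuators and interfaces all draw on the same budget.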


Next data communications need to be considered. Guidance says that perhaps 1% of the PFH budget should be allocated to interfaces. This might involve calculations based on the bit error rate for the transmission medium, the number of bits transferred per message, the number of messages per hour and the Hamming distance of any CRC used to detect failures. (There will be a blog on this topic.)
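
One way to put numbers on this is sketched below. It uses a commonly quoted approximation for the probability that a corrupted message slips past a CRC: the chance of at least as many bit errors as the Hamming distance, scaled by 2 to the power of minus the CRC width. That approximation, the function names and the example figures are assumptions for illustration. The exact undetected error probability depends on the specific polynomial and message length.

```python
import math

def undetected_error_probability(n_bits, crc_bits, hamming_distance, ber):
    """Approximate chance that one n-bit message is corrupted yet passes the CRC.

    Fewer bit errors than the Hamming distance are always detected, so only the
    tail of the binomial distribution matters. The 2**-crc_bits factor assumes a
    well-behaved polynomial. Intended for modest message lengths.
    """
    tail = sum(
        math.comb(n_bits, k) * ber**k * (1.0 - ber) ** (n_bits - k)
        for k in range(hamming_distance, n_bits + 1)
    )
    return tail * 2.0 ** -crc_bits

def residual_error_rate_per_hour(p_undetected, messages_per_hour):
    """Expected number of undetected corrupted messages per hour."""
    return p_undetected * messages_per_hour

def interface_within_budget(rate_per_hour, pfh_limit, share=0.01):
    """Compare the residual error rate with the interface's share of the PFH limit."""
    return rate_per_hour <= pfh_limit * share

# Hypothetical link: 128-bit messages, 16-bit CRC with Hamming distance 4,
# 100 messages per second, SIL 3 target (PFH limit 1e-7 per hour).
for ber in (1e-4, 1e-5):
    p = undetected_error_probability(128, 16, 4, ber)
    rate = residual_error_rate_per_hour(p, 100 * 3600)
    print(ber, rate, interface_within_budget(rate, 1e-7))
```

With these made-up figures the link misses the 1% budget at a bit error rate of 1e-4 and meets it at 1e-5, which shows how sensitive the result is to the assumed channel quality.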


Perhaps at the end is the wrong place to put this but if you have on-chip diagnostics you need to consider what you want to do when the diagnostics discover an error. For a motor control application, you may want to stop the power but for other applications you need to know a lot about the final application. For instance, in a nuclear power station cooling application you probably want to keep the coolant flowing but if it is a system carrying gas you might want to stop the gas flowing.


There are lots of other sub-tasks not shown above, such as configuration management, change management, gathering evidence of competence and independent assessment. Remember, too, that documentation is key. If it is not written down it didn’t happen. Not only must the product be safe but you must be able to demonstrate the reasoning behind its safety. There is a saying in avionics that when the weight of the paperwork equals the weight of the plane it is ready to fly.


Video of the day: shows some of the testing required before an airplane can fly – my understanding is that this test was done, in the dark, with half the exits blocked and nobody knows in advance which half – regardless of the size of the plane everybody must be off in less than 90 seconds – see


For next time, the topic will be the functional safety requirements for robots, cobots and mobots.
