EngineerZone Spotlight

While I am the functional safety guy for Analog Devices’ industrial products, I find it useful to read books related to many application areas. Often what is poorly explained in one book is very well explained in a book from another domain. I find a similar thing with the standards themselves. I imagine that others will not find some of the books I like useful. It depends a lot on your level of knowledge, your background and what you were hoping to find.

 

I get my books from Amazon.com as I like to read my books on the Kindle app these days. With the Kindle app you can search for something and easily highlight important bits of text, and my book press stops growing. However, many of the books referenced from IEC 61508 are old, so only paper copies are available and then only second hand.

 

If I had to pick two of the below to start with, it would be the two free Rockwell Automation books.

 

General

The first functional safety book I read was Safety Critical Systems Handbook by David J. Smith and Kenneth G.L. Simpson. As a result of reading the book I attended David Smith’s training in the UK. The first half of the book gives a very quick and easy introduction to the topic.

 

The Functional Safety Lifecycle

Functional Safety – An IEC 61508 SIL 3 Compliant Development Process by Michael Medoff & Rainer Faller is an excellent book. Sections I particularly liked were those on derating and the quantitative analysis of failure rates on interfaces.

 

The System Safety Skeptic – Lessons Learned in Safety Management and Engineering by Terry L. Hardy illustrates the importance of putting in the safety effort where it actually adds value.

 

CENELEC 50128 and IEC 62279 Standards by Jean-Louis Boulanger. While this concentrates on rail, I decided to put it in the functional safety process section. I found it had lots of good insights, and chapter 6 on “Data preparation” is good on parameter-based systems.

 

The Checklist Manifesto – How to Get Things Right by Atul Gawande is not a functional safety book at all. Gawande expresses the value of checklists.

 

In the Interests of Safety – The Absurd Rules that Blight our Lives and How We Can Change Them by Tracey Brown and Michael Hanlon is not a functional safety book at all, but it teaches you a lesson on how to use common sense as opposed to blindly following the letter of the rules.

 

Requirements Engineering by Elizabeth Hull, Ken Jackson and Jeremy Dick is a good introduction to the topic. I really like the example of requirements traceability involving A4 pages, a big room and lots of string.

 

Configuration Management – Best Practices by Bob Aiello and Leslie Sachs is a good explanation of a topic that is covered in the standards as if everybody already knew how to implement it.

 

Reliability

Reliability Maintainability and Risk by David J. Smith is a great effort to explain the maths behind functional safety in as readable a way as possible for such a topic.

 

Control systems Safety Evaluation and Reliability by William M. Goble has nice big writing, lots of pictures and chapter 9 on diagnostics has the best explanation I have seen on Markov analysis.

 

Software

Better Embedded System Software by Philip Koopman does not claim to be a functional safety book at all and is now hard to get. However it has great chapter names such as “Global variables are Evil” and all that is in it is very relevant to functional safety.

 

Software for Dependable Systems – Sufficient Evidence is a short but interesting book.

 

Embedded Software Development for Safety-Critical Systems by Chris Hobbs is also a good book with lots of interesting insights.

 

The Leprechauns of Software Engineering by Laurent Bossavit is a nice light book to read on an airplane and tries to find the source of many software myths.

 

Sector specific books

Process Safebook 1 – Functional Safety in the Process Industry is a free book available in PDF or paper form from Rockwell Automation. It runs to 168 pages.

 

Safe Book 4 – Safety Related Control Systems for Machinery is another free book from Rockwell Automation.

 

BGIA Report 2/2008e – Functional Safety of Machine Controls – Application of EN ISO 13849 is technically not a book but rather a free download. However it runs to over 400 pages and deals with everything related to ISO 13849 so I had to include it.

 

Functional Safety in Practice by Harvey T. Dearden is focused on automotive functional safety but has some good insights, even if the allusion to Russian roulette on the front cover is somewhat confusing.

 

Basic Guide to (Automotive) Functional Safety by Thorsten Langenhan has lots of English grammar mistakes but is still an insightful read.

 

Avionics Certification by Vance Hilderman and Tony Baghai is an encouraging book. If the requirements from functional safety seem impossible to achieve, have a read and you will feel better.

 

Cyber Security

If you are not secure then you can’t be safe. Therefore learning about cyber security is also important.

 

Embedded Systems Security by David Kleidermacher and Mike Kleidermacher is a book I want to read again.

 

Industrial Network Security by David J. Teumim is a short but good introduction.

 

 

Video of the Day: Finding a relevant video took a bit of thinking – this is my best effort: https://www.youtube.com/watch?v=XRN8NK2oCVo

 

For next time, the discussion will be on the “Functional Safety for Elevators”.

In my last post I discussed cyber security and functional safety and said if you are not secure, then you are not safe. The main non-sector-specific functional safety standard is IEC 61508, which references IEC 62443 for security. IEC 62443 is entitled “Security for industrial Automation and Control systems” or “Industrial communication networks – Network and system security” depending on where you look. At last count it consisted of 13 parts and almost 1000 pages. The standards are being developed and published via the ISA (International Society of Automation) committee ISA99 and the IEC (International Electrotechnical Commission) technical committee IEC TC 65. IEC TC 65/SC 65A also publishes the functional safety standards IEC 61511 and IEC 61508, which is our first clue that the two areas might be related.

 

The four parts of IEC 62443-1-X deal with general concepts including concepts and models and a glossary of terms and conditions. The four parts in IEC 62443-2-X deal with policies and procedures including patch management while IEC 62443-3-X has three parts dealing with system level topics including the choosing of the correct SL (security level). The two parts of IEC 62443-4-X are probably the most interesting to companies like Analog Devices and our customers as these relate to component suppliers, with one part covering the life cycle requirements and the other the technical requirements. 

 

A key concept within the IEC 62443 series is that of zones and conduits. Put in simple language a zone contains nodes with similar security requirements and a conduit is a link between zones.

A similarity with functional safety is that IEC 62443 nominates four SL (security levels) which sound very similar to the four SIL from IEC 61508 (another clue to the links).  However, there is no one to one correspondence between SL and SIL. The definitions of the SL are contained in IEC 62443-1-1 and are shown below.

 

 

The definitions concentrate more on what is required to hack the system than the likelihood or probability of the system being hacked. There are alternate definitions given in various articles such as one which states that SL 4 is designed to prevent a nation state level attack. The tables in part 3-2 of the standard expand somewhat on the above using a combination of impact and likelihood to determine the required SL.

 

IEC 62443-1-1 defines seven foundational requirements (FR) to achieve a given SL. These are

  • Identification and authentication control(IAC)
  • Use control(UC)
  • System integrity(SI)
  • Data confidentiality (DC)
  • Restricted data flow(RDF)
  • Timely response to events(TRE)
  • Resource availability(RA)

 

These seven FR can be expressed as a vector so that [1,1,1,1,1,1,1] represents each of the above seven FR implemented to a SL 1 level of rigour. From a purely functional safety point of view you can then argue that data confidentiality, restricted data flow and resource availability are not so important and a SL 1 implementation is sufficient for them. Therefore, the required security vector for a safety system becomes [X,X,X,1,1,X,1] where X represents a SL of at least one.
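
The vector idea can be illustrated with a short Python sketch. The function names `safety_target` and `meets` are hypothetical helpers made up for this illustration and do not come from IEC 62443; the ordering of the seven FR follows the list above.

```python
# Foundational requirements in the order listed above.
FR = ("IAC", "UC", "SI", "DC", "RDF", "TRE", "RA")

# Assumption taken from the argument above: for a pure safety system,
# SL 1 is enough for these three FR.
RELAXED = {"DC", "RDF", "RA"}


def safety_target(x):
    """Build the [X,X,X,1,1,X,1] vector for a chosen X (1..4)."""
    if not 1 <= x <= 4:
        raise ValueError("SL must be between 1 and 4")
    return tuple(1 if fr in RELAXED else x for fr in FR)


def meets(achieved, required):
    """True if every FR is implemented to at least the required SL."""
    return all(a >= r for a, r in zip(achieved, required))


target = safety_target(2)              # (2, 2, 2, 1, 1, 2, 1)
component = (2, 3, 2, 1, 1, 2, 1)      # hypothetical achieved SL per FR
print(dict(zip(FR, target)), meets(component, target))
```

Comparing element by element, rather than taking a single overall number, is the point of the vector: a component can be strong on confidentiality and still fall short on use control.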

 

If developing an IC or a piece of equipment, once you have determined the required SL you then proceed to IEC 62443-4-1 and IEC 62443-4-2. IEC 62443-4-1 tells you the process steps necessary under eight headings, including security management and having a defense-in-depth strategy. The requirements are given independent of the SL. IEC 62443-4-2 gives you requirements under the headings of the seven FR, with additional requirements depending on whether it is an application, an embedded device, a host device or a networked device. According to IEC 62443-4-2 the necessary requirements depend on the SL.

 

Part 4-2 provides requirements for 4 types of components with 47 requirements in total depending on the SL.

 

There is now a certification scheme in place for IEC 62443 (see ISASecure), and the various TÜVs and Exida also offer certification.

 

Video of the Day: This video from Siemens highlights some of the issues and has dramatic music which I like in a video - https://www.youtube.com/watch?v=dlczMRRFdtQ&stc=nls_152_trackingID_en

 

For next time, the topic will be functional safety: recommended reads.

Nearly all horse racing fans and even most casual sports fans know the name Secretariat. The thoroughbred shattered track records on the way to a Triple Crown sweep in 1973. What many people may not know is that Secretariat didn’t break the track record at the Preakness Stakes, at least not officially, until almost 40 years later.

 

Analog Devices engineer, Tom Westenburg, helped set the record straight. You may remember Tom from a February blog where he talked about his experience with ensuring the accuracy of the timing systems used at many of the Winter Olympic sliding tracks.

 

With this year’s Preakness Stakes just a few days away, Tom was kind enough to share his story about Secretariat and the record that almost wasn’t. Here’s Tom’s account:

 

When Secretariat won the 1973 Preakness Stakes, the official time was 1:55, which appeared to be around 2 seconds longer than what hand-held timers recorded (1:53.2). You can read more about what happened and why Secretariat didn’t hold the track record even though the horse should have here and here.

 

Back in 1973 I was a teenager mowing a woman’s lawn when this race happened. She insisted that I come inside and watch it. She had grown up in Kentucky and her family had raced horses, so she filled me in on all the “behind the scenes” details and how, if Secretariat won the Preakness, he’d probably be a Triple Crown winner. There was something different about the Preakness, I think length and/or surface of the track. As I watched her jump and scream as Secretariat won, I had no idea that I’d be working to correct a bad time 39 years later.

 

My work would involve reviewing the videotape of the race to determine what time I thought was correct, and why. I had a piece of software that could count fields of a video. A frame is made up of two fields, so using fields gave me twice the time resolution, or 1/59.94 s per field. My plan was to count frames, then calculate the worst possible time error, multiply it out and I’d be done. It wasn’t that easy. I also thought it’d be easy to find specifications for 1973 (or older) equipment, and locate the older NTSC standards. I wanted to find out how tight the 16.683 ms field rate was. I was able to find specifications from that era, but not specifically for 1973. However, when I went through the math on oscillator and thermal drift errors, that error was minimal. Trying to count frames was much more difficult than I expected. The cameras at the start and finish were not perpendicular to the track, so I had to estimate. I also had to interpolate between fields. As it turns out these were the largest error sources. I came up with 1:53.08, with a range of 1:53.00 to 1:53.15. There was no question in my mind that 1:55 was incorrect.
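
The arithmetic behind field counting is simple enough to sketch in Python. This is only an illustration and not the software Tom used. It assumes the nominal NTSC colour field rate of 60000/1001 Hz (about 59.94 Hz), and the field count of 6778 is a made-up number chosen to land near the result quoted above.

```python
from fractions import Fraction

# Nominal NTSC colour field period: 1001/60000 s, roughly 16.683 ms.
FIELD_PERIOD_S = Fraction(1001, 60000)


def fields_to_seconds(field_count):
    """Convert a number of counted video fields to elapsed seconds."""
    return float(field_count * FIELD_PERIOD_S)


def format_race_time(seconds):
    """Format seconds as m:ss.ss, the way race times are usually written."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:05.2f}"


hypothetical_fields = 6778  # illustrative count, not taken from the tape
print(format_race_time(fields_to_seconds(hypothetical_fields)))
```

An error of one field at either the start or the finish shifts the result by about 17 ms, which is why the camera angles and the interpolation between fields dominate the uncertainty.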

 

While doing this I learned that a horse race does not start when the gate doors open; it starts down the track. This short stretch is called the “run-up.” It varies from track to track, and can be as long as 375 feet from the starting gate. At the Preakness the run-up is around 150 feet. I also learned that in the past, horse racing was timed to one fifth of a second, or 0.2 s resolution, (1:53 0/5th s). Tracks today are timed to 1/100 s, or 10 ms.

 

In my written testimony, I speculated on what could have caused the error. It seemed likely that the start-timing light started the timer early. It could have been from a bright glint of light (sun on a mirror or a camera flash) saturating the receiver and causing it to trip erroneously. After the testimony, someone who was there at the race in 1973 told me that a man ran out onto the track to pick up a piece of trash that blew onto the track about the time the gates opened. He left the track near the start-timing light. This is my revised theory as to what happened, but we’ll never know for sure.

 

This was the third hearing attempting to correct Secretariat’s time and track record. Penny Chenery (Secretariat’s owner) was getting up in years and she was determined to correct this before she died. I never met Penny during this, but she did write me a very nice letter after the time and record were corrected and thanked me for my assistance. I would have liked to have met her; she seemed like a very interesting and unique person. I received her thoughts and concerns through Leonard Lusky, who was working meticulously to put everything together. He did a great job at laying out and building the case to get the Maryland Racing Commission to understand and change Secretariat’s time. Penny died September 16, 2017 at 95 years of age. I’m very happy I could be part of this, and that everybody involved may have given her a little peace of mind that things were set straight before she passed.

Tom-M

Functional Safety & Security

Posted by Tom-M Employee May 8, 2018

Functional safety concentrates on protecting people, assets and the environment from inadvertent harm caused by non-malicious actors, for instance by bad planning, bad implementation, a bad set of requirements or random failures.

 

Cyber security on the other hand concentrates on harm caused by malicious actors. Somebody deliberately causes the system to fail in a way that brings some advantage to them.

 

Given that functional safety concentrates on “accidents” and “mishaps” and security deals with deliberate “hacks”, you do need to think about it somewhat differently. For instance, it is more important to think about what is possible as opposed to what is probable.

 

In many languages there is only one word to cover both safety and security. For instance, in German it is Sicherheit. Therefore, I generally try to remember to say “cyber security” instead of security to make it clear which I mean.

 

All systems with functional safety requirements have security requirements. At a minimum in functional safety you must protect against foreseeable misuse and somebody hacking the system comes under that category. There will be lots of systems with security requirements which are not safety relevant. Therefore, systems with functional safety requirements are a subset of systems with security requirements.

 

 

Sometimes the root cause of both safety and security concerns is the same. Suppose you have 1,000 lines of code and it contains a single design error. If you only consider safety then that buggy line of code may never be executed or may execute at a time when the bug doesn’t matter. However, a hacker becoming aware of such a bug can try to exploit the situation so that the dodgy line of code is always executed.

 

Like functional safety, cyber security comes with its own terminology. You have attack surfaces, a PUF (physically unclonable function), side channel and glitch attacks. Some of the security requirements, such as threat assessments, parallel things like a hazard analysis in functional safety. Also, there are procedures for setting a target security level which are quite like those for SIL determination. Perhaps the biggest similarity though is that both are emergent system level properties and it is very hard, if not impossible, to add security or safety afterwards to an already designed system.

 

Within Analog Devices we are lucky in that some years ago we acquired the Cyber Security Solutions (CSS) business of Sypris Electronics, based in Tampa, Florida, which has become the Trusted Security Solutions group within Analog Devices. As the safety guy I do need to know something about security but it is good to have the real experts on call.

 

  • Regular security patches are generally not possible on the factory floor for fear of upsetting production
  • Many of the nodes used in industrial systems are resource constrained, with RAM often << 1 megabyte
  • The equipment lifetimes can often be twenty years or more
  • Some of the controller equipment is dangerous and can cause harm
  • Much of the equipment is time critical and security can add big time overhead
  • Many of the protocols are proprietary

 

An interesting example, if you have a nuclear shutdown system - is it appropriate to lock out the safety guy from the shutdown system if he gets his password wrong three times?

 

Within industrial circles the most famous cyber security incident is the Stuxnet virus. It was designed to target the Iranian nuclear enrichment program via a Siemens S7 PLC. It is believed to have been written and deployed by state-level actors. There is an excellent documentary film on the topic called “Zero Days”.

 

This blog’s video is the trailer for Zero Days – see http://www.zerodaysfilm.com/trailer

 

IEC 61508 references the IEC 62443 series for cyber security requirements. Therefore, for my next blog, the discussion will be on “The IEC 62443 series of cyber security standards”.

 

 

 

3GPP declared a major milestone for 5G this past December by announcing the approval of the first 5G New Radio (NR) specifications. But even after that formal milestone, the members of 3GPP will spend at least the next six months finishing additional required details of the 5G specification.

 

While the specifications for the radio are approaching completeness, the test specifications were barely started when the announcement was made. Test specifications are an important part of the overall 3GPP output as they are adopted by certification bodies to certify user equipment (UE). RAN5 is the working group within 3GPP which has the task of detailing the UE test specifications, also known as conformance specifications. These specifications include the various well-known tests such as RF transmit and receive power, waveform quality, occupied bandwidth, adjacent channel leakage, etc. There are also protocol specifications, yet to be written, that define the behavioral performance of signaling between the phone and network.

 

As of March 2018, 3GPP RAN5 had established the skeletons of the test specifications as well as significant detailing of some aspects of the specifications. These test specifications are pre-release documents and can be seen as very early due to the very frequent use of “TBD” (to be determined) and “FFS” (for future study)—these are known unknowns that are placeholders for future values.    

 

38.508-1: User Equipment (UE) conformance specification; Part 1: Common test environment

38.509: Special conformance testing functions for User Equipment (UE)

38.522: User Equipment (UE) conformance specification; Applicability of RF and RRM test cases

38.523-1: UE conformance specification; Part 1: Protocol conformance specification; RAN5 doc

38.521-1: User Equipment (UE) conformance specification; Radio transmission and reception; Part 1: [Frequency] Range 1 Standalone

38.521-2: User Equipment (UE) conformance specification; Radio transmission and reception; Part 2: [Frequency] Range 2 Standalone

38.521-3: User Equipment (UE) conformance specification; Radio transmission and reception; Part 3: NR interworking between NR range1 + NR range2; and between NR and LTE

38.521-4: User Equipment (UE) conformance specification; Radio transmission and reception; Part 4: Performance requirements

5G UE test specifications produced by 3GPP RAN5 (source: ftp.3gpp.org/Specs/latest-drafts)

 

An interesting aspect of testing 5G, and a concern for the industry, is how to test beamforming in base stations and mobile phones while the system is actively scanning and tracking the 3-D sphere for energy. A new function defined for 5G NR that helps with this challenge is called “beamlock” (see specification 38.509). The beamlock function forces the UE to freeze its beamforming pattern so that testing can occur. The receive and transmit beam patterns can be frozen independently. This test function is not meant to be used in regular operation. The presence of this function reinforces that the world is quite different when it comes to testing a beam-formed millimeter wave system: the yet-to-be-defined over-the-air (OTA) tests will be far different from past wireless generations. And it is certain to be a challenging task finding agreement on these tests given the highly varied opinions of RAN5 attendees and the complex technical nature of the problem.

 

The early nature of the test standards coupled with the complexity of OTA suggests a substantial amount of work in order to complete the test specifications by 3GPP’s goal of the end of 2018. In order to achieve the goal, 3GPP RAN5 will continue its world tour, hosting 2018 meetings in Korea, Sweden and the United States.  In addition, there will likely be many discussions over phone and email to complete the work.  

 

As we look forward to the future, we await 3GPP’s detailing of the test specification, especially in the aforementioned area of OTA. With beamforming and tightly integrated device antennas, we expect the R&D, type approval, and production tests to have a dramatic increase in the amount of OTA testing compared to prior generations.

 

Follow EngineerZone Spotlight to receive updates when new blogs about 5G or other interesting topics are published.

Tom-M

Functional Safety and Networking

Posted by Tom-M Employee Apr 25, 2018

After writing my blog on the functional safety requirements for robots, cobots and mobots I thought it would be interesting to tackle functional safety requirements for networking. The two topics are linked as most robots will be networked as robots are an important part of Industrie 4.0.

 

Mentions of networking within IEC 61508 are few, with only IEC 61508-2:2010 clause 7.4.11 offering much guidance. It offers a white channel and a black channel approach and refers the user to IEC 61784-3 or the IEC 62280 series. Using the white channel approach, the entire network, including the communication devices at both ends, is developed to the relevant functional safety standards. This would be a lot of work and limit the use of standard networking components. The more common approach is the use of the black channel, where no assumptions are made about the channel and safety is taken care of with an additional SCL (safety communication layer) in the application software. This SCL is developed to the safety standards but everything else in the communication system is just a standard component. The picture below is taken from the IEC 61784-3 standard.

 

IEC 61784-3 is a fieldbus standard and the IEC 62280 series (also known as EN 50159) covers trains. EN 50159 gives a series of threats and a list of possible defenses against those threats. For each threat the SCL must implement at least one defense, see below.
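
To make the black channel idea more concrete, here is a minimal Python sketch of a toy safety communication layer. It is an illustration only and is not taken from EN 50159 or IEC 61784-3. The frame layout, the `wrap` function and the `Receiver` class are all hypothetical, and `zlib.crc32` merely stands in for a properly chosen safety code. It combines three typical defenses: a sequence number, a source identifier and a check code.

```python
import struct
import zlib

# Hypothetical frame layout: 16-bit source id, 32-bit sequence number,
# payload, then a 32-bit CRC over everything before it.
HEADER = struct.Struct(">HI")
CRC = struct.Struct(">I")


def wrap(source_id, seq, payload):
    """Sender side: add source id, sequence number and check code."""
    body = HEADER.pack(source_id, seq) + payload
    return body + CRC.pack(zlib.crc32(body))


class Receiver:
    """Receiver side: accept a frame only if every defense passes."""

    def __init__(self, expected_source):
        self.expected_source = expected_source
        self.next_seq = 0

    def unwrap(self, frame):
        """Return the payload, or None if the frame must be rejected."""
        if len(frame) < HEADER.size + CRC.size:
            return None
        body = frame[:-CRC.size]
        (received_crc,) = CRC.unpack(frame[-CRC.size:])
        if zlib.crc32(body) != received_crc:
            return None  # corruption
        source_id, seq = HEADER.unpack(body[:HEADER.size])
        if source_id != self.expected_source:
            return None  # message from an unexpected sender
        if seq != self.next_seq:
            return None  # repetition, deletion or re-sequencing
        self.next_seq += 1
        return body[HEADER.size:]
```

The network underneath can be any standard component because the receiver trusts nothing it has not checked itself. A real SCL would also decide how to reach the safe state after a rejected frame, which this sketch leaves out.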

 

For safety of machinery the time-out defense is of particular interest. It effectively implements a watchdog timer so that if, for instance, a robot receives no communications, then after a specified interval it takes the robot to its safe state.
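
A minimal Python sketch of such a time-out defense follows. The class name `CommsWatchdog` and the choice to latch the safe state are assumptions made for the illustration, not requirements quoted from EN 50159.

```python
import time


class CommsWatchdog:
    """Demand the safe state if no valid message arrives within timeout_s."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested
        self.last_rx = clock()
        self.safe_state = False

    def message_received(self):
        """Call for every valid message; ignored once the safe state is latched."""
        if not self.safe_state:
            self.last_rx = self.clock()

    def poll(self):
        """Call periodically; returns True once the safe state is demanded."""
        if self.clock() - self.last_rx > self.timeout_s:
            self.safe_state = True
        return self.safe_state
```

The safe state is latched here so that a late message cannot silently restart the robot; whether and how to reset is a design decision for the real system. A monotonic clock is used so that wall-clock adjustments cannot stretch the interval.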

 

Also, table B.2 of EN 50159 is of interest. It lists various categories of networks and identifies each of the threats as either negligible, needing some protection or needing strong countermeasures. A category 1 network might be considered as the closed network within a robot or cobot or perhaps the interface between an analog to digital converter and a local micro-controller. A category 1 network has a known fixed maximum number of users and limited opportunity for unauthorized access. A category 3 network on the other hand might be something like a wireless network which typically has a lot more opportunities for unauthorized access than a wired network.

 

The white channel approach is not widely used but I wonder will new requirements such as those for TSN (time-sensitive networking) change that. This might be a good topic for a future blog.

 

I have struggled to find a good video related to functional safety and networking – this one is even more tenuous than normal. For anyone who doesn’t spot the link – leave a comment in the comments section and I will get back to you – see https://www.youtube.com/watch?v=yBBWUZfgRiw

 

Actually, this week there is a bonus video which discusses how to decide if your CRC is good enough. It shows how to combine the Hamming distance of the CRC, the expected bit error rate of the network, the number of bits transferred per second and the required SIL to determine if your CRC is good enough to meet the PFH requirements from IEC 61508 or indeed ISO 13849 – see http://www.analog.com/en/education/education-library/videos/4592427497001.html
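
A rough Python sketch of that kind of calculation is given below. It is not the method from the video. It assumes a binary symmetric channel, a well-behaved CRC that catches every error pattern with fewer flipped bits than its Hamming distance and misses heavier patterns with probability 2^-r, and a budget of 1% of the PFH limit for communications, which is a commonly used allocation. The PFH limits are the upper bounds of the IEC 61508 high demand bands, and all example numbers are hypothetical.

```python
from math import comb

# Upper PFH limits per hour for each SIL (high demand / continuous mode).
PFH_LIMIT = {1: 1e-5, 2: 1e-6, 3: 1e-7, 4: 1e-8}


def residual_error_probability(frame_bits, crc_bits, hamming_distance, ber):
    """Approximate probability that a corrupted frame passes the CRC.

    Intended for short frames; very long frames would need a more
    careful numerical treatment of the binomial terms.
    """
    tail = sum(
        comb(frame_bits, k) * ber**k * (1 - ber) ** (frame_bits - k)
        for k in range(hamming_distance, frame_bits + 1)
    )
    return tail * 2.0 ** -crc_bits


def crc_good_enough(frame_bits, crc_bits, hamming_distance, ber,
                    frames_per_second, sil, budget=0.01):
    """Return (ok, undetected errors per hour) against the SIL budget."""
    per_frame = residual_error_probability(
        frame_bits, crc_bits, hamming_distance, ber)
    per_hour = per_frame * frames_per_second * 3600
    return per_hour <= budget * PFH_LIMIT[sil], per_hour


# Hypothetical link: 128-bit frames, 100 frames/s, bit error rate 1e-4.
print(crc_good_enough(128, 16, 4, 1e-4, 100, sil=3))
print(crc_good_enough(128, 32, 6, 1e-4, 100, sil=3))
```

The result is very sensitive to the assumed bit error rate, so the number fed in should be a justified worst case and not a typical value.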

 

Follow EngineerZone Spotlight to be notified of new safety blogs.

This might be my shortest blog yet. Artificial intelligence goes by many names, including machine learning. Systems that understand handwriting are not referred to as AI but rather as optical character recognition systems. Deep learning on the other hand is an AI technique. AI can be part of many systems but is not an end in itself.

Anyway, here is the key guidance from the generic functional safety standard IEC 61508-3.

The use of AI is not recommended at any SIL greater than SIL 1. At SIL 1 it is neither recommended nor not recommended. For guidance the definition of not recommended is given below from Annex A of IEC 61508-3:2010.

One of the main objections to AI is that it is overly complex. Functional safety loves simplicity. To quote the book “Software for dependable systems”. Actually, I searched the book but couldn’t find the quote. I googled it and found it again in the book “Code Complete”, attributed to C. A. R. Hoare – “There are two ways of constructing a software design: one way is to make it so simple there are obviously no deficiencies, and the other is to make it so complicated there are no obvious deficiencies”. I see their point, when you consider that a deep learning algorithm might need to crash a car into a tree 50,000 times before it figures out it is a bad idea. A kid on a tricycle generally figures it out after the first or second crash. Non-determinism is hard to accept for safety. To give a second quote from the above book – “essential that developers are familiar with best practices and deviate from them only for good reasons”.

I can find no mention of AI in the automotive functional safety standard ISO 26262 and therefore in theory the guidance for automotive would fall back to IEC 61508. Yet there appears to be widespread use of AI within new automotive technology. I haven’t yet read all of ISO 26262 revision 2 (expected release 2018) but I must discuss this with my automotive functional safety colleagues within Analog Devices. Perhaps AI is only proposed for driver assist as opposed to safety applications. Perhaps it will be somehow covered by the new SOTIF standard (safety of the intended functionality). I feel the benefits of AI may become so great that the above guidance may have to change, and in fact IEC 61508-7:2010 clause C.3.9 offers such hope when it states, “supervisory actions may be supported by artificial intelligence (AI) based systems in a very efficient way in diverse channels of a system”.

Today’s video selection had a lot of possibilities. I went with the SpaceX heavy launch and side booster landing which took place the week I was writing this article. Elon Musk is one of the people who actively warn about the dangers of AI (Google it for a long list of references). I should probably have gone with something like HAL from 2001: A Space Odyssey but instead I selected the talkie toaster from Red Dwarf. Perhaps not what Elon was warning about, but who knows, perhaps he does watch "Red Dwarf"; after all, he obviously watched "The Hitchhiker’s Guide to the Galaxy".

For next time, the discussion will be on functional safety and security.

Tom-M

Robots, Cobots and Mobots

Posted by Tom-M Employee Mar 22, 2018

Most of my earlier blogs have been on the basics of functional safety because I wanted to cover the fundamentals. I feel now is a good time to cover some more interesting topics and today's topic will be industrial Robots, Cobots and Mobots.

I think everybody knows what a robot is. Robots are big scary machines that typically need to be kept in cages and the functional safety requirements generally involve door interlocks, laser scanners and such. The goal is to keep the robots separated from people. All the safety can be designed to the machinery safety standards ISO 13849 and IEC 62061 (machinery interpretation of IEC 61508).

COBOT stands for collaborative robot. These are robots which are designed to interact with people and where physical contact between the person and the robot may occur. Some people object to the term collaborative robot and say that there are no such robots but rather collaborative applications. The standards ISO 10218-1 and ISO 10218-2 (both parts of ISO 10218 also known as R15.06) give design and application requirements for robots and have some small bits on collaborative operation. In general, they advocate safety integrity requirements of SIL 2, HFT=1 according to IEC 62061 or PL d, CAT 3 according to ISO 13849 unless a risk assessment shows otherwise.

 

A suitable risk assessment could be done as per Annex A of ISO 13849:2006, but since 2016 R15.306 has been available as a robot-specific risk assessment methodology. Risk assessments should be done assuming the user is not wearing any personal protection equipment and before the safety function is added.

Also available since 2016 is ISO/TS 15066. This technical specification is referenced from ISO 10218-1 and ISO 10218-2 and gives additional guidance for “collaborative robots” where “a robot system and people share the same work space.” Figure one is a good illustration of a robot system with a normally protected operating space and a collaborative operating space. This is also covered in the video below. One of the key topics in ISO/TS 15066 is “Power and force limiting”. In this mode of operation physical contact between the robot and the operator is expected, either deliberately or inadvertently. Risk reduction is achieved through inherently safe design (e.g. removal of pinch points or the use of padding) or using safety functions. Annex A gives limits for the maximum pressure and force allowed during contact for 30 different body locations. It gives no limits for “contact with face, skull and forehead, contact with these areas is not permissible”.

The graphic above shows some of the most relevant standards for robots, cobots and mobots. I never really got to mention mobots. A mobot is a mobile robot, more commonly known as an AGV (automated guided vehicle). There is currently no up-to-date standard that I know of that covers mobots or AGVs, but I understand work is underway on a new one. Until then ISO 13849 would appear to be the most relevant standard, and it would seem logical to use the force and pressure limits from ISO/TS 15066 Annex A for any potential mobot/human contact.

I note from a review that several robots advertised as suitable for collaborative applications do not meet the suggested safety integrity requirements from ISO 10218-1. Some have PL d but only a CAT 2 architecture, and some have a PL of b, which is the lowest level defined in ISO 13849. In addition, I fear that many end users are not doing a suitable risk assessment on their final applications. Perhaps as people become more aware of the latest standards things will change.

Today’s YouTube video comes from Yaskawa and is a very good introduction to Cobots and collaborative operation – see https://www.youtube.com/watch?v=4JNJ1LHSAwA&t=23s

For next time, the discussion will be on the functional safety requirements for industrial networks.


The Brave New World of DSPs

Posted by tvbsubbu Employee Mar 15, 2018

Imagine traveling in a time machine across 140 years, listening to everything from passive gramophones to the latest 16-channel audio video receiver (AVR), and the results would be amazing. It could be a bit isolating, too. In the 19th century when the gramophone was playing, the neighbors and folks in the villages and towns all gathered to listen and enjoy the sounds together. When it came to listening to a 16-channel AVR, I was the only one in my living room. Transformation in society aside, there was a major change in dynamic range and fidelity, an increase in channel count and of course a decrease in noise levels. Processing power with higher resolution and accuracy is one of the major elements behind this transformation.

 

Analog Devices integrated Digital Signal Processors in the mid 80’s and these were 16-bit fixed point processors. The Harvard Architecture used in these processors made them very efficient. The first audio products using these types of processors were players with 2-channel decoding and post processing. The 2-channel decoders running on these processors did use double precision mathematics and output 24-bit audio. As a software hobbyist, and probably because I was a novice in signal processing, I used to spend significant time tuning these fixed-point processors and getting the desired characteristics from the filters. The major problem was decimation and truncation errors, and the laborious trial and error tuning of filter coefficients was the only solution. Subsequently, some of the simulation software packages did generate coefficients for fixed point processors, but didn’t eliminate the hand tweaking process completely.

 

Floating point digital signal processors were a boon and brought multiple advantages including better dynamic range, higher resolution, and lower noise. Soon enough the professional audio industry realized these benefits and used them in high end studio equipment with multiple processors on each board. Then equipment in movie theaters had audio decoders running on these DSPs. As one might expect, they also migrated to AV receivers for decoding and post processing, bringing the experience of a theater into the living room.

 

Good tool chains for these processors made it possible to write code in C/C++ and to use highly optimized libraries for FIR, IIR, FFT/IFFT, etc. Programming in C reduced the time to market and brought portability across processors without requiring deep knowledge of the processor architecture and latency. For example, IP holders may release multiple versions of a decoder to correct bugs or bring improvements, and provide new code in C/C++ with a few changes. Efficient compilers can create the new libraries for the processors with less effort and time than doing this task in assembly.

 

That was just a helicopter’s view of the advantages that came with time. In my next blog, I will attempt a deep dive into processor architectures and how this has helped the audio industry.


IEC 61508 A Deep Dive

Posted by Tom-M Employee Mar 13, 2018

Last time I promised my next blog would feature a deep dive into IEC 61508, the main functional safety standard, and I keep my promises. However, this will be the last of my introductory blogs covering basic topics for a while. I am keen to move on to more exciting topics such as requirements for Cobots, AI, networking and cyber security. So keep tuning in because these topics will all be covered beginning with my next installment.

 

Obviously as a semi-conductor manufacturer I am going to concentrate on the semi-conductor functional safety requirements but anything here should be more widely applicable. Also, obviously because of the nature of a blog some poetic licence is taken to quickly explain the concepts.

 

The graphic below shows a path through the standard for a semi-conductor device. Within Analog Devices this flow is captured in our ADI61508 process.

 

 

The first task is to understand the environment. This includes not only the EMC environment, the average and the extremes of the temperatures at which the circuitry is expected to operate but also what standards and regulations apply.

 

Next comes the hazard analysis where the safety functions are identified. Typically, you will need a safety function to address each hazard unless the item can be redesigned to eliminate the hazard.

 

The third box is where the safety integrity requirements for each of the safety functions is determined. Typically, this is done based on the severity of the harm and the frequency at which that harm may occur.

 

The next three vertical boxes show the various ways to address the systematic requirements. Systematic failures are failures not caused by random events. Examples of systematic failures are not having enough EMC robustness, missing requirements, or something missed because of insufficient testing. Route 1S, based on meeting all the requirements in IEC 61508, is the most common option, but Route 2S, based on evidence of proven in use, is also possible. Route 3S is only an option for software and involves retrospectively doing all the paperwork and analyses you should have done in the first place. For an IC, the requirements from IEC 61508-2:2010 Annex F show a means to achieve Route 1S.

 

Then you have two options on how to meet the hardware integrity requirements. Route 1H allows a trade-off between diagnostic coverage and hardware fault tolerance (redundancy). For example, for SIL 3 you could use no redundancy but have an SFF (safe failure fraction, a measure of diagnostic coverage) of 99%, or an HFT (hardware fault tolerance) of 1 and 90% SFF in each channel. Route 2H is based on field experience and minimum levels of HFT.
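The Route 1H trade-off can be pictured as a lookup table. Below is a minimal Python sketch, assuming a complex (Type B) element such as an IC. The table values are my reading of the architectural constraints table in IEC 61508-2 and should be checked against the standard before being relied on; the names `TYPE_B_MAX_SIL`, `sff_band` and `max_sil_route_1h` are made up for illustration.

```python
# Maximum SIL claimable under Route 1H for a Type B (complex) element,
# indexed by safe failure fraction (SFF) band and hardware fault tolerance (HFT).
# Assumption: values follow the architectural constraints table in IEC 61508-2;
# verify against the standard. None means "not allowed".
TYPE_B_MAX_SIL = {
    "<60%":    {0: None, 1: 1, 2: 2},
    "60-<90%": {0: 1, 1: 2, 2: 3},
    "90-<99%": {0: 2, 1: 3, 2: 4},
    ">=99%":   {0: 3, 1: 4, 2: 4},
}


def sff_band(sff_percent):
    """Map an SFF in percent onto the band used as the table key."""
    if sff_percent < 60:
        return "<60%"
    if sff_percent < 90:
        return "60-<90%"
    if sff_percent < 99:
        return "90-<99%"
    return ">=99%"


def max_sil_route_1h(sff_percent, hft):
    """Return the highest SIL the table allows, or None if not allowed."""
    return TYPE_B_MAX_SIL[sff_band(sff_percent)][hft]


# The two SIL 3 options quoted in the text:
print(max_sil_route_1h(99, 0))  # no redundancy, 99% SFF
print(max_sil_route_1h(90, 1))  # HFT of 1, 90% SFF per channel
```

Both calls return 3 with this table, matching the SIL 3 example above.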

 

Next, if there is on-chip or off-chip redundancy, you need to consider CCF (common cause failures). A common cause failure takes out more than one channel at once and is the most common way for a redundant system to be defeated. Annex E gives guidance on minimizing the risk of on-chip CCF where on-chip redundancy is used, through the use of isolation wells, on-chip separation etc.

 

Now the PFH (probability of dangerous failure per hour) or PFD (probability of failure on demand) need to be calculated. Depending on the SIL level there will be maximum values for these metrics. Typically, an IC will be allocated only a fraction of that maximum.

 


Next data communications need to be considered. Guidance says that perhaps 1% of the PFH budget should be allocated to interfaces. This might involve calculations based on the bit error rate for the transmission medium, the number of bits transferred per message, the number of messages per hour and the Hamming distance of any CRC used to detect failures. (There will be a blog on this topic.)
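As a rough illustration of the interface calculation described above, here is a minimal Python sketch. It assumes a simple binary symmetric channel and the common approximation that error patterns with at least as many bit flips as the Hamming distance slip past an r-bit CRC with probability 2^-r. The function names, the message size, the bit error rate and the message rate are all invented for the example and are not taken from any standard or product.

```python
from math import comb


def residual_error_probability(n_bits, crc_bits, hamming_distance, ber):
    """Approximate probability that a corrupted message passes the CRC check.

    Errors of fewer than `hamming_distance` bits are always detected.
    Assumption: heavier error patterns go undetected with probability
    2**-crc_bits (a common approximation for a well-chosen CRC).
    """
    heavy_error_probability = sum(
        comb(n_bits, k) * ber**k * (1 - ber) ** (n_bits - k)
        for k in range(hamming_distance, n_bits + 1)
    )
    return heavy_error_probability / 2**crc_bits


def residual_error_rate_per_hour(n_bits, crc_bits, hamming_distance, ber,
                                 messages_per_hour):
    """Undetected corrupted messages per hour."""
    per_message = residual_error_probability(
        n_bits, crc_bits, hamming_distance, ber)
    return per_message * messages_per_hour


# Hypothetical link: 128-bit messages, 16-bit CRC with Hamming distance 4,
# bit error rate of 1e-4, 10 messages per second.
rate = residual_error_rate_per_hour(128, 16, 4, 1e-4, 10 * 3600)

# 1% of the SIL 3 limit of 1e-7 dangerous failures per hour.
budget = 0.01 * 1e-7
print(f"residual error rate {rate:.2e}/h, within budget: {rate < budget}")
```

With these made-up numbers the link fits inside the 1% budget, but the margin is small, which is why the choice of CRC and message rate matters.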

 

Perhaps at the end is the wrong place to put this but if you have on-chip diagnostics you need to consider what you want to do when the diagnostics discover an error. For a motor control application, you may want to stop the power but for other applications you need to know a lot about the final application. For instance, in a nuclear power station cooling application you probably want to keep the coolant flowing but if it is a system carrying gas you might want to stop the gas flowing.

 

There are lots of other sub-tasks not shown above, such as configuration management, change management, gathering evidence of competence and independent assessment. And remember, documentation is key. If it is not written down, it didn’t happen. Not only must the product be safe, but you must be able to demonstrate the reasoning behind its safety. There is a saying in avionics that when the weight of the paperwork equals the weight of the plane it is ready to fly.

 

Video of the day: shows some of the testing required before an airplane can fly – my understanding is that this test was done, in the dark, with half the exits blocked and nobody knows in advance which half – regardless of the size of the plane everybody must be off in less than 90 seconds – see https://www.youtube.com/watch?v=_gqWeJGwV_U

 

For the next time -  The Functional safety requirements for Robots, Cobots and Mobots.

A complex systems challenge needs a comprehensive systems solution. The IoT is a system. A system with far-reaching capabilities, opportunities and benefits. All of which come with significant complexity for those looking to harness its vast potential. The most successful results will come from a systems approach. Extreme IoT solutions call for the most precise and secure data under the widest range of conditions with low power processing at the very edge of the network to reliably connect to the cloud.

 

Extreme IoT is moving beyond the number of connected sensors into systems that may be moving (an “Internet of Moving Things”) or systems irrespective of domain partitioning where analytics can happen at the sensor edge or in the cloud. Time based or “time stamped” data allows insights to be drawn or an outcome to be predicted from many sets of sensor data taken at the edge or in the cloud. But finally, when data is sent into the cloud, it is vital that it is done with the highest reliability.

 

Consider these examples from my previous blog, Welcome to the Extreme IoT: a sensor in the heart of the desert, another deep in the arctic or sensors on a moving robot in a factory full of radio interference. Just surviving and operating in those extreme settings is challenging.

 

Low Power Reliable Wireless Sensor Networks (WSNs)

Wireless IoT networks must meet the same requirements as wired networks, but the demands extend beyond reliable transmission in harsh conditions. Wireless networks must also deliver robust performance, security, and the lowest possible power consumption. While the radio is a critical building block, and many low power radio solutions are available, network protocol and architecture play a large role in determining the performance and power consumption of the full solution. And the needs may be very different depending on the application. These can range from large data sets transferred on a continuous basis to small amounts of information transferred on an “as needed” basis. But in all cases, the data transfer needs to incorporate strong security, including encryption and authentication.

 

From Rant to Reality:

A solution that uses efficient sensor networks, chips, and pre-certified PCB modules with mesh networking software, can enable sensors to communicate in demanding industrial IoT environments. For example, ADI’s SmartMesh® wireless networking products use a time-synchronized channel-hopping protocol with built-in self-diagnostics to transmit data. Each wireless node has an on-board ARM Cortex-M3, which can be used as an edge processor. That way, only the necessary information is transmitted, reducing power consumption and cost. This specialized combination of reliable WSN communications, intelligent sensing, and low power consumption makes wireless solutions like SmartMesh well-suited for placement almost anywhere in demanding industrial environments.

 

For more Inside IoT blogs click here.

In my last Blog, I promised a discussion on the various functional safety standards. As someone once said about standards, the great thing about standards is “that there are so many to choose from”.

 

IEC 61508 is what is referred to as an A level or basic standard. It is meant to be non-application specific and to be a general standard. From it are derived sector specific standards such as ISO 26262 for automotive or IEC 62061 for machinery. These sector specific standards are referred to as level B standards. The bottom tier of standards are level C standards and apply to specific pieces of equipment.

 

 

There are also some standards such as ISO 13849 or the avionics standards such as DO-254/DO-178C which are not derived from IEC 61508, but if you look at the table of contents in any of these you will note that they cover all the same areas and topics as IEC 61508. Some of these standards, such as ISO 13849, refer back to IEC 61508 for complex technology or, in the case of the medical standards, for the detailed software techniques. Others, such as the robot safety standard ISO 10218-1, use SIL and PL from IEC 61508 and ISO 13849 to specify the safety integrity requirements.

 

Standards are published by various groups including ISO, IEC, ISA, IEEE, UL, CENELEC and many others. The ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) are the two main international standards organizations, and their members are the main standards bodies of each country. For instance, in Ireland the member is the NSAI (National Standards Authority of Ireland). Each national standards body can then nominate experts to take part in drafting and reviewing the standards. The group dealing with IEC 61508 is split into IEC TC 65/SC 65A/MT 61508-1/2 and IEC TC 65/SC 65A/MT 61508-3. These standards are meant to be developed by consensus and are therefore referred to as consensus standards. A criticism of this approach is that some people interpret the standards as being the minimum necessary on the basis that this was “all the committee could agree on”. There is some merit in this criticism in that compliance is the minimum you are required to do and in many cases it is also the most you are “required” to do. If consensus cannot be reached then sometimes a standard is not published and a technical specification is issued instead. Within a standard such as IEC 61508 some of the parts will be normative and some of the parts will be informative. Normative parts contain the actual requirements of the standard and the informative parts give guidance on how to apply the normative parts.

 

The standards can be difficult to read and legalistic as shown below and I would advocate reading a good book on the topic if you want to get an overview of the topic. In a future blog, I will feature a functional safety book review. If you do insist on wanting to read the standards they cost in the region of $250/Euro 250 per standard and can be bought directly from the IEC, ISO or your national standards body (note – IEC 61508 is in 7 parts and ISO 26262 is in 10 so buying all the parts will cost upwards of Euro 2000).

 

 

Most standards also include the idea of tailoring, whereby the standard needs to be interpreted depending on the task in hand and the non-relevant bits can be skipped. As Mike Miller, a functional safety expert, told us during a functional safety training course, “Functional safety should be common sense written down”. When tailoring a standard, you should record the reasons for your decisions as to why you are skipping bits. If you don’t write down your reasons you could be accused of being negligent. If you write down your reasons for not performing some of the actions required then you are at worst stupid.

 

Sometimes the standards bodies cooperate and a standard can carry multiple names, such as ISO/IEC/IEEE 15288:2015 on systems and software engineering.

 

Complying with the standards is not normally legally necessary. However, it can be, and things like the machinery directive within the EU insist that all machines must be designed to “state of the art”. Complying with IEC 61508 and ISO 13849 gives evidence that you followed a state of the art development process. Compliance with standards such as IEC 61508 can also be put forward as part of the defence case if a company is sued, as it shows you have followed state of the art.

 

Video of the Day: I normally try to pick an entertaining video as the video of the day, this one is a bit alarmist but gives an idea of the importance of complying with the necessary standards - https://www.youtube.com/watch?v=5VQBl4PLVSY

 

Next Time: The discussion will be a more detailed look at IEC 61508 and the life cycle it advocates.

 

Notes: For more on level A, B and C standards see ISO 12100

 

Enjoying the Safety Matters series? Tell us by liking the blog posts or commenting below. You may find more Safety Matters blogs here.

In a sport where victory and defeat are often separated by 1/100th of a second, it was surprising when both the German and Canadian two-man bobsled teams won gold medals at the PyeongChang 2018 Winter Olympics after finishing with exactly the same times. In fact, the top five teams were separated by just 0.13 seconds. Which is roughly the blink of a human eye.

 

Precise measurement is absolutely essential for many Olympic events, including bobsled, skeleton, and luge. And luge pushes the measurement limits even further by scoring speed down to 1/1000 of a second.

 

We’re no strangers to precision measurement at Analog Devices. Or to doing so for Olympic sports. ADI associate design engineer Tom Westenburg was the Principal Engineer for the US Olympic Committee’s Sports Science division. He spent 18 years with the USOC before joining Linear Technology Corporation (LTC) and now ADI.

 

I had a chance to talk with Tom about some of his experience related to timing and scoring, as well as improving athletic performance.

 

You discovered a flaw with the timing systems used at many of the sliding tracks and came up with a solution. What was the flaw?

 

Bobsled and luge tracks use optical sensors at the start and finish. They use modulated light sources so they aren’t  affected by changes in the surrounding lighting, such as when a cloud passes in front of the sun.

 

So these lights are looking for a specific modulation rate, say 100 Hz. But when the athlete breaks that beam, it really matters as to where that light is in terms of its flash cycle. As a result, you can end up with a random 10 millisecond error at the start and the finish. Now, you might think, “Well 10 milliseconds at the start and the finish will just cancel each other out.” But because they were random, an unlucky athlete could have them both work against him, adding 10ms to his time. Each light can have an error of 0ms to +10ms, so the maximum is 10ms not 20ms. And luge is a sport that’s measured down to the millisecond, so it needed to be much better than this to fairly judge every athlete.

 

So how did you fix the issue?

 

We wanted to increase the modulation rate as high as we could. We found a few commercially available lights that would work, and did some lab testing. One had a modulation rate of 20 kHz and looked great in the lab, but it had too many false triggers on the track. Around 9.4 kHz gave us the best overall performance. It was lower than I had planned, but it was still much better than the 200 to 700 Hz lights that most other tracks were using at that time.
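For readers who like to see the arithmetic, the effect of the modulation rate on the random timing error can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: each light adds a delay spread uniformly over one modulation period, and the fixed extra delay from the missed-pulse count is ignored because it is the same at the start and the finish. The function names are invented for the example.

```python
import random


def modulation_period_ms(modulation_hz):
    """One flash cycle of the timing light, in milliseconds."""
    return 1000.0 / modulation_hz


def simulated_run_errors_ms(modulation_hz, runs, seed=0):
    """Simulate the net timing error of `runs` runs down the track.

    Assumption: the start and finish lights each add an independent delay
    drawn uniformly from zero to one modulation period. The error in the
    run time is the finish delay minus the start delay.
    """
    rng = random.Random(seed)
    period = modulation_period_ms(modulation_hz)
    errors = []
    for _ in range(runs):
        start_delay = rng.uniform(0.0, period)
        finish_delay = rng.uniform(0.0, period)
        errors.append(finish_delay - start_delay)
    return errors


for hz in (100, 9400):
    errors = simulated_run_errors_ms(hz, runs=10_000)
    worst = max(abs(e) for e in errors)
    print(f"{hz} Hz: period {modulation_period_ms(hz):.3f} ms, "
          f"worst simulated error {worst:.3f} ms")
```

At 100 Hz the period is 10 ms, so the error can never exceed 10 ms either way. At 9.4 kHz the period shrinks to roughly a tenth of a millisecond.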

 

As a side note, when a light beam is broken, typically three pulses must be missed before it counts as a valid break. The random part is when the athlete enters into the pulse cycle. The second and third pulses add a slight delay, which is equal at the start and finish, so it doesn’t affect accuracy.

 

Then after we had an acceptable timing light, I wanted to make sure it was accurate end-to-end. At this point I needed some expertise and some help. I got in touch with the time frequency expert at NIST and got access via satellite to the NIST Cesium Fountain atomic clock, one of the most accurate clocks in the world. We then built a system that had super high-speed beryllium shutters used for pulsing lasers in surgical applications. We had a set of shutters with a satellite receiver at the start and at the finish. These could be programmed to break the timing light beam with 100ns resolution. The error, including the shutters, was around 50-100us. Without the satellite setup it would have been difficult to accurately test a track that is almost a mile long. In the past, the system timer would be verified in a calibration lab, but not with the timing-lights and a mile of cabling attached. That is a lot easier than an end-to-end test. As far as I know the 2002 Salt Lake games were the only ones ever tested to this level.

 

You had an interesting experience with luge sliders taking advantage of the timing system, didn’t you?

 

Yes, that’s kind of how I got involved in all of this. Many of the older tracks were using retroreflective timing lights. The transmitter and receiver were on the same side of the track with a reflector on the other side. So the light from the transmitter would reflect back to the receiver. Once the beam was broken by the feet of a luge slider passing through, the timer would start.

 

As it turned out, some athletes had suits made of a highly reflective material and a matte black helmet. The suit would reflect the beam back to the receiver and the sensor would not record a break in the beam until the athlete’s head passed through. So the slider was getting basically a full body-length head start, which could be over 200ms (i.e. -250ms + 50ms = -200ms). In a sport won and lost in thousandths of seconds, this was a huge advantage.

 

Of course, they had to be completely flat on the luge for the helmet to trip the light, and they weren’t always doing that. So there’d be these weird instances when the timer never started, and that raised some eyebrows within the sport.

 

So they came to us and we looked into it, and with some time and head-scratching we figured out what was happening. Now, nearly all the tracks use a transmitter on one side and the receiver on the other.

 

You were also involved in helping athletes improve their performance as well.

 

Yes. One example was the U.S. bobsled team. Like luge, you’re looking for any way to shave a tenth of a second or more off the run. The start is very important and it can win or lose the race. We focused on how the two-man and four-man team members pushed and loaded into the sled. The goal was to get the sled going as fast as possible with a clean load going into the first timing light, which is where the timing of the run begins.

 

We had a real sled, but it was a dry-land sled with wheels instead of runners. We used photo-electric sensors on the wheels to measure distance and velocity, and strain-gauges in each of the push handles to measure force. In fact, we used AD626 amps to amplify the strain gauges.

 

An athlete’s excitement, especially at an event such as the Olympics, can cause him or her to push a bit longer than they should. If the first three athletes in a 4-man sled team do that and delay their load-in, it can cause the brakeman to have to run beyond the point where he/she is applying propulsive force to the sled. They then have to pull themselves into the sled. All of which can cause a poor load-in and slow the sled going into the first timing light.

 

Using that system, we could calculate where they were on the track and when they were loading. The system transmitted the sled data and mixed in a live video of the athlete to a coach’s laptop. We’d display a force profile on top of that and calculated other parameters which indicated the quality of the push and load. So teams knew how well they were pushing and loading. We wanted each team to have the optimal start burned into memory and not deviate from it. This real-time feedback enabled athletes to find that optimal point by making corrections when what they just did was fresh in their minds. Previously, it would take days to process the data, but by then, it was hard for an athlete to remember what they did, and so it would be almost useless in terms of making an effective correction.

 

The US four-man team is competing in a few days. Thanks for giving us a unique look at some of what goes on behind the scenes to enhance the precision of their performance.

 

You’re welcome.

As part of the evolution of the IoT, more information is needed than simply increasing the number of sensors in a system and measuring more modalities. I’ve previously spoken about clever partitioning of systems and breaking an IoT system into what needs to happen at the edge (sensor or gateway) and what happens in the cloud. Extreme IoT requires going beyond stationary, connected and sometimes dumb sensors, irrespective of how much data they produce. What if the target is mobile? The complexities of an Internet of Moving Things go beyond simple data collection into how to track and measure intelligently. However, the information about “when” a measurement happens can be almost as important. Machine learning about what conditions signify can only be gained if you can synchronize an event to a time stamped set of data.

 

It’s only then that the real magic of IoT (moving data into value or wisdom) can be unlocked.

 

Time Sensitive Data in Industrial Ethernet

Two critical elements that come into play for industrial applications are the need for guaranteed “on-time” reliable data delivery and accurate time-stamped data for event sequencing and process analysis. When the data absolutely, positively has to be there at the right moment, deterministic networking can enable everything from motion control applications to process control and factory automation applications. Time-stamped data can be used in algorithms to reveal trends across the factory that deepen the value of the information.

 

At the 2017 IoT Solutions World Congress in Barcelona, several ideas were presented to show the value and readiness of Time-Sensitive Networks (TSN) and new IEEE standards to support real-time control and synchronization of machinery and processes. The vision was to enable flexible manufacturing for Industrial IoT and Industry 4.0 through deployment of open, standard deterministic networks in production facilities. Analog Devices was one of 17 partners behind an award-winning solution.

 

From Rant to Reality:

As the IoT evolves at such a rapid pace, solutions should be engineered with the flexibility to meet the current and future requirements. For example, ADI’s fido5000 Real-time Ethernet, Multi-protocol (REM) switch is designed for all of today's major Industrial Ethernet protocols and its configurable blocks will also make it easier to support future IEEE 802.1 TSN protocols.

In my last Blog, I posed the question “What are 3 key requirements for a safety integrity level?”.

 

A functional safety standard such as IEC 61508 runs to over 700 pages across 7 parts. However, the requirements can be summarized under 3 key requirements

 

  • Requirement 1: Have good reliability
  • Requirement 2: Be fault tolerant (even though you have good reliability, failures will still happen) 
  • Requirement 3: Prevent design errors (not all system failures are due to hardware failure) 

 

Requirement 1: Most people would accept that while having good reliability doesn’t guarantee safety it is at least a good first step. Reliability is measured in FIT (failures per billion hours of operation). Reliability predictions can be based on field experience or predictions using systems such as IEC 62380, SN29500 or the FIDES guide. The allowed dangerous failure rate will depend on the SIL with 10000 FIT for SIL 1, 1000 for SIL 2, 100 for SIL 3 and 10 for SIL 4.
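The FIT limits quoted above lend themselves to a quick sanity check in code. The following Python sketch uses only the numbers from the paragraph (1 FIT is one failure per billion hours, and the per-SIL dangerous failure limits). The function names and the idea of passing a budget fraction for a single component are my own illustration, not something taken from the standard.

```python
# One FIT is one failure per 1e9 hours of operation.
FAILURES_PER_HOUR_PER_FIT = 1e-9

# Maximum allowed dangerous failure rate per SIL, in FIT, as quoted in the text.
MAX_DANGEROUS_FIT = {1: 10_000, 2: 1_000, 3: 100, 4: 10}


def fit_to_failures_per_hour(fit):
    """Convert a FIT value into failures per hour."""
    return fit * FAILURES_PER_HOUR_PER_FIT


def highest_sil_for(dangerous_fit):
    """Highest SIL whose limit the given dangerous failure rate stays under."""
    for sil in (4, 3, 2, 1):
        if dangerous_fit < MAX_DANGEROUS_FIT[sil]:
            return sil
    return None  # too unreliable even for SIL 1


def component_budget_fit(sil, fraction):
    """FIT budget for one component given a hypothetical share of the limit."""
    return MAX_DANGEROUS_FIT[sil] * fraction


print(highest_sil_for(50))            # a 50 FIT dangerous failure rate
print(component_budget_fit(3, 0.01))  # an IC given 1% of the SIL 3 limit
```

A 50 FIT dangerous failure rate sits under the SIL 3 limit of 100 FIT, and an IC given 1% of that limit would have to stay under roughly 1 FIT.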

 

ADI publishes the die FIT for all released products at www.analog.com/ReliabilityData. The data is presented using a tool which allows the average operating temperature to be entered and gives the reliability predictions at the 60% and 90% confidence levels. The numbers presented below are based on accelerated life testing.

 

 

Most equipment suppliers are interested in reliability, but functional safety insists on it with specific limits depending on the required safety level for the allowed probability of dangerous failure. It also offers means to enhance it using techniques such as derating and architectures such as MooN which are topics for future blogs.

 

Requirement 2: If you accept that no matter how good the reliability the system will still fail, then ways to cope with this failure include diagnostics and redundancy. Diagnostics detect that a failure has occurred and take the system to a safe state. Redundancy implies that there is more than one system capable of performing the safety action and that even if one failure occurs there is another redundant piece of equipment which will maintain safety. In IEC 61508 the diagnostic coverage figure of merit is the SFF (safe failure fraction). SFF gives credit for safe failures and detected dangerous failures. For SIL 1 a minimum SFF of 60% is required, for SIL 2 90% and for SIL 3 99%. It is allowed to trade off redundancy (HFT) for SFF so that a SIL 2 safety function can be implemented with two channels each having 60% SFF. At the IC level parts such as the AD7124 feature lots of diagnostics which can be used to detect both internal and system level failures. On-chip diagnostics include reference inputs such as 0V, +/-full-scale and +/-20mV and state machines to detect internal bit flips. System level diagnostics include transducer burnout current sources.
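To make the SFF idea concrete, here is a small Python sketch. It uses the usual definition, in which safe failures and detected dangerous failures count in the numerator and all failures count in the denominator. The function names and the failure rates in the example are invented for illustration.

```python
def safe_failure_fraction(lambda_safe, lambda_dd, lambda_du):
    """SFF = (safe + dangerous detected) / (all failures).

    All three rates must be in the same unit, for example FIT.
    lambda_dd: dangerous failures caught by diagnostics
    lambda_du: dangerous failures not caught by diagnostics
    """
    total = lambda_safe + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_safe + lambda_dd) / total


def diagnostic_coverage(lambda_dd, lambda_du):
    """Fraction of the dangerous failures that the diagnostics detect."""
    dangerous = lambda_dd + lambda_du
    if dangerous <= 0:
        raise ValueError("dangerous failure rate must be positive")
    return lambda_dd / dangerous


# Hypothetical part: 40 FIT safe, 55 FIT dangerous detected,
# 5 FIT dangerous undetected.
sff = safe_failure_fraction(40, 55, 5)
dc = diagnostic_coverage(55, 5)
print(f"SFF = {sff:.0%}, diagnostic coverage = {dc:.1%}")
```

Note that the SFF (95% here) comes out higher than the diagnostic coverage (about 92%), because safe failures earn credit too.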

 

 

Requirement 3: In IEC 61508 functional safety refers to the measures taken to prevent the introduction of design errors as the systematic safety integrity of the item. These measures are necessary since, no matter how good your reliability and despite your built-in hardware fault tolerance, you must recognize that a system can fail to carry out its safety related task without any hardware failures. The causes of such failures might include missed or forgotten requirements and improper verification or validation. Software coding errors are considered systematic errors because they are not caused by failures per se, as typically the system is operating as designed. Harder to accept is that EMI (electromagnetic interference) failures are also considered systematic failures, because once again the system hasn’t failed as such but rather was not built with enough robustness. Measures advocated by IEC 61508 to prevent the introduction of systematic errors include things like coding standards, design review, verification plans, safety plans, checklists, requirements management and many more.

 

Video of the day – https://www.youtube.com/watch?v=QxG41aFl5Ns (the excuse for including this video is that it vaguely relates to determining customer requirements).

 

For the next time -  Name some functional safety standards?

 

Click here to read more Safety Matters blogs.
