Functional Safety Engineer: The Job that 'certifies' self-driving cars
In 2019, I was an Autonomous Shuttle Engineer, working for a company that got a thrilling opportunity: to equip Paris' transportation system with our autonomous shuttles. This was a golden opportunity many don't get, but the client was known to be a ruthless selector. Many others had perished while trying to be "approved".
With high hopes, our team prepared for demo day for months. We meticulously reviewed the client's 100+-point checklist, ensuring our shuttle met every requirement, from real-time operations to autonomy measures. One day, a team of 5 was called to a secret underground site. The process was about to begin.
The experimentation lasted days, during which each item was reviewed. Then came the final test: Cyber-Security. The client made a phone call, and within 30 seconds, an engineer with a ThinkPad entered the shuttle. "Oh great! We can charge our phones!" he said, amused. "What a mistake!" My colleagues were sweating, horrified at the vision of what this young man could do... and they were right: in just five minutes, using only a USB stick, he had taken control of the vehicle and made it drive all across the room. The room went silent, as everyone realized our chance had slipped away.
Checkmate.
Many engineers join the self-driving car world for the same reasons I did: it's exciting, it's interesting, it's fascinating, it's impactful, it's just... wow. Yet, nearly all the engineers who are still in the "learning" group and have never joined a real self-driving car company have absolutely zero vision of what it takes to certify a vehicle. We could talk about cyber security, but also the automotive level, the software level, and more...
So in this post, I will try to introduce you to the concept of safety, from an autonomous tech engineer's point of view. This means this post isn't for expert functional safety engineers, but for those who want an introduction.
Let's begin with the fundamentals:
What is Functional Safety?
Functional Safety is about making sure machines and systems stay safe, even if something goes wrong. For example, in self-driving cars, it means making sure the car can still drive safely if a part stops working. It can mean verifying that an algorithm works under all conditions, but also that it's never going to crash, and that if it does, the system has a backup.
To make it work, we use functional safety standards that determine what is safe to include in a self-driving car by evaluating the potential risks associated with each function and scenario. You can therefore understand the entire point of functional safety:
To reduce risk to an acceptable level.
Okay, but this shouldn't be your job, right? It's someone else's problem! So you may wonder...
Why should I bother learning about Functional Safety?
Let's say you decide to build an autonomous tech startup and run your algorithms. Some are open source, some are designed by you. You decide that these are good algorithms, the accuracy is near perfect, and you're a brutal C++ coder. There is no way you missed anything. Let's even pretend you really ARE a super-hero and really, the system is perfect...
You convinced me... but can you convince recruiters? Or your management? Or the suits granting your startup a self-driving permit? Hey, you can't even test without that permit. No matter how good your system looks, you will need to convince the state to issue you a permit. It can be the State of California, or the Ministry of Transport, or whoever issues authorizations.
The problem? They are NOT experts in safety or self-driving cars. So they will ask you to go through independent organizations that run functional safety certification programs. Bodies like TÜV Rheinland and TÜV SÜD (Germany) are the ones 'certifying' you. They verify your safety functions, especially the safety-critical ones (like emergency braking), and run all kinds of tests before issuing you the certification.
Their job is to verify you are compliant with the industry norms.
But which norms are we talking about?
What are the different functional safety norms used?
When we say we want to "reduce risk to an acceptable level"... What is an acceptable level? Are you the one defining it? If an object detector works at 95%... is this okay? No? Yes? Who defines it? If your blinkers fail once every 300,000 miles... is this fine? Or does it need to be every 3 million miles?
You can't be the deciding entity; this is what norms and industry standards are for. For example, ISO 26262 is a norm. It focuses on electronics (buttons, A/C, windows, sensors, computers, ...) and defines a complete process to develop & test your cars. It also tells you how to test scenarios, how to grade the risk of any event, and how to reduce that risk.
Let me share some norms we use in the industry:
- ✅ ISO-26262 is the norm that focuses on failures in electronic and software systems. It's going to deal with the question "What happens if the object detector crashes mid-drive? Is there any backup?" Based on how your system is implemented, you will comply more or less with the norm.
- ✅ ISO-21448 verifies the Safety Of The Intended Functionality (SOTIF). It ensures perception systems like LiDAR, cameras, and object detection perform safely in all conditions. "Is your object detector working on all pedestrians? Really? Even in the dark?"
- ✅ ISO-21434 is the norm focused on the cyber-security of the system. It addresses my USB-stick story, and tells you everything you need to do to protect your system against cyber attacks.
- ✅ A-SPICE is focused on how your project is coded, tested, and maintained. This means the requirements, the modular and maintainable code, the coding standards & reviews, the software testing, software versions and revisions, bug fixing, lifecycle of the product, etc...
- ✅ UNECE WP.29 Regulations cover compliance with EU autonomous driving laws. You need at least this one to be allowed to drive autonomously.
- and more... depending on what you want to certify.
While these are not mandatory, the more of these norms you check, the safer you'll look.
So comes a question:
How do you know if your robot complies with Functional Safety norms?
There are TONS of ways to do this, and it's really a profession, but let me share with you 2 important functional safety concepts:
- The V-Model
- The Functional Safety Process to "certify" a function
The V-Model
The V-Model is a widely used framework in functional safety management and software development. You will find it when trying to comply with ISO26262, but also A-SPICE for example. It is structured like a "V," where the left side represents the concepts/requirements/design phase, the bottom part is the coding phase, and the right side corresponds to the validation/integration/testing phase.
You can see it as a continuous process, where you continuously verify that your system behaves as intended by the concept phase. If not, you rework it. It's evolving, it's alive, and it promotes a systematic approach to achieving functional safety in safety-related systems.
In most companies that seriously want to comply with the ISO norms and get the functional safety accreditation, using the V-Model is the best starting point.
Next:
The Functional Safety Process to "certify" a function
As we said, we have ISO 26262 focusing on electronics, SOTIF focusing on algorithms, and A-SPICE focusing on code/software. Each of these uses the V-Model. Then, to comply with these norms, you'll need a "process". This means clearly defining what each of these phases is.
Here is a 7-Step process:
The job of a functional safety engineer is to implement this. This is the "bridge" between systems and production I was telling you about earlier.
Let me briefly define each element: (credit to a client of Think Autonomous named Mayur Wagchoure for helping me write this one)
1. Define the System
First, we want to define the system we're testing. For example, lane detection. We want to define the purpose, the scope, the dependencies, and even the normal and edge cases.
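To make this concrete, here is a minimal sketch of a system definition captured as structured data. The field names and the lane detection values are hypothetical examples I made up for illustration, not an ISO 26262 template.

```python
from dataclasses import dataclass, field

# Hypothetical structure mirroring the items above: purpose, scope, dependencies, cases.
@dataclass
class SystemDefinition:
    name: str
    purpose: str
    scope: str
    dependencies: list[str] = field(default_factory=list)
    normal_cases: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)

lane_detection = SystemDefinition(
    name="Lane Detection",
    purpose="Estimate lane lines so the planner can keep the vehicle centered",
    scope="Urban and highway roads, day and night",
    dependencies=["front camera", "compute unit", "HD map (optional)"],
    normal_cases=["clear markings", "gentle curves"],
    edge_cases=["faded markings", "snow covering the lane", "sun glare"],
)
print(lane_detection.edge_cases)
```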
2. HARA: Hazard Analysis and Risk Assessment
The second point is HARA, in which we want to do:
- HA — Hazard Analysis (what could go wrong?)
- RA — Risk Assessment (how bad would that be if it went wrong?)
Hazard Analysis
If you want to comply with functional safety standards, the first thing you'll need to do is account for the different scenarios. I group them into 4 main categories: Car Status, Scenario, Environment, Driving Status.
Your car could be turned on, driving on a country road, in rainy conditions, at low speed. Or you could drive at high speed, and accelerate. Or suddenly brake. Or drive on dry roads. Or wet roads. Defining categories for each of these is a way to avoid the rookie mistake of only testing in one condition (summer) and being surprised by another (winter).
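To give you an idea of how fast this grows, here is a minimal sketch that enumerates every combination of categories. The category lists are hypothetical; the real ones come out of your own hazard analysis workshops.

```python
from itertools import product

# Hypothetical category lists, one per section described above.
car_status = ["parked", "starting", "driving", "charging"]
scenario = ["driving straight", "turning", "braking", "accelerating"]
environment = ["dry road", "wet road", "snow", "fog", "night"]
driving_status = ["low speed", "urban speed", "highway speed"]

# Enumerating every combination is how you avoid the "only tested in summer" blind spot.
situations = list(product(car_status, scenario, environment, driving_status))
print(len(situations))  # 4 * 4 * 5 * 3 = 240 situations to review
```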
Risk Assessment
To "grade" each function, you then use the formula defined by ISO26262: Risk = Severity * Exposure * Controllability.
For example:
- I am testing the emergency braking function, and the risk that it doesn't activate (Severity = S3)
- I'm driving in an urban environment, at 30-60 km/h, which happens all the time (Exposure = E4)
- Urban areas have many pedestrians, so the situation is very hard to control (Controllability = C3)
Then what?
The ISO26262 provides what's called the ASIL (Automotive Safety Integrity Level) Table:
I am NOT going to describe how we do it in this article, but the "RA" phase is about assigning, for every single function and every single scenario, what's called an ASIL level. These can be A (safe), B (safe), C (risky), or D (risky). We're trying to see, for each function: is it risky or safe?
For example:
If you're testing an emergency braking system, in a highway scenario, with wet road, snow, and fog... you can imagine it's an ASIL-D score. Now if you're on the same scenario, but testing the radio, it's probably A or B.
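To show how the three factors combine, here is a simplified sketch of ASIL determination. It is NOT the normative ISO 26262-3 table: it uses the common shortcut of summing the S/E/C indices, which mirrors the official table, but you should always check against the standard for real work.

```python
# Simplified ASIL lookup: sum the S, E, C indices (S1=1..S3=3, E1=1..E4=4, C1=1..C3=3).
ASIL_BY_SUM = {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}

def asil(severity: int, exposure: int, controllability: int) -> str:
    total = severity + exposure + controllability
    return ASIL_BY_SUM.get(total, "QM")  # sums of 6 or less fall back to Quality Management

# The emergency braking example above: S3, E4, C3
print(asil(3, 4, 3))  # ASIL D
# The radio in the same scenario, rated for example S1, E4, C2
print(asil(1, 4, 2))  # ASIL A
```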
3. Set Safety Goals
From every potential hazard and risk we have, we want to turn this into a safety goal. Basically, turn the failure into an opportunity to design a better system. If I have just one LiDAR, and it performs poorly in snow, could I have a better LiDAR and a camera instead?
Here, we will create a list of requirements for the new system. It's still the "concept" phase, where we identify the breaking points and turn them into a better solution. This is the work where you try to think about reducing risk to an acceptable level.
4. Functional Safety Analysis
Then, we implement things like FMEA (Failure Mode and Effects Analysis) to assess potential failure causes, effects, and mitigation strategies. We can also run FTA (Fault Tree Analysis) to explore how faults propagate and lead to hazards. We want to identify all causes of errors.
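As a rough illustration of what an FMEA worksheet captures, here is a tiny sketch using the classic Risk Priority Number (RPN = Severity * Occurrence * Detection). The RPN is a generic FMEA convention rather than something mandated by ISO 26262, and the failure modes and ratings below are purely made up.

```python
# Tiny FMEA sketch: each entry is (failure mode, effect, severity, occurrence, detection).
failure_modes = [
    ("camera frame drop",     "object detector stalls",  8, 4, 3),
    ("LiDAR blinded by snow", "missed obstacle",         9, 3, 5),
    ("CAN message corrupted", "wrong braking command",  10, 2, 4),
]

for mode, effect, sev, occ, det in failure_modes:
    rpn = sev * occ * det  # higher RPN = attack it first with a safety mechanism (step 5)
    print(f"{mode:25s} -> {effect:25s} RPN = {rpn}")
```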
5. Design Safety Mechanisms
Then, we introduce mechanisms to detect, isolate, or prevent failures (e.g., redundancy, diagnostics, fail-safe systems). This can be watchdog timers, dual-channel systems, degraded operational modes, ...
For example, one of the Functional Safety methods is to implement redundancy (related to what ISO 26262 calls ASIL decomposition). If you have an ASIL-D component (unsafe), you could turn it into 2 independent ASIL-B ones (somewhat safe). This way, your overall ASIL score is better, and you become compliant.
In this example, we could imagine that the second LiDAR is different, or that the algorithms behind it are more "deterministic", don't use AI, and therefore are safer. The goal of functional safety is to try and reduce as many components to ASIL-A and ASIL-B as possible. >>> This is the acceptable level.
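Here is a minimal sketch of what a dual-channel mechanism with a degraded mode can look like. The two estimation functions are hypothetical stand-ins for two independent implementations (different sensors, different algorithms), and the 2-meter tolerance is an arbitrary assumption.

```python
MAX_DISAGREEMENT_M = 2.0  # assumed tolerance between the two channels

def estimate_distance_lidar() -> float:
    return 18.4  # placeholder for the LiDAR-based channel

def estimate_distance_camera() -> float:
    return 18.9  # placeholder for the camera-based channel

def front_obstacle_distance() -> tuple[float, str]:
    lidar = estimate_distance_lidar()
    camera = estimate_distance_camera()
    if abs(lidar - camera) <= MAX_DISAGREEMENT_M:
        return min(lidar, camera), "nominal"   # channels agree: keep the safer (closer) value
    return min(lidar, camera), "degraded"      # disagreement: keep the safer value, slow down, warn

distance, mode = front_obstacle_distance()
print(distance, mode)
```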
6. Validation and Verification
How do we test? This can be field tests, but also simulations, hardware-in-the-loop (HIL), and fault injection testing. You can also test the Safety Of The Intended Functionality (SOTIF) here: how performant is your algorithm? Is it really THAT good?
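For instance, a fault-injection test can be as simple as the sketch below (pytest style). Perception, inject_sensor_dropout, and is_in_safe_state are hypothetical names; the pattern is what matters: inject the fault, step the system, and assert it degrades safely.

```python
class Perception:
    def __init__(self):
        self.camera_ok = True
        self.state = "nominal"

    def inject_sensor_dropout(self):
        self.camera_ok = False  # simulate the camera dying mid-drive

    def step(self):
        if not self.camera_ok:
            self.state = "minimal_risk_maneuver"  # e.g. slow down and pull over

    def is_in_safe_state(self) -> bool:
        return self.state in ("nominal", "minimal_risk_maneuver")

def test_camera_dropout_triggers_safe_state():
    perception = Perception()
    perception.inject_sensor_dropout()
    perception.step()
    assert perception.is_in_safe_state()  # run with: pytest
```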
Finally:
7. Iterate, Validate, and Document
You want to iterate, improve, and document your safety analysis results. In the end, it's a very technical job, but one with a lot of paperwork: documentation, diagrams, schematics, grading... because these are the papers that get you authorizations.
We have now seen:
- What is functional safety?
- What are the different norms we should comply with?
- How do we comply with these norms? (overview)
Let's see an example:
Example: Mobileye's Primary Guardian Fallback / "True Redundancy" System
Mobileye, Intel's self-driving car company, has a very strong functional safety focus. Their algorithm has 3 distinct channels that are completely different:
The lane detection is the main channel used to find lane lines. This can work, for example, with modular deep lane detection. It is verified with HD Map Extraction & Localization. If they agree, then we're good; if they don't, they'll extract the lanes from a parallel end-to-end deep learning algorithm that acts as the "judge" or guardian.
Do you realize how many algorithms are running in parallel? They implemented these automatic protection functions in case of failure. They also implemented these safety requirements across the entire system, meaning the electronic systems, the software components, and so on...
When doing something like this, it's very important that each function is run using a separate method, possibly with a separate computer, separate sensors, etc... so that there cannot be a single point of failure (for example, if everything uses the same camera, and this one fails, it's not functionally safe).
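To make the pattern tangible, here is an illustrative sketch of a "guardian" arbitration between three independent lane estimates. This is NOT Mobileye's actual code; the function names, the tolerance, and the lane representation (lateral offsets in meters) are assumptions for the example.

```python
def lanes_agree(a: list[float], b: list[float], tol: float = 0.3) -> bool:
    """Compare two sets of lane-line lateral offsets (in meters)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def fuse_lanes(deep_lanes, map_lanes, guardian_lanes):
    if lanes_agree(deep_lanes, map_lanes):
        return deep_lanes      # primary and map channels agree: we're good
    if lanes_agree(guardian_lanes, deep_lanes):
        return deep_lanes      # the guardian sides with the deep detector
    if lanes_agree(guardian_lanes, map_lanes):
        return map_lanes       # the guardian sides with the map channel
    return None                # no agreement at all: trigger a degraded mode

print(fuse_lanes([-1.8, 1.7], [-1.75, 1.72], [-1.8, 1.68]))
```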
Wait... Does everybody really do all of this?
No.
In fact, many startups don't have a functional safety team, or even a safety process in place. In that case, they focus on the safety-critical systems first, while waiting for the certification process. Some also operate in a more favorable state/country that gives permits more easily (to encourage innovation and let startups work on the technology).
It's important to understand that complying with ISO norms is NOT mandatory. In the European Union, you need to comply with the UNECE WP.29 Regulations (traffic laws) but I don't think the ISO norms are mandatory.
In fact, Tesla doesn't comply with the norms, and they are approved to drive in the streets. They sell cars, and they even sell cars with self-driving features all across the world. But you'll note that some of those functions, like FSD (Full Self-Driving), are currently (early 2025) NOT authorized everywhere, like in Europe, because they don't comply with all the norms.
Okay, okay, I think we have ENOUGH! Let's do a summary...
Summary & Next Steps
- Functional safety makes sure robots and algorithms operate safely, even when something goes wrong, by reducing risks to an acceptable level.
- Every engineer working in the field should be introduced to safety. This defines how you code, but also whether a startup gets authorizations to drive or not.
- Key functional safety norms include ISO 26262 for electronics, ISO 21448 for algorithms, ISO 21434 for cybersecurity, and UNECE WP.29 for EU compliance.
- The V-Model is a structured approach in functional safety management, shaped like a "V" going from concept/design down to coding and back up to testing/validation to achieve compliance.
- Functional safety follows a 7-step process that includes defining the system, hazard analysis, setting safety goals, and implementing safety mechanisms to ensure compliance.
- The ISO 26262 norm grades risks using Severity * Exposure * Controllability. An ASIL table then defines, for each function and scenario, which grade it gets.
- When something is risky (ASIL-C, ASIL-D), we introduce redundancy, diagnostics, and fail-safe systems to detect, isolate, or prevent failures, enhancing the overall safety integrity level.
- We want to test through simulations, field tests, and fault injection to ensure safety functions perform under all conditions, meeting the required safety standards.
Alright, I think we are good!
It's all in my App, along with 5+ hours of self-driving car content — available when you join my daily emails. Here is where you can learn more.