Can FSD hallucinate?

mimetz

I've come to understand that FSD, with its intense data gathering, training models, and Dojo neural networks, is a large AI application. ChatGPT and other better-known AI applications are known to sometimes hallucinate, to make mistakes, even to make up facts. My question is: can FSD behave similarly? Obviously if the answer is yes, that raises a big issue. But if the answer is no, why does the FSD system not share this behavior of other AI systems?

Thanks for educating me.
 
I've come to understand that FSD, with its intense data gathering, training models, and Dojo neural networks, is a large AI application. ChatGPT and other better-known AI applications are known to sometimes hallucinate, to make mistakes, even to make up facts. My question is: can FSD behave similarly? Obviously if the answer is yes, that raises a big issue. But if the answer is no, why does the FSD system not share this behavior of other AI systems?

Thanks for educating me.

My understanding is that yes, FSD can "hallucinate", just like any large model. I am sure Tesla does their best to try to minimize hallucinations with quality data and the right training, but the risk is always there. One question is whether the hallucination is safety critical or not. If it is not safety critical, then you can ignore it. You only need to worry about hallucinations that are safety critical.

The risk of safety critical hallucinations is one reason why FSD requires supervision. It is also why many experts argue that a pure vision end-to-end model cannot achieve the 99.99999% reliability needed to remove driver supervision and why some heuristic code is needed to serve as a guardrail against hallucinations.
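
To make the "guardrail" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical for illustration (the planner stub, the thresholds, the Control type); it is not Tesla's code, just the general pattern of hand-written checks wrapped around whatever a learned model outputs:

Code:
from dataclasses import dataclass

@dataclass
class Control:
    steering_deg: float   # requested steering angle
    accel_mps2: float     # requested acceleration; negative = braking

def learned_planner(camera_frames) -> Control:
    # Stand-in for the output of an end-to-end network.
    return Control(steering_deg=3.0, accel_mps2=1.2)

def apply_guardrails(plan: Control, min_obstacle_distance_m: float) -> Control:
    # Rule 1: never accelerate toward something closer than a hard threshold.
    if min_obstacle_distance_m < 5.0 and plan.accel_mps2 > 0.0:
        plan = Control(plan.steering_deg, -2.0)   # override with gentle braking
    # Rule 2: clamp steering to a physically sane range.
    steering = max(-45.0, min(45.0, plan.steering_deg))
    return Control(steering, plan.accel_mps2)

plan = learned_planner(camera_frames=None)        # no real frames in this toy
print(apply_guardrails(plan, min_obstacle_distance_m=3.0))
# Control(steering_deg=3.0, accel_mps2=-2.0): the rule vetoed the acceleration

The point of the pattern is simply that the rules are reviewable even when the model is not.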
 
I've come to understand that FSD, with its intense data gathering, training models, and Dojo neural networks, is a large AI application. ChatGPT and other better-known AI applications are known to sometimes hallucinate, to make mistakes, even to make up facts. My question is: can FSD behave similarly? Obviously if the answer is yes, that raises a big issue. But if the answer is no, why does the FSD system not share this behavior of other AI systems?

Thanks for educating me.

Give Tesla some time. FSD was only announced in 2016, just 8 years ago. Multiple articles have claimed Tesla has had the advantage of massive data collection, like telemetry, since even before 2016.

This great advantage will finally lead to the introduction of Robotaxi soon, right in 2024!
 
  • Funny
Reactions: Bikeman
My understanding is that yes, FSD can "hallucinate", just like any large model. I am sure Tesla does their best to try to minimize hallucinations with quality data and the right training, but the risk is always there. One question is whether the hallucination is safety critical or not. If it is not safety critical, then you can ignore it. You only need to worry about hallucinations that are safety critical.

The risk of safety critical hallucinations is one reason why FSD requires supervision. It is also why many experts argue that a pure vision end-to-end model cannot achieve the 99.99999% reliability needed to remove driver supervision and why some heuristic code is needed to serve as a guardrail against hallucinations.
Thank you diplomat33, this is very helpful for my understanding of FSD. Re your second paragraph, this issue would seem to not bode well for Musk's vision for FSD, no pun intended.
 
Thank you diplomat33, this is very helpful for my understanding of FSD. Re your second paragraph, this issue would seem to not bode well for Musk's vision for FSD, no pun intended.

Well, it all depends on how reliable Tesla is able to get their end-to-end model. Maybe they do get the hallucinations to be so rare that it does not matter? Or maybe Tesla discovers a way to eliminate AI hallucinations? A lot of work is going into trying to find a way to solve the problem of AI hallucinations. Or maybe Tesla does change their approach by adding some heuristic code or adding radar/lidar to protect against safety-critical hallucinations? Getting Tesla's approach to 99.99999% reliability will be very difficult, but not impossible, especially given enough time and the new AI innovations that are happening so quickly now.
 
  • Like
Reactions: JulienW
Also, since the end-to-end model takes in vision input and outputs control, I don't think the hallucinations are likely to be perception. So for example, I don't think FSD will hallucinate an entire bus that is not really there. More likely, the hallucinations will be in prediction-planning. If you think of how ChatGPT works, it takes in a text prompt and, based on its training, strings together words to form a response to the prompt. Hallucinations happen when ChatGPT strings together words that seem to match the training but in reality are completely wrong in terms of the text prompt. So for FSD, I think a hallucination would be similar. Based on its training, the e2e model would output a steering or braking command that seems to match the training but is actually completely wrong in that specific situation. For example, you train the model to move over to the left lane to pass a vehicle. FSD follows its training and moves over into what it sees as an open lane to pass the obstacle, but that "lane" is actually an unpaved section of a construction zone that is not drivable.
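
As a toy illustration of that kind of planning "hallucination" (purely made up, nothing like Tesla's actual network), picture a policy that just returns the action of the most similar training scene. A construction zone whose features happen to look like "left lane open" then gets a confident but wrong action:

Code:
import numpy as np

# Hypothetical scene features: [car_ahead, construction_markers, left_lane_occupied]
train_scenes = np.array([
    [1.0, 0.0, 0.0],   # slow car ahead, left lane open     -> change lanes
    [1.0, 0.0, 1.0],   # slow car ahead, left lane occupied -> stay and follow
    [0.0, 0.0, 0.0],   # clear road                         -> keep speed
])
train_actions = ["change_lane_left", "stay_and_follow", "keep_speed"]

def toy_policy(scene: np.ndarray) -> str:
    # Return the action of the nearest training scene -- a crude stand-in for
    # whatever a real network learns internally.
    idx = int(np.argmin(np.linalg.norm(train_scenes - scene, axis=1)))
    return train_actions[idx]

# Construction zone where the "open" left lane is actually unpaved: the faint
# markers (0.2) make it look almost exactly like "left lane open", so the
# policy confidently and wrongly chooses to change lanes.
print(toy_policy(np.array([1.0, 0.2, 0.0])))   # -> change_lane_left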
 
Thank you diplomat33, this is very helpful for my understanding of FSD. Re your second paragraph, this issue would seem to not bode well for Musk's vision for FSD, no pun intended.
Remember that people "hallucinate" as well. Optical illusions, tricks of light and shadow, etc. It's popular clickbait on social media. No vision system is perfect, so the key to getting FSD to be reliable is training, training and more training. Fortunately, Tesla has the data to do extensive training.



What follows is a lengthy description of how the problem of hallucinations is playing out, but the short form is that they seem to have vanished for the driving system, while they may remain for some other driver assistance features that use an older perception system known to sometimes hallucinate, such as in the graveyard video.



Earlier generations of Tesla's driver assists were divided into two major parts.

The first part was a neural network perception system that looked at the video coming from the cameras and came up with a description of the world around it. It was sufficiently detailed that Tesla could create a visualization of it. The perception system identified cars, pedestrians, lane lines and so on.

The second part was a set of hand-built rules written with traditional coding techniques that looked at the information coming from the perception system and decided what the car should do to drive around.

The perception system is what you see in that video. It was known to hallucinate, and would even cause the car to sometimes "phantom brake", which is to perform a hard brake for no apparent reason. It could scare the pants off drivers.

Now come forward to the latest generation of Tesla's driver assists: version 12. It is entirely neural networks. The system looks at the video coming in from the cameras and, through the magic of extensively trained neural networks, tells the car what to do. That system hasn't been demonstrating any behaviors that suggest hallucinations. That may be because the system isn't obliged to come up with cars, pedestrians and such. It doesn't have to do anything but drive the car correctly. It may be that asking for too many details of a neural network encourages it to fill in the blanks by making up stuff. ChatGPT is infamous for that.

With the latest generation, the car still shows the same visualization that shows ghosts in graveyards. That's because even today, Tesla has kept the older perception system for that visualization (and perhaps other things), which is a neat toy and a great marketing tool. But the driving system is not using that visualization information. When the car shows ghosts in the graveyard, the driving software doesn't care, because it decides on its own what's in the graveyard.

One other bit of possible confusion could be that there are still driver assistance features that use the older perception software. For example, when using the latest parking assist, people have occasionally reported that the car will show a phantom person standing in the parking spot, and the parking assist will refuse to drive into them. Given that the visualization is based on the old perception system, it suggests that the parking assist is also based on the old perception system.
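
For anyone who finds code easier to read than prose, here is a very rough sketch of the two architectures described above, with stub functions standing in for the real networks and rules. All names and values are illustrative, not Tesla's code:

Code:
def perception_net(frames):
    # Older perception network: video in, list of detected objects out.
    return [{"type": "car", "distance_m": 15.0}]

def handwritten_planner(objects):
    # Hand-coded rules over the perception output.
    if any(o["type"] == "car" and o["distance_m"] < 20.0 for o in objects):
        return {"accel_mps2": -2.0, "steering_deg": 0.0}   # brake for a close car
    return {"accel_mps2": 0.5, "steering_deg": 0.0}

def driving_net(frames):
    # v12-style end-to-end network: video in, controls straight out.
    return {"accel_mps2": 0.3, "steering_deg": -1.0}

def legacy_stack(frames):
    # Pre-v12: perception network feeds hand-written rules.
    return handwritten_planner(perception_net(frames))

def v12_stack(frames):
    # v12: one network from pixels to controls. The visualization (and,
    # reportedly, features like park assist) can still be fed by
    # perception_net even though driving no longer depends on it.
    return driving_net(frames)

print(legacy_stack(frames=None))
print(v12_stack(frames=None))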
 
  • Like
Reactions: Dewg
It’s not a valid comparison or question.

LLMs are designed to hallucinate; it's their main feature. Safety-critical systems like autonomous driving systems need to be correct. Fundamentally the systems are completely different, with different design objectives.

Imagine if someone got injured or died when autocorrect or LLMs messed up their "next word" prediction…



A big part of safety critical engineering is validation. There is currently no way to validate a machine learning system, and that’s a large part of why wide ODD autonomy is hard.

The systems can fail if you put a few pieces of tape on a sign, or if the sun is in a certain position when taking a curve for the 1000th time, and we don't currently understand when or why it will happen.

To answer your question: it's not likely that pure ML-based systems like FSD will be autonomous in a meaningful way this decade unless there are some breakthroughs in research. But it's not due to hallucinations. It's because they're unpredictable and unvalidatable, but I guess that's what you meant.
 
A big part of safety critical engineering is validation. There is currently no way to validate a machine learning system, and that’s why wide ODD autonomy is hard.

Is that problem unique to an ML based approach to self driving cars though?

You can't "validate" an explicitly hand-coded self-driving system either, in the sense of writing out a set of acceptance criteria, testing the system, and then declaring it "validated" if it passes the tests. I mean, you could do that, but it gives you very little real confidence in its real-world driving ability. The real world of driving is simply more complex and has more variety than is possible to distill down to a set of rules or behaviors you can validate.

Tesla's approach (for years!) has always been that the real validation is billions of miles of actual driving under human supervision. That's been the verbiage on their website, explicitly, for as long as I can remember and has been one of the core principles of how they've been developing FSD.

(Seems reasonable to me. I haven't seen anybody propose a serious alternative to testing the cars a lot in the real world, with varying approaches to how much human-supervised testing is needed and how large a geographic area they test in.)

To the original question, I'm also not quite sure what "hallucination" actually means for a car. Could we better define it as "car does something unexpected without an obvious trigger?" Do you have a better definition?

Under that definition, yes, 100%. I see a lot of little oddities from V12. V12 consistently drifts off center in lanes for no reason, sometimes even hitting yellow lines. It occasionally simply "forgets" to use its blinker in situations where it has worked fine in the past. Following distances to cars in front are strangely variable as well.
 
Also, since the end-to-end model takes in vision input and outputs control, I don't think the hallucinations are likely to be perception. So for example, I don't think FSD will hallucinate an entire bus that is not really there. More likely, the hallucinations will be in prediction-planning. If you think of how ChatGPT works, it takes in a text prompt and, based on its training, strings together words to form a response to the prompt. Hallucinations happen when ChatGPT strings together words that seem to match the training but in reality are completely wrong in terms of the text prompt. So for FSD, I think a hallucination would be similar. Based on its training, the e2e model would output a steering or braking command that seems to match the training but is actually completely wrong in that specific situation. For example, you train the model to move over to the left lane to pass a vehicle. FSD follows its training and moves over into what it sees as an open lane to pass the obstacle, but that "lane" is actually an unpaved section of a construction zone that is not drivable.
This is scary. We may see more of this as ADAS and AVs become more prevalent.

 
This is scary. We may see more of this as ADAS and AVs become more prevalent.

I guess? Maybe I spent too much time testing V11, which managed to abruptly and jerkily do the wrong thing many times without any AI involvement, so it doesn't really seem concerning to me. At least not any more than systems that have already driven many millions of miles on public roads.
 
With LLMs (and other multimodal AIs), aren't hallucinations mainly caused by all the inaccurate and downright bogus data on the web that they comb through? With Tesla, the data is video curated by Tesla, so this should reduce the chances of hallucinations because there shouldn't be much "garbage" data to confuse it.

 
Is that problem unique to an ML based approach to self driving cars though?

You can't "validate" an explicitly hand-coded self-driving system either, in the sense of writing out a set of acceptance criteria, testing the system, and then declaring it "validated" if it passes the tests. I mean, you could do that, but it gives you very little real confidence in its real-world driving ability. The real world of driving is simply more complex and has more variety than is possible to distill down to a set of rules or behaviors you can validate.

Tesla's approach (for years!) has always been that the real validation is billions of miles of actual driving under human supervision. That's been the verbiage on their website, explicitly, for as long as I can remember and has been one of the core principles of how they've been developing FSD.

(Seems reasonable to me. I haven't seen anybody propose a serious alternative to testing the cars a lot in the real world, with varying approaches to how much human-supervised testing is needed and how large a geographic area they test in.)

To the original question, I'm also not quite sure what "hallucination" actually means for a car. Could we better define it as "car does something unexpected without an obvious trigger?" Do you have a better definition?

Under that definition, yes, 100%. I see a lot of little oddities from V12. V12 consistently drifts off center in lanes for no reason, sometimes even hitting yellow lines. It occasionally simply "forgets" to use its blinker in situations where it has worked fine in the past. Following distances to cars in front are strangely variable as well.
The point is not "ML is bad and rules are good". This is a super hard engineering problem. In my view you need to use all techniques and technology to the fullest, so that they complement each other, if you want any kind of functional system with some amount of reasonable safety guarantees. My position is that in life-or-death applications there are no magic shortcuts. You need to limit the ODD (primarily geo and speed) to be able to validate the system.

The benefit of rules is that they can be reviewed and understood. The benefit of ML is that it is likely, to some extent, to handle the cases for which you cannot write rules.

Rules are great for interpreting the actual traffic laws, whereas ML is great for interpreting behavior, for example. Rules are also great, IMHO, as a kind of general safety net around the ML.

There comes a time when you get to situations that are extremely rare and that machine learning cannot handle. To have a driverless deployment you need to carefully think through your stack to make sure that even if machine learning does not solve everything 100%, your full self-driving product does.

If you are to release a driverless service, you also need to answer the question of how to make sure the whole thing is fully robust, and I believe that adds a level of complexity that Tesla has yet to tackle.

It's impossible to regression test "the world" for every point release that might cause horrible accidents because of some weird bug. So I don't see wide-ODD happening anytime soon regardless of approach.
 
With LLMs (and other multimodal AIs), aren't hallucinations mainly caused by all the inaccurate and downright bogus data on the web that they comb through? With Tesla, the data is video curated by Tesla, so this should reduce the chances of hallucinations because there shouldn't be much "garbage" data to confuse it.

No, that's not why they "screw up". The design intent is to be creative. There is a parameter in LLMs for this, called "temperature", that you tune depending on the use case.

If you set the temperature to zero, the model deterministically picks its most likely next token. If you set the temperature to non-zero, it improvises when predicting the next likely word; it's basically fancy statistics with a sprinkle of randomness. The higher the temperature, the more improvisation/"creativity". The model itself lacks any fundamental understanding of the world.
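
For anyone curious, here is a minimal sketch of generic softmax-with-temperature sampling over next-token scores. This is the textbook mechanism, not any particular LLM's internals:

Code:
import numpy as np

def sample_next_token(logits, temperature, rng=np.random.default_rng(0)):
    # Temperature 0: deterministic, always pick the most likely token.
    if temperature == 0.0:
        return int(np.argmax(logits))
    # Otherwise: rescale the scores (higher temperature flattens the
    # distribution) and sample from the resulting probabilities.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.1])                 # scores for 3 candidate tokens
print(sample_next_token(logits, temperature=0.0))  # always token 0
print([sample_next_token(logits, temperature=1.5) for _ in range(10)])  # mixes in tokens 1 and 2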


Another problem with ML in general is that it disregards outliers in the training set, like a weird long-tail event that only happens once or twice in the data. So when that happens, the model isn't likely to do the reasonable thing, because it cannot reason and it lacks common sense.
 
I guess? Maybe I spent too much time testing V11, which managed to abruptly and jerkily do the wrong thing many times without any AI involvement, so it doesn't really seem concerning to me. At least not any more than systems that have already driven many millions of miles on public roads.
I think you missed the scary part of the article. Engineers have created a device that can fool nearly any radar system into seeing phantom objects, like other cars, when they aren't there.

Imagine teenagers using that to cause mayhem in cities with AVs.
 
My understanding is that yes, FSD can "hallucinate", just like any large model. I am sure Tesla does their best to try to minimize hallucinations with quality data and the right training, but the risk is always there. One question is whether the hallucination is safety critical or not. If it is not safety critical, then you can ignore it. You only need to worry about hallucinations that are safety critical.

The risk of safety critical hallucinations is one reason why FSD requires supervision. It is also why many experts argue that a pure vision end-to-end model cannot achieve the 99.99999% reliability needed to remove driver supervision and why some heuristic code is needed to serve as a guardrail against hallucinations.
My Tesla routinely hallucinates semis while parked in my carport, which doesn't really bother me and certainly isn't a safety hazard. But seeing people walking around in a graveyard that normal humans can't see, not so much. I can picture it easily turning into a safety hazard as people become reckless in trying to get out of there? 😱
 
Imagine teenagers using that to cause mayhem in cities with AVs.
Imagine teenagers throwing rocks from overpasses. Or scattering nails on a road. There are a million ways to cause mayhem with cars today, and it's more a question of civil order than technology.

But returning to hallucinations, I'll repeat what I said earlier - it may be that the more complex the output requested, the more innovative a machine learning system will get in providing answers. It transforms input patterns into output patterns, and where there is missing information, it extemporizes.

@spacecoin mentioned "temperature", and that apparently controls how low of an output signal you're willing to tolerate. So just make sure that tolerance is low. I assume that Tesla has to give the system a certain amount of leeway, otherwise it would constantly demand that the driver take over. "That's a blue Chevy and I've never seen one of those. I give up." It then falls to Tesla to get the training curated as perfectly as possible so that the system will confidently choose the best outputs, allowing for the system to continue to operate with that low "temperature" value.
 
@spacecoin mentioned "temperature", and that apparently controls how low of an output signal you're willing to tolerate. So just make sure that tolerance is low. I assume that Tesla has to give the system a certain amount of leeway, otherwise it would constantly demand that the driver take over. "That's a blue Chevy and I've never seen one of those. I give up." It then falls to Tesla to get the training curated as perfectly as possible so that the system will confidently choose the best outputs, allowing for the system to continue to operate with that low "temperature" value.
That's not quite how it works. In an LLM you set the temperature low if you want the token prediction to improvise less. Think of it as: 0.1 is a boring teacher and 2.0 is Salvador Dali on LSD.

I don't think there is any point in comparing LLMs and AVs (autonomous vehicle systems) or their terminology. They have very different design goals and requirements.

IMHO, it's not "hallucinating" when a Tesla "phantom brakes". I'd call that a false positive.
 
  • Like
Reactions: clydeiii