I still don't know why people think V12 uses any of the V11 perceptual assets.

If V12 was simply a NN planner on top of V11 assets, Tesla would have built V12 gradually, in a software 2.0 style, where the NN would slowly replace more and more heuristic code.

There would have been no need to go all nets all at once.
A lot of the literature on end-to-end autonomous driving assumes a modular approach, so it seems likely they would reuse modules they already have (though these may all be optimized together).

That’s why replacing the planner with an NN seems like such a natural and incremental approach for them.
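For concreteness, here is a minimal Python sketch of what that kind of incremental hand-off could look like: the same perception outputs feed either the hand-written planner or a learned one, and a flag (or per-scenario gating) decides which one drives. Every name here is hypothetical, just to illustrate the "software 2.0" idea, not anything Tesla has described.

from dataclasses import dataclass
from typing import List

@dataclass
class PerceptionOutput:
    lanes: List[dict]        # lane geometry from the existing perception nets
    objects: List[dict]      # tracked vehicles, pedestrians, etc.
    occupancy: List[float]   # free-space / occupancy estimate

def heuristic_planner(p: PerceptionOutput) -> dict:
    # stand-in for the hand-written planning rules
    return {"steer": 0.0, "accel": 0.0}

def nn_planner(p: PerceptionOutput) -> dict:
    # stand-in for a planner network trained on human driving
    return {"steer": 0.0, "accel": 0.0}

def plan(p: PerceptionOutput, use_nn_planner: bool) -> dict:
    # incremental rollout: same inputs, swap only the planning module
    return nn_planner(p) if use_nn_planner else heuristic_planner(p)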

Who knows exactly what they are doing, but end-to-end V12 certainly doesn't preclude building on everything they have built to date (except the heuristic code that is being swapped out).
 
If V12 was simply a NN planner on top of V11 assets, Tesla would have built V12 gradually, in a software 2.0 style, where the NN would slowly replace more and more heuristic code.
Could you provide some examples of where you think an iterative replacement of heuristic code could start? Perhaps I'm misunderstanding, but wouldn't it increase complexity to have to decide when to use neural-network control versus heuristics?

Another potential benefit of not directly using 11.x perception is that the old architecture had human-defined outputs that people assumed were useful for control; letting training figure that out could find new perception representations that are more useful for end-to-end.
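As a toy illustration of that point (PyTorch-style, nothing to do with Tesla's actual architecture): in the modular case the planner only ever sees the handful of human-defined outputs, while in the end-to-end case the width and content of the intermediate representation are whatever the driving loss finds useful.

import torch
import torch.nn as nn

class ModularStack(nn.Module):
    def __init__(self):
        super().__init__()
        self.perception = nn.Linear(512, 32)   # 32 hand-chosen outputs (lanes, boxes, ...)
        self.planner = nn.Linear(32, 2)        # steer, accel

    def forward(self, features):
        human_defined = self.perception(features)    # fixed, human-defined interface
        # detach: planner trained separately, perception gets no gradient from the driving loss
        return self.planner(human_defined.detach())

class EndToEndStack(nn.Module):
    def __init__(self):
        super().__init__()
        self.perception = nn.Linear(512, 256)  # width no longer tied to human labels
        self.planner = nn.Linear(256, 2)

    def forward(self, features):
        latent = torch.relu(self.perception(features))  # learned interface
        return self.planner(latent)                     # trained jointly on the driving loss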
 
When I think back to my early driving experiences, there may be useful clues there about solutions to some current driving issues. The horses understood verbal commands, possibly with a larger vocabulary than my 'Y'. The only real controls beyond that were a parking brake and the reins. I have had no problems at all with the Y parking brake.

I'm thinking we should be considering other controls that would be nice on Teslas. Elon surely could provide a couple of reins (if we can find the right vegan material) to control a local time-distortion field. A pull on the B rein could distort the field to reverse time locally, so when the car gets in trouble, a gentle yank on the rein could move the area back in time far enough to allow adjusting speed and direction to avoid an incident.
 
Could you provide some examples of where you think an iterative replacement of heuristic code could start? Perhaps I'm misunderstanding, but wouldn't it increase complexity to have to decide when to use neural-network control versus heuristics?

Another potential benefit of not directly using 11.x perception is that the old architecture had human-defined outputs that people assumed were useful for control; letting training figure that out could find new perception representations that are more useful for end-to-end.

I'm not sure I fully understand your question, but an example of software 2.0 with respect to an NN planner is the Vision Speed network:


"FSD Beta v11.4 added a new Vision Speed network to infer the typical driving speed on a given road. It mitigates hydroplaning risk by making maximum allowable speed in Autopilot proportional to the severity of the detected road conditions."
 
an example of software 2.0 with respect to an NN planner is the Vision Speed network: "FSD Beta v11.4 added a new Vision Speed network to infer the typical driving speed on a given road. It mitigates hydroplaning risk by making maximum allowable speed in Autopilot proportional to the severity of the detected road conditions."
The release notes have two separate entries, but it's unclear whether they're directly related the way the author seems to imply in that article.
  • Added new Vision Speed network to infer the typical driving speed on a given road. This is used to limit the maximum allowed speed in environments such as parking lots and residential roads.
  • Mitigated hydroplaning risk by making maximum allowable speed in Autopilot proportional to the severity of the detected road conditions. In extreme cases, Autopilot may use the wetness of the road, tire spray from other vehicles, rain intensity, tire wear estimation or other risk factors that indicate the vehicle is near the handling limit of the surface to warn the driver and reduce speed.
Just focusing on the first entry, that still seems to be a perception task of predicting whether the environment looks like a parking lot or residential road that wants a lower speed, but indeed this is more directly used to override the higher speed that the heuristics might otherwise have chosen. Basically, it's a perception alternative to static map data or dynamically detected speed limits.
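A hedged sketch of how that kind of cap could combine with the other sources; the function name, numbers, and the risk scaling are all invented, not from the release notes.

def max_allowed_speed(map_speed_limit_mph: float,
                      vision_typical_speed_mph: float,
                      road_condition_risk: float) -> float:
    """road_condition_risk in [0, 1]; 1.0 means near the handling limit of the surface."""
    # cap by whichever source is most conservative
    cap = min(map_speed_limit_mph, vision_typical_speed_mph)
    # scale down further as detected wetness/spray/etc. risk rises (invented scaling)
    return cap * (1.0 - 0.5 * road_condition_risk)

# e.g. an unmapped parking lot: the map says 45 mph for the adjacent road,
# but the vision net infers ~15 mph as the typical speed there
print(max_allowed_speed(45.0, 15.0, 0.0))  # -> 15.0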

I wonder if Tesla added this behavior as a proactive safety change, to be more cautious in situations where pedestrians can suddenly appear, or because Tesla noticed a lot of disengagements where FSD Beta, confused by unmapped parking lots, assumed it should follow the speed limit of the main road. Either way, this is also a good example of where end-to-end could potentially learn that situations that look like parking lots result in human driving speeds much lower than the posted limit would suggest.
 
Question is - how does end-to-end work in terms of routing?

Because - lane selection is probably the major cause of disengagements.

How will, or even can, e-2-e make it better? It can't learn that from videos ...
Why couldn't it? Most of my disengagements are due to FSD veering into turn lanes. They should be able to use videos to teach it what a turn lane is. It seems pretty ridiculous - going straight is one of the most basic functions of FSD, and they still can't get it right.
 
Just focusing on the first entry, that still seems to be a perception task of predicting whether the environment looks like a parking lot or residential road that wants a lower speed, but indeed this is more directly used to override the higher speed that the heuristics might otherwise have chosen. Basically, it's a perception alternative to static map data or dynamically detected speed limits.

Yes, theoretically, you can use software 2.0 to slowly consume most heuristic code.

When I listen to Elon talking about the 300k lines of code, he's constantly referring to behaviors at traffic elements / lanes / etc.

All of these can be slowly consumed by NNs, example:

Prior human code:

If turning left at a roundabout: check the left quadrant for any cars > 5 mph with a predicted path intersecting the ego path within the next 3 seconds; if none, proceed after waiting 3 seconds

Heuristic code with more NNs (this line of code hypothetically eliminates 90% of the prior human code lines):

If roundabout and clear, proceed // where there's a NN to determine roundabouts and another NN that predicts a clear path
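Rendered as a rough Python sketch: every function here is hypothetical, the *_net functions just stand in for neural networks, and the path check is a stub, so this is only meant to show where NNs replace rules.

def paths_intersect(predicted_path, ego_path, within_seconds: float) -> bool:
    # toy stand-in: a real check would use geometry and timing (within_seconds is ignored here)
    return bool(set(predicted_path) & set(ego_path))

def should_proceed_heuristic(cars_in_left_quadrant, ego_path) -> bool:
    # the "prior human code": explicit hand-written checks
    for car in cars_in_left_quadrant:
        if car["speed_mph"] > 5 and paths_intersect(car["predicted_path"], ego_path, within_seconds=3.0):
            return False
    return True

def is_roundabout_net(camera_frames) -> bool:
    return True    # stub for an NN that classifies the junction type

def path_is_clear_net(camera_frames) -> bool:
    return True    # stub for an NN that predicts whether ego's path is clear

def should_proceed_nn(camera_frames) -> bool:
    # the "software 2.0" version: the same decision collapses into two NN queries
    return is_roundabout_net(camera_frames) and path_is_clear_net(camera_frames)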
 
With V12 on the way (just released to a wider group of employees), it's interesting that they're still making improvements to V11.x. I've got a theory about something they might be doing. If we look at the 11.4.9 release notes, the changes are all perception-related (cut-in prediction, VRU perception, vehicle velocity, static object detection).

I wonder if they're planning to keep using these networks as the AI building blocks for active safety features (automatic emergency braking, collision avoidance, collision warnings, etc).

Not using end-to-end AI might make sense in these applications. In near-collisions (and actual collisions) humans often don't perform particularly well as drivers (slow reaction times, can only look in one direction at a time), so it wouldn't be straightforward to collect a large training set. Plus those events are quite rare, although that may not be a problem at Tesla scale.
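A hedged sketch of why explicit perception outputs are a natural fit there: an AEB-style trigger can stay a simple, auditable rule over detected objects (e.g. a time-to-collision threshold) instead of being an opaque end-to-end output. The function names and the threshold below are made up.

def time_to_collision_s(range_m: float, closing_speed_mps: float) -> float:
    # infinite TTC if the object isn't closing on us
    return float("inf") if closing_speed_mps <= 0 else range_m / closing_speed_mps

def should_emergency_brake(detections, ttc_threshold_s: float = 1.5) -> bool:
    # detections: (range_m, closing_speed_mps) pairs from the perception nets
    return any(time_to_collision_s(r, v) < ttc_threshold_s for r, v in detections)

print(should_emergency_brake([(30.0, 25.0)]))  # TTC = 1.2 s -> True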
 
With V12 on the way (just released to a wider group of employees), it's interesting that they're still making improvements to V11.x. I've got a theory about something they might be doing. If we look at the 11.4.9 release notes, the changes are all perception-related (cut-in prediction, VRU perception, vehicle velocity, static object detection).

I wonder if they're planning to keep using these networks as the AI building blocks for active safety features (automatic emergency braking, collision avoidance, collision warnings, etc).

Not using end-to-end AI might make sense in these applications. In near-collisions (and actual collisions) humans often don't perform particularly well as drivers (slow reaction times, can only look in one direction at a time), so it wouldn't be straightforward to collect a large training set. Plus those events are quite rare, although that may not be a problem at Tesla scale.
I am almost positive they are reusing a lot, if not all, of the perception pieces from version 11. V12 is about feeding these inputs into a new neural network, or set of neural networks, that does all the decision-making and produces the control outputs for the driving task.

The new video module in 11.4.9 is taking advantage of amortized compute, where rather than recalculating something entirely on each time step, you compute, say, 33% of it so that you have new info every third time step. This frees up computing power for other tasks.
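A toy example of the amortization idea (purely illustrative, not Tesla's or Epic's code): keep N slices of an expensive result cached and refresh one slice per time step, so each slice is at most N steps stale and the per-step cost drops to roughly 1/N.

def amortized_update(cached_slices, step, recompute_slice):
    n = len(cached_slices)                 # e.g. 3 slices -> ~33% refreshed per step
    i = step % n                           # which slice gets recomputed this step
    cached_slices[i] = recompute_slice(i)  # only this slice pays the full cost
    return cached_slices                   # the other slices are reused as-is

# usage: an "expensive" computation split into 3 slices
cache = [None, None, None]
for step in range(6):
    cache = amortized_update(cache, step, lambda i: f"slice {i} computed at step {step}")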

This is very much like techniques used by the cutting-edge game engine Unreal Engine 5 for its Lumen system, which effectively calculates real-time global illumination by amortizing the calculations across a series of frames (because the computations can't be done on current hardware quickly enough to be done entirely in a single frame).

I think Tesla did this for two reasons: one, to lower latency, and two, to free up more compute for the (presumably quite large) neural network(s) controlling the planning/driving task.
 
I try to give as much helpful feedback as possible, for example: "Autopilot did not execute a lane change as requested", "I needed to make a quick lane change", "Autopilot was going to hit another vehicle in the next lane".
I've personally decided that the FSD team has moved on to the v12 fork and anything that I might click to narrate a disengagement would end up in the toilet. Just my opinion. I believe Tesla has access to way more data than anyone would possibly know what to do with. Tesla's challenge today, and even more so in the future, is to figure out the most efficient way to identify and use the vast amounts of data they have access to.
 
I've personally decided that the FSD team has moved on to the v12 fork and anything that I might click to narrate a disengagement would end up in the toilet.

I still complain by voice on important disengagements. I doubt every single one is listened to by a human, but I bet somewhere in their sea of metrics, they're at least giving more weight to disengagements that were voice-tagged at all, and who knows maybe they count swear words or something.
 
They probably could for straight highway driving (where the old AP stack was being used). Even that would be awesome - how nice would it be to be able to read a book while cruising on the highway?
It's completely unrealistic to me that FSD will be L3 at highway speed next year. Perhaps in a few years at lower speeds, in daytime, on dry roads, and without lane changes, like Mercedes, but likely never, due both to lack of business incentives and to technical limitations. L4? LOL.

Just the certification for (L3) UNECE R157 will take 3-6 months, and Tesla currently doesn't have the required reliability nor the required features, like a handover protocol, MRM, emergency corridor, etc. Earliest 2025, probably never on current hardware, is my guess.

Instead they're targeting the new DCAS regulations from UNECE, if they can ever get those approved. Tesla's been chairing that L2 effort for 3 years with limited success thus far. Perhaps implemented by 2025-2026?
 