Welcome to Tesla Motors Club
So, is all the effort with the "lane language" developed for v11 wasted with v12?

Yes, most of the past approaches and strategies are "irrelevant" to V12's training.

All of these are no longer used in V12's training data (my educated guess):

1) Any human or automatic labeling
2) Bird's-eye view
3) Occupancy network / NeRFs
4) Autolabeled speed/velocity estimates

V12 is only given video and makes its own world model without using any human labels or heuristics.
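To make the "video in, controls out" idea concrete, here is a minimal behavior-cloning sketch. This is my own toy illustration, not Tesla's architecture (which is unpublished): a single network maps raw frames directly to control outputs, supervised only by what human drivers actually did, with no intermediate human-defined labels.

```python
# Toy behavior-cloning sketch (my illustration, assuming a PyTorch-style
# setup; shapes and sizes are arbitrary stand-ins).
import torch
import torch.nn as nn

net = nn.Sequential(            # stand-in for a large video network
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64), nn.ReLU(),
    nn.Linear(64, 2),           # outputs: e.g. [steering, acceleration]
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

frames = torch.randn(16, 3, 32, 32)   # batch of (tiny) camera frames
human_controls = torch.randn(16, 2)   # what the human driver actually did

pred = net(frames)
loss = nn.functional.mse_loss(pred, human_controls)  # imitate the human
opt.zero_grad(); loss.backward(); opt.step()
```

The only supervision signal here is the recorded human control; no lane lines, stop signs, or bounding boxes appear anywhere in the loss.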
 
So, is all the effort with the "lane language" developed for v11 wasted with v12?
In a research project, any effort that goes into moving towards the goal isn't wasted. The lane language stuff was just another in a long line of learning experiences for the team. Realize that everything they have today, including all their V12 plans, may be scrapped before they finally get FSD working reliably, regardless of the autonomy level.

I'm not trying to be pedantic here at all. FSD is a research project, so the most important aspect of moving forward is gaining understanding of both the problem and the solution. Imagine the expertise that has been accumulated by the people who worked on the heuristic control system. The same can be said of applying language techniques to lane navigation.
 

Tesla's FSD approach for the last 7-8 years can be summarized as "do what we can with the compute we have."

They went from single images with simple human labels, to single images with more complicated labels, to video with human labels in vector space, to large-NN autolabels with human editors, and now to pure video with billions of dollars' worth of compute clusters.

I wouldn't say anything they did has been a "waste," but we can definitely see that even with V11, they approached a local maximum.
 
Tesla's FSD approach for the last 7-8 years can be summarized as "do what we can with the compute we have."
"and with the expertise that we possess"

I wouldn't say anything they did has been a "waste," but we can definitely see that even with V11, they approached a local maximum.
Heuristic control on V3 was stalled; I agree with that much. Whether heuristic control is a fundamentally flawed approach, or whether V3 is fundamentally inadequate to provide competent L2 (let alone L3) autonomy, is still to be determined in my book.
 
"and with the expertise that we possess"


Heuristic control on V3 was stalled, I agree to that much. Whether heuristic control is a fundamentally flawed approach or whether V3 is fundamentally inadequate to provide competent L2, let alone L3, autonomy is still to be determined in my book.

I find the fact that autolabeling isn't working "well" to be intriguing.

A year or two ago, I was actually surprised to learn that Tesla was going to lean heavily on autolabeling. Long before Tesla used autolabeling, Karpathy gave a talk in which he said it isn't ideal because, over time, the predicted labels become biased toward the NN model's own errors: to human evaluators the labels look decent enough, but there's inherent jitter in the bounding boxes and lines the NN predicts.
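The mechanism can be shown with a toy simulation (my own illustration, not Tesla's pipeline): a model "retrained" on its own slightly biased, noisy predictions compounds its systematic error across generations instead of averaging it out.

```python
# Toy sketch of autolabel drift: each generation fits the previous
# generation's autolabels (noisy, with a small systematic bias), so the
# error accumulates. All numbers are arbitrary illustrative values.
import random

random.seed(0)
TRUE_EDGE = 10.0        # ground-truth bounding-box edge position
BIAS_PER_GEN = 0.05     # systematic offset the model learns each round
JITTER = 0.2            # per-sample prediction noise ("label jitter")

labels = [TRUE_EDGE] * 1000          # generation 0: human labels
for gen in range(1, 6):
    # "Train" = fit the mean of the current labels, plus the model's bias.
    fitted = sum(labels) / len(labels) + BIAS_PER_GEN
    # "Autolabel" the next dataset with the fitted model's noisy outputs.
    labels = [fitted + random.gauss(0, JITTER) for _ in range(1000)]
    print(f"gen {gen}: mean label drift = {fitted - TRUE_EDGE:+.3f}")
```

The drift grows roughly linearly with the number of autolabel generations, which is the failure mode Karpathy described: each dataset looks fine to a human reviewer, but the labels quietly walk away from ground truth.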

So it's turning out that Karpathy's intuition about autolabeling was well-placed.

Granted, this might not be the whole story for why V11 has seemingly hit a limit, but since autolabeling has played such a huge role in V11, I wouldn't be surprised if autolabels are the source of V11's erratic behavior.

Here's my actual post about that in 2021:

[screenshot of the 2021 post attached]
 
That's why during the livestream, Elon said that the visualization doesn't represent what the car is thinking
I couldn't find that comment during the livestream, but you might be referring to the Spaces discussion just before it:

It's actually hard for the car to explain what it's doing. But the same is true when you are say driving in a taxi or an Uber -- you don't actually know what the driver is thinking. You just know what the driver's track record is -- 4 or 5 star or whatever; and that they have a lot of experience, so you kind of trust that experience that they'll drive well…
Even the rendering of what's on the screen is an approximation of what the car is thinking -- not exactly what the car is thinking.

I do agree that a lot of what was visualized in the V12 demo was probably reused from 11.x, mostly to provide context such as the road and objects. And from what can be seen of the demoed blue path, it seems to behave differently enough from 11.x to warrant that additional UX context; a blue path representing the new control, by itself, would probably be confusing. Similarly, we've noticed differences in the demo's visualization, e.g., framerate: why change anything rather than show nothing at all, if the new control network really did not affect the display?

After rewatching the various 2023 CVPR presentations from Tesla, it does seem possible that their new world model, evolved from the occupancy network, obsoleted many of the supervised networks that had been deployed to the fleet for traditional control. If so, that would probably be an even bigger accomplishment than people realize or appreciate. However, these networks are still useful for collecting and curating data, e.g., finding examples of red lights where adjacent vehicles are moving, as opposed to a more generic trigger like "human driver control differed from control network."
 
I have a 2022 HW3 Model S and am very optimistic about FSD 12 with end-to-end AI. With 300,000 lines of code removed, it should run faster on HW3 and drive more smoothly, since control will be AI-driven: curated video in, improved driving out, without any additional coding. With the new Nvidia supercomputer online and Dojo ramping up, we should see rapid improvements in driving. I am pleased that Tesla is working to make HW3 cars super smooth and safe before turning their attention to HW4.
 
So many people are misunderstanding V12; even James Douma doesn't know what he's talking about in this video. V12 is not built on top of major V11 techniques like BEVs and autolabeling. Elon and Ashok made that clear during the livestream when they repeatedly said V12 doesn't use human concepts like stop signs, lane lines, and traffic lights.

V12 is not simply a neural planner on top of V11...

V12 isn't even in the same paradigm as V11 or "normal" FSD systems. It is not a "perception, planning, and control" kind of paradigm. It literally is what Elon said: neural nets all the way, with no humans involved in labeling or in defining semantics or heuristics (except perhaps guardrails to limit extreme or risky behavior).

 
So many people are misunderstanding V12
Everything that James Douma said resonated with me perfectly well. I have every expectation that they've replaced the control module with a neural network that has been trained separately. Andrej Karpathy said that a monolithic neural network would suffer from loss of signal during training. That's why you start by training the individual chunks. Once you've got them where you want them, you can consider allowing the borders between those chunks to shift as additional training dictates.
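That staged recipe can be sketched in a few lines. This is my own illustration of the general technique (modular pretraining, then end-to-end fine-tuning), assuming a PyTorch-style setup; the module names are stand-ins, not Tesla's.

```python
# Sketch: train modules separately, then unfreeze everything so the
# boundary between "perception" and "control" can shift end to end.
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Linear(8, 16), nn.ReLU())  # stand-in for vision nets
control = nn.Linear(16, 2)                               # stand-in for control head

# Stage 1 (not shown): perception trained on its own supervised tasks.
# Stage 2: freeze perception, train only the control module on its outputs.
for p in perception.parameters():
    p.requires_grad = False
opt = torch.optim.SGD(control.parameters(), lr=1e-2)

x, target = torch.randn(32, 8), torch.randn(32, 2)
loss = nn.functional.mse_loss(control(perception(x)), target)
loss.backward()   # gradients reach only the control head
opt.step()

# Stage 3: unfreeze and fine-tune end to end, typically at a lower LR,
# letting the learned division of labor between the modules shift.
for p in perception.parameters():
    p.requires_grad = True
opt = torch.optim.SGD(
    list(perception.parameters()) + list(control.parameters()), lr=1e-3
)
```

Training the chunks first gives each module a clean learning signal; the joint fine-tune at the end is what lets the "borders shift," as described above.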
It literally is what Elon said, neural nets all the way
Which is what a V11 system with a neural control module would be. Neural networks all the way.

I can't see them duplicating the V11 visualization without relying on the V11 software. When they have a monolithic neural-network solution, I wonder if they'll even bother with a visualization. I've always thought the visualization was an engineering diagnostic that they turned into a feature (one that only serves to distract the driver): the design of the software happened to have that data lying around, so they built a visualization from it. In a monolithic system, that data won't naturally come into being; if they want to keep the visualization, they're going to have to train the system to provide it.