Welcome to Tesla Motors Club
So, is all the effort with the "lane language" developed for v11 wasted with v12?

Yes, most of the past approaches and strategies are "irrelevant" to V12's training.

All of these are no longer used in V12's training data (my educated guess):

1) Any human or automatic labeling
2) Bird's-eye view
3) Occupancy network / NeRFs
4) Autolabeled speed/velocity estimates

V12 is only given video and makes its own world model without using any human labels or heuristics.
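To make the "video in, controls out" idea concrete, here is a minimal behavior-cloning sketch. This is my own toy illustration, not Tesla's architecture (which is unpublished): a single network maps raw frames directly to control outputs, supervised only by what human drivers actually did, with no intermediate human-defined labels.

```python
# Toy behavior-cloning sketch (my illustration, assuming a PyTorch-style
# setup; shapes and sizes are arbitrary stand-ins).
import torch
import torch.nn as nn

net = nn.Sequential(            # stand-in for a large video network
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64), nn.ReLU(),
    nn.Linear(64, 2),           # outputs: e.g. [steering, acceleration]
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

frames = torch.randn(16, 3, 32, 32)   # batch of (tiny) camera frames
human_controls = torch.randn(16, 2)   # what the human driver actually did

pred = net(frames)
loss = nn.functional.mse_loss(pred, human_controls)  # imitate the human
opt.zero_grad(); loss.backward(); opt.step()
```

The only supervision signal here is the recorded human control; no lane lines, stop signs, or bounding boxes appear anywhere in the loss.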
 
So, is all the effort with the "lane language" developed for v11 wasted with v12?
In a research project, any effort that goes into moving towards the goal isn't wasted. The lane language stuff was just another in a long line of learning experiences for the team. Realize that everything they have today, including all their V12 plans, may be scrapped before they finally get FSD working reliably, regardless of the autonomy level.

I'm not trying to be pedantic here at all. FSD is a research project, so the most important aspect of moving forward is gaining understanding of both the problem and the solution. Imagine the expertise that has been accumulated by the people who worked on the heuristic control system. The same can be said of applying language techniques to lane navigation.
 

Tesla's FSD approach for the last 7-8 years can be summarized as "do what we can with the compute we have."

They went from single images with simple human labels, to single images with more complicated labels, to video with human labels in vector space, to large-NN autolabels with human editors, and now to pure video with billions of dollars' worth of compute clusters.

I wouldn't say anything they did has been a "waste," but we can definitely see that even with V11, they approached a local maximum.
 
Tesla's FSD approach for the last 7-8 years can be summarized as "do what we can with the compute we have."
"and with the expertise that we possess"

I wouldn't say anything they did has been a "waste," but we can definitely see that even with V11, they approached a local maximum.
Heuristic control on V3 was stalled; I agree with that much. Whether heuristic control is a fundamentally flawed approach, or whether V3 is fundamentally inadequate to provide competent L2 (let alone L3) autonomy, is still to be determined in my book.
 
"and with the expertise that we possess"


Heuristic control on V3 was stalled, I agree to that much. Whether heuristic control is a fundamentally flawed approach or whether V3 is fundamentally inadequate to provide competent L2, let alone L3, autonomy is still to be determined in my book.

I find the fact that autolabeling isn't working "well" to be intriguing.

A year or two ago, I was actually surprised to learn that Tesla was going to lean heavily on autolabeling. Long before Tesla used autolabeling, Karpathy gave a talk in which he said it isn't ideal because, over time, the predicted labels become biased toward the NN model's own errors: to human evaluators the labels look decent enough, but there's inherent jitter in the bounding boxes and lines the NN predicts.
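The mechanism can be shown with a toy simulation (my own illustration, not Tesla's pipeline): a model "retrained" on its own slightly biased, noisy predictions compounds its systematic error across generations instead of averaging it out.

```python
# Toy sketch of autolabel drift: each generation fits the previous
# generation's autolabels (noisy, with a small systematic bias), so the
# error accumulates. All numbers are arbitrary illustrative values.
import random

random.seed(0)
TRUE_EDGE = 10.0        # ground-truth bounding-box edge position
BIAS_PER_GEN = 0.05     # systematic offset the model learns each round
JITTER = 0.2            # per-sample prediction noise ("label jitter")

labels = [TRUE_EDGE] * 1000          # generation 0: human labels
for gen in range(1, 6):
    # "Train" = fit the mean of the current labels, plus the model's bias.
    fitted = sum(labels) / len(labels) + BIAS_PER_GEN
    # "Autolabel" the next dataset with the fitted model's noisy outputs.
    labels = [fitted + random.gauss(0, JITTER) for _ in range(1000)]
    print(f"gen {gen}: mean label drift = {fitted - TRUE_EDGE:+.3f}")
```

The drift grows roughly linearly with the number of autolabel generations, which is the failure mode Karpathy described: each dataset looks fine to a human reviewer, but the labels quietly walk away from ground truth.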

So it's turning out that Karpathy's intuition about autolabeling was well-placed.

Granted, this might not be the whole story for why V11 has seemingly hit a limit, but since autolabeling has played such a huge role in V11, I wouldn't be surprised if autolabels are the source of V11's erratic behavior.

Here's my actual post about that in 2021:

[screenshot of the 2021 post attached]
 
That's why during the livestream, Elon said that the visualization doesn't represent what the car is thinking
I couldn't find that comment during the livestream, but you might be referring to the Spaces discussion just before it:

It's actually hard for the car to explain what it's doing. But the same is true when you are say driving in a taxi or an Uber -- you don't actually know what the driver is thinking. You just know what the driver's track record is -- 4 or 5 star or whatever; and that they have a lot of experience, so you kind of trust that experience that they'll drive well…
Even the rendering of what's on the screen is an approximation of what the car is thinking -- not exactly what the car is thinking.

I do agree that a lot of what was visualized in the V12 demo was probably reused from 11.x, mostly to provide context such as the road and objects. And from what can be seen of the demoed blue path, it seems to behave differently enough from 11.x to warrant that additional UX context; a blue path representing the new control, by itself, would probably be confusing. Similarly, we've noticed differences in the demo's visualization, e.g., framerate: why change anything rather than show nothing at all, if the new control network really did not affect the display?

After rewatching the various 2023 CVPR presentations from Tesla, it does seem possible that their new world model, evolved from the occupancy network, obsoleted many of the supervised networks that had been deployed to the fleet for traditional control. If so, that would probably be an even bigger accomplishment than people realize or appreciate. However, these networks are still useful for collecting and curating data, e.g., finding examples of red lights where adjacent vehicles are moving, as opposed to a more generic trigger like "human driver control differed from control network."
 
I have a 2022 HW3 Model S and am very optimistic about FSD 12 with end-to-end AI. With 300,000 lines of code removed, it should run faster on HW3 and drive more smoothly, since control will be AI-driven: curated video in, improved driving out, without any additional coding. With the new Nvidia supercomputer online and Dojo ramping up, we should see rapid improvements in driving. I am pleased that Tesla is working to make HW3 cars super smooth and safe before turning their attention to HW4.
 
So many people are misunderstanding V12; even James Douma doesn't know what he's talking about in this video. V12 is not built on top of major V11 techniques like BEVs and autolabeling. Elon and Ashok made that clear during the livestream when they repeatedly said V12 doesn't use human concepts like stop signs, lane lines, and traffic lights.

V12 is not simply a neural planner on top of V11...

V12 isn't even in the same paradigm as V11 or "normal" FSD systems. It is not a "perception, planning, and control" kind of paradigm. It literally is what Elon said: neural nets all the way, with no humans involved in labeling or in defining semantics or heuristics (except perhaps guardrails to limit extreme or risky behavior).

 
So many people are misunderstanding V12
Everything that James Douma said resonated with me perfectly well. I have every expectation that they've replaced the control module with a neural network that has been trained separately. Andrej Karpathy said that a monolithic neural network would suffer from loss of signal during training. That's why you start by training the individual chunks. Once you've got them where you want them, you can consider allowing the borders between those chunks to shift as additional training dictates.
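That staged recipe can be sketched in a few lines. This is my own illustration of the general technique (modular pretraining, then end-to-end fine-tuning), assuming a PyTorch-style setup; the module names are stand-ins, not Tesla's.

```python
# Sketch: train modules separately, then unfreeze everything so the
# boundary between "perception" and "control" can shift end to end.
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Linear(8, 16), nn.ReLU())  # stand-in for vision nets
control = nn.Linear(16, 2)                               # stand-in for control head

# Stage 1 (not shown): perception trained on its own supervised tasks.
# Stage 2: freeze perception, train only the control module on its outputs.
for p in perception.parameters():
    p.requires_grad = False
opt = torch.optim.SGD(control.parameters(), lr=1e-2)

x, target = torch.randn(32, 8), torch.randn(32, 2)
loss = nn.functional.mse_loss(control(perception(x)), target)
loss.backward()   # gradients reach only the control head
opt.step()

# Stage 3: unfreeze and fine-tune end to end, typically at a lower LR,
# letting the learned division of labor between the modules shift.
for p in perception.parameters():
    p.requires_grad = True
opt = torch.optim.SGD(
    list(perception.parameters()) + list(control.parameters()), lr=1e-3
)
```

Training the chunks first gives each module a clean learning signal; the joint fine-tune at the end is what lets the "borders shift," as described above.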
It literally is what Elon said, neural nets all the way
Which is what a V11 system with a neural control module would be. Neural networks all the way.

I can't see them duplicating the V11 visualization without relying on the V11 software. When they have a monolithic neural-network solution, I wonder if they'll even bother with a visualization. I've always thought the visualization was an engineering diagnostic that they turned into a feature (one that only serves to distract the driver): the design of the software happened to have that data lying around, so they built a visualization from it. In a monolithic system, that data won't naturally come into being; if they want to keep the visualization, they're going to have to train the system to provide it.