Regardless of how the first iteration of v12 looks out of the gate, I think the positive thing to look forward to is that making improvements to the first version should be much easier.

With v11, improving C++ planning code in an already extremely complex codebase means very tiny baby steps and a high likelihood of bugs. It also means one step forward in one area often comes with a step backward elsewhere.

With neural nets, they just need to source more data from their data engine, possibly resize/reparameterize the network, then run it through compute. This is a workflow Tesla has already proven out, one they have tons of real data for, and it's much less prone to error than complex C++ code.
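A purely illustrative sketch of what that kind of data-engine loop could look like; every function and threshold here is a made-up stub, not anything Tesla has described:

```python
# Hypothetical data-engine iteration loop: mine failure cases, retrain,
# evaluate, repeat. Every function below is a stand-in stub.

def mine_clips(trigger: str, n: int) -> list[str]:
    # Stand-in for querying the fleet for clips matching a trigger
    # (e.g. "hard brake near crosswalk"); returns clip identifiers.
    return [f"{trigger}-clip-{i}" for i in range(n)]

def train(model: dict, clips: list[str]) -> dict:
    # Stand-in for a training run on the accumulated dataset.
    return {"version": model["version"] + 1,
            "clips_seen": model["clips_seen"] + len(clips)}

def evaluate(model: dict) -> float:
    # Stand-in for an offline eval suite; returns an intervention-rate proxy.
    return 1.0 / (1 + model["clips_seen"] / 1000)

model = {"version": 0, "clips_seen": 0}
target = 0.2  # hypothetical acceptable intervention-rate proxy

while evaluate(model) > target:
    clips = mine_clips("unprotected-left-turn", n=500)
    model = train(model, clips)
    print(f"v{model['version']}: eval={evaluate(model):.3f}")
```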

So I think this gives hope that we’ll see more steady improvements with each iteration of v12.

The Autopilot team has a lot on their plate right now:

1. They are working on Actually Smart Summon, intending to release late Q1.

2. Unless they’ve determined they can source all data from HW4 cars even for use in HW3 cars, they need to fork the HW3 and HW4 video clips and compute separate neural nets for each. They probably still need to source a lot more HW4 clips than HW3, particularly for winter and edge cases.

3. They need to get FSD running on Cybertruck and Semi.

4. They need to fully integrate the new Nvidia cluster (maybe already done?) and the new Dojo clusters into their compute framework as they come online.

Because of all this, they may make the first v12 release just "good enough" to be better than v11 so they can get everyone focused on v12 from here on.

The real indicator of future progress will be how each iteration of v12 improves on the previous one. THAT will show how much room v12 has to grow.

We might even find that Tesla iterates their architecture further over time, combining or separating perception modules/networks as they learn more.
 
Todd, this is what I said yesterday on the FB beta testers group in reply to the Teslascope news:

Wunnerful. I'm amazed at how anyone expects V12 to just come into reality as a Level 3 or 4 system and blow everyone's socks off.
In my understanding of reality, an AI system takes a long time to train...well. I'm a lot more interested in V12's rate of learning once it gets out there than its absolute skill level right now. We've seen that V11 is capped at student-driver level pretty much; a 300K-line C++ codebase doesn't do a step increase in capabilities all that often and all that easily, and we've seen its limits.
What matters isn't how great V12 is now, it's how effective the training regimen is at increasing its reliability. We've never seen that for V11, and until we do see it for V12, I'll be skeptical of any timelines coming from Elon's keyboard.
If the training actually works, and the improvements come at a regular pace, we won't need Elon's predictions, we'll see for ourselves how much better it's getting.
 
2) Start to decelerate WAY earlier for stopping/slowing situations instead of waiting till the last second and using hard friction brakes. (See desired fix #1)
That is how 11.4.9 is working for me. It hardly uses the friction brakes at all.

5) Stop tailgating cars in city streets and give larger following distances especially in chill mode. (See desired fix #1)
Again, here it usually leaves more room than even I think is necessary.

I really wonder why it acts so differently for some people/vehicles.
 
I wonder how many of the employee "reviews" are legit though :) I'm not sure how Teslascope verifies their sources.
Reviews were only from 12 employees, and Teslascope also said they were a passenger on one drive. I suspect they know a few of the employees. Teslascope has been a good source so far regardless of the Tesla topic, so I give them the benefit of the doubt.
 
I'm amazed at how anyone expects V12 to just come into reality as a Level 3 or 4 system and blow everyone's socks off.

What we've learned about Tesla's different approaches in the past is that a new version doesn't improve dramatically from the first iteration.

Since Tesla has built a massive testing and training infrastructure, they already know all of FSD's limitations, so we rightfully expect V12 to be amazing out of the gate because Ashok has said that V12 will be the best, most competent self-driving software in the world.

 
If the training actually works, and the improvements come at a regular pace, we won't need Elon's predictions, we'll see for ourselves how much better it's getting.

An exciting aspect of end-to-end that I haven't seen anyone talking about is the predictability of scaling. There's really good research for LLMs showing that model loss (a measure of how well the model performs) scales progressively, smoothly, and predictably with model size, dataset size, and compute (see "Scaling Laws for Neural Language Models").
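For anyone curious, the parametric form from that line of research is simple to write down. Here's a rough sketch using the Chinchilla-style form L(N, D) = E + A/N^α + B/D^β; the constants are just in the ballpark of the published fits for text and obviously wouldn't transfer directly to driving:

```python
# Illustrative scaling-law loss: L(N, D) = E + A/N^alpha + B/D^beta,
# where N = parameter count and D = training examples (tokens for text;
# clips/frames if the same shape held for driving). Constants are placeholders.

def scaling_loss(n_params: float, n_data: float,
                 E: float = 1.7, A: float = 400.0, B: float = 410.0,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / n_params**alpha + B / n_data**beta

# Holding model size fixed (to fit on-car hardware) and growing only the data:
for d in (1e9, 1e10, 1e11):
    print(f"D={d:.0e}: predicted loss ~ {scaling_loss(1e9, d):.3f}")
```

If the same shape held for end-to-end driving, you could in principle estimate how much data is needed to reach a given loss at a fixed model size, which is essentially the "X data and Y compute" point below.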

If end-to-end vision scales like language, then Tesla should already have a good idea of how well V12 will perform when trained with X data and Y compute (holding model size constant to run on HW3/4).
 
If end-to-end vision scales like language, then Tesla should already have a good idea of how well V12 will perform when trained with X data and Y compute (holding model size constant to run on HW3/4).
The last point, available compute, will be the main limitation, but even then, engineering effort can optimize to get more performance. There are also potential learnings from what people have already done with language models, such as taking GPT-4 outputs and basically treating it as a teacher for fine-tuning smaller models. The previously estimated 50 fps for end-to-end on HW3 does give the model room to scale up and "slow down" to the 36 fps the cameras deliver, but it's unclear how fast 12.x would run on HW4 compute with the higher-resolution cameras.
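Putting rough numbers on the fps point (taking the 50 fps HW3 estimate and 36 fps camera rate from the discussion at face value, not as official specs), slowing from 50 fps to 36 fps buys roughly 1.4x the per-frame compute budget:

```python
# Back-of-the-envelope headroom if an end-to-end net estimated at 50 fps on
# HW3 only needs to match the 36 fps camera rate. Both numbers come from the
# discussion above, not from any official spec.

max_fps = 50.0      # estimated achievable inference rate on HW3
camera_fps = 36.0   # rate the cameras actually deliver frames

budget_now_ms = 1000.0 / max_fps          # ~20 ms per frame today
budget_needed_ms = 1000.0 / camera_fps    # ~27.8 ms per frame is enough
headroom = budget_needed_ms / budget_now_ms

print(f"per-frame budget: {budget_now_ms:.1f} ms -> {budget_needed_ms:.1f} ms "
      f"({headroom:.2f}x headroom for a larger model)")
```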
 
What we've learned about Tesla's different approaches in the past is that a new version doesn't improve dramatically from the first iteration.
… Ashok has said that V12 will be the best, most competent self-driving software in the world
Both can be true as neither of them said the initial release of 12.x will be the best / no longer beta / ready for robotaxi. It's practically obvious for them as they've experienced in-development 12.x capabilities, and it shouldn't be surprising that they'll continue their usual behavior of at least generally maintaining parity with a rewrite. It's more that from what they've already seen with 12.x, they feel confident end-to-end is the correct general architectural approach, but even then there's a lot of potential engineering effort needed to keep up the rate of improvement.
 
a 300K-line C++ codebase doesn't do a step increase in capabilities all that often and all that easily, and we've seen its limits
It seems like the various improvements with 10.x and 11.x have had dedicated engineering efforts to fix a particular control issue that couldn't be fixed with "just" additional perception training. These would probably require digging through the 300k+ lines of code to figure out which part needs to be modified without breaking other behaviors. Potentially there are multiple parts of the code that would need to change for other instances of the same issue, so this could come with some refactoring to hopefully make the code more maintainable.

End-to-end generally moves the effort to be more test-driven, in that there are examples for training and testing to help ensure existing and desired new behaviors are maintained. It's unclear how the Autopilot team is organized in terms of how much overlap there has been between people implementing neural-network changes vs. C++ control, but freeing up resources previously focused on maintaining the old code could now help out many other areas.
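As a toy illustration of what "test-driven" could mean here, an offline harness might replay a library of scenario clips through a candidate model and flag regressions against the previous release. Everything below is hypothetical, just to make the idea concrete:

```python
# Hypothetical regression harness: compare a candidate model against the
# previous release on a library of scenario clips. Scores are stand-ins for
# whatever offline metric would be used (e.g. trajectory error vs. a human
# drive, or a pass/fail rubric).

scenarios = {
    "unprotected_left":         {"baseline": 0.91, "candidate": 0.94},
    "school_zone_slowdown":     {"baseline": 0.88, "candidate": 0.90},
    "turn_only_lane_map_error": {"baseline": 0.72, "candidate": 0.69},
}

TOLERANCE = 0.02  # allow small metric noise before calling it a regression

regressions = [name for name, s in scenarios.items()
               if s["candidate"] < s["baseline"] - TOLERANCE]

for name in regressions:
    print(f"REGRESSION: {name} "
          f"({scenarios[name]['baseline']:.2f} -> {scenarios[name]['candidate']:.2f})")
```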
 
For the 12.x rewrite to already maintain core driving capability and improve some other aspects of a system with over 3 years of public development and testing seems to be quite the accomplishment already.
This is good, but expected, since v12 shares a huge amount with v11. It's not like it's a complete rewrite.

Definitely they seem to be making good progress, but it's not like three years of work has been reproduced independently - it's building on that with a lot of the same modules (or at least the same starting point) and such in place.

If they had started from scratch and got to here in whatever time they've worked on it, that might suggest ability to rapidly improve from here. But that's not really what has happened.

I think the new approach will likely help with their ability to iterate, if they can get it to a point where it is consistently better than or on par with v11 first (obviously it's not there yet).

But I expect continued slow progress in reducing safety & comfort intervention rates from one per mile or whatever it is currently, to perhaps 10x better over the course of the next couple of years - if we're fortunate, and this approach actually works well.
 
The point of bringing up these v11 foibles in a v12 thread? I wonder whether these things, especially the second one, are related to the navigation map-hints infrastructure. And if so, I wonder if that infrastructure will be replaced by NN or maybe new heuristic code in v12.
I think many failures are because of poor mapping data, and they don't want to buy the expensive proprietary data which is good enough for autonomous driving.
 
they may make the first v12 release just "good enough" to be better than v11 so they can get everyone focused on v12 from here on
Who are you referring to as "everyone" here? It sounded like Tesla already shifted resources to 12.x back in April after the internal demo to Elon Musk, according to Walter Isaacson, and practically we haven't seen much activity on 11.x since around that time. There were two NHTSA FSD Beta recalls in 2023, and a couple of 11.4.x releases might have been part of testing out bringing Dojo and other training clusters online, as well as evaluating parts of the 12.x architectural improvements ahead of end-to-end deployment.

"Everyone" can even include the fleet as shadow mode seems to have already been sending back increased data for 12.x training even though these vehicles have 11.x. Although practically, there are limits to the quality of data that can best be gotten with actual hands-on experiences with 12.x actively controlling the car. So there is indeed more to go before everyone is actively contributing solely towards improving v12, and one way to get there sooner is to have something good enough for initial wide public release.
 
If end-to-end vision scales like language, then Tesla should already have a good idea of how well V12 will perform when trained with X data and Y compute (holding model size constant to run on HW3/4).

The problem with comparing those tasks (NLP) is that what's being measured isn't the same. NLP scaling is roughly "how well can I predict the next token/word," and that's the thing that has tons of training data and reliable scaling behavior. That may also apply to video scene prediction, but scene prediction is much less correlated with the "downstream task" we actually need, which is driving policy. Other than the phantom braking, people don't complain about perception errors that much but about policy errors (true, some could be induced by perception errors we didn't notice, but I doubt that's the majority).

Policy has been the difficult part, and it's much less clear that the scaling laws there are as obvious, or that the data can be acquired as easily---frequently, improving one aspect for some people will hurt other people. Yes, telemetry from good drivers driving manually will help, but we don't know if that will really work to make reliably controllable systems. In some absurd limit you'd have it repeating the trips of drivers going to the destinations they wanted, not where you want to go.

Even if it is "all nets" now, I suspect the policy net is something distilled from off-line computation of conventional optimization, rules & robotics & simulation primarily, with a little bit of observed driving behavior as test cases. Over time that may become more human-data-driven, but there are all sorts of risks in taking human driving as ground truth vs. observation & perception of the natural environment & human labeling.
 
As in there's a subroutine telling it to move into the correct lane well ahead of time - but to accomplish that it thinks it needs to be in the wrong lane first.
I'm pretty sure this is caused by inaccurate map data: the subroutine notices the map indicates an upcoming turn-only lane before your actual turn, but it doesn't have enough map data or visibility to see that the intermediate intersection actually has a fork from your current lane for a dedicated turn lane, so it believes your current lane is wrong because you'll be forced to turn if you don't get out.
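To make the suspected failure mode concrete, here's a purely hypothetical sketch of that kind of map-trusting heuristic (none of this reflects Tesla's actual code): the map says the current lane becomes turn-only before the planned turn, so the planner asks for a lane change, even though an unmapped fork would have split the turn lane off before the intersection.

```python
# Hypothetical illustration of the suspected map-hint failure mode.
# The lane record says "turn-only before your turn" somewhere downstream, so
# the heuristic requests a lane change, even though (unknown to the map) the
# turn-only portion forks off and the through lane continues.

from dataclasses import dataclass

@dataclass
class MappedLane:
    turn_only_before_route_turn: bool   # what the map claims
    has_unmapped_through_fork: bool     # reality the map doesn't capture

def wants_lane_change(lane: MappedLane) -> bool:
    # Naive heuristic: trust the map and get out of any lane marked turn-only
    # ahead of the planned turn. It has no way to use the second field.
    return lane.turn_only_before_route_turn

current = MappedLane(turn_only_before_route_turn=True,
                     has_unmapped_through_fork=True)

print("requests lane change:", wants_lane_change(current))               # True
print("change actually needed:", not current.has_unmapped_through_fork)  # False
```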

It'll be interesting to see whether end-to-end can learn when to ignore bad map data, or whether it'll generally be lazier about switching lanes, as we saw in the 12.1 video almost missing the left turn to the Fremont Supercharger at the factory. Or whether map data needs to be much more accurate even with end-to-end.
 
I think many failures are because of poor mapping data, and they don't want to buy the expensive proprietary data which is good enough for autonomous driving.
I think that is true. It has a lot of problems on one of my routes where they recently took a lane off the road. It keeps wanting to switch over to the now-nonexistent lane, then changes its mind and aborts. Lather, rinse, repeat. (Though sometimes it actually completes the change.) Last night it seemed to do better at not moving over than it has in the past. (Maybe updated map hints as part of the routing? Or maybe because they keep making small changes to the road, and are adding a pedestrian crossing.)

But even visualization-wise it switches between showing it as a lane and not showing it as a lane. (It is absolutely horrible road design.) And I suspect a lot of damage will happen after the first good snowfall, from people trying to drive in it and from the snow plowing, unless they just don't plow this street because of the design. (There will be small concrete bits, like parking blocks, hidden in the snow.)
 
Drive WAY more chill especially while FSD is in beta and chill mode is selected
Are you referring to the acceleration and FSD Beta settings? I tried both on Chill, and FSD Beta felt too chill in some aspects and not enough in others. Some of it is very personal and regional preference; for example, a follow distance of 2 seconds could feel way too far or way too close in various situations. Do we know if 12.x so far even honors the settings? I suppose it could be possible for end-to-end to temporarily remove / ignore these settings to get a base level of functionality working before considering these adjustments.
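For a sense of why a fixed 2-second gap can feel so different depending on context, the arithmetic is simple (the speeds below are just examples, nothing FSD-specific):

```python
# How far back a fixed 2-second follow time puts you at different speeds.
# Plain arithmetic; the speeds are arbitrary examples.

FOLLOW_TIME_S = 2.0
MPH_TO_MS = 0.44704

for mph in (25, 45, 70):
    gap_m = mph * MPH_TO_MS * FOLLOW_TIME_S
    print(f"{mph} mph -> ~{gap_m:.0f} m ({gap_m * 3.281:.0f} ft) gap")
```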
 
Are you referring to the acceleration and FSD Beta settings? I tried both on Chill, and FSD Beta felt too chill in some aspects and not enough in others. Some of it is very personal and regional preference; for example, a follow distance of 2 seconds could feel way too far or way too close in various situations. Do we know if 12.x so far even honors the settings? I suppose it could be possible for end-to-end to temporarily remove / ignore these settings to get a base level of functionality working before considering these adjustments.
You bring up a good point - if the NN is trained on good behavior, there may not be settings to select anymore. We haven't seen the screen with the modes yet, I don't think.
 
I'm pretty sure this is caused by inaccurate map data: the subroutine notices the map indicates an upcoming turn-only lane before your actual turn, but it doesn't have enough map data or visibility to see that the intermediate intersection actually has a fork from your current lane for a dedicated turn lane, so it believes your current lane is wrong because you'll be forced to turn if you don't get out.
(I started to reply with some detail about the worst example I have, but quickly deleted that post as I think it's not really important)

Yes, I agree that the problem is probably bad map data related to the accuracy of turn lanes vs. main through lanes. And this is a simpler explanation than the heuristic bug theory I gave. But this is a widespread problem in v11 and we'll have to see if v12 can overcome it.

In my most egregious case the problem extends over a stretch encompassing a number of minor intersections / bays prior to the actual next turn. In this area, the map is not telling the car that it's in a good through lane.

So we may explain it but not excuse it, as any human driver would recognize the obvious dominant through-lane as such. However, v11 is not confident enough in its visual perception to override obviously incorrect map data, and frustratingly this has been getting worse, not better, despite version updates and map data updates. In fact, the very latest holiday versions now perform disturbing slowdowns along this stretch - a further degradation beyond the prior signal-and-change syndrome.

Given the above, and many user reports of similar experiences in other places, the v11 trend seems to have been ever-increasing reliance on the maps versus the visuals, despite Tesla obviously knowing that this will cause annoyances and interventions. They must be prioritizing caution, i.e. allowing reactions to false positives from the maps to take precedence over the real-time judgment of the perception system.

To overcome this, v12 has two main tools: first, better perception in terms of high-confidence interpretation of the video stream, and second, better and faster crowdsourcing of the road maps from the entire fleet of Teslas driving around (and perhaps the effective merging of map learning into the training weights of the driving itself). The first tool is really the driving software itself, and if it performs quite well, the confidence would be there to shift back to more reliance on real-time vision and override obviously incorrect map details. The second is not the driving software but improves the maps so that they cause fewer errors from bad map data. Both are good improvements, but for a real breakthrough from v12, I think the quality of the self-driving independent of map data, or in spite of bad map data, is the real key to success.
 