This is definitely not my expectation - I think it’s a major rewrite and (mostly) end to end.

That is why my expectations of an early release are still low.

I also think there will be multiple rounds of employee and YT rollouts before we get it.

From V11's initial employee rollout --> OGs receiving it was about 4 months:

11/11/22:

OGs on 3/8/23:
 
On S&X, yes; on 3&Y, HW4 has no unused inputs. At least that is what I thought green said.
The 3/Y have the connector locations depopulated; S/X have them, but with no cables plugged in.
The critical piece of info is that HW4 can support a front bumper camera.
(Tangent: ignoring processing, HW3 theoretically could as well, since the tri-camera is now a bi-camera.)
 
This is definitely not my expectation - I think it’s a major rewrite and (mostly) end to end.

That is why my expectations of an early release are still low.

I also think there will be multiple rounds of employee and YT rollouts before we get it.
V11, as I recall, first went to employee vehicles mid-Nov 2022 and finally delivered to end users at the end of March 2023. So, almost five months to go from employees to customers. V12 started rollout a little later in the year than V11. So, I would expect to get V12 no earlier than April.
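For what it's worth, here's that math written out; the exact dates are my approximations, and the assumed V12 start date is just for illustration:

```python
from datetime import date

# Approximate V11 milestones from this thread (employee rollout -> wide customer release).
v11_employees = date(2022, 11, 11)
v11_customers = date(2023, 3, 28)        # "end of March 2023"; exact day is a guess
gap = v11_customers - v11_employees
print(gap.days, "days, about", round(gap.days / 30.4, 1), "months")   # 137 days, about 4.5 months

# If V12's employee rollout started in late Nov 2023 and the same gap repeats:
v12_employees = date(2023, 11, 24)       # assumed start date, for illustration only
print("projected customer release:", v12_employees + gap)             # 2024-04-09
```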
 
When we get V12, I’m going to take things slowly: test my regular routes in low traffic before starting to use it regularly, and be extra-extra vigilant.
Yeah, I'm expecting end-to-end to drive noticeably differently from 11.x, so extra practice and resetting human-driver expectations is probably a good starting point. Presumably Tesla has accumulated automated test cases from 10.x as well as from the highway situations leading up to single stack, so hopefully there are plenty related to control that ensure those known failures of earlier FSD Beta do not regress with end-to-end. The trickier aspect is new driving behaviors that were either "easy" for previous versions or newly learned from examples; pulling over to the side of the street at a destination, for example, probably didn't have existing test coverage.
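To make "automated test cases related to control" a bit more concrete, here's a minimal sketch of what a scenario regression test could look like; the clip files, the replay_scenario() helper, and the thresholds are all made up for illustration, not anything Tesla has described:

```python
# Hypothetical control-behavior regression test: replay recorded/simulated clips of
# previously fixed failures and assert they stay fixed in the new build.
from dataclasses import dataclass

@dataclass
class ReplayResult:
    stopped_before_line: bool   # did the car stop before the stop line?
    min_gap_m: float            # closest approach to any other road user, in meters
    completed_route: bool

def replay_scenario(clip_path: str) -> ReplayResult:
    """Placeholder: run the driving stack against a recorded or simulated clip."""
    raise NotImplementedError

KNOWN_FAILURE_CLIPS = [
    "clips/stop_sign_rollthrough.bin",        # fixed in an earlier release; must not regress
    "clips/unprotected_left_hesitation.bin",
]

def test_no_control_regressions():
    for clip in KNOWN_FAILURE_CLIPS:
        result = replay_scenario(clip)
        assert result.stopped_before_line, clip
        assert result.min_gap_m > 1.0, clip
        assert result.completed_route, clip
```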

Another thing to consider: with FSD Beta in such wide release, driving over 2 million miles every day, Tesla is probably also a bit more cautious than it was for the earlier 10.x releases, when daily miles were nearly 3 orders of magnitude smaller. Hopefully Tesla can continue leveraging the fleet to speed up rollouts while maintaining safety.
 
Surprised nobody picked up on Wholemars saying "it could suck worse than v11" about fsd v12.

He's always praising v11 like it's near level 5 and now he's implying it sucks? Lol, what a tool.
No, that is NOT what he said.

In the past, a newer version of Tesla FSD has shown regressions before it got better in subsequent minor updates. One step forward, two steps back. So he is wondering aloud whether the first release of V12 will actually be a regression from V11 in some respects. Go back and recall the very first version of Tesla's own Autopilot compared to the Mobileye version. It was a huge regression before things got far better.

Now, given the seismic change in FSD software architecture between V11 and V12, that is a distinct possibility. Things might regress before they get better.
 
11/11/22:
Ah yeah, the original release of 11.x was exactly 11/11 11:11. Tesla's 2-week development cycle with Tuesday releases happens to match up with 12/12 12:12 this year, so that could be another internal deadline for 12.x before the holiday update / end of year.
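For anyone who wants to check the calendar trivia, a couple of lines of Python will do it (dates from the posts above; the Tuesday release cadence is just the pattern people have noticed, not anything official):

```python
from datetime import date

# 11.x first went out on 11/11/22 at 11:11; this year's 12/12 12:12 happens to
# land on the usual release weekday.
for d in (date(2022, 11, 11), date(2023, 12, 12)):
    print(d.isoformat(), d.strftime("%A"))
# 2022-11-11 Friday
# 2023-12-12 Tuesday
```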
 
This is definitely not my expectation - I think it’s a major rewrite and (mostly) end to end.

That is why my expectations of an early release are still low.

I also think there will be multiple rounds of employee and YT rollouts before we get it.
Best to set expectations low, given Elon's track record here....
 
 
you're suggesting that the end-to-end livestream from August is a separate development path from what we'll get with 12.x because 3 months from demo to employee testing is not enough time to make things safe

No. Not a separate path!
I think that demo just shows their “end-to-end” planner in action. Which is an incremental change. They’ve rolled things in like this before - their lanes network, their occupancy network, etc. This just adds another facet (and gets rid of a lot of code). But it isn’t going to get rid of a lot of structure which ties this all together - and they may even add some guardrails to make sure behavior is correct (I have no idea how that would be done, of course - I don’t know ANYTHING about any of this, at all).

But that is very different than just throwing photons at a massive single NN structure and getting driving controls out! (What I think most people think of as “end-to-end,” and most certainly would not be incremental.)

In the videos linked above, note Ashok’s liberal use of “end-to-end”. Multiple pieces of the FSD system are referred to as end-to-end, and they are, per his definition. The lanes network is end-to-end (6:05). Someone should really just ask Ashok to define end-to-end so that everyone can be clearer on this. (I’m not going to attempt a definition and put words in his mouth; I just agree that the lanes network is end-to-end for the most part, based on what they claim.)

It’s the same for the v12 Beta - the planning is going to be end to end, presumably with minimal post processing. But I don’t think it’s going to be performing the final control functions (like how exactly to stop and how to go - that will presumably still be a good solid physics model). And overall everything else we’re already familiar with is going to stay pretty much intact, unless explicitly stated otherwise.
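Roughly, the distinction I keep drawing looks like this; it’s a toy sketch with made-up function names, not anyone’s actual code:

```python
# Toy sketch of the distinction; every function name here is made up.

def perception_nets(frames):        # lanes / occupancy / objects, roughly as today
    return {"lanes": [], "objects": []}

def learned_planner(world_state):   # the new "end-to-end" piece: emits a trajectory
    return [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]

def physics_control(trajectory):    # classical controller turns the trajectory into actuation
    return {"steering": 0.0, "accel": 1.2}

def giant_net(frames):              # placeholder for a single pixels-to-controls network
    return {"steering": 0.0, "accel": 1.2}

# Reading (a): incremental - swap a learned planner into the existing structured stack.
def drive_incremental(frames):
    return physics_control(learned_planner(perception_nets(frames)))

# Reading (b): photons in, controls out - one network, no separate perception/planner/controller.
def drive_monolithic(frames):
    return giant_net(frames)
```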

I think it’s a major rewrite and (mostly) end to end.
To me the above doesn’t seem that major. It certainly seems worthy of a revision to v12 Beta though!

But we still have v12 after that!
 
V12 Beta very much seems like it is going to be a small incremental change. ...
I think that demo just shows their “end-to-end” planner in action. Which is an incremental change. They’ve rolled things in like this before - their lanes network, their occupancy network, etc. This just adds another facet (and gets rid of a lot of code). ...
It seems to me that to believe this, you have to specifically deny and disbelieve what Elon and Ashok explained: that there is no code for recognition of stop signs, traffic lights, lanes and so on.
But that is very different than just throwing photons at a massive single NN structure and getting driving controls out!
I believe that it is a massive but not formless NN structure that is the starting point for the v12 video training. You don't throw "photons" at it at training time, you throw at it recordings and simulations of video and transducer telemetry.

I'm no ML expert, but the following is my current understanding of why this works and why it is indeed a very significant departure from the previous versions, even while it builds on them:

V12 does very much depend on the prior developments, because that is where the starting-point weights come from. Those weights effectively determine the architecture of the neural network. I believe that this is tractable because it then is not, in fact, just a giant mass of software neurons with every output potentially influencing every other neuron's input. Instead there are large, but not impossibly large, lists of dot product weights that reflect the architectural grouping and functionality of various decision centers throughout the system.

There is not much compiled "code" but there is a huge database of tensor weight values that define the interactions among the neurons, and just as importantly define, by their absence, the zero-weighted non-interactions among isolated sub-networks.

In theory it could be implemented as a single homogeneous network where every neuron receives a weighted input from a gigantic list of every other neuron in the whole thing - but most of those weights would be zero, so the efficiency and performance would be dreadful, and the memory requirement would be enormous to no purpose. And more fundamentally, I believe the training iterations would quickly become quite unstable as the back propagation process would try assigning useless finite intercommunication weights among what should be the isolated sub-networks. Some kind of entropy where the prior (promising yet imperfect) network actually loses its architectural form and descends into chaos.
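A crude back-of-envelope version of that argument, with arbitrary numbers just to show how much faster the all-to-all weight count grows than a block-structured one:

```python
# Back-of-envelope for "fully homogeneous vs. block-structured".
# The counts are arbitrary illustrations, not anything from Tesla.

n_neurons = 1_000_000                      # pretend total units across the whole system
n_blocks = 100                             # grouped into functional sub-networks
per_block = n_neurons // n_blocks

dense_weights = n_neurons ** 2             # every unit feeds every other unit
block_weights = n_blocks * per_block ** 2  # connections only within each block

print(f"fully dense:      {dense_weights:.2e} weights")             # 1.00e+12
print(f"block-structured: {block_weights:.2e} weights")             # 1.00e+10
print(f"ratio:            {dense_weights / block_weights:.0f}x")    # 100x
```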

All this means that the v12 end-to-end network is only made possible and practical by having it train and tune the finite lists of weightings that came out of the prior code-defined NN versions. It is indeed a fundamentally different approach, but it critically depends on the prior work.
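In ML terms this is basically warm-starting: initialize the new network from the existing modules' weights instead of from scratch, then fine-tune the whole thing on the driving objective. A minimal PyTorch-flavored sketch of the general idea (generic, nothing Tesla-specific; the checkpoint path, layer sizes, and 64x64 frame assumption are all made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic warm-start sketch: reuse previously trained perception weights as the
# starting point, add a new planning head, and fine-tune everything end to end.
perception = nn.Sequential(                    # stand-in for the existing perception nets
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten())
perception.load_state_dict(torch.load("prior_perception.pt"))  # hypothetical checkpoint path

planner_head = nn.Sequential(                  # new piece, trained from scratch
    nn.Linear(16 * 64 * 64, 64), nn.ReLU(), nn.Linear(64, 2))   # assumes 64x64 frames; outputs (steer, accel)

model = nn.Sequential(perception, planner_head)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)            # the old weights are tuned, not frozen

def training_step(frames, expert_controls):
    loss = F.mse_loss(model(frames), expert_controls)           # imitate human-driven examples
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```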

Also - and here I'm really just ruminating - the v11 starting-point platform (or future iterations of it) may still have importance as a kind of developmental breadboard, to be used for cases where the v12 training is not producing satisfactory outcomes. For example, if it doesn't generatively learn to read and understand signs or crossing-guard actions in school zones, they could go back to v11 and do some old-fashioned module coding that "kind of works" and becomes the foundational basis of a new sub-network: a needed capability enabler for the end-to-end training to then fine-tune, and from which it can extract the most performance.
 
It seems to me that to believe this, you have to specifically deny and disbelieve what Elon and Ashok explained: that there is no code for recognition of stop signs, traffic lights, lanes and so on. ...
Whew… 😉