Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.
The amazing thing is that V11 was/is required to get to V12...

In order to curate / test / simulate / etc. the data, you need V11 heuristics

- Even once you have E2E, imo any realistic self-driving car deployment will demand a vector space stack because sometimes you just want explicit control. E.g. if a regulator in some country comes to you with demands around time/distance for various maneuvers, you just want to be able to implement that.
- A vector space stack gives you a valuable "dictionary" over scenes, which helps a ton with various data triggers, data science, etc., the ability to both source training data and analyze/evaluate the performance of the system. And I wouldn't be surprised if it's involved in also shaping the reward function for RL.
For these reasons I never saw the two at tension pulling in different directions. I saw the explicit vector space stack as a precondition to the E2E stack, and as something that will stick around, even if a lot of the driving gradually shifts to E2E mode. So the E2E project has been alive and well inside the team for a long time. Super exciting that it's actually starting to look quite capable.


I think Tesla has been curating the dataset to only include the desired behavior.
 
In order to curate / test / simulate / etc. the data, you need V11 heuristics
Yes, and there are real reasons that can be added to clarify the word "need":
  • If you try to start from nothing, just a huge neural net with all the neurons connected to all the others and some arbitrary set of connection weights, the number of parameters is ridiculously large and the number of attempts required to get to a working solution approaches infinity.
    • Cool emergent behavior, starting from nothing, does occur if the machine and the defined goals are simple enough. But there's not enough computing power on Earth to program the formless self-driving machine with no priors.
  • Also, it's not just that it will take too long on a gigantic computer farm. More fundamental is that the no-priors, unbounded-connections NN will be unstable in early training, i.e. it cannot converge on and refine a solution: the vast majority of the possible (and ultimately unneeded) parameters (interconnect weights) are not only costly to include, they are adversarial unless they are zeroed out.
Starting from v11 isn't theoretically needed, but to start from scratch without heuristic code, you'd have to constrain the problem to a small toy-like driving problem that can converge, then constrain (protect) that network from fundamental disruption while you add next-step capabilities. As of 2023, Tesla's prior self-driving network is a pretty good base for E2E refinement and further development. They may well be looking at alternatives that go back to the beginning, to rebuild a system that's an even better base; I don't know.
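The parameter blow-up can be made concrete with back-of-the-envelope arithmetic. Here's a toy sketch in Python; the camera resolution and layer sizes are hypothetical round numbers, not Tesla's actual architecture. The structural prior of a convolution (local, shared filters) collapses the count by a factor of millions:

```python
# Back-of-the-envelope illustration of the "no priors" parameter blow-up.
# All sizes here are made-up round numbers for illustration only.

def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases in a 2-D convolution: the prior of local,
    shared filters makes the count independent of image size."""
    return kernel * kernel * c_in * c_out + c_out

# One 1.2-megapixel RGB frame, flattened, into a modest 4096-unit layer:
pixels = 1280 * 960 * 3
unconstrained = dense_params(pixels, 4096)

# The same frame through a single 3x3 conv with 64 output channels:
structured = conv_params(3, 3, 64)

print(f"fully connected: {unconstrained:,} parameters")  # billions-scale
print(f"3x3 convolution: {structured:,} parameters")     # a few thousand
print(f"ratio: {unconstrained // structured:,}x")
```

The exact ratio isn't the point; the point is that built-in structure removes most of the "adversarial" parameters before training ever starts.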

Tesla (and AI developers in general) freely admit that key breakthroughs and insights lead to simpler yet better performing systems - but it's not just a case of "Why didn't you think of that before?" to achieve tomorrow's working solution.

So George Hotz, along with many others, can say he knew all along that E2E would be the winner. To me, that's not at all the same as saying that Tesla wasted time doing it wrong.
 
Rohan has increased his postings recently showing Elon has a lot of faith in him
Do you have a sense of whether Rohan Patel's posts are more of what Tesla is focused on or perhaps he's also excited to talk more about FSD?
I suppose gradual rollouts of target regions and hardware are part of the validation and testing to ensure safety before sending it wide. Potentially Ashok Elluswamy was referring to similar metrics or validation for 12.x surpassing 11.x?
 
Yes, and there are real reasons that can be added to clarify the word "need": […]
The pure E2E examples I see start with something very simple, like keeping in the lane, then figuring out when to turn left or right (and how to get through the curve to do so), without having to deal with more parameters. Then they slowly build from there.

But if that earlier article is to be believed, Tesla's approach is basically to keep the existing V11 perception stack, convert the planning stack to NNs, then give feedback to the perception stack with weights that take the planning stack into account (making it E2E). So basically the earlier work was not wasted and remains an important part of V12.
 
Yes, and there are real reasons that can be added to clarify the word "need": […]
Yeah. For V12 to be possible Tesla needed the following:

1. A large dataset of difficult situations

To get that, they first needed to deploy a worse system, but one good enough to gather some edge cases and see where that previous system failed. V11 was great for this.

2. A good way to filter data

Autolabeling, which was developed alongside FSD, was instrumental for this. Now they can analyze good and bad drivers, balance the data, etc. Maybe they will even improve on the drivers by centering them slightly in some situations, by improving the longitudinal control, by using future information, etc.
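The filter-and-balance step described here can be sketched in a few lines. This is a hypothetical toy, not Tesla's pipeline: the field names, scores, and thresholds are invented for illustration.

```python
# Toy sketch of data curation: keep only clips from drivers above a
# quality threshold, then cap each scenario class so no single
# situation dominates the training mix. All fields are hypothetical.
from collections import defaultdict

def curate(clips, min_driver_score=0.9, cap_per_scenario=2):
    """clips: dicts like {"id": 1, "driver_score": 0.95, "scenario": "stop_sign"}."""
    kept, counts = [], defaultdict(int)
    # Prefer the highest-scored examples of each scenario.
    for clip in sorted(clips, key=lambda c: -c["driver_score"]):
        if clip["driver_score"] < min_driver_score:
            continue  # filtered: not a good-enough driver
        if counts[clip["scenario"]] >= cap_per_scenario:
            continue  # filtered: scenario already well represented
        counts[clip["scenario"]] += 1
        kept.append(clip)
    return kept

clips = [
    {"id": 1, "driver_score": 0.97, "scenario": "stop_sign"},
    {"id": 2, "driver_score": 0.80, "scenario": "stop_sign"},   # bad driver
    {"id": 3, "driver_score": 0.95, "scenario": "stop_sign"},   # over cap
    {"id": 4, "driver_score": 0.99, "scenario": "stop_sign"},
    {"id": 5, "driver_score": 0.93, "scenario": "roundabout"},
]
print([c["id"] for c in curate(clips)])  # [4, 1, 5]
```

In a real pipeline the "driver score" and "scenario" labels would themselves come from something like the autolabeler, which is exactly why the V11-era tooling stays valuable.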

3. A way to ask the fleet for more data of specific situations

Here V11 is useful in that it has a lot of code to manually classify the situation, e.g. a stop sign was occluded, whereas V12 doesn't even know what a stop sign is.

4. A good base layer for the neural network.

Like Karpathy says, there is not a lot of signal from hundreds of millions of pixels down to a steering angle. They need to narrow it down a bit first and only train the top layers, so they need a good base layer. Here I think the world model can be useful: take the video input, compress it, add temporal context, then uncompress it to generate the future. In order to do this you have to create a very good compression, i.e. a world model. Then instead of predicting the future you make it predict the steering angle only, but starting from the compressed model of the world.
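A toy sketch of the "good base layer" idea, with everything invented for illustration (the sizes, and a fixed random projection standing in for a pretrained world-model compressor, which is emphatically not Tesla's architecture): freeze the encoder and fit only a small head that maps compressed features to a steering angle.

```python
# Toy "train only the top layer" demo: a frozen encoder compresses the
# input, and a cheap head is fit on the compressed features. In the
# world-model story the encoder would come from future-prediction
# training; here it is just a fixed random projection.
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" encoder: compresses a 1000-dim frame to 16 features.
encoder = rng.normal(size=(1000, 16))

def encode(frames):
    return frames @ encoder  # frozen: never updated below

# Synthetic driving data where steering truly depends on the features.
frames = rng.normal(size=(200, 1000))
true_head = rng.normal(size=16)
steering = encode(frames) @ true_head

# Train ONLY the top layer: closed-form least squares on the features.
features = encode(frames)
head, *_ = np.linalg.lstsq(features, steering, rcond=None)

pred = encode(frames) @ head
print(float(np.max(np.abs(pred - steering))))  # ~0: the head fits exactly
```

Sixteen recovered weights instead of millions of raw-pixel ones: the frozen base did the hard part, which is the "narrow it down first" argument in miniature.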

So yeah, V11 was not wasted work. But I think Karpathy misjudged it a bit in his quote: he thought the vector space was needed at run time, when it was likely only needed to curate the dataset. It turned out that the solution was even simpler than they thought.
 
A way to ask the fleet for more data of specific situations
Now with end-to-end safe enough to be deployed to customers, is this aspect of labeled perception as important as generally gathering examples of where 12.x is doing poorly and how it could behave better? I can understand the value in getting to a base level of functionality and safety for the initial deployment, such as finding specific examples of NHTSA stops or preventing accidentally going on red lights. We're still in the transition phase, where data of specific situations might leverage 11.x to find regional or other differences, but long-term maybe this will be relatively underutilized?
 
Now with end-to-end safe enough to be deployed to customers, is this aspect of labeled perception as important as generally gathering examples of where 12.x is doing poorly and how it could behave better? […]
Yeah, they will still need to gather "full stop at stop sign"-like scenarios for each jurisdiction. Not just examples of where the driver thinks the car is driving like a good driver, but also where NHTSA and their peers are being a pain in their actually smart summon. And plenty of other things: they might realize that they lack enough RHD snow data and have to ask the fleet in China for snowy examples. And plenty of other specific things they are aware of where they are making a targeted effort to fix the problem.

The main innovation of end-to-end is that now every problem is a data problem. But they still have the data problem, and just collecting driver disengagements will not be enough, even if all of those cases are still useful.
 
Yeah, they will still need to gather "full stop at stop sign"-like scenarios for each jurisdiction. Not just examples of where the driver thinks the car is driving like a good driver, but also where NHTSA and their peers are being a pain in their actually smart summon.
I mentioned above that I think the stop-sign data-gathering issue probably involves a huge number of simulated clips, as they've explicitly stated that only a very small set of drivers (not intersecting much with the set of drivers deemed good) perform the stop behavior that NHTSA wants.

But that may have been a blessing in disguise, pushing Tesla to consider the generalized question of what training can be largely simulated. A lot of normal predictable driving can be simulated, and a lot of predictable edge cases and hazards like red light running, kids and animals running into the street, car doors flying open and so on.

For the endless set of less predictable or nearly unique edge cases, they can gather data from the entire fleet, which today is mostly not running FSD at all. In fact, at this stage they probably need to exclude FSD cars from first-level studies of desired behavior in construction zones, emergency vehicle interactions, school zones, school buses, and flagmen and policemen directing traffic.
And plenty of other things, they might realize that they lack enough RHD-snow data and have to ask the fleet in China for snowy examples. And plenty of other specific things they are aware of where they are making a targeted effort in order to fix the problem.
Based on occasional reports, I think the China data-gathering and training effort may need to reside entirely in firewalled, China-dedicated systems. Nonetheless your point is valid across many countries with different road and traffic patterns. We also know that Tesla has considerable experience in data-gathering campaigns to address specific scenarios; this started at least as far back as the human labeling teams around 2019.
The main innovation of end2end is that now every problem is a data problem. But they still have the data problem and just getting driver disengaging will not be enough, even if all of those cases are still useful.
I agree that FSD disengagements, while a useful class of data, are only one aspect and may not even be the most important data set. One might say that a disengagement event is just one kind of flag calling attention to a possibly interesting clip. But many of these are arguably unnecessary, different users have very different disengagement thresholds, and there are probably a larger number of important clips that weren't accompanied by a user disengagement at all.

Pertinent to this, I've been thinking about real-time and offline Shadow Mode. There's been some debate about whether Tesla really can and does run FSD in Shadow Mode alongside manual driving, for comparison. I'm not sure there's enough compute in the car to compare released FSD to release-candidate FSD. However, I don't think that's so important, because it's reasonable to assume that they can compare any released or release-candidate FSD stack against manual driving, which again represents most of the off-highway miles and probably still a very significant portion of the highway miles.

But even without real-time Shadow Mode, if the uploaded data includes all vehicle telemetry, I think they can do an offline version of Shadow Mode and figure out whether the car would have responded appropriately. Offline, the analysis would be faster on the compute farm, and also better in that they can try different versions, sub-versions and parameter settings to compare against the uploaded clips. Similar to my point about disengagements, it may be that in-car, real-time Shadow Mode is not so important for the direct comparison result, but mostly important as a first-line screening operation to select events that are important enough to merit upload for offline analysis.
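The offline comparison amounts to replaying logged telemetry past a candidate stack and flagging where it disagrees with what the human actually did. A hypothetical sketch, with the threshold and signal names invented for illustration:

```python
# Toy offline "Shadow Mode": compare the human's logged steering trace
# against what a candidate planner would have commanded, and flag the
# timesteps where they diverge beyond a threshold. Hypothetical values.

def divergence_events(human_steering, model_steering, threshold_deg=10.0):
    """Return (timestep, human, model) tuples where the candidate's
    steering differs from the logged human steering by > threshold."""
    events = []
    for t, (h, m) in enumerate(zip(human_steering, model_steering)):
        if abs(h - m) > threshold_deg:
            events.append((t, h, m))
    return events

# Logged human drive vs. what a release candidate would have commanded:
human = [0.0, 2.0, 5.0, 30.0, 3.0]
model = [0.0, 1.0, 4.0, 2.0, 3.0]  # candidate misses the sharp turn at t=3
print(divergence_events(human, model))  # [(3, 30.0, 2.0)]
```

The flagged event, not the whole drive, is what would merit upload and human review, which is the screening role described above.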

Mixed in with all this is the curation of which drivers and/or which uploaded human-driving clips are good standards of comparison. Obviously they can use some version of the Safety Score, which can also be gathered in Shadow Mode. On top of all that, they still have their team of dedicated hired drivers all around the world (they're not all in Chuck's neighborhood :) ).
 
I'm curious about this, as I've understood the opposite. Red light and speed cameras were common many years ago, but have been removed due to constitutional challenges - specifically the right to face your accuser. Can't haul a camera into court. :)
I think it’s state-dependent, but the accuser is the government; the camera is just evidence.
 
Yeah but even then, in the past there's been a wavy pattern to it. It will ramp up for a couple days, drop off for a day, then ramp harder for another day or two, then off a day or two, then go all out, or something like that. I think we're now in the first dip.
 
It’s promising to hear that v12.3 is excelling in smoothness and such, this is critical for wide release as a Level 2 driver assist.

I would reword that to say:

It’s promising to hear that v12.3 is excelling in smoothness and such; this is critical for Tesla to be the undisputed leader in Level 2 driver assist.

Level 2 driver assist is a really wide category and, to be fair, Tesla is already arguably better than any North American driver assist when all driving situations are considered. From all the different comparison articles I've read/watched, I've learned there are many hands-free systems, but they are still limited in where they can be used, and even then they are not very smooth (although in some limited circumstances they may be as smooth as or smoother than V11). As you all know, I have no intention of ever buying another Tesla, so I'm paying close attention to progress in this area: L2 driver assist was on the list of options my car needed to have back in 2020, but now the L2 ADAS on any car I buy will be compared to FSDb.*

I'd also argue that every Tesla has L2 driver assist under the definition (basic AP offers braking, acceleration, and lane keeping, which makes it L2), so it is already widespread given the number of Teslas on the road and the fact that these features tend to be high-end options in other vehicles.

It is smoothness that will lead to widespread USE by Tesla owners.

* Ironically, the last time we were discussing the must-haves for our next car while on a road trip, my husband wanted all the features of V11 highway driving, including for 2-lane undivided highways, because he wasn't willing to go 'back' to the 'old stress' of highway driving. I asked him, if it was so important to him, why wasn't FSDb or AP engaged at that moment? He had to admit he had turned it off because it was too stressful to pay attention to potential diving into turn lanes or other random, dangerous behaviour, and since the wipers were acting up, to keep them off he had turned off AP as well.