
Autonomous Car Progress

I don't understand. The very purpose of this effort is to give people an idea of what the model is doing. The fact that it said it was doing something it wasn't tells us there's a bug, either in the text-generation system or in the driving system. If FSD explained what it was doing, we'd know right away why it was driving at a particular speed, why it refused to change lanes, or why it does all the other odd things that it does.
My point is an E2E system doesn't "know" anything. It doesn't know there's a bike or a car or a lane to the right or whatever. It's just a huge equation with a billion fixed coefficients. This language engine, for lack of a better word, simply makes stuff up.
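(To make "a huge equation with fixed coefficients" concrete, here's a minimal sketch in plain NumPy with toy sizes, purely illustrative: once training is finished, inference is nothing but arithmetic on frozen weight matrices; there is no "bike" or "lane" symbol anywhere inside.)

```python
import numpy as np

# Toy "trained" network: a real E2E driving model has billions of coefficients;
# here it's just two small frozen matrices.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))    # frozen after training
W2 = rng.standard_normal((8, 2))     # frozen after training

def forward(x):
    """Inference is only fixed arithmetic on the inputs."""
    h = np.maximum(0.0, x @ W1)      # ReLU hidden layer
    return h @ W2                    # e.g. a steering and an acceleration number

camera_features = rng.standard_normal(16)   # stand-in for processed camera input
print(forward(camera_features))             # two numbers out; nothing else is "known"
```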

Here's a question: how do you train the language engine? You train NNs by feeding in a bunch of training data and tweaking the coefficients until you get close to the "right" output. With image recognition the right output is what a human curator says each training image represents. With a braking NN the right output could be the lowest constant g-force that stops at the stop sign.

What's the right output for this language NN? What a human curator says, as with image recognition? Probably. But the human curator is just guessing! So they trained the NN to guess like a human. It might make the rider feel better, but it doesn't represent the actual driving decisions.
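As a rough sketch of the training recipe described above (PyTorch, made-up data and shapes, not anything Tesla has published): the "right output" is whatever the curator supplies, and the optimizer just nudges the coefficients toward it, whether the label is an image class, a target deceleration, or a human-written description.

```python
import torch
import torch.nn as nn

# Hypothetical curated pairs: sensor features -> the "right output" a human chose.
# For a braking net the target could be the lowest constant deceleration that
# still stops within the remaining distance: a = v**2 / (2 * d).
def braking_target(speed_mps: float, distance_m: float) -> float:
    return speed_mps ** 2 / (2 * distance_m)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(100):                 # toy loop over made-up curated data
    x = torch.randn(8, 32)              # stand-in for sensor features
    y = torch.rand(8, 1) * 5.0          # stand-in for curator-provided targets
    opt.zero_grad()
    loss = loss_fn(model(x), y)         # how far from the "right output"?
    loss.backward()
    opt.step()                          # tweak the coefficients a little
```

A language head trained the same way is fitted to the curator's guesses about the reasons, not to whatever computation the driving network actually performed.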
 
  • Like
Reactions: spacecoin
It's just a huge equation with a billion fixed coefficients.

Relevant XKCD:

[attached image: XKCD comic]
 
What's the right output for this language NN? What a human curator says, as with image recognition? Probably. But the human curator is just guessing! So they trained the NN to guess like a human. It might make the rider feel better, but it doesn't represent the actual driving decisions.
Have you watched that video? It describes what the car is doing, and why. Folks picked up on the fact that a pedestrian had to hustle to get across the road before the car arrived - while the car happily pronounced the road clear (7:00 mark). Is that a flaw in the description or in the control system? I'd claim the latter. I say that the description system was correct in describing what the car thought, and that's invaluable to understanding how the control system worked; it considered the road clear because the pedestrian was hustling to move off the driving surface. I'd have preferred FSD slow slightly, which would certainly have been more reassuring than assuming that the pedestrian would be clear by the time the car arrived. But that's a control problem.
 
This is probably worth discussing here. For years we have assumed that the version of FSD deployed to customer vehicles was the only version, and that Tesla did not have a separate version that could reach higher levels of autonomy sooner.

Ed Ludlow is now reporting that Tesla does actually have separate versions in development for dedicated robotaxis:
Given these are separate from the version we're receiving on our cars, I can only assume that the hardware for the Tesla robotaxi is substantially different from HW3/4.
 
This is probably worth discussing here. For years we have assumed that the version of FSD deployed to customer vehicles was the only version, and that Tesla did not have a separate version that could reach higher levels of autonomy sooner.

Ed Ludlow is now reporting that Tesla does actually have separate versions in development for dedicated robotaxis:

Given these are separate from the version we're receiving on our cars, I can only assume that the hardware for the Tesla robotaxi is substantially different from HW3/4.
"Elon first tasked the software team to look at robotaxi-specific architecture / FSD foundation models in 2019." Hahahaha. Not likely.

"The barriers to progress were Dojo being behind". Oh yeah? Why didn't he buy A100:s then to solve it? What about all unsolved research problems? Afaik NN:s don't provide reliability guarantees.

"The innovation that’s mattered is FSD software that is end-to-end (no c++ control Code). That’s been the key to unlocking a generation of software that could realistically power a 'robotaxi.' " Sigh. Tesla marketing once again.

I hold Bloomberg to higher standards than this POS reporting. This is basically advertising Tesla's hopium.
 
  • Like
Reactions: diplomat33
I hold Bloomberg to higher standards than this POS reporting. This is basically advertising Tesla's hopium.

Far from it. Did you miss the fact that Ludlow titled his full piece "Elon Musk’s Robotaxi Dreams Plunge Tesla Into Chaos"?

Bloomberg has done their best to spin it negatively, but I think this represents some genuinely interesting inside information on Tesla's development toward actual autonomy.
 
Far from it. Did you miss the fact that Ludlow titled his full piece "Elon Musk’s Robotaxi Dreams Plunge Tesla Into Chaos"?

Bloomberg has done their best to spin it negatively, but I think this represents some genuinely interesting inside information on Tesla's development toward actual autonomy.
No, I didn't miss that. I subscribe to Bloomberg. That tweet, though, was unsubstantiated and unverified bullcrap.
 
This is probably worth discussing here. For years we have assumed that the version of FSD deployed to customer vehicles was the only version, and that Tesla did not have a separate version that could reach higher levels of autonomy sooner.

Ed Ludlow is now reporting that Tesla does actually have separate versions in development for dedicated robotaxis:
Given these are separate from the version we're receiving on our cars, I can only assume that the hardware for the Tesla robotaxi is substantially different from HW3/4.
This whole thing is fabricated complete nonsense trash.
There's not one true statement in this tweet. Dojo has contributed approx 0% to FSD.
Anyone who regurgitates "Dojo" just shows how clueless they are.
 
This whole thing is fabricated complete nonsense trash.
There's not one true statement in this tweet. Dojo has contributed approx 0% to FSD.
Anyone who regurgitates "Dojo" just shows how clueless they are.
Hi, Bladerskb --

For extra fun, recent reporting on the Austin data center indicates that it is known internally as "Dojo". Maybe Teslarians refer to any in-house FSD-related compute as Dojo.

Yours,
RP
 
  • Helpful
Reactions: willow_hiller
For years we have assumed that the version of FSD deployed to customer vehicles was the only version, and that Tesla did not have a separate version that could reach higher levels of autonomy sooner.
Actually, for years people have speculated about a secret FSD version that is much better. Given how much Elon would love to solve FSD (and how he had to eat humble pie year after year), I doubt they have anything better.

Moreover - they simply don't have any training data for the new robotaxi (with more cameras, presumably). So what neural networks will they have that are somehow better?
 
Given these are separate from the version we're receiving on our cars, I can only assume that the hardware for the Tesla robotaxi is substantially different from HW3/4.
What are they going to train them with? They can pile on all the inference compute that they want and train a huge model from existing video, but if they change the cameras, they lose the vaunted advantage of a massive volume of training data.
 
Moreover - they simply don't have any training data for the new robotaxi (with more cameras, presumably). So what neural networks will they have that are somehow better?

V12 was trained primarily on employee-collected data, so I don't think we can assume they have no training data for a robotaxi neural net because there are no customer vehicles.

Earlier this month, Tesla had public job listings for data collection from "prototype vehicles": Tesla boosts its data collection team with hiring ramp for Prototype Vehicle Operators

What's the alternative explanation? Ed Ludlow decided to lie about Tesla's progress? For what reason?
 
V12 was trained primarily on employee-collected data, so I don't think we can assume they have no training data for a robotaxi neural net because there are no customer vehicles.

Earlier this month, Tesla had public job listings for data collection from "prototype vehicles": Tesla boosts its data collection team with hiring ramp for Prototype Vehicle Operators

What's the alternative explanation? Ed Ludlow decided to lie about Tesla's progress? For what reason?
Hi, Willow_Hiller --

> V12 was trained primarily on employee-collected data,

Got a source for that? If true, it really undercuts the "only Tesla has the data" story. Or maybe the idea is that getting to 12.x is easy, but you need the mountains of data for the march of nines? But if that's true, why would you train on employee-collected data instead of using the existing mountain?

Yours,
RP
 
Why would that be? They would have used all the data they collect from all vehicles. Employee vehicles would only be used for validation.

I was under the impression that certain maneuvers, like coming to a complete stop at a stop sign, had so little training data that Tesla was supplementing it with employee-collected data. Things like that, and Chuck Cook's UPL, which still to this day seems to have employees there collecting data for the turn.

But looking back at the discussion between Ashok and Elon, it does sound like they still poll the fleet for it, despite it being less than 0.5% of stops.
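For what it's worth, here's a toy sketch (all field names hypothetical) of what "polling the fleet" for a rare maneuver could look like: the campaign is essentially a filter that keeps only clips matching a trigger, so an event that's under 0.5% of stops can still be oversampled in the training set.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    vehicle_id: str
    min_speed_mps: float       # lowest speed during the stop-sign approach
    stop_sign_present: bool

def is_complete_stop(clip: Clip, threshold: float = 0.1) -> bool:
    # Keep only the rare clips where the car actually came to a full stop.
    return clip.stop_sign_present and clip.min_speed_mps < threshold

fleet_clips = [
    Clip("A", 0.00, True),     # full stop -> keep
    Clip("B", 2.30, True),     # rolling stop -> discard
    Clip("C", 0.05, False),    # no stop sign -> discard
]
curated = [c for c in fleet_clips if is_complete_stop(c)]
print(len(curated), "clips selected for training")
```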
 
  • Informative
Reactions: EVNow
Have you watched that video?
I gave timestamps from that video, so I obviously watched some of it (first ~5 minutes + a couple snippets).
It describes what the car is doing, and why.
I don't agree. We don't know what it describes. NNs produce guesses based on coefficients defined during training. We don't know how they trained this NN. If it's truly E2E there may be no connection at all between the language output and what the car is actually "thinking".

I say that the description system was correct in describing what the car thought, and that's invaluable to understanding how the control system worked; it considered the road clear because the pedestrian was hustling to move off the driving surface.
There's no way to know any of this with E2E.

Consider this classic adversarial image example (from this paper). In both cases you and I see a bear-shaped face with mostly white fur, black ears and black around the eyes. So we say "Panda". The vision NN, on the other hand, sees two completely different images. It doesn't "think" in the same terms as us. It came up with "Panda" for the first image, but it obviously wasn't based on head shape, white face and black eyes/ears. What was it based on? We don't know.

[attached image: adversarial panda example from the paper]
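(The panda image is the FGSM example from Goodfellow et al.; here's a minimal sketch of the attack in PyTorch, with a random tensor standing in for the real photo, so treat it as illustrative only: a tiny, human-invisible perturbation is built directly from the model's own gradients.)

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

# Pretrained classifier; a random tensor stands in for the actual panda photo.
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
x = torch.rand(1, 3, 224, 224, requires_grad=True)   # placeholder "image"
label = model(x).argmax(dim=1)                        # whatever the model currently predicts

# Fast Gradient Sign Method: push every pixel a tiny step in the direction
# that most increases the loss for that prediction.
loss = F.cross_entropy(model(x), label)
loss.backward()
epsilon = 0.007
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

with torch.no_grad():
    print("before:", label.item(), "after:", model(x_adv).argmax(dim=1).item())
```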


You could build and train a language model that would look at these same images and say "based on the bear-shaped face, white fur, black ears and eyes the image NN decided this is a panda". But it would be lying.

It's different if you build a driving stack out of separate NNs, e.g. perception, prediction, planning, etc. You can look at the output from the perception NN and correctly say the car sees a bicycle. You can look at the outputs from the prediction stack and correctly say the car predicts the bicycle will cross into the car's path. And you can reasonably infer that's why the car braked. You won't be right 100% of the time, maybe the car actually braked for a butterfly. But you can get very close.

With E2E, though, all bets are off. We can't know if the NN "sees" the bicycle or not -- in fact the whole concept of "seeing" doesn't even exist in the way you and I think of it.
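To illustrate the difference in inspectability (purely hypothetical interfaces, not Tesla's actual stack): with a modular design each stage exposes an output you can log and honestly report, while a monolithic E2E model hands you controls and nothing else.

```python
from typing import NamedTuple

class Perception(NamedTuple):
    objects: list                  # e.g. ["bicycle"]

class Controls(NamedTuple):
    brake: float
    steer: float

def modular_stack(frame) -> Controls:
    percept = Perception(objects=["bicycle"])        # intermediate output we can read
    print("perception reports:", percept.objects)    # checkable description, not a guess
    crossing = "bicycle" in percept.objects          # toy stand-in for the prediction stage
    return Controls(brake=0.6 if crossing else 0.0, steer=0.0)

def e2e_stack(frame) -> Controls:
    # One opaque mapping from pixels to controls: there is no intermediate
    # "bicycle" variable anywhere to read back out.
    return Controls(brake=0.6, steer=0.0)

print(modular_stack(None), e2e_stack(None))
```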
 
I gave timestamps from that video, so I obviously watched some of it (first ~5 minutes + a couple snippets).

I don't agree. We don't know what it describes. NNs produce guesses based on coefficients defined during training. We don't know how they trained this NN. If it's truly E2E there may be no connection at all between the language output and what the car is actually "thinking".


There's no way to know any of this with E2E.

Consider this classic adversarial image example (from this paper). In both cases you and I see a bear-shaped face with mostly white fur, black ears and black around the eyes. So we say "Panda". The vision NN, on the other hand, sees two completely different images. It doesn't "think" in the same terms as us. It came up with "Panda" for the first image, but it obviously wasn't based on head shape, white face and black eyes/ears. What was it based on? We don't know.

[attached image: adversarial panda example, quoted from above]

You could build and train a language model that would look at these same images and say "based on the bear-shaped face, white fur, black ears and eyes the image NN decided this is a panda". But it would be lying.

It's different if you build a driving stack out of separate NNs, e.g. perception, prediction, planning, etc. You can look at the output from the perception NN and correctly say the car sees a bicycle. You can look at the outputs from the prediction stack and correctly say the car predicts the bicycle will cross into the car's path. And you can reasonably infer that's why the car braked. You won't be right 100% of the time, maybe the car actually braked for a butterfly. But you can get very close.

With E2E, though, all bets are off. We can't know if the NN "sees" the bicycle or not -- in fact the whole concept of "seeing" doesn't even exist in the way you and I think of it.
FYI, upthread it was posted that the current theory is Tesla is using modular E2E, meaning there are still separate perception, prediction, and planning NNs. What makes it E2E is that they are all NNs and the output at the end is fed back to the prior NNs (instead of them being 100% independent).

As such, they can get the output of the perception NN and have a decent idea of what it has classified an object as.
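A rough sketch of what "modular E2E" could mean in practice (toy module sizes, not Tesla's actual architecture): perception, prediction, and planning stay separate networks whose intermediate outputs remain readable, but a single loss on the final control output is backpropagated through all of them, so the earlier modules are shaped by the end result rather than trained in isolation.

```python
import torch
import torch.nn as nn

# Three separate modules, jointly optimized.
perception = nn.Linear(64, 32)    # camera features -> object features
prediction = nn.Linear(32, 16)    # object features -> predicted trajectories
planning   = nn.Linear(16, 2)     # trajectories -> brake / steer

params = [*perception.parameters(), *prediction.parameters(), *planning.parameters()]
opt = torch.optim.SGD(params, lr=1e-2)

frames = torch.randn(8, 64)               # stand-in for camera input
target = torch.randn(8, 2)                # stand-in for demonstrated driving

objects = perception(frames)              # still a concrete tensor you can inspect
trajectories = prediction(objects)        # likewise
controls = planning(trajectories)

loss = nn.functional.mse_loss(controls, target)
loss.backward()                           # one loss; gradients reach all three modules
opt.step()
```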
 
Tesla will integrate Baidu "HD" maps into cars in China.

"On Monday, Baidu announced that Tesla vehicles in China will integrate Baidu Maps V.20 in May which features a 3D lane-level navigation capability directly integrated into vehicles."


[attached image: Baidu Maps announcement]