Here's more context of what he said / what I can hear:

Because of the fact that it's photons in to controls out -- the photons are different. You're getting a different bitstream with HW4 cameras than HW3 cameras.
Although this makes me wonder: did Tesla specially retrain 11.4.x for HW4, or is it running in some emulated mode with preprocessing (see the sketch below)? And is Basic Autopilot already running on new HW4 cars, including the Model Y?

I still don't like the expression "photons are different". He could say the bitstream is different. He could say the video is different. But it is the same photons that enter the HW3 or HW4 cameras, they are just processed differently.
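As a concrete (and entirely hypothetical) illustration of the "emulated mode with preprocessing" idea above: a higher-resolution HW4 frame could simply be binned down to a HW3-like frame before it reaches a network trained on HW3 video. The frame sizes and the 2x2 binning below are assumptions chosen for a clean example, not Tesla's actual pipeline.

```python
# Illustrative sketch only: one way an "emulated mode" could preprocess
# higher-resolution HW4-style frames so a network trained on HW3-resolution
# video still sees familiar input. Resolutions and the binning choice are
# assumptions for illustration, not Tesla's actual pipeline.
import numpy as np

HW3_SHAPE = (960, 1280)   # assumed HW3-style frame size (rows, cols)
HW4_SHAPE = (1920, 2560)  # assumed HW4-style frame size; an even 2x picked for simple binning

def emulate_hw3_frame(hw4_frame: np.ndarray) -> np.ndarray:
    """Downsample a HW4-style frame to HW3 resolution by 2x2 averaging ("binning")."""
    assert hw4_frame.shape[:2] == HW4_SHAPE
    h, w = HW3_SHAPE
    # group pixels into 2x2 blocks and average each block
    binned = hw4_frame.reshape(h, 2, w, 2, -1).mean(axis=(1, 3))
    return binned.astype(hw4_frame.dtype)

# example with a fake 12-bit frame
frame = np.random.randint(0, 4096, size=(*HW4_SHAPE, 1), dtype=np.uint16)
print(emulate_hw3_frame(frame).shape)  # (960, 1280, 1)
```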
 
I think my favorite Elonism is when he said "photons are different". That is pseudoscience nonsense. Maybe he was trying to say that they can't take the V12 trained on HW3's lower res cameras and simply put it on HW4 cars because HW4 uses higher res cameras, hence they need to retrain on higher res cameras for V12 to work right on HW4 cars. But saying "photons are different" is dumb.
Yes, he was trying to say
that they can't take the V12 trained on HW3's lower res cameras and simply put it on HW4 cars because HW4 uses higher res cameras
I understood it that way when he said it. It wasn't dumb at all.
 
I think there is a clear disconnect between Elon and the engineering team. Elon seems to be thinking more in theory. So when Elon says FSD is solved, he means it is solved "on paper" as he sees it. Put differently, in his mind he sees E2E vision-only as the right solution to FSD on paper. But engineers look at the real world. So the Tesla engineers are looking at what it takes to actually achieve safe, reliable autonomous driving in a real car in the real world, where the human does not need to supervise, with all the edge cases and challenges that entails.

That seems like a very charitable interpretation. The engineering team reports to Musk. He has more insight into the real state of affairs than any other non-engineer. And yet he kept repeating the same claims again and again. I think the reason is pretty obvious: the Robotaxi vision is a large part of Tesla's mythology and sky-high valuation. He can't admit that they are nowhere close without losing that. And so he keeps rolling out a never-ending series of next big things that will "solve everything". Currently it's end-to-end AI.

I wish they would focus on achievable goals that bring real-world benefits, such as L4 on divided highways. MobilEye and others are close to rolling that out ...
 
Yeah, I can't imagine a more marketable car than one you can let drive itself while you watch a movie on a road trip. I'd actually say that with some effort they could get to L4 on interstates during the next few years.
 
Why wouldn't a single neural network be able to handle the differences? If you're thinking the size is insufficient to effectively learn the differences from the training data, then Tesla can add more layers at the cost of additional compute and latency. At least right now, it sounds like there's still plenty of room to expand, with it theoretically able to run at 50 fps. However, increasing the size of the neural networks probably makes them more susceptible to overfitting, as the weights in the additional layers want to "do something" and could incorrectly detect patterns that aren't meaningful.
Perception isn't the issue, but control: you need to output different control signals and have different control algorithms in different regions. The training targets need to be different, and some of the networks would have to be segmented and conditioned on regions.
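To make "conditioned on regions" concrete, here is a minimal sketch (not Tesla's architecture; the feature sizes, region vocabulary, and output convention are all made-up illustrative values) of how one control head could take a region signal alongside perception features:

```python
# A minimal sketch (not Tesla's architecture) of a control network conditioned
# on a coarse region id. All sizes here are made-up illustrative values.
import torch
import torch.nn as nn

class RegionConditionedControlHead(nn.Module):
    def __init__(self, perception_dim=256, num_regions=4, region_dim=16, hidden=512):
        super().__init__()
        # learned embedding for a coarse region id (e.g. "drives on left" vs "right")
        self.region_emb = nn.Embedding(num_regions, region_dim)
        self.mlp = nn.Sequential(
            nn.Linear(perception_dim + region_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),  # e.g. steer, accelerate, brake targets
        )

    def forward(self, perception_features, region_id):
        # concatenate perception features with the region embedding so the same
        # weights can learn region-specific control behaviour
        r = self.region_emb(region_id)
        return self.mlp(torch.cat([perception_features, r], dim=-1))

# usage: a batch of 8 feature vectors, all tagged with region id 1
head = RegionConditionedControlHead()
feats = torch.randn(8, 256)
regions = torch.full((8,), 1, dtype=torch.long)
print(head(feats, regions).shape)  # torch.Size([8, 3])
```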
 
I'd be happy if my car could learn to reliably detect that it's raining, like the $10 rain sensors that other carmakers have used for decades.
Another bad Elon decision. The cameras can't do it because the focus is wrong: they're focused at hundreds of meters instead of 5 millimeters.

You as a human have an eyeball 50 cm away from the windshield and can see that there is rain on it. If your eyeball were on the glass it would be hard.

None of the car cameras can; at best they detect slight smudging of the background image (and it's even less distinguishable at night, where glare and rain look similar). For example: with a 35mm camera, put some water drops on a telephoto lens and look through the viewfinder. Can you see the water drops from inside the viewfinder? Not very well at all.

And that's also why dirt, smudges, and a wiper leaving behind a film make the system go nuts: they give the same signal as rain.

The only solution is to use a ****ing rain sensor.
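To put rough numbers on the focus argument (the focal length, f-number, lens-to-glass distance, and pixel pitch below are assumed typical values, not Tesla's actual camera specs): for a thin lens focused at infinity, a droplet at distance s from the lens blurs into a circle of roughly f^2/(N*s) on the sensor.

```python
# Back-of-the-envelope blur estimate for a droplet near a lens that is
# focused at infinity. All numbers (focal length, f-number, lens-to-glass
# distance, pixel pitch) are assumed typical values, not Tesla specs.
focal_length_mm = 5.4      # assumed short automotive-style lens
f_number = 2.0             # assumed aperture
drop_distance_mm = 50.0    # assumed lens-to-windshield distance (~5 cm)
pixel_pitch_mm = 0.003     # assumed 3 µm pixels

# For an object at distance s with the lens focused at infinity, the blur
# circle diameter on the sensor is approximately f^2 / (N * s).
blur_mm = focal_length_mm**2 / (f_number * drop_distance_mm)
print(f"blur circle ≈ {blur_mm:.2f} mm ≈ {blur_mm / pixel_pitch_mm:.0f} pixels")
# blur circle ≈ 0.29 mm ≈ 97 pixels: the droplet is smeared across roughly a
# hundred pixels, which is why it only shows up as a faint smudge on the
# background rather than a crisp drop.
```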
 
I suspect that if the outward-facing camera can't do it well then the cabin-facing camera can. It just needs more training to get better.

They need to spend more time working on the wipers but it hasn't been a high priority.
 
This is a hyperbolic statement that means absolutely nothing. Literally all AV NNs are trained using video.

The CVPR23 best paper from June was E2E autonomous driving using just cameras. The world didn't change then; cameras didn't have a paradigm shift.

Planning-oriented Autonomous Driving using E2E NN.


 
These are both really simple and straightforward. Do you really not understand them?
It's technobabble designed to easily impress those who aren't the least bit technologically savvy.
Tesla has opted for a low-power, low-profile compute board so that it doesn't take trunk space or spend an appreciable part of an EV's energy. I know you prefer companies that fill the entire trunk with computers and don't care about how much power they draw, but that doesn't invalidate the usefulness of working within constraints.
You are talking about development platforms; you are not worried about form factor in the early stages of development. Tesla is selling "end products," so they can't afford to put development platforms in consumer cars. Look at everyone selling consumer cars: they have just as much if not more compute power in smaller packages.
And let me break down the second part for you. Sub-human = shorter time interval than perceived by people. Photon-to-control = camera inputs in, driving outputs out.
It's technobabble. Everyone is doing sub-human input-to-control. Why is that something to even talk about? It takes ~200 milliseconds for our brain to turn input into actionable information. That is very slow in computer terms.
The only explanation for you I can come up with comes from Upton Sinclair: "It is difficult to get a man to understand something, when his salary depends on his not understanding it."
It's all nonsense (not something even worth discussing); you don't see anyone in this field on Twitter talking about "sub-human photon-to-control, 36 fps, 100 W". Everyone understands that you have to do the calculations very fast at a reasonable compute cost in order to build a safe consumer AV.
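For what it's worth, the arithmetic behind "sub-human photon-to-control" is unremarkable, which is the point above. A rough budget (only the 36 fps and ~200 ms figures come from the posts; the inference and actuation allowances are assumptions for illustration):

```python
# Rough latency arithmetic behind the "sub-human photon-to-control" claim.
# The 36 fps and ~200 ms figures come from the posts above; the inference
# and actuation allowances are assumed numbers for illustration only.
camera_fps = 36
frame_interval_ms = 1000 / camera_fps   # ~27.8 ms between frames
assumed_inference_ms = 30               # assumed per-frame network time
assumed_actuation_ms = 20               # assumed command/actuator delay

worst_case_ms = frame_interval_ms + assumed_inference_ms + assumed_actuation_ms
human_reaction_ms = 200                 # rough human perception-to-action time

print(f"worst-case photon-to-control ≈ {worst_case_ms:.0f} ms "
      f"vs ~{human_reaction_ms} ms for a human")
# ≈ 78 ms vs ~200 ms: "sub-human latency" is table stakes, which is why the
# poster above finds it unremarkable.
```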
 
Perception isn't the issue, but control: you need to output different control signals and have different control algorithms in different regions
Not sure if you were trying to quote someone else as I wasn't talking about adding layers to perception. I was indeed talking about the new control network and how adding layers could help train a single network that works across multiple regions given appropriate inputs and training targets.

Is your concern that combining it into one would be inefficient, such as confusing whether to drive on the right side or the left side of the road?

On the flip side, regions with more roundabouts probably help provide training data for the relatively few in the US, and the network would probably control fine for most roundabouts. Problematic ones can then get additional training data, which could happen to be region-specific, with the neural network learning to rely on the location/region input signals if necessary to affect its control predictions.
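As an aside, the "steer behaviour by adding data" idea can be sketched as nothing more than weighted sampling of training clips. The clip metadata fields here are hypothetical, not Tesla's actual schema:

```python
# Sketch of steering behaviour by adding data: oversample clips of the
# scenario you want to fix. The metadata fields are hypothetical examples.
import random

clips = [
    {"id": 1, "scenario": "roundabout", "region": "UK"},
    {"id": 2, "scenario": "roundabout", "region": "US"},
    {"id": 3, "scenario": "highway", "region": "US"},
    {"id": 4, "scenario": "highway", "region": "DE"},
]

# weight the problematic scenario more heavily so the network sees it more often
weights = [5.0 if c["scenario"] == "roundabout" else 1.0 for c in clips]

batch = random.choices(clips, weights=weights, k=8)
print([c["id"] for c in batch])  # roundabout clips dominate the sampled batch
```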
 
I’m skeptical about what 99% end-to-end means. I think some of the planning is now done by NN instead of code
Are you skeptical that Musk and Elluswamy demoed something still mostly controlled by traditional code instead of neural networks? Or are you skeptical that when they're actually able to release something, a lot of traditional code will need to be added back?

What do you think about the repeated comments from the live stream:

There's no line of code that says slow down for speed bumps
There is no line of code that says give clearance to bicyclists
There is no line of code that says stop at a stop sign, wait for another car, who came first, wait X number of seconds
We have never programmed in the concept of a roundabout
We just showed a whole bunch of videos of roundabouts
The mind-blowing thing is that there are no there there's no heuristics
It doesn't know what a scooter is
It doesn't know what paddles are
There is no line of code that says this is a roundabout
There is not nothing that says wait X number of seconds
Just because there's no lines of code doesn't mean that it's uncontrollable
It's still quite controllable on what you want by just adding data now
We've never programmed in the notion of a turn lane or even a lane
There's no line of code about traffic lanes at all
Based on the video that's received that at the end of the destination you pull over to the side and park

With the various examples of not having code, it would seem they're quite aware of all the intricate control logic that has existed up through 11.x, including the special handling of crosswalk paddles placed in the middle of the road to remind people that state law requires stopping/yielding for pedestrians within the crosswalk.
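Purely as an illustration of the contrast those quotes are drawing (the function names and state fields below are hypothetical, not Tesla code): the old approach encodes behaviour in explicit rules, while an end-to-end approach buries it in trained weights.

```python
# Hypothetical contrast between a hand-written heuristic (the "lines of code"
# Musk says no longer exist) and an end-to-end policy call. Names and fields
# are made up for illustration; this is not Tesla code.

# old-style heuristic control:
def heuristic_stop_sign(state):
    if state["at_stop_sign"]:
        if state["stopped_duration_s"] < 2.0:      # hard-coded "wait X seconds"
            return {"throttle": 0.0, "brake": 1.0}
        if state["other_car_arrived_first"]:       # hard-coded right-of-way rule
            return {"throttle": 0.0, "brake": 1.0}
    return {"throttle": 0.3, "brake": 0.0}

# end-to-end style: one learned mapping from camera frames to controls, with
# the stop-sign behaviour implicit in the trained weights rather than in rules.
def end_to_end_policy(camera_frames, model):
    return model(camera_frames)   # e.g. returns steering/throttle/brake directly
```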
 
How would you ever safety-validate an AI system that is basically "we don't know how, but it works like magic"? Let's say 99 times out of 100 it stopped for a red light, but on one occasion it didn't and crashed into cars. What made it do that? What if the answer is "nobody knows, we just need to train it some more"? You'd have to have a remarkable safety validation system if your device runs on "magic" and not code; simply driving around for a while and seeing what happens is not adequate validation, even if it was a lot of driving around. You need to be really, really sure you've presented every possible usage case. Unless you're just going to fall back on "this is Level 2, so if it screws up it's the driver's fault," which is not an acceptable answer for life-critical software.
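To put rough numbers on why "drive around and see what happens" doesn't scale as validation, here is the standard rule-of-three arithmetic (the target failure rates are illustrative, not anyone's actual requirement):

```python
# Rule-of-three style arithmetic: with zero observed failures in n independent
# trials, the 95% upper bound on the per-trial failure rate is about 3/n.
# The target rates below are illustrative examples only.
import math

def trials_needed(target_failure_rate, confidence=0.95):
    # smallest n with (1 - p)^n <= 1 - confidence, i.e. n failure-free trials
    # rule out failure rates above p at the given confidence level
    return math.ceil(math.log(1 - confidence) / math.log(1 - target_failure_rate))

for rate in (1e-2, 1e-4, 1e-6):   # e.g. failures per red-light encounter
    print(f"to bound the failure rate below {rate:g}: "
          f"~{trials_needed(rate):,} failure-free encounters needed")
# roughly 300, 30,000, and 3,000,000 encounters: demonstrating rare,
# safety-critical failure rates by observation alone gets expensive fast.
```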

I don't believe there are no lines of code in the example of V12 that Elon claimed was running; personally, I think Elon is just saying that. It was driving basically exactly like V11, and I'd be surprised if a pure AI V12 would be exactly the same as V11. I think he's being aspirational again; maybe there's a bit of AI on top of mostly V11 code.