
Is "all neural networks" really a good idea?

I can see why you got confused by what Elon said. What he said is very confusing and easy to misinterpret.
What he said is very clear. The misinterpretation is on you.
They are not throwing out all of their existing work and they are not doing something that is utterly impossible going from V11 to V12. One needs to be really careful in interpreting what Elon says. In this case the full quote is:

v12 is reserved for when FSD is end-to-end AI, from images in to steering, brakes & acceleration out.
I highlighted the crucial part: end-to-end AI. The only possible thing this can mean is that it will be all NNs, because they have finally converted the last pieces of heuristic code over to a NN.
The crucial part is
v12 is reserved for when FSD is end-to-end AI, from images in to steering, brakes & acceleration out.
That is what e2e means. Images in, control out.

[image: end-to-end diagram, camera images in → control outputs out]


You can speculate on what he meant or that he misspoke but the language he used has a specific meaning and is not easy to misinterpret.
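For readers less familiar with the term, here is a minimal Python sketch of the distinction being argued over. Every name is invented for illustration; none of this corresponds to Tesla's actual code:

```python
# Hypothetical sketch of the two architectures under debate.

def perception_net(images):           # stand-in for a perception NN
    return {"lead_car_dist_m": 30.0}

def heuristic_planner(objects):       # stand-in for hand-written rules
    return "slow" if objects["lead_car_dist_m"] < 40 else "cruise"

def controller(plan):                 # stand-in for a control module
    return {"steer": 0.0, "accel": -0.5 if plan == "slow" else 0.2}

def modular_stack(images):
    # V11-style: NN perception feeding hand-written planning/control
    return controller(heuristic_planner(perception_net(images)))

def driving_net(images):              # stand-in for one learned mapping
    return {"steer": 0.0, "accel": 0.2}

def end_to_end(images):
    # "Images in, control out": a single trained mapping, no heuristics
    return driving_net(images)
```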
 
We will see. When V12 is released and we lose on-screen visualization, the occupancy network, and all the other great stuff they've been working on for years, then I will humbly apologize. If those things remain, then clearly V12 will be using multiple NNs tied together, not one big NN as per your image.
 
We will see.
That Elon Musk is often wrong, exaggerating, or misrepresenting things is not up for debate. But what he said is not open to misinterpretation; it has a specific meaning. He said v12 is reserved for when FSD is end-to-end AI, from images in to steering, brakes & acceleration out. Could he be entirely wrong again, lying, or misrepresenting things? Yes.
If those things remain, then clearly V12 will be using multiple NNs tied together, not one big NN as per your image.
What that tells us is that Elon Musk is wrong again.
 
This is one of the best interviews I've seen James Douma do in quite a while. Kudos to Farzad.


@1:21:33:
So version 12, what Elon was talking about, that happens to be this milestone where the neural network finally makes it all the way through. Like right now they still have layers of heuristics that the path of information from the cameras to where it gets to the steering wheel has to go through some layers that are just purely heuristic, or purely rules. So at some point it'll be able to get all the way there just going through neural networks [...]
@1:23:11:
The way the system is built right now it's actually a lot of neural networks ... I mean we tend to think of it as one. Neural networks are this funny thing, they're kind of like Lego blocks. I mean on some scale they're individual Lego blocks, but on another scale it's like a Lego thing.
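Douma's "Lego blocks" point can be made concrete with a minimal PyTorch sketch. The shapes and module choices below are arbitrary placeholders, not Tesla's design: separately built networks, once composed, are just one larger network.

```python
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                           nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1),
                           nn.Flatten())          # image -> features
planner    = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
controller = nn.Linear(32, 3)                     # steer, brake, accel

# Individually they are "blocks"; composed, they are one network,
# trainable end-to-end because gradients flow through the whole thing.
full_stack = nn.Sequential(perception, planner, controller)

frames = torch.randn(1, 3, 64, 64)                # dummy camera input
controls = full_stack(frames)                     # images in, controls out
```

On this reading, "end-to-end" and "lots of networks" are not mutually exclusive: the blocks stay individually designed, but the composite can be trained as a single function.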
 
Again, that whole video analysis relies on a belief that everything Elon is saying is both true and 100% technically accurate. But Elon has shown time and time again that he is not always truthful and/or doesn't 100% know what is actually going on "under the hood." So, like everything Elon, take it with a grain of salt.

That said, here's a thought experiment: if the sensor suite is now limited to vision only - "Tesla Vision" - because that is how humans drive, but the system is now end-to-end NNs trained from video of humans driving, with no sign reading or rules programmed into the system - absolutely NOT the way humans learn to drive or actually drive - then how is it that we think the system can be an order of magnitude safer than humans? Isn't it the case that the most we can hope for here is that the system is 100% as safe as the human drivers in the selected videos?
 
I think the idea is that the AV does not get distracted, tired, or impaired like human drivers can. The E2E vision system is always attentive, just doing its supercomputing thing all the time to drive. It will always drive at 100% capability, unlike humans, who are not always at 100% when they are driving. But of course that is true for all AVs; it is not unique to the E2E vision approach.

The other possibility is that if the E2E system is trained on the best human driver examples, then it will drive better than the average human driver. If the best human drivers are 2-3x better than the average human, then the system could work out to be 2-3x better than the average human. And maybe just being 2-3x better than the average will be good enough for regulators. But I was always under the impression that we wanted AVs to be better than the top human drivers, not just better than the average. If you train on the best human drivers, then I don't see how the system will be better than the best driver. It can't be better than its training.
 
Isn’t it the case that the most we can hope for here is that the system is 100% as safe as the human drivers in the selected videos?
At all times, which makes it considerably safer than human drivers taken as a whole. Watch the dash cam videos available on YouTube: the vast majority of the crashes are the result of a lack of situational awareness (texting, not checking blind spots), ignorance of traffic rules (assuming right of way), or physical impairment (drunk, fatigued, medical event). So the driver who is paying attention, knows the traffic rules and isn't drunk or tired is very rarely going to have problems. A vision-based autonomy system can do that.
 
Isn't it the case that the most we can hope for here is that the system is 100% as safe as the human drivers in the selected videos?
Perhaps the AV system won't make the 'human errors' that come from being distracted by non-driving things, driving while intoxicated, or simply ignoring traffic regulations. That would go a long way toward being safer than the average human.
 
So I for one am not excited about the pending v12 release with full-stack NNs. There's going to be a ton of regression here and lots of opportunities for the system to veer away (no pun intended) from being a polished L3 autonomous driving system. Hopefully, this isn't more of Elon's "goal" of an L5 robotaxi, which I think everybody who drives these knows (if only deep down inside) is never going to happen. I think it's time for Tesla to start thinking about picking a realistic goal and then making moves using everything available to take this product over the finish line and call it done. It can't be a work-in-progress forever, right?

All nets can work, but only if lots of rules go into generating the simulated data and into filtering/scoring the real data as "yes, drive like that" / "no, that's a bad idea", with the nets "distilling" those rules. The offline optimization/rules can be too computationally intensive to run on-board but are fine in simulation back in the lab. Humans don't always obey rules, but they do most of the time, and an NN planner trained with heavy supervision is most likely to achieve that.
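A hypothetical sketch of the kind of offline curation described above: expensive rule-based checks score recorded clips in the lab, and only clips that pass become "drive like this" training examples. The rule names and thresholds here are invented for illustration.

```python
def score_clip(clip):
    penalties = 0.0
    if clip["ran_red_light"]:
        penalties += 100.0                 # hard disqualifier
    if clip["max_decel_g"] > 0.5:
        penalties += 10.0                  # harsh braking
    penalties += 5.0 * clip["lane_departures"]
    return -penalties                      # higher is better

def curate(clips, threshold=-10.0):
    # Keep good demonstrations; the net later "distills" these rules
    # implicitly by imitating only the examples that passed them.
    return [c for c in clips if score_clip(c) >= threshold]

clips = [{"ran_red_light": False, "max_decel_g": 0.3, "lane_departures": 0},
         {"ran_red_light": True,  "max_decel_g": 0.2, "lane_departures": 1}]
print(curate(clips))   # only the first clip survives
```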

It is true that Elon has an attraction to the New Cool Shiny in the autonomous driving business rather than the difficult work of Refining The Edges Off Something We Have Working. Also, he is too cheap to hire a team of the size and strength that's needed, since you have to have both Refining The Current Solution and Next Generation Research going on simultaneously. The rate of improvement in deployed systems (regular Autopilot and even FSD) has slowed substantially.

Any new architecture is going to set back refinement and end-user palatability significantly, though you may think it has a higher upper bound eventually, which is probably true.

Like, when are we going to get the "single stack merge"? If FSD is now moving Yet Again to another fundamental stack in V12, then the date for getting regular AP onto that perception/planning base is even further out.

The biggest problem with "all nets planning" is that it will be unpredictable to human drivers; we won't know when it will be good or screw up very badly. By contrast, conventional L2+ driver assist (what other automakers deploy) is less ambitious but much more predictable to people.
 
And maybe just being 2-3x better than the average will be good enough for regulators. But I was always under the impression that we wanted AVs to be better than the top human drivers, not just better than the average. If you train on the best human drivers, then I don't see how the system will be better than the best driver. It can't be better than its training.

It could work. Assume there are no perfect human drivers, and assume the top human drivers don't all have the same failings. Then the specific failing of one subset of top drivers would be outvoted by the more numerous examples from the remaining top drivers without that failing. The same thing would happen for each specific failing. The result could be a learned driving pattern with no failings. Basically, you sand off the bad spots.
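A toy numpy illustration of this "sand off the bad spots" argument: give each expert driver one idiosyncratic bias, and the pooled training signal still averages out close to ideal. The numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
ideal_steering = 0.0                       # the "perfect" label
biases = rng.normal(0.0, 0.2, size=100)    # each expert's personal failing
expert_labels = ideal_steering + biases

# A net trained with squared error on pooled demonstrations converges
# toward the mean label, which sits near ideal even though no single
# expert is ideal:
print(abs(expert_labels.mean()))           # close to 0, vs ~0.2 per expert
```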
 
It's interesting, right, because DeepMind created AlphaGo by giving it knowledge of many human games, and AlphaGo ended up able to defeat the world champion. Since then, AlphaZero has thrown out the human games altogether and simply learns from self-play.
 
So the driver who is paying attention, knows the traffic rules and isn't drunk or tired is very rarely going to have problems. A vision-based autonomy system can do that.
Except the vision-based autonomy doesn't know the rules - that's my point. There are no rules; the system is just trained to drive the way humans have driven in similar situations. This seems like an approach problem. The autonomous driving system has the opportunity to know A LOT MORE than the human driver. It can be taught all the rules, e.g., the difference between a solid white line and a dashed white line - something I would be willing to bet 80% of the drivers on this forum don't know. It can have sensor input way beyond that of humans - radar, ultrasonic, integrated mapping data, real-time traffic updates and construction info, vision in spectrums far wider than human vision, etc. We can never expect all of that from human drivers. Why give all that up?
 
So even if it's an end-to-end NN, it's still a "program," just using a different "language."

The language is chunks of training data that must be curated and organized somehow.

If some locality decided to make right turn on red illegal on Jan 1, after 100 years of it being legal, someone has to remove all training video data of right turns on red and replace it with no right turns on red training data, test, and ship it. Edit: even worse, the car would need to know when to reference different sets of training data in different localities. I'm not sure how that is accomplished with strictly NNs. If I drive across the USA, and different states have different laws, how do they manage that? Perhaps code? I don't know enough about how NNs work to say. In the NN in my head, I had trouble remembering not to pump my own gas in Oregon, for example. So do NNs have an executive function that has to remember stuff like this, the way I do? Interesting.
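One hedged answer to the question above (a standard ML pattern, not anything Tesla has confirmed): instead of separate models or separate training sets per region, feed the local rules in as extra inputs the network is conditioned on. All names below are invented for illustration.

```python
def build_policy_input(camera_features, locality):
    rule_flags = [
        float(locality.get("right_on_red_allowed", True)),
        float(locality.get("drives_on_left", False)),
        locality.get("speed_limit_kph", 50) / 100.0,
    ]
    # One model, many behaviors: the net can learn "if
    # right_on_red_allowed is 0, don't turn", provided the training
    # data covers both settings of each flag.
    return camera_features + rule_flags

x = build_policy_input([0.1, 0.9], {"right_on_red_allowed": False})
print(x)   # [0.1, 0.9, 0.0, 0.0, 0.5]
```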

It's obviously more than a different language in the old way of thinking about it, strictly as C++ or Python or any logical code. It can't be ported to another platform the way legacy code can (yet).

But it's still programming a computer, just with different methods.

Maybe it will be better, maybe not. I suspect there will still be issues with legal guardrails on behavior, and it will still fail to solve long tail and black swan situations, but I'll be happy to watch it from a distance and see what happens.
 
Except the vision-based autonomy doesn't know the rules - that's my point.
It doesn't need to know the rules. It only needs to follow them. That's what they'll get by using the right training data. The advantage of the neural net system over a heuristic system is that the neural net system handles ambiguities better - that is, places where there are no written rules. The neural net system will just have examples of how people deal with those situations and will handle them the same way. So neural networks cover both written and unwritten rules of the road.
We can never expect all of that from human drivers. Why give all that up?
They're not giving up on anything. They're starting from scratch, so they're using the data that they have, which is vision data. There's no reason that the system cannot be trained from LiDAR, RADAR and/or ultrasonic data in combination with vision data. Where will they get that data? It may be simulated based on the vision data, or it may come from new cars that are equipped with the new sensors. The latter is slower because they'll have to collect data for a while, but they're certainly not giving up on anything.
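A minimal PyTorch sketch of the multi-sensor option described above: nothing about end-to-end training forbids extra modalities; each sensor gets its own encoder and the features are fused. Dimensions and modules here are arbitrary placeholders, not Tesla's design.

```python
import torch
import torch.nn as nn

vision_enc = nn.Linear(512, 64)      # stand-in for a camera encoder
radar_enc  = nn.Linear(128, 32)      # stand-in for a radar encoder
head       = nn.Linear(64 + 32, 3)   # steer, brake, accel

cam   = torch.randn(1, 512)          # dummy camera features
radar = torch.randn(1, 128)          # dummy radar returns
controls = head(torch.cat([vision_enc(cam), radar_enc(radar)], dim=1))
```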
 
If some locality decided to make right turn on red illegal on Jan 1, after 100 years of it being legal, someone has to remove all training video data of right turns on red and replace it with no right turns on red training data, test, and ship it. Edit: even worse, the car would need to know when to reference different sets of training data in different localities.
You make many excellent points. I am very skeptical of some of the things Elon said during the V12 video. I am also skeptical of the extrapolations many of the usual suspects on YouTube have made based on what Elon said. After 7 years of horribly wrong predictions about FSD, it feels like I'm watching Elon (Lucy) hold the football to be kicked by the usual suspects (Charlie Brown). Year after year they try to kick the football by making predictions and extrapolations based on a known faulty source of FSD information and year after year they fall flat on their backs as the football is pulled away.

Certainly the cars will have to know the traffic rules for different localities. For example, whether to drive on the left or right side of the road. Likewise, the cars need to have internal maps in order to navigate. I find it hard to believe that Tesla would hobble their FSD development by creating hundreds or thousands of different models based on the rules of the road in different regions and then by limiting each model to only use data from that region. This may have been your point.

One key question is something @diplomat33 and I have already gone around on in this thread: is Tesla throwing out most of their previous work and starting over with a new V12 model they have to train from scratch, or is V12 an incremental improvement where many of the NNs from V11 are still being used but 300K+ lines of code have been replaced by additional NNs?

I admit Elon implied they started over from scratch in the V12 video, just like he did previously when saying V12 will be end to end. But if the only evidence we have is these possibly ambiguous statements by Elon, then I still don't believe it. For example, it doesn't make a lot of sense for Ashok to talk about a traffic light regression if this is a brand new system.

I don't believe Tesla threw out their much ballyhooed occupancy network. And we could see they didn't throw out the on-screen visualization and identification. Clearly there is either tremendous redundancy in V12 or they are using connected NNs, not one big NN.

I don't believe they dumped their previous years of work in the trash bin and replaced it all with a new NN for V12 that needed to be trained all over again (in addition to replacing hundreds of thousands of lines of code). If V12 is a complete redo and lacks significant internal structure, then the barrier to entry is much lower than we thought. If it can now be trained with YouTube videos, the barrier is even lower. It would be astounding to me if they were able to replicate most of the behavior of V11 and the on-screen visualization in such a short period of time after starting over from scratch.

I think the barrier to entry is still high. I think Tesla built upon their previous work without trashing it. And I think Elon once again misled us with what he said about FSD. Year after year Charlie Brown tries to kick the football. Year after year he falls on his back. He never learns.
 
That raises a somewhat unrelated question: if they're starting from scratch, then what's the point of 11.4.7?
I get lots of software updates even though companies might be working on major releases. There are about half a million users of 11.x, and 12.x is likely still a long way off. Probably a good idea not to abandon them yet.

Besides, the E2E NN approach might fail in the end.