
Is an all-neural-network approach really a good idea?

Everyone uses machine learning in perception. How much machine learning is there in the driving policy? That's the critical question.

Well, we know Waymo is using at least some ML in their driving policy. Back in September, Waymo mentioned that they had switched to their next-gen ML planner.


But yes, the critical question is how much of the driving policy is ML. We don't know the exact amount.

My point is that the debate is not 100% ML planner versus 0% ML planner. It's more like 100% ML planner versus 80% ML planner.
 
  • Like
Reactions: spacecoin
Define "traditional approach". Nobody uses all heuristics for autonomous driving. Everyone uses a lot of NN in all parts of their stack with very little heuristics, it is just a matter of the structure, modular vs E2E.
Traditional approach meaning modular, with each distinct function having its own NN or just regular code. E2E instead takes video input and spits out actions. E2E on rails would be another function doing a sanity check on the E2E (to detect and prevent violations of local laws). But it's easy to see that such a function may bloat the code back to the same size as, or worse than, the modular approach.
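
To make that concrete, here is a minimal sketch of what an "on rails" rule layer might look like; the Action type, the rules and the thresholds are invented for illustration, not anyone's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """Hypothetical E2E network output: target speed (m/s) and steering angle (rad)."""
    target_speed: float
    steering: float

def sanity_check(action: Action, speed_limit: float, red_light_ahead: bool) -> Action:
    """Rule layer that clamps or vetoes the E2E proposal.

    Every additional local law or safety rule becomes another clause here,
    which is how the rule layer can creep back toward modular-stack size.
    """
    checked = Action(action.target_speed, action.steering)
    # Never exceed the posted limit.
    checked.target_speed = min(checked.target_speed, speed_limit)
    # Do not proceed through a red light.
    if red_light_ahead:
        checked.target_speed = 0.0
    return checked

# Example: the E2E net proposes 20 m/s in a 15 m/s zone with a red light ahead.
print(sanity_check(Action(20.0, 0.05), speed_limit=15.0, red_light_ahead=True))
```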
 

Thanks. I know Tesla pushes this false narrative that other companies don't use machine learning. So I wanted clarification.

I am not sure why adding some code to check E2E would automatically make it worse than modular. Modular can be 90% or even 100% ML. Modular does not mean a lot of code.
 
How does the code to check E2E work? What are the inputs? What is it checking? If the checker is more reliable than what it is checking, why not just use the checker? If it is less reliable, isn't it making the system worse?
 
  • Like
Reactions: QUBO

Well, one idea is to have code that creates the FSD visualizations you see on the screen in your Tesla. So it takes outputs from the NN and generates the on-screen graphics of lanes, objects, etc. The developers or the person in the car could use those visualizations to get a sense of what the E2E is doing. Since the code just generates graphics and performs no driving tasks, you are not going to use the code instead of the NN. And it is not making the system worse; it is just giving you a graphical representation of what the system is doing.
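
As a rough illustration (the detection format and the text rendering are invented stand-ins, not Tesla's actual renderer), such a read-only visualization layer might look like:

```python
# Illustrative only: turn hypothetical NN perception outputs into a debug
# overlay. Nothing flows back into control; the layer only reads outputs.
detections = [
    {"kind": "lane",       "points": [(0, 0), (0, 50)]},
    {"kind": "vehicle",    "box": (2.0, 12.0, 1.8, 4.5)},
    {"kind": "pedestrian", "box": (-3.0, 8.0, 0.6, 0.6)},
]

def render(dets):
    """Print a crude text overlay; a real UI would draw polygons on screen."""
    for d in dets:
        if d["kind"] == "lane":
            print(f"lane line through {d['points']}")
        else:
            x, y, w, length = d["box"]
            print(f"{d['kind']} at ({x:+.1f}, {y:+.1f}) m, {w} x {length} m")

render(detections)
```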
 
  • Like
Reactions: QUBO
Mobileye's CEO and CTO, Amnon Shashua and Shai Shalev-Shwartz, just wrote a blog post arguing against an end-to-end approach to full self-driving. They argue that E2E is neither sufficient nor necessary, and that it is lacking in transparency, controllability and performance.
I see Amnon is going off the rails again.

Does anyone here actually train deep learning NNs? I do.

Guess what's useful when going end-to-end for training? Having already built a suite of algorithms and modular solutions that give you visualizations of path planning, VRUs, etc...

Did you know you can have secondary outputs / backprop for your end-to-end network? You can make it also learn to try to generate all these visual outputs. You can do this because you generate the labels for it from your V11 solution.
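
For anyone unfamiliar with the idea, here is a minimal PyTorch-style sketch of auxiliary heads on a shared trunk; the architecture, dimensions and loss weights are invented purely for illustration:

```python
import torch
import torch.nn as nn

class E2EWithAuxHeads(nn.Module):
    """Toy end-to-end net: a shared trunk, a control head, plus auxiliary
    heads whose labels could come from an existing modular (V11-style) stack."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(512, feat_dim), nn.ReLU())
        self.control_head = nn.Linear(feat_dim, 2)   # e.g. steer, accel
        self.lane_head = nn.Linear(feat_dim, 16)     # auxiliary: lane geometry
        self.object_head = nn.Linear(feat_dim, 32)   # auxiliary: object grid

    def forward(self, x):
        f = self.trunk(x)
        return self.control_head(f), self.lane_head(f), self.object_head(f)

model = E2EWithAuxHeads()
x = torch.randn(8, 512)                     # stand-in for encoded video features
ctrl, lanes, objs = model(x)

# Auxiliary labels that the older modular stack would generate
# (random stand-ins here).
ctrl_y, lane_y, obj_y = torch.randn(8, 2), torch.randn(8, 16), torch.randn(8, 32)

mse = nn.MSELoss()
# The auxiliary losses shape the shared features; the weights are arbitrary.
loss = mse(ctrl, ctrl_y) + 0.1 * mse(lanes, lane_y) + 0.1 * mse(objs, obj_y)
loss.backward()
```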

Not only that, but Tesla is probably architecting their V12 as some combination of all the previous modular pieces, so those pieces, with initial weights, will converge easily to giving the desired outputs.

Not only that, but do you know what's the best thing for giving confidence about how a system will perform?

Orders of magnitude more test data. Set aside the competency of Tesla's system; in terms of pure statistical confidence, the best thing you can do is show performance on a massive number of diverse real-world cases. Nothing instills more confidence than that.

Of course, that is something that Waymo / Cruise / Mobileye cannot currently do, as they simply have neither the magnitude nor the diversity.
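
To put a number on that: a standard zero-failure reliability bound (the "rule of three") says you need roughly three times the target MTBF in failure-free operating hours. A minimal sketch; the 10,000-hour target below is just an example figure:

```python
import math

def mtbf_lower_bound(hours_observed: float, conf: float = 0.95) -> float:
    """Lower confidence bound on MTBF after failure-free operation.

    For zero observed failures, an exact Poisson argument gives
    MTBF >= hours_observed / -ln(1 - conf), i.e. about hours / 3 at 95%.
    """
    return hours_observed / -math.log(1.0 - conf)

# To support a ~10,000-hour MTBF claim at 95% confidence, you need roughly
# 30,000 failure-free hours; hence the value of a large, diverse fleet.
print(mtbf_lower_bound(30_000))  # ~10,014 hours
```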
 
  • Disagree
Reactions: diplomat33

Amnon is not off the rails. He is one of the world's leaders in AI and ML. I think he knows what he is talking about.

And Mobileye has over 400 petabytes of real-world driving data. So yes, Mobileye has enough data, in both magnitude and diversity, to have confidence in their system. In fact, that is why Shashua says they don't need E2E: they already have enough real-world data to validate confidence in their system.
 
  • Funny
Reactions: ZeApelido

Oh really? Mobileye has how many cars fully sensored (so they can post-hoc run model inference)? I mean fully sensored, not just 1 camera for highway L2.

And how many diverse geographies are said fully sensored systems active in? In order to instill statistical confidence, you'll need all that data coming from across basically most streets in the U.S. (or Europe, etc...).
 
  • Funny
Reactions: diplomat33

Mobileye has hundreds of thousands of fully sensored Zeekr cars in Asia, doing real-world validation now.

Additionally, Mobileye has a simulator that reconstructs the entire scene just from REM maps. And they don't need fully sensored cars to build the REM maps. And they have REM maps of every road in the EU, US, Asia, etc. So they don't need fully sensored cars to build this simulator and validate in simulation.

 
  • Like
Reactions: QUBO

No, recreating the static background does not count.

This is a well-known phenomenon in data science / machine learning - you never rely on simulated / augmented data for testing / validation. Sure, you can test on it in addition to real-world data, but it does not replace real-world data.

Ergo, you need that data collected from real-world cases to prove anything. Sure, Mobileye can leverage REM maps when testing on real-world data in each city, but they physically need to be in those cities.
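
A minimal sketch of the evaluation hygiene that implies, with stand-in records in place of real drive logs: simulated or augmented data may go into training, but the validation set stays strictly real-world.

```python
import random

# Stand-in records; in practice these would be logged drives and sim runs.
real = [{"source": "real", "id": i} for i in range(1000)]
sim  = [{"source": "sim",  "id": i} for i in range(5000)]

random.seed(0)
random.shuffle(real)

# Hold out real-world data for validation *before* mixing in simulation.
val   = real[:200]         # validation: real-world only
train = real[200:] + sim   # training: real + simulated is fine

assert all(r["source"] == "real" for r in val), "never validate on sim"
print(len(train), "training samples,", len(val), "real-only validation samples")
```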
 

We've had this debate before. Nobody is suggesting that simulation alone is sufficient. You always need real world testing. Mobileye does simulation testing first and then real world testing afterwards to validate. And Mobileye has stated that they have 400 petabytes of real world driving cases.
 
  • Like
Reactions: QUBO
This is a well-known phenomenon in data science / machine learning - you never rely on simulated / augmented data for testing / validation. Sure, you can test on it in addition to real-world data, but it does not replace real-world data.
Is it a well-studied phenomenon? I'm curious if there is a simulation quality threshold where simulation becomes useful. I'm trying to imagine a system that is trained entirely on high quality simulation, then refined by real world data. The assumption here is that the simulation allows for precise training control, but the real world data refines the final weights.
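
As a minimal sketch of that two-phase idea (PyTorch-style, with random stand-ins for both datasets; nothing here is any company's actual pipeline): pretrain on plentiful simulated data, then fine-tune on scarcer real-world data at a lower learning rate.

```python
import torch
import torch.nn as nn

def run_epochs(model, data, optimizer, epochs):
    """Plain supervised loop over (input, target-action) batches."""
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in data:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))

# Stand-ins for simulated and real batches (inputs, action targets).
sim_data  = [(torch.randn(32, 64), torch.randn(32, 2)) for _ in range(100)]
real_data = [(torch.randn(32, 64), torch.randn(32, 2)) for _ in range(10)]

# Phase 1: cheap, precisely controllable simulation finds reasonable weights.
run_epochs(model, sim_data, torch.optim.Adam(model.parameters(), lr=1e-3), epochs=5)
# Phase 2: scarcer real-world data refines the final weights at a lower rate.
run_epochs(model, real_data, torch.optim.Adam(model.parameters(), lr=1e-4), epochs=5)
```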
 
  • Like
Reactions: QUBO and diplomat33
We've had this debate before. Nobody is suggesting that simulation alone is sufficient. You always need real world testing. Mobileye does simulation testing first and then real world testing afterwards to validate. And Mobileye has stated that they have 400 petabytes of real world driving cases.

Again, you are ignoring what that real-world data consists of. All that matters for evaluation of a self-driving system that requires X cameras, Y lidars, and Z radars is data from cars equipped with X, Y, and Z. So Mobileye cannot evaluate their L4 solution on data from a car with one camera - they can only use data from cars that have multiple cameras with every viewing angle the model expects as input.

Which cars on the road are collecting this data? I think we both know it is a far, far smaller number. Most of that 400 petabytes is meaningless for evaluation of an L4 system.

Is it a well-studied phenomenon? I'm curious if there is a simulation quality threshold where simulation becomes useful. I'm trying to imagine a system that is trained entirely on high quality simulation, then refined by real world data. The assumption here is that the simulation allows for precise training control, but the real world data refines the final weights.

Yes. Specific to evaluation, you do not do it.
It's akin to "grading your own homework." Well known in data science, yet self-driving car companies like to pretend it's not.

Now for training, it's definitely useful. In extreme situations where the possibilities are extremely rigid and well known (e.g. a chess game), everything can be done with simulation. Thus reinforcement learning. The problem is when the real world deviates (and in any real-world problem, the real world always deviates from simulation).

Your idea for training makes sense. Use simulation with lots of repetition to find reasonable weights, then refine with real-world data.

The problem is: the more complicated the solution space and the higher the required accuracy, the more reliance on real-world data. You can't replicate quirky cases in simulation that you've never thought up before - ergo you can't have a NN learn a generalized solution.

What you want is lots of unique data; this is the only way to learn a generalized solution that integrates all those nearby cases. Yes, on top of that you want to simulate all sorts of variations of those cases - this helps ensure you don't overfit.

The problem with self-driving is that it's both very complex and requires very high accuracy.
 
  • Informative
Reactions: JB47394
Which cars on the road are collecting this data? I think we both know it is a far, far smaller number. Most of that 400 petabytes is meaningless for evaluation of an L4 system.

Well, right now we know of over 100K Zeekr cars with the full Supervision sensor set in China. As a result, Mobileye has validated Supervision in China, which is why NZP is going "wide" to all Zeekr owners. And as more carmakers deploy Supervision, the fleet of fully sensored cars will increase.

 
  • Like
Reactions: ZeApelido
Wayve's CEO responded to Mobileye's blog post against E2E.

I am not sure Alex is really addressing the points that Shashua made. Alex says that, with respect to performance and transparency, hard-coded is the wrong approach, but he never really addresses the argument that E2E has weaknesses in those areas. And he says that ML and AI have demonstrated some amazing capabilities, so he is betting on E2E.

But Shashua is not saying that ML and AI are bad or can't do amazing things; he is arguing that E2E in particular cannot achieve the high MTBF needed for safe unsupervised autonomy. And Shashua is not advocating for an all-hard-coded approach in AVs. He is simply advocating for a modular/hybrid approach that uses both ML when appropriate and code when appropriate in the stack.

I don't feel like Alex really addressed the argument that a modular ML approach in particular is better than E2E; he simply says that E2E is better than all code, but Shashua is not advocating for all code. So I feel like he uses a bit of a strawman.
 
  • Like
Reactions: DrChaos
I see Amnon is going off the rails again.

Does anyone here actually train deep learning NNs? I do.

Guess what's useful when going end-to-end for training? Having already built a suite of algorithms and modular solutions that give you visualizations of path planning, VRUs, etc...

Did you know you can have secondary outputs / backprop for your end-to-end network? You can make it also learn to try to generate all these visual outputs. You can do this because you generate the labels for it from your V11 solution.

Not only that, but Tesla is probably architecting their V12 as some combination of all the previous modular pieces, so those pieces, with initial weights, will converge easily to giving the desired outputs.

Not only that, but do you know what's the best thing for giving confidence about how a system will perform?

Orders of magnitude more test data. Set aside the competency of Tesla's system; in terms of pure statistical confidence, the best thing you can do is show performance on a massive number of diverse real-world cases. Nothing instills more confidence than that.
The question is: what is the "test data", and what is the loss function on it? Test data for driving policy is much less dense than test data for generative video---predicting video 2 seconds ahead has tons of self-supervision labels, no argument.

Driving-policy data is more important, riskier, and more difficult to acquire, as outside simulation you don't get many negative examples. And that's the issue: ensuring a safe policy in pure ML when you can't sample from the support of unsafe policies---entirely unlike AlphaGo/Zero.
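
A minimal behavior-cloning sketch (random stand-in data, invented dimensions) makes the point concrete: the loss only ever pulls the policy toward actions humans actually took, so states produced by unsafe policies simply never appear in training.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Logged human driving: states paired with the action the human took.
# By construction there are (almost) no examples of unsafe states, nor of
# what to do once you are already in one.
states  = torch.randn(256, 64)
actions = torch.randn(256, 2)

for _ in range(10):
    optim.zero_grad()
    loss_fn(policy(states), actions).backward()  # pure imitation signal
    optim.step()
```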

Which is why driving policy has so far been algorithmic/optimization-based, built on an object ontology derived from human-labeled supervision of intermediate concepts---the Karpathy solution.

I've put forth my argument before: E2E with almost all positive policy labels from natural observed driving will be very nice for making an L2++++ driver-assist product; the use of human driving will have it learn subtle, socialized, condition-dependent behaviors which aren't in any deterministic/optimization policy framework. It will feel like success. To be honest, the Career Loss Function for Tesla developers and managers is optimizing Big Boss Gut Feeling of Success Personally Testing On His Car and How Hypeable It Is on Twitter, not some abstruse metrics that show failure when it feels like it should already be working. Ashok is going to beat Karpathy on that metric.

Waymo had far fewer disengagements than Tesla 10 years ago, but that still wasn't enough for robotaxis, even though it felt great to a casual rider.

Ensuring safety for L4 at 10,000 hours MTBF (probably deployed robo regulatory level at minimum) is a much harder problem. So far Tesla is struggling to get 10 and most people experience 1 or less.

From a business point of view, obviously Tesla's doing much better--they're making money. To optimize Tesla's cash flow, delivering E2E L2++++ to sell cars while promising some future L4 (which is never happening on this HW base, and probably not on this SW base) is good, if dishonest, business.

I'm resigned to the reality that my 3LR will never ever be L4 contrary to Elon hype, and so a L2++++ that I can download seems like a great idea.

FSD is of course dishonestly named, until they retcon it into Full Spectrum Driver Assist, meaning that it will *try* to be a driver assist in all conditions, which is closer to the truth.
 
Ensuring safety for L4 at 10,000 hours MTBF (probably deployed robo regulatory level at minimum) is a much harder problem. So far Tesla is struggling to get 10 and most people experience 1 or less.

For sure. But if it falls short, it will certainly not be because end-to-end is a worse choice than more heuristic code in terms of potential accuracy.

Simple models work for simple problems, and need less data.

Complex models work for complex problems if data / compute / architecture needs are met. The 2012 breakthrough on ImageNet is an example. More compute and better architectures, and suddenly the simpler computer vision techniques were put out to pasture (but they were more explainable! lol).

Like, every other complicated data project has followed a similar path. More complicated machine learning algorithms with more data and more compute --> better accuracy.

And OpenAI didn't say "no we don't need to parse every text on the web, we can simulate most of it!"

And Google didn't say "no we don't need as much voice / text available, we can simulate most of it!"

Cruise CEO's tweets a month or two ago already proved Cruise is data-limited.

Waymo's algos are certainly more advanced than Cruise's or Tesla's, but they also have limited ODDs for a reason. Will be interesting to see how well scaling to 50 metro areas goes, as less heuristic code can be relied upon at that scale. It is certainly apples to oranges vs trying to make something work on every surface street in the U.S. from the get go.
 


It is a myth that Waymo relies on heuristics. Waymo uses very few heuristics. Their perception is NN, their prediction is NN, and their planner is NN. Waymo takes an ML-first approach in their stack. In fact, Waymo has said that this ML-first approach is a big reason why their stack is generalizing so quickly to new cities. Here, Dolgov cites the ML-first approach as the reason the Waymo Driver worked "right from the get-go" when they started testing in Austin: