Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.

Is "all neural networks" really a good idea?

It is a myth that Waymo relies on heuristics. Waymo uses very few heuristics: their perception is NN, their prediction is NN, and their planner is NN. Waymo relies on ML first in their stack. In fact, Waymo has said that this ML-primary approach is a big reason why their stack is generalizing so quickly to new cities. Here, Dolgov cites ML primary as the reason the Waymo Driver worked "right from the get-go" when they started testing in Austin:


Sorry, that was my mistake. ML primary is much better, and I think Cruise is similar, but still prone to overfitting without a diverse data set. Cruise has proven they are overfitting; Waymo is less clear.

But back to the original topic: do you think there is a big difference between having 4-5 NNs in a modular approach vs. one big one? The 4-5 might be a bit more explainable, but I think this is a minor difference. More modular components (whether NNs or not) tend to limit potential performance, so it's no surprise all competitors are moving to fewer NNs.
 
The real issue to me is: how numerous, and how *good*, are the supervision labels? The quantity of bits on the input is a burden; the quantity/quality of bits on the labels is the gold.

ML performance is limited by the quantity of good labels in the low-cardinality classes/rare cases. The question is whether the modular approach, by learning additional intermediate labels which can be acquired far more numerously and inexpensively, improves overall performance. Presumably yes; most researchers think they need these in the loss, to backprop the learning signals.
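A toy sketch of the idea above, under a made-up loss structure: intermediate labels (e.g. detection or lane labels, which are cheap and plentiful) enter training as weighted auxiliary losses alongside the final task loss, so their learning signal is backpropagated too.

```python
# Hypothetical sketch: combine a final "driving" loss with cheaper
# intermediate supervision signals. In a real stack these would be
# per-module NN losses minimized jointly by backprop.

def total_loss(final_loss, aux_losses, aux_weight=0.5):
    """Final task loss plus weighted intermediate-label losses."""
    return final_loss + aux_weight * sum(aux_losses)

# Intermediate labels are far more numerous than full-trajectory labels,
# so even a modest weight contributes a lot of learning signal.
loss = total_loss(final_loss=1.0, aux_losses=[0.2, 0.4, 0.1])
print(loss)  # 1.35
```

The `aux_weight` trades off how much the intermediate tasks shape the shared representation versus the end task itself.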

To some significant degree, humans are modular, with a generic 'survive to reproduce' loss function heavily influenced by evolution and by walking around. It takes 16-18 years of learning intermediate tasks and semantic understanding before humans can start to drive a vehicle.
 
Sorry, that was my mistake. ML primary is much better, and I think Cruise is similar, but still prone to overfitting without a diverse data set. Cruise has proven they are overfitting; Waymo is less clear.

I still think that Waymo has a more diverse data set than you give them credit for. After all, Waymo has collected data from over 20 cities in the US. That is not exactly nothing; I consider that to be a pretty diverse data set. Personally, I think Waymo is less overfit than Cruise just based on what we are seeing from both in terms of reliability and performance. Waymo seems more reliable than Cruise, handling cases like construction better. Waymo is also operating in a larger ODD than Cruise (not just bigger geofences but also 24/7 operation, higher speeds, and more adverse weather), which I think also supports the argument that Waymo is less overfit. But I won't quibble on this point.

But back to the original topic: do you think there is a big difference between having 4-5 NNs in a modular approach vs. one big one? The 4-5 might be a bit more explainable, but I think this is a minor difference. More modular components (whether NNs or not) tend to limit potential performance, so it's no surprise all competitors are moving to fewer NNs.

This is a more interesting question. Honestly, there are probably a lot of factors, like training data and architecture, that would affect the performance and reliability of the NN, so I don't think we can automatically say that modular is better or worse. I can imagine a scenario where 4-5 well-structured, well-trained NNs perform better than 1 poorly trained NN. I can also imagine the reverse scenario, where the 4-5 NNs are poor and perform worse than 1 better-trained NN. So either approach could be better depending on how well they are built.

Modular is more explainable which can help with troubleshooting. If there is a failure, it is likely easier with modular to know where the failure happened, what caused the failure and therefore how to fix it. And if there is a failure, you only need to retrain the NN directly related to the failure. With E2E, since there is just 1 NN and there are no distinct perception, prediction or planning components, it is harder to explain a failure and harder to troubleshoot. You basically need to retrain the entire stack every time. I would imagine it would be hard to avoid regressions too. For example, if I retrain the E2E NN to better handle say "no turn on red" scenarios, how do I know that I did not accidentally cause a regression in another scenario without revalidating the entire stack?

In this video, Anguelov says that E2E is not quite there yet but that the trend is towards fewer and larger NNs. IMO, it is possible that E2E will eventually prevail, especially as ML becomes even more advanced and computing power increases. In fact, we might even say that the modular approach and the E2E approach are just taking different paths to the same goal: the modular approach starts with many NNs and then reduces/merges them until it becomes E2E, whereas E2E companies like Wayve or Tesla are trying to train E2E directly from scratch. So really the main difference is how you get to E2E: do you take the more incremental approach (modular to E2E) or the more ambitious approach (E2E directly)?

 
Alex Kendall sums it up nicely: "It's not 50 years away".

Well, E2E exists now. The question is when will it be reliable enough to do safe driverless. Maybe a couple of anti-AI people might think E2E is 50 years away but I think the vast majority of AI experts would say that E2E driverless will happen sooner than 50 years. My personal prediction would be 5-10 years before we get E2E driverless cars.

But I do think that some "guard rails" will be needed to help E2E, at least in the short term, because interpretability, safety, robustness, and regulation are real challenges for E2E right now. Those guard rails could be a safety driver and/or safety policy code to keep the car from doing unsafe maneuvers; as Shai Shalev-Shwartz points out, current NNs are probabilistic by nature and can do stupid things. I don't think we would trust current E2E to drive everywhere completely unsupervised. But certainly in say 5-10 years, we might.
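One way such a "guard rail" could look, as a hedged sketch: the learned planner proposes actions with probabilities, and a hand-written safety policy vetoes anything outside a whitelist. The action names and the whitelist here are invented for illustration.

```python
# Hypothetical guard rail: accept the NN's highest-probability proposal
# only if a rule-based safety policy allows it; otherwise fall back.

SAFE_ACTIONS = {"slow_down", "hold_lane", "stop"}  # made-up safety policy

def guarded_action(nn_proposals):
    """Pick the highest-probability proposal that passes the safety check."""
    for action, prob in sorted(nn_proposals.items(), key=lambda kv: -kv[1]):
        if action in SAFE_ACTIONS:
            return action
    return "stop"  # minimal-risk fallback if nothing safe was proposed

# The probabilistic NN favors an unsafe maneuver; the guard rail blocks it.
proposals = {"run_red_light": 0.6, "stop": 0.3, "hold_lane": 0.1}
print(guarded_action(proposals))  # stop
```

This is the simplest possible version of "safety policy code"; a real system would reason about trajectories and physics, not action labels.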
 
I asked ChatGPT what "end-to-end" is in the context of autonomous driving. Here is its response:

In the context of autonomous driving, "end-to-end" refers to a particular approach or architecture used in developing self-driving vehicles or systems. The end-to-end approach aims to create a fully autonomous driving system that can take raw sensor data as input (such as camera images, LiDAR point clouds, radar signals, etc.) and directly output the appropriate driving actions without relying on handcrafted algorithms or intermediate stages of perception and decision-making.

Key characteristics of an end-to-end autonomous driving system include:
  1. Data-Driven Learning: Instead of designing complex rule-based algorithms for tasks like object detection, lane keeping, and path planning, end-to-end systems use machine learning techniques, particularly deep learning, to learn directly from data. Neural networks are trained on large datasets to map sensor inputs to driving actions.
  2. Integration of Perception and Control: In traditional autonomous driving systems, perception (recognizing objects, understanding the environment) and control (steering, acceleration, braking) are often treated as separate components. In end-to-end systems, they are integrated, meaning the neural network learns to perceive the environment and make driving decisions simultaneously.
  3. End-to-End Mapping: The neural network learns a direct mapping from sensory input to control outputs. For example, it might take camera images and produce steering commands, throttle, and brake controls without explicitly breaking down the process into object detection, path planning, and trajectory generation.
  4. Training Data: Training an end-to-end autonomous driving system typically requires large amounts of labeled data, including sensor data from real-world driving scenarios. This data is used to train the neural network to make the right driving decisions in various situations.
While the end-to-end approach has gained attention due to its potential for simplifying the development of autonomous vehicles, it also has its challenges and limitations. These challenges include:
  • Data Requirement: Large and diverse datasets are needed for training, and collecting such data can be resource-intensive.
  • Interpretability: Neural networks in end-to-end systems can be challenging to interpret, making it difficult to understand why a particular decision was made.
  • Safety and Robustness: Ensuring the system's safety and robustness in all possible scenarios can be complex, as the neural network may not handle edge cases well.
  • Regulatory and Certification Challenges: Meeting regulatory and certification standards for autonomous vehicles may be more difficult when the decision-making process is less transparent and relies heavily on machine learning.
Many companies and researchers are exploring various approaches to autonomous driving, and end-to-end is just one of them. Hybrid approaches that combine traditional algorithms with machine learning are also common, aiming to strike a balance between robustness and the benefits of deep learning. The choice of approach depends on factors such as safety, reliability, and regulatory requirements.
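The "end-to-end mapping" characteristic described above can be caricatured in a few lines: one learned function straight from raw sensor input to a control output, with no hand-written perception or planning stages in between. The tiny linear "network" and its weights below are invented for illustration.

```python
# Toy stand-in for an end-to-end network: a single linear layer mapping
# "pixels" directly to a steering command, with no intermediate modules.

def e2e_policy(pixels, weights, bias):
    """Raw sensor input -> control output, learned as one mapping."""
    return sum(p * w for p, w in zip(pixels, weights)) + bias

# A 4-"pixel" frame mapped directly to a steering value in [-1, 1].
steering = e2e_policy([0.1, 0.5, 0.2, 0.9], [0.2, -0.1, 0.4, 0.1], 0.0)
print(round(steering, 3))  # 0.14
```

In a real E2E system this function would be a deep network over camera frames, but the interface is the same: sensors in, controls out.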
 
Well, E2E exists now. The question is when will it be reliable enough to do safe driverless. Maybe a couple of anti-AI people might think E2E is 50 years away but I think the vast majority of AI experts would say that E2E driverless will happen sooner than 50 years. My personal prediction would be 5-10 years before we get E2E driverless cars.
It doesn't exist in any AV that's deployed driverless. I doubt 5-10 years is enough for a full back-prop e2e architecture in autonomous vehicles, but time will tell.

Perhaps we'll have moved on before it ever happens, who knows. My point is that e2e is experimentation and is likely not going into deployable products in the coming years.
 

I make a distinction between "demos" and "driverless". We have E2E demos now. We do not have E2E driverless yet.

The reason I think we could see E2E driverless in 5-10 years is because of the rapid pace of progress with ML and compute. ML is improving very rapidly. It is not linear.
 
We'll get end-to-end much sooner, IMO. If you have a software stack with mostly ML in modules, it's not going to be hard to "convert" that into an end-to-end model with a great initial guess of matrix weights. You'll be able to get something that converges to similar accuracy in less time. But end-to-end is only useful (given the explainability downsides) when it is significantly better. That requires a lot of data and compute.
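That "great initial guess" idea can be sketched as follows, with hypothetical module names and weights: instead of random initialization, the combined end-to-end model is seeded from the already-trained modules and then fine-tuned jointly.

```python
# Hypothetical warm start: stitch trained per-module weights into one
# end-to-end parameter vector, rather than initializing from scratch.

def warm_start(module_weights):
    """Concatenate per-module weights into a single E2E parameter list."""
    e2e_params = []
    for module in ("perception", "prediction", "planning"):
        e2e_params.extend(module_weights[module])
    return e2e_params

modules = {
    "perception": [0.1, 0.2],
    "prediction": [0.3],
    "planning":   [0.4, 0.5],
}
print(warm_start(modules))  # [0.1, 0.2, 0.3, 0.4, 0.5]
```

Joint fine-tuning from this starting point should converge much faster than training the E2E model from random weights, which is the conversion argument above.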

Practically, I expect all these companies to continue to develop their more modular approaches while also pursuing end-to-end. This provides a more "conservative" solution, and the conservative solution also offers label generation of the explainable feature outputs you'd want the end-to-end model to learn.
 
I make a distinction between "demos" and "driverless". We have E2E demos now. We do not have E2E driverless yet.

The reason I think we could see E2E driverless in 5-10 years is because of the rapid pace of progress with ML and compute. ML is improving very rapidly. It is not linear.
The limiting problems are data availability (proper data availability in particular) and conceptual progress.

Look at the large language models. By assimilating nearly all human-readable text, they can simulate text very effectively. The central technological/conceptual problem is that they have no understanding of what is probable, or was said, versus what is True. That can't be solved with more of the same nature of data.

The same applies to L4 driving. You can observe natural data in huge quantities to know what is 'probable', but you don't know what is truly Safe. Equating the two is the same problem as LLMs equating highly probable token streams with truth.
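The "probable is not safe" point can be made concrete with a toy example (the logged behavior is invented): pure imitation picks whatever action is most frequent in the logs, which need not be the safe one.

```python
# Toy illustration: an imitation policy reproduces the most probable
# logged action at a yellow light, even if that action is unsafe.

from collections import Counter

# Made-up driving logs: most human drivers accelerate through the yellow.
yellow_light_logs = ["accelerate"] * 7 + ["brake"] * 3

def imitation_policy(logs):
    """Return the most probable logged action."""
    return Counter(logs).most_common(1)[0][0]

print(imitation_policy(yellow_light_logs))  # accelerate (probable, not safe)
```

Nothing in the data itself tells the learner that the minority action was the safer one; that label has to come from somewhere else.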
 
The same applies to L4 driving. You can observe natural data in huge quantities to know what is 'probable', but you don't know what is truly Safe. Equating the two is the same problem as LLMs equating highly probable token streams with truth.

I think Mobileye's RSS (Responsibility-Sensitive Safety) is a solution to this problem. RSS sets the limits of what is safe or unsafe driving behavior, so it essentially constrains the driving decisions of the FSD system to fit within the "safe driving" category. That way, if your FSD system wants to do something unsafe, RSS prevents it from doing it. In fact, you might also be able to use RSS for training by filtering out the "unsafe driving" training data ahead of time, and then training your system on only the "safe driving" data.
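A minimal sketch of that filtering idea, with an invented safety check standing in for a real formal model like RSS: drop examples that fail the check before training, so the policy only imitates "safe driving" data.

```python
# Hypothetical pre-training filter: keep only examples that pass a
# safety check. The minimum-gap rule below is a toy stand-in for a
# real formal safety model such as Mobileye's RSS.

MIN_GAP_M = 10.0  # made-up minimum safe following distance, meters

def is_safe(example):
    return example["gap_m"] >= MIN_GAP_M

def filter_training_data(examples):
    """Discard 'unsafe driving' examples before training."""
    return [ex for ex in examples if is_safe(ex)]

data = [{"gap_m": 25.0}, {"gap_m": 4.0}, {"gap_m": 12.0}]
print(len(filter_training_data(data)))  # 2
```

A real RSS check would evaluate full trajectories against responsibility rules rather than a single scalar, but the pipeline shape is the same.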
 
I think it's a disadvantage to train only on safe (positive) examples instead of full supervision that includes negative examples as policies to move away from.

This requires very good simulation of negative examples, but then the system will likely learn to distinguish simulation from reality via small simulation artifacts if most of the simulated data is the negative class.
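The value of negative examples in the objective can be sketched with a toy margin loss (the loss form and numbers are illustrative, not anyone's actual training setup): the policy is explicitly pushed away from bad actions rather than only pulled toward good ones.

```python
# Toy hinge-style loss over a (positive, negative) example pair: the
# safe example should outscore the unsafe one by at least a margin.

def contrastive_loss(score_pos, score_neg, margin=1.0):
    """Zero loss once the positive outscores the negative by the margin."""
    return max(0.0, margin - (score_pos - score_neg))

# Policy already separates the pair by the full margin -> no gradient.
print(contrastive_loss(score_pos=2.0, score_neg=0.5))  # 0.0
# Policy scores the unsafe (negative) example nearly as high -> loss.
print(round(contrastive_loss(score_pos=0.6, score_neg=0.5), 3))  # 0.9
```

Training only on positives removes the second term entirely, which is exactly the disadvantage described above.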
 
Can you repost the links without the Twitter? I can't log into it at work.

Sure.

Here is the link to the paper: [2306.16927] End-to-end Autonomous Driving: Challenges and Frontiers

And here is the summary chart from the paper:

[image: summary chart from the paper]
 
Here is an interesting video showing Ghost's MMLLM (multi-modal large language model) reasoning about various driving scenarios:


I think it shows the potential that LLMs have in autonomous driving. I could imagine LLMs being a good driving policy where the AV would be able to figure out when to slow down, when to change lanes, when to yield etc based on what it is seeing.