Stumbled upon this conversation, and while I haven't read the 8 pages of followup, I wondered if anyone is thinking differently since v12.3 came out, given how well it's been received and that it seems to do it all with just neural nets?
It is behaving as I would expect. As an L2+ driving assistant it feels more natural, more human-like (except at stop signs, where government-mandated behavior forces it to act unrealistically), but it makes mistakes that the previous rule-based system did not, particularly around drivable areas: sometimes it takes corners poorly and curbs wheels. That hasn't happened to me (I'm in SoCal, likely in the training set, and probably where it behaves best).
Maintaining the firm invariants that a rule-based system is explicitly programmed to respect is harder with a driving policy that is only a trained net. For instance, "all boundaries of the car must stay within the perceived drivable area" is a firm rule that can be programmed into a traditional path planner, which has physics knowledge of the car's size, mass, acceleration dynamics, etc. The new v12.3 net-based driving is more natural, like people: it sometimes drifts a bit over lane lines (naturally), more than the previous system did, to give space to obstructions (this is good), but it also seems to occasionally take corners too tightly and curb rims (bad behavior, because it doesn't have a physics simulation or a robotics-style hard constraint inside it).
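To make the point concrete, a firm invariant like the one above is a few lines of geometry in a traditional planner. A minimal sketch (all names and the rectangular-footprint simplification are my own assumptions, not anyone's actual planner code):

```python
import math

def car_footprint(x, y, heading, length, width):
    """Four corner points of the car's bounding rectangle at a given pose."""
    c, s = math.cos(heading), math.sin(heading)
    hl, hw = length / 2, width / 2
    return [(x + c * dx - s * dy, y + s * dx + c * dy)
            for dx, dy in [(hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw)]]

def point_in_polygon(pt, poly):
    """Ray-casting test: is pt inside the (simple) polygon?"""
    px, py = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def violates_drivable_area(pose, car_dims, drivable_poly):
    """Firm rule: every corner of the car must lie in the drivable area."""
    x, y, heading = pose
    length, width = car_dims
    return any(not point_in_polygon(c, drivable_poly)
               for c in car_footprint(x, y, heading, length, width))
```

A pure end-to-end net has no such explicit check; it can only approximate this invariant statistically from data, which is exactly why the rim-curbing failure mode can slip through.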
Humans don't have an explicit physics simulation either, but by the time they're driving they've had 16-18 years of physically moving around in space in their own bodies, and evolved neural hardware has millions of years of training on motion, with dedicated mapping and direction neurons built into the brain by evolution. Artificial neural networks don't generally have this kind of built-in specialized architecture yet; we don't yet know how to add it in the right places, but eventually I suspect researchers will figure out minimal ways to bolt it on (like submodules for neural 3D computation) that give good performance with much smaller, more efficient networks.
And humans also make errors in cars at first, but get better with practice.
With a larger and especially a better-curated dataset of positive and negative examples, I think it will progress quickly as an L2++ driver assistant and scale to more localities, but the gap to L4 is still large. Just as LLMs simulate human fluency and thereby trigger incorrect heuristics in people, the greater naturalness of the driving doesn't necessarily mean L4 is around the corner.
Further progress in that direction would probably require multiple simultaneous models: a net-based driving module for most of the drive, augmented with a simultaneous "safety driver" layer that enforces certain boundaries and crash avoidance, has priority, and is rules/robotics-based. For instance, with fast, precise absolute distances, including on the sides of the car from full-angle high-res lidar or radar, you could enforce a policy that prevents collisions and rim curbings in extreme situations while letting the net drive most of the time. This might actually make the rules-based programming easier, since you only have to program boundary-condition rules rather than all the usual heuristics.
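The arbitration I have in mind is simple in structure: the net proposes a command every tick, and the rules-based layer passes it through unless a measured hard limit is violated. A minimal sketch, where the `Command` shape, thresholds, and function names are all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Command:
    steer: float   # rad, positive = left
    accel: float   # m/s^2, negative = braking

MIN_CLEARANCE_M = 0.30   # assumed hard limit to any obstacle or curb
HARD_BRAKE = -6.0        # m/s^2

def safety_override(proposed: Command, clearances_m: list[float]) -> Command:
    """Rules-based 'safety driver': pass the net's command through unless
    a measured clearance (e.g. from lidar/radar) violates the hard limit."""
    if min(clearances_m) < MIN_CLEARANCE_M:
        # Boundary condition violated: this layer has priority.
        return Command(steer=0.0, accel=HARD_BRAKE)
    return proposed  # normal case: the net drives

def drive_step(net_policy, observation, clearances_m):
    """One control tick: net proposes, safety layer arbitrates."""
    return safety_override(net_policy(observation), clearances_m)
```

The appeal is that the hand-written part shrinks to a handful of boundary conditions like this, instead of encoding the entire driving policy in rules.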
But this is expensive and Elon likes cheap.
Heuristic fallacy: with LLMs, people who speak grammatically well are usually knowledgeable and intelligent, and since LLMs do the first part well, we over-trust them, even though they're likely to make *stuff* up more than people do. Similarly, driving that is more human-like most of the time doesn't mean the system has a human-like conception of driving. It's as if they trained an orangutan to drive by having it watch many people drive. It's hard to direct an orangutan with concepts on top of experience.