Sorry, that was my mistake. ML-primary is much better, and I think Cruise is similar - but still prone to overfitting without a diverse data set. Cruise has shown that they are overfitting; with Waymo it is less clear.
I still think Waymo has a more diverse data set than you give them credit for. After all, Waymo has collected data from over 20 cities in the US. That is not exactly nothing; I consider that a pretty diverse data set. Personally, I think Waymo is less overfit than Cruise, just based on what we are seeing from both in terms of reliability and performance. Waymo seems more reliable than Cruise, handling cases like construction zones better. Waymo is also operating in a larger ODD than Cruise (not just bigger geofences but also 24/7 operation, higher speeds, and more adverse weather), which I think further supports the argument that Waymo is less overfit. But I won't quibble on this point.
But back to the original topic: do you think there is a big difference between having 4-5 NNs in a modular approach vs one big one? The 4-5 might be a bit more explainable, but I think that is a minor difference. More modular components (whether NNs or not) tend to limit potential performance, so it's no surprise that all the competitors are moving to fewer NNs.
This is a more interesting question. Honestly, there are probably a lot of factors, like training data and architecture, that would affect the performance and reliability of the NNs. So I don't think we can automatically say that modular is better or worse. I can imagine a scenario where 4-5 NNs are better structured and better trained, and so perform better than 1 NN that is poorly trained. I can also imagine the reverse, where the 4-5 NNs are poor and as a result perform worse than 1 NN that is better trained. So either approach could be better depending on how well it is built.
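Just to make the contrast concrete, here is a toy sketch of the two shapes we are talking about. Every module name, layer size, and interface here is made up for illustration; no real AV stack looks anywhere near this simple:

```python
import torch
import torch.nn as nn

# Modular: separate NNs with inspectable interfaces between them.
perception = nn.Sequential(nn.Linear(512, 64), nn.ReLU())  # sensors -> objects
prediction = nn.Sequential(nn.Linear(64, 32), nn.ReLU())   # objects -> trajectories
planning = nn.Linear(32, 2)                                # trajectories -> controls

def modular_stack(sensors):
    objects = perception(sensors)       # intermediate output you can log/inspect
    trajectories = prediction(objects)  # another inspectable hand-off point
    return planning(trajectories)

# E2E: one NN, sensors in, controls out, no intermediate outputs to inspect.
e2e_stack = nn.Sequential(
    nn.Linear(512, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),
)

sensors = torch.randn(1, 512)  # fake sensor features
print(modular_stack(sensors), e2e_stack(sensors))
```

The point of the sketch is just that the modular version has named hand-off points between components, while the E2E version is one opaque function from sensors to controls.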
Modular is more explainable, which can help with troubleshooting. If there is a failure, it is likely easier with modular to know where the failure happened, what caused it, and therefore how to fix it. And you only need to retrain the NN directly related to the failure. With E2E, since there is just one NN and no distinct perception, prediction, or planning components, it is harder to explain a failure and harder to troubleshoot. You basically need to retrain the entire stack every time. I would imagine it would be hard to avoid regressions too. For example, if I retrain the E2E NN to better handle, say, "no turn on red" scenarios, how do I know I did not accidentally cause a regression in another scenario without revalidating the entire stack?
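That revalidation problem is easy to sketch. This is just a hedged illustration of what "revalidating the entire stack" means in practice; the scenario names and the evaluate() function are placeholders, not any real company's test framework:

```python
def find_regressions(evaluate, model_old, model_new, scenarios):
    """Return scenarios where the retrained model scores worse than the old one."""
    regressions = []
    for scenario in scenarios:
        old_score = evaluate(model_old, scenario)
        new_score = evaluate(model_new, scenario)
        if new_score < old_score:  # retraining made this scenario worse
            regressions.append((scenario, old_score, new_score))
    return regressions

# With a modular stack you might only rerun scenarios that touch the retrained
# module. With E2E there is no such boundary, so the suite has to cover
# everything, e.g.:
# find_regressions(evaluate, old_e2e, retrained_e2e,
#                  ["no_turn_on_red", "construction_zone", "unprotected_left"])
```

The cost difference is in how big that scenario list has to be after each retrain, not in the check itself.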
In this video, Anguelov does not believe that E2E is quite there yet, but says the trend is towards fewer and larger NNs. IMO, it is possible that E2E will eventually prevail, especially as ML becomes more advanced and computing power increases. In fact, we might even say that the modular approach and the E2E approach are just taking different paths to the same goal: the modular approach starts with many NNs and then reduces/merges them until it becomes E2E, whereas E2E companies like Wayve or Tesla are trying to train E2E directly from scratch. So really the main difference is how you get to E2E: do you take the more incremental approach (modular to E2E) or the more ambitious approach (E2E directly)?