if end-to-end doesn't work, then it's unclear what approach will solve the problem.
How do you define "solve the problem"? I think how we define "solved autonomous driving" will greatly affect how we judge whether an approach is working or not.
I don't think autonomous driving will ever be perfect. Heck, human drivers are far from perfect. And we cannot expect autonomous cars to never be in an accident, since some accidents are caused by other drivers and are not the fault of the AV. So if we define autonomous driving as needing to be perfect or never getting into an accident, it will never be solved. There will likely always be something we can improve in the AV. In fact, one advantage of AVs is that they never stop learning and improving over time, so "solved" is an ongoing process. Rather than talking about "solved", I think it is better to talk about when autonomous driving is good enough to be deployed without supervision in a given ODD.
When deploying AVs, I think there are a few important goals:
1) the autonomous driving should be unsupervised.
2) the autonomous driving should be safer than human drivers (to be defined).
3) the ODD should be useful (to be defined).
4) the autonomous driving should be commercially available and affordable.
So maybe we could say that when we achieve all 4 goals together in the same product, autonomous driving is "solved"?
With that said, I don't think there is a magic bullet that will "solve" the problem of autonomous driving. I think what will end up solving the problem will be a combination of many different approaches, plus a lot of hard work and perseverance. We should use whatever approach works to solve a part of the problem. If end-to-end video training allows us to efficiently scale a generalized driving policy, then we should do that. If sensor fusion (cameras, radar and lidar) makes perception more reliable in adverse conditions, then we should do that. If HD maps make the AV safer by providing useful road info that the perception stack could not get on its own, then we should do that too. Ultimately, I think solving autonomous driving will be a very long grind because of the infamous long tail of edge cases. We will just need to keep grinding away, solving problems, solving edge cases, and making the AI better, until it is eventually "good enough".
Lastly, I do think your statement might be a bit short-sighted because it implies that end-to-end is the only way to solve autonomous driving. There are other approaches that might work too. Also, there is still a lot more to learn about ML; new techniques are being discovered all the time. So even if E2E does not "solve" autonomous driving, there might be some approach we have not discovered yet that does solve it in the future. After all, people used to think that we could solve autonomous driving by just hand-coding perception and planning, until we realized that we needed ML. Then we thought we could solve it with separate NN modules for perception, prediction and planning. That arguably got us much closer to solving the problem, but the AI is not quite smart enough, so now people think that end-to-end might be the final piece of the puzzle that we need. But who is to say that there isn't some other piece we are still missing that we have not discovered yet?