He is talking about end-to-end object detection, not end-to-end control. His argument is that objects can end up in the right place in space but in the wrong lane, and that this is a problem. Didn't Tesla solve this by having a single neural network make all the predictions? If not, couldn't you just solve it with a language-of-lanes-like system? Also, end-to-end control "solves" the problem of end-to-end object detection by not having any object detection... ^^

End-to-end is not a magic bullet, according to the ME CTO:
"End-to-end AI is great. Let's just not be religious about each fancy new method. What really works in applications that require high precision is a combination of methods, each with its own advantages. For more details, watch the video."