If the car can see something and react to it, it should be easier for an end-to-end network to learn that the differing control behavior is related to the obstruction, e.g., a road closed sign. Presumably Tesla is also using human labelers to approve / rate / rank driving behaviors, so if a random driving decision, e.g., "oh, I actually need to stop at the grocery store," doesn't look useful for training, a second human opinion can keep it out of the networks. More generally, neural networks trained on lots of data somewhat "average" away infrequent behaviors, which can be good (filtering out random mistakes) or bad (not realizing special behavior is necessary).

It's unclear how an all-nets approach will understand implicit human decision making. How will it understand that I made a lane change because I'm avoiding an arbitrary obstruction or situation, vs navigating to my destination, vs fixing a mistake I made earlier?
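The second-opinion filtering idea above could look something like the following minimal sketch. Everything here is an assumption for illustration (the `Clip` structure, the 1-5 rating scale, the cutoff); nothing is known about Tesla's actual pipeline.

```python
# Hypothetical sketch of second-opinion filtering for driving clips.
# The Clip fields, rating scale, and threshold are all assumptions,
# not a known Tesla data format.
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    driver_action: str   # free-text description of what the driver did
    labeler_rating: int  # 1-5 score from a second human reviewer

def filter_for_training(clips, min_rating=4):
    """Keep only clips a human reviewer rated as good driving."""
    return [c for c in clips if c.labeler_rating >= min_rating]

clips = [
    Clip("a", "smooth lane change around debris", 5),
    Clip("b", "random stop at grocery store", 1),
]
print([c.clip_id for c in filter_for_training(clips)])  # -> ['a']
```

Note this only filters individually bad clips; it doesn't by itself explain *why* a lane change happened, which is the harder implicit-intent problem.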
Depending on how much dynamic driving context is kept, e.g., the last second vs the last minute, or upcoming information, some immediate driving predictions could be wrong or not make sense. For example, the turns to get to a destination over an hour away probably aren't as relevant as the immediate next turn, which could itself be several minutes away. So does the network get the whole navigation route and learn when to pay attention to that information, or does an engineer decide that only the next 3 intersections are enough to pass in? Similarly, a driver passing a "lane closed ahead" sign at highway speeds nearly a mile back should probably avoid switching into that lane, but then again, I've seen humans correctly(?) ignore the signs until forced to merge to get ahead and shave several minutes off the trip.
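The "only the next 3 intersections" engineering choice above could be sketched like this. The route format, the maneuver cutoff, and the distance cutoff are all hypothetical illustration values, not a known design.

```python
# Hypothetical sketch: hand the network only the next few maneuvers
# instead of the full hour-long route. The cutoffs (3 maneuvers,
# 1500 m) are assumed for illustration only.

def upcoming_context(route, max_maneuvers=3, max_distance_m=1500):
    """Truncate a route, given as (distance_m, instruction) tuples
    sorted nearest-first, to what plausibly matters right now."""
    context = []
    for distance_m, instruction in route:
        if len(context) >= max_maneuvers or distance_m > max_distance_m:
            break
        context.append(instruction)
    return context

route = [
    (200, "merge right for exit"),
    (800, "turn left at light"),
    (1200, "turn right"),
    (45000, "turn left onto destination street"),  # nearly an hour away
]
print(upcoming_context(route))
# -> ['merge right for exit', 'turn left at light', 'turn right']
```

The alternative design in the paragraph, i.e., feeding the whole route and letting the network learn what to attend to, would skip this truncation entirely and trade engineering judgment for training data.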