It's semantics; the training of either approach is the same. You still backprop through the modules, it's just that each piece has been pre-trained. That's not a bad thing: it assuredly lets your total model converge to decent results more rapidly. And it can still allow the perception component to evolve over time; it doesn't have to be static.
Feedback propagated backward from pedal and wheel control to figure out which component of the network needs updating is indeed a weak signal. But this is counterbalanced by the sheer volume of data available. If the labeled data shows 100k cases where the driver changes lanes upon seeing an object (and 0 where the driver runs over the object), then the NN will learn to always move over for the object. Even if there are other reasons you might change lanes that a few training samples wouldn't be enough to disentangle, the volume wins out.
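To make "backprop through pre-trained modules" concrete, here is a minimal PyTorch-style sketch. The perception backbone, checkpoint path, and control-head shapes are all hypothetical, not any particular system's design; the point is only that the pre-trained perception weights still receive gradients from the control loss, so they keep evolving during end-to-end training.

```python
import torch
import torch.nn as nn

# Hypothetical perception backbone, assumed to be initialized from a pre-trained checkpoint.
perception = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# perception.load_state_dict(torch.load("perception_pretrained.pt"))  # assumed checkpoint

# Small head mapping perception features to steering & pedal targets.
control_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

# Optimize BOTH modules: pre-trained does not mean frozen.
params = list(perception.parameters()) + list(control_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(frames, targets):
    """frames: (B, 3, H, W) camera images; targets: (B, 2) steering & pedal labels."""
    features = perception(frames)      # pre-trained backbone, still trainable
    pred = control_head(features)
    loss = loss_fn(pred, targets)      # weak per-sample signal from control labels...
    optimizer.zero_grad()
    loss.backward()                    # ...but gradients still flow into perception
    optimizer.step()
    return loss.item()
```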
I don't think people really understand what is happening in machine learning with transformers and compute. Look at this post going around these days in the ML world, as well as this paper on scaling laws.
Intuition and domain knowledge eventually lose out to compute and data. These models just keep improving in a consistent, predictable fashion as long as you allow enough compute, data, and parameters in the model. The actual architecture, depth, and width matter less!
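The "predictable fashion" is roughly a power law in the scaling-law literature: loss falls smoothly as compute grows. A toy illustration below; the constants are made up for demonstration, since real values are fit per model family and dataset.

```python
# Illustrative power law: loss ~ a * C^(-alpha). Constants here are placeholders,
# not fitted values from any paper.
def predicted_loss(compute_pf_days, a=3.0, alpha=0.05):
    return a * compute_pf_days ** (-alpha)

for c in [1, 10, 100, 1000]:
    print(f"{c:>5} PF-days -> predicted loss {predicted_loss(c):.3f}")
```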
Self-driving at scale is one of the hardest problems, which is why it will need to be solved by ML models. There is no reason to think it will behave differently from other complex models. What wins? Compute and data. Clean data. The attention mechanism and transformer models have shown a strong ability to generalize from a dataset and store nuanced understanding given enough data.
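For reference, the attention mechanism at the heart of transformers is just a few lines: each position takes a weighted average of the values, with weights given by query-key similarity. A generic sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (B, T, d) tensors. Returns a (B, T, d) mix of values."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (B, T, T) similarity matrix
    weights = F.softmax(scores, dim=-1)             # normalize into attention weights
    return weights @ v                               # weighted average of values

# Toy self-attention: batch of 1, sequence of 4 tokens, 8-dim embeddings.
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)
```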
Worried about navigation? Just literally feed a picture of the navigation screen into training, like a human would.
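One hedged sketch of what that could look like (the shapes and function name are assumptions, not any real pipeline): stack a rendered nav-screen image alongside the camera frame as extra input channels, so the network sees both, like a human glancing between windshield and map.

```python
import torch

def build_model_input(camera_frame, nav_screen):
    """camera_frame: (3, H, W) RGB road view; nav_screen: (3, H, W) rendered map image."""
    return torch.cat([camera_frame, nav_screen], dim=0)   # (6, H, W) stacked channels

x = build_model_input(torch.rand(3, 224, 224), torch.rand(3, 224, 224))
print(x.shape)  # torch.Size([6, 224, 224])
```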
The main issues I see going forward are:
1) Cleaning the data. You are ingesting so much data, and you want to remove contradictory samples and data from, say, distracted drivers (see the filtering sketch after this list).
2) Inference compute. Nothing I said above implies that a great model can run on limited compute resources.
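A minimal sketch of the kind of filtering meant in (1), with hypothetical field names: drop frames flagged as distracted driving, then drop groups of near-identical scenes whose steering labels strongly contradict each other.

```python
from collections import defaultdict

def clean_driving_data(samples, disagreement_threshold=0.5):
    """samples: dicts with hypothetical keys 'state_key', 'steering', 'distracted'."""
    # 1) Drop frames from distracted drivers.
    attentive = [s for s in samples if not s["distracted"]]

    # 2) Group near-identical scenes by a coarse scene hash.
    groups = defaultdict(list)
    for s in attentive:
        groups[s["state_key"]].append(s)

    # 3) Keep only groups whose steering labels roughly agree; contradictory
    #    labels for the same scene teach the model nothing consistent.
    cleaned = []
    for group in groups.values():
        steers = [s["steering"] for s in group]
        if max(steers) - min(steers) <= disagreement_threshold:
            cleaned.extend(group)
    return cleaned
```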