Almost like labeling, "this batch of human reactions to situations all fit in this set of outputs" but with curation?
At risk of triggering certain elements here, it would be like giving the control software the current visualization that we see in the car. It is a distillation of the essential information that's needed to make control decisions. Not literally the current visualization, but something along those lines. Also not an occupancy network, but something that provides a normalized understanding of the driving environment. The current visualization is just my best point of reference. It may be something completely unintelligible to people.
That distillation could come from cameras (at various locations), LIDARs, ultrasonic sensors, drones, other cars, traffic cameras, whatever. However the sensor information comes together, the sensor system spits out that distillation, and that's what the control software works from. Given that, a simulator would be able to produce the same information without having to perform realistic rendering. It then falls to separate software to collect all the information coming from the various sensors, distill it, and present it to the control network. Ultimately, an all-LIDAR vehicle produces the same outputs to the control logic as an all-camera vehicle would, but the means of producing those outputs might be wildly different. They'd be car-specific, while the control logic could be created once and used by everyone.
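To make that split concrete, here's a rough Python sketch. Every name in it (SceneSummary, LidarRig, CameraRig, simulate_clear_road, control) is made up for illustration, and the two-second braking rule is an arbitrary stand-in for a real control network. The only point is that the control function sees nothing but the distilled summary, so it can't tell which sensor rig, or which simulator, produced it.

```python
import math
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Obstacle:
    x_m: float        # metres ahead of the car
    y_m: float        # metres left (+) or right (-) of the car's centreline
    speed_mps: float  # obstacle speed; 0.0 when the rig cannot measure it


@dataclass(frozen=True)
class SceneSummary:
    """Hypothetical sensor-agnostic distillation handed to the control logic."""
    ego_speed_mps: float
    obstacles: tuple[Obstacle, ...]


class SceneSource(Protocol):
    """Anything that can produce a SceneSummary: a sensor rig or a simulator."""
    def summarize(self) -> SceneSummary: ...


class LidarRig:
    """Car-specific adapter: treats each (x, y) LIDAR return as one obstacle."""
    def __init__(self, ego_speed_mps, points):
        self.ego_speed_mps = ego_speed_mps
        self.points = points

    def summarize(self) -> SceneSummary:
        obstacles = tuple(Obstacle(x, y, 0.0) for x, y in self.points)
        return SceneSummary(self.ego_speed_mps, obstacles)


class CameraRig:
    """Car-specific adapter: converts (bearing_deg, range_m) detections to x/y."""
    def __init__(self, ego_speed_mps, detections):
        self.ego_speed_mps = ego_speed_mps
        self.detections = detections

    def summarize(self) -> SceneSummary:
        obstacles = tuple(
            Obstacle(r * math.cos(math.radians(b)), r * math.sin(math.radians(b)), 0.0)
            for b, r in self.detections
        )
        return SceneSummary(self.ego_speed_mps, obstacles)


def simulate_clear_road(ego_speed_mps: float) -> SceneSummary:
    """A simulator can emit the summary directly, with no rendering step."""
    return SceneSummary(ego_speed_mps, ())


def control(scene: SceneSummary) -> str:
    """Toy control logic (an assumption, not a real policy): brake if anything
    is in our lane and closer than two seconds of travel at the current speed."""
    horizon_m = scene.ego_speed_mps * 2.0
    for ob in scene.obstacles:
        if abs(ob.y_m) < 1.5 and 0.0 <= ob.x_m < horizon_m:
            return "brake"
    return "cruise"


lidar = LidarRig(ego_speed_mps=10.0, points=[(10.0, 0.0)])
camera = CameraRig(ego_speed_mps=10.0, detections=[(0.0, 10.0)])
print(control(lidar.summarize()), control(camera.summarize()),
      control(simulate_clear_road(10.0)))
```

Swapping in a new sensor suite means writing a new adapter with a summarize() method; control() stays untouched, which is the "created once and used by everyone" part.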
The machine learning guys tell me that such an approach hobbles the overall efficiency of the neural network because there can be no back-propagation to the pixel level. I'm suggesting this approach because of the larger problem of training many different sensor sets while trying to produce the same control outputs.
My approach has the problem that there may simply not be enough information coming from the sensors to satisfy the control system's needs. For example, a one-pixel camera facing forward on a car doesn't provide enough information to make control decisions. So even if such a distillation system were created, it might all fall apart when sensor solutions come along that blow the pants off existing stuff. At that point, you'd want to completely retrain the control system with the outputs of the new sensors because they can provide a much better distillation - and the currently-trained control system wouldn't take full advantage of it.