One potential solution I can think of is to add a swivel mount and swivel controller to the front narrow 250M camera. If that camera could turn and look to the sides, problem solved, as it is far-seeing and mounted higher up. This might also be the simpler solution.
This is also great thinking about a potential solution, but I would point out one implication: in order to use such a real-time adaptive camera (or, more generally, an adaptive sensor), there would have to be a fairly fundamental architectural change in the Neural Network, including new hardware-control output paths.
Right now, the perception NN receives the multi-camera input (possibly pre-processed by stitching software in another NN, or by a more conventional video-merging block) and analyzes the surround or Bird's Eye View, which presently has an entirely fixed relationship to the vehicle's position. It then processes the imagery to recognize and label objects, including aspects of their time-domain past and predicted future paths. I think it also now explicitly farms out some of the distance-extraction work to yet another "pseudo-lidar" NN, though I don't claim to know the real details of the architecture and its explicit vs. implicit blocks within the Mind of the Car. But the point is that it's not currently set up the way humans and animals are, with feedback loops that allow optimization of the sensor's position or output. We humans constantly move our eyes and swivel our heads to augment our visual perception. Animals like cats and dogs can move their ears independently, but we can only reposition our heads or cup our hands to locate or isolate sounds.
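To make that fixed dataflow concrete, here's a minimal sketch in PyTorch-style pseudocode. Every module name, dimension and task head here is my own invention for illustration; Tesla's real architecture isn't public, and this just captures the one-way, fixed-rig structure I'm describing:

```python
# Hypothetical sketch of a fixed multi-camera perception pipeline.
# All names and sizes are invented; this is NOT Tesla's actual design.
import torch
import torch.nn as nn

class FixedPerceptionPipeline(nn.Module):
    def __init__(self, num_cams: int = 8, feat: int = 64):
        super().__init__()
        # Per-camera feature extractor (stand-in for the stitching/merging stage)
        self.backbone = nn.Conv2d(3, feat, kernel_size=7, stride=4, padding=3)
        # Fused bird's-eye-view features with a FIXED camera-to-vehicle mapping
        self.bev_fuse = nn.Conv2d(feat * num_cams, feat, kernel_size=3, padding=1)
        # Task heads: object labels, motion past/future, "pseudo-lidar" distance
        self.objects = nn.Conv2d(feat, 10, kernel_size=1)  # class scores per cell
        self.paths = nn.Conv2d(feat, 4, kernel_size=1)     # motion parameters
        self.depth = nn.Conv2d(feat, 1, kernel_size=1)     # distance estimate

    def forward(self, cams: torch.Tensor):
        # cams: (batch, num_cams, 3, H, W) -- a fixed rig; note there is
        # no output path by which the network can request a different view.
        b, n, c, h, w = cams.shape
        feats = self.backbone(cams.view(b * n, c, h, w))
        feats = feats.view(b, -1, feats.shape[-2], feats.shape[-1])
        bev = self.bev_fuse(feats)
        return self.objects(bev), self.paths(bev), self.depth(bev)
```

The key point is in the forward pass: imagery flows one way, from a camera rig whose geometry never changes, to the task heads.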
The Tesla Vision NN, I think, is not currently architected with the capability to request camera pan, tilt or zoom as a real-time adjustment of its imaging input. Of course it could be done, but it would probably involve a deeper redesign of the "brain" map and training protocols, with some kind of handler NN that knows when the main perception block needs a better angle of view or higher magnification. Part of such a redesign would be adding the physical I/O channels to drive that external hardware.
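The kind of handler I mean might look, very roughly, like a small head watching the perception features and emitting a pointing request. Again, a hypothetical sketch with made-up actuator limits, not anything Tesla has described:

```python
# A minimal sketch of a "gaze handler" head that adds a hardware-control
# output path. Hypothetical; the limits and sizes are invented.
import torch
import torch.nn as nn

class GazeHandler(nn.Module):
    """Watches perception features and emits a camera pan/tilt/zoom request."""
    def __init__(self, feat: int = 64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.ptz = nn.Linear(feat, 3)  # new output path: pan, tilt, zoom

    def forward(self, bev_feats: torch.Tensor) -> torch.Tensor:
        x = self.pool(bev_feats).flatten(1)
        pan, tilt, zoom = self.ptz(x).unbind(dim=1)
        pan = torch.tanh(pan) * 1.0    # +/- 1.0 rad of pan (made-up limit)
        tilt = torch.tanh(tilt) * 0.3  # +/- 0.3 rad of tilt (made-up limit)
        zoom = 1.0 + torch.relu(zoom)  # zoom factor of at least 1x
        # This command would be sent to the actuator and would change the
        # NEXT frame's input -- the sensor becomes part of a closed loop.
        return torch.stack([pan, tilt, zoom], dim=1)
```

The hard part isn't this little head itself; it's that last comment. Once the network's output changes its own input, the sensor is part of a closed control loop, and the training protocols have to be redesigned to account for that.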
I will say, however, that there is some evidence of very crude attempts by the car to improve its camera angles. First, there's the whole creeping-forward action, compensating for the lack of true side-looking cameras mounted well forward of the human driver (or at least as far forward as a driver can lean today). Second, and less clear, there's the behavior where the car angles uncomfortably leftward while considering high-speed oncoming traffic; that could be designed to point the narrow-view center camera toward the oncoming lane. (I hope there's some good reason, as it's otherwise a very poor driving practice!) These awkward behaviors are, to me, indications that the system needs better camera angles but is very poorly equipped to get them, and is currently very slow to process the results.
So yes, perhaps some pan/tilt/zoom camera hardware, but properly combined with a modified perception NN that knows how to use it efficiently and with low latency. As an alternative to motorized hardware: a couple more fixed cameras, plus a general upgrade to 4K-ish sensors and lenses. The idea is not to overwhelm the general driving NN with a ridiculous and unnecessary ultra-res bitstream, but to allow adaptive digital zoom-in when useful. The main NN could continue with down-sampled, modest-resolution video over the whole panoramic field, but with the ability to consider one or more virtual high-resolution regions that supplement the perception. The detail region(s) would typically cover forward vision but could shift, as needed, to either side at any angle. This is an implementation of human-like pointable central-vision (foveal) detail, but potentially super-human in that it could handle multiple detailed views at different angles and shift the high-attention view more rapidly.
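As a sketch of how that virtual-fovea idea could work in software: downsample the full frame for the main NN, and cut one or more native-resolution crops at whatever angles an attention policy requests. This assumes a 4K source frame, and the ROI choices below are placeholders:

```python
# Foveated digital zoom, sketched in PyTorch. Frame size, crop size and
# ROI positions are illustrative assumptions, not any real spec.
import torch
import torch.nn.functional as F

def foveate(frame_4k: torch.Tensor, rois, out_hw=(480, 960), fovea_hw=(256, 256)):
    """frame_4k: (3, 2160, 3840) tensor. rois: list of (center_y, center_x)."""
    # Low-res panorama over the whole field, for the main driving NN
    panorama = F.interpolate(frame_4k.unsqueeze(0), size=out_hw,
                             mode="bilinear", align_corners=False)[0]
    foveae = []
    fh, fw = fovea_hw
    _, H, W = frame_4k.shape
    for cy, cx in rois:
        # Clamp the crop window inside the frame, then take native-res pixels
        y0 = min(max(cy - fh // 2, 0), H - fh)
        x0 = min(max(cx - fw // 2, 0), W - fw)
        foveae.append(frame_4k[:, y0:y0 + fh, x0:x0 + fw])
    return panorama, foveae

# e.g. one fovea straight ahead, plus one at a left cross-traffic angle
pano, crops = foveate(torch.rand(3, 2160, 3840),
                      rois=[(1080, 1920), (1100, 600)])
```

Note that adding a second or third crop is nearly free here, which is exactly the super-human part: no motor can point at two angles at once, but a digital fovea can.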
There are some great possibilities here, but Tesla admits to no need for any of this. We're left to accept the idea that it's simply a matter of tightening up the software.