Your points are well taken and I generally agree, but I'd say you're a little pessimistic about the separation effect. Rangefinder cameras have a triangulation baseline of one to three inches or so, and they can reliably separate 25 feet from 30 feet from infinity. Optical fire-control rangefinders with baselines of three to six feet could accurately direct artillery at targets a mile or more away (sorry for the non-metric examples).

Even for humans, stereo vision (depth perception) is really only useful at near/close range. At distance we rely on monocular cues like relative image size, shadows, etc., not stereo vision. That isn't to say that having two eyes isn't helpful; binocular summation, for instance, does improve overall vision, but that's different from depth perception. We would need eyes set very far apart for stereo to contribute to depth perception at distance.
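To put rough numbers on the baseline effect: stereo depth follows Z = f·B/d (focal length in pixels, baseline in meters, disparity in pixels), and the depth uncertainty grows as Z²·Δd/(f·B). A quick sketch, using purely illustrative numbers (a 1000 px focal length and 0.5 px disparity error are assumptions, not specs for any real camera), shows why a human eye spacing of ~6.5 cm runs out of useful resolution far sooner than a car-width baseline:

```python
# Back-of-envelope stereo triangulation: Z = f * B / d,
# where f = focal length (px), B = baseline (m), d = disparity (px).
# For a disparity error dd, the depth error is roughly dZ = Z**2 * dd / (f * B).
# All numbers below are illustrative assumptions, not measured specs.

def depth_error(z_m, baseline_m, focal_px=1000.0, disparity_err_px=0.5):
    """Approximate depth uncertainty (m) at range z_m for a given baseline."""
    return z_m ** 2 * disparity_err_px / (focal_px * baseline_m)

for baseline in (0.065, 1.5):  # ~human eye spacing vs ~car-width cameras
    for z in (10.0, 50.0, 100.0):
        err = depth_error(z, baseline)
        print(f"B={baseline:5.3f} m, Z={z:5.1f} m -> roughly +/- {err:.2f} m")
```

With these assumptions, at 100 m the wide baseline gives an uncertainty on the order of a few meters, while the eye-spacing baseline is off by tens of meters, which matches the point that human stereo is a near-range sense.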
Teslas have their cameras pretty close together, so I don't see much use for depth perception there; more likely it's just redundancy. I did read a paper a while back where they placed cameras at the far ends of the vehicle to gain some stereo depth perception.
So I think the width across a Tesla should be quite sufficient for triangulating objects within the range that matters for reaction safety and accurate braking calculations. The effectively continuous video stream helps average (filter) and refine the estimate at the frame rate, not to mention all the other supporting depth cues from the rolling perspective shifts etc. that you discussed.
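The frame-rate averaging point can be sketched too: if each frame yields an independent depth estimate with noise σ, averaging N frames shrinks the standard error by roughly 1/√N. A minimal illustration, assuming made-up numbers (50 m true range, 2 m per-frame noise, 30 fps):

```python
import random

# Each frame gives a noisy, independent depth estimate; averaging N frames
# reduces the standard error by about 1/sqrt(N). Numbers are illustrative only.

def averaged_depth(true_z, sigma, n_frames, rng):
    """Mean of n_frames noisy depth estimates around true_z."""
    samples = [true_z + rng.gauss(0.0, sigma) for _ in range(n_frames)]
    return sum(samples) / n_frames

rng = random.Random(42)
single = averaged_depth(50.0, 2.0, 1, rng)
filtered = averaged_depth(50.0, 2.0, 30, rng)  # ~1 second of video at 30 fps
print(f"single frame: {single:.2f} m, 30-frame average: {filtered:.2f} m")
```

A real tracker would use something like a Kalman filter rather than a plain mean, since the target range changes between frames, but the noise-reduction intuition is the same.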