I think this is indeed the case. In my area there are 4-lane roads where the average speed is 60 mph. For an unprotected right on red at a traffic light, the cameras have to look across a wide intersection and work out the velocity of cars approaching the light from about 100 m away. At least with 10.12 it seems to not see the traffic at all: when the road is 100% clear it pauses a good 15 seconds and then makes a timid commit to the turn. When there is approaching traffic it does the same thing, except now there is traffic and I have to disengage or slam on the accelerator / change lanes. Curious to see if 10.69 is any better on this type of turn.

It's also possible that the low resolution of the cameras (only 1280x960) means recognition and classification can't happen at far enough distances: the imagery is too coarse compared to good human foveal vision, so the reaction comes too late. Better cameras aren't hard to install now, but they would greatly increase the data rate into the neural networks, and that requires significantly faster compute, which is a big issue.
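Rough numbers on the resolution point, assuming a ~50° horizontal FOV on the main camera and a ~1.8 m wide car (both my guesses, not specs), with a simple pinhole model:

```python
import math

# Back-of-the-envelope: how many horizontal pixels does an oncoming car
# cover at a given distance? Assumptions (mine, not from the post):
# ~50 deg horizontal FOV for the main camera, car ~1.8 m wide, pinhole model.
H_RES = 1280          # horizontal resolution from the post
FOV_DEG = 50.0        # assumed horizontal field of view
CAR_WIDTH_M = 1.8     # assumed car width

def pixels_on_target(distance_m: float) -> float:
    """Approximate horizontal pixel width of the car at a given range."""
    angle_deg = math.degrees(2 * math.atan(CAR_WIDTH_M / (2 * distance_m)))
    return H_RES * angle_deg / FOV_DEG

for d in (50, 100, 150):
    print(f"{d:>3} m: ~{pixels_on_target(d):.0f} px wide")
# -> roughly 53 px at 50 m, 26 px at 100 m, 18 px at 150 m.
```

At those assumptions a car at 100 m is a ~26-pixel-wide blob, and velocity has to come from frame-to-frame changes in that blob, which isn't much to work with.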
Faster compute without the most efficient chips means high expense and high power consumption. I wouldn't be surprised if the Waymo cars carry $10-20K of compute drawing 4000 watts, which would obviously kill efficiency. It would be like a resistive heater running at full blast, and in the summer all that heat would have to be A/C'ed out too.
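For a sense of scale, taking that 4 kW guess and assuming ~250 Wh/mile highway consumption (my number):

```python
# Rough cost of a 4 kW compute load on EV range over one hour at 60 mph.
# The ~250 Wh/mile highway consumption figure is my assumption.
COMPUTE_W = 4000                 # compute draw from the guess above
WH_PER_MILE = 250                # assumed highway consumption
SPEED_MPH = 60

driving_wh = SPEED_MPH * WH_PER_MILE        # 15,000 Wh to drive for an hour
compute_wh = COMPUTE_W                      # 4,000 Wh of compute in that hour
lost_miles = compute_wh / WH_PER_MILE       # range eaten by the computer

print(f"Compute adds {compute_wh / driving_wh:.0%} on top of driving energy, "
      f"costing ~{lost_miles:.0f} miles of range per hour")
# -> ~27% extra energy, ~16 miles of range per hour at these numbers.
```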
Phantom braking is probably similar: at 1280x960 you are calculating the risk of hitting fuzzy pixelated blobs at 50 meters. AIs can do amazing things, but they will be limited by the cameras. If the cameras were sharper there would be much better distinction at distance between shadows / lighting and a real object in the road. And yeah, the compute for higher-res cameras would be way too much.
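On the "way too much" compute point, per-frame neural-net work scales roughly with pixel count, which grows quadratically with linear resolution. A quick sketch, with candidate resolutions that are mine, just for comparison:

```python
# Why higher-res cameras blow up compute: pixel count grows quadratically
# with linear resolution, and per-frame NN work scales roughly with pixels.
base_w, base_h = 1280, 960       # current resolution from the post
candidates = [(1920, 1080), (2560, 1920), (3840, 2160)]

base_px = base_w * base_h
for w, h in candidates:
    print(f"{w}x{h}: ~{w * h / base_px:.1f}x the pixels per frame")
# -> 1920x1080 is ~1.7x, 2560x1920 is ~4.0x, 3840x2160 is ~6.8x.
#    Doubling sharpness (2560x1920) roughly quadruples the NN input.
```

So making that blob at 50 m twice as sharp roughly quadruples what the networks have to chew through, per camera, per frame.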