Also, we have necks so we can turn our heads; that's where the multiple cameras come in.
But yeah, you're absolutely right. There should be enough sensors. Elon's reasoning goes like this: if a human can see it and do it, a deep learning neural net should be able to do the same.
HW3 supposedly processes 2000 FPS, which gives you a tick of 1/2000 s = 500 µs. That's pretty amazing for this kind of system. Not sure if that's for one camera or all 8. But worst case, if it's shared across all 8, you'd get 2000 / 8 = 250 FPS per camera, i.e. a 4 ms tick, which is still more than good enough.
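Quick sanity check of that arithmetic, assuming the 2000 FPS figure is accurate (I haven't verified it) and that in the worst case it's split evenly across the 8 cameras. `frame_period_ms` is just a made-up helper name for this sketch:

```python
# Hypothetical helper: converts a frame rate into the time budget per frame.
def frame_period_ms(fps: float) -> float:
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps

TOTAL_FPS = 2000   # claimed HW3 throughput (unverified assumption)
NUM_CAMERAS = 8

# Best case: 2000 FPS applies to each camera on its own.
best_case_ms = frame_period_ms(TOTAL_FPS)           # 0.5 ms = 500 µs

# Worst case (assumed even split): 2000 FPS shared across all 8 cameras.
per_camera_fps = TOTAL_FPS / NUM_CAMERAS            # 250 FPS per camera
worst_case_ms = frame_period_ms(per_camera_fps)     # 4.0 ms per frame

print(f"best case:  {best_case_ms * 1000:.0f} µs per frame")
print(f"worst case: {per_camera_fps:.0f} FPS per camera, {worst_case_ms:.1f} ms per frame")
```

Note that 2000 / 8 gives a rate (250 FPS), not a time; the 4 ms comes from taking 1 / 250 s.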
My guess is that progress will seem slow, but once they nail it, it will be amazing and the competitors will be left in the dust. I am convinced that deep learning + "normal" cameras is the way to go.