With supervised learning, the neural network doesn't need to be explicitly programmed to do binocular vision position calculations. Just as a kid doesn't need to be taught trigonometry and calculus, measuring angles and deriving velocity, in order to catch a ball, a correctly designed neural network learns from a large amount of accurately labeled training data how an object that shows up in several overlapping camera views appears in each of them depending on where it actually is.
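As a rough illustration of that idea, here is a minimal sketch in Python with NumPy. It is a toy, not how any real vision stack works: the "network" is a single linear layer fit by least squares, the two-camera rig is simulated, and the names `FOCAL_PX`, `BASELINE_M`, `make_labeled_data`, and `predict_depth` are all made up for this example. To keep the toy linear, the learner is trained on inverse depth, which is also an assumption of the sketch. The point is only that the fitting code is never given the triangulation formula; it sees pixel positions from two cameras plus the correct distance labels.

```python
import numpy as np

# Hypothetical rig parameters, used ONLY to simulate labeled data.
# The learner below never reads them.
FOCAL_PX = 800.0    # focal length in pixels (assumed)
BASELINE_M = 0.3    # spacing between the two cameras in meters (assumed)


def make_labeled_data(n, rng):
    """Simulate n objects seen by two side-by-side pinhole cameras."""
    depth = rng.uniform(2.0, 50.0, size=n)     # true distance in meters (the label)
    lateral = rng.uniform(-5.0, 5.0, size=n)   # true sideways offset in meters
    x_left = FOCAL_PX * lateral / depth                   # pixel column, left camera
    x_right = FOCAL_PX * (lateral - BASELINE_M) / depth   # pixel column, right camera
    # Inputs: the two pixel positions plus a constant bias term.
    features = np.column_stack([x_left, x_right, np.ones(n)])
    return features, depth


rng = np.random.default_rng(0)

# "Training": fit one linear layer to labeled examples.
# The target is inverse depth so that a linear model can represent it.
train_x, train_depth = make_labeled_data(5000, rng)
weights, *_ = np.linalg.lstsq(train_x, 1.0 / train_depth, rcond=None)


def predict_depth(features):
    """Predict distance in meters from the two pixel positions."""
    return 1.0 / (features @ weights)


# Evaluate on fresh examples the fit has never seen.
test_x, test_depth = make_labeled_data(1000, rng)
max_rel_error = np.max(np.abs(predict_depth(test_x) - test_depth) / test_depth)
print("learned weights:", weights)
print("max relative depth error:", max_rel_error)
```

In this noise-free toy, the learned weights come out close to `1 / (FOCAL_PX * BASELINE_M)`, its negative, and zero, which is the standard disparity-to-depth relation recovered purely from labeled examples. A real network faces raw images, noise, and lens distortion, so it needs far more capacity and data, but the principle of learning the geometry from labels rather than from hand-written formulas is the same.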