My take on this:
If Tesla is inferring depth from the apparent size of an object (as opposed to binocular vision), then it can only estimate that object's distance if it actually recognises the object and has a reference size for it in its database. That's OK for cars, bikes, humans, traffic cones...
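
To make that concrete, here is a rough sketch of size-based ranging using the standard pinhole-camera relation (distance = focal length in pixels x real height / height in pixels). I have no idea whether Tesla does anything like this; the table of heights, the function name and the focal length below are all made up for illustration.

```python
# Hypothetical reference heights in metres (illustrative values only,
# not taken from any real system).
KNOWN_HEIGHTS_M = {
    "car": 1.5,
    "human": 1.7,
    "traffic_cone": 0.7,
}


def estimate_distance_m(label, pixel_height, focal_length_px):
    """Estimate distance with the pinhole model, or return None.

    distance = focal_length_px * real_height_m / pixel_height

    Returns None when the label has no reference size, which is the
    failure case described above: no known size, no distance.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    real_height_m = KNOWN_HEIGHTS_M.get(label)
    if real_height_m is None:
        return None
    return focal_length_px * real_height_m / pixel_height


# A recognised object gives a distance; an unrecognised one gives nothing.
print(estimate_distance_m("car", pixel_height=50, focal_length_px=1000))
print(estimate_distance_m("mattress", pixel_height=50, focal_length_px=1000))
```

The point is the second call: anything that is not in the table cannot be ranged this way, however clearly the camera sees it.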