
Does Tesla have binocular vision?


sleepydoc

There have been numerous posts about phantom braking, removing radar, etc. One point that has been made is that other companies with vision-based systems (e.g. Subaru) use two cameras to allow binocular, stereoscopic vision.

Tesla has more than one forward-facing camera. They are not as widely spaced as Subaru’s, but they are separated, so theoretically Tesla should be able to have some degree of stereoscopic vision. Does anyone know if they do, and/or if they make use of it?
 
The three front cameras have different focal lengths. They could compensate for the distortion to simulate stereo, the way they compensate for the wide-angle camera in the rear. But the effect would be limited to the maximum range of the shortest-focal-length camera.
 
That was my thought. Even if it’s limited to the ‘worst’ camera, it’s still better than nothing. Like @DanCar said, though, I’ve never seen them mention it.
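For a sense of the numbers involved, here is the classic rectified-stereo relation, depth = focal length × baseline / disparity, as a minimal Python sketch. The focal length and baseline below are made-up illustrative values, not Tesla’s camera specs.

```python
# Classic rectified-stereo relation: Z = f * B / d
# f = focal length in pixels, B = baseline in meters, d = disparity in pixels.
# Both constants below are illustrative assumptions, not Tesla camera specs.
f_px = 1000.0      # assumed focal length
baseline_m = 0.1   # assumed spacing between two forward cameras

def depth_from_disparity(d_px: float) -> float:
    """Depth in meters implied by a measured pixel disparity."""
    return f_px * baseline_m / d_px

for d in (20.0, 5.0, 1.0):
    print(f"disparity {d:4.1f} px -> depth {depth_from_disparity(d):6.1f} m")
# With this narrow a baseline, a single pixel of disparity already maps to
# 100 m, so useful stereo depth is confined to the nearer part of the scene.
```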
 
Since Tesla's AI is looking at all three forward-facing cameras, as well as the two pillar cams that have some forward-looking field of view, it is possible that the AI is getting some distance information from parallax between the five cameras. But there is no real way to tell, given how opaque the operation of an AI system is.
 
I recall Karpathy talking about this and it is definitely not stereoscopic or binocular vision.

Found it. Skip to 2:20

 

Thanks for the link - that makes it pretty clear they are not using stereoscopic vision for depth perception. It makes me wonder about some other aspects, though. Karpathy said they were using the neural net to infer distance, presumably using other visual cues the way humans do. First, it was my understanding that neural net processing does not occur in the car, but it doesn't make sense to do this kind of processing off-vehicle: there is too much data required for the continuous evaluations that need to be made, and slow or absent cell service would lead to delays in processing, or outright unavailability.

The other question I have concerns accuracy. In the example image he presented, the NN was very accurate, but then he wouldn't present an image with a lot of mistakes, because that would undermine his position. Even if the NN is 99% accurate in its estimates, the remaining 1% would still be plenty to cause issues. One would assume that they tested and verified a high degree of reliability before deploying the system, but maybe not. Or maybe the reliability ended up being lower than they expected.
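To put a rough number on that worry: at video frame rates, even a small per-frame error rate adds up quickly. The frame rate and the per-frame independence assumption below are illustrative guesses, not Tesla figures.

```python
# Back-of-envelope: what a 1% per-frame error rate implies over an hour.
# 36 fps and per-frame independence are assumptions for illustration only.
fps = 36
error_rate = 0.01
seconds_per_hour = 3600

bad_estimates_per_hour = fps * seconds_per_hour * error_rate
print(f"~{bad_estimates_per_hour:,.0f} erroneous depth estimates per hour")
# ~1,296 per hour. Which is why per-frame accuracy alone is not the whole
# story: temporal filtering across consecutive frames can reject isolated
# outliers before they ever translate into a braking decision.
```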
 
Absolutely agree with you. That said, I am surprised by the direction they took to train the neural net to infer distances. This is like teaching a person with one eye and zero depth perception to learn what depth is by using a stick with measurements marked on it. Simply amazing; I would never have thought of that approach. That is reflected even by the other challengers in the field, namely Waymo, who have gone down the “obvious” path.

All that being said, the fact that they were able to achieve such usability for Autopilot on daily-driven cars makes me believe they will be able to finesse it (train it) just like training a student driver. It will take a few years, but it will be done, and it will be expandable to many uses since there is no reliance on a previously mapped-out HD map to support it.
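The “measuring stick” idea can be made concrete: if instrumented cars carry a range sensor, its measurements can serve as per-pixel ground truth for a camera-only depth network. Below is a toy PyTorch sketch of that training setup; the tiny network, the L1 loss, and the 80 m sensor range are illustrative assumptions, not Tesla’s actual pipeline.

```python
import torch
import torch.nn as nn

# Toy monocular depth regressor: one RGB frame in, per-pixel depth out.
# The architecture is deliberately tiny and illustrative.
class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),  # 1 channel = depth in meters
        )

    def forward(self, x):
        return self.net(x)

model = TinyDepthNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-ins for (camera frame, range-sensor depth map). In the
# "measuring stick" scheme, target_depth would come from radar/lidar
# on instrumented cars; the mask keeps only pixels with a range return.
frames = torch.rand(4, 3, 96, 128)
target_depth = torch.rand(4, 1, 96, 128) * 100.0  # meters
valid = target_depth < 80.0                       # assumed sensor range

opt.zero_grad()
pred = model(frames)
loss = nn.functional.l1_loss(pred[valid], target_depth[valid])
loss.backward()
opt.step()
```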
 
I haven’t seen any mention, and my guess is they aren’t using any sort of stereoscopic vision. There are many other ways to derive depth (like people with just one functioning eye don’t just bump into walls).
I had a friend with one working eye. He said he used the sensation of the eye's autofocusing to judge distances. That's something else fixed-focus cameras don't do.

But two eyes are better for driving, no doubt, and I favor high-resolution cameras with overlapping fields of view to obtain physical parallax. The parallax may only work well at shorter distances (you know, where you might hit someone or something), but that direct physical distance measurement would provide an enormous amount of self-supervised data they could use to train the machine-learning nets that estimate distances to arbitrary objects farther away (see the sketch after this post). The whole fleet could provide this, instead of a few instrumented research vehicles with radar and lidar (Tesla has a few of those).

The original camera set and locations were chosen when they were still using radar.
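Why parallax favors short range falls out of the same stereo geometry: for a fixed pixel-matching error, depth uncertainty grows roughly with the square of distance. A small sketch under assumed (not Tesla-specific) values:

```python
# Depth uncertainty from stereo matching error grows ~ Z^2:
#   dZ ≈ Z**2 / (f * B) * e,  where e = matching error in pixels.
# All three constants are illustrative assumptions.
f_px, baseline_m, match_err_px = 1000.0, 0.1, 0.5

def depth_uncertainty(z_m: float) -> float:
    """Approximate depth error at range z_m, in meters."""
    return z_m ** 2 / (f_px * baseline_m) * match_err_px

for z in (5, 20, 50, 100):
    print(f"at {z:3d} m: ±{depth_uncertainty(z):5.1f} m")
# ≈ ±0.1 m at 5 m but ±50 m at 100 m: direct parallax is trustworthy close
# up, which is exactly where it could generate self-supervision labels for
# a longer-range monocular estimator.
```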
 
First, it was my understanding that neural net processing does not occur in the car, but it doesn't make sense to do this kind of processing off-vehicle: there is too much data required for the continuous evaluations that need to be made, and slow or absent cell service would lead to delays in processing, or outright unavailability.

Neural net training happens off-vehicle. Trained neural net(s) are then optimized to run on the car. Different things.
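A minimal sketch of that split, using generic PyTorch tooling as a stand-in for whatever toolchain Tesla actually uses: training produces a frozen artifact in the datacenter, and the car only ever runs local inference on it.

```python
import torch
import torch.nn as nn

# Stand-in for a perception net whose weights were trained off-vehicle
# on fleet data; the architecture here is purely illustrative.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))
model.eval()

# Off-vehicle: freeze the trained graph into a deployable artifact.
scripted = torch.jit.trace(model, torch.rand(1, 3, 96, 128))
scripted.save("depth_net.pt")

# On-vehicle: load the frozen artifact and run inference locally on each
# camera frame. No network connection is involved in this per-frame step.
onboard = torch.jit.load("depth_net.pt")
with torch.no_grad():
    depth = onboard(torch.rand(1, 3, 96, 128))
print(depth.shape)  # torch.Size([1, 1, 96, 128])
```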
 
No. Tesla has never mentioned using stereoscopic vision. Binocular vision means having two eyes; Tesla has more than that. To have stereoscopic vision you typically need identical cameras spaced apart, and Tesla doesn't have that: its three front-facing cameras are all different.
Tesla doesn’t need to have mentioned this. We know that Tesla has depth perception using cameras alone. That means that Tesla is using the cameras stereoscopically.
 
Take a look at the video @enemji posted above. It’s pretty clear that they measure depth via other visual cues, without using binocular vision. It seems like triangulating with two cameras would be a lot simpler and Tesla chose to take the hard route, but maybe it actually wasn’t much more work when coupled with the other algorithms they’re using for other purposes. As a pragmatist, I only care that it works, not how.
 
🤔
 