
Does Tesla have binocular vision?


sleepydoc

There have been numerous posts about phantom braking, removing radar, etc. One point that has been made is that other companies with vision-based systems (e.g. Subaru) use two cameras to allow binocular, stereoscopic vision.

Tesla has more than one forward-facing camera. They are not as widely spaced as Subaru’s, but they are separated, so theoretically Tesla should be able to have some degree of stereoscopic vision. Does anyone know if they do, and/or if they make use of it?
 
The three front cameras have different focal lengths. They could compensate for the distortion to simulate stereo, the way they compensate for the wide-angle camera in the rear. But the effect would be limited to the maximum range of the shortest-focal-length camera.
 
That was my thought. Even if it’s limited to the ‘worst’ camera, it’s still better than nothing. Like @DanCar said, though, I’ve never seen them mention it.
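For a sense of the numbers involved, here is the classic rectified-stereo relation, depth = focal length × baseline / disparity, as a minimal Python sketch. The focal length and baseline below are made-up illustrative values, not Tesla’s camera specs.

```python
# Classic rectified-stereo relation: Z = f * B / d
# f = focal length in pixels, B = baseline in meters, d = disparity in pixels.
# Both constants below are illustrative assumptions, not Tesla camera specs.
f_px = 1000.0      # assumed focal length
baseline_m = 0.1   # assumed spacing between two forward cameras

def depth_from_disparity(d_px: float) -> float:
    """Depth in meters implied by a measured pixel disparity."""
    return f_px * baseline_m / d_px

for d in (20.0, 5.0, 1.0):
    print(f"disparity {d:4.1f} px -> depth {depth_from_disparity(d):6.1f} m")
# With this narrow a baseline, a single pixel of disparity already maps to
# 100 m, so useful stereo depth is confined to the nearer part of the scene.
```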
 
Since Tesla's AI is looking at all three forward-facing cameras, as well as the two pillar cams that have some forward-looking field of view, it is possible that the AI is getting some distance information from parallax between the five cameras. But there is no real way to tell, given how opaque the operation of an AI system is.
 
I recall Karpathy talking about this and it is definitely not stereoscopic or binocular vision.

Found it. Skip to 2:20

 

Thanks for the link - that makes it pretty clear they are not using stereoscopic vision for depth perception. It makes me wonder about some other aspects, though. Karpathy said they were using the neural net to infer distance, presumably using other visual cues the way humans do. First, it was my understanding that neural net processing does not occur in the car, but it doesn't make sense to do this kind of processing off-vehicle: there is too much data required for the continuous evaluations that need to be made, and slow or absent cell service would lead to delays in processing, or outright unavailability.

The other question I have concerns accuracy. In the example image he presented, the NN was very accurate, but then he wouldn't present an image with a lot of mistakes, because that would undermine his position. Even if the NN is 99% accurate in its estimates, the remaining 1% would still be plenty to cause issues. One would assume that they tested and verified a high degree of reliability before deploying the system, but maybe not. Or maybe the reliability ended up being lower than they expected.
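To put a rough number on that worry: at video frame rates, even a small per-frame error rate adds up quickly. The frame rate and the per-frame independence assumption below are illustrative guesses, not Tesla figures.

```python
# Back-of-envelope: what a 1% per-frame error rate implies over an hour.
# 36 fps and per-frame independence are assumptions for illustration only.
fps = 36
error_rate = 0.01
seconds_per_hour = 3600

bad_estimates_per_hour = fps * seconds_per_hour * error_rate
print(f"~{bad_estimates_per_hour:,.0f} erroneous depth estimates per hour")
# ~1,296 per hour. Which is why per-frame accuracy alone is not the whole
# story: temporal filtering across consecutive frames can reject isolated
# outliers before they ever translate into a braking decision.
```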
 
Absolutely agree with you. That said, I am surprised by the direction they took to train the neural net to infer distances. This is like teaching a person with one eye and zero depth perception to learn what depth is by using a stick with measurements marked on it. Simply amazing; I would never have thought of that approach. That is reflected even by the other challengers in the field, namely Waymo, who have gone down the “obvious” path.

All that being said, the fact that they were able to achieve such usability for Autopilot on daily-driven cars makes me believe they will be able to finesse it (train it) just like training a student driver. It will take a few years, but it will be done, and it will be expandable to many uses since there is no reliance on a previously mapped-out HD map to support it.
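The “measuring stick” idea can be made concrete: if instrumented cars carry a range sensor, its measurements can serve as per-pixel ground truth for a camera-only depth network. Below is a toy PyTorch sketch of that training setup; the tiny network, the L1 loss, and the 80 m sensor range are illustrative assumptions, not Tesla’s actual pipeline.

```python
import torch
import torch.nn as nn

# Toy monocular depth regressor: one RGB frame in, per-pixel depth out.
# The architecture is deliberately tiny and illustrative.
class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),  # 1 channel = depth in meters
        )

    def forward(self, x):
        return self.net(x)

model = TinyDepthNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-ins for (camera frame, range-sensor depth map). In the
# "measuring stick" scheme, target_depth would come from radar/lidar
# on instrumented cars; the mask keeps only pixels with a range return.
frames = torch.rand(4, 3, 96, 128)
target_depth = torch.rand(4, 1, 96, 128) * 100.0  # meters
valid = target_depth < 80.0                       # assumed sensor range

opt.zero_grad()
pred = model(frames)
loss = nn.functional.l1_loss(pred[valid], target_depth[valid])
loss.backward()
opt.step()
```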
 
I haven’t seen any mention, and my guess is they aren’t using any sort of stereoscopic vision. There are many other ways to derive depth (like people with just one functioning eye don’t just bump into walls).
I had a friend with one working eye. He said he used the sensation of the eye's autofocusing to judge distances. That's something else fixed-focus cameras don't do.

But two eyes are better for driving, no doubt, and I favor high-resolution cameras with overlapping fields of view to obtain physical parallax. The parallax may only work well at shorter distances (you know, where you might hit someone or something), but that direct physical distance measurement would provide an enormous amount of self-supervised data they could use to train the machine-learning nets that estimate distances to arbitrary objects farther away (see the sketch after this post). The whole fleet could provide this, instead of a few instrumented research vehicles with radar and lidar (Tesla has a few of those).

The original camera set and locations were chosen when they were still using radar.
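Why parallax favors short range falls out of the same stereo geometry: for a fixed pixel-matching error, depth uncertainty grows roughly with the square of distance. A small sketch under assumed (not Tesla-specific) values:

```python
# Depth uncertainty from stereo matching error grows ~ Z^2:
#   dZ ≈ Z**2 / (f * B) * e,  where e = matching error in pixels.
# All three constants are illustrative assumptions.
f_px, baseline_m, match_err_px = 1000.0, 0.1, 0.5

def depth_uncertainty(z_m: float) -> float:
    """Approximate depth error at range z_m, in meters."""
    return z_m ** 2 / (f_px * baseline_m) * match_err_px

for z in (5, 20, 50, 100):
    print(f"at {z:3d} m: ±{depth_uncertainty(z):5.1f} m")
# ≈ ±0.1 m at 5 m but ±50 m at 100 m: direct parallax is trustworthy close
# up, which is exactly where it could generate self-supervision labels for
# a longer-range monocular estimator.
```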
 
First, it was my understanding that neural net processing does not occur in the car, but it doesn't make sense to do this kind of processing off-vehicle: there is too much data required for the continuous evaluations that need to be made, and slow or absent cell service would lead to delays in processing, or outright unavailability.

Neural net training happens off-vehicle. Trained neural net(s) are then optimized to run on the car. Different things.
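A minimal sketch of that split, using generic PyTorch tooling as a stand-in for whatever toolchain Tesla actually uses: training produces a frozen artifact in the datacenter, and the car only ever runs local inference on it.

```python
import torch
import torch.nn as nn

# Stand-in for a perception net whose weights were trained off-vehicle
# on fleet data; the architecture here is purely illustrative.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))
model.eval()

# Off-vehicle: freeze the trained graph into a deployable artifact.
scripted = torch.jit.trace(model, torch.rand(1, 3, 96, 128))
scripted.save("depth_net.pt")

# On-vehicle: load the frozen artifact and run inference locally on each
# camera frame. No network connection is involved in this per-frame step.
onboard = torch.jit.load("depth_net.pt")
with torch.no_grad():
    depth = onboard(torch.rand(1, 3, 96, 128))
print(depth.shape)  # torch.Size([1, 1, 96, 128])
```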
 
No. Tesla has never mentioned using stereoscopic vision. Binocular vision means having two eyes; Tesla has more than that. To have stereoscopic vision you typically need identical cameras spaced apart, and Tesla doesn't have that: its three front-facing cameras are all different.
Tesla doesn’t need to have mentioned this. We know that Tesla has depth perception using cameras alone. That means that Tesla is using the cameras stereoscopically.
 
Take a look at the video @enemji posted above. It’s pretty clear that they measure depth via other visual cues, without using binocular vision. It seems like triangulating with two cameras would be a lot simpler and Tesla chose to take the hard route, but maybe it actually wasn’t much more work when coupled with the other algorithms they’re using for other purposes. As a pragmatist, I only care that it works, not how.
 
🤔
 