
4D vision


shrineofchance


This is 2D vision:


[Attached image]

Image credit: greentheonly.

Projecting out 3D models from detections within individual 2D images.
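To picture what "projecting out" means here, a minimal sketch (the intrinsics and depth are illustrative values, nothing from Tesla's actual stack): take the centre of a 2D detection box plus an estimated depth, and back-project it into 3D camera coordinates with a pinhole model.

```python
import numpy as np

def backproject_detection(box_xyxy, depth_m, K):
    """Lift the centre of a 2D box (pixels) to a 3D point in the camera frame."""
    x1, y1, x2, y2 = box_xyxy
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0      # box centre in pixel coordinates
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Pinhole back-projection: pixel position + depth -> 3D point.
    return np.array([(u - cx) * depth_m / fx,
                     (v - cy) * depth_m / fy,
                     depth_m])

# Hypothetical camera intrinsics and a single detection about 25 m away.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
print(backproject_detection((600, 300, 700, 420), depth_m=25.0, K=K))
```

The weakness being pointed at: each camera and each frame gets lifted to 3D on its own, so the separate per-camera, per-frame models have to be stitched together and smoothed afterwards.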





This is 3D vision:


[Attached images]

Image source: Karpathy at CVPR 2020.

The neural network directly outputs a 3D model from the 8 camera images.
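As a rough sketch of that pattern (the layer sizes and structure are guesses for illustration, not Tesla's actual architecture): one shared backbone runs over every camera image, and the per-camera features are fused into a single output representing the 3D space around the car.

```python
import torch
import torch.nn as nn

class MultiCamFusion(nn.Module):
    """Toy multi-camera network: shared per-camera backbone -> one fused output."""
    def __init__(self, num_cams=8, out_channels=1):
        super().__init__()
        # Small shared backbone applied to each of the 8 camera images.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Naive fusion: concatenate per-camera features and mix with a 1x1 conv.
        # (A real system would project features into a common 3D/top-down space.)
        self.fuse = nn.Conv2d(32 * num_cams, 64, 1)
        self.head = nn.Conv2d(64, out_channels, 1)   # e.g. per-cell occupancy logits

    def forward(self, images):                        # images: (B, 8, 3, H, W)
        b, n, c, h, w = images.shape
        feats = self.backbone(images.view(b * n, c, h, w))
        feats = feats.view(b, -1, feats.shape[-2], feats.shape[-1])
        return self.head(self.fuse(feats))            # single output for all cameras

net = MultiCamFusion()
print(net(torch.randn(1, 8, 3, 128, 256)).shape)      # dummy 8-camera input
```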



This is 4D vision (3D + time):


“[The neural network] has seen the images over time and [it has] done the tracking. And having accumulated information from all those frames, here's actually what the world looks like around you.” -Karpathy


Original source.
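One way to picture the "accumulated information from all those frames" part, as a sketch only (the recurrent cell and feature sizes are my assumptions, not Karpathy's description of the actual network): per-frame features pass through a recurrent module, so each frame's output carries state from the frames before it.

```python
import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Toy '3D + time' module: a GRU carries state across the frame sequence."""
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, frame_feats):            # frame_feats: (B, T, feat_dim)
        fused, _ = self.rnn(frame_feats)       # hidden state accumulates past frames
        return self.head(fused)                # time-aware output for every frame

model = TemporalFusion()
seq = torch.randn(2, 12, 256)                  # 12 frames of (assumed) per-frame features
print(model(seq).shape)                        # torch.Size([2, 12, 256])
```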
 
The temporal part is just prediction.

Nope. Prediction is a different thing. The vision neural networks are using temporal information for perception. Karpathy hasn't elaborated on exactly how they're doing this, but occlusion is one example of how sequences of frames could yield useful information.

If you enforce internal consistency within a sequence of frames, together with the assumption that objects don't just pop in and out of existence at random, you could train the vision NNs to "remember" objects that become occluded by other objects.

Enforcing temporal consistency might give better vision NN predictions overall.
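To make that concrete (pure speculation on my part, not a known Tesla training objective): a consistency term could penalize frame-to-frame jumps in each object's predicted existence score, so a briefly occluded object isn't dropped and then re-detected as a new one.

```python
import torch

def temporal_consistency_loss(existence_logits):
    """existence_logits: (B, T, N) logits that object n exists at frame t."""
    probs = torch.sigmoid(existence_logits)
    # Differences between consecutive frames; big jumps = objects popping in/out.
    frame_diff = probs[:, 1:, :] - probs[:, :-1, :]
    return frame_diff.abs().mean()

# Example: 4 frames, 3 tracked objects; this term would be added to the main loss.
logits = torch.randn(1, 4, 3, requires_grad=True)
loss = temporal_consistency_loss(logits)
loss.backward()
print(float(loss))
```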
 
I don't know what other companies are doing, but Tesla is moving to training on 4D information, i.e. training on temporal sequences of 3D representations. This is essential not just for object permanence but also for depth perception.

Other companies could certainly build the same models. In the past, video-based training has been limited by training and inference compute; transformers may have helped reduce these requirements.

Of course, as with most things in deep learning, 4D transformers probably need lots and lots of data to work well.
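For a sense of what a transformer over video could look like (sizes are illustrative, and this says nothing about Tesla's actual networks): a stock transformer encoder over one feature token per frame lets every frame attend to every other frame in the clip.

```python
import torch
import torch.nn as nn

d_model, num_frames = 128, 16
# Stock PyTorch transformer encoder applied over the time dimension.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)

clip_tokens = torch.randn(1, num_frames, d_model)  # one (assumed) feature token per frame
out = encoder(clip_tokens)                          # each frame attends to all frames
print(out.shape)                                    # torch.Size([1, 16, 128])
```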
 