This is 2D vision:
Image credit: greentheonly.
The 3D model is projected out from detections made within individual 2D images.
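To make "projecting out" concrete, here is a minimal sketch of lifting a single 2D detection into 3D with a plain pinhole camera model. This is a generic textbook illustration, not a description of how Tesla's software does it. The function name `backproject`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and all the numbers are hypothetical, and the sketch assumes a depth estimate for the detection is already available from somewhere else.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at a given depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera with no lens distortion.
    """
    x = (u - cx) * depth / fx  # metres to the right of the optical axis
    y = (v - cy) * depth / fy  # metres below the optical axis
    return (x, y, depth)


# Hypothetical example: a detection centred at pixel (800, 400), estimated
# to be 20 m away, seen by a camera with a 1000 px focal length and a
# principal point at (640, 360).
point = backproject(800, 400, 20.0, 1000.0, 1000.0, 640.0, 360.0)
print(point)
```

Each camera image is handled on its own here, so the per-image results still have to be stitched together afterwards, which is the contrast with the 3D approach.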
This is 3D vision:
Image source: Karpathy at CVPR 2020.
The neural network directly outputs a 3D model from the 8 camera images.
This is 4D vision (3D + time):
“[The neural network] has seen the images over time and [it has] done the tracking. And having accumulated information from all those frames, here's actually what the world looks like around you.” -Karpathy
Original source.