Here is a very recent video going into Lidar perception:
My takeaway:
Current Lidar approaches do their feature engineering online, between the sensor input and the neural network.
Tesla does its feature engineering offline, when it generates the labels. This saves online computation, and no information is lost between the input and the neural network. It also abstracts away the feature engineering, which plays into software 2.0, while Lidar inherently has to be more software 1.0.
At some point (no pun intended) the Lidar data needs to go to a pseudo-camera representation to make it palatable to the neural network, while the camera approach does pseudo-Lidar to generate the labels. If you use a range pseudo-image directly, you need to discretize the ranges, and that loses information. Meanwhile, when the camera does pseudo-Lidar you don't lose data, but you get holes in the output. Those holes can be filled offline using data from the future or from different drives, but you cannot fill them using the future when you run online.
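To make the discretization point concrete, here is a minimal sketch of the range pseudo-image idea. This is my own illustration, not code from the video or from Tesla, and the grid size, vertical field of view, and number of range bins are made-up numbers chosen just to show where information gets thrown away.

```python
import numpy as np

def to_range_image(points, h=64, w=1024, max_range=80.0, range_bins=256):
    """Project an (N, 3) lidar point cloud into an h x w range pseudo-image.

    Two lossy steps: points that land in the same pixel overwrite each other,
    and the surviving range is snapped to one of `range_bins` discrete values.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)                       # true range per point
    azimuth = np.arctan2(y, x)                            # [-pi, pi)
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))

    # Map angles to pixel coordinates (assumed +/-25 deg vertical field of view).
    fov = np.deg2rad(25.0)
    u = ((azimuth + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((fov - elevation) / (2 * fov) * h).clip(0, h - 1).astype(int)

    # Quantize range into discrete bins -- the lossy discretization step.
    bin_width = max_range / range_bins
    r_quantized = np.round(np.clip(r, 0, max_range) / bin_width) * bin_width

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r_quantized    # colliding points silently overwrite each other
    return image

# Example: two points a few centimetres apart in range collapse to one value.
pts = np.array([[10.00, 0.0, 0.0], [10.04, 0.001, 0.0]])
img = to_range_image(pts)
print(np.unique(img[img > 0]))   # a single surviving, quantized range
```

With the coarse bins assumed here the two nearby returns become indistinguishable, which is the kind of information loss the raw point cloud never had.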
The Lidar approach seems less clean to me. It must be easy to work at Tesla: each group can focus on doing what it likes. The point cloud guys can do point clouds, the neural network guys can do neural networks. These groups don't really need to care about what the other group does as long as they have agreed on their interface, which is just labels. Meanwhile, a Lidar team has to do both, and it will keep changing the interface between the data and the neural network as it tries to figure out what the best interface is.