Elon: "Feature complete for full self driving this year"
Again, lidar isn't required for FSD, but it's extremely helpful for labeling objects in camera data, especially for the 3-D labeling approach Elon is talking about. You can use lidar to develop FSD without needing it in the final system.

The whole "Tesla has more data" argument is moot if the data is inaccurate.

Is lidar that helpful for labeling? It can tell you an object is there, but how much does it do beyond that? Or does one use it to subset the image and provide the clipped view for human ID?
 
I think you also need to consider how cost-prohibitive LIDAR is at the moment. The Verge released a video on Waymo in December last year estimating that the sensors and compute for a single test vehicle cost $400,000. That massive up-front investment in sensors means that while Tesla can make money shipping its camera-based vehicles across the world, it would cost Google many millions of dollars just to field a small fleet in all 50 states, let alone in every country.

LIDAR may give Waymo and others the advantage in small-scale rollout, but affordability gives Tesla the advantage in global rollout and global data gathering.

In terms of the cost of Waymo's FSD, there are a few things to bear in mind. First, costs usually come down between the prototype phase and final production. Also, when Waymo started their FSD project, they probably were not 100% sure what hardware they would need, so they likely erred on the side of overkill, knowing they could always cut back, rather than risk not having enough hardware to do FSD at all. Now that Waymo has a functioning robotaxi with lots of autonomous miles, they have a better sense of what they can cut back on while still having safe FSD. And the hardware itself is getting cheaper.

Second, there are two types of lidar. There is the spinning type that Velodyne makes, which you see on the roof of some autonomous cars; that lidar costs about $75k apiece. Then there is the solid-state lidar that companies like Luminar are making, which is smaller and doesn't spin; that type only costs around $500 apiece. So it should be possible to use solid-state units instead of the costly spinning type to drastically bring down the overall cost of the FSD hardware. In fact, cars like the Lucid Air will be equipped with several of the cheaper non-spinning lidar sensors. So I think the overall cost, even with lidar, will be drastically less than $400k.
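To make the price gap concrete, here's a back-of-the-envelope comparison using the rough figures quoted above; the four-sensors-per-car count is a made-up illustration, not a real spec.

```python
# Back-of-the-envelope sensor cost comparison. Per-unit prices are the rough
# figures from this thread; the sensor count per car is an assumption.

spinning_lidar_usd = 75_000   # roof-mounted spinning unit (Velodyne-style)
solid_state_usd = 500         # solid-state unit (Luminar-style)
units_per_car = 4             # hypothetical sensor count per vehicle

print(f"Spinning lidar per car:    ${units_per_car * spinning_lidar_usd:,}")
print(f"Solid-state lidar per car: ${units_per_car * solid_state_usd:,}")
# -> $300,000 vs $2,000
```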

Lastly, there is the question of whether Tesla's hardware is even good enough to do safe FSD at all. Yes, Tesla has a big advantage in terms of global rollout, but if the hardware is not good enough for safe FSD, then that advantage is irrelevant.
 
Is lidar that helpful for labeling? It can tell you an object is there, but how much does it do beyond that? Or does one use it to subset the image and provide the clipped view for human ID?

Lidar is extremely helpful for labeling. Just to give you an example: say you have a pedestrian. The lidar knows the pedestrian's size, distance, and speed quite precisely, and it can persist the pedestrian's volume in the 3-D environment. The human labeler can simply label the front of the pedestrian, and the lidar can take care of the rest of the labels. If the pedestrian turns around and walks in another direction, there's no need to relabel the front of the pedestrian, because the persisted 3-D object was tracked at more than five refreshes a second; all of the camera images of the pedestrian turning and moving in a different direction were properly labeled by the lidar throughout the video input. The human labeler only needed to label the front of the pedestrian once to get a large number of properly labeled pedestrian orientations and speeds from the lidar data.
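A minimal sketch of that auto-labeling idea, assuming a lidar tracker already produces a per-frame 3-D pose for each object. `TrackedObject` and `project_to_image` are hypothetical names for illustration, not anyone's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    """One lidar-tracked object: a 3-D pose per frame (hypothetical format)."""
    track_id: int
    poses: dict  # frame_index -> (x, y, z, heading) from the lidar tracker

def propagate_label(track, human_label, project_to_image):
    """Copy a single human label (e.g. 'pedestrian') onto every frame of a track.

    project_to_image is an assumed helper that projects the tracked 3-D box
    into camera pixel coordinates for a given frame.
    """
    labels = {}
    for frame_idx, pose in track.poses.items():
        bbox = project_to_image(pose, frame_idx)  # 2-D box in that camera frame
        labels[frame_idx] = {"class": human_label, "bbox": bbox, "pose": pose}
    return labels

# One human label fans out across the whole clip:
# labels = propagate_label(track, "pedestrian", project_to_image)
```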
 
Lidar is extremely helpful for labeling. Just to give you an example: say you have a pedestrian. The lidar knows the pedestrian's size, distance, and speed quite precisely, and it can persist the pedestrian's volume in the 3-D environment. The human labeler can simply label the front of the pedestrian, and the lidar can take care of the rest of the labels. If the pedestrian turns around and walks in another direction, there's no need to relabel the front of the pedestrian, because the persisted 3-D object was tracked at more than five refreshes a second; all of the camera images of the pedestrian turning and moving in a different direction were properly labeled by the lidar throughout the video input.

LIDAR may be more data-rich in information about distances, but I think it's data-poor in terms of context. A recent tweet by Karpathy says they are able to anticipate stop-sign placement before the cameras can see the stop sign, by learning the context of the entire environment around the car. I don't know if that same sort of contextual learning could be achieved with a point cloud generated from LIDAR.
 
Lidar is extremely helpful for labeling. Just to give you an example: say you have a pedestrian. The lidar knows the pedestrian's size, distance, and speed quite precisely, and it can persist the pedestrian's volume in the 3-D environment. The human labeler can simply label the front of the pedestrian, and the lidar can take care of the rest of the labels. If the pedestrian turns around and walks in another direction, there's no need to relabel the front of the pedestrian, because the persisted 3-D object was tracked at more than five refreshes a second; all of the camera images of the pedestrian turning and moving in a different direction were properly labeled by the lidar throughout the video input. The human labeler only needed to label the front of the pedestrian once to get a large number of properly labeled pedestrian orientations and speeds from the lidar data.

Hmm... I'm doubtful about lidar's ability to reliably detect a person's rotation. It seems like vision could track just as easily and also detect them turning.
Sounds like the end result is similar to Tesla's labeling approach, but with fewer calculations needed.
 
It seems like vision could track just as easily and also detect them turning.

I agree. It "could." But lidar would be extremely helpful, accurate, and efficient for this type of thing: along with turning vehicles and any moving objects in the 3-D environment.

As for Karpathy's recent tweet about detecting stop signs, you guys need to remember that Teslas have a front-facing radar along with three front-facing cameras. It seems Tesla has put all of its chips on detecting what's in front of the car, treating accurate side and rear distances as less important.
 
LIDAR's advantages are strong:
  • It is extremely reliable at detecting an object of sufficient size within its range, and at providing that object's distance, size and position. Very close to 100% reliable.
  • The result is a 3-D map of the world around you. It is trivial to isolate something from the things behind it (or in front of it).
  • LIDAR uses emitted light, so it works independent of the ambient light. Night or day, clouds or sun, shadows or sunlight, it pretty much sees the same in all conditions.
  • It is robust against interference, and much higher resolution than radar.
  • Some LIDARs can also detect the speed at which a target is moving, using the Doppler effect.
However, there are disadvantages:
  • Initially it was very expensive. High-resolution LIDARs are made in small quantities and cost more than a car. (Newer LIDARs are now appearing at sub-$1,000 price points.)
  • The resolution is pretty modest. The best units get an image only 128 pixels high, though much wider horizontally, at about a 10 Hz rate.
  • Range is limited. Typical LIDARs see well to about 70-100 metres, and get more limited returns from larger objects like cars out to around 100 m. Some now claim 200 m, but this is dubious. 1.5-micron LIDARs, which are even more expensive, can see further.
  • Most LIDARs have moving parts so they can scan the world. Flash LIDARs avoid moving parts but are currently even more expensive. (New-generation solid-state lidars reduce or eliminate moving parts.)
  • Refresh rates tend to be slower. In addition, since LIDARs normally scan a scene, the scene is distorted by the movement of the scanning car and of the objects being scanned: one end is scanned at a different time than the other, and everything has moved in between. (A rough de-skew sketch follows this list.)
  • LIDARs can have trouble in heavy rain, snow and fog, though this is similar to other light-based sensors, including cameras. LIDARs can also sometimes trigger on invisible things like car exhaust.
  • LIDARs are better mounted outside. They need every photon, so you don't want to send them through a windshield with any attenuation.
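On the scan-distortion point above: autonomy stacks typically "de-skew" each sweep using the vehicle's own motion. A minimal constant-velocity sketch, with assumed array shapes rather than any particular sensor's format:

```python
import numpy as np

def deskew_scan(points, timestamps, ego_velocity, scan_end_time):
    """Roughly undo the 'rolling shutter' of a spinning lidar sweep.

    points:        (N, 3) x, y, z returns from one sweep, in the vehicle frame
    timestamps:    (N,) capture time of each return, in seconds
    ego_velocity:  (3,) vehicle velocity in m/s, assumed constant over the sweep
    Shifts every point to where it would sit in the vehicle frame at
    scan_end_time. Vehicle rotation and target motion are ignored here.
    """
    dt = (scan_end_time - timestamps)[:, None]   # seconds until sweep end
    return points - ego_velocity[None, :] * dt   # vehicle closed v*dt of the gap
```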
Cameras
  • Cameras are really inexpensive. The hardware can cost just tens of dollars. You can have lots of them.
  • Because visible-light cameras use reflected light, they can see an arbitrary distance in the daytime if they have a narrow field of view and can be aimed. At night they must use transmitted light -- like your headlights.
  • They see colour. LIDARs just see grayscale in the infrared spectrum.
  • Unless they are aimable they have no moving parts, but if they are aimable they can gain very high resolution on more distant objects. Even in the wide field, cheap cameras with very high resolution are available -- where a LIDAR might see 64 lines, a camera could see 3,000. (A quick angular-resolution comparison follows these lists.)
  • Because of this high resolution, and colour, they are able to understand things about the scene that can't be easily learned from lower-resolution LIDAR.
  • They can see traffic lights, brake lights, turn signals and other emitted light. They are superior for reading signs.
But cameras have a few downsides, the first being a deal-breaker:
  • Computer vision is just not good enough today to detect all important features with the reliability necessary for safe driving.
  • They must deal with lighting variation. Objects are routinely subject to moving shadows and can be lit from any direction -- or not lit at all.
  • They need illumination at night, and headlights might not be enough.
  • Computer vision takes a great deal of CPU or custom chips to get even as far as it does today.
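To put the 64-lines-vs-3,000-rows point above in per-degree terms, a quick illustrative calculation; the fields of view here are assumptions picked for the example, not any specific sensor's spec:

```python
# Illustrative angular-resolution comparison; all numbers are assumptions.

lidar_lines = 64          # vertical channels on a typical spinning lidar
lidar_vfov_deg = 30.0     # assumed vertical field of view
camera_rows = 3000        # rows on a cheap high-resolution image sensor
camera_vfov_deg = 60.0    # assumed lens vertical field of view

print(f"lidar:  {lidar_lines / lidar_vfov_deg:.1f} lines per degree")   # ~2.1
print(f"camera: {camera_rows / camera_vfov_deg:.1f} rows per degree")   # ~50.0
```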
Cameras or Lasers?
 
Lidar is extremely helpful for labeling. Just to give you an example: say you have a pedestrian. The lidar knows the pedestrian's size, distance, and speed quite precisely, and it can persist the pedestrian's volume in the 3-D environment. The human labeler can simply label the front of the pedestrian, and the lidar can take care of the rest of the labels. If the pedestrian turns around and walks in another direction, there's no need to relabel the front of the pedestrian, because the persisted 3-D object was tracked at more than five refreshes a second; all of the camera images of the pedestrian turning and moving in a different direction were properly labeled by the lidar throughout the video input. The human labeler only needed to label the front of the pedestrian once to get a large number of properly labeled pedestrian orientations and speeds from the lidar data.

Is that something you want for ML, though? I'm not an ML expert, but I think the model could pick up on lidar idiosyncrasies in the training set, and it could also become overfit. I know people deliberately inject noise into training to avoid over-fitting and make the model more robust. Also, Tesla says they are labeling video as opposed to still frames, so movement is part of the training set and refresh rate matters.
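That noise-injection idea in a minimal form: jitter the auto-generated labels slightly before training so the model can't latch onto the labeler's idiosyncrasies. The shapes and sigma values here are arbitrary illustration numbers, not anyone's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def jitter_boxes(boxes, pos_sigma_px=2.0, size_sigma=0.02):
    """Add small random noise to auto-generated 2-D boxes before training.

    boxes: (N, 4) array of [x, y, w, h] in pixels. The goal is to keep a model
    trained on very clean lidar-derived labels from overfitting to them.
    """
    boxes = boxes.astype(float).copy()
    boxes[:, :2] += rng.normal(0.0, pos_sigma_px, size=(len(boxes), 2))    # shift centres
    boxes[:, 2:] *= 1.0 + rng.normal(0.0, size_sigma, size=(len(boxes), 2))  # scale sizes
    return boxes
```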
 
I agree. It "could." But lidar would be extremely helpful, accurate, and efficient for this type of thing: along with turning vehicles and any moving objects in the 3-D environment.

As for Karpathy's recent tweet about detecting stop signs, you guys need to remember that Teslas have a front-facing radar along with three front-facing cameras. It seems Tesla has put all of its chips on detecting what's in front of the car, treating accurate side and rear distances as less important.

Lidar is good for letting you know something is probably there. We have dumbed-down versions (laser scanners) for collision avoidance on some products we make.

I doubt the radar is making a difference in the video for stop-sign detection. The vertical plane of a sign would not give a strong return, and the radar itself filters out non-moving objects.
 
But cameras have a few downsides, the first being a deal-breaker:
  • Computer vision is just not good enough today to detect all important features with the reliability necessary for safe driving.
  • They must deal with lighting variation. Objects are routinely subject to moving shadows and can be lit from any direction -- or not lit at all.
  • They need illumination at night, and headlights might not be enough.
  • Computer vision takes a great deal of CPU or custom chips to get even as far as it does today.

Vision processing is not currently good enough, but cameras are great. Cameras have better dynamic range and usable resolution (via optics) than our eyes and need less light.
 
Vision processing is not currently good enough, but cameras are great. Cameras have better dynamic range and usable resolution (via optics) than our eyes and need less light.

Cameras themselves can be great, but if your computer vision is not good enough, it's a moot point. That's why Tesla is so focused on computer vision and machine learning. For a camera-only approach to give you safe FSD, computer vision has to be near perfect. That's also why other companies are supplementing their cameras with lidar: since computer vision is not good enough yet, rather than wait and try to perfect it, they added lidar to get to FSD sooner. Tesla is betting on machine learning to make computer vision good enough that they won't need lidar, but that process takes longer.
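To give a feel for what "near perfect" means numerically, a crude back-of-the-envelope; every number here is an assumption, and treating each frame as an independent chance to miss overstates the requirement since real systems fuse many frames:

```python
# Illustrative only: what 'near perfect' per-frame vision might mean.

fps = 36                        # frames processed per second (assumption)
hours_between_misses = 10_000   # desired hours between critical misdetections

frames = fps * 3600 * hours_between_misses
print(f"Allowed critical-miss rate: ~1 in {frames:,} frames")
# -> roughly 1 in 1.3 billion frames
```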
 
Vision processing is not currently good enough, but cameras are great. Cameras have better dynamic range and usable resolution (via optics) than our eyes and need less light.

On the other hand, if you gave a human the output of a lidar sensor and had them drive the car through a 3-D point cloud, like some kind of bizarre video game, I think they would have a really hard time and make a lot of mistakes. A lot of things look totally unrecognizable, and sometimes the points are too sparse to even tell whether you're looking at a car, especially if it's partially occluded. You lose a lot of the contextual clues you get from vision, like watching how light plays across objects and creates shadows, which gives the brain hints about things that are unseen.
 
On the other hand, if you gave a human the output of a lidar sensor and had them drive the car through a 3-D point cloud, like some kind of bizarre video game, I think they would have a really hard time and make a lot of mistakes. A lot of things look totally unrecognizable, and sometimes the points are too sparse to even tell whether you're looking at a car, especially if it's partially occluded. You lose a lot of the contextual clues you get from vision, like watching how light plays across objects and creates shadows, which gives the brain hints about things that are unseen.

Not sure what's confusing about this:

[video]

Or this:

[video titled "raw data"]
 
That's because you are using your evolved vision system to interpret an isometric view of the point cloud data with infinite persistence.
If we gave you the raw (or color-coded) data set, it would not be so easy.

What are you talking about? Show me the data, then.

You do know the 2nd video is titled "raw data"?

Are you talking about the 01010100101001 data? Ya, I can't read that, obviously.