This always gets me thinking. I get that LIDAR can augment vision and help fill in some gaps, but how can there be a place where vision doesn't work? That would mean humans can't drive there.
I started to write up this answer and it got pretty long, sorry, but I'm posting it anyway! In what follows, I am not advocating for Lidar or for vision-only; I'm just trying to answer your question in a balanced way.
Yes, Lidar operates near (though not usually within) the human visual spectrum, so spectrally it has no significant perceptual benefit over camera vision. And in fact, since almost all modern Lidar systems are monochromatic, Lidar really doesn't see colors or even a grayscale representation of them. Within limits, it can see differences in IR reflectivity and may be able to distinguish stripes on a road or even some signs in the right conditions, but this isn't the main point of the sensor. Think of everything in the image being wrapped in a neutral flat gray shrink wrap, on a heavily overcast day. That's all you can expect, though sometimes you get more. In other words, in a lot of ways it's much more limited than camera vision.
However, I think there are two very significant reasons that AV engineers are interested in it:
1. Lidar provides a three-dimensional map of rather definite and precise voxel-occupancy information, because it's fundamentally designed to do rangefinding. It doesn't just see that something out there is intersecting the beam; it measures the distance to it with good resolution and confidence. The 3D map requires no AI techniques to obtain its fundamental output, though AI interpretation is increasingly being applied, alongside more conventional noise reduction and signal conditioning, to clean up and then interpret features detected in the coordinate space.
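To make that concrete, here's roughly what turning raw returns into an occupancy map looks like. This is a minimal sketch, assuming a generic spinning unit that reports (azimuth, elevation, range) triples; the names and voxel size are illustrative, not any particular product's interface:

```python
import math

VOXEL_SIZE = 0.2  # meters per voxel edge (illustrative choice)

def lidar_return_to_xyz(azimuth_deg, elevation_deg, range_m):
    """Convert one Lidar return (beam angles + measured range) to
    Cartesian coordinates in the sensor frame. The range comes straight
    from the time-of-flight measurement: range = c * t_round_trip / 2."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return x, y, z

def build_occupancy(returns):
    """Mark a voxel occupied for every return. Pure geometry - no
    learned model is involved in producing the 3D map."""
    occupied = set()
    for az, el, rng in returns:
        x, y, z = lidar_return_to_xyz(az, el, rng)
        occupied.add((int(x // VOXEL_SIZE),
                      int(y // VOXEL_SIZE),
                      int(z // VOXEL_SIZE)))
    return occupied

# e.g. a single return 40 m ahead, slightly left, near road level
print(build_occupancy([(5.0, -1.0, 40.0)]))
```

The point of the sketch is just that the "there/not-there" map falls directly out of trigonometry on measured ranges.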
Monocular* camera vision doesn't inherently have this rangefinding capability**. It sees a much more intricate palette of color and texture, but this requires interpretation to assign three-dimensional coordinates, and that interpretation module, these days, will almost certainly be heavy on evolving AI techniques and training. Without some sort of visual interpretive intelligence based on training experience, depth information is very hard to extract. There are ways of categorizing image elements into edges and surfaces and of applying transforms and other math to infer perspective, which can help with ranging, but it's very complicated and far from reliable. Others here are probably better at explaining the various emerging approaches to "Vidar", but I believe it has only become practical because of very modern AI/ML/NN techniques. Again, Lidar provides this information directly, without any fancy interpretation software.
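For a taste of what "depth from one camera" looks like in practice, here's a sketch using the publicly released MiDaS monocular-depth model. The torch.hub entry points follow its README as I recall it (treat them as assumptions), and of course this is nothing like any AV company's production stack - it just shows that the depth comes from a trained network, not from geometry:

```python
import cv2
import torch

# Load a small pretrained monocular-depth network and its preprocessing.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("road.jpg"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(img)  # resize/normalize for the net

with torch.no_grad():
    inv_depth = midas(batch)  # relative inverse depth: larger = closer

# Note: the output is *relative*; absolute scale has to come from
# somewhere else (known camera height, motion, map priors, ...).
print(inv_depth.shape)
```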
* On and off, you see people discuss binocular stereoscopic or multiscopic techniques that one might think could be extracted from the multiple cameras with their (somewhat) overlapping fields of view. This often goes along with the assumption that normal human depth perception must be a key element of driver vision. Actually, this is not really true: human depth perception at the distances important in high-speed driving is extremely limited, because of the short triangulation baseline formed by the distance between our eyes. At greater distances, humans get their depth understanding much more from intelligent knowledge of the scene and its contents, and from the changing perspective cues as they and/or other cars move. This doesn't mean that absolutely no useful information can be extracted from overlapping camera views, but I think it just isn't a significant first- or second-order benefit in machine vision for driving. (See the quick arithmetic below.)
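Quick arithmetic on why the eye baseline runs out of steam. The triangulation angle is roughly theta = b/Z, so depth error grows with the square of distance. Assuming a commonly quoted stereoacuity of about 20 arc-seconds (an assumption for illustration, not a precise figure):

```python
import math

def stereo_depth_uncertainty(z_m, baseline_m, acuity_arcsec):
    """Triangulation depth error: theta ~= b / Z, so
    dZ ~= Z**2 * d_theta / b."""
    d_theta = math.radians(acuity_arcsec / 3600.0)
    return z_m ** 2 * d_theta / baseline_m

EYES = 0.065  # human interpupillary baseline, ~65 mm
for z in (10, 50, 100):
    print(f"at {z:3d} m: +/- {stereo_depth_uncertainty(z, EYES, 20):.2f} m")
# -> ~0.15 m at 10 m, ~3.7 m at 50 m, ~15 m at 100 m
```

Plus or minus 15 meters at 100 meters is several car lengths - effectively no range information at highway distances.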
** Another sometimes-discussed aspect: people know something about autofocus technology in digital still and video cameras, and might naturally assume that some of the associated rangefinding techniques would apply to the video cameras on the car. Again, this does not really translate to the AV situation. The cameras on Teslas, and I believe on most AV cars, have no focus-modulation hardware; they don't need it, because the inherent depth of field is quite large for small-sensor, limited-resolution video cameras. Both of the much-debated "contrast detect" and "phase detect" autofocus methods provide very poor selectivity when the depth of field is large and everything in the scene is more or less in focus already - especially anything many car lengths away. Another way of looking at this: the triangulation baseline of an image-sensor-based camera autofocus system is no greater than the diameter of the lens aperture - a tiny and fairly useless dimension for the cameras we're talking about.
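The depth-of-field point is easy to put numbers on with the standard hyperfocal-distance formula. The lens parameters below are assumed values typical of a small-sensor automotive camera, not specs from any particular vehicle:

```python
def hyperfocal_m(focal_mm, f_number, coc_mm):
    """Hyperfocal distance H = f**2 / (N * c) + f. A lens fixed at H
    renders everything from H/2 to infinity acceptably sharp."""
    return (focal_mm ** 2 / (f_number * coc_mm) + focal_mm) / 1000.0

# Assumed: 6 mm lens, f/2.0, ~5 um acceptable blur circle.
H = hyperfocal_m(6.0, 2.0, 0.005)
print(f"hyperfocal ~{H:.1f} m; sharp from ~{H/2:.1f} m to infinity")
# -> ~3.6 m, so everything beyond ~1.8 m is in focus at once: no focus
#    motor needed, and no usable focus-based range signal either.
```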
2. Lidar is an active-emitting, i.e. self-illuminating, system. It can see at night as well as, or even better than, in the day. It is only looking for the return of its own very specific laser wavelength and can largely reject everything else coming back.
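A back-of-envelope way to see the ambient-rejection benefit, with loudly assumed round numbers (band widths vary by design):

```python
# Why a narrow optical filter helps a Lidar receiver ignore sunlight.
SUN_BAND_NM = 300.0    # rough width of a visible-band camera's sensitivity
LIDAR_FILTER_NM = 1.0  # a narrow interference filter around the laser line

rejection = SUN_BAND_NM / LIDAR_FILTER_NM
print(f"~{rejection:.0f}x less solar background than a broadband camera")
# Time gating (only accepting returns inside the expected round-trip
# window) rejects still more of what's left.
```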
However, direct sun or extremely strong sun reflections are still something of a problem, and there are other possible sources of interference if and when many other cars become Lidar-equipped. I wouldn't focus too much on these points, because there are various ways to mitigate them and the technology continues to evolve and improve.
What may not improve much, however, is that Lidar imaging is quite vulnerable to poor-weather interference. Although the laser's wavelength is just beyond the long end of the visible spectrum, it's still nowhere near as long as conventional or newer HD radar, so it does a much poorer job of seeing through raindrops, snow, dust, etc. I don't know the latest signal-processing advances, but my impression is that wide-spectrum camera vision has a better chance in a rainstorm than does Lidar. (Neither one is much good if you throw mud on the sensor and don't clean it off.)
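The wavelength gap is enormous, which is the whole story here. Assuming the common 77 GHz automotive radar carrier and a common 905 nm Lidar laser:

```python
C = 299_792_458.0  # speed of light, m/s

radar_hz = 77e9               # typical automotive radar carrier
radar_wavelength_m = C / radar_hz
lidar_wavelength_m = 905e-9   # common Lidar wavelength (1550 nm also used)

print(f"radar wavelength: {radar_wavelength_m * 1e3:.1f} mm")  # ~3.9 mm
print(f"radar/lidar ratio: {radar_wavelength_m / lidar_wavelength_m:,.0f}x")
# A ~1 mm raindrop is ~1000x larger than the Lidar wavelength (strong
# scattering) but smaller than the radar wavelength, which mostly
# passes around it.
```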
So I'll try a TL;DR summary:
- Lidar provides a 3D there/not-there coordinate map of space occupancy that doesn't fundamentally depend on any evolving AI smarts. For this specific job it is fundamentally more direct than camera vision, which must infer the same map through heavy processing. For an engineer just starting on the self-driving problem, as well as for engineers tasked with providing the highest possible safety factors, that reliable coordinate map seems, at least initially, an almost irresistible benefit.
- But camera-vision engineers are having good success creating similar "Vidar" 3D maps by applying advanced computation, including AI techniques, to camera imagery.
- Lidar provides its own illumination and is relatively good at rejecting interference from external illumination sources. Night vision is not an issue for Lidar.
- Foul weather is a problem for Lidar, perhaps even more so than for camera vision.