Yes, well, but only Tesla is saying they're doing vision-only FSD. The rest are using lidar, which definitely handles darkness. So again it comes down to: is Tesla seeing something the rest are not? They surely did with the BEV drivetrain, but that doesn't automatically translate to being all-seeing about everything...
I don't disagree that groups using lidar + camera could be seeing problems with the camera at night and deciding to rely more heavily on the lidar when the camera performs poorly, but it seems odd that people building test vehicles would choose to sacrifice camera performance by omitting an inexpensive IR emitter. It is frequently argued that the dominant value of adding lidar is that it increases safety through a redundant and orthogonal sensor platform, and that its not-insignificant cost is small relative to the potential safety benefit. If that's true, then it seems odd to tolerate degradation of the cameras, which are one of those redundant and orthogonal modalities, in order to save a truly insignificant sum.
I sometimes wonder if there aren't different ways of thinking about cameras and lidar, and whether those different framings lead to different assumptions about the relative contributions of the two modalities. Early driving systems, like those used in the DARPA challenges, seemed to rely very heavily on lidar to build a reliable model of the environment, using cameras only to supply information lidar can't capture, like the state of a traffic signal. If the lidar locates the signal but can't read it, it's a fairly simple matter to extract the relevant portion of the camera FOV and evaluate whether the light is red. Under that framing, where lidar does almost all of the work and cameras just fill in a few details, the relative performance of the camera matters much less, and operating at night wouldn't be much different since the lidar is doing the lion's share of the work.

In the middle you have sensor fusion approaches, where every object in the world is independently and probabilistically evaluated using all sensor modalities and a consensus is reached (sketched below). In that scenario every sensor always contributes, but each one is also backed up by the others, so while you don't want any system to be degraded, degradation is on the whole more tolerable.

Then you have the mainly-cameras-supplemented-by-lidar way of thinking, which is the reverse of the first approach and has a reversed set of requirements. Full sensor fusion is the most difficult proposition, because orthogonal sensor modalities have to be fused at a very high level, which means you need independent perception stacks that are individually highly functional. The first and third approaches seem easier because only one of the stacks must be extremely good.
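To make that middle approach concrete, here's a minimal sketch of what object-level (late) fusion might look like, assuming each modality runs its own detector and reports a label with a confidence. The naive-Bayes-style log-odds combination and all the names here are my own illustration, not any particular company's stack:

    from dataclasses import dataclass
    import math

    @dataclass
    class Detection:
        sensor: str        # "camera" or "lidar"
        label: str         # e.g. "pedestrian"
        confidence: float  # detector's P(label is correct), in (0, 1)

    def logit(p: float) -> float:
        return math.log(p / (1.0 - p))

    def fuse(detections: list[Detection], prior: float = 0.01) -> float:
        """Combine per-sensor confidences for one tracked object into a
        single probability, treating the modalities as conditionally
        independent. Each report shifts the log-odds away from the prior."""
        log_odds = logit(prior)
        for d in detections:
            log_odds += logit(d.confidence) - logit(prior)
        return 1.0 / (1.0 + math.exp(-log_odds))

    # One object seen by both modalities; either sensor alone is uncertain,
    # but agreement between them pushes the consensus close to 1.
    track = [Detection("lidar", "pedestrian", 0.70),
             Detection("camera", "pedestrian", 0.90)]
    print(f"fused P(pedestrian) = {fuse(track):.3f}")

The point of fusing at this level is that either modality can carry a track when the other degrades, which is exactly why you wouldn't want to let one of them degrade needlessly.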
I think the first framing is a natural red herring because it appeals to a simple model of driving: the most important thing is to not collide with anything, lidar reliably tells me whether an object is present, therefore lidar is the best way to avoid a collision. It's a red herring because it reduces the task of driving to transportation-without-collisions, which leads to strong, simple, but incorrect assumptions. In 2007 vehicles built on this model were able to navigate an urban environment and perform simple tasks independently, without collisions. But the absence of any of those vehicles from real-world applications a decade later illustrates that merely avoiding collisions is not sufficient, and the other necessary aspects of the task are in fact much more difficult.
The actual requirement a self-driving vehicle must fulfill is to construct and maintain an accurate predictive model of the environment it is operating in. That environment is extraordinarily complex: it includes many rare but important situations, and competent functioning requires things as abstract as anticipating the actions of other drivers, pedestrians, cyclists, animals, and various natural phenomena, as well as recognizing and understanding all manner of objects that might find their way onto a roadway. A car should not brake for a pigeon standing in the road, but it is highly advisable to brake for a comparably sized and oriented trailer hitch lying in the street. A tumbleweed on a highway requires a different response than a comparably sized rock. Cars will encounter all manner of fallen branches and must be able to proceed, and sometimes they will encounter fallen power lines and must be able to not proceed.
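To put the pigeon/trailer-hitch point in code: a planner keyed only on "something occupies this cell", which is all a raw range sensor gives you, can't produce the right behavior for both objects, because the correct response depends on semantic class, not footprint. The classes and responses below are purely illustrative:

    # Two objects with the same ~0.3 m footprint demand opposite responses,
    # so occupancy alone underdetermines the plan; classification is load-bearing.
    RESPONSE_BY_CLASS = {
        "pigeon": "proceed",        # will fly off; hard braking is the real hazard
        "trailer_hitch": "brake",   # same size on lidar, very different outcome
        "tumbleweed": "proceed",
        "rock": "brake",
        "fallen_branch": "proceed",
        "power_line": "stop",       # must not be driven over
    }

    def plan(obstacle_class: str) -> str:
        # Unknown objects get the conservative response.
        return RESPONSE_BY_CLASS.get(obstacle_class, "brake")

    for obj in ("pigeon", "trailer_hitch"):
        print(obj, "->", plan(obj))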
It is true that lidar provides a kind of input that is easy to interpret and act on if the primary objective is to avoid interacting with other objects, and for this reason it was an indispensable contribution to early efforts to build self-driving vehicles. But cameras are by far the richer source of information about the state of the environment, and sophisticated vision processing is not something that can be omitted from a truly competent system.
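"Easy to interpret and act on" is almost literal: for pure collision avoidance, a raw point cloud plus a corridor test is nearly enough, with no learned perception required. A toy version, with made-up geometry and thresholds:

    import numpy as np

    def corridor_is_blocked(points_xyz: np.ndarray,
                            half_width: float = 1.2,
                            stop_distance: float = 12.0,
                            min_height: float = 0.15) -> bool:
        """True if any lidar return falls inside the vehicle's forward
        corridor (vehicle frame, meters: x forward, y left, z up).
        The height floor crudely filters out ground returns."""
        x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
        hits = (x > 0) & (x < stop_distance) & (np.abs(y) < half_width) & (z > min_height)
        return bool(hits.any())

    # One return 8 m ahead near the centerline, 0.5 m up: brake.
    cloud = np.array([[8.0, 0.2, 0.5], [30.0, 5.0, 1.0]])
    print(corridor_is_blocked(cloud))  # True

Notice what this function can't do: it fires identically for the pigeon, the trailer hitch, and the tumbleweed. Everything that distinguishes those cases lives in the vision stack.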
For this reason I don't think sacrificing camera performance is a good idea, especially when it can be supplemented as cheaply as adding an IR emitter. Note that all the hazards I enumerated above are as likely to occur at night as during the day, and all of them are much harder to distinguish with lidar than with a camera.