I see people pondering 'sensor disagreement' as if Tesla were struggling with something completely new and unexplored. Fact is, it's nothing new at all. Sure, an autonomous car for public roads is a new(ish) application, but complex autonomous systems with multiple sensors have been around for decades. So no, it sure as hell isn't impossible to stitch different sensors together - it's been done!
Take aircraft autopilot and autoland systems as a close enough example. There we have not only multiple types of sensors, but also double or triple (redundant) copies of the same sensors - so even more potential disagreement to deal with than just combining cameras with radar. But that's what everyone is doing... except Tesla!
People repeat 'so what happens when the camera says x, radar says y, lidar says z, etc.?' as if it's some profound 'gotcha' question that lays bare the impossibility of the task, and Elon's genius in avoiding the problem. It implies that the more sensors you have, the more disagreement you can have - completely different data from every sensor. But that's not how it works at all! It's more like the opposite - you typically get most sensors in agreement (they are all sensing the same environment, after all) and one or two outliers. In fact, the more sensors you have, the easier it is to identify the outliers/faulty sensors. With enough redundancy to identify outliers with high confidence, you even get self-troubleshooting. Then you decide what to do about it - voting, averaging, taking the most conservative action, etc. With only one sensor, when it's wrong you take the wrong action, blissfully unaware.
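To make the point concrete, here's a toy sketch of that voting idea - not any real autopilot or automotive code, and the sensor names and tolerance are made up for illustration. Readings that stray too far from the median get flagged as outliers, and the sensors that agree are averaged:

```python
from statistics import median

def fuse_readings(readings, max_dev=2.0):
    """Toy outlier-rejecting fusion: flag any sensor that strays more
    than max_dev from the median, then average the ones that agree.
    (Illustrative only - real systems use far more careful logic.)"""
    m = median(readings.values())
    agreeing = {s: v for s, v in readings.items() if abs(v - m) <= max_dev}
    outliers = sorted(set(readings) - set(agreeing))
    fused = sum(agreeing.values()) / len(agreeing)
    return fused, outliers

# Four sensors report distance to an obstacle (metres); 'radar_2' is faulty.
readings = {"camera": 41.8, "radar_1": 42.1, "lidar": 42.3, "radar_2": 7.5}
fused, bad = fuse_readings(readings)
print(fused, bad)  # averages the three agreeing sensors, flags radar_2
```

Notice that with four sensors the faulty one sticks out immediately; with a single sensor there is nothing to compare against, which is the whole point above.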
And ughhh - all this 'yeah but humans are vision only!' is just asinine! Come back when the cameras are moving (or fixed in all the locations needed), stereoscopic, self-cleaning, super high res, high dynamic range and, this is the big one, when the neural network attached to them is even 1% as sophisticated and capable as human brain V1.0! Or is that V1-billion or so - depends how we number the updates.
Yes, 'the only known successful human-level driving system is a human', as somebody put it. Likewise, the only known human-level aircraft piloting system is a human, but I don't see any aerospace companies developing a camera-only autopilot (and the multi-sensor ones are closely approaching human level). It's not always best to copy nature's solutions - see the lack of cars employing legs, or aircraft with flapping wings!
I fully expect other manufacturers to get there first, before Tesla Vision FSD.