Doggydogworld
Active Member
Craddock's points are irrelevant -- he got 6 likes and 3 replies. Musk tweets some random gibberish and gets eleventy billion likes and replies. That's how you win in today's world. That's why Tesla collects billions in FSD revenue while Waymo collects $118.53.

@diplomat33 has shared some posts from Warren Craddock, and they are a really good place to start for understanding ADS. Most of us aren't experts in the field, and it's good to have a basic understanding so we can communicate effectively. There are some things I had the wrong understanding of that are cleared up by his posts, so I will share a few of them, especially because I recently had an unpleasant discussion about perception, sensor fusion, and how it is not the most difficult part of solving autonomous driving.
AV Myth #7: "If two sensors disagree, there's no way to figure out which one to believe." Fact: The disagreements are the *entire point* of using multiple types of sensors. If they never disagreed, then they'd be 100% redundant, and would offer no additional value. 1/n
Radar does well in fog, because fog is essentially transparent at radio wavelengths. Cameras do well in daylight, because the sun is bright. Lidar does well at night, because it produces its own light. Each sensor sees the world differently, and they disagree all the time. 2/n
The disagreements between sensors are intentional and desirable. The disagreements mean the AV gets much richer data than it would from any individual sensor. The sensors are much stronger together than they are individually. 3/n
Sensor fusion is a mature field. Medical imaging is a prominent example, where PET and CT scanners are bundled together. The PET system detects positrons, but can't see the body itself. The CT system sees the body, but can't detect positrons. 4/n https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1987321/
Sensor fusion is never about trusting one sensor over another. Instead, data from multiple sensors is combined and fed into a single neural net, which learns to make the best use of the salient features provided by all the sensors. 5/n https://arxiv.org/pdf/2005.09202.pdf
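To make the "combined and fed into a single neural net" idea concrete, here is a minimal toy sketch of so-called early fusion: feature vectors from two modalities are concatenated into one input and passed through a single layer. Everything here (feature sizes, the random weights, the variable names) is made up for illustration; a real AV stack would use learned backbones and far larger networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-modality embeddings (dims are arbitrary):
camera_features = rng.normal(size=16)  # e.g. output of a camera backbone
radar_features = rng.normal(size=8)    # e.g. output of a radar backbone

# Early fusion: concatenate both modalities into one vector...
fused = np.concatenate([camera_features, radar_features])  # shape (24,)

# ...and let a single (here randomly initialized) layer consume it.
# Training would let this layer learn which features from which
# sensor are salient -- no hand-written "trust sensor A over B" rule.
W = rng.normal(size=(32, fused.size))
hidden = np.maximum(W @ fused, 0.0)  # linear layer + ReLU, shape (32,)

print(fused.shape, hidden.shape)
```

The point of the sketch is only structural: neither sensor is arbitrated against the other; both simply contribute columns of the same input, and gradient descent decides how to weight them.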
An aside: If you believe neural nets are sufficient to drive cars more safely than humans, then you must also believe that neural nets are able to competently combine inputs from multiple sensors -- a vastly simpler task. 6/n
Some of the most exciting work in the AI research community today is explicitly multi-modal, e.g. DeepMind's Gato. An AI system can be much stronger when it is given text, images, sound, etc. all associated with the same event. 7/n https://www.deepmind.com/publications/a-generalist-agent
This is of course also true for humans: we learn more effectively when we're taught something in a way that uses all our senses. Even kindergarteners understand that. 8/n
In summary, we use multiple sensing modalities (cameras, radars, lidars) precisely *because* they sometimes disagree. Together, they provide a fuller and more complete picture of the world than they do individually. And we put it all into the same neural nets, anyway! 9/9