I expect faster reaction times, multi-sensor validation, a wider field of view, constant attention, and a conservative action bias can compensate for an inferior driving policy relative to the average human and lead to fewer accidents.
I also expect the driving policy itself can be made safer than a human's, even without any understanding of causation.
Whether these together produce a car that is 10x, 100x, 1000x or 10000x safer than an average human, I don't know. But I'm sure that understanding why something happened allows for a solution that generalises better: one that can predict further ahead (buying more time to work within the laws of physics) and safely handle scenarios very different from anything it has ever seen. So at some level, causation will add to safety. But it may never be required for Robotaxis.
One of the things I loved when watching the presentation was realizing that the approach they're taking will - without any "logic" being dictated by Tesla - account for my typical examples of problematic edge cases.
I'd often give the example of sheep. If you have a ewe beside the road, you should watch it but not be too concerned. If you have a lamb beside the road, you should tune your caution level up to "moderate". But if there's a lamb on one side and a ewe on the other, you should really turn your caution up to "high" and slow down, because the lamb will almost inevitably run to its mother when a car approaches. I used to point out... is there going to be someone at Tesla programming the nuances of sheep behaviour and teaching it to recognize a ram from a ewe from a lamb? And all of the edge cases like this?
Except no, and there doesn't need to be, because their approach is to let the neural net itself solve for actor intent based on any and all content in the scene, without being told how to expect actors to behave. It's free on its own to determine that if there's a "lamb-looking" actor on one side of the road and a "ewe-looking" actor on the other, the former is likely to run to the latter. No, curating and labeling the dataset that the net needs to train on in order to figure out this sort of thing is not a fully automated, zero-labour task. But it also doesn't require that Tesla program in sheep-behaviour logic and the like.
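To make that contrast concrete, here's a deliberately toy sketch (my own illustration in Python with scikit-learn, not anything from Tesla's actual stack): the rule-based version needs every behavioural nuance written by hand, while the learned version just fits labeled scenes and picks up the lamb-and-ewe interaction on its own.

```python
# Toy illustration only - invented features and labels, nothing from Tesla.

# Hand-coded approach: someone must enumerate every behavioural rule.
def caution_rule_based(actors: set[str]) -> str:
    if {"lamb", "ewe"} <= actors:
        return "high"      # lamb will likely bolt across to its mother
    if "lamb" in actors:
        return "moderate"
    if "ewe" in actors:
        return "low"
    return "minimal"
    # ...and then rams, deer, dogs, children chasing balls, forever.

# Learned approach: label what careful human drivers actually did in
# recorded scenes, and let a model infer the interaction effects.
from sklearn.tree import DecisionTreeClassifier

X = [  # scene features: [lamb_present, ewe_present_on_other_side]
    [0, 0],
    [1, 0],
    [0, 1],
    [1, 1],
]
y = ["minimal", "low", "moderate", "high"]  # observed driver caution

model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.predict([[1, 1]]))  # ['high'] - never written as a rule
```

The point is that the "lamb plus ewe means slow down" rule exists nowhere in the learned version; it falls out of the labeled data.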
My other common example is "adjusting driving cautiousness based on the environment". Is there a shoulder? How stable does that shoulder look? What would your braking distance be on it? If you drive off, will you end up in grass, or off a cliff? How likely is your current driving surface to send you accidentally onto the shoulder? Etc. Again, Tesla doesn't have to spell out all of that nuance in logic - it simply needs to label roads and shoulders (which they apparently already do, from the images they presented), label danger areas, and the neural net itself will learn to recognize them using all of its sensor data (not just cameras). And of course, Tesla gets hints about danger areas from where users tend to disengage the system or at least lower their speed, so it can use these clues to build its training dataset.
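As a sketch of how those disengagement clues could feed a labeling pipeline (all field names and thresholds here are hypothetical, purely to illustrate the idea, not Tesla's telemetry schema):

```python
# Hypothetical sketch - invented fleet-log schema, not Tesla's.
from collections import Counter
from dataclasses import dataclass

@dataclass
class FleetEvent:
    lat: float
    lon: float
    disengaged: bool        # driver took over from the system
    speed_drop_mps: float   # how sharply the driver slowed here

def candidate_danger_cells(events: list[FleetEvent],
                           min_hits: int = 3,
                           cell_deg: float = 1e-3) -> set[tuple[int, int]]:
    """Bucket events into a coarse lat/lon grid; cells where many drivers
    disengaged or braked hard become *candidate* danger-area labels for
    the training set (to be reviewed, not trusted blindly)."""
    hits: Counter[tuple[int, int]] = Counter()
    for e in events:
        if e.disengaged or e.speed_drop_mps > 5.0:
            hits[(round(e.lat / cell_deg), round(e.lon / cell_deg))] += 1
    return {cell for cell, n in hits.items() if n >= min_hits}
```

The net never sees this heuristic; it only sees the resulting labels, which is what lets the same pipeline surface danger patterns nobody thought to enumerate.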
Their approach seems endlessly flexible. That doesn't mean "ready soon", mind you. But it does mean it's a practical one.