More data is not the solution to all problems, and especially not for safety-critical applications. To get ML to "work" you need an enormous amount (petabytes) of carefully curated, labelled examples. You need the right events, across the entire operational design domain (ODD) and in all weather, and even that may not be enough.
Curation has been the bottleneck for self-driving all along. More compute will not solve it, and ML alone is unlikely to provide any functional guarantees anytime soon.
Furthermore, the models cannot be easily validated across a large ODD, and they are prone to dangerous regressions. There is no way to get a logical "proof" of a model's safety or performance. Just look at Tesla's release history: it is nothing like a straight line going up in reliability.
There is plenty of research that might solve a few of these problems, but with respect to more data, the latest results point in the wrong direction: to get linear progress you likely need an exponential amount of training examples. Which, again, is a bottleneck to curate.
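To make that last claim concrete: empirical scaling laws typically find that error falls off as a power law in dataset size, error ≈ a·N^(-α). Inverting that relation, each fixed reduction in error requires multiplying the dataset size by a constant factor, i.e. exponentially more examples. A minimal sketch, where the constants a and α are illustrative assumptions rather than measured values:

```python
# Illustrative power-law scaling: error(N) = a * N ** (-alpha).
# The constants below are assumptions for illustration, not measured values.
a, alpha = 1.0, 0.1

def dataset_size_for_error(err):
    """Invert error(N) = a * N**(-alpha) to get the N needed for a target error."""
    return (a / err) ** (1.0 / alpha)

# Halving the error at each step multiplies the required dataset
# size by a constant factor of 2**(1/alpha) = 1024x per step.
for err in [0.10, 0.05, 0.025]:
    print(f"target error {err:.3f} -> ~{dataset_size_for_error(err):.2e} examples")
```

With these toy constants, each halving of the error demands roughly a thousand times more data, which is exactly the kind of curve that makes curation, not compute, the limiting factor.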