Billions of miles driven are essential for both edge-case identification/data collection and crash-rate verification.
I think the millions of edge cases was referring to solving all signs, not the Stop/Traffic light task.
When it's ready, Tesla can just tell its cars to take pictures of every sign they see where the identification confidence is less than, say, 99% (plus likely other criteria to pre-filter the data). This will leave its annotation team in India (with help from some experts in California) with all the sign edge cases to annotate.
I get what you're going for with the 99%.
The actual confidence / true positive rate is going to need to be a lot higher than 99%.
A different way of thinking about 99% is some variation of:
- doesn't identify 1 stop light in 100 (== run red light!?!)
- doesn't identify 1 stop sign in 100
- identifies 1 stop sign / light in 100 as something different
My commute is pretty short, so I wouldn't see a failure in each direction, or even each day. 99% probably gives me a weekly stop light recognition failure. Throw in the fact that I'm highly likely to be following somebody else, and stop light recognition failing when it matters might mean 1 failure every month or two. That's testable / beta test quality with engineers behind the wheel.
That's not something that's ready to be deployed to the fleet for general usage. That is good enough to run in shadow mode with an indicator on the dash showing a stop light (with information going back to the mothership every time I "run" a red light that shadow mode would have stopped me for, and vice versa).
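To put rough numbers on the commute estimate, here is a back-of-the-envelope sketch in Python. The 10 lights per trip and the 20% "first car at the light" share are my own made-up assumptions, not measurements; swap in your own commute.

```python
def expected_failures(lights_per_trip, trips_per_day, days, accuracy,
                      fraction_that_matters=1.0):
    """Expected number of missed stop lights over the period.

    fraction_that_matters is the (assumed) share of lights where a miss
    has consequences, e.g. when not following another car.
    """
    encounters = lights_per_trip * trips_per_day * days
    return encounters * (1.0 - accuracy) * fraction_that_matters


# Hypothetical short commute: 10 lights each way, 2 trips a day, 5 days a week.
weekly = expected_failures(10, 2, 5, 0.99)
# Assume only 1 light in 5 is one where I'm the lead car.
weekly_leading = expected_failures(10, 2, 5, 0.99, fraction_that_matters=0.2)

print(f"expected misses per week: {weekly:.2f}")
print(f"misses that matter per week: {weekly_leading:.2f}")
print(f"weeks between misses that matter: {1 / weekly_leading:.1f}")
```

With those assumed inputs, 99% works out to about one miss a week, and about one every five weeks when only lead-car lights count.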
I work in a manufacturing industry where (this is my own interpretation), to get people to stop thinking of 99% as really precise, we started measuring in DPM, or Defects Per Million. In DPM, 99% quality is 10,000 wrong per 1M (stop lights in this case).
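The percent-to-DPM conversion is simple arithmetic; a minimal sketch (the function name is just mine for illustration):

```python
def accuracy_to_dpm(accuracy):
    """Defects per million for a success rate given as a fraction (0..1)."""
    # round() absorbs floating-point noise such as 10000.000000000009
    return round((1.0 - accuracy) * 1_000_000)


for acc in (0.99, 0.999, 0.9999):
    print(f"{acc:.2%} quality -> {accuracy_to_dpm(acc):,} DPM")
```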
Once you've got your quality level to 10k DPM, then you Pareto the 10k failures and start fixing those. If you're lucky, the big ones will be measured in 1000's of DPM.
Once those are fixed, and your overall DPM is down to 1k, then you Pareto what's left and start fixing things in the 100's.
Etc.
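For anyone who hasn't done a Pareto before, the loop described above looks roughly like this. The failure causes and counts are invented purely for illustration; I have no idea what the real buckets would be.

```python
def pareto(failure_counts, total_samples):
    """Return (cause, dpm, cumulative_share) rows, biggest cause first."""
    total_failures = sum(failure_counts.values())
    rows = []
    running = 0
    ordered = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    for cause, count in ordered:
        running += count
        dpm = count * 1_000_000 / total_samples
        rows.append((cause, dpm, running / total_failures))
    return rows


# Made-up buckets: 10,000 misses out of 1,000,000 stop lights (10k DPM overall).
counts = {
    "sun glare": 4500,
    "occluded by truck": 3000,
    "unusual mounting": 1500,
    "flashing signal": 700,
    "other": 300,
}
rows = pareto(counts, 1_000_000)
for cause, dpm, cumulative in rows:
    print(f"{cause:<20} {dpm:>7,.0f} DPM   {cumulative:.0%} cumulative")
```

Fix the top couple of rows, re-measure, and repeat on whatever is left.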
My off-the-cuff guess is that stop light recognition is going to need to be single or double digit DPM (4 or 5 9's) to start thinking about using it on the street by car owners as a driver assistance feature. Or some fraction of DPM that rounds to 0 (or 6 9's).
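For reference, here is how "number of 9's" maps to DPM, assuming N nines means a failure rate of 10^-N (so 2 nines = 99%):

```python
def nines_to_dpm(nines):
    """DPM for a quality level with the given number of nines (2 -> 99%)."""
    return 1_000_000 / 10 ** nines


for n in range(2, 7):
    print(f"{n} nines -> {nines_to_dpm(n):g} DPM")
```

By that arithmetic, 4 nines is 100 DPM, 5 nines is 10 DPM, and 6 nines is 1 DPM, so double-digit DPM sits between 4 and 5 nines and single-digit DPM between 5 and 6.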
Kinda like if your credit card fraud detector is 99% quality, then that means you're only rejecting 1 transaction per 100 for fraud that isn't actually fraudulent.
Or 99% quality in check processing at a bank is existentially bad for the bank.