Tracking FSD Feature Complete

He obviously isn't going to be surprised by the corner cases, as handling them was the basis of his entire strategy.

Tesla AP2+ vehicles should now be driving over 0.5 billion miles per month. As Tesla starts focusing on rarer and rarer edge cases, it just needs to scale up its fleet data collection rate and its data annotation partners in India.

Those billions of miles driven were supposed to be a sufficient "training data set", or so we all thought.

I don't think EM thought there were so many edge cases just for stop signs / traffic lights. Anyway, just my tea-leaf reading that his tweet shows a degree of irritation.

While billions of miles driven will surface edge cases - if they are reported / collected - the real usefulness of a large number of miles driven is in establishing true crash rates for AP. The higher the number of miles, the higher the confidence in the crash rate.

BTW, I'm going through what verygreen found about how Tesla seems to collect data - what kinds of triggers they can use / are using. We should understand that better before saying how fast they will be able to solve edge/corner cases.
 
Billions of miles driven are essential both for edge case identification/data collection and for crash rate verification.

I think the "millions of edge cases" was referring to solving all signs, not just the stop sign / traffic light task.
When it's ready, Tesla can just tell its cars to take pictures of every sign it sees where its identification confidence is less than, say, 99% (plus likely other criteria to pre-filter the data). This will leave its annotation team in India (with help from some experts in California) with all the sign edge cases to annotate.
 
You mean confidence > 50% but < 99%? Otherwise they get a ton of other stuff. Even with this, they won't get a good set of data.

One thing someone was suggesting for stop signs, for example, was "car stops for a few seconds and drives on again - but the NN doesn't identify a stop sign". I'm trying to figure out whether their triggering mechanism can catch something like this.
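As a purely illustrative sketch (we don't know Tesla's actual trigger API, so every name and threshold below is an assumption), a trigger combining the confidence band above with that behavioral cue might look like:

```python
# Hypothetical fleet data-collection trigger. Names and thresholds are
# invented for illustration; this is not Tesla's actual implementation.

def should_upload_snapshot(sign_confidence: float,
                           vehicle_stopped: bool,
                           seconds_stopped: float) -> bool:
    """Decide whether a camera snapshot is worth sending back for labeling."""
    # Band: the NN sort-of sees a sign but isn't sure (50-99%) - exactly
    # the ambiguous cases an annotation team would want to label.
    ambiguous_detection = 0.50 <= sign_confidence < 0.99

    # Behavioral cue: the driver stopped briefly and drove on (stop-sign-like
    # behavior) while the NN saw no sign at all - a likely false negative.
    behavioral_mismatch = (sign_confidence < 0.50
                           and vehicle_stopped
                           and seconds_stopped < 10.0)

    return ambiguous_detection or behavioral_mismatch
```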
 
I get what you're going for with the 99%.

The actual confidence / true prediction rate is going to need to be a lot higher than 99%.

A different way of thinking about 99% is some variation of:
- doesn't identify 1 stop light in 100 (== run red light!?!)
- doesn't identify 1 stop sign in 100
- identifies 1 stop sign / light in 100 as something different

My commute is pretty short, so I wouldn't see a failure in each direction, or even each day. 99% probably gives me a weekly stop light recognition failure. Throw in the fact that I'm highly likely to be following somebody else, and stop light recognition failing when it actually matters might mean 1 failure per month or two. That's testable / beta-test quality with engineers behind the wheel.
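To make that back-of-the-envelope concrete (the 10-lights-per-day commute is an assumed figure, not from the post above):

```python
# Expected recognition failures at 99% per-light reliability.
lights_per_day = 10          # assumed short commute
miss_rate = 0.01             # 99% reliability -> 1% misses
failures_per_week = lights_per_day * 7 * miss_rate
print(failures_per_week)     # 0.7 -> roughly one failure per week
```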

That's not something that's ready to be deployed to the fleet for general usage. It is good enough to run in shadow mode with an indicator on the dash showing a stop light (with information going back to the mothership every time I "run" a red light that shadow mode would have stopped me for, and vice versa).


I work in a manufacturing industry where (this is my own interpretation), to get people to stop thinking of 99% as really precise, we started measuring in units of DPM, or Defects Per Million. In DPM, 99% quality is 10,000 defects per 1M (stop lights, in this case).

Once you've got your quality level to 10k DPM, you Pareto the 10k failures and start fixing those. If you're lucky, the big ones will be measured in 1000s of DPM.

Once those are fixed, and your overall DPM is down to 1k, then you Pareto what's left and start fixing things in the 100's.

Etc..
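A toy illustration of that Pareto loop (the failure categories and counts here are entirely made up):

```python
# Illustrative Pareto pass over hypothetical stop-light failure modes.
from collections import Counter

failures_dpm = Counter({
    "occluded by truck": 4200,
    "low-sun glare": 3100,
    "faded / damaged light": 1500,
    "unusual mounting": 800,
    "other": 400,
})  # totals 10,000 DPM == 99% quality

# Attack the biggest buckets first, re-measure, repeat.
for category, dpm in failures_dpm.most_common():
    print(f"{category}: {dpm} DPM")
```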


My off-the-cuff guess is that stop light recognition is going to need to be single- or double-digit DPM (4 or 5 9's) before we start thinking about car owners using it on the street as a driver assistance feature. Or some fraction of a DPM that rounds to 0 (or 6 9's).
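For reference, the mapping between "number of 9's" and DPM (simple arithmetic, nothing assumed):

```python
# "Nines" of reliability expressed as defects per million opportunities.
def nines_to_dpm(nines: int) -> float:
    return 1e6 * 10**-nines

for n in range(2, 7):
    print(f"{n} nines -> {nines_to_dpm(n):g} DPM")
# 2 nines -> 10000, 3 -> 1000, 4 -> 100, 5 -> 10, 6 -> 1
```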


Kinda like if your credit card fraud detector is 99% quality, then that means you're "only" rejecting 1 transaction per 100 as fraudulent when it actually isn't.

Or 99% quality check processing at a bank is existentially bad for the bank.
 
99% doesn't mean it randomly fails on 1 light in 100... It means it recognizes 99 out of 100 light types 100% of the time.

For the lights that FSD does recognize, it will recognize them 100% of the time. People don't understand this. In the same way, I would say Tesla is able to recognize 99.5% of vehicles on the road right now. That means it will recognize those 99.5% of vehicles 100% of the time. The other 0.5% are vehicles it hasn't been adequately trained on.

Once Tesla can achieve 99% reliability for all FSD features, they can easily set up pre-determined routes and have them be 100% reliable.
 
Good point - one needs to know that the car sees the sign or light well in advance of taking action. Just like it is reassuring to see, on your screen, the cars that AP sees.
I completely agree the lights and stop signs should appear on the display so you know the car sees them. However, I couldn't care less about seeing cars on the screen, as that is simply too much distraction.
The only cars I really need to pay attention to are those in my blind spots, cars coming up from behind at high speed, and cars merging onto the highway. I couldn't care less about the rest, since I need to be looking out the window to know if I need to take over.
 
I work in a manufacturing industry where (this is my own interpretation), to get people to stop thinking of 99% as really precise, we started measuring in units of DPM, or Defects Per Million. In DPM, 99% quality is 10,000 defects per 1M (stop lights, in this case).
That's what Tesla does as well. They just call it six 9s: 99.9999%, or 1 error in 1 million.

But what the OP is saying is: if the NN comes back with a confidence of 99% (or let's say 30% to 99%), then those are the items that should go into the training/testing data set, as they are probably edge cases.
 
@neroden Would love to hear your opinions on this Karpathy presentation:

Andrej Karpathy | Multi-Task Learning in the Wilderness

IIRC your stance was that Tesla didn't even know the scope of the problems, let alone understand them... Does this video change (improve) your opinion of them? Are they closer to grasping the problems, if not finding the solutions?
That was a nice presentation. It gave me a better appreciation for the need for more computing power. I don't think HW3.0 will provide the computing power needed for complete FSD (sleep in your car while it drives you), but that is just my uninformed opinion, based on how desktop computing power has grown over the years.

I also don't think Tesla can accurately say when FSD will achieve the more aggressive milestones. Some of the improvement branches may be dead ends, and they may have to trim a branch and try another.

It seems like it is an awesome project to work on for CS people.
 
It's not actually the computer in the car that does most of the computing. At the end of the day, a neural network is just a set of weights; the computer in the car applies those weights to inputs to produce an output (a computationally simple process). Training the network - adjusting those weights based on millions of observations - is the computationally complex part.

During Autonomy Day, Karpathy also mentioned they had been working on a model-training supercomputer. That's where the majority of the computation will occur (along with the labeling of hundreds of millions of training images).
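A toy illustration of the asymmetry (plain NumPy, nothing to do with Tesla's actual stack): inference is one cheap forward pass, while training repeats forward and backward passes over the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer "network": inference is one cheap matrix multiply.
W = rng.normal(size=(10, 4))        # trained weights, shipped to the car
x = rng.normal(size=10)             # input features from the sensors
y = W.T @ x                         # forward pass: this is all the car does

# Training, by contrast, loops forward + backward passes over the data.
# (Tiny synthetic dataset here; the real thing is millions of clips.)
X = rng.normal(size=(1000, 10))
T = X @ rng.normal(size=(10, 4))    # synthetic regression targets
lr = 1e-3
for epoch in range(10):
    for x_i, t_i in zip(X, T):
        pred = W.T @ x_i
        grad = np.outer(x_i, pred - t_i)  # gradient of squared error wrt W
        W -= lr * grad                    # weight update: the costly part
```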
 
Yes, good point - training at headquarters, not in the car.
 
Karpathy is purposely being vague about everything lol. He says, "it's not obvious how to do this," but he doesn't say, "we haven't figured out how to do this."

I think most of the FSD development is still in the voodoo-magic phase and is extremely reliant on a few developers. I hope they're clearly documenting how they're achieving progress, in case Karpathy or another important team member leaves.
 
There are 200+ people on the team. I'm sure a lot of people know what they are doing. The problem would be figuring out the changes that are needed as they go forward.

It seems to me like a long slog. I wrote in some other thread what I think is the process they are using.

It would be good for us to get an understanding of how the dev process really works for new features. Here is what I think, taking stop signs as an example.

A. Initial development
- Collect lots and lots of images with stop signs. They already have a lot of images without stop signs, so those can be used as negative examples.
- Label the images. I've no idea how many images might be needed. Thousands? Hundreds of thousands? Millions? This is the most laborious and time-consuming part of the development.
- Create a new NN "task" which outputs the presence/absence of a stop sign and the distance to the sign.
- Train the NN with the labeled images.
- Write procedural code (heuristics / software 1.0) to use the NN task's output to stop the car at the right place.
- Iterate: collect images, label, train, and optimize the NN.
- Once the initial quality bar is met, include it in the dev build.

B. Test on internal fleet and iteratively fix procedural bugs and optimize NN

C. Include in shadow mode. Whenever shadow mode's decision differs from what the driver does, send the data (see the sketch after this list).

D. Analyze the shadow mode data, include new scenarios where the NN+software doesn't work properly, and optimize and fix the NN/software. This is also a very laborious, time-consuming phase. If shadow mode is operating on even 100k cars, they could send a million data points every week. How do you analyze those and pick the edge cases to include? Having a lot of data is good, but it's only the start. Lots of hard work, time & resources are needed to use that data properly.

E. Include in early release and fix/optimize as in C/D.

F. Release widely to fleet.

As you can see, it is a laborious and time-consuming process. That is why each new feature takes months. As I wrote elsewhere, Tesla tries to get a particular feature to six 9s before releasing it to the fleet (that doesn't mean it is really at six 9s, because as they release widely they will find more bugs and edge cases).
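A minimal sketch of what the step-C disagreement trigger could look like. All names here are invented for illustration; the real mechanism isn't public beyond what verygreen has dug up:

```python
# Hypothetical shadow-mode comparator for step C above. Not Tesla code.

def upload_to_mothership(snapshot: bytes, **metadata) -> None:
    """Stub: in reality this would queue data for upload over WiFi/LTE."""
    print(f"queued {len(snapshot)} bytes with {metadata}")

def shadow_mode_tick(nn_would_stop: bool,
                     driver_stopped: bool,
                     snapshot: bytes) -> None:
    # Disagreement between the NN and the human is a candidate edge case:
    # either a false negative (driver stopped, NN wouldn't have) or a
    # false positive (NN would stop, driver didn't). Both get sent back
    # for triage and labeling.
    if nn_would_stop != driver_stopped:
        upload_to_mothership(snapshot,
                             nn_decision=nn_would_stop,
                             driver_decision=driver_stopped)

shadow_mode_tick(nn_would_stop=False, driver_stopped=True,
                 snapshot=b"\x00" * 1024)
```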
 
A different way of thinking about 99% is some variation of:
- doesn't identify 1 stop light in 100 (== run red light!?!)
- doesn't identify 1 stop sign in 100
- identifies 1 stop sign / light in 100 as something different
When the confidence is 99%, it means that particular object is being identified as a stop sign, but in 1 out of 100 cases it is actually not a stop sign - i.e., it is giving you the false positive error rate.

You want to look at the ones that come in at 1% to 49% (meaning the software will not show them as stop signs). Some of these would actually be stop signs and would form training data (these are the false negatives). Some of them would not be stop signs, and they form the negative training data.

Similarly, if you want to train on false positives, you examine the 50% to 99% ones to find which are actually not stop signs and use them for training. They are probably more worried about false negatives than false positives, though.
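A rough sketch of that bucketing (the confidence bands come straight from the two posts above; everything else is assumed):

```python
# Sort fleet detections into training buckets by NN confidence.
# Purely illustrative; not Tesla code.

def bucket_detection(confidence: float) -> str:
    if confidence < 0.01:
        return "ignore"                     # NN is sure there is no sign
    elif confidence < 0.50:
        return "false_negative_candidates"  # hidden from the driver; some
                                            # are real stop signs -> label
    elif confidence < 0.99:
        return "false_positive_candidates"  # shown as a sign; some aren't
                                            # -> negative training data
    else:
        return "confident"                  # high confidence, low value

for c in (0.005, 0.30, 0.85, 0.999):
    print(c, "->", bucket_detection(c))
```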
 
During his talk, he keeps referring to his team as "20" people, but maybe those are the managers of specific tasks.
Those are probably the people who own specific NN tasks - the ones making choices about optimization, training data, edge cases, what triggers are needed, etc.

There may be more people working on the NNs who support those 20 task owners. There are also people who write the tools for labeling, training, infrastructure, etc.

The 200 number probably includes heuristics developers as well.
 
My S85D (2015) just gave itself a heart attack. I was driving through an intersection (green light) and it started beeping like "collision imminent". A red car showed up on the dash at the same time. I believe some automatic braking started to happen, but then, as I passed through the intersection, all of the warnings went away. I'm sure glad it didn't lock up the brakes on me and that nobody was behind me. The only thing I can imagine is that a bad shadow fooled it, or that it hit some sort of crazy corner case. 4 pm in the afternoon, westbound in the Phoenix area.