You still didn't answer the question: what does "it will take a lot of data" mean?

You need to quantify that statement, otherwise it's useless.

You say that Tesla has a "lot of data" but you don't want to quantify "lot of data". So you are making it about Tesla.

If I had to speculate...

There are at least 6 million intersections in the main 500 or so urbanized areas of the U.S. (so probably even more once you include rural areas).

Every intersection should be "tested" N times. N is related to the dimensionality and complexity of the problem: N ~ 2^D, where D = # of dimensions (treating each one as a yes/no condition).

Dimensions are things like

1) Is it nighttime vs daytime?

2) Raining vs not

3) Foggy vs not

4) Pedestrian crossing vs not

We could probably come up with 100 or so dimensions (just a guess). 2^100 = 1267650600228229401496703205376, or about 1.27 x 10^30.
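To make the 2^D growth concrete, here is a minimal Python sketch that enumerates every combination of the four example conditions listed above and then counts the combinations for 100 dimensions. The names `DIMENSIONS`, `condition_combinations` and `combination_count` are my own, and it assumes every dimension is a simple yes/no flag.

```python
from itertools import product

# The four example conditions from the list above, each treated as a yes/no flag.
DIMENSIONS = ["nighttime", "raining", "foggy", "pedestrian_crossing"]

def condition_combinations(dimensions):
    """Return every on/off combination of the given binary conditions."""
    return [dict(zip(dimensions, flags))
            for flags in product([False, True], repeat=len(dimensions))]

def combination_count(num_dimensions):
    """Number of combinations for D binary dimensions, i.e. 2^D."""
    return 2 ** num_dimensions

combos = condition_combinations(DIMENSIONS)
print(len(combos))             # 16
print(combination_count(100))  # 1267650600228229401496703205376
```

Four dimensions can still be listed by hand (16 cases). At 100 dimensions you can only count them, not enumerate them.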

And right now we are assuming sampling each intersection with each combination of conditions just one time is sufficient. But it probably should be 10, or 100 times.

So 6 million x 100 samples x 1267650600228229401496703205376 combinations.

So we need ~ 7.6 x 10^38 cases.
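As a quick check on that multiplication, here is a small Python sketch using the three factors from the post (6 million intersections, 100 samples per combination, 100 binary dimensions). The constant and function names are mine. Multiplied out directly, the product comes to roughly 7.6 x 10^38.

```python
INTERSECTIONS = 6_000_000          # urban U.S. intersections (figure from the post)
SAMPLES_PER_COMBINATION = 100      # upper end of the "10, or 100 times" guess
NUM_DIMENSIONS = 100               # guessed number of yes/no conditions

def required_cases(intersections, samples_per_combination, num_dimensions):
    """Intersections x samples x 2^D, using exact integer arithmetic."""
    return intersections * samples_per_combination * 2 ** num_dimensions

total = required_cases(INTERSECTIONS, SAMPLES_PER_COMBINATION, NUM_DIMENSIONS)
print(f"{total:.2e}")  # 7.61e+38
```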

Now, most of these cases will be redundant. Probably 99.999999 % will be useless. The problem is, you don't know which ones! So, a test vehicle may not have to save and send the data back to the cloud for training, but the test vehicle has to at least see it and compare it with whatever inference model is running.

This is the *conservative* approach. Waymo, of course, has nothing close to this. Tesla does not either. But we know with say 100,000 or 1,000,000 vehicles (in a few years) in the U.S., most intersections can be sampled 10 to 100 times daily. Then in one year, each intersection could be sampled up to roughly 30,000 times. So we could have about 200 billion intersection samples per year (2 x 10^11).

200 billion (2 x 10^11) still doesn't come anywhere near the 6 million x 100 x 2^100 ≈ 7.6 x 10^38 I came up with. So **no**, Tesla would not have enough data to match my conservative estimate of how much data needs to be seen.
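To put a number on how far apart the two figures are, here is a rough Python sketch. It takes the post's inputs (6 million intersections, about 30,000 samples per intersection per year, 100 samples per combination, 100 binary dimensions) and reports the gap in orders of magnitude. It also applies the 99.999999% redundancy guess from earlier, purely as a what-if. All names are my own.

```python
import math

INTERSECTIONS = 6_000_000
SAMPLES_PER_INTERSECTION_PER_YEAR = 30_000      # upper-end figure from the post
REQUIRED_CASES = INTERSECTIONS * 100 * 2 ** 100  # about 7.6 x 10^38

def orders_of_magnitude_gap(required, available):
    """How many powers of ten separate what is needed from what is available."""
    return math.log10(required / available)

fleet_samples_per_year = INTERSECTIONS * SAMPLES_PER_INTERSECTION_PER_YEAR
gap = orders_of_magnitude_gap(REQUIRED_CASES, fleet_samples_per_year)

# What-if: 99.999999% of cases are redundant, so only 1 in 10^8 matters.
non_redundant_cases = REQUIRED_CASES // 10 ** 8
gap_after_pruning = orders_of_magnitude_gap(non_redundant_cases, fleet_samples_per_year)

print(fleet_samples_per_year)       # 180000000000
print(round(gap, 1))                # 27.6
print(round(gap_after_pruning, 1))  # 19.6
```

So a year of fleet data falls short by roughly 27 to 28 orders of magnitude. Even if you throw away 99.999999% of the cases, the shortfall is still close to 20.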

But of course we hope that each intersection doesn't really need to be sampled 2^100 times. Hopefully there is a bunch of overlap in edge cases. How many orders of magnitude can be chopped off? Only the performance of the algorithms over time will tell us.

Tesla (and maybe in a few years Mobileye) will be able to run their algorithm on every intersection in the U.S. daily. That lets them see both when their training accuracy has reached its asymptote and whether performance on diverse, unseen test data matches performance on the training data.
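The two checks described here (has training accuracy flattened out, and does test accuracy match it) could be sketched like this. This is only an illustration, not anything Tesla or Mobileye is known to run. The function names, thresholds and accuracy values are all made up for the example.

```python
def has_plateaued(history, window=3, tolerance=0.001):
    """True when accuracy changed by less than `tolerance` over the last `window` steps."""
    if len(history) < window + 1:
        return False
    return abs(history[-1] - history[-1 - window]) < tolerance

def generalizes(train_accuracy, test_accuracy, max_gap=0.01):
    """True when test accuracy is within `max_gap` of training accuracy."""
    return abs(train_accuracy - test_accuracy) <= max_gap

# Made-up accuracy curves, purely for illustration.
train_history = [0.90, 0.95, 0.9700, 0.9702, 0.9703, 0.9704]
test_history = [0.85, 0.91, 0.9400, 0.9650, 0.9660, 0.9665]

plateaued = has_plateaued(train_history)
matches = generalizes(train_history[-1], test_history[-1])
print(plateaued, matches)  # True True
```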

Waymo and Cruise **do not and will not** have that. They are banking on most of those intersections not mattering, most interactions not mattering.

Lots of assumptions.