How much data does your HW2+ Tesla upload?

Ask people with wifi monitoring to contribute their upload data?

I’m trying to get a rough ballpark estimate of how much data (in MB or GB) Tesla is collecting from each HW2+ car (HW2 or HW2.5). If you monitor this on your car, please let us know here.

If we can get a rough monthly average, then it’s a simple matter of multiplying it by the HW2+ fleet size to find the total. Since some individual cars may be randomly (or non-randomly) selected to upload more data than normal, it’s important to get as many people as possible to share this info.

We can then use findings from verygreen and others to see how GB of data uploaded might translate into hours of video (and other data). Taking an estimated average driving speed, we can estimate recorded miles based on hours of video uploaded.

Getting a ballpark number is useful for a number of analyses. Perhaps this would help contextualize Jimmy_d’s neural network analysis, since he mentioned AKnet_V9 requires a lot more training data than the previous version of AKnet. It’s also useful for comparing Tesla to other companies like Waymo in terms of how much faster/slower you would infer Tesla is making progress based on first principles.

It’s possible to compare against other numbers like Tesla’s R&D budget and the cost of labelling data. We can get a sense of whether or not Tesla could afford to label all the data it’s collecting.
 
You really need to come up with a TeslaFi-like interface where people could plug in daily totals, I suspect (or at least some sort of Google Doc?). Otherwise tallying up the data is going to get ugly real fast.
 
This comes from a FW report I get every day. I have the car pinned to a specific IP so I know when it is Tx or Rx data. The reports I get are top 20 local IPs for traffic produced in a given 24 hrs. So, if the car falls below the top 20 it is not recorded. These are the days it was recorded, in the near past so to speak. This is data that the car had sent (vs the data it has received, like updates, maps, etc). The VPN setups when I occasionally ping the car from the Tesla app are tiny but they are buried in the Tx data.

8/10/18 - 103.062 Mb
8/11/18 - 120.683 Mb
8/12/18 - 194.324 Mb
9/4/18 - 28.384 Mb
9/5/18 - 80.461 Mb
9/7/18 - 173.708 Mb
9/8/18 - 120.388 Mb
9/12/18 - 167.198 Mb
9/15/18 - 118.832 Mb
9/16/18 - 76.282 Mb
9/21/18 - 351.443 Mb
9/27/18 - 19.866 Mb
9/28/18 - 145.014 Mb
10/5/18 - 11.887 Mb
10/9/18 - 60.596 Mb
10/11/18 - 256.403 Mb
10/12/18 - 645.216 Mb
10/19/18 - 142.184 Mb
10/22/18 - 114.106 Mb
10/28/18 - 27.281 Mb
11/11/18 - 18.925 Mb
 
I averaged this and got 142 MB of upload per day.
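
For anyone who wants to check the average or add their own numbers, here is a minimal Python sketch. The values are the 21 daily totals from the firewall report; the variable names are mine, and treating the "Mb" figures as megabytes is an assumption. Days when the car fell out of the top 20 are not in the log, so this is an average over recorded days only and probably overstates the true daily figure.

```python
from statistics import mean, median

# Daily upload totals from the firewall report (treated as MB; that is an assumption).
daily_upload_mb = [
    103.062, 120.683, 194.324, 28.384, 80.461, 173.708, 120.388,
    167.198, 118.832, 76.282, 351.443, 19.866, 145.014, 11.887,
    60.596, 256.403, 645.216, 142.184, 114.106, 27.281, 18.925,
]

avg_mb = mean(daily_upload_mb)    # arithmetic mean over recorded days only
med_mb = median(daily_upload_mb)  # less sensitive to a few very heavy days

print(f"days recorded: {len(daily_upload_mb)}")
print(f"mean upload:   {avg_mb:.1f} MB/day")
print(f"median upload: {med_mb:.1f} MB/day")
```

The median comes out lower than the mean because a handful of heavy days (10/12/18 in particular) pull the mean up, so it may be worth tracking both as more people contribute.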

~2.8 billion HW2 miles / 31.8 miles per day per car = 88 million days of driving

88 million days of driving * 142 MB of upload per day = 12.5 million GB (12,500 TB; 12.5 PB)

12.5 million GB * 90% of data upload for images (just a guess) = 11.25 million GB

1.18 MB per image * 8 cameras = 9.44 MB per 8-camera snapshot

11.25 million GB / 9.44 MB = 1.2 billion 8-camera snapshots (or 9.5 billion images)
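
The same chain of estimates as a short Python sketch, so the guesses are easy to swap out. Every input is one of the rough figures from this post (the 90% image share is just a guess), the variable names are mine, and I'm using decimal units (1 GB = 1000 MB).

```python
# Back-of-envelope fleet estimate; every input is a rough guess from the post.
hw2_miles = 2.8e9            # ~2.8 billion HW2 miles
miles_per_car_day = 31.8     # average miles per day per car
upload_mb_per_day = 142      # average upload per driving day (MB)
image_fraction = 0.90        # share of upload that is images (just a guess)
mb_per_image = 1.18          # size of one camera image (MB)
cameras = 8

driving_days = hw2_miles / miles_per_car_day
total_upload_mb = driving_days * upload_mb_per_day
image_upload_mb = total_upload_mb * image_fraction

images = image_upload_mb / mb_per_image   # individual camera images
snapshots = images / cameras              # 8-camera snapshots

print(f"driving days:  {driving_days / 1e6:.0f} million")
print(f"total upload:  {total_upload_mb / 1e9:.1f} PB")
print(f"images:        {images / 1e9:.1f} billion")
print(f"snapshots:     {snapshots / 1e9:.1f} billion")
```

Since the result is a straight product of the inputs, it scales linearly with each one: halving the daily upload or the image share halves the image count.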

———————

If accurate, this would be almost 3x larger (2.7x) than Facebook’s database of 3.5 billion images, which is the largest I’ve heard of. But Facebook’s database was geared toward classifying the 1000 semantic classes of the ImageNet challenge; an autonomous car might use something like 50-100 semantic classes. So this hypothetical Tesla database would have something like ~30x to ~60x the number of images per semantic class.

Plus, Facebook’s dataset uses “weakly supervised” learning. Rather than being professionally labelled, Facebook just pulled the images from Instagram and used hashtags as the labels.

If Tesla really has uploaded ~9 billion images, the bigger difficulty and cost would be labelling them.
 
Let’s say 70% of AKnet’s classifications are correct, which is about GoogLeNet/Inception v1’s top-1 accuracy on ImageNet. Let’s say for a correct classification, it only takes 5 seconds on average for a human labeler to look at the object and click the green checkmark (or whatever). In the other 30% of cases, let’s say it takes 15 seconds on average to select the correct label. (Which, using GoogLeNet’s top-5 accuracy of almost 90%, will be one of the first 5 items in the drop-down menu in 20% of cases.)

Let’s say each image has 5 objects on average. 5 objects * 70% = 3.5 correctly labelled objects. 3.5 correctly labelled objects * 5 seconds = 17.5 seconds

5 objects * 30% = 1.5 mislabelled objects. 1.5 * 15 seconds = 22.5 seconds

17.5 seconds + 22.5 seconds = 40 seconds per image on average

1 hour / 40 seconds = 90 images per hour

9 billion images / 90 images per hour = 100 million hours of labour

100 million hours of labour * $11/hour (California minimum wage) = $1.1 billion
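
Here is the labelling estimate as a small Python function, in case anyone wants to plug in their own guesses. The function and parameter names are mine (hypothetical), and the defaults are just the assumptions stated above: 5 objects per image, 70% correct, 5 seconds to confirm, 15 seconds to fix, $11/hour.

```python
def labelling_estimate(images, objects_per_image=5, accuracy=0.70,
                       confirm_s=5, correct_s=15, wage_per_hour=11.0):
    """Return (seconds per image, images per hour, labour hours, cost in dollars)."""
    right = objects_per_image * accuracy        # objects the network got right
    wrong = objects_per_image * (1 - accuracy)  # objects a human has to relabel
    seconds_per_image = right * confirm_s + wrong * correct_s
    images_per_hour = 3600 / seconds_per_image
    hours = images / images_per_hour
    return seconds_per_image, images_per_hour, hours, hours * wage_per_hour


secs, rate, hours, cost = labelling_estimate(9e9)
print(f"{secs:.0f} s per image, {rate:.0f} images per hour")
print(f"{hours / 1e6:.0f} million hours, ${cost / 1e9:.2f} billion")
```

Passing a different `wage_per_hour` or `accuracy` shows how sensitive the total is to each guess; the cost is linear in the wage and in the number of images.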

$1.1 billion / 8 quarters (Q4 2016 to Q4 2018) = $137.5 million per quarter

Tesla’s R&D budget is about $350 million per quarter. So it wouldn’t be impossible for Tesla to pay this much, although $137.5 million per quarter intuitively seems too high to me.

I wonder if in reality this work is outsourced to countries where wages are much lower. The Information reported that Tesla uses third-party labelling companies, but it didn’t give any details beyond that.

In India, for example, a typical wage for call centre employees is $2/hour. So, at India wages rather than California wages, 100 million hours of labour would cost $200 million in salaries, or $25 million per quarter over 8 quarters. Only 7% of the budget.

I could be greatly underestimating or overestimating the time it takes to label images. Probably AKnet’s top-1 accuracy on Tesla HW2 images is better than GoogLeNet’s top-1 accuracy on ImageNet, since 1) there are probably something like 1/10th to 1/20th the number of semantic classes in Tesla’s dataset as in ImageNet and 2) you train ImageNet challengers on 1 million images, whereas I’m postulating here that AKnet is trained on billions of images. Also, 3) the offline version of AKnet is more accurate than the online version.

On the other hand, it might be hard to maintain a 5-second pace for correctly labelled images over hours. There will be outliers where objects take minutes to label, and I don’t know how that will impact the average. On the other other hand, you could probably hit that green checkmark in about 1-2 seconds in a lot of cases.

Upshot from an upload and labelling perspective is that 9 billion labelled training images doesn’t seem like an unrealistic amount.

Now, how many GPUs would it take to train a neural network on 9 billion images?

Facebook used 336 GPUs to train ResNeXt-101 32x48d on 3.5 billion images over 22 days.

An Nvidia V100 GPU costs $19,000. Let’s say you want 3024 GPUs so you can train a network on ~10 billion images in about a week. 3024 * $19,000 = $57 million. No biggie with Tesla’s budget, assuming you accumulate these GPUs over time as you need them (i.e. as your dataset grows). Over 8 quarters, it’s $7 million per quarter.
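
A quick Python sketch of the scaling behind that "about a week" figure. It assumes training time grows linearly with the number of images and shrinks linearly with the number of GPUs, which is optimistic (real multi-GPU scaling is rarely perfect), and it assumes the same network and the same kind of GPU as the Facebook run. The names are mine.

```python
# Reference point from the post: 336 GPUs, 3.5 billion images, 22 days.
REF_GPUS, REF_IMAGES, REF_DAYS = 336, 3.5e9, 22


def training_days(images, gpus):
    """Days to train, assuming perfectly linear scaling in images and GPUs."""
    return REF_DAYS * (images / REF_IMAGES) * (REF_GPUS / gpus)


gpus = 3024              # 9x the reference GPU count
price_per_gpu = 19_000   # V100 price quoted in the post
days = training_days(10e9, gpus)
hardware_cost = gpus * price_per_gpu

print(f"{days:.1f} days on {gpus} GPUs")
print(f"hardware: ${hardware_cost / 1e6:.1f} million")
```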
 
There's a thread on Reddit; maybe you can also ask there: "Wifi traffic on my Model X before/after V9" (r/teslamotors).

Thanks!

1 GB / 29 days = 34.5 MB per day

(this is similar to the 30 MB per day average reported last year on Electrek)

34.5 MB / 142 MB = 24%

9.5 billion images * 24% = 2.3 billion images

So, Tesla might have ~2 billion images rather than ~9 billion.
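
The rescaling is just a ratio, so the two data points give a low and a high estimate. A minimal sketch, with my own variable names and 1 GB taken as 1000 MB:

```python
reddit_mb_per_day = 1000 / 29   # 1 GB over 29 days (Reddit data point)
forum_mb_per_day = 142          # average from the firewall log in this thread
images_high = 9.5e9             # estimate based on 142 MB per day

ratio = reddit_mb_per_day / forum_mb_per_day
images_low = images_high * ratio

print(f"ratio: {ratio:.0%}")
print(f"range: {images_low / 1e9:.1f} to {images_high / 1e9:.1f} billion images")
```

With only two cars the range is wide, which is why more data points would help.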

It’s also possible that a lot less than 90% of the uploaded data is still images. There could be a lot of video uploads, for example.
 
1 hour / 40 seconds = 90 images per hour

According to the New York Times, people on Mechanical Turk labelled the ImageNet images at an average rate of 3000 per hour, or 50 per minute. I don't even know how that's possible. That's an image every 1.2 seconds.

This sounds fishy to me because it seems faster than humanly possible. Human reaction time is 0.2-0.3 seconds, and even if you type at 100 words per minute, that's 0.6 seconds to type one word. So it should take around 0.9 seconds just to notice the image and type the label, leaving only 0.3 seconds to recognize the image and think of the word. This doesn't seem possible.

an autonomous car might use something like 50-100 semantic classes.

Apparently the Tsinghua-Tencent 100k benchmark dataset has 128 classes of traffic sign, although only 45 of them are non-rare. Still, lvl5's 40 semantic classes is definitely too few. I'm going to change my guess to ~200 semantic classes. 128 types of traffic sign and 72 other categories of object.
 
 
Nvidia says they have 1,500 people labeling images and they can label 1 million images per month (about 3 images per person per hour, working 8-hour shifts 7 days a week).
Based on your 5-second calculation, they should have been able to label a ridiculous 259,200,000 images per month.
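
To put the labelling rates in this thread side by side, here is a rough Python sketch. It assumes a 30-day month with 8-hour shifts every day (my assumption, to match the numbers above), and the names are mine. As I read the earlier post, 5 seconds was the time per correctly labelled object and 40 seconds was the per-image average, so both are shown.

```python
labelers = 1500
nvidia_images_per_month = 1_000_000
hours_per_day = 8
days_per_month = 30   # assumption: 7-day weeks, 30-day month

person_hours = labelers * hours_per_day * days_per_month

# Rate implied by Nvidia's own figures, in images per person-hour.
nvidia_rate = nvidia_images_per_month / person_hours

# Monthly output of the same workforce at the rates discussed in the thread.
monthly_at_5s = (3600 // 5) * person_hours    # 5 seconds per image
monthly_at_40s = (3600 // 40) * person_hours  # 40 seconds per image

print(f"Nvidia: {nvidia_rate:.1f} images per person-hour")
print(f"at 5 s per image:  {monthly_at_5s:,} images per month")
print(f"at 40 s per image: {monthly_at_40s:,} images per month")
```

Even at 40 seconds per image the same workforce would label far more than 1 million images a month, so Nvidia's labelling presumably involves much more work per image than confirming a class.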