How much data does your HW2+ Tesla upload?

Discussion in 'Autonomous Vehicles' started by strangecosmos, Nov 12, 2018.

  1. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    I’m trying to get a rough ballpark estimate of how much data (in MB or GB) Tesla is collecting from each HW2+ car (HW2 or HW2.5). If you monitor this on your car, please let us know here.

    If we can get a rough monthly average, then it’s a simple matter of multiplying it by the HW2+ fleet size to find the total. Since some individual cars may be randomly (or non-randomly) selected to upload more data than normal, it’s important to get as many people as possible to share this info.

    We can then use findings from verygreen and others to see how GB of data uploaded might translate into hours of video (and other data). Taking an estimated average driving speed, we can estimate recorded miles based on hours of video uploaded.

    Getting a ballpark number is useful for a number of analyses. Perhaps this would help contextualize Jimmy_d’s neural network analysis, since he mentioned AKnet_V9 requires a lot more training data than the previous version of AKnet. It’s also useful for comparing Tesla to other companies like Waymo, in terms of how much faster or slower you would infer Tesla is making progress based on first principles.

    It’s possible to compare against other numbers like Tesla’s R&D budget and the cost of labelling data. We can get a sense of whether or not Tesla could afford to label all the data it’s collecting.
     
  2. verygreen

    verygreen Curious member

    Joined:
    Jan 16, 2017
    Messages:
    2,054
    Location:
    TN
    You really need to come up with a TeslaFi-like interface where people could plug in daily totals, I suspect (or at least some sort of Google Doc?). Otherwise tallying up the data is going to get ugly real fast.
     
  3. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    Hmm... Well if 10 people tell me how much data their car uploaded in 1 week/1 month, I can just average that. That’s just 10 numbers.
     
  4. r0xx0r

    r0xx0r Member

    Joined:
    Jul 9, 2016
    Messages:
    267
    Location:
    CA
    I drive ~ 50-60 miles on Autopilot per day.

    Before v9: ~ 200-300MB upload.
    After: ~ 40-70MB.
     
    • Like x 1
  5. verygreen

    verygreen Curious member

    Joined:
    Jan 16, 2017
    Messages:
    2,054
    Location:
    TN
    did you install v9 "out of band" at any point in time?
     
  6. pyraca

    pyraca Member

    Joined:
    Mar 3, 2018
    Messages:
    14
    Location:
    San Jose, CA
    This comes from a FW report I get every day. I have the car pinned to a specific IP, so I can tell when it is transmitting (Tx) or receiving (Rx) data. Each report lists the top 20 local IPs by traffic over a given 24 hours, so if the car falls below the top 20, that day is not recorded. Below are the recent days on which it was recorded. This is data that the car sent (as opposed to data it received, like updates, maps, etc.). The VPN setups from when I occasionally ping the car through the Tesla app are tiny, but they are included in the Tx data.

    8/10/18 - 103.062 Mb
    8/11/18 - 120.683 Mb
    8/12/18 - 194.324 Mb
    9/4/18 - 28.384 Mb
    9/5/18 - 80.461 Mb
    9/7/18 - 173.708 Mb
    9/8/18 - 120.388 Mb
    9/12/18 - 167.198 Mb
    9/15/18 - 118.832 Mb
    9/16/18 - 76.282 Mb
    9/21/18 - 351.443 Mb
    9/27/18 - 19.866 Mb
    9/28/18 - 145.014 Mb
    10/5/18 - 11.887 Mb
    10/9/18 - 60.596 Mb
    10/11/18 - 256.403 Mb
    10/12/18 - 645.216 Mb
    10/19/18 - 142.184 Mb
    10/22/18 - 114.106 Mb
    10/28/18 - 27.281 Mb
    11/11/18 - 18.925 Mb
     
    • Informative x 1
    • Love x 1
  7. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    Is this daily upload?

    Thank you so much!
     
  8. r0xx0r

    r0xx0r Member

    Joined:
    Jul 9, 2016
    Messages:
    267
    Location:
    CA
    Yes. I followed your instruction :) Thanks again.

    Yes, it is. And it also downloaded 215MB at once last week, 1.16GB two weeks ago. I think they are map updates.
     
    • Helpful x 1
  9. verygreen

    verygreen Curious member

    Joined:
    Jan 16, 2017
    Messages:
    2,054
    Location:
    TN
    Well, I think there were at least some incidents of trigger-blacklisting on cars with unapproved updates, so your data might not be representative. (Clearly there are some uploads, but the level did drop, and they might be non-AP "monitoring" uploads instead.)
     
    • Helpful x 1
    • Like x 1
  10. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    #10 strangecosmos, Nov 21, 2018
    Last edited: Nov 21, 2018
    I averaged this and got 142 MB of upload per day.
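
    For anyone who wants to check the average, here is a minimal Python sketch. It assumes the "Mb" figures in pyraca's report are megabytes (MB); the variable names are just illustrative.

```python
# Daily upload totals from pyraca's report, in the order posted.
# Assumption: the "Mb" values are megabytes (MB), not megabits.
daily_upload_mb = [
    103.062, 120.683, 194.324, 28.384, 80.461, 173.708, 120.388,
    167.198, 118.832, 76.282, 351.443, 19.866, 145.014, 11.887,
    60.596, 256.403, 645.216, 142.184, 114.106, 27.281, 18.925,
]

# Plain arithmetic mean over the recorded days.
mean_mb = sum(daily_upload_mb) / len(daily_upload_mb)
print(f"{len(daily_upload_mb)} recorded days, mean {mean_mb:.0f} MB per day")
```

    Keep in mind the report only lists days when the car ranked in the top 20 local IPs, so this is a mean over recorded days, not over every calendar day.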

    ~2.8 billion HW2 miles / 31.8 miles per day per car = 88 million days of driving

    88 million days of driving * 142 MB of upload per day = 12.5 million GB (12,500 TB; 12.5 PB)

    12.5 million GB * 90% of data upload for images (just a guess) = 11.25 million GB

    1.18 MB per image * 8 cameras = 9.44 MB per 8-camera snapshot

    11.25 million GB / 9.44 MB = 9.5 billion images (or 1.2 billion 8-camera snapshots)
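
    The chain of estimates above can be written as a short Python sketch so the inputs are easy to swap out. Every input is one of the rough figures from this post (the 90% image share is just a guess), decimal units (1 GB = 1000 MB) are assumed, and the variable names are illustrative.

```python
MB_PER_GB = 1000  # assumption: decimal units, matching the arithmetic above

hw2_miles = 2.8e9          # ~2.8 billion HW2 miles
miles_per_car_day = 31.8   # miles per day per car
upload_mb_per_day = 142    # averaged daily upload per car
image_share = 0.90         # guessed share of upload that is still images
mb_per_image = 1.18        # size of one camera image
cameras = 8                # cameras per snapshot

car_days = hw2_miles / miles_per_car_day               # days of driving
total_gb = car_days * upload_mb_per_day / MB_PER_GB    # all uploaded data
image_gb = total_gb * image_share                      # image data only
images = image_gb * MB_PER_GB / mb_per_image           # single-camera images
snapshots = images / cameras                           # 8-camera snapshots

print(f"{car_days / 1e6:.0f} million car-days")
print(f"{total_gb / 1e6:.1f} million GB uploaded")
print(f"{images / 1e9:.1f} billion images, {snapshots / 1e9:.1f} billion snapshots")
```

    Changing `upload_mb_per_day` or `image_share` shows how sensitive the final image count is to those two guesses.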

    ———————

    If accurate, this would be almost 3x larger (2.7x) than Facebook’s database of 3.5 billion images, which is the largest I’ve heard of. But Facebook’s database was geared toward classifying the 1000 semantic classes of the ImageNet challenge; an autonomous car might use something like 50-100 semantic classes. So this hypothetical Tesla database would have something like ~30x to ~60x the number of images per semantic class.

    Plus, Facebook’s dataset uses “weakly supervised” learning. Rather than being professionally labelled, Facebook just pulled the images from Instagram and used hashtags as the labels.

    If Tesla really has uploaded ~9 billion images, the bigger difficulty and cost would be labelling them.
     
  11. Anner J. Bonilla

    Joined:
    Oct 10, 2014
    Messages:
    104
    Location:
    Miami FL
    • Helpful x 1
  12. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    #12 strangecosmos, Nov 21, 2018
    Last edited: Nov 21, 2018
    Let’s say 70% of AKnet’s classifications are correct, which is about GoogLeNet/Inception v1’s top-1 accuracy on ImageNet. Let’s say for a correct classification, it only takes 5 seconds on average for a human labeler to look at the object and click the green checkmark (or whatever). In the other 30% of cases, let’s say it takes 15 seconds on average to select the correct label. (Which, using GoogLeNet’s top-5 accuracy of almost 90%, will be one of the first 5 items in the drop-down menu in 20% of cases.)

    Let’s say each image has 5 objects on average. 5 objects * 70% = 3.5 correctly labelled objects. 3.5 correctly labelled objects * 5 seconds = 17.5 seconds

    5 objects * 30% = 1.5 mislabelled objects. 1.5 * 15 seconds = 22.5 seconds

    17.5 seconds + 22.5 seconds = 40 seconds per image on average

    1 hour / 40 seconds = 90 images per hour

    9 billion images / 90 images per hour = 100 million hours of labour

    100 million hours of labour * $11/hour (California minimum wage) = $1.1 billion

    $1.1 billion / 8 quarters (Q4 2016 to Q4 2018) = $137.5 million per quarter

    Tesla’s R&D budget is about $350 million per quarter. So it wouldn’t be impossible for Tesla to pay this much, although $137.5 million per quarter intuitively seems too high to me.

    I wonder if in reality this work is outsourced to countries where wages are much lower. The Information reported that Tesla uses third-party labelling companies, but it didn’t give any details beyond that.

    In India, for example, a typical wage for call centre employees is $2/hour. So, at India wages rather than California wages, 100 million hours of labour would cost $200 million in salaries, or $25 million per quarter over 8 quarters. Only 7% of the budget.
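
    Here is a rough Python sketch of the labelling-cost estimate, so the guesses can be changed in one place. The function name `labelling_estimate` and its parameters are hypothetical; the default values are the assumptions from this post, not measured numbers.

```python
def labelling_estimate(images, hourly_wage, objects_per_image=5,
                       top1_accuracy=0.70, sec_confirm=5, sec_fix=15):
    """Return (seconds per image, total labour hours, total wage cost in USD)."""
    # Expected time per object: confirm a correct label or fix a wrong one.
    sec_per_image = objects_per_image * (
        top1_accuracy * sec_confirm + (1 - top1_accuracy) * sec_fix
    )
    hours = images * sec_per_image / 3600
    return sec_per_image, hours, hours * hourly_wage


IMAGES = 9e9  # ~9 billion images, the rough figure used above

sec_ca, hours_ca, cost_ca = labelling_estimate(IMAGES, hourly_wage=11)  # California
sec_in, hours_in, cost_in = labelling_estimate(IMAGES, hourly_wage=2)   # India

print(f"{sec_ca:.0f} s per image, {hours_ca / 1e6:.0f} million hours")
print(f"At $11/hour: ${cost_ca / 1e9:.1f} billion; at $2/hour: ${cost_in / 1e6:.0f} million")
```

    The result scales linearly with seconds per object, so halving the 5-second and 15-second guesses halves the cost.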

    I could be greatly underestimating or overestimating the time it takes to label images. Probably AKnet’s top-1 accuracy on Tesla HW2 images is better than GoogLeNet’s top-1 accuracy on ImageNet, since 1) there are probably something like 1/10th to 1/20th the number of semantic classes in Tesla’s dataset as in ImageNet and 2) you train ImageNet challengers on 1 million images, whereas I’m postulating here that AKnet is trained on billions of images. Also, 3) the offline version of AKnet is more accurate than the online version.

    On the other hand, it might be hard to maintain a 5-second pace for correctly labelled images over hours. There will be outliers where objects take minutes to label, and I don’t know how that will impact the average. On the other other hand, you could probably hit that green checkmark in about 1-2 seconds in a lot of cases.

    Upshot from an upload and labelling perspective is that 9 billion labelled training images doesn’t seem like an unrealistic amount.

    Now, how many GPUs would it take to train a neural network on 9 billion images?

    Facebook used 336 GPUs to train ResNeXt-101 32x48d on 3.5 billion images over 22 days.

    An Nvidia V100 GPU costs $19,000. Let’s say you want 3024 GPUs so you can train a network on ~10 billion images in about a week. 3024 * $19,000 = $57 million. No biggie with Tesla’s budget, assuming you accumulate these GPUs over time as you need them (i.e. as your dataset grows). Over 8 quarters, it’s $7 million per quarter.
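
    A quick Python sketch of the GPU estimate. It assumes training time scales linearly with the number of images and inversely with the number of GPUs, which is a simplification; the names are illustrative.

```python
# Reference point quoted above: 336 GPUs, 3.5 billion images, 22 days.
REF_GPUS, REF_IMAGES, REF_DAYS = 336, 3.5e9, 22
GPU_PRICE_USD = 19_000  # V100 price quoted above


def training_days(images, gpus):
    """Days to train, assuming perfectly linear scaling (an assumption)."""
    gpu_days_per_image = REF_GPUS * REF_DAYS / REF_IMAGES
    return images * gpu_days_per_image / gpus


gpus = 3024
days = training_days(10e9, gpus)     # ~10 billion images
hardware_cost = gpus * GPU_PRICE_USD

print(f"{days:.1f} days on {gpus} GPUs, hardware cost ${hardware_cost / 1e6:.1f} million")
```

    Real multi-GPU training rarely scales perfectly, so treat the number of days as a lower bound.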
     
  13. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    #13 strangecosmos, Nov 21, 2018
    Last edited: Nov 21, 2018
    Thanks!

    1 GB / 29 days = 34.5 MB per day

    (this is similar to the 30 MB per day average reported last year on Electrek)

    34.5 MB / 142 MB = 24%

    9.5 billion images * 24% = 2.3 billion images

    So, Tesla might have ~2 billion images rather than ~9 billion.
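
    The same rescaling as a small Python sketch, assuming the image count scales in proportion to the daily upload volume (names are illustrative):

```python
reported_mb_per_day = 1000 / 29   # 1 GB over 29 days, decimal units
baseline_mb_per_day = 142         # earlier per-day average
baseline_images = 9.5e9           # earlier image estimate

# Scale the earlier image estimate by the ratio of the two upload rates.
ratio = reported_mb_per_day / baseline_mb_per_day
scaled_images = baseline_images * ratio

print(f"{reported_mb_per_day:.1f} MB per day is {ratio:.0%} of the baseline")
print(f"Scaled estimate: {scaled_images / 1e9:.1f} billion images")
```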

    It’s also possible a lot less than 90% of data upload is still images. There could be a lot of video uploads, for example.
     
  14. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    #14 strangecosmos, Nov 21, 2018
    Last edited: Nov 21, 2018
    According to the New York Times, people on Mechanical Turk labelled the ImageNet images at an average rate of 3000 per hour, or 50 per minute. I don't even know how that's possible. That's an image every 1.2 seconds.

    This sounds fishy to me because it seems faster than humanly possible. Human reaction time is 0.2-0.3 seconds, and even if you type at 100 words per minute, that's 0.6 seconds to type one word. So it should take around 0.9 seconds just to notice the image and type the label, leaving only 0.3 seconds to recognize the image and think of the word. This doesn't seem possible.

    Apparently the Tsinghua-Tencent 100k benchmark dataset has 128 classes of traffic sign, although only 45 of them are non-rare. Still, lvl5's 40 semantic classes is definitely too few. I'm going to change my guess to ~200 semantic classes. 128 types of traffic sign and 72 other categories of object.
     
  15. lunitiks

    lunitiks (ง ͠° ͟ل͜ ͡°)ง

    Joined:
    Nov 19, 2016
    Messages:
    2,543
    Location:
    Prawn Island, VC
    You keep replying to yourself :eek:
     
    • Funny x 1
  16. Bladerskb

    Bladerskb Senior Software Engineer

    Joined:
    Oct 24, 2016
    Messages:
    1,348
    Location:
    Michigan
    [​IMG]
     
    • Like x 1
    • Funny x 1
  17. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    896
    Location:
    The Prime Material Plane
    I need to add corrections/footnotes/further thoughts
     
  18. Bladerskb

    Bladerskb Senior Software Engineer

    Joined:
    Oct 24, 2016
    Messages:
    1,348
    Location:
    Michigan
    Nvidia says they have 1,500 people labeling images and they can label 1 million images per month (about 3 images per hour per person, assuming 8-hour shifts 7 days a week).
    Based on your 5-second calculation, they should have been able to label a ridiculous 259,200,000 images per month.
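
    A rough Python check of that comparison. It assumes a 30-day month with every day worked and 8 hours per day, which is what reproduces the figures above; the names are illustrative.

```python
LABELLERS = 1500
HOURS_PER_DAY = 8
DAYS_PER_MONTH = 30   # assumption: 7-day weeks, 30-day month
reported_images_per_month = 1_000_000

person_hours = LABELLERS * HOURS_PER_DAY * DAYS_PER_MONTH

# Rate implied by the reported monthly throughput.
reported_rate = reported_images_per_month / person_hours

# Throughput if every image took a flat 5 seconds.
images_at_5s = person_hours * 3600 // 5

print(f"Reported: {reported_rate:.1f} images per person-hour")
print(f"At 5 s per image: {images_at_5s:,} images per month")
```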
     
