
Dojo 2020

Discussion in 'Autopilot & Autonomous/FSD' started by strangecosmos2, Jan 8, 2020.

  1. strangecosmos2

    strangecosmos2 Koopa Troopa

    Joined:
    Nov 24, 2019
    Messages:
    177
    Location:
    New Donk City
    #1 strangecosmos2, Jan 8, 2020
    Last edited: Jan 8, 2020
    At Autonomy Day, Elon said:

    “The car is an inference-optimized computer. We do have a major program at Tesla — which we don’t have enough time to talk about today — called Dojo. That’s a super powerful training computer. The goal of Dojo will be to be able to take in vast amounts of data — at a video level — and do unsupervised [a.k.a. self-supervised] massive training of vast amounts of video with the Dojo computer. But that’s for another day.”
    Dojo is for self-supervised learning on video and, therefore, presumably for computer vision.

    DeepMind recently demonstrated that you can get better performance on image recognition with 2x to 5x fewer hand-labelled training images when you do self-supervised pre-training beforehand.
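    In case it helps to picture the recipe behind results like that, here is a generic two-stage sketch in PyTorch (my own toy illustration using a simple rotation-prediction pretext task, not DeepMind's actual method): pre-train an encoder on unlabelled images with a self-supervised objective, then fine-tune a classifier on the much smaller hand-labelled set.

```python
import torch
import torch.nn as nn

# Shared encoder; a real vision model would be convolutional.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())

# Stage 1: self-supervised pre-training on unlabelled images.
# Pretext task: predict which of four rotations was applied (a simple
# stand-in for whatever objective the DeepMind work actually used).
rot_head = nn.Linear(256, 4)
opt = torch.optim.Adam(list(encoder.parameters()) + list(rot_head.parameters()), lr=1e-3)
unlabeled = torch.rand(256, 3, 32, 32)
k = torch.randint(0, 4, (256,))  # rotation label per image
rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                       for img, r in zip(unlabeled, k)])
loss = nn.functional.cross_entropy(rot_head(encoder(rotated)), k)
loss.backward()
opt.step()

# Stage 2: fine-tune a classifier on a much smaller hand-labelled set.
classifier = nn.Linear(256, 10)
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
few_images, few_labels = torch.rand(32, 3, 32, 32), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(classifier(encoder(few_images)), few_labels)
loss.backward()
opt.step()
```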

    Yann LeCun also recently shared his prediction that self-supervised learning on video will make major strides in 2020. LeCun helped pioneer the field of deep learning and won a Turing Award for that work. He's also a computer science professor at NYU and Chief AI Scientist at Facebook. LeCun wrote:

    “This suggests that the way forward in AI is what I call self-supervised learning. It’s similar to supervised learning, but instead of training the system to map data examples to a classification, we mask some examples and ask the machine to predict the missing pieces. For instance, we might mask some frames of a video and train the machine to fill in the blanks based on the remaining frames.”
    Here's a short clip of LeCun explaining this idea:

    [embedded video]

    The full talk is worth watching.

    LeCun continues:

    “This approach has been extremely successful lately in natural language understanding. Models such as BERT, RoBERTa, XLNet, and XLM are trained in a self-supervised manner to predict words missing from a text. Such systems hold records in all the major natural language benchmarks.

    In 2020, I expect self-supervised methods to learn features of video and images. Could there be a similar revolution in high-dimensional continuous data like video?

    One critical challenge is dealing with uncertainty. Models like BERT can’t tell if a missing word in a sentence is “cat” or “dog,” but they can produce a probability distribution vector. We don’t have a good model of probability distributions for images or video frames. But recent research is coming so close that we’re likely to find it soon.

    Suddenly we’ll get really good performance predicting actions in videos with very few training samples, where it wasn’t possible before. That would make the coming year a very exciting time in AI.”
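    To make that masked-prediction idea concrete, here is a deliberately tiny sketch (my own illustration in PyTorch, not LeCun's or Tesla's code): hide the middle frame of a short clip and train a network to reconstruct it from the surrounding frames. The closing comment marks where the uncertainty problem LeCun describes shows up.

```python
import torch
import torch.nn as nn

class MaskedFramePredictor(nn.Module):
    """Toy model: predict a masked middle frame from its neighbours.
    Frames are flattened vectors purely to keep the sketch short; a
    real system would use a convolutional or transformer video model."""
    def __init__(self, frame_dim=3 * 64 * 64, context_frames=4, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim * context_frames, hidden),
            nn.ReLU(),
            nn.Linear(hidden, frame_dim),  # point estimate of the missing frame
        )

    def forward(self, context):  # context: (batch, context_frames, frame_dim)
        return self.net(context.flatten(1))

model = MaskedFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Fake batch of 5-frame clips; the middle frame is the prediction target.
clips = torch.rand(8, 5, 3 * 64 * 64)
context = torch.cat([clips[:, :2], clips[:, 3:]], dim=1)  # mask out frame 2
target = clips[:, 2]

pred = model(context)
# An L2 loss yields a single blurry "average" future -- exactly the
# uncertainty problem LeCun mentions; predicting a distribution or
# working in a learned latent space is the open research question.
loss = nn.functional.mse_loss(pred, target)
loss.backward()
opt.step()
```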
    Now I feel like I understand the purpose of Dojo. There is a research hurdle to clear that Dojo won't solve: representing uncertainty in video prediction.

    But if it is solved in 2020 as LeCun predicts, then the main constraints on self-supervised pre-training will be data and compute. Tesla has access to plenty of video data, which is cheap to record, upload, and store, and it can use active learning to select which video clips to upload. Dojo, in turn, is intended to provide 10x more useful training compute at a lower cost.
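    As a rough idea of what "active learning to select which video clips to upload" could look like in miniature (a sketch of my own; the entropy-based scoring rule is a generic heuristic, not Tesla's actual trigger logic):

```python
import torch

def clip_uncertainty(logits):
    """Mean per-frame predictive entropy for one clip.
    logits: tensor of shape (frames, num_classes) from the on-board model."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    return entropy.mean().item()

def select_clips_for_upload(clip_logits, budget):
    """clip_logits: {clip_id: logits}. Return the `budget` clip ids the
    model is least confident about -- the ones worth uploading."""
    ranked = sorted(clip_logits,
                    key=lambda cid: clip_uncertainty(clip_logits[cid]),
                    reverse=True)
    return ranked[:budget]

# Toy usage: three clips, ten frames each, five classes.
logits = {f"clip_{i}": torch.randn(10, 5) for i in range(3)}
print(select_clips_for_upload(logits, budget=1))
```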

    There's also the constraint of inference compute. Can HW3 run big enough neural networks in real time? I guess we'll see. DeepScale is supposed to squeeze down neural networks to fit on HW3, and HW4 is already in the works and is supposed to be 3x more powerful.
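    For a sense of what "squeezing down" a network for a fixed inference budget can mean (I have no insight into DeepScale's actual techniques; this just shows standard PyTorch post-training dynamic quantization on a stand-in model):

```python
import torch
import torch.nn as nn

# Stand-in for a perception head; the real networks are much larger.
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Post-training dynamic quantization: weights stored as int8, activations
# quantized on the fly. Shrinks the model and often speeds up inference
# at a small accuracy cost.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 1024)
print(model(x).shape, quantized(x).shape)
```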

    Maybe with the breakthrough in self-supervised video prediction that LeCun anticipates, adopted by Tesla and accelerated by Dojo, we'll see an order of magnitude increase in Tesla's computer vision performance.
     
    • Like x 3
  2. Bladerskb

    Bladerskb Senior Software Engineer

    Joined:
    Oct 24, 2016
    Messages:
    2,014
    Location:
    Michigan
    Elon said that Dojo was for planning and control: "Over time, I would expect that it moves really to just training against video, video in, car and steering and pedals out… that’s what we’re gonna use the dojo system for."

    It's basically what they are doing today on their in-house Nvidia chips, just done a bit faster and at greater scale with "Dojo".
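    A minimal sketch of that "video in, steering and pedals out" framing (my own toy architecture, not anything Tesla has published): stack a few camera frames on the channel axis and regress the logged human controls.

```python
import torch
import torch.nn as nn

class VideoToControls(nn.Module):
    """Toy end-to-end policy: stacked frames in, [steering, throttle, brake] out."""
    def __init__(self, frames=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * frames, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 3)

    def forward(self, frames):  # frames: (batch, 3 * num_frames, H, W)
        return self.head(self.encoder(frames))

model = VideoToControls()
frames = torch.rand(2, 12, 128, 128)   # 4 RGB frames stacked on channels
logged_controls = torch.rand(2, 3)     # human steering / throttle / brake
loss = nn.functional.mse_loss(model(frames), logged_controls)
loss.backward()
```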
     
    • Like x 1
    • Disagree x 1
  3. diplomat33

    diplomat33 Well-Known Member

    Joined:
    Aug 3, 2017
    Messages:
    6,674
    Location:
    Terre Haute, IN USA
    Elon shares some additional info on Dojo:

    [image not preserved]
     
  4. diplomat33

    diplomat33 Well-Known Member

    Joined:
    Aug 3, 2017
    Messages:
    6,674
    Location:
    Terre Haute, IN USA
    More info:

    [image not preserved]
     
    • Informative x 1
  5. DanCar

    DanCar Active Member

    Joined:
    Oct 2, 2013
    Messages:
    1,619
    Location:
    SF Bay Area
    https://twitter.com/elonmusk/status/1307823914651979776
    In my view, Google's TPU is best at medium and large training jobs, while Nvidia wins at small training jobs. It's not hard to be the best when you define benchmarks that favor your own hardware.
     
