Obviously we don't know much about Dojo, but it seems that Elon is very good at keeping his companies at the bleeding edge of whatever is out there.

This recent article about research from the University of Notre Dame and GlobalFoundries raises some very interesting points, which I am quietly confident are along the lines of what Dojo is aiming to do.

A ferroelectric ternary content-addressable memory to enhance deep learning models
 
Karpathy gave a talk at the PyTorch developer conference a few weeks ago where he teased Dojo a bit. He said Dojo is a neural net training computer.

Basically, Tesla aims to automate the entire machine learning process where the machine will train the machine in a closed loop. Karpathy joked that it is called "operation vacation" because the engineers will be able to go on vacation and Autopilot will improve on its own without any human input. He shared this graph during his talk:

[graph shared during the talk]


You can watch his entire talk here:

 

All Dojo does is replace or add to the Nvidia server racks; it does not change the actual process.

Automating the entire ML process is something other SDC companies like Waymo have already done, and something others are trying to do.

It's not unique, and definitely not "operation vacation".


Thanks for the additional information and clarification.
 
Dojo involves a custom ML processor developed by Tesla that offers a 10x speed/efficiency boost over their current system.

Tesla's "operation vacation" is unique in the industry because they get mass data and testing done for "free". Eventually, no labeling will be needed. That's the goal.
 
jimmy_d described his theory about Dojo in this post:

“This is an opinion, but I’m going to state it emphatically because I’m pretty confident in it.

Dojo isn’t going to be a training computer deployed into cars, it’s going to be training infrastructure that is optimized to perform unsupervised learning from video at scale. Tesla is probably going to produce custom silicon to enable this because available commercial hardware is inadequate to the task, but it should be doable with a level of silicon development effort comparable to what it took to create Tesla’s FSD chip.”

This theory lines up with things Elon and Karpathy have said and it's what makes the most sense to me. The purpose of Dojo is likely to speed up self-supervised learning (a.k.a. unsupervised learning) for computer vision tasks. Whereas supervised learning uses deliberate human labels as the supervisory signal or training signal, in self-supervised learning, the supervisory signal comes from the data itself. This blog post has a bunch of examples.
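
To make that concrete, here is a minimal, illustrative sketch of a pretext task where the supervisory signal comes from the data itself. It uses rotation prediction, a classic example from the self-supervised literature; this is my own toy example, not anything Tesla has confirmed using:

```python
# Toy self-supervised pretext task: predict which rotation was applied.
# The "labels" are generated from the images themselves, with no human input.
import torch
import torch.nn as nn

class RotationNet(nn.Module):
    """Tiny CNN that predicts which of 4 rotations was applied to an image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 4)  # classes: 0, 90, 180, 270 degrees

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def make_batch(images):
    """Rotate each image by a random multiple of 90 degrees; the rotation
    index is the target, so the data supervises itself."""
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    return rotated, k

model = RotationNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.randn(8, 3, 64, 64)   # stand-in for unlabeled camera frames
x, y = make_batch(images)            # no human labeling anywhere
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```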

This is what Elon said at Autonomy Day (at 2:26:42):

“The car is an inference-optimized computer. We do have a major program at Tesla — which we don’t have enough time to talk about today — called Dojo. That’s a super powerful training computer. The goal of Dojo will be to be able to take in vast amounts of data — at a video level — and do unsupervised massive training of vast amounts of video with the Dojo computer. But that’s for another day.”

In July, Karpathy had a tweet — not directly related to Tesla, but still suggestive — on self-supervised learning:

“(The “correct” area of research to watch closely is stupid large self-supervised learning or anything that finetunes on/distills from that. Other “shortcut” solutions prevalent today, while useful, are evolutionary dead ends)”

Tesla already uses self-supervised learning to predict the behaviour of road users. On Autonomy Day, Karpathy said Tesla has explored a self-supervised technique for at least one computer vision task: depth mapping.

For many tasks, I would imagine the goal with self-supervised learning would be to supplement or bootstrap supervised learning rather than replace it entirely. In other words, the goal would be to allow supervised learning to get better results with the same amount of manually labelled training data.

Here's one way I think that could work. (I'm new to this topic so I could be getting it wrong.) The neural network trains on a proxy task or “pretext task” like predicting/generating future frames of video from previous frames of video. In so doing, the network learns latent or implicit representations or concepts of objects like vehicles, pedestrians, cyclists, lane lines, road edges, curbs, traffic lights, stop signs, and so on. When it comes time to do supervised learning with manually labelled video frames, the neural network learns these objects categories faster and better because it already has rich concepts of them from its self-supervised training.
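
Here is a rough sketch of that pretrain-then-finetune flow, with stand-in modules and shapes of my own invention (nothing here reflects Tesla's actual networks): pretrain an encoder by predicting the next video frame, then reuse it for supervised classification with far fewer human labels:

```python
# Stage 1: self-supervised pretraining on unlabeled video (future frames
# act as the labels). Stage 2: supervised fine-tuning on human labels.
import torch
import torch.nn as nn

encoder = nn.Sequential(                          # shared representation
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                          # only for the pretext task
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
)

# --- Stage 1: predict frame t+1 from frame t; no human labels needed ---
prev_frame = torch.randn(4, 3, 64, 64)            # frame t
next_frame = torch.randn(4, 3, 64, 64)            # frame t+1 (the "label")
pretext_loss = nn.functional.mse_loss(decoder(encoder(prev_frame)), next_frame)

# --- Stage 2: fine-tune with a small manually labelled set ---
classifier = nn.Linear(64, 10)                    # e.g. 10 object categories
labeled_frame = torch.randn(4, 3, 64, 64)
labels = torch.randint(0, 10, (4,))
feats = encoder(labeled_frame).mean(dim=(2, 3))   # global average pooling
finetune_loss = nn.functional.cross_entropy(classifier(feats), labels)
```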

Here's a great talk from deep learning pioneer and Turing Award winner Yann LeCun on self-supervised learning:


Weakly supervised learning is another approach that would allow Tesla to train neural networks on computer vision tasks without manually labelling data. (I recently wrote about weakly supervised learning in this article.) In an autonomous driving context, weakly supervised learning uses human driving behaviour as a source of automatic labels for camera data. This approach has been shown to work well for semantic segmentation of free space. Weakly supervised learning is also what Tesla has been using to predict the curvature and gradient of roadways.
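
As an illustration of that idea, here is a toy sketch, entirely my own assumption about how such auto-labelling might look: pixels the car subsequently drove over are marked as drivable, and no human ever draws a mask. A real system would project the path using calibrated camera geometry:

```python
# Weak labels from driving behaviour: the driven path marks free space.
import numpy as np

def auto_label_free_space(image_hw, future_path_px, half_width=8):
    """Build a weak free-space mask from the path the human later drove.

    image_hw       -- (height, width) of the camera frame
    future_path_px -- (row, col) pixel coords the ego car later drove over,
                      e.g. from odometry projected into this image
    half_width     -- assumed half car width in pixels (illustrative)
    Returns a mask: 1 = drivable (driven over), 0 = unknown.
    """
    mask = np.zeros(image_hw, dtype=np.uint8)
    for r, c in future_path_px:
        mask[r, max(0, c - half_width):c + half_width] = 1
    return mask

# Toy usage: a straight drive down the middle of a 96x128 frame.
path = [(r, 64) for r in range(48, 96)]
mask = auto_label_free_space((96, 128), path)
# `mask` can now supervise a segmentation network with zero manual labels.
```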

I would venture to speculate that, as with self-supervised learning, in order to fully exploit weakly supervised learning, Tesla needs to train neural networks on “stupid large” video datasets — and that's hardware-intensive.

I believe self-supervised learning and weakly supervised learning for computer vision are two pillars of Tesla's large-scale fleet learning approach.
 
Tesla's "operation vacation" is unique in the industry because they get mass data and testing done for "free". Eventually, no labeling will be needed. That's the goal.

There are three main parts of autonomous driving:

Perception: what the system sees.

Prediction: what the system anticipates.

Planning: what the system does.

Prediction and planning already use automatic labels. Prediction uses a form of self-supervised learning in which part of the data (the future) supervises another part of the data (the past). In other words, the future labels the past with the desired output.
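
In code, "the future labels the past" just means slicing logged trajectories into input/target pairs. A minimal sketch with made-up shapes (this is the general idea, not Tesla's implementation):

```python
# Self-supervised behaviour prediction: the recorded future is the target.
import torch
import torch.nn as nn

def make_prediction_pairs(track, past_len=10, future_len=5):
    """Slice one logged (T, 2) xy-trajectory into (past, future) pairs.
    The log supplies both input and target; no human labels anywhere."""
    return [(track[t - past_len:t], track[t:t + future_len])
            for t in range(past_len, len(track) - future_len)]

model = nn.Sequential(nn.Flatten(), nn.Linear(10 * 2, 64), nn.ReLU(),
                      nn.Linear(64, 5 * 2))   # predict 5 future xy points

track = torch.cumsum(torch.randn(100, 2), dim=0)   # fake logged path
past, future = make_prediction_pairs(track)[0]
pred = model(past.unsqueeze(0)).view(1, 5, 2)
loss = nn.functional.mse_loss(pred, future.unsqueeze(0))
```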

Planning uses imitation learning, in which human driving behaviour labels perception data: either raw video or perceptual representations derived from neural networks. Imitation learning is a form of supervised learning or weakly supervised learning. (It can also take other forms, but this is usually true.)
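
Behaviour cloning, the simplest form of imitation learning, fits in a few lines. Here the recorded human steering angle labels whatever the car perceived at that moment; the feature dimension and network are placeholders of mine:

```python
# Behaviour cloning: regress the policy's output onto the human's action.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

perception = torch.randn(32, 128)     # stand-in for vector-space features
human_steering = torch.randn(32, 1)   # what the driver actually did
loss = nn.functional.mse_loss(policy(perception), human_steering)
loss.backward()                       # the human "labeled" the data by driving
```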

DeepMind recently showed that imitation learning alone is enough to get a StarCraft II agent up to the 84th percentile among ranked human players (i.e. it plays StarCraft better than 84% of them). StarCraft is super complicated, so I find this to be a remarkable proof of concept.

In addition to imitation learning, planning could also use real world reinforcement learning. The reward signal could take the form of human interventions (such as Autopilot disengagements). It may also be possible to learn a self-evaluated reward from human driving behaviour. By doing reinforcement learning on top of imitation learning, DeepMind got its StarCraft agent to above the 99th percentile.
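
To show how interventions could act as a reward signal, here is a deliberately speculative REINFORCE-style sketch layered on an imitation-pretrained policy; nothing about it is confirmed, and a real system would be far more careful than one reward per episode:

```python
# Speculative: treat a disengagement as reward -1, a clean episode as +1,
# and nudge the policy's action log-probabilities accordingly (REINFORCE).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

states = torch.randn(20, 128)                 # one driving episode
dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()

disengaged = True                             # the human took over
reward = -1.0 if disengaged else 1.0

loss = -(dist.log_prob(actions) * reward).mean()
opt.zero_grad(); loss.backward(); opt.step()
```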

So, two-thirds of the autonomous driving problem already don't require data to be manually labelled. If jimmy_d's theory is correct, then Dojo is a part of Tesla's effort to reduce the need for manual labels in perception. Accelerated by Dojo, methods like self-supervised learning and weakly supervised learning can supplement fully supervised learning in computer vision.

Lex Fridman (who teaches a course on deep learning for self-driving cars at MIT) recently interviewed Elon for the second time. I found this part of the interview (at 29:21) quite interesting:

Lex Fridman: What's harder, perception or control for these problems? So being able to perfectly perceive everything or figuring out a plan once you perceive everything? How to interact with all the agents in the environment? In your sense, from a learning perspective, is perception or action harder, in that giant, beautiful, multi-task learning neural network?

Elon Musk: The hardest thing is having accurate representation of the physical objects in vector space. So, taking the visual input, primarily visual input, some sonar and radar and then creating an accurate vector space representation of the objects around you. Once you have an accurate vector space representation, the planning and control is relatively easier. That is relatively easy.

Basically, once you have accurate vector space representation, then you're kind of like a video game, like cars in Grand Theft Auto or something. They work pretty well. They drive down the road, they don't crash, pretty much unless you crash into them. That's because they've got an accurate vector space representation of where the cars are, and then they're rendering that as the output.

My analogy is to StarCraft rather than Grand Theft Auto. Once you have accurate computer vision, prediction and planning are amenable to training via massive quantities of automatically labelled fleet data. If driving were a video game, we would probably already have a driving agent that's better than 80%+ of human drivers. The challenge of computer vision is to make driving like a video game. That's where techniques like self-supervised learning and weakly supervised learning come in. And Dojo exists to accelerate those techniques and make them possible at greater scale.
 
Any time I've heard Elon or Karpathy discuss Dojo it's been in the future tense and they haven't given a concrete timeline. With the FSD Computer, there was a clear finish line: the computer going into new production cars. Long before that, they were testing a version of the FSD Computer in Tesla-owned vehicles.

With Dojo, I guess the finish line would be when the Dojo computers replace (likely Nvidia) GPUs for the Tesla AI team's daily/weekly/monthly training workloads. Dojo seems like it might be optimized specifically for training on video, so GPUs might still be used for non-video training tasks (e.g. some prediction and planning tasks might use the “vector space” representations as the input rather than pixels).

If they have a prototype version of Dojo running now (I don't know whether they do or don't), presumably they are testing it by using it to train neural networks.

As I understand it, Tesla can still use weakly supervised learning and self-supervised learning on video with just GPUs, before the Dojo computer is finished. But there are technical constraints, as DanCar said above, and economic constraints, as jimmy_d discussed here.
 
Thanks. It will be interesting to see if and how soon we see more rapid progress with FSD after dojo goes online. I would think we should start to see some benefits from dojo a few months after it goes online.
 
I’m a bit puzzled about why, as far as I know, no cars with the FSD Computer have yet gotten the new neural networks that Karpathy seemed to imply were ready last year. Or at least almost ready.

In October 2018, Karpathy said:

“...my team trains all of the neural networks that analyze the images streaming in from all the cameras for Autopilot. For example, these neural networks identify cars, lane lines, traffic signs, and so on. The team is incredibly excited about the upcoming upgrade for the Autopilot computer which Pete [Bannon] briefly talked about.

This upgrade allows us to not just run the current neural networks faster, but more importantly, it will allow us to deploy much larger, computationally more expensive networks to the fleet. The reason this is important is that it is a common finding in the industry (and we see this as well) that as you make the networks bigger by adding more neurons, the accuracy of all their predictions increases with the added capacity.

So, in other words, we are currently at a place where we trained large neural networks that work very well, but we are not able to deploy them to the fleet due to computational constraints. So, all of this will change with the next iteration of the hardware. And it's a massive step improvement in the compute capability. And the team is incredibly excited to get these networks out there.”

In April 2019, Elon tweeted:

“The Tesla Full Self-Driving Computer now in production is at about 5% compute load for these tasks [i.e. Navigate on Autopilot] or 10% with full fail-over redundancy”

The same day, Elon also tweeted that the compute load on HW2.5 was “~80%”.
 
Thanks. It will be interesting to see if and how soon we see more rapid progress with FSD after dojo goes online. I would think we should start to see some benefits from dojo a few months after it goes online.

I'm not convinced we will. It will reduce internal build time and probably increase the accuracy/confidence of AP. But my guess is that the improvements attributable to Dojo (as opposed to the expansion of the data set to consume all that lovely extra processing capacity) will be about system quality: fewer false alarms and more capable path prediction. Stuff you can't "see", but which gives an overall better experience.
 
I have just rewatched the Autonomy Day video and the PyTorch talk. It still completely blows me away what is apparently going on behind closed doors at Tesla with respect to FSD.

To me the biggest irreconcilable point is the difference between what is being promised as "feature complete" FSD and the state of Autopilot with FSD right now. I recently received the latest software update, and while it's fantastic, it's absolutely light years behind what I would consider level 5. There is absolutely no city-level perception, and even used as intended on highways it seems to have strange regressions every now and again. Features are slowly added, like traffic cone recognition and more recently adjacent lane speed monitoring. If these "features" are being added piece by piece, it will be DECADES before we have feature complete FSD. There will need to be an almost unbelievable step change in Autopilot capability in the coming months for me to have any hope of the timelines being true.

I wonder if Dojo is the missing ingredient here? Are Tesla banking on the fact that they will all of a sudden be able to fully realise the hidden learnings within their fleet's video? Is this what takes the current NN model from level 2 to level 4/5 within a matter of months?
 