
Tesla, TSLA & the Investment World: the Perpetual Investors' Roundtable

Can someone please explain in simple words, what is the function of Dojo? How does Dojo help solve FSD? I keep hearing 'labelling', but I don't quite understand what that means. How are others solving this problem?

This is how I understood it:

............. video ...................... labels
Cars ----------> Dojo -----------> Cars <cars can drive better now>
I know others have answered, but I thought I'd give it a shot

What is the function of Dojo? To train the AI faster and cheaper. Faster in that it is purpose-built to train just Tesla's AI. Think of it like a blender that is designed to blend one particular substance. If you use just any blender, it might take about an hour to blend, say, a crate of peaches (hypothetically). But let's say Tesla just wants to blend rock salt to a certain grain size. The standard blender wouldn't work well, nor would even a Ninja blender or whatever, because Tesla wants to blend literally mountains of rock salt every minute of every day. Huge mountains, to give you a sense of scale.

Now imagine how much cheaper it would be to have a machine that could blend mountains of rock salt in a matter of milliseconds. Yep, that's what we are talking about!

How does Dojo help solve FSD? Tesla trains its AI constantly, and the bigger the data set, the better the result. So you'd want the biggest computer possible, as well as the fastest. On top of that, it needs to be specific to Tesla's AI to be as efficient as possible. Up until now, Tesla has relied on CPUs and GPUs to do their training. Now, with DPUs, things will go several orders of magnitude faster for far less money, and will be able to scale to larger and larger AI model sizes over time.

What is labeling? Put simply, labeling is what you think it is. You see a dog and your brain knows it is a dog. Your brain knows the dog from everything around it. Labeling is the same thing: a human looking at a picture, drawing a line around the dog, and writing the word "dog" on it. It gets way more complicated than that, but that is the simplest version. Auto-labeling is the next big step for training AI. And that is a deep rabbit hole with a lot of big fancy words.
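To make that concrete, here's a minimal sketch of what one hand-drawn label might look like as data. The field names and format are made up for illustration; real labeling formats (COCO, Tesla's internal tooling, etc.) are much richer than this.

```python
# A minimal sketch of one hand-drawn label: a box around the dog plus the
# class name. Field names are illustrative, not any real labeling format.
label = {
    "image": "frame_000123.png",   # which picture was labeled
    "class": "dog",                # what the human "wrote on" the box
    "bbox": [412, 200, 520, 310],  # x_min, y_min, x_max, y_max in pixels
}

# Training then boils down to: show the network the pixels, compare its
# guess against label["class"] and label["bbox"], and nudge the weights.
```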

How are others solving this problem? Lots of different ways, and that is also a deep rabbit hole. What's cool is that this is just a small part of the overall problem of solving FSD. It is a very cool part, though, and as Tesla implements more and more auto-labeling, it will vastly improve their overall progress towards FSD.

My 2 cents: Tesla will most likely figure out FSD even before Dojo comes online. What Dojo offers them is the ability to iterate on FSD better, faster, and at lower cost. It also opens up other possible revenue streams, since they'll have stupid amounts of leftover AI compute. Enter DaaS and the Optimus Sub-Prime Android.
 
You have a select few that know "enough" about the field and don't particularly like Tesla's approach, who try to establish themselves as experts in an effort to discredit that approach
A great challenge of life: Knowing enough to think you are right, but not knowing enough to know you are wrong - Neil deGrasse Tyson
 
I plead and beg you all, NO MORE PICTURES of GLJ!
Seconded.

On AI Day, Elon said the advances they are making are testing whether there is a limit to the economy!! And yet we have endless comments regarding the rantings of an obviously paid liar.
 
Can someone please explain in simple words, what is the function of Dojo? How does Dojo help solve FSD? I keep hearing 'labelling', but I don't quite understand what that means. How are others solving this problem?

This is how I understood it:

............. video ...................... labels
Cars ----------> Dojo -----------> Cars <cars can drive better now>

Elon's litmus test: if you can turn off the GPUs, then Dojo works.

(High level) So Dojo is just a hardware stack for doing ML, where GPUs are currently preferred over CPUs. GPUs, though developed for graphics/gaming, found their niche in ML because they are better at vector math than CPUs. Cheers!!
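A toy illustration of that point (not Tesla's actual workload, just plain NumPy): the multiply-accumulate arithmetic that dominates NN training runs far faster as one vectorized operation than as a scalar loop, and dedicated vector hardware pushes that gap much further.

```python
import time
import numpy as np

# Same arithmetic two ways: a scalar loop (CPU-style serial work) vs. one
# vectorized call (the kind of operation GPU/DPU hardware parallelizes).
a = np.random.rand(2_000_000)
b = np.random.rand(2_000_000)

t0 = time.perf_counter()
total = 0.0
for x, y in zip(a, b):      # one multiply-accumulate at a time
    total += x * y
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
total_vec = np.dot(a, b)    # the whole vector in one shot
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.2f}s   vectorized: {t_vec:.4f}s")
```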
 
More on robot viability. Wires are a pain. And expensive. And unreliable.

Let's compare the reliability of a fully instrumented piece of power electronics with 100 paired wires called thermocouples. It is a rat's nest, even if well dressed.

To a thermal image of the piece of power electronics. Much simpler.

To a fielded piece of equipment that is periodically thermal imaged (or continuously).

Video is great for field data acquisition - lot of points.

If beam torque wrenches were part of every slightly flexible bone, and they could be observed with an inside camera, almost all the wires would go away and you would already be formatted for AI processing.

Internally visible for position and some ratio method (like pinky is always weaker) and just a few strain locations that the camera could see...

Or just back it all out from the motor data... Or the motor data and the tendon stretch. Motor data is sort of open loop and would require periodic calibration.

I am not convinced you need anything more than motor torque and positional information that a camera could get.

In other words, Tesla's new set of tools could vastly simplify what is needed inside the robot.
Speaking as a guy who occasionally gets involved with the proverbial rats' nests of nasty looking wires:
One of the great advances in automotive engineering was the CAN bus. Yeah, people complain about it and all that, but the principle is sound:
  1. On a point where one needs electrical control of any type, apply two wires for power and ground. If one is feeling in a reliable mood, then make it four wires and diode-OR the two power and two ground leads.
  2. Apply one communications bus onto which multiple other control locations are required. Again, if one is feeling in a reliable mood, dual-home the control bus.
  3. Control and/or sense the data with local electronics. On a good day, a total of four wires going anywhere, two of which are power and ground, two of which are control.
The size of the wiring bundle goes down by.. lots. Yeah, I've seen people, in lab settings, attach a dozen thermocouples, voltage monitors, and current sense points, to something under test and, you betcha, that's a rat's nest: but, for production, bussed power and control is the way to go. And has been for decades.
Back in the deeps of time I've worked on gear that didn't follow that paradigm. Zillions of wires, easily broken, etc. They don't make stuff like that any more, thank goodness.
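For the curious, here's roughly what that bussed-sensor idea looks like in code: a hypothetical sketch assuming the python-can library on a Linux socketcan interface, with a made-up channel name and message ID. Each local node reads its own sensor and broadcasts the value on the shared bus rather than running a dedicated wire pair back to a central box.

```python
import can  # pip install python-can; assumes a Linux socketcan interface

# One local node on the shared bus. Channel name and arbitration ID are
# made up for illustration.
bus = can.interface.Bus(channel="can0", bustype="socketcan")

def publish_temperature(node_id: int, deci_celsius: int) -> None:
    """Encode a temperature reading as two bytes and broadcast it."""
    payload = deci_celsius.to_bytes(2, byteorder="big", signed=True)
    bus.send(can.Message(arbitration_id=node_id, data=payload,
                         is_extended_id=False))

publish_temperature(node_id=0x123, deci_celsius=425)  # 42.5 degrees C
```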
 
EVs get a higher speed limit in parts of the EU, since ICE cars spew out more pollution the faster they go. In Austria, air quality improved where the ICE limit is 100 km/h while EVs are allowed 130 km/h. The US should consider this as an added incentive.
I am personally embarrassed by the mainstream media's incompetence; I think a grade school newspaper would have more factual reporting.
What the mainstream media doesn't comprehend is that small lies make all their reports suspect, which harms our democracy.

Here are the details from Lars on EV speed limits and other info.

Differing official speed limits are a bad idea, likely to further introduce resentment from those stuck with ICE while waiting for an EV they can afford or have a place to charge.

A better option is to brief law enforcement to be lenient on EVs pushing the limit. Explain that it’s because the cars are newer, safer and less polluting.

And the best thing tax collectors can do to drive the switchover, for now, is sit on their hands: do not touch the model whereby road taxes are collected on petrol sales. Obviously it will require tweaking later, when revenue starts to fall significantly, but for now it automatically gives early adopters a reward without being obnoxiously in the face of ICEV drivers.
 
Saw some seriously long wait times posted earlier. I'm sure at least a portion of these wait times, particularly with long range S and X, is due to supply constraints. I'm also confident that demand for the entire Tesla lineup is strong.

I'm curious to hear opinions/evidence on whether 3 and Y wait times are more a function of chip (or some other part) shortages or if 3 and Y demand is just that staggering. Long-term, it doesn't really matter. Short-term, if deliveries are unable to keep skyrocketing, we might have a FUD-induced buying opportunity on our hands. If it's primarily just overwhelming demand despite Tesla continuing to grow deliveries, then we are going to lift off sooner rather than later.
I really do not understand why there is so much confusion about the dynamic between delivery dates for new orders being farther out and the chip supply.

Here's what we know, as recently as the Q2 earnings call less than a month ago:

- Tesla still expects the majority of sales and production ramp to happen in the 2nd half of this year
- After Elon's long rant about supply issues, he said the situation was easing for Tesla
- Bear in mind, delivery dates were already being pushed farther out for new orders before the earnings call.

Thus we know that they will at least maintain Q2 production levels + about 10,000-12,000 more S than in Q2. Plus we already know Giga China's Q3 output is 8,000 ahead of Q2 after the first month of the quarter.

Now sure...there is a chip shortage. But it's simply limiting how much Tesla can expand its production right now. And we can calculate very easily that even at a 200k quarterly rate, AND with all Fremont production going to North America.....demand is so strong that new-order delivery times keep getting farther out. As Elon said on the earnings call, demand has reached an inflection point in North America. People on Twitter are making it seem like each time the delivery window for new orders gets farther out, production is slowing down or stagnating.....which is completely false.

Btw, deliveries do not need to skyrocket to sustain the current share price, as Q2 earnings showed. Even if Tesla only did 225k deliveries in Q3, it will have profound impacts on Tesla's EPS, and thus their P/E, which has been dropping dramatically throughout the year. Tesla isn't "priced for perfection" anymore......not by a long shot...no matter how much bears/shorts still use that line. Tesla's P/E was over 1,200 to start the year; it will be under 250 after Q3 earnings if the stock stays at this share price level.
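A back-of-envelope sketch of that P/E mechanic, using made-up quarterly EPS figures rather than actual Tesla numbers: trailing P/E falls fast when earnings ramp, even at a flat share price.

```python
# Illustrative numbers only, not actual Tesla figures: trailing P/E is
# share price divided by the sum of the last four quarterly EPS values.
share_price = 700.0

eps_trailing_jan = [0.10, 0.12, 0.15, 0.20]  # tiny profits -> huge P/E
eps_trailing_q3  = [0.39, 0.93, 1.02, 1.44]  # ramping profits

def trailing_pe(quarterly_eps):
    return share_price / sum(quarterly_eps)

print(f"start of year: P/E ~ {trailing_pe(eps_trailing_jan):,.0f}")  # ~1,230
print(f"after Q3:      P/E ~ {trailing_pe(eps_trailing_q3):,.0f}")   # ~185
```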
 
Can someone please explain in simple words, what is the function of Dojo? How does Dojo help solve FSD? I keep hearing 'labelling', but I don't quite understand what that means. How are others solving this problem?

This is how I understood it:

............. video ...................... labels
Cars ----------> Dojo -----------> Cars <cars can drive better now>
My understanding is that right now labeling is done manually. Dojo will be able to label automatically.
 
I really do not understand why there is so much confusion about the dynamic between delivery dates for new orders being farther out and the chip supply.

Here's what we know, as recently as the Q2 earnings call less than a month ago:

- Tesla still expects the majority of sales and production ramp to happen in the 2nd half of this year
- After Elon's long rant about supply issues, he said the situation was easing for Tesla
- Bear in mind, delivery dates were already being pushed farther out for new orders before the earnings call.

Thus we know that they will at least maintain Q2 production levels + about 10,000-12,000 more S than in Q2. Plus we already know Giga China's Q3 output is 8,000 ahead of Q2 after the first month of the quarter.

Now sure...there is a chip shortage. But it's simply limiting how much Tesla can expand its production right now. And we can calculate very easily that even at a 200k quarterly rate, AND with all Fremont production going to North America.....demand is so strong that new-order delivery times keep getting farther out. As Elon said on the earnings call, demand has reached an inflection point in North America. People on Twitter are making it seem like each time the delivery window for new orders gets farther out, production is slowing down or stagnating.....which is completely false.

Btw, deliveries do not need to skyrocket to sustain the current share price, as Q2 earnings showed. Even if Tesla only did 225k deliveries in Q3, it will have profound impacts on Tesla's EPS, and thus their P/E, which has been dropping dramatically throughout the year. Tesla isn't "priced for perfection" anymore......not by a long shot...no matter how much bears/shorts still use that line. Tesla's P/E was over 1,200 to start the year; it will be under 250 after Q3 earnings if the stock stays at this share price level.

And to touch on this a bit regarding certain Twitter views of Tesla's current production and/or supply issues: practically everyone was saying in June, "Based on what I'm seeing, Tesla is gonna be lucky just to make/deliver a few hundred Model S's".........Lo and behold, we find out that Tesla made 2,400 Model S's and delivered over 1,500 :rolleyes: .

Despite some of those Twitter profiles only creating doubt about Tesla's current production ability, I've seen zero evidence that Fremont is running any differently than normal. And as I mentioned in the previous post, we already know Giga China is 8,000 ahead in terms of deliveries/exports (probably even further ahead in total production for the first month of the quarter), and every video we see out of Giga China shows the lot with thousands of cars on it and 25 delivery trucks there at all times loading them up.
 
Differing official speed limits are a bad idea, likely to further introduce resentment from those stuck with ICE while waiting for an EV they can afford or have a place to charge.

A better option is to brief law enforcement to be lenient on EVs pushing the limit. Explain that it’s because the cars are newer, safer and less polluting.

And the best thing tax collectors can do to drive the switchover, for now, is sit on their hands: do not touch the model whereby road taxes are collected on petrol sales. Obviously it will require tweaking later, when revenue starts to fall significantly, but for now it automatically gives early adopters a reward without being obnoxiously in the face of ICEV drivers.


Yet it seems to work in these countries, and the people like cleaner air. As more people shift to EVs, the disappointed ICE drivers will have to change or pay citations for driving over the speed limit.

As Abraham Lincoln said:
You can please some of the people some of the time, all of the people some of the time, some of the people all of the time, but you can never please all of the people all of the time.
 
Huh, upon second listen, Dojo is meant for generalized NN training, with the team using the tool sets they have built for this purpose. It's not necessarily needed to solve FSD. It almost seems like Elon is saying they will throw the vision stack into Dojo just to test whether it's better than the GPU cluster. Dojo is meant for the humanoid robot...which probably plays on the fact that usually a human trains in a dojo. So Tesla has been thinking about a general-purpose humanoid robot for years.
 
Speaking as a guy who occasionally gets involved with the proverbial rats' nests of nasty looking wires:
One of the great advances in automotive engineering was the CAN bus. Yeah, people complain about it and all that, but the principle is sound:
  1. On a point where one needs electrical control of any type, apply two wires for power and ground. If one is feeling in a reliable mood, then make it four wires and diode-OR the two power and two ground leads.
  2. Apply one communications bus onto which multiple other control locations are required. Again, if one is feeling in a reliable mood, dual-home the control bus.
  3. Control and/or sense the data with local electronics. On a good day, a total of four wires going anywhere, two of which are power and ground, two of which are control.
The size of the wiring bundle goes down by.. lots. Yeah, I've seen people, in lab settings, attach a dozen thermocouples, voltage monitors, and current sense points, to something under test and, you betcha, that's a rat's nest: but, for production, bussed power and control is the way to go. And has been for decades.
Back in the deeps of time I've worked on gear that didn't follow that paradigm. Zillions of wires, easily broken, etc. They don't make stuff like that any more, thank goodness.
I agree. So a reasonably smart strain gage on one bone of each of 5 fingers with CAN bus connecting them.

If you wanted a strain gage on each bone of 5 fingers, that is a 15-node CAN bus (3 bones per finger × 5 fingers).

But they do move. Solder joints don't always like that.

And the data is coming in as a discrete double precision word, or something like that.
A camera is one node. And it does not move. Two cameras are 2 nodes.

If the tendons all pass through the wrist, you could look with one stationary camera there for position. If you make the tendons elastic enough, you could see stretch. (I don't think signal to noise would work out satisfactorily).

So one stationary camera node should be more reliable than a moving wire set of any variety.

And the camera as input discipline might leverage the AI investment better.

Even with a CAN bus, every point you monitor requires physically attaching a sensor.

A camera is going to be cleaner and more reliable if you can get the signal-to-noise ratio and an aggregation point where only one camera is needed (a lens would be OK, but fiber-optic bending-loss models or whatever can hamper aggregation; you might be able to look at reflections from bends, with location determined by path length and the amount of bend determined by the magnitude of the reflection).

The wrist is a good spot. If you can't do it totally with the motor.

Thanks for pointing out the exaggeration in the first example.
 
Can someone please explain in simple words, what is the function of Dojo? How does Dojo help solve FSD? I keep hearing 'labelling', but I don't quite understand what that means. How are others solving this problem?

This is how I understood it:

............. video ...................... labels
Cars ----------> Dojo -----------> Cars <cars can drive better now>
I'm going to give this a shot. And perhaps point out the foibles of Neural Networks in the process.
So: A Neural Net. Imagine a thousand neurons, hooked up to a bunch of nodes, maybe fifty or so. Each node has weights assigned to the 20 or so neurons attached to it; multiply a weight times the analog input on each of those twenty or so neurons; if the weighted sum goes above a threshold, then that node will fire, stimulating all the neurons leaving that node.
Then have the outputs of all those nodes go to a summary node.
Now attach pixels from a multiplicity of pictures with a giraffe in the picture. Giraffes standing, running from lions, glimpsed in the distance, right in front of one, in between trees, and so on. Have the final output node be "Giraffe or No Giraffe".
In addition, zillions of pictures with no giraffe in evidence.
Now, run through all those pictures. Each time one has a giraffe in evidence and the giraffe bit is set, do some complicated math and increase the weights on the neurons and intermediate neurons that seemed to have something to do with the correct detection. Each time one has a giraffe in evidence but no giraffe is detected, reduce weights with math; i.e., wrong answer time.
Each time one shows a picture with no giraffe and a giraffe is detected, do the wrong answer algorithm. Each time one shows a picture with no giraffe and no giraffe is detected, correct answer algorithm.
Did I mention feedback? There's feedback.
This kind of a system gets really, really good at detecting giraffes. People who have done this kind of work discover that, when shown a picture that doesn't nominally have a giraffe in it and the giraffe detector goes off, if they take a careful look: yep, there's a giraffe, 90% hidden by leaves.
Now, make it more complicated: Lots of outputs, lots of intermediate nodes, lots of cross-connections. The outputs are every animal in the savanna. And, weights and all... it works, and works well: Think about how well people pick up creatures in the savanna, like that lion trying to hide in the grass. (And those ancestors of ours whose NNs didn't work up to snuff ended up winning Darwin awards and aren't around any more to complain.)
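A toy version of that weight-nudging story, in case it helps: a single-node "perceptron" sketch. Real networks stack many layers and use gradient descent, but the feedback idea is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs = 20                        # stand-ins for the attached "neurons"
weights = rng.normal(size=n_inputs)
bias = 0.0
lr = 0.1                             # how hard each correction nudges

def detects_giraffe(x):
    # Fire if the weighted sum crosses the threshold.
    return (weights @ x + bias) > 0

def train_step(x, has_giraffe):
    global bias
    fired = detects_giraffe(x)
    if has_giraffe and not fired:    # missed giraffe: raise the weights
        weights[:] += lr * x
        bias += lr
    elif fired and not has_giraffe:  # false alarm: lower the weights
        weights[:] -= lr * x
        bias -= lr
    # Correct answers leave the weights alone.
```

Run train_step over enough labeled pictures and the weights drift toward a usable giraffe detector; Dojo's job is to make exactly that loop run at absurd scale and speed.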

Now, for the nasty part. As one might expect for a system that's designed to work on a hair trigger and works on variable weights to boot, training up such a NN means that, if one isn't careful, one might train it up for the wrong thing. Suppose that one is attempting to train a NN to play poker. All well and good; but if one puts a deck of marked cards into play and lets the NN learn from lots and lots of pictures of people playing poker, one might end up, instead of with a NN that can catch the "tells" professional poker players look for, with a NN that is really, really good at figuring out how to play when it knows what the cards in the other players' hands are! It might do pretty well in that kind of environment: And fail hugely when it has to play with unmarked decks!

So, what's the Dojo compute platform doing? Near as I can tell from AI Day, it's a platform that generates an environment that's absolutely, positively realistic, right down to the hair on the back of the dog. It generates this environment from a combination of real pictures, maps, labels on other moving things, rainstorms, you name it. Have a semi go around the bend, in a realistic way. Or a Cybertruck drive through the intersection. Or a family running down the freeway. Why do this? Because it's like buzzards on a tree waiting for something to die: You know that there's a particular corner case, because you either thought of one or saw one that was close. Now you want to run a zillion iterations against your NN for every possible version of this corner case, in all weathers, with all vehicles, in all possible road intersections and curves.
And it won't just be this corner case. It'll be the whole panoply of corner cases, regular cases, medium cases, and so on. Millions or even billions of iterations, training up the NN processor: Image recognition, object detection, 3-d physics, and so on. It's a heck of a lot faster to do this than to try and get people and video cameras to do it for you. By orders of magnitude.
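A sketch of what "every possible version of this corner case" might mean in practice, with made-up category values and a hypothetical run_simulation call standing in for the simulator:

```python
from itertools import product

# One known corner case, crossed with every condition you can vary.
weathers = ["clear", "rain", "fog", "snow", "low sun"]
vehicles = ["sedan", "semi", "cybertruck", "motorcycle"]
roads    = ["4-way intersection", "blind curve", "freeway merge"]
actors   = ["none", "pedestrian crossing", "family on the freeway"]

scenarios = list(product(weathers, vehicles, roads, actors))
print(len(scenarios), "variants of ONE corner case")  # 5*4*3*4 = 240

# for scenario in scenarios:
#     run_simulation(scenario)  # hypothetical call into the simulator
```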

And that's what the Dojo is doing. It's a real Dojo - a place where one can practice, and practice, and practice, over and over, so, when one is finally let out into the real world, one hopes one has the wherewithal to succeed.

In this environment, the Dojo internal display to the NN has to be balls-on accurate; if not, one is going to run into the marked cards problem and end up training it for something that's an artifact of the simulation itself.

And there you are.
 
Huh, upon second listen, Dojo is meant for generalized NN training, with the team using the tool sets they have built for this purpose. It's not necessarily needed to solve FSD. It almost seems like Elon is saying they will throw the vision stack into Dojo just to test whether it's better than the GPU cluster. Dojo is meant for the humanoid robot...which probably plays on the fact that usually a human trains in a dojo. So Tesla has been thinking about a general-purpose humanoid robot for years.
Solving FSD shows that the technology stack they have created can handle real-world, vision-based AI.

Dojo provides them the framework to VERY quickly spin up new applications for the FSD tech stack. Anything that requires decision making based on vision could theoretically be automated. Dojo facilitates that.

Think surgery, landing spacecraft, farming, any number of manufacturing processes, and any other application where a human acts solely based on visual inputs. I say visual, but plausibly Dojo will be able to train NNs based on other inputs (audible, tactile, etc.).

People are severely underestimating what this means. Tesla can solve FSD without dojo. Dojo makes it a lot easier and makes the solution potentially a lot more application agnostic.
 
I agree. So a reasonably smart strain gage on one bone of each of 5 fingers with CAN bus connecting them.

If you wanted a strain gage on each bone of 5 fingers, that is a 15-node CAN bus (3 bones per finger × 5 fingers).

But they do move. Solder joints don't always like that.

And the data is coming in as a discrete double precision word, or something like that.
A camera is one node. And it does not move. Two cameras are 2 nodes.

If the tendons all pass through the wrist, you could look with one stationary camera there for position. If you make the tendons elastic enough, you could see stretch. (I don't think signal to noise would work out satisfactorily).

So one stationary camera node should be more reliable than a moving wire set of any variety.

And the camera as input discipline might leverage the AI investment better.

Even with a CAN bus, every point you monitor requires physically attaching a sensor.

A camera is going to be cleaner and more reliable if you can get the signal-to-noise ratio and an aggregation point where only one camera is needed (a lens would be OK, but fiber-optic bending-loss models or whatever can hamper aggregation; you might be able to look at reflections from bends, with location determined by path length and the amount of bend determined by the magnitude of the reflection).

The wrist is a good spot. If you can't do it totally with the motor.

Thanks for pointing out the exaggeration in the first example.
I mentioned the CAN bus because, well, this is an automotive forum, and people who play with cars kind of know what a CAN bus is and what it can do. But there's plenty of different kinds of buses out there; yeah, there's SPI, I2C, $RANDOM serial buses of whatever type one wants, TCP/IP (nominally two wires), one-wire, and I can't remember that bus I implemented that was running at 100 Mb/s with asynchronous clocks over fiber. And I can't remember that one because it was some 30 years ago; this is not new stuff.
And there are very definitely speedier buses these days, up to and past 1 Gb/s, low latency, and so on.
Why mention all these buses? The CAN bus is an old, very old standard in Internet dog years. At the time it was invented it was the best thing on the market and saved lots of folks tons of cash. Build a brand new robot with some serious data gathering requirements? Sky's the limit, hardware is cheap, software is fast, multiprocessing is a thing (as is NN), so I don't really see any difficulties collecting data into a central processing unit and going to town. Lots and lots of fiddly details, you betcha. But reaction time won't be a problem.
 
Tesla is shaping up to be just as impactful to the world, and it will grow much faster than the Dutch East India Company did. The VOC had an estimated value, at its height, of over $7 trillion (inflation-adjusted). So a Tesla 10X isn't out of the realm of historical reason.
I'm honestly fairly confident in a 10x in the next decade. I'm more unsure beyond that, but as Tesla continues to expand its TAM, I am just not sure how crazy things can get...
 
Huh, upon second listen, Dojo is meant for generalized NN training, with the team using the tool sets they have built for this purpose. It's not necessarily needed to solve FSD. It almost seems like Elon is saying they will throw the vision stack into Dojo just to test whether it's better than the GPU cluster. Dojo is meant for the humanoid robot...which probably plays on the fact that usually a human trains in a dojo. So Tesla has been thinking about a general-purpose humanoid robot for years.
I think they hope Dojo is better than the GPU cluster...

Elon is just setting a test they want Dojo to pass.

IMO Dojo may or may not slightly accelerate the progress of FSD, but Tesla wants Dojo ready ASAP, just in case it does accelerate progress...

But developing Dojo just to solve FSD might have been overkill; it may have been cheaper just to keep scaling up the GPU cluster.

But add in the fact that Dojo can be used for Robot training.

IMO a robot has multiple applications, of which FSD is just one, and the likely amount of training needed for the robot is at least 10X that needed for FSD.

Then we come to the target for Dojo 2, which is 10X Dojo. I think FSD is likely solved to a very good march of 9s before Dojo 2. But we know Dojo 2 is not limited to just FSD and the robot; there are lots of potential NN applications that require training, both in-house and 3rd-party.