Welcome to Tesla Motors Club

How does dojo get the FSD training to my car? I don't mean merely OTA...

Hi all, I THINK I have the gist of FSD at the Dojo level, but I'm not sure how the Dojo rules make it to the car.
I'll try to 'splain what I'm asking, and I welcome corrections to anything I say that is wrong.
What I believe is that a zillion video clips are fed to Dojo. Maybe it's unprotected left turn day: upload a couple of hundred thousand left turns, and maybe, who knows, left turns that had collisions. Label the successful ones good, and the bad ones, you guessed it, bad.
Then later, you can input a video, and Dojo should be able to classify it as a good left turn or not, and I guess at the end of the day, Dojo can plot a good left turn based on enough variants of good or bad that were uploaded and labeled.
Fine, now I think we have left turns, at the Dojo level.
How does my car know what to do with a firmware download?
To me, and here's what I'm really asking: is the car ALSO able to run a neural net, just not train one like Dojo can? Or does Dojo somehow write code that ultimately looks like human code ("IF these parameters are true, THEN do the following"), or is our Tesla HW 3.0 able to somehow run the rules that Dojo creates?
So from my perspective, Dojo gets trained and ultimately creates rules. Then I'm wondering if the car can "read" those rules and follow them.
Or does Dojo ultimately write code that follows those rules, and the car runs the code?
As a former programmer, I never was involved with neural networks, so I can understand an if/then statement with the best of 'em, but I can't figure out how what Dojo decides in California makes it to my car in New Jersey.
I could better understand it if the car was talking to Dojo in real time, which of course would not be practical. So how does the weaker Tesla hardware do it?
Thanks for any clarification!
 

My understanding is that dojo is a massive computer that can do vast machine learning training. Hopefully, with a lot of training, the neural net is able to perform the given task with high enough reliability. So you need to test the neural net and see how good it is. If it is not good enough, you give dojo more data and do more training until it is good enough. To answer your question, once Tesla is satisfied the neural net is good enough, Tesla then takes it and makes the software that actually runs in our cars.
 

I've used this as a mental model (and it's likely oversimplified and somewhat wrong, but it's enough for a lay person to get the gist).

You can think of the car's brain as a black box (you can't see how it works inside). On one end, you give it a situation for an input, and on the other end, it outputs what it thinks is the right result/maneuver. Inside the box are a ton of dials, and they represent the neurons. When the AI is being trained with dojo or whatever computer, it's basically tuning all those dials. The goal is to find the right setting for each of those dials so that the output is the correct one for any given input. With a task as complex as driving, in order to have a chance at getting correct outputs for all the possible inputs, you need a ton of dials. And to get the settings right for all those dials, you need a ton of computing power, hence the need for something like dojo.

The black box with the dials sits in the car. But the same type of box exists at the dojo training center. When dojo training yields good outputs, the settings for all those dials at Tesla are sent to all the cars. Therefore the "brain" in every car is identical for a given firmware update.
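
To make the "dial settings get shipped" idea concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the `run_net` function, the three weights, the JSON payload); it says nothing about how Tesla actually packages its networks. It only shows that the thing being sent is plain numbers, and that the same numbers give the same output on both ends.

```python
import json

def run_net(weights, inputs):
    """Inference only: a weighted sum of the inputs plus a bias. Nothing is learned here."""
    total = weights["bias"]
    for w, x in zip(weights["w"], inputs):
        total += w * x
    return total

# "Training center" side: pretend these dial settings came out of training.
# The values are hypothetical.
trained = {"w": [0.5, -1.0, 2.0], "bias": 0.25}

# The update carries the settings as data, not as hand-written if/then rules.
payload = json.dumps(trained)

# "Car" side: load the same settings into the same kind of box and run it.
car_weights = json.loads(payload)
car_output = run_net(car_weights, [1.0, 2.0, 3.0])
center_output = run_net(trained, [1.0, 2.0, 3.0])

print(car_output, center_output)
```

Both sides print the same number because the box and the dial settings are identical; only the tuning step needs the big computer.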

In reality, there isn't just a single neural net (black box); there are several, each handling more specific functions. But that's the general idea of centralized training and distribution to the fleet. The important concept to understand here is that for any given input, past performance does not guarantee future results. A retraining of the NN could theoretically wipe out a previously correct output. This is why we see regressions in every firmware. Overall there might be a net improvement, but sometimes to address one problem, you screw up something else that was previously "solved."
 
is the car ALSO able to run a neural net, and just not really train one like dojo can? or does dojo somehow either write code that ultimately looks like human code, "IF these parameters are true, THEN do the following" OR is our tesla hw 3.0 able to somehow run the rules that dojo creates?
Yes, the car gets the neural networks as part of a vehicle software update, and these are basically a bunch of numbers that you can think of like thresholds and/or adjustments. Similar to your idea of "if condition, then result" statements, these numbers are dynamically combined with camera inputs to produce outputs such as whether there's a vehicle in front of you. FSD Beta is not fully "end-to-end" neural networks in that there is traditional code that you're more familiar with deciding what to do with these predictions such as adjusting the accelerator or steering wheel.
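
A rough sketch of that split, assuming a made-up two-number "network" and a made-up gap threshold (none of these names or values come from Tesla): the learned part turns inputs into a prediction using shipped numbers, and ordinary hand-written code decides what to do with the prediction.

```python
import math

def lead_vehicle_probability(features, weights, bias):
    """Learned part: combine the inputs with the shipped numbers, squash to 0..1."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def choose_action(probability, gap_m):
    """Hand-written part: plain if/then logic acting on the prediction."""
    if probability > 0.5 and gap_m < 30.0:
        return "slow_down"
    return "hold_speed"

# Hypothetical numbers standing in for what a software update would deliver.
WEIGHTS = [2.0, -1.0]
BIAS = -0.5

p = lead_vehicle_probability([1.5, 0.5], WEIGHTS, BIAS)
action = choose_action(p, gap_m=20.0)
print(round(p, 3), action)
```

Changing `WEIGHTS` and `BIAS` changes the behavior without touching a line of the if/then code, which is the sense in which an update can ship "just numbers."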

The reason Dojo needs a lot more compute is that it has to process many, many examples to decide what the "correct" numbers are for the neural networks that run in the car. The process of training a neural network, which I believe is what you're referring to as creating "rules," does some of the same steps a vehicle would, in that it takes video inputs and computes the output. What's different is that the labelled training data also says what the output should be, so training can slightly tweak the neural network numbers in a way that better matches that output.
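
Here is a minimal sketch of that difference, using a one-input toy model and plain gradient descent (an assumption for illustration; the thread doesn't say what training method Tesla uses). `predict` is the part a car would run; `train_step` runs the same thing and then uses the label to nudge the numbers.

```python
def predict(w, b, x):
    """Forward pass: the same computation the car would run."""
    return w * x + b

def train_step(w, b, x, target, lr):
    """Training adds one thing: compare with the label and nudge the numbers."""
    error = predict(w, b, x) - target
    # Gradient of 0.5 * error**2 with respect to w and b.
    w -= lr * error * x
    b -= lr * error
    return w, b

# Labelled examples generated from y = 2x + 1 (toy data, not real driving data).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0
for _ in range(2000):            # many passes over the data: the expensive part
    for x, target in data:
        w, b = train_step(w, b, x, target, lr=0.05)

print(round(w, 3), round(b, 3))  # the learned numbers
print(predict(w, b, 10.0))       # a single cheap forward pass
```

The loop does thousands of forward passes plus updates to find two numbers; using them afterwards takes one multiplication and one addition. Scale that up and you get the gap between training hardware and car hardware.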
 
I think IamGary's question is how does the DOJO training get transferred to our cars. The output of the DOJO is a trained neural net(s), and that output is part of the firmware download that our cars receive. Our cars don't have a normal processor, but a neural net processor that runs the trained neural net output from DOJO.
 

Yes. Our cars simply run the NNs that Dojo trained.
 
Spot on, thanks. Yeah, my belief is that Dojo sucks up a ton of videos and creates a neural net, or rules in the neural net - great. Then somehow, someway, I guess that "end part" gets cloned to the cars in the firmware update. So while the cars don't have the mental horsepower to create the rules, they can receive the rules and then act accordingly.
I'm also gathering from the responses above that there is in fact no "coded" if-this-then-that per se, but rather datasets or instructions as vomited by Dojo.
 
Thanks, I agree with the black box concept. Your concept of dials was actually quite helpful - and yeah, no actual dials, I get that.
A much tougher question would be to ask how Dojo can analyze the video clips, yada, but for my purposes the question is essentially "how can a probably powerful, but relatively weak, computer in a car run the Dojo output?" - and the answer (to me, simplified) would be "the outputted Dojo rules fall within the car's computational ability."
Next question would be, "okay, Autopilot is pretty decent for what it is, but why does it accelerate and brake like a teenager?" For that, I imagine there need to be some integrals or similar to decide how much acceleration or deceleration to use against a potentially moving target car in front, and possibly even the car behind. So far I think AP and even NoA are superb for lane keeping, but miserable when coming to a stop and then, say, the cars in front start moving again... it's like a lead-footed teenager driving!
Thanks again!
 

DOJO creates or trains the NN. Our cars simply run the NN. Dojo needs more power so that it can take in a ton of data and create a large NN in a reasonable amount of time. It takes much less power to simply run the NN. The FSD computer in our cars is able to run the NN.
 

Essentially they have a methodology for reducing the inputs to a set of numbers that can be fed through the neural net.

Then how it executes the actions required for those decisions is other specific programming.

But, as it makes and executes decisions the outside world is changing so then there's feedback.

This can cause the problematic jerky behavior. To make things smoother you don't want it treating every instant as a separate event. As I understand it, part of recent changes in their approach has been to try to be able to go "4D".
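
As a generic illustration only (this is a plain exponential moving average, not whatever Tesla means by "4D"), here is how letting each command depend on recent history, instead of treating every frame on its own, takes the edge off a jumpy signal. The `raw` values are invented.

```python
def smooth(commands, alpha=0.3):
    """Exponential moving average: blend each new command with the running history."""
    smoothed = []
    previous = commands[0]
    for c in commands:
        previous = alpha * c + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed

def max_step(values):
    """Largest jump between consecutive values, a crude measure of jerkiness."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))

# Made-up per-frame acceleration commands that flip-flop every frame.
raw = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
filtered = smooth(raw)

print(max_step(raw), max_step(filtered))
```

The trade-off is lag: the smoother the output, the slower it reacts to a real change, which is one reason simple filtering alone isn't a full answer.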
 
so far i think AP and even NoA is superb for lane keeping, but miserable when coming to a stop and then say, the cars in front start moving again...it's like a lead footed teenager driving!
To be clear, you haven't used FSD Beta? The current production Autopilot stack is around 4 years old, and it's basically predicting the position and velocity of the lead vehicle to maintain a desired distance. I believe FSD Beta directly predicts up to the 4th time derivative of position (0th: position -> 1st: velocity -> 2nd: acceleration -> 3rd: jerk -> 4th: snap/jounce) to better determine how fast your car should accelerate, whereas NoA probably uses a bunch of heuristics and manually calculated derivatives to close the gap, such as assuming the lead vehicle is more likely to accelerate back up to highway speeds.

[Potentially the speaker in the linked presentation actually meant 4 "derivatives" referring to 0th, 1st, 2nd, 3rd derivatives, so "only" up to jerk; and I wouldn't be surprised by an "off-by-one" type error with an engineer talking about this. But in any case, jerk is probably the key aspect of accelerating at an appropriate rate.]
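
For anyone who wants to see the derivative chain in numbers, here is a small sketch using finite differences on made-up position samples (position = t cubed, sampled once per second). It only illustrates what velocity, acceleration, jerk, and snap are; it is not a claim about how either software stack computes them.

```python
def derivative(samples, dt):
    """Finite-difference derivative: change between neighbors divided by the time step."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

dt = 1.0
position = [float(t ** 3) for t in range(6)]   # 0th derivative (toy data)
velocity = derivative(position, dt)            # 1st
acceleration = derivative(velocity, dt)        # 2nd
jerk = derivative(acceleration, dt)            # 3rd
snap = derivative(jerk, dt)                    # 4th

print(velocity)
print(acceleration)
print(jerk)
print(snap)
```

Each level is one sample shorter than the one before it, and any noise in the positions gets amplified at every step, which is a practical reason a system might prefer to predict these quantities directly rather than difference them by hand.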
 
OP, the real answer is, no one really knows how they're using "Dojo", or if they're even using it at all yet.
That's what I thought too.
I thought that Dojo was a development in process, not in use. They are working on the dojo compute resource and have yet to actually build the computer made up of dojo nodes.
But the question is still relevant - They are using "some resource" to generate the NN, it will just get faster and be able to process more video when it is able to run on the finished Dojo.
 
And like any real-time, cost-sensitive production system, the car's h/w has limited resources and likely battles to schedule and overlay many large NNs onto the processing cores/matrix/arrays. They try to optimize by dividing work between the cores, but best-laid plans don't always work in the real world. Time will tell if V11's AP/FSD vertical stack makes seamless operation ever so slightly more challenging, with potentially worse lag and indecisiveness.
 
A NN training analogy might be like placing fingers in the visibly leaking holes of a water dike. You can minimize visible leaks by adjusting NN weights so the overall NN error is minimized. But in between the visible leaks are slow leaks that may or may not be covered by parts of the hand or finger webbing, and that's where NN generalization comes into play. NN generalization is the magic that doesn't always work well. That's why NNs need lots of varied real-world data from good, clean, accurate sensors.
 