Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.
You are wrong. It has been available to key customers in beta status since summer. You're welcome!
So let's look at this from a business standpoint. Tesla knew 2 years ago or so that they needed massive compute power to train the NNs. What were their options?

1. Try to use off-the-shelf hardware (servers/cloud/GPUs etc) and deploy massive numbers of these.
2. Find someone who has built the kind of custom hardware they need.
3. Build it in-house.

#1 isn't efficient .. CPUs can do it, of course, but at huge power expense (which means $$$$ and time). For #2 there are limited options; pretty much the only choice is Google. #3 means dedicating resources to the design. No doubt there was some "ooh, we want to build that" from the design team (after all, they already had chip-design experience from HW3), but in-house designs always overrun, and have a huge long-term maintenance cost.

So, why didn't they use Google? I think it's pretty obvious. They would be taking (indirectly) a dependency on their #1 major competitor .. Waymo. From a business perspective, this is high-risk. Also, the Tesla mind-set and company culture is vertical integration .. they want to own the entire stack, so they can fine-tune and control it going forward as a competitive edge over less integrated competitors.

I don't know if this will turn out to be a good idea or not, but I'd put money on something like the above being part of their decision-making process.
 
Also, there's no actual evidence Google's solution is "better" (let alone cheaper) for the thing Tesla is actually doing here.

See again: nearly every field in the comparison chart for Google's v4 was blank.

There are CAD GPUs that beat gaming ones on certain benchmarks but run like garbage at specific tasks the gaming GPUs excel at, and vice versa.

Likewise, Google's solution might work better for SOME tasks, but not specifically what Tesla needs, where Dojo outperforms it.

The fact that Tesla is NOT trying to use Google's v4 suggests that is the case too. Given they have mountains of cash and a stated need for more compute for specific tasks, they'd have little reason not to be using it the second it's available, at least until Dojo is ready to go.
 
Karpathy said they are highly protective of their data. Suspect they'd rather not upload their data to Google with a connection to Waymo. Cloud costs will likely be higher than a custom solution over the long run. There is something to be said about controlling your own destiny. I doubt Tesla ever seriously considered a Google Cloud solution.
 
Karpathy said they are highly protective of their data. Suspect they'd rather not upload their data to Google with a connection to Waymo. Cloud costs will likely be higher than a custom solution over the long run. There is something to be said about controlling your own destiny. I doubt Tesla ever seriously considered a Google Cloud solution.
Until Dojo is ready, I'm sure they will be using large clusters of off-the-shelf GPUs in the cloud.
 
Off the shelf GPUs? Yes.

In the cloud? Nope.

They physically built it themselves in-house.

Ah, hadn't seen this. Generally speaking, setting up on-prem datacenters, especially at gigantic scale, is cost-prohibitive. Many large AI companies do ML training in the cloud (on AWS or GCP). But I take it back if Tesla has already publicly stated that they have created everything in-house :D
 
Also, there's no actual evidence Google's solution is "better" (let alone cheaper) for the thing Tesla is actually doing here.

See again: nearly every field in the comparison chart for Google's v4 was blank.

There are CAD GPUs that beat gaming ones on certain benchmarks but run like garbage at specific tasks the gaming GPUs excel at, and vice versa.

Likewise, Google's solution might work better for SOME tasks, but not specifically what Tesla needs, where Dojo outperforms it.

The fact that Tesla is NOT trying to use Google's v4 suggests that is the case too. Given they have mountains of cash and a stated need for more compute for specific tasks, they'd have little reason not to be using it the second it's available, at least until Dojo is ready to go.

This is a myth that has been spreading in the Tesla community. Not surprised you picked it up and ran with it.
Dojo is completely general purpose and not specialized. No one is stupid enough to build a specialized training system, as it would be DOA because of how fast NN architecture is advancing. Any ML expert would tell you that.
We know the type of NN FSD Beta runs. This isn't some hidden secret. It's not some special NN architecture; it's run-of-the-mill.

The "Tesla is doing something special" claim is one among the thousands of other Tesla myths.

Finally, WE actually have MLPerf benchmarks from Google's TPU v4; we have ZERO from Dojo and won't for many years. In fact, Dojo as presented doesn't even EXIST, and won't for several years, if ever.
 
This is a myth that has been spreading in the Tesla community. Not surprised you picked it up and ran with it.
Dojo is completely general purpose and not specialized.

The myth is that you ever make an honest post.

I cited actual real world examples of two GPUs that are specialized.

They can both execute the same tasks, but one is MUCH better at a specific class of tasks than more generalized hardware.

The other is better at a DIFFERENT specific class of tasks than more generalized hardware.


So the idea that Tesla would bother to custom-design a GENERAL PURPOSE chip to primarily handle a SPECIFIC TASK (processing computer vision data) is nonsensical.

In fact the CURRENT GPU cluster Tesla is using is a great example of that. It's a ton of Nvidia A100s.

Which are "GPUs," but they'd be terrible at video games compared to something designed for that task. Just as a gaming GPU would be much worse at the specific task they're using the A100s for.
 
This is a myth that has been spreading in the Tesla community. Not surprised you picked it up and ran with it.
Dojo is completely general purpose and not specialized. No one is stupid enough to build a specialized training system, as it would be DOA because of how fast NN architecture is advancing. Any ML expert would tell you that.
We know the type of NN FSD Beta runs. This isn't some hidden secret. It's not some special NN architecture; it's run-of-the-mill.

The "Tesla is doing something special" claim is one among the thousands of other Tesla myths.

Finally, WE actually have MLPerf benchmarks from Google's TPU v4; we have ZERO from Dojo and won't for many years. In fact, Dojo as presented doesn't even EXIST, and won't for several years, if ever.
It’s always interesting to come across writing like this. A study in intellectual superiority. Thank you for sharing.
 
... Dojo is completely general purpose and not specialized. ...
Are Dojo's 8- and 16-bit floating-point numbers specialized?
Quote:
This standard specifies Tesla arithmetic formats and methods for the new 8-bit and 16-bit binary floating-point arithmetic in computer programming environments for deep learning neural network training.
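To make that concrete, here's a minimal Python sketch of decoding one hypothetical 8-bit layout (1 sign bit, 4 exponent bits, 3 mantissa bits, an E4M3-style split). The bit split and the bias of 7 are illustrative assumptions, not taken from Tesla's actual standard, which makes these parameters configurable.

```python
def decode_fp8_e4m3(byte):
    """Decode an 8-bit float laid out as 1 sign / 4 exponent / 3 mantissa bits.

    An exponent bias of 7 is assumed here for illustration; a configurable
    format like CFloat8 lets software choose the bias.
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF        # 4 exponent bits
    mant = byte & 0x7              # 3 mantissa bits
    bias = 7
    if exp == 0:                   # subnormal: no implicit leading 1
        return sign * (mant / 8) * 2.0 ** (1 - bias)
    return sign * (1 + mant / 8) * 2.0 ** (exp - bias)

# 0x38 -> sign 0, exponent 7, mantissa 0 -> 1.0 under this layout
print(decode_fp8_e4m3(0x38))
```

With only 256 possible values, the useful trick is picking where those values sit on the number line, which is exactly what making the format configurable buys you.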
 
The myth is that you ever make an honest post.

I cited actual real world examples of two GPUs that are specialized.

They can both execute the same tasks, but one is MUCH better at a specific class of tasks than more generalized hardware.

The other is better at a DIFFERENT specific class of tasks than more generalized hardware.


So the idea that Tesla would bother to custom-design a GENERAL PURPOSE chip to primarily handle a SPECIFIC TASK (processing computer vision data) is nonsensical.

In fact the CURRENT GPU cluster Tesla is using is a great example of that. It's a ton of Nvidia A100s.

Which are "GPUs," but they'd be terrible at video games compared to something designed for that task. Just as a gaming GPU would be much worse at the specific task they're using the A100s for.

It's not the same. When chip designers talk about a specialized NN training system, they are talking about a system where the NN architecture is built into the hardware. If it's not doing that, then it's not specialized, period.
 
It's not the same. When chip designers talk about a specialized NN training system, they are talking about a system where the NN architecture is built into the hardware. If it's not doing that, then it's not specialized, period.
No way ... things can be "specialized" in the sense that they are aimed at specific vertical applications but are still flexible enough to adapt to differing needs .. for example, a DSP for audio processing etc.
 
Yeah you've got a guy who has no idea WTF he's talking about (again) being caught making that obvious (again) and now trying to move goalposts about what "specialized" and "generalized" hardware actually mean to make it less obvious (again).

At least he's consistent, even if it's consistently wrong :)

BTW, citing DSPs is an excellent additional example of the actual meanings. They can do LOTS of jobs generally, but you can still absolutely build ones that do specific things better than others (and most do so)....
 
At A.I. Day, a reporter asked the hardware design lead if they could patent some of the stuff. He replied, "I'm not sure you can patent a linear algebra processor," implying it is generalized. And yes, you can have generalized and specialized together, although I agree with bladerskb that it is much more general than specialized.

Karpathy tweet about how neural network architectures are becoming more generalized:
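For what "linear algebra processor" means in practice: the dominant work in NN training is dense matrix multiply, i.e. enormous numbers of multiply-accumulate operations. A toy pure-Python sketch of that core operation (real hardware runs this same pattern across thousands of parallel units):

```python
def matmul(a, b):
    """Naive dense matrix multiply: the multiply-accumulate pattern that
    NN training hardware (GPUs, TPUs, Dojo) is built to run at scale."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = a[i][k]
            for j in range(cols):
                out[i][j] += aik * b[k][j]  # the multiply-accumulate step
    return out

# A dense layer's forward pass is essentially activations @ weights.
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

Specializing hardware for this one pattern while keeping it programmable is why "generalized vs. specialized" is a spectrum, not a binary.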
 
At A.I. Day, a reporter asked the hardware design lead if they could patent some of the stuff. He replied, "I'm not sure you can patent a linear algebra processor," implying it is generalized.

That doesn't imply that at all. It implies you can't patent math.

Tesla is using a couple of specific types of math, and thus designed their processor to SPECIALIZE in those operations.

There are plenty of processors out there, as multiple folks have noted now, that while they CAN run a lot of generalized stuff, are SPECIALIZED to run specific things much, much better than more generic/generalized silicon.

Same thing here.

And it's not like Tesla doesn't already HAVE some patents about specificity either:


According to the patent publication, an embodiment of systems and methods include techniques and systems that are specifically described to determine neural network configurations, which are adapted to a specific platform


But if you want something Dojo-specific that Tesla is creating, here ya go:

Tesla extended precision support by introducing Configurable Float8 (CFloat8), an eight-bit floating-point format. This format reduces the memory storage and bandwidth needed for the weights, activations, and gradient values essential for training increasingly larger networks.


Tesla published a 9 page whitepaper on this (mentioned in link above) and more on it here:

It mentions another standard Google created to help with NN/AI stuff previously, and how Tesla is now doing a couple of their own to take this further.
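The storage/bandwidth claim in that CFloat8 description is simple arithmetic. A back-of-the-envelope sketch (the one-billion-parameter count is an arbitrary illustration, not a Tesla figure):

```python
# Storage needed for network parameters at different precisions.
# The parameter count here is made up purely for illustration.
params = 1_000_000_000
bytes_fp32 = params * 4   # float32: 4 bytes per value
bytes_fp16 = params * 2   # float16 / bfloat16: 2 bytes per value
bytes_fp8 = params * 1    # 8-bit formats like CFloat8: 1 byte per value

print(bytes_fp32 // bytes_fp8)   # 4x less storage (and bandwidth) than fp32
print(bytes_fp32 // bytes_fp16)  # 2x less than fp16
```

Since training throughput at this scale is often bandwidth-bound, halving or quartering the bytes moved per value translates fairly directly into speed.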

Dojo is HW SPECIALIZED for these operations.

That doesn't mean it couldn't run other stuff of course. Just that it's better than anything else at those SPECIALIZED tasks.


Google's v4 may well be faster at OTHER tasks, of course. But why would Tesla care, since those aren't the ones they need to be faster at?