Nvidia's software is often cited as a huge positive. It probably is right now, but it will soon be a liability. Another area where Dojo can leapfrog.
The end game is that, in addition to the CPU/GPU/flash/RAM silicon we have in every computing device, there will be additional silicon for AI inference (i.e. running the neural networks). That will be TPUs, not GPUs. There will be lots of different TPUs, each matched to its intended use, e.g. the Tesla Autopilot computer or the 16-core Neural Engine in Apple's M1 series. If you are a software developer (i.e. a neural network consumer), you're probably going to use an abstraction layer that can talk to all of these devices without hardcoding for any one of them (so that you don't depend on Nvidia CUDA). I'm currently using ONNX; see the sketch after the list below. Nvidia is probably not going to be dominant here because:
- SoC builders like Intel, AMD, and Apple will build that functionality into their SoCs (in fact they already have, to some degree).
- If it's not built into the SoC, there is still the option of a dedicated TPU. Google sells one cheaply (Coral), there are others, and probably an order of magnitude more are under development.
Nvidia sells its own Arm-based SoC products in this market: the Jetson and Xavier lines (NVIDIA Embedded Systems for Next-Gen Autonomous Machines).
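Since I mentioned ONNX above, here's a minimal sketch of what that abstraction layer looks like in practice with ONNX Runtime. The model file name and input name are placeholders for your own model; the point is that the provider list is just a preference order, so nothing is hardcoded to CUDA:

```python
import numpy as np
import onnxruntime as ort

# See which accelerators this machine exposes (CUDA, TensorRT, CoreML, CPU...)
print(ort.get_available_providers())

# Same model file runs on whatever is available; CUDA if present, else CPU.
sess = ort.InferenceSession(
    "model.onnx",  # placeholder path to your exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {"input": x})  # None = return all model outputs
```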
TPUs give more bang for the buck than GPUs.
Then there is the training part of AI. It requires massive data centers, massive amounts of data, and massive amounts of compute. For the moment Nvidia is dominant here, but they charge so much for their H100 GPUs (NVIDIA H100 Tensor Core GPU) that their customers will all look for cheaper alternatives. Tesla even built their own system, Dojo, in about one or two years. It's difficult to imagine that Nvidia will remain without competition here for long.
A new inference load at xAI
Seems like it was a large amount of compute coming online from various sources. Imo this is what Tesla is doing well: being agile. While everyone else struggles to get Nvidia chips, they have already diversified to alternate suppliers, brought work in-house, and gotten things up and running fast. And they are not afraid to make the billion-dollar investment in compute. Meanwhile, where are Toyota, VW, etc. with this? Have we heard anything?
After watching Karpathy's recent interview I gleaned something I think is very important with regard to Dojo.
He was talking about how horribly inefficient today's GPUs are. Even Nvidia's best are power-hungry monsters that just don't do their jobs very well compared to the human brain, which runs on only about 20 watts. It's obvious that when it comes to AI hardware, we are doing it wrong.
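For a sense of scale, here's my own back-of-envelope on the power gap, using the ~700 W TDP of an H100 SXM module against the ~20 W commonly cited for the brain:

```python
# Back-of-envelope power gap (my numbers, not from the interview):
# ~700 W TDP for an H100 SXM module vs. ~20 W commonly cited for the brain.
GPU_WATTS = 700
BRAIN_WATTS = 20
print(f"~{GPU_WATTS / BRAIN_WATTS:.0f}x the brain's power draw")  # ~35x
```

And that's one chip against a whole brain, before counting the rest of the rack.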
Clearly, there is unbelievable potential for improvement.
So the Dojo project can be justified on the basis that AI hardware is in its infancy. It's time for exploration. Like the gold rush, most will fail, but the one who digs in the right spot will be richly rewarded.
There's gold in them thar hills!
A tidbit of news about the next generation of Dojo from TSMC at the North American Technology Symposium. The stated timeline is packages of 5.5 times the reticle limit by 2026, which would enable 3.5 times the compute, and full wafer-scale integration by 2027, which enables 40 times the compute.
Expect a Wave of Wafer-Scale Computers: TSMC tech allows for one version now and a more advanced version in 2027 (spectrum.ieee.org)
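For a feel of where those multiples could come from, here's my own back-of-envelope geometry (not from the article), comparing a full 300 mm wafer to a single ~830 mm² lithography reticle:

```python
import math

# My own rough numbers, not TSMC's:
RETICLE_MM2 = 830          # approximate lithography reticle limit
WAFER_DIAMETER_MM = 300

wafer_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2   # ~70,700 mm^2
print(f"raw area ratio: ~{wafer_mm2 / RETICLE_MM2:.0f}x")  # ~85x

# The quoted 40x compute figure is well below the raw ~85x area ratio,
# presumably because wafer area also goes to interconnect, power
# delivery, memory, and yield margins.
```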
Sounds like Tesla will be bringing a lot of inference chips into their data center. Why? I think it is to do extremely fast validations of new training builds. Instead of running a new build in shadow mode for weeks in their fleet they will now be able to run a new build on a large cluster of inference chips using recorded video or even simulations. This will speed up the validation process by a huge amount.
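A minimal sketch of what that kind of validation replay could look like, with entirely hypothetical names (evaluate_clip, the clip IDs, and the disagreement metric are all stand-ins, not Tesla's actual tooling): fan recorded clips out across many workers standing in for inference nodes, then aggregate a score for the candidate build.

```python
from concurrent.futures import ProcessPoolExecutor
import random

def evaluate_clip(clip_id: int) -> float:
    """Stub: run the candidate build over one recorded clip and return
    its disagreement vs. ground truth. Here it's just a random number."""
    return random.random()

if __name__ == "__main__":
    clip_ids = range(10_000)  # placeholder for the recorded-clip library
    with ProcessPoolExecutor() as pool:  # each worker = one "inference node"
        scores = list(pool.map(evaluate_clip, clip_ids))
    print(f"mean disagreement: {sum(scores) / len(scores):.4f}")
```

The win is embarrassingly parallel: a cluster of cheap inference chips can chew through weeks of shadow-mode driving in hours.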
Correcting myself here. According to an earlier post from Elon, the inference chips are used during the training step to generate the loss number. So they use the actual production inference chips during training as part of the backpropagation loop.
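If I'm reading that right, it's in the spirit of quantization-aware training: compute the loss on a forward pass that behaves like the production inference chip, and let gradients flow back on the training hardware. A minimal PyTorch sketch of that general idea (my illustration, not Tesla's pipeline; the fake-quantization step stands in for the real chip):

```python
import torch
import torch.nn as nn

SCALE = 0.05  # made-up quantization step standing in for the chip's precision

class FakeQuant(torch.autograd.Function):
    """Round activations the way low-precision inference hardware would,
    but pass gradients straight through so backprop still works."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x / SCALE) * SCALE
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # straight-through estimator

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))

logits = FakeQuant.apply(model(x))            # "inference-chip" forward for the loss
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                               # gradients computed on training hardware
opt.step()
print(f"loss: {loss.item():.3f}")
```

The design point: the loss you optimize is the one the production silicon will actually produce, so training and deployment don't drift apart.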