
Project Dojo - the SaaS Product?

That looks like a picture of the 2021 Dojo tile on display at the Petersen Automotive Museum (the first pic seems to show the label on the display)

I mean, I've seen that too- it was cool looking... but production ones doing useful work would be a lot cooler to see.
 

Day 09: Nine Ladies Dancing | Dojo as a Service

Part of 12 Days of Christmas - Tesla Edition, a series (c) by the Artful Dodger, Dec 2023

Over this Yuletide season, I will post a daily installment focusing on Tesla products, past, present, and future (please note that I will express major themes as short-hand bullet points as I Yule tidy-up the loose ends). Here's the series so far:

Day 01: A Partridge in a Pear Tree | Roadster Proof of Concept
Day 02: 2 Turtle Doves | S/X Fraternal Twins go Mainstream
Day 03: 3 French Hens | Model 3 Bets the Company
Day 04: 4 Calling Birds | Model Y Built at Four Factories
Day 05: Five Golden Rings | Semi Breaks Physiks
Day 06: Six geese a-laying | Megapack To Excel
Day 07: Seven Swans a-Swimming | Cybertourdeforce
Day 08: Eight Maids A-Milking | Model 2 World Car

Intro to Part 09: Nine Ladies Dancing | Dojo as a Service

Back at Tesla's first AI Day on Aug 19, 2021, the stage was set for the unexpected debut of the DOJO chip. This turned out to be the basic functional unit of a much larger supercluster consisting of 'training tiles' of these chips, run in a massively parallel group. This chip reveal exposed the key to Tesla's secret plan to create their own neural net (AI) training system, one that promised to become one of the world's largest supercomputers (designed and built in-house by Tesla engineers).

1. I/O Bottleneck

  • The problem with hand-coding FSD is that, after writing 300K lines of C++ code, the original problem still isn't solved (and the code is buggy, bulky, and over-budget)
  • Artificial Intelligence in the form of multiple Neural Nets (NN) changes the paradigm by providing literally billions of tunable parameters at the expense of not having direct control over individual weights (indeed, attributing 'causality' in a NN requires an AI psychologist)
  • So we trade the limitations of human programmers for the uncertainty of statistics, generated by a NN training system which processes vast amounts of data to create its end product: a NN intended to be used by a much more modest agent to make decisions in real time
  • Project Dojo was designed to make that training regime more efficient. It was built to move data. Really, really large amounts of data. Like a fleet of VW Buses full of data. Quickly.
  • What previously took 30 days to create with Tesla's Nvidia A100 GPU cluster may take just 8 hrs on DOJO (see the back-of-the-envelope sketch after Lesson 1). Why is that important? Because the Autopilot team is experimenting with various techniques to see what type and amount of data, and what level of architecture vs self-learning, is most effective. This is empirical science and engineering; there is no wrong path, only paths that lead nowhere. The 'Fail Fast' mantra of Silicon Valley applies to Tesla AI/Robotics.
  • In 2023, Tesla spent billions on Nvidia GPU hardware to support its insatiable NN training needs, and will spend more next year. Eventually though, Tesla will be a vendor of superfast NN training equipment, surely one of the fastest growing sectors of the economy over the next 20-30 yrs
Lesson 1: Move it or Lose it
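
To put that speedup in context, here's a quick back-of-the-envelope sketch in Python (the 30-day and 8-hour figures are the ones quoted above; the runs-per-quarter framing is my own illustration):

```python
# Back-of-the-envelope: what does the quoted Dojo speedup buy?
HOURS_PER_DAY = 24

a100_run_hours = 30 * HOURS_PER_DAY   # ~30 days per training run (quoted above)
dojo_run_hours = 8                    # ~8 hours per training run (quoted above)

speedup = a100_run_hours / dojo_run_hours
print(f"Speedup: ~{speedup:.0f}x")    # -> ~90x

# Illustrative: experiments the Autopilot team could run in one quarter
quarter_hours = 90 * HOURS_PER_DAY
print(f"Runs/quarter on A100 cluster: ~{quarter_hours / a100_run_hours:.0f}")  # ~3
print(f"Runs/quarter on Dojo:         ~{quarter_hours / dojo_run_hours:.0f}")  # ~270
```

Ninety-odd times faster isn't just convenience; it's the difference between a handful of experiments per quarter and hundreds, which is what 'Fail Fast' actually requires.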

2. A New Metric

  • In 2016, Tesla began its own chip design project to create the FSD computer used in HW3 (in service since Apr 2019). It was roughly 20x more powerful than the Nvidia hardware it replaced, yet consumed <100W of power. Thus a new mantra was coined: "Useful Compute per kWh"
  • Tesla bought an "NN trimming" company as an 'acqui-hire' for its 12 engineers. This small group demonstrated skill in greatly reducing the size of a NN w/o dramatically affecting performance (a minimal sketch of one such technique follows Lesson 2). Indeed, the entire Tesla Autopilot team is only a few hundred engineers.
  • As always, there is a delicate dance between the cost of training the NN and the size of the NN which can be used by the agent (i.e., the FSD computer). This is the art in engineering, and it takes time and effort to reach the right balance. Right now, Tesla is throwing more resources at the training hardware side, believing that is the low-hanging fruit. We should know in a year.
Lesson 2: Make it Fast; Make it Efficiently
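
For flavor, here's a minimal sketch of magnitude-based pruning, one classic "NN trimming" technique (my assumption for illustration; neither Tesla nor the acquired team has published which methods they actually use):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until roughly `sparsity`
    fraction of the layer is zero (a classic trimming baseline)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights
    # k-th smallest absolute value becomes the cut line
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

layer = np.random.randn(256, 256).astype(np.float32)
pruned = magnitude_prune(layer, sparsity=0.9)  # drop ~90% of weights
print(f"Nonzero weights remaining: {np.count_nonzero(pruned) / pruned.size:.1%}")
```

The point of the exercise: a heavily pruned net is cheaper to run on the modest in-car computer, which is exactly the training-cost-vs-agent-size dance described above.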

3. AI training as a Service

  • Dojo v1 is optimized for processing video. Aside from other automakers who may decide to license Tesla FSD (and possibly Optimus robot applications), there is a relatively small market for video training at present (though this may change)
  • Dojo v2 has hardware optimized for general AI training, such as large language models (LLMs) and generative AI (e.g., deep fakes!), and thus has a much broader TAM
  • It is this new generation of Dojo training services which I believe Tesla intends to commercialize (there are job postings out today for a Data Center API Programmer; a purely hypothetical sketch of such an API follows Lesson 3). Who knows where this will go, but I am certain TSLA shareholders will 'get their beaks wet' in this AI fountain
  • Lastly, the race to AGI is a National Security issue, with both China and Russia staking out claims in this latest land rush. The U.S.A. will not be left behind (kicking and screaming).
Lesson 3: Build a Better Mouse Trap (and they'll build a better mouse)
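
No such API is public, so purely as a thought experiment, a "Dojo as a Service" client might look something like this (every endpoint, field, and tier name here is hypothetical):

```python
import json
import urllib.request

# Hypothetical endpoint -- Tesla has published no such API; this is
# only a sketch of what 'training as a service' could look like.
DOJO_API = "https://dojo.example.com/v1"

def submit_training_job(dataset_uri: str, model_config: dict, api_key: str) -> str:
    """Submit a training job and return a job ID (all fields hypothetical)."""
    payload = json.dumps({
        "dataset_uri": dataset_uri,    # e.g., a customer's video corpus
        "model_config": model_config,  # architecture + hyperparameters
        "accelerator": "dojo-v2",      # hypothetical hardware tier
    }).encode()
    req = urllib.request.Request(
        f"{DOJO_API}/jobs",
        data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

That is the whole pitch of the SaaS angle: customers bring data and a config, Tesla's data center does the heavy lifting, and a trained NN comes back.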

Conclusion:

Tesla Dojo will be the 'Watt steam engine' that powers AI agents (ranging from FSD to Optimus), and increasingly, some 3rd-party AI applications, until the dawn of true AGI. Right now, the agent (whether a car or humanoid) has only a limited "inference" computer, meaning it cannot learn directly; it can only apply a set of connections created in the data center training environment. It is my belief we won't see true AGI until the agent can dream (reprocess experiences internally, conduct what-if trials, and update its own NN weights). Whether Tesla makes that leap, or some other entity, remains to be seen. And whether that day is 4 yrs in the future, or 40, does not matter a whit if we humans cannot thrive in our new environment. The time is now to stake a claim.
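
That inference-vs-training split is concrete enough to sketch: the deployed agent runs only a frozen forward pass, while the data center also runs the backward pass that updates the weights. A toy illustration (mine, not Tesla's stack):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # weights created in the data center

def infer(x: np.ndarray) -> np.ndarray:
    """What the in-car computer does: forward pass only, weights frozen."""
    return np.tanh(W @ x)

def train_step(x: np.ndarray, target: np.ndarray, lr: float = 0.01) -> None:
    """What only the data center does: adjust W from experience."""
    global W
    y = np.tanh(W @ x)
    # gradient of 0.5*(y - target)^2 back through tanh(W @ x)
    grad = ((y - target) * (1.0 - y**2))[:, None] * x[None, :]
    W -= lr * grad
```

The agent ships with `infer` alone; until something like `train_step` (or the 'dreaming' described above) runs on board, the learning loop stays in the data center.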

Next up: Humanoid robots in the Factory (and Healthcare, and gasp, the Post Office)

Tomorrow's Topic:

Day 10: Ten Lords A-Leaping | Teslabot FTW

 
Elon held an X Space with Peter Diamandis. Below is a link to a compressed version. See particularly ~43:30, when Elon starts talking about the acceleration of AI computing growth.

He notes a 10x-per-6-months growth rate in AI computing power industry-wide and says it is the fastest technology acceleration he has ever seen. He mentions a GW-class compute cluster being built in Kuwait (700,000 A100-equivalents), a 500 MW compute cluster, and multiple 100 MW compute clusters. He says he's not clear what you would do with so much computing power, because you would run out of training data, although I suspect he has a few ideas...

Compare that to Tesla's projections of 300,000 A100-equivalents available to the company by EOY 2024, and maybe 60,000 A100-equivalents available to it today.
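
That 10x-per-6-months pace compounds startlingly fast. A quick sanity check using the numbers quoted above (the year-out extrapolation is mine):

```python
# Tesla at ~60k A100-equivalents today, targeting ~300k by EOY 2024 --
# how does that compare to the industry-wide pace Elon describes?
tesla_now = 60_000      # A100-equivalents today (quoted above)
tesla_eoy = 300_000     # A100-equivalents by EOY 2024 (quoted above)
months = 12             # roughly a year out

tesla_growth = tesla_eoy / tesla_now       # -> 5x in ~12 months
industry_growth = 10 ** (months / 6)       # 10x per 6 months -> 100x per year
print(f"Tesla plan:    ~{tesla_growth:.0f}x in {months} months")
print(f"Industry pace: ~{industry_growth:.0f}x in {months} months")
```

If those figures are even roughly right, Tesla's planned 5x expansion would still trail the industry-wide curve he describes.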

 


AWS is already on there at both ends? Not sure what you mean by "AWS as a service" given the S in AWS is service.

Since nobody else can buy Dojo, and there's no "as a service" offering in existence either, and there's as yet no evidence it's actually of value compared to alternatives other than Teslas own supply limits (Elon himself said Dojo might be entirely unnecessary if Nvidia sold them more chips), I wouldn't think adding it made sense unless/until one or more of those changes.