Once again, Lex and Jim Keller knock it out of the park. This was a great talk that covered many topics, and listening to it you get a great sense of what Jim is like in his entirety.
Also, I'm finding it a bit hard to describe the talk in a way that is approachable to this audience, but here goes...
TL;DListen - Dojo is a game changer for FSD: it should be 10x to 1,000,000x more efficient, while being less risky and far more scalable as a way to solve FSD.
More details below...
Why? GPUs and other options are not designed solely for solving FSD, let alone for solving FSD the exact way Tesla wants to solve it. When Tesla started trying to solve FSD, they had no idea if, when, or how often they would hit 'local maxima' (also known as 'hitting a wall', where the model simply stops getting better; one step forward, one step back). Building something like Dojo is the least painful and quickest path to solving FSD, and most likely the cheapest as well.

Dojo will work seamlessly with their inference chip (the chip in the car) to do inference, as well as possibly other tasks (I won't go into that now). These chips working together also removes huge bottlenecks: iteration time from training to inference drops dramatically, cost goes down, and quality goes up. Engineers can spend more time solving FSD rather than solving infrastructure issues like throughput constraints, data quality, data structure, data cleansing, etc.
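(Side note for the curious: the 'local maximum' failure mode is easy to see in a toy example. The sketch below is pure illustration and has nothing to do with Tesla's actual code: plain gradient descent on a made-up non-convex function just slides into whichever valley it starts near, so two different starting points end up stuck in different, unequal minima.)

```python
import math

def loss(w):
    # Toy non-convex "loss" with several valleys (local minima)
    return math.sin(5 * w) + 0.5 * w * w

def grad(w):
    # Derivative of the loss above
    return 5 * math.cos(5 * w) + w

def descend(w, lr=0.01, steps=2000):
    # Plain gradient descent: slide downhill from the starting point
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_a = descend(-2.0)  # settles in one valley
w_b = descend(2.0)   # a different start settles in a different, worse valley
```

Both runs reach a point where the gradient is essentially zero, yet they land in different valleys with different loss values, and neither run can tell that a better valley exists elsewhere. That, in cartoon form, is the wall a training process can hit.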
Dojo will be the hardware and the entire software stack, purpose-built for one and only one goal: solve FSD. This is super exciting, as there is nothing like it in the custom chip (ASIC or otherwise) world. It will NOT be easily copied, and it sounds like they are treating chip scale, machine scale, and network scale all as 'first class citizens' (or 'native'; maybe a better way to say it is 'no BS code or HW that would slow down the one and only purpose of the chip').
Here's a bit more detail... Jim is building a new processor at his new company with none other than Chris Lattner (a decorated senior engineer who had a stint at Tesla and built key parts of the Google TPU software stack). Jim's new processors may or may not be specific to any single goal; he didn't say. But Dojo, in his description, has gone through two iterations (he calls them pivots) since he was at Tesla.
Jim's description of what makes a great custom ASIC is very interesting and, I think, the meat of this interview for this audience.
Based on his description, Dojo is most likely going to be specific to Tesla's entire software stack, relying on only a little public code (like PyTorch), so it won't be easily copied. This is key: once they have built the software stack, they can build out the HW to be flexible enough to avoid or mitigate bottlenecks in the hardware as well as the software.

Today, when you hit a bottleneck in hardware or software, you most likely have to live with it at least until new hardware is released. Even worse, it might be a bottleneck that only your team or company has hit, and then you have to solve it yourself in software or hope/pray that you can convince the hardware company to fix it in a future hardware revision. Tesla has reduced this risk to a minimum because they own the whole stack, top to bottom.
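(To make the 'own the whole stack' point concrete, here's a deliberately tiny cartoon, entirely made up and unrelated to Dojo's real internals: when you control both the program representation and the thing that executes it, a bottleneck like two separate passes over the data can be fixed with your own fusion pass instead of a request to a hardware vendor.)

```python
# A "program" is a list of elementwise ops applied in order.
program = [("mul", 3), ("add", 1)]

def fuse(prog):
    # Our own optimization pass: fold a mul followed by an add into one
    # fused multiply-add, halving the number of passes over the data.
    # Possible only because we control the representation AND the executor.
    out, i = [], 0
    while i < len(prog):
        if i + 1 < len(prog) and prog[i][0] == "mul" and prog[i + 1][0] == "add":
            out.append(("fma", prog[i][1], prog[i + 1][1]))
            i += 2
        else:
            out.append(prog[i])
            i += 1
    return out

def run(prog, data):
    # The executor we also own; it understands the fused op natively.
    for op in prog:
        if op[0] == "mul":
            data = [x * op[1] for x in data]
        elif op[0] == "add":
            data = [x + op[1] for x in data]
        elif op[0] == "fma":
            data = [x * op[1] + op[2] for x in data]
    return data
```

Running `run(program, [1, 2])` and `run(fuse(program), [1, 2])` gives the same answer, but the fused version touches the data once instead of twice; if you only controlled one layer, that fix would be someone else's roadmap item.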
If you've made it this far, maybe you want more details; I'd be happy to oblige, but perhaps in another thread.
MODS: This may be of general interest to genpop, but of course, feel free to kick it to a sub-forum.
Thanks DD! - great summary: good level of detail combined with clear explanations.
Follow up question:
In similar projects of this magnitude, what is the ballpark ratio of work spent on data transfer, compression, or bottleneck workarounds, as you describe them, compared to solving the actual problem (FSD)?
Clarification:
Did Keller really say 10 to 1 million times faster/better, or are you using your own background to estimate the upper bound? And did he compare to the (in-car) FSD version 3 chip's efficiency, or to other ways of doing FSD (competitors)?
Wild-eyed speculation: Perhaps Tenstorrent will build some kind of specialized compute unit which Tesla can use in DOJO? Perhaps made-to-order, Tesla-only. Perhaps not a main or bulk part, but supplementary? I love Tesla's vertical integration, but sometimes having a trusted partner is worth a lot.
General amazement:
I still find it hard to believe that DOJO is generic enough for other tasks. Elon said it could mine bitcoin, and your summary seems to imply a generic quality. To my limited understanding, FSD is (or was) considered so freakishly hard that solving it requires specialized hardware - as evidenced by Tesla doing exactly that with their custom in-car chips.
It also kind of doesn't make sense financially: solving FSD is worth so much money that even mining a lot of bitcoin wouldn't really measure up. On the other hand, if solving FSD takes a number of years, a huge amount of compute, and custom chips, having some extra income is useful.
The only way that I can make sense of DOJO being generic is if DOJO is actually a trojan horse kind of tech for solving AGI !!
Is there a chance that this is actually what Tesla is trying to do?
Or am I flying off on a tangent here?
(How does that rhyme with Elon's continued warnings about AI?)
Or, if DOJO does not solve AGI entirely, perhaps it solves at least a large subset of AGI. Or perhaps it doesn't quite solve it, but boosts other known techniques by a significant order of magnitude - a kind of AGI runway, which in the end may be the missing link for solving AGI?..!
Maybe Elon concluded that FSD was close enough to AGI that he might as well solve for AGI. And then get FSD 'for free'.
If that is the case, then solving for a subset of AGI is worth a ... what is the level above?
Most of us are getting used to Tesla being 10+ startups. I used to think that solving FSD was worth a lot - to mankind, but also to us fans and investors. Solving AGI (or a significant subset)? Damn! That is 'huger than huge'.
If true, then someone PM Warren Redlich - his most crazy estimates are
way too conservative!
(AGI: Artificial General Intelligence)