
xAI / Grok general discussion

Elon said yesterday during the Twitter Space that there will be an electricity shortage in two years, and he implied this will be in part because of the scaling of AI computing. I find this hard to believe after doing a few calculations.

However, I take this as an indication that Tesla and xAI are aiming for further significant growth in AI computing performance beyond October 2024, when the goal is to reach 100 exaFLOPS of AI performance according to the tweet shared by Tesla AI on June 21, 2023.


By extrapolating the curve, I get a targeted 300 exaFLOPS of AI performance by July 2025.

🤯
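If you want to play with the numbers, here is a minimal sketch of that extrapolation in Python. The October 2024 anchor point comes from the tweet; the roughly 5.7-month doubling time is back-derived from the slope of the tweeted graph, so treat both as assumptions rather than official figures.

```python
from datetime import date

# Minimal extrapolation sketch. Assumptions: the 100 exaFLOPS /
# October 2024 anchor from the June 21, 2023 tweet, and a ~5.7-month
# doubling time read off the slope of the tweeted graph.
anchor_date, anchor_eflops = date(2024, 10, 1), 100.0
doubling_months = 5.7

def projected_eflops(when: date) -> float:
    months = (when - anchor_date).days / 30.44  # average month length
    return anchor_eflops * 2 ** (months / doubling_months)

print(f"{projected_eflops(date(2025, 7, 1)):.0f} exaFLOPS")  # ≈ 300
```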

Comparison of NVIDIA's and Tesla's AI Hardware

It seems to me that we have now reached the steep part of the S-curve in the evolution of AI and AI hardware. To get a better understanding, I compared NVIDIA's portfolio with Tesla's in this area. Not being an AI expert at all, the products look comparable to me from a performance viewpoint and also from a performance-per-watt viewpoint. What matters now is also who (e.g. Microsoft/OpenAI, Meta, Google and Tesla/xAI) is able to scale very fast and can leverage the additional computing capacity. Tesla's D1 and the NVIDIA A100 use the older TSMC 7 nm process, while the NVIDIA H100 uses the TSMC 4 nm process, which is also used for Apple's A16 system-on-a-chip in the iPhone 14 Pro/Pro Max. It's possible that Tesla can scale faster due to better available manufacturing capacity at TSMC, but I don't know which part of the system will be the bottleneck.

For comparison, the most powerful supercomputing system according to the TOP500 list is currently Frontier, built with AMD components, with a performance of 1.2 exaFLOPS FP64, which I would translate to about 20 exaFLOPS of FP16 AI performance.
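For transparency, the arithmetic behind that translation is just a single multiplication; the ~16x FP64-to-FP16 factor is my own rough assumption for low-precision tensor throughput, not a benchmarked number.

```python
# Rough FP64 -> FP16 translation for Frontier. The ~16x factor is an
# assumed speedup for low-precision tensor math, not a measured value.
frontier_fp64_eflops = 1.2
fp64_to_fp16_factor = 16

print(f"~{frontier_fp64_eflops * fp64_to_fp16_factor:.0f} exaFLOPS FP16")  # ~19, i.e. about 20
```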

My main sources were:
  • Tweet by Tesla AI, June 21, 2023
  • Tesla AI Day 2 presentation, October 1, 2022
  • NVIDIA's website, July 16, 2023
  • Wikipedia
Due to my limited time, AMD's portfolio was out of scope for this comparison.
NVIDIA's current AI Portfolio (A100-based)

The A100 Tensor Core GPU has a FP16 (AI) performance of 0.312 petaFLOPS and a max TDP of 300 W or 400 W, depending on the configuration. NVIDIA's FP16 AI performance in this post is measured without the "sparsity" optimization, since Tesla made the same assumption in the graph tweeted on June 21, 2023.

The DGX A100 consists of 8x NVIDIA A100 80 GB Tensor Core GPUs, has a max system power usage of 6.5 kW and an AI performance of 2.5 petaFLOPS. The NVIDIA A100 GPUs seem to account for about 50 % of the max system power, a split I use below for ballpark estimates.

For 1 exaFLOP of FP16 AI performance, 400 DGX A100 systems would be needed, with a total max system power usage of 2.6 MW.
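A quick ballpark check of those numbers; the inputs are the spec-sheet values quoted above (0.312 petaFLOPS and 400 W per GPU, 6.5 kW per DGX), everything else follows from them:

```python
# Ballpark check of the DGX A100 numbers above.
a100_fp16_pflops = 0.312            # per A100 GPU, dense (no sparsity)
gpu_tdp_kw = 0.400                  # 400 W configuration
dgx_pflops = 8 * a100_fp16_pflops   # ≈ 2.5 petaFLOPS per DGX A100
dgx_power_kw = 6.5                  # max system power usage

gpu_share = 8 * gpu_tdp_kw / dgx_power_kw   # ≈ 0.49, i.e. "about 50 %"
systems = 1000 / dgx_pflops                 # DGX systems per exaFLOP ≈ 400
total_mw = systems * dgx_power_kw / 1000    # ≈ 2.6 MW

print(f"{gpu_share:.0%} GPU share, {systems:.0f} systems, {total_mw:.1f} MW")
```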

NVIDIA's upcoming AI Portfolio (H100-based)

The new Grace Hopper superchip with 1000 W TDP consists of a CPU (Grace, Arm Neoverse V2), a GPU (NVIDIA H100 Tensor Core GPU), up to 480 GB of LPDDR5X ECC memory and up to 96 GB of HBM3. Its FP16 performance is 0.990 petaFLOPS.

The DGX GH200, planned for H2 2023, consists of 256 NVIDIA Grace Hopper superchips (total TDP of the superchips: 256 kW) and has a FP16 performance of 0.25 exaFLOPS.

Helios will consist of 4 DGX GH200 systems (TDP of the Grace Hopper superchips: 1024 kW) and will have a FP16 performance of 1 exaFLOP.

The total power draw of a system with 1 exaFLOP of FP16 AI performance is roughly 2 MW (assumption: half of the power is used by the Grace Hopper superchips).
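The same ballpark for the Grace Hopper route; note that the 50 % power share for the superchips is my assumption from above, not a spec:

```python
# Ballpark for the H100 / Grace Hopper route.
gh_fp16_pflops = 0.990   # per Grace Hopper superchip, dense
gh_tdp_kw = 1.0          # 1000 W TDP per superchip

chips = 1000 / gh_fp16_pflops              # superchips per exaFLOP ≈ 1010
chip_power_mw = chips * gh_tdp_kw / 1000   # ≈ 1.0 MW of superchip TDP
system_power_mw = chip_power_mw / 0.5      # assumption: chips draw half

print(f"{system_power_mw:.1f} MW per exaFLOP FP16")  # ≈ 2.0 MW
```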
Tesla's upcoming AI Portfolio

The new D1 chip with 400 W TDP has a FP16 performance of 0.362 petaFLOPS.

A tile consists of 25 D1 chips (TDP of the D1 chips: 10 kW) and has a total FP16 performance of 9.05 petaFLOPS.

A cabinet consists of 12 tiles (TDP of the D1 chips: 120 kW) and has a FP16 performance of 108.6 petaFLOPS.

An ExaPOD consists of 10 cabinets (TDP of the D1 chips: 1200 kW) and has a total FP16 performance of 1.08 exaFLOPS.

The total power draw of the system for 1.08 exaFLOPS of FP16 AI performance is roughly 2.4 MW (assumption: half of the power is used by the D1 chips). Per exaFLOP, about 2.22 MW of electrical power is required.
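And the Dojo roll-up from the per-chip numbers, again assuming the D1 chips draw half the system power:

```python
# Roll-up of the Tesla ExaPOD numbers from the D1 chip specs.
d1_fp16_pflops = 0.362
d1_tdp_kw = 0.400

tile_pflops = 25 * d1_fp16_pflops            # 9.05 petaFLOPS per tile
cabinet_pflops = 12 * tile_pflops            # ≈ 108.6 petaFLOPS per cabinet
exapod_eflops = 10 * cabinet_pflops / 1000   # ≈ 1.09 exaFLOPS per ExaPOD

d1_power_mw = 10 * 12 * 25 * d1_tdp_kw / 1000   # 1.2 MW of D1 TDP
system_power_mw = d1_power_mw / 0.5             # assumption: chips draw half
per_exaflop_mw = system_power_mw / exapod_eflops

print(f"{exapod_eflops:.2f} exaFLOPS, {per_exaflop_mw:.2f} MW per exaFLOP")
```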

100 exaFLOPS of FP16 AI performance, scheduled for October 2024, will require roughly 220 MW, which equals 1.9 TWh per year if running at 100 % uptime at full capacity.
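The annual energy figure then follows directly (a sketch, assuming 100 % uptime at full power):

```python
# Annual energy for the 100 exaFLOPS target at 100 % uptime.
power_mw = 100 * 2.2    # ≈ 220 MW for 100 exaFLOPS FP16
hours_per_year = 8766   # average year, including leap years

twh_per_year = power_mw * hours_per_year / 1e6
print(f"{twh_per_year:.1f} TWh per year")  # ≈ 1.9
```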
 
[Attachment: Screenshot 2023-11-05 at 18.25.36.png]


Imo, well played: catching up with Facebook's rather large AI team, which has been working on LLaMA/Llama 2 for a long time, is impressive for a pretty new and small team. It indicates that the team is making very rapid progress, and who knows how far they will get in the future...
 
So basically Tesla/xAI engineers reimplemented ChatGPT (GPT-3.5) and Llama 2, with search over recent data, in about 2 to 4 months (depending on how you read the tweets/blog). Not GPT-4-level performance yet, but with a much lower compute budget, and no DALL·E 3 equivalent yet, at least not publicly.

Imo Apple, Meta, Amazon et al. should probably wake up a bit: there's a new player in town. This bodes very well for communication with Optimus.