D1 (400W TDP, 645mm^2, 7nm)
BF16: 362 TFLOPS
FP32: 22.6 TFLOPS
On-chip bandwidth: 10TBps reported (or 1250 GB/s if that figure is in terabits)
Off-chip bandwidth: 4TBps reported (or 500 GB/s on the same reading; 25 D1 per tile)
Off-tile bandwidth: 36TB/s reported (I think 9TB/s is more like it for tile-to-tile communication), with 3000 D1 chips connected together
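The figures in parentheses above come from reading the headline numbers as terabits per second. A quick sketch of that arithmetic, assuming decimal units (1 Tb = 1000 Gb) and 8 bits per byte; the function name is just mine for illustration:

```python
def terabits_to_gigabytes_per_s(terabits_per_s: float) -> float:
    """Convert terabits/s to gigabytes/s (decimal units, 8 bits per byte)."""
    gigabits_per_s = terabits_per_s * 1000
    return gigabits_per_s / 8

# D1 headline figures, interpreted as terabits per second (an assumption)
on_chip_gb_s = terabits_to_gigabytes_per_s(10)
off_chip_gb_s = terabits_to_gigabytes_per_s(4)

print(f"on chip:  {on_chip_gb_s:.0f} GB/s")
print(f"off chip: {off_chip_gb_s:.0f} GB/s")
```

If the reported numbers are actually terabytes per second, the D1 figures are 8x larger than the GB/s values in parentheses.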
AMD Instinct MI100 (300W TDP, 750mm^2, 7nm)
BF16: 92.3 TFLOPS
FP32: 23.1 TFLOPS
On-chip bandwidth: 1228.8 GB/s (HBM2 memory)
Peak Infinity Fabric™ Link Bandwidth: 92 GB/s (off-chip)
Nvidia A100 (400W TDP, 826mm^2, 7nm)
BF16: 312 TFLOPS
FP32: 19.5 TFLOPS
On-chip bandwidth: 2,039 GB/s (HBM2e memory)
Off-chip bandwidth: 600 GB/s (up to 12 GPUs)
Off-chip bandwidth, PCIe 4.0: 64 GB/s
My opinion: the actual D1 chip is pretty good, but not mind-blowing given the size/power usage. It's in line with Nvidia's best. But that scalability, holy S balls, with the interconnect bandwidth off tile. Mind-blowing...
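To put rough numbers on the "given the size/power usage" part, here is a small sketch that divides the peak BF16 figures listed above by TDP and die area. These are paper peak numbers from this post only, so treat the result as a ballpark, not a benchmark.

```python
# Figures copied from the spec lists in this post:
# (peak BF16 TFLOPS, TDP in watts, die area in mm^2)
chips = {
    "D1": (362.0, 400.0, 645.0),
    "MI100": (92.3, 300.0, 750.0),
    "A100": (312.0, 400.0, 826.0),
}

def efficiency(bf16_tflops: float, tdp_w: float, area_mm2: float):
    """Return (TFLOPS per watt, TFLOPS per mm^2) for one chip."""
    return bf16_tflops / tdp_w, bf16_tflops / area_mm2

results = {name: efficiency(*spec) for name, spec in chips.items()}

for name, (per_watt, per_mm2) in results.items():
    print(f"{name}: {per_watt:.3f} TFLOPS/W, {per_mm2:.3f} TFLOPS/mm^2")
```

By these numbers the D1 lands at roughly 0.91 BF16 TFLOPS/W against about 0.78 for the A100 and 0.31 for the MI100, so it is somewhat ahead of the A100 but in the same ballpark.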