I am pretty sure they're using GP106, not GP102. So it's actually 1280 + 256 CUDA cores (+ 256 more on HW2.5), an ~83%/17% theoretical split (~71%/~29% on HW2.5).
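The raw core-count split can be sketched in a few lines. This is just arithmetic on the public Pascal core counts (1280 CUDA cores for GP106, 256 per Parker iGPU) and ignores clock speeds entirely:

```python
# Rough CUDA-core split between the discrete GP106 and the Parker iGPU(s).
# Pure core-count ratio; clock differences are ignored here.
GP106_CORES = 1280
PARKER_IGPU_CORES = 256  # per Parker SoC

def split(parker_socs):
    """Fraction of total CUDA cores on the discrete GP106."""
    total = GP106_CORES + parker_socs * PARKER_IGPU_CORES
    return GP106_CORES / total

print(f"HW2   (1 Parker): {split(1):.1%} discrete / {1 - split(1):.1%} integrated")
print(f"HW2.5 (2 Parker): {split(2):.1%} discrete / {1 - split(2):.1%} integrated")
```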
Indeed, and this significantly lowers the $600-$700 price I estimated for HW2.5: GP106 chips should be much less expensive than GP102 chips.
An equivalent GPU would be the "NVIDIA GeForce GTX 1060 Max-Q" discrete GPU chip for laptops, which has an 80 W TDP and a similar boost clock of 1,480 MHz:
Price of the board should be similar to discrete GP106-based NVIDIA cards, such as the "GeForce GTX 1060 6 GB":
except that the Tesla board has 8 GB of GPU RAM.
The GTX 1060 board retails for around $200-$300, and I suspect Tesla gets the GP106 for much less than $200 in bulk quantities, maybe even below $100. So the Tesla AI chip's direct per-unit cost savings should be less than $100, not the $400-$600 I estimated previously.
This is also more in line with what @oneday noted:
They said on the last conference call,
Elon Musk said:
"And it costs the same as our current hardware and we anticipate that this would have to be replaced, this replacement, which is why I made it easy to switch out the computer, and that's all that needs to be done."
Regarding GPU clock speed:
But it's actually more complicated than that, since clock speeds matter as well. The GP106 has a typical clock of 1,480 MHz, while the GPU clock on the Parker SoCs ranges from 854 MHz to 1,465 MHz. If just one Parker SoC is used, at the lowest clock, you'd have a ~90%/10% split; at the other end, with both Parker SoCs at the fastest clock, it would be a ~71%/29% split. We have no idea what clocks are involved here, plus the Parker iGPUs are likely less efficient and have more latency and overhead, since they probably have slower memory than the GP106 does, and iGPUs are often second-class citizens when it comes to accessing data. I would guess the realistic performance split to be somewhere around 95%/5% to 85%/15%, depending on clocks and whether both Parker SoCs are in use.
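The two endpoints above fall out of a crude cores-times-clock throughput proxy. This is only an upper-bound sketch using the clock figures quoted in this thread; real performance also depends on memory bandwidth and scheduling:

```python
# Clock-weighted throughput split: cores * clock as a crude FLOPS proxy.
# Clock values are the figures quoted above; treat the results as upper
# bounds, since memory bandwidth and overhead are ignored.
GP106_CORES, GP106_MHZ = 1280, 1480
PARKER_CORES = 256  # per SoC iGPU

def gp106_share(parker_socs, parker_mhz):
    """Fraction of combined cores*clock throughput on the discrete GP106."""
    discrete = GP106_CORES * GP106_MHZ
    integrated = parker_socs * PARKER_CORES * parker_mhz
    return discrete / (discrete + integrated)

print(f"1 Parker @  854 MHz: {gp106_share(1, 854):.1%} discrete")
print(f"2 Parker @ 1465 MHz: {gp106_share(2, 1465):.1%} discrete")
```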
I'm pretty certain that when AutoPilot is active (i.e. when the car is driven) the chips typically just clock up to the maximum frequency. It's all liquid cooled, so there should be no thermal throttling.
My guess is that low power mode matters mostly when the car is not driving: you'd still want vehicle control software running to react to certain sensor inputs (such as temperature sensors to keep the BMS running, the security system, cabin overheat protection, etc.), but full AutoPilot processing of the video+sensor feeds is not required. In this scenario the discrete Pascal GP106 chip is turned off entirely and the two Parker SoCs are in low power mode. (Maybe even the integrated GPU is off in this case and only some of the ARM cores are running.)
Regarding memory bus speed: running already-trained, static neural nets with no back-propagation involves exceedingly simple calculations, combining the weights with the input values, and the number of weight values in their neural nets far exceeds the limited hardware cache sizes of Pascal chips. So I suspect their NN throughput is primarily memory bus limited. If the memory performance of the GP106 and the Parker chips differs significantly, that would have a direct effect on the NN processing performance of the integrated GPUs.
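A back-of-the-envelope way to see the bandwidth ceiling: if the weights don't fit in cache, each forward pass has to stream them from DRAM, so throughput can't exceed bandwidth divided by model size. The model size here is a made-up illustrative figure, not Tesla's actual network; the GP106 bandwidth is the GTX 1060 6 GB spec and the Parker figure is an approximate LPDDR4 number:

```python
# Bandwidth-bound inference ceiling: passes/sec <= bandwidth / model size.
# MODEL_BYTES is an assumed illustrative size, not Tesla's real network.
def max_inferences_per_sec(model_bytes, mem_bandwidth_bytes_per_sec):
    """Upper bound on forward passes/sec when weights stream from DRAM."""
    return mem_bandwidth_bytes_per_sec / model_bytes

MODEL_BYTES = 50e6   # assumption: ~50 MB of weights
GP106_BW = 192e9     # GTX 1060 6 GB GDDR5 spec: ~192 GB/s
PARKER_BW = 50e9     # approximate Parker LPDDR4 bandwidth: ~50 GB/s

print(f"GP106 ceiling:  {max_inferences_per_sec(MODEL_BYTES, GP106_BW):.0f} passes/s")
print(f"Parker ceiling: {max_inferences_per_sec(MODEL_BYTES, PARKER_BW):.0f} passes/s")
```

Under these assumed numbers the discrete GPU's bandwidth advantage alone is roughly 4x, independent of core counts, which is why the memory bus matters so much here.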
An interesting question is whether Tesla is going to replace the Parker SoCs as well, or only the GP106 discrete GPU chip. The safest iterative step would be to replace only the discrete GPU and keep the Parker SoCs, which would leave much of the ARMv8-based vehicle control platform unmodified. Making their own chip is a complex enough step already; they'd want to reduce the HW3 migration risks as much as possible.
But, these are just guesses and wild speculation, and I've been wrong a number of times in this short discussion already.