Regarding NVidia vs Tesla performance
Tesla HW3 : 144 TOPS (quoted as NN specific compute power, not including CPU/GPU) at 72W, 2 TOPS/W performance, costing 20% less than HW2.5 compute board
Tesla HW2.5 : < 12 TOPS at 57W, < 0.21 TOPS/W performance, costing ??
NVidia Drive Xavier : < 30 (20 GPU / Tensor) TOPS at 30W, < 1 TOPS/W performance, costing ?? (Tegra Xavier dev kits are $1300 on Amazon, but I'm sure the actual cost is well below that)
NVidia Drive Pegasus : < 320 (260 GPU / Tensor) TOPS at 500W, < 0.64 TOPS/W performance, costing ?? (~RTX 2080 performance x2 plus Tegra Xavier x2 easily puts it over $2000, probably more)
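The TOPS/W figures in the list above can be sanity-checked with a few lines of Python (numbers as quoted in this post, not independently verified):

```python
# Efficiency check: (claimed NN TOPS, watts) as quoted above.
platforms = {
    "Tesla HW3":            (144, 72),
    "Tesla HW2.5":          (12, 57),
    "NVidia Drive Xavier":  (30, 30),
    "NVidia Drive Pegasus": (320, 500),
}

for name, (tops, watts) in platforms.items():
    print(f"{name}: {tops / watts:.2f} TOPS/W")
```

Even on claimed (best-case) numbers, Pegasus sits at roughly a third of HW3's efficiency.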
I mark the NVidia figures as "less than" claimed because there's no way a Tesla NN actually sustains the claimed TOPS on those chips (whereas HW3 should), given their lack of the architectural features needed to prevent stalls and to run efficiently at a batch size of 1. I suspect Tesla might not actually use batch size 1 on HW2.5, instead taking the latency hit for better overall throughput, but I could be wrong.
In fact, we can use Tesla's own data to estimate the effective TOPS of HW2.5. They claim HW3 runs the same NN 21x faster than HW2.5, which works out to 144 / 21 ≈ 6.86 effective TOPS against a claimed 12, i.e. real-world performance at only 57% of the rated figure. Newer NVidia chips might improve on this somewhat, but they probably won't run batch size 1 as well, and they will never handle huge NNs as well, due to memory latencies and the lack of large addressable on-chip SRAM. Since memory latency only gets worse relative to clock rate (it stays roughly constant in absolute time), newer NVidia chips may not see particularly large gains on small-batch, memory-heavy NNs, even as their clock rates and compute-unit counts increase.
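The back-of-the-envelope derivation of HW2.5's effective throughput looks like this (using Tesla's own 21x claim from the text):

```python
# Back out HW2.5's effective TOPS from Tesla's "HW3 is 21x faster" claim.
hw3_tops = 144        # HW3's NN-specific rating
speedup = 21          # Tesla's claimed HW3-vs-HW2.5 speedup on the same NN
hw25_claimed = 12     # HW2.5's claimed TOPS

hw25_effective = hw3_tops / speedup           # ~6.86 TOPS actually delivered
fraction_of_claimed = hw25_effective / hw25_claimed  # ~0.57

print(f"HW2.5 effective: {hw25_effective:.2f} TOPS "
      f"({fraction_of_claimed:.0%} of claimed)")
```

This assumes HW3 itself runs at (or near) its full 144 TOPS rating, which is the post's working premise.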
Plus, NVidia combines all sources of compute to arrive at its TOPS ratings (CPU, GPU, and any Tensor cores), which is disingenuous from the perspective of running a single large NN. Ignoring the CPU cores, you get 20 TOPS on Xavier and 260 TOPS on Pegasus. The split between regular GPU and Tensor-core throughput makes things even more complex...
So, granting a generous 80% efficiency to NVidia's newer solutions when running Tesla's NN, Pegasus would deliver an effective 208 TOPS (still more than HW3) at 500W (still far too much), for a horrendous 0.416 TOPS/W. Scaling that solution back to 144 effective TOPS would still mean ~346W of power consumption. Capped at 100W, you'd be looking at only ~42 effective TOPS. Hell, even the claimed 320 TOPS rating, capped at 100W, nets you only 64 TOPS.
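The power-scaling scenarios above are simple proportions; worked through explicitly (the 80% efficiency figure is the post's own generous assumption, not a measured number):

```python
# Pegasus under an assumed 80% real-world efficiency, scaled by power budget.
pegasus_gpu_tensor_tops = 260   # GPU + Tensor cores only, CPU excluded
pegasus_watts = 500
assumed_efficiency = 0.80       # generous assumption from the text

effective_tops = pegasus_gpu_tensor_tops * assumed_efficiency  # 208 TOPS
tops_per_watt = effective_tops / pegasus_watts                 # 0.416 TOPS/W

# Power needed to match HW3's 144 effective TOPS at that efficiency:
watts_for_hw3_parity = 144 / tops_per_watt                     # ~346 W

# Effective TOPS available within a 100 W budget:
tops_at_100w = tops_per_watt * 100                             # ~42 TOPS

# Even at the *claimed* 320 TOPS rating, a 100 W cap yields:
claimed_tops_at_100w = (320 / 500) * 100                       # 64 TOPS

print(watts_for_hw3_parity, tops_at_100w, claimed_tops_at_100w)
```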
So no matter how you slice it, NVidia's hardware runs hot and power hungry, and isn't a suitable candidate for Tesla's needs, even if it can reach the required performance when unbounded by thermal and power constraints.