Only GP100 (Tesla P100) and the Tegra X1 and X2 SoCs support FP16 at double speed.
GP106 offers INT8 at quadruple speed, so 4.5 TFLOPS FP32 = 18 TOPS INT8.
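The back-of-the-envelope math is just the FP32 peak scaled by the per-format throughput ratio. A minimal sketch (the function name and the 2x/4x ratios for packed FP16 and INT8 dot products are illustrative assumptions, not vendor API):

```python
def scaled_peak(fp32_tflops, ratio):
    """Scale an FP32 peak rate by a per-format throughput ratio.

    fp32_tflops: peak FP32 throughput in TFLOPS
    ratio: ops-per-clock multiplier for the narrower format
           (assumed 4x for INT8 dp4a-style dot products, 2x for packed FP16)
    """
    return fp32_tflops * ratio

# GP106-class part: 4.5 TFLOPS FP32 at 4x INT8 rate
print(scaled_peak(4.5, 4))  # 18.0 (TOPS INT8)
```

Note these are theoretical peaks; real inference workloads land well below them.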
You don't want to train the neural networks in the car; that you do on a supercomputer somewhere. What you need in the car is fast...