AI progress in the past 12 months has been driven by increased compute: the most powerful models require far more compute, both for training and for inference. Increases in per-node FLOPS, RAM, and the number of nodes used for inference are therefore important factors in the rapid rate of improvement.
Tesla can increase training compute (with Dojo, or by buying H100s from Nvidia), but inference compute, and with it the model size, is fixed: most of the fleet runs on HW3, hardware that is now over four years old. This constrains the rate of improvement.
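To make the constraint concrete, here is a minimal back-of-envelope sketch in Python. It only illustrates the proportionality: with fixed hardware, the compute available per camera frame is fixed, so the deployable model cannot grow. The throughput, frame-rate, and utilization figures are illustrative assumptions, not published Tesla or HW3 specs.

```python
# Toy model of the inference-compute ceiling: once the in-car hardware is
# fixed, the FLOPs available per forward pass are fixed too.
# All numbers below are illustrative assumptions, not Tesla specs.

def flops_per_frame(peak_tops: float, fps: float, utilization: float = 0.3) -> float:
    """Usable FLOPs per forward pass on a fixed accelerator.

    peak_tops:   peak throughput in tera-ops per second (assumed)
    fps:         forward passes required per second (real-time constraint)
    utilization: fraction of peak actually sustained in practice (assumed)
    """
    return peak_tops * 1e12 * utilization / fps

# Hypothetical comparison: a fixed in-fleet accelerator vs. a 5x successor.
# The per-frame budget, and hence the maximum model size, scales linearly
# with the hardware -- which is exactly what a fixed fleet cannot change.
for label, tops in [("in-fleet hardware", 100), ("5x successor", 500)]:
    budget = flops_per_frame(peak_tops=tops, fps=36)
    print(f"{label}: ~{budget / 1e12:.1f} TFLOPs per frame")
```

Training compute has no such ceiling: Tesla can keep adding nodes in the datacenter, but every extra FLOP the model needs at inference time must fit inside this fixed per-frame budget on the cars already sold.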