This is one of the most important posts on this forum, given how important FSD robotaxis have suddenly become to supporting any future appreciation of the share price, so it's surprising there isn't more discussion.
You've given good evidence for why the models will likely have to grow much bigger to support lower error rates / fewer disengagements. There will probably be advances in distilling these big models into something smaller that works nearly as well, but still, this is a serious issue for inference compute. If Tesla eventually trains a model with high enough fidelity for robotaxi, it may be orders of magnitude too big to fit on HW3, HW4, or, who knows, HW5.
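To make that concrete, here's a back-of-the-envelope sketch in Python. The current model size, bytes per parameter, and on-car memory budget are all made-up illustrative numbers, not real HW3/HW4 specs:

```python
# Rough sketch: does a scaled-up model still fit in the inference budget?
# All numbers below are illustrative assumptions, NOT real HW3/HW4 specs.

def weight_footprint_gb(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Memory needed just to hold the weights (int8 => 1 byte per param)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Assume today's on-car model is ~1B params and the hardware has ~8 GB
# usable for weights (both numbers are guesses, for illustration only).
current_params_b = 1.0
memory_budget_gb = 8.0

for scale in (1, 10, 100):
    needed = weight_footprint_gb(current_params_b * scale)
    fits = "fits" if needed <= memory_budget_gb else "does NOT fit"
    print(f"{scale:>4}x model: {needed:>7.1f} GB of weights -> {fits}")
```

Even with aggressive int8 quantization, a 100x model blows past the assumed budget by an order of magnitude, which is the "too big for the car" risk in a nutshell.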
This is a very serious risk.
There are other risks with training: how hard will it really be to improve performance 100x?
There are papers and studies on "neural scaling laws" that look at how increasing compute, data, and model size affects the ability to lower error rates.
Here's one on vision transformers (on images), which could relate reasonably well to FSD.
Here's the thing: even on a log scale, as you increase compute, data, and model size, you get sublinear returns in error-rate improvement.
I.e., it becomes harder and harder to reduce the error rate further. The model size, data, and compute needed to get to 3,000 miles per critical disengagement (CD) might be a 10x increase, but getting from 3,000 to 30,000 miles per CD may take more than another 10x; it might take another 30x increase in data / compute / model size.
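A toy calculation shows why this compounds. If error rate followed a clean power law in compute, err ∝ C^(-alpha), then miles per CD would scale as C^alpha and every 10x improvement would cost a constant multiplier of 10^(1/alpha). The exponents below are illustrative, not fit to any FSD data:

```python
# Toy arithmetic, not fit to any real FSD data: under a pure power law
# err ~ C**(-alpha), miles-per-critical-disengagement scales as C**alpha,
# so each 10x improvement costs a fixed scale-up of 10**(1/alpha).

def compute_multiplier_per_10x(alpha: float) -> float:
    """Compute/data/model-size scale-up needed to cut the error rate 10x."""
    return 10 ** (1 / alpha)

for alpha in (1.0, 0.5, 0.3):  # illustrative exponents only
    print(f"alpha={alpha}: each 10x error reduction needs "
          f"~{compute_multiplier_per_10x(alpha):,.0f}x more scale")
# alpha=1.0 -> 10x, alpha=0.5 -> 100x, alpha=0.3 -> ~2,154x
```

And that's the optimistic case: if the curve saturates at the high-compute end, as some scaling studies suggest, later decades cost even more than this constant factor, which is the "10x then 30x" pattern above.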
Tesla has the ability to scale up data and compute somewhat, but not by these orders of magnitude in a "short" time frame of a few years (let alone by August).
Point being, there's a lot of empirical data hinting that training FSD to a robotaxi level could take a lot longer than people think.