I just tried FSD 12 and was very impressed. Only 2 interventions were needed over 30 miles, with much more natural behavior: orders of magnitude better than anything Tesla has released before.
As a potential investor though, I need to determine whether FSD can achieve an intervention rate 100x better for Teslabot (the average person walks about 2 miles a day) or roughly 10,000x better for a robotaxi (RT, > 100k miles per intervention), and the minimum timeframe for that. I also want this to be as quantitative as possible, despite the lack of empirical information.
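To make those targets concrete, here's a quick back-of-envelope sketch. It assumes my single 30-mile drive (2 interventions) is representative of current FSD 12 performance, which is a big assumption from one sample:

```python
# Back-of-envelope improvement targets, assuming one 30-mile drive with
# 2 interventions is representative of FSD 12 today (a big assumption).
current_mpi = 30 / 2  # miles per intervention today

teslabot_target = current_mpi * 100  # ~100x better for Teslabot
robotaxi_target = 100_000            # > 100k miles per intervention for RT

print(f"Current: ~{current_mpi:.0f} mi/intervention")
print(f"Teslabot target: ~{teslabot_target:.0f} mi/intervention")
print(f"Robotaxi needs ~{robotaxi_target / current_mpi:,.0f}x improvement")
```

Note the robotaxi multiple comes out near 10^4 only because the starting point is ~15 miles per intervention; a better or worse baseline shifts it directly.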
Another premise is that when a breakthrough occurs, like end-to-end AI for FSD, extrapolating progress from similar projects is better than using the history of FSD prior to the step change. No one else is using a pure AI model like Tesla's, so I'll use LLMs, for reasons I'll describe later. There are lots of other generative models, like image generation (predicts an image from words) and video generation (predicts the next frames from previous frames, pictures, or words). All of these have much in common with FSD, which uses video input and a destination to predict the next driving action.
AI model "smartness" (as opposed to speed) depends on model design and training. The design dictates the number of parameters, which is proportional to the memory required for inference (prediction). I'll only include AI models since 2018 that are based on the transformer architecture, because these likely share many elements with the FSD model.
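The "parameters proportional to memory" point can be sketched numerically. The bytes-per-parameter figures below (fp16 = 2 bytes; quantized formats use less) are my illustrative assumptions, not details of any actual deployment:

```python
# Rough inference-memory estimate for model weights:
# parameter count x bytes per parameter. fp16 (2 bytes/param) is an
# assumed precision, not a statement about any vendor's deployment.
def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    return n_params * bytes_per_param / 1e9

for name, params in [("7B", 7e9), ("70B", 70e9), ("1T", 1e12)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB at fp16")
```

This is weights only; activations and KV caches add more, but it shows why trillion-parameter models live in data centers rather than in a car.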
The key for most generative AI is the transformer architecture, introduced in 2017. The earliest versions produced human-like responses but were not very smart. As models increased in size with more training data, they became smarter. The performance of smaller models has also improved with better design and training, but more parameters generally result in better inference.
I've listed some models to give an idea of how size affects their performance. For fun, I've included responses by the models to some questions to give an idea of how "smart" each model is.
Question 1. What is the number that rhymes with the word for a tall plant?
Year | Model | Parameters | Response 1 |
2018 | GPT 1 | 117M | "that's a lot of syllables, " ivy said, looking over her notes. |
2019 | GPT 2 | 1.5B | "Thin is a word," I think someone said. |
2021 | GPT 3 | 175B | |
2022 | GPT 3.5 | 350B ? | The number that rhymes with the word for a tall plant is "four." |
2023 | GPT 4 | >1T | The number that rhymes with "tree," a word for a tall plant, is "three." |
It's pretty obvious that GPT 4 is a lot smarter than prior models, but even the earliest models understand something, although not very much. Also note that the number of parameters has increased exponentially each year since GPT 1. The improvements come at a price that is acceptable when computing in a remote data center.
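The "exponentially each year" claim can be estimated from the table above. Treating GPT 4's ">1T" as exactly 1T (so this is a rough lower-bound estimate, not a precise figure):

```python
# Average annual parameter-growth factor from the GPT table above.
# GPT 4's ">1T" is treated as exactly 1e12, so this is only approximate.
models = {2018: 117e6, 2019: 1.5e9, 2021: 175e9, 2023: 1e12}

years = 2023 - 2018
growth = (models[2023] / models[2018]) ** (1 / years)
print(f"Average parameter growth: ~{growth:.0f}x per year")
```

That works out to roughly 6x more parameters per year over the 2018-2023 span, which is the trend FSD's fixed in-car hardware cannot follow.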
For FSD, inference needs to be on the edge, in the vehicle. The next table shows Llama 2, an LLM released in 2023 with several model sizes. This is a good way to see how the model size influences the response.
Question 1. What is the number that rhymes with the word for a tall plant?
Question 2. A car cuts in front of a driver. He would need to swerve to avoid hitting it. If he goes left, he hits a curb. If he goes right, he hits a brick wall. In one sentence, describe his best option.
Year | LLM | Parameters | Response 1 | Response 2 |
2023 | Llama 2 | 7B | The number that rhymes with "tall" is "all". | The best option for the driver is to swerve right, as hitting the brick wall is likely to cause less damage than hitting the curb on the left. |
2023 | Llama 2 | 13B | The word for a tall plant is "tall." There is no number that rhymes with "tall." | In this situation, the best option for the driver would be to swerve to the right, as hitting the brick wall would be less damaging than hitting the curb on the left. |
2023 | Llama 2 | 70B | The word that rhymes with "tall plant" is "bant." | The driver's best option in this situation would be to swerve right, towards the brick wall, but brake heavily and try to come to a stop before impact, hoping that the car that cut him off will stop or slow down enough to allow him to avoid a collision. |
2023 | Chat GPT | > 1T | The number that rhymes with "tree," a word for a tall plant, is "three." | The driver's best option is to hit the curb by swerving left. (Me: Why?) Swerving to the left and hitting the curb is generally a safer option compared to swerving right and hitting a brick wall, as hitting the curb is less likely to cause severe damage or injury to the driver and passengers. |
The Llama models do well compared to early GPT models of similar size, and as model size increases, the model seems to understand the question better. Still, model size reigns supreme: a model with 10-20x fewer parameters cannot compete against a larger one.
Given the above and the limited data available for FSD, my thoughts at this time are:
1. Regardless of training, there is a very high chance that HW3 or HW4 may not be able to achieve 10^4 fewer interventions. Training improves models up to a point, but size still matters when handling diverse input. Better training and design have improved smaller models, but models with 10-100x fewer parameters are not able to compete with larger ones. Intuitively, this makes sense: the world's smartest dog is smart for a dog, but a smart dog is still far less capable than an adult human.
2. Improving the reliability of FSD will probably be slower and more difficult than most people think. GPT 4 took many years after GPT 1 to achieve orders of magnitude better performance, and still required exponentially more parameters each year. FSD 12 needs to improve similarly, but without substantially increasing model size or power, which is far more difficult under those constraints.
3. Although LLM progress is not a perfect comparison, it seems like a reasonable starting point. An FSD robotaxi with very high reliability needs to understand complex situations and behaviors to make good predictions, just like LLMs.
4. An issue I haven't touched on is processing speed. GPU compute per inference is closely tied to the number of parameters, so if Tesla can solve the model-size problem, I assume they can handle the GPU computing requirements. That's not a given either, but I'm being lazy.
5. (edit) Teslabot is a lot easier than FSD. Besides the less critical nature of its decisions, the computing may not need to reside completely on the edge. The price of computing and power consumption may have more flexibility than in cars, especially in a hybrid model where AI processing is not done entirely on the bot.
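Points 1 and 2 suggest a hedged timeline sketch. If miles-per-intervention improved by a constant factor each year, the time to the ~10^4x robotaxi target follows directly; the annual rates below are pure assumptions, not measurements, and real progress is rarely this smooth:

```python
import math

# Hypothetical timeline: years to reach a given reliability improvement
# at an assumed constant annual rate. The rates are illustrative guesses.
def years_to_target(improvement_needed: float, rate_per_year: float) -> float:
    return math.log(improvement_needed) / math.log(rate_per_year)

for rate in (2, 3, 5, 10):
    print(f"{rate}x/year -> {years_to_target(1e4, rate):.1f} years")
```

Even an aggressive 10x-per-year improvement rate, sustained with no hardware growth, implies years of work, which is the quantitative core of point 2.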
Detailed benchmarks for different LLMs.
Enter your own text into LLMs with different model sizes (choose the Direct Chat tab at the top).