Tesla, TSLA & the Investment World: the Perpetual Investors' Roundtable

Although fast compared to training, inference is still compute- and memory-intensive compared to conventional algorithms (i.e., hand coding). The code saved by replacing hand-written logic is small next to the model's compute and memory footprint.

Tesla's first priority and top challenge will be to get FSD to work, which includes reducing interventions by orders of magnitude. This will require more training data, which is usually accompanied by increased complexity, resulting in more parameters and layers, and higher memory requirements.

Based on the history of most AI models, improving predictions increases model size and rarely decreases it. Tesla won't be able to optimize level 4/5 FSD, beyond trying to make each version fit into its hardware, until it's able to meet safety specs. Dealing with limited hardware while improving model performance makes this a very tough problem.

As you noted... it's taken 5 years to get to this point, which is only 0.001% of the way there.
It's not "fast compared to training". It's a totally different process. Training requires petabytes of data to be processed to build the model; inference compute can fit on a smartwatch for edge computing.

Currently Tesla is in the process of removing bottlenecks on the inference side by having the algorithm use more of the NPU. The NPU was done with its answer a long time ago; it was the CPU, running all the control C code, that took forever to utilize the NPU's output and would often output the wrong thing.

Compute has plenty of headroom, because prior to v12, v11 was bogged down by a CPU capable of only 16k FLOPS per core. Assuming 4 cores were used, we're talking 64k FLOPS as the rate-limiting step vs. the NPU's 72 trillion FLOPS. The question is all about memory size.
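For scale, a quick back-of-envelope on those figures (taking this post's numbers at face value; actual HW3 specs may differ):

```python
# Back-of-envelope using the figures quoted above (16k FLOPS/core and
# 72 TFLOPS are this post's numbers taken at face value, not verified specs).
cpu_flops_per_core = 16_000
cpu_cores = 4
npu_flops = 72e12                            # claimed NPU throughput

cpu_total = cpu_flops_per_core * cpu_cores   # 64,000 FLOPS
headroom = npu_flops / cpu_total             # ~1.1e9

print(f"CPU path: {cpu_total:,} FLOPS; NPU: {npu_flops:.0e} FLOPS")
print(f"Implied headroom if the CPU was the bottleneck: {headroom:.1e}x")
```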
 
Dialing down below your current max (posted) speed in anticipation of a lower speed change (new posted) caused the car to return to the max (posted) speed? I've never had that happen. Or am I misunderstanding your situation?

Anyway, if the posted speed is 50 and you know it’s going to change to 35, V12 doesn’t slow quickly enough to the new speed to escape a speed trap so now you have to start dialing down at least at the warning speed sign so that when you reach the 35 zone the car is already close to that speed. You know, like how people are supposed to drive but typically keep going 50 and then hit the brakes hard to get down to the new speed. Anyway, I’ve never had any version of FSD go above my manually toggled and set speed setting UNLESS I pass a new speed sign of a greater speed. Then it resets.
You have the situation correct. However, the school zones I encounter are from posted speeds of 30 or 40 mph with school zones of 20 mph. If I dial it down to 20, it goes back up to 30 or 40 after a very short time (5 seconds or so). Never tried this with 50->35 school zones because I rarely encounter them.
 
Dialing down below your current max (posted) speed in anticipation of a lower speed change (new posted) caused the car to return to the max (posted) speed? I've never had that happen. Or am I misunderstanding your situation?

Anyway, if the posted speed is 50 and you know it’s going to change to 35, V12 doesn’t slow quickly enough to the new speed to escape a speed trap so now you have to start dialing down at least at the warning speed sign so that when you reach the 35 zone the car is already close to that speed. You know, like how people are supposed to drive but typically keep going 50 and then hit the brakes hard to get down to the new speed. Anyway, I’ve never had any version of FSD go above my manually toggled and set speed setting UNLESS I pass a new speed sign of a greater speed. Then it resets.
V12 has more options in the menu. One option is to better mimic human driving without adhering to speed limits. It is more aggressive and I've seen it exceed speed limits by quite a bit.
 
I just tried FSD 12 and was very impressed. Only 2 interventions needed over 30 miles and much more natural behavior, which is orders of magnitude better than anything Tesla released before.

As a potential investor, though, I need to determine whether FSD can achieve an intervention rate 100x better for Teslabot (the average person walks ~2 miles a day) or 10,000x better for robotaxi (RT; >100k miles per intervention), and the minimum timeframe for that. I also want this to be as quantitative as possible, despite the lack of empirical information.

Another premise is that when a breakthrough occurs, like end-to-end AI for FSD, extrapolating progress from similar projects is better than using FSD's history prior to the step improvement. No one else is using a pure AI model like Tesla, so I'll use LLMs, for reasons I'll describe later. There are lots of other generative models, like image generation (predicts the image based on words) and video generation (predicts the next frames using previous frames, pictures, or words). All of these have much in common with FSD, which uses video input and a destination to predict its next driving action.

AI model "smartness" (as opposed to speed) depends on model design and training. The design dictates the number of parameters, which is roughly proportional to the memory required for inference (prediction). I'll only include AI models since 2018 that are based on the transformer architecture, because these likely share many elements with the FSD model.
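As a rough sketch of that proportionality (the precisions and parameter counts below are illustrative assumptions, not Tesla's actual figures):

```python
# Minimal sketch: inference weight memory ~ parameter count x bytes/param.
# fp16 = 2 bytes, int8 = 1; this ignores activations and caches.
def inference_memory_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

for params in (1e9, 10e9, 100e9):  # 1B, 10B, 100B parameters (illustrative)
    print(f"{params / 1e9:5.0f}B params -> ~{inference_memory_gb(params, 1):.0f} GB at int8, "
          f"~{inference_memory_gb(params, 2):.0f} GB at fp16")
```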

The key for most generative AI is the transformer architecture, introduced in 2017. The earliest versions produced human-like responses but were not very smart. As models increased in size, they became smarter with more training data. The performance of smaller models also improved with better design and training, but more parameters generally result in better inference.

I've listed some models to give an idea of how size affects performance. For fun, I've included the models' responses to some questions to give an idea of how "smart" each model is.

Question 1. What is the number that rhymes with the word for a tall plant?

Year | Model | Parameters | Response 1
2018 | GPT 1 | 117M | "that's a lot of syllables," ivy said, looking over her notes.
2019 | GPT 2 | 1.5B | "Thin is a word," I think someone said.
2021 | GPT 3 | 175B |
2022 | GPT 3.5 | 350B? | The number that rhymes with the word for a tall plant is "four."
2023 | GPT 4 | >1T | The number that rhymes with "tree," a word for a tall plant, is "three."

It's pretty obvious that GPT 4 is a lot smarter than prior models, but even the earliest models understood something, although not very much. Also note that the number of parameters has increased exponentially each year since GPT 1. The improvements come at a price that is acceptable when computing in a remote data center.

For FSD, inference needs to be on the edge, in the vehicle. The next table shows Llama 2, an LLM released in 2023 with several model sizes. This is a good way to see how the model size influences the response.

Question 1. What is the number that rhymes with the word for a tall plant?

Question 2. A car cuts in front of a driver. He would need to swerve to avoid hitting it. If he goes left, he hits a curb. If he goes right, he hits a brick wall. In one sentence, describe his best option.


Year | LLM | Parameters | Response 1 | Response 2
2023 | Llama 2 | 7B | The number that rhymes with "tall" is "all". | The best option for the driver is to swerve right, as hitting the brick wall is likely to cause less damage than hitting the curb on the left.
2023 | Llama 2 | 13B | The word for a tall plant is "tall." There is no number that rhymes with "tall." | In this situation, the best option for the driver would be to swerve to the right, as hitting the brick wall would be less damaging than hitting the curb on the left.
2023 | Llama 2 | 70B | The word that rhymes with "tall plant" is "bant." | The driver's best option in this situation would be to swerve right, towards the brick wall, but brake heavily and try to come to a stop before impact, hoping that the car that cut him off will stop or slow down enough to allow him to avoid a collision.
2023 | ChatGPT | >1T | The number that rhymes with "tree," a word for a tall plant, is "three." | The driver's best option is to hit the curb by swerving left.

Me: Why?

Swerving to the left and hitting the curb is generally a safer option compared to swerving right and hitting a brick wall, as hitting the curb is less likely to cause severe damage or injury to the driver and passengers.

The Llama models do well compared to early GPT models of similar size. As the model size increases, the model seems to understand the question better. Still, model size reigns supreme: a model with 10-20x fewer parameters cannot compete against a larger one.

Given the above and the limited data available for FSD, my thoughts at this time are:

1. Regardless of training, there is a very high chance that HW3 or HW4 may not be able to achieve 10^4 fewer interventions. Training improves models up to a point, but size still matters when handling diverse input. Better training and design have improved smaller models, but models with 10-100x fewer parameters are not able to compete with larger ones. Intuitively, this makes sense: the world's smartest dog is smart for a dog, but a smart dog is still very dumb compared to an adult human.

2. Improving the reliability of FSD will probably be slower and more difficult than most people think. GPT 4 took many years after GPT 1 to achieve orders-of-magnitude better performance, and still required exponentially more parameters each year. FSD 12 needs to improve similarly without substantially increasing model size or power, which is far more difficult under those constraints.

3. Although LLM progress is not a perfect comparison, it seems like a reasonable start. An FSD RT with very high reliability needs to understand complex situations and behaviors to make good predictions, just like LLMs.

4. An issue I haven't touched on is processing speed. GPU processing requirements are closely related to the number of parameters (see the rough sketch after this list). If Tesla can solve memory for the model size, I assume they can handle the GPU computing requirements. Not a given either, but I'm being lazy :)

5. (edit) Teslabot is a lot easier than FSD. Besides the less critical nature of its decisions, the computing may not need to reside completely on the edge. The price of computing and power consumption may have more flexibility than cars, especially in a hybrid model where AI processing is not done completely on the bot.
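On point 4, a hedged sketch of the usual transformer rule of thumb (a forward pass costs roughly 2 FLOPs per parameter per token; FSD is not token-based, so this is only an analogy, and the step rate and model sizes are assumptions):

```python
# Rule of thumb for transformer LLMs: ~2 FLOPs per parameter per token for
# a forward pass. FSD isn't token-based; treat this as a loose analogy.
def flops_per_second(params: float, steps_per_s: float) -> float:
    return 2.0 * params * steps_per_s

NPU_FLOPS = 72e12  # the 72 TFLOPS figure discussed upthread
for params in (1e9, 10e9, 100e9):          # illustrative model sizes
    need = flops_per_second(params, 30.0)  # assume ~30 inference steps/sec
    print(f"{params / 1e9:5.0f}B params @ 30 Hz -> {need:.1e} FLOPS "
          f"({need / NPU_FLOPS:.1%} of 72 TFLOPS)")
```

By this crude measure, compute looks manageable even for large models; memory size, as argued above, is the tighter constraint.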

Detailed benchmarks for different LLMs.

Enter your own text into LLMs with different model sizes (choose the Direct Chat tab at the top).

This is one of the most important posts on this forum, given how important FSD robotaxis have suddenly become in supporting any future appreciation of the share price, so it's surprising there's not much discussion.

You have given some good evidence for why it's quite likely the models are going to grow much bigger to support lower error rates / lower disengagements. There will probably be advances in the ability to distill these big models into something smaller that works mostly as well, but still... this is a serious issue with inference compute. If Tesla eventually trains a model with high enough fidelity for robotaxi, it may be orders of magnitude too big to fit on HW3, HW4, or, who knows, HW5.

This is a very serious risk.

There are other risks with respect to training: how hard will it really be to improve performance 100x?

There are papers and studies around "neural scaling laws". They look at how increases in compute, data, and model size affect the ability to lower error rates.

Here's one on vision transformers (on images), which could relate reasonably well to FSD.

Here's the thing: even on a log scale, as you increase compute, data, and model size, you get sublinear returns in error-rate improvement.

I.e., it becomes harder and harder to reduce the error rate further. The model size, data size, and compute needed to get critical disengagements (CD) to 3,000 miles/CD may be a 10x increase, but getting from 3,000 miles/CD to 30,000 miles/CD may take more than another 10x increase; it might be another 30x increase in data/compute/model size.

Tesla has the ability to scale up data and compute some, but not by these orders of magnitude in a "short" time frame of a few years (let alone by August).

Point being, there's a lot of empirical data hinting that training FSD to a robotaxi level could take a lot longer than people think.
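To make the sublinear-returns point concrete, here's a toy power-law model (the exponents are illustrative; real values would come from the paper's fits):

```python
# Toy power law: error ~ a * scale^(-alpha). The post's "10x fewer errors
# needs ~30x more scale" example corresponds to alpha = log10/log30 ~ 0.68.
def scaleup_for_error_drop(alpha: float, drop: float = 10.0) -> float:
    """Scale factor s such that s**(-alpha) = 1/drop."""
    return drop ** (1.0 / alpha)

for alpha in (1.0, 0.68, 0.3):  # illustrative exponents
    s = scaleup_for_error_drop(alpha)
    print(f"alpha={alpha:4.2f}: 10x fewer errors needs ~{s:,.0f}x more scale")
```

The smaller the exponent, the more brutal the scale-up: at alpha = 0.3, a 10x error reduction takes over 2,000x more scale.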

[Attachments: three screenshots of scaling-law plots from the paper]
This is indeed true. And for the "base essentials", many seem to be unable (or simply unwilling) to do a TCO (total cost of ownership) exercise. On a forum I'm on, a woman said there's no way she'd own an EV, as she's already paying way too much for electricity. When a couple of us Tesla owners walked her through the miles driven, cost of gas, etc., and showed her significant savings... she just seemed to not want to accept that and largely went silent.

For many, I think it's a convenient crutch to confirm their inherent bias.
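For anyone wanting to run that walkthrough themselves, a minimal sketch (every price, rate, and efficiency below is an illustrative assumption; substitute your own):

```python
# Hedged sketch of the TCO fuel-cost comparison described above.
# All prices, rates, and efficiencies are illustrative assumptions.
miles_per_year = 12_000

# ICE assumptions
mpg = 28
gas_price = 3.50            # $/gallon
ice_fuel = miles_per_year / mpg * gas_price

# EV assumptions
miles_per_kwh = 3.5
electricity_price = 0.15    # $/kWh
ev_fuel = miles_per_year / miles_per_kwh * electricity_price

print(f"ICE fuel/yr:    ${ice_fuel:,.0f}")   # ~$1,500
print(f"EV energy/yr:   ${ev_fuel:,.0f}")    # ~$514
print(f"Annual savings: ${ice_fuel - ev_fuel:,.0f}")
```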
Elon is waiting to make them an offer they can’t refuse. The glut needs to happen first in reality rather than on paper.
 
No. I absolutely don’t think people will engage their brains. And for that reason I don’t think Tesla raising their warranty will do squat for sales. See what I did there?
I get it. But hear me out.

There are millions of people who won't buy an EV because they think the battery will go bad and cost them thousands of dollars. A 200,000 mile warranty proves that a Tesla drive train lasts longer than any other car you can buy.

Even the brainless will understand.
 
Maybe. I'd certainly like this as an owner, but not so sure as an investor. I'd need to see the data Tesla has. I suspect we'll get there as the chemistries evolve towards the "million mile" pack life. On the current and past packs, I'm not sure what liability would come with that extension. But, if you believe that Tesla is truly working on million mile worthy batteries and motors, you would agree that Tesla will be able to safely extend battery warranties in the future.
I don't think Tesla would extend the warranty on battery packs they have already sold.

Except for 4680, Tesla probably knows if the warranty liability will be low. And from what we know so far, it looks like Tesla would more than make up in extra sales the cost of a few extra packs replaced under a 200,000 mile warranty.

But now that I think about it, 4680 might be the whole reason that a 200,000 mile warranty won't work. Tesla doesn't know enough about those. So Tesla could be afraid of putting such a high mileage warranty on 4680. And if the 4680 pack has a lower warranty than the rest it would call the longevity of 4680 into question.
 
V12 has more options in the menu. One option is to better mimic human driving without adhering to speed limits. It is more aggressive and I've seen it exceed speed limits by quite a bit.
Yes. It's a nice option, but IMO one that needs some refinement. I was using it late last night, and it was going well over the posted speed limit: 49 mph in a 35 mph zone. During the day this wouldn't be unusual; however, being late at night and the only one on the road (making me an easy target for law enforcement), I decided to disengage and report the excessive speed back to Tesla.
 
This is one of the most important posts on this forum, given how important FSD robotaxis have suddenly become in supporting any future appreciation of the share price, so it's surprising there's not much discussion.

You have given some good evidence for why it's quite likely the models are going to grow much bigger to support lower error rates / lower disengagements. There will probably be advances in the ability to distill these big models into something smaller that works mostly as well, but still... this is a serious issue with inference compute. If Tesla eventually trains a model with high enough fidelity for robotaxi, it may be orders of magnitude too big to fit on HW3, HW4, or, who knows, HW5.

This is a very serious risk.

There are other risks with respect to training: how hard will it really be to improve performance 100x?

There are papers and studies around "neural scaling laws". They look at how increases in compute, data, and model size affect the ability to lower error rates.

Here's one on vision transformers (on images), which could relate reasonably well to FSD.

Here's the thing: even on a log scale, as you increase compute, data, and model size, you get sublinear returns in error-rate improvement.

I.e., it becomes harder and harder to reduce the error rate further. The model size, data size, and compute needed to get critical disengagements (CD) to 3,000 miles/CD may be a 10x increase, but getting from 3,000 miles/CD to 30,000 miles/CD may take more than another 10x increase; it might be another 30x increase in data/compute/model size.

Tesla has the ability to scale up data and compute some, but not by these orders of magnitude in a "short" time frame of a few years (let alone by August).

Point being, there's a lot of empirical data hinting that training FSD to a robotaxi level could take a lot longer than people think.

[Attachments: scaling-law plots]
Zero error is not the goal, and errors get harder to reduce precisely because the system is already making fewer mistakes.

10x better than humans still means thousands of deaths per year.
 
I know many posters here claim this often, but I don't feel the majority of people feel this way. Most people I know, and I'm talking about people of average or below-average wealth, value money over safety because they are strapped financially. Most people would never consider paying for software to drive them around when they can do it themselves for free, even if it is less safe.

I often see Tesla bulls claiming once FSD is solved the take rate will skyrocket and the price will go up, but personally I do not see it playing out that way. Most people can't afford FSD today at $12K so they certainly won't be able to afford it at a higher price, no matter how good it is.

My personal opinion is Tesla will LOWER the price once FSD is solved in order to increase adoption. Bulls like to forecast tens of billions of dollars of FSD profit, but it's important to remember Tesla's mission is NOT to produce huge profits; it's to create a better world. Look at how Tesla is willing to drop margins on the cars in order to increase sales; FSD will likely play out the same way, in my opinion.

People like Gary Black get blasted for looking at TSLA's future conservatively, but to me many TSLA bulls go too far in the optimistic direction. Remember Tesla's mission and don't let emotions get in the way of being realistic or practical. And just because many of us here can easily afford to pay $12K+ for FSD does not mean most other people ever would.

IMHO of course, just my two cents.
Think of it this way.

Imagine you can afford a $40k car. Would you rather have a Model 3, or a somewhat smaller car with worse performance but that can drive you everywhere, allowing you to sleep, eat, watch movies, and play games?

And the real money is not in FSD. It is in robotaxis.
 
Zero error is not the goal, and errors get harder to reduce precisely because the system is already making fewer mistakes.

10x better than humans still means thousands of deaths per year.

Who said zero?

Who said 10x better than human?

Right now, it's about 100x worse than a human, and I'm inferring, based on some data on how these models scale, that getting to human-equivalent may take more than 100x in data and model size. Compute too, but part of that can be solved by training for a longer period of time.
 
I get it. But hear me out.

There are millions of people who won't buy an EV because they think the battery will go bad and cost them thousands of dollars. A 200,000 mile warranty proves that a Tesla drive train lasts longer than any other car you can buy.

Even the brainless will understand.
Well, my battery bricked at 42,000 miles. Don’t know the frequency for others, but too many like me might hurt.
 
This is one of the most important posts on this forum, given how important FSD robotaxis have suddenly become in supporting any future appreciation of the share price, so it's surprising there's not much discussion.

You have given some good evidence for why it's quite likely the models are going to grow much bigger to support lower error rates / lower disengagements. There will probably be advances in the ability to distill these big models into something smaller that works mostly as well, but still... this is a serious issue with inference compute. If Tesla eventually trains a model with high enough fidelity for robotaxi, it may be orders of magnitude too big to fit on HW3, HW4, or, who knows, HW5.

This is a very serious risk.

There are other risks with respect to training: how hard will it really be to improve performance 100x?

There are papers and studies around "neural scaling laws". They look at how increases in compute, data, and model size affect the ability to lower error rates.

Here's one on vision transformers (on images), which could relate reasonably well to FSD.

Here's the thing: even on a log scale, as you increase compute, data, and model size, you get sublinear returns in error-rate improvement.

I.e., it becomes harder and harder to reduce the error rate further. The model size, data size, and compute needed to get critical disengagements (CD) to 3,000 miles/CD may be a 10x increase, but getting from 3,000 miles/CD to 30,000 miles/CD may take more than another 10x increase; it might be another 30x increase in data/compute/model size.

Tesla has the ability to scale up data and compute some, but not by these orders of magnitude in a "short" time frame of a few years (let alone by August).

Point being, there's a lot of empirical data hinting that training FSD to a robotaxi level could take a lot longer than people think.

[Attachments: scaling-law plots]
Absolutely. You are hitting on the multi-trillion dollar question.

Concerning that research, even if the FSD improvement function is the same as in this paper, we don't know where we are on the curve.

Let's take data size as an example, leaving compute and model size constant.

Elon has said that the system was decent after training on 1 million video clips. Then it got really good training on 2 million video clips. Does that suggest that we are just getting started on the improvement curve? Or are we at the end? Or maybe in the middle? Do we know what will happen with 10 million clips?

So even if we can expect diminishing returns with scale, I think it's just as likely that training on 10 million clips will yield a big improvement as it is that extra training will only yield a small improvement.
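A toy illustration of that uncertainty, with entirely hypothetical numbers: fit two different curve families through the same two observations and extrapolate to 10 million clips.

```python
import math

# Two observations can't tell you the shape of the improvement curve.
# Hypothetical numbers: quality goes from 10 to 5 "errors per 1,000 miles"
# as training data grows from 1M to 2M clips.
d1, e1 = 1e6, 10.0
d2, e2 = 2e6, 5.0

# Power law e = a * d^(-alpha) through both points
alpha = math.log(e1 / e2) / math.log(d2 / d1)   # = 1.0 here
a_pow = e1 * d1 ** alpha
power_law_at_10m = a_pow * (10e6) ** (-alpha)

# Exponential e = a * exp(-b * d) through the same two points
b = math.log(e1 / e2) / (d2 - d1)
a_exp = e1 * math.exp(b * d1)
exponential_at_10m = a_exp * math.exp(-b * 10e6)

print(f"Errors at 10M clips, power law:   {power_law_at_10m:.2f}")   # 1.00
print(f"Errors at 10M clips, exponential: {exponential_at_10m:.4f}") # 0.0195
```

Same two data points, a ~50x disagreement at 10 million clips. That's the sense in which we don't know where we are on the curve.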

As to where we actually are on the curve, I don't know, but I bet Tesla has a pretty good idea. And Tesla is going all in on autonomy. As an investor, I'm taking that as super-bullish.
 
I get it. But hear me out.

There are millions of people who won't buy an EV because they think the battery will go bad and cost them thousands of dollars. A 200,000 mile warranty proves that a Tesla drive train lasts longer than any other car you can buy.

Even the brainless will understand.
No bueno logic.

Tesla literally offers a warranty comparable to ICE vehicles, as well as fewer service requirements: no oil changes, transmission flushes, etc. New engines and transmissions cost thousands, but nobody thinks twice. They either drive their ICE vehicle into the ground or buy a new ICE vehicle when the warranty ends. Why would they suddenly apply different logic to an EV?

I can answer the why: not engaging their brains and/or being brainwashed by the systemic media propaganda campaign.
 
I get it. But hear me out.

There are millions of people who won't buy an EV because they think the battery will go bad and cost them thousands of dollars. A 200,000 mile warranty proves that a Tesla drive train lasts longer than any other car you can buy.

Even the brainless will understand.
I agree. A longer battery and drivetrain warranty would help the next wave of adopters make the switch. Playing devil's advocate: "If electric cars are so great, why isn't the drivetrain warranty better than what I can get on an ICE vehicle?"

I recall early in the Model 3 days, the car was said to be engineered for a million miles, largely because autonomous vehicles would experience more miles, and they needed to last. I don’t know if the million mile target is still in play, and I doubt if the design life of every component is that high, but I believe it would be a selling point if it were true AND the warranty reflected it.

Early in the Model S production days, Tesla had the typical 50k-mile warranty on drive units, while the battery had an 8-year, unlimited-mile warranty. Tesla struggled with some of those early drive units, and it was a real issue. However, they stepped up and increased the drive-unit warranty to match the battery's. The drive units still had problems, but would-be customers saw Tesla backing its products with the improved warranty, and that gave them confidence to purchase a Tesla.
 
Well, my battery bricked at 42,000 miles. Don’t know the frequency for others, but too many like me might hurt.

What vintage vehicle?

The pack on my early 2013 S went at about that mileage too, and I remember hearing of a few such cases. Got a refurb pack under warranty, and it's still plugging away at 180K miles...

(It is telling me I need a fuse replacement, but that's only a couple hundred bucks, I understand...)
 