
How Tesla could potentially solve “feature complete” FSD decision-making with imitation learning

DeepMind’s AlphaStar Supervised is better than 84% of ranked human players at StarCraft II and it trained purely via imitation learning on about 1 million human-played StarCraft II games. At an average game time of 30 minutes (a high estimate), that’s about 60 years of continuous play.
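
As a quick sanity check on that arithmetic (a back-of-the-envelope calculation, nothing more):

```python
# 1 million games at an average of 30 minutes each (a high estimate)
games = 1_000_000
minutes_per_game = 30
hours = games * minutes_per_game / 60   # 500,000 hours
years = hours / (24 * 365)              # ~57 years, i.e. "about 60"
print(f"{hours:,.0f} hours ≈ {years:.0f} years of continuous play")
```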

Those 1 million games were selected from the top 22% of players on the European ranked servers.

How long will it take Tesla to collect 60 years of continuous driving data from the top 20% of drivers in the Hardware 3 fleet for the purposes of imitation learning? I made a spreadsheet to find out.

I start counting in July 2020. My assumption for the purposes of this spreadsheet is that the requisite neural networks and software to do imitation learning for “feature complete” Full Self-Driving will be running in the latest production software update by July 2020.

The biggest caveat is that the computer vision networks will have to be good enough for “feature complete” FSD by this point. That remains to be seen. The comparison to AlphaStar Supervised only pertains to using imitation learning to solve the decision-making component of “feature complete” FSD.

Note: this spreadsheet is just about “feature complete” FSD with human monitoring and interventions, not eyes-off, robotaxi FSD.

The assumption that is by far the hardest to estimate in any reasonable, evidence-based way is how much longer the long tail is for competent Level 2 highway, city, country, and parking lot decision-making than for StarCraft II. If my thinking is correct, the length of the long tail determines what percentage of driving data would be useful to collect (via active learning) compared to StarCraft II. The way I tried to handle this was simply to give a range of figures:

5% of cumulative years collected = decision-making long tail is 20x longer for “feature complete” FSD than for StarCraft II

1% of cumulative years collected = decision-making long tail is 100x longer for “feature complete” FSD than for StarCraft II

0.5% of cumulative years collected = decision-making long tail is 200x longer for “feature complete” FSD than for StarCraft II

0.1% of cumulative years collected = decision-making long tail is 1,000x longer for “feature complete” FSD than for StarCraft II


This allows me to compute a corresponding month by which 60 years of continuous driving data could be collected (a rough sketch of the calculation follows the list below):

5% of cumulative years collected = November 2020

1% of cumulative years collected = October 2021

0.5% of cumulative years collected = July 2022

0.1% of cumulative years collected = July 2025
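
To make the structure of the calculation concrete, here is a minimal sketch in Python. The fleet ramp, hours driven per car, and top-driver share are hypothetical placeholders rather than the spreadsheet's actual quarterly estimates, but with these placeholders the sketch reproduces the general shape of the results above:

```python
TARGET_HOURS = 60 * 24 * 365     # 60 years of continuous driving data
TOP_DRIVER_SHARE = 0.20          # only the top 20% of drivers count
HOURS_PER_CAR_PER_MONTH = 30     # assumption: ~1 hour of driving per day

def months_to_target(monthly_fleet_sizes, useful_fraction):
    """Months (counting from July 2020) until the cumulative useful
    data reaches 60 years of continuous driving."""
    cumulative_hours = 0.0
    for month, fleet_size in enumerate(monthly_fleet_sizes, start=1):
        cumulative_hours += (fleet_size * TOP_DRIVER_SHARE
                             * HOURS_PER_CAR_PER_MONTH * useful_fraction)
        if cumulative_hours >= TARGET_HOURS:
            return month
    return None

# Hypothetical HW3 fleet: 600k cars in July 2020, growing 30k per month.
fleet = [600_000 + 30_000 * m for m in range(72)]
for fraction in (0.05, 0.01, 0.005, 0.001):
    print(f"{fraction:.1%} useful -> month {months_to_target(fleet, fraction)}")
```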


Link to the spreadsheet:

Tesla FSD Computer Fleet Years of Continuous Driving (projected)

Constructive feedback welcome. Feel free to make a copy of the spreadsheet and input your own assumptions.
 
5% of cumulative years collected = decision-making long tail is 20x longer for “feature complete” FSD than for StarCraft II

A clearer way of saying this would be: “5% of cumulative years is useful to collect”.

July 2025

This is a bit silly to include, actually, since I assumed Model S/3/X production would be basically flat from Q3 2019 to Q2 2025. That probably won’t even be true for Q4 2019, but I wanted to be conservative.

I also assumed a very conservative production ramp and ceiling for Model Y (run rate of 208,000 units per year with production starting in Q1 2022) and I didn’t include any other vehicles beyond Models S/3/X/Y, i.e., no Cybertruck, “Model 2” sedan, or “Model Z” crossover. So, beyond Q2 2022 the production numbers just get silly. (IMO, anyway.)

Maybe if I have more time/motivation I’ll fix Q3 2022 - Q2 2025, while still keeping it conservative. Anyone should feel free to make their own version of the spreadsheet with their own numbers.
 
DeepMind’s AlphaStar Supervised is better than 84% of ranked human players at StarCraft II and it trained purely via imitation learning on about 1 million human-played StarCraft II games. At an average game time of 30 minutes (a high estimate), that’s about 60 years of continuous play.

This has zero relation to driving. A single-task NN is in no way impressive compared to what an autonomous car needs to do. Also, the control mechanisms in these cars aren't neural networks; they're traditionally written software. The imaging systems are neural networks. These problem sets and solutions could not be more different from each other.

Those 1 million games were selected from the top 22% of players on the European ranked servers.

So all we need is to record all driving ever done by 1 million people and then hand it over to...some magical system that interprets, labels, and trains on it? I mean, Tesla already uses the open data sets for NN training, which contain many millions of images, as well as their own collected data, which at this point has got to be decades' worth of video and still images.

Then what?

How long will it take Tesla to collect 60 years of continuous driving data from the top 20% of drivers in the Hardware 3 fleet for the purposes of imitation learning? I made a spreadsheet to find out.

Uh. About 60 years. Unless you mean an aggregated amount of data that equates to 525,960 hours of driving. Which isn't really a good way to train a driving system, since what you really need is unique data, and a lot of driving is the same exact data streaming by.

I start counting in July 2020. My assumption for the purposes of this spreadsheet is that the requisite neural networks and software to do imitation learning for “feature complete” Full Self-Driving will be running in the latest production software update by July 2020.

That's one hell of an assumption, given none of that exists right now. And also, the vehicles don't do the training. So, I guess you're going to be loading a bunch of Nvidia Voltas into your car and powering them somehow? The amount of compute required to do training is, and I cannot stress this enough, absolutely astounding. Entire datacenter floors full of hardware.

The biggest caveat is that the computer vision networks will have to be good enough for “feature complete” FSD by this point. That remains to be seen. The comparison to AlphaStar Supervised only pertains to using imitation learning to solve the decision-making component of “feature complete” FSD.

If the vision networks aren't good enough, then quite literally nothing else matters. And again, since neural nets are NOT used to actually drive the vehicle, none of the rest of this matters or makes sense.

Honestly, this just makes no sense to me. I feel like people are hearing buzzwords and landing somewhere on the Dunning–Kruger curve.
 
And again, since neural nets are NOT used to actually drive the vehicle, none of the rest of this matters or makes sense.

Untrue. For example, when a HW3 Tesla running Software v10 drives through a highway cloverleaf on Autopilot, its trajectory is determined in part by a neural network:
[Embedded video of Andrej Karpathy]

That's one hell of an assumption, given none of that exists right now.

At least some of it exists, per the Karpathy video above.

So all we need is to record all driving ever done by 1 million people and then hand it over to...some magical system that interprets, labels, and trains on it?

It isn't magic, it's imitation learning. In end-to-end imitation learning, the input-output pair used for deep supervised learning is 1) a video and 2) the control output (steering, acceleration, braking) recorded while a human is driving the vehicle.

In mid-to-mid imitation learning, the input-output pair is 1) the object detections and other "vector space" representations output by the computer vision networks and 2) the planner-level semantic actions (e.g. paths taken) while a human is driving the vehicle. This input-output pair is known as a state-action pair.

For StarCraft II, the state-action pairs DeepMind used were 1) the game state (e.g. units, buildings, resources) observed by the API and 2) the clicks and keystrokes made by a human player. This approach was highly effective.
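
To make "state-action pair" concrete, here is a minimal sketch of the supervised training step behind this kind of imitation learning (behavioural cloning), written in PyTorch. The state encoding, action space, and network sizes are hypothetical stand-ins, not DeepMind's or Tesla's actual setups:

```python
import torch
import torch.nn as nn

STATE_DIM = 256    # hypothetical size of the mid-level scene encoding
ACTION_DIM = 16    # hypothetical number of discrete planner-level actions

# A small policy network: state in, distribution over actions out.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 512), nn.ReLU(),
    nn.Linear(512, ACTION_DIM),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def training_step(states, human_actions):
    """One behavioural-cloning step: predict the action the human took
    in each state and minimise the classification error."""
    loss = loss_fn(policy(states), human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch: 32 scene encodings and the actions the drivers took.
states = torch.randn(32, STATE_DIM)
human_actions = torch.randint(0, ACTION_DIM, (32,))
print(training_step(states, human_actions))
```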

Waymo has a neural network called ChauffeurNet that was trained via mid-to-mid imitation learning. ChauffeurNet is a research project; it isn't used for planning in Waymo One vehicles. But folks at Waymo say they are pursuing a hybrid approach to planning that combines hand-coded elements and imitation learned elements. I believe Tesla is pursuing a similar, hybrid approach.
[Embedded video: Waymo's ChauffeurNet talk]

Blog post: Learning to Drive: Beyond Pure Imitation

Paper: ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

Uh. About 60 years. Unless you mean an aggregated amount of data that equates to 525,960 hours of driving. Which isn't really a good way to train a driving system, since what you really need is unique data, and a lot of driving is the same exact data streaming by.

I mean 60 years total, i.e., 525,600 hours.

As I briefly mentioned in the OP, the driving data would be curated through active learning. One way to do active learning would be to upload data whenever there's a disagreement among the members of an ensemble of neural networks (sketched below).

Another way would be learning from interventions. Whenever a user disengages Autopilot, upload data about what happened before, during, and after.

A third approach is to upload data when, in manual driving mode, there's a disagreement between the planner and the human driver. Aurora describes this concept in their ICML 2019 talk (slides 14–17, from 13:15 to 16:30).

There are other ways to do it too. For example, if you know your system struggles with unprotected left turns, collect a lot of data on unprotected lefts.
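
As a sketch of the first trigger (ensemble disagreement), something like the following could decide which frames are worth uploading. The threshold is a hypothetical placeholder; a real system would tune it and combine it with the intervention and planner-disagreement triggers above:

```python
import numpy as np

DISAGREEMENT_THRESHOLD = 0.05   # hypothetical upload threshold

def should_upload(ensemble_predictions: np.ndarray) -> bool:
    """ensemble_predictions: (n_models, n_classes) softmax outputs from
    an ensemble for one frame. Upload when the models disagree, i.e.
    when the variance across models is high."""
    return ensemble_predictions.var(axis=0).mean() > DISAGREEMENT_THRESHOLD

# Three ensemble members that disagree about a frame:
preds = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.5, 0.5]])
print(should_upload(preds))  # True -> queue this clip for upload
```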

And also, the vehicles don't do the training. So, I guess you're going to be loading a bunch of Nvidia Voltas into your car and powering them somehow? The amount of compute required to do training is, and I cannot stress this enough, absolutely astounding. Entire datacenter floors full of hardware.

Of course not. The data is uploaded to Tesla and the training happens on Tesla's GPUs. Training all of Tesla's neural networks once from scratch takes 70,000 GPU-hours (i.e., if you have 1,000 GPUs, it will take you 70 hours). 1,000 GPUs is equivalent to 63 Nvidia DGX-2s (16 GPUs per box), which altogether would cost $25.2 million. This is what a DGX-2 looks like, so imagine 63 of these:

[Photo: an Nvidia DGX-2]


If you want to do your training runs in 7 hours, it will take 630 DGX-2s at a cost of $252 million (plus the power bill).
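
For reference, the arithmetic behind those figures, assuming the DGX-2's 16 GPUs per box and its roughly $400,000 list price (above I rounded the 7-hour case's 625 boxes up to 630):

```python
TOTAL_GPU_HOURS = 70_000
GPUS_PER_DGX2 = 16
PRICE_PER_DGX2 = 400_000   # USD, approximate list price

for wall_clock_hours in (70, 7):
    gpus_needed = TOTAL_GPU_HOURS // wall_clock_hours
    boxes = -(-gpus_needed // GPUS_PER_DGX2)   # ceiling division
    print(f"{wall_clock_hours} h: {gpus_needed:,} GPUs = {boxes} DGX-2s "
          f"≈ ${boxes * PRICE_PER_DGX2 / 1e6:.1f}M")
```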

If the vision networks aren't good enough, then quite literally nothing else matters.

I totally agree.
 
Andrej Karpathy made another reference to imitation learning at Tesla Autonomy Day at 2:28:18:

"Ultimately, actually designing all the different heuristics for when it's okay to lane change is actually a little bit intractable, I think, in the general case. And so ideally, you actually want to use fleet learning to guide those decisions. So, when do humans lane change, in what scenarios? And when do they feel it's not safe to lane change? And let's just look at a lot of the data and train machine learning classifiers for distinguishing when it is safe to do so. And those machine learning classifiers can can write much better code than humans because they have the massive amount of data backing that. So, they can really tune all the right thresholds and agree with humans and do something safe."​

A third reference is at 2:44:40:

Question asker: "...in terms of platooning, do you think the system is geared? Because somebody asked about when there is snow on the road, but if you have platooning feature, you can just follow the car in front. Is your system capable of doing that?"

Andrej Karpathy: "So, you're asking about platooning. So, I think, like, we could absolutely build those features. But, again, if you just train neural networks, for example, on imitating humans, humans already, like, follow the car ahead. And so that neural network actually incorporates those patterns internally. It figures out that there's a correlation between the way the car ahead of you faces and the path that you are going to take. That's all done internally in the net. So, you're just concerned with getting enough data and tricky data. And the neural network training process, actually, it's quite magical, does all the other stuff automatically. So, you turn all the different problems into just the one problem: just collect your data set and use neural network training."
 

It's quite funny how no one ever talks about the clear HD map being depicted in that video: the parking lots being mapped and paths through the intersection being mapped, including each lane. Elon is the master of saying what he won't do to appear superior, while doing it behind the scenes. Definitely gonna need verygreen to look into that when city NOA gets here.

EDIT: I don't believe that's simply a view of a car's trajectory (the first minute). That looks like a full HD map if I've ever seen one.
 
So, I already shared a spreadsheet that made conservative assumptions about HW3 vehicle production, etc.

I just created a new version of the spreadsheet that makes more aggressive assumptions:

Tesla's Imitation Learning Fleet: Years of Continuous Driving (projected)

Constructive feedback welcome.

Wayve has a pair of graphs showing the distribution of steering angles and speeds for city driving:

[Graphs from Wayve: distributions of steering angles and speeds in city driving]


Unfortunately, the Y-axis is unlabelled. But eyeballing it supports my original guess that ~95%+ of driving data should be thrown out. A lot of city driving is either 1) driving straight ahead or 2) being stopped. And that's just city driving; highway driving would probably increase the percentage of steering straight (while also adding higher speeds to the distribution).
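
As a concrete illustration of that kind of curation, a filter along these lines would discard the straight-and-steady frames. The thresholds are hypothetical, and a real pipeline would be much more sophisticated:

```python
def is_worth_keeping(steering_angle_deg: float, speed_mps: float,
                     accel_mps2: float) -> bool:
    """Drop frames where the car is cruising straight or sitting still;
    keep turns, hard braking, and hard acceleration."""
    going_straight = abs(steering_angle_deg) < 2.0
    stopped = speed_mps < 0.5
    steady = abs(accel_mps2) < 0.3
    return not ((going_straight or stopped) and steady)

print(is_worth_keeping(0.5, 25.0, 0.0))   # False: straight highway cruising
print(is_worth_keeping(15.0, 8.0, 0.0))   # True: a turn
```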
 
Maybe you could elaborate?

You're making up pretty much everything in here based on ignorance and complete speculation. There is zero relation between game-playing networks and what's used in these cars. You're asserting that the NNs control the vehicle, which they absolutely do not. There's really no reason to debate facts against fabrication.

Imitation learning is simple when all you need to do is learn the structure of a game. When it needs to learn how the universe works, that's a bit of a different problem set.
 
You're asserting that the NNs control the vehicle, which they absolutely do not. There's really no reason to debate facts against fabrication.

What's your evidence for this? Andrej Karpathy was quite clear at Tesla Autonomy Day (2:13:33) that Tesla is using imitation learning for path prediction, which — according to Karpathy — plays an important role in determining Autopilot's trajectory through highway cloverleafs. He said (my emphasis):

"...we understand the path that this person took through this environment. And then of course we can use this for supervision for the network. So, we just source a lot of this from the fleet, we train the neural network on those trajectories, and then the neural network predicts paths, just from that data. So, really, what this is referred to typically is called imitation learning. We're taking human trajectories from the real world and we're just trying to imitate how people drive in real worlds. And can also apply the same data engine crank to all of this and make this work over time."
Then a bit later (2:15:52) he said (my emphasis):

"So, path prediction actually is live in the fleet today, by the way. So, if you're driving cloverleafs, if you're in a cloverleaf on the highway, until maybe five months ago or so, your car would not be able to do cloverleaf, now it can. That's path prediction, running live on your cars. We've shipped this a while ago. And today, you're going to get to experience this for traversing intersections. A large component of how we go through intersections and your drives today is all sourced from path prediction from automatic labels."​

In my post above, I also quoted what Karpathy said about applying imitation learning to lane changes and car following.

Do you think Karpathy was "fabricating" all of this?
 
What's your evidence for this? Andrej Karpathy was quite clear at Tesla Autonomy Day (2:13:33) that Tesla is using imitation learning for path prediction, which — according to Karpathy — plays an important role in determining Autopilot's trajectory through highway cloverleafs. He said (my emphasis):

"...we understand the path that this person took through this environment. And then of course we can use this for supervision for the network. So, we just source a lot of this from the fleet, we train the neural network on those trajectories, and then the neural network predicts paths, just from that data. So, really, what this is referred to typically is called imitation learning. We're taking human trajectories from the real world and we're just trying to imitate how people drive in real worlds. And can also apply the same data engine crank to all of this and make this work over time."
Then a bit later (2:15:52) he said (my emphasis):

"So, path prediction actually is live in the fleet today, by the way. So, if you're driving cloverleafs, if you're in a cloverleaf on the highway, until maybe five months ago or so, your car would not be able to do cloverleaf, now it can. That's path prediction, running live on your cars. We've shipped this a while ago. And today, you're going to get to experience this for traversing intersections. A large component of how we go through intersections and your drives today is all sourced from path prediction from automatic labels."​

In my post above, I also quoted what Karpathy said about applying imitation learning to lane changes and car following.

Do you think Karpathy was "fabricating" all of this?

I could be wrong, but to my knowledge, while Tesla did indeed use imitation learning to figure out the right paths, the vehicle controls themselves are not directly controlled by neural nets. Put differently, neural nets handle the perception part of seeing the lanes, which then informs the "Software 1.0" code that actually handles the vehicle's steering and braking. The neural nets are not directly controlling the steering and braking.
 
I could be wrong, but to my knowledge, while Tesla did indeed use imitation learning to figure out the right paths, the vehicle controls themselves are not directly controlled by neural nets. ... The neural nets are not directly controlling the steering and braking.

Yes, I believe this is correct.

Path prediction might be an instance of what's known as mid-to-mid imitation learning, wherein the planner or some elements of the planner are neural networks trained via imitation learning, but 1) the computer vision system is trained via traditional fully supervised learning with hand-labelled images/videos and 2) hand-coded controls software executes the plan given to it by the planner. From the Waymo ChauffeurNet talk embedded above:

[Slide from Waymo's ChauffeurNet talk]


Or if path prediction is trained by driver-generated labels applied to raw pixels — rather than to representations extracted from raw pixels by the computer vision networks — then it would be end-to-mid imitation learning.

I'm not sure whether Tesla drivers' steering is labelling raw images of roadways or some mid-level representation like road edges, semantic road space, or semantic free space. So, I'm not sure whether path prediction is mid-to-mid or end-to-mid.

In either case, I believe Tesla's controls software is hand-coded.
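
To put the distinction in sketch form (every function below is an illustrative stub of my own, not Tesla's actual architecture):

```python
def vision_networks(frames):
    """Supervised CV trained on hand-labelled data: pixels -> 'vector space'."""
    return {"road_edges": [], "objects": []}

def learned_planner(inputs):
    """Imitation-learned from human trajectories: inputs -> path."""
    return ["follow_lane"]

def hand_coded_controller(path):
    """Classic 'Software 1.0' control: path -> actuator commands."""
    return {"steering": 0.0, "throttle": 0.1}

def mid_to_mid(frames):
    # The learned planner consumes mid-level representations.
    return hand_coded_controller(learned_planner(vision_networks(frames)))

def end_to_mid(frames):
    # The learned planner consumes raw pixels directly.
    return hand_coded_controller(learned_planner(frames))
```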

Put differently, neural nets handle the perception part of seeing the lanes, which then informs the "Software 1.0" code that actually handles the vehicle's steering and braking.

Since path prediction is actually predicting the future curvature and gradient of the roadway ahead, beyond what the car's sensors (including cameras) can see, I would say it is an element of planning rather than an element of perception. Or at least it's in a gray area where it isn't easily categorized as strictly perception.
 
To clarify: my assertion is not that neural networks are involved in Tesla's controls software. My assertion is that — per Andrej Karpathy's remarks on Autonomy Day — neural networks are involved in Tesla's planner. Path prediction is the primary example of this.

My point is just that Tesla is using imitation learning with Autopilot/FSD. And that's what makes the analogy between Autopilot/FSD and AlphaStar Supervised relevant.
 
Fleet data, imitation learning, deep neural nets, end-to-end learning, etc. are useful tools that help with developing autonomous driving. They are tools that everybody is using, not just Tesla. They are not magic bullets that automatically solve autonomous driving. I think that is important to keep in mind. I only say this because I get the impression that some Tesla fans sometimes act like Tesla just needs to feed all the data from their large fleet into a machine and full self-driving will pop out the other end. It does not work that way.
 
Sebastian Thrun, a co-founder of the Google self-driving car project (which is now Waymo), explains planning:
[Embedded video: Sebastian Thrun explains planning]

Thrun explains control:
[Embedded video: Sebastian Thrun explains control]

These videos are from the free online Udacity course Self-Driving Fundamentals.

What I'm saying is that neural networks and imitation learning are involved in planning. Not in control.

Hope this helps clear up any confusion.

I think, above, @DrDabbles may have thought I was discussing control when I was discussing planning.
 
Sebastian Thrun, a co-founder of the Google self-driving car project (which is now Waymo), explains planning:


Thrun explains control:


These videos are from the free online Udacity course Self-Driving Fundamentals.

What I'm saying is that neural networks and imitation learning are involved in planning. Not in control.

Hope this helps clear up any confusion.

Thanks for the videos.
 
Fleet data, imitation learning, deep neural nets, end-to-end learning, etc. are useful tools that help with developing autonomous driving. They are tools that everybody is using, not just Tesla.

But if, to extract 100 years of useful driving data from your fleet for imitation learning, you need 40,000 years of total driving, then you need a large fleet to do this in a practical amount of time.

If you have 1,000 cars driving 24/7/365, that will take you 40 years. If you have 10,000 cars driving 24/7/365, it will take you 4 years. Tesla can do it in about 1.5 years.
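
Spelling out that arithmetic (the last comment is my own inference from the 1.5-year figure):

```python
TOTAL_DRIVING_YEARS_NEEDED = 40_000

for fleet_size in (1_000, 10_000):
    years = TOTAL_DRIVING_YEARS_NEEDED / fleet_size
    print(f"{fleet_size:,} cars driving 24/7/365: {years:.0f} years")

# ~1.5 years implies the equivalent of roughly 40,000 / 1.5 ≈ 27,000
# cars' worth of continuous, round-the-clock driving.
```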

A widely accepted principle of deep learning is that scale of data matters quite a lot. This has been borne out by research from Baidu, Google, Facebook, and others. Specifically in the autonomous driving context, Wayve recently published a blog post that showed — albeit at a very small scale — how their system's performance scaled with hours of experience:

[Plot from Wayve: driving performance scaling with hours of training experience]


Pretty much everyone working in the autonomous vehicle space seems to agree that scale of training data is important. For example, this is what Kyle Vogt, the CTO and President of Cruise, recently said:

“The reason we want lots of data and lots of driving is to try to maximize the entropy and diversity of the datasets we have.”
CEOs, CTOs, Presidents, and technical leads of other AV companies express similar sentiments.
 
Imitation learning is simple when all you need to do is learn the structure of a game. When it needs to learn how the universe works, that's a bit of a different problem set.

Andrej Karpathy is a world-class deep learning researcher. Why do you think Karpathy believes in using imitation learning for autonomous driving tasks like path prediction and lane changes?

Mid-to-mid imitation learning doesn't need to understand the world from scratch. It uses the computer vision neural networks' representations, i.e., the “vector space” representations of the physical world, and it can also use the behaviour prediction neural networks' forecasts about road users' future trajectories.

Imitation-learned elements can also be combined with a hand-designed planner. Sergey Levine has a great talk about this. If the system detects that the incoming data is out-of-distribution for the imitation learning neural networks, it can fall back on hand-coded planning algorithms (i.e. heuristics). This is a hybrid approach between “Software 1.0” and “Software 2.0”.
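
Here is a minimal sketch of that fallback logic. The disagreement-based out-of-distribution score and the threshold are stand-ins of my own; real systems use more principled OOD detectors:

```python
import statistics

OOD_THRESHOLD = 0.05   # hypothetical

def plan(scene, learned_planners, heuristic_planner):
    """Use the imitation-learned planners when they agree; otherwise
    treat the input as out-of-distribution and fall back to the
    hand-coded heuristics."""
    # Each planner returns a scalar (e.g. target curvature) to keep
    # the sketch simple.
    candidates = [p(scene) for p in learned_planners]
    if statistics.pvariance(candidates) > OOD_THRESHOLD:
        return heuristic_planner(scene)
    return statistics.mean(candidates)

# Toy usage: two learned planners that agree, plus a heuristic fallback.
print(plan({"curve": 0.10},
           [lambda s: s["curve"], lambda s: s["curve"] * 1.02],
           lambda s: 0.0))
```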

Waymo appears to be pursuing such a hybrid approach. From their blog post on ChauffeurNet:

“The planner that runs on Waymo vehicles today uses a combination of machine learning and explicit reasoning to continuously evaluate a large number of possibilities and make the best driving decisions in a variety of different scenarios, which have been honed over 10 million miles of public road testing and billions of miles in simulation. Therefore, the bar for a completely machine-learned system to replace the Waymo planner is incredibly high, although components from such a system can be used within the Waymo planner, or can be used to create more realistic “smart agents” during simulated testing of the planner.”
Waymo’s head of research, Drago Anguelov, gave a guest lecture at MIT where he stressed the importance of imitation learning for autonomous driving and discussed how Waymo is using it (starting around 34:00):
[Embedded video: Drago Anguelov's guest lecture at MIT]
