Elon: "Feature complete for full self driving this year"


You do realize my first quote where I say Smart Summon is L4 was tongue in cheek, right? Did you not see the little emoticons?
 
Good god, I’d be laughing as hard as Mike is if I hadn’t actually paid $3k for this two years ago. It sounds like they should rename “advanced summon” to “Flintstones mode” with that list of limitations, because pushing the f-ing car seems like it’ll be more efficient than this.

I'm really curious what the Model Y will have in terms of cameras.

The biggest limitation the Model S/X/3 has when it comes to Smart Summon is that there are no downward-facing 360-degree cameras. So there is no way for it to detect a lot of things like curbs.

Now I still look forward to it being released. I do have a few easy use cases where it could be useful.
 
I asked @strangecosmos in another thread to run the numbers on the cost of collecting, storing, retrieving, labeling, and training on this supposedly vast amount of fleet data, but he never responded.

I don't read everything people post, and I don't respond to everything I read. The more respectful you are, the more likely I am to respond. The more disrespectful, the less likely.

Training machine learning algorithms on large amounts of data is very expensive. If there is any human labeling, that is also very expensive.

Human labelling is by far the most expensive part of the equation for the training of a perception neural network. Raw sensor data needs human labelling. (Maybe some raw sensor data can be weakly labelled with driver input, but even if so that's at best supplementary to the hand-labelled data.)

State-action pairs for imitation learning don't need to be labelled. The action is the label for the state. In other words, the action is the output that the neural network learns to map to the input, which is the state. This is just good ol' fashioned deep supervised learning. AlphaStar was trained with state-action pairs from StarCraft games that didn't require hand labelling by annotators.
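To make the "action is the label" point concrete, here's a minimal behavioral-cloning sketch in PyTorch. All the names, dimensions, and the toy architecture are mine for illustration only, not anything Tesla or DeepMind has disclosed:

```python
# Minimal behavioral-cloning sketch (hypothetical names and shapes).
# The recorded driver action serves directly as the supervised label for
# the observed state, so no human annotation is needed.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, state_dim=256, action_dim=2):  # e.g. steering angle + pedal
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, state):
        return self.net(state)

def train_step(model, optimizer, states, actions):
    """states: (batch, state_dim); actions: (batch, action_dim) from driving logs.
    The action IS the label -- ordinary supervised regression."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(states), actions)
    loss.backward()
    optimizer.step()
    return loss.item()

model = PolicyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# One step on a random batch, standing in for logged state-action pairs:
print(train_step(model, optimizer, torch.randn(32, 256), torch.randn(32, 2)))
```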

At the prices advertised on Google's website, training AlphaStar on cloud TPUs for the reinforcement learning portion of the training (which I have to assume was much more computationally intense than the imitation learning portion) would cost around $4 million.

For context, Tesla's overall R&D spending is $350 million+ per quarter. If you wanted to do the RL portion of AlphaStar's training once per week, it would cost $52 million per quarter ($4 million * 13 weeks).
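Back-of-envelope, in case anyone wants to check the arithmetic (all inputs are the rough estimates above):

```python
# Rough cost check using the figures above (all rough estimates).
rl_run_cost = 4_000_000             # ~$4M per AlphaStar-style RL run at cloud TPU list prices
runs_per_quarter = 13               # one run per week
quarterly_training_cost = rl_run_cost * runs_per_quarter
quarterly_rnd_budget = 350_000_000  # Tesla R&D is $350M+ per quarter
print(quarterly_training_cost)                         # 52,000,000 -> ~$52M/quarter
print(quarterly_training_cost / quarterly_rnd_budget)  # ~0.15 -> ~15% of quarterly R&D
```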

As mongo mentioned, the economics work out differently if Tesla owns its own GPUs/NN training chips. According to Amir Efrati's reporting:

"...Tesla uses thousands of microchips known as graphical processor units, or GPUs, to train many networks simultaneously at its headquarters, according to a person who has been involved with the effort."​
 
Amir himself said Tesla hadn't even started working on a simulator

Tesla hadn't even started working on a simulator? That's certainly not what Amir said. He wrote:

"Under Mr. Bowers, the simulation and maps teams are both still in their infancy, said a person familiar with the situation. But simulation already has proven to be valuable in helping the Autopilot teams, especially the vision team, see how new software would be able to handle difficult driving scenarios that don’t happen frequently in the real world.​

As an example, Tesla’s simulator has helped determine that software recently being developed would prompt the vehicle to stop if it encountered certain parked vehicles on the highway."
So... Tesla hadn't even started working on a simulator? Not what Amir wrote. An infant isn’t a gamete.

When you make a strong statement of fact, it helps to double check the source material you're citing, rather than going off memory. We often misremember things we read. You've badly misquoted me multiple times; in at least one case the post you misremembered said the direct opposite of what you later claimed it did. In other cases, what you claimed I said was just completely untrue.

To avoid making this mistake, you can look up the source and quote it rather than going off memory. This might be slightly more inconvenient in the moment, but it can save you time in the long run and give your arguments more credibility. When someone can disprove what you said just by quoting the thing you referenced — multiple times — that makes it hard to trust your claims in the future.

Evidence of state/action data collection X

Why do you say there is no evidence Tesla is using imitation learning? On one hand, you are:

1) citing Amir's reporting as evidence regarding Tesla's simulation effort

But, on the other hand, you are:

2) disregarding Amir's reporting that Tesla is using imitation learning

Surely (1) and (2) are inconsistent. Either Amir's reporting is evidence of Tesla's internal efforts, or it isn't.

Evidence of HD Map X

Amir reported in the same article that Tesla is creating HD maps. As I understand it, he even said that Navigate on Autopilot uses HD maps:

"Detailed road maps, on the other hand, are in an even earlier stage at Tesla. These are different from surface-level navigation maps from Google that Tesla owners can use in their vehicle. Autopilot software stepped up reliance on maps for the just-launched feature to help the vehicle merge from one highway to another, in which it is useful to know when you are about to approach such a merge. (It is not clear how well the merge feature works overall.)​

The more detailed maps Tesla is building rely on image data that’s collected by Tesla vehicles on the road, in combination with GPS. In the future, these maps might be used to spot construction zones or other hazards and communicate them to other Tesla vehicles so that the Autopilot driving system can avoid them automatically."​

he ignores that Mobileye's EyeQ4 is actually already ahead with a similar advantage on abstracted data.

In January, I asked Amnon Shashua (the CEO) if Mobileye is using imitation learning and this was his reply:

"Imitation learning is great when you have someone to imitate (like in pattern recognition & NLP). We instead created two layers – one based on “self-play” RL that learns to handle adversarial driving (including non-human) and another layer called RSS which is rule-based."
If you have any evidence Mobileye is collecting state-action pairs for imitation learning from production cars with EyeQ4, please share it.

Then in my other point I talked about how your own post disproves the statement you made. You said it yourself: Tesla needs HW3 because only that is said to have traffic light, traffic sign, road sign, road marking, pothole, debris, and general object detection, more accurate detection, etc. So how can they already have the training data they need, as you just said, when the current firmware in AP2.x has none of those detection capabilities?

I didn't mean to imply that Hardware 2 will give Tesla all the mid-level representation data it needs for imitation learning. Yes, Tesla needs Hardware 3. Once HW3 starts going into all new cars, and once HW2 cars start getting retrofitted with HW3, the HW3 fleet will rapidly begin to grow until it surpasses the HW2 fleet.

Similar to how Waymo uses imitation learning for some driving tasks and not others, Tesla could use imitation learning for some narrow tasks initially and broaden its scope over time. Amir's reporting implies Tesla is already using imitation learning, but I agree that what happens post-HW3, when fuller mid-level representation data is available, is more interesting than what might be happening pre-HW3.
 
State-action pairs for imitation learning don't need to be labelled. The action is the label for the state. In other words, the action is the output that the neural network learns to map to the input, which is the state. This is just good ol' fashioned deep supervised learning. AlphaStar was trained with state-action pairs from StarCraft games that didn't require hand labelling by annotators.

For this assumption to work, you need one of two things:

1) Raw data of the state (ie full-res video from all the cameras)
or
2) Reliable abstraction (ie 3D view of the world)

There is no proof that Tesla is gathering 1) on a significant enough scale to teach NNs, nor is their vision engine anywhere near reliable enough for 2).

On the other hand, Mobileye's REM mapping is basically doing 2). I won't be digging up their talks on gathering driving interactions in an abstracted form and feeding those into NNs for driving policy simulation (which you can then compare your driving against), but that's something you can do when you are at point 2). Tesla is not.

So for Tesla to teach NNs via consumer cars, they would basically have to feed back massive amounts of raw data as well as those driving logs, and there is no proof of this happening on an NN-teaching scale.

No, Tesla trains their networks outside of consumer cars... As I've said, Tesla has a potential validation and deployment advantage in their consumer fleet, but there are no signs of them using that consumer fleet for NN training or being anywhere near a point where they could.
 
So... Tesla hadn't even started working on a simulator? Not what Amir wrote. An infant isn’t a gamete.

I would equate infancy with not having started: serious development hasn't begun. I explicitly pointed out the difference between a simulator built for ADAS lane keeping and adaptive cruise control versus one used for self-driving functionality. You even just confirmed that by pointing out how they resolved issues for current AP on the highway. That is quite different from a simulator built for full self-driving functionality, including full-fledged multi-agent simulation, for example Waymo's Carcraft, which is exactly what we are talking about here. So yes, according to Amir, they haven't even started. Simulators built for lane keeping/ACC features don't count. I already outlined that in my post.

To avoid making this mistake, you can look up the source and quote it rather than going off memory. This might be slightly more inconvenient in the moment, but it can save you time in the long run and give your arguments more credibility. When someone can disprove what you said just by quoting the thing you referenced — multiple times — that makes it hard to trust your claims in the future.

Unlike a lot of people, I have watched every video, read every paper, and listened to every talk and presentation. I have done in-depth research into basically EVERY single self-driving company and their techniques. Every statement I make is based on absolute fact. The keyword here is EVERY, and I re-analyze so often that everything is fresh in my memory.

Why do you say there is no evidence Tesla is using imitation learning?

I never said that. I said collection of state/action pairs, which specifically implied the ones needed for an AlphaStar thesis, and as @wk057 proved, it's not being uploaded.

Imitation learning is already used in current networks. Mobileye has a network they call holistic path planning; for Tesla, it's the yellow arrow in @verygreen's videos; Nvidia calls theirs path perception. Mobileye's method (egomotion) is to use the 6DOF motion of the camera to produce future trajectories, which they pair with the output of the raw camera input and feed into a neural network. They don't use steering/pedal inputs. This is what I believe the imitation learning in Amir's articles refers to, at least for their current efforts.
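If I had to sketch what such a network looks like, it would be something like the toy example below. The names and shapes are made up; the point is only that the label is a future trajectory derived from egomotion, not steering/pedal input:

```python
# Toy sketch of a "path" network of the kind described above (hypothetical).
# The training label is the future ego trajectory recovered from 6DOF camera
# egomotion, not the driver's steering/pedal inputs.
import torch
import torch.nn as nn

class PathNet(nn.Module):
    def __init__(self, feat_dim=512, horizon=10):
        super().__init__()
        # Predict `horizon` future (x, y) waypoints from camera features.
        self.head = nn.Linear(feat_dim, horizon * 2)

    def forward(self, camera_features):
        return self.head(camera_features)

def train_step(model, optimizer, camera_features, ego_trajectory):
    """ego_trajectory: (batch, horizon*2) future positions from egomotion."""
    optimizer.zero_grad()
    loss = nn.functional.smooth_l1_loss(model(camera_features), ego_trajectory)
    loss.backward()
    optimizer.step()
    return loss.item()
```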

Amir reported in the same article that Tesla is creating HD maps. As I understand it, he even said that Navigate on Autopilot uses HD maps:

"Detailed road maps, on the other hand, are in an even earlier stage at Tesla. These are different from surface-level navigation maps from Google that Tesla owners can use in their vehicle. Autopilot software stepped up reliance on maps for the just-launched feature to help the vehicle merge from one highway to another, in which it is useful to know when you are about to approach such a merge. (It is not clear how well the merge feature works overall.)​

The more detailed maps Tesla is building rely on image data that’s collected by Tesla vehicles on the road, in combination with GPS. In the future, these maps might be used to spot construction zones or other hazards and communicate them to other Tesla vehicles so that the Autopilot driving system can avoid them automatically."​

The maps that are downloaded as a prerequisite to use NoA are not HD maps. From Amir's article and your quote, their actual HD maps are also in their infancy. Both of these things are VITAL for an AlphaStar thesis, and based on Amir's article they are not even close to being ready.

In January, I asked Amnon Shashua (the CEO) if Mobileye is using imitation learning and this was his reply:

"Imitation learning is great when you have someone to imitate (like in pattern recognition & NLP). We instead created two layers – one based on “self-play” RL that learns to handle adversarial driving (including non-human) and another layer called RSS which is rule-based."

Your question wasn't direct. They 100% use imitation learning to bootstrap their RL, which I have already proved by providing academic papers and videos of them outlining it. They already drove 750k miles in manual mode in California last year.

None of the prerequisites needed to do a full AlphaStar are ready, nor is there a timetable for them to be ready. AlphaStar, which I may add, isn't even a single network. An architecture tailored to work on AlphaStar won't necessarily work on driving, so you actually have to develop an architecture that works for driving.

They can, however, do imitation learning with drivers' continuous steering/pedal inputs as the NN output and the processed sensor outputs from HW2 cars as the NN input for highway EAP driving. But as proven by @verygreen and @wk057, they haven't done that.
 
If you have any evidence Mobileye is collecting state-action pairs for imitation learning from production cars with EyeQ4, please share it.

Other than the fact that Mobileye does use IL to bootstrap their RL efforts, we know for a fact they are also gathering HD maps of the world and will use them for localization, driving, and simulation. They are also gathering drivable path trajectories (which can be used in a neural network: the HD map would be the state and the trajectories would be the output).



However, based on their published papers on their method, their tech presentations, their patents for reinforcement learning for driving, and of course their collaboration with BMW, which I quoted below, we know for a fact: they are using IL to bootstrap RL 100%.

There are some 80 engineering test cars around the globe, gathering data on the roads and learning about driving in different locations. With so many sensors gathering data, it comes in at "a couple of terabytes per hour per car". But it's data that's essential for developing an autonomous driving policy and it's data that informs the algorithm that will ultimately see the car making decisions.

So if you spot test cars out on the road they aren't necessarily driving autonomously waiting for human intervention, instead they are capturing the human driver's data to learn from the decisions that are being made.
 
This is interesting. Have you noticed any patterns in when Tesla uploads steering+pedal data, or what other data it uploads alongside it?

I remember in some other thread he said it wasn't disengagements that triggered logging, but certain specific environmental triggers. He called them "campaigns" where a certain thing to look for was enabled on a small subset of the fleet.

I think someone else said that supervised learning won't work because the car doesn't know what your intent is, so it has nothing to compare against. For example, you may be navigating to the airport, but the car has no idea what you are doing. To it, you are randomly changing lanes, etc.

If you turn on navigation, but not NoA, I suppose it could compare its intended route with your actions.
 
I think someone else said that supervised learning won't work because the car doesn't know what your intent is, so it has nothing to compare against. For example, you may be navigating to the airport, but the car has no idea what you are doing. To it, you are randomly changing lanes, etc.

I think navigation and driving are different problems. I believe navigation and routing are handled by a traditional GPS navigation system like Google Maps or TomTom or whatever. I believe supervised imitation learning only comes in at the level of discrete driving tasks, like taking a right turn at an intersection, or taking an exit off a highway, etc.
 
On the other hand, Mobileye's REM mapping is basically doing 2). I won't be digging up their talks on gathering driving interactions in an abstracted form and feeding those into NNs for driving policy simulation (which you can then compare your driving against), but that's something you can do when you are at point 2). Tesla is not.

HD map data is different from state-action pairs. HD maps only include fixed features of the environment (roads, signs, lights, lane lines), not road users (cars, bikes, pedestrians). HD map data also doesn't include driver input (steering, braking, accelerating).
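To illustrate the difference, here are two purely hypothetical data structures showing what each kind of data contains:

```python
# Hypothetical data structures contrasting HD-map data with state-action pairs.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HDMapFeature:
    # Fixed environment only: lane lines, signs, lights, road geometry.
    kind: str                             # e.g. "lane_line", "stop_sign"
    points: List[Tuple[float, float]]     # static geometry; no road users, no driver input

@dataclass
class StateActionPair:
    # State includes dynamic road users; the action is the driver's input.
    road_users: List[dict]                # cars, bikes, pedestrians (position, velocity, type)
    steering: float                       # driver input = the label
    throttle: float
    brake: float
```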

It would be incredibly interesting to me if Mobileye is collecting data on road users and on driver input from production cars with EyeQ4. I've watched a good number of Mobileye talks and read a few of their papers, but I haven't come across that yet.

They are using IL to bootstrap RL 100%.

There are some 80 engineering test cars around the globe, gathering data on the roads and learning about driving in different locations. With so many sensors gathering data, it comes in at "a couple of terabytes per hour per car". But it's data that's essential for developing an autonomous driving policy and it's data that informs the algorithm that will ultimately see the car making decisions.

So if you spot test cars out on the road they aren't necessarily driving autonomously waiting for human intervention, instead they are capturing the human driver's data to learn from the decisions that are being made.

(Source: BMW technology: How the carmaker is developing its autonomous driving system)

Sure, BMW says it's doing imitation learning on an engineering fleet of 80 cars, but not on hundreds of thousands of production cars. Similarly, Waymo is doing imitation learning on its hundreds of engineering cars, but not on any production cars.

For this assumption to work, you need one of two things:

1) Raw data of the state (ie full-res video from all the cameras)
or
2) Reliable abstraction (ie 3D view of the world)

There is no proof that Tesla is gathering 1) on a significant enough scale to teach NNs, nor is their vision engine anywhere near reliable enough for 2).

I think you are not drawing a clear enough line between perception and action (by "action" I mean path planning and driving policy). Yes, perception errors need to be reduced to within a tolerable threshold in order to reduce errors in the learning and execution of action to within a tolerable threshold. But to bring perception errors down to this extent doesn't necessarily require production fleet data.

Tesla can collect raw sensor data from engineering cars just like Waymo, Cruise, Zoox, and Mobileye can. The main bottleneck to acquiring a diverse, high-quality labelled dataset is the cost of human labelling. Paying drivers to drive around collecting data is relatively cheap (at 25 mph and $25/hour, it's $1/mile). Collection is cheap; labelling is expensive. The amount of raw sensor data Tesla can collect from engineering cars is not practically constrained by cost: driving 50 million miles per year would cost around $50 million and take up ~3.5% of its R&D budget.
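For anyone checking the math (inputs are the assumptions above):

```python
# Rough check of the collection-cost estimate (all inputs are assumptions above).
speed_mph = 25
driver_cost_per_hour = 25
cost_per_mile = driver_cost_per_hour / speed_mph    # $1 per mile
miles_per_year = 50_000_000
collection_cost = miles_per_year * cost_per_mile    # ~$50M per year
annual_rnd_budget = 350_000_000 * 4                 # ~$1.4B per year
print(collection_cost / annual_rnd_budget)          # ~0.036 -> ~3.5% of R&D
```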

Tesla also collects some raw sensor data from production cars, as Karpathy has discussed at length and as verygreen's hacking has confirmed. The true scale of collection is hard to know.

We don't have insight into the scale of Tesla's data collecting or labelling — or the scale of collecting and labelling for other companies like Waymo. Mobileye said at one point it had 1,000 full-time employees working on labelling. At $50/hour, 40 hours per week, and 47 weeks per year, their annual salaries would be about $100 million plus maybe about 1/3 (just a guess) for other costs of employing them. So let's say $135 million/year. That would be about 10% of Tesla's R&D budget.
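Again, the arithmetic, with the same guesses spelled out:

```python
# Rough check of the labelling-cost guess (inputs are the guesses above).
labellers = 1000
hourly_rate = 50
hours_per_week = 40
weeks_per_year = 47
salaries = labellers * hourly_rate * hours_per_week * weeks_per_year  # $94M, round to ~$100M
overhead = salaries / 3              # "maybe about 1/3" extra cost of employment
total = salaries + overhead          # ~$125M; rounding salaries to $100M first gives ~$135M
annual_rnd_budget = 350_000_000 * 4  # ~$1.4B per year
print(total / annual_rnd_budget)     # ~0.09 -> roughly 10% of Tesla's R&D
```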

So, while it's technically true that there is no proof Tesla is collecting or labelling enough raw sensor data to train perception NNs, technically there is no proof that Waymo, for instance, is labelling enough raw sensor data either. These companies typically don't disclose much about the exact scale of their internal R&D operations. They are highly secretive. They make employees sign NDAs and sue departing employees whom they suspect of stealing secrets (sometimes overzealously). Even if companies did disclose how much data they label, we don't really know exactly how much labelled training data is necessary for autonomous car perception NNs. No matter what numbers they disclosed about the quantity of labelled data, there would be no proof that that much data is enough.

So, while there is no proof Tesla is collecting and labelling enough raw sensor data to train its perception NNs, there is also no proof that Waymo, Cruise, Zoox, Mobileye, or any other company on Earth is collecting and labelling enough raw sensor data to train its perception NNs.

I would equate infancy with not having started: serious development hasn't begun.

"Serious" is so subjective a term this could mean anything.

So yes, according to Amir, they haven't even started. Simulators built for lane keeping/ACC features don't count. I already outlined that in my post.

Amir didn't write that the simulator isn't a full self-driving simulator. That's a conjecture you're making.

If the simulator is only an ADAS simulator, and it's new, how do you square that with the fact that Tesla has been doing ADAS simulation since at least early 2016?

The maps that are downloaded as a prerequisite to use NoA are not HD maps.

Source?
 
Snapshot requests vary, as does the data requested, but typically after a trigger happens, a CAN bus dump (that includes raw radar, but also steering and other such driver input) is sent as a follow-on 30 seconds later (since the dump goes back about 2 minutes, that's OK).
 
Snapshot requests vary, as does the data requested, but typically after a trigger happens, a CAN bus dump (that includes raw radar, but also steering and other such driver input) is sent as a follow-on 30 seconds later (since the dump goes back about 2 minutes, that's OK).

Thanks! Any camera data? Video or still images?

If the neural network’s mid-level representation (i.e. what’s represented in your videos by bounding boxes, “green carpet”, etc.) were uploaded with snapshots, would you be able to tell?
 
I found an old Tesla job posting for “Simulation Engineer, Autopilot”. (archive.org | archive.is)

The job description says a few things to make it clear this is a full self-driving simulator:
  • “As an Autopilot Simulation Engineer, you will contribute to the development of a fully-autonomous driving system...”
  • Responsibilities include: “The development of new models covering all parts of the self-driving stack.”
  • And: “Keep up to date with the latest research/technologies in the fields of autonomous-driving and simulation.”
An archive.org copy of the Tesla careers page from December 30, 2017 shows the same job listed under a slightly different name: “Autopilot Simulation Engineer”. The actual job posting page isn’t archived that far back, but we know it’s the same position because:
  • The URL itself is also identical.
  • The job description for “Simulation Engineer, Autopilot” says “As an Autopilot Simulation Engineer, you will...”.
So, we can conclude: Tesla started looking for engineers to work on a full self-driving simulator no later than December 30, 2017.

But we can narrow it down further. A TMC post I made on November 7, 2017 references that same job posting:

Tesla has posted public job listings for an “Autopilot Simulation Engineer” and a “Software Engineer, Autopilot Simulation”.

Unfortunately, the links are broken and unarchived.

So Tesla started hiring for this position by November 2017 at the latest. That puts work on a full self-driving simulator at least ~1 year before Amir's article.
 