
Tesla files patent for sourcing self-driving training data from its fleet


diplomat33

Karpathy is listed as the inventor of the patent. It appears to be a system for determining if data from the fleet needs to be uploaded for training:

"An example method includes receiving sensor data and applying a neural network to the sensor data. A trigger classifier is applied to an intermediate result of the neural network to determine a classifier score for the sensor data. Based at least in part on the classifier score, a determination is made whether to transmit via a computer network at least a portion of the sensor data. Upon a positive determination, the sensor data is transmitted and used to generate training data."
Tesla files patent for sourcing self-driving training data from its fleet - Electrek
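
For anyone who finds code easier to follow than patent language, here is a minimal sketch of the flow the abstract describes: run the network, take an intermediate result, score it with a small trigger classifier, and upload only if the score clears a threshold. Everything in it is an assumption for illustration (the two-input toy layer, the weights, the 0.5 threshold, the function names); it is not Tesla's code.

```python
import math
from typing import List, Sequence


def dense_relu(x: Sequence[float],
               weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """One fully connected layer with ReLU; stands in for a real network layer."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def trigger_score(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float) -> float:
    """Linear trigger classifier with a sigmoid output in (0, 1)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def should_upload(sensor: Sequence[float], threshold: float = 0.5) -> bool:
    """Decide whether to transmit, based on the classifier score."""
    # Hypothetical toy weights; a real network and classifier would be trained.
    hidden = dense_relu(sensor, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
    # The trigger classifier reads the intermediate result, not the raw sensor data.
    score = trigger_score(hidden, [2.0, -1.0], -1.0)
    return score > threshold
```

The point of the design, as I read it, is that the trigger classifier is tiny compared to the main network and reuses features the car computes anyway.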



It might allow Tesla to better leverage their large fleet for training by doing a better job of collecting useful data from the fleet.

Thoughts?
 

I think this actually contradicts what some Tesla-hackers have been telling us with regard to Tesla's training data collection. Their personal cars haven't been uploading data, and so they concluded that the fleet as a whole doesn't upload much.

But if the above patent is already implemented, those vehicles may have just not been in situations novel enough to trigger data collection.

Anecdotally, you see folks on this forum with those mesh WiFi routers that enable them to see how much each device is uploading/downloading, and some people report GB worth of uploads. I think Tesla has a lot more training data on hand than we've been giving them credit for.
 
It stands to reason that Tesla would not be uploading every single mile that we drive. That would be a waste and totally inefficient. As this patent seems to hint at, Tesla needs a way to judge if data is useful or not and only upload data for training that they actually need. So some cars might see larger uploads than others depending on if their cars have data that Tesla finds useful for training.
 
I wonder why a generalized patent like this wasn't submitted a long time ago.

Who knows? This patent does seem related to "Operation Vacation" and Dojo since it seems part of Tesla's effort to automate more of the whole process from initial fleet data collection to final machine learning. Maybe Tesla was not ready until now to do it?
 
The optimist in me says Tesla is just about to deploy their re-write to AP along with some new FSD features, and they're using this patent as an opportunity to "show their hand" and explain how they've accomplished the results.

The pessimist says that this is actually a new process after Tesla realized they needed a lot more data to power their re-write.
 
It might be a little bit of both. Judging by the numerous rewrites and missed FSD deadlines, I think Tesla initially underestimated what it would take to do FSD. But the optimist in me is hoping that if Tesla is really collecting large amounts of good data from the fleet and using it effectively to train the NN, we will see some big improvements "soon".

I am all about results. If it works and AP shows some big improvements, I will be very happy. If I have seemed "anti-Tesla" on FSD, it's only because I have not seen a lot of results commensurate with all the FSD promises of robotaxis, sleeping in your Tesla, coast to coast demos etc...
 
I think this actually contradicts what some Tesla-hackers have been telling us with regard to Tesla's training data collection. Their personal cars haven't been uploading data, and so they concluded that the fleet as a whole doesn't upload much.

But if the above patent is already implemented, those vehicles may have just not been in situations novel enough to trigger data collection.

Anecdotally, you see folks on this forum with those mesh WiFi routers that enable them to see how much each device is uploading/downloading, and some people report GB worth of uploads. I think Tesla has a lot more training data on hand than we've been giving them credit for.

Huh? No. Basically everything in this patent was disclosed at length by verygreen 3 years ago. It's nothing new.
 
I read verygreen's writings on this subject very differently: green on Twitter

"We'll start with the bitter truth. The "shadow driver that just sits there in the computer comparing notes and sending discrepancies and interesting events to Tesla" is a myth. I used to think people just misunderstood Elon, but now I believe Tesla lies about it on purpose"

He does go on to describe "triggers" (green on Twitter), which probably correlate to the patent above. But why start his Twitter thread by saying that AP doesn't send discrepancies and interesting events to Tesla? How would a neural net decide what is a discrepancy or interesting event if Tesla didn't program in ways of identifying them?
 
But why start his twitter thread by saying that AP doesn't send discrepancies and interesting events to Tesla?
I believe the context was that people thought Autopilot was sending "everything," so instead of

"shadow driver that just sits there in the computer comparing notes and sending [all] discrepancies and interesting events to Tesla"

it's more of

"shadow driver that just sits there in the computer comparing notes and sending [only Tesla-selected] discrepancies and interesting events to Tesla"

As you point out, the findings later in the thread clearly show Tesla is collecting data. A couple of times he says some of the data is "boring", but that same data is likely quite valuable if Tesla has decided to send it over a cellular connection.
 
Do we know if this shadow driver is making comparisons and uploading situation data even when AP is off? I've read that data is uploaded when the driver wrests control away from AP, but I don't know if the shadow driver is always running in the background, even during fully manual driving.
 
Later in that twitter thread, Green pulled out some of the specific circumstances where training data is collected: green on Twitter

Looks like this one collects data when the steering wheel is jerked: "img-vid-don-steer-diseng: jerking steering wheel above some speed dependent value. 1 request at 0.1% prob." But most of the others don't require AP to be on.
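
To make that trigger description concrete, here is a rough sketch of how a "speed dependent value" plus "1 request at 0.1% prob" could combine. The threshold curve, the units, and the function names are all made up for illustration; only the 0.1% figure comes from the trigger text quoted above.

```python
import random


def steering_threshold(speed_mph: float) -> float:
    """Hypothetical threshold in deg/s: a smaller jerk counts at higher speed."""
    return max(20.0, 200.0 - 2.0 * speed_mph)


def should_request(steering_rate: float,
                   speed_mph: float,
                   rng: random.Random,
                   prob: float = 0.001) -> bool:
    """Fire only above the speed-dependent threshold, and then only with probability prob."""
    if abs(steering_rate) <= steering_threshold(speed_mph):
        return False
    # Sampling keeps upload volume low even when the condition is common fleet-wide.
    return rng.random() < prob
```

If it works anything like this, a given car could hit the condition many times and still upload almost nothing, which would fit the reports of individual cars staying quiet.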
 
Do we know if this shadow driver is making comparisons and uploading situation data even when AP is off?
Yes, here are some excerpts linked from the article:

For example, the deep learning analysis at 401 and 403 is performed to provide results as input to the trigger classifier in order to identify potential training data even when the autonomous driving system is not actively controlling the vehicle.

Additionally, things can get updated even with Autopilot off and without needing OTA of core software because the triggers don't depend on or change driving:
New and updated trigger classifiers that link into and are associated with the vehicle's existing neural network software can be pushed to vehicles much more frequently and with little to no impact to the core vehicle functionality such as driving, safety systems, and navigation, among others. For example, a trigger classifier can be trained to identify cobblestone roads, be deployed to a fleet of vehicles, and begin to gather image and related data of cobblestone roads within minutes. Using the disclosed techniques, the speed to gather relevant training data for specific use cases is vastly improved with little to no impact on ongoing vehicle operation or on the driver or passengers of the vehicle.

Interestingly, this remote control also lets Tesla have the fleet scoop up large amounts of data, including images, based on location alone without any other trigger. Getting multiple views of a situation that way could help Autopilot engineers figure out what's going on in an area before they fine-tune specific triggers:
Furthermore, the system 120 may instruct vehicles to transmit sensor data even if the above-described classifier does not assign a classifier score greater than a threshold. As an example, the system 120 may receive sensor data from a threshold number of vehicles proximate to a real-world location. In this example, the system 120 may instruct any vehicle within a threshold distance of that real-world location to transmit sensor data (e.g., images) even if their classifiers do not generate a classifier score greater than a threshold. Since the classifier may be trained on a training set with a limited number of examples (e.g., 100, 1000, as described above), depending on the angle of a particular vehicle with respect to an object, the particular vehicle's classifier may not identify the object. However, the sensor data may be useful to the generation of a robust training set for the object. For example, the object may be partially visible in images obtained by the particular vehicle and therefore may be useful in a large training set to identify the object. In this way, the outside system 120 may override the classifier and cause the particular vehicle to transmit sensor data.
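
Here is a small sketch of the server-side override that excerpt describes: once enough vehicles have reported from near a location, every vehicle within some distance is told to transmit regardless of its own classifier score. The report count, the radius, the data shapes, and the function names are my assumptions, not anything from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def vehicles_to_override(reports: Sequence[LatLon],
                         fleet: Dict[str, LatLon],
                         site: LatLon,
                         min_reports: int = 3,
                         radius_km: float = 1.0) -> List[str]:
    """Return ids of vehicles instructed to transmit even without a trigger hit."""
    nearby = sum(1 for pos in reports if haversine_km(pos, site) <= radius_km)
    if nearby < min_reports:
        return []  # not enough evidence that something interesting is at the site
    return sorted(vid for vid, pos in fleet.items()
                  if haversine_km(pos, site) <= radius_km)
```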
 
I wonder why a generalized patent like this wasn't submitted a long time ago.
It was. If you look at the Scribd document shown by Electrek, you'll see that the priority date is 9/14/2018. This means a version of this patent was filed back then (perhaps as a provisional filing at the USPTO). This is not a new patent.

And yes, at first glance it looks like this describes the triggers that hackers have discovered.
 
If that is the case, a prescient move by Tesla.
 
He does go on to describe "triggers" green on Twitter which probably correlate to the patent above. But why start his twitter thread by saying that AP doesn't send discrepancies and interesting events to Tesla?
The patent application doesn't describe detecting "discrepancies", i.e. the old story of autopilot running in "shadow mode" and reporting if the driver does something differently. It describes that a "trigger classifier" (i.e. an ML model trained to recognize interesting events) is trained a priori and then uploaded to the fleet.
 
It describes that a "trigger classifier" (i.e. an ML model trained to recognize interesting events) is trained a priori and then uploaded to the fleet.
This data collection mechanism also looks at and sends back various metadata including "vehicle control and/or operating parameters such as speed, acceleration, braking, whether autonomous driving was enabled, steering angle, etc."

Additional conditional requirements may be based on the location, the weather, road conditions, road type, vehicle type, disengagement of an autonomous driving feature, steering angle (e.g., exceeding a steering angle threshold), change in acceleration, activation of the brakes, or other appropriate feature.

In some embodiments, different use cases may utilize different trigger properties and the intermediate result of different layers of the neural network. For example, some use cases may be more efficient and produce high quality results using the intermediate result of a latter layer of the neural network. Other use cases may require an earlier intermediate result in order to identify useful examples of sensor data that meet the use case.

The second line of the excerpt says earlier data can be used with the latest data as part of the trigger, e.g., earlier data detected a red light with driver applying brakes and current data still detects a red light with driver steering and accelerating -- potentially a "right turn on red" trigger.
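
As a rough illustration of the "additional conditional requirements" excerpt, a trigger could be modeled as a classifier-score threshold plus optional conditions on the metadata. The field names, the units, and the subset of conditions shown are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Classifier score plus a few of the operating parameters the patent lists."""
    score: float
    steering_angle_deg: float
    braking: bool
    road_type: str


@dataclass
class TriggerSpec:
    """Hypothetical trigger: score threshold plus optional extra conditions."""
    score_threshold: float
    min_steering_angle_deg: Optional[float] = None
    require_braking: Optional[bool] = None
    road_type: Optional[str] = None

    def matches(self, s: Snapshot) -> bool:
        if s.score <= self.score_threshold:
            return False
        # A condition left as None is simply not checked.
        if (self.min_steering_angle_deg is not None
                and abs(s.steering_angle_deg) <= self.min_steering_angle_deg):
            return False
        if self.require_braking is not None and s.braking != self.require_braking:
            return False
        if self.road_type is not None and s.road_type != self.road_type:
            return False
        return True
```

For example, TriggerSpec(0.8, min_steering_angle_deg=15.0, road_type="city") would only fire on a high score during a sharp turn on a city street.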
 
Something in this patent concerns me. Part of it went into great detail on how to collect data on tires lying in the road. They even included examples of tires partially deflated, or tires partially occluded by snow. My problem is that tires are not an edge case. Tires may be the most common example of road debris, even though it’s relatively rare to see one on the road.

A school desk is a good example of an edge case for road debris. If one fell off the back of a truck, a human just knows that it is dangerous to hit. How is Tesla going to gather enough images of school desks lying in the road? Does anyone have a handle on how Tesla can classify the millions of objects that may be lying in the road? It seems like there will be a huge bucket of road debris that will be unidentified. I always assumed Tesla would swerve to miss all unidentified road debris. How does it help to go to such great effort to identify tires when there are millions of unclassified objects that it may run into?

I.e. if your system is already designed to avoid unidentified road debris, then it will already handle tires by default.

A school desk in the road may sound absurd, but over 1M miles of driving, the odds are very high that some strange kind of road debris will be encountered.