
Tesla patent using augmented images to train NN


diplomat33

Tesla has a newly published patent called "Systems and Methods for Training Machine Models with Augmented Data":

Excerpts from patent:

"Augmentation may provide generalization and greater robustness to the model prediction, particularly when images are clouded, occluded, or otherwise do not provide clear views of the detectable objects. These approaches may be particularly useful for object detection and in autonomous vehicles. This approach may also be beneficial for other situations in which the same camera configurations may be deployed to many devices. Since these devices may have a consistent set of sensors in a consistent orientation, the training data may be collected with a given configuration, a model may be trained with augmented data from the collected training data, and the trained model may be deployed to devices having the same configuration."

“As a further example, the images may be augmented with a “cutout” function that removes a portion of the original image. The removed portion of the image may then be replaced with other image content, such as a specified color, blur, noise, or from another image. The number, size, region, and replacement content for cutouts may be varied and may be based on the label of the image (e.g., the region of interest in the image, or a bounding box for an object).”

[Image: tesla-neural-network-training.jpg]

Tesla is patenting a clever way to train Autopilot with augmented camera images

If I am understanding this patent correctly, it seems like it is basically using "photoshop" to enhance the images to make them more useful for training the neural network. Is that right?

Any thoughts on this?
 
Lol I don't even know how this could be patented, but I guess I could say the same about some of my patents :)

Data augmentation by adding various types of noise is a very common technique in machine learning.

Yes, they are basically editing the images to cover the kinds of variation they want to make sure the network can adapt to.

All sorts of things to try with images. Change the contrast to simulate different lighting situations. Add general noise to simulate worse resolution at dusk / dawn. Change the colors of objects that shouldn't matter (but not traffic lights!).
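To make that concrete, here's a rough sketch of what those photometric tweaks could look like (my own illustration in NumPy, not anything from the patent; the parameter ranges are arbitrary guesses):

```python
import numpy as np

def photometric_augment(image, rng=None):
    """Randomly tweak contrast, brightness, noise and color of an
    HxWx3 uint8 frame. Ranges are arbitrary, purely illustrative."""
    rng = rng or np.random.default_rng()
    img = image.astype(np.float32)

    # contrast / brightness: simulate different lighting conditions
    img = img * rng.uniform(0.6, 1.4) + rng.uniform(-30, 30)

    # additive noise: simulate grainy dusk / dawn frames
    img += rng.normal(0.0, rng.uniform(0.0, 10.0), size=img.shape)

    # mild per-channel color shift (you'd want to be careful near traffic lights!)
    img += rng.uniform(-10, 10, size=3)

    return np.clip(img, 0, 255).astype(np.uint8)
```

Each call draws fresh random values, so the same source frame turns into many slightly different training examples.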
 
Lol I don't even know how this could be patented, but I guess I could say the same about some of my patents

To be fair, there are probably a lot of machine learning things that shouldn't have been granted patents. For instance, the US post office has been using machine learning to read handwritten digits since around 1990, but Mobileye snagged a patent on applying that technology to speed limit signs.

And speaking of reading handwritten digits, this image augmentation technique has been around since then as well. If you've got a training set of 100,000 digits, you can slightly stretch, skew, and rotate the images to easily grow your training set tenfold, and it adds robustness to the NN result in the same way Tesla describes above.
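In code that expansion is just a loop over small random transforms, something like this (a sketch with SciPy; the jitter ranges are arbitrary, and skew/stretch could be added the same way via ndimage.affine_transform):

```python
import numpy as np
from scipy import ndimage

def expand_digits(images, copies=10, rng=None):
    """Grow a stack of 28x28 digit images ~10x with small random
    rotations and shifts. Ranges are arbitrary guesses."""
    rng = rng or np.random.default_rng()
    augmented = []
    for img in images:
        augmented.append(img)                     # keep the original
        for _ in range(copies - 1):
            angle = rng.uniform(-15, 15)          # small rotation, degrees
            dy, dx = rng.uniform(-2, 2, size=2)   # small translation, pixels
            aug = ndimage.rotate(img, angle, reshape=False, mode="nearest")
            aug = ndimage.shift(aug, (dy, dx), mode="nearest")
            augmented.append(aug)
    return np.stack(augmented)
```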
 
If I am understanding this patent correctly, it seems like it is basically using "photoshop" to enhance the images to make them more useful for training the neural network. Is that right?

The way I read this patent, they are introducing image artifacts into their training dataset so that the network also trains on random occlusion, night, bad weather, a fly on the windscreen, etc., using multiple passes over the same dataset.

This should theoretically generate tonnes more training data quickly and we all know the benefit of that!
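For what it's worth, the occlusion part is only a few lines; this is my own rough sketch (sizes and fill choices are assumptions, not values from the patent):

```python
import numpy as np

def random_occlusion(image, rng, max_frac=0.25):
    """Paste a random patch of noise or flat color over part of an
    HxWx3 uint8 frame -- a stand-in for a fly, smear, or blocked view."""
    h, w = image.shape[:2]
    ph = int(rng.uniform(0.05, max_frac) * h)
    pw = int(rng.uniform(0.05, max_frac) * w)
    y = int(rng.integers(0, h - ph))
    x = int(rng.integers(0, w - pw))
    out = image.copy()
    if rng.random() < 0.5:
        # fill the cut-out region with random noise
        out[y:y + ph, x:x + pw] = rng.integers(0, 256, size=(ph, pw, 3), dtype=np.uint8)
    else:
        # or with a single flat random color
        out[y:y + ph, x:x + pw] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return out
```

Because the patch is drawn fresh each time, every pass over the same labeled frames produces differently occluded versions of them.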
 
The way I read this patent, they are introducing image artifacts into their training dataset so that the network also trains on random occlusion, night, bad weather, a fly on the windscreen, etc., using multiple passes over the same dataset.

This should theoretically generate tonnes more training data quickly and we all know the benefit of that!

Yes. That makes sense. Thanks.
 
Yes. That makes sense. Thanks.

One could also imagine Tesla using such a method to "merge" the content from all 3 front-facing cameras (one simple example).
For reference, here are the FOVs of all 3 front-facing windshield cameras:
  • Main Forward Camera: Max distance 150m with 50° field of view

  • Narrow Forward Camera: Max distance 250m with 35° field of view

  • Wide Forward Camera: Max distance 60m with 150° field of view
We know that the Wide camera is used for rain/snow detection for the auto wipers. Apparently the other two can "focus" through the blur caused by raindrops.
I could imagine Tesla using the patent above to selectively choose data from any of these 3 cameras to provide the clearest scene possible for the perception engine.
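Purely as a toy illustration of that last idea (my own sketch, nothing from the patent, and it hand-waves away the rectification/alignment of the overlapping FOVs), you could score the shared region from each camera with a crude sharpness metric and keep the clearest one:

```python
import numpy as np

def sharpness(gray):
    """Variance of a simple Laplacian response over a 2D array:
    a common, crude score for blur/occlusion (higher = more detail)."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def pick_clearest(crops):
    """Given the same overlapping region cropped from each front camera
    (hypothetical, already-aligned grayscale arrays), return the crop
    with the most detail -- e.g. the one not smeared by a raindrop."""
    return max(crops, key=sharpness)
```

How Tesla actually detects glare or occlusion is anyone's guess; this is just one simple way to compare overlapping views.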

Tesla/Elon has stated on multiple occasions that the AP "re-write" will be able to stitch content from all outward-facing cameras to view the surroundings in 3D (creating a 3D point cloud from video, probably with a color attached to each 3D point).
Just cropping the Narrow FOV into the Main FOV, for example, is probably not the most robust way to go.
If there's a bug in front of the Narrow FOV camera, maybe it's best to ignore it, or maybe just use the "good" data.
Maybe the main camera is affected by glare from the oncoming sun; how is that detected, and how can you selectively choose data from the overlapping images to improve the dataset?

As J1mbo notes, the network could be trained by introducing artefacts, although I would also think comparing real data from overlapping FOVs would help automatically identify occlusions/loss of data.
The "augmentation" could be sourced from redundant sensor data from overlapping FOVs, or from local color/blur/noise filters as described in the patent.
Technologies like this would seem to be a key foundation for merging/redundancy of those 3 cameras into one, and could also be used for any other overlapping regions between the 8 cameras, or in the non-overlapping regions using the described filters.