An interesting finding from Baidu.
As we know, with 1 million training images (1,000 per semantic class), you can get superhuman accuracy on ImageNet.
But with 25,000 training images (25 per semantic class), Baidu found accuracy no better than random guessing (roughly 0% on a 1,000-class problem):
“For small training sets—less than roughly 25 images per class—these error metrics are roughly equal to the model random guessing...”
There are some training datasets that are too small to make any progress on solving a problem. Once you cross a certain threshold of dataset size, you suddenly begin to make progress. This is something to keep in mind with neural networks. If a problem seems intractable, one possible cause is that the dataset is just too small.
In some cases, increasing a dataset 40x can be the difference between “it doesn’t work at all” and “it’s superhuman”.
Somewhere between 25,000 and 1 million training images, convolutional neural networks go from “it works a bit” to “it’s superhuman”. Sometimes increasing a dataset by less than 40x is enough to solve a problem. You can’t automatically conclude in every instance that “it only works a bit, and no additional amount of training data will make it work”. Sometimes that will be true, and sometimes it won’t.
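To make the scaling idea concrete, here’s a toy sweep you can run yourself. It uses scikit-learn’s tiny digits dataset as a stand-in for ImageNet (so the absolute numbers look nothing like Baidu’s) and a small MLP as a placeholder model; the only point is the qualitative trend of accuracy versus examples per class, measured against the random-guessing baseline of 1/number-of-classes.

```python
# Toy data-scaling sweep: train the same model on k examples per class and
# compare accuracy against the random-guess baseline (1 / number of classes).
# The dataset and model are stand-ins; only the trend is meant to be illustrative.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

num_classes = len(np.unique(y))
random_guess = 1.0 / num_classes  # accuracy of guessing blindly

for k in [1, 5, 25, 100]:  # training examples per class
    # Subsample k training examples for each class.
    idx = np.concatenate(
        [np.where(y_train == c)[0][:k] for c in range(num_classes)])
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                          random_state=0)
    model.fit(X_train[idx], y_train[idx])
    acc = model.score(X_test, y_test)
    print(f"{k:>4} per class: accuracy = {acc:.3f} "
          f"(random guessing: {random_guess:.3f})")
```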
In Waymo’s imitation learning paper, ChauffeurNet apparently performed perfectly on some of the types of situation it was trained and tested on, or at least so well that Waymo didn’t bother to publish the test results. In one of the situation types Waymo did report (random perturbations), ChauffeurNet had a 100% success rate. The other two are difficult to assess, since we don’t have a human benchmark, and Waymo says a human driver may not have been able to perform better. Put people in driving simulators, yo!
Waymo concludes the paper by saying/implying ChauffeurNet is not yet fully competitive with Waymo’s current hybrid machine learning/explicit reasoning system, but they don’t share what led them to that conclusion. Did they run tests? Was it their qualitative assessment? I wish they had expanded on this more, to let us know what the weaknesses of ChauffeurNet are relative to Waymo’s current system. Perhaps they didn’t want to reveal any information about Waymo’s proprietary technology.
The most insight we get comes from the Failure Modes section on pages 16-17:
“At our ground resolution of 20 cm/pixel, the agent currently sees 64 m in front and 40 m on the sides and this limits the model’s ability to perform merges on T-junctions and turns from a high-speed road. Specific situations like U-turns and cul-de-sacs are also not currently handled, and will require sampling enough training data. The model occasionally gets stuck in some low speed nudging situations. It sometimes outputs turn geometries that make the specific turn infeasible (e.g. large turning radius). We also see some cases where the model gets over aggressive in novel and rare situations for example by trying to pass a slow moving vehicle. We believe that adequate simulated exploration may be needed for highly interactive or rare situations.”
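To put the field-of-view figures in that quote into perspective, here is some back-of-envelope arithmetic. It only converts the quoted numbers (20 cm/pixel, 64 m ahead, 40 m to each side); the actual raster ChauffeurNet renders may be sized differently (for example, it presumably also sees some distance behind the agent).

```python
# Back-of-envelope conversion of the quoted field of view into pixel dimensions
# for the rendered top-down input. Figures are from the quote above; treat the
# result as illustrative arithmetic, not the paper's exact input size.
GROUND_RESOLUTION_M = 0.20   # metres per pixel (20 cm/pixel)
RANGE_AHEAD_M = 64.0         # visible distance in front of the agent
RANGE_SIDE_M = 40.0          # visible distance to each side

pixels_ahead = RANGE_AHEAD_M / GROUND_RESOLUTION_M    # 320 px forward
pixels_wide = 2 * RANGE_SIDE_M / GROUND_RESOLUTION_M  # 400 px across both sides

print(f"{pixels_ahead:.0f} px ahead, {pixels_wide:.0f} px wide")
# At highway speeds (~30 m/s), 64 m of look-ahead is only ~2 seconds, which is
# why the quote flags merges and turns from high-speed roads as hard.
```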
It’s important to note that Waymo attributes some of these failure modes to insufficient training data, and suggests adding more real or synthetic training data as a potential solution. The relevant measure of training data here is probably number of training examples per type of situation, but with driving there may be a long tail of an indefinite number of types of situations. One advantage of collecting 1 billion, or 10 billion, or 25 billion miles of mid-level representations data from real world driving is that you would be able to train a neural network on all kinds of rare situations that human engineers might never think to simulate. Including thousands of crashes and near-crashes.
So, Waymo believes that one way to fix some of the failure modes and improve the system is more training data. This highlights the importance of training data for machine learning approaches to path planning. Don’t take my word for it — just read what Waymo wrote.
Andrew Ng’s advice to deep learning engineers is: “no matter where you’re stuck, with modern deep learning tools we have a clear path for making progress... In particular, no matter what your problem is — overfitting or underfitting, really high bias or high variance or maybe both — you always have at least one action you can take, which is: bigger model or more data. So, in the deep learning era... it feels like we more often have a way out of whatever problem we’re stuck in.”
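Here is a rough sketch of that recipe in code. The function name, thresholds, and error numbers are my own illustration, not anything from Ng; the logic is just the standard bias/variance diagnosis he is describing: if training error sits well above your target, try a bigger model; if dev error sits well above training error, get more data.

```python
# A minimal sketch of the "bigger model or more data" recipe, assuming you
# already have training-set and dev-set error numbers. The threshold and the
# example error figures below are made up for illustration.
def suggest_next_step(train_error, dev_error, target_error, gap_tolerance=0.02):
    """Return a coarse suggestion based on bias (train vs. target)
    and variance (dev vs. train)."""
    suggestions = []
    if train_error > target_error + gap_tolerance:
        # High bias: the model cannot even fit the training set well.
        suggestions.append("bigger model (or train longer)")
    if dev_error > train_error + gap_tolerance:
        # High variance: the model fits training data but not unseen data.
        suggestions.append("more training data (or regularization)")
    return suggestions or ["roughly at target; look elsewhere"]

print(suggest_next_step(train_error=0.15, dev_error=0.17, target_error=0.05))
# -> ['bigger model (or train longer)']
print(suggest_next_step(train_error=0.04, dev_error=0.12, target_error=0.05))
# -> ['more training data (or regularization)']
```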
This is something I keep in mind when thinking about neural networks.