The biggest breakthrough was in 2012, when AI researcher Geoffrey Hinton and his two graduate students, Ilya Sutskever and Alex Krizhevsky, showed a new way to attack the problem: they entered a deep convolutional neural network that could detect pictures of everyday objects in the ImageNet Challenge. Their neural net embarrassed the competition, reducing the error rate on image recognition to 16 percent from the 25 percent that other methods produced.
“I believe that was the first time that a deep learning, neural net-based approach beat the pants off more standard approach,” says Ferguson, the former Google engineer. “And since then, we’ve never looked back.”
Krizhevsky takes a more circumspect approach to his role in the 2012 ImageNet Challenge. “I guess we were at the right place at the right time,” he tells me. He attributes their success to his hobby of programming GPUs to run code for the team’s neural net, enabling them to run experiments that would normally take months in just a matter of days. And Sutskever made the connection to apply the technique to the ImageNet competition, he says.
Hinton and his team’s success “triggered a snowball effect,” Vanhoucke says. “A lot of innovation came from that.” An immediate result was Google acquiring Hinton’s company DNNresearch, which included Sutskever and Krizhevsky, for an undisclosed sum. Hinton stayed in Toronto, and Sutskever and Krizhevsky moved to Mountain View. Krizhevsky joined Vanhoucke’s team at Google Brain. “And that’s when we started thinking about applying those things to Waymo,” Vanhoucke says.
Another Google researcher, Anelia Angelova, was the first to reach out to Krizhevsky about applying their work to Google’s car project. Neither officially worked on that team, but the opportunity was too good to ignore. They created an algorithm that could teach a computer what a pedestrian looked like by analyzing thousands of street photos and identifying the visual patterns that define a pedestrian. The method was so effective that Google began applying the technique to other parts of the project, including prediction and planning.
Problems emerged almost immediately. The new system was making too many errors, mislabeling cars, traffic signals, and pedestrians. It also wasn’t fast enough to run in real time. So Vanhoucke and his team combed through the images and discovered that most of the errors were mistakes made by human labelers. Google had brought them in to provide a baseline, or “ground truth,” against which to measure the algorithm’s success rate, and they had instead added mistakes. The problem with autonomous cars, it turned out, was still people.
Even after correcting for human error, Google struggled to make the system fast enough to recognize images in real time. Working closely with Google’s self-driving car team, the AI researchers decided to combine more traditional machine learning approaches, like decision trees and cascade classifiers, with the neural networks to achieve “the best of both worlds,” Vanhoucke recalls.
“It was a very, very exciting time for us to actually show that those techniques that have been used to find cat pictures and interesting things on the web,” he says. “Now, they were actually being used for improving safety in driverless cars.”