He's at it again, this time on the ARK podcast: On the Road to Full Autonomy With Elon Musk — FYI Podcast
"I think we will be feature complete FSD this year, meaning the car will be able to find you in a parking lot, pick you up and take you all the way to your destination this year. I would say I'm certain of that. That is not a question mark."
Getting back to the topic of the OP, I think one of the most fascinating pieces of information to come out about autonomy lately is that Waymo is using imitation learning.
Imitation learning, for those who don't know, means a neural network learns to map certain kinds of actions to certain kinds of environment states based on observing what humans do. By training on lots and lots of examples of human action, the neural network learns, "If you see this, do that." Like, "If you see a stop sign, stop." Or, "If you see a parked car blocking your way, nudge around it like so."
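To make this concrete, here is a minimal behavioural-cloning sketch in PyTorch. It is purely illustrative (the architecture, state representation, and loss are my own assumptions, not anything Waymo or Tesla has disclosed): a small policy network is trained with ordinary supervised learning to map an observed state to the action a human took in that state.

```python
# Minimal behavioural-cloning sketch (illustrative only). A policy network
# learns "if you see this (state), do that (action)" from human examples.
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),  # e.g. [steering, throttle, brake]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def train_step(policy, optimizer, states, human_actions):
    """One supervised step on a batch of human state-action pairs."""
    loss = nn.functional.mse_loss(policy(states), human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point is that there is no hand-written driving logic anywhere: the mapping from situations to actions is learned entirely from human demonstrations.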
Drago Anguelov, the lead of Waymo’s research team, recently gave a talk at MIT where he went deep into this topic.
One of the most interesting takeaways from the presentation:
Anguelov says that for the long tail of the human driving behaviour distribution (i.e. rare situations) there aren't enough training examples in Waymo's dataset to do imitation learning. I mean, imagine a situation that arises every 30 million miles on average. Waymo has done under 15 million miles, so it might not have encountered that situation even once. Or take a situation that occurs every 1 million miles: Waymo might have fewer than 15 examples, which could be far too few. (I don't know what the threshold is for imitation learning, but for image classification the rule of thumb is that you want at least 1,000 examples.) There are lots of rare situations Waymo has never seen, or has seen too few times.
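The back-of-envelope arithmetic, using the illustrative figures above:

```python
# Rough estimate of how many examples of a rare situation a fleet collects.
# All numbers are the illustrative ones from the text, not real statistics.
fleet_miles = 15_000_000   # Waymo's rough cumulative total at the time
min_examples = 1_000       # image-classification rule of thumb

for miles_per_occurrence in (1_000_000, 30_000_000):
    expected = fleet_miles / miles_per_occurrence
    verdict = "enough" if expected >= min_examples else "far too few"
    print(f"1-in-{miles_per_occurrence:,}-mile event: "
          f"~{expected:.1f} expected examples ({verdict})")
```

Fifteen examples of a one-in-a-million-mile event, and quite possibly zero of a one-in-30-million-mile event.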
So, while Anguelov would prefer to do imitation learning across the whole human driving behaviour distribution — including the long tail — Waymo just doesn’t have the data to do it. Well, who does have the data? Tesla... And is Tesla doing imitation learning? Amir Efrati at The Information has reported that it is, citing at least one source who has worked in Tesla’s Autopilot division:
“Tesla’s cars collect so much camera and other sensor data as they drive around, even when Autopilot isn’t turned on, that the Autopilot team can examine what traditional human driving looks like in various driving scenarios and mimic it, said the person familiar with the system. It uses this information as an additional factor to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object. Such an approach has its limits, of course: behavior cloning, as the method is sometimes called…
But Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations. “You don’t need anything else” to teach the system how to drive autonomously, said a person who has been involved with the team. They envision a future in which humans won’t need to write code to tell the car what to do when it encounters a particular scenario; it will know what to do on its own.”
Tesla hasn’t confirmed this, but Elon made some comments on the ARK Invest podcast that could be interpreted as describing imitation learning.
At 13:30:
"The advantage that we have that I think is very difficult to overcome is that we have just a vast amount of data on interventions. So, effectively, the customers are training the system on how to drive. And there are millions of corner cases that are so obscure and weird you wouldn't believe it..."
At 14:25:
“Every time somebody intervenes — takes over from Autopilot — it saves that information and uploads it to our system. ... And we’re really starting to get quite good at not even requiring human labelling. Basically the person, say, drives the intersection and is thereby training Autopilot what to do.”
These comments are ambiguous and there are multiple possible interpretations. But, to me, imitation learning fits most closely with what Elon said.
Important to clarify: Tesla wouldn't need to upload any sensor data for this to work. It would only need to upload the perception neural network's mid-level representation — the judgments (or predictions, to use the technical term) it makes about what it sees — paired with data about what the human driver did with the steering wheel and pedals. These state-action pairs don't need human annotation.
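As a sketch of what one of those state-action records might contain (the field names are my guesses for illustration, not Tesla's actual telemetry schema):

```python
# Hypothetical state-action record. Note there is no raw camera data here,
# only the perception network's mid-level predictions about the scene,
# paired with what the human driver did at that moment.
from dataclasses import dataclass

@dataclass
class StateActionPair:
    # State: the perception net's judgments about what it sees
    detected_objects: list   # e.g. object classes, positions, velocities
    lane_geometry: list      # e.g. lane-line curve parameters
    # Action: the human driver's inputs
    steering_angle: float    # radians
    accelerator: float       # pedal position, 0 to 1
    brake: float             # pedal position, 0 to 1
    timestamp: float         # seconds
```

A record like this is tiny compared to raw video, and it needs no human annotation: the driver's own inputs are the labels.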
Okay, let's suppose Tesla is using imitation learning on the whole distribution of human driving behaviour, including the long tail of millions of obscure and weird situations. Can this approach really train a neural network to execute complex tasks?
Yes! Using imitation learning alone on millions of StarCraft games played by humans, AlphaStar achieved performance that DeepMind estimated at the level of a Gold/Platinum league player, roughly the middle of the ranked ladder. In other words, imitation learning by itself got AlphaStar to roughly median human performance. And StarCraft is a complex task that many consider a significant AI challenge.
Imitation learning was also used by DeepMind to bootstrap reinforcement learning (RL) for AlphaStar. This allowed it to go on and beat one of the world's top pro players. Could the same be done for autonomous driving? Maybe, but there is an important difference between StarCraft and driving. To quote Oriol Vinyals, one of the creators of AlphaStar:
"Driving a car is harder. The lack of (perfect) simulators doesn’t allow training for as much time as would be needed for Deep RL to really shine."
Why can't we do RL in a driving simulator? According to two researchers at Waymo:
"...doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL."
Drago Anguelov also emphasized "smart agents" for robust simulation in his talk.
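As a rough illustration of the smart-agents idea (my own sketch; the scene API below is invented, not Waymo's): the imitation-learned policy drives every non-ego vehicle in the simulator, so an ego vehicle trained with RL faces naturally-behaving traffic.

```python
# Hypothetical "smart agents" rollout: the cloned human-driving policy
# controls all other vehicles, so the ego car's RL training happens
# against realistic traffic. The scene object is invented for illustration.
def rollout(ego_policy, cloned_human_policy, scene):
    for _ in range(scene.horizon):
        ego_action = ego_policy(scene.observe("ego"))
        other_actions = {
            agent_id: cloned_human_policy(scene.observe(agent_id))
            for agent_id in scene.other_agents
        }
        scene.advance(ego_action, other_actions)
```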
So, to summarize:
- Waymo is doing imitation learning, but not for the long tail because it lacks training data
- Tesla might be doing imitation learning, and has training data for the long tail
- AlphaStar reached roughly median human performance on StarCraft with imitation learning
- Imitation learning bootstrapped reinforcement learning for AlphaStar
- The same could potentially be done for autonomous driving (a sketch of the two-phase pattern follows this list)
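Here is a rough sketch of that two-phase pattern (simplified to the bone; this is not DeepMind's actual training code, and the environment API is hypothetical): behavioural cloning first, then REINFORCE-style fine-tuning from the cloned starting point.

```python
# Two-phase training: imitation learning to bootstrap, then RL to improve.
# Purely illustrative; `env` is a hypothetical simulator with
# reset() -> state and step(action) -> (state, reward, done).
import torch
import torch.nn as nn

def pretrain_with_imitation(policy, optimizer, human_dataset, epochs=3):
    """Phase 1: behavioural cloning on human state-action pairs."""
    for _ in range(epochs):
        for states, actions in human_dataset:
            loss = nn.functional.mse_loss(policy(states), actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def finetune_with_rl(policy, optimizer, env, episodes=1000):
    """Phase 2: plain REINFORCE, starting from the cloned policy."""
    for _ in range(episodes):
        state, log_probs, rewards, done = env.reset(), [], [], False
        while not done:
            dist = torch.distributions.Normal(policy(state), 0.1)
            action = dist.sample()
            log_probs.append(dist.log_prob(action).sum())
            state, reward, done = env.step(action)
            rewards.append(reward)
        # Reinforce the episode's actions in proportion to its total return
        loss = -torch.stack(log_probs).sum() * sum(rewards)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The reason for phase 1 is that RL from scratch explores terribly; starting from cloned human behaviour gives it a sensible prior.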
This also implies that, if the approach works, progress will happen at the speed of a machine learning project, not at the speed of a classical robotics project. For the past 15 years, classical robotics software has played a major role in almost all autonomous vehicles (with Wayve and Nvidia's BB8 being two exceptions). Classical robotics software makes progress slowly. And it might just never work; Elon, for one, seems convinced it won't (15:05):
"...a series of if-then-else statements and lidar is not going to solve it. Forget it. Game over."
Just because humans know how to do things doesn't mean we know how we do them. That's a problem for psychology, neuroscience, cognitive science, and artificial intelligence research, not something a software engineer can understand through introspection or casual observation and then program into a robot. Some tasks we can program, and some we can't. Classical robotics doesn't have a lot of complex tasks under its belt.
If classical robotics software is our best approach to autonomous driving, it might be hopeless. Even if it isn't, progress will probably be about as slow over the next 15 years as the last 15 years. This would make linear extrapolation of progress rational. If you extrapolate progress linearly, then talking about full autonomy in 2020 doesn't make sense.
If the approach isn't classical robotics but machine learning, then progress could be super-linear, and linear extrapolation wouldn't be rational. I think this is where Elon is coming from. If you put Tesla's AI on the timeline of AlphaStar, full autonomy in 2020 makes a lot more sense. AlphaStar took less than 3 years of development. The final training run before it beat MaNa, one of the world's top players, took 17 days, and the imitation learning portion took just 3 days. An agent can go from zero to human level at StarCraft over a long weekend. So, saying full autonomy definitely won't happen next year because of what's on the road today doesn't make sense, if you think imitation learning is a viable approach.