When The Information recently published two articles on Tesla and autonomy, the strangest thing to come out of that reporting was this bit under the subheading “Behavior Cloning”:
“Tesla’s cars collect so much camera and other sensor data as they drive around, even when Autopilot isn’t turned on, that the Autopilot team can examine what traditional human driving looks like in various driving scenarios and mimic it, said the person familiar with the system. It uses this information as an additional factor to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object. Such an approach has its limits, of course: behavior cloning, as the method is sometimes called... But Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations. “You don’t need anything else” to teach the system how to drive autonomously, said a person who has been involved with the team. They envision a future in which humans won’t need to write code to tell the car what to do when it encounters a particular scenario; it will know what to do on its own.”
As I understand it, when software engineers who work on self-driving cars use the term “behaviour cloning”, they mean the same thing as “end-to-end learning”: the entire system is one big neural network that takes sensor data as its input and outputs steering, acceleration, and braking.
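To make that concrete, here’s a minimal sketch of what “one big neural network” means in code. It’s written in PyTorch with invented layer sizes; it isn’t anyone’s actual architecture, just the shape of the approach:

```python
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """One big network: raw sensor data in, control commands out.
    All layer sizes here are invented for illustration."""
    def __init__(self):
        super().__init__()
        # A convolutional stack reads the raw camera image directly;
        # there is no separate, human-labelled perception stage.
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # A small head maps the image features straight to actions.
        self.control = nn.Sequential(
            nn.Linear(64, 50), nn.ReLU(),
            nn.Linear(50, 3),  # steering angle, accelerator, brake
        )

    def forward(self, camera_image):  # camera_image: (batch, 3, H, W)
        return self.control(self.features(camera_image))
```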
What the article doesn’t make clear is the difference between end-to-end learning and the use of neural networks in general. If you use neural networks, but not end-to-end learning, humans still don’t need to write code for specific scenarios.
Amnon Shashua has a really good talk on end-to-end learning vs. the “semantic abstraction” approach to using neural networks:
As Amnon says, if Tesla were using end-to-end learning, it would not need to label images. The only “labelling” comes from the human driver’s actions: the steering angle, the accelerator pushes, and the brake pedal pushes. The sensor data is the input, and one big neural network tries to learn how to map that sensor data onto the human driver’s actions. Since we know Tesla is labelling images, we know Tesla can’t be using end-to-end learning. And since “end-to-end learning” and “behaviour cloning” are synonymous, we know Tesla can’t be using behaviour cloning.
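In code, that “labelling” is nothing more than logging what the driver did at each moment and regressing onto it. A sketch of the supervised training step, assuming a hypothetical dataset of (camera frame, driver action) pairs:

```python
import torch.nn.functional as F

def training_step(model, optimizer, camera_frames, driver_actions):
    """One supervised step of behaviour cloning. driver_actions is a
    (batch, 3) tensor of logged steering / accelerator / brake values;
    the human's driving is the label, not any human-annotated image."""
    predicted_actions = model(camera_frames)
    loss = F.mse_loss(predicted_actions, driver_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```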
So, what did Amir Efrati at The Information hear from his sources that led him to report that Tesla is using “behaviour cloning”? Amir writes that how humans drive is used “to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object.” This makes me think that Tesla may be working on a neural network for path planning (or motion planning) and/or control. Perhaps that network is being trained not on raw sensor data, but on the metadata output by the perception neural networks. The Tesla drivers’ behaviour (steering, acceleration, braking) “labels” the metadata in the same way that, in end-to-end learning, the human driver’s behaviour “labels” the sensor data.
This approach would avoid the combinatorial explosion problem of end-to-end learning (described by Amnon in the video above) by decomposing perception and action: perception tasks and action tasks would be handled by separate neural networks, each trained independently.
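Here’s a rough sketch of what that decomposition might look like. The metadata format (a fixed-size vector of detected objects, lane geometry, ego speed) is my invention, standing in for whatever Tesla’s perception networks actually output:

```python
import torch.nn as nn

class PlannerNet(nn.Module):
    """Hypothetical path-planning/control network. Its input is not
    pixels but the metadata produced by separately trained perception
    networks; the 128-dimensional size is invented for illustration."""
    def __init__(self, metadata_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(metadata_dim, 64), nn.ReLU(),
            nn.Linear(64, 3),  # steering, accelerator, brake
        )

    def forward(self, metadata):
        # metadata: e.g. detected-object positions and velocities,
        # lane geometry, ego speed, flattened into one vector.
        return self.net(metadata)

# The two stages are trained independently and only composed at
# inference time:
#   actions = planner_net(perception_net(camera_image))
```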
Using human drivers’ actions as the supervisory signal (i.e. the training signal) for path planning and/or control actually makes sense to me, whereas end-to-end learning does not. What are the alternatives?
a) Use a hand-coded algorithm. While this may be effective, we have many examples of fluid neural networks outperforming brittle hand-crafted rules.
b) Use simulation. Until recently, I didn’t appreciate how much trouble we have simulating the everyday physics of the real world. From OpenAI:
“Learning methods for robotic manipulation face a dilemma. Simulated robots can easily provide enough data to train complex policies, but most manipulation problems can’t be modeled accurately enough for those policies to transfer to real robots. Even modeling what happens when two objects touch — the most basic problem in manipulation — is an active area of research with no widely accepted solution. Training directly on physical robots allows the policy to learn from real-world physics, but today’s algorithms would require years of experience to solve a problem like object reorientation.”
While simulation may be a part of the development and training process for path planning and/or control, it probably can’t be the whole process.
c) Use reinforcement learning. One difficulty with reinforcement learning is defining the reward function: how do you decide when “points” are added or deducted for the neural network’s performance? (A naive hand-written reward function is sketched below.) Another, even greater difficulty: where do you do the training? In simulation? On test tracks? With engineers on public roads?
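To see why the reward function is hard, consider what a naive hand-written one might look like. Every weight below is an arbitrary guess, and that arbitrariness is exactly the problem:

```python
from dataclasses import dataclass

@dataclass
class DrivingState:
    # Hypothetical summary of one moment of driving.
    distance_travelled: float
    lane_deviation_metres: float
    collision: bool
    jerk: float

def driving_reward(s: DrivingState) -> float:
    """A naive hand-written reward function. Every weight here is an
    arbitrary guess; tuning them so the learned policy is both safe
    and human-like is itself an open problem."""
    reward = 1.0 * s.distance_travelled        # progress is good...
    reward -= 10.0 * s.lane_deviation_metres   # ...but how bad is drifting?
    reward -= 0.5 * abs(s.jerk)                # how much is comfort worth?
    if s.collision:
        reward -= 100.0                        # is a crash only 10x a lane drift?
    return reward
```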
If Tesla were to get the training signal from the behaviour of Tesla drivers, and use human review to remove examples of bad driving, then it would avoid both the brittleness of hand-coded algorithms and the lack of verisimilitude of simulations. I think (but I’m not sure) it would then be possible to use reinforcement learning or supervised learning to improve on this: you put the path planning and/or control neural network into cars running Autopilot and other advanced features, and then you use disengagements, aborts, crashes, and bug reports to identify failures. These failures then become part of the training signal.
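Sketched as a hypothetical fleet-learning loop, with every function name invented for illustration:

```python
from typing import List

def passes_human_review(log: dict) -> bool:
    # Stub standing in for human reviewers filtering out bad driving.
    return log.get("reviewed_ok", False)

def train_supervised(model, examples: List[dict]) -> None:
    # Stub standing in for an ordinary supervised-learning run
    # (e.g. the training step sketched earlier in this post).
    pass

def improvement_cycle(model, fleet_logs: List[dict],
                      failure_reports: List[dict]):
    # 1. The training signal comes from how Tesla drivers actually
    #    drive, with human review removing examples of bad driving.
    demos = [log for log in fleet_logs if passes_human_review(log)]
    # 2. Disengagements, aborts, crashes, and bug reports identify
    #    failures; those cases become part of the training signal too.
    hard_cases = [report["driving_example"] for report in failure_reports]
    train_supervised(model, demos + hard_cases)
    return model
```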