When The Information recently published two articles on Tesla and autonomy, the strangest thing to come out of that reporting was this bit under the subheading “Behavior Cloning”:
“Tesla’s cars collect so much camera and other sensor data as they drive around, even when Autopilot isn’t turned on, that the Autopilot team can examine what traditional human driving looks like in various driving scenarios and mimic it, said the person familiar with the system. It uses this information as an additional factor to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object. Such an approach has its limits, of course: behavior cloning, as the method is sometimes called... But Tesla’s engineers believe that by putting enough data from good human driving through a neural network, that network can learn how to directly predict the correct steering, braking and acceleration in most situations. “You don’t need anything else” to teach the system how to drive autonomously, said a person who has been involved with the team. They envision a future in which humans won’t need to write code to tell the car what to do when it encounters a particular scenario; it will know what to do on its own.”
As I understand it, when software engineers who work on self-driving cars use the term “behaviour cloning”, they mean the same thing as “end-to-end learning”: the entire system is one big neural network that takes sensor data as its input and outputs steering, acceleration, and braking.
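To make that concrete, here’s a minimal sketch of what “one big neural network” means in code. It’s written in PyTorch with invented layer sizes; it isn’t anyone’s actual architecture, just the shape of the approach:

```python
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """One big network: raw sensor data in, control commands out.
    All layer sizes here are invented for illustration."""
    def __init__(self):
        super().__init__()
        # A convolutional stack reads the raw camera image directly;
        # there is no separate, human-labelled perception stage.
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # A small head maps the image features straight to actions.
        self.control = nn.Sequential(
            nn.Linear(64, 50), nn.ReLU(),
            nn.Linear(50, 3),  # steering angle, accelerator, brake
        )

    def forward(self, camera_image):  # camera_image: (batch, 3, H, W)
        return self.control(self.features(camera_image))
```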
What the article doesn’t make clear is the difference between end-to-end learning and the use of neural networks in general. If you use neural networks, but not end-to-end learning, humans still don’t need to write code for specific scenarios.
Amnon Shashua has a really good talk on end-to-end learning vs. the “semantic abstraction” approach to using neural networks:
As Amnon says, if Tesla were using end-to-end learning, it would not need to label images. The only “labelling” comes from the human driver’s actions: the steering angle, the accelerator pushes, and the brake pedal pushes. The sensor data is the input, and one big neural network tries to learn how to map that sensor data onto the human driver’s actions. Since we know Tesla is labelling images, we know Tesla can’t be using end-to-end learning. And since “end-to-end learning” and “behaviour cloning” are synonymous, we know Tesla can’t be using behaviour cloning.
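In code, that “labelling” is nothing more than logging what the driver did at each moment and regressing onto it. A sketch of the supervised training step, assuming a hypothetical dataset of (camera frame, driver action) pairs:

```python
import torch.nn.functional as F

def training_step(model, optimizer, camera_frames, driver_actions):
    """One supervised step of behaviour cloning. driver_actions is a
    (batch, 3) tensor of logged steering / accelerator / brake values;
    the human's driving is the label, not any human-annotated image."""
    predicted_actions = model(camera_frames)
    loss = F.mse_loss(predicted_actions, driver_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```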
So, what did Amir Efrati at The Information hear from his sources that led him to report that Tesla is using “behaviour cloning”? Amir writes that how humans drive is used “to plan how a car will drive in specific situations—for example, how to steer a curve on a road or avoid an object.” This makes me think that Tesla may be working on a neural network for path planning (or motion planning) and/or control. Perhaps that network is being trained not on raw sensor data, but on the metadata output by the perception neural networks. The Tesla drivers’ behaviour (steering, acceleration, braking) “labels” the metadata in the same way that, in end-to-end learning, the human driver’s behaviour “labels” the sensor data.
This approach would avoid the combinatorial explosion problem of end-to-end learning (described by Amnon in the video above) by decomposing perception and action: perception tasks and action tasks would be handled by separate neural networks, each trained independently.
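Here’s a rough sketch of what that decomposition might look like. The metadata format (a fixed-size vector of detected objects, lane geometry, ego speed) is my invention, standing in for whatever Tesla’s perception networks actually output:

```python
import torch.nn as nn

class PlannerNet(nn.Module):
    """Hypothetical path-planning/control network. Its input is not
    pixels but the metadata produced by separately trained perception
    networks; the 128-dimensional size is invented for illustration."""
    def __init__(self, metadata_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(metadata_dim, 64), nn.ReLU(),
            nn.Linear(64, 3),  # steering, accelerator, brake
        )

    def forward(self, metadata):
        # metadata: e.g. detected-object positions and velocities,
        # lane geometry, ego speed, flattened into one vector.
        return self.net(metadata)

# The two stages are trained independently and only composed at
# inference time:
#   actions = planner_net(perception_net(camera_image))
```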
Using human drivers’ actions as the supervisory signal (i.e. the training signal) for path planning and/or control actually makes sense to me, whereas end-to-end learning does not. What are the alternatives?
a) Use a hand-coded algorithm. While this may be effective, we have many examples of fluid neural networks outperforming brittle hand-crafted rules.
b) Use simulation. Until recently, I didn’t appreciate how much trouble we have simulating the everyday physics of the real world. From OpenAI:
“Learning methods for robotic manipulation face a dilemma. Simulated robots can easily provide enough data to train complex policies, but most manipulation problems can’t be modeled accurately enough for those policies to transfer to real robots. Even modeling what happens when two objects touch — the most basic problem in manipulation — is an active area of research with no widely accepted solution. Training directly on physical robots allows the policy to learn from real-world physics, but today’s algorithms would require years of experience to solve a problem like object reorientation.”
While simulation may be a part of the development and training process for path planning and/or control, it probably can’t be the whole process.
c) Use reinforcement learning. One difficulty with reinforcement learning is defining the reward function: how do you decide when “points” are added or deducted for the neural network’s performance? (A naive hand-written reward function is sketched below.) Another, even greater difficulty: where do you do the training? In simulation? On test tracks? With engineers on public roads?
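To see why the reward function is hard, consider what a naive hand-written one might look like. Every weight below is an arbitrary guess, and that arbitrariness is exactly the problem:

```python
from dataclasses import dataclass

@dataclass
class DrivingState:
    # Hypothetical summary of one moment of driving.
    distance_travelled: float
    lane_deviation_metres: float
    collision: bool
    jerk: float

def driving_reward(s: DrivingState) -> float:
    """A naive hand-written reward function. Every weight here is an
    arbitrary guess; tuning them so the learned policy is both safe
    and human-like is itself an open problem."""
    reward = 1.0 * s.distance_travelled        # progress is good...
    reward -= 10.0 * s.lane_deviation_metres   # ...but how bad is drifting?
    reward -= 0.5 * abs(s.jerk)                # how much is comfort worth?
    if s.collision:
        reward -= 100.0                        # is a crash only 10x a lane drift?
    return reward
```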
If Tesla were to get the training signal from the behaviour of Tesla drivers, and use human review to remove examples of bad driving, then it would avoid both the brittleness of hand-coded algorithms and the lack of verisimilitude of simulations. I think (but I’m not sure) it would then be possible to use reinforcement learning or supervised learning to improve on this: you put the path planning and/or control neural network into cars running Autopilot and other advanced features, and then you use disengagements, aborts, crashes, and bug reports to identify failures. These failures then become part of the training signal.
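Sketched as a hypothetical fleet-learning loop, with every function name invented for illustration:

```python
from typing import List

def passes_human_review(log: dict) -> bool:
    # Stub standing in for human reviewers filtering out bad driving.
    return log.get("reviewed_ok", False)

def train_supervised(model, examples: List[dict]) -> None:
    # Stub standing in for an ordinary supervised-learning run
    # (e.g. the training step sketched earlier in this post).
    pass

def improvement_cycle(model, fleet_logs: List[dict],
                      failure_reports: List[dict]):
    # 1. The training signal comes from how Tesla drivers actually
    #    drive, with human review removing examples of bad driving.
    demos = [log for log in fleet_logs if passes_human_review(log)]
    # 2. Disengagements, aborts, crashes, and bug reports identify
    #    failures; those cases become part of the training signal too.
    hard_cases = [report["driving_example"] for report in failure_reports]
    train_supervised(model, demos + hard_cases)
    return model
```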