
Jeff Schneider (Carnegie Mellon/Uber ATG): Self-Driving Cars and AI

@heltok posted this awesome video in another thread (great find heltok!). It’s a talk by a robotics professor at Carnegie Mellon who worked on autonomous cars at Uber ATG.


To me, the most interesting theme of his talk was the contrast between two self-driving car architectures. First, there’s what I’ll call the classical robotics architecture:

[Image: diagram of the classical robotics architecture]


Then there are end-to-end learning architectures:

[Image: diagram of an end-to-end learning architecture]


You go from sensor input to actuator output with one big neural network in the middle. You train the network via imitation learning using human-driven miles in the autonomous car (or potentially via reinforcement learning in simulation). Jeff Schneider talks about Uber ATG’s experiments with this in the talk.
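To make the end-to-end idea concrete, here's a minimal sketch (my own illustration in PyTorch, not anything from the talk; the layer sizes and variable names are all made up). One network maps camera pixels directly to actuator commands, and imitation learning just means regressing toward what the human driver did:

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """One big network: raw camera pixels in, actuator commands out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, 2),  # outputs: [steering, throttle]
        )

    def forward(self, camera_image):
        return self.net(camera_image)

model = EndToEndDriver()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Imitation learning step: random tensors stand in for a batch of logged
# (camera frame, human action) pairs from human-driven miles.
images = torch.randn(8, 3, 64, 64)
human_actions = torch.randn(8, 2)  # logged steering/throttle
loss = nn.functional.mse_loss(model(images), human_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```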

A third alternative is a mid-to-mid architecture. Schneider alludes to this in the talk when he says (at 59:50):

“If you want to make the problem hard, force it to start from the camera image. ... But, again, we have the option here to start small. We can take the existing perception system and just work on the motion planning system to start with.”
An example is Waymo’s ChauffeurNet. Instead of sensor input to actuator output, you go from perception neural network output to classical control algorithm input. It goes:

Sensors → Perception neural network → Action neural network → Classical control algorithm → Actuators

By contrast, the classical robotics architecture goes:

Sensors → Perception neural network → Classical action algorithms → Classical control algorithm → Actuators

So, the important difference between the classical robotics architecture and the mid-to-mid architecture is that in the mid-to-mid architecture, the vehicle’s decisions about what actions to take are handled by a neural network instead of hand-coded software.

The action neural network is trained with a) imitation learning using data from human driving, b) reinforcement learning in simulation, or c) both.
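Here's how I picture that split in code (again just my own sketch, assuming a PyTorch-style action network; none of the names or dimensions come from ChauffeurNet or the talk). The action network consumes the perception stack's output rather than raw pixels, emits a short trajectory, and a classical controller tracks it:

```python
import torch
import torch.nn as nn

class ActionNet(nn.Module):
    """Mid-to-mid: perception output in, planned trajectory out."""
    def __init__(self, perception_dim=64, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(perception_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * 2),  # one (x, y) waypoint per step
        )

    def forward(self, perception_features):
        return self.net(perception_features).view(-1, self.horizon, 2)

def pd_steering(cross_track_error, prev_error, kp=0.5, kd=0.1):
    """Classical control: a toy PD controller that tracks the planned path."""
    return kp * cross_track_error + kd * (cross_track_error - prev_error)

# Sensors -> perception NN -> action NN -> classical control -> actuators.
# A random vector stands in for the perception network's output here.
perception_output = torch.randn(1, 64)
trajectory = ActionNet()(perception_output)  # planned (x, y) waypoints
steer_cmd = pd_steering(cross_track_error=0.3, prev_error=0.25)
```

The ActionNet would then be trained exactly as described above: behavioral cloning against human-driven trajectories, reinforcement learning in simulation, or both.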

I may not have all the terminology exactly right, but I think I have the general concept right.

As far as I know, no major company is actively pursuing an end-to-end architecture that goes directly from sensors to actuators. But Waymo, Tesla, and Mobileye (33:10 to 46:15) all seem to be pursuing a mid-to-mid architecture where the perception neural network’s output is the action neural network’s input (and the action neural network’s output is the classical control algorithm’s input).

Schneider doesn’t actually say what architecture Uber ATG is pursuing, except that it’s not an end-to-end architecture. If Schneider’s views reflect the majority view of technical leads at the company, I would guess Uber is working on a mid-to-mid architecture.

Big picture: some of the major self-driving companies are coming around to the idea that a self-driving car’s actions should be decided by a neural network trained via imitation learning and/or reinforcement learning, as opposed to hand-coded software.
 
An interesting idea from a Mobileye talk on reinforcement learning is an options graph:

“Semantic Abstraction: we decompose the driving policy function into semantically meaningful components using an options graph”

“Decomposing the problem into this graph helps us to learn every component of the graph much faster...”

[Image: Mobileye’s options graph]



Rather than treating driving policy — the set of high-level actions a car can take, distinguished here from the lower-level actions of path planning/trajectory planning — as a pure machine learning/neural network problem, the options graph imports some explicit human domain knowledge, some degree of hand crafting, into the driving policy solution (perhaps unwisely).

As I understand it, you then have two problems:
  1. Choosing the right action. (What to do.)
  2. Executing the action correctly. (How to do it.)
I think my intuition about why this is an intriguing idea is that it reduces the combinatorial size of the search space. Instead of driving policy errors = action choice errors * action execution errors, with an options graph you train action choice and action execution independently. (For example, if there are 10 ways to choose the wrong action and 10 ways to execute an action badly, joint training has to untangle 100 combined failure modes, while independent training faces roughly 10 + 10.)
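To show what I mean, here's a toy version of the decomposition (entirely my own guess at the structure; the “make room” and “push” option names are from Mobileye's slides, everything else is invented). A selector network decides what to do, and each option has its own independently trained policy for how to do it:

```python
import torch
import torch.nn as nn

OPTIONS = ["stay_in_lane", "make_room", "push"]  # high-level actions in the graph

class OptionSelector(nn.Module):
    """Problem 1: choose the right action (what to do)."""
    def __init__(self, state_dim=32):
        super().__init__()
        self.net = nn.Linear(state_dim, len(OPTIONS))

    def forward(self, state):
        return self.net(state).argmax(dim=-1)  # index into OPTIONS

class OptionPolicy(nn.Module):
    """Problem 2: execute the chosen action correctly (how to do it)."""
    def __init__(self, state_dim=32, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state):
        return self.net(state)

selector = OptionSelector()
policies = {name: OptionPolicy() for name in OPTIONS}  # trained independently

state = torch.randn(1, 32)
option = OPTIONS[selector(state).item()]  # what to do
control = policies[option](state)         # how to do it
```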

This is the same intuition around using a mid-to-mid architecture rather than an end-to-end architecture. With an end-to-end architecture, errors = perception errors * action errors. With a mid-to-mid architecture, perception and action are trained independently.

But when I try to imagine how an options graph would work in practice, what I imagine doesn’t make sense. I assume the training is happening via reinforcement learning in simulation. Do you define a separate reward function for each individual action, and then run 5-second “make room” simulations or 5-second “push” simulations millions of times?
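For what it's worth, this is the kind of setup I'm imagining when I ask that, with everything here hypothetical: the reward shapes, episode lengths, and simulator stub are all my guesses, not anything from the talk:

```python
import random

def make_room_reward(gap_meters):
    """Hypothetical reward: opening a bigger gap for the merging car is better."""
    return gap_meters

def push_reward(progress_meters, caused_hard_brake):
    """Hypothetical reward: forward progress, penalized for forcing a hard brake."""
    return progress_meters - (10.0 if caused_hard_brake else 0.0)

def run_episode(option):
    """Stub standing in for a ~5-second simulator rollout of a single option."""
    if option == "make_room":
        return make_room_reward(gap_meters=random.uniform(0.0, 3.0))
    return push_reward(progress_meters=random.uniform(0.0, 20.0),
                       caused_hard_brake=random.random() < 0.1)

# Millions of short, option-specific episodes, each scored by its own reward
# (only 1,000 here so it actually runs in a blink).
for option in ["make_room", "push"]:
    returns = [run_episode(option) for _ in range(1000)]
    print(option, sum(returns) / len(returns))
```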
 
I really encourage people to watch the video in the OP. The contrast between end-to-end learning, the classical architecture, and mid-to-mid learning of path planning is super helpful for understanding the autonomous vehicle problem space. Waymo, Tesla, and Mobileye all seem to be developing machine learning solutions to path planning. On TMC, we’ve talked a lot about applying machine learning to computer vision, but not much about applying machine learning to path planning. If you want to understand the big picture of autonomous vehicle tech, I think this is a key piece.

A tip: I watched the lecture on 1.5x speed; that might make it easier to listen to if you find yourself getting bored or impatient. Talking speed is slower than listening speed, so your brain sometimes gets frustrated listening: the talking feels too slow. But for me, 1.5x speed often feels just right.
 
“Rather than treating driving policy — the set of high-level actions a car can take, distinguished here from the lower-level actions of path planning/trajectory planning — as a pure machine learning/neural network problem, the options graph imports some explicit human domain knowledge, some degree of hand crafting, into the driving policy solution (perhaps unwisely).”

This is the type of stuff that I have been trying to make you see, basically.

The solution to autonomous driving probably won’t come from whoever can train an NN best or has the most data simply because they can train an NN best or have the most data (it might come from them for a combination of other reasons, of course).

It will likely take a combination of techniques, partnerships, and experience, and we don’t yet know who will get the winning formula right first.
 