Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.

NeurIPS 2019 paper: Causal Confusion in Imitation Learning



Feb 10, 2021

“Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive “causal misidentification” phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions—either environment interaction or expert queries—to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.”
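The paper's driving example makes the "more information can be worse" phenomenon concrete: if the previous action (e.g. a dashboard brake indicator) is visible in the observation, a cloned policy can fit the expert data better by copying that indicator than by reading the actual road. A toy numpy sketch of that statistic (all numbers and the `make_episode` generator are invented stand-ins, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_episode(T=200, p_switch=0.05):
    # Obstacles persist over time (switch with small probability), so the
    # expert's consecutive actions are highly correlated.
    obstacle = np.zeros(T, dtype=int)
    for t in range(1, T):
        obstacle[t] = obstacle[t - 1] ^ (rng.random() < p_switch)
    expert_action = obstacle.copy()                 # expert brakes iff obstacle
    prev_action = np.roll(expert_action, 1)         # the confounder in the observation
    prev_action[0] = 0
    # The true cause (the obstacle) is only observed through a noisy sensor.
    noisy_obstacle = np.where(rng.random(T) < 0.1, 1 - obstacle, obstacle)
    return noisy_obstacle, prev_action, expert_action

# Build an imitation-learning training set from expert episodes.
X_parts, y_parts = [], []
for _ in range(50):
    s, p, a = make_episode()
    X_parts.append(np.stack([s, p], axis=1))
    y_parts.append(a)
X = np.concatenate(X_parts)
y = np.concatenate(y_parts)

# Training accuracy of two trivial policies:
copy_prev = (X[:, 1] == y).mean()   # "repeat the previous action" -> ~0.95
use_sensor = (X[:, 0] == y).mean()  # "act on the noisy true cause" -> ~0.90

print(f"imitate previous action: {copy_prev:.2f}")
print(f"use the actual cause:    {use_sensor:.2f}")
```

On expert data the confounder looks like the better predictor, so a discriminative model latches onto it. At deployment, though, the "previous action" column is the agent's *own* previous action, not the expert's, so the shortcut that scored best in training is exactly the one that collapses under distributional shift.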

PDF: https://proceedings.neurips.cc/paper/2019/file/947018640bf36a2bb609d3557a285329-Paper.pdf

Short talk by one of the authors explaining the paper:

Long, wide-ranging interview with one of the authors:

Advancements in Machine Learning with Sergey Levine - The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Autonomous vehicles can learn behaviour planning from observing human driving, but a weakness of this approach is that the learning is superficial and latches onto incorrect proxies for, or mere correlates of, the correct behaviours.

Targeted interventions provide a way for deep neural networks to correctly learn the real causal relationships between the key elements of the perceptual world.
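One way to picture the intervention idea (a brute-force sketch, not the paper's actual algorithm, which trains a single graph-conditioned policy and infers the likely graph far more cheaply): treat each binary feature mask as a candidate causal graph, roll out a policy restricted to that mask, and keep the mask that earns the most reward. `toy_return` below is a hypothetical stand-in for "train on these features, then interact with the environment":

```python
import itertools

def select_causal_mask(rollout_return, n_features):
    """Try every feature mask; keep the one whose masked policy scores best."""
    best_mask, best_ret = None, float("-inf")
    for mask in itertools.product([0, 1], repeat=n_features):
        ret = rollout_return(mask)   # environment interaction = the intervention
        if ret > best_ret:
            best_mask, best_ret = mask, ret
    return best_mask

# Toy stand-in: feature 0 is the true cause (+1 reward when used),
# feature 1 is a confounder that hurts at test time (-2 when used).
def toy_return(mask):
    return 1.0 * mask[0] - 2.0 * mask[1]

print(select_causal_mask(toy_return, 2))  # (1, 0): keep the cause, drop the confounder
```

The point is that only interaction (or expert queries) can break the tie: on the expert's own data, both masks can look equally good.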

Understanding causality is widely believed by ML experts to be an important challenge in deploying robotic agents (such as autonomous vehicles) that act intelligently in the real world.
Not disagreeing with the paper, but...

But behavioral cloning of experts isn't the goal for most AI learning; the goal is to be much better than the average expert/driver. It needs to be better to gain acceptance. Learning to be a poor imitation of an expert isn't much use in the scheme of things.

You can't achieve that by simply learning to imitate what experts have decided the cause/effect relationships are. In that case you will never be better than the expert; you can only approach the competency of the teacher.

You need multiple methods of learning, of course.
  1. Being taught by a teacher, from existing knowledge gets you going. (Teaching)
  2. Then learning more by watching peers/experts do what you are trying to do, knowing what the good/bad outcomes are, moves you beyond any one teacher. (Emulating Experts)
  3. Then you are out on your own, learning from experience. This is where you learn more about what good outcomes are, etc. etc. (Experience)
When you get to #2 and especially #3, that is where LOTS of data comes in, as the data isn't black/white and the causes/effects aren't already known.

At #2 you really aren't trying to emulate/clone the expert; you are trying to build a generalization of what you see across all the experts. Experts aren't 100% correct 100% of the time, so you don't want to emulate them; you want to learn/generalize what the rules are.
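That "generalize across many imperfect experts" point is just the statistics of aggregation. A toy sketch (invented numbers): each expert labels the right action only 70% of the time, yet the majority vote of eleven of them is right far more often than any single one:

```python
import numpy as np

rng = np.random.default_rng(2)

truth = rng.integers(0, 2, size=5000)   # the "correct" action in each situation
n_experts, p_correct = 11, 0.7          # each expert is right 70% of the time

# Each expert's labels: correct with probability p_correct, flipped otherwise.
experts = np.where(rng.random((n_experts, 5000)) < p_correct, truth, 1 - truth)

single = (experts[0] == truth).mean()                                # ~0.70
majority = ((experts.sum(axis=0) > n_experts // 2) == truth).mean()  # ~0.92

print(f"one expert: {single:.2f}   majority of {n_experts}: {majority:.2f}")
```

Real generalization means learning a model rather than voting, but the underlying effect is the same: independent errors wash out, so the aggregate exceeds any individual teacher.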

Experts also can't teach you 100% of what they know. Maybe 50% of what they know is conscious; the other 50% is experience (they don't know why they know, they just do).

At #3 you are refining what you learned from watching experts, AND making up your own rules based on the outcomes you see. You need to get into #3 if you want to exceed the capabilities of your teachers.
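That "imitation caps you at the teacher, experience lets you pass them" point can be shown with a toy bandit (everything here is invented for illustration): cloning an 80%-correct expert yields roughly the expert's value, while estimating rewards from your own trials finds the genuinely best action:

```python
import numpy as np

rng = np.random.default_rng(1)

true_reward = np.array([0.2, 0.5, 0.9])   # action 2 is genuinely best

# Stages 1/2: imitation. The expert picks the best action only 80% of the
# time, so cloning its action distribution caps you at the expert's level.
expert_actions = rng.choice(3, size=1000, p=[0.1, 0.1, 0.8])
bc_policy = np.bincount(expert_actions, minlength=3) / 1000
bc_value = bc_policy @ true_reward        # ~0.79

# Stage 3: experience. Estimate each action's reward from your own trials
# and commit to the best estimate; this can exceed the expert.
counts, sums = np.zeros(3), np.zeros(3)
for _ in range(3000):
    a = rng.integers(3)                   # explore uniformly (a toy choice)
    counts[a] += 1
    sums[a] += rng.random() < true_reward[a]
experience_policy = np.eye(3)[np.argmax(sums / counts)]
experience_value = experience_policy @ true_reward

print(f"imitation value:  {bc_value:.2f}")
print(f"experience value: {experience_value:.2f}")
```

Uniform exploration is deliberately naive here; the point is only that outcome feedback, unlike expert labels, contains the information needed to beat the teacher.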

Ever heard the expression "You start the game with a full pot o' luck and an empty pot o' experience; the object is to fill the pot of experience before you empty the pot of luck"? Well, that.

That requires ingesting lots of data and working out what's good, what's bad, and what you can get away with, for yourself. The AI is no different, really; it just has the advantage of LOTS more data it can see, and it can process and learn from that data quicker than we can.

Much of 'experience' isn't about finding cause/effect relationships; it's about getting advance warning of the "butterfly effect". There may be no simple cause/effect, BUT you can still make predictions of behavior if you know the rules.