strangecosmos
Talking about the future of technology requires conjecture about things that don't exist today, because future technology doesn't exist today.
- - - - - - - - - - -
Let me try to summarize the discussion so far. I put forward an idea that I'll call the imitation learning thesis. It says:
If/when Tesla solves perception (which will require, at a minimum, HW3, and probably more NN training after HW3's launch), it will be able to collect state-action pairs from HW3 vehicles at essentially whatever scale it wants. These state-action pairs require less bandwidth and storage than raw sensor data and, unlike raw sensor data, don't require costly human annotation.
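To make the bandwidth argument concrete, here is a toy sketch of what a logged state-action pair might look like. This is a hypothetical schema invented for illustration — it is not Tesla's actual format — but it shows why such records are orders of magnitude smaller than raw sensor data.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical schema for one state-action pair -- not Tesla's actual format.
# The "state" fields are the perception system's compact output (not raw
# sensor frames); the "action" fields are what the human driver actually did.
@dataclass
class StateActionPair:
    speed_mps: float            # state: ego vehicle speed
    lane_offset_m: float        # state: lateral offset from lane center
    lead_vehicle_gap_m: float   # state: distance to the vehicle ahead
    steering_angle_rad: float   # action: driver's steering input
    accel_mps2: float           # action: driver's accelerator/brake input

pair = StateActionPair(speed_mps=27.0, lane_offset_m=0.1,
                       lead_vehicle_gap_m=45.0,
                       steering_angle_rad=0.02, accel_mps2=-0.3)

# A record like this is on the order of a hundred bytes as JSON, versus
# megabytes per frame of raw camera data -- the bandwidth argument above.
encoded = json.dumps(asdict(pair))
print(len(encoded))
```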
State-action pairs can be used for imitation learning. Imitation learning is an approach used and endorsed by Waymo for autonomous driving. According to Waymo's head of research, Drago Anguelov, the reason Waymo doesn't use imitation learning more is a lack of training examples from the long tail of human driving behaviour — something Tesla could, in the future, have the ability to collect. A paper published by Waymo showed (on page 14) that imitation learning achieved an 85-100% success rate on certain driving challenges in simulation. Waymo noted the human success rate for these challenges is unknown (e.g. a human approaching a stopped car at high speed might also be unable to avoid a bad outcome). Waymo also did a few successful trial runs in the real world. Waymo's neural network, ChauffeurNet, was trained on ~1,440 hours (~60 days) of human driving.
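Behavioral cloning — the simplest form of imitation learning — is just supervised learning on state-action pairs: fit a policy that maps logged states to the actions the human took. A toy sketch follows, using a linear policy fit by gradient descent on synthetic data; real systems like ChauffeurNet use deep networks and far richer state, but the training objective is the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: states are 3-feature vectors (e.g. speed, lane offset, gap
# to lead vehicle); the "expert" action is some unknown function of the
# state. Here the expert is linear, so a linear policy can recover it.
true_w = np.array([0.5, -1.2, 0.3])
states = rng.normal(size=(1000, 3))   # logged states
actions = states @ true_w             # logged expert actions

# Behavioral cloning: minimize mean squared error between the policy's
# action and the expert's logged action, via plain gradient descent.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    pred = states @ w
    grad = 2 * states.T @ (pred - actions) / len(states)
    w -= lr * grad

print(np.abs(w - true_w).max())  # recovered weights match the expert's
```

The same fitted policy can then serve as the starting point for reinforcement learning in simulation — the "bootstrapping" role discussed below.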
Two Waymo researchers argue that imitation learning is both 1) intrinsically useful as a way to train a car to drive and 2) instrumentally useful as a way to set up reinforcement learning in simulation:

"...doing RL requires that we accurately model the real-world behavior of other agents in the environment, including other vehicles, pedestrians, and cyclists. For this reason, we focus on a purely supervised learning approach in the present work, keeping in mind that our model can be used to create naturally-behaving “smart-agents” for bootstrapping RL."

In a different domain, StarCraft, DeepMind took this exact approach. First, it used imitation learning to attain performance estimated to be roughly around the human median for competitive play. Second, it used imitation learning to bootstrap reinforcement learning and achieved professional-level performance. StarCraft is in some ways unlike driving, but it has more in common with driving than, for instance, Go: it involves real-time strategic and tactical action in a 3D environment with a continuous action space, imperfect information, and a vastly higher number of possible moves at any time interval.

ChauffeurNet and AlphaStar are promising proofs of concept for imitation learning. For Tesla, imitation learning could be 1) intrinsically useful and 2) instrumentally useful as a way to bootstrap reinforcement learning. If Tesla can solve perception, it will be positioned to collect state-action pairs for imitation learning on a unique scale: billions of miles per year. This unique position could be a competitive advantage in autonomous driving.

Now let me try to summarize objections to the imitation learning thesis from this thread.
Objection #1: Tesla hasn't solved perception, so the point is moot.
My response: Tesla hasn't solved perception, but the point is not moot. If/when Tesla solves perception, it can apply imitation learning on a scale no one else can.
Objection #2: Billions of miles of state-action pairs data isn't required to attain human-level driving.
My response: We don't know how much data is required.
Objection #3: Other companies are collecting, or will in the future collect, more state-action pairs than Tesla.
My response: What evidence is there that this is true?
Objection #4: Tesla's efforts at solving perception will be made more difficult because it doesn't use lidar.
My response: Yes, for some tasks, such as detecting other road users, this is true. However, lidar is no help when the perception task involves flat markings or emitted light: lane lines, traffic lights, signs, and turn signals. For those, only cameras work. This is why Mobileye, for instance, is pursuing a camera-first approach to autonomy.
Objection #5: Talking about how Tesla might use imitation learning in the future is just speculation.
My response: Talking about the future of autonomous vehicles is inherently speculative. Investigative reporting from Amir Efrati says that Tesla is using, and plans to use, imitation learning. So the premise is not purely speculative.
Objection #6: Tesla doesn't have a full self-driving simulator, which is a necessary part of training.
My response: Job postings indicate Tesla began looking to hire people to work on a full self-driving simulator no later than November 2017.
Objection #7: Fully autonomous driving may require human perceptual or cognitive capabilities that are fundamentally just impossible for the current machine learning paradigm.
My response: This may be true, but unless it can be proven, we should try anyway. (Also, this objection applies as much to Waymo, Mobileye, Cruise, Zoox, et al. as it does to Tesla.)