Here’s my theory about the approach to autonomy Tesla will take in 2019 and beyond:
1. Hardware 3 will make Tesla’s perception neural network(s) a lot more accurate. Andrej Karpathy says Tesla has new, more accurate neural networks already trained and ready to go, but they are too big to run on Hardware 2.
2. Tesla will upload mid-level representations from the HW3 fleet. Mid-level representations are the output of a perception neural network: the determinations the network makes from raw sensor data, visualized as, for example, 3D bounding boxes around nearby vehicles. This kind of data is a tiny fraction of the size of raw sensor data, and it requires no manual labelling since it is, effectively, a bunch of labels. (A rough sketch of one such record follows the list.)
3. Tesla will upload the mid-level representations paired with driver input to the steering wheel and pedals. The driver input data will “label” the mid-level representations with the correct response to perceptual cues. For example, stop signs will be “labelled” with deceleration. If you’ve read about Waymo’s ChauffeurNet, this is the same idea. It’s a technique that falls under the umbrella of imitation learning: a way for a neural network to learn a task by copying humans. (Amir Efrati reported in The Information that Tesla is doing this. A minimal sketch of the training step follows the list.)
4. This imitation learning approach may at some point be combined or supplemented with reinforcement learning. Reinforcement learning is learning through trial and error: an agent tries to maximize its reward (specified by a reward function, which is essentially a points system) by taking exploratory actions and seeing what happens. (A toy reward function follows the list.) Reinforcement learning works best in simulation, because you can simulate centuries of exploration every day and you avoid real danger. The most significant part of the “reality gap” between simulation and real driving is the behaviour of other road users. If we could perfectly simulate how human drivers behave, we would have already solved self-driving, because that simulated agent would itself be a human-level self-driving car. But a reasonably good, reasonably human-like, still subhuman driving system could interact with versions of itself in simulation. Imitation learning could thereby pass the baton to reinforcement learning. Waymo suggests this idea in its ChauffeurNet paper and blog post. (Mobileye claims to have had some successes with reinforcement learning alone, starting from scratch with no imitation learning.)
5. Any software pushed to the fleet will probably still require driver monitoring initially. What a driver does after disengaging the software could perhaps be treated as a demonstration for imitation learning. Or avoiding disengagements and crashes could be part of the reward function for reinforcement learning. (A sketch of both ideas follows the list.) It’s possible the training will continue in the real world after software features are pushed.
6. We can’t predict the result. Imitation learning on this scale has never been tried before. It’s possible that it won’t work and the Tesla AI team will have to circle back and change its approach. It’s also possible that HW3 Teslas’ capabilities will rapidly increase. In AI, trying a new approach, implementing an old one better, or trying it on a new scale can lead to an unsolved problem (like Montezuma’s Revenge) being suddenly solved. I think this contributes to deep uncertainty about any timeline for full autonomy. Rather than steady progress toward a goal, we may see a long period of autonomous cars not working and barely improving, followed eventually by a new approach that suddenly works.
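To make point 2 concrete, here’s a rough sketch of what one mid-level “object” record might look like, and why such records are so much smaller than raw sensor data. The field names, object count, and camera resolution are illustrative assumptions, not Tesla’s actual format.

```python
from dataclasses import dataclass
import struct

# Hypothetical mid-level representation record; the fields and layout are
# assumptions for illustration, not Tesla's actual format.
@dataclass
class DetectedObject:
    object_class: int   # e.g. 0 = car, 1 = pedestrian, 2 = stop sign
    x: float            # position relative to the ego vehicle (metres)
    y: float
    z: float
    length: float       # 3D bounding-box extents (metres)
    width: float
    height: float
    heading: float      # yaw relative to the ego vehicle (radians)

# Packed as one class byte plus seven float32 fields:
BYTES_PER_OBJECT = struct.calcsize("<B7f")    # 29 bytes per object
mid_level_frame = 50 * BYTES_PER_OBJECT       # ~1.5 KB even with 50 objects

# One uncompressed 1280x960 RGB camera frame, by contrast:
raw_camera_frame = 1280 * 960 * 3             # ~3.7 MB, per camera

print(f"mid-level frame: ~{mid_level_frame} bytes")
print(f"raw camera frame: ~{raw_camera_frame / 1e6:.1f} MB")
```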
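Next, a minimal behavioural-cloning sketch of the “driver input as label” idea in point 3, assuming the mid-level representation has been flattened into a fixed-size feature vector. The architecture, sizes, and control outputs are assumptions; this shows the general technique, not Tesla’s or Waymo’s implementation.

```python
import torch
import torch.nn as nn

FEATURES = 256   # flattened mid-level representation (assumed size)
CONTROLS = 3     # steering angle, accelerator, brake

# A small policy network mapping perceptual state to driving controls.
policy = nn.Sequential(
    nn.Linear(FEATURES, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, CONTROLS),
)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(mid_level_batch, driver_input_batch):
    """One supervised step: the driver's steering/pedal inputs act as the
    labels for the perceptual state, exactly the 'labelling' in point 3."""
    optimizer.zero_grad()
    predicted_controls = policy(mid_level_batch)
    loss = loss_fn(predicted_controls, driver_input_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Fake batch standing in for fleet data: 32 states and the humans' responses.
states = torch.randn(32, FEATURES)
human_controls = torch.randn(32, CONTROLS)
print(train_step(states, human_controls))
```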
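For point 4, a toy reward function makes the “points system” framing concrete. The terms and weights below are invented for illustration; real reward design is far more careful.

```python
def reward(progress_m: float, collided: bool, off_road: bool,
           jerk: float) -> float:
    """Toy driving reward; terms and weights are illustrative assumptions."""
    r = progress_m          # points for forward progress along the route
    if collided:
        r -= 1000.0         # large penalty; in practice this ends the episode
    if off_road:
        r -= 100.0
    r -= 0.1 * jerk         # discourage jerky, uncomfortable control
    return r
```

In the self-play setup Waymo gestures at, the other road users in the simulator would be driven by copies of the imitation-learned policy, so the agent explores against roughly human-like traffic rather than scripted traffic.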
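Finally, a sketch of the two uses of disengagements suggested in point 5: mining the moments just after a takeover as imitation demonstrations, and scoring whole drives for reinforcement learning by penalizing takeovers. The log format and field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fleet log frame; the fields are assumptions for illustration.
@dataclass
class LogFrame:
    mid_level_state: list    # perception output at this timestep
    driver_controls: list    # what the human did
    autopilot_engaged: bool

def demonstrations_after_disengagement(log, window=100):
    """Yield (state, human action) pairs from the moments just after the
    driver took over, treating the correction as an imitation example."""
    for i in range(1, len(log)):
        if log[i - 1].autopilot_engaged and not log[i].autopilot_engaged:
            for frame in log[i:i + window]:
                yield frame.mid_level_state, frame.driver_controls

def disengagement_penalty(log, penalty=50.0):
    """Alternatively, score a drive for RL: each takeover costs points."""
    takeovers = sum(
        1 for a, b in zip(log, log[1:])
        if a.autopilot_engaged and not b.autopilot_engaged
    )
    return -penalty * takeovers
```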