AlphaGo Zero is a cool proof of concept for reinforcement learning. But as Andrej Karpathy points out, the game of Go is deterministic, fully observed, and has a discrete action space. What about real-world robotics problems, which are stochastic, rely on noisy and uncertain information, and have continuous action spaces?

Well, now there are some cool proofs of concept applying reinforcement learning to robotics. OpenAI trained a robotic hand to manipulate a toy block. Craziest of all, the hand was trained purely in simulation; no real-world training was used at all. Google used reinforcement learning, with lots of real-world experience, to train robot pincers to grasp previously unseen objects, succeeding 96% of the time over 700 trials. That beat supervised learning, which had a success rate of 78%.

These proofs of concept in robotics make me wonder about using reinforcement learning for path planning in autonomous cars. How far away might we be from AlphaGo Zero-like superhuman performance on path planning? What are the obstacles to getting there? And, a little more esoterically, how might evolution strategies be a better solution for path planning than reinforcement learning?
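For readers unfamiliar with evolution strategies: the basic idea is to skip backpropagation entirely and instead estimate a gradient by sampling random perturbations of the policy parameters and weighting each perturbation by the reward it earned. Here is a minimal sketch on a toy objective (steering parameters toward a target waypoint); the function names and hyperparameters are illustrative, not any particular system's implementation.

```python
import numpy as np

def evolution_strategies(f, theta, sigma=0.1, alpha=0.02,
                         population=50, iterations=200):
    """Basic ES loop: sample Gaussian perturbations of the parameters,
    evaluate the reward of each, and nudge the parameters in the
    reward-weighted average direction of the perturbations."""
    rng = np.random.default_rng(0)
    for _ in range(iterations):
        eps = rng.standard_normal((population, theta.size))
        rewards = np.array([f(theta + sigma * e) for e in eps])
        # Normalize rewards so the update step is scale-invariant.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta = theta + alpha / (population * sigma) * eps.T @ rewards
    return theta

# Toy "planning" objective: reward is negative squared distance
# to a hypothetical target waypoint.
target = np.array([3.0, -1.0])
reward_fn = lambda th: -np.sum((th - target) ** 2)
theta = evolution_strategies(reward_fn, np.zeros(2))
```

Note that nothing here requires the reward function to be differentiable, which is part of the appeal for planning problems where the reward is a black box.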