
Reinforcement learning

Discussion in 'Autonomous Vehicles' started by strangecosmos, Aug 27, 2018.

  1. strangecosmos

    strangecosmos Non-Member

    Joined:
    May 10, 2017
    Messages:
    1,041
    Location:
    The Prime Material Plane
    AlphaGo Zero is a cool proof of concept for reinforcement learning. But as Andrej Karpathy points out, the game of Go is deterministic, fully observed, and has a discrete action space. What about real world robotics problems — which are stochastic, involve noisy and uncertain information, and have a continuous action space?
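To make Karpathy's contrast concrete, here is a minimal sketch of what sampling an action looks like in the two settings: a probability distribution over discrete Go moves versus a Gaussian over a real-valued control like steering angle. All numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete action space (Go): the policy outputs one probability per legal
# move (here, all 19*19 points plus "pass") and a single move is sampled.
go_logits = rng.normal(size=19 * 19 + 1)
go_probs = np.exp(go_logits) / np.exp(go_logits).sum()
go_move = rng.choice(len(go_probs), p=go_probs)

# Continuous action space (driving): the policy instead outputs parameters
# of a distribution over a real-valued control, e.g. a Gaussian over
# steering angle, and samples from it. The values below are made up.
steer_mean, steer_std = 0.1, 0.05
steer_action = rng.normal(steer_mean, steer_std)
```

The continuous case is what makes driving-style problems harder for the AlphaGo recipe: there is no finite menu of moves to enumerate and search over.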

    Well, now there are some cool proofs of concept applying reinforcement learning to robotics. OpenAI trained a robotic hand to manipulate a toy block. Craziest of all, the hand was trained purely in simulation. No real world training was used.

    Google used reinforcement learning to train robot pincers — using lots of real world experience — to successfully grasp previously unseen objects 96% of the time over 700 trials. This beat supervised learning, which had a success rate of 78%.

These proofs of concept in robotics make me wonder about using reinforcement learning for path planning in autonomous cars. How far away might we be from AlphaGo Zero-like superhuman performance on path planning? What are the obstacles to getting there?

    A little more esoterically, how might evolution strategies be a better solution for path planning than reinforcement learning?
     
  2. strangecosmos

    I wonder if what OpenAI did with simulation for the robotic hand — randomize the uncertain or noisy variables — would be applicable to self-driving car simulation. For the robotic hand, the variables were physics-related. For path planning, I’m not sure what all the variables would be. The distribution and behaviour of entities throughout the environment would be some. Physics-related variables would at first seem to be a problem for control, not for path planning. But I wonder: does a car need to plan different paths if the ground is snowy or wet?
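A hypothetical driving analogue of the randomization OpenAI described for the robotic hand might look like the sketch below: every simulated episode redraws the uncertain variables, so a policy trained across many episodes can't overfit to any one setting. The variable names and ranges here are invented for illustration, not taken from any real simulator.

```python
import random

def sample_episode_params():
    """Redraw the uncertain/noisy variables for one simulated episode."""
    return {
        "road_friction": random.uniform(0.3, 1.0),           # snowy/wet vs. dry
        "sensor_noise_std": random.uniform(0.0, 0.1),        # perception noise
        "pedestrian_density": random.uniform(0.0, 5.0),      # entities per block
        "lead_car_reaction_time": random.uniform(0.5, 2.5),  # seconds
    }

# Each training episode would run the simulator under a fresh draw:
episode_params = [sample_episode_params() for _ in range(3)]
```

The snowy/wet question above maps directly onto `road_friction` here: if planned paths should differ by surface condition, friction belongs in the randomized set rather than being left to the control layer alone.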

If simulation could be used to solve path planning the way OpenAI used simulation to “solve” (sort of) toy block manipulation, that could lead to rapid, unexpected improvement — depending on how much computation actually needs to be done. It is possible to simulate billions and maybe even trillions of miles of driving. Once the barrier to progress is just compute — as long as the amount of compute needed is within the budget of companies like Alphabet, Tesla, Intel, and GM — then progress could happen all of a sudden from the perspective of outside observers.

    If the 100% simulation approach is off the table, then that leaves real world training, like Google used for its robotic pincers. What Google did is analogous to structured testing for autonomous cars at closed facilities like Castle, GoMentum Station, and MCity. A lot of failures in a safe environment. Then of course there’s testing on public roads with safety drivers, and remote operators. Pretty much the status quo, then. Hmm...

    If real world training is the bottleneck for progress on path planning, then it seems like real world miles is the key metric of progress. As mentioned, there are real world test miles at private facilities and on public roads. Could Tesla also make use of Enhanced Autopilot disengagements? For instance, if a customer disengages Autopilot because the car is rounding a corner too quickly, could Tesla use the video, radar, IMU, and GPS data from that disengagement for reinforcement learning?

In theory, Tesla could feed sensor data from thousands of disengagements into backpropagation, i.e. use it to tweak the parameters of the neural network it’s using for path planning. Similarly, it could collect sensor snapshots of times when Autopilot e.g. rounded a corner correctly and use those for backpropagation as well.
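As a toy illustration of that idea (not Tesla's actual pipeline), a disengagement snapshot can be treated as a supervised example pairing sensor-derived features with the path the human drove after taking over, and one gradient step nudges the parameters toward that correction. A linear model stands in for the path-planning network here, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 16))       # e.g. fused camera/radar/IMU features
corrected_path = rng.normal(size=(1000, 4))  # e.g. target curvature at 4 horizons

W = np.zeros((16, 4))                        # stand-in for network parameters
loss_before = np.mean((features @ W - corrected_path) ** 2)

# One backpropagation-style update: gradient of mean-squared error
# between the predicted path and the human's corrected path.
grad = features.T @ (features @ W - corrected_path) / len(features)
W -= 0.1 * grad

loss_after = np.mean((features @ W - corrected_path) ** 2)
```

The "success snapshot" case from the post is the same update with the correctly driven path as the label instead of a human correction.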

    These snapshots — failure examples and success examples — could in theory be used not just for training, but for testing as well. Before releasing a new version of Autopilot, Tesla could test the updated neural network on a set of previously unseen e.g. sharp corners, and see if the new version does path planning better than the previous version.
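The regression-test idea could look something like this sketch: score the old and new policy versions on the same held-out set of hard cases and ship the new one only if it does at least as well. The pass/fail check and the toy "policies" below are placeholders, not any real Autopilot interface.

```python
def regression_test(old_policy, new_policy, held_out_cases):
    """Return True if the new policy passes at least as many cases as the old."""
    old_score = sum(old_policy(case) for case in held_out_cases)
    new_score = sum(new_policy(case) for case in held_out_cases)
    return new_score >= old_score

# Toy usage: cases are corner sharpnesses; a "policy" passes below its limit.
cases = [0.2, 0.5, 0.9, 1.3]
old = lambda sharpness: sharpness < 1.0  # handles gentle corners only
new = lambda sharpness: sharpness < 1.2  # handles somewhat sharper corners
ok_to_ship = regression_test(old, new, cases)
```

Keeping the held-out cases unseen during training is the important part; otherwise the comparison just measures memorization of those corners.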
     
  3. strangecosmos

  4. strangecosmos

    Interesting to note that AlphaGo Zero trained by playing 30 million games of Go. What would be the equivalent for Enhanced Autopilot — 30 million days of driving? Since the average American drives about 30 miles per day, that would be a total of 900 million miles.
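The back-of-the-envelope arithmetic, using the post's figures:

```python
games = 30_000_000       # AlphaGo Zero self-play games
miles_per_day = 30       # average American daily driving
equivalent_miles = games * miles_per_day
print(f"{equivalent_miles:,} miles")  # 900,000,000 miles
```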

    This isn’t a rigorous comparison. It’s comparing apples to orange juice. Go and driving are fundamentally different kinds of tasks. The kind of training is fundamentally different too. AlphaGo Zero’s self-play puts it up against a perfectly matched opponent, whereas Enhanced Autopilot is just contending with the environment.

    Teslas have already driven over 400 million miles on Enhanced Autopilot, and lane keeping still isn’t anywhere close to AlphaGo levels. Enhanced Autopilot seems worse at lane keeping than a typical driver.
     
