Here’s my theory about the approach to autonomy Tesla will take in 2019 and beyond:
1. Hardware 3 will make Tesla’s perception neural network(s) a lot more accurate. Andrej Karpathy says Tesla has new, more accurate neural networks already trained and ready to go, but they are too big to run on Hardware 2.
2. Tesla will upload mid-level representations from the HW3 fleet. Mid-level representations are the output of a perception neural network: the determinations the network makes from raw sensor data, visualized as, for example, 3D bounding boxes around nearby vehicles. This kind of data is a tiny fraction of the size of raw sensor data, and it requires no manual labelling since it is, effectively, a bunch of labels. (A rough sketch of one such record follows the list.)
3. Tesla will upload the mid-level representations paired with driver input to the steering wheel and pedals. The driver input data will “label” the mid-level representations with the correct response to perceptual cues. For example, stop signs will be “labelled” with deceleration. If you’ve read about Waymo’s ChauffeurNet, this is the same idea. It’s a technique that falls under the umbrella of imitation learning: a way for a neural network to learn a task by copying humans. (Amir Efrati reported in The Information that Tesla is doing this. A minimal sketch of the training step follows the list.)
4. This imitation learning approach may at some point be combined or supplemented with reinforcement learning. Reinforcement learning is learning through trial and error: an agent tries to maximize its reward (specified by a reward function, which is essentially a points system) by taking exploratory actions and seeing what happens. (A toy reward function follows the list.) Reinforcement learning works best in simulation, because you can simulate centuries of exploration every day and you avoid real danger. The most significant part of the “reality gap” between simulation and real driving is the behaviour of other road users. If we could perfectly simulate how human drivers behave, we would have already solved self-driving, because that simulated agent would itself be a human-level self-driving car. But a reasonably good, reasonably human-like, still subhuman driving system could interact with versions of itself in simulation. Imitation learning could thereby pass the baton to reinforcement learning. Waymo suggests this idea in its ChauffeurNet paper and blog post. (Mobileye claims to have had some successes with reinforcement learning alone, starting from scratch with no imitation learning.)
5. Any software pushed to the fleet will probably still require driver monitoring initially. What a driver does after disengaging the software could perhaps be treated as a demonstration for imitation learning. Or avoiding disengagements and crashes could be part of the reward function for reinforcement learning. (A sketch of both ideas follows the list.) It’s possible the training will continue in the real world after software features are pushed.
6. We can’t predict the result. Imitation learning on this scale has never been tried before. It’s possible that it won’t work and the Tesla AI team will have to circle back and change its approach. It’s also possible that HW3 Teslas’ capabilities will rapidly increase. In AI, trying a new approach, implementing an old one better, or trying it on a new scale can lead to an unsolved problem (like Montezuma’s Revenge) being suddenly solved. I think this contributes to deep uncertainty about any timeline for full autonomy. Rather than steady progress toward a goal, we may see a long period of autonomous cars not working and barely improving, followed eventually by a new approach that suddenly works.
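To make point 2 concrete, here’s a rough sketch of what one mid-level “object” record might look like, and why such records are so much smaller than raw sensor data. The field names, object count, and camera resolution are illustrative assumptions, not Tesla’s actual format.

```python
from dataclasses import dataclass
import struct

# Hypothetical mid-level representation record; the fields and layout are
# assumptions for illustration, not Tesla's actual format.
@dataclass
class DetectedObject:
    object_class: int   # e.g. 0 = car, 1 = pedestrian, 2 = stop sign
    x: float            # position relative to the ego vehicle (metres)
    y: float
    z: float
    length: float       # 3D bounding-box extents (metres)
    width: float
    height: float
    heading: float      # yaw relative to the ego vehicle (radians)

# Packed as one class byte plus seven float32 fields:
BYTES_PER_OBJECT = struct.calcsize("<B7f")    # 29 bytes per object
mid_level_frame = 50 * BYTES_PER_OBJECT       # ~1.5 KB even with 50 objects

# One uncompressed 1280x960 RGB camera frame, by contrast:
raw_camera_frame = 1280 * 960 * 3             # ~3.7 MB, per camera

print(f"mid-level frame: ~{mid_level_frame} bytes")
print(f"raw camera frame: ~{raw_camera_frame / 1e6:.1f} MB")
```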
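Next, a minimal behavioural-cloning sketch of the “driver input as label” idea in point 3, assuming the mid-level representation has been flattened into a fixed-size feature vector. The architecture, sizes, and control outputs are assumptions; this shows the general technique, not Tesla’s or Waymo’s implementation.

```python
import torch
import torch.nn as nn

FEATURES = 256   # flattened mid-level representation (assumed size)
CONTROLS = 3     # steering angle, accelerator, brake

# A small policy network mapping perceptual state to driving controls.
policy = nn.Sequential(
    nn.Linear(FEATURES, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, CONTROLS),
)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(mid_level_batch, driver_input_batch):
    """One supervised step: the driver's steering/pedal inputs act as the
    labels for the perceptual state, exactly the 'labelling' in point 3."""
    optimizer.zero_grad()
    predicted_controls = policy(mid_level_batch)
    loss = loss_fn(predicted_controls, driver_input_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Fake batch standing in for fleet data: 32 states and the humans' responses.
states = torch.randn(32, FEATURES)
human_controls = torch.randn(32, CONTROLS)
print(train_step(states, human_controls))
```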
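For point 4, a toy reward function makes the “points system” framing concrete. The terms and weights below are invented for illustration; real reward design is far more careful.

```python
def reward(progress_m: float, collided: bool, off_road: bool,
           jerk: float) -> float:
    """Toy driving reward; terms and weights are illustrative assumptions."""
    r = progress_m          # points for forward progress along the route
    if collided:
        r -= 1000.0         # large penalty; in practice this ends the episode
    if off_road:
        r -= 100.0
    r -= 0.1 * jerk         # discourage jerky, uncomfortable control
    return r
```

In the self-play setup Waymo gestures at, the other road users in the simulator would be driven by copies of the imitation-learned policy, so the agent explores against roughly human-like traffic rather than scripted traffic.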
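Finally, a sketch of the two uses of disengagements suggested in point 5: mining the moments just after a takeover as imitation demonstrations, and scoring whole drives for reinforcement learning by penalizing takeovers. The log format and field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fleet log frame; the fields are assumptions for illustration.
@dataclass
class LogFrame:
    mid_level_state: list    # perception output at this timestep
    driver_controls: list    # what the human did
    autopilot_engaged: bool

def demonstrations_after_disengagement(log, window=100):
    """Yield (state, human action) pairs from the moments just after the
    driver took over, treating the correction as an imitation example."""
    for i in range(1, len(log)):
        if log[i - 1].autopilot_engaged and not log[i].autopilot_engaged:
            for frame in log[i:i + window]:
                yield frame.mid_level_state, frame.driver_controls

def disengagement_penalty(log, penalty=50.0):
    """Alternatively, score a drive for RL: each takeover costs points."""
    takeovers = sum(
        1 for a, b in zip(log, log[1:])
        if a.autopilot_engaged and not b.autopilot_engaged
    )
    return -penalty * takeovers
```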