EM quote (via Teslarati):
“There’s a new version of Autopilot that’s rolling out, I think, this week which I think is quite a significant improvement. What you’ll see is that the reliability and capability of Autopilot will increase exponentially over the next 6-12 months. The improvements are very, very rapid.”
“I was just testing that last night at about 1 a.m. I think we might be able to release something in a couple of months that can do that. We’ve been pursuing two paths. One really complicated path that I think isn’t working that great. And then a simple path that I think will work pretty well.”
“I was able to able to drive last night, going from highway on-ramp to highway off-ramp using the simplified version of the control system. And I think with some further effort, we can get that out in the next couple months.”
What could these "two paths" be? AI vs Conventional?
That caught my attention too. We don't know which part of the system differed between the complicated and simple versions (perception? planning? control?), but presumably it was the part giving them the most trouble. On these boards there's a strong bias towards seeing perception as the limiting issue, and there's some merit to that position. So let's run with that for now.
Tesla's statements and my own review of AP code and outputs suggest that neural networks are a central, and probably the limiting, component of their vision perception system today. So what makes a neural network vision perception system complicated or simple? The network design, the training, and how the network is used are all candidates, as is the non-network code running in the vehicle.
Something that is not well appreciated about using NNs in deployed systems is that the NN code itself is rarely most of the code, or most of the complexity, in the system. Lots of NN reference designs fit in a single page of code, and even the networks used in systems as advanced as AlphaGo generally reduce to just a few pages.
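To make the "single page of code" point concrete, here is a complete two-layer network forward pass in plain numpy. The weights are random stand-ins for what training would learn; this is a generic illustration, not any production system's code:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer fully connected network: ReLU hidden layer, linear output."""
    h = np.maximum(0, x @ W1 + b1)   # hidden activations
    return h @ W2 + b2               # output scores

# Random weights stand in for what training would learn.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 2)) * 0.1
b2 = np.zeros(2)

x = rng.standard_normal((3, 4))      # a batch of 3 inputs
y = forward(x, W1, b1, W2, b2)
print(y.shape)                        # (3, 2)
```

The entire "network" is two lines of math; everything interesting about its behavior lives in the numbers inside W1 and W2, not in the code.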
An example of this is Karpathy's own blog entry, Deep Reinforcement Learning: Pong from Pixels, where he implements a policy gradient (REINFORCE) agent from scratch to solve the same problem that DeepMind addressed in their 2015 Nature paper. The entire code base, excluding math libraries, is 130 lines of Python:
Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels
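To give a flavor of what those 130 lines contain, here is the same policy-gradient idea shrunk to a 2-armed bandit. This is a toy sketch in numpy for illustration, not Karpathy's code:

```python
import numpy as np

# Minimal REINFORCE: a single-parameter policy learns to pick the paying arm.
rng = np.random.default_rng(0)
theta = 0.0                                # logit; P(arm 1) = sigmoid(theta)
lr = 0.1

for _ in range(500):
    p1 = 1.0 / (1.0 + np.exp(-theta))      # probability of picking arm 1
    a = rng.random() < p1                  # sample an action from the policy
    r = 1.0 if a else 0.0                  # arm 1 always pays, arm 0 never
    # REINFORCE update: reward-weighted gradient of log pi(a)
    grad = (1.0 - p1) if a else -p1        # d log pi(a) / d theta
    theta += lr * grad * r

p1_final = 1.0 / (1.0 + np.exp(-theta))
print(p1_final)                            # converges towards 1
```

The Pong version replaces the one-parameter policy with a small network over pixels and the bandit reward with the game score, but the learning loop is structurally this.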
The complexity in a neural network isn't written or designed; it's *learned* by the network. The code itself is usually relatively simple.
But for real world applications there's often a huge amount of interface and management code surrounding this small neural network codebase that makes the NN itself usable for the overall application. The size and complexity of this surrounding code depends on what role the NN itself is playing in the application and how well the NN's capabilities match up to what the application needs.
I'd speculate that the difference between 'simple' and 'complex' in this case comes from reducing the non-NN code, either by letting the NN take on more of the overall job or by redefining the job to better match how the NN currently performs. That mainly makes sense when the NN is working better than initially planned for, or when you are capitulating on some aspect of your original objective.
Elon's comment was so terse that he could have been talking about almost anything. We have very close to zero specific context here. But this is what ran through my mind when I heard him talk about the simple version working better.