I think many folks are circling around the essence of this debate without touching on the fundamentals that really matter.
Why Tesla could be closer than many think (per the thread title):
It is a truism in deep learning that data, compute, and neural network design are the three factors that determine performance. To argue, in the absence of direct quantitative evidence, that one company's NNs perform dramatically better than another's, one must construct a plausible explanation of how that could be the case based on these three factors.
In the autonomous vehicle application in particular, we know that, for major, multi-billion-dollar companies, training compute is by far the least significant constraint of the three. Abundant cheap compute exists in the cloud and these companies can set up their own GPUs or ASICs for training.
Neural network design is the most unpredictable and mysterious of the three factors. However, there is good reason to believe it is a much less significant source of competitive advantage among major companies than data.
As Karpathy stated in the tweet I posted many pages back, most cutting-edge research in AI is conducted by either a) academic labs like Mila or b) industry labs like DeepMind. Counterintuitively, the industry labs publish a huge amount of research that other labs can replicate from their papers alone, and they often even open-source their work.
This is not because these corporations are simply generous, but largely because there is a very powerful ethos among AI researchers of publishing replicable papers. If Alphabet suddenly forbade DeepMind from publishing their research, there would no doubt be an exodus of researchers from DeepMind to FAIR or somewhere else that still allowed publishing.
In other words, AI researchers as a subculture are ideologically committed to open science, and this puts pressure on companies that want to do AI research to allow their researchers to do open science.
This is why the primary competitive advantage in the current landscape is data. Compute is abundant and cheap, AI research is largely open, but data is relatively scarce, expensive, and can be jealously hoarded.
Why Waymo’s L4 is not automatically more impressive than Tesla’s L2:
It should not be surprising — it should be obvious — that L4 in a highly constrained environment with lots of crutches is a much easier problem than L4 in the wild, with basically no constraints.
For Tesla to achieve human-level L4 driving with their FSD software would be a vastly larger technical achievement than getting to human-level L4 driving within Waymo’s constraints.
It’s not clear to me which is more difficult: making a driverless robot work in Waymo’s playpen or making an L2 robot work in the wild. It’s possible they’re about equally difficult.
What we cannot accept as sound reasoning is that L4, irrespective of constraints, is better or more impressive or more advanced than L2 in the wild simply because 4 is a higher number than 2. That is folly.
Waymo’s technology could not support an L2 system in the wild because it depends on crutches that Waymo only has within its playpen. If you stripped away the crutches and forced Waymo employees to re-develop the software for L2, I reckon you’d (eventually) end up with something comparable to FSD Beta.
Conversely, if you took Tesla’s technology and built a playpen for it in Arizona with all the same crutches Waymo uses, I bet you’d eventually end up with something comparable to Waymo’s driverless proof of concept.
If anything is going to break through the challenges in perception, prediction, and planning that continue to confound AVs, it will be the application of new approaches or new advances in old approaches — such as 4D vision, multi-task learning, self-supervised learning, imitation learning, and reinforcement learning — at the million-vehicle scale, with thoughtful data curation (using things such as active learning and shadow mode).
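To make the data-curation idea concrete, here is a minimal sketch of uncertainty-based active learning, one common way to triage a fleet's firehose of clips down to the ones actually worth labeling. Everything here (the `model.predict_proba` interface, the clip format, the labeling budget) is a hypothetical illustration, not a claim about Tesla's actual pipeline:

```python
import numpy as np

def select_clips_for_labeling(model, clips, budget=1000):
    """Uncertainty-sampling sketch: score each candidate clip by the
    model's predictive entropy and keep only the most confusing ones.
    `model.predict_proba` and the clip format are hypothetical."""
    scores = []
    for clip in clips:
        probs = model.predict_proba(clip)              # per-class probabilities
        entropy = -np.sum(probs * np.log(probs + 1e-12))
        scores.append(entropy)
    # Spend the labeling budget on the clips the model is least sure about.
    ranked = np.argsort(scores)[::-1][:budget]
    return [clips[i] for i in ranked]
```

Shadow mode is, roughly speaking, the complementary trick: run the candidate network silently alongside the human driver and log the moments where its output disagrees with what the human actually did.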
Solving L4 in the wild with this data is a fundamentally different problem, a fundamentally easier problem, than solving it with the data you can get from a few hundred vehicles. It requires the neural networks to generalize far less, and it trains them on an amount of data commensurate with what we've seen in successful AI projects.
Waymo's fleet has driven less than 1,000 years' worth of miles in its entire history. Artificial agents that play modern, complex 3D games like StarCraft and Dota are trained on orders of magnitude more experience: in the ballpark of 100,000 years rather than 1,000.
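A rough back-of-the-envelope calculation shows where these figures come from. The inputs are assumptions (Waymo's publicly reported ~20 million autonomous miles as of early 2020, an assumed 30 mph fleet-average speed), so treat the outputs as order-of-magnitude estimates only:

```python
# Back-of-the-envelope driving-experience arithmetic.
# All inputs are rough assumptions, not official statistics.

WAYMO_MILES = 20_000_000      # publicly reported total, circa early 2020
AVG_SPEED_MPH = 30            # assumed fleet-average speed
HOURS_PER_YEAR = 24 * 365

waymo_years = WAYMO_MILES / AVG_SPEED_MPH / HOURS_PER_YEAR
print(f"Waymo fleet: ~{waymo_years:.0f} years of driving")          # ~76 years

# Contrast with a hypothetical million-vehicle fleet, each car
# driving one hour per day:
FLEET_SIZE = 1_000_000
fleet_years_per_day = FLEET_SIZE / HOURS_PER_YEAR
print(f"Million-car fleet: ~{fleet_years_per_day:.0f} years/day")   # ~114 years per day
```

On these assumptions, the million-car fleet logs more driving time in a single day than the small test fleet has accumulated in its entire history.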
This is why we have to look beyond shallow comparisons between Waymo and Tesla. It is too simplistic to say Waymo has more advanced AI because 4 is a bigger number than 2. We have to look at the size of the problem — its scope, its constraints, its crutches, and also the resources, i.e. the data, that a company can use to solve it.
No, I’m being tongue-in-cheek to illustrate the error of an argument I’m objecting to. Sarcastically applying an interlocutor’s argument to derive a silly conclusion is a time-honoured tactic in argumentation.
If we go along with the premise that L4 is unconditionally better than L2, we end up with absurd and obviously false conclusions. So, we must reject the premise.
The bar for L4 is farcically low. The impressiveness of any autonomous driving technology does not lie in its SAE level alone. It has to be judged on multiple criteria, including environmental, geographical, temporal, and meteorological scope, as well as the statistical success rate (or failure rate) at driving tasks within that scope.
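To illustrate that multi-criteria view, here is a hypothetical sketch of what a fair comparison record might look like. The field names and all the placeholder values are mine, invented for illustration, not anyone's real statistics:

```python
from dataclasses import dataclass, field

@dataclass
class DrivingSystemScope:
    """Hypothetical record for judging an AV system; the SAE level
    is just one field among several, not the whole story."""
    sae_level: int
    geofenced_area_km2: float                  # float("inf") = operates anywhere
    weather: list[str] = field(default_factory=list)
    time_of_day: list[str] = field(default_factory=list)
    miles_per_safety_incident: float = 0.0     # placeholder metric

# Stylized examples with made-up placeholder values:
playpen_l4 = DrivingSystemScope(4, 130.0, ["clear"], ["day", "night"], 30_000)
wild_l2 = DrivingSystemScope(2, float("inf"), ["clear", "rain"], ["day", "night"], 1_000)
```

Comparing only the `sae_level` field is exactly the mistake this post objects to.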