AI experts: true full self-driving cars could be decades away because AI is not good enough yet

Ah, that's possible. I haven't followed the specifics of Tesla's implementations closely, as I don't think they publish much of their work. Assuming they are using some sort of RNN-like architecture, or some other means to distill temporal information and feed it back into the network, that can certainly help it do a better job, but the point in my original post still stands. The network is still not rooted in any physics at all. It is just an approximation machine, and it can still break down in completely unpredictable ways on samples that are not similar enough to the training-set distribution. It doesn't, for example, "learn" classical structure-from-motion algorithms, and wouldn't be able to generalize well on data that looks very different, even though classical algorithms rooted in physics would.
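To put a number on that "approximation machine" point, here's a toy sketch (my own illustration, nothing to do with Tesla's actual stack): fit a small net to a simple physical signal and it does fine inside the training range, then falls apart immediately outside it, while the underlying physics generalizes everywhere.

```python
# Toy illustration: a net trained on a "physical" signal interpolates well
# in-distribution but extrapolates badly out-of-distribution.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 2000).reshape(-1, 1)
y_train = np.sin(x_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(x_train, y_train)

x_in = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)            # training range
x_out = np.linspace(2 * np.pi, 4 * np.pi, 100).reshape(-1, 1)   # outside it

print("in-distribution MSE: ", np.mean((net.predict(x_in) - np.sin(x_in).ravel()) ** 2))
print("out-of-distribution MSE:", np.mean((net.predict(x_out) - np.sin(x_out).ravel()) ** 2))
# The physics (y = sin x) is valid everywhere; the approximation is not.
```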

Well I agree and disagree. :D

Everything you said tells me you are a smart, experienced ML expert. All your points stand for basically any ML algorithm. If some ML algorithm is trained on lidar data, you would have the same potential issues with training/test distributions. Obviously though, the more complicated the algorithm (like deep learning) and the noisier the data, the worse it could become.

But I would still argue deep learning techniques can learn some physics. I mean, that is definitely true in the pure sense. Feed in time series with labels that are some order of integral or time derivative, and a convnet can figure the true math out (convolutions are of course set up for exactly this) in a form that generalizes as well as the standard derivative operators do.
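Here's roughly what I mean, as a toy sketch (assuming a PyTorch setup; my example, not anyone's production code): train a single 1-D conv layer on (signal, derivative) pairs and it recovers the classical central-difference stencil, i.e. it literally learns the numerical form of d/dt.

```python
# Sketch: one 1-D conv layer trained on (signal, time-derivative) pairs
# recovers the classical finite-difference stencil for d/dt.
import torch
import torch.nn as nn

dt = 0.01
t = torch.arange(0, 100, dt)
x = torch.sin(0.5 * t) + 0.3 * torch.sin(2.0 * t)          # input signal
dx = 0.5 * torch.cos(0.5 * t) + 0.6 * torch.cos(2.0 * t)   # exact derivative

conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
opt = torch.optim.Adam(conv.parameters(), lr=0.1)

for _ in range(5000):
    opt.zero_grad()
    pred = conv(x.view(1, 1, -1)).view(-1)
    loss = ((pred - dx) ** 2).mean()
    loss.backward()
    opt.step()

# The learned 3-tap kernel approaches the central difference
# [-1, 0, 1] / (2*dt) -- the textbook numerical-derivative operator.
print(conv.weight.detach().view(-1) * 2 * dt)  # ~ tensor([-1., 0., 1.])
```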

If you fed in *perfect* image data (high resolution, no noise, etc.) where structure-from-motion algos would work, I think the right architecture could also learn the same math to get it right, and be simple enough to generalize.

The benefit of deep learning, of course, is that it is statistical and can achieve better performance than classical algorithms. The best-performing algorithms (on independent test data) will have learned the simplest representation of the physics that also handles the noise and edge cases best on average.

So I agree with all your points on the potential flaws, but it doesn't mean it's impossible. It means you need a lot of diverse data to ensure that your training/test set distributions converge, and then you can better trust whether your algorithm is good enough. It doesn't guarantee it, though.
 
A video game has a constrained set of rules, and the AI has a pixel-perfect view, and even input "under the hood".
So it can train on that very limited and specific data for specific scenarios (finishing a level that never changes).

Reality is extremely complex and ever changing, so the system needs to adapt. That's why self-driving is hard. You could probably train for a street in your neighborhood or a block, by running the car and crashing over and over again till it gets it, given that you don't change the environment too much :D

That's right - in games they fail over and over again. But in reality you would cause millions of dollars in damage and endanger people by sending the car crashing into random spots over and over again till it manages to get through :D It would also be good ONLY for that block.


Exactly. A video game gives you an unlimited data set with a confined dimensionality and confined set of possible outcomes. A video game based on driving (ahem, a simulator) would have been "solved" by now. The actual real challenging issues are in the odd edge case data everyone keeps talking about and the only way to even possibly overcome these is to collect a lot of them.

This is why it's a joke when some companies tout their simulators as being any sort of key reason they will succeed.
 
History is a great predictor of the future. History tells us that revolutionary changes are the result of a slow, steady march. Anybody expecting anything but a slow, steady march is making a mistake. I have seen zero research papers, and zero products in the real world that lead me to believe that a generalized L5 driving system is coming any time soon, or that the current solutions for L2-L4 are going to be applicable.



I'm very interested in good-faith debate, but you people are repeating the same things you've been saying since 2016. At some point, repeating myself to you became a chore rather than a debate. As I've said, we've been working on this problem with the current strategy since the 1980s. Before that, we had been working on it with all kinds of other strategies since the 1950s.

Consider this: How long has it taken to improve BEVs and all of their constituent parts? How much have they improved since the 1980s? Now compare that to the promises of nuclear fusion energy, and the decade-after-decade refrain that we're "just a couple years away". But we aren't even close to a couple years away. Meanwhile, entire new battery materials have gone from concept to research, into the lab, and finally into production and improvement cycles. In less than 80 years, we went from the initial Hewlett-Packard computer to the Internet and smartphones. After 60 years, neural networks are still in their infancy: easily tricked, brittle black boxes that rapidly produce lawsuits as people misunderstand what they're holding and apply it to increasingly complex problem sets. If we can't reliably use "AI" in radiology settings, how are we expecting to use it reliably in uncontrolled environments, exactly?

CMU has been the preeminent researcher in this space for over 40 years now, and they are the originators of the modern solution to autonomous driving. So, you can and should certainly debate what I've said. But when the most advanced and experienced research facility isn't confident in the solution, and they don't believe autonomy will be solved for decades still, you're going to have to argue with more than PR slides from a company seeking investors. You can all sit in this thread and try to tear down the incredible work that CMU has done, and the leading research they have produced, by comparing screenshots of Tesla's rendered UI to their UI from 1995, but that honestly just looks foolish. Their solution back then IS the parent of the solutions being used now. All founded on the same concepts, improved excruciatingly slowly over time. Much slower than any successful technology we've used in modern life.


There were indeed step-function changes in the performance of neural nets "overnight" 8 years ago or so. Image classification, language translation, and audio processing all had big changes in the accuracy of their systems based on a few fundamental modifications to neural net architectures and GPU training. And those all ended up in the production code of products we all use today. Those were rapid changes.

Combinations of deep nets need a lot of data to perform well in general. For something like self-driving, that basically was never possible before. Researchers have proposed many ideas that were well before their time. That doesn't really mean their progress was slow.

Actually, the performance of autonomous cars has rapidly improved over the last 10 years. You can certainly argue that it still isn't good enough and may never get there, but the recent progress has been significant.
 
The actual real challenging issues are in the odd edge case data everyone keeps talking about and the only way to even possibly overcome these is to collect a lot of them.

This is why it's a joke when some companies tout their simulators as being any sort of key reason they will succeed.
Nobody says they are only using simulation. Simulation can actually be a great way to expand your edge cases when blended with in-field data collection. For instance, the recent example of Tesla's interpreting a stop sign painted on a truck as a real stop sign. Once you see that edge case, you realize that vehicles can have anything painted on them. Wraps are a thing. Once you realize that, you can program your simulation to wrap random textures on all the vehicles, and get much, much more coverage than you could from trying to collect data from the real world, and you can eventually demonstrate that your system is robust to this issue (like a human is) rather than just being reactive to the specific cases you happened to collect a few times.
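To sketch that texture-wrapping idea concretely (purely illustrative; it assumes the simulator can hand you each rendered frame plus per-vehicle pixel masks, and the helper names are mine, not any real tool's API):

```python
# Hypothetical sketch of texture randomization: given a simulator frame and
# the per-vehicle boolean masks the simulator already knows, blend a random
# texture over every vehicle so no detector can assume "vehicles look plain".
import numpy as np

def random_texture(h: int, w: int, rng: np.random.Generator) -> np.ndarray:
    """Cheap procedural texture: random-colored horizontal stripes."""
    stripes = rng.integers(0, 256, size=(h, 1, 3), dtype=np.uint8)
    return np.repeat(stripes, w, axis=1)

def randomize_vehicle_textures(frame: np.ndarray,
                               masks: list[np.ndarray],
                               rng: np.random.Generator,
                               alpha: float = 0.6) -> np.ndarray:
    """Alpha-blend a fresh random texture over each vehicle mask."""
    out = frame.copy()
    h, w, _ = frame.shape
    for mask in masks:                  # one (h, w) boolean mask per vehicle
        tex = random_texture(h, w, rng)
        blended = (alpha * tex + (1 - alpha) * out).astype(np.uint8)
        out[mask] = blended[mask]
    return out
```

Every training pass sees fresh textures, which pushes the network toward shape and context cues instead of paint.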

It's more concerning for the companies that think they can get away without simulation as a fundamental. Technology is nowhere near the point where we can collect raw video from large quantities of fielded vehicles, and you cannot "fail" in the real world like you can in a simulator, nor can you get the critical feedback signals as efficiently. Any system that only sends back data when it detects an "edge case" suffers from needing to detect the edge case, which is an immense challenge in itself, detecting things that you can't yet detect....


There were indeed step-function changes in the performance of neural nets "overnight" 8 years ago or so. Image classification, language translation, and audio processing all had big changes in the accuracy of their systems based on a few fundamental modifications to neural net architectures and GPU training. And those all ended up in the production code of products we all use today. Those were rapid changes.

Actually, the performance of autonomous cars has rapidly improved over the last 10 years. You can certainly argue that it still isn't good enough and may never get there, but the recent progress has been significant.

As you say, a step change in classification occurred 8-ish years ago. But we haven't seen that repeat every few years, and it has not led to large performance increases in intention vs. classification. Yes, Alexa, Siri, Cortana, or Google can convert the words you say into text very, very well. However, they have not learned nearly as well what you mean when you ask "what is the temperature?", even though, now that they have the classification, they are the ultimate version of "big data." The issue is that, just like trying to learn from watching a human drive, they lack the feedback signal telling them whether they got the intent correct.

As you indicated, this same step change impacted self-driving in the last 10 years. We got AP1 in Teslas in 2014. But we haven't seen major changes since then: incremental, yes, but the machines are still struggling with classification and intention. I think everyone would agree that over the last 10 years we went from nothing to pretty good L2 highway systems. But a lot of that happened closer to 10 years ago than recently, so it's pretty reasonable to assume we're still a few more step changes away, and those might only happen once every 10 years. It's not a given that once we have classification working, the intent portion will come along naturally and rapidly.
 
Forgive me if the list is a little off, but it's usually understood that a self-driving system is some variant of:
  • Perception
  • Localization
  • Decision
  • Planning
  • Control

Yeah, pretty close. IMO, Decision is the same as Planning. There is also Prediction, although maybe you consider that to be a part of Perception.

Here is how Let's Talk Autonomous Driving describes it:

An autonomous vehicle needs to answer 4 questions:
  • Where am I? (Localization)
  • What’s around me? (Perception)
  • What will happen next? (Prediction)
  • What should I do? (Decision or Planning)
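Purely as an illustration of how those four questions chain together in one loop (placeholder names, not any particular company's architecture):

```python
# Illustrative only: the four questions above composed into one driving loop.
from dataclasses import dataclass

@dataclass
class Pose:               # Where am I? (Localization)
    x: float
    y: float
    heading: float

@dataclass
class TrackedObject:      # What's around me? (Perception)
    x: float
    y: float
    vx: float
    vy: float

def localize(sensor_data) -> Pose: ...
def perceive(sensor_data) -> list[TrackedObject]: ...
def predict(objects: list[TrackedObject], horizon_s: float) -> list: ...
def plan(pose: Pose, predictions: list): ...

def drive_tick(sensor_data):
    """One cycle of the loop; each stage's output feeds the next stage."""
    pose = localize(sensor_data)                   # Where am I?
    objects = perceive(sensor_data)                # What's around me?
    predictions = predict(objects, horizon_s=3.0)  # What will happen next?
    trajectory = plan(pose, predictions)           # What should I do?
    return trajectory                              # handed off to Control
```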

I don't hear too much debate about the feasibility of Localization*, more about the relative necessity of detailed pre-mapping. Robotically Planning the trajectory, and Control of its execution (once the decision has been made), are still imperfect based on watching Tesla FSD videos, but that has little to do with feasibility or with compute and machine-design limitations, and in my view needs no technical help beyond better-quality driving expertise encoded into the software.

So this leaves Perception and Decision as the most challenging intelligence-based aspects, and the argument about AI's chance of success is mostly about edge cases there. These are the areas that could really use a leg up over humans, because we're very doubtful that AI is on a trajectory to solve them as well as adaptable humans can, especially not to the desired mistake-free level that everyone wants.

IMO, it is Prediction and Decision/Planning that are the two big challenges.

Perception is not a big challenge if you use all the sensors (cameras, lidar and radar) and HD maps. Camera vision is pretty good now at detecting and classifying most objects. Lidar is very accurate at detecting road features and objects and measuring distances. Radar can be very effective at measuring the distance and velocity of moving objects, especially in low-visibility conditions like fog, rain or snow. Lastly, an HD map is very good at giving the car an accurate picture of static road features. And having different sensors means you can cover cases that a single sensor could not handle on its own. I am not saying that there aren't edge cases in Perception that still need to be solved, but if you combine all of the sensors, you can get really good perception that is "good enough" for most FSD.
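As a back-of-the-envelope illustration of why fusing complementary sensors works (a textbook 1-D Kalman filter, not any vendor's implementation): lidar nails position, radar nails velocity, and the filter weights each by its noise.

```python
# Textbook 1-D Kalman filter fusing two sensors (illustration only):
# lidar measures range accurately; radar measures range-rate accurately.
import numpy as np

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
x = np.array([20.0, 0.0])               # state: [range (m), range-rate (m/s)]
P = np.eye(2) * 10.0                    # state uncertainty

def kalman_update(x, P, z, H, R):
    """Standard Kalman measurement update for measurement z = H x + noise."""
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P

# One cycle: predict with the motion model, then fuse each sensor
# with its own noise covariance R (small R = trusted sensor).
x, P = F @ x, F @ P @ F.T + np.eye(2) * 0.01
x, P = kalman_update(x, P, np.array([19.5]), np.array([[1.0, 0.0]]), np.array([[0.05]]))  # lidar range
x, P = kalman_update(x, P, np.array([-1.2]), np.array([[0.0, 1.0]]), np.array([[0.02]]))  # radar velocity
print(x)  # the fused estimate leans on each sensor where it is strongest
```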

Prediction is a challenge because of the unpredictable nature of the whole world, especially in the city. You can have a large number of pedestrians, cyclists, vehicles, etc. that will have different behaviors and may not always behave in a predictable manner. A pedestrian may suddenly decide to jaywalk because they see something important on the other side of the street. A car may suddenly change lanes because the driver decides at the last minute to take a different route. We have NNs that can predict paths for individual objects, but making sure those predictions are accurate for a large number of objects, especially when objects can influence the paths of other objects, becomes more complicated.
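For context, the simplest baseline those NNs have to beat is rolling every object forward at constant velocity (illustrative sketch below); the jaywalking pedestrian breaks that model instantly, which is exactly the hard part.

```python
# Baseline predictor (illustrative): roll each tracked object forward
# under a constant-velocity assumption. Learned predictors exist precisely
# because real agents interact and change intent.
import numpy as np

def predict_constant_velocity(states: np.ndarray, horizon: float, dt: float) -> np.ndarray:
    """states: (N, 4) rows of [x, y, vx, vy]. Returns (N, T, 2) future positions."""
    steps = int(horizon / dt)
    t = np.arange(1, steps + 1)[None, :, None] * dt   # (1, T, 1) time offsets
    pos = states[:, None, :2]                         # (N, 1, 2)
    vel = states[:, None, 2:]                         # (N, 1, 2)
    return pos + vel * t                              # (N, T, 2)

# A pedestrian walking along the sidewalk at 1 m/s...
paths = predict_constant_velocity(np.array([[0.0, 0.0, 1.0, 0.0]]), horizon=3.0, dt=0.5)
# ...who suddenly jaywalks violates every one of these predicted points.
```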

Decision/Planning is a big challenge because it is not always obvious to a computer what the right decision or right path is. Some cases, like construction zones, can be especially tricky since they change all the normal rules. A lane might be closed, and cars have to follow a different path that would normally be wrong. And city driving can be tricky, requiring lots of quick thinking. So it is not always obvious to the autonomous car what the right decision should be.
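A common way planners "decide" is to generate candidate trajectories and score them with a hand-tuned cost function (a generic sketch, not any specific company's stack); the construction-zone problem is that no fixed cost terms cleanly encode "the normally-wrong path is now the right one".

```python
# Generic "generate and score" planner sketch: candidates are possible
# trajectories; the cheapest one under a hand-tuned cost function wins.
import numpy as np

def trajectory_cost(traj: np.ndarray, obstacles: np.ndarray, lane_center_y: float) -> float:
    """traj: (T, 2) future ego positions; obstacles: (M, 2) predicted positions."""
    # Safety: penalize passing close to any predicted obstacle.
    dists = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)  # (T, M)
    collision_cost = float(np.sum(np.exp(-dists.min(axis=1))))
    # Legality/comfort: penalize straying from the lane center.
    lane_cost = float(np.mean((traj[:, 1] - lane_center_y) ** 2))
    return 10.0 * collision_cost + lane_cost

def pick_best(candidates: list[np.ndarray], obstacles: np.ndarray, lane_y: float) -> np.ndarray:
    return min(candidates, key=lambda t: trajectory_cost(t, obstacles, lane_y))
```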

In fact, if we look at Waymo and Cruise, most of the failures are related to Decision/Planning. For example, a busy parking lot with a lot of pedestrians walking in front of the AV so the AV is not sure what path to follow. Double parked vehicles in a narrow street, where the AV is not sure whether to wait for the double parked vehicles to move or go around. A parking lot where the normal path is blocked off so the AV is not sure how to get out of the parking lot. A construction zone with cones in the middle of the road and the AV is not sure what lane to be in.

I believe it is also why we see L4 cars, like Waymo and Cruise, with remote operators who provide guidance to the cars. The remote operators don't take over; they merely help the AV make decisions, like path suggestions. That tells me that the AV struggles mostly with Decision/Planning.
 
Nobody says they are only using simulation. Simulation can actually be a great way to expand your edge cases when blended with in-field data collection. For instance, the recent example of Tesla's interpreting a stop sign painted on a truck as a real stop sign. Once you see that edge case, you realize that vehicles can have anything painted on them. Wraps are a thing. Once you realize that, you can program your simulation to wrap random textures on all the vehicles, and get much, much more coverage than you could from trying to collect data from the real world, and you can eventually demonstrate that your system is robust to this issue (like a human is) rather than just being reactive to the specific cases you happened to collect a few times.

It's more concerning for the companies that think they can get away without simulation as a fundamental. Technology is nowhere near the point where we can collect raw video from large quantities of fielded vehicles, and you cannot "fail" in the real world like you can in a simulator, nor can you get the critical feedback signals as efficiently. Any system that only sends back data when it detects an "edge case" suffers from needing to detect the edge case, which is an immense challenge in itself, detecting things that you can't yet detect....

Of course, anyone at least decently experienced in ML knows you almost always have a "data augmentation" step to make your solution space more robust and reduce overfitting. Simulation is essentially this step.
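For comparison, a garden-variety augmentation step looks like this (torchvision used purely as an example library); it widens the training distribution the same way, just far less targeted at specific edge cases than simulation can be.

```python
# Garden-variety augmentation pipeline, torchvision as an example choice.
# (Horizontal flips are shown generically; for driving data they would be
# questionable, since flipping swaps the side of the road.)
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop + rescale
    T.RandomHorizontalFlip(),                     # mirror half the samples
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.ToTensor(),
])
# applied per sample during training: augmented = augment(pil_image)
```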

Tesla uses simulation, no one thinks they can get away without using any simulation.

Rather, it's more concerning that many companies are downplaying the need for more real edge case data (of course because they can't actually get it).

You need data augmentation on top of a lot of edge case data. It is easier to get the augmentation; everyone has some form of it.

Much harder to get the data part - might only be Tesla right now that can (in theory).
 
might only be Tesla right now that can (in theory).
You really think Tesla is the only one that can get data? Waymo has driven 6 million miles autonomously, and Google has imagery of almost every road on the planet.

The counterpoint is that Waymo got to L4 using simulation quicker than Tesla got to L2 using data. It's hard to use L2 data to find edge cases when your feedback signal is really weak, and much easier when you're L4.
 
Waymo has driven 6 million miles autonomously,

Waymo has driven more than that. The 6M miles is only the sample from Chandler from 2019 that Waymo used for their safety study. Waymo has actually driven over 20M autonomous miles in total since they started.

From page 4 of the Waymo Safety Report:

[image: mileage figures from the Waymo Safety Report]
 
I would be very surprised if it was decades away. If we look back at the last decade: in 2011, deep learning was dead. Then in 2012 it took off with AlexNet. Then we had DeepMind's Atari work in 2015, AlphaGo in 2016, and GPT-2 in 2019. In 2021 we have Wu Dao 2.0, at 10x the scale of GPT-3.
GPT-3 Scared You? Meet Wu Dao 2.0: A Monster of 1.75 Trillion Parameters

If we extrapolate this growth another decade, will it be enough to solve self-driving? IMO, yes. The remaining problems seem to be applied engineering: bugs, infrastructure, hardware, verification, legal, etc. But with a lot of money and a lot of talent thrown at the problem, I feel very confident that the next decade will see city street FSD at superhuman performance. I would guess one year, but then I have been wrong before so I will be conservative and say two years. If it takes more, I will consider what is wrong with my model of the world…
 
I feel very confident that the next decade will see city street FSD at superhuman performance. I would guess one year, but then I have been wrong before so I will be conservative and say two years.
From what company? We know this won't be Tesla- Even they have stopped indicating they will have L4 FSD anytime soon. They've had "L2 City Streets Beta" "coming this year" now for 2.5 years and it's still perpetually a few weeks away!
 
From what company? We know this won't be Tesla- Even they have stopped indicating they will have L4 FSD anytime soon. They've had "L2 City Streets Beta" "coming this year" now for 2.5 years and it's still perpetually a few weeks away!
Currently they're only promising "autosteer on city streets"
I think people are underestimating how much money and time has already been spent on autonomous vehicles. Tesla released FSD Beta in October 2020. If we extrapolate from where it is in October 2021, when will it be superhuman? I'm not very hopeful; it seems like we need another breakthrough, not the current incremental progress.
It does seem like vehicles loaded up with 360-degree radar, lidar, and frequent use of remote assistance might be viable. I'm still waiting for Cruise, Waymo, etc. to actually drive enough miles to prove it.
 
it is hard to make a system that can defend against a long-tail of weird, real-world edge-cases reliably and robustly.
This is why I never bought it when Elon would say ‘you’ll be able to remove the steering wheel on your Tesla’ in the not so distant future.

As you said, we may get an incredible ADAS system that takes on most driving drudgery. But it is *trivial* to come up with driving situations that no AI system could conceivably handle, including scenarios where the system would need access to data points it just doesn’t have.
 
Here's an interesting article related to the idea that Tesla will be able to use the natural big data they get from customer owned cars to develop autonomy:


Even Tesla is expanding their synthetic testing capabilities.
 
Exactly. A video game gives you an unlimited data set with a confined dimensionality and confined set of possible outcomes. A video game based on driving (ahem, a simulator) would have been "solved" by now. The actual real challenging issues are in the odd edge case data everyone keeps talking about and the only way to even possibly overcome these is to collect a lot of them.

This is why it's a joke when some companies tout their simulators as being any sort of key reason they will succeed.

Indeed. Also, the visuals themselves are very different from real imagery. So if you deploy a NN trained in a fake simulator with artificial graphics, it might not detect real-world objects well.
As Elon said, in a simulator you already have the answer before you started.
 
Here's an interesting article related to the idea that Tesla will be able to use the natural big data they get from customer owned cars to develop autonomy:


Even Tesla is expanding their synthetic testing capabilities.
Do they give you a Tesla to use? If so, I'm in. Lots of narrow two-way roads. Hills and turns. Farm equipment and Amish horse-and-buggies. Then there is the occasional "water over road" sign. I guess the car could decide if the water is too deep to cross.
 
Also, the visuals themselves are very different from real imagery. So if you deploy a NN trained in a fake simulator with artificial graphics, it might not detect real-world objects well.
This is proof that you have a very delicate system, and shows the current limits of classification.

Humans use simulators with less-than-perfect imagery, because we recognize a car, tree, or stop sign by very general traits and other context. We don't suddenly completely fail to drive a simulated car because the pixels or lighting aren't exactly right. It's for this exact reason that we aren't confused by a stop sign painted on a car, or a bike on a bike carrier. These sims are useful to us because they can teach us the timing, reactions, behaviors and more of a system, even if the images are imperfect, and they can do this faster and more safely than the real thing.

I'd argue that if your system works well on simulated graphics, and then performs in the real world, it's a pretty robust system that is not going to fail when Dodge puts 4 stripes on a car instead of 3, or when Tesla actually releases the Cybertruck.

It's also interesting that Google claims they are way past classification- their simulators are focused on scenarios, not imagery, just like humans use sims for.
 
Hmmm, the "experts" might be right, yet I recall a time when "experts" reported in the wall street journal that it would be at least 100 years before we can fully sequence a human DNA :)
Since we're making fun of "experts" and their incorrect estimates, I can recall a time (Feb 20, 2019) when a certain "expert" said:

I think we will be ‘feature-complete’ on full self-driving this year, meaning the car will be able to find you in a parking lot, pick you up, take you all the way to your destination without an intervention this year. I am certain of that. That is not a question mark

Interestingly, if the 100-year DNA guys are off by 10x and Elon is off by 10x, we're a decade away... But you'd think someone being sure something was less than a year away would be more accurate than someone estimating 100 years out.
 
It's also interesting that Google claims they are way past classification- their simulators are focused on scenarios, not imagery, just like humans use sims for.

I think that's expected. Google has already "solved" the perception part. The scenarios aspect also may have lots of potential edge cases that you can't collect when you only grab data in a handful of cities.

While Tesla will also need lots of edge-case data for the scenarios, they need more edge-case data for their camera/vision system as well. Much more than what the Waymo/lidar folks would need.
 
Indeed. Also, the visuals themselves are very different from real imagery. So if you deploy a NN trained in a fake simulator with artificial graphics, it might not detect real-world objects well.
As Elon said, in a simulator you already have the answer before you started.

The graphics in simulators are pretty realistic IMO.

Voyage released this open source AV simulator:


But I don't think you use simulators to train your camera vision on basic object detection. You should be way past that point when you start using simulators. You are certainly not going to train your camera vision in a simulator and then immediately deploy to the public. I think you use simulators to fine-tune planning and driving policy. Simulators are great for putting your AV in different driving scenarios.