
AI experts: true full self-driving cars could be decades away because AI is not good enough yet

But I don't think you use simulators to train your camera vision on basic object detection. You should be way past that point when you start using simulators.
Which is why it's somewhat concerning that Karpathy's explanation of what they use the fleet for is stuff like collecting unexpected or obscured stop signs, based on hard-coded triggers. It appears Tesla is still very much in the classification stage.
 
Interestingly, if the 100-year DNA guys are off by 10X, and Elon is off by 10X, we're a decade away...
The geometric mean of super-optimistic and super-pessimistic.

For technology predictions, one error comes from not admitting there are problems you don't yet know about; the other comes from not admitting that others can find ways around the problems you think you know.
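If you want the arithmetic made explicit, here's a two-line sketch (the timeframes are the ones from the post above, not real estimates):

```python
import math

# If the pessimists' 100-year estimate is 10x too long and the
# ~1-year optimistic estimate is 10x too short, the geometric mean
# splits the difference multiplicatively.
pessimistic_years, optimistic_years = 100, 1
print(math.sqrt(pessimistic_years * optimistic_years))  # 10.0 -> a decade away
```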
 
Which is why it's somewhat concerning that Karpathy's explanation of what they use the fleet for is stuff like collecting unexpected or obscured stop signs, based on hard-coded triggers. It appears Tesla is still very much in the classification stage.

Yes, Tesla seems very much still in the classification stage, basically "solving perception", with some planning thrown in to navigate the car appropriately (following a route, avoiding objects). I think part of this is because of Tesla's vision-only approach. When camera vision is all you have, your camera vision has to do everything. You'd better make sure that your camera vision can classify everything relevant. You will need camera vision to classify roads, lanes, road signs, vehicles, pedestrians, cyclists, road debris, static objects, etc. There is a lot that needs to be classified, especially when aiming for L5, so I am not surprised that Tesla is still in the classification stage.
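To make the breadth concrete, here's a toy sketch of the kind of label taxonomy a vision-only stack has to cover end to end (the class list and the interface are illustrative assumptions, not Tesla's actual labels):

```python
from typing import List, Tuple

# Illustrative taxonomy: with cameras as the only sensor, every one of
# these must be classified by vision, with no other modality to fall back on.
RELEVANT_CLASSES = [
    "road", "lane_line", "road_sign", "traffic_light", "vehicle",
    "pedestrian", "cyclist", "road_debris", "static_object",
]

def detect(frame) -> List[Tuple[str, float]]:
    """Hypothetical detector interface: (class_name, confidence) per object.
    A miss in *any* category is safety-relevant when aiming for L5."""
    raise NotImplementedError  # stand-in for the perception network
```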
 
  • Like
Reactions: Matias
I am curious when this 100-year statement was made.
According to Wikipedia the Human Genome Project was expected to take 15 years and took 13.
Was the statement made before a major breakthrough in sequencing technology? Obviously a breakthrough leading to artificial general intelligence could get us L5 vehicles way faster than expected.
I think there is something about AI technology that makes people way too optimistic: History of artificial intelligence - Wikipedia
 
  • Like
Reactions: diplomat33
This is proof of how delicate the system is, and of the current limits of classification.

Humans use simulators with less-than-perfect imagery because we recognize a car, tree, or stop sign by very general traits and other context. We don't suddenly fail completely to drive a simulated car because the pixels or lighting aren't exactly right. It's for this exact reason that we aren't confused by a stop sign painted on a car, or a bike on a bike carrier. These sims are useful to us because they can teach us the timing, reactions, behaviors, and more of a system, even if the images are imperfect, and they can do this faster and more safely than the real thing.

I'd argue that if your system works well on simulated graphics, and then performs in the real world, it's a pretty robust system that is not going to fail when Dodge puts 4 stripes on a car instead of 3 or Tesla actually releases the Cybertruck.

It's also interesting that Google claims they are way past classification: their simulators are focused on scenarios, not imagery, just like the sims humans use.

The graphics in simulators are pretty realistic IMO.

Still pretty crap as realism goes. No realistic lighting, pristine environments, etc. It is a very difficult undertaking to get the lighting and shading right so the imagery of signs and the like is correct. You also need to simulate how weather behaves, random things happening, and so on. You need to train on real-world input; you might test in the simulator based on the real imagery.


But I don't think you use simulators to train your camera vision on basic object detection. You should be way past that point when you start using simulators. You are certainly not going to train your camera vision on a simulator and then immediately deploy to the public. I think you use simulators to fine tune planning and driving policy. Simulators are great for putting your AV in different driving scenarios.
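A minimal sketch of that closed-loop use of a simulator (the scenario names and the simulator/policy APIs are invented for illustration):

```python
# Hypothetical harness: the simulator supplies ground-truth perception,
# so only the planning / driving-policy layer is being exercised.
SCENARIOS = ["unprotected_left", "highway_cut_in", "jaywalker", "construction_zone"]

def evaluate_policy(policy, simulator):
    results = {}
    for name in SCENARIOS:
        sim = simulator.load(name)      # hypothetical API
        state = sim.reset()
        while not sim.done():
            action = policy.act(state)  # the decision logic under test
            state = sim.step(action)
        results[name] = sim.metrics()   # collisions, comfort, rule violations
    return results
```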

You are writing the test you are gonna take, so it's "cheating", and won't give you results as representative as the real thing.
Simulators could be used for basic tests of your driving policy logic (not machine learning) before testing it out in the field, as a first stage.
 
Still pretty crap as realism goes. No realistic lighting, pristine environments, etc. It is a very difficult undertaking to get the lighting and shading right so the imagery of signs and the like is correct. You also need to simulate how weather behaves, random things happening, and so on. You need to train on real-world input; you might test in the simulator based on the real imagery.

Again, simulators are not meant to train basic camera vision or replace real world testing. So the graphics don't need to be perfect. Simulators are meant for testing planning and driving policy. You will still do real world testing.

You are writing the test you are gonna take, so it's "cheating", and won't give you results as representative as the real thing.

You are writing the test but you are not writing the answers. You still put your FSD software in the simulator and see how it handles things.

Simulators could be used for basic tests of your driving policy logic (not machine learning) before testing it out in the field, as a first stage.

Yes, exactly. That is what simulators are used for. They don't replace real world testing.
 
  • Like
Reactions: emmz0r
Still pretty crap as realism goes. No realistic lighting, pristine environments, etc. It is a very difficult undertaking to get the lighting and shading right so the imagery of signs and the like is correct. You also need to simulate how weather behaves, random things happening, and so on. You need to train on real-world input; you might test in the simulator based on the real imagery.
Might want to check out Flight Sim 2020. These are images from a $40 program on a normal $1000 desktop PC, and are real-time, flyable environments with dynamic weather. Clearly "crap" that could never pass as real, and you can see how something trained on this would immediately fail in the real world.

[Four Flight Sim 2020 screenshots]


If your system is sensitive to the errors in these images, I really wonder how it will handle the variation in the real world.
 
  • Like
Reactions: KrenGrl
Speaking of AV simulations, this is how Waymo uses simulations:

Simulation Testing
In simulation, we rigorously test any changes or updates to our software before they’re deployed in our fleet. We identify the most challenging situations our vehicles have encountered on public roads, and turn them into virtual scenarios for our self-driving software to practice in simulation. We also review data from crash databases and naturalistic driving studies to identify other possible collision scenarios and develop tests accordingly.

Page 22 of the Waymo Safety Report.
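A hedged sketch of what that process might look like as a regression loop; every name below is a hypothetical stand-in, not Waymo's actual tooling:

```python
# Logged hard cases become repeatable virtual tests that every new
# software build must pass before it is deployed to the fleet.
def regression_suite(software_build, scenario_db, simulate):
    failures = []
    for scenario in scenario_db.hardest_cases():  # mined from road logs
        outcome = simulate(software_build, scenario)
        if outcome.collision or outcome.disengaged:
            failures.append(scenario.id)
    return failures  # empty list required before fleet deployment
```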
 
If your system is sensitive to the errors in these images, I really wonder how it will handle the variation in the real world.
Neural nets are susceptible to pixel attacks. Intentionally feed them just a few incorrect pixels and they'll mislabel almost anything as almost anything else. They're extremely trivial to attack.

Communications of the ACM published a piece ( DOI:10.1145/3460218 ) this month talking about multiple organizations working on this issue, and what a dire issue it actually is. There have been major lawsuits against "AI" companies that do image processing for oncology groups and they're basically a crap shoot. IBM's Watson is a lawsuit magnet because it basically can't deliver on the promises. At the end of the day, these things are parlor tricks still.
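For a concrete flavor of how fragile these models can be, here's a minimal sketch of the fast gradient sign method (FGSM), a classic gradient-based attack. It nudges every pixel slightly rather than flipping just a few (few-pixel attacks like the one-pixel attack use a different search), but the effect is the same: a visually imperceptible change, a confidently wrong label. Assumes a PyTorch classifier.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """Perturb input x in the direction that increases the loss.
    A tiny epsilon is often enough to flip the predicted class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step: near-identical to the eye, often misclassified.
    return (x + epsilon * x.grad.sign()).detach()
```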
 
One way to think about it is, does a car need to be as smart as or smarter than a human, or as smart as a bird? Birds do very well, better than humans in some cases. They can navigate the oceans without maps or, to our knowledge, any great understanding of the world. They can take evasive action when necessary; it's amazing really. But birds also fly into the windows of my house. During lockdown one titmouse (yes, that's a bird, I didn't name the poor thing) would peck at its reflection on one of our windows, amusing us greatly, and this spring a bird miscalculated the intersection of its vector and my Tesla and became vulture food. I couldn't stop in time either, but I was going faster and was mistakenly confident that the bird would do what its companion did and not fly in front of the frigging car (a mere altitude correction would have been easy there).

So are birds a bad model for a FSD car? Would they be less safe? Hard to tell, but I bet they would probably be fine. Birds don’t text, they don’t drink, and they don’t fly recklessly for kicks (ok, maybe they do, I’m not a bird), and they don’t have a years long learning curve for flying. Those are some leading causes of death right there.

If FSD can achieve bird intelligence, it would be good enough. If accident rates and death rates demonstrably begin to drop for AI vehicles (e.g., it's not Elon who says so but the insurance agencies), things could accelerate greatly.
 
  • Helpful
Reactions: mikes_fsd
Of course, anyone at least decently experienced in ML knows you almost always have a "data augmentation" step to make your solution space more robust and reduce overfitting. Simulation is essentially this step.

Tesla uses simulation, no one thinks they can get away without using any simulation.

Rather, it's more concerning that many companies are downplaying the need for more real edge case data (of course because they can't actually get it).

You need data augmentation on top of a lot of edge case data. It is easier to get the augmentation, everyone has some form.

Much harder to get the data part; Tesla might be the only one right now that can (in theory).

That's not simulation, that's data augmentation.
Let me ask you this: which cars in this image (a frame from a video) are fake? That's how good simulation has gotten.
You can superimpose depth-aware smart agents into a video, and they will drive realistically, adapt to the other cars and the environment in the video, and be shaded with the same lighting as the video.
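Conceptually (every function below is a hypothetical placeholder, not a real library), that compositing pipeline looks something like:

```python
# Hypothetical sketch of injecting a simulated agent into real footage:
# depth makes placement and occlusion consistent, and estimated lighting
# makes the render match the frame.
def inject_agent(frame, depth_map, agent_model):
    lighting = estimate_lighting(frame)   # per-frame illumination estimate
    pose = sample_free_pose(depth_map)    # depth-consistent placement
    rendered = render(agent_model, pose, lighting)
    return composite(frame, rendered, depth_map, pose)  # correct occlusion
```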

[Video frame showing simulated cars composited into real footage]


This is because new techniques and architectures are making it into the AV industry.


The link and videos below are a good primer on AV simulation, which has gotten a bad rap.

  • Primer #1


  • Primer #2

 
YES

IT

IS

I've harped on this many times in previous posts, no need to go through this again. Suffice it to say, simulation/augmentation is important and necessary to add new types of cases, but it alone is not sufficient.

No one is saying simulation isn't necessary, but even great simulation alone will not overcome a deficit of edge case data.

The question becomes - what is a sufficient amount of edge case data? Well based on my experience, it's probably a lot.

Autonomous cars might be the hardest machine learning / algorithmic challenge humans have encountered.

Not only is the 1) dimensionality/complexity extremely high, but 2) the accuracy requirements are also extremely high. The amount of data needed scales non-linearly as either 1 or 2 increases, and both are very high in this case.

So yes, my intuition based on previous ML problems is that no, collecting some data in some cities will not be sufficient. We will need to collect data basically everywhere. My guess is Waymo/Cruise are not on a path to collect enough data to deploy everywhere in the U.S. yet. Their areas of collection are too focused. Maybe Tesla won't even get enough data.
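As a toy illustration of that non-linear scaling (my own back-of-the-envelope assumption, not a rigorous bound): if error shrinks like N^(-1/d) for an effective problem dimension d, the data requirement explodes as either accuracy or dimensionality grows.

```python
# Toy model: error ~ N^(-1/d)  =>  N ~ eps^(-d)
def samples_needed(eps, d):
    return eps ** (-d)

print(f"{samples_needed(0.1, 2):.0e}")   # 1e+02
print(f"{samples_needed(0.01, 2):.0e}")  # 1e+04  (10x accuracy -> 100x data)
print(f"{samples_needed(0.01, 6):.0e}")  # 1e+12  (high dim + high accuracy)
```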



It's entirely speculative.
 
BTW, the real question is why should anyone trust any of these companies as telling the truth as to what is needed? Every single company has bias and skewed motivations that affect their decision making as well as what they say publicly.

Tesla is biased against LIDAR because it's not practical for them to put them into every car (costs too much right now). Tesla is biased toward saying more data matters because they have that advantage. Tesla is biased toward saying current cars can achieve FSD because they want people to buy in (mentally and $$$).

Waymo/Cruise are biased toward LIDAR because it makes perception easier and made getting decent working versions happen much more quickly. They have investors they need to appease. They are biased against saying more data is needed because they know they cannot get it in any cost-effective manner and their investors would walk away.

Maybe I need to say that again. All of these self-driving startups cannot say a lot of data is needed or else they would not get funding.

So, don't rely on any of these people and the buzzwords they throw out there to tell you the real truth. My assessment is quite independent of what the companies say; I'm just basing it on my experience in algorithm development, signal processing, and machine learning. I laughed when Tesla was claiming they were going to do FSD with a processor that required them to downsample the images and feed in only 2 images to make a decision.
 
YES

IT

IS

Data Augmentation is when you manipulate your training set for your ML model (flip it, rotate it, change color, saturation, add noise, etc) so that your model doesn't overfit.
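For anyone following along, here's a minimal torchvision version of exactly that list (the angles and noise level are arbitrary picks):

```python
import torch
from torchvision import transforms

# Flip, rotate, jitter color/saturation, add noise: the classic recipe for
# making a model invariant to variations that don't change the label.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0, 1)),
])
```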


BTW, the real question is why should anyone trust any of these companies as telling the truth as to what is needed? Every single company has bias and skewed motivations that affect their decision making as well as what they say publicly.

Tesla is biased against LIDAR because it's not practical for them to put them into every car (costs too much right now). Tesla is biased toward saying more data matters because they have that advantage. Tesla is biased toward saying current cars can achieve FSD because they want people to buy in (mentally and $$$).

Waymo/Cruise are biased toward LIDAR because it makes perception easier and made getting decent working versions happen much more quickly. They have investors they need to appease. They are biased against saying more data is needed because they know they cannot get it in any cost-effective manner and their investors would walk away.

Maybe I need to say that again. All of these self-driving startups cannot say a lot of data is needed or else they would not get funding.

So, don't rely on any of these people and the buzzwords they throw out there to tell you the real truth. My assessment is quite independent of what the companies say; I'm just basing it on my experience in algorithm development, signal processing, and machine learning. I laughed when Tesla was claiming they were going to do FSD with a processor that required them to downsample the images and feed in only 2 images to make a decision.

I didn't reference anything that consisted of buzzwords or marketing. I posted information on what makes up a simulation system, how SOTA AV simulation is advancing, and how the latest advances in ML are fueling the improvement of simulation.

Maybe I need to say that again. All of these self-driving startups cannot say a lot of data is needed or else they would not get funding.

Lastly, there are more AV companies than Waymo and Cruise. There are companies who actually have consumer cars with the exact same sensors they use in their AV system, and yet they don't say a lot of data ("billions of miles") is needed.
 
Data Augmentation is when you manipulate your training set for your ML model (flip it, rotate it, change color, saturation, add noise, etc) so that your model doesn't overfit.


That's what simulations do in the 4D driving space, analogous to your 2D image examples. Simulations change where the cars are, try different layouts, change colors, change everything, and create tons of instances of all the variables we know about.

It's the same concept and provides the same benefits.

Lastly, there are more AV companies than Waymo and Cruise. There are companies who actually have consumer cars with the exact same sensors they use in their AV system, and yet they don't say a lot of data ("billions of miles") is needed.

None of these companies have the capabilities to collect a lot of data. Ergo, they will not say they need a lot. It's pretty obvious.

Assessment of what the "correct" approaches are should be made without influence of what any company says (including Tesla), because they are all biased.
 
Data Augmentation is when you manipulate your training set for your ML model (flip it, rotate it, change color, saturation, add noise, etc) so that your model doesn't overfit.
That is how data augmentation originally started out when training vision networks, as it was the lowest-hanging fruit for introducing new variations to the data that you want your model to be invariant to. Adding additional scenarios to further this, either via simulations or other approaches like using GANs to create more data to train on, can be and is considered data augmentation. The purpose of all of these is ultimately the same: to add more variety and richness to the source data used for training, variety you want the network to be invariant to, so that it can generalize better across a wider set of inputs in the real world.

On the other hand, if you consider simulations for training policies and not the underlying perception networks, one could argue that it is not really data augmentation so much as providing truly unique data for unique scenarios to train against. At the end of the day, it is not very meaningful to debate whether something is or isn't data augmentation. The end goal is the same. What really matters is how much "novel" information your data augmentation scheme can introduce to make your networks more robust.

While simulations are a powerful tool and can be very helpful, I don't see them being as useful in capturing the long tail of exceptional/weird events that humans deal with every day without even thinking about it. E.g., a large plastic bag flutters across the front of the car and the Lidar freaks out because there is an obstacle in the way, when humans know they don't need to worry about it at all and will never hit the brakes in this scenario.
 
  • Like
Reactions: ZeApelido
The idea that "Tesla's fleet of driving cameras gives them an enormous advantage" is a nice thought, but then why wasn't that thought put into practice to prevent another fatal crash, the 2019 Florida crash of a 2018 Model 3?

This article has numerous red flags in it. I wouldn't even say it was a fault with Autopilot so much as with Collision Avoidance. Autopilot just gets the press.

"Banner activated Autopilot's traffic-aware cruise control feature 12.3 seconds before the fatal crash. He then activated Tesla's Autosteer feature 9.9 seconds before impact."

And then what? Flossed his teeth? It seems the error was failing to detect the truck crossing the road, by both car and human.

"Truck entered the roadway in front of Banner's car about 4.5 seconds before the crash. Banner was traveling at almost 70 miles per hour; the truck was traveling about 10 miles per hour. Neither vehicle slowed down or took other evasive actions during those 4.5 seconds."

I used to live in Florida ... these "highways" without stop lights and cross traffic ... dangerous stuff. There's a reason why you see a dozen semi-truck injury lawyer billboards outside the Jacksonville airport.

But of course it's Tesla's fault because in this case the radar looked under the truck or something. Improvement is needed but humans count too.

One of the comments: "And YES it was very much autopilot that had him going 70 in a 55"

Says someone who has never used it. The driver always needs to increase the speed manually.
 
That is how data augmentation originally started out when training vision networks, as it was the lowest-hanging fruit for introducing new variations to the data that you want your model to be invariant to. Adding additional scenarios to further this, either via simulations or other approaches like using GANs to create more data to train on, can be and is considered data augmentation. The purpose of all of these is ultimately the same: to add more variety and richness to the source data used for training, variety you want the network to be invariant to, so that it can generalize better across a wider set of inputs in the real world.

On the other hand, if you consider simulations for training policies and not the underlying perception networks, one could argue that it is not really data augmentation so much as providing truly unique data for unique scenarios to train against. At the end of the day, it is not very meaningful to debate whether something is or isn't data augmentation. The end goal is the same. What really matters is how much "novel" information your data augmentation scheme can introduce to make your networks more robust.

While simulations are a powerful tool and can be very helpful, I don't see them being as useful in capturing the long tail of exceptional/weird events that humans deal with every day without even thinking about it. E.g., a large plastic bag flutters across the front of the car and the Lidar freaks out because there is an obstacle in the way, when humans know they don't need to worry about it at all and will never hit the brakes in this scenario.

Here's a plastic bag in front of a Waymo... I see no freak-outs.
This is a prime example of why simulation at every level is necessary. Let's say you have driven 10 million miles. You can drop photorealistic, relit, geometry-aware animated plastic bags into those 10 million miles at the lidar and camera level and see how your perception and driving policy react. Your car either passes without hard braking, or you now have the exact scenarios you need to address. I think you don't quite understand the gravity of photorealistic, geometry-aware, relit smart actors.

If we are going to solve L4 everywhere / L5, it will only come through simulation at scale. Another example is solving Tesla's phantom-braking issue with shadows. You take your 10 million miles of video clips and, using SOTA mono-to-depth, relight them with different lighting, producing different shadows, so you can test your system to see where, and with what types of shadows, it will phantom-brake.
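A hedged sketch of that shadow-sweep idea (relight(), run_stack(), and the clip store are hypothetical placeholders, not any vendor's real API):

```python
# Re-render each logged clip under several sun angles and flag any
# unjustified hard-braking events the stack produces.
SUN_ANGLES = [15, 30, 45, 60, 75]  # degrees above the horizon

def shadow_sweep(clips):
    failures = []
    for clip in clips:
        for angle in SUN_ANGLES:
            relit = relight(clip, sun_angle=angle)  # mono-to-depth + relighting
            events = run_stack(relit)               # full perception + planning
            if any(e.kind == "hard_brake" and not e.justified for e in events):
                failures.append((clip.id, angle))
    return failures  # the exact clips and lighting that trigger phantom braking
```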

[Embedded video, queued to 26m 50s]


 