
Elon: "Feature complete for full self driving this year"

Essentially Elon says that this new 3D video labeling will provide 2-3 orders of magnitude improvement in labeling efficiency. That means Waymo has been labeling objects with more than 2-3 orders of magnitude better accuracy and efficiency than Tesla for years...

I doubt Waymo is using lidar purely for localization. They're also using lidar to accurately label the 3d world created using their 360 camera and lidar fusion. Once they have the NN that accurately maps distances and sizes of objects purely based on the 360 camera, they can get rid of the lidar...

Lidar essentially becomes the training wheels for accurate vision-based FSD.
 
There's no known way to accurately produce a depth map using one camera (with the exception of portrait photos on camera phones, and that only works at close distances). There are 3 front cameras, but any distance estimation from the side or back cameras will need to be made from a single image.

I feel like consecutive frame estimation could work here, but I can't be certain. Either way, those distances are usually a lot less important, because you aren't moving in those directions most of the time.
 
I understand what Elon is saying.

I'm just saying that generating a 3D scene using a "2D" 360 camera is inaccurate. The distances and size of objects generated from a NN creating 3D objects from a 2D camera image are going to be inaccurate, whereas lidar will provide cm-accurate distances and sizes.

The vehicle NN is not creating a 3D scene. The computers at Fremont doing the compositing are not running a NN. The labeling likely includes some level of distance/cross-track information for each individual object/frame pair, and that would be fairly accurate due to the integration of multiple images with vehicle ground-truth data (GPS, wheel encoders, parallax, accelerometers), similar to synthetic-aperture radar.
The exact size (cm level) of an object is not critical, nor is exact range estimation of non-lane objects. Stop lines, crosswalks, lane lines, etc. can tolerate some tracking error. Signs can be judged by aspect ratio with a vision system, like the 3D bounding boxes on vehicles.
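To illustrate the sign example, here's a toy sketch of estimating range to an object of roughly known physical size with the pinhole model. The focal length and sign height are my own assumed numbers, not Tesla camera specs:

```
# Toy sketch: range to an object of roughly known physical size (pinhole model).
# Focal length and sign height are illustrative assumptions, not Tesla specs.

def estimate_range_m(real_height_m: float, pixel_height: float, focal_px: float) -> float:
    """Range ~= real size * focal length (in pixels) / apparent size (in pixels)."""
    return real_height_m * focal_px / pixel_height

# A US stop sign is roughly 0.75 m tall; assume a ~1000 px focal length.
print(estimate_range_m(real_height_m=0.75, pixel_height=25, focal_px=1000))  # ~30 m
```

An error of a couple of pixels at that range only shifts the estimate by a couple of meters, which is the sense in which cm-level accuracy isn't the limiting factor.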
 
  • Like
Reactions: nepenthe
The vehicle NN is not creating a 3D scene. The computers at Fremont doing the compositing are not running a NN.

Yes they are.

And 2 images from the same camera 0.018 seconds apart don't allow for parallax distance estimation of a moving object. To do parallax distance estimation, you need two images from two different cameras of the same scene at the same exact timestamp.
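To put numbers on it, here's a toy sketch of the parallax relation itself (all values below are made up for illustration):

```
# Toy sketch of stereo parallax: two cameras, same timestamp, known baseline.
# depth = focal_length_px * baseline_m / disparity_px
# All numbers are illustrative assumptions, not any car's actual camera geometry.

def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    return focal_px * baseline_m / disparity_px

# E.g. 1000 px focal length, 0.2 m between cameras, 5 px disparity -> 40 m.
print(stereo_depth_m(1000.0, 0.2, 5.0))

# With one camera and a moving target, the "baseline" (ego motion between frames)
# and the target's own motion are tangled together, so the same formula breaks down.
```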
 
Centimeter accuracy is for surveyors. When you're talking about a self-driving car traveling at even 35 miles per hour, the difference between one foot and two feet isn't usually meaningful, much less centimeter accuracy.
  • If something is far enough out to be able to act on it meaningfully, you just need to know the approximate distance within ± a few feet and whether it is flat enough to run over or not.
  • If it isn't, it's already too late and you're going to hit it anyway, so the accuracy doesn't buy you anything.
Never in my entire driving career have I even once wondered whether something was 750 or 751 centimeters away. If you don't need to know that information, there's a good chance your car doesn't, either.

This misses what I consider a critical point: Computers are not brains and do not work like brains. Even so-called neural nets only attempt to mimic a brain. They still start and end with numbers, and manipulate those numbers in between. Computers are still, no matter what else, very very fast calculators. The millimeter accuracy of lidar will provide the computer with more accurate numbers going in, and this will provide better numbers coming out.

Elon has said that lidar is a crutch.
Elon is just being ridiculous here. When you break a leg you need a crutch. A crutch is not a bad thing, it's an aid. Calling lidar a crutch is a glib way of dismissing a useful tool. It's like saying that a hammer is a crutch because you can pound nails with a rock. A hammer is better than a rock, and lidar plus cameras is better than cameras alone.

Lidar essentially becomes the training wheels for accurate vision-based FSD.

I disagree. More information is always better, and lidar provides additional information that cameras do not.

Never in my entire driving career have I even once wondered whether something was 750 or 751 centimeters away. If you don't need to know that information, there's a good chance your car doesn't, either.

This gets back to the different way that computers and brains operate: When we see someone walking we understand that they are moving. Computers do not "understand" anything. Highly-accurate range data over time tells the system that something is moving. This is important information that can be obtained from lidar data because of its accuracy, but which a computer cannot infer from the kinds of clues a brain is capable of processing. Range data based on cameras will never be as good.

In short, more information is always better. Elon says lidar is a "crutch," but what he's really doing is crippling his system by refusing to use an available tool. Maybe because of hubris. Something like, "We are so good we can do it without." Or maybe just because he's already sold FSD with the promise that your car has all the necessary hardware, and it would be too difficult or too expensive to add lidar now to all those old cars. This could cost Tesla the lead, and in this race, getting to commercially-viable FSD first is worth billions, maybe hundreds of billions.

We have seen that Elon was wrong when he said our 2018 cars had all the needed hardware. We have seen that he was wrong when he thought Tesla was just a year or two away from robo-taxi-capable cars. And now we see that their code needs a rewrite. It's time to accept that lidar is a necessary tool to achieve this goal. It's time to bite the bullet, accept that the hardware in today's cars cannot reach FSD, deal with the consequences of having sold a pig in a poke, and move on toward the goal.
 
The exact size (cm level) of an object is not critical, nor is exact range estimation of non-lane objects.

Well that depends. If you want to have zero or near-zero serious accidents that are the car's fault, then they matter a lot. If you don't care so much and just accept that occasionally it will kill or badly injure someone, then okay, don't worry too much about it.

Even if Tesla's system is, say, 10x or 100x better than a human driver, that still means they are going to be looking at huge liabilities over the entire fleet. They will need insurance to cover their software on every vehicle with FSD.
 
Even if Tesla's system is, say, 10x or 100x better than a human driver, that still means they are going to be looking at huge liabilities over the entire fleet. They will need insurance to cover their software on every vehicle with FSD.

Tesla already offers insurance in CA and plans to expand to other states.

If the software is provably 10x or 100x safer than a human it'll obviously be a LOT cheaper to insure too.
 
Yes they are.

And 2 images from the same camera 0.018 seconds apart don't allow for parallax distance estimation of a moving object. To do parallax distance estimation, you need two images from two different cameras of the same scene at the same exact timestamp.

Interesting. What trains that NN? It seemed like a direct physics model plus image matching would be a deterministic solution.

The parallax was referring to the 3-D scene generation where you have a baseline of multiple feet to determine object locations.

In the case of driving, parallax gets better the closer the object is, which is also when knowing its location is more important. Any reason Tesla can't feed images from multiple cameras into the same NN, along with two time intervals?
 
In the case of driving, parallax gets better the closer the object is, which is also when knowing its location is more important. Any reason Tesla can't feed images from multiple cameras into the same NN, along with two time intervals?

Using parallax to generate a 3D representation / depth map requires 2 different perspectives of the same scene. It's possible to use parallax to generate a 3D representation using two different image timestamps *IF* the scene is *static*. However, if you have moving objects in a scene, you can't use parallax for 3D representation (with 1 camera) b/c you can't know if the object looks different b/c the object moved or b/c the camera moved.
 
Using parallax to generate a 3D representation / depth map requires 2 different perspectives of the same scene. It's possible to use parallax to generate a 3D representation using two different image timestamps *IF* the scene is *static*. However, if you have moving objects in a scene, you can't use parallax for 3D representation (with 1 camera) b/c you can't know if the object looks different b/c the object moved or b/c the camera moved.

That is true, but there are two situations that can be handled differently.
Assume
  • the camera path is fairly well known
  • there is an algorithm to track object position in a scene (basis of video compression)
For static objects (signs, lights, lanes), their position can be determined (since velocity is 0). If the position estimate has a high error band, that indicates the object is in motion, which leads to:
For dynamic objects (people, cars), their gross path can be determined, but the exact position is not as certain on the first pass. Once an approximate path is determined, it can be used to firm up the positional data in each frame (rough sketch below). Accuracy depends on the path being smooth, which it typically would be at small time scales.
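Here's a rough sketch of that firming-up step, assuming noisy per-frame position estimates and roughly constant velocity over a short window (all numbers made up):

```
# Toy sketch: refine noisy per-frame position estimates by fitting a smooth path.
# Assumes roughly constant velocity over the short window; all numbers are made up.
import numpy as np

t = np.linspace(0.0, 1.0, 11)                         # 11 frames over one second
true_x = 5.0 + 8.0 * t                                 # object moving at 8 m/s
noisy_x = true_x + np.random.normal(0, 0.5, t.size)   # per-frame estimation noise

# Fit a low-order polynomial (here a line) to the whole track,
# then re-evaluate it to get a refined position for every frame.
coeffs = np.polyfit(t, noisy_x, deg=1)
refined_x = np.polyval(coeffs, t)

print("per-frame error:", np.abs(noisy_x - true_x).mean())
print("refined error:  ", np.abs(refined_x - true_x).mean())
```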

Even with poor path accuracy, the ability to track the object through the screen allows for a single label to be applied to all instances of that object in the source frames.
 
Even with poor path accuracy, the ability to track the object through the screen allows for a single label to be applied to all instances of that object in the source frames.

Sure, I just find using vision alone to track object distances, sizes, and paths very troublesome. I can't really think of a way for Tesla to properly label objects in this type of detail for anything but the front cameras, which are fused with the radar data.

We really can't apply human thinking and experience to neural networks. They need to be fed data that is as accurate as possible in order to derive the right statistical probabilities. That's why neural networks have been so successful in games like StarCraft and DotA. The neural nets are fed exact information from the game's data. Although StarCraft and DotA are imperfect-information games, the information that *is* presented is perfect. That is not true for Tesla's new approach.
 
  • Like
  • Helpful
Reactions: croman and mongo
As far as I can tell, the academic research on monocular distance estimation seems pretty robust. You can find articles on it going back to 2014 or so; here's one from 2019 relevant to autonomous driving: FisheyeDistanceNet: Self-Supervised Scale-Aware Distance...

As for training the network, I think Tesla uses the forward-facing radar to train their distance estimating neural networks.
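Nobody outside Tesla knows the actual training setup, but the basic idea of using radar returns as distance labels for a camera-only network would look roughly like this sketch (DistanceNet and the data here are hypothetical stand-ins):

```
# Minimal sketch of radar-supervised distance training. This is speculation about
# the general technique; DistanceNet and the data are hypothetical stand-ins.
import torch
import torch.nn as nn

class DistanceNet(nn.Module):
    """Tiny CNN that regresses a single distance from an image crop of an object."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = DistanceNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One hypothetical training step: the radar return for the same object provides
# the regression target for the camera-only network.
image_crops = torch.randn(8, 3, 64, 64)        # stand-in for camera crops of objects
radar_range_m = torch.rand(8, 1) * 100.0       # stand-in for matched radar distances

optimizer.zero_grad()
loss = nn.functional.smooth_l1_loss(model(image_crops), radar_range_m)
loss.backward()
optimizer.step()
```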

EDIT: Paper included a YouTube video:

 
As far as I can tell, the academic research on monocular distance estimation seems pretty robust. You can find articles on it going back to 2014 or so; here's one from 2019 relevant to autonomous driving

As I said, it's possible to use a single camera to build a depth map of a static scene. If you know the car's position in the scene and have two images from the same camera at different timestamps, you can derive distance estimates from the two images.

This isn't possible with good accuracy for moving objects in the scene, especially fast-moving ones.
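A toy sketch of the static-scene case, assuming the forward distance travelled between the two frames is known (e.g. from wheel odometry) and the camera points along the direction of travel; all numbers are made up:

```
# Toy sketch: depth of a STATIC point from two frames of one forward-moving camera.
# Assumes the camera points along the direction of travel and the distance moved
# between frames is known (e.g. wheel odometry). If the point itself is moving,
# its motion and the car's motion are mixed together in the pixel shift, so this
# calculation no longer holds.

def depth_from_forward_motion(x1_px: float, x2_px: float, travel_m: float) -> float:
    """x1_px, x2_px: horizontal offset of the point from the image centre in the
    first and second frame. Returns the point's depth at the first frame."""
    return travel_m * x2_px / (x2_px - x1_px)

# Car moves 1 m between frames; a roadside point drifts from 50 px to 52 px off-centre.
print(depth_from_forward_motion(50.0, 52.0, 1.0))  # ~26 m
```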
 
I disagree. It's possible to achieve FSD without lidar. However, you need lidar initially to properly label distances and sizes from your 360 cameras, similar to how Tesla is using the front radar to label distances and sizes.

It may very well be possible to achieve a self-driving car that's a better driver than a human with cameras only. But a system that has both cameras and lidar will be better because it will have more information.

In addition, time is significant. We don't just want FSD some day, we want it as soon as possible. And Tesla is in a race with several other companies. What happens to Tesla if Waymo gets there first and Toyota leases the technology and starts selling fully autonomous cars and Tesla is still making promises about "Really soon now!"? If that happens and Toyota puts that technology in an electric car, I'll sell my Tesla in a heartbeat and buy that Toyota. The re-sale value of Tesla's FSD package will be nil because people who want FSD will buy the Toyota.

Lidar could make all the difference in who gets there first and reaps the reward of being the only company selling FSD.
 
  • Like
Reactions: diplomat33
No, that's an HD map no matter how you try to dress it up. There's nothing that restricts it to traffic-related features. For example, Mobileye's HD map includes stop signs and potholes. To avoid a pothole, you need to know precisely where it is.

No surprise that Tesla fans are already contradicting themselves.
The HD map Elon is describing is the same HD map a driver familiar with the road has in their head, only crowdsourced from cars and digitized.

You can perfectly well drive in a new location without maps, by vision only. But it doesn't make sense for a human driver to hit the same pothole every day when you know it's there. In the same way, it doesn't make sense for other Teslas to drive into the pothole once a few cars ahead of you already have.

Maps help, but they should definitely not be required to drive. If the road is covered in snow, they help you know where the lanes split and merge even if you can't see the markings on the road, the same way a familiar driver would know, whereas a new driver would likely drive somewhere else because they don't know where the lanes are under the snow.
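Just to make the crowdsourced-pothole idea concrete, a trivial sketch (the coordinates and threshold are made up):

```
# Toy sketch: check the car's current position against crowd-reported hazards.
# Coordinates and radius are made-up illustration values.
import math

reported_potholes = [(37.4855, -122.1478), (37.4861, -122.1490)]  # (lat, lon)

def near_hazard(lat: float, lon: float, radius_m: float = 15.0) -> bool:
    """True if the point is within radius_m of any reported pothole."""
    for p_lat, p_lon in reported_potholes:
        # Rough small-distance approximation: 1 degree of latitude ~ 111 km.
        d_m = math.hypot((lat - p_lat) * 111_000,
                         (lon - p_lon) * 111_000 * math.cos(math.radians(lat)))
        if d_m < radius_m:
            return True
    return False

print(near_hazard(37.4856, -122.1479))  # True: worth planning a small lateral offset
```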
 
In addition, time is significant. We don't just want FSD some day, we want it as soon as possible. And Tesla is in a race with several other companies. What happens to Tesla if Waymo gets there first and Toyota leases the technology and starts selling fully autonomous cars and Tesla is still making promises about "Really soon now!"? If that happens and Toyota puts that technology in an electric car, I'll sell my Tesla in a heartbeat and buy that Toyota. The re-sale value of Tesla's FSD package will be nil because people who want FSD will buy the Toyota.

Lidar could make all the difference in who gets there first and reaps the reward of being the only company selling FSD.

I think you also need to consider how cost-prohibitive LIDAR is at the moment. The Verge released a video on Waymo in December last year where they estimated that the sensors and compute for a single test vehicle cost $400,000. That massive up-front investment in sensors means that while Tesla can make money shipping their camera-based vehicles across the world, it would cost Google millions and millions of dollars to even approach a small fleet in all 50 states, let alone in every country.

LIDAR may give Waymo and others the advantage in small-scale rollout, but affordability gives Tesla the advantage in global rollout and global data gathering.
 
  • Informative
Reactions: daniel
Ugh, this new approach that Elon is talking about stinks. I had some faith in Tesla after Autonomy Day, but after Elon mentioned during the earnings call that they got close to feature complete, along with his mention of rewriting the entire Autopilot foundation, it seems to me that Tesla's approach was not producing the level of accuracy and consistency required for feature complete and the trailing 9's needed for an FSD rollout.

Essentially, the initial assumptions and criteria set by Tesla for FSD were not met with the approach presented during Autonomy Day, so the entire code needed to be rewritten. And with that problem in mind, there is very little confidence that this new code will somehow meet the criteria needed for feature complete and FSD rollout.

Also, the whole FSD preview thing was simply to satisfy some aspect of "feature complete" before end of 2019... it wasn't actually a step towards feature complete at all........

I'm thinking Waymo's approach is actually inching towards FSD better than Tesla's, since they have "real" information from their sensor suite.
 
It may very well be possible to achieve a self-driving car that's a better driver than a human with cameras only. But a system that has both cameras and lidar will be better because it will have more information.

You have two watches and they disagree, which do you use to tell time?
If you trust one watch over the other, why do you have two?

In addition, time is significant. We don't just want FSD some day, we want it as soon as possible. And Tesla is in a race with several other companies. What happens to Tesla if Waymo gets there first and Toyota leases the technology and starts selling fully autonomous cars and Tesla is still making promises about "Really soon now!"? If that happens and Toyota puts that technology in an electric car, I'll sell my Tesla in a heartbeat and buy that Toyota. The re-sale value of Tesla's FSD package will be nil because people who want FSD will buy the Toyota.
And with a Tesla, you would get it OTA as soon as it is ready.
With Toyota, you need to wait till they perfect the SW and the HW, and integrate it into a car, and sell enough cars for you to buy one. At which point, it probably won't be electric (unless you have H2 available).
Even if Tesla were 2 years behind Toyota on FSD, it probably wouldn't matter. Tesla would still be selling cars, just perhaps without the FSD option.

Lidar could make all the difference in who gets there first and reaps the reward of being the only company selling FSD.
"Only" lasts for some span of time, and again, it only matters when there are saleable cars at competitive price points. Lidar could also be the difference in the negative direction.
 
"Only" lasts for some span of time, and again, it only matters when there are saleable cars at competitive price points. Lidar could also be the difference in the negative direction.

Again, lidar isn't required for FSD, but it's extremely helpful for labeling camera objects, especially for the 3D labeling approach Elon is talking about. You can use lidar to develop FSD without it being required in the final product.

The whole Tesla has more data argument is moot if the data is inaccurate.

Essentially, what lidar allows you to do is create a 3D representation of the environment along with persistent object volumes moving around in that environment. You can then use the camera images as textures wrapped around those persistent object volumes, and feed the characteristics of the 3D objects (size, distance, path, speed, etc.), along with the camera video expressed as textures on those objects, into the neural network as training data. Once the network has seen enough data to deduce the 3D objects (size, distance, path, speed, etc.) purely from camera video, you can get rid of the lidar.
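As a rough sketch of that labeling step (my own illustration; the calibration values are placeholders, not any real sensor setup), the idea is to project the lidar points into the camera image so that every object detected in the image picks up an accurate range label:

```
# Toy sketch: project lidar points into a camera image so detected objects can be
# tagged with accurate ranges, which become training labels for a camera-only net.
# The intrinsics/extrinsics below are placeholders, not real calibration.
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],      # camera intrinsics (focal length, centre)
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                             # lidar-to-camera rotation (placeholder)
t = np.array([0.0, -0.5, 0.0])            # lidar-to-camera translation (placeholder)

lidar_points = np.array([[2.0, 0.0, 20.0],    # x right, y down, z forward, metres
                         [-1.5, 0.2, 35.0]])

cam_points = lidar_points @ R.T + t           # transform into the camera frame
proj = cam_points @ K.T
pixels = proj[:, :2] / proj[:, 2:3]           # perspective divide

for (u, v), p in zip(pixels, cam_points):
    print(f"pixel ({u:.0f}, {v:.0f}) -> range label {np.linalg.norm(p):.1f} m")
```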
 