
Andrej Karpathy - AI for Full-Self Driving (2020)

So you solve vision to, say, 95. LIDAR is still back there at 85, and it’s not going to contribute toward the finish line because we already baked its capabilities into the 85%-capable system. We’ve passed LIDAR’s limits. So we’re at 95 on the scale with vision, LIDAR is done contributing in any meaningful way, and vision is doing the heavy lifting in nearly every scenario. If we don’t pass 95, then there’s no L5 for anyone, and LIDAR isn’t going to help. It’s already accounted for.

This assumes that the 85 that lidar solved wholly overlaps with the 95 that CV solved. What if 4 of the 85 that lidar solved were not in the 95 that CV solved? Would that not bring it cumulatively to 99?
 
I guess you have never been in a downtown area.

Now we are creating specific scenarios? Because I can create specific scenarios that you can't drive in with a camera. The point is: can you drive with lidar only? And the answer is YES.

Yes, but that isn't relevant to the conversation. As a tech, were you allowed to go straight? Because your crippled Robotaxi couldn't (without possibly breaking the law).

Can you drive with Lidar only? Yes. Like I told you, I can go from my house to my work, 20 miles away, without going straight through a traffic-light-controlled intersection.

So, can you drive with Lidar only? Yes.
Can you drive with Camera only? Yes.
Can you reach a 99.99999% level of safety with Lidar only? Currently no.
Can you reach a 99.99999% level of safety with Camera only? Currently no.

If you have two completely independent ways to sense a person, then because of the differing pros/cons of each independent system, they won’t fail in a correlated way. If either sensor sees a person, you act as if a person is there. And because of independence, you assume that both sensor systems missing a real person is so improbable that it will essentially never happen.

So if you had a camera system with a 99.99% success rate,
and then you add a lidar/radar system with a 99.99% success rate,
this would effectively result in a system with well over a 99.99999% success rate.
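For what it's worth, here is a minimal back-of-the-envelope sketch of that multiplication, assuming the two systems really do fail independently (the 99.99% figures are the illustrative numbers from the post, not measured values):

```python
# Back-of-the-envelope: combined reliability of two independent detectors.
# Illustrative numbers from the post above, not measured values.
p_fail_camera = 1e-4         # camera misses 0.01% of the time (99.99% reliable)
p_fail_lidar_radar = 1e-4    # lidar/radar misses 0.01% of the time (99.99% reliable)

# If (and only if) failures are independent, both miss at the same time
# with probability equal to the product of the individual miss rates.
p_both_fail = p_fail_camera * p_fail_lidar_radar   # 1e-8
combined_reliability = 1 - p_both_fail             # 0.99999999 -> 99.999999%

print(f"P(both miss) = {p_both_fail:.0e}")
print(f"Combined reliability = {combined_reliability:.8%}")
```

The entire result hinges on the independence assumption, which later posts in the thread push back on.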

 
Adding LIDAR to this setup, today, immediately bumps that to 85 on this scale. You get to leapfrog vision overnight and rely on this incredible dot cloud to navigate the world. But you can’t get past 85 with (year 2020’s) vision capabilities + LIDAR.

Waymo has given hundreds if not thousands of rides in Phoenix without a human driver. I'm pretty sure their success rate is well beyond the 85% mark.

LIDAR’s additive abilities END where they are today, more or less, because it can’t see signs or markings or lights or all the things we need to navigate the world safely, and it never will. LIDAR is fundamentally incapable of these tasks and always will be. LIDAR lets you cut in line, but not to the front, only to 85. Not good enough for the end goal, but some immediate and impressive progress.

Now what? LIDAR plus 2020-vision got you to the 85 mark and you’re stuck. You cannot opt out of lane markings and signs and speed limits and light colors and all this stuff. That will never be optional. It must be extremely reliably solved and LIDAR can’t help. You’ll never get to 100 with this setup.

The problem is that, first of all, lidar DOES see signs, lane markings and road markings. This is where you are yet again fundamentally wrong.
Second of all, lidar isn't additive, it's a multiplier, because lidar and cameras fail in different ways.

So then what? Without solving vision to the degree that it can do all this on its own, you never get to 100.

So you solve vision to, say, 95. LIDAR is still back there at 85, and it’s not going to contribute toward the finish line because we already baked its capabilities into the 85%-capable system. We’ve passed LIDAR’s limits. So we’re at 95 on the scale with vision, LIDAR is done contributing in any meaningful way, and vision is doing the heavy lifting in nearly every scenario. If we don’t pass 95, then there’s no L5 for anyone, and LIDAR isn’t going to help. It’s already accounted for.

Now what? How do you start creeping past 99? It ain’t LIDAR. It’s vision. And to do this, vision must have already surpassed LIDAR’s early leapfrog on the timeline. We don’t need LIDAR anymore, because it fundamentally can’t help solve this problem. We need something else.

While I remain skeptical that this problem will ever be solved, I am solidly convinced that LIDAR will not be contributing past its leapfrog point. I’m convinced that only enormous data ingestion feeding neural net development (plus radar and ultrasonics) will ultimately get us to the 100 mark on this scale. And I’m convinced that Tesla is the only company with the fleet and tools deployed to pull this off, if it’s possible at all.

Lidar contributes to everything a camera can do other than the color of traffic lights.
You are seeing this completely wrong: lidar is a multiplier, not additive.

It gives two completely independent ways to sense a person that fail and succeed in different ways; because of that, they won’t fail in a correlated way. If either sensor sees a person, you act as if a person is there. And because of independence, both sensor systems missing a real person is so improbable that it will essentially never happen.
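As a concrete reading of that rule, here is a tiny hedged sketch; the function and its inputs are invented for illustration, not taken from any real stack:

```python
# "If either sensor sees a person, act as if a person is there."
def person_present(camera_detects: bool, lidar_detects: bool) -> bool:
    # OR-fusion: union of the two detectors. This minimizes missed
    # pedestrians at the price of accepting either sensor's false alarms.
    return camera_detects or lidar_detects

assert person_present(True, False)   # camera alone is enough to brake
assert person_present(False, True)   # lidar alone is enough to brake
```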

• Lidar sees in low-light situations, even in pitch darkness.
• Lidar also sees in bright-light situations, even in direct sunlight.
• Lidar also provides cm-accurate measurements of a 3D object's exact size, dimensions and distance, without the need for a fallible neural network.
• Lidar can also see and read traffic signs, road signs, pavement markings and lane markings.
• Lidar data can also be used to classify objects, for example a person, car, bike, pole, curb, tree, traffic light, truck, van, etc.


• Radar sees through fog, smoke, heavy rain, heavy snow, sand storms, etc.
• Radar sees in low-light situations, even in pitch darkness.
• Radar also sees in bright-light situations, even in direct sunlight.
• Radar also provides cm-accurate measurements of a 3D object's distance and speed, without the need for a fallible neural network.
• More advanced (5th-gen and even more powerful imaging) radar data can also be used to classify some objects, for example a person, car, bike, curb, tree, truck, etc.

[Image: Arbe Robotics 4D imaging radar sample output]


• Camera does poorly in low-light conditions and doesn't work in pitch darkness.
• Camera does poorly in bright, direct-light conditions.
• Camera does poorly in heavy rain, heavy snow, mist, smoke, fog, dust storms, blizzards, etc.
• Camera only sees numbers (e.g. 0,1,0,0,1,0) and needs a slew of fallible neural networks to determine a 3D object's dimensions, size, speed, distance and classification.

Lidar and high-resolution/imaging radar, however, only need a neural network for classification.

 
Here's a bit of a fun video for all of the LIDAR fans:


I will concede, some of these quotes are taken out of context, especially the Mobileye folks, who are still using LIDAR on their primarily vision-based system, but it's still an entertaining watch :D
 
Here's a bit of a fun video for all of the LIDAR fans:


I will concede, some of these quotes are taken out of context, especially the Mobileye folks, who are still using LIDAR on their primarily vision-based system, but it's still an entertaining watch :D

LMAO, almost all of that is taken out of context. For example, Nissan, AutoX and Mobileye, which were featured extensively, all use lidar.
I'm not surprised; this is what I expect from the TSLA community: misinformation, lies and pure nonsense.
 
Except that you don't need as much CV computing power if you are using lidar. For example, you don't need CV to do "pseudo-lidar" if you are using lidar. So we don't know what the actual numbers will be for each variable in your equation.

We don't know, because we aren't at sufficient accuracy with Lidar + CV systems yet.

The next evolution (who knows how long it will take) for every sensored AV (camera with or without lidar, radar, etc.) will be advancements in training huge, deep neural nets on video segments.

This will be a state-of-the-art engineering accomplishment. No one knows how long it will take, how much compute it will take, what advancements in architectures will be needed, or how much inference compute will be needed.

So we don't know how costly "pseudo-lidar" output will be in that system. You may find that the accuracy needed for the CV+lidar system generates "pseudo-lidar" at minimal cost (maybe needed for other reasons). Who knows!
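For context, "pseudo-lidar" usually means back-projecting a per-pixel depth estimate into a 3D point cloud using the camera intrinsics. A minimal sketch under that reading; the intrinsics and the flat depth map are stand-ins, not any real system's values:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an HxW depth map (meters) into an (N, 3) point cloud
    in the camera frame using the pinhole camera model."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy usage: a 4x6 depth map that is a flat wall 10 m away, made-up intrinsics.
cloud = depth_to_pseudo_lidar(np.full((4, 6), 10.0), fx=1000.0, fy=1000.0, cx=3.0, cy=2.0)
print(cloud.shape)  # (24, 3)
```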
 
So if you had a camera system with a 99.99% success rate,
and then you add a lidar/radar system with a 99.99% success rate,
this would effectively result in a system with well over a 99.99999% success rate.


You're assuming failure is always a false negative rather than a false positive.

If it were that easy, AI would be much more valuable than it currently is.
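One way to make that objection concrete: under the same independence assumption, "act if either sensor fires" multiplies miss rates down but adds false-alarm rates up. A sketch with invented rates:

```python
# Invented, illustrative error rates; real figures are not public.
p_fn_cam, p_fn_lidar = 1e-4, 1e-4   # false negatives (missed objects)
p_fp_cam, p_fp_lidar = 1e-4, 1e-4   # false positives (phantom objects)

# OR-fusion ("brake if either sensor fires"):
p_fn_fused = p_fn_cam * p_fn_lidar                  # 1e-8: misses nearly vanish
p_fp_fused = 1 - (1 - p_fp_cam) * (1 - p_fp_lidar)  # ~2e-4: phantom events roughly double

print(f"fused miss rate: {p_fn_fused:.0e}, fused false-alarm rate: {p_fp_fused:.1e}")
```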
 
The next evolution (who knows how long it will take) for every sensored AV (camera with or without lidar, radar, etc.) will be advancements in training huge, deep neural nets on video segments.

This will be a state-of-the-art engineering accomplishment. No one knows how long it will take, how much compute it will take, what advancements in architectures will be needed, or how much inference compute will be needed.
You're taking a very "this is a BIG black box" approach to this.
That black box has been worked on for over a decade and has been broken down to smaller black boxes as some of the unknowns become knowns.

For instance, there will NOT be an FSD solution without radar of some sort.
This is because radar does way better than humans in inclement weather. And as radar gets better, the benefit will only grow.

During Autonomy Day and since then, Karpathy and even Elon have both outlined the framework and approach that Tesla is taking.
While it is still accurate that "we do not know how long it will take", I wouldn't classify it all as one big black box.
 
You're taking a very "this is a BIG black box" approach to this.
That black box has been worked on for over a decade and has been broken down to smaller black boxes as some of the unknowns become knowns.

For instance, there will NOT be an FSD solution without radar of some sort.
This is because radar does way better than humans in inclement weather. And as radar gets better, the benefit will only grow.

During Autonomy Day and since then, Karpathy and even Elon have both outlined the framework and approach that Tesla is taking.
While it is still accurate that "we do not know how long it will take", I wouldn't classify it all as one big black box.


I did not say it would be one big black box...
 
My position is that vision is needed, but that it is unclear how accurate and reliable it can be. "Solving vision" to the needed 9's may take a long time, if it is even possible. So I am arguing the "Waymo approach": you solve vision, but only to 99.99%, which is much easier, and you combine this vision with radar and lidar. Radar will help with cases where vision is poor, like inclement weather. In good weather, lidar will provide a "second opinion" on distance calculation, object detection, object classification, lane detection, etc.

Sorry, but this argument does not work. You say that you can solve vision to 99.99% and solve lidar to 99.99%, and then you can multiply the error rates and get 99.999999% reliability. But this assumes independent failure modes for the different sensors. Two simple cases where this will not work are traffic lights and stop signs. Suppose you only had two sensors in front of the car: 1 camera and 1 lidar. If the camera fails, lidar may be able to detect a traffic light, but it would not know its state, so your car would need to stop at the light. Note that an HD map would not tell you the current state of the traffic light. Lidar may be able to detect a sign, but it would not be able to interpret it. Again, if you do not know what a sign says, how can you be sure you can keep driving? The car must stop, unless you trust an HD map, provided the map is available and was updated recently.
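The objection can be put in numbers. A hedged sketch with invented rates: if even a small shared failure mode exists (say, intersections where the camera is blinded and lidar cannot read the light's state), the joint failure rate is floored by that shared mode rather than by the product of the per-sensor rates:

```python
# Invented rates, for illustration only.
p_shared = 1e-5        # scenarios that defeat BOTH sensors (e.g. light state unreadable)
p_cam_only = 9e-5      # camera-specific failures
p_lidar_only = 9e-5    # lidar-specific failures

p_cam = p_shared + p_cam_only      # 1e-4 overall, i.e. "99.99% solved"
p_lidar = p_shared + p_lidar_only  # 1e-4 overall

naive_joint = p_cam * p_lidar                        # 1e-8, if independence held
actual_joint = p_shared + p_cam_only * p_lidar_only  # ~1e-5, dominated by the shared mode

print(f"naive: {naive_joint:.0e}, with shared mode: {actual_joint:.1e}")
# The combined system is ~1000x worse than the naive multiplication suggests.
```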

Other cases where cameras and lidars fail in different ways:
  • Detection of objects with low reflectivity, such as black cars or certain clothing: cameras can do this just fine in daylight; lidars will have trouble
  • Classification of smaller objects and phenomena: debris like tires, plastic bags; car exhaust: cameras have enough resolution and color recognition to recognize this as benign/not benign; a lidar will give you some points from it and you need to decide whether it's a real obstacle
  • Detection/classification of nearby objects: If you mount a lidar on top of a car, you'll have huge blind spots in the immediate vicinity of the car, where you would need to fall back to cameras, radar, and ultrasonics. Or, if you would like to, again, add thousands of dollars to the cost of a car, you would add multiple extra lidars to deal with this.
  • Classification of objects far away (> 100 m): The much lower resolution of lidar (~3-5% of a typical ADAS camera's resolution vertically) starts becoming a bigger problem at farther distances. Choose 100 m on this page and see (or rather, don't see) what details one of the best lidars on the market, which needs to stick out at the top of the car, captures.
 
Sorry, but this argument does not work. You say that you can solve vision to 99.99% and solve lidar to 99.99%, and then you can multiply the error rates and get 99.999999% reliability. But this assumes independent failure modes for the different sensors. Two simple cases where this will not work are traffic lights and stop signs. Suppose you only had two sensors in front of the car: 1 camera and 1 lidar. If the camera fails, lidar may be able to detect a traffic light, but it would not know its state, so your car would need to stop at the light. Note that an HD map would not tell you the current state of the traffic light. Lidar may be able to detect a sign, but it would not be able to interpret it. Again, if you do not know what a sign says, how can you be sure you can keep driving? The car must stop, unless you trust an HD map, provided the map is available and was updated recently.

Other cases where cameras and lidars fail in different ways:
  • Detection of objects with low reflectivity, such as black cars or certain clothing: cameras can do this just fine in daylight; lidars will have trouble
  • Classification of smaller objects and phenomena: debris like tires, plastic bags; car exhaust: cameras have enough resolution and color recognition to recognize this as benign/not benign; a lidar will give you some points from it and you need to decide whether it's a real obstacle
  • Detection/classification of nearby objects: If you mount a lidar on top of a car, you'll have huge blind spots in the immediate vicinity of the car, where you would need to fall back to cameras, radar, and ultrasonics. Or, if you would like to, again, add thousands of dollars to the cost of a car, you would add multiple extra lidars to deal with this.
  • Classification of objects far away (> 100 m): The much lower resolution of lidar (~3-5% of a typical ADAS camera's resolution vertically) starts becoming a bigger problem at farther distances. Choose 100 m on this page and see (or rather, don't see) what details one of the best lidars on the market, which needs to stick out at the top of the car, captures.

Yes, there are cases where cameras and lidar fail differently. That's to be expected since they are different sensors that work completely differently after all.

You are conveniently ignoring cases where cameras will fail and lidar won't. For example, cameras may fail in blinding light but lidar will work. Cameras also will fail in pitch darkness with no independent light source but lidar will work. Camera vision can also have plenty of false positives and false negatives. So there are plenty of cases where camera-only may be unreliable.

But there are plenty of situations where cameras and lidar can do the same thing and therefore the reliability can be additive. That's what I was talking about. For example, lane detection. If camera vision is 99.99% accurate on lane detection and lidar is independently 99.99% accurate on lane detection, then combined together, you will achieve greater than 99.99% accuracy. Same with object detection/classification, range estimation, all things that both cameras and lidar can do. So for all those cases, having camera + lidar will be better than just cameras only.

Remember that nobody is advocating lidar only. Everybody agrees that camera vision is required. The question is: can camera vision do everything by itself with 99.99999% reliability? If camera vision alone can achieve 99.99999% reliability, then sure, do FSD with just cameras, and lidar becomes optional. But if camera vision alone cannot achieve 99.99999% (say it is only 99.99%), then camera vision alone won't be good enough and you will need additional sensors like lidar, whether you like it or not. And as far as I know, the current state of camera vision is only 99.99% reliable.

Lastly, it is probably worth noting that it depends what kind of FSD you are trying to do. If you are aiming for lesser forms of FSD, like L3 or FSD with human supervision, then 99.99% might be good enough. But if you are aiming for driverless L5, then 99.99% won't be good enough.
 
In blinding light, a car with lidar and cameras will also fail. It won't see the traffic light colors.

A car with just cameras would also fail, right? So if camera-only FSD won't work either, what's the solution? This is not a problem with lidar; it's a problem with cameras. It's not a problem that removing lidar would fix.

It might not see the traffic light color, but at least the FSD car with lidar would still be able to stay in the lanes and avoid hitting other cars if the cameras were blinded. So, worst-case scenario, the FSD car with lidar might blow through a red light, but at least it could handle the intersection safely. The camera-only FSD car would completely fail and would not be able to handle the intersection safely.
 
The camera-only FSD car would completely fail and would not be able to handle the intersection safely.

I don't think this line of argumentation works for or against LIDAR. As long as either vehicle can retain some information about its surroundings, a blinded autonomous vehicle can remember where the side of the road is and safely pull over.

Human drivers are temporarily blinded all the time, and we don't instantly crash because of our ability to recall our immediate surroundings.
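A minimal sketch of that "recall your surroundings" idea: dead-reckon the pose from odometry for a short blackout and keep tracking the last known road geometry. Everything here (the function, the simple kinematic update) is an illustrative assumption, not any vendor's implementation:

```python
import math

def dead_reckon(x, y, heading, speed, yaw_rate, dt):
    """Advance the car's pose from wheel odometry / IMU while cameras are blind."""
    heading += yaw_rate * dt
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    return x, y, heading

# Coast through 0.5 s of blindness at 20 m/s, steering on the remembered heading.
pose = (0.0, 0.0, 0.0)            # last pose estimated before the blackout
for _ in range(5):
    pose = dead_reckon(*pose, speed=20.0, yaw_rate=0.0, dt=0.1)
print(pose)                        # ~10 m traveled; lane memory must cover this gap
```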
 
I don't think this line of argumentation works for or against LIDAR. As long as either vehicle can retain some information about its surroundings, a blinded autonomous vehicle can remember where the side of the road is and safely pull over.

Yes, I agree that if the car retained some information, it could still pull over. So an HD map would be useful, since an HD map could provide that memory.

If the camera-only car did not have any information, then no, I don't think it could pull over safely if the cameras were blinded. The advantage of lidar is that it would provide the extra information needed to pull over safely, since the lidar could create a map and detect lane lines and other objects.

Human drivers are temporarily blinded all the time, and we don't instantly crash because of our ability to recall our immediate surroundings.

If our eyes were completely blinded, we would crash. But humans can put up one hand to eclipse the sun or pull down the sun visor to block it, allowing our eyes to still see most of the environment, enough to drive. That's how we can still drive. FSD cars don't have the ability to block whatever is blinding the cameras; at least Tesla cars don't, do they?
 
My point is that an autonomous car that has camera problems will "give up" or "fail." Essentially, unless you perfect vision, there'll be no FSD.

Once you perfect vision, there's no need for lidar. We're back at square one with this argument lol.

The people advocating for lidar + camera aren't considering the complexity of "when" and "how" to hand over between vision and lidar when one is disagreeing with the other. It's a problem that's rife with hyperparameters and micromanagement, and it becomes more problematic over time.
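To illustrate that complaint, here is a toy arbitration policy between disagreeing range estimates; every name and cutoff in it is invented, and the cutoffs are exactly the kind of hand-tuned hyperparameters being criticized:

```python
def fuse_range(cam_range, cam_conf, lidar_range, lidar_conf,
               conf_floor=0.6, disagree_m=2.0):
    """Toy camera/lidar arbitration. conf_floor and disagree_m are the
    hand-tuned hyperparameters the post above is complaining about."""
    if lidar_conf < conf_floor:
        return cam_range                     # lidar unsure: trust camera alone
    if cam_conf < conf_floor:
        return lidar_range                   # camera unsure: trust lidar alone
    if abs(cam_range - lidar_range) > disagree_m:
        return min(cam_range, lidar_range)   # disagreement: assume the nearer obstacle
    # Agreement: confidence-weighted average of the two estimates.
    return (cam_conf * cam_range + lidar_conf * lidar_range) / (cam_conf + lidar_conf)

print(fuse_range(30.0, 0.9, 27.0, 0.8))  # disagree by 3 m -> conservative 27.0
```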
 
Once you perfect vision, there's no need for lidar.

Sure, except you are kinda glossing over the hard part. Perfect vision is far from a given.

That's kinda the whole point of lidar: it's a good idea to have an extra sensor precisely because perfect vision is far from guaranteed.

The people advocating for lidar + camera aren't considering the complexity of "when" and "how" to hand over between vision and lidar when one is disagreeing with the other. It's a problem that's rife with hyperparameters and micromanagement, and it becomes more problematic over time.

Waymo uses both cameras and lidar. They seem to have figured this out pretty well.
 