
Karpathy talk today at CVPR 2021

Sure, but that is not what they said or what they are doing. Integrating multiple inputs is a trade-off Tesla has decided not to accept.
It's not a trade-off. A richer source of info is a plus, my friend, not a minus ;)

What Tesla is capable of, as engineers and as a computer system, is a separate argument about whether it's worth having rich sensor feeds or thin sensor feeds.

I know which side I'm on and it would take a pretty long-term safety demo in the real world to convince me otherwise.
 
It's not a trade-off. A richer source of info is a plus, my friend, not a minus

I don't think it is that simple. There is a computing cost to integrating disparate sensor data. That is the trade-off I was thinking of.
Notice that I do not have an opinion on whether Tesla is right -- I only recognize that a choice exists because there are pluses and minuses to each approach.
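To make that cost concrete, here's a toy sketch of the bookkeeping fusion adds before you get any benefit: time alignment, coordinate transforms, and association. Every rate, frame, and threshold below is made up for illustration; none of it is what Tesla actually runs.
Code:
# Toy illustration of the overhead sensor fusion adds before any "fusion"
# happens: time alignment, coordinate transforms, and association.
# All rates, frames, and thresholds are hypothetical, not Tesla's.
import math

def radar_to_cartesian(r, azimuth_rad):
    """Convert a radar return (range, azimuth) into the camera's x/y frame."""
    return (r * math.cos(azimuth_rad), r * math.sin(azimuth_rad))

def nearest_in_time(t, stamped_items):
    """Pick the item whose timestamp is closest to t (O(n) per query)."""
    return min(stamped_items, key=lambda item: abs(item[0] - t))

def associate(camera_tracks, radar_points, max_dist=2.0):
    """Greedy nearest-neighbour association; real systems use costlier
    assignment (e.g. the Hungarian algorithm, O(n^3))."""
    pairs = []
    for cx, cy in camera_tracks:
        best = min(radar_points,
                   key=lambda p: math.hypot(p[0] - cx, p[1] - cy),
                   default=None)
        if best and math.hypot(best[0] - cx, best[1] - cy) < max_dist:
            pairs.append(((cx, cy), best))
    return pairs

# Hypothetical frame: 3 camera tracks and 4 radar returns near the same time.
camera_frame = (0.033, [(10.0, 1.2), (25.0, -0.5), (40.0, 3.0)])
radar_scans = [(0.020, [(10.4, 0.11), (24.1, -0.03), (60.0, 0.2), (41.0, 0.07)]),
               (0.070, [(10.6, 0.11), (23.9, -0.03), (59.5, 0.2), (40.8, 0.07)])]

t, tracks = camera_frame
_, scan = nearest_in_time(t, radar_scans)               # time alignment
points = [radar_to_cartesian(r, az) for r, az in scan]  # frame transform
print(associate(tracks, points))                        # association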

--
Good to see you made it through Covid.
 
It's not a trade-off. A richer source of info is a plus, my friend, not a minus
Key word there is "richer".
It has to be usable in the rest of the stack without degrading performance.

Karpathy proved that the radar was not a "richer source of info", and was actually the opposite, so they are focusing on the real "richer source of info" -- cameras.

At the 9-minute mark he directly addresses it:
"Vision is getting to the point where the sensor is 100x better than, say, radar. Then, if you have a sensor that is dominating the other sensor and is so much better, the other sensor is actually starting to hold you back and starting to contribute noise (the former system, i.e. radar). AND SO, we are really doubling down on the vision-only approach."
 
It's not a trade-off.

It's absolutely a trade-off because resources are not infinite.

So they can either put those resources (both human and compute) into finding a way to better fuse low-res radar (which is currently providing a very poor S/N ratio for the system) with vision,

Or they can put those resources into perfecting vision, which they need to do in either case to ever reach their stated goals.
 
I think every car should have an individual drone up in the sky tracking it and its surroundings.

Also every car should have night vision sensors.
And a magnetic resonance system.

Come to think of it, why stop there? Every car should have an individual human "sensor" monitoring it.

Sensor fusion at all costs amirite?

/s


Sometimes, in business, you need to make tradeoffs.
 
Did you see Waymo's new lidar? This is the point cloud from the lidar. It is on par with camera vision IMO. It's basically like having night vision.

[Image: point cloud from Waymo's new lidar]


In the end, I'm gonna guess this level of LIDAR resolution will be unnecessary.
 
In the end, I'm gonna guess this level of LIDAR resolution will be unnecessary.

Perhaps. The fact is that I don't think anyone really knows for sure what sensors are needed to reliably solve L5. For one, nobody has solved L5 yet. Also, we don't really know how many edge cases are out there. Lastly, we don't really know what level of AV safety is acceptable. So there are several unknowns. Companies like Waymo are dealing with the uncertainty by betting that it is better to have too many sensors than not enough. Tesla is betting vision-only will be "good enough" with enough data and training. We will see. I think the real test of who solves FSD will be safety and edge cases. Everybody will be able to do the basic stuff. The key is how good your FSD is when things get more difficult.

At the end of the day, if you are riding in a driverless robotaxi, you want it to get you to your destination safely and comfortably. Do you want to take the chance that the robotaxi kills you because the sensors did not see an object? That is why I err on the side of "more sensors and better sensors".

So they got a solution vastly more expensive than vision to be...on par with vision?

Sure is baffling how they keep losing money!

I respect you as a poster on TMC but you are sounding kinda dumb. 1) The cost of lidar is coming down. 2) Sensor fusion is better than vision-only. Having HD lidar and HD camera and HD radar gives Waymo better perception than Tesla's vision-only. It is worth spending the extra money for better perception.
 
The fact is that I don't think anyone really knows for sure what sensors are needed to reliably solve L5.
LMAO!
Go watch the Karpathy CVPR 2021 video again at 9:45
Code:
https://youtu.be/NSDTZQdo6H8?t=585

You go with the sensor that gives you the most data!
8 million bits of constraints per second.
On the Teslas: this vision-only approach is not a reactionary step; it has been planned since at least 2018 (or at least that is when the public was first made aware of Tesla's approach).
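A rough back-of-envelope on "the sensor that gives you the most data" comparing raw data rates (the camera resolution, frame rate, and radar point counts below are my assumptions for illustration, not Tesla's published specs, and raw bits are not the same thing as the "constraints" figure from the talk):
Code:
# Back-of-envelope data-rate comparison between a camera stream and a
# classic low-res automotive radar. Every number here is an assumption
# for illustration, not a Tesla spec.
cam_w, cam_h, cam_fps = 1280, 960, 36     # assumed single-camera stream
bits_per_pixel = 8                        # assumed 8-bit mono

radar_points_per_scan = 64                # assumed low-res radar
radar_scan_hz = 20
bits_per_point = 64                       # range, range-rate, azimuth, RCS

camera_bits = cam_w * cam_h * cam_fps * bits_per_pixel
radar_bits = radar_points_per_scan * radar_scan_hz * bits_per_point

print(f"camera: {camera_bits / 1e6:.0f} Mbit/s")   # ~354 Mbit/s
print(f"radar:  {radar_bits / 1e3:.0f} kbit/s")    # ~82 kbit/s
print(f"ratio:  ~{camera_bits // radar_bits}x")    # ~4300x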
 
On the Teslas: this vision-only approach is not a reactionary step; it has been planned since at least 2018 (or at least that is when the public was first made aware of Tesla's approach).

It's clear that this has been something they have been investigating for a while, but if this was "the plan all along," then why would they release it before the software was complete?

My guess is that this was a reaction to supply problems and they already had something "mostly done," so they took the opportunity to make the jump so that they could continue to deliver cars before the end of the quarter.
 
I respect you as a poster on TMC but you are sounding kinda dumb.

Same. On both things.

1) The cost of lidar is coming down.

I'm not sure how "It's now gotten cheaper to add sensors that are almost as good as the much cheaper vision ones we ALSO have anyway" makes this any less dumb.

2) Sensor fusion is better than vision-only

This is called assuming facts not in evidence.

Karpathy gave numerous examples of it being worse.


Having HD lidar and HD camera and HD radar gives Waymo better perception than Tesla's vision-only. It is worth spending the extra money for better perception.


Again, facts not in evidence. If Waymo's perception is so much better, why does it work in so few places in comparison?

Further, it's confusing that you claim here it's worth spending extra money on more sensors, but ALSO admit:

The fact is that I don't think anyone really knows for sure what sensors are needed to reliably solve L5.


Because if Tesla solves it with just cameras, ipso facto it was not worth spending extra money on LIDAR (or HD radar).
 
Maybe. But I thought Waymo and others thought the perception part has already been solved, that now the challenge is planning / prediction? (not being sarcastic).

Yes, Waymo believes that the main challenge now is prediction/planning. I don't know if they would say that they have solved perception. But I think they would say that their sensors give them the really comprehensive perception needed for solving FSD:

"we've spent over a decade developing a single integrated system comprised of complementary sensors to give our Driver this comprehensive view of the world so that it can safely navigate complex environments."

But different AV companies are trying different approaches to perception. Even within the "sensor fusion" approach, there are differences. Cruise, Waymo, Zoox, etc. don't use the exact same types of sensors or place the sensors in exactly the same locations on the car. In fact, Zoox, Cruise, Pony.ai, etc. all appear to have good perception too. So it is entirely possible that there are different sensor suites that will handle perception well enough for L5. And since nobody has solved L5 yet, I don't think we can say for sure what specific sensor type and configuration is required to solve L5.
 
This is called assuming facts not in evidence.

Karpathy gave numerous examples of it being worse.

I have evidence. Just watch any video from an AV company doing sensor fusion.

Here is Pony.ai that uses sensor fusion. Look at how many objects it is tracking accurately!

[Image: Pony.ai perception visualization tracking many objects]



And Karpathy gave examples of Tesla's radar and Tesla's sensor fusion being worse. That does not mean that all sensor fusion is worse.

Again, facts not in evidence. If Waymo's perception is so much better, why does it work in so few places in comparison?

It's apples and oranges. Tesla deploys in more places but requires a driver. The reliability of perception, planning and prediction can be lower since there is a driver ready to intervene as needed. In fact, if you watch FSD Beta videos, you can see lots of examples of Tesla's perception being bad. But that's ok since the driver is responsible. Waymo might deploy in fewer areas but they are deploying autonomous driving when the reliability needs to be higher and they can remove the driver. Also good perception is not enough to solve FSD, you also need good planning and prediction. Waymo is working to solve planning and prediction problems that will allow them to safely deploy without a driver in more areas.

Because if Tesla solves it with just cameras, ipso facto it was not worth spending extra money on LIDAR (or HD radar).

Yes, if Tesla does solve L5 with vision-only, it will prove that they were right and Waymo was wrong to use lidar and radar. But Tesla has not solved L5 yet.
 
I have evidence. Just watch any video from an AV company doing sensor fusion.

Here is Pony.ai that uses sensor fusion. Look at how many objects it is tracking accurately!

Unless you have video of a Tesla in the same situation unable to track the same (or more) objects, that's not evidence of any sort.


And Karpathy gave examples of Tesla's radar and Tesla's sensor fusion being worse. That does not mean that all sensor fusion is worse.

Nor does it mean any sensor fusion is better either.




It's apples and oranges.

And yet you somehow keep insisting one is better than the other.

While also admitting nobody actually knows the right answer.


Which is... weird.

Yes, if Tesla does solve L5 with vision-only, it will prove that they were right and Waymo was wrong to use lidar and radar. But Tesla has not solved L5 yet.


Then perhaps you might wish to change your earlier claim that more sensors IS worth it to "It's possible more sensors are, or are not, worth it, and we won't know until someone solves it one way or the other"?
 
Did you see Waymo's new lidar? This is the point cloud from the lidar. It is on par with camera vision IMO. It's basically like having night vision.

[Image: point cloud from Waymo's new lidar]
I looked for the source because, from my knowledge of lidar sensors, presenting a FOV like that can be misleading about the resolution and coverage of the sensor. I found that they show a bird's-eye view of the older sensor, but not of the new one. At 1:30:37 it shows a similar view, and you can see the blind spots of the sensor.
Waymo

[Image: bird's-eye view of Waymo lidar coverage, showing near-field blind spots]
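Those near-field blind spots fall straight out of geometry: the ground isn't visible until the lowest beam reaches it. A quick sketch, with the mounting height and vertical FOV angles assumed for illustration rather than taken from Waymo's specs:
Code:
# Geometry of a roof-mounted lidar's near-field blind spot: the ground is
# invisible inside h / tan(theta_down) metres. Numbers are assumptions
# for illustration, not Waymo's actual specs.
import math

def blind_spot_radius(mount_height_m, lower_fov_deg):
    """Distance at which the lowest beam first hits the ground."""
    return mount_height_m / math.tan(math.radians(lower_fov_deg))

for fov in (10, 20, 45):
    r = blind_spot_radius(mount_height_m=2.0, lower_fov_deg=fov)
    print(f"beam {fov:>2} deg below horizon -> ground visible from {r:.1f} m")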
 
How much does a 4D radar cost? I tried to Google it but no luck.

If you're running fewer than 1,000 vehicles, there is no problem using cutting-edge sensors that are expensive, low-volume, and need frequent repair.

Not so for a production vehicle. And Karpathy's claim in the CVPR talk is that the 1 million+ car fleet sourcing billions of unique examples is crucial to building a dataset to train NNs adequately for the level of computer vision that Tesla aspires to — and which will be required for any company to reach large-scale L4 autonomy.
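For scale, here's a back-of-envelope on why fleet size matters for mining rare training examples (every rate below is an assumed placeholder, not a claim about either company's actual numbers):
Code:
# Back-of-envelope: how fleet size changes how fast you can mine rare
# edge cases for training. Every rate here is an assumed placeholder.
fleet_sizes = [600, 1_000_000]     # ~robotaxi pilot vs consumer fleet
miles_per_car_per_day = 30         # assumption
edge_case_per_miles = 100_000      # assume 1 interesting event / 100k mi

for fleet in fleet_sizes:
    daily_miles = fleet * miles_per_car_per_day
    events_per_day = daily_miles / edge_case_per_miles
    print(f"{fleet:>9,} cars -> {daily_miles:>11,} mi/day, "
          f"~{events_per_day:,.1f} rare events/day")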

You speak with too much intelligence - are you sure you're in the right place? ;)
 
That is because of the chip shortage.

But the chip shortage did not force them to start on vision-only; it forced them to accelerate it.

That's what I said. The lack of radar units forced their hand. That sounds like a "reactionary step" to me.

The alternative was to complete the vision-only software, fully test it, and then remove the radar units from cars without a loss of functionality.