Welcome to Tesla Motors Club

Autonomous Car Progress

End-to-end is not a magic bullet, according to Mobileye's (ME) CTO:

"
End-to-end AI is great. Let's just not be religious about each fancy new method. What really works in applications that require high precision is a combination of methods, each with its own advantages. For more details, watch the video :)
"
He talks about end-to-end object detection, not end-to-end control. His argument is that you will have objects right in space but wrong in lane, and that this is a problem. Didn't Tesla solve this by having a single neural network do all the predictions? If not, can you just solve it using a language-of-lanes-like system? Also, end-to-end control "solves" the problem of end-to-end object detection by not having any object detection... ^^
 
Just change the unit from miles to meters and you can add three nines.

Imo it's just a figure of speech, probably based on Six Sigma.
Six Sigma is a set of methodologies and tools used to improve business processes by reducing defects and errors, minimizing variation, and increasing quality and efficiency. The goal of Six Sigma is to achieve a level of quality that is nearly perfect, with only 3.4 defects per million opportunities.
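To make the unit-change joke concrete, here is a small sketch (the miles-per-failure figure is made up for illustration) showing that restating a per-mile failure rate as a per-meter rate adds about log10(1609) ≈ 3.2 "nines" without changing the system at all:

```python
# Sketch with a hypothetical failure rate: changing the unit from miles
# to meters inflates the "nines" of a success rate, while the underlying
# system is exactly as reliable as before.
import math

METERS_PER_MILE = 1609.344

def nines(success_rate):
    """Number of leading nines in a success rate, e.g. 0.9999 -> ~4."""
    return -math.log10(1.0 - success_rate)

miles_per_failure = 10_000                                # hypothetical
per_mile = 1.0 - 1.0 / miles_per_failure                  # success per mile
per_meter = 1.0 - 1.0 / (miles_per_failure * METERS_PER_MILE)

print(round(nines(per_mile), 2))    # ~4.0 nines per mile
print(round(nines(per_meter), 2))   # ~7.21 nines per meter
```

Same system, three extra nines, purely from the choice of unit.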

Fair point.

Imo Tesla is asking the right question, because they have actually thought it through. Basically:
1. Be safer than the average human -> you can argue that FSD should be allowed, i.e. it makes the streets safer
2. Be ~10x safer than the average human -> at this point it gets hard to argue that FSD should not be allowed; you can start to prove it is significantly safer than the average human
3. Be safer than all humans -> now it's impossible to argue that FSD should not be allowed

Except that it is very vague. That is not a safety methodology; it is an outline of goals at best. Tesla has not clearly defined what "safer than the average human" means. What metric are they using to define that? Is it 100k miles per intervention? What? And Tesla has not shared any official data on the safety of FSD if unsupervised, so we don't know how close Tesla FSD is to achieving these goals. Now, Tesla does share accident rates, but that is for FSD supervised. At best, they make an argument that FSD supervised is safer than the average human and should be allowed. That's great. But that is supervised. It does not tell us if or when FSD (Supervised) can become eyes-off.

Mobileye has at least given us very specific MTBF numbers based on human safety metrics where they say they can go eyes off when they achieve that MTBF.
 
He talks about end-to-end object detection, not end-to-end control. His argument is that you will have objects right in space but wrong in lane, and that this is a problem. Didn't Tesla solve this by having a single neural network do all the predictions? If not, can you just solve it using a language-of-lanes-like system? Also, end-to-end control "solves" the problem of end-to-end object detection by not having any object detection... ^^

There are different ways to solve the lane problem that Shai mentions. And yes, e2e control is one way to solve it. Shai promotes the Mobileye solution of course but it is not the only solution.
 
There are two separate things - the amount/variety of sensor inputs and the modeling approach. I don't like the term "redundant"; that makes it sound like it's a backup/safety-critical system, when all it really means is that there is some orthogonality to the data streams, so a sensor fusion technique can improve reliability when using multiple inputs vs just one type. No argument from me that more inputs give you the opportunity for a more accurate model.

But that has nothing to do with end-to-end. You can still feed all of that data into an end-to-end model. The end-to-end model will figure out how to optimally fuse the sensors; you don't need separate models for each. Multimodal deep learning models are a pretty well-studied thing!
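A minimal numpy sketch of that point, with made-up feature sizes and untrained weights (nothing here is any company's actual architecture): in a multimodal model, "fusion" is just learned weights over concatenated per-sensor features, trained jointly with the rest of the network.

```python
# Early fusion sketch: features from two sensor modalities are
# concatenated, and a learned linear layer decides how to weight them,
# instead of a hand-written fusion algorithm.
import numpy as np

rng = np.random.default_rng(0)

camera_feat = rng.standard_normal(64)   # e.g. from a vision encoder
radar_feat = rng.standard_normal(16)    # e.g. from a radar encoder

fused_in = np.concatenate([camera_feat, radar_feat])    # shape (80,)

# One learned layer; in a real end-to-end model these weights are
# trained by backpropagation together with everything downstream.
W = rng.standard_normal((32, 80)) * 0.1
b = np.zeros(32)
fused_out = np.maximum(0.0, W @ fused_in + b)           # ReLU

print(fused_out.shape)   # (32,)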

Yes I get the difference between sensor input and modelling approaches. Sensor "redundancy" is just one part of ME's argument. ME also talks about using different models within perception in order to create more reliable perception. They are trying to make the argument that combining different sensor modes and different modeling approaches is a better path to 99.999999% than pure vision end-to-end.
 
I think sf
99.999999% is impossible, unless you play games with your definitions, tuning them towards that figure.
I assure you there is greater than a 99.9999999999999999999999999% that I will have more coffee this morning. So it's not impossible. Oh wait...you mean driving?...well I can't answer that without more coffee. How much coffee does AI need to drive properly? Or read forums?
 
Yes I get the difference between sensor input and modelling approaches. Sensor "redundancy" is just one part of ME's argument. ME also talks about using different models within perception in order to create more reliable perception. They are trying to make the argument that combining different sensor modes and different modeling approaches is a better path to 99.999999% than pure vision end-to-end.

Right, and I'm saying "different models to create more reliable perception" is generally wrong. There are many computer vision tasks out there, and the state of the art solutions are "end to end" deep learning models, not some DL models combined with some other computer vision algorithms. Those perform worse.

Other techniques cannot handle the statistical diversity in the data.

I personally don't think ML alone is ready to take on safety critical applications this decade. See for example radiology. Also, (a) how many companies can you names form the valley that has shipped safety-critical tech ? (b) Based on pure ML?

a) a handful
b) zero

Not even radiology on still images has removed the human from the loop.

With regards to (pure) e2e: To properly scale self driving you need to be able to adjust it to climates, different regulatory domains, cultural driving differences. I don't see how a single large NN will be able to accommodate that.

Yeah, machine learning may not be accurate enough yet to take on safety critical applications. Your radiology example is a good point.

But it doesn't mean there is some other algorithm or combination of algorithms that is ready to take over. It means the technology isn't ready in general.

Again, follow the trend in every other area where there is plentiful raw time series / images / words data along with labels - deep learning gives the most accurate, most reliable results.

But it requires a lot of data and a lot of compute. And isn't necessarily good enough yet either.
 
  • Informative
Reactions: primedive
Yeah, machine learning may not be accurate enough yet to take on safety critical applications. Your radiology example is a good point.

But it doesn't mean there is some other algorithm or combination of algorithms that is ready to take over. It means the technology isn't ready in general.

Again, follow the trend in every other area where there is plentiful raw time series / images / words data along with labels - deep learning gives the most accurate, most reliable results.

But it requires a lot of data and a lot of compute. And isn't necessarily good enough yet either.
Yeah. Thanks. I think we're in agreement.

The point I am trying to make is that there is a lot of ML in an AVS. But ML alone isn't enough at this point in time. That's where hard rules, active sensing, hd-maps and such play a role as a safety net.

Getting those last nines to get to driverless deployment is massively hard. I believe most people fail to understand how hard it really is. Waymo basically spent the most of the last 8-10 years on this exact problem.

You need to throw all the tools and "crutches" you have at it. And again, I simply don't see a pure ML stack getting there this decade for any safety critical application.
 
Last edited:
With regards to (pure) e2e: To properly scale self driving you need to be able to adjust it to climates, different regulatory domains, cultural driving differences. I don't see how a single large NN will be able to accommodate that.
1. you input the GPS position into the neural network, train it on a diverse dataset and the neural network learns what regulatory rules are active at different GPS positions
2. you preprocess the GPS position to figure out which jurisdiction you are in, for example "california = 1 ... china 623 ..." and input that to the neural network and it learns what rules are used in each jurisdiction

Think about yourself. Let's assume you have driven a few billion miles in USA. How many miles do you think you need to drive in Australia before you drive better than the average Australian driver? 1k miles? 1M miles? 1B miles? Let's say you woke up 30s ago, where told the GPS position of where you are and can watch 30s video of what happened recently, do you think you could figure out how to drive if you previously have driven 1M miles of diverse situations in that country/state?
 
1. you input the GPS position into the neural network, train it on a diverse dataset and the neural network learns what regulatory rules are active at different GPS positions
2. you preprocess the GPS position to figure out which jurisdiction you are in, for example "california = 1 ... china 623 ..." and input that to the neural network and it learns what rules are used in each jurisdiction

Think about yourself. Let's assume you have driven a few billion miles in USA. How many miles do you think you need to drive in Australia before you drive better than the average Australian driver? 1k miles? 1M miles? 1B miles? Let's say you woke up 30s ago, where told the GPS position of where you are and can watch 30s video of what happened recently, do you think you could figure out how to drive if you previously have driven 1M miles of diverse situations in that country/state?
How many kangaroos and “drive on the left” roads in the US again?



Surely the GPS neural network will solve this! My friend, you’re truly a person that believes in magic.
 
  • Funny
Reactions: primedive
How many kangaroos and “drive on the left” roads in the US again?



Surely the GPS neural network will solve this! My friend, you’re truly a person that believes in magic.
If your neural network can solve it, expect an artifical neural network to solve it. You don't solve it using magic...
 
If your neural network can solve it, expect an artifical neural network to solve it. You don't solve it using magic...
Oh yes. You’ve got me. A neural network is the exactly the same as a human brain, that’s what it’s called a “neural” network, right? Silly me, I forgot.

Now explain Moravec’s paradox. Or why you need billions of miles to teach a computer to drive poorly, when most humans do it safely in 10-20 hours.
 
Last edited:
  • Love
Reactions: Daniel in SD
Oh yes. You’ve got me. A neural network is the exactly the same as a human brain, that’s what it’s called a “neural” network, right? Silly me, I forgot.

Now explain Moravec’s paradox. Or why you need billions of miles to teach a computer to drive poorly, when most humans do it safely in 10-20 hours.
1. It's true that we currently need more data approximate what humans can do with artificial neural networks
2. It's also true that in many domains we can exceed what humans can do by using lots of data
3. Tesla has lots of data

FSD currently has driven >1B miles with 0 deaths. This significantly exceeds what humans average. Imo this is not poorly. It still has a few quirks to iron out, but in many ways we could argue that humans are the ones driving poorly. Let's say all humans were driving like FSD, do you think a computer driving like the average human today would be allowed?
 
  • Funny
Reactions: spacecoin
1. It's true that we currently need more data approximate what humans can do with artificial neural networks
2. It's also true that in many domains we can exceed what humans can do by using lots of data
3. Tesla has lots of data

FSD currently has driven >1B miles with 0 deaths. This significantly exceeds what humans average. Imo this is not poorly. It still has a few quirks to iron out, but in many ways we could argue that humans are the ones driving poorly. Let's say all humans were driving like FSD, do you think a computer driving like the average human today would be allowed?
That’s 10x safer than a human. Why no robotaxis?
 
1. It's true that we currently need more data approximate what humans can do with artificial neural networks
2. It's also true that in many domains we can exceed what humans can do by using lots of data
3. Tesla has lots of data
I’m not sure if you’re trolling or being serious.

Have you given any thought to why LLM:s are trained on all known text and billions of hours of video, and still can’t do basic math and fail simple logic tests?
 
Not crashing or killing people is such a low bar and not even the difficult part. A driving agent also needs to complete trips without getting stuck, it needs to be actually autonomous (no humans in loop, either in car, the field or call center), it needs to be comfortable, it needs to be expedient and economical, it needs to be "natural" and "human-like" and not behave like an alien/robot. I don't see any way you can do all that today without e2e vision and world models . You can't behave like a human if you're driving on vector-space bounding boxes and point clouds and hd maps because that's not how humans drive. Autonomy requires deep understanding of how the whole world works, it can't be hand-coded by humans.
 
This thread could be so much better with a little more allowance for ideas and discussion, and a lot less falsely authoritative preening and derision.

I do know something about how successful engineering teams operate to solve hard problems, how one can learn from evaluating intermediate results, and how problem-solving discussions work. This thread in recent days is unfortunately nothing like that, when it comes to open minded evaluation of competitive approaches.

However I do appreciate it as a repository of links to videos and news stories of the industry, and subsequent discussion that is long on comparative information and short on arrogance. I'm pretty sure that was the original intent.
 
i
This thread could be so much better with a little more allowance for ideas and discussion, and a lot less falsely authoritative preening and derision.

I do know something about how successful engineering teams operate to solve hard problems, how one can learn from evaluating intermediate results, and how problem-solving discussions work. This thread in recent days is unfortunately nothing like that, when it comes to open minded evaluation of competitive approaches.

However I do appreciate it as a repository of links to videos and news stories of the industry, and subsequent discussion that is long on comparative information and short on arrogance. I'm pretty sure that was the original intent.
Be the change you want to see. Plenty of value in discussing things from all different framings and levels.
 
  • Like
Reactions: spacecoin