Welcome to Tesla Motors Club

Is "all neural networks" really a good idea?

I have never been really comfortable with this idea of FSD being full-stack neural networks, aka, "images in to steering, brakes & acceleration out." At first I thought it at least needed some hard and fast rules, e.g., "don't ever hit solid objects," "don't ever pass a school bus," "don't ever drive off a cliff," etc. Not neural networks trained on these circumstances that have a 97% accuracy rate, but basically an "if" statement that is 100% enforced. With these, an all "AI" driving system could be workable, I thought.
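The "'if' statement that is 100% enforced" idea can be sketched as a thin rule layer wrapping a learned policy's output. This is a hypothetical illustration; the names, thresholds, and structure here are invented, not anything from Tesla's actual stack:

```python
# Hypothetical sketch: a hard, always-enforced safety rule wrapping a
# learned policy. All names and constants are invented for illustration.
from dataclasses import dataclass

@dataclass
class PolicyOutput:
    steering: float      # radians, +left / -right
    accel: float         # m/s^2, negative = braking

MIN_GAP_M = 2.0          # never get closer than this to a solid object
MAX_BRAKE = -9.0         # m/s^2, full braking

def enforce_hard_rules(nn_out: PolicyOutput, dist_to_obstacle_m: float) -> PolicyOutput:
    """A plain 'if' that is 100% enforced, regardless of what the NN wants."""
    if dist_to_obstacle_m <= MIN_GAP_M:
        # Override the network unconditionally: brake hard, wheel straight.
        return PolicyOutput(steering=0.0, accel=MAX_BRAKE)
    return nn_out

# The NN asks to accelerate toward an obstacle 1.5 m away; the rule wins.
safe = enforce_hard_rules(PolicyOutput(steering=0.1, accel=2.0), dist_to_obstacle_m=1.5)
print(safe.accel)   # -9.0
```

The point of the sketch is that the override is deterministic: no training distribution or accuracy rate is involved in that branch.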

However, now I think it is quite misguided to even aspire for full-stack neural network automation. It seems more like an "our only (or, better yet, our most exciting) tool is AI, so everything looks like an AI nail" type situation. Imagine if we taught our kids to drive this way: "I'm not going to tell you anything about driving - just watch what I do in all these circumstances and then emulate it." Would that result in good drivers? I don't think so.

Lane selection is an excellent example. Lane selection was never something I thought FSDb really struggled with. Then at the 2022 AI Day the Tesla engineers were all excited that they had implemented a new NN model, based on natural language processing, into FSDb for lane selection. It would use the environment to make choices about lane selection and reduce dependence on "hard-coded" lane decisions and mapping. Well, it sucks - lane selection right now is one of the worst things about FSDb, behind phantom braking. It seems to me there is still a place for "hard-coded" driving decisions in the system, such as "Are you turning left? No? Then don't move into the leftmost lane when it becomes available, because it might be (probably is) a turn lane." Pretty simple, right? Especially when you have mapping information that can tell you what the lane is for. But the NNs just sort of guess, and while they may be right 85% of the time, that remaining 15% is pretty damn annoying (if not downright dangerous).
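That kind of deterministic lane rule is trivially expressible in code. A toy sketch (the function name and boolean inputs are invented; a real planner would read these from route state and map data):

```python
# Hypothetical sketch of the deterministic lane rule described above.
# Inputs are invented stand-ins for route state and map annotations.
def should_take_leftmost_lane(turning_left: bool, map_says_turn_lane: bool) -> bool:
    if not turning_left and map_says_turn_lane:
        return False        # 100% enforced: never drift into a turn lane
    return turning_left     # otherwise, take it only when the route needs it

# Not turning left, and the map says it's a turn lane: stay out, no guessing.
print(should_take_leftmost_lane(turning_left=False, map_says_turn_lane=True))  # False
```

Unlike a learned choice, this rule is right 100% of the time whenever the map annotation is right.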

So I for one am not excited about the pending v12 release with full-stack NNs. There's going to be a ton of regression here and lots of opportunities for the system to veer away (no pun intended) from being a polished L3 autonomous driving system. Hopefully, this isn't more of Elon's "goal" of an L5 robotaxi, which I think everybody who drives these knows (if only deep down inside) is never going to happen. I think it's time for Tesla to start thinking about picking a realistic goal and then making moves using everything available to take this product over the finish line and call it done. It can't be a work-in-progress forever, right?
 
"I'm not going to tell you anything about driving - just watch what I do in all these circumstances and then emulate it." Would that result in good drivers? I don't think so.

I'm not sure about this statement. My oldest child has a high attention to detail. She knows the city we live in by driving through it (as a passenger), and she knows the rules of the road and the basic operation of the vehicle by observation.

Sure, there are things I will teach her once she's ready to drive, but I'd say 80% of what she needed to learn to be a driver - she already knows just by observation.

I'm not advocating for AI as an excellent driver (although I'm floored by midjourney and the like)... I guess I'm more just arguing with that quoted statement of yours 🤪. I suppose only time will tell.
 
I'm not sure about this statement. My oldest child has a high attention to detail. She knows the city we live in by driving through it (as a passenger), and she knows the rules of the road and the basic operation of the vehicle by observation.

I'll support this a bit further by citing one of the theories as to why certain immigrant groups are terrible drivers. They come from cities where transit was the norm and children did not grow up riding around in cars.

It also reminds me of an experience I had with my then 4-year-old grandson. The design of the back seat of the Tesla makes it almost impossible to see out a window, and if he's in a car seat, he's strapped in so he can't lean for a better look out the front or side windows. Therefore he watches the display (of which he can only see half, so he sees Ego).

One day he was watching Ego go through a stop sign with no slowing down in an empty (due to COVID lockdowns) parking lot. He corrected me and said he'd be telling the next police officer he saw! In his case he has learned the rules, not the 'norm.' And perhaps that's for the good!
 
I'm not sure about this statement. My oldest child has a high attention to detail. She knows the city we live in by driving through it (as a passenger), and she knows the rules of the road and the basic operation of the vehicle by observation.

Sure, there are things I will teach her once she's ready to drive, but I'd say 80% of what she needed to learn to be a driver - she already knows just by observation.

I'm not advocating for AI as an excellent driver (although I'm floored by midjourney and the like)... I guess I'm more just arguing with that quoted statement of yours 🤪. I suppose only time will tell.
The statement is meant more to portray the nature of training AI to drive from learned behavior instead of laying out explicit rules. The best example of this (which has basically become a meme at this point) is how movies and TV portray an alien learning to drive from a human. After watching the human driver, the alien has learned that the green light means go, the red light means stop, and the yellow light means speed up. Obviously, this fictional situation could have been avoided with a very simple explanation of how the lights work (the coded rules).
 
I have never been really comfortable with this idea of FSD being full-stack neural networks, aka, "images in to steering, brakes & acceleration out." At first I thought it at least needed some hard and fast rules, e.g., "don't ever hit solid objects," "don't ever pass a school bus," "don't ever drive off a cliff," etc. Not neural networks trained on these circumstances that have a 97% accuracy rate, but basically an "if" statement that is 100% enforced. With these, an all "AI" driving system could be workable, I thought.

However, now I think it is quite misguided to even aspire for full-stack neural network automation. It seems more like an "our only (or, better yet, our most exciting) tool is AI, so everything looks like an AI nail" type situation. Imagine if we taught our kids to drive this way: "I'm not going to tell you anything about driving - just watch what I do in all these circumstances and then emulate it." Would that result in good drivers? I don't think so.

Lane selection is an excellent example. Lane selection was never something I thought FSDb really struggled with. Then at the 2022 AI Day the Tesla engineers were all excited that they had implemented a new NN model, based on natural language processing, into FSDb for lane selection. It would use the environment to make choices about lane selection and reduce dependence on "hard-coded" lane decisions and mapping. Well, it sucks - lane selection right now is one of the worst things about FSDb, behind phantom braking. It seems to me there is still a place for "hard-coded" driving decisions in the system, such as "Are you turning left? No? Then don't move into the leftmost lane when it becomes available, because it might be (probably is) a turn lane." Pretty simple, right? Especially when you have mapping information that can tell you what the lane is for. But the NNs just sort of guess, and while they may be right 85% of the time, that remaining 15% is pretty damn annoying (if not downright dangerous).

So I for one am not excited about the pending v12 release with full-stack NNs. There's going to be a ton of regression here and lots of opportunities for the system to veer away (no pun intended) from being a polished L3 autonomous driving system. Hopefully, this isn't more of Elon's "goal" of an L5 robotaxi, which I think everybody who drives these knows (if only deep down inside) is never going to happen.

The problem with hard-coded rules is that your stack won't be flexible enough. Driving requires intelligence. You will encounter lots of different cases that may each require their own rule, or exceptions where you need to break a rule, like crossing a double line because a construction zone has closed that lane. You can't write a hard-coded rule for every single case or for every exception to a rule; it would be too time consuming and there would be too many rules. That won't scale. I don't think hard-coded rules for lane selection would be a good idea, since there could be many exceptions and nuances that you could not easily code for.

The big advantage of NNs is that you are letting the machine "write the rules" for you, so it is faster and more flexible. With more data, you can scale the training, and with the right data, you can let the machine essentially "code" for all those cases and exceptions. Yes, there will be regressions with NN training at first, but eventually, with enough data, the system will get better. So long term, NNs are the way to go. In fact, I would argue that machine learning is the reason autonomous driving has reached the maturity it has today, with more and more driverless deployments in complex driving environments. For example, the driverless rides we see from Waymo or Cruise would not be possible without machine learning and NNs.

Having said that, some hard coded rules may be useful. For example, Mobileye uses some hard coded rules in their safety policy, called RSS. The rules determine what a safe distance is, among other things. But Mobileye still uses NN for the actual driving (perception, planning). The hard coded rules in RSS are just guard rails to ensure safer driving, they don't make driving decisions. So, the NN determines the path of the car but RSS ensures that path is safe. You don't want to fall into the trap of trying to hard code the driving itself.
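For the curious, the core of RSS is a closed-form safe-following-distance formula that such a guard rail can check against. A rough sketch of the longitudinal version (the parameter values here are illustrative defaults, not Mobileye's calibrated ones):

```python
# Sketch of the RSS longitudinal safe-distance check. Parameter values are
# illustrative; real deployments calibrate them per vehicle and jurisdiction.
def rss_safe_longitudinal_distance(v_rear, v_front, rho=1.0,
                                   a_max_accel=3.5, b_min_brake=4.0,
                                   b_max_brake=8.0):
    """Minimum safe gap (m) so the rear car can always stop in time.

    Worst case assumed: during its response time rho the rear car
    accelerates at a_max_accel, then brakes at only b_min_brake, while the
    front car brakes as hard as possible at b_max_brake.
    """
    v_rear_after = v_rear + rho * a_max_accel      # rear speed after response time
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2            # distance covered while reacting
         + v_rear_after ** 2 / (2 * b_min_brake)   # rear car's braking distance
         - v_front ** 2 / (2 * b_max_brake))       # minus front car's braking distance
    return max(0.0, d)

# At 30 m/s behind a 30 m/s lead car, these parameters demand roughly a 116 m gap:
print(round(rss_safe_longitudinal_distance(30.0, 30.0), 1))
```

Note how this fits the "guard rail" role described above: the NN proposes a path, and a check like this rejects it if the gap it creates is below the formula's bound.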

There is a difference between using all NN for your entire stack and end-to-end. Most AV companies will use NN for all their stack for the reasons I stated above. With the right data, they can create the necessary deep NN for perception, prediction and planning, needed for complex driving that they could never do if they had to hard code everything. But what they do is create intermediary NN. So they might have NN for perception, a separate NN for prediction and a separate NN for planning. This way, they can create each piece of the puzzle and make sure each piece works.

What Waymo has learned is that it is good to consolidate your NN. So for example, instead of having half a dozen NN for perception, maybe you can have just 1 big NN for all of perception. But you still have separate NN for the other tasks like planning. So they use NN for the entire stack, but they have separate NN for different tasks.

This is very different from end-to-end that seeks to have just 1 big NN for the entire stack, video in and control out, as Elon says. Waymo does not believe that end-to-end is smart at this point because it has several big challenges but they still believe in consolidating NN.
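The modular "separate NN per task" pipeline described here amounts to typed hand-offs between stages. A stubbed sketch (all interfaces and stand-in outputs are invented for illustration; real stacks pass far richer tensors around):

```python
# Stubbed sketch of the modular pipeline: one "NN" per task, each with a
# typed, inspectable hand-off. Interfaces and outputs are invented.
from typing import List, Tuple

Frame = List[List[float]]                 # stand-in for camera pixels
Detections = List[Tuple[float, float]]    # (x, y) of each detected object
Predictions = List[Tuple[float, float]]   # each object's predicted future (x, y)
Trajectory = List[Tuple[float, float]]    # planned ego waypoints

def perception_nn(frames: List[Frame]) -> Detections:
    return [(10.0, 2.0)]                  # stub: "one car, 10 m ahead, 2 m left"

def prediction_nn(objects: Detections) -> Predictions:
    return [(x - 1.0, y) for x, y in objects]   # stub: objects drift 1 m closer

def planning_nn(predicted: Predictions) -> Trajectory:
    return [(1.0, -0.5), (2.0, -0.5)]     # stub: nudge right, away from traffic

# Because each stage has its own output, each can be trained, validated,
# and debugged in isolation, unlike a single end-to-end network.
plan = planning_nn(prediction_nn(perception_nn([[[0.0]]])))
```

Consolidation in the Waymo sense means merging networks within a stage (e.g. one big perception model) while keeping these stage boundaries.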

Elon seems to start with an idea of how he thinks things should work and then tries to make it work, instead of solving the problem the best way possible. We saw that with auto wipers. The obvious way to solve auto wipers is with a cheap rain sensor that detects humidity on the windshield. Detect humidity reliably and turn the wipers on. Easy. Instead Elon wanted to solve it with NN because he thinks NN is the answer to everything. He did the same with vision. He got rid of radar and ultrasonics, even when vision-only was not ready yet, because he had already made up his mind that it should be vision-only. Now, he wants to do the same with end-to-end. The current stack is not L5 yet but he is convinced that end-to-end is how it should work so he wants to do it, even before it is ready. I do think Elon is falling for AI hype, as end-to-end is the latest big thing. I agree he should focus on making autonomous driving reliable with the best approach rather than jumping on the latest AI thing and trying to make it work before it is ready. And I agree with you that there will likely be many regressions before we see reliable end-to-end FSD.

I think it's time for Tesla to start thinking about picking a realistic goal and then making moves using everything available to take this product over the finish line and call it done. It can't be a work-in-progress forever, right?

I agree. That is why I wish Tesla would come up with a realistic intermediate goal, like, say, reliable hands-free L2 highway driving like other companies are doing. That way we could have something tangible and reliable while we wait for L5. But sadly, I doubt it will happen as long as Elon is the CEO. He is just not interested. Heck, AP2 was announced 7 years ago now. Elon has had many chances to change course but he has not. I doubt he will now. I suspect he will continue the "L5 or bust" approach even if it is a work-in-progress forever.
 
The statement is meant more to portray the nature of training AI to drive from learned behavior instead of laying out explicit rules. The best example of this (which has basically become a meme at this point) is how movies and TV portray an alien learning to drive from a human. After watching the human driver, the alien has learned that the green light means go, the red light means stop, and the yellow light means speed up. Obviously, this fictional situation could have been avoided with a very simple explanation of how the lights work (the coded rules).

Then it's Tesla's responsibility to train using good driving habits, not bad.
 
@diplomat33, you make many good points I agree with. I hope I can add some insights regarding the difference between R&D and production engineering. Also, I'm unclear on your use of "end-to-end" regarding Tesla's NNs.
There is a difference between using all NN for your entire stack and end-to-end. Most AV companies will use NN for all their stack for the reasons I stated above. With the right data, they can create the necessary deep NN for perception, prediction and planning, needed for complex driving that they could never do if they had to hard code everything. But what they do is create intermediary NN. So they might have NN for perception, a separate NN for prediction and a separate NN for planning. This way, they can create each piece of the puzzle and make sure each piece works.
This is a key point. Tesla wants to use NNs for all the pieces, not one big NN that takes video as input and outputs car controls. In a world of infinite computer resources one big NN is theoretically possible and is trivial to implement, but in our finite world it is impossible. The huge amount of work Tesla has poured into NNs has been to find solutions that are vastly more efficient than one big NN. For example, they use NNs to create an occupancy network, a model of the 3-D world, based on the video inputs. Then other NNs use the data in the occupancy network to make decisions about what the car should do.

A single end-to-end NN would not have an occupancy network.
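The occupancy network's role as a shared, inspectable world model can be illustrated with a toy 2-D grid. Everything here (grid size, threshold, function names) is invented; the real thing is a learned, much richer 3-D representation:

```python
# Toy illustration of an explicit intermediate representation: a 2-D
# occupancy grid that perception writes and planning reads. All details
# here are invented stand-ins.
def make_occupancy_grid(detections, size=5):
    """Perception stage: mark the grid cells that contain an object."""
    grid = [[0.0] * size for _ in range(size)]
    for row, col in detections:
        grid[row][col] = 1.0              # cell is occupied
    return grid

def is_path_clear(grid, column, threshold=0.5):
    """Planning stage: query the shared world model directly."""
    return all(row[column] < threshold for row in grid)

grid = make_occupancy_grid([(2, 1)])      # one object at grid cell (2, 1)
print(is_path_clear(grid, column=1))      # False: column 1 is blocked
print(is_path_clear(grid, column=3))      # True
```

With a single end-to-end network, by contrast, whatever world model exists is buried in the activations, with nothing for downstream code (or engineers debugging it) to query.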

Elon seems to start with an idea of how he thinks things should work and then tries to make it work, instead of solving the problem the best way possible.
I disagree somewhat. Elon is trying to solve problems that have never been solved before. He is doing research and development, not production engineering. He takes an approach of fail early and learn from your mistakes. He is definitely pushing many envelopes. If the best way possible is already known then you are in the realm of production engineering which is almost diametrically opposed to the R&D that Elon does and that I've done most of my life.

We saw that with auto wipers. The obvious way to solve auto wipers is with a cheap rain sensor that detects humidity on the windshield. Detect humidity reliably and turn the wipers on. Easy.
One of the main elements of R&D is learning; about the problem and about possible solutions. Perhaps NNs with vison will never be great at controlling auto-wipers but the only way to find out is to try it and see if it works in the field. If it does work then it slightly simplifies the design of the car which is good. You swing for the fences and learn from your mistakes. The benefit of trying to use NNs for auto-wipers is you learn more about the limits of what NNs can do. If it fails, you learn about how and why it fails. This information is extremely valuable.

Instead Elon wanted to solve it with NN because he thinks NN is the answer to everything. He did the same with vision. He got rid of radar and ultrasonics, even when vision-only was not ready yet, because he had already made up his mind that it should be vision-only.
I think it would be more accurate to say Elon wants to find out if NNs are the answer to most things. His comment that lidar is a crutch is telling. He admits lidar makes it easier to solve parts of the problem in the short term, just like a rain gauge helps solve the auto-wiper problem but you learn almost nothing by doing it.

I'm not certain about his decisions to remove USS and radar early. I imagine it was partly due to him being overly optimistic about how quickly they can solve FSD. I'm almost certain part of it was to goad his team into solving FSD vision more quickly. I have no doubt Elon is doing his very best to get his team to solve FSD as soon as possible.

Now, he wants to do the same with end-to-end. The current stack is not L5 yet but he is convinced that end-to-end is how it should work so he wants to do it, even before it is ready.
If the current stack was L5 then there would be little need to try new things. Elon is trying to convert everything to NNs in order to get to L5. The current system is not even close. Again, I would say Elon is doing the experiment to find out if using NNs throughout will get us closer to L5. IMO the term "end-to-end" can imply one big NN which is not what Tesla is doing.

I do think Elon is falling for AI hype, [...]
I don't think this is fair. Elon, Karpathy and others started OpenAI eight years ago. This was before AlphaZero took the world by storm. Elon was deeply committed to AI well before the hype started.

[...] as end-to-end is the latest big thing.
Again, I'm not sure what you mean by "end-to-end". If you mean one big NN to rule them all then this is NOT what Tesla is doing and it doesn't apply. It is certainly not the "latest thing" because it does not work in this case and no one in the field thinks it will. If you mean using NNs almost everywhere then AFAIK, this has been the goal for ages and is not part of the latest hype. If you have evidence to the contrary, I'd like to learn from it.

I agree he should focus on making autonomous driving reliable with the best approach rather than jumping on the latest AI thing and trying to make it work before it is ready. And I agree with you that there will likely be many regressions before we see reliable end-to-end FSD.
The problem is no one on the face of the Earth knows for sure what the best approach is. If the best approach is known then you're talking about production engineering, not R&D. Trying to make something work before it is ready is the heart of research and development. If everyone waited until the best approach was known and ready then progress by the human race would come to a standstill.

IMO Elon is laser focused on making autonomous driving reliable with the best approach. He is trying to find out what the best approach is. If his current approach doesn't work then he either fixes it and tries again or he tries a different approach.

Elon is not alone in thinking NNs will end up being the best approach to solve FSD. Many smart and knowledgeable people like Karpathy, Douma, Hotz, and others are on the same bus. Do you know of smart and knowledgeable people who honestly think NNs will not be the answer? The big unknown is how long it will take for NNs to solve FSD.
 
Tesla wants to use NNs for all the pieces, not one big NN that takes video as input and outputs car controls. In a world of infinite computer resources one big NN is theoretically possible and is trivial to implement, but in our finite world it is impossible. The huge amount of work Tesla has poured into NNs has been to find solutions that are vastly more efficient than one big NN. For example, they use NNs to create an occupancy network, a model of the 3-D world, based on the video inputs. Then other NNs use the data in the occupancy network to make decisions about what the car should do.

A single end-to-end NN would not have an occupancy network.

Again, I'm not sure what you mean by "end-to-end". If you mean one big NN to rule them all then this is NOT what Tesla is doing and it doesn't apply.

End-to-end means one NN that directly takes images/video as input and produces steering and acceleration/braking as output.

No, Tesla is not doing end-to-end now. They are using different NN for different parts of the stack. But Elon's tweet clearly indicates that he hopes to switch to end-to-end with FSD beta V12.


And Elon says "from images in to steering, brakes & acceleration out". That is clearly referencing one big NN that takes images/video as input and outputs car controls. So Elon clearly wants V12 to be one big end-to-end NN that takes images in to steering, brakes and acceleration out.
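In code terms, that reading of the tweet describes a single opaque mapping with nothing inspectable in between. A deliberately trivial stand-in (the "network" here is just a stub; a real one would be a large learned model, but the interface is the point):

```python
# Toy stand-in for the end-to-end idea: one function from raw images to
# controls. The internals are a trivial stub, invented for illustration.
from typing import List, Tuple

def end_to_end_nn(camera_frames: List[List[float]]) -> Tuple[float, float, float]:
    """Images in -> (steering, brake, acceleration) out."""
    pixel_count = sum(len(row) for row in camera_frames)
    mean_pixel = sum(sum(row) for row in camera_frames) / max(1, pixel_count)
    # Whatever notion of lanes, objects, or rules the model uses lives
    # implicitly in its weights; there is no occupancy grid to inspect.
    return (0.0, 0.0, 0.1 * mean_pixel)

steering, brake, accel = end_to_end_nn([[1.0, 1.0], [1.0, 1.0]])
```

Contrast this signature with the modular pipeline: the only testable surface is the final control output, which is exactly why training and validating such a model is so hard.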

Elon is not alone in thinking NNs will end up being the best approach to solve FSD. Many smart and knowledgeable people like Karpathy, Douma, Hotz, and others are on the same bus. Do you know of smart and knowledgeable people who honestly think NNs will not be the answer? The big unknown is how long it will take for NNs to solve FSD.

You misunderstand. I never said that some don't believe in NNs. Everybody agrees that NNs are the best approach to solve FSD, and everyone is using NNs to some degree or other in FSD. That is not the issue. The issue is HOW best to use NNs. Some use NNs for only part of their stack, some use NNs for the entire stack but split into different NNs, and some, like Wayve, are trying one end-to-end NN. So again, looking at Elon's tweet, it is clear he believes in end-to-end and plans to switch to it with V12.

To be clear, I am not against Elon or anyone using NN, I agree that the best approach is NN, I am just skeptical in end-to-end. I think we are a long ways from safe, reliable L5 with vision-only end-to-end. That's all I am saying.

I think it would be more accurate to say Elon wants to find out if NNs are the answer to most things. His comment that lidar is a crutch is telling. He admits lidar makes it easier to solve parts of the problem in the short term, just like a rain gauge helps solve the auto-wiper problem but you learn almost nothing by doing it.

Lidar also uses NNs. You can look up Waymo's research; they do a ton of NN work with lidar. So no, Elon's refusal to use lidar has nothing to do with him wanting to use NNs to solve the problem, since you can do NNs with lidar too. It had more to do with Elon having already decided on vision-only. He wanted to solve perception with vision, so he felt that getting perception data from other sensors would be a "crutch". For example, lidar gives you precise distance measurements to objects, but Elon wanted to get distance measurements from vision instead.

I'm not certain about his decisions to remove USS and radar early. I imagine it was partly due to him being overly optimistic about how quickly they can solve FSD. I'm almost certain part of it was to goad his team into solving FSD vision more quickly. I have no doubt Elon is doing his very best to get his team to solve FSD as soon as possible.

Again, I think Elon removed ultrasonics and radar early because he is committed to the vision-only approach.
 
I have never been really comfortable with this idea of FSD being full-stack neural networks, aka, "images in to steering, brakes & acceleration out." At first I thought it at least needed some hard and fast rules, e.g., "don't ever hit solid objects," "don't ever pass a school bus," "don't ever drive off a cliff," etc. Not neural networks trained on these circumstances that have a 97% accuracy rate, but basically an "if" statement that is 100% enforced. With these, an all "AI" driving system could be workable, I thought.

However, now I think it is quite misguided to even aspire for full-stack neural network automation. It seems more like an "our only (or, better yet, our most exciting) tool is AI, so everything looks like an AI nail" type situation. Imagine if we taught our kids to drive this way: "I'm not going to tell you anything about driving - just watch what I do in all these circumstances and then emulate it." Would that result in good drivers? I don't think so.

Lane selection is an excellent example. Lane selection was never something I thought FSDb really struggled with. Then at the 2022 AI Day the Tesla engineers were all excited that they had implemented a new NN model, based on natural language processing, into FSDb for lane selection. It would use the environment to make choices about lane selection and reduce dependence on "hard-coded" lane decisions and mapping. Well, it sucks - lane selection right now is one of the worst things about FSDb, behind phantom braking. It seems to me there is still a place for "hard-coded" driving decisions in the system, such as "Are you turning left? No? Then don't move into the leftmost lane when it becomes available, because it might be (probably is) a turn lane." Pretty simple, right? Especially when you have mapping information that can tell you what the lane is for. But the NNs just sort of guess, and while they may be right 85% of the time, that remaining 15% is pretty damn annoying (if not downright dangerous).

So I for one am not excited about the pending v12 release with full-stack NNs. There's going to be a ton of regression here and lots of opportunities for the system to veer away (no pun intended) from being a polished L3 autonomous driving system. Hopefully, this isn't more of Elon's "goal" of an L5 robotaxi, which I think everybody who drives these knows (if only deep down inside) is never going to happen. I think it's time for Tesla to start thinking about picking a realistic goal and then making moves using everything available to take this product over the finish line and call it done. It can't be a work-in-progress forever, right?

It's a good question. After 9 or so years, FSDb has made progress, but it's been slow and the results are still far from a normal human level of safe driving. Gotta hope some day FSD will be the adult in the car instead of a student driver.

I agree with having absolutes.

It will be interesting when NNs finally take over steering, brake and throttle control. Hopefully they find a way to improve testing.
 
I don't recall Tesla ever even hinting that they had any intention of creating an L3 system. Did I miss something?

No, you did not miss anything. I think Tesla referred to the Autonomy Day hands-free demo as L3 in a private email to the CA DMV but Tesla has never said anything publicly about wanting to deploy L3 AFAIK.
 
End-to-end means one NN that directly takes images/video as input and produces steering and acceleration/braking as output.

No, Tesla is not doing end-to-end now. They are using different NN for different parts of the stack. But Elon's tweet clearly indicates that he hopes to switch to end-to-end with FSD beta V12.

And Elon says "from images in to steering, brakes & acceleration out". That is clearly referencing one big NN that takes images/video as input and outputs car controls. So Elon clearly wants V12 to be one big end-to-end NN that takes images in to steering, brakes and acceleration out.
I can see why you got confused by what Elon said. What he said is very confusing and easy to misinterpret. The key context for understanding what he meant is that using one big NN from end to end for FSD is utterly impossible now. It's not even close. Even if it were possible, and it weren't an utterly stupid approach, it would mean throwing all of their existing work on the scrap heap and starting over from scratch. The more efficient approach they've been taking for years is billions upon billions of times easier than one big NN. Just trying to train such a thing would be a nightmare, because the training data would have to be video from the cars combined with what drivers do with the steering wheel and pedals. Not to mention that all the visualizations on the car's screen would be scrapped.

In theory, with unlimited computing power, this is possible, but in the real world it is completely impossible. It is also massively wasteful because, for example, you are wasting training time getting the NN to come up with the equivalent of the occupancy network purely from training. We, as humans, know the cars are operating in a 3-D physical world, and we know all the cameras are taking video of that one physical world. This is a massive symmetry (redundancy) in the data, and we take advantage of it by building it into the system instead of forcing the NN to figure it out on its own.

Another way to see this is Elon has talked a lot about using FSD as the basis for the AI inside of the Tesla bot and eventually Artificial General Intelligence (AGI). One big black-box/NN that only takes Tesla camera video feeds as input and only supplies steering and pedal control as outputs would be useless for this. What will transfer over to the bot and to AGI will be things like the occupancy network, all the things that would be thrown away if FSD was one big NN.

Therefore what Elon meant was that it will be all NNs doing the processing from video in to car control out. They've been slowly whittling away at this for years, converting more and more heuristic hard-coded pieces to NNs. Moving the highway portion to the new FSD beta stack was a big step. Elon has been talking about this process for a while. He recently mentioned moving route planning over to a NN (or something like that).

They are not throwing out all of their existing work and they are not doing something that is utterly impossible going from V11 to V12. One needs to be really careful in interpreting what Elon says. In this case the full quote is:

v12 is reserved for when FSD is end-to-end AI, from images in to steering, brakes & acceleration out.
I highlighted the crucial part: end-to-end AI. The only possible thing this can mean is it will be all NNs because they have finally converted the last pieces of heuristic code over to a NN.
 
I highlighted the crucial part: end-to-end AI. The only possible thing this can mean is it will be all NNs because they have finally converted the last pieces of heuristic code over to a NN.
I do think this was an unfortunate use of the term by Elon; this was discussed just after his tweet. I agree with you that the currently accepted meaning of end-to-end is not what Elon actually meant in this context. He's certainly not ignorant about it, and I think he just wasn't being careful enough in this tweet. Maybe he should have said "front-to-back" AI, or "an all-AI pipeline".

The last I heard, comma.ai was trying to do their self-driving using a more end-to-end architecture, though with more modest goals and equipment. In their case, they said the end-to-end pipeline did not include vehicle control execution, but stopped at an abstracted or virtual control layer before that. That's because they want to be able to port the trained ML output to multiple different vehicle platforms, and not have to separately train for each vehicle type.

I haven't followed comma.ai news recently. George Hotz was stepping back from his role and I don't know how things have proceeded since then.

I think Wayve (not Waymo) is another small team trying to execute a basically end-to-end architecture. They've given some of the best conference talks but again I don't know if they're gaining traction beyond their basic research.
 
  • Informative
Reactions: APotatoGod
I do think this was an unfortunate use of the term by Elon; this was discussed just after his tweet. I agree with you that the currently accepted meaning of end-to-end is not what Elon actually meant in this context. He's certainly not ignorant of it, and I think he just wasn't being careful enough in this tweet. Maybe he should have said "front-to-back" AI, or "an all-AI pipeline".
It would have been clear if he simply said "replace all the hand written code with NNs". But much less glitzy.

There are two things that may be going on. First, since Elon was very aware that using a single NN for FSD was impossible, a part of him didn't think there would be any confusion about what he meant. But another thing I've seen is he will over-promise by implication and let people read their own hopes and desires into his words.

I got flak here in the summer of 2021 after an earnings call where Elon talked about 4680 batteries. Some people waiting for a Model Y delivery were considering postponing in hopes of getting a car from Austin with 4680 batteries before the end of the year based on the hope they heard in Elon's words. But if you listened closely, he left himself an out of using 2170s if there was a problem with the 4680 ramp. Which there was. From his words alone I figured that waiting for 4680s before the end of 2021 was a bad bet. Some people didn't want to hear that.

Elon is a very optimistic guy, which is great for doing the things he does. He's Optimist Prime. IMO the world is a better and more interesting place because of his optimism. But I listen to his words very carefully, like I would read a contract written by a lawyer I don't trust. Usually the least optimistic interpretation of what he says is closest to what actually happens.

The last I heard, comma.ai was trying to do their self-driving using a more end-to-end architecture [etc]
I'm quite sure these other players also mean using only NNs, not one big NN for the reasons I gave above. If you have any evidence to the contrary I would like to see it so I can improve my understanding.
 
A single end-to-end NN would not have an occupancy network
Not to mention that all the visualizations on the car's screen would be scrapped
Curious, why do you say a single neural network wouldn't have those? Both of those internal representations would probably still be good intermediate training targets and outputs that the overall network could learn to use or ignore as appropriate. Each part of the end-to-end NN could be trained individually (with various parts frozen) as well as jointly to improve the shared backbone through multi-task learning, then repeatedly iterated and refined at lower learning rates.

Starting from robust perception networks would probably transfer knowledge to the newly added control portion. Additionally, allowing the gradient to flow all the way from the control outputs back to the video inputs could let perception learn new things as part of the overall optimization. Basically, there's no need to get rid of the mature FSD Beta networks when integrating them into a single one.
 
I can see why you got confused by what Elon said. What he said is very confusing and easy to misinterpret. The key context for understanding what he meant is that using one big NN from end to end for FSD is utterly impossible right now. It's not even close. Even if it were possible, and it weren't an utterly stupid approach, it would mean throwing all of their existing work on the scrap heap and starting over from scratch. The more efficient approach they've been taking for years is billions upon billions of times easier than one big NN. Just trying to train such a thing would be a nightmare because the training data would have to be video from the cars combined with what drivers do with the steering wheel and pedals. Not to mention that all the visualizations on the car's screen would be scrapped.

In theory, with unlimited computing power, this is possible, but in the real world it is completely impossible. It is also massively wasteful because, for example, you would be spending training time getting the NN to come up with the equivalent of the occupancy network entirely on its own. We, as humans, know the cars are operating in a 3-D physical world. We know all the cameras are filming the one physical world. This is a massive symmetry (redundancy) in the data, and we take advantage of it by building it into the system instead of forcing the NN to figure it out on its own.
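Here's a tiny sketch of what "building the symmetry in" means architecturally. The names and geometry are invented for illustration: instead of asking one giant NN to discover that every camera films the same 3-D world, the design fuses each camera's evidence into a single shared occupancy grid, so cameras reinforce each other by construction.

```python
# Toy illustration: multiple cameras vote into ONE shared occupancy
# grid, encoding the fact that they all observe the same 3-D world.

def fuse_into_grid(detections_per_camera, grid_size=4):
    """Each camera reports (cell_index, occupancy_probability) pairs.
    The shared grid keeps the max evidence per cell, so a cell seen
    by two cameras needs to be explained only once."""
    grid = [0.0] * grid_size
    for detections in detections_per_camera:
        for cell, prob in detections:
            grid[cell] = max(grid[cell], prob)
    return grid


front_cam = [(1, 0.9), (2, 0.2)]
left_cam = [(1, 0.7), (3, 0.6)]   # sees the same object in cell 1
grid = fuse_into_grid([front_cam, left_cam])
```

A monolithic end-to-end NN would have to rediscover this correspondence between camera views purely from gradient descent, which is exactly the wasted training effort the post describes.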

Another way to see this is Elon has talked a lot about using FSD as the basis for the AI inside of the Tesla bot and eventually Artificial General Intelligence (AGI). One big black-box/NN that only takes Tesla camera video feeds as input and only supplies steering and pedal control as outputs would be useless for this. What will transfer over to the bot and to AGI will be things like the occupancy network, all the things that would be thrown away if FSD was one big NN.

Therefore what Elon meant was that it will be all NNs doing the processing from video in to car control out. They've been slowly whittling away at this for years, converting more and more heuristic hard-coded pieces to NNs. Moving the highway portion to the new FSD beta stack was a big step. Elon has been talking about this process for a while. He recently mentioned moving route planning over to a NN (or something like that).

They are not throwing out all of their existing work and they are not doing something that is utterly impossible going from V11 to V12. One needs to be really careful in interpreting what Elon says. In this case the full quote is:

v12 is reserved for when FSD is end-to-end AI, from images in to steering, brakes & acceleration out.
I highlighted the crucial part: end-to-end AI. The only possible thing this can mean is it will be all NNs because they have finally converted the last pieces of heuristic code over to a NN.

I don't think I am confused at all. Elon said "images in to steering and braking/acceleration out". That is clearly a reference to end-to-end. I think we should take Elon's words as plainly as possible. He is a big boy. He knows what end-to-end is. He did not accidentally use a poor choice of words. He used those words on purpose. Respectfully, I think you are trying to rationalize things. End-to-end does not make sense to you so you are assuming that Elon could not possibly have meant that, even though that is clearly what he wrote. So you are coming up with your own interpretation to try to make the tweet make sense in your mind. But the simplest explanation is that Elon said what he meant even if it does not make sense.

I think Elon sees end-to-end as the ultimate ideal: train AI to think like the human brain and figure stuff out directly from vision. So, he is very attracted to that idea and is not thinking about the nitty gritty of how to make it actually work. So you are right that end-to-end would be a nightmare but, like with vision-only, Elon does not care. He is not thinking about the details, he is just thinking that he really likes the approach and wants to somehow make it work.
 
Then it's Tesla's responsibility to train using good driving habits, not bad.
Yeah, but the point is you can't know what the AI has learned about a very particular situation until the AI screws it up. You can put all the best training data you can think of in there and run lots of simulations to test it, but a situation that is easily navigable by a human who has learned a handful of hard and fast rules may result in completely incomprehensible behavior from a NN-driven car. This is simply because of the nature of how NNs work.

I certainly don't think anyone (including Elon) is talking about one big giant NN. Such a NN would be impossible to train and test. The video processing, occupancy network, classification, and decision NN layers will all remain. And I am definitely not saying it should have no NNs. I am saying there are some parts of driving intelligence that are strictly rule-based, where the car should simply follow the rules. If we assume (as I do) that there is no possibility of HW3 cars ever actually being driverless robotaxis, then there will always be a driver present. Accordingly, I believe that at some level all decisions should be passed through these rules, and if that results in an exceptional situation, i.e., one "where you need to break a rule," then the driver can take over and perform the exceptional maneuver.

For example, if there is a school bus with flashing lights and/or a stop sign displayed, the car should always wait patiently behind it. If it turns out something has gone wrong with the bus and it is really not picking up or dropping off anybody, the driver can disengage and go around it. The same holds true for double yellow lines. If there is something blocking ego's lane but the car would have to cross a double yellow line into an oncoming traffic lane to go around it, then the car should just wait. If the driver wants to take the chance and navigate around the blockage, that's what they're there for. Put down the Zaxby's salad, pilot the car around the obstacle, and then reengage FSD. That simple.
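The rule layer argued for above can be sketched in a few lines. Everything here is hypothetical naming, not any real FSD interface: hard rules are plain `if` statements checked before the NN's plan is executed, and breaking a rule is never an option; the car holds and hands the exception back to the driver.

```python
# Minimal sketch of a 100%-enforced rule layer sitting above NN output.

def apply_rule_layer(nn_plan, scene):
    """Gate the NN's proposed plan through hard rules.
    Returns (action, reason); rules are enforced, never learned."""
    if scene.get("school_bus_stopped"):
        return ("HOLD", "stopped school bus: wait; driver may override")
    if nn_plan.get("crosses_double_yellow"):
        return ("HOLD", "double yellow: wait; driver may go around")
    # No rule triggered: the NN's plan passes through unchanged.
    return (nn_plan["action"], "no rule triggered")


plan = {"action": "PROCEED", "crosses_double_yellow": False}
action, why = apply_rule_layer(plan, {"school_bus_stopped": True})
```

The NN can still propose whatever it likes; the gate just guarantees the "don't ever" cases deterministically, which is exactly the behavior a 97%-accurate learned rule cannot promise.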

Having this rule layer, while perhaps precluding a true L5 autonomous driving system (which is not attainable in the near term, and certainly not with HW3, IMO), would advance FSD faster toward a polished L3 autonomous system without the never-ending chase of the "long tail of 9s" that the current development path seemingly requires.
 
It would have been clear if he simply said "replace all the hand written code with NNs". [etc]

Thank you this clarifies a lot. I always think of Elon Musk's proclamations about future product features as conjectures, not promises.
 
Yeah, but the point is you can't know what the AI has learned about a very particular situation until the AI screws it up.

Very true. I guess that's why they'll continue to put liability on the driver. They're expecting that fewer and fewer disengagements will be necessary, but they'll call it L2 forever unless they can somehow get the disengagement count to zero over many, many miles.

I'm in the camp of "that ain't gonna happen"... especially with cameras.