Armchair thoughts:

A modular NN can become one monolithic NN once completed (all layers directly stacked). The difference between the two cases is that the intermediate layers have fixed meanings in the modular case, so training backpropagates only from each boundary layer to the previous one, and GIGO applies at every boundary.
Success of a modular approach requires that the designers correctly determine what data is important to pass on and what can be discarded. (Making it difficult on themselves.)

Without the quantized intermediates, training occurs over the whole NN and validating the intermediates becomes difficult. The internal structure itself may be more uniform due to the removal of the category filters. They may be able to get some clues using an fMRI-type heat map of the NN while stimulating it: which images propagate through? The training suite would need to monitor this, as it's beyond human scale. (Making it compute heavy.)

Modular: the NN recognizes the X types of road signs, recognizes lane lines in our Y test cases, and follows rational paths in the Z scenarios.
E2E: NN followed the road safely and legally in the Z scenarios, but what data it acted on is less clear.
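
A toy sketch of that split (made-up layer sizes and labels, nothing like Tesla's actual architecture): in the modular case each module trains against human-defined labels at its boundary, so the intermediate output keeps a fixed meaning; in the E2E case one loss at the driving output backprops through the whole stack and the intermediate is whatever training finds useful, which is exactly why validating it is hard.

Python:
import torch
import torch.nn as nn

perception = nn.Linear(64, 8)   # camera-ish features -> "lane geometry" (fixed meaning if modular)
planner    = nn.Linear(8, 2)    # lane geometry -> (steer, accel)

x            = torch.randn(32, 64)
lane_labels  = torch.randn(32, 8)   # human-defined intermediate labels (modular case only)
drive_labels = torch.randn(32, 2)   # final driving labels

# Modular: two separate losses, each module validated at its own boundary (GIGO applies per boundary).
loss_perception = nn.functional.mse_loss(perception(x), lane_labels)
loss_planner    = nn.functional.mse_loss(planner(lane_labels), drive_labels)

# End-to-end: one loss at the output, backprop through both layers at once;
# the 8-dim intermediate has no assigned meaning and is hard to validate.
loss_e2e = nn.functional.mse_loss(planner(perception(x)), drive_labels)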

Here is an interesting video where Drago (Waymo) and Alex (Wayve) discuss the differences between modular and E2E:


Interestingly, Drago says the trend in ML is towards larger models. He thinks too many modules and pure E2E are extremes at opposite ends of the spectrum and argues for a middle ground: he sees many advantages in a modular approach, so he advocates for modularity, but with a small number of large models. Alex argues for E2E, saying it is simpler and more efficient since you just need data and you can automate training of one large model that does everything.

I wonder if E2E will ultimately prevail and it is just a matter of how to get there. The modular approach would reach E2E more incrementally, starting with many modules and gradually merging them into fewer, larger modules until eventually only one big module (E2E) is left. The other approach, from say Wayve, is to build and train one large E2E model from scratch.
 
It can be implemented like that, but there's no benefit. All that random indecisiveness (noise) is uncomfortable and less confidence-inspiring for passengers and roadway occupants. Imagine a pedestrian in the crosswalk as all that noise drives steering, braking, and acceleration.
Ideally, there should not be much noise as small perturbations of the scene should produce the same or similar response.
Vehicle dynamics filtering needs to occur someplace regardless. The NN can be fed current and past vehicle dynamics to do this internally (which also helps determine what path is reasonable).
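
A minimal sketch of what that could look like (all names, sizes, and the smoothing constant are invented for illustration): hand the control net a short history of vehicle dynamics along with the scene features, and low-pass filter its output so frame-to-frame noise never reaches steering or braking.

Python:
from collections import deque
import torch
import torch.nn as nn

HISTORY = 5
dyn_history = deque([torch.zeros(4)] * HISTORY, maxlen=HISTORY)  # [speed, yaw rate, accel, steer]
control_net = nn.Linear(64 + HISTORY * 4, 2)                     # scene features + dynamics history -> (steer, accel)

def control_step(scene_features, current_dynamics, prev_cmd, alpha=0.3):
    dyn_history.append(current_dynamics)
    net_in = torch.cat([scene_features, torch.cat(list(dyn_history))])
    raw_cmd = control_net(net_in)
    # Exponential moving average: small perturbations of the scene produce small command changes.
    return alpha * raw_cmd + (1 - alpha) * prev_cmd

with torch.no_grad():                  # inference-style loop with random stand-in inputs
    cmd = torch.zeros(2)
    for _ in range(10):
        cmd = control_step(torch.randn(64), torch.randn(4), cmd)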
 
With all the changes in FSD, I wonder how long before FSD surpasses HW3's capabilities and those who want full FSD will need HW4 or higher.
You might find this an interesting read.

 
Ideally, there should not be much noise as small perturbations of the scene should produce the same or similar response.
Vehicle dynamics filtering needs to occur someplace regardless. The NN can be fed current and past vehicle dynamics to do this internally (which also helps determine what path is reasonable).
I imagine Cruise and Waymo implemented vehicle control NNs and their results look pretty smooth. The trick is adding confidence to the vehicle control NN's input data. Vehicle control might only need to be updated every 100 ms or so, which is a far cry from the front end.
 
Here is an interesting video where Drago (Waymo) and Alex (Wayve) discuss the differences between modular and E2E:


Interestingly, Drago says the trend in ML is towards larger models. He thinks too many modules and pure E2E are extremes at opposite ends of the spectrum and argues for a middle ground: he sees many advantages in a modular approach, so he advocates for modularity, but with a small number of large models. Alex argues for E2E, saying it is simpler and more efficient since you just need data and you can automate training of one large model that does everything.

I wonder if E2E will ultimately prevail and it is just a matter of how to get there. The modular approach would reach E2E more incrementally, starting with many modules and gradually merging them into fewer, larger modules until eventually only one big module (E2E) is left. The other approach, from say Wayve, is to build and train one large E2E model from scratch.
A large NN makes sense. NNs are crazy efficient, but they aren't easily designed or trained. These days NN systems seem to be built by trial and error, which sucks for paying customers. For now it's easier to slice and dice the problem into bite-size, manageable chunks. Maybe in the next 5 to 10 years?
 
Vehicle control might only need to be updated every 100 ms or so, which is a far cry from the front end.

Initially this sounded a bit slow to me, but maybe not. I was considering the case of an unexpected obstacle (think a child running out into the road from behind a parked car).

If vehicle control runs at 10 Hz (100 ms), then at 30 mph the car could start braking/avoiding within 4.4 ft (after perception notices the obstacle), which is not bad. I wouldn't mind a bit better though.

Are you thinking in a modular approach the different layers might run at different frequencies, something like:

- perception (camera feed > "vector space" representation of the world): 30 Hz
- path planning and vehicle control: 10-15 Hz
- "long term" planning (things like "is now a good time to start a lane change"): could be much slower, like 3 Hz
 
You might find this an interesting read.

That's what Elon said. But he might give a one-time amnesty to upgrade HW3 to HW4 capability (not the identical HW4), like he gave amnesty to transfer FSD to a new car now. Lots of things can change Elon's mind. What Elon says is never 100% true; that's why many people bash Elon.
 
That's what Elon said. But he might give a one-time amnesty to upgrade HW3 to HW4 capability (not the identical HW4), like he gave amnesty to transfer FSD to a new car now. Lots of things can change Elon's mind. What Elon says is never 100% true; that's why many people bash Elon.
I meant the thread, which goes on for 32 pages. It discusses pretty much every idea you might come up with. But if you have a new one, I'm sure the folks there would love to read about it.

As for Elon, just ignore the man. He's what's known in literary circles as an unreliable narrator.
 
I am not sure what you mean. As I see it, Tesla has two options:

1) Start from scratch. Basically, retrain a brand new end-to-end NN from zero and when it gets good enough, replace the old stack with the new end-to-end stack. The downside of this approach is we could see a major regression in features until the new end-to-end stack catches up with the features of the old stack.

2) Try to consolidate the existing NNs until they become just one big end-to-end NN. With this approach, they might combine NNs and reduce the number of NNs over time. So they might replace parts of the stack with end-to-end NNs; for example, replace the traffic light and stop sign control with end-to-end AI, and just keep doing that until the entire stack is end-to-end. This approach would likely prevent major regressions. It would be more gradual, although it might be difficult to do since you would need to manage the pieces of the stack that are end-to-end and those that are not end-to-end yet. And I don't know if there could be unintended consequences from one part of the stack affecting another.
I would suspect that each of your noted approaches would have a financial impact, i.e., how much TSLA's auditing firm would allow for FSD revenue recognition.
 
Ideally ML scientists and everyone else should have the same definition? It makes meaningful discussion pretty hard otherwise...
Won't ever happen.

It serves tech boosters & managers to take a respected, strong engineering term of merit and then weaken its definition quietly or deceptively to promote and sell whatever compromise they have in hand at the moment. This dynamic is inevitable in human-driven capitalism. I mean, look at the firms promoting what's probably logistic regression underneath as "A.I."

Elon's #1 benefit has been his ability to sell the story to other wealthy tech bro investors. That's an important skill for a startup, as SpaceX and Tesla once were, but is no longer needed for them.

The one good thing about SpaceX is that you can't fake orbit. At all. Can't shade it or claim 'other people don't have a problem with it.'

You can't fake "did I get the customer's mass M into orbit with sufficient energy as per the contract". And as a result SX is serious and disciplined.
 
Alex argues for E2E, saying it is simpler and more efficient since you just need data and you can automate training of one large model that does everything.

"just need data" --- Ay, there's the rub

Somewhat modular systems (like humans) use transfer learning: a generic world model built from birth plus 16 years of experience and millions of years of mammalian evolution with binocular vision, and driving policy is then built on top of that.

And even still the mistakes are substantial.

Modular systems with subtasks might enable far more data to be acquired and used in training, like LLM training does: scrape every readable text in existence, then build specific policies on top.

End-to-end training, with policy loss functions backpropagating all the way to perception, would be very hard to validate and might collapse unpredictably in real-world situations only slightly outside the training-set distribution, in ways that are incomprehensible to people.

Probably the way forward would have to have aspects of both (like humans): a baseline of foundation perception and policy trained on enormous data with intermediate training objectives, then fine-tuned end-to-end on the specific necessary tasks, allowing some constrained updates of the base networks (but not so much as to cause catastrophic forgetting). Like a trained human race car driver, whose generic object perception probably is different from, and better than, that of an average off-the-street human.

This approach of course is the hardest and most expensive.
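
One simple way to sketch that "constrained update" idea (module names and learning rates are made up; this is just the per-parameter-group trick, not anyone's production recipe): fine-tune end to end, but give the pretrained base a learning rate a couple of orders of magnitude smaller than the new policy head, so it shifts a little without catastrophic forgetting.

Python:
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))  # stand-in pretrained "foundation"
head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))      # new task-specific driving head

optimizer = torch.optim.AdamW([
    {"params": base.parameters(), "lr": 1e-5},   # tiny LR: gentle, constrained updates
    {"params": head.parameters(), "lr": 1e-3},   # normal LR for the new task
])

x = torch.randn(16, 128)           # fake sensor features
target = torch.randn(16, 2)        # fake policy target, e.g. (steer, accel)

loss = nn.functional.mse_loss(head(base(x)), target)
loss.backward()                     # gradients flow end to end...
optimizer.step()                    # ...but the base moves ~100x more slowly than the head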
 
Here is an interesting video where Drago (Waymo) and Alex (Wayve) discuss the differences between modular and E2E:
There are some interesting statements regarding modules and end-to-end:

Drago: there's one simple way in which it [modules] stays -- you have intermediate outputs that you try to understand​

Modules have some values or state in the middle that is human understandable, e.g., roads and objects.


Marco: making components that are not neural net based differentiable so that they can be jointly trained with data along with other modules​

Modules can be neural networks or traditional code, and they're even working towards propagating gradients backwards through traditional code, just as with neural networks.
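
A small illustration of that idea (invented example, not Waymo's or Wayve's actual code): write the "traditional" component, here a proportional steering rule, in differentiable ops, so a loss at the very end pushes gradients back through it into an upstream network and both get trained jointly.

Python:
import torch
import torch.nn as nn

class TinyPerception(nn.Module):
    """Stand-in perception net: features -> estimated lateral offset from lane center."""
    def __init__(self, in_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, 1))
    def forward(self, x):
        return self.net(x)

class DifferentiableController(nn.Module):
    """Hand-written proportional steering law with a learnable gain.
    Written in torch ops, so gradients flow straight through it."""
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.tensor(0.5))
    def forward(self, lateral_offset):
        return -self.gain * lateral_offset        # steering command

perception = TinyPerception()
controller = DifferentiableController()
opt = torch.optim.Adam(list(perception.parameters()) + list(controller.parameters()), lr=1e-3)

features = torch.randn(64, 32)            # fake batch of scene features
expert_steering = torch.randn(64, 1)      # steering a human driver actually applied

loss = nn.functional.mse_loss(controller(perception(features)), expert_steering)
loss.backward()                           # gradients reach BOTH the net and the controller gain
opt.step()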


Drago: the question generally for modularity is "can you also train all these modules end-to-end and pass rich internal state?"​
Drago: when you should move all the way to one model say is still an open question​
Alex: the beauty of having an end-to-end stack is it's one single model​

Seems like Drago considers it end-to-end training if all the modules are neural networks and trained jointly, while Alex considers it end-to-end only if there's a single model with no intermediate output.


Alex: when we started Wayve in 2017 looking to use end-to-end learning method for autonomous driving, many folks in the industry laughed at us and said "look, no way that will work; it won't be interpretable…"​

Alex seems to imply here (and has presented elsewhere) that even a single network's intermediate state can be interpretable without explicit intermediate outputs.
 
Since we were talking about Wayve and E2E vs modules, I checked out the Wayve website. Here is their schematic for their autonomous driving stack, called Wayve Driver:

[Image: schematic of the Wayve Driver stack]


Source: Wayve Driver – AI software & fleet learning platform

We can see that they have two modules: a data module and an autonomy module. The data module collects data from the route, radar and cameras. The autonomy module takes that data and drives the car. The autonomy module is made up of a driving intelligence (the AI that determines how to drive) sitting within a safety monitor, which I am guessing ensures the AI's decisions are safe. Lastly, there is the control part that does the steering and braking. The autonomy module is E2E since it takes in sensor data and outputs steering and braking.
 
Maybe just wishful thinking, but could this mean Elon is planning on livestreaming V12? He wrote "FSD test drive" not "FSD Beta test drive."


So did the livestream happen? I can't find anything.

Elon replied this:


I am getting the feeling Elon was just trolling Zuckerberg and there was no livestream of FSD beta V12. 😟

The sad reality is that Elon seems to get caught up in these silly feuds. It undermines his credibility IMO.
 