This discussion by Douma from this July seems to disagree with the take that Tesla is necessarily hitting capacity limits, even if they are doing multi-node NN computing. Or is there a more recent take you're referring to?
James Douma on Tesla’s Fleet Data Collection Effort
Douma disagreed with green on WHY there is no longer redundancy when compute is borrowed from Node B; he did not disagree that that is the current state of things, though.
Which probably puts him in the "they will maybe fix it later on" camp, but as things stand now there is no redundancy.
I don't feel they need to do that even if they reach the limits of a single HW3 node. If they get multi-node operation working properly, as discussed in previous threads, they can run a minimal safety-critical set on the spillover node, and that would satisfy the redundancy requirements even for L4/L5 according to SAE. There's no need to fit everything into one node.
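Rough sketch of what that kind of split might look like (purely hypothetical; the node roles, network names, and failure handler below are invented for illustration and are not anything Tesla has confirmed):

```python
# Hypothetical allocation: Node A runs the full stack, Node B carries the
# overflow networks plus a minimal safety-critical set.
NODE_A = {
    "role": "primary",
    "networks": ["full_perception", "planning", "control"],
}
NODE_B = {
    "role": "spillover",
    "networks": ["overflow_perception", "minimal_safety_perception", "safe_stop_planner"],
}

def on_node_failure(failed_node: str) -> str:
    """What keeps running after a single-node failure under this scheme."""
    if failed_node == "A":
        # The minimal safety-critical set on B is assumed to be enough to
        # execute a minimal-risk maneuver (controlled stop / pull over).
        return "minimal_risk_maneuver_on_B"
    if failed_node == "B":
        # A still has the full stack, minus whatever had spilled over.
        return "continue_degraded_on_A"
    return "normal_operation"

print(on_node_failure("A"))  # minimal_risk_maneuver_on_B
```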
Except there's no evidence they can do that.
Remember, in the production code the only thing they're using NNs for is perception right now (Green reconfirmed that just yesterday).
And that's split across both sides.
If one side fails, you lose perception. How do you "fail safely" at that point?
They need the perception stack running fully on both sides to be able to do that. And if they could do that, they wouldn't be splitting it between the sides in the first place.
(And it's not like they can wait for one side to crash and then decide to spin up a bunch of extra NNs on the other to take over perception; it's too late by then.)
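To put some numbers on why a cold spin-up can't save you: the surviving node has to keep producing perception output within a single frame budget, while loading and warming up a missing set of networks takes seconds. The figures below are made-up assumptions, for illustration only:

```python
# Toy numbers: a cold start of the missing networks takes orders of magnitude
# longer than one frame budget, so failover only works if the full perception
# stack is already hot on both sides.
FRAME_DEADLINE_MS = 30      # assumed per-frame budget at roughly 33 fps
COLD_START_MS = 2_000       # assumed time to load and warm up the missing NNs

def can_take_over(hot_standby: bool) -> bool:
    """Can the surviving node keep producing perception within the frame deadline?"""
    takeover_latency_ms = 0 if hot_standby else COLD_START_MS
    return takeover_latency_ms <= FRAME_DEADLINE_MS

print(can_take_over(hot_standby=True))   # True: the other side just keeps going
print(can_take_over(hot_standby=False))  # False: the car is effectively blind for seconds
```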
The fact that HW3 could survive the failure of one side was one of the major things they hyped about it at Autonomy Day.
So, other than the idea that Tesla is just writing terrible, massively bloated code that they'll somehow add a ton MORE capability to while also massively shrinking its compute footprint, I don't see how you get above L3 (or even L2, really) without HW4 (if that's even enough, since they won't actually know until they solve it).