Karpathy talk today at CVPR 2021

Well, there are others pointing out that the recent presentations conflict with your assessment

Naah- there's one guy who doesn't appear to understand much about computers, either software or hardware, or Tesla's previously stated design intents- he's just misunderstanding Karpathy, is all.


I haven't followed both in detail so I won't jump into that (and for my point it doesn't matter which side is correct, given either version can satisfy SAE L4/L5). As for Tesla talking about redundancy early on, another comment pointed out Tesla also talked about how important radar was and has now removed it. Nothing is stopping them from changing their approach as they see fit.

Well, yes, one thing is stopping them- Nobody will approve a non-redundant L4/L5 system.

That (nearly) everyone knows and understands that is why Tesla put so much effort into designing their hardware inherently around redundancy.


ARM big.LITTLE - Wikipedia
What you describe here sounds like a heterogeneous architecture like ARM's big.LITTLE. There are huge disadvantages to doing so (it makes the architecture hugely inflexible). You aren't necessarily saving money either, given running duplicate cores gives you economy of scale (which Tesla seemed to have achieved with HW3, I remember seeing the cost is down to $190, even less than the HW2.5 solution).

The cost savings is not in the design--- indeed designing as I describe would've been CHEAPER by a fair bit since you wouldn't have needed nearly as much resource-wise on Node B.

The cost savings is in Moore's law- compute hardware gets cheaper over time, and HW3 went into production roughly two Moore's-law cycles after HW2 did.
(plus of course there's cost savings in-housing the thing in general vs having to pay a third party like Nvidia- and also from designing a specific compute solution instead of a general one)



But point being- there's no reason to design the system as they did OTHER than FULL redundancy.

If they didn't INTEND full redundancy you'd either (if you wanted to go big little) have different node hardware- which they don't.

Or if they HAD intended the nodes to share compute- you'd have instead gone with an entirely different design that wasn't so heavily isolating the two nodes... because since they DID so heavily isolate them, they're now running into significant performance and programming inefficiencies trying to force them to share compute of the same stuff, as the hardware was never meant to be used that way.



That's not an issue as long as either node can fall back to code that can achieve the "minimal risk condition". For example:
1) Node 1 crashes, Node 2 performs the steps necessary to reach minimal risk condition.
2) Node 2 crashes, Node 1 performs the steps necessary to reach minimal risk condition.
Running some of the regular driving code in both nodes does not conflict with this, given you have free computing resources available (because the other node only needs to perform the minimal risk task in the event that one crashes).
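As a minimal sketch of that symmetric fallback idea (every name, timing, and function here is made up purely for illustration- this is not Tesla's actual software):

```python
# Hypothetical sketch: each node runs its share of the driving work and watches
# the other node's heartbeat; if the peer goes silent, the survivor executes the
# small "minimal risk condition" (MRC) routine instead of the full driving stack.
import time

HEARTBEAT_TIMEOUT_S = 0.2  # made-up liveness window for the peer node


def run_normal_driving_slice():
    """Placeholder for this node's share of the normal driving workload."""
    pass


def reach_minimal_risk_condition():
    """Placeholder for the small failsafe routine (e.g. slow down, pull over, stop)."""
    print("peer node lost: executing minimal risk condition")


def drive_loop(read_peer_heartbeat):
    """Runs on either node; read_peer_heartbeat() returns the peer's last beat time."""
    while True:
        if time.monotonic() - read_peer_heartbeat() > HEARTBEAT_TIMEOUT_S:
            reach_minimal_risk_condition()  # scenario 1 or 2 above, depending on which node crashed
            return
        run_normal_driving_slice()
```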

Except that's a terrible user experience. Especially for something like robotaxis- or for when you tell your car to go somewhere without a human in it.

It means it'd fail-back to the "just stop" behavior vastly more often.


In full redundant mode if one node crashes, the car keeps driving itself just fine. No issues.

In "both nodes are needed to self drive at all" if one node (EITHER node) crashes- the self driving system fails and has to revert to stopping.


Tesla would never have designed, or intended, a system that sucks that badly.

They're only using compute on node B for the driving computer stack because they are out of compute on A.

And (and this is the only thing here you could consider "speculation" but it's supported by literally everything everyone on the Tesla side has ever said about redundancy)- they won't roll out anything they consider genuinely self-driving until they have actual redundancy that doesn't have to keep slamming on the brakes every time ONE node crashes.



No, it won't, given the "fail safely" code is much less than the code required to drive the car in normal circumstances. Say for example it takes 10% of the resources, and let's assume what you say is true (Tesla is using 100% of resources of Node 1 already or close).

They're well past 100% on node A.

That's the problem.


Here's the scenarios, just to illustrate the idea (made up numbers just to illustrate the point):
1) Duplicate code:
Node 1 100% driving code = 100% utilization
Node 2 100% driving code = 100% utilization
Tesla is SOL and needs new HW

2) "fail safely" taking 10% of resources:
Node 1 90% original driving code + 10% failsafe code = 100% utilization
Node 2 10% driving code offloaded from Node 1 + 10% failsafe code = 20% utilization
Tesla now has 80% left on Node 2 to use for other things.

You can replace the 10% assumption with another number, but you will still end up with a better situation than running duplicate code in both.
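Spelling out that arithmetic (same made-up 10% figure as above, nothing measured):

```python
# Made-up numbers from the two scenarios above, expressed as fractions of one
# node's compute. The 10% failsafe figure is the post's illustrative assumption.
DRIVING = 1.00    # full driving stack
FAILSAFE = 0.10   # assumed cost of the "fail safely" (MRC) code

# 1) Duplicate code on both nodes:
node1_dup = DRIVING               # 100% utilization
node2_dup = DRIVING               # 100% utilization -> no headroom anywhere

# 2) Failsafe-only redundancy, offloading 10% of the driving work to node 2:
node1 = (DRIVING - 0.10) + FAILSAFE   # 90% driving + 10% failsafe = 100%
node2 = 0.10 + FAILSAFE               # 10% offloaded driving + 10% failsafe = 20%
print(f"Node 2 headroom: {1.0 - node2:.0%}")  # 80% left for other things
```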


You won't though- because of the situation I described above. EITHER node crashing takes the whole self-driving system down and it has to stop the car.


That's hilariously terrible and they're not gonna roll out a system like that to production/robotaxis.

Remember how much crap Waymo got for that ONE on-video failure with a traffic cone?

Youtube'd be flooded with "Check out my Tesla robotaxi slamming on the brakes in the middle of I-95 because the computer rebooted" videos.


Plus again since the HW was never designed to share compute like this, you're losing some capacity just doing that at all.

And lastly- since we (and Tesla) still don't know how much compute you ACTUALLY NEED to do L4 or L5, they may well not -have- an extra 10% on either node to spare (or whatever the fail-code needs).


That's the thing I mention where ideal case is they can get ONE spin-up of the stack CAPABLE of L4 or L5 running, non-redundant, using the entire compute of both A and B on HW3.

If they do, then they know they can just upgrade FSD owners to HW4 and they're all good for full redundancy.


The worst case is they run out of compute on both nodes and STILL haven't solved it. Because then they still have no answer for how much do they need, and they're back to square one where HW4 may or may not be enough- and it'd be SUPER useful for them to know that before they finalize HW5 which you know will be coming.



Also note, Node 1 likely only needs to have a low demand watchdog on Node 2 and won't need to run a full copy of the failsafe code, given Node 1 would have a bulk of the driving code, which likely can already perform the failsafe function just using existing code.

There is... a lot of guesses and likelies there...

But again this produces a garbage end-user experience if any node failing knocks you out of self driving in an allegedly L4/L5 car every time.



That EU document only mentions it in a very vague way: "put in place adequate design and redundancy to cope with these risk and hazards". But this conversation is talking specifically about regulations/specifications requiring two identical computers running identical code.

Well, that's where the goalposts moved after one user insisting that NOBODY requires ANY redundancy was corrected with the EU doc anyway.


But see above for why that is, effectively, going to BE a requirement for a product anybody will approve or want to use.

It's not just Tesla that knows (and has said) this-- Even the more advanced L2 systems out there do this now- Supercruise for example

Caddy lead super cruise engineer said:
We have two central computing systems that are both running continuously so that if we have issues with one, we have the other as a backup

Likewise Nvidia offers the AP2x, which is NOT fully HW redundant and they only cert it for up to L2 driving aids.... and they offer the AGX Pegasus which they class for L4 solutions and is... two FULL SoCs and two FULL GPUs...to permit perfect redundancy because "Pulls over any time one node crashes" is a bad solution.





I see nothing so far that indicates that is a requirement in SAE's L4/L5 specification.

SAE is not a government regulatory body. They won't "require" anything, legally. They can't.

But commercially nobody is going to approve a robotaxi system that fails to MRC every time a single computer node crashes.

And I doubt the vast majority of governing bodies would approve such a system for regular consumer consumption either.


Again this won't prevent Tesla from (if they can do it within HW3s total compute available across both nodes plus the efficiency hit for running stuff that way) offering what is EFFECTIVELY an L4 or L5 system but they still only officially call it an L2 system and still require a driver (with driver monitoring).

Because in THAT case the human is there to ensure an MRC that doesn't suck nearly as badly as just "stopping on the interstate".

At which point they'd wait to upgrade the driving computer before they just flipped a switch and made the same code "L4" or whatever once it could be redundant enough to keep safely driving even with a single node failure.
 
They're well past 100% on node A.

That's the problem.

This is the part I have serious doubts about. They're running a lot of parts of the system in shadow mode, and they almost certainly can't run all of that extra functionality at this point unless they turn off the redundant failover, because they don't have 4x the processing power. But that doesn't mean they don't have a full version stack running on each side. It could easily be the case that in the event of a core failing, they have enough to run at least the non-shadow stack on one core and the shadow stack on the other core, and that the only missing redundancy is caused by the fact that they can't fully compute the base values on the fallback side or the shadow computation on the main side.


Plus again since the HW was never designed to share compute like this, you're losing some capacity just doing that at all.
Yeah, that's a big problem. Except that aren't most of the cameras each going into only one side, such that if one side goes down completely, you lose about half of the cameras? If so, then it must be designed to share large amounts of data between the two sides. It's never as efficient to do distributed computing as it is to do computing in a single CPU, but this is a massively parallel task we're talking about here, with lots of different independent parts, so it shouldn't be too bad.


And lastly- since we (and Tesla) still don't know how much compute you ACTUALLY NEED to do L4 or L5, they may well not -have- an extra 10% on either node to spare (or whatever the fail-code needs).

It's anybody's guess as long as they're still running lots of extra bits in shadow mode, which probably limits how much you can optimize things.

My guess is that they were doing stuff across cores as a quick hack to get things running without worrying about optimizing out all of the shadow mode behavior or shrinking the less-critical NNs.


The worst case is they run out of compute on both nodes and STILL haven't solved it. Because then they still have no answer for how much do they need, and they're back to square one where HW4 may or may not be enough- and it'd be SUPER useful for them to know that before they finalize HW5 which you know will be coming.
I'm assuming that the only reason they gave HW3 upgrades to all those HW2 and HW2.5 people is that they think they're close, and that it has enough computing power. I would be really surprised if they have to upgrade all those cars again, much less more than once. I certainly can't think of any good reason to upgrade that small percentage of cars to new hardware prematurely, so I'd expect them to drag their heels until they were pretty sure.


Again this won't prevent Tesla from (if they can do it within HW3s total compute available across both nodes plus the efficiency hit for running stuff that way) offering what is EFFECTIVELY an L4 or L5 system but they still only officially call it an L2 system and still require a driver (with driver monitoring).

Because in THAT case the human is there to insure an MRC that doesn't suck nearly as badly as just "stopping on the interstate"

At which point they'd wait to upgrade the driving computer before they just flipped a switch and made the same code "L4" or whatever once it could be redundant enough to keep safely driving even with a single node failure.
Which I'm assuming is where they *thought* they were when they decided to upgrade everybody to HW3.
 
It's an indication they couldn't manage it with the compute available.... and it answers the question folks asked back in 2019 of "If HW3 is so great, why did they admit they were already working on a 3x more powerful HW4?"
At Autonomy Day, the question was asked from the audience, essentially asking what value HW4 could add beyond HW3. I recall an engineer mumbling about better "safety" to Elon as a prompt to get him to dial back on what it might be able to do. Even if the sensors that go with it are not changed much, or the software feature set remains the same, I can imagine that 3X speed (faster frame rates, etc.) can help in this regard. I'm one of the silent majority who happily accepts that current Teslas may never become robotaxis and did not buy one for this purpose.
 
Well, yes, one thing is stopping them- Nobody will approve a non-redundant L4/L5 system.
That is only opinion/speculation. There are already counterexamples, for example states like Arizona which passed a self driving law that cares nothing about the details of how the L4 vehicles operate, much less redundancy.
That (nearly) everyone knows and understands that is why Tesla put so much effort into designing their hardware inherently around redundancy.

The cost savings is not in the design--- indeed designing as I describe would've been CHEAPER by a fair bit since you wouldn't have needed nearly as much resource-wise on Node B.

The cost savings is in Moore's law- compute hardware gets cheaper over time, and HW3 went into production roughly two Moore's-law cycles after HW2 did.
(plus of course there's cost savings in-housing the thing in general vs having to pay a third party like Nvidia- and also from designing a specific compute solution instead of a general one)

But point being- there's no reason to design the system as they did OTHER than FULL redundancy.

If they didn't INTEND full redundancy you'd either (if you wanted to go big little) have different node hardware- which they don't.

Or if they HAD intended the nodes to share compute- you'd have instead gone with an entirely different design that wasn't so heavily isolating the two nodes... because since they DID so heavily isolate them, they're now running into significant performance and programming inefficiencies trying to force them to share compute of the same stuff, as the hardware was never meant to be used that way.
They may intend for something at the beginning and then when conditions change they can change their minds (just like they did with radar). I'm just saying they have an option of doing so (not necessarily that they have done so already).
Except that's a terrible user experience. Especially for something like robotaxis- or for when you tell your car to go somewhere without a human in it.

It means it'd fail-back to the "just stop" behavior vastly more often.


In full redundant mode if one node crashes, the car keeps driving itself just fine. No issues.

In "both nodes are needed to self drive at all" if one node (EITHER node) crashes- the self driving system fails and has to revert to stopping.


Tesla would never have designed, or intended, a system that sucks that badly.

They're only using compute on node B for the driving computer stack because they are out of compute on A.

And (and this is the only thing here you could consider "speculation" but it's supported by literally everything everyone on the Tesla side has ever said about redundancy)- they won't roll out anything they consider genuinely self-driving until they have actual redundancy that doesn't have to keep slamming on the brakes every time ONE node crashes.

They're well past 100% on node A.

That's the problem.


You won't though- because of the situation I described above. EITHER node crashing takes the whole self-driving system down and it has to stop the car.


That's hilariously terrible and they're not gonna roll out a system like that to production/robotaxis.

Remember how much crap Waymo got for that ONE on-video failure with a traffic cone?

Youtube'd be flooded with "Check out my Tesla robotaxi slamming on the brakes in the middle of I-95 because the computer rebooted" videos.


Plus again since the HW was never designed to share compute like this, you're losing some capacity just doing that at all.

And lastly- since we (and Tesla) still don't know how much compute you ACTUALLY NEED to do L4 or L5, they may well not -have- an extra 10% on either node to spare (or whatever the fail-code needs).


That's the thing I mention where ideal case is they can get ONE spin-up of the stack CAPABLE of L4 or L5 running, non-redundant, using the entire compute of both A and B on HW3.

If they do, then they know they can just upgrade FSD owners to HW4 and they're all good for full redundancy.


The worst case is they run out of compute on both nodes and STILL haven't solved it. Because then they still have no answer for how much do they need, and they're back to square one where HW4 may or may not be enough- and it'd be SUPER useful for them to know that before they finalize HW5 which you know will be coming.





There is... a lot of guesses and likelies there...

But again this produces a garbage end-user experience if any node failing knocks you out of self driving in an allegedly L4/L5 car every time.
But at the moment even with zero redundancy (according to your claim that they already are doing some other code in Node 2) we haven't seen full crashes of a node happen. So it's going to be something that's very rare, not something that happens regularly in a drive.
Well, that's where the goalposts moved after one user insisting that NOBODY requires ANY redundancy was corrected with the EU doc anyway.
Well this was the exchange that prompted the EU doc:
- all of this continues to get vastly far afield of the fact the current production firmware....(and production firmwares going back to at least mid-2020) have exceeded the available compute on a single node of HW3.

Meaning they can't run the nodes redundantly.

Meaning with the current architecture and compute needs of the system- HW3 can not support L3+ driving.
Where is it stated that L3+ requires redundant compute resources? (There are so many single points of failure in a vehicle that it really doesn't make much sense to require that.)
1) It's already established that L3 does not require any redundancy (driver can take over).
2) Even if you move the goal over to L4/L5, the EU doc makes no reference to the "full" redundancy you are arguing is required. At most it suggests some sort of redundancy, but it only says "adequate". And the MRC backup is considered adequate for SAE.

But see above for why that is, effectively, going to BE a requirement for a product anybody will approve or want to use.

It's not just Tesla that knows (and has said) this-- Even the more advanced L2 systems out there do this now- Supercruise for example


Likewise Nvidia offers the AP2x, which is NOT fully HW redundant and they only cert it for up to L2 driving aids.... and they offer the AGX Pegasus which they class for L4 solutions and is... two FULL SoCs and two FULL GPUs...to permit perfect redundancy because "Pulls over any time one node crashes" is a bad solution.

SAE is not a government regulatory body. They won't "require" anything, legally. They can't.
But governments are likely going to use that document as a basis for standards, they aren't going to reinvent the wheel.
But commercially nobody is going to approve a robotaxi system that fails to MRC every time a single computer node crashes.

And I doubt the vast majority of governing bodies would approve such a system for regular consumer consumption either.


Again this won't prevent Tesla from (if they can do it within HW3s total compute available across both nodes plus the efficiency hit for running stuff that way) offering what is EFFECTIVELY an L4 or L5 system but they still only officially call it an L2 system and still require a driver (with driver monitoring).

Because in THAT case the human is there to ensure an MRC that doesn't suck nearly as badly as just "stopping on the interstate".

At which point they'd wait to upgrade the driving computer before they just flipped a switch and made the same code "L4" or whatever once it could be redundant enough to keep safely driving even with a single node failure.
Well at this moment it's only speculation that "nobody" will approve of such L4/L5 systems, there are no laws indicating yet such systems would be disallowed. I'm just saying under SAE they are perfectly valid for L4/L5, the "full" redundancy you talk about is not required to qualify.
 
They're only using compute on node B for the driving computer stack because they are out of compute on A.
Just because additional compute is needed from node B doesn't mean node A compute is insufficient for FSD (with full node B redundancy). Tesla is running multiple tasks on node A such as regular production Autopilot, local FSD shadow mode, remote neural networks for triggers and potentially legacy networks that have accumulated over time due to not needing to worry about compute budget.

I would guess, as usual, there's prioritization of limited developer resources with the specialized understanding to implement the low-level firmware that would make node B as featureful as node A, trading the short-term benefit of using this extended compute to support multiple tasks against the long-term need to support only the single FSD task with full redundancy.
 
Consider what the MCU2/AP3 does today if you get the occasional MCU/IC reboot. All screens are dark, though it seems some aspects of NAV continue to work. It might revert to a basic state- guessing simple lane-keeping, basic lights/sign recognition, and obstacle avoidance- with no navigation, lane changes, or convenience things. What seems like 5 minutes later, when everything reconnects, it picks back up. I don't know if the rules require an "immediate full functionality access" time frame in their redundancy, or rather an immediate "safe mode" with full functionality 5 minutes later? Both are arguably technically "redundant" but differ on the time frame. No?
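Purely to illustrate the distinction being asked about (immediate "safe mode" versus full functionality some minutes later), a sketch with made-up states and timings:

```python
# Made-up states and timings, only to illustrate "immediate safe mode, full
# functionality later" as one possible reading of "redundant".
from enum import Enum, auto


class Mode(Enum):
    REBOOTING = auto()
    SAFE = auto()   # basic lane-keeping, lights/sign recognition, obstacle avoidance
    FULL = auto()   # navigation, lane changes, convenience features


def mode_after_mcu_reboot(seconds_since_reboot: float, full_recovery_s: float = 300) -> Mode:
    """Safe mode (almost) immediately; full functionality once everything reconnects."""
    if seconds_since_reboot < 1:
        return Mode.REBOOTING
    if seconds_since_reboot < full_recovery_s:
        return Mode.SAFE
    return Mode.FULL


print(mode_after_mcu_reboot(30))   # Mode.SAFE
print(mode_after_mcu_reboot(600))  # Mode.FULL
```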
 
If they didn't INTEND full redundancy you'd either (if you wanted to go big little) have different node hardware- which they don't.

...

Except that's a terrible user experience. Especially for something like robotaxis- or for when you tell your car to go somewhere without a human in it.

It means it'd fail-back to the "just stop" behavior vastly more often.
...
In full redundant mode if one node crashes, the car keeps driving itself just fine. No issues.

In "both nodes are needed to self drive at all" if one node (EITHER node) crashes- the self driving system fails and has to revert to stopping.
I've lost the plot here. But there are a couple of things that I'd like to highlight.

Firstly, I believe, and I think you agree, that design intent didn't survive contact with reality. So currently, no redundancy is possible.

How Tesla goes forward is the question. The most probable course of action I can see is that Tesla will provide HW4/5 to people to whom L4/5 is promised. And only them. Pre March '19 buyers of FSD. Tesla made ~200K cars in 2018, so by March of 2019 there were ~300K-400K cars produced capable of FSD. With an FSD take rate of 25-50%, we're talking 80K-200K cars. A lot less than the more than 1M Teslas currently prowling this earth, soon to be 2M by the end of the year.

If these people are indeed promised L4/L5, they will get new HW upgrade for free for redundancy.
But the other 70%+ won't get free HW update, it will be at cost, or cost+.
They'll however get equivalent FSD features without redundancy, and auto-pilot that shuts down/stops/panics if one core fails. After all how often does that happen?

Of course, I can't know for sure, but considering Tesla changed the FSD language in Mar 2019, I assume this was the beginning of hedging their bets to not take on billion(s) of dollars of liability. And I've been in some higher-up positions in companies - I could absolutely see this kind of conversation happening, resulting in the options I outlined above... It is speculation though.
 
Lotta posts today- I'll consolidate replies in one place to avoid spamming with posts, and to save the carpal tunnel for the Mikes that disagree with everything I say :)


I've lost the plot here. But there are a couple of things that I'd like to highlight.

Firstly, I believe, and I think you agree, that design intent didn't survive contact with reality. So currently, no redundancy is possible.

Yes.

That is, in fact, the primary point here.

There's no longer enough compute to provide redundancy in HW3.

Which is no issue at all for an L2 system- but a potentially huge one for higher levels.

(and there are also significant implications for HW4 and potentially HW5, as discussed in previous posts too.)


How Tesla goes forward is the question. The most probable course of action I can see is that Tesla will provide HW4/5 to people to whom L4/5 is promised. And only them. Pre March '19 buyers of FSD. Tesla made ~200K cars in 2018, so by March of 2019 there were ~300K-400K cars produced capable of FSD. With an FSD take rate of 25-50%, we're talking 80K-200K cars. A lot less than the more than 1M Teslas currently prowling this earth, soon to be 2M by the end of the year.

If these people are indeed promised L4/L5, they will get new HW upgrade for free for redundancy.
But the other 70%+ won't get free HW update, it will be at cost, or cost+.

Entirely possible, yes.

I've mentioned for a while the pre-3/19 buyers got different (greater) specific promises- so that's certainly one possible route.

That said... the post-3/19 folks also paid quite a bit more, and Tesla has fairly insane amounts of cash on hand-- combined with the fact Tesla has said the price would keep going up, it's not at all inconceivable a 3->4 upgrade will be baked into the cost.



This is the part I have serious doubts about. They're running a lot of parts of the system in shadow mode, and they almost certainly can't run all of that extra functionality at this point unless they turn off the redundant failover

Absolutely.

But they COULD put all that extra on Node B if that was the issue.

The only reason to put active driving stack stuff on BOTH nodes is one can't provide enough compute to do so.



Yeah, that's a big problem. Except that aren't most of the cameras each going into only one side, such that if one side goes down completely, you lose about half of the cameras?

...no?

That wouldn't make any sense, given the system was designed to run the entire stack on ONE node, and have the second node ALSO running the entire stack.

That'd be impossible if each node only got half the cameras ever.



Just because additional compute is needed from node B doesn't mean node A compute is insufficient for FSD (with full node B redundancy). Tesla is running multiple tasks on node A such as regular production Autopilot, local FSD shadow mode, remote neural networks for triggers and potentially legacy networks that have accumulated over time due to not needing to worry about compute budget.

So first- shadow mode primarily is triggers.

But that aside-as I said- if "all" the actual driving stack could fit in Node A, they'd put it there.

Then use Node B for all the "extra" stuff.

Because the fact they are independent architectures makes that the only way to do it that isn't super wasteful of resources.


That's not what happened though.

They've got ACTIVE parts of the driving stack on BOTH nodes, and the ONLY reason to do that is because you can't get enough compute from a single node to run it.


Again Green himself makes exactly that same point.




I'm assuming that the only reason they gave HW3 upgrades to all those HW2 and HW2.5 people is that they think they're close, and that it has enough computing power. I would be really surprised if they have to upgrade all those cars again, much less more than once. I certainly can't think of any good reason to upgrade that small percentage of cars to new hardware prematurely, so I'd expect them to drag their heels until they were pretty sure.


As to why Tesla would have upgraded folks to HW3, simple: they thought it was enough 2 years ago (and they already had SOME features nearly ready to go, the stop sign/stoplight behavior for example).

HW4 is a hedge that it's not... but as Elon said just today, this turns out to be a much harder problem than he originally thought it was.



Which I'm assuming is where they *thought* they were when they decided to upgrade everybody to HW3.

When HW3 was designed they took a guess on how much compute they'd need.

They've entirely changed the design of the software multiple times since then- and their guess was wrong.

If Tesla can get L4 or L5 working using ALL the compute of BOTH nodes (with no redundancy, so they only officially call it L2)- then they know they're safe upgrading folks to HW4 and it'll all work great.

Right now though they have no idea if that'll be possible or not--- until they do it.





That is only opinion/speculation. There are already counterexamples, for example states like Arizona which passed a self driving law that cares nothing about the details of how the L4 vehicles operate, much less redundancy.

That's not really a counter-example.

They just YOLOed "Hey if you say your car can drive itself- cool! do it!"

A few other US states have as well.

The rest have not- and are unlikely to do anything of the sort.

The EU for sure won't. Again they already call out redundancy as a requirement even if they haven't gotten down to the nitty gritty yet.


But at the moment even with zero redundancy (according to your claim that they already are doing some other code in Node 2)

Not my claim. @verygreen made the claim.

And has stated it's been the case since at least mid-2020.

I'm unaware of any of the few other known folks with root access to tesla computers saying otherwise.

In fact, at least one well known AI guy, James Douma, confirmed it was true, with Green remarking "@jamesdouma profiled the NNs and they are too big to run at full fps on a single NPU"

James himself confirmed he did said profiling, but didn't say anything further about what he felt the implications of that fact were.

It's always possible Tesla will somehow magically have some amazing NN breakthrough where they figure out they can replace a TON of code with a lot less of it- but for over a year now they've been increasing requirements and adding more code and more, and larger, NNs in an attempt to solve vision, and they're still not there yet.




Well at this moment it's only speculation that "nobody" will approve of such L4/L5 systems

And speculation that anyone else would.

I think the fact everyone ELSE working on L4+ driving mentions redundancy numerous times in their system descriptions suggests the folks with the most knowledge of likely regulation are speculating that it will be required though.

Tesla included based on the remarks from all 3 major players at Autonomy day.
 
shadow mode primarily is triggers.
I was enumerating the various types of neural networks being processed. Yes, shadow mode triggers can use both local networks and remote networks, but each additional network uses up some compute budget. The remote networks include those for new prototype tasks that Karpathy has mentioned in previous years in the context of Operation Vacation; these are generally smaller so triggers can be iterated on quickly without needing a full software update. It's unclear how many of these remote networks can be active at a time, and I wouldn't be surprised if they are hardcoded to load only on node A.
But that aside-as I said- if "all" the actual driving stack could fit in Node A, they'd put it there.
Then use Node B for all the "extra" stuff.
Green suggests node B is not as featureful as node A, notably lacking triggering for "extra" stuff, and that's what I was getting at in terms of new low-level firmware to get them to feature parity, but that has trade-offs and implementation complexity. As Green also suggests, there shouldn't be technical limitations preventing it, but it does require some engineering resources. And perhaps it was easier to offload some driving tasks to node B than to get node B to support triggers.
 
Given the physical HW is identical on A/B I'd be.... VERY surprised... if it was easier to do multi-node split compute, as they are doing now, than to just enable triggers on node B.

Especially since making the HW that is node B do triggers would be essentially the same code they already have for A... while the multi-node splitting is all new stuff they're having to make work on the fly.

But to bring up an even more obvious reason that can't be what's going on... If Node B can't do triggers, but CAN do driving tasks (which must be true if they're now running some of the driving code there), why wouldn't you then move ALL the driving stuff to B and leave all the rest in A?

So again you'd be able to avoid needing to figure out multi-node split compute and the inefficiencies that comes with.


The only reason you wouldn't is... you can't get ALL the driving stuff into one node anymore.

He also has noted that part of the cross-node driving code stuff has caused them to actually drop framerates on cameras that used to run full frame (which was a problem with HW2.x that 3 originally solved when it still had compute to spare)...or in some cases entirely eliminate using certain cameras as inputs to certain NNs-- so it appears they're getting measurably degraded performance being forced to split code across nodes like this-- doesn't seem like they'd make that choice if they had any option to avoid it and run all the active driving stuff in either single node.
 
In fact, at least one well known AI guy, James Douma, confirmed it was true, with Green remarking "@jamesdouma profiled the NNs and they are too big to run at full fps on a single NPU"
Ah. I see the problem. You're underestimating the compute power on the HW3 by more than a factor of two. The HW3 computer consists of:
  • A dual-core lockstep processor that compares the outputs from the two sides of the board.
  • Two SoCs (UBQ01B0), each of which contains:
    • Three quad-core ARM CPUs
    • A GPU
    • Two NPUs
Here's the board and info about the chip: [images omitted]

So each HW3 board contains four NPUs, not two.
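Putting that layout into a small data structure just to make the counting explicit (counts are taken from the description above, not from a datasheet):

```python
# HW3 layout as described above; the counts come from that description.
HW3_BOARD = {
    "lockstep_safety_cpu_cores": 2,   # dual-core lockstep comparator
    "nodes": [                         # two SoCs (node A and node B)
        {"cpu_clusters": 3, "cores_per_cluster": 4, "gpus": 1, "npus": 2},
        {"cpu_clusters": 3, "cores_per_cluster": 4, "gpus": 1, "npus": 2},
    ],
}

total_npus = sum(n["npus"] for n in HW3_BOARD["nodes"])
total_cores = sum(n["cpu_clusters"] * n["cores_per_cluster"] for n in HW3_BOARD["nodes"])
print(total_npus, total_cores)  # 4 NPUs and 12 ARM cores per board
```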

What they're saying, then, is that it is not possible to run both the main stack and a second complete shadow-mode stack simultaneously on either side using only the NPUs. So for shadow mode, they either have to make up the difference using some combination of the GPU and some of the twelve CPU cores or they have to split the shadow-mode stack across the two sides.

This doesn't mean they're past the point of being fully redundant — just that they're more than halfway there, ignoring the CPU and GPU capacity. :D
 
Ah. I see the problem. You're underestimating the compute power on the HW3 by more than a factor of two

well no- I'm quoting Douma's post confirming Green's remarks.

You'd need to speak with him for clarification.



What they're saying, then, is that it is not possible to run both the main stack and a second complete shadow-mode stack simultaneously on either side using only the NPUs. So for shadow mode, they either have to make up the difference using some combination of the GPU and some of the twelve CPU cores or they have to split the shadow-mode stack across the two sides.

Except, again, no-- because again that is not what they are actually doing

The driving code is being split between the two sides.

Which the system wasn't designed to do- hence why they're now dropping frames, and sometimes entire camera inputs, to deal with the compute shortfall and the inefficiencies of splitting across nodes.


If they could get all the active driving code in a single node they would- because it's the objectively better way to run things on this hardware, and doesn't require inventing a bunch of extra code (and inefficiencies) to be able to split active driving work across nodes.
 
But to bring up an even more obvious reason that can't be what's going on... If Node B can't do triggers, but CAN do driving tasks (which must be true if they're now running some of the driving code there), why wouldn't you then move ALL the driving stuff to B and leave all the rest in A?
The neural networks controlling driving behavior are also used for data collection triggers, so "just" moving that to node B would additionally require getting triggers working on node B. Most likely this is engineering resources and product prioritization to get new functionality working on their custom hardware.

Maybe getting either driving or trigger behaviors working was a similar amount of work, so there was a decision that getting dual-node driving behavior was more desirable anyway- perhaps as a step towards fully redundant driving, as well as allowing >50% compute utilization for driving while humans are supervising. This provides both short-term and long-term driving benefits with the assumption that the data-collection-related networks can stay under 50% (at least for now).
 
The neural networks controlling driving behavior are also used for data collection triggers, so "just" moving that to node B would additionally require getting triggers working on node B.

Most likely this is engineering resources and product prioritization to get new functionality working on their custom hardware.


This doesn't really add up though.

The engineering/coding to get triggers to work on a node is already done.

Node A is running that code today.

Versus borrowing compute from a separate independent node, which has to be figured out from scratch and doesn't work great.


I can't conceive of how bad every bit of their design and code would have to be for adding triggers to B not to be at least an order of magnitude simpler than cobbling together cross-node compute functionality.



Maybe getting either driving or trigger behaviors working was a similar amount of work, so there was a decision that getting dual-node driving behavior was more desirable anyway- perhaps as a step towards fully redundant driving, as well as allowing >50% compute utilization for driving while humans are supervising. This provides both short-term and long-term driving benefits with the assumption that the data-collection-related networks can stay under 50% (at least for now).


Now see- if they concluded "We simply CAN NOT get enough compute out of Node A to finish FSD" then it makes sense to dump engineering effort into writing cross-node stuff to be able to use the compute on both nodes for driving tasks.

Because there's simply no other option at all (other than wait for HW4 I guess).



So again, the simplest explanation is generally correct- and "One node lacks enough compute to do the driving-needed work" is by far the simplest explanation for what is being seen.

Also the one that doesn't require the Tesla team to have been basically incompetent in coding and design BEFORE this point- but simply requires them to have not known for sure how much compute you need to get FSD working.
 
Except, again, no-- because again that is not what they are actually doing

The driving code is being split between the two sides.

That makes very little sense. Why would it be being split between the two sides if it just barely won't fit in half the tensor processing capacity of one side?

It seems more likely that what they're doing is splitting the processing between the two neural processors on the same side. That's still a lot of work, because I'd imagine the original assumption was that each NPU would be doing work that was largely independent of what the other side was doing (e.g. one side dealing with road signs and lane keeping, the other side dealing with cars and pedestrians), and that there would be little coordination between the two.

The move to a BEV with lots of NN-based front end processing would have meant throwing that plan out the window, requiring a lot of effort to more closely coordinate the work of the two neural processors. That's still basically doing parallel computing with two independent processors, but it doesn't affect redundancy — only code complexity.
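As a toy picture of that kind of split (whether across the two NPUs on one side or across the two nodes), here's a sketch of packing per-frame network workloads into two fixed compute budgets- every number and task name is made up:

```python
# Toy illustration: greedily assign made-up per-frame workloads to whichever of
# two accelerators currently has the most headroom; if something doesn't fit,
# the only options are reducing frame rate or dropping inputs.
TASKS = {"bev_backbone": 55, "objects": 25, "lanes_signs": 15, "planner_aux": 20}
BUDGETS = [60, 60]  # arbitrary units of per-frame compute per accelerator


def greedy_split(tasks, budgets):
    load = [0] * len(budgets)
    placement = {}
    for name, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        i = min(range(len(budgets)), key=lambda j: load[j])  # most headroom first
        if load[i] + cost > budgets[i]:
            raise RuntimeError(f"{name} doesn't fit: cut frame rate or drop an input")
        load[i] += cost
        placement[name] = i
    return placement, load


print(greedy_split(TASKS, BUDGETS))
# ({'bev_backbone': 0, 'objects': 1, 'planner_aux': 1, 'lanes_signs': 1}, [55, 60])
```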
 
That makes very little sense. Why would it be being split between the two sides if it just barely won't fit in half the tensor processing capacity of one side?

That's the point.

It won't fit in the entire node.

You're taking one phrase Douma said about NN scans from a while ago and acting like it tells the entire story of what's happening while ignoring what's actually running and the resources being used today.


It seems more likely that what they're doing is splitting the processing between the two neural processors on the same side.

No. Green is seeing this running on both actual nodes

This isn't what is "likely" it's what is observably happening.

That's why the "must just be the extra shadow work causing it" doesn't hold up.

There's no reason to split actual driving work between nodes other than you're out of compute on a single full node to do it.
 
That's why the "must just be the extra shadow work causing it" doesn't hold up.

There's no reason to split actual driving work between nodes other than you're out of compute on a single full node to do it.
Sure there is. If it is almost full, it could be easier to split the driving work than to take some larger piece of data that is an input to both the main and shadow parts of the stack and pass that to both sides so that the shadow stuff can run on one side and the main code on the other.
 
Sure there is. If it is almost full, it could be easier to split the driving work than to take some larger piece of data that is an input to both the main and shadow parts of the stack and pass that to both sides



The sensor inputs already GO to both sides.

That's built into the design.

(which it obviously would have to be for redundancy to work--If only one side took the input then had to pass it all to the other and that first side failed you'd be screwed)

Further- because they're having to split the active driving work- the stuff they're passing NOW (ie NOT just the raw inputs from the sensors) is being hit hard, to the point they're having to reduce the framerates they're processing during active driving, and in some cases entirely dropping some camera inputs.

it's kind of remarkable the mental gymnastics folks keep going through to avoid the simplest, most obvious, reason they're using compute on both nodes for active driving work.

A single node lacks the compute to do it anymore (and has since they began borrowing node B compute mid-2020).

Every other idea offered so far either ignores the hardware design, requires Teslas programmers to have been incompetent at designing the software in the first place (or to be incompetent now at picking an objectively worse way of splitting load when they have better options), or some combination of both.
 
Sure there is. If it is almost full, it could be easier to split the driving work than to take some larger piece of data that is an input to both the main and shadow parts of the stack and pass that to both sides so that the shadow stuff can run on one side and the main code on the other.

The sensor inputs already GO to both sides.
By "larger piece of data", I meant some large output from some NN that takes its input from the BEV or whatever, not a sensor input.
 
By "larger piece of data", I meant some large output from some NN that takes its input from the BEV or whatever, not a sensor input.


But since the original inputs are the sensors, and that input goes equally to both nodes, then if the idea that they CAN fit -just the driving stack- in a single node is correct, the other node running shadow stuff could run whatever NNs they need to spit out that output within that same node.

There's no need at all to pass large chunks of anything in real time between nodes unless you don't have enough compute for actively driving in a single node.

The way they're doing it NOW involves passing large chunks because active driving code is running on both nodes. That's the worst way possible to do things if you have any other choice.... (as evidenced by their needing to cut back frame rates and in some cases drop entire cameras from being processed as a result of this).