
Karpathy talk today at CVPR 2021

Of course, they can change that design and still achieve L4/L5.
I will note the exact wording quoted above says "can" not "must":
"With respect to redundancy, absolutely you can run basically the copy of the network on both, and that is actually how it's designed to achieve a level 4 or level 5 system that is redundant"

Note, as I pointed out in the other comment, the SAE L4/L5 specifications do not require the fallback in an ADS failure to be identical to the main ADS, only that it can reach the "minimal risk condition". That means you can run different code in the backup system.


What backup system?

They only have 2 nodes.

They are using BOTH for the primary ADS now.


I also find it pretty grasping-for-strawsish to decide the multiple times Elon, Karpathy, and Bannon all made a big deal out of having full redundancy for FSD were all just "can" not "must" discussions...

The physical design of the chip, at many many specific points, is intended for A/B full failover.

If that wasn't the thing they actually needed the FSD computer would look entirely different.

The HW design makes no sense for anything else.

That they've found they're now unable to get it working is not an indication they just decided later they didn't need it--- a LOT of engineering work went into that as a NEEDED thing-- otherwise you don't get that design in the first place.

It's an indication they couldn't manage it with the compute available.... and it answers the question folks asked back in 2019 of "If HW3 is so great, why did they admit they were already working on a 3x more powerful HW4?"

That answer being they knew they might not be able to do it with HW3 even if they hoped they could.... (just as they hoped it but couldn't with HW2 and HW2.5)



This isn't a failure on Tesla's part, so it's weird people keep bending over backward to try and find excuses.

Nobody knows how much compute you need for FSD- so the fact Tesla has taken several guesses now, and underestimated every time, isn't surprising or a failure of any sort.... and the fact they were already working on HW4 2 years ago tells you it's not even THAT unexpected on their end.


And I'd bet you a fair bit of $ HW4 is also explicitly designed for full failover redundancy.

Because you're not getting L4/L5 approved in a lot of places without that. Something Tesla knew, and kept pointing out they were aiming for, years ago.

They still are.
 
Rob appears to be saying the opposite of that below- explicitly insisting it's NOT running the same code on both.
Yes, I have heard, over time, different stories. One is that they both run the same code and cross-check as I noted (though how they decide which is right without a 3rd processor as a tie-break is not clear). Another is that one is the master and the other a backup running as a slave, ready to take over (again, however, what triggers this take-over is not clear). Finally, a few people are saying they are now trying to run different tasks on each processor.

(And I'm using "processor" here very abstractly; I'm not at all sure of the overall architecture and/or what resources are shared among the processors and/or what mechanisms exist for communication between them.)
 



FWIW, that third one (using the second node as extra compute for the first) is what it's actually doing today in released firmware.

I've seen the cross-check story SAID, but to my knowledge nobody ever found that it actually did that in production firmware- and as you note the explanation doesn't even make much sense without a third node for tie breaking.
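A toy illustration of that tie-breaking point (hypothetical code, nothing to do with Tesla's actual firmware): with only two nodes, a disagreement has no majority and is unresolvable, while a third node lets the faulty one be outvoted.

```python
from collections import Counter

def cross_check(outputs):
    """Pick the strict-majority output among redundant nodes.

    Returns None when no strict majority exists -- which is always
    the case for two disagreeing nodes, hence the need for a third
    node as a tie-breaker.
    """
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count > len(outputs) / 2 else None

# Two nodes disagreeing: no way to tell which one is faulty.
print(cross_check(["steer_left", "steer_right"]))                 # None
# Three nodes: the faulty one is simply outvoted.
print(cross_check(["steer_left", "steer_left", "steer_right"]))   # steer_left
```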

Until near end of 2019 Node B did literally nothing at all.

19.40.50.1 appears to be the first version where they ran an exact dupe of node A (you know- for redundancy- the thing everyone on autonomy day kept saying it was for :p)

Then sometime mid 2020 that went away in favor of using Node B for extra compute since node A was maxed out and they had no other choice.
 
Then sometime mid 2020 that went away in favor of using Node B for extra compute since node A was maxed out and they had no other choice.
Maxed out doing what?
The claim (read: assumption) from you and green seems to be that they maxed out running the normal NN stack for Autopilot/FSD - and thus will need a new, more powerful computer -- HW4 -- and if they introduce HW4 they will in turn have to retrofit the existing fleet.


That is easily proven false/wrong/delusional via Karpathy's latest presentation at CVPR, where he says they ran 7 rounds of shadow mode on the fleet in order to get Tesla Vision released into production.

Things running in shadow mode are using the compute on those chips in conjunction with the Autopilot/FSD NN stack that is used for operating the car, so the assumption/speculation that HW3 is out of compute power for the standard Autopilot/FSD stack is premature at best.

Neither you nor green knows the breakdown -- how much processing power is being used for production NN stack vs the campaigns running in shadow mode.

Please don't tell me you think they can run stuff in parallel (for shadow mode) without using any processing power of the FSD computer!!!
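One common reading of "shadow mode" can be sketched like this (all names hypothetical; what Tesla's firmware actually does is disputed later in this thread): a candidate stack processes the same frames as the production stack- so it genuinely consumes compute- but only the production output ever drives the car, and disagreements are merely logged for reporting.

```python
def drive_loop(frames, production_stack, shadow_stack, log):
    """Run a candidate stack in shadow mode alongside production.

    Both stacks process every frame (so the shadow stack really does
    consume compute), but only the production output is acted on.
    Disagreements are logged for later upload, never used for control.
    """
    for frame in frames:
        control = production_stack(frame)   # this output drives the car
        candidate = shadow_stack(frame)     # this output is only compared
        if candidate != control:
            log.append((frame, control, candidate))
        yield control

# Toy stacks: production brakes for an obstacle, the candidate does not.
prod = lambda f: "brake" if f == "obstacle" else "cruise"
cand = lambda f: "cruise"
disagreements = []
actions = list(drive_loop(["clear", "obstacle"], prod, cand, disagreements))
print(actions)             # ['cruise', 'brake'] -- candidate never drove
print(len(disagreements))  # 1 -- the 'obstacle' frame is reported back
```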
 
L3 does not allow you to be a sleeping passenger in the back seat, you must be ready to take over in seconds. I know there are auto journalists that made this suggestion, but it's completely wrong and dangerous.

Blame the SAE for hiding all of their actual information behind a paywall, and only providing useless little dribbles of information to the general public, which is why we're all still arguing about what these levels mean after... how many years?

The real problem is that level 3 doesn't actually exist. Either A. the driver has to maintain enough situational awareness to take over immediately, in which case the driver is actually driving, and it's really level 2, or B. the driver doesn't, and if the vehicle can keep driving for long enough for the driver to take over safely, it can also bring the vehicle to a safe stop, in which case it's level 4 with a very limited ODD.

The general consensus, however, is that any real-world implementation of L3 would be able to automatically stop itself, L4 is L3 with an ODD wide enough to handle most driving (as opposed to just freeways or whatever), and L5 is a mostly hypothetical "can drive anywhere" system. :)



Also, as @MP3Mike points out, you don't need a redundant computer to ensure the car doesn't just hurtle down the road until it crashes. You can use a much dumber failsafe. Even in the L4/L5 cases where there is an ADS failure, it isn't required that the fallback be of equivalent capability, only that it can achieve the minimal risk condition (which can be simply coming to a stop in its current path of travel).

Coming to a stop in its current path of travel is not necessarily a minimal risk condition. See also: stopping on a railroad track. For that matter, stopping in the middle of a freeway is generally considered extremely dangerous, too. I'd hardly call that minimal risk.
 
Karpathy's latest presentation at CVPR, where he says they ran 7 rounds of shadow mode on the fleet in order to get Tesla Vision released into production.
I want to emphasize this.
They ran the full Tesla Vision stack in shadow mode (in parallel with the standard Autopilot/FSD stack) seven different times.

Once people understand this, they will also understand that the hype around "they are out of compute on FSD hardware" is way premature!
 
They ran the full Tesla Vision stack in shadow mode (in parallel with the standard Autopilot/FSD stack) seven different times.
They ran 2 full NN stacks in parallel without running out of compute!
One stack that actually drives the car, the other in shadow mode for validating performance of Tesla Vision stack.

Oh, and ALL of that ran on one of the 2 nodes (they only now started adding the cross-node multitasking stuff)!

@Knightshade which part of this are you disagreeing with? Do actually present any additional facts besides what I presented from the Karpathy talk @ CVPR '21!
 
What backup system?

They only have 2 nodes.

They are using BOTH for the primary ADS now.


I also find it pretty grasping-for-strawsish to decide the multiple times Elon, Karpathy, and Bannon all made a big deal out of having full redundancy for FSD were all just "can" not "must" discussions...

The physical design of the chip, at many many specific points, is intended for A/B full failover.

If that wasn't the thing they actually needed the FSD computer would look entirely different.

The HW design makes no sense for anything else.

That they've found they're now unable to get it working is not an indication they just decided later they didn't need it--- a LOT of engineering work went into that as a NEEDED thing-- otherwise you don't get that design in the first place.

It's an indication they couldn't manage it with the compute available.... and it answers the question folks asked back in 2019 of "If HW3 is so great, why did they admit they were already working on a 3x more powerful HW4?"

That answer being they knew they might not be able to do it with HW3 even if they hoped they could.... (just as they hoped it but couldn't with HW2 and HW2.5)



This isn't a failure on Tesla's part, so it's weird people keep bending over backward to try and find excuses.

Nobody knows how much compute you need for FSD- so the fact Tesla has taken several guesses now, and underestimated every time, isn't surprising or a failure of any sort.... and the fact they were already working on HW4 2 years ago tells you it's not even THAT unexpected on their end.
Backup system being the second node. I don't know what method they are currently using, and it seems there isn't a consensus on this thread yet. All I'm pointing out is either way can work for L4/L5 under SAE. There is no need for both nodes to be identical (although that can be one possible design). All the second node needs to do is to have enough code for it to achieve a "minimal risk condition" on detecting a failure of the first node.
And I'd bet you a fair bit of $ HW4 is also explicitly designed for full failover redundancy.

Because you're not getting L4/L5 approved in a lot of places without that. Something Tesla knew, and kept pointing out they were aiming for, years ago.

They still are.
By "full" I presume you mean both nodes are running identical code. If that is the case, that is not required for L4/L5 under SAE. The failover only needs to run the minimal amount of code necessary to achieve the "minimal risk condition" (which can be just coming to a stop in its current path of travel or it can also be pulling to the side of the road). Hard to say if approvals will be more strict than this, as we have zero examples yet of such requirements.
 
Blame the SAE for hiding all of their actual information behind a paywall, and only providing useless little dribbles of information to the general public, which is why we're all still arguing about what these levels mean after... how many years?
You can download a copy for free right here:
SAE MOBILUS
They actually released a new version also this year in April:
SAE MOBILUS

While the general public has an excuse to not read it, tech/auto journalists do not (they definitely have access to SAE papers, even ones not provided for free). They are just too lazy to do so and just wing it in terms of how they define L3/L4/L5.
The real problem is that level 3 doesn't actually exist.
Well, Honda has an L3 system apparently running already, so it's possible.
Either A. the driver has to maintain enough situational awareness to take over immediately, in which case the driver is actually driving, and it's really level 2, or B. the driver doesn't, and if the vehicle can keep driving for long enough for the driver to take over safely, it can also bring the vehicle to a safe stop, in which case it's level 4 with a very limited ODD.

The general consensus, however, is that any real-world implementation of L3 would be able to automatically stop itself, L4 is L3 with an ODD wide enough to handle most driving (as opposed to just freeways or whatever), and L5 is a mostly hypothetical "can drive anywhere" system. :)
I'm also personally skeptical of L3 systems, but it is well defined by the document and as pointed out above, there are examples of them in the real world. I guess we'll find out more as Honda launches the fleet.
Coming to a stop in its current path of travel is not necessarily a minimal risk condition. See also: stopping on a railroad track. For that matter, stopping in the middle of a freeway is generally considered extremely dangerous, too. I'd hardly call that minimal risk.
A minimal risk condition is defined as follows (p11 J3016 2018):
"A condition to which a user or an ADS may bring a vehicle after performing the DDT fallback in order to reduce the risk of a crash when a given trip cannot or should not be completed."
(p15 J3016 2021):
"A stable, stopped condition to which a user or an ADS may bring a vehicle after performing the DDT fallback in order to reduce the risk of a crash when a given trip cannot or should not be continued."

It's simply a condition that is safer than the alternative you pointed out earlier, which is the car hurtling down the road out of control until it crashes into something. It's the best the system can do when part of it has failed. It doesn't mean it's necessarily the best possible situation out of all possible outcomes (best would obviously be to be able to just keep driving), but it doesn't have to be, given the system has been impaired in some way.
 
They ran 2 full NN stacks in parallel without running out of compute!

No, they did not.

You do not understand what "shadow mode" means.

Green, helpfully, has a really good explanation of it on twitter, but you don't seem to be interested in the many facts and insights he provides.

In case you're interested in learning though- that is here:




What it absolutely is not is "a duplicate of the full stack"

Which is physically impossible anyway since HW3 lacks the compute to do that anymore.

Also, if they were doing it, it would be obvious from observing what was running on each node- and that is factually not what is observed running on them.



@Knightshade which part of this are you disagreeing with?

The entire thing.

That's not, even remotely, what shadow mode is.

Do actually present any additional facts besides what I presented from the Karpathy talk @ CVPR '21!


You don't appear to have actually understood what Karpathy said- that seems to be the problem.

Or, at the very least, you're making a lot of incorrect assumptions about what some of the words he used actually mean.






Backup system being the second node. I don't know what method they are currently using, and it seems there isn't a consensus on this thread yet.

Well, there is, among the folks actually paying attention to the facts on what's running where- or who listened to anything anybody at autonomy day said.

Currently it's an L2 system, so there is no backup nor any need for one.

So what they're doing is moving a bunch of compute that node A no longer has capacity for over to node B- instead of using it for redundancy.

That is- factually- what method they're currently using to push development forward.

And it's perfectly fine so long as the system remains L2.


But for an L4 or better system three different Tesla folks at Autonomy day said their entire design intent was full redundancy.

And the entire layout and design of the physical hardware confirms that fact.

So clearly Tesla believes it's needed- or they wouldn't have spent a ton of time, money, and design on this setup- they'd have done it totally differently.

Which tells us that, if there does not turn out to be some magic way of shrinking compute needs drastically while also making the system drastically more capable, they're going to end up needing to upgrade the cars to (at least) HW4- which allegedly has 3x the power... but I'd bet you money it is also physically and intentionally meant to be fully redundant.



All I'm pointing out is either way can work for L4/L5 under SAE. There is no need for both nodes to be identical

Then why does it seem to have been such a major requirement that Tesla's entire hardware design is built around it- and the 3 most knowledgeable people about the system speaking at Autonomy Day all mentioned it?

(although that can be one possible design). All the second node needs to do is to have enough code for it to achieve a "minimal risk condition" on detecting a failure of the first node.


So there's a few issues with this...

One- you wouldn't have designed the computer even remotely the way it is if that's what you were actually aiming for... (you'd have instead designed it as one really powerful main computer, and a much smaller "safety" computer- instead of two independent but identical powerful ones)

Two- Since currently they're running parts of the driving code on both nodes- either one crashing would cause a problem for the system if it was running L4 or L5.

So now both nodes ALSO need to be running this "fail safely" code too, eating more compute you're already running out of.

By "full" I presume you mean both nodes are running identical code. If that is the case, that is not required for L4/L5 under SAE. The failover only needs to run the minimal amount of code necessary to achieve the "minimal risk condition" (which can be just coming to a stop in its current path of travel or it can also be pulling to the side of the road). Hard to say if approvals will be more strict than this, as we have zero examples yet of such requirements.

Other than the EU explicitly using the word redundancy.

Just as Elon, Bannon, and Karpathy all did.


It seems the only people who do not think it's gonna be considered needed are a few folks in this thread :)
 
You do not understand what "shadow mode" means.
Ahh, yes, I am sorry, I should listen to the guy who says Elon/Karpathy and Tesla are all lying (green's words, not mine) instead of the guy (Karpathy) who is actually deploying Tesla Vision to thousands of cars in real life.
My bad.

Green has a very simplistic view of the world:
* they are using all of the CPU on node A -- Tesla is doomed, they will never be able to run FSD on current hardware [note: no questions about WHAT is using all the CPU]
* saw some events fire -- ALL of shadow mode is nothing more than an events-reporting system -- ignoring the actual architect of the Tesla Vision stack and what he says about it.
* 4D radar -- look, it is just like lidar! this one still boggles my mind.


Please find better sources for your facts, green has exhausted all the credibility he will ever have.

* saw some events fire -- ALL of shadow mode is nothing more than an events-reporting system -- ignoring the actual architect of the Tesla Vision stack and what he says about it.
Which is physically impossible anyway since HW3 lacks the compute to do that anymore.
Before you lose your sh!t again, please listen to what the man himself says:
"we've also deployed this in shadow modes and seen that this stack performs fairly well "
he is NOT talking about a task, he's talking about the entire "Vision stack vs the fusion stack" performing as shown by running it in shadow mode on the fleet.

at the 29 minute mark --


Again, it's only "physically impossible" if your assumptions hold.
If the CPU utilization is coming from all the stuff running in shadow mode -- collecting data or validating a new set of NNs -- then it's not evidence for your preconceived notion that the production build is already using up all the CPU capacity.
 
Before you lose your sh!t again, please listen to what the man himself says:
"we've also deployed this in shadow modes and seen that this stack performs fairly well "
he is NOT talking about a task, he's talking about the entire "Vision stack vs the fusion stack" performing as shown by running it in shadow mode on the fleet.

There's multiple issues with your claims here-- apart from the histrionics in the first sentence.

For one, he never says "entire" anything.

For another, the entire AP/FSD software stack is vastly more stuff than just the vision NNs- which are only part of what the -entire- stack runs.

It'd be like a developer talking about anti-virus and saying they ran "the entire antivirus stack" and you confusing that with "the entire OS stack"


@verygreen is a user here- perhaps he'd like to comment directly though to help clear up some of the confusion that appears to have you in such a tizzy.
 
For another, the entire AP/FSD software stack is vastly more stuff than just the vision NNs- which are only part of what the -entire- stack runs.

It'd be like a developer talking about anti-virus and saying they ran "the entire antivirus stack" and you confusing that with "the entire OS stack"
Ahh, duh!
I actually quoted the video, "Tesla Vision stack vs Fusion stack" is the continuation of that same quote from the video.
Maybe to help you we can call it the perception stack? Not sure what you're grasping at.
 
Ahh, duh!
I actually quoted the video, "Tesla Vision stack vs Fusion stack" is the continuation of that same quote from the video.
Maybe to help you we can call it the perception stack? Not sure what you're grasping at.


The fact "the perception stack" is much less stuff than the entire driving computer stack

Was the anti-virus vs OS analogy too technical for you?


Another reason we know you're wrong: if, as you claim, the entire -active- stack (without the shadow stuff) fit in one node, then that's what you would see- Node B could be used for the shadowed stuff and compared against what Node A was seeing.

This would avoid the significant performance and programming issues with spreading your active code across two nodes never meant to work together after all.


But that's not what is actually happening

There's active (ie not "shadow") code running on both nodes because there is not enough compute on Node A to run it all
 
There's active (ie not "shadow") code running on both nodes because there is not enough compute on Node A to run it all
Proof?

HTF do you know what is running in shadow mode (campaigns can be uploaded to a subset or entire fleet and change frequently -- also your boy green confirms this too).
How do you differentiate a NN running when the output is just used for reporting back vs a NN running where the output is used by the driving policy to operate the car?
 

You've been linked to it. You simply refuse to believe it, or the person providing it- who, unlike you, has root access to the computer.

Mind you, your entire counter-argument to this appears to be just denying what he's saying while having literally no proof of anything except a misunderstanding of what Karpathy said.


HTF do you know what is running in shadow mode (campaigns can be uploaded to a subset or entire fleet and change frequently -- also your boy green confirms this too).
How do you differentiate a NN running when the output is just used for reporting back vs a NN running where the output is used by the driving policy to operate the car?

Because you can see where the output is going if you have root access.

You can also see they are not running a second instance of the entire driving stack

I would politely suggest that if your knowledge of the inner workings of computers and programming, and of Tesla's systems, is such that all of that is news to you- which seems to be the case, since your comments suggest you didn't know you could find this stuff out- this might be a good time to step away from the discussion and do quite a bit of technical reading instead.



 
You can download a copy for free right here:
SAE MOBILUS
They actually released a new version also this year in April:
SAE MOBILUS
All I'm seeing is a PDF containing terms and definitions, along with a table of contents. You're referring to page 10 in a document that, from my perspective, has only four pages. So....

A minimal risk condition is defined as follows (p11 J3016 2018):
"A condition to which a user or an ADS may bring a vehicle after performing the DDT fallback in order to reduce the risk of a crash when a given trip cannot or should not be completed."
(p15 J3016 2021):
"A stable, stopped condition to which a user or an ADS may bring a vehicle after performing the DDT fallback in order to reduce the risk of a crash when a given trip cannot or should not be continued."
That's about the least defined definition I've ever seen. How much do you have to reduce the risk of a crash? The first of those two definitions doesn't even require the vehicle to be stopped. Technically, moving the vehicle at 1 MPH qualifies. At least the second one includes the word "stopped". But it still basically says nothing at all. I'd expect at least a page of criteria for what constitutes "minimal risk". That right there is a total copout.


It's simply a condition that is safer than the alternative you pointed out earlier, which is the car hurtling down the road out of control until it crashes into something. It's the best the system can do when part of it has failed. It doesn't mean it's necessarily the best possible situation out of all possible outcomes (best would obviously be to be able to just keep driving), but it doesn't have to be, given the system has been impaired in some way.

By that interpretation, a car suddenly stopping in the middle of the freeway every time navigate on autopilot gets disabled because you turned up the windshield wipers past the second slowest setting qualifies as "safer", while in reality, doing that would border on suicidal. The SAE definition is crap.
 
Well, there is, among the folks actually paying attention to the facts on what's running where- or who listened to anything anybody at autonomy day said.

Currently it's an L2 system, so there is no backup nor any need for one.

So what they're doing is moving a bunch of compute that node A no longer has capacity for over to node B- instead of using it for redundancy.

That is- factually- what method they're currently using to push development forward.

And it's perfectly fine so long as the system remains L2.


But for an L4 or better system three different Tesla folks at Autonomy day said their entire design intent was full redundancy.

And the entire layout and design of the physical hardware confirms that fact.

So clearly Tesla believes it's needed- or they wouldn't have spent a ton of time, money, and design on this setup- they'd have done it totally differently.

Which tells us that, if there does not turn out to be some magic way of shrinking compute needs drastically while also making the system drastically more capable, they're going to end up needing to upgrade the cars to (at least) HW4- which allegedly has 3x the power... but I'd bet you money it is also physically and intentionally meant to be fully redundant.
Well, there are others pointing out that the recent presentations conflict with your assessment. I haven't followed both in detail so I won't jump into that (and for my point it doesn't matter which side is correct, given either version can satisfy SAE L4/L5). As for Tesla talking about redundancy early on, another comment pointed out Tesla also talked about how important radar was and then now have removed it. Nothing is stopping them from changing their approach as they see fit.
Then why does it seem to have been such a major requirement that Teslas entire hardware design is built around it- and the 3 most knowledgable people about the system speaking at Autonomy day all mentioned it?
Didn't we just talk about this? The wording used in regards to running a duplicate network was "can" not "must". Also Tesla can always plan to do something beyond SAE's minimal requirements for L4/L5. I'm just pointing out Tesla can also do something less than an identical copy and still meet the requirements.
So there's a few issues with this...

One- you wouldn't have designed the computer even remotely the way it is if that's what you were actually aiming for... (you'd have instead designed it as one really powerful main computer, and a much smaller "safety" computer- instead of two independent but identical powerful ones)
ARM big.LITTLE - Wikipedia
What you describe here sounds like a heterogeneous architecture like ARM's big.LITTLE. There are huge disadvantages to doing so (it makes the architecture hugely inflexible). You aren't necessarily saving money either, given running duplicate cores gives you economy of scale (which Tesla seemed to have achieved with HW3, I remember seeing the cost is down to $190, even less than the HW2.5 solution). The main reason to do such a configuration is power savings, but Tesla determined already HW3 was within their power budget (HW3 is actually more efficient than HW2.5; it consumes 25% more power overall, but provides drastically more computing power), so that is not a concern here.
Two- Since currently they're running parts of the driving code on both nodes- either one crashing would cause a problem for the system if it was running L4 or L5.
That's not an issue as long as either node can fall back to code that can achieve the "minimal risk condition". For example:
1) Node 1 crashes, Node 2 performs the steps necessary to reach minimal risk condition.
2) Node 2 crashes, Node 1 performs the steps necessary to reach minimal risk condition.
Running some of the regular driving code on both nodes does not conflict with this, given you have free computing resources available (since the other node only needs to perform the minimal-risk task in the event of a crash of one).
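The two scenarios above can be sketched as a simple heartbeat watchdog- a hypothetical illustration of the SAE-minimum design, not Tesla's firmware- where the surviving node runs only the much smaller minimal-risk routine rather than a full copy of the driving stack:

```python
import time

class Node:
    """One of two FSD nodes; a healthy node calls heartbeat() periodically."""
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

def check_peer(me, peer, timeout=0.5):
    """If the peer's heartbeat is stale, engage the minimal-risk fallback.

    The fallback is NOT a full copy of the driving stack -- just enough
    code to stop the vehicle safely (a "minimal risk condition" in J3016
    terms), which is why it can be much cheaper to run.
    """
    if time.monotonic() - peer.last_heartbeat > timeout:
        return f"{me.name}: peer lost, engaging minimal-risk stop"
    return f"{me.name}: peer healthy, normal driving"

a, b = Node("A"), Node("B")
b.heartbeat()
print(check_peer(a, b))   # A: peer healthy, normal driving
time.sleep(0.6)           # Node B "crashes": its heartbeats stop arriving
print(check_peer(a, b))   # A: peer lost, engaging minimal-risk stop
```

The same check runs symmetrically on both nodes, which covers both scenario 1 and scenario 2 with one small routine.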
So now both nodes ALSO need to be running this "fail safely" code too, eating more compute you're already running out of.
No, it won't, given the "fail safely" code is much less than the code required to drive the car in normal circumstances. Say, for example, it takes 10% of the resources, and let's assume what you say is true (Tesla is already using 100%, or close to it, of the resources of Node 1).

Here's the scenarios, just to illustrate the idea (made up numbers just to illustrate the point):
1) Duplicate code:
Node 1 100% driving code = 100% utilization
Node 2 100% driving code = 100% utilization
Tesla is SOL and needs new HW

2) "fail safely" taking 10% of resources:
Node 1 90% original driving code + 10% failsafe code = 100% utilization
Node 2 10% driving code offloaded from Node 1 + 10% failsafe code = 20% utilization
Tesla now has 80% left on Node 2 to use for other things.

You can replace the 10% assumption with another, but you will still end up with a better situation than running duplicate code on both. Also note, Node 1 likely only needs a low-demand watchdog on Node 2 and won't need to run a full copy of the failsafe code, given Node 1 would have the bulk of the driving code, which likely can already perform the failsafe function using existing code.
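The made-up budget above is easy to sanity-check with arithmetic (same assumed 10% failsafe figure; none of these numbers are real measurements):

```python
# Assumed (illustrative) costs as fractions of one node's capacity.
DRIVING = 1.00   # the full driving stack just fits in one node
FAILSAFE = 0.10  # minimal-risk code assumed to cost 10% of a node

# Design 1: identical copies on both nodes -- nothing left over.
node1 = DRIVING
node2 = DRIVING
print(node1, node2)     # 1.0 1.0 -> both nodes saturated

# Design 2: offload 10% of the driving code to node 2, failsafe on each.
node1 = (DRIVING - 0.10) + FAILSAFE   # 90% driving + 10% failsafe
node2 = 0.10 + FAILSAFE               # 10% offloaded driving + failsafe
print(node1, node2)     # 1.0 0.2
print(1.0 - node2)      # 0.8 -> 80% of node 2 free for other work
```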

Other than the EU explicitly using the word redundancy.

Just as Elon, Bannon, and Karpathy all did.

It seems the only people who do not think it's gonna be considered needed are a few folks in this thread :)
That EU document only mentions it in a very vague way: "put in place adequate design and redundancy to cope with these risks and hazards". But this conversation is talking specifically about regulations/specifications requiring two identical computers running identical code. I see nothing so far that indicates that is a requirement in SAE's L4/L5 specification, much less in regulation. According to SAE, a backup computer that can run the minimal-risk-condition code is adequate redundancy for L4/L5. There is no requirement whatsoever that the backup be a full identical copy of the main system (I encourage you to read the document yourself). This distinction is specifically why I asked what you mean by "full failover redundancy".
 
All I'm seeing is a PDF containing terms and definitions, along with a table of contents. You're referring to page 10 in a document that, from my perspective, has only four pages. So....
You have to click the download button on the page, then click download again in the window that pops up.
I got a 35 page document for the 2018 link, and a 41 page document for the 2021 link. Maybe your download didn't complete?
That's about the least defined definition I've ever seen. How much do you have to reduce the risk of a crash? The first of those two definitions doesn't even require the vehicle to be stopped. Technically, moving the vehicle at 1 MPH qualifies. At least the second one includes the word "stopped". But it still basically says nothing at all. I'd expect at least a page of criteria for what constitutes "minimal risk". That right there is a total copout.
The document was intended to be flexible in the first place and not be overly rigid (as doing so may follow the same slippery slope arguments you have used previously). They do give specific examples in the rest of the document what can be considered minimal risk conditions, but they only serve as examples, not hard requirements:
p8
"turning on the hazard flashers, maneuvering the vehicle to the road shoulder and parking it, before automatically summoning emergency assistance."
p11
"It may entail automatically bringing the vehicle to a stop within its current travel path, or it may entail a more extensive maneuver designed to remove the vehicle from an active lane of traffic and/or to automatically return the vehicle to a dispatching facility."
By that interpretation, a car suddenly stopping in the middle of the freeway every time navigate on autopilot gets disabled because you turned up the windshield wipers past the second slowest setting qualifies as "safer", while in reality, doing that would border on suicidal. The SAE definition is crap.
Not sure what your example has to do with it. SAE is talking about transitioning to that state when there is an ADS failure (for example, the computer crashing, or important sensors going offline, enough that it can't keep driving safely) or a vehicle failure (like a blown tire or suspension), not when the ADS/vehicle is functioning properly. In such a case, coming to a stop in the middle of the freeway (with hazards on) is a safer alternative than continuing to drive. In fact, that's what a lot of people do for regular vehicle failures when they can't safely get over to a shoulder (or if there isn't a shoulder in the first place). I've seen that plenty of times when driving, and I have even passed cars like that myself.