Karpathy talk today at CVPR 2021

I'm putting faith in them having common sense. Every car likely exhibits slightly over one spontaneous bit flip per day. That failure rate is at least five or six orders of magnitude too high for a computer that's in control of a car.
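(For anyone who wants to sanity-check that rate, here's a back-of-envelope version. The soft-error rate and memory size below are rough assumptions on my part, not measured Tesla numbers.)

```python
# Back-of-envelope check of the "one bit flip per day" claim.
# Assumed numbers, for illustration only: published DRAM soft-error
# rates vary widely by study and process; take 500 FIT per Mbit
# (failures per 10^9 device-hours per Mbit) as a middle-of-the-road guess.

FIT_PER_MBIT = 500     # assumption, not a measured figure
MEMORY_GBYTES = 8      # assumed DRAM in the vehicle computer
HOURS_PER_DAY = 24

mbits = MEMORY_GBYTES * 8 * 1024              # GB -> Mbit
flips_per_hour = FIT_PER_MBIT * mbits / 1e9
flips_per_day = flips_per_hour * HOURS_PER_DAY

print(f"Expected upsets/day: {flips_per_day:.2f}")
# With these assumptions: 500 * 65536 / 1e9 * 24 ≈ 0.79 upsets/day,
# i.e. roughly one per day, in line with the claim above.
```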
This is not a matter of "common sense". This is a very small part of autonomous vehicles, so it may not be brought up at all in regulations. And what you bring up seems to apply equally to many critical controller units already in cars today (for example, engine/throttle controls, brake controls, etc.). Are there any laws requiring redundancy in those systems today? If they missed such "common sense" in current regulations, why would it necessarily be part of the next regulations?
Right now, there are no examples because the safety regulations don't exist yet. States with autonomous vehicle laws have generally just covered the basic functionality requirements, and left the safety aspects up to NHTSA.
Then that pretty much ends the argument for now. There are no such regulations, and there is no timetable nor any specific proposals for including redundancy (especially something as specific as protection against bit flips) into law at this moment.
NHTSA's rulemaking plans are described here:


Note that redundancy is mentioned on p. 24 as one of the criteria expected to be addressed in the actual rulemaking. When the rules get nailed down, redundancy will be a requirement.
That's the last item in a list of eight possible issues to consider, which itself is in an "other aspects" paragraph under the "other safety functions", not part of the four core ADS safety functions. I guess we'll see how they address it in the future, but it doesn't seem to me like it'll be a high priority.
 
I've touched on this before, many years ago. Basically, even when Tesla releases the software that makes the car "FSD", there's no guarantee that a regulation won't later be passed that prevents the car from being driven in FSD mode, for example a regulation requiring lidar.

That said, this discussion is talking about current regulations.
California law requires that the vehicle be able to bring itself to a stop safely in the event of a failure. I don't see how you could realistically guarantee that without redundancy. So a vehicle without a redundant core would not be approved in California.
 
California law requires that the vehicle be able to bring itself to a stop safely in the event of a failure. I don't see how you could realistically guarantee that without redundancy. So a vehicle without a redundant core would not be approved in California.
That has nothing to do with redundancy, though. For example, the AEB systems in various cars today (some of which may be mandated in jurisdictions like Europe) basically have to meet various standards for automatic braking for various subjects on the road. If there were a requirement for redundancy in the event of primary brake failure (for example, by using regen brakes or the parking/emergency brake), pretty much all of the systems out there would fail today, as they simply wouldn't be able to brake in time. And I'm pretty sure none of them have redundant controllers or sensors either.

If redundancy were a requirement, it would have to be specifically laid out, not left as the loose conjecture you propose.
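(To put numbers on "wouldn't be able to brake in time": a quick stopping-distance calculation. The deceleration figures are my own rough assumptions, not regulatory values.)

```python
# Rough stopping-distance check of the claim above. Decel figures are
# assumptions: ~9 m/s^2 for healthy service brakes, maybe ~2 m/s^2 for
# a parking brake or regen alone.
def stopping_distance_m(speed_kmh, decel_mps2):
    v = speed_kmh / 3.6            # km/h -> m/s
    return v * v / (2 * decel_mps2)

print(stopping_distance_m(100, 9.0))  # ~42.9 m with service brakes
print(stopping_distance_m(100, 2.0))  # ~192.9 m with a weak backup
```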
 
This is not a matter of "common sense". This is a very small part of autonomous vehicles, so it may not be brought up at all in regulations. And what you bring up seems to apply equally to many critical controller units already in cars today (for example, engine/throttle controls, brake controls, etc.). Are there any laws requiring redundancy in those systems today? If they missed such "common sense" in current regulations, why would it necessarily be part of the next regulations?

Those things are many orders of magnitude less complex than a self-driving computer, so they can typically fail and reboot so quickly that you wouldn't even know it happened most of the time.

And brake controllers and steering systems have a mechanical backup, which is required by law. So yes, there are laws requiring redundancy in safety-critical systems.


Then that pretty much ends the argument for now. There are no such regulations, and there is no timetable nor any specific proposals for including redundancy (especially something as specific as protection against bit flips) into law at this moment.

Nobody is saying that they should require protection against something that specific. What we're saying is that regulators will require that the computer system have a full backup, complete with a redundant power supply, to ensure that a controller failure, whether caused by random bit flips, undetected hardware defects, electromigration, solder-ball failures from thermal expansion or vibration damage, or even an improperly connected wiring harness, won't cause the vehicle to suddenly and without warning become driverless.
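(Roughly, the pattern being described is a hot-standby failover. A minimal sketch, with invented names and an assumed heartbeat budget; nothing here is Tesla's actual implementation.)

```python
# Minimal sketch of the failover pattern described above: two computers,
# each on its own power feed, where the standby takes over if the active
# node's heartbeat stops. All names and timings here are invented.
import time

HEARTBEAT_TIMEOUT_S = 0.1  # assumed budget before the standby takes over

class DrivingComputer:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def is_alive(self):
        return time.monotonic() - self.last_heartbeat < HEARTBEAT_TIMEOUT_S

def supervise(active, standby):
    """Promote the standby the moment the active node goes silent."""
    if not active.is_alive():
        # The standby already has its stack loaded and running, so the
        # handover is immediate rather than a multi-second cold boot.
        print(f"{active.name} silent; {standby.name} now driving")
        return standby, active
    return active, standby
```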


That's the last item in a list of eight possible issues to consider, which itself is in an "other aspects" paragraph under the "other safety functions", not part of the four core ADS safety functions. I guess we'll see how they address it in the future, but it doesn't seem to me like it'll be a high priority.
It's not a high priority right now, precisely because every single company doing any sort of automated car development is already building redundancy into the systems from day one. Heck, I don't think anybody is even building ADAS systems without redundancy, much less any attempts at full autonomy.

The best way to get new regulations passed overnight would be for a company the size of Tesla to release a self-driving system that has no redundancy. You would then see hastily written laws banning them from public roads in state after state. And then we'd have to deal with those badly written regulations for decades, all because a company ignored their duty to public safety and did something incredibly reckless.

Not going to happen.
 
California law requires that the vehicle be able to bring itself to a stop safely in the event of a failure. I don't see how you could realistically guarantee that without redundancy. So a vehicle without a redundant core would not be approved in California.

That has nothing to do with redundancy, though.

It most certainly does. If a behavior required by law cannot be achieved without redundancy, then the law does, in effect, require redundancy. If a vehicle can't drive safely without both of its computers being active, then it cannot possibly meet the requirement that it be able to safely stop in the event of a failure of one of those computers. This is tautologically correct.


For example, the AEB systems in various cars today (some of which may be mandated in jurisdictions like Europe) basically have to meet various standards for automatic braking for various subjects on the road. If there were a requirement for redundancy in the event of primary brake failure (for example, by using regen brakes or the parking/emergency brake), pretty much all of the systems out there would fail today, as they simply wouldn't be able to brake in time. And I'm pretty sure none of them have redundant controllers or sensors either.

If redundancy were a requirement, it would have to be specifically laid out, not left as the loose conjecture you propose.

Maybe you aren't quite clear on what would happen if the self-driving computer suddenly stopped doing anything. You'd have a passenger sleeping in the back seat, and the car would suddenly continue in whatever direction it was going until it hit something. The passenger would die.

Automatic emergency brakes are a driver assistance system. They don't need a backup, because the human is the primary driver. They ARE the backup. The human is expected to know how to slow down the car, and to do so under typical circumstances.

What I'm saying is not "a loose conjecture". It is NOT POSSIBLE to meet the requirements for SAE level 3 if you don't have a redundant computer, period. And the law requires you to meet those criteria before deploying something as an autonomous vehicle.
 
The thing is, the argument is over whether there is any proof of regulations that require such redundancy. So far not a single example has been given.

You mean besides where I specifically quoted the EU regs saying they would require redundancy?

Does Europe not count?



I'm not about to hunt that down, but here's a good overview:

It's your claim, thus your responsibility to support it, but I'd be surprised if he makes it anywhere in that video. Feel free to provide a timestamp if you find it.

To my knowledge that's not how the system works at all, though (it literally can't be, since the system doesn't have enough compute to run the entire software stack twice, which is the exact thing being discussed in the last several pages).
 
What I'm saying is not "a loose conjecture". It is NOT POSSIBLE to meet the requirements for SAE level 3 if you don't have a redundant computer, period. And the law requires you to meet those criteria before deploying something as an autonomous vehicle.
Does it have to be a redundant computer, or can it just be a fail-safe system such that if the FSD computer fails the car shuts down, turns on the hazard lights, and applies the brakes?
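(For concreteness, such a fail-safe might look something like this sketch; the Vehicle API here is entirely made up.)

```python
# Hypothetical sketch of the "dumb failsafe" being proposed: no second
# driving computer, just a small supervisor that stops the car if the
# FSD computer goes silent. The Vehicle API here is entirely invented.
class Vehicle:
    def hazard_lights_on(self): print("hazards on")
    def hold_steering_angle(self): print("holding last steering angle")
    def apply_brakes(self, decel_mps2): print(f"braking at {decel_mps2} m/s^2")
    def shift_to_park_once_stopped(self): print("park once stopped")

def on_fsd_computer_failure(vehicle: Vehicle):
    vehicle.hazard_lights_on()
    vehicle.hold_steering_angle()         # no perception left: hold course
    vehicle.apply_brakes(decel_mps2=3.0)  # assumed firm-but-stable rate
    vehicle.shift_to_park_once_stopped()
```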
 
To my knowledge that's not how the system works at all, though (it literally can't be, since the system doesn't have enough compute to run the entire software stack twice, which is the exact thing being discussed in the last several pages).

That's exactly what I am not saying. The system is a whole, with many NNs running across both CPUs; it's not confined to one, or a duplicate across both.
 
Does it have to be a redundant computer, or can it just be a fail-safe system such that if the FSD computer fails the car shuts down, turns on the hazard lights, and applies the brakes?
That would only be safe if you were on a perfectly straight road. Otherwise, if the curve of the road changes, you could cross the center line into traffic before you stop, or go off into a ravine.

But what if you only have lane keeping? No, because it could still stop on a railroad track or in the middle of an intersection.

But what if it has lane keeping and intersection/sign detection? No, because it could still fail to see a pedestrian that the full system would have swerved for.

But what if it detects pedestrians? No, because what if you're twenty feet from a construction zone when it shuts down, and it has to detect safety cones and handle safe path finding through them?

And now you're basically running the entire stack.

Also, loading a partial, lightweight stack would probably require shutting everything down for a couple of seconds, which is plenty long enough to kill someone on city streets.

In the event of one core failing, the other core must have enough functionality already loaded to be able to safely bring the vehicle to a stop in a safe place and in any traffic conditions. That might not *quite* be the full stack — you could probably get away with not having things like speed limit detection, for example — but it's remarkably close.
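(In other words, the standby core keeps a nearly complete stack resident, perhaps minus a few conveniences. Something like this sketch, where the module names are purely illustrative and not Tesla's actual software components.)

```python
# Sketch of the hot-standby idea above: the second core keeps almost the
# whole stack resident so takeover is instant, not a multi-second reload.
# Module names are illustrative, not Tesla's actual software components.
FULL_STACK = {
    "lane_keeping", "object_detection", "pedestrian_detection",
    "traffic_controls", "construction_zone_handling",
    "path_planning", "speed_limit_reading",
}

# Per the argument above, a safe-stop stack can maybe drop a convenience
# or two (like reading speed-limit signs) but remarkably little else.
SAFE_STOP_STACK = FULL_STACK - {"speed_limit_reading"}

def load_model(name):
    print("resident on standby core:", name)  # stand-in for a real loader

def standby_core_boot():
    """Preload the safe-stop stack so failover needs no reload at all."""
    for module in sorted(SAFE_STOP_STACK):
        load_model(module)
```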
 
It most certainly does. If a behavior required by law cannot be achieved without redundancy, then the law does, in effect, require redundancy. If a vehicle can't drive safely without both of its computers being active, then it cannot possibly meet the requirement that it be able to safely stop in the event of a failure of one of those computers. This is tautologically correct.
But it can be achieved without redundancy, and the vehicle can drive safely without both computers active.
Maybe you aren't quite clear on what would happen if the self-driving computer suddenly stopped doing anything. You'd have a passenger sleeping in the back seat, and the car would suddenly continue in whatever direction it was going until it hit something. The passenger would die.
L3 does not allow you to be a sleeping passenger in the back seat; you must be ready to take over within seconds. I know there are auto journalists who made this suggestion, but it's completely wrong and dangerous.
Automatic emergency brakes are a driver assistance system. They don't need a backup, because the human is the primary driver. They ARE the backup. The human is expected to know how to slow down the car, and to do so under typical circumstances.

What I'm saying is not "a loose conjecture". It is NOT POSSIBLE to meet the requirements for SAE level 3 if you don't have a redundant computer, period. And the law requires you to meet those criteria before deploying something as an autonomous vehicle.
That's incorrect. SAE Level 3 allows the car to alert the driver to take over in the event of an ADS (or vehicle) system failure (J3016, p. 11):
"At level 3, given a DDT performance-relevant system failure in the ADS or vehicle, the DDT fallback-ready user is expected to achieve a minimal risk condition when s/he determines that it is necessary, or to otherwise perform the DDT if the vehicle is drivable."

Also, as @MP3Mike points out, you don't need a redundant computer to ensure the car doesn't just hurtle down the road until it crashes. You can use a much dumber failsafe. Even in the L4/L5 cases where there is an ADS failure, it doesn't require that the fallback be of equivalent capability, only that it can achieve the minimal risk condition (which can be as simple as coming to a stop in the current path of travel).
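(The L3 flow in J3016 is basically a takeover-request state machine. Here's a sketch; the takeover window is a number I picked arbitrarily, since J3016 doesn't fix one.)

```python
# Sketch of the J3016 Level 3 fallback flow described above. Timings and
# names are my assumptions for illustration, not from the standard.
import enum

class Mode(enum.Enum):
    ADS_ENGAGED = 1
    TAKEOVER_REQUESTED = 2
    DRIVER_DRIVING = 3
    MINIMAL_RISK = 4  # e.g. a stop in the current path of travel

TAKEOVER_WINDOW_S = 10.0  # assumed; J3016 does not mandate a number

def on_system_failure(driver_took_over_in_time: bool) -> Mode:
    # At L3 the fallback-ready user is the fallback: the car requests a
    # takeover instead of needing a redundant computer to keep driving.
    if driver_took_over_in_time:
        return Mode.DRIVER_DRIVING
    # If the user never responds, degrading to a stop in the current
    # lane is one acceptable way to reach the minimal risk condition.
    return Mode.MINIMAL_RISK
```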
 
That would only be safe if you were on a perfectly straight road. Otherwise, if the curve of the road changes, you could cross the center line into traffic before you stop, or go off into a ravine.

But what if you only have lane keeping? No, because it could still stop on a railroad track or in the middle of an intersection.

But what if it has lane keeping and intersection/sign detection? No, because it could still fail to see a pedestrian that the full system would have swerved for.

But what if it detects pedestrians? No, because what if you're twenty feet from a construction zone when it shuts down, and it has to detect safety cones and handle safe path finding through them?

And now you're basically running the entire stack.

Also, loading a partial, lightweight stack would probably require shutting everything down for a couple of seconds, which is plenty long enough to kill someone on city streets.

In the event of one core failing, the other core must have enough functionality already loaded to be able to safely bring the vehicle to a stop in a safe place and in any traffic conditions. That might not *quite* be the full stack — you could probably get away with not having things like speed limit detection, for example — but it's remarkably close.
That's a fairly long slippery-slope argument, but SAE's definition does not say anything of the sort. A system that simply comes to a stop in its current path of travel is enough to satisfy the requirement even in an L4 case (much less an L3 case). You can eliminate pedestrian and construction-zone considerations using ODD restrictions (plus, as per the trolley-problem discussion elsewhere, self-driving systems are unlikely to be expected to perform advanced avoidance maneuvers: trying to slow down to avoid impacting a pedestrian is a valid response, and if the car is already entering such a fail-safe mode, it's already slowing down). None of what you say has to apply.
 
That's exactly what I am not saying. The system is a whole, with many NNs running across both CPUs; it's not confined to one, or a duplicate across both.

But that's not what the design intent was at all.

The second node was intended, and again I quoted Elon explicitly stating this on Autonomy Day, to be a failover node for redundancy.

The entire stack was supposed to run on a single node, with the other node backing it up, not spreading out the entire workload.

The only reason they're doing that now is that they've exceeded the compute capacity of the first node. And as Green points out, they're having some trouble doing that well, because it wasn't designed for that originally.

If they'd actually intended for the FSD computer to distribute workloads across the nodes, instead of each node independently running the entire stack, they'd have designed it significantly differently, rather than as two explicitly separate systems where the memory and other resources aren't shared between them at all.
 
The second node was intended, and again I quoted Elon explicitly stating this on Autonomy Day, to be a failover node for redundancy.
Yes, he said at Autonomy Day, "In order to have a self-driving car or robotaxi, you really need redundancy throughout the vehicle at the hardware level." Karpathy said, "With respect to redundancy, absolutely you can run basically the copy of the network on both, and that is actually how it's designed to achieve a level 4 or level 5 system that is redundant." And Pete Bannon summarized the FSD computer as one that "enables a fully redundant computing solution."

Personally, I'm surprised that Tesla has only just started using the second node, as this basically doubles the potential compute while Autopilot is Level 2. Even something as "simple" as running extra neural networks purely for shadow mode and distributed data processing would avoid needing the extra complexity of combining results with the other node. Karpathy at CVPR 2021 described how offline processing of the 10-second 8-camera clips has the benefit of looking forwards and backwards through time, as well as using larger neural networks, but this could be done in a distributed fashion on individual Tesla vehicles, potentially even when parked, analyzing previously triggered/snapshotted clips.
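(Shadow mode on the spare node could be as simple as this sketch: run a candidate network on the same frames, log disagreements, never actuate. All names and the comparison metric here are hypothetical, not Tesla's actual pipeline.)

```python
# Sketch of the shadow-mode idea above: the spare node runs a candidate
# network on the same camera frames and only logs disagreements; it never
# touches the controls. Names and the metric are hypothetical.
def disagreement(a, b):
    return abs(a - b)  # toy stand-in for a real output comparison

def shadow_step(frame, production_net, candidate_net, logger, threshold=0.3):
    prod_out = production_net(frame)   # this output drives the car
    shadow_out = candidate_net(frame)  # runs silently on the spare node
    if disagreement(prod_out, shadow_out) > threshold:
        logger.snapshot(frame, prod_out, shadow_out)  # upload for training
    return prod_out                    # only the production output is used
```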
 
What makes you say that?
Which part?

The first is from the FSD 3.0 specs; the second is from information Karpathy etc. have said.
As Rob says, the HW3 mainboard has dual redundant processors (they made a big deal out of this when HW3 was announced). It was stated at the time that these run the same code and cross-compare results. I'm not clear what results are compared (nor how often), and it was not stated what happens if they disagree (no way to tell which one is right).
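(Note the structural problem with a two-way compare: you can detect a fault, but with no third vote you can't tell which node is wrong. A sketch, with invented types and tolerances.)

```python
# Sketch of a dual-node cross-check like the one described above. With
# only two nodes there is no majority vote: on disagreement you can flag
# a fault and degrade, but not pick a winner. Details are assumptions.
from dataclasses import dataclass

@dataclass
class Plan:
    steering: float  # commanded steering angle
    accel: float     # commanded acceleration

def cross_compare(plan_a: Plan, plan_b: Plan, tol: float = 1e-3) -> Plan:
    if (abs(plan_a.steering - plan_b.steering) > tol
            or abs(plan_a.accel - plan_b.accel) > tol):
        # Two-way redundancy can detect faults; it takes a third opinion
        # (triple modular redundancy) to mask them and keep driving.
        raise RuntimeError("node outputs diverge: enter fail-safe stop")
    return plan_a
```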
 
As Rob says, the HW3 mainboard has dual redundant processors (they made a big deal out of this when HW3 was announced). It was stated at the time that these run the same code and cross-compare results. I'm not clear what results are compared (nor how often), and it was not stated what happens if they disagree (no way to tell which one is right).

Rob appears to be saying the opposite of that below, explicitly insisting it's NOT running the same code on both.

The system is a whole, with many NNs running across both CPUs; it's not confined to one, or a duplicate across both.


Possibly it can save a lot of back and forth if he clarifies which one he's actually claiming.

But it appears from the Autonomy Day material quoted above yours (and the other material already posted by myself and others) that Tesla's actual intent was exactly to run a duplicate of the entire stack on each node for redundancy... which by definition confines every NN in the stack to one node (with the exact same full stack running on the other; nothing at all distributed across the two, because that would create a single point of failure).
 
Rob appears to be saying the opposite of that below, explicitly insisting it's NOT running the same code on both.




Possibly it can save a lot of back and forth if he clarifies which one he's actually claiming.

But it appears from the Autonomy Day material quoted above yours (and the other material already posted by myself and others) that Tesla's actual intent was exactly to run a duplicate of the entire stack on each node for redundancy... which by definition confines every NN in the stack to one node (with the exact same full stack running on the other; nothing at all distributed across the two, because that would create a single point of failure).
Of course, they can change that design and still achieve L4/L5.
I will note the exact wording quoted above says "can" not "must":
"With respect to redundancy, absolutely you can run basically the copy of the network on both, and that is actually how it's designed to achieve a level 4 or level 5 system that is redundant"

Note, as I pointed out in the other comment, the SAE L4/L5 specifications do not require the fallback in an ADS failure to be identical to the main ADS, only that it can reach the "minimal risk condition". That means you can run different code in the backup system.