5% load total
Or
10% load with full failover redundancy
Not: 10% load after failover.
Full failover mode uses twice the resources to do the same job.
That means everything is being done twice, in parallel, to verify consistency.
Yeah, not voting, since that would require three copies in parallel, but rather lockstep, as is common on automotive safety-critical hardware.
http://www.ti.com/microcontrollers/hercules-safety-mcus/overview.html
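The two-core lockstep compare can be sketched like this (a minimal sketch; the `core_a`/`core_b` callables are hypothetical stand-ins for the duplicated hardware):

```python
def lockstep_step(core_a, core_b, inputs):
    """Run the same computation on both cores in parallel and compare.

    With only two copies there is no voting: any mismatch means one
    core is faulty (you can't tell which), so the system goes straight
    into fault mode instead of trusting either result.
    """
    out_a = core_a(inputs)
    out_b = core_b(inputs)
    if out_a != out_b:
        raise SystemError("lockstep mismatch: entering fault mode")
    return out_a
```

Healthy cores agree on every cycle; the first disagreement halts the pair, which is the instant-detection property that voting-free lockstep buys you.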
Bit errors are only minor if they occur in the early layers of the NN (and then only if the error is in the data portion). If you drop a high-level bit in the classification layer, you could ignore a real object. If you alter a bit in the driving policy, the car could go the wrong way. If you alter a bit in the coefficient table, you could saturate (or potentially overflow) the output. If the failure is in the HW and not the coefficients, multiple errors could result.
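To see why which bit flips matters so much, here's a toy example using an int8 quantized weight (the int8 format is an assumption for illustration, not a claim about HW3's actual coefficient representation):

```python
def flip_bit_int8(w: int, bit: int) -> int:
    """Flip one bit of an int8 weight and reinterpret it as signed
    two's complement, simulating a single-bit coefficient error."""
    u = (w & 0xFF) ^ (1 << bit)
    return u - 256 if u >= 128 else u
```

A low-order flip barely moves the weight: `flip_bit_int8(3, 0)` gives 2. A flip in the sign bit swings it across nearly the full range: `flip_bit_int8(3, 7)` gives -125, easily enough to saturate an accumulator downstream.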
Health status via self-checks would require injecting full-coverage test data streams (multiple test cases for full bit/HW coverage) at a faster rate than a bad bit could cause vehicle misbehavior, either with known test results, or by injecting the same data on both cores and comparing. It would also require verifying all internal calculations, not just the final output.
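The known-test-results variant is essentially a known-answer test: golden vectors interleaved with real traffic, checked against precomputed results. A rough sketch (the multiply unit and the vectors are made up for illustration; real coverage needs enough vectors to exercise every bit of every datapath, not one happy path):

```python
# (inputs, expected) golden pairs for a hypothetical multiply unit.
TEST_VECTORS = [((3, 5), 15), ((-2, 7), -14)]

def self_check(multiply) -> bool:
    """Return True only if the unit reproduces every golden result.

    In a real system this runs at a rate faster than a latent fault
    could propagate into vehicle misbehavior, and intermediate values
    are checked too, not just the final output.
    """
    return all(multiply(a, b) == expected
               for (a, b), expected in TEST_VECTORS)
```

A healthy unit passes; a unit with a stuck bit fails as soon as a vector exercises that bit, which is why coverage of the vector set is the hard part.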
Lockstep identifies any influential bit flip or logic error instantly and puts the system into fault mode. NoA is running on AP2.x (10% of a single HW3 core), so they could go to four copies of a half-size NN (8 copies of something double the size of current) for limp-to-safety mode and still have redundancy to potentially determine which core (or half core) has the fault.
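With four (or more) copies you get what 2-way lockstep can't give you: a majority result to keep limping on, plus the identity of the disagreeing copy. A minimal sketch of that compare, assuming the copies' outputs are directly comparable values:

```python
from collections import Counter

def vote_and_localize(outputs):
    """Majority-vote over N redundant copies of the same computation.

    Unlike 2-way lockstep, N >= 3 copies let the system keep running
    on the majority result AND flag which copy (or copies) disagreed,
    so the faulty core can be taken offline.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        # No clear majority: can't trust any result.
        raise SystemError("no majority: entering fault mode")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty
```

For example, `vote_and_localize([9, 9, 9, 7])` returns `(9, [3])`: drive on with 9, and copy 3 is the suspect.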