The problem with such a test is that "gaming" it would almost certainly be rampant unless the test had enough variability to stop makers from hard-wiring knowledge of the test into the car as a special case (think of how computer makers try to rig benchmark results in various dubious ways). And if it's not standardized, then you are open to all sorts of accusations of favoritism. I'm not saying I'm against the idea; it makes a lot of sense... it's just going to be tough to get it right (and fair).
The good news is that we do know WHAT we want to test for. Organizations like SAE and NHTSA have put together a list of elemental behavioral competencies:
- Maintaining a lane
- Changing lanes
- Navigating intersections
- Navigating unstructured roadways, entering/exiting unstructured roadways
- Navigating pick-up and drop off zones and parking structures
- Responding to vulnerable road users
- Responding to other vehicles
- Responding to special purpose vehicles (e.g. emergency vehicles)
- Responding to lane obstructions and obstacles (static, dynamic, including animals)
- Responding to confined road structures (tunnels, bridges, parking garages etc)
- Responding to work zones (including workers)
- Responding to DDT performance-relevant system failures
- Responding to relevant dynamic traffic signs and signals
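To make the list above concrete, here is a minimal sketch (my own illustrative naming, not an official SAE/NHTSA schema) of how a certification harness could encode the competencies as a test matrix and check that every one of them is exercised by at least one scenario:

```python
# Hypothetical sketch: behavioral competencies as a coverage checklist.
# All names below are illustrative, not an official schema.

COMPETENCIES = [
    "maintain_lane", "change_lanes", "intersections", "unstructured_roadways",
    "pickup_dropoff_and_parking", "vulnerable_road_users", "other_vehicles",
    "special_purpose_vehicles", "lane_obstructions", "confined_structures",
    "work_zones", "system_failures", "signs_and_signals",
]

# Each test scenario declares which competencies it exercises.
SCENARIOS = {
    "four_way_stop_with_pedestrian": {"intersections", "vulnerable_road_users"},
    "highway_merge_with_cut_in": {"change_lanes", "other_vehicles", "maintain_lane"},
    "coned_off_lane_with_workers": {"work_zones", "lane_obstructions"},
}

def uncovered(competencies, scenarios):
    """Return the competencies not exercised by any scenario."""
    covered = set().union(*scenarios.values())
    return [c for c in competencies if c not in covered]
```

With only the three example scenarios above, `uncovered(COMPETENCIES, SCENARIOS)` would flag gaps like `system_failures` and `signs_and_signals`, which is exactly the kind of completeness check a regulator would want before signing off on a scenario suite.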
So the question becomes HOW we test for these competencies. I don't think it would be too difficult to devise a set of scenarios that covers all of them, but you are right that ensuring the tests are right and fair could be tricky. We could construct "fake cities" with intersections, construction areas, parking lots, fake pedestrians that pop out, cars that cut in, etc., and have the AVs complete a random route through the fake city. I would also propose running the test on different days to cover both good and bad weather. You could even create fake fog to test for that as well.

Obviously, AV companies could still try to tune their systems to pass the test. I would suggest that AVs complete the test without premapping, so they can't just premap the course and pass it that way. Companies could still use HD maps to improve performance when they deploy in the real world, but this would verify that the AVs have the basic competencies even without an HD map. Since everyone would take the test in the same "fake city", I think that would ensure fairness. To address the risk of rigging the system just to pass the test, there would need to be other parts of the validation process; an AV driving test alone would not be enough IMO.
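The randomized-route idea can be sketched in a few lines. This is purely illustrative: assume the fake city is a small road graph, and the regulator draws a random route and weather condition from a seed the manufacturer doesn't know, so there is no fixed course to hard-wire:

```python
import random

# Hypothetical sketch: a regulator draws a randomized test run through a
# fake-city road graph. The graph and weather list are made up for
# illustration; a real facility would be far larger.
ROAD_GRAPH = {
    "gate": ["intersection_a"],
    "intersection_a": ["work_zone", "parking_lot"],
    "work_zone": ["intersection_b"],
    "parking_lot": ["intersection_b"],
    "intersection_b": ["finish"],
    "finish": [],
}
WEATHER = ["clear", "rain", "fog"]

def draw_test_run(rng):
    """Random walk from the gate to the finish, plus a weather condition."""
    route = ["gate"]
    while ROAD_GRAPH[route[-1]]:          # keep going until a dead end
        route.append(rng.choice(ROAD_GRAPH[route[-1]]))
    return route, rng.choice(WEATHER)

rng = random.Random()  # regulator's seed, unknown to the manufacturer
route, weather = draw_test_run(rng)
```

Because the draw happens at test time, two manufacturers tested on the same day still face the same facility, which preserves the fairness point, while no single manufacturer can memorize one course in advance.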
I propose a 3-step process to get certification for deploying an AV:
Step 1: Documentation
The AV company would need to provide documentation on how their AV works, how it was built, what processes were used to develop and test it, what redundancies exist for critical safety functions, what the ODD is, what processes exist to enforce the ODD, and what processes exist after deployment to quickly fix an issue. The last part is key because AV accidents will inevitably happen no matter what, so it is important to have a "recall" process in place to quickly address and fix a problem. The purpose of this step would be to ensure that regulators understand how the AV works, to provide evidence that the AV company did its due diligence in building and testing the AV, and to give regulators an opportunity to provide feedback if they feel the company missed something critical.
Step 2: Testing behavior competencies
This would be the on-road test I described above, covering the basic driving competencies. The purpose of this step would be to make sure the AV has the basic driving skills needed.
Step 3: Safety Data
In this final step, the AV company would need to show safety data from x million miles of autonomous driving with a safety driver, in the ODD they want to deploy in, demonstrating that overall safety is good enough. The data would need to meet a criterion of no more than x safety-critical failures per million miles, where safety-critical failures are actual accidents plus near misses that almost resulted in a collision. The purpose of this final step would be to show that overall safety is good enough.
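The Step 3 acceptance check boils down to a rate comparison, which can be sketched directly. The threshold, minimum mileage, and fleet numbers below are placeholders I made up for illustration (the post deliberately leaves the real values as "x"):

```python
# Hypothetical sketch of the Step 3 check: compare the observed rate of
# safety-critical failures (accidents + near misses) against a
# regulator-set threshold per million miles. All numbers are placeholders.

def failures_per_million_miles(failures, miles):
    """Observed safety-critical failure rate, normalized per million miles."""
    return failures / (miles / 1_000_000)

def passes_step3(failures, miles, threshold_per_million, min_miles=10_000_000):
    """Require both enough exposure for the estimate to mean something
    and an observed rate at or under the threshold."""
    if miles < min_miles:
        return False  # not enough data yet for a meaningful rate
    return failures_per_million_miles(failures, miles) <= threshold_per_million
```

For example, with a placeholder threshold of 0.5 failures per million miles, a fleet with 5 failures over 20 million miles (rate 0.25) would pass, while the same 5 failures over only 5 million miles would fail on exposure alone. The minimum-mileage floor is what makes the "statistically significant data" requirement concrete: a low rate over too few miles proves nothing.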
I think if we did all 3 steps, that would be good enough to certify AVs for deployment. All three steps could be standardized to ensure fairness, and together they would provide a more complete validation process: Step 1 ensures the company met the requirements in developing the AV, Step 2 tests for basic driving skills, and Step 3 checks overall safety with statistically significant data. Again, this would not guarantee that AVs never crash, but I think it would set the bar high enough that the public could trust AVs.