How is it flawed? The more interventions, the lower the score, weighted by miles. Interventions are also weighted by severity, which is good because interventions that are more safety-critical should lower the score more than those that are less safety-critical. The score also covers all the important types of interventions. And "perfect drives" improve the score.
Perhaps you don't understand how the scoring works?
Add number of interventions where FSD Beta would have hit something multiplied by 15
Add number of interventions where FSD Beta attempted an illegal act multiplied by 10
Add number of interventions where FSD Beta caused confusion in other drivers multiplied by 3
Add number of interventions where FSD Beta did an incorrect act multiplied by 2
Add number of interventions where driver was not confident in FSD Beta action but it was not illegal or unsafe multiplied by 1
Minus number of "zero intervention" drives greater than 3 miles multiplied by 2
That basically gives you a total of "bad driving" points. Divide by total miles to get "bad driving" points per mile. Then multiply by 100 and subtract it from 100 to get a percentage of "good driving" per mile.
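If it helps, the steps above can be written out as a short Python sketch. The function and parameter names are mine, not part of the scoring system, and the counts in the sample call are made up purely to show the arithmetic.

```python
def fsd_score(would_hit, illegal, confusion, incorrect, not_confident,
              clean_drives, total_miles):
    """Return the "good driving" percentage described in the post.

    All names here are hypothetical; only the multipliers come from the post.
    clean_drives = number of "zero intervention" drives longer than 3 miles.
    """
    if total_miles <= 0:
        raise ValueError("total_miles must be positive")

    # Weighted total of "bad driving" points, reduced by clean drives.
    bad_points = (15 * would_hit
                  + 10 * illegal
                  + 3 * confusion
                  + 2 * incorrect
                  + 1 * not_confident
                  - 2 * clean_drives)

    # "Bad driving" per mile, scaled to a percentage and subtracted from 100.
    return 100 - 100 * bad_points / total_miles


# Made-up counts: 15 + 20 + 9 + 8 + 5 - 12 = 45 points over 500 miles.
print(fsd_score(1, 2, 3, 4, 5, clean_drives=6, total_miles=500))  # 91.0
```

Note that nothing in the formula caps the result, so enough clean drives with few interventions can push it above 100, and a very bad stretch can push it below 0.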
Here is an example: