Umm...did you read that paper? It definitely does not suggest the opposite, nor would Fridman claim that. The paper is very clear about its limited scope and why the results are unlikely to extrapolate to more capable systems. In a very specific situation, the 21 drivers in the study seemed to stay engaged and maintain good awareness when using AP. The paper discusses a number of possible reasons for this; I recommend reading it through. Some relevant excerpts (with a couple of quick arithmetic sketches from me at the end):
“...the Autopilot dataset includes 323,384 total miles and 112,427 miles under Autopilot control. Of the 21 vehicles in the dataset, 16 are HW1 vehicles and 5 are HW2 vehicles.
The Autopilot dataset contains a total of 26,638 epochs of Autopilot utilization...
…these findings (1) cannot be directly used to infer safety as a much larger dataset would be required for crash-based statistical analysis of risk, (2) may not be generalizable to a population of drivers nor Autopilot versions outside our dataset, (3) do not include challenging scenarios that did not lead to Autopilot disengagement, (4) are based on human-annotation of critical signals, and (5) do not imply that driver attention management systems are not potentially highly beneficial additions to the functional vigilance framework for the purpose of encouraging the driver to remain appropriately attentive to the road…
…Research in the scientific literature has shown that highly reliable automation systems can lead to a state of “automation complacency” in which the human operator becomes satisfied that the automation is competent and is controlling the vehicle satisfactorily. And under such a circumstance, the human operator’s belief about system competence may lead them to become complacent about their own supervisory responsibilities and may, in fact, lead them to believe that their supervision of the system or environment is not necessary….The corollary to increased complacency with highly reliable automation systems is that decreases in automation reliability should reduce automation complacency, that is, increase the detection rate of automation failures….
…Wickens & Dixon hypothesized that when the reliability level of an automated system falls below some limit (which the suggested lies at approximately 70% with a standard error of 14%) most human operators would no longer be inclined to rely on it. However, they reported that some humans do continue to rely on such automated systems. Further, May[23] also found that participants continued to show complacency effects even at low automation reliability. This type of research has led to the recognition that additional factors like first failure, the temporal sequence of failures, and the time between failures may all be important in addition to the basic rate of failure….
….We filtered out a set of epochs that were difficult to annotate accurately. This set consisted of disengagements … [when] the sun was below the horizon, computed based on the location of the vehicles and the current date. [So all annotated miles are daytime miles]
Normalizing to the number of Autopilot miles driven during the day in our dataset, it is possible to determine the rate of tricky disengagements. This rate is, on average, one tricky disengagement every 9.2 miles of Autopilot driving. Recall that, in the research literature (see §II-A), rates of automation anomalies that are studied in the lab or simulator are often artificially increased in order to obtain more data faster [19] such as “1 anomaly every 3.5 minutes” or “1 anomaly every 30 minutes.” This contrasts with rates of “real systems in the world” where anomalies and failures can occur at much lower rates (once every 2 weeks, or even much more rare than that). The rate of disengagement observed thus far in our study suggests that the current Autopilot system is still in an early state, where it still has imperfections and this level of reliability plays a role in determining trust and human operator levels of functional vigilance...
...We hypothesize two explanations for the results as detailed below: (1) exploration and (2) imperfection. The latter may very well be the critical contributor to the observed behavior. Drivers in our dataset were addressing tricky situations at the rate of 1 every 9.2 miles. This rate led to a level of functional vigilance in which drivers were anticipating when and where a tricky situation would arise or a disengagement was necessary 90.6% of the time…..
….In other words, perfect may be the enemy of good when the human factor is considered. A successful AI-assisted system may not be one that is 99.99...% perfect but one that is far from perfect and effectively communicates its imperfections….
...It is also recognized that we are talking about behavior observed in this substantive but still limited naturalistic sample. This does not ignore the likelihood that there are some individuals in the population as a whole who may over-trust a technology or otherwise become complacent about monitoring system behavior no matter the functional design characteristics of the system. The minority of drivers who use the system incorrectly may be large enough to significantly offset the functional vigilance characteristics of the majority of the drivers when considered statistically at the fleet level.”
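For a sense of scale, here is some quick arithmetic on the dataset numbers quoted above (my computation, not the paper's):

```python
# Quick arithmetic on the quoted dataset stats (mine, not the paper's).
total_miles = 323_384   # total miles in the Autopilot dataset
ap_miles = 112_427      # miles driven under Autopilot control
ap_epochs = 26_638      # epochs of Autopilot utilization

print(f"Share of miles on Autopilot: {ap_miles / total_miles:.1%}")      # ~34.8%
print(f"Average miles per Autopilot epoch: {ap_miles / ap_epochs:.2f}")  # ~4.22
```

So roughly a third of all miles in the dataset were on Autopilot, in engagements averaging a bit over 4 miles each.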
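And a minimal sketch of the two normalizations in the later excerpts: the tricky-disengagement rate (reported as roughly one per 9.2 daytime Autopilot miles) and the anticipation rate (reported as 90.6%). The input counts below are hypothetical placeholders chosen to reproduce the reported figures; the excerpts give only the resulting rates, not the underlying counts:

```python
# Sketch of the paper's two headline rates. The input counts are hypothetical
# placeholders; the excerpts report only the resulting rates.
daytime_ap_miles = 92_000        # hypothetical daytime Autopilot miles
tricky_disengagements = 10_000   # hypothetical count of tricky disengagements
anticipated = 9_060              # hypothetical count the driver anticipated

miles_per_tricky = daytime_ap_miles / tricky_disengagements
anticipation_rate = anticipated / tricky_disengagements

print(f"One tricky disengagement every {miles_per_tricky:.1f} miles")  # 9.2
print(f"Drivers anticipated it {anticipation_rate:.1%} of the time")   # 90.6%
```

The point of the paper's argument is the relationship between those two numbers: a system imperfect enough to demand intervention every ~9 miles may be exactly what kept these drivers anticipating interventions 9 times out of 10.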