Those are all valid concerns, but what little data we have so far suggests that Tesla drivers, at least, do not over-trust the system on the whole.
Tesla MIT study concludes that drivers maintain vigilance when using Autopilot
I'm not surprised they don't over-trust the system, since it's AP HW 1 & 2, and they only go 9 miles between tricky disengagements on average.
It sounds like you understand that this study really isn't that relevant to this conversation...but for further detail:
There is VERY little data in this study, BTW, and I don't think it's particularly relevant to the obvious concern: What happens when the system becomes highly reliable and highly capable? THAT'S the key, obvious problem with these systems, which I've been harping on above and which will become the concern of regulators. That is the maximum danger point (which I guess is counter-intuitive to some people?).
Anyway, I would have preferred you link directly to the PDF rather than via the sickening Teslarati site! I followed them on Twitter for a bit, then had to stop due to the garbage headlines and fawning Tesla coverage.
Click-bait regurgitation "journalism" makes my blood boil.
For example:
Teslarati said:
The data used in the study was generated from the over 1 billion miles driven by Tesla owners since its [Autopilot's] activation in 2015, about 35% of which were determined to be assisted by Autopilot
A casual reader would conclude the study covers 350 million AP miles, amirite? Actually, this statement is just wrong. It is disgraceful "journalism". Some would call it fake news. At best, it is misleading. There is no way to determine from that statement that the dataset is 112,427 AP miles. (I'll do the arithmetic below.)
The paper (in the introduction - perhaps that's as far as Teslarati got) actually talks about "1 billion miles driven on Autopilot" since 2015...and mentions that their dataset contains 35% Autopilot miles out of the total number of miles in the dataset...
Specifically:
The MIT Study said:
...the Autopilot dataset includes 323,384 total miles and 112,427 miles under Autopilot control. Of the 21 vehicles in the dataset, 16 are HW1 vehicles and 5 are HW2 vehicles.
The Autopilot dataset contains a total of 26,638 epochs of Autopilot utilization.
So as you say, it is VERY little data. From 21 vehicles (HW1 & 2, which are arguably less capable, so likely safer), with possible selection bias, since these cars also had cameras installed in them specifically for this study. (Do these drivers, participating in a study loosely affiliated with MIT, really represent the average Tesla driver of the near future?) The authors of the study SPECIFICALLY understand this, and they are very clear that it is a limited study. It's a study!
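For completeness, here's the back-of-the-envelope arithmetic Teslarati's phrasing invites versus what the paper's numbers actually say (a quick sketch in Python; the only inputs are the figures quoted above):

```python
# Sanity check on the Teslarati sentence, using only the figures
# quoted above from the paper.

naive_reading = 0.35 * 1_000_000_000    # "35% of 1 billion miles" -> 350 million
actual_ap_miles = 112_427               # AP miles actually in the dataset
total_dataset_miles = 323_384           # total miles in the dataset

print(f"Naive reading:       {naive_reading:,.0f} AP miles")                # 350,000,000
print(f"Actual dataset:      {actual_ap_miles:,} AP miles")                 # 112,427
print(f"AP share of dataset: {actual_ap_miles / total_dataset_miles:.1%}")  # 34.8%
print(f"Overstatement:       {naive_reading / actual_ap_miles:,.0f}x")      # 3,113x
```

The 35% describes the composition of the 323,384-mile research dataset, not a slice of the fleet-wide billion miles. Teslarati's phrasing inflates the study's coverage by a factor of over 3,000.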
Some further excerpts I found to be mostly self-evident, but still worth posting here (emphasis in bold added by me):
The MIT Study said:
…these findings (1) cannot be directly used to infer safety as a much larger dataset would be required for crash-based statistical analysis of risk, (2) may not be generalizable to a population of drivers nor Autopilot versions outside our dataset, (3) do not include challenging scenarios that did not lead to Autopilot disengagement, (4) are based on human-annotation of critical signals, and (5) do not imply that driver attention management systems are not potentially highly beneficial additions to the functional vigilance framework for the purpose of encouraging the driver to remain appropriately attentive to the road…
…Research in the scientific literature has shown that highly reliable automation systems can lead to a state of “automation complacency” in which the human operator becomes satisfied that the automation is competent and is controlling the vehicle satisfactorily. And under such a circumstance, the human operator’s belief about system competence may lead them to become complacent about their own supervisory responsibilities and may, in fact, lead them to believe that their supervision of the system or environment is not necessary….The corollary to increased complacency with highly reliable automation systems is that decreases in automation reliability should reduce automation complacency, that is, increase the detection rate of automation failures….
…Wickens & Dixon hypothesized that when the reliability level of an automated system falls below some limit (which they suggested lies at approximately 70% with a standard error of 14%) most human operators would no longer be inclined to rely on it. However, they reported that some humans do continue to rely on such automated systems. Further, May[23] also found that participants continued to show complacency effects even at low automation reliability. This type of research has led to the recognition that additional factors like first failure, the temporal sequence of failures, and the time between failures may all be important in addition to the basic rate of failure….
….We filtered out a set of epochs that were difficult to annotate accurately. This set consisted of disengagements … [when] the sun was below the horizon computed based on the location of the vehicles and the current date. [So all miles are daytime miles]
Normalizing to the number of Autopilot miles driven during the day in our dataset, it is possible to determine the rate of tricky disengagements. This rate is, on average, one tricky disengagement every 9.2 miles of Autopilot driving. Recall that, in the research literature (see §II-A), rates of automation anomalies that are studied in the lab or simulator are often artificially increased in order to obtain more data faster [19] such as "1 anomaly every 3.5 minutes" or "1 anomaly every 30 minutes." This contrasts with rates of "real systems in the world" where anomalies and failures can occur at much lower rates (once every 2 weeks, or even much more rare than that). The rate of disengagement observed thus far in our study suggests that the current Autopilot system is still in an early state, where it still has imperfections and this level of reliability plays a role in determining trust and human operator levels of functional vigilance...
...We hypothesize two explanations for the results as detailed below: (1) exploration and (2) imperfection. The latter may very well be the critical contributor to the observed behavior. Drivers in our dataset were addressing tricky situations at the rate of 1 every 9.2 miles. This rate led to a level of functional vigilance in which drivers were anticipating when and where a tricky situation would arise or a disengagement was necessary 90.6% of the time…..
….In other words, perfect may be the enemy of good when the human factor is considered. A successful AI-assisted system may not be one that is 99.99...% perfect but one that is far from perfect and effectively communicates its imperfections….
...It is also recognized that we are talking about behavior observed in this substantive but still limited naturalistic sample. This does not ignore the likelihood that there are some individuals in the population as a whole who may over-trust a technology or otherwise become complacent about monitoring system behavior no matter the functional design characteristics of the system. The minority of drivers who use the system incorrectly may be large enough to significantly offset the functional vigilance characteristics of the majority of the drivers when considered statistically at the fleet level.
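To put that "1 every 9.2 miles" figure in perspective against the lab rates the authors cite, here's a crude conversion from minutes to miles (a sketch only; the 65 mph average cruising speed is my assumption, the paper states the lab rates purely in minutes):

```python
# Rough comparison of the study's observed rate against the artificially
# elevated lab rates quoted above, converted from minutes to miles.
# NOTE: the 65 mph cruising speed is an assumption, not a figure from the paper.

ASSUMED_MPH = 65.0

lab_rates_minutes = {
    "lab: 1 anomaly / 3.5 min": 3.5,
    "lab: 1 anomaly / 30 min": 30.0,
}
for label, minutes in lab_rates_minutes.items():
    miles = ASSUMED_MPH * minutes / 60.0
    print(f"{label:<26} -> one every {miles:.1f} miles")   # 3.8 and 32.5 miles

print(f"{'MIT study (observed)':<26} -> one every 9.2 miles")
```

By that crude yardstick, the observed rate lands right in the band researchers artificially induce to study vigilance. That reinforces the authors' point: today's Autopilot is imperfect enough to keep drivers functionally vigilant, and the "once every 2 weeks" reliability regime - the maximum danger point I keep harping on - is where complacency becomes the real worry.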