VII. CONCLUSION
The goal of this research was to assess between- and within-vehicle variation for an L2+ system, including driver monitoring, in three key scenarios. In these tests, three Tesla Model 3 vehicles displayed significant between- and within-vehicle variation on a number of metrics related to driver monitoring, alerting, and safe operation of the underlying autonomy.
These results suggest that the performance of the underlying artificial intelligence and computer vision systems was extremely variable, and this variation was likely responsible for many of the delays in alerting a driver whose hands were not on the steering wheel. Ironically, in some cases the cars seemed to perform best in the most challenging driving scenario (navigating a construction zone) but worst in seemingly simpler scenarios such as detecting a road departure.
This finding highlights a common misconception: what humans perceive to be hard in driving is not necessarily what an autonomous system finds difficult. It may be that the cones of the construction zone were more easily detected by a given software version than the road edges during the much more gradual drift of the road departure test. Another possibility is that engineers spend more effort on the more difficult problems and less time on seemingly easy ones. Whatever the reason for such variable and often unsafe behaviors, these results indicate that more testing is needed for these vehicles before such technology is allowed to operate without humans in direct control.
These results also suggest that more effort is needed to develop consistent and accurate alerts when L2+ systems are not performing as expected. These results should be interpreted in light of the discrepancies in the software/hardware configurations of the vehicles, which present a confound for assessing the nature of performance variation. Despite the very similar configurations of Cars 1 and 3, they completed the tests using different versions of software. Car 2 possessed the purported “full self-driving chip” and so, in theory, should have had the most advanced Autopilot system, yet this car objectively performed the worst.
Such results also indicate that the concept of over-the-air updates needs to be revisited when safety-critical functionalities may be changed. While agile software engineering techniques may be suitable for smartphones and similar devices, these techniques can cause significant problems in safety-critical systems. Unfortunately, these processes have never been formally studied or evaluated by a regulatory body. Indeed, these results highlight the need for more scrutiny of the cars and the software embedded in them, as well as the certification processes, or lack thereof, that allow these cars on the road.
Lastly, these results highlight that the post-deployment regulatory process that NHTSA uses (Fig. 1) to protect the public against unsafe vehicle technologies is ill-equipped to flag significant issues with L2+ or, in the future, self-driving cars. These results dramatically illustrate that testing a single car, or even a single version of deployed software, is not likely to reveal serious deficiencies. Waiting until after new autonomous software has been deployed to find flaws can be deadly, and such delays can be avoided by adaptable regulatory processes. The recent series of fatal Tesla crashes underscores this issue. It may be that any transportation system (or any safety-critical system) with embedded artificial intelligence should undergo a much more stringent certification process across numerous platforms and software versions before it is released for widespread deployment. To this end, our current derivative efforts are focused on developing risk models based on such results.