I find that software engineers sometimes have the hardest time understanding how different a Tesla Autopilot software release is from a "normal" software release. I do not believe it is possible to release software like Tesla's Autopilot the way traditional software is released, where you run a test suite, do unit testing or standard regression testing, get the defect count down to a manageable level, and ship. Undoubtedly, they have many terabytes of video and run extensive simulations that are akin to regression testing. But that's not nearly enough to provide confidence of proper behavior in the field.

Instead, after all of that testing, they put the software into the cars for additional validation across a much larger driving pool, but in shadow mode, so it doesn't take action. Even that kind of validation has limited coverage, though. Simply put, it isn't possible to develop this kind of software with lab testing and pre-recorded video alone and achieve the confidence needed for a general release. To top it off, some of the additional hinting is actually crowdsourced to further cut down on both false positives and false negatives, so each run is potentially a new scenario with updated map tiles. Add to that hardware sensor changes, and the amount of regression testing one can do with earlier recorded real-world sensor data is limited at best. The best path, then, is to get the hardware into vehicles as fast as possible and have the new software run in shadow mode.
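To make "shadow mode" concrete, here's a minimal sketch of the idea: the candidate software computes its decision on the same sensor inputs as the active software, but only the active decision ever reaches the actuators, and divergences get logged for later upload. All names and thresholds here are my own hypothetical illustration, not Tesla's actual code:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    steering_angle: float  # degrees
    braking: float         # 0.0 .. 1.0

# Hypothetical divergence threshold; real criteria would be far richer.
STEERING_DIVERGENCE_DEG = 2.0

def shadow_step(sensor_frame, active_stack, candidate_stack, logger):
    """Run both stacks on the same frame; only the active stack acts."""
    active = active_stack.decide(sensor_frame)     # drives the car
    shadow = candidate_stack.decide(sensor_frame)  # never actuates

    # Frames where the candidate would have behaved differently are
    # the interesting ones: record them for upload and offline review.
    if abs(active.steering_angle - shadow.steering_angle) > STEERING_DIVERGENCE_DEG:
        logger.record(sensor_frame, active, shadow)

    return active  # only this decision reaches the actuators
```

The point is that the candidate stack accumulates real-world mileage and exposes its failure cases without ever being in control of the vehicle.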
In any case, comparing a Tesla Autopilot software release with the iOS beta release process betrays a lack of serious thought about the differences between the two kinds of software, the testing regimens that are necessary and possible for each, and the real-world consequences.
Fleet learning at this juncture is about shared validation and updating the map tiles. The neural-net training certainly isn't being done in the vehicles; more likely, deviations from expectations are recorded and uploaded, some of which become data for updating the training, and some of which update the map tiles.
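A rough sketch of that triage loop, purely as a thought experiment (the event kinds, fields, and quorum threshold are all my assumptions, not anything Tesla has published):

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum, auto

class DeviationKind(Enum):
    PERCEPTION = auto()    # the net's output disagreed with expectations
    LOCALIZATION = auto()  # the car's position disagreed with the map tile

@dataclass
class DeviationEvent:
    kind: DeviationKind
    tile_id: str
    payload: dict  # sensor snapshot, observation, etc.

@dataclass
class FleetBackend:
    """Toy backend that triages deviation events uploaded by the fleet."""
    training_queue: list = field(default_factory=list)
    tile_votes: Counter = field(default_factory=Counter)
    tile_update_quorum: int = 50  # hypothetical agreement threshold

    def ingest(self, event: DeviationEvent) -> None:
        if event.kind is DeviationKind.PERCEPTION:
            # Becomes candidate training data; any retraining happens
            # offline in the data center, never in the vehicle.
            self.training_queue.append(event.payload)
        elif event.kind is DeviationKind.LOCALIZATION:
            # Require many cars to report the same discrepancy before
            # trusting it enough to update the shared map tile.
            self.tile_votes[event.tile_id] += 1
            if self.tile_votes[event.tile_id] >= self.tile_update_quorum:
                self.update_map_tile(event.tile_id)
                self.tile_votes[event.tile_id] = 0

    def update_map_tile(self, tile_id: str) -> None:
        print(f"updating shared map tile {tile_id}")
```

The split matters: perception deviations feed slow, centralized retraining, while map-tile updates can propagate back to the fleet much faster.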
The only way a smoother transition could have happened would have been to put both AP1 *and* AP2 hardware in every car, keep the AP1 hardware actually functioning, and blend in AP2 over time. But that's cost intensive and power intensive, and the complication of switching between the two, plus the physical installation and cross-connection of the cameras for both, might well have made for a worse scenario.
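For what it's worth, the "blend" part isn't exotic in principle; a toy version of a gradual handoff might look like the cross-fade below (entirely my own illustration of the idea, nothing to do with how either stack actually works):

```python
def blended_steering(ap1_cmd: float, ap2_cmd: float, ap2_trust: float) -> float:
    """Cross-fade steering authority from AP1 to AP2.

    ap2_trust ramps from 0.0 (AP1 only) toward 1.0 (AP2 only) as the
    newer stack proves itself in the field; until then AP1 dominates.
    """
    ap2_trust = min(max(ap2_trust, 0.0), 1.0)
    return (1.0 - ap2_trust) * ap1_cmd + ap2_trust * ap2_cmd
```

The software is the easy part of that scenario; it's the duplicated sensors, wiring, and compute that make it impractical.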