It's really impressive that they made end-to-end work so well. Whatever kinks the system has, they will now just fix with more/cleaner data. Mainly what has happened is that a lot of the complexity has moved from the software in the car to the software in the datacenters. In the car they will just be running one major neural network (inputs -> control outputs) plus a few shadow-mode neural networks to find new data and maybe to supervise the system.
Basically what they will do is give the neural network tons of examples from a filtered set of good drivers and tell it to predict how they would drive in a given situation.
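To make the "filter good drivers, then predict what they would do" idea concrete, here is a toy sketch of that kind of imitation learning. Everything in it is my own assumption for illustration: the `Sample` fields, the `driver_score` threshold, and the one-parameter linear "policy" standing in for a real neural network. It is not how the actual system is built.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    driver_score: float  # hypothetical quality score for the driver, 0..1
    lane_offset: float   # observation: metres from lane centre
    steering: float      # action the human driver took

def filter_good(samples, min_score=0.8):
    """Keep only samples from drivers above an assumed quality threshold."""
    return [s for s in samples if s.driver_score >= min_score]

def fit_policy(samples):
    """Least-squares fit of steering = w * lane_offset (no intercept)."""
    num = sum(s.lane_offset * s.steering for s in samples)
    den = sum(s.lane_offset ** 2 for s in samples)
    return num / den

def predict(w, lane_offset):
    """Predict what the filtered drivers would do in this situation."""
    return w * lane_offset

logs = [
    Sample(0.95, 1.0, -0.5),
    Sample(0.90, -2.0, 1.0),
    Sample(0.85, 0.5, -0.25),
    Sample(0.30, 1.0, 0.4),  # poor driver steering the wrong way
]

w = fit_policy(filter_good(logs))
```

With the poor driver filtered out, the fitted policy steers back toward the lane centre; leaving that sample in pulls the fit away from what the good drivers do, which is the whole point of filtering first.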
They are adding some complexity to the data labeler/data selector. For example, they have tons of drivers not coming to a full stop at stop signs, but NHTSA forces the car to drive in an un-humanlike way, so they will have to remove normal drivers from the dataset for those situations and only keep NHTSA-style drivers. The messy C++ code will move to figuring out which data to include.
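A rough sketch of what such a data-selector rule could look like. The `Clip` fields, the `"mandatory_stop"` scenario label and the 0.5 mph threshold are all made up for illustration; the real selection pipeline is not public as far as I know.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    scenario: str         # hypothetical label, e.g. "mandatory_stop", "highway"
    min_speed_mph: float  # lowest speed reached during the clip

def keep_clip(clip, full_stop_mph=0.5):
    """Keep a clip unless it shows a rolling stop where a full stop is required."""
    if clip.scenario == "mandatory_stop":
        return clip.min_speed_mph <= full_stop_mph
    return True  # other scenarios are not affected by this rule

clips = [
    Clip("a", "mandatory_stop", 0.0),  # full stop: keep
    Clip("b", "mandatory_stop", 4.0),  # rolling stop: drop
    Clip("c", "highway", 62.0),        # rule does not apply: keep
]

selected = [c.clip_id for c in clips if keep_clip(c)]
```

Each regulatory quirk would presumably become another rule like `keep_clip`, which is where the hand-written complexity would end up.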
Then there are many other parts of the world with different typical local driving styles and different un-humanlike government agency rules. This will complicate things a bit, as the AI needs to learn how to drive in different countries etc. Maybe they can just feed in a variable for which jurisdiction the car is in, and the network learns which of its modes to use in each particular situation. Or they can just gather enough data for each country, in each situation, with different rules. This complexity will be moved offline, but it will still be complexity they need to deal with.
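The "feed in a variable for the jurisdiction" option could be as simple as appending a one-hot code to whatever features the network already gets. A minimal sketch, where the jurisdiction list and the `build_input` helper are hypothetical names of mine:

```python
# Hypothetical set of jurisdictions the model is trained on.
JURISDICTIONS = ["US-CA", "DE", "JP"]

def one_hot(code):
    """Encode a jurisdiction code as a one-hot vector."""
    if code not in JURISDICTIONS:
        raise ValueError(f"unknown jurisdiction: {code}")
    return [1.0 if code == j else 0.0 for j in JURISDICTIONS]

def build_input(scene_features, jurisdiction):
    """Concatenate scene features with the jurisdiction one-hot vector."""
    return list(scene_features) + one_hot(jurisdiction)

x = build_input([0.2, 0.7], "DE")
```

The network would then have to learn from data how that extra input should change its behaviour; the encoding by itself teaches it nothing about the local rules.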
There will be some signs with text that the car will have to learn to read. With enough examples and enough training the car should pick up basic road-sign literacy and basic math. For example:
View attachment 968349
If you feed the neural network the day, time, GPS, navigation screen and then the video where it sees the image above, it might just "learn" to read the gist of it given enough examples of good drivers. ChatGPT could probably solve it, so clearly it is possible. But it's gonna take a massive amount of examples and a massive amount of training to get there... Maybe there is some better way of solving these, but if they actually want to be able to drive in situations where the driver needs to stop and read and think, something will have to be done about these...