ChatGPT is the example given by Elon multiple times in relation to V12 though.
And ChatGPT hallucinates for many reasons. The main ones are that it's fed lots of data mostly indiscriminately, and that the human prompts are sparse and flawed.
V12 is different in that the data is well-massaged / curated, with a tight feedback loop on failures through the data engine. V12's data ranges from extreme examples to the mundane. Also, there are no flawed human prompts; the "prompts" are the same types of data-rich pixel streams as its training set.
It's worth noting that ChatGPT (GPT 3.5) is trained with Reinforcement Learning from Human Feedback (RLHF).
But with regard to Tesla's V12: remember that AlphaGo and its chess-playing successors were only possible because there were great chess engines and Go engines beforehand. Crazy Stone, for example, was a Go-playing AI system that played at a professional level. The existence of these systems made it possible for the engineers to understand how they were achieving that level and where they could be failing. Replicating these algorithms in neural network architectures then became possible, which comes with the multiplier improvement that ML offers, automatically catapulting them to world level.
But if these systems hadn't existed, it would have been harder and taken longer for the engineers to replicate what needed to be done in the neural network. Remember, it's the humans who are architecting these networks, and they architect them based on what they believe is necessary to achieve a particular result. If they don't know what is necessary, they won't be able to assemble a set of architectures that hits the mark they are looking for. It would be like playing darts in pitch darkness.
The same was the case with AlphaZero. AlphaZero was only possible BECAUSE of the lessons from AlphaGo. Those lessons allowed them to see what made AlphaGo work, and then take the overly engineered parts and replace them with RL self-play.
What Tesla is doing, however, is rushing to build an E2E planner without actually having a human-level system (or something close to it) as a working foundation to replicate. So let's be generous and say that FSD Beta as a whole has a disengagement every 100 miles. Changing from a C++ planner to a 99% NN planner isn't going to give you a 1,000x improvement. Let's say in a fairy-tale scenario that it gives them a 10x improvement. They are still left with a system that fails every 1,000 miles, which is far short of what they need to go driverless.
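To make the arithmetic above concrete, here is a minimal sketch. The 100-mile baseline and the 10x and 1,000x factors are the figures from this post; the function name is just something made up for illustration, and none of these numbers are measured data.

```python
def miles_per_disengagement(baseline_miles, improvement_factor):
    """Miles between disengagements after a multiplicative improvement."""
    return baseline_miles * improvement_factor


# Generous baseline from the post: one disengagement every 100 miles.
BASELINE_MILES = 100

# 1x = today, 10x = the "fairy-tale" scenario, 1,000x = the jump the post
# says a planner swap alone will not deliver.
for factor in (1, 10, 1000):
    miles = miles_per_disengagement(BASELINE_MILES, factor)
    print(f"{factor}x improvement: one disengagement every {miles:,} miles")
```

Even the fairy-tale 10x case only gets to 1,000 miles per disengagement, two orders of magnitude short of the 1,000x case.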
Take Waymo, in contrast. They started with a system that was ~100% NN for perception, mostly NN for prediction, and mostly C++ code for the planner. Basically, the planner was just like the Go and chess engines that came before AlphaGo. At least they had a system that worked in suburbs, but its limits were very low, so it would fail when it ran into construction. But because Waymo had a foundation of a system with a robotics planner that at the very least was at human level, they can now turn the robotics planner into an ML planner piece by piece while retaining the performance and getting all the ML benefits for free.
So they went from a mostly C++ planner to an "ML FIRST" planner. I would think of it as a ~60% NN planner. This has allowed them to go from only being able to do driverless in suburban environments, in light rain and with zero construction, to being able to handle city and urban environments, heavy rain, heavy fog, construction, storms, debris, roadblocks, road detours, dead ends, etc. All while being driverless.
If they didn't already have a robotics (C++) planner that achieved human-level performance or better, then yes, they could still transition to a ~100% NN planner, but it wouldn't give the system human-level performance. What matters is the knowledge you have that you can implement as an NN, not the fact that you have an NN planner. That is why Wayve, even though they have a 100% NN planner, is nowhere near human-level performance. It is also why OpenPilot, which constantly markets end-to-end so they can hitch onto Tesla and sell devices, doesn't even have a system.
TLDR: By building system 1.0, you learn how to properly build system 2.0. This is the case in everything, whether you are building a web application, phone application, game, regular software, ML software, hardware, building architecture, cooking, etc.
People have falsely thought that if you geofence to a city you automatically get human-level performance. But if this were true, then everyone would be operating driverless. It's clearly not true, because for driverless cars in the US it's basically Waymo... Cruise, and then everyone else. While the gap between Waymo and Cruise is huge, the gap between Waymo and everyone else is unimaginable.
So yes, V12 will provide improvements and add new features (pullover, parking lots, maybe U-turns, maybe emergency vehicle handling, maybe 3-point turns, maybe dead ends, etc.). But the performance will be similar to or incrementally better than V11 (2-3x).