I bring up chess because it's a simpler domain, in the hopes that people can better understand how reinforcement learning for predicting end-to-end controls progresses. The core idea is that a network of the same size can be improved with examples of where it performed poorly as well as examples of how it could have done better. Chess neural networks can beat humans even when considering only the current board position to predict a single action, i.e., without needing to brute-force explore any possible next moves. (Yes, they'll play even better if they can also think about future board positions, but that doesn't seem as applicable to 12.x.)

> Chess? Please... That's a trivial sandboxed small problem space that you can brute force
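To make the "no search" point concrete, here's a minimal toy sketch of policy-only move selection: a stand-in scoring function plays the role of a trained policy network, and the move is chosen by a single argmax over the legal moves from the current position, with no look-ahead. The function names and scores are entirely hypothetical, not any real chess engine's API.

```python
import random

# Toy stand-in for a trained policy network: maps (position, move) to a
# score. A real chess policy net scores every legal move in one forward
# pass from the current board alone -- no look-ahead search.
def policy_score(position, move):
    rng = random.Random(hash((position, move)))  # arbitrary but repeatable within a run
    return rng.random()

def pick_move(position, legal_moves):
    # Policy-only play: take the single highest-scoring move, ignoring
    # all future board positions entirely.
    return max(legal_moves, key=lambda m: policy_score(position, m))

moves = ["e2e4", "d2d4", "g1f3", "c2c4"]
best = pick_move("startpos", moves)
print(best)  # one move, chosen from the current position alone
```

Adding search (as AlphaZero-style engines do) improves play further, but the point above is that the policy head alone is already superhuman at chess.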
FSD Beta 12.x has more complications than chess, such as real-time video processing at 36 fps, but for each frame it's still predicting the appropriate control action. So for a relatively "early" network like 12.2.1, this could result in decision wobble if the 36 frames per second don't all predict the same best control. Similarly, human disengagements and examples of correct driving provide reinforcement learning data for the end-to-end network to improve controls in the next release without requiring a larger neural network that might approach hardware compute limits. Instead of brute-forcing with a chess simulator to find better behaviors, Tesla can deploy 12.x to find real-world examples of where humans drive better.
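To illustrate what per-frame decision wobble looks like, here's a hypothetical sketch (not Tesla's actual pipeline): when two control options score almost equally, independent per-frame argmax can flip between them frame to frame, while even a simple majority vote over a short sliding window of recent frames settles on one. The scores, action names, and smoothing scheme are all invented for illustration.

```python
from collections import Counter, deque

# Hypothetical per-frame control scores near a decision boundary: two
# actions score almost equally, so independent per-frame argmax flips
# between them ("decision wobble") even though the scene barely changed.
frames = [
    {"left": 0.51, "right": 0.49},
    {"left": 0.49, "right": 0.51},
    {"left": 0.52, "right": 0.48},
    {"left": 0.48, "right": 0.52},
]

raw = [max(f, key=f.get) for f in frames]  # alternates between actions

# One simple (purely illustrative) smoother: majority vote over a short
# sliding window of the most recent per-frame decisions.
def smoothed(decisions, window=3):
    recent, out = deque(maxlen=window), []
    for d in decisions:
        recent.append(d)
        out.append(Counter(recent).most_common(1)[0][0])
    return out

print(raw)
print(smoothed(raw))  # fewer flips than the raw per-frame choices
```

The actual fix in later 12.x builds is presumably a better-trained network that agrees with itself across frames, rather than post-hoc smoothing, but the sketch shows why disagreement across 36 predictions per second would surface as wobble.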