What specific broad concepts do you think are lacking in 12.2.1, so we can see whether they improve in later 12.x releases?
When I assess an approach, I try to intuit any possible fatal flaws.
The end-to-end approach with video training, combined with my experience of 12.2.1's decision wobble, has me concerned:
1) In the case of the decision wobble, V12's dataset has videos of drivers committing to a given maneuver, but it also has videos of drivers behaving differently (like braking) in a similar situation. The biggest challenge and question for the current NN approach is: can the NN conceptualize the purpose of a maneuver and collapse its decision tree when the decision falls in a gray area?
In the video example you gave of the car parking at Denny's, you can see that right before turning into the spot, the path planner showed a right turn for some milliseconds. As the decision to park became clearer, that plan disappeared while the car inched forward.
There are actually a lot of these gray-area decisions, especially in a parking lot. Say there are two possible paths to a pin in the parking lot, but there's a small island in the way. You can turn before the island or after it, and 12.2.1 seems to get tripped up by decisions like these.
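To make the island example concrete, here is a toy sketch in Python. It is not how V12 works (V12 is end-to-end and I have no idea what's inside it); the score differences, the `margin` value, and all function names are made up for illustration. It just shows that when two paths score almost the same, re-picking the best one every frame flips back and forth, while a simple commitment rule (hysteresis, which is my own stand-in for "collapsing the decision tree") holds one choice until the other is clearly better.

```python
def count_switches(choices):
    """Count how many times the chosen path changes between consecutive frames."""
    return sum(1 for a, b in zip(choices, choices[1:]) if a != b)


def per_frame_choice(diffs):
    """Pick the higher-scoring path independently on every frame.

    Each diff is (score of 'before the island') - (score of 'after the island').
    """
    return ["before" if d > 0 else "after" for d in diffs]


def committed_choice(diffs, margin):
    """Keep the current path unless the other one wins by more than `margin`."""
    current = "before" if diffs[0] > 0 else "after"
    choices = []
    for d in diffs:
        if current == "before" and d < -margin:
            current = "after"
        elif current == "after" and d > margin:
            current = "before"
        choices.append(current)
    return choices


# Hypothetical, hand-picked score differences for a gray-area case:
# the two paths are nearly tied, so the sign flips from frame to frame.
DIFFS = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.01]

wobbly = per_frame_choice(DIFFS)
steady = committed_choice(DIFFS, margin=0.05)

print("per-frame switches:", count_switches(wobbly))
print("committed switches:", count_switches(steady))
```

With these made-up numbers the per-frame picker changes its mind on every frame, while the committed version never switches. Whether a bigger or better-trained NN learns this kind of commitment on its own is exactly the open question.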
2) There are still many instances of freezing at stop signs and unprotected turns with 12.2.1. This is something we saw during Elon's livestream, and it persists today. What sort of miracle would be needed to fix this? That's what I wonder.
3) I think these problems can only be reduced with a higher-parameter-count NN, and HW3 is already compute-limited, so HW3 seems like a dead end for V12. Elon already mentioned that more training compute is needed to reduce inference compute, so I guess there's some headroom, but I feel like V12 needs at least a 3-5x higher parameter count to work well, and it would still have to run on HW3.