Stumbled upon this conversation, and while I haven't read the 8 pages of followup, I wondered if anyone is thinking differently since v12.3 came out, given how well it's been received and that it seems to do it all with just neural nets?
It is behaving as I would expect. As an L2+ driving assistant it feels more natural, more human-like (except at stop signs, where government-mandated behavior forces it to act unrealistically), but it makes mistakes that the previous rule-based system did not, particularly around drivable areas: sometimes it takes corners poorly and curbs wheels. That hasn't happened to me (I'm in SoCal, likely in the training set, and probably where it behaves best).
Maintaining the firm invariants that a rule-based system is explicitly programmed to respect is harder with a driving policy that is only a trained net. For instance, "all boundaries of the car must stay within the perceived drivable area" is a firm rule that can be programmed into a traditional path planner, which has physics knowledge of the car's size, mass, acceleration dynamics, etc. The new v12.3 net-based driving is more natural, like people: it sometimes drifts a bit over lane lines (naturally), more than the previous system did, to give space to obstructions (this is good), but it also seems to occasionally take corners too tightly and curb rims (bad behavior, because it doesn't have a physics simulation or a robotics-style hard constraint inside it).
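To make the point concrete, a firm invariant like the one above is a few lines of geometry in a traditional planner. A minimal sketch (all names and the rectangular-footprint simplification are my own assumptions, not anyone's actual planner code):

```python
import math

def car_footprint(x, y, heading, length, width):
    """Four corner points of the car's bounding rectangle at a given pose."""
    c, s = math.cos(heading), math.sin(heading)
    hl, hw = length / 2, width / 2
    return [(x + c * dx - s * dy, y + s * dx + c * dy)
            for dx, dy in [(hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw)]]

def point_in_polygon(pt, poly):
    """Ray-casting test: is pt inside the (simple) polygon?"""
    px, py = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def violates_drivable_area(pose, car_dims, drivable_poly):
    """Firm rule: every corner of the car must lie in the drivable area."""
    x, y, heading = pose
    length, width = car_dims
    return any(not point_in_polygon(c, drivable_poly)
               for c in car_footprint(x, y, heading, length, width))
```

A pure end-to-end net has no such explicit check; it can only approximate this invariant statistically from data, which is exactly why the rim-curbing failure mode can slip through.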
Humans don't have an explicit physics simulation either, but by the time they're driving they've had 16-18 years of physically moving around in space in their own bodies, and evolved neural hardware has millions of years of training on motion, with dedicated mapping and direction neurons built into the brain by evolution. Artificial neural networks don't generally have this kind of built-in specialized architecture yet; we don't yet know how to add it in the right places, but eventually I suspect researchers will figure out minimal ways to bolt it on (like submodules for neural 3D computation) that give good performance with much smaller, more efficient networks.
And humans also make errors in cars at first, but get better with practice.
With a larger and especially a better-curated dataset of positive and negative examples, I think it will progress quickly as an L2++ driver assistant and scale to more localities, but the gap to L4 is still large. Just as LLMs simulate human fluency and thereby trigger incorrect heuristics in people, the greater naturalness of the driving doesn't necessarily mean L4 is around the corner.
Further progress in that direction would probably require multiple simultaneous models: a net-based driving module for most of the drive, augmented with a simultaneous "safety driver" layer that enforces certain boundaries and crash avoidance, has priority, and is rules/robotics-based. For instance, with fast, precise absolute distances, including on the sides of the car from full-angle high-res lidar or radar, you could enforce a policy that prevents collisions and rim curbings in extreme situations while letting the net drive most of the time. This might actually make the rules-based programming easier, since you only have to program boundary-condition rules rather than all the usual heuristics.
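The arbitration I have in mind is simple in structure: the net proposes a command every tick, and the rules-based layer passes it through unless a measured hard limit is violated. A minimal sketch, where the `Command` shape, thresholds, and function names are all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Command:
    steer: float   # rad, positive = left
    accel: float   # m/s^2, negative = braking

MIN_CLEARANCE_M = 0.30   # assumed hard limit to any obstacle or curb
HARD_BRAKE = -6.0        # m/s^2

def safety_override(proposed: Command, clearances_m: list[float]) -> Command:
    """Rules-based 'safety driver': pass the net's command through unless
    a measured clearance (e.g. from lidar/radar) violates the hard limit."""
    if min(clearances_m) < MIN_CLEARANCE_M:
        # Boundary condition violated: this layer has priority.
        return Command(steer=0.0, accel=HARD_BRAKE)
    return proposed  # normal case: the net drives

def drive_step(net_policy, observation, clearances_m):
    """One control tick: net proposes, safety layer arbitrates."""
    return safety_override(net_policy(observation), clearances_m)
```

The appeal is that the hand-written part shrinks to a handful of boundary conditions like this, instead of encoding the entire driving policy in rules.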
But this is expensive and Elon likes cheap.
Heuristic fallacy: with LLMs, people who speak grammatically well are usually knowledgeable and intelligent, and since LLMs do the first part well, we over-trust them, even though they're likely to make *stuff* up more than people do. Similarly, driving that is more human-like most of the time doesn't mean the system has a human-like conception of driving. It's as if they trained an orangutan to drive by having it watch many people drive. It's hard to direct an orangutan with concepts on top of experience.