This is the incredibly hard problem that Tesla is years ahead of the competition in solving: translating the world, including its printed surfaces, into a near-real-time 3D vector space, and of course doing it all on a Tesla (with its suite of sensors).
Change any of those requirements and the problem is MUCH easier!
Adding Lidar makes this easier (still tough, because you need surface recognition for signs etc., plus complex object detection to tell the difference between a shopping bag and a boulder).
Doing it not at near-30fps, but instead taking a few minutes to compute against a series of still images or videos, makes it WAY easier.
Doing it without surface recognition or with different sensors makes it way easier!
Doing it with crazy-high energy/compute available on-car makes it a fair bit easier.
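To make the real-time constraint concrete, here's a toy sketch (my own illustration, not Tesla's code) of the core geometric step: back-projecting a camera pixel into a 3D point using a pinhole camera model. The intrinsics (fx, fy, cx, cy) are hypothetical values; the hard part in a cameras-only system is that the per-pixel depth has to be *learned*, not measured, and the whole thing has to fit in the per-frame time budget.

```python
# Toy back-projection: map a pixel (u, v) plus a depth estimate (metres)
# into a 3D camera-frame point. Intrinsics here are made-up example values.

def backproject(u, v, depth, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0):
    """Pinhole model: pixel + depth -> (x, y, z) in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the optical centre projects straight down the camera axis:
print(backproject(960, 540, 10.0))   # → (0.0, 0.0, 10.0)

# At ~30 fps, the compute budget per frame-set is roughly 33 ms —
# for EVERY pixel of EVERY camera, on in-car hardware:
budget_ms = 1000 / 30
print(round(budget_ms, 1))           # → 33.3
```

Relax any one constraint — offline compute, measured depth from Lidar, a datacenter power budget — and this step gets dramatically easier, which is the point of the list above.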
--
Tesla is doing it the hard way, but it's working! It just needs more of those two-week training cycles to tune the Neural Nets. This literally isn't something you can out-think and nail without the iterative tuning process, and you definitely can't nail it without a BOATLOAD of quality, varied situation data (correctly 'tagged' for the NN training), which requires those two-week cycles through the Data Engine.
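The shape of that Data Engine loop is worth spelling out. This is a hedged sketch — every name here is my own invention, not Tesla's actual pipeline — but the cycle described above looks roughly like: fleet disengagements → human tagging → retrain → evaluate → redeploy.

```python
# Hypothetical sketch of one Data-Engine-style iteration (names are mine).

def data_engine_cycle(model, fleet_clips, tag_fn, train_fn, eval_fn):
    """One two-week-style iteration: tag hard cases, retrain, measure."""
    # 1. Pull clips where drivers took over or filed bug reports.
    hard_cases = [c for c in fleet_clips if c["disengaged"]]
    # 2. Humans 'tag' the correct labels for those situations.
    labelled = [tag_fn(c) for c in hard_cases]
    # 3. Retrain the net on the new corrections.
    model = train_fn(model, labelled)
    # 4. Score the new model before shipping it back to the fleet.
    return model, eval_fn(model)

# Tiny toy instantiation: the "model" is just a set of handled situations.
clips = [{"scene": "roundabout", "disengaged": True},
         {"scene": "highway", "disengaged": False}]
model, score = data_engine_cycle(
    model=set(),
    fleet_clips=clips,
    tag_fn=lambda c: c["scene"],
    train_fn=lambda m, labels: m | set(labels),
    eval_fn=lambda m: len(m),
)
print(model, score)   # → {'roundabout'} 1
```

The real versions of those steps are massive engineering efforts, but the loop itself is why the iteration cadence (not a one-shot design) is what drives quality.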
So yeah, they're crazy good at it now. Truly they are. Even with the schizophrenic behaviour. That's genuinely expected at this point in the NN dev/training lifecycle. Especially with the utterly major shift to the new Occupancy Neural Network, which surprisingly seems to be responsible for a lot of the improvements in 10.69. I'm surprised there wasn't a serious regression in quality switching to this new NN, and it makes me SUPER optimistic.
Obviously they've been working on the Occupancy Net for a while, running it on some internal test vehicles, and maybe it's been running in shadow mode on some of the fleet, but it's obviously had a LOT less data to train on. (So much of the training data comes from Beta drivers: when they take over, report bugs, etc., the engineers look at those situations, 'tag' them, identify where the net made a mistake, and so on.)
So, long story short, the Occupancy Net is still a baby, but it's working pretty darn well. Even if no further software changes were made (of course there will be), it can't yet be as good as it will become: continued training will make it better, and tweaks to the Net itself will make it better still.
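For anyone unfamiliar with what an occupancy network actually outputs, here's a toy illustration (mine, not the real architecture): per 3D voxel, a probability that the voxel is occupied, regardless of *what* occupies it. That's exactly why the shopping-bag-vs-boulder classification problem gets sidestepped for the drivability question. This sketch just fuses per-voxel probabilities from two hypothetical camera views and thresholds them.

```python
# Toy occupancy fusion (illustrative only): fuse two views' per-voxel
# occupancy probabilities by max, then binarise into occupied/free.

def fuse_and_threshold(view_a, view_b, thresh=0.5):
    """Return True (occupied) where either view is confident enough."""
    fused = [max(a, b) for a, b in zip(view_a, view_b)]
    return [p >= thresh for p in fused]

# A flattened 4-voxel grid. The net doesn't need to know bag vs. boulder —
# only whether the space is occupied.
cam_front  = [0.1, 0.9, 0.2, 0.6]
cam_pillar = [0.2, 0.3, 0.7, 0.4]
print(fuse_and_threshold(cam_front, cam_pillar))
# → [False, True, True, True]
```

The real net predicts occupancy (and flow) for a dense 3D grid around the car from camera features; the key design idea is the same: generic volumetric occupancy instead of per-class object detection.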
This is all expected, and very, VERY good news in my estimation.
I'm certain they already do. And actually there are a TON of sensor inputs that go into the NNs, not just the camera feeds.
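As a rough picture of what "more inputs than just cameras" can look like (the specific sensor names here are my assumptions, not a confirmed list of Tesla's inputs), extra streams like IMU readings, wheel odometry, and steering angle can simply be concatenated alongside learned vision features before being fed into a net:

```python
# Hypothetical sensor-fusion input builder (all names are illustrative).

def build_input_vector(camera_features, imu, wheel_speeds, steering_angle):
    """Flatten all sensor streams into one input vector for a net."""
    return camera_features + imu + wheel_speeds + [steering_angle]

vec = build_input_vector(
    camera_features=[0.4, 0.7],    # stand-in for learned vision embeddings
    imu=[0.01, -0.02, 9.81],       # accel x/y/z (m/s^2)
    wheel_speeds=[12.1, 12.0, 12.2, 12.0],
    steering_angle=-0.05,
)
print(len(vec))   # → 10
```

Real systems fuse these streams in far more structured ways, but the point stands: the cameras are the headline, not the whole input.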
Dojo v1 is a revenue game changer for Tesla. They are SO far ahead on Generalized AI. And FSD is basically like Apple's iPod in 2006, funding the development of the iPhone and what became iOS and the App Store.
That vector-space problem is an ENORMOUSLY difficult problem to solve (with the specific constraints for mobility, compute, energy, real-time, sensor suite, etc). And it's a key problem that will allow for so much behavioural AI to be built on top (FSD is the first one Tesla is tackling, Optimus is the next, but the applications are utterly game-changing across nearly every industry).