Increasing the price in Europe was a foolish move, and their conversion rates for FSD sales are still pretty low and only getting lower. The easiest way to know whether Tesla has made major progress is that Tesla will be absolutely yelling it from the mountaintops. They'll give press demo drives, they'll hold a special event for investors, they'll blog about it repeatedly, etc. Because if you're actually the leader in autonomous driving and selling it, that's an obvious way to drum up immediate sales.
Here’s what I’d like to see in NoA: when approaching an interchange where traffic is merging, and you can clearly see traffic coming up an on-ramp, NoA should move to the left lane. This is how I “manually” drive, then move back into the right lane once I’m able to. I know it was doing this in prior software releases, but it was doing it waaaaay too early and hanging in the left lane too long. It slows down in the right lane to allow cars to merge now, which is fine and well, but I could see people behind me bitching about the slowdown. Also, on 2020.36, when merging from certain on-ramps onto the freeway, it wants to go directly to the far left lane of the freeway. That's neither safe nor necessary, and there's no reason it should be doing it in the cases where it does. Whatever. It’s still better than the Pilot Assist my last two Volvos had.
MS R 2020.36.10

Something that has been bugging me for a while (and is currently being ignored while we wait for the fabled 'rewrite') is why the visualizations are so twitchy, with objects often appearing, disappearing, or turning into different objects - especially when stationary. I have had truck visualizations repeatedly flipping the cab from the front to the rear of the truck as surrounding traffic edged forwards. I have had traffic cones appearing and disappearing all over a wet road surface that were nothing more than reflections of stop lights.

Surely no self-driving system can hope to work if it is prepared to believe that random (and near-impossible) events are taking place within its field of vision. No wonder phantom braking is an issue. The AI might well be trained on a frame-by-frame basis, but surely it's obvious that it needs to learn what happens across many frames, taking into account pretty well everything (relevant) that it 'sees' up to a hundred or more yards ahead.

Hopefully this will be the night / day change we see with the rewrite, where we will be able to feel the car change speed and position within a lane based on upcoming road layout / traffic. Right now, driving at (only) 60 in a 60 zone regardless of whether it's a country lane or a major road, not starting to slow until you pass a reduced-limit sign, slamming on the brakes if a vehicle crosses your path some way ahead, and having no regard for parked / stationary cars other than stopping behind them all seem like big shortcomings.

The (simple) NN models I have looked at - once fully trained - do not lurch suddenly from one conclusion to another. If that were even possible, it would suggest the NN wasn't adequately trained, as 'confidence' shouldn't be able to switch back and forth between significantly different outcomes. For that to happen would indicate a dangerous situation.
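To make the confidence point concrete, here's a toy Python sketch of what I mean - nothing to do with Tesla's actual code, and the class, weights, and thresholds are all made up. It smooths per-frame confidences over time and only lets the reported label switch when the new candidate clearly wins, so a single noisy frame can't turn a truck into a cone:

```python
# Hypothetical sketch: smoothing per-frame classifier confidences so the
# reported object class can't flip back and forth between frames.
# NOT Tesla's code -- purely illustrating temporal hysteresis.

class SmoothedLabel:
    def __init__(self, alpha=0.3, switch_margin=0.2):
        self.alpha = alpha                  # EMA weight for the newest frame
        self.switch_margin = switch_margin  # how clearly a challenger must win
        self.scores = {}                    # class -> smoothed confidence
        self.current = None                 # label we actually report

    def update(self, frame_scores):
        """frame_scores: dict of class -> raw confidence for one frame."""
        for cls, s in frame_scores.items():
            prev = self.scores.get(cls, 0.0)
            self.scores[cls] = (1 - self.alpha) * prev + self.alpha * s
        best = max(self.scores, key=self.scores.get)
        # Only switch labels when the challenger clearly beats the incumbent,
        # so one noisy frame (a reflection, say) can't flip the object type.
        if self.current is None:
            self.current = best
        elif best != self.current:
            if self.scores[best] - self.scores[self.current] > self.switch_margin:
                self.current = best
        return self.current
```

With that in place, a sequence of frames that mostly says "truck" with one spurious "cone" frame keeps reporting "truck" throughout - which is the kind of stability the visualizations currently lack.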
I feel as though the issue in training is not so much the stuff the system 'recognizes' as the stuff it effectively opts / learns to ignore.

Edit: if an AI system absolutely must come up with some opinion on what it is seeing (and given finite processor resources), then presumably in some situations something has to give, and you allow an increased chance of the system being 'tricked' purely to have a view to work with. Within a single frame, this might manifest as jumping objects, and IMO there would be far less of that evident today if everything were working correctly. Across multiple consecutive frames, I can't really visualise how it would work: as soon as one frame has low(er) confidence, can you usefully build it into a multi-frame view? So is Tesla really going to leap suddenly from (IMO) the jumpy, twitchy visualizations of today to a far smoother, steadier, more stable / confident flowing view?
Is it generally known how Tesla's recognition (maybe prediction-less) system works? Does it (at present) really just try to make sense of each frame, one at a time, or is it doing object tracking? Object tracking would imply some relationship between consecutive frames, and (with my small amount of real first-hand knowledge) my feeling is that it is recognising but not tracking - or, if it is tracking, that it's not using the (dynamic) data in a predictive manner.

If the system needs to identify and track objects and their relative vectors all around the car, is that still within the scope of HW3? What are the numbers like? 1,000 potential objects that are, or might soon become, trackable? 100 significant objects identified as needing close tracking? Plus some 'emergency' capability to allocate a load of processing to the 3 or 4 objects deemed to pose an imminent threat? Is there even such a hierarchy in the way the system works at present, or might work in future? Or is all this really just an intrinsic function of the NN - as long as you are working at a sufficient resolution, it just works itself out?
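For what it's worth, frame-to-frame tracking in its very simplest form looks something like the toy Python below - a greedy intersection-over-union matcher that carries object IDs from one frame to the next. This is purely illustrative (real trackers like SORT add Kalman motion models, and none of this is Tesla's actual approach), but it shows what an "implied relationship between consecutive frames" means in practice:

```python
# Toy frame-to-frame tracker: match each frame's detections to last frame's
# tracks by greedy IoU overlap. Illustrative only -- not Tesla's approach.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

class Tracker:
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}        # track id -> last seen box
        self.next_id = 0

    def update(self, detections):
        """Match this frame's boxes to existing tracks; return box -> id."""
        assigned = {}
        unmatched = dict(self.tracks)
        for box in detections:
            # Greedily take the best-overlapping surviving track.
            best_id, best = None, self.iou_threshold
            for tid, prev in unmatched.items():
                score = iou(box, prev)
                if score > best:
                    best_id, best = tid, score
            if best_id is None:             # no overlap -> brand new object
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            self.tracks[best_id] = box
            assigned[tuple(box)] = best_id
        for tid in unmatched:               # tracks nothing matched this frame
            del self.tracks[tid]
        return assigned
```

A box that moves a little between frames keeps its ID; a box with no overlap to anything known gets a fresh one. The interesting scaling question is exactly the one above: how many of these tracks can HW3 afford to maintain, and at what matching cost per frame.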
What you are looking for is that 4th dimension, i.e. time. Right now, very little in Autopilot is correlated in time (cut-in detection is one of the exceptions). Elon called the current implementation "2.5D" -- meaning 2D (one image at a time), correlated in time only for some specific tasks. The rewrite is supposed to give full 3D (a stitched view of the entire surroundings) plus correlation in time throughout the entire stack, not just for explicit tasks. Obviously, I am waiting to see this play out, but the potential is there. Hope that helps.
Thanks. I get the time element, but as a process, is it identifying and tracking objects, or does a NN develop some inherent automatic object awareness just by virtue of what it does and how it does it? As well as trying to build a 3D picture from 2D cameras, you then presumably have to hold a moving buffer of, say, 10-20 seconds (maybe less) and follow 'likely' objects through that space. That sounds like a staggering task, and in very busy environments where little can be taken for granted, it seems like the only way to be sure your image / environment processing keeps up is by slowing the car down OR working with a less certain view of the environment.
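The "moving buffer" idea in toy form - again just my own illustrative Python, with made-up frame rates and a naive constant-velocity extrapolation, nothing to do with the actual stack. Each tracked object keeps a rolling window of its recent positions, from which you can fit a velocity and predict where it will be:

```python
# Hypothetical sketch of a rolling position buffer per tracked object,
# used for constant-velocity prediction. Frame rate and window length
# are invented for illustration.

from collections import deque

class TrackHistory:
    def __init__(self, fps=36, seconds=2.0):
        self.buffer = deque(maxlen=int(fps * seconds))  # rolling window
        self.dt = 1.0 / fps

    def observe(self, xy):
        """Append this frame's (x, y) position; old frames fall off the end."""
        self.buffer.append(xy)

    def predict(self, horizon_s):
        """Extrapolate position horizon_s seconds ahead, assuming constant
        velocity over the buffered window."""
        if len(self.buffer) < 2:
            return self.buffer[-1] if self.buffer else None
        (x0, y0), (x1, y1) = self.buffer[0], self.buffer[-1]
        elapsed = (len(self.buffer) - 1) * self.dt
        vx, vy = (x1 - x0) / elapsed, (y1 - y0) / elapsed
        return (x1 + vx * horizon_s, y1 + vy * horizon_s)
```

Even this trivial version makes the cost visible: it's one deque per object, so a hundred tracked objects over a couple of seconds is thousands of stored positions being updated and extrapolated every frame - before you've done anything clever at all.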
The NN for self-driving at Tesla goes through 70,000+ GPU-hours of training on well-curated and meticulously labeled data, so it can only identify objects as well as it has been trained to. But prior to the stitched (3D) view, it would run detection on independent feeds (one per camera), spit out predictions for each of those, and the results were stitched together in code. (I believe this is also where all the UI jumpiness is coming from.)
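A toy version of what "stitched together in code" might look like - per-camera detections merged by proximity. All names and thresholds here are made up, and it assumes the projection into a shared vehicle frame has already happened upstream; the point is just to show where identity flicker can creep in when a hand-written merge step disagrees with itself from one frame to the next:

```python
# Hypothetical post-hoc stitching of independent per-camera detections.
# Assumes detections are already projected into a common vehicle frame
# (that step is omitted). Thresholds are invented for illustration.

def stitch(per_camera, merge_radius_m=1.5):
    """per_camera: dict of camera name -> list of (x, y, cls) detections.
    Detections of the same class closer than merge_radius_m are treated
    as one object (first camera's copy wins)."""
    merged = []
    for cam, dets in per_camera.items():
        for x, y, cls in dets:
            dup = next((m for m in merged
                        if m[2] == cls and
                        (m[0] - x) ** 2 + (m[1] - y) ** 2 < merge_radius_m ** 2),
                       None)
            if dup is None:
                merged.append((x, y, cls))
    return merged
```

If the two cameras' position estimates drift just across the merge radius between frames, the same physical car alternates between being one merged object and two - which would show up in the UI as exactly the kind of popping and duplication people complain about.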