So yet another architectural rewrite? Will Tesla start from scratch or try to just merge the different NNs into one big one?
I am not surprised that Elon wants to do end-to-end AI. It is very consistent with his vision and AI centric approach. I am sure he sees it as the ideal end goal for autonomous driving.
Wayve is already doing end-to-end AI for their self-driving. But all they have are some nice demos, nothing they can actually deploy to the public yet. Experts like Waymo's Anguelov say that the industry is moving in the direction of end-to-end AI, but that doing reliable autonomous driving everywhere with pure end-to-end AI is probably still far off. End-to-end AI is difficult to troubleshoot, difficult to train, and likely requires new AI techniques. End-to-end AI will likely happen at some point, but I think we can expect it will take a while. I think we should expect lots of missed Elon deadlines before V12 actually makes it to general release.
Does diffusion come first or combined with end to end?
Tesla are also likely swapping over to diffusion in place of transformers:

I am not sure what you mean. As I see it, Tesla has two options:
1) Start from scratch. Basically, retrain a brand new end-to-end NN from zero and when it gets good enough, replace the old stack with the new end-to-end stack. The downside of this approach is we could see a major regression in features until the new end-to-end stack catches up with the features of the old stack.
2) Try to consolidate the existing NNs until they become just one big end-to-end NN. With this approach, they might combine NNs and reduce the number of NNs over time. So, they might replace parts of the stack with end-to-end NNs. For example, replace the traffic light and stop sign control with end-to-end AI, and keep doing that until the entire stack is end-to-end. This approach would likely prevent major regressions. It would be more gradual. Although, it might be difficult to do, since you would need to manage the pieces of the stack that are end-to-end alongside those that are not end-to-end yet. And I don't know if there could be unintended consequences from one part of the stack affecting another.
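To make option 2 concrete, here is a minimal Python sketch of the idea: a modular driving stack where one hand-coded component is swapped out for a learned one while the rest of the stack is untouched. All of the class names and the toy logic are my own invention for illustration, not anything from Tesla's actual software.

```python
# Hypothetical sketch of option 2: gradually replacing hand-coded
# modules with learned ones inside a single driving stack.
# Every name here is illustrative, not Tesla's real code.

class HeuristicStopSignControl:
    """Hand-written rule-based module."""
    def act(self, scene):
        # Stop if a stop sign is closer than 30 m, otherwise keep going.
        return "stop" if scene.get("stop_sign_distance", 1e9) < 30 else "go"

class LearnedStopSignControl:
    """Stand-in for an end-to-end NN trained on human driving clips."""
    def act(self, scene):
        # A real model would map raw features to a control decision;
        # here we just mimic the heuristic's interface and behavior.
        return "stop" if scene.get("stop_sign_distance", 1e9) < 30 else "go"

class DrivingStack:
    def __init__(self, modules):
        self.modules = modules  # ordered list of components

    def swap(self, old_cls, new_module):
        # Replace one module while leaving the rest of the stack intact.
        self.modules = [new_module if isinstance(m, old_cls) else m
                        for m in self.modules]

stack = DrivingStack([HeuristicStopSignControl()])
stack.swap(HeuristicStopSignControl, LearnedStopSignControl())
print(type(stack.modules[0]).__name__)  # LearnedStopSignControl
```

The key point of the sketch is the shared interface: as long as the learned module produces the same kind of output the heuristic did, the rest of the stack doesn't need to know the swap happened, which is what makes the gradual approach plausible.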
I agree with this caution about the term "end-to-end". I certainly don't claim to know more about machine learning than Elon or any of his team, but my impression has been that the term is used to mean a system that is trained with a unified model for the complete task. This has numerous implications for performance, is arguably desirable in some respects, but challenges the ability to understand, optimize and troubleshoot the inner workings of the system during development.

I don't think this means they have singular models that are trained end-to-end, it just means that they're replacing some control/planning C++ code and heuristics (say for speed control) with ML. So in V12 there are cases where you have ML models all the way down from pixel to steering/accelerator. But they still will have this human-engineered "vector space" layer between perception and planning. I.e. the objects and lane markings and road edges and so forth that you see on the screen.
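A toy Python sketch of the architecture described above, purely to pin down the terminology: a perception model emits a human-engineered "vector space" (objects, lanes, road edges), and a learned planning component consumes that intermediate representation rather than raw pixels. The function names, the fields in the dictionary, and the toy speed rule are all assumptions made up for illustration.

```python
# Illustrative sketch (not Tesla's code) of a stack that is ML at both
# ends but keeps a human-engineered "vector space" in the middle.

def perception_nn(pixels):
    # Stand-in for a camera NN; returns the intermediate vector space
    # (the kind of thing rendered on the in-car screen).
    return {"lanes": 2, "lead_car_distance_m": 18.0, "speed_limit": 25}

def learned_speed_control(vector_space):
    # In a V12-style stack, pieces like speed control are ML instead of
    # C++ heuristics; this toy rule stands in for such a model. Note it
    # reads the vector space, not raw pixels, so the whole pipeline is
    # not trained as one unified pixels-to-controls model.
    gap = vector_space["lead_car_distance_m"]
    return min(vector_space["speed_limit"], gap)  # slow down when close

vector_space = perception_nn(pixels=None)
print(learned_speed_control(vector_space))  # 18.0
```

The intermediate dictionary is exactly the seam the post is pointing at: because training happens on either side of it separately, the system is "ML all the way down" without being end-to-end in the strict unified-model sense.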
Apparently Ashok stated v12 this year.
He said, "maybe later this year" in the Q&A.
Here is Elon's comment again:
The comment from Yann Le Cun refers to models trained on text, but I do wonder: would hallucinations in the kind of AI Tesla would use be completely preventable? If not, then could you even solve FSD with end-to-end AI?
The "end-to-end world model" means it's not really "end-to-end", which would mean training from sensor inputs all the way to motor and steering control signals.