Captkerosene
> I see the “wishful thinking” stage is in full swing.

That may well be. But, it's the answer to all the "how will it ... " questions.
> You qualify the video -> video prediction task as mainly helping perception and not control policy, but would you agree that for video predictions to be accurate, it probably has at least some general internalization of control-related concepts? For example, if you provided video leading up to a red light vs. a green light, the predicted subsequent video frames could reflect slowing down vs. maintaining speed, even though the network was not explicitly trained to output controls. Expand that to many other video prediction situations (a lead vehicle or not, stop signs, crossing traffic, etc.) where it needs to predict video frames reflecting speed control.
> I totally agree that good control will need dedicated training data, but potentially the amount needed can be significantly less, because the pre-trained world model already ends up with at least basic concepts of average control, as opposed to introducing a completely new idea trained from scratch.

I mean sure, you could have the video -> video net take in throttle and steering inputs, and it will predict where the car is going to go; that's learning a physics network of the dynamics. This wasn't really a big problem before, though, as the ground-truth physics model came from regular control-systems theory, and yes, accurate physics is at least as good. Rediscovering Newton is cool but not the problem.
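To make that concrete, here's a toy PyTorch sketch of an action-conditioned video predictor; every module, size, and name below is invented for illustration and is not anything Tesla has described. Feeding [throttle, steering] into the latent dynamics is exactly what forces the net to learn "where the car is going to go":

```python
import torch
import torch.nn as nn

class ActionConditionedWorldModel(nn.Module):
    """Toy video -> video predictor conditioned on driving controls.

    Predicting the next frame given the current frame *and* the controls
    forces the model to learn latent dynamics: how the scene moves as a
    consequence of ego's own actions.
    """

    def __init__(self, latent_dim=256):
        super().__init__()
        # Encode a 3x64x64 frame into a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Inject the 2-D control vector [throttle, steering].
        self.action_proj = nn.Linear(2, latent_dim)
        # Recurrent latent dynamics: (state_t, action_t) -> state_{t+1}.
        self.dynamics = nn.GRUCell(latent_dim, latent_dim)
        # Decode the predicted latent back into a frame.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, frame, action, hidden):
        z = self.encoder(frame) + self.action_proj(action)
        hidden = self.dynamics(z, hidden)
        return self.decoder(hidden), hidden


model = ActionConditionedWorldModel()
frames = torch.randn(8, 2, 3, 64, 64)   # fake batch of (frame_t, frame_t+1) pairs
actions = torch.rand(8, 2) * 2 - 1      # fake [throttle, steering] in [-1, 1]
hidden = torch.zeros(8, 256)

pred_next, hidden = model(frames[:, 0], actions, hidden)
loss = nn.functional.mse_loss(pred_next, frames[:, 1])  # next-frame prediction loss
loss.backward()
```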
> The central question is how is the policy network going to be trained, and against what ground truth targets and what loss function?

Presumably when fine-tuning the policy network, the training data includes the video, navigation route, and other context, to then minimize error against the human controls. The video is forwarded through the world model to generate its internal understanding of the environment, which acts as additional context/input for deciding controls.
> Presumably when fine-tuning the policy network, the training data includes the video, navigation route, and other context, to then minimize error against the human controls. The video is forwarded through the world model to generate its internal understanding of the environment, which acts as additional context/input for deciding controls.

That is true end-to-end training, where the error (supervision signal) is the difference between ego's trajectory and the human-chosen path & velocity, and that loss is backpropagated all the way back to perception. That's really hard, because the bandwidth of the supervision signal (the human path) is really low compared to video. They could add a secondary task of video -> next-frame video prediction in order to build good internal perceptual models, but in the end the net should only use capacity (the quantity of weights devoted to a task) to do the task it needs to do: drive, not predict video. Predicting video is fine for an offline generative simulator that makes representative movies.
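A sketch of that loss structure, with all names and shapes hypothetical; the auxiliary weight is the knob that decides how much capacity the video-prediction side task is allowed to claim:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: 'backbone' is the shared perception trunk,
# 'policy_head' outputs a planned trajectory, 'video_head' predicts a
# low-res next frame as a secondary task. None of this is Tesla's code.
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 256),
)
policy_head = nn.Linear(256, 20 * 2)   # 20 future (x, y) waypoints
video_head = nn.Sequential(            # low-res next-frame prediction
    nn.Linear(256, 3 * 16 * 16), nn.Unflatten(1, (3, 16, 16)),
)

frames = torch.randn(4, 3, 64, 64)
human_traj = torch.randn(4, 40)        # human-driven path & velocity (ground truth)
next_frames_lowres = torch.randn(4, 3, 16, 16)

feat = backbone(frames)

# Primary supervision: a tiny-bandwidth signal (a few dozen numbers per clip)...
driving_loss = nn.functional.l1_loss(policy_head(feat), human_traj)
# ...versus the dense auxiliary signal (thousands of pixels per clip).
aux_video_loss = nn.functional.mse_loss(video_head(feat), next_frames_lowres)

# The weight controls how much network capacity the side task claims;
# it could be annealed toward zero once perceptual features are good.
aux_weight = 0.1
total_loss = driving_loss + aux_weight * aux_video_loss
total_loss.backward()   # gradients flow all the way back to perception
```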
Are you suggesting this would likely not work for even relatively basic tasks, or are you raising concerns about complex and rare situations?
For example, do you think something like this could learn that it should switch left into a faster lane because it's far enough from the upcoming highway exit, and similarly switch right to prepare to exit when closer? I suppose one concern could be that many humans are happy to follow behind slower traffic and not bother with the faster lane?
Yeah, I noticed that too and wondered if it's used for another fine-tuned output head adjacent to control. Explicit labeled training targets for lanes, objects, signals, etc. could potentially strengthen the world model's internal representations of these concepts and speed up learning, and those stronger signals could then be more useful for other downstream tasks like the control policy.
One practical use of another head trained from labeled data is to produce visualizations. Another use of labeled data is to make clips searchable if Tesla is looking for certain types of behaviors, e.g., examples of an adjacent green turn signal when you're the first vehicle at the stop line and the driver did not go.
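A toy sketch of that multi-head setup (module names and label formats invented): explicit lane/object/signal labels provide dense supervision on the shared trunk, and the same head outputs could feed visualizations or be logged for searching:

```python
import torch
import torch.nn as nn

# Auxiliary supervised heads bolted onto a shared world-model trunk.
# Every name and label format here is a made-up stand-in.
trunk = nn.Sequential(nn.Linear(512, 256), nn.ReLU())   # stand-in world model

heads = nn.ModuleDict({
    "lanes":   nn.Linear(256, 4),    # e.g. 4 lane-boundary offsets
    "objects": nn.Linear(256, 10),   # e.g. per-class object presence logits
    "signals": nn.Linear(256, 3),    # e.g. red / yellow / green logits
    "control": nn.Linear(256, 2),    # the downstream task: throttle, steering
})

features = torch.randn(16, 512)                 # fake world-model features
labels = {
    "lanes":   torch.randn(16, 4),
    "objects": torch.randint(0, 2, (16, 10)).float(),
    "signals": torch.randint(0, 3, (16,)),
    "control": torch.randn(16, 2),
}

shared = trunk(features)
loss = (
    nn.functional.l1_loss(heads["lanes"](shared), labels["lanes"])
    + nn.functional.binary_cross_entropy_with_logits(
        heads["objects"](shared), labels["objects"])
    + nn.functional.cross_entropy(heads["signals"](shared), labels["signals"])
    + nn.functional.l1_loss(heads["control"](shared), labels["control"])
)
loss.backward()   # every head's gradient strengthens the shared trunk
```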
> Based on Teslascope comments I think there is a good chance that V12 is going out to employees now as part of the holiday update.
> We find out tomorrow.

My prediction: what Ashok is hyping as end-to-end in V12 isn't end-to-end training. Instead it uses evolutions of the existing perception with the existing autolabels for intermediate objects, plus a neural-network distillation of the existing optimization- and rule-based policy planner. Now the planner lives back on the training servers, where it can have a higher computational budget (if that was limiting it before) to produce the training labels for the policy network. Maybe some very cleaned-up human driving clips can be added too.
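A minimal sketch of that distillation, assuming the offline planner can be run over logged clips to emit target trajectories; everything here is hypothetical stand-in code:

```python
import torch
import torch.nn as nn

def legacy_planner(perception_feat: torch.Tensor) -> torch.Tensor:
    """Stand-in for the expensive teacher. In reality this would be the
    existing search/optimization planner running on the training servers
    with a big compute budget, not a one-liner."""
    return torch.tanh(perception_feat[:, :40])   # fake 20-waypoint plan

# Small student policy net that learns to imitate the planner's output.
student = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 40),                          # 20 (x, y) waypoints
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    feats = torch.randn(32, 512)                 # fake perception features
    with torch.no_grad():
        teacher_plan = legacy_planner(feats)     # labels made by the planner
    loss = nn.functional.l1_loss(student(feats), teacher_plan)
    opt.zero_grad()
    loss.backward()
    opt.step()
# Cleaned-up human clips could be mixed in as extra (features, plan) pairs.
```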
> Based on Teslascope comments I think there is a good chance that V12 is going out to employees now as part of the holiday update.
> We find out tomorrow.

I know you are super-bullish about everything Tesla - still ....
> I know you are super-bullish about everything Tesla - still ....

Agreed. What is the rush, other than to have something new to complain about? Test it, refine it, and release it when it's ready, not when we want it.
Anyway, I personally hope they send V12 to everyone only when it meets the same benchmark that was used for V11. They made sure there was hardly any regression on V11 compared to before the "single stack".
In other words, the gate should be quality and not calendar.
> These 2023 Holiday Update Park Assist visualizations (at a supercharger?) are much more detailed than the FSD Beta 11.x occupancy network visualizations. I wonder if this is a preview of FSD 12.x visualizations?
> [attached image: Park Assist visualization]

This does not seem helpful.
> This does not seem helpful.

A fairly useless tech demo of NeRFs. I can't think of a practical use for this.
> A fairly useless tech demo of NeRFs. I can't think of a practical use for this.

You can't see a practical use for the park assist?
> You can't see a practical use for the park assist?

Yeah, this is a much better representation of the raw occupancy network (ON) output than converting the data into virtual ultrasonic signals to drive the existing squiggly-line UI.
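Roughly, that conversion is a raycast through the occupancy grid: step outward from each virtual sensor pose and report the distance to the first occupied cell. A toy sketch, with the grid layout and sensor pose invented:

```python
import numpy as np

def virtual_uss(grid: np.ndarray, origin_xy, heading_rad,
                cell_m=0.1, max_range_m=5.0) -> float:
    """Return distance (m) to the first occupied cell along the heading.

    'grid' is an occupancy-probability grid; each loop iteration advances
    roughly one cell along the ray (good enough for a sketch).
    """
    steps = int(max_range_m / cell_m)
    x, y = origin_xy
    dx, dy = np.cos(heading_rad), np.sin(heading_rad)
    for i in range(1, steps + 1):
        cx = int(round(x + dx * i))        # grid indices along the ray
        cy = int(round(y + dy * i))
        if not (0 <= cx < grid.shape[0] and 0 <= cy < grid.shape[1]):
            break
        if grid[cx, cy] > 0.5:             # occupancy-probability threshold
            return i * cell_m
    return max_range_m                      # nothing within range

# Fake 10 m x 10 m occupancy grid at 0.1 m cells with a wall ahead:
grid = np.zeros((100, 100))
grid[70, :] = 1.0
print(virtual_uss(grid, origin_xy=(50, 50), heading_rad=0.0))  # ~2.0 m
```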
> Yeah, this is a much better representation of the raw occupancy network (ON) output than converting the data into virtual ultrasonic signals to drive the existing squiggly-line UI.

I think I'd want my car shown as translucent so I don't have to interact with the screen to see everything, but I do like the proximity coloring. I can't wait to see what it does in my driveway with bushes, light poles, the garage door, and such.
> lower the draw distance and increase the resolution when in parking mode

Ashok Elluswamy answered a question about 3D voxel sizes at CVPR: "Even for between the robot and the car, you can configure different sizes as they have different needs… Optionally, it can be made queryable as in you can get arbitrary precision…"
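One plausible reading of "queryable ... arbitrary precision" is an implicit occupancy function you can evaluate at any continuous (x, y, z), so voxel size becomes just a choice of sampling spacing. A toy sketch (the MLP is a stand-in, not Tesla's network):

```python
import torch
import torch.nn as nn

class ImplicitOccupancy(nn.Module):
    """occupancy(x, y, z | scene features) -> probability, at any point."""

    def __init__(self, scene_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + scene_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, xyz, scene_feat):
        # xyz: (N, 3) continuous query points; scene_feat: (N, scene_dim)
        return torch.sigmoid(self.mlp(torch.cat([xyz, scene_feat], dim=-1)))

model = ImplicitOccupancy()
scene = torch.randn(1, 128)   # fake per-scene feature vector

# Coarse 0.30 m sampling for highway draw distance, fine 0.03 m near the
# bumper: the same network serves both "voxel sizes".
for spacing in (0.30, 0.03):
    pts = torch.stack(torch.meshgrid(
        torch.arange(0, 1.0, spacing),
        torch.arange(0, 1.0, spacing),
        torch.arange(0, 0.3, spacing), indexing="ij"), dim=-1).reshape(-1, 3)
    occ = model(pts, scene.expand(pts.shape[0], -1))
    print(spacing, occ.shape)
```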
> You can't see a practical use for the park assist?

"Blame the fill-in-the-blanks generative AI NeRF," said the child in the blind spot. A recreated 3D scene that looks great isn't the same as knowing what's in the scene. Safe scene understanding requires 360° sensing.