V12.1 going out to employees with internally-terse release notes:

Looks like the same "FSD Beta v12" release notes from a few weeks ago for 2023.38.10. But presumably they made improvements to bump it to "FSD Beta v12.1" as well as updating to 2023.44.30.10, so that employee vehicles already on the holiday update could get this version. The release notes seem to be remotely hosted anyway, so theoretically customer vehicles could update to this same version, and if wider employee testing doesn't turn up issues, the rollout could slowly expand even further.
 
When I listen to Elon talking about the 300k lines of code, he's constantly referring to behaviors at traffic elements / lanes / etc.

If turning left at a roundabout, check the left quadrant for any cars > 5 mph with a predicted path intersecting the ego path within the next 3 seconds; if clear, proceed after 3 seconds
Thanks for the example. I'm not sure if FSD Beta has had that type of control logic specifically for roundabouts, as it seems to be more of a general driving behavior to continuously check for intersecting paths of relevant cars, in addition to all the other things to consider like lanes. The complexity comes from all the other things to simultaneously control for, like an intersection that looks like a roundabout but actually has traffic lights, or a pedestrian in the center, or construction cones on the edge, etc. Or even more basic: the vehicles in the roundabout suddenly stop, so proceeding still needs to avoid hitting the new lead vehicle.
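For illustration, the kind of hand-written heuristic quoted above might look roughly like this. This is a minimal sketch; the data structure, thresholds, and function name are all hypothetical, not anything from Tesla's actual code:

```python
from dataclasses import dataclass

# Hypothetical track representation and thresholds, for illustration only.
@dataclass
class Track:
    speed_mph: float            # current speed of the tracked car
    seconds_to_conflict: float  # predicted time until its path crosses the ego path


def clear_to_enter_roundabout(left_quadrant_tracks: list[Track],
                              speed_threshold_mph: float = 5.0,
                              horizon_s: float = 3.0) -> bool:
    """Return True if no relevant car in the left quadrant is predicted
    to intersect the ego path within the time horizon."""
    return not any(
        t.speed_mph > speed_threshold_mph and t.seconds_to_conflict < horizon_s
        for t in left_quadrant_tracks
    )
```

Multiply something like that by every intersection type, region, and edge case, and it's easy to see how the heuristic approach grows into hundreds of thousands of lines.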

If a neural network was trained to handle all the potential behaviors needed at a roundabout, it seems like it might as well be trained to handle every other intersection as well as non-intersections. But I suppose if there were special roundabout-specific control logic and this control neural network was only trained for roundabouts, that portion of software 1.0 could switch over to 2.0, and the rest of control could follow the pattern and incrementally get to end-to-end modular control.

Perhaps Tesla did look into incrementally increasing the scope of a control network vs. a complete control network for every situation vs. a single end-to-end network, and decided it would be best to go with the single network.
 
I still complain by voice on important disengagements. I doubt every single one is listened to by a human, but I bet somewhere in their sea of metrics, they're at least giving more weight to disengagements that were voice-tagged at all, and who knows maybe they count swear words or something.
The smart way to do it, and the way I suspect they're actually doing it, is to send the recording through voice recognition, then look for keywords. For example, "speed bump".

Then suddenly you have 50,000 immediate examples of the car failing to slow down for a speed bump. Run those clips through the auto-labeler (or, if it's early in the process, label them manually as needed), have a human review the auto-labeled results, and adjust as necessary. It would be a great way to get a lot of pertinent clips quickly.
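A rough sketch of that keyword-mining step, assuming the voice notes have already been transcribed somewhere upstream (the keyword list and the record fields here are hypothetical):

```python
# Hypothetical mining of voice-tagged disengagements by keyword.
KEYWORDS = {"speed bump", "pothole", "missed exit", "wrong lane"}

def find_clips_by_keyword(disengagements: list[dict]) -> dict[str, list[str]]:
    """Group disengagement clip IDs by which keyword their voice note mentions.

    Each disengagement record is assumed to look like:
        {"clip_id": "abc123", "transcript": "it didn't slow down for the speed bump"}
    """
    matches: dict[str, list[str]] = {kw: [] for kw in KEYWORDS}
    for event in disengagements:
        text = event["transcript"].lower()
        for kw in KEYWORDS:
            if kw in text:
                matches[kw].append(event["clip_id"])
    return matches
```

From there, the matching clips could be queued straight into the auto-labeling flow described above.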
 
Perhaps Tesla did look into incrementally increasing the scope of a control network vs. a complete control network for every situation vs. a single end-to-end network, and decided it would be best to go with the single network.

Yes, I believe this is the case. A modular software 2.0 strategy is actually compute-inefficient for inference. Karpathy has talked about FSD being a balancing act of inference compute since 2020, when he mentioned that different FSD teams would be rationed some portion of the compute for the 1000+ outputs from HW3. Over time, it follows that you'd need more efficient architectures as you replace more heuristics with NNs.
 
Anyone notice that the release notes for V12 say that it upgrades "city streets driving"? This seems to imply to me that the new end-to-end stack is only for city streets right now and therefore V12 still uses the "V11 stack" for highway driving.
Tesla has been very cautious in modifying Navigate on Autopilot, likely due to its level of performance in that domain. The methodology may be to test new stuff in city driving, which has more stressors, and only migrate it to NoA when it's solid (and after running in parallel with NoA).
 
Anyone notice that the release notes for V12 say that it upgrades "city streets driving"? This seems to imply to me that the new end-to-end stack is only for city streets right now and therefore V12 still uses the "V11 stack" for highway driving.

Teslascope did answer the question. I think it's a possibility, but their understanding is that this is just draft text and not really indicative of what we'll eventually see as public release notes:

 
I am almost positive they are reusing a lot, if not all, of the perception pieces from version 11. V12 is about feeding these inputs into a new neural network, or set of neural networks, that handles all the decision-making and control outputs for the driving task.

Is this a broadly agreed on community consensus?

I really thought they were trying to end-to-end one single neural network where it takes in video (and other sensor inputs) and spits out a path and velocity plan.

A previous post makes the point that if they're reusing the existing perception layer, they could be building V12 more incrementally.

Another downside I see is that separate layers for perception and planning imply a connection between them in the middle (think "public API" in computer-engineering terminology). I think people have used the term "Vector Space" for the output from the perception stack (locations of vehicles, locations of VRUs, lane lines, drivable space, signs, signals, metadata about these objects, etc.).

The downside I see is that these are all human-picked and curated concepts. By building planning on top of them, the planner can only know about the concepts that humans decided to build into the perception layer. Combining all of it into one neural net, end to end, eliminates the need to even pick what stuff is represented in the intermediary "Vector Space", or how it's represented.
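To make that concrete, a hand-designed perception-to-planning interface might look something like this (an illustrative sketch with invented field names, not Tesla's actual output schema). Anything not listed in the interface is simply invisible to the planner, which is exactly the limitation being described:

```python
from dataclasses import dataclass, field

# Hypothetical "Vector Space" handed from perception to planning.
# The planner can only reason about what these fields capture.
@dataclass
class PerceptionOutput:
    vehicles: list[dict] = field(default_factory=list)       # position, velocity, heading per vehicle
    vrus: list[dict] = field(default_factory=list)            # pedestrians, cyclists, etc.
    lane_lines: list[list[tuple[float, float]]] = field(default_factory=list)
    drivable_space: list[tuple[float, float]] = field(default_factory=list)   # polygon boundary
    traffic_controls: list[dict] = field(default_factory=list)                # signs, signals, metadata
```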
 
The downside I see is that these are all human-picked and curated concepts. By building planning on top of them, the planner can only know about the concepts that humans decided to build into the perception layer. Combining all of it into one neural net, end to end, eliminates the need to even pick what stuff is represented in the intermediary "Vector Space", or how it's represented.

It's still possible to connect intermediate layers of one module to another, prior to the high-dimensional vector space being reduced down to the human-interpretable concepts.

The only requirement I see for a neural network to be truly end to end is that the back propagation during training is done in one continuous step. Tesla could hold the current final layer of the perception stack constant and separate, for visualization purposes, and train the rest of the network end to end.

But throwing out the training efficiency of actual end-to-end for the sake of visualizations would be throwing the baby out with the bathwater.
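A minimal PyTorch-style sketch of that idea, where a frozen visualization head hangs off the intermediate features while the backbone and planner still train end to end. The module names, sizes, and overall layout are assumptions for illustration, not Tesla's architecture:

```python
import torch
import torch.nn as nn

# Hypothetical modules; sizes are placeholders.
backbone = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 256))  # shared features
viz_head = nn.Linear(256, 64)   # produces human-readable objects for the visualization only
planner = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 2))      # e.g. steering/accel

# Freeze the visualization head so it keeps producing stable outputs for display,
# while gradients still flow in one continuous pass through planner + backbone.
for p in viz_head.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(planner.parameters()), lr=1e-4)

def training_step(sensor_features, target_controls):
    feats = backbone(sensor_features)        # intermediate "vector space" features
    _ = viz_head(feats)                      # visualization only; no loss, weights never update
    controls = planner(feats)                # planner learns directly from the raw features
    loss = nn.functional.mse_loss(controls, target_controls)
    optimizer.zero_grad()
    loss.backward()                          # one continuous backward pass, end to end
    optimizer.step()
    return loss.item()

# Example usage with made-up data:
# training_step(torch.randn(8, 1024), torch.randn(8, 2))
```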
 
I see, sort of. This is where my lack of ML experience isn't helping...



What sort of information is passed around in this architecture? What are the inputs and outputs of "layered" NNs like this? Are those inputs and outputs completely non-human-readable? How do you train it then?

If you have a couple hours and are interested in learning, this video from Karpathy is really useful for understanding the core mechanics of modern NNs:


It also comes with a GitHub repository: GitHub - karpathy/ng-video-lecture

I can't speak for C++ implementations, but the inputs and outputs in Python are just a special type of numerical matrix (a tensor) with defined dimensions. The architecture that defines the connections between the weights is built from modules, and training is a process of initializing random weights, running the inputs through them to get a predicted output, and then updating the values of the weights from back to front based on how the predicted output compares to the desired output.
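A bare-bones PyTorch example of that loop (the two-layer network and the made-up data are just for illustration):

```python
import torch
import torch.nn as nn

# Tiny network: the weights start out random.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Made-up training data: inputs and desired outputs are just tensors
# (numerical matrices with defined dimensions).
inputs = torch.randn(32, 4)
targets = torch.randn(32, 1)

for step in range(100):
    predictions = model(inputs)                          # forward pass through the modules
    loss = nn.functional.mse_loss(predictions, targets)  # compare predicted vs. desired output
    optimizer.zero_grad()
    loss.backward()                                      # backpropagate: gradients flow back to front
    optimizer.step()                                     # nudge the weights to reduce the loss
```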
 
I see, sort of. This is where my lack of ML experience isn't helping...



What sort of information is passed around in this architecture? What are the inputs and outputs of "layered" NNs like this? Are those inputs and outputs completely non-human-readable? How do you train it then?
In a discrete, human-understandable NN pipeline, you feed data in one end and get human labels out the other. Video -> {lanes, cars, pedestrians}, that kind of thing. So it might go video -> objects -> path planning.
However, that intermediate labeling step throws out all the additional data in the feed and only leaves the next NN an N-label vector of probabilities/values. Maybe there is a low-confidence pedestrian object that is actually something important.
Connecting all the NNs together without the downsampled intermediate stages lets the final output pull in more of that information.
The result is fewer (or no) human-friendly quantization layers.

If doing OCR:
NN1: circle, arc, line detection
NN2 or C++: based on NN1 outputs, determine letters
NN3 or C++: based on NN2 outputs, determine word
vs
NN: determine word
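In code terms, the contrast might look like this (a toy sketch with made-up module sizes, not a real OCR system):

```python
import torch.nn as nn

# Modular pipeline: each stage outputs a human-defined intermediate representation,
# and downstream stages only ever see that reduced representation.
stroke_detector = nn.Linear(784, 32)    # NN1: image patch -> {circle, arc, line} scores
letter_classifier = nn.Linear(32, 26)   # NN2: stroke scores -> letter probabilities
# Word lookup could then be NN3, or plain C++/Python logic over the letter sequence.

# End-to-end alternative: one network maps pixels straight to a word prediction,
# free to learn whatever internal representation works best.
word_reader = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10_000))
```

The end-to-end version never has to commit to "circle/arc/line" as the thing worth representing, which is the point about losing the human-friendly quantization layers.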
 
It's completely unrealistic to me that FSD will be L3 at highway speed next year. Perhaps in a few years at lower speeds, in daytime, on dry roads, and without lane changes, like Mercedes, but likely never, due both to a lack of business incentives and to technical limitations. L4? LOL.

Just the certification for (L3) UNECE R157 will take 3-6 months, and Tesla currently doesn't have the reliability required, nor required features like a handover protocol, MRM, an emergency corridor, etc. Earliest 2025, probably never on current hardware, is my guess.

Instead they're targeting the new DCAS regulations from UNECE, if they can ever get those approved. Tesla has been chairing that L2 effort for 3 years with limited success thus far. Perhaps implemented by 2025-2026?
My experience with AP and, more recently, FSD on the highway has been excellent. The only time I've had to take over for FSD is because it misses exits, and those are primarily cloverleaf exits, which are somewhat difficult anyway because there's such a short merge period. If you restrict it to straight interstate driving with no exits, then it absolutely could be ready.

I don’t think it’s ready for all highway driving but definitely some limited use scenarios, and even those would be huge.

Edit: my other minor gripe is it doesn’t reliably merge back over to the right lane after passing unless there’s a car on your tail. That should be a trivial fix for them to make, though.
 
My experience with AP and, more recently, FSD on the highway has been excellent. The only time I've had to take over for FSD is because it misses exits, and those are primarily cloverleaf exits, which are somewhat difficult anyway because there's such a short merge period. If you restrict it to straight interstate driving with no exits, then it absolutely could be ready.

I don’t think it’s ready for all highway driving but definitely some limited use scenarios, and even those would be huge.

Edit: my other minor gripe is it doesn’t reliably merge back over to the right lane after passing unless there’s a car on your tail. That should be a trivial fix for them to make, though.
If you feel it’s ready, why not try it with a blindfold and see if it still feels ready? 😉

I love my AP but I also understand that I am driving.