FSD v12.x (end to end AI)

Tronguy · Mar 13, 2024

kabin said:
More specifically, thick as a brick is in reference to the net's ability to retain new training data.

Actually..

When one is playing with heuristics, one can look at a bit of code and say, "That code does this here in that situation." Lots and lots and lots of code, but it's all visible; one can even get the equivalent of printf's on the state variables during debugging and all that.

My understanding is that, with a NN, all that visibility is gone. The natural design is that there's inputs, lots; outputs, not so many, but what happens in between might not quite be a mystery, but it's close. So, the question of, "Just what was it computing?" might not have any easy answer.

Discoducky · Mar 13, 2024

Take time and watch this entire video:

Several examples of 12.2.1 doing things very human-like in a challenging, complex and busy situations where it is making decisions confidently. Obviously open for interpretations based on your comfort levels, but this looks and feels like an example drive which could easily be repeated with similar results (no interactions, no issues).

Is it Waymo level? Remains to be seen, but it feels like it. For instance, I'd be nearly convinced with a solid 4 hours of this kind of data without intervention. 8 hours and that would complete an entire average "shift" of a robotaxi between charges. And you'd need to string several of these together walk the 9's.

mongo · Mar 13, 2024

EVNow said:
If part of NN learning is by minimizing errors in test as well - you have to have fairly representative test data including oversampled edge cases. You are always looking at quantitative figures with testing (% errors etc) which can all be skewed by the kind of (and amount of) test data that you have. I don't think there are any free lunches when it comes to test data either.

Right, 1000 passes of test case type 1 with 0 passes of tests 2-10 should be more like 10% than 99.9%.
That said, FSD should properly handle all test cases (for some value of properly)

spacecoin · Mar 13, 2024

EVNow said:
Do we *know* that is not the case with living organisms .... ?

We know that Moravec's paradox anno 1988 still is valid and that's proof "enough" to me that humans don't work like that. Check out Lex Fridman Podcast #416 with Yann LeCun.

Mardak · Mar 13, 2024

Todd Burch said:
I've had 11.4.9 blow right through (or start to) well-marked, normally-placed stop signs

Yeah, I've experienced similar stop sign mistakes with FSD Beta, but most of the time, it'll handle the easy situations correctly. I've also experienced more false positive stopping/slowing where 11.x seems to be extremely sensitive to signage (doesn't even need to be a stop sign) at intersections especially with a crosswalk, and this particular case of 12.2.1 not stopping had clear 11.x visualizations of the intersection, crosswalk and stop sign. I've experienced very similar situations with 11.x correctly stopping for temporary stop signs on A-frames placed both on the left and right of the lane, so that's why I also suspect 12.x learning from 11.x mistakes could have this behavior under-represented.

Given Elon Musk's comment about "Gathering video training data over a wide range of adversarial conditions," it somewhat sounds like Tesla is focusing their dedicated drivers on the more dangerous situations that would not be ideal to put average FSD Beta testers in needing to disengage or even attempt. Correcting the stop sign behavior by rolling out 12.2.1 for fleet disengagement might be more acceptable than for high speed unprotected left turns. If Tesla is still collecting training data for this particular adversarial condition, it'll be interesting to see how quickly and widely 12.3 rollout will be as presumably it was trained on 12.2 disengagements so the newer version should be safer to release to more (but these unprotected lefts might need more training for 12.4+).

stealthyc · Mar 13, 2024

Discoducky said:
Take time and watch this entire video:

Several examples of 12.2.1 doing things very human-like in a challenging, complex and busy situations where it is making decisions confidently. Obviously open for interpretations based on your comfort levels, but this looks and feels like an example drive which could easily be repeated with similar results (no interactions, no issues).

Is it Waymo level? Remains to be seen, but it feels like it. For instance, I'd be nearly convinced with a solid 4 hours of this kind of data without intervention. 8 hours and that would complete an entire average "shift" of a robotaxi between charges. And you'd need to string several of these together walk the 9's.

It's very human like, but where I first clicked into the video it stopped where it looked like it was partially blocking an intersection and then ran a red light, followed by stopping at a red light while straddling the double yellow. Very human like. Not sure that's good though? Honest question.

It definitely drove quite well overall, especially relative to v11. I will say though that a network of mostly one way streets is not what I would consider particularly challenging. I don't mean to say it didn't encounter any challenging situations, but it would be more interesting to me to see it drive in manhattan or in situations where it has to negotiate with other drivers and navigate multi-lane roads.

I should add that I *have* watched more of Bradford's videos including the one in my area, I know they exist. And there are in fact more interventions in some of those.

JB47394 · Mar 13, 2024

stealthyc said:
It definitely drove quite well overall, especially relative to v11. I will say though that a network of mostly one way streets is not what I would consider particularly challenging. I don't mean to say it didn't encounter any challenging situations, but it would be more interesting to me to see it drive in manhattan or in situations where it has to negotiate with other drivers and navigate multi-lane roads.

We've come a long way from "It won't go when the light turns green."

powertoold · Mar 13, 2024

When you guys get V12, you'll be surprised how stable, coherent, and reliable its decision making is. I had my doubts about end to end video training initially, but V12 is way more predictable and stable than I anticipated. It doesn't randomly hallucinate a car and swerve or stop for no reason. It doesn't imagine a random lane and drive in it, etc.

All of its decisions have some rational foundation, even if the reason is a bad one.

If this is the case, then more training will only reduce the bad decisions.

LowlyOilBurner · Mar 13, 2024

powertoold said:
When you guys get V12, you'll be surprised how stable, coherent, and reliable its decision making is. I had my doubts about end to end video training initially, but V12 is way more predictable and stable than I anticipated. It doesn't randomly hallucinate a car and swerve or stop for no reason. It doesn't imagine a random lane and drive in it, etc.

All of its decisions have some rational foundation, even if the reason is a bad one.

If this is the case, then more training will only reduce the bad decisions.

I bet it fails an 1/4mile out of my driveway, in the same spot 11.4.9 fails.

EVNow · Mar 13, 2024

powertoold said:
When you guys get V12, you'll be surprised how stable, coherent, and reliable its decision making is. I had my doubts about end to end video training initially, but V12 is way more predictable and stable than I anticipated.

Agreed. It seems to be doing better than I anticipated ... in the poll on when V12 will get released - I chose H2 of this year. Its possible we'll get it sooner (though considering we are in March - it may be beginning of H2). The failures also seem to be somewhat guessable (close encounters in parking lots, dividers etc). Red light skipping can be monitored and intervened easily.

My biggest worry with AP/FSD in general is swerving on to traffic suddenly departing the lane. That is very difficult to prevent even if you are very attentive and have your hand on the wheel all the time. V12 doesn't seem to do that (infact I'm aghast how little hands on the wheel we see with all the testers - I always have one hand on the wheel all the time and at intersections, both the hands).

Mardak · Mar 13, 2024

EVNow said:
Will they be able to add training data to correct mistakes but that doesn't adversely affect other parts of driving ? How long will that take for each problem ?

One big potential for 12.x is the ability to improve on many things at the same time versus 11.x generally required dedicated engineering effort focusing on specific control issues. Even one disengagement of a particular behavior can incrementally improve end-to-end for the next trained version, so aggregate that across the many different types of disengagements across the fleet, there can be parallel learning potentially even with synergies of seemingly different situations that the neural network associates together. The downside of not having a dedicated effort is a less clear "finished" state such as the introduction of control for creep limit and median crossover region; but then again, 11.x wasn't able to consistently pass Chuck Cook style situations either.

It sounds like Tesla has a huge repository of examples of what FSD Beta should not do, and presumably additional test cases are continuously added from safety disengagements. However, a lot of driving is also very flexible such as making a lane change now vs later, so additional training can adversely affect other parts of end-to-end. Although hopefully if it's truly flexible, maybe some of these "regressions" don't matter as much.

drtimhill · Mar 13, 2024

Tronguy said:
Actually..

When one is playing with heuristics, one can look at a bit of code and say, "That code does this here in that situation." Lots and lots and lots of code, but it's all visible; one can even get the equivalent of printf's on the state variables during debugging and all that.

My understanding is that, with a NN, all that visibility is gone. The natural design is that there's inputs, lots; outputs, not so many, but what happens in between might not quite be a mystery, but it's close. So, the question of, "Just what was it computing?" might not have any easy answer.

Indeed, and this is the concern with such things .. there is never any way to determine if (say) a robotaxi that has been driving well for 5 years suddenly decides to drive into a brick wall. NNs work in probabilities only. However, any sufficiently large code base, in practice, also becomes very hard to make 100% predictable .. something you can see the FSD team was struggling with in V11.

PianoAl · Mar 13, 2024

EVNow said:
swerving on to traffic suddenly departing the lane.

What do you mean? Typo?

Discoducky · Mar 13, 2024

stealthyc said:
It's very human like, but where I first clicked into the video it stopped where it looked like it was partially blocking an intersection and then ran a red light, followed by stopping at a red light while straddling the double yellow. Very human like. Not sure that's good though? Honest question.

It definitely drove quite well overall, especially relative to v11. I will say though that a network of mostly one way streets is not what I would consider particularly challenging. I don't mean to say it didn't encounter any challenging situations, but it would be more interesting to me to see it drive in manhattan or in situations where it has to negotiate with other drivers and navigate multi-lane roads.

I should add that I *have* watched more of Bradford's videos including the one in my area, I know they exist. And there are in fact more interventions in some of those.

my only gripe in that video is the red light running. However, in that scenario, there is an indication that FSD had already entered the intersection (including coming to a smooth stop) based on the position of the solid white line relative to the white car parked on the right.

And another car was also in the intersection while travelling well below the speed limit. So was FSD actually in the wrong? I would not have proceeded, but I can see how the model could have interpreted this situation differently based on measurements.

PianoAl · Mar 13, 2024

JulienW said:
Unfortunately that proves the placebo effect. There is NO v12 of highways. It switches back to the same v11 stack when you get on a highway.

I'm a super fan of FSD, and I only have a few gripes with v12, but I'm seeing a lot of honeymoon effect with V12 reports.

For example, Dirty Tesla is wearing rose colored glasses in the video below. He's like, "Wow, that was great!" for many things, but doesn't emphasize some major mistakes, such as this one where the car could have had a close encounter of an unpleasant kind if she hadn't been paying attention.

sleepydoc · Mar 13, 2024

uscbucsfan said:
Except all of it.

We went over this. They had drivers manually driving for weeks as Chuck pointed out, then what looked like FSD driving. There have been drivers there for months testing and training almost daily...which Elon confirmed.

But of course, there's no reason for them to be training anywhere, because they will just pull data from cars. Tesla isn't hiring drivers to test/validate FSD, and on and on.

You couldn't have been more wrong.

It’s very possible to make a sound argument or decision based on the available information and still be wrong.

PianoAl · Mar 13, 2024

Discoducky said:
And another car was also in the intersection while travelling well below the speed limit. So was FSD actually in the wrong?

Sure looks like it was dead wrong to me:

The guys in the car joked about it, but Tesla can't let the cars do that. They were lucky that NYPD car didn't show up a block earlier.

mongo · Mar 13, 2024

PianoAl said:
I'm a super fan of FSD, and I only have a few gripes with v12, but I'm seeing a lot of honeymoon effect with V12 reports.

For example, Dirty Tesla is wearing rose colored glasses in the video below. He's like, "Wow, that was great!" for many things, but doesn't emphasize some major mistakes, such as this one where the car could have had a close encounter of an unpleasant kind if she hadn't been paying attention.

"She" being the second pedestrian?
There was little risk at that speed (4 MPH), they didn't use the crosswalk (it's at the tree), and this is Ann Arbor (IYKYK)...

stealthyc · Mar 13, 2024

PianoAl said:
I'm a super fan of FSD, and I only have a few gripes with v12, but I'm seeing a lot of honeymoon effect with V12 reports.

For example, Dirty Tesla is wearing rose colored glasses in the video below. He's like, "Wow, that was great!" for many things, but doesn't emphasize some major mistakes, such as this one where the car could have had a close encounter of an unpleasant kind if she hadn't been paying attention.

That area is part of the University of Michigan campus, and often the students will just blindly walk out into the street anywhere without regard, either on phones or just not paying attention. More driving around that area would be be edifying about how it interacts with pedestrians that aren't acting "properly". I think actually a different time in the video Bradford disengages to stop it from aggressively moving toward a pedestrian crossing the street when the car is making a right turn.

Overall the car did pretty well in Ann Arbor, but I know of many places that v11 mightily struggles for me there and I look forward to getting v12 and giving it a whirl.

stealthyc · Mar 13, 2024

mongo said:
"She" being the second pedestrian?
There was little risk at that speed (4 MPH), they didn't use the crosswalk (it's at the tree), and this is Ann Arbor (IYKYK)...
View attachment 1027573
View attachment 1027574 View attachment 1027575

Yeah I think later there's a disengagement at 27:18 turning on to State from Liberty which is far worse.

FSD v12.x (end to end AI)

Active Member

P100DL, 2021 M3, 4 CT reservations and counting

Well-Known Member

Active Member

Active Member

Member

Active Member

Active Member

Active Member

Well-Known Member

Active Member

Active Member

Active Member

P100DL, 2021 M3, 4 CT reservations and counting

Active Member

Well-Known Member

Active Member

Well-Known Member

Member

Member

Similar threads