FSD v12.x (end to end AI)

JB47394 · Mar 17, 2024

Mardak said:
Similarly, even when navigation wanted to go straight, I was sometimes able to trick it to turn by engaging the turn signal.

I've wanted that as a feature for a while because I often use FSD as a simple lane-following system where I get to back seat drive with the turn signals. I tried it once today with no luck. I'll have to try it some more.

Mardak said:
I haven't tested enough of regular set speed, but it seems like 12.x tries to drive at the speed it wants to unless traditional control code prevents exceeding the set speed (or 50% above detected speed limit). AUTO max set speed effectively is the Autopilot 85mph limit, so the main benefit of numeric set speed seems to be to go slower, e.g., for school zones if end-to-end doesn't understand.

I've had good luck with auto speed off. The car tends to hit the number I give it, and it's much better at maintaining a uniform speed.

Separately, I noticed that when crossing into a reduced speed limit zone, the car no longer slows quickly like it does in the current V11. It's very casual about slowing down, and invites speeding tickets.

JB47394 · Mar 17, 2024

JHCCAZ said:
Other kinds of traffic rule signs, including NToR and school zone directives etc., were only sporadically (if at all) recognized in v11. Again, I don't think these capabilities would emerge easily in a training process that simply added some random set of parameters. In theory they could, but at a very high cost in training data and time.

Do you think that stop signs have a very high training cost?

AndrewZ · Mar 17, 2024

Iain said:
Funny how there are so many different opinions. I’ve had it one day. I did four drives and so far zero interventions. I basically never had a zero intervention drive prior to this unless it was extremely short. It’s not perfect by any means but it’s a big jump in the right direction. I’ve had fifty Uber rides that were worse. I see drivers every day drive far worse than 12.3.

An SAE Level 4-5 system needs to be much, much better than all Uber drivers and much, much better than regular everyday drivers.

It’s promising to hear that v12.3 is excelling in smoothness and such; this is critical for wide release as a Level 2 driver assist.

JHCCAZ · Mar 17, 2024

RowdyMY said:
Auto max speed blows on state highways. Tries to match speed when clusters of speeders pass, otherwise hangs out at 2-3 above the limit.

Would you say that this behaves like the annoying class of drivers who are going slower, but then wake up and compete with you when you try to pass them? This would definitely be the wrong behavior.

There's a fine line between keeping up with traffic on the one hand, and reactively getting in the way on the other hand. Each driver can have their own slightly different speed preference and driving style. That's okay and others will adapt by passing or following as appropriate. But the ones who are too dense to get out of the way, or who speed up even a little when you try to pass, are the scourge of the road. I hope v12 isn't doing those things! And definitely if this is the case, it should get fixed before they deploy it on the freeways.

(I turned off the Auto speed offset last night with no traffic around, so I didn't get to experience its interaction with afternoon traffic today.)

DanCar · Mar 17, 2024

Joe is not impressed with V12 driving on rural roads.

https://twitter.com/x/status/1769527966533382453

Bladerskb · Mar 17, 2024

Todd Burch said:
The best part of 12.3 going wide is that I don’t have to watch any more of Whole Mars’ boring videos.

Yeah, he takes the same easy routes so he can get unlimited "zero intervention" drives. He's in Cali and SF, where he could actually test the software for real and take it to places that challenge it. But nope.

If you actually want to watch challenging videos from Cali, you have to go to people like Ken, who has to pay Waymo for rides yet picks destinations that would challenge the system.

Even randoms like this have more challenging videos than you would see from Omar.

FSDtester#1 said:
Nope, you're not.
The drivers located back at Waymo HQ waiting to take over whenever their cars get stuck, which happens often, are. Not a bad system for ~ $500k per car, hardware included, plus a staff to supervise and help drive all the cars.
Seems like a super scalable model.

/S

Stop listening to super stans on twitter and youtube who have brain rot.
There's no one at Waymo HQ waiting to take over (no one joysticks or pedals the car remotely).
It DOESN'T get stuck often.
And it's nowhere near $500k.

AlanSubie4Life · Mar 17, 2024

AlanSubie4Life said:
Awesome! Chuck is a machine.

I haven't watched all of Chuck's v12 videos, but I gather from his first video (not unprotected left specific) he had an easy roller on his UPL. So we'll call that 1/1.

From the new UPL-focused video:

This was a medium to light traffic situation. There was a decent amount of traffic (but plenty of gaps) in the close lanes, but very very little in the far lanes.

Hopefully he does another one with more traffic soon! Plenty of easy rollers here, where it just had to wait for a break in near-side traffic and then it was all clear.

Overall: 6 out of 11. (Plus 1/1 for the initial video, so 7/12)

Chuck's UPL:

1. Pass. Easy roller.
2. Fail. Came to near stop in traffic lanes; this is incorrect and wrong. Fail!
3. Pass. Easy roller.
4. Pass. Easy roller.
5. Pass. Easy roller.
6. Pass. Easy roller.
7. Fail. Left butt out in lanes for a while. This is a failure, because it is wrong. Then had wrong pose in median. Not clear it could see, though it behaved correctly by not going, so perhaps it could see further than the view that is displayed.
8. Pass. Easy roller.

Other UPL:
1. Fail. Caused traffic to slow down to avoid near miss or collision (there was construction, but traffic had to slow for Chuck even though he said it was for the construction). I would definitely have disengaged, but this was an obvious failure. Chuck has nerves of steel; must be the Navy training!
2. Fail. Stopped in traffic lanes for left-turning truck in median. First disengagement.
3. Fail. Missed a six-second gap in traffic (I would give this a pass...but then subsequently it paused in near-side traffic lanes again due to traffic on the far side). Anyway, the pause is incorrect. I think missing a six-second gap followed by a huge gap could be argued to be fine, but it's a close call. But it's a fail anyway.

So Chuck UPL alone:
6/8 + 1/1 = 7/9. (Have not reached minimum number of attempts on that specific turn, but would take 11/11 on the next video on 12.3 for him to get it up to 18/20, and for me to lose.)

Overall:
7/9 + 0/3 = 7/12. One disengagement.
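
As a quick check on the "11/11 to reach 18/20" arithmetic above, here is a minimal sketch. The function name `passes_needed` and its parameters are made up for illustration, and it ignores the minimum-attempts rule of the bet; it only answers how many consecutive passes would lift a running tally to a 90% pass rate.

```python
from fractions import Fraction


def passes_needed(passed: int, attempts: int,
                  target: Fraction = Fraction(9, 10)) -> int:
    """Smallest k such that (passed + k) / (attempts + k) >= target.

    Hypothetical helper: assumes every additional attempt is a pass.
    """
    if not 0 <= passed <= attempts:
        raise ValueError("passed must be between 0 and attempts")
    if target >= 1 and passed < attempts:
        raise ValueError("a 100% target is unreachable after any failure")

    k = 0
    # Exact rational arithmetic avoids float rounding right at the 90% line.
    while attempts + k == 0 or Fraction(passed + k, attempts + k) < target:
        k += 1
    return k


# The "sane score" tally of 7/9 on Chuck's UPL.
print(passes_needed(7, 9))
```

With 7/9, ten more passes would give 17/19 (about 89.5%), which is still short, so the eleventh pass is the one that reaches 18/20.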

For the Unprotected Lefts:

The NHTSA stop at the exact (legally required) stop line is annoying. But more annoying than that is how long it takes to resume and go to the creep limit. Tesla should fix this! There are big problems with pausing in traffic lanes. That to some extent has existed before but it is back with a vengeance here. I didn't really evaluate pace closely, but it seemed a bit slow to cross, and it still was taking a little bit of time to really push it when entering traffic on the far side.

For 12.3 I nearly have a lock (arguably I've already won), but I'd like to see more difficult situations tested. There's a serious regression here as Chuck said, with the stopping in traffic lanes, etc. And we saw no examples of left turns with significant traffic from the right, far-side lanes. We really need to see that threading in - it's not clear it can handle it reliably without stopping in the near-side lanes.

I'm willing to bet again on the next version, as long as it's not a bunch of easy rollers. It really has to be tested with some traffic. It failed even the easy case this time, but next time with all the special training, we need to have situations where it usually has to wait for (or time the gaps) on far-side traffic.

Chuck usually tests higher traffic situations, so I expect unless there's another release very soon, he'll get a chance to try 12.3 in busier traffic. That'll be exciting.

But on 12.4 or whatever is next, I think there's a chance I'll lose. I still am betting against FSD though. I think it will not do 9/10 or better (with traffic).

Overall, it's good to hear that people are generally describing this as a "step change" in utility. That's a bit more than I expected, which was incremental improvement. I guess I'll see, when I get it. So far, a regression on unprotected lefts, but hopefully that is cleaned up soon!

It's come to my attention that @Daniel in SD disagrees with the scoring system (I want to point out it's made clear in the 10.69 thread that it can't do anything wrong or weird, like stopping in lanes of traffic). He wants to count only the OG UPL (that's fine, that was the bet as I recall), and he wants to count only disengagements (and the various other obvious failures like honking, etc., listed elsewhere).

So I'll be magnanimous this one time and say that with this in mind, so far on Chuck's OG UPL, the score is: 9/9. That leaves the bet unresolved on this version, unless Chuck does more turns (with traffic from the right, this time).

I have minimal concerns about winning the bet on 12.3, as long as more than 5-6 of those more challenging turns are taken.

Anyway, the reason for the "no weird s**t" rule, is that I like to call failures failures, rather than successes. But I'm unilaterally waiving it this one time. 9/9 it is (so far) on Chuck's UPL (with minimal traffic - it was basically 7 or 8 easy rollers plus two with slightly more complexity). And the actual sane score was 7/9 but not for the beer bet.

The count will reset on version 12.4, and @Daniel in SD will have to come up with a poll or something for us to determine the correct treatment for this nonsense for subsequent bets.

In any case, this makes it interesting. It's possible that 12.3 could come out with great performance with traffic. We'll see (maybe - that depends on Chuck). It could surprise me with its perspicacity!

JHCCAZ · Mar 17, 2024

JB47394 said:
Do you think that stop signs have a very high training cost?

Not intrinsically, because they're so common (ubiquitous as I mentioned). And to reiterate, stop sign recognition was well developed when the v11 network was carried over into v12; no really new capability was required.

However, I do think (and Elon and Ashok said right from the get-go) that the training cost of stop signs was made artificially high in V12 because of the NHTSA's abhorrence of human-like rolling stops. Also (apparently) the insistence that the official full-stop line is right at the signpost location, rather than allowing it to be at the natural road edge slightly past the sign.

Both of these requirements are largely unavailable from good-driver training clips. Tesla hasn't really explained how they're dealing with this, but my guess is with billions of simulated full-stop scenarios, and possibly some kind of automatic data deletion of normal (deemed incorrect) human stop-sign encounters.

Overall, what I'm getting at is that the E2E approach is excellent at fine-tuning behavior, recognizing and mimicking the subtleties of human driving technique in a way that's almost impossible to cover with traditional coding. However, while it's possible to try to encourage new "emergent" AI capabilities from nothing, it's much less costly to give it an imperfect, underachieving but basically capable structure, and let the E2E refine it so that it actually works well.

So far, this is exactly what we see from v12. It pretty much does what v11 did, but more smoothly and confidently. At this point it does very little that v11 couldn't do at all. Even the "new" U-turns are a gray area in this argument, going from v11 bailing on them, probably due to C code guardrails, to early v12 being able to do slightly creaky U-turns. Really new capabilities will come, I believe, from engineering-developed NN modules that have the new but raw structure, followed by some combination of real world and simulated video clips to make it successful.

MP3Mike · Mar 17, 2024

AlanSubie4Life said:
So I'll be magnanimous this one time and say that with this in mind, so far on Chuck's OG UPL, the score is: 9/9. That leaves the bet unresolved on this version,

How is that unresolved? It is 9 out of ten to pass right? So, no matter what happens on the next attempt it will pass right?

powertoold · Mar 17, 2024

heltok said:
George Hotz on end2end:

The amazing thing is that V11 was/is required to get to V12...

In order to curate / test / simulate / etc. the data, you need V11 heuristics

https://twitter.com/x/status/1695506496958976118

Daniel in SD · Mar 17, 2024

AlanSubie4Life said:
It's come to my attention that @Daniel in SD disagrees with the scoring system (I want to point out it's made clear in the 10.69 thread that it can't do anything wrong or weird, like stopping in lanes of traffic). He wants to count only the OG UPL (that's fine, that was the bet as I recall), and he wants to count only disengagements (and the various other obvious failures like honking, etc., listed elsewhere).

So I'll be magnanimous this one time and say that with this in mind, so far on Chuck's OG UPL, the score is: 9/9. That leaves the bet unresolved on this version, unless Chuck does more turns (with traffic from the right, this time).

I have minimal concerns about winning the bet on 12.3, as long as more than 5-6 of those more challenging turns are taken.

Anyway, the reason for the "no weird s**t" rule, is that I like to call failures failures, rather than successes. But I'm unilaterally waiving it this one time. 9/9 it is (so far) on Chuck's UPL (with minimal traffic - it was basically 7 or 8 easy rollers plus two with slightly more complexity). And the actual sane score was 7/9 but not for the beer bet.

The count will reset on version 12.4, and @Daniel in SD will have to come up with a poll or something for us to determine the correct treatment for this nonsense for subsequent bets.

In any case, this makes it interesting. It's possible that 12.3 could come out with great performance with traffic. We'll see (maybe - that depends on Chuck). It could surprise me with its perspicacity!

I think you are falling victim to the common misconception that FSD beta is a driver assist and not a beta version of FSD. It's not "weird" to drive slowly across the road when the vehicle is just going to have to wait in the median. It was clearly optimizing for energy efficiency (something you have complained about in the past). This is superhuman performance, albeit only on a single left turn in Florida.
I'm sure Chuck will go back and do more turns with 12.3 in more "challenging" conditions.

TK211X · Mar 17, 2024

Waiting on this wave to drop staying up past midnight.

AlanSubie4Life · Mar 17, 2024

MP3Mike said:
How is that unresolved? It is 9 out of ten to pass right? So, no matter what happens on the next attempt it will pass right?

No, the bet counts all attempts on a given version, not just the first 10. Old rules, too many to list. It’s all in the 10.69 thread. I didn’t respecify it explicitly here because rarely does Chuck do more than 9 attempts in one video.
He also doesn’t just keep doing videos on a given version. Usually we see two UPL videos, the second one with more traffic due to typical weekend timing of release.

I also expect a 12.3 reprise. Chuck was very clear that he was not challenging 12.3 with the first test. He likes to give it a challenge.

Daniel in SD · Mar 17, 2024

AlanSubie4Life said:
No, it counts all attempts on a given version, not just the first 10. Old rules, too many to list. It’s all in the 10.69 thread.

Need to ask Chuck to do a video with just one turn.

arnolddeleon · Mar 17, 2024

Notes from today's drives:

Around where I live it is still way better than V11. Most of today's drive was going to be freeway so that was going to be pretty boring (since that part is V11). I did see that "autospeed" for the freeway is not a straight "10 over". It set a max speed of 63 for a 55 mph segment. From what I can tell it simply sets the max speed to some offset value and then it is the usual scroll up/scroll down adjustments.

My destination was in Berkeley and the driving was pretty good. There was a protected left turn where it wisely waited because there was no room to complete the turn because of congestion. I ended up disengaging when the light was changing, and I decided that I could squeeze next to the other car slightly hanging out and still not block cross traffic. This "situational awareness" I think is the biggest difference between V11 and V12.

I've driven around here with FSDb before, and this felt a little smoother. Instead of smooth, perhaps another description is "under control". Basically, in the limited driving so far there hasn't been a "whoa" moment triggered by the amount of jerk. But just to be clear, it's far from perfect.

I also encountered the reaaaaally slow acceleration in autospeed. In terms of errors, I feel this is better than having to rein in an overexuberant acceleration.

The return trip involved trying to avoid some really heavy traffic on 880 and 680. The car's chosen path involved more city street driving (yay?!). There were some really complicated lane negotiations that I thought FSDb did well on. There was one spot where I disengaged because I thought the car had been a little too assertive. It was technically correct, and by zipper merge logic it was the thing to do, but it just didn't "feel right" so I disengaged and let the other cars go by me. There was also a moment where the car was going around a car and I ended up accidentally disengaging. I think it was going to do it just fine. I was slightly behind the situation (driving in an unfamiliar place). Once I got caught up, I think I completed what it was planning to do.

OxBrew · Mar 17, 2024

AlanSubie4Life said:
No, the bet counts all attempts on a given version, not just the first 10. Old rules, too many to list. It’s all in the 10.69 thread. I didn’t respecify it explicitly here because rarely does Chuck do more than 9 attempts in one video.
He also doesn’t just keep doing videos on a given version. Usually we see two UPL videos, the second one with more traffic due to typical weekend timing of release.

I also expect a 12.3 reprise. Chuck was very clear that he was not challenging 12.3 with the first test. He likes to give it a challenge.

Those first 9 turns were really unfairly easy. I think you have a case to ask for a new bet: 90% of the final total of Chuck's UPL on 12.3. At the end of the day it's not a first 9 for FSD until it's a significant sample size in all conditions.

I'll buy you both a beer if it does 18/20. (but only IPAs from the PNW).

Mardak · Mar 17, 2024

JB47394 said:
The car tends to hit the number I give it

From my quick test, I turned off AUTO set speed and scrolled up to 85mph, and 12.3 kept going the speed it wanted to go. This is quite a departure from 11.x behavior where adjusting the set speed basically allowed you to control how fast or slow FSD Beta would go. Another downside of AUTO is that it seems to replace the highway 11.x customized offset percentage, so practically it seems like if you want an offset over the speed limit, you might as well use the custom offset, as 12.x will tend to go slower than that anyway and you still keep the old behavior on highways. Additionally, if you do want to go slower, non-AUTO allows for that, while AUTO requires a braking disengagement.

AlanSubie4Life · Mar 17, 2024

OxBrew said:
I think you have a case to ask for a new bet: 90% of the final total of Chuck's UPL on 12.3. At the end of the day it's not a first 9 for FSD until it's a significant sample size in all conditions.

That has always been the bet (and the total must meet or exceed 10 attempts), so no need for a new one.

It's going to be a nail-biter in the second video!

I actually think v12 is going to be exceedingly dangerous to supervise on this turn in more difficult conditions, and I hope Chuck plans it out carefully. It looks very hazardous.
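
For anyone trying to follow why 9/9 is still "unresolved", here is a minimal sketch of the bet rules as described in the posts above: all attempts on a version count, the total must reach at least 10, and the pass rate must be 90% or better. The function name `bet_status` and the returned strings are made up for illustration, and the finer rules from the 10.69 thread are not modeled.

```python
def bet_status(passed: int, attempts: int, min_attempts: int = 10) -> str:
    """Status of the bet for the running tally on one FSD version.

    Hypothetical helper. The result describes the tally so far; further
    attempts on the same version can still change it.
    """
    if not 0 <= passed <= attempts:
        raise ValueError("passed must be between 0 and attempts")
    if attempts < min_attempts:
        return "unresolved"
    # Integer comparison for passed / attempts >= 9 / 10.
    if passed * 10 >= attempts * 9:
        return "FSD at or above 90%"
    return "FSD below 90%"


for tally in [(9, 9), (9, 10), (18, 20), (17, 20)]:
    print(tally, bet_status(*tally))
```

Under these assumptions 9/9 stays unresolved only because the attempt count is below 10, which matches the reply to MP3Mike.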

Acisplat · Mar 17, 2024

TK211X said:
Waiting on this wave to drop staying up past midnight.

It's been 24 hours since the last wave... what are the odds of another drop tonight?

Supcom · Mar 17, 2024

Mardak said:
From my quick test, I turned off AUTO set speed and scrolled up to 85mph, and 12.3 kept going the speed it wanted to go. This is quite a departure from 11.x behavior where adjusting the set speed basically allowed you to control how fast or slow FSD Beta would go. Another downside of AUTO is that it seems to replace the highway 11.x customized offset percentage, so practically it seems like if you want an offset over the speed limit, you might as well use the custom offset, as 12.x will tend to go slower than that anyway and you still keep the old behavior on highways. Additionally, if you do want to go slower, non-AUTO allows for that, while AUTO requires a braking disengagement.

This is my experience as well. Auto speed seems to affect only the absolute maximum speed the car will select. It will select lower speeds if it wants.

When I turned off Auto speed on a 55 mph secondary road, it displayed 85 mph, even though it was traveling at 55-60 mph.

You can also roll the right scroll wheel and see arrows flash next to the auto speed indicator, but there is no driving effect that I can tell. I rolled the wheel downward a lot and the car just kept going at the same speed. I wonder what the intent is for this?
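
The speed behavior reported in this thread can be summarized as "the car picks its own speed, and the set speed only acts as a ceiling". The sketch below is a toy model of those observations only; it is not Tesla's code. The function name, the parameters, and the assumption that the 50%-over-limit cap applies in both AUTO and manual modes are guesses based on the posts above.

```python
AUTOPILOT_MAX_MPH = 85.0  # ceiling reported in this thread for AUTO max speed


def capped_speed(desired_mph, speed_limit_mph, set_speed_mph=None):
    """Toy model: the planner's desired speed, clipped by the reported caps.

    set_speed_mph=None stands for AUTO, where the only ceilings assumed
    are the 85 mph limit and 50% above the detected speed limit.
    """
    ceiling = AUTOPILOT_MAX_MPH
    if set_speed_mph is not None:
        ceiling = min(ceiling, set_speed_mph)
    ceiling = min(ceiling, 1.5 * speed_limit_mph)
    # The cap never raises the speed; it can only lower it.
    return min(desired_mph, ceiling)


# Set speed scrolled up to 85 on a 55 mph road: the car still does what it wants.
print(capped_speed(desired_mph=57, speed_limit_mph=55, set_speed_mph=85))
# A lower manual set speed (e.g. for a school zone) does take effect.
print(capped_speed(desired_mph=57, speed_limit_mph=55, set_speed_mph=45))
```

In this model, raising the set speed above what the car wants has no effect, which matches the reports of scrolling to 85 mph with no change, while lowering it below the desired speed does slow the car.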
