FSD Beta v11.x

uscbucsfan · Jun 30, 2023

My car saw a red-light last week and stopped for it. Even rendered an entire traffic light. Unfortunately it was the brake light on a Semi.

dgatwood · Jun 30, 2023

willow_hiller said:
I can only assume there are two explanations for the variance in performance:

1. On aggregate, performance is improved with an update. Otherwise if Tesla saw major regressions across the board in their simulations, they wouldn't ship it.
2. Tesla does see the regressions in performance, but it's the consequence of introducing a new function or method that will be necessary for overall improvements in the future.

3. They added extra training data to fix some problem, and ended up making other things worse because the model is too small to handle what is being asked of it.

Tronguy · Jun 30, 2023

uscbucsfan said:
My car saw a red-light last week and stopped for it. Even rendered an entire traffic light. Unfortunately it was the brake light on a Semi.

For what it's worth: Was driving up to a lunch meeting yesterday and was looking carefully at the rendering of the red lights on the M3 panel.

At one of the intersections there happened to be three red lights pointed in my direction. The left one would blink on and off at about a 2 Hz rate for five second or so, then go steady red for another five seconds, lather, rinse, repeat. The other two stayed steady red.

To the eyeball all three stayed steady red.

Mind you, I was in the center lane going straight and was a few cars back from the front in any case, so none of this would have affected me anyway. But it kind of looked to me like a "beat" pattern, seen when two, say, square waves at nearly the same frequency are mixing with each other. If true, this would be the sampling of the camera in the car and some kind of switching frequency driving the (probable) LEDs on the lights.

Outside of that: total of two interventions over fifteen miles in each direction. The first one was a partly busted-down car crawling along next to the curb and blocking the travel lane; the other one was a sharp right to an on-ramp with another char also entering the on-ramp from another entrance, but far enough back from where I was to not, theoretically, bother the car. The car got bothered, anyway, and had to goose it.

willow_hiller · Jun 30, 2023

dgatwood said:
3. They added extra training data to fix some problem, and ended up making other things worse because the model is too small to handle what is being asked of it.

That's encapsulated in my #1. Just because things are worse for you doesn't mean they're not better on average for the whole fleet.

sleepydoc · Jun 30, 2023

uscbucsfan said:
There's a few other potential factors; your speed, the size of the sign, weather conditions, and your car.

The temp stop signs around me, my car likes. It even stopped at one while a guy was carrying it. In Florida they used a mini temp stop sign and my car didn't pay it any attention.

My car doesn't recognize the ones people hold, either.

I'll try to snap a picture of them - the weather has been perfect (except for a slight haze from the Canadian forest fires) so visibility isn't to blame. They are actually larger than your average stop sign but like I said, the display shows them as top signs so they're being recognized as such, just not obeyed. It doesn't seem to matter which direction I come from or which lane I'm in.

Maybe my car just isn't too bright?

kabin · Jun 30, 2023

dgatwood said:
3. They added extra training data to fix some problem, and ended up making other things worse because the model is too small to handle what is being asked of it.

That seems to be the norm. We are left with the hope HW3 has headroom to take whatever is thrown at it, the dojo wet dream will optimize all scenarios, and HW3 real time processing will reproduce it in 8 bit glory. Just ship it baby!

If FSDj could talk it would say something like "hold my beer... watch this."

PACEMD · Jul 1, 2023

PR0FESSOR said:
I am embarrassed to admit that I had to look up “k-hole”. LOL. How in the world can someone use Ketamine to the point where they become disassociated with their body. What??? Are people that @#$#@ing stupid? What am I thinking, of course they are. I bet the boards of the public companies are not too happy with the exposure.

Oh… wait… THAT EXPLAINS IT!!!

11.4.4 is using the ”Ketamine“ neural network!

Joe

To be fair lower doses of ketamine, such as Elon discussed, do not have that dissociative effect. And Steve Jobs was quite a fan of LSD so there's that.......

Twiglett · Jul 1, 2023

only got to do one leg of my commute on FSDb yesterday. The first leg was running the camera recalibration.
Sadly not much improvement, in the end I just wasn‘t in the mood for its stupidity on the highway and just used TACC

Supcom · Jul 1, 2023

So, what is the speculation on the next FSDb version? 11.4.5 or 11.5.X?

I'd kind of like to see 11.4.5 with some fixes for the errant lane changes.

MarcG · Jul 1, 2023

Twiglett said:
only got to do one leg of my commute on FSDb yesterday. The first leg was running the camera recalibration.
Sadly not much improvement, in the end I just wasn‘t in the mood for its stupidity on the highway and just used TACC

Does TACC work while cameras are still recalibrating?

JulienW · Jul 1, 2023

Supcom said:
So, what is the speculation on the next FSDb version? 11.4.5 or 11.5.X?

I'd kind of like to see 11.4.5 with some fixes for the errant lane changes.

Whichever I'd like to see Tesla stop treating us like 3ed class owners and move us up to 23.20.x.

zoomer0056 · Jul 1, 2023

FSDtester#1 said:
I hope there is a method to this madness and they did this intentionally to test some new Neural Network behaviors...

FSDb beta software will always be pushing the limit with new software. I've not paid to be a tester. But I would expect to discover problems on each software update until it comes out of beta.

Supcom · Jul 1, 2023

MarcG said:
Does TACC work while cameras are still recalibrating?

No.

Edit: At least I don't think any of the ADAS functions are available. It's been so long since I calibrated, I can't be certain.

aronth5 · Jul 1, 2023

I am still emailing Tesla once or twice a week when encountering really unsafe interactions or problems that bug me. Typically that means multiple emails on the same issues. I usually use "Critical Safety Issue" in the subject line. Surprisingly most have been fixed since I started in 2021. Coincidence probably.

I still have tons of issues but most are problems I use the accelerator pedal on. Are others emailing Tesla, using the report note or both?

JulienW · Jul 1, 2023

aronth5 said:
I am still emailing Tesla once or twice a week when encountering really unsafe interactions or problems that bug me. Typically that means multiple emails on the same issues. I usually use "Critical Safety Issue" in the subject line. Surprisingly most have been fixed since I started in 2021. Coincidence probably. I still have tons of issues but most are problems I use the accelerator pedal on. Are others emailing Tesla, using the report note or both?

Blackholes have no passion.

Pdubs · Jul 1, 2023

PACEMD said:
To be fair lower doses of ketamine, such as Elon discussed, do not have that dissociative effect. And Steve Jobs was quite a fan of LSD so there's that.......

who isn't?

grottomatic · Jul 1, 2023

I’m new to this but joining the discussion as I’ve been interested in driver assistance for a while. I subscribed to FSD when I bought my model Y in May and got 11.3.6 pushed about a week ago with the 20.7 update.

My thoughts:

My commute is mixed 30% city streets and 70% highway. FSD is completely unusable on city streets in my area but works fairly well on the freeway portion (much better than enhanced autopilot).

I am a private pilot and have experience using and monitoring aircraft autopilot systems. These systems, all of them, require quite a bit of monitoring even in a situation where things are slow (cruise flight with minimal traffic, course and altitude changes). In terminal areas after takeoff and before landing monitoring the autopilot is a very intensive job. In fact, in multicrew aircraft crew resource management concepts delineate a “pilot flying” and “pilot monitoring.”

The Tesla is always facing “terminal area complexity” on city streets given traffic and routing, pedestrians, etc. Monitoring the system is cognitively more intensive than just driving- defeating the purpose of the system.

I think there are so flaws with planning that need to be overcome. I wonder if when you plan a route the computer decides optimal lanes to be in for route planning and turns… it seems that it doesn’t. It should be planning multiple turns ahead to optimize its positioning on multi lane roads, and then take traffic into account to get where it needs to be. On freeways, it does a better job due to less “inputs” of cars entering and exiting. Monitoring the system is easier on the freeway for the same reason.

Overall it is an interesting system with huge flaws and I would never use it in an area that I wasn’t familiar with because you need to have knowledge of the local road conditions, traffic, and driver habits to properly monitor it. I understand why YouTube influencers who are big evangelists for FSD drive the same route over and over again.

PACEMD · Jul 1, 2023

Pdubs said:
who isn't?

I know, right? Acid is groovy..........the secret ingredient for FSDb.........remember what the dormouse said........

rlsd · Jul 1, 2023

grottomatic said:
I’m new to this but joining the discussion as I’ve been interested in driver assistance for a while. I subscribed to FSD when I bought my model Y in May and got 11.3.6 pushed about a week ago with the 20.7 update.

My thoughts:

My commute is mixed 30% city streets and 70% highway. FSD is completely unusable on city streets in my area but works fairly well on the freeway portion (much better than enhanced autopilot).

I am a private pilot and have experience using and monitoring aircraft autopilot systems. These systems, all of them, require quite a bit of monitoring even in a situation where things are slow (cruise flight with minimal traffic, course and altitude changes). In terminal areas after takeoff and before landing monitoring the autopilot is a very intensive job. In fact, in multicrew aircraft crew resource management concepts delineate a “pilot flying” and “pilot monitoring.”

The Tesla is always facing “terminal area complexity” on city streets given traffic and routing, pedestrians, etc. Monitoring the system is cognitively more intensive than just driving- defeating the purpose of the system.

I think there are so flaws with planning that need to be overcome. I wonder if when you plan a route the computer decides optimal lanes to be in for route planning and turns… it seems that it doesn’t. It should be planning multiple turns ahead to optimize its positioning on multi lane roads, and then take traffic into account to get where it needs to be. On freeways, it does a better job due to less “inputs” of cars entering and exiting. Monitoring the system is easier on the freeway for the same reason.

Overall it is an interesting system with huge flaws and I would never use it in an area that I wasn’t familiar with because you need to have knowledge of the local road conditions, traffic, and driver habits to properly monitor it. I understand why YouTube influencers who are big evangelists for FSD drive the same route over and over again.

"I think there are so flaws with planning that need to be overcome. I wonder if when you plan a route the computer decides optimal lanes to be in for route planning and turns… it seems that it doesn’t. It should be planning multiple turns ahead to optimize its positioning on multi lane roads, and then take traffic into account to get where it needs to be. On freeways, it does a better job due to less “inputs” of cars entering and exiting. Monitoring the system is easier on the freeway for the same reason."

I agree with what you say here but I don't think "the computer decides optimal lanes". I think Tesla software engineers give the computer bad planning and bad lane control algorithms because they don't have driving experience. They are good at calculations and coding but they don't have the experience with maneuvering the car on the road to apply the human experience to FSD. I have the impression that FSD is the product of pure calculation at the beginning. Maybe the customer complaints and the AI work can help improve FSD later.

dgatwood · Jul 1, 2023

willow_hiller said:
That's encapsulated in my #1. Just because things are worse for you doesn't mean they're not better on average for the whole fleet.

I'm not an AI expert by any means, but based on my limited understanding, that's actually not encapsulated in #1, which was:

willow_hiller said:
1. On aggregate, performance is improved with an update. Otherwise if Tesla saw major regressions across the board in their simulations, they wouldn't ship it.

Obviously the simulations don't have enough data to cover all of the interesting edge cases that people are seeing, or else we wouldn't be seeing them. We don't know if they're even close; it could be five orders of magnitude too few.

More importantly, you can never assume that generated simulations created by a GAN (generative adversarial network) will ever become a representative sample of real-world conditions. GANs, for folks who aren't familiar or need refreshing, generate new training data by imitating existing training data. In Tesla's case, this means creating new sequences of input video frames from multiple angles that could plausibly occur in the real world, using a large corpus of existing input video as examples of what the real world looks like.

The problem is, you really can't assume that the underlying training data used to train the GAN is sufficiently diverse (or even within orders of magnitude of being sufficiently diverse). Thus, they potentially will never be able to find huge swaths of edge cases without more training data, because GANs trained with the existing training data will never rule in those types of edge cases as being plausible.

And at that point, somebody has to tell the fleet to send back mountains of video clips with specific combinations of tags, e.g. pedestrians standing in a bike lane or whatever (yes, this example is a joke, but you understand the point) and add them to the training sets for the GAN. For the problems that they know about, simulation is a great way to iterate and improve that aspect of the model, but at some point, it's like playing Whac-A-Mole. You're never going to find every possible unusual road condition that way, because there are just entirely too many, realistically speaking.

Now here's where it gets interesting. If there are cases where a driving model is just marginally good enough, there's a decent chance it is marginal because there isn't much driving model training data that covers those edge cases. If a model change negatively affects a large number of those marginal cases, and if those happen often enough, the average overall behavior of the driving could get worse even when it massively improves some other problem that occurs less frequently than the sum total of all of those individually rare edge cases.

Worse, because such edge cases are underrepresented in the training data, the simulation results won't necessarily tell you that the average behavior is going to get worse unless they're somehow compensating for that underrepresentation (and if they knew that those edge cases were underrepresented, they presumably wouldn't still be underrepresented in the training data, so that seems unlikely to actually be possible).

Thus, it is entirely possible for a release to seem better on average in simulated driving before the release and still be considerably worse on average in the real world, particularly if the vehicles chosen for the early rollouts are not a representative sample of the real world.

And streets in San Francisco, Palo Alto, Mountain View, Fremont, and other similar areas are likely massively overrepresented in both the rollouts (particularly in the early stages involving employees) and in captured data, simply because they are massively overrepresented in terms of the number of cars on the road. So unless the experiment design is quite nonrandom, massively biasing vehicle selection based on geographical location in an effort to balance out the nonrandom geographical distribution of the vehicles themselves, we can probably safely say that neither the data that feeds into the simulations nor the the early rollout vehicles are likely to be particularly representative samples of the real world.

So here's what I'm wondering: Why doesn't Tesla allow the MCU to upload a firmware supplement bundle to the FSD computer that adds a few new models that run in shadow mode for comparison purposes? If a fault occurs while running one of those models, ignore the fault, stop running the model, and report the failure. If they kept the models entirely in RAM to minimize flash wear, it seems likely to be mostly harmless.

With that approach, Tesla could silently push bundles out to a large percentage of the fleet when on Wi-Fi on a daily basis, enabling them to get much more data about how each model change improves things or makes them worse. Assuming they have enough people to then analyze the incoming telemetry data or manually tag or verify AI-based auto-tagging of video captured when the driving decisions generated using the outputs from those shadow models would be too different from the driving decisions based on the actual prod model outputs (where feasible), they would be able to potentially iterate on the models more quickly, rather than waiting for a release push and hoping that it actually makes things better.

Alternatively, why doesn't Tesla build NNs that are trained on their simulation data's metadata — things like how often certain combinations of tags occur in close proximity, how often particular tags move along certain vectors, etc. — and run those on every car in the fleet in an effort to identify road features, conditions, behaviors, etc. that are underrepresented in the simulation training data, and then capture more data to cover them? (I'm assuming they don't do this.)

Or both. My vote would be both.

Anyway, it seems to me that simulation is great, and it is absolutely critical as a part of the QA process, but assuming the telemetry is good enough and assuming there's enough extra horsepower to do it, daily updated live A/B experiments at the model level seem like a better way to move fast and (pretend to) break things.

FSD Beta v11.x

Active Member

Active Member

Active Member

Well-Known Member

Well-Known Member

Active Member

Active Member

Single pedal driver

Active Member

Active Member

Well-Known Member

Active Member

Active Member

Long Time Follower

Well-Known Member

Ingesting large quantities of LSD

Member

Active Member

Active Member

Active Member

Similar threads