There are hundreds of thousands/millions of variants of this sort of thing
Millions of variations, say 10^6? Some estimate chess might be around 10^50 board configurations while Go is around 10^170, so maybe this is "easier" in some sense. Just think about how many above-average drivers there are compared to above-average players! ;) :p But independent of the actual number of variations, these do have a similarity: large potential variations and a relatively small number of possible actions.

One major simplification for these games is not needing to worry about perception: a piece is either there or not, whereas driving has a lot of uncertainty in what can be seen or inferred, and in the quality of current predictions. Also, one special aspect of a later work, MuZero, is that it learned to play without even knowing the rules (of chess, Go, and Atari games), basically learning by seeing what happens when it takes an action.

However, I think what you're getting at is that future planning seems to want some look-ahead search. AlphaZero definitely has the benefit of a perfect simulator to see how things might play out many moves ahead, and that is how it finds new behaviors based on its current model understanding. It doesn't seem like 12.x is doing this in the car, but one potential hope is that trained game models are already superhuman even without searching ahead, which would suggest there's somewhat of an intuitive understanding that incorporates the likelihood of future possibilities.
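To make "look-ahead search with a perfect simulator" concrete, here's a toy single-agent sketch in Python. It's a drastically simplified stand-in for AlphaZero's actual Monte Carlo tree search, and obviously not anything running in the car; the function names and signatures are made up for illustration:

```python
# Toy sketch of look-ahead search with a perfect simulator (hypothetical names,
# not AlphaZero's or Tesla's code).
# simulator(state, action) -> next_state  (exact, rules are known, no perception noise)
# value_fn(state) -> float                (learned estimate of how good a state is)

def lookahead(state, simulator, value_fn, actions, depth=3):
    """Pick the action whose simulated future looks best a few steps ahead."""
    if depth == 0 or not actions(state):
        return None, value_fn(state)

    best_action, best_value = None, float("-inf")
    for action in actions(state):
        next_state = simulator(state, action)
        _, value = lookahead(next_state, simulator, value_fn, actions, depth - 1)
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value
```

The point of the analogy is that chess and Go agents get that exact simulator for free, while a car would have to predict future states from noisy perception before it could search over them.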

End-to-end already needs to make predictions further than a few seconds out, such as getting into the appropriate lane for an upcoming turn. Videos so far seem to indicate it's not great at this either, tending to switch lanes much closer to the intersection. But for even basic things like turning on the turn signal, it sounds like some people feel it happens too early. These cases might be helped by more consistent input signals, such as map and navigation data, versus dynamic signals like noticing vehicles ahead in the lane with their left blinker engaged. So it seems like the 12.x neural networks should be able to do more longer-term planning than immediate reactions, but maybe that requires dedicated training on when people manually switch lanes sooner.
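Just to illustrate what leaning on the consistent map/navigation signal could look like, here's a hypothetical heuristic with made-up parameters, not anything Tesla has described:

```python
# Hypothetical illustration only: using distance-to-turn from navigation to
# start lane changes early, instead of reacting to nearby traffic cues.

def should_start_lane_change(dist_to_turn_m, speed_mps, lanes_to_cross,
                             seconds_per_lane_change=6.0, margin=2.0):
    """Start moving over once the remaining distance is within the distance
    needed for all required lane changes, with a safety margin."""
    needed_m = lanes_to_cross * seconds_per_lane_change * speed_mps * margin
    return dist_to_turn_m <= needed_m

# Two lanes to cross at 20 m/s (~45 mph): start moving over ~480 m before the turn.
print(should_start_lane_change(500, 20, 2))  # False, still too far out
print(should_start_lane_change(450, 20, 2))  # True, time to get over
```

An end-to-end network presumably has to learn an equivalent tradeoff implicitly from examples of when humans choose to move over early.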
 
Millions of variations, say 10^6? Some estimate chess might be around 10^50 board configurations while Go is around 10^170, so maybe this is "easier" in some sense
I was talking about the number of intersections (you could call it the number of boards I guess?). There are apparently 16M. Obviously there are tons of other variables like car positions, number of cars turning, vehicle speeds, traffic density, lighting conditions, etc.

These all multiply up the permutations!

It seems to me that the road system of the United States is probably far more complex and has many more valid moves and configurations than a chess board.
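A back-of-envelope version of that multiplication, with completely made-up factor counts, just to show how quickly the permutations blow up on top of the ~16M intersections (and since the real state and action spaces are continuous, any discrete count like this badly understates them):

```python
import math

# Illustrative only: every extra variable multiplies the scenario count.
intersections = 16_000_000
factors = {
    "ego position/lane buckets": 50,
    "other-vehicle configurations": 10_000,
    "speed profiles": 100,
    "traffic density levels": 10,
    "lighting/weather conditions": 20,
}

scenarios = intersections
for count in factors.values():
    scenarios *= count

print(f"rough scenario count: ~10^{math.log10(scenarios):.0f}")
# ~10^17 even with these coarse buckets; chess is often quoted at ~10^50
# positions and Go at ~10^170, as mentioned upthread.
```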
these do have a similarity: large potential variations and a relatively small number of possible actions.
The space of variations is much larger, and the space of actions is far more continuous and less discrete than in chess. In the end it's a pretty dissimilar problem, as you seem to imply.

I think it is hopeless. I should probably just call it quits here so I stop bringing people down (and so I spend less time on this forum!). No one likes a Debbie Downer, I know that.

I anticipate my expectations for v12 are going to be completely met. Looks like it will likely be a nice improvement over v11. Looking forward to it and I hope it continues to improve. I think it will be great and I am excited about it. But in spite of its greatness, it’s not clear to me how useful it will really be. And I don’t expect to be able to use it with anyone in the car in good conscience.

But while I use v11 to get to/from work sometimes (today a 20-minute drive with just one brief nag while I was gunning the accelerator), basically for the novelty value and curiosity, ultimately it tends to be exhausting and a bit tiring to use. It really is much easier, smoother, and more relaxing to drive myself: I know exactly what is going to happen, I have full control, and I don't have to worry about mode confusion in the case of evasive action. (On two occasions recently I have hit the accelerator while disengaging when I had to take evasive action. Maybe I am straight-up incompetent, and in one case it was probably the correct move, but I would have handled it much better if I had just been driving, and of course I anticipated both of these scenarios before they occurred, so they never would have happened with me driving.) Of course v12 will be no different. I just want something that is going to stop me from piling into someone by mistake, maybe instantly warn me of someone running a light on a collision course or coming up super fast, and stop me from running into someone in my blind spot for no reason, etc.

Anyway, since I think it's never going to work (with current hardware), I should just stop posting, attempt to be right in silence, and let people dream. No need to say "I told you so"; that is just me being a Negative Nancy.

Good luck to all. Probably won’t be the last post but I will try.

AGI is nearly here, clearly. Hopefully Tesla’s perception is just as good.
[Attached screenshots: IMG_0398.jpeg, IMG_0403.jpeg]
 
Yeah, it can definitely do better planning, but its ability to safely recover shows the current model architecture is able to consider a lot of complexity, and even small signals, at the same time, so maybe it just doesn't have enough training examples or core understanding yet. Sorry if you don't play chess and I should find a better comparison, but AlphaZero's learning process also needed to first learn that it had blundered, then the tactics of how to recover, then the strategies for avoiding dangerous situations in the first place.

Hopefully Tesla has a good way to collect this type of data without requiring people to have it active and ending up in riskier situations. It doesn't help that Omar doesn't disengage, which presumably would send back data showing where improvement is needed.
I'd rather it plan better and not need to recover in the first place. In one of the videos posted above comparing V11 to V12, V12 did a better job of negotiating a merge two lanes over after the left turn. But had it picked the correct turn lane in the first place, it wouldn't have needed to negotiate that merge at all.
 
AI DRIVR's latest drive. Worth a watch.

So,

What is really fascinating is that v12 does not show cones.

Up until now, going back to 2019, I had noted that improvements were tied basically to the number of objects the car could identify. It was never a question that, once an object like a lane line, an off-ramp, or a stop light was properly identified, the car would react to it properly; it was just that the number of objects identified was simply not enough to avoid disengagements.

In this video, I am surprised that the cones are blobs. I am not surprised that it does not recognize the dude holding the stop/slow sign as a dude holding a sign, but instead as a dude standing next to a sign; I would think the first priority is not running into any person, so the task is to identify people, not what the people happen to have in their hands. But that isn't enough when the one thing the person has in his hand is a stop sign!

In another video, the car was in the parking lot of a big box store, and although it spotted people and cars well enough, it did not separately identify the shopping cart as a shopping cart. I would expect that soon, as shopping carts not only should not be hit, but are pretty common.

Anyway, you see in this video that the deer are labeled as dogs, and a couple of deer together flicker back into a human. Again, good enough to not run into them, but not good enough, as there is no reason that deer, dogs, and people should not be separately noted.

Same with the woman pushing a baby carriage: no separate identification of the baby carriage.

Well, we will see. One of the reasons I am bullish on FSD is that it seemed obvious to me that advancement was tied to converting what the cameras "see" into code that identifies particular objects. Once you can do orange cones you can eventually do everything, and once you have everything identified, instructing the car not to hit things will be the easy part.

I am really not sure how much "anticipation" is really needed to drive safely once everything seen by the cameras is identified.

Clearly, others think the technical challenge is more than just identifying the space. It may be a bit more, but I think identifying the physical world is the biggest hurdle.
 
What is really fascinating is that v12 does not show cones.
The visualizations we see on screen come from a separate module that is fed by the occupancy network (ON). It is essentially a database of objects based on the blocks from the ON - mostly just a guess as to what a blob of blocks might be. Since the system has changed to an end-to-end NN, it may be that the visualization module was transitioned as well and now has to be re-trained. The old visualization module was improved over time to show us trash cans, dogs, cats, etc. The new one will likely take time as well.
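As a purely conceptual sketch of that separation (speculation on my part, not Tesla's actual architecture, and all names are made up): the display side only needs to guess a label for each blob of occupied space, while the driving side only needs to know the space is occupied.

```python
# Speculative sketch: a visualization-only classifier guessing icons for
# occupancy blobs, kept separate from what the driving policy consumes.
from dataclasses import dataclass

@dataclass
class Blob:
    height_m: float
    footprint_m2: float
    moving: bool

def guess_icon(blob: Blob) -> str:
    """Best-guess label used only for on-screen rendering."""
    if blob.moving and blob.height_m > 1.2:
        return "pedestrian"
    if not blob.moving and blob.height_m < 1.0 and blob.footprint_m2 < 0.5:
        return "cone"
    return "generic_block"   # unclassified blobs render as plain blocks

def planner_view(blob: Blob) -> str:
    """The driving side only cares that the space is occupied."""
    return "occupied - do not drive here"
```

If something like that is the split, then re-training the display-side classifier after the v12 rework would explain cones showing up as generic blobs for a while.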
 
What is really fascinating is that v12 does not show cones.
The visualizations in v12 are just eye candy for you. It doesn't appear that FSDb V12 uses them, or that data, for any driving purposes. (It responds to items not visualized, and it doesn't respond to items visualized that aren't actually there.)
 
The visualizations in v12 are just eye candy for you. It doesn't appear that FSDb V12 uses them, or that data, for any driving purposes. (It responds to items not visualized, and it doesn't respond to items visualized that aren't actually there.)
The path planner appears to come from the E2E process, but I agree that everything else is separate from driving.
 
The visualizations in v12 are just eye candy for you.
Perhaps. But I like eye candy!

Even if it's just a visual cue, I like seeing that the machine properly sees/categorizes potential hazards. Even with today's two-year-old ability to see - just the basics - it is useful to me.

I wonder if specific feedback from this public group will be sought? I would hope they do a survey at least.
 
I feel like HW4 performs better with V12 than HW3, based on the videos I've seen vs. my own experience. HW3 on 12.2.1 seems to have more latency in responding to changes in object trajectories. This is most apparent when following lead cars, where HW3 is more jumpy compared to what I've seen in Omar's videos.

This is something you can really "feel" when using V12 on HW3; the responsiveness feels sluggish with respect to other vehicles.

This is on Chill and Aggressive.
 