The next big milestone for FSD is v11. It is a significant upgrade, with fundamental changes to several parts of the FSD stack, including a totally new way of training the perception NN.

From AI Day and the Lex Fridman interview we have a good sense of what might be included.

- Object permanence both temporal and spatial
- Moving from “bag of points” to objects in NN
- Creating a 3D vector representation of the environment all in NN
- Planner optimization using NN / Monte Carlo Tree Search (MCTS)
- Change from processed images to “photon count” / raw image
- Change from single image perception to surround video
- Merging of city, highway and parking lot stacks a.k.a. Single Stack

Lex Fridman's interview of Elon, starting with the FSD-related topics.


Here is a detailed explanation of Beta 11 in "layman's language" by James Douma, from an interview done after the Lex podcast.


Here is the AI Day explanation, in 4 parts.




Here is a useful blog post asking Tesla a few questions about AI Day. The useful part is the comparison of Tesla's methods with those of Waymo and others (detailed papers linked).

 
No, they use a hybrid approach now.



They might be using Monte Carlo tree search for the coarse search part. It's supposedly the state of the art for heuristic path planning searches. The coarse search avoids local minima but can be too slow on a fine grid. Once the coarse search finds a convex domain, the continuous optimization can solve the problem much faster.

The new NN approach creates better heuristic values that make the search algorithm stray from the optimal path less, which should also make the search faster.
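
To make that concrete, here is a minimal sketch of the heuristic-swap idea using a plain A*-style lattice search (the baseline AI Day compared against); the names are made up and this is not Tesla's actual code, but the same pluggable-heuristic point applies whatever coarse search is used. Swap the Euclidean estimate for a learned cost-to-go and the search expands fewer nodes before handing the corridor off to a continuous optimizer.

```python
import heapq
import math

def euclidean_heuristic(node, goal):
    # Uninformed baseline: straight-line distance to the goal.
    return math.dist(node, goal)

def coarse_search(start, goal, neighbors, heuristic=euclidean_heuristic):
    """A*-style coarse search over a sparse lattice of (x, y) nodes.
    The heuristic is pluggable: a learned cost-to-go in place of the
    Euclidean estimate is what cuts down the number of expanded nodes."""
    frontier = [(heuristic(start, goal), start)]
    cost_so_far = {start: 0.0}
    came_from = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:          # walk back to the start
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt, step_cost in neighbors(node):
            new_cost = cost_so_far[node] + step_cost
            if new_cost < cost_so_far.get(nxt, float("inf")):
                cost_so_far[nxt] = new_cost
                came_from[nxt] = node
                heapq.heappush(frontier, (new_cost + heuristic(nxt, goal), nxt))
    return None

# Toy usage on a 4-connected grid; in the hybrid scheme the returned
# corridor would then be handed to the continuous optimizer for smoothing.
def grid_neighbors(node):
    x, y = node
    return [((x + dx, y + dy), 1.0) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

print(coarse_search((0, 0), (5, 3), grid_neighbors))
```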
 
I am happy if they don't bother taking on more beta testers, *now that I got mine*, until v11. Focus on the new paradigm; don't bother with making things pretty.

Now, as far as the other v11 (Is there a NAME that distinguishes those two software releases??) that everyone is upset about the look of - feel free to keep making changes to that one and releasing its updates until the rabble have been sated.
🤦‍♂️
 
I hope so. I don't expect 11 for several months ... and I don't want to be stuck with 10.8.

But I have to say ... the FSD team is probably more interested in working on 11 than in making small improvements to 10.x. After all, they keep making improvements to 10.x, and a lot of us think there isn't much of an improvement, or that there are a lot of regressions.

These are always tough decisions for the engineering team ...
We know that different parts of the FSD stack are worked on by different teams, and I am sure they have defined, pluggable "APIs," so the decision to work on v10.x or v11.0 is not mutually exclusive. E.g., the vision team can work on the "photon to NN" part of vector space generation while the path planning team works on improvements to path planning from the vector space, and the next release will be labeled 11.0 or 10.x depending on which team(s) finish a release for testing first, I imagine. Same goes for the training team collecting/labeling data for additional NN training.
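
As a purely illustrative sketch (hypothetical names and fields, not Tesla's actual interfaces), the kind of contract that lets the two teams ship independently might look like this:

```python
from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class VectorSpace:
    """Hypothetical shared contract: a bird's-eye-view description of
    objects and lane geometry in the ego frame, however it was produced."""
    objects: List[dict]      # e.g. {"kind": "car", "xy": (12.0, -3.5), "vel": (4.0, 0.0)}
    lane_edges: List[list]   # polylines in metres

class VisionStack(Protocol):
    def to_vector_space(self, camera_frames) -> VectorSpace: ...

class Planner(Protocol):
    def plan(self, world: VectorSpace) -> list: ...   # returns a trajectory

# As long as both sides code against VectorSpace, either team can ship on
# its own: a planner improvement rides on the old vision stack (a v10.x
# release), or a rewritten vision stack feeds the existing planner (v11.0).
```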
 
They might be using Monte Carlo tree search for the coarse search part. It's supposedly the state of the art for heuristic path planning searches. The coarse search avoids local minima but can be too slow on a fine grid. Once the coarse search finds a convex domain, the continuous optimization can solve the problem much faster.

The new NN approach creates better heuristic values that make the search algorithm stray from the optimal path less, which should also make the search faster.

I don't think so.


So, we're working on neural networks that can produce state and action distributions, that can then be plugged into Monte Carlo tree search with various cost functions. Some of the cost functions can be explicit cost functions like collisions, comfort, traversal time, etc. But they can also be interventions from the actual manual driving events. We train such a network for this simple parking problem. So here again, the same problem. Let's see how MCTS searches this.
..... So, this only takes 288 nodes and several orders of magnitude less than what was done in the A* with the Euclidean distance heuristic.
@diplomat33 This is what I was talking about - it's not easy to figure out what is today and what is tomorrow.
 
We know that different parts of the FSD stack are worked on by different teams and I am sure they have defined, pluggable "APIs," so the decision to work on v10.x or v11.0 is not mutually exclusive. E.g, the vision team can work on the "photon to NN" part of the vector space generation while the path planning team works on improvements to path planning from the vector space, and the next release will be labeled 11.0 or 10.x depending on which team(s) finish a release for testing first, I imagine. Same goes for the training team collecting/labeling data for additional NN training.
Not so easy.

Would you work on current NN optimization, or a future NN using raw video? Would you work on current planner optimization, or a future one with Monte Carlo simulation and NN?

Different teams solve this in different ways. One way is to have two separate teams - a team working on the future and a team working on the current. The future team starts small and gets bigger with time as more and more current team members join the future team. We have mostly done this in projects I've worked on and I personally prefer this.

The other way is to have a single team that flexibly works on both platforms - this can only work at the beginning of the v-next project. At some point you need to transition to full-time v-next members.
 
One way is to have two separate teams - a team working on the future and a team working on the current. The future team starts small and gets bigger with time as more and more current team members join the future team. We have mostly done this in projects I've worked on and I personally prefer this.

This is the standard in software dev, in medium-to-large companies, and with good reason.
 
Not so easy.

Would you work on current NN optimization, or a future NN using raw video? Would you work on current planner optimization, or a future one with Monte Carlo simulation and NN?

Different teams solve this in different ways. One way is to have two separate teams - a team working on the future and a team working on the current. The future team starts small and gets bigger with time as more and more current team members join the future team. We have mostly done this in projects I've worked on and I personally prefer this.

The other way is to have a single team that flexibly works on both platforms - this can only work at the beginning of the v-next project. At some point you need to transition to full-time v-next members.
That doesn't really address the point of my previous post, though. The idea is that, e.g., the path planning team works on the part of the FSD stack that goes from vector space to acceleration/steering, while the vision team works on the part that goes from cameras to vector space. If the format of the vector space is well defined, then the two teams can work independently. If the path planning team adds new or improved features in path planning, then they can be rolled out on top of the current vision stack for testing as v10.9. If the vision team gets the new foundational rewrites done for vision, these can be rolled out with the existing path planning as v11.0.
 
If the format of the vector space is well defined, then the two teams can work independently.
It's not. The NN-to-procedural-code handoff is changing as well. Remember, the "bag of points" will be changed to objects in the NN, rather than in C/C++.

If the path planning team adds new or improved features in path planning, then they can be rolled out on top of the current vision stack for testing as v10.9. If the vision team gets the new foundational rewrites done for vision, these can be rolled out with the existing path planning as v11.0.
But the question is - is there one team that works on both current & future path planning, or is it two separate teams? Same for the NN.
 
It's not. The NN-to-procedural-code handoff is changing as well. Remember, the "bag of points" will be changed to objects in the NN, rather than in C/C++.
The "giant bag of points" was NN to C++ code to generate the vector space. The new NNs in the vision stack will (supposedly) generate the vector space directly. But as long as the definition of the vector space remains the same, how you get there doesn't matter. For the path planning team, what the vector space is doesn't change. Building a system this complex with no well-defined breaks in the design would be insane. You would have to rewrite the FSD stack from top to bottom with every improvement. I am certain that's not how it's done.
 
Hopefully we can have at least the braking resolved and ironed out 100% by 11.0
As I mentioned before (or maybe in another thread), I gather from what Elon said that things like lane keeping and phantom braking will likely get worse in v11.x, because of the foundational rewrites and retraining (and thus less accurate vector space generation), before getting better.
 
The "giant bag of points" was NN to C++ code to generate the vector space. The new NNs in the vision stack will (supposedly) generate the vector space directly. But as long as the definition of the vector space remains the same, how you get there doesn't matter. For the path planning team, what the vector space is doesn't change. Building a system this complex with no well-defined breaks in the design would be insane. You would have to rewrite the FSD stack from top to bottom with every improvement. I am certain that's not how it's done.
See the James Douma interview I linked to in the OP.
 
Hopefully we can have at least the braking resolved and ironed out 100% by 11.0
As I mentioned before (or maybe in another thread), I gather from what Elon said that things like lane keeping and phantom braking will likely get worse in v11.x, because of the foundational rewrites and retraining (and thus less accurate vector space generation), before getting better.

I don't know whether Elon said that or not ... but in general Beta 11 will have regressions. They will have to optimize the networks to get better - but hopefully those optimizations will make the new network better than the old one.

BUT, there is no particular reason Beta 11 should be released before this happens - unless they actually need the reports and input from testers. I hope they have collected enough data by now to be able to train Beta 11 to be at least on par with 10.x before they release it. But likely there will be some regressions and some areas that are better.
 
I don't think so.


So, we're working on neural networks that can produce state and action distributions, that can then be plugged into Monte Carlo tree search with various cost functions. Some of the cost functions can be explicit cost functions like collisions, comfort, traversal time, etc. But they can also be interventions from the actual manual driving events. We train such a network for this simple parking problem. So here again, the same problem. Let's see how MCTS searches this.
..... So, this only takes 288 nodes and several orders of magnitude less than what was done in the A* with the Euclidean distance heuristic.
@diplomat33 This is what I was talking about - it's not easy to figure out what is today and what is tomorrow.
I took all that to mean that the Euclidean heuristic is being replaced with a better heuristic that is generated by the NN and used in the tree search. That doesn't preclude the coarse search algorithm from using an MCTS with the Euclidean distance or some similar weight function.
 
I took all that to mean that the Euclidean heuristic is being replaced with a better heuristic that is generated by the NN and used in the tree search. That doesn't preclude the coarse search algorithm from using an MCTS with the Euclidean distance or some similar weight function.
Yes - it's always difficult to say what is on the drawing board vs. POC vs. production. My sense is that it is at the POC stage. You might have a different take on it - that's fine.
 
this only takes 288 nodes and several orders of magnitude less than what was done in the A* with the Euclidean distance heuristic
One key metric that wasn't mentioned in this AI Day toy parking problem is the overall latency to find the path. An MCTS approach like AlphaZero/MuZero uses the neural network to evaluate a state, determining a value and the likely actions, which generate a future state that can then be evaluated in turn. Notably, even when taking only the most likely action from each state, it's a sequential process of asking the neural network to predict what comes next. Increasing parallelism by asking the neural network to evaluate a batch of multiple likely actions/states then increases the breadth of the search, potentially limiting the depth (how far into the future to look) if running into compute constraints.
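
For anyone who wants to see the shape of that loop, here is a minimal AlphaZero-style sketch (policy_net is a hypothetical stand-in, not Tesla's network): the network's action priors and value estimate steer which node gets expanded next, and each expansion is one round trip to the network - which is exactly where the latency question below comes from.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(action | state) from the policy head
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.5):
    # PUCT rule: exploit the current value estimate, explore where the
    # prior is high and the visit count is still low.
    sqrt_total = math.sqrt(sum(ch.visit_count for ch in node.children.values()) + 1)
    def score(item):
        _, child = item
        return child.value() + c_puct * child.prior * sqrt_total / (1 + child.visit_count)
    return max(node.children.items(), key=score)

def expand(node, state, policy_net):
    # policy_net(state) is assumed to return ({action: prior}, value); the
    # value could blend explicit costs (collision, comfort, traversal time)
    # with learned ones, as described in the AI Day quote above.
    priors, value = policy_net(state)
    for action, p in priors.items():
        node.children[action] = Node(prior=p)
    return value
```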

Basically, if we say it takes 1ms for the neural network to evaluate a state, 288 nodes could have taken 288ms to find the path. Whereas earlier in the AI Day presentation, Ashok gives an example of doing 2,500 (non-NN) searches in 1.5ms (which seems to refer to full 10-second path plans). I have no idea what the round-trip latency of the neural networks is, but overall there is indeed a tradeoff between leveraging neural networks for potentially better evaluations and using the CPU to calculate and compare many potential states within the desired time limits.
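
To put the back-of-the-envelope arithmetic in one place (the 1ms per call and the batch size here are assumed, illustrative numbers, not measurements):

```python
import math

NN_EVAL_MS = 1.0     # assumed per-call latency of the policy/value network
NODES = 288          # nodes expanded in the AI Day parking example

def sequential_latency_ms(nodes, eval_ms=NN_EVAL_MS):
    # One network call per expanded node, strictly one after another.
    return nodes * eval_ms

def batched_latency_ms(nodes, batch_size, eval_ms=NN_EVAL_MS):
    # Batching candidate states cuts the number of round trips, but each
    # round trip now spends its budget on breadth rather than depth.
    return math.ceil(nodes / batch_size) * eval_ms

print(sequential_latency_ms(NODES))     # 288.0 ms if every call is serial
print(batched_latency_ms(NODES, 32))    # 9.0 ms of network time over 9 batches
```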
 
Basically, if we say it takes 1ms for the neural network to evaluate a state, 288 nodes could have taken 288ms to find the path. Whereas earlier in the AI Day presentation, Ashok gives an example of doing 2,500 (non-NN) searches in 1.5ms (which seems to refer to full 10-second path plans). I have no idea what the round-trip latency of the neural networks is, but overall there is indeed a tradeoff between leveraging neural networks for potentially better evaluations and using the CPU to calculate and compare many potential states within the desired time limits.
I think the CNN / Monte Carlo approach must be better, or at least comparable. Otherwise they would not use it.
 