
The FSD v9 showstopper

How do you know this?

First, you can see from the visuals in the presentation that the BEV they had in the prototype closely matches the one in the 8.x FSD beta, and it was made clear in the presentation that they needed the BEV to get anywhere close to FSD. Second, Elon and others have mentioned that they are working on integrating more cameras into the BEV (implying that 8.x is already relying on a subset of the cameras for a BEV). Finally, Elon has tweeted that the main change in 9.x is the removal of the dependency on radar, which makes sense given the delays.

FSD as demonstrated in the beta videos really would not work without a decent BEV, which is what they have been working on for the last 18+ months and is really the breakthrough that has made FSD possible at all. So it's clearly in 8.x, which is why it's not a "showstopper" for 9.x (imho).

Also, if you watch a lot of the beta videos you will see the car repeatedly stalling or slowing down midway through a turn with no obvious cause, which perplexes the test drivers. This is almost certainly the radar giving a false positive as an object swings into view while the car is making the turn. My (informed) guess is that when Tesla looked at the ratio of false to true positives for the radar, they realized it was more of a problem than a help. If you keep having the visual system say "no, you can ignore that radar", but it's causing the car to slow (dangerously) mid-turn, then you are probably better off without it. This is thus the major goal of 9.x (imho).
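To illustrate that trade-off with a toy example (the numbers and the fusion rule here are entirely made up, not anything from Tesla): if radar's false positives during turns outnumber its true positives, a policy that brakes whenever radar fires adds phantom slowdowns without catching anything vision was missing.

```python
# Toy illustration of the false-vs-true positive trade-off (made-up numbers).
def brakes(vision_hit: bool, radar_hit: bool, use_radar: bool) -> bool:
    # Toy fusion rule: brake on vision, or on radar if radar is still trusted.
    return vision_hit or (use_radar and radar_hit)

# Hypothetical mid-turn events: (real_obstacle?, vision_detects?, radar_detects?)
events = [
    (False, False, True),   # parked car swings into radar view: false positive
    (False, False, True),
    (False, False, True),
    (True,  True,  True),   # genuine obstacle: both sensors agree
]

for use_radar in (True, False):
    phantom = sum(brakes(v, r, use_radar) and not real for real, v, r in events)
    missed = sum(real and not brakes(v, r, use_radar) for real, v, r in events)
    print(f"use_radar={use_radar}: phantom brakes={phantom}, missed obstacles={missed}")
# With these made-up numbers, radar adds 3 phantom brakes and prevents 0 misses.
```

With numbers like these, dropping the radar input removes the phantom brakes at no cost, which is the argument being made above.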

What made you think 9.x was going to be "a showstopper"?
 
This would imply bird's eye view vision has not yet been fully deployed.

You are misinterpreting him:


I don't know what you mean by "fully deployed" .. but the car is already building a BEV from the cameras that are in use (watch any of the FSD videos to see this). Adding extra cameras should (a) allow higher recognition accuracy and (b) extend the range of the BEV to include more of the side/rear views. Anyway, this is a beta; no one has said it is finished.

As for Elon, how am I misinterpreting him? By "pure vision" he means "no radar", as he has made clear in several other tweets about 9.x.
 
Tesla is removing radar because of improvements in vision. Those improvements in vision are achieved through “significant architectural changes”.

Based on the things Elon and Andrej have said over the last two years, including things Elon has said recently, I speculate that one of the most significant changes from FSD beta v8.2 to v9.0 is a transition to vision that is fully top-down, bird's eye view, and natively 3D, using a fully neural network approach to go from raw pixels from the eight cameras to the 3D, 360-degree bird's eye view model of the world.

What exists in v8.2 seems to be some sort of hybrid or halfway point to the aspiration of natively 3D, fully NN-based computer vision that Elon and Andrej have described, such as in the clip in the OP of this thread.
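To make the "raw pixels from eight cameras straight to a fused BEV" idea concrete, here is a toy sketch of such a pipeline. It is purely illustrative: every layer size, the naive concatenate-and-project fusion, and every name in it are my own inventions, not Tesla's architecture.

```python
# A minimal, illustrative sketch of a multi-camera -> fused BEV pipeline.
import torch
import torch.nn as nn

class ToyBEVNet(nn.Module):
    def __init__(self, num_cameras=8, bev_size=64, num_classes=4):
        super().__init__()
        # Shared per-camera backbone: raw pixels -> small feature map
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Fusion: concatenate all camera features, project into a BEV grid
        self.fuse = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_cameras * 32 * 8 * 8, bev_size * bev_size * num_classes),
        )
        self.bev_size = bev_size
        self.num_classes = num_classes

    def forward(self, images):
        # images: (batch, num_cameras, 3, H, W)
        feats = [self.backbone(images[:, i]) for i in range(images.shape[1])]
        fused = torch.cat(feats, dim=1)   # stack features from all cameras
        bev = self.fuse(fused)            # project into a single top-down grid
        # Output: per-cell class scores in a top-down (BEV) grid
        return bev.view(-1, self.num_classes, self.bev_size, self.bev_size)

net = ToyBEVNet()
cams = torch.zeros(1, 8, 3, 128, 128)     # one frame from each of eight cameras
print(net(cams).shape)                    # torch.Size([1, 4, 64, 64])
```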
 
But as I and others have noted, that basic BEV milestone was part of FSD from the start, and has always been present in the beta. The whole point of the last 18 months of work by the AI/NN team has been to change the NN to synthesize a top-down model from the combined camera views. And that's what it has done. You can see this in every single FSD video from the first public beta onwards. FSD would not have been possible without it.

I don't know why you think 8.x is some inferior version of that. Sure, it's evolving and improving, but the basic ground plan and model is there in 8.x. As I've noted, 9.0 is mostly about moving away from radar, so far as anyone can tell reading between the lines of Elon's posts. I don't know why you think 9.0 is some huge radical rethink of the basic FSD stack .. since the stuff in the video you referenced has been in FSD all along. Switching to pure vision is a result of observations about the discrepancy between the vision and radar views, and while certainly a testament to the growing confidence in the vision-based model, isn't a driver of it.

It's also a mistake to think the goal of the NN is to synthesize a "view" of the world, in the sense of a viewable image. The BEV is a conceptual model .. the NN creates a mapped model view that is indeed from the top down (BEV), but it's only at the model, not pixel, level. And also note that the term "3D" here refers to the temporal aspect (x+y+time) more than any z-axis stuff.
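To make "model, not pixels" concrete, here is a minimal sketch (my own illustration, not anything from Tesla's code) of a BEV as a list of tracked objects on a top-down ground plane, each with a position and a velocity for the temporal component, rather than any rendered image. The field names and units are assumptions.

```python
# Illustrative "model-level" BEV: tracked objects on a ground plane.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    kind: str    # e.g. "car", "pedestrian", "lane_line"
    x: float     # metres ahead of the ego vehicle (ground plane)
    y: float     # metres left (+) / right (-) of the ego vehicle
    vx: float    # velocity along x, m/s (the temporal component)
    vy: float    # velocity along y, m/s

# The on-screen visualisation is just a rendering of a model like this.
bev_model = [
    TrackedObject("car", x=12.0, y=-1.8, vx=8.0, vy=0.0),
    TrackedObject("pedestrian", x=6.5, y=3.2, vx=0.0, vy=-1.2),
]
for obj in bev_model:
    print(f"{obj.kind}: position=({obj.x}, {obj.y}) m, velocity=({obj.vx}, {obj.vy}) m/s")
```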
 
No. It’s the x, y, and z axes: width, height, and depth. When you add the dimension of time, it’s 4D.
You might think of it that way; others don't in this specific context. The specific references used in much of the FSD discussion (including the video you reference) use 3D to mean a BEV view that includes a velocity (i.e. time) component.

Again, you seem to think Elon is implying there is no BEV in the 8.x FSD beta, which is manifestly not true. I'm sure 9.x will be a significant improvement (it had better be), but your OP and thread title were (and still are) inaccurate and misleading.
 
Detailed explanation of 2D vs. 3D vs. 4D:

 

So what? I pointed out that there is terminology confusion, but that's irrelevant. As I've already noted, whatever it's called, this basic technology is already present in the 8.x stack, as you can see if you examine the various beta videos. Aside from Elon's usual hype words ("significant", "fundamental"), the actual functional change in 9.x is not to add BEV, but to enhance it and remove any reliance on radar. If you are expecting 9.x to be a fundamental change, as you claimed in the OP, then you are likely to be disappointed. Sure, it will be better (I hope), but hyping it as a "showstopper" is dubious at best. Remember all those huge excited threads before the holiday update? People were claiming it would be V11, with vast new features and so on. What did we get? A few tweaks and a new game or two.
 

Topic 1: the definition of “3D”


How many spatial dimensions would you say this purple cuboid around the fire truck has? One, two, or three?

[Attached image: a purple cuboid bounding box drawn around a fire truck]




Topic 2: BEV in FSD


I can't find any source that establishes to what extent bird's eye view (BEV) is present in FSD Beta besides these tweets from Elon:



He does not mention transitioning more cameras to BEV, but transitioning more neural networks.

Tesla has more than 40 neural networks running in FSD, according to Karpathy's CVPR 2020 presentation.

If you have a source besides these two tweets from Elon, please share it.

I never claimed that there was no BEV to any extent in v8.2. Rather, I conjectured that a transition from non-BEV to BEV from v8.2 to v9.0 would lead to major performance increases in v9.0. Not necessarily 0% of NNs using BEV to 100% using it, but a significant transition nonetheless. It will likely also be an improved implementation of BEV.
 
If you watch the entire Karpathy talk, and study the slides (and some of his tweets, which I don't have at hand), you will see that he pretty much admits that FSD without BEV really cannot be done. He shows with/without examples, where (say) an intersection cannot be recreated without the NN generating a BEV. The BEV examples he shows, right down to the color scheme for the BEV view of roads, cars etc., match pretty much exactly what is in the 8.x FSD beta. Further, since he stated that BEV was required and that this was their primary focus in 2020, I think this is pretty conclusive that 8.x does indeed use BEV .. because you can see it on the car screen.

The distinction between cameras vs NNs is not relevant here, and I'm not clear what "focal areas" means in Musk-ese, but no doubt we shall see when 9.x does indeed get released.

As for this 2D/3D stuff that seems to get you flustered: some other threads, and indeed some discussions elsewhere, started using "3D" to describe a primarily 2D system (i.e. a map) with velocity projections, which would have been better expressed as 2D+T or some such. I'm not disagreeing that we live in a world of three spatial dimensions and one time dimension (string theory notwithstanding), merely noting that some references to "3D" in other threads should be taken to mean "top-down map view plus velocity".

In fact, the NN is probably more accurately described as 2D+T or perhaps 2.5D+T, since I doubt if it would (say) correctly place a car if it was flying through the air. The visual system must ultimately establish a ground plane, and then use the NN to place objects onto that plane in the correct (X,Y) locations .. that, after all, is basically what a BEV is.
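Here is a toy version of that ground-plane step (again, my own illustration, not how Tesla actually does it): assuming a flat road, a level forward camera at a known height, and a simple pinhole model, the bottom of a detection's bounding box can be projected straight onto (X, Z) ground coordinates. All parameter values here are invented; a real system would also handle camera pitch, lens distortion, and non-flat roads.

```python
# Flat-ground projection of an image pixel to ground-plane coordinates.
def pixel_to_ground(u: float, v: float, f: float, cx: float, cy: float,
                    cam_height: float) -> tuple[float, float]:
    """Project image pixel (u, v), assumed to lie on the road surface,
    to ground-plane coordinates (lateral X, forward Z) in metres."""
    if v <= cy:
        raise ValueError("pixel is at or above the horizon; flat-ground model fails")
    z = f * cam_height / (v - cy)         # forward distance
    x = (u - cx) * cam_height / (v - cy)  # lateral offset
    return x, z

# Hypothetical camera: 1280x960 image, focal length 1000 px, mounted 1.4 m up.
x, z = pixel_to_ground(u=800, v=700, f=1000.0, cx=640.0, cy=480.0, cam_height=1.4)
print(f"object sits roughly {z:.1f} m ahead and {x:.1f} m to the right")
```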

(oh, and that purple "cuboid" is of course strictly speaking a 2D co-planar abutted square and trapezoid on my computer screen :)
 
Elon mentioned that in v9.0, users will see a probability distribution of objects:


Karpathy says in his CVPR 2020 presentation (at ~20:00) that designing NNs that can deal with uncertainty and probability is a difficult challenge with the new BEV nets.

I find it interesting that, seemingly, Tesla's solution for this challenge will be visualized in v9.0. Visualization doesn't always evolve in lock step with software progress under the hood, but in this case my hunch is that it's more than coincidence given what else Elon has said about v9.0.
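As a toy illustration of what "a probability distribution over objects" might mean at the network output (made-up numbers, not from any Tesla model): each BEV grid cell carries an occupancy probability, a hard detection is just a threshold over that distribution, and a visualization of the distribution itself could render uncertain objects as fuzzy.

```python
# Toy per-cell occupancy probabilities for a small BEV grid.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))           # raw per-cell scores from a toy BEV head
probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> occupancy probability per cell

# A hard detection just thresholds the distribution...
hard = probs > 0.5
# ...whereas rendering each cell with opacity proportional to its probability
# would show uncertain objects as fuzzy rather than as crisp boxes.
print("most confident cell:", np.unravel_index(probs.argmax(), probs.shape),
      "p =", round(float(probs.max()), 2))
print("cells above 0.5:", int(hard.sum()), "of", probs.size)
```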
 
If you watch the entire Karpathy talk, and study the slides (and some of his tweets, which I don't have at hand), you will see that he pretty much admits that FSD without BEV really cannot be done.

"Cannot be done" is your interpretation. Karpathy doesn't actually say that; he just says the BEV approach works much better.

Also, we must be careful not to conflate "Level 4 autonomy" with "FSD Beta 8.2" (which is a Level 2 closed beta) when we discuss what Karpathy thinks can or can't be done.

In any case, read what I said above about BEV vs. non-BEV not being an all-or-nothing distinction. You are splitting hairs on this topic in service of no substantive point.

The distinction between cameras vs NNs is not relevant here

It is relevant. For starters, it makes your prior claim false.

Some other threads, and indeed some discussions elsewhere, started using "3D" to describe a primarily 2D system (i.e. a map) with velocity projections, which would have been better expressed as 2D+T

I don't care what random laypeople on this forum have said in threads I haven't read. I care about what Karpathy and Elon mean by "2D", "3D", and "4D" in their public communications about FSD.

This should clear it up:

 

This is the "the next update has the shiny object we have all been waiting on and it will change everything" problem in the Tesla community. It's quite evident that the BEV is already running in the car and, as you said, it would be hard to make turns at intersections without it. This has also been confirmed by verygreen. But if someone believes there's some special BEV somewhere, they will also easily believe that Tesla has some amazing cars behind the scenes with L4 software.