
The FSD v9 showstopper

How do you know this?

First, you can see from the visuals in the presentation that the BEV they had in the prototype closely matches the one in the 8.x FSD beta, and it was made clear in the presentation that they needed the BEV to get anywhere close to FSD. Second, Elon and others have mentioned that they are working on integrating more cameras into the BEV (implying that 8.x is already relying on a subset of the cameras for a BEV). Finally, Elon has tweeted that the main change in 9.x is the removal of the dependency on radar, which makes sense given the delays.

FSD as demonstrated in the beta videos really would not work without a decent BEV, which is what they have been working on for the last 18+ months and is really the breakthrough that has made FSD possible at all. So it's clearly in 8.x, which is why it's not a "showstopper" for 9.x (imho).

Also, if you watch a lot of the beta videos you will see the car repeatedly stalling or slowing down midway through a turn with no obvious cause, which perplexes the test drivers. This is almost certainly the radar giving a false positive as an object swings into view while the car is making the turn. My (informed) guess is that when Tesla looked at the ratio of false to true positives for the radar, they realized it was more of a problem than a help. If you keep having the visual system say "no, you can ignore that radar", but it's causing the car to slow (dangerously) mid-turn, then you are probably better off without it. This is thus the major goal of 9.x (imho).
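To illustrate that trade-off with a toy example (the numbers and the fusion rule here are entirely made up, not anything from Tesla): if radar's false positives during turns outnumber its true positives, a policy that brakes whenever radar fires adds phantom slowdowns without catching anything vision was missing.

```python
# Toy illustration of the false-vs-true positive trade-off (made-up numbers).
def brakes(vision_hit: bool, radar_hit: bool, use_radar: bool) -> bool:
    # Toy fusion rule: brake on vision, or on radar if radar is still trusted.
    return vision_hit or (use_radar and radar_hit)

# Hypothetical mid-turn events: (real_obstacle?, vision_detects?, radar_detects?)
events = [
    (False, False, True),   # parked car swings into radar view: false positive
    (False, False, True),
    (False, False, True),
    (True,  True,  True),   # genuine obstacle: both sensors agree
]

for use_radar in (True, False):
    phantom = sum(brakes(v, r, use_radar) and not real for real, v, r in events)
    missed = sum(real and not brakes(v, r, use_radar) for real, v, r in events)
    print(f"use_radar={use_radar}: phantom brakes={phantom}, missed obstacles={missed}")
# With these made-up numbers, radar adds 3 phantom brakes and prevents 0 misses.
```

With numbers like these, dropping the radar input removes the phantom brakes at no cost, which is the argument being made above.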

What made you think 9.x was going to be "a showstopper"?
 
This would imply bird's eye view vision has not yet been fully deployed.

You are misinterpreting him:


I don't know what you mean by "fully deployed" .. but the car is already building a BEV from the cameras that are in use (watch any of the FSD videos to see this). Adding extra cameras should (a) allow higher recognition accuracy and (b) extend the range of the BEV to include more of the side/rear views. Anyway, this is a beta; no one has said it is finished.

As for Elon, how am I misinterpreting him? By "pure vision" he means "no radar", as he has made clear in several other tweets about 9.x.
 
Tesla is removing radar because of improvements in vision. Those improvements in vision are achieved through “significant architectural changes”.

Based on the things Elon and Andrej have said over the last two years, including things Elon has said recently, I speculate that one of the most significant changes from FSD beta v8.2 to v9.0 is a transition to vision that is fully top-down, bird's eye view, and natively 3D, using a fully neural network approach to go from raw pixels from the eight cameras to the 3D, 360-degree bird's eye view model of the world.

What exists in v8.2 seems to be some sort of hybrid or halfway point to the aspiration of natively 3D, fully NN-based computer vision that Elon and Andrej have described, such as in the clip in the OP of this thread.
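To make the "raw pixels from eight cameras straight to a fused BEV" idea concrete, here is a toy sketch of such a pipeline. It is purely illustrative: every layer size, the naive concatenate-and-project fusion, and every name in it are my own inventions, not Tesla's architecture.

```python
# A minimal, illustrative sketch of a multi-camera -> fused BEV pipeline.
import torch
import torch.nn as nn

class ToyBEVNet(nn.Module):
    def __init__(self, num_cameras=8, bev_size=64, num_classes=4):
        super().__init__()
        # Shared per-camera backbone: raw pixels -> small feature map
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Fusion: concatenate all camera features, project into a BEV grid
        self.fuse = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_cameras * 32 * 8 * 8, bev_size * bev_size * num_classes),
        )
        self.bev_size = bev_size
        self.num_classes = num_classes

    def forward(self, images):
        # images: (batch, num_cameras, 3, H, W)
        feats = [self.backbone(images[:, i]) for i in range(images.shape[1])]
        fused = torch.cat(feats, dim=1)   # stack features from all cameras
        bev = self.fuse(fused)            # project into a single top-down grid
        # Output: per-cell class scores in a top-down (BEV) grid
        return bev.view(-1, self.num_classes, self.bev_size, self.bev_size)

net = ToyBEVNet()
cams = torch.zeros(1, 8, 3, 128, 128)     # one frame from each of eight cameras
print(net(cams).shape)                    # torch.Size([1, 4, 64, 64])
```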
 
But as I and others have noted, that basic BEV milestone was part of FSD from the start, and has always been present in the beta. The whole point of the last 18 months of work by the AI/NN team has been to change the NN to synthesize a top-down model from the combined camera views. And that's what it has done. You can see this in every single FSD video from the first public beta onwards. FSD would not have been possible without it.

I don't know why you think 8.x is some inferior version of that. Sure, it's evolving and improving, but the basic ground plan and model is there in 8.x. As I've noted, 9.0 is mostly about moving away from radar, so far as anyone can tell reading between the lines of Elon's posts. I don't know why you think 9.0 is some huge radical rethink of the basic FSD stack .. since the stuff in the video you referenced has been in FSD all along. Switching to pure vision is a result of observations about the discrepancy between the vision and radar views, and while certainly a testament to the growing confidence in the vision-based model, isn't a driver of it.

It's also a mistake to think the goal of the NN is to synthesize a "view" of the world, in the sense of a viewable image. The BEV is a conceptual model .. the NN creates a mapped model view that is indeed from the top down (BEV), but it's only at the model, not pixel, level. And also note that the term "3D" here refers to the temporal aspect (x+y+time) more than any z-axis stuff.
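To make "model, not pixels" concrete, here is a minimal sketch (my own illustration, not anything from Tesla's code) of a BEV as a list of tracked objects on a top-down ground plane, each with a position and a velocity for the temporal component, rather than any rendered image. The field names and units are assumptions.

```python
# Illustrative "model-level" BEV: tracked objects on a ground plane.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    kind: str    # e.g. "car", "pedestrian", "lane_line"
    x: float     # metres ahead of the ego vehicle (ground plane)
    y: float     # metres left (+) / right (-) of the ego vehicle
    vx: float    # velocity along x, m/s (the temporal component)
    vy: float    # velocity along y, m/s

# The on-screen visualisation is just a rendering of a model like this.
bev_model = [
    TrackedObject("car", x=12.0, y=-1.8, vx=8.0, vy=0.0),
    TrackedObject("pedestrian", x=6.5, y=3.2, vx=0.0, vy=-1.2),
]
for obj in bev_model:
    print(f"{obj.kind}: position=({obj.x}, {obj.y}) m, velocity=({obj.vx}, {obj.vy}) m/s")
```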
 
No. It’s the x, y, and z axes: width, height, and depth. When you add the dimension of time, it’s 4D.
You might think of it that way; others don't in this specific context. The specific references used in much of the FSD discussion (including the video you reference) use 3D to mean a BEV view that includes a velocity (i.e. time) component.

Again, you seem to think Elon is implying there is no BEV in the 8.x FSD beta, which is manifestly not true. I'm sure 9.x will be a significant improvement (it had better be), but your OP and thread title were (and still are) inaccurate and misleading.
 
Detailed explanation of 2D vs. 3D vs. 4D:

 

So what? I pointed out that there is terminology confusion, but that's irrelevant. As I've already noted, whatever it's called, this basic technology is already present in the 8.x stack, as you can see if you examine the various beta videos. Aside from Elon's usual hype words ("significant", "fundamental"), the actual functional change in 9.x is not to add BEV, but to enhance it and remove any reliance on radar. If you are expecting 9.x to be a fundamental change, as you claimed in the OP, then you are likely to be disappointed. Sure, it will be better (I hope), but hyping it as a "showstopper" is dubious at best. Remember all those huge excited threads before the holiday update? People were claiming it would be V11, with vast new features and so on. What did we get? A few tweaks and a new game or two.
 

Topic 1: the definition of “3D”


How many spatial dimensions would you say this purple cuboid around the fire truck has? One, two, or three?

[Attached image: a purple cuboid bounding box drawn around a fire truck]




Topic 2: BEV in FSD


I can't find any source that establishes to what extent bird's eye view (BEV) is present in FSD Beta besides these tweets from Elon:



He does not mention transitioning more cameras to BEV, but transitioning more neural networks.

Tesla has more than 40 neural networks running in FSD, according to Karpathy's CVPR 2020 presentation.

If you have a source besides these two tweets from Elon, please share it.

I never claimed that there was no BEV to any extent in v8.2. Rather, I conjectured that a transition from non-BEV to BEV from v8.2 to v9.0 would lead to major performance increases in v9.0. Not necessarily 0% of NNs using BEV to 100% using it, but a significant transition nonetheless. It will likely also be an improved implementation of BEV.
 
If you watch the entire Karpathy talk, and study the slides (and some of his tweets, which I don't have at hand), you will see that he pretty much admits that FSD without BEV really cannot be done. He shows with/without examples, where (say) an intersection cannot be recreated without the NN generating a BEV. The BEV examples he shows, right down to the color scheme for the BEV view of roads, cars etc., match pretty much exactly what is in the 8.x FSD beta. Further, since he stated that BEV was required and that this was their primary focus in 2020, I think this is pretty conclusive that 8.x does indeed use BEV .. because you can see it on the car screen.

The distinction between cameras vs NNs is not relevant here, and I'm not clear what "focal areas" means in Musk-ese, but no doubt we shall see when 9.x does indeed get released.

As for this 2D/3D stuff that seems to get you flustered: some other threads, and indeed some discussions elsewhere, started using "3D" to describe a primarily 2D system (i.e. a map) with velocity projections, which would have been better expressed as 2D+T or some such. I'm not disagreeing that we live in a world of three spatial dimensions and one time dimension (string theory notwithstanding), merely noting that some references to "3D" in other threads should be taken to mean "top-down map view plus velocity".

In fact, the NN is probably more accurately described as 2D+T or perhaps 2.5D+T, since I doubt if it would (say) correctly place a car if it was flying through the air. The visual system must ultimately establish a ground plane, and then use the NN to place objects onto that plane in the correct (X,Y) locations .. that, after all, is basically what a BEV is.
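Here is a toy version of that ground-plane step (again, my own illustration, not how Tesla actually does it): assuming a flat road, a level forward camera at a known height, and a simple pinhole model, the bottom of a detection's bounding box can be projected straight onto (X, Z) ground coordinates. All parameter values here are invented; a real system would also handle camera pitch, lens distortion, and non-flat roads.

```python
# Flat-ground projection of an image pixel to ground-plane coordinates.
def pixel_to_ground(u: float, v: float, f: float, cx: float, cy: float,
                    cam_height: float) -> tuple[float, float]:
    """Project image pixel (u, v), assumed to lie on the road surface,
    to ground-plane coordinates (lateral X, forward Z) in metres."""
    if v <= cy:
        raise ValueError("pixel is at or above the horizon; flat-ground model fails")
    z = f * cam_height / (v - cy)         # forward distance
    x = (u - cx) * cam_height / (v - cy)  # lateral offset
    return x, z

# Hypothetical camera: 1280x960 image, focal length 1000 px, mounted 1.4 m up.
x, z = pixel_to_ground(u=800, v=700, f=1000.0, cx=640.0, cy=480.0, cam_height=1.4)
print(f"object sits roughly {z:.1f} m ahead and {x:.1f} m to the right")
```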

(oh, and that purple "cuboid" is of course strictly speaking a 2D co-planar abutted square and trapezoid on my computer screen :)
 
Elon mentioned that in v9.0, users will see a probability distribution of objects:


Karpathy says in his CVPR 2020 presentation (at ~20:00) that designing NNs that can deal with uncertainty and probability is a difficult challenge with the new BEV nets.

I find it interesting that, seemingly, Tesla's solution for this challenge will be visualized in v9.0. Visualization doesn't always evolve in lock step with software progress under the hood, but in this case my hunch is that it's more than coincidence given what else Elon has said about v9.0.
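As a toy illustration of what "a probability distribution over objects" might mean at the network output (made-up numbers, not from any Tesla model): each BEV grid cell carries an occupancy probability, a hard detection is just a threshold over that distribution, and a visualization of the distribution itself could render uncertain objects as fuzzy.

```python
# Toy per-cell occupancy probabilities for a small BEV grid.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))           # raw per-cell scores from a toy BEV head
probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> occupancy probability per cell

# A hard detection just thresholds the distribution...
hard = probs > 0.5
# ...whereas rendering each cell with opacity proportional to its probability
# would show uncertain objects as fuzzy rather than as crisp boxes.
print("most confident cell:", np.unravel_index(probs.argmax(), probs.shape),
      "p =", round(float(probs.max()), 2))
print("cells above 0.5:", int(hard.sum()), "of", probs.size)
```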
 
If you watch the entire Karpathy talk, and study the slides (and some of his tweets, which I don't have at hand), you will see that he pretty much admits that FSD without BEV really cannot be done.

"Cannot be done" is your interpretation. Karpathy doesn't actually say that; he just says the BEV approach works much better.

Also, we must be careful not to conflate "Level 4 autonomy" with "FSD Beta 8.2" (which is a Level 2 closed beta) when we discuss what Karpathy thinks can or can't be done.

In any case, read what I said above about BEV vs. non-BEV not being an all-or-nothing distinction. You are splitting hairs on this topic in service of no substantive point.

The distinction between cameras vs NNs is not relevant here

It is relevant. For starters, it makes your prior claim false.

Some other threads, and indeed some discussions elsewhere, started using "3D" to describe a primarily 2D system (i.e. a map) with velocity projections, which would have been better expressed as 2D+T

I don't care what random laypeople on this forum have said in threads I haven't read. I care about what Karpathy and Elon mean by "2D", "3D", and "4D" in their public communications about FSD.

This should clear it up:

 

This is the "the next update has the shiny object we have all been waiting on and it will change everything" problem in the Tesla community. It's quite evident that the BEV is already running in the car and, as you said, it would be hard to make turns at intersections without it. This has also been confirmed by verygreen. But if someone believes there's some special BEV somewhere, they will also easily believe that Tesla has some amazing cars behind the scenes with L4 software.