The next big milestone for FSD is version 11. It is a significant upgrade, with fundamental changes to several parts of the FSD stack, including a totally new way to train the perception NN.

From AI Day and the Lex Fridman interview we have a good sense of what might be included.

- Object permanence both temporal and spatial
- Moving from “bag of points” to objects in NN
- Creating a 3D vector representation of the environment all in NN
- Planner optimization using NN / Monte Carlo Tree Search (MCTS)
- Change from processed images to “photon count” / raw image
- Change from single image perception to surround video
- Merging of city, highway and parking lot stacks a.k.a. Single Stack

Lex Fridman's interview of Elon, starting with the FSD-related topics.


Here is a detailed explanation of Beta 11 in layman's terms by James Douma, from an interview done after the Lex podcast.


Here is the AI Day explanation in 4 parts.




Here is a useful blog post asking Tesla a few questions about AI Day. The useful part is the comparison of Tesla's methods with those of Waymo and others (detailed papers linked).

 
So, is anyone still trying to make the claim that the car cameras will not limit the car’s visual range below that of a human?
Your question is a bit ambiguous, so a simple "yes" might not be correct.

I will say again that it is my understanding that the front narrow view camera has an angular resolution similar to that of a human eye.

Megapixels is a term camera marketeers have used to claim "my camera is better than their camera". Clinging to this spec in trying to understand the issues with FSD is not productive. In fact, the article you linked to said as much: "Really, though, the megapixel resolution of your eyes is the wrong question."

I suspect that a more significant problem is latency: how long it takes the car to react to something the camera sees. It is a bit difficult to measure this, but I have one stop sign on my route which is hidden from view until we come over a rise, and about one second later the car slams on the brakes. It then relaxes the brakes as it figures out that the now-visible stop sign is far enough away. So it seems that the reaction comes about a second after the camera sees it.

To mitigate this, FSD maintains a dynamic model of the world around it, predicts where things will be in the near future, and drives based on that prediction. But if something changes rapidly or suddenly becomes visible, that model will take some time to update. At 70 mph, the car will travel a bit over 100 ft if it takes one second to respond.
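For a sense of scale, here is a minimal back-of-the-envelope sketch in Python (plain constant-speed kinematics; the latency values are just illustrative assumptions, not measured figures):

```python
# Rough reaction-distance estimate: how far the car travels while the
# system is still reacting to something that just became visible.
MPH_TO_FTPS = 5280 / 3600  # 1 mph = ~1.467 ft/s

def reaction_distance_ft(speed_mph: float, latency_s: float) -> float:
    """Distance covered at constant speed during the latency window."""
    return speed_mph * MPH_TO_FTPS * latency_s

for speed_mph in (30, 45, 70):
    for latency_s in (0.5, 1.0):  # illustrative latency assumptions
        d = reaction_distance_ft(speed_mph, latency_s)
        print(f"{speed_mph} mph, {latency_s:.1f} s latency -> {d:.0f} ft")
# 70 mph with 1.0 s of latency works out to ~103 ft, i.e. "a bit over 100 ft".
```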

The owner's manual warns about one such situation. On a multi-lane road, you are following a car at a safe distance, but that car suddenly swerves into another lane, which allows you to see the stopped vehicle that the lead car swerved to barely miss. The manual warns that Autopilot may not be able to stop in time. Part of that is simply the stopping distance, but the latency means that your car does not even start to brake for a moment. Another example, which I experienced, is a deer jumping into our path: the car did not react at all, even though the dash cam captured the entire incident. In contrast, I did start to swerve, but not soon enough. (Slower still was Tesla's 2 months to get parts to our body shop, but that was discussed on a different thread.)

I also suspect that Tesla is intentionally delaying defensive and smooth driving improvements so that FSD can learn to avoid collisions in adverse circumstances. This is just a hunch...

My point is that there are many other factors in play besides camera resolution. Some folks in this FSD thread believe that "better" cameras are needed. Sadly, more pixels means more pixels to process, and clearly the processing is already too slow to prevent some collisions. The rumor that Tesla is abandoning 300,000 lines of C code in hopes that a neural network can do better suggests that they understand that the software is a major limiting factor. I hope they are right.

When you worry that the camera pixel count is causing the abrupt slowing for slow traffic ahead, I think you are looking in the wrong direction, so to speak.
 
In fact, the article you linked to
I didn't link to any articles.
I suspect that a more significant problem is latency
Yes, latency is a big problem as I have mentioned a few times. Undoubtedly that impacts this poor reaction to stopped traffic, as well, since there appears to be latency in every action the system takes. That’s why it is so important to be ready to take over (to avoid deer, people, misc. animals, etc.) - even with the reaction time you’ll be considerably faster than the car.
narrow view camera has an angular resolution similar to that of a human eye
The narrow vs. wide effect can be used to adjust the numbers I mentioned above; it's a simple scaling. If the camera is narrow with the same number of pixels, it has proportionally more pixels to represent a particular piece of the scene.
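As a minimal sketch of that scaling (the pixel count and field-of-view numbers below are illustrative assumptions, not confirmed camera specs):

```python
# Angular resolution (pixels per degree) scales inversely with field of
# view for a sensor with a fixed number of pixels across.
def pixels_per_degree(horizontal_pixels: int, horizontal_fov_deg: float) -> float:
    """Average pixels per degree across the horizontal field of view."""
    return horizontal_pixels / horizontal_fov_deg

# Same assumed 1280-pixel-wide sensor behind two different lenses:
wide = pixels_per_degree(1280, 120)   # ~10.7 px/deg
narrow = pixels_per_degree(1280, 35)  # ~36.6 px/deg

print(f"wide lens:   {wide:.1f} px/deg")
print(f"narrow lens: {narrow:.1f} px/deg")
print(f"scaling factor: {narrow / wide:.1f}x")  # ~3.4x more pixels per degree
```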

When you worry that the camera pixel count is causing the abrupt slowing for slow traffic ahead, I think you are looking in the wrong direction, so to speak.
I've never claimed that adding more pixels alone is going to fix the problem. Everything would have to be improved.

I'm just asking what the limitations are for particular observed problems, and trying to figure out what the most likely candidates are (there may be several contributors!).

I think of it in terms of limits. At some point, resolution will limit performance. I doubt we're at that limit yet. But I don't think we're super far off, either. Whether it is a limit may be situation dependent, too.
 
Do we know how far the new HW4 cameras can see?
Most of this discussion doesn't really matter to me, though, until Tesla fixes the B-pillar problem. I had to perform 2 disengagements today caused by obstructions to the B-pillar cameras. At least I know when the view is obstructed, so I know when to lean forward, expecting that a disengagement is likely.
Just spent a week in Houston using FSD and didn't encounter one obstructed-view intersection. Same in Florida several months ago. But in Massachusetts, obstructed-view intersections are extremely common. The only word I associate with Tesla and the B-pillar problem is stupidity, because the problem is so obvious.
The car is 3 feet into the crossing road where I took this picture. When I drive this manually I'm less than a foot into the intersection, since I lean forward. To the right is a blind hill with a 13-degree grade, so you cannot afford to be sticking out; cars coming from the right don't see this intersection until they are on top of it as they come over the hill. And of course cars from the left are way too close when FSD decides it's safe to go.
You could have your head mounted on the hood and not see around that bush wall. I don't see the point. I see more fault in the landscaper than in the B cameras.
 
It is a bit difficult to measure

Was thinking a bit more about this.

1) As mentioned certainly a second or so of latency is a factor. (As you say, hard to measure.)

2) In this common case of stopped traffic on the freeway (but not limited to that), the delay in response exceeds typical latencies. So I think there is something else at play, presumably with perception.
 
Was thinking a bit more about this.

1) As mentioned certainly a second or so of latency is a factor. (As you say, hard to measure.)

2) In this common case of stopped traffic on the freeway (but not limited to that), the delay in response exceeds typical latencies. So I think there is something else at play, presumably with perception.
“Certainly a second or so” and “this common case” are why your discussions revolve in a circular debate. You theorize in absolutes that are far from known fact. Just an observation.
 
I didn't link to any articles.
...
I'm just asking what the limitations are for particular observed problems, and trying to figure out what the most likely candidates are (there may be several contributors!).
Sorry. I thought I was replying to:
Resolution is way less than human eye - estimated at over 500 Meg.

 
You could have your head mounted on the hood and not see around that bush wall. I don't see the point. I see more fault in the landscaper than in the B cameras.
Wrong, I can lean forward enough to make a safe turn. And yes the shrubs shouldn't be this close to the road but that is the reality FSD has to solve.
Besides, the next intersection looks like this, which I can safely turn at manually by leaning forward. FSD just launches out even though it cannot see. The location of the B-pillar camera is simply inadequate regardless, and Tesla needs to address this. I expected the camera location to be fixed when the Model Y was released, since it was clear my Model 3's cameras were a problem.

[Attachment: B-Pillar View.jpg]
 
Be very careful when you talk about "human vision". The image coming from your eye is actually pretty bad, and is very low resolution outside of the fovea (though it is optimized for things other than acuity at the edges). Your brain does a VAST amount of processing to create what we think of as "the real world", much of which is extrapolated from the rather poor data coming from the retina. For example, your eye is never still, and in fact if you TRY to keep your eye truly still you will find you cannot actually see very well at all. This is because your brain continually oversamples information from the retina to fill in extra detail. The car's NN can do this as well (not saying they DO in FSD, but this can be done)... it can get far more data from a video stream than from a still image, because something that is ambiguous in one frame can be validated across multiple frames.
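A minimal toy sketch of that multi-frame idea (made-up numbers, nothing from Tesla's actual stack):

```python
import random

# Toy model: a detector produces a noisy per-frame confidence for an object.
# Averaging over many frames of video damps the noise, so a reading that is
# ambiguous in any single frame becomes much more stable over ~1 s of video.
random.seed(0)

TRUE_SIGNAL = 0.6      # assumed underlying confidence
FRAME_NOISE = 0.25     # assumed per-frame noise amplitude

def noisy_frame_reading() -> float:
    return TRUE_SIGNAL + random.uniform(-FRAME_NOISE, FRAME_NOISE)

single_frame = noisy_frame_reading()
thirty_frame_avg = sum(noisy_frame_reading() for _ in range(30)) / 30  # ~1 s at 30 fps

print(f"single frame reading: {single_frame:.2f}")    # can land anywhere in 0.35-0.85
print(f"30-frame average:     {thirty_frame_avg:.2f}")  # stays close to 0.6
```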

As an example, here is a famous optical illusion. Note that the "real image" you see is VERY FAR from what actually arrives at your retina. The squares "A" and "B" are the EXACT same gray level in the image, yet your brain creates a visual representation that is very far removed from the objective image.

That is why all the latest VR headsets utilize more pixels straight ahead and lower resolution around the edges. It makes you wonder if Tesla does the same.
 
why your discussions revolve in a circular debate.
Not sure what you mean.

I think there isn’t really debate.

The subject was FSD v11.x continuing to drive without easing towards stopped traffic. This terrifies passengers.

The only potential causes I can see are latency and lack of actionable perception range (perception range may well exceed this but not be actionable).

Are there other possibilities?

Someone suggested it was because Tesla was prioritizing other things. That may well be the case but it remains a valid issue. It may be that Tesla thinks this response is fine.

Today my wife indicated I could no longer use FSD on the freeway with her, because it surged towards traffic in the carpool lanes rather than merging properly here (easing up on the accelerator would have been appropriate). I did disengage, but not before the vehicle surged, which meant the damage to confidence was done. It was a trivial merge operation; there were massive amounts of space behind the vehicles it surged towards in the through lanes.
 
Had a good test with a deer last night on a highway in rural PA. FSD did not see a deer that was straddling the lane line. No slowdown or move-over from where the deer was standing. Luckily it wasn't in our lane, since we would have hit it for sure.

Also was super impressed with FSD again on another long road trip. The only thing it's really awful at is merging onto and exiting highways. I don't get why it's so aggressive when merging into the exit lane. It on occasion crosses the solid white line on the right, then aggressively centers back to the left. They should just program it to act like a normal lane change.
 
Yes, latency is a big problem as I have mentioned a few times. Undoubtedly that impacts this poor reaction to stopped traffic, as well, since there appears to be latency in every action the system takes. That’s why it is so important to be ready to take over (to avoid deer, people, misc. animals, etc.) - even with the reaction time you’ll be considerably faster than the car.
Latency is the major problem with V11. It's so bad that the car will initiate hard braking after cross traffic has already cleared ego's path. It regularly will brake for a vehicle moving toward ego's lane after the offender has moved back. I believe it's the cause of jerky starts when following traffic at intersections as well as hard braking when there's stopped traffic ahead of ego.

Latency may be a major reason Tesla is going to E2E on V12. It may be the reason why Tesla has yet to implement recognition and response to various things like school buses and school zones. Anything new likely adds to the latency.
 
Latency may be a major reason Tesla is going to E2E on V12. It may be the reason why Tesla has yet to implement recognition and response to various things like school buses and school zones. Anything new likely adds to the latency.
Yeah, could be. Just seems like grasping at straws. I don't really see why going E2E would necessarily reduce latency to an acceptable level. Hopefully it's a design parameter they will reduce to superhuman levels.

I don't think there's a guarantee that E2E means reduced latency, but certainly no expert on that. Probably have to choose the right architecture to make sure it is low. It seems like "simplifying" should reduce the latency...but will it be reduced by a factor of 10 (what is needed)?
 
If it's a latency issue, the solution is ridiculously simple: throw CPU power at it. HW5 could give us 5x performance and eliminate the issue.

Others have mentioned that the car starts slowing well in advance of an upcoming red light, sometimes even before the light is visualized on screen. With a perception range of 820 feet, that's about 7 seconds to go from 80mph to 0mph. If there is a 2-second latency, that makes it 5 seconds, which is a pretty aggressive slowdown.

The fact is that none of us are autonomy experts, none of us work for Tesla, and none of us know what is involved on the back end or what they have planned. So it's all just guessing and should be taken as such.
 
slowing well in advance of an upcoming red light, sometimes even before the light is visualized on screen
These are not the same thing necessarily.
It's quite common for the system to react to things not visualized on the screen. The visualization is quite short range.
So if it reacts to something that is not visualized, it's not necessarily well in advance.

With a perception range of 820 feet, that's about 7 seconds to go from 80mph to 0mph
Where is this 820-foot perception range coming from? There is not much evidence for it, even if the front narrow camera has a range of 250 meters. It seems that 820 feet is just made up.

More importantly, this math is wrong. Assuming constant deceleration, the average speed would be 40mph over the interval. So it would take 14 seconds to go from 80mph to 0mph in 820 feet (820feet/40mph = 14 seconds). 0.26g; brisk but not excessive.

820 feet, 250 meters, is a very long distance (though humans can easily perceive and react at distances way longer than that when conditions allow). It's 7 seconds at 80mph as you implied. I've never seen reactions from FSD at anything close to that range. Even allowing for 2 seconds latency (I think slightly high), you'd still be looking at 586 feet. 10 seconds to stop assuming constant deceleration. 0.36g, starting to become a bit uncomfortable, but quite manageable. Maybe I've seen reactions at this distance? Not sure.

For reference, it takes about 190-220 feet to stop from 80mph (assuming reaction time of zero and typical Tesla tires, 190 feet is for PS4S, Model 3). Most people won't be happy with that stop though.
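For anyone who wants to check these numbers or plug in other assumptions, here is a minimal constant-deceleration sketch (the 820 ft range and the latency values are the assumptions being debated above, not established specs):

```python
MPH_TO_FTPS = 5280 / 3600   # 1 mph = ~1.467 ft/s
G_FTPS2 = 32.174            # standard gravity, ft/s^2

def stop_stats(speed_mph: float, available_ft: float, latency_s: float = 0.0):
    """Time to stop and required deceleration (in g) within the distance
    remaining after the latency window, assuming constant deceleration."""
    v = speed_mph * MPH_TO_FTPS
    usable_ft = available_ft - v * latency_s   # distance eaten by latency
    decel = v * v / (2 * usable_ft)            # ft/s^2
    time_s = usable_ft / (v / 2)               # average speed is v/2
    return usable_ft, time_s, decel / G_FTPS2

for latency_s in (0.0, 2.0):
    usable_ft, time_s, g = stop_stats(80, 820, latency_s)
    print(f"latency {latency_s:.0f} s: {usable_ft:.0f} ft usable, "
          f"{time_s:.0f} s to stop, {g:.2f} g")
# latency 0 s: 820 ft usable, 14 s to stop, 0.26 g
# latency 2 s: 585 ft usable, 10 s to stop, 0.37 g
```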

Someone really needs to take some videos and figure all of this out. It wouldn't be all that difficult to figure out if you live in a flat area with long lights. I'm surprised someone hasn't done a detailed study already, actually.
 
Wrong, I can lean forward enough to make a safe turn. And yes the shrubs shouldn't be this close to the road but that is the reality FSD has to solve.
Besides, the next intersection looks like this, which I can safely turn at manually by leaning forward. FSD just launches out even though it cannot see. The location of the B-pillar camera is simply inadequate regardless, and Tesla needs to address this. I expected the camera location to be fixed when the Model Y was released, since it was clear my Model 3's cameras were a problem.

[Attachment: B-Pillar View.jpg]
Yes, and the front camera completes the front segment. No matter how many bush pics you post, the answer is still the same. If there were only a B-side camera, sure, but that's not the case.
 
I had a first for me yesterday. I had disengaged FSD to enter a roadway from an interstate; this exit and entry required a merge and then a quick move to the far-left lane in a short distance, and I did not trust FSD to do it in traffic. Once in the left lane there was a stop light at an intersection that was green; on the other side of the light a construction zone began, and it was at this point I reengaged FSD. There was another car to my immediate left in the opposite lane when I reengaged FSD. After reengagement it immediately put on the left turn signal and jerked the wheel to the left, attempting to drive head-on into the car in the opposite lane. I had a firm grip on the wheel and FSD disengaged.
 
If it's a latency issue, the solution is ridiculously simple: throw CPU power at it. HW5 could give us 5x performance and eliminate the issue.
But it's not just processing power. There are inadequate camera positions, an insufficient sensor suite (per the industry), insufficient quality of training data, and an abundance of edge cases. The way things are going, one could assume there's an iceberg of data below still unaccounted for. And last but not least, NNs that don't easily retain additional training data.

Said another way, all the CPU power in the world won't solve current FSD SNAFUs. There needs to be a balanced approach. Anything else is just marketing BS to possibly placate a smaller and smaller gullible customer base.