The next big milestone for FSD is 11. It is a significant upgrade, with fundamental changes to several parts of the FSD stack, including a totally new way to train the perception NN.

From AI Day and the Lex Fridman interview, we have a good sense of what might be included.

- Object permanence both temporal and spatial
- Moving from “bag of points” to objects in NN
- Creating a 3D vector representation of the environment all in NN
- Planner optimization using NN / Monte Carlo Tree Search (MCTS) - a generic MCTS sketch follows this list
- Change from processed images to “photon count” / raw image
- Change from single image perception to surround video
- Merging of city, highway and parking lot stacks a.k.a. Single Stack
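
For anyone unfamiliar with the planner item above: Monte Carlo Tree Search is a generic search algorithm, and Tesla has not published its planner, so the following is only a minimal, textbook MCTS sketch on a toy problem (a sequence of binary choices, reward = fraction of 1s chosen), not anything Tesla-specific.

```python
import math
import random

# Minimal, generic MCTS sketch on a toy problem. NOT Tesla's planner; just the
# textbook algorithm named in the list above.
DEPTH = 6

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # tuple of choices made so far
        self.parent = parent
        self.children = {}          # action -> Node
        self.visits = 0
        self.value = 0.0            # sum of rollout rewards

def is_terminal(state):
    return len(state) == DEPTH

def rollout(state):
    # Random playout to a terminal state, then score it (fraction of 1s chosen).
    while not is_terminal(state):
        state += (random.choice((0, 1)),)
    return sum(state) / DEPTH

def ucb_child(node, c=1.4):
    # UCB1: trade off a child's mean reward against how rarely it was visited.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes.
        while not is_terminal(node.state) and len(node.children) == 2:
            node = ucb_child(node)
        # 2. Expansion: add one unexplored child.
        if not is_terminal(node.state):
            action = next(a for a in (0, 1) if a not in node.children)
            node.children[action] = Node(node.state + (action,), parent=node)
            node = node.children[action]
        # 3. Simulation and 4. backpropagation.
        reward = rollout(node.state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print("best first action:", mcts(()))  # converges to 1 on this toy reward
```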

Lex Fridman's interview of Elon, starting with the FSD-related topics.


Here is a detailed explanation of Beta 11 in layman's terms by James Douma, from an interview done after the Lex podcast.


Here is the AI Day explanation, in 4 parts.




Here is a useful blog post posing a few questions to Tesla about AI Day. The most useful part is the comparison of Tesla's methods with Waymo's and others' (detailed papers linked).

 
My guess is there is a reason Tesla, unlike any other OEM, is opening an auto insurance company while scaling. If that liability trigger is ever pulled, I would bet it would only be covered through their direct insurance. Insurance covers more than a direct accident, so I suspect they can cover the occasional random failure out of the policy coverage.
Yes - this is what I've been saying. Tesla will never directly assume responsibility - but through insurance they can say things like no deductible if FSD was active.

Essentially, Tesla needs to limit liability in case of accidents.

BTW, what is the largest individual payout for a manufacturer defect resulting in a fatality?

PS: I have to add, it would be interesting to see if OEMs start adding L3 features to cars on highways with a more realistic ODD. That might force Tesla's hand too, at least on highways.
 
This is what I am talking about; it is quite common. In San Diego it is routine to have free-flowing traffic followed by sudden slowdowns (see I-805S north of Golden Triangle, south of the I-5 merge, for example). It is just routine, caused by massive amounts of added traffic at a particular location and time of day.

LA too. Any big city I would guess!

Anyway it is a trivial problem for a human. I’m not sure whether or not FSD with current sensors can do it. I don’t know how far out they are allowing the system to respond.
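
For a rough sense of scale (my own illustrative numbers, not anything measured from the car): slowing smoothly from ~70 mph to a stop needs a surprisingly long detection range, which is why the "how far out can it respond" question matters.

```python
# Back-of-the-envelope stopping distances from ~70 mph (31 m/s). Illustrative numbers only.
def stopping_distance_m(speed_mps, decel_mps2, reaction_s=0.5):
    """Reaction-time distance plus constant-deceleration braking distance."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)

v = 31.0  # ~70 mph
for label, a in [("gentle, 2 m/s^2", 2.0), ("firm, 3.5 m/s^2", 3.5), ("emergency, ~8 m/s^2", 8.0)]:
    print(f"{label}: ~{stopping_distance_m(v, a):.0f} m")
# gentle ~256 m, firm ~153 m, emergency ~76 m
```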
I agree that in the scenario of a sudden slowdown of freeway traffic, FSD and Autopilot both delay slowing down for too long. I have let it do its thing a few times, and it does slow down, but it waits too long and slows too abruptly for my comfort. I don't think this is unsafe, unless my wife is riding shotgun, in which case she might use the shotgun on me for scaring her. What I usually do is scroll the speed down so that FSD has less speed to lose, and it performs pretty comfortably. Once it starts to slow below the set speed, I scroll back up so it can take off when we get past the slowdown.

So, I think this is not really a safety issue; for me it is a comfort/convenience issue. But scary behavior of this sort will make FSD fail as a product, because comfort is really the benchmark. On the other hand, I expect the FSD team focuses on safety first, which means making the car solve problems quickly, because the dangerous problems develop fast. Smoothness and comfort are polish to be added later in the development process. Hence the hands-on-the-wheel, always-pay-attention beta program.

One other related observation: the cameras do see far beyond what the visualization (and the driving behavior we are discussing) would suggest. I don't think I have seen this particular proof with FSD yet, but on AutoPilot it occurred several times for me: A 4-lane rural highway, very few other cars, AP driving in the right lane and slowly approaching a slower truck quite far ahead of us, AP decides to change to the left lane to pass. This happens maybe a quarter mile from the truck, long before the truck appears in the visualization. It really surprised me that it did that, so I paid attention every time that situation occurred.

When I was trying to figure out how it can do this, I estimated the angular resolution of the narrow-field forward-facing camera and found it to be about the same as the human eye's central, foveal resolution. At these distances, binocular depth perception is negligible, but we are still able to judge the speed of the distant truck because of several visual cues. It makes sense that Tesla would use our visual capabilities as a benchmark for their cameras. The fact that it has similar resolution (and better dynamic range and temporal resolution) extending 360 degrees around the car, as opposed to our very limited foveal field of view, is very reassuring. The computational power and complexity is another question, obviously, hence the beta program.
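
For reference, here is the shape of that estimate as a quick script. The camera figures are assumptions on my part (HW3 cameras are commonly cited as ~1.2 MP / 1280 px wide, and the narrow forward camera's FOV is often quoted around 35 degrees); 20/20 foveal acuity is conventionally taken as ~1 arcminute, i.e. ~60 "pixels" per degree. The exact FOV figure changes the answer quite a bit.

```python
# Angular-resolution comparison: assumed narrow forward camera vs. human fovea.
h_pixels = 1280        # assumed horizontal pixel count (~1.2 MP sensor)
h_fov_deg = 35.0       # assumed horizontal field of view of the narrow camera

camera_px_per_deg = h_pixels / h_fov_deg            # ~37 px/deg
camera_arcmin_per_px = 60.0 / camera_px_per_deg     # ~1.6 arcmin per pixel
fovea_px_per_deg = 60.0                             # ~1 arcmin acuity (20/20)

print(f"camera: ~{camera_px_per_deg:.0f} px/deg ({camera_arcmin_per_px:.1f} arcmin/px)")
print(f"fovea:  ~{fovea_px_per_deg:.0f} px/deg equivalent")
```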
 
Resolution is way less than the human eye's, which has been estimated at over 500 megapixels.

It’s hard to compare human vision with a simple camera sensor. There’s a lot more to it than straight resolution. The density of rods and cones is not constant across the retina so the ‘resolution’ of the retina is not constant. Beyond that, there are local nerve networks that affect the effective resolution built into the retina, not to mention the specialized networks that exist in the brain.
 
I don't think this is unsafe,

I expect the FSD team focuses on safety first.

I think the faster you slow down on a freeway, the less safe it is. The only high risk here is a rear-end collision, which could be pretty severe. I'd prefer in these situations that it slow down gradually and briefly flash the hazards (could be situational, depending on the closing speed of trailing traffic) to alert traffic behind.

What I usually do is scroll the speed down so that FSD has less speed to lose, and it performs pretty comfortably.
This is what the OP here did. Seems perfectly fine. I disengage because it is easier for me and it gives an opportunity to provide a disengagement and feedback to Tesla. (They are striving to eliminate these types of disengagements.)

This happens maybe a quarter mile from the truck, long before the truck appears in the visualization.
The car can certainly see beyond the visualization - it’ll often respond to lights not yet displayed.

I don't think the cameras have the resolution to match human perception. You can look at the output of a similar-resolution camera for a static image and just see that it isn't enough to match human sight. Someone with well-corrected eyes viewing the same scene can see much better what is going on, read signs at a greater distance, etc.

It’s one thing to respond to a single vehicle. There is the question of repeatability in that case. It’s also quite a different and more challenging problem to sort out a jumble of vehicles, figure out which ones are stopped and which lanes are moving, etc. Usually it is trivial for an attentive human to perform this task.

I have no idea whether the cameras limit the current lookahead performance, though. It may be possible to do better. At some point it becomes a question of what level of detail is required for perception to actually work and distinguish objects.

which means making the car solve problems quickly, because the dangerous problems develop fast
I’ve not once seen the car react nearly as fast as an attentive human to developing scenarios. It’s a bit disconcerting actually, since it sometimes reacts well after the hazard has cleared. I’d love to see better performance in emergency scenarios.
 
When Ron Baron, one of the largest Tesla investors, says his investment firm talks weekly with Tesla, and that within the last couple of weeks he met with senior FSD managers, one has to wonder why he continues to be so bullish on Tesla and FSD. What does Ron Baron know about V12 and the future of FSD that we only guess at? Makes you think perhaps the future is brighter for FSD than many think.
@1:25

 
It’s hard to compare human vision with a simple camera sensor. There’s a lot more to it than straight resolution. The density of rods and cones is not constant across the retina so the ‘resolution’ of the retina is not constant. Beyond that, there are local nerve networks that affect the effective resolution built into the retina, not to mention the specialized networks that exist in the brain.
The way a doctor would answer!!
 
This is what I am talking about, it is quite common. In San Diego it is routine to have free-flowing traffic followed by common sudden slowdowns (see I-805S north of Golden Triangle south of the I-5 merge for example). It is just routine. Caused by massive amounts of added traffic at a particular location and time of day.

LA too. Any big city I would guess!

Anyway it is a trivial problem for a human. I’m not sure whether or not FSD with current sensors can do it. I don’t know how far out they are allowing the system to respond.
I'm confused - you're saying things like "the edge of the sensors' abilities." That would imply a completely open freeway with no cars whatsoever, then approaching a wall of completely stopped cars 250 meters down the road. I don't think I've ever encountered that, and I highly doubt that situation exists in San Diego or LA.

I've had plenty of cases when AP/FSD has been in traffic that suddenly starts to slow down, and every time it has slowed and stopped without issue, so I can't see how you're not sure whether "FSD with current sensors can do it." It already is doing it.

As far as it being trivial for humans, not really - how many fender benders occur daily during rush hour on freeways?
 
Makes you think perhaps the future is brighter for FSD than many think.

I have no crystal ball. However, it's entirely consistent to think this valuation will happen and also to realize that the current vehicles with current hardware will never achieve the penetration he mentioned in the video to justify the potential valuation.

That is entirely possible and it might happen. Obviously there would have to be leaps forward in the capabilities of whatever hardware ultimately comprises the system that achieves this market dominance.

Gotta remember the timelines are very long here. Things might look and perform very differently in five or ten years! Definitely plenty of time left to dominate the market, with the right decisions.

As far as it being trivial for humans, not really - how many fender benders occur daily during rush hour on freeways?
I'm talking about attentive humans. You have to be careful when thinking about statements like this. Your mental model of this may well be missing the enormous number of collisions averted on a daily basis.

So it's quite possible to implement a system which solves nearly all of these particular fender benders you mention, yet still dramatically increases the number of these specific types of collisions. Just remember that this is completely plausible (and I think probably likely at the moment). You might simply reveal many of the currently averted collisions by implementing a system which solves the low-hanging fruit but doesn't have the capability to outperform attentive humans.

That would be my concern in these slow-reaction scenarios.

That would imply a completely open freeway with no cars whatsoever, then approaching a wall of completely stopped cars 250 meters down the road. I don't think I've ever encountered that, and I highly doubt that situation exists in San Diego or LA.
As I said, I’m not going to post videos of such things. They would be easy to produce though!

Certainly these scenarios exist. Take the example above - anyone traveling those freeways (805 south or 56 bypass to 805 south) in the afternoon knows that one. It’s statistical whether you have a lead car or not (which usually solves the problem because FSD just follows it). But certainly it can happen that you do not have a lead car.

It’s just routine on large fast moving freeways for this type of thing to occur. It also happens if there is a massive accident that occurs a mile or two ahead of you (much more rare).

Just look at the Google traffic layer here on a few weekday afternoons. It varies. But it goes from green to deep red quite rapidly at times. Especially if you come rocketing off of the 56 bypass (“I-5 Local Bypass”).

[Attachment: Google Maps traffic layer screenshot]


I’ve had plenty of cases when AP/FSD has been in traffic that suddenly starts to slow down
Yeah of course it can do this. Obviously that has nothing to do with the scenario being discussed though.
 
I'm talking about attentive humans. You have to be careful when thinking about statements like this. Your mental model of this may well be missing the enormous number of collisions averted on a daily basis.
Ah, but there's the rub! Humans have finite attention spans and are not universally attentive, which is how these accidents occur. Even normally attentive humans get distracted by a child in the back seat, accidentally dropping something, etc. If humans were universally attentive, then automated braking systems wouldn't be necessary.
Certainly these scenarios exist. Take the example above - anyone traveling those freeways (805 south or 56 bypass to 805 south) in the afternoon knows that one. It’s statistical whether you have a lead car or not (which usually solves the problem because FSD just follows it). But certainly it can happen that you do not have a lead car.

It’s just routine on large fast moving freeways for this type of thing to occur. It also happens if there is a massive accident that occurs a mile or two ahead of you (much more rare).

Just look at the Google traffic layer here on a few weekday afternoons. It varies. But it goes from green to deep red quite rapidly at times. Especially if you come rocketing off of the 56 bypass (“I-5 Local Bypass”).



Yeah of course it can do this. Obviously that has nothing to do with the scenario being discussed though.
Green doesn't mean no traffic; it means free-flowing traffic. I challenge you to find a 250 meter stretch of 805 south that has no cars in the afternoon.

It's routine for traffic to suddenly come to a stop on urban freeways, especially during rush hour (well, in LA it's just plain stopped during rush hour…). The key here is that there's traffic. The system doesn't have to detect stopped traffic ¼ mile ahead; it just has to maintain an appropriate following distance and react appropriately, something it does in my experience. Now, regular human drivers often do not keep an appropriate following distance, but that's a different issue.
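
To put some (assumed, illustrative) numbers on the lead-car point: following a decelerating lead car mostly requires covering your reaction-time distance, while an already-stopped lane with no lead car requires the full stopping distance, which is a much longer-range perception problem.

```python
# Illustrative numbers only: lead-car following vs. stopped traffic with no lead car.
def follow_gap_m(speed_mps, gap_s=2.0):
    return speed_mps * gap_s

def stop_dist_m(speed_mps, decel_mps2=6.0, reaction_s=0.5):
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)

v = 29.0  # ~65 mph
print(f"2 s following gap:      ~{follow_gap_m(v):.0f} m")
print(f"reaction-time distance: ~{v * 0.5:.0f} m")   # roughly what you lose behind a braking lead car
print(f"full stop at 6 m/s^2:   ~{stop_dist_m(v):.0f} m")  # needed if the lane ahead is already stopped
```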
 
The system doesn’t have to detect stopped traffic ¼ mile ahead
It really does need to match human performance if you want to match and then exceed human safety.

Ah, but there’s the rub!
Sure, but I'm not sure what your point is. As I said: just because a system (unaided by humans) can solve all these inattention accidents does not mean it will get rid of these types of accidents!

I think you might be under the impression that if the system solves 80% (arbitrarily chosen) of accidents, there will be 80% fewer accidents. That unfortunately is not the case.
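
A toy example of why the percentages don't transfer (numbers made up purely for illustration): if attentive humans are currently averting a large pool of near-misses, a system that fixes most of today's crashes but fails in even a small fraction of those previously-averted cases can give a much smaller net improvement.

```python
# Made-up numbers, purely to illustrate the point above.
baseline_crashes = 100          # crashes that happen today
averted_by_attention = 1000     # near-misses attentive humans currently catch
system_fix_rate = 0.80          # system prevents 80% of today's crashes
system_miss_rate = 0.05         # ...but fails in 5% of the previously-averted cases

new_total = (baseline_crashes * (1 - system_fix_rate)
             + averted_by_attention * system_miss_rate)
print(new_total)  # 20 + 50 = 70 crashes -> only a 30% reduction, not 80%
```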


I challenge you to find a 250 meter stretch of 805 south that has no cars in the afternoon.
Easy. Just come from 56 west. There is little traffic on the bypass sometimes. It will happen periodically, though obviously it is statistical.

And it doesn't have to be a stretch of freeway. It just has to be one lane.

I am not sure why you are debating this. It's clearly something that happens, otherwise I wouldn't have said it did! And I'm quite sure that many others here concur.
 
When Ron Baron, one of the largest Tesla investors, says his investment firm talks weekly with Tesla, and that within the last couple of weeks he met with senior FSD managers, one has to wonder why he continues to be so bullish on Tesla and FSD. What does Ron Baron know about V12 and the future of FSD that we only guess at? Makes you think perhaps the future is brighter for FSD than many think.
@1:25

That's an interesting thought. Of course Tesla might achieve FSD in 5 years with new hardware, and that would shoot up the SP. It might achieve great success with the CT and the Model "2" and robots, which might take the SP to 1,600.

I think the problem has been presented and looked at wrong by a number of bulls. It's not about what V12 can do, but what it can't… or, looked at differently: is V12 going to be 100x or 1000x better than V11?

Tesla has done a number of rewrites at this point, with Elon saying "this is it" every time. Of course he could be "right" one of these times… so, who knows.
 
Is V12 going to be 100x or 1000x better than V11?

I think a vanishingly small portion of the market believes this scenario (fortunately!).

At this point, the market likely knows how this is going to go. It is expecting incremental improvements.

Of course he could be "right" one of these times… so, who knows.
Haha. If there were a bunch of unknowns, or new hardware, sure, that could be possible. But in this case it is all pretty clear about v12.

We’ve even (allegedly) seen it in action! So very few unknowns.
 
The question still remains: what could Ron Baron know that keeps him so bullish? He is not the average investor who blindly follows Tesla. I think most of us who follow this thread are pretty critical of FSD in the near term, so one wonders if Ron actually uses FSD.
 
The question still remains: what could Ron Baron know that keeps him so bullish?
As I said, the thesis might be that Tesla will solve FSD and dominate, with that and a bunch of other drivers (that is how I interpreted it). But it doesn't matter at all whether FSD 11 or 12 work for that thesis to be validated. It has nothing to do with solving FSD as we think of it here in this thread. It's completely inconsequential, since that is such a short time horizon. Tesla doesn't have to get anything working on current vehicles. V11 experiences as described here are all pretty irrelevant. All that matters is what happens next (in the next few years).

Of course he doesn't know anything about v12, but there's plenty of potential with Dojo and other AI hardware development. We'll see if Tesla can actually execute, but I can totally see how someone can have a bullish outlook on this. And it's totally detached from the descriptions of v11 experiences in this thread and the eventual v12 thread. It just doesn't matter. That stuff is DOA when it comes to solving the ultimate objective (which is fine).
 
Resolution is way less than the human eye's, which has been estimated at over 500 megapixels.

Quoting from that story: "Really, though, the megapixel resolution of your eyes is the wrong question. The eye isn't a camera lens..."

As I said, I compared the angular resolution of the human central foveal vision of the eye with that of one of the Tesla cameras. I believe I posted that calculation in TMC some time ago.

Here is an old article, actually referenced in the one you linked, which better explains the relationship between eyeballs and pixel arrays: "Resolving the iPhone resolution." However, one issue was left out of both of these articles, and that is the field of view, i.e. telephoto vs. wide angle. For my calculation I found some specs on the Tesla cameras somewhere, long ago forgotten, but they included the field of view and pixel counts.

Rather than redoing the math, we could just use the Slate author's iPhone 4 as an illustration. He shows that its pixels, when viewed at 12 inches, were about what an eye could resolve. That phone screen had a resolution of 960 x 640, i.e. 0.6 megapixels. So, you can hold an iPhone 4 about 12 inches from your face, and you can get a feel for the width of view a 0.6-megapixel camera can have at human-eyeball resolution. Tesla HW3 cameras are around 1.2 MP, so they can get eyeball resolution with a somewhat wider viewing angle. But again, our eyes have that much resolution only at the center of what we look at, so we point our eyes at what we want to see. Tesla cameras cover their entire field with this pixel density, and there are several pointed all around the car, so the car has a great advantage over our eyes, which can see clearly in only one direction at a time.
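
Here's that illustration with the arithmetic spelled out (assuming the iPhone 4's 326 ppi display and a 12-inch viewing distance; the "about 1 arcminute" figure is the conventional 20/20 acuity limit):

```python
import math

# iPhone-4-at-12-inches illustration, worked out (assumed figures).
ppi = 326                 # iPhone 4 "Retina" display pixel density
view_in = 12.0            # viewing distance, inches
long_px = 960             # long-axis pixel count of the 960x640 display

px_pitch_in = 1.0 / ppi
arcmin_per_px = math.degrees(math.atan(px_pitch_in / view_in)) * 60     # ~0.88 arcmin/px
screen_len_in = long_px * px_pitch_in                                   # ~2.9 in
fov_deg = 2 * math.degrees(math.atan(screen_len_in / 2 / view_in))      # ~14 deg
px_per_deg = long_px / fov_deg                                          # ~69 px/deg

print(f"~{arcmin_per_px:.2f} arcmin per pixel (vs ~1 arcmin for 20/20 foveal acuity)")
print(f"screen spans ~{fov_deg:.1f} deg, i.e. ~{px_per_deg:.0f} px/deg")
```

In other words, ~0.6 MP only buys eyeball-like sampling over a roughly 14-degree field, which is the sense in which a ~1.2 MP camera with a wider lens is in the same ballpark as the fovea.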

Some of the Tesla cameras do have wider views, and so lower angular resolution, but the discussion about stopped traffic would pertain to the narrow-view (telephoto-ish) front camera.

Part of the confusion in your linked article is that our peripheral vision is very sensitive to motion even though we cannot see clearly there. This very wide field of motion sensitivity could be implemented with a very high pixel count, but that is not how our eyes work. My understanding is that processing is done in the retina, which detects changes but not actual images. This may be part of the apparent discrepancy between the two articles, and the reason the author wrote what I quoted above.
 
Fortunately, Elon has been unambiguous, finally providing complete clarification on Twitter recently:

[Attachment: screenshot of Elon's tweet]

This is good stuff and represents Tesla’s objective and definition (L4/L5!). We’re nearly there!

Elon: FSD Beta tweets
Didn't Elon also say "Most accidents are actually small — a broken fender or scratched side of the car" when launching Tesla Insurance.... LOL
 
It’s hard to compare human vision with a simple camera sensor. There’s a lot more to it than straight resolution. The density of rods and cones is not constant across the retina so the ‘resolution’ of the retina is not constant. Beyond that, there are local nerve networks that affect the effective resolution built into the retina, not to mention the specialized networks that exist in the brain.
All true. But definitely better than 720p ;)

BTW, we can move our head and eyes to look at the point of interest so that the best resolution can be employed where needed.
 
Quoting from that story: "Really, though, the megapixel resolution of your eyes is the wrong question. The eye isn't a camera lens..."

As I said, I compared the angular resolution of the human central foveal vision of the eye with that of one of the Tesla cameras. I believe I posted that calculation in TMC some time ago.

Here is an old article, actually referenced in the one you linked, which better explains the relationship between eyeballs and pixel arrays: "Resolving the iPhone resolution." However, one issue was left out of both of these articles, and that is the field of view, i.e. telephoto vs. wide angle. For my calculation I found some specs on the Tesla cameras somewhere, long ago forgotten, but they included the field of view and pixel counts.

Rather than redoing the math, we could just use the Slate author's iPhone 4 as an illustration. He shows that its pixels, when viewed at 12 inches, were about what an eye could resolve. That phone screen had a resolution of 960 x 640, i.e. 0.6 megapixels. So, you can hold an iPhone 4 about 12 inches from your face, and you can get a feel for the width of view a 0.6-megapixel camera can have at human-eyeball resolution. Tesla HW3 cameras are around 1.2 MP, so they can get eyeball resolution with a somewhat wider viewing angle. But again, our eyes have that much resolution only at the center of what we look at, so we point our eyes at what we want to see. Tesla cameras cover their entire field with this pixel density, and there are several pointed all around the car, so the car has a great advantage over our eyes, which can see clearly in only one direction at a time.

Some of the Tesla cameras do have wider views, and so lower angular resolution, but the discussion about stopped traffic would pertain to the narrow-view (telephoto-ish) front camera.

Part of the confusion in your linked article is that our peripheral vision is very sensitive to motion even though we cannot see clearly there. This very wide field of motion sensitivity could be implemented with a very high pixel count, but that is not how our eyes work. My understanding is that processing is done in the retina, which detects changes but not actual images. This may be part of the apparent discrepancy between the two articles, and the reason the author wrote what I quoted above.
[Attachment: 73x34-pixel crop from the photo, showing a small time/date display]


This was taken at a distance of about 30 feet with my iPhone 13 mini's 12 MP camera. It's cropped from a 2.9 MB HEIF image. The crop is now 73x34 pixels, and it's an image of text about 8 inches wide.

I took a few to make sure focus, etc. were not an issue.

It's much clearer with my human eyes. I can read the time and date, and I would be able to read the month and day number if they changed (and besides, I didn't know the date for certain until reading this).

Why is this? This is actually a question. I don't know whether I am making a valid comparison or whether there is something wrong with this simple test case.

To me, corrected human vision seems amazingly sharp compared to a 12MP image. Need that 48MP camera I guess!
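
For what it's worth, a rough geometry check (with assumed specs: 4032 horizontal pixels and a ~26 mm-equivalent main lens) is consistent with the crop size, which suggests the camera really is sampling at only ~60 px/deg; the gap to what the eyes perceive is then plausibly down to Bayer demosaicing, lens/compression losses, and the fact that well-corrected eyes can exceed 20/20 and integrate over time.

```python
import math

# Rough geometry check of the iPhone 13 mini example (assumed camera specs).
h_px = 4032                                            # 12 MP main camera, 4032x3024
h_fov_deg = 2 * math.degrees(math.atan(18.0 / 26.0))   # ~69 deg for a 26 mm-equivalent lens
px_per_deg = h_px / h_fov_deg                          # ~58 px/deg

subject_in = 8.0          # width of the text in the scene, inches
distance_in = 30 * 12.0   # 30 feet
subject_deg = 2 * math.degrees(math.atan(subject_in / 2 / distance_in))  # ~1.3 deg

print(f"camera sampling: ~{px_per_deg:.0f} px/deg")
print(f"8-inch subject at 30 ft: ~{subject_deg * 60:.0f} arcmin, ~{px_per_deg * subject_deg:.0f} px wide")
# ~74 px predicted vs. the 73 px crop reported above.
```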
 
As I said, the thesis might be that Tesla will solve FSD and dominate, with that and a bunch of other drivers (that is how I interpreted it). But it doesn't matter at all whether FSD 11 or 12 work for that thesis to be validated. It has nothing to do with solving FSD as we think of it here in this thread. It's completely inconsequential, since that is such a short time horizon. Tesla doesn't have to get anything working on current vehicles. V11 experiences as described here are all pretty irrelevant. All that matters is what happens next (in the next few years).

Of course he doesn't know anything about v12, but there's plenty of potential with Dojo and other AI hardware development. We'll see if Tesla can actually execute, but I can totally see how someone can have a bullish outlook on this. And it's totally detached from the descriptions of v11 experiences in this thread and the eventual v12 thread. It just doesn't matter. That stuff is DOA when it comes to solving the ultimate objective (which is fine).
We don't know that. Don't you think v12 came up when he met with senior FSD staff? Of course that doesn't mean Tesla would share any v12 information with him, but to just state he doesn't know anything may not be accurate either.
 