FSD Vector space implementation


scaesare

Well-Known Member
Mar 14, 2013
11,035
26,346
NoVA
(continuing topic from the Investor Discussion thread)

Discoducky said:
AK says "we'd like to" implying that they haven't yet. This is a huge task and while I think it would be cool if they were, I think this is a critical long-pole item for single stack vector space.

Also, I will continue this, if needed, in another thread come market open tomorrow...

Thanks for the exchange and your insight... and agree best moved elsewhere come tomorrow...

So, in watching Chuck Cook's latest video, I note that the median box jumps around a bit, while the road median curbs are darn solid. The creep-wall is solid too.

So, @Discoducky, I understand the idea that those are vector-space representations, and that you suspect they are sourced from previous FSD encounters... but does the fact that their jitter seems tied to visibility imply otherwise?
 
Just watched it, and yes, I saw it bounce around as well. I still believe there is some trust of the non-real-time vector-space data, but it is not explicit. So the real-time camera data is weighted higher, and if it can't see **enough** of the scene, then it won't be able to map out the safe vector space.

Also notice from Chuck's video that when it moves into the median space, it is trusting that space even more than it did with the last build, as it slowly places itself perfectly by rolling forward just enough to get the back of the car out of the lane of travel. Beautiful! It almost seems like they may have also tied in some wheel speed sensor data to deterministically measure how much the car needs to move into the vector-space median box. Anyway, that is a method that I think would work: the wheel speed sensors coupled with steering wheel angle could do a very accurate calculation to place the vehicle so perfectly in the box.
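Purely as a sketch of how that could work (not claiming this is what Tesla actually does): wheel speed gives distance travelled and steering angle gives turn rate, so integrating both is enough to place the car a known distance into the median box even if the cameras lose sight of it. All names below (WHEELBASE_M, read_sensors, the box geometry) are invented for illustration.

```python
import math

WHEELBASE_M = 2.9  # assumed wheelbase; illustrative only

def advance_pose(x, y, heading, wheel_speed_mps, steer_angle_rad, dt):
    """One dead-reckoning step with a simple kinematic bicycle model."""
    dist = wheel_speed_mps * dt
    heading += (dist / WHEELBASE_M) * math.tan(steer_angle_rad)
    x += dist * math.cos(heading)
    y += dist * math.sin(heading)
    return x, y, heading

def creep_into_box(start_pose, box_depth_m, read_sensors, dt=0.01):
    """Roll forward until the car has advanced box_depth_m into the median box.

    read_sensors() is a stand-in for whatever supplies the current
    wheel-speed and steering-angle readings.
    """
    x0, y0, heading = start_pose
    x, y, travelled = x0, y0, 0.0
    while travelled < box_depth_m:
        speed, steer = read_sensors()
        x, y, heading = advance_pose(x, y, heading, speed, steer, dt)
        travelled = math.hypot(x - x0, y - y0)
    return x, y, heading
```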
 
Its confidence level and capability are indeed getting impressive.

Just (re)watched Ashok's CVPR 2022 presentation, and it's much more about occupancy networks, which are represented internally as a 3D voxel space.

As part of that he talks about occluded space and reasoning based on the implications thereof. At about 18 minutes in he discusses using that to determine creep limits, using an unprotected turn as an example. This appears to be a real-time calculation the car does, although it pre-dates the creep-wall visualization.

Of course he doesn't speak to (and therefore doesn't rule out) previously collected sparse data hints, although earlier in the presentation he does state they don't use HD maps at all... so if previous fleet-sourced data were used, it would have to be keyed on something else... GPS or lat/long coordinates?
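To make the occupancy/creep-limit idea concrete, here is a toy version of the kind of reasoning Ashok described: take a top-down slice of the voxel occupancy grid with free/occupied/unknown cells, and creep forward only as far as the geometry allows while the cross-traffic band is still occluded. The grid layout, cell labels, and cell size are my assumptions, not Tesla's representation.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, 2   # assumed cell states in a top-down occupancy slice

def creep_limit_m(grid, ego_col, cross_traffic_row, cell_m=0.5):
    """How far the car may creep forward along its own column (rows = forward).

    Stop before the first occupied cell, and don't enter the cross-traffic
    row band while any of it is still occluded (UNKNOWN).
    """
    cross_band_visible = not np.any(grid[cross_traffic_row, :] == UNKNOWN)
    limit_cells = 0
    for row in range(grid.shape[0]):
        if grid[row, ego_col] == OCCUPIED:
            break
        if row >= cross_traffic_row and not cross_band_visible:
            break
        limit_cells = row + 1
    return limit_cells * cell_m

# e.g. a 20 m x 20 m area at 0.5 m cells, with part of the cross street occluded:
grid = np.full((40, 40), FREE)
grid[24, :10] = UNKNOWN                                       # left part of the cross-traffic band hidden
print(creep_limit_m(grid, ego_col=20, cross_traffic_row=24))  # 12.0 -> the "creep wall"
```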
 
"suspect they are sourced from previous FSD encounters"
"there is a trust of the vector space non-real-time data"
To be clear, the premise is that FSD Beta 10.69 is remembering what the vector space should be for this and many other intersections -- potentially via neural network and/or map data?

If the former, via neural networks, I don't think that's the case (or at least not the intent), since their purpose is to generalize even to "never seen before" intersections; although I believe Karpathy made an offhand comment that the neural networks are large enough that they could basically memorize specific locations.

The latter, via map data, is more like what other companies are doing with HD maps, which basically have the 3D intersection structure computed ahead of time (although it's unclear whether it's stored and represented as "vector space"). Tesla has been using the basic navigation map data that the car normally uses for driving directions, e.g., showing which lanes to use for turns at the upcoming intersection.

So I believe the non-real-time vector-space data is only used offline, as part of the autolabeller and training system for the neural networks that end up in vehicles; at inference time, everything is "real time." I remember early on in FSD Beta, even in the initial release to non-employees, people were commenting that Tesla must be using HD maps to be able to show intersection lines/dots that the cameras could not possibly see. But those comments were actually highlighting how strong the neural networks already were at making reasonable predictions about what an intersection could be from partial information.

At a very broad level, yes, previous FSD encounters and non-real-time data crunching via autolabelling do influence the behavior of FSD Beta when it drives the same intersection again, but it's most likely not a specific "database-like lookup" of what was understood from previous encounters of that specific intersection.
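A tiny sketch of that distinction, with made-up shapes and a placeholder network: at inference time the inputs are the live camera features plus only coarse navigation-map hints (lane count, turn-lane flags), not a geo-keyed lookup of a remembered scene. The "memory" of previous encounters lives in the trained weights, which were produced offline via the autolabeller.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))     # placeholder for trained weights shipped in the car

def trained_network(x):
    """Stand-in for the on-board perception/planning nets."""
    return np.tanh(W @ x)

def infer_intersection(camera_features, nav_map_hint):
    """Everything consumed here is real-time or coarse nav-map data.

    camera_features: live vision features (illustrative, length 12)
    nav_map_hint:    e.g. [lane_count, has_right_turn_lane, ...] (illustrative, length 4)

    Notably absent: remembered = database[(lat, lon)]  -- no per-intersection lookup.
    """
    x = np.concatenate([camera_features, nav_map_hint])
    return trained_network(x)

out = infer_intersection(np.zeros(12), np.array([2.0, 1.0, 0.0, 0.0]))
```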
 

That was my understanding as well... and the comments Karpathy made at AI Day, referenced in my previous post in the Investor thread, confirm that they only use collected data for labeling.
 
Could they, with appropriate data connections, use cloud NNs to "remember" locations in advance? This would eliminate local storage issues on the cars. It wouldn't be for real-time processing, as cloud data connections have too much latency, but could be used to augment upcoming predictions and pathing. If the cloud data is not available, or the latency is too high, it's not a problem as the local NN continues to handle things as before.
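As a sketch of how that could be wired up safely (entirely hypothetical; fetch_fn and the timeout budget are invented for illustration), the cloud lookup would be fire-and-forget with a hard timeout, so the local stack never waits on it and only ever gains information when the hints arrive in time.

```python
import queue
import threading

CLOUD_TIMEOUT_S = 0.2   # illustrative budget; slower than this and the hints are ignored

def fetch_cloud_hints(lat, lon, fetch_fn):
    """Ask a hypothetical cloud service for geo-tagged hints about the road ahead,
    without ever blocking the local real-time stack.
    """
    result = queue.Queue(maxsize=1)
    threading.Thread(target=lambda: result.put(fetch_fn(lat, lon)), daemon=True).start()
    try:
        return result.get(timeout=CLOUD_TIMEOUT_S)   # hints arrived in time; use them to bias planning
    except queue.Empty:
        return None                                  # too slow or offline: local NN carries on as before
```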
 
If it's just a repository for location hints, I'm not sure that would warrant a NN (which processes data) so much as a storage repository for geo-tagged ground-truth 'hints'. I don't know of any evidence for such a "fleet memory" used by cars in real time, other than the data obtained from cars and used for labeling and training the next version of FSD.
 
I was thinking of path planning beyond the visual perception of the cameras. The car is in the right lane, preparing for a right turn, and the vision is occluded by a large amount of traffic. Currently the planner sometimes attempts to go around the traffic to the left, which then leads it to miss the turn. If the cloud had memory of the intersection and of previous drives, perhaps that behavior would be reduced (though not eliminated). I admit my knowledge of how the NN handles decision making is limited, so I may be way off base. :)
 

I don't think the nets would be trained to remember that specifically, and if they were, it would mean overfitting or over-capacity nets, which could have undesirable effects.

There appears to be much left to do on the driving-policy side of the product, as opposed to the perception side. The perception side, with the latest major update, seems to be high-quality machine learning with state-of-the-art algorithms, though I would prefer higher-resolution cameras, and in stereo.

But planning is a different task. To me this requires significant semantic mapping, which could be gained from fleet driving. Then, those semantic hints would go into a neural network/ML based planner. I don't think Tesla is there yet, and there's a big gap.

A human could make a mistake like the one you describe if they had no prior knowledge of the intersection (e.g., which lane do I turn from to get where I'm going). There is some basic mapping used, but not at a really deep level. Humans, though, usually drive places they've been before and intuitively understand the dynamics and the optimal routes through their intersections.

The fleet should be generating routing information from human driving, and using it to build semantic maps of "what do people do to get from X to Y": not just allowed behavior, but average preferred behavior. The car might not be able to match that on average, but it could use it as a target. Collecting large datasets of routing under human-driven conditions (presumably using reliably safe drivers only) would also generate targets for the driving-policy training of neural networks.
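A toy version of that aggregation step (the intersection keying and maneuver labels are invented for illustration): count what human drivers actually do at each intersection and normalise into a preferred-behavior distribution a planner could use as a soft target.

```python
from collections import Counter, defaultdict

def build_preference_map(fleet_traces):
    """fleet_traces: iterable of (intersection_id, maneuver) pairs,
    e.g. ("elm_and_3rd", "right_turn_from_lane_1")."""
    counts = defaultdict(Counter)
    for intersection_id, maneuver in fleet_traces:
        counts[intersection_id][maneuver] += 1
    # Normalise counts into "average preferred behavior" per intersection.
    return {
        i: {m: n / sum(c.values()) for m, n in c.items()}
        for i, c in counts.items()
    }

prefs = build_preference_map([
    ("elm_and_3rd", "right_turn_from_lane_1"),
    ("elm_and_3rd", "right_turn_from_lane_1"),
    ("elm_and_3rd", "right_turn_from_lane_2"),
])
# -> roughly {'elm_and_3rd': {'right_turn_from_lane_1': 0.67, 'right_turn_from_lane_2': 0.33}}
```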

This would require a major upgrade to back-end and front-end technology, and an expansion of the mapping side's capabilities to bring in semantic hints for use on the planning side. The non-neural-network mapping side would use the usual geometric database lookups to provide the appropriate 'semantic tags' for routes and intersections (getting this representation right is not at all easy), which would be fed to a primarily neural-network-based planner and optimizer.
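And a sketch of that last step (again, the tags, coordinates, and nearest-neighbour keying are invented; a real system would need proper geometric indexing and a far richer representation): look up the semantic tags for the stretch of road ahead and hand them to the planner alongside the real-time perception features.

```python
import math

# Toy geometric 'database' of semantic tags keyed on intersection centre points.
SEMANTIC_DB = {
    (38.900, -77.040): {"right_turn_pocket": 1, "median_refuge": 1, "unprotected_left": 0},
}

TAG_ORDER = ("right_turn_pocket", "median_refuge", "unprotected_left")

def tags_near(lat, lon, radius_deg=0.001):
    """Nearest-match lookup of semantic tags for the approaching intersection."""
    for (clat, clon), tags in SEMANTIC_DB.items():
        if math.hypot(lat - clat, lon - clon) < radius_deg:
            return tags
    return {}

def planner_features(perception_features, lat, lon):
    """Concatenate real-time perception with looked-up semantic hints;
    the combined vector is what a neural-network planner would consume."""
    tags = tags_near(lat, lon)
    return list(perception_features) + [tags.get(k, 0) for k in TAG_ORDER]

print(planner_features([0.2, 0.8], 38.9001, -77.0401))   # [0.2, 0.8, 1, 1, 0]
```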