FSD Beta Videos (and questions for FSD Beta drivers)

I wonder why they release at night, which just encourages all the first videos to be shot in what I presume are harder driving environments.

I give it until 12:47 before someone posts a video of V10 completely missing an obvious obstacle.

I wondered the same thing. Maybe it's not harder conditions? Less traffic, fewer people around? Higher contrast (at least in areas lit by the headlights)?
 
The fact that objects flicker like that tells me they still have years left to go before their internal model is truly trustworthy.
AI Day did show the improvements from using multiple cameras (without incorporating time in that example, it seems). This is noticeable in Beta 9.x relative to 8.x, where the visualized cuboids used to jump around when passing parked vehicles, although Beta 9 also changed the visualizations, so it's a bit harder to compare directly.

The second Multi-Cam clip from AI Day still shows some jank/flicker, especially when the truck ahead is farther away, but it looks stable enough when passing the truck to be trusted:
[Attachment: multicam.jpg]


The next segment of the presentation talks about the Video Module, which should smooth out the positions even more. My guess is Karpathy didn't want to complicate this slide, so the Single-Cam and Multi-Cam visualizations here are close to the raw intermediate output of the neural network before and after "Multi-camera fusion & BEV transformation," and don't show what "ground truth" would be (the consensus from larger offline neural networks using past and future video data, explained in a later segment).

Hopefully we'll see clear improvements in Beta 10 for removing object flicker especially with trucks on highways, so what's "years" away might actually be just "hours" away.
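
Purely as a toy illustration of why adding time helps with flicker (this is not Tesla's Video Module, and the detection values below are made up): a single-frame network re-estimates each object from scratch every frame, so the cuboid jitters, while carrying even a simple smoothed state across frames steadies it.

```python
# Toy illustration only: why persisting state over time reduces flicker.
# A single-frame network re-estimates the cuboid from scratch every frame,
# so its position jitters; a simple smoothed track steadies it.

def smooth_track(raw_centers, alpha=0.3):
    """Exponential moving average over noisy per-frame (x, y) cuboid centers."""
    smoothed, state = [], None
    for x, y in raw_centers:
        if state is None:
            state = (x, y)                          # initialize from the first frame
        else:
            state = (alpha * x + (1 - alpha) * state[0],
                     alpha * y + (1 - alpha) * state[1])
        smoothed.append(state)
    return smoothed

# Jittery single-frame outputs for a truck a couple of lanes over (made-up data):
noisy = [(10.0, 3.1), (10.4, 2.7), (9.8, 3.4), (10.1, 3.0), (10.3, 2.9)]
print(smooth_track(noisy))
```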
 
I don't know how you "rewrite" a NN; that sounds like the words of a team that hard-codes things.
Maybe 'rewrite' is a euphemism for 'we were so far off we couldn't build on what we did previously and had to start over'.

'Re-write' as opposed to 'carrying on evolving from the previous build'?

I find that hard to believe, but it does raise the question of what 're-write' is intended to signify. Maybe one smaller code module?
 
This sounds like the words used by someone who doesn't understand what the rewrite was :)

Rewatch the AI Day material; it explains fairly well how the system evolved.
I guess I don't get it either! :) I thought I watched most, if not all, of the AI Day presentations. But 're-write' does carry the implication that to some extent you are starting over from scratch.
 
I guess I don't get it either! :)


At a very high level:

Previous system: each camera ran its own single-frame NN to act as visual perception for the car. Radar was a primary sensor for the forward speed and distance of objects. Nothing persisted over time.

Current beta FSD system: all camera inputs are fed into a series of cascading NNs that essentially produce a BEV (bird's-eye-view) 4D perception of the world (think of it as 360° surround video that persists over time). Radar is not used at all. Speed and distance are determined from the video inputs, which build up a point cloud.
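
A rough sketch of that fusion step, purely for illustration (the camera names, yaw angles, and detections below are my own assumptions, not Tesla's actual geometry or code): each camera's detections get transformed into one shared top-down vehicle frame, and detections of the same object seen by overlapping cameras are merged into a single fused object.

```python
import math

# Illustrative sketch only: fusing per-camera detections into one bird's-eye-view
# (BEV) vehicle frame. Camera names, yaw angles, and detections are hypothetical.

CAMERA_YAW = {                       # camera orientation relative to the car, radians
    "front_main": 0.0,
    "left_pillar": math.radians(90),
    "left_repeater": math.radians(140),
}

def to_vehicle_frame(camera, range_m, bearing_rad):
    """Convert a (range, bearing) detection in a camera's frame to vehicle x/y."""
    yaw = CAMERA_YAW[camera]
    return (range_m * math.cos(yaw + bearing_rad),
            range_m * math.sin(yaw + bearing_rad))

def fuse(detections, merge_radius=1.5):
    """Average detections from different cameras that land close together in BEV."""
    fused = []                                   # entries are (sum_x, sum_y, count)
    for cam, r, b in detections:
        x, y = to_vehicle_frame(cam, r, b)
        for i, (sx, sy, n) in enumerate(fused):
            if math.hypot(x - sx / n, y - sy / n) < merge_radius:
                fused[i] = (sx + x, sy + y, n + 1)
                break
        else:
            fused.append((x, y, 1))
    return [(sx / n, sy / n) for sx, sy, n in fused]

# A truck ahead, plus one parked car seen by two overlapping cameras (made up):
dets = [("front_main", 25.0, 0.05),
        ("left_repeater", 6.0, math.radians(6)),
        ("left_pillar", 6.0, math.radians(55))]
print(fuse(dets))   # three per-camera detections collapse into two fused objects
```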


There were steps in between where, for example, some BEV views existed but were still built frame by frame, and some radar inputs were still used as well. But that's the general "how it started vs. how it's going" of the design philosophy.

This also required rewriting a lot of the training code. The upside: once that code understands the 360° view and time, you only have to manually label an object in frame 1, and it can self-label that object for the rest of the video as long as it stays in view of any camera (it can even predict the object reappearing if it moves behind something briefly; they gave examples of this on AI Day).
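
As a toy sketch of that "label once, let the code carry it through the clip" idea (illustrative only; the per-frame detections, matching radius, and constant-velocity guess are assumptions for the example, not Tesla's auto-labeler):

```python
# Illustrative sketch of propagating a single manual label through a clip.
# Not Tesla's auto-labeling pipeline; detections and thresholds are made up.

def propagate_label(manual_label_xy, frames, match_radius=2.0):
    """Carry a frame-0 label forward by matching the nearest detection per frame.

    frames: list of per-frame detection lists, each detection an (x, y) tuple.
    Returns one labeled (x, y) per frame; coasts on a constant-velocity
    prediction when the object is briefly occluded and nothing matches.
    """
    labeled = [manual_label_xy]
    velocity = (0.0, 0.0)
    for detections in frames[1:]:
        px, py = labeled[-1]
        predicted = (px + velocity[0], py + velocity[1])
        # Pick the detection nearest the prediction, if any is close enough.
        best = min(detections, default=None,
                   key=lambda d: (d[0] - predicted[0]) ** 2 + (d[1] - predicted[1]) ** 2)
        if best is not None and math.hypot(best[0] - predicted[0],
                                           best[1] - predicted[1]) < match_radius:
            velocity = (best[0] - px, best[1] - py)
            labeled.append(best)
        else:
            labeled.append(predicted)   # briefly occluded: keep the prediction
    return labeled

import math

# A car labeled once at (20, 0), drifting forward; in frame 3 it is occluded:
clip = [[(20.0, 0.0)], [(21.0, 0.1), (5.0, -3.0)], [(22.1, 0.2)], [], [(24.0, 0.3)]]
print(propagate_label((20.0, 0.0), clip))
```

The offline labeler described on AI Day also gets to look at future frames and run much larger networks, which is why its "ground truth" ends up far smoother than anything the car computes live.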



Here's Karpathy discussing the overall transition with examples in practice from mid 2020:
 
I guess I don't get it either! :)
No see, as usual, with Elon, you have to listen to what he "means" not what he "says." Do not apply normal English definitions to the words he uses, and you'll be fine. Just go watch an hour video and you'll understand what he really meant.

Same guy that said the Plaid set a Nürburgring record "completely unmodified, straight from the factory" and the video shows a replacement dashboard bolted on the column in front of the driver, blocking the stock dash, and a couple switches that are not stock around the cabin. See, what he meant was....
 
Thanks. I've followed all this, but without first-hand experience of Beta / City, all I see is the same old jumping around and duplicated artifacts from time to time. It is really helpful (for me at least!) to hear the same explanation from different perspectives.

It has always confused me where the overlap lies between the NNs and conventional code/logic, and also between static frame-by-frame processing vs. a continuous 'certainty field' around the car, one that must be kept very close to 100% certain, at least near and in the direct path of the car.

The time-based processing must draw on multiple frames, and again, in my mind, you need to track and predict paths for multiple significant objects in some kind of hierarchy.
Watch this starting around 2:40; it explains the difference between the single-cam NN, the multi-cam NN, and the multi-cam + time NN.

Note that they were already fusing the different cameras and smoothing the data way back in 2020. That obviously was not as good as what they are doing now.
Tesla.com - "Transitioning to Tesla Vision"
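
On the radar-removal side of that transition, here's a toy sketch of getting closing speed from vision alone, something radar used to report directly. It assumes you already have per-frame range estimates; the frame rate and numbers are assumptions for the example, not Tesla's implementation.

```python
# Toy sketch of vision-only closing-speed estimation, as a stand-in for the
# speed that radar used to report directly. Values are hypothetical.

FRAME_DT = 1.0 / 36.0     # assume a 36 fps camera for this example

def closing_speed(range_history, alpha=0.4):
    """Finite-difference speed from successive range estimates, lightly smoothed."""
    speed = 0.0
    speeds = []
    for prev, curr in zip(range_history, range_history[1:]):
        raw = (curr - prev) / FRAME_DT          # m/s; negative means closing
        speed = alpha * raw + (1 - alpha) * speed
        speeds.append(speed)
    return speeds

# Noisy per-frame distance estimates (m) to a lead car slowly pulling away:
ranges = [30.0, 30.05, 30.12, 30.16, 30.25, 30.31]
print(closing_speed(ranges))
```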