FSD rewrite will go out on Oct 20 to limited beta

The tweet does somewhat clarify what he was referring to as the "rewrite", in that it was primarily the labeling software.
I do not see it that way.
4D was the big rewrite that was "almost complete" in January 2020.
"We rewrote all labeling software for 4D" just means that "labeling software" was part of the entire rewrite, not that "labeling software" was the rewrite.


p.s. just because we had seen hydranets since late 2018 and BEV net since late 2019 does not make them any less part of the rewrite.
Each one is a stepping stone for the next big step.
 
LOL

That isn't the car learning (which, again, not only can it not do, it'd be insane of Tesla to have it do; troubleshooting would be a NIGHTMARE)


That's the guy having his top speed on approach set 25 mph lower on the 2 attempts that actually worked than the one that didn't.

Look at the displayed speed limit on attempt 1 (50 mph) versus attempts 2/3 (25 mph); in both cases he had TACC set to +5.
 
I do not see it that way.
4D was the big rewrite that was "almost complete" in January 2020.
"We rewrote all labeling software for 4D" just means that "labeling software" was part of the entire rewrite, not that "labeling software" was the rewrite.


p.s. just because we had seen hydranets since late 2018 and BEV net since late 2019 does not make them any less part of the rewrite.
Each one is a stepping stone for the next big step.

That was Tesla finally catching up to the industry standard in labeling. But leave it to Elon to try to hype it up as though it's new and completely misrepresent it, and for Tesla fans to lap it up.

Like I said above and in the past, "4D labeling" is already industry standard and used widely. For example, Cruise details it here:
16:10m
 
That was Tesla finally catching up to the industry standard in labeling. But leave it to Elon to try to hype it up as though it's new and completely misrepresent it, and for Tesla fans to lap it up.

Like I said above and in the past, "4D labeling" is already industry standard and used widely. For example, Cruise details it here:
16:10m


Good video. Definitely true Tesla 4D labeling is not groundbreaking in any way by itself.

Side notes:

Around the 15-minute mark he clearly states the importance of collecting tons of edge cases.

He mentions how expensive it would be to scale up labeling. This is my main concern for Tesla, as they will actually have a lot of data, more so than Cruise, and they will need to use it. Labeling tech has to get really efficient for Tesla. I'm sure their "rewrite" has sped it up, but I have no sense of whether it can handle the edge cases that would come from a full release of FSD.
 
Good video. Definitely true Tesla 4D labeling is not groundbreaking in any way by itself.

What frustrates me in many of these conversations regarding FSD engineering solutions is that people don't appreciate the specificity of the technologies and strategies involved. People who work in software and engineering understand the importance of every minute detail in an implementation. One small difference can make something a breakthrough vs mediocre.

At 16:00 in the Cruise video, it's obviously not the same "4D" labeling as described by Elon:

1) Vogt refers to ML tools that estimate a bounding box for 2D images to aid in labeling
2) Since Cruise uses LIDAR, they can label point clouds and then maintain the labeled point clouds over time, to avoid labeling each point cloud frame.

None of this seems to match up with "4D stitched video labeling". The 4D stitched video labeling described by Elon (for which we still lack a detailed explanation) involves:

1) Labeling stitched VIDEO, not 2D images
2) The label presumably follows the VIDEO object through time, similar to the LIDAR point cloud, but with VIDEO

Bladerskb keeps parroting this "industry standard" thing as if every company that implements XYZ is the same as any other company. That's like saying a Ferrari is the same as a Civic because they're both cars. It's a ridiculous point that doesn't appreciate the end result.
 
What frustrates me in many of these conversations regarding FSD engineering solutions is that people don't appreciate the specificity of the technologies and strategies involved. People who work in software and engineering understand the importance of every minute detail in an implementation. One small difference can make something a breakthrough vs mediocre.
10000%
There seem to be a lot of people on here who will claim "more sensors" or "lidar because it gives you x" or "HD Maps is the industry standard".
In the software world, you either roll your own or use a framework.
Sometimes a framework is great (especially if it is very mature and has gone through many cycles), but many times a framework just locks you into a new set of problems; instead of solving the original problem you end up solving or working around framework problems (this is especially evident in less mature frameworks).

In autonomy there is no framework that you can take on (besides the ideological one of "map first vs vision first").
From there you have to roll your own solution and every minute detail determines how much baggage and technical debt you carry forward.

ALL "map first" guys will be suffering and most will be killed by this technical debt down the line -- mainly because they don't have the balls to do a fundamental rewrite of the system when they realize they were on the wrong path.
 
What frustrates me in many of these conversations regarding FSD engineering solutions is that people don't appreciate the specificity of the technologies and strategies involved. People who work in software and engineering understand the importance of every minute detail in an implementation. One small difference can make something a breakthrough vs mediocre.

At 16:00 in the Cruise video, it's obviously not the same "4D" labeling as described by Elon:

1) Vogt refers to ML tools that estimate a bounding box for 2D images to aid in labeling
2) Since Cruise uses LIDAR, they can label point clouds and then maintain the labeled point clouds over time, to avoid labeling each point cloud frame.

None of this seems to match up with "4D stitched video labeling". The 4D stitched video labeling described by Elon (for which we still lack a detailed explanation) involves:

1) Labeling stitched VIDEO, not 2D images
2) The label presumably follows the VIDEO object through time, similar to the LIDAR point cloud, but with VIDEO

Bladerskb keeps parroting this "industry standard" thing as if every company that implements XYZ is the same as any other company. That's like saying a Ferrari is the same as a Civic because they're both cars. It's a ridiculous point that doesn't appreciate the end result.

Wrong per usual.

Do you even know what the term "4D" means? It means labeling through time. Vogt presented that, instead of labeling each single image as was the case in the early deep-learning era, they capture 10-second clips of lidar and camera data.
Then they label a frame of it, which then propagates through time to label the entire 10 seconds.

Because you don't understand what you are talking about, you don't know what problems are being solved or what in the world they are doing.

The lidar data is being displayed in a bird's-eye view. That data is from their 5x lidars, not from one lidar. Secondly, just because they are only showing a camera picture on top from one camera doesn't mean they are only labeling that one camera and not the other 14x cameras. Thirdly, it's not video, it's a sequence of images (frames).

Here is the problem this is solving.

Let's say you have 8 cameras and 10 seconds of footage from all 8 cameras at 10 fps. That's 800 images. Let's say you want to label them all.

Previously Tesla was using outdated labeling software from the early deep-learning era (2012-2015).
Since then there have been new labeling techniques and methods. Before, Tesla would send those 800 images in random order to dozens of labelers.

This meant a lot of labeling work, and the network wouldn't have continuity in the objects detected from camera to camera.

Finally Tesla adopted the current labeling standard, where you use models you have already trained to help label your images, reducing the workload of your labelers and boosting efficiency.

Furthermore, you can also 10x labeling efficiency if, instead of sending the images in random order to dozens of labelers, you take those images, align them, and create labeling software that uses your already-trained model to label the entire sequence all at once.

This helps improve the transition from one image to another (basically one camera to another).
It also improves partial object labeling. Let's say 5% of a car is in another cam's POV (aka another image). The software will be able to label that, which before it could not, because it's letting time assist its labeling, leading to better recall and reliability.

This is what "4D labeling" is. Its a labeling software.
You believe in this mystical thing elon fabricated that doesn't exist. Per usual. Shadow mode 2.0
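For readers who want a concrete picture of what "use your already-trained model to label an entire aligned sequence at once" could look like, here is a minimal sketch. Everything in it (the Box type, the detector interface, the IoU matching and its threshold) is an illustrative assumption, not Tesla's or Cruise's actual tooling: a human seeds labels on the first frame of a clip, an already-trained detector proposes boxes on the remaining frames, and each proposal inherits the class of the best-overlapping box from the previous frame.

```python
# Hypothetical sketch of model-assisted sequence labeling ("label propagation").
# Names and thresholds are invented for illustration.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str = "unknown"

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def propagate_labels(
    frames: List[object],                     # e.g. a 10 s clip, cameras aligned in time
    seed_labels: List[Box],                   # human-labeled boxes on frame 0
    detector: Callable[[object], List[Box]],  # already-trained model proposing boxes
    iou_thresh: float = 0.5,
) -> List[List[Box]]:
    """Carry the frame-0 labels through the rest of the clip using the detector."""
    labeled = [seed_labels]
    prev = seed_labels
    for frame in frames[1:]:
        current = []
        for proposal in detector(frame):
            # Inherit the class of the best-overlapping box from the previous frame.
            best = max(prev, key=lambda b: iou(proposal, b), default=None)
            if best is not None and iou(proposal, best) >= iou_thresh:
                current.append(Box(proposal.x1, proposal.y1,
                                   proposal.x2, proposal.y2, best.label))
        labeled.append(current)
        prev = current or prev  # keep the last good labels if detection drops out
    return labeled
```

The point is the shape of the workflow rather than any detail: label once, propagate through time, and have humans only correct the frames where the propagation drifts, instead of labeling all 800 images by hand.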
 
That was Tesla finally catching up to the industry standard in labeling. But leave it to Elon to try to hype it up as though it's new and completely misrepresent it, and for Tesla fans to lap it up.

Like I said above and in the past, "4D labeling" is already industry standard and used widely. For example, Cruise details it here:
16:10m

At 16:10, he's talking about minimizing human labor for labeling. So unless we're talking about having humans label lidar data in real time while the car is in motion and attempting to drive, I'm pretty sure this "4D labeling" is for training. I believe that's different from what Tesla means with 4D, as that's also functioning in the cars as they drive around to improve perception. That said, I wouldn't be surprised if Cruise is already doing the same.
 
At 16:10, he's talking about minimizing human labor for labeling. So unless we're talking about having humans label lidar data in real time while the car is in motion and attempting to drive, I'm pretty sure this "4D labeling" is for training. I believe that's different from what Tesla means with 4D, as that's also functioning in the cars as they drive around to improve perception. That said, I wouldn't be surprised if Cruise is already doing the same.

No that is what he is talking about. That is what sequence labeling through time is (4D).

You people need to stop believing Elon's hype nonsense.
In between his hype he slips and lets you in on what's actually going on, and he precisely says:

"We had to do a fundamental rewrite of the entire Autopilot software stack... We're now labeling 3D video, which is hugely different from when we were previously labeling single 2D images. We're now labeling entire video segments, taking all cameras simultaneously and labeling that. The sophistication of the neural net of the car and the overall logic of the car is improved dramatically."

He spells it out "We're now labeling entire video segments, taking all cameras simultaneously and labeling that."

Video segments, aka 10 seconds of 360 lidar/camera data, as Cruise does for example. Also, when he says 3D video, he doesn't mean some special 3D output but the 360 coverage of all cameras. And the time addition (4D) is using RNN models to auto-label the image sequences.

And yes, this also reduces human labor; this is part of the problem this is solving. He even mentioned it, but you people refuse to pay attention. People only want to see and hear the glitter and the sparks.

“…in terms of labeling, labeling with video in all eight cameras simultaneously. This is a really, I mean in terms of labeling efficiency, arguably like a three order of magnitude improvement in labeling efficiency. For those who know about this, it’s extremely fundamental, so that’s really great progress on that,” Musk said.​

Again he presents an industry-standard labeling technique as a quantum leap.

 
Can I disagree with everyone?

Did Musk even say what Tesla is doing labeling-wise is groundbreaking? I'm not sure he did.

It's a big change for Tesla.

And the importance is a much bigger deal for Tesla than Cruise.

And minor details of the differences may be more important for scaling labeling efficiency than we know of.

So even if Tesla is just now doing what Cruise is doing, that still is an advantage for Tesla because of their amplified ability to gather data to annotate.

The playing field is not level.
 
No that is what he is talking about. That is what sequence labeling through time is (4D).

So you are suggesting at 16:18 when he says "Instead of having to draw a box around a person to say this is a person, we actually have ML models that assist the humans in doing this task of labeling" he's not talking about labeling for training purposes, but to reduce human labor in real-time perception by the car as it drives? Those must be some really quick manual labelers to do so at even 10 frames per second!
Edit: Nevermind, I see what you mean.


Or did you mean Elon Musk when you said "he"? If that's the case, I agree 4D is being used by Tesla for training, but I assumed (maybe incorrectly) that it meant the rewritten perception NN also recognizes 4D / video. I suppose one way to know is if it does a better job at identifying a signaling car's intention, as individual images wouldn't show blinking, while video would.

My understanding is 4D is used by Tesla for both training and perception.
 
How about we look at Tesla's related patent?

https://www.freepatentsonline.com/y2020/0250473.html
https://www.freepatentsonline.com/20200250473.pdf

"In some embodiments, the various outputs of deep learning are used to construct a three-dimensional representation of the vehicle's environment for autonomous driving which includes predicted paths of vehicles, identified obstacles, identified traffic control signals including speed limits, etc. In some embodiments, the vehicle control module utilizes the determined results to control the vehicle along a determined path."

"In various embodiments, the deep learning analysis is used to predict additional features. The predicted features may be used to assist autonomous driving. For example, a detected vehicle can be assigned to a lane or road. As another example, a detected vehicle can be determined to be in a blind spot, to be a vehicle that should be yielded to, to be a vehicle in the left adjacent lane, to be a vehicle in the right adjacent lane, or to have another appropriate attribute. Similarly, the deep learning analysis can identify traffic lights, drivable space, pedestrians, obstacles, or other appropriate features for driving."

"At 407, the results of deep learning analysis are provided to vehicle control."
 
just because we had seen hydranets since late 2018 and BEV net since late 2019 does not make them any less part of the rewrite.
Totally agree that the iterations and additions to the neural networks are part of the overall rewrite, and in fact most likely the complete rewrite of the labeling software was driven by the changes to the neural networks needing much higher quality training data than before. I'm just pointing out that the neural network architecture doesn't need to be completely rewritten to significantly change the behavior, as training data plays a huge role in what the network predicts.

If we assume randomly ordered, single-camera labeled objects/lines/edges were sufficient for training HydraNet for highway driving, the transition to the 4D labeling rewrite should mean the BEV net (along with the fusion layer and temporal module) requires sequential frames from all cameras as part of the training data to be correctly trained.

At a very high level, Tesla isn't doing much different from competitors in that everyone needs to process data from multiple sensors over time to control the vehicle. Even the old 2.5D Autopilot approach did that, and it works quite well for highway driving after years of refinements from fleet data. It definitely will be interesting to see how quickly the FSD neural network improves with the fleet actively using it to find and send back data (potentially already in shadow mode).
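
To make the "BEV net plus fusion layer plus temporal module" point concrete, here is a toy PyTorch sketch of that style of architecture. The module names, sizes, and shapes are assumptions for illustration, not Tesla's actual network; the takeaway is that the recurrent temporal module consumes fused frames over time, which is why the training data has to be sequential clips from all cameras rather than shuffled single images.

```python
# Toy sketch only: a "BEV-style" net with a fusion layer and a temporal module.
# Module names, shapes, and sizes are illustrative assumptions, not Tesla's design.
import torch
import torch.nn as nn

class ToyBEVNet(nn.Module):
    def __init__(self, num_cameras: int = 8, feat_dim: int = 64):
        super().__init__()
        # Shared per-camera backbone (stand-in for HydraNet-style feature extraction).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((16, 16)),
        )
        # Fusion layer: combine all camera features into one top-down grid.
        self.fusion = nn.Conv2d(num_cameras * feat_dim, feat_dim, kernel_size=1)
        # Temporal module: a GRU over the fused frames; this is what forces the
        # training data to be *sequential* clips from all cameras.
        self.temporal = nn.GRU(input_size=feat_dim * 16 * 16,
                               hidden_size=256, batch_first=True)
        self.head = nn.Linear(256, 10)  # e.g. lane/edge predictions in BEV

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, cameras, 3, H, W)
        b, t, c, ch, h, w = clips.shape
        x = clips.reshape(b * t * c, ch, h, w)
        feats = self.backbone(x)                   # (b*t*c, F, 16, 16)
        feats = feats.reshape(b * t, -1, 16, 16)   # stack cameras as channels
        fused = self.fusion(feats)                 # (b*t, F, 16, 16)
        seq = fused.reshape(b, t, -1)              # one feature vector per timestep
        out, _ = self.temporal(seq)
        return self.head(out[:, -1])               # prediction from the last timestep

# Usage: a batch of two 10-frame clips from 8 cameras.
# pred = ToyBEVNet()(torch.randn(2, 10, 8, 3, 128, 256))
```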
 
Some more interesting quotes from GENERATING GROUND TRUTH FOR MACHINE LEARNING FROM TIME SERIES ELEMENTS - Tesla, Inc., which I think describes both the pre-rewrite (209) and post-rewrite (409) Autopilot systems.

"For example, the model may be trained to identify road lane lines, obstacles, pedestrians, moving vehicles, parked vehicles, drivable space, etc., as appropriate. In some embodiments, multiple trajectories for a lane line are identified. For example, several potential trajectories for a lane line are detected and each trajectory has a corresponding probability of occurring. In some embodiments, the lane line predicted is the lane line with the highest probability of occurring and/or the highest associated confidence value. In some embodiments, a predicted lane line from deep learning analysis requires exceeding a minimum confidence threshold value. In various embodiments, the neural network includes multiple layers including one or more intermediate layers. In various embodiments, the sensor data and/or the results of deep learning analysis are retained and transmitted at 411 for the automatic generation of training data."

"In some embodiments, the sampling rate of the captured sensor and/or related data is configurable. For example, the sampling rate is increased at higher speeds, during sudden braking, during sudden acceleration, during hard steering, or another appropriate scenario when additional fidelity is needed. "

"In some embodiments, label A of FIG. 6 corresponds to label A of FIG. 5 and the predicted three-dimensional trajectories of lane lines 601 and 611 are determined using only image data 600 as input to a trained machine learning model. By training the machine learning model using a ground truth determined using image and related data of a time series that includes elements taken at the locations of labels A, B, and C of FIG. 5, three-dimensional trajectories of lane lines 601 and 611 are predicted with a high degree of accuracy even portions of the lane lines in the distance, such as portions 621. Although image data 600 and image data 500 of FIG. 5 are related, the prediction of trajectories does not require image data 600 to be included in the training data. By training on sufficient training data, lane lines can be predicted even for newly encountered scenarios. In various embodiments, the predicted three-dimensional trajectories of lane lines 601 and 611 are used to maintain the position of the vehicle within the detected lane lines and/or to autonomously navigate the vehicle along the detected lane of the prediction lane lines. By predicting the lane lines in three-dimensions, the performance, safely, and accuracy of the navigation is vastly improved."
 
No that is what he is talking about. That is what sequence labeling through time is (4D).

First of all, it's not even the same technology. Again, Vogt was referring to labeling lidar point clouds, not video; not a subtle difference. Second, Vogt doesn't explicitly talk about labeling moving point clouds. He was talking about tools to help labelers do less work by estimating the same label in a sequence of static point clouds. There's a big difference here between labeling the position of an object as a function of time vs labeling the same object in a sequence. Vogt doesn't talk about the time dimension at 16:00 in the video. Even if he did, he'd be referring to the lidar point clouds, which is not the same as video, obviously.

Tesla's 4D labeling seems to imply that the labeler sees a video and labels the moving video directly, rather than a sequence of images. Although we don't know the full details, we haven't seen video labeling mentioned by other FSD developers.
 
Here's Elon briefly describing video labeling during Battery Day (emphasis mine):

Elon Musk: (01:20:40)
So we are now labeling in 3D video, so this is hugely different from the previously where we were labeling essentially a bunch of single images from the eight cameras, and they would be labeled at different times by different people, and some of the labels, you literally can’t tell what it is you’re labeling. So it basically made it sort of in some cases impossible to label, and the labels had a lot of errors. Now with our new labeling tools, we label it in video, so we actually label entire video segments in the system, so you get basically a surround video thing to label with the surround video and with time. So it’s now taking all cameras simultaneously and looking at how the image has changed over time and labeling that, and then the sophistication of the neural nets in the car and the overall logic in the car has improved dramatically.