
Blog: Musk Touts 'Quantum Leap' in Full Self-Driving Performance



A “quantum leap” improvement is coming to Tesla's Autopilot software in six to ten weeks, Chief Executive Elon Musk said in a tweet.

Musk called the new software a “fundamental architectural rewrite, not an incremental tweak.”






Musk said his personal car is running a “bleeding edge alpha build” of the software, which he also mentioned during Tesla’s Q2 earnings. 

“So it’s almost getting to the point where I can go from my house to work with no interventions, despite going through construction and widely varying situations,” Musk said on the earnings call. “So this is why I am very confident about full self-driving functionality being complete by the end of this year, is because I’m literally driving it.”

Tesla's Full Self-Driving software has been slow to roll out against the company's promises. Musk previously said a Tesla would drive from Los Angeles to New York using the Full Self-Driving feature by the end of 2019. The company didn't meet that goal. So it will be interesting to see the state of Autopilot at the end of 2020.

 
It is also what Tesla was doing with their old 2.5D labelling. 4D means building a 4D point cloud of the entire video from all cameras and labelling all frames at once, rather than labelling each camera in 2D and stitching the results together (still in 2D).

Like I said, maybe Tesla is implementing 4D differently. But if Mobileye has camera vision that can track the paths of objects in real time in both space and time, I consider that to be 4D.
 
Here is what Elon said during the Q2 2020 Earnings Call on 4D:

"Well, the actual major milestone that's happening right now is really a transition of the autonomy system or the cars, like AI, if you will, from thinking about things in -- like 2.5D. It's like think -- things like isolated pictures and doing image recognition on pictures that are harshly correlated in time but not very well and transitioning to kind of a 4D, where it's like -- which is video essentially."

Tesla (TSLA) Q2 2020 Earnings Call Transcript | The Motley Fool

So basically, 2.5D refers to processing still images, maybe stitching them together to recover a little bit of motion through time. 4D refers to video labeling, where objects are fully processed in both 3D space and time. That's what Elon means by 4D.
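A toy way to picture the difference in data terms (purely illustrative Python, made-up structures, not Tesla's actual label format):

```python
# Illustrative only: contrasting per-frame "2.5D" labels with a single
# spatio-temporal ("4D") label that covers a whole clip at once.
from dataclasses import dataclass, field

@dataclass
class FrameLabel:          # 2.5D: one independent label per camera image
    camera: str
    frame_idx: int
    bbox_2d: tuple         # (x, y, w, h) in image pixels

@dataclass
class TrackLabel:          # "4D": one label for an object track through space and time
    object_id: int
    positions_3d: dict = field(default_factory=dict)   # frame_idx -> (x, y, z) in vehicle frame

# 2.5D workflow: every frame of every camera gets labelled on its own.
per_frame = [FrameLabel("front_main", t, (100 + 2 * t, 200, 40, 30)) for t in range(5)]

# 4D workflow: the same car is labelled once as a track; any camera frame that
# sees it can inherit the label by reprojecting the 3D positions.
track = TrackLabel(object_id=1,
                   positions_3d={t: (20.0 - 0.5 * t, 1.5, 0.0) for t in range(5)})

print(len(per_frame), "independent 2D labels vs 1 track covering", len(track.positions_3d), "frames")
```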
 
This came out today, which I guess is close to state of the art in academia/opensource/startup space

When watching it I can't help but think how many of the problems Tesla will solve or even remove with their new solution: 4D labelling, data engine, fusion layer, and output in vector space. It feels like Tesla are at least two years ahead of academia, i.e. two years ago Tesla switched to a new, better system, and now they already have it in production. I don't think people fully understand that the competition will have to move to 4D or risk getting stuck in the 99.9% range.
 
Well, the raising of the FSD price, coupled with thousands of people using the system, should surface loud complaints if the system doesn't live up to that $10k price tag. If it is still in beta six months from now, that would be bad news. Time will tell, and the time is not an indefinite number of months and years. Tesla/Musk have basically said: we've got it, it is here. How long do they get before people either agree with or refute that claim?
 
Musk confirming what everyone already knew about better sensors...

https://twitter.com/elonmusk/status/1329878876202426371
 
Thanks for sharing! Very fun to follow. I once supervised a master's thesis doing lidar localization not too different from what Mobileye are doing with pseudo-lidar.

Anyway, I think it must be pretty frustrating for Mobileye to hear once a year that Tesla are changing their stack away from what Mobileye have implemented. So many things in the paper just scream "feature engineering", and George Hotz would be laughing at what they are doing. And many of the problems they are trying to address are removed by switching to 4D; like Elon says, the best design is to remove parts. Mainly, the way I see it, deep learning in 2015-2020 was mostly about scaling up the dataset rather than being clever with algorithms and feature engineering. Mobileye are now wasting time labelling a dataset and doing feature engineering for a system that will soon be replaced by a 4D stack similar to Tesla's that is more efficient at labeling. Then in a few years, when Mobileye have switched to a 4D labelling system, Karpathy and his team will announce that they are now doing end-to-end with transformers, GANs and meta-learning, and Mobileye will again have to scrap what they are doing to catch up...

Is that why Tesla is two years behind Mobileye in deployed NN?



Picture of Tesla's Future HW3 FSD Road Edge Neural Network from late 2019



Picture of Mobileye's Road Edge Neural Network from Production Q4 2017



It also looks like this will be the case with EyeQ5. Looks like Mobileye will be the first to deploy Vidar this September.



Mainly, the way I see it, deep learning in 2015-2020 was mostly about scaling up the dataset rather than being clever with algorithms and feature engineering. Mobileye are now wasting time labelling a dataset and doing feature engineering for a system that will soon be replaced by a 4D stack similar to Tesla's that is more efficient at labeling.

You don't even know what '4D stack' is or the state of the art in autonomous driving. All you do is regurgitate whatever Elon is saying.

Karpathy and his team will announce that they are now doing end-to-end with transformers, GANs and meta-learning

Who invented these deep learning algorithms and architectures? Tesla? Oh wait, no, it's DeepMind.
Who was responsible for all the deep learning breakthroughs of the past 9 years? Tesla? Oh wait, no, it's DeepMind.
Who collaborates with DeepMind? Tesla? Oh no, it's Waymo.

This came out today, which I guess is close to state of the art in academia/opensource/startup space

That's not the state of the art in academia/opensource/startup space. Not even close. That's like 6 years behind.
It all makes sense, given how misinformed you are.

When watching it I can't help but think how many of the problems Tesla will solve or even remove with their new solution: 4D labelling, data engine, fusion layer, and output in vector space. It feels like Tesla are at least two years ahead of academia, i.e. two years ago Tesla switched to a new, better system, and now they already have it in production. I don't think people fully understand that the competition will have to move to 4D or risk getting stuck in the 99.9% range.

Is that why Tesla's driving policy and prediction are almost 100% hard-coded, while others have been using mostly deep-learned models for a while? Talk about being behind.
 
From what we've heard Tesla rewrote labelling of all features (RG, RB, RU, RS in Mobileye's terms) to be driven by video data.

This is a misconception of what's actually going on.

Cruise Data Labeling

The video is great and I love the level of detail. However, I think if anything it actually proves that Tesla is taking a different approach. Direct link to their PDF: https://newsroom.intel.com/wp-conte...leye-Investor-sensing-status-presentation.pdf

Mobileye is very focused on redundancy, and their system interprets results from vision, radar and lidar separately. Each of the four categories of features (Road Geometry, Road Boundaries, Road Users, Road Semantics) is covered by multiple processing engines.

Unlike Tesla's, none of their various processing engines (Object Detection DNNs, Lanes detections DNN, Semantic Segmentation engine, Single view Parallax-net elevation map, Multi-view Depth network, Generalized-HPP, Wheels DNN, Road Semantic Networks) depends on a bird's-eye-view map for detecting features. They mention an occupancy grid, but that's actually just for Road Boundaries. They use video processing, but only for pseudo-lidar as far as I could tell, not for labelling.

From what we've heard Tesla rewrote labelling of all features (RG, RB, RU, RS in Mobileye's terms) to be driven by video data. Based on the diagram I posted earlier, they also now route all perception through a single BEV net that outputs all feature types (compared with the eight separate approaches listed above). I don't see Mobileye doing either of these things.
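To make that contrast concrete, here is a rough sketch (my reading only; the names, shapes and stubs are made up, not either company's real code) of separate per-feature engines versus one BEV net with shared features and multiple output heads:

```python
# Illustrative stubs only: several independent engines each producing one
# feature type, vs. one shared BEV network emitting all four at once.
import numpy as np

H, W = 200, 200  # hypothetical bird's-eye-view grid (e.g. 0.5 m cells)

def lane_engine(images):      return {"road_geometry": np.zeros((H, W))}
def boundary_engine(images):  return {"road_boundaries": np.zeros((H, W))}
def user_engine(images):      return {"road_users": np.zeros((H, W))}
def semantics_engine(images): return {"road_semantics": np.zeros((H, W))}

def separate_engines(images):
    # "Mobileye-style" in this sketch: run independent engines, merge results.
    out = {}
    for engine in (lane_engine, boundary_engine, user_engine, semantics_engine):
        out.update(engine(images))
    return out

def single_bev_net(images):
    # "Tesla-style" in this sketch: one fused BEV feature tensor, with the four
    # feature types read off as output heads of the same network.
    shared = np.zeros((4, H, W))      # stand-in for fused multi-camera features
    return {"road_geometry": shared[0], "road_boundaries": shared[1],
            "road_users": shared[2], "road_semantics": shared[3]}

cameras = [np.zeros((96, 128, 3)) for _ in range(8)]   # downscaled stand-in frames
assert separate_engines(cameras).keys() == single_bev_net(cameras).keys()
```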

Literally, Andrej Karpathy said: "What we have been working on is going much more towards these bird's-eye-view predictions, which are actually relatively standard and well understood, but for us it's kind of a step up."

He is literally telling YOU that they are playing catch-up with the industry standard.

All Tesla is doing is

Step #1 Raw Images
Step #2 Backbone ResNet (outputs features such as moving objects, road lines, road edges, etc)
Step #3 Fusion layer (stitches all the extracted feature maps from Step #2 and projects from image space to bird's eye view)
Step #4 Temporal Modules (smooths out all the rough spots)
Step #5 Bird's eye view decoder (creates the top down space)

All Tesla is doing here is Steps 3-5, which are already industry standard: it's what companies do with lidar data to create BEV networks, and Tesla is just playing catch-up to that.
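For what it's worth, here is what those five steps could look like as code. This is a toy PyTorch sketch with made-up layer sizes and a trivial BEV projection, not Tesla's actual network:

```python
# Toy sketch of the five steps listed above; shapes and modules are illustrative.
import torch
import torch.nn as nn

class ToyBEVPipeline(nn.Module):
    def __init__(self, n_cams=8, feat_ch=32, bev_hw=64):
        super().__init__()
        # Step 2: shared per-camera backbone (stand-in for a ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        # Step 3: fusion layer that stitches the camera features into one BEV grid.
        self.fuse = nn.Linear(n_cams * feat_ch, feat_ch)
        self.bev_hw = bev_hw
        # Step 4: temporal module that smooths features across frames.
        self.temporal = nn.GRU(feat_ch, feat_ch, batch_first=True)
        # Step 5: BEV decoder producing the top-down output channels.
        self.decoder = nn.Conv2d(feat_ch, 4, 1)   # e.g. lanes / edges / objects / drivable

    def forward(self, clip):                       # Step 1: clip of raw images (T, n_cams, 3, H, W)
        T, n_cams = clip.shape[:2]
        bev_seq = []
        for t in range(T):
            # Per-camera features, pooled to a vector per camera (crude simplification).
            feats = [self.backbone(clip[t, c:c + 1]).mean(dim=(2, 3)) for c in range(n_cams)]
            fused = self.fuse(torch.cat(feats, dim=1))            # (1, feat_ch)
            # Broadcast the fused vector onto a square BEV grid as a stand-in projection.
            bev = fused[:, :, None, None].expand(-1, -1, self.bev_hw, self.bev_hw)
            bev_seq.append(bev)
        bev_seq = torch.stack(bev_seq, dim=1)                     # (1, T, C, H, W)
        flat = bev_seq.mean(dim=(3, 4))                           # per-frame feature vectors
        smoothed, _ = self.temporal(flat)                         # GRU over time: (1, T, C)
        last = smoothed[:, -1, :, None, None].expand(-1, -1, self.bev_hw, self.bev_hw)
        return self.decoder(last)                                 # (1, 4, bev_hw, bev_hw)

out = ToyBEVPipeline()(torch.zeros(3, 8, 3, 64, 96))              # 3 frames, 8 cameras
print(out.shape)                                                  # torch.Size([1, 4, 64, 64])
```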

@heltok
 
I keep watching presentations, reading all I can find and trying to make sense of it, as well as patching in my own old-school engineering and electronics experience, but where I struggle most is with the context and purpose of the information on offer: everyone seems to be long or short on TSLA, while Tesla itself doesn't give away how far ahead or behind they are.

I generally feel better able to understand stuff that relates to physical processes, layers, chips and feedback systems, much of which seems lost with NNs. However, I keep reading posts here that suggest there are some pretty basic layers of authority (for example), and obviously all this still has to connect with the real world, so physical stuff still matters.

Can someone who is certain of their facts point me to a current model (or list out the steps here) that shows the ordering that goes on in Tesla's current (beta) approach? Is there just one model that's modified based on environment (freeway / city / LHD vs RHD / country), or different models for each? In a more consistent, conforming environment like a freeway, is it safer or not to try to perform potentially pointless and maybe confusing image processing at speed?

Step #1 Raw Images
Step #2 Backbone ResNet (outputs features such as moving objects, road lines, road edges, etc)
Step #3 Fusion layer (stitches all the extracted feature maps from Step #2 and projects from image space to bird's eye view)
Step #4 Temporal Modules (smooths out all the rough spots)
Step #5 Bird's eye view decoder (creates the top down space)

Is this certain and in order?

Raw images: do you mean physical camera feeds or logical images of objects being tracked?

Is ResNet physical or logical? Are objects tracked individually or merged into a combined view? Since you haven't mentioned 360 / BEV yet, are these still flat representations of separate objects linked to a physical camera, or a combined logical image feed?

Step 3 pulls together objects from multiple feeds and places them in 360 space / 3D space?

Step 4 tracks objects between multiple frames? Is it part of step 4 that objects acquire direction and velocity?

Step 5: what is the significance of this (other than that I want to see a BEV on the MCU!)? Surely by the end of step 4 you have all your objects and how they are moving.

Why wouldn't you take all video sources and use them to produce stitched 3D images as a first step, then process that single source to extract / track objects?

Given that physical versus logical seems quite a blurred distinction these days, how do we know which is which?
 
[QUOTE="Battpower, post: 5317916, member: 117285"
Can someone who is certain of their facts point me to a current model (or list out steps here) that shows the ordering that goes on with Tesla's current (beta) approach? Is there just one model that's modified based on environment (freeway / city / lhd vs rhd.... country) or different models for each? In a more consistent / conforming environment like a freeway is it safer of not to try and perform potentially pointless and maybe confusing image processing at speed?[/QUOTE]


Elon Musk, on the Q4 2020 earnings call, said:
so there's still a few of the neural nets that need to be upgraded to video training and video inference.

Also, green has been reporting that none of the freeway (NoA) code has changed significantly in the FSD Beta software, so that appears to be among the stuff not running "rewrite" code.

And AFAIK all the folks running the FSD beta are in LHD countries (mostly the US, but maybe there are some in Canada, can't recall).
 
I thought Tesla was doing auto labeling of video. The 4D rewrite uses surround video. So why is Tesla hiring people to still do manual labelling of images?

Tesla looks to hire data labelers to feed Autopilot neural nets with images at Gigafactory New York - Electrek

Because it's fundamentally still image labeling. They are just labeling a series of images that make up maybe 10 seconds of video. Calling it 4D video labeling is just another hype job by Elon, just like "quantum leap" or "silky smooth".
 
I thought it was more like they still need a human to label in frame 1 "THAT IS X" for whatever it is, but then the system is capable of understanding it's still X in the rest of the video clip going forward in time, rather than the previous system where they had to manually tell it "THAT IS X" in each individual frame.

That'd still require humans for the initial labels in a video, but not for every item in every frame of it.
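A toy sketch of that idea (not Tesla's actual tooling): a human labels the object once in the first frame, and a simple IoU-based tracker carries the label forward through the rest of the clip:

```python
# Illustrative only: propagate a single human label through later frames by
# matching each frame's detections against the previous box with IoU overlap.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def propagate_label(initial_box, label, detections_per_frame):
    """Attach `label` to whichever detection best overlaps the previous box."""
    labelled, prev = [], initial_box
    for dets in detections_per_frame:
        best = max(dets, key=lambda d: iou(prev, d))
        labelled.append((best, label))
        prev = best
    return labelled

# Frame 0 is labelled by a human; frames 1..3 only have unlabelled detections.
human_label = ((100, 100, 140, 130), "car")
later_detections = [[(104, 101, 144, 131), (300, 50, 330, 80)],
                    [(108, 102, 148, 132)],
                    [(112, 103, 152, 133), (10, 10, 40, 40)]]
print(propagate_label(*human_label, later_detections))
```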
 
I thought Tesla was doing auto labeling of video. The 4D rewrite uses surround video. So why is Tesla hiring people to still do manual labelling of images?

Tesla looks to hire data labelers to feed Autopilot neural nets with images at Gigafactory New York - Electrek
My understanding (from what little has been actually shared to us) on this was that Dojo hasn't even started yet. I suspect you have to keep manually doing this until it can start to automate some of those tasks? Not sure Elon ever gave more than a guess on Dojo being up and running though? Late this year?
 
My understanding (from what little has been actually shared to us) on this was that Dojo hasn't even started yet. I suspect you have to keep manually doing this until it can start to automate some of those tasks? Not sure Elon ever gave more than a guess on Dojo being up and running though? Late this year?

If I am remembering correctly, I think it was last year that Elon mentioned that Dojo was a year away from completion.
 
That's what it sounded like to me, but it left further clarification of what is automated, what will be automated, and where Dojo sits all somewhat vague... as usual.

My understanding (from what little has been actually shared to us) on this was that Dojo hasn't even started yet. I suspect you have to keep manually doing this until it can start to automate some of those tasks? Not sure Elon ever gave more than a guess on Dojo being up and running though? Late this year?

What is DOJO to you? Serious question. From your words it seems like some magic potion or secret sauce...
 
I thought it was more like they still need a human to label in frame 1 "THAT IS X" for whatever it is, but then the system is capable of understanding it's still X in the rest of the video clip going forward in time, rather than the previous system where they had to manually tell it "THAT IS X" in each individual frame.

That'd still require humans for the initial labels in a video, but not for every item in every frame of it.

That is exactly what it is. Unfortunately it's the Tesla community, so it's hyped as the second coming of Christ that changes everything and is unique to Tesla. Anything Tesla does is regarded as the second coming, even if others have been doing it for years.

Cruise Data Labeling