
diplomat33

Well-Known Member
Aug 3, 2017
7,802
9,119
Terre Haute, IN USA
It is also what Tesla was doing with their old 2.5D labelling. 4D is making a 4D point cloud of the entire video from all cameras to label all frames at once, not labelling each camera in 2D and stitching them together (in 2D).

Like I said, maybe Tesla is implementing 4D differently. But if Mobileye has camera vision that can track the paths of objects in real time, in both space and time, I consider that to be 4D.
 

diplomat33

Well-Known Member
Aug 3, 2017
7,802
9,119
Terre Haute, IN USA
Here is what Elon said during the Q2 2020 Earnings Call on 4D:

"Well, the actual major milestone that's happening right now is really a transition of the autonomy system or the cars, like AI, if you will, from thinking about things in -- like 2.5D. It's like think -- things like isolated pictures and doing image recognition on pictures that are harshly correlated in time but not very well and transitioning to kind of a 4D, where it's like -- which is video essentially."

Tesla (TSLA) Q2 2020 Earnings Call Transcript | The Motley Fool

So basically 2.5D refers to just processing still images, where maybe you stitch the images together to get a little bit of motion through time. 4D refers to video labeling, where you are now fully processing objects in both 3D space and in time. That's what Elon means by 4D.
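To make the distinction concrete, here is a rough sketch in Python (the data structures are purely my own invention for illustration, not anything Tesla has published) of a per-frame 2.5D label versus a single 4D track label:

from dataclasses import dataclass
from typing import List, Tuple

# 2.5D-style label: each camera frame is annotated on its own, so the same
# physical car shows up as many unrelated 2D boxes.
@dataclass
class Frame2DLabel:
    camera: str                      # e.g. "main", "left_repeater" (made-up names)
    timestamp: float
    cls: str                         # "car", "pedestrian", ...
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in image pixels

# 4D-style label: one label per physical object, giving its 3D position over
# time, reconstructed from all cameras at once. Every camera frame in the clip
# can then be labelled automatically by projecting this track back into it.
@dataclass
class Track4DLabel:
    cls: str
    timestamps: List[float]
    positions_xyz: List[Tuple[float, float, float]]  # (x, y, z) in the vehicle frame

Labelling a 10-second clip the 2.5D way means thousands of independent boxes; the 4D way means a handful of tracks that every camera frame can be auto-labelled from.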
 

heltok

Active Member
Aug 12, 2014
1,232
10,427
Sweden
This came out today, which I guess is close to state of the art in academia/opensource/startup space

When watching it I can’t help but think how many of the problems Tesla will solve or even remove entirely with their new 4D labelling, data engine, fusion layer, and output-in-vector-space solution. It feels like Tesla are at least 2 years ahead of academia, i.e. two years ago Tesla switched to a new, better system, and now they already have that system in production. I don't think people fully understand that the competition will have to move to 4D or risk getting stuck in the 99.9% range.
 

tmoz

S85D, Prius PiP
Aug 16, 2015
880
483
Gilbert, Arizona
Well, the raised FSD price tag, coupled with thousands of people using the system, should reveal loud complaints if the system doesn't live up to that $10k price tag. If it is still in beta 6 months from now, that would be bad news. Time will tell, and the time is not an indefinite number of months and years. Tesla/Musk have basically said: we've got it, it is here. How long do they get for people to agree or refute that claim?
 

DanCar

Active Member
Oct 2, 2013
2,002
1,735
SF Bay Area
... How long do they get for people to agree or refute that claim?
I've been wondering that since 2016. I suspect people will give Tesla lots of leeway. If they give everyone access to "closed" beta with signed waiver of liability, then I suspect there will be plenty of people happy.
 

mikes_fsd

Banned
May 23, 2014
2,556
2,088
Charlotte, NC
Musk confirming what everyone already knew about better sensors...

https://twitter.com/elonmusk/status/1329878876202426371
 

Bladerskb

Senior Software Engineer
Oct 24, 2016
2,326
2,667
Michigan
Thanks for sharing! Very fun to follow. I once supervised a master's thesis doing lidar localization not too differently from what Mobileye are doing with pseudo-lidar.

Anyway, I think it must be pretty frustrating for Mobileye to hear once a year that Tesla are changing their stack away from what Mobileye have implemented. So many things in the paper just scream "feature engineering", and George Hotz would be laughing at what they are doing. And so many of the problems they are trying to address are removed when switching to 4D; like Elon says, the best design is to remove parts. And mainly, the way I see it is that deep learning in 2015-2020 was mostly about scaling up the dataset rather than being clever with algorithms and feature engineering. Mobileye are now wasting time labelling a dataset and doing feature engineering for a system that will pretty soon be replaced by a 4D stack similar to Tesla's, one that is more efficient at labeling. Then in a few years, when Mobileye have switched to a 4D labelling system, Karpathy and his team will announce that they now are doing end2end using transformers, GANs and metalearning, and Mobileye will again have to scrap what they are doing to catch up...

Is that why Tesla is two years behind Mobileye in deployed NN?

[Image: Tesla's Future HW3 FSD Road Edge Neural Network from late 2019]

[Image: Mobileye's Road Edge Neural Network from Production Q4 2017]


It looks like this will also be the case with EyeQ5. Looks like Mobileye will be the first to deploy Vidar this September.



And mainly, the way I see it is that deep learning in 2015-2020 was mostly about scaling up the dataset rather than being clever with algorithms and feature engineering. Mobileye are now wasting time labelling a dataset and doing feature engineering for a system that will pretty soon be replaced by a 4D stack similar to Tesla's, one that is more efficient at labeling.

You don't even know what a '4D stack' is, or the state of the art in autonomous driving. All you do is regurgitate whatever Elon is saying.

Karpathy and his team will announce that they now are doing end2end using transformers, GANs and metalearning

Who invented these deep learning algorithms and architectures? Tesla? Oh wait, no, it's DeepMind.
Who was responsible for all the deep learning breakthroughs of the past 9 years? Tesla? Oh wait, no, it's DeepMind.
Who collaborates with DeepMind? Tesla? Oh no, it's Waymo.

This came out today, which I guess is close to state of the art in academia/opensource/startup space

That's not the state of the art in academia/opensource/startup space. Not even close. That's like 6 years behind.
It all makes sense how misinformed you are.

When watching it I can’t help but think how many of the problems Tesla will solve or even remove entirely with their new 4D labelling, data engine, fusion layer, and output-in-vector-space solution. It feels like Tesla are at least 2 years ahead of academia, i.e. two years ago Tesla switched to a new, better system, and now they already have that system in production. I don't think people fully understand that the competition will have to move to 4D or risk getting stuck in the 99.9% range.

Is that why Tesla's driving policy and prediction is almost 100% hard-coded, while others have been using mostly deep-learned models for a while? Talk about being behind.
 

Bladerskb

Senior Software Engineer
Oct 24, 2016
2,326
2,667
Michigan
From what we've heard Tesla rewrote labelling of all features (RG, RB, RU, RS in Mobileye's terms) to be driven by video data.

This is a misconception of what's actually going on.

Cruise Data Labeling

The video is great and I love the level of detail. However I think if anything it actually proves that Tesla is taking a different approach. Direct link to their pdf: https://newsroom.intel.com/wp-conte...leye-Investor-sensing-status-presentation.pdf

Mobileye is very focused on redundancy and their system interprets results from the vision, radar and lidar separately. Each of the 4 categories of features (Road Geometry, Road Boundaries, Road Users, Road Semantics) are covered by multiple processing engines.

Unlike Tesla, none of their different processing engines (Object Detection DNNs, Lanes detections DNN, Semantic Segmentation engine, Single view Parallax-net elevation map, Multi-view Depth network, Generalized-HPP, Wheels DNN, Road Semantic Networks) depend on using a Birds Eye View map for detecting features. They mention an occupancy grid but that's actually just for Road Boundaries. They use video processing but only for pseudo-lidar as far as I could tell, not for labelling.

From what we've heard Tesla rewrote labelling of all features (RG, RB, RU, RS in Mobileye's terms) to be driven by video data. Based on the diagram I posted earlier they also route all perception through a single BEV Net now that outputs all feature types (compared to the 8 separate other approaches listed above). I don't see where Mobileye is doing these things.

Andrej Karpathy literally said: "What we have been working on is going much more towards these bird's eye view predictions, which are actually relatively standard and well understood, but for us it's kind of a step up."

He is literally telling YOU that they are playing catch-up with the industry standard.

All Tesla is doing is

Step #1 Raw Images
Step #2 Backbone ResNet (outputs features such as moving objects, road lines, road edges, etc.)
Step #3 Fusion layer (stitches all the extracted feature maps from Step #2 and projects from image space to bird's eye view)
Step #4 Temporal Modules (smooths out all the rough spots)
Step #5 Bird's eye view decoder (creates the top down space)

All Tesla is doing is Steps 3-5, which are already industry standard and are what companies do with lidar data to create BEV networks; they are just playing catch-up to that.
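If it helps, here is a toy PyTorch sketch of what Steps 2-5 look like when you wire them together (every module name, shape and layer choice here is made up for illustration; it is not Tesla's or anyone else's actual network):

import torch
import torch.nn as nn

class ToyBEVPipeline(nn.Module):
    """Toy multi-camera image -> bird's-eye-view pipeline (Steps 2-5)."""

    def __init__(self, num_cams=8, feat_ch=64, bev_hw=100):
        super().__init__()
        # Step 2: shared per-camera backbone (a crude stand-in for a real ResNet)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((bev_hw // 4, bev_hw // 4)),
        )
        # Step 3: fusion layer - combine all camera feature maps and project
        # them into a single bird's-eye-view grid
        self.fusion = nn.Conv2d(num_cams * feat_ch, feat_ch, kernel_size=1)
        self.to_bev = nn.Upsample(size=(bev_hw, bev_hw))
        # Step 4: temporal module - smooth a pooled summary of the BEV
        # features across the frames of the clip
        self.temporal = nn.GRU(feat_ch, feat_ch, batch_first=True)
        # Step 5: BEV decoder - per-cell logits (e.g. drivable, car, road edge, other)
        self.decoder = nn.Conv2d(feat_ch, 4, kernel_size=1)

    def forward(self, clips):
        # clips: (batch, time, num_cams, 3, H, W)
        b, t, n, c, h, w = clips.shape
        feats = self.backbone(clips.reshape(b * t * n, c, h, w))   # Step 2
        feats = feats.reshape(b * t, -1, *feats.shape[-2:])        # stack cameras
        bev = self.to_bev(self.fusion(feats))                      # Step 3
        bh, bw = bev.shape[-2:]
        summary = bev.reshape(b, t, -1, bh * bw).mean(-1)          # (b, t, feat_ch)
        smoothed, _ = self.temporal(summary)                       # Step 4
        bev = bev.reshape(b, t, -1, bh, bw) + smoothed[..., None, None]
        return self.decoder(bev.reshape(b * t, -1, bh, bw))        # Step 5

# e.g. 2 clips of 5 frames each from 8 cameras at 128x128 resolution
out = ToyBEVPipeline()(torch.randn(2, 5, 8, 3, 128, 128))
print(out.shape)  # torch.Size([10, 4, 100, 100]) - one BEV map per frame

The point is that the fusion-to-BEV, temporal smoothing and BEV decoding stages are generic, well-understood building blocks.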

@heltok
 

Battpower

Supporting Member
Oct 10, 2019
2,033
1,992
Uk
I keep watching presentations, reading all I can find and trying to make sense of it, as well as patching in my own old-school engineering and electronics experience, but where I struggle most is with the context and purpose of the information on offer: long or short on TSLA, versus Tesla not giving away how far ahead or behind they are.

I generally feel better able to understand stuff that relates to physical processes, layers, chips, feedback systems, much of which seems lost with NNs. However, I keep reading posts here that suggest there are some pretty basic layers of authority (for example), and obviously all this still has to connect with the real world, so physical stuff still matters.

Can someone who is certain of their facts point me to a current model (or list out the steps here) that shows the ordering that goes on with Tesla's current (beta) approach? Is there just one model that's modified based on environment (freeway / city / lhd vs rhd... country) or different models for each? In a more consistent / conforming environment like a freeway, is it safer or not to try and perform potentially pointless and maybe confusing image processing at speed?

Step #1 Raw Images
Step #2 Backbone ResNet (outputs features such as moving objects, road lines, road edges, etc.)
Step #3 Fusion layer (stitches all the extracted feature maps from Step #2 and projects from image space to bird's eye view)
Step #4 Temporal Modules (smooths out all the rough spots)
Step #5 Bird's eye view decoder (creates the top down space)

Is this certain and in order?

Raw images: do you mean physical camera feeds or logical images of objects being tracked?

Is ResNet physical or logical? Are objects tracked individually or merged into a combined view? Since you haven't mentioned 360 / BEV yet, are these still flat representations of separate objects linked to a physical camera, or a combined logical image feed?

Step 3 pulls together objects from multiple feeds and places them in 360 space / 3D space?

Step 4 tracks objects between multiple frames? Is it part of step 4 that objects acquire direction and velocity?

Step 5. What is the significance of this (other than that I want to see a BEV on the MCU!)? Surely by the end of Step 4 you have all your objects and how they are moving.

Why wouldn't you take all video sources and use them to produce stitched 3D images as a first step, then process that single source to extract / track objects?

Given that physical and logical seems quite a blurred distinction these days, how do we know which is which?
 

Knightshade

Well-Known Member
Jul 31, 2017
12,339
17,437
NC
[QUOTE="Battpower, post: 5317916, member: 117285"
Can someone who is certain of their facts point me to a current model (or list out steps here) that shows the ordering that goes on with Tesla's current (beta) approach? Is there just one model that's modified based on environment (freeway / city / lhd vs rhd.... country) or different models for each? In a more consistent / conforming environment like a freeway is it safer of not to try and perform potentially pointless and maybe confusing image processing at speed?[/QUOTE]


Elon Musk on the Q4 2020 earnings call said:
so there's still a few of the neural nets that need to be upgraded to video training and video inference.

Also, green has been reporting that none of the freeway (NoA) code has changed significantly in the FSD Beta software, so that appears to be among the stuff not running "re-write" code.

And AFAIK all the folks running the FSD Beta are in LHD countries (mostly the US, but maybe there are some in Canada, can't recall).
 

Bladerskb

Senior Software Engineer
Oct 24, 2016
2,326
2,667
Michigan
I thought Tesla was doing auto labeling of video. The 4D rewrite uses surround video. So why is Tesla hiring people to still do manual labelling of images?

Tesla looks to hire data labelers to feed Autopilot neural nets with images at Gigafactory New York - Electrek

Because it's fundamentally still image labeling. They are just labeling a series of images that make up like 10 seconds of video. Calling it 4D video labeling is just another hype job by Elon, just like "quantum leap" or "silky smooth".
 

Knightshade

Well-Known Member
Jul 31, 2017
12,339
17,437
NC
I thought it was more like they still need a human to label in frame 1 THAT IS X, for whatever it is, but then the system is capable of understanding it's still X in the rest of the video clip going forward in time... rather than the previous system where they had to manually tell it THAT IS X in each individual frame.

That'd still require humans for the initial labels for video but not for every item in every frame of it.
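As a toy illustration of that split (my own oversimplification in plain numpy; it has nothing to do with Tesla's actual tooling), a human draws one box on the first frame and a dumb template matcher then carries that label through the rest of the clip:

import numpy as np

def propagate_label(frames, box):
    """frames: list of 2-D grayscale arrays; box: (row, col, h, w) drawn by a
    human on frames[0]. Returns one auto-generated box per frame, found by
    brute-force template matching (a stand-in for a real tracker)."""
    r, c, h, w = box
    template = frames[0][r:r + h, c:c + w]
    labels = [box]
    for frame in frames[1:]:
        best, best_rc = np.inf, (r, c)
        # search every position for the patch most similar to the template
        for rr in range(frame.shape[0] - h + 1):
            for cc in range(frame.shape[1] - w + 1):
                diff = np.sum((frame[rr:rr + h, cc:cc + w] - template) ** 2)
                if diff < best:
                    best, best_rc = diff, (rr, cc)
        labels.append((*best_rc, h, w))
    return labels

# fake 5-frame clip with a bright 8x8 "object" drifting right by 2 px per frame
frames = []
for t in range(5):
    f = np.zeros((40, 60))
    f[10:18, 5 + 2 * t:13 + 2 * t] = 1.0
    frames.append(f)

# one human label on frame 0, four machine-generated labels after it
print(propagate_label(frames, (10, 5, 8, 8)))

A real auto-labelling pipeline would use something far smarter than template matching, but the division of labour is the same: one human seed label, many machine-generated ones.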
 

cbrigante2

Member
Aug 14, 2018
92
60
North Aurora, Il
I thought Tesla was doing auto labeling of video. The 4D rewrite uses surround video. So why is Tesla hiring people to still do manual labelling of images?

Tesla looks to hire data labelers to feed Autopilot neural nets with images at Gigafactory New York - Electrek
My understanding (from what little has actually been shared with us) on this was that Dojo hasn't even started yet. I suspect you have to keep doing this manually until it can start to automate some of those tasks? Not sure Elon ever gave more than a guess on Dojo being up and running though? Late this year?
 

diplomat33

Well-Known Member
Aug 3, 2017
7,802
9,119
Terre Haute, IN USA
My understanding (from what little has actually been shared with us) on this was that Dojo hasn't even started yet. I suspect you have to keep doing this manually until it can start to automate some of those tasks? Not sure Elon ever gave more than a guess on Dojo being up and running though? Late this year?

If I am remembering correctly, I think it was last year that Elon mentioned that Dojo was a year away from completion.
 

Battpower

Supporting Member
Oct 10, 2019
2,033
1,992
Uk
require humans for the initial labels for video but not for every item in every frame of it

That's what it sounded like to me, but it left further clarification of what is automated, what will be automated, and where Dojo sits all somewhat vague... as usual.
 

Bladerskb

Senior Software Engineer
Oct 24, 2016
2,326
2,667
Michigan
That's what it sounded like to me, but it left further clarification of what is automated, what will be automated, and where Dojo sits all somewhat vague... as usual.

My understanding (from what little has been actually shared to us) on this was that Dojo hasn't even started yet. I suspect you have to keep manually doing this until it can start to automate some of those tasks? Not sure Elon ever gave more than a guess on Dojo being up and running though? Late this year?

What is DOJO to you? Serious question. From your words it seems like some magic potion or secret sauce...
 

Bladerskb

Senior Software Engineer
Oct 24, 2016
2,326
2,667
Michigan
I thought it was more like they still need a human to label in frame 1 THAT IS X for whatever it is- but then the system is capable of understanding it's still X in the rest of the video clip forward in time.... rather than the previous system where they had to manually tell it THAT IS X in each individual frame.

That'd still require humans for the initial labels for video but not for every item in every frame of it.

That is exactly what it is. Unfortunately it's the Tesla community, so it's hyped as the second coming of Christ that changes everything and is unique to Tesla. Anything Tesla does is regarded as the second coming, even if others have been doing it for years.

Cruise Data Labeling
 
