Like I said, maybe Tesla is implementing 4D differently. But if Mobileye has camera vision that can track the paths of objects in real time in both space and time, I consider that to be 4D.
Here is what Elon said during the Q2 2020 Earnings Call on 4D: "Well, the actual major milestone that's happening right now is really a transition of the autonomy system or the cars, like AI, if you will, from thinking about things in -- like 2.5D. It's like think -- things like isolated pictures and doing image recognition on pictures that are harshly correlated in time but not very well and transitioning to kind of a 4D, where it's like -- which is video essentially." Tesla (TSLA) Q2 2020 Earnings Call Transcript | The Motley Fool

So basically, 2.5D refers to just processing still images, where maybe you stitch the images together to get a little bit of motion through time. 4D refers to video labeling, where you are now fully processing objects in both 3D space and in time. That's what Elon means by 4D.
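To make that 2.5D vs. 4D distinction concrete, here's a toy sketch of my own (nothing from Tesla -- the names and numbers are all made up for illustration). A "2.5D" system treats each frame's noisy measurement on its own, while a "4D" system carries state across frames, so the track comes out smoother than any single frame:

```python
import random

def per_frame_estimate(measurement):
    # "2.5D": each frame stands alone -- no notion of motion or history.
    return measurement

def temporal_estimate(measurements, alpha=0.5):
    # "4D": fuse measurements across time; the track carries smoothed
    # state from frame to frame (a simple exponential moving average).
    smoothed = []
    state = measurements[0]
    for m in measurements:
        state = alpha * m + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

random.seed(0)
true_positions = [0.1 * t for t in range(50)]        # object moving at constant speed
noisy = [p + random.gauss(0, 0.5) for p in true_positions]

frame_only = [per_frame_estimate(m) for m in noisy]
tracked = temporal_estimate(noisy)

err_2d5 = sum((a - b) ** 2 for a, b in zip(frame_only, true_positions)) / 50
err_4d = sum((a - b) ** 2 for a, b in zip(tracked, true_positions)) / 50
print(err_2d5, err_4d)  # the temporally fused track has lower error
```

The point is only the shape of the idea: once the estimator is allowed to remember previous frames, per-frame noise averages out and motion (velocity) becomes recoverable, which is what video processing buys you over isolated pictures.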
Here's Karpathy's first recorded hint at using video for FSD (at 38:15, from Aug 2018). His explanation is way above my pay grade, but apparently it's a very difficult problem: Starts at 38:15
This came out today, which I guess is close to the state of the art in the academia/opensource/startup space. Watching it, I can't help but think how many of these problems Tesla will solve or even remove with their new 4D labelling, data engine, fusion layer, and output-in-vector-space solution. It feels like Tesla is at least two years ahead of academia, i.e., two years ago Tesla switched to a new, better system, and now they already have that system in production. I don't think people fully understand that the competition will have to move to 4D or risk getting stuck in the 99.9% range.
Well, the raising of the FSD price tag, coupled with thousands of people using the system, should reveal loud complaints if the system doesn't live up to that $10k price tag. If it is still in beta six months from now, that would be bad news. Time will tell, and the time is not an indefinite number of months and years. Tesla/Musk have basically said: we've got it, it's here. How long do they get before people agree with or refute that claim?
I've been wondering that since 2016. I suspect people will give Tesla lots of leeway. If they give everyone access to the "closed" beta with a signed waiver of liability, then I suspect there will be plenty of happy people.
Musk confirming what everyone already knew about better sensors... https://twitter.com/elonmusk/status/1329878876202426371
Is that why Tesla is two years behind Mobileye in deployed NNs? Picture of Tesla's future HW3 FSD road-edge neural network from late 2019. Picture of Mobileye's road-edge neural network from production Q4 2017. It also looks like this will be the case with EyeQ5. Looks like Mobileye will be the first to deploy Vidar this September.

You don't even know what the '4D stack' is or the state of the art in autonomous driving. All you do is regurgitate whatever Elon is saying. Who invented these deep learning algorithms and architectures? Tesla? Oh wait, no, it's DeepMind. Who was responsible for all the deep learning breakthroughs of the past 9 years? Tesla? Oh wait, no, it's DeepMind. Who collaborates with DeepMind? Tesla? Oh no, it's Waymo.

That's not the state of the art in the academia/opensource/startup space. Not even close. That's like 6 years behind. It all makes sense how misinformed you are. Is that why Tesla's driving policy and prediction are almost 100% hard-coded, while others have been using mostly deep-learned models for a while? Talk about being behind.
This is a misconception of what's actually going on. Cruise Data Labeling

Andrej Karpathy literally said: "What we have been working on is going much more towards these bird's-eye-view predictions, which are actually relatively standard and well understood, but for us it's kind of a step up." He is literally telling YOU that they are playing catch-up with the industry standard. All Tesla is doing is:

Step #1: Raw images
Step #2: Backbone ResNet (outputs features such as moving objects, road lines, road edges, etc.)
Step #3: Fusion layer (stitches together all the extracted feature maps from Step #2 and projects from image space to bird's-eye view)
Step #4: Temporal modules (smooth out all the rough spots)
Step #5: Bird's-eye-view decoder (creates the top-down space)

Steps 3-5 are already industry standard -- it's what companies do with lidar data to create BEV networks -- and Tesla is just playing catch-up to that. @heltok
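The five steps above can be sketched end to end as a toy pipeline. This is my own illustration, not Tesla's (or anyone's) actual code: the "backbone" is a mean-pool standing in for a ResNet, the fusion layer just assigns each camera a fixed slice of the BEV grid instead of doing a real geometric projection, and the temporal module is a plain average over frames.

```python
import numpy as np

# Step 1: raw images from N cameras (toy: small random grayscale frames).
def get_camera_frames(rng, n_cams=3, h=8, w=8):
    return [rng.random((h, w)) for _ in range(n_cams)]

# Step 2: per-camera "backbone" -- stands in for a ResNet feature extractor.
def backbone(frame):
    # A real backbone outputs learned feature maps; here, a 2x2 mean-pool.
    return frame.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# Step 3: fusion layer -- project each camera's features into one shared
# bird's-eye-view grid (toy: each camera owns a fixed slice of the grid).
def fuse_to_bev(features, grid=(4, 12)):
    bev = np.zeros(grid)
    for i, f in enumerate(features):
        bev[:, i * 4:(i + 1) * 4] = f
    return bev

# Step 4: temporal module -- smooth the BEV grid across recent frames.
def temporal_smooth(bev_history):
    return np.mean(bev_history, axis=0)

# Step 5: BEV "decoder" -- turn the grid into occupied / free cells.
def decode(bev, thresh=0.5):
    return bev > thresh

rng = np.random.default_rng(0)
history = []
for t in range(5):  # five consecutive time steps
    frames = get_camera_frames(rng)           # Step 1
    feats = [backbone(f) for f in frames]     # Step 2
    history.append(fuse_to_bev(feats))        # Step 3
occupancy = decode(temporal_smooth(history))  # Steps 4-5
print(occupancy.shape)  # (4, 12): one top-down grid covering all cameras
```

The structural point holds regardless of the toy details: Steps 1-2 run per camera, Step 3 is where the per-camera views become one shared top-down space, and Steps 4-5 operate on that space rather than on any individual image.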
I keep watching presentations, reading all I can find, and trying to make sense of it, as well as patching in my own old-school engineering and electronics experience, but where I struggle most is with the context and purpose of the information on offer: long or short on TSLA vs. Tesla not giving away how far ahead or behind they are. I generally feel better able to understand stuff that relates to physical processes, layers, chips, and feedback systems, much of which seems lost with NNs. However, I keep reading posts here that suggest there are some pretty basic layers of authority (for example), and obviously all this still has to connect with the real world, so physical stuff still matters.

Can someone who is certain of their facts point me to a current model (or list out the steps here) that shows the ordering that goes on with Tesla's current (beta) approach? Is there just one model that's modified based on environment (freeway / city / LHD vs. RHD... country), or different models for each? In a more consistent, conforming environment like a freeway, is it safer or not to try to perform potentially pointless and maybe confusing image processing at speed?

Is this certain and in order? Raw images: do you mean physical camera feeds or logical images of objects being tracked? Is ResNet physical or logical? Are objects tracked individually or merged into a combined view? Since you haven't mentioned 360 / BEV yet, are these still flat representations of separate objects linked to a physical camera, or a combined logical image feed? Step 3 pulls together objects from multiple feeds and places them in 360 / 3D space? Step 4 tracks objects between multiple frames? Is it part of Step 4 that objects acquire direction and velocity? Step 5: what is the significance of this (other than that I want to see a BEV on the MCU!)? Surely by the end of Step 4 you have all your objects and how they are moving.
Why wouldn't you take all the video sources and use them to produce stitched 3D images as a first step, then process that single source to extract and track objects? Given that physical and logical seem quite a blurred distinction these days, how do we know which is which?
[QUOTE="Battpower, post: 5317916, member: 117285"]Can someone who is certain of their facts point me to a current model (or list out steps here) that shows the ordering that goes on with Tesla's current (beta) approach? Is there just one model that's modified based on environment (freeway / city / lhd vs rhd.... country) or different models for each? In a more consistent / conforming environment like a freeway is it safer of not to try and perform potentially pointless and maybe confusing image processing at speed?[/QUOTE]

Also, green has been reporting that none of the freeway (NoA) code has changed significantly in the FSD Beta software, so that appears to be among the stuff not running "rewrite" code. And AFAIK all the folks running FSD Beta are in LHD countries (mostly the US, but maybe there are some in Canada; can't recall).
I thought Tesla was doing auto-labeling of video, and the 4D rewrite uses surround video. So why is Tesla still hiring people to do manual labelling of images? Tesla looks to hire data labelers to feed Autopilot neural nets with images at Gigafactory New York - Electrek
Because it's fundamentally still image labeling. They are just labeling a series of images that make up maybe 10 seconds of video. Calling it "4D video labeling" is just another hype job by Elon, just like "quantum leap" or "silky smooth."
I thought it was more like they still need a human to label in frame 1 "THAT IS X" for whatever it is, but then the system is capable of understanding it's still X in the rest of the video clip going forward in time, rather than the previous system where they had to manually tell it "THAT IS X" in each individual frame. That'd still require humans for the initial labels in a video, but not for every item in every frame of it.
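The propagation idea can be sketched in a few lines. This is my own toy version, not anything Tesla has described: a human labels one box in frame 0, and a naive tracker carries that label forward by greedily matching it to the closest unlabeled detection in each later frame (real trackers use appearance features and motion models, not just centroid distance).

```python
def propagate_label(seed_box, detections_per_frame):
    """Given one human-labeled box in frame 0, attach its label to the
    closest detection in every later frame (greedy centroid matching)."""
    def center(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2, (y1 + y2) / 2)

    def dist(a, b):
        (ax, ay), (bx, by) = center(a), center(b)
        return (ax - bx) ** 2 + (ay - by) ** 2

    track = [seed_box]
    current = seed_box
    for detections in detections_per_frame:
        # Match the tracked object to the nearest detection in this frame.
        current = min(detections, key=lambda d: dist(d, current))
        track.append(current)
    return track

# One human-labeled box in frame 0; unlabeled detections in frames 1-2.
seed = (10, 10, 20, 20)
later_frames = [
    [(12, 11, 22, 21), (80, 80, 90, 90)],   # frame 1
    [(14, 12, 24, 22), (81, 79, 91, 89)],   # frame 2
]
print(propagate_label(seed, later_frames))
# → [(10, 10, 20, 20), (12, 11, 22, 21), (14, 12, 24, 22)]
```

One human click in frame 0 yields labels for the whole clip, which is the economic argument for video labeling over per-frame labeling, whatever the actual mechanism turns out to be.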
My understanding (from what little has actually been shared with us) is that Dojo hasn't even started yet. I suspect you have to keep doing this manually until it can start to automate some of those tasks. Not sure Elon ever gave more than a guess on when Dojo will be up and running, though. Late this year?
If I am remembering correctly, I think it was last year that Elon mentioned that Dojo was a year away from completion.
That's what it sounded like to me, but it left further clarification of what is automated, what will be automated, and where Dojo sits all somewhat vague... as usual.
What is DOJO to you? Serious question. From your words it seems like some magic potion or secret sauce...
That is exactly what it is. Unfortunately, this is the Tesla community, so it's hyped as the second coming of Christ that changes everything and is unique to Tesla. Anything Tesla does is regarded as the second coming, even if others have been doing it for years. Cruise Data Labeling