Tesla.com - "Transitioning to Tesla Vision"

The new Honda Civic is camera only too. I bet many other manufacturers will follow suit.
Going with lesser sensors is a cost-cutting move for vendors. I'm not at all convinced it's for OUR benefit, as drivers and passengers.

If/when camera-only sensor arrays can drive us safely, that will be great and I hope it happens, but I have extreme doubts about dumbing down the sensor array like that. More is always better. You can still decide which box and subsystem to believe and when; nothing says you have to use every sensor every millisecond and trust 100% of all outputs. You *weight* them, and that's the real secret sauce.
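To make that weighting idea concrete, here's a toy sketch of inverse-variance (confidence-weighted) fusion of two range estimates. The sensor names, variances, and readings are all invented for illustration; this has nothing to do with any real vehicle stack.

```python
# Toy sketch of confidence-weighted sensor fusion. All sensors, variances,
# and readings here are hypothetical illustrations, not from a real car.

def fuse_estimates(estimates):
    """Fuse (value, variance) pairs with inverse-variance weighting.

    A noisier sensor (larger variance) gets a smaller weight, so nothing
    forces you to trust every sensor equally on every cycle.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total   # fused value and its variance

# Example: camera thinks the lead car is 48 m away but is noisy at range;
# radar says 50 m with a much tighter variance, so it dominates the blend.
camera = (48.0, 4.0)    # (distance in m, variance in m^2)
radar = (50.0, 0.25)
distance, variance = fuse_estimates([camera, radar])
print(f"fused distance: {distance:.2f} m (variance {variance:.3f})")
```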

Mass-market vendors will go to great lengths to shave even 5 cents off a PC board. I worked with hardware groups that were ordered by their big bosses to cost-reduce this and that. We removed RTC chips (that we NEEDED) on orders from above. It sucks, and it's a part of the field I hate quite passionately.

I've seen the cost-cutting at car companies, so I'm not quite a believer that this is for our benefit. If it's true, great, but I start out assuming it's not true and need to be convinced otherwise. Too much is at stake, after all.
 
Going with lesser sensors is a cost-cutting move for vendors. I'm not at all convinced it's for OUR benefit, as drivers and passengers.

If/when camera-only sensor arrays can drive us safely, that will be great and I hope it happens, but I have extreme doubts about dumbing down the sensor array like that. More is always better. You can still decide which box and subsystem to believe and when; nothing says you have to use every sensor every millisecond and trust 100% of all outputs. You *weight* them, and that's the real secret sauce.

Mass-market vendors will go to great lengths to shave even 5 cents off a PC board. I worked with hardware groups that were ordered by their big bosses to cost-reduce this and that. We removed RTC chips (that we NEEDED) on orders from above. It sucks, and it's a part of the field I hate quite passionately.

I've seen the cost-cutting at car companies, so I'm not quite a believer that this is for our benefit. If it's true, great, but I start out assuming it's not true and need to be convinced otherwise. Too much is at stake, after all.
Yup. The Continental radar sensor Tesla uses (as do many others) is $100+ ... that's massive cost savings (and a reduced dependency on the constrained output of a supplier during the chip shortage). I mean, Tesla removed front passenger lumbar support for cost cutting as well, and that part likely isn't even worth $20. So pretending that removing a $100+ part and limiting the available sensor input somehow makes the car more capable and better suited for FSD is quite funny.
 
This was a tell that you're wasting your time and this guy is basically a troll.

Q: What do students do when they don't have any hardware, even cheap stuff like a camera or image sensor?

Take baked video (.avi, .mp4), pipe to API/CODEC to decompress, massage/filter/transform, ....etc....

His condescension, and throwing .avi and .mp4 formats in with the rest of the bathwater, was where he lost me.
Not sure about everyone else.
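For reference, the decode-and-process workflow quoted above really is only a few lines with off-the-shelf tools. A rough sketch using OpenCV; the input file name and the grayscale/blur step are placeholders, not a real pipeline:

```python
# Rough sketch of "decode a canned .mp4 and massage the frames" with OpenCV.
# The input file name and the processing step are placeholders.
import cv2

cap = cv2.VideoCapture("dashcam_clip.mp4")   # hypothetical pre-recorded clip
frames_processed = 0
while True:
    ok, frame = cap.read()                   # decode the next frame (BGR array)
    if not ok:
        break                                # end of file or decode error
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)   # stand-in for "filter/transform"
    frames_processed += 1
cap.release()
print(f"decoded and processed {frames_processed} frames")
```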
 
I guess I am wondering why they weren’t processing multiple frames of video years ago?

What has changed? Hardware is more capable now (couldn't you just use 9x fewer pixels or something?), but that can't be the whole issue, since there are lots of ways around that.

I guess transformers are new? Are there other enabling technology changes that really suddenly made processing multiple frames possible where it was not possible at all before? (Seems unlikely…)

It just seems like everyone at Tesla and elsewhere knew you would have to have multiple frames of videos and be making predictions about future movement years ago, so wondering why this transition is only occurring now?

Did they get stuck in a place of having to deliver a product to customers, which interfered with their actual goals? Then stuck in a loop of dealing with tons of (arguably unimportant) corner cases from that kludged solution as fleet data came in? Getting “feature complete” NoA took too much of their programmers’ time somehow?

This is not revolutionary stuff either to Tesla or anyone else, so why are they only starting to really push it now, vs 3 years ago?

I genuinely do not understand the timeline.
 
It just seems like everyone at Tesla and elsewhere knew you would have to have multiple frames of videos and be making predictions about future movement years ago, so wondering why this transition is only occurring now?
I could be entirely wrong, but I thought that the standard technique has been to use perception (after all the sensor fusion etc) to place and track agents in a scene and then use non-NN code to do all the forward-looking predictions (or maybe running NNs on the higher-order scene reconstructions rather than on the raw video input). Tesla was processing each frame individually at first and is starting to lean on NN-based predictions straight from the input.
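To illustrate the "classical" half of that description, here's a toy constant-velocity predictor of the kind a non-NN layer might run on top of tracked perception output. The agent format, timestep, and numbers are invented for the example; this is not anyone's actual planner.

```python
# Toy constant-velocity path predictor over tracked perception output.
# The agent format, timestep, and numbers are invented for illustration.

def estimate_velocity(prev_pos, curr_pos, dt=0.1):
    """Finite-difference velocity from two consecutive tracked positions."""
    return ((curr_pos[0] - prev_pos[0]) / dt, (curr_pos[1] - prev_pos[1]) / dt)

def predict_path(position, velocity, dt=0.1, steps=20):
    """Extrapolate an agent's position assuming constant velocity."""
    x, y = position
    vx, vy = velocity
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]

# Two consecutive "frames" of a tracked lead car, 100 ms apart, closing at ~8 m/s.
prev_pos, curr_pos = (0.0, 30.0), (0.0, 29.2)
vel = estimate_velocity(prev_pos, curr_pos)
future = predict_path(curr_pos, vel)
print(f"estimated velocity: {vel}, predicted position in 2 s: {future[-1]}")
```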
 
Contrary to what some claimed upthread, NHTSA didn't suggest in their statement to the media that Tesla could restore the checkmarks based on Tesla's own testing if the features were affected by the production change. They only commented that they have not yet finalized which vehicles to test for the 2022 model year (which seems to imply we might not get resolution until then).
You continue to be disingenuous about the source of information.
The information about NHTSA re-testing and restoring the check marks came from Elon himself. Nobody "upthread" said it came from NHTSA.

Just confirmed with the Autopilot team that these features are active in all cars now, including vision-only. NHTSA automatically removes the check mark for any cars with new hardware until they retest, which is happening next week, but the functionality is actually there.

Yeah, Elon literally said they were testing NEXT WEEK on May 28th, so that testing would have occurred 3 weeks ago, and he clearly indicated the re-test would lead to the restoration of those checks. Yet, 3 weeks later, we get your update, which is that Tesla "talked" to NHTSA and got the LDW check back because that system was untouched. The touched systems are not mentioned at all, even though the Technoking said they were going to be tested 3 weeks ago.

As usual, basically everything Elon says around autonomy is made up, a half-truth, or deeply misleading. It continues to be true that anyone who listens to Elon in this area and takes anything he says as reality will be made a fool of by history.
 
I guess I am wondering why they weren’t processing multiple frames of video years ago?

What has changed? Hardware is more capable now (couldn't you just use 9x fewer pixels or something?), but that can't be the whole issue, since there are lots of ways around that.

I guess transformers are new? Are there other enabling technology changes that really suddenly made processing multiple frames possible where it was not possible at all before? (Seems unlikely…)

It just seems like everyone at Tesla and elsewhere knew you would have to have multiple frames of videos and be making predictions about future movement years ago, so wondering why this transition is only occurring now?

Did they get stuck in a place of having to deliver a product to customers, which interfered with their actual goals? Then stuck in a loop of dealing with tons of (arguably unimportant) corner cases from that kludged solution as fleet data came in? Getting “feature complete” NoA took too much of their programmers’ time somehow?

This is not revolutionary stuff either to Tesla or anyone else, so why are they only starting to really push it now, vs 3 years ago?

I genuinely do not understand the timeline.

Maybe all of those things? Lol.

Definitely hardware was one issue. With the old computer they had to downsample (reduce the resolution of) the images; that tells you how limited it was. And why spend the compute money on AWS for video training when you know you can't even use it right away (you need HW3 first)?

Transformers definitely have advanced things, but no, I think they could have done video without them.
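For context on what "transformers for video" roughly looks like, here's a bare-bones PyTorch sketch that fuses a sequence of per-frame feature vectors with a transformer encoder. The dimensions and layer counts are arbitrary, and this is obviously not Tesla's network, just the general shape of the idea.

```python
# Bare-bones sketch of temporal fusion with a transformer: per-frame feature
# vectors (e.g. from a CNN backbone) are mixed across the last N frames by
# self-attention. Dimensions and layer counts are arbitrary.
import torch
import torch.nn as nn

num_frames, batch, feat_dim = 8, 2, 256

encoder_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8)
temporal_fusion = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Stand-in for per-frame CNN features, in (sequence, batch, feature) layout.
frame_features = torch.randn(num_frames, batch, feat_dim)
fused = temporal_fusion(frame_features)   # same shape, but temporally mixed
latest = fused[-1]                        # fused features for the newest frame
print(latest.shape)                       # torch.Size([2, 256])
```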

Definitely business goals have to align. I am an investor, so I know that Tesla splurging on R&D projects that weren't going to deliver any ROI for more than a few years would have been seen poorly by investors.

Tesla makes a lot of decisions that are trade-offs. They aren't a research institute. Thus the lidar / radar choices.

Deep learning on video is nascent because of the amount of compute / data needed so I don't really think Tesla could have done this years ago and delivered it into cars.
 
I could be entirely wrong, but I thought that the standard technique has been to use perception (after all the sensor fusion etc) to place and track agents in a scene and then use non-NN code to do all the forward-looking predictions (or maybe running NNs on the higher-order scene reconstructions rather than on the raw video input). Tesla was processing each frame individually at first and is starting to lean on NN-based predictions straight from the input.

Certainly there are different strategies for path prediction, and maybe this is a very narrow conversation about using neural nets to do this, rather than more traditional methods.

However, if they were really doing good path prediction with distance and (with multiple frames) velocity measurements, etc., previously, without NNs, why all the issues with radar?

Maybe this is all somehow explained in the latest Karpathy video, which I have not watched?
 
Definitely hardware was one issue. With the old computer they had to downsample (reduce the resolution of) the images; that tells you how limited it was. And why spend the compute money on AWS for video training when you know you can't even use it right away (you need HW3 first)?

It just seems like this is not a real issue for development, because you could just build a custom FSD computer and do development on that. (Who cares that customers don’t have it - it doesn’t matter.) You do need programmers to work with it though. But it seems like really pounding on that front years ago would have given them a good idea of what future hardware requirements would be, and help your FSD computer development team properly spec the upcoming hardware.

Plus it would have the substantial side benefit of having working NNs that really do excellent image recognition and path prediction, etc. so when the fleet hardware is capable you could just push it out. (A simplification I am sure, but seems better than apparently starting from scratch.)

I mean, paying for AWS to do all the training would kind of suck since you don’t get instant ROI, but it seems like necessary suckage.
 
Certainly there are different strategies for path prediction, and maybe this is a very narrow conversation about using neural nets to do this, rather than more traditional methods.

However, if they were really doing good path prediction with distance and (with multiple frames) velocity measurements, etc., previously, without NNs, why all the issues with radar?

Maybe this is all somehow explained in the latest Karpathy video, which I have not watched?

I don't think you missed anything; all of Karpathy's (and Elon's) comms on the subject tend to focus on some narrow part of the current solution rather than telling the full story.

I don't think they were doing much path prediction previously, only reacting to the current state of the world - the velocity measurements from radar would provide a glimpse into the future state, and thus AP could react appropriately. In the examples from the latest video, under hard braking the radar measurements would be lost and re-acquired, meaning there could be blind spots resulting in misapplication of the brakes etc.; doing "proper" prediction without radar smooths out the reactions.
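A toy illustration of that dropout problem, with invented numbers: if you only react to raw radar returns, a lost return during hard braking leaves a gap, whereas predicting the state forward lets you coast through it.

```python
# Toy illustration of "coasting" through dropped radar returns with a
# constant-velocity prediction instead of reacting to a missing target.
# All numbers are invented.

def track_range(measurements, dt=0.1):
    """Return per-step range estimates, predicting through None (dropouts)."""
    est_range, est_rate = None, 0.0
    output = []
    for meas in measurements:
        if meas is None and est_range is not None:
            est_range += est_rate * dt               # no return: predict forward
        elif meas is not None:
            if est_range is not None:
                est_rate = (meas - est_range) / dt   # crude range-rate update
            est_range = meas
        output.append(est_range)
    return output

# Lead car closing at ~10 m/s; two radar returns are lost mid-sequence.
radar_returns = [30.0, 29.0, 28.0, None, None, 25.0, 24.0]
print(track_range(radar_returns))   # the gaps are filled in smoothly
```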

They weren’t doing much raw velocity prediction from vision either, only stuff like “will this car become an obstacle or not” (e.g. cut-in prediction).
 
I don’t think they were doing much path prediction previously, only reacting to the current state of the world
Ah well, better late than never I suppose.

Maybe it will only take them 1/10th the development time now, especially if there are open source libraries for image recognition and path prediction now - they can just lift those! Problem solved, next problem. ;)
 
It just seems like this is not a real issue for development, because you could just build a custom FSD computer and do development on that. (Who cares that customers don’t have it - it doesn’t matter.) You do need programmers to work with it though. But it seems like really pounding on that front years ago would have given them a good idea of what future hardware requirements would be, and help your FSD computer development team properly spec the upcoming hardware.

Plus it would have the substantial side benefit of having working NNs that really do excellent image recognition and path prediction, etc. so when the fleet hardware is capable you could just push it out. (A simplification I am sure, but seems better than apparently starting from scratch.)

I mean, paying for AWS to do all the training would kind of suck since you don’t get instant ROI, but it seems like necessary suckage.

I mean, I believe they were doing some work - they had some software already running in January 2019 when they were putting HW3.0 chips in employees' cars. That must have been started in 2018.

Basically I'm guessing they started working on bigger perception NNs when Karpathy joined.
 
So, 4 weeks to the day since they announced "Transitioning to Tesla Vision." Still limited to 75 MPH, no Smart Summon, no NHTSA testing. No obvious large software releases around vision-only like Elon said were coming.


Elon has gone pretty silent on vision and FSD since that tweet 16 days ago. I'm sure V9 will hit this week though.
 
LOL, my last post. This reminds me why I don't really do social media. When someone tries to explain something to you with knowledge beyond Medium articles, Wikipedia, and videos aimed at educating at the 30k-foot level, you should listen, observe, learn, and STFU. Remember, what AK releases to you via YouTube is the same *sugar* he shows people out of their depth within Tesla. Anyway, back to actually creating these vision systems.
Someone with a master's degree and actual experience in the field won't throw up the fact that they have a master's degree as if it makes them right; they let the facts speak for themselves...
 
So, 4 weeks to the day since they announced "Transitioning to Tesla Vision." Still limited to 75 MPH, no Smart Summon, no NHTSA testing. No obvious large software releases around vision-only like Elon said were coming.


Elon has gone pretty silent on vision and FSD since that tweet 16 days ago. I'm sure V9 will hit this week though.


Elon Musk missed an FSD target date???

 
Topic: Non-Tesla Vision-only Autonomy - support for viability of camera-only AVs

I'd like to call everyone's attention to the morning keynote talk at the same CVPR workshop that included the latest Andrej Karpathy talk under discussion. This one is by Alex Kendall, CEO of Wayve in London. He is quite clear that the autonomous examples shown on London streets use cameras only. Specifically "monocular 360 cameras," meaning a stitched-together surround panorama video as the NN input, without simultaneous stereo-pair or similar range-finding. This is similar to the Tesla BEV integration of camera feeds inside the FSD computer. He mentions this several times, including the statement that HD maps / LIDAR are not involved, just intelligence applied to the video. On that point, he stresses that London is full of scenes where the concept of "lanes" is not well-defined, and even where lanes exist, he shows situations where the intended lane is blocked, unavailable, or occupied by oncoming traffic.
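Just to make the "stitched-together surround panorama" input concrete, here's a deliberately naive sketch: resize each camera frame and concatenate them side by side into one image for the network. A real system would undistort and warp each view with calibrated extrinsics; this only shows the shape of the input, and every number here is made up.

```python
# Deliberately naive sketch of a surround "panorama" input: resize each camera
# frame (nearest-neighbor sampling) and stack them side by side. A real system
# would undistort and warp the views with calibrated extrinsics.
import numpy as np

def naive_panorama(frames, height=240, width_per_cam=320):
    """Resize each (H, W, 3) frame and horizontally concatenate all of them."""
    resized = []
    for frame in frames:
        h, w, _ = frame.shape
        rows = np.arange(height) * h // height        # nearest-neighbor row picks
        cols = np.arange(width_per_cam) * w // width_per_cam
        resized.append(frame[rows][:, cols])
    return np.hstack(resized)                         # (height, n_cams * width, 3)

# Fake feeds from 6 surround cameras at an arbitrary native resolution.
cameras = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(6)]
panorama = naive_panorama(cameras)
print(panorama.shape)                                 # (240, 1920, 3)
```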

I would say that this talk gives compelling, if indirect, support to the idea that Tesla Vision is a viable concept, "barking up the right tree" in Karpathy's words. I wouldn't say it's any kind of closed proof that other sensors couldn't be fused successfully and helpfully. I'd also note that the London examples, while impressive, are at low to medium speed. Very high-speed highway driving and collision/object avoidance could be a different emphasis.

The following link is only the Keynote talk, not the whole day-long meeting. I linked it to start a couple of minutes in, but you can easily rewind it to the beginning. I don't know much else about Wayve, but I'd assume that an invited Keynote speaker is considered a very credible source.

 
Certainly there are different strategies for path prediction, and maybe this is a very narrow conversation about using neural nets to do this, rather than more traditional methods.

However, if they were really doing good path prediction with distance and (with multiple frames) velocity measurements, etc., previously, without NNs, why all the issues with radar?

Maybe this is all somehow explained in the latest Karpathy video, which I have not watched?
Watch the video; there is a specific part at 23:53 where he compares the previous radar version with the current Tesla Vision version:

The gist I got was that the previous version of the software was either not getting velocity/acceleration from vision at all, or only getting something very rough (like labeling something as stationary or not, for example), so they had to rely a lot on the radar. And when the radar blinks out for whatever reason (a mismatch between the target associations of vision and radar), it can result in noticeable errors.
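A toy sketch of that target-association failure mode, with invented positions and gate: each radar return is matched to the nearest vision detection, and if nothing falls within the gating distance the radar measurement can't be attached to any object, so its velocity is effectively lost.

```python
# Toy nearest-neighbor association between vision detections and radar returns
# with a gating distance. Unmatched radar returns can't contribute velocity to
# any fused object. Positions and the gate value are invented.
import math

def associate(vision_objects, radar_returns, gate=2.0):
    matches, unmatched = [], []
    for r in radar_returns:
        best, best_dist = None, float("inf")
        for i, v in enumerate(vision_objects):
            dist = math.hypot(r[0] - v[0], r[1] - v[1])
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist <= gate:
            matches.append((best, r))
        else:
            unmatched.append(r)     # radar return "blinks out" of the fused track
    return matches, unmatched

vision = [(0.0, 30.0), (3.5, 45.0)]   # (lateral, longitudinal) positions in metres
radar = [(0.2, 29.5), (0.1, 60.0)]    # second return has no nearby vision detection
print(associate(vision, radar))
```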
 
You continue to be disingenuous about the source of information.
The information about NHTSA re-testing and restoring the check marks came from Elon himself. Nobody "upthread" said it came from NHTSA.

Yeah, Elon literally said they were testing NEXT WEEK on May 28th, so that testing would have occurred 3 weeks ago, and he clearly indicated the re-test would lead to the restoration of those checks. Yet, 3 weeks later, we get your update, which is that Tesla "talked" to NHTSA and got the LDW check back because that system was untouched. The touched systems are not mentioned at all, even though the Technoking said they were going to be tested 3 weeks ago.

As usual, basically everything Elon says around autonomy is made up, a half-truth, or deeply misleading. It continues to be true that anyone who listens to Elon in this area and takes anything he says as reality will be made a fool of by history.
I took the "they" in Elon's statement back then to mean NHTSA doing the testing (as mentioned in that sentence), not Tesla. As in "NHTSA automatically removes the check mark for any cars with new hardware until they (NHTSA) retest, which is happening next week, but the functionality is actually there."
Same thing with IIHS's rating (IIHS doing the testing, not Tesla). Someone else claimed upthread that NHTSA didn't need to do any testing, only Tesla, so NHTSA is not involved at all. The statement Reuters got from NHTSA, however, implies that NHTSA is still deciding on 2022 model year testing and that the features would be evaluated as part of that testing (not a test that should have happened weeks ago).
 
Watch the video; there is a specific part at 23:53 where he compares the previous radar version with the current Tesla Vision version:

The gist I got was that the previous version of the software was either not getting velocity/acceleration from vision at all, or only getting something very rough (like labeling something as stationary or not, for example), so they had to rely a lot on the radar. And when the radar blinks out for whatever reason (a mismatch between the target associations of vision and radar), it can result in noticeable errors.
Yeah, I get that they are explaining now that their radar was causing them problems. Their radar. So get a better one. AK is making it out to be radar's fault while also saying that they could have solved the problem of radar detection glitches - but didn't want to.

OK, if they think they can solve FSD with just vision, from their blind-zone-limited camera suite and lack of 360 radar, I will be impressed. I'm skeptical, though; without enough sensory data, their system is going to have to guess to fill in the blanks. There was a video of a user who taped over all the cameras in his Tesla, one by one, and eventually FSD still turned left even though there was no camera that actually looked left. That's risky programming.