The stuff coming out of Elon Musk's team in the last couple of days is seriously wacky.

It's cool that you're not falling for the nonsense either. Seems like Musk told his crew to hype up the livestream event, and it's pretty obvious.

Honestly, what they're saying doesn't make much sense and is full of wrong info. Who cares if they're only using 100 watts? It's kind of pointless and just sounds like the latest elon-babble ("useful AI compute per Watt").

High inference (on-board) power consumption is a big deal in an end-user owned EV and would significantly reduce driving efficiency and range. 1000+W (as I expect Waymo/Cruise to be) would …

Bringing up 36 FPS as if it's a big deal? Other companies with better compute and cameras run their networks way faster.

Saying "sub-human photon-to-control latency" literally makes zero sense.
That's actually something that does make sense: it means that the control latency is shorter than human neurology's. Which isn't that much to brag about, given that humans run at about a 100 Hz "clock" speed (neural firing rate) vs. GHz.

Elon does bullshit a whole bunch, but lower power consumption and lower latency are actually useful engineering criteria.
 
I believe the standard is to be safer than humans, not necessarily faster in every situation. In your example, yellow light timing is set with human response times in mind. So, if the car reacts in the same time as a human, it should have no difficulty stopping before the light turns red.

However, if it is desired to respond faster, then Tesla could feed simulation video to the training system with improved reaction times. But this may not be safer if there is a tailgating human with typical human reaction times.
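
To make that reaction-time tradeoff concrete, here's a toy stopping-distance check (all numbers are my own illustrative assumptions, loosely based on typical traffic-engineering values, not anything from Tesla):

```python
# Toy check: can the car stop before the stop line when the light turns yellow?
# Reaction times, deceleration, and distance are illustrative assumptions.

def can_stop(speed_mps: float, reaction_s: float, decel_mps2: float,
             distance_to_line_m: float) -> bool:
    """True if reaction distance plus braking distance fits before the line."""
    reaction_dist = speed_mps * reaction_s            # travel before braking starts
    braking_dist = speed_mps ** 2 / (2 * decel_mps2)  # v^2 / (2a)
    return reaction_dist + braking_dist <= distance_to_line_m

# ~45 mph is ~20 m/s; yellow timing typically assumes ~1.0-1.5 s human reaction
print(can_stop(20.0, reaction_s=1.5, decel_mps2=3.4, distance_to_line_m=75.0))  # False: human-ish
print(can_stop(20.0, reaction_s=0.3, decel_mps2=3.4, distance_to_line_m=75.0))  # True: faster computer
```

Same speed and braking; only the faster reaction makes the stop feasible, which is also why the tailgater with human reaction times becomes the new problem.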
My opinion: the "better than a human" variable comes into play because we ALL get distracted for many reasons, where a computer AI likely won't. We all know the rules, but at times we get distracted: talking, phone, squirrel, singing, whatever. If this never happened, there would likely never be accidents. Since that's never going to be the case, an AI system will ensure all variables are in check, including those we don't consider, making it better than a human.
 
Yes. The photon count may be different, and the way the photons are filtered may be different. For those reasons, V12 will need to be retrained for HW4, but the photons are not different. That was a poor choice of words on Elon's part. Maybe just a nitpick on my part.
Specifically, what they mean by "direct photon" is that older versions of the nets used images that had been post-processed through some standard image filtering/representation libraries before being presented to the net, but they got better performance (both computationally and in ML terms) by skipping that and ingesting the direct CCD device counts.

The downside, of course, is that if the hardware changes, some initial layers of the system need to be retrained with data from the new hardware.
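
If I understand the idea, the pipeline change is roughly the following (a self-contained sketch; the toy ISP, the 12-bit normalization, and all names are my own placeholders, not Tesla's code):

```python
import torch
import torch.nn as nn

def toy_isp(raw: torch.Tensor) -> torch.Tensor:
    """Stand-in for a standard image pipeline (demosaic/white-balance/tone-map);
    here it's just a gamma curve applied to normalized sensor counts."""
    return (raw.float() / 4095.0) ** (1 / 2.2)

net = nn.Conv2d(1, 8, kernel_size=3, padding=1)      # stand-in for the real network
raw_frame = torch.randint(0, 4096, (1, 1, 64, 64))   # fake 12-bit sensor counts

# Old approach: post-process first, then feed the filtered image to the net.
out_classic = net(toy_isp(raw_frame))

# "Direct photon" approach: feed raw counts; the early conv layers learn whatever
# "demosaicing" they need. Cheaper, but tied to the sensor, so a HW3 -> HW4
# camera swap means retraining those early layers on new data.
out_direct = net(raw_frame.float() / 4095.0)
```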
 
My opinion: the "better than a human" variable comes into play because we ALL get distracted for many reasons, where a computer AI likely won't. We all know the rules, but at times we get distracted: talking, phone, squirrel, singing, whatever. If this never happened, there would likely never be accidents. Since that's never going to be the case, an AI system will ensure all variables are in check, including those we don't consider, making it better than a human.
Presumably, you don't use videos of poor driving as positive training material.
 
Andrej Karpathy provides some insight into keeping neural network modules around, such as vector-space outputs, not only to bootstrap end-to-end training but also to provide visibility into what's affecting end-to-end video -> control.

I agree 100% with Karpathy here; I had some thoughts, but he said it even better. Karpathy is a major genius---clearly Ashok's hype agreed more with Elon's attitude than Karpathy's clear science focus, and it's a shame that Elon couldn't keep Karpathy. (And Elon is jealous of Sam Altman.)

There's another point: when is "regular Autopilot" going to get the better-performing FSD software? If they're rewriting the V12 FSD control stack almost from scratch with E2E (where you don't know the internal representations), how do you algorithmically and deterministically cut the control laws down to just Autopilot behavior and not FSD behavior?

Looks like the "single stack merge" is still a long way off, and regular Autopilot behavior, particularly outside the USA, is often worse than the competition---which focuses on tuning a less ambitious but highly predictable automotive product.


As Karpathy says:

Worth expanding on is that an important caveat to keep track of in all of this is what runs in the car vs. what runs in the backend. E.g. the whole explicit stack might be alive and well but over time move to backend, used to 1) tune the data distribution and 2) modulate the loss function, and in this way get "distilled" into pure E2E test-time networks. That's the dream.

Meaning there will still need to be explicit control algorithms operating on an explicit "vector space" representation that relates to Newtonian and human physical intuition, not a grey goo that is all implicitly learned by end-to-end "photon to control" reinforcement learning exclusively by watching human drivers. Model "distillation" is a specific technical term of art in ML which typically means replicating the input-output behavior of expensive, already-trained models in a cheaper form---in this case, it would be replicating explicit optimization/rule-based control systems (serving as training input on the backend) with cheaper nnet-based control approximations. That would be necessary to get different behavior for Autopilot vs. FSD.
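
For what it's worth, here's a minimal sketch of that kind of distillation with a toy controller (the proportional steering rule and all parameters are invented for illustration): train a small net to reproduce the input-output behavior of an explicit rule-based controller.

```python
import torch
import torch.nn as nn

def teacher_controller(state: torch.Tensor) -> torch.Tensor:
    """Explicit, auditable control law: steer against lane offset and heading error."""
    lane_offset, heading_err = state[:, 0:1], state[:, 1:2]
    return -0.5 * lane_offset - 1.2 * heading_err

student = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(2000):
    states = torch.randn(256, 2)           # sampled (lane offset, heading error)
    targets = teacher_controller(states)   # teacher labels computed on the backend
    loss = nn.functional.mse_loss(student(states), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The cheap student now approximates the explicit control law and could ship
# in-car, while the teacher stays on the backend for labeling and validation.
```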

He's tactful, but to translate, he's hoping Ashok's team is doing the right thing while blowing hype smoke up the boss's posterior. If you notice, in his public speaking, Karpathy never ever backed up Elon's boasts.
 
Presumably, you don't use videos of poor driving as positive training material.
And how exactly does one sort those out by the millions in an automated way? Build a machine learning classifier? Well, how do you judge the good ones? Maybe you need a quantitative optimization target and explicit physics-representation and some rules about ideal driving behavior?
 
I'm sure Tesla's AI team, and other researchers as well, have developed their own dialect for describing their work. Elon will pick that up. Engineers can sound pretty obscure at times. In the case of photons, it's not hard to see that HW3 and HW4 will present different video data to the NN input, hence the "photons" are different.
Seconding this one.

During my life as a working engineer, sometimes (with a team) building from scratch Big Honking Piles of Electronics, sheer expediency results in acronyms, lots of acronyms, amongst the enlightened. People coming across bunches of us at lunch wouldn't be able to spy on us because we took the English language and mangled it beyond all redemption, making it incomprehensible to mere mortals.

"If you're not inventing an acronym a day, you're not working."

For a near parallel, think about the language sailors use to describe sails, various lines, ropes, machinery, masts, and so on.
 
And how exactly does one sort those out by the millions in an automated way? Build a machine learning classifier? Well, how do you judge the good ones? Maybe you need a quantitative optimization target and explicit physics-representation and some rules about ideal driving behavior?
Um. Might I suggest the (likely expletive-filled) recordings made by Beta users whenever they do an intervention and get prompted by the car as to why? Those utterances would likely be put into a text-searchable database. Not perfect, but capable of value judgements, something that a computer wouldn't be able to do.
 
how do you algorithmically and deterministically cut the control laws down to just Autopilot behavior and not FSD behavior?
Oh yeah, that's a really good question. With explicit control logic, Basic Autopilot "just" stays in the current lane without making lane changes, whereas with neural networks, it just drives as it's been trained. The demoed V12 doesn't show messages like "Changing lanes to follow route," but it does use the turn signal, so somewhere internal to the neural network, it has learned that people use the indicator in certain situations before crossing lane lines.

Potentially a basic approach would restrict controls when the neural network would want to change lanes, but that seems rather dangerous. The more complete approach would require specially training the networks with examples of what it can do when on Basic Autopilot, but it seems unlikely Tesla would dedicate resources to that anytime soon. Somewhere in between would be to write lane keeping control logic using the intermediate vector space perception, but even that requires a lot of effort and validation.
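
One way to picture the "restrict controls" option (a toy sketch of action masking, not how Tesla's planner actually works; the action set is invented):

```python
import numpy as np

ACTIONS = ["keep_lane", "change_left", "change_right", "slow_down"]

def select_action(net_probs: np.ndarray, basic_autopilot: bool) -> str:
    """Pick the network's preferred action, masking lane changes in Basic AP mode."""
    probs = net_probs.copy()
    if basic_autopilot:
        probs[[1, 2]] = 0.0      # forbid lane changes
        probs /= probs.sum()     # renormalize what's left
    return ACTIONS[int(np.argmax(probs))]

print(select_action(np.array([0.30, 0.55, 0.05, 0.10]), basic_autopilot=True))
# -> "keep_lane", even though the unrestricted net preferred "change_left"
```

The danger alluded to above is visible right in the sketch: the net may have been "planning" around a maneuver it's suddenly not allowed to execute.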

Similarly, FSD Beta 11.x improvements have made it to Automatic Emergency Braking, such as for crossing vehicles, cutting-in vehicles, and even general obstacles; so it'll be interesting to see how Tesla gets these safety improvements to the majority of the fleet without FSD Capability.
 
And how exactly does one sort those out by the millions in an automated way? Build a machine learning classifier? Well, how do you judge the good ones? Maybe you need a quantitative optimization target and explicit physics-representation and some rules about ideal driving behavior?
My theory is that good drivers all drive the same way, while bad drivers drive badly in different ways from one another, with different bad habits and behaviors. There is more diversity among bad drivers. So the data points for good drivers are all clustered close together, while bad drivers are scattered all over the place. It's easier to draw a box around the good drivers.
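
That's essentially one-class classification. A toy sketch (made-up features like jerk, headway, and speed variance; threshold invented): fit a distribution to "normal" driving and flag anything far from it.

```python
import numpy as np

rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(1000, 3))   # tight cluster of good-driver features

mu = good.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(good, rowvar=False))

def mahalanobis(x: np.ndarray) -> float:
    """Distance from the 'good driver' cluster center, scaled by its spread."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

def looks_like_good_driving(x: np.ndarray, threshold: float = 3.0) -> bool:
    return mahalanobis(x) < threshold

print(looks_like_good_driving(np.array([0.1, -0.2, 0.3])))  # True: inside the box
print(looks_like_good_driving(np.array([5.0, -4.0, 6.0])))  # False: scattered outlier
```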
 
Might I suggest the (likely expletive-filled) recordings made by Beta users whenever they do an intervention and get prompted by the car as to why?
These types of expletive-filled interventions are probably signals for where behaviors could be improved, but not necessarily good training data, as FSD Beta had likely already made a mistake requiring a sudden human correction. Someone would need to come up with a shadow-mode trigger to find similar scenarios and gather video of the smooth human driving that could have avoided the need for the intervention in the first place.

I suppose interventions without expletives could be an indicator that the driver was prepared and prevented FSD Beta from getting into a tricky situation, and these could maybe be used more directly for training, especially with knowledge of what FSD Beta would have done differently from the human driver.
 
These types of expletive-filled interventions are probably signals for where behaviors could be improved, but not necessarily good training data, as FSD Beta had likely already made a mistake requiring a sudden human correction. Someone would need to come up with a shadow-mode trigger to find similar scenarios and gather video of the smooth human driving that could have avoided the need for the intervention in the first place.

I suppose interventions without expletives could be an indicator that the driver was prepared and prevented FSD Beta from getting into a tricky situation, and these could maybe be used more directly for training, especially with knowledge of what FSD Beta would have done differently from the human driver.
Actually, I was pretty much kidding about the "expletives" as a trigger for Tesla's detection of an item to Be Improved. Although I wouldn't be surprised if there were some in Tesla's database.

Most of the time when I'm intervening, I'm typically busy driving, either to get out of the hole that FSD-b stuck me in, or because it's a place with relatively complex traffic that FSD-b failed within; and if the traffic is that complex, that probably means a human driver had better be alert and paying attention to the traffic and one's own driving. Talking to the car and letting it know where it failed is definitely task #2 (or 3); thinking up appropriate expletives for the occasion is right out.

I think there's only a five or ten second window for that comment to go in, so even getting an appropriate response is difficult, especially when one's mind is on other things.

Still, whatever text does get out the door to the automated systems Tesla uses to scan the intervention-text database has got to be decent fodder for the mill, errors in it and all. Especially given the gigantic number of people dumping interventions into it: even if six out of ten can't get the correct message out the door, the other four probably can. And that won't be the end of it, anyway: somebody's got to look through the "hits" and see what happened sooner or later. (Although I'm not proposing that Tesla look at every hit; it's probably all about sampling.)
 
Yes. The photon count may be different, and the way the photons are filtered may be different. For those reasons, V12 will need to be retrained for HW4, but the photons are not different. That was a poor choice of words on Elon's part. Maybe just a nitpick on my part.
Reminds me of a commercial from a while back: there's a pile of coffee beans, and a hand holds a stick and diverts some off to the side. "We only use the finest coffee beans" is the tag line. Well, Tesla just chooses the finest photons.

It's a chore keeping up with the latest advances Tesla is making while they belatedly admit how poorly they used to be doing it. Yes, but realize they were doing it poorly all along. Today's advances are tomorrow's poorly-done past choices.

Until they have better vision coverage they really can't do anything well regardless of fancy techno-babble. Fix the multiple blind spots first.
 
These are both really simple and straightforward. Do you really not understand them?
Only a layman would. That's why it's technobabble. It's a bunch of mumbo jumbo that impresses a layman, but an expert in the field sees right through it.

Technobabble, also called technospeak, is a type of nonsense that consists of buzzwords, esoteric language, or technical jargon.

There was a study done several years ago, and if I remember correctly, it stated that somewhere between 70% and 90% of all AI-branded companies had nothing to do with AI. But they brand themselves as using AI to get more funding and put out AI presentations with a bunch of technobabble. Laymen, the media, and investors like you all lapped it up.
Tesla has opted for a low-power, low-profile compute board so that it doesn't take trunk space or spend an appreciable part of an EV's energy. I know you prefer companies that fill the entire trunk with computers and don't care about how much power they draw, but that doesn't invalidate the usefulness of working within constraints.
This is a prime example, and it's complete nonsense. The production systems (not the development systems that are continually swapped) of all ADAS/AV cars are either smaller than the HW3 computer or around the same size, while having orders of magnitude more compute.

Here is the size of HW3 compared to systems like Mobileye (136 TOPS), GM Ultra Cruise using Qualcomm (300 TOPS), NIO Adam using Nvidia Orin (1016 TOPS), and Huawei MDC 810 (400 TOPS).

I could keep adding pictures for all door-to-door ADAS systems and AV systems, but it would be pointless. There are dozens of (delivered) cars, and not a single one of them takes up the trunk. Even Waymo's computer is about the size of an Xbox One S. It's the shaded rectangle in the middle; the rest is the redundant power supply.

[Images: size comparison of the HW3 computer vs. other ADAS/AV compute units, and the Waymo compute module]


or spend an appreciable part of an EV's energy. I know you prefer companies that fill the entire trunk with computers and don't care about how much power they draw
Another prime example.
An Nvidia 1016 TOPS setup (7x more powerful than Tesla's FSD computer) would use around 300 watts.
And 300 watts would cost just under 1 mile of range if left on for a full hour (0.3 kWh). The average EV (across 231 EVs) consumes 0.346 kWh per mile. Talk about FUD.
And OEMs who instead use 2x Orin (508 TOPS), like Xpeng, will end up with a system that is 3x more powerful than Tesla's while using a similar TDP.
And if you just use one Orin, like Volvo, you end up with a system that is almost twice as powerful while using fewer watts than the Tesla FSD computer.

This is pure nonsense.
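
Laying the arithmetic out with the numbers above:

```python
# Range cost of inference compute, using the figures quoted in this post.
compute_watts = 300        # claimed draw of a 1016-TOPS Orin setup
ev_kwh_per_mile = 0.346    # average consumption across 231 EVs, per the post

kwh_per_hour = compute_watts / 1000                 # 0.3 kWh per hour of driving
range_cost = kwh_per_hour / ev_kwh_per_mile
print(f"{range_cost:.2f} miles of range per hour")  # ~0.87 miles
```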

High inference (on-board) power consumption is a big deal in an end-user owned EV and would significantly reduce driving efficiency and range. 1000+W (as I expect Waymo/Cruise to be) would …
This is a myth.
See above
 
the photons are not different. That was a poor choice of words on Elon's part.
Here's more context of what he said / what I can hear:

Because of the fact that it's photons in to controls out -- the photons are different. You're getting a different bitstream with HW4 cameras than HW3 cameras.
Although this makes me wonder: did Tesla specially retrain 11.4.x for HW4, or is it running in some emulated mode with preprocessing? Is even Basic Autopilot already running on new HW4 vehicles, including Model Y?
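
Pure speculation on what an "emulated mode" could look like, sketched with assumed resolutions (1280x960 is the commonly cited HW3 camera resolution; the HW4 frame size here is just a placeholder):

```python
import torch
import torch.nn.functional as F

def emulate_hw3(hw4_frame: torch.Tensor) -> torch.Tensor:
    """Hypothetical shim: downsample a higher-res HW4 frame so a network trained
    on HW3 bitstreams sees roughly familiar input statistics."""
    return F.interpolate(hw4_frame, size=(960, 1280), mode="area")

hw4_frame = torch.rand(1, 3, 1876, 2896)   # placeholder HW4-ish frame
print(emulate_hw3(hw4_frame).shape)        # torch.Size([1, 3, 960, 1280])
```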
 
So, if the system only knows what it has been taught by video, if it has never seen a video of a car going over a cliff (with a negative attached), it won't know that a cliff is bad. It only knows that a road is good, and a cliff is undefined. Presumably undefined means bad. I would be keen to learn more about this.
I was thinking more about this and, at a guess, if roads are good (1), but cliffs are nothing worse than a non-road (0), then you can't go fooling around with confidence levels around 0.1 or 0.2 because a cliff might show up that way via hallucination, data variation, or whatever. So if you train to recognize a cliff and declare it explicitly bad (-1), then you can mess around near confidence of 0.0 without worrying too much that a hallucination is going to move a cliff into that range. Ultimately, if continuing to operate with low confidence is important, you'd want to make sure that a cliff got classified well away from that range.

Source: I have no idea what I'm talking about.
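
Still, a toy version of the idea (labels and thresholds invented for the example): give "cliff" an explicit negative label so noise hovering around zero can't be confused with it.

```python
def classify(drivability: float) -> str:
    """Toy road=+1 / undefined=0 / cliff=-1 scoring with invented thresholds."""
    # With cliffs trained toward -1, low-confidence noise around 0 stays
    # clearly separated from "cliff" instead of overlapping it.
    if drivability > 0.5:
        return "road: proceed"
    if drivability < -0.5:
        return "cliff: hard stop"
    return "undefined: slow down and stay cautious"

for score in (0.9, 0.15, -0.05, -0.9):
    print(f"{score:+.2f} -> {classify(score)}")
```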
 
I find it quite interesting that they say they are "trying to solve the challenges", in light of the fact that their CEO has said since at least 2016 that it was "a solved problem" (and started charging customers thousands of dollars for the product).

I think there is a clear disconnect between Elon and the engineering team. Elon seems to be thinking more in theory. So when Elon says FSD is solved, he means it is solved "on paper" as he sees it. Put differently, in his mind he sees E2E vision-only as the right solution to FSD on paper. But engineers look at the real world. So the Tesla engineers are looking at what it takes to actually achieve safe, reliable autonomous driving in a real car in the real world, where the human does not need to supervise, with all the edge cases and challenges that entails.

Only a layman would. That's why it's technobabble. [...] The production systems of all ADAS/AV cars are either smaller than the HW3 computer or around the same size, while having orders of magnitude more compute. [...] This is pure nonsense. [...] This is a myth. See above.

There is one key difference: those other companies, like Mobileye, GM, or Xpeng, understand that their systems are L2. Tesla is the only one claiming that just 8 low-resolution cameras, 36 fps, and 100 W of compute are enough to achieve safe, reliable Level 5 autonomy. So in that sense, Srihari is correct. Tesla is giving themselves an even harder challenge: they are trying to achieve autonomous driving while putting huge constraints on themselves, using only a few low-res cameras at 36 fps, only 100 W of compute, no HD maps, and only end-to-end training. Nobody else is foolish enough to try that, because they know it is an unnecessary handicap on your approach.