But something I have thought a lot about is training cycle time on its own. From the very early days of FSD and even up to now there has been this obvious problem that it takes way too long to train and test a new version of the software.

This reminds me of how my old professors used to talk about the era when they were only able to compile their code once a day. It was hard to make progress at that rate. But once compilers and hardware improved by orders of magnitude, the sky was the limit. We were able to make computers do things only dreamed of in science fiction.

I think we are at that tipping point in AI. Cycle time is the key. If Tesla is able to improve cycle time by orders of magnitude as planned, then autonomy has a very good chance of being solved. And it could be rather soon.
 
This reminds me of how my old professors used to talk about the era when they were only able to compile their code once a day. It was hard to make progress at that rate.
How about 3 weeks to code and debug a simple FORTRAN program? I learned to program while attending Hollywood High in the mid-70s. We'd bubble in Hollerith cards by hand (same codes as a keypunch, 2-3 bubbles per character), then the card decks would be sent to downtown L.A. to be processed. That cycle took about 3 days. The first cycle was just to detect the bubbling errors, then a few to address syntax errors, and a few more to actually debug the logic. Weekends didn't count. I quickly learned to hang out at LACC and USC where anyone could walk in off the street and submit a card deck or use the interactive BASIC and APL terminals...

Go V12!
 
The neural network implementations, in almost all machine learning hardware applications as of 2024, are binary digital logic machines. They are not implemented with analog neurons with associated DACs, multipliers and ADCs (or specialized nonlinear analog decision comparators or whatever). Nor are they pulse-density analog machines.

There have indeed been such concepts, and research performed in that vein - I myself am interested in that and think it could have a real future - but that isn't the way it's being done in any serious large scale commercial ML/NN deployment that I'm aware of.

The ML computers are very high speed arrays of familiar synchronous-logic processing units making up the "NPU", architecturally specialized to perform multiply-accumulate (dot product) operations, along with associated high-speed memory to hold dynamically changing intermediate results as well as the NN "program" in the form of the associated weights, all using relatively coarse, fixed-point-like numerical representations (I read that Tesla came up with their own preferred number format, but I don't know where in the training/inference universe it's actually being used). The closest widely-deployed silicon that met these requirements, as of a few years ago, was found in graphics cards and their core GPUs. That's why you often hear and read about giant GPU training clusters, and why there's so much business for Nvidia, the leading example of a company that somewhat lucked into the huge hardware market of the current AI boom.
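To make the multiply-accumulate idea concrete, here's a toy sketch of an int8 dot product with a wide accumulator, roughly the primitive these arrays repeat many millions of times per layer. The symmetric per-tensor scaling and the sizes are simplified assumptions for illustration, not Tesla's actual number format:

```python
# Toy int8 multiply-accumulate (dot product), the basic operation an NPU tile performs.
# Scales are simplified (symmetric, per-tensor) and purely illustrative.
import numpy as np

def quantize(x, n_bits=8):
    """Map float values to signed n-bit integers with a single scale factor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

acts = np.random.randn(256).astype(np.float32)   # activations arriving at one neuron
wts = np.random.randn(256).astype(np.float32)    # that neuron's weights

q_a, s_a = quantize(acts)
q_w, s_w = quantize(wts)

# The hardware does integer multiplies summed into a wide accumulator...
acc = int(np.sum(q_a.astype(np.int64) * q_w.astype(np.int64)))
# ...and only rescales back to a real-valued result at the end.
approx = acc * s_a * s_w
exact = float(np.dot(acts, wts))
print(f"int8 MAC result {approx:.4f} vs float {exact:.4f}")
```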

With each generation of development beyond the early graphics-card arrays, I think the architecture is becoming more refined towards purpose-built ML computing. I personally don't know a lot about this, nor at what point people will stop calling them "GPUs". Tesla's own Dojo project is a non-Nvidia example of custom silicon and extensive support hardware, but again it's an extremely high bandwidth digital computer module intended for efficient expansion, and in the meantime Tesla is buying tons of Nvidia along with most if not all other big players.

On a smaller scale, but with impressive computing power and efficiency, the same comments hold for the inference processors within the car. Tesla did design a fairly impressive and power-efficient autopilot computer for HW3, a better (but higher-power) one for HW4, and I wouldn't be surprised if they already have prototype silicon for HW5. In terms of volume deployment I think Tesla is currently the leader in this regard. Nvidia, Qualcomm, probably Intel for Mobileye's EyeQ, and others including Huawei et al in China, are working on these things.

Most of these projects are not just silicon processor development, but I think these companies all have in-house self-driving platform developments beyond just the idea of selling chips or computer boards to carmakers. I think some of the recent and existing-generation robotaxi companies don't have particularly efficient in-car computing; we hear about trunks stuffed full of computers and cooling equipment. But their volume is currently low and I'm sure they will be taking advantage of supplier developments in this space.

To finish this by throwing in the requisite v12 content: a big question that hangs over the v12 approach is whether the couple of million HW3 computers already out there, or even the faster HW4, have enough compute (inference) power to achieve the goal. People make a lot of pronouncements here in the forum, but I don't think the answer is completely known even inside Tesla, much less to the rest of us. There's a lot of talk these days about the ratio of training compute and data size to the inference compute assets. The field is moving rapidly and there are very encouraging reports that massive and properly targeted training effort can result in a very compact, efficient and capable inference implementation, i.e. could work very well on HW3. The counterpoint is that the training investment could be too high to achieve that goal, and that a more tractable training infrastructure (and training cycle time) could be enabled if the in-car hardware were better than HW3|4 by some factor. I'm far from being knowledgeable enough to make a prediction, but of course I have my hopes!
I think the analog/digital confusion arises from the fact that while neural nets may be composed of digital components, they are better at dealing with analog inputs. The neurons in a human brain are also inherently digital, but our brains operate in a decidedly analog fashion.

I also share @Usain's question about whether an increased training cycle can be used to compensate for a less powerful computer. I guess we'll see.
 
whether an increased training cycle can be used to compensate for a less powerful computer
Yeah, more and faster training should allow multiple engineers to explore additional techniques in parallel, such as taking a larger model that wouldn't run fast enough in a car and using it to teach a smaller student model. Other explorations could look for ways to get the capacity of a larger model with the inference compute requirements of a smaller one. And in general, faster iteration cycles should allow more regular continuous deployments to the fleet, for rapid learning and for finding real-world situations that are still problems. That especially helps compute-limited vehicles, because the models can stay the same size while there's still capacity to learn; they just need more data.
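For anyone curious what the teacher-student idea looks like mechanically, here's a minimal sketch assuming a generic PyTorch setup; the model shapes, temperature and loss weighting are illustrative placeholders, not anything Tesla has disclosed:

```python
# Minimal knowledge-distillation step: a frozen large "teacher" guides a small "student".
# Everything here (architectures, hyperparameters) is a placeholder for illustration.
import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(torch.nn.Linear(512, 2048), torch.nn.ReLU(), torch.nn.Linear(2048, 10))
student = torch.nn.Sequential(torch.nn.Linear(512, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10))
teacher.eval()  # the big offline model is frozen; only the student is trained

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
T, alpha = 4.0, 0.5  # softening temperature, blend between imitation loss and label loss

def distill_step(x, y):
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)   # teacher's "soft" answers
    logits = student(x)
    loss = alpha * F.kl_div(F.log_softmax(logits / T, dim=-1), soft_targets,
                            reduction="batchmean") * (T * T) \
           + (1 - alpha) * F.cross_entropy(logits, y)      # still fit the real labels too
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))   # fake batch just to show shapes
print(distill_step(x, y))
```

The point is that the big teacher only ever has to exist in the training cluster; only the small student has to fit within the car's inference budget.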
 
The neural network implementations, in almost all machine learning hardware applications as of 2024, are binary digital logic machines. They are not implemented with analog neurons with associated DACs, multipliers and ADCs (or specialized nonlinear analog decision comparators or whatever). Nor are they pulse-density analog machines.

There have indeed been such concepts, and research performed in that vein - I myself am interested in that and think it could have a real future - but that isn't the way it's being done in any serious large scale commercial ML/NN deployment that I'm aware of.

The ML computers are very high speed arrays of familiar synchronous-logic processing units making up the "NPU", architecturally specialized to perform multiply-accumulate (dot product) operations, along with associated high-speed memory to hold dynamically changing intermediate results as well as the NN "program" in the form of the associated weights, all using relatively coarse, fixed-point-like numerical representations (I read that Tesla came up with their own preferred number format, but I don't know where in the training/inference universe it's actually being used). The closest widely-deployed silicon that met these requirements, as of a few years ago, was found in graphics cards and their core GPUs. That's why you often hear and read about giant GPU training clusters, and why there's so much business for Nvidia, the leading example of a company that somewhat lucked into the huge hardware market of the current AI boom.

With each generation of development beyond the early graphics-card arrays, I think the architecture is becoming more refined towards purpose-built ML computing. I personally don't know a lot about this, nor at what point people will stop calling them "GPUs". Tesla's own Dojo project is a non-Nvidia example of custom silicon and extensive support hardware, but again it's an extremely high bandwidth digital computer module intended for efficient expansion, and in the meantime Tesla is buying tons of Nvidia along with most if not all other big players.

On a smaller scale, but with impressive computing power and efficiency, the same comments hold for the inference processors within the car. Tesla did design a fairly impressive and power-efficient autopilot computer for HW3, a better (but higher-power) one for HW4, and I wouldn't be surprised if they already have prototype silicon for HW5. In terms of volume deployment I think Tesla is currently the leader in this regard. Nvidia, Qualcomm, probably Intel for Mobileye's EyeQ, and others including Huawei et al in China, are working on these things.

Most of these projects are not just silicon processor development, but I think these companies all have in-house self-driving platform developments beyond just the idea of selling chips or computer boards to carmakers. I think some of the recent and existing-generation robotaxi companies don't have particularly efficient in-car computing; we hear about trunks stuffed full of computers and cooling equipment. But their volume is currently low and I'm sure they will be taking advantage of supplier developments in this space.

To finish this by throwing in the requisite v12 content: a big question that hangs over the v12 approach is whether the couple of million HW3 computers already out there, or even the faster HW4, have enough compute (inference) power to achieve the goal. People make a lot of pronouncements here in the forum, but I don't think the answer is completely known even inside Tesla, much less to the rest of us. There's a lot of talk these days about the ratio of training compute and data size to the inference compute assets. The field is moving rapidly and there are very encouraging reports that massive and properly targeted training effort can result in a very compact, efficient and capable inference implementation, i.e. could work very well on HW3. The counterpoint is that the training investment could be too high to achieve that goal, and that a more tractable training infrastructure (and training cycle time) could be enabled if the in-car hardware were better than HW3|4 by some factor. I'm far from being knowledgeable enough to make a prediction, but of course I have my hopes!

Almost anything can be represented digitally. What's important is that what's laid down on silicon meets the minimum performance specs of the system architecture. It's just math, and it can be done on computers, DSPs, FPGAs, embedded systems, or the white board.

One thing is for certain: the team had no idea what was adequate for the HW design. Classic spec creep when one doesn't do their homework up front and there isn't enough excess margin in the design. We know HW3 used 8-bit weights. We also know the team incorrectly assumed HW3 would be adequate for improved driver safety via dual-core redundancy. We know the team needed to increase HW4 weights to 10 bits and add more compute power. HW5 will be even more capable, but no one knows if it will be sufficient.
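To get a feel for why going from 8-bit to 10-bit weights matters, here's a tiny illustrative comparison of round-trip quantization error; only the bit widths come from the post above, the weight distribution is made up:

```python
# Round-trip quantization error for 8-bit vs 10-bit weights (illustrative only).
import numpy as np

def quant_error(weights, n_bits):
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    restored = np.round(weights / scale) * scale
    return float(np.mean(np.abs(weights - restored)))

w = np.random.randn(100_000).astype(np.float32) * 0.05   # toy weight distribution
for bits in (8, 10):
    print(f"{bits}-bit mean abs error: {quant_error(w, bits):.6f}")
# Two extra bits mean 4x more quantization levels, so roughly 4x less rounding error.
```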

In spite of what Elon says, we know FSD improvements have been slow and v11 is much less capable than an attentive human driver. It's not clear if the team has already collected enough HW4-related data for HW4 optimization, but that is the plan.

V12 shows some improvement, with more human-like driving (which could have been done in v11) and acknowledging/responding to speed bumps, but otherwise there are still signs of significant system latency.

We can't change the past but hope springs eternal.
 
Yeah, more and faster training should allow multiple engineers to explore additional techniques in parallel such as taking a larger model that wouldn't run fast enough in a car and using it to teach a smaller student model.
You're talking about training, but what about verification? Is that automated via simulation? How long does it take to verify a trained network's adherence to the 'specification'? How do we know that the car won't turn into a rampaging bull when it sees a tree painted blue?
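None of us outside Tesla know how they actually verify a build, but conceptually a simulation-based regression gate looks something like this sketch; the scenario names, thresholds and the evaluate_in_sim() stub are hypothetical, not real Tesla tooling:

```python
# Hypothetical scenario-based regression gate for a trained driving policy.
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    max_interventions: int   # acceptance threshold for this scenario

SCENARIOS = [
    Scenario("unprotected_left_turn", 0),
    Scenario("cut_in_on_highway", 0),
    Scenario("blue_painted_tree_on_shoulder", 0),   # the "weird input" case above
]

def evaluate_in_sim(model, scenario):
    # Placeholder: a real harness would replay logged or synthetic sensor data through
    # the model in simulation and count safety-relevant interventions.
    return 0

def regression_gate(model):
    failures = []
    for sc in SCENARIOS:
        interventions = evaluate_in_sim(model, sc)
        if interventions > sc.max_interventions:
            failures.append((sc.name, interventions))
    return failures   # an empty list means the candidate build passes the gate

print(regression_gate(model=None))   # [] with the stub above
```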
 
The Synopsys definition may be a bad interpretation of SAE's, but what I mean is that the specific person using it is referring to exactly that diagram when they say "level 4", and that is what they mean when they use that term.

They are not using the "SAE level 4" definition in the J3016 diagram. So if you assumed they meant "SAE Level 4" instead of "Synopsys Level 4" during a discussion, you would have misunderstood what they meant.

You come to a similar situation as with the whole "drive" discussion above. The OP meant the vehicle can move around the block with no user intervention, not that it would operate as SAE level 4. But if you insist that you can only discuss things in SAE terms, then you end up with that whole argument about what "drive" means. Or you can clarify with the OP what he means and find out it has nothing to do with SAE.

To bring it back to the point, when Elon says "level 5" does he mean truly the SAE level 5, or does he mean SAE level 4 with no geofencing (but other ODD limits allowed like rain for example) or something else (door to door L3)? If he said "SAE Level 5" then we can assume he means that. If not, he may likely be using a different or more colloquial definition, as many people do (including in this forum). That's my main point.
There is no contradiction once you realize that FSD beta is L5 and it can drive itself around the block. However, if you send it around the block without a safety driver you would risk criminal prosecution because it's clearly not safe to do so.
Elon defines L5 as driverless. You all are overanalyzing it; a vehicle that can be summoned from coast to coast is clearly SAE Level 5.
 
There is no contradiction once you realize that FSD beta is L5 and it can drive itself around the block. However, if you send it around the block without a safety driver you would risk criminal prosecution because it's clearly not safe to do so.
Elon defines L5 as driverless. You all are overanalyzing it; a vehicle that can be summoned from coast to coast is clearly SAE Level 5.
A car that can be summoned coast to coast may be considered "level 5" in colloquial speech (given no geofenced restrictions) but if it had another ODD restriction (like it can't do it in rain) it would not be "SAE Level 5".

Another common definition of "level 5" I've seen is just removal of driver controls, so essentially what is considered "SAE level 4" but with no option for driver controls. "SAE Level 5" however allows you to keep driver controls.
 
A car that can be summoned coast to coast may be considered "level 5" in colloquial speech (given no geofenced restrictions) but if it had another ODD restriction (like it can't do it in rain) it would not be "SAE Level 5".

Another common definition of "level 5" I've seen is just removal of driver controls, so essentially what is considered "SAE level 4" but with no option for driver controls.
How exactly would that work given that it takes two days to drive coast to coast and rain is unpredictable?
Somehow you've concluded that even though Elon calls it L5 and has never described it in a way inconsistent with SAE L5 he actually has his own secret definition of L5.
 
How exactly would that work given that it takes two days to drive coast to coast and rain is unpredictable?

An L4 vehicle would safely pull over and park until the bad weather had passed (in this specific example) since one of the requirements for L4 is the ability to "fail safely" if you're exceeding your ODD.


Somehow you've concluded that even though Elon calls it L5 and has never described it in a way inconsistent with SAE L5 he actually has his own secret definition of L5.


Probably the same way Tesla calls current FSD L2 and has never described it in a way inconsistent with SAE L2, yet YOU actually have your own secret definition where it's L5 :)
 
An L4 vehicle would safely pull over and park until the bad weather had passed (in this specific example) since one of the requirements for L4 is the ability to "fail safely" if you're exceeding your ODD.
That is also the case for L5.
“Unconditional/not ODD-specific” means that the ADS can operate the vehicle on-road anywhere within its region of the world and under all road conditions in which a conventional vehicle can be reasonably operated by a typically skilled human driver. This means, for example, that there are no design-based weather, time-of-day, or geographical restrictions on where and when the ADS can operate the vehicle. However, there may be conditions not manageable by a driver in which the ADS would also be unable to complete a given trip (e.g., white-out snow storm, flooded roads, glare ice, etc.) until or unless the adverse conditions clear. At the onset of such unmanageable conditions the ADS would perform the DDT fallback to achieve a minimal risk condition (e.g., by pulling over to the side of the road and waiting for the conditions to change).
Probably the same way Tesla calls current FSD L2 and has never described it in a way inconsistent with SAE L2, yet YOU actually have your own secret definition where it's L5 :)
Except that there are inconsistencies with L2. For example in the DMV emails where they use miles per interaction as a metric for when they will declare it L5. And that number is so high that they could safely deploy it as L5 with a system that mysteriously never had L5 as the "production design intent."
 
That is also the case for L5.

It's not though since as your own quote shows, there is no ODD for L5.

Pulling over when a HUMAN wouldn't be able to drive is fundamentally different than pulling over when you leave an ODD but a human COULD still drive. It is, in fact, the fundamental difference between L4 and L5.

Except that there are inconsistencies with L2.

No, there are not. They're clear, repeatedly, their system is L2, both in practice and design intent.

The only place it's any higher is in your imagination.
 
It's not though since as your own quote shows, there is no ODD for L5.

Pulling over when a HUMAN wouldn't be able to drive is fundamentally different than pulling over when you leave an ODD but a human COULD still drive. It is, in fact, the fundamental difference between L4 and L5.

No, there are not. They're clear, repeatedly, their system is L2, both in practice and design intent.

The only place it's any higher is in your imagination.
Yes, my point is that stopping for severe weather is consistent with L5 but not being able to drive in the rain would not be practical for coast to coast summon. It could literally take weeks to drive coast to coast if the ODD did not include rain.
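A toy way to state the distinction being argued here, purely illustrative with made-up condition names:

```python
# Illustrative only: an L4 system falls back whenever conditions leave its design ODD,
# while an L5 system only falls back when conditions are unmanageable for any driver.
def should_pull_over(condition, system_level):
    unmanageable_for_anyone = {"whiteout_snow", "flooded_road", "glare_ice"}
    outside_this_l4_odd = unmanageable_for_anyone | {"rain", "unmapped_area"}
    if system_level == "L5":
        return condition in unmanageable_for_anyone
    if system_level == "L4":
        return condition in outside_this_l4_odd
    raise ValueError("below L4 the human driver is the fallback in this toy model")

print(should_pull_over("rain", "L4"))  # True: rain is outside this toy L4 ODD
print(should_pull_over("rain", "L5"))  # False: a typical human can still drive in rain
```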
 
You seem to be saying that training cycle time could be reduced if the in-car hardware was faster/beefier/more efficient. Is that correct? And do you think the inverse is also true?
@JHCCAZ's post is beautiful and should be given the attention it deserves by anyone who cares about NNs and the growth of AI.

Your question is also very well put. The compute power is indeed needed but where should it be? Should it be in a global center or crowd distributed? It comes down to economics. I would have thought that they would have started using the FSD power of all the cars that are sitting parked.
 
@JHCCAZ's post is beautiful and should be given the attention it deserves by anyone who cares about NNs and the growth of AI.

Your question is also very well put. The compute power is indeed needed but where should it be? Should it be in a global center or crowd distributed? It comes down to economics. I would have thought that they would have started using the FSD power of all the cars that are sitting parked.
Theoretically that would be a great idea, except they would be using the energy of the parked cars, costing the owners money and potentially draining the battery without them knowing. They could potentially set up some system with Supercharging credits, but that doesn't fix the problem of someone coming out to the car and finding the battery 15% lower than they expected.
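For what it's worth, the usual shape of "crowd distributed" training is something like federated averaging, where each car would train locally and only ship weight updates back to a server. This is a toy sketch of that idea, not anything Tesla has announced, and it doesn't solve the energy/battery objection above:

```python
# Toy federated-averaging round: each "car" nudges the model on its own local data,
# then a central server averages the results. Purely illustrative.
import numpy as np

def local_update(global_weights, local_data, lr=0.01, steps=10):
    w = global_weights.copy()
    x, y = local_data
    for _ in range(steps):
        grad = 2 * x.T @ (x @ w - y) / len(y)   # gradient of a simple squared loss
        w -= lr * grad
    return w

def federated_round(global_weights, fleet_data):
    updates = [local_update(global_weights, d) for d in fleet_data]
    return np.mean(updates, axis=0)             # server averages the cars' models

rng = np.random.default_rng(0)
w_global = np.zeros(4)
fleet = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(5)]
for _ in range(3):
    w_global = federated_round(w_global, fleet)
print(w_global)
```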
 