FSD v12.x (end to end AI)

willow_hiller · Jan 2, 2024

DrChaos said:
The end of quarter push to get numbers and earnings, a practice spanning many years, belie inferences from one tweet.

He obviously cares about the company growing, and demonstrating that growth over time. But caring about the company doesn't mean he cares about the stock.

And original point was that Tesla's entire robotics program can be written off because it's a "stock pump." There's no evidence of that, and it's a bad take meant to shut down discussion.

stopcrazypp · Jan 2, 2024

rentierparasit said:
Hi, StopCrazypp --

This is at the core of the debate. If you've got a burger flipping robot (machine?) and fry-cooking robot (machine?) what do you need the general purpose robot for? Is the theory that broader adoption will lead to increased production leading to economies of scale that allow a GPR to be produced for less than SPR?

That's exactly the hope. Working robots have largely been out of reach for normal households and most businesses because the actuators are made in such low volumes that they are too expensive. That's why there tends to be excitement when an automaker shows interest, because they can actually possibly achieve economies of scale. Previously it was Honda, but they never went anywhere with it. We will see if Tesla actually releases an actual commercially released product.

rentierparasit said:
Maybe! But you'd have to look at that on a case-by-case basis. And note there's a good bit of circularity here; what are the tasks that require a GPR that are so numerous to lead to economies of scale sufficient to drive out SPRs?

Tasks that may require more mobility than a specialized robot can accomplish. For example a robot that takes the food to the table. Also the fact that presumably they can be trained by the owner to do tasks beyond what the company programmed them for, which allows one unit to do multiple things (instead of needing to buy multiple different specialized robots).

rentierparasit said:
Honestly, I think what drives a lot of the interest in humanoid GPRs is the idea of having a personal servant. It's obviously impractical to have an omelette making robot, and a dishwashing robot, and a putting-away-the-groceries robot, there just isn't enough room in the kitchen, but easy to imagine a humanoid GPR that was capable of all them. The difficulty you run into there is the AI. The most recent version of the Turing Test I've heard is, "Go into a strange apartment and make a pot of coffee." Thanks to abundant promiscuity, I've done that, but sometimes it's not easy!

Yours,
RP

I don't think the initial general purpose robots may necessarily be able to accomplish that (something just based on a vague command). Rather, they may need the owner to first train them in a task then later repeat that task. However they will simply be more flexible than a specialized robot (not be stuck on rails like that fry making/burger flipping robot nor only be able to accomplish tasks it was programmed for by the company).

Mardak · Jan 2, 2024

spacecoin said:
Given that you need millions of examples to train a system for a single task using today’s technology

Do we know how many end-to-end control examples Tesla needs now? The engineering effort to scale language model training resulted in GPT-3 handling many more tasks with few-shot learning, so is there potential for a vision foundation world model to support learning from a few examples? For example a humanoid robot could be provided both good and bad examples of picking up an egg to realize what is actually an egg and that too much pressure breaks the egg.

Similarly for end-to-end vehicle control, could there be some underlying understanding of traffic and other vehicle behaviors to "naturally" mimic what other vehicles do in one region vs another region? Just like how people learned what tasks ChatGPT is better or worse at, it'll be interesting to see what v12 can do with a base amount of training and which types of problems are more difficult requiring more engineering.

stopcrazypp · Jan 2, 2024

spacecoin said:
Given that you need millions of examples to train a system for a single task using today’s technology I don’t think training of a physical robot is feasible to happen on site. That’s not practical and will take years or decades using a single robot. I don’t think world models will change this meaningfully.

The training for the basics would be already done (for example how to pick up an object, or how to walk from point a to point b). You just need to fine tune it with the specific task you want it to accomplish.

spacecoin said:
A generic robot is also likely to be a lot more expensive to make, train, and maintain than a specialized one. What’s the point of those legs and 12 DoF when you‘re “making burgers and fries”?

The legs and DOF will come for "free" (and may give some flexibility for example for bringing the items to the counter after it is done without needing another separate robot). It's the same logic of how Tesla includes hardware like heated rear seats, but have it disabled. At some point it becomes cheaper to not make so many different models even if some parts are extraneous for the given task.

spacecoin said:
A software agent can be trained and be deployed cheaply at scale to get the RLHF feedback loop going. Not so much for robotics: Robots are expensive to make, need onsite service and hw gets obsolete quickly.

The specific tasks you laid out as other pointed out is more complex than the simpler tasks that a general purpose robot would likely be initially doing.

spacecoin · Jan 2, 2024

willow_hiller said:
And original point was that Tesla's entire robotics program can be written off because it's a "stock pump." There's no evidence of that, and it's a bad take meant to shut down discussion.

If you can't counter a simple poke like that, perhaps its time to question your beliefs? If you think humanoid robots are imminent, please lay out the case?

There is definitely "no evidence" that Tesla will make Optimus work for anything that requires the humanoid form autonomously. The whole thing is a "bad take" at getting clueless people to believe that AGI is near using teleoperations or previously CGI. Paint it black part deux.

willow_hiller · Jan 2, 2024

spacecoin said:
If you can't counter a simple poke like that, perhaps its time to question your beliefs? If you think humanoid robots are imminent, please lay out the case?

There is definitely "no evidence" that Tesla will make Optimus work for anything that requires the humanoid form autonomously. The whole thing is a "bad take" at getting clueless people to believe that AGI is near using teleoperations or previously CGI. Paint it black part deux.

What pokes of yours haven't I countered? You're making a wild claim about Tesla establishing a robotics program for the sole purpose of financial fraud and you can't back it up.

And I'm not saying anything about AGI. You're doing a very poor job of setting up straw men. I'm saying a humanoid robot application of Tesla's FSD is worth discussing.

In another thread, many FSD Beta versions ago, I had a moment where I drove parallel to a gate that was blocked by a single thin chain that showed up as drivable space. And it occurred to me that Optimus might be useful in the development of FSD by providing a tactile understanding of objects at the human scale. Our cars can't walk up to a chain and learn that it is high in tensile strength and likely to cause damage if driven into. But Optimus possibly could.

And think of all the thousands of different types of objects a car can encounter on the road; Tesla cannot write code to anticipate all of those, or curate enough training data to let FSD know how to respond to things that may have never been on the road before. But if you have a humanoid robot handling everyday objects, it has a chance.

Mardak · Jan 2, 2024

DrChaos said:
the problem is not perception, it's policy (what to do and when) and that's even harder for most general robotics tasks than driving

Do Tesla vision foundation models need both video and current control for training? While IMU could be a decent input proxy with linear and angular velocities to correctly differentiate between waiting at an intersection or turning through it, that could be different enough from accelerator and steering wheel control.

In particular, I'm wondering if the model can understand how the world will change when controlling a turn. But for humanoid robots, often times video will actually include the robot's own hands, so would training the model with control lead to understanding how hand control affects the world? If so, it seems like for end-to-end, both perception and policy would be more tied together than before?

Mardak · Jan 2, 2024

What level is next-level?

https://twitter.com/x/status/1742403669486707175

AlanSubie4Life · Jan 2, 2024

Mardak said:
What level is next-level?

https://twitter.com/x/status/1742403669486707175

None of my drives have even close to zero interventions. I mean, it is way better than it used to be 2.5 years ago. But it makes horrible mistakes (not necessarily true safety interventions but could increase risk) all the time.

So I would guess that Elon means it is about the same. Still pretty mediocre, is what he is suggesting. The next level of mediocrity, if you will.

Reminds me of the motivational poster for mediocrity.

As someone on Twitter pointed out:

https://x.com/elonmusk/status/1430767616524787712?s=20

Screenshot 2024-01-02 at 10.58.53 PM.png

drtimhill · Jan 2, 2024

Mardak said:
If you're referring to depth of the stack as how long it takes a sequence of modules to process inputs to perception to control along with all its data shuffling, the framerate and time from photon to action are both heavily tied to the slowest path. Extending the stack with a new control neural network will probably make things take longer and slow down the framerate, but Elon Musk said during the August livestream that end-to-end is faster on HW3:

The pure AI version runs faster than the version that is a mixture of normal software and AI. In fact it would run it faster than 36 frames per second except the cameras are currently only capable of 36 FPS. Our current back the envelope frame number is we think it could probably run 50 frames a second.

If the end-to-end network is able to run faster without depending on existing networks that are slower, there's is actually excess compute to run the old perception potentially completely on a separate SoC in parallel to provide visualization context at a lower framerate during this transition.

No, he stated that the combination of the existing NNs and the C++ code was slower than an all NN based solution, so you can't deduce anything about the "unified NN" vs "stacked NN" options based on that, since we do not know the overhead of the C++ stack And NN speeds are closely tired both to depth AND breadth, so a back-end NN that reduces the breadth of the front-end control NN can actually result in a faster overall throughput than a shallower (but broader) unified NN.

In fact, of course, there are a huge number of variables here, which is one of the issues Tesla has to deal with as they progress with the V12 design.

spacecoin · Jan 3, 2024

willow_hiller said:
What pokes of yours haven't I countered? You're making a wild claim about Tesla establishing a robotics program for the sole purpose of financial fraud and you can't back it up.

I didn’t say anything about financial fraud. A lot of companies do things solely/mostly for optics. Spending a few hundred millions of this type of marketing is cheap, don’t you agree? You can book it as capex even. And get happy employees since their stock comp doesn’t crater.

willow_hiller said:
And I'm not saying anything about AGI. You're doing a very poor job of setting up straw men. I'm saying a humanoid robot application of Tesla's FSD is worth discussing.

It’s really quite simple: robotaxis are orders of magnitudes easier to solve than getting a humanoid to “(make and) bring me coffee and an omelette” or the car factory equivalent in a random location, even if you train it for a specific location and setup.

willow_hiller said:
In another thread, many FSD Beta versions ago, I had a moment where I drove parallel to a gate that was blocked by a single thin chain that showed up as drivable space. And it occurred to me that Optimus might be useful in the development of FSD by providing a tactile understanding of objects at the human scale. Our cars can't walk up to a chain and learn that it is high in tensile strength and likely to cause damage if driven into. But Optimus possibly could.

And think of all the thousands of different types of objects a car can encounter on the road; Tesla cannot write code to anticipate all of those, or curate enough training data to let FSD know how to respond to things that may have never been on the road before. But if you have a humanoid robot handling everyday objects, it has a chance.

It doesn’t work like that. You don’t need to touch a chain to train for a chain blocking the road... You need to add lots of such chains to your training dataset and label it accordingly.

willow_hiller · Jan 3, 2024

spacecoin said:
I didn’t say anything about financial fraud.

You should really be careful about your words if you don't mean them. You were referring to a "stock pump" which is most definitely part of a form of financial fraud. Pump and dump - Wikipedia

spacecoin said:
It doesn’t work like that. You don’t need to touch a chain to train for a chain blocking the road... You need to add lots of such chains to your training dataset and label it accordingly.

It doesn't work like that now, but you're being unimaginative. The kind of lack of imagination that thought landing rockets on their ends wasn't possible. The world was built by humans for humans, so the best shape to interpret the world is human-shaped. A car only has an opportunity to learn the nature of objects by hitting or avoiding them, so the training data will always be limited; but a humanoid bot can learn the nature of objects without damaging itself.

spacecoin · Jan 3, 2024

willow_hiller said:
You should really be careful about your words if you don't mean them. You were referring to a "stock pump" which is most definitely part of a form of financial fraud. Pump and dump - Wikipedia

I think you read too much into what I was writing. Stock pumping is not fraud, it's talking up the prospects of a company within the limits of the law in "future looking statements", using words like "i think, if Tesla executes perfectly during the coming five years, that it could some day have a higher market cap than Apple and Aramco combined".

Elon has said: "I might pump, but I won't dump" (context: crypto-currencies). Is he admitting fraud there? No.

All CEO:s do that, but no one seems to be as wrong/misleading/optimistic as Musk is consistently regarding the prospects of "AI".

Example from 2022Q1 call:
"Elon Musk: (31:23)
So I think we don’t want to jump the gun on an exciting product announcement too much. So I think we’ll aim to, if we do a product event for Robotaxi next year, and get into more detail, but we are aiming for volume production in 2024."

willow_hiller said:
It doesn't work like that now, but you're being unimaginative. The kind of lack of imagination that thought landing rockets on their ends wasn't possible. The world was built by humans for humans, so the best shape to interpret the world is human-shaped. A car only has an opportunity to learn the nature of objects by hitting or avoiding them, so the training data will always be limited; but a humanoid bot can learn the nature of objects without damaging itself.

I can imagine a future where things work differently. But I also know how things work today and approximately where the research is at. That's being grounded in reality. The opposite is being delusional. That's why I think the "useful humanoid" is at least 10-15 years out.

Statements like "the best shape to interpret the world is human-shaped" is complete BS.

Furthermore, you're basically claiming you can't learn a car not to drive off a cliff without doing in a few hundred times first. Again, not how it works for security critical applications.

Claiming that a humanoid robot will advance self driving is ridiculous.

DanCar · Jan 3, 2024

Mardak said:
What level is next-level?

https://twitter.com/x/status/1742403669486707175

Level 12 of hopium. After that we will have level 13 of hopium.

BBTX · Jan 3, 2024

spacecoin said:
Statements like "the best shape to interpret the world is human-shaped" is complete BS.

I actually think that argument has merit. Not only are all the human-built objects in our world designed by and for humans, but you can even stretch to the slightly-more-tenuous argument that humans and the natural world co-evolved, making the human form more uniquely-suitable (than, say, a car) for interpreting the natural world as well. If you're going to build something artificial that's meant to interact with our existing world in a very generally-capable and intelligent way, a humanoid design is likely to be a better bet than e.g. a car. You can go beyond-humanoid, but you must at least do humanoid. For example, it's a nice upgrade if your humanoid eyes can also see in infrared, but at a bare minimum they need to perceive all the wavelengths that humans perceive. Ditto for hands and DoF and all that: if they can do at least all the basic things a human hand can do (not just motion/grip, but also sensing), that's a good start in our world.

spacecoin · Jan 3, 2024

BBTX said:
I actually think that argument has merit. Not only are all the human-built objects in our world designed by and for humans, but you can even stretch to the slightly-more-tenuous argument that humans and the natural world co-evolved, making the human form more uniquely-suitable (than, say, a car) for interpreting the natural world as well. If you're going to build something artificial that's meant to interact with our existing world in a very generally-capable and intelligent way, a humanoid design is likely to be a better bet than e.g. a car. You can go beyond-humanoid, but you must at least do humanoid. For example, it's a nice upgrade if your humanoid eyes can also see in infrared, but at a bare minimum they need to perceive all the wavelengths that humans perceive. Ditto for hands and DoF and all that: if they can do at least all the basic things a human hand can do (not just motion/grip, but also sensing), that's a good start in our world.

The main reason to mimic a human look in robotics is for social reasons. You don’t need a head open doors and you don’t need to be bipedal to climb stairs.

Also it’s easier to get clueless people to go overboard with unrealistic expectations about capabilities with a humanoid. If it looks like a human it must be as capable as a human, right? C3PO looks cooler than R2D2, but everyone knows that R2D2 is the better one after watching the movies.

Mardak · Jan 3, 2024

AlanSubie4Life said:
As someone on Twitter pointed out:
https://x.com/elonmusk/status/1430767616524787712?s=20
View attachment 1005289

FSD Beta 10.x did allow for expanding deployment ~200x from just over a thousand private beta vehicles to ~363k. Cumulative miles similarly increased from not even a million miles to ~131M before 11.x single stack. I believe the main architectural change with 10.x was introducing memory both spatial and temporal initially with static objects including lines and road edges for better intersection predictions especially with dynamic occlusions then later applied to moving objects and occupancy network.

Will 12.x with end-to-end allow for eventually expanding deployment? Tesla is already testing 12.x around the world, so that could be an aspect of next level allowing people outside of North America to get actual value of city streets with FSD Capability. Similarly even within US/Canada, will people end up using FSD more along with higher take rate because it will be much more comfortable to use? So there can't be a matching 200x more vehicles with 12.x, but perhaps it'll be reflected in faster growth of cumulative miles?

AlanSubie4Life · Jan 3, 2024

Mardak said:
Will 12.x with end-to-end allow for eventually expanding deployment

I would think that if it works it would improve efficiency of training in rolling out to different areas. Don’t have to rewrite a bunch of code. If it works, which I guess I expect it will - though there is the question of how good it will be.

Mardak said:
people end up using FSD more along with higher take rate because it will be much more comfortable to use

I find this doubtful. Current 11.4.9 has gone back to oscillatory braking/regen behavior and these hard problems seem impossible for Tesla to resolve - they just get better or worse around some local “optimal” spot which is not optimal. I’m not convinced that NNs will be the secret that solves this but I guess we’ll see.

Anyway it has to be a lot better for better take rate.

Bladerskb · Jan 3, 2024

Speaking on the whole general robotics topic.
I actually think the whole Optimus thing will be a HUGE success compared to "FSD" (which i would refer to as a scam).
Obviously not in the time-line that Elon states.

But really, my views has nothing to do with Optimus or Tesla. It has everything to do with the advancement of NN models and their implementation in robotics.

I can totally see general purpose robots being used for specialized use in the next couple years just from how quick things are accelerating from academia.

I can definitely see a private company with huge resources being able to build a ML factory to create a unified & general agent, with unlimited compute, large scale data gathering and testing, simulator 1.0 (game engine based), simulator 2.0 (neural based), human labelers, auto labeling, hd mapping, etc.

Basically, utilize all the tools built for SDC for robotic.

https://twitter.com/x/status/1742605699174236663

https://twitter.com/x/status/1742603121682153852

Supcom · Jan 3, 2024

AlanSubie4Life said:
I would think that if it works it would improve efficiency of training in rolling out to different areas. Don’t have to rewrite a bunch of code. If it works, which I guess I expect it will - though there is the question of how good it will be.

I find this doubtful. Current 11.4.9 has gone back to oscillatory braking/regen behavior and these hard problems seem impossible for Tesla to resolve - they just get better or worse around some local “optimal” spot which is not optimal. I’m not convinced that NNs will be the secret that solves this but I guess we’ll see.

Anyway it has to be a lot better for better take rate.

It needs to be L3+ with a useful ODD for a significantly higher take rate.

FSD v12.x (end to end AI)

Well-Known Member

Well-Known Member

Active Member

Well-Known Member

Active Member

Well-Known Member

Active Member

Active Member

Efficiency Obsessed Member

Active Member

Active Member

Well-Known Member

Active Member

Active Member

Member

Active Member

Active Member

Efficiency Obsessed Member

Senior Software Engineer

Active Member

Similar threads