
Tesla Optimus Sub-Prime Robot

Parts is a low-margin business; Chinese companies will soon 3D print all your stuff on AliExpress for 1/10 the price. What you want is a moat, which you can only achieve with vertical integration from hardware to software to payments at high margins, like Apple.

High margins eat low-profit volumes and production halts for breakfast.

There will be more Tesla Bots in the world than there are iPhones. A person only needs one working iPhone but several Tesla Bots can work for them.
Do you think Nvidia chips are a low-margin business? The AI boom is driving insatiable demand. What I am suggesting about the Bot component market is that critical hardware will also be in very high demand.

Critical components will not be mere commodities for another 10 years. For one thing, parts will need to be standardized first, and the pace of innovation is way too fast for that to happen.

For a simple example, it's taken a decade for NACS to become the standard for North American charging. Tesla had very good foresight into what would serve EVs well for decades to come.

What sort of standard should there be for a humanoid hand with NN components? I expect that Tesla is looking deeply into that. They will want a protocol that enables them to continue to iterate on both hand design and AI fitting. If they land on a protocol that can serve them for many generations, there is a very good chance that other Bot developers could also buy into Tesla's standard. If Tesla open sources this protocol (as I would expect), then other part makers could enter into making these components too.
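
To make this concrete, here's a purely hypothetical sketch of what such a hand protocol could look like: a versioned, fixed-layout command message that lets the hand hardware and the controlling NN evolve independently. The field names, joint count and units are my own inventions, not anything Tesla has published.

```python
# Hypothetical "open hand protocol" command message (illustration only).
import struct

PROTOCOL_VERSION = 1
NUM_FINGER_JOINTS = 11  # assumed joint count, for illustration only

# Command layout: version (1 byte), sequence number (4 bytes), then a
# target angle (radians) and a max torque (Nm) per joint.
CMD_FORMAT = "<BI" + "ff" * NUM_FINGER_JOINTS

def pack_hand_command(seq, targets_rad, max_torques_nm):
    assert len(targets_rad) == len(max_torques_nm) == NUM_FINGER_JOINTS
    flat = [v for pair in zip(targets_rad, max_torques_nm) for v in pair]
    return struct.pack(CMD_FORMAT, PROTOCOL_VERSION, seq, *flat)

def unpack_hand_command(payload):
    fields = struct.unpack(CMD_FORMAT, payload)
    version, seq, rest = fields[0], fields[1], fields[2:]
    targets_rad = rest[0::2]
    max_torques_nm = rest[1::2]
    return version, seq, targets_rad, max_torques_nm
```

The point of the versioned, fixed layout is that a new hand generation or a new NN can be dropped in on either side of the wire without breaking the other.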

If other companies become cost-competitive in building Tesla protocol parts, yes, this could lead to lower margins for Tesla as a parts supplier. But it also means that Tesla could buy these parts from suppliers more cheaply than making them itself. In this case, Tesla is able to keep the cost of its final Bots down while ramping up supply. This is a winning outcome for Tesla.

My point with all this is that Tesla can take an active role in cultivating a Bots ecosystem. It's helpful for other Bot makers to buy into Tesla protocol parts as well as for other parts makers to build these parts. Moreover, we want to see a high level of innovation within the ecosystem.

Additionally, I see Dojo as part of this ecosystem. Musk has been clear that he envisions selling compute to other organizations seeking to fit NNs. O2 and other Bots are prime embodiments of NNs trained on Dojo. Moreover, Bot developers that are using Tesla protocols will have certain advantages training their NNs in the Dojo. For example, using parts that are already paired with Tesla NN components will save on compute time as they train and re-train in the Tesla Dojo. Keep in mind that training can cost tens of millions of dollars. Maintaining freshness of estimates is non-trivial. If using Tesla protocol parts can shave even a few percent off this expense, it will be a very attractive option, not at all a commodity deal.
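
Just to illustrate the arithmetic with made-up numbers (the run cost, refresh cadence and savings fraction below are all assumptions on my part, not figures from Tesla):

```python
# Back-of-the-envelope version of the argument, with hypothetical numbers.
training_run_cost_usd = 30_000_000   # hypothetical, "tens of millions"
runs_per_year = 6                    # hypothetical refresh cadence
compute_saving_fraction = 0.03       # "a certain percent", assumed 3%

annual_saving = training_run_cost_usd * runs_per_year * compute_saving_fraction
print(f"Hypothetical annual compute saving: ${annual_saving:,.0f}")  # $5,400,000
```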

I strongly suspect that the bulk of value in a Bot will be derived from the AI components, not from hardware. But integration of hardware and NN is really the key. So Tesla can define the protocols, produce the hardware, and make massive profit off of maintaining NN parameter estimates. I believe the value of Dojo is increased by open sourcing the protocols and fostering a deep ecosystem.
 


AFAIK even Tesla is still waiting to determine if Dojo has any value.

They're still installing Nvidia HW as fast as they can get their hands on it, and haven't established Dojo does as good or better a job for even their own internal use yet.... and you've got them selling it to others (and robot HW as well despite it being at least another year or two minimum before they have a robot you can even buy made by themselves) in the next 3 years?
 
Are you referring to version 1 or 2 of dojo?
 
Ok, if you think Tesla will be compute-constrained for the next three years, that definitely would slow a lot of progress in the Bot space. It would actually worry me that Tesla could fall behind and miss out. From my perspective there is a massive ecosystem that Tesla wants to be a part of.
Last we knew, Tesla was ramping its computing capability with a vengeance. But its wants and needs are unlimited, up to and including human attainment of a Kardashev Level 1 civilization. It is taking months to train the FSD model. How often would they like to come out with a newly trained model? Every month? Every week? Every day? After all, you've got to train it and distribute it before you can start collecting the valuable data.

There is a ruthless prioritization of resources on the training side. But on the inference side, the wants and needs are mostly complementary at these quantities. It's wonderful that the Tesla Bot brain and FSD computer can share a parts bin. TSMC and Samsung are comfortable producing chips at cell-phone quantities, so Tesla's current needs are a breeze for them. For now.
 
Are you referring to version 1 or 2 of dojo?

Yes.

Dojo 1 has yet to prove it's as good or better than Nvidia (which was what Elon defined as the success criteria previously)

Dojo 2 appears not to exist yet, and poor progress was rumored to be the reason the head of the program no longer works there.
 

And I'd say it's a moving target; AMD is adding a layer to the race.

Will AMD or NVIDIA be better than Dojo? If so will AMD and/or NVIDIA be able to supply all the chips Tesla wants?

If not then Dojo is better than nothing even if it isn't better than AMD and NVIDIA.
 
Next, I would submit that Tesla should become the leading Bot parts supplier.
(...)
So I see selling fully assembled O2 as inconsequential. Selling a box of parts from which an O2 can be built will suffice for many years. And this means Tesla has some new products to scale and market in 2024.
Tesla could do this, but they most likely won't, since it adds a lot of complexity to the manufacturing line (needing the ability to ship one hand, one leg, etc., each with the proper connections for power and compute).

Elon has stated many times that he has too many "good ideas" that would work in some form or another. He is constantly prioritizing one thing over another. Right now Tesla is torn between improving FSD and improving the Bot. Any training they can do that improves both technologies (for example, training a foundational model of the world) will most likely be prioritized over pure FSD (example: training for snowy conditions) or pure Bot (example: threading a needle). Second priority: FSD, since it is easier to profit from it in the short term. The FSD robots (= the cars) are already out in the field in great numbers; the Bot isn't. And the design of the Bot is less set in stone than the car design.

By December 2024 Tesla's compute will have risen a lot compared to today, so the engineers will have a little more leeway/luxury to spend time/compute on both FSD and the Bot, but currently the bot is indeed the lowest priority.

Another reason why Tesla won't supply loose Optimus parts, IMO, is one stated before: a lot of the value lies in the complete integration of hardware and software (like Apple).

Tesla has so much on their plate it's unreal. Engineers and compute are however limited in the physical realm :).
 
We’ve built a very low-latency & high-fidelity teleoperation system, used to collect AI training data of the bot imitating humans performing certain tasks.
We’ve designed, trained and deployed some of the first end-to-end neural nets for humanoid robots ever demonstrated to autonomously perform tasks requiring coordinated control of humanoid torso, arms, and full hands with fingers.
We've designed and built yet another upgraded version of the bot (Optimus Gen-2), adding an articulated neck, revamped hands w/ tactile sensing, and a tighter integration of harnesses, actuators & electronics.
 
I'd like to archive that image here just in case it gets removed from X/Twitter

[Attached image: 1704075579884.jpeg]
 
Uh... that didn't learn how to make coffee. It learned how to insert a cup in a hole and push a button on another machine that makes coffee.

Show me a humanoid robot go from beans to handing me a finished cup and I'll be impressed.
 
How to Train an Optimus

I think I figured out how Tesla will be training the Optimus. Basically, they will start with a very simple system and gradually decrease the amount of human effort. Here is the development, stage by stage:

1. Human captures motion; it is simulated, then deployed on the robot. Shown at AI Day 2022.
2. Human performs the motion with the sensei helmet, backpack and gloves, which control the robot, aka teleoperation like sanctuary.ai. Shown on Shareholder Day 2023 when the robot was moving small objects.
3. Human performs the motion with the sensei helmet, backpack and gloves, and the sequences are recorded. The AI learns to perform the same movement on the robot, aka "end2end" (see the sketch after this list). Shown on Shareholder Day 2023 when the guy celebrates.
4. Robot observes a human performing the task and translates this into robot movement; no backpack needed, and it can be done by the customer. Will be shown at AI Day 2023.
5. Robot hears a voice command and translates it into a text string. An LLM figures out what the task is given the environment, converts this into a sequence of motions, and shows it in simulation; the user verifies that it's the correct task, and then the robot performs it. Will be shown at AI Day 2024.
6. Robot doesn't even need voice commands, it just figures out what it should do. Decides that the dishes need to be cleaned and places them in the correct place. AI Day 2026!?
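
To make stage 3 concrete, here's a minimal sketch of what that kind of imitation (behavioral cloning) setup looks like: a policy network maps a camera frame plus current joint state to the next joint targets recorded from the teleoperator. The architecture and dimensions below are placeholders of mine, not Tesla's actual design.

```python
# Minimal behavioral-cloning sketch (not Tesla's code); dims are placeholders.
import torch
import torch.nn as nn

class BotPolicy(nn.Module):
    def __init__(self, num_joints=28):
        super().__init__()
        # Tiny stand-in vision encoder; any backbone would do here.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + num_joints, 128), nn.ReLU(),
            nn.Linear(128, num_joints),   # predicted next joint positions
        )

    def forward(self, image, joint_state):
        features = self.encoder(image)
        return self.head(torch.cat([features, joint_state], dim=-1))

def train_step(policy, optimizer, image, joint_state, demo_action):
    """One imitation step: match the teleoperator's recorded joint command."""
    pred = policy(image, joint_state)
    loss = nn.functional.mse_loss(pred, demo_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In stage 4, the demo_action would come from human video retargeted to the robot instead of from teleoperation, but the loss and training loop stay the same.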


Currently Tesla is rapidly iterating on the hardware of the robot: what mechanical joints, motors, batteries, electronics, etc. it should have, and what cameras at what angles. The nice thing is that they can keep iterating without having to throw away previous data. What they just need is an intermediate step that reconstructs the world. This can easily be done; they already do this with the autolabeler. Basically, generate a "ground truth" environment and then simulate what the cameras should be seeing given a robot's configuration and position. Then, if a human is performing a task, the autolabeler can imagine what a robot would see if it were in the human's position and how the world would behave given the robot's execution of the task.
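
Here's a rough sketch of that "re-render from ground truth" idea: project the reconstructed scene's 3D points into a virtual pinhole camera placed wherever a given hardware revision's head camera would sit. The intrinsics and poses below are made-up placeholders, not Tesla values.

```python
# Project reconstructed 3D points into a virtual camera (illustration only).
import numpy as np

def project_points(points_world, T_world_camera, fx, fy, cx, cy):
    """Project Nx3 world-frame points into pixel coordinates."""
    T_camera_world = np.linalg.inv(T_world_camera)       # world -> camera
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_camera_world @ pts_h.T).T[:, :3]
    z = pts_cam[:, 2]
    u = fx * pts_cam[:, 0] / z + cx
    v = fy * pts_cam[:, 1] / z + cy
    return np.stack([u, v], axis=1), z                    # pixels + depth

# Same reconstructed points, two hypothetical camera placements (old vs.
# new head design): only the virtual camera pose changes, not the data.
scene_points = np.array([[0.4, 0.0, 1.2], [0.6, 0.1, 1.5]])
cam_old = np.eye(4); cam_old[2, 3] = -0.20
cam_new = np.eye(4); cam_new[2, 3] = -0.35
for cam_pose in (cam_old, cam_new):
    pixels, depth = project_points(scene_points, cam_pose, 600, 600, 320, 240)
```

The point is that the reconstructed scene, not the recorded pixels, is the durable asset: new hardware just means a new virtual camera.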

So they can start generating a dataset of camera input -> robot motion -> object manipulation. At first it will be very small, with motion capture from the sensei backpack/helmet/gloves. Then eventually the dataset will grow with humans interacting with objects as seen from the robot. Heck, they can probably even take data from the car fleet's dataset, see how the world evolves when humans manipulate objects in it, and then translate it into the robot's reference frame.
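
A sketch of what one record in such a dataset might look like, and how a human demo gets folded in. The field names and shapes are my own invention, just to make the "camera input -> robot motion -> object manipulation" chain concrete.

```python
# Hypothetical dataset record built from a human demonstration.
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingRecord:
    camera_frames: np.ndarray       # (T, H, W, 3) rendered or real images
    joint_trajectory: np.ndarray    # (T, num_joints) retargeted motion
    object_pose_before: np.ndarray  # (4, 4) pose in the robot base frame
    object_pose_after: np.ndarray   # (4, 4) pose after the manipulation

def record_from_human_demo(rendered_frames, retargeted_joints,
                           obj_before_world, obj_after_world, T_world_robot):
    """Re-express world-frame object poses in the robot's base frame."""
    T_robot_world = np.linalg.inv(T_world_robot)
    return TrainingRecord(
        camera_frames=rendered_frames,
        joint_trajectory=retargeted_joints,
        object_pose_before=T_robot_world @ obj_before_world,
        object_pose_after=T_robot_world @ obj_after_world,
    )
```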

I recommend rewatching the autolabeler and simulation parts of AI Day:

On Shareholder Day 2023 they showed the robots "memorizing" the environment, aka SLAM. I believe this is the first step of creating the ground-truth dataset; then they can "imagine" what a robot would be seeing and, more importantly, should be perceiving (the output from the vision neural network) in any given position. The goal is to train the neural network to accurately output the correct lane lines etc. in the car and, in the case of the robot, the physical shape and properties of the objects to interact with. Then the control network uses the vision output as input. Thus they can quickly iterate on the hardware and have the vision input translated from old hardware, from the point of view of a human in a different position, etc. into training data for the control network.
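
Here's a small sketch of the decoupling I mean: the vision network's output format is the stable interface, so a camera or mounting change only forces retraining of the vision side while the control network keeps consuming the same representation. All names here are illustrative, not Tesla's architecture.

```python
# Hypothetical seam between perception and control (illustration only).
from dataclasses import dataclass
from typing import List

@dataclass
class PerceivedObject:
    position_m: tuple        # (x, y, z) in the robot's body frame
    orientation_quat: tuple  # (w, x, y, z)
    extent_m: tuple          # rough bounding-box size
    graspable: bool

class VisionNet:
    """Hardware-specific: retrained whenever cameras or mounting change."""
    def perceive(self, images) -> List[PerceivedObject]:
        raise NotImplementedError

class ControlNet:
    """Camera-agnostic: consumes PerceivedObject lists plus joint state."""
    def act(self, objects: List[PerceivedObject], joint_state) -> list:
        raise NotImplementedError
```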


Great news if true. It further proves Tesla's method is workable, and that Tesla is ahead on what looks like expensive hardware.
So we are at stage 4 according to my guess, right about the time for it also. ^^
 

Man, I was about to post that. I think this is a pretty cool breakthrough. It's not the complexity of the task here that matters, but the mode of learning. We want bots to be able to observe stuff (video in) and then figure out how to replicate it. One small step toward self-learning. Human toddlers do this.