I wonder if it would be more efficient for it to have wheels instead of legs. Not exactly humanoid at that point, but it could get from A to B a lot quicker and more efficiently.
Unless there were stairs in between A and B!
And if needed, Optimus could always hop on a Onewheel for A-to-B routes without stairs.
Time to edit ran out, so I'll just add this. I realized that NeRF output today can go straight into Unreal, which Tesla is using for their simulation. So robot camera capture -> Unreal is basically automated today. Even normal amateurs can do this pretty easily with modern software such as Luma.ai.
How to Train an Optimus
I think I figured out how Tesla will be training Optimus. Basically they will start with a very simple system and gradually decrease the amount of human effort. Here is the development, stage by stage:
1. Human captures motion, simulates it, deploys on the robot. Shown at AI Day 2022.
2. Human with sensor helmet, backpack and gloves controls the robot directly, i.e. teleoperation like Sanctuary.ai. Shown at Shareholder Day 2023 when the robot was moving small objects.
3. Human performs the motion with the sensor helmet, backpack and gloves while the sequences are recorded. The AI learns to perform the same movement on the robot, i.e. "end-to-end". Shown at Shareholder Day 2023 when the guy celebrates.
4. Robot observes a human performing the task and translates this into robot movement; no backpack needed, so it can be done by the customer. Will be shown at AI Day 2023.
5. Robot hears a voice command and transcribes it into a text string. An LLM figures out what the task is given the environment, converts it into a sequence of motions, shows it in simulation, the user verifies that it's the correct task, and then the robot performs it. Will be shown at AI Day 2024.
6. Robot doesn't even need voice commands; it just figures out what it should do. It decides that the dishes need to be cleaned and puts them in the correct place. AI Day 2026!?
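Stage 5 above could be sketched roughly like this. To be clear, everything here (function names, the command strings, the planner logic) is hypothetical stand-in code, not anything Tesla has shown; it's just to make the voice -> LLM -> verified-motion control flow concrete:

```python
# Hypothetical sketch of stage 5: voice -> text -> LLM task plan -> verified motion.
# None of these components are real Tesla APIs; they are stubs for the idea.

def transcribe(audio: bytes) -> str:
    """Stand-in speech-to-text: pretend the audio decodes to a fixed command."""
    return "put the dishes in the rack"

def plan_motions(command: str, environment: dict) -> list[str]:
    """Stand-in LLM planner: map a command to a coarse motion sequence."""
    if "dishes" in command and "sink" in environment["objects"]:
        return ["walk_to(sink)", "grasp(dish)", "place(dish, rack)"]
    return []

def simulate_and_confirm(motions: list[str]) -> bool:
    """Show the plan in simulation for the user; here we auto-approve non-empty plans."""
    return bool(motions)

def run_task(audio: bytes, environment: dict) -> list[str]:
    command = transcribe(audio)
    motions = plan_motions(command, environment)
    if simulate_and_confirm(motions):
        return motions  # the robot would now execute these
    return []

plan = run_task(b"<mic capture>", {"objects": ["sink", "rack", "dish"]})
print(plan)
```

The interesting design point is the confirmation step: the plan is shown in simulation before execution, so a misheard or misplanned task gets caught by the user rather than by a broken dish.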
Currently Tesla is rapidly iterating on the hardware of the robot: which mechanical joints, motors, batteries, electronics etc. it should have, and which cameras at which angles. The nice thing is that they can keep iterating without having to throw away previous data. All they need is an intermediate step that reconstructs the world. This can easily be done; they already do this with the autolabeler. Basically, generate a "ground truth" environment and then simulate what the cameras should be seeing given a robot's configuration and position. Then, if a human is performing a task, the autolabeler can imagine what a robot would see if it were in the human's position, and how the world would behave given the robot's execution of the task.
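The "imagine what the robot's camera would see" part is, at its core, just geometry: once you have a reconstructed 3D point, you can project it into any camera pose you like. A minimal sketch with a simple pinhole camera model (no lens distortion, translation-only poses, invented numbers - purely illustrative):

```python
# Project one reconstructed world point into two different viewpoints.
# Simple pinhole model looking down +Z; focal length and principal point
# are made-up example values, not any real camera's calibration.

def project(point, cam_pos, focal=500.0, cx=320.0, cy=240.0):
    """Return the pixel (u, v) where a world point lands in a camera at cam_pos."""
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0:
        return None  # point is behind the camera
    return (focal * x / z + cx, focal * y / z + cy)

cup = (0.5, 0.2, 2.0)                        # reconstructed object position (metres)
human_view = project(cup, (0.0, 0.0, 0.0))   # where the human's camera saw it
robot_view = project(cup, (0.3, -0.1, 0.0))  # same world, re-rendered for the robot
print(human_view, robot_view)
```

Same world point, two different pixel locations - which is exactly why a reconstructed ground-truth scene lets old data be re-rendered for every new camera layout they iterate to.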
So they can start generating a dataset of camera input -> robot motion -> object manipulation. At first it will be very small, with motion capture from the sensor backpack/helmet/gloves. Then the dataset will eventually grow with humans interacting with objects as seen from the robot. Heck, they can probably even take data from the car fleet's dataset, see how the world evolves when humans are manipulating objects in it, and then translate that into the robot's reference frame.
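One way such camera -> motion -> manipulation triples could be stored is sketched below. The field and source names are invented for illustration; the point is just the shape of a record that ties the three streams together per step:

```python
# Hypothetical schema for one episode of the camera->motion->manipulation dataset.
from dataclasses import dataclass, field

@dataclass
class EpisodeStep:
    camera_frames: list   # per-camera images for this timestep (placeholders here)
    joint_targets: list   # commanded robot joint positions
    object_poses: dict    # tracked object id -> pose after the step

@dataclass
class Episode:
    source: str           # e.g. "mocap_suit", "robot_view_of_human", "car_fleet"
    steps: list = field(default_factory=list)

ep = Episode(source="mocap_suit")
ep.steps.append(EpisodeStep(camera_frames=["frame0.png"],
                            joint_targets=[0.1, -0.4, 0.8],
                            object_poses={"cup": (0.5, 0.2, 2.0)}))
print(ep.source, len(ep.steps))
```

Tagging each episode with its source is what would let them mix tiny high-quality mocap data with much larger, noisier fleet-derived data and weight them differently during training.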
I recommend rewatching the autolabeler and simulation parts of AI Day:
At Shareholder Day 2023 they showed the robots "memorizing" the environment, aka SLAM. I believe this is the first step of creating the ground-truth dataset; with it they can "imagine" what a robot would be seeing and, more importantly, should be perceiving (the output of the vision neural network) in any given position. In the car, this trains the neural network to accurately output the correct lane lines etc.; in the case of the robot, the physical shape and properties of the objects to interact with. The control network then uses the vision output as its input. Thus they can quickly iterate on the hardware and have the vision input translated from old hardware, from the point of view of a human in a different position, etc., into training data for the control network.
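A toy version of that "memorizing the environment" idea: fold observations into a 2D occupancy grid. Real SLAM also has to estimate the robot's own pose at the same time; here the pose is assumed known, which is exactly the part that makes this a toy and not SLAM proper:

```python
# Toy occupancy-grid "memory": accumulate evidence for obstacle cells as the
# robot moves. Pose is assumed known (real SLAM estimates it jointly).

def update_grid(grid, robot_xy, hits):
    """Accumulate obstacle evidence; hits are cell offsets from the robot."""
    for (dx, dy) in hits:
        cell = (robot_xy[0] + dx, robot_xy[1] + dy)
        grid[cell] = grid.get(cell, 0) + 1   # simple evidence counter
    return grid

grid = {}
update_grid(grid, (5, 5), [(1, 0), (0, 1)])  # first pass: two obstacles seen
update_grid(grid, (6, 5), [(0, 0)])          # second pass re-confirms cell (6, 5)
print(grid)
```

Seeing the same cell from two positions raises its evidence count, which is the basic mechanism behind a persistent map - and, as noted above, a departure from the car, which starts each drive with no such memory.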
Plus they can fire up virtual robots in their thousands or millions and modify their environment slightly or significantly. In a virtual world, add pets, children playing with balls, multiple robots, obstacles - initially just to navigate around, eventually being aware of what an unpredictable animal can do.
Yes. Lots of the massive compute from Dojo will be used to train massive offline models that generate very good simulations for the robot to train in. Models that will need to understand the world very accurately. Real-world AI. Small changes in the environment, and the robot interacting with the environment. Planning motion around dogs, kids, cars etc. The Tesla car fleet will be a massive advantage in understanding how dogs and kids behave, and as they get more and more robots out there they will get a better understanding of how humans interact with the robots, how humans walk, how humans move their hands, etc.
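Spinning up thousands of slightly-varied virtual environments is usually called domain randomization. A minimal sketch of the idea - the "environment" here is just a parameter dict with made-up knobs, not any real simulator config:

```python
# Domain-randomization sketch: generate many environment variants cheaply.
# The parameters and their ranges are invented, purely for illustration.
import random

def randomized_env(seed):
    rng = random.Random(seed)        # seeded so each variant is reproducible
    return {
        "floor_friction": rng.uniform(0.4, 1.0),
        "n_pets": rng.randint(0, 2),
        "n_children": rng.randint(0, 3),
        "lighting": rng.choice(["dim", "normal", "bright"]),
    }

envs = [randomized_env(seed) for seed in range(1000)]  # thousands, for free
print(len(envs), envs[0]["lighting"])
```

A policy that works across all these perturbed worlds has to rely on robust cues rather than quirks of one scene, which is what makes simulation-trained behavior transfer to the messy real world.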
Simulation/AI learning is a big advantage compared to others.
In case y'all missed it, they are training Optimus by having it watch humans.
Optimus is learning how to perform new tasks simply by watching humans perform them.
OPTIMUS IS LEARNING JUST BY WATCHING HUMANS.
Let that sink in.
FFS, HODL FTW!!!
I mean, it's not actually doing that, of course... The "OPTIMUS IS LEARNING JUST BY WATCHING HUMANS" bit simply is not so... no training ever happens locally on the bot, nor does it happen in real time... just as none ever happens locally on a car or in real time - it doesn't have REMOTELY the compute power for that sort of thing.
What you saw shown was a human with sensor gear performing a task over and over with a bunch of data captured (quite a bit more than just "watching" - note the sensor-filled gloves, for example)... and the captured data will go back to the giant GPU NN training clusters... same as the fleet data from the cars does for training FSD.
Some folks seem to think they showed some kind of "show your individual bot how to do something and that bot, just by watching you, will learn to do that thing," and that is not REMOTELY how any of that works.
You are correct, but nonetheless, no custom code was written for that task. Tack on an LLM that can interpret and break tasks down to the offline-NN-learned level and you've got something very powerful.
Probably not right now. For now they probably have to retrain on the cluster for every new task. But once the robot masters many tasks it can probably few-shot new tasks.
The other thing that was interesting in the Teslabot video was that the bot was mapping out its environment and remembering(?) it. That's definitely a departure from FSD, which doesn't remember anything from one drive to the next.
So basically they have sold 1000 units! Yay! Elon is going for billions...