
Tesla Optimus Sub-Prime Robot

How to Train an Optimus

I think I figured out how Tesla will be training the Optimus. Basically, they will start with a very simple system and gradually decrease the amount of human effort. Here is the development, stage by stage:

1. Human motion is captured, simulated, then deployed on the robot. Shown at AI Day 2022.
2. Human performs the motion with the sensei helmet, backpack and gloves, directly controlling the robot, aka teleoperation (like Sanctuary.ai). Shown at Shareholder Day 2023 when the robot was moving small objects.
3. Human performs the motion with the sensei helmet, backpack and gloves, and the sequences are recorded. The AI learns to perform the same movement on the robot, aka "end2end". Shown at Shareholder Day 2023 when the guy celebrates.
4. Robot observes a human performing the task and translates this into robot movement; no backpack needed, so it can be done by the customer. Will be shown at AI Day 2023.
5. Robot hears a voice command and translates it into a text string. An LLM figures out what the task is given the environment, converts it into a sequence of motions, and shows it in simulation; the user can verify that it's the correct task, and then the robot performs it (a rough sketch of this flow follows the list). Will be shown at AI Day 2024.
6. Robot doesn't even need voice commands, it just figures out what it should do. Decides that the dishes need to be cleaned and puts them in the correct place. AI Day 2026!?
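To make stage 5 concrete, here is a toy sketch of how that voice -> LLM -> simulation -> execution loop could be wired together. Every function in it is a made-up stand-in, not anything Tesla has shown; it only illustrates the control flow described above.

```python
# Toy, self-contained sketch of stage 5: voice -> text -> LLM task plan ->
# simulated preview -> user confirmation -> execution. All functions are stand-ins.

def speech_to_text(audio: bytes) -> str:
    # Stand-in for a real speech-recognition model.
    return "put the dishes in the cupboard"

def llm_plan_task(command: str, scene: dict) -> list:
    # Stand-in for an LLM that breaks the command into motion primitives,
    # conditioned on what the robot currently sees.
    return ["walk_to(sink)", "grasp(dish)", "walk_to(cupboard)", "place(dish, shelf)"]

def simulate(plan: list, scene: dict) -> str:
    # Stand-in for rendering the planned motions in simulation for review.
    return " -> ".join(plan)

def execute(primitive: str) -> None:
    print(f"executing {primitive}")

def run_voice_command(audio: bytes, scene: dict) -> None:
    command = speech_to_text(audio)
    plan = llm_plan_task(command, scene)
    preview = simulate(plan, scene)
    if input(f"Preview: {preview}. Run it? [y/n] ") == "y":  # user verifies the task
        for primitive in plan:
            execute(primitive)

run_voice_command(b"", {"objects": ["dish", "sink", "cupboard"]})
```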


Currently Tesla is rapidly iterating on the hardware of the robot: which mechanical joints, motors, batteries, electronics etc. it should have, and which cameras at which angles. The nice thing is that they can keep iterating without having to throw away previous data. All they need is an intermediate step that reconstructs the world. This can easily be done; they already do this with the autolabeler. Basically, generate a "ground truth" environment and then simulate what the cameras should be seeing given a robot's configuration and position. Then, if a human is performing a task, the autolabeler can imagine what a robot would see if it were in the human's position, and how the world would behave given the robot's execution of the task.

So they can start generating a dataset of camera input -> robot motion -> object manipulation. At first it will be very small, built from motion capture with the sensei backpack/helmet/gloves. Then eventually they will grow the dataset with humans interacting with objects as seen from the robot. Heck, they can probably even take data from the car fleet's dataset, see how the world evolves when humans manipulate objects in it, and then translate it into the robot's reference frame.
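As a rough illustration, here is what one sample in such a dataset might look like, together with the core "translate into the robot's reference frame" step: re-expressing a point seen from a human demonstrator's camera in the robot's camera frame. The field names and the 4x4 pose convention are my own assumptions, not anything Tesla has described.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Sample:
    camera_frames: np.ndarray   # (T, H, W, 3): what the (real or imagined) robot sees
    joint_commands: np.ndarray  # (T, num_joints): the motion that was performed
    object_poses: np.ndarray    # (T, num_objects, 4, 4): how the objects moved

def to_robot_frame(point_human_cam: np.ndarray,
                   human_cam_to_world: np.ndarray,
                   robot_cam_to_world: np.ndarray) -> np.ndarray:
    """Map a 3D point from the human demonstrator's camera frame into the robot's."""
    p = np.append(point_human_cam, 1.0)                    # homogeneous coordinates
    p_world = human_cam_to_world @ p                       # human camera -> world
    p_robot = np.linalg.inv(robot_cam_to_world) @ p_world  # world -> robot camera
    return p_robot[:3]

# Example: the same mug position, seen from two different viewpoints.
mug_in_human_cam = np.array([0.2, -0.1, 0.8])
human_pose = np.eye(4)
robot_pose = np.eye(4)
robot_pose[:3, 3] = [0.5, 0.0, 0.0]  # the robot stands half a metre to the side
print(to_robot_frame(mug_in_human_cam, human_pose, robot_pose))
```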

I recommend rewatching the autolabeler and simulation parts of AI Day:

At Shareholder Day 2023 they showed the robots "memorizing" the environment, aka SLAM. I believe this is the first step of creating the ground-truth dataset; with it they can "imagine" what a robot would be seeing and, more importantly, should be perceiving (the output from the vision neural network) at any given position. That is used to train the neural network to accurately output the correct lane lines etc. in the car, and, in the case of the robot, the physical shape and properties of the objects it should interact with. The control network then uses the vision output as its input. Thus they can quickly iterate on the hardware, and have the vision input from old hardware, from the point of view of a human in a different position, etc. translated into training data for the control network.
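The way I picture that split, as a toy sketch (the interfaces below are my guess at the idea, not Tesla's actual architecture): the vision network outputs a hardware-agnostic scene description, and the control network only ever consumes that description, so a camera change only requires retraining the vision stage while the control training data stays valid.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SceneState:
    # Hardware-agnostic description of "what the robot should be perceiving".
    object_positions: np.ndarray  # (num_objects, 3), in the robot's body frame
    object_classes: list
    occupancy: np.ndarray         # coarse 3D occupancy of the surroundings

def vision_network(images: np.ndarray, camera_calibration: dict) -> SceneState:
    # Retrained whenever the cameras or their placement change. Dummy output here.
    return SceneState(object_positions=np.zeros((1, 3)),
                      object_classes=["mug"],
                      occupancy=np.zeros((8, 8, 8)))

def control_network(state: SceneState, task: str) -> np.ndarray:
    # Consumes only SceneState, so its training pairs (state -> joint commands) remain
    # valid across hardware revisions, human-viewpoint captures and simulation.
    num_joints = 28  # placeholder joint count for the example
    return np.zeros(num_joints)

state = vision_network(np.zeros((2, 480, 640, 3)), {"fov_deg": 90})
commands = control_network(state, "pick up the mug")
print(commands.shape)
```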
 
Time to edit ran out, so I will just add this. I realized that NeRF output today can go straight into Unreal, which Tesla is using for their simulation. So basically robot camera capture -> Unreal is done automatically today. Even normal amateurs can do this pretty easily with modern software such as Luma.ai:


It's getting crazy good. This should be excellent for going from robot/human capture -> simulation of the environment in Unreal. Tesla has probably augmented their Unreal engine with excellent physics, crash-simulation software, etc.
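Roughly, the capture -> reconstruction -> game-engine flow would look something like the sketch below. Every function is a stand-in for whatever tooling is actually used (Luma.ai, other NeRF/photogrammetry software, Unreal's importers); only the data flow is the point.

```python
# Toy pipeline: camera footage -> camera poses -> reconstructed scene -> engine asset.

def extract_frames(video_path: str) -> list:
    return [f"frame_{i}" for i in range(3)]           # stand-in for decoded video frames

def estimate_camera_poses(frames: list) -> list:
    return [f"pose_of_{f}" for f in frames]           # stand-in for structure-from-motion

def reconstruct_scene(frames: list, poses: list) -> dict:
    return {"mesh": "kitchen_mesh", "textures": []}   # stand-in for NeRF/photogrammetry

def export_for_engine(scene: dict, out_path: str) -> None:
    print(f"would write {scene['mesh']} to {out_path}")  # stand-in for an asset export

def capture_to_simulation(video_path: str, out_path: str) -> None:
    frames = extract_frames(video_path)
    poses = estimate_camera_poses(frames)
    scene = reconstruct_scene(frames, poses)
    export_for_engine(scene, out_path)

capture_to_simulation("robot_walkthrough.mp4", "unreal_assets/kitchen")
```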
 
Plus they can fire up virtual robots in their thousands or millions and modify their environment slightly or significantly. In a virtual world, add pets, children playing with balls, multiple robots, obstacles - initially just to navigate around, eventually being aware of what an unpredictable animal can do.
 
Yes. Lots of the massive compute from Dojo will go to training massive offline models that generate very good simulation for the robot to train in. Models that will need to understand the world very accurately. Real-world AI. Small changes in the environment and in how the robot interacts with the environment. Planning motion around dogs, kids, cars, etc. The Tesla car fleet will be a massive advantage in understanding how dogs and kids behave, and as they get more and more robots out there they will get a better understanding of how humans interact with the robots, how humans walk, how humans move their hands, etc.
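A tiny sketch of the "thousands of virtual robots, each with a slightly different world" idea (i.e. domain randomization): generate many scenario configurations that vary the clutter, the agents and the layout, then farm them out to simulation workers. The config fields below are purely illustrative, not from any Tesla material.

```python
import random

def random_scenario(seed: int) -> dict:
    # One randomized virtual world per seed; vary it slightly or significantly.
    rng = random.Random(seed)
    return {
        "room_size_m": (rng.uniform(3, 8), rng.uniform(3, 8)),
        "num_obstacles": rng.randint(0, 10),
        "pets": rng.choice([[], ["dog"], ["cat"], ["dog", "cat"]]),
        "children_with_balls": rng.randint(0, 2),
        "other_robots": rng.randint(0, 3),
        "lighting": rng.choice(["day", "dusk", "artificial"]),
    }

# e.g. spin up one simulated robot per seed (just printing the first few configs here)
for seed in range(3):
    print(random_scenario(seed))
```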
 
Simulation/AI learning is a big advantage compared to others.

A Starship robot near me had difficulty crossing the road and ended up under a car recently. There was some discussion about whether it actually uses the crossing like a human does, reading the green-man signal, or whether it just crosses when there's a gap in traffic. I know they ask people to press the crossing button, as the robots can't do it themselves. A few cross (maybe two) while the others queue up out of the way of pedestrians and wait for another opportunity. Anyway, presumably the car driver didn't see the robot - I think the driver was turning right (the reverse for the US equivalent) onto the road and didn't see the robot crossing, perhaps because the other lane's vehicles obscured their view, or maybe it was an SUV with little forward view. The little orange triangle isn't obvious enough for some. I can't remember if the triangle is lit up. It was low speed, but the robot turned into a wedge/jack underneath. Might have been a nasty repair price.

Apparently a few robots have been hit recently (according to rumour). Sunday mornings are the busiest times, presumably after a few sherberts on Saturday night - bringing hair of the dog, McDonald's, or ingredients for an English breakfast (I'm not sure who else uses them apart from Co-op shops & McD's).

[Attached photo]


Obviously not competition for Tesla's Optimus Subprime, BUT they've been operating & expanding for years, so providing evidence of a business case for even simple robots.
 
I mean, it's not actually doing that of course.... The "OPTIMUS IS LEARNING JUST BY WATCHING HUMANS." bit simply is not so... no training ever happens local to the bot, nor does it happen in real time.... just as none ever happens local to a car or in real time - it doesn't have REMOTELY the compute power for that sort of thing.

What you saw shown was a human with sensor gear performing a task over and over with a bunch of data captured (quite a bit more than just "watching"- note the sensor-filled gloves for example)... and the captured data from it will go back to the giant GPU NN training clusters.... same as the fleet data from the cars does for training FSD.

Some folks seem to think they showed some kind of "Show your individual bot how to do something and that bot, just by watching you, will learn to do that thing" and that is not REMOTELY how any of that works.
 
You are correct, but nonetheless, no custom code was written for that task. Tack on an LLM that can interpret and break tasks down to the level the offline-trained NN has learned, and you've got something very powerful.
 
Some folks seem to think they showed some kind of "Show your individual bot how to do something and that bot, just by watching you, will learn to do that thing" and that is not REMOTELY how any of that works.
Probably not right now. For now they probably have to retrain on the cluster for every new task. But once the robot masters many tasks, it can probably few-shot new tasks.

Few-Shot Learning (FSL) is a Machine Learning framework that enables a pre-trained model to generalize over new categories of data (that the pre-trained model has not seen during training) using only a few labeled samples per class. It falls under the paradigm of meta-learning (meta-learning means learning to learn).
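As a generic illustration of the few-shot idea in that definition (nothing robot-specific): given a frozen, pre-trained embedding, you can handle a brand-new category from just a few labelled examples by comparing against their mean embedding, prototypical-network style. The embedding function below is a stand-in for the big pre-trained model.

```python
import numpy as np

def embed(x: np.ndarray) -> np.ndarray:
    # Stand-in for a frozen, pre-trained encoder.
    return x / (np.linalg.norm(x) + 1e-8)

def few_shot_classify(query: np.ndarray, support: dict) -> str:
    # support: a handful of labelled examples per *new* class the model never trained on.
    prototypes = {label: np.mean([embed(x) for x in examples], axis=0)
                  for label, examples in support.items()}
    q = embed(query)
    return min(prototypes, key=lambda label: np.linalg.norm(q - prototypes[label]))

support = {
    "fold_towel": [np.random.rand(8) for _ in range(3)],  # three demos of a new task
    "stack_cups": [np.random.rand(8) for _ in range(3)],
}
print(few_shot_classify(np.random.rand(8), support))
```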
 
The other thing that was interesting with the Teslabot video was that the bot was mapping out its environment and remembering(?) it. That’s definitely a departure from FSD which doesn’t remember anything from one drive to the next.

Well, it KIND of does, though... fleet cars send map-relevant data back to Tesla, and Tesla pushes that info back out to other cars as soon as they put in a route that touches the relevant locations. We recently had a whole big thread on Twitter from Green about the surprising amount of drive-specific info that gets pushed to cars each time a destination is put in, including fleet-gathered map data. I would expect it's doing the same type of thing here, just without the "base map" info to begin from that they get from TomTom, Google, etc. So if you, say, hired 20 bots to do a thing in a new, unmapped location, they wouldn't ALL need to map the location - one would, the data would upload to a back end, and then be distributed back out to the rest (or, if it's a big area, I suppose you could split the mapping task up among the bots and have the back end stitch together and push the full map).
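A toy sketch of that "one bot maps it, the back end stitches and redistributes" flow. Maps here are just sets of occupied grid cells keyed by location; a real system would obviously be far richer, and all of the names are illustrative.

```python
class MapServer:
    """Back-end that merges partial maps from bots and serves the stitched result."""

    def __init__(self) -> None:
        self.maps: dict = {}  # location -> set of occupied (x, y) grid cells

    def upload(self, location: str, partial_map: set) -> None:
        # Each bot (or each bot covering part of a big site) contributes what it mapped.
        self.maps.setdefault(location, set()).update(partial_map)

    def download(self, location: str) -> set:
        # Any other bot sent to the same location starts from the stitched map.
        return self.maps.get(location, set())

server = MapServer()
server.upload("warehouse_42", {(0, 0), (0, 1), (1, 1)})  # bot A maps the west half
server.upload("warehouse_42", {(5, 5), (5, 6)})          # bot B maps the east half
print(server.download("warehouse_42"))                   # bot C gets the full stitched map
```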
 

(Transcript of the embedded Boston Dynamics Spot promotional video:)

"Spot is the world leader in the emerging mobile robotics market, with more than a thousand robots in over 35 countries. No other robot has been deployed more often to tackle some of the industry's toughest, most dangerous tasks. Spot handles tasks that are difficult or dangerous for people. Spot spends hours and hours each week walking factory floors checking gauges and machinery, exposes itself to high radiation in nuclear facilities, goes offshore, and much more, so people like you don't have to. Every single day our robot is being deployed at job sites all over the world, and it's making a big difference in industries like manufacturing, construction, power and utilities, mining, oil and gas, and even in the classroom, where hopefully we're helping inspire the next generation of young roboticists. But we want Spot to do even more."


So basically they have sold 1,000 units! Yay! Elon is going for billions... Different scopes. And it's basically a camera on legs so far, not manipulating the environment at any large scale. A useful camera, but why not just put a stationary camera at every gauge, or even just have the equipment read digitally and sent over a network? Is that so difficult to do? (Just asking, not claiming that it isn't.)
 

I mean, this is the Waymo vs Tesla FSD argument, isn't it?

One is trying for a general, works-everywhere solution but has zero actually deployed examples so far, and the other has a lot of obstacles to scaling affordably or at any speed, but actually has small numbers of working units in the field.
 