Welcome to Tesla Motors Club

Tesla Optimus Sub-Prime Robot

Thanks to @MC3OZ for linking to this thread from investment main.

I had a couple of disagreements with @Cosmacelf about the bot.

1) I don’t think DL is anywhere near done with its potential, and
2) I don’t think continuous (online) learning is necessary for the Tesla Bot to function.

I welcome feedback and criticism, especially clarification of @Cosmacelf’s opinion. I don’t intend to put words into anyone’s mouth.
IMO the vision of robots presented by sci-fi is wrong on several counts in terms of the bare minimum needed for a useful bot:
  • Bots are portrayed as having emotions.
  • Bots are portrayed as having independent thought.
  • Bots are portrayed as having a massive brain.
  • Bots are portrayed as using logic and deduction to solve problems.
I'm not saying a future Bot might not have these abilities, but they are likely decades away.

We know the cars with FSD are a kind of "Robot on wheels"; we know they can navigate a defined problem domain with a pretrained NN; and we know "fleet learning" can improve the cars' ability over time.

It just seemed to me that Tesla would base Bot V1 fairly firmly on the car architecture, as this is the quickest path to getting a useful product to market.

Bot V1 also simplifies and reduces the cost of the hardware by moving a lot of the problem to software. That in turn ensures software upgrades and forward compatibility: old versions of the Bot will keep improving.

It also seems to me that doing all the hard work in a centralised Dojo cluster is (currently) more cost-effective than pushing that work into each individual Bot. IMO Bot V1 will cost $10,000-$20,000, while a Bot with "Dojo-like" onboard capabilities might cost more than $1,000,000, if it were even a practical form factor.
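To make the trade-off concrete, here is a back-of-envelope sketch. Only the $10,000-$20,000 per-bot and $1,000,000 "Dojo-like" figures come from the post above; the fleet size and shared cluster cost are purely illustrative assumptions.

```python
# Back-of-envelope comparison of centralised training vs per-bot compute.
# The fleet size and Dojo cluster cost are invented for illustration only.
FLEET_SIZE = 10_000

bot_unit_cost = 15_000           # midpoint of the $10k-$20k estimate above
dojo_cluster_cost = 500_000_000  # assumed cost of one shared training cluster
onboard_dojo_cost = 1_000_000    # post's estimate if each bot carried its own compute

centralised_total = FLEET_SIZE * bot_unit_cost + dojo_cluster_cost
onboard_total = FLEET_SIZE * (bot_unit_cost + onboard_dojo_cost)

print(f"Centralised: ${centralised_total:,}")
print(f"Onboard:     ${onboard_total:,}")
```

Even with a very expensive shared cluster, the centralised approach wins by more than an order of magnitude at fleet scale, because the cluster cost is paid once while onboard compute is paid per bot.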

As an investor, I also find the idea of selling software licenses for particular tasks attractive. It is a steady stream of revenue, and a business model similar to a smartphone, PC, or gaming console. Having a "do everything" Bot able to perform all tasks on day 1 would make it a more expensive product.
 

Let's see how agile my brain is this late on a Friday night - I make no promises for the cogency of this post 😊

DL does indeed have quite a way to run, no argument from me there, with one caveat. Currently, large DL models like GPT-3 cost on the order of $1M in compute time for each full training run. So there is going to be an economic limit to how far the current DL architecture can go and what problems it can be applied to. Tesla didn't build Dojo just for fun. They were reaching the economic limits of what GPU clusters could do.

That's just one problem with current DL architectures (cost). There are many others, like how fragile and complicated they are (needing to carefully layer in "new" knowledge so as not to trigger catastrophic forgetting, for example).

And then you have my previously stated concern that they can't learn in the field. Field knowledge must be laboriously gathered offline (think Tesla vehicle camera videos), heavily processed (laborious labelling), and then days or weeks of massive CPU/GPU time are needed to create a new baseline inference engine.

In summary, DL has limits.

So to answer your number 2). It all depends on what you expect a bot to be able to do. If all you want is a specialized packing robot for a warehouse, then you'd custom build hardware for that and use some DL mixed with algorithms to make a very efficient and fast packing robot.

But TeslaBot has general purpose humanoid hardware, which isn't tailored for anything in particular. The hardware is going to suck compared to a custom designed robot. For example, look at Tesla FSD. A Tesla FSD car is a custom robot with eight eyes situated around the car, along with sub-millimeter-precise control actuation for acceleration, braking and steering. Plop a humanoid robot in the driver's seat and you'll get worse performance no matter how good your AI is (not only in precision, but especially in cost).

So what is TeslaBot good for? It's only going to make sense, IMHO, if it can take on general purpose tasks with minimal training; otherwise the economics will favor custom hardware. Just to give another example: warehouse picking. Amazon bought Kiva robotics to do that task. Kiva robots are flat, square robots with wheels that can lift probably 1,500-pound bin shelves. Instead of having a humanoid robot walk all over the place, it makes more sense to have these much easier and cheaper to manufacture Kiva robots bring the shelves to a central location for picking.

DL does not allow for quick learning with minimal training. So that's why I think TeslaBot won't be all that useful without a new AI architecture. Do note that Tesla isn't the first company to build a humanoid robot. Many others have, from Honda to Boston Dynamics. And you haven't seen any compelling use cases for them ... because DL isn't the right tool for the job (IMHO).

My point is - you can build a TeslaBot now with DL, but the economics won't work out for a mass produced product.

There are some promising approaches buried deep in research labs to go beyond DL. They are usually lumped under the general classification of neuromorphic computing. Instead of a massive training algorithm based on back propagation, they might rely on spiking neural networks, or an analog of the same. Current spiking NNs aren't very sophisticated. It'll take time to figure it all out. But there are two things neuromorphic architectures provide that DL can't: a huge speed-up and/or cost reduction for learning, and continuous learning.
 

My only argument with your post is that the Boston Dynamics and Honda robots are probably far more expensive. When the Bot is relatively cheap and adaptable to different tasks, the tasks themselves don't need to be high-value.

So with the humanoid form, we are looking at simple, repetitive tasks that humans can do in a constrained environment with minimal variation; there are many of those on factory production lines.

Another possible application is fruit picking. Yes, there are difficult-to-master techniques, but farmers often find it hard to get enough workers to hand-pick crops. There are different types of crops with different picking techniques, but we know humans can pick them all. We also know no one has (so far) invented a mechanical harvester that can do the job to the standard the farmer requires, for a price the farmer can afford.

If the farmer can use the bot for weeding, fertilising, pruning, fencing, and mowing at other times, that is a bonus; there is no shortage of work on farms.

The way DL works for cars is that the expensive training cost is spread over the entire fleet, so the equation is (Training Cost / Fleet Size).
For Bots, it is (Training Cost / Bots doing that task).
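The amortisation above can be sketched directly. The training cost and fleet sizes here are illustrative assumptions, not real figures:

```python
def cost_per_bot(training_cost: float, fleet_size: int) -> float:
    """Amortised training cost per bot: (Training Cost / Bots doing that task)."""
    return training_cost / fleet_size

# Illustrative only: a hypothetical $5M Dojo training run, amortised
# over fleets of different sizes.
for fleet in (100, 10_000, 1_000_000):
    print(f"{fleet:>9} bots -> ${cost_per_bot(5_000_000, fleet):,.2f} each")
```

The point of the division is that the per-bot share of even a very expensive training run collapses toward zero as the fleet doing that task grows.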

How many Bots will Tesla build? DL makes the most sense with a large fleet size:
train one bot to do a job, and you have trained as many as you need, millions if required.

No longer need that job done? Then all those Bots can be retrained to do something different, via a software download.

So is DL training really that different from app and computer game development?

It can be expensive to develop a computer game, but that is hopefully recouped by high-volume sales.

I expect a lot of DL training will migrate to video with minimal manual labelling; the main cost is a lot of expensive computer hardware running in an expensive data centre.
 
I have had some thoughts on how fruit picking training could work.

A human volunteer picks 100 pieces of fruit that they believe they should pick, labelling them 1 to 100 in order with stickers; the process is videoed. Call this Process A.

The farmer then grades each piece out of 10 on ripeness and execution. A 6 is the minimum target score in each category.

After the grading, the fruit is videoed from all angles in different lighting conditions.

The human next picks 100 pieces of fruit that they believe they should not pick; again the process is videoed and the farmer grades the results. Call this Process B.

In both Process A and Process B, what is not picked is also information.

When Tesla believes the bot is sufficiently trained, it repeats Process A and Process B, and again the farmer grades the results.

The process continues until the farmer is happy with the results, or gives up.
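A minimal sketch of the grading rule described above (each piece scored out of 10 on ripeness and execution, with 6 as the minimum target score in each category). The class and function names are purely illustrative:

```python
from dataclasses import dataclass

MIN_SCORE = 6  # minimum target score in each category, as above

@dataclass
class PickGrade:
    piece_id: int   # sticker label, 1..100
    ripeness: int   # farmer's score out of 10
    execution: int  # farmer's score out of 10

    def passes(self) -> bool:
        # A pick must meet the minimum in BOTH categories
        return self.ripeness >= MIN_SCORE and self.execution >= MIN_SCORE

def pass_rate(grades: list[PickGrade]) -> float:
    return sum(g.passes() for g in grades) / len(grades)

# Three example picks: piece 2 fails on ripeness
grades = [PickGrade(1, 8, 7), PickGrade(2, 5, 9), PickGrade(3, 6, 6)]
print(f"Pass rate: {pass_rate(grades):.0%}")
```

The same structure would work for grading the bot's own Process A and Process B runs, so human and bot performance can be compared on identical terms.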
 
You are vastly underestimating how much training DL requires, and how complex fruit picking is. Fruit obscured by foliage, different sizes, different ripeness, techniques for pulling fruit off branch depending on ripeness, dexterity, etc.
 
Still, the process is essentially: train via video with auto-labelling.

When the bot gets it wrong, they can video a human doing it right.

They already deal with stop signs obscured by foliage.

We need to compare the time and money needed to develop this (using a pre-existing, generic Dojo train-via-video-with-auto-labelling app) with the cost of developing a console game.

If picking involves climbing a ladder, I would look for alternatives to ladders.

In part, the bot project exists to ensure that the FSD team don't get bored and leave; it is the next challenge. (That didn't work with Karpathy.)

Yes, I think solving FSD from here is 95% routine grind, 5% inspiration.
 
Oh heck, I’ll just lose some sleep and give a few thoughts. Just forgive me if I disappear again for a bit, please.

Training costs are immense. I 100% agree, and this fact is pointing me to look at investing in ML infrastructure companies.

I ask myself, what % of world compute will be for AI in the near to medium term? Long term, looks like very close to 100%. Medium term must be somewhere between here and there. Seems like a good business, since the demand is there.

A lot of the bot training will be done in sim, which decreases costs.

UPS, since the early 2000s, has been trying to automate the package loading of trucks. They spent a lot of money and effort engineering conveyor belts, computer-read barcodes, etc. to get packages onto trucks without "package handlers," aka people, picking them up and putting them in order on a truck. I see zero reason why something like this requires online learning.

For example:

Assign bot 1 to trucks 37a, 56b, and 42c. Tell bot 1 to scan all packages coming off of the conveyor belt. All packages that match assigned duty trucks are to be loaded in an order convenient for the driver. This requires a general ability to pick up and place packages of any shape, size, and weight that come through the line. It requires the bot to be able to recognize the location of the line and the truck. It requires the bot to move between these locations carrying the packages. It requires the bot to place the packages securely in a location that is satisfactory for the package deliverer for efficient unloading.

Maybe I forgot some steps in there, but the "handling" of the package is something that can be trained. The movement to trucks 37a, 56b, and 42c requires locational awareness. The locational awareness can easily be programmed*

*this all assumes tesla can solve, sufficiently for tasks like these, spatial intelligence. And then implement control code to execute.

My whole contention was specific to online/continuous learning and deep learning. I think you’ve cleared up your deep learning concerns, but I don’t see the online/continuous learning issue. Just because a bot has never seen a particular warehouse before, doesn’t mean it doesn’t know how to navigate through it while carrying boxes.

Does that make sense?
 
That is how I see things. For FSD, the road environment is complex, sim is of little use, and it is hard to bootstrap the initial training: cars need to drive around and navigate in the real world and experience its complexity.

For many other types of training, we can use barcodes, labels, and mapping of the environment to aid the bot; we can also video humans doing the activity, and example objects with labels attached can be presented to the bot. Simulation is also more useful in a constrained environment.

It may still take a team of developers 5-10 years to train a bot to do a particular task well, and there may be millions of dollars of Dojo compute time needed to do the training.

But once we have a bot trained to do Task X, we can deploy as many bots to do Task X as we need.
So the question is the number of bot-hours expended worldwide on a particular task, and the economic value those bots add when doing the work, compared to the training cost.

My hunch is it will not be hard for the economic value of the work the bots are doing to exceed the cost of training, because thousands or millions of bots will eventually be doing that task.
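A rough break-even sketch of that hunch, with purely illustrative numbers for the training cost and the value added per bot-hour:

```python
def breakeven_bot_hours(training_cost: float, value_per_bot_hour: float) -> float:
    """Fleet-wide bot-hours of work needed before the economic value of the
    work equals the one-off training cost for that task."""
    return training_cost / value_per_bot_hour

# Illustrative only: $10M of Dojo compute time for the task, and $20 of
# economic value added per bot-hour of work.
hours = breakeven_bot_hours(10_000_000, 20)
bots = 10_000
print(f"Break-even: {hours:,.0f} bot-hours "
      f"(~{hours / bots / 24:.1f} days of round-the-clock work "
      f"for a {bots:,}-bot fleet)")
```

Under these assumed numbers a ten-thousand-bot fleet pays back the training run in a few days, which is the sense in which fleet size dominates the economics.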

I didn't mean to imply training the bots is a simple, quick, and cheap process, but for many tasks the vision environment should be less complex than for FSD. For starters, in a factory, the company that owns the factory has a lot of control over the visual environment. When a bot is employed in a domestic situation, I would expect one of the first tasks it does is to map the house and gardens.
 
Reading this thread, my mind wandered to how much better my Roomba might be using DOJO. Would iRobot want to harness NN for such a simple task? Would Tesla benefit from using a wide base of already installed robots for some real world training?

When I was building a robot to pick up doggie do (the Poomba), I quickly realized there were very few ready-to-use components, software, and help available. It was a fine distraction for a while, but eventually the parts were donated to the local college's robotics team.

If Tesla had DOJO as a service, I could see many robot projects becoming successful.
 

Google already has NN cloud as a service, but at a lower level.
 
I just want a bot for home security to block our bedroom door with a shield / call police and to be around my grandmother while I’m at work in case anything happens. For that alone I’d easily pay $100,000
Funny you say that. I’ve been working on a product that’s not at all a robot but that you would be a target customer for — watching after vulnerable relatives and their health is going to be a big industry. I’ve been thinking about how if the tesla bot can do something like what you described, my product would be supplanted almost immediately. I hope that happens.
 
Recently folks here wondered how Tesla Bot will be trained and how long it will take. The likely answer is: By two methods and not long.

The two methods:

1) Simulation. Tesla talked about this at AI Day One. Their simulated driving environments are so close to reality that FSD can be safely trained for many dangerous situations before the training is refined by fleet data. Other companies have demonstrated other simulated environments with incredibly accurate physics of various objects (solids, fluids, gases, guns and butter).

2) Observation. Academic researchers have already demonstrated a robot hand that can learn a task by watching a human hand, using a single camera. Tesla likely knows about this work, or has hired the researchers.

Both methods will be very fast. Training in a simulation proceeds at the speed of supercomputers, allowing multitudinous practice runs in a day. Training by observation proceeds in real time, but a robot never forgets, and can share its training with millions of other robots almost instantly.

The following videos are cued to demonstrations of the two methods.


 
The Observation Method is the way, and here's why. In real life it's called Peer Training and is the method most commonly in use today. Just as Optimus is in human form to fit quickly into our existing world, the Observation Method is an easy transition, if the robots are up to the task, because that's how training is set up today. The eLearning portion of the training would be gulped down in an instant, followed by hands-on Peer Training with Demos and Shadow Mode (same term as FSD, same meaning too). Class Size = 1.

It still blows my mind how a single trained robot could replicate a new skill immediately across the fleet. Training Time approaches zero for volume production scenarios, Learning Efficiency skyrockets, and Process Control Charts naturally adjust to a new standard. Factory Ramps are faster by at least a couple of weeks per line; the list goes on.

In the Factories I've worked in, production lines had a spec for the Procedures at each Operation, but I have yet to find a single operation where the person was following the spec 100%. Many times they cut corners and create manufacturing defects without knowing it (either because it's easier or because it provides a higher output). So getting the "Procedure" exactly right (or exactly close, but consistent) across all robots, all shifts, and all Factories is Training Heaven! It also means we need half the HR staff (which is sadly where companies stick their training groups, and not in Operations).

Wow, the improvement cycles would likely accelerate because we remove the human variable. Experiments will likely yield more accurate results due to the consistency of the tests performed. Place the robots into an R&D role, and with additional objectives and background they could improve the methods. At that point, Training Time = ZERO: training is completely eliminated because the Robots themselves figured out a better way to do it.
 

Tesla's Optimus Bot hands may be a clue
I wasn't expecting a solid metal bot, but now that I see it, I suppose it makes a lot of sense. The outer plastic/fabric shell will be easily replaced and non-structural.