
Project Dojo

What do you think they are doing with this Project Dojo?

One simple answer might be that they will just upload a crap ton of data without doing any manual labeling, and assume the driver's input is the ground truth.

I.e., the driver's input is the label. Even though in some cases the driver will do the wrong thing, that's such a small percentage of the data that they expect to get most of the way to good accuracy without worrying about it.

BTW, I surmise the reason they are going to do this is that they are still using the NNs mainly for perception, which may be good enough. But I would guess the next-gen chip is expected to do more end-to-end deep learning for driving, and that requires much more data and compute power (for both training and inference).

That's their next-level goal.
 
How do they filter out the "bad" drivers? If the NN thinks an action by a horrible driver is good data, then it could turn into a cluster. I would be interested in how they deal with this.
Covariance of deviation from the lane center, mean(abs(jerk)), ratio of eyes on/off the road, rate of hard braking, mean(abs(deviation from speed limit)), etc., etc.
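
Something like the sketch below could turn each drive into a handful of driver-quality signals. It's only a rough illustration; the column names, sample rate, and thresholds are all invented here, not anything Tesla has described.

```python
import pandas as pd

def driver_quality_features(log: pd.DataFrame) -> dict:
    """Summarize one drive into a few "how good is this driver" signals.

    Assumes hypothetical columns sampled at 1 Hz: lane_offset_m, jerk_mps3,
    eyes_on_road (0/1), brake_decel_mps2, speed_mps, speed_limit_mps.
    """
    hard_brake = log["brake_decel_mps2"] > 4.0  # deceleration threshold is a guess
    return {
        "lane_offset_var": log["lane_offset_m"].var(),
        "mean_abs_jerk": log["jerk_mps3"].abs().mean(),
        "eyes_off_ratio": 1.0 - log["eyes_on_road"].mean(),
        "hard_brakes_per_min": hard_brake.sum() / (len(log) / 60.0),
        "mean_abs_speed_dev": (log["speed_mps"] - log["speed_limit_mps"]).abs().mean(),
    }
```

Drives that land in the outlier tail on several of these could simply be dropped before their clips are used as labels.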
 
I suppose the NN can learn from the average of all the data where the outliers are and what their driving characteristics look like. Honestly, I think identifying a bad driver would be one of the easier tasks for a learning machine, given all of the sensors in the car.
 

That has to be a powerful filter. I tried googling but can't find it; I thought I read a stat that the average driver is a horrible driver. If that is the case, they probably employ additional methods... I would hope. Sorry, point being, I wish they had covered that during the talk.
 
No reason to be sorry. I think the average driver probably is pretty bad. I’m not sure where Tesla owners lie on the scale of average vehicle drivers. I think most luxury/premium car drivers would be slightly above average in general due to various factors. But over time and enough sample data I would think the NN could suss out the difference. It doesn’t need to follow the exact actions of its sample data, but filter out enough bad data to avoid poisoning the well and draw its own conclusions. I’m just guessing here as I have no more information than the average guy.
 

Yep me too. I hope Elon explains that some day.

I had to laugh at your luxury/premium car drivers being better drivers. I actually think the opposite. We all know the BMW driver stereotype... lol
 
How do they filter out the "bad" drivers? If the NN thinks an action by a horrible driver is good data, then it could turn into a cluster. I would be interested in how they deal with this.


They don't need to if the ratio of good to bad is high. Optimization will still arrive at the ideal weights, albeit converging a little slower than if all the data were pure.
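
A toy illustration of that point (just least squares on mixed labels; nothing to do with Tesla's actual training): with 90% reasonable drivers and 10% erratic ones whose inputs are unrelated to the situation, the fit still lands close to the "good" behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.uniform(-1, 1, size=(n, 1))       # some driving-context feature
good = 2.0 * x[:, 0]                      # the "ideal" control response to that context
bad = rng.uniform(-3, 3, size=n)          # erratic drivers: input unrelated to context

is_bad = rng.random(n) < 0.10             # assume ~10% of the fleet drives badly
y = np.where(is_bad, bad, good)           # the driver's input is taken as the label

w = np.linalg.lstsq(x, y, rcond=None)[0]  # fit on the mixed data
print(w)                                  # ~1.8, close to the "good" slope of 2.0
```

The bad 10% mostly just adds noise that washes out with scale, which is exactly the "converges a little slower" point.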
 
I watched the whole presentation. Dojo was mentioned in passing, basically in response to whether HW3 was inference-only or could be used for training. Dojo is meant to be used for training (hence the name 'Dojo'), and in particular my recollection was that it was meant for training neural networks on video. Currently, most of the neural networks take individual images as input. You can imagine a video of many seconds could be orders of magnitude larger as a single input: if an image is N bytes, then a ~30-second clip at 30fps is roughly 1000 times larger. They'll probably use different neural net architectures for this, though, probably some combination of their current image network and LSTMs (a rough sketch of that combination is below).

It may also be necessary to do the depth mapping that Karpathy talked about. Here is the paper

https://arxiv.org/pdf/1904.04998.pdf
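
For what it's worth, here's a rough sketch of the "per-frame image network feeding an LSTM" idea. The layers and sizes are invented toy values, not Tesla's architecture.

```python
import torch
import torch.nn as nn

class VideoDrivingNet(nn.Module):
    """Toy sketch: run a small CNN on every frame, feed the features to an LSTM."""

    def __init__(self, feat_dim=256, hidden_dim=512, n_outputs=10):
        super().__init__()
        self.cnn = nn.Sequential(                       # stand-in for the per-image backbone
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_outputs)

    def forward(self, clip):                            # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # CNN on every frame
        seq, _ = self.lstm(feats)                       # temporal aggregation
        return self.head(seq[:, -1])                    # predict from the last time step

out = VideoDrivingNet()(torch.randn(2, 30, 3, 96, 96))  # two 30-frame clips
print(out.shape)                                        # torch.Size([2, 10])
```

The point is just that the per-frame compute multiplies by the clip length, which is why video training needs so much more horsepower than single images.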
 

Yup, people don't seem to realize they've only scratched the surface of camera-based inference for driving. As I mentioned elsewhere, Jimmy D was very excited when they were inputting not one but two(!) images into the network. Multi-image / video inputs will be difficult to train but likely pay off in accuracy upgrades.
 
My guess/take on Project Dojo is that it's a Tesla project for automating builds, with a heavy focus on training neural networks on massive amounts of data. Think Nvidia DGX/DGX-2 or AWS. Whether it's a physical computer at Tesla or in the cloud, I don't know. From the way he talked about it, I got the feeling it was in-house, but IMO it makes more sense to put it on AWS/Azure (like OpenAI).
 

I wouldn't be surprised if it uses both in-house and cloud compute. Using the cloud for all processing is expensive when there's a predictable baseline of demand you could build in-house capacity for; in-house is usually significantly cheaper than cloud hosting. On those occasions when you need a burst of capacity, you can scale out to the cloud at a higher price for the same capability.
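
Back-of-the-envelope, with completely made-up prices, the break-even looks something like this (treating on-prem as a roughly fixed hourly cost and cloud as pay-per-use):

```python
# Every number here is invented purely for illustration.
cloud_per_gpu_hour = 3.00          # $/GPU-hour, on-demand
onprem_capex_per_gpu = 15_000      # $ per GPU incl. server/network share, 3-year life
onprem_opex_per_gpu_hour = 0.40    # power, cooling, staff share

amortized = onprem_capex_per_gpu / (3 * 365 * 24)   # ~$0.57/hour, paid whether used or not
onprem_per_gpu_hour = amortized + onprem_opex_per_gpu_hour

breakeven_utilization = onprem_per_gpu_hour / cloud_per_gpu_hour
print(f"on-prem: ${onprem_per_gpu_hour:.2f}/GPU-hour; "
      f"beats on-demand cloud above ~{breakeven_utilization:.0%} utilization")
```

So the steady baseline load goes on owned hardware and only the bursts go to the cloud, which is exactly the hybrid split described above.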
 

Using cloud compute resources is only expensive if you've already got datacenter space you own or lease, you have the staff to manage and maintain it, and the depreciation/replacement cycle of the hardware is useful to your business' bottom line. None of these are the case for Tesla, so they use AWS.

What would change this equation is if they had specialized, custom hardware used to tag content and train their NN. Since they already get access to Nvidia's latest hardware with a cloud provider, this doesn't seem like it'd be a very worthwhile investment. It doesn't seem likely that they'd do streaming NN updates to their entire fleet worldwide, so it's improbable that they'll need to do hundreds of builds per day. Given that assumption, I don't think they need some super purpose-built platform for any of this. They've got access to amazing talent over at SpaceX when it comes to using CUDA to produce fantastic results, so I feel like they'd just as likely contract with the team over there to get what they need.
 
Thinking about it a bit more, a hybrid solution makes a lot of sense. Some things are better run locally, but once they need thousands of extra GPUs or TPUs they can get them from the cloud. And when they need to do hardware-in-the-loop (HIL), like the 4x HW3 setup they showed in the presentation, they can do that locally. Also, Amazon recently started to offer better hybrid solutions.
 
Been thinking a little about this. I think Tesla going forward will use a lot of self-supervised data; at the scale they need, that is the only way. Remember green's post about needing 100,000x the amount of data for AKNET_V9. One way of doing this is using the future, the past, and other cars to draw inferences about the current situation.

For example:
* Is the sign 100m in front of us a 60mph or a 90mph sign? Fast-forward a few seconds and their own NN will be able to annotate it with very high confidence (a toy sketch of this idea follows after the list).
* Where are the lane markings when it is dark, snowy, and hard to tell? Another video stream from the same place during easier weather conditions can label it for them; use normal computer vision methods to align frames from the two different videos.
* Which red/green light is the relevant one for us? Gather 128 videos from the same intersection and see which one our car and other cars obeyed 100% of the time.
* If our network's path prediction doesn't match the driven path, ask for more data from the same road and see where everyone else was driving.

Etc
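
A toy sketch of the first bullet: use a later, high-confidence read of the same sign track to label the earlier, far-away frames. The record format and thresholds are invented.

```python
# Hypothetical per-frame detections of one tracked sign:
# (timestamp_s, track_id, distance_m, predicted_limit_mph, confidence)
detections = [
    (0.0, 7, 102.0, 60, 0.55),
    (1.5, 7, 61.0, 90, 0.70),
    (3.0, 7, 18.0, 90, 0.98),   # close up: high-confidence read
]

def auto_label(track):
    """Label every frame of a sign track with its most confident read."""
    best = max(track, key=lambda d: d[4])
    if best[4] < 0.95:
        return []                # not confident enough; send to humans instead
    return [(t, dist, best[3]) for (t, _, dist, _, _) in track]

print(auto_label(detections))
# [(0.0, 102.0, 90), (1.5, 61.0, 90), (3.0, 18.0, 90)]
```

The far-away frames that the network originally misread as 60 now get a 90 label for free, with no human in the loop.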

Also, whenever the network has improved, use the newer network to find inaccuracies in the dataset. If there is major disagreement, ask the fleet for data about that location and clean the dataset. Some of these cases might have to be monitored by humans.

Use humans to verify that the added/cleaned data seems balanced and that performance on a test dataset is improving over time.
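
And a toy version of the "use the newer network to find inaccuracies" step: flag every stored label the retrained model now strongly disagrees with, then send those samples for fleet re-collection or human review. The numbers are made up.

```python
import numpy as np

stored_labels = np.array([0, 1, 1, 0, 2, 1])   # labels currently in the dataset
new_model_probs = np.array([                   # retrained model's class probabilities
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],   # strong disagreement with stored label 1
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
])

confidence_in_label = new_model_probs[np.arange(len(stored_labels)), stored_labels]
needs_review = np.where(confidence_in_label < 0.2)[0]
print(needs_review)   # [2] -> ask the fleet / a human to recheck these samples
```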