Tesla AI Day - 2021

Maybe Dojo will be a bust, but if it isn't, Tesla now has amazing AI supercomputer technology (and it isn't only in the chip design - it is also in the huge communications bandwidth, the compiler software, the packaging, the cooling, the power electronics, etc.) which can be leveraged for god knows what.
Yes, absolutely. Knowing Elon's track record, I get why they were tempted to go down this path. And I'm not even saying that it was the wrong decision. Like you said, maybe it will be a bust and maybe it won't. If it ends up successful, it will almost certainly lead to a spin-off so they can commercialize those compute capabilities.

It is still a bit risky, and this has certainly not been a cheap endeavor. Also, keep in mind that once you commit to this, you have to budget to keep funding all this talent and R&D, because you have to keep improving year over year while competitors like nVidia, Google, and other startups are all hyperfocused on this stuff as well. It's an interesting experiment right now and I'm curious to see how it pans out. I'm hoping it is a success and they can spin it off successfully. If not, I doubt they will have the appetite to keep going down this path, as it is going to cost a lot of money to keep it alive each year.
 
It is still a bit risky, and this has certainly not been a cheap endeavor. Also, keep in mind that once you commit to this, you have to budget to keep funding all this talent and R&D, because you have to keep improving year over year while competitors like nVidia, Google, and other startups are all hyperfocused on this stuff as well.

It's good to see Tesla spending R&D money on anything they deem necessary, rather than hoarding hundreds of billions like Apple and Google, lol.
 
Yes, absolutely. Knowing Elon's track record, I get why they were tempted to go down this path. And I'm not even saying that it was the wrong decision. Like you said, maybe it will be a bust and maybe it won't. If it ends up successful, it will almost certainly lead to a spin-off so they can commercialize those compute capabilities.

It is still a bit risky, and this has certainly not been a cheap endeavor. Also, keep in mind that once you commit to this, you have to budget to keep funding all this talent and R&D, because you have to keep improving year over year while competitors like nVidia, Google, and other startups are all hyperfocused on this stuff as well. It's an interesting experiment right now and I'm curious to see how it pans out. I'm hoping it is a success and they can spin it off successfully. If not, I doubt they will have the appetite to keep going down this path, as it is going to cost a lot of money to keep it alive each year.

Worst case, they sell it to Intel for 10x the money invested 😀

BTW, I'm at the point in my re-listening where Karpathy points out that their vector space representation of the world is a very dense raster which must then be consumed/analyzed by the car's AI chip. They think they can optimize this quite a lot by producing a much more compact and easier to analyze representation. That alone would reduce AI chip load by quite a factor. I could easily see a 2x or 4x gain by this.
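For a rough sense of scale (the grid size, channel count, and object count below are invented for illustration, not Tesla's actual tensor shapes), here's a back-of-envelope sketch of what the downstream consumer has to chew through in each case:

```python
# Back-of-envelope comparison with made-up numbers: a dense bird's-eye-view
# raster vs. a compact per-object summary of the same scene.

def dense_raster_values(height=200, width=200, channels=16):
    """Values the downstream consumer must chew through if the world is a dense grid."""
    return height * width * channels

def compact_object_values(num_objects=50, attrs_per_object=12):
    """Values if the world is summarized as a short list of objects/lanes."""
    return num_objects * attrs_per_object

dense = dense_raster_values()      # 640,000 values per frame
compact = compact_object_values()  # 600 values per frame
print(f"dense raster: {dense:,} values, compact: {compact:,} values "
      f"({dense / compact:.0f}x fewer to hand off)")
```

The raw ratio is of course far larger than the 2x or 4x end-to-end gain I'd guess at, since the chip still has to run the networks that produce the representation; the point is just how much smaller the hand-off to the consumer becomes.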

I'm still working my way through, but I'm beginning to see that Tesla has like a v0.5 of what a production vision architecture would be using standard deep learning methodologies. There is a ton of optimization and enhancement they can do, and will do over time.

What did you think of the way they handle "video"? I was expecting them to actually consume video, but that isn't what they are doing. They are still doing image-by-image classification/extraction/detection, but have augmented it with a vector-space memory of features. Clever trick, and it seems to work. But is there no real video NN out there?
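To make the "per-frame features plus a memory" idea concrete, here's a minimal toy sketch (not Tesla's actual architecture; every shape and size here is invented) of the general pattern: a per-frame image network whose features get pushed into a fixed-length queue, with a small recurrent module fusing the queue so the output carries some memory of past frames:

```python
import collections
import torch
import torch.nn as nn

class FrameWithMemory(nn.Module):
    """Toy illustration only: per-frame features pushed into a fixed-length
    queue, then fused by a small recurrent module (a stand-in for whatever
    temporal fusion is actually used)."""

    def __init__(self, feat_dim=64, queue_len=8):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in for the per-image network
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.queue = collections.deque(maxlen=queue_len)  # temporal feature queue
        self.fuser = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 4)           # tiny stand-in prediction head

    def forward(self, frame):                        # frame: (1, 3, H, W)
        feat = self.backbone(frame).flatten(1)       # (1, feat_dim) per-frame feature
        self.queue.append(feat)
        seq = torch.stack(list(self.queue), dim=1)   # (1, T, feat_dim), T <= queue_len
        fused, _ = self.fuser(seq)
        return self.head(fused[:, -1])               # prediction sees memory of past frames

model = FrameWithMemory()
for _ in range(3):                                   # feed a few consecutive "frames"
    out = model(torch.randn(1, 3, 64, 64))
print(out.shape)                                     # torch.Size([1, 4])
```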
 
I'd love to hear your expert insights on this topic rather than snide personal attacks if you have any to offer.
Sorry. I did appreciate many of your other numbered points, but your comment about speed improving by 'some amount' sounded like you didn't think their new hardware speed stats were important for processing these complex NNs. Obviously, they are using machines now (almost 10,000 Nvidia A100 GPUs) to get the FSD Beta out every few weeks, so they know what their criteria and needs are.

“We’ve been scaling our neural network training compute dramatically over the last few years,” said Milan Kovac, Tesla’s director of autopilot engineering. “Today, we’re barely shy of ten thousand GPUs. But that’s not enough.”

They were very specific about their criteria and about avoiding bottlenecks at every possible point. They talked about hardware optimized for their specific processing needs [supporting FP32, BFP16 (aka bfloat16, or brain floating point), and a new format called CFP8 (“configurable FP8”)] vs. generalized GPUs. I've read several expert reviews of their presentation on the Dojo hardware, and they are all vastly different from your own snide :) quip about 'some amount'.
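For anyone curious what those formats are: bfloat16 is essentially FP32 with the mantissa chopped down to 7 bits (same dynamic range, much coarser precision), and 8-bit float formats shrink things further still. A tiny sketch (using truncation instead of proper rounding, for simplicity):

```python
import struct

def fp32_bits(x):
    """32-bit IEEE-754 pattern of a Python float: 1 sign | 8 exponent | 23 mantissa."""
    return f"{struct.unpack('>I', struct.pack('>f', x))[0]:032b}"

def to_bfloat16(x):
    """bfloat16 keeps FP32's sign and 8 exponent bits but only 7 mantissa bits;
    here we just truncate the low 16 bits of the FP32 pattern."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0] & 0xFFFF0000
    return struct.unpack('>f', struct.pack('>I', bits))[0]

x = 3.14159265
print(fp32_bits(x))               # full 32-bit pattern
print(x, "->", to_bfloat16(x))    # 3.14159265 -> 3.140625 (same range, less precision)

# FP8 formats squeeze this into 8 bits total (sign + a few exponent + a few
# mantissa bits); the "configurable" part of CFP8, going by the name, would be
# that the exponent/mantissa split can be varied to suit what is being trained.
```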

“For Dojo, we envisioned a large compute plane filled with very robust compute elements, packed with a large pool of memory, and interconnected with very high-bandwidth and low-latency fabric,” Venkataramanan said. “We wanted to attack this all the way – top to bottom of the stack – and remove all the bottlenecks at any of these levels.”

One of many articles on it.
 
Sorry. I did appreciate many of your other numbered points, but your comment about speed improving by 'some amount' sounded like you didn't think their new hardware speed stats were important for processing these complex NNs. Obviously, they are using machines now (almost 10,000 Nvidia A100 GPUs) to get the FSD Beta out every few weeks, so they know what their criteria and needs are.

They were very specific about their criteria and about avoiding bottlenecks at every possible point. They talked about hardware optimized for their specific processing needs [supporting FP32, BFP16 (aka bfloat16, or brain floating point), and a new format called CFP8 (“configurable FP8”)] vs. generalized GPUs. I've read several expert reviews of their presentation on the Dojo hardware, and they are all vastly different from your own snide :) quip about 'some amount'.

One of many articles on it.
My comment wasn't a snide quip at all. I think I qualified everything extensively in my description and reasoning prior to the comment about some speed-up. And I used that term because, as Elon says, the proof will be in the pudding; Dojo is not a solved problem yet, and there is still a lot of engineering on both the hardware and software/compiler side that remains to be done. There are the aspirational compute capabilities they hope to achieve, but it is still very early days, and it's reasonable to take those targets with a grain of salt until they get much closer to having a larger integrated system and are running established ML benchmarks.

Yes, Dojo, if successful, will certainly be beneficial to Tesla. But I still contend that it isn't critical, and it is a fairly risky proposition. I hope it pans out for them, but it isn't like nVidia is sitting around not improving their product. In fact, they already have something very similar to CFP8 and have honestly driven most of the work in reduced-precision network training and inference over the past 5+ years. If 1,000 A100s aren't enough, then getting 1,000 more is still a viable, low-risk solution.

Anyways, it's rather frustrating to be personally attacked around these forums over one sentence taken out of context from a detailed post, with people acting outraged over it just because it doesn't jibe with all the hype and kool-aid. My post, in its totality, was extremely positive on everything Tesla is doing, even with regards to Dojo. I was just stating my reservations. I hope they succeed, but it isn't nearly a sure thing yet.
 
Yup, Elon said as much. The AI Day slide said 4x performance, whatever that means.

But I see orion2001's perspective. From Elon's commentary, Dojo is still in experimental territory. It might end up being vaporware if the software team still clings to the GPU cluster.
I might be reading into things, but I thought Elon’s remark about Dojo only being a success if the team could turn off the GPUs and stop using them at all in favor of Dojo was him expressing a little doubt himself. The guy talking about the Dojo chip (iirc, his name was Ganesh?) was very enthusiastic, but I thought Elon’s remark, and the one about the proof being in the pudding, was very much him saying: “Prove it”.
 
Dan D. said:
Can vision do everything at 70mph without any range-finding sensors?
Humans can. The NN should be able to as well.

Humans and ML NNs are not (yet) really equivalent. Humans don't use fused instantaneous vision snippets. I don't need to stitch together several views; we have continuous vision. I have the memory of approaching a long dump truck and trailer, so I know that's what's now beside me, and I can see one approaching from behind. I know its distance and the relative speed it's going, well enough to understand and react to the situation. I see very little actual intelligence, understanding, and long-term memory from these NNs. Maybe one day.

I can't see everything that's beside me or behind me when I'm parking at night. Often I only know what's there because I've just driven past it, and I have to remember where the hazards are; I can't see them while parking. The reverse lights help a bit, but there's no lighting on the sides. You just have this 'sense' of the relative position of objects and where they all are, without needing to keep seeing them. I presume FSD will also have a 'memory' of where things were when its vision cameras don't show them anymore.

I still doubt that a vision-only system, with its occlusions, can really sense all the detail needed for autonomous driving, fancy graphic presentations notwithstanding. Nothing in the FSD Beta drives has yet proven it's in every way capable of doing it well. Maybe it's a camera hardware problem, I don't know.
 
Anyways, it's rather frustrating to be personally attacked around these forums over one sentence taken out of context from a detailed post, with people acting outraged over it just because it doesn't jibe with all the hype and kool-aid. My post, in its totality, was extremely positive on everything Tesla is doing, even with regards to Dojo. I was just stating my reservations. I hope they succeed, but it isn't nearly a sure thing yet.
Thank you for your thoughtful further explanations in several posts. You are right about the totality and I overreacted. I apologize.
 
The presentations by Karpathy and his team were fantastic and super informative. Maybe AI Day should have stopped after Dojo. End on a high note as George Costanza would say. The Tesla Bot, especially the intro with the guy in spandex dancing, was a jump the shark moment IMO. It was a bit too far. The Tesla Bot sounds like classic Elon hype. The reality is probably that it will take longer than expected to actually work as advertised, as we've seen with FSD.

If the Blue Man Group can go out there and do their thing, I didn’t see much difference with the Tesla Bot. There were a few moments of levity during the presentation, and I’d consider this one of them. I laughed at the simulation when the FSD Tesla was halted at an intersection and a car being chased by numerous cop cars with lights on appeared. I also thought Elon’s comment about making the Bot run slower than a human, so it could be overpowered by one, was rather comical. Light moments in an otherwise technical presentation that lasted just over 2 hours.
 
Lex Fridman has a breakdown and overview of key points. He is well respected, and his knowledge and understanding come through in the variety of interviews he has done.

Lex Fridman -- About
Research in human-centered AI and deep learning at MIT and beyond in the context of autonomous vehicles and personal robotics. I'm particularly interested in understanding human behavior in the context of human-robot collaboration, and engineering learning-based methods that enrich that collaboration.

I received my BS, MS, and PhD from Drexel University where I worked on applications of machine learning, computer vision, and decision fusion techniques in a number of fields including robotics and human sensing.

Before joining MIT, I was at Google working on machine learning for large-scale behavior-based authentication.

 
I ended up staying up pretty late watching the whole thing last night. I work in a similar field of research (not self-driving, but computer-vision/object-detection/segmentation/tracking, etc) and really enjoyed the talk. Here are just some general thoughts and notes I had as I watched it:

1. I've always admired Andrej Karpathy as a very pragmatic, no-BS ML researcher and I continue to be glad that he is leading this effort. Not to take anything away from the entire team, but the overall direction and approach they have taken, not just in the sense of where they are going with their ML approaches, but also the spectacular amounts of engineering they have done to build up the right tooling and infrastructure to help them iterate fast is quite something.

2. The evolution of their networks, architectures, and training approaches is all quite sensible, but they are extremely complex and require a ridiculous amount of data to train, which is why Tesla has built out all the amazing infrastructure on the labeling/reconstruction side of things, as well as with simulations.

3. Their approach to spatial and temporal memory is a nice and sorely needed upgrade, and the performance improvements in the demos were very nice to see.

4. The policy learning section was really neat and I'm sure there is a lot more neat stuff under the hood.

5. Very excited to see where they have gone with simulations. The imagery looks quite impressive. It is very tricky, because if your visuals don't look near-identical to the real world, the networks are so powerful that they can just learn to do well on synthetic imagery and then fall apart on real-world imagery. This is a common challenge with folding in synthetic data from simulations, so it was nice to see the extent of work they have done and continue to do with exploring neural rendering to try and bridge the gap between simulation and reality (a rough sketch of how synthetic data typically gets folded in is at the end of this post).
- I did note that the vehicle dynamics of cars/trucks, etc. in the simulations were fairly nonexistent in terms of body roll, etc. But that's a second-order concern at the moment and unlikely to matter too much for what they are trying to do.

6. It's clear that they are starting to hit the limits of being able to fit all the inference for these complex networks onto their current FSD hardware. In the diagram they showed in one of the slides, it's clear that they are now leveraging both compute units to try and squeeze out the needed performance (rather than the original vision of having the 2nd unit for redundancy). I wonder how they are going to handle the inevitable fragmentation when they release new hardware and have to support two separate sets of FSD/AP hardware and associated networks, because they certainly aren't going to foot the bill for upgrading their entire fleet.

7. The amount of crazy custom engineering that has gone into all aspects of their entire pipeline... from labeling tools and infrastructure, to training networks, simulations, and regression testing of their models... is quite spectacular.

8. Their offline processing of videos to generate ground-truth 3D point clouds and other assets, plus auto-labeling, is extremely impressive. Anyone who works in that kind of field probably appreciated even more just how neat it is.

9. They've definitely made the right decisions early on in how they set up the closed loop between the experiments they want to run, or the data they want to collect, and their vehicle fleet. Tying hard/weird examples in with simulations to generate a large number of variants of a single instance of a real-world "weird" scenario is a logical extension and really neat to see, because it helps address the problem of the long tail of weird scenarios. It doesn't fundamentally solve it, because the long tail in the real world is really long, but with time this will still improve FSD's overall ability to handle weird scenarios.

10. I feel like they were a bit disingenuous with their Dojo presentation. There was very little distinction made between what they have built, tested, and benchmarked to date vs. what was aspirational with regards to Dojo. They have only just gotten one working D1 on a benchtop, on which they managed to train a small GPT model, but the slides would have you believe they have this huge room-sized cluster almost built up and ready to go. During the Q&A session, a researcher working on compilers for distributed computing systems asked whether Tesla had managed to solve a very hard problem with such systems that is an active area of research in academia, and the reply from the Tesla counterpart was very wishy-washy, basically saying no, it's hard, but we think we can solve it. How Dojo eventually shakes out is still a big unknown at this point.

11. While the inner geek in me loved the Dojo stuff, and I'm sure anyone working on computing hardware would absolutely love to see someone spending time and a lot of resources on developing a whole new architecture, I have to say that taking a step back and looking past all the hype, it still isn't clear to me whether investing all of this effort into Dojo is really all that beneficial to Tesla. It feels more like someone said "this would be cool if we did it," and they just ran with it. At the end of the day, even if they meet all their goals for performance and efficiency, it's not like Dojo is going to let them train fundamentally new types of networks that are impossible to train on any other hardware. The Googles, Facebooks, and OpenAIs of the world are doing just fine training extraordinarily complex ML systems on conventional clusters of nVidia GPUs or TPUs. Even Tesla to date has managed to do everything they have achieved so far on such clusters. All Dojo will let them do is train things faster by some amount, and at a lower cost. It isn't a fundamental game-changer that opens up new directions and opportunities for Tesla. I wonder if the company wouldn't have been better served building out a regular GPU computing cluster and allocating all these resources elsewhere. Still curious to see how it shakes out, but it still seems like a bit of a gamble to me with very questionable long-term benefit.

All in all, it was a neat presentation and everything looks to be going in a better direction; it certainly doesn't look like Tesla is just stuck spinning its wheels. I'm still very skeptical of them getting anywhere near L5 anytime soon, but I do feel fairly confident that they will have an extremely solid AP, even in city conditions, with the progress they are making. I still wish they had just taken the plunge and augmented with automotive lidar, but I guess that's not going to happen due to the optics of switching up their sensor suite so late in the game and the fact that they have already sold FSD to so many customers with the current camera suite.
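As mentioned under point 5, here is a toy sketch of one common way synthetic data gets folded into training: cap the share of simulated samples per batch so the network can't just specialize to rendering artifacts. The pools, ratio, and helper here are all invented for illustration; I have no idea what mix Tesla actually uses.

```python
import random

def mixed_batch(real_pool, synth_pool, batch_size=32, synth_fraction=0.25):
    """Illustrative only: draw a training batch with a fixed cap on how much
    of it is synthetic, one simple guard against overfitting to sim imagery."""
    n_synth = int(batch_size * synth_fraction)
    n_real = batch_size - n_synth
    batch = random.sample(real_pool, n_real) + random.sample(synth_pool, n_synth)
    random.shuffle(batch)
    return batch

# toy usage: tuples stand in for (image, label) pairs
real_pool = [("real", i) for i in range(1000)]
synth_pool = [("synth", i) for i in range(1000)]
batch = mixed_batch(real_pool, synth_pool)
print(sum(1 for src, _ in batch if src == "synth"), "synthetic samples out of", len(batch))
```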

Very insightful perspective, @orion2001, thanks.

Re: your point 11 (Tesla building its own, proprietary supercomputer), initially I agreed. But upon reflection, perhaps Tesla's thinking is that AI is at the core of all their future products, like Tesla Bot, and leaving the key enabler for such work to third parties is too high a risk / too low a return.

Superchargers are another example. Back in 2012, I would have questioned why Tesla decided on the huge investment to build a proprietary global charging network (are u nuts?!?). But given how slow and milquetoast public charging network development has been since then, who amongst us would question Tesla's gamble today?
 
Not much discussion on the Tesla Bot so far in this thread. Two simple questions I had after the AI Day presentation.

First, as far as applying/porting all the Tesla FSD techniques to robot navigation through its working environment, it strikes me that the challenges being solved for autonomous driving are far greater than what a Tesla robot would need. Navigating around a workplace, I would think, is much easier than autonomous driving. If Tesla partners with Dennis Hong half as well as they have with Jeff Dahn, perhaps a Tesla bot may be realized in the next few years, not 5-10?

Second, can someone explain the part of the presentation that went into detail on how they are creating very accurate 'simulations' to supplement the data collection, manipulation, and labeling work that I thought was the entirety of the NN FSD project? Is the point to create artificial and very unlikely edge cases to train the NN with, rather than only depending on the actual edge cases reported from all the Teslas driving with AP and FSD hardware?
Thanks.
 
Second, can someone explain the part of the presentation that went into detail on how they are creating very accurate 'simulations' to supplement the data collection, manipulation, and labeling work that I thought was the entirety of the NN FSD project? Is the point to create artificial and very unlikely edge cases to train the NN with, rather than only depending on the actual edge cases reported from all the Teslas driving with AP and FSD hardware?
Thanks.
My understanding is that they use simulation not to find unknown edge cases, but to fine-tune on hard-to-source, known edge cases. Like the family running with a dog in the middle of the highway. Maybe they've seen it happen once and thought it was important to add that scenario, mixed into different environments, for more training material. Like during a snowstorm, who knows. They mentioned two other cases as well, but this is the one that justifies simulation for me.
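Something like this toy sketch is what I imagine, where one real-world event becomes a whole grid of simulated variants (the scenario fields and parameter lists are made up, purely to illustrate the idea):

```python
import itertools
import random

# Made-up example: take one rare real-world scenario and procedurally
# generate many simulated training variants of it.
base_scenario = {"event": "family_with_dog_on_highway", "ego_speed_mph": 65}

weather = ["clear", "rain", "snow", "fog"]
time_of_day = ["noon", "dusk", "night"]
traffic = ["light", "dense"]

variants = [
    {**base_scenario, "weather": w, "time": t, "traffic": d,
     "ego_speed_mph": base_scenario["ego_speed_mph"] + random.randint(-10, 10)}
    for w, t, d in itertools.product(weather, time_of_day, traffic)
]
print(len(variants), "simulated variants of one real-world edge case")  # 24
```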
 
Second, can someone explain the part of the presentation that went into detail on how they are creating very accurate 'simulations' to supplement the data collection, manipulation, and labeling work that I thought was the entirety of the NN FSD project? Is the point to create artificial and very unlikely edge cases to train the NN with, rather than only depending on the actual edge cases reported from all the Teslas driving with AP and FSD hardware?
Thanks.
[image attachment]

 