Welcome to Tesla Motors Club

Autopilot HW 3

Hahaha. Whenever I speak precisely I get "this sounds awesome, but I don't know what it means". Whenever I speak generally I get corrected.
I’m loving it.

Takes me back to early ‘90s when I worked for a company that heavily used IT to improve their competitive position.

Spent some time in a supercomputer deep dive to figure out where and how we could use them. Learned to look at the world through "parallel" glasses: could this be parallelized? What would it look like?

Also enjoyed lunch with AI team, getting neural net tutorials.

Now I drive the dream. And I’m back at the table with the smart kids.
 
What happens if your car is worn out and scrapped before they deliver it?


 
Assuming HW3 does 10x more teraops than HW2, how many teraops does HW3 do? Do we know how many teraops HW2 does?

Would be interesting to compare to Waymo’s 2016 estimate that it would need 50 teraops, and to Nvidia’s 320-teraop Pegasus robotaxi computer.

Edit: Electrek says the HW1 computer did 0.256 teraops. Taking Tesla’s figure that HW2 has 40x as much processing power, that implies 10 teraops for HW2. 10 teraops x 10 = 100 teraops for HW3.

Or, taking Elon’s 5x to 20x estimate, between 50 and 200 teraops.

So, this puts HW3 at or above what Waymo estimated it needed in 2016 (that estimate might be different today), and between about 1/6 and 2/3 of Nvidia's Pegasus computer.
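The back-of-envelope math above can be checked in a few lines of Python (the inputs are just the figures quoted in this thread, so treat the outputs as rough estimates, not spec-sheet numbers):

```python
# HW3 estimate from the thread's figures:
# HW1 ≈ 0.256 TOPS (Electrek), HW2 = 40x HW1 (Tesla), HW3 = 10x HW2.
hw1_tops = 0.256
hw2_tops = hw1_tops * 40   # ≈ 10.24 TOPS for HW2
hw3_tops = hw2_tops * 10   # ≈ 102.4 TOPS for HW3

# Elon's looser 5x-20x range for the HW3 uplift over HW2:
hw3_low, hw3_high = hw2_tops * 5, hw2_tops * 20

print(f"HW2 ≈ {hw2_tops:.1f} TOPS")
print(f"HW3 ≈ {hw3_tops:.1f} TOPS (range {hw3_low:.0f}-{hw3_high:.0f})")

# Fraction of Nvidia's 320-TOPS Pegasus:
print(f"HW3 / Pegasus ≈ {hw3_tops / 320:.2f}")
```

That lands HW3 at roughly a third of a Pegasus, consistent with the 1/6 to 2/3 spread once you apply the 5x-20x uncertainty.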
 
Wikipedia claims that HW2 can do 4 FP32 teraops or 10-12 deep learning teraops — not sure if that’s accurate. If accurate, that would put HW3 (assuming 10x of HW2) at 40 FP32 teraops or 100-120 deep learning teraops.

In terms of deep learning ops, that would make it about 1/3 of Nvidia’s Pegasus, which can do 320 INT8 teraops.

Beginner question: is 40 FP32 teraops equivalent to 80 FP16 teraops? If HW3 can do 40 FP32 teraops, does that also mean it can do 80 FP16 teraops?

Here’s the Hot Chips video from August 2016 where Daniel Rosenband from Google/Waymo gives the 50 FP16 teraops figure (at 39:50):

 
Just like they did for AP1? Sorry, but Tesla has a habit of abandoning the old in favor of the new as soon as they think the old is "good" enough. Just ask any AP1 owner. Or ask AP 2.0 owners about their dashcams.

Not referring to AP1 at all. I was responding to a post about AP2/2.5, the one with a sensor set highly similar to HW3's...

Dashcam was never an advertised feature of HW2, so I'm not sure what the gripe is there...
 
Strangely, Mobileye's EyeQ5 — scheduled to launch in 2020 and intended to support full autonomy — is only targeting 15 teraops. That's not much more than HW2.

(Source: The Evolution of EyeQ - Mobileye)

It's been at 24 teraops for quite a long time now.

But keep in mind that the EyeQ4 (only 2.5 teraops) handles processing of 8+ cameras at 30+ fps and is packed with a next-gen neural-network perception system capable of L5 self-driving.

It's quite clear that Mobileye's NN data, architecture, and chip are efficient and high quality, unlike Tesla's.
 
Tesla has a habit of abandoning the old in favor of the new as soon as they think the old is "good" enough.

Tesla seems to have this habit for flavors they don’t manufacture anymore. Old Model S cars probably get user interface updates mostly because they still share some computers with the latest models. Features not manufactured anymore are quickly mothballed.

Autopilot 2 may be somewhat different because the product was sold with full self-driving attached, but other than that it is already obsolete for sure, as evidenced by the dashcam.
 
It's quite clear that Mobileye's NN data and architecture, and chip are efficient and high quality unlike Tesla.

Mobileye certainly has expertise in highly efficient, elegant architectures. So much so that I think it's kind of pointless to talk about TFLOPS.

Instead I would focus on the capability and accuracy of the NN. After all, do we really care if Tesla's solution is less efficient? It's not like we don't have a big friggin' battery.

What I like about Tesla's 360-degree visualization is that you can see firsthand that it's not exactly accurate. So I can't wait to try HW3 on it, which more than anything else should considerably improve its accuracy.

I look forward to any vehicle with an EyeQ4 system that has a similar visualization.
 
Is one FP16 op equivalent to two INT8 ops?

That would put Google’s hypothetical figure of 50 FP16 teraops at 100 INT8 teraops.

Assuming Nvidia treats a “deep learning op” as an INT8 op, HW2 would have 10-12 INT8 teraops and if HW3 is 10x that, then it’s 100-120 INT8 teraops.

Intel/Mobileye wants to do 24 “deep learning” teraops with EyeQ5 in 2020.

Nvidia wants to do 320 INT8 teraops with Pegasus (not sure when it's launching).

If all this is accurate (it might not be), we have:
  • Mobileye EyeQ5: 24 DL teraops
  • Google’s 2016 hypothetical: 100 INT8 teraops
  • Tesla HW3: 100-120 DL/INT8 teraops (or 50-240 using Elon's 5x-20x range)
  • Nvidia Pegasus: 320 INT8 teraops
Nvidia is also working on Orin, which is two Pegasuses, so 640 INT8 teraops total.
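To compare these on one scale, here's a sketch that normalizes everything to INT8-equivalent TOPS. The 1 FP32 ≈ 2 FP16 ≈ 4 INT8 scaling is a naive assumption (real hardware varies, and nobody here is sure what each vendor means by a "deep learning op"), so the output is apples-to-oranges-made-slightly-less-so, not a real benchmark:

```python
# Naive precision scaling: assume 1 FP32 op ≈ 2 FP16 ops ≈ 4 INT8 ops.
def to_int8_tops(tops, precision):
    return tops * {"fp32": 4, "fp16": 2, "int8": 1}[precision]

# Figures from this thread; "DL teraops" are assumed to mean INT8 here.
chips = [
    ("Mobileye EyeQ5",            24,  "int8"),
    ("Google 2016 hypothetical",  50,  "fp16"),
    ("Tesla HW3 (10x HW2)",      110,  "int8"),  # midpoint of 100-120
    ("Nvidia Pegasus",           320,  "int8"),
]
for name, tops, prec in chips:
    print(f"{name:26s} {to_int8_tops(tops, prec):5.0f} INT8-eq TOPS")
```

Under that assumption, Google's 50 FP16 teraops becomes 100 INT8-equivalent teraops, right in line with the HW3 estimate.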

Not sure if Nvidia is getting these numbers from anywhere, or just going nuts? :p “Pegasus is sufficient for a fully autonomous, Level 5 robotaxi... but now we’re gonna DOUBLE it!!”

When Nvidia announced AutoChauffeur with 24 DL teraops, it sure sounded to me like they were pitching it as a computer to power self-driving cars:


Nvidia still says on their website today that AutoChauffeur is supposed to be for “point-to-point travel”. I also found some old Nvidia PowerPoint slides where the subheading was “AutoChauffeur & Fully Autonomous”. Confusing marketing at the very least.

Is Nvidia backpedaling, or has there been a consistent narrative all along?
 
Maybe the idea is that you would stack multiple Drive PX 2s together?



It feels strange to launch a computer with 24 DL teraops and pitch it as a computer for self-driving cars. Then to launch a 320 DL teraops computer for self-driving cars. Then announce you’re working on a 640 DL teraops computer for self-driving cars.

I guess theoretically there is no limit to how much compute self-driving cars ought to use. Nvidia is setting the ceiling, not the floor: they want to give their customers as much computation as possible, whereas Mobileye and Tesla are trying to find the floor and do the job with as little computation as possible.

I would love to know how much compute Waymo’s Pacifica minivans use, since Waymo is much less cost constrained than Mobileye or Tesla. The 50 FP16 TOPS estimate from Google was in August 2016, and in that same timeframe both Tesla and seemingly Nvidia have increased their estimates of how much computing is required by about 10x. So following that pattern Waymo probably uses 500 FP16 TOPS now :p
 
I don't believe Nvidia is saying that Orin will be 640 TOPS, but rather that two Orin chips will match Pegasus performance instead of the four Xavier chips it uses today.

DrivePegasus.jpg


Nvidia is promoting that you need more power only because they want to sell more chips, which I addressed before.
Mobileye has always proven they can do vastly more with minimal chip power, and they are only offering more because the automakers are begging for it. The automakers are also stubbornly using inefficient algorithms, plus the fact that Mobileye can make more money by advocating that you double up doesn't hurt.

For example, Mobileye's L4 SDC runs on 4x EyeQ4 (10 teraops total). That's ridiculous efficiency! It proves you don't need 50 TOPS (Google), 100 TOPS (10x theoretical Tesla), or 320 TOPS (Nvidia).

However, I believe Waymo is limited by power consumption. I'm not sure if they use the car's battery (good for only ~33 miles of range) to power and cool their chips.