
HW2.5 capabilities

So I got a chance to look at the network specification for the AP2 neural network in 40.1. As @verygreen previously reported, the input is a single 416x640 image with two color channels - probably red and grey. Internally the network processes 104x160 reduced frames as quantized 8 bit values. The network itself is a tailored version of the original GoogLeNet inception network plus a set of deconvolution layers that present the output. Output is a collection of 16 single color frames, some at full and some at quarter resolution. The network is probably a bit less than 15 million parameters given the file size.

So what does this mean? Images go into it, and for every input frame the network produces a set of 16 interpretations, which also come in the form of grayscale images. Some external process takes those processed frames and makes control decisions based on them, probably after folding in radar and other sensor data. This is not an end-to-end network: it doesn't have any scalar outputs that could be used directly as controls.

Also, the kernel library includes a number of items with intriguing names that are unused in the current network. At a minimum this must mean that there are variations of this network with more features than the current version exploits, or that they have other networks with enhanced functionality which share the same set of kernels.
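
To keep the shapes and sizes straight, here's a rough sketch using only the numbers above. The 8/8 full-vs-quarter split of the outputs and the variable names are illustrative guesses, not read from the binary.

```python
# Quick sanity check of the shapes and sizes described above, using only the
# numbers from the post. The layer structure itself is not reproduced here.
import numpy as np

# Input: a single 416x640 frame with two color channels (reported as red + grey)
input_frame = np.zeros((2, 416, 640), dtype=np.uint8)

# Internally the net works on 104x160 reduced frames, i.e. a 4x reduction
# in each dimension (416 / 4 = 104, 640 / 4 = 160)
assert (416 // 4, 640 // 4) == (104, 160)

# Output: 16 single-channel maps, some full and some quarter resolution.
# The 8/8 split below is purely hypothetical -- the post doesn't say which.
full_res_outputs = np.zeros((8, 416, 640), dtype=np.uint8)
quarter_res_outputs = np.zeros((8, 104, 160), dtype=np.uint8)

# ~15 million parameters stored as 8-bit quantized values is roughly a 15 MB
# weight payload, consistent with estimating the count from the file size.
n_params = 15_000_000
print(f"approx weight payload: {n_params / 1e6:.0f} MB at one byte per parameter")
```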

Is it possible to run inference with the model, assuming we have a 416x640 two-color image from one of the cameras?
 

Not on any hardware that I have. There is custom code in the libraries, so you can't just run it on a generic NN framework. Those binaries will be specific to the CPU/OS/GPU combination in the car, and we don't have source code to let us cross-compile to another platform.
 
:cool::eek::oops: So much smart here in this group! :oops::eek::cool:

My 2-cent guess from the peanut gallery is that this network is only using a tiny bit of the available bandwidth on the GPU. Peanuts, anyone?

Is there any way we could see if they were running the uber-smarts FSD net in "shadow mode" (I feel so much dumber for typing that)?
 
I wonder if what they are doing is using the NN to recognize features, then using those to build a local virtual world which the car then drives through, rather than having the NN directly make the complete driving decisions. Not having programmed anything like that, I wouldn't be surprised if the driving-path strategy might be better done algorithmically than in a NN, since setting up the training sounds pretty difficult.

Yes, path planning was Sterling Anderson's specialty. And it very likely is algorithmic, not NN-based, given his involvement and the drift of the articles linked. Also, these algorithms are much easier to debug than a NN.

Also, I believe all the DL / ML experts are in the vision team and not in other teams.
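
As a toy illustration of that split, something like the sketch below, where the network only supplies the interpretation maps and everything after that is ordinary code. All names and structure here are hypothetical, not Tesla's actual stack.

```python
# Toy sketch of the "NN for perception, algorithms for planning" split being
# speculated about above. It just shows where the network's 16 output maps
# would hand off to non-NN code.

def perceive(camera_frame, vision_net):
    # NN step: one frame in, 16 interpretation maps out (obj, loc_seg, ...)
    return vision_net(camera_frame)

def build_world(vision_maps, radar_targets):
    # Fusion step: merge the maps with radar returns into a local virtual world.
    # Placeholder: a dict standing in for whatever representation is used.
    return {"lanes": vision_maps, "objects": radar_targets}

def plan_path(world):
    # Algorithmic step: classic search/optimization over the world model,
    # which is easier to debug and tune than an end-to-end network.
    return []  # list of waypoints (stub)
```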
 
This whole "teams" speculation is a little crazy IMO. I mean who cares what teams they have! Don't they see what's (not) going on? Can't they read? Their own webpage states what EAP should be doing - right now!

If the reason we're not seeing anything 'enhanced' is because they've pulled valuable resources from EAP to FSD, I think it's negligent. If they did the opposite they'd satisfy tens of thousands of owners, avoid lawsuits, make a s*** ton of cash and be recognized as the big kahuna in this game. Doesn't make sense.

I believe they're struggling. It's that simple. The lack of rain-sensing wipers is the smoking gun here. Otherwise their management is beyond critique. I'd SCREAM if I were running an automotive company that can't get effing wipers to work properly.

Sorry. I'll leave it to you smart guys now. Great insights here
 

It depends on the perspective. If you're talking about Tesla's commitments to their customers, you're right of course. It does not matter.

But if you're trying to get a full picture of what is going on at that company and in the industry in general, understanding the resource split between driver's aids and self-driving is necessary.
 
Not on any hardware that I have.
I'm sure we could crowd source you a Drive PX2. :)
 
Seeing as Junli Gu's job, while she was manager and tech lead of the machine learning team at Tesla, was to "achieve large scale of object detection, lane detection and modeling various of complex driving scenarios," this shows that they are still working on the object detection and sensing problem, precisely as I predicted.

They are so behind, it's comical.
 
Exactly, computer vision is such a small part of FSD. I have seen papers and videos from Mobileye and Shashua from the 90s. Relying on brutal force computing power shows how clueless some new players still are.

Brutal force? :)

[attached image: header.jpg]
 
Ok, after trying to see if there was a way to profile the running network on the car and striking out, I laboriously did the speed calculation for the 40.1 neural network by hand (homework attached). It turns out to be 17 G-MACs per image. Best guess at the speed of the HW2 GPU is 3.8 T-MACs per second (floating point). I tried to track down a benchmark for similar networks running on similar parts, and what I can find so far is a 70% utilization rate for a pure float32 network. The 40.1 network is 90% ints and 10% floats, so it could be up to 3.7x faster than a 100% float network would be. A 100% float version of this network would run at 156 fps. If the integer utilization is good then it could be up to 577 fps.
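
Written out as a quick calculation, so it's easy to plug in different guesses. The per-frame MAC count and GPU peak are the figures above; the utilization numbers are the quoted benchmarks, so treat the results as rough bounds rather than measurements.

```python
# Back-of-envelope version of the frame-rate estimate above.
macs_per_frame = 17e9        # MACs to process one 416x640 frame
gpu_peak_macs = 3.8e12       # assumed HW2 GPU peak, MAC/s (floating point)
fp32_utilization = 0.70      # benchmark utilization for a pure-float32 network
int8_speedup = 3.7           # possible gain from the ~90% int8 layers

fp32_fps = gpu_peak_macs * fp32_utilization / macs_per_frame
int8_fps = fp32_fps * int8_speedup

print(f"pure float32: ~{fp32_fps:.0f} fps")    # ~156 fps
print(f"best-case int8: ~{int8_fps:.0f} fps")  # ~579 fps (577 if you round at 156 first)
```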
 

Attachments

  • 40_1 spreadsheet.png (802.5 KB)
Oh, and in going back through it I found that they are actually generating output frames named 'shoulder' and 'vl_class' and 'obj'. So there's a total of 16 flavors of output frame (details in homework) but they seem to include 4 kinds of bounding box, 5 kinds of obj, and 5 kinds of 'loc_seg' in addition to 'vl_class' and 'shoulder'.
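
For reference, those flavors do add up to 16. The grouping and counts come from the post; any names beyond the ones quoted are unknown.

```python
# The 16 output-frame flavors mentioned above, grouped by the counts in the post.
output_flavors = {
    "bounding_box": 4,
    "obj": 5,
    "loc_seg": 5,
    "vl_class": 1,
    "shoulder": 1,
}
assert sum(output_flavors.values()) == 16
```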
 
70% utilization for such a simple NN (compared to one using all 8 cameras and controlling steering) sounds kinda crummy... Thoughts on the current optimization level, and whether this is just further evidence that they do not plan on extending this model beyond basic AP1 functionality? Or is the current hardware just grossly underpowered?
 

Anyone got a straw, because Professor JimmyD just opened a can of fresh knowledge in this thread. :cool:
 
So if we cautiously assume that the current network gets data at 60 fps (2 cameras at 30 fps each), and that we'll get 2 more NNs of similar complexity at another 60 fps each (2x repeater + 2x pillar), that's 180 fps in total, plus a hopefully simpler wide-camera NN. That just leaves the backup camera, which has a totally different picture pattern but, on the other hand, would not need to be used all the time.
This looks like there's a chance the whole performance would be about where it should be, unless they drastically redo their NNs to make them much heavier, I imagine.
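
Putting that budget next to the earlier throughput estimate. The 30 fps per-camera rate and the grouping into three similar networks are assumptions from this post; the 156-577 fps range is jimmy_d's estimate from above.

```python
# Rough camera-budget arithmetic versus the estimated network throughput.
fps_per_camera = 30
current_two_cameras = 2 * fps_per_camera  # current network: 60 fps (assumed)
repeaters = 2 * fps_per_camera            # 2x repeater: another 60 fps
pillars = 2 * fps_per_camera              # 2x pillar: another 60 fps

required = current_two_cameras + repeaters + pillars  # 180 fps, before the
                                                      # wide and backup cameras
available = (156, 577)  # pure-float vs best-case int8 estimate, fps
print(required, available)
```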
 
Hey @verygreen, I have a question: what sort of computing power do you predict will be needed for Tesla to do all parts of EAP well? Do you think they can pull it off with the AP2.0 hardware, or 2.5?

Sorry, this is not my area of specialization. I am just breaking stuff. You need people like @jimmy_d to put them back together ;)