
HW2.5 capabilities

So I got a chance to look at the network specification for the AP2 neural network in 40.1. As @verygreen previously reported, the input is a single 416x640 image with two color channels - probably red and grey. Internally the network processes 104x160 reduced frames as quantized 8 bit values. The network itself is a tailored version of the original GoogLeNet inception network plus a set of deconvolution layers that present the output. Output is a collection of 16 single color frames, some at full and some at quarter resolution. The network is probably a bit less than 15 million parameters given the file size.

So what does this mean? Images go into it, and for every input frame the network produces a set of 16 interpretations, which also come in the form of grayscale images. Some external process takes those processed frames and makes control decisions based on them, probably after folding in radar and other sensor data. This is not an end-to-end network: it doesn't have any scalar outputs that could be used directly as controls.

Also, the kernel library includes a number of items with intriguing names that are unused in the current network. At a minimum this must mean that there are variations of this network with more features than the current version exploits, or that they have other networks with enhanced functionality which share the same set of kernels.
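
To keep the shapes and sizes straight, here's a rough sketch using only the numbers above. The 8/8 full-vs-quarter split of the outputs and the variable names are illustrative guesses, not read from the binary.

```python
# Quick sanity check of the shapes and sizes described above, using only the
# numbers from the post. The layer structure itself is not reproduced here.
import numpy as np

# Input: a single 416x640 frame with two color channels (reported as red + grey)
input_frame = np.zeros((2, 416, 640), dtype=np.uint8)

# Internally the net works on 104x160 reduced frames, i.e. a 4x reduction
# in each dimension (416 / 4 = 104, 640 / 4 = 160)
assert (416 // 4, 640 // 4) == (104, 160)

# Output: 16 single-channel maps, some full and some quarter resolution.
# The 8/8 split below is purely hypothetical -- the post doesn't say which.
full_res_outputs = np.zeros((8, 416, 640), dtype=np.uint8)
quarter_res_outputs = np.zeros((8, 104, 160), dtype=np.uint8)

# ~15 million parameters stored as 8-bit quantized values is roughly a 15 MB
# weight payload, consistent with estimating the count from the file size.
n_params = 15_000_000
print(f"approx weight payload: {n_params / 1e6:.0f} MB at one byte per parameter")
```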

Is it possible to run inference with the model, assuming we have a 416x640 two-color image from one of the cameras?
 

Not on any hardware that I have. There is custom code in the libraries, so you can't just run it on a generic NN framework. Those binaries will be specific to the CPU/OS/GPU combination in the car, and we don't have source code to let us cross-compile to another platform.
 
:cool::eek::oops: So much smart here in this group! :oops::eek::cool:

My 2-cent guess from the peanut gallery is that this network is only using a tiny bit of the available bandwidth on the GPU. Peanuts, anyone?

Is there any way we could see if they were running the uber-smarts FSD net in "shadow mode" (I feel so much dumber for typing that)?
 
I wonder if what they are doing is using the NN to recognize features, then using those to build a local virtual world which the car then drives through, rather than having the NN directly make the complete driving decisions. Not having programmed anything like that, I wouldn't be surprised if the driving-path strategy might be better done algorithmically than in a NN, since setting up the training sounds pretty difficult.

Yes, path planning was Sterling Anderson's specialty. And it very likely is algorithmic, not NN-based, given his involvement and the drift of the articles linked. Also, these algorithms are much easier to debug than a NN.

Also, I believe all the DL / ML experts are in the vision team and not in other teams.
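
As a toy illustration of that split, something like the sketch below, where the network only supplies the interpretation maps and everything after that is ordinary code. All names and structure here are hypothetical, not Tesla's actual stack.

```python
# Toy sketch of the "NN for perception, algorithms for planning" split being
# speculated about above. It just shows where the network's 16 output maps
# would hand off to non-NN code.

def perceive(camera_frame, vision_net):
    # NN step: one frame in, 16 interpretation maps out (obj, loc_seg, ...)
    return vision_net(camera_frame)

def build_world(vision_maps, radar_targets):
    # Fusion step: merge the maps with radar returns into a local virtual world.
    # Placeholder: a dict standing in for whatever representation is used.
    return {"lanes": vision_maps, "objects": radar_targets}

def plan_path(world):
    # Algorithmic step: classic search/optimization over the world model,
    # which is easier to debug and tune than an end-to-end network.
    return []  # list of waypoints (stub)
```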
 
This whole "teams" speculation is a little crazy IMO. I mean who cares what teams they have! Don't they see what's (not) going on? Can't they read? Their own webpage states what EAP should be doing - right now!

If the reason we're not seeing anything 'enhanced' is because they've pulled valuable resources from EAP to FSD, I think it's negligent. If they did the opposite they'd satisfy tens of thousands of owners, avoid lawsuits, make a s*** ton of cash and be recognized as the big kahuna in this game. Doesn't make sense.

I believe they're struggling. It's that simple. The lack of rain-sensing wipers is the smoking gun here. Otherwise their management is beyond critique. I'd SCREAM if I were running an automotive company that can't get effing wipers to work properly.

Sorry. I'll leave it to you smart guys now. Great insights here
 

It depends on the perspective. If you're talking about Tesla's commitments to their customers, you're right of course. It does not matter.

But if you're trying to get a full picture of what is going on at that company and in the industry in general, understanding the resource split between driver's aids and self-driving is necessary.
 
Not on any hardware that I have.
I'm sure we could crowd source you a Drive PX2. :)
 
Seeing as Junli Gu's job, while she was manager and tech lead of the machine learning team at Tesla, was to "achieve large scale of object detection, lane detection and modeling various of complex driving scenarios," this shows that they are still working on the object detection and sensing problem, precisely as I predicted.

They are so behind, it's comical.
 
Exactly, computer vision is such a small part of FSD. I have seen papers and videos from Mobileye and Shashua from the 90s. Relying on brutal force computing power shows how clueless some new players still are.

Brutal force? :)

[attached image: header.jpg]
 
Ok, after trying to see if there was a way to profile the running network on the car and striking out, I laboriously did the speed calculation for the 40.1 neural network by hand (homework attached). It turns out to be 17 G-MACs per image. Best guess at the speed of the HW2 GPU is 3.8 T-MACs per second (floating point). I tried to track down a benchmark for similar networks running on similar parts, and what I can find so far is a 70% utilization rate for a pure float32 network. The 40.1 network is 90% ints and 10% floats, so it could be up to 3.7x faster than a 100% float network would be. A 100% float version of this network would run at 156 fps. If the integer utilization is good then it could be up to 577 fps.
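
Written out as a quick calculation, so it's easy to plug in different guesses. The per-frame MAC count and GPU peak are the figures above; the utilization numbers are the quoted benchmarks, so treat the results as rough bounds rather than measurements.

```python
# Back-of-envelope version of the frame-rate estimate above.
macs_per_frame = 17e9        # MACs to process one 416x640 frame
gpu_peak_macs = 3.8e12       # assumed HW2 GPU peak, MAC/s (floating point)
fp32_utilization = 0.70      # benchmark utilization for a pure-float32 network
int8_speedup = 3.7           # possible gain from the ~90% int8 layers

fp32_fps = gpu_peak_macs * fp32_utilization / macs_per_frame
int8_fps = fp32_fps * int8_speedup

print(f"pure float32: ~{fp32_fps:.0f} fps")    # ~156 fps
print(f"best-case int8: ~{int8_fps:.0f} fps")  # ~579 fps (577 if you round at 156 first)
```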
 

Attachments

  • 40_1 spreadsheet.png (802.5 KB)
Oh, and in going back through it I found that they are actually generating output frames named 'shoulder' and 'vl_class' and 'obj'. So there's a total of 16 flavors of output frame (details in homework) but they seem to include 4 kinds of bounding box, 5 kinds of obj, and 5 kinds of 'loc_seg' in addition to 'vl_class' and 'shoulder'.
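
For reference, those flavors do add up to 16. The grouping and counts come from the post; any names beyond the ones quoted are unknown.

```python
# The 16 output-frame flavors mentioned above, grouped by the counts in the post.
output_flavors = {
    "bounding_box": 4,
    "obj": 5,
    "loc_seg": 5,
    "vl_class": 1,
    "shoulder": 1,
}
assert sum(output_flavors.values()) == 16
```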
 
70% utilization for such a simple NN (compared to one using all 8 cameras and controlling steering) sounds kinda crummy... Thoughts on the current optimization level, and whether this is just further evidence that they do not plan on extending this model beyond basic AP1 functionality? Or is the current hardware just grossly underpowered?
 

Anyone got a straw, because Professor JimmyD just opened a can of fresh knowledge in this thread. :cool:
 
So if we cautiously assume that the current network gets data at 60 fps (2 cameras at 30 fps each), and that we'll get 2 more NNs of similar complexity at another 60 fps each (2x repeater + 2x pillar), that's 180 fps in total, plus a hopefully simpler wide-camera NN. That just leaves the backup camera, which has a totally different picture pattern but, on the other hand, would not need to be used all the time.
This looks like there's a chance the whole performance would be about where it should be, unless they drastically redo their NNs to make them much heavier, I imagine.
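
Putting that budget next to the earlier throughput estimate. The 30 fps per-camera rate and the grouping into three similar networks are assumptions from this post; the 156-577 fps range is jimmy_d's estimate from above.

```python
# Rough camera-budget arithmetic versus the estimated network throughput.
fps_per_camera = 30
current_two_cameras = 2 * fps_per_camera  # current network: 60 fps (assumed)
repeaters = 2 * fps_per_camera            # 2x repeater: another 60 fps
pillars = 2 * fps_per_camera              # 2x pillar: another 60 fps

required = current_two_cameras + repeaters + pillars  # 180 fps, before the
                                                      # wide and backup cameras
available = (156, 577)  # pure-float vs best-case int8 estimate, fps
print(required, available)
```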
 
Hey @verygreen, I have a question: what sort of computing power do you predict will be needed for Tesla to do all parts of EAP well? Do you think they can pull it off with the AP2.0 hardware, or 2.5?

Sorry, this is not my area of specialization. I am just breaking stuff. You need people like @jimmy_d to put them back together ;)