Inside the NVIDIA PX2 board on my HW2 AP2.0 Model S (with Pics!)

They may have said that but they didn't mean it. Here's a quote from the Nvidia website:


People don't normally train at 8-bit integer precision. Google's first TPU, for instance, was inference-only. The new one supports FP16, so they are finally able to do training on it.

It's possible they could augment the network or train a smaller network based on local patterns, but the major networks are going to have been trained outside the car.

Of course you can't train on each individual car in a production pipeline, because you can't combine models. So there won't be any kind of "local neural net".

But there are many small startups, tech companies, universities, individuals, and enthusiasts who don't have the funds to buy datacenters and only have one car they are testing with.

They will train on the Drive PX2 dev kit and deploy with it as well.

There has to be a reason why it costs $15,000. No one in their right mind would use it if the FP16 were as gimped as on the other models, when they could simply use a GTX 1080, which does roughly 36 TOPS of INT8 inference.

And since no deep learning library fully supports INT8, it will be practically useless.
 
Of course you can't train on each individual car in a production pipeline, because you can't combine models. So there won't be any kind of "local neural net".

But there are many small startups, tech companies, universities, individuals, and enthusiasts who don't have the funds to buy datacenters and only have one car they are testing with.

They will train on the Drive PX2 dev kit and deploy with it as well.

There has to be a reason why it costs $15,000. No one in their right mind would use it if the FP16 were as gimped as on the other models, when they could simply use a GTX 1080, which does roughly 36 TOPS of INT8 inference.

And since no deep learning library fully supports INT8, it will be practically useless.

Naw, the dev kit was just that, a dev kit. It's not for training; it's exactly like the production version, just pre-mass-production. If you saw the V100 announcement, Jensen was holding the very first chip and joked that that single chip cost $3 billion.

Universities and enthusiasts are better off sticking with Tesla cards for compute. The high-end ones usually run about $6,000 or so, which would have been cheaper than a Drive PX2 dev kit. If they tried training on a Drive PX2 it'd be a laughable failure, as it'd take practically forever. haha :)

Usually what Google has done in the past is train the model using FP16 or FP32, convert it to the appropriate integer format, and then deploy it on an INT8 TPU.
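Roughly, in numpy terms, that train-in-float, deploy-in-int workflow looks like this (just an illustrative sketch; the symmetric per-tensor scaling here is my own assumption, not necessarily Google's exact scheme):

    import numpy as np

    def quantize_int8(tensor_fp32):
        # Symmetric per-tensor quantization: map the float range onto [-127, 127]
        scale = np.max(np.abs(tensor_fp32)) / 127.0
        return np.round(tensor_fp32 / scale).astype(np.int8), scale

    def int8_matmul(x_q, w_q, x_scale, w_scale):
        # Accumulate in INT32 to avoid overflow, then rescale back to float
        acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
        return acc * (x_scale * w_scale)

    # Weights come out of FP32/FP16 training; only inference runs in INT8
    w = np.random.randn(128, 64).astype(np.float32)
    x = np.random.randn(1, 128).astype(np.float32)
    w_q, w_s = quantize_int8(w)
    x_q, x_s = quantize_int8(x)
    y_int8 = int8_matmul(x_q, w_q, x_s, w_s)   # approximates x @ w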

Some outside of Tesla have suggested local neural nets which tailor the vehicle to individual driving preferences (some people prefer more aggressive driving to feel safer), similar to how other AI companies like Google tailor individual services for users. It wouldn't necessarily be fed back into the Matrix (pun intended).

You can combine neural networks, but I don't think in this case it'd be that effective... however, on a massive scale it could be interesting, and larger/cheaper than any datacenter they might invest in.
 
Hi All-

Shout out to the inspiration: @lunitiks @bjornb @verygreen

It was nice to know that my Model S still drove great today! Disassembly and reassembly (again) went much faster today, and contrary to what my wife promised, I did not "break my car". Still drives!

I took the AP2.0 board out again tonight and took some more pictures, this time with the heatsinks and thermal paste removed. You can clearly see the CPU and GPU now.

What is interesting is that the GPU daughterboard is designed for removal without the need to take the board out of the enclosure. I think Tesla is planning on in-situ GPU upgrades by easily removing the fans, the heatsink, and then the daughterboard. Makes things easy when upgrades are available in the future.

Apologies, I really tried to get clear pictures of the RAM modules, but frankly the numbers engraved on the casing were completely impossible to read. I did my best. Here are some pics of the chips:

[10 attached images: IMG_0592–IMG_0596, IMG_0599–IMG_0603]
 
Thanks for going through all this. Unfortunately for the SoC, the markings don't really tell much, similar to what happened when the markings on the Nintendo Switch chip vs. the Nvidia Shield chip were analyzed. This is different from the GPUs, where the code name is right on the marking.
 
Great work @kdday! Also, smart thinking about that board-swap :) Now I'm going to sit down and analyze your pictures pixel-by-pixel :)

Unfortunately for the SoC, the markings don't really tell much, similar to what happened when the markings on the Nintendo Switch chip vs. the Nvidia Shield chip were analyzed. This is different from the GPUs, where the code name is right on the marking.

Yes, it's a bummer. Luckily @verygreen has posted a boot log that shows six CPU cores, two Denver and four A57s, so I guess there's little room for doubt that this is Mr. Parker. Besides, Tesla/Nvidia wouldn't call this a "PX2" if it wasn't the Parker chip. As Nvidia states here, Parker is "the main building component of NVIDIA DRIVE PX 2 platform".
 
Great work @kdday! Also, smart thinking about that board-swap :) Now I'm going to sit down and analyze your pictures pixel-by-pixel :)



Yes, it's a bummer. Luckily @verygreen has posted a boot log that shows six CPU cores, two Denver and four A57s, so I guess there's little room for doubt that this is Mr. Parker. Besides, Tesla/Nvidia wouldn't call this a "PX2" if it wasn't the Parker chip. As Nvidia states here, Parker is "the main building component of NVIDIA DRIVE PX 2 platform".
Yep, everything matches Parker. Just a shame it can't be further confirmed with the markings.
 
I bemoan that I can't otherwise contribute pithily to this thread; I will, however, say that

  • some people wrestle crocodiles for a living
  • others fish the Bering Sea in January
  • some sit on a fire ant nest, feet in water and licking a live HPWC cable
they all take a back seat to our kdday! Props to you, man - AND I see in the first pic of your post #103 that you're still wearing your wedding ring. Heh.
 
I'll not pretend to know a lot about the subject of processing demand (I do have an electronics degree, just not up to speed with specs and required demand), but I didn't think full self-driving in Tesla's world is Level 5 autonomy. They have left a clear get-out in that they say it should work in nearly all situations, and suggest a driver is behind the wheel. In that case the processing demands are likely to be lower than what Google and the rest are trying to achieve.
 
Naw, the dev kit was just that, a dev kit. It's not for training; it's exactly like the production version, just pre-mass-production. If you saw the V100 announcement, Jensen was holding the very first chip and joked that that single chip cost $3 billion.

Universities and enthusiasts are better off sticking with Tesla cards for compute. The high-end ones usually run about $6,000 or so, which would have been cheaper than a Drive PX2 dev kit. If they tried training on a Drive PX2 it'd be a laughable failure, as it'd take practically forever. haha :)

Usually what Google has done in the past is train the model using FP16 or FP32, convert it to the appropriate integer format, and then deploy it on an INT8 TPU.

Some outside of Tesla have suggested local neural nets which tailor the vehicle to individual driving preferences (some people prefer more aggressive driving to feel safer), similar to how other AI companies like Google tailor individual services for users. It wouldn't necessarily be fed back into the Matrix (pun intended).

You can combine neural networks, but I don't think in this case it'd be that effective... however, on a massive scale it could be interesting, and larger/cheaper than any datacenter they might invest in.

That preference is simply based on settings, not on having different models.

You can't combine models. You can take an already-trained network and use it to train a new model, but you can't merge 100k individual models into one.
 
That preference is simply based on settings, not on having different models.
Individualization can be accomplished with individual models. Here's a paper documenting an example of this approach: http://ieeexplore.ieee.org/document/7139555/

You can't combine models. You can take an already-trained network and use it to train a new model, but you can't merge 100k individual models into one.
You can merge multiple expert models. Here's a pdf describing a few methods based on the Geoffrey Hinton Coursera course.
https://courses.cs.ut.ee/MTAT.03.27...-to-improve-generalization-andres-viikmaa.pdf
 
You can merge multiple expert models. Here's a pdf describing a few methods based on the Geoffrey Hinton Coursera course.
https://courses.cs.ut.ee/MTAT.03.27...-to-improve-generalization-andres-viikmaa.pdf

Did you just randomly search for these papers? Did you even take a look at them?

They are talking about averaging the outputs and predictions of different models.

That isn't combining models; each model still has its own individual weights and biases.

The whole point of the paper was to average out the predictions of the different models, which can be based on different types of neural networks using different algorithms, or on expert models like kNN.

They then combine the predictions of each individual model. In simplified terms (sketched in code below), say I had 5 models and I'm trying to predict whether the image I'm looking at is a cup.

I run that image through each of the 5 individual models.

Model 1 says it's a cup.
Model 2 says it's not a cup.
Model 3 says it's not a cup.
Model 4 says it's a cup.
Model 5 says it's a cup.

That's 3 models saying it's a cup, so because that's over 50% I predict it's a cup.


Turns out it's not a cup!
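In code, that voting/averaging scheme is nothing more than this (the five predict functions are hypothetical stand-ins for the individual models, not real networks):

    import numpy as np

    def ensemble_vote(models, image):
        # Each model independently outputs 1 ("cup") or 0 ("not a cup");
        # the ensemble answer is just the majority vote across them
        votes = [model(image) for model in models]
        return int(np.mean(votes) > 0.5)

    # Five stand-in models matching the walkthrough above: cup, not, not, cup, cup
    models = [lambda img: 1, lambda img: 0, lambda img: 0, lambda img: 1, lambda img: 1]
    print(ensemble_vote(models, image=None))   # -> 1, i.e. "cup" (3 of 5 votes)

Each model keeps its own weights and biases; only the outputs get pooled.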

The paper is talking about different types of models, but your case is even worse because you are trying to apply this to the same type of model and learning algorithm. Meaning a bunch of roughly 70%-accurate models still produces roughly 70% prediction accuracy.

Now do this with 100k individually created models. Not only is that NOT COMBINING MODELS, but it's also stupidly inefficient and can't run on any GPU in real time.
 
It can be but doesn't have to be if the same algorithm was used.

But hey if you say Geoffrey Hinton doesn't know what he's talking about....


If the same algorithm was used, it's even worse. Check the edit of my previous post.

What Geoffrey is talking about and what you are inferring from his paper are two totally different things.

He's not in the wrong; you, on the other hand, are.

What he is saying is pretty standard, and that's using a couple of different models (experts or different types of neural networks) and averaging out their outputs/predictions to improve your accuracy.

What you are talking about, however, is different.
 
If the same algorithm was used, it's even worse. Check the edit of my previous post.

What Geoffrey is talking about and what you are inferring from his paper are two totally different things.

He's not in the wrong; you, on the other hand, are.

What he is saying is pretty standard, and that's using a couple of different models (experts or different types of neural networks) and averaging out their outputs/predictions to improve your accuracy.

What you are talking about, however, is different.
My understanding is he is talking about part two "Mixtures of Experts," while you are talking about part 1.

You aren't averaging the outputs to improve accuracy (as would be intuitive), but rather picking from a bunch of experts by using a function that breaks down the data to select the right one for the right inputs. Under this scheme you can potentially merge 100k different models and not have drastically higher processing requirements (since other than a few models deemed suitable for a given input, the rest are not run). Definitely an interesting idea.
 
My understanding is he is talking about part two "Mixtures of Experts," while you are talking about part 1.

You aren't averaging the outputs to improve accuracy (as would be intuitive), but rather picking from a bunch of experts by using a function that breaks down the data to select the right one for the right inputs. Under this scheme you can potentially merge 100k different models and not have drastically higher processing requirements (since other than a few models deemed suitable for a given input, the rest are not run). Definitely an interesting idea.

Nah, that's still not combining models.

The mixture of experts is simply training different networks on different tasks.

For example, a model trained on stop signs and another trained on street signs, etc., or a model trained to see stop signs at night and another trained to see stop signs in the day, then looking at the input data and deciding which model to use.

Or a model trained to detect stop signs in heavy rain vs. heavy snow vs. normal weather, then looking at the input and picking which one to use.

The training data is segregated, and only a subset of the training data is used for the specialized model.

So only pictures of stop signs in the rain are fed to one model to be trained with.

Geoffrey himself said:

"The idea is to train a number of neural nets, each of which specializes in a different part of the data. We assume we have a dataset which comes from a number of different regimes.

We train a system in which one neural net will specialize in one regime and a managing neural net will look at the input data and decide which specialist to give it to."

Walkthrough (sketched in code below):

We have 3 training datasets (heavy-rain stop signs, clear-weather stop signs, and a mixture of heavy-rain and clear-weather signs).

Model 1
Trained with pictures of stop signs in heavy rain to recognize them.

Model 2
Trained with pictures of stop signs in clear weather to recognize them.

Model 3
Trained with pictures of a mixture of stop signs in both kinds of weather, including the accompanying data of which model to use (model 1 or model 2) for each picture.


Input data comes in, and model 3 tells it to use either model 1 or model 2.
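A toy version of that walkthrough in code (the expert functions and the gate are hypothetical placeholders; only the expert the gate picks actually runs):

    import numpy as np

    def moe_predict(gate, experts, x):
        # The managing ("gating") network scores the input and hands it to
        # exactly one specialist; the other experts are never evaluated
        expert_idx = int(np.argmax(gate(x)))
        return experts[expert_idx](x)

    # Stand-ins: expert 0 trained on heavy-rain stop signs, expert 1 on clear weather.
    # The gate outputs [score_rain, score_clear] based on the input.
    experts = [lambda x: "stop sign (rain specialist)",
               lambda x: "stop sign (clear-weather specialist)"]
    gate = lambda x: np.array([0.9, 0.1]) if x["rainy"] else np.array([0.1, 0.9])
    print(moe_predict(gate, experts, {"rainy": True}))   # routed to the rain specialist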

This technique, which came out in the '90s, is not utilized because it just doesn't give better accuracy.

Almost everyone uses one model for everything.
So they feed traffic signs in all weather conditions to one network.

Or all types of traffic lights in all weather to one network, or all types of pedestrians in all weather.
 
Nah, that's still not combining models.

The mixture of experts is simply training different networks on different tasks. [...]

This technique, which came out in the '90s, is not utilized because it just doesn't give better accuracy.

Almost everyone uses one model for everything.
If you couldn't parallelize neural network training, then using HPC clusters or GPUs would suck.
https://www.cs.swarthmore.edu/~newhall/papers/pdcn08.pdf

I'm not saying Tesla is doing this, but of course it can be done.
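For what it's worth, here's a minimal sketch of the basic synchronous data-parallel idea (not the specific method in that paper): each worker computes a gradient on its own shard, and the gradients are averaged into one update. A linear model stands in for a network here; the shard count and learning rate are arbitrary.

    import numpy as np

    def parallel_sgd_step(w, shards, grad_fn, lr):
        # Each "worker" computes a gradient on its own shard of the data;
        # averaging the shard gradients reproduces one full-batch update
        grads = [grad_fn(w, shard) for shard in shards]
        return w - lr * np.mean(grads, axis=0)

    def grad_fn(w, shard):
        # Least-squares gradient, standing in for a network's backprop
        X, y = shard
        return 2 * X.T @ (X @ w - y) / len(y)

    X = np.random.randn(1000, 4)
    y = X @ np.array([1.0, -2.0, 3.0, 0.5])
    shards = [(X[i::4], y[i::4]) for i in range(4)]   # split data across 4 workers
    w = np.zeros(4)
    for _ in range(200):
        w = parallel_sgd_step(w, shards, grad_fn, lr=0.05)
    print(w)   # converges toward [1, -2, 3, 0.5]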
 
I am bewildered by this slide from Nvidia's tech adviser at GTC Japan 2016 (PDF).

Seems to me that one Parker chip ("Tegra A") + dGPU is meant for the actual AI work, while the other Parker chip ("Tegra B") is supposed to do driver visualization bling, or?

View attachment 228093
Interesting post. Is it talking about Tegra A + 2x dGPU or just 1x dGPU (like Tesla is using)? One Parker + 1 dGPU seems like overkill if dedicated for UI purposes (or maybe they are talking about fancy stuff like full windshield HUDs).