
Neural Networks

I appreciate the technical data. I also appreciated the improved performance of Autopilot during a recent round trip drive from SoCal to OKC. The reliability of V9 was amazing and impressive with the 2017 MS 90D.

One caution is still in order. If construction crews place cement barriers or reflective plastic channelizing drums over the left lane line, drive carefully and disengage Autopilot at the first indication of being too close. Nothing was hit, but tracking the right lane line kept the vehicle a bit too close to those barricades.
 
Are the Google TPUs generalized or optimized for a specific function? That is, if they're either generalized or optimized for training NNs (I forget the correct term), as opposed to running already-trained NNs, then the Tesla design may differ, since it will surely be optimized only for running already-compiled NNs.
 

They are optimized for performing matrix multiplication on 8-bit quantized neural networks whose mean dimension is over 256. This means they will be pretty good at running inference on the large majority of vision NN algorithms in use today.
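To make that concrete, here's a minimal NumPy sketch of the kind of 8-bit quantized matrix multiply an accelerator like this is built around: weights and activations get quantized to int8, the multiply-accumulate runs in integer arithmetic, and the result is rescaled back to float. The shapes and scaling scheme here are just illustrative, not a description of the actual TPU datapath.

```python
import numpy as np

def quantize(x):
    """Symmetric linear quantization of a float array to int8."""
    scale = np.abs(x).max() / 127.0          # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
activations = rng.standard_normal((1, 256)).astype(np.float32)   # "mean dimension over 256"
weights = rng.standard_normal((256, 256)).astype(np.float32)

a_q, a_scale = quantize(activations)
w_q, w_scale = quantize(weights)

# Integer multiply-accumulate (int8 operands, 32-bit accumulator),
# then dequantize with the product of the two scale factors.
acc = a_q.astype(np.int32) @ w_q.astype(np.int32)
result = acc.astype(np.float32) * (a_scale * w_scale)

# The error is small relative to the magnitude of the float32 result.
print(np.abs(result - activations @ weights).max())
```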

The first version of the TPU (as detailed in the paper I linked above) is not appropriate for training networks with any extant algorithm - it's meant for deploying trained and pre-optimized networks. It happens to be well suited to running networks just like the ones that Tesla is currently using in the vehicle (TensorRT-style optimized 8-bit inference). Frankly, the TPU V1 itself would probably be a great chip for Tesla to deploy but for the fact that it's not commercially available. There are no commercial systolic array chips that I'm aware of. Google is releasing (this month) a chip of their own design which they call the Edge TPU, which might be a systolic array and which could conceivably be useful in self-driving applications - but I've not yet seen the architecture or any detailed specs for it.

Edge TPU - Run Inference at the Edge | Edge TPU | Google Cloud

Edge TPU Devices
 
Thanks for the informative replies/posts. I asked the question because I lack the necessary understanding of NN related tech to easily find the answer from the document, though I'm not surprised it was in there. It's an area of tech I haven't yet taken the time to become familiar with.
 
Of course there would be a bunch of small hardware and software changes, and Tesla would have to develop a set of software tools to enable efficient use of the new chip.

Just wanted to echo our appreciation for your contribution, and ask:

Is developing a set of software tools a huge undertaking? What would the time frame be on this? And would this fall under Stuart Bowers' responsibilities?
 
It is not an exaggeration. It is just your opinion.

This is not a matter of opinion, though I am certainly putting my own spin on it which is where the exaggeration comes from. But go back to the first page of this thread, where you will find @jimmy_d 's findings, which I will cherry-pick excerpt here:

2) The front half of the NN in AP2 is basically Googlenet with a few notable differences:
- The input is 416x640 (original Googlenet was 224x224)
- The working frame size in Googlenet is reduced by 1/2 in each dimension between each of the 5 major blocks. The AP network omits the reduction between blocks 4 and 5 so that the final set of features is 2x2 times larger than in Googlenet.
[...]
After my first look at the version 40 NN I was surprised at how simple it was, conceptually, and how 'old' the circa 2015 architectural concepts were and speculated that perhaps this version of EAP was not getting much effort. (In the deep learning world 2 years is an eternity).

Like I said, I'm exaggerating and spinning, but the fact is that the NN in use in early 2017 was simplistic, already outdated, with an architecture cribbed from Googlenet (which was simplistic and outdated already at the time). This is something they threw together as quickly as possible to begin enabling the most simplistic AP features.
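As a quick worked example of the frame-size point in that excerpt: halving the working frame between major blocks, and then skipping one of those halvings, leaves the final feature map 2x larger in each dimension (so 4x the spatial positions). The number of reductions below is illustrative, not a reconstruction of the real network's strides.

```python
# 416x640 input per the excerpt; each "reduction" halves both dimensions.
def final_frame(h, w, num_reductions):
    return h // (2 ** num_reductions), w // (2 ** num_reductions)

print(final_frame(416, 640, 5))   # with all reductions: (13, 20)
print(final_frame(416, 640, 4))   # one reduction omitted: (26, 40) -> 2x2 larger
```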
 
Isn't the version of AKnet (Autonomous Kar net) in Autopilot V9 also based on GoogLeNet, a.k.a. Inception V1 - a neural network that won the 2014 ImageNet challenge?

For perspective, V9 camera network is 10x larger and requires 200x more computation when compared to Google's Inception V1 network from which V9 gets its underlying architectural concept.

What is really fundamentally new in neural network architectures since 2014? Haven’t there mostly been incremental advances on the same general architectures? I know there is new stuff like capsule networks, but they are still experimental and not production ready.

If GoogLeNet is outdated, what’s a non-outdated neural network architecture that is used in production today?

There seems to be lots of academic and open source work on neural network architectures. I think most of the time companies just use existing architectures.

It’s kind of like how Android and Chrome OS use the Linux kernel. Why bother writing your own kernel from scratch when there is a perfectly good one that’s already freely available? Saying, “Omg, did you hear Google just downloaded some kernel off the Internet for Android? They didn’t even write their own, they just cribbed it!” just seems like you are out of touch with how open source software works, and how important open source software is to the software industry. Proprietary ≠ better. Open source ≠ plagiarism.

Maybe neural networks are different, but I don’t see why you wouldn’t use — or at least start with — a successful neural network architecture that’s already been developed by Google, Microsoft, a university research group, or whoever. Why isn’t using GoogLeNet the equivalent of using the Linux kernel?
 
This is not a matter of opinion, though I am certainly putting my own spin on it which is where the exaggeration comes from. But go back to the first page of this thread, where you will find @jimmy_d 's findings, which I will cherry-pick excerpt here:

Like I said, I'm exaggerating and spinning, but the fact is that the NN in use in early 2017 was simplistic, already outdated, with an architecture cribbed from Googlenet (which was simplistic and outdated already at the time). This is something they threw together as quickly as possible to begin enabling the most simplistic AP features.

But it is your opinion, because you have no idea what Tesla used to create their FSD video. Nobody outside of a Tesla NDA knows what software was used to make that film.

It seems probable that they used NVIDIA's own DrivePX software, which is far more advanced than "a little toy demo they downloaded from the internet". For sure it was not the EAP NN that Jimmy_d looked at later.
 
But it is your opinion, because you have no idea what Tesla used to create their FSD video. Nobody outside of a Tesla NDA knows what software was used to make that film.

Look at the context of my response. I was responding specifically to a question about the released software in early 2017. What they used for the demo video is probably completely different, but that's not what we were talking about.
 
Just wanted to echo our appreciation for your contribution, and ask:

Is developing a set of software tools a huge undertaking? What would the time frame be on this? And would this fall under Stuart Bowers' responsibilities?

I don't know anything about Tesla's internal organization.

It's probably worth mentioning here that: I don't know anyone who works at Tesla, I've never been in contact with Tesla in any fashion other than as a Tesla vehicle owner, I have no access to non-public information from inside Tesla via any means at all. All my speculation is based on some meagre knowledge of the technical state of the art plus occasional firmware files that are provided to me by other Tesla enthusiasts who seem to be extracting them from their vehicles somehow.

All my mistakes are my own.

Creating a production software tool suite to support the use of a custom systolic array IC (if that is what the Tesla Vision NN computer is) is probably an undertaking on a similar scale to that of developing the IC itself - ten to a hundred designer-years of work not counting support staff. If the IC architecture is something else it could be quite a bit more software - the systolic array is one of the simpler approaches, from a software standpoint.
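For anyone wondering what a systolic array actually does: it's a grid of multiply-accumulate cells in which the weights sit still while activations and partial sums flow through, so a big matrix multiply happens without re-fetching the weights. Here's a toy Python model of that dataflow (no clock-cycle pipelining, and not a model of any real chip - just the idea):

```python
import numpy as np

def systolic_matmul(A, W):
    """Toy weight-stationary systolic array: cell (k, n) holds weight W[k, n].
    Activations stream across the rows, partial sums flow down the columns."""
    M, K = A.shape
    K2, N = W.shape
    assert K == K2
    C = np.zeros((M, N))
    for m in range(M):                       # each activation row streams through the array
        partial = np.zeros(N)                # partial sums entering the top of each column
        for k in range(K):
            a = A[m, k]                      # activation broadcast across row k of cells
            partial = partial + a * W[k, :]  # one multiply-accumulate per cell
        C[m, :] = partial                    # results emerge from the bottom of the columns
    return C

A = np.random.randn(4, 8)
W = np.random.randn(8, 3)
print(np.allclose(systolic_matmul(A, W), A @ W))   # True
```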
 
Here, do you mean the design of the architecture is new and groundbreaking, or that its design is close to an existing architecture but it's way bigger?

Oh, and you think the network has 5x more weights, but what about layers? Any increase in layers?

The layer count is the same as for Inception V1 - 1/2/3ab/4abcdefg - Inception was 1/2/3ab/4abcde/5ab. Those are Inception V1 layers. An individual inception layer is a compound of 1x1, 3x3, 5x5, and maxpool layers, which is comparable to 3 or 5 non-compound CNN layers. The transitions between numbers (1 to 2, 2 to 3) correspond to a reduction in the frame size being processed at each layer. In the V8 network you'd get a reduction by half at each of those steps (640x416 to 320x208, etc). V8 is modified from Inception in some minor ways which basically come down to adjusting the depth of the convolutional kernels used in various places - plus some stuff to accommodate the higher resolution being used at the start (Inception V1 was 224x224 3-color images).
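For readers who haven't looked at Inception V1: a single 'inception layer' as described above is a small parallel block along the lines of the PyTorch sketch below. The channel counts follow GoogLeNet's published 3a module purely for illustration (the AP networks use different depths, per the tuning described here), and ReLUs and auxiliary heads are omitted for brevity.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception V1-style compound layer: parallel 1x1, 3x3, 5x5 and maxpool
    branches whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),           # 1x1 "reduce"
                                nn.Conv2d(96, 128, 3, padding=1))  # then 3x3
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))   # then 5x5
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))           # pool projection
    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

x = torch.randn(1, 192, 52, 80)        # arbitrary frame size, just for illustration
print(InceptionModule(192)(x).shape)   # torch.Size([1, 256, 52, 80])
```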

So architecturally it's really similar to inception but it's been modified - 'tuned' if you will, to match the differences in the output requirements, size of the training data pool, accuracy targets, training resources, and so forth between a network designed to perform well on an academic benchmark and one designed to perform well in a real world product. The inception architecture introduced some novel and ground breaking advances that improved final accuracy to a degree but beyond that significantly reduced the runtime computational requirements at the cost of an increase in training resource requirements. Later versions of the inception architecture (V2, V3, V4 so far) change the internal structure of the inception modules but can be seen as incremental refinements. I'm not in a good position to say whether V2/3/4 would be a better match to Tesla's needs than V1. Similarly, while there are other, newer networks in the public domain which perform better on public benchmarks by various criteria I am not in a position to say whether they are a better option for Tesla's needs than Inception V1.

I don't entirely disagree with the sentiment that progress since Inception V1 has been incremental. There has been a lot of very important work and a lot of important stuff learned. But there aren't any networks that are so much better than Inception V1 that I would say Tesla should certainly be using them.

AKNET_V9 uses this same Inception V1 architecture but it makes some notable changes well beyond the tuning that the V8 networks did. For one thing, the frame sizes and kernel depths are sufficiently bigger (2x or more) that normally you'd expect the behavior to move into a new regime, because the expressive power of the network is going to be qualitatively greater. Usually when that happens you have to make other changes as well, and you will start seeing behavior that can't be easily extrapolated from the original system. Additionally, AKNET_V9 has dual-frame inputs: it looks at two successive frames from each camera, which allows the network 'at a glance' to leverage instantaneous motion to enhance its discriminative capabilities and probably to provide new categories of output (like object relative motion). This is novel AFAIK and would constitute the kind of advance that I would like to see described in a peer-reviewed scientific paper. Finally, the structure of AKNET makes it clear that the network itself must be camera agnostic, since it is being fed by cameras that have substantial variations in their optical characteristics. This is also novel and similarly worthy of a paper describing the research that went into it.

For these reasons I describe AKNET_V9 as 'ground breaking'.
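On the dual-frame input point: the simplest way to hand two successive frames to a single convolutional network is to stack them along the channel axis, roughly as below. This is only a guess at the mechanism - the metadata doesn't say exactly how AKNET_V9 combines the frames - and the sizes and channel counts are placeholders.

```python
import torch

# Two successive frames from one camera: (batch, channels, height, width).
frame_prev = torch.randn(1, 3, 416, 640)
frame_curr = torch.randn(1, 3, 416, 640)

# Concatenate along the channel axis so the very first convolution
# can see instantaneous motion between the two frames.
dual_input = torch.cat([frame_prev, frame_curr], dim=1)
print(dual_input.shape)   # torch.Size([1, 6, 416, 640])
```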

Alas, industrial research often does not see the light of day for many years, because keeping the research internal conveys competitive advantages. This is probably why it took Google years to report what they learned from TPU V1. Hopefully Tesla will, eventually, allow their developers to contribute to the state of public knowledge as well.
 
Amazing how you figure all of this stuff out without having access to the source code of this system!

Frankly, I do have access to a sort of 'source code' for some of the networks. The metadata file for AKNET_V9 describes it in terms that aren't too hard to analyze. I don't have that data for the other networks, which is why I don't yet have much to say about them.
 