Perhaps some of the NNs can run at a lower frame rate and higher resolution, or intermittently? The lane change cameras, for example, likely don't need the same update rate as maintaining lane position or watching for forward obstacles.
Just for fun I downsampled some frames grabbed from some of the HW2 cameras under different conditions (source) to 104x160. Kind of gives you a sense of what level of detail we're talking about. These are full size (Try zooming in 500x.)...
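For anyone who wants to try the same thing, here's a minimal sketch of that kind of downsampling as a 4x4 box average in NumPy. The 416x640 source size and the 4x factor come from the network discussion in this thread; the random frame is just a stand-in for a real grab:

```python
import numpy as np

# Hypothetical stand-in for a grabbed camera frame: 416x640 grayscale.
# A real frame would be loaded from an image file instead.
frame = np.random.randint(0, 256, size=(416, 640), dtype=np.uint8)

# Downsample to 104x160 by averaging non-overlapping 4x4 pixel blocks.
h_factor, w_factor = 416 // 104, 640 // 160  # both 4
small = frame.reshape(104, h_factor, 160, w_factor).mean(axis=(1, 3))

print(small.shape)  # (104, 160)
```

The block-average is crude compared to a proper resampling filter, but it's close enough to see how much detail survives at 104x160.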
Awesome stuff!

Whoa, I just got my hands on 2017.42 - lots of goodies there. Two more NNs, one for the wide angle cam and one for the repeaters. Also even more maps stuff (perhaps that stuff will finally start working now?)
I am a bit nervous to actually install it due to the suspension problems that are widely reported.
So I got a chance to look at the network specification for the AP2 neural network in 40.1. As @verygreen previously reported, the input is a single 416x640 image with two color channels - probably red and grey. Internally the network processes 104x160 reduced frames as quantized 8 bit values. The network itself is a tailored version of the original GoogLeNet inception network plus a set of deconvolution layers that present the output. Output is a collection of 16 single color frames, some at full and some at quarter resolution. The network is probably a bit less than 15 million parameters given the file size.
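To make those numbers concrete, here's some purely illustrative shape bookkeeping in Python based on the figures above. The 52x80 "quarter resolution" value is my assumption that quarter means half the size in each dimension, and the 8-bit file-size estimate is just the post's parameter count taken at face value:

```python
# Shapes as described in the post; purely illustrative bookkeeping.
INPUT_SHAPE = (2, 416, 640)   # 2 color channels (probably red + grey), H, W
INTERNAL_SHAPE = (104, 160)   # 8-bit quantized internal working resolution
NUM_OUTPUT_FRAMES = 16        # single-color output frames

# The internal frames are a 4x reduction of the input in each dimension.
reduction = (INPUT_SHAPE[1] // INTERNAL_SHAPE[0],
             INPUT_SHAPE[2] // INTERNAL_SHAPE[1])

# Assumption: "quarter resolution" = half the size in each dimension.
FULL_RES = INTERNAL_SHAPE                           # (104, 160)
QUARTER_RES = (FULL_RES[0] // 2, FULL_RES[1] // 2)  # (52, 80)

# ~15M parameters stored as 8-bit values would be roughly a 15 MB file,
# consistent with the "given the file size" estimate.
approx_file_mb = 15_000_000 * 1 / 1e6

print(reduction, QUARTER_RES, approx_file_mb)
```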
So what does this mean? Images go into it, and for every input frame the network produces a set of 16 interpretations, which also come in the form of grayscale images. Some external process takes those processed frames and makes control decisions based on them, probably after including radar and other sensors. This is not an end-to-end network: it doesn't have any scalar outputs that could be used directly as controls.
Also, the kernel library includes a number of items with intriguing names that are unused in the current network. At a minimum this must mean that there are variations of this network that have more features than they are exploiting in the current version, or that they have other networks with enhanced functionality which share the same set of kernels.
So if we cautiously assume the current network consumes data at 60 fps (2 cameras at 30 fps each), and we get 2 more NNs of similar complexity handling another 120 fps (2x repeater + 2x pillar cams), that's 180 fps in total, plus a hopefully simpler wide camera NN. That just leaves the backup camera, which has a totally different picture pattern but, on the other hand, would not need to be used all the time.
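The frame-rate budget above works out like this (camera counts and per-camera rates are the post's assumptions, not confirmed figures):

```python
CAM_FPS = 30  # assumed per-camera frame rate

current_nn_fps = 2 * CAM_FPS  # 2 cameras on the current NN -> 60 fps
new_nns_fps = 4 * CAM_FPS     # 2x repeater + 2x pillar cams -> 120 fps
total_fps = current_nn_fps + new_nns_fps

print(total_fps)  # 180, before adding the (hopefully simpler) wide cam
```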
So it looks like there's a chance overall performance will be about where it should be, unless they drastically redo their NNs to make them much heavier, I imagine.
Actually 70% is on the good side. Utilization is basically what fraction of the theoretical maximum flops for some particular chip a particular piece of code is using. For instance, CPU utilization on a typical laptop might bounce around 10% for a typical heavy workload. Only pretty well optimized stuff that is well tuned to the particular hardware you're using will ever get above about 25%. That's just the nature of today's general purpose computer ICs: they can do a lot of different things, but they aren't optimized for any of them.
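As an illustration of what "utilization" means here, the TFLOPS numbers below are invented for the example, not measurements of any real chip:

```python
def utilization(achieved_flops: float, peak_flops: float) -> float:
    """Fraction of a chip's theoretical maximum FLOPS actually used."""
    return achieved_flops / peak_flops

# Invented numbers: a chip with a 9.0 TFLOPS theoretical peak sustaining
# 6.3 TFLOPS on a well-tuned network is running at 70% utilization.
print(round(utilization(6.3, 9.0), 2))  # 0.7
```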
Sorry, that's kind of a digression.
I don't actually know that this piece of code is running at 70%, so my guess might be optimistic. I found a benchmark running Inception-V2 on a GP104 IC which showed 70% utilization. The first 2/3 of this 40.1 network is basically identical to Inception-V2, and the last 1/3 is computationally similar. I think the AP2 hardware is likely based on a GP106 IC, which is architecturally identical to the GP104 but about half the size, so I think it should be able to get similar utilization. That seemed to be the best way for me to generate an estimate.
I think the hardware isn't underpowered at all. I haven't seen a network that is known to be able to do FSD, so it's only speculation on my part, but I'd expect this chip to be able to do it if the FSD algorithm is decently mature. Properly tuned, this chip can probably run a much more competent network than this one on 8 cameras simultaneously at 30fps. The fact that the encoding portion of this network is a trivial cut-and-paste from an Imagenet contest winner from 2014 suggests to me that they aren't even trying to optimize this puppy.
I mean seriously, it's like an intern did the network architecture for this. That's a big part of why I think they must be working on something else. Because this sure doesn't feel like the product of a world class team with tons of resources.
Comma AI seems to be doing the basic stuff for lane keeping on a smart phone, which is probably somewhere between a hundred and a thousand times less capable than this GPU. That gives me hope that the problem will turn out to be tractable with existing hardware.
Isn't it possible, actually plausible, that the NN deployed and evaluated by us is a placeholder legacy version? I mean, it seems obvious to me that this is the case, that the future EAP/FSD stuff hasn't been deployed and thus we can't infer or validate its existence (nor can any competitors!). Yes, this is bad for the implied benefits of a "shadow mode" but I don't think it's necessarily bad for the suspected advantage of Tesla's fleet learning. Our cars could be collecting data and shipping it home where it is used to train the model(s) we've yet to see. That wouldn't require a NN at all, just the sensors/cameras and a data feed. It would also provide excellent data for the simulator we keep hearing about...
As an aside, and I admit I am not certain of the relevance, but I once interviewed one of the guys that built the original Forza Motorsport. He told me an interesting story. Up until that time all racing games had an AI "line" which the non-player cars would use to get around each track. So, when they built Forza they applied this strategy and built an ideal "line" for every track in the game. Because of the Xbox's increased compute power the game had far better physics - each car was its own model and behaved its own way - a major selling point when they pitched the game for funding. The physics were so good (relatively speaking) that the classic, generalized line that was used for the car to follow simply didn't work. Cars would go careening off the track or into walls. Amazing, right!? Thus, they had to rethink the car AI. They subsequently took each car and had to "train" it to drive around each track, each car producing its very own line for each track based on the conditions (wet, dry) and modifications (tires, suspension, engine, etc). Cool stuff. (And probably not all that different from Tesla's simulator.)
There was a post earlier in this thread about Tesla's pathing being algorithmic, with the NN (CNN, DLNN, whatever) building the world "model". That comment got me thinking about the Forza example. It also got me thinking about the recent air suspension woes in 2017.41 - it may answer why they are messing with air suspension in the first place; obviously suspension performance is a major input to the capabilities of the car, which matters in algorithmic pathing. You need to know exactly how the car is going to behave or it might go flying off the track.
Also, if an Xbox 360 can model these physics (for all the cars in a race at once, in real time), smartphones today probably can (Comma.ai), and AP2 or AP2.5 absolutely can.
Source?

Just pointing out, 2.5 has 2x the GPU capacity of 2.0, so it's 100% more powerful.
Whoa!!! Again, how are you finding this information?! So there are three NNs in 2017.42??
I'd like ramp-to-ramp AP without nags to ease going to the in-laws' for Thanksgiving...
I bet we may be seeing those new NNs in the much ballyhooed "stealth mode", collecting data before they are activated for everyone. It would explain why they seem intent on rolling this out to everyone. Exciting times.
The NNs are visible in the firmware. Yes, there are 3 NNs in 2017.42.
But how do you examine the firmware?
What NN does the front long/narrow range camera use? The same as the main front camera?
Well duh, it's a bunch of files. Just look at the files and you see what they are.
narrow and main are served by the same NN.
I mean how do you get the files?