S4WRXTTCS
So I got a chance to look at the network specification for the AP2 neural network in 40.1. As @verygreen previously reported, the input is a single 416x640 image with two color channels, probably red and grey. Internally, the network processes frames reduced to 104x160 as quantized 8-bit values. The network itself is a tailored version of the original GoogLeNet inception network plus a set of deconvolution layers that present the output. The output is a collection of 16 single-channel frames, some at full resolution and some at quarter resolution. Given the file size, the network probably has a bit less than 15 million parameters.
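To make the shapes concrete, here's a minimal PyTorch sketch of that I/O contract. This is NOT the actual AP2 network: only the shapes come from the spec (2x416x640 in, 104x160 internal, 16 grayscale maps out); the stem, the stand-in trunk, and the 8/8 split between full- and quarter-resolution heads are all guesses on my part.

```python
# Minimal sketch of the I/O contract; the layer choices are illustrative.
import torch
import torch.nn as nn

class AP2Sketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Stem: reduce 416x640 to the 104x160 internal resolution (4x per dim).
        self.stem = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
        )
        # Stand-in for the GoogLeNet-style inception trunk.
        self.trunk = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Deconvolution heads that present the output maps.
        # The 8/8 full/quarter split is a guess; the post only says "16 frames".
        self.full_head = nn.ConvTranspose2d(128, 8, kernel_size=4, stride=4)     # 8 maps @ 416x640
        self.quarter_head = nn.ConvTranspose2d(128, 8, kernel_size=2, stride=2)  # 8 maps @ 208x320

    def forward(self, x):
        f = self.trunk(self.stem(x))
        return self.full_head(f), self.quarter_head(f)

x = torch.zeros(1, 2, 416, 640)           # one frame, two color channels
full, quarter = AP2Sketch()(x)
print(full.shape, quarter.shape)          # (1, 8, 416, 640) (1, 8, 208, 320)
```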
So what does this mean? Images go in, and for every frame the network produces a set of 16 interpretations, which also take the form of grayscale images. Some external process takes those processed frames and makes control decisions based on them, probably after incorporating radar and other sensor data. This is not an end-to-end network: it has no scalar outputs that could be used directly as controls.
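Purely for illustration, a downstream consumer might do something like this with a single output frame. The real control process is unknown; treating the map as a lane probability and picking a 0.5 threshold are my assumptions, not anything in the spec.

```python
# One plausible way to turn a grayscale output map into usable geometry.
import numpy as np

def extract_points(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold one output frame, return (row, col) candidate pixels."""
    rows, cols = np.nonzero(prob_map >= threshold)
    return np.stack([rows, cols], axis=1)

lane_prob = np.random.rand(416, 640).astype(np.float32)  # stand-in output frame
points = extract_points(lane_prob)
print(points.shape)  # (N, 2): candidates a curve fitter / controller could consume
```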
Also, the kernel library includes a number of items with intriguing names that go unused in the current network. At a minimum, this means there are variants of this network with features the current version doesn't exploit, or other networks with enhanced functionality that share the same set of kernels.
Is it possible to run inference with the model, assuming we have a 416x640 two-color image from one of the cameras?
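For concreteness, if the red/grey guess above is right, I'd expect the input to be assembled from a camera frame roughly like this. Channel order, the luminance formula, and any normalization are pure guesses on my part:

```python
# Guesswork: build a 2x416x640 uint8 input from an RGB frame.
import numpy as np

def make_input(rgb: np.ndarray) -> np.ndarray:
    """rgb: uint8, shape (416, 640, 3) -> uint8, shape (2, 416, 640)."""
    red = rgb[:, :, 0]
    grey = rgb.mean(axis=2).astype(np.uint8)   # naive luminance
    return np.stack([red, grey], axis=0)       # stays 8-bit, per the quantized format

frame = np.zeros((416, 640, 3), dtype=np.uint8)
print(make_input(frame).shape)  # (2, 416, 640)
```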