Then what do you think this meant? This was the original post that spawned this discussion:
I feel like I'm purposely putting my foot in a bear trap here, but:
One of the major points of going NN-everywhere and eliminating those "300 k lines of C/C++ code" was to make the entire CPU more efficient, in the sense of, "You've got a task. How much compute time does it take?"
It's well known in CS/EE that certain algorithms are far more efficient than others at a given task. As a random example, take sorting: a bubble sort on a particular type of random data takes on the order of N^2 operations for N items in a list, while a tree sort takes order N*log(N). For a sufficiently large N, the tree sort is far, far faster. (And having actually had to do this back in the deeps of time on a thoroughly inadequate Original IBM PC, I can tell you this isn't hypothetical. The difference between an hour and less than three seconds is amazing.)
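To put rough numbers on that gap, here's a quick Python sketch (mine, not from the original discussion) timing a naive bubble sort against a heap-based O(N log N) sort as a stand-in for the tree sort; the 5,000-element list is an arbitrary size, and growing it makes the ratio blow up fast:

```python
import heapq
import random
import time

def bubble_sort(items):
    """O(N^2): repeatedly swap adjacent out-of-order pairs."""
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:  # already sorted; bail out early
            break
    return a

def heap_sort(items):
    """O(N log N): heapify once (O(N)), then pop N times at O(log N) each."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [random.random() for _ in range(5_000)]  # arbitrary N for the demo

for name, sort_fn in (("bubble O(N^2)", bubble_sort), ("heap O(N log N)", heap_sort)):
    start = time.perf_counter()
    result = sort_fn(data)
    assert result == sorted(data)  # sanity check: both produce the same order
    print(f"{name:<16} {time.perf_counter() - start:.4f} s")
```

On my reading of the asymptotics, bumping N by 10x costs the bubble sort roughly 100x more time but the heap sort only a bit over 10x more, which is exactly the hour-versus-seconds effect described above.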
With the Tesla driving computer, it's pretty clear that the original division of tasks had the NN doing mostly image recognition games, that being something NNs are known to do far more efficiently than a step-at-a-time computer, while the main CPU was tasked with taking all that nifty image data and, from it, doing the actual driving. That was then: the breakthrough was the idea that the NN could take on the driving part as well. Well, lack-a-day: we wetware types use NNs to do driving (yeah, I know, wetware != hardware, etc., etc.), so I suppose that's not a complete surprise. But Tesla's thought was that this would lead to an increased frame rate and, incidentally, better training.
So, from my perspective over here, this isn't a matter of whether the GPU/NN/A12 processing units are on the same die or not (yes, yes, they are on the same die, welcome to SOC land); it's whether using different algorithms on the different pieces of hardware (GPU/NN/A12) results in faster, "better" processing.
Ducks and covers.