But this is exactly the O(N²) scaling @KarenRei mentioned: with a given layering architecture, the width of a layer is roughly a constant percentage of the total number of neurons N. So as N increases, the number of weights between two such layers, and thus the inference cost of the most important layers, goes up quadratically - i.e. O(N²).
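To make that concrete, here's a minimal Python sketch (my own illustration, not anything from the firmware): it assumes a fully-connected layer whose width is a constant fraction of N, in which case doubling N roughly quadruples both the weight count and the multiply-accumulate count of that layer.

```python
# Minimal sketch (illustrative, not Tesla's actual architecture):
# if a fully-connected layer's width is a fixed fraction c of the total
# neuron count N, the weight matrix between two such layers holds
# (c*N)^2 entries, so memory and compute both grow as O(N^2).

def dense_layer_cost(total_neurons: int, layer_fraction: float = 0.1):
    """Weights and MACs for one dense layer whose width is a fixed
    fraction of the total neuron count (assumed, for illustration)."""
    width = int(total_neurons * layer_fraction)
    weights = width * width  # one weight per input/output neuron pair
    macs = weights           # one multiply-accumulate per weight
    return weights, macs

for n in (10_000, 20_000, 40_000):   # doubling N each step...
    w, m = dense_layer_cost(n)
    print(f"N={n:>6}: weights={w:>14,}  MACs={m:>14,}")
    # ...roughly quadruples weights and MACs: O(N^2)
```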
This is why @jimmy_d's enthusiasm about the large networks in the firmware was justified IMO: the overhead doesn't increase linearly with the number of neurons but quadratically.
Given that Tesla's current networks only utilize about 10% of a single HW3 chip's capacity, I doubt there's much input down-sampling done in the "full size" FSD networks (be it max pooling or any other size-reduction method, such as post-training pruning of layers).
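For context on what down-sampling buys, here's a rough sketch with made-up numbers (the resolution and channel counts are illustrative, not Tesla's): a single 2x2 max-pool halves each spatial dimension, so the next convolution sees a quarter of the activations and does roughly a quarter of the multiply-accumulates - exactly the kind of saving you can skip if you have chip capacity to spare.

```python
# Hypothetical illustration: how much compute a 2x2 max-pool saves
# in the convolution layer that follows it.

def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """MACs for one k x k convolution over an h x w x c_in input
    (stride 1, 'same' padding assumed for simplicity)."""
    return h * w * c_in * c_out * k * k

h, w, c_in, c_out = 960, 1280, 64, 64   # made-up camera-scale numbers
full = conv_macs(h, w, c_in, c_out)
pooled = conv_macs(h // 2, w // 2, c_in, c_out)  # after one 2x2 max-pool
print(f"without pooling: {full:,} MACs")
print(f"after 2x2 pool:  {pooled:,} MACs ({full / pooled:.0f}x fewer)")
```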
So yes, it makes sense to talk about big O here, as long as there's still significant idle time left in the compute unit.
O(N²) also makes it clear how hard the competition will find it to catch up with Tesla's FSD technology:
- Even Google-backed Waymo is going to have significant difficulty matching HW3 performance IMHO - and they have no business model to reach Tesla's training data volume.
- All the ICE makers trying to build their own FSD solution on commodity computing hardware are IMO out of the FSD race. They have neither the hardware nor the training data - they just don't know yet that they are dead.