The bottom line: without insider knowledge, it's really hard to know or predict how the compute performance of HW3 will work out in the long run. They may be using both processors (non-redundantly) for primary compute today, but there are a lot of unknown ways things could be intentionally slowed down now that could eventually be resolved to free up significant capacity in the future:
1. They could currently be doing more subtle versions of shadowing (e.g. not necessarily two whole versions of the entire stack, but running multiple variants of some particular NN: one that's actually operating the car, and some older or newer variants running alongside for comparison, to flag interesting cases for future work).
2. In an even broader sense than the above, they could be running some "extra" code (deterministic or NN) whose only purpose is to observe the behavior of the main software and gather data about edge cases in the live code for future development, etc.
3. They could be running "debug"-level code for some parts, which is perhaps much more resource-intensive than it needs to be, in order to provide better traces of problems leading to disengagements.
4. There could be very significant optimizations still to come. Maybe right now they're more focused on feature development, and at some later point, when lack of compute capacity becomes more limiting, they'll turn their attention to optimizing NN execution, optimizing the deterministic parts, and even making software-architectural changes that yield significant new efficiencies. I highly doubt they're anywhere near squeezing out every last drop at this stage of development.
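To make the shadow-mode idea in points 1–3 concrete, here's a minimal sketch of how that pattern generally works. Everything here is hypothetical and illustrative (the model functions, the threshold, the field names are all made up, not anything from Tesla's actual stack): a primary model's output drives the vehicle, a shadow variant runs on the same inputs purely for comparison, and disagreements get logged for later analysis rather than acted on.

```python
import json

DISAGREE_THRESHOLD = 0.15  # made-up tolerance on the steering output


def primary_model(frame):
    # stand-in for the production NN: returns a steering command
    return 0.10 * frame["curvature"]


def shadow_model_v2(frame):
    # stand-in for a candidate NN variant being evaluated in the background
    return 0.12 * frame["curvature"]


def run_frame(frame, log):
    cmd = primary_model(frame)           # this value controls the car
    shadow_cmd = shadow_model_v2(frame)  # computed but never acted on
    if abs(cmd - shadow_cmd) > DISAGREE_THRESHOLD:
        # flag the interesting frame for later upload / offline analysis
        log.append({"frame_id": frame["id"],
                    "primary": cmd,
                    "shadow": shadow_cmd})
    return cmd  # only the primary output is ever used


log = []
for i, curv in enumerate([0.5, 2.0, 9.0]):
    run_frame({"id": i, "curvature": curv}, log)
print(json.dumps(log))
```

The point of the sketch is the cost model: the shadow variant consumes real inference compute on every frame even though its output is discarded, which is exactly the kind of load that could quietly be removed later to free up capacity.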
Still, it's a valid concern that they may end up never investing much in further optimizing HW3 performance once they've got significant HW4 deployment.