I agree that's the generally understood definition of end-to-end, but the current practice for constructing large ML models like LLMs actually consists of a pipeline of multiple individual neural network modules trained and used in tandem.
For example, see Andrej Karpathy's simplified implementation of a block of self-attention in his GPT-2 recreation:
nanoGPT/model.py at master · karpathy/nanoGPT
A single forward pass involves passing the tensors first through the CausalSelfAttention module and then through an MLP module. Each is a neural network in its own right, but no one would argue that this implementation of GPT-2 isn't end-to-end.
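A minimal sketch of that block structure (simplified and not the actual nanoGPT code; I'm using `nn.MultiheadAttention` in place of nanoGPT's hand-rolled `CausalSelfAttention`, and the dimensions are made up):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """GPT-2-style transformer block: self-attention sub-network, then MLP sub-network."""
    def __init__(self, n_embd=32, n_head=4):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        # Stand-in for nanoGPT's CausalSelfAttention module
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(n_embd)
        # The MLP module: expand, nonlinearity, project back
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        # Causal mask: each position attends only to itself and earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a = self.ln_1(x)
        attn_out, _ = self.attn(a, a, a, attn_mask=mask)
        x = x + attn_out                  # first sub-network: attention
        x = x + self.mlp(self.ln_2(x))    # second sub-network: MLP
        return x

x = torch.randn(2, 8, 32)  # (batch, sequence, embedding)
y = Block()(x)
print(y.shape)  # torch.Size([2, 8, 32])
```

The point is that the tensors flow through two distinct sub-networks in one forward pass, and gradients flow back through both in one backward pass.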
Maybe the real distinction between a system that's end-to-end and one that's not is whether the modules are trained (and used for inference) as a single unit, or trained separately.
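Concretely, "trained as one" means a single loss and a single optimizer spanning every module's parameters. A toy sketch with two hypothetical modules (names and shapes are mine, just for illustration):

```python
import torch
import torch.nn as nn

# Two modules, each a small neural network in its own right
encoder = nn.Sequential(nn.Linear(4, 16), nn.ReLU())
head = nn.Linear(16, 1)

# End-to-end: ONE optimizer over all parameters, so one loss signal
# backpropagates through both modules at once
opt = torch.optim.SGD(
    list(encoder.parameters()) + list(head.parameters()), lr=0.1
)

x, target = torch.randn(8, 4), torch.randn(8, 1)
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(encoder(x)), target)
    loss.backward()  # gradients reach the head AND the encoder
    opt.step()
```

A non-end-to-end pipeline would instead train the encoder on some separate objective, freeze it, and then fit the head on top of its outputs.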