Recognizing your words happens in-car. Translating those words into actions requires a network request. (This confused me when I was trying to figure out how to phrase voice commands so they'd work while parked in the garage. It'll display the words as they're recognized, then wait, then punt.)
This design makes sense for navigation requests, internet audio searches, and easter eggs, but it's unreliable and slow for car controls like turning on recirculation and fog lights.
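
To make the split concrete, here is a rough sketch of what handling car controls locally and sending everything else over the network could look like. This is not the car's actual software: `LOCAL_COMMANDS`, `resolve_command`, and `network_lookup` are all made-up names for illustration, and the phrase table is just the two examples above.

```python
from typing import Callable, Optional

# Hypothetical table of car-control phrases that could be resolved in-car,
# with no network round trip.
LOCAL_COMMANDS = {
    "turn on recirculation": "recirculation_on",
    "turn on fog lights": "fog_lights_on",
}


def resolve_command(
    text: str,
    network_lookup: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Map recognized text to an action name, or None if nothing resolves it."""
    # Normalize case and whitespace so minor recognition differences still match.
    key = " ".join(text.lower().split())

    # Car controls: answer immediately from the local table.
    action = LOCAL_COMMANDS.get(key)
    if action is not None:
        return action

    # Everything else (navigation, audio search, ...) still needs the network.
    try:
        return network_lookup(key)
    except OSError:
        # No signal (e.g. parked in a garage): give up instead of hanging.
        return None
```

With something like this, "turn on fog lights" would still work with no signal, while a navigation request would fail fast instead of waiting on a request that never completes.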