This is evidence of how brittle the system is, and of the current limits of classification.
Humans can use simulators with less-than-perfect imagery because we recognize a car, tree, or stop sign by very general traits and surrounding context. We don't suddenly fail to drive a simulated car because the pixels or lighting aren't exactly right. For the same reason, we aren't confused by a stop sign painted on a car, or a bike on a bike carrier. These sims are useful to us because they can teach the timing, reactions, and behaviors of a system even when the images are imperfect, and they can do it faster and more safely than the real thing.
I'd argue that if your system works well on simulated graphics and then also performs in the real world, it's a pretty robust system, one that isn't going to fail when Dodge puts 4 stripes on a car instead of 3, or when Tesla actually releases the Cybertruck.
It's also interesting that Google claims to be way past classification: their simulators focus on scenarios, not imagery, just like the sims humans use.