So, this image library is not a "test one image, and all the others will work" situation. I believe every image needs to be entered with numerous scale/resolution, viewing-angle, and lighting variations: probably hundreds or thousands of variations for a single sign. So if a stop sign is recognized OK, it doesn't mean that all other signs will be recognized with the same precision. There was an article where someone from the Tesla FSD team described how difficult it is to assemble all this data, including different road signs for different countries.
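To give a rough sense of the scale involved (a toy sketch, not anything Tesla actually does; the axes and step counts below are made up), even a coarse grid over a few variation axes multiplies out fast:

```python
from itertools import product

# Toy illustration: coarse grids over a few hypothetical variation axes.
scales = [0.25, 0.5, 1.0, 2.0]                  # 4 apparent sizes
angles = range(-40, 41, 10)                     # 9 viewing angles (degrees)
lighting = ["noon", "overcast", "dusk", "night",
            "glare", "rain", "snow"]            # 7 lighting/weather conditions
occlusion = [0.0, 0.1, 0.3]                     # 3 occlusion levels

variants = list(product(scales, angles, lighting, occlusion))
print(len(variants))  # 4 * 9 * 7 * 3 = 756 variants for ONE sign type
```

And that's before you multiply by the number of distinct sign designs per country, weathered/damaged signs, and so on.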
As for corner cases, I'm thinking of, for example, traffic light colors made invisible by sun glare; I recently saw a case where you could see the color only from a certain angle. Or snow on the road: we currently have some untreated roads with no visible lane markings where people drive fine, but I doubt an FSD system would know what to do.
A lot of the time, "wrong way" signs are perpendicular to the road you're on and totally confusing; you can barely figure out that they relate to a different road. It just feels like a lot of these cases need to be hit specifically in testing, not just covered by a single round of generic testing.