What's the difference between this guy and a construction worker holding a slow/stop sign? To us, an easy distinction based on many cues. To a self-driving car? We can only speculate.
I think that is why computer vision needs to be more detailed. Human vision sees the world in super high def, and we see every little detail. For example, we don't just see a pedestrian, we see every detail of the pedestrian, from the clothes they wear to the brand on their baseball cap, the watch on their wrist, the type of sneakers, etc. We don't just see a car, we see every part of the car, from the license plate to the wheel caps. Now obviously, a lot of that detail is not needed for driving, and humans learn to ignore details that are not relevant to driving. But that detail does help us. In this case, a human would see that the stop sign is a print on a t-shirt, so we would know it is not a real stop sign that needs to be obeyed. Or we see a stop sign being held by a construction worker, so we know it is a temporary stop sign that should be obeyed. Computer vision needs to understand that detail and context as well in order not to be fooled.
Sensor fusion could also help. In this case, the lidar would detect a pedestrian, and the camera vision would detect the pattern of a stop sign at the exact same location as the pedestrian. So the perception system could learn that this stop sign is not real because it is "inside" the pedestrian, and real stop signs are distinct from people. Additionally, if the stop sign is moving with the same velocity as the pedestrian, that is another clue that it is false, since stop signs don't move like pedestrians.
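To make the idea concrete, here is a rough Python sketch of those two checks. Everything in it is an assumption for illustration: the `Detection` class, the function names, the shared coordinate frame, and the thresholds are all made up, and a real perception stack would learn this rather than hard-code it.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    # Axis-aligned box (x_min, y_min, x_max, y_max) in a frame shared by both sensors (assumed).
    box: Tuple[float, float, float, float]
    # Estimated velocity (vx, vy) in m/s (assumed to be available from tracking).
    velocity: Tuple[float, float]


def fraction_inside(inner, outer) -> float:
    """Return the fraction of inner's area that overlaps outer."""
    width = min(inner[2], outer[2]) - max(inner[0], outer[0])
    height = min(inner[3], outer[3]) - max(inner[1], outer[1])
    overlap = max(0.0, width) * max(0.0, height)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return overlap / area if area > 0 else 0.0


def false_sign_cues(sign: Detection, pedestrian: Detection,
                    min_inside: float = 0.9,
                    max_speed_diff: float = 0.2,
                    min_speed: float = 0.3) -> List[str]:
    """List the cues suggesting the sign belongs to the pedestrian.

    The thresholds are hypothetical placeholders, not tuned values.
    """
    cues = []
    # Cue 1: the camera's sign box sits "inside" the lidar's pedestrian box.
    if fraction_inside(sign.box, pedestrian.box) >= min_inside:
        cues.append("inside_pedestrian")
    # Cue 2: the sign moves with the pedestrian. Only meaningful when the
    # pedestrian is actually moving, since real signs are stationary too.
    ped_speed = math.hypot(pedestrian.velocity[0], pedestrian.velocity[1])
    speed_diff = math.hypot(sign.velocity[0] - pedestrian.velocity[0],
                            sign.velocity[1] - pedestrian.velocity[1])
    if ped_speed >= min_speed and speed_diff <= max_speed_diff:
        cues.append("moves_with_pedestrian")
    return cues
```

With a pedestrian at `box=(100, 50, 160, 230)` walking at `(1.2, 0.0)`, a t-shirt print at `box=(115, 90, 145, 120)` moving at `(1.2, 0.05)` triggers both cues, while a roadside sign at `box=(300, 40, 340, 80)` with zero velocity triggers none. Note that a sign held out to the side by a construction worker would mostly fall outside the worker's box, so these cues alone would not flag it as false.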
Context could also help. I bet you could train ML on millions of examples of where stop signs are usually located. Then, perception could check whether the stop sign matches the training of where stop signs are supposed to be. If the answer is no, then that would be another indicator that the stop sign is false. For example, we know stop signs are located at intersections, on city streets, and in construction zones. Stop signs are not located in the middle of the sidewalk where there is no reason to stop. Also, are other vehicles stopping? If not, then that is another contextual clue that it is likely a false stop sign.
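As a toy illustration of how those contextual clues could be collected, here is a small Python sketch. It uses a hand-written lookup as a stand-in for what a trained model would learn from data; the location labels and the function name are hypothetical.

```python
from typing import List

# Places where the post says stop signs are normally found (labels are made up).
EXPECTED_LOCATIONS = {"intersection", "city_street", "construction_zone"}


def false_sign_indicators(location: str, other_vehicles_stopping: bool) -> List[str]:
    """Return the contextual indicators that a detected stop sign may be false."""
    indicators = []
    # Indicator 1: the sign is somewhere stop signs are not normally placed.
    if location not in EXPECTED_LOCATIONS:
        indicators.append("unexpected_location")
    # Indicator 2: surrounding traffic is not reacting to the sign.
    if not other_vehicles_stopping:
        indicators.append("traffic_not_stopping")
    return indicators
```

For example, `false_sign_indicators("sidewalk", False)` returns both indicators, while `false_sign_indicators("intersection", True)` returns an empty list. No single indicator is proof by itself, so a real system would weigh them together with everything else it sees.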
In conclusion, I would say that self-driving cars can learn the difference. It just takes more ML training to make the system better at understanding context.