Oldie but goodie. Blog post from September 2014.
“There are now several tasks in Computer Vision where the performance of our models is close to human, or even superhuman. Examples of these tasks include face verification, various medical imaging tasks, Chinese character recognition, etc. However, many of these tasks are fairly constrained in that they assume input images from a very particular distribution. For example, face verification models might assume as input only aligned, centered, and normalized images. In many ways, ImageNet is harder since the images come directly from the “jungle of the interwebs”. Is it possible that our models are reaching human performance on such an unconstrained task?”
Karpathy's top-5 error ended up being 5.1%. That was enough to beat GoogLeNet at the time, but nowadays there are plenty of neural network architectures with a top-5 error below 5%.
However, the big caveat is that about two-thirds of Karpathy's errors were attributable to an inability to learn or memorize all 1,000 object categories, especially fine-grained ones like different dog breeds.
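For readers unfamiliar with the metric: a top-5 prediction counts as correct if the true label appears anywhere among the model's five highest-scoring classes. Here is a minimal sketch of the computation, assuming NumPy arrays of per-class scores and integer ground-truth labels (the function name and data are illustrative, not from Karpathy's post):

```python
import numpy as np

def top5_error(scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of images whose true label is NOT among the 5 highest scores.

    scores: (n_images, n_classes) array of per-class scores or logits.
    labels: (n_images,) array of integer ground-truth class indices.
    """
    # Indices of the five highest-scoring classes per image (order irrelevant).
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hits = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Example: random scores over ImageNet's 1,000 classes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 1000))
labels = rng.integers(0, 1000, size=100)
print(f"top-5 error: {top5_error(scores, labels):.3f}")  # ~0.995 for random guessing
```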
If anyone is aware of any similar research (even n=1 studies like this one) on benchmarking human vision against computer vision, please share. I would love to see more work like this.