I’m flummoxed by a recent discovery. The AI/robotics startup Vicarious (in which Elon is an investor, along with Jeff Bezos and Mark Zuckerberg) has developed a new neural network architecture they call a Recursive Cortical Network (RCN). Vicarious used its RCN to solve CAPTCHAs with the same accuracy as a Google DeepMind convolutional neural network. Here’s the kicker: the RCN was trained on only 260 examples, versus 2.3 million for the ConvNet. That’s roughly 8,800 times fewer examples, or a ~900,000% improvement in training data efficiency. Holy $%!&.
You can read about the RCN solving CAPTCHAs in Vicarious’ blog post on the matter, or you can read their paper in the journal Science, if you happen to have access. Vicarious also has a reference implementation of its RCN up on GitHub.
So, the RCN has achieved state-of-the-art accuracy on optical character recognition with ~900,000% higher training data efficiency. Here’s my question: has anyone tried to adapt Vicarious’ RCN for 3D computer vision?
I’m a lay enthusiast and CS 101 dropout, not a computer scientist or software engineer. So I don’t have the ability to try this myself, or even the knowledge to say whether it would be feasible to try using the code Vicarious has made available on GitHub. Apologies if this is a misconceived question.
But if I have not exceeded my depth here, this seems like such an exciting experiment. If the RCN can match the accuracy of state-of-the-art ConvNets not just on character recognition, but on object detection in a 3D environment, and do so after being trained on ~0.011% as many examples, imagine the possibilities. Imagine training the vision neural networks for an autonomous car on video from 500 miles of driving, and achieving the same object detection accuracy as Waymo’s neural nets after 5 million miles.
Or, even more exciting for companies like Waymo and Tesla: what if, using RCNs, 5 million miles of test driving turned out to be as good as 50 billion miles with ConvNets?
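For anyone who wants to check the arithmetic, here is a minimal Python sketch. The helper name `data_efficiency` is made up for illustration, the only inputs are the 260 and 2.3 million figures quoted above, and the driving-mile extrapolation assumes (a big, untested assumption) that the CAPTCHA ratio would carry over to driving data.

```python
def data_efficiency(small: int, large: int) -> dict:
    """Compare two training-set sizes (hypothetical helper, for illustration only)."""
    ratio = large / small
    return {
        "ratio": ratio,                            # how many times fewer examples
        "percent_improvement": (ratio - 1) * 100,  # the same ratio as a percentage gain
        "percent_of_large": small / large * 100,   # small set as a share of the large one
    }


# Figures from the post: 260 examples for the RCN, 2.3 million for the ConvNet.
stats = data_efficiency(260, 2_300_000)
print(f"{stats['ratio']:,.0f}x fewer examples")
print(f"{stats['percent_improvement']:,.0f}% improvement")
print(f"{stats['percent_of_large']:.3f}% as many examples")

# Assumption: the CAPTCHA ratio transfers unchanged to miles of driving data.
waymo_miles = 5_000_000
rcn_equivalent_miles = waymo_miles / stats["ratio"]      # miles an RCN would need
convnet_equivalent_miles = waymo_miles * stats["ratio"]  # ConvNet miles matched by 5M RCN miles
print(f"{rcn_equivalent_miles:,.0f} RCN miles vs {waymo_miles:,} ConvNet miles")
print(f"{waymo_miles:,} RCN miles vs {convnet_equivalent_miles:,.0f} ConvNet miles")
```

The exact ratio works out to about 8,846, which gives roughly 565 miles and 44 billion miles. The 500 miles and 50 billion miles in the post correspond to rounding the ratio up to 10,000.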