Tezlanian
Member
Amnon Shashua has said multiple times that it has. If you had actually watched Mobileye's presentations from 2014 until now, you would know they have since stopped talking about sensing. It's of no interest to them anymore; they have basically solved it for SDCs. For quite a while now, their talks have focused on RSS (Responsibility-Sensitive Safety) instead.
First of all, the question of "solving vision" or "solving perception," as most people put it, is the wrong framing: you don't need to solve vision to get an SDC.
If you were trying to create an AGI, then yes. For an SDC you simply need a system that reaches a certain level of accuracy/verification.
For example, according to Amnon, Mobileye's EyeQ3 produces 1 false positive in pedestrian detection every 400,000 miles.
In reference to sensing, i.e. solving vision for an SDC, Amnon said:
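To put that rate in perspective, here is a back-of-the-envelope sketch. The fleet mileage is a made-up assumption for illustration, not a Mobileye figure; only the 1-per-400,000-miles rate comes from the post above.

```python
# Back-of-the-envelope check on the claimed false-positive rate:
# 1 pedestrian-detection false positive per 400,000 miles.
MILES_PER_FALSE_POSITIVE = 400_000

# Hypothetical fleet mileage -- an assumption for illustration only,
# not a Mobileye number.
fleet_miles_per_day = 1_000_000

expected_false_positives_per_day = fleet_miles_per_day / MILES_PER_FALSE_POSITIVE
print(expected_false_positives_per_day)  # 2.5
```

So even a large hypothetical fleet would see only a handful of such false positives per day at the claimed rate.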
"When people think about sensing, they think about object detection: vehicles, pedestrians, traffic signs, lights, objects, etc. You receive an image as an input and your output is a bounding box... this is the easiest problem. This problem has been solved."
In reference to the second level of sensing (semantic free space), i.e. where you can and cannot drive:
"This is already in production."
In reference to the third level of sensing (drivable path), where the input is an image and the output is a story (for example, "this lane is ending in 15 meters"), Amnon says it's an open problem in the industry that is solved by REM maps.
So yes, vision for SDCs is SOLVED.
To see how far ahead Mobileye is with respect to sensing, check out, for example, the bounding-box accuracy of Zoox's sensing system.
Notice how inaccurate and jumpy the detection is?
You can see it by simply going to Zoox's website; they have a video on their home page:
https://zoox.com/wp-content/uploads/2018/07/Vision_Video_graded-Window-3.mp4
Note the loose bounding boxes.
Now compare that to the tight and accurate 3D bounding boxes of Mobileye's EyeQ4.
The accuracy difference between the two is night and day; it's not even comparable.
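One way to make "tight vs. loose" concrete is intersection-over-union (IoU) between a predicted box and the ground-truth box, the standard metric in detection benchmarks. A minimal sketch (the example boxes are invented for illustration, not taken from either company's output):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle (zero if they don't intersect).
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (100, 100, 200, 300)  # hypothetical pedestrian box
tight_pred   = (105, 105, 198, 295)  # tight, stable detection
loose_pred   = (80, 60, 240, 340)    # loose, jumpy detection

print(iou(ground_truth, tight_pred))  # high, close to 1.0
print(iou(ground_truth, loose_pred))  # noticeably lower
```

A tight tracker keeps IoU near 1.0 frame after frame; a jumpy one drops well below that even when it "sees" the object.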
"Perception" in the simplest sense you're trying to use was already solved over a decade ago with algorithms like Speeded-Up Robust Features (SURF), among others. I think we're talking about different things. All of these systems are either trained with machine learning or use a hand-crafted algorithm to recognize an object; if they have not been trained on the data, or deliberately instructed to recognize an object, they won't recognize it. That's how the Harry Potter-watching guy got decapitated while Mobileye's system hummed along A-OK.
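That closed-set limitation, where a system only handles classes it was trained or programmed for, can be shown with a toy stand-in. The class list and labels here are invented for illustration; no real detector is this simple, but the failure mode is the same.

```python
# Toy illustration of closed-set recognition: the "model" only knows the
# labels it was built with; anything else falls through as UNKNOWN.
KNOWN_CLASSES = {"car", "pedestrian", "traffic_light", "cyclist"}

def recognize(observed_label):
    """Stand-in for a trained detector: returns a known class, or
    UNKNOWN for any object type it was never trained on."""
    return observed_label if observed_label in KNOWN_CLASSES else "UNKNOWN"

print(recognize("pedestrian"))                # pedestrian
print(recognize("crossing_tractor_trailer"))  # UNKNOWN
```

The point: the system doesn't degrade gracefully on a novel object; it simply has no category for it at all.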
Recognition and perception are different things. Perception is an actual understanding of the environment, whereas the machine only knows what it has been told; it will only work in the scenarios it was designed for. For anyone to claim it is completely solved is pretty asinine. Perhaps for a very limited environment, but that's all. The intent of vehicles in roundabouts, ice over traffic lights, traffic lights that have lost power, a plane making an emergency landing on the interstate, and so on: I highly doubt all of that is included in their wholly solved perception model. Solving most of your problem space while ignoring everything else doesn't mean it's solved.