Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.

Academic research

New paper on 3D object detection using a monocular camera. Abstract:

“3D object detection from monocular images has proven to be an enormously challenging task, with the performance of leading systems not yet achieving even 10% of that of LiDAR-based counterparts. One explanation for this performance gap is that existing systems are entirely at the mercy of the perspective image-based representation, in which the appearance and scale of objects varies drastically with depth and meaningful distances are difficult to infer. In this work we argue that the ability to reason about the world in 3D is an essential element of the 3D object detection task. To this end, we introduce the orthographic feature transform, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space. This allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful. We apply this transformation as part of an end-to-end deep learning architecture and achieve state-of-the-art performance on the KITTI 3D object benchmark.”
[1811.08188] Orthographic Feature Transform for Monocular 3D Object Detection
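For intuition, here is a minimal sketch of the core idea: project each cell of a bird's-eye-view grid into the image with the camera intrinsics and sample the feature map there. The function name, the nearest-pixel sampling, and the flat-ground assumption are mine, not the paper's; the actual method pools features over the whole projected voxel region (using integral images) inside an end-to-end network.

```python
import numpy as np

def orthographic_feature_transform(feat_map, intrinsics, grid_xz, y=0.0):
    """Map image-plane features onto an orthographic ground-plane grid.

    Sketch of the idea only: for each BEV grid cell, project its 3D
    centre (x, y, z) into the image via the pinhole model and sample
    the feature column at that pixel. The paper instead average-pools
    over the full projected voxel area.
    """
    H, W, C = feat_map.shape
    fx, fy, cx, cy = intrinsics          # hypothetical intrinsics layout
    out = np.zeros(grid_xz.shape[:2] + (C,))
    for i in range(grid_xz.shape[0]):
        for j in range(grid_xz.shape[1]):
            x, z = grid_xz[i, j]
            if z <= 0:
                continue                  # cell behind the camera
            u = int(round(fx * x / z + cx))  # pinhole projection
            v = int(round(fy * y / z + cy))
            if 0 <= u < W and 0 <= v < H:
                out[i, j] = feat_map[v, u]
    return out
```

Because the output grid lives in metric ground-plane coordinates, a car 10 m away and a car 50 m away occupy the same number of cells, which is exactly the scale consistency the abstract argues for.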

Source: Alex Kendall on Twitter
 

Thanks for the link. Very interesting.

“Importantly, we are able to recover correctly the depth of a car moving at the same speed as the ego-motion vehicle. This has been challenging previously — in this case, the moving vehicle appears (in a monocular input) as static, exhibiting the same behavior as the static horizon, resulting in an inferred infinite depth. While stereo inputs can solve that ambiguity, our approach is the first one that is able to correctly infer that from a monocular input.”
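To see why the same-speed case is degenerate, recall standard triangulation: depth Z = f·B/d, where f is focal length in pixels, B the baseline between views, and d the disparity. With a monocular camera, the baseline comes from ego-motion between frames, and a car travelling at the same speed as the camera shows zero apparent disparity, so naive triangulation diverges just like the static horizon does. A toy illustration (the function name and numbers are hypothetical, not from the quoted work):

```python
def depth_from_motion_parallax(focal_px, baseline_m, disparity_px):
    """Triangulated depth from two views separated by baseline_m.

    For a monocular camera the baseline is the distance the car drove
    between frames. Zero disparity (an object keeping pace with the
    camera) makes Z = f*B/d blow up: the degenerate case the quote
    describes.
    """
    if disparity_px == 0:
        return float("inf")  # indistinguishable from the horizon
    return focal_px * baseline_m / disparity_px
```

For example, with f = 700 px, 1 m of ego-motion between frames, and 7 px of disparity, the triangulated depth is 100 m; at 0 px it is infinite, which is why a learned prior (or stereo) is needed to break the ambiguity.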