Possible scientific vindication of using just cameras, no lidar

I’m super excited by this recent paper: “3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection”. The authors used a four-camera system (similar to the eight-camera system in Hardware 2 Teslas) to determine the locations of obstacles in real time with an accuracy of under 10 cm (3.9 in). By comparison, lidar has an accuracy of 1.5 cm (0.6 in).

My strong hunch is that an accuracy of under 10 cm is good enough for full self-driving. For reference, a credit card is 8.6 cm (3.4 in) long. At that point, you’re getting close to the limit of how accurately a human driver can control a car. I found a study in which drivers were only able to park with about 10 cm of accuracy at best.

The big caveat here is that the multi-camera system was only tested at low speeds. The experiments occurred in a parking garage. I have not been able to find any published research on multi-camera systems at high driving speeds.

Here’s what I’m trying to figure out now. What would it take to adapt a multi-camera system to high driving speeds, while retaining an accuracy of under 10 cm?

Based on my interactions on Quora, Facebook, Twitter, and Stack Exchange, and from emailing the paper’s first author, the challenge seems to be motion blur and other visual artifacts that occur at higher speeds. Some people I have talked to have suggested that this can be overcome with cameras that use a global shutter (i.e. one that captures every pixel simultaneously, as opposed to a rolling shutter, which captures pixels line by line) and a high enough frame rate. One person suggested shutter speed is also important.
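
To get a feel for the numbers, here is a rough back-of-envelope sketch (my own, not from the paper) of how much blur a single exposure picks up, assuming a simple pinhole camera model and made-up values for FOV, resolution, exposure time, and object distance:

```python
import math

def blur_pixels(speed_mps, exposure_s, distance_m, hfov_deg, image_width_px):
    """Approximate blur (in pixels) of a stationary roadside object at
    distance_m, viewed from a car moving at speed_mps, over one exposure."""
    # Pinhole model: focal length in pixels from horizontal FOV and image width.
    focal_px = (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Small-angle approximation of how far the object sweeps across the view.
    angular_sweep_rad = (speed_mps * exposure_s) / distance_m
    return focal_px * angular_sweep_rad

# Example: 30 m/s (~67 mph), 1/100 s exposure, object 10 m away,
# 90-degree horizontal FOV, 1280-pixel-wide image (all illustrative numbers).
print(f"{blur_pixels(30.0, 0.01, 10.0, 90.0, 1280):.1f} px of blur")
```

Halving the exposure time halves the blur, which is presumably why shutter speed was mentioned alongside frame rate.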

I’m hoping the community here can help me answer this question conclusively — or as conclusively as possible without running a test of a multi-camera system at high speeds. People here really go deep on the cameras used in Hardware 2 Teslas as well as the software. I don’t have a deep understanding of the technical details. So I’m looking for some help.
 
Very interesting, although Tesla also has radar in the mix and has been working (at least as of late 2016) to integrate it.

Back then, they were running the radar at only 10 fps, considerably slower than the frame rates of the monochrome (IR/visible light) video cameras.

Maybe you can trade framerate for image quality, since the system only has to be better than human reaction times.

Upgrading Autopilot: Seeing the World in Radar

(but since then we've had Chris Lattner arrive and leave and Andrej Karpathy join, so who knows what's happening)
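
To put the frame rate versus reaction time point in perspective, here is a quick sketch; the 10 Hz radar figure comes from this post, while the other rates and the 1.5 s reaction time are my own illustrative assumptions:

```python
# How far does the car travel between sensor updates, versus during a typical
# human reaction time? Update rates below are illustrative.
MPH_TO_MPS = 0.44704

def meters_per_update(speed_mph, updates_per_sec):
    return speed_mph * MPH_TO_MPS / updates_per_sec

for speed in (25, 45, 70):                 # mph
    for rate in (10, 30):                  # Hz (e.g. radar vs. camera)
        print(f"{speed} mph @ {rate} Hz: {meters_per_update(speed, rate):.2f} m per update")
    print(f"{speed} mph over a 1.5 s reaction time: {speed * MPH_TO_MPS * 1.5:.1f} m")
```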
 
I've never understood the problem with using cameras.
We operate cars using just our eyes already.
I suppose our eyes are pretty high res, and can turn, and we have massive processing power, but still. We certainly don't have lidar or radar.
 
Why would this paper be a vindication for using cameras only? It seems to be a paper on how cameras only can be used, not a comparison of cameras vs. cameras + lidar.

Nobody does lidar only, so the comparison is not cameras vs. lidar; it is cameras vs. cameras + lidar (plus a varying number of radars on both sides: one radar for Tesla publicly, 360-degree radar coverage in competitors' public prototypes).
 
I've never understood the problem with using cameras.
We operate cars using just our eyes already.
I suppose our eyes are pretty high res, and can turn, and we have massive processing power, but still. We certainly don't have lidar or radar.

Even if we forget the processing power question (lidar can help by requiring less processing), or that human eyes are mobile, i.e. they can be moved, and come attached to hands that can clear away obstructions...

Even if we forget those, there is the added benefit of redundant technologies complementing each other in various weather and lighting conditions. The question is not so much what can work. Surely cameras alone can work. But what is best? What results in the safest cars?

Radars can see through obstacles (so far Tesla has only a front radar, so no 360-degree coverage).

Lidar can see in darkness, for example. Think of a fast car with its lights turned off approaching from behind.

Redundant sensor fusion can certainly be a very useful safety net, even if vision itself were deemed sufficient.
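
As a concrete illustration of the safety-net argument, here is a minimal sketch of fusing two independent range estimates by inverse-variance weighting, assuming Gaussian errors and using the roughly 10 cm (camera) and 1.5 cm (lidar) figures discussed in this thread; the specific ranges are made up:

```python
# Minimal sketch of redundant sensor fusion: combining two independent,
# noisy range estimates with inverse-variance weighting.

def fuse(est_a, sigma_a, est_b, sigma_b):
    """Inverse-variance weighted fusion of two independent measurements."""
    w_a = 1.0 / sigma_a**2
    w_b = 1.0 / sigma_b**2
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_sigma = (1.0 / (w_a + w_b)) ** 0.5
    return fused, fused_sigma

camera_range, camera_sigma = 20.05, 0.10   # meters, ~10 cm accuracy
lidar_range, lidar_sigma = 20.00, 0.015    # meters, ~1.5 cm accuracy

fused, sigma = fuse(camera_range, camera_sigma, lidar_range, lidar_sigma)
print(f"fused estimate: {fused:.3f} m, effective sigma: {sigma*100:.2f} cm")
```

In good conditions the fused estimate is dominated by the more precise sensor, but if either one degrades (darkness, fog, a dirty lens), the other still bounds the error.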
 
I agree that more sensors are better. I've always been fascinated by the concept of wiring different sensors into the brain (UV, infrared, etc.). No reason not to give the car the advantages that are so hard to give ourselves.

While I like the idea of going camera-only (for cost and ease of production), the cameras can't blink (to clear snow, dirt, or droplets), so that could also be a problem.
 
Why would this paper be a vindication for using cameras only? It seems to be a paper on how cameras only can be used, not a comparison of cameras vs. cameras + lidar.

This paper gives us hard numbers on the accuracy of cameras, but only at low speeds. We can compare those numbers against the numbers for lidar. If a multi-camera system has an accuracy of under 10 cm, lidar has an accuracy of 1.5 cm, and humans in tests can only park with an accuracy of around 10 cm at best, that implies cameras are sufficient for full self-driving at or beyond a human level of capability.

The big caveat is that the multi-camera system was only tested at low speeds. I’m trying to find out in this thread if a multi-camera system could achieve under 10 cm of accuracy at high speeds with the specs of Tesla’s Hardware 2 cameras.
 
Well, from the paper: “Each camera has a nominal FOV of 185° and outputs 1280 x 800 images at 12.5 frames per second.”

AP2 cameras beat that handily, except for the FOV, where there's only one camera that's wide (actually two, counting the backup camera as well, I guess).

You can check the collection of images in the AP2.0 Cameras: Capabilities and Limitations? thread to see the motion blur and rolling shutter for yourself. It also depends on lighting conditions to a degree.

The location of the cameras does not leave much overlap between their fields of view, but that might be OK. If there's a particular scene of interest, it's possible to collect a few snapshots from it to see how well the logic works, if you can get the authors of the paper to run their algorithms on AP2-produced images, I guess.

A bigger problem at the moment is that most of the cameras are only polled at 1 fps, with something like a 0.05-0.06 s difference from one camera to another (that does add up, especially at high speed). But given enough CPU, I am sure you can stitch overlapping sections of the images together.
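
To make the "that does add up" point concrete, here is a quick sketch of how far the car travels between two cameras' exposures, using the 0.05-0.06 s skew mentioned above and illustrative speeds:

```python
# Distance the car moves between two cameras' exposures, given the timing
# skew between them. Speeds are illustrative; the skew values come from the
# 0.05-0.06 s figure mentioned above.

def skew_offset_cm(speed_mps, skew_s):
    return speed_mps * skew_s * 100.0  # convert meters to centimeters

for speed_mps in (5, 15, 30):              # roughly 11, 34, 67 mph
    for skew_s in (0.05, 0.06):
        print(f"{speed_mps} m/s with {skew_s:.2f} s skew -> "
              f"{skew_offset_cm(speed_mps, skew_s):.0f} cm offset")
```

Even at city speeds the offset is larger than the 10 cm accuracy target, so the software would have to compensate for the known timing differences.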
 
Thank you for commenting, verygreen! Do you know if the Hardware 2 cameras are using a global shutter or a rolling shutter? Rolling shutter would be worrisome. Do you know the FPS? I’ll scour the AP2.0 Cameras thread to see what else I can learn.

The experiments used four fisheye cameras versus Tesla’s eight cameras, so total visual coverage for all eight cameras would be the same or better, I’m guessing.

I’m not too worried about what the software is currently doing since it can be updated. I want to know about the fundamental capability of the hardware because that can’t so easily be changed.
 
The sensor part number is known, so we can verify the capabilities. I doubt it has any sort of actual physical shutter, so probably at least some sort of rolling shutter effect is there. The two front cameras are used in 30 fps mode.
There was a link to the datasheet but I cannot find it quickly; the sensor is an Aptina AR0132 or some such.
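
For a sense of what a rolling shutter could mean at speed, here is a rough sketch; the 20 ms frame readout time is a placeholder I picked for illustration, not a figure from the AR0132 datasheet:

```python
# Car displacement between the readout of the first and last rows of a frame.
# The 20 ms frame readout time is a hypothetical placeholder, not a confirmed
# spec for the AP2 sensor.

def rolling_shutter_shift_cm(speed_mps, frame_readout_s):
    return speed_mps * frame_readout_s * 100.0

# Car at 30 m/s (~67 mph), assumed 20 ms rolling-shutter readout.
print(f"{rolling_shutter_shift_cm(30.0, 0.020):.0f} cm of shift across one frame")
```

Real rolling-shutter distortion is geometric (objects skew and smear rather than simply shifting), but this gives the order of magnitude.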
 
I am not an expert on the topic, but one concern I have with AP2 is weather/dirt/salt/snow/splashes etc. limiting the cameras' ability to see. This would be an especially big problem for the cameras on the fenders. I have cameras at that location on my car and they never stay clean during the winter.

How will this be dealt with? I feel that even if Tesla were going to go with cameras only, it should have incorporated some kind of cleaning mechanism.
 
This is not a problem at all (other than for the backup camera). See this post in the AP2.0 Cameras: Capabilities and Limitations? thread, and the next one.
 
I saw this article a few weeks ago - thank you for posting it. At the time I remember thinking that it is excellent theoretical support for Tesla's approach.
 
I've never understood the problem with using cameras.
We operate cars using just our eyes already.
I suppose our eyes are pretty high res, and can turn, and we have massive processing power, but still. We certainly don't have lidar or radar.
The problem is that cameras require more processing "intelligence" than lidar. I'm rooting for Tesla seeing as I own an AP2 car. Just saying - that's the root of the issue. Can the AI be made reliable enough? We shall see.