Welcome to Tesla Motors Club

Benchmarking software against humans

Pretty much every academic paper published about autonomous vehicles compares its results to a benchmark data set like KITTI Vision or Cityscapes. This is great for comparing software to other software, but what about comparing software to humans?

It strikes me that it would be easy to quantify autonomous vehicle progress if only we had human benchmarks for error rates on tasks like object detection, lane keeping, depth estimation, and so on. I wonder if anyone is aware of any helpful points of comparison along these lines.
 

Tesla has promised to publish Autopilot safety data quarterly, but in the meantime we have some consolation prizes:

Tesla has been gathering data from owners, and it shows that Autosteer keeps the car centered in its lane much better than human drivers do:

autopilot-position-in-lane.jpg



NHTSA graphed a drop of almost 40 percent in the Tesla crash rate for cars equipped with Autopilot (what counts is the presence of the hardware, regardless of whether it is turned on or off, or whether the feature was paid for):

12-0575658750.jpg



In the fatal Uber accident, Tempe Police benchmarked that at 44 meters (143 feet) away, 85% of drivers would be able to see a pedestrian like Elaine Herzberg in the same scenario (39 mph at night with the same street lights).

An NTSB graph shows that Elaine Herzberg, walking her bicycle, was detected by the autonomous system at about 25 meters, or 85 feet (yellow bands below):


41583363594_8de1769563_z.jpg
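To put those two distances in perspective, here is a rough back-of-the-envelope conversion into time before impact. It is only a sketch: it assumes the car holds a constant 39 mph with no braking, and it takes the 44 m and 25 m figures from this post at face value. The function name is made up for illustration.

```python
# Rough time-to-impact comparison, assuming constant speed and no braking.
MPH_TO_MPS = 0.44704  # exact conversion: 1609.344 m per mile / 3600 s per hour

def seconds_to_impact(distance_m: float, speed_mph: float) -> float:
    """Time needed to cover distance_m at a constant speed_mph."""
    return distance_m / (speed_mph * MPH_TO_MPS)

# Distances quoted in the post: 44 m (human benchmark), 25 m (system detection).
human_s = seconds_to_impact(44.0, 39.0)
system_s = seconds_to_impact(25.0, 39.0)

print(f"human benchmark: {human_s:.1f} s before impact")
print(f"system detection: {system_s:.1f} s before impact")
```

Under those assumptions the human benchmark leaves about 2.5 seconds to react and the system's detection about 1.4 seconds, so the gap is roughly one second.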
 
I'm sure AP is much more consistent than human drivers, but I've often seen it get confused about where the edges of the lane are. If AP itself is confused, it will report that it's dead center even when it isn't. So I question the lane-centering claim unless the lane edges were established independently of AP. Otherwise the measurement is really "position in what AP identifies as the lane".
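Here is a toy illustration of why that distinction matters. All the numbers are invented, and the variable names are hypothetical; nothing here reflects how Tesla actually collects or reports this data. The idea is simply that the car's offset from the true lane center equals its offset from the perceived center plus however far off the perceived center is.

```python
import statistics

def mean_abs_offset(offsets_m):
    """Average absolute distance from lane center, in meters."""
    return statistics.fmean([abs(x) for x in offsets_m])

# Invented samples: what the system would self-report (car vs. perceived center).
car_vs_perceived_m = [0.02, -0.01, 0.03, 0.00, -0.02]

# Invented samples: error of the perceived center vs. independently surveyed
# lane edges. In two samples the system misjudges the lane by 0.4 m.
perceived_center_error_m = [0.0, 0.0, 0.4, 0.4, 0.0]

# Offset from the true center is the sum of the two.
car_vs_true_m = [p + e for p, e in zip(car_vs_perceived_m, perceived_center_error_m)]

self_reported = mean_abs_offset(car_vs_perceived_m)
ground_truth = mean_abs_offset(car_vs_true_m)
print(f"self-reported: {self_reported:.3f} m, vs. ground truth: {ground_truth:.3f} m")
```

In this made-up case the self-reported number looks excellent while the ground-truth number is roughly ten times worse, which is exactly the gap an independent reference would be needed to expose.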
 
Good point: what’s the ground truth?
 
You can't benchmark against humans as humans have certain flaws and machines have a separate set of flaws.

As far as image recognition goes, we've seen ImageNet competitions where computers outscore humans. In the real world, they can sometimes spot things humans miss while totally failing at something humans get with ease.

In writing this post I thought it might be cool to do an AI Where's Waldo, but it appears someone has already done that.


You cannot always use lane centering either. Humans often hug the outside of the lane on purpose when being passed by a semi or an oversized vehicle. This is a matter of comfort and preference. We might veer off center to avoid a pothole or to give a car parked on the side of the road some space. Sometimes people hug the line separating them from the lane they wish to merge into just before they put their blinker on (if they actually care to use one). Mobileye's test vehicles will actually hug the line before merging, just like a human.

The one true benchmark is going to eventually be the fatality rate. This is pretty much a standard measure done in many countries to benchmark both drivers and the average safety of vehicles on the road.
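For what it's worth, the usual way to make that comparison fair is to normalize by exposure, for example deaths per 100 million vehicle miles. A minimal sketch of the arithmetic follows; the fleet figures are placeholders I made up to show the calculation, not real statistics, and the function name is hypothetical.

```python
def fatalities_per_100m_miles(fatalities: int, miles_driven: float) -> float:
    """Normalize a fatality count by exposure (per 100 million vehicle miles)."""
    if miles_driven <= 0:
        raise ValueError("miles_driven must be positive")
    return fatalities * 1e8 / miles_driven

# Placeholder inputs, NOT real data: they only demonstrate the normalization.
human_rate = fatalities_per_100m_miles(fatalities=12, miles_driven=1.0e9)
software_rate = fatalities_per_100m_miles(fatalities=3, miles_driven=5.0e8)

print(f"human drivers: {human_rate:.2f} per 100M miles")
print(f"software:      {software_rate:.2f} per 100M miles")
```

One caveat: fatalities are rare enough that a small fleet needs a very large number of miles before the two rates can be told apart with any confidence.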

However, another benchmark, for which human data doesn't accurately exist, would be incidents where the vehicle collides with something else. Most of this goes unreported. It'd be nice if FSD cars acted like bumper cars with invisible bumpers, in that ideally they shouldn't run into anything and should have buffer zones of "personal space".

My benchmark is this:
Can a car get me to where I'm going safely, without incident, and comfortably so that it feels natural?
 
Regarding the bumper car analogy, Nissan/Infiniti has a system called DCA (Distance Control Assist) that automatically brakes for other vehicles (not sure about objects) even when their version of TACC is not engaged. It is a really nice feature that keeps a set distance between your car and others at all times, and it should be easy for Tesla to add in a software update.