Benchmarking software against humans

strangecosmos · Aug 20, 2018

Pretty much every academic paper published about autonomous vehicles compares its results to a benchmark data set like KITTI Vision or Cityscapes. This is great for comparing software to other software, but what about comparing software to humans?

It strikes me that it would be easy to quantify autonomous vehicle progress if only we had human benchmarks for error rates on tasks like object detection, lane keeping, depth estimation, and so on. I wonder if anyone is aware of any helpful points of comparison along these lines.

Tam · Aug 20, 2018

Trent Eady said:
...comparison...

Tesla has promised to publish Autopilot safety data quarterly, but in the meantime we have some consolation prizes:

Tesla has been gathering data from owners, and it shows that Autosteer keeps to the center of the lane much better than human drivers do.

NHTSA graphed a reduction of almost 40 percent in crashes for Teslas equipped with Autopilot. (This counts the presence of the hardware, regardless of whether it was turned on or off, or whether the feature was paid for.)

In the fatal Uber accident, Tempe Police determined that 85% of drivers would be able to see a pedestrian like Elaine Herzberg from 44 meters (143 feet) away in the same scenario (39 MPH, at night, with the same street lights).

The NTSB graph shows that the autonomous system detected Elaine Herzberg, who was walking her bicycle, at about 25 meters, or 85 feet.
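
For a rough sense of what those two distances mean in time, here is a back-of-the-envelope sketch in Python. It assumes the car holds a constant 39 MPH with no braking, which is a simplification, and the function name is just illustrative.

```python
MPH_TO_MPS = 1609.344 / 3600  # metres per second in one mile per hour


def seconds_to_cover(distance_m: float, speed_mph: float) -> float:
    """Time to travel distance_m at a constant speed_mph (no braking)."""
    return distance_m / (speed_mph * MPH_TO_MPS)


speed_mph = 39        # speed quoted above
human_sight_m = 44    # distance at which 85% of drivers would see the pedestrian
system_detect_m = 25  # detection distance quoted above

human_s = seconds_to_cover(human_sight_m, speed_mph)
system_s = seconds_to_cover(system_detect_m, speed_mph)
print(f"human: {human_s:.1f} s, system: {system_s:.1f} s")
# prints "human: 2.5 s, system: 1.4 s"
```

Under that assumption, the gap between the two figures is roughly one second of reaction time.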

mblakele · Aug 20, 2018

Tam said:
Tesla has been gathering data from owners, and it shows that Autosteer keeps to the center of the lane much better than human drivers do.

I'm sure AP is much more consistent than human drivers, but I've often seen it get confused about where the edges of the lanes are. If AP itself is confused, it's going to report that it's dead center, even when it isn't. So I question that particular claim, unless the lane edges were established independent of AP. Otherwise it's "position in what AP identifies as the lane".
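
To make the concern concrete, here is a small simulation sketch in Python. Everything in it is an assumption for illustration: the error sizes (0.30 m for lane perception, 0.05 m for steering) are made up, not measured from any car. It only shows how a system that measures its offset against its own lane estimate can look far more centered than it really is.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

true_offsets = []      # car position relative to the real lane centre (m)
reported_offsets = []  # car position relative to the centre the system perceives (m)

for _ in range(10_000):
    # Hypothetical perception error: where the system thinks the centre is,
    # relative to the real centre.
    perceived_centre = random.gauss(0.0, 0.30)
    # Hypothetical control error: how far the car sits from the perceived centre.
    tracking_error = random.gauss(0.0, 0.05)
    car_position = perceived_centre + tracking_error

    true_offsets.append(car_position)
    reported_offsets.append(car_position - perceived_centre)

reported_sd = statistics.pstdev(reported_offsets)
true_sd = statistics.pstdev(true_offsets)
print(f"self-reported spread: {reported_sd:.2f} m")
print(f"true spread:          {true_sd:.2f} m")
```

The self-reported spread only reflects the steering error, while the true spread also includes the perception error, which is why an independent measurement of the lane edges matters.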

Tam · Aug 20, 2018

mblakele said:
...it get confused...

It's not perfect, which is why the red area isn't a thin straight line shooting up at the center. It's more like an "A", with its base gradually spreading out to both sides because of those confused Autosteer mistakes.

strangecosmos · Aug 21, 2018

mblakele said:
I'm sure AP is much more consistent than human drivers, but I've often seen it get confused about where the edges of the lanes are. If AP itself is confused, it's going to report that it's dead center, even when it isn't. So I question that particular claim, unless the lane edges were established independent of AP. Otherwise it's "position in what AP identifies as the lane".

Good point: what’s the ground truth?

JeffK · Aug 21, 2018

Trent Eady said:
Good point: what’s the ground truth?

You can't benchmark against humans as humans have certain flaws and machines have a separate set of flaws.

As far as image recognition goes, we've seen ImageNet competitions where the computers outscore humans. In the real world, they can sometimes spot things humans miss while totally failing on something humans get with ease.

In writing this post I thought it might be cool to do an AI Where's Waldo, but it appears someone has already done that.

You cannot always use lane centering either. Humans often hug the outside of the lane on purpose when being passed by a semi or oversized vehicle. This is a matter of comfort and preference. We might veer off center to avoid a pothole or to give a car parked on the side of the road some space. Sometimes people hug the line separating them from the lane they wish to merge into just before they put their blinker on (if they actually care to use one). Mobileye's test vehicles will actually hug the line before merging, just like a human.

The one true benchmark is eventually going to be the fatality rate. This is pretty much a standard measure used in many countries to benchmark both drivers and the average safety of vehicles on the road.

However, another benchmark, for which human data doesn't accurately exist, would be incidents where the vehicle collides with something else. Most of this goes unreported. It'd be nice if FSD cars acted like bumper cars with invisible bumpers, in that ideally they shouldn't run into anything and should have buffer zones of "personal space".

My benchmark is this:
Can a car get me to where I'm going safely, without incident, and comfortably so that it feels natural?

MXFLA · Aug 21, 2018

JeffK said:
However, another benchmark, for which human data doesn't accurately exist, would be incidents where the vehicle collides with something else. Most of this goes unreported. It'd be nice if FSD cars acted like bumper cars with invisible bumpers, in that ideally they shouldn't run into anything and should have buffer zones of "personal space".

Regarding the bumper car analogy, Nissan/Infiniti has a system called DCA that auto brakes for other vehicles (not sure about objects) even when their version of TACC is not engaged. It is a really nice feature, keeps a set distance between your car and others at all times, and should be easy for Tesla to add in a software update.

Tam · Aug 21, 2018

I guess the California DMV's annual disengagement reports are a good start. The less often a human needs to intervene in the system, the better!
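
The usual way to compare fleets on this is miles driven per disengagement. Here is a minimal sketch in Python; the fleet names and numbers are invented for illustration and are not taken from the DMV report.

```python
def miles_per_disengagement(miles: float, disengagements: int) -> float:
    """Average autonomous miles driven between human interventions."""
    if disengagements == 0:
        return float("inf")  # no interventions recorded over these miles
    return miles / disengagements


# Hypothetical fleets, NOT figures from the DMV report.
fleets = {
    "Fleet A": (100_000, 20),
    "Fleet B": (5_000, 50),
}

for name, (miles, count) in fleets.items():
    rate = miles_per_disengagement(miles, count)
    print(f"{name}: {rate:,.0f} miles per disengagement")
```

Higher is better here, though the number says nothing about how hard the miles were or how each company decides what counts as a disengagement.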

The table was compiled by TheLastDriverLicenseHolder.com.
