Seeing the world in Autopilot


verygreen

So I guess some of you have already seen this Electrek article:

A rare look at what Tesla Autopilot can see and interpret

The data it contains was obtained from my old unicorn snapshots, plus another source I got recently that includes a lot of more recent snapshots from an 18.10.4 car.

These videos below use only interpreted radar data (so it's not the raw radar output; there's some classification and, I'm sure, some culling too. In particular, notice how overhead signs are quickly discarded).
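For illustration only, here's a minimal sketch of how that kind of culling might work on an interpreted radar track list. The threshold and field names are my assumptions, not anything pulled from the firmware: a return that sits well above the road and never moves is probably a sign gantry or bridge rather than something in the driving path.
Code:
# Illustrative only, not Tesla's code. A naive way to drop overhead
# returns from an interpreted radar track list: a target that sits well
# above bumper height and shows no motion of its own is probably a sign
# or a bridge, not something in the driving path.

OVERHEAD_HEIGHT_M = 2.5  # assumed threshold in metres; purely a guess

def cull_overhead(tracks):
    """Keep only tracks that could plausibly be in-path obstacles."""
    kept = []
    for t in tracks:
        is_static = not t.get("moving", False)
        if is_static and t.get("z", 0.0) > OVERHEAD_HEIGHT_M:
            continue  # tall and stationary: likely an overhead sign, discard
        kept.append(t)
    return kept

tracks = [
    {"id": 1, "z": 0.4, "moving": True},   # car ahead
    {"id": 2, "z": 3.1, "moving": False},  # overhead sign
]
print(cull_overhead(tracks))  # only track 1 survives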



And separately, another set of snapshots from this other source included nice "autopilot detected objects" data, apparently used for debugging object depth.

The object description looks like this:
Code:
            {
                "vision_loc_x": "23.495266",
                "vision_loc_y": "-3.82337713",
                "vision_loc_z": "0",
                "depth": "15.9356184",
                "velocity": "4.39441586",
                "log_likelihood": "3.50788665",
                "rad_loc_x": "15.125",
                "rad_loc_y": "-3.5",
                "rad_loc_z": "0.5",
                "rad_vx": "-3.5625",
                "rad_vy": "-0.25",
                "prob_obstacle": "0.96875",
                "prob_existence": "0.96875",
                "moving": "true",
                "stopped": "false",
                "stationary": "false",
                "bbox_top_left_x": "413",
                "bbox_top_left_y": "207",
                "bbox_height": "59",
                "bbox_width": "89",
                "bbox_core_x": "457",
                "bbox_core_y": "236"
            }
And the objects are reported separately for each camera (currently only main and narrow, though the header includes every camera with a luminance level from each, so we now know they also use the other cameras at least for luminance checks).
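If you want to poke at one of these records yourself, here's a rough sketch that loads it and compares the vision depth against the straight-line range to the radar locus. The field names come from the dump above; treating rad_loc_* as metres in a vehicle-centred frame is my interpretation, not something confirmed from the binary.
Code:
import json
import math

record = """{
    "vision_loc_x": "23.495266", "vision_loc_y": "-3.82337713", "vision_loc_z": "0",
    "depth": "15.9356184", "velocity": "4.39441586",
    "rad_loc_x": "15.125", "rad_loc_y": "-3.5", "rad_loc_z": "0.5",
    "rad_vx": "-3.5625", "rad_vy": "-0.25",
    "prob_obstacle": "0.96875", "prob_existence": "0.96875",
    "moving": "true", "stopped": "false", "stationary": "false",
    "bbox_top_left_x": "413", "bbox_top_left_y": "207",
    "bbox_height": "59", "bbox_width": "89",
    "bbox_core_x": "457", "bbox_core_y": "236"
}"""

# Everything in the dump is a string, so convert booleans and numbers.
obj = {k: (v == "true" if v in ("true", "false") else float(v))
       for k, v in json.loads(record).items()}

# Straight-line range to the radar locus (assuming vehicle-centred metres).
radar_range = math.hypot(obj["rad_loc_x"], obj["rad_loc_y"])
print(f"vision depth: {obj['depth']:.1f} m, radar range: {radar_range:.1f} m")
print(f"obstacle prob: {obj['prob_obstacle']:.2f}, moving: {obj['moving']}")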

[Image: narrow_new_2.jpg]


This one was interesting in that there's a stopped truck, and it has only a 25% probability of being an obstacle.

There's still some work going on here by others diving into the data to also check the raw radar stream (also included) against the interpreted stream, so some more data might come out of these snapshots.

Hopefully we'll also get some even more recent snapshots.

And I wanted to again thank @DamianXVI for the awesome visualization tools!
 
Great work! And the 25% obstacle probability explains why the car isn't braking for these parked vehicles.

By chance, do you have a similar example of a car on the side of the road? If it has a much higher obstacle percentage, that would point to the vision NN as the discriminator and imply that more examples of trucks are needed.
 
Judging from how the radar tags constantly evolve, it seems likely that the evaluation tags for the vision system, and the combined evaluations, are changing in real time. That obstacle tag is very likely the system's evaluation at the instant the image was taken. As the distance declines and other aspects of the vehicle's situation become clearer, the obstacle probability will be updated. In this case the correct evaluation is likely zero, since the stationary vehicle is not in the Tesla vehicle's traffic lane, or even in the adjacent lane.

The NN post-processing code that I saw devoted considerable resources to identifying other vehicles and determining their relative motion, identifying the location of lanes and the lane configuration, and assigning vehicles to lanes. I saw labels for moving, stationary, and stopped but didn't know what they meant. @verygreen now provides us with some info that helps to understand those labels. "Moving" is self-explanatory, but the difference between stationary and stopped is probably whether the system thinks a vehicle is parked and unlikely to move anytime soon versus temporarily stopped, e.g. at a stop sign. The radar tags in the video seem to support this interpretation, and all the vision tags in the few available frames are also consistent with it. A big component of 'stopped' versus 'stationary' probably comes down to whether the system sees the vehicle as occupying a traffic lane or not.
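To make that distinction concrete, here's one way such a three-way label could be assigned from a track's current speed, its motion history, and its lane assignment. This is purely my guess at the logic, not anything extracted from the binary.
Code:
# Speculative sketch of a moving / stopped / stationary split.
# "stopped"    = not moving now, but in a traffic lane or seen moving before
#                (e.g. halted at a stop sign, likely to move again)
# "stationary" = not moving and outside any lane (likely parked)

SPEED_EPS = 0.3  # m/s, assumed noise floor for "not moving"

def classify(speed_mps, ever_seen_moving, in_traffic_lane):
    if speed_mps > SPEED_EPS:
        return "moving"
    if in_traffic_lane or ever_seen_moving:
        return "stopped"
    return "stationary"

print(classify(0.0, ever_seen_moving=False, in_traffic_lane=False))  # stationary
print(classify(0.0, ever_seen_moving=True, in_traffic_lane=True))    # stopped
print(classify(12.0, ever_seen_moving=True, in_traffic_lane=True))   # moving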
 
The impression I'm getting from these kinds of revelations is that Tesla seems to have more of the groundwork in place for both collecting and utilizing more complicated data, and this isn't strictly a lane-following, single-lead-car-tracking driver-assist system anymore. That might be what it does best right now, but it sure seems like they've actually laid the foundation for extending this (and collecting the right kind of data to do so) into more of a driving-path-plus-obstacle-identification system.

The radar overlays make it pretty clear why Tesla doesn't act too much on radar alone. It seems fine when traveling straight on unoccupied roads, but man, when taking turns and driving in an edge lane, the radar sure picks up a ton of harmless objects like reflectors, and sometimes the circles for those don't seem any smaller than for an actual car.
 
This seems like a step back from the "FSD" demo video they made a couple of years ago.

Did they start over or something?
The demo was purely a proof of concept, seemingly built on reference code. Furthermore, if we believe that Redditor who claims to be an ex-engineer, the bounding boxes in the demo were drawn in post-processing by another neural net that's not onboard the car.

In other words, that was just a demo. And this is production firmware that's on 100,000+ customer cars, producing this data in real time.
 
@verygreen Is there any evidence of bounding boxes? How does Autopilot determine object size?

The camera inference neural networks on each of the 7 main Autopilot cameras (main, narrow, fisheye, 2x pillar, 2x repeater) all produce bounding boxes as one of their outputs. The raw radar return output includes a locus (a point near the center of the object), signal strength, distance, and relative velocity as its major components. The radar probably has some ability to estimate the cross-section size of whatever it's getting back from the target, but it's very likely that the bounding boxes are primarily, if not entirely, output from the vision neural network; the resolution is too high to plausibly be the product of the radar return.

The vision binary includes functions that seem to be correlating what the cameras see with what the radar is returning, and there's evidence of this in the log file that was used to generate the bounding box images that @verygreen is sharing with us. Those log files call out individual targets and ascribe a variety of properties to them, including some that are labeled with radar tags and some that are labeled with vision tags. Probably some of the parameters are mostly from radar (velocity and distance) with some vision contribution, other parameters will be primarily vision (bounding boxes) with some contribution from radar.
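As a toy illustration of that kind of correlation (a generic nearest-centre association, not the actual fusion in the vision binary), you can project the radar locus into the image with an assumed pinhole model and attach it to the closest bounding box. The intrinsics and the box centre here are made up for the example.
Code:
import math

# Toy camera intrinsics, made up for illustration; not any real
# Autopilot camera calibration.
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 400.0

def project(x_fwd, y_left, z_up):
    """Pinhole projection of a vehicle-frame point (forward, left, up) to pixels."""
    u = CX - FX * (y_left / x_fwd)
    v = CY - FY * (z_up / x_fwd)
    return u, v

def associate(radar_targets, boxes, max_px=80.0):
    """Greedy nearest-centre match between projected radar loci and vision boxes."""
    matches = []
    for t in radar_targets:
        u, v = project(t["x"], t["y"], t["z"])
        best = min(boxes, key=lambda b: math.hypot(u - b["cx"], v - b["cy"]),
                   default=None)
        if best and math.hypot(u - best["cx"], v - best["cy"]) < max_px:
            matches.append((t["id"], best["id"]))
    return matches

radar = [{"id": "r1", "x": 15.1, "y": -3.5, "z": 0.5}]  # radar locus (metres)
boxes = [{"id": "b1", "cx": 870.0, "cy": 365.0}]        # synthetic bbox centre (pixels)
print(associate(radar, boxes))  # [('r1', 'b1')]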
 
Thanks for the super detailed reply!
 
These are the threads that keep me coming back to TMC. Thank you very much for taking the time to dig into this!

I wish Tesla were more open about how the Autopilot system works, but I totally understand why they aren't.