Seeing the world in Autopilot


verygreen

So I guess some of you have already seen this Electrek article:

A rare look at what Tesla Autopilot can see and interpret

The data it contains was obtained from my old unicorn snapshots, plus another source I got recently that includes a lot of more recent snapshots from an 18.10.4 car.

These videos below use only interpreted radar data (so it's not the raw radar output; there's some classification and, I'm sure, some culling too. In particular, notice how overhead signs are quickly discarded).
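For illustration only, here's a minimal sketch of how that kind of culling might work on an interpreted radar track list. The threshold and field names are my assumptions, not anything pulled from the firmware: a return that sits well above the road and never moves is probably a sign gantry or bridge rather than something in the driving path.
Code:
# Illustrative only, not Tesla's code. A naive way to drop overhead
# returns from an interpreted radar track list: a target that sits well
# above bumper height and shows no motion of its own is probably a sign
# or a bridge, not something in the driving path.

OVERHEAD_HEIGHT_M = 2.5  # assumed threshold in metres; purely a guess

def cull_overhead(tracks):
    """Keep only tracks that could plausibly be in-path obstacles."""
    kept = []
    for t in tracks:
        is_static = not t.get("moving", False)
        if is_static and t.get("z", 0.0) > OVERHEAD_HEIGHT_M:
            continue  # tall and stationary: likely an overhead sign, discard
        kept.append(t)
    return kept

tracks = [
    {"id": 1, "z": 0.4, "moving": True},   # car ahead
    {"id": 2, "z": 3.1, "moving": False},  # overhead sign
]
print(cull_overhead(tracks))  # only track 1 survives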



And separately, another set of snapshots from this other source included nice "autopilot detected objects" data, apparently used for debugging object depth.

The object description looks like this:
Code:
            {
                "vision_loc_x": "23.495266",
                "vision_loc_y": "-3.82337713",
                "vision_loc_z": "0",
                "depth": "15.9356184",
                "velocity": "4.39441586",
                "log_likelihood": "3.50788665",
                "rad_loc_x": "15.125",
                "rad_loc_y": "-3.5",
                "rad_loc_z": "0.5",
                "rad_vx": "-3.5625",
                "rad_vy": "-0.25",
                "prob_obstacle": "0.96875",
                "prob_existence": "0.96875",
                "moving": "true",
                "stopped": "false",
                "stationary": "false",
                "bbox_top_left_x": "413",
                "bbox_top_left_y": "207",
                "bbox_height": "59",
                "bbox_width": "89",
                "bbox_core_x": "457",
                "bbox_core_y": "236"
            }
And the objects are reported separately for each camera (currently only main and narrow, though the header includes every camera with a luminance level from each, so we now know they also use the other cameras at least for luminance checks).
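If you want to poke at one of these records yourself, here's a rough sketch that loads it and compares the vision depth against the straight-line range to the radar locus. The field names come from the dump above; treating rad_loc_* as metres in a vehicle-centred frame is my interpretation, not something confirmed from the binary.
Code:
import json
import math

record = """{
    "vision_loc_x": "23.495266", "vision_loc_y": "-3.82337713", "vision_loc_z": "0",
    "depth": "15.9356184", "velocity": "4.39441586",
    "rad_loc_x": "15.125", "rad_loc_y": "-3.5", "rad_loc_z": "0.5",
    "rad_vx": "-3.5625", "rad_vy": "-0.25",
    "prob_obstacle": "0.96875", "prob_existence": "0.96875",
    "moving": "true", "stopped": "false", "stationary": "false",
    "bbox_top_left_x": "413", "bbox_top_left_y": "207",
    "bbox_height": "59", "bbox_width": "89",
    "bbox_core_x": "457", "bbox_core_y": "236"
}"""

# Everything in the dump is a string, so convert booleans and numbers.
obj = {k: (v == "true" if v in ("true", "false") else float(v))
       for k, v in json.loads(record).items()}

# Straight-line range to the radar locus (assuming vehicle-centred metres).
radar_range = math.hypot(obj["rad_loc_x"], obj["rad_loc_y"])
print(f"vision depth: {obj['depth']:.1f} m, radar range: {radar_range:.1f} m")
print(f"obstacle prob: {obj['prob_obstacle']:.2f}, moving: {obj['moving']}")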

[Image: narrow_new_2.jpg]


This one was interesting in that there's a stopped truck, and it has only a 25% probability of being an obstacle.

There's still some work going on here by others diving into the data to also check the raw radar stream (also included) against the interpreted stream, so some more data might come out of these snapshots.

Hopefully we'll also get some even more recent snapshots.

And I wanted to again thank @DamianXVI for the awesome visualization tools!
 
Great work! And the 25% obstacle probability explains why the car isn't braking for these parked vehicles.

By chance, do you have a similar example of a car on the side of the road? If it has a much higher obstacle percentage, that would point to the vision NN as the discriminator and imply that more examples of trucks are needed.
 
Judging from how the radar tags constantly evolve, it seems likely that the evaluation tags for the vision system, and the combined evaluations, are changing in real time. That obstacle tag is very likely the system's evaluation at the instant the image was taken. As the distance declines and other aspects of the vehicle's situation become clearer, the obstacle probability will be updated. In this case the correct evaluation is likely zero, since the stationary vehicle is not in the Tesla vehicle's traffic lane, or even in the adjacent lane.

The NN post-processing code that I saw devoted considerable resources to identifying other vehicles and determining their relative motion, identifying the location of lanes and the lane configuration, and assigning vehicles to lanes. I saw labels for moving, stationary, and stopped but didn't know what they meant. @verygreen now provides us with some info that helps to understand those labels. "Moving" is self-explanatory, but the difference between stationary and stopped is probably whether the system thinks a vehicle is parked and unlikely to move anytime soon versus temporarily stopped, e.g. at a stop sign. The radar tags in the video seem to support this interpretation, and all the vision tags in the few available frames are also consistent with it. A big component of 'stopped' versus 'stationary' probably comes down to whether the system sees the vehicle as occupying a traffic lane or not.
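To make that distinction concrete, here's one way such a three-way label could be assigned from a track's current speed, its motion history, and its lane assignment. This is purely my guess at the logic, not anything extracted from the binary.
Code:
# Speculative sketch of a moving / stopped / stationary split.
# "stopped"    = not moving now, but in a traffic lane or seen moving before
#                (e.g. halted at a stop sign, likely to move again)
# "stationary" = not moving and outside any lane (likely parked)

SPEED_EPS = 0.3  # m/s, assumed noise floor for "not moving"

def classify(speed_mps, ever_seen_moving, in_traffic_lane):
    if speed_mps > SPEED_EPS:
        return "moving"
    if in_traffic_lane or ever_seen_moving:
        return "stopped"
    return "stationary"

print(classify(0.0, ever_seen_moving=False, in_traffic_lane=False))  # stationary
print(classify(0.0, ever_seen_moving=True, in_traffic_lane=True))    # stopped
print(classify(12.0, ever_seen_moving=True, in_traffic_lane=True))   # moving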
 
The impression I'm getting from these kinds of revelations is that Tesla seems to have more of the groundwork in place for both collecting and utilizing more complicated data, and this isn't strictly a lane-following, single-lead-car-tracking driver-assist system anymore. That might be what it does best right now, but it sure seems like they've actually laid the foundation for extending this (and collecting the right kind of data to do so) into more of a driving-path-plus-obstacle-identification system.

The radar overlays make it pretty clear why Tesla doesn't act too much on radar alone. It seems fine when traveling straight on unoccupied roads, but man, when taking turns and driving in an edge lane, the radar sure picks up a ton of harmless objects like reflectors, and sometimes the circles for those don't seem any smaller than for an actual car.
 
This seems like a step back from the "FSD" demo video they made a couple of years ago.

Did they start over or something?
The demo was purely a proof of concept, seemingly built on reference code. Furthermore, if we believe that Redditor who claims to be an ex-engineer, the bounding boxes in the demo were drawn in post-processing by another neural net that's not onboard the car.

In other words, that was just a demo. And this is production firmware that's on 100,000+ customer cars, producing this data in real time.
 
@verygreen Is there any evidence of bounding boxes? How does Autopilot determine object size?

The camera inference neural networks on each of the 7 main Autopilot cameras (main, narrow, fisheye, 2x pillar, 2x repeater) all produce bounding boxes as one of their outputs. The raw radar return output includes a locus (a point near the center of the object), signal strength, distance, and relative velocity as its major components. The radar probably has some ability to estimate the cross-section size of whatever it's getting back from the target, but it's very likely that the bounding boxes are primarily, if not entirely, output from the vision neural network; the resolution is too high to plausibly be the product of the radar return.

The vision binary includes functions that seem to be correlating what the cameras see with what the radar is returning, and there's evidence of this in the log file that was used to generate the bounding box images that @verygreen is sharing with us. Those log files call out individual targets and ascribe a variety of properties to them, including some that are labeled with radar tags and some that are labeled with vision tags. Probably some of the parameters are mostly from radar (velocity and distance) with some vision contribution, other parameters will be primarily vision (bounding boxes) with some contribution from radar.
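As a toy illustration of that kind of correlation (a generic nearest-centre association, not the actual fusion in the vision binary), you can project the radar locus into the image with an assumed pinhole model and attach it to the closest bounding box. The intrinsics and the box centre here are made up for the example.
Code:
import math

# Toy camera intrinsics, made up for illustration; not any real
# Autopilot camera calibration.
FX, FY, CX, CY = 1000.0, 1000.0, 640.0, 400.0

def project(x_fwd, y_left, z_up):
    """Pinhole projection of a vehicle-frame point (forward, left, up) to pixels."""
    u = CX - FX * (y_left / x_fwd)
    v = CY - FY * (z_up / x_fwd)
    return u, v

def associate(radar_targets, boxes, max_px=80.0):
    """Greedy nearest-centre match between projected radar loci and vision boxes."""
    matches = []
    for t in radar_targets:
        u, v = project(t["x"], t["y"], t["z"])
        best = min(boxes, key=lambda b: math.hypot(u - b["cx"], v - b["cy"]),
                   default=None)
        if best and math.hypot(u - best["cx"], v - best["cy"]) < max_px:
            matches.append((t["id"], best["id"]))
    return matches

radar = [{"id": "r1", "x": 15.1, "y": -3.5, "z": 0.5}]  # radar locus (metres)
boxes = [{"id": "b1", "cx": 870.0, "cy": 365.0}]        # synthetic bbox centre (pixels)
print(associate(radar, boxes))  # [('r1', 'b1')]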
 
Thanks for the super detailed reply!
 
These are the threads that keep me coming back to TMC. Thank you very much for taking the time to dig into this!

I wish Tesla were more open about how the Autopilot system works, but I totally understand why they aren't.