Welcome to Tesla Motors Club

FSD Beta: Perception vs path planning/driving logic

I may be oversimplifying, but after a few weeks using FSD Beta I've come to the following conclusion: perception is great, path planning and driving logic not so much.

Perception is great at least based on the 'mind of the car' visualizations. Intersections, lane lines, stop signs/traffic signals, and other vehicles are represented accurately. I'm sure there are still some hard problems here - unusual objects in the driving path, accurately measuring the speed and distance of other vehicles, insufficient visibility at heavily occluded intersections, etc. But for the most part the visualizations are pretty damn accurate.

That said, FSD Beta still has some pretty serious issues, and it seems to me like these all relate to path planning and the hard-coded driving logic, aka how to navigate the space that is being perceived. Specifically:
  • Turns where the vehicle does not have guaranteed right of way (i.e. left/right turns at uncontrolled intersections, or right on red, or left without a green arrow). In these situations FSD is incredibly cautious to the point of being useless. It stops too far back, creeps way too slowly, and even when there is no cross traffic it can sit for far too long before making a decision, sometimes only making a move when the opportunity is gone.
  • Driving on unmarked roads. Not only does the car drift way too far into the middle of the road, but turning and braking can be incredibly jerky when navigating unmarked roads. Again, the drivable space is accurately displayed on the screen, yet the car jerks around violently and seems extremely unsure of how to proceed. Seems strange given how accurate the drivable path projection is.
  • Smooth completion of a left turn. This one definitely seems like it's path planning. Turns at small/simple intersections usually work well, but at larger intersections the car can turn too sharply and end up heading for the divider or the yellow line, and then have to swerve back to make it into the driving lane.
Anyway, maybe I'm stating the obvious - maybe it's well known that perception is easy and path planning is hard. Or maybe I'm wrong about how accurate/high confidence the perception system really is. Would love to hear from others who know more about this space than I do.
 
perception is great,

I don't know that we can say that, as long as it appears that FSD is braking for shadows. I certainly don't have another explanation for the unexplained slowdowns on straight roads with no traffic around.

I also think we can't really say that until we have perception that is successfully anticipating events & objects 500-1000 feet ahead, and smoothly responding as a human would.

There's also the issue of persistence - there are multiple videos out there on Twitter (Elon responded to one) showing the clear limitations of the persistence of perception. It's striking how many objects are missed or disappear when they are occluded (when it's very clear they must still exist). This has ramifications for how good the path planning can be - it's hard to respond smoothly to a world that is (potentially) constantly changing shape & size.
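For illustration, the kind of persistence being described - keeping an occluded object alive by dead-reckoning from its last known motion instead of letting it vanish - is often sketched as "track coasting". This is a hypothetical toy, not Tesla's actual code; the class, frame budget, and 1-D state are all invented:

```python
# Toy "track coasting" sketch: when a tracked object is occluded, keep
# predicting its position from its last estimated velocity for a bounded
# number of frames rather than dropping it immediately.

class Track:
    def __init__(self, pos, vel, max_coast_frames=30):
        self.pos = pos              # last estimated position (m, 1-D for simplicity)
        self.vel = vel              # last estimated velocity (m/s)
        self.coasted = 0            # consecutive frames with no detection
        self.max_coast = max_coast_frames
        self.alive = True

    def update(self, detection, dt=0.1):
        """detection is a measured position, or None while the object is occluded."""
        if not self.alive:
            return
        if detection is not None:
            self.vel = (detection - self.pos) / dt   # crude velocity estimate
            self.pos = detection
            self.coasted = 0
        else:
            # No measurement: dead-reckon from the last velocity.
            self.pos += self.vel * dt
            self.coasted += 1
            if self.coasted > self.max_coast:
                self.alive = False   # only give up after a sustained absence

# A car at 10 m moving 5 m/s is occluded for 10 frames (1 s):
t = Track(pos=10.0, vel=5.0)
for _ in range(10):
    t.update(None)
print(t.alive, t.pos)   # still alive, predicted ~15 m
```

With something like this, an object briefly hidden behind a truck stays in the world model, and the planner doesn't see the scene "change shape" every time a detection blinks out.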

How good is their perception of distance? I have no idea. Some people have suggested using brake lights to detect stopping traffic - this seems unnecessary for a driving system, since it must be able to measure distances very accurately anyway (though at extreme ranges there is certainly no downside to also detecting brake lights if you can). However, it clearly can't detect brake lights well (lots of false positives), and there's not a lot of information on how good the distance measurement actually is or how it degrades with range (error will obviously go way up as range increases - the question is how much). The evidence suggests it's not very good, though it's difficult to separate from path planning when just observing results.
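For what it's worth, there is a standard back-of-the-envelope model for how camera-based depth error scales with range. This uses the classic stereo formula; Tesla's learned monocular depth is a different mechanism, but the quadratic scaling intuition is similar. The camera numbers below are made up but plausible:

```python
# Illustrative stereo depth-error model: from z = f*B/d it follows that
# dz = (z**2 / (f*B)) * dd, i.e. for a fixed pixel-matching error the depth
# error grows with the SQUARE of the range.

def stereo_depth_error(z, focal_px=1000.0, baseline_m=0.3, disparity_err_px=0.25):
    """Approximate 1-sigma depth error (m) at range z (m).
    All camera parameters here are invented for illustration."""
    return (z ** 2) / (focal_px * baseline_m) * disparity_err_px

for z in (10, 50, 100):
    print(f"{z:>4} m -> ~{stereo_depth_error(z):.2f} m error")
```

So going from 10 m to 100 m multiplies the error by roughly 100x in this model - consistent with the observation that distant and oncoming traffic is where distance estimates would be shakiest.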

I don't know what balance of the clear & substantial limitations of FSD rests with path planning & driving logic vs. perception.
 
I had similar thoughts, but was wondering if small miscellaneous inaccuracies in perception might be disrupting path planning somehow, such that FSD works better in places where perception is working better thanks to lots of good training data, e.g. the SF Bay Area or other places where beta testers have been operating for a while.
 
I don't know that we can say that, as long as it appears that FSD is braking for shadows.
Great point - this is the one other major weakness with FSD Beta today. Phantom braking is worse than with regular AP.

there's not a lot of information on how good the distance measurement actually is and how it degrades with range
Well, one good sign is that it doesn't run into stopped traffic! I find that FSD Beta generally slows down gracefully when traffic ahead is slowed or stopped. As you say, it's more difficult at a distance, and presumably also for oncoming traffic?


I had similar thoughts, but was wondering if small miscellaneous inaccuracies in perception might be disrupting path planning somehow, such that FSD works better in places where perception is working better thanks to lots of good training data, e.g. the SF Bay Area or other places where beta testers have been operating for a while.
This is a great point as well. The visualization of the road looks relatively smooth, but who knows what kind of micro-fluctuations/changes in probability are occurring. For example, I've noticed that fallen leaves along the road near my house really make FSD Beta go haywire - almost like the drivable space it sees on the road is constantly shifting!
 
Great point - this is the one other major weakness with FSD Beta today. Phantom braking is worse than with regular AP.
Part of this may be false positive/negative tuning. With AP, erring on the side of not braking probably is "better". For FSD Beta, I imagine they may want to tune it so that it would rather brake or slow down unnecessarily than fail to respond. The former is an annoyance; the latter may mean a crash, especially given FSD Beta is designed for more complex road situations than AP.
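That tuning trade-off has a textbook form: pick the belief threshold for braking that minimizes expected cost, given how much worse a miss is than a false alarm. A minimal sketch - all cost numbers invented, and no claim this is how Tesla actually tunes it:

```python
# Bayes-optimal decision threshold: brake when P(obstacle) > threshold.
# Expected cost of braking at belief p:      (1 - p) * cost_false_alarm
# Expected cost of not braking at belief p:  p * cost_miss
# Braking wins when p > cfa / (cfa + cmiss).

def brake_threshold(cost_false_alarm, cost_miss):
    return cost_false_alarm / (cost_false_alarm + cost_miss)

# Highway AP: a surprise hard brake at 70 mph is itself risky, so costs are closer.
print(brake_threshold(cost_false_alarm=5, cost_miss=20))    # 0.2
# City FSD: a miss may mean a crash, so it brakes on much weaker evidence.
print(brake_threshold(cost_false_alarm=1, cost_miss=100))   # ~0.0099
```

A lower threshold means more phantom braking for the same perception quality - which matches the observation that FSD Beta phantom-brakes more than AP even if its perception is no worse.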
 
You can definitely add lane selection to the list of path planning/driving logic issues. On some roads it tries to stay in the rightmost lane, which is pretty absurd for city driving because that lane frequently becomes a turn lane or gets blocked by turning cars. Or it tries to get into a faster(?) lane for a second, just to find out it's not actually faster. In very dense traffic it would be better off just getting into the correct lane as early as possible and waiting it out. The lanes all move at about the same speed, and if they don't, chances are the lanes that move will lead you elsewhere and people won't let you merge.

Same thing for 2-lane roads turning onto 4-lane roads where the left and right lanes are turn lanes - it needs to know which lane it has to be in to get where it is going.
Especially at a green light, when the car could choose most of the lanes, from what I've seen it sticks to the rightmost option (if it's in the 2nd from the right, it'll go to the 2nd from the right) instead of going for a lane the navigation data knows about ("2nd from the left", etc.).
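The lane-selection behavior being described could in principle be fixed with a bit of route lookahead: score candidate lanes on whether they line up with the upcoming turn and whether they end soon, instead of greedily preferring the rightmost lane. A toy sketch - the lane attributes and weights are all invented:

```python
# Hypothetical lane-selection scoring with route lookahead.
# 'matches_goal' = the map/GPS says this lane continues along the route.

def pick_lane(lanes):
    """lanes: list of dicts with 'index', 'becomes_turn_only',
    'speed_mps', 'matches_goal'. Returns index of the best lane."""
    def score(lane):
        s = lane["speed_mps"]            # reward faster-moving lanes a bit
        if lane["becomes_turn_only"]:
            s -= 50                      # heavy penalty: lane ends soon
        if lane["matches_goal"]:
            s += 100                     # dominant term: route continuity
        return s
    return max(lanes, key=score)["index"]

lanes = [
    {"index": 0, "becomes_turn_only": False, "speed_mps": 12, "matches_goal": True},
    {"index": 1, "becomes_turn_only": False, "speed_mps": 13, "matches_goal": False},
    {"index": 2, "becomes_turn_only": True,  "speed_mps": 14, "matches_goal": False},
]
print(pick_lane(lanes))  # 0: the route-matching lane wins despite moving slowest
```

The point of weighting route continuity far above momentary speed is exactly the complaint above: a lane that's faster for a second but dead-ends into a turn lane is a bad trade.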

For example, I've noticed that fallen leaves along the road near my house really makes FSD Beta go haywire - almost like the drivable space it sees on the road is constantly shifting!
Oddly enough I have no issues with the leaves at all, and fall is in full swing here (all unmarked roads). However, the tree shadows early in the morning are fun with phantom braking.
 
Anyway, maybe I'm stating the obvious - maybe it's well known that perception is easy and path planning is hard.

Perception and Planning are different kinds of problems.

Perception is more objective and straightforward. The world is physical: position, classification, size and velocity are real quantities that can be measured; it is a matter of how accurately. And we have the tools to do perception. This does not mean that perception is easy - it takes a lot of hard work to do accurate perception of all the road features and objects. But we know HOW to do it. We can do object detection, classification, depth perception etc. really well now. There are of course still weird edge cases. For example, Waymo mentioned a case of a cyclist with a stop sign on his back: we can detect and classify the cyclist and the stop sign, but it is critical for the AV to understand that it should not obey that stop sign. Likewise, there was a Tesla video of a truck carrying traffic lights; the Tesla detected the traffic lights on the truck, but it is key that it does not try to stop for them. Understanding those edge cases can be tricky.

Planning is more subjective. It is not a real quantity that can be measured; it is a decision that depends on perception. If your perception is wrong, especially your drivable space, then your planning will be off - perception errors trickle down to planning. However, even with perfect perception, planning can still be wrong.

Some planning is easy and obvious, like driving straight on a well-marked interstate with no traffic, and we've seen AP and NOA handle that fine. But city driving can present very challenging path planning problems: a construction zone where you have to temporarily drive in a non-lane, or a busy intersection where there are cars all around you, a truck trying to make a left turn, a cyclist that suddenly cuts you off and a pedestrian that decides to jaywalk. Knowing when to turn, when to slow down and when to accelerate in a way that is safe, but also smooth and respectful to other drivers, can be tricky. And what makes path planning difficult in city driving is that other road users can be unpredictable: your AV can have a solid path plan and then need to change it in a split second if other road users change their path at the last moment. In busy city driving, an AV will have to consider hundreds of agent interactions and thousands of possible paths, according to Cruise (see screenshot below).

And unlike perception, there is no objective correct answer. It will depend on the situation, traffic, what other road users do, etc. It can even depend on road culture: road users in LA might expect you to behave differently than road users in Boston or NYC. For all these reasons, I consider Planning to be a more difficult problem to solve than Perception.

[Image: Cruise slide illustrating hundreds of agent interactions and thousands of candidate paths in busy city driving]
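At its core, the "thousands of possible paths" idea is sample-and-score: generate many candidate maneuvers, cost each one, pick the cheapest. A deliberately tiny version - one lateral dimension, invented cost weights, nothing like a production planner:

```python
# Toy candidate-path planner: sample lateral end offsets, cost each on
# collision risk, deviation from lane center, and steering effort,
# and return the cheapest candidate.

def plan(candidate_offsets, obstacles, lane_center=0.0):
    """candidate_offsets: lateral end offsets (m).
    obstacles: lateral positions of blocked space (m)."""
    def cost(offset):
        c = (offset - lane_center) ** 2          # stay near lane center
        c += 0.5 * abs(offset)                   # steering effort
        for obs in obstacles:
            if abs(offset - obs) < 1.5:          # within half a car width + margin
                c += 1000.0                      # effectively forbidden
        return c
    return min(candidate_offsets, key=cost)

offsets = [x / 10.0 for x in range(-30, 31)]     # -3.0 .. +3.0 m in 0.1 m steps
print(plan(offsets, obstacles=[]))               # 0.0: nothing in the way
print(plan(offsets, obstacles=[0.0]))            # detours at least 1.5 m around the obstacle
```

Even this toy shows why planning has no single "correct" answer: change the weights (comfort vs. progress vs. margin) and the chosen path changes, with nothing in the physical world to say which weighting was right.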
 
For example a construction zone where you have to temporarily drive in a non-lane or a busy intersection where there are cars all around you, a truck trying to make a left turn, a cyclist that suddenly cuts you off and a pedestrian that decides to jay walk.
I definitely agree that path planning gets exponentially more difficult in some of these situations, like this one you describe. I think what's surprising and a bit disappointing are more basic path planning issues, like driving on unmarked roads (as others have said, there may be perception issues at play here) or determining the correct arc for a left turn at a large intersection.
 
Underneath the stuff that’s visualized there’s a bunch of data used for planning that is not visualized, simplest example being stuff like the voxel pseudo-LiDAR stuff that was posted on Twitter not long ago, or whatever they use to determine right of way, or any NNs doing behavior predictions of other agents, etc. Even if the raw perception of the environment seems pretty straightforward and complete based on the static snapshot of the scene shown on screen at any given time, I suspect most of the common issues are caused by faulty perception of the intricate details. The planner (also not simple, as mentioned above) has to both make smart decisions and cover up for any shortcomings in perception by acting more cautiously, stopping short even when it doesn't need to, and so on.
 
I may be oversimplifying, but after a few weeks using FSD Beta I've come to the following conclusion: perception is great, path planning and driving logic not so much.
You make some good points, and overall I agree. But I think it's a bit more complex than that. From careful observation, I think it's a combination.

My reading is that the vision system supplies the best data it has, with the usual uncertainty measures. The path planner uses that to make a path plan/predictions etc. For example, to make a turn. When the car starts executing the turn, the vision system updates its model as the camera views update, changing (slightly) the probabilities and the computed positions of road markings etc. The path predictor updates its path, but then has to apply course correction to get the car back onto the new path. I think that is the source of most of the jerkiness people are seeing.

So it's really both systems interacting. As (say) the vision improves in accuracy, the changes to the paths will shrink, and as the path predictor gets better, its paths will be more stable. It also needs a path "corridor" (with bounded width) so that it can correct the path more gradually and remove the jerkiness we are all seeing.

A gross simplification I'm sure, but I think it's probably something like that.
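The "corridor"/gradual-correction idea in the post above can be sketched as a simple rate limit on how fast the executed target is allowed to chase a fresh replan. A hypothetical toy, with all numbers invented:

```python
# Rate-limited path following: move the executed lateral target toward the
# newly replanned target by at most max_step meters per control tick, so a
# small perception-driven replan doesn't yank the wheel all at once.

def smooth_target(current, replanned, max_step=0.05):
    delta = replanned - current
    delta = max(-max_step, min(max_step, delta))   # clamp per-tick correction
    return current + delta

# Perception noise makes the replanned target jump by 0.3 m in one frame:
target = 0.0
for _ in range(3):
    target = smooth_target(target, 0.3)
print(round(target, 2))  # 0.15: the correction is spread over several ticks
```

The trade-off is latency: the clamp that suppresses jitter also slows the response when the replan reflects something real, which is roughly the vision-vs-planner interaction described above.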
 