Looking for detailed commentary on Cruise, Waymo, Mobileye, Tesla approaches and technology

I wonder how much of that goes back to perception. I can look at another driver in a car at a stop sign and communicate with them/read their gestures to know what they're planning to do, while an autonomous car trying to follow rules can't. There are other nuanced cues a driver gives off that signal intent to another human but may be harder for a neural net to separate from noise. All of these companies must reduce what their sensors take in to structured data, which necessarily means throwing out some data that may be signal, not noise, to a human. I suppose that's part of the case for capabilities like Dojo: if you decide you want to change the perception algos to add new feature vectors, you may need to completely retrain the models.

Anguelov talked about this problem of what perception data to transmit to the Prediction part. You don't want to send so much perception data that a lot of "noise" ends up confusing Prediction. You also want to represent Perception in a simple way that reduces compute and makes it easier for Prediction to be reliable. But you don't want to accidentally eliminate good data that Prediction needs to make a reliable prediction.
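To make that tradeoff concrete, here is a minimal Python sketch of a compact Perception-to-Prediction hand-off. The field names and structure are my own assumptions, not any company's actual interface; the point is just that whatever the interface drops is permanently invisible to Prediction, no matter what the sensors saw.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AgentTrack:
    """Compact per-agent summary handed from Perception to Prediction."""
    agent_id: int
    object_class: str                 # e.g. "car", "pedestrian", "cyclist"
    position_m: Tuple[float, float]   # (x, y) in the ego frame, meters
    heading_rad: float                # orientation
    speed_mps: float                  # velocity magnitude
    turn_signal: str                  # "left" | "right" | "none" | "unknown"

def compress_for_prediction(raw_detection: dict) -> AgentTrack:
    # Everything NOT copied into the track (gaze direction, hand
    # gestures, subtle wheel angle...) is unavailable downstream.
    return AgentTrack(
        agent_id=raw_detection["id"],
        object_class=raw_detection["class"],
        position_m=(raw_detection["x"], raw_detection["y"]),
        heading_rad=raw_detection["heading"],
        speed_mps=raw_detection["speed"],
        turn_signal=raw_detection.get("turn_signal", "unknown"),
    )
```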

The basic perception data you need for other road users is object classification, position, orientation and velocity. With cameras, radar and lidar, getting that part is easy. But you need more. As you pointed out, you also need perception that can help you figure out intent. So turn signals are another important input. Body language and hand gestures, whether from the driver in the other car, a pedestrian or a cyclist, are also important to capture, since they can be clues about intent too. That is why companies like Waymo and Cruise have built perception that can recognize the body language and hand gestures of other road users.
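Here is a toy example (my own invention, not Waymo's or Cruise's actual logic) of why those intent cues matter downstream: the same kinematic state implies very different maneuver probabilities once a turn signal or a cyclist's arm signal is observed.

```python
def maneuver_priors(turn_signal: str, cyclist_arm_extended: bool) -> dict:
    """Rough probabilities over maneuvers for an observed agent."""
    # Default: kinematics alone say "probably going straight".
    priors = {"straight": 0.70, "turn_left": 0.15, "turn_right": 0.15}
    if turn_signal == "left" or cyclist_arm_extended:
        priors = {"straight": 0.20, "turn_left": 0.70, "turn_right": 0.10}
    elif turn_signal == "right":
        priors = {"straight": 0.20, "turn_left": 0.10, "turn_right": 0.70}
    return priors

# Same position/speed/heading, very different expectations:
print(maneuver_priors("none", False))   # mostly "straight"
print(maneuver_priors("left", False))   # likely left turn
```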

Perception is a critical foundation. All the data that the AV uses to drive comes from Perception, so if the AV has bad or incomplete perception, it will have problems driving correctly. But having the best perception does not guarantee good driving either, since the AV also needs good Prediction and Planning. Still, it starts with perception. I think that is one reason why companies like Waymo use cameras, radar and lidar and put so many sensors around the car: they want to make sure the AV is getting the best perception possible, 360 degrees around the car, and in all conditions (day/night, weather).
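Here is a bare-bones sketch of that modular pipeline in Python. The stage names follow the post; the function bodies are just placeholders, since the point is only the data flow and how errors compound from one stage to the next.

```python
def perceive(sensor_frames):
    """Fuse camera/radar/lidar frames into a list of agent tracks."""
    return [{"id": 1, "class": "car", "speed_mps": 4.2}]  # placeholder

def predict(tracks):
    """Map each track to a set of plausible future maneuvers."""
    return {t["id"]: ["straight", "turn_left"] for t in tracks}

def plan(predicted_futures):
    """Choose an ego action that is safe against all predicted futures."""
    return "proceed_slowly" if predicted_futures else "proceed"

def drive_one_tick(sensor_frames):
    # Errors compound left to right: Prediction can only be as good as
    # the tracks Perception hands it, and Planning only as good as the
    # futures Prediction hands it.
    return plan(predict(perceive(sensor_frames)))
```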
 
I found this blog that details how Tesla, Waymo, Cruise, Aurora, Nvidia and Waabi are doing Active Learning, Data Selection, Data Auto-Labeling, and Simulation in autonomous driving. It is long but well researched, and it is a good comparison of how these different companies approach this area of FSD.
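For anyone unfamiliar with the first of those topics, here is a generic, textbook-style sketch of an active-learning round (not any company's actual code): the idea is to spend the labeling budget on the examples the current model is least confident about, then retrain.

```python
import random

def model_confidence(example) -> float:
    """Stand-in for the current fleet model's confidence on one example."""
    return random.random()

def human_or_auto_label(example):
    """Stand-in for human annotation or an auto-labeling pipeline."""
    return "label"

def active_learning_round(unlabeled_pool, budget=100):
    # Rank the pool by confidence and label only the hardest cases;
    # the result is fed back into the next training run.
    scored = sorted(unlabeled_pool, key=model_confidence)
    to_label = scored[:budget]
    return [(x, human_or_auto_label(x)) for x in to_label]
```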