
Looking for detailed commentary on Cruise, Waymo, Mobileye, Tesla approaches and technology

Has anyone come across a reasonably detailed comparison of the state of the self-driving tech of the major players (and others)? Something that's:

  • Detailed and technical (i.e., goes beyond "Waymo is geofenced" or vision vs. lidar). Broken down into subcomponents - perception, decision making, etc. (or whatever the right components are). Ideally put together by a technical person who's consumed all the public info out there, including AI Day, Cruise's "Under the Hood" event, Elon's interviews, Green & @rice_fry's insight into FSD Beta, etc.
  • Objective and not tribal
  • Current
  • Focused on the tech and approach as much as is known, not overly reliant on inferences based on public actions e.g., Cruise applying for a driverless permit
I'd love to watch a long interview or read a long article that gets into the details and tries to make some sense around what's similar vs. different across the approaches and in which elements each player may be ahead or behind the pack. I'm honestly not even sure this is possible with the publicly available info out there, but perhaps there's someone who's got their ear to the ground in the industry and has synthesized the info out there beyond what's easily available.

Perhaps put another way - who are the best people in this forum, Twitter, elsewhere to read/follow/watch on these topics?
 

I like to think I am pretty knowledgeable. I watch all the technical presentations from Tesla, Waymo, Cruise, Mobileye, etc.

Do you have a specific question that I could try to answer?
 
Two tribes exist despite the wish for a non-tribal comparison:

1) Non-lidar. Stick to pure vision. Fusion is too complex.

2) Fusion, including lidar.

Mobileye also believes in non-lidar, but only for L2 ADAS.
That is a tribal explanation. In short, Elon's view is:

HD maps + lidar are ugly and don't scale, so from first principles: “since humans don't have LiDAR, cars don't need LiDAR either.”

Getting back to OP’s question, I’d love to see one as well. Last time I searched I couldn’t find anything detailed and technical enough.
 
...who are the best people in this forum, Twitter, elsewhere to read/follow/watch on these topics?
I follow @diplomat33 because he is unbiased: even though he owns a Tesla, he's open-minded in seeking information about autonomous-vehicle progress regardless of the sources/companies/tribes.

It's great to follow Elon Musk, except that would violate your rule of "non-tribal" because he said lidar is “a fool’s errand”.

Mobileye doesn't trash Tesla Vision as “a fool’s errand”; they diplomatically say it's great for ADAS.
 
I like to think I am pretty knowledgeable. I watch all the technical presentations from Tesla, Waymo, Cruise, Mobileye, etc.

Do you have a specific question that I could try to answer?
Yes! You clearly are - I've seen a lot of your posts and wish there was some sort of wiki with all of them :).

What's your take on the Cruise Under the Hood video - anything new there? It all seems fairly sophisticated, but I would imagine Tesla is doing a lot of the same things in terms of predicting how different situations may play out and how various actors might interact with the car.

Elon in his interview with Lex mentioned the challenge of converting the images/video into vector space - do you think Cruise / Waymo are ahead on this dimension - i.e., better translating the environment into all of the different types of structured data needed to inform the decision making?
 
@SupersonicP3D Here is a summary of the 3 basic FSD approaches:

1) Pure vision (Tesla, Comma.AI): Collect images/video clips and use ML to train NNs to do all the driving tasks based solely on what the cameras see. No radar, no lidar, no HD maps. This approach holds that those are unnecessary, arguing that cameras provide all the data needed to solve autonomous driving; it is only a matter of the right data and the right training. This is based on the fact that cameras collect visual data that contains all the information you need about an environment (size, distance, color, etc.). After all, vision is how humans and animals navigate the world. The approach also argues that building HD maps adds cost and complexity and makes it difficult to scale. Tesla uses 8 cameras, but other companies like Mobileye use 12 because they include 4 parking cameras. Tesla uses NNs to stitch together all the camera views into a single coherent 3D view of the world around the car. Then, Tesla uses a combination of NNs and some hardcoded rules to decide what path to take based on perception. I believe Tesla does use navigation maps to help with navigation, but the maps are not HD. This approach has the advantage of being very cheap, and it can improve quickly with data. The main disadvantage is that it will only be as good as the training. The system may fail if presented with edge cases that it has not been trained on yet. Furthermore, since the car only has cameras, any situation that impairs the cameras, like rain, snow, fog or dirt, will likely reduce the system's reliability.

2) Sensor fusion (everybody else): This approach uses cameras, lidar, radar and HD maps. It argues that since cameras, radar and lidar have different strengths and weaknesses, using all three gives you the best of everything and compensates for any one sensor's weakness, so the system will be more reliable. Furthermore, HD maps can provide rich information about the static world and aid the car in making better decisions. The advantage of this approach tends to be better perception, which leads to the car being able to make better predictions and plans. The disadvantage is that it is more costly and more complex. Early sensor fusion is where you fuse all the data from cameras, lidar and radar first into a single 3D view. Then perception feeds into prediction (to predict what other road users will do), and prediction feeds into planning to decide what path to take. Waymo also uses HD maps and a router to figure out the general route to reach the destination. Planning takes into account the nav directions to stay on the correct general route to the intended destination, but also takes into account perception and prediction to adjust the path and avoid collisions. The planner also gets input from perception and/or the HD map about things like traffic lights, stop signs, construction zones and one-way streets, so it knows when to stop or change paths. This approach uses extensive ML and data for perception, prediction and planning.

3) Mobileye is taking a sort of "hybrid" approach. They are developing vision-only like Tesla, except with HD maps. So they are using vision-only + HD maps. But they are restricting their vision-only system to L2: it will be hands-free but require driver supervision. This is because they argue that while vision-only can drive the car pretty well, they don't believe the safety is good enough for driverless. So Mobileye's robotaxis will have cameras, lidar, radar and HD maps like Waymo. However, Mobileye's sensor fusion is different from Waymo's in a key way. Mobileye is planning a sort of late sensor fusion that their marketing has dubbed "true redundancy". The approach is for camera vision to build its own perception and for the radar/lidar to build its own separate perception. Then the car compares the planning from both subsystems and, based on its driving policy, determines the best path. Mobileye argues that this approach will be more reliable because it is less likely that both the camera and the radar/lidar will output the same bad path. If your camera vision makes a mistake, the radar/lidar will likely not make the same mistake, and vice versa. So the overall error rate of the entire system will be less than the error rate of vision-only.
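To make the contrast between approaches 2 and 3 concrete, here is a minimal, hypothetical sketch (all names and numbers are made up for illustration, not anyone's actual code) of the difference between pooling all sensor evidence into one perception output versus a "true redundancy" style where two independent subsystems each build their own world model and the policy keeps anything either one saw:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    position: Tuple[float, float]   # (x, y) in metres, ego frame
    confidence: float               # 0..1

def fuse_early(camera_dets: List[Detection],
               lidar_radar_dets: List[Detection]) -> List[Detection]:
    """Early-fusion caricature: pool all the evidence into a single perception
    output that then feeds one prediction/planning stack downstream."""
    return sorted(camera_dets + lidar_radar_dets,
                  key=lambda d: d.confidence, reverse=True)

def fuse_late(camera_dets: List[Detection],
              lidar_radar_dets: List[Detection],
              match_radius_m: float = 1.0) -> List[Detection]:
    """'True redundancy' caricature: each subsystem builds its own world model
    independently; anything either one reports is kept, so both stacks would
    have to miss an obstacle for the car to miss it."""
    merged = list(camera_dets)
    for det in lidar_radar_dets:
        already_seen = any(
            abs(det.position[0] - m.position[0]) < match_radius_m and
            abs(det.position[1] - m.position[1]) < match_radius_m
            for m in merged)
        if not already_seen:
            merged.append(det)
    return merged

if __name__ == "__main__":
    cameras = [Detection((12.0, 0.5), 0.90)]                 # vision saw one car
    lidar_radar = [Detection((12.2, 0.4), 0.95),             # the same car
                   Detection((30.0, -2.0), 0.70)]            # plus one vision missed
    print(len(fuse_late(cameras, lidar_radar)), "obstacles kept")   # -> 2
```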

I hope this helps.
 
What's your take on the Cruise Under the Hood video - anything new there? It all seems fairly sophisticated, but I would imagine Tesla is doing a lot of the same things in terms of predicting how different situations may play out and how various actors might interact with the car.

I thought the Cruise Under the Hood video was very impressive. Cruise does have very sophisticated prediction and planning. In particular, they showed off how their autonomous driving is programmed to handle uncertainty which is key for safe driving.

I could be wrong, but based on AI Day, I don't think Tesla's prediction is as advanced as Cruise's or Waymo's. Tesla seems more focused on perception. Tesla's vision does measure the position and velocity of other objects pretty accurately to determine their current trajectories. But Cruise and Waymo are doing a lot of ML on group behavior and prediction of intent, for example predicting pedestrian intent based on whether they are talking to someone or distracted by their phone, or whether a moped is actually pulling into traffic or just doing a 3-point turn to park. Waymo has ML that can predict paths up to 8 seconds into the future for hundreds of objects.
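As a rough reference point for the gap being described: measuring position and velocity gets you the trivial kinematic baseline sketched below, which just extrapolates each tracked object forward over the horizon. Intent-aware prediction of the kind Waymo and Cruise describe replaces this with learned models. This is a hedged toy illustration, not anyone's actual system:

```python
import numpy as np

def constant_velocity_rollout(positions: np.ndarray,
                              velocities: np.ndarray,
                              horizon_s: float = 8.0,
                              dt: float = 0.5) -> np.ndarray:
    """Extrapolate every tracked object's current position/velocity forward.
    Returns an array of shape (num_objects, num_steps, 2).  This is roughly
    what "position + velocity -> current trajectory" gives you; learned
    predictors layer intent (yielding, turning, jaywalking) on top of it."""
    steps = int(horizon_s / dt)
    t = np.arange(1, steps + 1)[None, :, None] * dt      # (1, steps, 1)
    return positions[:, None, :] + velocities[:, None, :] * t

# Two tracked objects: a car ahead moving forward, a pedestrian crossing.
pos = np.array([[20.0, 0.0], [8.0, 4.0]])
vel = np.array([[5.0, 0.0], [0.0, -1.2]])
print(constant_velocity_rollout(pos, vel).shape)   # (2, 16, 2)
```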

Elon in his interview with Lex mentioned the challenge of converting the images/video into vector space - do you think Cruise / Waymo are ahead on this dimension - i.e., better translating the environment into all of the different types of structured data needed to inform the decision making?

Yes, I think Waymo and Cruise are probably ahead in this area. To Tesla's credit, they have done amazing work converting images into 3D vector space. But Waymo and Cruise also convert camera, lidar and radar data into 3D vector space, and lidar and radar tend to be very accurate in this regard, so you can build a very accurate 3D vector space with them. I know Waymo is doing a lot of work on this. Anguelov recently talked about how they are working on the best way to represent the 3D vector space to help the car make the best decisions. Waymo has a NN called VectorNet which converts the HD map and the perception into vectors and polylines.
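For a sense of what "vectors and polylines" means in practice, here is a toy sketch of the kind of vectorized input encoding Waymo has described publicly for VectorNet. The encoding below is simplified and my own; only the general idea of turning map elements and agent tracks into per-segment vectors comes from their publications:

```python
import numpy as np

def polyline_to_vectors(points: np.ndarray, polyline_id: int) -> np.ndarray:
    """Turn an ordered set of points (a lane boundary, a crosswalk edge, or an
    agent's past track) into per-segment vectors [x_start, y_start, x_end,
    y_end, polyline_id].  A VectorNet-style model then runs a small graph net
    over each polyline and attention across polylines; here we only build the
    vectorized input representation."""
    starts, ends = points[:-1], points[1:]
    ids = np.full((len(starts), 1), float(polyline_id))
    return np.hstack([starts, ends, ids])

lane_centerline = np.array([[0.0, 0.0], [10.0, 0.1], [20.0, 0.4]])
agent_track = np.array([[5.0, -3.0], [6.0, -2.5], [7.0, -2.0]])
scene = np.vstack([polyline_to_vectors(lane_centerline, 0),
                   polyline_to_vectors(agent_track, 1)])
print(scene.shape)   # (4, 5): four vectors, each [xs, ys, xe, ye, id]
```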
 
@SupersonicP3D Here is a summary of the 3 basic FSD approaches: ...
This is definitely helpful - do you have a sense for what key problems Cruise/Waymo are still trying to solve? What are the big blockers - perception? Driving rules? Something else entirely?
 
Yes, I think Waymo and Cruise are probably ahead in this area. ... Waymo has a NN called VectorNet which converts the HD map and the perception into vectors and polylines.
This is interesting. I’m going to look up the Anguelov talk. I suppose the other piece of this is translating the 3D vector space into recognizable and relevant components (cars, pedestrians, obstacles, bumps, trees, etc.). I would imagine that’s *relatively* solved at this point by everyone.
 
You need to look at all the players with skepticism and not fall for marketing. Just as we are skeptical about Musk's claims that FSD will be better than a human in a year, we should not be gullible about Waymo's ability to scale.

What's your take on the Cruise Under the Hood video - anything new there? It all seems fairly sophisticated, but I would imagine Tesla is doing a lot of the same things in terms of predicting how different situations may play out and how various actors might interact with the car.
The problem with Cruise video or even Tesla AI day is - we don’t know what is current, what’s planned, what’s marketing.

Elon in his interview with Lex mentioned the challenge of converting the images/video into vector space - do you think Cruise / Waymo are ahead on this dimension - i.e., better translating the environment into all of the different types of structured data needed to inform the decision making?
It’s more difficult for Tesla because they use only cameras. Cruise, Waymo and others use LiDAR and HD maps which are inherently vector space oriented.

But the challenges they face are different - getting better at driving vs. scaling geographically.

PS: The other big thing people don't talk about is scope. Robotaxi players like Waymo just want to operate in big cities. They won't cover all the x million paved roads. They will probably not cover even the top metros in the US this decade. They will make HD maps of their service area and that's it. So their approach simply doesn't work for Tesla's scope.

Tesla's scope is all of the mapped world. But the question is whether FSD can get good enough for hands-free driving this decade with the cameras they have.
 
This is definitely helpful - do you have a sense for what key problems Cruise/Waymo are still trying to solve? What are the big blockers - perception? Driving rules? Something else entirely?
For me, the basic task to achieve is collision avoidance.

I think the technology to refine this to perfection has been here, but Tesla has not achieved that ability, especially avoiding stationary obstacles, just yet.

Waymo can't afford to have a fatality on its watch. That's its basic principle, unless someone proposes to sacrifice some humans for the sake of autonomous driving.

Thus, by switching from radar to radarless, Tesla hopes it will solve the collision problems so that NHTSA can leave it alone.

That seems basic, but the annual CA DMV disengagement report still shows that every single company had disengagements last year (Tesla says its system is L2, so there's no need to report).

Once that basic skill is achieved, the next step would be intelligence. It's nice that the Waymo didn't collide with a traffic cone that was inconsistent with its pre-mapping, but it didn't have the intelligence to figure out what to do next.

Once those 2 steps above are achieved then how to scale up:

Tesla is generalized so it should work anywhere and not be confined to a geofenced location.

Others will have to deal with the very big task of scaling up from one geofenced location to the rest.

Pre-mapping seems to be a big task, but if Google Maps could record the streets of the world for its own map, others can do it too. It only takes time and money, but it is achievable.

If the first basic task of collision avoidance can't be perfected with the methods being used right now, maybe additional sensors/transmitters could help out, such as V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), transmitters at signal lights/intersections, transmitters embedded in the roads and road shoulders, dedicated autonomous-driving-only roads...

That would cost much more because it deals with infrastructure. Maybe there will be a government that will be able to foot the bill for such infrastructure.

It might seem silly that I would have to carry a V2P transmitter so that cars can sense me and don't hit me, but that kind of system could have prevented the fatal accident in China where a car hit two traffic policemen.
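Just as a toy illustration of the V2V/V2P idea (every field name below is hypothetical; real V2X message sets such as SAE J2735 are far richer), the kind of payload a pedestrian's device might broadcast could look like this:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class V2PBeacon:
    """Hypothetical minimal payload a pedestrian's device might broadcast so
    nearby vehicles can sense road users their cameras cannot see."""
    sender_id: str
    latitude: float
    longitude: float
    speed_mps: float
    heading_deg: float
    timestamp_s: float

beacon = V2PBeacon("ped-1234", 37.7749, -122.4194, 1.3, 90.0, time.time())
print(json.dumps(asdict(beacon)))   # what a vehicle within radio range would receive
```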
 
This is definitely helpful - do you have a sense for what key problems Cruise/Waymo are still trying to solve? What are the big blockers - perception? Driving rules? Something else entirely?

Anguelov mentioned two technical problems Waymo is particularly focused on. One is the transition from perception to prediction: Waymo has a good way of representing perception, but they are looking for better ways to represent it in order to make prediction better and more efficient. The other problem is merging prediction and planning. He points out that prediction and planning are a closed loop. You need to predict the behavior of other road users in order to plan the right path, but the path you take will change the behavior of other road users, thus affecting your prediction.

More generally, I think the main problems are proving safety, and prediction and planning.

1) Safety
Waymo released a safety report on 6M autonomous miles in Chandler awhile back. The report showed that the Waymo was safer than humans in some areas but not necessarily safer in others. The Waymo does not get into single-vehicle accidents like humans do. Waymo does not drive distracted, tired or drunk. Waymo always follows the speed limit and traffic rules, so it does not cause accidents from speeding or running red lights or stop signs like humans sometimes do. Those are positive factors for safety. The negative factor for safety is that the Waymo is not always able to avoid accidents caused by other drivers when it could (this is a prediction/planning issue). The report showed that a lot of the accidents were caused by bad human drivers. Waymo got rear-ended a bunch of times. Some of the rear-endings were the fault of the human driver who was not paying attention. But in some of them the Waymo did not behave in a way that human drivers expect. For example, Waymo may sometimes slow down unexpectedly or stop when it is not sure what to do. There can be situations that are easy for human drivers but are confusing for autonomous cars. We have to remember that computers are basically "dumb": they simply follow rules, perhaps complex rules, but still rules. They can't actually "think" on their own.

The big question is, when do you know that your AV is safe enough? The AV might be really good in a lot of situations, but how do you know it can handle what you don't know about? I think this is the main reason Cruise and Waymo have not scaled yet. Yes, their AVs can handle a lot of situations, but can the AVs really be trusted in all situations? What about situations or edge cases the AVs have not seen yet? You will never see all edge cases, but when do you know that you've seen enough edge cases that your AV is "good enough"? One approach is to do a lot of driving so that you have a lot of safety data. But this approach requires billions of miles of real-world driving to get statistically meaningful safety data. Getting billions of miles of driving takes time and a large fleet. Plus, you have to do all those miles with a safety driver to make sure the AV is safe while testing. Another approach is to use simulations to test situations you haven't seen in the real world yet. But this requires a hyper-realistic simulation that is extremely close to the real world, not just physically but also in the behavior of other road users. You can also build safety cases where you test your AV in lots of safety-critical scenarios to make sure it can handle those cases. This is essential to make sure you've covered the essentials of safety, but it won't cover all edge cases. Another approach is to program your AV to be prudent and defensive when it is uncertain. That way, you don't need to test every single edge case (which is not possible), but you can be reasonably confident that when your AV encounters something new, it will at least behave in a cautious way that should probably be safe. Cruise talked about this in their Under the Hood presentation.
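One way to see where the "billions of miles" figure comes from: even a simple bound on a rare-failure process demands hundreds of millions of failure-free miles just to match the human fatality rate at 95% confidence, and comparing rates with real precision takes far more. This is a hedged back-of-the-envelope sketch, not how any company actually builds its safety case:

```python
import math

def failure_free_miles_needed(human_rate_per_mile: float,
                              confidence: float = 0.95) -> float:
    """Miles that must be driven with zero failures to claim, at the given
    confidence, that the AV's failure rate is no worse than the human rate
    (a rule-of-three style bound on a Poisson process).
    P(0 failures in m miles | rate r) = exp(-r * m), solved for m."""
    return -math.log(1.0 - confidence) / human_rate_per_mile

human_fatality_rate = 1.0 / 100_000_000   # roughly one fatality per 100M miles
print(f"{failure_free_miles_needed(human_fatality_rate):,.0f} miles")  # ~300 million
```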

2) Prediction/Planning
If humans always followed road rules and always behaved in logical, predictable ways, solving autonomous driving would be a lot easier. You would be able to predict behavior better and program good driving behavior for the AV. Unfortunately, as we all know, humans don't always follow road rules and don't always act in logical ways. Humans can be unpredictable on the road. How do you program the AV to handle situations where other road users might behave in unpredictable ways? Unpredictable behavior, like a pedestrian suddenly deciding to jaywalk, a car deciding to run a red light, or a motorcyclist deciding to split lanes, is what causes many accidents. An AV needs to be able to drive defensively and anticipate possible behaviors. Currently, companies like Waymo and Cruise have NNs that produce probabilities for different possible paths. So a model might predict a car has a 10% chance of going left, 20% of going straight, 40% of going right, etc. Based on those probabilities, the AV can pick a path that is prudent and will avoid a collision. The AV also needs good planning so that it drives in a smooth way. You don't want the AV to drive "robotically", as that confuses other human drivers. Also, robotaxis need to offer comfortable rides to their customers, so you don't want the AV to phantom brake or swerve when it gets surprised by the unpredictable behavior of other road users. There are also some situations, like construction zones where the normal lane lines don't apply, that can be confusing for AVs.
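To make that "pick a prudent path from predicted probabilities" step concrete, here is a toy expected-risk calculation. All maneuver names and numbers are invented for illustration; in a real planner the probabilities would come from learned models and geometry checks, and the choice would also weigh progress and comfort, not just risk:

```python
# Predicted probabilities for the other car's next maneuver (made-up numbers).
predicted_other_car = {"turn_left": 0.10, "go_straight": 0.20,
                       "turn_right": 0.40, "stay_stopped": 0.30}

# Assumed probability that each candidate ego plan conflicts with each of the
# other car's maneuvers; hand-written here purely for illustration.
conflict_prob = {
    "proceed":        {"turn_left": 0.60, "go_straight": 0.10, "turn_right": 0.80, "stay_stopped": 0.00},
    "creep_forward":  {"turn_left": 0.20, "go_straight": 0.05, "turn_right": 0.30, "stay_stopped": 0.00},
    "yield_and_wait": {"turn_left": 0.00, "go_straight": 0.00, "turn_right": 0.00, "stay_stopped": 0.00},
}

def expected_risk(plan: str) -> float:
    """Expected collision risk of a plan under the predicted maneuver distribution."""
    return sum(p * conflict_prob[plan][maneuver]
               for maneuver, p in predicted_other_car.items())

best_plan = min(conflict_prob, key=expected_risk)
print(best_plan, {plan: round(expected_risk(plan), 3) for plan in conflict_prob})
```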

Put simply, the challenge is making AVs smarter where they can drive in a smooth and defensive way, anticipate behavior of other road users and not get confused when road rules change.

This is interesting. I’m going to look up the Anguelov talk. I suppose the other piece of this is translating the 3D vector space into recognizable and relevant components (cars, pedestrians, obstacles, bumps, trees, etc.). I would imagine that’s *relatively* solved at this point by everyone.

Anguelov says that perception is mostly solved at this point. Of course, there are still some rare edge cases that need to be solved, but perception is not the main challenge. The main problem is making smart driving decisions based on what perception sees.

I would recommend these two videos with Anguelov.

The first video is a recent keynote. Anguelov gives an excellent overview of how the Waymo Driver works (perception, prediction and planning), shares some interesting edge cases and discusses current research:


The second video is an interview. Anguelov discusses some problems Waymo is working on and looks at some future research ideas:


By the way, thanks for starting this thread. I love people who are genuinely interested in autonomous driving and want to learn, and who don't just want to spout talking points. And it gives me an opportunity to share what I think I know, which I love to do. :)
 
The problem with Cruise video or even Tesla AI day is - we don’t know what is current, what’s planned, what’s marketing.

This is FUD. If you pay attention, it is obvious what is current, planned or marketing. For example, when Karpathy says something like "here is an example of ML that my team is currently working on" and he shows a NN output, I think we can know that it is current research. Likewise, when Anguelov shares data from a peer reviewed academic paper that Waymo submitted a few months back, we know it is current research. But if Tesla or Waymo say something like "in the future, we hope to improve ML in this area", we know that it is planned research. Or if Tesla or Waymo share some edited video to show off certain capabilities, we know the purpose is marketing. So it is pretty easy based on context to know what is what.
 
Don’t be gullible. Two years back you believed everything Elon said and got disillusioned when that didn’t happen. Now you believe all the marketing material Waymo, Cruise and Mobileye put out. Did you figure out that the Waymo and Cruise CEOs would soon be fired before it happened? Apparently, behind all the happy talk, things are obviously not going well.

I approach this in a different way. Like a good investor / peer reviewer would. Everything a company says is a claim that has passed through legal and PR. It has a lot of obfuscation - you have to see behind the marketing speak.

Of course my “take” on what is BS and what is not will be different from others’. That’s why there are buyers and sellers in the market, for example.

If you could just use a few keywords and figure this all out as you claim, bots could do it too.
 
@SupersonicP3D Here is an excellent deep dive from Mobileye on driving policy that details the challenges of driving policy, the pros and cons of different approaches, and how Mobileye is solving it, with lots of examples. It is very informative. I think you will see why I say that prediction/planning is the biggest challenge in autonomous driving.

 
Anguelov mentioned two technical problems Waymo is particularly focused on. ... I would recommend these two videos with Anguelov. ...
Thanks so much for this reply! This is exactly what I was looking for. Going through all of these things certainly takes time; luckily I have the rest of the long weekend to catch up on the videos. It’s helpful to understand how much of this is really about predicting actions and the response. I wonder how much of that goes back to perception. I can look at another driver in a car at a stop sign and communicate with them/read their gestures to know what they’re planning to do, while an autonomous car trying to follow rules can’t. There are other nuanced actions a driver may take that help another driver perceive their intentions that may be harder for a neural net to parse out from noise. All of the companies must reduce what they take in with their sensors to structured data, which necessarily means throwing out some data that may be signal, not noise, to a human. I suppose hence the need for capabilities like Dojo: if you decide you want to change the perception algos to add new vectors, you may need to completely retrain the models.
 