
Autonomous Car Progress

Alex makes an important point: Wayve did not just strap ChatGPT onto their autonomous driving stack and ask it to narrate what it sees. Wayve is training a single end-to-end AI model on vision, language, and action together. That is an important distinction.

I can see the application for solving the black box problem in e2e. By training the AI model on vision, language, and action together, it is able to explain what it is "thinking" and doing. So we can see the what and the why behind the e2e.
I wonder how often it "updates" the LLM portion.
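
For anyone who wants the idea in concrete terms, here is a minimal sketch (in PyTorch, with invented names and layer sizes, so emphatically not Wayve's actual architecture) of what "one model with an action head and a language head" means:

import torch.nn as nn

class VLADriver(nn.Module):
    # A toy vision-language-action model, for illustration only.
    def __init__(self, d_model=256, vocab_size=32000):
        super().__init__()
        # Camera frames (B, 3, H, W) -> one feature vector
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Shared trunk: both heads read the same representation, so the
        # commentary is grounded in the features that drive the car.
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.action_head = nn.Linear(d_model, 2)             # steer, accel
        self.language_head = nn.Linear(d_model, vocab_size)  # commentary token logits

    def forward(self, frames):
        x = self.vision_encoder(frames).unsqueeze(1)  # (B, 1, d_model)
        h = self.trunk(x)[:, 0]
        return self.action_head(h), self.language_head(h)

The point of the shared trunk is that the commentary is generated from the same internal state that produces the driving commands, rather than by a separate narrator bolted on afterward.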

It maintained "Keeping a steady speed, the road is clear" even when a pedestrian was running across the road around 7:10 (and continued accelerating).
 
First reported Zoox accident:

According to the CA DMV's crash reports, Zoox's purpose-built robotaxi had a collision on April 2 and sustained moderate damage.

"A Zoox vehicle in autonomy, traveling westbound on 3rd Avenue toward Foster City Blvd. in Foster City, drove over an in-lane utility valve box with a displaced valve cover. The Zoox vehicle sustained damage to the undercarriage. There were no reported injuries, and police were not called."

 
Does that mean that the valve cover stuck up enough to damage the undercarriage, but not high enough for the camera, radar, and lidar to see it? (I assume it has all three.) I guess it is more likely that all three saw it, but their perception stack failed to properly identify it?

Yeah, I assume the cameras and/or lidar and/or radar detected the valve cover sticking up, but the software was not trained to identify it as an obstacle and therefore ignored it.
 
Yeah, I assume the cameras and/or lidar and/or radar detected the valve cover sticking up, but the software was not trained to identify it as an obstacle and therefore ignored it.
Yeah, the data from the sensor(s) in this situation was different from the data from the sensor(s) in a nominal case. An example of sensors themselves not inherently providing better performance/redundancy.

Personally, I think the best thing Tesla could do is announce a new Tesla model that has cameras and will use V12 end-to-end, but also has surround radar and front lidar, as well as HD mapping, for extra redundancy.
 
Yeah, I assume the cameras and/or lidar and/or radar detected the valve cover sticking up, but the software was not trained to identify it as an obstacle and therefore ignored it.
I wonder if it is simpler than that. Maybe it is programmed with 5" of clearance, saw the object at 4.75", and decided it was OK to drive over it. But maybe the clearance figure isn't dynamic to account for suspension droop/compression from load/wear?
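
A toy comparison (purely illustrative, not Zoox's code) of how a static clearance constant can pass an object that a load-aware check would reject:

# Purely illustrative, not Zoox's code: a fixed clearance spec versus
# one that accounts for how compressed the suspension is right now.
NOMINAL_CLEARANCE_IN = 5.0
SAFETY_MARGIN_IN = 0.1

def can_drive_over_static(object_height_in):
    return object_height_in + SAFETY_MARGIN_IN < NOMINAL_CLEARANCE_IN

def can_drive_over_dynamic(object_height_in, suspension_compression_in):
    # Effective clearance shrinks with passenger load, braking dive, worn springs...
    effective = NOMINAL_CLEARANCE_IN - suspension_compression_in
    return object_height_in + SAFETY_MARGIN_IN < effective

print(can_drive_over_static(4.75))        # True: clears the spec on paper
print(can_drive_over_dynamic(4.75, 0.5))  # False: the margin is already gone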
 
Yeah, the data from the sensor(s) in this situation was different from the data from the sensor(s) in a nominal case. An example of sensors themselves not inherently providing better performance/redundancy.

Of course, a sensor does not automatically provide better performance; it is how you use the sensor that matters. But that does not mean we should dismiss the potential benefits of extra sensors either. For example, if you are driving at night and your cameras don't receive enough light, then your perception software will be handicapped. Adding lidar will give you better data in that scenario, which will definitely help. It is then up to you to use that better data correctly. But having the right data to start with is essential.
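
As a toy illustration of that point (not any real stack's logic), a fused estimate can keep an obstacle "alive" when the camera's confidence collapses at night but the lidar returns still register it:

# Toy illustration, not any real stack's logic: either sensor can carry
# the detection, so lidar rescues the low-light case.
def fused_obstacle_confidence(camera_conf, lidar_points_on_object, min_points=10):
    lidar_conf = min(1.0, lidar_points_on_object / min_points)
    return max(camera_conf, lidar_conf)

print(fused_obstacle_confidence(0.90, 40))  # daytime: both sensors see it
print(fused_obstacle_confidence(0.05, 40))  # night: lidar alone still flags it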
 
Of course, a sensor does not automatically provide better performance; it is how you use the sensor that matters. But that does not mean we should dismiss the potential benefits of extra sensors either. For example, if you are driving at night and your cameras don't receive enough light, then your perception software will be handicapped. Adding lidar will give you better data in that scenario, which will definitely help. It is then up to you to use that better data correctly. But having the right data to start with is essential.
If there were a situation where headlight illumination was insufficient for the object's albedo, and yet the lidar had sufficient return signal, and the lidar point cloud density could register the object, and the software could properly interpret the lidar, then the lidar would help in that case.
However, what does that do to all the other cases when vision was performing correctly yet lidar is saying something different?
(Same old discussion)
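
To make that concrete with the same toy rule sketched above: a "trust the stronger cue" fusion imports every lidar false positive (exhaust plume, road spray, a lofted plastic bag) even when vision correctly says the road is clear.

# Same toy rule, opposite failure mode: a spurious lidar return now
# triggers a phantom brake even though vision was right.
def should_brake(camera_conf, lidar_conf, threshold=0.5):
    return max(camera_conf, lidar_conf) > threshold

print(should_brake(camera_conf=0.02, lidar_conf=0.80))  # True: phantom brake

The arbitration policy, not the sensor count, decides whether the redundancy helps or hurts.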
 
I wonder if it is simpler than that. Maybe it is programmed with 5" of clearance, saw the object at 4.75", and decided it was OK to drive over it. But maybe the clearance figure isn't dynamic to account for suspension droop/compression from load/wear?
Or the lid was offset and tipped up as it was driven over. In that situation, the lid may have occluded the opening.
 
We don't need to complicate it. It drove over an open manhole cover, and the tire either went in or lifted a partially open lid, and the undercarriage got damaged.

Just a reminder of how bad it can be depending on how fast you are going.
[two GIFs]
 
It maintained "Keeping a steady speed, the road is clear" even when a pedestrian was running across the road around 7:10 (and continued accelerating).
It also says the road is clear at 5:03 despite a pedestrian crossing. It says "Reducing speed for the cyclist" at 0:19 when 1) it doesn't reduce speed and 2) the cyclist has moved safely behind some parked cars. And "Moving to right lane" at 0:53 when there's only one lane (other than the oncoming traffic lane, which it obviously doesn't move into, or we'd never see that part of the video).

I didn't watch it all, but IMHO this illustrates a problem with E2E and quasi-E2E -- you really have no idea what the model is doing. Even this language model output seems almost random at times. How do you ever do failure analysis?
 
I didn't watch it all, but IMHO this illustrates a problem with E2E and quasi-E2E -- you really have no idea what the model is doing.
I don't understand. The very purpose of this effort is to give people an idea of what the model is doing. The fact that it said it was doing something it wasn't tells us there is a bug, either in the text-generation system or in the driving system. If FSD explained what it was doing, we'd know right away why it was driving at a particular speed, or why it refused to change lanes, or why it does all the other odd things that it does.

Or are you observing that without an effort such as this, we don't know? If so, then it's good that we have things like this to help inform us.
 
Brad Templeton has a nice primer on the companies using state-of-the-art AI techniques like end-to-end networks and LLMs.

Here are a few highlights I pulled:

Classic approach:
A classic self-driving system is divided into modules, which are roughly layered. The core modules are perception (what's out there), prediction (where is everything going), planning (where will I go), and execution (pedals and wheel), with additional support for localization (where am I?) as well as mapping, user interface, HQ interface, remote operation and more. The lines between perception, prediction and planning have become blurred, most of all in what's known as an "end to end" neural network design.
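
To put the layering in code form, here is a stub sketch (every function and field name is hypothetical) showing the hand-offs between modules:

# Stub sketch of the classic layered pipeline; all names are hypothetical.
def localize(sensors, map_data):  # where am I?
    return {"x": 0.0, "y": 0.0, "heading": 0.0}

def perceive(sensors):  # what's out there?
    return [{"type": "car", "x": 20.0, "v": 5.0}]

def predict(objects):  # where is everything going?
    return [{**o, "x_in_2s": o["x"] + 2 * o["v"]} for o in objects]

def plan_path(pose, predictions, map_data):  # where will I go?
    return {"speed": 10.0, "steer": 0.0}

def execute(plan):  # pedals and wheel
    return ("throttle", plan["speed"]), ("wheel", plan["steer"])

def drive_one_tick(sensors, map_data):
    pose = localize(sensors, map_data)
    objects = perceive(sensors)
    predictions = predict(objects)
    plan = plan_path(pose, predictions, map_data)
    return execute(plan)

print(drive_one_tick(sensors={}, map_data={}))

Each hand-off here is inspectable, which is exactly what the end-to-end approach described next gives up.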

Tesla's end-to-end approach:
Tesla made waves by rewriting their driver assist system (now called Supervised Full Self Driving) to heavily use end-to-end networks. In an E2E system, there's very little traditional programming logic. Instead, data from sensors (particularly cameras in Tesla's case) go in and driving decisions come out. It is frightening to some that the programmers have only a limited idea of how the system makes decisions; they just know it does better. Most reviewers believe that Tesla's new SFSD outperforms the older one—though many reviewers don't realize just how far behind the other self-driving systems it remains, in spite of improvements.

Others:
There are other believers in E2E, however, to various degrees, including UK startup Wayve and Toronto startup Waabi, which both presented at Nvidia GTC. Open source ADAS tool "comma" has also long used it, and HYPR, the new startup from Zoox co-founder Tim Kentley-Klay, is also reported to use this approach. Writing the software is "easy" because you don't write much; what matters is getting the right training data, and lots of it, combined with lots of compute. Tesla has been planning a giant compute center called Dojo for this, but construction of it has been delayed, reportedly angering Elon Musk and resulting in some of the recent executive departures.

How data is used for end-to-end:
Your initial training data comes from recordings of humans (and robots) doing successful drives. You must remove or label any recordings of bad driving behavior or the system will learn it. (Tesla had to remove all recordings of people doing rolling stops, since NHTSA ordered them to not have their cars do this very common activity.) Most teams also add simulated drives to the training data, and this is the specialty at Waabi, which does most training in simulator. This includes adversarial training, where one AI tries to be as clever as possible in creating simulated scenarios which will make the car crash, so it can learn what not to do. This can allow the car to have experienced far more bad situations than any human could.
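
A toy version of that curation step (the log format is invented for illustration):

# Invented log format, for illustration: drop demonstrations that show
# the unwanted behavior, because imitation learning copies what it sees.
def is_rolling_stop(drive_log):
    # A stop-sign event where the recorded speed never actually reached zero.
    return any(e["sign"] == "stop" and e["min_speed_mps"] > 0.0
               for e in drive_log["events"])

def curate(drive_logs):
    return [log for log in drive_logs if not is_rolling_stop(log)]

logs = [
    {"id": 1, "events": [{"sign": "stop", "min_speed_mps": 0.0}]},  # kept
    {"id": 2, "events": [{"sign": "stop", "min_speed_mps": 1.8}]},  # dropped
]
print([log["id"] for log in curate(logs)])  # [1]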

Transformer LLMs for autonomous driving:
Driving may not seem like writing, but once perception has been done, what the sensors see can be turned into a string of tokens not too different from sentences. And so, an LLM that has been trained on tons of driving can get very good, and very human, at deciding what should come next in any situation. You can try this with your favorite AI, and you may see that even though all it did was read books about driving, it’s able to figure things out from very basic perception information.
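
A toy serialization makes the analogy concrete (the token format here is invented, not any company's actual scheme):

# Invented token format: perception output is flattened into discrete
# tokens so a transformer can predict "what comes next" the way it
# predicts the next word in a sentence.
def scene_to_tokens(objects, ego_speed_mps):
    tokens = [f"EGO_SPEED_{round(ego_speed_mps)}"]
    for o in objects:
        tokens.append(f"{o['type'].upper()}_AHEAD_{round(o['dist_m'])}M_{o['motion'].upper()}")
    return " ".join(tokens)

scene = [{"type": "pedestrian", "dist_m": 12.3, "motion": "crossing"},
         {"type": "car", "dist_m": 30.0, "motion": "stopped"}]
print(scene_to_tokens(scene, 8.9))
# EGO_SPEED_9 PEDESTRIAN_AHEAD_12M_CROSSING CAR_AHEAD_30M_STOPPED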

Nuro
Nuro, for example, which makes delivery vehicles, has both an AI planner and a traditional one, and they both make proposals for what the machine should do at any given moment. Another tool then picks which of the plans it thinks is best. Usually it’s the AI planner that makes the best and most human-like choice.
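
In sketch form, that arrangement might look like this (the planners and the scoring rule are stand-ins, not Nuro's actual system):

# Two planners propose, a separate scorer picks one; everything here is
# a stand-in for illustration, not Nuro's actual system.
def rule_based_planner(scene):
    return {"name": "rules", "speed": 7.0}

def learned_planner(scene):
    return {"name": "learned", "speed": 9.5}

def score(plan, scene):
    # Stand-in for a real evaluator (comfort, legality, collision checks).
    return -abs(plan["speed"] - scene["traffic_speed"])

def pick_plan(scene):
    proposals = [rule_based_planner(scene), learned_planner(scene)]
    return max(proposals, key=lambda p: score(p, scene))

print(pick_plan({"traffic_speed": 9.0})["name"])  # "learned" wins here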

Zoox
Zoox has been slower to fully absorb LLMs but is working on it as well, and while Waymo has made limited comment they are also believed to be doing so—after all, the transformer model that lies at the base of all LLMs was developed at Waymo’s sibling Google.

Black box problem
People are afraid of “black box” approaches which may make decisions for reasons unknown to their developers. If you encounter problems, you can “fix” them by adding more training designed to discourage the bad choices, but without the same certainty of traditional programming. I often ask people, “Would you prefer a car that crashes once in a million miles but you can’t explain why, though you can fix it, or a car which crashes twice in a million miles but you know just why it did?” I get both answers.

Wayve
UK developer Wayve has merged an actual text LLM with their E2E driving system. You can ask it at any time why it’s doing what it’s doing. They hope that will make people feel better, as well as help debug it. When it was stopped at a red light with some cars ahead of it, I asked it why it wasn’t going. It mentioned the red light, but not the more important forward traffic, which I felt was a serious error, since it would not drive into them just because the light went green. This approach thus needs more work but may help deal with the fear.

Conclusion
It's good news that so many different approaches are being worked on, from LLMs to classic imitation and reinforcement learning, to traditional robotics constraints that are better at rigidly following rules of the road. Tesla and Mobileye have the largest pools of data of human drivers and hope those will give them an edge in a world where the party with the most training data and compute wins. But there's a lot of data out there, and a lot of compute, when you consider that companies like Google, Amazon and Nvidia are still fighting in this game. While a number of companies have shut down in this race, including projects at major automakers, there are still many in the race, hoping to be the first to deploy this at scale.