Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.
How could that help? If the siren is heard but the ambulance is never seen, was there really an emergency?

Yes, could be. The emergency vehicle could be heading towards an intersection with the AV but the AV cannot see the emergency vehicle yet because of occlusions. In fact, there are times when I have heard the siren of a first responder vehicle before I could see it, and the siren got louder, so I knew it was getting closer. The sound gave me an important clue to slow down and prepare to pull over to let the first responder pass.
 
To sum up this conversation: The best sensor is the sensor that exists and adds value to the use case. The worst sensor is no sensor.

Also, Elon's Law of sensors: Does the sensor add cost to the BoM and is not absolutely required to ship my L2 branded as L5? Remove it.
 
No one should be responding to sirens alone; visual cues are more important for avoiding emergency vehicles, noticing other cars pulling over around you, etc.

I doubt anyone gets pulled over for making some effort to not block emergency vehicles.

The more difficult task is knowing when to avoid an emergency scene vs. drive through it.
 
Yup, those examples are easy to react to with visual input. I wonder if those vehicles use sound input as diplomat suggests; that would be good.
Yeah, poor examples, sorry. I have seen one where it definitely uses sound to wait at an intersection.

Here's some additional info:
 
Sure there is. It would work something like this:

1. Look at a frame from the cameras. Tentatively identify cars and other objects that might be moving.
2. Check for positional changes in subsequent frames. Spawn a new thread to track each object that is determined to be in motion.
3. In each "moving object" thread, continually update the position and predicted path of the tracked object. Extrapolate to predict possible intersections with ego and/or ego's predicted path.
4. Terminate threads whose objects have stopped moving or cannot intersect ego's path.
5. Rinse and repeat.

You get the idea. Predicting the positions of dozens of vehicles with C code is trivial compared to, say, predicting the positions of millions of particles in fluid dynamics simulations. That's not to say that NNs aren't a better solution, but procedural and object-oriented programming have handled problems more complex than this for many decades.
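For what it's worth, the numbered steps above can be sketched in a few dozen lines. This is a toy single-threaded sketch (Python rather than C for brevity, with a constant-velocity model standing in for real perception, and naive one-to-one association in place of real data association; every name and threshold here is illustrative, not anything an actual AV stack uses):

```python
from dataclasses import dataclass

@dataclass
class Track:
    """One 'moving object' thread from the steps above, reduced to a record."""
    x: float
    y: float
    vx: float
    vy: float

    def predict(self, dt: float) -> tuple[float, float]:
        # Step 3: extrapolate the position dt seconds ahead (constant velocity)
        return self.x + self.vx * dt, self.y + self.vy * dt

def may_intersect_ego(track: Track, horizon: float = 5.0, radius: float = 2.0) -> bool:
    # Steps 3-4: does the extrapolated path pass within `radius` metres of ego
    # (the origin) at any half-second step inside the time horizon?
    steps = int(horizon / 0.5)
    for i in range(1, steps + 1):
        px, py = track.predict(0.5 * i)
        if px * px + py * py < radius * radius:
            return True
    return False

def update_tracks(tracks, detections, dt):
    """Steps 2, 4, 5: re-observe each track, update its velocity estimate,
    and terminate tracks whose objects have effectively stopped moving."""
    live = []
    for tr, (nx, ny) in zip(tracks, detections):  # naive one-to-one association
        tr.vx, tr.vy = (nx - tr.x) / dt, (ny - tr.y) / dt
        tr.x, tr.y = nx, ny
        if abs(tr.vx) + abs(tr.vy) > 0.1:         # step 4: prune stationary objects
            live.append(tr)
    return live
```

A car 20 m ahead closing at 10 m/s flags as a possible intersection; one driving away never does. The real difficulty, as the reply below this points out, is in the perception that feeds this loop, not the bookkeeping.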
Uh... If the car is moving, every object in the images moves from frame to frame, except for those moving directly toward or away from ego, which instead only bloom or shrink.

This frame-to-frame parallax shift gives solid distance clues, so it is not a bad thing, but your C code just got way more complicated. But, hey, a 3-gram hummingbird brain does this, corrects for winds, and adjusts flapping, all while flying at speed through 3D trees and landing on a twig. Oh, right, that is a NN, not C code.
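The distance clue mentioned above falls out of a one-line formula in the ideal case: under a pinhole model, a stationary off-axis point at image position u flows at du/dt = u*v/Z when ego moves forward at speed v, so Z = u*v/(du/dt). Points dead ahead (u = 0) produce no lateral flow, which is exactly the "bloom or shrink" case. A toy sanity check with made-up numbers:

```python
def depth_from_flow(u: float, flow: float, v: float) -> float:
    """Depth Z (m) of a static off-axis point from its image position u (px),
    flow rate du/dt (px/s), and ego forward speed v (m/s).
    Pinhole model: u = f*X/Z, so du/dt = u*v/Z  =>  Z = u*v/(du/dt)."""
    return u * v / flow

# Forward simulation to sanity-check the formula (all numbers invented):
f = 1000.0              # focal length, pixels
X, Z = 2.0, 50.0        # point 2 m to the side, 50 m ahead
v, dt = 20.0, 0.1       # ego at 20 m/s, one frame interval of 0.1 s
u0 = f * X / Z          # image position now: 40 px
u1 = f * X / (Z - v * dt)   # image position one frame later
flow = (u1 - u0) / dt
z_est = depth_from_flow(u0, flow, v)  # recovers the depth at the end of the frame
```

The finite-difference flow recovers the depth at the end of the interval (48 m rather than 50 m here), which is the one-frame bias a real implementation would account for; noisy flow estimates and a rotating, vibrating camera are what make the real problem hard.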

Evolution develops these things by killing off the ones that make mistakes, which explains why Tesla's NNs are trained in simulations, at great expense of compute power and time.
 
No one should be responding to sirens alone; visual cues are more important for avoiding emergency vehicles, noticing other cars pulling over around you, etc.

I doubt anyone gets pulled over for making some effort to not block emergency vehicles.

The more difficult task is knowing when to avoid an emergency scene vs. drive through it.
Sirens are a workaround for the limitation that human drivers can only look in one place at a time and get distracted. I'm not sure what useful action an AV would take in response to the sound of a siren.
 
I'm not sure what useful action an AV would take in response to the sound of a siren.
Not enter an intersection? I was first at a stoplight and heard a siren but could not immediately see where it was coming from. The light turned green but nobody moved. Fifteen seconds later the fire truck crossed in front of me. If I had moved, I would have impeded its progress.

But there are apparently different local practices. I remember being in NYC with wall-to-wall cars and an ambulance with full lights/siren next to me crawling as I was. Nobody was moving as there was no place to move to. Pretty sad.
 
Sirens are a workaround for the limitation that human drivers can only look in one place at a time and get distracted. I'm not sure what useful action an AV would take in response to the sound of a siren.
You can hear a siren before you can see the vehicle.

I always find these types of arguments a bit silly. Just because we allow something does not mean it is the best approach. I would argue that non-deaf people probably handle emergency vehicles more reliably than deaf people because they can hear the sirens. And we want AVs to be as reliable as possible. If adding microphones helps AVs detect emergency vehicles sooner, that would be a good thing.
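To illustrate why a microphone is cheap signal: a siren's wail concentrates its energy in a narrow band (very roughly 500-1500 Hz), so even a naive spectral-energy check can flag "siren-like" audio before any camera has line of sight. A toy sketch with synthetic audio; the band limits, sample rate, and thresholds are invented for illustration and have nothing to do with any real AV stack:

```python
import math
import cmath

def band_ratio(samples, rate, lo=500.0, hi=1500.0):
    """Fraction of spectral energy falling between lo and hi Hz.
    Naive O(n^2) DFT, fine for a toy; a real system would use an FFT."""
    n = len(samples)
    spec = []
    for k in range(n // 2):  # positive frequencies only
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spec.append(abs(coeff) ** 2)
    band = sum(e for k, e in enumerate(spec) if lo <= k * rate / n <= hi)
    total = sum(spec) or 1.0
    return band / total

rate = 4000                  # samples/s (2 kHz Nyquist covers the band)
n = 400                      # 0.1 s of audio
# Synthetic "wail": linear chirp sweeping roughly 700 -> 1300 Hz
siren = [math.sin(2 * math.pi * (700 + 3000 * (t / rate)) * (t / rate))
         for t in range(n)]
# Synthetic road rumble: a low 120 Hz tone, well below the siren band
rumble = [math.sin(2 * math.pi * 120 * t / rate) for t in range(n)]
```

Here `band_ratio(siren, rate)` comes out high and `band_ratio(rumble, rate)` near zero. Real detection would of course need to reject horns, music, and reflections, and say nothing about the siren's direction, which is why it complements rather than replaces vision.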
I think there's a disconnect in the discussion: are we looking at what's required, or what's best? Required is pretty clear-cut; best is open to interpretation.
 
Not sure what you mean. The end-to-end stack takes input from the cameras; it does not take input from the autonomous mode. And the end-to-end stack is trained on video to perform certain tasks: handle a roundabout, a lane change, an unprotected turn, a four-way stop, etc. So I am not sure how you would tell the end-to-end network that you just want AP mode rather than FSD mode, if that is what you are suggesting.
That's not quite true. First, I am still not convinced v12 is one single end-to-end NN; it is most likely a cascade of NNs. So the driving-decision NN takes a variety of inputs beyond the output of the occupancy network (built from the cameras). This includes inputs from speed sensors, accelerometers, mapping data, navigation data, and the like. It also includes user settings, such as the speed offset, the selected FSD driving "profile," "Exit Passing Lane," "Require Lane Change Confirmation," and the like. Are you suggesting that each of these settings requires a different FSD stack? Just make the autonomous mode (Autopilot vs. NoA/Autosteer on City Streets) an additional setting. One stack to rule them all.
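One way to read "one stack to rule them all": user settings are just extra elements in the conditioning vector the planning network consumes, so a mode flag is no different in kind from the speed offset. A toy sketch of that idea; the setting names, normalization, and structure are pure illustration and claim nothing about Tesla's actual architecture:

```python
from dataclasses import dataclass

@dataclass
class DriverSettings:
    speed_offset_kph: float
    exit_passing_lane: bool
    require_lane_change_confirm: bool
    mode: str                      # "autopilot" or "fsd": just another input

def settings_to_features(s: DriverSettings) -> list[float]:
    """Flatten user settings into the conditioning vector appended to the
    perception features before they reach the planning network."""
    return [
        s.speed_offset_kph / 20.0,             # crude normalization
        1.0 if s.exit_passing_lane else 0.0,
        1.0 if s.require_lane_change_confirm else 0.0,
        1.0 if s.mode == "fsd" else 0.0,       # mode becomes one more scalar
    ]

def planner_input(perception_features: list[float], s: DriverSettings) -> list[float]:
    # One network, one input vector: perception and settings concatenated
    return perception_features + settings_to_features(s)
```

Under this framing, changing the mode changes an input value, not the stack, which is the point being argued above.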
 