FSD v12.x (end to end AI)

Mardak · Aug 28, 2023

EVNow said:
Do they just input the route to be taken and NN figures out everything else ? How about traffic lights

Based on what was said in the demo, it seems like there's no explicit control logic about lanes or signs or traffic lights; yet it's able to follow the navigation route by getting into appropriate lanes to make turns and respond to traffic lights.

We have not programmed in the concept of traffic lights. There's not "this is a red light," "this is a green light," and "this is the traffic light position." We have that in the normal stack, but we do not have that in V12. This is just video video training -- nothing but neural nets, and yet it knows which light applies to it.

Did anybody see V12 visualizations show any traffic lights or signs? I'm pretty sure 11.x would have shown the red lights and stop sign when first in line in these situations, but maybe they're small enough to get hidden by video compression:

If those visualization actually are gone in the V12 demo, then that could mean part of the 300k+ lines of code for control was in charge of determining when signals/signs were relevant to control and visualize. Presumably perception has been predicting all visible traffic lights and stop signs including those for cross traffic, and traditional control code has had mistakes especially for oddly angled situations, so neural networks could learn to do better.

It can read signs without ever being taught to read.

Presumably if control network has learned about how to behave for stop signs, it should be able to learn how to handle school zones, no turn on red, and even upcoming lane use control signs based on how they look and how people drive when seeing those signs.

powertoold · Aug 28, 2023

willow_hiller said:
I think you might have a fundamental misunderstanding of how neural networks generalize.

V12 is not a parrot. It's not recording exact movements and playing them back. For any given timeframe of video data, it's trained to predict the subsequent frames of video for successfully maneuvering the situation, and the controls necessary to achieve that position. If, in one frame, it detects a red-light-running vehicle about to intersect with its path, it will predict the controls necessary to avoid a collision. Even if it's never seen a case of a vehicle running a red-light. All it needs to have been trained on is data of drivers correctly judging trajectories and avoiding intersecting paths of other vehicles.

I hate to continue drawing parallels to LLMs, but the above is like saying "It takes me 2 days to write an essay, so ChatGPT cannot write an essay in less than 2 days."

Yes, I know what you mean about NN generalization.

I'm not saying V12 will parrot. However, based on what Elon has described, it is only trained on what Tesla deems are "good" driver videos. This is based on parameters they agreed upon in video curation.

V12 will generalize the pixel patterns associated with all the vehicle metadata (GPS, route destination, wheel / brake / accelerator ticks, gyro, etc. etc.) for these videos. Since the dataset matters so much (needs to be very clean and real-world), no simulation can be used (in my understanding). If no simulation can be used, then there's no way to manipulate the driver reaction time or other parameters of the videos. You can't create a different future for real-world videos.

dflowerz · Aug 28, 2023

I am getting more excited about version 12 as it seems to me that it's the way to go. Can't wait to try it out. I currently have version 11.4.6 and have no hope for that version. Seems like version 12 might do better at contextual situations using the AI model. There are a couple of urban streets I drive where no speed limits are posted but the car thinks the speed limit is 55 which is way too fast. Seems like 12 might not have this issue.

powertoold · Aug 28, 2023

Mardak said:
Presumably if control network has learned about how to behave for stop signs, it should be able to learn how to handle school zones, no turn on red, and even upcoming lane use control signs based on how they look and how people drive when seeing those signs.

V12 learns based on nuanced pixel flows from all cameras. It doesn't "look" for Stop signs, it simply generalizes all the nuanced pixel patterns when a Stop sign is approaching. Everything from the road geometry pixels, to pixels in the distant horizon, pixels from lead cars and in the distance, etc.

Basically, it takes all real-world pixels into account, forms a pattern of behavior based on them, and generalizes what it's learned from "good" driver pixel flows and control.

willow_hiller · Aug 28, 2023

powertoold said:
Yes, I know what you mean about NN generalization.

I'm not saying V12 will parrot. However, based on what Elon has described, it is only trained on what Tesla deems are "good" driver videos. This is based on parameters they agreed upon in video curation.

V12 will generalize the pixel patterns associated with all the vehicle metadata (GPS, route destination, wheel / brake / accelerator ticks, gyro, etc. etc.) for these videos. Since the dataset matters so much (needs to be very clean and real-world), no simulation can be used (in my understanding). If no simulation can be used, then there's no way to manipulate the driver reaction time or other parameters of the videos. You can't create a different future for real-world videos.

That may be the case for simulation. Unless they have a simulator that's sufficiently real. With enough processing power, you can do things like simulations of the photons as well. That's what ray tracing in modern video games are trying to approximate.

But on the generalization front, I'm thinking of it this way: v12 will, in theory, be capable of distilling everything that makes a good driver good from videos of good driving. And that compression of the good driving behavior can be applied to the neural network's entire understanding of the world, including the full 360 surround situation, and reaction speeds faster than the good drivers it learned from in the first place.

powertoold · Aug 28, 2023

willow_hiller said:
That may be the case for simulation. Unless they have a simulator that's sufficiently real. With enough processing power, you can do things like simulations of the photons as well. That's what ray tracing in modern video games are trying to approximate.

Based on what I understand about V12, I don't think simulation videos can be used in the training set.

This is because every pixel of real-world video is so important for something like V12. In the skewed traffic light case, there may be only a few pixels that are relevant, and everything about these pixels will matter to the NN, from the illumination, adjacent pixels, every pixel of the traffic light casing, etc.

This is not something you can perfectly replicate in simulation.

Because V12 needs to base its decision making on every nuance of pixels, there's no way to replicate it in my intuition.

powertoold · Aug 28, 2023

willow_hiller said:
And that compression of the good driving behavior can be applied to the neural network's entire understanding of the world, including the full 360 surround situation, and reaction speeds faster than the good drivers it learned from in the first place.

Do you have any examples of NNs being better than their human-derived training set?

willow_hiller · Aug 28, 2023

powertoold said:
Based on what I understand about V12, I don't think simulation videos can be used in the training set.

This is because every pixel of real-world video is so important for something like V12. In the skewed traffic light case, there may be only a few pixels that are relevant, and everything about these pixels will matter to the NN, from the illumination, adjacent pixels, every pixel of the traffic light casing, etc.

This is not something you can perfectly replicate in simulation.

Because V12 needs to base its decision making on every nuance of pixels, there's no way to replicate it in my intuition.

If that were true for Tesla, it would be true for any other company attempting end-to-end. First one that comes to mind for me is Comma AI's openpilot. They seem to be able to train their end-to-end network using a simulator.

Here's their blog discussing the end-to-end nature of the network: End-to-end lateral planning

We need to make sure our models learn where to drive and not just how to exploit simulator artifacts. To solve this, we could try to make a better simulator that doesn’t have obvious artifacts, but this is incredibly hard. Instead, we try to make the model blind to the artifacts of the simulator.

powertoold · Aug 28, 2023

willow_hiller said:
If that were true for Tesla, it would be true for any other company attempting end-to-end. First one that comes to mind for me is Comma AI's openpilot. They seem to be able to train their end-to-end network using a simulator.

Here's their blog discussing the end-to-end nature of the network: End-to-end lateral planning

It seems that article relates to comma trying out some hacks to make simulation use possible. The goal in that article isn't to propose a final solution, simply to explore a single test case.

willow_hiller · Aug 28, 2023

powertoold said:
Do you have any examples of NNs being better than their human-derived training set?

Yes, it's been fairly common for the last 5 years or so. Here are some random examples:

Object classification from 2015 (trained on human labels): Computers are now better than humans at recognising images

Finding cancer cells (trained on radiologist diagnoses): Using A.I. to Detect Breast Cancer That Doctors Miss

Detecting heart problems (trained on cardiologist diagnoses): AI more accurate at assessing heart health than humans, study reveals

powertoold · Aug 28, 2023

willow_hiller said:
Yes, it's been fairly common for the last 5 years or so. Here are some random examples:

Object classification from 2015 (trained on human labels): Computers are now better than humans at recognising images

Finding cancer cells (trained on radiologist diagnoses): Using A.I. to Detect Breast Cancer That Doctors Miss

Detecting heart problems (trained on cardiologist diagnoses): AI more accurate at assessing heart health than humans, study reveals

I think you misunderstand my question, "Do you have any examples of NNs being better than their human-derived training set?"

There are plenty of examples of NNs being better than an average / typical human. That's not what I'm asking though.

I mean is there an example of a NN trained on a human-derived dataset that is better than any human at that particular task. In our example, we're talking about reaction time / 360-degree awareness.

For example, V12 can definitely be 2x-10x *safer* than a typical human (based on the fact that it never gets tired, always drives based on "good" humans, etc.), but I don't think it can react faster than its training set.

willow_hiller · Aug 28, 2023

powertoold said:
For example, V12 can definitely be 2x-10x *safer* than a typical human (based on the fact that it never gets tired, always drives based on "good" humans, etc.), but I don't think it can react faster than its training set.

Again, you're assuming that v12 will only be capable of replicating maneuvers it's exactly seen in the training data. See a car on a collision course, respond to it 0.2 seconds later because that's when the human driver responded to similar situations.

But it's not playing back recordings of good drivers. It doesn't have the same hand-to-eye latency as human drivers, and there's no reason that a well-trained and generalized driving network would wait for 10 frames before responding to stimulus.

powertoold · Aug 28, 2023

willow_hiller said:
Again, you're assuming that v12 will only be capable of replicating maneuvers it's exactly seen in the training data. See a car on a collision course, respond to it 0.2 seconds later because that's when the human driver responded to similar situations.

But it's not playing back recordings of good drivers. It doesn't have the same hand-to-eye latency as human drivers, and there's no reason that a well-trained and generalized driving network would wait for 10 frames before responding to stimulus.

Yes, based on my understanding, V12 will be limited by its set of good driver training examples.

Potentially, it will be better than any single good driver at his best, but it won't be able to take full advantage of the 36hz 360 degree cameras.

Supcom · Aug 28, 2023

powertoold said:
I think you misunderstand my question, "Do you have any examples of NNs being better than their human-derived training set?"

There are plenty of examples of NNs being better than an average / typical human. That's not what I'm asking though.

I mean is there an example of a NN trained on a human-derived dataset that is better than any human at that particular task. In our example, we're talking about reaction time / 360-degree awareness.

For example, V12 can definitely be 2x-10x *safer* than a typical human (based on the fact that it never gets tired, always drives based on "good" humans, etc.), but I don't think it can react faster than its training set.

I think chess engines have surpassed even the very best humans. They were, at least originally, trained on actual human chess matches.

powertoold · Aug 28, 2023

Supcom said:
I think chess engines have surpassed even the very best humans. They were, at least originally, trained on actual human chess matches.

Yes, I knew someone would bring this up, but AlphaZero wasn't trained on a human-derived dataset, and chess is a closed system game where all the parameters are known.

Supcom · Aug 28, 2023

powertoold said:
Yes, I knew someone would bring this up, but AlphaZero wasn't trained on a human-derived dataset, and chess is a closed system game where all the parameters are known.

So, move the goalposts to get the answer you want. I'm fine with that. Have fun!

powertoold · Aug 28, 2023

Supcom said:
So, move the goalposts to get the answer you want. I'm fine with that. Have fun!

That's why I kept saying "human-derived" dataset, so people don't bring up examples of chess

Supcom · Aug 28, 2023

powertoold said:
That's why I kept saying "human-derived" dataset, so people don't bring up examples of chess

And then, like any good internet goalie, you added the 'closed system' disqualifier.

(moderator edit)

powertoold · Aug 28, 2023

Supcom said:
And then, like any good internet goalie, you added the 'closed system' disqualifier.

(moderator edit)

I'm not sure what you're getting at. I simply included that part as additional context for why chess isn't relevant to this discussion about AI that predicts based on real-world inputs.

I'd hate to provide full context for all of my thoughts. I assume people in the discussion are using the topic at hand and all its implications as context.

JB47394 · Aug 28, 2023

willow_hiller said:
Again, you're assuming that v12 will only be capable of replicating maneuvers it's exactly seen in the training data. See a car on a collision course, respond to it 0.2 seconds later because that's when the human driver responded to similar situations.

The assumption is that timing is just as important as any other parameter of the training data. What is it about the training process that would encourage the system to want to react faster than the data as presented? Why not crowd a lane divider more tightly on a turn? Why not accelerate away from a traffic light more quickly? The idea of departing from the training parameters seems non-intuitive.

FSD v12.x (end to end AI)

Active Member

Active Member

Member

Active Member

Well-Known Member

Active Member

Active Member

Well-Known Member

Active Member

Well-Known Member

Active Member

Well-Known Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Similar threads