Can no longer be "super human" because it only makes use of videos from "good" drivers. It is only "super human" in that it can potentially drive like a "good" human all the time.
Do you think FSD Beta 11.x does super-human behaviors sometimes? If so, do some humans do those same things sometimes too? Both these sets of driving examples could be used as end-to-end training data if Tesla curates the examples appropriately with evaluation metrics of what clips or even parts of clips to use.

Similarly, sometimes humans are extra observant and/or more intuitive when driving to be proactive in anticipating issues and avoiding trouble. Even if these are rare cases, if there is actual signal from training inputs of 360º cameras with better vantage, the neural networks can learn to do the same thing when appropriate.

Even more powerful is combining multiple behaviors: being "just" good at each of tracking the upcoming traffic light, anticipating adjacent lane movement, watching for pedestrians and cross traffic, and knowing how the road layout will change, but doing all of those at the same time, every time, when making a lane change is probably super human.
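
Going back to curating examples with evaluation metrics, here's a toy sketch of what scoring clips and keeping only segments worth imitating could look like. The metric names and thresholds are entirely made up for illustration; this isn't Tesla's actual pipeline.

```python
# Hypothetical sketch of clip curation: score recorded driving segments and
# keep only the portions that look like "good" driving worth imitating.
# Metric names and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class Segment:
    clip_id: str
    start_s: float          # segment start time within the clip, seconds
    end_s: float            # segment end time, seconds
    max_jerk: float         # smoothness proxy (m/s^3)
    min_gap_s: float        # smallest time gap kept to other road users
    disengaged: bool        # did the human take over / intervene?

def keep_segment(seg: Segment) -> bool:
    """Return True if the segment is worth using as end-to-end training data."""
    if seg.disengaged:
        return False             # an intervention suggests something went wrong
    if seg.max_jerk > 3.0:       # harsh braking/steering -> not a "good" example
        return False
    if seg.min_gap_s < 1.0:      # tailgating -> not a behavior to imitate
        return False
    return True

def curate(segments: list[Segment]) -> list[Segment]:
    return [s for s in segments if keep_segment(s)]

segments = [
    Segment("clip_001", 0.0, 8.0, max_jerk=1.2, min_gap_s=2.1, disengaged=False),
    Segment("clip_001", 8.0, 12.0, max_jerk=4.5, min_gap_s=0.8, disengaged=False),
]
print([s.start_s for s in curate(segments)])   # only the smooth, well-spaced segment survives
```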
 
Has it been explained how you get E2E to allow increased MPH? Elon's v12 demo was set to 85mph and seemed to drive at or below the local speed limit. It's not obvious to me that a NN taught only by video has the ability to set a desired speed?
If the end goal is L4, where Tesla takes responsibility, the driver would have no say in the speed it drives, and most laws require autonomous vehicles to obey all traffic laws. So it would always drive at, or below, the local speed limit.
 
Do you think FSD Beta 11.x does super-human behaviors sometimes? If so, do some humans do those same things sometimes too? Both these sets of driving examples could be used as end-to-end training data if Tesla curates the examples appropriately with evaluation metrics of what clips or even parts of clips to use.

Similarly, sometimes humans are extra observant and/or more intuitive when driving to be proactive in anticipating issues and avoiding trouble. Even if these are rare cases, if there is actual signal from training inputs of 360º cameras with better vantage, the neural networks can learn to do the same thing when appropriate.

Even more powerful is combining multiple behaviors: being "just" good at each of tracking the upcoming traffic light, anticipating adjacent lane movement, watching for pedestrians and cross traffic, and knowing how the road layout will change, but doing all of those at the same time, every time, when making a lane change is probably super human.

V11's approach seems completely different from V12's though. V11 leverages the autolabeling pipeline to label pedestrians and cars from all cameras. It leverages the future to label trajectories in the past, etc.

This allows V11 to see multiple objects in 360 degrees unlike a human with its slow gimbal.

V12 seems to forgo autolabeled clips in the training set.

Autolabeling is still used in the curation process though, but again, the capability of V12 seems to be limited by good human drivers because that's what the system is imitating.
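
As a toy illustration of "leveraging the future to label trajectories in the past": a generic hindsight-labeling sketch (not Tesla's actual autolabeler) where the label for each past frame is simply the positions actually observed over the next N frames.

```python
import numpy as np

# Toy "use the future to label the past" sketch: given a recorded track of an
# object's positions, the trajectory label for frame t is the positions that
# were actually observed over the next `horizon` frames.
def hindsight_trajectory_labels(positions: np.ndarray, horizon: int) -> np.ndarray:
    """positions: (T, 2) array of x,y per frame.
    Returns (T - horizon, horizon, 2): for each frame, its future-trajectory label."""
    T = positions.shape[0]
    return np.stack([positions[t + 1 : t + 1 + horizon] for t in range(T - horizon)])

track = np.cumsum(np.random.randn(100, 2), axis=0)   # fake recorded track
future_labels = hindsight_trajectory_labels(track, horizon=10)
print(future_labels.shape)   # (90, 10, 2)
```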
 
V12 seems to forgo autolabeled clips in the training set
Whether Tesla is getting rid of all autolabeled clips is not entirely clear, and it's not the approach Karpathy would have expected for a gradual Software 2.0 progression. If Tesla is keeping the perception module around before feeding data into control, all the autolabeled clips for detecting objects, lanes, etc. are still there.

Indeed, adding the control neural network to the end of that to reach end-to-end doesn't need autolabeled clips, as you suggest: freezing perception to train control converts the raw video into intermediate outputs/inputs during the forward pass, and comparing the control outputs against the expected outputs allows backpropagation through the control weights, i.e., back to the beginning of control. The same setup also allows training the entire stack with nothing frozen, although that becomes a balancing act of making sure other outputs, e.g., for visualization, aren't broken.
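
As a rough PyTorch-style sketch of what "freeze perception, train control" could look like. Module names, shapes, and the two-output control vector are placeholders, not Tesla's architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of "freeze perception, train only control".
perception = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten())        # video -> features
control = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))  # features -> steer, accel

for p in perception.parameters():
    p.requires_grad = False                   # perception weights stay fixed

optimizer = torch.optim.Adam(control.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

frames = torch.randn(8, 3, 64, 64)            # stand-in camera frames
human_controls = torch.randn(8, 2)            # stand-in "what the good driver did"

features = perception(frames)                 # forward pass through frozen perception
pred = control(features)
loss = loss_fn(pred, human_controls)
loss.backward()                               # gradients flow only into control
optimizer.step()
```

Unfreezing perception (dropping the requires_grad loop and handing its parameters to the optimizer too) is the "train the entire stack" case, where you'd then have to watch that the other heads, e.g., visualization outputs, don't regress.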
 
I'm wondering whether the training will result in the car never getting into situations where it would have to test its reaction time. We react when we become aware of a situation only at the last moment. When will that happen to a car with ~30 millisecond reaction times and 360º vision - and where a constructive reaction to the situation exists within that short timeframe?
There will always be situations where the time needed to avoid a crash is more than the time available (for acceptable levels of utility vs risk).
Continuous situational awareness will reduce that space though.
 
There will always be situations where the time needed to avoid a crash is more than the time available (for acceptable levels of utility vs risk).
That just means that there will always be occasions where there will be collisions. No argument there.

I was thinking about cases where the computer reaction time would make the difference relative to examples of human drivers. After all, if all the training data is only as good as a human, what would that miss? Perhaps the most meaningful example I thought of was a child running from between parked cars and onto the roadway. Every millisecond of reaction time gained is valuable. For most cases, there are visual cues to inform the driver that something is going to happen, or whatever is going to happen develops relatively slowly.

Dealing with a case like the child may require simulated test data so that the car can learn to brake faster than a human would, and possibly to swerve, whatever.
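
Back-of-the-envelope numbers on why those milliseconds matter (illustrative figures only, using the ~30 ms reaction time mentioned earlier):

```python
# Distance covered before braking even begins, at a given speed.
speed_mph = 30
speed_ms = speed_mph * 0.447                 # ~13.4 m/s

human_reaction_s = 1.5                       # typical surprised-driver reaction time
computer_reaction_s = 0.03                   # the ~30 ms figure discussed above

print(round(speed_ms * human_reaction_s, 1))     # ~20.1 m traveled before braking starts
print(round(speed_ms * computer_reaction_s, 1))  # ~0.4 m
```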
 
Perhaps the most meaningful example I thought of was a child running from between parked cars and onto the roadway. Every millisecond of reaction time gained is valuable. For most cases, there are visual cues to inform the driver that something is going to happen, or whatever is going to happen develops relatively slowly.

Dealing with a case like the child may require simulated test data so that the car can learn to brake faster than a human would, and possibly to swerve, whatever.

Perhaps in this case, Tesla will use video examples of AEB in action. The AEB models probably use the autolabeling pipeline and would have detected the child faster than a human and braked sooner.
 
Perhaps in this case, Tesla will use video examples of AEB in action. The AEB models probably use the autolabeling pipeline and would have detected the child faster than a human and braked sooner.
I would think that all of the safety features, AEB/Lane departure/etc. would continue to use the perception stack that identifies objects, lanes, etc.

Perception doesn't go away because FSDb becomes End2End; it is just an output spit out from the middle of the End2End stack.
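
A minimal sketch of that "output from the middle of the stack" idea: one shared backbone with both a control head and a perception head, so AEB, lane departure, and the UI could keep consuming object/lane outputs. Names and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class EndToEndWithPerceptionHead(nn.Module):
    """Shared backbone; control and perception are two heads on the same features."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.control_head = nn.Linear(32, 2)        # steer, accel
        self.perception_head = nn.Linear(32, 10)    # e.g. object/lane attributes

    def forward(self, frames):
        feats = self.backbone(frames)
        return {
            "control": self.control_head(feats),
            "perception": self.perception_head(feats),  # consumed by AEB, lane departure, UI
        }

model = EndToEndWithPerceptionHead()
out = model(torch.randn(1, 3, 64, 64))
print(out["control"].shape, out["perception"].shape)
```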
 
We react when we become aware of a situation only at the last moment.
Not generally true.

It’s very common for humans to have negative reaction times, reacting to avoid something before it occurs.

Anyway we’ll see, lots of speculation about how training is done for end to end here, and I think none of us really know (most definitely the folks at Tesla are thinking about it of course).

In the end Elon has said unambiguously they are targeting L4/L5 (“2X as safe as average human fully unsupervised in all scenarios, then we will want to make it…10X”), so they’ll have to do something that flawlessly duplicates human anticipation and leverages the theoretically superior reaction time (not yet realized in any software build to date).

Count me as highly skeptical that current neural net SOTA architectures will get them (or anyone!) there.

To be 10X safer than the average human, they will have to avoid most accidents caused by other drivers and pedestrians.

It’s going to be very exciting and a huge step forward when this problem is solved though - the implications for autonomous capabilities will be huge and of course extend to many other applications.
 
If the end goal is L4, where Tesla takes responsibility, the driver would have no say in the speed it drives
Maybe the passenger could give instructions to drive slower or make fewer lane changes, somewhat like the Chill FSD Beta driving profile? Elluswamy suggests some examples during the demo:


So it seems like they're looking into controlling behavior even with end-to-end. One possible approach is to add additional context tokens (converted from natural language, settings, or fixed/selectable prompts) alongside the video inputs, paired with the matching control outputs. This could allow for Basic Autopilot "stay in lane" and Assertive "follow closer" as presets as well as on-demand commands. Of course, it'll be limited to what Tesla trains, so probably not "ignore speed limits" and "roll stops" if NHTSA has a say. Then again, there could be actual use cases when it should follow "ignore the red light."
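
As a sketch of how such context tokens could enter an end-to-end policy (entirely hypothetical, just to show the mechanism): a preset like Chill/Average/Assertive gets embedded and prepended to the video tokens before the control head.

```python
import torch
import torch.nn as nn

class ConditionedPolicy(nn.Module):
    """Toy policy conditioned on a context token (e.g. a driving-profile preset)."""
    def __init__(self, d_model=64, n_presets=3):
        super().__init__()
        self.video_proj = nn.Linear(512, d_model)              # pretend video features are 512-d
        self.preset_embed = nn.Embedding(n_presets, d_model)   # 0=chill, 1=average, 2=assertive
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.control_head = nn.Linear(d_model, 2)               # steer, accel

    def forward(self, video_feats, preset_id):
        video_tokens = self.video_proj(video_feats)              # (B, T, d_model)
        ctx = self.preset_embed(preset_id).unsqueeze(1)          # (B, 1, d_model)
        tokens = torch.cat([ctx, video_tokens], dim=1)           # prepend the context token
        encoded = self.encoder(tokens)
        return self.control_head(encoded[:, 0])                  # control conditioned on context

policy = ConditionedPolicy()
out = policy(torch.randn(2, 8, 512), torch.tensor([0, 2]))  # one "chill", one "assertive" request
print(out.shape)   # torch.Size([2, 2])
```

Natural-language instructions would work the same way in principle, just with the preset embedding replaced by tokens from a language encoder.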
 
The apparent loss of momentum with the 11.4.x releases
In the past week, not only did 11.4.4 go out on the main production software for newer vehicles on HW3/HW4, but nearly a majority of early access also received 11.4.7, so Tesla is still making progress with 11.4.x, though it's clearly secondary to V12. Perhaps it was all hands on deck with Elon Musk setting a deadline for the end-to-end livestream, and once that made it through training and initial validation to be demo-able, some of the team could take care of existing customers with these updates.

Perhaps what you're getting at is whether there'll be more 11.x releases with new release notes for new functionality, and that would likely depend on what can be reused for, or is even necessary for, V12. Single-stack highway driving was always on the horizon during 10.x, but there were many releases incrementally improving the architecture to achieve the performance needed to safely drive at high speeds.

With the new H100s providing ~2-3x the compute of the existing A100s, presumably two sets of training will happen at the same time. Maybe the slower one will be used for other end-to-end ideas or to restore some momentum to 11.x?

 
I would think that all of the safety features, AEB/Lane departure/etc. would continue to use the perception stack that identifies objects, lanes, etc.
Earlier versions of FSD Beta even switched to legacy-stack visualizations when triggering the lane departure warning, and worse, that warning got confused by lines that FSD Beta knew weren't actually lane lines. So it is practical to keep the current safety feature behaviors dependent on the perception stack, and maybe that's even part of the remaining ~3k lines of control.

Maybe longer term, Tesla might want to rework these explicitly named safety features on top of end-to-end control. E.g., both Automatic Emergency Braking and Pedal Misapplication Mitigation are presumably cases when control would predict the vehicle needs to stop quickly with very high confidence.
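
For example, a trigger for AEB-style braking derived from the control head might be as simple as this sketch (thresholds and the idea of a per-prediction confidence are my assumptions, not anything Tesla has described):

```python
def should_emergency_brake(predicted_decel: float, confidence: float) -> bool:
    """Trigger emergency braking when the control network is very confident
    that a hard stop is needed (e.g. > ~0.6 g), regardless of pedal input."""
    HARD_STOP_DECEL = 6.0      # m/s^2, roughly 0.6 g (made-up threshold)
    MIN_CONFIDENCE = 0.95
    return predicted_decel >= HARD_STOP_DECEL and confidence >= MIN_CONFIDENCE

print(should_emergency_brake(predicted_decel=7.5, confidence=0.98))  # True
print(should_emergency_brake(predicted_decel=7.5, confidence=0.50))  # False
```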
 
Earlier versions of FSD Beta even switched to legacy-stack visualizations when triggering the lane departure warning, and worse, that warning got confused by lines that FSD Beta knew weren't actually lane lines. So it is practical to keep the current safety feature behaviors dependent on the perception stack, and maybe that's even part of the remaining ~3k lines of control.

Maybe longer term, Tesla might want to rework these explicitly named safety features on top of end-to-end control. E.g., both Automatic Emergency Braking and Pedal Misapplication Mitigation are presumably cases when control would predict the vehicle needs to stop quickly with very high confidence.
The legacy lane departure has been a problem for me. There is a road with a crack in the middle, and every time I go down it, lane departure is triggered: it throws out the alert noise and does some swerving. I really wish they would fix this, as it's not very safe having it try to swerve out of the lane.
 
To be 10X safer than the average human, they will have to avoid most accidents caused by other drivers and pedestrians
Do you have examples you're thinking of? It seems like a good number of "accidents caused by others" from California Autonomous Vehicle Collision Reports are from not driving like a human, e.g., suddenly stopping unnecessarily. It would seem like training end-to-end to behave like humans could reduce these.
 
Do you have examples you're thinking of?
Rear-end collisions (often avoidable; humans avoid them frequently) are one example.

I’m not talking about collisions caused by unnatural behavior on the part of the AV.

Just talking about all the collisions every day avoided by defensive driving.

Obviously can’t avoid every collision, but certainly there are a huge number that are avoidable, in spite of the best efforts of the party at fault to complete the collision.

So that 10X level would be tough, I think, but it might be attainable. Not sure. I guess it is, if the best human drivers have collision rates 10x lower than the average driver, all else being equal. (Don’t have data on this.)
 
If, in one frame, it detects a red-light-running vehicle about to intersect with its path, it will predict the controls necessary to avoid a collision. Even if it's never seen a case of a vehicle running a red-light. All it needs to have been trained on is data of drivers correctly judging trajectories and avoiding intersecting paths of other vehicles.
Not sure about this.

It needs to get trained on others running red lights to know when to anticipate that others may run a red light (given the paucity of such data, training that may be difficult - but that's a different problem). Then, it may be able to avoid hitting the red-light-running vehicle.

Of course, they shouldn't train on ego running red lights. But Elon's car ran a red light ... so maybe they did ;)
 