
Is "all neural networks" really a good idea?

The claim is not that 0.5% perform rolling stops, it's actually that only 0.5% of people perform complete stops! That's the problem.
Absolutely right, thank you for the catch. I simply wrote that incorrectly. I meant to say what they said, that only 0.5% come to a full stop.

The number being that low actually does surprise me. I'm not claiming I knew it all along. I certainly felt that most people don't stop completely, and there's no doubt at all that the new behavior at stop signs, following the NHTSA recall, feels very unnatural and definitely generates impatience from drivers behind you.
 
I don't think it's true it was trained on rolling stops. Otherwise it should have emulated that and rolled every single stop, which presumably it didn't. I watched it, but wasn't paying too much attention given the video quality. So as a fellow speculator, I can only guess that the case(s) where it stopped involved the presence of other cars while the case(s) where it rolled didn't. shrug Tesla will just have to work it out without our help :)
 
If you don't stop at the stop line, but roll to the edge of the road, it becomes a habit. Then when it matters, you become a risk.

Look at this intersection: Google Maps

Cars from the far left - coming down the very steep hill to the T-intersection - are supposed to stop at the white line on the left, by the lady in white. Many, many cars do not stop there but stop at the edge past the sidewalk for their turn right or left. I have been running along this sidewalk, or even just walking, and had a car zoom right across my path as I near the blind corner. I know this, so I always pause at the blind corner.

When a car has rolled through to the edge, you no longer have eye contact because they are watching the fast road to make their turn. You can't make your legal pedestrian crossing in front of them. If they have come to a full stop at the stop line, they can see you and will typically do the hand gesture, head nod, or just pause because they know they have to wait for you to cross.

Rolling through this is just so dangerous. Not every pedestrian comes to a halt at the intersection, nor do they have to. If you are pushing a stroller, the stroller is leading. If you are jogging, you do not have to stop at the intersection. Yes, you definitely SHOULD pause and look around the blind corner, but you don't HAVE to. It is expected that a car will stop at the white line and see you. People who roll through this are 100% bad drivers.
 
If you don't stop at the stop line, but roll to the edge of the road, it becomes a habit. Then when it matters, you become a risk.

Look at this intersection: Google Maps

Cars from the far left - coming down the very steep hill to the T-intersection - are supposed to stop at the white line on the left, by the lady in white. Many, many cars do not stop there but stop at the edge past the sidewalk for their turn right or left. I have been running along this sidewalk, or even just walking, and had a car zoom right across my path as I near the blind corner. I know this, so I always pause at the blind corner.

When a car has rolled through to the edge, you no longer have eye contact because they are watching the fast road to make their turn. You can't make your legal pedestrian crossing in front of them. If they have come to a full stop at the stop line, they can see you and will typically do the hand gesture, head nod, or just pause because they know they have to wait for you to cross.

Rolling through this is just so dangerous. Not every pedestrian comes to a halt at the intersection, nor do they have to. If you are pushing a stroller, the stroller is leading. If you are jogging, you do not have to stop at the intersection. Yes, you definitely SHOULD pause and look around the blind corner, but you don't HAVE to. It is expected that a car will stop at the white line and see you. People who roll through this are 100% bad drivers.
All that is true and I agree. However, you're talking about a human driver who concentrates on the traffic to the left and fails to be aware of pedestrians. The Tesla or other AV has multiple cameras and watches for pedestrians all the time all around the car. So if there's no pedestrian around and the traffic is safe for a turn, there is no safety concern - only the vestigial legal concern - associated with a rolling stop in that scenario.

If there is in fact a pedestrian nearby showing any possibility of crossing in front of the car's path, then the car will not perform a rolling maneuver but will stop for the pedestrian - which it would have done in any case, no matter what the law about rolling stops might be.

We seem to be holding the AV to a "no rolling stop" standard that humans actually don't obey, except for good and attentive human drivers in the applicable scenarios like you describe - in which case, once again, the AV will not lose focus as a human might. So if we want to improve safety regarding rolling stops, the focus should be on a national or international campaign to enforce complete stops for all human drivers at all times, not on the emerging AV technology that doesn't have the same problems.

I want to be clear that I have no personal angst or unrecoverable impatience about the car stopping; it's simply that it's bewildering and annoying to most other drivers, because they wouldn't be doing it that way. It's negatively impacting acceptance and adoption without actually improving safety.
 
I want to be clear that I have no personal angst or unrecoverable impatience about the car stopping; it's simply that it's bewildering and annoying to most other drivers, because they wouldn't be doing it that way. It's negatively impacting acceptance and adoption without actually improving safety.
Interesting question for the discussion: would humans come to a complete stop if there was a cop at the intersection?
 
Speculation on FSD Architecture Evolution
(by James Douma)

TL;DW:
Douma is firmly in the camp that V12 builds/depends on the previous V11 by tying together many smaller previously trained NNs. But V12 opens the door for end-to-end training. This training is vastly less efficient, so they are expanding their compute by 10x and then 100x to make up for that inefficiency. V12 will eventually shift the bottleneck from people to training and computers.

IMO it is unclear if V12 is solely using end-to-end training. Douma's presentation and common sense tell us "no, of course not." But Elon and his fans say "yes, of course," repeating that the V12 car Elon demoed was not taught about roundabouts or lanes and just deduced it all from training on videos of good driving.

IMO V11 is not great when measured by how close it is to solving FSD and making robotaxis. Still a long way to go (certainly where I live). If V12 will take 100x the compute power to catch up to V11, then we won't see V12 released before 2025 (after the optimistic estimate of when their compute grows by 100x). My best guess is they will still use a lot of the more efficient training (many smaller NNs) in order to bring V12 to market.

Does Elon really believe FSD will be solved this year or was he just trolling us? Why spend billions for more compute if the solution is only a few months away?
 
IMO it is unclear if V12 is solely using end-to-end training. Douma's presentation and common sense tell us "no, of course not." But Elon and his fans say "yes, of course," repeating that the V12 car Elon demoed was not taught about roundabouts or lanes and just deduced it all from training on videos of good driving.
All that tells me is that roundabouts and lane selection and such were part of the control software. Instead of trying to model each control scenario and have logic and calculations to figure out how to handle them, they've gone with neural nets that take in whatever the control software did and spit out good control outputs. After appropriate training, of course.

The stuff Elon demoed was "nothing but nets", which is plural. I consider it obvious that they dropped in a neural network in place of the old heuristic control software. It's a well-defined component of the overall system, it retains the V11 visualization, and the system becomes entirely composed of neural networks. Anyone thinking that it's a monolith with end-to-end training is just dreaming. That's like that LK-99 superconductor material; it's just too much too quickly.
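
To make that "module swap" reading concrete, here's a minimal sketch of the idea. Everything below is hypothetical - the class names, input shapes, and the imitation-trained policy head are assumptions for illustration, not anything from Tesla's actual stack:

```python
# Hypothetical sketch of the "drop-in NN planner" interpretation.
# None of these classes reflect Tesla's real software; they only
# illustrate swapping heuristic control logic for a trained network
# while keeping the modular perception outputs intact.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class SceneState:
    """Perception outputs the old heuristic planner already consumed."""
    objects: torch.Tensor   # (N, obj_dim) detected vehicles/pedestrians
    lanes: torch.Tensor     # (L, lane_dim) lane geometry
    ego: torch.Tensor       # (ego_dim,) speed, heading, etc.


class LearnedPlanner(nn.Module):
    """Replaces if/else control heuristics with a trained policy head."""

    def __init__(self, obj_dim=16, lane_dim=8, ego_dim=4, hidden=256):
        super().__init__()
        self.obj_enc = nn.Linear(obj_dim, hidden)
        self.lane_enc = nn.Linear(lane_dim, hidden)
        self.ego_enc = nn.Linear(ego_dim, hidden)
        self.head = nn.Sequential(
            nn.ReLU(), nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # [steering, acceleration]
        )

    def forward(self, s: SceneState) -> torch.Tensor:
        # Pool variable-length object/lane sets, fuse with ego state.
        ctx = (self.obj_enc(s.objects).mean(0)
               + self.lane_enc(s.lanes).mean(0)
               + self.ego_enc(s.ego))
        return self.head(ctx)  # control outputs, trained by imitation
```

In this picture the perception outputs (and hence the V11-style visualization) survive unchanged; only the hand-written control logic gets replaced by a trained module.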
 
Highly doubt FSD 12 is "full neural network" with no hard-coded logic. I'm sure the initial "action" pops out of a black-box neural net, but I can 100% guarantee there is a "monitor" program running to make sure those net actions don't violate some very strict rules (like don't hit a human, or drive off a cliff).

All neural networks will have edge cases where they WILL do the worst thing, and these edge cases need to be caught especially early in training.
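
To illustrate the kind of "monitor" being described, here's a minimal sketch, assuming invented rule names and thresholds (none of this is from any real FSD build):

```python
# Illustrative safety-monitor wrapper around a learned policy.
# All rules and limits below are made-up examples, not Tesla's.
from dataclasses import dataclass


@dataclass
class Action:
    steering: float      # radians
    acceleration: float  # m/s^2


@dataclass
class Perception:
    min_pedestrian_distance_m: float
    road_edge_distance_m: float


MAX_DECEL = -8.0   # m/s^2, hypothetical hard limit
MAX_ACCEL = 3.0


def monitor(proposed: Action, world: Perception) -> Action:
    """Deterministic checks applied after the net proposes an action."""
    a = Action(proposed.steering, proposed.acceleration)

    # Clamp outputs to physically / legally sane ranges.
    a.acceleration = max(MAX_DECEL, min(MAX_ACCEL, a.acceleration))

    # Hard rule: brake if a pedestrian is dangerously close.
    if world.min_pedestrian_distance_m < 3.0:
        a.acceleration = MAX_DECEL

    # Hard rule: never accelerate toward the road edge / a drop-off.
    if world.road_edge_distance_m < 1.0 and a.acceleration > 0:
        a.acceleration = 0.0

    return a
```

The specific rules don't matter; the point is that a deterministic wrapper runs on every cycle, no matter what the net proposes.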
 
I'm way out of my league here.

Could this visual input of traffic situations also be captured by the car itself, even if using MCU MIPS? Maybe take and record video snapshots periodically, then correlate them using car MIPS after hours. In essence, at least give weight (if not priority) to "learning" how that driver profile prefers this specific stretch of road over time.

Or just upload those clips to the mothership, tag them somehow as mine and tied to my profile, then update my local preferences file.

Might be an appealing FSD feature, minimizing the "FSD is too fast/timid/aggressive" concern. The car could even adapt to the style of the current profile.

Unless that's too much for our poor little MCU2/AP3. Actually that AP computer is pretty powerful, and if the car is idle maybe it is available too. Hmmm.

Of course, it's not likely. Do you gurus see this as a possibility?
 
Highly doubt FSD 12 is "full neural network" with no hard-coded logic. I'm sure the initial "action" pops out of a black-box neural net, but I can 100% guarantee there is a "monitor" program running to make sure those net actions don't violate some very strict rules (like don't hit a human, or drive off a cliff).

All neural networks will have edge cases where they WILL do the worst thing, and these edge cases need to be caught especially early in training.
That of course is not in dispute. The dispute is over whether V12 is a single NN that takes video input and spits out actions (true "end-to-end"), or whether it's using V11 as a basis with the remaining non-NN bits changed over to NNs, such that the whole chain from video input to action is done by NNs but it's still modular.
 
I'm way out of my league here.

Could this visual input of traffic situations also be captured by the car itself, even if using MCU MIPS? Maybe take and record video snapshots periodically, then correlate them using car MIPS after hours. In essence, at least give weight (if not priority) to "learning" how that driver profile prefers this specific stretch of road over time.

Or just upload those clips to the mothership, tag them somehow as mine and tied to my profile, then update my local preferences file.

Might be an appealing FSD feature, minimizing the "FSD is too fast/timid/aggressive" concern. The car could even adapt to the style of the current profile.

Unless that's too much for our poor little MCU2/AP3. Actually that AP computer is pretty powerful, and if the car is idle maybe it is available too. Hmmm.

Of course, it's not likely. Do you gurus see this as a possibility?
Tesla so far does no such customization other than what you can toggle in the option menus. All data is processed by the mothership and then downloaded to the car via update. There is no individualized learning by individual cars.
 
All that tells me is that roundabouts and lane selection and such were part of the control software. Instead of trying to model each control scenario and have logic and calculations to figure out how to handle them, they've gone with neural nets that take in whatever the control software did and spit out good control outputs. After appropriate training, of course.

That's the more useful intermediate situation. Technically it's "model distillation": they could approximate the effects of a more expensive deterministic code/optimization-based planner that has higher driving performance but worse computational properties that make it infeasible on-board.

But the other option is entirely learning planning from raw human observations. That seems unlikely to be successful in any reasonable time. It would get common cases super easily, but then ensuring an actually useful and safe policy might be very difficult.
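
For anyone unfamiliar with the distillation idea, a minimal sketch of how it works in general, assuming a placeholder expensive_planner standing in for the slow, high-quality offline planner (hypothetical, not any real system):

```python
# Minimal distillation sketch: a small student network learns to mimic
# an expensive offline planner that is too slow to run in the car.
# `expensive_planner` is a placeholder for illustration only.
import torch
import torch.nn as nn
import torch.optim as optim


def expensive_planner(scene: torch.Tensor) -> torch.Tensor:
    """Stand-in for a slow, high-quality deterministic planner/optimizer."""
    return torch.tanh(scene @ torch.ones(scene.shape[1], 2))  # fake "controls"


student = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))
opt = optim.Adam(student.parameters(), lr=1e-3)

for step in range(1000):
    scenes = torch.randn(32, 64)                        # batch of encoded scenes
    with torch.no_grad():
        teacher_controls = expensive_planner(scenes)    # offline "labels"
    loss = nn.functional.mse_loss(student(scenes), teacher_controls)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The student only ever has to be about as good as its teacher, which is why this route presupposes already having a high-performing deterministic planner to distill from.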
 
That of course is not in dispute. The dispute is over whether V12 is a single NN that takes video input and spits out actions (true "end-to-end"), or whether it's using V11 as a basis with the remaining non-NN bits changed over to NNs, such that the whole chain from video input to action is done by NNs but it's still modular.

At a technical level the question is: what are the loss functions, where do they attach, and where are the ground-truth labels coming from? That's the core decision for the nnet training.
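
Purely as illustration of where those choices show up, here's a sketch of the two training setups with assumed shapes, modules, and label sources (none of this is Tesla's code):

```python
# Illustrative contrast between the two training setups being debated.
# Module names, shapes, and label sources are assumptions for illustration.
import torch
import torch.nn as nn

mse = nn.functional.mse_loss


# Option A: modular "all nets" -- each stage has its own loss against
# intermediate ground truth (auto-labeled objects/lanes), plus a planner
# loss against logged human (or teacher) controls.
def modular_losses(video, obj_labels, lane_labels, control_labels,
                   perception, planner):
    objects, lanes = perception(video)
    loss_perception = mse(objects, obj_labels) + mse(lanes, lane_labels)
    controls = planner(objects.detach(), lanes.detach())
    loss_planner = mse(controls, control_labels)
    return loss_perception, loss_planner


# Option B: true end-to-end -- a single loss on the final controls;
# the only ground truth is what the human driver actually did.
def end_to_end_loss(video, control_labels, policy):
    return mse(policy(video), control_labels)
```

In the modular case the intermediate labels are themselves part of the ground truth; in the true end-to-end case the only supervision is the recorded human controls.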
 
Mobileye's CEO and CTO, Amnon Shashua and Shai Shalev-Shwartz, just wrote a blog post arguing against an end-to-end approach for full self-driving. They argue that E2E is neither sufficient nor necessary and is lacking in transparency, controllability, and performance:

For transparency, while it may be possible to steer an end-to-end system towards satisfying some regulatory rules, it is hard to see how to give regulators the option to dictate the exact behavior of the system in all situations. In fact, the most recent trend in LLMs is to combine them with symbolic reasoning elements – also known as good, old-fashioned coding.
For controllability, end-to-end approaches are an engineering nightmare. Evidence shows that the performance of GPT-4 over time deteriorates as a result of attempts to keep improving the system.
Regarding performance (i.e., the high MTBF requirement), while it may be possible that with massive amounts of data and compute an end-to-end approach will converge to a sufficiently high MTBF, the current evidence does not look promising. Even the most advanced LLMs make embarrassing mistakes quite often. Will we trust them for making safety critical decisions? It is well known to machine learning experts that the most difficult problem of statistical methods is the long tail. The end-to-end approach might look very promising to reach a mildly large MTBF (say, of a few hours), but this is orders of magnitude smaller than the requirement for safe deployment of a self-driving vehicle, and each increase of the MTBF by one order of magnitude becomes harder and harder. It is not surprising that the recent live demonstration of Tesla’s latest FSD by Elon Musk shows an MTBF of roughly one hour.

Conclusion:

In summary, we argue that an end-to-end approach is neither necessary nor sufficient for self-driving systems. There is no argument that data-driven methods including convolutional networks and transformers are crucial elements of self-driving systems, however, they must be carefully embedded within a well-engineered architecture.
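
To put the orders-of-magnitude point in numbers, a quick back-of-envelope calculation (the 1e5-hour target below is an assumed placeholder, not a Mobileye or regulatory figure):

```python
# Back-of-envelope illustration of the MTBF gap described above.
# 1e5 hours is an assumed placeholder target, not an official figure.
import math

observed_mtbf_hours = 1.0          # roughly what the cited demo suggested
assumed_target_mtbf_hours = 1e5    # hypothetical safe-deployment target

gap = math.log10(assumed_target_mtbf_hours / observed_mtbf_hours)
print(f"Gap: {gap:.0f} orders of magnitude")  # -> 5
```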

 
It will have to be E2E on rails, but the question is whether that still has advantages versus the traditional approach, or whether the code will bloat back to what it originally was.

Define "traditional approach". Nobody uses all heuristics for autonomous driving. Everyone uses a lot of NN in all parts of their stack with very little heuristics, it is just a matter of the structure, modular vs E2E.
 
It will have to be E2E on rails, but the question is whether that still has advantages versus the traditional approach, or whether the code will bloat back to what it originally was.
Nobody knows how to make "rails" for such a system without rebuilding the conventional policy with deterministic coding. Now, if Tesla's problem were that they have a deterministic solution with sufficient performance (like Waymo) but one that is too computationally challenging to run on-board, that would be promising, as there are engineering approaches - in hardware, software, and nets (model distillation) - which could help. But I don't think Tesla has anything like this; if they did, they'd show off a robotaxi, even one that took a $20K computer on board. Waymo and Mobileye have been working on their policy for literally decades, and probably haven't been scrapping it every couple of years.

The 'alignment problem' for end-to-end trained Large Language Models is very difficult, and all experience shows that as post-hoc 'alignment' adjustments (toward human-desirable policy behavior) go up, their performance goes down. It's nearly literally a lobotomy.

Bad LLMs give rude results or result in embarrassment. So far, no real costs.

Bad driving policy results in crashes - whether its own crashes, or behavior so unnatural that other humans crash trying to understand or evade it - a far worse problem.
 