
Is an all-neural-networks approach really a good idea?

One key question is something @diplomat33 and I have already gone around on in this thread: is Tesla throwing out most of their previous work and starting over with a new V12 model they have to train from scratch, or is V12 an incremental improvement where many of the NNs from V11 are still being used but 300K+ lines of code have been replaced by additional NNs?

I admit Elon implied they started over from scratch in the V12 video, just as he previously said V12 would be end to end. But if the only evidence we have is these possibly ambiguous statements by Elon, then I still don't believe it. For example, it doesn't make a lot of sense for Ashok to talk about a traffic-light regression if this is a brand-new system.

I don't believe Tesla threw out their much-ballyhooed occupancy network. And we can see they didn't throw out the on-screen visualization and identification. Clearly there is either tremendous redundancy in V12, or they are using connected NNs, not one big NN.

I don't believe they dumped their previous years of work in the trash bin and replaced it all with a new NN for V12 that needed to be trained all over again (in addition to replacing hundreds of thousands of lines of code). If V12 is a complete redo and lacks significant internal structure, then the barrier to entry is much lower than we thought. If it can now be trained with YouTube videos, the barrier is even lower. It would be astounding to me if they were able to replicate most of the behavior of V11, and replicate the on-screen visualization, in such a short period of time after starting over from scratch.

I agree that it would seem odd for Tesla to just throw everything out and start from scratch. I am thinking maybe Tesla is adopting a parallel approach. Maybe they are continuing V11 with all nets but modular, replacing the planner code with NNs, AND also building the V12 end-to-end model "on the side". This could allow them to do A/B testing. They could continue to train V12 to see how good it can get and compare that with their progress on V11. And if V12 surpasses V11, then Tesla could simply stop V11 development and shift their efforts to V12.
 
That raises a somewhat unrelated question: if they're starting from scratch, then what's the point of 11.4.7?
Starting from scratch on the control system. By this theory, the rest of the networks will carry over. Perception, occupancy, whatever else they're using. Work on those other components will not be wasted. Perhaps it's work to bring Hardware 4 into the fold. Perhaps they want to make sure that they don't lose mindshare by letting V11 languish for 6-12 months. Maybe they're trying to make the system more appealing for subscriptions over that timeframe. There may be more esoteric reasons as well, which could involve the collection of data in response to the current heuristic system. There could be a million different reasons. I spent enough time as a software engineer to know that the information available to customers is usually wildly inadequate for understanding the thinking of the development team or its management.
 
Starting from scratch on the control system. By this theory, the rest of the networks will carry over. Perception, occupancy, whatever else they're using. Work on those other components will not be wasted. Perhaps it's work to bring Hardware 4 into the fold. Perhaps they want to make sure that they don't lose mindshare by letting V11 languish for 6-12 months. Maybe they're trying to make the system more appealing for subscriptions over that timeframe. There may be more esoteric reasons as well, which could involve the collection of data in response to the current heuristic system. There could be a million different reasons. I spent enough time as a software engineer to know that the information available to customers is usually wildly inadequate for understanding the thinking of the development team or its management.
It would not seem logical to think the inputs to the "control system" would be the outputs from the current occupancy, lane-identification, sign-recognition, light-recognition, etc. NNs. What would be the point of that? If it's an end-to-end NN, I would think it would be photons in and acceleration and steering out.
 
It would not seem logical to think the inputs to the "control system" would be the outputs from the current occupancy, lane-identification, sign-recognition, light-recognition, etc. NNs. What would be the point of that? If it's an end-to-end NN, I would think it would be photons in and acceleration and steering out.
Elon's meaning of "end to end" is subject to debate. He may have meant that they've created a single, monolithic neural network that literally takes in photons at one end and spits out control actions at the other. Or he may have meant that they've created a multi-neural-network system that no longer uses heuristics to take in photons and kick out control actions. No matter the system, it must be able to go from photons to control actions because that is its purpose. That was true from the very first iteration (including RADAR, which involves collecting photons).
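To make the two readings concrete, here's a minimal sketch (PyTorch; every module name here is hypothetical, not anything Tesla has disclosed) contrasting a single monolithic photons-to-controls net with a multi-net system that is still heuristic-free end to end:

```python
import torch
import torch.nn as nn

# Reading 1: one monolithic net, photons (camera frames) in, controls out.
class MonolithicE2E(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a real vision backbone
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, 2)  # [steering, acceleration]

    def forward(self, frames):
        return self.head(self.backbone(frames))

# Reading 2: several nets chained; still "end to end" in the sense that
# no hand-written heuristics sit between photons and controls.
class ModularE2E(nn.Module):
    def __init__(self, perception, occupancy, planner):
        super().__init__()
        self.perception, self.occupancy, self.planner = perception, occupancy, planner

    def forward(self, frames):
        feats = self.perception(frames)          # learned perception features
        occ = self.occupancy(feats)              # learned occupancy output
        return self.planner(torch.cat([feats, occ], dim=-1))  # learned control
```

Either way the system as a whole is photons in, controls out; the difference is whether the intermediate representations are one net's hidden activations or several nets' explicit outputs.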

Anyway, that was my theory and I'll happily toss it out the window once we have information refuting it.
 
However, now I think it is quite misguided to even aspire to full-stack neural-network automation. It seems more like an "our only (or, better yet, our most exciting) tool is AI, so everything looks like an AI nail" type of situation. Imagine if we taught our kids to drive this way: "I'm not going to tell you anything about driving - just watch what I do in all these circumstances and then emulate it." Would that result in good drivers? I don't think so.
AI allows all kinds of experiments that are not possible to run while inside the car.

For example, recording human drivers at busy intersections for multiple cycles and then flooding those intersections with simulated FSD cars can help with planning.
 
Is that so? And all this time, I thought RADAR was an application of electromagnetic waves. I guess I need to review Maxwell's equations again.

All electromagnetic waves are also photons. So cameras, radar and lidar all collect photons; the energy of the photons is just different. And cameras collect photons passively, while radar and lidar emit their own photons and collect the ones that come back.
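To put a number on "the energy is just different", here's a quick back-of-the-envelope using E = hf (the 550 nm and 77 GHz values are just representative choices I picked):

```python
# Photon energy E = h * f, comparing a visible-light photon with a 77 GHz radar photon.
h = 6.626e-34          # Planck constant, J*s
eV = 1.602e-19         # joules per electronvolt

f_visible = 3e8 / 550e-9   # ~550 nm green light -> ~5.45e14 Hz
f_radar = 77e9             # typical automotive radar band

E_visible = h * f_visible / eV   # ~2.25 eV
E_radar = h * f_radar / eV       # ~3.2e-4 eV

print(f"visible: {E_visible:.3g} eV, radar: {E_radar:.3g} eV, "
      f"ratio: {E_visible / E_radar:.0f}x")  # ~7000x more energy per visible photon
```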
 
All electromagnetic waves are also photons. So cameras, radar and lidar all collect photons; the energy of the photons is just different. And cameras collect photons passively, while radar and lidar emit their own photons and collect the ones that come back.
Yes, there is the duality of photons as particles and EM waves. But we generally don't think of radio antennas as collecting photons. We work in the EM wave domain.
 
It would not seem logical to think the inputs to the "control system" would be the outputs from the current occupancy, lane-identification, sign-recognition, light-recognition, etc. NNs. What would be the point of that? If it's an end-to-end NN, I would think it would be photons in and acceleration and steering out.
The control system deals with integrating navigation and driving. Musk touched on this recently. So did Ashok. The many complaints about our cars choosing the wrong lane for an upcoming turn are a prime example. It is essential for the control system to get input about lanes and obstructions in order to function properly. A while ago I suggested that this part of the car's behavior has not been improving recently because they are working on transferring the control system to NNs and they don't want to invest development time tweaking code that will soon be removed.
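As a toy illustration of why the control layer needs lane and obstruction inputs, here's a purely hypothetical lane-scoring function (my invention, not Tesla's planner): it can only prefer the correct, unblocked lane for an upcoming turn if perception hands it that information.

```python
# Hypothetical lane scoring for an upcoming turn. The point is only that the
# planner cannot pick sensibly without lane and obstruction inputs.
def score_lane(lane_index, turn_lane_index, blocked, dist_to_turn_m):
    lane_penalty = abs(lane_index - turn_lane_index) * 100.0  # wrong lane costs more...
    urgency = max(0.0, 1.0 - dist_to_turn_m / 500.0)          # ...the closer the turn gets
    block_penalty = 1000.0 if blocked else 0.0
    return -(lane_penalty * urgency + block_penalty)

lanes = [(0, False), (1, True), (2, False)]  # (lane_index, blocked by obstruction)
best = max(lanes, key=lambda lane: score_lane(lane[0], turn_lane_index=0,
                                              blocked=lane[1], dist_to_turn_m=120.0))
print("choose lane", best[0])  # -> lane 0, the turn lane, because it's clear
```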

This is a good example of the problem space getting bigger and bigger compared to what was originally envisioned. IMHO this ever-growing problem space is one of the key reasons why the solution to FSD has taken much longer than expected. This is also why it's hard to accurately predict when FSD will be solved: we don't know how much the problem space needs to further expand. It will keep growing and growing until it no longer needs to, and then there will be a solution. This is also why FSD is a vastly harder problem than chess, Go, and games in general, where the problem space is well defined at the get-go.

More speculatively, I think the vastly increased problem space is why Musk's opinion on FSD has radically changed. He now thinks they will need to be close to solving the Artificial General Intelligence problem in order to solve FSD. If Musk had thought this initially, he would not have predicted having a solution to FSD in 2016.
 
Is that so? And all this time, I thought RADAR was an application of electromagnetic waves. I guess I need to review Maxwell's equations again.

Wikipedia:
A photon (from Ancient Greek φῶς, φωτός (phôs, phōtós) 'light') is an elementary particle that is a quantum of the electromagnetic field, including electromagnetic radiation such as light and radio waves, and the force carrier for the electromagnetic force.
Oxford Languages:
a particle representing a quantum of light or other electromagnetic radiation. A photon carries energy proportional to the radiation frequency but has zero rest mass.
 

This is a good example of the problem space getting bigger and bigger compared to what was originally envisioned. IMHO this ever-growing problem space is one of the key reasons why the solution to FSD has taken much longer than expected. This is also why it's hard to accurately predict when FSD will be solved: we don't know how much the problem space needs to further expand. It will keep growing and growing until it no longer needs to, and then there will be a solution. This is also why FSD is a vastly harder problem than chess, Go, and games in general, where the problem space is well defined at the get-go.
More speculatively, I think the vastly increased problem space is why Musk's opinion on FSD has radically changed. He now thinks they will need to be close to solving the Artificial General Intelligence problem in order to solve FSD. If Musk had thought this initially, he would not have predicted having a solution to FSD in 2016.

Yeah, if we look back at Elon's statements over the years, we can see that problem space grow. Elon said FSD was solved in 2015, when he seemed to think of FSD as basically just lane keeping, lane changes, traffic lights, stop signs, making turns and following a route. Then Tesla fans would say "hey Elon, FSD can't handle X" and Elon would reply "we will work on X". FSD could not handle roundabouts, so Elon added that to the problem space; then emergency vehicles, railroad crossings, double-parked cars, unmarked roads, and so on. So we have watched Elon add to the problem space over time. Of course, most people could have told him that the problem space was bigger than he thought; it is why people were skeptical that FSD was solved.

He thinks AGI is needed because the problem space is so vast that trying to solve each edge case one by one is not practical. Clearly, heuristics alone cannot solve such a big problem space. AGI seems like the logical way to handle it, since the car would have the intelligence to solve new problems on its own. That is also why Elon likes E2E: with E2E, you can train on vast data, and the NN can generalize to problems not directly seen in training. That seems like an efficient way to tackle such a vast problem space.

But to be fair, Elon is not the only one to underestimate the size of the problem space. In 2017, Waymo thought that solving Chandler was enough to scale, and then realized all the cases in SF that needed to be solved. Even now, Waymo and others have solved a lot of FSD problems but still run into new ones, as we see with the stalls in SF. So the problem space continues to grow. I would argue that the problem space is actually infinite because there will always be new edge cases.

So solving FSD is not a matter of solving the entire problem space (that's impossible for an infinite space). I see only two options: either achieve AGI so that the car has the intelligence to figure out new problems on its own, or solve enough of the problem space that your system is "good enough", i.e., can drive everywhere safer than humans. In other words, you pick an arbitrary point where you say "we have not solved everything but we've solved enough to be able to deploy safely per our metrics". And AVs have the benefit that they will continue to improve even after they are deployed. I think we are seeing that on a small scale already: Waymo and Cruise are not perfect, but they feel they are "good enough" for driverless deployment in limited geofences. As AVs improve, the ODD will expand.
 
The control system deals with integrating navigation and driving. Musk touched on this recently. So did Ashok. The many complaints about our cars choosing the wrong lane for an upcoming turn are a prime example. It is essential for the control system to get input about lanes and obstructions in order to function properly. A while ago I suggested that this part of the car's behavior has not been improving recently because they are working on transferring the control system to NNs and they don't want to invest development time tweaking code that will soon be removed.

This is a good example of the problem space getting bigger and bigger compared to what was originally envisioned. IMHO this ever-growing problem space is one of the key reasons why the solution to FSD has taken much longer than expected. This is also why it's hard to accurately predict when FSD will be solved: we don't know how much the problem space needs to further expand. It will keep growing and growing until it no longer needs to, and then there will be a solution. This is also why FSD is a vastly harder problem than chess, Go, and games in general, where the problem space is well defined at the get-go.

More speculatively, I think the vastly increased problem space is why Musk's opinion on FSD has radically changed. He now thinks they will need to be close to solving the Artificial General Intelligence problem in order to solve FSD. If Musk had thought this initially, he would not have predicted having a solution to FSD in 2016.

Overconfidence is a well-known phenomenon.
If people were aware of how they actually look (i.e., as perceived by an average person) before dating, humanity would have gone extinct long ago.

An iterative approach and conviction should solve FSD eventually.
 
I am in the camp that all-AI, all the time, is better than rules-based. Consider Google Translate. Until a few years ago the main approach was based on rules of language: more rules brought better results, but it basically stunk. Then a separate team developed an AI approach with no rules, just feeding in texts in two languages that humans had already translated. Thank Canada, because there is a huge corpus of parallel French/English texts that kick-started the project. Very soon the translator went from crappy to great.

It's going to take hundreds of millions of miles of training data, but I think AI is a better approach than hundreds of thousands of "if-then" statements in code.
 
Re: "1. radar uses photons and 2. photons are not reduced to pixels by the NN" discussion. I don't agree, philosophically.

One, whatever you call it, photon or wave, radar in its simplest form has one detector (like one really fat pixel, if you will) and only uses the timing of the signal bounce to measure the distance to an object. So whether you call it a photon or a wave is irrelevant. The data output is an increment of time; the NN doesn't care what generated that signal, if it really only accesses raw radar data. I suspect it actually looks at highly processed radar data - why waste the extra intelligence built into the radar unit? So it's probably even more removed from reading "photons."
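For reference, the timing-to-distance conversion is plain time-of-flight; a quick sketch (the 1 µs echo is an arbitrary example):

```python
# Radar time-of-flight: distance = c * t / 2 (the pulse travels out and back).
c = 3.0e8                      # speed of light, m/s
t_echo = 1.0e-6                # example round-trip time: 1 microsecond
distance_m = c * t_echo / 2    # -> 150 m
print(f"{distance_m:.0f} m")
```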

Two, the statement that the NN never reduces or processes the photon input into the cameras into "pixels" is a bit misleading. Camera sensors have an array of sensors (pixels) that are triggered by photons, and the processor can only read those signals at a certain rate, like a frame rate. So even though the data may not be processed into some image format, it's still effectively pixelated and has a frame rate like video.

I fail to see the reasoning that the NN somehow has magically accessed some theoretical feed of pure photons. It's still limited by the sensor's capabilities, and real gaps in the field of view exist. Distant objects can be small enough to not trigger more than one "pixel" on the camera sensor, even though probably, IDK, some crazy number of actual photons hit that one "pixel."

I don't think the NN ever processes actual photons; it's always filtered by the sensor limits.
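A toy model of that filtering (all numbers invented): continuous photon arrivals get bucketed into a finite pixel grid at a finite frame rate, and those integer counts are all a downstream NN could ever see.

```python
import numpy as np

# Toy sensor: continuous photon arrivals -> discrete pixel counts at a frame rate.
rng = np.random.default_rng(0)
H, W = 4, 4                                   # tiny pixel grid
exposure_s = 1 / 36                           # made-up 36 fps frame time
photon_rate = rng.uniform(0, 5000, (H, W))    # photons/sec hitting each pixel site

frame = rng.poisson(photon_rate * exposure_s) # integer counts for one frame
print(frame)                                  # the NN sees these integers, not photons
```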
 
I don't think the NN ever processes actual photons; it's always filtered by the sensor limits.

The individual quantum nature of the EM field is not practically measurable in radar, but it is in CCD detection at optical frequencies. CCDs will read out counts (though with well under 100% efficiency), and there is stochastic noise that's not due to sensor noise, unlike radar.

The words about "photons" are misleading here; what they really mean is CCD raw values, or values post-processed by standard image algorithms. At one point the Tesla nets started taking in the raw values and skipping the standard post-processing, with the idea that the nets can learn what is needed anyway and avoid some computational load.
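A sketch of those two effects (shot noise and sub-100% efficiency), with made-up numbers; the last line shows what "raw values in, no standard post-processing" amounts to:

```python
import numpy as np

# Shot noise + quantum efficiency: even an ideal readout gives stochastic counts.
rng = np.random.default_rng(1)
mean_photons = 100.0                        # photons hitting one pixel per exposure
qe = 0.7                                    # quantum efficiency, well under 100%

arrived = rng.poisson(mean_photons, 10000)  # shot noise: Poisson arrival statistics
detected = rng.binomial(arrived, qe)        # each photon detected with probability qe
print(detected.mean(), detected.var())      # both ~70: thinned Poisson is still Poisson

# "Raw values in" then just means feeding normalized counts straight to the net,
# skipping the usual demosaic / white balance / gamma steps:
net_input = detected[:16].reshape(4, 4) / 4095.0  # pretend 12-bit raw, scaled to [0, 1]
```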
 
I am in the camp that all-AI, all the time, is better than rules-based. Consider Google Translate. Until a few years ago the main approach was based on rules of language: more rules brought better results, but it basically stunk. Then a separate team developed an AI approach with no rules, just feeding in texts in two languages that humans had already translated. Thank Canada, because there is a huge corpus of parallel French/English texts that kick-started the project. Very soon the translator went from crappy to great.

It's going to take hundreds of millions of miles of training data, but I think AI is a better approach than hundreds of thousands of "if-then" statements in code.
Of course ML is better than a rule-based approach, since the domain is too complex to be expressed by rules alone.

I am not convinced that only using ML is the best approach either. The rules of the road and high-level behavior can be expressed using rules, and may have to be expressed using rules to cater for localisation to different jurisdictions.

Furthermore, vision-only is not ready for unsupervised safety-critical applications like radiology. It is absolutely not ready for driving, which is also time-critical, at this point in time.

Here's what Bard says (a minimal adversarial-example sketch follows the list):
Computer vision is not yet ready for safety-critical applications on its own. There are a number of challenges that need to be addressed before it can be used in these applications, such as:
  • Robustness to adversarial attacks. Adversarial attacks are designed to fool machine learning models, and they can be particularly effective against computer vision models. This is a major concern for safety-critical applications, where even a small failure rate can have catastrophic consequences.
  • Explainability. It is important to be able to explain why a computer vision model made a particular decision. This is especially important in safety-critical applications, where it is important to be able to understand why the system failed.
  • Data requirements. Computer vision models require a lot of data to train, and this data can be difficult and expensive to collect. This is a major challenge for safety-critical applications, where the data may be sensitive or difficult to obtain.
  • Real-time performance. Computer vision models need to be able to make decisions in real time in order to be used in safety-critical applications. This can be a challenge for complex models, especially when the environment is changing rapidly.
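Bard's first bullet is easy to make concrete. Here's a minimal FGSM (Fast Gradient Sign Method) sketch in PyTorch, assuming some already-trained classifier `model` - my illustration, not Bard's:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: nudge each pixel in the direction that raises the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss on the true labels y
    loss.backward()
    # An eps-sized step per pixel, often imperceptible, can flip the prediction.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```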
 
I am not convinced that only using ML is the best approach either. The rules of the road and high-level behavior can be expressed using rules, and may have to be expressed using rules to cater for localisation to different jurisdictions.
The rules are, in turn, just generalizations of common sense, which can be represented as a tensor in a large enough neural network.
 
I am not convinced that only using ML is the best approach either. The rules of the road and high-level behavior can be expressed using rules, and may have to be expressed using rules to cater for localisation to different jurisdictions.
A human with an International Driving Permit can drive just about anywhere in the world. In theory. Get to another country, get a vehicle and go. Because you "know how to drive".

But you should also learn the local rules (laws, customs), which vary everywhere.

Is this applicable to ML/AI? If the system can "drive" then it only needs to be given the local rules in a download patch. Solve general driving and add local rules (laws, customs) as needed.
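A sketch of that split, entirely hypothetical on my part: a general driving policy proposes an action, and a thin, downloadable rules layer clamps it to the local jurisdiction.

```python
# Hypothetical "general driving + downloadable local rules" split.
from dataclasses import dataclass

@dataclass
class LocalRules:           # the downloadable patch per jurisdiction
    drive_on_left: bool
    right_on_red: bool
    max_speed_kph: float

RULES = {
    "US-CA": LocalRules(drive_on_left=False, right_on_red=True, max_speed_kph=113),
    "UK":    LocalRules(drive_on_left=True, right_on_red=False, max_speed_kph=113),
}

def constrain(action, rules: LocalRules):
    """Clamp the general policy's proposed action to the local rules."""
    speed, turn_right_on_red = action
    speed = min(speed, rules.max_speed_kph)
    if turn_right_on_red and not rules.right_on_red:
        turn_right_on_red = False
    return speed, turn_right_on_red

proposal = (120.0, True)                 # from the general "knows how to drive" model
print(constrain(proposal, RULES["UK"]))  # -> (113, False)
```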