
Is an all-neural-networks approach really a good idea?

What do you think Tesla could add to enable complete FSD? How would the addition of this make a future with complete FSD more likely?

I do believe that LiDAR is far superior to cameras, and that if we're going to let 4,000 lbs of rolling metal be controlled by "beta" software (or any software, for that matter), it should use the most accurate sensors.

My Model 3 complains half the year that the cameras are blocked or blinded. I'm just unable to see how they could ever fulfill the need.
 
I do believe that LiDAR is far superior to cameras, and that if we're going to let 4,000 lbs of rolling metal be controlled by "beta" software (or any software, for that matter), it should use the most accurate sensors.

My Model 3 complains half the year that the cameras are blocked or blinded. I'm just unable to see how they could ever fulfill the need.
This all makes sense. Do you think allowing the cameras to move, akin to the human eyes, and installing some sort of self-cleaning or self-diagnosing function for cameras would be a viable alternative?
 
They're expecting that fewer and fewer disengagements will be necessary, but they'll call it L2 forever unless somehow they can get the disengagement count to zero over many, many miles.
Not zero, but when the actuaries say the software is better than human drivers. If Tesla is providing car insurance and human drivers are costing more in accident payouts than the automation software does, Tesla will want people to leave the driving to the software, because they'll save money that way. I figure that's when they'll move beyond the L2 designation.
 
This all makes sense. Do you think allowing the cameras to move, akin to the human eyes, and installing some sort of self-cleaning or self-diagnosing function for cameras would be a viable alternative?

I'm still not entirely convinced that cameras will do the job. That would solve the weather-related issues, but it would still require the software to perceive the distance and shape of objects the way a human does.

In *theory* I understand why people think that computers can do anything the brain does (and faster), but it hasn't been proven yet.

I guess if all of mankind's resources were devoted to autonomous driving with cameras, it would probably happen. It's definitely not the path of least resistance, though.

Where are the $250 LiDARs that were talked about so long ago?
 
There are also lots of high-resolution steerable radar technologies now that weren't around in 2016. They are cheap and easily integrated into the occupancy NNs (e.g., NVIDIA and Mobileye are both doing it). Tesla's HW4 could certainly benefit from a 3D front-facing radar and two side-facing long-range radar units in the front bumper. Adding cameras in the front headlights would help too. LiDAR is nice, but it requires detailed mapping to compare against, and that's expensive, limiting, and probably not required for L3.
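
For what it's worth, here's a toy sketch of the kind of fusion I mean: camera and radar features on a shared bird's-eye-view grid feeding a single occupancy head. Every module name and tensor shape below is made up for illustration; this is not how Tesla's (or anyone else's) occupancy network is actually wired.
Code:
import torch
import torch.nn as nn

class FusedOccupancyHead(nn.Module):
    """Toy example: concatenate camera and radar features on a shared
    bird's-eye-view (BEV) grid, then predict an occupancy logit per cell."""
    def __init__(self, cam_channels=128, radar_channels=32):
        super().__init__()
        self.fuse = nn.Conv2d(cam_channels + radar_channels, 128, kernel_size=3, padding=1)
        self.occupancy = nn.Conv2d(128, 1, kernel_size=1)

    def forward(self, cam_bev, radar_bev):
        # cam_bev:   [batch, 128, H, W] features from the camera pipeline
        # radar_bev: [batch, 32, H, W]  features rasterized from radar returns
        x = torch.cat([cam_bev, radar_bev], dim=1)
        return self.occupancy(torch.relu(self.fuse(x)))

# e.g. FusedOccupancyHead()(torch.zeros(1, 128, 64, 64), torch.zeros(1, 32, 64, 64))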

All I have ever expected or wanted from FSD is L3 on Interstates and highways. Autosteer on City Streets, even if it could be L3, is of practically no utility for me because I drive the same surface streets just about every day, and I will always be able to drive these routes faster and more efficiently (probably more safely too) than the car. Plus I like zipping around the city in my Tesla. It's in bumper-to-bumper commutes on I-285 and on long highway trips, where I could use some relief from the tedium of driving, that an L3 FSD would shine. Here's how they should market it:

Tesla Full Self-Driving: the World's first true consumer Level-3 hands-off autonomous driving system. Works on 90% of the U.S. Interstates and highways. 200-rule Safe!(tm)

I would pay money for that. This robotaxi nonsense? Please!
 
There are also lots of high-resolution steerable radar technologies now that weren't around in 2016. They are cheap and easily integrated into the occupancy NNs (e.g., NVIDIA and Mobileye are both doing it). Tesla's HW4 could certainly benefit from a 3D front-facing radar and two side-facing long-range radar units in the front bumper. Adding cameras in the front headlights would help too. LiDAR is nice, but it requires detailed mapping to compare against, and that's expensive, limiting, and probably not required for L3.

All I have ever expected or wanted from FSD is L3 on Interstates and highways. Autosteer on City Streets, even if it could be L3, is of practically no utility for me because I drive the same surface streets just about every day, and I will always be able to drive these routes faster and more efficiently (probably more safely too) than the car. Plus I like zipping around the city in my Tesla. It's in bumper-to-bumper commutes on I-285 and on long highway trips, where I could use some relief from the tedium of driving, that an L3 FSD would shine. Here's how they should market it:

Tesla Full Self-Driving: the World's first true consumer Level-3 hands-off autonomous driving system. Works on 90% of the U.S. Interstates and highways. 200-rule Safe!(tm)

I would pay money for that. This robotaxi nonsense? Please!
What you're describing is essentially EAP. I wouldn't have a problem with Tesla perfecting EAP soon and pushing FSD out further either. EAP would make road trips so much more enjoyable. Most of my driving time is on highways anyway.
 
Curious, why do you say a single neural network wouldn't have [an occupancy network and on-screen visualizations]?

The information would exist inside OBNN (One Big Neural Net) but not in a form comprehensible to humans. If you do an MRI scan of someone's brain you won't magically see pictures of what they are thinking of.
Both of those internal representations would probably still be good intermediate training targets and outputs that the overall neural network would learn to use or ignore when appropriate.
Again, those representations don't exist in OBNN. You would need to specifically design smaller NNs to create those representations then use other smaller NNs to use that information for controlling the car, much like what Tesla does now.

Each part of the end-to-end NN could be trained individually (with various parts frozen) as well as jointly to improve the shared backbone through multi-task learning, then repeatedly iterated and refined at lower learning rates.
I do not understand what you are saying unless you mean using multiple smaller NNs. OBNN doesn't have parts that you can train individually; there are only layers and God only knows what information is stored in each layer. In order to train an individual part, that part would need to have inputs and outputs that can be used for training. It would be a separate NN. OBNN has only video streams as input and car controls as output so there is no way to train smaller parts inside of it.
Starting from robust perception neural networks probably will transfer knowledge to newly added control portion.
This sounds like what Tesla is doing now where they combine many NNs together in clever ways.

Additionally, allowing the gradient to flow all the way from control outputs to video inputs could result in perception learning new things as part of the overall optimization.
This is the only way to train OBNN. If you break the problem down to smaller pieces that can be trained separately then you are replacing OBNN with smaller ones.

Basically, there's no need to get rid of the mature FSD Beta networks when integrated into a single one.
OBNN has video streams as input and car controls as output. That's it. None of the training from the existing system could be directly reused. OBNN cannot be broken down into smaller trainable chunks. If you do this then you are making multiple connected NNs, not OBNN.

Here is a four-year-old article about end-to-end learning:

They succinctly show the difference between end-to-end (OBNN) and pipelined (many connected NNs):
Code:
Pipelined:  Audio (input) -> feature extraction -> phoneme detection -> word composition -> text transcript (output)

End-to-end: Audio (input) -> (NN) -> transcript (output)
The whole point of OBNN is to eliminate all internal structure and thus eliminate the possibility of intermediate learning.
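
To make the contrast concrete, here's a toy PyTorch sketch of what training OBNN would look like: video in, controls out, and the only loss is on the controls. Everything here is invented for illustration; it is not Tesla's actual code or architecture.
Code:
import torch
import torch.nn as nn

# One big network: video clips in, steering/accel/brake out.
# No named internal interfaces exist that you could train against separately.
class OBNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, 256), nn.ReLU(),
            nn.Linear(256, 3),          # [steering, accel, brake]
        )

    def forward(self, video):           # video: [batch, 3, frames, H, W]
        return self.layers(video)

model = OBNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(video, human_controls):
    # The only supervision is "did the controls match the human's?" --
    # the gradient flows from the control outputs all the way back to the pixels.
    optimizer.zero_grad()
    loss = loss_fn(model(video), human_controls)
    loss.backward()
    optimizer.step()
    return loss.item()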
 
I do not understand what you are saying unless you mean using multiple smaller NNs. OBNN doesn't have parts that you can train individually; there are only layers and God only knows what information is stored in each layer.
Is "OBNN" something you've invented as it sounds like a specific interpretation of end-to-end AI?

Large neural networks like those used for FSD Beta and ChatGPT are made up of smaller neural network components such as transformers, which are in turn made up of smaller components such as encoders and decoders, which themselves are made up of smaller components like the multilayer perceptron (MLP), which could be considered one of the more basic neural network building blocks. Even at that basic level, the MLP has an input layer, hidden layers and an output layer, and each of those individual pieces could still be individually trained or frozen, as is common for finetuning and multi-task learning.

In particular, it sounds like you expect OBNN to be a really deep fully connected network like an MLP, where the training process figures out which weights, biases and connections are useful. That could indeed be an approach that eventually works with enough training data, but I would expect Tesla to put more design into the neural network architecture, such as reusing all the "smaller" NNs already used for FSD Beta, since at least there's proof that this particular structure is useful for training and for making good "intermediate" predictions.
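
As a rough illustration of that freezing/finetuning workflow (all names and shapes below are hypothetical stand-ins, not Tesla's actual networks):
Code:
import torch
import torch.nn as nn

# Hypothetical composite model: a pretrained perception backbone (standing in
# for the existing occupancy/lane networks) feeding a new control head.
class Driver(nn.Module):
    def __init__(self, perception: nn.Module):
        super().__init__()
        self.perception = perception            # reused, already trained
        self.control = nn.Sequential(           # new, randomly initialized
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 3),                  # [steering, accel, brake]
        )

    def forward(self, video):
        features = self.perception(video)       # intermediate representation
        return self.control(features)

# Stand-in backbone (pretend it was trained on FSD Beta's perception tasks).
perception = nn.Sequential(nn.Flatten(start_dim=1), nn.Linear(3 * 64 * 64, 512))
model = Driver(perception)

# Phase 1: freeze perception, train only the new control head.
for p in model.perception.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.control.parameters(), lr=1e-4)

# Phase 2 (later): unfreeze everything and finetune jointly at a lower learning rate.
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)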
 
Is "OBNN" something you've invented as it sounds like a specific interpretation of end-to-end AI?
Yes, because there is an ambiguity in the term "end-to-end AI" that Elon used. Did he mean a monolithic NN with end-to-end learning as described in the article I linked to, or did he mean multiple NNs after finally getting rid of the last hand-coded parts? This was being actively debated here. It was claimed (and maintained) that end-to-end AI in FSD V12 was the latest AI fad Elon had glommed onto, not an evolutionary change to the current system.

In my line of work it is extremely common to mint new abbreviations to reduce redundancy. If you thought I was trying to pass it off as common lingo in the AI biz, I apologize.
Large neural networks like those used for FSD Beta and ChatGPT are made up of smaller neural network components such as transformers, which are in turn made up of smaller components such as encoders and decoders, which themselves are made up of smaller components like the multilayer perceptron (MLP), which could be considered one of the more basic neural network building blocks. Even at that basic level, the MLP has an input layer, hidden layers and an output layer, and each of those individual pieces could still be individually trained or frozen, as is common for finetuning and multi-task learning.

In particular, it sounds like you expect OBNN to be a really deep fully connected network like an MLP, where the training process figures out which weights, biases and connections are useful. That could indeed be an approach that eventually works with enough training data, but I would expect Tesla to put more design into the neural network architecture, such as reusing all the "smaller" NNs already used for FSD Beta, since at least there's proof that this particular structure is useful for training and for making good "intermediate" predictions.
We agree here. This is what I've been saying, but others disagree and say Elon meant it was going to be one big NN with end-to-end learning and hence no human-understandable internal structure that would allow for things like an occupancy network or on-screen visualizations.

I believe FSD V12 will be an evolutionary change from V11, reusing most of what they have now and finally getting rid of the last hand-coded parts. Others disagree, which is why a term like OBNN was needed to distinguish between these two different interpretations of "end to end AI" as used by Elon to describe V12.
 
This all makes sense. Do you think allowing the cameras to move, akin to the human eyes, and installing some sort of self-cleaning or self-diagnosing function for cameras would be a viable alternative?

Cameras get confused by wet roads, since they reflect things around them. LiDAR wouldn't have that problem.

I'm not saying it's impossible for AI to learn to work around that as humans do, but it's one more downside of cameras.

Speaking of many NNs vs OBNN, one big NN would make trading cameras for LiDAR a much more painful transition... they'd need to start over.
 
I have never been really comfortable with this idea of FSD being full-stack neural networks, i.e., "images in, steering, brakes & acceleration out." At first I thought it at least needed some hard and fast rules, e.g., "don't ever hit solid objects," "don't ever pass a school bus," "don't ever drive off a cliff," etc. Not neural networks trained on these circumstances that have a 97% accuracy rate, but basically an "if" statement that is 100% enforced. With these, an all-"AI" driving system could be workable, I thought.
How do you hard code "never drive off a cliff" without neural nets? How does the machine know what a cliff even is without neural nets?
For better or worse, neural nets are the best hope we have for a robust AI driver.
 
How do you hard code "never drive off a cliff" without neural nets? How does the machine know what a cliff even is without neural nets?
For better or worse, neural nets are the best hope we have for a robust AI driver.
Also, the problem with hard and fast rules like "never drive off a cliff" is that they're super brittle and likely to do exactly the wrong thing at the wrong time. "Drive off a cliff" might be better than some much worse alternative.
 
Despite diplomat's plea that we should take Elon at his word, I too interpret his comment about v12 "end-to-end NNs" as being about combining a bunch of different NNs together, NOT One Big NN. There are a bunch of reasons why OBNN makes no sense, but the big one is that Tesla wouldn't be able to rely on the NNs it has spent years training.
 
Despite diplomat's plea that we should take Elon at his word, I too interpret his comment about v12 "end-to-end NNs" as being about combining a bunch of different NNs together, NOT One Big NN. There are a bunch of reasons why OBNN makes no sense, but the big one is that Tesla wouldn't be able to rely on the NNs it has spent years training.
While modularity is great, it's not ideal where speed and reaction times are concerned. You definitely can't afford delays when milliseconds matter on the road.
 
How do you hard code "never drive off a cliff" without neural nets? How does the machine know what a cliff even is without neural nets?
For better or worse, neural nets are the best hope we have for a robust AI driver.
I never said no neural nets. The occupancy networks and the classification networks are obviously an integral part of autonomous systems relying on sensory perception. It's the driving-decision part that I believe is at least partially, and maybe primarily, rule-based (see the examples provided).
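
To be clear about the kind of rule I mean, here's a toy sketch: the perception NNs still supply the occupancy picture, but a plain if-statement gets the final veto. The grid layout and numbers below are made up purely for illustration.
Code:
def safe_controls(nn_controls, occupancy_grid):
    """Hypothetical hard override: the driving NN proposes controls, but a plain
    if-statement vetoes them whenever the perception networks report an occupied
    cell directly ahead. The rule isn't learned, so it's enforced 100% of the time.

    occupancy_grid: 2D list of booleans, ego vehicle at row 0, facing "up".
    nn_controls:    (steering, accel, brake) proposed by the driving network.
    """
    steering, accel, brake = nn_controls
    lookahead = occupancy_grid[1:6]                    # the next few rows straight ahead
    if any(row[len(row) // 2] for row in lookahead):   # centre cell of each row
        return (steering, 0.0, 1.0)                    # cut throttle, full brake
    return nn_controls

# Clear road vs. an obstacle two cells ahead
clear = [[False] * 5 for _ in range(10)]
blocked = [row[:] for row in clear]
blocked[2][2] = True
print(safe_controls((0.0, 0.4, 0.0), clear))    # NN controls pass through
print(safe_controls((0.0, 0.4, 0.0), blocked))  # the hard rule forces braking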