Welcome to Tesla Motors Club
Discuss Tesla's Model S, Model 3, Model X, Model Y, Cybertruck, Roadster and More.
Given what we've been told, that the way to "program" v12 is to show it video of the desired (skilled human) driving behavior:

It makes sense that one aspect, and I'd guess the majority of the activity, is to collect data from the test driver executing this UPL manually.

Of course the v12 prototype should already have lots of applicable UPL video from elsewhere, so it shouldn't be starting from nothing. So yes, they would still be in a position to test v12 behavior there. However, they must have been sent specifically to collect a lot of training data, because that's the essence of v12 development.

Could they be accused of deliberately overfitting for Chuck's UPL? Maybe, but that's not exactly a bad thing if it still helps to fit for similar scenarios everywhere else. Also, learning on specific intersections and then testing in other places gives them a better handle on how transportable or widely applicable such data really can be.

Another interesting question is whether they can actually make progress while the car and the employees are down there. It depends on the turnaround time for them to get a new sub-version out, after training back at the mothership using the telemetry from this car. It may require at least one return trip to this location in the future, to test the results of training on the data from this current session.

Finally, are they also testing the latest internal version of the v11 for comparison? Probably yes, but I don't think that's the primary task.
 
  • Like
Reactions: APotatoGod
Another interesting question is whether they can actually make progress while the car and the employees are down there. It depends on the turnaround time for them to get a new sub-version out, after training back at the mothership using the telemetry from this car. It may require at least one return trip to this location in the future, to test the results of training on the data from this current session.
This is probably just "redneck invention", but I'm a bit surprised that "modular" networks aren't used. That is, create networks for all sorts of things like handling basic lane keeping, making turns, lane changes, etc, ad nauseam. On top of that, create networks that identify the driving situation and then route processing to the subnet that deals with that situation. So instead of monolithically grinding through ten thousand parameters, most of which are completely uninteresting to the needed outputs, you restrict computation to only the bits that apply to the current situation.

It also allows you to train up on unprotected lefts in essentially real time because the network would - ideally - be quite small because it only addresses the uniqueness of that scenario. Lane keeping, yielding to traffic, etc, would be handled by other networks that are not being trained right then. They only need training when the unprotected left requires additional tuning of lane keeping or yielding.

This is how I figure our brains work, and a monolithic neural network seems like a rubbish idea to me. I'm not figuring out if my computer tends to start dancing on its right or left foot to a Samba at Benny's House of Dance because that bit of my brain doesn't get activated when I'm working on my computer.

Unless I'm replying to a forum post about end to end AI, obviously.
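To make the routing idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical - the expert names, shapes, and the linear "networks" are stand-ins for illustration, not anything Tesla has described. A small router net scores the situation and only the winning expert subnet runs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vector summarizing the current driving scene.
def scene_features():
    return rng.normal(size=8)

# Each "expert" subnet handles one situation; tiny random linear layers
# stand in for trained networks here.
experts = {
    "lane_keep":        rng.normal(size=(8, 2)),
    "unprotected_left": rng.normal(size=(8, 2)),
    "lane_change":      rng.normal(size=(8, 2)),
}

# A router net scores each situation; only the winning expert is computed,
# so per-step cost scales with one subnet, not the sum of all of them.
router = rng.normal(size=(8, len(experts)))

def drive_step(x):
    scores = x @ router                        # situation logits
    chosen = list(experts)[int(np.argmax(scores))]
    control = x @ experts[chosen]              # e.g. steering/accel output
    return chosen, control

chosen, control = drive_step(scene_features())
print(chosen, control.shape)
```

The hard-argmax router is the simplest case; soft (weighted) routing over the top few experts is the same idea with a softmax instead of an argmax.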
 
  • Like
Reactions: JHCCAZ
This is probably just "redneck invention", but I'm a bit surprised that "modular" networks aren't used. That is, create networks for all sorts of things like handling basic lane keeping, making turns, lane changes, etc, ad nauseam. On top of that, create networks that identify the driving situation and then route processing to the subnet that deals with that situation. So instead of monolithically grinding through ten thousand parameters, most of which are completely uninteresting to the needed outputs, you restrict computation to only the bits that apply to the current situation.

It also allows you to train up on unprotected lefts in essentially real time because the network would - ideally - be quite small because it only addresses the uniqueness of that scenario. Lane keeping, yielding to traffic, etc, would be handled by other networks that are not being trained right then. They only need training when the unprotected left requires additional tuning of lane keeping or yielding.

This is how I figure our brains work, and a monolithic neural network seems like a rubbish idea to me. I'm not figuring out if my computer tends to start dancing on its right or left foot to a Samba at Benny's House of Dance because that bit of my brain doesn't get activated when I'm working on my computer.

Unless I'm replying to a forum post about end to end AI, obviously.
I don't disagree with you at all. With the usual disclaimer that I'm no AI expert, I think there's been a misapprehension in this thread that "end-to-end AI" or "nothing but nets" means some kind of monolithic and formless mass of neurons.

Why this took hold in this thread is perhaps from an early dismissal of James Douma's interviews on the topic, where he essentially laid out the explanation that the internal architecture is probably very much derived from the preceding v11 system. I happen to think this is correct, and I don't know why it's hard to accept. For me it's the core reason why this idea can work.

So I think in your reply to me, you're assuming that I must fall into the monolithic-AI camp here; I didn't say that at all. I said the training requires example video, which is what we were told by Ashok and Elon and knowledgeable outsiders. I'm simply saying that the black car in Florida is there for the purpose of gathering specific video for the training set.

As you say, our brains are not a giant mush of neurons either. There are all the specialized cortexes and lobes, evolved to excel in specific duties, yet known to be adaptable in the case of damage to the original assignments. Again with the disclaimers, it feels perfectly reasonable that the v12 end-to-end training is operating on an internal architecture (and a set of previously developed weights that makes it go) that contains such centers of expertise and specialization for perception, decision, planning and control of driving tasks.

I've heard, many times, a caution not to get too wrapped up in the idea that AI neural nets are highly analogous to the brain. With that said, I think perhaps one of the differences between the computer neural net and the brain architecture is that the concept of specialized cortexes is not, in the computer version, based on hardwired choices of interconnection paths. Rather, it is based on predeveloped collections of weights. If we say that Tesla is starting with, or borrowing heavily from, the v11 predecessor system, well, that 300K lines of code Elon mentioned was essentially a way to configure the NN architecture.

We've been told repeatedly that the Optimus robot is using the same FSD software and many of the same development techniques, but I haven't heard anyone describe that software architecture. I picture it as having a massive network of neuron-to-neuron connection weights - but also an even more massive Ghost Network, if you will, of zero-valued weights that, by their absence, define and partition off the centers of expertise (akin to the cortexes of the brain). The non-zero-valued weights provide the intelligence within each center and the intercommunication among them.
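One way to picture that "Ghost Network" idea - a hypothetical sketch, not anything Tesla has described - is a dense weight matrix multiplied by a binary mask whose zeros carve the layer into specialized blocks, with a few surviving links for communication between them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12                      # total neurons, split into 3 "cortexes" of 4

# Dense weights for a 12-neuron layer (fully connected in principle).
W = rng.normal(size=(n, n))

# The "ghost network": zeros everywhere except within each block and a
# few designated inter-block links - absence of weight defines structure.
mask = np.zeros((n, n))
for start in range(0, n, 4):
    mask[start:start + 4, start:start + 4] = 1.0   # within-cortex wiring
mask[0, 4] = mask[4, 8] = 1.0                      # sparse cross-cortex paths

W_partitioned = W * mask
print(int((W_partitioned != 0).sum()), "of", n * n, "weights survive")
```

On this picture, the partitioning lives in the weights rather than the wiring, so training could in principle reshape it - which matches the adaptability-after-damage point about brains above.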

The above, though possibly quite flawed, is how I've been able to wrap my own untrained brain :) around the idea that almost all of the FSD configuration programming will now be based on data and training, yet is the developmental result of many prior generations of human-coded self-driving software effort.
 
So I think in your reply to me, you're assuming that I must fall into the monolithic-AI camp here
Not at all. Your comment about training triggered my distaste for large neural networks, where a monolith is the poster child for the wrong end of the spectrum.

I'm on the same page as Douma about V12 today, expecting it to be V11 with a neural network for the control module. Anything else seems idiotic at multiple levels.

In my opinion, the right end of the spectrum involves decomposing the control module into a hierarchy of networks. Each could be trained separately and, at runtime, only the pertinent networks would need to be calculated (as established by earlier networks in the hierarchy).
 
My question for the bored: Does Tesla/Elon have enough information now to know if FSD can be achieved with current tech and planned Dojo upgrades? Can they define some very difficult diverse problems, train for that, and then extrapolate the results across the full spectrum of issues to know with any certainty if they will succeed in full autonomy without any more breakthroughs?
 
My question for the bored: Does Tesla/Elon have enough information now to know if FSD can be achieved with current tech and planned Dojo upgrades?

Nobody knows if FSD can be achieved with resources of X until they achieve it with resources of X.

(X being any specific collection of resources, not the Site Formerly Known As Twitter)

This is why every time they've claimed the driving computer would enable L4+ they were... less than accurate... in the claim. Nobody knows how much is "enough" until you achieve it.



Can they define some very difficult diverse problems, train for that, and then extrapolate the results across the full spectrum of issues to know with any certainty if they will succeed in full autonomy without any more breakthroughs?


No.

They can keep going down a development path until they realize they've hit some local maximum, and revise their design. As they have several times now.
 
I have always assumed v11 was modular NN's with some modules still hand coded. "Modular" could include swapping out NN's as appropriate (UPL NN, roundabout NN, California NN, don't-hit-anything NN, turn-left NN, etc.). One reason this sounded reasonable was that it should be faster (lower latency) to process several smaller NN's instead of one big one. Swapping NN's avoids unnecessary processing of scenarios that are not currently applicable. V12 would at least replace the remaining coded modules with NN's, and I would consider it "end-to-end NN" if that's all they did.

However, I think it has been shown that ML can frequently outsmart humans in structuring a solution. Forcing an NN solution to follow a human-structured process may result in a less efficient solution. A single large NN is probably better for complex problems.

My concern with a single NN is how much a network small enough to process in a reasonable time can actually learn. I'm pretty sure an NN can handle any one of the corner cases I've seen here, especially with some map input, but where is the point where teaching it one more new corner case starts to degrade its right turns? When we hit that point Tesla may have to go back to swapping or modular NN's. Maybe an NN that can compress its processing and ignore "irrelevant" calculations in order to speed up execution? Hopefully Tesla will let us in on what they're doing.
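The worry about one more corner case degrading right turns is essentially what the literature calls catastrophic forgetting. A toy demonstration - a linear model, vastly simpler than any real driving net, with made-up "tasks" - of how training only on a new task overwrites an old skill:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two "skills" as linear regression tasks with different true weights.
w_right_turns = np.array([1.0, -2.0, 0.5])
w_new_corner  = np.array([-3.0, 0.0, 2.0])

def make_task(w, n=200):
    X = rng.normal(size=(n, 3))
    return X, X @ w

def train(X, y, w_init, lr=0.1, steps=200):
    w = w_init.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

Xa, ya = make_task(w_right_turns)
Xb, yb = make_task(w_new_corner)

w = train(Xa, ya, np.zeros(3))      # learn "right turns"
err_before = mse(Xa, ya, w)         # near zero after training
w = train(Xb, yb, w)                # then train ONLY on the new corner case
err_after = mse(Xa, ya, w)          # "right turns" skill is now degraded
print(err_before, err_after)
```

Big nets forget far more gracefully than this linear toy (and replaying old training data in the mix, which Tesla presumably does, mitigates it), but the failure mode being worried about here is real.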
 
  • Like
Reactions: JB47394
My question for the bored: Does Tesla/Elon have enough information now to know if FSD can be achieved with current tech and planned Dojo upgrades? Can they define some very difficult diverse problems, train for that, and then extrapolate the results across the full spectrum of issues to know with any certainty if they will succeed in full autonomy without any more breakthroughs?
It’s highly unlikely that Tesla can get to autonomy, even in optimal conditions in a meaningful ODD, with HW3/4. Computer vision alone isn’t there yet; it’s still likely 2-3 research breakthroughs away. Even Waymo has a hard time getting to the reliability needed at highway speeds with all the sensors.

More training data alone will most likely not suffice.
 
Last edited:
  • Like
Reactions: AlanSubie4Life
And here is more than just the tease.


Doesn’t seem great to me (I’ll leave out the details, since I tire of writing lengthy posts describing all the obvious shortcomings) but hard to tell in some cases whether there were interventions and when the car was under manual control.

I guess it is either a bad human driver or a bad ADAS driver. Arguably not as bad as the other featured human driver in this video, though. But unfortunately that is not the bar.
 
  • Helpful
Reactions: willow_hiller
This is probably just "redneck invention", but I'm a bit surprised that "modular" networks aren't used. That is, create networks for all sorts of things like handling basic lane keeping, making turns, lane changes, etc, ad nauseam. On top of that, create networks that identify the driving situation and then route processing to the subnet that deals with that situation. So instead of monolithically grinding through ten thousand parameters, most of which are completely uninteresting to the needed outputs, you restrict computation to only the bits that apply to the current situation.

And how do you train a routing net to determine "the driving situation"? How do you know from raw sensor data what "the driving situation" is at any time? Is the "driving situation" something human-labelled? How do you get that? And what happens if there is more than one at the same time - an unprotected left with a child in the crosswalk and someone swerving into your lane?

It also allows you to train up on unprotected lefts in essentially real time because the network would - ideally - be quite small because it only addresses the uniqueness of that scenario. Lane keeping, yielding to traffic, etc, would be handled by other networks that are not being trained right then. They only need training when the unprotected left requires additional tuning of lane keeping or yielding.

This is how I figure our brains work,
It's probably not so.

The transformer-based networks already have internal sub-networks that devote "attention" to certain facts or certain features of other nets, so an architecture where learned components gate and filter other parts (multiplicatively vs. additively) is already included. And it can be optimized through e2e training.
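A minimal numerical sketch of that multiplicative gating in a single attention step (token names and shapes are illustrative only, not any real FSD detail):

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Four scene tokens (say: ego lane, oncoming car, crosswalk, signal),
# each an 8-dim feature vector.
tokens = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
weights = softmax(Q @ K.T / np.sqrt(8))   # learned, input-dependent gates
out = weights @ V                         # multiplicative mixing of values

# Each row of `weights` sums to 1: attention re-routes information
# rather than letting every token contribute equally.
print(out.shape, weights.sum(axis=1))
```

This is the sense in which "routing to the relevant subnet" already happens inside a transformer: the gates are soft, learned end-to-end, and can attend to several "situations" at once.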
 
  • Like
Reactions: OxBrew
I guess it is either a bad human driver or a bad ADAS driver. Arguably not as bad as the other featured human driver in this video, though. But unfortunately that is not the bar.

I saw a lot of cautious driving from both the Model Y and the example car, but I wouldn't call either "bad." If the Model Y was being driven by FSD V12 and not manually, it looked very human-like and smooth to me.
 
  • Like
Reactions: JHCCAZ
I saw a lot of cautious driving from both the Model Y and the example car, but I wouldn't call either "bad." If the Model Y was being driven by FSD V12 and not manually, it looked very human-like and smooth to me.
You can refer to prior posts. Being cautious is kind of a deal-breaker for this function too, and crossing speed is something covered at length previously. I saw one example here where there seemed to be some alacrity, but that is it. Hard to measure exact speed from the video without looking at timing and comparing to prior in-car videos. Possible but boring.

It’s “bad” if the caution increases stress. Going fast and decisively while staying error free is much better. Of course it isn’t hard to guess why they exhibit such caution if the driver is expected to be able to intervene. They are conflicting requirements.
 
And here is more than just the tease.


Doesn’t seem great to me (I’ll leave out the details, since I tire of writing lengthy posts describing all the obvious shortcomings) but hard to tell in some cases whether there were interventions and when the car was under manual control.

I guess it is either a bad human driver or a bad ADAS driver. Arguably not as bad as the other featured human driver in this video, though. But unfortunately that is not the bar.
You can easily see the driver's hands manipulate the steering wheel multiple times on these turns. They are most likely collecting training data, or using the cameras to 3D-model the turn for simulations.
 
  • Like
Reactions: JHCCAZ and kabin