I'm very confident that V12 is not what Douma thinks it is in that video.

I'm not saying V12 is a monolithic neural network, but it is vastly different from V11 in that it gets rid of human semantics and heuristics in all parts of the architecture.

Likewise, V12 makes no use of human concepts like lanes and stop signs.

That will be a major problem when trying to connect it to maps and the desired routing information. Humans are explicitly taught what a lane is, what a stop sign is, and all the other semantic content that safe driving is about.

Going to V12 helps get rid of labelers, but it's going to make engineering the correct policy that much more difficult. A human driving instructor gives explicit feedback, in words, to people who understand those concepts. In new situations, humans really do reason explicitly about stop signs, lanes, and signals, not with the intuitive gut feeling that a purely observationally trained perception-and-policy grey-goo stack would use.
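To make that concrete, here's a toy sketch (every name below is hypothetical, nothing to do with Tesla's actual code) of the handle that explicit concepts give engineers, and that an end-to-end net lacks:

```python
# Toy illustration only; every name here is hypothetical, not Tesla code.

def v11_style_policy(perception: dict) -> str:
    """Modular stack: perception emits named human concepts,
    so an engineer can read, test, and patch individual rules."""
    if perception["stop_sign_ahead"] and perception["distance_m"] < 30.0:
        return "brake"
    if not perception["in_lane"]:
        return "steer_toward_lane_center"
    return "cruise"

def v12_style_policy(pixels, net):
    """End-to-end: pixels in, controls out. There is no named
    'stop sign' variable to hang a targeted fix on; the only
    lever is adding or re-weighting training data."""
    return net(pixels)
```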


Otherwise you're going to get the equivalent of a clever dog or chimpanzee that watches its humans drive but doesn't fully understand the requirements. It could do a decent job of mimicking behavior within its training set, but it is otherwise an entirely inappropriate driver for a robotaxi.

Maybe they have some new trick, but switching to Yet Another Totally New Architecture means practical robodriving is many years away.

V11 was totally dependent on autolabeled and manually labeled human concepts like these.

You can't feed V12 "dirty" ideas like the V11 BEV and autolabeled world representation and expect good output. End-to-end thrives on pure, raw data, and you need to massage the architecture and data to get the outputs you desire.

They could still train ML mappings from the mysterious internal representations of V12 onto the previously labeled results for visualization. The difference is that the labeled data isn't used in the primary loss function for training the net, so the visualization will be less reflective of the elements the net actually uses for its decisions. That will make it harder to craft human feedback when ChatGPT-style "read everything" training is insufficient.
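A minimal sketch of what such a read-out could look like, in PyTorch (the backbone, shapes, and concept labels are all illustrative, not Tesla's architecture): a small probe is trained on frozen end-to-end features, so the human labels never enter the policy's own loss.

```python
import torch
import torch.nn as nn

# Hypothetical probe: decode human concepts (lane visible, stop sign)
# from a frozen end-to-end backbone, for visualization only.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(), nn.Flatten())
probe = nn.Linear(16 * 111 * 111, 2)     # logits for {lane_visible, stop_sign}

for p in backbone.parameters():
    p.requires_grad = False              # the driving policy is untouched

frames = torch.randn(4, 3, 224, 224)     # dummy camera frames
labels = torch.randint(0, 2, (4, 2)).float()

logits = probe(backbone(frames))
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()                          # gradients reach only the probe
```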

LLMs already have a problem with producing plausible but wrong answers. The equivalent here would be confidently driving into the wrong lane, or doing so with the wrong turn signal, or doing the wrong thing when a human directing traffic overrides the signals.
 
"V12 is different in that the data is well-massaged / curated with a tight feedback loop on failures with the data engine. V12's data consists of extreme examples to the mundane. Also, there's no flawed human prompts, the "prompts" are the same types of data-rich pixel streams as its training set."

Care to expand on this?

ChatGPT and similar models have spawned a "field" of prompt engineers, who find the best ways to massage the correct outputs out of GPTs.

One of the reasons ChatGPT "hallucinates" is that our human-made prompts are "flawed" from the GPT's perspective. For example, if you ask, "did lincoln eat an apple in 1850," this input to the GPT is very data-sparse, not grounded in reality, not "real world" so to speak. It's a flawed prompt, after all.

V12 only deals with real-world data. The data streams it uses for inference are grounded only in the real world and are a million times more data-rich than a human-made prompt to a GPT.

The sentence "did lincoln eat an apple in 1850" is only a few dozen bytes at one instant in time, whereas an 8-camera raw data stream is gigabytes per second.
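The back-of-envelope numbers support this; the resolution and frame rate below are assumptions (roughly HW3-era camera specs, uncompressed RGB), not confirmed figures:

```python
# Rough comparison; camera specs are assumptions, not confirmed figures.
prompt = "did lincoln eat an apple in 1850"
prompt_bytes = len(prompt.encode("utf-8"))               # 32 bytes

cams, width, height = 8, 1280, 960                       # assumed per-camera spec
bytes_per_pixel, fps = 3, 36                             # uncompressed RGB, assumed rate
stream_bytes_per_sec = cams * width * height * bytes_per_pixel * fps

print(f"prompt: {prompt_bytes} bytes")                   # 32
print(f"stream: {stream_bytes_per_sec / 1e9:.2f} GB/s")  # ~1.06 GB/s raw
```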
 
ChatGPT and similar models have spawned a "field" of prompt engineers, who find the best ways to massage the correct outputs out of GPTs.

One of the reasons ChatGPT "hallucinates" is that our human-made prompts are "flawed" from the GPT's perspective. For example, if you ask, "did lincoln eat an apple in 1850," this input to the GPT is very data-sparse, not grounded in reality, not "real world" so to speak. It's a flawed prompt, after all.

V12 only deals with real-world data. The data streams it uses for inference are grounded only in the real world and are a million times more data-rich than a human-made prompt to a GPT.

The sentence "did lincoln eat an apple in 1850" is only a few dozen bytes at one instant in time, whereas an 8-camera raw data stream is gigabytes per second.
Thanks, and agreed (though I was tempted to make a joke about "bites" vs "bytes" in your Lincoln analogy :)
 
  • Like
Reactions: powertoold
Going to V12 helps get rid of labelers, but it's going to make engineering the correct policy that much more difficult. A human driving instructor gives explicit feedback, in words, to people who understand those concepts. In new situations, humans really do reason explicitly about stop signs, lanes, and signals, not with the intuitive gut feeling that a purely observationally trained perception-and-policy grey-goo stack would use.


Otherwise you're going to get the equivalent of a clever dog or chimpanzee that watches its humans drive but doesn't fully understand the requirements. It could do a decent job of mimicking behavior within its training set, but it is otherwise an entirely inappropriate driver for a robotaxi.
Actually, that's exactly what an NN *is*. AI is misnamed; today a better name would be "artificial mimicry," and there is nothing wrong with that per se. No NN today "understands" anything in the sense of meta-cognition (and we are WAY, WAY away from getting anywhere near that).

As to your point about maps and lanes: in fact, the map lane information is as much an input to the NN as the video feeds (or, more precisely, as the output of the NNs that deal with the video feeds). The same applies, at least in theory, to weather conditions and so on.
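As a minimal sketch of that point (PyTorch, with all shapes and names illustrative), the planner simply consumes map features as one more input tensor alongside the video-derived ones:

```python
import torch
import torch.nn as nn

# Illustrative only: map/route data is just another input tensor.
class Planner(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(256 + 32, 64), nn.ReLU(),
                                  nn.Linear(64, 2))      # e.g. steer, accel

    def forward(self, video_feats, map_feats):
        return self.head(torch.cat([video_feats, map_feats], dim=-1))

planner = Planner()
video_feats = torch.randn(1, 256)   # output of the camera NNs
map_feats = torch.randn(1, 32)      # encoded lane topology / route
controls = planner(video_feats, map_feats)
```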
 
  • Informative
Reactions: Artful Dodger
So yes, V12 will provide improvements and add new features (pullover, parking lots, maybe u-turns, maybe emergency-vehicle handling, maybe 3-point turns, maybe dead ends, etc.), but the performance will be similar to or incrementally better than V11 (2-3x).

A 2-3x improvement over 11.4.7 in the right areas would make FSD better than Waymo in SF and LA, IMO.

Again, I have little faith in V12 being "the one," but if you watch the livestream, it's incredible how good it is already.

Tesla FSD cannot be stopped. They'll be the first to deploy a mass market robotaxi, if it is at all possible with the technology available at the time. It's clear Elon is all-in on autonomy, and he has the brains, talent, money, and guts to do it first.
 
Here is my take on V12 and the future of FSD.

Tesla V12 FSD will be trained by watching, through the cameras on your vehicle, how you, the owner/driver of the car, drive.

What this means is that it will drive the way you drive, and hence it will meet everyone's expectations of how FSD should behave in all situations. If it misbehaves, you will need to train yourself to drive better, which in turn trains FSD.

In other words, you are responsible for its performance on the road.

I can see “FSD Training Schools” popping up that can train your car, for a fee.
 
Here is my take on V12 and the future of FSD.

Tesla V12 FSD will be trained by watching, through the cameras on your vehicle, how you, the owner/driver of the car, drive.

What this means is that it will drive the way you drive, and hence it will meet everyone's expectations of how FSD should behave in all situations. If it misbehaves, you will need to train yourself to drive better, which in turn trains FSD.

In other words, you are responsible for its performance on the road.

I can see “FSD Training Schools” popping up that can train your car, for a fee.
Are you joking or do you really believe this? Hard to say on the internet.

Assuming you aren't joking...100% the car isn't going to learn you or learn on a local level.

V12 is learning from a collective of "good drivers" to train the fleet. The car doesn't have the memory or capacity to do that sort of local learning.
 
The car doesn't have the memory or capacity to do that sort of local learning.

It's theoretically feasible that V12 could be fine-tuned based on your individual driving. And yes, it wouldn't take place in the car, but they could capture driving video from each and every vehicle, and train customized versions of V12 at their data centers to push back to your vehicle.

But given how gargantuan a data-processing task Tesla is already dealing with just sporadically collecting clips from a select few vehicles, this would be orders of magnitude larger in complexity. I don't think it's very likely at all.
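If Tesla ever did offer it, conceptually it would look something like the sketch below: clone the fleet-wide policy per driver and fine-tune the copy off-car on that driver's clips. Everything here is hypothetical; none of these names correspond to real Tesla infrastructure.

```python
import copy
import torch

# Hypothetical per-driver fine-tuning in the data center, not in the car.
def fine_tune_for_driver(base_policy, driver_clips, steps=100, lr=1e-5):
    """Clone the fleet policy and nudge it toward one driver's style."""
    model = copy.deepcopy(base_policy)               # per-driver copy
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _, (frames, human_controls) in zip(range(steps), driver_clips):
        loss = torch.nn.functional.mse_loss(model(frames), human_controls)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model                                     # pushed back to that one car OTA
```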
 
It's theoretically feasible that V12 could be fine-tuned based on your individual driving. And yes, it wouldn't take place in the car, but they could capture driving video from each and every vehicle, and train customized versions of V12 at their data centers to push back to your vehicle.

But given how gargantuan a data-processing task Tesla is already dealing with just sporadically collecting clips from a select few vehicles, this would be orders of magnitude larger in complexity. I don't think it's very likely at all.
I think you can make an argument that it's not really feasible currently, with cars having various drivers, potential robotaxis, and all of the different locations and laws... it would cost Tesla billions to maintain that sort of individualized curated data, not just in storage cost but in compute power... and Tesla themselves said at AI Day that it's an impossible route.

Regardless, that's not how V12 is being programmed now.

Edit: Could you imagine trying to bug-fix something like this? It's almost comical. Remember, Tesla would be liable in L4/L5... They will define what a good driver is.
 
it would cost Tesla billions to maintain that sort of individualized curated data, not just in storage cost but in compute power

Maybe as an additional monthly charge, similar to how OpenAI offers model fine-tuning through their API:

[screenshot: OpenAI fine-tuning pricing]
It would still fundamentally be the same "good driver," just with behavior tweaked slightly to match the current driver.
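For reference, the OpenAI fine-tuning offering referred to above is invoked roughly like this (v1-style openai Python client; the file ID is a placeholder for an uploaded JSONL of training examples):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "file-abc123" is a placeholder for an uploaded JSONL training file.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-3.5-turbo",
)
print(job.id)  # poll this job until the tuned model is ready
```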
 
V12 is learning from a collective of "good drivers" to train the fleet. The car doesn't have the memory or capacity to do that sort of local learning.
And I'm thankful for that. Not that I think Tesla is stupid enough to allow owner modifications to the safety-critical systems of the car. Imagine some guy running robotaxis who realizes that if his cars were just a little more aggressive around pedestrians, they could each complete 2 extra fares a day in the city.

Of course, even as I say that I realize that text-to-autonomy software is somewhere in our future. In general, empowering people without verifying their... virtue... is a very bad idea.
 
  • Like
Reactions: uscbucsfan
News Flash:

V12 will be trained using your own driving style as you drive it around. Therefore it will meet everyone's expectations of how the vehicle should behave in various situations, and if it misbehaves, it is on you to train it better.

This is not a joke.
Krash - my post was deleted. It was my take on V12, just like everyone has theirs. What was so offensive about it?
It’s not offensive. If you hadn’t put “This is not a joke”, I would have put a laugh emoji on it and called it good. But because it is stated as fact, and because it is incorrect, I’m pulling it.

As mentioned, in the future, Tesla could indeed let your car learn your individual driving habits, and your car could learn to drive with your tendencies. I don't think it will happen, but it is an interesting theoretical and philosophical exercise. Happy to move this to a new thread if enough people want to continue discussing it.
 
It's theoretically feasible that V12 could be fine-tuned based on your individual driving. And yes, it wouldn't take place in the car, but they could capture driving video from each and every vehicle, and train customized versions of V12 at their data centers to push back to your vehicle.

But given how gargantuan a data-processing task Tesla is already dealing with just sporadically collecting clips from a select few vehicles, this would be orders of magnitude larger in complexity. I don't think it's very likely at all.
It's so logistically difficult and also prone to liability-inducing problems: your teenage son drives your car like a drunken maniac, and then when you go to use FSD for a nice calm commute it acts up.

Humans have a way of trolling and poisoning self-learning AI systems very quickly: look at all the examples of self-learning chatbots.
 
  • Like
Reactions: JB47394
Mobileye's CEO Amnon Shashua responds to Tesla doing end-to-end:

Question: At present, Mobileye NZP is mainly used on highways. Will Mobileye have a new algorithm to deal with complex urban scenes? What do you think of Tesla's end-to-end approach?

Amnon Shashua: From highways to cities, we don't need new algorithms. NZP was developed for both cities and highways; it started first with highways in China but soon turned to cities. The same algorithm can handle highway and city driving.
Tests by car companies in Europe and the United States have shown that NZP's manual takeover rate is 5 to 10 times better than Tesla FSD's. In urban scenarios, FSD requires far more interventions than SuperVision.
End-to-end is not a new concept. End-to-end means having a large black-box neural network that receives images from the camera and directly outputs driving commands. There is no intermediate sensed state (vehicle position, lane position, etc.), only the final output command.
Back in 2016, my colleague (Mobileye CTO Prof. Shai Shalev-Shwartz) and I published a paper explaining that end-to-end is very resource-intensive from the perspective of sample complexity, and the amount of data required increases exponentially. The better approach is to build a decomposable system, so that when something goes wrong you can find out why the error occurred and focus only on the part of the network where the error occurred, without affecting the rest of the system.
Why doesn't Mobileye make an end-to-end autonomous driving system? The key is MTBF (Mean Time Between Failures): how long can an intelligent driving system run before it needs to be taken over manually? Tesla FSD's MTBF should be less than an hour. If the MTBF of the driving system is 1 hour, then collecting 100 hours of driving data will turn up corner cases. If you need 1,000 hours of MTBF, then you need 10 times the amount of data.
To prove that a system is better than human driving, we need 100,000 hours of driving data, which can easily be collected over a period of months through a fleet of hundreds or thousands of cars. That's the amount of data Mobileye already has, and we don't need an end-to-end system to cover all corner cases.
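Shashua's scaling point is easy to sanity-check: the driving hours you must collect to observe a fixed number of failures (the corner cases you need to find and fix) grow linearly with the system's MTBF.

```python
# Expected failures in a dataset = hours / MTBF, so the data needed to
# observe a fixed number of failures grows linearly with MTBF.
def hours_needed(mtbf_hours: float, failures_to_observe: int = 100) -> float:
    return mtbf_hours * failures_to_observe

for mtbf in (1, 10, 100, 1000):
    print(f"MTBF {mtbf:>4} h -> ~{hours_needed(mtbf):>9,.0f} h of driving data")
```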
 
Here is my take on V12 and the future of FSD.

Tesla V12 FSD will be trained by watching, through the cameras on your vehicle, how you, the owner/driver of the car, drive.

What this means is that it will drive the way you drive, and hence it will meet everyone's expectations of how FSD should behave in all situations. If it misbehaves, you will need to train yourself to drive better, which in turn trains FSD.

In other words, you are responsible for its performance on the road.

I can see “FSD Training Schools” popping up that can train your car, for a fee.
All you need are 500,000 examples of you driving every situation that needs training. If you average 23 ULTs every day, you'll have done enough of them after about 60 years.
 
  • Funny
Reactions: jeewee3000
Are you joking or do you really believe this? Hard to say on the internet.

Assuming you aren't joking...100% the car isn't going to learn you or learn on a local level.

V12 is learning from a collective of "good drivers" to train the fleet. The car doesn't have the memory or capacity to do that sort of local learning.
It will come with the foundational stuff, like understanding traffic signs, but the rest is up to you. If you read the threads on AP/FSD, pretty much everyone is complaining about how they have to take over, simply because it drives differently from how they do, and they cannot trust FSD. What better than to train FSD to drive the way you drive?