Welcome to Tesla Motors Club

AlphaZero (out of main)

AlphaGo worked by CREATING the data it needed.
Well, first of all I'm talking about AlphaZero, not AlphaGo. And I don't think that's the right way to think about it. AlphaZero created no data, just became a better player. There was nothing it created that could be used for training or verification.
So if you think you can design a simulator that can create random detailed traffic situations and randomly learn success from "playing" those random traffic situations, then I suppose yeah, you've created something comparable to AlphaGo that can learn to drive.
I don't think it can be ruled out that somebody might be able to do so. The AI would control all independent actors in the simulation, thus playing itself. The rules, like Go, can be fairly simple: get to your destination quickly, safely (for everybody), and comfortably. The huge complexity is in the environment, even before you get to the edge cases: it's no simple grid. And the cadence of moves is not simple either.
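The "simple rules" proposed above (get there quickly, safely for everybody, comfortably) amount to a reward function for a self-play agent. A minimal sketch of what that might look like; the function name, weights, and inputs are all hypothetical, purely to make the idea concrete:

```python
def step_reward(progress_m, collided, jerk_ms3,
                w_progress=1.0, w_crash=100.0, w_comfort=0.1):
    """Per-timestep reward for a hypothetical self-play driving agent.

    progress_m: metres travelled toward the destination this step
    collided:   True if the agent hit anything (everybody's safety counts)
    jerk_ms3:   magnitude of jerk, as a crude proxy for comfort

    All weights are invented for illustration, not tuned values.
    """
    reward = w_progress * progress_m      # get to your destination quickly
    if collided:
        reward -= w_crash                 # safely (for everybody)
    reward -= w_comfort * abs(jerk_ms3)   # comfortably
    return reward
```

The point of the sketch is that the objective really can be stated simply; as the post says, the hard part is the environment, not the rules.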
 
From the wiki:
AlphaZero - Wikipedia

AlphaZero plays 'go'.
"AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go."

AlphaZero trained itself on generated games.
"AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games"

Performance
"After 34 hours of self-learning of Go and against AlphaGo Zero, AlphaZero won 60 games and lost 40."
 
AlphaZero created no data, just became a better player. There was nothing it created that could be used for training or verification.

Using go, or chess, or... is the thing that makes it easier. For driving simulations, you'll hit the same problems as with any learning on simulated data: your system will learn to exploit the idiosyncrasies of the simulation and overfit to it. In games like go and chess, there's no such thing as simulated data: AlphaZero learned not by running a simulation but by actually playing the games. Using those is a hack to get around the problems of simulated data. But unless you want to design a self-driving car that can only be used in video games, you can't sidestep those problems in that way for the driving task.
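For what it's worth, the standard mitigation for "the network exploits the simulator's idiosyncrasies" is domain randomization: redraw the simulator's parameters every episode so no single quirk stays stable enough to overfit to. A toy sketch; the parameter names and ranges here are invented, not anyone's actual configuration:

```python
import random

def randomized_sim_params(rng):
    """Draw a fresh set of simulator parameters for each training episode,
    so the policy cannot latch onto any one fixed configuration.
    All names and ranges are illustrative."""
    return {
        "friction":         rng.uniform(0.4, 1.0),  # wet vs dry road
        "sensor_noise_std": rng.uniform(0.0, 0.3),  # camera/radar noise
        "actor_aggression": rng.uniform(0.0, 1.0),  # how pushy other drivers are
        "latency_ms":       rng.uniform(10, 150),   # control-loop delay
    }

rng = random.Random(42)
params = [randomized_sim_params(rng) for _ in range(1000)]
```

This doesn't eliminate the sim-to-real gap, but it forces the policy to be robust across the whole family of simulators rather than one exploitable instance.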
 
Not sure what point you are making here. Incidentally, the quote is correct: "AlphaZero was trained solely via self-play". But it seems to have misled you into thinking there was generated data. The only data was created in recording the games that AlphaZero played against itself. This wasn't used for anything except for humans to examine.

A similar approach to self-driving would be to populate a simulated world with a variety of AI-driven vehicles with very simple rules (like what I mentioned before). They'd start off trying to win the game by driving in a random direction at a random speed, varying both randomly. Lots of failures. How many iterations it would take before they followed roads I have no idea. But computers are really fast, and they don't get bored.
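The "start random, keep whatever works" loop described above is essentially random-search policy optimization. A toy one-dimensional version (stay near the lane centre with a single steering gain); everything here is illustrative, with no claim of physical fidelity:

```python
import random

def episode_return(policy_gain, rng, steps=50):
    """Drive a toy 1-D car: state is lateral offset from the lane centre.
    A proportional controller with the given gain steers back toward centre;
    reward penalizes distance from centre each step."""
    offset, total = 0.0, 0.0
    for _ in range(steps):
        offset += rng.uniform(-0.5, 0.5)   # road curvature / disturbances
        offset -= policy_gain * offset     # the policy's corrective steering
        total -= abs(offset)               # stay near the centre line
    return total

def random_search(iters=200, seed=0):
    """Start from a random policy; keep any random perturbation that scores
    better on the (fixed) evaluation episode. Lots of failures early on,
    but the computer never gets bored."""
    rng = random.Random(seed)
    best_gain = rng.uniform(-2.0, 2.0)
    best_score = episode_return(best_gain, random.Random(seed + 1))
    for _ in range(iters):
        candidate = best_gain + rng.gauss(0.0, 0.3)
        score = episode_return(candidate, random.Random(seed + 1))
        if score > best_score:
            best_gain, best_score = candidate, score
    return best_gain, best_score
```

How many iterations a real system would need before its agents "followed roads" is, as the post says, anyone's guess; the toy only shows the shape of the loop.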

Using those is a hack to get around the problems of simulated data. But unless you want to design a self-driving car that can only be used in video games, you can't sidestep those problems in that way for the driving task.
Even Elon agrees that there's nothing fundamentally different between the real world and a sophisticated video game. So that doesn't seem to me to be all that strong an argument. It's just a matter of complexity. We don't know what's good enough.

Well, my point was just that y'all were making some simplistic assumptions about how data was everything when it came to AI. This isn't really the right thread for detailed discussion, so I'll call it a night.
 
So if you think you can design a simulator that can create random detailed traffic situations and randomly learn success from "playing" those random traffic situations, then I suppose yeah, you've created something comparable to AlphaGo that can learn to drive.

Yeah, realistic simulated driving environments are totally impossible:

[embedded video]

:D

Very big difference between the two. There's a pretty limited set of legal moves at any point in Go - nothing like the range of moves when driving.

Actually, while the state space is obviously much larger, the convergence of a neural net to find "optimal moves" in driving scenarios is possibly much faster, because "driving" is typically not an adversarial zero-sum game with both players spending all their computing resources to destroy the other side while hiding/masking their intentions, but a more or less cooperative strategy where intention is shared and survival is maximized.

Even a very simple "don't run into other objects" FSD strategy can be reasonably successful with 99.9%+ survival rate, while a similarly naive Go strategy of "make a legal Go move that doesn't get your piece taken immediately" has 0% chance of survival even against entry level Go players.
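That naive "don't run into other objects" strategy can be made concrete as a time-to-collision check. A minimal sketch; the threshold and function name are invented for illustration, and this is a caricature, not a real driving policy:

```python
def naive_policy(gap_m, closing_speed_ms, ttc_threshold_s=3.0):
    """The 'don't run into other objects' baseline: brake if the
    time-to-collision with the nearest object ahead drops below a
    threshold, otherwise cruise. Threshold is an arbitrary example."""
    if closing_speed_ms <= 0:            # gap is opening; nothing to do
        return "cruise"
    ttc = gap_m / closing_speed_ms       # seconds until contact at this rate
    return "brake" if ttc < ttc_threshold_s else "cruise"
```

A few lines of arithmetic buy a high survival rate in ordinary traffic, whereas (as the post notes) the equivalent one-move-lookahead heuristic in Go loses to anyone.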

So you are right that the two are not directly comparable, but not in the way you think: in many ways playing Go well is IMO a far more difficult cognitive task than driving a car well.

The difficulty is not in learning speed, or in generating legal moves, but in re-creating a simulated environment that matches what the Autopilot system sees, and which has a fleet-learning feedback function where video capture of disengagement events can be automatically transformed into traffic scenarios in the 3D simulated environment.

(I believe that is what "Project Dojo" is about - but that's just speculation.)
 
not by running a simulation but by actually playing the games

Ok, this is new syntax. In my head the actual game has wood and stone pieces, the board is made of card. The computer plays the game digitally, simulating the real game.
But I hear you. The logic is identical, thus a perfect simulation. If it’s not lossy at all, it can be considered “the actual game” and not a simulation.

But yes, driving data is very different to game data. You can't computer-generate driving data, because situations confound the rules, (some) people are crazy, things break (and brake), weather happens, animals happen, trees fall, pedestrians assume, and roads come in almost infinite variety. Etc.
 
Sometimes. AlphaZero whupped human ass with no data at all. Seemed a lot like AI to me. You can make the assertion that autonomous driving is completely different from Go, but you would be hard pressed to prove it. Sure, different game, different rules, but is it different enough that you can be sure nobody else can come at it from a different direction and win?
Well, you can't prove a negative, but the difference between go and driving is that go has a large but limited number of choices and clearly defined rules which must be adhered to. Driving has a much larger set of cases and rules that are not necessarily adhered to. Vastly different problems.
 
Even Elon agrees that there's nothing fundamentally different between the real world and a sophisticated video game. So that doesn't seem to me to be all that strong an argument.

You're missing my point. There's nothing fundamentally different between real life and a game in and of themselves. The only difference is access to the full rule set. These games have straightforward rule sets that can be programmed in, such that the computer is always running on the actual game.

While we have some ideas about the rule set for the world (comprised of physics, yes, but also biology, physiology, and psychology, in knowing how each individual human and other animal will respond in every scenario), we don't know them nearly well enough to generate a digital real world to use to train for self-driving. The best we can do is an imperfect simulation. And an ML system follows the gradient toward the lowest point on the error surface, meaning it'll find any edges in the simulation that can be exploited and learn those.

Put another way: if you could somehow make a simulator that was an absolutely perfect representation of real life (at least on Earth, and at the macroscopic level), then yes, you could train a system like AlphaZero on it (kinda; you'd probably need a more complex architecture to up the learning capacity) and get a full self-driving AI. Of course, if you had the tech to make such a simulation, you could do far more impressive things than just self-driving cars.

Simulated data can be useful to fill in things that are otherwise impossible to capture, but you always want the lion's share of data to be from the real world so you don't end up overfitting to the simulation.
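The "gradient finds exploitable edges in the simulation" failure mode can be shown with a toy: give the simulator a spurious feature that perfectly predicts the label in sim but is pure noise in reality, and an error-minimizing learner will prefer it every time. All numbers and names below are entirely synthetic:

```python
import random

def make_sim_data(n, rng):
    """Each row: (real_cue, sim_artifact, label). The real cue is a genuine
    but noisy signal; the artifact is a simulator quirk that leaks the label."""
    data = []
    for _ in range(n):
        y = rng.uniform(-1, 1)
        real_cue = y + rng.gauss(0, 0.5)   # noisy, but valid everywhere
        sim_artifact = y                   # perfect... only inside the sim
        data.append((real_cue, sim_artifact, y))
    return data

def mse(data, feature_idx):
    """Mean squared error of predicting the label directly from one feature."""
    return sum((row[feature_idx] - row[2]) ** 2 for row in data) / len(data)

rng = random.Random(0)
sim = make_sim_data(1000, rng)
# An error-minimizing learner trained in sim picks the artifact (zero error).
chosen = min((0, 1), key=lambda i: mse(sim, i))
# In the "real world" the artifact is uncorrelated noise, so that choice fails.
real = [(rc, rng.uniform(-1, 1), y) for rc, _, y in make_sim_data(1000, rng)]
```

In this toy, the artifact's real-world error ends up far worse than the honest noisy cue's, which is the overfitting-to-the-simulation point in miniature.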
 
Simulated data can be useful to fill in things that are otherwise impossible to capture, but you always want the lion's share of data to be from the real world so you don't end up overfitting to the simulation.
I think we're probably agreed that a mix is good. But it seems to me just as true to say "Real data can be useful to fill in things that are otherwise impossible to simulate, but you always want the lion's share of data to be from the simulated world so you don't end up overfitting to what you just happen to observe and collect." What I want is to take an approach and then perturb things with data from outside. I think either way should work.
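Either way round, the mixing strategy boils down to choosing sampling weights over the two data pools. A minimal sketch; the function name and the 80/20 split in the usage example are arbitrary, not anyone's actual recipe:

```python
import random

def mixed_batch(real_pool, sim_pool, batch_size, real_fraction, rng):
    """Sample a training batch drawing `real_fraction` of its examples from
    real-world data and the remainder from simulation, so that neither
    source's quirks dominate what the network can overfit to."""
    n_real = round(batch_size * real_fraction)
    batch = [rng.choice(real_pool) for _ in range(n_real)]
    batch += [rng.choice(sim_pool) for _ in range(batch_size - n_real)]
    rng.shuffle(batch)                    # avoid ordering artifacts
    return batch
```

Whether `real_fraction` should be near 1.0 (real-heavy) or near 0.0 (sim-heavy, perturbed by real data) is exactly the disagreement in this exchange; the mechanism is the same either way.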
 
I remember neroden's point that, unlike chess or go, driving is not "solved". You not only have a state space that is several orders of magnitude bigger, but also: what counts as "winning" in driving? You can't simply say that going from A to B and arriving alive is a win.

I don't remember it right now, but there was a law stating that AI will be very good at abstract, sophisticated jobs, but not good at all at gardening or replacing a waiter. The number of things that happen and can go wrong in go is much smaller than in driving. So I'm not sure how far away an AlphaZero for cars might be, but I'm sure Karpathy and others at Tesla have thought deeply about that.

Leapfrogging complexity like that is like the Holy Grail, and there are probably hard computational limits in play at the moment.
 
I think we're probably agreed that a mix is good. But it seems to me just as true to say "Real data can be useful to fill in things that are otherwise impossible to simulate, but you always want the lion's share of data to be from the simulated world so you don't end up overfitting to what you just happen to observe and collect." What I want is to take an approach and then perturb things with data from outside. I think either way should work.

The difference would be that in one case you'll be overfitting to things that actually, really happen, whereas in the other you're overfitting to things the system will never really see. Both are bad (this is why Karpathy stresses not just using all the data they get, but carefully selecting out data they care about), but the latter is far worse. The former will produce a network able to handle most driving but that falls on its face when presented with edge cases. The latter will produce a network unable to handle much of anything in the real world, always trying to exploit the parameters of the simulated world it knows.

Think of it this way: if you’re learning to walk, would it be better to learn by walking the same 2 blocks in a city over and over again, or by walking across an entire planet where the surface is that of a bouncy house?
 
this is why Karpathy stresses not just using all the data they get, but carefully selecting out data they care about
Yeah. I think Karpathy is busy building AlphaDriving. It will work eventually, after lots of driving-specific effort. Then they'll take their learning from that and build an AlphaZero-type solution that's far, far better. My point, lest it's entirely lost, is that I don't see why others can't make an AlphaZero-type effort from another direction without ever building AlphaDriving. The claim that others are doomed because they don't have the data is not at all certain.