Super helpful video that's intended for people who aren't too familiar with StarCraft (or machine learning):
Next video in the series (full playlist here):
AlphaStar is my favourite example of an achievement in AI. It shows that you can get to significantly better than average human performance on a difficult, complex task with a long time horizon and multi-step sequences of actions. Moreover, it shows that you can do this simply with a large dataset of human behaviour to imitate (i.e. with imitation learning). This side-steps problems in reinforcement learning like:
- sparse rewards (e.g. victory or defeat in StarCraft occurs only every ~10-20 minutes)
- credit assignment (i.e. which actions had what effect on the reward?)
- reward hacking (i.e. the AI agent gives you exactly what you asked for, which turns out to be not what you wanted)
- enormous action spaces (e.g. DeepMind estimates ~10^26 possible actions per time step in StarCraft), which make finding the right sequences by trial and error intractable
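At its core, the imitation-learning stage is just supervised learning on (state, action) pairs harvested from human replays: no rewards, no credit assignment, no trial-and-error search. Here's a toy behavioural-cloning sketch; the states, actions, and linear policy are illustrative stand-ins, nothing like AlphaStar's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset of (state, expert_action) pairs standing in for human replays.
# States are 4-dim feature vectors; the "expert" picks action 1 when the
# first feature is negative, else action 0.
states = rng.normal(size=(500, 4))
actions = (states[:, 0] < 0).astype(int)

# Linear policy trained with softmax cross-entropy: pure supervised
# learning on what the demonstrator did, never on a reward.
W = np.zeros((4, 2))
for _ in range(300):
    logits = states @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(2)[actions]
    grad = states.T @ (probs - onehot) / len(states)
    W -= 0.5 * grad

# Fraction of time steps where the cloned policy matches the expert.
accuracy = (np.argmax(states @ W, axis=1) == actions).mean()
```

The cloned policy ends up agreeing with the demonstrator on almost every state, despite never seeing a reward signal.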
Here's a visualization showing how good AlphaStar got using various training techniques, including pure imitation learning:
AlphaStar makes me feel optimistic that the planning or behaviour generation component of autonomous driving can be solved with a big enough dataset of human driving behaviour. Imitation learning should be able to get us most of the way there. My hunch is that, if anything works, it will be a combination of imitation learning and explicit code. (Cool technical presentation on combining the two here.)
Fine-tuning with reinforcement learning may be helpful or perhaps even necessary. One way to get a reward signal would be human interventions. This approach is susceptible to reward hacking if you start from scratch, but maybe it wouldn't be if you did imitation learning first. Perhaps training via reinforcement learning in simulation would be possible using replays of real-world situations, but this approach would face the problem of accurately simulating how humans would react to the autonomous car's actions.
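To make the intervention idea concrete, here's a toy sketch of turning a log of human takeovers into a per-step reward signal. The window size and penalty value are arbitrary illustrative choices, not from any deployed system:

```python
def intervention_rewards(num_steps, interventions, window=5, penalty=-1.0):
    """Assign a reward to each time step from a log of human takeovers.

    Steps within `window` steps before an intervention (including the
    intervention step itself) get `penalty`; all other steps get 0.
    The idea: the driving just before a takeover is what the policy
    should learn to avoid, and everything else is implicitly fine.
    """
    rewards = [0.0] * num_steps
    for t in interventions:
        for s in range(max(0, t - window), min(t + 1, num_steps)):
            rewards[s] = penalty
    return rewards
```

For example, with a takeover at step 7 and a window of 2, steps 5 through 7 would be penalized and the rest left neutral.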
An alternative (or maybe supplementary) approach would be reward learning, specifically reward learning from demonstrations. This is part of an approach called inverse reinforcement learning. First, you learn a reward function from the behaviour of an “expert” demonstrator, such as a human, assuming the demonstrations represent optimal, reward-maximizing behaviour. Second, you do reinforcement learning with the learned reward function.
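A minimal sketch of that two-step structure, with a deliberately simplified linear reward over state features (real inverse-RL algorithms, like maximum-entropy IRL, are far more careful about step one):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: learn a reward from demonstrations. With a linear reward
# r(s) = w . s, one crude simplification is to point w toward the
# demonstrator's average feature vector, so states resembling the
# demonstrations score highly.
demo_states = rng.normal(loc=[1.0, 0.0], size=(200, 2))
w = demo_states.mean(axis=0)
w /= np.linalg.norm(w)

def learned_reward(state):
    return float(w @ state)

# Step 2: optimize behaviour against the learned reward instead of a
# hand-written one (here reduced to a trivial one-step greedy choice
# where full reinforcement learning would go).
candidates = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
best = candidates[np.argmax([learned_reward(c) for c in candidates])]
```

The greedy choice picks the candidate state that most resembles the demonstrations, which is exactly the failure mode discussed next: the learned reward can't tell you how to do *better* than the demonstrator.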
A potential problem with inverse reinforcement learning is that, if you assume demonstrations are optimal, your agent may not be able to do better than your demonstrations. A pair of awesome papers attempt to solve this problem, at least for some tasks, with an approach called T-REX and a follow-up called D-REX. The researchers figured out how to get an agent to extrapolate beyond the best demonstrations it has seen by ranking the demonstrations by quality. Maybe D-REX, or something like it, could be applied to the autonomous vehicle problem. That's a complex technical question I'm not equipped to answer, but I find it a fascinating idea.
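The core trick in T-REX is a Bradley-Terry-style ranking loss: given pairs of trajectories where one is known to be better, train a reward function to assign the better trajectory a higher total reward. A toy sketch with synthetic trajectories and a linear reward (the actual papers train neural networks on pixel observations):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic ranked trajectories: quality q in [0, 1]; states from a
# better trajectory have a larger first feature on average.
def make_traj(q, length=20):
    return rng.normal(loc=[q, 0.0], size=(length, 2))

qualities = rng.uniform(size=40)
trajs = [make_traj(q) for q in qualities]

# Linear reward r(s) = w . s trained with the Bradley-Terry ranking
# loss: for a pair (worse i, better j), maximise log P(j > i), where
# P(j > i) = sigmoid(R_j - R_i) and R is the summed reward.
w = np.zeros(2)
for _ in range(2000):
    i, j = rng.integers(len(trajs), size=2)
    if qualities[i] == qualities[j]:
        continue
    if qualities[i] > qualities[j]:
        i, j = j, i  # ensure j is the better trajectory
    Si, Sj = trajs[i].sum(axis=0), trajs[j].sum(axis=0)
    p_j = 1.0 / (1.0 + np.exp((Si - Sj) @ w))  # P(j ranked above i)
    w += 0.01 * (1.0 - p_j) * (Sj - Si)        # gradient ascent step

# The learned reward should score high-quality behaviour above poor
# behaviour, averaged over fresh trajectories it never trained on.
good = np.mean([make_traj(0.9).sum(axis=0) @ w for _ in range(20)])
bad = np.mean([make_traj(0.1).sum(axis=0) @ w for _ in range(20)])
```

Because the reward is trained only to respect rankings, not to match the demonstrator, it can assign even higher reward to behaviour better than the best demonstration, which is what lets the agent extrapolate.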
The success of AlphaStar (and similar successes like OpenAI Five) has made me feel fairly relaxed about the planning/behaviour generation part of the problem. Personally, I feel a lot more worried about computer vision. As Elon put it:
“The hardest thing is having accurate representation of the physical objects in vector space. So, taking the visual input, primarily visual input, some sonar and radar and then creating an accurate vector space representation of the objects around you. Once you have an accurate vector space representation, the planning and control is relatively easier. That is relatively easy.
Basically, once you have accurate vector space representation, then you're kind of like a video game, like cars in Grand Theft Auto or something.”
Once driving is a video game, i.e. once computer vision is “solved” and the world state is known with a high degree of confidence, the problem feels a lot more tractable to me. The same techniques that made AlphaStar work can be used: namely, imitation learning and (possibly) fine-tuning with reinforcement learning. If reinforcement learning is used, ideas like D-REX, or humans providing the reward signal via interventions, could substitute for StarCraft's built-in victory and defeat conditions.
But since there is no proof of concept comparable to AlphaStar for computer vision, I worry that superhuman computer vision for autonomous vehicles may not be tractable with only incremental advances on current technology. I'm not arguing this is actually the case; I just don't have strong evidence with which to rule out this possibility.
On the computer vision front, I'm hopeful (but not necessarily super confident) that scaling up techniques like automatic curation, active learning, weakly supervised learning, sensor-supervised learning, and self-supervised learning will push the frontier all the way to superhuman vision. We all want self-driving cars and, to me, this looks like the best bet for solving computer vision right now.