Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
Chapters
0:00
3:13 Describe Starcraft
13:49 Parameters of the Challenge
27:41 Observing the Game
38:06 Cloaked Units
45:16 Protoss Race
67:05 The Turing Test
79:54 Sequence To Sequence Learning
84:10 Difference between Starcraft and Go
85:38 Developing New Ideas
91:43 Meta Learning
95:47 The Existential Threat of Artificial Intelligence in the Near or Far Future
103:34 Next for DeepMind
00:00:00.000 |
The following is a conversation with Oriol Vinyals. 00:00:03.280 |
He's a senior research scientist at Google DeepMind, 00:00:05.920 |
and before that, he was at Google Brain and Berkeley. 00:00:09.120 |
His research has been cited over 39,000 times. 00:00:13.280 |
He's truly one of the most brilliant and impactful minds 00:00:18.200 |
He's behind some of the biggest papers and ideas in AI, 00:00:29.640 |
He's a lead researcher of the AlphaStar Project, 00:00:32.840 |
creating an agent that defeated a top professional 00:00:41.800 |
If you enjoy it, subscribe on YouTube, iTunes, 00:00:44.920 |
or simply connect with me on Twitter @lexfridman, 00:00:51.240 |
And now, here's my conversation with Oriol Vinyals. 00:00:55.440 |
You spearheaded the DeepMind team behind AlphaStar 00:00:59.600 |
that recently beat a top professional player at StarCraft. 00:01:18.840 |
What came first for you, a love for programming 00:01:35.240 |
I didn't really code much, but what I would do 00:01:38.560 |
is I would just mess with the computer, break it and fix it. 00:01:50.920 |
especially StarCraft, actually, the first version. 00:01:57.040 |
as professionally as you could play back in '98 in Europe, 00:02:17.960 |
- So, as a player, I tended to try not to play too many games, 00:02:30.000 |
I think in StarCraft, there's three main races, 00:02:33.360 |
and I found it very useful to play with all of them. 00:02:42.320 |
because it's not only how you play against someone, 00:02:45.400 |
but also whether you understand the race because you play it, 00:02:54.160 |
to try to gain advantages here and there, and so on. 00:02:59.080 |
although I must say, in terms of favorite race, 00:03:08.320 |
towards the end of my career, before starting university. 00:03:15.600 |
to people that may never have played video games, 00:03:18.880 |
especially the massively online variety like StarCraft? 00:03:22.280 |
- So, StarCraft is a real-time strategy game. 00:03:30.920 |
is that there's a board, which is called a map, 00:03:34.160 |
or, again, the map where people play against each other. 00:03:40.920 |
but the most interesting one is the one-versus-one setup, 00:03:54.400 |
And then, in this board, you have, again, pieces, 00:03:57.720 |
like in chess, but these pieces are not there initially, 00:04:02.240 |
You actually need to decide to gather resources, 00:04:07.800 |
So, in a way, you're starting almost with no pieces. 00:04:12.640 |
In StarCraft, there's minerals and gas that you can gather, 00:04:16.120 |
and then you must decide how much do you want to focus, 00:04:27.120 |
or maybe like attack, a good attack composition, 00:04:32.040 |
then you go and attack the other side of the map. 00:04:35.400 |
And now, the other main difference with chess 00:04:37.720 |
is that you don't see the other side of the map. 00:04:39.840 |
So, you're not seeing the moves of the enemy. 00:04:48.600 |
trading off economy versus building your own units, 00:04:52.200 |
but you also must decide whether you want to scout 00:05:05.880 |
There's also, unlike chess, this is not a turn-based game. 00:05:10.000 |
You play basically all the time, continuously, 00:05:13.560 |
and thus, some skill in terms of speed and accuracy 00:05:18.800 |
And people that train for this really play this game 00:05:31.280 |
where you don't see the other side of the board. 00:05:37.080 |
to basically get some money to build other buildings, 00:05:47.040 |
or maybe that and a game like turn-based strategy, 00:05:55.000 |
'cause you have to make these decisions really quickly. 00:05:58.640 |
And if you are not actually aware of what decisions work, 00:06:06.320 |
Everything you describe is actually quite stressful, 00:06:08.720 |
difficult to balance for an amateur human player. 00:06:11.560 |
I don't know if it gets easier at the professional level. 00:06:14.000 |
Like, if they're fully aware of what they have to do, 00:06:16.320 |
but at the amateur level, there's this anxiety, 00:06:28.360 |
is really stressful, and computation, I'm sure, difficult. 00:06:35.840 |
so StarCraft was released in '98, 20 years ago, 00:06:43.880 |
And Blizzard's Battle.net came out with Diablo in '96. 00:06:52.600 |
but it changed online gaming, and perhaps society forever. 00:07:12.760 |
- Right, so I think I kind of was an active gamer 00:07:16.920 |
whilst this was developing, the internet, online gaming. 00:07:28.400 |
and then I played Warcraft II, which is from Blizzard. 00:07:33.040 |
I didn't understand about what Blizzard was or anything. 00:07:37.320 |
which was actually very similar to StarCraft in many ways. 00:07:42.480 |
where there's orcs and humans, so there's only two races. 00:07:48.000 |
So I remember a friend of mine came to school, 00:07:51.600 |
say, "Oh, there's this new cool game called StarCraft." 00:07:54.800 |
"Oh, this sounds like just a copy of Warcraft II," 00:08:09.520 |
where you kind of start to play these missions, right? 00:08:12.920 |
You play against some sort of scripted things 00:08:15.720 |
to develop the story of the characters in the game. 00:08:18.960 |
And then later on, I started playing against the built-in AI, 00:08:23.480 |
and I thought it was impossible to defeat it. 00:08:27.440 |
and you can actually play against seven built-in AIs 00:08:29.720 |
at the same time, which also felt impossible, 00:08:49.920 |
because you could just connect machines with cables, right? 00:08:55.640 |
as a group of friends, and it was really, really, 00:08:59.280 |
like much more entertaining than playing against AIs. 00:09:02.320 |
And later on, as the internet was starting to develop 00:09:07.400 |
then it's when I started experiencing Battle.net, 00:09:11.560 |
not only because of the fact that you can play the game 00:09:20.200 |
You just get exposed to now like this vast variety of, 00:09:23.080 |
it's kind of a bit when the chats came about, right? 00:09:25.840 |
There was a chat system, you could play against people, 00:09:30.720 |
not only about StarCraft, but about anything. 00:09:32.480 |
And that became a way of life for kind of two years, 00:09:38.840 |
it exploded in me that I started to play more seriously, 00:09:44.640 |
- Do you have a sense on a societal sociological level, 00:09:53.760 |
And it's a huge part of society, which is gamers. 00:09:56.840 |
I mean, every time I come across that on YouTube 00:10:03.160 |
there's this huge number of people who play games religiously. 00:10:08.880 |
especially now that you've returned to that realm 00:10:17.560 |
which is maybe the main sort of online worlds 00:10:21.360 |
and presence that you get to interact with lots of people. 00:10:26.320 |
It was, to me, it was a bit less stressful than StarCraft 00:10:30.840 |
You just put in this world and you can always 00:10:34.920 |
But I think it was actually the social aspect of, 00:10:46.880 |
Because what you get to experience is just people 00:10:51.600 |
So even nowadays, I still have many Facebook friends 00:10:56.880 |
and their ways of thinking is even political. 00:11:06.720 |
And that way I actually get to understand a bit better 00:11:12.760 |
And these were just connections that were made by, 00:11:20.640 |
and I met this warrior and we became friends, 00:11:28.720 |
and more and more and more people are more aware of it. 00:11:33.440 |
but back in the day, as you were saying in 2000, 2005, 00:11:37.560 |
even it was very, still very strange thing to do, 00:11:44.200 |
I think there were exceptions like Korea, for instance, 00:11:47.120 |
it was amazing that everything happened so early 00:11:58.360 |
you could be a celebrity by playing StarCraft, 00:12:04.120 |
So yeah, it's quite interesting to look back. 00:12:13.080 |
and social networks and so on are also transforming things. 00:12:18.400 |
you're also one of the most productive people 00:12:20.960 |
in your particular chosen passion and path in life. 00:12:26.400 |
And yet you also appreciate and enjoy video games. 00:12:35.760 |
- Someone told me that you could choose two out of three. 00:12:46.200 |
And I think for the most part, it was relatively true. 00:12:52.320 |
Games like StarCraft, if you take the game pretty seriously 00:12:56.480 |
then you obviously will dedicate more time to it. 00:13:08.640 |
So to me, especially when I started university undergrad, 00:13:16.760 |
And then World of Warcraft was a bit more casual. 00:13:19.000 |
You could just connect online and I mean, it was fun. 00:13:22.840 |
But as I said, that was not as much time investment 00:13:37.200 |
and released a bunch of cool open source agents 00:13:43.720 |
the first time you beat a world class player. 00:13:53.440 |
And how did you and David and the rest of the DeepMind team 00:13:58.240 |
consider that you could even beat the best in the world 00:14:15.640 |
which was in California, and is still in California. 00:14:18.880 |
We had this summit where we got together, the two groups. 00:14:21.800 |
So Google Brain and Google DeepMind got together 00:14:35.040 |
like this thing which we call Berkeley Overmind, 00:14:37.360 |
which is really just a StarCraft one bot, right? 00:14:42.120 |
And I remember Demis just came to me and said, 00:14:44.200 |
"Well, maybe not now, it's perhaps a bit too early, 00:14:48.880 |
and do this again with deep reinforcement learning." 00:14:53.680 |
And at the time it sounded very science fiction 00:14:58.720 |
But then in 2016, when I actually moved to London 00:15:01.480 |
and joined DeepMind, transferring from Brain, 00:15:04.720 |
it became apparent that because of the AlphaGo moment 00:15:08.160 |
and kind of Blizzard reaching out to us to say, 00:15:25.320 |
how would it all work before you do anything. 00:15:33.600 |
So in Berkeley, we did a lot of rule-based conditioning 00:15:38.600 |
and if you have more than three units, then go attack. 00:15:42.480 |
And if the other has more units than me, I retreat, 00:15:46.320 |
And of course, the point of deep reinforcement learning, 00:15:50.440 |
is that all these should be learned behavior. 00:15:59.400 |
where we just didn't even have an environment to work with. 00:16:05.800 |
- So if you go back to that conversation with Demis, 00:16:08.520 |
or even in your own head, how far away did you, 00:16:23.320 |
but it seems like StarCraft is way harder than Go, 00:16:44.880 |
and obviously I do a lot of different research 00:16:52.240 |
there's gonna be something good happening out of this. 00:16:55.880 |
So really I thought, well, this sounds impossible, 00:16:59.120 |
and it probably is impossible to do the full thing, 00:17:06.240 |
and it's only a neural network playing and so on. 00:17:14.840 |
I could see some stepping stones like towards that goal. 00:17:19.080 |
Clearly you could define sub-problems in StarCraft 00:17:23.400 |
okay, here is a part of the game, here's another part. 00:17:31.280 |
the fact that we could access human replays, right? 00:17:35.720 |
and in fact, they open sourced this for the whole community 00:17:39.960 |
and it's not every single StarCraft game ever played, 00:17:43.040 |
but it's a lot of them, you can just go and download. 00:17:45.880 |
And every day they will, you can just query a dataset 00:17:48.720 |
and say, well, give me all the games that were played today. 00:17:51.680 |
And given my kind of experience with language and sequences 00:18:02.440 |
because never before had we had such a large dataset of replays 00:18:24.240 |
But I also thought the full thing, like really no rules, 00:18:28.360 |
no single line of code that tries to say, well, I mean, 00:18:31.680 |
if you see this, you need to build a detector, 00:18:33.320 |
all these, not having any of these specializations 00:18:36.680 |
seemed really, really, really difficult to me. 00:18:47.680 |
pulling you in into this really difficult challenge. 00:18:51.840 |
What's the interest from the perspective of Blizzard, 00:18:57.280 |
- Yeah, I think Blizzard has really understood 00:18:59.440 |
and really bring forward this competitiveness 00:19:21.840 |
that can play Atari or Go and then later on chess 00:19:30.560 |
So for them, they wanted to see first, obviously, 00:19:33.840 |
whether it was possible, if the game they created 00:19:42.120 |
they also are a pretty modern company that innovates a lot. 00:19:48.480 |
to how to bring AI into games is not AI for games, 00:19:56.080 |
And we obviously at DeepMind use games for AI, right? 00:20:00.000 |
To drive AI progress, but Blizzard might actually be able 00:20:03.400 |
to do and many other companies to start to understand 00:20:09.760 |
And they definitely, we have brainstormed a lot about this. 00:20:13.680 |
- But one of the interesting things to me about StarCraft 00:20:16.040 |
and Diablo and these games that Blizzard has created 00:20:19.360 |
is the task of balancing classes, for example, 00:20:23.520 |
sort of making the game fair from the starting point 00:20:33.560 |
there's three races, Zerg, Protoss and Terran. 00:20:36.760 |
I don't know if I've ever said that out loud. 00:20:44.120 |
- Yeah, I don't think I've ever in-person interacted 00:20:51.760 |
I wonder if the AI, the work that you're doing 00:20:56.240 |
with AlphaStar would help balance them even further. 00:21:00.520 |
Is that something that Blizzard is thinking about? 00:21:03.320 |
- Right, so balancing when you add a new unit 00:21:09.120 |
given that you can always train or pre-train at scale 00:21:13.160 |
some agent that might start using that in unintended ways. 00:21:16.680 |
But I think actually, if you understand how StarCraft 00:21:24.320 |
the ways that many of the things and strategies 00:21:28.680 |
So I think we've seen it over and over in StarCraft 00:21:43.560 |
And then after that becomes kind of mainstream 00:21:48.280 |
and then they kind of maybe weaken that strategy 00:21:55.400 |
So this kind of continual dialogue between players and Blizzard 00:22:04.040 |
but also in World of Warcraft, they would do that. 00:22:06.440 |
There are several classes and it would be not good 00:22:09.280 |
that everyone plays absolutely the same race and so on. 00:22:13.200 |
So I think they do care about balancing, of course, 00:22:24.480 |
And I mean, whether AI can be more creative at this point, 00:22:28.680 |
I mean, it's just sometimes something so amazing happens. 00:22:33.680 |
like you have these drop ships that could drop the reavers 00:22:33.680 |
No one thought that you could actually put them 00:22:55.400 |
But I don't know, I think it's quite an amazing 00:23:01.840 |
- Well, it's almost like a reinforcement learning 00:23:07.000 |
that play Blizzard games is almost on the scale 00:23:19.520 |
but hundreds of thousands of games probably a month. 00:23:22.880 |
- So you could, it's almost the same as running RL agents. 00:23:32.120 |
Is it the, like you said, the imperfect information? 00:23:35.360 |
Is it the fact they have to do long-term planning? 00:23:47.600 |
Or is it, you know, in the game theoretic sense, 00:23:52.400 |
at least you don't know what the optimal strategy is 00:23:58.160 |
as just like the hardest, the most annoying thing? 00:24:04.160 |
and start to define like the parameters of it, right? 00:24:14.000 |
the very first barrier that one would hit in StarCraft 00:24:17.120 |
would be because of the action space being so large 00:24:20.680 |
and us not being able to search like you could in Chess 00:24:32.400 |
So without any sort of human knowledge or human prior, 00:24:38.000 |
and you know how deep reinforcement learning algorithm work, 00:24:42.000 |
which is essentially by issuing random actions 00:24:45.360 |
and hoping that they will get some wins sometimes 00:24:49.200 |
So if you think of the action space in StarCraft, 00:24:52.800 |
almost anything you can do in the early game is bad 00:25:01.360 |
That's something that the game does automatically, 00:25:04.920 |
And you would immediately just take them out of mining 00:25:21.080 |
in other locations in the map to gather more resources, 00:25:24.160 |
but the location of the building is important. 00:25:31.760 |
build the building, wait for the building to be built, 00:25:37.800 |
That feels like impossible if you just randomly click 00:25:46.960 |
because eventually that may yield an extra win, right? 00:25:56.120 |
there's so many turns because the game essentially 00:26:02.080 |
I mean, that's how they can discretize sort of time. 00:26:05.520 |
Obviously, you always have to discretize time. 00:26:14.240 |
And that definitely felt a priori like the hardest. 00:26:34.240 |
and solving it has been basically kind of the focus 00:26:39.760 |
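A back-of-the-envelope illustration of why undirected exploration is hopeless here; the action count A and the macro length k below are invented for illustration, and only the roughly 22 frames per second and 10 minutes per game figures come from the conversation.

```python
# Rough, illustrative arithmetic (not AlphaStar's numbers): if a macro like
# "select a worker, move it, place a building, wait" takes k correct choices
# in a row, and each step offers A plausible actions, a uniformly random
# policy stumbles onto it with probability (1/A)**k per attempt.
A = 100      # hypothetical number of available actions at each decision point
k = 4        # hypothetical length of the build-a-building macro
p = (1.0 / A) ** k
print(f"P(random policy executes the macro) ~= {p:.0e}")   # ~1e-08

# Over a 10-minute game observed at ~22 frames per second (numbers from the
# conversation), that is on the order of 22 * 60 * 10 decision steps:
steps = 22 * 60 * 10
print(f"decision steps per game ~= {steps}")                # 13200
```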
- So exploration in a multi-hierarchical way. 00:27:05.400 |
How do you then do the long-term sequence modeling? 00:27:12.560 |
- So AlphaStar has obviously several components, 00:27:16.840 |
but everything passes through what we call the policy, 00:27:24.280 |
There is, I could just now give you a neural network 00:27:27.160 |
and some weights, and if you fed the right observations 00:27:30.440 |
and you understood the actions the same way we do, 00:27:32.520 |
you would have basically the agent playing the game. 00:27:43.360 |
and we've experimented with a few alternatives. 00:27:46.640 |
The one that we currently use mixes both spatial 00:27:50.280 |
sort of images that you would process from the game, 00:28:00.880 |
But also we give to the agent the list of units 00:28:14.760 |
And we have versions of the game that play well 00:28:16.840 |
without this set vision that is a bit not like 00:28:23.640 |
because it's a very natural way to encode the game 00:28:26.560 |
is by just looking at all the units that there are, 00:28:29.360 |
they have properties like health, position, type of unit, 00:28:47.400 |
- But that's pretty close to the way humans see the game. 00:28:51.480 |
you're saying the exactness of it is not similar to humans? 00:28:55.040 |
- The exactness of it is perhaps not the problem. 00:29:02.280 |
is that they play with a mouse and a keyboard and a screen, 00:29:05.720 |
and they don't see sort of a structured object 00:29:09.560 |
what they see is what they see on the screen, right? 00:29:13.040 |
- Remember that there's a, sorry to interrupt, 00:29:14.360 |
there's a plot that you showed with camera base 00:29:23.520 |
we're kind of experimenting with what's necessary or not, 00:29:28.720 |
So actually, if you look at research in computer vision, 00:29:32.360 |
where it makes a lot of sense to treat images 00:29:38.160 |
there's actually a very nice paper from Facebook, 00:29:54.320 |
and scramble the image as if it was just a list of pixels. 00:29:59.160 |
Crucially, they encode the position of the pixels 00:30:09.840 |
which is a very popular paper from last year, 00:30:12.000 |
which yielded very nice result in machine translation. 00:30:22.560 |
then you could argue that the list of units that we see 00:30:26.960 |
because we have each unit as a kind of pixel, if you will, 00:30:38.720 |
to work very well on Pascal and ImageNet and so on. 00:30:41.400 |
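A minimal sketch of the "units as a set" encoding described above, with a small transformer attending over one feature vector per unit. This is illustrative only, not the actual AlphaStar architecture; the feature list, sizes, and names are made up.

```python
# Encode the game state as a variable-length list of units, each with a few
# scalar features (health, x, y, ...), and let a transformer attend over it.
import torch
import torch.nn as nn

class UnitSetEncoder(nn.Module):
    def __init__(self, n_unit_types=100, feat_dim=4, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.type_emb = nn.Embedding(n_unit_types, d_model)
        self.feat_proj = nn.Linear(feat_dim, d_model)  # health, x, y, shields, ...
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, unit_types, unit_feats):
        # unit_types: (batch, n_units) ints; unit_feats: (batch, n_units, feat_dim)
        x = self.type_emb(unit_types) + self.feat_proj(unit_feats)
        return self.encoder(x)  # (batch, n_units, d_model), one vector per unit

# Toy usage: a "game state" with 20 units.
enc = UnitSetEncoder()
types = torch.randint(0, 100, (1, 20))
feats = torch.randn(1, 20, 4)
print(enc(types, feats).shape)  # torch.Size([1, 20, 64])
```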
- So the interesting thing here is putting it in that way, 00:30:55.520 |
it seems like there's echoes of a lot of the way 00:31:08.200 |
- Exactly, so now that we understand what an observation 00:31:11.200 |
for a given time step is, we need to move on to say, 00:31:14.680 |
well, there's gonna be a sequence of such observations, 00:31:17.760 |
and an agent will need to, given all that it's seen, 00:31:21.120 |
not only the current time step, but all that it's seen, 00:31:33.640 |
So given that, what you must then think about 00:31:37.840 |
is there is the problem of given all the observations, 00:31:49.360 |
And that sounds exactly like machine translation, 00:31:52.480 |
where, and that's exactly how kind of I saw the problem, 00:31:57.160 |
especially when you are given supervised data 00:32:06.680 |
of observations and actions onto what's gonna happen next, 00:32:10.160 |
which is exactly how you would train a model, 00:32:11.960 |
to translate or to generate language as well, right? 00:32:16.640 |
you must remember everything that comes in the past 00:32:19.000 |
because otherwise you might start having non-coherent text. 00:32:25.080 |
we're using LSTMs and transformers to operate on, 00:32:36.880 |
are exactly the same as what the agent is using 00:32:42.360 |
And the way we train it, moreover, for imitation, 00:32:47.120 |
take all the human experience and try to imitate it, 00:32:57.280 |
that sort of principle applies exactly the same. 00:33:00.200 |
It's almost the same code, except that instead of words, 00:33:04.520 |
you have slightly more complicated objects. 00:33:16.520 |
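A toy version of that "imitation is just sequence modeling" framing: an LSTM core reads the observation history and is trained with cross-entropy to predict the next human action, exactly as a language model predicts the next word. All dimensions and names here are invented; it is a sketch, not DeepMind's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImitationPolicy(nn.Module):
    def __init__(self, obs_dim=32, n_actions=50, hidden=128):
        super().__init__()
        self.core = nn.LSTM(obs_dim, hidden, batch_first=True)  # memory over the game
        self.action_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):
        h, _ = self.core(obs_seq)          # (batch, time, hidden)
        return self.action_head(h)         # logits over actions at every step

policy = ImitationPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Pretend replay: 100 timesteps of observations and the human's actions.
obs = torch.randn(1, 100, 32)
human_actions = torch.randint(0, 50, (1, 100))

logits = policy(obs)
loss = F.cross_entropy(logits.reshape(-1, 50), human_actions.reshape(-1))
loss.backward()
opt.step()
```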
- Right, so indeed you can bootstrap from human replays, 00:33:21.520 |
but then the agents you get are actually not as good 00:33:30.480 |
Well, we take humans from 3000 MMR and higher. 00:33:38.000 |
and 3000 MMR might be like the 50th percentile, right? 00:33:45.480 |
MMR is a ranking scale, the matchmaking rating for players. 00:33:50.360 |
So it's 3000, I remember there's like a master 00:33:58.520 |
- It just sounds really good relative to chess, I think. 00:34:05.400 |
- So 3000, it's a bit like Elo indeed, right? 00:34:07.920 |
So 3,500 just allows us to not filter a lot of the data. 00:34:12.920 |
So we like to have a lot of data in deep learning 00:34:27.600 |
So we say, this replay you're gonna try to imitate 00:34:30.840 |
to predict the next action for all the actions 00:34:38.840 |
And what's cool about this is then we take this policy 00:34:44.320 |
and then we can ask it to play like a 3000 MMR player, 00:34:49.600 |
or play like a 6,000 MMR player. 00:34:53.680 |
And you actually see how the policy behaves differently. 00:34:57.280 |
It gets worse economy if you play like a gold level player, 00:35:02.960 |
which is the number of clicks or number of actions 00:35:07.760 |
And it's very interesting to see that it kind of imitates 00:35:12.360 |
But if we ask it to play like a 6,000 MMR player, 00:35:15.480 |
we tested of course these policies to see how well they do. 00:35:22.400 |
but they're nowhere near 6,000 MMR players, right? 00:35:24.960 |
They might be maybe around gold level, platinum perhaps. 00:35:29.240 |
So there's still a lot of work to be done for the policy 00:35:34.960 |
So far, we only asked them, okay, here is the screen, 00:35:38.200 |
and that's what's happened on the game until this point. 00:35:41.600 |
What would the next action be if we ask a pro to now say, 00:35:46.080 |
oh, you're gonna click here or here or there. 00:35:49.080 |
And the point is experiencing wins and losses 00:36:00.440 |
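One simple way the skill conditioning just described could be wired in, as a hedged sketch rather than the actual implementation: append the (normalized) MMR of the replay's player to every observation during imitation, then dial that input up at test time to ask for stronger play.

```python
import torch

def condition_on_mmr(obs_seq, mmr, mmr_scale=7000.0):
    # obs_seq: (batch, time, obs_dim); mmr: rating of the player in this replay
    batch, time, _ = obs_seq.shape
    mmr_feat = torch.full((batch, time, 1), mmr / mmr_scale)
    return torch.cat([obs_seq, mmr_feat], dim=-1)

obs = torch.randn(1, 100, 32)
train_input = condition_on_mmr(obs, mmr=3500.0)   # imitate a 3500 MMR replay
test_input = condition_on_mmr(obs, mmr=6000.0)    # ask for 6000 MMR behaviour
print(train_input.shape)  # torch.Size([1, 100, 33])
```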
- That's so interesting that you can at least hope 00:36:19.280 |
- Well, I haven't played StarCraft II, so I am unranked, 00:36:26.280 |
- So I used to play StarCraft I, the first one. 00:36:29.640 |
- But you haven't seriously played StarCraft II? 00:36:32.720 |
So the best player we have at DeepMind is about 5,000 MMR, 00:36:42.120 |
Grand Master level would be the top 200 players 00:36:44.720 |
in a certain region like Europe or America or Asia. 00:36:53.760 |
I actually played AlphaStar a bit too late and it beat me. 00:36:56.680 |
I remember the whole team was, "Oh, Oriol, you should play." 00:36:59.760 |
And I was, "Oh, it looks like it's not so good yet." 00:37:23.160 |
The problem is I just don't understand StarCraft II. 00:37:36.520 |
we didn't have this kind of MMR system as well established. 00:37:40.360 |
So it would be hard to know what it was back then. 00:37:56.040 |
there's a few things that are just very hard to simulate. 00:37:59.760 |
The main one, perhaps, which is obvious in hindsight, 00:38:05.240 |
is what's called cloaked units, which are invisible units. 00:38:13.240 |
that you need to have a particular kind of unit to detect it. 00:38:20.560 |
If you cannot detect them, you cannot target them. 00:38:27.720 |
But despite the fact you cannot target the unit, 00:38:31.640 |
there's a shimmer that, as a human, you observe. 00:38:49.120 |
- That's really, the Blizzard term is shimmer. 00:39:02.680 |
and it's kind of a bit annoying to deal with. 00:39:11.080 |
oh, are you looking at this pixel in the screen and so on? 00:39:29.280 |
that it just doesn't feel there's a very proper way. 00:39:32.640 |
I mean, you could imagine, oh, you don't have high-precision, 00:39:36.920 |
or sometimes you see it, sometimes you don't. 00:39:47.240 |
- You know, it seems like a perception problem. 00:39:56.720 |
I would say they wouldn't be able to tell a shimmer 00:40:02.200 |
Whereas AlphaStar, in principle, sees it very sharply. 00:40:05.600 |
It sees that the bit turned from zero to one, 00:40:11.920 |
or you know that you cannot attack it and so on. 00:40:22.920 |
Then there are things humans cannot do perfectly, 00:40:25.120 |
even professionals, which is they might miss a detail, 00:40:32.200 |
if there's a corner of the screen that turns green 00:40:37.640 |
that can go into the memory of the agent, the LSTM, 00:40:47.640 |
it seems like the rate of action from AlphaStar 00:40:50.640 |
is comparable to, if not slower than, professional players, 00:40:59.680 |
that is causing us more issues for a couple of reasons. 00:41:06.720 |
StarCraft has been an AI environment for quite a few years. 00:41:09.960 |
In fact, I was participating in the very first competition 00:41:18.720 |
a very clear set of rules, how the actions per minute, 00:41:24.680 |
And as a result, these agents or bots that people build 00:41:31.040 |
they do like 20,000, 40,000 actions per minute. 00:41:45.440 |
that's why the range is a bit tricky to identify exactly. 00:41:56.960 |
because they warm up and they kind of select things 00:41:59.440 |
and spam and so on, just so that when they need, 00:42:04.200 |
So we came into this by not having kind of a standard way 00:42:09.200 |
to say, well, how do we measure whether an agent 00:42:26.880 |
and imprecisions of actions in the supervised policy. 00:42:31.000 |
you could see how agents like to spam click, to move here. 00:42:34.680 |
If you played, especially Diablo, you would know what I mean. 00:42:40.320 |
You're doing literally like maybe five actions 00:42:43.240 |
in two seconds, but these actions are not very meaningful. 00:42:48.720 |
So on the one hand, we start from this imitation policy 00:42:52.080 |
that is in the ballpark of the actions per minute of humans 00:43:10.960 |
and that's the part we haven't talked too much yet, 00:43:13.240 |
but of course the agent must play against itself to improve. 00:43:19.640 |
that these actions will not become more precise 00:43:22.400 |
or even the rate of actions is going to increase over time. 00:43:26.040 |
So what we did, and this is probably kind of the first attempt 00:43:31.160 |
is we looked at the distribution of actions for humans 00:43:37.720 |
because I guess I mentioned that some of these agents 00:43:47.360 |
So what we looked is we look at the distribution 00:43:49.400 |
over professional gamers, and we took reasonably 00:43:57.480 |
after which, even if the agent wanted to act, 00:44:02.120 |
But the problem is this cutoff is probably set 00:44:08.640 |
even though the games, and when we ask the professionals 00:44:22.880 |
which is actions per minute, combined with the precision, 00:44:32.520 |
Should we just let it loose and see what cool things 00:44:38.400 |
- So this is, in itself, an extremely interesting question, 00:44:44.000 |
would be so difficult, modeling absolutely all the details 00:44:47.720 |
about muscles and precision and tiredness of humans 00:45:11.080 |
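A toy illustration of the kind of constraint being discussed: estimate a cutoff from a (made-up) sample of professional APM values and enforce it with a sliding-window rate limiter. This is the shape of the idea, not the exact mechanism used for AlphaStar.

```python
from collections import deque

pro_apm = sorted([250, 280, 310, 330, 360, 400, 420, 450])   # made-up numbers
apm_cap = pro_apm[int(0.95 * (len(pro_apm) - 1))]             # ~95th percentile

class RateLimiter:
    def __init__(self, apm_cap, window_seconds=60.0):
        self.max_actions = int(apm_cap * window_seconds / 60.0)
        self.window = window_seconds
        self.times = deque()

    def allow(self, t):
        # Drop actions that have fallen out of the sliding window.
        while self.times and t - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(t)
            return True
        return False        # the agent wanted to act, but the budget is spent

limiter = RateLimiter(apm_cap)
print(apm_cap, limiter.allow(0.0))
```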
So one of the constraints you put on yourself, 00:45:15.440 |
or at least focused in, is on the Protoss race, 00:45:19.920 |
Can you tell me about the different races and how they, 00:45:22.920 |
so Protoss, Terran, and Zerg, how do they compare? 00:45:35.720 |
- So Protoss, so in StarCraft, there are three races. 00:45:39.720 |
Indeed, in the demonstration, we saw only the Protoss race. 00:45:45.600 |
Protoss is kind of the most technologically advanced race. 00:45:49.480 |
It has units that are expensive, but powerful, right? 00:45:53.840 |
So in general, you wanna kind of conserve your units 00:45:59.560 |
and then you wanna utilize these tactical advantages 00:46:10.320 |
people say, like, they're a bit easier to play, perhaps. 00:46:17.160 |
I mean, I just talked to, now, a lot to the players 00:46:20.160 |
that we work with, TLO and Mana, and they said, 00:46:26.360 |
So perhaps the easier, that doesn't mean that it's, 00:46:46.840 |
I think it's pretty equal in terms of distribution, 00:46:52.840 |
They don't want, they wouldn't want one race like Protoss 00:46:59.920 |
- So definitely, like, they tried it to be like balanced. 00:47:03.880 |
So then maybe the opposite race of Protoss is Zerg. 00:47:17.640 |
So if you have an army, it's not that valuable; 00:47:20.480 |
losing the whole army is not as big a deal as Zerg 00:47:30.840 |
Zergs typically play by applying a lot of pressure, 00:47:40.400 |
I mean, there's never, I mean, they're pretty diverse. 00:47:48.800 |
and there's some units in Protoss that are less valuable 00:47:51.280 |
and you could lose a lot of them and rebuild them 00:48:02.440 |
So first there's collection of a lot of resources. 00:48:06.520 |
The other one is expanding, so building other bases. 00:48:14.840 |
building units and attacking with those units. 00:48:20.600 |
Maybe there's the different timing of attacks, 00:48:25.960 |
What are the different strategies that emerged 00:48:29.040 |
I've read that a bunch of people are super happy 00:48:34.960 |
that it's really good to, what is it, saturate? 00:48:42.120 |
- And that's for greedy amateur players like myself. 00:48:48.960 |
and it just feels good to just accumulate and accumulate. 00:48:56.640 |
But is there other strategies that you discovered 00:49:08.040 |
and real-time strategy games in general are very similar. 00:49:11.040 |
I would classify perhaps the openings of the game. 00:49:18.760 |
And generally I would say there's two kinds of openings. 00:49:23.400 |
That's generally how players find sort of a balance 00:49:28.400 |
between risk and economy and building some units early on 00:50:01.640 |
So standard openings themselves have some choices 00:50:07.480 |
Of course, if you scout and you're good at guessing 00:50:16.480 |
So you can imagine that normal standard games 00:50:19.120 |
in StarCraft looks like a continuous rock-paper-scissors game 00:50:24.040 |
where you guess what the distribution of rock, 00:50:33.360 |
or put the paper out before he kind of changes his mind 00:50:52.200 |
trying to better and better estimate the distribution 00:51:03.000 |
And when your belief state becomes inaccurate, 00:51:08.040 |
whether he's gonna play something that you must know, 00:51:16.440 |
or improving the belief state part of the loss 00:51:22.720 |
- It's implicit, but you could explicitly model it 00:51:25.840 |
and it would be quite good at probably predicting 00:51:32.880 |
There's no additional reward for predicting the enemy. 00:51:42.840 |
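The "continuous rock, paper, scissors" framing can be made concrete with a tiny fictitious-play loop: each side keeps an empirical belief over what the opponent plays and best-responds to it, and the empirical frequencies drift toward the mixed equilibrium. Purely illustrative; nothing AlphaStar literally runs.

```python
PAYOFF = {('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
          ('P', 'R'): 1, ('P', 'P'): 0, ('P', 'S'): -1,
          ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0}
MOVES = ['R', 'P', 'S']

def best_response(opponent_counts):
    # Play the move with the highest expected payoff against the belief.
    total = sum(opponent_counts.values()) or 1
    belief = {m: c / total for m, c in opponent_counts.items()}
    return max(MOVES, key=lambda me: sum(belief[op] * PAYOFF[(me, op)] for op in MOVES))

counts_a = {m: 1 for m in MOVES}   # player A's belief about B (starts uniform)
counts_b = {m: 1 for m in MOVES}   # player B's belief about A
for _ in range(10000):
    move_a, move_b = best_response(counts_a), best_response(counts_b)
    counts_a[move_b] += 1           # A observed B's move
    counts_b[move_a] += 1           # B observed A's move

print({m: round(c / sum(counts_a.values()), 2) for m, c in counts_a.items()})
# Empirical play drifts toward the mixed equilibrium, roughly 1/3 each.
```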
And AlphaStar sometimes really likes this kind of cheese. 00:51:46.760 |
These cheeses, what they are is kind of an all-in strategy. 00:51:58.200 |
or you're gonna go for hiding your technological buildings 00:52:16.320 |
Because if I scout your base and I see no buildings at all, 00:52:25.600 |
Should I build suddenly a lot of units to defend? 00:52:35.640 |
and defending against cheeses is extremely important. 00:52:40.720 |
many agents actually develop some cheesy strategies. 00:52:45.040 |
And in the games we saw against TLO and Mana, 00:52:49.200 |
were actually doing these kinds of strategies, 00:52:53.600 |
And then there's a variant of cheesy strategy, 00:52:57.320 |
So an all-in strategy is not perhaps as drastic as, 00:53:03.800 |
and try to just disrupt your base and game over, 00:53:11.920 |
that you can align precisely at a certain time mark. 00:53:20.200 |
like five of this type, five of this other type, 00:53:22.880 |
and align the upgrade so that at four minutes and a half, 00:53:30.560 |
And at that point, that army is really scary. 00:53:33.880 |
And unless the enemy really knows what's going on, 00:53:36.360 |
if you push, you might then have an advantage 00:53:40.160 |
because maybe the enemy is doing something more standard, 00:53:42.360 |
it expanded too much, it developed too much economy, 00:53:45.680 |
and it traded off badly against having defenses, 00:53:45.680 |
But it's called all-in because if you don't win, 00:53:54.960 |
So you see players that do these kinds of strategies. 00:54:08.760 |
So if we start entering the game theoretic aspects of the game, 00:54:15.800 |
it also makes it quite entertaining to watch. 00:54:17.880 |
Even if I don't play, I still enjoy watching the game. 00:54:21.720 |
But the agents are trying to do this mostly implicitly, 00:54:26.800 |
but one element that we improved in self-play 00:54:31.280 |
And the AlphaStar League is not pure self-play. 00:54:34.560 |
It's trying to create different personalities of agents 00:54:37.880 |
so that some of them will become cheesy agents. 00:54:41.480 |
Some of them might become very economical, very greedy, 00:54:46.160 |
but then maybe early on, they're gonna be weak, 00:54:55.360 |
that you can see kind of an evolution of agents 00:55:01.920 |
and then they generate kind of the perfect counter 00:55:05.720 |
But these agents, you must have them in the populations 00:55:11.240 |
you're not covered against these things, right? 00:55:13.000 |
It's kind of, you wanna create all sorts of the opponents 00:55:21.800 |
early aggression, later aggression, more expansions, 00:55:32.720 |
at finding some subset of these, but not all of these. 00:55:39.400 |
do an ensemble of agents that are all playing in a league 00:55:39.400 |
who does a new cool strategy and you immediately, 00:55:50.200 |
oh my God, I wanna try it, I wanna play again. 00:55:53.000 |
And this to me was another critical part of the problem, 00:55:57.520 |
which was, can we create a Battle.net for agents? 00:56:01.200 |
And that's kind of what the AlphaStar League really- 00:56:04.360 |
And where they stick to their different strategies. 00:56:06.880 |
Yeah, wow, that's really, really interesting. 00:56:25.360 |
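A schematic, invented toy of that league idea: a small population of agents with fixed "personalities" in a cyclic rush/economy/tech matchup, where each agent is preferentially matched against the opponents it currently loses to. It captures the matchmaking flavour only, not the actual AlphaStar League training.

```python
import random

STYLES = ['rush', 'economy', 'tech']
# Hypothetical cyclic matchup: rush beats economy, economy beats tech, tech beats rush.
BEATS = {'rush': 'economy', 'economy': 'tech', 'tech': 'rush'}

agents = [{'name': f'agent_{i}', 'style': random.choice(STYLES)} for i in range(6)]
wins = {(a['name'], b['name']): 1 for a in agents for b in agents}   # smoothed counts
games = {(a['name'], b['name']): 2 for a in agents for b in agents}

def pick_opponent(me):
    # Sample opponents in proportion to how often they beat "me".
    others = [a for a in agents if a['name'] != me['name']]
    weights = [1.0 - wins[(me['name'], o['name'])] / games[(me['name'], o['name'])]
               for o in others]
    return random.choices(others, weights=weights, k=1)[0]

def play(a, b):
    if BEATS[a['style']] == b['style']:
        return a
    if BEATS[b['style']] == a['style']:
        return b
    return random.choice([a, b])   # mirror matchup: coin flip

for _ in range(1000):
    a = random.choice(agents)
    b = pick_opponent(a)
    winner = play(a, b)
    games[(a['name'], b['name'])] += 1
    games[(b['name'], a['name'])] += 1
    loser = a['name'] if winner is b else b['name']
    wins[(winner['name'], loser)] += 1

for a in agents:
    total_w = sum(wins[(a['name'], o['name'])] for o in agents if o is not a)
    print(a['name'], a['style'], 'wins:', total_w)
```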
but how hard is it in general to win all matchups 00:56:33.560 |
because once you see AlphaStar and superficially 00:56:40.440 |
Let's see, if you sum all the games, it's like 10 to one, right? 00:56:42.880 |
It lost the game that it played with the camera interface. 00:57:04.040 |
but a moment like this had not occurred before. 00:57:19.480 |
They definitely understand the game enough 00:57:21.400 |
to play extremely well, but are they unbeatable? 00:57:33.240 |
it's always possible that you might take a huge risk 00:57:48.040 |
We would like to, I mean, if I learn to play Protoss, 00:57:54.760 |
So there are obvious interesting research challenges 00:57:57.720 |
as well, but even as the raw performance goes, 00:58:05.120 |
we are at pro level or at high Grandmaster level, 00:58:09.320 |
but obviously the players also did not know what to expect. 00:58:14.320 |
Right, this kind of their prior distribution was a bit off 00:58:16.960 |
because they played this kind of new, like alien brain 00:58:25.080 |
But also I think if you look at the games closely, 00:58:28.040 |
you see there were weaknesses in some points, 00:58:33.280 |
or if it had got invisible units going against 00:58:42.880 |
but it's really a very exciting moment for us to be seeing, 00:58:46.440 |
wow, a single neural net on a GPU is actually playing 00:58:55.800 |
- Yeah, I'm sure there must be a guy in Poland somewhere 00:59:03.400 |
that this never happens again with AlphaStar. 00:59:06.600 |
So that's really exciting in terms of AlphaStar 00:59:09.720 |
having some holes to exploit, which is great. 00:59:32.880 |
wait, let me actually just pause for a second. 00:59:44.640 |
Olympic athletes have their gold medal, right? 00:59:50.440 |
you've published a lot of prestigious papers, 01:00:05.120 |
I mean, so looking back to those last days of 2018, 01:00:15.040 |
I'll say, oh my God, I wanna be like in a project like that. 01:00:18.040 |
It's like, I already feel the nostalgia of like, 01:00:26.340 |
And so in that sense, as soon as it happened, 01:00:32.980 |
So it's almost like sad that it happened and oh my God, 01:00:36.320 |
but on the other hand, it also verifies the approach. 01:00:46.080 |
that even though we can train a neural network 01:00:57.420 |
but I already was also thinking about next steps. 01:00:59.920 |
I mean, as I said, these agents play Protoss versus Protoss, 01:01:04.040 |
but they should be able to play a different race 01:01:10.620 |
Some people call this meta reinforcement learning, 01:01:15.160 |
So there's so many possibilities after that moment, 01:01:21.500 |
We had this bet, so I'm kind of a pessimist in general. 01:01:27.680 |
So I kind of sent an email to the team, I said, 01:01:35.080 |
And I really thought we would lose like 5-0, right? 01:01:38.800 |
We had some calibration made against the 5,000 MMR player. 01:01:47.280 |
even if he played Protoss, which is his off race. 01:01:53.040 |
So for me, that was just kind of a test run or something. 01:01:58.940 |
And unbelievably, we went to this bar to celebrate 01:02:03.940 |
and Dave tells me, "Well, why don't we invite someone 01:02:16.120 |
And we had some drinks and I said, "Sure, why not?" 01:02:19.320 |
But then I thought, "Well, that's really gonna be 01:02:24.480 |
a thousand MMR is really like 99% probability 01:02:28.320 |
that Mana would beat TLO as Protoss versus Protoss, right? 01:02:34.160 |
And to me, the second game was much more important, 01:02:38.920 |
even though a lot of uncertainty kind of disappeared 01:02:46.640 |
"Oh, but that's really a very nice achievement." 01:03:09.160 |
And I mean, it was really like this moment for me 01:03:15.320 |
and yeah, it's a really great accomplishment. 01:03:18.200 |
And it was great also to be there with the team in the room. 01:03:25.920 |
the other interesting thing is just like watching Kasparov, 01:03:36.080 |
I mean, whenever you lose, I've done a lot of sports. 01:03:38.320 |
You sometimes make excuses, you look for reasons. 01:03:50.000 |
you could say, well, it felt awkward, it wasn't, 01:03:55.160 |
And it was beautiful to look at a human being 01:04:00.240 |
I mean, it's a beautiful moment for researchers. 01:04:05.240 |
It was, I mean, probably the highlight of my career so far 01:04:29.840 |
- Also on the other side, it's a popularization of AI too, 01:04:34.040 |
because just like traveling to the moon and so on. 01:04:38.200 |
I mean, this is where a very large community of people 01:04:55.880 |
that we must sort of try to explain what it is. 01:05:03.640 |
So maybe everyone has experienced an AI playing a video game, 01:05:10.240 |
and some people might even call that AI already, right? 01:05:38.440 |
The main thing I'm thinking about actually is what's next 01:05:47.120 |
and then there's like the sort of three-dimensional worlds 01:05:50.280 |
that we've seen also like pretty good performance 01:05:54.120 |
that also some people at DeepMind and elsewhere 01:05:57.600 |
We've also seen some amazing results on like, 01:06:03.280 |
So for me, like the main thing I'm thinking about 01:06:07.960 |
So as a researcher, I see sort of two tensions 01:06:20.480 |
thanks to the application of StarCraft is very hard, 01:06:23.320 |
we develop some techniques, some new research 01:06:27.480 |
Like are there other applications where we can apply these? 01:06:32.880 |
you can think of feeding back to sort of the community 01:06:37.440 |
we took from, which was mostly sequence modeling 01:06:41.680 |
So we've developed and extended things from the transformer 01:06:48.120 |
We combine LSTM and transformers in interesting ways. 01:06:51.280 |
So that's perhaps the kind of lowest hanging fruit 01:06:57.600 |
of machine learning that's not playing video games. 01:07:00.880 |
- Let me go old school and jump to Mr. Alan Turing. 01:07:08.440 |
it's a natural language test, a conversational test. 01:07:11.560 |
What's your thought of it as a test for intelligence? 01:07:25.640 |
because I also like sequences and language understanding. 01:07:37.280 |
which obviously would never pass the Turing test 01:07:51.760 |
in terms of asking or conversing with it, right? 01:08:12.480 |
I mean, I think they have these competitions every year. 01:08:34.960 |
of a genuine, rich, fulfilling human conversation 01:08:41.600 |
Like the Alexa Prize, which I'm not as well familiar with, 01:08:52.200 |
So basically forcing the agent not to just fool 01:08:55.480 |
but to have an engaging conversation kind of thing. 01:09:06.380 |
And if you have in general, how far away are we from, 01:09:21.720 |
that kind of conversation, have you thought about it? 01:09:23.620 |
- Yeah, so I think you touched here on the critical point, 01:09:32.860 |
which describes sort of grand challenges of physics. 01:09:37.360 |
And he argues that, well, okay, for instance, 01:09:46.580 |
We really don't know or cannot kind of make any progress. 01:09:57.860 |
So I see the Turing test as, in the full Turing test, 01:10:27.300 |
and I probably would not recommend people doing a PhD 01:10:35.460 |
- Yeah, but that said, you said the exact same thing 01:10:45.020 |
the person who passes the Turing test in three years. 01:10:57.780 |
I really wouldn't have, not even six months ago, 01:11:00.780 |
I would not have predicted the level that we see 01:11:03.220 |
that these agents can deliver at Grandmaster level. 01:11:10.060 |
and basically my concern is not that something could happen, 01:11:13.580 |
a breakthrough could happen that would bring us 01:11:18.380 |
is that I just think the statistical approach to it, 01:11:24.100 |
So we need a breakthrough, which is great for the community. 01:11:28.260 |
But given that, I think there's quite more uncertainty. 01:11:31.740 |
Whereas for StarCraft, I knew what the steps would be 01:11:38.060 |
I think it was clear that using the imitation learning part 01:11:44.300 |
were gonna be key, and it turned out that this was the case 01:11:48.220 |
and a little more was needed, but not much more. 01:11:51.540 |
For Turing test, I just don't know what the plan 01:12:01.420 |
but there are quite a few sub challenges that are related 01:12:09.020 |
like Google already has like the Google Assistant, 01:12:15.380 |
That I start to believe maybe we're reaching a point 01:12:22.380 |
'cause it echoes very much the StarCraft conversation. 01:12:26.820 |
Let's break it down into small pieces and solve those, 01:12:31.300 |
Great, but that said, you're behind some of the 01:12:35.180 |
sort of biggest pieces of work in deep learning 01:12:42.260 |
What do you think of the current limits of deep learning 01:12:50.100 |
to define the main challenge in deep learning, 01:12:53.140 |
it's a challenge that probably has been the challenge 01:12:55.660 |
for many years and is that of generalization. 01:12:59.660 |
So what that means is that all that we're doing 01:13:06.700 |
And when the data we see is not from the same distribution 01:13:11.700 |
or even if there are times when it is very close 01:13:15.060 |
to the distribution, but because of the way we train it 01:13:18.140 |
with limited samples, we then get to this stage 01:13:27.700 |
And I think adversarial examples are a clear example 01:13:30.780 |
of this, but if you study the machine learning literature 01:13:39.660 |
and they had some guarantees about generalization, 01:13:45.500 |
or even within distribution where you take an image, 01:13:51.220 |
So I think really, I don't see a lot of progress 01:13:56.220 |
on generalization in the strong generalization sense 01:14:01.820 |
I think our neural networks, you can always find 01:14:06.820 |
designed examples that will make their outputs arbitrary, 01:14:10.980 |
which is not good because we humans would never be fooled 01:14:15.980 |
by these kind of images or manipulation of the image. 01:14:19.900 |
And if you look at the mathematics, you kind of understand 01:14:22.660 |
this is a bunch of matrices multiplied together. 01:14:30.820 |
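For concreteness, the classic fast gradient sign method (Goodfellow et al.) shows how a few lines of gradient arithmetic produce such designed inputs. The toy model and data here are random stand-ins; against a real trained image classifier the same perturbation routinely flips the prediction while looking unchanged to us.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.rand(1, 784)                    # stand-in for a flattened image
y = torch.tensor([3])                     # its (pretend) correct label

# Take the gradient of the loss with respect to the input pixels.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()

epsilon = 0.05                            # tiny per-pixel change
x_perturbed = x + epsilon * x_adv.grad.sign()
print("prediction before:", model(x).argmax(-1).item(),
      "after:", model(x_perturbed).argmax(-1).item())
```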
So I think that's really the underlying topic. 01:14:34.500 |
Many times we see when even at the grand stage 01:14:38.700 |
of like Turing test generalization, I mean, if you start, 01:14:43.100 |
I mean, passing the Turing test, should it be in English 01:14:52.260 |
in a different language, you actually will go 01:14:54.060 |
and do some research and try to translate it and so on. 01:15:01.020 |
And it's really a difficult problem and very fascinating 01:15:06.260 |
But do you think it's, if you were to try to solve it, 01:15:10.460 |
can you not grow the size of data intelligently 01:15:14.220 |
in such a way that the distribution of your training set 01:15:17.380 |
does include the entirety of the testing set? 01:15:24.940 |
- So a path that has worked well, and it worked well 01:15:27.860 |
in StarCraft and in machine translation and in languages, 01:15:32.780 |
And that's kind of been maybe the only single formula 01:15:37.340 |
that still delivers today in deep learning, right? 01:15:44.020 |
really do more and more of the things that we thought, 01:15:47.060 |
oh, there's no way it can generalize to these 01:15:51.300 |
But I don't think fundamentally it will be solved with this. 01:15:54.820 |
And for instance, I'm really liking some style or approach 01:16:02.100 |
but it would have programs or some discrete decision-making 01:16:06.380 |
because there is where I feel there's a bit more, 01:16:09.700 |
like, I mean, the example of, the best example, 01:16:14.620 |
I also worked a bit on, oh, like we can learn an algorithm 01:16:18.780 |
So you give it many examples and it's gonna sort your, 01:16:21.340 |
sort the input numbers or something like that. 01:16:27.740 |
you give me some numbers or you ask me to create an algorithm 01:16:33.700 |
which will be fragile because it's gonna go out of range 01:16:55.860 |
I mean, scale is a bit more boring, but it really works. 01:16:59.460 |
And then maybe programs and discrete abstractions 01:17:02.860 |
are a bit less developed, but clearly I think 01:17:06.380 |
they're quite exciting in terms of future for the field. 01:17:09.900 |
- Do you draw any insight wisdom from the 80s 01:17:13.460 |
and expert systems and symbolic systems, symbolic computing? 01:17:16.900 |
Do you ever go back to those sort of reasoning, 01:17:30.180 |
To me, the problem really is, what are you trying to solve? 01:17:34.260 |
If what you're trying to solve is so important 01:17:39.140 |
then absolutely use rules, use domain knowledge, 01:17:44.140 |
and then use a bit of the magic of machine learning 01:17:46.860 |
to empower or to make the system as the best system 01:17:50.060 |
that will detect cancer or detect weather patterns, right? 01:17:55.060 |
Or in terms of StarCraft, it also was a very big challenge. 01:17:59.060 |
So I was definitely happy that if we had to cut a corner here 01:18:04.180 |
and there, it could have been interesting to do. 01:18:09.500 |
about expert systems because it's a very, you can define, 01:18:24.420 |
but that has also neural networks incorporated 01:18:28.980 |
So absolutely, I mean, we should definitely go back 01:18:31.740 |
to those ideas and anything that makes the problem simpler. 01:18:35.300 |
As long as your problem is important, that's okay. 01:18:37.900 |
And that's research driving a very important problem. 01:18:40.940 |
And on the other hand, if you wanna really focus 01:18:46.500 |
then of course you must try not to look at imitation data 01:18:54.060 |
that would help a lot or even feature engineering, right? 01:18:56.900 |
So this is a tension that depending on what you do, 01:19:05.900 |
if you're, as long as what you're doing is important 01:19:16.780 |
But one is translating from image captioning, 01:19:28.460 |
that resonates throughout your work, actually. 01:19:33.140 |
So the underlying nature of reality being language, 01:19:38.740 |
- So what's the connection between images and text, 01:20:00.060 |
you can now really input anything to a neural network 01:20:06.060 |
and then the neural network will learn a function F 01:20:09.500 |
that will take X as an input and produce any output Y. 01:20:16.140 |
or like a fixed vectors or anything like that. 01:20:26.500 |
So that paradigm was tested in a very interesting way 01:20:31.500 |
when we moved from translating French to English 01:20:44.260 |
in this thing that was doing machine translation, 01:20:50.420 |
like it was producing captions that seemed like, 01:21:19.540 |
whatever the flavor of the month of the model is. 01:21:22.380 |
And then as long as we have enough supervised data, 01:21:25.700 |
really this formula will work and will keep working, 01:21:31.780 |
model of these generalization issues that I mentioned before. 01:21:36.700 |
sort of form a representation that's meaningful, I think. 01:21:43.500 |
once you are able to form that representation, 01:21:46.460 |
you could basically take anything, any sequence. 01:21:55.340 |
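The bare-bones version of that formula, sketched from the standard sequence-to-sequence recipe rather than any particular paper's code: encode X into a state, decode Y token by token. Swapping the encoder for a convnet over pixels gives image captioning with the same decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, in_vocab=1000, out_vocab=1000, dim=256):
        super().__init__()
        self.in_emb = nn.Embedding(in_vocab, dim)
        self.out_emb = nn.Embedding(out_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, out_vocab)

    def forward(self, src_tokens, tgt_tokens):
        _, state = self.encoder(self.in_emb(src_tokens))   # the "thought vector" state
        dec_out, _ = self.decoder(self.out_emb(tgt_tokens), state)
        return self.out(dec_out)                            # next-token logits

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 12))   # e.g. a French sentence (as token ids)
tgt = torch.randint(0, 1000, (1, 9))    # e.g. its English translation so far
print(model(src, tgt).shape)            # torch.Size([1, 9, 1000])
```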
So we didn't really touch on the long-term aspect. 01:22:12.420 |
So we would have to multiply 22 times 60 seconds per minute 01:22:17.420 |
times maybe at least 10 minutes per game on average. 01:22:25.660 |
but the trick really was to only observe, in fact, 01:22:39.980 |
is what is the gap gonna be until the next action? 01:22:48.060 |
that we have in the dataset that Blizzard provided, 01:22:51.940 |
it turns out that most games are actually only, 01:23:14.460 |
If you had to do it with reinforcement learning, 01:23:21.580 |
But thankfully, because of imitation learning, 01:23:24.540 |
we didn't kind of have to deal with this directly. 01:23:29.580 |
and what happened is you just take all your workers 01:23:33.340 |
And that sort of is kind of obvious in retrospect 01:23:44.740 |
because it basically doesn't know almost anything. 01:23:51.060 |
because the credit assignment issue in RL is really, really hard. 01:23:57.580 |
and that's maybe a research challenge for the future. 01:24:05.420 |
which I believe is within the realm of what transformers can do. 01:24:10.380 |
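A simplified sketch of the trick described here: the policy emits both an action and a discretized delay (how many frames to wait before acting again), so the model only has to produce the few hundred human actions per game rather than all of the roughly 22 x 60 x 10 frames. Names and sizes are invented.

```python
import torch
import torch.nn as nn

class ActionAndDelayHead(nn.Module):
    def __init__(self, hidden=128, n_actions=50, max_delay=128):
        super().__init__()
        self.action_head = nn.Linear(hidden, n_actions)
        self.delay_head = nn.Linear(hidden, max_delay)   # discretized "frames to skip"

    def forward(self, core_output):
        return self.action_head(core_output), self.delay_head(core_output)

head = ActionAndDelayHead()
core_out = torch.randn(1, 128)                 # from the LSTM/transformer core
action_logits, delay_logits = head(core_out)
action = action_logits.argmax(-1)
delay = delay_logits.argmax(-1)                # e.g. "do this, then sleep 17 frames"
print(action.item(), delay.item())
```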
Yeah, I guess the difference between StarCraft and Go is 01:24:14.540 |
in Go and chess, stuff starts happening right away. 01:24:23.460 |
it's possible to develop reasonable strategies quickly 01:24:30.620 |
but one action is what people would call the God action 01:24:34.140 |
that would be, if you had expanded the whole search tree, 01:24:56.420 |
in terms of action space size, is way harder. 01:25:10.580 |
For humans, maybe playing StarCraft seems more intuitive 01:25:15.900 |
I mean, the graphics and everything moves smoothly, 01:25:20.140 |
I mean, Go is a game that I would really need to study. 01:25:23.860 |
But for machines, kind of maybe it's the reverse, yes. 01:25:27.020 |
- Which shows you the gap actually between deep learning 01:25:32.140 |
So you developed a lot of really interesting ideas. 01:25:44.500 |
Do you like, like what was it, Ian Goodfellow said 01:25:53.340 |
- He thinks beers are essential for coming up with new ideas. 01:25:55.820 |
- We had beers to decide to play another game 01:26:02.660 |
Actually, I explained this in a DeepMind retreat 01:26:05.780 |
and I said, this is the same as the GAN story. 01:26:09.540 |
let's play a game next week and that's what happened. 01:26:15.860 |
- But in general, like, do you like brainstorming? 01:26:18.220 |
Do you like thinking alone, working stuff out? 01:26:20.140 |
- So I think throughout the years also things changed, right? 01:26:23.860 |
So initially I was very fortunate to be with great minds 01:26:33.940 |
I was really fortunate to join Brain at the very good time. 01:26:37.660 |
So at that point, ideas, I was just kind of brainstorming 01:26:53.140 |
It's very hard at some point to not communicate 01:27:00.460 |
So definitely that communication aspect needs to be there, 01:27:07.580 |
Nowadays, I'm also trying to be a bit more strategic 01:27:15.020 |
So I was describing a little bit this sort of tension 01:27:22.940 |
applications that can drive the research, right? 01:27:25.580 |
And honestly, the formula that has worked best for me 01:27:31.540 |
and then try to see how research fits into it, 01:27:34.620 |
how it doesn't fit into it, and then you must innovate. 01:27:37.820 |
So I think machine translation drove sequence to sequence. 01:27:42.820 |
Then maybe like learning algorithms that had to, 01:27:47.140 |
like combinatorial algorithms led to pointer networks. 01:27:50.540 |
StarCraft led to really scaling up imitation learning 01:27:55.540 |
So that's been a formula that I personally like, 01:28:02.740 |
where you just want to investigate model-based RL 01:28:14.260 |
You need to kind of a minimal environment to try things. 01:28:21.020 |
and something I've also done quite a few times, 01:28:24.060 |
both at Brain, at DeepMind, and obviously as a PhD. 01:28:27.580 |
So I think besides the ideas and discussions, 01:28:34.660 |
because you start sort of guiding not only your own goals, 01:28:39.660 |
but other people's goals to the next breakthrough. 01:28:43.860 |
So you must really kind of understand this feasibility also, 01:28:50.340 |
Whether this domain is ready to be tackled or not, 01:29:01.060 |
which I think as a grad student, I just had no idea. 01:29:07.380 |
And I think this has been maybe the major change. 01:29:09.780 |
And I recommend people kind of feed forward to success, 01:29:21.020 |
which sometimes you stumble upon some interesting things, 01:29:23.820 |
but in general, it's also good to plan a bit. 01:29:30.460 |
of taking a really hard problem, stepping right in, 01:29:40.100 |
There's a silly optimism and a critical sort of skepticism 01:29:48.380 |
which is why it's good to have a team of people 01:30:08.340 |
And I'm just following his lead, which is great, 01:30:12.620 |
So these things are obviously quite important 01:30:17.340 |
that you wanna be surrounded by people who are diverse. 01:30:28.340 |
who actually have an idea that I might not think is good, 01:30:34.940 |
I've been proven wrong many, many times as well. 01:30:39.140 |
Your colleagues are more important than yourself, I think. 01:30:44.580 |
Now, let's real quick talk about another impossible problem. 01:30:50.460 |
- What do you think it takes to build a system 01:30:54.100 |
We talked a little bit about the Turing test, StarCraft, 01:30:56.380 |
all of these have echoes of general intelligence. 01:31:17.220 |
So what I'm trying to then come up with for myself 01:31:28.460 |
that no longer sort of overfit to a single task, right? 01:31:31.900 |
But actually kind of learn the skill of learning, 01:31:37.900 |
And that actually is a field that I am fascinated by, 01:31:41.460 |
which is the learning to learn or meta-learning, 01:31:45.020 |
which is about no longer learning about a single domain. 01:31:48.620 |
So you can think about the learning algorithm itself 01:31:52.700 |
So the same formula we applied for AlphaStar or StarCraft, 01:31:56.780 |
we can now apply to kind of almost any video game 01:31:59.420 |
or you could apply to many other problems and domains. 01:32:03.540 |
But the algorithm is what's kind of generalizing. 01:32:06.980 |
But the neural network, those weights are useless 01:32:12.060 |
I train a network to play very well at Protoss versus Protoss. 01:32:20.620 |
I would need to retrain a network from scratch 01:32:24.820 |
That's beautiful, but the network itself will not be useful. 01:32:28.540 |
So I think if I see an approach that can absorb 01:32:36.660 |
without the need to kind of restart the process, 01:32:47.620 |
like should Turing test be solved before AGI? 01:32:51.740 |
I think concretely, I would like to see clearly 01:32:56.940 |
meaning there is an architecture or a network 01:33:00.860 |
that as it sees new problem or new data, it solves it. 01:33:08.300 |
it should solve it at the same speed that we do solve 01:33:11.580 |
When I define a new object for you and you have to recognize it. 01:33:25.900 |
and what's the exact benchmark is a bit difficult. 01:33:44.660 |
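One common concrete benchmark shape for this "learning to learn" goal is episodic few-shot classification, sketched here with a nearest-centroid classifier (in the spirit of prototypical networks) on synthetic clusters. This illustrates the evaluation protocol only; it is not a claim about any specific DeepMind system.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(n_way=5, k_shot=5, n_query=15, dim=32):
    # Each class is a random Gaussian cluster standing in for "a new object".
    centers = rng.normal(size=(n_way, dim))
    support = centers[:, None] + 0.3 * rng.normal(size=(n_way, k_shot, dim))
    query = centers[:, None] + 0.3 * rng.normal(size=(n_way, n_query, dim))
    return support, query

def nearest_centroid_accuracy(support, query):
    prototypes = support.mean(axis=1)                        # (n_way, dim)
    n_way, n_query, dim = query.shape
    q = query.reshape(-1, dim)
    dists = ((q[:, None] - prototypes[None]) ** 2).sum(-1)   # (n_way*n_query, n_way)
    pred = dists.argmin(-1)
    truth = np.repeat(np.arange(n_way), n_query)
    return (pred == truth).mean()

support, query = sample_episode()
print("few-shot accuracy on an unseen episode:", nearest_centroid_accuracy(support, query))
```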
- Yeah, there's an emotional, psychological level 01:33:53.780 |
that we just pass judgment on as human beings, 01:34:26.420 |
but then like from StarCraft to Wikipedia and back. 01:34:38.220 |
as we discovered, for instance, with ImageNet, 01:34:43.060 |
So I think maybe there's a lack of benchmark, 01:34:47.820 |
and the community will then work towards that. 01:34:50.700 |
And then beyond what AGI might mean or would imply, 01:34:57.020 |
I really am hopeful to see basically machine learning 01:35:05.300 |
that might not have the resources to hire an assistant 01:35:08.740 |
or that they might not even know what the weather is like. 01:35:18.020 |
I think that's maybe what we should also not lose focus. 01:35:25.540 |
but I think the way that DeepMind puts it is, 01:35:28.500 |
and then use it to solve everything else, right? 01:35:33.500 |
- Yeah, we shouldn't forget about all the positive things 01:35:47.140 |
do you have any worry about the existential threat 01:35:49.660 |
of artificial intelligence in the near or far future 01:36:06.100 |
and even like whole research field on AI safety emerging 01:36:09.260 |
and in conferences and so on, I think that's great. 01:36:12.620 |
In the long term, I really hope we just can simply 01:36:17.580 |
have the benefits outweigh the potential dangers. 01:36:23.380 |
but also we must remain vigilant to kind of monitor 01:36:33.740 |
or to redirect our efforts if need be, right? 01:36:36.860 |
So, but I'm quite optimistic about the technology 01:36:48.580 |
but obviously that's the one I kind of have more power on. 01:36:52.500 |
So clearly I do start thinking more and more about this 01:37:02.180 |
which is a field that so far I have not really contributed 01:37:05.220 |
to, but maybe there's something to be done there as well. 01:37:08.980 |
You know, I talk about this with a few folks, 01:37:11.460 |
but it's important to ask you and shove it in your head 01:37:14.860 |
because you're at the leading edge of actually 01:37:21.500 |
it's arguably at the very cutting edge of the kind of thing 01:37:32.660 |
to the kind of thing that people might be afraid of, 01:37:38.300 |
And it's also good that you're not as worried 01:37:47.740 |
but obviously we should prepare for it, right? 01:37:55.260 |
misuse of the technologies as with any technologies, right? 01:38:15.540 |
to the new things that will happen in the future, 01:38:18.900 |
that we can still also push the research to the avenue 01:38:35.420 |
of all the time that I spend doing what I'm doing, really. 01:38:40.020 |
- Where do you see the deep learning as a field heading? 01:38:42.980 |
Where do you think the next big breakthrough might be? 01:38:53.100 |
with some form of discretization, program synthesis. 01:38:56.660 |
I think that's kind of as a research in itself 01:39:08.620 |
I don't think that's gonna be what's gonna happen this year, 01:39:11.500 |
but also this idea of starting not to throw away 01:39:15.820 |
all the weights, that this idea of learning to learn 01:39:24.980 |
And you can have an agent that is kind of solving 01:39:34.660 |
And it should really be kind of almost the same network, 01:39:42.700 |
with an optimization algorithm attached to it. 01:39:45.620 |
But I think this idea of generalization to new tasks 01:39:49.300 |
is something that we first must define good benchmarks, 01:39:56.500 |
but I think if you have a very limited domain, 01:40:06.220 |
in computer vision, we should start thinking, 01:40:08.860 |
I really like a talk that Léon Bottou gave at ICML 01:40:12.700 |
a few years ago, which is this train-test paradigm 01:40:28.180 |
And in meta-learning, we call these the meta-training set 01:40:31.100 |
and the meta-test set, which is really thinking about 01:40:52.060 |
we probably will see quite a few more interest and progress 01:41:00.940 |
- Do you have any hope or interest in knowledge graphs 01:41:12.140 |
but I mean a different kind of knowledge graph, 01:41:14.900 |
sort of like semantic graphs where there's concepts. 01:41:23.100 |
so I've been quite interested in sequences first 01:41:26.420 |
and then more interesting or different data structures 01:41:48.660 |
What's kind of the killer application of graphs, right? 01:41:51.420 |
And perhaps if we could extract a knowledge graph 01:41:56.420 |
from Wikipedia automatically, that would be interesting 01:42:18.860 |
So I think I really like the idea of a knowledge graph. 01:42:27.420 |
or as part of the research we did for StarCraft, 01:42:31.340 |
I thought, wouldn't it be cool to give the graph 01:42:34.420 |
of all these buildings that depend on each other 01:42:39.420 |
and units that have prerequisites before they can be built. 01:42:39.420 |
or to think of really StarCraft as a giant graph 01:42:54.940 |
you just kind of start taking branches and so on. 01:42:57.980 |
And we did a bit of research on this, nothing too relevant, 01:43:12.340 |
being able to generate knowledge representations 01:43:20.940 |
So there's a lot of interesting aspects there. 01:43:22.860 |
And for me personally, I'm just a huge fan of Wikipedia 01:43:29.140 |
aren't taking advantage of all the structured knowledge 01:43:47.980 |
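As a toy of that StarCraft-as-a-graph idea: a rough, partial Protoss tech tree as a prerequisite graph, with a helper that answers "what do I still need before I can build X?". The dependencies are recalled from the game and simplified, so treat them as illustrative.

```python
TECH_TREE = {              # child: prerequisites (simplified, from memory)
    'pylon': set(),
    'gateway': {'pylon'},
    'cybernetics_core': {'gateway'},
    'stalker': {'gateway', 'cybernetics_core'},
    'robotics_facility': {'cybernetics_core'},
}

def missing_prerequisites(target, built, tree=TECH_TREE):
    """Return everything still missing (transitively) before `target` is legal."""
    missing, stack = set(), [target]
    while stack:
        node = stack.pop()
        for req in tree[node]:
            if req not in built and req not in missing:
                missing.add(req)
                stack.append(req)
    return missing

print(missing_prerequisites('stalker', built={'pylon'}))
# gateway and cybernetics_core are still missing
```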
I mean, that sort of shows that the algorithm works 01:43:51.580 |
because we wouldn't want to have created by mistake 01:43:55.580 |
something in the architecture that happens to work 01:44:00.100 |
So as verification, I think that's an obvious next step 01:44:09.300 |
so agents and players can specialize on different skill sets 01:44:15.980 |
I think we've seen AlphaStar understanding very well 01:44:19.500 |
when to take battles and when to not do that. 01:44:33.420 |
But I have not perhaps seen as much as I would like of 01:44:37.300 |
this poker idea that you mentioned, right? 01:44:46.100 |
of what the opponent is doing and reacting to that 01:44:50.100 |
and sort of trying to trick the player to do something else 01:44:58.340 |
So I think purely from a research standpoint, 01:45:01.620 |
there's perhaps also quite a few things to be done there 01:45:10.980 |
in even auctions, manipulating other players, 01:45:13.740 |
sort of forming a belief state and just messing with people. 01:45:30.500 |
or perhaps StarCraft driving new techniques, right? 01:45:33.260 |
As I said, this is always the tension between the two. 01:45:36.660 |
- Wow, Oriol, thank you so much for talking today.