
Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20


Chapters

0:00
3:13 Describe StarCraft
13:49 Parameters of the Challenge
27:41 Observing the Game
38:06 Cloaked Units
45:16 Protoss Race
67:05 The Turing Test
79:54 Sequence-to-Sequence Learning
84:10 Difference between StarCraft and Go
85:38 Developing New Ideas
91:43 Meta Learning
95:47 The Existential Threat of Artificial Intelligence in the Near or Far Future
103:34 Next for DeepMind

Whisper Transcript

00:00:00.000 | The following is a conversation with Oriol Vinyals.
00:00:03.280 | He's a senior research scientist at Google DeepMind,
00:00:05.920 | and before that, he was at Google Brain and Berkeley.
00:00:09.120 | His research has been cited over 39,000 times.
00:00:13.280 | He's truly one of the most brilliant and impactful minds
00:00:16.520 | in the field of deep learning.
00:00:18.200 | He's behind some of the biggest papers and ideas in AI,
00:00:20.960 | including sequence-to-sequence learning,
00:00:23.080 | audio generation, image captioning,
00:00:25.480 | neural machine translation,
00:00:27.040 | and, of course, reinforcement learning.
00:00:29.640 | He's a lead researcher of the AlphaStar Project,
00:00:32.840 | creating an agent that defeated a top professional
00:00:35.760 | at the game of StarCraft.
00:00:38.080 | This conversation is part
00:00:39.800 | of the Artificial Intelligence Podcast.
00:00:41.800 | If you enjoy it, subscribe on YouTube, iTunes,
00:00:44.920 | or simply connect with me on Twitter @lexfridman,
00:00:48.800 | spelled F-R-I-D.
00:00:51.240 | And now, here's my conversation with Oriol Vinyals.
00:00:55.440 | You spearheaded the DeepMind team behind AlphaStar
00:00:59.600 | that recently beat a top professional player at StarCraft.
00:01:02.840 | So, you have an incredible wealth of work
00:01:07.680 | in deep learning and a bunch of fields,
00:01:09.440 | but let's talk about StarCraft first.
00:01:11.840 | Let's go back to the very beginning,
00:01:13.720 | even before AlphaStar, before DeepMind,
00:01:16.680 | before deep learning first.
00:01:18.840 | What came first for you, a love for programming
00:01:22.400 | or a love for video games?
00:01:24.960 | - I think for me, it definitely came first
00:01:28.520 | the drive to play video games.
00:01:31.920 | I really liked computers.
00:01:35.240 | I didn't really code much, but what I would do
00:01:38.560 | is I would just mess with the computer, break it and fix it.
00:01:42.040 | That was the level of skills, I guess,
00:01:43.760 | that I gained in my very early days.
00:01:46.360 | I mean, when I was 10 or 11.
00:01:48.480 | And then I really got into video games,
00:01:50.920 | especially StarCraft, actually, the first version.
00:01:53.640 | I spent most of my time just playing
00:01:55.760 | kind of pseudo-professionally,
00:01:57.040 | as professionally as you could play back in '98 in Europe,
00:02:01.000 | which was not a very main scene
00:02:03.040 | like what's called nowadays e-sports.
00:02:05.800 | - Right, of course, in the '90s.
00:02:07.360 | So, how'd you get into StarCraft?
00:02:09.880 | What was your favorite race?
00:02:11.640 | How did you develop your skill?
00:02:15.040 | What was your strategy?
00:02:16.840 | All that kind of thing.
00:02:17.960 | - So, as a player, I tended to try to play not many games,
00:02:21.480 | not to kind of disclose the strategies
00:02:23.640 | that I kind of developed.
00:02:25.360 | And I like to play random, actually,
00:02:27.520 | not in competitions, but just to...
00:02:30.000 | I think in StarCraft, there's three main races,
00:02:33.360 | and I found it very useful to play with all of them.
00:02:36.560 | So, I would choose random many times,
00:02:38.320 | even sometimes in tournaments,
00:02:40.160 | to gain skill on the three races,
00:02:42.320 | because it's not how you play against someone,
00:02:45.400 | but also if you understand the race because you play it,
00:02:48.720 | you also understand what's annoying,
00:02:51.000 | then when you're on the other side,
00:02:52.480 | what to do to annoy that person,
00:02:54.160 | to try to gain advantages here and there, and so on.
00:02:57.280 | So, I actually played random,
00:02:59.080 | although I must say, in terms of favorite race,
00:03:02.000 | I really like Zerg.
00:03:03.640 | I was probably best at Zerg,
00:03:05.480 | and that's probably what I tend to use
00:03:08.320 | towards the end of my career, before starting university.
00:03:11.400 | - So, let's step back a little bit.
00:03:13.280 | Could you try to describe StarCraft
00:03:15.600 | to people that may never have played video games,
00:03:18.880 | especially the massively online variety like StarCraft?
00:03:22.280 | - So, StarCraft is a real-time strategy game.
00:03:25.880 | And the way to think about StarCraft,
00:03:27.760 | perhaps if you understand a bit chess,
00:03:30.920 | is that there's a board, which is called map,
00:03:34.160 | or, again, the map where people play against each other.
00:03:39.080 | There's obviously many ways you can play,
00:03:40.920 | but the most interesting one is the one-versus-one setup,
00:03:44.560 | where you just play against someone else,
00:03:47.320 | or even the built-in AI, right?
00:03:48.800 | The Blizzard put a system
00:03:50.680 | that can play the game reasonably well,
00:03:52.520 | if you don't know how to play.
00:03:54.400 | And then, in this board, you have, again, pieces,
00:03:57.720 | like in chess, but these pieces are not there initially,
00:04:01.320 | like they are in chess.
00:04:02.240 | You actually need to decide to gather resources,
00:04:05.720 | to decide which pieces to build.
00:04:07.800 | So, in a way, you're starting almost with no pieces.
00:04:10.680 | You start gathering resources.
00:04:12.640 | In StarCraft, there's minerals and gas that you can gather,
00:04:16.120 | and then you must decide how much do you want to focus,
00:04:19.360 | for instance, on gathering more resources
00:04:21.400 | or starting to build units or pieces.
00:04:24.280 | And then, once you have enough pieces,
00:04:27.120 | or maybe like attack, a good attack composition,
00:04:32.040 | then you go and attack the other side of the map.
00:04:35.400 | And now, the other main difference with chess
00:04:37.720 | is that you don't see the other side of the map.
00:04:39.840 | So, you're not seeing the moves of the enemy.
00:04:43.280 | It's what we call partially observable.
00:04:45.360 | So, as a result, you must not only decide
00:04:48.600 | trading off economy versus building your own units,
00:04:52.200 | but you also must decide whether you want to scout
00:04:54.840 | to gather information, but also by scouting,
00:04:57.720 | you might be giving away some information
00:04:59.400 | that you might be hiding from the enemy.
00:05:01.840 | So, there's a lot of complex decision-making
00:05:04.880 | all in real time.
00:05:05.880 | There's also, unlike chess, this is not a turn-based game.
00:05:10.000 | You play basically all the time, continuously,
00:05:13.560 | and thus, some skill in terms of speed and accuracy
00:05:16.800 | of clicking is also very important.
00:05:18.800 | And people that train for this really play this game
00:05:21.360 | at an amazing skill level.
00:05:23.440 | I've seen many times this,
00:05:25.680 | and if you can witness this live,
00:05:27.240 | it's really, really impressive.
00:05:29.360 | So, in a way, it's kind of a chess
00:05:31.280 | where you don't see the other side of the board.
00:05:33.280 | You're building your own pieces,
00:05:35.080 | and you also need to gather resources
00:05:37.080 | to basically get some money to build other buildings,
00:05:40.560 | pieces, technology, and so on.
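[Editor's note: the "chess where you don't see the other side of the board" idea above — partial observability via fog of war — can be sketched as an observation function over a shared game state. This is a toy illustration with made-up unit and sight-range values, not the actual StarCraft II API:]

```python
# Toy sketch of fog of war: the full state holds both players' units,
# but each player only observes enemy units within sight range of
# one of their own units. SIGHT_RANGE is an illustrative constant.
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    owner: int  # player id, 0 or 1
    x: int
    y: int

SIGHT_RANGE = 3  # assumption for this toy, not the real game's value

def observe(state, player):
    """Units visible to `player`: all of their own, plus enemies within
    SIGHT_RANGE (Chebyshev distance) of any of their own units."""
    mine = [u for u in state if u.owner == player]
    visible_enemies = [
        e for e in state
        if e.owner != player
        and any(max(abs(e.x - m.x), abs(e.y - m.y)) <= SIGHT_RANGE for m in mine)
    ]
    return mine + visible_enemies

state = [Unit(0, 0, 0), Unit(1, 2, 2), Unit(1, 10, 10)]
print(len(observe(state, 0)))  # → 2: own unit plus the nearby enemy; the far enemy is hidden
```

This is why scouting matters: moving a unit changes what `observe` returns, at the cost of revealing that unit to the opponent.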
00:05:42.720 | - From the perspective of a human player,
00:05:45.000 | the difference between that and chess,
00:05:47.040 | or maybe that and a game like turn-based strategy,
00:05:50.640 | like "Heroes of Might and Magic,"
00:05:52.640 | is that there's an anxiety,
00:05:55.000 | 'cause you have to make these decisions really quickly.
00:05:58.640 | And if you are not actually aware of what decisions work,
00:06:03.640 | it's a very stressful balance.
00:06:06.320 | Everything you describe is actually quite stressful,
00:06:08.720 | difficult to balance for an amateur human player.
00:06:11.560 | I don't know if it gets easier at the professional level.
00:06:14.000 | Like, if they're fully aware of what they have to do,
00:06:16.320 | but at the amateur level, there's this anxiety,
00:06:19.080 | "Oh, crap, I'm being attacked.
00:06:20.280 | "Oh, crap, I have to build up resources.
00:06:22.560 | "Oh, I have to probably expand."
00:06:24.160 | And all these, the real-time strategy aspect
00:06:28.360 | is really stressful, and computation, I'm sure, difficult.
00:06:31.200 | We'll get into it, but for me, Battle.net,
00:06:35.840 | so StarCraft was released in '98, 20 years ago,
00:06:41.920 | which is hard to believe.
00:06:43.880 | And Blizzard Battle.net with Diablo in '96 came out.
00:06:48.880 | And to me, it might be a narrow perspective,
00:06:52.600 | but it changed online gaming, and perhaps society forever.
00:06:57.240 | But I may have a way too narrow viewpoint,
00:07:00.320 | but from your perspective,
00:07:02.240 | can you talk about the history of gaming
00:07:05.600 | over the past 20 years?
00:07:07.000 | Is this, how transformational,
00:07:09.640 | how important is this line of games?
00:07:12.760 | - Right, so I think I kind of was an active gamer
00:07:16.920 | whilst this was developing, the internet, online gaming.
00:07:20.560 | So for me, the way it came was
00:07:23.040 | I played other games, strategy-related.
00:07:26.400 | I played a bit of Command & Conquer,
00:07:28.400 | and then I played Warcraft II, which is from Blizzard.
00:07:31.840 | But at the time, I didn't know,
00:07:33.040 | I didn't understand about what Blizzard was or anything.
00:07:36.040 | Warcraft II was just a game,
00:07:37.320 | which was actually very similar to StarCraft in many ways.
00:07:40.280 | It's also a real-time strategy game,
00:07:42.480 | where there's orcs and humans, so there's only two races.
00:07:45.400 | - But it was offline.
00:07:46.520 | - And it was offline, right?
00:07:48.000 | So I remember a friend of mine came to school,
00:07:51.600 | say, "Oh, there's this new cool game called StarCraft."
00:07:53.960 | And I just said,
00:07:54.800 | "Oh, this sounds like just a copy of Warcraft II,"
00:07:57.600 | until I kind of installed it.
00:07:59.720 | And at the time, I am from Spain,
00:08:01.960 | so we didn't have very good internet, right?
00:08:04.640 | So there was, for us, StarCraft became first
00:08:07.600 | kind of an offline experience,
00:08:09.520 | where you kind of start to play these missions, right?
00:08:12.920 | You play against some sort of scripted things
00:08:15.720 | to develop the story of the characters in the game.
00:08:18.960 | And then later on, I start playing against the built-in AI,
00:08:23.480 | and I thought it was impossible to defeat it.
00:08:26.120 | Then eventually you defeat one,
00:08:27.440 | and you can actually play against seven built-in AIs
00:08:29.720 | at the same time, which also felt impossible,
00:08:32.680 | but actually it's not that hard to beat
00:08:35.280 | seven built-in AIs at once.
00:08:36.960 | So once we achieved that,
00:08:39.160 | also we discovered that we could play,
00:08:42.120 | as I said, internet wasn't that great,
00:08:43.840 | but we could play with a LAN, right?
00:08:45.920 | Like basically against each other
00:08:48.040 | if we were in the same place,
00:08:49.920 | because you could just connect machines with cables, right?
00:08:53.680 | So we started playing in LAN mode,
00:08:55.640 | as a group of friends, and it was really, really,
00:08:59.280 | like much more entertaining than playing against AIs.
00:09:02.320 | And later on, as internet was starting to develop
00:09:05.160 | and being a bit faster and more reliable,
00:09:07.400 | then it's when I started experiencing Battle.net,
00:09:09.720 | which is this amazing universe,
00:09:11.560 | not only because of the fact that you can play the game
00:09:14.720 | against anyone in the world,
00:09:16.440 | but you can also get to know more people.
00:09:20.200 | You just get exposed to now like this vast variety of,
00:09:23.080 | it's kind of a bit when the chats came about, right?
00:09:25.840 | There was a chat system, you could play against people,
00:09:29.040 | but you could also chat with people,
00:09:30.720 | not only about StarCraft, but about anything.
00:09:32.480 | And that became a way of life for kind of two years,
00:09:36.640 | and obviously then it became like kind of,
00:09:38.840 | it exploded in me that I started to play more seriously,
00:09:42.200 | going to tournaments and so on and so forth.
00:09:44.640 | - Do you have a sense on a societal sociological level,
00:09:49.640 | what's this whole part of society
00:09:52.200 | that many of us are not aware of?
00:09:53.760 | And it's a huge part of society, which is gamers.
00:09:56.840 | I mean, every time I come across that on YouTube
00:10:00.920 | or streaming sites, I mean,
00:10:03.160 | there's a huge number of people who play games religiously.
00:10:07.600 | Do you have a sense of those folks,
00:10:08.880 | especially now that you've returned to that realm
00:10:10.840 | a little bit on the AI side?
00:10:12.560 | - Yeah, so in fact, even after StarCraft,
00:10:15.840 | I actually played World of Warcraft,
00:10:17.560 | which is maybe the main sort of online worlds
00:10:21.360 | and presence that you get to interact with lots of people.
00:10:24.600 | So I played that for a little bit.
00:10:26.320 | It was, to me, it was a bit less stressful than StarCraft
00:10:29.000 | because winning was kind of a given.
00:10:30.840 | You just put in this world and you can always
00:10:33.320 | complete missions.
00:10:34.920 | But I think it was actually the social aspect of,
00:10:38.640 | especially StarCraft first,
00:10:40.400 | and then games like World of Warcraft,
00:10:43.320 | really shaped me in a very interesting way.
00:10:46.880 | Because what you get to experience is just people
00:10:49.640 | you wouldn't usually interact with, right?
00:10:51.600 | So even nowadays, I still have many Facebook friends
00:10:54.920 | from the era where I played online
00:10:56.880 | and their ways of thinking is even political.
00:11:00.040 | They just, we don't live in,
00:11:01.560 | like we don't interact in the real world,
00:11:03.640 | but we were connected by basically fiber.
00:11:06.720 | And that way I actually get to understand a bit better
00:11:10.760 | that we live in a diverse world.
00:11:12.760 | And these were just connections that were made by,
00:11:15.560 | because I happened to go in a city,
00:11:18.040 | in a virtual city as a priest,
00:11:20.640 | and I met this warrior and we became friends,
00:11:23.600 | and then we started playing together, right?
00:11:25.640 | So I think it's transformative
00:11:28.720 | and more and more and more people are more aware of it.
00:11:31.240 | I mean, it's becoming quite mainstream,
00:11:33.440 | but back in the day, as you were saying in 2000, 2005,
00:11:37.560 | even it was very, still very strange thing to do,
00:11:42.040 | especially in Europe.
00:11:44.200 | I think there were exceptions like Korea, for instance,
00:11:47.120 | it was amazing that everything happened so early
00:11:50.560 | in terms of cyber cafes.
00:11:52.160 | Like if you go to Seoul, it's a city that,
00:11:56.280 | back in the day, StarCraft was kind of,
00:11:58.360 | you could be a celebrity by playing StarCraft,
00:12:00.600 | but this was like 99, 2000, right?
00:12:03.000 | It's not like recently.
00:12:04.120 | So yeah, it's quite interesting to look back.
00:12:08.480 | And yeah, I think it's changing society,
00:12:10.920 | the same way, of course, like technology
00:12:13.080 | and social networks and so on are also transforming things.
00:12:16.880 | - And a quick tangent, let me ask,
00:12:18.400 | you're also one of the most productive people
00:12:20.960 | in your particular chosen passion and path in life.
00:12:26.400 | And yet you're also appreciate and enjoy video games.
00:12:29.440 | Do you think it's possible to do,
00:12:31.160 | to enjoy video games in moderation?
00:12:35.760 | - Someone told me that you could choose two out of three.
00:12:39.880 | When I was playing video games,
00:12:41.120 | you could choose having a girlfriend,
00:12:43.680 | playing video games or studying.
00:12:46.200 | And I think for the most part, it was relatively true.
00:12:50.520 | These things do take time.
00:12:52.320 | Games like StarCraft, if you take the game pretty seriously
00:12:55.360 | and you wanna study it,
00:12:56.480 | then you obviously will dedicate more time to it.
00:12:59.040 | And I definitely took gaming
00:13:01.160 | and obviously studying very seriously.
00:13:03.640 | I love learning science and et cetera.
00:13:08.640 | So to me, especially when I started university undergrad,
00:13:13.080 | I kind of step off StarCraft,
00:13:14.880 | I actually fully stopped playing.
00:13:16.760 | And then World of Warcraft was a bit more casual.
00:13:19.000 | You could just connect online and I mean, it was fun.
00:13:22.840 | But as I said, that was not as much time investment
00:13:26.800 | as it was for me in StarCraft.
00:13:29.440 | - Okay, so let's get into AlphaStar.
00:13:31.600 | What are the, you're behind the team.
00:13:35.160 | So DeepMind has been working on StarCraft
00:13:37.200 | and released a bunch of cool open source agents
00:13:39.400 | and so on in the past few years.
00:13:41.280 | But AlphaStar really is the moment where
00:13:43.720 | the first time you beat a world class player.
00:13:49.120 | So what are the parameters of the challenge
00:13:51.560 | in the way that AlphaStar took it on?
00:13:53.440 | And how did you and David and the rest of the DeepMind team
00:13:57.400 | get into it?
00:13:58.240 | Consider that you can even beat the best in the world
00:14:00.920 | or top players.
00:14:02.440 | - I think it all started back in 2015.
00:14:07.440 | Actually I'm lying, I think it was 2014
00:14:10.760 | when DeepMind was acquired by Google.
00:14:13.640 | And I at the time was at Google Brain,
00:14:15.640 | which is it was in California, is still in California.
00:14:18.880 | We had this summit where we got together, the two groups.
00:14:21.800 | So Google Brain and Google DeepMind got together
00:14:24.360 | and we gave a series of talks.
00:14:26.320 | And given that they were doing
00:14:28.600 | deep reinforcement learning for games,
00:14:30.560 | I decided to bring up part of my past,
00:14:33.600 | which I had developed at Berkeley,
00:14:35.040 | like this thing which we call Berkeley Overmind,
00:14:37.360 | which is really just a StarCraft one bot, right?
00:14:40.120 | So I talked about that.
00:14:42.120 | And I remember Demis just came to me and said,
00:14:44.200 | "Well, maybe not now, it's perhaps a bit too early,
00:14:47.080 | but you should just come to DeepMind
00:14:48.880 | and do this again with deep reinforcement learning."
00:14:53.680 | And at the time it sounded very science fiction
00:14:56.520 | for several reasons.
00:14:58.720 | But then in 2016, when I actually moved to London
00:15:01.480 | and joined DeepMind, transferring from Brain,
00:15:04.720 | it became apparent that because of the AlphaGo moment
00:15:08.160 | and kind of Blizzard reaching out to us to say,
00:15:11.160 | "Wait, do you want the next challenge?"
00:15:12.960 | And also me being full-time at DeepMind,
00:15:15.040 | so sort of kind of all these came together.
00:15:17.400 | And then I went to Irvine in California,
00:15:20.920 | to the Blizzard headquarters,
00:15:22.520 | to just chat with them and try to explain
00:15:25.320 | how would it all work before you do anything.
00:15:27.760 | And the approach has always been
00:15:30.640 | about the learning perspective, right?
00:15:33.600 | So in Berkeley, we did a lot of rule-based conditioning
00:15:38.600 | and if you have more than three units, then go attack.
00:15:42.480 | And if the other has more units than me, I retreat,
00:15:45.000 | and so on and so forth.
00:15:46.320 | And of course, the point of deep reinforcement learning,
00:15:48.760 | deep learning, machine learning in general,
00:15:50.440 | is that all these should be learned behavior.
00:15:53.400 | So that kind of was the DNA of the project
00:15:56.920 | since its inception in 2016,
00:15:59.400 | where we just didn't even have an environment to work with.
00:16:02.840 | And so that's how it all started really.
00:16:05.800 | - So if you go back to that conversation with Demis,
00:16:08.520 | or even in your own head, how far away did you,
00:16:12.160 | because we're talking about Atari games,
00:16:14.520 | we're talking about Go, which is kind of,
00:16:16.760 | if you're honest about it,
00:16:17.840 | really far away from StarCraft.
00:16:19.800 | Well, now that you've beaten it,
00:16:22.200 | maybe you could say it's close,
00:16:23.320 | but it seems like StarCraft is way harder than Go,
00:16:27.200 | philosophically and mathematically speaking.
00:16:30.920 | So how far away did you think you were?
00:16:35.080 | Do you think in 2019 and '18,
00:16:37.280 | you could be doing as well as you have?
00:16:38.760 | - Yeah, when I kind of thought about,
00:16:40.880 | okay, I'm gonna dedicate a lot of my time
00:16:43.800 | and focus on this,
00:16:44.880 | and obviously I do a lot of different research
00:16:48.080 | in deep learning, so spending time on it,
00:16:50.400 | I mean, I really had to kind of think
00:16:52.240 | there's gonna be something good happening out of this.
00:16:55.880 | So really I thought, well, this sounds impossible,
00:16:59.120 | and it probably is impossible to do the full thing,
00:17:01.600 | like the all, like the full game,
00:17:04.280 | where you play one versus one,
00:17:06.240 | and it's only a neural network playing and so on.
00:17:09.760 | So it really felt like,
00:17:10.960 | I just didn't even think it was possible.
00:17:14.000 | But on the other hand,
00:17:14.840 | I could see some stepping stones like towards that goal.
00:17:19.080 | Clearly you could define sub-problems in StarCraft
00:17:21.600 | and sort of dissect it a bit and say,
00:17:23.400 | okay, here is a part of the game, here's another part.
00:17:26.720 | And also, obviously the fact,
00:17:29.400 | so this was really also critical to me,
00:17:31.280 | the fact that we could access human replays, right?
00:17:34.400 | So Blizzard was very kind,
00:17:35.720 | and in fact, they open sourced this for the whole community
00:17:38.560 | where you can just go,
00:17:39.960 | and it's not every single StarCraft game ever played,
00:17:43.040 | but it's a lot of them, you can just go and download.
00:17:45.880 | And every day they will, you can just query a dataset
00:17:48.720 | and say, well, give me all the games that were played today.
00:17:51.680 | And given my kind of experience with language and sequences
00:17:56.680 | and supervised learning, I thought, well,
00:17:58.560 | that's definitely gonna be very helpful
00:18:00.760 | and something quite unique now,
00:18:02.440 | because ever before we had such a large dataset of replays
00:18:08.240 | of people playing the game at this scale
00:18:11.000 | of such a complex video game, right?
00:18:12.520 | So that to me was a precious resource.
00:18:15.640 | And as soon as I knew that Blizzard was able
00:18:18.000 | to kind of give this to the community,
00:18:20.920 | I started to feel positive
00:18:22.240 | about something non-trivial happening.
00:18:24.240 | But I also thought the full thing, like really no rules,
00:18:28.360 | no single line of code that tries to say, well, I mean,
00:18:31.680 | if you see this, you need to build a detector,
00:18:33.320 | all these, not having any of these specializations
00:18:36.680 | seemed really, really, really difficult to me.
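[Editor's note: the replay dataset Vinyals describes enables supervised learning from human play, often called behavior cloning: treat each replay as a sequence of (observation, action) pairs and fit a model to predict the human's action. The tiny tabular "model" and scalar observation below are illustrative assumptions, not the AlphaStar pipeline:]

```python
# Behavior-cloning toy: learn to imitate a "human" policy from replays.
from collections import Counter, defaultdict

def human_policy(obs: int) -> int:
    """Stand-in for human behavior: mine early (action 0), attack late (action 3)."""
    return 0 if obs < 5 else 3

# "Replays": here an observation is just a scalar game-time feature,
# and each replay step records what the human actually did.
replays = [(t, human_policy(t)) for t in range(10) for _ in range(20)]

# A trivial tabular model: the majority human action for each observation.
counts = defaultdict(Counter)
for obs, act in replays:
    counts[obs][act] += 1

def cloned_policy(obs: int) -> int:
    return counts[obs].most_common(1)[0][0]

print(all(cloned_policy(t) == human_policy(t) for t in range(10)))  # → True
```

In the real setting the tabular model would be a neural network over game observations, but the principle is the same: imitation gives the agent sensible behavior before any reinforcement learning begins.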
00:18:39.120 | - Intuitively.
00:18:39.960 | I do also like that Blizzard was teasing
00:18:42.680 | or even trolling you, sort of almost, yeah,
00:18:47.680 | pulling you in into this really difficult challenge.
00:18:50.280 | Did they have any awareness?
00:18:51.840 | What's the interest from the perspective of Blizzard,
00:18:55.640 | except just curiosity?
00:18:57.280 | - Yeah, I think Blizzard has really understood
00:18:59.440 | and really bring forward this competitiveness
00:19:03.240 | of eSports in games.
00:19:04.760 | StarCraft really kind of sparked a lot of,
00:19:07.800 | like something that almost was never seen,
00:19:10.680 | especially as I was saying, back in Korea.
00:19:13.920 | So they just probably thought, well,
00:19:16.400 | this is such a pure one versus one setup
00:19:18.840 | that it would be great to see if something
00:19:21.840 | that can play Atari or Go and then later on chess
00:19:26.640 | could even tackle these kind of complex
00:19:29.200 | real-time strategy game, right?
00:19:30.560 | So for them, they wanted to see first, obviously,
00:19:33.840 | whether it was possible, if the game they created
00:19:38.080 | was in a way solvable to some extent.
00:19:40.800 | And I think on the other hand,
00:19:42.120 | they also are a pretty modern company that innovates a lot.
00:19:45.720 | So just starting to understand AI for them
00:19:48.480 | to how to bring AI into games is not AI for games,
00:19:52.360 | but games for AI, right?
00:19:54.280 | I mean, both ways I think can work.
00:19:56.080 | And we obviously at DeepMind use games for AI, right?
00:20:00.000 | To drive AI progress, but Blizzard might actually be able
00:20:03.400 | to do and many other companies to start to understand
00:20:06.040 | and do the opposite.
00:20:06.880 | So I think that is also something
00:20:08.600 | they can get out of this.
00:20:09.760 | And they definitely, we have brainstormed a lot about this.
00:20:13.680 | - But one of the interesting things to me about StarCraft
00:20:16.040 | and Diablo and these games that Blizzard has created
00:20:19.360 | is the task of balancing classes, for example,
00:20:23.520 | sort of making the game fair from the starting point
00:20:27.440 | and then let skill determine the outcome.
00:20:30.920 | Is there, I mean, can you first comment,
00:20:33.560 | there's three races, Zerg, Protoss and Terran.
00:20:36.760 | I don't know if I've ever said that out loud.
00:20:38.920 | Is that how you pronounce it, Terran?
00:20:40.560 | - Terran, yeah.
00:20:41.920 | (laughing)
00:20:44.120 | - Yeah, I don't think I've ever in-person interacted
00:20:46.480 | with anybody about StarCraft, that's funny.
00:20:48.640 | So they seem to be pretty balanced.
00:20:51.760 | I wonder if the AI, the work that you're doing
00:20:56.240 | with AlphaStar would help balance them even further.
00:20:59.160 | Is that something you think about?
00:21:00.520 | Is that something that Blizzard is thinking about?
00:21:03.320 | - Right, so balancing when you add a new unit
00:21:06.400 | or a new spell type is obviously possible
00:21:09.120 | given that you can always train or pre-train at scale
00:21:13.160 | some agent that might start using that in unintended ways.
00:21:16.680 | But I think actually, if you understand how StarCraft
00:21:19.920 | has kind of co-evolved with players,
00:21:22.200 | in a way, I think it's actually very cool,
00:21:24.320 | the ways that many of the things and strategies
00:21:27.400 | that people came up with, right?
00:21:28.680 | So I think we've seen it over and over in StarCraft
00:21:32.280 | that Blizzard comes up with maybe a new unit
00:21:34.960 | and then some players get creative
00:21:37.240 | and do something kind of unintentional
00:21:39.080 | or something that Blizzard designers
00:21:40.840 | that just simply didn't test or think about.
00:21:43.560 | And then after that becomes kind of mainstream
00:21:46.200 | in the community, Blizzard patches the game
00:21:48.280 | and then they kind of maybe weaken that strategy
00:21:51.880 | or make it actually more interesting,
00:21:53.880 | but a bit more balanced.
00:21:55.400 | So these kind of continual talk between players and Blizzard
00:21:58.520 | is kind of what has defined them actually
00:22:01.680 | in actually most games, in StarCraft,
00:22:04.040 | but also in World of Warcraft, they would do that.
00:22:06.440 | There are several classes and it would be not good
00:22:09.280 | that everyone plays absolutely the same race and so on.
00:22:13.200 | So I think they do care about balancing, of course,
00:22:17.240 | and they do a fair amount of testing,
00:22:19.600 | but it's also beautiful to also see
00:22:22.120 | how players get creative anyways.
00:22:24.480 | And I mean, whether AI can be more creative at this point,
00:22:27.440 | I don't think so, right?
00:22:28.680 | I mean, it's just sometimes something so amazing happens.
00:22:31.560 | Like I remember back in the days,
00:22:33.680 | like you have these drop ships that could drop the reavers
00:22:36.920 | and that was actually not thought about
00:22:39.560 | that you could drop this unit
00:22:41.240 | that has this what's called splash damage
00:22:43.200 | that would basically eliminate
00:22:45.600 | all the enemies workers at once.
00:22:47.800 | No one thought that you could actually put them
00:22:50.120 | in really early game, do that kind of damage
00:22:53.040 | and then things change in the game.
00:22:55.400 | But I don't know, I think it's quite an amazing
00:22:58.000 | exploration process from both sides,
00:23:00.280 | players and Blizzard alike.
00:23:01.840 | - Well, it's almost like a reinforcement learning
00:23:04.280 | exploration, but I mean, the scale of humans
00:23:07.000 | that play Blizzard games is almost on the scale
00:23:12.000 | of a large scale, deep mind RL experiment.
00:23:15.320 | I mean, if you look at the numbers,
00:23:17.200 | that's, I mean, you're talking about,
00:23:18.680 | I don't know how many games,
00:23:19.520 | but hundreds of thousands of games probably a month.
00:23:22.040 | - Yeah.
00:23:22.880 | - So you could, it's almost the same as running RL agents.
00:23:27.880 | What aspect of the problem of StarCraft
00:23:31.200 | do you think is the hardest?
00:23:32.120 | Is it the, like you said, the imperfect information?
00:23:35.360 | Is it the fact they have to do long-term planning?
00:23:38.120 | Is it the real time aspects?
00:23:40.280 | We have to do stuff really quickly.
00:23:42.200 | Is it the fact that a large action space
00:23:44.720 | so you can do so many possible things?
00:23:47.600 | Or is it, you know, in the game theoretic sense,
00:23:51.080 | there is no Nash equilibrium,
00:23:52.400 | at least you don't know what the optimal strategy is
00:23:54.200 | 'cause there's way too many options.
00:23:56.480 | - Right.
00:23:57.320 | - Is there something that stands out
00:23:58.160 | as just like the hardest, the most annoying thing?
00:24:00.960 | - So when we sort of looked at the problem
00:24:04.160 | and start to define like the parameters of it, right?
00:24:07.600 | What are the observations?
00:24:08.760 | What are the actions?
00:24:10.520 | It became very apparent that, you know,
00:24:14.000 | the very first barrier that one would hit in StarCraft
00:24:17.120 | would be because of the action space being so large
00:24:20.680 | and us not being able to search like you could in Chess
00:24:24.840 | or Go, even though the search space is vast.
00:24:27.280 | The main problem that we identified
00:24:30.560 | was that of exploration, right?
00:24:32.400 | So without any sort of human knowledge or human prior,
00:24:36.680 | if you think about StarCraft
00:24:38.000 | and you know how deep reinforcement learning algorithms work,
00:24:42.000 | which is essentially by issuing random actions
00:24:45.360 | and hoping that they will get some wins sometimes
00:24:47.800 | so they could learn.
00:24:49.200 | So if you think of the action space in StarCraft,
00:24:52.800 | almost anything you can do in the early game is bad
00:24:55.880 | because any action involves taking workers,
00:24:58.720 | which are mining minerals for free.
00:25:01.360 | That's something that the game does automatically,
00:25:03.560 | sends them to mine.
00:25:04.920 | And you would immediately just take them out of mining
00:25:07.720 | and send them around.
00:25:09.080 | So just thinking how is it gonna be possible
00:25:13.600 | to get to understand these concepts,
00:25:16.880 | but even more like expanding, right?
00:25:19.240 | There's these buildings you can place
00:25:21.080 | in other locations in the map to gather more resources,
00:25:24.160 | but the location of the building is important.
00:25:26.840 | And you have to select a worker,
00:25:28.920 | send it walking to that location,
00:25:31.760 | build the building, wait for the building to be built,
00:25:34.120 | and then put extra workers there.
00:25:36.680 | So they start mining.
00:25:37.800 | That feels like impossible if you just randomly click
00:25:41.720 | to produce that state, desirable state,
00:25:44.520 | that then you could hope to learn from,
00:25:46.960 | because eventually that may yield an extra win, right?
00:25:49.840 | So for me, the exploration problem,
00:25:51.800 | and due to the action space,
00:25:53.800 | and the fact that there's not really turns,
00:25:56.120 | there's so many turns because the game essentially
00:25:59.160 | ticks 22 times per second.
00:26:02.080 | I mean, that's how they can discretize sort of time.
00:26:05.520 | Obviously, you always have to discretize time.
00:26:07.440 | There's no such thing as real time.
00:26:09.600 | But it's really a lot of time steps
00:26:12.560 | of things that could go wrong.
00:26:14.240 | And that definitely felt a priori like the hardest.
00:26:17.960 | You mentioned many good ones.
00:26:19.320 | I think partial observability,
00:26:21.360 | the fact that there is no perfect strategy
00:26:23.440 | because of the partial observability.
00:26:25.520 | Those are very interesting problems.
00:26:26.840 | We start seeing more and more now
00:26:28.520 | in terms of as we solve the previous ones.
00:26:31.040 | But the core problem to me was exploration,
00:26:34.240 | and solving it has been basically kind of the focus
00:26:37.720 | on how we saw the first breakthroughs.
00:26:39.760 | - So exploration in a multi-hierarchical way.
00:26:43.680 | So like 22 times a second exploration
00:26:46.600 | has a very different meaning than it does
00:26:48.640 | in terms of should I gather resources early,
00:26:51.480 | or should I wait, or so on.
00:26:53.160 | So how do you solve the long-term?
00:26:56.200 | Let's talk about the internals of AlphaStar.
00:26:58.080 | So first of all, how do you represent
00:27:01.880 | the state of the game as input?
00:27:05.400 | How do you then do the long-term sequence modeling?
00:27:08.800 | How do you build a policy?
00:27:10.760 | What's the architecture like?
00:27:12.560 | - So AlphaStar has obviously several components,
00:27:16.840 | but everything passes through what we call the policy,
00:27:20.880 | which is a neural network.
00:27:22.280 | And that's kind of the beauty of it.
00:27:24.280 | There is, I could just now give you a neural network
00:27:27.160 | and some weights, and if you fed the right observations
00:27:30.440 | and you understood the actions the same way we do,
00:27:32.520 | you would have basically the agent playing the game.
00:27:35.120 | There's absolutely nothing else needed
00:27:37.240 | other than those weights that were trained.
00:27:40.320 | Now, the first step is observing the game,
00:27:43.360 | and we've experimented with a few alternatives.
00:27:46.640 | The one that we currently use mixes both spatial
00:27:50.280 | sort of images that you would process from the game,
00:27:53.760 | that is the zoomed out version of the map,
00:27:56.400 | and also a zoomed in version of the camera
00:27:58.960 | or the screen as we call it.
00:28:00.880 | But also we give to the agent the list of units
00:28:04.840 | that it sees, more of as a set of objects
00:28:09.000 | that it can operate on.
00:28:11.040 | Using it is not strictly necessary.
00:28:14.760 | And we have versions of the game that play well
00:28:16.840 | without this set vision, which is a bit unlike
00:28:19.760 | how humans perceive the game,
00:28:21.680 | but it certainly helps a lot
00:28:23.640 | because it's a very natural way to encode the game
00:28:26.560 | is by just looking at all the units that there are,
00:28:29.360 | they have properties like health, position, type of unit,
00:28:33.960 | whether it's my unit or the enemy's.
00:28:36.200 | And that sort of is kind of the summary
00:28:40.800 | of the state of the game,
00:28:42.840 | that list of units or set of units
00:28:45.520 | that you see all the time.
00:28:47.400 | - But that's pretty close to the way humans see the game.
00:28:49.560 | Why do you say it's not, isn't that,
00:28:51.480 | you're saying the exactness of it is not similar to humans?
00:28:55.040 | - The exactness of it is perhaps not the problem.
00:28:57.160 | I guess maybe the problem, if you look at it
00:28:59.800 | from how actually humans play the game
00:29:02.280 | is that they play with a mouse and a keyboard and a screen,
00:29:05.720 | and they don't see sort of a structured object
00:29:08.720 | with all the units,
00:29:09.560 | what they see is what they see on the screen, right?
00:29:13.040 | - Remember that there's a, sorry to interrupt,
00:29:14.360 | there's a plot that you showed with the camera-based agent
00:29:16.960 | where you do exactly that, right?
00:29:18.600 | You move around and that seems to converge
00:29:21.080 | to similar performance.
00:29:22.240 | - Yeah, I think that's what I,
00:29:23.520 | we're kind of experimenting with what's necessary or not,
00:29:26.320 | but using the set.
00:29:28.720 | So actually, if you look at research in computer vision,
00:29:32.360 | where it makes a lot of sense to treat images
00:29:36.000 | as two-dimensional arrays,
00:29:38.160 | there's actually a very nice paper from Facebook,
00:29:40.400 | I think, I forgot who the authors are,
00:29:42.760 | but I think it's part of Kaiming He's group.
00:29:46.400 | And what they do is they take an image,
00:29:49.560 | which is this two-dimensional signal,
00:29:51.960 | and they actually take pixel by pixel
00:29:54.320 | and scramble the image as if it was just a list of pixels.
00:29:59.160 | Crucially, they encode the position of the pixels
00:30:01.800 | with the XY coordinates.
00:30:03.720 | And this is just kind of a new architecture,
00:30:06.160 | which we incidentally also use in StarCraft
00:30:08.520 | called the Transformer,
00:30:09.840 | which is a very popular paper from last year,
00:30:12.000 | which yielded very nice results in machine translation.
00:30:15.600 | And if you actually believe in this kind of,
00:30:18.040 | oh, it's actually a set of pixels,
00:30:20.320 | as long as you encode XY, it's okay,
00:30:22.560 | then you could argue that the list of units that we see
00:30:26.080 | is precisely that,
00:30:26.960 | because we have each unit as a kind of pixel, if you will,
00:30:31.480 | and then their XY coordinates.
00:30:33.240 | So in that perspective, without knowing it,
00:30:36.400 | we use the same architecture that was shown
00:30:38.720 | to work very well on Pascal and ImageNet and so on.
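The "units as a set of (features, x, y) tokens" idea described above can be sketched with a bare-bones self-attention pass. This is an editorial illustration in pure Python with made-up unit features; the real model adds learned projections, multiple heads, and stacked layers.

```python
import math

def attend(units):
    """One scaled dot-product self-attention pass over a set of unit vectors.

    Each unit is a feature list [health, x, y, is_enemy]. Because the x, y
    coordinates are part of the features, the set needs no fixed spatial
    ordering -- the same trick as treating an image as a bag of
    (pixel, x, y) tokens.
    """
    d = len(units[0])
    # Pairwise dot-product scores, scaled by sqrt(feature dimension).
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in units]
              for q in units]
    out = []
    for row in scores:
        # Numerically stable softmax over this query's scores.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        w = [e / z for e in exps]
        # Each output vector is a convex mix of all unit vectors.
        out.append([sum(wi * u[j] for wi, u in zip(w, units)) for j in range(d)])
    return out

units = [
    [1.0, 0.1, 0.2, 0.0],   # my worker at (0.1, 0.2)
    [0.5, 0.8, 0.9, 1.0],   # enemy unit at (0.8, 0.9)
]
mixed = attend(units)
```

Note that the output is permutation-equivariant: shuffling the input set just shuffles the outputs, which is exactly why a set encoding works where a fixed grid is not required.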
00:30:41.400 | - So the interesting thing here is putting it in that way,
00:30:45.440 | it starts to move it towards
00:30:46.960 | the way you usually work with language.
00:30:49.480 | So what, and especially with your expertise
00:30:52.760 | and work in language,
00:30:55.520 | it seems like there's echoes of a lot of the way
00:30:59.320 | you would work with natural language
00:31:00.720 | in the way you've approached AlphaStar.
00:31:02.400 | - Right.
00:31:03.240 | - What's, does that help with the long-term
00:31:05.880 | sequence modeling there somehow?
00:31:08.200 | - Exactly, so now that we understand what an observation
00:31:11.200 | for a given time step is, we need to move on to say,
00:31:14.680 | well, there's gonna be a sequence of such observations,
00:31:17.760 | and an agent will need to, given all that it's seen,
00:31:21.120 | not only the current time step, but all that it's seen,
00:31:23.760 | why? Because there is partial observability.
00:31:25.960 | We must remember whether we saw a worker
00:31:28.440 | going somewhere, for instance, right?
00:31:30.160 | Because then there might be an expansion
00:31:31.760 | on the top right of the map.
00:31:33.640 | So given that, what you must then think about
00:31:37.840 | is there is the problem of given all the observations,
00:31:40.400 | you have to predict the next action.
00:31:42.640 | And not only given all the observations,
00:31:44.520 | but given all the observations
00:31:45.960 | and given all the actions you've taken,
00:31:47.920 | predict the next action.
00:31:49.360 | And that sounds exactly like machine translation,
00:31:52.480 | where, and that's exactly how kind of I saw the problem,
00:31:57.160 | especially when you are given supervised data
00:32:00.000 | or replaced from humans,
00:32:01.760 | because the problem is exactly the same.
00:32:03.600 | You're translating essentially a prefix
00:32:06.680 | of observations and actions onto what's gonna happen next,
00:32:10.160 | which is exactly how you would train a model,
00:32:11.960 | to translate or to generate language as well, right?
00:32:14.760 | You have a certain prefix,
00:32:16.640 | you must remember everything that comes in the past
00:32:19.000 | because otherwise you might start having non-coherent text.
00:32:22.640 | And the same architectures,
00:32:25.080 | we're using LSTMs and transformers to operate on,
00:32:28.920 | across time to kind of integrate
00:32:30.960 | all that's happened in the past.
00:32:33.080 | Those architectures that work so well
00:32:35.000 | in translation or language modeling
00:32:36.880 | are exactly the same as what the agent is using
00:32:40.640 | to issue actions in the game.
00:32:42.360 | And the way we train it, moreover, for imitation,
00:32:44.760 | which is step one of AlphaStar, is,
00:32:47.120 | take all the human experience and try to imitate it,
00:32:49.880 | much like you try to imitate translators
00:32:52.920 | that translated many pairs of sentences
00:32:55.360 | from French to English, say,
00:32:57.280 | that sort of principle applies exactly the same.
00:33:00.200 | It's almost the same code, except that instead of words,
00:33:04.520 | you have a slightly more complicated objects,
00:33:06.640 | which are the observations and the actions
00:33:08.920 | are also a bit more complicated than a word.
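To make the translation analogy concrete, here is a toy editorial sketch: a count-based model stands in for the LSTM/Transformer, and, given a prefix of (observation, action) pairs from replays, it predicts the next action. The replay tokens are invented for illustration.

```python
from collections import defaultdict

def train(replays):
    """Count, for each (observation, action, ...) prefix, the next actions seen."""
    counts = defaultdict(lambda: defaultdict(int))
    for replay in replays:
        prefix = ()
        for obs, act in replay:
            # The "source sentence" is the prefix plus the new observation;
            # the "target word" is the action the human took next.
            counts[prefix + (obs,)][act] += 1
            prefix = prefix + (obs, act)
    return counts

def predict(counts, prefix, obs):
    """Return the most common next action after this prefix, or None."""
    dist = counts.get(tuple(prefix) + (obs,), {})
    return max(dist, key=dist.get) if dist else None

replays = [
    [("start", "build_worker"), ("12_workers", "expand")],
    [("start", "build_worker"), ("12_workers", "expand")],
    [("start", "scout"), ("enemy_rush", "build_defense")],
]
model = train(replays)
picked = predict(model, [], "start")  # -> "build_worker"
```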
00:33:11.800 | - Is there a self-play component then too?
00:33:13.960 | So once you run out of imitation?
00:33:16.520 | - Right, so indeed you can bootstrap from human replays,
00:33:21.520 | but then the agents you get are actually not as good
00:33:26.000 | as the humans you imitated, right?
00:33:28.200 | So how do we imitate?
00:33:30.480 | Well, we take humans from 3000 MMR and higher.
00:33:34.320 | 3000 MMR is just a metric of human skill
00:33:38.000 | and 3000 MMR might be like the 50th percentile, right?
00:33:41.920 | So it's just average human.
00:33:43.800 | - What's that?
00:33:44.640 | So maybe quick pause.
00:33:45.480 | MMR is a ranking scale, the matchmaking rating for players.
00:33:50.360 | So it's 3000, I remember there's like a master
00:33:52.360 | and a grandmaster, what's 3000?
00:33:54.160 | - So 3000 is pretty bad.
00:33:56.800 | I think it's kind of gold level.
00:33:58.520 | - It just sounds really good relative to chess, I think.
00:34:00.720 | - Oh yeah, yeah, no, the ratings,
00:34:02.520 | the best in the world are at 7,000 MMR.
00:34:04.520 | - 7,000.
00:34:05.400 | - So 3000, it's a bit like Elo indeed, right?
00:34:07.920 | So 3,500 just allows us to not filter a lot of the data.
00:34:12.920 | So we like to have a lot of data in deep learning
00:34:15.720 | as you probably know.
00:34:17.360 | So we take these kind of 3,500 and above,
00:34:20.680 | but then we do a very interesting trick,
00:34:22.760 | which is we tell the neural network
00:34:25.040 | what level they are imitating.
00:34:27.600 | So we say, this replay you're gonna try to imitate
00:34:30.840 | to predict the next action for all the actions
00:34:33.080 | that you're gonna see is a 4,000 MMR replay.
00:34:36.120 | This one is a 6,000 MMR replay.
00:34:38.840 | And what's cool about this is then we take this policy
00:34:42.520 | that is being trained from human,
00:34:44.320 | and then we can ask it to play like a 3000 MMR player
00:34:47.440 | by setting a bit saying, well, okay,
00:34:49.600 | play like a 3000 MMR player or play like a 6,000 MMR player.
00:34:53.680 | And you actually see how the policy behaves differently.
00:34:57.280 | It gets worse economy if you play like a gold level player,
00:35:01.480 | it does less actions per minute,
00:35:02.960 | which is the number of clicks or number of actions
00:35:05.280 | that you will issue in a whole minute.
00:35:07.760 | And it's very interesting to see that it kind of imitates
00:35:10.520 | the skill level quite well.
00:35:12.360 | But if we ask it to play like a 6,000 MMR player,
00:35:15.480 | we tested of course these policies to see how well they do.
00:35:18.600 | They actually beat all the built-in AIs
00:35:20.560 | that Blizzard put in the game,
00:35:22.400 | but they're nowhere near 6,000 MMR players, right?
00:35:24.960 | They might be maybe around gold level, platinum perhaps.
00:35:29.240 | So there's still a lot of work to be done for the policy
00:35:32.200 | to truly understand what it means to win.
00:35:34.960 | So far, we only asked them, okay, here is the screen,
00:35:38.200 | and that's what's happened on the game until this point.
00:35:41.600 | What would the next action be if we ask a pro to now say,
00:35:46.080 | oh, you're gonna click here or here or there.
00:35:49.080 | And the point is experiencing wins and losses
00:35:53.640 | is very important to then start to refine.
00:35:56.320 | Otherwise the policy can get loose,
00:35:58.320 | can just go off policy as we call it.
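The MMR-conditioning trick described above amounts to feeding the replay's skill rating to the network as an extra input during imitation, then dialing it at test time. A minimal sketch, with made-up rating buckets and a one-hot skill encoding; the actual conditioning scheme may differ.

```python
# Sketch of skill-conditioned imitation: the replay's MMR is appended to
# the observation, so one policy can be asked to "play like" a given
# rating. Buckets below are illustrative, not AlphaStar's actual values.

MMR_BUCKETS = [3000, 4000, 5000, 6000, 7000]

def mmr_bucket(mmr):
    """Map a raw MMR onto the nearest training bucket, rounding down."""
    eligible = [b for b in MMR_BUCKETS if b <= mmr]
    return eligible[-1] if eligible else MMR_BUCKETS[0]

def one_hot(bucket):
    return [1.0 if b == bucket else 0.0 for b in MMR_BUCKETS]

def conditioned_input(observation_vec, mmr):
    # The policy consumes [observation ; skill one-hot] as a single vector;
    # at test time you set the one-hot to request a skill level.
    return observation_vec + one_hot(mmr_bucket(mmr))

x = conditioned_input([0.2, 0.7], 6400)  # conditioned on the 6000 bucket
```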
00:36:00.440 | - That's so interesting that you can at least hope
00:36:02.920 | eventually to be able to control a policy
00:36:06.760 | approximately to be at some MMR level.
00:36:09.960 | That's so interesting,
00:36:11.480 | especially given that you have ground truth
00:36:13.320 | for a lot of these cases.
00:36:15.040 | I can ask you a personal question.
00:36:17.600 | What's your MMR?
00:36:19.280 | - Well, I haven't played StarCraft II, so I am unranked,
00:36:23.440 | which is the kind of lowest league.
00:36:25.440 | - Okay.
00:36:26.280 | - So I used to play StarCraft I, the first one.
00:36:29.640 | - But you haven't seriously played StarCraft II?
00:36:31.480 | - No, not StarCraft II.
00:36:32.720 | So the best player we have at DeepMind is about 5,000 MMR,
00:36:37.720 | which is high masters.
00:36:39.640 | It's not at Grand Master level.
00:36:42.120 | Grand Master level would be the top 200 players
00:36:44.720 | in a certain region like Europe or America or Asia.
00:36:47.960 | But for me, it would be hard to say.
00:36:51.640 | I am very bad at the game.
00:36:53.760 | I actually played AlphaStar a bit too late and it beat me.
00:36:56.680 | I remember the whole team was, "Oh, Oriol, you should play."
00:36:59.760 | And I was, "Oh, it looks like it's not so good yet."
00:37:02.240 | And then I remember I kind of got busy
00:37:04.920 | and waited an extra week and I played
00:37:07.280 | and it really beat me very badly.
00:37:10.280 | - I mean, how did that feel?
00:37:11.520 | Isn't that an amazing feeling?
00:37:12.720 | - That's amazing, yeah.
00:37:13.640 | I mean, obviously I tried my best
00:37:16.520 | and I tried to also impress my...
00:37:18.080 | Because I actually played the first game,
00:37:19.800 | so I'm still pretty good at micromanagement.
00:37:23.160 | The problem is I just don't understand StarCraft II.
00:37:25.280 | I understand StarCraft.
00:37:27.000 | And when I played StarCraft,
00:37:28.520 | I probably was consistently, like,
00:37:31.480 | for a couple of years, top 32 in Europe.
00:37:34.680 | So I was decent, but at the time,
00:37:36.520 | we didn't have this kind of MMR system as well established.
00:37:40.360 | So it would be hard to know what it was back then.
00:37:43.200 | - So what's the difference in interface
00:37:44.680 | between AlphaStar and StarCraft
00:37:47.760 | and a human player in StarCraft?
00:37:49.680 | Is there any significant differences
00:37:52.120 | between the way they both see the game?
00:37:54.160 | - I would say the way they see the game,
00:37:56.040 | there's a few things that are just very hard to simulate.
00:37:59.760 | The main one, perhaps, which is obvious in hindsight,
00:38:05.240 | is what's called cloaked units, which are invisible units.
00:38:10.600 | So in StarCraft, you can make some units
00:38:13.240 | that you need to have a particular kind of unit to detect it.
00:38:18.080 | So these units are invisible.
00:38:20.560 | If you cannot detect them, you cannot target them.
00:38:22.720 | So they would just destroy your buildings
00:38:25.760 | or kill your workers.
00:38:27.720 | But despite the fact you cannot target the unit,
00:38:31.640 | there's a shimmer that, as a human, you observe.
00:38:34.640 | I mean, you need to train a little bit.
00:38:35.920 | You need to pay attention.
00:38:37.440 | But you would see this kind of
00:38:40.160 | space-time distortion, and you would know,
00:38:42.400 | okay, there are, yeah.
00:38:44.800 | - Yeah, there's like a wave thing.
00:38:46.000 | - Yeah, it's called shimmer.
00:38:47.560 | - Space-time distortion, I like it.
00:38:49.120 | - That's really, the Blizzard term is shimmer.
00:38:51.880 | - Shimmer, okay.
00:38:52.720 | - And so this shimmer, professional players
00:38:55.520 | actually can see it immediately.
00:38:57.120 | They understand it very well.
00:38:59.480 | But it's still something that requires
00:39:01.400 | certain amount of attention,
00:39:02.680 | and it's kind of a bit annoying to deal with.
00:39:05.640 | Whereas for AlphaStar, in terms of vision,
00:39:08.600 | it's very hard for us to simulate sort of,
00:39:11.080 | oh, are you looking at this pixel in the screen and so on?
00:39:14.160 | So the only thing we can do is,
00:39:17.480 | there is a unit that's invisible over there.
00:39:19.680 | So AlphaStar would know that immediately.
00:39:22.480 | Obviously, it still obeys the rules.
00:39:24.000 | You cannot attack the unit.
00:39:25.160 | You must have a detector and so on.
00:39:27.360 | But it's kind of one of the main things
00:39:29.280 | that it just doesn't feel there's a very proper way.
00:39:32.640 | I mean, you could imagine, oh, you don't have high-precision,
00:39:35.440 | maybe you don't know exactly where it is,
00:39:36.920 | or sometimes you see it, sometimes you don't.
00:39:39.200 | But it's just really, really complicated
00:39:41.960 | to get it so that everyone would agree,
00:39:44.280 | oh, that's the best way to simulate this.
00:39:47.240 | - You know, it seems like a perception problem.
00:39:49.280 | - It is a perception problem.
00:39:50.560 | So the only problem is people, or you ask,
00:39:54.240 | oh, what's the difference between
00:39:55.280 | how humans perceive the game?
00:39:56.720 | I would say they wouldn't be able to tell a shimmer
00:39:59.920 | immediately as it appears on the screen.
00:40:02.200 | Whereas AlphaStar, in principle, sees it very sharply.
00:40:05.600 | It sees that the bit turned from zero to one,
00:40:08.640 | meaning there's now a unit there,
00:40:10.440 | although you don't know the unit,
00:40:11.920 | or you know that you cannot attack it and so on.
00:40:15.800 | So that, from a vision standpoint,
00:40:18.040 | that probably is the one that is kind of
00:40:21.200 | the most obvious one.
00:40:22.920 | Then there are things humans cannot do perfectly,
00:40:25.120 | even professionals, which is they might miss a detail,
00:40:28.040 | or they might have not seen a unit.
00:40:30.560 | And obviously, as a computer,
00:40:32.200 | if there's a corner of the screen that turns green
00:40:34.960 | because a unit enters the field of view,
00:40:37.640 | that can go into the memory of the agent, the LSTM,
00:40:41.000 | and persist there for a while,
00:40:42.480 | and for however long is relevant, right?
00:40:45.640 | - And in terms of action,
00:40:47.640 | it seems like the rate of action from AlphaStar
00:40:54.200 | is comparable to, if not slower than, professional players,
00:40:54.200 | but it's more precise, is what I heard.
00:40:57.080 | - So that's really probably the one
00:40:59.680 | that is causing us more issues for a couple of reasons.
00:41:05.000 | The first one is,
00:41:06.720 | StarCraft has been an AI environment for quite a few years.
00:41:09.960 | In fact, I was participating in the very first competition
00:41:13.960 | back in 2010, and there's really never been
00:41:18.720 | a very clear set of rules for what the actions per minute,
00:41:22.280 | the rate of actions that you can issue, should be.
00:41:24.680 | And as a result, these agents or bots that people build
00:41:29.240 | in a kind of almost very cool way,
00:41:31.040 | they do like 20,000, 40,000 actions per minute.
00:41:35.360 | Now, to put this in perspective,
00:41:37.160 | a very good professional human
00:41:39.480 | might do 300 to 800 actions per minute.
00:41:44.040 | They might not be as precise,
00:41:45.440 | that's why the range is a bit tricky to identify exactly.
00:41:49.000 | I mean, 300 actions per minute precisely
00:41:51.600 | is probably realistic, 800 is probably not,
00:41:54.560 | but you see humans doing a lot of actions
00:41:56.960 | because they warm up and they kind of select things
00:41:59.440 | and spam and so on, just so that when they need,
00:42:02.200 | they have the accuracy.
00:42:04.200 | So we came into this by not having kind of a standard way
00:42:09.200 | to say, well, how do we measure whether an agent
00:42:12.920 | is at human level or not?
00:42:14.840 | On the other hand, we had a huge advantage,
00:42:18.160 | which is because we do imitation learning,
00:42:21.400 | agents turned out to act like humans
00:42:24.480 | in terms of rate of actions, even precisions
00:42:26.880 | and imprecisions of actions in the supervised policy.
00:42:30.120 | You could see all these,
00:42:31.000 | you could see how agents like to spam click, to move here.
00:42:34.680 | If you played, especially Diablo, you would know what I mean.
00:42:36.880 | I mean, you just like spam,
00:42:38.680 | oh, move here, move here, move here.
00:42:40.320 | You're doing literally like maybe five actions
00:42:43.240 | in two seconds, but these actions are not very meaningful.
00:42:46.800 | One would have sufficed.
00:42:48.720 | So on the one hand, we start from this imitation policy
00:42:52.080 | that is at the ballpark of the actions per minutes of humans
00:42:55.600 | because it's actually statistically
00:42:57.320 | trying to imitate humans.
00:42:58.960 | So we see these very nicely in the curves
00:43:01.040 | that we showed in the blog post.
00:43:02.320 | Like there's these actions per minute
00:43:04.560 | and the distribution looks very human-like.
00:43:07.680 | But then of course, as self-play kicks in,
00:43:10.960 | and that's the part we haven't talked too much yet,
00:43:13.240 | but of course the agent must play against himself to improve.
00:43:17.200 | Then there's almost no guarantees
00:43:19.640 | that these actions will not become more precise
00:43:22.400 | or even the rate of actions is going to increase over time.
00:43:26.040 | So what we did, and this is probably kind of the first attempt
00:43:29.880 | that we thought was reasonable,
00:43:31.160 | is we looked at the distribution of actions for humans
00:43:34.240 | for certain windows of time.
00:43:36.400 | And just to give a perspective,
00:43:37.720 | because I guess I mentioned that some of these agents
00:43:40.680 | that are programmatic, let's call them,
00:43:42.280 | they do 40,000 actions per minute.
00:43:44.600 | Professionals, as I said, do 300 to 800.
00:43:47.360 | So what we did is we looked at the distribution
00:43:49.400 | over professional gamers, and we took reasonably
00:43:52.640 | high actions per minute,
00:43:54.120 | but we kind of identify certain cutoffs
00:43:57.480 | after which, even if the agent wanted to act,
00:44:00.560 | these actions would be dropped.
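The cutoff idea just described can be sketched as a sliding-window rate limiter that silently drops any action over the allowance. The window and cutoff values below are illustrative, not AlphaStar's actual limits.

```python
from collections import deque

class APMLimiter:
    """Drop actions that would exceed a per-window cutoff.

    A rough sketch of the mechanism: actions beyond the allowance inside
    a sliding window of game ticks are simply discarded.
    """

    def __init__(self, max_actions, window_ticks):
        self.max_actions = max_actions
        self.window_ticks = window_ticks
        self.issued = deque()  # tick stamps of recently accepted actions

    def try_act(self, tick):
        # Forget accepted actions that have slid out of the window.
        while self.issued and tick - self.issued[0] >= self.window_ticks:
            self.issued.popleft()
        if len(self.issued) < self.max_actions:
            self.issued.append(tick)
            return True   # the action goes through to the game
        return False      # the action is dropped

# 2 actions allowed per 22-tick window (one in-game second):
limiter = APMLimiter(max_actions=2, window_ticks=22)
results = [limiter.try_act(t) for t in [0, 5, 10, 30]]
# The action at tick 10 is dropped; by tick 30 the window has cleared.
```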
00:44:02.120 | But the problem is this cutoff is probably set
00:44:05.800 | a bit too high, and what ends up happening is,
00:44:08.640 | even though when we ask the professionals
00:44:11.520 | and the gamers, by and large,
00:44:13.000 | they feel like it's playing human-like,
00:44:15.880 | there are some agents that developed
00:44:17.880 | maybe slightly too high APMs,
00:44:22.880 | which is actions per minute, combined with the precision,
00:44:26.640 | which made people sort of start discussing
00:44:29.400 | a very interesting issue, which is,
00:44:30.720 | should we have limited this?
00:44:32.520 | Should we just let it loose and see what cool things
00:44:35.920 | it can come up with, right?
00:44:37.560 | - Interesting.
00:44:38.400 | - So this is, in itself, an extremely interesting question,
00:44:42.040 | but the same way that modeling the shimmer
00:44:44.000 | would be so difficult, modeling absolutely all the details
00:44:47.720 | about muscles and precision and tiredness of humans
00:44:51.640 | would be quite difficult, right?
00:44:52.920 | So we're really here in kind of innovating
00:44:56.240 | in this sense of, okay, what could be maybe
00:44:58.920 | the next iteration of putting more rules
00:45:01.760 | that makes the agents more human-like
00:45:05.080 | in terms of restrictions?
00:45:06.360 | - Yeah, putting constraints that--
00:45:08.120 | - More constraints, yeah.
00:45:09.280 | - That's really interesting.
00:45:10.120 | That's really innovative.
00:45:11.080 | So one of the constraints you put on yourself,
00:45:15.440 | or at least focused on, is the Protoss race,
00:45:18.040 | as far as I understand.
00:45:19.920 | Can you tell me about the different races and how they,
00:45:22.920 | so Protoss, Terran, and Zerg, how do they compare?
00:45:27.080 | How do they interact?
00:45:28.200 | Why did you choose Protoss?
00:45:30.040 | - Right. - Yeah.
00:45:30.880 | How do they fit in the dynamics of the game,
00:45:33.680 | seen from a strategic perspective?
00:45:35.720 | - So Protoss, so in StarCraft, there are three races.
00:45:39.720 | Indeed, in the demonstration, we saw only the Protoss race.
00:45:43.920 | So maybe let's start with that one.
00:45:45.600 | Protoss is kind of the most technologically advanced race.
00:45:49.480 | It has units that are expensive, but powerful, right?
00:45:53.840 | So in general, you wanna kind of conserve your units
00:45:57.880 | as you go attack, so you wanna,
00:45:59.560 | and then you wanna utilize these tactical advantages
00:46:03.280 | of very fancy spells and so on and so forth.
00:46:07.280 | And at the same time, they're kind of,
00:46:10.320 | people say, like, they're a bit easier to play, perhaps.
00:46:14.640 | Right?
00:46:15.480 | But that, I actually didn't know.
00:46:17.160 | I mean, I just talked to, now, a lot to the players
00:46:20.160 | that we work with, TLO and Mana, and they said,
00:46:22.920 | "Oh yeah, Protoss is actually, people think,
00:46:24.640 | "is actually one of the easiest races."
00:46:26.360 | So perhaps the easier, that doesn't mean that it's,
00:46:30.240 | you know, obviously professional players
00:46:32.760 | excel at the three races,
00:46:34.080 | and there's never like a race that dominates
00:46:37.600 | for a very long time anyway.
00:46:38.800 | - So if you look at the top, I don't know,
00:46:40.240 | a hundred in the world,
00:46:41.720 | is there one race that dominates that list?
00:46:44.360 | - It would be hard to know
00:46:45.320 | because it depends on the regions.
00:46:46.840 | I think it's pretty equal in terms of distribution,
00:46:50.600 | and Blizzard wants it to be equal, right?
00:46:52.840 | They don't want, they wouldn't want one race like Protoss
00:46:56.280 | to not be representative in the top place.
00:46:59.000 | - Right.
00:47:03.040 | - So definitely, like, they try for it to be balanced.
00:47:03.040 | Right?
00:47:03.880 | So then maybe the opposite race of Protoss is Zerg.
00:47:07.320 | Zerg is a race where you just kind of expand
00:47:10.560 | and take over as many resources as you can,
00:47:13.800 | and they have a very high capacity
00:47:15.680 | to regenerate their units.
00:47:17.640 | So if you have an army, it's not that valuable
00:47:20.480 | in the sense that losing the whole army is not a big deal as Zerg
00:47:23.920 | because you can then rebuild it,
00:47:25.920 | and given that you generally accumulate
00:47:28.280 | a huge bank of resources,
00:47:30.840 | Zergs typically play by applying a lot of pressure,
00:47:34.160 | maybe losing their whole army,
00:47:36.080 | but then rebuilding it quickly.
00:47:37.800 | So, although of course, every race,
00:47:40.400 | I mean, there's never, I mean, they're pretty diverse.
00:47:43.880 | I mean, there are some units in Zerg
00:47:45.080 | that are technologically advanced
00:47:46.520 | and they do some very interesting spells,
00:47:48.800 | and there's some units in Protoss that are less valuable
00:47:51.280 | and you could lose a lot of them and rebuild them
00:47:53.280 | and it wouldn't be a big deal.
00:47:55.080 | - All right, so maybe I'm missing out.
00:47:57.760 | Maybe I'm gonna say some dumb stuff,
00:47:59.200 | but summary of strategy.
00:48:02.440 | So first there's collection of a lot of resources.
00:48:05.680 | That's one option.
00:48:06.520 | The other one is expanding, so building other bases.
00:48:11.520 | Then the other is obviously attack,
00:48:14.840 | building units and attacking with those units.
00:48:17.240 | And then I don't know what else there is.
00:48:20.600 | Maybe there's the different timing of attacks,
00:48:24.000 | like do I attack early, attack late?
00:48:25.960 | What are the different strategies that emerged
00:48:27.920 | that you've learned about?
00:48:29.040 | I've read that a bunch of people are super happy
00:48:31.280 | that you guys have apparently,
00:48:32.920 | that AlphaStar apparently has discovered
00:48:34.960 | that it's really good to, what is it, saturate?
00:48:37.960 | - Oh yeah, the mineral line.
00:48:39.520 | - Yeah, the mineral line.
00:48:40.680 | - Yeah, yeah.
00:48:42.120 | - And that's for greedy amateur players like myself.
00:48:45.560 | That's always been a good strategy.
00:48:47.440 | You just build up a lot of money
00:48:48.960 | and it just feels good to just accumulate and accumulate.
00:48:53.240 | So thank you for discovering that
00:48:55.200 | and validating all of us.
00:48:56.640 | But is there other strategies that you discovered
00:48:59.160 | interesting, unique to this game?
00:49:01.840 | - Yeah, so if you look at the kind of,
00:49:05.000 | and not being a StarCraft II player,
00:49:06.440 | but of course StarCraft and StarCraft II
00:49:08.040 | and real-time strategy games in general are very similar.
00:49:11.040 | I would classify perhaps the openings of the game.
00:49:17.080 | They're very important.
00:49:18.760 | And generally I would say there's two kinds of openings.
00:49:21.760 | One that's a standard opening.
00:49:23.400 | That's generally how players find sort of a balance
00:49:28.400 | between risk and economy and building some units early on
00:49:33.400 | so that they could defend,
00:49:34.600 | but they're not too exposed basically,
00:49:36.800 | but also expanding quite quickly.
00:49:39.480 | So this would be kind of a standard opening.
00:49:42.040 | And within a standard opening,
00:49:43.680 | then what you do choose generally
00:49:45.520 | is what technology are you aiming towards?
00:49:48.360 | So there's a bit of rock, paper, scissors
00:49:50.280 | of you could go for spaceships
00:49:52.920 | or you could go for invisible units,
00:49:55.080 | or you could go for, I don't know,
00:49:56.400 | like massive units that attack
00:49:58.320 | against certain kinds of units,
00:50:00.080 | but they're weak against others.
00:50:01.640 | So standard openings themselves have some choices
00:50:05.720 | like rock, paper, scissors style.
00:50:07.480 | Of course, if you scout and you're good at guessing
00:50:09.640 | what the opponent is doing,
00:50:11.080 | then you can play as an advantage
00:50:12.800 | because if you know you're gonna play rock,
00:50:14.480 | I mean, I'm gonna play paper, obviously.
00:50:16.480 | So you can imagine that normal standard games
00:50:19.120 | in Starcraft looks like a continuous rock, paper, scissor game
00:50:24.040 | where you guess what the distribution of rock,
00:50:27.240 | paper and scissor is from the enemy
00:50:29.920 | and reacting accordingly to try to beat it
00:50:33.360 | or put the paper out before he kind of changes his mind
00:50:37.440 | from rock to scissors
00:50:38.880 | and then you would be in a weak position.
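The mixed-strategy guessing described here can be sketched in a few lines: estimate the opponent's rock/paper/scissors distribution from observed play, then pick the best response against it. The payoff matrix is the standard one for the game; the observed counts are made up for illustration and are not from AlphaStar.

```python
import numpy as np

# Rows: our move, columns: opponent move (order: rock, paper, scissors).
# Entry is our payoff: win = 1, draw = 0, loss = -1.
PAYOFF = np.array([
    [ 0, -1,  1],   # we play rock
    [ 1,  0, -1],   # we play paper
    [-1,  1,  0],   # we play scissors
])

def best_response(observed_counts):
    """Estimate the opponent's mixed strategy from counts and pick the
    move that maximizes our expected payoff against that estimate."""
    counts = np.asarray(observed_counts, dtype=float)
    opponent_mix = counts / counts.sum()      # empirical distribution
    expected = PAYOFF @ opponent_mix          # expected payoff per move
    return int(np.argmax(expected)), expected

# Opponent played rock 6 times, paper 3, scissors 1 -> paper is best.
move, expected = best_response([6, 3, 1])
print(move)  # 1 (paper)
```

This is the single-shot version; the continuous game Oriol describes also has to account for the opponent shifting that distribution over time, and for the cost of scouting to refresh the estimate.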
00:50:40.480 | - So sorry to pause on that.
00:50:42.160 | I didn't realize this element
00:50:43.320 | 'cause I know it's true with poker.
00:50:44.880 | I know I looked at Libratus.
00:50:48.800 | So you're also estimating,
00:50:50.880 | trying to guess the distribution,
00:50:52.200 | trying to better and better estimate the distribution
00:50:54.120 | of what the opponent is likely to be doing.
00:50:56.040 | - Yeah, I mean, as a player,
00:50:57.440 | you definitely wanna have a belief state
00:50:59.840 | over what's up on the other side of the map.
00:51:03.000 | And when your belief state becomes inaccurate,
00:51:05.600 | when you start having serious doubts
00:51:08.040 | whether he's gonna play something that you must know,
00:51:11.280 | that's when you scout.
00:51:12.440 | You wanna then gather information, right?
00:51:14.560 | - Is improving the accuracy of the belief
00:51:16.440 | or improving the belief state part of the loss
00:51:19.880 | that you're trying to optimize
00:51:21.040 | or is it just a side effect?
00:51:22.720 | - It's implicit, but you could explicitly model it
00:51:25.840 | and it would be quite good at probably predicting
00:51:28.280 | what's on the other side of the map.
00:51:30.360 | But so far, it's all implicit.
00:51:32.880 | There's no additional reward for predicting the enemy.
00:51:36.680 | So there's these standard openings
00:51:38.800 | and then there's what people call cheese,
00:51:41.640 | which is very interesting.
00:51:42.840 | And AlphaStar sometimes really likes this kind of cheese.
00:51:46.760 | These cheeses, what they are is kind of an all-in strategy.
00:51:51.120 | You're gonna do something sneaky.
00:51:53.240 | You're gonna hide your own buildings
00:51:56.680 | close to the enemy base,
00:51:58.200 | or you're gonna go for hiding your technological buildings
00:52:01.600 | so that you do invisible units
00:52:03.040 | and the enemy just cannot react to detect it
00:52:06.040 | and thus lose the game.
00:52:07.960 | And there's quite a few of these cheeses
00:52:10.000 | and variants of them.
00:52:11.760 | And there is where actually the belief state
00:52:14.440 | becomes even more important.
00:52:16.320 | Because if I scout your base and I see no buildings at all,
00:52:20.160 | any human player knows something's up.
00:52:22.440 | They might know, well,
00:52:23.280 | you're hiding something close to my base.
00:52:25.600 | Should I build suddenly a lot of units to defend?
00:52:28.320 | Should I actually block my ramp with workers
00:52:30.960 | so that you cannot come and destroy my base?
00:52:33.480 | So there's all these is happening
00:52:35.640 | and defending against cheeses is extremely important.
00:52:39.400 | And in the AlphaStar League,
00:52:40.720 | many agents actually develop some cheesy strategies.
00:52:45.040 | And in the games we saw against TLO and Mana,
00:52:47.960 | two out of the 10 agents
00:52:49.200 | were actually doing these kinds of strategies,
00:52:51.720 | which are cheesy strategies.
00:52:53.600 | And then there's a variant of cheesy strategy,
00:52:55.560 | which is called all-in.
00:52:57.320 | So an all-in strategy is not perhaps as drastic as,
00:53:00.400 | oh, I'm gonna build cannons on your base
00:53:02.480 | and then bring all my workers
00:53:03.800 | and try to just disrupt your base and game over,
00:53:06.760 | or GG, as we say in StarCraft.
00:53:08.720 | There's these kind of very cool things
00:53:11.920 | that you can align precisely at a certain time mark.
00:53:14.680 | So for instance, you can generate
00:53:17.320 | exactly 10 unit composition that is perfect,
00:53:20.200 | like five of this type, five of this other type,
00:53:22.880 | and align the upgrade so that at four minutes and a half,
00:53:26.160 | let's say, you have these 10 units
00:53:28.600 | and the upgrade just finished.
00:53:30.560 | And at that point, that army is really scary.
00:53:33.880 | And unless the enemy really knows what's going on,
00:53:36.360 | if you push, you might then have an advantage
00:53:40.160 | because maybe the enemy is doing something more standard,
00:53:42.360 | it expanded too much, it developed too much economy,
00:53:45.680 | and it traded off badly against having defenses,
00:53:49.640 | and the enemy will lose.
00:53:51.040 | But it's called all-in because if you don't win,
00:53:53.560 | then you're gonna lose.
00:53:54.960 | So you see players that do these kinds of strategies.
00:53:57.880 | If they don't succeed, game is not over.
00:53:59.920 | I mean, they still have a base
00:54:01.120 | and they're still gathering minerals,
00:54:02.760 | but they will just GG out of the game
00:54:04.680 | because they know, well, game is over.
00:54:06.680 | I gambled and I failed.
00:54:08.760 | So if we start entering the game theoretic aspects of the game,
00:54:13.240 | it's really rich and that's why
00:54:15.800 | it also makes it quite entertaining to watch.
00:54:17.880 | Even if I don't play, I still enjoy watching the game.
00:54:21.720 | But the agents are trying to do this mostly implicitly,
00:54:26.800 | but one element that we improved in self-play
00:54:29.000 | is creating the AlphaStar League.
00:54:31.280 | And the AlphaStar League is not pure self-play.
00:54:34.560 | It's trying to create different personalities of agents
00:54:37.880 | so that some of them will become cheesy agents.
00:54:41.480 | Some of them might become very economical, very greedy,
00:54:44.360 | like getting all the resources,
00:54:46.160 | but then maybe early on, they're gonna be weak,
00:54:48.760 | but later on, they're gonna be very strong.
00:54:51.000 | And by creating this personality of agents,
00:54:53.400 | which sometimes it just happens naturally
00:54:55.360 | that you can see kind of an evolution of agents
00:54:58.200 | that given the previous generation,
00:55:00.760 | they train against all of them
00:55:01.920 | and then they generate kind of the perfect counter
00:55:04.320 | to that distribution.
00:55:05.720 | But these agents, you must have them in the populations
00:55:09.280 | because if you don't have them,
00:55:11.240 | you're not covered against these things, right?
00:55:13.000 | It's kind of, you wanna create all sorts of the opponents
00:55:17.080 | that you will find in the wild
00:55:18.640 | so you can be exposed to these cheeses,
00:55:21.800 | early aggression, later aggression, more expansions,
00:55:25.680 | dropping units in your base from the side,
00:55:28.320 | all these things.
00:55:29.520 | And pure self-play is getting a bit stuck
00:55:32.720 | at finding some subset of these, but not all of these.
00:55:36.160 | So the AlphaStar League is a way to kind of
00:55:39.400 | do an ensemble of agents that they're all playing in a league
00:55:43.440 | much like people play on Battle.net, right?
00:55:45.480 | They play, you play against someone
00:55:47.400 | who does a new cool strategy and you immediately,
00:55:50.200 | oh my God, I wanna try it, I wanna play again.
00:55:53.000 | And this to me was another critical part of the problem,
00:55:57.520 | which was, can we create a Battle.net for agents?
00:56:01.200 | And that's kind of what the AlphaStar League really-
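The league idea, a population of frozen "personality" agents that the learner keeps playing against, can be sketched as a toy matchmaking loop. This is illustrative only, not the actual AlphaStar league code; the squared win-rate weighting is just one simple form of prioritized sampling that sends the learner back to opponents it loses to.

```python
import random

class LeagueMember:
    """A frozen agent in the population, e.g. a cheeser or a greedy
    economic player, with its record against the current learner."""
    def __init__(self, name):
        self.name = name
        self.wins_vs_learner = 0
        self.games_vs_learner = 0

    def win_rate_vs_learner(self):
        if self.games_vs_learner == 0:
            return 0.5  # unknown opponents start as a coin flip
        return self.wins_vs_learner / self.games_vs_learner

def sample_opponent(league, rng=random):
    # Prioritized sampling: opponents that beat the learner more often
    # get more games, so the learner patches its weaknesses.
    weights = [m.win_rate_vs_learner() ** 2 for m in league]
    return rng.choices(league, weights=weights, k=1)[0]

def record_result(member, learner_won):
    member.games_vs_learner += 1
    if not learner_won:
        member.wins_vs_learner += 1

league = [LeagueMember(n) for n in ("cheeser", "greedy_macro", "all_in")]
record_result(league[0], learner_won=False)  # the cheeser beat the learner
opponent = sample_opponent(league)           # cheeser is now sampled most
```

The real system is far richer (exploiter agents, past checkpoints of the learner itself), but the core loop is this: keep every style alive in the population so the learner stays covered against all of them.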
00:56:03.520 | - That's fascinating.
00:56:04.360 | And where they stick to their different strategies.
00:56:06.880 | Yeah, wow, that's really, really interesting.
00:56:09.560 | So, but that said, you were fortunate enough
00:56:13.200 | or just skilled enough to win 5-0.
00:56:16.240 | And so how hard is it to win?
00:56:19.240 | I mean, that's not the goal.
00:56:20.280 | I guess, I don't know what the goal is.
00:56:21.840 | The goal should be to win majority, not 5-0,
00:56:25.360 | but how hard is it in general to win all matchups
00:56:29.320 | on a one V1?
00:56:31.040 | - So that's a very interesting question
00:56:33.560 | because once you see AlphaStar and superficially
00:56:38.560 | you think, well, okay, it won.
00:56:40.440 | Let's, if you sum all the games like 10 to one, right?
00:56:42.880 | It lost the game that it played with the camera interface.
00:56:46.240 | You might think, well, that's done, right?
00:56:48.440 | There's, it's superhuman at the game.
00:56:50.760 | And that's not really the claim
00:56:52.240 | we really can make actually.
00:56:54.720 | The claim is we beat a professional gamer
00:56:58.760 | for the first time.
00:57:00.040 | Starcraft has really been a thing
00:57:02.400 | that has been going on for a few years,
00:57:04.040 | but a moment like this had not occurred before yet.
00:57:09.040 | But are these agents impossible to beat?
00:57:12.280 | Absolutely not, right?
00:57:13.360 | So that's a bit what's, you know,
00:57:15.680 | kind of the difference is the agents play
00:57:18.440 | at Grandmaster level.
00:57:19.480 | They're definitely understand the game enough
00:57:21.400 | to play extremely well, but are they unbeatable?
00:57:24.880 | Do they play perfect?
00:57:27.920 | No, and actually in Starcraft,
00:57:30.320 | because of these sneaky strategies,
00:57:33.240 | it's always possible that you might take a huge risk
00:57:36.040 | sometimes, but you might get wins, right?
00:57:37.920 | Out of this.
00:57:39.200 | So I think that as a domain,
00:57:42.640 | it still has a lot of opportunities,
00:57:44.480 | not only because of course we want to learn
00:57:46.920 | with less experience.
00:57:48.040 | We would like to, I mean, if I learn to play Protoss,
00:57:50.760 | I can play Terran and learn it much quicker
00:57:53.560 | than AlphaStar can, right?
00:57:54.760 | So there are obvious interesting research challenges
00:57:57.720 | as well, but even as the raw performance goes,
00:58:02.720 | really the claim here can be,
00:58:05.120 | we are at pro level or at high Grandmaster level,
00:58:09.320 | but obviously the players also did not know what to expect.
00:58:14.320 | Right, this kind of their prior distribution was a bit off
00:58:16.960 | because they played this kind of new, like alien brain
00:58:20.600 | as they like to say it, right?
00:58:22.080 | And that's what makes it exciting for them.
00:58:25.080 | But also I think if you look at the games closely,
00:58:28.040 | you see there were weaknesses in some points,
00:58:31.520 | maybe AlphaStar did not scout,
00:58:33.280 | or if it had got invisible units going against
00:58:36.080 | at certain points, it wouldn't have known
00:58:38.200 | and it would have been bad.
00:58:39.600 | So there's still quite a lot of work to do,
00:58:42.880 | but it's really a very exciting moment for us to be seeing,
00:58:46.440 | wow, a single neural net on a GPU is actually playing
00:58:50.320 | against these guys who are amazing.
00:58:52.040 | I mean, you have to see them play in life.
00:58:53.720 | They're really, really amazing players.
00:58:55.800 | - Yeah, I'm sure there must be a guy in Poland somewhere
00:59:00.440 | right now training his butt off to make sure
00:59:03.400 | that this never happens again with AlphaStar.
00:59:06.600 | So that's really exciting in terms of AlphaStar
00:59:09.720 | having some holes to exploit, which is great.
00:59:12.200 | And then we build on top of each other
00:59:14.360 | and it feels like StarCraft on let go,
00:59:17.040 | even if you win, it's still not,
00:59:21.640 | there's so many different dimensions
00:59:23.120 | in which you can explore.
00:59:24.240 | So that's really, really interesting.
00:59:25.600 | Do you think there's a ceiling to AlphaStar?
00:59:28.560 | You've said that it hasn't reached,
00:59:31.400 | you know, this is a big,
00:59:32.880 | wait, let me actually just pause for a second.
00:59:35.560 | How did it feel to come here to this point,
00:59:40.240 | to be a top professional player?
00:59:42.260 | Like that night, I mean, you know,
00:59:44.640 | Olympic athletes have their gold medal, right?
00:59:47.160 | This is your gold medal in a sense.
00:59:48.880 | Sure, you're cited a lot,
00:59:50.440 | you've published a lot of prestigious papers,
00:59:52.660 | whatever, but this is like a win.
00:59:55.320 | How did it feel?
00:59:56.520 | I mean, it was, for me, it was unbelievable
00:59:59.480 | because first the win itself,
01:00:03.960 | I mean, it was so exciting.
01:00:05.120 | I mean, so looking back to those last days of 2018,
01:00:10.120 | really, that's when the games were played.
01:00:13.160 | I'm sure I look back at that moment,
01:00:15.040 | I'll say, oh my God, I wanna be like in a project like that.
01:00:18.040 | It's like, I already feel the nostalgia of like,
01:00:21.120 | yeah, that was huge in terms of the energy
01:00:24.240 | and the team effort that went into it.
01:00:26.340 | And so in that sense, as soon as it happened,
01:00:29.240 | I already knew it was kind of,
01:00:31.260 | I was losing it a little bit.
01:00:32.980 | So it's almost like sad that it happened and oh my God,
01:00:36.320 | but on the other hand, it also verifies the approach.
01:00:41.320 | But to me also, there's so many challenges
01:00:43.800 | and interesting aspects of intelligence
01:00:46.080 | that even though we can train a neural network
01:00:49.840 | to play at the level of the best humans,
01:00:52.680 | there's still so many challenges.
01:00:54.180 | So for me, it's also like,
01:00:55.440 | well, this is really an amazing achievement,
01:00:57.420 | but I already was also thinking about next steps.
01:00:59.920 | I mean, as I said, these agents play Protoss versus Protoss,
01:01:04.040 | but they should be able to play a different race
01:01:07.200 | much quicker, right?
01:01:08.120 | So that would be an amazing achievement.
01:01:10.620 | Some people call this meta reinforcement learning,
01:01:13.320 | meta learning and so on, right?
01:01:15.160 | So there's so many possibilities after that moment,
01:01:18.920 | but the moment itself, it really felt great.
01:01:21.500 | We had this bet, so I'm kind of a pessimist in general.
01:01:27.680 | So I kind of sent an email to the team, I said,
01:01:30.040 | "Okay, let's against TLO first, right?
01:01:33.600 | Like what's gonna be the result?"
01:01:35.080 | And I really thought we would lose like 5-0, right?
01:01:38.800 | We had some calibration made against the 5,000 MMR player.
01:01:43.800 | TLO was much stronger than that player,
01:01:47.280 | even if he played Protoss, which is his off race.
01:01:50.000 | But yeah, I was not imagining we would win.
01:01:53.040 | So for me, that was just kind of a test run or something.
01:01:55.520 | And then he was really surprised.
01:01:58.940 | And unbelievably, we went to this bar to celebrate
01:02:03.940 | and Dave tells me, "Well, why don't we invite someone
01:02:08.280 | who is a thousand MMR stronger in Protoss,
01:02:10.920 | like an actual Protoss player?"
01:02:12.480 | Like it turned up being Mana, right?
01:02:16.120 | And we had some drinks and I said, "Sure, why not?"
01:02:19.320 | But then I thought, "Well, that's really gonna be
01:02:21.160 | impossible to beat."
01:02:22.000 | I mean, even because it's so much ahead,
01:02:24.480 | a thousand MMR is really like 99% probability
01:02:28.320 | that Mana would beat TLO as Protoss versus Protoss, right?
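The "99% probability" figure follows from an Elo-style rating model. StarCraft II's MMR is not literally Elo, but the same logistic curve (assumed here with the conventional scale of 400) illustrates why a thousand-point gap implies a near-certain win for the stronger player.

```python
def elo_win_probability(rating_gap, scale=400.0):
    """Expected score of the higher-rated player under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

# A 1000-point gap: the stronger player is expected to win ~99.7%
# of the time; a 0-point gap is, as expected, a coin flip.
print(round(elo_win_probability(1000), 3))  # 0.997
print(elo_win_probability(0))               # 0.5
```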
01:02:33.000 | So we did that.
01:02:34.160 | And to me, the second game was much more important,
01:02:38.920 | even though a lot of uncertainty kind of disappeared
01:02:42.040 | after we kind of beat TLO.
01:02:43.600 | I mean, he is a professional player,
01:02:45.600 | so that was kind of,
01:02:46.640 | "Oh, but that's really a very nice achievement."
01:02:49.680 | But Mana really was at the top
01:02:51.720 | and you could see he played much better,
01:02:53.800 | but our agents got much better too.
01:02:55.320 | So it's like, "Ah."
01:02:57.360 | And then after the first game, I said,
01:02:59.680 | "If we take a single game,
01:03:00.840 | at least we can say we beat a game."
01:03:02.680 | I mean, even if we don't beat the series,
01:03:04.280 | for me, that was a huge relief.
01:03:06.880 | And I mean, I remember the hacking dummies.
01:03:09.160 | And I mean, it was really like this moment for me
01:03:11.840 | will resonate forever as a researcher.
01:03:14.120 | And I mean, as a person,
01:03:15.320 | and yeah, it's a really great accomplishment.
01:03:18.200 | And it was great also to be there with the team in the room.
01:03:21.320 | I don't know if you saw like the...
01:03:22.960 | So it was really like...
01:03:24.680 | - I mean, from my perspective,
01:03:25.920 | the other interesting thing is just like watching Kasparov,
01:03:29.640 | now watching Mana was also interesting
01:03:33.680 | because he is kind of a loss of words.
01:03:36.080 | I mean, whenever you lose, I've done a lot of sports.
01:03:38.320 | You sometimes say excuses, you look for reasons.
01:03:43.480 | And he couldn't really come up with reasons.
01:03:45.680 | - Yeah, yeah.
01:03:46.520 | - I mean, so with the off race for Protoss,
01:03:50.000 | you could say, well, it felt awkward, it wasn't,
01:03:52.280 | but here it was just beaten.
01:03:55.160 | And it was beautiful to look at a human being
01:03:57.920 | being superseded by an AI system.
01:04:00.240 | I mean, it's a beautiful moment for researchers.
01:04:04.400 | - Yeah, for sure.
01:04:05.240 | It was, I mean, probably the highlight of my career so far
01:04:09.920 | because of its uniqueness and coolness.
01:04:11.760 | And I don't know.
01:04:12.600 | I mean, it's obviously, as you said,
01:04:14.280 | you can look at paper citations and so on,
01:04:16.200 | but this really is like a testament
01:04:19.240 | of the whole machine learning approach
01:04:21.240 | and using games to advance technology.
01:04:24.640 | I mean, it really was,
01:04:26.840 | everything came together at that moment.
01:04:28.560 | That's really the summary.
01:04:29.840 | - Also on the other side, it's a popularization of AI too,
01:04:34.040 | because just like traveling to the moon and so on.
01:04:38.200 | I mean, this is where a very large community of people
01:04:41.000 | that don't really know AI,
01:04:43.120 | they get to really interact with it.
01:04:45.160 | - Which is very important.
01:04:46.000 | I mean, we must, you know,
01:04:48.640 | writing papers helps our peers, researchers,
01:04:51.400 | to understand what we're doing.
01:04:52.520 | But I think AI is becoming mature enough
01:04:55.880 | that we must sort of try to explain what it is.
01:04:59.000 | And perhaps through games is an obvious way
01:05:01.440 | because these games always had built-in AI.
01:05:03.640 | So it may be everyone experienced an AI playing a video game,
01:05:07.680 | even if they don't know
01:05:08.520 | because there's always some scripted element
01:05:10.240 | and some people might even call that AI already, right?
01:05:13.040 | - So what are other applications
01:05:16.320 | of the approaches underlying AlphaStar
01:05:19.080 | that you see happening?
01:05:20.280 | There's a lot of echoes of, you said,
01:05:22.360 | transformer of language modeling and so on.
01:05:25.680 | Have you already started thinking
01:05:27.120 | where the breakthroughs in AlphaStar
01:05:30.400 | get expanded to other applications?
01:05:32.280 | - Right, so I thought about a few things
01:05:34.640 | for like kind of next month, next year.
01:05:38.440 | The main thing I'm thinking about actually is what's next
01:05:41.480 | as a kind of a grand challenge,
01:05:43.160 | because for me, like we've seen Atari
01:05:47.120 | and then there's like the sort of three-dimensional worlds
01:05:50.280 | that we've seen also like pretty good performance
01:05:52.520 | from this Capture the Flag agents
01:05:54.120 | that also some people at DeepMind and elsewhere
01:05:56.440 | are working on.
01:05:57.600 | We've also seen some amazing results on like,
01:05:59.600 | for instance, Dota 2,
01:06:00.560 | which is also a very complicated game.
01:06:03.280 | So for me, like the main thing I'm thinking about
01:06:05.960 | is what's next in terms of challenge.
01:06:07.960 | So as a researcher, I see sort of two tensions
01:06:12.920 | between research and then applications
01:06:16.160 | or areas or domains where you apply them.
01:06:18.480 | So on the one hand, we've done,
01:06:20.480 | thanks to the application of StarCraft is very hard,
01:06:23.320 | we develop some techniques, some new research
01:06:25.600 | that now we could look at elsewhere.
01:06:27.480 | Like are there other applications where we can apply these?
01:06:30.520 | And the obvious ones, absolutely,
01:06:32.880 | you can think of feeding back to sort of the community
01:06:37.440 | we took from, which was mostly sequence modeling
01:06:40.240 | or natural language processing.
01:06:41.680 | So we've developed and extended things from the transformer
01:06:46.120 | and we use pointer networks.
01:06:48.120 | We combine LSTM and transformers in interesting ways.
01:06:51.280 | So that's perhaps the kind of lowest hanging fruit
01:06:54.200 | of feeding back to now a different field
01:06:57.600 | of machine learning that's not playing video games.
01:07:00.880 | - Let me go old school and jump to Mr. Alan Turing.
01:07:05.680 | So the Turing test, you know,
01:07:08.440 | it's a natural language test, a conversational test.
01:07:11.560 | What's your thought of it as a test for intelligence?
01:07:15.720 | Do you think it is a grand challenge
01:07:17.320 | that's worthy of undertaking?
01:07:18.880 | Maybe if it is, would you reformulate it
01:07:21.920 | or phrase it somehow differently?
01:07:23.680 | - Right, so I really love the Turing test
01:07:25.640 | because I also like sequences and language understanding.
01:07:29.560 | And in fact, some of the early work we did
01:07:32.480 | in machine translation, we tried to apply
01:07:34.960 | to kind of a neural chatbot,
01:07:37.280 | which obviously would never pass the Turing test
01:07:40.160 | because it was very limited.
01:07:42.280 | But it is a very fascinating idea
01:07:45.160 | that you could really have an AI
01:07:49.400 | that would be indistinguishable from humans
01:07:51.760 | in terms of asking or conversing with it, right?
01:07:56.000 | So I think the test itself seems very nice
01:08:00.680 | and it's kind of well-defined actually,
01:08:02.560 | like the passing it or not.
01:08:04.960 | I think there's quite a few rules
01:08:06.560 | that feel like pretty simple
01:08:09.120 | and you could really like have,
01:08:12.480 | I mean, I think they have these competitions every year.
01:08:14.760 | - Yeah, so the Loebner Prize,
01:08:15.920 | but I don't know if you've seen,
01:08:17.520 | I don't know if you've seen the kind of bots
01:08:22.240 | that emerge from that competition.
01:08:24.160 | They're not quite as what you would,
01:08:28.000 | so it feels like that there's weaknesses
01:08:29.920 | with the way Turing formulated it.
01:08:31.400 | It needs to be that the definition
01:08:34.960 | of a genuine, rich, fulfilling human conversation
01:08:39.960 | needs to be something else.
01:08:41.600 | Like the Alexa Prize, which I'm not as well familiar with,
01:08:44.840 | has tried to define that more,
01:08:46.160 | I think by saying you have to continue
01:08:48.200 | keeping a conversation for 30 minutes,
01:08:50.640 | something like that.
01:08:52.200 | So basically forcing the agent not to just fool
01:08:55.480 | but to have an engaging conversation kind of thing.
01:08:57.980 | Is that, I mean, is this,
01:09:02.260 | have you thought about this problem richly?
01:09:06.380 | And if you have in general, how far away are we from,
01:09:10.660 | you worked a lot on language,
01:09:12.340 | understanding language generation,
01:09:15.420 | but the full dialogue, the conversation,
01:09:17.700 | just sitting at the bar,
01:09:19.860 | having a cup of beers for an hour,
01:09:21.720 | that kind of conversation, have you thought about it?
01:09:23.620 | - Yeah, so I think you touched here on the critical point,
01:09:26.380 | which is feasibility, right?
01:09:28.580 | So there's a great sort of essay by Hamming,
01:09:32.860 | which describes sort of grand challenges of physics.
01:09:37.360 | And he argues that, well, okay, for instance,
01:09:41.060 | teleportation or time travel
01:09:43.060 | are great grand challenges of physics,
01:09:45.220 | but there's no way to attack them.
01:09:46.580 | We really don't know or cannot kind of make any progress.
01:09:50.300 | So that's why most physicists and so on,
01:09:53.340 | they don't work on these in their PhDs
01:09:55.340 | and as part of their careers.
01:09:57.860 | So I see the Turing test as, in the full Turing test,
01:10:00.980 | as a bit still too early.
01:10:02.720 | Like I am, I think we're,
01:10:05.220 | especially with the current trend
01:10:06.700 | of deep learning language models,
01:10:10.060 | we've seen some amazing examples.
01:10:11.580 | I think GPT-2 being the most recent one,
01:10:14.140 | which is very impressive,
01:10:15.820 | but to understand, to fully solve passing
01:10:19.540 | or fooling a human to think that you're,
01:10:22.060 | that there's a human on the other side,
01:10:23.420 | I think we're quite far.
01:10:24.940 | So as a result, I don't see myself,
01:10:27.300 | and I probably would not recommend people doing a PhD
01:10:30.460 | on solving the Turing test,
01:10:31.620 | because it just feels it's kind of too early
01:10:34.080 | or too hard of a problem.
01:10:35.460 | - Yeah, but that said, you said the exact same thing
01:10:37.780 | about StarCraft about a few years ago.
01:10:40.420 | - Indeed. - So to Demis.
01:10:41.580 | So I appreciate. (laughs)
01:10:43.420 | - Yes. - You'll probably also be
01:10:45.020 | the person who passes the Turing test in three years.
01:10:48.180 | - I mean, I think that, yeah, so.
01:10:50.980 | - So we have this on record, this is nice.
01:10:52.660 | - It's true, it's true.
01:10:53.500 | I mean, it's true that progress sometimes
01:10:56.540 | is a bit unpredictable.
01:10:57.780 | I really wouldn't have not, even six months ago,
01:11:00.780 | I would not have predicted the level that we see
01:11:03.220 | that these agents can deliver at Grandmaster level.
01:11:06.740 | But I have worked on language enough,
01:11:10.060 | and basically my concern is not that something could happen,
01:11:13.580 | a breakthrough could happen that would bring us
01:11:15.620 | to solving or passing the Turing test,
01:11:18.380 | is that I just think the statistical approach to it,
01:11:21.660 | like this, it's not gonna cut it.
01:11:24.100 | So we need a breakthrough, which is great for the community.
01:11:28.260 | But given that, I think there's quite more uncertainty.
01:11:31.740 | Whereas for StarCraft, I knew what the steps would be
01:11:36.740 | to kind of get us there.
01:11:38.060 | I think it was clear that using the imitation learning part
01:11:41.540 | and then using these Battle.net for agents
01:11:44.300 | were gonna be key, and it turned out that this was the case
01:11:48.220 | and a little more was needed, but not much more.
01:11:51.540 | For Turing test, I just don't know what the plan
01:11:54.260 | or execution plan would look like.
01:11:55.900 | So that's why I myself working on it
01:11:59.060 | as a grand challenge is hard,
01:12:01.420 | but there are quite a few sub challenges that are related
01:12:04.780 | that you could say, well, I mean,
01:12:05.900 | what if you create a great assistant,
01:12:09.020 | like Google already has like the Google Assistant,
01:12:11.340 | so can we make it better?
01:12:13.020 | And can we make it fully neural and so on?
01:12:15.380 | That I start to believe maybe we're reaching a point
01:12:18.140 | where we should attempt these challenges.
01:12:20.660 | - I like this conversation so much
01:12:22.380 | 'cause it echoes very much the StarCraft conversation.
01:12:24.820 | It's exactly how you approach StarCraft.
01:12:26.820 | Let's break it down into small pieces and solve those,
01:12:29.580 | and you end up solving the whole game.
01:12:31.300 | Great, but that said, you're behind some of the
01:12:35.180 | sort of biggest pieces of work in deep learning
01:12:37.660 | in the last several years.
01:12:39.300 | So you mentioned some limits.
01:12:42.260 | What do you think of the current limits of deep learning
01:12:44.900 | and how do we overcome those limits?
01:12:47.020 | - So if I had to actually use a single word
01:12:50.100 | to define the main challenge in deep learning,
01:12:53.140 | it's a challenge that probably has been the challenge
01:12:55.660 | for many years and is that of generalization.
01:12:59.660 | So what that means is that all that we're doing
01:13:04.460 | is fitting functions to data.
01:13:06.700 | And when the data we see is not from the same distribution
01:13:11.700 | or even if there are sometimes that it is very close
01:13:15.060 | to the distribution, but because of the way we train it
01:13:18.140 | with limited samples, we then get to this stage
01:13:22.340 | where we just don't see generalization
01:13:25.540 | as much as we can generalize.
01:13:27.700 | And I think adversarial examples are a clear example
01:13:30.780 | of this, but if you study machine learning and literature
01:13:34.540 | and the reason why SVMs came very popular
01:13:38.260 | were because they were dealing
01:13:39.660 | and they had some guarantees about generalization,
01:13:42.300 | which is unseen data or out of distribution,
01:13:45.500 | or even within distribution where you take an image,
01:13:47.900 | adding a bit of noise, these models fail.
01:13:51.220 | So I think really, I don't see a lot of progress
01:13:56.220 | on generalization in the strong generalization sense
01:14:00.780 | of the word.
01:14:01.820 | I think with our neural networks, you can always find
01:14:06.820 | designed examples that will make their outputs arbitrary,
01:14:10.980 | which is not good, because we humans would never be fooled
01:14:15.980 | by these kinds of images or manipulations of the image.
01:14:19.900 | And if you look at the mathematics, you kind of understand
01:14:22.660 | this is a bunch of matrices multiplied together.
01:14:26.060 | There's probably numerical instability,
01:14:28.020 | so you can just find corner cases.
01:14:30.820 | So I think that's really the underlying topic.
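To make the "matrices multiplied together" point concrete, here is a minimal sketch, with all numbers invented for illustration, of how a purely linear scorer can have its output shifted by a fixed amount using a perturbation that is tiny per coordinate. This is the FGSM-style construction behind many adversarial examples, not anything from the speaker's own systems:

```python
# Toy linear model: a single weight vector, the simplest "stack of matrices".
w = [3.0, -2.0, 4.0, -1.0]          # weights of a toy linear scorer (invented)
x = [0.5, 0.1, 0.2, 0.9]            # a "clean" input (invented)

def score(w, x):
    """Dot product: what a one-layer network computes."""
    return sum(wi * xi for wi, xi in zip(w, x))

eps = 0.1                            # small per-coordinate noise
# Perturb each coordinate by +/- eps in the direction of its weight's sign:
x_adv = [xi + eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

print(score(w, x))                   # clean score
print(score(w, x_adv))               # shifted by exactly eps * sum(|w|) = 1.0
```

Stacking many such layers only amplifies the effect, which is one intuition for why corner cases are so easy to find in deep networks.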
01:14:34.500 | Many times we see this even at the grand stage
01:14:38.700 | of something like the Turing test: generalization.
01:14:43.100 | I mean, passing the Turing test, should it be in English
01:14:46.420 | or should it be in any language, right?
01:14:48.500 | I mean, as a human, if you ask something
01:14:52.260 | in a different language, you actually will go
01:14:54.060 | and do some research and try to translate it and so on.
01:14:57.660 | Should the Turing test include that, right?
01:15:01.020 | And it's really a difficult problem and very fascinating
01:15:03.980 | and very mysterious, actually.
01:15:05.300 | - Yeah, absolutely.
01:15:06.260 | But do you think it's, if you were to try to solve it,
01:15:10.460 | can you not grow the size of data intelligently
01:15:14.220 | in such a way that the distribution of your training set
01:15:17.380 | does include the entirety of the testing set?
01:15:20.340 | - I think--
01:15:21.180 | - Is that one path?
01:15:22.020 | The other path is totally new methodology.
01:15:23.820 | - Right. - It's not statistical.
01:15:24.940 | - So a path that has worked well, and it worked well
01:15:27.860 | in StarCraft and in machine translation and in language,
01:15:30.660 | is scaling up the data and the model.
01:15:32.780 | And that's kind of been maybe the only single formula
01:15:37.340 | that still delivers today in deep learning, right?
01:15:40.420 | It's that scale, data scale and model scale
01:15:44.020 | really do more and more of the things that we thought,
01:15:47.060 | oh, there's no way it can generalize to these
01:15:49.180 | or there's no way it can generalize to that.
01:15:51.300 | But I don't think fundamentally it will be solved with this.
01:15:54.820 | And for instance, I really like a style or approach
01:15:59.580 | that would not only have neural networks,
01:16:02.100 | but would also have programs or some discrete decision-making,
01:16:06.380 | because that is where I feel there's a bit more hope.
01:16:09.700 | The best example,
01:16:12.140 | I think, for understanding this is:
01:16:14.620 | I also worked a bit on learning an algorithm
01:16:17.580 | with a neural network, right?
01:16:18.780 | So you give it many examples and it's gonna
01:16:21.340 | sort the input numbers or something like that.
01:16:24.380 | But really, strong generalization is:
01:16:27.740 | you give me some numbers, or you ask me to create an algorithm
01:16:31.380 | that sorts numbers.
01:16:32.300 | And instead of creating a neural net,
01:16:33.700 | which will be fragile because it's gonna go out of range
01:16:37.340 | at some point, when you give it numbers
01:16:38.980 | that are too large or too small or whatnot,
01:16:42.060 | if you just create a piece of code
01:16:45.340 | that sorts the numbers, then you can prove
01:16:47.180 | that that will generalize to absolutely
01:16:49.660 | all the possible inputs you could give.
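As a toy illustration of this contrast, here is a sketch where a "learned" sorter only works inside its training range, while a written program generalizes to all inputs. The `learned_sort` stand-in and its fixed range are invented for this sketch; it is not anything from AlphaStar or the systems discussed:

```python
def learned_sort(xs, max_value=100):
    """Stand-in for a model trained only on numbers in [0, max_value):
    a counting sort whose 'knowledge' is hard-coded to that range."""
    counts = [0] * max_value
    for x in xs:
        counts[x] += 1          # raises IndexError outside the "training range"
    return [v for v, c in enumerate(counts) for _ in range(c)]

def program_sort(xs):
    """A piece of code that sorts: correct for any comparable inputs."""
    return sorted(xs)

in_range = [5, 3, 9]
print(learned_sort(in_range))       # fine: within the "training distribution"
print(program_sort(in_range))

out_of_range = [5, 3, 10**6]        # "numbers that are too large"
print(program_sort(out_of_range))   # still correct
try:
    learned_sort(out_of_range)
except IndexError:
    print("learned sorter fails out of range")
```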
01:16:51.940 | So I think that's where the problem comes
01:16:53.820 | with some exciting prospects.
01:16:55.860 | I mean, scale is a bit more boring, but it really works.
01:16:59.460 | And then maybe programs and discrete abstractions
01:17:02.860 | are a bit less developed, but clearly I think
01:17:06.380 | they're quite exciting in terms of future for the field.
01:17:09.900 | - Do you draw any insight or wisdom from the 80s
01:17:13.460 | and expert systems and symbolic systems, symbolic computing?
01:17:16.900 | Do you ever go back to those sort of reasoning,
01:17:19.580 | that kind of logic?
01:17:20.740 | Do you think that might make a comeback?
01:17:23.140 | You'll have to dust off those books?
01:17:24.900 | - Yeah, I actually love adding
01:17:28.180 | more inductive biases.
01:17:30.180 | To me, the problem really is, what are you trying to solve?
01:17:34.260 | If what you're trying to solve is so important
01:17:36.460 | that you'll try to solve it no matter what,
01:17:39.140 | then absolutely use rules, use domain knowledge,
01:17:44.140 | and then use a bit of the magic of machine learning
01:17:46.860 | to empower the system, or to make it the best system
01:17:50.060 | that will detect cancer or detect weather patterns, right?
01:17:55.060 | Or in terms of StarCraft, it also was a very big challenge.
01:17:59.060 | So I would definitely have been happy to cut a corner here
01:18:04.180 | and there if it had been interesting to do.
01:18:06.820 | And in fact, in StarCraft, we did start thinking
01:18:09.500 | about expert systems, because it's a domain you can define.
01:18:12.700 | I mean, people actually build StarCraft bots
01:18:15.020 | by thinking about those principles,
01:18:16.820 | like state machines and rule-based systems,
01:18:20.140 | and then you could think of combining a bit
01:18:22.820 | of a rule-based system,
01:18:24.420 | but that has also neural networks incorporated
01:18:27.380 | to make it generalize a bit better.
01:18:28.980 | So absolutely, I mean, we should definitely go back
01:18:31.740 | to those ideas and anything that makes the problem simpler.
01:18:35.300 | As long as your problem is important, that's okay,
01:18:37.900 | and that's research driven by a very important problem.
01:18:40.940 | And on the other hand, if you wanna really focus
01:18:44.420 | on the limits of reinforcement learning,
01:18:46.500 | then of course you must try not to look at imitation data
01:18:50.620 | or to look for some rules of the domain
01:18:54.060 | that would help a lot or even feature engineering, right?
01:18:56.900 | So this is a tension that depending on what you do,
01:19:00.620 | I think both ways are definitely fine.
01:19:03.180 | And I would never rule out one or the other,
01:19:05.900 | as long as what you're doing is important
01:19:08.780 | and needs to be solved, right?
01:19:09.900 | - Right.
01:19:11.180 | So there's a bunch of different ideas
01:19:13.380 | that you've developed that I really enjoy.
01:19:16.780 | But one is translating from image captioning,
01:19:21.780 | translating from image to text.
01:19:23.820 | Just another beautiful idea, I think,
01:19:28.460 | that resonates throughout your work, actually.
01:19:33.140 | So the underlying nature of reality being language,
01:19:36.060 | always, somehow. - Yeah.
01:19:38.740 | - So what's the connection between images and text,
01:19:42.460 | or rather the visual world
01:19:43.940 | and the world of language in your view?
01:19:46.460 | - Right, so I think a piece of research
01:19:50.580 | that's been central to, I would say,
01:19:52.300 | even extending into StarCraft is this idea
01:19:54.980 | of sequence to sequence learning,
01:19:57.580 | which what we really meant by that is that
01:20:00.060 | you can now really input anything to a neural network
01:20:04.500 | as the input X,
01:20:06.060 | and then the neural network will learn a function F
01:20:09.500 | that will take X as an input and produce any output Y.
01:20:12.740 | And these Xs and Ys don't need to be static
01:20:16.140 | or like fixed vectors or anything like that.
01:20:21.140 | They could really be sequences,
01:20:23.700 | and now even richer data structures, right?
01:20:26.500 | So that paradigm was tested in a very interesting way
01:20:31.500 | when we moved from translating French to English
01:20:35.700 | to translating an image to its caption.
01:20:37.860 | But the beauty of it is that really,
01:20:40.660 | and that's actually how it happened.
01:20:42.060 | I changed a line of code
01:20:44.260 | in this thing that was doing machine translation,
01:20:47.140 | and I came in the next day and saw
01:20:50.420 | that it was producing captions that seemed like,
01:20:53.580 | oh my God, this is really, really working.
01:20:55.980 | And the principle is the same, right?
01:20:57.500 | So I think I don't see text, vision, speech,
01:21:02.500 | waveforms as something different.
01:21:06.100 | As long as you basically learn a function
01:21:10.500 | that will vectorize these inputs,
01:21:14.740 | and then after we vectorize them,
01:21:16.460 | we can then use transformers, LSTMs,
01:21:19.540 | whatever the flavor of the month of the model is.
01:21:22.380 | And then as long as we have enough supervised data,
01:21:25.700 | really this formula will work and will keep working,
01:21:29.980 | I believe, to some extent,
01:21:31.780 | modulo these generalization issues that I mentioned before.
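A minimal sketch of that "everything is the same once vectorized" view: two encoders for different modalities produce the same fixed-size vector, and one decoder consumes it without knowing which modality it came from. The hash-based encoders and stub decoder here are invented stand-ins; a real system would use an LSTM or transformer for text and a CNN for images:

```python
import hashlib

DIM = 8  # size of the shared representation (invented for this sketch)

def vectorize(data: bytes) -> list:
    """Deterministically map any byte payload to a fixed-size vector."""
    digest = hashlib.sha256(data).digest()
    return [b / 255.0 for b in digest[:DIM]]

def encode_text(sentence: str) -> list:
    return vectorize(sentence.encode("utf-8"))

def encode_image(pixels: list) -> list:
    return vectorize(bytes(pixels))

def decode(vector: list) -> str:
    """Stub decoder: in seq2seq this would autoregressively emit tokens."""
    return f"output conditioned on {len(vector)}-dim vector"

# The "change one line" idea: the decoder never knows which modality it sees.
text_vec = encode_text("Je suis étudiant")
image_vec = encode_image([0, 128, 255, 64])
assert len(text_vec) == len(image_vec) == DIM
print(decode(text_vec))
print(decode(image_vec))
```

Swapping translation for captioning really is just swapping the encoder; the decoder-side interface stays identical.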
01:21:34.940 | - So, but the task there is to vectorize,
01:21:36.700 | sort of form a representation that's meaningful, I think.
01:21:39.820 | And your intuition now,
01:21:41.460 | having worked with all this media is that
01:21:43.500 | once you are able to form that representation,
01:21:46.460 | you could basically take anything, any sequence.
01:21:49.100 | Is there, going back to StarCraft,
01:21:52.460 | are there limits on the length?
01:21:55.340 | So we didn't really touch on the long-term aspect.
01:21:59.420 | How did you overcome the whole
01:22:01.340 | really long-term aspect of things here?
01:22:03.740 | Is there some tricks or--
01:22:05.100 | - So the main trick, so StarCraft,
01:22:08.340 | if you look at absolutely every frame,
01:22:10.620 | you might think it's quite a long game.
01:22:12.420 | So we would have to multiply roughly 22 frames per second times 60 seconds per minute
01:22:17.420 | times maybe at least 10 minutes per game on average.
01:22:21.740 | So there are quite a few frames,
01:22:25.660 | but the trick really was to only observe, in fact,
01:22:30.180 | which might be seen as a limitation,
01:22:32.260 | but it is also a computational advantage.
01:22:35.180 | Only observe when you act.
01:22:37.580 | And then what the neural network decides
01:22:39.980 | is what is the gap gonna be until the next action?
01:22:43.620 | And if you look at most StarCraft games
01:22:48.060 | that we have in the dataset that Blizzard provided,
01:22:51.940 | it turns out that most games are actually only,
01:22:55.980 | I mean, it is still a long sequence,
01:22:57.980 | but it's maybe like 1,000 to 1,500 actions,
01:23:02.060 | which if you start looking at LSTMs,
01:23:06.140 | large LSTMs, transformers,
01:23:08.580 | it's not that difficult,
01:23:11.620 | especially if you have supervised learning.
01:23:14.460 | If you had to do it with reinforcement learning,
01:23:16.220 | the credit assignment problem,
01:23:17.700 | what is it in this game that made you win?
01:23:19.780 | That would be really difficult.
01:23:21.580 | But thankfully, because of imitation learning,
01:23:24.540 | we didn't kind of have to deal with this directly.
01:23:27.420 | Although if we had to, we tried it,
01:23:29.580 | and what happened is you just take all your workers
01:23:31.820 | and attack with them.
01:23:33.340 | And that sort of is kind of obvious in retrospect
01:23:36.060 | because you start trying random actions.
01:23:38.100 | One of the actions will be a worker
01:23:40.300 | that goes to the enemy base,
01:23:41.420 | and because it's self-play,
01:23:42.980 | it's not gonna know how to defend
01:23:44.740 | because it basically doesn't know almost anything.
01:23:47.020 | And eventually what you develop is this,
01:23:49.420 | take all workers and attack,
01:23:51.060 | because the credit assignment issue in RL is really, really hard.
01:23:55.860 | I do believe we could do better,
01:23:57.580 | and that's maybe a research challenge for the future.
01:24:00.580 | But yeah, even in StarCraft,
01:24:03.460 | the sequences are maybe 1,000,
01:24:05.420 | which I believe is within the realm of what transformers can do.
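A rough sketch of the "only observe when you act" trick described above: the policy emits an action plus the gap, in frames, until its next decision, so the sequence the model handles is the number of decisions rather than the number of frames. Everything here, the stub policy and its delay range, is invented for illustration; AlphaStar's real interface is far richer:

```python
import random

FPS = 22                       # StarCraft II runs at roughly 22 game frames/sec
GAME_FRAMES = FPS * 60 * 10    # ~10 minutes of game time, ~13,200 frames

def policy(observation):
    """Stand-in agent: returns an action and the gap (in frames)
    until it wants to observe and act again."""
    action = "noop"
    delay = random.randint(5, 20)   # the network itself decides the gap
    return action, delay

def run_episode(seed=0):
    random.seed(seed)
    frame, decisions = 0, 0
    while frame < GAME_FRAMES:
        _, delay = policy(observation=None)  # observe only at decision points
        frame += delay                       # frames in between are skipped
        decisions += 1
    return decisions

# The effective sequence length is the decision count: closer to ~1,000
# than to ~13,000 frames, within reach of large LSTMs or transformers.
print(run_episode())
```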
01:24:10.380 | Yeah, I guess the difference between StarCraft and Go is
01:24:14.540 | in Go and chess, stuff starts happening right away.
01:24:17.820 | - Right. - So there's not...
01:24:19.420 | Yeah, it's pretty easy through self-play,
01:24:22.180 | not easy, but through self-play,
01:24:23.460 | it's possible to develop reasonable strategies quickly
01:24:25.940 | as opposed to StarCraft.
01:24:27.220 | I mean, in Go, there's only 400 actions,
01:24:30.620 | but one action is what people would call the God action
01:24:34.140 | that would be, if you had expanded the whole search tree,
01:24:38.660 | that's the best action if you did minimax
01:24:40.780 | or whatever algorithm you would do
01:24:42.540 | if you had the computational capacity.
01:24:44.940 | But in StarCraft, 400 is minuscule.
01:24:48.620 | Like with 400 actions, you couldn't even click
01:24:51.900 | on the pixels around a unit, right?
01:24:53.780 | So I think the problem there is,
01:24:56.420 | in terms of action space size, is way harder.
01:25:00.900 | So, and that search is impossible.
01:25:03.820 | So there's quite a few challenges indeed
01:25:05.980 | that make this kind of a step up
01:25:09.300 | in terms of machine learning.
01:25:10.580 | For humans, maybe playing StarCraft seems more intuitive
01:25:14.420 | because it looks real.
01:25:15.900 | I mean, the graphics and everything moves smoothly,
01:25:18.780 | whereas I don't know how to,
01:25:20.140 | I mean, Go is a game that I would really need to study.
01:25:22.620 | It feels quite complicated.
01:25:23.860 | But for machines, kind of maybe it's the reverse, yes.
01:25:27.020 | - Which shows you the gap actually between deep learning
01:25:30.140 | and however the heck our brains work.
01:25:32.140 | So you developed a lot of really interesting ideas.
01:25:35.980 | It's interesting to just ask,
01:25:37.540 | what's your process of developing new ideas?
01:25:41.140 | Do you like brainstorming with others?
01:25:42.860 | Do you like thinking alone?
01:25:44.500 | Do you like, like what was it, Ian Goodfellow said
01:25:49.100 | he came up with GANs after a few beers.
01:25:51.260 | - Right.
01:25:53.340 | - He thinks beers are essential for coming up with new ideas.
01:25:55.820 | - We had beers to decide to play another game
01:25:58.500 | of StarCraft after a week.
01:25:59.660 | So it's really similar to that story.
01:26:02.660 | Actually, I explained this in a DeepMind retreat
01:26:05.780 | and I said, this is the same as the GAN story.
01:26:07.900 | I mean, we were in a bar and we decided,
01:26:09.540 | let's play a game next week and that's what happened.
01:26:11.820 | - I feel like we're giving the wrong message
01:26:13.500 | to young undergrads.
01:26:15.020 | - Yeah, I know.
01:26:15.860 | - But in general, like, do you like brainstorming?
01:26:18.220 | Do you like thinking alone, working stuff out?
01:26:20.140 | - So I think throughout the years also things changed, right?
01:26:23.860 | So initially I was very fortunate to be with great minds
01:26:28.860 | like Geoff Hinton, Jeff Dean, and Ilya Sutskever.
01:26:33.940 | I was really fortunate to join Brain at the very good time.
01:26:37.660 | So at that point, ideas, I was just kind of brainstorming
01:26:41.460 | with my colleagues and learned a lot.
01:26:43.940 | And keep learning is actually something
01:26:46.300 | you should never stop doing, right?
01:26:48.100 | So learning implies reading papers
01:26:50.940 | and also discussing ideas with others.
01:26:53.140 | It's very hard at some point not to communicate,
01:26:56.620 | whether that means reading a paper from someone
01:26:59.060 | or actually discussing, right?
01:27:00.460 | So definitely that communication aspect needs to be there,
01:27:05.420 | whether it's written or oral.
01:27:07.580 | Nowadays, I'm also trying to be a bit more strategic
01:27:12.780 | about what research to do.
01:27:15.020 | So I was describing a little bit this sort of tension
01:27:18.420 | between research for the sake of research.
01:27:21.460 | And then you have, on the other hand,
01:27:22.940 | applications that can drive the research, right?
01:27:25.580 | And honestly, the formula that has worked best for me
01:27:28.500 | is just find a hard problem
01:27:31.540 | and then try to see how research fits into it,
01:27:34.620 | how it doesn't fit into it, and then you must innovate.
01:27:37.820 | So I think machine translation drove sequence to sequence.
01:27:42.820 | Then maybe learning
01:27:47.140 | combinatorial algorithms led to pointer networks.
01:27:50.540 | StarCraft led to really scaling up imitation learning
01:27:53.860 | and the AlphaStar League.
01:27:55.540 | So that's been a formula that I personally like,
01:27:58.380 | but the other one is also valid.
01:27:59.980 | And I see it succeed a lot of the times
01:28:02.740 | where you just want to investigate model-based RL
01:28:06.540 | as a kind of a research topic.
01:28:08.180 | And then you must start to think,
01:28:11.020 | well, what are the tests?
01:28:12.180 | How are you going to test these ideas?
01:28:14.260 | You need kind of a minimal environment to try things.
01:28:17.940 | You need to read a lot of papers and so on.
01:28:19.740 | And that's also very fun to do
01:28:21.020 | and something I've also done quite a few times,
01:28:24.060 | both at Brain, at DeepMind, and obviously as a PhD.
01:28:27.580 | So I think besides the ideas and discussions,
01:28:32.580 | I think it's important also
01:28:34.660 | because you start sort of guiding not only your own goals,
01:28:39.660 | but other people's goals to the next breakthrough.
01:28:43.860 | So you must really kind of understand this feasibility also,
01:28:48.700 | as we were discussing before, right?
01:28:50.340 | Whether this domain is ready to be tackled or not,
01:28:54.020 | and you don't want to be too early.
01:28:55.460 | You obviously don't want to be too late.
01:28:56.940 | So it's really interesting,
01:28:59.180 | this strategic component of research,
01:29:01.060 | which I think as a grad student, I just had no idea.
01:29:05.100 | I just read papers and discussed ideas.
01:29:07.380 | And I think this has been maybe the major change.
01:29:09.780 | And I recommend people kind of feed forward to success,
01:29:14.180 | imagine what it looks like, and try to backtrack,
01:29:16.060 | rather than just kind of looking,
01:29:17.820 | oh, this looks cool, this looks cool.
01:29:19.180 | And then you do a bit of random work,
01:29:21.020 | which sometimes you stumble upon some interesting things,
01:29:23.820 | but in general, it's also good to plan a bit.
01:29:27.540 | - Yeah, I like it.
01:29:29.020 | Especially like your approach
01:29:30.460 | of taking a really hard problem, stepping right in,
01:29:33.140 | and then being super skeptical
01:29:34.660 | about being able to solve the problem.
01:29:37.540 | I mean, there's a balance of both, right?
01:29:40.100 | There's a silly optimism and a critical sort of skepticism
01:29:45.100 | that's good to balance,
01:29:48.380 | which is why it's good to have a team of people
01:29:51.180 | that balance that.
01:29:52.660 | - You don't do that on your own.
01:29:53.900 | You have both mentors that have seen,
01:29:56.460 | or you obviously wanna chat and discuss
01:29:59.740 | whether it's the right time.
01:30:00.900 | I mean, Demis came in 2014 and he said,
01:30:04.620 | "Maybe in a bit, we'll do StarCraft."
01:30:06.580 | And maybe he knew.
01:30:08.340 | And I'm just following his lead, which is great,
01:30:11.220 | because he's brilliant, right?
01:30:12.620 | So these things are obviously quite important
01:30:17.340 | that you wanna be surrounded by people who are diverse.
01:30:22.300 | They have their knowledge.
01:30:24.020 | It's also important to...
01:30:26.380 | I mean, I've learned a lot from people
01:30:28.340 | who actually have an idea that I might not think it's good,
01:30:32.460 | but if I give them the space to try it,
01:30:34.940 | I've been proven wrong many, many times as well.
01:30:36.980 | So that's great.
01:30:38.220 | I think it's...
01:30:39.140 | Your colleagues are more important than yourself, I think.
01:30:43.500 | - Sure.
01:30:44.580 | Now, let's real quick talk about another impossible problem.
01:30:49.620 | - Right.
01:30:50.460 | - What do you think it takes to build a system
01:30:52.460 | that's human level intelligence?
01:30:54.100 | We talked a little bit about the Turing test, StarCraft,
01:30:56.380 | all of these have echoes of general intelligence.
01:30:59.020 | But if you think about just something
01:31:01.420 | that you would sit back and say,
01:31:02.860 | "Wow, this is really something
01:31:05.460 | "that resembles human level intelligence."
01:31:07.820 | What do you think it takes to build that?
01:31:09.580 | - So I find that AGI oftentimes
01:31:13.940 | is maybe not very well-defined.
01:31:17.220 | So what I'm trying to come up with for myself
01:31:20.500 | is what a result would look like
01:31:23.980 | that would make you start to believe
01:31:25.540 | that you have agents or neural nets
01:31:28.460 | that no longer sort of overfit to a single task, right?
01:31:31.900 | But actually kind of learn the skill of learning,
01:31:36.900 | so to speak.
01:31:37.900 | And that actually is a field that I am fascinated by,
01:31:41.460 | which is the learning to learn or meta-learning,
01:31:45.020 | which is about no longer learning about a single domain.
01:31:48.620 | So you can think about the learning algorithm itself
01:31:51.620 | is general, right?
01:31:52.700 | So the same formula we applied for AlphaStar or StarCraft,
01:31:56.780 | we can now apply to kind of almost any video game
01:31:59.420 | or you could apply to many other problems and domains.
01:32:03.540 | But the algorithm is what's kind of generalizing.
01:32:06.980 | But the neural network, those weights are useless
01:32:10.420 | even to play another race, right?
01:32:12.060 | If I train a network to play very well at Protoss versus Protoss,
01:32:15.420 | I need to throw away those weights.
01:32:17.620 | If I want to play now Terran versus Terran,
01:32:20.620 | I would need to retrain a network from scratch
01:32:23.700 | with the same algorithm.
01:32:24.820 | That's beautiful, but the network itself will not be useful.
01:32:28.540 | So I think if I see an approach that can absorb
01:32:33.540 | or start solving new problems
01:32:36.660 | without the need to kind of restart the process,
01:32:40.100 | I think that to me would be a nice way
01:32:42.580 | to define some form of AGI.
01:32:45.620 | Again, I don't know the grandiose,
01:32:47.620 | like should Turing test be solved before AGI?
01:32:50.540 | I mean, I don't know.
01:32:51.740 | I think concretely, I would like to see clearly
01:32:54.700 | that meta-learning happen,
01:32:56.940 | meaning there is an architecture or a network
01:33:00.860 | that as it sees a new problem or new data, it solves it.
01:33:05.020 | And to make it kind of a benchmark,
01:33:08.300 | it should solve it at the same speed that we do solve
01:33:10.740 | new problems.
01:33:11.580 | When I define you a new object and you have to recognize it.
01:33:14.500 | When you start playing a new game,
01:33:16.300 | you played all the Atari games,
01:33:17.500 | but now you play a new Atari game.
01:33:19.460 | Well, you're going to be pretty quickly,
01:33:21.580 | pretty good at the game.
01:33:22.540 | So perhaps what the domain is
01:33:25.900 | and what the exact benchmark is, is a bit difficult.
01:33:28.060 | I think as a community,
01:33:29.100 | we might need to do some work to define it.
01:33:31.380 | But I think this first step,
01:33:34.380 | I could see it happen relatively soon.
01:33:36.900 | But then the whole question of what AGI means and so on,
01:33:40.660 | I am a bit more confused about,
01:33:43.140 | because I think people mean different things.
01:33:44.660 | - Yeah, there's an emotional, psychological level,
01:33:47.100 | in that even the Turing test,
01:33:51.980 | passing the Turing test, is something
01:33:53.780 | that we just pass judgment on as human beings.
01:33:55.900 | What does it mean to be, you know,
01:33:57.740 | is a dog an AGI system?
01:34:02.740 | - Yeah.
01:34:04.500 | - Like what level, what does it mean?
01:34:06.220 | - Right.
01:34:07.060 | - Yeah, what does it mean?
01:34:07.900 | But I like the generalization
01:34:08.980 | and maybe as a community we converge
01:34:10.700 | towards a group of domains
01:34:13.020 | that are sufficiently far away,
01:34:15.060 | that would be really damn impressive
01:34:16.580 | if it was able to generalize.
01:34:18.340 | So perhaps not as close as Protoss and Zerg,
01:34:21.420 | but like Wikipedia.
01:34:22.820 | - That would be a good step, yeah.
01:34:23.660 | - Yeah, it would be a good step.
01:34:24.700 | And then a really good step,
01:34:26.420 | but then like from Starcraft to Wikipedia and back.
01:34:30.860 | - Yeah.
01:34:31.700 | - That kind of thing.
01:34:32.540 | - And that feels also quite hard and far,
01:34:34.300 | but I think there's,
01:34:36.220 | as long as you put the benchmark out,
01:34:38.220 | as we discovered, for instance, with ImageNet,
01:34:41.140 | then tremendous progress can be had.
01:34:43.060 | So I think maybe there's a lack of benchmark,
01:34:46.460 | but I'm sure we'll find one
01:34:47.820 | and the community will then work towards that.
01:34:50.700 | And then beyond what AGI might mean or would imply,
01:34:57.020 | I really am hopeful to see basically machine learning
01:35:01.100 | or AI just scaling up and helping people
01:35:05.300 | that might not have the resources to hire an assistant
01:35:08.740 | or who might not even know what the weather is like.
01:35:13.740 | So I think there's, in terms of the impact,
01:35:16.460 | the positive impact of AI,
01:35:18.020 | I think that's maybe what we should also not lose focus on.
01:35:22.500 | The research community building AGI,
01:35:23.980 | I mean, that's a really nice goal,
01:35:25.540 | but I think the way that DeepMind puts it is, solve intelligence,
01:35:28.500 | and then use it to solve everything else, right?
01:35:30.820 | So I think we should parallelize.
01:35:33.500 | - Yeah, we shouldn't forget about all the positive things
01:35:36.180 | that are actually coming out of AI already
01:35:37.700 | and are going to be coming out.
01:35:40.700 | - Right.
01:35:41.660 | - But on that note, let me ask,
01:35:45.020 | relative to popular perception,
01:35:47.140 | do you have any worry about the existential threat
01:35:49.660 | of artificial intelligence in the near or far future
01:35:53.260 | that some people have?
01:35:55.180 | - I think in the near future, I'm skeptical,
01:35:58.100 | so I hope I'm not wrong,
01:35:59.340 | but I'm not concerned,
01:36:02.380 | but I appreciate efforts, ongoing efforts,
01:36:06.100 | and even like whole research field on AI safety emerging
01:36:09.260 | and in conferences and so on, I think that's great.
01:36:12.620 | In the long term, I really hope we just can simply
01:36:17.580 | have the benefits outweigh the potential dangers.
01:36:20.700 | I am hopeful for that,
01:36:23.380 | but also we must remain vigilant to kind of monitor
01:36:26.540 | and assess whether the trade-offs are there
01:36:29.140 | and that we also have enough lead time to prevent
01:36:33.740 | or to redirect our efforts if need be, right?
01:36:36.860 | So, but I'm quite optimistic about the technology
01:36:41.540 | and definitely more fearful of other threats
01:36:45.060 | in terms of planetary level at this point,
01:36:48.580 | but obviously this is the one I kind of have more power over.
01:36:52.500 | So clearly I do start thinking more and more about this
01:36:56.260 | and it's kind of, it's grown in me actually
01:36:58.980 | to start reading more about AI safety,
01:37:02.180 | which is a field that so far I have not really contributed
01:37:05.220 | to, but maybe there's something to be done there as well.
01:37:07.620 | - Well, I think it's really important.
01:37:08.980 | You know, I talk about this with a few folks,
01:37:11.460 | but it's important to ask you and shove it in your head
01:37:14.860 | because you're at the leading edge of actually
01:37:17.860 | what people are excited about in AI.
01:37:19.340 | I mean, the work with AlphaStar,
01:37:21.500 | it's arguably at the very cutting edge of the kind of thing
01:37:25.380 | that people are afraid of.
01:37:27.220 | And so you speaking to that fact
01:37:29.580 | and that we're actually quite far away
01:37:32.660 | to the kind of thing that people might be afraid of,
01:37:35.180 | but it's still worthwhile to think about.
01:37:38.300 | And it's also good that you're not as worried
01:37:43.300 | and you're also open to thinking about it.
01:37:45.780 | - There's two aspects.
01:37:46.620 | I mean, me not being worried,
01:37:47.740 | but obviously we should prepare for it, right?
01:37:52.060 | For like, for things that could go wrong,
01:37:55.260 | misuse of the technologies as with any technologies, right?
01:37:58.300 | So I think there's always trade-offs.
01:38:02.340 | And as a society, we've kind of solved these
01:38:05.700 | to some extent in the past.
01:38:07.300 | So I'm hoping that by having the researchers
01:38:10.660 | and the whole community brainstorm
01:38:13.460 | and come up with interesting solutions
01:38:15.540 | to the new things that will happen in the future,
01:38:18.900 | that we can still also push the research to the avenue
01:38:22.420 | that I think is kind of the greatest avenue,
01:38:24.380 | which is to understand intelligence, right?
01:38:27.700 | How are we doing what we're doing?
01:38:29.620 | And obviously from a scientific standpoint,
01:38:32.540 | that is kind of the drive, my personal drive
01:38:35.420 | of all the time that I spend doing what I'm doing, really.
01:38:40.020 | - Where do you see the deep learning as a field heading?
01:38:42.980 | Where do you think the next big breakthrough might be?
01:38:46.740 | - So I think deep learning,
01:38:48.060 | I discussed a little of this before,
01:38:50.700 | deep learning has to be combined
01:38:53.100 | with some form of discretization, program synthesis.
01:38:56.660 | I think that's kind of as a research in itself
01:38:59.220 | is an interesting topic to expand
01:39:01.500 | and start doing more research.
01:39:03.100 | And then as kind of what will deep learning
01:39:07.060 | enable to do in the future?
01:39:08.620 | I don't think that's gonna be what's gonna happen this year,
01:39:11.500 | but also this idea of starting not to throw away
01:39:15.820 | all the weights, this idea of learning to learn,
01:39:18.900 | and really having these agents
01:39:22.700 | not having to restart their weights.
01:39:24.980 | And you can have an agent that is kind of solving
01:39:28.700 | or classifying images on ImageNet,
01:39:31.060 | but also generating speech
01:39:32.700 | if you ask it to generate some speech.
01:39:34.660 | And it should really be kind of almost the same network,
01:39:39.660 | but it might not be a neural network,
01:39:41.740 | it might be a neural network
01:39:42.700 | with an optimization algorithm attached to it.
01:39:45.620 | But I think this idea of generalization to new tasks
01:39:49.300 | is something that we first must define good benchmarks,
01:39:52.180 | but then I think that's gonna be exciting
01:39:54.660 | and I'm not sure how close we are,
01:39:56.500 | but I think if you have a very limited domain,
01:40:00.900 | I think we can start doing some progress.
01:40:02.820 | And much like how we made a lot of progress
01:40:06.220 | in computer vision, we should start thinking,
01:40:08.860 | I really like a talk that Léon Bottou gave at ICML
01:40:12.700 | a few years ago, arguing that this train-test paradigm
01:40:16.460 | should be broken.
01:40:17.380 | We should stop thinking about a training set
01:40:22.300 | and a test set, and these are closed things
01:40:25.180 | that are untouchable.
01:40:26.620 | I think we should go beyond these.
01:40:28.180 | And in meta-learning, we call these the meta-training set
01:40:31.100 | and the meta-test set, which is really thinking about
01:40:35.340 | if I know about ImageNet,
01:40:37.300 | why would that network not work on MNIST,
01:40:39.980 | which is a much simpler problem?
01:40:41.340 | But right now it really doesn't.
01:40:43.020 | But it just feels wrong, right?
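The meta-train/meta-test split over *tasks* (rather than over examples) can be sketched with a toy example. Everything below is illustrative and hypothetical: the tasks are 1-D regressions with different slopes, and the "learning to learn" step is just averaging the meta-training solutions into a shared starting point, not any particular published meta-learning algorithm.

```python
import random

random.seed(0)

# Each "task" is a toy 1-D regression problem y = a * x with its own slope a.
def make_task(slope, n=20):
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x) for x in xs]

def adapt(task, w0=0.0, steps=3, lr=0.1):
    """Fit the single parameter w with a few gradient-descent steps from w0."""
    w = w0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in task) / len(task)
        w -= lr * grad
    return w

def mse(w, task):
    return sum((w * x - y) ** 2 for x, y in task) / len(task)

# Meta-training and meta-test sets: the split is over tasks, not examples.
meta_train = [make_task(a) for a in (1.0, 1.5, 2.0)]
meta_test = [make_task(a) for a in (1.2, 1.8)]

# Crude "learning to learn": reuse the average of the meta-training solutions
# as the starting point for new tasks, instead of restarting the weights.
w_init = sum(adapt(t, steps=200) for t in meta_train) / len(meta_train)

err_meta = sum(mse(adapt(t, w0=w_init), t) for t in meta_test) / len(meta_test)
err_scratch = sum(mse(adapt(t, w0=0.0), t) for t in meta_test) / len(meta_test)
# With only a few adaptation steps, starting from w_init does much better
# on the held-out tasks than starting from scratch.
```

The point of the sketch is only the evaluation protocol: performance on *new* tasks after brief adaptation, rather than on a held-out test set of the same task.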
01:40:46.180 | So I think that's kind of where,
01:40:48.820 | on the application or the benchmark side,
01:40:52.060 | we probably will see quite a bit more interest and progress
01:40:56.500 | and hopefully people defining new
01:40:59.020 | and exciting challenges, really.
01:41:00.940 | - Do you have any hope or interest in knowledge graphs
01:41:04.180 | within this context?
01:41:05.260 | So this is kind of constructing graphs.
01:41:08.180 | So going back to graphs.
01:41:10.500 | Well, neural networks and graphs,
01:41:12.140 | but I mean a different kind of knowledge graph,
01:41:14.900 | sort of like semantic graphs where there's concepts.
01:41:18.100 | - Yeah, so I think the idea of graphs is,
01:41:23.100 | so I've been quite interested in sequences first
01:41:26.420 | and then more interesting or different data structures
01:41:29.100 | like graphs.
01:41:29.940 | And I've studied graph neural networks
01:41:33.100 | in the last three years or so.
01:41:34.540 | I found these models just very interesting
01:41:37.700 | from a deep learning standpoint.
01:41:42.220 | But then why do we want these models
01:41:45.860 | and why would we use them?
01:41:47.300 | What's the application?
01:41:48.660 | What's kind of the killer application of graphs, right?
01:41:51.420 | And perhaps if we could extract a knowledge graph
01:41:56.420 | from Wikipedia automatically, that would be interesting
01:42:02.460 | because then these graphs have
01:42:04.740 | this very interesting structure
01:42:06.860 | that also is a bit more compatible
01:42:08.620 | with this idea of programs
01:42:10.540 | and deep learning kind of working together,
01:42:13.180 | jumping neighborhoods and so on.
01:42:14.860 | You could imagine defining some primitives
01:42:17.180 | to go around graphs, right?
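One such graph primitive can be sketched in a few lines. This is purely illustrative, not any specific graph neural network library: the nodes, edges, and feature vectors below are made up, and the "message" is just a mean over neighbours.

```python
# A tiny, hypothetical knowledge-graph fragment: concept -> related concepts.
graph = {
    "StarCraft": ["RTS", "Blizzard"],
    "RTS": ["StarCraft"],
    "Blizzard": ["StarCraft"],
}
# Made-up 2-D feature vector per node.
features = {"StarCraft": [1.0, 0.0], "RTS": [0.0, 1.0], "Blizzard": [0.5, 0.5]}

def propagate(graph, feats):
    """One message-passing step: each node becomes the mean of its
    neighbours' features -- a 'jump to the neighbourhood' primitive."""
    new = {}
    for node, nbrs in graph.items():
        cols = zip(*(feats[n] for n in nbrs))
        new[node] = [sum(c) / len(nbrs) for c in cols]
    return new

updated = propagate(graph, features)
```

Stacking several such steps (with learned transformations in between) is the basic shape of a graph neural network layer; here only the neighbourhood-aggregation primitive is shown.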
01:42:18.860 | So I think I really like the idea of a knowledge graph.
01:42:23.860 | And in fact, when we started,
01:42:27.420 | or as part of the research we did for StarCraft,
01:42:31.340 | I thought, wouldn't it be cool to give it the graph
01:42:34.420 | of all these buildings that depend on each other,
01:42:39.420 | and units that have prerequisites before they can be built.
01:42:42.420 | And so this is information
01:42:44.820 | that the network can learn and extract,
01:42:46.900 | but it would have been great to see,
01:42:50.100 | or to think of, StarCraft as a giant graph
01:42:52.940 | where, even as the game evolves,
01:42:54.940 | you just kind of start taking branches and so on.
01:42:57.980 | And we did a bit of research on this, nothing too relevant,
01:43:02.380 | but I really like the idea.
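The building-dependency idea can be sketched as a prerequisite graph. The edges below are a simplified approximation of a slice of the Protoss tech tree, not the exact in-game rules; any topological order of the graph is then one valid build order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Simplified, illustrative prerequisites: building -> structures that must
# already exist before it can be built (approximate, not exact game rules).
tech_tree = {
    "Nexus": set(),
    "Pylon": {"Nexus"},
    "Gateway": {"Pylon"},
    "CyberneticsCore": {"Gateway"},
    "Stargate": {"CyberneticsCore", "Pylon"},
}

# A topological order respects every prerequisite edge.
build_order = list(TopologicalSorter(tech_tree).static_order())

# Check: every building appears after all of its prerequisites.
position = {b: i for i, b in enumerate(build_order)}
valid = all(position[p] < position[b]
            for b, prereqs in tech_tree.items() for p in prereqs)
```

Under this framing, an agent's build decisions amount to choosing which branches of the graph to expand as the game evolves.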
01:43:04.140 | - And it has elements,
01:43:05.660 | which is something you also worked on
01:43:07.340 | in terms of visualizing neural networks,
01:43:08.820 | of being human interpretable,
01:43:12.340 | of being able to generate knowledge representations
01:43:15.700 | that are human interpretable,
01:43:17.020 | that maybe human experts can then tweak
01:43:19.620 | or at least understand.
01:43:22.860 | And for me personally, I'm just a huge fan of Wikipedia
01:43:25.620 | and it's a shame that our neural networks
01:43:29.140 | aren't taking advantage of all the structured knowledge
01:43:31.340 | that's on the web.
01:43:32.380 | What's next for you?
01:43:34.860 | What's next for DeepMind?
01:43:36.340 | What are you excited about for AlphaStar?
01:43:39.700 | - Yeah, so I think the obvious next steps
01:43:43.540 | would be to apply AlphaStar to other races.
01:43:47.980 | I mean, that sort of shows that the algorithm works
01:43:51.580 | because we wouldn't want to have created by mistake
01:43:55.580 | something in the architecture that happens to work
01:43:58.100 | for Protoss, but not for other races, right?
01:44:00.100 | So as verification, I think that's an obvious next step
01:44:03.500 | that we are working on.
01:44:05.740 | And then I would like to see,
01:44:09.300 | so agents and players can specialize on different skill sets
01:44:13.740 | that allow them to be very good.
01:44:15.980 | I think we've seen AlphaStar understanding very well
01:44:19.500 | when to take battles and when to not do that.
01:44:22.460 | Also very good at micromanagement
01:44:24.900 | and moving the units around and so on.
01:44:27.540 | And also very good at producing nonstop
01:44:29.740 | and trading off economy with building units.
01:44:33.420 | But I have perhaps not seen as much as I would like
01:44:37.300 | of the poker idea that you mentioned, right?
01:44:40.460 | I'm not sure StarCraft or AlphaStar rather
01:44:43.300 | has developed a very deep understanding
01:44:46.100 | of what the opponent is doing and reacting to that
01:44:50.100 | and sort of trying to trick the player into doing something else,
01:44:54.060 | you know, so this kind of reasoning
01:44:57.220 | I would like to see more of.
01:44:58.340 | So I think purely from a research standpoint,
01:45:01.620 | there's perhaps also quite a few things to be done there
01:45:04.620 | in the domain of StarCraft.
01:45:06.060 | - Yeah, in the domain of games,
01:45:08.140 | I've seen some interesting work in sort of,
01:45:10.980 | in even auctions, manipulating other players,
01:45:13.740 | sort of forming a belief state and just messing with people.
01:45:17.220 | - Yeah, it's called theory of mind, I guess.
01:45:18.820 | - Theory of mind, yeah.
01:45:20.140 | So it's fascinating.
01:45:21.420 | - Theory of mind on StarCraft is kind of,
01:45:23.860 | they're really made for each other.
01:45:26.100 | So that will be very exciting to see
01:45:28.660 | those techniques applied to StarCraft
01:45:30.500 | or perhaps StarCraft driving new techniques, right?
01:45:33.260 | As I said, this is always the tension between the two.
01:45:36.660 | - Wow, Oriol, thank you so much for talking today.
01:45:38.860 | - Awesome, it was great to be here, thanks.
01:45:40.980 | (upbeat music)