Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
Chapters
0:00
3:13 Describe Starcraft
13:49 Parameters of the Challenge
27:41 Observing the Game
38:06 Cloaked Units
45:16 Protoss Race
67:05 The Turing Test
79:54 Sequence To Sequence Learning
84:10 Difference between Starcraft and Go
85:38 Developing New Ideas
91:43 Meta Learning
95:47 The Existential Threat of Artificial Intelligence in the Near or Far Future
103:34 Next for DeepMind
00:00:00.000 |
The following is a conversation with Oriol Vinyals. 00:00:03.280 |
He's a senior research scientist at Google DeepMind, 00:00:05.920 |
and before that, he was at Google Brain and Berkeley. 00:00:09.120 |
His research has been cited over 39,000 times. 00:00:13.280 |
He's truly one of the most brilliant and impactful minds 00:00:18.200 |
He's behind some of the biggest papers and ideas in AI, 00:00:29.640 |
He's a lead researcher of the AlphaStar Project, 00:00:32.840 |
creating an agent that defeated a top professional 00:00:41.800 |
If you enjoy it, subscribe on YouTube, iTunes, 00:00:44.920 |
or simply connect with me on Twitter @lexfridman, 00:00:51.240 |
And now, here's my conversation with Oriol Vinyals. 00:00:55.440 |
You spearheaded the DeepMind team behind AlphaStar 00:00:59.600 |
that recently beat a top professional player at StarCraft. 00:01:18.840 |
What came first for you, a love for programming 00:01:35.240 |
I didn't really code much, but what I would do 00:01:38.560 |
is I would just mess with the computer, break it and fix it. 00:01:50.920 |
especially StarCraft, actually, the first version. 00:01:57.040 |
as professionally as you could play back in '98 in Europe, 00:02:17.960 |
- So, as a player, I tended to try not to play too many games, 00:02:30.000 |
I think in StarCraft, there's three main races, 00:02:33.360 |
and I found it very useful to play with all of them. 00:02:42.320 |
because it's not only how you play against someone, 00:02:45.400 |
but also whether you understand the race because you play it, 00:02:54.160 |
to try to gain advantages here and there, and so on. 00:02:59.080 |
although I must say, in terms of favorite race, 00:03:08.320 |
towards the end of my career, before starting university. 00:03:15.600 |
to people that may never have played video games, 00:03:18.880 |
especially the massively online variety like StarCraft? 00:03:22.280 |
- So, StarCraft is a real-time strategy game. 00:03:30.920 |
is that there's a board, which is called a map, 00:03:34.160 |
or, again, the map where people play against each other. 00:03:40.920 |
but the most interesting one is the one-versus-one setup, 00:03:54.400 |
And then, in this board, you have, again, pieces, 00:03:57.720 |
like in chess, but these pieces are not there initially, 00:04:02.240 |
You actually need to decide to gather resources, 00:04:07.800 |
So, in a way, you're starting almost with no pieces. 00:04:12.640 |
In StarCraft, there's minerals and gas that you can gather, 00:04:16.120 |
and then you must decide how much do you want to focus, 00:04:27.120 |
or maybe like attack, a good attack composition, 00:04:32.040 |
then you go and attack the other side of the map. 00:04:35.400 |
And now, the other main difference with chess 00:04:37.720 |
is that you don't see the other side of the map. 00:04:39.840 |
So, you're not seeing the moves of the enemy. 00:04:48.600 |
trading off economy versus building your own units, 00:04:52.200 |
but you also must decide whether you want to scout 00:05:05.880 |
There's also, unlike chess, this is not a turn-based game. 00:05:10.000 |
You play basically all the time, continuously, 00:05:13.560 |
and thus, some skill in terms of speed and accuracy 00:05:18.800 |
And people that train for this really play this game 00:05:31.280 |
where you don't see the other side of the board. 00:05:37.080 |
to basically get some money to build other buildings, 00:05:47.040 |
or maybe that and a game like turn-based strategy, 00:05:55.000 |
'cause you have to make these decisions really quickly. 00:05:58.640 |
And if you are not actually aware of what decisions work, 00:06:06.320 |
Everything you describe is actually quite stressful, 00:06:08.720 |
difficult to balance for an amateur human player. 00:06:11.560 |
I don't know if it gets easier at the professional level. 00:06:14.000 |
Like, if they're fully aware of what they have to do, 00:06:16.320 |
but at the amateur level, there's this anxiety, 00:06:28.360 |
is really stressful, and computation, I'm sure, difficult. 00:06:35.840 |
so StarCraft was released in '98, 20 years ago, 00:06:43.880 |
And Blizzard's Battle.net came out with Diablo in '96. 00:06:52.600 |
but it changed online gaming, and perhaps society forever. 00:07:12.760 |
- Right, so I think I kind of was an active gamer 00:07:16.920 |
whilst this was developing, the internet, online gaming. 00:07:28.400 |
and then I played Warcraft II, which is from Blizzard. 00:07:33.040 |
I didn't understand about what Blizzard was or anything. 00:07:37.320 |
which was actually very similar to StarCraft in many ways. 00:07:42.480 |
where there's orcs and humans, so there's only two races. 00:07:48.000 |
So I remember a friend of mine came to school, 00:07:51.600 |
say, "Oh, there's this new cool game called StarCraft." 00:07:54.800 |
"Oh, this sounds like just a copy of Warcraft II," 00:08:09.520 |
where you kind of start to play these missions, right? 00:08:12.920 |
You play against some sort of scripted things 00:08:15.720 |
to develop the story of the characters in the game. 00:08:18.960 |
And then later on, I started playing against the built-in AI, 00:08:23.480 |
and I thought it was impossible to defeat it. 00:08:27.440 |
and you can actually play against seven built-in AIs 00:08:29.720 |
at the same time, which also felt impossible, 00:08:49.920 |
because you could just connect machines with cables, right? 00:08:55.640 |
as a group of friends, and it was really, really, 00:08:59.280 |
like much more entertaining than playing against AIs. 00:09:02.320 |
And later on, as the internet was starting to develop 00:09:07.400 |
then it's when I started experiencing Battle.net, 00:09:11.560 |
not only because of the fact that you can play the game 00:09:20.200 |
You just get exposed to now like this vast variety of, 00:09:23.080 |
it's kind of a bit when the chats came about, right? 00:09:25.840 |
There was a chat system, you could play against people, 00:09:30.720 |
not only about StarCraft, but about anything. 00:09:32.480 |
And that became a way of life for kind of two years, 00:09:38.840 |
it exploded in me that I started to play more seriously, 00:09:44.640 |
- Do you have a sense on a societal sociological level, 00:09:53.760 |
And it's a huge part of society, which is gamers. 00:09:56.840 |
I mean, every time I come across that on YouTube 00:10:03.160 |
there's this huge number of people who play games religiously. 00:10:08.880 |
especially now that you've returned to that realm 00:10:17.560 |
which is maybe the main sort of online worlds 00:10:21.360 |
and presence that you get to interact with lots of people. 00:10:26.320 |
It was, to me, it was a bit less stressful than StarCraft 00:10:30.840 |
You just put in this world and you can always 00:10:34.920 |
But I think it was actually the social aspect of, 00:10:46.880 |
Because what you get to experience is just people 00:10:51.600 |
So even nowadays, I still have many Facebook friends 00:10:56.880 |
and their ways of thinking is even political. 00:11:06.720 |
And that way I actually get to understand a bit better 00:11:12.760 |
And these were just connections that were made by, 00:11:20.640 |
and I met this warrior and we became friends, 00:11:28.720 |
and more and more and more people are more aware of it. 00:11:33.440 |
but back in the day, as you were saying in 2000, 2005, 00:11:37.560 |
even it was very, still very strange thing to do, 00:11:44.200 |
I think there were exceptions like Korea, for instance, 00:11:47.120 |
it was amazing that everything happened so early 00:11:58.360 |
you could be a celebrity by playing StarCraft, 00:12:04.120 |
So yeah, it's quite interesting to look back. 00:12:13.080 |
and social networks and so on are also transforming things. 00:12:18.400 |
you're also one of the most productive people 00:12:20.960 |
in your particular chosen passion and path in life. 00:12:26.400 |
And yet you also appreciate and enjoy video games. 00:12:35.760 |
- Someone told me that you could choose two out of three. 00:12:46.200 |
And I think for the most part, it was relatively true. 00:12:52.320 |
Games like StarCraft, if you take the game pretty seriously 00:12:56.480 |
then you obviously will dedicate more time to it. 00:13:08.640 |
So to me, especially when I started university undergrad, 00:13:16.760 |
And then World of Warcraft was a bit more casual. 00:13:19.000 |
You could just connect online and I mean, it was fun. 00:13:22.840 |
But as I said, that was not as much time investment 00:13:37.200 |
and released a bunch of cool open source agents 00:13:43.720 |
the first time you beat a world class player. 00:13:53.440 |
And how did you and David and the rest of the DeepMind team 00:13:58.240 |
consider that you could even beat the best in the world 00:14:15.640 |
which was in California, and is still in California. 00:14:18.880 |
We had this summit where we got together, the two groups. 00:14:21.800 |
So Google Brain and Google DeepMind got together 00:14:35.040 |
like this thing which we call Berkeley Overmind, 00:14:37.360 |
which is really just a StarCraft one bot, right? 00:14:42.120 |
And I remember Demis just came to me and said, 00:14:44.200 |
"Well, maybe not now, it's perhaps a bit too early, 00:14:48.880 |
and do this again with deep reinforcement learning." 00:14:53.680 |
And at the time it sounded very science fiction 00:14:58.720 |
But then in 2016, when I actually moved to London 00:15:01.480 |
and joined DeepMind, transferring from Brain, 00:15:04.720 |
it became apparent that because of the AlphaGo moment 00:15:08.160 |
and kind of Blizzard reaching out to us to say, 00:15:25.320 |
how would it all work before you do anything. 00:15:33.600 |
So in Berkeley, we did a lot of rule-based conditioning 00:15:38.600 |
and if you have more than three units, then go attack. 00:15:42.480 |
And if the other has more units than me, I retreat, 00:15:46.320 |
And of course, the point of deep reinforcement learning, 00:15:50.440 |
is that all these should be learned behavior. 00:15:59.400 |
where we just didn't even have an environment to work with. 00:16:05.800 |
- So if you go back to that conversation with Demis, 00:16:08.520 |
or even in your own head, how far away did you, 00:16:23.320 |
but it seems like StarCraft is way harder than Go, 00:16:44.880 |
and obviously I do a lot of different research 00:16:52.240 |
there's gonna be something good happening out of this. 00:16:55.880 |
So really I thought, well, this sounds impossible, 00:16:59.120 |
and it probably is impossible to do the full thing, 00:17:06.240 |
and it's only a neural network playing and so on. 00:17:14.840 |
I could see some stepping stones like towards that goal. 00:17:19.080 |
Clearly you could define sub-problems in StarCraft 00:17:23.400 |
okay, here is a part of the game, here's another part. 00:17:31.280 |
the fact that we could access human replays, right? 00:17:35.720 |
and in fact, they open sourced this for the whole community 00:17:39.960 |
and it's not every single StarCraft game ever played, 00:17:43.040 |
but it's a lot of them, you can just go and download. 00:17:45.880 |
And every day they will, you can just query a dataset 00:17:48.720 |
and say, well, give me all the games that were played today. 00:17:51.680 |
And given my kind of experience with language and sequences 00:18:02.440 |
because never before had we had such a large dataset of replays 00:18:24.240 |
But I also thought the full thing, like really no rules, 00:18:28.360 |
no single line of code that tries to say, well, I mean, 00:18:31.680 |
if you see this, you need to build a detector, 00:18:33.320 |
all these, not having any of these specializations 00:18:36.680 |
seemed really, really, really difficult to me. 00:18:47.680 |
pulling you in into this really difficult challenge. 00:18:51.840 |
What's the interest from the perspective of Blizzard, 00:18:57.280 |
- Yeah, I think Blizzard has really understood 00:18:59.440 |
and really bring forward this competitiveness 00:19:21.840 |
that can play Atari or Go and then later on chess 00:19:30.560 |
So for them, they wanted to see first, obviously, 00:19:33.840 |
whether it was possible, if the game they created 00:19:42.120 |
they also are a pretty modern company that innovates a lot. 00:19:48.480 |
to how to bring AI into games is not AI for games, 00:19:56.080 |
And we obviously at DeepMind use games for AI, right? 00:20:00.000 |
To drive AI progress, but Blizzard might actually be able 00:20:03.400 |
to do and many other companies to start to understand 00:20:09.760 |
And they definitely, we have brainstormed a lot about this. 00:20:13.680 |
- But one of the interesting things to me about StarCraft 00:20:16.040 |
and Diablo and these games that Blizzard has created 00:20:19.360 |
is the task of balancing classes, for example, 00:20:23.520 |
sort of making the game fair from the starting point 00:20:33.560 |
there's three races, Zerg, Protoss and Terran. 00:20:36.760 |
I don't know if I've ever said that out loud. 00:20:44.120 |
- Yeah, I don't think I've ever in-person interacted 00:20:51.760 |
I wonder if the AI, the work that you're doing 00:20:56.240 |
with AlphaStar would help balance them even further. 00:21:00.520 |
Is that something that Blizzard is thinking about? 00:21:03.320 |
- Right, so balancing when you add a new unit 00:21:09.120 |
given that you can always train or pre-train at scale 00:21:13.160 |
some agent that might start using that in unintended ways. 00:21:16.680 |
But I think actually, if you understand how StarCraft 00:21:24.320 |
the ways that many of the things and strategies 00:21:28.680 |
So I think we've seen it over and over in StarCraft 00:21:43.560 |
And then after that becomes kind of mainstream 00:21:48.280 |
and then they kind of maybe weaken that strategy 00:21:55.400 |
So this kind of continual dialogue between players and Blizzard 00:22:04.040 |
but also in World of Warcraft, they would do that. 00:22:06.440 |
There are several classes and it would be not good 00:22:09.280 |
that everyone plays absolutely the same race and so on. 00:22:13.200 |
So I think they do care about balancing, of course, 00:22:24.480 |
And I mean, whether AI can be more creative at this point, 00:22:28.680 |
I mean, it's just sometimes something so amazing happens. 00:22:33.680 |
like you have these drop ships that could drop the reavers 00:22:33.680 |
No one thought that you could actually put them 00:22:55.400 |
But I don't know, I think it's quite an amazing 00:23:01.840 |
- Well, it's almost like a reinforcement learning 00:23:07.000 |
that play Blizzard games is almost on the scale 00:23:19.520 |
but hundreds of thousands of games probably a month. 00:23:22.880 |
- So you could, it's almost the same as running RL agents. 00:23:32.120 |
Is it the, like you said, the imperfect information? 00:23:35.360 |
Is it the fact they have to do long-term planning? 00:23:47.600 |
Or is it, you know, in the game theoretic sense, 00:23:52.400 |
at least you don't know what the optimal strategy is 00:23:58.160 |
as just like the hardest, the most annoying thing? 00:24:04.160 |
and start to define like the parameters of it, right? 00:24:14.000 |
the very first barrier that one would hit in StarCraft 00:24:17.120 |
would be because of the action space being so large 00:24:20.680 |
and us not being able to search like you could in Chess 00:24:32.400 |
So without any sort of human knowledge or human prior, 00:24:38.000 |
and you know how deep reinforcement learning algorithm work, 00:24:42.000 |
which is essentially by issuing random actions 00:24:45.360 |
and hoping that they will get some wins sometimes 00:24:49.200 |
So if you think of the action space in StarCraft, 00:24:52.800 |
almost anything you can do in the early game is bad 00:25:01.360 |
That's something that the game does automatically, 00:25:04.920 |
And you would immediately just take them out of mining 00:25:21.080 |
in other locations in the map to gather more resources, 00:25:24.160 |
but the location of the building is important. 00:25:31.760 |
build the building, wait for the building to be built, 00:25:37.800 |
That feels like impossible if you just randomly click 00:25:46.960 |
because eventually that may yield an extra win, right? 00:25:56.120 |
there's so many turns because the game essentially 00:26:02.080 |
I mean, that's how they can discretize sort of time. 00:26:05.520 |
Obviously, you always have to discretize time. 00:26:14.240 |
And that definitely felt a priori like the hardest. 00:26:34.240 |
and solving it has been basically kind of the focus 00:26:39.760 |
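A back-of-the-envelope illustration of why undirected exploration is hopeless here; the action count A and the macro length k below are invented for illustration, and only the roughly 22 frames per second and 10 minutes per game figures come from the conversation.

```python
# Rough, illustrative arithmetic (not AlphaStar's numbers): if a macro like
# "select a worker, move it, place a building, wait" takes k correct choices
# in a row, and each step offers A plausible actions, a uniformly random
# policy stumbles onto it with probability (1/A)**k per attempt.
A = 100      # hypothetical number of available actions at each decision point
k = 4        # hypothetical length of the build-a-building macro
p = (1.0 / A) ** k
print(f"P(random policy executes the macro) ~= {p:.0e}")   # ~1e-08

# Over a 10-minute game observed at ~22 frames per second (numbers from the
# conversation), that is on the order of 22 * 60 * 10 decision steps:
steps = 22 * 60 * 10
print(f"decision steps per game ~= {steps}")                # 13200
```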
- So exploration in a multi-hierarchical way. 00:27:05.400 |
How do you then do the long-term sequence modeling? 00:27:12.560 |
- So AlphaStar has obviously several components, 00:27:16.840 |
but everything passes through what we call the policy, 00:27:24.280 |
There is, I could just now give you a neural network 00:27:27.160 |
and some weights, and if you fed the right observations 00:27:30.440 |
and you understood the actions the same way we do, 00:27:32.520 |
you would have basically the agent playing the game. 00:27:43.360 |
and we've experimented with a few alternatives. 00:27:46.640 |
The one that we currently use mixes both spatial 00:27:50.280 |
sort of images that you would process from the game, 00:28:00.880 |
But also we give to the agent the list of units 00:28:14.760 |
And we have versions of the game that play well 00:28:16.840 |
without this set vision that is a bit not like 00:28:23.640 |
because it's a very natural way to encode the game 00:28:26.560 |
is by just looking at all the units that there are, 00:28:29.360 |
they have properties like health, position, type of unit, 00:28:47.400 |
- But that's pretty close to the way humans see the game. 00:28:51.480 |
you're saying the exactness of it is not similar to humans? 00:28:55.040 |
- The exactness of it is perhaps not the problem. 00:29:02.280 |
is that they play with a mouse and a keyboard and a screen, 00:29:05.720 |
and they don't see sort of a structured object 00:29:09.560 |
what they see is what they see on the screen, right? 00:29:13.040 |
- Remember that there's a, sorry to interrupt, 00:29:14.360 |
there's a plot that you showed with camera base 00:29:23.520 |
we're kind of experimenting with what's necessary or not, 00:29:28.720 |
So actually, if you look at research in computer vision, 00:29:32.360 |
where it makes a lot of sense to treat images 00:29:38.160 |
there's actually a very nice paper from Facebook, 00:29:54.320 |
and scramble the image as if it was just a list of pixels. 00:29:59.160 |
Crucially, they encode the position of the pixels 00:30:09.840 |
which is a very popular paper from last year, 00:30:12.000 |
which yielded very nice result in machine translation. 00:30:22.560 |
then you could argue that the list of units that we see 00:30:26.960 |
because we have each unit as a kind of pixel, if you will, 00:30:38.720 |
to work very well on Pascal and ImageNet and so on. 00:30:41.400 |
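A minimal sketch of the "units as a set" encoding described above, with a small transformer attending over one feature vector per unit. This is illustrative only, not the actual AlphaStar architecture; the feature list, sizes, and names are made up.

```python
# Encode the game state as a variable-length list of units, each with a few
# scalar features (health, x, y, ...), and let a transformer attend over it.
import torch
import torch.nn as nn

class UnitSetEncoder(nn.Module):
    def __init__(self, n_unit_types=100, feat_dim=4, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.type_emb = nn.Embedding(n_unit_types, d_model)
        self.feat_proj = nn.Linear(feat_dim, d_model)  # health, x, y, shields, ...
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, unit_types, unit_feats):
        # unit_types: (batch, n_units) ints; unit_feats: (batch, n_units, feat_dim)
        x = self.type_emb(unit_types) + self.feat_proj(unit_feats)
        return self.encoder(x)  # (batch, n_units, d_model), one vector per unit

# Toy usage: a "game state" with 20 units.
enc = UnitSetEncoder()
types = torch.randint(0, 100, (1, 20))
feats = torch.randn(1, 20, 4)
print(enc(types, feats).shape)  # torch.Size([1, 20, 64])
```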
- So the interesting thing here is putting it in that way, 00:30:55.520 |
it seems like there's echoes of a lot of the way 00:31:08.200 |
- Exactly, so now that we understand what an observation 00:31:11.200 |
for a given time step is, we need to move on to say, 00:31:14.680 |
well, there's gonna be a sequence of such observations, 00:31:17.760 |
and an agent will need to, given all that it's seen, 00:31:21.120 |
not only the current time step, but all that it's seen, 00:31:33.640 |
So given that, what you must then think about 00:31:37.840 |
is there is the problem of given all the observations, 00:31:49.360 |
And that sounds exactly like machine translation, 00:31:52.480 |
where, and that's exactly how kind of I saw the problem, 00:31:57.160 |
especially when you are given supervised data 00:32:06.680 |
of observations and actions onto what's gonna happen next, 00:32:10.160 |
which is exactly how you would train a model, 00:32:11.960 |
to translate or to generate language as well, right? 00:32:16.640 |
you must remember everything that comes in the past 00:32:19.000 |
because otherwise you might start having non-coherent text. 00:32:25.080 |
we're using LSTMs and transformers to operate on, 00:32:36.880 |
are exactly the same as what the agent is using 00:32:42.360 |
And the way we train it, moreover, for imitation, 00:32:47.120 |
take all the human experience and try to imitate it, 00:32:57.280 |
that sort of principle applies exactly the same. 00:33:00.200 |
It's almost the same code, except that instead of words, 00:33:04.520 |
you have slightly more complicated objects. 00:33:16.520 |
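A toy version of that "imitation is just sequence modeling" framing: an LSTM core reads the observation history and is trained with cross-entropy to predict the next human action, exactly as a language model predicts the next word. All dimensions and names here are invented; it is a sketch, not DeepMind's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImitationPolicy(nn.Module):
    def __init__(self, obs_dim=32, n_actions=50, hidden=128):
        super().__init__()
        self.core = nn.LSTM(obs_dim, hidden, batch_first=True)  # memory over the game
        self.action_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq):
        h, _ = self.core(obs_seq)          # (batch, time, hidden)
        return self.action_head(h)         # logits over actions at every step

policy = ImitationPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Pretend replay: 100 timesteps of observations and the human's actions.
obs = torch.randn(1, 100, 32)
human_actions = torch.randint(0, 50, (1, 100))

logits = policy(obs)
loss = F.cross_entropy(logits.reshape(-1, 50), human_actions.reshape(-1))
loss.backward()
opt.step()
```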
- Right, so indeed you can bootstrap from human replays, 00:33:21.520 |
but then the agents you get are actually not as good 00:33:30.480 |
Well, we take humans from 3000 MMR and higher. 00:33:38.000 |
and 3000 MMR might be like the 50th percentile, right? 00:33:45.480 |
MMR is a ranking scale, the matchmaking rating for players. 00:33:50.360 |
So it's 3000, I remember there's like a master 00:33:58.520 |
- It just sounds really good relative to chess, I think. 00:34:05.400 |
- So 3000, it's a bit like Elo indeed, right? 00:34:07.920 |
So 3,500 just allows us to not filter a lot of the data. 00:34:12.920 |
So we like to have a lot of data in deep learning 00:34:27.600 |
So we say, this replay you're gonna try to imitate 00:34:30.840 |
to predict the next action for all the actions 00:34:38.840 |
And what's cool about this is then we take this policy 00:34:44.320 |
and then we can ask it to play like a 3000 MMR player, 00:34:49.600 |
or play like a 6,000 MMR player. 00:34:53.680 |
And you actually see how the policy behaves differently. 00:34:57.280 |
It gets worse economy if you play like a gold level player, 00:35:02.960 |
which is the number of clicks or number of actions 00:35:07.760 |
And it's very interesting to see that it kind of imitates 00:35:12.360 |
But if we ask it to play like a 6,000 MMR player, 00:35:15.480 |
we tested of course these policies to see how well they do. 00:35:22.400 |
but they're nowhere near 6,000 MMR players, right? 00:35:24.960 |
They might be maybe around gold level, platinum perhaps. 00:35:29.240 |
So there's still a lot of work to be done for the policy 00:35:34.960 |
So far, we only asked them, okay, here is the screen, 00:35:38.200 |
and that's what's happened on the game until this point. 00:35:41.600 |
What would the next action be if we ask a pro to now say, 00:35:46.080 |
oh, you're gonna click here or here or there. 00:35:49.080 |
And the point is experiencing wins and losses 00:36:00.440 |
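One simple way the skill conditioning just described could be wired in, as a hedged sketch rather than the actual implementation: append the (normalized) MMR of the replay's player to every observation during imitation, then dial that input up at test time to ask for stronger play.

```python
import torch

def condition_on_mmr(obs_seq, mmr, mmr_scale=7000.0):
    # obs_seq: (batch, time, obs_dim); mmr: rating of the player in this replay
    batch, time, _ = obs_seq.shape
    mmr_feat = torch.full((batch, time, 1), mmr / mmr_scale)
    return torch.cat([obs_seq, mmr_feat], dim=-1)

obs = torch.randn(1, 100, 32)
train_input = condition_on_mmr(obs, mmr=3500.0)   # imitate a 3500 MMR replay
test_input = condition_on_mmr(obs, mmr=6000.0)    # ask for 6000 MMR behaviour
print(train_input.shape)  # torch.Size([1, 100, 33])
```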
- That's so interesting that you can at least hope 00:36:19.280 |
- Well, I haven't played StarCraft II, so I am unranked, 00:36:26.280 |
- So I used to play StarCraft I, the first one. 00:36:29.640 |
- But you haven't seriously played StarCraft II? 00:36:32.720 |
So the best player we have at DeepMind is about 5,000 MMR, 00:36:42.120 |
Grand Master level would be the top 200 players 00:36:44.720 |
in a certain region like Europe or America or Asia. 00:36:53.760 |
I actually played AlphaStar a bit too late and it beat me. 00:36:56.680 |
I remember the whole team was, "Oh, Oriol, you should play." 00:36:59.760 |
And I was, "Oh, it looks like it's not so good yet." 00:37:23.160 |
The problem is I just don't understand StarCraft II. 00:37:36.520 |
we didn't have this kind of MMR system as well established. 00:37:40.360 |
So it would be hard to know what it was back then. 00:37:56.040 |
there's a few things that are just very hard to simulate. 00:37:59.760 |
The main one, perhaps, which is obvious in hindsight, 00:38:05.240 |
is what's called cloaked units, which are invisible units. 00:38:13.240 |
that you need to have a particular kind of unit to detect it. 00:38:20.560 |
If you cannot detect them, you cannot target them. 00:38:27.720 |
But despite the fact you cannot target the unit, 00:38:31.640 |
there's a shimmer that, as a human, you observe. 00:38:49.120 |
- That's really, the Blizzard term is shimmer. 00:39:02.680 |
and it's kind of a bit annoying to deal with. 00:39:11.080 |
oh, are you looking at this pixel in the screen and so on? 00:39:29.280 |
that it just doesn't feel there's a very proper way. 00:39:32.640 |
I mean, you could imagine, oh, you don't have high-precision, 00:39:36.920 |
or sometimes you see it, sometimes you don't. 00:39:47.240 |
- You know, it seems like a perception problem. 00:39:56.720 |
I would say they wouldn't be able to tell a shimmer 00:40:02.200 |
Whereas AlphaStar, in principle, sees it very sharply. 00:40:05.600 |
It sees that the bit turned from zero to one, 00:40:11.920 |
or you know that you cannot attack it and so on. 00:40:22.920 |
Then there are things humans cannot do perfectly, 00:40:25.120 |
even professionals, which is they might miss a detail, 00:40:32.200 |
if there's a corner of the screen that turns green 00:40:37.640 |
that can go into the memory of the agent, the LSTM, 00:40:47.640 |
it seems like the rate of action from AlphaStar 00:40:50.640 |
is comparable to, if not slower than, professional players, 00:40:59.680 |
that is causing us more issues for a couple of reasons. 00:41:06.720 |
StarCraft has been an AI environment for quite a few years. 00:41:09.960 |
In fact, I was participating in the very first competition 00:41:18.720 |
a very clear set of rules, how the actions per minute, 00:41:24.680 |
And as a result, these agents or bots that people build 00:41:31.040 |
they do like 20,000, 40,000 actions per minute. 00:41:45.440 |
that's why the range is a bit tricky to identify exactly. 00:41:56.960 |
because they warm up and they kind of select things 00:41:59.440 |
and spam and so on, just so that when they need, 00:42:04.200 |
So we came into this by not having kind of a standard way 00:42:09.200 |
to say, well, how do we measure whether an agent 00:42:26.880 |
and imprecisions of actions in the supervised policy. 00:42:31.000 |
you could see how agents like to spam click, to move here. 00:42:34.680 |
If you played, especially Diablo, you would know what I mean. 00:42:40.320 |
You're doing literally like maybe five actions 00:42:43.240 |
in two seconds, but these actions are not very meaningful. 00:42:48.720 |
So on the one hand, we start from this imitation policy 00:42:52.080 |
that is in the ballpark of the actions per minute of humans 00:43:10.960 |
and that's the part we haven't talked too much yet, 00:43:13.240 |
but of course the agent must play against itself to improve. 00:43:19.640 |
that these actions will not become more precise 00:43:22.400 |
or even the rate of actions is going to increase over time. 00:43:26.040 |
So what we did, and this is probably kind of the first attempt 00:43:31.160 |
is we looked at the distribution of actions for humans 00:43:37.720 |
because I guess I mentioned that some of these agents 00:43:47.360 |
So what we looked is we look at the distribution 00:43:49.400 |
over professional gamers, and we took reasonably 00:43:57.480 |
after which, even if the agent wanted to act, 00:44:02.120 |
But the problem is this cutoff is probably set 00:44:08.640 |
even though the games, and when we ask the professionals 00:44:22.880 |
which is actions per minute, combined with the precision, 00:44:32.520 |
Should we just let it loose and see what cool things 00:44:38.400 |
- So this is, in itself, an extremely interesting question, 00:44:44.000 |
would be so difficult, modeling absolutely all the details 00:44:47.720 |
about muscles and precision and tiredness of humans 00:45:11.080 |
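A toy illustration of the kind of constraint being discussed: estimate a cutoff from a (made-up) sample of professional APM values and enforce it with a sliding-window rate limiter. This is the shape of the idea, not the exact mechanism used for AlphaStar.

```python
from collections import deque

pro_apm = sorted([250, 280, 310, 330, 360, 400, 420, 450])   # made-up numbers
apm_cap = pro_apm[int(0.95 * (len(pro_apm) - 1))]             # ~95th percentile

class RateLimiter:
    def __init__(self, apm_cap, window_seconds=60.0):
        self.max_actions = int(apm_cap * window_seconds / 60.0)
        self.window = window_seconds
        self.times = deque()

    def allow(self, t):
        # Drop actions that have fallen out of the sliding window.
        while self.times and t - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(t)
            return True
        return False        # the agent wanted to act, but the budget is spent

limiter = RateLimiter(apm_cap)
print(apm_cap, limiter.allow(0.0))
```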
So one of the constraints you put on yourself, 00:45:15.440 |
or at least focused in, is on the Protoss race, 00:45:19.920 |
Can you tell me about the different races and how they, 00:45:22.920 |
so Protoss, Terran, and Zerg, how do they compare? 00:45:35.720 |
- So Protoss, so in StarCraft, there are three races. 00:45:39.720 |
Indeed, in the demonstration, we saw only the Protoss race. 00:45:45.600 |
Protoss is kind of the most technologically advanced race. 00:45:49.480 |
It has units that are expensive, but powerful, right? 00:45:53.840 |
So in general, you wanna kind of conserve your units 00:45:59.560 |
and then you wanna utilize these tactical advantages 00:46:10.320 |
people say, like, they're a bit easier to play, perhaps. 00:46:17.160 |
I mean, I just talked to, now, a lot to the players 00:46:20.160 |
that we work with, TLO and Mana, and they said, 00:46:26.360 |
So perhaps the easier, that doesn't mean that it's, 00:46:46.840 |
I think it's pretty equal in terms of distribution, 00:46:52.840 |
They don't want, they wouldn't want one race like Protoss 00:46:59.920 |
- So definitely, like, they tried it to be like balanced. 00:47:03.880 |
So then maybe the opposite race of Protoss is Zerg. 00:47:17.640 |
So if you have an army, it's not that valuable; 00:47:20.480 |
losing the whole army is not as big a deal as Zerg 00:47:30.840 |
Zergs typically play by applying a lot of pressure, 00:47:40.400 |
I mean, there's never, I mean, they're pretty diverse. 00:47:48.800 |
and there's some units in Protoss that are less valuable 00:47:51.280 |
and you could lose a lot of them and rebuild them 00:48:02.440 |
So first there's collection of a lot of resources. 00:48:06.520 |
The other one is expanding, so building other bases. 00:48:14.840 |
building units and attacking with those units. 00:48:20.600 |
Maybe there's the different timing of attacks, 00:48:25.960 |
What are the different strategies that emerged 00:48:29.040 |
I've read that a bunch of people are super happy 00:48:34.960 |
that it's really good to, what is it, saturate? 00:48:42.120 |
- And that's for greedy amateur players like myself. 00:48:48.960 |
and it just feels good to just accumulate and accumulate. 00:48:56.640 |
But is there other strategies that you discovered 00:49:08.040 |
and real-time strategy games in general are very similar. 00:49:11.040 |
I would classify perhaps the openings of the game. 00:49:18.760 |
And generally I would say there's two kinds of openings. 00:49:23.400 |
That's generally how players find sort of a balance 00:49:28.400 |
between risk and economy and building some units early on 00:50:01.640 |
So standard openings themselves have some choices 00:50:07.480 |
Of course, if you scout and you're good at guessing 00:50:16.480 |
So you can imagine that normal standard games 00:50:19.120 |
in StarCraft looks like a continuous rock-paper-scissors game 00:50:24.040 |
where you guess what the distribution of rock, 00:50:33.360 |
or put the paper out before he kind of changes his mind 00:50:52.200 |
trying to better and better estimate the distribution 00:51:03.000 |
And when your belief state becomes inaccurate, 00:51:08.040 |
whether he's gonna play something that you must know, 00:51:16.440 |
or improving the belief state part of the loss 00:51:22.720 |
- It's implicit, but you could explicitly model it 00:51:25.840 |
and it would be quite good at probably predicting 00:51:32.880 |
There's no additional reward for predicting the enemy. 00:51:42.840 |
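The "continuous rock, paper, scissors" framing can be made concrete with a tiny fictitious-play loop: each side keeps an empirical belief over what the opponent plays and best-responds to it, and the empirical frequencies drift toward the mixed equilibrium. Purely illustrative; nothing AlphaStar literally runs.

```python
PAYOFF = {('R', 'R'): 0, ('R', 'P'): -1, ('R', 'S'): 1,
          ('P', 'R'): 1, ('P', 'P'): 0, ('P', 'S'): -1,
          ('S', 'R'): -1, ('S', 'P'): 1, ('S', 'S'): 0}
MOVES = ['R', 'P', 'S']

def best_response(opponent_counts):
    # Play the move with the highest expected payoff against the belief.
    total = sum(opponent_counts.values()) or 1
    belief = {m: c / total for m, c in opponent_counts.items()}
    return max(MOVES, key=lambda me: sum(belief[op] * PAYOFF[(me, op)] for op in MOVES))

counts_a = {m: 1 for m in MOVES}   # player A's belief about B (starts uniform)
counts_b = {m: 1 for m in MOVES}   # player B's belief about A
for _ in range(10000):
    move_a, move_b = best_response(counts_a), best_response(counts_b)
    counts_a[move_b] += 1           # A observed B's move
    counts_b[move_a] += 1           # B observed A's move

print({m: round(c / sum(counts_a.values()), 2) for m, c in counts_a.items()})
# Empirical play drifts toward the mixed equilibrium, roughly 1/3 each.
```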
And AlphaStar sometimes really likes this kind of cheese. 00:51:46.760 |
These cheeses, what they are is kind of an all-in strategy. 00:51:58.200 |
or you're gonna go for hiding your technological buildings 00:52:16.320 |
Because if I scout your base and I see no buildings at all, 00:52:25.600 |
Should I build suddenly a lot of units to defend? 00:52:35.640 |
and defending against cheeses is extremely important. 00:52:40.720 |
many agents actually develop some cheesy strategies. 00:52:45.040 |
And in the games we saw against TLO and Mana, 00:52:49.200 |
were actually doing these kinds of strategies, 00:52:53.600 |
And then there's a variant of cheesy strategy, 00:52:57.320 |
So an all-in strategy is not perhaps as drastic as, 00:53:03.800 |
and try to just disrupt your base and game over, 00:53:11.920 |
that you can align precisely at a certain time mark. 00:53:20.200 |
like five of this type, five of this other type, 00:53:22.880 |
and align the upgrade so that at four minutes and a half, 00:53:30.560 |
And at that point, that army is really scary. 00:53:33.880 |
And unless the enemy really knows what's going on, 00:53:36.360 |
if you push, you might then have an advantage 00:53:40.160 |
because maybe the enemy is doing something more standard, 00:53:42.360 |
it expanded too much, it developed too much economy, 00:53:45.680 |
and it traded off badly against having defenses, 00:53:45.680 |
But it's called all-in because if you don't win, 00:53:54.960 |
So you see players that do these kinds of strategies. 00:54:08.760 |
So if we start entering the game theoretic aspects of the game, 00:54:15.800 |
it also makes it quite entertaining to watch. 00:54:17.880 |
Even if I don't play, I still enjoy watching the game. 00:54:21.720 |
But the agents are trying to do this mostly implicitly, 00:54:26.800 |
but one element that we improved in self-play 00:54:31.280 |
And the AlphaStar League is not pure self-play. 00:54:34.560 |
It's trying to create different personalities of agents 00:54:37.880 |
so that some of them will become cheesy agents. 00:54:41.480 |
Some of them might become very economical, very greedy, 00:54:46.160 |
but then maybe early on, they're gonna be weak, 00:54:55.360 |
that you can see kind of an evolution of agents 00:55:01.920 |
and then they generate kind of the perfect counter 00:55:05.720 |
But these agents, you must have them in the populations 00:55:11.240 |
you're not covered against these things, right? 00:55:13.000 |
It's kind of, you wanna create all sorts of the opponents 00:55:21.800 |
early aggression, later aggression, more expansions, 00:55:32.720 |
at finding some subset of these, but not all of these. 00:55:39.400 |
do an ensemble of agents that are all playing in a league 00:55:39.400 |
who does a new cool strategy and you immediately, 00:55:50.200 |
oh my God, I wanna try it, I wanna play again. 00:55:53.000 |
And this to me was another critical part of the problem, 00:55:57.520 |
which was, can we create a Battle.net for agents? 00:56:01.200 |
And that's kind of what the AlphaStar League really- 00:56:04.360 |
And where they stick to their different strategies. 00:56:06.880 |
Yeah, wow, that's really, really interesting. 00:56:25.360 |
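A schematic, invented toy of that league idea: a small population of agents with fixed "personalities" in a cyclic rush/economy/tech matchup, where each agent is preferentially matched against the opponents it currently loses to. It captures the matchmaking flavour only, not the actual AlphaStar League training.

```python
import random

STYLES = ['rush', 'economy', 'tech']
# Hypothetical cyclic matchup: rush beats economy, economy beats tech, tech beats rush.
BEATS = {'rush': 'economy', 'economy': 'tech', 'tech': 'rush'}

agents = [{'name': f'agent_{i}', 'style': random.choice(STYLES)} for i in range(6)]
wins = {(a['name'], b['name']): 1 for a in agents for b in agents}   # smoothed counts
games = {(a['name'], b['name']): 2 for a in agents for b in agents}

def pick_opponent(me):
    # Sample opponents in proportion to how often they beat "me".
    others = [a for a in agents if a['name'] != me['name']]
    weights = [1.0 - wins[(me['name'], o['name'])] / games[(me['name'], o['name'])]
               for o in others]
    return random.choices(others, weights=weights, k=1)[0]

def play(a, b):
    if BEATS[a['style']] == b['style']:
        return a
    if BEATS[b['style']] == a['style']:
        return b
    return random.choice([a, b])   # mirror matchup: coin flip

for _ in range(1000):
    a = random.choice(agents)
    b = pick_opponent(a)
    winner = play(a, b)
    games[(a['name'], b['name'])] += 1
    games[(b['name'], a['name'])] += 1
    loser = a['name'] if winner is b else b['name']
    wins[(winner['name'], loser)] += 1

for a in agents:
    total_w = sum(wins[(a['name'], o['name'])] for o in agents if o is not a)
    print(a['name'], a['style'], 'wins:', total_w)
```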
but how hard is it in general to win all matchups 00:56:33.560 |
because once you see AlphaStar and superficially 00:56:40.440 |
Let's see, if you sum all the games, it's like 10 to one, right? 00:56:42.880 |
It lost the game that it played with the camera interface. 00:57:04.040 |
but a moment like this had not occurred before. 00:57:19.480 |
They definitely understand the game enough 00:57:21.400 |
to play extremely well, but are they unbeatable? 00:57:33.240 |
it's always possible that you might take a huge risk 00:57:48.040 |
We would like to, I mean, if I learn to play Protoss, 00:57:54.760 |
So there are obvious interesting research challenges 00:57:57.720 |
as well, but even as the raw performance goes, 00:58:05.120 |
we are at pro level or at high Grandmaster level, 00:58:09.320 |
but obviously the players also did not know what to expect. 00:58:14.320 |
Right, this kind of their prior distribution was a bit off 00:58:16.960 |
because they played this kind of new, like alien brain 00:58:25.080 |
But also I think if you look at the games closely, 00:58:28.040 |
you see there were weaknesses in some points, 00:58:33.280 |
or if it had got invisible units going against 00:58:42.880 |
but it's really a very exciting moment for us to be seeing, 00:58:46.440 |
wow, a single neural net on a GPU is actually playing 00:58:55.800 |
- Yeah, I'm sure there must be a guy in Poland somewhere 00:59:03.400 |
that this never happens again with AlphaStar. 00:59:06.600 |
So that's really exciting in terms of AlphaStar 00:59:09.720 |
having some holes to exploit, which is great. 00:59:32.880 |
wait, let me actually just pause for a second. 00:59:44.640 |
Olympic athletes have their gold medal, right? 00:59:50.440 |
you've published a lot of prestigious papers, 01:00:05.120 |
I mean, so looking back to those last days of 2018, 01:00:15.040 |
I'll say, oh my God, I wanna be like in a project like that. 01:00:18.040 |
It's like, I already feel the nostalgia of like, 01:00:26.340 |
And so in that sense, as soon as it happened, 01:00:32.980 |
So it's almost like sad that it happened and oh my God, 01:00:36.320 |
but on the other hand, it also verifies the approach. 01:00:46.080 |
that even though we can train a neural network 01:00:57.420 |
but I already was also thinking about next steps. 01:00:59.920 |
I mean, as I said, these agents play Protoss versus Protoss, 01:01:04.040 |
but they should be able to play a different race 01:01:10.620 |
Some people call this meta reinforcement learning, 01:01:15.160 |
So there's so many possibilities after that moment, 01:01:21.500 |
We had this bet, so I'm kind of a pessimist in general. 01:01:27.680 |
So I kind of sent an email to the team, I said, 01:01:35.080 |
And I really thought we would lose like 5-0, right? 01:01:38.800 |
We had some calibration made against the 5,000 MMR player. 01:01:47.280 |
even if he played Protoss, which is his off race. 01:01:53.040 |
So for me, that was just kind of a test run or something. 01:01:58.940 |
And unbelievably, we went to this bar to celebrate 01:02:03.940 |
and Dave tells me, "Well, why don't we invite someone 01:02:16.120 |
And we had some drinks and I said, "Sure, why not?" 01:02:19.320 |
But then I thought, "Well, that's really gonna be 01:02:24.480 |
a thousand MMR is really like 99% probability 01:02:28.320 |
that Mana would beat TLO as Protoss versus Protoss, right? 01:02:34.160 |
And to me, the second game was much more important, 01:02:38.920 |
even though a lot of uncertainty kind of disappeared 01:02:46.640 |
"Oh, but that's really a very nice achievement." 01:03:09.160 |
And I mean, it was really like this moment for me 01:03:15.320 |
and yeah, it's a really great accomplishment. 01:03:18.200 |
And it was great also to be there with the team in the room. 01:03:25.920 |
the other interesting thing is just like watching Kasparov, 01:03:36.080 |
I mean, whenever you lose, I've done a lot of sports. 01:03:38.320 |
You sometimes make excuses, you look for reasons. 01:03:50.000 |
you could say, well, it felt awkward, it wasn't, 01:03:55.160 |
And it was beautiful to look at a human being 01:04:00.240 |
I mean, it's a beautiful moment for researchers. 01:04:05.240 |
It was, I mean, probably the highlight of my career so far 01:04:29.840 |
- Also on the other side, it's a popularization of AI too, 01:04:34.040 |
because just like traveling to the moon and so on. 01:04:38.200 |
I mean, this is where a very large community of people 01:04:55.880 |
that we must sort of try to explain what it is. 01:05:03.640 |
So maybe everyone has experienced an AI playing a video game, 01:05:10.240 |
and some people might even call that AI already, right? 01:05:38.440 |
The main thing I'm thinking about actually is what's next 01:05:47.120 |
and then there's like the sort of three-dimensional worlds 01:05:50.280 |
that we've seen also like pretty good performance 01:05:54.120 |
that also some people at DeepMind and elsewhere 01:05:57.600 |
We've also seen some amazing results on like, 01:06:03.280 |
So for me, like the main thing I'm thinking about 01:06:07.960 |
So as a researcher, I see sort of two tensions 01:06:20.480 |
thanks to the application of StarCraft is very hard, 01:06:23.320 |
we develop some techniques, some new research 01:06:27.480 |
Like are there other applications where we can apply these? 01:06:32.880 |
you can think of feeding back to sort of the community 01:06:37.440 |
we took from, which was mostly sequence modeling 01:06:41.680 |
So we've developed and extended things from the transformer 01:06:48.120 |
We combine LSTM and transformers in interesting ways. 01:06:51.280 |
So that's perhaps the kind of lowest hanging fruit 01:06:57.600 |
of machine learning that's not playing video games. 01:07:00.880 |
- Let me go old school and jump to Mr. Alan Turing. 01:07:08.440 |
it's a natural language test, a conversational test. 01:07:11.560 |
What's your thought of it as a test for intelligence? 01:07:25.640 |
because I also like sequences and language understanding. 01:07:37.280 |
which obviously would never pass the Turing test 01:07:51.760 |
in terms of asking or conversing with it, right? 01:08:12.480 |
I mean, I think they have these competitions every year. 01:08:34.960 |
of a genuine, rich, fulfilling human conversation 01:08:41.600 |
Like the Alexa Prize, which I'm not as well familiar with, 01:08:52.200 |
So basically forcing the agent not to just fool 01:08:55.480 |
but to have an engaging conversation kind of thing. 01:09:06.380 |
And if you have in general, how far away are we from, 01:09:21.720 |
that kind of conversation, have you thought about it? 01:09:23.620 |
- Yeah, so I think you touched here on the critical point, 01:09:32.860 |
which describes sort of grand challenges of physics. 01:09:37.360 |
And he argues that, well, okay, for instance, 01:09:46.580 |
We really don't know or cannot kind of make any progress. 01:09:57.860 |
So I see the Turing test as, in the full Turing test, 01:10:27.300 |
and I probably would not recommend people doing a PhD 01:10:35.460 |
- Yeah, but that said, you said the exact same thing 01:10:45.020 |
the person who passes the Turing test in three years. 01:10:57.780 |
I really wouldn't have, not even six months ago, 01:11:00.780 |
I would not have predicted the level that we see 01:11:03.220 |
that these agents can deliver at Grandmaster level. 01:11:10.060 |
and basically my concern is not that something could happen, 01:11:13.580 |
a breakthrough could happen that would bring us 01:11:18.380 |
is that I just think the statistical approach to it, 01:11:24.100 |
So we need a breakthrough, which is great for the community. 01:11:28.260 |
But given that, I think there's quite more uncertainty. 01:11:31.740 |
Whereas for StarCraft, I knew what the steps would be 01:11:38.060 |
I think it was clear that using the imitation learning part 01:11:44.300 |
were gonna be key, and it turned out that this was the case 01:11:48.220 |
and a little more was needed, but not much more. 01:11:51.540 |
For Turing test, I just don't know what the plan 01:12:01.420 |
but there are quite a few sub challenges that are related 01:12:09.020 |
like Google already has like the Google Assistant, 01:12:15.380 |
That I start to believe maybe we're reaching a point 01:12:22.380 |
'cause it echoes very much the StarCraft conversation. 01:12:26.820 |
Let's break it down into small pieces and solve those, 01:12:31.300 |
Great, but that said, you're behind some of the 01:12:35.180 |
sort of biggest pieces of work in deep learning 01:12:42.260 |
What do you think of the current limits of deep learning 01:12:50.100 |
to define the main challenge in deep learning, 01:12:53.140 |
it's a challenge that probably has been the challenge 01:12:55.660 |
for many years and is that of generalization. 01:12:59.660 |
So what that means is that all that we're doing 01:13:06.700 |
And when the data we see is not from the same distribution 01:13:11.700 |
or even if there are times when it is very close 01:13:15.060 |
to the distribution, but because of the way we train it 01:13:18.140 |
with limited samples, we then get to this stage 01:13:27.700 |
And I think adversarial examples are a clear example 01:13:30.780 |
of this, but if you study the machine learning literature 01:13:39.660 |
and they had some guarantees about generalization, 01:13:45.500 |
or even within distribution where you take an image, 01:13:51.220 |
So I think really, I don't see a lot of progress 01:13:56.220 |
on generalization in the strong generalization sense 01:14:01.820 |
I think our neural networks, you can always find 01:14:06.820 |
designed examples that will make their outputs arbitrary, 01:14:10.980 |
which is not good because we humans would never be fooled 01:14:15.980 |
by these kind of images or manipulation of the image. 01:14:19.900 |
And if you look at the mathematics, you kind of understand 01:14:22.660 |
this is a bunch of matrices multiplied together. 01:14:30.820 |
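For concreteness, the classic fast gradient sign method (Goodfellow et al.) shows how a few lines of gradient arithmetic produce such designed inputs. The toy model and data here are random stand-ins; against a real trained image classifier the same perturbation routinely flips the prediction while looking unchanged to us.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.rand(1, 784)                    # stand-in for a flattened image
y = torch.tensor([3])                     # its (pretend) correct label

# Take the gradient of the loss with respect to the input pixels.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()

epsilon = 0.05                            # tiny per-pixel change
x_perturbed = x + epsilon * x_adv.grad.sign()
print("prediction before:", model(x).argmax(-1).item(),
      "after:", model(x_perturbed).argmax(-1).item())
```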
So I think that's really the underlying topic. 01:14:34.500 |
Many times we see when even at the grand stage 01:14:38.700 |
of like Turing test generalization, I mean, if you start, 01:14:43.100 |
I mean, passing the Turing test, should it be in English 01:14:52.260 |
in a different language, you actually will go 01:14:54.060 |
and do some research and try to translate it and so on. 01:15:01.020 |
And it's really a difficult problem and very fascinating 01:15:06.260 |
But do you think it's, if you were to try to solve it, 01:15:10.460 |
can you not grow the size of data intelligently 01:15:14.220 |
in such a way that the distribution of your training set 01:15:17.380 |
does include the entirety of the testing set? 01:15:24.940 |
- So a path that has worked well, and it worked well 01:15:27.860 |
in StarCraft and in machine translation and in languages, 01:15:32.780 |
And that's kind of been maybe the only single formula 01:15:37.340 |
that still delivers today in deep learning, right? 01:15:44.020 |
really do more and more of the things that we thought, 01:15:47.060 |
oh, there's no way it can generalize to these 01:15:51.300 |
But I don't think fundamentally it will be solved with this. 01:15:54.820 |
And for instance, I'm really liking some style or approach 01:16:02.100 |
but it would have programs or some discrete decision-making 01:16:06.380 |
because there is where I feel there's a bit more, 01:16:09.700 |
like, I mean, the example of, the best example, 01:16:14.620 |
I also worked a bit on, oh, like we can learn an algorithm 01:16:18.780 |
So you give it many examples and it's gonna sort your, 01:16:21.340 |
sort the input numbers or something like that. 01:16:27.740 |
you give me some numbers or you ask me to create an algorithm 01:16:33.700 |
which will be fragile because it's gonna go out of range 01:16:55.860 |
I mean, scale is a bit more boring, but it really works. 01:16:59.460 |
And then maybe programs and discrete abstractions 01:17:02.860 |
are a bit less developed, but clearly I think 01:17:06.380 |
they're quite exciting in terms of future for the field. 01:17:09.900 |
- Do you draw any insight wisdom from the 80s 01:17:13.460 |
and expert systems and symbolic systems, symbolic computing? 01:17:16.900 |
Do you ever go back to those sort of reasoning, 01:17:30.180 |
To me, the problem really is, what are you trying to solve? 01:17:34.260 |
If what you're trying to solve is so important 01:17:39.140 |
then absolutely use rules, use domain knowledge, 01:17:44.140 |
and then use a bit of the magic of machine learning 01:17:46.860 |
to empower or to make the system as the best system 01:17:50.060 |
that will detect cancer or detect weather patterns, right? 01:17:55.060 |
Or in terms of StarCraft, it also was a very big challenge. 01:17:59.060 |
So I was definitely happy that if we had to cut a corner here 01:18:04.180 |
and there, it could have been interesting to do. 01:18:09.500 |
about expert systems because it's a very, you can define, 01:18:24.420 |
but that has also neural networks incorporated 01:18:28.980 |
So absolutely, I mean, we should definitely go back 01:18:31.740 |
to those ideas and anything that makes the problem simpler. 01:18:35.300 |
As long as your problem is important, that's okay. 01:18:37.900 |
And that's research driving a very important problem. 01:18:40.940 |
And on the other hand, if you wanna really focus 01:18:46.500 |
then of course you must try not to look at imitation data 01:18:54.060 |
that would help a lot or even feature engineering, right? 01:18:56.900 |
So this is a tension that depending on what you do, 01:19:05.900 |
if you're, as long as what you're doing is important 01:19:16.780 |
But one is translating from image captioning, 01:19:28.460 |
that resonates throughout your work, actually. 01:19:33.140 |
So the underlying nature of reality being language, 01:19:38.740 |
- So what's the connection between images and text, 01:20:00.060 |
you can now really input anything to a neural network 01:20:06.060 |
and then the neural network will learn a function F 01:20:09.500 |
that will take X as an input and produce any output Y. 01:20:16.140 |
or like a fixed vectors or anything like that. 01:20:26.500 |
So that paradigm was tested in a very interesting way 01:20:31.500 |
when we moved from translating French to English 01:20:44.260 |
in this thing that was doing machine translation, 01:20:50.420 |
like it was producing captions that seemed like, 01:21:19.540 |
whatever the flavor of the month of the model is. 01:21:22.380 |
And then as long as we have enough supervised data, 01:21:25.700 |
really this formula will work and will keep working, 01:21:31.780 |
model of these generalization issues that I mentioned before. 01:21:36.700 |
sort of form a representation that's meaningful, I think. 01:21:43.500 |
once you are able to form that representation, 01:21:46.460 |
you could basically take anything, any sequence. 01:21:55.340 |
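The bare-bones version of that formula, sketched from the standard sequence-to-sequence recipe rather than any particular paper's code: encode X into a state, decode Y token by token. Swapping the encoder for a convnet over pixels gives image captioning with the same decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, in_vocab=1000, out_vocab=1000, dim=256):
        super().__init__()
        self.in_emb = nn.Embedding(in_vocab, dim)
        self.out_emb = nn.Embedding(out_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, out_vocab)

    def forward(self, src_tokens, tgt_tokens):
        _, state = self.encoder(self.in_emb(src_tokens))   # the "thought vector" state
        dec_out, _ = self.decoder(self.out_emb(tgt_tokens), state)
        return self.out(dec_out)                            # next-token logits

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 12))   # e.g. a French sentence (as token ids)
tgt = torch.randint(0, 1000, (1, 9))    # e.g. its English translation so far
print(model(src, tgt).shape)            # torch.Size([1, 9, 1000])
```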
So we didn't really touch on the long-term aspect. 01:22:12.420 |
So we would have to multiply 22 times 60 seconds per minute 01:22:17.420 |
times maybe at least 10 minutes per game on average. 01:22:25.660 |
but the trick really was to only observe, in fact, 01:22:39.980 |
is what is the gap gonna be until the next action? 01:22:48.060 |
that we have in the dataset that Blizzard provided, 01:22:51.940 |
it turns out that most games are actually only, 01:23:14.460 |
If you had to do it with reinforcement learning, 01:23:21.580 |
But thankfully, because of imitation learning, 01:23:24.540 |
we didn't kind of have to deal with this directly. 01:23:29.580 |
and what happened is you just take all your workers 01:23:33.340 |
And that sort of is kind of obvious in retrospect 01:23:44.740 |
because it basically doesn't know almost anything. 01:23:51.060 |
because the credit assignment issue in RL is really, really hard. 01:23:57.580 |
and that's maybe a research challenge for the future. 01:24:05.420 |
which I believe is within the realm of what transformers can do. 01:24:10.380 |
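A simplified sketch of the trick described here: the policy emits both an action and a discretized delay (how many frames to wait before acting again), so the model only has to produce the few hundred human actions per game rather than all of the roughly 22 x 60 x 10 frames. Names and sizes are invented.

```python
import torch
import torch.nn as nn

class ActionAndDelayHead(nn.Module):
    def __init__(self, hidden=128, n_actions=50, max_delay=128):
        super().__init__()
        self.action_head = nn.Linear(hidden, n_actions)
        self.delay_head = nn.Linear(hidden, max_delay)   # discretized "frames to skip"

    def forward(self, core_output):
        return self.action_head(core_output), self.delay_head(core_output)

head = ActionAndDelayHead()
core_out = torch.randn(1, 128)                 # from the LSTM/transformer core
action_logits, delay_logits = head(core_out)
action = action_logits.argmax(-1)
delay = delay_logits.argmax(-1)                # e.g. "do this, then sleep 17 frames"
print(action.item(), delay.item())
```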
Yeah, I guess the difference between StarCraft and Go is 01:24:14.540 |
in Go and chess, stuff starts happening right away. 01:24:23.460 |
it's possible to develop reasonable strategies quickly 01:24:30.620 |
but one action is what people would call the God action 01:24:34.140 |
that would be, if you had expanded the whole search tree, 01:24:56.420 |
in terms of action space size, is way harder. 01:25:10.580 |
For humans, maybe playing StarCraft seems more intuitive 01:25:15.900 |
I mean, the graphics and everything moves smoothly, 01:25:20.140 |
I mean, Go is a game that I would really need to study. 01:25:23.860 |
But for machines, kind of maybe it's the reverse, yes. 01:25:27.020 |
- Which shows you the gap actually between deep learning 01:25:32.140 |
So you developed a lot of really interesting ideas. 01:25:44.500 |
Do you like, like what was it, Ian Goodfellow said 01:25:53.340 |
- He thinks beers are essential for coming up with new ideas. 01:25:55.820 |
- We had beers to decide to play another game 01:26:02.660 |
Actually, I explained this in a DeepMind retreat 01:26:05.780 |
and I said, this is the same as the GAN story. 01:26:09.540 |
let's play a game next week and that's what happened. 01:26:15.860 |
- But in general, like, do you like brainstorming? 01:26:18.220 |
Do you like thinking alone, working stuff out? 01:26:20.140 |
- So I think throughout the years also things changed, right? 01:26:23.860 |
So initially I was very fortunate to be with great minds 01:26:33.940 |
I was really fortunate to join Brain at the very good time. 01:26:37.660 |
So at that point, ideas, I was just kind of brainstorming 01:26:53.140 |
It's very hard at some point to not communicate 01:27:00.460 |
So definitely that communication aspect needs to be there, 01:27:07.580 |
Nowadays, I'm also trying to be a bit more strategic 01:27:15.020 |
So I was describing a little bit this sort of tension 01:27:22.940 |
applications that can drive the research, right? 01:27:25.580 |
And honestly, the formula that has worked best for me 01:27:31.540 |
and then try to see how research fits into it, 01:27:34.620 |
how it doesn't fit into it, and then you must innovate. 01:27:37.820 |
So I think machine translation drove sequence to sequence. 01:27:42.820 |
Then maybe like learning algorithms that had to, 01:27:47.140 |
like combinatorial algorithms led to pointer networks. 01:27:50.540 |
StarCraft led to really scaling up imitation learning 01:27:55.540 |
So that's been a formula that I personally like, 01:28:02.740 |
where you just want to investigate model-based RL 01:28:14.260 |
You need to kind of a minimal environment to try things. 01:28:21.020 |
and something I've also done quite a few times, 01:28:24.060 |
both at Brain, at DeepMind, and obviously as a PhD. 01:28:27.580 |
So I think besides the ideas and discussions, 01:28:34.660 |
because you start sort of guiding not only your own goals, 01:28:39.660 |
but other people's goals to the next breakthrough. 01:28:43.860 |
So you must really kind of understand this feasibility also, 01:28:50.340 |
Whether this domain is ready to be tackled or not, 01:29:01.060 |
which I think as a grad student, I just had no idea. 01:29:07.380 |
And I think this has been maybe the major change. 01:29:09.780 |
And I recommend people kind of feed forward to success, 01:29:21.020 |
which sometimes you stumble upon some interesting things, 01:29:23.820 |
but in general, it's also good to plan a bit. 01:29:30.460 |
of taking a really hard problem, stepping right in, 01:29:40.100 |
There's a silly optimism and a critical sort of skepticism 01:29:48.380 |
which is why it's good to have a team of people 01:30:08.340 |
And I'm just following his lead, which is great, 01:30:12.620 |
So these things are obviously quite important 01:30:17.340 |
that you wanna be surrounded by people who are diverse. 01:30:28.340 |
who actually have an idea that I might not think is good, 01:30:34.940 |
I've been proven wrong many, many times as well. 01:30:39.140 |
Your colleagues are more important than yourself, I think. 01:30:44.580 |
Now, let's real quick talk about another impossible problem. 01:30:50.460 |
- What do you think it takes to build a system 01:30:54.100 |
We talked a little bit about the Turing test, StarCraft, 01:30:56.380 |
all of these have echoes of general intelligence. 01:31:17.220 |
So what I'm trying to then come up with for myself 01:31:28.460 |
that no longer sort of overfit to a single task, right? 01:31:31.900 |
But actually kind of learn the skill of learning, 01:31:37.900 |
And that actually is a field that I am fascinated by, 01:31:41.460 |
which is the learning to learn or meta-learning, 01:31:45.020 |
which is about no longer learning about a single domain. 01:31:48.620 |
So you can think about the learning algorithm itself 01:31:52.700 |
So the same formula we applied for AlphaStar or StarCraft, 01:31:56.780 |
we can now apply to kind of almost any video game 01:31:59.420 |
or you could apply to many other problems and domains. 01:32:03.540 |
But the algorithm is what's kind of generalizing. 01:32:06.980 |
But the neural network, those weights are useless 01:32:12.060 |
I train a network to play very well at Protoss versus Protoss. 01:32:20.620 |
I would need to retrain a network from scratch 01:32:24.820 |
That's beautiful, but the network itself will not be useful. 01:32:28.540 |
So I think if I see an approach that can absorb 01:32:36.660 |
without the need to kind of restart the process, 01:32:47.620 |
like should Turing test be solved before AGI? 01:32:51.740 |
I think concretely, I would like to see clearly 01:32:56.940 |
meaning there is an architecture or a network 01:33:00.860 |
that as it sees new problem or new data, it solves it. 01:33:08.300 |
it should solve it at the same speed that we do solve 01:33:11.580 |
When I define a new object for you and you have to recognize it. 01:33:25.900 |
and what's the exact benchmark is a bit difficult. 01:33:44.660 |
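One common concrete benchmark shape for this "learning to learn" goal is episodic few-shot classification, sketched here with a nearest-centroid classifier (in the spirit of prototypical networks) on synthetic clusters. This illustrates the evaluation protocol only; it is not a claim about any specific DeepMind system.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(n_way=5, k_shot=5, n_query=15, dim=32):
    # Each class is a random Gaussian cluster standing in for "a new object".
    centers = rng.normal(size=(n_way, dim))
    support = centers[:, None] + 0.3 * rng.normal(size=(n_way, k_shot, dim))
    query = centers[:, None] + 0.3 * rng.normal(size=(n_way, n_query, dim))
    return support, query

def nearest_centroid_accuracy(support, query):
    prototypes = support.mean(axis=1)                        # (n_way, dim)
    n_way, n_query, dim = query.shape
    q = query.reshape(-1, dim)
    dists = ((q[:, None] - prototypes[None]) ** 2).sum(-1)   # (n_way*n_query, n_way)
    pred = dists.argmin(-1)
    truth = np.repeat(np.arange(n_way), n_query)
    return (pred == truth).mean()

support, query = sample_episode()
print("few-shot accuracy on an unseen episode:", nearest_centroid_accuracy(support, query))
```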
- Yeah, there's an emotional, psychological level 01:33:53.780 |
that we just pass judgment on as human beings, 01:34:26.420 |
but then like from StarCraft to Wikipedia and back. 01:34:38.220 |
as we discovered, for instance, with ImageNet, 01:34:43.060 |
So I think maybe there's a lack of benchmark, 01:34:47.820 |
and the community will then work towards that. 01:34:50.700 |
And then beyond what AGI might mean or would imply, 01:34:57.020 |
I really am hopeful to see basically machine learning 01:35:05.300 |
that might not have the resources to hire an assistant 01:35:08.740 |
or that they might not even know what the weather is like. 01:35:18.020 |
I think that's maybe what we should also not lose focus. 01:35:25.540 |
but I think the way that DeepMind puts it is, 01:35:28.500 |
and then use it to solve everything else, right? 01:35:33.500 |
- Yeah, we shouldn't forget about all the positive things 01:35:47.140 |
do you have any worry about the existential threat 01:35:49.660 |
of artificial intelligence in the near or far future 01:36:06.100 |
and even like whole research field on AI safety emerging 01:36:09.260 |
and in conferences and so on, I think that's great. 01:36:12.620 |
In the long term, I really hope we just can simply 01:36:17.580 |
have the benefits outweigh the potential dangers. 01:36:23.380 |
but also we must remain vigilant to kind of monitor 01:36:33.740 |
or to redirect our efforts if need be, right? 01:36:36.860 |
So, but I'm quite optimistic about the technology 01:36:48.580 |
but obviously that's the one I kind of have more power on. 01:36:52.500 |
So clearly I do start thinking more and more about this 01:37:02.180 |
which is a field that so far I have not really contributed 01:37:05.220 |
to, but maybe there's something to be done there as well. 01:37:08.980 |
You know, I talk about this with a few folks, 01:37:11.460 |
but it's important to ask you and shove it in your head 01:37:14.860 |
because you're at the leading edge of actually 01:37:21.500 |
it's arguably at the very cutting edge of the kind of thing 01:37:32.660 |
to the kind of thing that people might be afraid of, 01:37:38.300 |
And it's also good that you're not as worried 01:37:47.740 |
but obviously we should prepare for it, right? 01:37:55.260 |
misuse of the technologies as with any technologies, right? 01:38:15.540 |
to the new things that will happen in the future, 01:38:18.900 |
that we can still also push the research to the avenue 01:38:35.420 |
of all the time that I spend doing what I'm doing, really. 01:38:40.020 |
- Where do you see the deep learning as a field heading? 01:38:42.980 |
Where do you think the next big breakthrough might be? 01:38:53.100 |
with some form of discretization, program synthesis. 01:38:56.660 |
I think that's kind of as a research in itself 01:39:08.620 |
I don't think that's gonna be what's gonna happen this year, 01:39:11.500 |
but also this idea of starting not to throw away 01:39:15.820 |
all the weights, that this idea of learning to learn 01:39:24.980 |
And you can have an agent that is kind of solving 01:39:34.660 |
And it should really be kind of almost the same network, 01:39:42.700 |
with an optimization algorithm attached to it. 01:39:45.620 |
But I think this idea of generalization to new tasks 01:39:49.300 |
is something that we first must define good benchmarks, 01:39:56.500 |
but I think if you have a very limited domain, 01:40:06.220 |
in computer vision, we should start thinking, 01:40:08.860 |
I really like a talk that Léon Bottou gave at ICML 01:40:12.700 |
a few years ago, which is this train-test paradigm 01:40:28.180 |
And in meta-learning, we call these the meta-training set 01:40:31.100 |
and the meta-test set, which is really thinking about 01:40:52.060 |
we probably will see quite a few more interest and progress 01:41:00.940 |
- Do you have any hope or interest in knowledge graphs 01:41:12.140 |
but I mean a different kind of knowledge graph, 01:41:14.900 |
sort of like semantic graphs where there's concepts. 01:41:23.100 |
so I've been quite interested in sequences first 01:41:26.420 |
and then more interesting or different data structures 01:41:48.660 |
What's kind of the killer application of graphs, right? 01:41:51.420 |
And perhaps if we could extract a knowledge graph 01:41:56.420 |
from Wikipedia automatically, that would be interesting 01:42:18.860 |
So I think I really like the idea of a knowledge graph. 01:42:27.420 |
or as part of the research we did for StarCraft, 01:42:31.340 |
I thought, wouldn't it be cool to give the graph 01:42:34.420 |
of all these buildings that depend on each other 01:42:39.420 |
and units that have prerequisites before they can be built. 01:42:39.420 |
or to think of really StarCraft as a giant graph 01:42:54.940 |
you just kind of start taking branches and so on. 01:42:57.980 |
And we did a bit of research on this, nothing too relevant, 01:43:12.340 |
being able to generate knowledge representations 01:43:20.940 |
So there's a lot of interesting aspects there. 01:43:22.860 |
And for me personally, I'm just a huge fan of Wikipedia 01:43:29.140 |
aren't taking advantage of all the structured knowledge 01:43:47.980 |
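As a toy of that StarCraft-as-a-graph idea: a rough, partial Protoss tech tree as a prerequisite graph, with a helper that answers "what do I still need before I can build X?". The dependencies are recalled from the game and simplified, so treat them as illustrative.

```python
TECH_TREE = {              # child: prerequisites (simplified, from memory)
    'pylon': set(),
    'gateway': {'pylon'},
    'cybernetics_core': {'gateway'},
    'stalker': {'gateway', 'cybernetics_core'},
    'robotics_facility': {'cybernetics_core'},
}

def missing_prerequisites(target, built, tree=TECH_TREE):
    """Return everything still missing (transitively) before `target` is legal."""
    missing, stack = set(), [target]
    while stack:
        node = stack.pop()
        for req in tree[node]:
            if req not in built and req not in missing:
                missing.add(req)
                stack.append(req)
    return missing

print(missing_prerequisites('stalker', built={'pylon'}))
# gateway and cybernetics_core are still missing
```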
I mean, that sort of shows that the algorithm works 01:43:51.580 |
because we wouldn't want to have created by mistake 01:43:55.580 |
something in the architecture that happens to work 01:44:00.100 |
So as verification, I think that's an obvious next step 01:44:09.300 |
so agents and players can specialize on different skill sets 01:44:15.980 |
I think we've seen AlphaStar understanding very well 01:44:19.500 |
when to take battles and when to not do that. 01:44:33.420 |
But I have not perhaps seen as much as I would like of 01:44:37.300 |
this poker idea that you mentioned, right? 01:44:46.100 |
of what the opponent is doing and reacting to that 01:44:50.100 |
and sort of trying to trick the player to do something else 01:44:58.340 |
So I think purely from a research standpoint, 01:45:01.620 |
there's perhaps also quite a few things to be done there 01:45:10.980 |
in even auctions, manipulating other players, 01:45:13.740 |
sort of forming a belief state and just messing with people. 01:45:30.500 |
or perhaps StarCraft driving new techniques, right? 01:45:33.260 |
As I said, this is always the tension between the two. 01:45:36.660 |
- Wow, Oriol, thank you so much for talking today.