
Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12


Chapters

0:00 Introduction
18:58 Belief Distribution
20:06 Nash Equilibrium
26:52 Stochastic Games
27:38 Extensive Form Games
61:09 Future Directions: Scalable Game Solving in the Real World, Strategic Machine, Strategy Robot, and Optimized Markets

Whisper Transcript

00:00:00.000 | The following is a conversation with Tuomas Sandholm. He's a professor at CMU and co-creator
00:00:05.760 | of Libratus, which is the first AI system to beat top human players in the game of heads-up
00:00:10.880 | no limit Texas Hold'em. He has published over 450 papers on game theory and machine learning,
00:00:17.280 | including a best paper in 2017 at NIPS, now renamed to NeurIPS, which is where I caught up
00:00:24.640 | with him for this conversation. His research and companies have had wide-reaching impact in the
00:00:30.880 | real world, especially because he and his group not only propose new ideas but also build systems
00:00:38.240 | to prove that these ideas work in the real world. This conversation is part of the MIT course on
00:00:44.800 | Artificial General Intelligence and the Artificial Intelligence Podcast. If you enjoy it, subscribe
00:00:50.400 | on YouTube, iTunes, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D.
00:00:57.280 | And now, here's my conversation with Tuomas Sandholm. Can you describe at the high level
00:01:06.000 | the game of poker Texas Hold'em, heads-up Texas Hold'em, for people who might not be familiar
00:01:10.720 | with this card game? Yeah, happy to. So, heads-up no limit Texas Hold'em has really emerged in the
00:01:17.600 | AI community as a main benchmark for testing these application-independent algorithms for
00:01:24.160 | imperfect information game solving. And this is a game that's actually played by humans. You don't
00:01:31.200 | see that much on TV or casinos because, well, for various reasons, but you do see it in some
00:01:38.640 | expert-level casinos and you see it in the best poker movies of all time. It's actually an event
00:01:44.000 | in the World Series of Poker, but mostly it's played online and typically for pretty big sums
00:01:50.400 | of money. And this is a game that usually only experts play. So, if you go to your home game
00:01:57.200 | on a Friday night, it probably is not going to be heads-up no limit Texas Hold'em. It might be
00:02:01.680 | no limit Texas Hold'em in some cases, but typically for a big group and it's not as competitive.
00:02:08.560 | While heads-up means it's two players, so it's really like me against you. Am I better or are
00:02:14.080 | you better? Much like chess or go in that sense, but an imperfect information game,
00:02:19.360 | which makes it much harder because I have to deal with issues of you knowing things that I don't
00:02:24.960 | know and I know things that you don't know instead of pieces being nicely laid on the board for both
00:02:29.920 | of us to see. So, in Texas Hold'em, there's two cards that you only see that belong to you. And
00:02:37.680 | there is, they gradually lay out some cards that add up overall to five cards that everybody can
00:02:43.600 | see. So, the imperfect nature of the information is the two cards that you're holding.
00:02:48.240 | Up front, yeah. So, as you said, you first get two cards in private each and then there's a
00:02:54.160 | betting round. Then you get three cards in public on the table, then there's a betting round. Then
00:02:59.440 | you get the fourth card in public on the table, there's a betting round. Then you get the fifth
00:03:03.760 | card on the table, there's a betting round. So, there's a total of four betting rounds and four
00:03:08.480 | tranches of information revelation, if you will. Only the first tranche is private and then it's
00:03:14.720 | public from there. And this is probably by far the most popular game in AI and just the general
00:03:25.680 | public in terms of imperfect information. So, it's probably the most popular spectator game to watch.
00:03:32.960 | Right? So, which is why it's a super exciting game to tackle. So, it's on the order of chess,
00:03:39.760 | I would say, in terms of popularity, in terms of the AI setting it as the bar of what is
00:03:45.360 | intelligence. So, in 2017, Libratus, how do you pronounce it? Libratus. Libratus. Libratus beats...
00:03:52.880 | Little Latin there.
00:03:53.920 | A little bit of Latin. Libratus beat four expert human players. Can you describe that
00:04:02.400 | event? What you learned from it? What was it like? What was the process in general for people who
00:04:07.200 | have not read the papers and studied? Yeah. So, the event was that we invited four of the top 10
00:04:14.080 | players. These are specialist players in Heads Up No Limit Texas Hold'em, which is very important
00:04:18.960 | because this game is actually quite different than the multiplayer version. We brought them in
00:04:24.800 | to Pittsburgh to play at the Rivers Casino for 20 days. We wanted to get 120,000 hands in
00:04:31.680 | because we wanted to get statistical significance. So, it's a lot of hands for humans to play,
00:04:38.400 | even for these top pros who play fairly quickly normally. So, we couldn't just have one of them
00:04:44.800 | play so many hands. 20 days they were playing, basically morning to evening, and I raised $200,000
00:04:53.120 | as a little incentive for them to play. And the setting was so that they didn't all get 50,000.
00:05:00.880 | We actually paid them out based on how they did against the AI each. So, they had an incentive to
00:05:06.560 | play as hard as they could, whether they're way ahead or way behind or right at the mark of beating
00:05:12.960 | the AI. And you don't make any money, unfortunately. Right. No, we can't make any money. So,
00:05:18.240 | originally, a couple of years earlier, I actually explored whether we could actually play for money
00:05:24.000 | because that would be, of course, interesting as well to play against the top people for money.
00:05:29.360 | But the Pennsylvania Gaming Board said no. So, we couldn't. So, this is much like an exhibit,
00:05:35.200 | like for a musician or a boxer or something like that. Nevertheless, you were keeping track of the
00:05:41.120 | money and Libratus won close to $2 million, I think. So, if it was for real money, if you were
00:05:50.800 | able to earn money, that was a quite impressive and inspiring achievement. Just a few details.
00:05:57.040 | What were the players looking at? Were they behind a computer? What was the interface like?
00:06:02.000 | Yes. They were playing much like they normally do. These top players, when they play this game,
00:06:07.040 | they play mostly online. So, they're used to playing through a UI. And they did the same
00:06:12.400 | thing here. So, there was this layout. You could imagine there's a table on a screen.
00:06:17.760 | There's the human sitting there, and then there's the AI sitting there. And the screen shows
00:06:23.600 | everything that's happening, the cards coming out, and shows the bets being made. And we also
00:06:28.080 | had the betting history for the human. So, if the human forgot what had happened in the hand so far,
00:06:33.200 | they could actually reference back and so forth. Is there a reason they were given access to the
00:06:39.520 | betting history? Well, it didn't really matter. They wouldn't have forgotten anyway. These are
00:06:47.680 | top quality people. But we just wanted to put it out there so it's not a question of a human forgetting
00:06:53.360 | and the AI somehow trying to get advantage of better memory. So, what was that like? I mean,
00:06:57.760 | that was an incredible accomplishment. So, what did it feel like before the event? Did you
00:07:03.200 | have doubt, hope? Where was your confidence at? Yeah, that's great. Great question. So,
00:07:10.160 | 18 months earlier, I had organized a similar brains versus AI competition with a previous
00:07:16.400 | AI called Claudico, and we couldn't beat the humans. So, this time around, it was only 18
00:07:22.720 | months later, and I knew that this new AI, Libratus, was way stronger. But it's hard to say
00:07:29.200 | how you'll do against the top humans before you try. So, I thought we had about a 50/50 shot.
00:07:34.400 | And the international betting sites put us as a four to one or five to one underdog.
00:07:41.600 | So, it's kind of interesting that people really believe in people over AI. People don't just
00:07:48.960 | overestimate themselves; they have overconfidence in other people as well
00:07:53.120 | compared to the performance of AI. And yeah, so we were a four to one or five to one underdog.
00:07:59.200 | And even after three days of beating the humans in a row, we were still 50/50 on the international
00:08:05.280 | betting sites. Do you think there's something special and magical about poker in the way
00:08:10.480 | people think about it? In the sense you have, I mean, even in chess, there's no Hollywood movies.
00:08:17.200 | Poker is the star of many movies. And there's this feeling that certain human facial expressions and
00:08:27.920 | body language, eye movement, all these tells are critical to poker. Like you can look into
00:08:34.080 | somebody's soul and understand their betting strategy and so on. So, that's probably why
00:08:40.960 | possibly do you think that is why people have a confidence that humans will outperform? Because
00:08:46.000 | AI systems cannot in this construct perceive these kinds of tells. They're only looking at betting
00:08:52.480 | patterns and nothing else, the betting patterns and statistics. So, what's more important to you
00:09:03.120 | if you step back and look at human versus human play, what's the role of these tells, of these
00:09:10.000 | ideas that we romanticize? Yeah. So, I'll split it into two parts. So, one is why do humans
00:09:17.760 | trust humans more than AI and all have overconfidence in humans? I think that's not
00:09:24.720 | really related to the tell question. It's just that they've seen these top players, how good
00:09:30.000 | they are and they're really fantastic. So, it's just hard to believe that an AI could beat them.
00:09:36.720 | So, I think that's where that comes from. And that's actually maybe a more general lesson about
00:09:40.960 | AI that until you've seen it overperform a human, it's hard to believe that it could.
00:09:45.360 | But then the tells, a lot of these top players, they're so good at hiding tells that among the
00:09:55.360 | top players, it's actually not really worth it for them to invest a lot of effort trying to find
00:10:01.840 | tells in each other because they're so good at hiding them. So, yes, at the kind of Friday evening
00:10:09.040 | game, tells are going to be a huge thing. You can read other people and if you're a good reader,
00:10:14.000 | you'll read them like an open book. But at the top levels of poker now, the tells become a much
00:10:20.080 | smaller and smaller aspect of the game as you go to the top levels. The amount of strategies,
00:10:25.840 | the amounts of possible actions is very large, 10 to the power of 100 plus. So, there has to be some,
00:10:36.720 | I've read a few of the papers related, it has to form some abstractions of various hands and actions.
00:10:44.080 | So, what kind of abstractions are effective for the game of poker?
00:10:49.120 | Yeah. So, you're exactly right. So, when you go from a game tree, that's 10 to the 161,
00:10:55.280 | especially in an imperfect information game, it's way too large to solve directly,
00:11:00.080 | even with our fastest equilibrium finding algorithms. So, you want to abstract it first.
00:11:07.120 | And abstraction in games is much trickier than abstraction in MDPs or other single agent settings,
00:11:14.320 | because you have these abstraction pathologies, where even if I have a finer grained abstraction,
00:11:19.440 | the strategy that I can get from that for the real game might actually be worse than the strategy I
00:11:25.280 | can get from the coarse grained abstraction. So, you have to be very careful.
00:11:28.720 | Now, the kinds of abstractions, just to zoom out, we're talking about, there's the
00:11:32.640 | hands abstractions, and then there's betting strategies.
00:11:37.120 | Yeah, betting actions. Yeah.
00:11:38.480 | Betting actions.
00:11:39.280 | So, to talk about general games, there's information abstraction,
00:11:44.560 | which is the abstraction of what chance does. And this would be the cards in the case of poker.
00:11:50.000 | And then there's action abstraction, which is abstracting the actions of the actual players,
00:11:56.400 | which would be bets in the case of poker.
00:11:58.960 | Yourself and the other players?
00:12:01.280 | Yes, yourself and other players. And for information abstraction, we were completely
00:12:10.160 | automated. So, these are algorithms, but they do what we call potential aware abstraction,
00:12:16.640 | where we don't just look at the value of the hand, but also how it might materialize into
00:12:20.880 | good or bad hands over time. And it's a certain kind of bottom up process with integer programming
00:12:26.320 | there and clustering and various aspects of how you build this abstraction.
00:12:31.360 | And then in the action abstraction, there, it's largely based on how
00:12:36.800 | humans and other AIs have played this game in the past. But in the beginning,
00:12:43.200 | we actually used an automated action abstraction technology, which is provably convergent,
00:12:49.920 | that it finds the optimal combination of bet sizes, but it's not very scalable.
00:12:55.360 | So, we couldn't use it for the whole game, but we used it for the first couple of betting actions.
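To make the information-abstraction idea concrete, here is a minimal sketch under illustrative assumptions, not Libratus's actual pipeline: each private hand is summarized by a histogram of how its equity could evolve on future cards (the "potential-aware" part), and hands with similar histograms are clustered into buckets. The bucket count, histogram size, and plain k-means routine are all assumptions; the real system uses a more elaborate bottom-up clustering with integer programming.

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Plain k-means over hand feature vectors (illustrative only, not the
    integer-programming / bottom-up clustering described above)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each hand to the nearest bucket center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a bucket goes empty.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Hypothetical input: for every private hand, a histogram over future equity
# (probability of ending up with 0-10%, 10-20%, ... equity by the river).
# This "how might the hand materialize" signal is the potential-aware part.
num_hands, num_bins, num_buckets = 1326, 10, 50   # assumed sizes
equity_histograms = np.random.dirichlet(np.ones(num_bins), size=num_hands)

buckets = kmeans(equity_histograms, num_buckets)
print("hand 0 is abstracted into bucket", buckets[0])
```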
00:12:59.680 | So, what's more important, the strength of the hand, so the information abstraction, or the
00:13:06.080 | how you play them, the actions? Does it, you know, the romanticized notion again,
00:13:13.120 | is that it doesn't matter what hands you have, that the actions, the betting,
00:13:17.120 | may be the way you win, no matter what hands you have.
00:13:20.240 | Yeah. So, that's why you have to play a lot of hands, so that the role of luck gets smaller. So,
00:13:27.280 | you could otherwise get lucky and get some good hands, and then you're going to win the match.
00:13:31.440 | Even with thousands of hands, you can get lucky, because there's so much variance in
00:13:37.120 | no limit Texas Hold'em, because if we both go all in, it's a huge stack of variance. So,
00:13:43.040 | there are these massive swings in no limit Texas Hold'em. So, that's why you have to play not just
00:13:49.600 | thousands, but over 100,000 hands to get statistical significance.
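As a rough illustration of why so many hands are needed, here is a back-of-the-envelope sample-size calculation. The per-hand standard deviation and the win-rate margin below are invented illustrative numbers, not figures from the match.

```python
import math

# Assumed, illustrative numbers (not from the actual Brains vs. AI match):
sigma_bb_per_hand = 10.0   # per-hand standard deviation, in big blinds
margin_bb_per_hand = 0.10  # win-rate edge we want to resolve (10 bb per 100 hands)
z = 1.96                   # two-sided 95% confidence

# The standard error of the mean shrinks as 1/sqrt(n);
# solve z * sigma / sqrt(n) <= margin for n.
n = math.ceil((z * sigma_bb_per_hand / margin_bb_per_hand) ** 2)
print(f"hands needed: {n:,}")   # about 38,000 with these made-up numbers;
                                # with higher real-world variance and smaller
                                # edges, the requirement grows well past 100,000,
                                # in line with the 120,000-hand target above.
```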
00:13:55.200 | So, let me ask another way this question. If you didn't even look at your hands,
00:14:00.400 | but they didn't know that, your opponents didn't know that, how well would you be able to do?
00:14:05.840 | That's a good question. There's actually, I heard the story that there's this Norwegian
00:14:10.720 | female poker player called Annette Obrestad, who's actually won a tournament by doing exactly that.
00:14:16.640 | But that would be extremely rare. So, you cannot really play well that way.
00:14:23.360 | Okay. So, the hands do have some role to play.
00:14:27.760 | So, Libratus does not use, as far as I understand, use learning methods, deep learning.
00:14:35.040 | Is there room for learning in, you know, there's no reason why Libratus doesn't, you know, combine
00:14:44.000 | with an AlphaGo type approach for estimating the quality of states with a function estimator. What are your
00:14:50.080 | thoughts on this? Maybe as compared to another algorithm, which I'm not that familiar with,
00:14:55.760 | DeepStack, the engine that does use deep learning, that it's unclear how well it does, but
00:15:01.920 | nevertheless uses deep learning. So, what are your thoughts about learning methods to aid
00:15:06.000 | in the way that Libratus plays the game of poker?
00:15:09.280 | Yeah. So, as you said, Libratus did not use learning methods and played very well without
00:15:15.440 | them. Since then, we have actually, actually here, we have a couple of papers on things that do
00:15:20.800 | use learning techniques.
00:15:22.160 | Excellent.
00:15:22.660 | So, and deep learning in particular, and sort of the way you're talking about, where it's learning
00:15:31.360 | an evaluation function. But in imperfect information games, unlike, let's say, in Go or
00:15:40.560 | now also in chess and shogi, it's not sufficient to learn an evaluation for a state, because the
00:15:48.160 | value of an information set depends not only on the exact state, but it also depends on
00:15:56.640 | both players' beliefs. Like, if I have a bad hand, I'm much better off if the opponent thinks I have
00:16:03.760 | a good hand, and vice versa. If I have a good hand, I'm much better off if the opponent believes I
00:16:09.360 | have a bad hand. So, the value of a state is not just a function of the cards. It depends on,
00:16:16.320 | if you will, the path of play, but only to the extent that is captured in the belief distributions.
00:16:23.200 | So, that's why it's not as simple as it is in perfect information games. And I don't want to
00:16:29.840 | say it's simple there either. It's, of course, very complicated computationally there too,
00:16:34.080 | but at least conceptually, it's very straightforward. There's a state,
00:16:37.360 | there's an evaluation function, you can try to learn it. Here, you have to do something more.
00:16:42.240 | And what we do in one of these papers is we allow the opponent
00:16:50.640 | to actually take different strategies at the leaf of the search tree, if you will. And that
00:16:57.200 | is a different way of doing it, and it doesn't assume, therefore, a particular way that the
00:17:02.640 | opponent plays, but it allows the opponent to choose from a set of different continuation
00:17:08.800 | strategies. And that forces us to not be too optimistic in a look-ahead search. And that's
00:17:16.160 | one way you can do sound look-ahead search in imperfect information games, which is very
00:17:21.840 | difficult. And you were asking about DeepStack. What they did, it was very different than what
00:17:28.560 | we do, either in Libratus or in this new work. They were randomly generating various situations
00:17:35.360 | in the game. Then they were doing the look-ahead from there to the end of the game, as if that was
00:17:40.560 | the start of a different game. And then they were using deep learning to learn those values of those
00:17:47.440 | states, but the states were not just the physical states, they include belief distributions.
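A minimal sketch of the point being made: in an imperfect-information game, a learned leaf evaluation has to be conditioned on both players' belief distributions over private hands, not just on the public state. The feature sizes and the random linear map below are placeholders, not DeepStack's or Libratus's actual architecture.

```python
import numpy as np

NUM_HOLE_COMBOS = 1326          # possible private two-card hands in Texas Hold'em
PUBLIC_FEATURES = 64            # assumed encoding of board cards and pot size

rng = np.random.default_rng(0)
W = rng.normal(size=(PUBLIC_FEATURES + 2 * NUM_HOLE_COMBOS, NUM_HOLE_COMBOS))

def leaf_value(public_state, belief_p1, belief_p2):
    """Value of every hand player 1 could hold at this leaf.

    The key point: the input includes BOTH players' belief distributions over
    private hands, because the same public state is worth different amounts
    depending on what each player believes. (A random linear map stands in
    for a trained network here.)"""
    x = np.concatenate([public_state, belief_p1, belief_p2])
    return x @ W

public_state = rng.normal(size=PUBLIC_FEATURES)
belief_p1 = np.full(NUM_HOLE_COMBOS, 1.0 / NUM_HOLE_COMBOS)  # uniform beliefs
belief_p2 = np.full(NUM_HOLE_COMBOS, 1.0 / NUM_HOLE_COMBOS)
values = leaf_value(public_state, belief_p1, belief_p2)
print(values.shape)   # one value per possible private hand
```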
00:17:52.400 | When you talk about look-ahead for DeepStack or with Libratus, does it mean considering every
00:18:00.320 | possibility that the game can evolve? Are we talking about this sort of extreme exponential
00:18:05.840 | growth of a tree? Yes. So we're talking about exactly that.
00:18:09.360 | Much like you do in alpha beta search or Monte Carlo tree search, but with different techniques.
00:18:17.280 | So there's a different search algorithm, and then we have to deal with the leaves differently.
00:18:21.760 | So if you think about what Libratus did, we didn't have to worry about this, because we only did it
00:18:26.640 | at the end of the game. So we would always terminate into a real situation, and we would
00:18:32.560 | know what the payout is. It didn't do these depth-limited look-aheads. But now in this new
00:18:37.680 | paper, which is called Depth Limited, I think it's called Depth Limited Search for Imperfect
00:18:42.640 | Information Games, we can actually do sound depth-limited look-aheads. So we can actually
00:18:47.600 | start to do the look-ahead from the beginning of the game on, because that's too complicated to do
00:18:53.440 | for this whole long game. So in Libratus, we were just doing it for the end.
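A toy sketch of the depth-limited idea described here: instead of assuming one fixed opponent continuation at the depth limit, the leaf is evaluated against a set of opponent continuation strategies so the look-ahead is not over-optimistic. The payoff numbers and the two-action setting are invented; the actual method solves an augmented game in which the opponent selects among continuations, and the simple worst-case min below only illustrates why having the set matters.

```python
import numpy as np

# At a depth-limit leaf we don't know how the opponent would continue.
# Suppose we have three candidate continuation strategies for the opponent,
# each giving our two candidate actions a different expected payoff.
# Rows: our actions at the leaf; columns: opponent continuations.
payoffs = np.array([
    [ 1.5, -0.5,  0.2],   # our action A vs. opponent continuations 1..3
    [ 0.8,  0.6, -1.0],   # our action B
])

# Over-optimistic evaluation: assume the opponent continuation we like best.
optimistic = payoffs.max(axis=1)

# Safer evaluation: let the opponent pick its continuation after seeing our
# choice, i.e. score each action by its worst case over the set.
robust = payoffs.min(axis=1)

print("optimistic values:", optimistic)   # [1.5, 0.8]
print("robust values:    ", robust)       # [-0.5, -1.0]; the preferred action
                                          # can flip once the full set is considered.
```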
00:18:57.600 | So, and then the other side, this belief distribution. So is it explicitly modeled
00:19:02.960 | what kind of beliefs that the opponent might have?
00:19:06.400 | Yeah. It is explicitly modeled, but it's not assumed. The beliefs are actually output,
00:19:13.920 | not input. Of course, the starting beliefs are input, but they just fall from the rules of the
00:19:20.240 | game, because we know that the dealer deals uniformly from the deck. So I know that every
00:19:26.000 | pair of cards that you might have is equally likely. I know that for a fact. That just follows
00:19:32.080 | from the rules of the game. Of course, except the two cards that I have. I know you don't have those.
00:19:36.160 | You have to take that into account. That's called card removal, and that's very important.
00:19:40.880 | Is the dealing always coming from a single deck in a heads up? So you can assume-
00:19:45.840 | Single deck.
00:19:46.400 | That's called card removal.
00:19:47.680 | You know that if I have the ace of spades, I know you don't have an ace of spades.
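A small sketch of the uniform prior with card removal just described: given my own two hole cards, the opponent's hand is uniform over the remaining two-card combinations. The deck encoding is only for illustration.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]          # 52 cards

def opponent_prior(my_hole_cards):
    """Uniform belief over the opponent's hole cards, with card removal:
    any combination containing a card I hold gets probability zero."""
    remaining = [c for c in DECK if c not in my_hole_cards]
    combos = [frozenset(c) for c in combinations(remaining, 2)]   # C(50, 2) = 1225
    p = 1.0 / len(combos)
    return {combo: p for combo in combos}

prior = opponent_prior(("As", "Kd"))
print(len(prior))                                     # 1225
print(frozenset(("Ah", "Ad")) in prior)               # True: other aces are still possible
print(any("As" in combo for combo in prior))          # False: card removal in action
```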
00:19:53.440 | Great. So in the beginning, your belief is basically the fact that it's a fair
00:19:57.840 | dealing of hands, but how do you start to adjust that belief?
00:20:02.640 | Well, that's where this beauty of game theory comes. So Nash equilibrium,
00:20:08.160 | which John Nash introduced in 1950, introduces what rational play is when you have more than
00:20:14.320 | one player. And these are pairs of strategies where strategies are contingency plans,
00:20:20.240 | one for each player, so that neither player wants to deviate to a different strategy,
00:20:26.800 | given that the other doesn't deviate. But as a side effect, you get the beliefs from Bayes' rule.
00:20:33.680 | So in these imperfect information games, Nash equilibrium isn't just about deriving strategies.
00:20:38.240 | Nash equilibrium doesn't just define strategies. It also defines beliefs for both of us,
00:20:44.800 | and defines beliefs for each state, or what they call information sets.
00:20:53.120 | At each information set in the game, there's a set of different states that we might be in,
00:20:58.880 | but I don't know which one we're in. Nash equilibrium tells me exactly what is the
00:21:03.680 | probability distribution over those real world states in my mind.
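For reference, the standard definition being described, written out: a strategy profile is a Nash equilibrium when no player can gain by unilaterally deviating.

```latex
\[
(\sigma_1^*, \dots, \sigma_n^*) \text{ is a Nash equilibrium} \iff
u_i(\sigma_i^*, \sigma_{-i}^*) \;\ge\; u_i(\sigma_i, \sigma_{-i}^*)
\quad \text{for every player } i \text{ and every strategy } \sigma_i.
\]
```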
00:21:08.240 | How does Nash equilibrium give you that distribution? So why-
00:21:12.160 | I'll do a simple example. So you know the game rock, paper, scissors. So we can draw it as
00:21:18.400 | player one moves first, and then player two moves. But of course, it's important that player two
00:21:24.480 | doesn't know what player one moved. Otherwise, player two would win every time. So we can draw
00:21:29.200 | that as an information set where player one makes one of three moves first, and then there's an
00:21:34.000 | information set for player two. So player two doesn't know which of those nodes the world is in.
00:21:41.040 | But once we know the strategy for player one, Nash equilibrium will say that you play one-third rock,
00:21:47.200 | one-third paper, one-third scissors. From that, I can derive my beliefs on the information set
00:21:52.480 | that they're one-third, one-third, one-third. So Bayes gives you that.
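A tiny sketch of the Bayes' rule step just described, using a deliberately non-uniform, made-up mixed strategy so the update is visible.

```python
import numpy as np

# Player 1's (hypothetical) mixed strategy over rock / paper / scissors.
strategy_p1 = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}

# Player 2's information set contains one node per player-1 action.
# With no other observations, Bayes' rule says the belief over those nodes is
# just the normalized probability that player 1's strategy reaches each node.
reach = np.array(list(strategy_p1.values()))
belief = reach / reach.sum()
print(dict(zip(strategy_p1, belief)))   # {'rock': 0.5, 'paper': 0.3, 'scissors': 0.2}

# In equilibrium the strategy is (1/3, 1/3, 1/3), so the derived belief is
# uniform, exactly the "one-third, one-third, one-third" in the conversation.
```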
00:21:56.160 | Bayes gives you. But is that specific to a particular player, or is it something you
00:22:02.400 | quickly update with the specific player? No, the game theory isn't really player specific.
00:22:08.720 | So that's also why we don't need any data. We don't need any history of how these particular humans
00:22:13.920 | played in the past, or how any AI or human had played before. It's all about rationality. So
00:22:21.200 | the AI just thinks about what would a rational opponent do, and what would I do if I am rational.
00:22:27.920 | And that's the idea of game theory. So it's really a data-free, opponent-free approach.
00:22:35.520 | So it comes from the design of the game as opposed to the design of the player.
00:22:40.080 | Exactly. There's no opponent modeling per se. I mean, we've done some work on combining opponent
00:22:45.120 | modeling with game theory so you can exploit weak players even more, but that's another strand. And
00:22:50.320 | in Libratus, we didn't turn that on because I decided that these players are too good. And when
00:22:55.360 | you start to exploit an opponent, you typically open yourself up to exploitation. And these guys
00:23:02.320 | have so few holes to exploit, and they're world's leading experts in counter-exploitation. So I
00:23:06.960 | decided that we're not going to turn that stuff on. Actually, I saw a few of your papers exploiting
00:23:11.440 | opponents. It sounded very interesting to explore. Do you think there's room for exploitation
00:23:17.840 | generally outside of Libratus? Is there a subject or people differences that could be exploited,
00:23:25.200 | maybe not just in poker, but in general interactions and negotiations, all these
00:23:30.800 | other domains that you're considering? Yeah, definitely. We've done some work on that. And
00:23:36.080 | I really like the work that hybridizes the two. So you figure out what would a rational opponent
00:23:42.960 | do. And by the way, that's safe in these zero-sum games, two-player zero-sum games,
00:23:47.360 | because if the opponent does something irrational, yes, it might throw off my beliefs. But the
00:23:54.000 | amount that the player can gain by throwing off my belief is always less than they lose by playing
00:24:00.240 | poorly. So it's safe. But still, if somebody's weak as a player, you might want to play differently
00:24:08.160 | to exploit them more. So you can think about it this way. A game theoretic strategy is unbeatable,
00:24:14.560 | but it doesn't maximally beat the other opponent. So the winnings per hand might be better with a
00:24:22.960 | different strategy. And the hybrid is that you start from a game theoretic approach. And then
00:24:27.600 | as you gain data about the opponent in certain parts of the game tree, then in those parts of
00:24:33.360 | the game tree, you start to tweak your strategy more and more towards exploitation while still
00:24:39.840 | staying fairly close to the game theoretic strategy so as to not open yourself up to
00:24:44.480 | exploitation too much. - How do you do that? Do you try to
00:24:50.080 | vary up strategies, make it unpredictable? It's like, what is it, tit-for-tat strategies in
00:24:58.320 | Prisoner's Dilemma or? - Well, that's a repeated game.
00:25:02.800 | - Repeated games, that's right. - Simple Prisoner's Dilemma,
00:25:05.200 | repeated games. But even there, there's no proof that says that that's the best thing.
00:25:10.000 | But experimentally, it actually does well. - So what kind of games are there, first of all?
00:25:15.280 | I don't know if this is something that you could just summarize. There's perfect information games
00:25:20.240 | where all the information's on the table. There is imperfect information games. There's repeated
00:25:26.320 | games that you play over and over. There's zero-sum games. There's non-zero-sum games.
00:25:34.160 | And then there's a really important distinction you're making, two-player versus more players.
00:25:40.080 | So what other games are there and what's the difference, for example, with this
00:25:46.400 | two-player game versus more players? What are the key differences in your view?
00:25:51.520 | - So let me start from the basics. So a repeated game is a game where the same exact game is played
00:26:01.360 | over and over. In these extensive form games, where it's, think about tree form,
00:26:08.000 | maybe with these information sets to represent incomplete information, you can have kind of
00:26:14.000 | repetitive interactions. Even repeated games are a special case of that, by the way. But
00:26:20.240 | the game doesn't have to be exactly the same. It's like in sourcing auctions. Yes, we're going to see
00:26:25.040 | the same supply base year to year, but what I'm buying is a little different every time,
00:26:29.600 | and the supply base is a little different every time, and so on. So it's not really repeated.
00:26:34.160 | So to find a purely repeated game is actually very rare in the world. So they're really a very
00:26:39.680 | coarse model of what's going on. Then if you move up from just repeated, simple repeated matrix games,
00:26:49.680 | not all the way to extensive form games, but in between, there's stochastic games, where,
00:26:55.040 | you know, there's these, you think about it like these little matrix games, and when you take an
00:27:01.840 | action and your opponent takes an action, they determine not which next state I'm going to,
00:27:07.600 | next game I'm going to, but the distribution over next games where I might be going to.
00:27:12.480 | So that's the stochastic game. But it's like matrix games, repeated games, stochastic games,
00:27:18.880 | extensive form games. That is from less to more general. And poker is an example of the last one.
00:27:26.160 | So it's really in the most general setting, extensive form games. And that's kind of what
00:27:31.360 | the AI community has been working on and been benchmarked on with this heads up no limit Texas
00:27:37.520 | Hold'em. Can you describe extensive form games? What's the model here?
00:27:40.960 | Yeah, so if you're familiar with the tree form, so it's really the tree form, like in chess,
00:27:46.240 | there's a search tree. Versus a matrix.
00:27:48.560 | Versus a matrix, yeah. And the matrix is called the matrix form or bi-matrix form or normal form
00:27:54.560 | game. And here you have the tree form, so you can actually do certain types of reasoning there
00:27:59.280 | that you lose the information when you go to normal form. There's a certain form of equivalence,
00:28:06.800 | like if you go from tree form and you say every possible contingency plan is a strategy,
00:28:12.640 | then I can actually go back to the normal form, but I lose some information from the
00:28:16.880 | lack of sequentiality. Then the multiplayer versus two player distinction is an important
00:28:22.320 | one. So two player games in zero sum are conceptually easier and computationally easier.
00:28:32.640 | They're still huge, like this one, but they're conceptually easier and computationally easier
00:28:39.520 | in that conceptually you don't have to worry about which equilibrium is the other guy going to play
00:28:45.200 | when there are multiple, because any equilibrium strategy is a best response to any other
00:28:50.400 | equilibrium strategy. So I can play a different equilibrium from you and we'll still get the
00:28:55.600 | right values of the game. That falls apart even with two players when you have general sum games.
00:29:00.400 | Even without cooperation, just in general.
00:29:02.960 | Even without cooperation. So there's a big gap from two player zero sum to two player general
00:29:08.560 | sum or even to three player zero sum. That's a big gap, at least in theory.
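The property being relied on for two-player zero-sum games, stated explicitly: equilibrium strategies are interchangeable and all yield the same game value, which is exactly what fails in general-sum or three-or-more-player games.

```latex
\[
\text{If } (\sigma_1, \sigma_2) \text{ and } (\sigma_1', \sigma_2') \text{ are equilibria of a two-player zero-sum game, then}
\]
\[
(\sigma_1, \sigma_2') \text{ and } (\sigma_1', \sigma_2) \text{ are equilibria as well, and all four profiles give player 1 the same value } v.
\]
```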
00:29:14.160 | Can you maybe non-mathematically provide the intuition why it all falls apart with three
00:29:20.320 | or more players? It seems like you should still be able to have a Nash equilibrium that's instructive,
00:29:29.040 | that holds.
00:29:31.120 | Okay. So it is true that all finite games have a Nash equilibrium. So this is what John Nash
00:29:39.200 | actually proved. So they do have a Nash equilibrium. That's not the problem. The
00:29:43.840 | problem is that there can be many. And then there's a question of which equilibrium to select.
00:29:49.440 | And if you select your strategy from a different equilibrium and I select mine,
00:29:57.760 | then what does that mean? And in these non-zero sum games, we may lose some
00:30:03.040 | joint benefit by being just simply stupid. We could actually both be better off if we did
00:30:08.640 | something else. And in three player, you get other problems also like collusion. Like maybe
00:30:14.080 | you and I can gang up on a third player and we can do radically better by colluding. So there
00:30:20.240 | are lots of issues that come up there. So Noam Brown, the student you worked with on this,
00:30:25.520 | has mentioned, I looked through the AMA on Reddit, he mentioned that the ability of poker players to
00:30:31.280 | collaborate would make the game harder. He was asked the question of how would you make the game of poker,
00:30:36.720 | or both of you were asked the question, how would you make the game of poker
00:30:40.480 | beyond being solvable by current AI methods? And he said that there's not many ways of making
00:30:50.960 | poker more difficult, but collaboration or cooperation between players would make it
00:30:58.400 | extremely difficult. So can you provide the intuition behind why that is, if you agree with
00:31:04.400 | that idea? Yeah, so I've done a lot of work on coalitional games. And we actually have a paper
00:31:11.360 | here with my other student Gabriele Farina and some other collaborators on, at NIPS on that,
00:31:16.640 | actually just came back from the poster session where we presented this. But so when you have a
00:31:21.600 | collusion, it's a different problem. And it typically gets even harder then. Even the game
00:31:28.000 | representations, some of the game representations don't really allow good computation. So we
00:31:34.640 | actually introduced a new game representation for that. Is that kind of cooperation part of the
00:31:40.560 | model? Do you have information about the fact that other players are cooperating? Or is it just this
00:31:48.000 | chaos that where nothing is known? So there's some things unknown. Can you give an example of a
00:31:53.760 | collusion type game? Or is it usually... So like bridge. So think about bridge. It's like when you
00:32:00.320 | and I are on a team, our payoffs are the same. The problem is that we can't talk. So when I get my
00:32:07.440 | cards, I can't whisper to you what my cards are. That would not be allowed. So we have to somehow
00:32:14.400 | coordinate our strategies ahead of time and only ahead of time. And then there's certain signals
00:32:21.440 | we can talk about, but they have to be such that the other team also understands that.
00:32:26.160 | So that's an example where the coordination is already built into the rules of the game.
00:32:32.800 | But in many other situations like auctions or negotiations or diplomatic relationships,
00:32:39.680 | poker, it's not really built in, but it still can be very helpful for the colluders.
00:32:45.200 | I've read you write somewhere, with negotiations, you come to the table with prior,
00:32:51.440 | like a strategy that you're willing to do and not willing to do, those kinds of things.
00:32:58.160 | So how do you start to now moving away from poker, moving beyond poker into other applications like
00:33:04.560 | negotiations? How do you start applying this to other domains, even real world domains that you've
00:33:11.760 | worked on? Yeah, I actually have two startup companies doing exactly that. One is called
00:33:15.920 | Strategic Machine, and that's for kind of business applications, gaming, sports, all sorts of things
00:33:22.240 | like that. Any applications of this to business and to sports and to gaming, to various types of
00:33:31.600 | things in finance, electricity markets, and so on. And the other is called Strategy Robot,
00:33:36.480 | where we are taking this to military, security, cyber security, and intelligence applications.
00:33:42.800 | I think you worked a little bit in, how do you put it, advertisement,
00:33:50.880 | sort of suggesting ads kind of thing. Yeah, that's another company, Optimized Markets.
00:33:57.120 | But that's much more about a combinatorial market and optimization-based technology.
00:34:02.720 | That's not using these game theoretic reasoning technologies.
00:34:06.640 | I see. Okay, so sort of at a high level, how do you think about our ability to use
00:34:14.240 | game theoretic concepts to model human behavior? Do you think human behavior is
00:34:20.320 | amenable to this kind of modeling? So outside of the poker games, and where have you seen it
00:34:25.680 | done successfully in your work? I'm not sure the goal really is modeling humans.
00:34:32.000 | Like for example, if I'm playing a zero-sum game, I don't really care that the opponent is actually
00:34:40.240 | following my model of rational behavior, because if they're not, that's even better for me.
00:34:46.320 | Right. So see, with the opponents in games, the prerequisite is that you formalize
00:34:55.440 | the interaction in some way that can be amenable to analysis. And you've done this
00:35:02.080 | amazing work with mechanism design, designing games that have certain outcomes.
00:35:07.520 | But so I'll tell you an example from my world of autonomous vehicles, right?
00:35:15.280 | We're studying pedestrians, and pedestrians and cars negotiate in this nonverbal communication.
00:35:22.000 | There's this weird game dance of tension where pedestrians are basically saying,
00:35:27.120 | "I trust that you won't kill me, and so as a jaywalker, I will step onto the road,
00:35:31.680 | even though I'm breaking the law." And there's this tension. And the question is,
00:35:35.680 | we really don't know how to model that well, in trying to model intent. And so people sometimes
00:35:42.080 | bring up ideas of game theory and so on. Do you think that aspect of human behavior
00:35:48.400 | can use these kinds of imperfect information approaches, modeling? How do you start to attack
00:35:56.160 | a problem like that, when you don't even know how to design the game to describe
00:36:02.000 | the situation in order to solve it? Okay, so I haven't really thought about jaywalking.
00:36:06.720 | But one thing that I think could be a good application in autonomous vehicles is the
00:36:12.160 | following. So let's say that you have fleets of autonomous cars operated by different companies.
00:36:18.240 | So maybe here's the Waymo fleet, and here's the Uber fleet. If you think about the rules of the
00:36:23.280 | road, they define certain legal rules, but that still leaves a huge strategy space open. Like as
00:36:30.400 | a simple example, when cars merge, you know how humans merge, you know, they slow down and look
00:36:35.280 | at each other and try to merge. Wouldn't it be better if these situations would already be
00:36:41.920 | pre-negotiated, so we can actually merge at full speed, and we know that this is the situation,
00:36:47.280 | this is how we do it, and it's all going to be faster. But there are way too many situations
00:36:52.320 | to negotiate manually. So you could use automated negotiation, this is the idea at least,
00:36:57.600 | you could use automated negotiation to negotiate all of these situations, or many of them,
00:37:02.960 | in advance. And of course, it might be that, hey, maybe you're not going to always let me go first.
00:37:09.040 | Maybe you said, okay, well, in these situations, I'll let you go first. But in exchange,
00:37:13.360 | you're going to give me too much, you're going to let me go first in these situations. So it's
00:37:17.520 | this huge combinatorial negotiation. And do you think there's room in that example of merging
00:37:24.000 | to model this whole situation as an imperfect information game, or do you really want to
00:37:28.080 | consider it to be a perfect information game? No, that's a good question. Yeah, that's a good question.
00:37:33.280 | Do you pay the price of assuming that you don't know everything?
00:37:38.320 | Yeah, I don't know. It's certainly much easier. Games with perfect information are much easier.
00:37:45.040 | So if you can get away with it, you should. But if the real situation is of imperfect information,
00:37:52.480 | then you're going to have to deal with imperfect information.
00:37:55.040 | Great. So what lessons have you learned from the annual computer poker competition,
00:37:59.840 | an incredible accomplishment of AI? You know, you look at the history of Deep Blue, AlphaGo,
00:38:06.240 | these kind of moments when AI stepped up in an engineering effort and a scientific effort
00:38:13.280 | combined to beat the best of human players. So what do you take away from this whole experience?
00:38:19.360 | What have you learned about designing AI systems that play these kinds of games? And what does
00:38:24.240 | that mean for AI in general, for the future of AI development?
00:38:30.000 | Yeah, so that's a good question. There's so much to say about it. I do like this type of
00:38:36.480 | performance-oriented research, although in my group we go all the way from idea to theory to
00:38:42.960 | experiments to big system building to commercialization. So we span that spectrum. But I
00:38:48.160 | think that in a lot of situations in AI, you really have to build the big systems and evaluate
00:38:54.000 | them at scale before you know what works and doesn't. And we've seen that in the computational
00:39:00.000 | game theory community, that there are a lot of techniques that look good in the small,
00:39:03.920 | but then they cease to look good in the large. And we've also seen that there are a lot of
00:39:09.040 | techniques that look superior in theory. And I really mean in terms of convergence rates,
00:39:15.600 | better, like first-order methods have better convergence rates than the CFR-based
00:39:19.920 | algorithms. Yet the CFR-based algorithms are the fastest in practice. So it really tells me that
00:39:25.760 | you have to test this in reality. The theory isn't tight enough, if you will, to tell you
00:39:31.280 | which algorithms are better than the others. And you have to look at these things in the large,
00:39:38.400 | because any sort of projections you do from the small can, at least in this domain, be very
00:39:42.880 | misleading. So that's kind of from a kind of science and engineering perspective. From a
00:39:47.680 | personal perspective, it's been just a wild experience in that with the first poker competition,
00:39:54.000 | the first brains versus AI, man-machine poker competition that we organized. There had been,
00:40:00.240 | by the way, for other poker games, there had been previous competitions, but this was for
00:40:04.000 | Heads Up No Limit, this was the first. And I probably became the most hated person in the
00:40:09.680 | world of poker. And I didn't mean to. - Why is that? For cracking the game?
00:40:15.120 | - Yeah. A lot of people felt that it was a real threat to the whole game,
00:40:20.720 | the whole existence of the game. If AI becomes better than humans, people would be scared to
00:40:27.280 | play poker because there are these superhuman AIs running around taking their money and all of that.
00:40:32.800 | So it's just really aggressive. The comments were super aggressive. I got everything,
00:40:39.120 | just short of death threats. - Do you think the same was true for chess?
00:40:43.840 | Because right now they just completed the world championships in chess and humans just started
00:40:48.720 | ignoring the fact that there's AI systems now that outperform humans and they still enjoy the
00:40:53.760 | game. It's still a beautiful game. - That's what I think. And I think the
00:40:57.120 | same thing happens in poker. And so I didn't think of myself as somebody who was going to kill the
00:41:02.000 | game and I don't think I did. I've really learned to love this game. I wasn't a poker player before,
00:41:06.880 | but I learned so many nuances from these AIs and they've really changed how the game is played,
00:41:12.400 | by the way. So they have these very Martian ways of playing poker and the top humans are now
00:41:17.600 | incorporating those types of strategies into their own play. So if anything, to me,
00:41:22.480 | our work has made poker a richer, more interesting game for humans to play,
00:41:29.760 | not something that is going to steer humans away from it entirely.
00:41:33.200 | - Just a quick comment on something you said, which is, if I may say so, in academia is a little
00:41:40.400 | bit rare sometimes. It's pretty brave to put your ideas to the test in the way you described,
00:41:46.400 | saying that sometimes good ideas don't work when you actually try to apply them at scale.
00:41:51.920 | So where does that come from? I mean, if you could do advice for people, what drives you in that
00:42:00.320 | sense? Were you always this way? I mean, it takes a brave person, I guess is what I'm saying, to test
00:42:05.680 | their ideas and to see if this thing actually works against human top human players and so on.
00:42:11.520 | - Yeah, I don't know about brave, but it takes a lot of work. It takes a lot of work and a lot of
00:42:16.800 | time to organize, to make something big and to organize an event and stuff like that.
00:42:22.480 | - And what drives you in that effort? Because you could still, I would argue, get a best paper
00:42:28.000 | award at NIPS as you did in '17 without doing this. - That's right, yes.
00:42:32.160 | - And so... - So in general, I believe
00:42:36.560 | it's very important to do things in the real world and at scale. And that's really where the
00:42:42.640 | pudding, if you will, proof is in the pudding. That's where it is. In this particular case,
00:42:50.000 | it was kind of a competition between different groups for many years as to who can be the first
00:42:58.400 | one to beat the top humans at Heads Up No Limit Texas Hold'em. So it became kind of a
00:43:04.880 | competition who can get there. - Yeah, so a little friendly competition
00:43:11.680 | could do wonders for progress. - Yes, absolutely.
00:43:14.960 | - So the topic of mechanism design, which is really interesting,
00:43:19.840 | also kind of new to me, except as an observer of, I don't know, politics and any... I'm an observer
00:43:26.640 | of mechanisms, but you write in your paper on automated mechanism design that I quickly read.
00:43:32.960 | So mechanism design is designing the rules of the game so you get a certain desirable outcome.
00:43:40.000 | And you have this work on doing so in an automatic fashion as opposed to fine-tuning it. So what have
00:43:47.280 | you learned from those efforts? If you look, say, I don't know, at a complex system, like
00:43:53.840 | our political system, can we design our political system to have, in an automated fashion,
00:44:00.720 | to have outcomes that we want? Can we design something like traffic lights to be smart
00:44:08.960 | where it gets outcomes that we want? So what are the lessons that you draw from that work?
00:44:14.560 | - Yeah, so I still very much believe in the automated mechanism design direction.
00:44:19.040 | - Yes. - But it's not a panacea.
00:44:22.880 | There are impossibility results in mechanism design, saying that there is no mechanism that
00:44:28.480 | accomplishes objective X in class C. So it's not going to... There's no way using any mechanism
00:44:38.240 | design tools, manual or automated, to do certain things in mechanism design.
00:44:42.640 | - Can you describe that again? So meaning it's impossible to achieve that?
00:44:47.040 | - Yeah, yeah. - Or it's unlikely?
00:44:49.840 | - Impossible. - Impossible.
00:44:51.440 | - So these are not statements about human ingenuity, who might come up with something
00:44:56.320 | smart. These are proofs that if you want to accomplish properties X in class C,
00:45:02.240 | that is not doable with any mechanism. The good thing about automated mechanism design is that
00:45:07.680 | we're not really designing for a class. We're designing for specific settings at a time.
00:45:13.360 | So even if there's an impossibility result for the whole class, it doesn't mean that all of the
00:45:19.760 | cases in the class are impossible. It just means that some of the cases are impossible.
00:45:24.880 | So we can actually carve these islands of possibility within these known impossible
00:45:29.680 | classes. And we've actually done that. So one of the famous results in mechanism design is
00:45:35.040 | the Myerson-Satterthwaite theorem by Roger Myerson and Mark Satterthwaite from 1983.
00:45:40.160 | It's an impossibility of efficient trade under imperfect information. We show that
00:45:45.760 | you can in many settings avoid that and get efficient trade anyway.
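For context, a rough statement of the impossibility result being referenced; the automated-mechanism-design point is that specific settings inside this class can still escape it.

```latex
\[
\text{Bilateral trade with buyer value } v_b \sim F_b \text{ and seller value } v_s \sim F_s \text{ independent, supports overlapping:}
\]
\[
\text{no mechanism is simultaneously Bayesian incentive-compatible, interim individually rational,}
\]
\[
\text{ex post budget-balanced, and ex post efficient.}
\]
```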
00:45:50.400 | - Depending on how you design the game. Okay, so-
00:45:53.760 | - Depending how you design the game. And of course, it doesn't in any way
00:45:58.720 | contradict the impossibility result. The impossibility result is still there,
00:46:03.680 | but it just finds spots within this impossible class where in those spots,
00:46:10.640 | you don't have the impossibility. - Sorry if I'm going a bit philosophical,
00:46:14.640 | but what lessons do you draw towards, like I mentioned, politics or human interaction and
00:46:20.160 | designing mechanisms for outside of just these kinds of trading or auctioning or
00:46:30.880 | purely formal games or human interaction, like a political system? Do you think it's applicable to
00:46:37.360 | politics or to business, to negotiations, these kinds of things, designing rules that have
00:46:47.920 | certain outcomes? - Yeah, yeah, I do think so.
00:46:50.880 | - Have you seen that successfully done? - There hasn't really, oh, you mean mechanism
00:46:56.000 | design or automated mechanism? - Automated mechanism design.
00:46:58.320 | - So mechanism design itself has had fairly limited success so far. There are certain cases,
00:47:07.440 | but most of the real world situations are actually not sound from a mechanism design perspective.
00:47:14.560 | Even in those cases where they've been designed by very knowledgeable mechanism design people,
00:47:19.840 | the people are typically just taking some insights from the theory and applying those insights into
00:47:25.120 | the real world rather than applying the mechanisms directly. So one famous example of it is the FCC
00:47:32.000 | spectrum auctions. So I've also had a small role in that and very good economists have been,
00:47:40.480 | excellent economists have been working on that with, you know, game theory, yet the rules that are
00:47:45.040 | designed in practice there, they're such that bidding truthfully is not the best strategy.
00:47:51.600 | Usually mechanism design, we try to make things easy for the participants. So telling the truth
00:47:57.040 | is the best strategy. But even in those very high stakes auctions where you have tens of billions
00:48:02.480 | of dollars worth of spectrum being auctioned, truth telling is not the best strategy.
00:48:07.840 | And by the way, nobody knows even a single optimal bidding strategy for those auctions.
00:48:13.520 | - What's the challenge of coming up with an optimal, because there's a lot of players and
00:48:17.280 | there's imperfect players. - It's not so much a lot of players,
00:48:19.920 | but many items for sale. And these mechanisms are such that even with just two items or one item,
00:48:26.560 | bidding truthfully wouldn't be the best strategy. - If you look at the history of AI,
00:48:34.400 | it's marked by seminal events, AlphaGo beating a world champion, human goal player. I would put
00:48:40.560 | Libratus winning at Heads Up No Limit Hold'em as one such event. - Thank you.
00:48:45.200 | - And what do you think is the next such event, whether it's in your life or in the broadly AI
00:48:55.920 | community that you think might be out there that would surprise the world?
00:49:00.800 | - So that's a great question. And I don't really know the answer. In terms of game solving,
00:49:05.760 | Heads Up No Limit Texas Hold'em really was the one remaining widely agreed upon benchmark.
00:49:14.320 | So that was the big milestone. Now, are there other things? Yeah, certainly there are,
00:49:18.800 | but there's not one that the community has kind of focused on. So what could be other things?
00:49:24.240 | There are groups working on StarCraft. There are groups working on Dota 2. These are video games.
00:49:30.720 | Or you could have like Diplomacy or Hanabi, things like that. These are like recreational games,
00:49:38.480 | but none of them are really acknowledged as kind of the main next challenge problem,
00:49:44.720 | like Chess or Go or Heads Up No Limit Texas Hold'em was. So I don't really know in the game
00:49:51.440 | solving space what is or what will be the next benchmark. I kind of hope that there will be a
00:49:56.640 | next benchmark because really the different groups working on the same problem really drove these
00:50:02.240 | application-independent techniques forward very quickly over 10 years.
00:50:07.280 | Do you think there's an open problem that excites you that you start moving away from games into
00:50:12.400 | real-world games, like say the stock market trading?
00:50:17.120 | Yeah, so that's kind of how I am. So I am probably not going to work
00:50:22.000 | as hard on these recreational benchmarks. I'm doing two startups on game-solving technology,
00:50:30.080 | Strategic Machine and Strategy Robot, and we're really interested in pushing this
00:50:35.120 | stuff into practice. What do you think would be really a powerful result that would be surprising,
00:50:45.200 | that would be, if you can say, I mean, five years, 10 years from now, something that statistically
00:50:55.920 | you would say is not very likely, but if there's a breakthrough, would achieve?
00:51:01.360 | Yeah, so I think that overall we're in a very different situation in game theory
00:51:08.320 | than we are in, let's say, machine learning. So in machine learning, it's a fairly mature
00:51:13.600 | technology and it's very broadly applied and proven success in the real world.
00:51:19.520 | In game solving, there are almost no applications yet.
00:51:22.400 | We have just become superhuman, which in machine learning you could argue happened in the 90s,
00:51:29.360 | if not earlier, and at least on supervised learning, certain complex supervised learning
00:51:35.120 | applications. Now, I think the next challenge problem, I know you're not asking about it this
00:51:40.960 | way, you're asking about the technology breakthrough, but I think that the big,
00:51:44.080 | big breakthrough is to be able to show that, hey, maybe most of, let's say, military planning or
00:51:49.360 | most of business strategy will actually be done strategically using computational game theory.
00:51:54.320 | That's what I would like to see as a next five or 10 year goal.
00:51:58.160 | Maybe you can explain to me again, forgive me if this is an obvious question, but machine
00:52:03.760 | learning methods, neural networks suffer from not being transparent, not being explainable.
00:52:09.200 | Game theoretic methods, Nash equilibria, do they generally, when you see the different solutions,
00:52:15.680 | are they, when you talk about military operations, are they, once you see the strategies,
00:52:22.160 | do they make sense? Are they explainable or do they suffer from the same problems as neural
00:52:26.560 | networks do? So that's a good question. I would say a little bit yes and no. And what I mean by
00:52:33.200 | that is that these game theoretic strategies, let's say Nash equilibrium, it has provable
00:52:39.840 | properties. So it's unlike, let's say, deep learning where you kind of cross your fingers,
00:52:44.800 | hopefully it'll work. And then after the fact, when you have the weights, you're still crossing
00:52:48.800 | your fingers and I hope it will work. Here, you know that the solution quality is there.
00:52:55.760 | There's provable solution quality guarantees. Now that doesn't necessarily mean that the
00:53:01.120 | strategies are human understandable. That's a whole other problem. So I think that deep learning
00:53:06.800 | and computational game theory are in the same boat in that sense, that both are difficult to
00:53:11.760 | understand. But at least the game theoretic techniques, they have these guarantees of
00:53:18.480 | solution quality. So do you see business operations, strategic operations, or even military
00:53:24.000 | in the future at least having the strong candidate strategies proposed by automated systems?
00:53:31.920 | Do you see that? Yeah, I do. I do. But that's more of a belief than a substantiated fact.
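To make the earlier point about provable solution quality concrete, here is the standard definition of the kind of guarantee these equilibrium-finding algorithms target. It is a textbook statement rather than anything specific to Libratus; the symbols u_i, sigma, and epsilon are generic.

```latex
% A strategy profile \sigma = (\sigma_1, \ldots, \sigma_n) is an \epsilon-Nash equilibrium
% if no player i can gain more than \epsilon by deviating unilaterally:
\[
  \max_{\sigma_i'} \, u_i(\sigma_i', \sigma_{-i}) \;-\; u_i(\sigma_i, \sigma_{-i}) \;\le\; \epsilon
  \qquad \text{for every player } i .
\]
% Playing \sigma_i therefore comes within \epsilon of a best response to \sigma_{-i};
% in a two-player zero-sum game this translates into a bound on how much a
% worst-case opponent can exploit the computed strategy.
```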
00:53:39.520 | Depending on where you land on optimism or pessimism, to me that's an
00:53:43.760 | exciting future, especially if there are provable things in terms of optimality. So looking into
00:53:52.560 | the future, there are a few folks, especially when you look at the game of poker,
00:54:01.120 | which is probably one of the last benchmarks in terms of games being solved, who worry about
00:54:06.800 | the future and the existential threats of artificial intelligence, the negative impact
00:54:11.760 | in whatever form on society. Is that something that concerns you as much, or are you more optimistic
00:54:18.800 | about the positive impacts of AI? I am much more optimistic about the positive
00:54:23.760 | impacts. So just in my own work, what we've done so far, we run the nationwide kidney exchange.
00:54:29.840 | Hundreds of people are walking around alive today who wouldn't be otherwise. And it's increased
00:54:35.120 | employment. You have a lot of people now running kidney exchanges and at transplant centers,
00:54:42.160 | interacting with the kidney exchange. You have extra surgeons, nurses, anesthesiologists,
00:54:49.280 | hospitals, all of that. So employment is increasing from that and the world is becoming
00:54:54.240 | a better place. Another example is combinatorial sourcing auctions. We did 800 large-scale
00:55:02.320 | combinatorial sourcing auctions from 2001 to 2010 in a previous startup of mine called CombineNet.
00:55:09.360 | And we increased the supply chain efficiency on that $60 billion of spend by 12.6%. So that's over
00:55:18.800 | $6 billion of efficiency improvement in the world. And this is not like shifting value from somebody
00:55:24.400 | to somebody else, just efficiency improvement. Like in trucking, less empty driving, so there's
00:55:29.840 | less waste, less carbon footprint, and so on. This is a huge positive impact in the near term,
00:55:36.800 | but to stay in it for a little longer, because I think game theory has a role to play here.
00:55:43.040 | - Oh, let me actually come back on that as one thing. I think AI is also going to make the world
00:55:48.080 | much safer. So that's another aspect that often gets overlooked.
00:55:52.960 | - Well, let me ask this question. Maybe you can speak to the safer. So I talked to Max Tegmark
00:55:58.560 | and Stuart Russell, who are very concerned about existential threats of AI. And often the concern
00:56:03.760 | is about value misalignment. So AI systems basically operating towards goals that are not
00:56:14.240 | the same as those of human civilization, of human beings. So it seems like game theory has a role to play there
00:56:20.960 | to make sure the values are aligned with human beings. I don't know if that's how you think
00:56:29.360 | about it. If not, how do you think AI might help with this problem? How do you think AI might make
00:56:37.040 | the world safer? - Yeah, I think this value misalignment
00:56:43.120 | is a fairly theoretical worry, and I haven't really seen it; because I do a lot of real
00:56:51.040 | applications, I don't see it anywhere. The closest I've seen it was the following type of mental
00:56:56.880 | exercise really, where I had this argument in the late '80s when we were building these
00:57:01.840 | transportation optimization systems. And somebody had heard that it's a good idea to have high
00:57:06.400 | utilization of assets. So they told me, "Hey, why don't you put that as objective?"
00:57:10.880 | And we didn't even put it as an objective, because I just showed him that, "If you had
00:57:17.120 | that as your objective, the solution would be to load your trucks full and drive in circles.
00:57:21.920 | Nothing would ever get delivered. You'd have 100% utilization." So yeah, I know this
00:57:26.640 | phenomenon, I've known this for over 30 years, but I've never seen it actually be a problem
00:57:31.920 | in reality. And yes, if you have the wrong objective, the AI will optimize that to the
00:57:37.360 | hilt, and it's gonna hurt more than some human who's kinda trying to solve it in a half-baked
00:57:43.680 | way with some human insight too. But I just haven't seen that materialize in practice.
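A toy sketch of the utilization story above, with made-up plan names and numbers rather than anything from the actual system: scoring candidate plans by asset utilization alone picks the plan that delivers nothing, while scoring by deliveries picks the sensible one.

```python
# Made-up candidate plans for a tiny trucking example (purely illustrative).
candidate_plans = {
    "deliver_goods":    {"loaded_hours": 60,  "total_hours": 100, "tons_delivered": 500},
    "drive_in_circles": {"loaded_hours": 100, "total_hours": 100, "tons_delivered": 0},
}

def utilization(plan):
    # Fraction of truck hours spent loaded -- the misspecified objective.
    return plan["loaded_hours"] / plan["total_hours"]

def tons_delivered(plan):
    # What the business actually cares about.
    return plan["tons_delivered"]

best_by_utilization = max(candidate_plans, key=lambda name: utilization(candidate_plans[name]))
best_by_delivery = max(candidate_plans, key=lambda name: tons_delivered(candidate_plans[name]))
print(best_by_utilization)  # drive_in_circles: 100% utilization, nothing delivered
print(best_by_delivery)     # deliver_goods
```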
00:57:48.320 | - There's this gap that you've actually put your finger on very clearly just now,
00:57:54.960 | between theory and reality, that's very difficult to put into words, I think. It's what you can
00:58:00.880 | theoretically imagine, the worst possible case, or even just bad cases, and what usually
00:58:09.120 | happens in reality. So for example, to me, maybe it's something you can comment on: I grew
00:58:15.680 | up in the Soviet Union. You know, there's currently 10,000 nuclear weapons in the
00:58:21.200 | world. And for many decades, it's theoretically surprising to me that nuclear war has not
00:58:29.840 | broken out. Do you think about this aspect from a game theoretic perspective in general,
00:58:36.240 | why is that true? Why, in theory, you could see how things would go terribly wrong, and somehow
00:58:43.200 | yet they have not. How do you think about that? - So I do think about that a lot. I think the
00:58:47.760 | biggest two threats that we're facing as mankind, one is climate change, and the other is nuclear
00:58:52.720 | war. So those are the two main things that I worry about. And I've tried to do something about
00:58:59.120 | climate change, or at least thought about trying to, twice. For two of my
00:59:04.640 | startups, I've actually commissioned studies of what we could do on those things. And we didn't
00:59:09.920 | really find a sweet spot, but I'm still keeping an eye out for something where we
00:59:14.480 | could actually provide a market solution or optimization solution or some other technology
00:59:19.360 | solution to those problems. For example, pollution credit markets were what we were looking
00:59:26.000 | at then. And it was much more the lack of political will that made those markets not so successful,
00:59:32.240 | rather than bad market design. So I could go in and make a better market design,
00:59:37.120 | but that wouldn't really move the needle on the world very much if there's no political will.
00:59:41.200 | And in the US, you know, the market, at least the Chicago market was just shut down,
00:59:45.680 | and so on. So then it doesn't really help how great your market design was.
00:59:50.560 | So on the nuclear side, it's different: global warming is a more gradually encroaching
00:59:56.640 | problem. You know, nuclear weapons have been here. It's an obvious problem that's just been
01:00:05.200 | sitting there. So how do you think about what is the mechanism design there that just made everything
01:00:11.520 | seem stable? And are you still extremely worried?
01:00:14.800 | I am still extremely worried. So you probably know the simple game theory of MAD.
01:00:19.600 | So this is mutually assured destruction. And it doesn't require any computation. With small
01:00:26.880 | matrices, you can actually convince yourself that the game is such that nobody wants to initiate.
01:00:30.880 | Yeah, that's a very coarse grained analysis. And it really works in a situation where you have
01:00:38.080 | two superpowers or a small number of superpowers. Now things are very different. You have smaller
01:00:42.800 | nukes. So the threshold of initiating is smaller, and you have smaller countries and non-nation
01:00:50.640 | actors who may get nukes and so on. So I think it's riskier now than it was maybe ever before.
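For reference, one stylized way to write down the small matrix Sandholm alludes to. The payoff numbers are invented purely for illustration, and retaliation is assumed to be certain, which is the classical MAD assumption.

```latex
% Rows: country A's move; columns: country B's move. Entries are (payoff to A, payoff to B).
\[
\begin{array}{c|cc}
                  & \text{B refrains} & \text{B launches} \\ \hline
\text{A refrains} & (0,\,0)           & (-100,\,-100)     \\
\text{A launches} & (-100,\,-100)     & (-100,\,-100)
\end{array}
\]
% With assured retaliation, refraining weakly dominates launching for both sides,
% and (refrain, refrain) is a Nash equilibrium: neither player gains by initiating.
% The caveats above are precisely about when this coarse model stops matching reality.
```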
01:00:57.440 | And what idea or application of AI, you've talked about it a little bit, but what is the most exciting
01:01:06.560 | to you right now? I mean, you're here at NIPS, NeurIPS now, you have a few excellent pieces of
01:01:14.640 | work, but what are you thinking into the future? With several companies you're doing, what's the
01:01:18.160 | most exciting thing or one of the exciting things? The number one thing for me right now is coming up
01:01:24.480 | with these scalable techniques for game solving and applying them into the real world. I'm still
01:01:31.360 | very interested in market design as well, and we're doing that in Optimized Markets. But
01:01:35.600 | what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot, getting that
01:01:40.480 | technology out there and seeing, as you're in the trenches doing applications, what needs to
01:01:46.240 | be actually filled, what technology gaps still need to be filled. So it's so hard to just put
01:01:51.120 | your feet on the table and imagine what needs to be done. But when you're actually doing real
01:01:55.280 | applications, the applications tell you what needs to be done. And I really enjoy that interaction.
01:02:00.880 | Is it a challenging process to apply some of the state of the art techniques you're working on
01:02:06.880 | and having the various players in industry or the military, or people who could really benefit from
01:02:17.600 | it actually use it? What's that process like? You know, in autonomous vehicles, we work with
01:02:22.640 | automotive companies, and in many ways they're a little bit old-fashioned. It's difficult.
01:02:29.280 | They really want to use this technology. It clearly will have a significant benefit. But the
01:02:35.920 | systems aren't quite in place to easily have them integrated, in terms of data, in terms of
01:02:41.840 | compute, in terms of all these kinds of things. So is that one of the bigger challenges
01:02:46.480 | that you're facing? And how do you tackle that challenge?
01:02:50.160 | Yeah, I think that's always a challenge. That's kind of slowness and inertia, really.
01:02:55.600 | Let's do things the way we've always done it. You just have to find the internal champions at the
01:03:00.160 | customer who understand that, hey, things can't be the same way in the future. Otherwise, bad things
01:03:05.280 | are going to happen. And in autonomous vehicles, it's actually very interesting that
01:03:09.600 | the car makers are doing that, and they're very traditional. But at the same time, you have tech
01:03:13.760 | companies who have nothing to do with cars or transportation, like Google and Baidu,
01:03:18.240 | really pushing on autonomous cars. I find that fascinating.
01:03:23.120 | Clearly, you're super excited about actually these ideas having an impact in the world.
01:03:28.240 | In terms of the technology, in terms of ideas and research, are there directions that you're
01:03:35.120 | also excited about? Whether that's on some of the approaches you talked about for imperfect
01:03:41.280 | information games, whether it's applying deep learning to some of these problems? Is there
01:03:45.280 | something that you're excited in the research side of things?
01:03:48.800 | Yeah, yeah. Lots of different things in the game solving. Solving even bigger games,
01:03:55.120 | games where you have hidden actions, where the player actions are hidden as well. Poker is a game where
01:04:03.440 | really the chance actions are hidden, or some of them are hidden, but the player actions are public.
01:04:08.240 | Multiplayer games of various sorts, collusion, opponent exploitation,
01:04:18.000 | and even longer games. Games that basically go forever, but they're not repeated.
01:04:24.080 | Extensive-form games that go forever. What would that even look like? How do you represent that?
01:04:30.880 | How do you solve that? What's an example of a game like that?
01:04:33.200 | Is this some of the stochastic games that you mentioned?
01:04:35.840 | Let's say business strategy. Not just modeling a particular interaction, but thinking about
01:04:41.920 | the business from here to eternity. Or let's say military strategy. It's not like war is going to
01:04:50.080 | go away. How do you think about military strategy that's going to go forever? How do you even model
01:04:57.440 | that? How do you know whether a move was good that somebody made? And so on. That's one direction.
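One classical formalism for a game that goes on forever, and roughly the stochastic-games setting referenced above, is a discounted two-player zero-sum stochastic game, which can be solved by Shapley-style value iteration. The sketch below is a minimal illustration under that assumption: the states, payoffs, and transition probabilities are made up, and solve_matrix_game is a small helper defined here, not a library routine.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Value of the zero-sum matrix game A (row player maximizes), via linear programming."""
    m, n = A.shape
    # Variables: x_1..x_m (row player's mixed strategy) and v (game value).
    # Maximize v subject to (A^T x)_j >= v for every column j, sum(x) = 1, x >= 0.
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # encodes v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(R, P, gamma=0.95, iters=300):
    """R[s]: stage payoff matrix at state s; P[s][i][j]: distribution over next states."""
    S = len(R)
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.zeros(S)
        for s in range(S):
            m, n = R[s].shape
            # Stage game at s: immediate payoff plus discounted expected continuation value.
            Q = np.array([[R[s][i, j] + gamma * float(P[s][i][j] @ V)
                           for j in range(n)] for i in range(m)])
            V_new[s] = solve_matrix_game(Q)
        V = V_new
    return V

# Two made-up states with two actions per player; transitions are arbitrary.
R = [np.array([[1.0, -1.0], [-1.0, 1.0]]),
     np.array([[0.0, 2.0], [2.0, 0.0]])]
P = [[[np.array([0.5, 0.5]), np.array([1.0, 0.0])],
      [np.array([0.0, 1.0]), np.array([0.5, 0.5])]],
     [[np.array([1.0, 0.0]), np.array([0.5, 0.5])],
      [np.array([0.5, 0.5]), np.array([0.0, 1.0])]]]
print(shapley_value_iteration(R, P))  # long-run value of each state for the row player
```

Never-ending extensive-form games with imperfect information, which is what Sandholm is pointing at, are much harder than this discounted Markov setting; the sketch only shows the simplest classical baseline.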
01:05:06.800 | I'm also very interested in learning much more scalable techniques for integer programming.
01:05:13.360 | So we had a nice ICML paper this summer on that. The first automated algorithm configuration paper
01:05:20.160 | that has theoretical generalization guarantees. So if I see this many training examples,
01:05:25.360 | and I tuned my algorithm in this way, it's going to have good performance on the real distribution,
01:05:31.520 | which I've not seen. Which is kind of interesting, because algorithm configuration
01:05:36.560 | has been going on now for at least 17 years, seriously. And there has not been any
01:05:42.960 | generalization theory before. Well, this is really exciting. And it's a huge honor to talk to you.
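As a brief aside on the generalization guarantee just mentioned: the statement below gives the general flavor of such bounds, at the level of standard uniform-convergence results, and is not the paper's exact theorem. H, d, m, delta, cost, and rho are generic symbols introduced here, and constants are hidden in the O notation.

```latex
% Suppose the algorithm's cost on an instance lies in [0, H] for every parameter
% configuration \rho, and the family of cost functions (one per configuration) has
% pseudo-dimension d. Then with probability at least 1 - \delta over m i.i.d.
% training instances x_1, \ldots, x_m drawn from the instance distribution \mathcal{D}:
\[
  \left| \frac{1}{m} \sum_{j=1}^{m} \mathrm{cost}(\rho, x_j)
         \;-\; \mathbb{E}_{x \sim \mathcal{D}}\big[\mathrm{cost}(\rho, x)\big] \right|
  \;\le\; O\!\left( H \sqrt{\frac{d \ln m + \ln(1/\delta)}{m}} \right)
  \qquad \text{for all configurations } \rho .
\]
% So a configuration tuned on enough training instances also performs well on the
% unseen distribution, which is the kind of guarantee being described.
```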
01:05:49.760 | Thank you so much, Tuomas. Thank you for bringing Libratus to the world and all the great work
01:05:53.520 | you're doing. Well, thank you very much. It's been fun. Good questions.
01:05:56.880 | Thank you.