
Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12


Chapters

0:00 Introduction
18:58 Belief Distribution
20:06 Nash Equilibrium
26:52 Stochastic Games
27:38 Extensive Form Games
61:09 Future Directions: Scalable Game Solving in the Real World, Strategic Machine, Strategy Robot, and Optimized Markets

Whisper Transcript

00:00:00.000 | The following is a conversation with Tuomas Sandholm. He's a professor at CMU and co-creator
00:00:05.760 | of Libratus, which is the first AI system to beat top human players in the game of heads-up
00:00:10.880 | no limit Texas Hold'em. He has published over 450 papers on game theory and machine learning,
00:00:17.280 | including a best paper in 2017 at NIPS, now renamed to NeurIPS, which is where I caught up
00:00:24.640 | with him for this conversation. His research and companies have had wide-reaching impact in the
00:00:30.880 | real world, especially because he and his group not only propose new ideas but also build systems
00:00:38.240 | to prove that these ideas work in the real world. This conversation is part of the MIT course on
00:00:44.800 | Artificial General Intelligence and the Artificial Intelligence Podcast. If you enjoy it, subscribe
00:00:50.400 | on YouTube, iTunes, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D.
00:00:57.280 | And now, here's my conversation with Tuomas Sandholm. Can you describe at the high level
00:01:06.000 | the game of poker Texas Hold'em, heads-up Texas Hold'em, for people who might not be familiar
00:01:10.720 | with this card game? Yeah, happy to. So, heads-up no limit Texas Hold'em has really emerged in the
00:01:17.600 | AI community as a main benchmark for testing these application-independent algorithms for
00:01:24.160 | imperfect information game solving. And this is a game that's actually played by humans. You don't
00:01:31.200 | see that much on TV or casinos because, well, for various reasons, but you do see it in some
00:01:38.640 | expert-level casinos and you see it in the best poker movies of all time. It's actually an event
00:01:44.000 | in the World Series of Poker, but mostly it's played online and typically for pretty big sums
00:01:50.400 | of money. And this is a game that usually only experts play. So, if you go to your home game
00:01:57.200 | on a Friday night, it probably is not going to be heads-up no limit Texas Hold'em. It might be
00:02:01.680 | no limit Texas Hold'em in some cases, but typically for a big group and it's not as competitive.
00:02:08.560 | While heads-up means it's two players, so it's really like me against you. Am I better or are
00:02:14.080 | you better? Much like chess or go in that sense, but an imperfect information game,
00:02:19.360 | which makes it much harder because I have to deal with issues of you knowing things that I don't
00:02:24.960 | know and I know things that you don't know instead of pieces being nicely laid on the board for both
00:02:29.920 | of us to see. So, in Texas Hold'em, there's two cards that you only see that belong to you. And
00:02:37.680 | there is, they gradually lay out some cards that add up overall to five cards that everybody can
00:02:43.600 | see. So, the imperfect nature of the information is the two cards that you're holding.
00:02:48.240 | Up front, yeah. So, as you said, you first get two cards in private each and then there's a
00:02:54.160 | betting round. Then you get three cards in public on the table, then there's a betting round. Then
00:02:59.440 | you get the fourth card in public on the table, there's a betting round. Then you get the fifth
00:03:03.760 | card on the table, there's a betting round. So, there's a total of four betting rounds and four
00:03:08.480 | tranches of information revelation, if you will. Only the first tranche is private and then it's
00:03:14.720 | public from there. And this is probably by far the most popular game in AI and just the general
00:03:25.680 | public in terms of imperfect information. So, it's probably the most popular spectator game to watch.
00:03:32.960 | Right? So, which is why it's a super exciting game to tackle. So, it's on the order of chess,
00:03:39.760 | I would say, in terms of popularity, in terms of the AI setting it as the bar of what is
00:03:45.360 | intelligence. So, in 2017, Libratus, how do you pronounce it? Libratus. Libratus. Libratus beats...
00:03:52.880 | Little Latin there.
00:03:53.920 | A little bit of Latin. Libratus beat four expert human players. Can you describe that
00:04:02.400 | event? What you learned from it? What was it like? What was the process in general for people who
00:04:07.200 | have not read the papers and studied? Yeah. So, the event was that we invited four of the top 10
00:04:14.080 | players. These are specialist players in Heads Up No Limit Texas Hold'em, which is very important
00:04:18.960 | because this game is actually quite different than the multiplayer version. We brought them in
00:04:24.800 | to Pittsburgh to play at the Rivers Casino for 20 days. We wanted to get 120,000 hands in
00:04:31.680 | because we wanted to get statistical significance. So, it's a lot of hands for humans to play,
00:04:38.400 | even for these top pros who play fairly quickly normally. So, we couldn't just have one of them
00:04:44.800 | play so many hands. 20 days they were playing, basically morning to evening, and I raised $200,000
00:04:53.120 | as a little incentive for them to play. And the setting was so that they didn't all get 50,000.
00:05:00.880 | We actually paid them out based on how they did against the AI each. So, they had an incentive to
00:05:06.560 | play as hard as they could, whether they're way ahead or way behind or right at the mark of beating
00:05:12.960 | the AI. And you don't make any money, unfortunately. Right. No, we can't make any money. So,
00:05:18.240 | originally, a couple of years earlier, I actually explored whether we could actually play for money
00:05:24.000 | because that would be, of course, interesting as well to play against the top people for money.
00:05:29.360 | But the Pennsylvania Gaming Board said no. So, we couldn't. So, this is much like an exhibit,
00:05:35.200 | like for a musician or a boxer or something like that. Nevertheless, you were keeping track of the
00:05:41.120 | money and Libratus won close to $2 million, I think. So, if it was for real money, if you were
00:05:50.800 | able to earn money, that was a quite impressive and inspiring achievement. Just a few details.
00:05:57.040 | What were the players looking at? Were they behind a computer? What was the interface like?
00:06:02.000 | Yes. They were playing much like they normally do. These top players, when they play this game,
00:06:07.040 | they play mostly online. So, they're used to playing through a UI. And they did the same
00:06:12.400 | thing here. So, there was this layout. You could imagine there's a table on a screen.
00:06:17.760 | There's the human sitting there, and then there's the AI sitting there. And the screen shows
00:06:23.600 | everything that's happening, the cards coming out, and shows the bets being made. And we also
00:06:28.080 | had the betting history for the human. So, if the human forgot what had happened in the hand so far,
00:06:33.200 | they could actually reference back and so forth. Is there a reason they were given access to the
00:06:39.520 | betting history? Well, it didn't really matter. They wouldn't have forgotten anyway. These are
00:06:47.680 | top quality people. But we just wanted to put it out there so it's not a question of a human forgetting
00:06:53.360 | and the AI somehow trying to get advantage of better memory. So, what was that like? I mean,
00:06:57.760 | that was an incredible accomplishment. So, what did it feel like before the event? Did you
00:07:03.200 | have doubt, hope? Where was your confidence at? Yeah, that's great. Great question. So,
00:07:10.160 | 18 months earlier, I had organized a similar brains versus AI competition with a previous
00:07:16.400 | AI called Claudico, and we couldn't beat the humans. So, this time around, it was only 18
00:07:22.720 | months later, and I knew that this new AI, Libratus, was way stronger. But it's hard to say
00:07:29.200 | how you'll do against the top humans before you try. So, I thought we had about a 50/50 shot.
00:07:34.400 | And the international betting sites put us as a four to one or five to one underdog.
00:07:41.600 | So, it's kind of interesting that people really believe in people over AI. People don't just
00:07:48.960 | overestimate themselves; they have overconfidence in other people as well
00:07:53.120 | compared to the performance of AI. And yeah, so we were a four to one or five to one underdog.
00:07:59.200 | And even after three days of beating the humans in a row, we were still 50/50 on the international
00:08:05.280 | betting sites. Do you think there's something special and magical about poker in the way
00:08:10.480 | people think about it? In the sense you have, I mean, even in chess, there's no Hollywood movies.
00:08:17.200 | Poker is the star of many movies. And there's this feeling that certain human facial expressions and
00:08:27.920 | body language, eye movement, all these tells are critical to poker. Like you can look into
00:08:34.080 | somebody's soul and understand their betting strategy and so on. So, that's probably why
00:08:40.960 | possibly do you think that is why people have a confidence that humans will outperform? Because
00:08:46.000 | AI systems cannot in this construct perceive these kinds of tells. They're only looking at betting
00:08:52.480 | patterns and nothing else, the betting patterns and statistics. So, what's more important to you
00:09:03.120 | if you step back and look at human versus human play, what's the role of these tells, of these
00:09:10.000 | ideas that we romanticize? Yeah. So, I'll split it into two parts. So, one is why do humans
00:09:17.760 | trust humans more than AI and all have overconfidence in humans? I think that's not
00:09:24.720 | really related to the tell question. It's just that they've seen these top players, how good
00:09:30.000 | they are and they're really fantastic. So, it's just hard to believe that an AI could beat them.
00:09:36.720 | So, I think that's where that comes from. And that's actually maybe a more general lesson about
00:09:40.960 | AI that until you've seen it overperform a human, it's hard to believe that it could.
00:09:45.360 | But then the tells, a lot of these top players, they're so good at hiding tells that among the
00:09:55.360 | top players, it's actually not really worth it for them to invest a lot of effort trying to find
00:10:01.840 | tells in each other because they're so good at hiding them. So, yes, at the kind of Friday evening
00:10:09.040 | game, tells are going to be a huge thing. You can read other people and if you're a good reader,
00:10:14.000 | you'll read them like an open book. But at the top levels of poker now, the tells become a much
00:10:20.080 | smaller and smaller aspect of the game as you go to the top levels. The amount of strategies,
00:10:25.840 | the amounts of possible actions is very large, 10 to the power of 100 plus. So, there has to be some,
00:10:36.720 | I've read a few of the papers related, it has to form some abstractions of various hands and actions.
00:10:44.080 | So, what kind of abstractions are effective for the game of poker?
00:10:49.120 | Yeah. So, you're exactly right. So, when you go from a game tree, that's 10 to the 161,
00:10:55.280 | especially in an imperfect information game, it's way too large to solve directly,
00:11:00.080 | even with our fastest equilibrium finding algorithms. So, you want to abstract it first.
00:11:07.120 | And abstraction in games is much trickier than abstraction in MDPs or other single agent settings,
00:11:14.320 | because you have these abstraction pathologies, where even if I have a finer grained abstraction,
00:11:19.440 | the strategy that I can get from that for the real game might actually be worse than the strategy I
00:11:25.280 | can get from the coarse grained abstraction. So, you have to be very careful.
00:11:28.720 | Now, the kinds of abstractions, just to zoom out, we're talking about, there's the
00:11:32.640 | hands abstractions, and then there's betting strategies.
00:11:37.120 | Yeah, betting actions. Yeah.
00:11:38.480 | Betting actions.
00:11:39.280 | So, to talk about general games, there's information abstraction,
00:11:44.560 | which is the abstraction of what chance does. And this would be the cards in the case of poker.
00:11:50.000 | And then there's action abstraction, which is abstracting the actions of the actual players,
00:11:56.400 | which would be bets in the case of poker.
00:11:58.960 | Yourself and the other players?
00:12:01.280 | Yes, yourself and other players. And for information abstraction, we were completely
00:12:10.160 | automated. So, these are algorithms, but they do what we call potential aware abstraction,
00:12:16.640 | where we don't just look at the value of the hand, but also how it might materialize into
00:12:20.880 | good or bad hands over time. And it's a certain kind of bottom up process with integer programming
00:12:26.320 | there and clustering and various aspects of how you build this abstraction.
00:12:31.360 | And then in the action abstraction, there, it's largely based on how
00:12:36.800 | humans and other AIs have played this game in the past. But in the beginning,
00:12:43.200 | we actually used an automated action abstraction technology, which is provably convergent,
00:12:49.920 | that it finds the optimal combination of bet sizes, but it's not very scalable.
00:12:55.360 | So, we couldn't use it for the whole game, but we used it for the first couple of betting actions.
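To make the information-abstraction idea concrete, here is a minimal sketch under illustrative assumptions, not Libratus's actual pipeline: each private hand is summarized by a histogram of how its equity could evolve on future cards (the "potential-aware" part), and hands with similar histograms are clustered into buckets. The bucket count, histogram size, and plain k-means routine are all assumptions; the real system uses a more elaborate bottom-up clustering with integer programming.

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Plain k-means over hand feature vectors (illustrative only, not the
    integer-programming / bottom-up clustering described above)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each hand to the nearest bucket center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a bucket goes empty.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Hypothetical input: for every private hand, a histogram over future equity
# (probability of ending up with 0-10%, 10-20%, ... equity by the river).
# This "how might the hand materialize" signal is the potential-aware part.
num_hands, num_bins, num_buckets = 1326, 10, 50   # assumed sizes
equity_histograms = np.random.dirichlet(np.ones(num_bins), size=num_hands)

buckets = kmeans(equity_histograms, num_buckets)
print("hand 0 is abstracted into bucket", buckets[0])
```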
00:12:59.680 | So, what's more important, the strength of the hand, so the information abstraction, or the
00:13:06.080 | how you play them, the actions? Does it, you know, the romanticized notion again,
00:13:13.120 | is that it doesn't matter what hands you have, that the actions, the betting,
00:13:17.120 | may be the way you win, no matter what hands you have.
00:13:20.240 | Yeah. So, that's why you have to play a lot of hands, so that the role of luck gets smaller. So,
00:13:27.280 | you could otherwise get lucky and get some good hands, and then you're going to win the match.
00:13:31.440 | Even with thousands of hands, you can get lucky, because there's so much variance in
00:13:37.120 | no limit Texas Hold'em, because if we both go all in, it's a huge stack of variance. So,
00:13:43.040 | there are these massive swings in no limit Texas Hold'em. So, that's why you have to play not just
00:13:49.600 | thousands, but over 100,000 hands to get statistical significance.
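As a rough illustration of why so many hands are needed, here is a back-of-the-envelope sample-size calculation. The per-hand standard deviation and the win-rate margin below are invented illustrative numbers, not figures from the match.

```python
import math

# Assumed, illustrative numbers (not from the actual Brains vs. AI match):
sigma_bb_per_hand = 10.0   # per-hand standard deviation, in big blinds
margin_bb_per_hand = 0.10  # win-rate edge we want to resolve (10 bb per 100 hands)
z = 1.96                   # two-sided 95% confidence

# The standard error of the mean shrinks as 1/sqrt(n);
# solve z * sigma / sqrt(n) <= margin for n.
n = math.ceil((z * sigma_bb_per_hand / margin_bb_per_hand) ** 2)
print(f"hands needed: {n:,}")   # about 38,000 with these made-up numbers;
                                # with higher real-world variance and smaller
                                # edges, the requirement grows well past 100,000,
                                # in line with the 120,000-hand target above.
```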
00:13:55.200 | So, let me ask another way this question. If you didn't even look at your hands,
00:14:00.400 | but they didn't know that, your opponents didn't know that, how well would you be able to do?
00:14:05.840 | That's a good question. There's actually, I heard the story that there's this Norwegian
00:14:10.720 | female poker player called Annette Obrestad, who's actually won a tournament by doing exactly that.
00:14:16.640 | But that would be extremely rare. So, you cannot really play well that way.
00:14:23.360 | Okay. So, the hands do have some role to play.
00:14:27.760 | So, Libratus does not use, as far as I understand, use learning methods, deep learning.
00:14:35.040 | Is there room for learning in, you know, there's no reason why Libratus doesn't, you know, combine
00:14:44.000 | with an AlphaGo type approach for estimating the quality of states with a function estimator. What are your
00:14:50.080 | thoughts on this? Maybe as compared to another algorithm, which I'm not that familiar with,
00:14:55.760 | DeepStack, the engine that does use deep learning, that it's unclear how well it does, but
00:15:01.920 | nevertheless uses deep learning. So, what are your thoughts about learning methods to aid
00:15:06.000 | in the way that Libratus plays the game of poker?
00:15:09.280 | Yeah. So, as you said, Libratus did not use learning methods and played very well without
00:15:15.440 | them. Since then, we have actually, actually here, we have a couple of papers on things that do
00:15:20.800 | use learning techniques.
00:15:22.160 | Excellent.
00:15:22.660 | So, and deep learning in particular, and sort of the way you're talking about, where it's learning
00:15:31.360 | an evaluation function. But in imperfect information games, unlike, let's say, in Go or
00:15:40.560 | now also in chess and shogi, it's not sufficient to learn an evaluation for a state, because the
00:15:48.160 | value of an information set depends not only on the exact state, but it also depends on
00:15:56.640 | both players' beliefs. Like, if I have a bad hand, I'm much better off if the opponent thinks I have
00:16:03.760 | a good hand, and vice versa. If I have a good hand, I'm much better off if the opponent believes I
00:16:09.360 | have a bad hand. So, the value of a state is not just a function of the cards. It depends on,
00:16:16.320 | if you will, the path of play, but only to the extent that is captured in the belief distributions.
00:16:23.200 | So, that's why it's not as simple as it is in perfect information games. And I don't want to
00:16:29.840 | say it's simple there either. It's, of course, very complicated computationally there too,
00:16:34.080 | but at least conceptually, it's very straightforward. There's a state,
00:16:37.360 | there's an evaluation function, you can try to learn it. Here, you have to do something more.
00:16:42.240 | And what we do in one of these papers is we allow the opponent
00:16:50.640 | to actually take different strategies at the leaf of the search tree, if you will. And that
00:16:57.200 | is a different way of doing it, and it doesn't assume, therefore, a particular way that the
00:17:02.640 | opponent plays, but it allows the opponent to choose from a set of different continuation
00:17:08.800 | strategies. And that forces us to not be too optimistic in a look-ahead search. And that's
00:17:16.160 | one way you can do sound look-ahead search in imperfect information games, which is very
00:17:21.840 | difficult. And you were asking about DeepStack. What they did, it was very different than what
00:17:28.560 | we do, either in Libratus or in this new work. They were randomly generating various situations
00:17:35.360 | in the game. Then they were doing the look-ahead from there to the end of the game, as if that was
00:17:40.560 | the start of a different game. And then they were using deep learning to learn those values of those
00:17:47.440 | states, but the states were not just the physical states, they include belief distributions.
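A minimal sketch of the point being made: in an imperfect-information game, a learned leaf evaluation has to be conditioned on both players' belief distributions over private hands, not just on the public state. The feature sizes and the random linear map below are placeholders, not DeepStack's or Libratus's actual architecture.

```python
import numpy as np

NUM_HOLE_COMBOS = 1326          # possible private two-card hands in Texas Hold'em
PUBLIC_FEATURES = 64            # assumed encoding of board cards and pot size

rng = np.random.default_rng(0)
W = rng.normal(size=(PUBLIC_FEATURES + 2 * NUM_HOLE_COMBOS, NUM_HOLE_COMBOS))

def leaf_value(public_state, belief_p1, belief_p2):
    """Value of every hand player 1 could hold at this leaf.

    The key point: the input includes BOTH players' belief distributions over
    private hands, because the same public state is worth different amounts
    depending on what each player believes. (A random linear map stands in
    for a trained network here.)"""
    x = np.concatenate([public_state, belief_p1, belief_p2])
    return x @ W

public_state = rng.normal(size=PUBLIC_FEATURES)
belief_p1 = np.full(NUM_HOLE_COMBOS, 1.0 / NUM_HOLE_COMBOS)  # uniform beliefs
belief_p2 = np.full(NUM_HOLE_COMBOS, 1.0 / NUM_HOLE_COMBOS)
values = leaf_value(public_state, belief_p1, belief_p2)
print(values.shape)   # one value per possible private hand
```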
00:17:52.400 | When you talk about look-ahead for DeepStack or with Libratus, does it mean considering every
00:18:00.320 | possibility that the game can evolve? Are we talking about this sort of extreme exponential
00:18:05.840 | growth of a tree? Yes. So we're talking about exactly that.
00:18:09.360 | Much like you do in alpha beta search or Monte Carlo tree search, but with different techniques.
00:18:17.280 | So there's a different search algorithm, and then we have to deal with the leaves differently.
00:18:21.760 | So if you think about what Libratus did, we didn't have to worry about this, because we only did it
00:18:26.640 | at the end of the game. So we would always terminate into a real situation, and we would
00:18:32.560 | know what the payout is. It didn't do these depth-limited look-aheads. But now in this new
00:18:37.680 | paper, which is called Depth Limited, I think it's called Depth Limited Search for Imperfect
00:18:42.640 | Information Games, we can actually do sound depth-limited look-aheads. So we can actually
00:18:47.600 | start to do the look-ahead from the beginning of the game on, because that's too complicated to do
00:18:53.440 | for this whole long game. So in Libratus, we were just doing it for the end.
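A toy sketch of the depth-limited idea described here: instead of assuming one fixed opponent continuation at the depth limit, the leaf is evaluated against a set of opponent continuation strategies so the look-ahead is not over-optimistic. The payoff numbers and the two-action setting are invented; the actual method solves an augmented game in which the opponent selects among continuations, and the simple worst-case min below only illustrates why having the set matters.

```python
import numpy as np

# At a depth-limit leaf we don't know how the opponent would continue.
# Suppose we have three candidate continuation strategies for the opponent,
# each giving our two candidate actions a different expected payoff.
# Rows: our actions at the leaf; columns: opponent continuations.
payoffs = np.array([
    [ 1.5, -0.5,  0.2],   # our action A vs. opponent continuations 1..3
    [ 0.8,  0.6, -1.0],   # our action B
])

# Over-optimistic evaluation: assume the opponent continuation we like best.
optimistic = payoffs.max(axis=1)

# Safer evaluation: let the opponent pick its continuation after seeing our
# choice, i.e. score each action by its worst case over the set.
robust = payoffs.min(axis=1)

print("optimistic values:", optimistic)   # [1.5, 0.8]
print("robust values:    ", robust)       # [-0.5, -1.0]; the preferred action
                                          # can flip once the full set is considered.
```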
00:18:57.600 | So, and then the other side, this belief distribution. So is it explicitly modeled
00:19:02.960 | what kind of beliefs that the opponent might have?
00:19:06.400 | Yeah. It is explicitly modeled, but it's not assumed. The beliefs are actually output,
00:19:13.920 | not input. Of course, the starting beliefs are input, but they just fall from the rules of the
00:19:20.240 | game, because we know that the dealer deals uniformly from the deck. So I know that every
00:19:26.000 | pair of cards that you might have is equally likely. I know that for a fact. That just follows
00:19:32.080 | from the rules of the game. Of course, except the two cards that I have. I know you don't have those.
00:19:36.160 | You have to take that into account. That's called card removal, and that's very important.
00:19:40.880 | Is the dealing always coming from a single deck in a heads up? So you can assume-
00:19:45.840 | Single deck.
00:19:46.400 | That's called card removal.
00:19:47.680 | You know that if I have the ace of spades, I know you don't have an ace of spades.
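A small sketch of the uniform prior with card removal just described: given my own two hole cards, the opponent's hand is uniform over the remaining two-card combinations. The deck encoding is only for illustration.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]          # 52 cards

def opponent_prior(my_hole_cards):
    """Uniform belief over the opponent's hole cards, with card removal:
    any combination containing a card I hold gets probability zero."""
    remaining = [c for c in DECK if c not in my_hole_cards]
    combos = [frozenset(c) for c in combinations(remaining, 2)]   # C(50, 2) = 1225
    p = 1.0 / len(combos)
    return {combo: p for combo in combos}

prior = opponent_prior(("As", "Kd"))
print(len(prior))                                     # 1225
print(frozenset(("Ah", "Ad")) in prior)               # True: other aces are still possible
print(any("As" in combo for combo in prior))          # False: card removal in action
```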
00:19:53.440 | Great. So in the beginning, your belief is basically the fact that it's a fair
00:19:57.840 | dealing of hands, but how do you start to adjust that belief?
00:20:02.640 | Well, that's where this beauty of game theory comes. So Nash equilibrium,
00:20:08.160 | which John Nash introduced in 1950, introduces what rational play is when you have more than
00:20:14.320 | one player. And these are pairs of strategies where strategies are contingency plans,
00:20:20.240 | one for each player, so that neither player wants to deviate to a different strategy,
00:20:26.800 | given that the other doesn't deviate. But as a side effect, you get the beliefs from Bayes' rule.
00:20:33.680 | So in these imperfect information games, Nash equilibrium isn't just about deriving strategies.
00:20:38.240 | Nash equilibrium doesn't just define strategies. It also defines beliefs for both of us,
00:20:44.800 | and defines beliefs for each state, or what they call information sets.
00:20:53.120 | At each information set in the game, there's a set of different states that we might be in,
00:20:58.880 | but I don't know which one we're in. Nash equilibrium tells me exactly what is the
00:21:03.680 | probability distribution over those real world states in my mind.
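For reference, the standard definition being described, written out: a strategy profile is a Nash equilibrium when no player can gain by unilaterally deviating.

```latex
\[
(\sigma_1^*, \dots, \sigma_n^*) \text{ is a Nash equilibrium} \iff
u_i(\sigma_i^*, \sigma_{-i}^*) \;\ge\; u_i(\sigma_i, \sigma_{-i}^*)
\quad \text{for every player } i \text{ and every strategy } \sigma_i.
\]
```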
00:21:08.240 | How does Nash equilibrium give you that distribution? So why-
00:21:12.160 | I'll do a simple example. So you know the game rock, paper, scissors. So we can draw it as
00:21:18.400 | player one moves first, and then player two moves. But of course, it's important that player two
00:21:24.480 | doesn't know what player one moved. Otherwise, player two would win every time. So we can draw
00:21:29.200 | that as an information set where player one makes one of three moves first, and then there's an
00:21:34.000 | information set for player two. So player two doesn't know which of those nodes the world is in.
00:21:41.040 | But once we know the strategy for player one, Nash equilibrium will say that you play one-third rock,
00:21:47.200 | one-third paper, one-third scissors. From that, I can derive my beliefs on the information set
00:21:52.480 | that they're one-third, one-third, one-third. So Bayes gives you that.
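A tiny sketch of the Bayes' rule step just described, using a deliberately non-uniform, made-up mixed strategy so the update is visible.

```python
import numpy as np

# Player 1's (hypothetical) mixed strategy over rock / paper / scissors.
strategy_p1 = {"rock": 0.5, "paper": 0.3, "scissors": 0.2}

# Player 2's information set contains one node per player-1 action.
# With no other observations, Bayes' rule says the belief over those nodes is
# just the normalized probability that player 1's strategy reaches each node.
reach = np.array(list(strategy_p1.values()))
belief = reach / reach.sum()
print(dict(zip(strategy_p1, belief)))   # {'rock': 0.5, 'paper': 0.3, 'scissors': 0.2}

# In equilibrium the strategy is (1/3, 1/3, 1/3), so the derived belief is
# uniform, exactly the "one-third, one-third, one-third" in the conversation.
```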
00:21:56.160 | Bayes gives you. But is that specific to a particular player, or is it something you
00:22:02.400 | quickly update with the specific player? No, the game theory isn't really player specific.
00:22:08.720 | So that's also why we don't need any data. We don't need any history of how these particular humans
00:22:13.920 | played in the past, or how any AI or human had played before. It's all about rationality. So
00:22:21.200 | the AI just thinks about what would a rational opponent do, and what would I do if I am rational.
00:22:27.920 | And that's the idea of game theory. So it's really a data-free, opponent-free approach.
00:22:35.520 | So it comes from the design of the game as opposed to the design of the player.
00:22:40.080 | Exactly. There's no opponent modeling per se. I mean, we've done some work on combining opponent
00:22:45.120 | modeling with game theory so you can exploit weak players even more, but that's another strand. And
00:22:50.320 | in Libratus, we didn't turn that on because I decided that these players are too good. And when
00:22:55.360 | you start to exploit an opponent, you typically open yourself up to exploitation. And these guys
00:23:02.320 | have so few holes to exploit, and they're world's leading experts in counter-exploitation. So I
00:23:06.960 | decided that we're not going to turn that stuff on. Actually, I saw a few of your papers exploiting
00:23:11.440 | opponents. It sounded very interesting to explore. Do you think there's room for exploitation
00:23:17.840 | generally outside of Libratus? Is there a subject or people differences that could be exploited,
00:23:25.200 | maybe not just in poker, but in general interactions and negotiations, all these
00:23:30.800 | other domains that you're considering? Yeah, definitely. We've done some work on that. And
00:23:36.080 | I really like the work that hybridizes the two. So you figure out what would a rational opponent
00:23:42.960 | do. And by the way, that's safe in these zero-sum games, two-player zero-sum games,
00:23:47.360 | because if the opponent does something irrational, yes, it might throw off my beliefs. But the
00:23:54.000 | amount that the player can gain by throwing off my belief is always less than they lose by playing
00:24:00.240 | poorly. So it's safe. But still, if somebody's weak as a player, you might want to play differently
00:24:08.160 | to exploit them more. So you can think about it this way. A game theoretic strategy is unbeatable,
00:24:14.560 | but it doesn't maximally beat the other opponent. So the winnings per hand might be better with a
00:24:22.960 | different strategy. And the hybrid is that you start from a game theoretic approach. And then
00:24:27.600 | as you gain data about the opponent in certain parts of the game tree, then in those parts of
00:24:33.360 | the game tree, you start to tweak your strategy more and more towards exploitation while still
00:24:39.840 | staying fairly close to the game theoretic strategy so as to not open yourself up to
00:24:44.480 | exploitation too much. - How do you do that? Do you try to
00:24:50.080 | vary up strategies, make it unpredictable? It's like, what is it, tit-for-tat strategies in
00:24:58.320 | Prisoner's Dilemma or? - Well, that's a repeated game.
00:25:02.800 | - Repeated games, that's right. - Simple Prisoner's Dilemma,
00:25:05.200 | repeated games. But even there, there's no proof that says that that's the best thing.
00:25:10.000 | But experimentally, it actually does well. - So what kind of games are there, first of all?
00:25:15.280 | I don't know if this is something that you could just summarize. There's perfect information games
00:25:20.240 | where all the information's on the table. There is imperfect information games. There's repeated
00:25:26.320 | games that you play over and over. There's zero-sum games. There's non-zero-sum games.
00:25:34.160 | And then there's a really important distinction you're making, two-player versus more players.
00:25:40.080 | So what other games are there and what's the difference, for example, with this
00:25:46.400 | two-player game versus more players? What are the key differences in your view?
00:25:51.520 | - So let me start from the basics. So a repeated game is a game where the same exact game is played
00:26:01.360 | over and over. In these extensive form games, where it's, think about tree form,
00:26:08.000 | maybe with these information sets to represent incomplete information, you can have kind of
00:26:14.000 | repetitive interactions. Even repeated games are a special case of that, by the way. But
00:26:20.240 | the game doesn't have to be exactly the same. It's like in sourcing auctions. Yes, we're going to see
00:26:25.040 | the same supply base year to year, but what I'm buying is a little different every time,
00:26:29.600 | and the supply base is a little different every time, and so on. So it's not really repeated.
00:26:34.160 | So to find a purely repeated game is actually very rare in the world. So they're really a very
00:26:39.680 | coarse model of what's going on. Then if you move up from just repeated, simple repeated matrix games,
00:26:49.680 | not all the way to extensive form games, but in between, there's stochastic games, where,
00:26:55.040 | you know, there's these, you think about it like these little matrix games, and when you take an
00:27:01.840 | action and your opponent takes an action, they determine not which next state I'm going to,
00:27:07.600 | next game I'm going to, but the distribution over next games where I might be going to.
00:27:12.480 | So that's the stochastic game. But it's like matrix games, repeated games, stochastic games,
00:27:18.880 | extensive form games. That is from less to more general. And poker is an example of the last one.
00:27:26.160 | So it's really in the most general setting, extensive form games. And that's kind of what
00:27:31.360 | the AI community has been working on and been benchmarked on with this heads up no limit Texas
00:27:37.520 | Hold'em. Can you describe extensive form games? What's the model here?
00:27:40.960 | Yeah, so if you're familiar with the tree form, so it's really the tree form, like in chess,
00:27:46.240 | there's a search tree. Versus a matrix.
00:27:48.560 | Versus a matrix, yeah. And the matrix is called the matrix form or bi-matrix form or normal form
00:27:54.560 | game. And here you have the tree form, so you can actually do certain types of reasoning there
00:27:59.280 | that you lose the information when you go to normal form. There's a certain form of equivalence,
00:28:06.800 | like if you go from tree form and you say every possible contingency plan is a strategy,
00:28:12.640 | then I can actually go back to the normal form, but I lose some information from the
00:28:16.880 | lack of sequentiality. Then the multiplayer versus two player distinction is an important
00:28:22.320 | one. So two player games in zero sum are conceptually easier and computationally easier.
00:28:32.640 | They're still huge, like this one, but they're conceptually easier and computationally easier
00:28:39.520 | in that conceptually you don't have to worry about which equilibrium is the other guy going to play
00:28:45.200 | when there are multiple, because any equilibrium strategy is a best response to any other
00:28:50.400 | equilibrium strategy. So I can play a different equilibrium from you and we'll still get the
00:28:55.600 | right values of the game. That falls apart even with two players when you have general sum games.
00:29:00.400 | Even without cooperation, just in general.
00:29:02.960 | Even without cooperation. So there's a big gap from two player zero sum to two player general
00:29:08.560 | sum or even to three player zero sum. That's a big gap, at least in theory.
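The property being relied on for two-player zero-sum games, stated explicitly: equilibrium strategies are interchangeable and all yield the same game value, which is exactly what fails in general-sum or three-or-more-player games.

```latex
\[
\text{If } (\sigma_1, \sigma_2) \text{ and } (\sigma_1', \sigma_2') \text{ are equilibria of a two-player zero-sum game, then}
\]
\[
(\sigma_1, \sigma_2') \text{ and } (\sigma_1', \sigma_2) \text{ are equilibria as well, and all four profiles give player 1 the same value } v.
\]
```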
00:29:14.160 | Can you maybe non-mathematically provide the intuition why it all falls apart with three
00:29:20.320 | or more players? It seems like you should still be able to have a Nash equilibrium that's instructive,
00:29:29.040 | that holds.
00:29:31.120 | Okay. So it is true that all finite games have a Nash equilibrium. So this is what John Nash
00:29:39.200 | actually proved. So they do have a Nash equilibrium. That's not the problem. The
00:29:43.840 | problem is that there can be many. And then there's a question of which equilibrium to select.
00:29:49.440 | And if you select your strategy from a different equilibrium and I select mine,
00:29:57.760 | then what does that mean? And in these non-zero sum games, we may lose some
00:30:03.040 | joint benefit by being just simply stupid. We could actually both be better off if we did
00:30:08.640 | something else. And in three player, you get other problems also like collusion. Like maybe
00:30:14.080 | you and I can gang up on a third player and we can do radically better by colluding. So there
00:30:20.240 | are lots of issues that come up there. So Noam Brown, the student you worked with on this,
00:30:25.520 | has mentioned, I looked through the AMA on Reddit, he mentioned that the ability of poker players to
00:30:31.280 | collaborate would make the game harder. He was asked the question of how would you make the game of poker,
00:30:36.720 | or both of you were asked the question, how would you make the game of poker
00:30:40.480 | beyond being solvable by current AI methods? And he said that there's not many ways of making
00:30:50.960 | poker more difficult, but collaboration or cooperation between players would make it
00:30:58.400 | extremely difficult. So can you provide the intuition behind why that is, if you agree with
00:31:04.400 | that idea? Yeah, so I've done a lot of work on coalitional games. And we actually have a paper
00:31:11.360 | here with my other student Gabriele Farina and some other collaborators on, at NIPS on that,
00:31:16.640 | actually just came back from the poster session where we presented this. But so when you have a
00:31:21.600 | collusion, it's a different problem. And it typically gets even harder then. Even the game
00:31:28.000 | representations, some of the game representations don't really allow good computation. So we
00:31:34.640 | actually introduced a new game representation for that. Is that kind of cooperation part of the
00:31:40.560 | model? Do you have information about the fact that other players are cooperating? Or is it just this
00:31:48.000 | chaos that where nothing is known? So there's some things unknown. Can you give an example of a
00:31:53.760 | collusion type game? Or is it usually... So like bridge. So think about bridge. It's like when you
00:32:00.320 | and I are on a team, our payoffs are the same. The problem is that we can't talk. So when I get my
00:32:07.440 | cards, I can't whisper to you what my cards are. That would not be allowed. So we have to somehow
00:32:14.400 | coordinate our strategies ahead of time and only ahead of time. And then there's certain signals
00:32:21.440 | we can talk about, but they have to be such that the other team also understands that.
00:32:26.160 | So that's an example where the coordination is already built into the rules of the game.
00:32:32.800 | But in many other situations like auctions or negotiations or diplomatic relationships,
00:32:39.680 | poker, it's not really built in, but it still can be very helpful for the colluders.
00:32:45.200 | I've read you write somewhere, with negotiations, you come to the table with prior,
00:32:51.440 | like a strategy that you're willing to do and not willing to do, those kinds of things.
00:32:58.160 | So how do you start to now moving away from poker, moving beyond poker into other applications like
00:33:04.560 | negotiations? How do you start applying this to other domains, even real world domains that you've
00:33:11.760 | worked on? Yeah, I actually have two startup companies doing exactly that. One is called
00:33:15.920 | Strategic Machine, and that's for kind of business applications, gaming, sports, all sorts of things
00:33:22.240 | like that. Any applications of this to business and to sports and to gaming, to various types of
00:33:31.600 | things in finance, electricity markets, and so on. And the other is called Strategy Robot,
00:33:36.480 | where we are taking this to military, security, cyber security, and intelligence applications.
00:33:42.800 | I think you worked a little bit in, how do you put it, advertisement,
00:33:50.880 | sort of suggesting ads kind of thing. Yeah, that's another company, Optimized Markets.
00:33:57.120 | But that's much more about a combinatorial market and optimization-based technology.
00:34:02.720 | That's not using these game theoretic reasoning technologies.
00:34:06.640 | I see. Okay, so sort of at a high level, how do you think about our ability to use
00:34:14.240 | game theoretic concepts to model human behavior? Do you think human behavior is
00:34:20.320 | amenable to this kind of modeling? So outside of the poker games, and where have you seen it
00:34:25.680 | done successfully in your work? I'm not sure the goal really is modeling humans.
00:34:32.000 | Like for example, if I'm playing a zero-sum game, I don't really care that the opponent is actually
00:34:40.240 | following my model of rational behavior, because if they're not, that's even better for me.
00:34:46.320 | Right. So see, with the opponents in games, the prerequisite is that you formalize
00:34:55.440 | the interaction in some way that can be amenable to analysis. And you've done this
00:35:02.080 | amazing work with mechanism design, designing games that have certain outcomes.
00:35:07.520 | But so I'll tell you an example from my world of autonomous vehicles, right?
00:35:15.280 | We're studying pedestrians, and pedestrians and cars negotiate in this nonverbal communication.
00:35:22.000 | There's this weird game dance of tension where pedestrians are basically saying,
00:35:27.120 | "I trust that you won't kill me, and so as a jaywalker, I will step onto the road,
00:35:31.680 | even though I'm breaking the law." And there's this tension. And the question is,
00:35:35.680 | we really don't know how to model that well, in trying to model intent. And so people sometimes
00:35:42.080 | bring up ideas of game theory and so on. Do you think that aspect of human behavior
00:35:48.400 | can use these kinds of imperfect information approaches, modeling? How do you start to attack
00:35:56.160 | a problem like that, when you don't even know how to design the game to describe
00:36:02.000 | the situation in order to solve it? Okay, so I haven't really thought about jaywalking.
00:36:06.720 | But one thing that I think could be a good application in autonomous vehicles is the
00:36:12.160 | following. So let's say that you have fleets of autonomous cars operated by different companies.
00:36:18.240 | So maybe here's the Waymo fleet, and here's the Uber fleet. If you think about the rules of the
00:36:23.280 | road, they define certain legal rules, but that still leaves a huge strategy space open. Like as
00:36:30.400 | a simple example, when cars merge, you know how humans merge, you know, they slow down and look
00:36:35.280 | at each other and try to merge. Wouldn't it be better if these situations would already be
00:36:41.920 | pre-negotiated, so we can actually merge at full speed, and we know that this is the situation,
00:36:47.280 | this is how we do it, and it's all going to be faster. But there are way too many situations
00:36:52.320 | to negotiate manually. So you could use automated negotiation, this is the idea at least,
00:36:57.600 | you could use automated negotiation to negotiate all of these situations, or many of them,
00:37:02.960 | in advance. And of course, it might be that, hey, maybe you're not going to always let me go first.
00:37:09.040 | Maybe you said, okay, well, in these situations, I'll let you go first. But in exchange,
00:37:13.360 | you're going to give me too much, you're going to let me go first in these situations. So it's
00:37:17.520 | this huge combinatorial negotiation. And do you think there's room in that example of merging
00:37:24.000 | to model this whole situation as an imperfect information game, or do you really want to
00:37:28.080 | consider it to be a perfect information game? No, that's a good question. Yeah, that's a good question.
00:37:33.280 | Do you pay the price of assuming that you don't know everything?
00:37:38.320 | Yeah, I don't know. It's certainly much easier. Games with perfect information are much easier.
00:37:45.040 | So if you can get away with it, you should. But if the real situation is of imperfect information,
00:37:52.480 | then you're going to have to deal with imperfect information.
00:37:55.040 | Great. So what lessons have you learned from the annual computer poker competition,
00:37:59.840 | an incredible accomplishment of AI? You know, you look at the history of Deep Blue, AlphaGo,
00:38:06.240 | these kind of moments when AI stepped up in an engineering effort and a scientific effort
00:38:13.280 | combined to beat the best of human players. So what do you take away from this whole experience?
00:38:19.360 | What have you learned about designing AI systems that play these kinds of games? And what does
00:38:24.240 | that mean for AI in general, for the future of AI development?
00:38:30.000 | Yeah, so that's a good question. There's so much to say about it. I do like this type of
00:38:36.480 | performance-oriented research, although in my group we go all the way from idea to theory to
00:38:42.960 | experiments to big system building to commercialization. So we span that spectrum. But I
00:38:48.160 | think that in a lot of situations in AI, you really have to build the big systems and evaluate
00:38:54.000 | them at scale before you know what works and doesn't. And we've seen that in the computational
00:39:00.000 | game theory community, that there are a lot of techniques that look good in the small,
00:39:03.920 | but then they cease to look good in the large. And we've also seen that there are a lot of
00:39:09.040 | techniques that look superior in theory. And I really mean in terms of convergence rates,
00:39:15.600 | better, like first-order methods have better convergence rates than the CFR-based
00:39:19.920 | algorithms. Yet the CFR-based algorithms are the fastest in practice. So it really tells me that
00:39:25.760 | you have to test this in reality. The theory isn't tight enough, if you will, to tell you
00:39:31.280 | which algorithms are better than the others. And you have to look at these things in the large,
00:39:38.400 | because any sort of projections you do from the small can, at least in this domain, be very
00:39:42.880 | misleading. So that's kind of from a kind of science and engineering perspective. From a
00:39:47.680 | personal perspective, it's been just a wild experience in that with the first poker competition,
00:39:54.000 | the first brains versus AI, man-machine poker competition that we organized. There had been,
00:40:00.240 | by the way, for other poker games, there had been previous competitions, but this was for
00:40:04.000 | Heads Up No Limit, this was the first. And I probably became the most hated person in the
00:40:09.680 | world of poker. And I didn't mean to. - Why is that? For cracking the game?
00:40:15.120 | - Yeah. A lot of people felt that it was a real threat to the whole game,
00:40:20.720 | the whole existence of the game. If AI becomes better than humans, people would be scared to
00:40:27.280 | play poker because there are these superhuman AIs running around taking their money and all of that.
00:40:32.800 | So it's just really aggressive. The comments were super aggressive. I got everything,
00:40:39.120 | just short of death threats. - Do you think the same was true for chess?
00:40:43.840 | Because right now they just completed the world championships in chess and humans just started
00:40:48.720 | ignoring the fact that there's AI systems now that outperform humans and they still enjoy the
00:40:53.760 | game. It's still a beautiful game. - That's what I think. And I think the
00:40:57.120 | same thing happens in poker. And so I didn't think of myself as somebody who was going to kill the
00:41:02.000 | game and I don't think I did. I've really learned to love this game. I wasn't a poker player before,
00:41:06.880 | but I learned so many nuances from these AIs and they've really changed how the game is played,
00:41:12.400 | by the way. So they have these very Martian ways of playing poker and the top humans are now
00:41:17.600 | incorporating those types of strategies into their own play. So if anything, to me,
00:41:22.480 | our work has made poker a richer, more interesting game for humans to play,
00:41:29.760 | not something that is going to steer humans away from it entirely.
00:41:33.200 | - Just a quick comment on something you said, which is, if I may say so, in academia is a little
00:41:40.400 | bit rare sometimes. It's pretty brave to put your ideas to the test in the way you described,
00:41:46.400 | saying that sometimes good ideas don't work when you actually try to apply them at scale.
00:41:51.920 | So where does that come from? I mean, if you could do advice for people, what drives you in that
00:42:00.320 | sense? Were you always this way? I mean, it takes a brave person, I guess is what I'm saying, to test
00:42:05.680 | their ideas and to see if this thing actually works against human top human players and so on.
00:42:11.520 | - Yeah, I don't know about brave, but it takes a lot of work. It takes a lot of work and a lot of
00:42:16.800 | time to organize, to make something big and to organize an event and stuff like that.
00:42:22.480 | - And what drives you in that effort? Because you could still, I would argue, get a best paper
00:42:28.000 | award at NIPS as you did in '17 without doing this. - That's right, yes.
00:42:32.160 | - And so... - So in general, I believe
00:42:36.560 | it's very important to do things in the real world and at scale. And that's really where the
00:42:42.640 | pudding, if you will, proof is in the pudding. That's where it is. In this particular case,
00:42:50.000 | it was kind of a competition between different groups for many years as to who can be the first
00:42:58.400 | one to beat the top humans at Heads Up No Limit Texas Hold'em. So it became kind of a
00:43:04.880 | competition who can get there. - Yeah, so a little friendly competition
00:43:11.680 | could do wonders for progress. - Yes, absolutely.
00:43:14.960 | - So the topic of mechanism design, which is really interesting,
00:43:19.840 | also kind of new to me, except as an observer of, I don't know, politics and any... I'm an observer
00:43:26.640 | of mechanisms, but you write in your paper on automated mechanism design that I quickly read.
00:43:32.960 | So mechanism design is designing the rules of the game so you get a certain desirable outcome.
00:43:40.000 | And you have this work on doing so in an automatic fashion as opposed to fine-tuning it. So what have
00:43:47.280 | you learned from those efforts? If you look, say, I don't know, at a complex system, like
00:43:53.840 | our political system, can we design our political system to have, in an automated fashion,
00:44:00.720 | to have outcomes that we want? Can we design something like traffic lights to be smart
00:44:08.960 | where it gets outcomes that we want? So what are the lessons that you draw from that work?
00:44:14.560 | - Yeah, so I still very much believe in the automated mechanism design direction.
00:44:19.040 | - Yes. - But it's not a panacea.
00:44:22.880 | There are impossibility results in mechanism design, saying that there is no mechanism that
00:44:28.480 | accomplishes objective X in class C. So it's not going to... There's no way using any mechanism
00:44:38.240 | design tools, manual or automated, to do certain things in mechanism design.
00:44:42.640 | - Can you describe that again? So meaning it's impossible to achieve that?
00:44:47.040 | - Yeah, yeah. - Or it's unlikely?
00:44:49.840 | - Impossible. - Impossible.
00:44:51.440 | - So these are not statements about human ingenuity, who might come up with something
00:44:56.320 | smart. These are proofs that if you want to accomplish properties X in class C,
00:45:02.240 | that is not doable with any mechanism. The good thing about automated mechanism design is that
00:45:07.680 | we're not really designing for a class. We're designing for specific settings at a time.
00:45:13.360 | So even if there's an impossibility result for the whole class, it doesn't mean that all of the
00:45:19.760 | cases in the class are impossible. It just means that some of the cases are impossible.
00:45:24.880 | So we can actually carve these islands of possibility within these known impossible
00:45:29.680 | classes. And we've actually done that. So one of the famous results in mechanism design is
00:45:35.040 | the Myerson-Satterthwaite theorem by Roger Myerson and Mark Satterthwaite from 1983.
00:45:40.160 | It's an impossibility of efficient trade under imperfect information. We show that
00:45:45.760 | you can in many settings avoid that and get efficient trade anyway.
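For context, a rough statement of the impossibility result being referenced; the automated-mechanism-design point is that specific settings inside this class can still escape it.

```latex
\[
\text{Bilateral trade with buyer value } v_b \sim F_b \text{ and seller value } v_s \sim F_s \text{ independent, supports overlapping:}
\]
\[
\text{no mechanism is simultaneously Bayesian incentive-compatible, interim individually rational,}
\]
\[
\text{ex post budget-balanced, and ex post efficient.}
\]
```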
00:45:50.400 | - Depending on how you design the game. Okay, so-
00:45:53.760 | - Depending how you design the game. And of course, it doesn't in any way
00:45:58.720 | contradict the impossibility result. The impossibility result is still there,
00:46:03.680 | but it just finds spots within this impossible class where in those spots,
00:46:10.640 | you don't have the impossibility. - Sorry if I'm going a bit philosophical,
00:46:14.640 | but what lessons do you draw towards, like I mentioned, politics or human interaction and
00:46:20.160 | designing mechanisms for outside of just these kinds of trading or auctioning or
00:46:30.880 | purely formal games or human interaction, like a political system? Do you think it's applicable to
00:46:37.360 | politics or to business, to negotiations, these kinds of things, designing rules that have
00:46:47.920 | certain outcomes? - Yeah, yeah, I do think so.
00:46:50.880 | - Have you seen that successfully done? - There hasn't really, oh, you mean mechanism
00:46:56.000 | design or automated mechanism? - Automated mechanism design.
00:46:58.320 | - So mechanism design itself has had fairly limited success so far. There are certain cases,
00:47:07.440 | but most of the real world situations are actually not sound from a mechanism design perspective.
00:47:14.560 | Even in those cases where they've been designed by very knowledgeable mechanism design people,
00:47:19.840 | the people are typically just taking some insights from the theory and applying those insights into
00:47:25.120 | the real world rather than applying the mechanisms directly. So one famous example of it is the FCC
00:47:32.000 | spectrum auctions. So I've also had a small role in that and very good economists have been,
00:47:40.480 | excellent economists have been working on that with, you know, game theory, yet the rules that are
00:47:45.040 | designed in practice there, they're such that bidding truthfully is not the best strategy.
00:47:51.600 | Usually mechanism design, we try to make things easy for the participants. So telling the truth
00:47:57.040 | is the best strategy. But even in those very high stakes auctions where you have tens of billions
00:48:02.480 | of dollars worth of spectrum being auctioned, truth telling is not the best strategy.
00:48:07.840 | And by the way, nobody knows even a single optimal bidding strategy for those auctions.
00:48:13.520 | - What's the challenge of coming up with an optimal, because there's a lot of players and
00:48:17.280 | there's imperfect players. - It's not so much a lot of players,
00:48:19.920 | but many items for sale. And these mechanisms are such that even with just two items or one item,
00:48:26.560 | bidding truthfully wouldn't be the best strategy. - If you look at the history of AI,
00:48:34.400 | it's marked by seminal events, AlphaGo beating a world champion, human goal player. I would put
00:48:40.560 | Libratus winning at Heads Up No Limit Hold'em as one such event. - Thank you.
00:48:45.200 | - And what do you think is the next such event, whether it's in your life or in the broadly AI
00:48:55.920 | community that you think might be out there that would surprise the world?
00:49:00.800 | - So that's a great question. And I don't really know the answer. In terms of game solving,
00:49:05.760 | Heads Up No Limit Texas Hold'em really was the one remaining widely agreed upon benchmark.
00:49:14.320 | So that was the big milestone. Now, are there other things? Yeah, certainly there are,
00:49:18.800 | but there's not one that the community has kind of focused on. So what could be other things?
00:49:24.240 | There are groups working on StarCraft. There are groups working on Dota 2. These are video games.
00:49:30.720 | Or you could have like Diplomacy or Hanabi, things like that. These are like recreational games,
00:49:38.480 | but none of them are really acknowledged as kind of the main next challenge problem,
00:49:44.720 | like Chess or Go or Heads Up No Limit Texas Hold'em was. So I don't really know in the game
00:49:51.440 | solving space what is or what will be the next benchmark. I kind of hope that there will be a
00:49:56.640 | next benchmark because really the different groups working on the same problem really drove these
00:50:02.240 | application-independent techniques forward very quickly over 10 years.
00:50:07.280 | Do you think there's an open problem that excites you that you start moving away from games into
00:50:12.400 | real-world games, like say the stock market trading?
00:50:17.120 | Yeah, so that's kind of how I am. So I am probably not going to work
00:50:22.000 | as hard on these recreational benchmarks. I'm doing two startups on game-solving technology,
00:50:30.080 | Strategic Machine and Strategy Robot, and we're really interested in pushing this
00:50:35.120 | stuff into practice. What do you think would be really a powerful result that would be surprising,
00:50:45.200 | that would be, if you can say, I mean, five years, 10 years from now, something that statistically
00:50:55.920 | you would say is not very likely, but if there's a breakthrough, would achieve?
00:51:01.360 | Yeah, so I think that overall we're in a very different situation in game theory
00:51:08.320 | than we are in, let's say, machine learning. So in machine learning, it's a fairly mature
00:51:13.600 | technology and it's very broadly applied and proven success in the real world.
00:51:19.520 | In game solving, there are almost no applications yet.
00:51:22.400 | We have just become superhuman, which in machine learning you could argue happened in the 90s,
00:51:29.360 | if not earlier, and at least on supervised learning, certain complex supervised learning
00:51:35.120 | applications. Now, I think the next challenge problem, I know you're not asking about it this
00:51:40.960 | way, you're asking about the technology breakthrough, but I think that the big,
00:51:44.080 | big breakthrough is to be able to show that, hey, maybe most of, let's say, military planning or
00:51:49.360 | most of business strategy will actually be done strategically using computational game theory.
00:51:54.320 | That's what I would like to see as a next five or 10 year goal.
00:51:58.160 | Maybe you can explain to me again, forgive me if this is an obvious question, but machine
00:52:03.760 | learning methods, neural networks suffer from not being transparent, not being explainable.
00:52:09.200 | Game theoretic methods, Nash equilibria, do they generally, when you see the different solutions,
00:52:15.680 | are they, when you talk about military operations, are they, once you see the strategies,
00:52:22.160 | do they make sense? Are they explainable or do they suffer from the same problems as neural
00:52:26.560 | networks do? So that's a good question. I would say a little bit yes and no. And what I mean by
00:52:33.200 | that is that these game theoretic strategies, let's say Nash equilibrium, it has provable
00:52:39.840 | properties. So it's unlike, let's say, deep learning where you kind of cross your fingers,
00:52:44.800 | hopefully it'll work. And then after the fact, when you have the weights, you're still crossing
00:52:48.800 | your fingers and I hope it will work. Here, you know that the solution quality is there.
00:52:55.760 | There's provable solution quality guarantees. Now that doesn't necessarily mean that the
00:53:01.120 | strategies are human understandable. That's a whole other problem. So I think that deep learning
00:53:06.800 | and computational game theory are in the same boat in that sense, that both are difficult to
00:53:11.760 | understand. But at least the game theoretic techniques, they have these guarantees of
00:53:18.480 | solution quality. So do you see business operations, strategic operations, or even military
00:53:24.000 | in the future at least having the strong candidate strategies proposed by automated systems?
00:53:31.920 | Do you see that? Yeah, I do. I do. But that's more of a belief than a substantiated fact.
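To make the earlier point about provable solution quality concrete, here is the standard definition of the kind of guarantee these equilibrium-finding algorithms target. It is a textbook statement rather than anything specific to Libratus; the symbols u_i, sigma, and epsilon are generic.

```latex
% A strategy profile \sigma = (\sigma_1, \ldots, \sigma_n) is an \epsilon-Nash equilibrium
% if no player i can gain more than \epsilon by deviating unilaterally:
\[
  \max_{\sigma_i'} \, u_i(\sigma_i', \sigma_{-i}) \;-\; u_i(\sigma_i, \sigma_{-i}) \;\le\; \epsilon
  \qquad \text{for every player } i .
\]
% Playing \sigma_i therefore comes within \epsilon of a best response to \sigma_{-i};
% in a two-player zero-sum game this translates into a bound on how much a
% worst-case opponent can exploit the computed strategy.
```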
00:53:39.520 | Depending on where you land on optimism or pessimism, to me that's an
00:53:43.760 | exciting future, especially if there are provable things in terms of optimality. So looking into
00:53:52.560 | the future, there are a few folks, especially when you look at the game of poker,
00:54:01.120 | which is probably one of the last benchmarks in terms of games being solved, who worry about
00:54:06.800 | the future and the existential threats of artificial intelligence, the negative impact
00:54:11.760 | in whatever form on society. Is that something that concerns you as much, or are you more optimistic
00:54:18.800 | about the positive impacts of AI? I am much more optimistic about the positive
00:54:23.760 | impacts. So just in my own work, what we've done so far, we run the nationwide kidney exchange.
00:54:29.840 | Hundreds of people are walking around alive today who wouldn't be otherwise. And it's increased
00:54:35.120 | employment. You have a lot of people now running kidney exchanges and at transplant centers,
00:54:42.160 | interacting with the kidney exchange. You have extra surgeons, nurses, anesthesiologists,
00:54:49.280 | hospitals, all of that. So employment is increasing from that and the world is becoming
00:54:54.240 | a better place. Another example is combinatorial sourcing auctions. We did 800 large-scale
00:55:02.320 | combinatorial sourcing auctions from 2001 to 2010 in a previous startup of mine called CombineNet.
00:55:09.360 | And we increased the supply chain efficiency on that $60 billion of spend by 12.6%. So that's over
00:55:18.800 | $6 billion of efficiency improvement in the world. And this is not like shifting value from somebody
00:55:24.400 | to somebody else, just efficiency improvement. Like in trucking, less empty driving, so there's
00:55:29.840 | less waste, less carbon footprint, and so on. This is a huge positive impact in the near term,
00:55:36.800 | but to stay in it for a little longer, because I think game theory has a role to play here.
00:55:43.040 | - Oh, let me actually come back on that as one thing. I think AI is also going to make the world
00:55:48.080 | much safer. So that's another aspect that often gets overlooked.
00:55:52.960 | - Well, let me ask this question. Maybe you can speak to the safer. So I talked to Max Tegmark
00:55:58.560 | and Stuart Russell, who are very concerned about existential threats of AI. And often the concern
00:56:03.760 | is about value misalignment. So AI systems basically operating towards goals that are not
00:56:14.240 | the same as those of human civilization, of human beings. So it seems like game theory has a role to play there
00:56:20.960 | to make sure the values are aligned with human beings. I don't know if that's how you think
00:56:29.360 | about it. If not, how do you think AI might help with this problem? How do you think AI might make
00:56:37.040 | the world safer? - Yeah, I think this value misalignment
00:56:43.120 | is a fairly theoretical worry, and I haven't really seen it; because I do a lot of real
00:56:51.040 | applications, I don't see it anywhere. The closest I've seen it was the following type of mental
00:56:56.880 | exercise really, where I had this argument in the late '80s when we were building these
00:57:01.840 | transportation optimization systems. And somebody had heard that it's a good idea to have high
00:57:06.400 | utilization of assets. So they told me, "Hey, why don't you put that as objective?"
00:57:10.880 | And we didn't even put it as an objective, because I just showed him that, "If you had
00:57:17.120 | that as your objective, the solution would be to load your trucks full and drive in circles.
00:57:21.920 | Nothing would ever get delivered. You'd have 100% utilization." So yeah, I know this
00:57:26.640 | phenomenon, I've known this for over 30 years, but I've never seen it actually be a problem
00:57:31.920 | in reality. And yes, if you have the wrong objective, the AI will optimize that to the
00:57:37.360 | hilt, and it's gonna hurt more than some human who's kinda trying to solve it in a half-baked
00:57:43.680 | way with some human insight too. But I just haven't seen that materialize in practice.
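A toy sketch of the utilization story above, with made-up plan names and numbers rather than anything from the actual system: scoring candidate plans by asset utilization alone picks the plan that delivers nothing, while scoring by deliveries picks the sensible one.

```python
# Made-up candidate plans for a tiny trucking example (purely illustrative).
candidate_plans = {
    "deliver_goods":    {"loaded_hours": 60,  "total_hours": 100, "tons_delivered": 500},
    "drive_in_circles": {"loaded_hours": 100, "total_hours": 100, "tons_delivered": 0},
}

def utilization(plan):
    # Fraction of truck hours spent loaded -- the misspecified objective.
    return plan["loaded_hours"] / plan["total_hours"]

def tons_delivered(plan):
    # What the business actually cares about.
    return plan["tons_delivered"]

best_by_utilization = max(candidate_plans, key=lambda name: utilization(candidate_plans[name]))
best_by_delivery = max(candidate_plans, key=lambda name: tons_delivered(candidate_plans[name]))
print(best_by_utilization)  # drive_in_circles: 100% utilization, nothing delivered
print(best_by_delivery)     # deliver_goods
```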
00:57:48.320 | - There's this gap that you've actually put your finger on very clearly just now,
00:57:54.960 | between theory and reality, that's very difficult to put into words, I think. It's what you can
00:58:00.880 | theoretically imagine, the worst possible case, or even just bad cases, and what usually
00:58:09.120 | happens in reality. So for example, to me, maybe it's something you can comment on: I grew
00:58:15.680 | up in the Soviet Union. You know, there's currently 10,000 nuclear weapons in the
00:58:21.200 | world. And for many decades, it's theoretically surprising to me that nuclear war has not
00:58:29.840 | broken out. Do you think about this aspect from a game theoretic perspective in general,
00:58:36.240 | why is that true? Why, in theory, you could see how things would go terribly wrong, and somehow
00:58:43.200 | yet they have not. How do you think about that? - So I do think about that a lot. I think the
00:58:47.760 | biggest two threats that we're facing as mankind, one is climate change, and the other is nuclear
00:58:52.720 | war. So those are the two main things that I worry about. And I've tried to do something about
00:58:59.120 | climate change, or at least thought about trying to, twice. For two of my
00:59:04.640 | startups, I've actually commissioned studies of what we could do on those things. And we didn't
00:59:09.920 | really find a sweet spot, but I'm still keeping an eye out for something where we
00:59:14.480 | could actually provide a market solution or optimization solution or some other technology
00:59:19.360 | solution to those problems. For example, pollution credit markets were what we were looking
00:59:26.000 | at then. And it was much more the lack of political will that made those markets not so successful,
00:59:32.240 | rather than bad market design. So I could go in and make a better market design,
00:59:37.120 | but that wouldn't really move the needle on the world very much if there's no political will.
00:59:41.200 | And in the US, you know, the market, at least the Chicago market was just shut down,
00:59:45.680 | and so on. So then it doesn't really help how great your market design was.
00:59:50.560 | So on the nuclear side, it's different: global warming is a more gradually encroaching
00:59:56.640 | problem. You know, nuclear weapons have been here. It's an obvious problem that's just been
01:00:05.200 | sitting there. So how do you think about what is the mechanism design there that just made everything
01:00:11.520 | seem stable? And are you still extremely worried?
01:00:14.800 | I am still extremely worried. So you probably know the simple game theory of MAD.
01:00:19.600 | So this is mutually assured destruction. And it doesn't require any computation. With small
01:00:26.880 | matrices, you can actually convince yourself that the game is such that nobody wants to initiate.
01:00:30.880 | Yeah, that's a very coarse grained analysis. And it really works in a situation where you have
01:00:38.080 | two superpowers or a small number of superpowers. Now things are very different. You have smaller
01:00:42.800 | nukes. So the threshold of initiating is smaller, and you have smaller countries and non-nation
01:00:50.640 | actors who may get nukes and so on. So I think it's riskier now than it was maybe ever before.
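For reference, one stylized way to write down the small matrix Sandholm alludes to. The payoff numbers are invented purely for illustration, and retaliation is assumed to be certain, which is the classical MAD assumption.

```latex
% Rows: country A's move; columns: country B's move. Entries are (payoff to A, payoff to B).
\[
\begin{array}{c|cc}
                  & \text{B refrains} & \text{B launches} \\ \hline
\text{A refrains} & (0,\,0)           & (-100,\,-100)     \\
\text{A launches} & (-100,\,-100)     & (-100,\,-100)
\end{array}
\]
% With assured retaliation, refraining weakly dominates launching for both sides,
% and (refrain, refrain) is a Nash equilibrium: neither player gains by initiating.
% The caveats above are precisely about when this coarse model stops matching reality.
```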
01:00:57.440 | And what idea or application of AI, you've talked about it a little bit, but what is the most exciting
01:01:06.560 | to you right now? I mean, you're here at NIPS, NeurIPS now, you have a few excellent pieces of
01:01:14.640 | work, but what are you thinking into the future? With several companies you're doing, what's the
01:01:18.160 | most exciting thing or one of the exciting things? The number one thing for me right now is coming up
01:01:24.480 | with these scalable techniques for game solving and applying them into the real world. I'm still
01:01:31.360 | very interested in market design as well, and we're doing that in Optimized Markets. But
01:01:35.600 | what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot, getting that
01:01:40.480 | technology out there and seeing, as you're in the trenches doing applications, what needs to
01:01:46.240 | be actually filled, what technology gaps still need to be filled. So it's so hard to just put
01:01:51.120 | your feet on the table and imagine what needs to be done. But when you're actually doing real
01:01:55.280 | applications, the applications tell you what needs to be done. And I really enjoy that interaction.
01:02:00.880 | Is it a challenging process to apply some of the state of the art techniques you're working on
01:02:06.880 | and having the various players in industry or the military, or people who could really benefit from
01:02:17.600 | it actually use it? What's that process like? You know, in autonomous vehicles, we work with
01:02:22.640 | automotive companies, and in many ways they're a little bit old-fashioned. It's difficult.
01:02:29.280 | They really want to use this technology. It clearly will have a significant benefit. But the
01:02:35.920 | systems aren't quite in place to easily have them integrated, in terms of data, in terms of
01:02:41.840 | compute, in terms of all these kinds of things. So is that one of the bigger challenges
01:02:46.480 | that you're facing? And how do you tackle that challenge?
01:02:50.160 | Yeah, I think that's always a challenge. That's kind of slowness and inertia, really.
01:02:55.600 | Let's do things the way we've always done it. You just have to find the internal champions at the
01:03:00.160 | customer who understand that, hey, things can't be the same way in the future. Otherwise, bad things
01:03:05.280 | are going to happen. And in autonomous vehicles, it's actually very interesting that
01:03:09.600 | the car makers are doing that, and they're very traditional. But at the same time, you have tech
01:03:13.760 | companies who have nothing to do with cars or transportation, like Google and Baidu,
01:03:18.240 | really pushing on autonomous cars. I find that fascinating.
01:03:23.120 | Clearly, you're super excited about actually these ideas having an impact in the world.
01:03:28.240 | In terms of the technology, in terms of ideas and research, are there directions that you're
01:03:35.120 | also excited about? Whether that's on some of the approaches you talked about for imperfect
01:03:41.280 | information games, whether it's applying deep learning to some of these problems? Is there
01:03:45.280 | something that you're excited in the research side of things?
01:03:48.800 | Yeah, yeah. Lots of different things in the game solving. Solving even bigger games,
01:03:55.120 | games where you have hidden actions, where the player actions are hidden as well. Poker is a game where
01:04:03.440 | really the chance actions are hidden, or some of them are hidden, but the player actions are public.
01:04:08.240 | Multiplayer games of various sorts, collusion, opponent exploitation,
01:04:18.000 | and even longer games. Games that basically go forever, but they're not repeated.
01:04:24.080 | Extensive-form games that go forever. What would that even look like? How do you represent that?
01:04:30.880 | How do you solve that? What's an example of a game like that?
01:04:33.200 | Is this some of the stochastic games that you mentioned?
01:04:35.840 | Let's say business strategy. Not just modeling a particular interaction, but thinking about
01:04:41.920 | the business from here to eternity. Or let's say military strategy. It's not like war is going to
01:04:50.080 | go away. How do you think about military strategy that's going to go forever? How do you even model
01:04:57.440 | that? How do you know whether a move was good that somebody made? And so on. That's one direction.
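One classical formalism for a game that goes on forever, and roughly the stochastic-games setting referenced above, is a discounted two-player zero-sum stochastic game, which can be solved by Shapley-style value iteration. The sketch below is a minimal illustration under that assumption: the states, payoffs, and transition probabilities are made up, and solve_matrix_game is a small helper defined here, not a library routine.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Value of the zero-sum matrix game A (row player maximizes), via linear programming."""
    m, n = A.shape
    # Variables: x_1..x_m (row player's mixed strategy) and v (game value).
    # Maximize v subject to (A^T x)_j >= v for every column j, sum(x) = 1, x >= 0.
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # encodes v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(R, P, gamma=0.95, iters=300):
    """R[s]: stage payoff matrix at state s; P[s][i][j]: distribution over next states."""
    S = len(R)
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.zeros(S)
        for s in range(S):
            m, n = R[s].shape
            # Stage game at s: immediate payoff plus discounted expected continuation value.
            Q = np.array([[R[s][i, j] + gamma * float(P[s][i][j] @ V)
                           for j in range(n)] for i in range(m)])
            V_new[s] = solve_matrix_game(Q)
        V = V_new
    return V

# Two made-up states with two actions per player; transitions are arbitrary.
R = [np.array([[1.0, -1.0], [-1.0, 1.0]]),
     np.array([[0.0, 2.0], [2.0, 0.0]])]
P = [[[np.array([0.5, 0.5]), np.array([1.0, 0.0])],
      [np.array([0.0, 1.0]), np.array([0.5, 0.5])]],
     [[np.array([1.0, 0.0]), np.array([0.5, 0.5])],
      [np.array([0.5, 0.5]), np.array([0.0, 1.0])]]]
print(shapley_value_iteration(R, P))  # long-run value of each state for the row player
```

Never-ending extensive-form games with imperfect information, which is what Sandholm is pointing at, are much harder than this discounted Markov setting; the sketch only shows the simplest classical baseline.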
01:05:06.800 | I'm also very interested in learning much more scalable techniques for integer programming.
01:05:13.360 | So we had a nice ICML paper this summer on that. The first automated algorithm configuration paper
01:05:20.160 | that has theoretical generalization guarantees. So if I see this many training examples,
01:05:25.360 | and I tuned my algorithm in this way, it's going to have good performance on the real distribution,
01:05:31.520 | which I've not seen. Which is kind of interesting, because algorithm configuration
01:05:36.560 | has been going on now for at least 17 years, seriously. And there has not been any
01:05:42.960 | generalization theory before. Well, this is really exciting. And it's a huge honor to talk to you.
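As a brief aside on the generalization guarantee just mentioned: the statement below gives the general flavor of such bounds, at the level of standard uniform-convergence results, and is not the paper's exact theorem. H, d, m, delta, cost, and rho are generic symbols introduced here, and constants are hidden in the O notation.

```latex
% Suppose the algorithm's cost on an instance lies in [0, H] for every parameter
% configuration \rho, and the family of cost functions (one per configuration) has
% pseudo-dimension d. Then with probability at least 1 - \delta over m i.i.d.
% training instances x_1, \ldots, x_m drawn from the instance distribution \mathcal{D}:
\[
  \left| \frac{1}{m} \sum_{j=1}^{m} \mathrm{cost}(\rho, x_j)
         \;-\; \mathbb{E}_{x \sim \mathcal{D}}\big[\mathrm{cost}(\rho, x)\big] \right|
  \;\le\; O\!\left( H \sqrt{\frac{d \ln m + \ln(1/\delta)}{m}} \right)
  \qquad \text{for all configurations } \rho .
\]
% So a configuration tuned on enough training instances also performs well on the
% unseen distribution, which is the kind of guarantee being described.
```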
01:05:49.760 | Thank you so much, Tuomas. Thank you for bringing Libratus to the world and all the great work
01:05:53.520 | you're doing. Well, thank you very much. It's been fun. Good questions.
01:05:56.880 | Thank you.