
Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144


Chapters

0:00 Introduction
2:30 Robot and Frank
4:50 Music
8:01 Starring in a TurboTax commercial
18:14 Existential risks of AI
36:36 Reinforcement learning
62:24 AlphaGo and David Silver
72:03 Will neural networks achieve AGI?
84:30 Bitter Lesson
97:20 Does driving require a theory of mind?
106:46 Book Recommendations
112:08 Meaning of life

Transcript

00:00:00.000 | The following is a conversation with Michael Littman, a computer science professor at Brown
00:00:04.480 | University doing research on and teaching machine learning, reinforcement learning,
00:00:10.320 | and artificial intelligence. He enjoys being silly and lighthearted in conversation, so this was
00:00:17.040 | definitely a fun one. Quick mention of each sponsor, followed by some thoughts related to
00:00:22.240 | the episode. Thank you to SimpliSafe, a home security company I use to monitor and protect
00:00:28.560 | my apartment, ExpressVPN, the VPN I've used for many years to protect my privacy on the internet,
00:00:34.160 | Masterclass, online courses that I enjoy from some of the most amazing humans in history,
00:00:40.000 | and BetterHelp, online therapy with a licensed professional. Please check out these sponsors
00:00:46.560 | in the description to get a discount and to support this podcast. As a side note, let me say
00:00:52.400 | that I may experiment with doing some solo episodes in the coming month or two. The three ideas I have
00:00:59.040 | floating in my head currently are to use: one, a particular moment in history; two, a particular
00:01:06.240 | movie; or three, a book to drive a conversation about a set of related concepts. For example,
00:01:13.360 | I could use 2001: A Space Odyssey or Ex Machina to talk about AGI for one, two, three hours.
00:01:21.520 | Or I could do an episode on the, yes, rise and fall of Hitler and Stalin, each in a separate
00:01:29.120 | episode, using relevant books and historical moments for reference. I find the format of a
00:01:35.040 | solo episode very uncomfortable and challenging, but that just tells me that it's something I
00:01:40.960 | definitely need to do and learn from the experience. Of course, I hope you come along
00:01:46.320 | for the ride. Also, since we have all this momentum built up on announcements, I'm giving a few
00:01:52.080 | lectures on machine learning at MIT this January. In general, if you have ideas for the episodes,
00:01:58.320 | for the lectures, or for just short videos on YouTube, let me know in the comments that I still
00:02:07.120 | definitely read, despite my better judgment and the wise sage advice of the great Joe Rogan.
00:02:15.920 | If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts,
00:02:20.320 | follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman.
00:02:25.760 | And now, here's my conversation with Michael Littman. I saw a video of you talking to Charles
00:02:33.440 | Isbell about Westworld, the TV series. You guys were doing the kind of thing where you're
00:02:37.760 | watching new things together, but let's rewind back. Is there a sci-fi movie or book
00:02:46.000 | or shows that was profound, that had an impact on you philosophically, or just specifically
00:02:53.040 | something you enjoyed nerding out about? >> Yeah, interesting. I think a lot of us
00:02:57.680 | have been inspired by robots in movies. One that I really like is, there's a movie called
00:03:04.240 | Robot and Frank, which I think is really interesting because it's very near-term future,
00:03:09.360 | where robots are being deployed as helpers in people's homes. And we don't know how to make
00:03:16.880 | robots like that at this point, but it seemed very plausible. It seemed very realistic or
00:03:22.240 | imaginable. And I thought that was really cool because they're awkward, they do funny things,
00:03:27.200 | it raised some interesting issues, but it seemed like something that would ultimately be helpful
00:03:31.200 | and good if we could do it right. >> Yeah, he was an older, cranky gentleman.
00:03:34.640 | >> He was an older, cranky jewel thief, yeah. >> It's kind of a funny little thing, which is,
00:03:39.840 | he's a jewel thief, and so he pulls the robot into his life, which is something you could imagine
00:03:47.200 | taking a home robotics thing and pulling into whatever quirky thing that's involved in your
00:03:55.600 | existence. >> Yeah, that's meaningful to you.
00:03:57.200 | Exactly so. Yeah, and I think from that perspective, I mean, not all of us are
00:04:00.560 | jewel thieves, and so when we bring our robots into our lives- >> Speak for yourself, yeah.
00:04:03.440 | >> It explains a lot about this apartment, actually. But no, the idea that people should
00:04:10.000 | have the ability to make this technology their own, that it becomes part of their lives. And I think
00:04:16.320 | it's hard for us as technologists to make that kind of technology. It's easier to mold people
00:04:21.840 | into what we need them to be. And just that opposite vision, I think, is really inspiring.
00:04:26.640 | >> And then there's an anthropomorphization where we project certain things on them,
00:04:31.600 | because I think the robot was kind of dumb. But I have a bunch of Roombas I play with, and
00:04:35.440 | you immediately project stuff onto them, a much greater level of intelligence. We'll probably do
00:04:40.800 | that with each other, too, a much greater degree of compassion. >> That's right. One of the things
00:04:45.520 | we're learning from AI is where we are smart and where we are not smart. >> Yeah. You also enjoy,
00:04:52.320 | as people can see, and I enjoy myself, watching you sing and even dance a little bit, a little bit
00:05:01.040 | of dancing. >> A little bit of dancing. That's not quite my thing. >> As a method of education
00:05:06.640 | or just in life, in general. So easy question. What's the definitive, objectively speaking,
00:05:15.920 | top three songs of all time? Maybe something that, to walk that back a little bit, maybe something
00:05:23.360 | that others might be surprised by, the three songs that you kind of enjoy. >> That is a great
00:05:29.200 | question that I cannot answer, but instead, let me tell you a story. >> Pick a question you do
00:05:34.560 | want to answer. >> That's right. I've been watching the presidential debates and vice
00:05:37.680 | presidential debates, and it turns out, yeah, you can just answer any question you want.
00:05:41.120 | >> Let me interrupt you. >> That's a related question. >> No, I'm just kidding. >>
00:05:45.960 | Yeah, well said. I really like pop music. I've enjoyed pop music ever since I was very young. So
00:05:51.040 | '60s music, '70s music, '80s music, this is all awesome. And then I had kids, and I think I stopped
00:05:56.240 | listening to music, and I was starting to realize that my musical taste had sort of frozen out.
00:06:01.440 | And so I decided in 2011, I think, to start listening to the top 10 Billboard songs each
00:06:07.920 | week. So I'd be on the treadmill, and I would listen to that week's top 10 songs so I could
00:06:12.320 | find out what was popular now. And what I discovered is that I have no musical taste whatsoever. I like
00:06:19.200 | what I'm familiar with. And so the first time I'd hear a song, it's the first week that it was on
00:06:23.920 | the charts. I'd be like, "Ugh." And then the second week, I was into it a little bit. And the third
00:06:28.480 | week, I was loving it. And by the fourth week, it was just part of me. And so I'm afraid that I can't
00:06:34.640 | tell you my favorite song of all time, because it's whatever I heard most recently. >> Yeah,
00:06:38.880 | that's interesting. People have told me that there's an art to listening to music as well.
00:06:46.640 | And you can start to, if you listen to a song just carefully, explicitly just force yourself to
00:06:51.840 | really listen, you start to... I did this when I was part of a jazz band and fusion band in college,
00:06:57.040 | you start to hear the layers of the instruments. You start to hear the individual instruments.
00:07:03.520 | You can listen to classical music or to orchestra this way. You can listen to jazz this way.
00:07:09.280 | It's funny to imagine you now, to walk in that forward, to listening to pop hits now as like a
00:07:17.440 | scholar, listening to like Cardi B or something like that, or Justin Timberlake. No, not Timberlake,
00:07:23.920 | Bieber. >> They've both been in the top 10 since I've been listening. >> They're still up there.
00:07:29.360 | Oh my God, I'm so cool. >> If you haven't heard Justin Timberlake's top 10 in the last few years,
00:07:34.080 | there was one song that he did where the music video was set at essentially NeurIPS. >> Oh,
00:07:40.160 | wow. Oh, the one with the robotics. Yeah, yeah, yeah, yeah, yeah. >> Yeah, yeah. It's like at
00:07:44.560 | an academic conference and he's doing a demo. >> He was presenting, right? >> It was sort of
00:07:48.320 | a cross between the Apple, like Steve Jobs kind of talk and NeurIPS. It's always fun when AI shows
00:07:57.040 | up in pop culture. >> I wonder if he consulted somebody for that. That's really interesting.
00:08:01.840 | So maybe on that topic, I've seen your celebrity multiple dimensions, but one of them is you've
00:08:07.440 | done cameos in different places. I've seen you in a TurboTax commercial as like, I guess, the
00:08:15.360 | brilliant Einstein character. And the point is that TurboTax doesn't need somebody like you.
00:08:22.560 | It doesn't need a brilliant person. >> Very few things need someone like me. But yes,
00:08:27.600 | they were specifically emphasizing the idea that you don't need to be a computer expert to be able
00:08:32.400 | to use their software. >> How'd you end up in that world? >> I think it's an interesting story. So I
00:08:36.880 | was teaching my class. It was an intro computer science class for non-concentrators, non-majors.
00:08:42.480 | And sometimes when people would visit campus, they would check in to say, "Hey,
00:08:48.080 | we want to see what a class is like. Can we sit on your class?" So a person came to my class
00:08:54.560 | who was the daughter of the brother of the husband of the best friend of my wife.
00:09:05.520 | Anyway, basically a family friend came to campus to check out Brown and asked to come to my class
00:09:13.760 | and came with her dad. Her dad is who I've known from various kinds of family events and so forth,
00:09:20.160 | but he also does advertising. And he said that he was recruiting scientists for this ad, this
00:09:28.000 | TurboTax set of ads. And he said, "We wrote the ad with the idea that we get the most brilliant
00:09:35.200 | researchers, but they all said no. So can you help us find B-level scientists?" I'm like, "Sure,
00:09:44.880 | that's who I hang out with. So that should be fine." So I put together a list and I did what
00:09:50.320 | some people called a Dick Cheney. So I included myself on the list of possible candidates,
00:09:54.880 | with a little blurb about each one and why I thought that would make sense for them to do it.
00:09:59.280 | And they reached out to a handful of them, but then ultimately, YouTube stalked me a little bit
00:10:03.840 | and they thought, "Oh, I think he could do this." And they said, "Okay, we're going to offer you
00:10:08.720 | the commercial." I'm like, "What?" So it was such an interesting experience because they have
00:10:15.120 | another world. The people who do nationwide kind of ad campaigns and television shows and movies
00:10:22.800 | and so forth, it's quite a remarkable system that they have going because they- - It's like a set?
00:10:28.800 | - Yeah, so I went to, it was just somebody's house that they rented in New Jersey.
00:10:34.880 | But in the commercial, it's just me and this other woman. In reality, there were 50 people
00:10:41.600 | in that room and another, I don't know, half a dozen kind of spread out around the house in
00:10:46.160 | various ways. There were people whose job it was to control the sun. They were in the backyard
00:10:51.360 | on ladders, putting filters up to try to make sure that the sun didn't glare off the window
00:10:56.800 | in a way that would wreck the shot. So there was like six people out there doing that. There was
00:11:00.400 | three people out there giving snacks, the craft table. There was another three people giving
00:11:05.520 | healthy snacks because that was a separate craft table. There was one person whose job it was
00:11:09.760 | to keep me from getting lost. And I think the reason for all this is because so many people
00:11:15.760 | are in one place at one time, they have to be time efficient. They have to get it done.
00:11:19.440 | The morning they were going to do my commercial, in the afternoon they were going to do a commercial
00:11:23.440 | of a mathematics professor from Princeton. They had to get it done. No wasted time or energy.
00:11:30.320 | And so there's just a fleet of people all working as an organism. And it was fascinating. I was
00:11:35.040 | just the whole time, I'm just looking around like, this is so neat. Like one person whose job it was
00:11:39.760 | to take the camera off of the cameraman so that someone else whose job it was to remove the film
00:11:46.480 | canister, because every couple of takes, they had to replace the film because film gets used up.
00:11:51.920 | It was just, I don't know. I was geeking out the whole time. It was so fun.
00:11:55.440 | - How many takes did it take? It looked the opposite, like there was no more than two people there.
00:11:59.520 | It was very relaxed. - Right, right. Yeah.
00:12:00.880 | The person who I was in the scene with is a professional. She's an improv comedian from New
00:12:09.200 | York City. And when I got there, they had given me a script, such as it was. And then I got there
00:12:13.600 | and they said, "We're going to do this as improv." I'm like, "I don't know how to improv.
00:12:17.840 | I don't know what you're telling me to do here."
00:12:21.840 | "Don't worry, she knows." I'm like, "Okay. We'll see how this goes."
00:12:25.360 | - I guess I got pulled into the story because like, where the heck did you come from?
00:12:29.840 | I guess in the scene. Like how did you show up in this random person's house? I don't know.
00:12:35.520 | - Yeah, well, I mean, the reality of it is I stood outside in the blazing sun. There was
00:12:39.040 | someone whose job it was to keep an umbrella over me because I started to shvitz. I started to sweat.
00:12:43.520 | And so I would wreck the shot because my face was all shiny with sweat. So there was one person
00:12:47.360 | who would dab me off, had an umbrella. But yeah, like the reality of it, like why is this strange,
00:12:53.440 | stalkery person hanging around outside somebody's house?
00:12:55.840 | - Yeah, we're not sure. We'll have to look in. We'll have to wait for the book. But are you...
00:13:00.080 | So you make, like you said, YouTube, you make videos yourself. You make awesome parody,
00:13:06.800 | sort of parody songs that kind of focus in on a particular aspect of computer science.
00:13:13.360 | How much, those seem really natural. How much production value goes into that? Do you also
00:13:18.480 | have a team of 50 people? - The videos, almost all the videos,
00:13:22.480 | except for the ones that people would have actually seen, are just me. I write the lyrics,
00:13:26.880 | I sing the song. I generally find a backing track online because I'm like you, can't really play an
00:13:35.280 | instrument. And then I do, in some cases, I'll do visuals using just like PowerPoint.
00:13:41.200 | Lots and lots of PowerPoint to make it sort of like an animation. The most produced one is the
00:13:46.160 | one that people might've seen, which is the overfitting video that I did with Charles Isbell.
00:13:51.200 | And that was produced by the Georgia Tech and Udacity people 'cause we were doing a class
00:13:56.640 | together. It was kind of, I usually do parody songs kind of to cap off a class at the end of a
00:14:01.280 | class. - So that one you're wearing,
00:14:03.440 | so it was just a thriller. - Yeah.
00:14:05.440 | - You're wearing the Michael Jackson, the red leather jacket. The interesting thing with
00:14:10.400 | podcasting that you're also into is that I really enjoy is that there's not a team of people.
00:14:20.160 | It's kind of more, 'cause you know, there's something that happens when there's more people
00:14:29.040 | involved than just one person, that just the way you start acting, I don't know, there's a censorship,
00:14:36.400 | you're not given, especially for like slow thinkers like me, you're not, and I think most
00:14:41.840 | of us are if we're trying to actually think, we're a little bit slow and careful. It kind of,
00:14:48.640 | large teams get in the way of that. And I don't know what to do with, like that's the, to me,
00:14:55.840 | like if, you know, it's very popular to criticize quote unquote mainstream media.
00:15:01.760 | But there is legitimacy to criticizing them, the same, I love listening to NPR for example,
00:15:06.880 | but every, it's clear that there's a team behind it, there's a commercial, there's constant
00:15:12.000 | commercial breaks, there's this kind of like rush of like, okay, I have to interrupt you now
00:15:17.520 | because we have to go to commercial, just this whole, it creates, it destroys the possibility
00:15:23.840 | of nuanced conversation. - Yeah, exactly.
00:15:28.160 | Evian, which Charles Isbell, who I talked to yesterday, told me that Evian is naive backwards,
00:15:36.080 | which the fact that his mind thinks this way is just, it's quite brilliant. Anyway, there's a
00:15:41.520 | freedom to this podcast. - He's Dr. Awkward, which by the way,
00:15:44.800 | is a palindrome. That's a palindrome that I happen to know from other parts of my life.
00:15:48.960 | - You just threw it out of, well, you know, - It's gonna stick.
00:15:51.760 | - Use it against Charles. Dr. Awkward. - So what was the most challenging
00:15:57.600 | parody song to make? Was it the Thriller one? - No, that one was really fun. I wrote the lyrics
00:16:03.200 | really quickly and then I gave it over to the production team. They recruited an acapella group
00:16:09.040 | to sing. That went really smoothly. It's great having a team 'cause then you can just focus on
00:16:13.840 | the part that you really love, which in my case is writing the lyrics. For me, the most challenging
00:16:18.960 | one, not challenging in a bad way, but challenging in a really fun way was I did, one of the parody
00:16:25.040 | songs I did is about the halting problem in computer science. The fact that you can't create
00:16:30.640 | a program that can tell for any other arbitrary program whether it's actually gonna get stuck in an
00:16:36.320 | infinite loop or whether it's going to eventually stop. And so I did it to an 80s song because
00:16:42.720 | I hadn't started my new thing of learning current songs. And it was Billy Joel's The Piano Man.
00:16:50.080 | - Nice. - Which is a great song.
00:16:52.000 | - Great song. - Yeah, yeah. And-
00:16:54.320 | - Sing Me a Song, You're the Piano Man. Yeah, it's a great song.
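(Aside for readers who want the theorem behind the song: the halting problem says that no program can decide, for every program and input, whether that program eventually stops. Below is a minimal Python sketch of the classic contradiction; the decider `halts(program, data)` is passed in as a hypothetical argument precisely because no real one can exist.)

```python
def make_paradox(halts):
    """Given any claimed decider halts(program, data) -> bool,
    construct a program that the decider must get wrong."""
    def paradox(program):
        if halts(program, program):
            while True:   # decider predicted "halts", so loop forever instead
                pass
        return            # decider predicted "runs forever", so halt immediately
    return paradox
```

Feeding the constructed program its own description forces any claimed decider to be wrong one way or the other, which is the argument the parody song sets to the tune of Piano Man.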
00:16:58.960 | - So the lyrics are great because first of all, it rhymes. Not all songs rhyme. I've done
00:17:05.680 | Rolling Stones songs, which turn out to have no rhyme scheme whatsoever. They're just sort of
00:17:10.400 | yelling and having a good time, which makes it not fun from a parody perspective 'cause you can say
00:17:15.120 | anything. But the lines rhymed and there was a lot of internal rhymes as well. And so figuring out
00:17:20.320 | how to sing with internal rhymes, a proof of the halting problem was really challenging. And I
00:17:26.240 | really enjoyed that process. - What about, last question on this topic,
00:17:30.800 | what about the dancing in the Thriller video? How many takes did that take?
00:17:33.760 | - So I wasn't planning to dance. They had me in the studio and they gave me the jacket and it's
00:17:39.680 | like, well, you can't, if you have the jacket and the glove, like there's not much you can do.
00:17:43.040 | - Yeah. - So I think I just danced around.
00:17:46.960 | And then they said, why don't you dance a little bit? There was a scene with me and Charles dancing
00:17:50.400 | together. - In that video?
00:17:51.760 | - They did not use it in the video, but we recorded it.
00:17:53.680 | - I don't remember. - Yeah, yeah, no, it was pretty funny.
00:17:56.800 | And Charles, who has this beautiful, wonderful voice, doesn't really sing. He's not really a
00:18:03.520 | singer. And so that was why I designed the song with him doing a spoken section and me doing the
00:18:08.000 | singing section. - Yeah, it's very like Barry White.
00:18:09.440 | - Yeah, just smooth baritone. Yeah, yeah, it's great.
00:18:12.480 | - Yeah, it was awesome. So one of the other things Charles said is that, you know, everyone
00:18:19.200 | knows you as like a super nice guy, super passionate about teaching and so on. What he said,
00:18:26.400 | don't know if it's true, that despite the fact that you're, you are super--
00:18:30.640 | - Killed a man in cold blood. (laughing)
00:18:32.320 | Like, okay, all right, I will admit this finally for the first time, that was me.
00:18:36.240 | - It's the Johnny Cash song, "Killed a man in Reno just to watch him die."
00:18:41.760 | That you actually do have some strong opinions on some topics. So if this, in fact, is true, what
00:18:47.840 | strong opinions would you say you have? Is there ideas you think, maybe in artificial
00:18:55.120 | intelligence, machine learning, maybe in life, that you believe is true that others might,
00:19:01.200 | you know, some number of people might disagree with you on?
00:19:05.280 | - So I try very hard to see things from multiple perspectives.
00:19:09.760 | There's this great Calvin and Hobbes cartoon where, do you know?
00:19:15.680 | - Yeah.
00:19:16.240 | - Okay, so Calvin's dad is always kind of a bit of a foil and he talked Calvin into,
00:19:21.440 | Calvin had done something wrong, the dad talks him into like seeing it from another perspective
00:19:25.440 | and Calvin, like this breaks Calvin because he's like, oh my gosh, now I can see the opposite
00:19:30.560 | sides of things and so it becomes like a cubist cartoon where there is no front and back,
00:19:36.000 | everything's just exposed and it really freaks him out and finally he settles back down. It's like,
00:19:39.840 | oh good, now I can make that go away. But like I'm that, I live in that world where I'm trying
00:19:44.880 | to see everything from every perspective all the time. So there are some things that I've
00:19:48.400 | formed opinions about that would be harder, I think, to disavow me of. One is the super
00:19:56.240 | intelligence argument and the existential threat of AI is one where I feel pretty confident in my
00:20:02.720 | feeling about that one. Like I'm willing to hear other arguments but like I am not particularly
00:20:07.840 | moved by the idea that if we're not careful, we will accidentally create a super intelligence that
00:20:13.760 | will destroy human life. - Let's talk about that,
00:20:16.160 | let's get you in trouble and record your own video. It's like Bill Gates, I think he said like
00:20:22.240 | some quote about the internet that that's just gonna be a small thing, it's not gonna really go
00:20:26.880 | anywhere. And I think Steve Ballmer said, I don't know why I'm sticking on Microsoft, that's
00:20:33.600 | something that like smartphones are useless, there's no reason why Microsoft should get into
00:20:38.960 | smartphones, that kind of. So let's talk about AGI. As AGI is destroying the world, we'll look
00:20:44.640 | back at this video and see. No, I think it's really interesting to actually talk about because nobody
00:20:49.280 | really knows the future so you have to use your best intuition, it's very difficult to predict it.
00:20:54.080 | But you have spoken about AGI and the existential risks around it and sort of based on your
00:21:01.280 | intuition that we're quite far away from that being a serious concern relative to the other
00:21:08.480 | concerns we have. Can you maybe unpack that a little bit? - Yeah, sure, sure, sure. So as I
00:21:15.120 | understand it, for example, I read Bostrom's book and a bunch of other reading material about this
00:21:22.080 | sort of general way of thinking about the world and I think the story goes something like this,
00:21:26.320 | that we will at some point create computers that are smart enough that they can help design the
00:21:35.200 | next version of themselves, which itself will be smarter than the previous version of themselves,
00:21:41.360 | and eventually bootstrapped up to being smarter than us, at which point we are essentially at
00:21:47.920 | the mercy of this sort of more powerful intellect, which in principle we don't have any control over
00:21:55.600 | what its goals are. And so if its goals are at all out of sync with our goals, like for example,
00:22:03.440 | the continued existence of humanity, we won't be able to stop it. It'll be way more powerful than
00:22:09.840 | us and we will be toast. So there's some, I don't know, very smart people who have signed on to that
00:22:17.120 | story and it's a compelling story. I once, now I can really get myself in trouble, I once wrote
00:22:23.760 | an op-ed about this, specifically responding to some quotes from Elon Musk, who has been on this
00:22:29.440 | very podcast more than once. - AI summoning the demon, I forget. - That's a thing he said,
00:22:36.800 | but then he came to Providence, Rhode Island, which is where I live, and said to the governors
00:22:42.160 | of all the states, "You're worried about entirely the wrong thing. You need to be worried about AI.
00:22:47.360 | You need to be very, very worried about AI." And journalists kind of reacted to that and they
00:22:53.600 | wanted to get people's take. And I was like, "Okay." My belief is that one of the things that
00:23:00.320 | makes Elon Musk so successful and so remarkable as an individual is that he believes in the power
00:23:06.400 | of ideas. He believes that you can have, you can, if you have a really good idea for getting into
00:23:11.600 | space, you can get into space. If you have a really good idea for a company or for how to
00:23:15.600 | change the way that people drive, you just have to do it and it can happen. It's really natural
00:23:22.160 | to apply that same idea to AI. You see these systems that are doing some pretty remarkable
00:23:26.640 | computational tricks, demonstrations, and then to take that idea and just push it all the way to the
00:23:33.840 | limit and think, "Okay, where does this go? Where is this going to take us next?" And if you're a
00:23:38.240 | deep believer in the power of ideas, then it's really natural to believe that those ideas could
00:23:43.680 | be taken to the extreme and kill us. So I think his strength is also his undoing because that
00:23:51.360 | doesn't mean it's true. It doesn't mean that that has to happen, but it's natural for him to think
00:23:55.920 | that. - So another way to phrase the way he thinks, and I find it very difficult to argue
00:24:03.440 | with that line of thinking. So Sam Harris is another person from a neuroscience perspective
00:24:08.560 | that thinks like that, is saying, "Well, is there something fundamental in the physics of the
00:24:16.240 | universe that prevents this from eventually happening?" And Nick Bostrom thinks in the same
00:24:22.000 | way, that kind of zooming out, "Yeah, okay, we humans now are existing in this time scale of
00:24:30.000 | minutes and days, and so our intuition is in this time scale of minutes, hours, and days,
00:24:35.680 | but if you look at the span of human history, is there any reason you can't see this in 100 years?
00:24:44.160 | And is there something fundamental about the laws of physics that prevent this? And if it doesn't,
00:24:50.480 | then it eventually will happen, or we will destroy ourselves in some other way." And it's very
00:24:56.000 | difficult, I find, to actually argue against that. - Yeah, me too. - And not sound like you're just
00:25:08.640 | rolling your eyes, "Ugh, I have--" - It's science fiction, we don't have to think about it. - But
00:25:12.880 | even worse than that, which is like, I don't have kids, but I gotta pick up my kids now. - I see,
00:25:19.040 | there's more pressing short-term-- - Yeah, there's more pressing short-term things that
00:25:23.280 | stop it with this existential crisis, we have much shorter things, like now, especially this year,
00:25:27.840 | there's COVID, so any kind of discussion like that is, there's pressing things today. And then,
00:25:36.800 | so the Sam Harris argument, well, any day, the exponential singularity can occur,
00:25:44.080 | and it's very difficult to argue against. I mean, I don't know. - But part of his story is also,
00:25:48.000 | he's not gonna put a date on it. It could be in 1,000 years, it could be in 100 years,
00:25:52.560 | it could be in two years. It's just that as long as we keep making this kind of progress,
00:25:56.320 | it ultimately has to become a concern. I kind of am on board with that, but the thing that,
00:26:02.000 | the piece that I feel like is missing from that way of extrapolating from the moment that we're in
00:26:07.200 | is that I believe that in the process of actually developing technology that can really get around
00:26:13.120 | in the world and really process and do things in the world in a sophisticated way, we're gonna learn
00:26:18.480 | a lot about what that means, which that we don't know now, 'cause we don't know how to do this
00:26:23.120 | right now. If you believe that you can just turn on a deep learning network and it eventually,
00:26:27.680 | give it enough compute and it'll eventually get there, well, sure, that seems really scary,
00:26:31.200 | because we won't be in the loop at all. We won't be helping to design or target these kinds of
00:26:37.680 | systems. But I don't see that, that feels like it is against the laws of physics, because these
00:26:44.160 | systems need help, right? They need to surpass the difficulty, the wall of complexity that happens
00:26:51.440 | in arranging something in the form that that will happen in. Like I believe in evolution. Like I
00:26:57.600 | believe that there's an argument, right? So there's another argument, just to look at it from a
00:27:02.480 | different perspective, that people say, "Well, I don't believe in evolution. How could evolution,
00:27:06.240 | it's sort of like a random set of parts assemble themselves into a 747, and that could just never
00:27:13.680 | happen." So it's like, okay, that's maybe hard to argue against, but clearly 747s do get assembled,
00:27:19.840 | they get assembled by us. Basically the idea being that there's a process by which we will get to the
00:27:26.080 | point of making technology that has that kind of awareness. And in that process, we're gonna learn
00:27:31.520 | a lot about that process, and we'll have more ability to control it or to shape it or to build
00:27:37.200 | it in our own image. It's not something that is gonna spring into existence like that 747,
00:27:43.120 | and we're just gonna have to contend with it completely unprepared.
00:27:46.640 | - That's very possible that in the context of the long arc of human history, it will in fact spring
00:27:53.920 | into existence. But that springing might take, like if you look at nuclear weapons, like even
00:28:00.800 | 20 years is a springing in the context of human history. And it's very possible, just like with
00:28:07.440 | nuclear weapons, that we could have, I don't know what percentage you wanna put at it, but
00:28:12.480 | the possibility of-- - Could have knocked ourselves out.
00:28:14.400 | - Yeah, the possibility of human beings destroying themselves in the 20th century,
00:28:18.480 | with nuclear weapons, I don't know, if you really think through it, you could really put it close
00:28:24.560 | to like, I don't know, 30, 40%, given like the certain moments of crisis that happen. So like,
00:28:30.800 | I think one, like fear in the shadows that's not being acknowledged, is it's not so much the AI
00:28:40.080 | will run away, is that as it's running away, we won't have enough time to think through how to
00:28:48.880 | stop it. - Right, fast takeoff or foom.
00:28:51.440 | - Yeah, I mean, my much bigger concern, I wonder what you think about it, which is,
00:28:56.240 | we won't know it's happening. So I kind of-- - That argument, yeah.
00:29:03.120 | - Think that there's an AGI situation already happening with social media, that our minds,
00:29:10.720 | our collective intelligence of human civilization is already being controlled by an algorithm.
00:29:15.120 | And like, we're already super, like the level of a collective intelligence, thanks to Wikipedia,
00:29:22.800 | people should donate to Wikipedia to feed the AGI. - Man, if we had a super intelligence that
00:29:28.240 | was in line with Wikipedia's values, that it's a lot better than a lot of other things I can
00:29:34.080 | imagine. I trust Wikipedia more than I trust Facebook or YouTube, as far as trying to do
00:29:39.520 | the right thing from a rational perspective. - Yeah.
00:29:41.680 | - Now that's not where you were going, I understand that, but it does strike me that
00:29:45.120 | there's sort of smarter and less smart ways of exposing ourselves to each other on the internet.
00:29:50.960 | - Yeah, the interesting thing is that Wikipedia and social media are very different forces,
00:29:55.360 | you're right, I mean, Wikipedia, if AGI was Wikipedia, it'd be just like this cranky,
00:30:01.680 | overly competent editor of articles. There's something to that, but the social media aspect
00:30:09.360 | is not, so the vision of AGI is as a separate system that's super intelligent,
00:30:16.240 | that's one key little thing. I mean, there's the paperclip argument
00:30:20.880 | that's super dumb, but super powerful systems. But with social media, you have a relatively,
00:30:26.960 | like algorithms we may talk about today, very simple algorithms that when,
00:30:32.480 | so something Charles talks a lot about, which is interactive AI, when they start having at scale,
00:30:39.760 | like tiny little interactions with human beings, they can start controlling these human beings.
00:30:44.480 | So a single algorithm can control the minds of human beings slowly, to what we might not
00:30:50.240 | realize, it could start wars, it could start, it could change the way we think about things.
00:30:56.000 | It feels like in the long arc of history, if I were to sort of zoom out from all the outrage
00:31:02.880 | and all the tension on social media, that it's progressing us towards better and better things.
00:31:09.200 | It feels like chaos and toxic and all that kind of stuff, but--
00:31:13.440 | - It's chaos and toxic, yeah.
00:31:14.880 | - But it feels like actually the chaos and toxic is similar to the kind of debates we had
00:31:20.640 | from the founding of this country. There was a civil war that happened over that period. And
00:31:26.160 | ultimately it was all about this tension of like, something doesn't feel right about our
00:31:31.440 | implementation of the core values we hold as human beings, and they're constantly struggling with
00:31:36.080 | this. And that results in people calling each other, just being shady to each other on Twitter.
00:31:44.880 | But ultimately the algorithm is managing all that. And it feels like there's a possible future in
00:31:50.720 | which that algorithm controls us into the direction of self-destruction, whatever that looks like.
00:31:58.640 | - Yeah, so, all right, I do believe in the power of social media to screw us up royally. I do
00:32:04.880 | believe in the power of social media to benefit us too. I do think that we're in a, yeah, it's
00:32:11.680 | sort of almost got dropped on top of us. And now we're trying to, as a culture, figure out how to
00:32:15.840 | cope with it. There's a sense in which, I don't know, there's some arguments that say that, for
00:32:22.160 | example, I guess, college-age students now, late college-age students now, people who were in
00:32:26.960 | middle school when social media started to really take off, may be really damaged. Like, this may
00:32:34.240 | have really hurt their development in a way that we don't have all the implications of quite yet.
00:32:38.880 | That's the generation who, if, and I hate to make it somebody else's responsibility, but like,
00:32:46.240 | they're the ones who can fix it. They're the ones who can figure out how do we keep the good
00:32:52.000 | of this kind of technology without letting it eat us alive? And if they're successful,
00:32:59.120 | we move on to the next phase, the next level of the game. If they're not successful, then, yeah,
00:33:04.640 | then we're going to wreck each other. We're going to destroy society.
00:33:07.840 | - So you're going to, in your old age, sit on a porch and watch the world burn
00:33:11.840 | because of the TikTok generation that-
00:33:14.480 | - I believe, well, so this is my kid's age, right? And it's certainly my daughter's age,
00:33:18.800 | and she's very tapped in to social stuff, but she's also, she's trying to find that balance,
00:33:24.080 | right, of participating in it and in getting the positives of it, but without letting it
00:33:28.000 | eat her alive. And I think sometimes she ventures, I hope she doesn't watch this,
00:33:33.680 | sometimes I think she ventures a little too far and is consumed by it, and other times she gets
00:33:40.080 | a little distance. And if there's enough people like her out there, they're going to navigate
00:33:46.800 | these choppy waters. - That's an interesting skill,
00:33:50.960 | actually, to develop. I talked to my dad about it. I've now, somehow, this podcast in particular,
00:33:58.960 | but other reasons, has received a little bit of attention. And with that, apparently, in this
00:34:05.200 | world, even though I don't shut up about love and I'm just all about kindness, I have now a little
00:34:11.680 | mini army of trolls. It's kind of hilarious, actually, but it also doesn't feel good. But
00:34:18.560 | it's a skill to learn to not look at that, to moderate, actually, how much you look at that.
00:34:25.760 | The discussion I have with my dad is similar to, it doesn't have to be about trolls, it could be
00:34:30.320 | about checking email, which is, if you're anticipating, my dad runs a large institute
00:34:37.680 | at Drexel University, and there could be stressful emails you're waiting, there's drama of some
00:34:43.680 | kinds. And so there's a temptation to check the email, if you send an email, and that pulls you
00:34:50.320 | in into, it doesn't feel good. And it's a skill that he actually complains that he hasn't learned,
00:34:57.040 | I mean, he grew up without it, so he hasn't learned the skill of how to shut off the internet
00:35:02.400 | and walk away. And I think young people, while they're also being, quote unquote, damaged by
00:35:09.520 | being bullied online, all of those stories, which are very horrific, you basically can't escape your
00:35:14.800 | bullies these days when you're growing up. But at the same time, they're also learning that skill
00:35:19.760 | of how to be able to shut off, disconnect with it, be able to laugh at it, not take it too seriously.
00:35:28.080 | It's fascinating. We're all trying to figure this out, just like you said, it's been dropped on us,
00:35:31.840 | and we're trying to figure it out. - Yeah, I think that's really
00:35:34.240 | interesting. And I guess I've become a believer in the human design, which I feel like I don't
00:35:40.880 | completely understand. How do you make something as robust as us? We're so flawed in so many ways,
00:35:47.200 | and yet, and yet, we dominate the planet, and we do seem to manage to get ourselves out of scrapes,
00:35:56.000 | eventually. Not necessarily the most elegant possible way, but somehow we get to the next step.
00:36:02.240 | And I don't know how I'd make a machine do that. Generally speaking, like if I train one of my
00:36:09.600 | reinforcement learning agents to play a video game, and it works really hard on that first stage over
00:36:13.920 | and over and over again, and it makes it through, it succeeds on that first level. And then the new
00:36:18.160 | level comes, and it's just like, okay, I'm back to the drawing board. And somehow humanity, we keep
00:36:22.400 | leveling up, and then somehow managing to put together the skills necessary to achieve success,
00:36:29.120 | some semblance of success in that next level too. And I hope we can keep doing that.
00:36:35.840 | - You mentioned reinforcement learning. So you have a couple of years in the field. No.
00:36:41.920 | - (laughs)
00:36:43.040 | - Quite a few. Quite a long career in artificial intelligence broadly, but
00:36:50.080 | reinforcement learning specifically. Can you maybe give a hint about your sense of the history of the
00:36:57.600 | field? In some ways it's changed with the advent of deep learning, but has a long roots. How has it
00:37:04.560 | weaved in and out of your own life? How have you seen the community change, or maybe the ideas
00:37:09.680 | that it's playing with change? - I've had the privilege, the pleasure,
00:37:14.240 | of having almost a front row seat to a lot of this stuff, and it's been really, really fun and
00:37:18.800 | interesting. So when I was in college in the '80s, early '80s, the neural net thing was starting to
00:37:28.240 | happen. And I was taking a lot of psychology classes and a lot of computer science classes
00:37:33.440 | as a college student. And I thought, you know, something that can play tic-tac-toe and just
00:37:38.240 | learn to get better at it, that ought to be a really easy thing. So I spent almost all of my,
00:37:43.120 | what would have been vacations during college, hacking on my home computer, trying to teach it
00:37:48.400 | how to play tic-tac-toe. - Programming language.
00:37:50.320 | - Basic. Oh yeah, that's my first language. That's my native language.
00:37:54.960 | - Is that when you first fell in love with computer science, just like programming basic on that?
00:37:59.360 | What was the computer? Do you remember? - I had a TRS-80 Model 1 before they were
00:38:05.680 | called Model 1s, 'cause there was nothing else. I got my computer in 1979. So I would have been
00:38:18.000 | bar mitzvahed, but instead of having a big party that my parents threw on my behalf, they just got
00:38:22.960 | me a computer, 'cause that's what I really, really, really wanted. I saw them in the mall
00:38:26.800 | in Radarshack, and I thought, what, how are they doing that? I would try to stump them. I would
00:38:32.080 | give them math problems, like one plus, and then in parentheses, two plus one. And I would always
00:38:37.040 | get it right. I'm like, how do you know so much? Like, I've had to go to algebra class for the
00:38:42.320 | last few years to learn this stuff, and you just seem to know. So I was smitten, and I got a
00:38:48.240 | computer. And I think ages 13 to 15, I have no memory of those years. I think I just was in my
00:38:55.760 | room with the computer-- - Listening to Billy Joel.
00:38:57.920 | - Communing, possibly listening to the radio, listening to Billy Joel. That was the one album
00:39:02.320 | I had on vinyl at that time. And then I got it on cassette tape, and that was really helpful,
00:39:08.960 | 'cause then I could play it. I didn't have to go down to my parents' Wi-Fi, or Hi-Fi, sorry.
00:39:12.720 | And at age 15, I remember kind of walking out, and like, okay, I'm ready to talk to people again.
00:39:19.440 | Like, I've learned what I need to learn here. And so yeah, so that was my home computer. And so I
00:39:25.680 | went to college, and I was like, oh, I'm totally gonna study computer science. And I opted, the
00:39:29.600 | college I chose specifically had a computer science major. The one that I really wanted,
00:39:33.920 | the college I really wanted to go to didn't, so bye-bye to them.
00:39:37.200 | - Which college did you go to? - So I went to Yale. Princeton would
00:39:41.280 | have been way more convenient, and it was just a beautiful campus, and it was close enough to home,
00:39:45.200 | and I was really excited about Princeton. And I visited, and I said, so, computer science major.
00:39:49.040 | They're like, well, we have computer engineering. I'm like, oh, I don't like that word,
00:39:52.320 | engineering. (both laughing)
00:39:54.800 | I like computer science. I really, I wanna do, like, you're saying hardware and software?
00:39:58.880 | They're like, yeah. I'm like, I just wanna do software. I couldn't care less about hardware.
00:40:01.680 | - And you grew up in Philadelphia? - I grew up outside Philly, yeah, yeah.
00:40:04.320 | - Okay, wow. - So the local schools were like Penn
00:40:08.160 | and Drexel and Temple. Like, everyone in my family went to Temple at least at one point in their lives
00:40:13.680 | except for me. So yeah, Philly family. - Yale had a computer science department,
00:40:18.800 | and that's when you, it's kinda interesting you said '80s and neural networks. That's when
00:40:23.280 | the neural networks was a hot new thing or a hot thing, period. So what, is that in college when
00:40:28.800 | you first learned about neural networks? - Yeah, yeah.
00:40:30.800 | - Or was she like, how did you-- - And it was in a psychology class,
00:40:32.960 | not in a CS class. - Oh, wow.
00:40:34.400 | - Yeah. - Was it psychology
00:40:35.680 | or cognitive science, or like, do you remember like what context?
00:40:38.720 | - It was, yeah, yeah, yeah. So I was a, I've always been a bit of a cognitive psychology groupie.
00:40:44.560 | So like, I study computer science, but I like to hang around where the cognitive scientists are,
00:40:50.480 | 'cause I don't know, brains, man. They're like, they're wacky, cool.
00:40:54.640 | - And they have a bigger picture view of things. They're a little less engineering, I would say.
00:41:00.480 | They're more interested in the nature of cognition and intelligence and perception
00:41:04.400 | and how the vision system works. They're asking always bigger questions. Now with the deep learning
00:41:09.920 | community, they're I think more, there's a lot of intersections, but I do find that the
00:41:16.560 | neuroscience folks actually and cognitive psychology, cognitive science folks are starting
00:41:23.600 | to learn how to program, how to use artificial neural networks. And they are actually approaching
00:41:29.040 | problems in like totally new, interesting ways. It's fun to watch that grad students from those
00:41:33.360 | departments, like approach a problem of machine learning.
00:41:36.960 | - Right, they come in with a different perspective.
00:41:38.880 | - Yeah, they don't care about like your ImageNet data set or whatever. They want like to understand
00:41:44.640 | the basic mechanisms at the neuronal level, at the functional level of intelligence. So it's kind
00:41:53.360 | of cool to see them work. But yeah, okay, so you were always a groupie of cognitive psychology.
00:42:00.720 | - Yeah, yeah. And so it was in a class by Richard Gerrig. He was kind of my favorite
00:42:05.920 | psych professor in college. And I took like three different classes with him.
00:42:10.560 | And yeah, so they were talking specifically, the class I think was kind of a,
00:42:15.840 | there was a big paper that was written by Steven Pinker and Prince, I'm blanking on Prince's first
00:42:23.120 | name, but Pinker and Prince, they wrote kind of a, they were at that time kind of like,
00:42:30.240 | I'm blanking on the names of the current people, the cognitive scientists who are complaining a
00:42:36.880 | lot about deep networks. - Oh, Gary.
00:42:40.080 | - Gary Marcus. - Gary Marcus.
00:42:42.160 | And who else? I mean, there's a few, but Gary is the most feisty.
00:42:47.120 | - Sure, Gary's very feisty. And with his co-author, they're kind of doing these kind
00:42:51.840 | of takedowns where they say, okay, well, yeah, it does all these amazing things, but here's a
00:42:56.320 | shortcoming, here's a shortcoming, here's a shortcoming. And so the Pinker-Prince paper
00:42:59.600 | is kind of like that generation's version of Marcus and Davis, right? Where they're trained
00:43:06.800 | as cognitive scientists, but they're looking skeptically at the results in the artificial
00:43:12.000 | intelligence neural net kind of world and saying, yeah, it can do this and this and this, but like,
00:43:16.720 | it can't do that, and it can't do that, and it can't do that. Maybe in principle, or maybe just
00:43:20.640 | in practice at this point, but the fact of the matter is you've narrowed your focus too far
00:43:26.560 | to be impressed. You're impressed with the things within that circle, but you need to broaden that
00:43:31.840 | circle a little bit. You need to look at a wider set of problems. And so I was in this seminar
00:43:37.760 | in college that was basically a close reading of the Pinker-Prince paper, which was like really
00:43:43.520 | thick, 'cause there was a lot going on in there. And it talked about the reinforcement learning
00:43:50.560 | idea a little bit. I'm like, oh, that sounds really cool, because behavior is what is really
00:43:54.480 | interesting to me about psychology anyway. So making programs that, I mean, programs are things
00:43:59.680 | that behave. People are things that behave. Like I wanna make learning that learns to behave.
00:44:04.640 | - In which way was reinforcement learning presented? Is this talking about human and
00:44:09.760 | animal behavior, or are we talking about actual mathematical constructs?
00:44:12.960 | - Ah, right, so that's a good question. Right, so this is, I think it wasn't actually talked about
00:44:18.720 | as behavior in the paper that I was reading. I think that it just talked about learning.
00:44:22.480 | And to me, learning is about learning to behave, but really neural nets at that point were about
00:44:27.600 | learning, like supervised learning, so learning to produce outputs from inputs. So I kind of
00:44:32.160 | tried to invent reinforcement learning. When I graduated, I joined a research group at Bellcore,
00:44:38.160 | which had spun out of Bell Labs recently at that time, because of the divestiture of the
00:44:43.120 | long-distance and local phone service in the 1980s, 1984. And I was in a group with Dave Ackley, who
00:44:51.440 | was the first author of the Boltzmann machine paper, so the very first neural net paper that
00:44:56.880 | could handle XOR, right? So XOR sort of killed neural nets, the very first, the zero-width order
00:45:02.960 | of neural nets. - The first winter.
00:45:04.320 | - Yeah, the Perceptrons paper, and Hinton, along with his student Dave Ackley, and I think there
00:45:12.000 | was other authors as well, showed that, no, no, no, with Boltzmann machines, we can actually learn
00:45:16.240 | nonlinear concepts, and so everything's back on the table again, and that kind of started that
00:45:21.440 | second wave of neural networks. So Dave Ackley was, he became my mentor at Bellcore, and we
00:45:27.120 | talked a lot about learning and life and computation and how all these things fit
00:45:32.000 | together. Now Dave and I have a podcast together, so I get to kind of enjoy that sort of,
00:45:39.280 | his perspective once again, even all these years later. And so I said, I was really interested in
00:45:46.720 | learning, but in the concept of behavior, and he's like, "Oh, well, that's reinforcement learning
00:45:51.440 | here," and he gave me Rich Sutton's 1984 TD paper. So I read that paper, I honestly didn't get all
00:45:58.480 | of it, but I got the idea. I got that they were using, that he was using ideas that I was familiar
00:46:03.760 | with in the context of neural nets and sort of back prop, but with this idea of making predictions
00:46:10.400 | over time. I'm like, "This is so interesting, but I don't really get all the details," I said to
00:46:14.080 | Dave. And Dave said, "Oh, well, why don't we have him come and give a talk?" And I was like, "Wait,
00:46:20.320 | what? You can do that? Like, these are real people? I thought they were just words. I thought it was
00:46:24.560 | just like ideas that somehow magically seeped into paper." He's like, "No, I know Rich. We'll
00:46:31.440 | just have him come down and he'll give a talk." And so I was, my mind was blown. And so Rich came
00:46:38.400 | and he gave a talk at Bellcore, and he talked about what he was super excited, which was they
00:46:43.120 | had just figured out at the time, Q-learning. So Watkins had visited Rich Sutton's lab at
00:46:52.320 | UMass, or Andy Barto's lab that Rich was a part of. And he was really excited about this because
00:47:00.000 | it resolved a whole bunch of problems that he didn't know how to resolve in the earlier paper.
00:47:03.680 | And so- - For people who don't know,
00:47:07.200 | TD, temporal difference, these are all just algorithms for reinforcement learning.
00:47:11.200 | - Right, and TD, temporal difference in particular is about making predictions over time. And you can
00:47:16.480 | try to use it for making decisions, right? 'Cause if you can predict how good a future action's
00:47:20.560 | outcomes will be in the future, you can choose one that has better outcomes. And, or,
00:47:25.280 | but the theory didn't really support changing your behavior. Like the predictions had to be
00:47:29.920 | of a consistent process if you really wanted it to work. And one of the things that was really cool
00:47:35.760 | about Q-learning, another algorithm for reinforcement learning, is it was off policy,
00:47:39.680 | which meant that you could actually be learning about the environment and what the value of
00:47:43.360 | different actions would be while actually figuring out how to behave optimally.
00:47:47.920 | - Yeah. - So that was a revelation.
00:47:50.080 | - Yeah, and the proof of that is kind of interesting. I mean, that's really surprising
00:47:53.280 | to me when I first read that. And then in Richard Sutton's book on the matter, it's kind of beautiful
00:47:59.920 | that a single equation can capture all- - One equation, one line of code,
00:48:03.040 | and like you can learn anything. - Yeah, like-
00:48:05.120 | - You can get enough time. - So equation and code, you're right.
00:48:08.080 | The code that you can arguably, at least if you squint your eyes, can say, "This is all of
00:48:17.920 | intelligence," is that you can implement that in a single, I think I started with Lisp, which is,
00:48:24.240 | shout out to Lisp, like a single line of code, key piece of code, maybe a couple,
00:48:30.480 | that you could do that. It's kind of magical. It feels too good to be true.
00:48:36.080 | - Well, and it sort of is. - Yeah, it's kind of-
00:48:40.080 | - It seems they require an awful lot of extra stuff supporting it. But nonetheless, the idea's
00:48:45.920 | really good. And as far as we know, it is a very reasonable way of trying to create
00:48:51.360 | adaptive behavior, behavior that gets better at something over time.
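(To make that concrete: below is a minimal sketch of tabular Q-learning in Python, the off-policy algorithm described above. The environment interface, env.reset, env.step, and env.actions, is assumed purely for illustration and does not correspond to any particular library. The temporal-difference update near the bottom is the "one equation, one line of code" being discussed; everything around it is the supporting scaffolding.)

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learns action values off-policy while
    behaving with an epsilon-greedy exploration policy."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term return

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Behave epsilon-greedily (the behavior policy)...
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # ...but learn about the greedy policy (off-policy learning):
            # this temporal-difference update is the "one line of code."
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            state = next_state
    return Q
```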
00:48:55.120 | - Did you find the idea of optimal at all compelling, that you could prove that it's
00:49:01.280 | optimal? So like one part of computer science that it makes people feel warm and fuzzy inside
00:49:08.080 | is when you can prove something like that a sorting algorithm, worst case, runs in N log N,
00:49:14.080 | and it makes everybody feel so good. Even though in reality, it doesn't really matter what the
00:49:18.320 | worst case is. What matters is like, does this thing actually work in practice on this particular
00:49:23.280 | actual set of data that I enjoy? Did you- - So here's a place where I have maybe a strong
00:49:28.800 | opinion, which is like, "You're right, of course, but no, no." So what makes worst case so great,
00:49:37.200 | right, what makes a worst-case analysis so great, is that you get modularity. You can take that
00:49:42.080 | thing and plug it into another thing and still have some understanding of what's going to happen
00:49:47.280 | when you click them together, right? If it just works well in practice, in other words, with
00:49:51.680 | respect to some distribution that you care about, when you go plug it into another thing, that
00:49:56.320 | distribution can shift, it can change, and your thing may not work well anymore. And you want it
00:50:01.040 | to, and you wish it does, and you hope that it will, but it might not, and then, ah.
00:50:05.600 | - So you're saying you don't like machine learning?
00:50:10.800 | - But we have some positive theoretical results for these things. You can come back at me with,
00:50:19.680 | "Yeah, but they're really weak," and, "Yeah, they're really weak." And you can even say that
00:50:24.480 | sorting algorithms, like if you do the optimal sorting algorithm, it's not really the one that
00:50:28.000 | you want, and that might be true as well. - But it is, the modularity is a really
00:50:33.360 | powerful statement that as an engineer, you can then assemble different things, you can count on
00:50:38.000 | them to be, I mean, it's interesting. It's a balance, like with everything else in life,
00:50:45.120 | you don't want to get too obsessed. I mean, this is what computer scientists do, which they
00:50:49.520 | tend to get obsessed, and they over-optimize things, or they start by optimizing, and then
00:50:54.880 | they over-optimize. So it's easy to get really granular about these things, but the step from
00:51:02.560 | an N squared to an N log N sorting algorithm is a big leap for most real-world systems. No matter
00:51:10.800 | what the actual behavior of the system is, that's a big leap. And the same can probably be said for
00:51:17.520 | other kind of first leaps that you would take in a particular problem. Like it's picking the low
00:51:24.720 | hanging fruit, or whatever the equivalent of doing not the dumbest thing, but the next to the dumbest
00:51:31.600 | thing. - I see, picking the most delicious
00:51:33.840 | reachable fruit. - Yeah, most delicious reachable fruit.
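To put a rough number on how big that leap is (my arithmetic, not a figure from the conversation): at $n = 10^6$ items,

$$ n^2 = 10^{12}, \qquad n \log_2 n \approx 10^6 \times 20 = 2 \times 10^7, \qquad \frac{n^2}{n \log_2 n} \approx 5 \times 10^4, $$

so the algorithmic step buys a factor of roughly fifty thousand, which dwarfs the kind of constant-factor tuning that over-optimization tends to chase.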
00:51:36.240 | - I don't know why that's not a saying. - Yeah. Okay, so then this is the '80s,
00:51:43.920 | and this kind of idea starts to percolate of learning a system.
00:51:47.200 | - Yeah, well, and at that point, I got to meet Rich Sutton, so everything was sort of downhill
00:51:51.760 | from there, and that was really the pinnacle of everything. But then I felt like I was kind of on
00:51:57.360 | the inside, so then as interesting results were happening, I could check in with Rich, or with
00:52:02.640 | Jerry Tesauro, who had a huge impact on early thinking in temporal difference learning and
00:52:09.120 | reinforcement learning, and showed that you could solve problems that we didn't know how to solve
00:52:13.680 | any other way. And so that was really cool. So as good things were happening, I would hear about it
00:52:19.440 | from either the people who were doing it, or the people who were talking to the people who were
00:52:23.120 | doing it. And so I was able to track things pretty well through the '90s. (laughs)
00:52:28.120 | - So wasn't most of the excitement on reinforcement learning in the '90s era with,
00:52:34.720 | what is it, TD-Gammon? What's the role of these kinds of little fun game-playing things and
00:52:42.720 | breakthroughs in exciting the community? What were your thoughts, 'cause you've also built,
00:52:50.000 | or were part of building, a crossword puzzle-solving program called Proverb.
00:52:59.360 | So you were interested in this as a problem, like, in using games to understand how to
00:53:09.760 | build intelligent systems. So what did you think about TD-Gammon? What did you think about that
00:53:15.280 | whole thing in the '90s? - Yeah, I mean, I found the TD-Gammon
00:53:18.400 | result really just remarkable. So I had known about some of Jerry's stuff before he did
00:53:23.040 | TD-Gammon, and he did a system, just more vanilla, well, not entirely vanilla, but a more classical
00:53:28.880 | back-proppy kind of network for playing backgammon, where he was training it on expert moves. So it
00:53:35.280 | was kind of supervised, but the way that it worked was not to mimic the actions, but to learn
00:53:42.000 | internally an evaluation function. So to learn, well, if the expert chose this over this, that
00:53:47.440 | must mean that the expert values this more than this, and so let me adjust my weights to make it
00:53:52.160 | so that the network evaluates this as being better than this. So it could learn from human preferences,
00:53:59.760 | it could learn its own preferences. And then when he took the step from that to actually doing it as
00:54:06.560 | a full-on reinforcement learning problem where you didn't need a trainer, you could just let it play,
00:54:11.840 | that was remarkable, right? And so I think as humans often do, as we've done in the recent
00:54:19.360 | past as well, people extrapolate. It's like, oh, well, if you can do that, which is obviously very
00:54:24.240 | hard, then obviously you could do all these other problems that we want to solve that we know are
00:54:30.000 | also really hard. And it turned out very few of them ended up being practical, partly because
00:54:35.840 | I think neural nets, certainly at the time, were struggling to be consistent and reliable. And so
00:54:42.880 | training them in a reinforcement learning setting was a bit of a mess. I had, I don't know,
00:54:48.480 | generation after generation of like master's students who wanted to do value function
00:54:54.960 | approximation, basically reinforcement learning with neural nets. And over and over and over again,
00:55:02.640 | we were failing. We couldn't get the good results that Jerry Tesauro got. I now believe that Jerry is
00:55:08.000 | a neural net whisperer. He has a particular ability to get neural networks to do things that
00:55:15.200 | other people would find impossible. And it's not the technology, it's the technology and Jerry
00:55:21.280 | together. - Yeah, which I think speaks to the role of the human expert in the process of machine
00:55:28.000 | learning. - Right, it's so easy. We're so drawn to the idea that it's the technology that is where
00:55:33.920 | the power is coming from that I think we lose sight of the fact that sometimes you need a really
00:55:39.040 | good, just like, I mean, no one would think, "Hey, here's this great piece of software. Here's like,
00:55:42.640 | I don't know, GNU Emacs or whatever." And doesn't that prove that computers are super powerful and
00:55:48.560 | basically going to take over the world? It's like, no, Stallman is a hell of a hacker, right? So he
00:55:52.880 | was able to make the code do these amazing things. He couldn't have done it without the computer,
00:55:57.280 | but the computer couldn't have done it without him. And so I think people discount the role of
00:56:01.920 | people like Jerry who have just a particular set of skills. - On that topic, by the way,
00:56:10.640 | as a small side note, I tweeted, "Emacs is greater than Vim" yesterday and deleted the tweet 10
00:56:18.800 | minutes later when I realized it started a war. - You were on fire. - I was like, "Oh, I was just
00:56:25.040 | kidding." I was just being- - Provocative. - Walk back and forth. So people still feel
00:56:31.440 | passionately about that particular piece of great software. - Yeah, I don't get that 'cause Emacs is
00:56:35.680 | clearly so much better. I don't understand. But why do I say that? Because I spent a block of time
00:56:41.600 | in the '80s making my fingers know the Emacs keys. And now that's part of the thought process for me.
00:56:49.920 | I need to express. And if you take my Emacs key bindings away, I become... I can't express myself.
00:56:59.520 | - I'm the same way with the, I don't know if you know what it is, but a Kinesis keyboard,
00:57:03.360 | which is this butt-shaped keyboard. - Yes, I've seen them. They're very,
00:57:08.080 | I don't know, sexy, elegant? They're just beautiful. - Yeah, they're gorgeous, way too expensive. But
00:57:17.520 | the problem with them, similar with Emacs, is once you learn to use it- - It's harder to use
00:57:25.200 | other things. - It's hard to use other things. There's this absurd thing where I have small,
00:57:29.280 | elegant, lightweight, beautiful little laptops, and I'm sitting there in a coffee shop with a
00:57:34.000 | giant Kinesis keyboard and a sexy little laptop. It's absurd. I used to feel bad about it, but at
00:57:41.440 | the same time, you just have to... Sometimes it's back to the Billy Joel thing. You just have to
00:57:46.400 | throw that Billy Joel record and throw Taylor Swift and Justin Bieber to the wind.
00:57:51.520 | - See, but I like them now because, again, I have no musical taste. Now that I've heard
00:57:57.600 | Justin Bieber enough, I'm like, "I really like his songs." And Taylor Swift, not only do I like
00:58:02.480 | her songs, but my daughter's convinced that she's a genius, and so now I basically am signed on to
00:58:06.640 | that. - Interesting. So yeah, that speaks back to the robustness of the human brain. That speaks
00:58:12.000 | to the neuroplasticity that you can just, like a mouse, teach yourself to, or probably a dog,
00:58:19.120 | teach yourself to enjoy Taylor Swift. I'll try it out. I don't know. I tried. You know what? It has
00:58:25.680 | to do with just like acclimation, right? Just like you said, a couple of weeks. That's an interesting
00:58:30.480 | experiment. I'll actually try that. - That wasn't the intent of the experiment? Just like social
00:58:34.240 | media, it wasn't intended as an experiment to see what we can take as a society, but it turned out
00:58:38.720 | that way. - I don't think I'll be the same person on the other side of the week listening to Taylor
00:58:42.880 | Swift, but let's try. - No, it's more compartmentalized. Don't be so worried. I get that
00:58:48.240 | you can be worried, but don't be so worried because we compartmentalize really well, and so
00:58:51.920 | it won't bleed into other parts of your life. You won't start, I don't know, wearing red lipstick or
00:58:56.960 | whatever. It's fine. It's fine. - Change fashion and everything. - It's fine, but you know what?
00:59:01.120 | The thing you have to watch out for is you'll walk into a coffee shop once we can do that again.
00:59:04.800 | - And recognize the song. - And you'll be, no, you won't know that you're singing along.
00:59:09.120 | Until everybody in the coffee shop is looking at you, and then you're like, that wasn't me.
00:59:13.840 | - Yeah, that's the, you know, people are afraid of AGI. I'm afraid of the Taylor--
00:59:19.200 | - The Taylor Swift takeover. - Yeah, and I mean, people should know that TD-Gammon
00:59:24.720 | was, I guess, would you call it, do you like the terminology of self-play by any chance?
00:59:30.800 | - Sure. - So like systems that learn
00:59:32.480 | by playing themselves? Just, I don't know if it's the best word, but-- - So what's
00:59:38.480 | the problem with that term? - I don't know. Silly, it's like the big bang, like, it's like
00:59:45.440 | talking to serious physicists, do you like the term big bang? When it was early, I feel like it's
00:59:50.160 | the early days of self-play. I don't know, maybe it was used previously, but I think it's been used
00:59:55.520 | by only a small group of people. And so like, I think we're still deciding, is this ridiculously
01:00:00.880 | silly name a good name for the concept, potentially one of the most important concepts in artificial
01:00:06.400 | intelligence? - Well, okay, it depends how broadly you apply the term. So I used the term in my 1996
01:00:11.280 | PhD dissertation. - Oh, you, wow, the actual terms of-- - Yeah, because Tesauro's paper was something
01:00:17.600 | like training up an expert backgammon player through self-play. So I think it was in the
01:00:22.400 | title of his paper. - Oh, okay. - If not in the title, it was definitely a term that he used.
01:00:26.320 | There's another term that we got from that work is rollout. So I don't know if you, do you ever
01:00:30.720 | hear the term rollout? That's a backgammon term that has now applied generally in computers.
01:00:36.080 | Well, at least in AI because of TD-Gammon. - That's fascinating. - So how is self-play being
01:00:42.480 | used now? And like, why is it, does it feel like a more general powerful concept? Sort of the idea of,
01:00:47.680 | well, the machine's just gonna teach itself to be smart. - Yeah, so that's where, maybe you can
01:00:52.320 | correct me, but that's where the continuation of the spirit and actually like literally the exact
01:00:58.960 | algorithms of TD-Gammon are applied by DeepMind and OpenAI to learn games that are a little bit
01:01:05.680 | more complex. When I was learning artificial intelligence, Go was presented to me in
01:01:11.280 | Artificial Intelligence: A Modern Approach. I don't know if they explicitly pointed to Go
01:01:16.000 | in those books as like unsolvable kind of thing, like implying that these approaches hit their
01:01:23.840 | limit in this, with these particular kind of games. So something, I don't remember if the book said it
01:01:28.800 | or not, but something in my head, if it was the professors, instilled in me the idea like this
01:01:34.640 | is the limits of artificial intelligence of the field. Like it instilled in me the idea that if we
01:01:41.360 | can create a system that can solve the game of Go, we've achieved AGI. That was kind of, it wasn't
01:01:47.280 | explicitly said like that, but that was the feeling. And so I was one of the people to whom it seemed
01:01:53.440 | magical when a learning system was able to beat a human world champion at the game of Go
01:02:02.240 | and even more so from that, that was AlphaGo, even more so with AlphaGo Zero, then kind of renamed
01:02:09.440 | and advanced into AlphaZero, beating a world champion or world-class player without any
01:02:18.160 | supervised learning on expert games, doing it only by playing itself. So that is,
01:02:27.520 | I don't know what to make of it. I think it would be interesting to hear what your opinions are on
01:02:32.880 | just how exciting, surprising, profound, interesting, or boring the breakthrough performance of AlphaZero
01:02:44.240 | was. - Okay, so AlphaGo knocked my socks off. That was so remarkable. - Which aspect of it?
01:02:52.800 | - That they got it to work, that they actually were able to leverage a whole bunch of different
01:02:58.400 | ideas, integrate them into one giant system. Just the software engineering aspect of it is
01:03:03.360 | mind-blowing. I've never been a part of a program as complicated as the program that they built for
01:03:08.480 | that. And just the, like Jerry Tesauro is a neural net whisperer, like David Silver is a kind of
01:03:16.160 | neural net whisperer too. He was able to coax these networks and these new way out there architectures
01:03:22.240 | to do these, to solve these problems that, as you said, when we were learning AI,
01:03:29.440 | no one had an idea how to make it work. It was remarkable that these techniques that were so
01:03:39.120 | good at playing chess and that could beat the world champion in chess, couldn't beat your typical
01:03:44.000 | Go playing teenager in Go. So the fact that in a very short number of years, we kind of ramped up
01:03:50.560 | to trouncing people in Go, just blew me away. - So you're kind of focusing on the engineering
01:03:57.920 | aspect, which is also very surprising. I mean, there's something different about large,
01:04:03.600 | well-funded companies. I mean, there's a compute aspect to it too. - Sure.
01:04:07.200 | - Like that, of course, I mean, that's similar to Deep Blue, right? With IBM. Like there's something
01:04:15.360 | important to be learned and remembered about a large company, taking the ideas that are already
01:04:21.280 | out there and investing a few million dollars into it or more. And so you're kind of saying
01:04:28.800 | the engineering is kind of fascinating, both on the, with AlphaGo is probably just gathering all
01:04:34.640 | the data, right? Of the expert games, like organizing everything, actually doing distributed
01:04:41.280 | supervised learning. And to me, see, the engineering I kind of took for granted.
01:04:47.840 | To me, philosophically, what was impressive was being able to persist in the face of long odds, because it feels like
01:04:59.040 | for me, I'll be one of the skeptical people in the room thinking that you can learn your way to beat
01:05:04.400 | Go. Like it sounded like, especially with David Silver, it sounded like David was not confident
01:05:10.400 | at all. It's like, it was like, not, it's funny how confidence works. - Yeah.
01:05:18.400 | - It's like, you're not like cocky about it, like, but- - Right, 'cause if you're cocky about it,
01:05:25.680 | you kind of stop and stall and don't get anywhere. - Yeah, but there's like a hope that's unbreakable.
01:05:31.440 | Maybe that's better than confidence. It's a kind of wishful hope in a little dream,
01:05:36.160 | and you almost don't want to do anything else. You kind of keep doing it. That seems to be the
01:05:41.840 | story. - But with enough skepticism that you're looking for where the problems are and fighting
01:05:47.280 | through them. - Yeah.
01:05:48.160 | - 'Cause you know, there's gotta be a way out of this thing. Yeah.
01:05:50.720 | - And for him, it was probably, there's a bunch of little factors that come into play. It's funny
01:05:56.080 | how these stories just all come together. Like everything he did in his life came into play,
01:06:00.400 | which is like a love for video games and also a connection to, so the '90s had to happen with
01:06:07.200 | TD Gammon and so on. - Yeah.
01:06:08.640 | - And in some ways it's surprising, maybe you can provide some intuition to it, that not much more
01:06:14.640 | than TD Gammon was done for quite a long time on the reinforcement learning front.
01:06:19.120 | - Yeah. - Is that weird to you?
01:06:20.560 | - I mean, like I said, the students who I worked with, we tried to get,
01:06:25.600 | basically apply that architecture to other problems and we consistently failed. There were a couple
01:06:31.200 | really nice demonstrations that ended up being in the literature. There was a paper about
01:06:36.880 | controlling elevators, right? Where it's like, okay, can we modify the heuristic that elevators
01:06:42.720 | use for deciding, like a bank of elevators for deciding which floors we should be stopping on to
01:06:47.360 | maximize throughput essentially. And you can set that up as a reinforcement learning problem and
01:06:52.240 | you can have a neural net represent the value function so that it's taking where are all the
01:06:56.960 | elevators, where are the button pushes, this high dimensional, well, at the time, high dimensional
01:07:02.080 | input, a couple of dozen dimensions and turn that into a prediction as to, oh, is it gonna be better
01:07:08.640 | if I stop at this floor or not? And ultimately it appeared as though for the standard simulation
01:07:15.920 | distribution for people trying to leave the building at the end of the day, that the neural
01:07:19.600 | net learned a better strategy than the standard one that's implemented in elevator controllers.
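The elevator work being recalled is, as far as I know, the Crites and Barto elevator-dispatch study; the sketch below is a heavily simplified stand-in for that kind of formulation, not their actual system. Every name, the toy dynamics, and the reward are invented, but it shows the shape of the problem: a button-and-position state, a stop-or-pass decision, and a reward that penalizes people left waiting.

import random

NUM_FLOORS = 10

def initial_state():
    # One elevator car plus which hall-call buttons are currently lit.
    return {"car_floor": 0, "hall_calls": [False] * NUM_FLOORS}

def step(state, action):
    # Toy dynamics: the car advances one floor; "stop" clears that floor's call.
    state = {"car_floor": (state["car_floor"] + 1) % NUM_FLOORS,
             "hall_calls": list(state["hall_calls"])}
    if action == "stop":
        state["hall_calls"][state["car_floor"]] = False
    if random.random() < 0.1:                      # a new passenger arrives somewhere
        state["hall_calls"][random.randrange(NUM_FLOORS)] = True
    # Reward: the fewer lit buttons, the better; maximizing return means
    # minimizing waiting (a crude stand-in for the real waiting-time cost).
    return state, -sum(state["hall_calls"])

def features(state):
    # The vector a value network would see (here 11 numbers; the real system
    # used a couple of dozen dimensions).
    return [state["car_floor"] / NUM_FLOORS] + [float(b) for b in state["hall_calls"]]

A function approximator for the value of "stop" versus "pass" would then be trained on features(state) with exactly the kind of temporal difference update shown earlier.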
01:07:24.000 | So that was nice. There was some work that Satinder Singh et al did on handoffs with cell
01:07:32.320 | phones, deciding when should you hand off from this cell tower to this cell tower.
01:07:38.080 | - Oh, okay, communication networks, yeah.
01:07:39.760 | - Yeah, and so a couple of things seemed like they were really promising. None of them made it into
01:07:45.040 | production that I'm aware of. And neural nets as a whole started to kind of implode around then.
01:07:50.160 | And so there just wasn't a lot of air in the room for people to try to figure out, okay,
01:07:55.040 | how do we get this to work in the RL setting? - And then they found their way back in 10 plus
01:08:02.080 | years. So you said AlphaGo was impressive, like it's a big spectacle. Is there-
01:08:06.720 | - Right, so then AlphaZero, so I think I may have a slightly different opinion on this than some
01:08:12.000 | people. So I talked to Satinder Singh in particular about this. So Satinder was, like
01:08:17.440 | Rich Sutton, a student of Andy Barto. So they came out of the same lab, very influential machine
01:08:23.120 | learning, reinforcement learning researcher, now at DeepMind, as is Rich, though different sites,
01:08:30.800 | the two of them. - He's in Alberta.
01:08:32.560 | - Rich is in Alberta and Satinder is in England, I think, having moved there from
01:08:37.280 | Michigan at the moment. But he was, yes, he was much more impressed with AlphaGo Zero, which
01:08:47.120 | didn't get a kind of bootstrap in the beginning with human-trained games,
01:08:51.760 | just was purely self-play. Though the first one, AlphaGo, was also a tremendous amount of self-play.
01:08:57.760 | Right, they started off, they kick-started the action network that was making decisions,
01:09:02.320 | but then they trained it for a really long time using more traditional temporal difference
01:09:06.400 | methods. So as a result, it didn't seem that different to me. It seems like, yeah,
01:09:13.760 | why wouldn't that work? Once it works, it works. But he found that removal of that extra information
01:09:22.240 | to be breathtaking. That's a game changer. To me, the first thing was more of a game changer.
01:09:27.440 | - But the open question, I mean, I guess that's the assumption, is the expert games might contain
01:09:34.160 | within them a humongous amount of information. - But we know that it went beyond that, right?
01:09:40.960 | We know that it somehow got away from that information because it was learning strategies.
01:09:44.960 | I don't think AlphaGo is just better at implementing human strategies. I think it
01:09:50.640 | actually developed its own strategies that were more effective. And so from that perspective,
01:09:56.000 | okay, well, so it made at least one quantum leap in terms of strategic knowledge. Okay,
01:10:02.640 | so now maybe it makes three. Okay, but that first one is the doozy, right? Getting it to work
01:10:09.520 | reliably and for the networks to hold onto the value well enough. That was a big step.
01:10:15.680 | - Well, maybe you could speak to this on the reinforcement learning front. So starting from
01:10:21.280 | scratch and learning to do something, like the first random behavior to crappy behavior to
01:10:33.040 | somewhat okay behavior, it's not obvious to me that it's even possible to take those steps.
01:10:40.640 | If you just think about the intuition, how the heck does random behavior become
01:10:47.040 | somewhat basic intelligent behavior? Not human level, not superhuman level, but just basic.
01:10:54.880 | But you're saying to you, the intuition is like, if you can go from human to superhuman level
01:11:00.240 | intelligence on this particular task of game playing, then you're good at taking leaps.
01:11:06.800 | So you can take many of them. - That the system, I believe that the
01:11:09.520 | system can take that kind of leap. Yeah, and also I think that beginner knowledge
01:11:15.120 | in Go, you can start to get a feel really quickly for the idea that being in certain parts of the
01:11:24.640 | board seems to be more associated with winning. 'Cause it's not stumbling upon the concept of
01:11:31.360 | winning. It's told that it wins or that it loses. Well, it's self-play. So it both wins and loses.
01:11:36.480 | It's told which side won. And the information is kind of there to start percolating around
01:11:43.520 | to make a difference as to, well, these things have a better chance of helping you win. And
01:11:48.720 | these things have a worse chance of helping you win. And so it can get to basic play,
01:11:52.560 | I think pretty quickly. Then once it has basic play, well, now it's kind of forced to do some
01:11:58.000 | search to actually experiment with, okay, well, what gets me that next increment of improvement?
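A minimal sketch of what learning purely from self-play looks like mechanically, far simpler than TD-Gammon or AlphaZero and entirely my own toy: both sides are played by the same value table, the only signal is who won, and it uses plain Monte Carlo updates from the game outcome rather than the TD or tree-search machinery of the real systems.

import random

# One-pile Nim: players alternately take 1-3 stones; taking the last stone wins.
# V[s] estimates the chance that the player to move with s stones left will win.
V = {s: 0.5 for s in range(1, 21)}
alpha, epsilon = 0.05, 0.1

def choose_move(s):
    moves = list(range(1, min(3, s) + 1))
    if random.random() < epsilon:
        return random.choice(moves)          # occasional exploration
    # Taking the last stone wins outright; otherwise leave the opponent in the
    # position we currently believe is worst for them.
    return min(moves, key=lambda k: 0.0 if k == s else V[s - k])

for game in range(20000):
    s, player = 20, 0
    to_move = []                             # (stones remaining, player to move)
    while s > 0:
        to_move.append((s, player))
        s -= choose_move(s)
        player = 1 - player
    winner = 1 - player                      # whoever just took the last stone
    for stones, mover in to_move:
        target = 1.0 if mover == winner else 0.0
        V[stones] += alpha * (target - V[stones])   # learn only from the outcome

# Positions that are multiples of 4 should now look bad for the player to move.
print({s: round(V[s], 2) for s in (4, 8, 12, 16, 5, 6, 7)})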
01:12:03.200 | - How far do you think, okay, this is where you kind of bring up the Elon Musks and the Sam Harrises,
01:12:09.440 | right. How far is your intuition about these kinds of self-play mechanisms being able to take
01:12:15.120 | us? 'Cause it feels one of the ominous, but stated calmly things that when I talked to David Silver,
01:12:24.320 | he said is that they have not yet discovered a ceiling for alpha zero, for example, on the game
01:12:31.440 | of go or chess. It keeps, no matter how much they compute, they throw at it, it keeps improving.
01:12:37.520 | So it's possible, it's very possible that if you throw some 10X compute that it will improve by
01:12:46.640 | 5X or something like that. And when stated calmly, it's so like, oh yeah, I guess so.
01:12:53.680 | But then you think, well, can we potentially have continuations of Moore's law in a totally
01:13:02.240 | different way, like broadly defined Moore's law? - Right, exponential improvement.
01:13:06.160 | - Exponential improvement, like are we going to have an alpha zero that swallows the world?
01:13:11.040 | - But notice it's not getting better at other things, it's getting better at go. And I think
01:13:17.360 | that's a big leap to say, okay, well, therefore it's better at other things.
01:13:22.320 | - Well, I mean, the question is how much of the game of life can be turned into-
01:13:27.280 | - Right, so that I think is a really good question. And I think that we don't,
01:13:30.960 | I don't think we as a, I don't know, community really know the answer to this, but,
01:13:34.800 | so, okay, so I went to a talk by some experts on computer chess. So in particular, computer
01:13:44.720 | chess is really interesting because for, of course, for a thousand years, humans were the
01:13:49.760 | best chess playing things on the planet. And then computers like edged ahead of the best person,
01:13:56.240 | and they've been ahead ever since. It's not like people have overtaken computers.
01:14:01.040 | But computers and people together have overtaken computers.
01:14:06.160 | - Right.
01:14:06.640 | - So at least last time I checked, I don't know what the very latest is, but last time I checked
01:14:11.440 | that there were teams of people who could work with computer programs to defeat the best computer
01:14:17.200 | programs.
01:14:17.600 | - In the game of go?
01:14:18.480 | - In the game of chess.
01:14:19.360 | - In the game of chess.
01:14:20.160 | - Right. And so using the information from these things called Elo scores,
01:14:26.880 | this sort of notion of how strong a player you are, there's kind of a range of possible scores.
01:14:32.400 | And you increment in score, basically, if you can beat another player of that lower score,
01:14:38.560 | 62% of the time or something like that. Like there's some threshold of,
01:14:43.360 | if you can somewhat consistently beat someone, then you are of a higher score than that person.
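The threshold being gestured at comes from the standard Elo expected-score formula (general background, not something derived in the episode):

$$ E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, $$

so a player rated about 100 points higher has an expected score of roughly $1/(1 + 10^{-0.25}) \approx 0.64$, which is in the ballpark of the roughly 62 percent figure mentioned here.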
01:14:48.640 | And there's a question as to how many times can you do that in chess, right? And so we know that
01:14:53.360 | there's a range of human ability levels that cap out with the best playing humans. And the computers
01:14:58.480 | went a step beyond that. And computers and people together have not gone, I think, a full step
01:15:04.080 | beyond that. It feels like the estimates that they have are that it's starting to asymptote, that
01:15:09.200 | we've reached kind of the maximum, the best possible chess playing. And so that means that
01:15:14.800 | there's kind of a finite strategic depth, right? At some point you just can't get any better at
01:15:20.480 | this game.
01:15:20.960 | - Yeah, I mean, I don't, so I'll actually check that. I think,
01:15:26.240 | it's interesting because if you have somebody like Magnus Carlsen, who's using these chess
01:15:34.160 | programs to train his mind, like to learn about chess.
01:15:37.360 | - To become a better chess player, yeah.
01:15:38.720 | - And so like, that's a very interesting thing, 'cause we're not static creatures,
01:15:43.840 | we're learning together. I mean, just like we're talking about social networks,
01:15:47.680 | those algorithms are teaching us just like we're teaching those algorithms. So that's
01:15:51.680 | a fascinating thing. But I think the best chess playing programs are now better than the pairs.
01:15:58.480 | Like they have competition between pairs, but it's still, even if they weren't, it's an
01:16:03.840 | interesting question, where's the ceiling? So the David, the ominous David Silver kind of statement
01:16:09.280 | is like, we have not found the ceiling.
01:16:11.520 | - Right, but so the question is, okay, so I don't know his analysis on that. My,
01:16:17.360 | from talking to Go experts, the depth, the strategic depth of Go seems to be substantially
01:16:23.920 | greater than that of chess, that there's more kind of steps of improvement that you can make,
01:16:28.720 | getting better and better and better and better. But there's no reason to think that it's infinite.
01:16:31.840 | - Infinite, yeah.
01:16:32.640 | - And so it could be that it's, that what David is seeing is a kind of asymptoting,
01:16:38.080 | that you can keep getting better, but with diminishing returns. And at some point,
01:16:42.160 | you hit optimal play. Like in theory, all these finite games, they're finite,
01:16:47.520 | they have an optimal strategy. There's a strategy that is the minimax optimal strategy.
01:16:51.760 | And so at that point, you can't get any better. You can't beat that strategy. Now that strategy
01:16:57.200 | may be from an information processing perspective, intractable, right? You need,
01:17:03.360 | all the situations are sufficiently different that you can't compress it at all. It's this
01:17:09.120 | giant mess of hard-coded rules, and we can never achieve that.
01:17:14.640 | But that still puts a cap on how many levels of improvement that we can actually make.
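For reference, the minimax optimal strategy has a precise meaning in a finite two-player zero-sum game (the standard definition, added here only as background): the game has a value

$$ V^{*} = \max_{\pi_1} \min_{\pi_2} \, \mathbb{E}\big[\text{outcome for player 1} \mid \pi_1, \pi_2\big], $$

and a strategy $\pi_1$ achieving that max guarantees at least $V^{*}$ against any opponent. That guarantee is exactly the ceiling being described: once you play the minimax strategy there is no further level of improvement to reach, even if representing that strategy is computationally intractable.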
01:17:18.480 | - But the thing about self-play, if you put it, although I don't like doing that,
01:17:24.400 | in the broader category of self-supervised learning, is that it doesn't require too much
01:17:30.560 | or any human input. - Human labeling, yeah.
01:17:32.480 | - Yeah, human label or just human effort. The human involvement past a certain point.
01:17:37.840 | And the same thing you could argue is true for the recent breakthroughs in natural language
01:17:44.160 | processing with language models. - Oh, this is how you get to GPT-3.
01:17:47.440 | - Yeah, see how I did the- - That was a good transition.
01:17:51.120 | - Yeah, I practiced that for days leading up to this. But that's one of the questions is,
01:17:58.640 | can we find ways to formulate problems in this world that are important to us humans,
01:18:04.640 | like more important than the game of chess, that to which self-supervised kinds of approaches
01:18:12.400 | could be applied, whether it's self-play, for example, for maybe you could think of
01:18:16.960 | autonomous vehicles in simulation, that kind of stuff, or just robotics applications in simulation,
01:18:24.640 | or in the self-supervised learning where unannotated data or data that's generated
01:18:35.760 | by humans naturally without extra cost, like Wikipedia or like all of the internet,
01:18:42.880 | can be used to create intelligent systems that do something really powerful,
01:18:50.320 | that pass the Turing test or that do some kind of superhuman level performance.
01:18:55.760 | So what's your intuition, trying to stitch all of it together about our discussion of AGI,
01:19:05.120 | the limits of self-play, and your thoughts about maybe the limits of neural networks in the context
01:19:11.200 | of language models? Is there some intuition in there that might be useful to think about?
01:19:16.640 | - Yeah, yeah, yeah. So first of all, the whole transformer network family of things
01:19:23.920 | is really cool. It's really, really cool. I mean, if you've ever, back in the day, you played with,
01:19:31.600 | I don't know, Markov models for generating text, and you've seen the kind of text that they spit
01:19:35.360 | out, and you compare it to what's happening now, it's amazing. It's so amazing. Now, it doesn't
01:19:42.400 | take very long interacting with one of these systems before you find the holes, right? It's
01:19:47.840 | not smart in any kind of general way. It's really good at a bunch of things, and it does seem to
01:19:55.840 | understand a lot of the statistics of language extremely well. And that turns out to be very
01:20:01.280 | powerful. You can answer many questions with that. But it doesn't make it a good conversationalist,
01:20:06.160 | right? And it doesn't make it a good storyteller. It just makes it good at imitating of things it
01:20:10.480 | has seen in the past. - The exact same thing could be said by people who are voting for Donald Trump
01:20:16.480 | about Joe Biden supporters, and people voting for Joe Biden about Donald Trump supporters, is,
01:20:21.280 | you know- - That they're not intelligent?
01:20:23.760 | They're just following the- - Yeah, they're following things
01:20:26.080 | they've seen in the past, and it doesn't take long to find the flaws in their natural language
01:20:35.200 | generation abilities. - Yes, yes.
01:20:36.720 | - So we're being very- - That's interesting.
01:20:39.040 | - Critical of AI. - Right, so I've had a similar thought,
01:20:43.280 | which was that the stories that GPT-3 spits out are amazing and very human-like,
01:20:52.320 | and it doesn't mean that computers are smarter than we realize, necessarily. It partly means
01:20:58.320 | that people are dumber than we realize, or that much of what we do day-to-day is not that deep.
01:21:04.400 | Like, we're just kind of going with the flow, we're saying whatever feels like the natural
01:21:09.040 | thing to say next. Not a lot of it is creative or meaningful or intentional. But enough is that we
01:21:18.080 | actually get by, right? We do come up with new ideas sometimes, and we do manage to talk each
01:21:23.440 | other into things sometimes, and we do sometimes vote for reasonable people sometimes. But it's
01:21:31.200 | really hard to see in the statistics, because so much of what we're saying is kind of rote.
01:21:34.960 | And so our metrics that we use to measure how these systems are doing don't reveal that,
01:21:41.600 | because it's in the interstices that is very hard to detect.
01:21:46.320 | - But do you have an intuition that with these language models,
01:21:50.240 | if they grow in size, it's already surprising when you go from GPT-2 to GPT-3 that there is
01:21:57.920 | a noticeable improvement. So the question now goes back to the ominous David Silver and the ceiling.
01:22:03.120 | - Right, so maybe there's just no ceiling, we just need more compute. Now,
01:22:06.240 | I mean, okay, so now I'm speculating. As opposed to before, when I was completely on firm ground.
01:22:15.200 | I don't believe that you can get something that really can do language and use language as a thing
01:22:21.760 | that doesn't interact with people. I think that it's not enough to just take everything that we've
01:22:27.200 | said written down and just say, "That's enough, you can just learn from that, and you can be
01:22:31.120 | intelligent." I think you really need to be pushed back at. I think that conversations, even people
01:22:37.360 | who are pretty smart, maybe the smartest thing that we know, maybe not the smartest thing we can
01:22:41.680 | imagine, but we get so much benefit out of talking to each other and interacting. That's presumably
01:22:49.120 | why you have conversations live with guests, is that there's something in that interaction that
01:22:54.000 | would not be exposed by, "Oh, I'll just write you a story and then you can read it later."
01:22:58.320 | And I think because these systems are just learning from our stories, they're not learning from
01:23:03.040 | being pushed back at by us, that they're fundamentally limited into what they could
01:23:07.200 | actually become on this route. They have to get shut down. They have to have an argument with us
01:23:15.840 | and lose a couple times before they start to realize, "Oh, okay, wait, there's some nuance
01:23:21.280 | here that actually matters." - Yeah, that's actually subtle-sounding,
01:23:25.680 | but quite profound that the interaction with humans is essential. And the limitation within
01:23:33.840 | that is profound as well, because the time scale, the bandwidth at which you can really interact
01:23:40.400 | with humans is very low. So it's costly. One of the underlying things about self-play is it has to do
01:23:48.640 | a very large number of interactions. And so you can't really deploy reinforcement learning systems
01:23:56.560 | into the real world to interact. You couldn't deploy a language model into the real world to
01:24:02.400 | interact with humans because it would just not get enough data relative to the cost it takes
01:24:09.200 | to interact. The time of humans is expensive, which is really interesting. That takes us back
01:24:15.040 | to reinforcement learning and trying to figure out if there's ways to make algorithms that are
01:24:20.880 | more efficient at learning, keep the spirit in reinforcement learning and become more efficient.
01:24:25.680 | In some sense, that seems to be the goal. I'd love to hear what your thoughts are. I don't know if
01:24:31.360 | you got a chance to see a blog post called "Bitter Lesson." - Oh, yes.
01:24:35.440 | - By Rich Sutton that makes an argument, and hopefully I can summarize it. Perhaps you can.
01:24:42.880 | - Oh, okay. - I mean, I could try and you can
01:24:46.480 | correct me, which is he makes an argument that it seems if we look at the long arc of the history of
01:24:52.880 | the artificial intelligence field, which he calls 70 years, that the algorithms from which we've seen
01:25:00.080 | the biggest improvements in practice are the very simple, dumb algorithms that are able to leverage
01:25:07.120 | computation. You just wait for the computation to improve. All of the academics and so on have fun
01:25:13.520 | by finding little tricks and congratulate themselves on those tricks. Sometimes those
01:25:18.080 | tricks can be big that feel in the moment like big spikes and breakthroughs, but in reality,
01:25:23.680 | over the decades, it's still the same dumb algorithm that just waits for the compute
01:25:29.440 | to get faster and faster. Do you find that to be an interesting argument against the
01:25:36.640 | entirety of the field of machine learning as an academic discipline? - That we're really just a
01:25:41.600 | subfield of computer architecture. We're just kind of waiting around for them to do their next thing.
01:25:45.760 | - Who really don't want to do hardware work. - That's right. I really don't want to think
01:25:50.000 | about it. - We're procrastinating.
01:25:50.880 | - Yes, that's right. Just waiting for them to do their job so that we can pretend to have done
01:25:54.800 | ours. So, yeah, I mean, the argument reminds me a lot of, I think it was a Fred Jelinek quote,
01:26:02.160 | early computational linguist who said, you know, we're building these computational linguistic
01:26:06.320 | systems and every time we fire a linguist, performance goes up by 10%, something like
01:26:12.720 | that. And so the idea of us building the knowledge in, in that case, was much less,
01:26:18.960 | he was finding it to be much less successful than getting rid of the people who know about language,
01:26:24.560 | you know, from a kind of scholastic, academic kind of perspective, and replacing them with more compute.
01:26:31.200 | And so I think this is kind of a modern version of that story, which is, okay,
01:26:35.440 | we want to do better on machine vision. You could build in all these, you know,
01:26:39.600 | motivated part-based models that, you know, that just feel like obviously the right thing that you
01:26:47.520 | have to have, or we can throw a lot of data at it, and guess what, we do better with it,
01:26:50.960 | with a lot of data. So I hadn't thought about it until this moment in this way, but what I believe,
01:26:58.960 | well, I've thought about what I believe. What I believe is that, you know, compositionality and
01:27:06.000 | what's the right way to say it? The complexity grows rapidly as you consider more and more
01:27:13.440 | possibilities, like explosively. And so far, Moore's law has also been growing explosively,
01:27:20.080 | exponentially. And so, so it really does seem like, well, we don't have to think really hard
01:27:24.960 | about the algorithm design or the way that we build the systems because the best benefit we
01:27:31.280 | could get is exponential. And the best benefit that we can get from waiting is exponential.
01:27:35.680 | So we can just wait. That's got to end, right? And there's hints now
01:27:40.960 | that Moore's law is starting to feel some friction, starting to, the world is pushing back
01:27:46.480 | a little bit. One thing that, I don't know, do lots of people know this? I didn't know this. I
01:27:51.520 | was trying to write an essay and, yeah, Moore's law has been amazing and it's
01:27:57.120 | enabled all sorts of things, but there's also a kind of counter Moore's law,
01:28:01.200 | which is that the development cost for each successive generation of chips also is doubling.
01:28:07.440 | So it's costing twice as much money. So the amount of development money per cycle or whatever
01:28:12.800 | is actually sort of constant. And at some point we run out of money. So, or we have to come up
01:28:18.000 | with an entirely different way of doing the development process. So, like, I guess I'm always
01:28:24.000 | a bit skeptical of the "look, it's an exponential curve, therefore it has no end" argument.
01:28:28.560 | Soon the number of people going to NeurIPS will be greater than the population of the earth.
01:28:32.640 | That means we're going to discover life on other planets. No, it doesn't. It means that we're on
01:28:37.280 | the front half of a sigmoid curve, which looks a lot like an exponential.
01:28:41.840 | The second half is going to look a lot like diminishing returns.
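The front-half-of-a-sigmoid point can be made precise (my note, not the speaker's math). For a logistic curve with growth rate $r$ and carrying capacity $K$,

$$ f(t) = \frac{K}{1 + e^{-r(t - t_0)}}, \qquad f(t) \approx K e^{\,r(t - t_0)} \ \text{for } t \ll t_0, \qquad f(t) \to K \ \text{for } t \gg t_0, $$

so early measurements are indistinguishable from a pure exponential, and only the approach to $K$ reveals the diminishing returns.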
01:28:45.360 | Yeah. The, I mean, but the interesting thing about Moore's law, if you actually like look at
01:28:50.480 | the technologies involved, it's hundreds, if not thousands of S curves stacked on top of each other.
01:28:56.560 | It's not actually an exponential curve. It's constant breakthroughs. And, and then what
01:29:02.640 | becomes useful to think about, which is exactly what you're saying, the cost of development,
01:29:06.720 | like the size of teams, the amount of resources that are invested in continuing to find new S
01:29:11.920 | curves, new breakthroughs. And yeah, it's a, it's an interesting idea. You know, if we live in the
01:29:20.080 | moment, if we sit here today, it seems to be the reasonable thing to say that exponentials end.
01:29:28.400 | And yet in the software realm, they just keep appearing to be happening. And it's so,
01:29:36.320 | I mean, it's so hard to disagree with Elon Musk on this because it like, I've,
01:29:43.920 | you know, I used to be one of those folks. I'm still one of those folks. I studied autonomous
01:29:50.320 | vehicles. This is what I worked on. And, and it's, it's like, you look at what Elon Musk is saying
01:29:56.080 | about autonomous vehicles. Well, obviously in a couple of years or in a year or next month,
01:30:01.440 | we'll have fully autonomous vehicles. Like there's no reason why we can't. Driving is pretty simple.
01:30:06.080 | Like it's just a learning problem and you just need to convert all the driving that we're doing
01:30:11.440 | into data and just, you know, have a network train on that data. And like we use only our eyes,
01:30:17.360 | so you can use cameras and you can train on it. And it's like, yeah, that,
01:30:24.240 | that should work. And then you put that hat on like the philosophical hat. And, but then you put
01:30:30.080 | the pragmatic hat and it's like, this is what the flaws of computer vision are. Like this is what it
01:30:34.800 | means to train at scale. And then you put the human factors, the psychology hat on, which is
01:30:41.200 | like, there's actually a lot to driving, the cognitive science or cognition, whatever the heck you call
01:30:45.760 | it. It's really hard. It's much harder to drive than we realize; there's a much larger number
01:30:52.240 | of edge cases. So building up an intuition around this, around exponentials, is really difficult.
01:30:59.280 | And on top of that, the pandemic is making us think about exponentials, making us realize that
01:31:06.560 | like we don't understand anything about it. We're not able to intuit exponentials. We're either
01:31:11.440 | ultra terrified, some part of the population, and some part is like the opposite of whatever, the
01:31:20.480 | indifferent, carefree, and we're not managing it very well. - Blasé. - Blasé. Well, wow.
01:31:26.240 | Is that French? I assume so. It's got an accent. So it's fascinating to think what,
01:31:32.480 | what the limits of this exponential growth of technology, not just Moore's law, it's technology,
01:31:44.400 | how that rubs up against the bitter lesson and GPT-3 and self-play mechanisms. Like it's not
01:31:54.240 | obvious. I used to be much more skeptical about neural networks. Now I at least give a sliver
01:31:59.680 | of possibility that we'll be very much surprised and also, you know,
01:32:05.840 | caught in a way that we are not prepared for. Like in applications of social networks,
01:32:18.320 | for example, because it feels like really good transformer models that are able to do some kind
01:32:25.200 | of like very good natural language generation are the same kind of models that can be used
01:32:32.160 | to learn human behavior and then manipulate that human behavior to gain advertiser dollars and all
01:32:37.920 | those kinds of things through the capitalist system. And they arguably already are manipulating
01:32:44.000 | human behavior. Yeah. So, but not for self-preservation, which I think is a big,
01:32:50.160 | that would be a big step. Like if they were trying to manipulate us to convince us not to shut them
01:32:55.440 | off, I would be very freaked out, but I don't see a path to that from where we are now. They, they,
01:33:02.560 | they don't have any of those abilities. That's not what they're trying to do. They're trying to
01:33:08.080 | keep people on, on the site. But see, the thing is this, this is the thing about life on earth
01:33:12.960 | is they might be borrowing our consciousness and sentience. Like, so like in a sense they do,
01:33:20.880 | because the creators of the algorithms have, like they're not, you know, if you look at our body,
01:33:26.160 | we're not a single organism. We're a huge number of organisms with like tiny little motivations.
01:33:31.600 | We're built on top of each other. In the same sense, the AI algorithms that are, they're not-
01:33:36.480 | - It's a system that includes human companies and corporations, right? Because corporations are
01:33:41.120 | funny organisms in and of themselves that really do seem to have self-preservation built in. And
01:33:45.760 | I think that's at the, at the design level. I think the design to have self-preservation be
01:33:50.080 | a focus. So you're right in that, in that broader system that we're also a part of and can have some
01:33:59.280 | influence on, it's, it's, it is much more complicated, much more powerful. Yeah, I agree with
01:34:05.360 | that. - So people really love it when I ask what three books, technical, philosophical,
01:34:12.080 | fiction had a big impact on your life. Maybe you can recommend, we went with movies. We went with
01:34:19.680 | Billy Joel and I forgot what you, what music you recommended, but- - I didn't, I just said I have
01:34:25.440 | no taste in music. I just like pop music. - That was actually really skillful the way you
01:34:30.400 | evaded that question. - Thank you, thanks. I was, I'm going to try to do the same with the books.
01:34:33.200 | - So do you have a skillful way to avoid answering the question about three books you would recommend?
01:34:39.040 | - I'd like to tell you a story. So my first job out of college was at Bellcore. I mentioned that
01:34:46.240 | before where I worked with Dave Ackley. The head of the group was a guy named Tom Landauer. And I
01:34:50.240 | don't know how well known he is now, but arguably he's the inventor and the
01:34:56.320 | first proselytizer of word embeddings. So they developed a system shortly before I got to
01:35:01.840 | the group, called latent semantic analysis, that would take words of English and
01:35:09.280 | embed them in a multi-hundred dimensional space, and then use that as a way of assessing similarity
01:35:16.240 | and basically doing reinforcement learning. Not, sorry, not reinforcement, information retrieval,
01:35:19.840 | sort of pre-Google information retrieval. And he was trained as an anthropologist, but then
01:35:28.560 | became a cognitive scientist. So I was in the cognitive science research group. It's, you know,
01:35:31.840 | like I said, I'm a cognitive science groupie. At the time I thought I'd become a cognitive scientist,
01:35:36.960 | but then I realized in that group, no, I'm a computer scientist, but I'm a computer scientist
01:35:41.120 | who really loves to hang out with cognitive scientists. And he said, he studied language
01:35:46.640 | acquisition in particular. He said, you know, humans have about this number of words of vocabulary.
01:35:52.640 | And most of that is learned from reading. And I said, that can't be true because I have a really
01:35:58.000 | big vocabulary and I don't read. He's like, you must. I'm like, I don't think I do. I mean,
01:36:03.120 | like stop signs, I definitely read stop signs, but like reading books is not, is not a thing
01:36:08.080 | that I do a lot of. - Do you really though?
01:36:09.680 | It might be just visual. - No.
01:36:10.720 | - Maybe the red color. - Do I read stop signs?
01:36:13.920 | - Yeah. - No, it's just pattern
01:36:14.880 | recognition at this point. I don't sound it out. - Yeah.
01:36:16.880 | - So now I do, I wonder what that, oh yeah, stop octagons. So.
01:36:25.200 | - That's fascinating. So you don't. - So I don't read very, I mean,
01:36:28.880 | obviously I read and I've read, I've read plenty of books. But like some people like Charles,
01:36:33.840 | my friend Charles and others, like a lot of people in my field, a lot of academics,
01:36:38.480 | like reading was really a central topic to them in development. And I'm not that guy. In fact,
01:36:45.280 | I used to joke that when I got into college, that it was on kind of a help out the illiterate kind
01:36:53.840 | of program because I got to, like I, in my house, I wasn't a particularly bad or good reader. But
01:36:57.680 | when I got to college, I was surrounded by these people that were just voracious in their reading
01:37:02.640 | appetite. And they would like, have you read this? Have you read this? Have you read this?
01:37:06.000 | And I'd be like, no, I'm clearly not qualified to be at this school. Like there's no way I
01:37:10.800 | should be here. Now I've discovered books on tape, like audio books. And so I'm much better.
01:37:16.480 | I'm more caught up. I read a lot of books. - A small tangent on that. It is a fascinating
01:37:23.200 | open question to me on the topic of driving, whether, you know, supervised learning people,
01:37:30.800 | machine learning people think you have to like drive to learn how to drive. To me, it's very
01:37:36.800 | possible that just by us humans, by first of all, walking, but also by watching other people drive,
01:37:44.080 | not even being inside cars as a passenger, but let's say being inside the car as a passenger,
01:37:49.120 | but even just like being a pedestrian and crossing the road, you learn so much about
01:37:55.360 | driving from that. It's very possible that you can, without ever being inside of a car,
01:38:00.560 | be okay at driving once you get in it. Or like watching a movie, for example. I don't know,
01:38:06.720 | something like that. - Have you taught anyone to drive?
01:38:10.320 | - No. - So.
01:38:12.000 | - Except myself. - I have two children.
01:38:15.040 | - Uh-oh. - And I learned a lot about car driving
01:38:18.080 | 'cause my wife doesn't wanna be the one in the car while they're learning. So that's my job.
01:38:22.560 | - Yeah. - So I sit in the passenger seat
01:38:24.400 | and it's really scary. I have wishes to live and they're figuring things out. Now,
01:38:32.320 | they start off very, very much better than I imagine like a neural network would, right?
01:38:39.760 | They get that they're seeing the world. They get that there's a road that they're trying to be on.
01:38:43.920 | They get that there's a relationship between the angle of the steering. But it takes a while to not
01:38:48.320 | be very jerky. And so that happens pretty quickly. Like the ability to stay in lane at speed,
01:38:54.960 | that happens relatively fast. It's not zero shot learning, but it's pretty fast.
01:38:59.040 | The thing that's remarkably hard, and this is, I think, partly why self-driving cars are really
01:39:03.920 | hard, is the degree to which driving is a social interaction activity. And that blew me away. I
01:39:10.400 | was completely unaware of it until I watched my son learning to drive. And I was realizing that
01:39:16.160 | he was sending signals to all the cars around him. And those, in his case, he's always had social
01:39:22.880 | communication challenges. He was sending very mixed, confusing signals to the other cars,
01:39:28.800 | and that was causing the other cars to drive weirdly and erratically. And there was no
01:39:33.040 | question in my mind that he would have an accident because they didn't know how to read him.
01:39:39.680 | There's things you do with the speed that you drive, the positioning of your car,
01:39:43.520 | that you're constantly in the head of the other drivers. And seeing him not knowing how to do
01:39:50.320 | that and having to be taught explicitly, "Okay, you have to be thinking about what the other
01:39:54.080 | driver is thinking," was a revelation to me. I was stunned. - Yeah, it's quite brilliant.
01:39:59.120 | So creating theories of mind of the other-- - Theories of mind of the other cars. Yeah,
01:40:05.440 | which I just hadn't heard discussed in the self-driving car talks that I've been to. Since
01:40:09.840 | then, there's some people who do consider those kinds of issues, but it's way more subtle than I
01:40:15.760 | think-- - There's a little bit of work
01:40:18.160 | involved with that when you realize, like when you especially focus not on other cars,
01:40:22.000 | but on pedestrians, for example. It's literally staring you in the face.
01:40:26.800 | - Yeah, yeah, yeah. - So then when you're just like,
01:40:28.640 | "How do I interact with pedestrians?" - Pedestrians, you're practically talking
01:40:33.200 | to an octopus at that point. They've got all these weird degrees of freedom. You don't know
01:40:36.320 | what they're gonna do. They can turn around any second. - But the point is, we humans know what
01:40:40.560 | they're gonna do. We have a good theory of mind. We have a good mental model of what they're doing,
01:40:46.560 | and we have a good model of the model they have of you, and the model of the model of the model.
01:40:51.840 | We're able to reason about this social game of it all. The hope is that it's quite simple,
01:41:02.160 | actually, that it could be learned. That's why I just talked to Waymo. I don't know if you
01:41:06.480 | know that company. It's Google's self-driving car company. I talked to their CTO on this podcast.
01:41:12.720 | I rode in their car, and it's quite aggressive, and it's quite fast, and it's good, and it feels
01:41:18.960 | great. It also, just like Tesla, Waymo made me change my mind about maybe driving is easier
01:41:26.080 | than I thought. Maybe I'm just being speciesist, human-centered. Maybe-- - It's a speciesist argument.
01:41:34.480 | - Yeah, so I don't know, but it's fascinating to think about the same as with reading, which I
01:41:42.720 | think you just said. You avoided the question, though I still hope you answered it somewhat.
01:41:46.960 | You avoided it brilliantly. There's blind spots that artificial intelligence researchers have
01:41:54.960 | about what it actually takes to learn to solve a problem. That's fascinating.
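The nested "model of the model" reasoning described here is often formalized as level-k reasoning: a level-0 agent acts without modeling anyone else, and a level-k agent best-responds to its model of a level-(k-1) other. Below is a minimal Python sketch of that idea for a car and a pedestrian at a crosswalk; the action names and payoff numbers are invented for illustration and are not from the conversation or from any real self-driving system.

# A minimal level-k reasoning sketch for a car and a pedestrian at a crosswalk.
# Everything here (action names, payoff numbers) is an invented toy example,
# not taken from the conversation or from any real driving stack.

CAR_ACTIONS = ["yield", "go"]
PED_ACTIONS = ["cross", "wait"]

# Joint payoffs (car_payoff, pedestrian_payoff) for each (car_action, ped_action) pair.
PAYOFFS = {
    ("go", "cross"):    (-100, -100),  # collision: catastrophic for both
    ("go", "wait"):     (2, 0),        # car keeps moving, pedestrian loses a little time
    ("yield", "cross"): (0, 2),        # pedestrian crosses safely, car loses a little time
    ("yield", "wait"):  (-1, -1),      # awkward standoff: both hesitate
}

def best_response(my_actions, predicted_other_action, payoff_index):
    """Pick the action that maximizes my payoff, assuming the other agent
    plays predicted_other_action. payoff_index 0 = car, 1 = pedestrian."""
    def my_payoff(action):
        pair = (action, predicted_other_action) if payoff_index == 0 else (predicted_other_action, action)
        return PAYOFFS[pair][payoff_index]
    return max(my_actions, key=my_payoff)

def level_k_car(k):
    """Level-0 car ignores the pedestrian and just goes; a level-k car
    best-responds to its model of a level-(k-1) pedestrian."""
    if k == 0:
        return "go"
    return best_response(CAR_ACTIONS, level_k_pedestrian(k - 1), payoff_index=0)

def level_k_pedestrian(k):
    """Level-0 pedestrian ignores the car and just crosses; a level-k pedestrian
    best-responds to its model of a level-(k-1) car."""
    if k == 0:
        return "cross"
    return best_response(PED_ACTIONS, level_k_car(k - 1), payoff_index=1)

if __name__ == "__main__":
    for k in range(4):
        print(f"level-{k}: car={level_k_car(k)}, pedestrian={level_k_pedestrian(k)}")

Running this, the recommended actions flip as k increases: a level-1 car yields to a level-0 pedestrian, while a level-2 car goes because it expects a cautious level-1 pedestrian. That instability under mismatched depths of modeling is one way to see why unreadable or mixed signals between road users make the interaction hard.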
01:41:58.960 | - Have you had Anka Dragan on? - Yeah.
01:42:00.880 | - Okay. - She's one of my favorites.
01:42:02.800 | So much energy. - She's right. Oh, yeah.
01:42:04.800 | - She's amazing. - Fantastic, and in particular,
01:42:07.520 | she thinks a lot about this kind of "I know that you know that I know" kind of planning.
01:42:12.240 | The last time I spoke with her, she was very articulate about the ways in which
01:42:18.160 | self-driving cars are not solved, like what's still really, really hard.
01:42:21.440 | - But even her intuition is limited. We're all new to this. So in some sense,
01:42:26.800 | the Elon Musk approach of being ultra confident and just like plowing--
01:42:30.080 | - Putting it out there. - Putting it out there.
01:42:31.920 | Some people say it's reckless and dangerous and so on, but partly it seems to be one of the only
01:42:39.360 | ways to make progress in artificial intelligence. These are difficult things. Democracy is messy.
01:42:49.280 | Implementation of artificial intelligence systems in the real world is messy.
01:42:53.680 | - So many years ago, before self-driving cars were an actual thing you could have a discussion about,
01:42:58.400 | somebody asked me like, "What if we could use that robotic technology and use it to drive cars
01:43:03.920 | around? Aren't people gonna be killed and then it's not, you know, blah, blah, blah?" I'm like,
01:43:08.400 | "That's not what's gonna happen," I said with confidence, incorrectly, obviously.
01:43:11.920 | What I think is gonna happen is we're gonna have a lot more, like a very gradual kind of rollout
01:43:17.440 | where people have these cars in like closed communities, right? Where it's somewhat realistic,
01:43:24.320 | but it's still in a box, right? So that we can really get a sense of what are the weird things
01:43:29.760 | that can happen? How do we have to change the way we behave around these vehicles? Like it obviously
01:43:37.120 | requires a kind of co-evolution that you can't just plop them in and see what happens. But of
01:43:42.720 | course we're basically plopping them in to see what happens. So I was wrong, but I do think that
01:43:46.240 | would have been a better plan. - So that's, but your intuition, it's funny, just zooming out and
01:43:51.760 | looking at the forces of capitalism, and it seems that capitalism rewards
01:43:58.880 | and punishes risk-takers, like, try it out. The academic approach is, let's try a small
01:44:10.000 | thing and try to understand slowly the fundamentals of the problem. Let's start with one, then do
01:44:16.880 | two, and then see that, and then do three. You know, the capitalist, like, startup entrepreneurial
01:44:23.360 | dream is let's build a thousand, and let's- - Right, and 500 of them fail, but whatever,
01:44:28.640 | the other 500, we learn from them. - But if you're good enough, I mean,
01:44:32.480 | one thing is like your intuition would say like, that's gonna be hugely destructive to everything.
01:44:37.840 | But actually, it's kind of the forces of capitalism, like people are quite, it's easy to be
01:44:44.400 | critical, but if you actually look at the data, at the way our world has progressed in terms of the
01:44:49.360 | quality of life, it seems like the competent, good people rise to the top. This is coming from me,
01:44:55.520 | from the Soviet Union and so on. It's like, it's interesting that somebody like Elon Musk is the
01:45:03.840 | way you push progress in artificial intelligence. Like it's forcing Waymo to step their stuff up,
01:45:11.440 | and Waymo is forcing Elon Musk to step up. It's fascinating, 'cause I have this tension in my
01:45:20.160 | heart and just being upset by the lack of progress in autonomous vehicles within academia.
01:45:29.600 | So there's huge progress in the early days of the DARPA challenges, and then it just kind of
01:45:36.960 | stopped, like at MIT, but it's true everywhere else, with an exception of a few sponsors here
01:45:44.320 | and there, it's like, it's not seen as a sexy problem. Like the moment artificial intelligence
01:45:52.160 | starts approaching the problems of the real world, like academics kind of like, "Eh, all right,
01:45:59.360 | let the company--" - 'Cause they get really hard
01:46:00.960 | in a different way. - In a different way,
01:46:02.640 | that's right. - And I think, yeah, right,
01:46:03.840 | some of us are not excited about that other way. - But I still think there's fundamental problems
01:46:09.440 | to be solved in those difficult things. It's still publishable, I think, like we just need to,
01:46:15.840 | it's the same criticism you could have of all these conferences, NeurIPS, CVPR,
01:46:20.160 | where application papers are often as powerful and as important as like a theory paper. Even like
01:46:28.320 | theory just seems much more respectable and so on. I mean, the machine learning community is
01:46:32.400 | changing that a little bit, I mean, at least in statements, but it's still not seen as the sexiest
01:46:38.480 | of pursuits, which is like, "How do I actually make this thing work in practice?" As opposed to
01:46:43.840 | on this toy dataset. All that to say, are you still avoiding the three books question? Is there
01:46:51.120 | something on audiobook that you can recommend? - Oh, yeah, I mean, yeah, I've read a lot of
01:46:56.960 | really fun stuff. In terms of books that I find myself thinking back on that I read a while ago,
01:47:03.360 | like the test of time to some degree, I find myself thinking of "Program or Be Programmed" a
01:47:08.800 | lot by Douglas Rushkoff, which basically put out the premise that we all need to become
01:47:17.520 | programmers in one form or another. And it was an analogy to once upon a time, we all had to become
01:47:25.920 | readers, we had to become literate. And there was a time before that when not everybody was literate,
01:47:29.920 | but once literacy was possible, the people who were literate had more of a say in society than
01:47:36.080 | the people who weren't. And so we made a big effort to get everybody up to speed. And now it's
01:47:40.480 | not 100% universal, but it's quite widespread. The assumption is generally that people can read.
01:47:46.960 | The analogy that he makes is that programming is a similar kind of thing, that we need to have a say
01:47:55.760 | in, right? So being a reader, being literate, being a reader means you can receive all this
01:48:01.120 | information, but you don't get to put it out there. And programming is the way that we get to
01:48:05.760 | put it out there. And that was the argument that he made. I think he specifically has now backed
01:48:10.480 | away from this idea. He doesn't think it's happening quite this way. And that might be true
01:48:16.000 | that it didn't, society didn't sort of play forward quite that way. I still believe in the premise. I
01:48:22.160 | still believe that at some point the relationship that we have to these machines and
01:48:26.320 | these networks has to be one where each individual has the wherewithal to make the machines
01:48:33.360 | help them do the things that that person wants done. And as software people, we know how to do
01:48:39.680 | that. And when we have a problem, we're like, okay, I'll just hack up a Perl script or
01:48:42.880 | something and make it work. So if we lived in a world where everybody could do that, that would be a
01:48:48.000 | better world. And computers would have, I think, less sway over us, and other people's
01:48:54.800 | software would have less sway over us as a group. Yeah. In some sense, software engineering,
01:48:59.040 | programming is power. Programming is power, right? It's, yeah, it's like magic. It's like magic
01:49:04.800 | spells, and it's not out of reach of everyone, but at the moment it's just a sliver of the
01:49:11.040 | population who can commune with machines in this way. So I don't know. So that book had a
01:49:16.400 | big, big impact on me. Currently I'm reading "The Alignment Problem," actually, by Brian Christian.
01:49:22.000 | So I don't know if you've seen this out there yet. Is it similar to Stuart Russell's work with the
01:49:25.440 | control problem? It's in that same general neighborhood. I mean, they have
01:49:29.920 | different emphases that they're concentrating on. I think Stuart's book
01:49:33.600 | did a remarkably good job, just a celebratory good job, at describing AI technology
01:49:41.520 | and sort of how it works. I thought that was great. It was really cool to see that in a book.
01:49:46.160 | - Yeah. I think he has some experience writing some books.
01:49:48.960 | - That's, you know, that's probably a possible thing. He's, he's maybe thought a thing or two
01:49:53.520 | about how to explain AI to people. Yeah. That's a really good point. This book so far has been
01:49:59.760 | remarkably good at telling the story of the, sort of the history, the recent history of some of the
01:50:06.720 | things that have happened. I'm in the first third. He said this book is in three thirds. The
01:50:10.960 | first third is essentially AI fairness and implications of AI on society that we're seeing
01:50:17.280 | right now. And that's been great. I mean, he's telling those stories really well. He
01:50:21.680 | went out and talked to the frontline people whose names were associated with some of these
01:50:26.080 | ideas and it's been terrific. He says the second third of the book is on reinforcement learning. So
01:50:30.800 | maybe that'll be fun. And then the third third is on this superintelligence alignment
01:50:39.360 | problem. And I suspect that that part will be less fun for me to read.
01:50:43.760 | - Yeah, it's an interesting problem to talk about. I find it to be the most
01:50:50.240 | interesting, just like thinking about whether we live in a simulation or not as a thought
01:50:55.520 | experiment to think about our own existence. In the same way, talking about the alignment problem
01:51:01.040 | with AGI is a good way to think, similar to the trolley problem with autonomous vehicles.
01:51:06.560 | It's a useless thing for engineering, but it's a nice little thought experiment for
01:51:11.040 | actually thinking about our own human ethical systems, our moral systems. By thinking about
01:51:17.200 | how we engineer these things, you start to understand yourself.
01:51:24.400 | - So sci-fi can be good at that too. So one sci-fi book to recommend is Exhalation by
01:51:31.040 | Ted Chiang, a bunch of short stories. Ted Chiang is the guy who wrote the short story that
01:51:35.920 | became the movie Arrival. He was a computer scientist,
01:51:43.200 | actually, he studied at Brown. All of his stories have this sort of really insightful bit of science or
01:51:50.080 | computer science that drives them. And so it's just a romp, right, the way he creates these
01:51:56.320 | artificial worlds by extrapolating on ideas that we know about, but hadn't
01:52:01.840 | really thought through to this kind of conclusion. And so his stuff is really fun to read.
01:52:06.240 | It's mind warping. - So I'm not sure if you're
01:52:09.840 | familiar, I seem to mention this every other word: I'm from the Soviet Union and I'm Russian.
01:52:16.240 | Way too much Dostoevsky. - I think my roots are Russian too,
01:52:20.080 | but a couple of generations back. - Well, it's probably in there somewhere.
01:52:24.080 | So maybe we can, we can pull at that thread a little bit of the existential dread that we all
01:52:31.040 | feel. I think somewhere in the conversation, you mentioned that you don't
01:52:35.040 | really like the idea of dying. I forget in which context, it might've been from a reinforcement
01:52:40.560 | learning perspective. I don't know. - I know, you know what it was?
01:52:43.040 | It was in teaching my kids to drive. - That's how you face your mortality. Yes.
01:52:48.800 | From a human being's perspective or from a reinforcement learning researcher's perspective,
01:52:55.200 | let me ask you the most absurd question. What do you think is the meaning of this whole
01:52:59.920 | thing? What's the meaning of life on this spinning rock? - I mean, I think reinforcement learning
01:53:08.240 | researchers maybe think about this from a science perspective more often than a lot of other people,
01:53:13.280 | right? As a supervised learning person, you're probably not thinking about the sweep of
01:53:17.280 | a lifetime, but reinforcement learning agents are having little lifetimes, little weird,
01:53:21.920 | little lifetimes. And it's hard not to project yourself into their world sometimes.
01:53:27.520 | But as far as the meaning of life, so when I turned 42, you may know from a book I read,
01:53:34.720 | the-- - "Hitchhiker's Guide to the Galaxy."
01:53:37.120 | - "Hitchhiker's Guide to the Galaxy," that is the meaning of life. So when I turned 42,
01:53:41.120 | I had a meaning of life party where I invited people over and everyone shared their meaning
01:53:48.320 | of life. We had slides made up. And so we all sat down and did a slide presentation to each other
01:53:54.960 | about the meaning of life. And mine-- - That's great.
01:53:58.080 | - Mine was balance. I think that life is balance. And so the activity at the party, for a 42-year-old,
01:54:07.440 | maybe this is a little bit nonstandard, but I found all the little toys and devices that I had
01:54:12.160 | where you had to balance on them. You had to, like, stand on them and balance. I brought a pogo stick,
01:54:17.520 | a RipStik, which is like a weird two-wheeled skateboard. I got a unicycle, but I didn't know
01:54:25.760 | how to do it. I now can do it. - I would love watching you try.
01:54:29.200 | - Yeah, I'll send you a video. - Absolutely, guys.
01:54:30.720 | (laughing) - I'm not great, but I managed.
01:54:34.480 | And so balance, yeah. So my wife has a really good one that she sticks to and is probably pretty
01:54:43.200 | accurate. And it has to do with healthy relationships with people that you love and
01:54:49.200 | working hard for good causes. But to me, yeah, balance, balance in a word, that works for me.
01:54:55.360 | Not too much of anything, 'cause too much of anything is iffy.
01:54:59.680 | - That feels like a Rolling Stones song. I feel like there must be one.
01:55:03.200 | - You can't always get what you want, but if you try sometimes, you can strike a balance.
01:55:08.160 | - Yeah, I think that's how it goes. (laughing)
01:55:11.760 | - Michael, it's- - A right to parody.
01:55:13.600 | - It's a huge honor to talk to you. This is really fun.
01:55:16.640 | - Oh, not an honor, but- - I've been a big fan of yours.
01:55:18.720 | So can't wait to see what you do next in the world of education, the world of parody,
01:55:27.040 | in the world of reinforcement learning. Thanks for talking to me.
01:55:29.120 | - My pleasure. - Thank you for listening
01:55:31.440 | to this conversation with Michael Littman. And thank you to our sponsors, SimpliSafe,
01:55:36.000 | a home security company I use to monitor and protect my apartment,
01:55:40.160 | ExpressVPN, the VPN I've used for many years to protect my privacy on the internet,
01:55:44.960 | Masterclass, online courses that I enjoy from some of the most amazing humans in history,
01:55:51.280 | and BetterHelp, online therapy with a licensed professional.
01:55:54.880 | Please check out these sponsors in the description to get a discount and to support this podcast.
01:56:00.720 | If you enjoy this thing, subscribe on YouTube, review it with 5 Stars on Apple Podcasts,
01:56:05.680 | follow us on Spotify, support on Patreon, or connect with me on Twitter @LexFriedman.
01:56:11.440 | And now let me leave you with some words from Groucho Marx.
01:56:15.760 | "If you're not having fun, you're doing something wrong."
01:56:19.520 | Thank you for listening and hope to see you next time.
01:56:23.600 | [END]