Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144
Chapters
0:00 Introduction
2:30 Robot and Frank
4:50 Music
8:01 Starring in a TurboTax commercial
18:14 Existential risks of AI
36:36 Reinforcement learning
62:24 AlphaGo and David Silver
72:03 Will neural networks achieve AGI?
84:30 Bitter Lesson
97:20 Does driving require a theory of mind?
106:46 Book Recommendations
112:08 Meaning of life
00:00:00.000 |
The following is a conversation with Michael Littman, a computer science professor at Brown 00:00:04.480 |
University doing research on and teaching machine learning, reinforcement learning, 00:00:10.320 |
and artificial intelligence. He enjoys being silly and lighthearted in conversation, so this was 00:00:17.040 |
definitely a fun one. Quick mention of each sponsor, followed by some thoughts related to 00:00:22.240 |
the episode. Thank you to SimpliSafe, a home security company I use to monitor and protect 00:00:28.560 |
my apartment, ExpressVPN, the VPN I've used for many years to protect my privacy on the internet, 00:00:34.160 |
Masterclass, online courses that I enjoy from some of the most amazing humans in history, 00:00:40.000 |
and BetterHelp, online therapy with a licensed professional. Please check out these sponsors 00:00:46.560 |
in the description to get a discount and to support this podcast. As a side note, let me say 00:00:52.400 |
that I may experiment with doing some solo episodes in the coming month or two. The three ideas I have 00:00:59.040 |
floating in my head currently are to use: one, a particular moment in history; two, a particular 00:01:06.240 |
movie; or three, a book to drive a conversation about a set of related concepts. For example, 00:01:13.360 |
I could use 2001: A Space Odyssey or Ex Machina to talk about AGI for one, two, three hours. 00:01:21.520 |
Or I could do an episode on the, yes, rise and fall of Hitler and Stalin, each in a separate 00:01:29.120 |
episode, using relevant books and historical moments for reference. I find the format of a 00:01:35.040 |
solo episode very uncomfortable and challenging, but that just tells me that it's something I 00:01:40.960 |
definitely need to do and learn from the experience. Of course, I hope you come along 00:01:46.320 |
for the ride. Also, since we have all this momentum built up on announcements, I'm giving a few 00:01:52.080 |
lectures on machine learning at MIT this January. In general, if you have ideas for the episodes, 00:01:58.320 |
for the lectures, or for just short videos on YouTube, let me know in the comments that I still 00:02:07.120 |
definitely read, despite my better judgment and the wise sage advice of the great Joe Rogan. 00:02:15.920 |
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, 00:02:20.320 |
follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. 00:02:25.760 |
And now, here's my conversation with Michael Littman. I saw a video of you talking to Charles 00:02:33.440 |
Isbell about Westworld, the TV series. You guys were doing the kind of thing where you're 00:02:37.760 |
watching new things together, but let's rewind back. Is there a sci-fi movie or book 00:02:46.000 |
or show that was profound, that had an impact on you philosophically, or just specifically 00:02:53.040 |
something you enjoyed nerding out about? >> Yeah, interesting. I think a lot of us 00:02:57.680 |
have been inspired by robots in movies. One that I really like is, there's a movie called 00:03:04.240 |
Robot and Frank, which I think is really interesting because it's very near-term future, 00:03:09.360 |
where robots are being deployed as helpers in people's homes. And we don't know how to make 00:03:16.880 |
robots like that at this point, but it seemed very plausible. It seemed very realistic or 00:03:22.240 |
imaginable. And I thought that was really cool because they're awkward, they do funny things, 00:03:27.200 |
it raised some interesting issues, but it seemed like something that would ultimately be helpful 00:03:31.200 |
and good if we could do it right. >> Yeah, he was an older, cranky gentleman. 00:03:34.640 |
>> He was an older, cranky jewel thief, yeah. >> It's kind of a funny little thing, which is, 00:03:39.840 |
he's a jewel thief, and so he pulls the robot into his life, which is something you could imagine 00:03:47.200 |
taking a home robotics thing and pulling into whatever quirky thing that's involved in your 00:03:55.600 |
existence. >> Yeah, that's meaningful to you. 00:03:57.200 |
Exactly so. Yeah, and I think from that perspective, I mean, not all of us are 00:04:00.560 |
jewel thieves, and so when we bring our robots into our lives- >> Speak for yourself, yeah. 00:04:03.440 |
>> It explains a lot about this apartment, actually. But no, the idea that people should 00:04:10.000 |
have the ability to make this technology their own, that it becomes part of their lives. And I think 00:04:16.320 |
it's hard for us as technologists to make that kind of technology. It's easier to mold people 00:04:21.840 |
into what we need them to be. And just that opposite vision, I think, is really inspiring. 00:04:26.640 |
>> And then there's an anthropomorphization where we project certain things on them, 00:04:31.600 |
because I think the robot was kind of dumb. But I have a bunch of Roombas I play with, and 00:04:35.440 |
you immediately project stuff onto them, a much greater level of intelligence. We'll probably do 00:04:40.800 |
that with each other, too, a much greater degree of compassion. >> That's right. One of the things 00:04:45.520 |
we're learning from AI is where we are smart and where we are not smart. >> Yeah. You also enjoy, 00:04:52.320 |
as people can see, and I enjoy myself, watching you sing and even dance a little bit, a little bit 00:05:01.040 |
of dancing. >> A little bit of dancing. That's not quite my thing. >> As a method of education 00:05:06.640 |
or just in life, in general. So easy question. What's the definitive, objectively speaking, 00:05:15.920 |
top three songs of all time? Maybe something that, to walk that back a little bit, maybe something 00:05:23.360 |
that others might be surprised by, the three songs that you kind of enjoy. >> That is a great 00:05:29.200 |
question that I cannot answer, but instead, let me tell you a story. >> Pick a question you do 00:05:34.560 |
want to answer. >> That's right. I've been watching the presidential debates and vice 00:05:37.680 |
presidential debates, and it turns out, yeah, you can just answer any question you want. 00:05:41.120 |
>> Let me interrupt you. >> That's a related question. >> No, I'm just kidding. >> 00:05:45.960 |
Yeah, well said. I really like pop music. I've enjoyed pop music ever since I was very young. So 00:05:51.040 |
'60s music, '70s music, '80s music, this is all awesome. And then I had kids, and I think I stopped 00:05:56.240 |
listening to music, and I was starting to realize that my musical taste had sort of frozen out. 00:06:01.440 |
And so I decided in 2011, I think, to start listening to the top 10 Billboard songs each 00:06:07.920 |
week. So I'd be on the treadmill, and I would listen to that week's top 10 songs so I could 00:06:12.320 |
find out what was popular now. And what I discovered is that I have no musical taste whatsoever. I like 00:06:19.200 |
what I'm familiar with. And so the first time I'd hear a song, it's the first week that it was on 00:06:23.920 |
the charts. I'd be like, "Ugh." And then the second week, I was into it a little bit. And the third 00:06:28.480 |
week, I was loving it. And by the fourth week, it was just part of me. And so I'm afraid that I can't 00:06:34.640 |
tell you my favorite song of all time, because it's whatever I heard most recently. >> Yeah, 00:06:38.880 |
that's interesting. People have told me that there's an art to listening to music as well. 00:06:46.640 |
And you can start to, if you listen to a song just carefully, explicitly just force yourself to 00:06:51.840 |
really listen, you start to... I did this when I was part of a jazz band and fusion band in college, 00:06:57.040 |
you start to hear the layers of the instruments. You start to hear the individual instruments. 00:07:03.520 |
You can listen to classical music or to orchestra this way. You can listen to jazz this way. 00:07:09.280 |
It's funny to imagine you now, to walk in that forward, to listening to pop hits now as like a 00:07:17.440 |
scholar, listening to like Cardi B or something like that, or Justin Timberlake. No, not Timberlake, 00:07:23.920 |
Bieber. >> They've both been in the top 10 since I've been listening. >> They're still up there. 00:07:29.360 |
Oh my God, I'm so cool. >> If you haven't heard Justin Timberlake's top 10 in the last few years, 00:07:34.080 |
there was one song that he did where the music video was set at essentially NeurIPS. >> Oh, 00:07:40.160 |
wow. Oh, the one with the robotics. Yeah, yeah, yeah, yeah, yeah. >> Yeah, yeah. It's like at 00:07:44.560 |
an academic conference and he's doing a demo. >> He was presenting, right? >> It was sort of 00:07:48.320 |
a cross between the Apple, like Steve Jobs kind of talk and NeurIPS. It's always fun when AI shows 00:07:57.040 |
up in pop culture. >> I wonder if he consulted somebody for that. That's really interesting. 00:08:01.840 |
So maybe on that topic, I've seen your celebrity in multiple dimensions, but one of them is you've 00:08:07.440 |
done cameos in different places. I've seen you in a TurboTax commercial as like, I guess, the 00:08:15.360 |
brilliant Einstein character. And the point is that TurboTax doesn't need somebody like you. 00:08:22.560 |
It doesn't need a brilliant person. >> Very few things need someone like me. But yes, 00:08:27.600 |
they were specifically emphasizing the idea that you don't need to be a computer expert to be able 00:08:32.400 |
to use their software. >> How'd you end up in that world? >> I think it's an interesting story. So I 00:08:36.880 |
was teaching my class. It was an intro computer science class for non-concentrators, non-majors. 00:08:42.480 |
And sometimes when people would visit campus, they would check in to say, "Hey, 00:08:48.080 |
we want to see what a class is like. Can we sit on your class?" So a person came to my class 00:08:54.560 |
who was the daughter of the brother of the husband of the best friend of my wife. 00:09:05.520 |
Anyway, basically a family friend came to campus to check out Brown and asked to come to my class 00:09:13.760 |
and came with her dad. Her dad is who I've known from various kinds of family events and so forth, 00:09:20.160 |
but he also does advertising. And he said that he was recruiting scientists for this ad, this 00:09:28.000 |
TurboTax set of ads. And he said, "We wrote the ad with the idea that we get the most brilliant 00:09:35.200 |
researchers, but they all said no. So can you help us find B-level scientists?" I'm like, "Sure, 00:09:44.880 |
that's who I hang out with. So that should be fine." So I put together a list and I did what 00:09:50.320 |
some people called a Dick Cheney. So I included myself on the list of possible candidates, 00:09:54.880 |
with a little blurb about each one and why I thought that would make sense for them to do it. 00:09:59.280 |
And they reached out to a handful of them, but then ultimately they YouTube-stalked me a little bit 00:10:03.840 |
and they thought, "Oh, I think he could do this." And they said, "Okay, we're going to offer you 00:10:08.720 |
the commercial." I'm like, "What?" So it was such an interesting experience because they have 00:10:15.120 |
another world. The people who do nationwide kind of ad campaigns and television shows and movies 00:10:22.800 |
and so forth, it's quite a remarkable system that they have going because they- - It's like a set? 00:10:28.800 |
- Yeah, so I went to, it was just somebody's house that they rented in New Jersey. 00:10:34.880 |
But in the commercial, it's just me and this other woman. In reality, there were 50 people 00:10:41.600 |
in that room and another, I don't know, half a dozen kind of spread out around the house in 00:10:46.160 |
various ways. There were people whose job it was to control the sun. They were in the backyard 00:10:51.360 |
on ladders, putting filters up to try to make sure that the sun didn't glare off the window 00:10:56.800 |
in a way that would wreck the shot. So there was like six people out there doing that. There was 00:11:00.400 |
three people out there giving snacks, the craft table. There was another three people giving 00:11:05.520 |
healthy snacks because that was a separate craft table. There was one person whose job it was 00:11:09.760 |
to keep me from getting lost. And I think the reason for all this is because so many people 00:11:15.760 |
are in one place at one time, they have to be time efficient. They have to get it done. 00:11:19.440 |
The morning they were going to do my commercial, in the afternoon they were going to do a commercial 00:11:23.440 |
of a mathematics professor from Princeton. They had to get it done. No wasted time or energy. 00:11:30.320 |
And so there's just a fleet of people all working as an organism. And it was fascinating. I was 00:11:35.040 |
just the whole time, I'm just looking around like, this is so neat. Like one person whose job it was 00:11:39.760 |
to take the camera off of the cameraman so that someone else whose job it was to remove the film 00:11:46.480 |
canister, because every couple of takes, they had to replace the film because film gets used up. 00:11:51.920 |
It was just, I don't know. I was geeking out the whole time. It was so fun. 00:11:55.440 |
- How many takes did it take? It looked the opposite. There was more than two people there. 00:12:00.880 |
The person who I was in the scene with is a professional. She's an improv comedian from New 00:12:09.200 |
York City. And when I got there, they had given me a script, such as it was. And then I got there 00:12:13.600 |
and they said, "We're going to do this as improv." I'm like, "I don't know how to improv. 00:12:17.840 |
I don't know what you're telling me to do here." 00:12:21.840 |
"Don't worry, she knows." I'm like, "Okay. We'll see how this goes." 00:12:25.360 |
- I guess I got pulled into the story because like, where the heck did you come from? 00:12:29.840 |
I guess in the scene. Like how did you show up in this random person's house? I don't know. 00:12:35.520 |
- Yeah, well, I mean, the reality of it is I stood outside in the blazing sun. There was 00:12:39.040 |
someone whose job it was to keep an umbrella over me because I started to shvitz. I started to sweat. 00:12:43.520 |
And so I would wreck the shot because my face was all shiny with sweat. So there was one person 00:12:47.360 |
who would dab me off, had an umbrella. But yeah, like the reality of it, like why is this strange, 00:12:53.440 |
stalkery person hanging around outside somebody's house? 00:12:55.840 |
- Yeah, we're not sure. We'll have to look in. We'll have to wait for the book. But are you... 00:13:00.080 |
So you make, like you said, YouTube, you make videos yourself. You make awesome parody, 00:13:06.800 |
sort of parody songs that kind of focus in on a particular aspect of computer science. 00:13:13.360 |
How much, those seem really natural. How much production value goes into that? Do you also 00:13:18.480 |
have a team of 50 people? - The videos, almost all the videos, 00:13:22.480 |
except for the ones that people would have actually seen, are just me. I write the lyrics, 00:13:26.880 |
I sing the song. I generally find a backing track online because I'm like you, can't really play an 00:13:35.280 |
instrument. And then I do, in some cases, I'll do visuals using just like PowerPoint. 00:13:41.200 |
Lots and lots of PowerPoint to make it sort of like an animation. The most produced one is the 00:13:46.160 |
one that people might've seen, which is the overfitting video that I did with Charles Isbell. 00:13:51.200 |
And that was produced by the Georgia Tech and Udacity people 'cause we were doing a class 00:13:56.640 |
together. It was kind of, I usually do parody songs kind of to cap off a class at the end of a 00:14:05.440 |
- You're wearing the Michael Jackson, the red leather jacket. The interesting thing with 00:14:10.400 |
podcasting that you're also into is that I really enjoy is that there's not a team of people. 00:14:20.160 |
It's kind of more, 'cause you know, there's something that happens when there's more people 00:14:29.040 |
involved than just one person, that just the way you start acting, I don't know, there's a censorship, 00:14:36.400 |
you're not given, especially for like slow thinkers like me, you're not, and I think most 00:14:41.840 |
of us are if we're trying to actually think, we're a little bit slow and careful. It kind of, 00:14:48.640 |
large teams get in the way of that. And I don't know what to do with, like that's the, to me, 00:14:55.840 |
like if, you know, it's very popular to criticize quote unquote mainstream media. 00:15:01.760 |
But there is legitimacy to criticizing them, the same, I love listening to NPR for example, 00:15:06.880 |
but every, it's clear that there's a team behind it, there's a commercial, there's constant 00:15:12.000 |
commercial breaks, there's this kind of like rush of like, okay, I have to interrupt you now 00:15:17.520 |
because we have to go to commercial, just this whole, it creates, it destroys the possibility 00:15:28.160 |
Evian, which Charles Isbell, who I talked to yesterday, told me that Evian is naive backwards, 00:15:36.080 |
which the fact that his mind thinks this way is just, it's quite brilliant. Anyway, there's a 00:15:41.520 |
freedom to this podcast. - He's Dr. Awkward, which by the way, 00:15:44.800 |
is a palindrome. That's a palindrome that I happen to know from other parts of my life. 00:15:48.960 |
- You just threw it out of, well, you know, - It's gonna stick. 00:15:51.760 |
- Use it against Charles. Dr. Awkward. - So what was the most challenging 00:15:57.600 |
parody song to make? Was it the Thriller one? - No, that one was really fun. I wrote the lyrics 00:16:03.200 |
really quickly and then I gave it over to the production team. They recruited an acapella group 00:16:09.040 |
to sing. That went really smoothly. It's great having a team 'cause then you can just focus on 00:16:13.840 |
the part that you really love, which in my case is writing the lyrics. For me, the most challenging 00:16:18.960 |
one, not challenging in a bad way, but challenging in a really fun way was I did, one of the parody 00:16:25.040 |
songs I did is about the halting problem in computer science. The fact that you can't create 00:16:30.640 |
a program that can tell for any other arbitrary program whether it's actually gonna get stuck in an 00:16:36.320 |
infinite loop or whether it's going to eventually stop. And so I did it to an 80s song because 00:16:42.720 |
I hadn't started my new thing of learning current songs. And it was Billy Joel's The Piano Man. 00:16:54.320 |
- Sing Me a Song, You're the Piano Man. Yeah, it's a great song. 00:16:58.960 |
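(As a quick illustration of the halting problem mentioned a moment ago, here is a minimal Python sketch of the classic contradiction argument. The `halts(program, input)` decider is hypothetical, which is exactly the point of the theorem; nothing below is from the episode itself.)

```python
# Sketch of the halting-problem contradiction. `halts` is a hypothetical
# function that supposedly decides whether `program(inp)` ever finishes.
def make_paradox(halts):
    def paradox(program):
        if halts(program, program):  # oracle says it halts on itself...
            while True:              # ...so loop forever instead
                pass
        else:                        # oracle says it loops forever...
            return                   # ...so halt immediately
    return paradox

# Running paradox on itself makes the oracle wrong either way, so no general
# infinite-loop detector can exist -- which is what the song is about.
```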
- So the lyrics are great because first of all, it rhymes. Not all songs rhyme. I've done 00:17:05.680 |
Rolling Stones songs, which turn out to have no rhyme scheme whatsoever. They're just sort of 00:17:10.400 |
yelling and having a good time, which makes it not fun from a parody perspective 'cause you can say 00:17:15.120 |
anything. But the lines rhymed and there was a lot of internal rhymes as well. And so figuring out 00:17:20.320 |
how to sing with internal rhymes, a proof of the halting problem was really challenging. And I 00:17:26.240 |
really enjoyed that process. - What about, last question on this topic, 00:17:30.800 |
what about the dancing in the Thriller video? How many takes did that take? 00:17:33.760 |
- So I wasn't planning to dance. They had me in the studio and they gave me the jacket and it's 00:17:39.680 |
like, well, you can't, if you have the jacket and the glove, like there's not much you can do. 00:17:46.960 |
And then they said, why don't you dance a little bit? There was a scene with me and Charles dancing 00:17:51.760 |
- They did not use it in the video, but we recorded it. 00:17:53.680 |
- I don't remember. - Yeah, yeah, no, it was pretty funny. 00:17:56.800 |
And Charles, who has this beautiful, wonderful voice, doesn't really sing. He's not really a 00:18:03.520 |
singer. And so that was why I designed the song with him doing a spoken section and me doing the 00:18:08.000 |
singing section. - Yeah, it's very like Barry White. 00:18:09.440 |
- Yeah, just smooth baritone. Yeah, yeah, it's great. 00:18:12.480 |
- Yeah, it was awesome. So one of the other things Charles said is that, you know, everyone 00:18:19.200 |
knows you as like a super nice guy, super passionate about teaching and so on. What he said, 00:18:26.400 |
don't know if it's true, that despite the fact that you're, you are super-- 00:18:32.320 |
Like, okay, all right, I will admit this finally for the first time, that was me. 00:18:36.240 |
- It's the Johnny Cash song, "Killed a man in Reno just to watch him die." 00:18:41.760 |
That you actually do have some strong opinions on some topics. So if this, in fact, is true, what 00:18:47.840 |
strong opinions would you say you have? Is there ideas you think, maybe in artificial 00:18:55.120 |
intelligence, machine learning, maybe in life, that you believe is true that others might, 00:19:01.200 |
you know, some number of people might disagree with you on? 00:19:05.280 |
- So I try very hard to see things from multiple perspectives. 00:19:09.760 |
There's this great Calvin and Hobbes cartoon where, do you know? 00:19:16.240 |
- Okay, so Calvin's dad is always kind of a bit of a foil and he talked Calvin into, 00:19:21.440 |
Calvin had done something wrong, the dad talks him into like seeing it from another perspective 00:19:25.440 |
and Calvin, like this breaks Calvin because he's like, oh my gosh, now I can see the opposite 00:19:30.560 |
sides of things and so it becomes like a cubist cartoon where there is no front and back, 00:19:36.000 |
everything's just exposed and it really freaks him out and finally he settles back down. It's like, 00:19:39.840 |
oh good, now I can make that go away. But like I'm that, I live in that world where I'm trying 00:19:44.880 |
to see everything from every perspective all the time. So there are some things that I've 00:19:48.400 |
formed opinions about that would be harder, I think, to disabuse me of. One is the super 00:19:56.240 |
intelligence argument and the existential threat of AI is one where I feel pretty confident in my 00:20:02.720 |
feeling about that one. Like I'm willing to hear other arguments but like I am not particularly 00:20:07.840 |
moved by the idea that if we're not careful, we will accidentally create a super intelligence that 00:20:13.760 |
will destroy human life. - Let's talk about that, 00:20:16.160 |
let's get you in trouble and record your own video. It's like Bill Gates, I think he said like 00:20:22.240 |
some quote about the internet that that's just gonna be a small thing, it's not gonna really go 00:20:26.880 |
anywhere. And I think Steve Ballmer said, I don't know why I'm sticking on Microsoft, that's 00:20:33.600 |
something that like smartphones are useless, there's no reason why Microsoft should get into 00:20:38.960 |
smartphones, that kind of. So let's talk about AGI. As AGI is destroying the world, we'll look 00:20:44.640 |
back at this video and see. No, I think it's really interesting to actually talk about because nobody 00:20:49.280 |
really knows the future so you have to use your best intuition, it's very difficult to predict it. 00:20:54.080 |
But you have spoken about AGI and the existential risks around it and sort of based on your 00:21:01.280 |
intuition that we're quite far away from that being a serious concern relative to the other 00:21:08.480 |
concerns we have. Can you maybe unpack that a little bit? - Yeah, sure, sure, sure. So as I 00:21:15.120 |
understand it, for example, I read Bostrom's book and a bunch of other reading material about this 00:21:22.080 |
sort of general way of thinking about the world and I think the story goes something like this, 00:21:26.320 |
that we will at some point create computers that are smart enough that they can help design the 00:21:35.200 |
next version of themselves, which itself will be smarter than the previous version of themselves, 00:21:41.360 |
and eventually bootstrapped up to being smarter than us, at which point we are essentially at 00:21:47.920 |
the mercy of this sort of more powerful intellect, which in principle we don't have any control over 00:21:55.600 |
what its goals are. And so if its goals are at all out of sync with our goals, like for example, 00:22:03.440 |
the continued existence of humanity, we won't be able to stop it. It'll be way more powerful than 00:22:09.840 |
us and we will be toast. So there's some, I don't know, very smart people who have signed on to that 00:22:17.120 |
story and it's a compelling story. I once, now I can really get myself in trouble, I once wrote 00:22:23.760 |
an op-ed about this, specifically responding to some quotes from Elon Musk, who has been on this 00:22:29.440 |
very podcast more than once. - AI summoning the demon, I forget. - That's a thing he said, 00:22:36.800 |
but then he came to Providence, Rhode Island, which is where I live, and said to the governors 00:22:42.160 |
of all the states, "You're worried about entirely the wrong thing. You need to be worried about AI. 00:22:47.360 |
You need to be very, very worried about AI." And journalists kind of reacted to that and they 00:22:53.600 |
wanted to get people's take. And I was like, "Okay." My belief is that one of the things that 00:23:00.320 |
makes Elon Musk so successful and so remarkable as an individual is that he believes in the power 00:23:06.400 |
of ideas. He believes that you can have, you can, if you have a really good idea for getting into 00:23:11.600 |
space, you can get into space. If you have a really good idea for a company or for how to 00:23:15.600 |
change the way that people drive, you just have to do it and it can happen. It's really natural 00:23:22.160 |
to apply that same idea to AI. You see these systems that are doing some pretty remarkable 00:23:26.640 |
computational tricks, demonstrations, and then to take that idea and just push it all the way to the 00:23:33.840 |
limit and think, "Okay, where does this go? Where is this going to take us next?" And if you're a 00:23:38.240 |
deep believer in the power of ideas, then it's really natural to believe that those ideas could 00:23:43.680 |
be taken to the extreme and kill us. So I think his strength is also his undoing because that 00:23:51.360 |
doesn't mean it's true. It doesn't mean that that has to happen, but it's natural for him to think 00:23:55.920 |
that. - So another way to phrase the way he thinks, and I find it very difficult to argue 00:24:03.440 |
with that line of thinking. So Sam Harris is another person from a neuroscience perspective 00:24:08.560 |
that thinks like that, is saying, "Well, is there something fundamental in the physics of the 00:24:16.240 |
universe that prevents this from eventually happening?" And Nick Bostrom thinks in the same 00:24:22.000 |
way, that kind of zooming out, "Yeah, okay, we humans now are existing in this time scale of 00:24:30.000 |
minutes and days, and so our intuition is in this time scale of minutes, hours, and days, 00:24:35.680 |
but if you look at the span of human history, is there any reason you can't see this in 100 years? 00:24:44.160 |
And is there something fundamental about the laws of physics that prevent this? And if it doesn't, 00:24:50.480 |
then it eventually will happen, or we will destroy ourselves in some other way." And it's very 00:24:56.000 |
difficult, I find, to actually argue against that. - Yeah, me too. - And not sound like you're just 00:25:08.640 |
rolling your eyes, "Ugh, I have--" - It's science fiction, we don't have to think about it. - But 00:25:12.880 |
even worse than that, which is like, I don't have kids, but I gotta pick up my kids now. - I see, 00:25:19.040 |
there's more pressing short-term-- - Yeah, there's more pressing short-term things that 00:25:23.280 |
stop it with this existential crisis, we have much shorter things, like now, especially this year, 00:25:27.840 |
there's COVID, so any kind of discussion like that is, there's pressing things today. And then, 00:25:36.800 |
so the Sam Harris argument, well, any day, the exponential singularity can occur, 00:25:44.080 |
and it's very difficult to argue against. I mean, I don't know. - But part of his story is also, 00:25:48.000 |
he's not gonna put a date on it. It could be in 1,000 years, it could be in 100 years, 00:25:52.560 |
it could be in two years. It's just that as long as we keep making this kind of progress, 00:25:56.320 |
it ultimately has to become a concern. I kind of am on board with that, but the thing that, 00:26:02.000 |
the piece that I feel like is missing from that way of extrapolating from the moment that we're in 00:26:07.200 |
is that I believe that in the process of actually developing technology that can really get around 00:26:13.120 |
in the world and really process and do things in the world in a sophisticated way, we're gonna learn 00:26:18.480 |
a lot about what that means, which that we don't know now, 'cause we don't know how to do this 00:26:23.120 |
right now. If you believe that you can just turn on a deep learning network and it eventually, 00:26:27.680 |
give it enough compute and it'll eventually get there, well, sure, that seems really scary, 00:26:31.200 |
because we won't be in the loop at all. We won't be helping to design or target these kinds of 00:26:37.680 |
systems. But I don't see that, that feels like it is against the laws of physics, because these 00:26:44.160 |
systems need help, right? They need to surpass the difficulty, the wall of complexity that happens 00:26:51.440 |
in arranging something in the form that that will happen in. Like I believe in evolution. Like I 00:26:57.600 |
believe that there's an argument, right? So there's another argument, just to look at it from a 00:27:02.480 |
different perspective, that people say, "Well, I don't believe in evolution. How could evolution, 00:27:06.240 |
it's sort of like a random set of parts assemble themselves into a 747, and that could just never 00:27:13.680 |
happen." So it's like, okay, that's maybe hard to argue against, but clearly 747s do get assembled, 00:27:19.840 |
they get assembled by us. Basically the idea being that there's a process by which we will get to the 00:27:26.080 |
point of making technology that has that kind of awareness. And in that process, we're gonna learn 00:27:31.520 |
a lot about that process, and we'll have more ability to control it or to shape it or to build 00:27:37.200 |
it in our own image. It's not something that is gonna spring into existence like that 747, 00:27:43.120 |
and we're just gonna have to contend with it completely unprepared. 00:27:46.640 |
- That's very possible that in the context of the long arc of human history, it will in fact spring 00:27:53.920 |
into existence. But that springing might take, like if you look at nuclear weapons, like even 00:28:00.800 |
20 years is a springing in the context of human history. And it's very possible, just like with 00:28:07.440 |
nuclear weapons, that we could have, I don't know what percentage you wanna put at it, but 00:28:12.480 |
the possibility of-- - Could have knocked ourselves out. 00:28:14.400 |
- Yeah, the possibility of human beings destroying themselves in the 20th century, 00:28:18.480 |
with nuclear weapons, I don't know, if you really think through it, you could really put it close 00:28:24.560 |
to like, I don't know, 30, 40%, given like the certain moments of crisis that happen. So like, 00:28:30.800 |
I think one, like fear in the shadows that's not being acknowledged, is it's not so much the AI 00:28:40.080 |
will run away, is that as it's running away, we won't have enough time to think through how to 00:28:51.440 |
- Yeah, I mean, my much bigger concern, I wonder what you think about it, which is, 00:28:56.240 |
we won't know it's happening. So I kind of-- - That argument, yeah. 00:29:03.120 |
- Think that there's an AGI situation already happening with social media, that our minds, 00:29:10.720 |
our collective intelligence of human civilization is already being controlled by an algorithm. 00:29:15.120 |
And like, we're already super, like the level of a collective intelligence, thanks to Wikipedia, 00:29:22.800 |
people should donate to Wikipedia to feed the AGI. - Man, if we had a super intelligence that 00:29:28.240 |
was in line with Wikipedia's values, that it's a lot better than a lot of other things I can 00:29:34.080 |
imagine. I trust Wikipedia more than I trust Facebook or YouTube, as far as trying to do 00:29:39.520 |
the right thing from a rational perspective. - Yeah. 00:29:41.680 |
- Now that's not where you were going, I understand that, but it does strike me that 00:29:45.120 |
there's sort of smarter and less smart ways of exposing ourselves to each other on the internet. 00:29:50.960 |
- Yeah, the interesting thing is that Wikipedia and social media are very different forces, 00:29:55.360 |
you're right, I mean, Wikipedia, if AGI was Wikipedia, it'd be just like this cranky, 00:30:01.680 |
overly competent editor of articles. There's something to that, but the social media aspect 00:30:09.360 |
is not, so the vision of AGI is as a separate system that's super intelligent, 00:30:16.240 |
that's super intelligent, that's one key little thing. I mean, there's the paperclip argument 00:30:20.880 |
that's super dumb, but super powerful systems. But with social media, you have a relatively, 00:30:26.960 |
like algorithms we may talk about today, very simple algorithms that when, 00:30:32.480 |
so something Charles talks a lot about, which is interactive AI, when they start having at scale, 00:30:39.760 |
like tiny little interactions with human beings, they can start controlling these human beings. 00:30:44.480 |
So a single algorithm can control the minds of human beings slowly, to what we might not 00:30:50.240 |
realize, it could start wars, it could start, it could change the way we think about things. 00:30:56.000 |
It feels like in the long arc of history, if I were to sort of zoom out from all the outrage 00:31:02.880 |
and all the tension on social media, that it's progressing us towards better and better things. 00:31:09.200 |
It feels like chaos and toxic and all that kind of stuff, but-- 00:31:14.880 |
- But it feels like actually the chaos and toxic is similar to the kind of debates we had 00:31:20.640 |
from the founding of this country. There was a civil war that happened over that period. And 00:31:26.160 |
ultimately it was all about this tension of like, something doesn't feel right about our 00:31:31.440 |
implementation of the core values we hold as human beings, and they're constantly struggling with 00:31:36.080 |
this. And that results in people calling each other, just being shady to each other on Twitter. 00:31:44.880 |
But ultimately the algorithm is managing all that. And it feels like there's a possible future in 00:31:50.720 |
which that algorithm controls us into the direction of self-destruction, whatever that looks like. 00:31:58.640 |
- Yeah, so, all right, I do believe in the power of social media to screw us up royally. I do 00:32:04.880 |
believe in the power of social media to benefit us too. I do think that we're in a, yeah, it's 00:32:11.680 |
sort of almost got dropped on top of us. And now we're trying to, as a culture, figure out how to 00:32:15.840 |
cope with it. There's a sense in which, I don't know, there's some arguments that say that, for 00:32:22.160 |
example, I guess, college-age students now, late college-age students now, people who were in 00:32:26.960 |
middle school when social media started to really take off, may be really damaged. Like, this may 00:32:34.240 |
have really hurt their development in a way that we don't have all the implications of quite yet. 00:32:38.880 |
That's the generation who, if, and I hate to make it somebody else's responsibility, but like, 00:32:46.240 |
they're the ones who can fix it. They're the ones who can figure out how do we keep the good 00:32:52.000 |
of this kind of technology without letting it eat us alive? And if they're successful, 00:32:59.120 |
we move on to the next phase, the next level of the game. If they're not successful, then, yeah, 00:33:04.640 |
then we're going to wreck each other. We're going to destroy society. 00:33:07.840 |
- So you're going to, in your old age, sit on a porch and watch the world burn 00:33:14.480 |
- I believe, well, so this is my kids' age, right? And it's certainly my daughter's age, 00:33:18.800 |
and she's very tapped in to social stuff, but she's also, she's trying to find that balance, 00:33:24.080 |
right, of participating in it and in getting the positives of it, but without letting it 00:33:28.000 |
eat her alive. And I think sometimes she ventures, I hope she doesn't watch this, 00:33:33.680 |
sometimes I think she ventures a little too far and is consumed by it, and other times she gets 00:33:40.080 |
a little distance. And if there's enough people like her out there, they're going to navigate 00:33:46.800 |
these choppy waters. - That's an interesting skill, 00:33:50.960 |
actually, to develop. I talked to my dad about it. I've now, somehow, this podcast in particular, 00:33:58.960 |
but other reasons, has received a little bit of attention. And with that, apparently, in this 00:34:05.200 |
world, even though I don't shut up about love and I'm just all about kindness, I have now a little 00:34:11.680 |
mini army of trolls. It's kind of hilarious, actually, but it also doesn't feel good. But 00:34:18.560 |
it's a skill to learn to not look at that, to moderate, actually, how much you look at that. 00:34:25.760 |
The discussion I have with my dad is similar to, it doesn't have to be about trolls, it could be 00:34:30.320 |
about checking email, which is, if you're anticipating, my dad runs a large institute 00:34:37.680 |
at Drexel University, and there could be stressful emails you're waiting, there's drama of some 00:34:43.680 |
kinds. And so there's a temptation to check the email, if you send an email, and that pulls you 00:34:50.320 |
in into, it doesn't feel good. And it's a skill that he actually complains that he hasn't learned, 00:34:57.040 |
I mean, he grew up without it, so he hasn't learned the skill of how to shut off the internet 00:35:02.400 |
and walk away. And I think young people, while they're also being, quote unquote, damaged by 00:35:09.520 |
being bullied online, all of those stories, which are very horrific, you basically can't escape your 00:35:14.800 |
bullies these days when you're growing up. But at the same time, they're also learning that skill 00:35:19.760 |
of how to be able to shut off, disconnect with it, be able to laugh at it, not take it too seriously. 00:35:28.080 |
It's fascinating. We're all trying to figure this out, just like you said, it's been dropped on us, 00:35:31.840 |
and we're trying to figure it out. - Yeah, I think that's really 00:35:34.240 |
interesting. And I guess I've become a believer in the human design, which I feel like I don't 00:35:40.880 |
completely understand. How do you make something as robust as us? We're so flawed in so many ways, 00:35:47.200 |
and yet, and yet, we dominate the planet, and we do seem to manage to get ourselves out of scrapes, 00:35:56.000 |
eventually. Not necessarily the most elegant possible way, but somehow we get to the next step. 00:36:02.240 |
And I don't know how I'd make a machine do that. Generally speaking, like if I train one of my 00:36:09.600 |
reinforcement learning agents to play a video game, and it works really hard on that first stage over 00:36:13.920 |
and over and over again, and it makes it through, it succeeds on that first level. And then the new 00:36:18.160 |
level comes, and it's just like, okay, I'm back to the drawing board. And somehow humanity, we keep 00:36:22.400 |
leveling up, and then somehow managing to put together the skills necessary to achieve success, 00:36:29.120 |
some semblance of success in that next level too. And I hope we can keep doing that. 00:36:35.840 |
- You mentioned reinforcement learning. So you have a couple of years in the field. No. 00:36:43.040 |
- Quite a few. Quite a long career in artificial intelligence broadly, but 00:36:50.080 |
reinforcement learning specifically. Can you maybe give a hint about your sense of the history of the 00:36:57.600 |
field? In some ways it's changed with the advent of deep learning, but has a long roots. How has it 00:37:04.560 |
weaved in and out of your own life? How have you seen the community change, or maybe the ideas 00:37:09.680 |
that it's playing with change? - I've had the privilege, the pleasure, 00:37:14.240 |
of having almost a front row seat to a lot of this stuff, and it's been really, really fun and 00:37:18.800 |
interesting. So when I was in college in the '80s, early '80s, the neural net thing was starting to 00:37:28.240 |
happen. And I was taking a lot of psychology classes and a lot of computer science classes 00:37:33.440 |
as a college student. And I thought, you know, something that can play tic-tac-toe and just 00:37:38.240 |
learn to get better at it, that ought to be a really easy thing. So I spent almost all of my, 00:37:43.120 |
what would have been vacations during college, hacking on my home computer, trying to teach it 00:37:48.400 |
how to play tic-tac-toe. - Programming language. 00:37:50.320 |
- Basic. Oh yeah, that's my first language. That's my native language. 00:37:54.960 |
- Is that when you first fell in love with computer science, just like programming basic on that? 00:37:59.360 |
What was the computer? Do you remember? - I had a TRS-80 Model 1 before they were 00:38:05.680 |
called Model 1s, 'cause there was nothing else. I got my computer in 1979. So I would have been 00:38:18.000 |
bar mitzvahed, but instead of having a big party that my parents threw on my behalf, they just got 00:38:22.960 |
me a computer, 'cause that's what I really, really, really wanted. I saw them in the mall 00:38:26.800 |
in Radio Shack, and I thought, what, how are they doing that? I would try to stump them. I would 00:38:32.080 |
give them math problems, like one plus, and then in parentheses, two plus one. And I would always 00:38:37.040 |
get it right. I'm like, how do you know so much? Like, I've had to go to algebra class for the 00:38:42.320 |
last few years to learn this stuff, and you just seem to know. So I was smitten, and I got a 00:38:48.240 |
computer. And I think ages 13 to 15, I have no memory of those years. I think I just was in my 00:38:55.760 |
room with the computer-- - Listening to Billy Joel. 00:38:57.920 |
- Communing, possibly listening to the radio, listening to Billy Joel. That was the one album 00:39:02.320 |
I had on vinyl at that time. And then I got it on cassette tape, and that was really helpful, 00:39:08.960 |
'cause then I could play it. I didn't have to go down to my parents' Wi-Fi, or Hi-Fi, sorry. 00:39:12.720 |
And at age 15, I remember kind of walking out, and like, okay, I'm ready to talk to people again. 00:39:19.440 |
Like, I've learned what I need to learn here. And so yeah, so that was my home computer. And so I 00:39:25.680 |
went to college, and I was like, oh, I'm totally gonna study computer science. And I opted, the 00:39:29.600 |
college I chose specifically had a computer science major. The one that I really wanted, 00:39:33.920 |
the college I really wanted to go to didn't, so bye-bye to them. 00:39:37.200 |
- Which college did you go to? - So I went to Yale. Princeton would 00:39:41.280 |
have been way more convenient, and it was just a beautiful campus, and it was close enough to home, 00:39:45.200 |
and I was really excited about Princeton. And I visited, and I said, so, computer science major. 00:39:49.040 |
They're like, well, we have computer engineering. I'm like, oh, I don't like that word, 00:39:54.800 |
I like computer science. I really, I wanna do, like, you're saying hardware and software? 00:39:58.880 |
They're like, yeah. I'm like, I just wanna do software. I couldn't care less about hardware. 00:40:01.680 |
- And you grew up in Philadelphia? - I grew up outside Philly, yeah, yeah. 00:40:04.320 |
- Okay, wow. - So the local schools were like Penn 00:40:08.160 |
and Drexel and Temple. Like, everyone in my family went to Temple at least at one point in their lives 00:40:13.680 |
except for me. So yeah, Philly family. - Yale had a computer science department, 00:40:18.800 |
and that's when you, it's kinda interesting you said '80s and neural networks. That's when 00:40:23.280 |
the neural networks was a hot new thing or a hot thing, period. So what, is that in college when 00:40:28.800 |
you first learned about neural networks? - Yeah, yeah. 00:40:30.800 |
- Or was it like, how did you-- - And it was in a psychology class, 00:40:35.680 |
or cognitive science, or like, do you remember like what context? 00:40:38.720 |
- It was, yeah, yeah, yeah. So I was a, I've always been a bit of a cognitive psychology groupie. 00:40:44.560 |
So like, I study computer science, but I like to hang around where the cognitive scientists are, 00:40:50.480 |
'cause I don't know, brains, man. They're like, they're wacky, cool. 00:40:54.640 |
- And they have a bigger picture view of things. They're a little less engineering, I would say. 00:41:00.480 |
They're more interested in the nature of cognition and intelligence and perception 00:41:04.400 |
and how the vision system works. They're asking always bigger questions. Now with the deep learning 00:41:09.920 |
community, they're I think more, there's a lot of intersections, but I do find that the 00:41:16.560 |
neuroscience folks actually and cognitive psychology, cognitive science folks are starting 00:41:23.600 |
to learn how to program, how to use artificial neural networks. And they are actually approaching 00:41:29.040 |
problems in like totally new, interesting ways. It's fun to watch that grad students from those 00:41:33.360 |
departments, like approach a problem of machine learning. 00:41:36.960 |
- Right, they come in with a different perspective. 00:41:38.880 |
- Yeah, they don't care about like your ImageNet data set or whatever. They want like to understand 00:41:44.640 |
the basic mechanisms at the neuronal level, at the functional level of intelligence. So it's kind 00:41:53.360 |
of cool to see them work. But yeah, okay, so you were always a groupie of cognitive psychology. 00:42:00.720 |
- Yeah, yeah. And so it was in a class by Richard Gerrig. He was kind of my favorite 00:42:05.920 |
psych professor in college. And I took like three different classes with him. 00:42:10.560 |
And yeah, so they were talking specifically, the class I think was kind of a, 00:42:15.840 |
there was a big paper that was written by Steven Pinker and Prince, I'm blanking on Prince's first 00:42:23.120 |
name, but Pinker and Prince, they wrote kind of a, they were at that time kind of like, 00:42:30.240 |
I'm blanking on the names of the current people, the cognitive scientists who are complaining a 00:42:42.160 |
And who else? I mean, there's a few, but Gary is the most feisty. 00:42:47.120 |
- Sure, Gary's very feisty. And with his co-author, they're kind of doing these kind 00:42:51.840 |
of takedowns where they say, okay, well, yeah, it does all these amazing things, but here's a 00:42:56.320 |
shortcoming, here's a shortcoming, here's a shortcoming. And so the Pinker-Prince paper 00:42:59.600 |
is kind of like that generation's version of Marcus and Davis, right? Where they're trained 00:43:06.800 |
as cognitive scientists, but they're looking skeptically at the results in the artificial 00:43:12.000 |
intelligence neural net kind of world and saying, yeah, it can do this and this and this, but like, 00:43:16.720 |
it can't do that, and it can't do that, and it can't do that. Maybe in principle, or maybe just 00:43:20.640 |
in practice at this point, but the fact of the matter is you've narrowed your focus too far 00:43:26.560 |
to be impressed. You're impressed with the things within that circle, but you need to broaden that 00:43:31.840 |
circle a little bit. You need to look at a wider set of problems. And so I was in this seminar 00:43:37.760 |
in college that was basically a close reading of the Pinker-Prince paper, which was like really 00:43:43.520 |
thick, 'cause there was a lot going on in there. And it talked about the reinforcement learning 00:43:50.560 |
idea a little bit. I'm like, oh, that sounds really cool, because behavior is what is really 00:43:54.480 |
interesting to me about psychology anyway. So making programs that, I mean, programs are things 00:43:59.680 |
that behave. People are things that behave. Like I wanna make learning that learns to behave. 00:44:04.640 |
- In which way was reinforcement learning presented? Is this talking about human and 00:44:09.760 |
animal behavior, or are we talking about actual mathematical constructs? 00:44:12.960 |
- Ah, right, so that's a good question. Right, so this is, I think it wasn't actually talked about 00:44:18.720 |
as behavior in the paper that I was reading. I think that it just talked about learning. 00:44:22.480 |
And to me, learning is about learning to behave, but really neural nets at that point were about 00:44:27.600 |
learning, like supervised learning, so learning to produce outputs from inputs. So I kind of 00:44:32.160 |
tried to invent reinforcement learning. When I graduated, I joined a research group at Bellcore, 00:44:38.160 |
which had spun out of Bell Labs recently at that time, because of the divestiture of the 00:44:43.120 |
long-distance and local phone service in the 1980s, 1984. And I was in a group with Dave Ackley, who 00:44:51.440 |
was the first author of the Boltzmann machine paper, so the very first neural net paper that 00:44:56.880 |
could handle XOR, right? So XOR sort of killed neural nets, the very first, the zero-width order 00:45:04.320 |
- Yeah, the Perceptrons paper, and Hinton, along with his student Dave Ackley, and I think there 00:45:12.000 |
was other authors as well, showed that, no, no, no, with Boltzmann machines, we can actually learn 00:45:16.240 |
nonlinear concepts, and so everything's back on the table again, and that kind of started that 00:45:21.440 |
second wave of neural networks. So Dave Ackley was, he became my mentor at Bellcore, and we 00:45:27.120 |
talked a lot about learning and life and computation and how all these things fit 00:45:32.000 |
together. Now Dave and I have a podcast together, so I get to kind of enjoy that sort of, 00:45:39.280 |
his perspective once again, even all these years later. And so I said, I was really interested in 00:45:46.720 |
learning, but in the concept of behavior, and he's like, "Oh, well, that's reinforcement learning 00:45:51.440 |
here," and he gave me Rich Sutton's 1984 TD paper. So I read that paper, I honestly didn't get all 00:45:58.480 |
of it, but I got the idea. I got that they were using, that he was using ideas that I was familiar 00:46:03.760 |
with in the context of neural nets and sort of back prop, but with this idea of making predictions 00:46:10.400 |
over time. I'm like, "This is so interesting, but I don't really get all the details," I said to 00:46:14.080 |
Dave. And Dave said, "Oh, well, why don't we have him come and give a talk?" And I was like, "Wait, 00:46:20.320 |
what? You can do that? Like, these are real people? I thought they were just words. I thought it was 00:46:24.560 |
just like ideas that somehow magically seeped into paper." He's like, "No, I know Rich. We'll 00:46:31.440 |
just have him come down and he'll give a talk." And so I was, my mind was blown. And so Rich came 00:46:38.400 |
and he gave a talk at Bellcore, and he talked about what he was super excited, which was they 00:46:43.120 |
had just figured out at the time, Q-learning. So Watkins had visited Rich Sutton's lab at 00:46:52.320 |
UMass, or Andy Barto's lab that Rich was a part of. And he was really excited about this because 00:47:00.000 |
it resolved a whole bunch of problems that he didn't know how to resolve in the earlier paper. 00:47:07.200 |
TD, temporal difference, these are all just algorithms for reinforcement learning. 00:47:11.200 |
- Right, and TD, temporal difference in particular is about making predictions over time. And you can 00:47:16.480 |
try to use it for making decisions, right? 'Cause if you can predict how good an action's 00:47:20.560 |
outcomes will be in the future, you can choose the one that's better, 00:47:25.280 |
but the theory didn't really support changing your behavior. Like the predictions had to be 00:47:29.920 |
of a consistent process if you really wanted it to work. And one of the things that was really cool 00:47:35.760 |
about Q-learning, another algorithm for reinforcement learning, is it was off policy, 00:47:39.680 |
which meant that you could actually be learning about the environment and what the value of 00:47:43.360 |
different actions would be while actually figuring out how to behave optimally. 00:47:50.080 |
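(For reference, these are the standard textbook forms of the two updates being discussed here, in Sutton and Barto's notation; the conversation itself doesn't spell them out. TD(0) predicts values under whatever behavior generated the data, while the max over next actions in Q-learning is what makes it off-policy.)

```latex
% TD(0): prediction of the value of the current behavior
V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \big]

% Q-learning: off-policy control; the max lets you learn about the greedy
% policy while behaving differently (e.g., while exploring)
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big]
```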
- Yeah, and the proof of that is kind of interesting. I mean, that's really surprising 00:47:53.280 |
to me when I first read that. And then in Richard Sutton's book on the matter, it's kind of beautiful 00:47:59.920 |
that a single equation can capture all- - One equation, one line of code, 00:48:03.040 |
and like you can learn anything. - Yeah, like- 00:48:05.120 |
- You can get enough time. - So equation and code, you're right. 00:48:08.080 |
The code that you can arguably, at least if you squint your eyes, can say, "This is all of 00:48:17.920 |
intelligence," is that you can implement that in a single, I think I started with Lisp, which is, 00:48:24.240 |
shout out to Lisp, like a single line of code, key piece of code, maybe a couple, 00:48:30.480 |
that you could do that. It's kind of magical. It feels too good to be true. 00:48:36.080 |
- Well, and it sort of is. - Yeah, it's kind of- 00:48:40.080 |
- It seems they require an awful lot of extra stuff supporting it. But nonetheless, the idea's 00:48:45.920 |
really good. And as far as we know, it is a very reasonable way of trying to create 00:48:51.360 |
adaptive behavior, behavior that gets better at something over time. 00:48:55.120 |
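(A minimal sketch of what that "one line of code" looks like in practice, written here as tabular Q-learning in Python rather than Littman's original Lisp. The toy environment interface, with env.reset() returning a state and env.step(action) returning (next_state, reward, done), is an assumption for illustration only.)

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning on a toy environment (assumed interface, see above)."""
    Q = defaultdict(float)  # Q[(state, action)] defaults to 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior; off-policy learning still targets the greedy policy
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # the temporal-difference update (no bootstrapping from terminal states)
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```

The core of it really is the single update line; as the conversation notes, everything else (exploration, function approximation, the environment itself) is the "awful lot of extra stuff supporting it."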
- Did you find the idea of optimal at all compelling, that you could prove that it's 00:49:01.280 |
optimal? So like one part of computer science that it makes people feel warm and fuzzy inside 00:49:08.080 |
is when you can prove something like that a sorting algorithm, worst case, runs in N log N, 00:49:14.080 |
and it makes everybody feel so good. Even though in reality, it doesn't really matter what the 00:49:18.320 |
worst case is. What matters is like, does this thing actually work in practice on this particular 00:49:23.280 |
actual set of data that I enjoy? Did you- - So here's a place where I have maybe a strong 00:49:28.800 |
opinion, which is like, "You're right, of course, but no, no." So what makes worst case so great, 00:49:37.200 |
right? If you have a worst case analysis so great, is that you get modularity. You can take that 00:49:42.080 |
thing and plug it into another thing and still have some understanding of what's going to happen 00:49:47.280 |
when you click them together, right? If it just works well in practice, in other words, with 00:49:51.680 |
respect to some distribution that you care about, when you go plug it into another thing, that 00:49:56.320 |
distribution can shift, it can change, and your thing may not work well anymore. And you want it 00:50:01.040 |
to, and you wish it does, and you hope that it will, but it might not, and then, ah. 00:50:05.600 |
- So you're saying you don't like machine learning? 00:50:10.800 |
- But we have some positive theoretical results for these things. You can come back at me with, 00:50:19.680 |
"Yeah, but they're really weak," and, "Yeah, they're really weak." And you can even say that 00:50:24.480 |
sorting algorithms, like if you do the optimal sorting algorithm, it's not really the one that 00:50:28.000 |
you want, and that might be true as well. - But it is, the modularity is a really 00:50:33.360 |
powerful statement that as an engineer, you can then assemble different things, you can count on 00:50:38.000 |
them to be, I mean, it's interesting. It's a balance, like with everything else in life, 00:50:45.120 |
you don't want to get too obsessed. I mean, this is what computer scientists do, which they 00:50:49.520 |
tend to get obsessed, and they over-optimize things, or they start by optimizing, and then 00:50:54.880 |
they over-optimize. So it's easy to get really granular about these things, but the step from 00:51:02.560 |
an N squared to an N log N sorting algorithm is a big leap for most real-world systems. No matter 00:51:10.800 |
what the actual behavior of the system is, that's a big leap. And the same can probably be said for 00:51:17.520 |
other kind of first leaps that you would take in a particular problem. Like it's picking the low 00:51:24.720 |
hanging fruit, or whatever the equivalent of doing not the dumbest thing, but the next to the dumbest 00:51:33.840 |
reachable fruit. - Yeah, most delicious reachable fruit. 00:51:36.240 |
- I don't know why that's not a saying. - Yeah. Okay, so then this is the '80s, 00:51:43.920 |
and this kind of idea starts to percolate of learning a system. 00:51:47.200 |
- Yeah, well, and at that point, I got to meet Rich Sutton, so everything was sort of downhill 00:51:51.760 |
from there, and that was really the pinnacle of everything. But then I felt like I was kind of on 00:51:57.360 |
the inside, so then as interesting results were happening, I could check in with Rich, or with 00:52:02.640 |
Jerry Tesauro, who had a huge impact on early thinking in temporal difference learning and 00:52:09.120 |
reinforcement learning, and showed that you could solve problems that we didn't know how to solve 00:52:13.680 |
any other way. And so that was really cool. So as good things were happening, I would hear about it 00:52:19.440 |
from either the people who were doing it, or the people who were talking to the people who were 00:52:23.120 |
doing it. And so I was able to track things pretty well through the '90s. (laughs) 00:52:28.120 |
- So wasn't most of the excitement on reinforcement learning in the '90s era with, 00:52:34.720 |
what is it, TD-Gammon? What's the role of these kinds of little fun game-playing things and 00:52:42.720 |
breakthroughs in exciting the community? Was that, what were your, 'cause you've also built, 00:52:50.000 |
or were part of building, a crossword puzzle solving program called Proverb. 00:52:59.360 |
So you were interested in this as a problem, like in using games to understand how to 00:53:09.760 |
build intelligent systems. So what did you think about TD-Gammon? What did you think about that 00:53:15.280 |
whole thing in the '90s? - Yeah, I mean, I found the TD-Gammon 00:53:18.400 |
result really just remarkable. So I had known about some of Jerry's stuff before he did 00:53:23.040 |
TD-Gammon, and he did a system, just more vanilla, well, not entirely vanilla, but a more classical 00:53:28.880 |
back-proppy kind of network for playing backgammon, where he was training it on expert moves. So it 00:53:35.280 |
was kind of supervised, but the way that it worked was not to mimic the actions, but to learn 00:53:42.000 |
internally an evaluation function. So to learn, well, if the expert chose this over this, that 00:53:47.440 |
must mean that the expert values this more than this, and so let me adjust my weights to make it 00:53:52.160 |
so that the network evaluates this as being better than this. So it could learn from human preferences, 00:53:59.760 |
it could learn its own preferences. And then when he took the step from that to actually doing it as 00:54:06.560 |
a full-on reinforcement learning problem where you didn't need a trainer, you could just let it play, 00:54:11.840 |
that was remarkable, right? And so I think as humans often do, as we've done in the recent 00:54:19.360 |
past as well, people extrapolate. It's like, oh, well, if you can do that, which is obviously very 00:54:24.240 |
hard, then obviously you could do all these other problems that we want to solve that we know are 00:54:30.000 |
also really hard. And it turned out very few of them ended up being practical, partly because 00:54:35.840 |
I think neural nets, certainly at the time, were struggling to be consistent and reliable. And so 00:54:42.880 |
training them in a reinforcement learning setting was a bit of a mess. I had, I don't know, 00:54:48.480 |
generation after generation of, like, master's students who wanted to do value function 00:54:54.960 |
approximation, basically reinforcement learning with neural nets. And over and over and over again, 00:55:02.640 |
we were failing. We couldn't get the good results that Jerry Tesauro got. I now believe that Jerry is 00:55:08.000 |
a neural net whisperer. He has a particular ability to get neural networks to do things that 00:55:15.200 |
other people would find impossible. And it's not the technology, it's the technology and Jerry 00:55:21.280 |
together. - Yeah, which I think speaks to the role of the human expert in the process of machine 00:55:28.000 |
learning. - Right, it's so easy. We're so drawn to the idea that it's the technology that is where 00:55:33.920 |
the power is coming from that I think we lose sight of the fact that sometimes you need a really 00:55:39.040 |
good, just like, I mean, no one would think, "Hey, here's this great piece of software. Here's like, 00:55:42.640 |
I don't know, GNU Emacs or whatever." And doesn't that prove that computers are super powerful and 00:55:48.560 |
basically going to take over the world? It's like, no, Stallman is a hell of a hacker, right? So he 00:55:52.880 |
was able to make the code do these amazing things. He couldn't have done it without the computer, 00:55:57.280 |
but the computer couldn't have done it without him. And so I think people discount the role of 00:56:01.920 |
people like Jerry who have just a particular set of skills. - On that topic, by the way, 00:56:10.640 |
as a small side note, I tweeted, "Emacs is greater than Vim" yesterday and deleted the tweet 10 00:56:18.800 |
minutes later when I realized it started a war. - You were on fire. - I was like, "Oh, I was just 00:56:25.040 |
kidding." I was just being- - Provocative. - Walk back and forth. So people still feel 00:56:31.440 |
passionately about that particular piece of great software. - Yeah, I don't get that 'cause Emacs is 00:56:35.680 |
clearly so much better. I don't understand. But why do I say that? Because I spent a block of time 00:56:41.600 |
in the '80s making my fingers know the Emacs keys. And now that's part of the thought process for me. 00:56:49.920 |
I need to express. And if you take my Emacs key bindings away, I become... I can't express myself. 00:56:59.520 |
- I'm the same way with the, I don't know if you know what it is, but a Kinesis keyboard, 00:57:03.360 |
which is this butt-shaped keyboard. - Yes, I've seen them. They're very, 00:57:08.080 |
I don't know, sexy, elegant? They're just beautiful. - Yeah, they're gorgeous, way too expensive. But 00:57:17.520 |
the problem with them, similar with Emacs, is once you learn to use it- - It's harder to use 00:57:25.200 |
other things. - It's hard to use other things. There's this absurd thing where I have small, 00:57:29.280 |
elegant, lightweight, beautiful little laptops, and I'm sitting there in a coffee shop with a 00:57:34.000 |
giant Kinesis keyboard and a sexy little laptop. It's absurd. I used to feel bad about it, but at 00:57:41.440 |
the same time, you just have to... Sometimes it's back to the Billy Joel thing. You just have to 00:57:46.400 |
throw that Billy Joel record and throw Taylor Swift and Justin Bieber to the wind. 00:57:51.520 |
- See, but I like them now because, again, I have no musical taste. Now that I've heard 00:57:57.600 |
Justin Bieber enough, I'm like, "I really like his songs." And Taylor Swift, not only do I like 00:58:02.480 |
her songs, but my daughter's convinced that she's a genius, and so now I basically am signed on to 00:58:06.640 |
that. - Interesting. So yeah, that speaks back to the robustness of the human brain. That speaks 00:58:12.000 |
to the neuroplasticity that you can just, like a mouse, teach yourself to, or probably a dog, 00:58:19.120 |
teach yourself to enjoy Taylor Swift. I'll try it out. I don't know. I tried. You know what? It has 00:58:25.680 |
to do with just like acclimation, right? Just like you said, a couple of weeks. That's an interesting 00:58:30.480 |
experiment. I'll actually try that. - That wasn't the intent of the experiment? Just like social 00:58:34.240 |
media, it wasn't intended as an experiment to see what we can take as a society, but it turned out 00:58:38.720 |
that way. - I don't think I'll be the same person on the other side of the week listening to Taylor 00:58:42.880 |
Swift, but let's try. - No, it's more compartmentalized. Don't be so worried. I get that 00:58:48.240 |
you can be worried, but don't be so worried because we compartmentalize really well, and so 00:58:51.920 |
it won't bleed into other parts of your life. You won't start, I don't know, wearing red lipstick or 00:58:56.960 |
whatever. It's fine. It's fine. - Change fashion and everything. - It's fine, but you know what? 00:59:01.120 |
The thing you have to watch out for is you'll walk into a coffee shop once we can do that again. 00:59:04.800 |
- And recognize the song. - And you'll be, no, you won't know that you're singing along. 00:59:09.120 |
Until everybody in the coffee shop is looking at you, and then you're like, that wasn't me. 00:59:13.840 |
- Yeah, that's the, you know, people are afraid of AGI. I'm afraid of the Taylor-- 00:59:19.200 |
- The Taylor Swift takeover. - Yeah, and I mean, people should know that TD-Gammon 00:59:24.720 |
was, I guess, would you call it, do you like the terminology of self-play, by any chance? 00:59:32.480 |
Learning by playing themselves? I don't know if it's the best word, but-- - So what's 00:59:38.480 |
the problem with that term? - I don't know. Silly, it's like the big bang, like, it's like 00:59:45.440 |
talking to serious physicists, do you like the term big bang? When it was early, I feel like it's 00:59:50.160 |
the early days of self-play. I don't know, maybe it was used previously, but I think it's been used 00:59:55.520 |
by only a small group of people. And so like, I think we're still deciding, is this ridiculously 01:00:00.880 |
silly name a good name for the concept, potentially one of the most important concepts in artificial 01:00:06.400 |
intelligence? - Well, okay, it depends how broadly you apply the term. So I used the term in my 1996 01:00:11.280 |
PhD dissertation. - Oh, wow, the actual term-- - Yeah, because Tesauro's paper was something 01:00:11.280 |
like training up an expert backgammon player through self-play. So I think it was in the 01:00:22.400 |
title of his paper. - Oh, okay. - If not in the title, it was definitely a term that he used. 01:00:26.320 |
Another term that we got from that work is rollout. So I don't know if you, do you ever 01:00:26.320 |
hear the term rollout? That's a backgammon term that has now been applied generally in computing. 01:00:30.720 |
Well, at least in AI, because of TD-Gammon. - That's fascinating. - So how is self-play being 01:00:36.080 |
used now? And like, why is it, does it feel like a more general powerful concept? Sort of the idea of, 01:00:47.680 |
well, the machine's just gonna teach itself to be smart. - Yeah, so that's where, maybe you can 01:00:52.320 |
correct me, but that's where the continuation of the spirit and actually like literally the exact 01:00:58.960 |
algorithms of TD-Gammon are applied by DeepMind and OpenAI to learn games that are a little bit 01:01:05.680 |
more complex. When I was learning artificial intelligence, Go was presented to me in 01:01:11.280 |
Artificial Intelligence: A Modern Approach. I don't know if they explicitly pointed to Go 01:01:16.000 |
in those books as like unsolvable kind of thing, like implying that these approaches hit their 01:01:23.840 |
limit in this, with these particular kind of games. So something, I don't remember if the book said it 01:01:28.800 |
or not, but something in my head, maybe it was the professors, instilled in me the idea like these 01:01:28.800 |
are the limits of the field of artificial intelligence. Like it instilled in me the idea that if we 01:01:34.640 |
can create a system that can solve the game of Go, we've achieved AGI. That was kind of, I didn't 01:01:47.280 |
explicitly like say this, but that was the feeling. And so from, I was one of the people that it seemed 01:01:53.440 |
magical when a learning system was able to beat a human world champion at the game of Go 01:02:02.240 |
and even more so from that, that was AlphaGo, even more so with AlphaGo Zero, then kind of renamed 01:02:09.440 |
and advanced into AlphaZero, beating a world champion or world-class player without any 01:02:18.160 |
supervised learning on expert games, learning only by playing itself. So that is, 01:02:27.520 |
I don't know what to make of it. I think it would be interesting to hear what your opinions are on 01:02:32.880 |
just how exciting, surprising, profound, interesting, or boring the breakthrough performance of AlphaZero 01:02:44.240 |
was. - Okay, so AlphaGo knocked my socks off. That was so remarkable. - Which aspect of it? 01:02:52.800 |
- That they got it to work, that they actually were able to leverage a whole bunch of different 01:02:58.400 |
ideas, integrate them into one giant system. Just the software engineering aspect of it is 01:03:03.360 |
mind-blowing. I've never been a part of a program as complicated as the program that they built for 01:03:08.480 |
that. And just the, like Jerry Tesoro is a neural net whisperer, like David Silver is a kind of 01:03:16.160 |
neural net whisperer too. He was able to coax these networks and these new way out there architectures 01:03:22.240 |
to do these, to solve these problems that, as you said, when we were learning AI, 01:03:29.440 |
no one had an idea how to make it work. It was remarkable that these techniques that were so 01:03:39.120 |
good at playing chess and that could beat the world champion in chess, couldn't beat your typical 01:03:44.000 |
Go playing teenager in Go. So the fact that in a very short number of years, we kind of ramped up 01:03:50.560 |
to trouncing people in Go, just blew me away. - So you're kind of focusing on the engineering 01:03:57.920 |
aspect, which is also very surprising. I mean, there's something different about large, 01:04:03.600 |
well-funded companies. I mean, there's a compute aspect to it too. - Sure. 01:04:07.200 |
- Like that, of course, I mean, that's similar to Deep Blue, right? With IBM. Like there's something 01:04:15.360 |
important to be learned and remembered about a large company, taking the ideas that are already 01:04:21.280 |
out there and investing a few million dollars into it or more. And so you're kind of saying 01:04:28.800 |
the engineering is kind of fascinating, both on the, with AlphaGo is probably just gathering all 01:04:34.640 |
the data, right? Of the expert games, like organizing everything, actually doing distributed 01:04:41.280 |
supervised learning. And to me, see the engineering I kind of took for granted, 01:04:47.840 |
to me, philosophically being able to persist in the face of like long odds, because it feels like 01:04:59.040 |
for me, I'll be one of the skeptical people in the room thinking that you can learn your way to beat 01:05:04.400 |
Go. Like it sounded like, especially with David Silver, it sounded like David was not confident 01:05:10.400 |
at all. It's funny how confidence works. - Yeah. 01:05:18.400 |
- It's like, you're not like cocky about it, like, but- - Right, 'cause if you're cocky about it, 01:05:25.680 |
you kind of stop and stall and don't get anywhere. - Yeah, but there's like a hope that's unbreakable. 01:05:31.440 |
Maybe that's better than confidence. It's a kind of wishful hope in a little dream, 01:05:36.160 |
and you almost don't want to do anything else. You kind of keep doing it. That seems to be the 01:05:41.840 |
story. - But with enough skepticism that you're looking for where the problems are and fighting 01:05:48.160 |
- 'Cause you know, there's gotta be a way out of this thing. Yeah. 01:05:50.720 |
- And for him, it was probably, there's a bunch of little factors that come into play. It's funny 01:05:56.080 |
how these stories just all come together. Like everything he did in his life came into play, 01:06:00.400 |
which is like a love for video games and also a connection to, so the '90s had to happen with 01:06:08.640 |
- And in some ways it's surprising, maybe you can provide some intuition to it, that not much more 01:06:14.640 |
than TD Gammon was done for quite a long time on the reinforcement learning front. 01:06:20.560 |
- I mean, like I said, the students who I worked with, we tried to get, 01:06:25.600 |
basically apply that architecture to other problems and we consistently failed. There were a couple 01:06:31.200 |
really nice demonstrations that ended up being in the literature. There was a paper about 01:06:36.880 |
controlling elevators, right? Where it's like, okay, can we modify the heuristic that elevators 01:06:42.720 |
use for deciding, like a bank of elevators for deciding which floors we should be stopping on to 01:06:47.360 |
maximize throughput essentially. And you can set that up as a reinforcement learning problem and 01:06:52.240 |
you can have a neural net represent the value function so that it's taking where are all the 01:06:56.960 |
elevators, where are the button pushes, this high dimensional, well, at the time, high dimensional 01:07:02.080 |
input, a couple of dozen dimensions and turn that into a prediction as to, oh, is it gonna be better 01:07:08.640 |
if I stop at this floor or not? And ultimately it appeared as though for the standard simulation 01:07:15.920 |
distribution for people trying to leave the building at the end of the day, that the neural 01:07:19.600 |
net learned a better strategy than the standard one that's implemented in elevator controllers. 01:07:24.000 |
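A rough sketch of the kind of setup being described here, not the actual elevator-control system from the literature: a value function over a hand-built feature vector (elevator positions, pending hall calls, and so on), updated by temporal difference learning. The linear form below is a stand-in for the neural net they used, and the names are illustrative assumptions.

```python
import numpy as np

class LinearValueEstimator:
    """Learned value function over a hand-built state feature vector,
    e.g. elevator positions and pending button pushes, trained by TD(0)."""

    def __init__(self, n_features, lr=0.01, gamma=0.99):
        self.w = np.zeros(n_features)
        self.lr = lr
        self.gamma = gamma

    def value(self, features):
        # Predicted long-run cost/benefit of being in this state.
        return float(self.w @ features)

    def td_update(self, features, reward, next_features):
        # Move V(s) toward the bootstrapped target r + gamma * V(s').
        target = reward + self.gamma * self.value(next_features)
        td_error = target - self.value(features)
        self.w += self.lr * td_error * features
        return td_error
```

The controller would then compare the predicted values of stopping versus not stopping at a floor and pick the better one, with the simulator supplying the rewards (e.g. negative waiting time).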
So that was nice. There was some work that Satinder Singh et al. did on handoffs with cell 01:07:32.320 |
phones, deciding when should you hand off from this cell tower to this cell tower. 01:07:39.760 |
- Yeah, and so a couple of things seemed like they were really promising. None of them made it into 01:07:45.040 |
production that I'm aware of. And neural nets as a whole started to kind of implode around then. 01:07:50.160 |
And so there just wasn't a lot of air in the room for people to try to figure out, okay, 01:07:55.040 |
how do we get this to work in the RL setting? - And then they found their way back in 10 plus 01:08:02.080 |
years. So you said AlphaGo was impressive, like it's a big spectacle. Is there- 01:08:06.720 |
- Right, so then AlphaZero, so I think I may have a slightly different opinion on this than some 01:08:12.000 |
people. So I talked to Satinder Singh in particular about this. So Satinder was, like, 01:08:17.440 |
Rich Sutton, a student of Andy Barto. So they came out of the same lab, very influential machine 01:08:23.120 |
learning, reinforcement learning researcher, now at DeepMind, as is Rich, though different sites, 01:08:32.560 |
- Rich is in Alberta and Satinder is in England, but I think he's there from 01:08:37.280 |
Michigan at the moment. But he was, yes, he was much more impressed with AlphaGo Zero, which is, 01:08:47.120 |
didn't get a kind of a bootstrap in the beginning with human-trained games, 01:08:51.760 |
just was purely self-play. Though the first one, AlphaGo, was also a tremendous amount of self-play. 01:08:57.760 |
Right, they started off, they kick-started the action network that was making decisions, 01:09:02.320 |
but then they trained it for a really long time using more traditional temporal difference 01:09:06.400 |
methods. So as a result, it didn't seem that different to me. It seems like, yeah, 01:09:13.760 |
why wouldn't that work? Once it works, it works. But he found that removal of that extra information 01:09:22.240 |
to be breathtaking. That's a game changer. To me, the first thing was more of a game changer. 01:09:27.440 |
- But the open question, I mean, I guess that's the assumption, is the expert games might contain 01:09:34.160 |
within them a humongous amount of information. - But we know that it went beyond that, right? 01:09:40.960 |
We know that it somehow got away from that information because it was learning strategies. 01:09:44.960 |
I don't think AlphaGo is just better at implementing human strategies. I think it 01:09:50.640 |
actually developed its own strategies that were more effective. And so from that perspective, 01:09:56.000 |
okay, well, so it made at least one quantum leap in terms of strategic knowledge. Okay, 01:10:02.640 |
so now maybe it makes three. Okay, but that first one is the doozy, right? Getting it to work 01:10:09.520 |
reliably and for the networks to hold onto the value well enough. That was a big step. 01:10:15.680 |
- Well, maybe you could speak to this on the reinforcement learning front. So starting from 01:10:21.280 |
scratch and learning to do something, like the first random behavior to crappy behavior to 01:10:33.040 |
somewhat okay behavior, it's not obvious to me that it's even possible to take those steps. 01:10:40.640 |
If you just think about the intuition, how the heck does random behavior become 01:10:47.040 |
somewhat basic intelligent behavior? Not human level, not superhuman level, but just basic. 01:10:54.880 |
But you're saying to you, the intuition is like, if you can go from human to superhuman level 01:11:00.240 |
intelligence on this particular task of game playing, then you're good at taking leaps. 01:11:06.800 |
So you can take many of them. - That the system, I believe that the 01:11:09.520 |
system can take that kind of leap. Yeah, and also I think that beginner knowledge 01:11:15.120 |
in Go, you can start to get a feel really quickly for the idea that being in certain parts of the 01:11:24.640 |
board seems to be more associated with winning. 'Cause it's not stumbling upon the concept of 01:11:31.360 |
winning. It's told that it wins or that it loses. Well, it's self-play. So it both wins and loses. 01:11:36.480 |
It's told which side won. And the information is kind of there to start percolating around 01:11:43.520 |
to make a difference as to, well, these things have a better chance of helping you win. And 01:11:48.720 |
these things have a worse chance of helping you win. And so it can get to basic play, 01:11:52.560 |
I think pretty quickly. Then once it has basic play, well, now it's kind of forced to do some 01:11:58.000 |
search to actually experiment with, okay, well, what gets me that next increment of improvement? 01:12:03.200 |
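A toy sketch of the credit assignment being described, where the only training signal is who won at the end, and that outcome gets pushed back onto every position the game passed through. The `game` interface (initial_state, legal_moves, next_state, is_over, result), hashable states, and the value dictionary are illustrative assumptions, not any particular system's API.

```python
import random

def self_play_episode(game, V, alpha=0.05, explore=0.1):
    """One self-play game, updating a table of position values V."""
    state = game.initial_state()
    visited = []
    while not game.is_over(state):
        moves = game.legal_moves(state)
        if random.random() < explore:
            move = random.choice(moves)
        else:
            # Greedy with respect to current value estimates.
            # (A real two-player learner would alternate between maximizing
            # and minimizing depending on whose turn it is; omitted here.)
            move = max(moves, key=lambda m: V.get(game.next_state(state, m), 0.5))
        state = game.next_state(state, move)
        visited.append(state)
    outcome = game.result(state)  # e.g. 1.0 if the first player won, else 0.0
    for s in visited:
        # Nudge each visited position's value toward the final outcome,
        # so "which side won" percolates back through the game.
        V[s] = V.get(s, 0.5) + alpha * (outcome - V.get(s, 0.5))
    return outcome
```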
- How far do you think, okay, this is where you kind of bring up the Elon Musk and the Sam Harris 01:12:09.440 |
is right. How far is your intuition about these kinds of self-play mechanisms being able to take 01:12:15.120 |
us? 'Cause it feels like one of the ominous, but calmly stated, things that, when I talked to David Silver, 01:12:24.320 |
he said, is that they have not yet discovered a ceiling for AlphaZero, for example, on the game 01:12:31.440 |
of Go or chess. It keeps, no matter how much compute they throw at it, it keeps improving. 01:12:37.520 |
So it's possible, it's very possible that if you throw some 10X compute that it will improve by 01:12:46.640 |
5X or something like that. And when stated calmly, it's so like, oh yeah, I guess so. 01:12:53.680 |
But then you think, well, can we potentially have continuations of Moore's law in totally 01:13:02.240 |
different way, like broadly defined Moore's law? - Right, exponential improvement. 01:13:06.160 |
- Exponential improvement, like are we going to have an alpha zero that swallows the world? 01:13:11.040 |
- But notice it's not getting better at other things, it's getting better at go. And I think 01:13:17.360 |
that's a big leap to say, okay, well, therefore it's better at other things. 01:13:22.320 |
- Well, I mean, the question is how much of the game of life can be turned into- 01:13:27.280 |
- Right, so that I think is a really good question. And I think that we don't, 01:13:30.960 |
I don't think we as a, I don't know, community really know the answer to this, but, 01:13:34.800 |
so, okay, so I went to a talk by some experts on computer chess. So in particular, computer 01:13:44.720 |
chess is really interesting because for, of course, for a thousand years, humans were the 01:13:49.760 |
best chess playing things on the planet. And then computers like edged ahead of the best person, 01:13:56.240 |
and they've been ahead ever since. It's not like people have overtaken computers. 01:14:01.040 |
But computers and people together have overtaken computers. 01:14:06.640 |
- So at least last time I checked, I don't know what the very latest is, but last time I checked 01:14:11.440 |
that there were teams of people who could work with computer programs to defeat the best computer 01:14:20.160 |
- Right. And so using the information about how, these things called Elo scores, 01:14:26.880 |
this sort of notion of how strong a player are you, there's kind of a range of possible scores. 01:14:32.400 |
And you increment in score, basically, if you can beat another player of that lower score, 01:14:38.560 |
62% of the time or something like that. Like there's some threshold of, 01:14:43.360 |
if you can somewhat consistently beat someone, then you are of a higher score than that person. 01:14:48.640 |
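For reference, the threshold being described corresponds, under the standard Elo expected-score formula, to a rating gap of roughly 85-90 points; the specific numbers below are just an illustration of that formula.

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A gap of about 85 rating points gives roughly a 62% expected score,
# in line with the threshold mentioned above.
print(round(elo_expected_score(1585, 1500), 3))  # ~0.62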
And there's a question as to how many times can you do that in chess, right? And so we know that 01:14:53.360 |
there's a range of human ability levels that cap out with the best playing humans. And the computers 01:14:58.480 |
went a step beyond that. And computers and people together have not gone, I think, a full step 01:15:04.080 |
beyond that. It feels, the estimates that they have is that it's starting to asymptote, that 01:15:09.200 |
we've reached kind of the maximum, the best possible chess playing. And so that means that 01:15:14.800 |
there's kind of a finite strategic depth, right? At some point you just can't get any better at chess. 01:15:20.960 |
- Yeah, I mean, I don't, so I'll actually check that. I think, 01:15:26.240 |
it's interesting because if you have somebody like Magnus Carlsen, who's using these chess 01:15:34.160 |
programs to train his mind, like to learn about chess. 01:15:38.720 |
- And so like, that's a very interesting thing, 'cause we're not static creatures, 01:15:43.840 |
we're learning together. I mean, just like we're talking about social networks, 01:15:47.680 |
those algorithms are teaching us just like we're teaching those algorithms. So that's 01:15:51.680 |
a fascinating thing. But I think the best chess playing programs are now better than the pairs. 01:15:58.480 |
Like they have competition between pairs, but it's still, even if they weren't, it's an 01:16:03.840 |
interesting question, where's the ceiling? So the David, the ominous David Silver kind of statement 01:16:11.520 |
- Right, but so the question is, okay, so I don't know his analysis on that. My, 01:16:17.360 |
from talking to Go experts, the depth, the strategic depth of Go seems to be substantially 01:16:23.920 |
greater than that of chess, that there's more kind of steps of improvement that you can make, 01:16:28.720 |
getting better and better and better and better. But there's no reason to think that it's infinite. 01:16:32.640 |
- And so it could be that it's, that what David is seeing is a kind of asymptoting, 01:16:38.080 |
that you can keep getting better, but with diminishing returns. And at some point, 01:16:42.160 |
you hit optimal play. Like in theory, all these finite games, they're finite, 01:16:47.520 |
they have an optimal strategy. There's a strategy that is the minimax optimal strategy. 01:16:51.760 |
And so at that point, you can't get any better. You can't beat that strategy. Now that strategy 01:16:57.200 |
may be from an information processing perspective, intractable, right? You need, 01:17:03.360 |
all the situations are sufficiently different that you can't compress it at all. It's this 01:17:09.120 |
giant mess of hard-coded rules, and we can never achieve that. 01:17:14.640 |
But that still puts a cap on how many levels of improvement that we can actually make. 01:17:18.480 |
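For reference, the minimax-optimal value being described is the standard game-theoretic quantity (generic notation, nothing specific to Go or chess):

$$ v^{*} \;=\; \max_{\pi_1} \min_{\pi_2} \; \mathbb{E}\big[\, \text{outcome for player 1} \mid \pi_1, \pi_2 \,\big] $$

In a finite two-player zero-sum game this value exists, and no strategy can guarantee player 1 more than $v^{*}$; that is the cap on improvement being described, even if the strategy achieving it is far too large to represent compactly.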
- But the thing about self-play is if you put it, although I don't like doing that, 01:17:24.400 |
in the broader category of self-supervised learning, is that it doesn't require too much 01:17:32.480 |
- Yeah, human label or just human effort. The human involvement past a certain point. 01:17:37.840 |
And the same thing you could argue is true for the recent breakthroughs in natural language 01:17:44.160 |
processing with language models. - Oh, this is how you get to GPT-3. 01:17:47.440 |
- Yeah, see how I did the- - That was a good transition. 01:17:51.120 |
- Yeah, I practiced that for days leading up to this. But that's one of the questions is, 01:17:58.640 |
can we find ways to formulate problems in this world that are important to us humans, 01:18:04.640 |
like more important than the game of chess, that to which self-supervised kinds of approaches 01:18:12.400 |
could be applied, whether it's self-play, for example, for maybe you could think of 01:18:16.960 |
autonomous vehicles in simulation, that kind of stuff, or just robotics applications in simulation, 01:18:24.640 |
or in the self-supervised learning where unannotated data or data that's generated 01:18:35.760 |
by humans naturally without extra cost, like Wikipedia or like all of the internet, 01:18:42.880 |
can be used to create intelligent systems that do something really powerful, 01:18:50.320 |
that pass the Turing test or that do some kind of superhuman level performance. 01:18:55.760 |
So what's your intuition, trying to stitch all of it together about our discussion of AGI, 01:19:05.120 |
the limits of self-play, and your thoughts about maybe the limits of neural networks in the context 01:19:11.200 |
of language models? Is there some intuition in there that might be useful to think about? 01:19:16.640 |
- Yeah, yeah, yeah. So first of all, the whole transformer network family of things 01:19:23.920 |
is really cool. It's really, really cool. I mean, if you've ever, back in the day, you played with, 01:19:31.600 |
I don't know, Markov models for generating text, and you've seen the kind of text that they spit 01:19:35.360 |
out, and you compare it to what's happening now, it's amazing. It's so amazing. Now, it doesn't 01:19:42.400 |
take very long interacting with one of these systems before you find the holes, right? It's 01:19:47.840 |
not smart in any kind of general way. It's really good at a bunch of things, and it does seem to 01:19:55.840 |
understand a lot of the statistics of language extremely well. And that turns out to be very 01:20:01.280 |
powerful. You can answer many questions with that. But it doesn't make it a good conversationalist, 01:20:06.160 |
right? And it doesn't make it a good storyteller. It just makes it good at imitating things it 01:20:10.480 |
has seen in the past. - The exact same thing could be said by people who are voting for Donald Trump 01:20:16.480 |
about Joe Biden supporters, and people voting for Joe Biden about Donald Trump supporters, is, 01:20:23.760 |
They're just following the- - Yeah, they're following things 01:20:26.080 |
they've seen in the past, and it doesn't take long to find the flaws in their natural language generation. 01:20:39.040 |
- Critical of us. - Right, so I've had a similar thought, 01:20:43.280 |
which was that the stories that GPT-3 spits out are amazing and very human-like, 01:20:52.320 |
and it doesn't mean that computers are smarter than we realize, necessarily. It partly means 01:20:58.320 |
that people are dumber than we realize, or that much of what we do day-to-day is not that deep. 01:21:04.400 |
Like, we're just kind of going with the flow, we're saying whatever feels like the natural 01:21:09.040 |
thing to say next. Not a lot of it is creative or meaningful or intentional. But enough is that we 01:21:18.080 |
actually get by, right? We do come up with new ideas sometimes, and we do manage to talk each 01:21:23.440 |
other into things sometimes, and we do sometimes vote for reasonable people sometimes. But it's 01:21:31.200 |
really hard to see in the statistics, because so much of what we're saying is kind of rote. 01:21:34.960 |
And so our metrics that we use to measure how these systems are doing don't reveal that, 01:21:41.600 |
because it's in the interstices that is very hard to detect. 01:21:46.320 |
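As a point of contrast with the transformer outputs being discussed, the "Markov models for generating text" mentioned a moment earlier can be sketched in a few lines; the toy corpus here is an arbitrary stand-in for whatever text you train on.

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words observed to follow it."""
    model = defaultdict(list)
    words = text.split()
    for w1, w2 in zip(words, words[1:]):
        model[w1].append(w2)
    return model

def generate(model, start, length=20):
    """Walk the chain, sampling each next word from the observed followers."""
    word, out = start, [start]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the dog chased the cat and the cat chased the mouse"
print(generate(build_bigram_model(corpus), "the"))
```

The output is locally plausible but has no memory beyond the previous word, which is the gap that makes modern language models feel so startling by comparison.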
- But do you have an intuition that with these language models, 01:21:50.240 |
if they grow in size, it's already surprising when you go from GPT-2 to GPT-3 that there is 01:21:57.920 |
a noticeable improvement. So the question now goes back to the ominous David Silver and the ceiling. 01:22:03.120 |
- Right, so maybe there's just no ceiling, we just need more compute. Now, 01:22:06.240 |
I mean, okay, so now I'm speculating. As opposed to before, when I was completely on firm ground. 01:22:15.200 |
I don't believe that you can get something that really can do language and use language as a thing 01:22:21.760 |
that doesn't interact with people. I think that it's not enough to just take everything that we've 01:22:27.200 |
said written down and just say, "That's enough, you can just learn from that, and you can be 01:22:31.120 |
intelligent." I think you really need to be pushed back at. I think that conversations, even people 01:22:37.360 |
who are pretty smart, maybe the smartest thing that we know, maybe not the smartest thing we can 01:22:41.680 |
imagine, but we get so much benefit out of talking to each other and interacting. That's presumably 01:22:49.120 |
why you have conversations live with guests, is that there's something in that interaction that 01:22:54.000 |
would not be exposed by, "Oh, I'll just write you a story and then you can read it later." 01:22:58.320 |
And I think because these systems are just learning from our stories, they're not learning from 01:23:03.040 |
being pushed back at by us, that they're fundamentally limited into what they could 01:23:07.200 |
actually become on this route. They have to get shot down. They have to have an argument with us 01:23:15.840 |
and lose a couple times before they start to realize, "Oh, okay, wait, there's some nuance 01:23:21.280 |
here that actually matters." - Yeah, that's actually subtle-sounding, 01:23:25.680 |
but quite profound that the interaction with humans is essential. And the limitation within 01:23:33.840 |
that is profound as well, because the time scale, the bandwidth at which you can really interact 01:23:40.400 |
with humans is very low. So it's costly. One of the underlying things about self-play is, it has to do 01:23:48.640 |
a very large number of interactions. And so you can't really deploy reinforcement learning systems 01:23:56.560 |
into the real world to interact. You couldn't deploy a language model into the real world to 01:24:02.400 |
interact with humans because it would just not get enough data relative to the cost it takes 01:24:09.200 |
to interact. The time of humans is expensive, which is really interesting. That takes us back 01:24:15.040 |
to reinforcement learning and trying to figure out if there's ways to make algorithms that are 01:24:20.880 |
more efficient at learning, keep the spirit in reinforcement learning and become more efficient. 01:24:25.680 |
In some sense, that seems to be the goal. I'd love to hear what your thoughts are. I don't know if 01:24:31.360 |
you got a chance to see a blog post called "Bitter Lesson." - Oh, yes. 01:24:35.440 |
- By Rich Sutton that makes an argument, and hopefully I can summarize it. Perhaps you can. 01:24:42.880 |
- Oh, okay. - I mean, I could try and you can 01:24:46.480 |
correct me, which is he makes an argument that it seems if we look at the long arc of the history of 01:24:52.880 |
the artificial intelligence field, which he calls 70 years, that the algorithms from which we've seen 01:25:00.080 |
the biggest improvements in practice are the very simple, dumb algorithms that are able to leverage 01:25:07.120 |
computation. You just wait for the computation to improve. All of the academics and so on have fun 01:25:13.520 |
by finding little tricks and congratulate themselves on those tricks. Sometimes those 01:25:18.080 |
tricks can be big that feel in the moment like big spikes and breakthroughs, but in reality, 01:25:23.680 |
over the decades, it's still the same dumb algorithm that just waits for the compute 01:25:29.440 |
to get faster and faster. Do you find that to be an interesting argument against the 01:25:36.640 |
entirety of the field of machine learning as an academic discipline? - That we're really just a 01:25:41.600 |
subfield of computer architecture. We're just kind of waiting around for them to do their next thing. 01:25:45.760 |
- Who really don't want to do hardware work. - That's right. I really don't want to think 01:25:50.880 |
- Yes, that's right. Just waiting for them to do their job so that we can pretend to have done 01:25:54.800 |
ours. So, yeah, I mean, the argument reminds me a lot of, I think it was a Fred Jelinek quote, 01:26:02.160 |
early computational linguist who said, you know, we're building these computational linguistic 01:26:06.320 |
systems and every time we fire a linguist, performance goes up by 10%, something like 01:26:12.720 |
that. And so the idea of us building the knowledge in, in that case, was much less, 01:26:18.960 |
he was finding it to be much less successful than get rid of the people who know about language as a, 01:26:24.560 |
you know, from a kind of scholastic academic kind of perspective and replace them with more compute. 01:26:31.200 |
And so I think this is kind of a modern version of that story, which is, okay, 01:26:35.440 |
we want to do better on machine vision. You could build in all these, you know, 01:26:39.600 |
motivated part-based models that, you know, that just feel like obviously the right thing that you 01:26:47.520 |
have to have, or we can throw a lot of data at it and guess what we're doing better with it, 01:26:50.960 |
with a lot of data. So I hadn't thought about it until this moment in this way, but what I believe, 01:26:58.960 |
well, I've thought about what I believe. What I believe is that, you know, compositionality and 01:27:06.000 |
what's the right way to say it? The complexity grows rapidly as you consider more and more 01:27:13.440 |
possibilities, like explosively. And so far, Moore's law has also been growing explosively, 01:27:20.080 |
exponentially. And so, so it really does seem like, well, we don't have to think really hard 01:27:24.960 |
about the algorithm design or the way that we build the systems because the best benefit we 01:27:31.280 |
could get is exponential. And the best benefit that we can get from waiting is exponential. 01:27:35.680 |
So we can just wait. It's got, that's got to end, right? And there's hints now that, 01:27:40.960 |
that Moore's law is, is starting to feel some friction, starting to, the world is pushing back 01:27:46.480 |
a little bit. One thing that I, I don't know, do lots of people know this? I didn't know this. I 01:27:51.520 |
was, I was trying to write an essay and yeah, Moore's law has been amazing and it's been, 01:27:57.120 |
it's enabled all sorts of things, but there's a, there's also a kind of counter Moore's law, 01:28:01.200 |
which is that the development cost for each successive generation of chips also is doubling. 01:28:07.440 |
So it's costing twice as much money. So the amount of development money per cycle or whatever 01:28:12.800 |
is actually sort of constant. And at some point we run out of money. So, or we have to come up 01:28:18.000 |
with an entirely different way of, of doing the development process. So like, I, I guess I always, 01:28:24.000 |
always a bit skeptical of the, look, it's an exponential curve, therefore it has no end. 01:28:28.560 |
Soon the number of people going to NeurIPS will be greater than the population of the earth. 01:28:32.640 |
That means we're going to discover life on other planets. No, it doesn't. It means that we're in a, 01:28:37.280 |
in a sigmoid curve on the front half, which looks a lot like an exponential. 01:28:41.840 |
The second half is going to look a lot like diminishing returns. 01:28:45.360 |
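A quick way to see the "front half of a sigmoid" point, using the standard logistic curve (the symbols are just the usual parameterization, not data from anywhere):

$$ f(t) \;=\; \frac{L}{1 + e^{-k(t - t_0)}} \;\approx\; L\, e^{\,k(t - t_0)} \quad \text{for } t \ll t_0, $$

so early on it is indistinguishable from exponential growth, while for $t \gg t_0$ it flattens out toward the ceiling $L$.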
Yeah. The, I mean, but the interesting thing about Moore's law, if you actually like look at 01:28:50.480 |
the technologies involved, it's hundreds, if not thousands of S curves stacked on top of each other. 01:28:56.560 |
It's not actually an exponential curve. It's constant breakthroughs. And, and then what 01:29:02.640 |
becomes useful to think about, which is exactly what you're saying, the cost of development, 01:29:06.720 |
like the size of teams, the amount of resources that are invested in continuing to find new S 01:29:11.920 |
curves, new breakthroughs. And yeah, it's a, it's an interesting idea. You know, if we live in the 01:29:20.080 |
moment, if we sit here today, it seems to be the reasonable thing to say that exponentials end. 01:29:28.400 |
And yet in the software realm, they just keep appearing to be happening. And it's so, 01:29:36.320 |
I mean, it's so hard to disagree with Elon Musk on this because it like, I've, 01:29:43.920 |
you know, I used to be one of those folks. I'm still one of those folks. I studied autonomous 01:29:50.320 |
vehicles. This is what I worked on. And, and it's, it's like, you look at what Elon Musk is saying 01:29:56.080 |
about autonomous vehicles. Well, obviously in a couple of years or in a year or next month, 01:30:01.440 |
we'll have fully autonomous vehicles. Like there's no reason why we can't. Driving is pretty simple. 01:30:06.080 |
Like it's just a learning problem and you just need to convert all the driving that we're doing 01:30:11.440 |
into data and just having, you know, a network that trains on that data. And like we use only our eyes, 01:30:17.360 |
so you can use cameras and you can train on it. And it's like, yeah, 01:30:24.240 |
that should work. So you put on the philosophical hat. But then you put on 01:30:30.080 |
the pragmatic hat and it's like, these are the flaws of computer vision, this is what it 01:30:34.800 |
means to train at scale. And then you put the human factors, the psychology hat on, which is 01:30:41.200 |
like, driving is actually a lot about cognitive science, or whatever the heck you call 01:30:45.760 |
it. It's really hard. It's much harder to drive than we realize. There's a much larger number 01:30:52.240 |
of edge cases. So building up an intuition around this, around exponentials, is really difficult. 01:30:59.280 |
And on top of that, the pandemic is making us think about exponentials, making us realize that 01:31:06.560 |
like we don't understand anything about it. We're not able to intuit exponentials. We're either 01:31:11.440 |
ultra terrified, some part of the population, and some part is like the opposite of that, whatever 01:31:20.480 |
the word is, carefree, and we're not managing it very well. Blase. Blase. Well, wow. 01:31:26.240 |
Is that French? I assume so. It's got an accent. So it's fascinating to think about 01:31:32.480 |
what the limits of this exponential growth of technology are, not just Moore's law, but technology broadly, 01:31:44.400 |
and how that rubs up against the bitter lesson and GPT-3 and self-play mechanisms. Like it's not 01:31:54.240 |
obvious. I used to be much more skeptical about neural networks. Now I at least give a sliver 01:31:59.680 |
of possibility that we'll all be very much surprised and also, you know, 01:32:05.840 |
caught in a way that like we are not prepared for. Like in applications of social networks, 01:32:18.320 |
for example, because it feels like really good transformer models that are able to do some kind 01:32:25.200 |
of like very good natural language generation are the same kind of models that can be used 01:32:32.160 |
to learn human behavior and then manipulate that human behavior to gain advertiser dollars and all 01:32:37.920 |
those kinds of things through the capitalist system. And they arguably already are manipulating 01:32:44.000 |
human behavior. Yeah. So, but not for self-preservation, which I think is a big, 01:32:50.160 |
that would be a big step. Like if they were trying to manipulate us to convince us not to shut them 01:32:55.440 |
off, I would be very freaked out, but I don't see a path to that from where we are now. They, they, 01:33:02.560 |
they don't have any of those abilities. That's not what they're trying to do. They're trying to 01:33:08.080 |
keep people on, on the site. But see, the thing is this, this is the thing about life on earth 01:33:12.960 |
is they might be borrowing our consciousness and sentience. Like, so like in a sense they do, 01:33:20.880 |
because the creators of the algorithms have, like they're not, you know, if you look at our body, 01:33:26.160 |
we're not a single organism. We're a huge number of organisms with like tiny little motivations. 01:33:31.600 |
We're built on top of each other. In the same sense, the AI algorithms that are, they're not- 01:33:36.480 |
- It's a system that includes human companies and corporations, right? Because corporations are 01:33:41.120 |
funny organisms in and of themselves that really do seem to have self-preservation built in. And 01:33:45.760 |
I think that's at the, at the design level. I think the design to have self-preservation be 01:33:50.080 |
a focus. So you're right in that, in that broader system that we're also a part of and can have some 01:33:59.280 |
influence on, it's, it's, it is much more complicated, much more powerful. Yeah, I agree with 01:34:05.360 |
that. - So people really love it when I ask what three books, technical, philosophical, 01:34:12.080 |
fiction had a big impact on your life. Maybe you can recommend, we went with movies. We went with 01:34:19.680 |
Billy Joel and I forgot what you, what music you recommended, but- - I didn't, I just said I have 01:34:25.440 |
no taste in music. I just like pop music. - That was actually really skillful the way you evaded 01:34:30.400 |
that question. - Thank you, thanks. I was, I'm going to try to do the same with the books. 01:34:33.200 |
- So do you have a skillful way to avoid answering the question about three books you would recommend? 01:34:39.040 |
- I'd like to tell you a story. So my first job out of college was at Bellcore. I mentioned that 01:34:46.240 |
before where I worked with Dave Ackley. The head of the group was a guy named Tom Landauer. And I 01:34:50.240 |
don't know how well known he's known now, but arguably he's the, he's the inventor and the 01:34:56.320 |
first proselytizer of word embeddings. So they, they developed a system shortly before I got to 01:35:01.840 |
the group, yeah, called latent semantic analysis, that would take words of English and 01:35:09.280 |
embed them in a multi-hundred dimensional space. And then use that as a way of assessing similarity 01:35:16.240 |
and basically doing reinforcement learning. Not, sorry, not reinforcement, information retrieval, 01:35:19.840 |
sort of pre-Google information retrieval. And he was trained as an anthropologist, but then 01:35:28.560 |
became a cognitive scientist. So I was in the cognitive science research group. It's, you know, 01:35:31.840 |
like I said, I'm a cognitive science groupie. At the time I thought I'd become a cognitive scientist, 01:35:36.960 |
but then I realized in that group, no, I'm a computer scientist, but I'm a computer scientist 01:35:41.120 |
who really loves to hang out with cognitive scientists. And he said, he studied language 01:35:46.640 |
acquisition in particular. He said, you know, humans have about this number of words of vocabulary. 01:35:52.640 |
And most of that is learned from reading. And I said, that can't be true because I have a really 01:35:58.000 |
big vocabulary and I don't read. He's like, you must. I'm like, I don't think I do. I mean, 01:36:03.120 |
like stop signs, I definitely read stop signs, but like reading books is not, is not a thing 01:36:10.720 |
- Maybe the red color. - Do I read stop signs? 01:36:14.880 |
recognition at this point. I don't sound it out. - Yeah. 01:36:16.880 |
- So now I do, I wonder what that, oh yeah, stoptagons. So. 01:36:25.200 |
- That's fascinating. So you don't. - So I don't read very, I mean, 01:36:28.880 |
obviously I read and I've read, I've read plenty of books. But like some people like Charles, 01:36:33.840 |
my friend Charles and others, like a lot of people in my field, a lot of academics, 01:36:38.480 |
like reading was really a central topic to them in development. And I'm not that guy. In fact, 01:36:45.280 |
I used to joke that when I got into college, that it was on kind of a help out the illiterate kind 01:36:53.840 |
of program because I got to, like I, in my house, I wasn't a particularly bad or good reader. But 01:36:57.680 |
when I got to college, I was surrounded by these people that were just voracious in their reading 01:37:02.640 |
appetite. And they would like, have you read this? Have you read this? Have you read this? 01:37:06.000 |
And I'd be like, no, I'm clearly not qualified to be at this school. Like there's no way I 01:37:10.800 |
should be here. Now I've discovered books on tape, like audio books. And so I'm much better. 01:37:16.480 |
I'm more caught up. I read a lot of books. - A small tangent on that. It is a fascinating 01:37:23.200 |
open question to me on the topic of driving, whether, you know, supervised learning people, 01:37:30.800 |
machine learning people think you have to like drive to learn how to drive. To me, it's very 01:37:36.800 |
possible that just by us humans, by first of all, walking, but also by watching other people drive, 01:37:44.080 |
not even being inside cars as a passenger, but let's say being inside the car as a passenger, 01:37:49.120 |
but even just like being a pedestrian and crossing the road, you learn so much about 01:37:55.360 |
driving from that. It's very possible that you can, without ever being inside of a car, 01:38:00.560 |
be okay at driving once you get in it. Or like watching a movie, for example. I don't know, 01:38:06.720 |
something like that. - Have you taught anyone to drive? 01:38:15.040 |
- Uh-oh. - And I learned a lot about car driving 01:38:18.080 |
'cause my wife doesn't wanna be the one in the car while they're learning. So that's my job. 01:38:24.400 |
And it's really scary. I have a wish to live, and they're figuring things out. Now, 01:38:32.320 |
they start off very, very much better than I imagine like a neural network would, right? 01:38:39.760 |
They get that they're seeing the world. They get that there's a road that they're trying to be on. 01:38:43.920 |
They get that there's a relationship between the angle of the steering wheel and where the car goes. But it takes a while to not 01:38:48.320 |
be very jerky. And so that happens pretty quickly. Like the ability to stay in lane at speed, 01:38:54.960 |
that happens relatively fast. It's not zero shot learning, but it's pretty fast. 01:38:59.040 |
The thing that's remarkably hard, and this is, I think, partly why self-driving cars are really 01:39:03.920 |
hard, is the degree to which driving is a social interaction activity. And that blew me away. I 01:39:10.400 |
was completely unaware of it until I watched my son learning to drive. And I was realizing that 01:39:16.160 |
he was sending signals to all the cars around him. And those, in his case, he's always had social 01:39:22.880 |
communication challenges. He was sending very mixed, confusing signals to the other cars, 01:39:28.800 |
and that was causing the other cars to drive weirdly and erratically. And there was no 01:39:33.040 |
question in my mind that he would have an accident because they didn't know how to read him. 01:39:39.680 |
There's things you do with the speed that you drive, the positioning of your car, 01:39:43.520 |
that you're constantly in the head of the other drivers. And seeing him not knowing how to do 01:39:50.320 |
that and having to be taught explicitly, "Okay, you have to be thinking about what the other 01:39:54.080 |
driver is thinking," was a revelation to me. I was stunned. - Yeah, it's quite brilliant. 01:39:59.120 |
So creating theories of mind of the other-- - Theories of mind of the other cars. Yeah, 01:40:05.440 |
which I just hadn't heard discussed in the self-driving car talks that I've been to. Since 01:40:09.840 |
then, there's some people who do consider those kinds of issues, but it's way more subtle than I 01:40:18.160 |
involved with that when you realize, like when you especially focus not on other cars, 01:40:22.000 |
but on pedestrians, for example. It's literally staring you in the face. 01:40:26.800 |
- Yeah, yeah, yeah. - So then when you're just like, 01:40:28.640 |
"How do I interact with pedestrians?" - Pedestrians, you're practically talking 01:40:33.200 |
to an octopus at that point. They've got all these weird degrees of freedom. You don't know 01:40:36.320 |
what they're gonna do. They can turn around any second. - But the point is, we humans know what 01:40:40.560 |
they're gonna do. We have a good theory of mind. We have a good mental model of what they're doing, 01:40:46.560 |
and we have a good model of the model they have of you, and the model of the model of the model. 01:40:51.840 |
We're able to reason about this social game of it all. The hope is that it's quite simple, 01:41:02.160 |
actually, that it could be learned. That's why I just talked to the Waymo. I don't know if you 01:41:06.480 |
know that company. It's Google's self-driving car company. I talked to their CTO about this podcast. 01:41:12.720 |
I rode in their car, and it's quite aggressive, and it's quite fast, and it's good, and it feels 01:41:18.960 |
great. It also, just like Tesla, Waymo made me change my mind about maybe driving is easier 01:41:26.080 |
than I thought. Maybe I'm just being speciesist, human-centered. Maybe-- - It's a speciesist argument. 01:41:34.480 |
- Yeah, so I don't know, but it's fascinating to think about the same as with reading, which I 01:41:42.720 |
think you just said. You avoided the question, though I still hope you answered it somewhat. 01:41:46.960 |
You avoided it brilliantly. There's blind spots that artificial intelligence researchers have 01:41:54.960 |
about what it actually takes to learn to solve a problem. That's fascinating. 01:42:04.800 |
- She's amazing. - Fantastic, and in particular, 01:42:07.520 |
she thinks a lot about this kind of I know that you know that I know kind of planning. 01:42:12.240 |
The last time I spoke with her, she was very articulate about the ways in which 01:42:18.160 |
self-driving cars are not solved, like what's still really, really hard. 01:42:21.440 |
- But even her intuition is limited. We're all new to this. So in some sense, 01:42:26.800 |
the Elon Musk approach of being ultra confident and just like plowing-- 01:42:30.080 |
- Putting it out there. - Putting it out there. 01:42:31.920 |
Some people say it's reckless and dangerous and so on, but partly it seems to be one of the only 01:42:39.360 |
ways to make progress in artificial intelligence. These are difficult things. Democracy is messy. 01:42:49.280 |
Implementation of artificial intelligence systems in the real world is messy. 01:42:53.680 |
- So many years ago, before self-driving cars were an actual thing you could have a discussion about, 01:42:58.400 |
somebody asked me like, "What if we could use that robotic technology and use it to drive cars 01:43:03.920 |
around? Aren't people gonna be killed and then it's not, you know, blah, blah, blah?" I'm like, 01:43:08.400 |
"That's not what's gonna happen," I said with confidence, incorrectly, obviously. 01:43:11.920 |
What I think is gonna happen is we're gonna have a lot more, like a very gradual kind of rollout 01:43:17.440 |
where people have these cars in like closed communities, right? Where it's somewhat realistic, 01:43:24.320 |
but it's still in a box, right? So that we can really get a sense of what are the weird things 01:43:29.760 |
that can happen? How do we have to change the way we behave around these vehicles? Like it obviously 01:43:37.120 |
requires a kind of co-evolution that you can't just plop them in and see what happens. But of 01:43:42.720 |
course we're basically plopping them in to see what happens. So I was wrong, but I do think that 01:43:46.240 |
would have been a better plan. - So that's, but your intuition, it's funny, just zooming out and 01:43:51.760 |
looking at the forces of capitalism, it seems that capitalism rewards risk-takers, it rewards 01:43:58.880 |
and punishes risk-takers who just try it out. The academic approach is, let's try a small 01:44:10.000 |
thing and try to understand slowly the fundamentals of the problem. Let's start with one, then do 01:44:16.880 |
two, and then three. You know, the capitalist, like startup entrepreneurial 01:44:23.360 |
dream is, let's build a thousand, and let's- - Right, and 500 of them fail, but whatever, 01:44:28.640 |
the other 500, we learn from them. - But if you're good enough, I mean, 01:44:32.480 |
one thing is, your intuition would say that's gonna be hugely destructive to everything. 01:44:37.840 |
But actually, it's kind of the forces of capitalism. It's easy to be 01:44:44.400 |
critical, but if you actually look at the data, at the way our world has progressed in terms of the 01:44:49.360 |
quality of life, it seems like the competent, good people rise to the top. This is coming from me, 01:44:55.520 |
from the Soviet Union and so on. It's interesting that somebody like Elon Musk is the 01:45:03.840 |
way you push progress in artificial intelligence. Like it's forcing Waymo to step their stuff up, 01:45:11.440 |
and Waymo is forcing Elon Musk to step up. It's fascinating, 'cause I have this tension in my 01:45:20.160 |
heart of just being upset by the lack of progress in autonomous vehicles within academia. 01:45:29.600 |
So there's huge progress in the early days of the DARPA challenges, and then it just kind of 01:45:36.960 |
stopped, like at MIT, but it's true everywhere else, with the exception of a few sponsors here 01:45:44.320 |
and there. It's not seen as a sexy problem. Like the moment artificial intelligence 01:45:52.160 |
starts approaching the problems of the real world, like academics kind of like, "Eh, all right, 01:45:59.360 |
let the company--" - 'Cause they get really hard, and 01:46:03.840 |
some of us are not excited about that other way. - But I still think there are fundamental problems 01:46:09.440 |
to be solved in those difficult things. It's still publishable, I think. It's 01:46:15.840 |
the same criticism you could have of all these conferences, NeurIPS, CVPR, 01:46:20.160 |
where application papers are often as powerful and as important as a theory paper, yet 01:46:28.320 |
theory just seems much more respectable and so on. I mean, the machine learning community is 01:46:32.400 |
changing that a little bit, I mean, at least in statements, but it's still not seen as the sexiest 01:46:38.480 |
of pursuits, which is like, "How do I actually make this thing work in practice?" As opposed to 01:46:43.840 |
on this toy dataset. All that to say, are you still avoiding the three books question? Is there 01:46:51.120 |
something on audiobook that you can recommend? - Oh, yeah, I mean, yeah, I've read a lot of 01:46:56.960 |
really fun stuff. In terms of books that I find myself thinking back on that I read a while ago, 01:47:03.360 |
like the test of time to some degree, I find myself thinking of "Program or Be Programmed" a 01:47:08.800 |
lot by Douglas Rushkoff, which basically put out the premise that we all need to become 01:47:17.520 |
programmers in one form or another. And it was an analogy to once upon a time, we all had to become 01:47:25.920 |
readers, we had to become literate. And there was a time before that when not everybody was literate, 01:47:29.920 |
but once literacy was possible, the people who were literate had more of a say in society than 01:47:36.080 |
the people who weren't. And so we made a big effort to get everybody up to speed. And now it's 01:47:40.480 |
not 100% universal, but it's quite widespread. The assumption is generally that people can read. 01:47:46.960 |
The analogy that he makes is that programming is a similar kind of thing, that we need to have a say 01:47:55.760 |
in, right? So being a reader, being literate, means you can receive all this 01:48:01.120 |
information, but you don't get to put it out there. And programming is the way that we get to 01:48:05.760 |
put it out there. And that was the argument that he made. I think he specifically has now backed 01:48:10.480 |
away from this idea. He doesn't think it's happening quite this way. And that might be true 01:48:16.000 |
that it didn't, society didn't sort of play forward quite that way. I still believe in the premise. I 01:48:22.160 |
still believe that at some point the relationship that we have to these machines and 01:48:26.320 |
these networks has to be one where each individual has the wherewithal to make the machines 01:48:33.360 |
help them do the things that that person wants done. And as software people, we know how to do 01:48:39.680 |
that. When we have a problem, we're like, okay, I'll just hack up a Perl script or 01:48:42.880 |
something and make it work. So if we lived in a world where everybody could do that, that would be a 01:48:48.000 |
better world. And computers would have, I think, less sway over us, and other people's 01:48:54.800 |
software would have less sway over us as a group. - Yeah. In some sense, software engineering, 01:48:59.040 |
programming, is power. - Programming is power, right? It's like magic. It's like magic 01:49:04.800 |
spells, and it's not out of reach of everyone, but at the moment it's just a sliver of the 01:49:11.040 |
population who can commune with machines in this way. So I don't know, that book had a 01:49:16.400 |
big, big impact on me. Currently I'm reading "The Alignment Problem," actually, by Brian Christian. 01:49:22.000 |
So I don't know if you've seen this out there yet. - Is it similar to Stuart Russell's work with the 01:49:25.440 |
control problem? - It's in that same general neighborhood. I mean, they have 01:49:29.920 |
different emphases that they're concentrating on. I think Stuart's book 01:49:33.600 |
did a remarkably good job, like a celebratory good job, at describing AI technology 01:49:41.520 |
and sort of how it works. I thought that was great. It was really cool to see that in a book. 01:49:46.160 |
- Yeah. I think he has some experience writing some books. 01:49:48.960 |
- That's, you know, probably a possible thing. He's maybe thought a thing or two 01:49:53.520 |
about how to explain AI to people. Yeah. That's a really good point. This book so far has been 01:49:59.760 |
remarkably good at telling the story of, sort of, the recent history of some of the 01:50:06.720 |
things that have happened. I'm in the first third. He said this book is in three thirds. The 01:50:10.960 |
first third is essentially AI fairness and the implications of AI on society that we're seeing 01:50:17.280 |
right now. And that's been great. I mean, he's telling those stories really well. He 01:50:21.680 |
went out and talked to the frontline people whose names were associated with some of these 01:50:26.080 |
ideas, and it's been terrific. He says the second third of the book is on reinforcement learning. So 01:50:30.800 |
maybe that'll be fun. And then the third third is on this superintelligence alignment 01:50:39.360 |
problem. And I suspect that that part will be less fun for me to read. 01:50:43.760 |
- Yeah, it's an interesting problem to talk about. I find it to be the most 01:50:50.240 |
interesting, just like thinking about whether we live in a simulation or not as a thought 01:50:55.520 |
experiment to think about our own existence. In the same way, talking about the alignment problem 01:51:01.040 |
with AGI is a good way to think, similar to the trolley problem with autonomous vehicles. 01:51:06.560 |
It's a useless thing for engineering, but it's a nice little thought experiment for 01:51:11.040 |
actually thinking about our own human ethical systems, our moral systems. 01:51:17.200 |
By thinking about how we engineer these things, you start to understand yourself. 01:51:24.400 |
- So sci-fi can be good at that too. So one sci-fi book to recommend is Exhalation by 01:51:31.040 |
Ted Chiang, a bunch of short stories. Ted Chiang is the guy who wrote the short story that 01:51:35.920 |
became the movie Arrival, and all of his stories, he was a computer scientist, 01:51:43.200 |
actually, he studied at Brown, they all have this sort of really insightful bit of science or 01:51:50.080 |
computer science that drives them. And so it's just a romp, right, he creates these 01:51:56.320 |
artificial worlds by extrapolating on these ideas that we know about, but hadn't 01:52:01.840 |
really thought through to this kind of conclusion. And so his stuff is really fun to read. 01:52:06.240 |
It's mind-warping. - So I'm not sure if you're 01:52:09.840 |
familiar, I seem to mention this every other word, but I'm from the Soviet Union and I'm Russian. 01:52:16.240 |
Way too much Dostoevsky. - I think my roots are Russian too, 01:52:20.080 |
but a couple of generations back. - Well, it's probably in there somewhere. 01:52:24.080 |
So maybe we can pull at that thread a little bit of the existential dread that we all 01:52:31.040 |
feel. I think somewhere in the conversation, you mentioned that you pretty much 01:52:35.040 |
don't like dying. I forget in which context, it might've been a reinforcement 01:52:40.560 |
learning perspective. I don't know. - I know, you know what it was? 01:52:43.040 |
It was in teaching my kids to drive. - That's how you face your mortality. Yes. 01:52:48.800 |
From a human being's perspective or from a reinforcement learning researcher's perspective, 01:52:55.200 |
let me ask you the most absurd question. What do you think is the meaning of this whole 01:52:59.920 |
thing? What's the meaning of life on this spinning rock? - I mean, I think reinforcement learning 01:53:08.240 |
researchers maybe think about this from a science perspective more often than a lot of other people, 01:53:13.280 |
right? As a supervised learning person, you're probably not thinking about the sweep of 01:53:17.280 |
a lifetime, but reinforcement learning agents are having little lifetimes, little weird, 01:53:21.920 |
little lifetimes. And it's hard not to project yourself into their world sometimes. 01:53:27.520 |
But as far as the meaning of life, so when I turned 42, you may know from, that is, a book I read, 01:53:37.120 |
"The Hitchhiker's Guide to the Galaxy," that that is the meaning of life. So when I turned 42, 01:53:41.120 |
I had a meaning of life party where I invited people over and everyone shared their meaning 01:53:48.320 |
of life. We had slides made up. And so we all sat down and did a slide presentation to each other 01:53:54.960 |
about the meaning of life. And mine-- - That's great. 01:53:58.080 |
- Mine was balance. I think that life is balance. And so the activity at the party, for a 42-year-old, 01:54:07.440 |
maybe this is a little bit nonstandard, but I found all the little toys and devices that I had 01:54:12.160 |
where you had to balance on them. You had to, like, stand on it and balance. A pogo stick I brought, 01:54:17.520 |
a RipStik, which is like a weird two-wheeled skateboard. I got a unicycle, but I didn't know 01:54:25.760 |
how to ride it. I now can do it. - I would love watching you try. 01:54:29.200 |
- Yeah, I'll send you a video. - Absolutely, guys. 01:54:34.480 |
And so balance, yeah. So my wife has a really good one that she sticks to and is probably pretty 01:54:43.200 |
accurate. And it has to do with healthy relationships with people that you love and 01:54:49.200 |
working hard for good causes. But to me, yeah, balance, balance in a word, that works for me. 01:54:55.360 |
Not too much of anything, 'cause too much of anything is iffy. 01:54:59.680 |
- That feels like a Rolling Stones song. I feel like there must be one. 01:55:03.200 |
- You can't always get what you want, but if you try sometimes, you can strike a balance. 01:55:08.160 |
- Yeah, I think that's how it goes. (laughing) 01:55:13.600 |
- It's a huge honor to talk to you. This is really fun. 01:55:16.640 |
- Oh, not an honor, but- - I've been a big fan of yours. 01:55:18.720 |
So can't wait to see what you do next in the world of education, the world of parody, 01:55:27.040 |
in the world of reinforcement learning. Thanks for talking to me. 01:55:31.440 |
Thanks for listening to this conversation with Michael Littman. And thank you to our sponsors, SimpliSafe, 01:55:36.000 |
a home security company I use to monitor and protect my apartment, 01:55:40.160 |
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet, 01:55:44.960 |
Masterclass, online courses that I enjoy from some of the most amazing humans in history, 01:55:51.280 |
and BetterHelp, online therapy with a licensed professional. 01:55:54.880 |
Please check out these sponsors in the description to get a discount and to support this podcast. 01:56:00.720 |
If you enjoy this thing, subscribe on YouTube, review it with 5 Stars on Apple Podcasts, 01:56:05.680 |
follow us on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. 01:56:11.440 |
And now let me leave you with some words from Groucho Marx. 01:56:15.760 |
"If you're not having fun, you're doing something wrong." 01:56:19.520 |
Thank you for listening and hope to see you next time. 01:56:24.640 |