Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI | Lex Fridman Podcast #43

00:00:00.000 | The following is a conversation with Gary Marcus.
00:00:02.760 | He's a professor emeritus at NYU,
00:00:05.000 | founder of Robust AI and Geometric Intelligence.
00:00:08.220 | The latter is a machine learning company
00:00:10.320 | that was acquired by Uber in 2016.
00:00:13.520 | He's the author of several books
00:00:15.760 | on natural and artificial intelligence,
00:00:18.200 | including his new book, "Rebooting AI,
00:00:20.800 | "Building Machines We Can Trust."
00:00:23.360 | Gary has been a critical voice,
00:00:25.520 | highlighting the limits of deep learning and AI in general,
00:00:28.840 | and discussing the challenges before our AI community
00:00:33.720 | that must be solved in order to achieve
00:00:35.760 | artificial general intelligence.
00:00:38.320 | As I'm having these conversations,
00:00:40.120 | I try to find paths toward insight, towards new ideas.
00:00:43.600 | I try to have no ego in the process.
00:00:45.960 | It gets in the way.
00:00:47.640 | I'll often continuously try on several hats, several roles.
00:00:52.280 | One, for example, is the role of a three-year-old
00:00:54.720 | who understands very little about anything
00:00:57.120 | and asks big what and why questions.
00:01:00.360 | The other might be a role of a devil's advocate
00:01:02.920 | who presents counter ideas with a goal of arriving
00:01:05.620 | at greater understanding through debate.
00:01:08.240 | Hopefully, both are useful, interesting,
00:01:11.240 | and even entertaining at times.
00:01:13.440 | I ask for your patience as I learn
00:01:15.400 | to have better conversations.
00:01:17.760 | This is the Artificial Intelligence Podcast.
00:01:20.780 | If you enjoy it, subscribe on YouTube,
00:01:23.120 | give it five stars on iTunes, support it on Patreon,
00:01:26.320 | or simply connect with me on Twitter
00:01:28.560 | at Lex Fridman, spelled F-R-I-D-M-A-N.
00:01:32.520 | And now, here's my conversation with Gary Marcus.
00:01:36.340 | Do you think human civilization will one day have
00:01:40.400 | to face an AI-driven technological singularity
00:01:42.960 | that will, in a societal way, modify our place
00:01:46.520 | in the food chain of intelligent living beings
00:01:49.120 | on this planet?
00:01:50.120 | - I think our place in the food chain is already changed.
00:01:54.860 | So there are lots of things people used to do by hand
00:01:57.360 | that they do with machines.
00:01:59.200 | If you think of a singularity as like one single moment,
00:02:01.840 | which is, I guess, what it suggests,
00:02:03.240 | I don't know if it'll be like that.
00:02:04.600 | But I think that there's a lot of gradual change,
00:02:07.400 | and AI is getting better and better.
00:02:09.280 | I mean, I'm here to tell you why I think it's not nearly
00:02:11.460 | as good as people think, but the overall trend is clear.
00:02:14.440 | Maybe Ray Kurzweil thinks it's an exponential,
00:02:17.400 | and I think it's linear.
00:02:18.480 | In some cases, it's close to zero right now,
00:02:20.840 | but it's all gonna happen.
00:02:22.440 | We are gonna get to human-level intelligence,
00:02:24.840 | or whatever you want, call it what you will,
00:02:27.440 | artificial general intelligence at some point,
00:02:30.240 | and that's certainly gonna change our place
00:02:31.840 | in the food chain, 'cause a lot of the tedious things
00:02:34.280 | that we do now, we're gonna have machines do,
00:02:36.280 | and a lot of the dangerous things that we do now,
00:02:38.580 | we're gonna have machines do.
00:02:39.920 | I think our whole lives are gonna change
00:02:41.700 | from people finding their meaning through their work,
00:02:45.040 | through people finding their meaning
00:02:46.720 | through creative expression.
00:02:48.720 | - So the singularity will be a very gradual,
00:02:53.940 | in fact, removing the meaning of the word singularity,
00:02:56.620 | it'll be a very gradual transformation, in your view?
00:03:00.540 | - I think that it'll be somewhere in between,
00:03:03.460 | and I guess it depends what you mean by gradual and sudden.
00:03:05.700 | I don't think it's gonna be one day.
00:03:07.340 | I think it's important to realize
00:03:08.860 | that intelligence is a multidimensional variable.
00:03:11.820 | So people sort of write this stuff
00:03:14.420 | as if IQ was one number, and the day that you hit 262
00:03:19.420 | or whatever, you displace the human beings.
00:03:22.700 | And really, there's lots of facets to intelligence.
00:03:25.300 | So there's verbal intelligence,
00:03:26.720 | and there's motor intelligence,
00:03:28.560 | and there's mathematical intelligence, and so forth.
00:03:32.060 | Machines, in their mathematical intelligence,
00:03:34.620 | far exceed most people already.
00:03:36.900 | In their ability to play games,
00:03:38.140 | they far exceed most people already.
00:03:40.100 | In their ability to understand language,
00:03:41.760 | they lag behind my five-year-old,
00:03:43.140 | far behind my five-year-old.
00:03:44.740 | So there are some facets of intelligence
00:03:46.860 | that machines have grasped, and some that they haven't,
00:03:49.460 | and we have a lot of work left to do
00:03:51.780 | to get them to, say, understand natural language,
00:03:54.300 | or to understand how to flexibly approach
00:03:57.780 | some kind of novel MacGyver problem-solving
00:04:01.340 | kind of situation.
00:04:03.020 | And I don't know that all of these things will come at once.
00:04:05.620 | I think there are certain vital prerequisites
00:04:07.940 | that we're missing now.
00:04:09.320 | So, for example, machines don't really have common sense now.
00:04:12.500 | So they don't understand that bottles contain water,
00:04:15.540 | and that people drink water to quench their thirst,
00:04:18.140 | and that they don't wanna dehydrate.
00:04:19.380 | They don't know these basic facts about human beings,
00:04:22.100 | and I think that that's a rate-limiting step for many things.
00:04:25.300 | It's a rate-limiting step for reading, for example,
00:04:27.680 | because stories depend on things like,
00:04:29.740 | oh my God, that person's running out of water,
00:04:31.540 | that's why they did this thing.
00:04:33.040 | Or if only they had water,
00:04:35.580 | they could put out the fire, or whatever.
00:04:37.100 | So you watch a movie,
00:04:38.500 | and your knowledge about how things work matter.
00:04:41.220 | And so a computer can't understand that movie
00:04:44.320 | if it doesn't have that background knowledge.
00:04:45.780 | Same thing if you read a book.
00:04:47.900 | And so there are lots of places where if we had a good
00:04:50.980 | machine-interpretable set of common sense,
00:04:53.740 | many things would accelerate relatively quickly,
00:04:56.540 | but I don't think even that is a single point.
00:04:59.900 | There's many different aspects of knowledge.
00:05:02.500 | And we might, for example, find that we make a lot of progress
00:05:05.660 | on physical reasoning, getting machines to understand,
00:05:08.460 | for example, how keys fit into locks,
00:05:10.940 | or that kind of stuff,
00:05:11.940 | or how this gadget here works, and so forth and so on.
00:05:17.500 | Machines might do that long before they do
00:05:19.500 | really good psychological reasoning.
00:05:21.780 | 'Cause it's easier to get labeled data,
00:05:24.380 | or to do direct experimentation on a microphone stand
00:05:28.700 | than it is to do direct experimentation on human beings
00:05:31.780 | to understand the levers that guide them.
00:05:34.860 | - That's a really interesting point, actually.
00:05:36.860 | Whether it's easier to gain common sense knowledge
00:05:39.740 | or psychological knowledge.
00:05:41.740 | - I would say that common sense knowledge
00:05:43.300 | includes both physical knowledge and psychological knowledge.
00:05:46.860 | - And the argument I was making--
00:05:47.700 | - You said physical versus psychological.
00:05:49.660 | - Yeah, physical versus psychological.
00:05:51.060 | And the argument I was making is physical knowledge
00:05:53.220 | might be more accessible,
00:05:54.220 | because you could have a robot, for example,
00:05:55.980 | lift a bottle, try putting a bottle cap on it,
00:05:58.380 | see that it falls off if it does this,
00:06:00.380 | and see that it could turn it upside down,
00:06:01.980 | and so the robot could do some experimentation.
00:06:04.660 | We do some of our psychological reasoning
00:06:07.180 | by looking at our own minds.
00:06:09.180 | So I can sort of guess how you might react to something
00:06:11.900 | based on how I think I would react to it.
00:06:13.780 | And robots don't have that intuition,
00:06:15.940 | and they also can't do experiments on people
00:06:18.460 | in the same way, or we'll probably shut them down.
00:06:20.500 | So if we wanted to have robots figure out
00:06:24.260 | how I respond to pain by pinching me in different ways,
00:06:27.940 | that's probably, it's not gonna make it
00:06:29.660 | past the human subjects board,
00:06:31.020 | and companies are gonna get sued or whatever.
00:06:32.900 | So there's certain kinds of practical experience
00:06:35.860 | that are limited or off limits to robots.
00:06:39.660 | - That's a really interesting point.
00:06:41.060 | What is more difficult to gain a grounding in?
00:06:46.060 | Because to play devil's advocate,
00:06:49.980 | I would say that human behavior is easier expressed
00:06:54.980 | in data in digital form.
00:06:56.980 | And so when you look at Facebook algorithms,
00:06:59.140 | they get to observe human behavior.
00:07:01.140 | So you get to study and manipulate even a human behavior
00:07:04.660 | in a way that you perhaps cannot study
00:07:07.740 | or manipulate the physical world.
00:07:09.580 | So it's true, what you said, pain is like physical pain,
00:07:14.460 | but that's again the physical world.
00:07:16.060 | Emotional pain might be much easier to experiment with,
00:07:20.140 | perhaps unethical, but nevertheless,
00:07:22.820 | some would argue it's already going on.
00:07:25.420 | - I think that you're right, for example,
00:07:27.380 | that Facebook does a lot of experimentation
00:07:30.860 | in psychological reasoning.
00:07:32.940 | In fact, Zuckerberg talked about AI
00:07:36.100 | at a talk that he gave at NIPS.
00:07:38.460 | I wasn't there, but the conference has been renamed NeurIPS,
00:07:41.340 | but it used to be called NIPS when he gave the talk.
00:07:43.660 | And he talked about Facebook basically
00:07:45.340 | having a gigantic theory of mind.
00:07:47.140 | So I think it is certainly possible.
00:07:49.540 | I mean, Facebook does some of that.
00:07:51.260 | I think they have a really good idea
00:07:52.660 | of how to addict people to things.
00:07:53.940 | They understand what draws people back to things.
00:07:56.460 | I think they exploit it in ways
00:07:57.620 | that I'm not very comfortable with.
00:07:59.260 | But even so, I think that there are only some slices
00:08:03.340 | of human experience that they can access
00:08:05.660 | through the kind of interface they have.
00:08:07.260 | And of course, they're doing all kinds of VR stuff,
00:08:08.980 | and maybe that'll change, and they'll expand their data.
00:08:11.740 | And I'm sure that that's part of their goal.
00:08:14.940 | So it is an interesting question.
00:08:16.860 | - I think love, fear, insecurity,
00:08:21.700 | all of the things that I would say
00:08:25.140 | some of the deepest things about human nature
00:08:27.860 | and the human mind could be explored through digital form.
00:08:30.500 | It's just that you're actually the first person
00:08:32.220 | just now to bring up the question of which is more difficult.
00:08:35.860 | Because I think, and we'll talk
00:08:40.220 | a lot about deep learning,
00:08:41.820 | but the people who are thinking beyond deep learning
00:08:44.860 | are thinking about the physical world.
00:08:46.420 | They're starting to think about robotics
00:08:48.060 | in the home, robotics:
00:08:49.180 | How do we make robots manipulate objects,
00:08:52.300 | which requires an understanding of the physical world
00:08:55.020 | and it requires common sense reasoning.
00:08:57.300 | And that has felt like the next step
00:08:59.420 | for common sense reasoning,
00:09:00.420 | but you've now brought up the idea
00:09:02.100 | that there's also the emotional part.
00:09:03.620 | And it's interesting whether that's hard or easy.
00:09:06.820 | - I think some parts of it are and some aren't.
00:09:08.540 | So my company that I recently founded with Rod Brooks,
00:09:12.620 | from MIT for many years and so forth,
00:09:14.980 | we're interested in both.
00:09:17.220 | We're interested in physical reasoning
00:09:18.580 | and psychological reasoning among many other things.
00:09:21.500 | And there are pieces of each of these that are accessible.
00:09:26.140 | So if you want a robot to figure out
00:09:28.020 | whether it can fit under a table,
00:09:29.720 | that's a relatively accessible piece of physical reasoning.
00:09:33.640 | If you know the height of the table
00:09:34.760 | and you know the height of the robot, it's not that hard.
00:09:37.000 | If you wanted to do physical reasoning about Jenga,
00:09:39.920 | it gets a little bit more complicated
00:09:41.480 | and you have to have higher resolution data
00:09:43.840 | in order to do it.
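
To make the contrast concrete, here is a minimal sketch of the table-clearance check described a moment ago, assuming the robot's height and the table's clearance are already known; the numbers, names, and safety margin are invented for illustration.

```python
# A minimal sketch of the "can the robot fit under the table?" check.
# The measurements and the safety margin are invented for illustration;
# a real system would estimate these quantities from noisy perception.

def fits_under(robot_height_m: float, table_clearance_m: float,
               margin_m: float = 0.02) -> bool:
    """True if the robot, plus a small safety margin, clears the table."""
    return robot_height_m + margin_m <= table_clearance_m

print(fits_under(robot_height_m=0.55, table_clearance_m=0.70))  # True
print(fits_under(robot_height_m=0.75, table_clearance_m=0.70))  # False
```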
00:09:45.240 | With psychological reasoning, it's not that hard to know,
00:09:48.840 | for example, that people have goals
00:09:50.320 | and they like to act on those goals,
00:09:51.700 | but it's really hard to know exactly what those goals are.
00:09:54.880 | - But ideas of frustration.
00:09:56.800 | I mean, you could argue it's extremely difficult
00:09:58.800 | to understand the sources of human frustration
00:10:01.440 | as they're playing Jenga with you or not.
00:10:05.680 | You could argue that it's very accessible.
00:10:07.960 | - There's some things that are gonna be obvious
00:10:09.700 | and some not.
00:10:10.540 | So I don't think anybody really can do this well yet,
00:10:14.200 | but I think it's not inconceivable to imagine machines
00:10:18.200 | in the not so distant future being able to understand
00:10:22.600 | that if people lose in a game, that they don't like that.
00:10:26.200 | That's not such a hard thing to program
00:10:27.900 | and it's pretty consistent across people.
00:10:29.960 | Most people don't enjoy losing
00:10:31.520 | and so that makes it relatively easy to code.
00:10:34.600 | On the other hand, if you wanted to capture everything
00:10:36.820 | about frustration, well, people can get frustrated
00:10:39.120 | for a lot of different reasons.
00:10:40.280 | They might get sexually frustrated,
00:10:42.320 | they might get frustrated because
00:10:43.160 | they can't get their promotion at work,
00:10:45.120 | all kinds of different things.
00:10:46.840 | And the more you expand the scope,
00:10:48.560 | the harder it is for anything like the existing techniques
00:10:51.480 | to really do that.
00:10:52.960 | - So I'm talking to Garry Kasparov next week
00:10:55.600 | and he seemed pretty frustrated
00:10:57.200 | with his game against Deep Blue.
00:10:58.880 | - Yeah, well, I'm frustrated with my game
00:11:00.240 | against him last year 'cause I played him.
00:11:02.600 | I had two excuses, I'll give you my excuses up front,
00:11:04.840 | but it won't mitigate the outcome.
00:11:07.000 | I was jet lagged and I hadn't played in 25 or 30 years,
00:11:10.920 | but the outcome is he completely destroyed me
00:11:12.960 | and it wasn't even close.
00:11:14.360 | - Have you ever been beaten in any board game by a machine?
00:11:19.360 | - I have, I actually played the predecessor to Deep Blue,
00:11:24.680 | Deep Thought, I believe it was called.
00:11:27.880 | And that too crushed me.
00:11:29.960 | - And that was, and after that you realize it's over for us.
00:11:35.280 | - Well, there's no point in my playing Deep Blue,
00:11:36.800 | I mean, it's a waste of Deep Blue's computation.
00:11:40.240 | I mean, I played Kasparov
00:11:41.480 | 'cause we both gave lectures at this same event
00:11:44.760 | and he was playing 30 people.
00:11:46.000 | I forgot to mention that not only did he crush me,
00:11:47.920 | but he crushed 29 other people at the same time.
00:11:50.640 | - I mean, but the actual philosophical
00:11:53.800 | and emotional experience of being beaten by a machine,
00:11:57.280 | I imagine, to you who thinks about these things,
00:12:01.360 | may be a profound experience.
00:12:03.520 | Or no, it was a simple mathematical experience?
00:12:07.720 | - Yeah, I think a game like chess,
00:12:09.720 | particularly where you have perfect information,
00:12:12.720 | it's two player closed end
00:12:14.760 | and there's more computation for the computer,
00:12:16.880 | it's no surprise the machine wins.
00:12:18.840 | I mean, I'm not sad when a computer calculates
00:12:23.840 | a cube root faster than me.
00:12:25.240 | Like, I know I can't win that game, I'm not gonna try.
00:12:28.920 | - Well, with a system like AlphaGo or AlphaZero,
00:12:32.120 | do you see a little bit more magic in a system like that,
00:12:35.120 | even though it's simply playing a board game,
00:12:37.280 | but because there's a strong learning component?
00:12:39.960 | - You know, it's funny you should mention that
00:12:41.360 | in the context of this conversation
00:12:42.640 | 'cause Kasparov and I are working on an article
00:12:45.360 | that's gonna be called AI is not magic.
00:12:47.360 | And neither one of us thinks that it's magic
00:12:50.520 | and part of the point of this article
00:12:52.000 | is that AI is actually a grab bag of different techniques
00:12:55.180 | and some of them have,
00:12:56.080 | or they each have their own unique strengths and weaknesses.
00:12:59.160 | So, you know, you read media accounts
00:13:02.840 | and it's like, ooh, AI, it must, you know,
00:13:04.600 | it's magical or it can solve any problem.
00:13:06.600 | Well, no, some problems are really accessible
00:13:09.520 | like chess and Go and other problems like reading
00:13:12.040 | are completely outside the current technology.
00:13:15.000 | It's not like you can take the technology
00:13:17.160 | that drives AlphaGo and apply it to reading
00:13:20.160 | and get anywhere.
00:13:21.360 | You know, DeepMind has tried that a bit,
00:13:23.240 | they have all kinds of resources,
00:13:24.520 | you know, they built AlphaGo and they have,
00:13:26.180 | you know, I wrote a piece recently that they lost
00:13:29.460 | and you can argue about the word lost,
00:13:30.540 | but they spent $530 million more than they made last year.
00:13:34.900 | So, you know, they're making huge investments,
00:13:36.620 | they have a large budget
00:13:37.860 | and they have applied the same kinds of techniques
00:13:40.900 | to reading or to language
00:13:43.180 | and it's just much less productive there
00:13:45.540 | 'cause it's a fundamentally different kind of problem.
00:13:47.900 | Chess and Go and so forth are closed end problems,
00:13:50.660 | the rules haven't changed in 2,500 years,
00:13:52.980 | there's only so many moves you can make,
00:13:54.720 | you can talk about the exponential
00:13:56.460 | as you look at the combinations of moves,
00:13:58.200 | but fundamentally, you know, the Go board has 361 squares,
00:14:01.240 | that's it, that's the only, you know,
00:14:02.600 | those intersections are the only places
00:14:05.120 | that you can place your stone.
00:14:07.280 | Whereas when you're reading,
00:14:09.120 | the next sentence could be anything, you know,
00:14:11.760 | it's completely up to the writer
00:14:13.300 | what they're gonna do next.
00:14:14.440 | - That's fascinating that you think this way.
00:14:16.240 | You're clearly a brilliant mind
00:14:17.960 | who points out the emperor has no clothes,
00:14:19.680 | but so I'll play the role of a person who says--
00:14:22.320 | - You can put clothes on the emperor, good luck with it.
00:14:24.280 | - Romanticizes the notion of the emperor, period,
00:14:28.000 | suggesting that clothes don't even matter.
00:14:30.160 | Okay, so that's really interesting
00:14:33.560 | that you're talking about language.
00:14:35.320 | So there's the physical world,
00:14:37.920 | being able to move about the world,
00:14:39.680 | making an omelet and coffee and so on.
00:14:41.960 | There's language where you first understand
00:14:46.040 | what's being written and then maybe even more complicated
00:14:48.840 | than that, having a natural dialogue.
00:14:51.200 | And then there's the game of Go and chess.
00:14:53.600 | I would argue that language is much closer to Go
00:14:57.520 | than it is to the physical world.
00:14:59.680 | Like it is still very constrained.
00:15:01.440 | When you say the possibility of the number of sentences
00:15:04.720 | that could come, it is huge,
00:15:06.440 | but it nevertheless is much more constrained.
00:15:09.240 | It feels maybe I'm wrong than the possibilities
00:15:12.640 | that the physical world brings us.
00:15:14.480 | - There's something to what you say
00:15:15.760 | in some ways in which I disagree.
00:15:17.600 | So one interesting thing about language
00:15:20.520 | is that it abstracts away.
00:15:23.240 | This bottle, I don't know if it would be in the field of view
00:15:26.040 | is on this table.
00:15:27.160 | And I use the word on here,
00:15:28.680 | and I can use the word on here, maybe not here,
00:15:32.880 | but that one word encompasses in analog space
00:15:36.880 | a sort of infinite number of possibilities.
00:15:39.280 | So there is a way in which language filters down
00:15:43.000 | the variation of the world.
00:15:45.160 | And there's other ways.
00:15:46.600 | So we have a grammar and more or less,
00:15:49.840 | you have to follow the rules of that grammar.
00:15:51.680 | You can break them a little bit,
00:15:52.680 | but by and large, we follow the rules of grammar.
00:15:55.400 | And so that's a constraint on language.
00:15:57.000 | So there are ways in which language is a constrained system.
00:15:59.400 | On the other hand, there are many arguments.
00:16:02.280 | Let's say there's an infinite number of possible sentences,
00:16:04.880 | and you can establish that by just stacking them up.
00:16:07.640 | So I think there's water on the table.
00:16:09.480 | You think that I think that there's water on the table.
00:16:11.720 | Your mother thinks that you think that I think
00:16:13.280 | that the water is on the table.
00:16:14.560 | Your brother thinks that maybe your mom is wrong
00:16:16.920 | to think that you think that I think, right?
00:16:19.320 | You know, we can make sentences of infinite length,
00:16:21.960 | or we can stack up adjectives.
00:16:23.560 | This is a very silly example, a very, very silly example,
00:16:26.400 | a very, very, very, very, very, very silly example,
00:16:28.800 | and so forth.
00:16:29.640 | So there are good arguments
00:16:30.960 | that there's an infinite range of sentences.
00:16:32.440 | In any case, it's vast by any reasonable measure.
00:16:35.800 | And for example, almost anything in the physical world
00:16:37.960 | we can talk about in the language world.
00:16:40.480 | And interestingly, many of the sentences that we understand,
00:16:43.800 | we can only understand if we have a very rich model
00:16:46.840 | of the physical world.
00:16:47.840 | So I don't ultimately want to adjudicate the debate
00:16:50.640 | that I think you just set up, but I find it interesting.
00:16:53.440 | Maybe the physical world is even more complicated
00:16:57.200 | than language.
00:16:58.040 | I think that's fair, but--
00:16:59.600 | - But you think that language--
00:17:00.680 | - Language is really, really complicated.
00:17:02.400 | - Is hard.
00:17:03.240 | - It's really, really hard.
00:17:04.120 | Well, it's really, really hard for machines,
00:17:06.120 | for linguists, people trying to understand it.
00:17:08.520 | It's not that hard for children,
00:17:09.680 | and that's part of what's driven my whole career.
00:17:12.120 | I was a student of Steven Pinker's,
00:17:14.360 | and we were trying to figure out
00:17:15.360 | why kids could learn language when machines couldn't.
00:17:18.720 | - I think we're gonna get into language.
00:17:20.560 | We're gonna get into communication intelligence
00:17:22.480 | and neural networks and so on.
00:17:24.240 | But let me return to the high level,
00:17:28.880 | the futuristic for a brief moment.
00:17:32.520 | So you've written in your book, in your new book,
00:17:37.320 | it would be arrogant to suppose that we could forecast
00:17:39.960 | where AI will be, or the impact it will have
00:17:42.480 | in a thousand years, or even 500 years.
00:17:45.160 | So let me ask you to be arrogant.
00:17:47.080 | What do AI systems with or without physical bodies
00:17:51.480 | look like 100 years from now?
00:17:53.520 | If you would just, you can't predict,
00:17:56.800 | but if you were to philosophize and imagine, do--
00:18:00.520 | - Can I first justify the arrogance
00:18:02.040 | before you try to push me beyond it?
00:18:04.080 | - Sure.
00:18:04.920 | - I mean, there are examples.
00:18:06.760 | Like, people figured out how electricity worked.
00:18:09.680 | They had no idea that that was gonna lead to cell phones.
00:18:13.040 | I mean, things can move awfully fast
00:18:15.600 | once new technologies are perfected.
00:18:17.920 | Even when they made transistors,
00:18:19.440 | they weren't really thinking that cell phones
00:18:21.080 | would lead to social networking.
00:18:23.360 | - There are, nevertheless, predictions of the future,
00:18:25.720 | which are statistically unlikely to come to be,
00:18:28.800 | but nevertheless is the best--
00:18:29.640 | - You're asking me to be wrong.
00:18:31.360 | - Asking you to be statistically--
00:18:32.200 | - In which way would I like to be wrong?
00:18:34.040 | - Pick the least unlikely to be wrong thing,
00:18:37.520 | even though it's still very likely to be wrong.
00:18:39.760 | I mean, here's some things
00:18:40.600 | that we can safely predict, I suppose.
00:18:42.760 | We can predict that AI will be faster than it is now.
00:18:46.280 | It will be cheaper than it is now.
00:18:49.520 | It will be better in the sense of being more general
00:18:52.880 | and applicable in more places.
00:18:55.740 | It will be pervasive.
00:18:58.340 | You know, I mean, these are easy predictions.
00:19:01.640 | I'm sort of modeling them in my head
00:19:03.320 | on Jeff Bezos's famous predictions.
00:19:05.840 | He says, "I can't predict the future.
00:19:07.280 | "Not in every way."
00:19:08.160 | I'm paraphrasing.
00:19:09.840 | But I can predict that people will never wanna pay
00:19:11.880 | more money for their stuff.
00:19:13.280 | They're never gonna want it to take longer to get there.
00:19:15.280 | And you know, so like you can't predict everything,
00:19:17.840 | but you can predict something.
00:19:18.920 | Sure, of course it's gonna be faster and better.
00:19:20.960 | And what we can't really predict is the full scope
00:19:25.680 | of where AI will be in a certain period.
00:19:28.760 | I mean, I think it's safe to say that,
00:19:31.960 | although I'm very skeptical about current AI,
00:19:34.860 | that it's possible to do much better.
00:19:37.760 | You know, there's no in-principle argument
00:19:39.760 | that says AI is an insolvable problem,
00:19:42.120 | that there's magic inside our brains
00:19:43.640 | that will never be captured.
00:19:45.000 | I mean, I've heard people make those kind of arguments.
00:19:46.840 | I don't think they're very good.
00:19:49.040 | So AI is gonna come.
00:19:50.480 | And probably 500 years is plenty to get there.
00:19:55.560 | And then once it's here, it really will change everything.
00:19:59.280 | - So when you say AI is gonna come,
00:20:01.100 | are you talking about human level intelligence?
00:20:03.680 | So maybe--
00:20:05.000 | - I like the term general intelligence.
00:20:06.680 | So I don't think that the ultimate AI,
00:20:09.560 | if there is such a thing,
00:20:10.680 | is gonna look just like humans.
00:20:12.000 | I think it's gonna do some things
00:20:13.640 | that humans do better than current machines,
00:20:16.600 | like reason flexibly and understand language and so forth.
00:20:21.200 | But that doesn't mean they have to be identical to humans.
00:20:23.480 | So for example, humans have terrible memory
00:20:26.000 | and they suffer from what some people
00:20:28.800 | call motivated reasoning.
00:20:29.960 | So they like arguments that seem to support them
00:20:32.480 | and they dismiss arguments that they don't like.
00:20:35.480 | There's no reason that a machine should ever do that.
00:20:38.720 | - So you see that those limitations of memory
00:20:42.320 | as a bug, not a feature.
00:20:43.960 | - Absolutely.
00:20:44.880 | I'll say two things about that.
00:20:46.680 | One is I was on a panel with Danny Kahneman,
00:20:48.480 | the Nobel Prize winner, last night,
00:20:50.320 | and we were talking about this stuff.
00:20:51.800 | And I think what we converged on
00:20:53.520 | is that humans are a low bar to exceed.
00:20:56.160 | They may be outside of our skill right now
00:20:58.960 | as AI programmers, but eventually AI will exceed it.
00:21:03.960 | So we're not talking about human level AI.
00:21:06.080 | We're talking about general intelligence
00:21:07.940 | that can do all kinds of different things
00:21:09.440 | and do it without some of the flaws that human beings have.
00:21:12.240 | The other thing I'll say is I wrote a whole book,
00:21:13.720 | actually, about the flaws of humans.
00:21:15.320 | It's actually a nice bookend to the,
00:21:18.000 | or counterpoint to the current book.
00:21:19.200 | So I wrote a book called "Kluge,"
00:21:21.440 | which was about the limits of the human mind.
00:21:24.080 | Current book is kind of about those few things
00:21:26.400 | that humans do a lot better than machines.
00:21:28.800 | - Do you think it's possible that the flaws
00:21:30.880 | of the human mind, the limits of memory,
00:21:33.320 | our mortality, our bias is a strength, not a weakness?
00:21:38.320 | That that is the thing that enables
00:21:43.540 | from which motivation springs and meaning springs?
00:21:47.820 | - I've heard a lot of arguments like this.
00:21:49.540 | I've never found them that convincing.
00:21:50.940 | I think that there's a lot of making lemonade out of lemons.
00:21:55.180 | So we, for example, do a lot of free association
00:21:58.340 | where one idea just leads to the next
00:22:00.860 | and they're not really that well connected.
00:22:02.620 | And we enjoy that and we make poetry out of it
00:22:04.580 | and we make kind of movies with free associations
00:22:07.180 | and it's fun and whatever.
00:22:08.200 | I don't think that's really a virtue of the system.
00:22:12.360 | I think that the limitations in human reasoning
00:22:15.400 | actually get us in a lot of trouble.
00:22:16.640 | Like, for example, politically, we can't see eye to eye
00:22:19.360 | because we have the motivated reasoning
00:22:21.520 | I was talking about and something related
00:22:22.740 | called confirmation bias.
00:22:24.180 | So we have all of these problems
00:22:26.520 | that actually make for a rougher society
00:22:28.680 | because we can't get along
00:22:30.000 | 'cause we can't interpret the data in shared ways.
00:22:32.780 | And then we do some nice stuff with that.
00:22:36.520 | So my free associations are different from yours
00:22:38.960 | and you're kind of amused by them and that's great.
00:22:41.680 | And hence poetry.
00:22:42.700 | So there are lots of ways in which we take a lousy situation
00:22:46.240 | and make it good.
00:22:47.600 | Another example would be our memories are terrible.
00:22:50.680 | So we play games like concentration
00:22:52.400 | where you flip over two cards, try to find a pair.
00:22:55.040 | Can you imagine a computer playing that?
00:22:56.560 | Computers, like, this is the dullest game in the world.
00:22:58.360 | I know where all the cards are.
00:22:59.360 | I see it once, I know where it is.
00:23:01.040 | What are you even talking about?
00:23:02.600 | Can we make a fun game out of having this terrible memory?
00:23:07.080 | - So we are imperfect in discovering
00:23:10.600 | and optimizing some kind of utility function.
00:23:13.580 | But you think in general there is a utility function.
00:23:16.320 | There's an objective function that's better than others.
00:23:18.880 | - I didn't say that.
00:23:20.400 | - But see, the presumption, when you say--
00:23:24.440 | - I think you could design a better memory system.
00:23:27.260 | You could argue about utility functions
00:23:29.920 | and how you want to think about that.
00:23:32.120 | But objectively, it would be really nice
00:23:34.200 | to do some of the following things.
00:23:36.520 | To get rid of memories that are no longer useful.
00:23:40.040 | Objectively, that would just be good.
00:23:42.720 | And we're not that good at it.
00:23:43.600 | So when you park in the same lot every day,
00:23:46.560 | you confuse where you parked today
00:23:47.940 | with where you parked yesterday
00:23:48.880 | with where you parked the day before and so forth.
00:23:50.740 | So you blur together a series of memories.
00:23:52.640 | There's just no way that that's optimal.
00:23:55.400 | I mean, I've heard all kinds of wacky arguments.
00:23:57.120 | People trying to defend that.
00:23:58.180 | But in the end of the day,
00:23:59.020 | I don't think any of them hold water.
00:24:00.440 | - Or trauma. Memories of traumatic events,
00:24:02.840 | it would possibly be a very nice feature
00:24:05.280 | to be able to get rid of those.
00:24:06.800 | - It'd be great if you could just be like,
00:24:08.240 | "I'm gonna wipe this sector.
00:24:10.520 | "I'm done with that.
00:24:11.960 | "I didn't have fun last night.
00:24:13.220 | "I don't want to think about it anymore.
00:24:14.720 | "Whoop, bye bye."
00:24:15.840 | - Gone.
00:24:16.680 | - But we can't.
00:24:17.800 | - Do you think it's possible to build a system,
00:24:20.400 | so you said human level intelligence is a weird concept.
00:24:23.820 | - Well, I'm saying I prefer general intelligence.
00:24:25.440 | - General intelligence.
00:24:26.280 | - Human level intelligence is a real thing.
00:24:28.160 | And you could try to make a machine
00:24:29.880 | that matches people or something like that.
00:24:32.000 | I'm saying that per se shouldn't be the objective,
00:24:34.260 | but rather that we should learn from humans
00:24:37.240 | the things they do well and incorporate that into our AI,
00:24:39.680 | just as we incorporate the things that machines do well
00:24:42.120 | that people do terribly.
00:24:43.280 | So, I mean, it's great that AI systems
00:24:45.800 | can do all this brute force computation that people can't.
00:24:48.520 | And one of the reasons I work on this stuff
00:24:50.840 | is because I would like to see machines solve problems
00:24:53.320 | that people can't,
00:24:54.520 | that combine the strength,
00:24:56.040 | or that in order to be solved would combine
00:24:59.480 | the strengths of machines to do all this computation
00:25:02.240 | with the ability, let's say, of people to read.
00:25:04.240 | So, I'd like machines that can read
00:25:06.200 | the entire medical literature in a day.
00:25:08.680 | 7,000 new papers or whatever the number is
00:25:10.840 | comes out every day.
00:25:11.760 | There's no way for any doctor or whatever to read them all.
00:25:15.440 | A machine that could read would be a brilliant thing.
00:25:18.000 | And that would be strengths of brute force computation
00:25:21.100 | combined with kind of subtlety and understanding medicine
00:25:24.320 | that a good doctor or scientist has.
00:25:26.920 | - So, if we can linger a little bit
00:25:28.040 | on the idea of general intelligence.
00:25:29.680 | So, Yann LeCun believes that human intelligence
00:25:32.880 | isn't general at all, it's very narrow.
00:25:35.560 | What do you think?
00:25:36.720 | - I don't think that makes sense.
00:25:38.160 | We have lots of narrow intelligences for specific problems.
00:25:42.160 | But the fact is, like, anybody can walk into,
00:25:45.960 | let's say, a Hollywood movie,
00:25:47.640 | and reason about the content
00:25:49.160 | of almost anything that goes on there.
00:25:51.720 | So, you can reason about what happens in a bank robbery
00:25:55.200 | or what happens when someone is infertile
00:25:58.640 | and wants to go to IVF to try to have a child.
00:26:02.800 | Or you can, you know, the list is essentially endless.
00:26:05.960 | And not everybody understands every scene in a movie.
00:26:09.560 | But there's a huge range of things
00:26:11.760 | that pretty much any ordinary adult can understand.
00:26:15.080 | - His argument is that actually,
00:26:18.200 | the set of things seems large to us humans
00:26:20.720 | because we're very limited in considering
00:26:24.360 | the kind of possibilities of experiences that are possible.
00:26:27.360 | But in fact, the number of experiences that are possible
00:26:30.200 | is infinitely larger than--
00:26:33.000 | - I mean, if you want to make an argument
00:26:35.120 | that humans are constrained in what they can understand,
00:26:38.760 | I have no issue with that.
00:26:40.960 | I think that's right.
00:26:41.800 | But it's still not the same thing at all
00:26:44.440 | as saying, here's a system that can play Go.
00:26:47.480 | It's been trained on five million games.
00:26:49.840 | And then I say, can it play on a rectangular board
00:26:52.560 | rather than a square board?
00:26:53.680 | And you say, well, if I retrain it from scratch
00:26:56.560 | in another five million games, it can.
00:26:58.320 | That's really, really narrow.
00:26:59.800 | And that's where we are.
00:27:01.120 | We don't have even a system that could play Go
00:27:05.120 | and then without further retraining,
00:27:07.060 | play on a rectangular board,
00:27:08.660 | which any good human could do, you know,
00:27:11.200 | with very little problem.
00:27:12.560 | So that's what I mean by narrow.
00:27:15.280 | And so it's just wordplay to say--
00:27:16.800 | - Then it's semantics.
00:27:18.040 | Then it's just words.
00:27:19.240 | Then yeah, you mean general in a sense
00:27:21.120 | that you can do all kinds of Go board shapes flexibly.
00:27:25.760 | - Well, I mean, that would be like a first step
00:27:28.080 | in the right direction.
00:27:29.040 | Obviously, that's not what I really mean.
00:27:30.520 | You're kidding.
00:27:31.360 | What I mean by general is that you could transfer
00:27:36.120 | the knowledge you learn in one domain to another.
00:27:38.960 | So if you learn about bank robberies in movies
00:27:43.320 | and there's chase scenes,
00:27:44.780 | then you can understand that amazing scene in "Breaking Bad"
00:27:47.720 | when Walter White has a car chase scene
00:27:50.560 | with only one person.
00:27:51.500 | He's the only one in it.
00:27:52.600 | And you can reflect on how that car chase scene
00:27:55.520 | is like all the other car chase scenes you've ever seen
00:27:58.240 | and totally different and why that's cool.
00:28:01.160 | - And the fact that the number of domains
00:28:03.080 | you can do that with is finite
00:28:04.520 | doesn't make it less general.
00:28:05.760 | So the idea of general is you could just do it
00:28:07.320 | on a lot of, transfer it across a lot of domains.
00:28:09.400 | - Yeah, I mean, I'm not saying humans
00:28:10.720 | are infinitely general or that humans are perfect.
00:28:12.960 | I just said a minute ago, it's a low bar,
00:28:15.360 | but it's just, it's a low bar.
00:28:17.400 | But right now, the bar is here and we're there
00:28:20.480 | and eventually we'll get way past it.
00:28:22.640 | - So speaking of low bars,
00:28:25.600 | you've highlighted in your new book as well,
00:28:27.440 | but a couple of years ago wrote a paper
00:28:29.360 | titled "Deep Learning, a Critical Appraisal"
00:28:31.280 | that lists 10 challenges faced
00:28:33.360 | by current deep learning systems.
00:28:36.040 | So let me summarize them as data efficiency,
00:28:40.160 | transfer learning, hierarchical knowledge,
00:28:42.920 | open-ended inference, explainability,
00:28:46.320 | integrating prior knowledge, causal reasoning,
00:28:49.680 | modeling an unstable world, robustness,
00:28:52.240 | adversarial examples, and so on.
00:28:54.160 | And then my favorite probably is reliability
00:28:56.880 | and engineering of real world systems.
00:28:59.160 | So whatever, people can read the paper,
00:29:01.640 | they should definitely read the paper,
00:29:02.960 | they should definitely read your book.
00:29:04.360 | But which of these challenges, if solved,
00:29:07.480 | in your view has the biggest impact on the AI community?
00:29:11.080 | - That's a very good question.
00:29:12.640 | And I'm gonna be evasive because I think
00:29:15.800 | that they go together a lot.
00:29:18.000 | So some of them might be solved independently of others,
00:29:21.440 | but I think a good solution to AI
00:29:23.720 | starts by having real, what I would call,
00:29:26.000 | cognitive models of what's going on.
00:29:28.480 | So right now we have an approach that's dominant
00:29:31.360 | where you take statistical approximations of things,
00:29:33.960 | but you don't really understand them.
00:29:35.800 | So you know that bottles are correlated
00:29:38.560 | in your data with bottle caps,
00:29:40.360 | but you don't understand that there's a thread
00:29:42.280 | on the bottle cap that fits with the thread on the bottle,
00:29:45.320 | and that that tightens, and if I tighten enough
00:29:47.800 | that there's a seal and the water will come out.
00:29:49.640 | Like there's no machine that understands that.
00:29:51.960 | And having a good cognitive model
00:29:53.800 | of that kind of everyday phenomena
00:29:55.480 | is what we call common sense.
00:29:56.600 | And if you had that,
00:29:57.800 | then a lot of these other things start to fall
00:30:00.720 | into at least a little bit better place.
00:30:02.840 | Right now, you're like learning correlations between pixels
00:30:05.640 | when you play a video game or something like that.
00:30:07.680 | And it doesn't work very well.
00:30:08.960 | It works when the video game is just the way
00:30:10.720 | that you studied it,
00:30:11.800 | and then you alter the video game in small ways,
00:30:13.640 | like you move the paddle in Breakout a few pixels,
00:30:15.760 | and the system falls apart.
00:30:17.480 | 'Cause it doesn't understand,
00:30:19.000 | it doesn't have a representation of a paddle,
00:30:20.920 | a ball, a wall, a set of bricks, and so forth.
00:30:23.360 | And so it's reasoning at the wrong level.
00:30:26.440 | - So the idea of common sense,
00:30:29.240 | it's full of mystery, you've worked on it,
00:30:31.060 | but it's nevertheless full of mystery, full of promise.
00:30:34.720 | What does common sense mean?
00:30:36.560 | What does knowledge mean?
00:30:38.000 | So the way you've been discussing it now is very intuitive.
00:30:40.960 | It makes a lot of sense that that is something
00:30:42.600 | we should have, and that's something
00:30:43.720 | deep learning systems don't have.
00:30:45.600 | But the argument could be that we're oversimplifying it
00:30:49.720 | because we're oversimplifying the notion of common sense
00:30:53.180 | because that's how we, it feels like we as humans
00:30:57.120 | at the cognitive level approach problems.
00:30:59.320 | So maybe-- - A lot of people
00:31:00.960 | aren't actually gonna read my book.
00:31:03.320 | But if they did read the book,
00:31:05.200 | one of the things that might come as a surprise to them
00:31:07.140 | is that we actually say common sense
00:31:10.000 | is really hard and really complicated.
00:31:11.640 | So they would probably, my critics know
00:31:14.080 | that I like common sense, but that chapter actually starts
00:31:17.840 | by us beating up not on deep learning,
00:31:19.900 | but kind of on our own home team, as it were.
00:31:21.960 | So Ernie and I are first and foremost people
00:31:25.480 | that believe in at least some of what good old-fashioned AI
00:31:28.080 | tried to do, so we believe in symbols and logic
00:31:30.720 | and programming, things like that are important.
00:31:33.760 | And we go through why even those tools
00:31:37.040 | that we hold fairly dear aren't really enough.
00:31:39.560 | So we talk about why common sense is actually many things.
00:31:42.680 | And some of them fit really well
00:31:45.040 | with those classical sets of tools.
00:31:46.560 | So things like taxonomy, so I know that a bottle
00:31:50.320 | is an object or it's a vessel, let's say,
00:31:52.840 | and I know a vessel is an object
00:31:54.480 | and objects are material things in the physical world.
00:31:57.580 | So like I can make some inferences.
00:32:00.480 | If I know that vessels need to not have holes in them
00:32:05.480 | in order to carry their contents,
00:32:09.540 | then I can infer that a bottle shouldn't have a hole
00:32:11.560 | in it in order to carry its contents.
00:32:12.840 | So you can do hierarchical inference and so forth.
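
As a concrete illustration of that kind of taxonomic inference, here is a minimal sketch in which properties attached to a category are inherited down the is-a hierarchy; the tiny ontology and the function name are invented for this example, not taken from any particular system.

```python
# A minimal sketch of taxonomic (is-a) inference: properties attached to a
# category are inherited by its subcategories. The tiny ontology below is
# invented for illustration.

IS_A = {"bottle": "vessel", "vessel": "object"}            # child -> parent
PROPERTIES = {
    "vessel": {"must_not_have_holes": True},               # to carry contents
    "object": {"is_material_thing": True},
}

def inherited_properties(category):
    """Walk up the is-a chain, collecting properties from all ancestors."""
    props = {}
    while category is not None:
        # Properties stated lower in the hierarchy take precedence.
        props = {**PROPERTIES.get(category, {}), **props}
        category = IS_A.get(category)
    return props

print(inherited_properties("bottle"))
# {'must_not_have_holes': True, 'is_material_thing': True}
```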
00:32:15.600 | And we say that's great, but it's only a tiny piece
00:32:18.720 | of what you need for common sense.
00:32:21.120 | We give lots of examples that don't fit into that.
00:32:23.440 | So another one that we talk about is a cheese grater.
00:32:26.480 | You've got holes in a cheese grater,
00:32:28.040 | you've got a handle on top.
00:32:29.520 | You can build a model in the game engine sense of a model
00:32:33.400 | so that you could have a little cartoon character
00:32:35.760 | flying around through the holes of the grater,
00:32:37.980 | but we don't have a system yet,
00:32:39.980 | axiom doesn't help us that much,
00:32:41.620 | that really understands why the handle is on top
00:32:43.760 | and what you do with the handle
00:32:45.240 | or why all of those circles are sharp
00:32:47.620 | or how you'd hold the cheese with respect to the grater
00:32:50.480 | in order to make it actually work.
00:32:52.120 | - Do you think these ideas are just abstractions
00:32:55.020 | that could emerge on a system
00:32:57.160 | like a very large deep neural network?
00:32:59.920 | - I'm a skeptic that that kind of emergence per se can work.
00:33:03.120 | So I think that deep learning might play a role
00:33:05.840 | in the systems that do what I want systems to do,
00:33:08.760 | but it won't do it by itself.
00:33:09.920 | I've never seen a deep learning system
00:33:13.140 | really extract an abstract concept.
00:33:15.920 | There are principled reasons for that,
00:33:18.820 | stemming from how backpropagation works,
00:33:20.540 | how the architectures are set up.
00:33:22.920 | One example is deep learning people
00:33:25.120 | actually all build in
00:33:27.020 | something called convolution,
00:33:29.640 | which Yann LeCun is famous for, which is an abstraction.
00:33:33.200 | They don't have their systems learn this.
00:33:34.960 | So the abstraction is an object looks the same
00:33:37.760 | if it appears in different places.
00:33:39.200 | And what LeCun figured out,
00:33:41.960 | essentially why he was a co-winner of the Turing Award,
00:33:44.280 | was that if you program this in innately,
00:33:47.620 | then your system would be a whole lot more efficient.
00:33:50.680 | In principle, this should be learnable,
00:33:53.200 | but people don't have systems that kind of reify things
00:33:56.240 | and make them more abstract.
00:33:58.000 | And so what you'd really wind up with
00:34:00.400 | if you don't program that in advance is a system
00:34:02.720 | that kind of realizes that this is the same thing as this,
00:34:05.460 | but then I take your little clock there
00:34:06.980 | and I move it over and it doesn't realize
00:34:08.400 | that the same thing applies to the clock.
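
To show what that built-in abstraction buys you, here is a toy 1-D sketch of weight sharing: one filter, applied at every position, responds to the same pattern wherever it appears, so nothing has to be relearned per location. This is a hand-rolled illustration, not how any production framework implements convolution.

```python
import numpy as np

# One shared filter slid over the input (valid-mode cross-correlation):
# the same weights are reused at every position, which is exactly the
# "an object looks the same wherever it appears" abstraction built in.

def conv1d(signal, kernel):
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

pattern = np.array([1.0, -1.0, 1.0])      # the feature the filter detects

x = np.zeros(10); x[2:5] = pattern        # pattern near the start
y = np.zeros(10); y[6:9] = pattern        # same pattern, shifted right

print(np.argmax(conv1d(x, pattern)))      # 2 -- strongest response at its location
print(np.argmax(conv1d(y, pattern)))      # 6 -- same detector finds it when moved
```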
00:34:10.480 | - So the really nice thing, you're right,
00:34:12.680 | that convolution is just one of the things
00:34:14.760 | that's like, it's an innate feature
00:34:17.160 | that's programmed by the human expert.
00:34:19.240 | - We need more of those, not less.
00:34:21.240 | - Yes, but the nice feature is,
00:34:23.720 | it feels like that requires coming up
00:34:26.080 | with a brilliant idea, which can get you a Turing Award,
00:34:29.780 | but it requires less effort than,
00:34:34.780 | and it's something we'll talk about, expert systems,
00:34:36.620 | encoding a lot of knowledge by hand.
00:34:40.020 | So it feels like one, there's a huge amount of limitations
00:34:43.480 | which you clearly outline with deep learning,
00:34:46.500 | but the nice feature of deep learning,
00:34:47.820 | whatever it is able to accomplish,
00:34:49.580 | it does it, it does a lot of stuff automatically
00:34:53.500 | without human intervention.
00:34:54.900 | - Well, and that's part of why people love it, right?
00:34:57.100 | But I always think of this quote from Bertrand Russell,
00:34:59.820 | which is, "It has all the advantages
00:35:02.700 | "of theft over honest toil."
00:35:04.420 | It's really hard to program into a machine
00:35:08.140 | a notion of causality, or even how a bottle works,
00:35:11.300 | or what containers are.
00:35:12.660 | Ernie Davis and I wrote a, I don't know,
00:35:14.260 | 45-page academic paper trying just to understand
00:35:18.000 | what a container is, which I don't think
00:35:19.580 | anybody ever read the paper,
00:35:21.100 | but it's a very detailed analysis of all the things,
00:35:25.260 | well, not even all of this,
00:35:26.100 | but some of the things you need to do
00:35:27.140 | in order to understand a container.
00:35:28.580 | It would be a whole lot nicer,
00:35:30.060 | and I'm a co-author on the paper,
00:35:32.200 | I made it a little bit better,
00:35:33.180 | but Ernie did the hard work for that particular paper.
00:35:36.580 | And it took him like three months
00:35:38.020 | to get the logical statements correct,
00:35:40.640 | and maybe that's not the right way to do it.
00:35:42.820 | It's a way to do it, but on that way of doing it,
00:35:46.100 | it's really hard work to do something
00:35:48.420 | as simple as understanding containers,
00:35:50.260 | and nobody wants to do that hard work.
00:35:52.780 | Even Ernie didn't want to do that hard work.
00:35:55.580 | Everybody would rather just feed their system in
00:35:58.340 | with a bunch of videos with a bunch of containers,
00:36:00.300 | and have the systems infer how containers work.
00:36:03.820 | It would be so much less effort.
00:36:05.420 | Let the machine do the work.
00:36:06.780 | And so I understand the impulse.
00:36:08.220 | I understand why people want to do that.
00:36:10.220 | I just don't think that it works.
00:36:11.860 | I've never seen anybody build a system
00:36:14.580 | that in a robust way can actually watch videos
00:36:18.700 | and predict exactly which containers would leak
00:36:21.300 | and which ones wouldn't or something like that.
00:36:23.540 | And I know someone's gonna go out and do that
00:36:25.060 | since I said it, and I look forward to seeing it.
00:36:28.100 | But getting these things to work robustly
00:36:30.540 | is really, really hard.
00:36:32.900 | So Yann LeCun, who was my colleague at NYU
00:36:36.140 | for many years, thinks that the hard work
00:36:38.820 | should go into defining an unsupervised learning algorithm
00:36:43.180 | that will watch videos, use the next frame, basically,
00:36:46.660 | in order to tell it what's going on.
00:36:48.540 | And he thinks that's the Royal Road,
00:36:49.940 | and he's willing to put in the work
00:36:51.260 | in devising that algorithm.
00:36:53.300 | Then he wants the machine to do the rest.
00:36:55.580 | And again, I understand the impulse.
00:36:57.820 | My intuition, based on years of watching this stuff
00:37:01.740 | and making predictions 20 years ago that still hold,
00:37:03.940 | even though there's a lot more computation and so forth,
00:37:06.500 | is that we actually have to do a different kind of hard work,
00:37:08.520 | which is more like building a design specification
00:37:11.320 | for what we want the system to do,
00:37:13.100 | doing hard engineering work to figure out
00:37:15.060 | how we do things like what Yann did for convolution
00:37:18.460 | in order to figure out how to encode
00:37:20.500 | complex knowledge into the system.
00:37:22.580 | The current systems don't have that much knowledge
00:37:25.300 | other than convolution, which is, again,
00:37:27.580 | this object's being in different places
00:37:30.540 | and having the same perception, I guess I'll say,
00:37:33.260 | same appearance.
00:37:35.300 | People don't wanna do that work.
00:37:38.260 | They don't see how to naturally fit one with the other.
00:37:41.580 | - I think that's, yes, absolutely.
00:37:43.300 | But also on the expert system side,
00:37:45.540 | there's a temptation to go too far the other way,
00:37:47.620 | so it's just having an expert sort of sit down
00:37:49.860 | and code the description, the framework
00:37:52.700 | for what a container is, and then having
00:37:54.900 | the system reason the rest.
00:37:56.540 | From my view, one really exciting possibility
00:37:59.220 | is of active learning where it's continuous interaction
00:38:02.180 | between a human and machine.
00:38:04.080 | On the machine side, there's kind of deep learning type
00:38:07.060 | extraction of information from data, patterns, and so on,
00:38:10.120 | but with humans also guiding the learning procedures,
00:38:14.660 | guiding both the process and the framework
00:38:19.940 | of how the machine learns, whatever the task is.
00:38:22.100 | - I was with you with almost everything you said
00:38:24.100 | except the phrase deep learning.
00:38:26.500 | What I think you really want there
00:38:28.180 | is a new form of machine learning.
00:38:30.500 | So let's remember, deep learning is a particular way
00:38:32.980 | of doing machine learning.
00:38:33.980 | Most often, it's done with supervised data
00:38:36.980 | for perceptual categories.
00:38:38.820 | There are other things you can do with deep learning,
00:38:41.780 | some of them quite technical,
00:38:42.740 | but the standard use of deep learning
00:38:44.620 | is I have a lot of examples and I have labels for them.
00:38:47.600 | So here are pictures, this one's the Eiffel Tower,
00:38:50.360 | this one's the Sears Tower,
00:38:51.660 | this one's the Empire State Building,
00:38:53.320 | this one's a cat, this one's a pig, and so forth.
00:38:55.180 | You just get millions of examples, millions of labels.
00:38:58.900 | And deep learning is extremely good at that.
00:39:01.220 | It's better than any other solution
00:39:02.660 | that anybody has devised,
00:39:04.440 | but it is not good at representing abstract knowledge.
00:39:07.380 | It's not good at representing things
00:39:09.380 | like bottles contain liquid and have tops to them
00:39:13.980 | and so forth.
00:39:14.820 | It's not very good at learning
00:39:15.840 | or representing that kind of knowledge.
00:39:17.840 | It is an example of having a machine learn something,
00:39:21.300 | but it's a machine that learns a particular kind of thing,
00:39:23.900 | which is object classification.
00:39:25.540 | It's not a particularly good algorithm
00:39:27.760 | for learning about the abstractions that govern our world.
00:39:30.780 | There may be such a thing.
00:39:33.080 | Part of what we counsel in the book
00:39:34.300 | is maybe people should be working on devising such things.
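
For concreteness, here is a hedged, toy-scale sketch of that standard supervised recipe; the data is random and merely stands in for labeled pictures, and the tiny network only shows the shape of the training loop, not any particular published model.

```python
import torch
from torch import nn

# Toy version of "millions of examples, millions of labels": random vectors
# stand in for images, random integers stand in for labels like cat or pig.
num_classes, num_features = 5, 64
X = torch.randn(1000, num_features)
y = torch.randint(0, num_classes, (1000,))

model = nn.Sequential(nn.Linear(num_features, 32), nn.ReLU(),
                      nn.Linear(32, num_classes))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):                   # the whole supervised learning loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# What comes out is a mapping from inputs to label scores -- useful for
# perceptual categories, but it encodes no fact like "bottles have tops".
```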
00:39:36.980 | - So one possibility,
00:39:38.580 | just I wonder what you think about it,
00:39:40.580 | is deep neural networks do form abstractions,
00:39:45.200 | but they're not accessible to us humans
00:39:48.500 | in terms of we can't--
00:39:49.340 | - There's some truth in that.
00:39:50.780 | - So is it possible that either current
00:39:53.500 | or future neural networks
00:39:54.820 | form very high-level abstractions
00:39:56.500 | which are as powerful as our human abstractions
00:40:01.500 | of common sense?
00:40:02.660 | We just can't get a hold of them.
00:40:04.900 | And so the problem is essentially,
00:40:07.020 | we need to make them explainable.
00:40:09.220 | - This is an astute question,
00:40:10.640 | but I think the answer is at least partly no.
00:40:13.100 | One of the kinds of classical neural network architectures
00:40:16.160 | is what we call an auto-associator.
00:40:17.600 | It just tries to take an input,
00:40:20.140 | goes through a set of hidden layers,
00:40:21.520 | and comes out with an output.
00:40:23.040 | And it's supposed to learn, essentially,
00:40:24.400 | the identity function,
00:40:25.440 | that your input is the same as your output.
00:40:27.240 | So you think of it as binary numbers.
00:40:28.460 | You've got the one, the two, the four,
00:40:30.320 | the eight, the 16, and so forth.
00:40:32.160 | And so if you want to input 24,
00:40:33.920 | you turn on the 16, you turn on the eight.
00:40:35.880 | It's like binary one, one, and a bunch of zeros.
00:40:38.960 | So I did some experiments in 1998
00:40:41.640 | with the precursors of contemporary deep learning.
00:40:46.640 | And what I showed was,
00:40:48.600 | you could train these networks on all the even numbers,
00:40:54.800 | and they would never generalize to the odd numbers.
00:40:54.800 | A lot of people thought that I was, I don't know,
00:40:56.840 | an idiot or faking the experiment,
00:40:58.600 | or it wasn't true or whatever.
00:41:00.240 | But it is true that with this class of networks
00:41:03.400 | that we had in that day,
00:41:05.000 | that they would never, ever make this generalization.
00:41:07.280 | And it's not that the networks were stupid.
00:41:09.800 | It's that they see the world
00:41:11.240 | in a different way than we do.
00:41:13.520 | They were basically concerned,
00:41:14.840 | what is the probability that the rightmost output node
00:41:18.680 | is going to be a one?
00:41:20.080 | And as far as they were concerned,
00:41:21.320 | in everything they'd ever been trained on, it was a zero.
00:41:24.200 | That node had never been turned on.
00:41:27.120 | And so they figured, well, I turned it on now.
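A minimal numpy sketch in the spirit of the 1998 experiment described here, not the original setup: an auto-associator is trained to reproduce 8-bit binary codes of even numbers only, and the output unit for the lowest bit, which was never on during training, typically stays off even when odd numbers are presented. The encoding width, layer sizes, and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bits(n, width=8):
    # index 0 holds the least-significant bit (the one that makes a number odd)
    return np.array([(n >> i) & 1 for i in range(width)], dtype=float)

train = np.stack([to_bits(n) for n in range(0, 256, 2)])   # even numbers only
test  = np.stack([to_bits(n) for n in range(1, 256, 2)])   # odd numbers, held out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hidden-layer auto-associator: learn to reproduce the input at the output.
n_in, n_hid = 8, 16
W1 = rng.normal(0, 0.5, (n_in, n_hid))
W2 = rng.normal(0, 0.5, (n_hid, n_in))
lr = 0.5

for epoch in range(5000):
    h = sigmoid(train @ W1)
    out = sigmoid(h @ W2)
    d_out = (out - train) * out * (1 - out)     # backprop: sigmoid + squared error
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(train)
    W1 -= lr * train.T @ d_hid / len(train)

recon = sigmoid(sigmoid(test @ W1) @ W2)
# The lowest-bit unit was never on in training, so the network has no reason
# to turn it on, even though every held-out input has that bit set.
print("max activation of the lowest-bit unit on odd inputs:", recon[:, 0].max())
```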
00:41:29.060 | Whereas a person would look at the same problem and say,
00:41:31.040 | well, it's obvious.
00:41:31.880 | We're just doing the thing that corresponds.
00:41:33.880 | The Latin for it is mutatis mutandis,
00:41:35.600 | we'll change what needs to be changed.
00:41:38.280 | And we do this, this is what algebra is.
00:41:40.600 | So I can do f of x equals y plus two,
00:41:43.920 | and I can do it for a couple of values.
00:41:45.460 | I can tell you if y is three, then x is five.
00:41:47.800 | And if y is four, x is six.
00:41:49.200 | And now I can do it with some totally different number,
00:41:51.040 | like a million.
00:41:51.880 | Then you can say, well, obviously it's a million and two,
00:41:53.200 | because you have an algebraic operation
00:41:55.680 | that you're applying to a variable.
00:41:57.520 | And deep learning systems kind of emulate that,
00:42:00.700 | but they don't actually do it.
00:42:02.560 | The particular example,
00:42:04.200 | you can fudge a solution to that particular problem.
00:42:08.200 | The general form of that problem remains
00:42:10.560 | that what they learn is really correlations
00:42:12.480 | between different input and output nodes.
00:42:14.400 | They're complex correlations
00:42:16.200 | with multiple nodes involved and so forth.
00:42:18.840 | Ultimately, they're correlative.
00:42:20.320 | They're not structured over these operations over variables.
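To make the contrast concrete, here is the algebra example above written as code (as f(y) = y + 2): an operation over a variable generalizes to any binding, including values never seen before, whereas a pure record of observed input-output pairs does not. This tiny sketch only restates the point being made; it is not a model of how any particular network behaves.

```python
# Algebraic: an operation applied to a variable works for any binding of it.
def f(y):
    return y + 2

print(f(3), f(4), f(1_000_000))        # 5 6 1000002

# Correlative: a record of observed pairs has no notion of "the same
# operation applied to a new value", so a novel input yields nothing.
observed = {3: 5, 4: 6}
print(observed.get(1_000_000))         # None
```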
00:42:23.120 | Now, someday people may do a new form of deep learning
00:42:26.040 | that incorporates that stuff.
00:42:27.360 | And I think it will help a lot.
00:42:28.560 | And there's some tentative work on things
00:42:30.320 | like differentiable programming right now
00:42:32.240 | that fall into that category.
00:42:34.320 | But the sort of classic stuff like people use for ImageNet
00:42:37.840 | doesn't have it.
00:42:38.840 | And you have people like Hinton going around saying,
00:42:41.120 | symbol manipulation, like what Marcus,
00:42:42.920 | what I advocate is like the gasoline engine.
00:42:45.760 | It's obsolete.
00:42:46.580 | We should just use this cool electric power
00:42:48.880 | that we've got with a deep learning.
00:42:50.360 | And that's really destructive,
00:42:52.040 | 'cause we really do need to have the gasoline engine stuff
00:42:55.960 | that represents, I mean, I don't think it's a good analogy,
00:42:59.640 | but we really do need to have the stuff
00:43:02.240 | that represents symbols.
00:43:03.720 | - Yeah, and Hinton as well would say
00:43:06.240 | that we do need to throw out everything and start over.
00:43:09.000 | So there's a--
00:43:10.480 | - Yeah, Hinton said that to Axios.
00:43:12.820 | And I had a friend who interviewed him
00:43:15.520 | and tried to pin him down
00:43:16.440 | on what exactly we need to throw out.
00:43:17.800 | And he was very evasive.
00:43:19.880 | - Well, of course, 'cause we can't,
00:43:21.600 | if he knew that he'd throw it out himself.
00:43:23.920 | But I mean, you can't have it both ways.
00:43:25.400 | You can't be like, I don't know what to throw out,
00:43:27.520 | but I am gonna throw out the symbols.
00:43:29.960 | I mean, and not just the symbols,
00:43:32.120 | but the variables and the operations over variables.
00:43:34.080 | Don't forget, the operations over variables,
00:43:36.120 | the stuff that I'm endorsing,
00:43:37.760 | and which John McCarthy did when he founded AI,
00:43:41.520 | that stuff is the stuff that we build most computers out of.
00:43:44.200 | There are people now who say,
00:43:45.400 | "We don't need computer programmers anymore."
00:43:48.800 | Not quite looking at the statistics
00:43:50.280 | of how much computer programmers
00:43:51.200 | actually get paid right now.
00:43:53.000 | We need lots of computer programmers.
00:43:54.440 | And most of them, they do a little bit of machine learning,
00:43:57.800 | but they still do a lot of code, right?
00:43:59.920 | Code where it's like, if the value of X
00:44:02.240 | is greater than the value of Y,
00:44:03.600 | then do this kind of thing,
00:44:04.520 | the conditionals and comparing operations over variables.
00:44:08.080 | Like there's this fantasy you can machine learn anything.
00:44:10.200 | There's some things you would never wanna machine learn.
00:44:12.560 | I would not use a phone operating system
00:44:14.960 | that was machine learned.
00:44:16.080 | Like you made a bunch of phone calls
00:44:17.760 | and you recorded which packets were transmitted
00:44:19.720 | and you just machine learned it.
00:44:21.000 | It'd be insane.
00:44:22.480 | Or to build a web browser by taking logs of keystrokes
00:44:27.440 | and images, screenshots,
00:44:29.480 | and then trying to learn the relation between them.
00:44:31.480 | Nobody would ever, no rational person
00:44:33.840 | would ever try to build a browser that way.
00:44:35.920 | They would use symbol manipulation,
00:44:37.440 | the stuff that I think AI needs to avail itself of
00:44:40.080 | in addition to deep learning.
00:42:42.080 | - Can you describe your view
00:42:44.560 | of symbol manipulation in its early days?
00:44:47.880 | Can you describe expert systems
00:44:49.480 | and where do you think they hit a wall
00:44:52.520 | or a set of challenges?
00:44:53.880 | - Sure, so I mean, first I just wanna clarify,
00:44:56.520 | I'm not endorsing expert systems per se.
00:44:58.920 | You've been kind of contrasting them.
00:45:00.720 | There is a contrast,
00:45:01.560 | but that's not the thing that I'm endorsing.
00:45:03.280 | - Yes.
00:45:04.200 | - So expert systems try to capture things
00:45:06.480 | like medical knowledge with a large set of rules.
00:45:09.440 | So if the patient has this symptom and this other symptom,
00:45:12.800 | then it is likely that they have this disease.
00:45:15.680 | So there are logical rules
00:45:16.820 | and they were symbol manipulating rules of just the sort
00:45:18.900 | that I'm talking about.
00:45:20.920 | And the--
00:45:21.760 | - They encode a set of knowledge
00:45:23.400 | that the experts then put in.
00:45:24.960 | - And very explicitly so.
00:45:26.240 | So you'd have somebody interview an expert
00:45:28.760 | and then try to turn that stuff into rules.
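A toy sketch of the kind of hand-written, symbol-manipulating rules being described, if-these-symptoms-then-that-conclusion, applied by simple forward chaining. The rules themselves are invented for illustration and are not from any real medical system.

```python
# Hypothetical rules of the "if the patient has these symptoms, then likely
# this condition" form; the content is made up for illustration only.
RULES = [
    ({"fever", "cough", "loss_of_smell"}, "likely viral infection"),
    ({"chest_pain", "shortness_of_breath"}, "possible cardiac problem, refer urgently"),
]

def diagnose(symptoms):
    """Fire every rule whose conditions are all present (simple forward chaining)."""
    conclusions = [concl for conditions, concl in RULES if conditions <= symptoms]
    return conclusions or ["no rule matched"]

print(diagnose({"fever", "cough", "loss_of_smell", "headache"}))
```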
00:45:31.880 | And at some level I'm arguing for rules,
00:45:33.920 | but the difference is what those guys did in the 80s
00:45:37.640 | was almost entirely rules,
00:45:39.980 | almost entirely handwritten with no machine learning.
00:45:42.920 | What a lot of people are doing now
00:45:44.300 | is almost entirely one species of machine learning
00:45:47.300 | with no rules.
00:45:48.240 | And what I'm counseling is actually a hybrid.
00:45:50.320 | I'm saying that both of these things have their advantage.
00:45:52.880 | So if you're talking about perceptual classification,
00:45:55.260 | how do I recognize a bottle?
00:45:57.080 | Deep learning is the best tool we've got right now.
00:45:59.480 | If you're talking about making inferences
00:46:00.880 | about what a bottle does,
00:46:02.360 | something closer to the expert systems
00:46:04.120 | is probably still the best available alternative.
00:46:07.280 | And probably we want something
00:46:08.700 | that is better able to handle quantitative
00:46:11.280 | and statistical information
00:46:12.600 | than those classical systems typically were.
00:46:14.920 | So we need new technologies
00:46:16.960 | that are gonna draw some of the strengths
00:46:18.600 | of both the expert systems and the deep learning,
00:46:21.040 | but are gonna find new ways to synthesize them.
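A minimal sketch of the hybrid shape being argued for here: a learned perceptual module proposes a label, and a small symbolic knowledge base then reasons about what that object affords. Every function and fact below is a hypothetical stand-in, not an existing system or anyone's actual design.

```python
def perceive(image):
    # Stand-in for a deep-learning classifier; a real system would return a
    # label and confidence from a trained network rather than a constant.
    return "bottle", 0.93

# Hand-written knowledge of the "bottles contain liquid and have tops" sort.
KNOWLEDGE = {
    "bottle": {"contains": "liquid", "has_part": "cap", "pourable": True},
    "rock":   {"contains": None, "has_part": None, "pourable": False},
}

def infer(label):
    facts = KNOWLEDGE.get(label, {})
    if facts.get("pourable"):
        yield f"a {label} can be tipped to pour out its {facts['contains']}"
    if facts.get("has_part"):
        yield f"removing the {facts['has_part']} opens the {label}"

label, confidence = perceive(image=None)
if confidence > 0.8:
    for conclusion in infer(label):
        print(conclusion)
```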
00:46:23.240 | - How hard do you think it is to add knowledge
00:46:26.720 | at the low level?
00:46:27.740 | So mind, human intellects,
00:46:30.460 | to add extra information to symbol manipulating systems?
00:46:35.460 | - In some domains it's not that hard,
00:46:37.860 | but it's often really hard.
00:46:40.120 | Partly because a lot of the things that are important,
00:46:44.140 | people wouldn't bother to tell you.
00:46:46.100 | So if you pay someone on Amazon Mechanical Turk
00:46:49.700 | to tell you stuff about bottles,
00:46:52.100 | they probably won't even bother to tell you
00:46:55.100 | some of the basic level stuff
00:46:57.060 | that's just so obvious to a human being,
00:46:59.180 | and yet so hard to capture in machines.
00:47:02.160 | You know, they're gonna tell you more exotic things,
00:47:06.540 | and like, they're all well and good,
00:47:08.940 | but they're not getting to the root of the problem.
00:47:12.460 | So untutored humans aren't very good at knowing,
00:47:16.540 | and why should they be,
00:47:18.340 | what kind of knowledge
00:47:19.680 | the computer system developers actually need.
00:47:23.460 | I don't think that that's an irremediable problem.
00:47:26.620 | I think it's historically been a problem.
00:47:28.620 | People have had crowdsourcing efforts,
00:47:31.080 | and they don't work that well.
00:47:32.060 | There's one at MIT, we're recording this at MIT,
00:47:35.300 | called Virtual Home,
00:47:36.500 | where, and we talk about this in the book,
00:47:39.540 | you can find the exact example there,
00:47:40.740 | but people were asked to do things
00:47:42.800 | like describe an exercise routine.
00:47:44.860 | And the things that the people describe
00:47:47.540 | are very low level,
00:47:48.560 | and don't really capture what's going on.
00:47:50.100 | So they're like, go to the room with the television
00:47:53.120 | and the weights, turn on the television,
00:47:56.100 | press the remote to turn on the television,
00:47:59.020 | lift weight, put weight down, whatever.
00:48:01.440 | It's like very micro level.
00:48:03.620 | And it's not telling you
00:48:04.900 | what an exercise routine is really about,
00:48:06.860 | which is like, I wanna fit a certain number of exercises
00:48:09.860 | in a certain time period,
00:48:10.940 | I wanna emphasize these muscles.
00:48:12.240 | I mean, you want some kind of abstract description.
00:48:15.060 | The fact that you happen to press the remote control
00:48:17.260 | in this room when you watch this television
00:48:20.020 | isn't really the essence of the exercise routine.
00:48:23.060 | But if you just ask people like, what did they do?
00:48:24.780 | Then they give you this fine grain.
00:48:26.980 | And so it takes a level of expertise
00:48:29.780 | about how the AI works
00:48:31.900 | in order to craft the right kind of knowledge.
00:48:34.580 | - So there's this ocean of knowledge
00:48:36.180 | that we all operate on.
00:48:37.580 | Some of it may not even be conscious,
00:48:39.340 | or at least we're not able to communicate it effectively.
00:48:43.300 | - Yeah, most of it we would recognize if somebody said it,
00:48:45.700 | if it was true or not.
00:48:47.440 | But we wouldn't think to say that it's true or not.
00:48:49.660 | - It's a really interesting mathematical property.
00:48:53.080 | This ocean has the property
00:48:54.740 | that every piece of knowledge in it,
00:48:56.740 | we will recognize it as true if it's told,
00:48:59.940 | but we're unlikely to retrieve it in the reverse.
00:49:04.140 | So that interesting property,
00:49:07.220 | I would say there's a huge ocean of that knowledge.
00:49:10.580 | What's your intuition?
00:49:11.580 | Is it accessible to AI systems somehow?
00:49:14.660 | Can we, so you said-- - Not yet.
00:49:16.660 | I mean, most of it is not,
00:49:18.780 | I'll give you an asterisk on this in a second,
00:49:20.540 | but most of it is not ever been encoded
00:49:23.280 | in machine interpretable form.
00:49:25.680 | And so, I mean, if you say accessible,
00:49:27.300 | there's two meanings of that.
00:49:28.640 | One is like, could you build it into a machine?
00:49:32.400 | The other is like, is there some database
00:49:34.480 | that we could go download and stick into our machine?
00:49:38.440 | - The first thing, no.
00:49:39.480 | Could we?
00:49:40.680 | What's your intuition? - I think we could.
00:49:42.020 | I think it hasn't been done right.
00:49:45.040 | You know, the closest, and this is the asterisk,
00:49:47.300 | is the Cyc ("psych") system, which tried to do this.
00:49:51.160 | A lot of logicians worked for Doug Lenat
00:49:53.040 | for 30 years on this project.
00:49:55.480 | I think they stuck too closely to logic,
00:49:57.920 | didn't represent enough about probabilities,
00:50:00.200 | tried to hand code it, there are various issues,
00:50:02.160 | and it hasn't been that successful.
00:50:04.480 | That is the closest existing system
00:50:08.480 | to trying to encode this.
00:50:10.640 | - Why do you think there's not more excitement/money
00:50:14.280 | behind this idea currently?
00:50:16.400 | - There was, people view that project as a failure.
00:50:19.180 | I think that they confuse the failure
00:50:22.060 | of a specific instance that was conceived 30 years ago
00:50:25.100 | for the failure of an approach,
00:50:26.180 | which they don't do for deep learning.
00:50:28.140 | So, you know, in 2010, people had the same attitude
00:50:31.940 | towards deep learning.
00:50:32.760 | They're like, this stuff doesn't really work.
00:50:35.500 | And, you know, all these other algorithms work better
00:50:38.620 | and so forth, and then certain key technical advances
00:50:41.540 | were made, but mostly it was the advent
00:50:43.780 | of graphics processing units that changed that.
00:50:46.400 | It wasn't even anything foundational in the techniques.
00:50:50.040 | And there was some new tricks,
00:50:51.240 | but mostly it was just more compute and more data,
00:50:55.280 | things like ImageNet that didn't exist before,
00:50:57.920 | that allowed deep learning to work.
00:50:59.040 | And it could be that, you know,
00:51:02.200 | Cyc just needs a few more things, or something like Cyc,
00:51:05.480 | but the widespread view is that that just doesn't work.
00:51:08.840 | And people are reasoning from a single example.
00:51:11.800 | They don't do that with deep learning.
00:51:13.280 | They don't say nothing that existed in 2010
00:51:16.580 | and there were many, many efforts in deep learning
00:51:18.900 | was really worth anything, right?
00:51:20.600 | I mean, really, there's no model from 2010
00:51:23.840 | in deep learning
00:51:26.660 | that has any commercial value whatsoever at this point.
00:51:29.680 | They're all failures, but that doesn't mean
00:51:32.400 | that there wasn't anything there.
00:51:33.520 | I have a friend, I was getting to know him
00:51:35.980 | and he said, I had a company too, I was talking about,
00:51:39.420 | I had a new company.
00:51:40.600 | He said, I had a company too and it failed.
00:51:42.900 | And I said, well, what did you do?
00:51:44.280 | And he said, deep learning.
00:51:45.640 | And the problem was he did it in 1986 or something like that.
00:51:48.640 | And we didn't have the tools then or 1990.
00:51:51.060 | We didn't have the tools then, not the algorithms.
00:51:53.840 | You know, his algorithms weren't that different
00:51:55.400 | from modern algorithms, but he didn't have the GPUs
00:51:57.720 | to run it fast enough.
00:51:58.560 | He didn't have the data.
00:51:59.640 | And so it failed.
00:52:01.320 | It could be that, you know, symbol manipulation per se
00:52:06.320 | with modern amounts of data and compute
00:52:09.560 | and maybe some advance in compute
00:52:11.900 | for that kind of compute might be great.
00:52:14.440 | My perspective on it is not that we want to resuscitate
00:52:19.320 | that stuff per se, but we want to borrow lessons from it,
00:52:21.480 | bring together with other things that we've learned.
00:52:23.360 | - And it might have an ImageNet moment
00:52:25.840 | where it will spark the world's imagination
00:52:28.200 | and there'll be an explosion of symbol manipulation efforts.
00:52:31.440 | - Yeah, I think that people at AI2,
00:52:33.640 | Paul Allen's AI Institute, are trying to do that.
00:52:36.640 | Trying to build data sets that, well,
00:52:39.360 | they're not doing it for quite the reason that you said,
00:52:41.120 | but they're trying to build data sets
00:52:43.240 | that at least spark interest in common sense reasoning.
00:52:45.400 | - To create benchmarks.
00:52:46.840 | - Benchmarks for common sense.
00:52:48.240 | That's a large part of what the AI2.org
00:52:50.880 | is working on right now.
00:52:52.000 | - So speaking of compute,
00:52:53.280 | Rich Sutton wrote a blog post titled "Bitter Lesson."
00:52:56.400 | I don't know if you've read it,
00:52:57.240 | but he said that the biggest lesson that can be read
00:52:59.920 | from 70 years of AI research is that general methods
00:53:03.040 | that leverage computation are ultimately the most effective.
00:53:06.320 | Do you think that-- - The most effective of what?
00:53:08.640 | - Right, so they have been most effective
00:53:11.840 | for perceptual classification problems
00:53:14.520 | and for some reinforcement learning problems.
00:53:18.120 | He works on reinforcement learning.
00:53:19.400 | - Well, no, let me push back on that.
00:53:20.720 | You're actually absolutely right,
00:53:22.840 | but I would also say they have been most effective
00:53:26.400 | generally because everything we've done up to--
00:53:30.600 | Would you argue against that?
00:53:33.560 | To me, deep learning is the first thing
00:53:36.280 | that has been successful at anything in AI.
00:53:41.280 | And you're pointing out that this success
00:53:45.320 | is very limited, folks,
00:53:47.140 | but has there been something truly successful
00:53:50.280 | before deep learning?
00:53:51.720 | - Sure, I mean, I wanna make a larger point,
00:53:54.920 | but on the narrower point,
00:53:56.640 | classical AI is used, for example,
00:54:00.760 | in doing navigation instructions.
00:54:03.220 | It's very successful.
00:54:06.080 | Everybody on the planet uses it now,
00:54:07.880 | like multiple times a day.
00:54:09.520 | That's a measure of success, right?
00:54:11.400 | So I don't think classical AI was wildly successful,
00:54:16.160 | but there are cases like that.
00:54:17.640 | It is used all the time.
00:54:19.240 | Nobody even notices them 'cause they're so pervasive.
00:54:21.920 | So there are some successes for classical AI.
00:54:26.640 | I think deep learning has been more successful,
00:54:28.800 | but my usual line about this,
00:54:30.240 | and I didn't invent it, but I like it a lot,
00:54:33.160 | is just because you can build a better ladder
00:54:34.880 | doesn't mean you can build a ladder to the moon.
00:54:37.240 | So the bitter lesson is,
00:54:39.440 | if you have a perceptual classification problem,
00:54:42.320 | throwing a lot of data at it is better than anything else.
00:54:45.840 | But that has not given us any material progress
00:54:50.080 | in natural language understanding,
00:54:51.960 | common sense reasoning like a robot
00:54:53.920 | would need to navigate a home.
00:54:56.320 | Problems like that, there's no actual progress there.
00:54:59.520 | - So flip side of that, if we remove data from the picture,
00:55:02.320 | another bitter lesson is that you just have
00:55:05.880 | a very simple algorithm and you wait for compute to scale.
00:55:10.880 | It doesn't have to be learning,
00:55:13.640 | it doesn't have to be deep learning,
00:55:14.680 | it doesn't have to be data-driven,
00:55:16.520 | but just wait for the compute.
00:55:18.320 | So my question for you,
00:55:19.160 | do you think compute can unlock some of the things
00:55:21.760 | with either deep learning or symbol manipulation that--
00:55:25.560 | - Sure, but I'll put a proviso on that.
00:55:29.880 | I think more compute's always better.
00:55:31.280 | Nobody's gonna argue with more compute.
00:55:33.680 | It's like having more money.
00:55:34.760 | - There's diminishing returns on more money.
00:55:37.480 | - Exactly, there's diminishing returns on more money,
00:55:39.760 | but nobody's gonna argue
00:55:41.000 | if you wanna give them more money.
00:55:42.640 | Except maybe the people who signed the giving pledge,
00:55:44.720 | and some of them have a problem.
00:55:46.440 | They've promised to give away more money
00:55:48.040 | than they're able to.
00:55:49.720 | But the rest of us, if you wanna give me more money, fine.
00:55:52.560 | - You say more money, more problems, but okay.
00:55:54.640 | - That's true too.
00:55:55.600 | What I would say to you is your brain uses like 20 watts,
00:56:00.160 | and it does a lot of things that deep learning doesn't do,
00:56:02.720 | or that symbol manipulation doesn't do,
00:56:05.240 | that AI just hasn't figured out how to do.
00:56:07.080 | So it's an existence proof that you don't need
00:56:10.880 | server resources that are Google scale
00:56:13.760 | in order to have an intelligence.
00:56:16.200 | I built, with a lot of help from my wife,
00:56:18.960 | two intelligences that are 20 watts each,
00:56:21.760 | and far exceed anything that anybody else
00:56:24.200 | has built out of silicon.
00:56:27.400 | - Speaking of those two robots,
00:56:29.160 | how, what have you learned about AI from having--
00:56:33.320 | - Well, they're not robots, but--
00:56:35.400 | - Sorry, intelligent agents.
00:56:36.840 | - Those two intelligent agents.
00:56:38.240 | I've learned a lot by watching my two intelligent agents.
00:56:41.240 | I think that what's fundamentally interesting,
00:56:45.920 | well, one of the many things
00:56:47.040 | that's fundamentally interesting about them
00:56:48.720 | is the way that they set their own problems to solve.
00:56:52.040 | So my two kids are a year and a half apart.
00:56:54.600 | They're both five and six and a half.
00:56:56.480 | They play together all the time,
00:56:58.240 | and they're constantly creating new challenges.
00:57:01.000 | That's what they do, is they make up games.
00:57:03.800 | They're like, well, what if this, or what if that,
00:57:05.960 | or what if I had this superpower,
00:57:07.880 | or what if you could walk through this wall?
00:57:10.360 | So they're doing these what if scenarios all the time.
00:57:14.080 | And that's how they learn something about the world
00:57:17.560 | and grow their minds, and machines don't really do that.
00:57:22.560 | - So that's interesting.
00:57:23.680 | And you've talked about this, you've written about it,
00:57:25.280 | you've thought about it, nature versus nurture.
00:57:28.600 | So what innate knowledge do you think we're born with,
00:57:33.600 | and what do we learn along the way
00:57:35.640 | in those early months and years?
00:57:38.340 | - Can I just say how much I like that question?
00:57:40.680 | You phrased it just right, and almost nobody ever does,
00:57:45.860 | which is what is the innate knowledge
00:57:47.320 | and what's learned along the way?
00:57:49.280 | So many people dichotomize it,
00:57:51.240 | and they think it's nature versus nurture,
00:57:53.480 | when it obviously has to be nature and nurture.
00:57:56.840 | They have to work together.
00:57:58.640 | You can't learn the stuff along the way
00:58:00.600 | unless you have some innate stuff.
00:58:02.440 | But just because you have the innate stuff
00:58:03.960 | doesn't mean you don't learn anything.
00:58:05.920 | And so many people get that wrong, including in the field.
00:58:09.440 | People think if I work in machine learning,
00:58:12.320 | the learning side, I must not be allowed to work
00:58:15.360 | on the innate side, or that will be cheating.
00:58:17.360 | Exactly, people have said that to me.
00:58:19.680 | And it's just absurd, so thank you.
00:58:22.460 | - But you could break that apart more.
00:58:25.240 | I've talked to folks who studied the development
00:58:27.200 | of the brain, and I mean, the growth of the brain
00:58:30.720 | in the first few days, in the first few months,
00:58:34.960 | in the womb, all of that, is that innate?
00:58:39.560 | So that process of development from a stem cell
00:58:42.360 | to the growth of the central nervous system and so on,
00:58:45.120 | to the information that's encoded
00:58:49.340 | through the long arc of evolution,
00:58:52.320 | so all of that comes into play, and it's unclear.
00:58:55.360 | It's not just whether it's a dichotomy or not,
00:58:57.360 | it's where most of the knowledge is encoded.
00:59:02.100 | So what's your intuition about the innate knowledge,
00:59:07.100 | the power of it, what's contained in it,
00:59:09.720 | what can we learn from it?
00:59:11.380 | - One of my earlier books was actually trying
00:59:12.800 | to understand the biology of this.
00:59:14.080 | The book was called The Birth of the Mind.
00:59:15.880 | Like, how is it the genes even build innate knowledge?
00:59:18.940 | And from the perspective of the conversation
00:59:21.480 | we're having today, there's actually two questions.
00:59:23.640 | One is what innate knowledge or mechanisms
00:59:26.520 | or what have you, people or other animals
00:59:29.680 | might be endowed with.
00:59:30.920 | I always like showing this video
00:59:32.280 | of a baby ibex climbing down a mountain.
00:59:34.640 | That baby ibex, a few hours after its birth,
00:59:37.400 | knows how to climb down a mountain.
00:59:38.440 | That means that it knows, not consciously,
00:59:40.960 | something about its own body and physics
00:59:43.040 | and 3D geometry and all of this kind of stuff.
00:59:46.440 | So there's one question about it,
00:59:48.680 | what does biology give its creatures?
00:59:51.280 | And what has evolved in our brains?
00:59:53.280 | How is that represented in our brains?
00:59:55.000 | The question I thought about in the book
00:59:56.240 | The Birth of the Mind.
00:59:57.360 | And then there's a question of what AI should have.
00:59:59.360 | And they don't have to be the same.
01:00:01.640 | But I would say that it's a pretty interesting
01:00:06.640 | set of things that we are equipped with
01:00:08.760 | that allows us to do a lot of interesting things.
01:00:10.600 | So I would argue or guess, based on my reading
01:00:13.800 | of the developmental psychology literature,
01:00:15.280 | which I've also participated in,
01:00:18.080 | that children are born with a notion of space,
01:00:21.800 | time, other agents, places,
01:00:24.460 | and also this kind of mental algebra
01:00:27.680 | that I was describing before.
00:59:30.260 | Oh, and causation, if I didn't just say that.
01:00:33.120 | So at least those kinds of things.
01:00:35.640 | They're like frameworks for learning the other things.
01:00:39.000 | - Are they disjoint in your view
01:00:40.360 | or is it just somehow all connected?
01:00:42.880 | You've talked a lot about language.
01:00:44.360 | Is it all kind of connected in some mesh
01:00:47.960 | that's language-like,
01:00:50.280 | of understanding concepts altogether?
01:00:52.560 | - I don't think we know for people
01:00:54.840 | how they're represented, and machines
01:00:56.280 | just don't really do this yet.
01:00:58.180 | So I think it's an interesting open question,
01:01:00.560 | both for science and for engineering.
01:01:02.620 | Some of it has to be at least interrelated
01:01:06.360 | in the way that the interfaces of a software package
01:01:10.200 | have to be able to talk to one another.
01:01:12.160 | So the systems that represent space and time
01:01:16.640 | can't be totally disjoint
01:01:18.320 | because a lot of the things that we reason about
01:01:20.720 | are the relations between space and time and cause.
01:01:23.000 | So I put this on and I have expectations
01:01:26.440 | about what's gonna happen with the bottle cap
01:01:28.160 | on top of the bottle,
01:01:29.460 | and those span space and time.
01:01:32.560 | If the cap is over here, I get a different outcome.
01:01:35.720 | If the timing is different, if I put this here,
01:01:38.560 | after I move that, then I get a different outcome.
01:01:41.900 | That relates to causality.
01:01:43.040 | So obviously these mechanisms, whatever they are,
01:01:47.840 | can certainly communicate with each other.
01:01:49.920 | - So I think evolution had a significant role
01:01:53.160 | to play in the development of this whole kluge, right?
01:01:57.080 | How efficient do you think is evolution?
01:01:59.200 | - Oh, it's terribly inefficient, except that--
01:02:01.920 | - Well, can we do better?
01:02:03.160 | (laughing)
01:02:03.960 | - Well, let's come to that in a second.
01:02:05.720 | It's inefficient except that once it gets a good idea,
01:02:09.360 | it runs with it.
01:02:10.840 | So it took, I guess, a billion years,
01:02:15.840 | roughly a billion years,
01:02:17.560 | to evolve to a vertebrate brain plan.
01:02:23.680 | And once that vertebrate brain plan evolved,
01:02:26.880 | it spread everywhere.
01:02:28.440 | So fish have it and dogs have it and we have it.
01:02:31.660 | We have adaptations of it and specializations of it.
01:02:34.080 | But, and the same thing with a primate brain plan.
01:02:37.120 | So monkeys have it and apes have it and we have it.
01:02:41.080 | So there are additional innovations like color vision
01:02:43.720 | and those spread really rapidly.
01:02:45.840 | So it takes evolution a long time to get a good idea,
01:02:48.840 | but, and I'm being anthropomorphic and not literal here,
01:02:53.280 | but once it has that idea, so to speak,
01:02:58.520 | which cashes out into one set of genes or in the genome,
01:02:58.520 | those genes spread very rapidly.
01:03:00.440 | And they're like subroutines or libraries,
01:03:02.640 | I guess is the word people might use nowadays
01:03:04.560 | or be more familiar with.
01:03:05.640 | They're libraries that get used over and over again.
01:03:08.760 | So once you have the library for building something
01:03:11.720 | with multiple digits, you can use it for a hand,
01:03:13.840 | but you can also use it for a foot.
01:03:15.520 | You just kind of reuse the library
01:03:17.400 | with slightly different parameters.
01:03:19.080 | Evolution does a lot of that,
01:03:20.640 | which means that the speed over time picks up.
01:03:23.480 | So evolution can happen faster
01:03:25.560 | because you have bigger and bigger libraries.
01:03:28.380 | And what I think has happened in attempts
01:03:32.200 | at evolutionary computation is that people
01:03:35.320 | start with libraries that are very, very minimal,
01:03:40.320 | like almost nothing, and then progress is slow
01:03:44.240 | and it's hard for someone to get a good PhD thesis out of it
01:03:46.920 | and they give up.
01:03:48.240 | If we had richer libraries to begin with,
01:03:50.240 | if you were evolving from systems
01:03:52.580 | that had an innate structure to begin with,
01:03:55.320 | then things might speed up.
01:03:56.800 | - Or more PhD students, if the evolutionary process
01:03:59.920 | is indeed in a meta way, runs away with good ideas,
01:04:04.240 | you need to have a lot of ideas,
01:04:06.720 | pool of ideas in order for it to discover one
01:04:08.800 | that you can run away with.
01:04:10.240 | And PhD students representing individual ideas as well.
01:04:13.160 | - Yeah, I mean, you could throw
01:04:14.280 | a billion PhD students at it.
01:04:16.200 | - Yeah, the monkeys at typewriters with Shakespeare, yeah.
01:04:19.160 | - Well, I mean, those aren't cumulative, right?
01:04:22.040 | That's just random.
01:04:23.400 | And part of the point that I'm making
01:04:24.920 | is that evolution is cumulative.
01:04:26.720 | So if you have a billion monkeys independently,
01:04:31.080 | you don't really get anywhere.
01:04:32.380 | But if you have a billion monkeys,
01:04:33.760 | and I think Dawkins made this point originally,
01:04:35.680 | or probably other people,
01:04:36.520 | but Dawkins made it very nice
01:04:37.560 | in either "The Selfish Gene" or "The Blind Watchmaker".
01:04:40.000 | If there's some sort of fitness function
01:04:44.040 | that can drive you towards something,
01:04:45.840 | I guess that's Dawkins' point.
01:04:47.080 | And my point, which is a variation on that,
01:04:49.400 | is that if the evolution is cumulative,
01:04:51.920 | I mean, the related points,
01:04:53.800 | then you can start going faster.
01:04:55.600 | - Do you think something like the process of evolution
01:04:57.760 | is required to build intelligent systems?
01:05:00.180 | So if we-- - Not logically.
01:05:01.560 | So all the stuff that evolution did,
01:05:04.040 | a good engineer might be able to do.
01:05:07.040 | So for example, evolution made quadrupeds,
01:05:10.560 | which distribute the load across a horizontal surface.
01:05:14.200 | A good engineer could come up with that idea.
01:05:16.960 | I mean, sometimes good engineers come up with ideas
01:05:18.720 | by looking at biology.
01:05:19.760 | There's lots of ways to get your ideas.
01:05:22.480 | Part of what I'm suggesting
01:05:23.640 | is we should look at biology a lot more.
01:05:25.960 | We should look at the biology of thought and understanding.
01:05:30.200 | You know, the biology by which creatures
01:05:32.520 | intuitively reason about physics or other agents.
01:05:35.960 | Or like, how do dogs reason about people?
01:05:37.920 | Like, they're actually pretty good at it.
01:05:39.640 | If we could understand, at my college we joked dognition.
01:05:44.000 | If we could understand dognition well
01:05:46.280 | and how it was implemented, that might help us with our AI.
01:05:49.780 | - So do you think it's possible
01:05:53.780 | that the kind of timescale that evolution took
01:05:57.200 | is the kind of timescale that will be needed
01:05:58.940 | to build intelligent systems?
01:06:00.520 | Or can we significantly accelerate that process
01:06:02.960 | inside a computer?
01:06:04.020 | - I mean, I think the way that we accelerate that process
01:06:07.560 | is we borrow from biology.
01:06:09.720 | Not slavishly, but I think we look at how biology
01:06:14.280 | has solved problems and we say,
01:06:15.680 | does that inspire any engineering solutions here?
01:06:18.880 | - Try to mimic biological systems
01:06:20.680 | and then therefore have a shortcut.
01:06:22.360 | - Yeah, I mean, there's a field called biomimicry
01:06:25.000 | and people do that for like material science all the time.
01:06:28.960 | We should be doing the analog of that for AI.
01:06:32.920 | And the analog for that for AI
01:06:34.440 | is to look at cognitive science or the cognitive sciences,
01:06:37.000 | which is psychology, maybe neuroscience,
01:06:39.600 | linguistics and so forth.
01:06:41.320 | Look to those for insight.
01:06:43.440 | - What do you think is a good test
01:06:44.600 | of intelligence in your view?
01:06:46.680 | - I don't think there's one good test.
01:06:48.480 | In fact, I try to organize a movement
01:06:51.800 | towards something called a Turing Olympics.
01:06:53.360 | And my hope is that Francois is actually gonna take,
01:06:56.160 | Francois Chollet is gonna take over this.
01:06:58.240 | I think he's interested in it.
01:06:59.920 | I just don't have a place in my busy life at this moment.
01:07:03.460 | But the notion is that there'll be many tests
01:07:06.420 | and not just one because intelligence is multifaceted.
01:07:09.480 | There can't really be a single measure of it
01:07:12.880 | because it isn't a single thing.
01:07:14.660 | Like just the crudest level,
01:07:17.320 | the SAT has a verbal component and a math component
01:07:19.840 | 'cause they're not identical.
01:07:21.320 | And Howard Gardner has talked about multiple intelligences,
01:07:23.640 | like kinesthetic intelligence and verbal intelligence
01:07:26.920 | and so forth.
01:07:27.740 | There are a lot of things that go into intelligence
01:07:29.920 | and people can get good at one or the other.
01:07:32.520 | I mean, in some sense, like every expert
01:07:34.680 | has developed a very specific kind of intelligence.
01:07:37.200 | And then there are people that are generalists.
01:07:39.240 | And I think of myself as a generalist
01:07:41.720 | with respect to cognitive science,
01:07:43.360 | which doesn't mean I know anything about quantum mechanics,
01:07:45.600 | but I know a lot about the different facets of the mind.
01:07:49.200 | And there's a kind of intelligence
01:07:51.320 | to thinking about intelligence.
01:07:52.620 | I like to think that I have some of that,
01:07:54.720 | but social intelligence, I'm just okay.
01:07:57.440 | There are people that are much better at that than I am.
01:08:00.120 | - Sure, but what would be really impressive to you?
01:08:03.000 | I think the idea of a Turing Olympics
01:08:06.120 | is really interesting,
01:08:07.040 | especially if somebody like Francois is running it.
01:08:09.640 | But to you in general, not as a benchmark,
01:08:14.360 | but if you saw an AI system
01:08:16.320 | being able to accomplish something
01:08:18.440 | that would impress the heck out of you,
01:08:21.720 | what would that thing be?
01:08:22.740 | Would it be natural language conversation?
01:08:24.700 | - For me personally, I would like to see
01:08:28.580 | a kind of comprehension that relates to what you just said.
01:08:30.680 | So I wrote a piece in the New Yorker in I think 2015,
01:08:34.980 | right after Eugene Goostman, which was a software package,
01:08:39.940 | won a version of the Turing test.
01:08:42.860 | And the way that it did this is it,
01:08:45.220 | well, the way you win the Turing test,
01:08:46.900 | so called win it, is,
01:08:49.340 | the Turing test is you fool a person
01:08:50.700 | into thinking that a machine is a person,
01:08:54.460 | is you're evasive, you pretend to have limitations
01:08:58.020 | so you don't have to answer certain questions and so forth.
01:09:00.620 | So this particular system pretended to be a 13 year old boy
01:09:04.380 | from Odessa who didn't understand English
01:09:07.060 | and was kind of sarcastic
01:09:08.140 | and wouldn't answer your questions and so forth.
01:09:09.740 | And so judges got fooled into thinking briefly
01:09:12.540 | with very little exposure, he was a 13 year old boy,
01:09:14.740 | and it ducked all the questions
01:09:16.420 | Turing was actually interested in,
01:09:17.620 | which is like,
01:09:18.460 | how do you make the machine actually intelligent?
01:09:20.500 | So that test itself is not that good.
01:09:22.140 | And so in the New Yorker, I proposed an alternative,
01:09:25.780 | I guess, and the one that I proposed there
01:09:27.260 | was a comprehension test.
01:09:29.020 | And I must like Breaking Bad
01:09:31.060 | 'cause I've already given you one Breaking Bad example,
01:09:32.900 | and in that article I have one as well,
01:09:35.660 | which was something like,
01:09:36.660 | if Walter White,
01:09:37.660 | you should be able to watch an episode of Breaking Bad,
01:09:40.340 | or maybe you have to watch the whole series
01:09:41.700 | to be able to answer the question and say,
01:09:43.540 | if Walter White took a hit out on Jesse,
01:09:45.580 | why did he do that?
01:09:46.980 | Right, so if you could answer kind of arbitrary questions
01:09:49.380 | about characters motivations,
01:09:51.260 | I would be really impressed with that.
01:09:52.980 | I mean, you build software to do that.
01:09:55.380 | They could watch a film or there are different versions.
01:09:58.540 | And so ultimately I wrote this up with Praveen Paritosh
01:10:01.940 | in a special issue of AI Magazine
01:10:04.060 | that basically was about the Turing Olympics.
01:10:05.780 | There were like 14 tests proposed.
01:10:07.700 | And the one that I was pushing
01:10:08.940 | was a comprehension challenge.
01:10:10.140 | And Praveen, who's at Google,
01:10:11.700 | was trying to figure out like how we would actually run it.
01:10:13.500 | And so we wrote a paper together.
01:10:15.380 | And you could have a text version too,
01:10:17.340 | or you could have an auditory podcast version,
01:10:19.700 | you could have a written version.
01:10:20.580 | But the point is that you win at this test
01:10:23.740 | if you can do, let's say, human level or better than humans
01:10:26.980 | at answering kind of arbitrary questions.
01:10:29.500 | You know, why did this person pick up the stone?
01:10:31.580 | What were they thinking when they picked up the stone?
01:10:34.100 | Were they trying to knock down glass?
01:10:36.180 | And I mean, ideally these wouldn't be multiple choice either
01:10:38.620 | because multiple choice is pretty easily gamed.
01:10:41.060 | So if you could have relatively open-ended questions
01:10:44.140 | and you can answer why people are doing this stuff,
01:10:47.420 | I would be very impressed.
01:10:48.260 | And of course, humans can do this, right?
01:10:50.100 | If you watch a well-constructed movie
01:10:52.860 | and somebody picks up a rock,
01:10:55.580 | everybody watching the movie
01:10:56.980 | knows why they picked up the rock, right?
01:10:59.460 | They all know, oh my gosh,
01:11:01.180 | he's gonna hit this character or whatever.
01:11:03.660 | We have an example in the book
01:11:05.020 | about when a whole bunch of people say,
01:11:07.700 | "I am Spartacus," you know, this famous scene.
01:11:11.820 | The viewers understand, first of all,
01:11:14.140 | that everybody or everybody minus one has to be lying.
01:11:19.060 | They can't all be Spartacus.
01:11:20.380 | We have enough common sense knowledge
01:11:21.820 | to know they couldn't all have the same name.
01:11:24.160 | We know that they're lying
01:11:25.380 | and we can infer why they're lying, right?
01:11:27.140 | They're lying to protect someone
01:11:28.500 | and to protect things they believe in.
01:11:30.380 | You get a machine that can do that.
01:11:32.380 | They can say, "This is why these guys all got up
01:11:35.100 | "and said, 'I am Spartacus.'"
01:11:36.980 | I will sit down and say,
01:11:38.220 | "AI has really achieved a lot, thank you."
01:11:41.380 | - Without cheating any part of the system.
01:11:43.900 | - Yeah, I mean, if you do it,
01:11:45.660 | there are lots of ways you could cheat.
01:11:46.740 | You could build a Spartacus machine
01:11:48.820 | that works on that film.
01:11:50.260 | That's not what I'm talking about.
01:11:51.100 | I'm talking about you can do this
01:11:52.860 | with essentially arbitrary films from a large set.
01:11:55.740 | - Even beyond films,
01:11:56.580 | because it's possible such a system would discover
01:11:59.020 | that the number of narrative arcs in film
01:12:02.580 | is limited to like 90, 30.
01:12:04.180 | - Well, there's a famous thing
01:12:05.020 | about the classic seven plots or whatever.
01:12:07.100 | I don't care.
01:12:07.940 | If you wanna build in the system,
01:12:09.140 | boy meets girl, boy loses girl, boy finds girl.
01:12:11.660 | That's fine.
01:12:12.500 | I don't mind having some head start.
01:12:13.940 | - Innate knowledge, okay, good.
01:12:16.300 | - I mean, you could build it in innately
01:12:17.980 | or you could have your system watch a lot of films.
01:12:20.500 | If you can do this at all,
01:12:22.380 | but with a wide range of films,
01:12:23.740 | not just one film in one genre.
01:12:26.260 | But even if you could do it for all Westerns,
01:12:28.880 | I'd be reasonably impressed.
01:12:30.340 | - Yeah. (laughs)
01:12:31.940 | So in terms of being impressed,
01:12:34.100 | just for the fun of it,
01:12:35.860 | because you've put so many interesting ideas out there
01:12:38.420 | in your book,
01:12:40.380 | challenging the community for further steps.
01:12:43.700 | Is it possible on the deep learning front
01:12:46.720 | that you're wrong about its limitations?
01:12:50.260 | That deep learning will unlock,
01:12:52.260 | Yann LeCun next year will publish a paper
01:12:54.460 | that achieves this comprehension.
01:12:56.900 | So do you think that way often as a scientist?
01:13:00.260 | Do you consider that your intuition,
01:13:03.020 | that deep learning could actually run away with it?
01:13:06.700 | - I'm more worried about rebranding
01:13:09.700 | as a kind of political thing.
01:13:11.300 | So, I mean, what's gonna happen, I think,
01:13:14.040 | is that deep learning is gonna start
01:13:15.620 | to encompass symbol manipulation.
01:13:17.340 | So I think Hinton's just wrong.
01:13:18.980 | You know, Hinton says we don't want hybrids.
01:13:20.820 | I think people will work towards hybrids
01:13:22.340 | and they will relabel their hybrids as deep learning.
01:13:24.700 | We've already seen some of that.
01:13:25.820 | So AlphaGo is often described as a deep learning system,
01:13:29.580 | but it's more correctly described as a system
01:13:31.700 | that has deep learning, but also Monte Carlo Tree Search,
01:13:33.920 | which is a classical AI technique.
01:13:35.780 | And people will start to blur the lines
01:13:37.580 | in the way that IBM blurred Watson.
01:13:39.860 | First, Watson meant this particular system,
01:13:41.580 | and then it was just anything
01:13:42.420 | that IBM built in their cognitive division.
01:13:44.180 | - But purely, let me ask, for sure,
01:13:45.780 | that's a branding question, and that's a giant mess.
01:13:49.540 | I mean, purely, a single neural network
01:13:51.980 | being able to accomplish--
01:13:53.460 | - I don't stay up at night worrying
01:13:55.820 | that that's gonna happen.
01:13:57.820 | And I'll just give you two examples.
01:13:59.260 | One is a guy at DeepMind thought he had finally outfoxed me.
01:14:04.540 | @Zergilord, I think, is his Twitter handle.
01:14:06.940 | And he said, he specifically made an example.
01:14:10.580 | Marcus said that such and such.
01:14:12.620 | He fed it into GPT-2, which is the AI system
01:14:16.420 | that is so smart that OpenAI couldn't release it
01:14:19.060 | 'cause it would destroy the world, right?
01:14:21.180 | You remember that a few months ago.
01:14:22.940 | So he feeds it into GPT-2,
01:14:26.060 | and my example was something like a rose is a rose,
01:14:28.740 | a tulip is a tulip, a lily is a blank.
01:14:31.340 | And he got it to actually do that,
01:14:32.860 | which was a little bit impressive.
01:14:34.060 | And I wrote back and I said, "That's impressive,
01:14:35.340 | "but can I ask you a few questions?"
01:14:37.780 | I said, "Was that just one example?
01:14:40.020 | "Can it do it generally?
01:14:41.580 | "And can it do it with novel words?"
01:14:43.260 | Which was part of what I was talking about in 1998
01:14:45.340 | when I first raised the example.
01:14:46.780 | So a dax is a dax, right?
01:14:49.420 | And he sheepishly wrote back about 20 minutes later.
01:14:53.060 | And the answer was, "Well, it had some problems with those."
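A version of this probe can be rerun by anyone. The sketch below assumes the Hugging Face transformers library and the publicly released GPT-2 weights, which is not necessarily the exact setup from the exchange described here; "dax", "blicket", and "wug" are the usual nonsense words for testing generalization to novel items.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "A rose is a rose, a tulip is a tulip, a lily is a",      # familiar words
    "A dax is a dax, a blicket is a blicket, a wug is a",     # novel words
]
for prompt in prompts:
    # Completions vary with model and library version; the point is to
    # compare how the pattern holds up on familiar versus novel words.
    out = generator(prompt, max_new_tokens=3, do_sample=False)
    print(out[0]["generated_text"])
```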
01:14:55.380 | So I made some predictions 21 years ago that still hold.
01:15:00.380 | In the world of computer science, that's amazing, right?
01:15:02.700 | Because there's a thousand or a million times more memory
01:15:06.540 | and computers do
01:15:10.060 | a million times more operations per second,
01:15:13.180 | spread across a cluster.
01:15:15.340 | And there's been advances in replacing sigmoids
01:15:19.300 | with other functions and so forth.
01:15:23.420 | There's all kinds of advances,
01:15:25.420 | but the fundamental architecture hasn't changed
01:15:27.140 | and the fundamental limit hasn't changed.
01:15:28.620 | And what I said then is kind of still true.
01:15:30.900 | And then here's a second example.
01:15:32.220 | I recently had a piece in "Wired"
01:15:34.020 | that's adapted from the book.
01:15:35.260 | And the book went to press before GPT-2 came out,
01:15:40.140 | but we described this children's story
01:15:42.300 | and all the inferences that you make in this story
01:15:45.580 | about a boy finding a lost wallet.
01:15:48.260 | And for fun, in the "Wired" piece, we ran it through GPT-2.
01:15:52.100 | Something called talktotransformer.com,
01:15:55.460 | and your viewers can try this experiment themselves.
01:15:58.180 | Go to the "Wired" piece.
01:15:59.100 | It has the link and it has the story.
01:16:01.100 | And the system made perfectly fluent text
01:16:04.300 | that was totally inconsistent
01:16:06.420 | with the conceptual underpinnings of the story.
01:16:09.260 | This is what, again, I predicted in 1998.
01:16:13.260 | And for that matter, Chomsky and Miller
01:16:14.700 | made the same prediction in 1963.
01:16:16.660 | I was just updating their claim for a slightly new text.
01:16:19.420 | So those particular architectures
01:16:22.580 | that don't have any built-in knowledge,
01:16:24.820 | they're basically just a bunch of layers
01:16:27.020 | doing correlational stuff,
01:16:29.020 | they're not gonna solve these problems.
01:16:31.300 | - So 20 years ago, you said the emperor has no clothes.
01:16:34.560 | Today, the emperor still has no clothes.
01:16:36.940 | - The lighting's better, though.
01:16:38.060 | - The lighting is better.
01:16:39.100 | And I think you yourself are also, I mean--
01:16:42.340 | - And we found out some things to do with naked emperors.
01:16:44.420 | I mean, it's not that stuff is worthless.
01:16:47.020 | They're not really naked.
01:16:48.340 | It's more like they're in their briefs
01:16:49.660 | and everybody thinks that.
01:16:50.860 | And so, I mean, they are great at speech recognition,
01:16:54.420 | but the problems that I said were hard,
01:16:56.420 | 'cause I didn't literally say the emperor has no clothes.
01:16:58.280 | I said, this is a set of problems
01:17:00.180 | that humans are really good at.
01:17:01.700 | And it wasn't couched as AI.
01:17:03.180 | It was couched as cognitive science.
01:17:04.340 | But I said, if you wanna build a neural model
01:17:07.740 | of how humans do certain class of things,
01:17:10.380 | you're gonna have to change the architecture.
01:17:11.980 | And I stand by those claims.
01:17:13.660 | - So, and I think people should understand,
01:17:16.780 | you're quite entertaining in your cynicism,
01:17:19.100 | but you're also very optimistic and a dreamer
01:17:22.260 | about the future of AI, too.
01:17:23.940 | So you're both, it's just--
01:17:25.380 | - There's a famous saying about being,
01:17:27.860 | people overselling technology in the short run
01:17:30.720 | and underselling it in the long run.
01:17:34.140 | And so, I actually end the book,
01:17:37.200 | or Ernie Davis and I end our book
01:17:39.260 | with an optimistic chapter, which kind of killed Ernie,
01:17:41.740 | 'cause he's even more pessimistic than I am.
01:17:44.420 | He describes me as a contrarian and him as a pessimist.
01:17:47.620 | But I persuaded him that we should end the book
01:17:49.860 | with a look at what would happen
01:17:52.660 | if AI really did incorporate, for example,
01:17:55.380 | the common sense reasoning and the nativism
01:17:57.340 | and so forth, the things that we counseled for.
01:17:59.660 | And we wrote it, and it's an optimistic chapter
01:18:02.140 | that AI suitably reconstructed so that we could trust it,
01:18:05.900 | which we can't now, could really be world-changing.
01:18:09.500 | - So, on that point, if you look at the future trajectories
01:18:13.100 | of AI, people have worries about negative effects of AI,
01:18:17.180 | whether it's at the large existential scale
01:18:21.060 | or smaller, short-term scale of negative impact on society.
01:18:25.240 | So you write about trustworthy AI.
01:18:27.140 | How can we build AI systems that align with our values,
01:18:31.500 | that make for a better world,
01:18:32.820 | that we can interact with, that we can trust?
01:18:34.740 | - The first thing we have to do is to replace
01:18:36.340 | deep learning with deep understanding.
01:18:38.260 | So you can't have alignment with a system
01:18:42.460 | that traffics only in correlations
01:18:44.620 | and doesn't understand concepts like bottles or harm.
01:18:47.880 | So Asimov talked about these famous laws,
01:18:51.340 | and the first one was first do no harm.
01:18:54.060 | And you can quibble about the details of Asimov's laws,
01:18:56.860 | but we have to, if we're gonna build real robots
01:18:58.780 | in the real world, have something like that.
01:19:00.540 | That means we have to program in a notion
01:19:02.500 | that's at least something like harm.
01:19:04.260 | That means we have to have these more abstract ideas
01:19:06.620 | that deep learning's not particularly good at.
01:19:08.460 | They have to be in the mix somewhere.
01:19:10.620 | I mean, you could do statistical analysis
01:19:12.380 | about probabilities of given harms or whatever,
01:19:14.380 | but you have to know what a harm is,
01:19:15.820 | in the same way that you have to understand
01:19:17.420 | that a bottle isn't just a collection of pixels.
01:19:20.660 | - And also be able to, you're implying that you need
01:19:24.460 | to also be able to communicate that to humans.
01:19:26.900 | So the AI systems would be able to prove to humans
01:19:31.660 | that they understand that they know what harm means.
01:19:35.500 | - I might run it in the reverse direction,
01:19:37.420 | but roughly speaking, I agree with you.
01:19:38.660 | So we probably need to have committees of wise people,
01:19:43.420 | ethicists and so forth,
01:19:45.700 | think about what these rules ought to be.
01:19:47.540 | And we shouldn't just leave it to software engineers.
01:19:49.780 | It shouldn't just be software engineers,
01:19:51.660 | and it shouldn't just be, you know,
01:19:53.580 | people who own large mega corporations
01:19:56.580 | that are good at technology.
01:19:58.340 | Ethicists and so forth should be involved.
01:20:00.300 | But, you know, there should be some assembly of wise people,
01:20:04.700 | as I was putting it,
01:20:06.140 | that tries to figure out what the rules ought to be,
01:20:08.740 | and those have to get translated into code.
01:20:11.580 | You can argue whether it's code or neural networks or something.
01:20:15.500 | They have to be translated into something
01:20:18.700 | that machines can work with.
01:20:20.060 | And that means there has to be a way
01:20:21.980 | of working the translation.
01:20:23.420 | And right now we don't.
01:20:24.500 | We don't have a way.
01:20:25.380 | So let's say you and I were the committee
01:20:27.100 | and we decide that Asimov's first law is actually right.
01:20:29.860 | And let's say it's not just two white guys,
01:20:31.620 | which would be kind of unfortunate,
01:20:32.860 | and that we have a broad,
01:20:34.060 | representative sample of the world,
01:20:36.300 | or however we want to do this.
01:20:37.540 | And the committee decides eventually,
01:20:40.500 | okay, Asimov's first law is actually pretty good.
01:20:42.860 | There are these exceptions to it.
01:20:44.060 | We want to program in these exceptions.
01:20:46.100 | But let's start with just the first one,
01:20:47.500 | and then we'll get to the exceptions.
01:20:48.900 | First one is first do no harm.
01:20:50.660 | Well, somebody has to now actually turn that
01:20:53.300 | into a computer program or a neural network or something.
01:20:56.220 | And one way of taking the whole book,
01:20:58.780 | the whole argument that I'm making,
01:21:00.300 | is that we just don't know how to do that yet.
01:21:02.500 | And we're fooling ourselves
01:21:03.580 | if we think that we can build trustworthy AI.
01:21:05.860 | If we can't even specify it in any kind of code,
01:21:09.580 | if we can't do it in Python
01:21:10.660 | and we can't do it in TensorFlow,
01:21:13.180 | we're fooling ourselves in thinking
01:21:14.420 | that we can make trustworthy AI
01:21:15.860 | if we can't translate harm
01:21:18.180 | into something that we can execute.
01:21:19.980 | And if we can't, then we should be thinking really hard,
01:21:22.860 | how could we ever do such a thing?
01:21:24.660 | Because if we're gonna use AI
01:21:26.540 | in the ways that we want to use it,
01:21:27.980 | to make job interviews or to do surveillance,
01:21:31.100 | not that I personally want to do that, or whatever.
01:21:32.500 | I mean, if we're gonna use AI
01:21:33.800 | in ways that have practical impact on people's lives,
01:21:36.220 | or medicine, it's gotta be able
01:21:39.020 | to understand stuff like that.
01:21:41.220 | - So one of the things your book highlights
01:21:42.860 | is that a lot of people in the deep learning community,
01:21:47.420 | but also the general public, politicians,
01:21:50.260 | just people in all general groups and walks of life,
01:21:53.260 | have different levels of misunderstanding of AI.
01:21:57.380 | So when you talk about committees,
01:21:59.480 | what's your advice to our society?
01:22:05.620 | How do we grow, how do we learn about AI
01:22:08.180 | such that such committees could emerge,
01:22:10.840 | where large groups of people could have
01:22:13.540 | a productive discourse about
01:22:15.200 | how to build successful AI systems?
01:22:17.840 | - Part of the reason we wrote the book
01:22:19.680 | was to try to inform those committees.
01:22:22.080 | So part of the reason we wrote the book
01:22:23.560 | was to inspire a future generation of students
01:22:25.680 | to solve what we think are the important problems.
01:22:27.880 | So a lot of the book is trying to pinpoint
01:22:29.880 | what we think are the hard problems
01:22:31.240 | where we think effort would most be rewarded.
01:22:33.840 | And part of it is to try to train people
01:22:36.720 | who talk about AI, but aren't experts in the field
01:22:41.000 | to understand what's realistic and what's not.
01:22:43.520 | One of my favorite parts in the book
01:22:44.680 | is the six questions you should ask
01:22:46.960 | anytime you read a media account.
01:22:48.400 | So like number one is if somebody talks about something,
01:22:51.080 | look for the demo.
01:22:51.920 | If there's no demo, don't believe it.
01:22:54.140 | Like the demo that you can try.
01:22:55.320 | If you can't try it at home,
01:22:56.480 | maybe it doesn't really work that well yet.
01:22:58.400 | So, we don't have this example in the book,
01:23:00.640 | but if Sundar Pichai says,
01:23:02.200 | "We have this thing that allows it to sound
01:23:06.120 | "like human beings in conversation,"
01:23:08.440 | you should ask, "Can I try it?"
01:23:10.400 | And you should ask how general it is.
01:23:11.880 | And it turns out at that time,
01:23:13.080 | I'm alluding to Google Duplex, when it was announced,
01:23:15.440 | it only worked on calling hairdressers,
01:23:18.200 | restaurants, and finding opening hours.
01:23:20.000 | That's not very general.
01:23:20.840 | That's narrow AI.
01:23:22.240 | - And I'm not gonna ask your thoughts about Sophia.
01:23:24.560 | But yeah, I understand that's a really good question
01:23:27.720 | to ask of any kind of hyped up idea.
01:23:30.200 | - Sophia has very good material written for her,
01:23:32.560 | but she doesn't understand the things that she's saying.
01:23:35.360 | - So a while ago, you wrote a book
01:23:38.200 | on the science of learning, which I think is fascinating,
01:23:40.520 | with the case study of learning to play guitar.
01:23:43.520 | - That's right. - It's called Guitar Zero.
01:23:45.080 | I love guitar myself.
01:23:46.320 | I've been playing my whole life.
01:23:47.360 | So let me ask a very important question.
01:23:50.240 | What is your favorite song, rock song,
01:23:53.480 | to listen to or try to play?
01:23:56.280 | - Well, those would be different.
01:23:57.120 | But I'll say that my favorite rock song to listen to
01:23:59.640 | is probably All Along the Watchtower,
01:24:01.040 | the Jimi Hendrix version.
01:24:02.000 | - The Jimi Hendrix version.
01:24:03.000 | - It just feels magic to me.
01:24:04.880 | - I've actually recently learned it.
01:24:05.960 | I love that song.
01:24:07.040 | I've been trying to put it on YouTube myself, singing.
01:24:09.400 | Singing is the scary part.
01:24:11.280 | If you could party with a rock star for a weekend,
01:24:13.400 | living or dead, who would you choose?
01:24:15.240 | And pick their mind, it's not necessarily about the party.
01:24:21.160 | - Thanks for the clarification.
01:24:22.720 | I guess John Lennon's such an intriguing person.
01:24:27.160 | I mean, I think a troubled person, but an intriguing one.
01:24:31.280 | So beautiful.
01:24:32.480 | - Well, Imagine is one of my favorite songs, so.
01:24:35.480 | - Also one of my favorite songs.
01:24:37.120 | - That's a beautiful way to end it.
01:24:38.320 | Gary, thank you so much for talking today.
01:24:39.840 | - Thanks so much for having me.
01:24:41.400 | (upbeat music)