François Chollet: Measures of Intelligence | Lex Fridman Podcast #120


Chapters

0:00 Introduction
5:04 Early influence
6:23 Language
12:50 Thinking with mind maps
23:42 Definition of intelligence
42:24 GPT-3
53:07 Semantic web
57:22 Autonomous driving
69:30 Tests of intelligence
73:59 Tests of human intelligence
87:18 IQ tests
95:59 ARC Challenge
119:11 Generalization
129:50 Turing Test
140:44 Hutter prize
147:44 Meaning of life

Whisper Transcript

00:00:00.000 | The following is a conversation with Francois Chollet,
00:00:03.240 | his second time on the podcast.
00:00:05.320 | He's both a world-class engineer and a philosopher
00:00:09.600 | in the realm of deep learning and artificial intelligence.
00:00:13.200 | This time, we talk a lot about his paper titled
00:00:16.200 | On the Measure of Intelligence
00:00:18.280 | that discusses how we might define
00:00:20.320 | and measure general intelligence in our computing machinery.
00:00:24.640 | Quick summary of the sponsors,
00:00:26.440 | Babbel, Masterclass, and Cash App.
00:00:29.460 | Click the sponsor links in the description
00:00:31.240 | to get a discount and to support this podcast.
00:00:34.500 | As a side note, let me say that the serious,
00:00:36.880 | rigorous scientific study
00:00:38.720 | of artificial general intelligence is a rare thing.
00:00:42.220 | The mainstream machine learning community
00:00:43.760 | works on very narrow AI with very narrow benchmarks.
00:00:47.760 | This is very good for incremental
00:00:49.920 | and sometimes big incremental progress.
00:00:53.200 | On the other hand, the outside the mainstream,
00:00:56.060 | renegade, you could say, AGI community
00:00:59.640 | works on approaches that verge on the philosophical
00:01:03.000 | and even the literary without big public benchmarks.
00:01:07.320 | Walking the line between the two worlds is a rare breed,
00:01:10.660 | but it doesn't have to be.
00:01:12.360 | I ran the AGI series at MIT as an attempt
00:01:15.320 | to inspire more people to walk this line.
00:01:17.700 | Deep mind and open AI for a time
00:01:20.020 | and still on occasion walk this line.
00:01:23.180 | Francois Chollet does as well.
00:01:25.880 | I hope to also.
00:01:27.600 | It's a beautiful dream to work towards
00:01:29.860 | and to make real one day.
00:01:32.460 | If you enjoy this thing, subscribe on YouTube,
00:01:34.560 | review it with Five Stars on Apple Podcast,
00:01:36.720 | follow on Spotify, support on Patreon,
00:01:39.000 | or connect with me on Twitter @lexfridman.
00:01:42.000 | As usual, I'll do a few minutes of ads now
00:01:44.200 | and no ads in the middle.
00:01:45.760 | I try to make these interesting,
00:01:47.400 | but I give you timestamps so you can skip.
00:01:50.580 | But still, please do check out the sponsors
00:01:52.620 | by clicking the links in the description.
00:01:54.540 | It's the best way to support this podcast.
00:01:57.880 | This show is sponsored by Babbel,
00:02:00.060 | an app and website that gets you speaking
00:02:02.420 | in a new language within weeks.
00:02:04.340 | Go to babbel.com and use code Lex to get three months free.
00:02:08.160 | They offer 14 languages, including Spanish,
00:02:10.900 | French, Italian, German, and yes, Russian.
00:02:15.180 | Daily lessons are 10 to 15 minutes, super easy,
00:02:18.320 | effective, designed by over 100 language experts.
00:02:22.200 | Let me read a few lines from the Russian poem
00:02:24.660 | (speaking in foreign language)
00:02:27.200 | by Alexander Blok that you'll start to understand
00:02:30.520 | if you sign up to Babbel.
00:02:32.560 | (speaking in foreign language)
00:02:36.480 | Now, I say that you'll start to understand this poem
00:02:48.480 | because Russian starts with a language
00:02:51.360 | and ends with vodka.
00:02:53.980 | Now, the latter part is definitely not endorsed
00:02:56.560 | or provided by Babbel.
00:02:58.000 | It will probably lose me this sponsorship,
00:03:00.320 | although it hasn't yet.
00:03:02.440 | But once you graduate with Babbel,
00:03:04.420 | you can enroll in my advanced course
00:03:06.080 | of late night Russian conversation over vodka.
00:03:09.160 | No app for that yet.
00:03:11.240 | So get started by visiting babbel.com
00:03:13.720 | and use code Lex to get three months free.
00:03:17.040 | This show is also sponsored by Masterclass.
00:03:20.960 | Sign up at masterclass.com/lex to get a discount
00:03:24.520 | and to support this podcast.
00:03:26.580 | When I first heard about Masterclass,
00:03:28.080 | I thought it was too good to be true.
00:03:30.020 | I still think it's too good to be true.
00:03:32.360 | For $180 a year, you get an all-access pass
00:03:35.440 | to watch courses from, to list some of my favorites,
00:03:38.760 | Chris Hadfield on space exploration,
00:03:41.360 | hope to have him in this podcast one day,
00:03:43.520 | Neil deGrasse Tyson on scientific thinking
00:03:45.560 | and communication, Neil too,
00:03:47.920 | Will Wright, creator of SimCity and Sims
00:03:50.120 | on game design, Carlos Santana on guitar,
00:03:52.720 | Garry Kasparov on chess, Daniel Negreanu on poker
00:03:55.960 | and many more.
00:03:57.200 | Chris Hadfield explaining how rockets work
00:03:59.680 | and the experience of being launched into space alone
00:04:02.120 | is worth the money.
00:04:03.260 | By the way, you can watch it on basically any device.
00:04:06.520 | Once again, sign up at masterclass.com/lex
00:04:09.320 | to get a discount and to support this podcast.
00:04:12.440 | This show finally is presented by Cash App,
00:04:16.440 | the number one finance app in the App Store.
00:04:18.680 | When you get it, use code LEXPODCAST.
00:04:21.200 | Cash App lets you send money to friends,
00:04:23.240 | buy Bitcoin and invest in the stock market
00:04:25.440 | with as little as $1.
00:04:27.200 | Since Cash App allows you to send
00:04:28.940 | and receive money digitally,
00:04:30.520 | let me mention a surprising fact related to physical money.
00:04:33.840 | Of all the currency in the world,
00:04:35.660 | roughly 8% of it is actually physical money.
00:04:39.260 | The other 92% of the money only exists digitally
00:04:42.800 | and that's only going to increase.
00:04:45.240 | So again, if you get Cash App
00:04:46.720 | from the App Store or Google Play
00:04:48.280 | and use code LEXPODCAST, you get 10 bucks
00:04:51.680 | and Cash App will also donate $10 to FIRST,
00:04:54.360 | an organization that is helping to advance robotics
00:04:56.960 | and STEM education for young people around the world.
00:05:00.480 | And now here's my conversation with Francois Chollet.
00:05:03.860 | What philosophers, thinkers or ideas
00:05:07.320 | had a big impact on you growing up and today?
00:05:10.640 | - So one author that had a big impact on me
00:05:14.800 | when I read his books as a teenager was Jean Piaget,
00:05:18.800 | who is a Swiss psychologist,
00:05:21.320 | is considered to be the father of developmental psychology.
00:05:25.520 | And he has a large body of work
00:05:27.000 | about basically how intelligence develops in children.
00:05:32.000 | And so it's very old work,
00:05:35.480 | like most of it is from the 1930s, 1940s.
00:05:39.080 | So it's not quite up to date.
00:05:40.840 | It's actually superseded by many newer developments
00:05:43.800 | in developmental psychology.
00:05:45.640 | But to me, it was very, very interesting,
00:05:48.800 | very striking and actually shaped the early ways
00:05:51.360 | in which I started to think about the mind
00:05:53.800 | and the development of intelligence as a teenager.
00:05:56.200 | - His actual ideas or the way he thought about it
00:05:58.480 | or just the fact that you could think
00:05:59.840 | about the developing mind at all?
00:06:01.600 | - I guess both.
00:06:02.520 | Jean Piaget is the author that reintroduced me
00:06:04.920 | to the notion that intelligence and the mind
00:06:07.960 | is something that you construct throughout your life
00:06:11.120 | and that children construct it in stages.
00:06:15.760 | And I thought that was a very interesting idea,
00:06:17.440 | which is, of course, very relevant to AI,
00:06:20.480 | to building artificial minds.
00:06:22.000 | Another book that I read around the same time
00:06:25.840 | that had a big impact on me,
00:06:27.280 | and there was actually a little bit of overlap
00:06:32.080 | with Jean Piaget as well,
00:06:32.960 | and I read it around the same time,
00:06:35.320 | is Jeff Hawkins' "On Intelligence," which is a classic.
00:06:39.840 | And he has this vision of the mind
00:06:42.520 | as a multi-scale hierarchy of temporal prediction modules.
00:06:47.520 | And these ideas really resonated with me,
00:06:50.000 | like the notion of a modular hierarchy
00:06:53.920 | of potentially of compression functions
00:07:00.120 | or prediction functions.
00:07:01.680 | I thought it was really, really interesting.
00:07:03.960 | And it really shaped the way I started thinking
00:07:07.080 | about how to build minds.
00:07:09.760 | - The hierarchical nature, which aspect.
00:07:13.720 | Also, he's a neuroscientist, so he was thinking actual,
00:07:17.520 | he was basically talking about how our mind works.
00:07:20.560 | - Yeah, the notion that cognition is prediction
00:07:23.240 | was an idea that was kind of new to me at the time
00:07:25.440 | and that I really loved at the time.
00:07:27.840 | And yeah, and the notion that there are multiple scales
00:07:31.880 | of processing in the brain.
00:07:34.000 | - The hierarchy. - Yes.
00:07:36.720 | - This was before deep learning.
00:07:38.600 | These ideas of hierarchies in AI
00:07:41.160 | have been around for a long time,
00:07:43.160 | even before "On Intelligence,"
00:07:45.040 | they've been around since the 1980s.
00:07:47.080 | And yeah, that was before deep learning,
00:07:50.480 | but of course, I think these ideas really found
00:07:52.800 | their practical implementation in deep learning.
00:07:58.080 | - What about the memory side of things?
00:07:59.720 | I think he was talking about knowledge representation.
00:08:02.840 | Do you think about memory a lot?
00:08:04.440 | One way you can think of neural networks
00:08:06.320 | as a kind of memory, you're memorizing things,
00:08:10.760 | but it doesn't seem to be the kind of memory
00:08:14.240 | that's in our brains, or it doesn't have
00:08:17.400 | the same rich complexity, long-term nature
00:08:19.680 | that's in our brains. - Yes.
00:08:21.560 | The brain is more of a sparse access memory
00:08:23.960 | so that you can actually retrieve very precisely
00:08:27.720 | like bits of your experience.
00:08:30.120 | - The retrieval aspect, you can like introspect,
00:08:33.520 | you can ask yourself questions, I guess.
00:08:36.520 | You can program your own memory,
00:08:38.280 | and language is actually the tool you use to do that.
00:08:41.680 | I think language is a kind of operating system for the mind.
00:08:46.360 | And you use language, well, one of the uses of language
00:08:49.560 | is as a query that you run over your own memory.
00:08:53.840 | You use words as keys to retrieve specific experiences
00:08:57.960 | or specific concepts, specific thoughts.
00:09:00.120 | Like language is a way you store thoughts,
00:09:02.400 | not just in writing, in the physical world,
00:09:04.720 | but also in your own mind.
00:09:06.160 | And it's also how you retrieve them.
00:09:07.640 | Like imagine if you didn't have language,
00:09:10.040 | then you would have to, you would not really have
00:09:13.520 | a self-internally triggered way of retrieving past thoughts.
00:09:18.520 | You would have to rely on external experiences.
00:09:21.320 | For instance, you see a specific sight,
00:09:24.080 | you smell a specific smell, and that brings up memories,
00:09:26.840 | but you would not really have a way to deliberately
00:09:30.480 | access these memories without language.
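
A toy sketch of the "words as keys" idea above: a minimal associative store where a word retrieves related experiences. The store's contents and names are invented purely for illustration.

```python
# Toy associative memory: words act as keys that retrieve stored experiences.
memory = {
    "ocean": ["first swim at the beach", "the smell of salt air"],
    "chess": ["learning the Sicilian Defense", "losing a game to my father"],
}

def recall(word):
    """Use a word as a query key over the memory store."""
    return memory.get(word, [])

print(recall("ocean"))   # ['first swim at the beach', 'the smell of salt air']
print(recall("violin"))  # [] -- nothing stored under that key
```
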
00:09:32.760 | - Well, the interesting thing you mentioned
00:09:34.040 | is you can also program the memory.
00:09:37.480 | You can change it, probably with language.
00:09:40.040 | - Yeah, using language, yes.
00:09:41.560 | - Well, let me ask you a Chomsky question,
00:09:44.120 | which is like, first of all,
00:09:46.000 | do you think language is like fundamental,
00:09:49.120 | like there's turtles, what's at the bottom of the turtles?
00:09:54.120 | It can't be turtles all the way down.
00:09:57.320 | Is language at the bottom of cognition of everything?
00:10:00.280 | Is like language the fundamental,
00:10:03.800 | aspect of like what it means to be a thinking thing?
00:10:08.800 | - No, I don't think so.
00:10:12.080 | I think language is--
00:10:12.920 | - You disagree with Noam Chomsky?
00:10:14.600 | - Yes, I think language is a layer on top of cognition.
00:10:17.880 | So it is fundamental to cognition in the sense that,
00:10:21.760 | to use a computing metaphor, I see language
00:10:24.600 | as the operating system of the brain, of the human mind.
00:10:29.480 | And the operating system, you know,
00:10:31.040 | is a layer on top of the computer.
00:10:33.200 | The computer exists before the operating system,
00:10:36.160 | but the operating system is how you make it truly useful.
00:10:39.480 | - And the operating system is most likely Windows,
00:10:42.160 | not Linux, 'cause language is messy.
00:10:45.880 | - Yeah, it's messy and it's pretty difficult
00:10:49.440 | to inspect it, introspect it.
00:10:53.160 | - How do you think about language?
00:10:55.080 | Like we use actually sort of human interpretable language,
00:11:00.040 | but is there something like deeper,
00:11:03.120 | that's closer to like logical type of statements?
00:11:07.920 | Like, yeah, what is the nature of language, do you think?
00:11:13.880 | Like is there something deeper
00:11:17.200 | than like the syntactic rules we construct?
00:11:19.160 | Is there something that doesn't require utterances
00:11:22.880 | or writing or so on?
00:11:25.560 | - Are you asking about the possibility that could exist?
00:11:29.440 | Languages for thinking that are not made of words?
00:11:32.840 | - Yeah.
00:11:33.680 | - Yeah, I think so.
00:11:34.520 | I think, so the mind is layers, right?
00:11:38.560 | And language is almost like the outermost,
00:11:41.800 | the uppermost layer.
00:11:43.160 | But before we think in words,
00:11:46.760 | I think we think in terms of emotion in space
00:11:51.080 | and we think in terms of physical actions.
00:11:54.160 | And I think babies in particular
00:11:56.880 | probably express these thoughts in terms of the actions
00:12:01.400 | that they've seen or that they can perform.
00:12:03.720 | And in terms of motions of objects in their environment
00:12:08.080 | before they start thinking in terms of words.
00:12:10.880 | - It's amazing to think about that
00:12:13.920 | as the building blocks of language.
00:12:16.840 | So like the kind of actions and ways the babies
00:12:20.320 | see the world as like more fundamental
00:12:23.320 | than the beautiful Shakespearean language
00:12:26.280 | you construct on top of it.
00:12:27.640 | And we probably don't have any idea
00:12:30.560 | what that looks like, right?
00:12:32.000 | 'Cause it's important when trying to engineer it
00:12:35.960 | into AI systems.
00:12:37.520 | - I think visual analogies and motion
00:12:42.120 | is a fundamental building block of the mind.
00:12:45.440 | And you actually see it reflected in language.
00:12:48.600 | Like language is full of spatial metaphors.
00:12:51.880 | And when you think about things,
00:12:53.880 | I consider myself very much as a visual thinker.
00:12:57.400 | You often express these thoughts by using things
00:13:02.200 | like visualizing concepts in 2D space
00:13:07.200 | or like you solve problems by imagining yourself
00:13:12.200 | navigating a concept space.
00:13:15.080 | I don't know if you have this sort of experience.
00:13:17.920 | - You said visualizing concept space.
00:13:19.840 | So like, so I certainly think about
00:13:22.680 | I certainly visualize mathematical concepts,
00:13:27.680 | but you mean like in concept space?
00:13:31.440 | Visually you're embedding ideas
00:13:34.920 | into a three dimensional space
00:13:37.000 | you can explore with your mind essentially.
00:13:38.840 | - You mean more like 2D, but yeah.
00:13:40.360 | - 2D?
00:13:41.200 | You're a flatlander.
00:13:43.200 | You're, okay.
00:13:45.760 | No, I do not.
00:13:49.720 | I always have to, before I jump from concept to concept,
00:13:52.800 | I have to put it back down on paper.
00:13:57.120 | It has to be on paper.
00:13:58.080 | I can only travel on a 2D paper, not inside my mind.
00:14:03.080 | You're able to move inside your mind.
00:14:05.400 | - But even if you're writing like a paper, for instance,
00:14:07.960 | don't you have like a spatial representation of your paper?
00:14:11.040 | Like you visualize where ideas lie topologically
00:14:16.680 | in relationship to other ideas,
00:14:19.040 | kind of like a subway map of the ideas in your paper?
00:14:22.520 | - Yeah, that's true.
00:14:23.440 | I mean, there is, in papers, I don't know about you,
00:14:27.960 | but it feels like there's a destination.
00:14:30.600 | There's a key idea that you wanna arrive at
00:14:36.280 | and a lot of it is in the fog
00:14:39.360 | and you're trying to kind of, it's almost like,
00:14:45.440 | what's that called when you do a path planning search
00:14:49.040 | from both directions, from the start and from the end?
00:14:51.740 | And then you find, you do like shortest path,
00:14:54.800 | but in game playing, you do this with like A star
00:14:59.520 | from both sides.
00:15:01.200 | - And you see where they join.
00:15:03.480 | - Yeah, so you kind of do, at least for me,
00:15:05.760 | I think like, first of all, just exploring from the start,
00:15:08.600 | from like first principles, what do I know?
00:15:12.360 | What can I start proving from that?
00:15:15.680 | And then from the destination, if you start backtracking,
00:15:20.680 | if I want to show some kind of sets of ideas,
00:15:25.480 | what would it take to show them?
00:15:26.880 | And you kind of backtrack.
00:15:28.360 | But yeah, I don't think I'm doing all that in my mind,
00:15:31.080 | though, like I'm putting it down on paper.
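
For concreteness, here is a minimal sketch of the search-from-both-ends idea described above, using plain breadth-first search over an unweighted graph rather than full A*; the graph and names are illustrative.

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Search from the start and the goal at once; stop when the frontiers meet.
    `graph` maps each node to a list of neighboring nodes."""
    if start == goal:
        return [start]
    parents_s, parents_g = {start: None}, {goal: None}
    queue_s, queue_g = deque([start]), deque([goal])

    def expand(queue, parents, other_parents):
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
                if nxt in other_parents:   # the two frontiers have met
                    return nxt
        return None

    while queue_s and queue_g:
        meet = expand(queue_s, parents_s, parents_g) or \
               expand(queue_g, parents_g, parents_s)
        if meet:
            # Stitch the forward half-path and the backward half-path together.
            path, n = [], meet
            while n is not None:
                path.append(n)
                n = parents_s[n]
            path.reverse()
            n = parents_g[meet]
            while n is not None:
                path.append(n)
                n = parents_g[n]
            return path
    return None  # no path between start and goal

graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(bidirectional_search(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
```
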
00:15:33.240 | - Do you use mind maps to organize your ideas?
00:15:35.560 | Yeah, I like mind maps.
00:15:37.040 | I'm that kind of person.
00:15:37.880 | - Let's get into this 'cause it's,
00:15:39.880 | I've been so jealous of people, I haven't really tried it.
00:15:42.160 | I've been jealous of people that seem to like,
00:15:45.560 | they get like this fire of passion in their eyes
00:15:48.160 | 'cause everything starts making sense.
00:15:50.080 | It's like Tom Cruise in the movie
00:15:52.000 | was like moving stuff around.
00:15:53.800 | Some of the most brilliant people I know use mind maps.
00:15:55.920 | I haven't tried really.
00:15:57.680 | Can you explain what the hell a mind map is?
00:16:01.280 | - I guess a mind map is a way to make
00:16:03.760 | kind of like the mess inside your mind
00:16:05.960 | to just put it on paper
00:16:08.200 | so that you gain more control over it.
00:16:10.040 | It's a way to organize things on paper.
00:16:13.040 | And as kind of like a consequence
00:16:16.440 | of organizing things on paper,
00:16:18.000 | it starts being more organized inside your own mind.
00:16:20.320 | - So what does that look like?
00:16:21.560 | You put, like, do you have an example?
00:16:24.000 | Like, what's the first thing you write on paper?
00:16:27.400 | What's the second thing you write?
00:16:29.000 | - I mean, typically you draw a mind map
00:16:31.680 | to organize the way you think about a topic.
00:16:34.880 | So you would start by writing down
00:16:37.320 | like the key concept about that topic.
00:16:39.560 | Like you would write intelligence or something.
00:16:42.160 | And then you would start adding associative connections.
00:16:45.600 | Like, what do you think about
00:16:46.800 | when you think about intelligence?
00:16:48.040 | What do you think are the key elements of intelligence?
00:16:50.440 | So maybe you would have language, for instance,
00:16:52.320 | and you'd have motion.
00:16:53.400 | And so you would start drawing notes with these things.
00:16:55.440 | And then you would see,
00:16:56.440 | what do you think about when you think about motion?
00:16:58.480 | And so on, and you would go like that, like a tree.
00:17:00.600 | - It's a tree or a tree mostly,
00:17:03.760 | or is it a graph too, like a tree?
00:17:05.680 | - Oh, it's more of a graph than a tree.
00:17:08.000 | And it's not limited to just writing down words.
00:17:13.000 | You can also draw things.
00:17:15.960 | And it's not supposed to be purely hierarchical, right?
00:17:19.640 | Like you can, the point is that you can start,
00:17:23.040 | once you start writing it down,
00:17:24.560 | you can start reorganizing it so that it makes more sense,
00:17:27.560 | so that it's connected in a more effective way.
00:17:29.960 | - See, but I'm so OCD that you just mentioned intelligence
00:17:35.120 | and then language and motion.
00:17:37.080 | I would start becoming paranoid
00:17:39.120 | that the categorization is imperfect.
00:17:42.040 | Like that I would become paralyzed with the mind map
00:17:47.040 | that like this may not be.
00:17:49.680 | So like the, even though you're just doing
00:17:52.680 | associative kind of connections,
00:17:55.400 | there's an implied hierarchy that's emerging.
00:17:58.520 | And I would start becoming paranoid
00:17:59.960 | that it's not the proper hierarchy.
00:18:02.380 | So you're not just, one way to see mind maps
00:18:05.000 | is you're putting thoughts on paper.
00:18:07.080 | It's like a stream of consciousness,
00:18:10.620 | but then you can also start getting paranoid.
00:18:12.240 | Well, is this the right hierarchy?
00:18:15.200 | - Sure, but it's your mind map.
00:18:17.840 | You're free to draw anything you want.
00:18:19.440 | You're free to draw any connection you want.
00:18:20.880 | And you can just make a different mind map
00:18:23.480 | if you think the central node is not the right node.
00:18:26.320 | - Yeah, I suppose there's a fear of being wrong.
00:18:29.760 | - If you want to organize your ideas
00:18:32.720 | by writing down what you think,
00:18:35.600 | which I think is very effective.
00:18:37.400 | Like how do you know what you think about something
00:18:40.200 | if you don't write it down, right?
00:18:42.980 | If you do that, the thing is that it imposes
00:18:46.240 | much more syntactic structure over your ideas,
00:18:50.000 | which is not required with a mind map.
00:18:51.560 | So a mind map is kind of like a lower level,
00:18:54.200 | more freehand way of organizing your thoughts.
00:18:57.920 | And once you've drawn it,
00:18:59.640 | then you can start actually voicing your thoughts
00:19:03.680 | in terms of paragraphs.
00:19:05.400 | - It's a two dimensional aspect of layout too, right?
00:19:08.180 | And it's a kind of flower, I guess, you start.
00:19:12.880 | There's usually, you want to start with a central concept?
00:19:15.860 | - Yes. - And then you move on.
00:19:16.960 | - Typically it ends up more like a subway map.
00:19:19.180 | So it ends up more like a graph, a topological graph.
00:19:22.120 | - Without a root node.
00:19:23.520 | - Yeah, so like in a subway map,
00:19:25.040 | there are some nodes that are more connected than others.
00:19:27.320 | And there are some nodes that are more important than others.
00:19:30.960 | So there are destinations,
00:19:32.440 | but it's not gonna be purely like a tree, for instance.
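
A small illustration of the distinction being drawn here: a tree is strictly hierarchical with one parent per node, while a subway-map-style mind map is a general graph that allows cross-connections. The concepts below are made up.

```python
# Tree: strictly hierarchical, each concept hangs off a single parent.
tree = {
    "intelligence": ["language", "motion"],
    "language": ["words"],
    "motion": ["navigation"],
}

# Graph (subway-map style): cross-links between branches are allowed.
graph = {
    "intelligence": {"language", "motion"},
    "language": {"intelligence", "words"},
    "motion": {"intelligence", "navigation"},
    "words": {"language", "navigation"},      # cross-connection
    "navigation": {"motion", "words"},        # cross-connection
}

# "Some nodes are more connected than others": count connections per concept.
for node, neighbors in graph.items():
    print(node, len(neighbors))
```
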
00:19:36.440 | - Yeah, it's fascinating to think
00:19:38.600 | if there's something to that about the way our mind thinks.
00:19:42.440 | By the way, I just kind of remembered obvious thing
00:19:45.840 | that I have probably thousands of documents
00:19:49.060 | in Google Doc at this point that are bullet point lists,
00:19:54.360 | which is, you can probably map a mind map
00:19:57.880 | to a bullet point list.
00:19:59.680 | It's the same, it's a, no, it's not, it's a tree.
00:20:05.120 | It's a tree, yeah.
00:20:06.280 | So I create trees,
00:20:07.920 | but also they don't have the visual element.
00:20:10.800 | Like, I guess I'm comfortable with the structure.
00:20:13.480 | It feels like, the narrowness,
00:20:15.760 | the constraints feel more comforting.
00:20:18.320 | - If you have thousands of documents
00:20:20.320 | with your own thoughts in Google Docs,
00:20:23.120 | why don't you write some kind of search engine,
00:20:26.600 | like maybe a mind map, a piece of software,
00:20:30.880 | a mind mapping software where you write down a concept
00:20:33.960 | and then it gives you sentences or paragraphs
00:20:37.480 | from your thousands of Google Docs documents
00:20:39.720 | that match this concept.
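
A rough sketch of the kind of tool suggested here, reduced to naive keyword matching over a local collection of notes; the file names and contents are placeholders, and a real version would need the semantic matching that the next exchange identifies as the hard part.

```python
def search_notes(documents, concept):
    """Return (document, paragraph) pairs that literally mention the concept.
    `documents` maps a document name to its full text."""
    hits = []
    for name, text in documents.items():
        for paragraph in text.split("\n\n"):
            if concept.lower() in paragraph.lower():
                hits.append((name, paragraph.strip()))
    return hits

notes = {
    "ideas_2020.txt": "Thoughts on motion planning.\n\nIntelligence as adaptation.",
    "reading_notes.txt": "Piaget on how intelligence develops in stages.",
}
print(search_notes(notes, "intelligence"))
```
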
00:20:41.240 | - The problem is it's so deeply, unlike mind maps,
00:20:45.320 | it's so deeply rooted in natural language.
00:20:48.460 | So it's not semantically searchable, I would say,
00:20:53.460 | 'cause the categories are very,
00:20:57.200 | you kind of mentioned intelligence, language, and motion.
00:21:00.700 | They're very strongly semantic.
00:21:02.540 | Like, it feels like the mind map forces you
00:21:05.020 | to be semantically clear and specific.
00:21:09.780 | The bullet points list I have
00:21:11.180 | are sparse, disparate thoughts
00:21:16.500 | that poetically represent a category,
00:21:21.420 | like motion, as opposed to saying motion.
00:21:24.200 | So unfortunately, that's the same problem with the internet.
00:21:28.980 | That's why the idea of semantic web is difficult to get.
00:21:31.920 | Most language on the internet is a giant mess
00:21:37.980 | of natural language that's hard to interpret.
00:21:40.220 | So do you think there's something to mind maps as,
00:21:46.160 | you actually originally brought it up
00:21:48.060 | as we were talking about kind of cognition and language.
00:21:53.060 | Do you think there's something to mind maps
00:21:55.300 | about how our brain actually deals,
00:21:58.100 | like, think reasons about things?
00:22:00.300 | - It's possible.
00:22:02.580 | I think it's reasonable to assume
00:22:04.940 | that there is some level of topological processing
00:22:09.940 | in the brain, that the brain is very associative in nature.
00:22:15.140 | And I also believe that a topological space
00:22:20.140 | is a better medium to encode thoughts
00:22:25.440 | than a geometric space.
00:22:27.520 | So I think--
00:22:28.360 | - What's the difference in a topological
00:22:29.720 | and a geometric space?
00:22:31.040 | - Well, if you're talking about topologies,
00:22:34.120 | then points are either connected or not.
00:22:36.220 | So a topology is more like a subway map.
00:22:38.640 | And geometry is when you're interested
00:22:41.680 | in the distance between things.
00:22:43.920 | And in subway maps, you don't really have
00:22:45.200 | the concept of distance.
00:22:46.340 | You only have the concept of whether there is a train
00:22:48.420 | going from station A to station B.
00:22:51.480 | And what we do in deep learning
00:22:54.520 | is that we're actually dealing with geometric spaces.
00:22:57.740 | We are dealing with concept vectors, word vectors,
00:23:01.540 | that have a distance between them
00:23:03.220 | which is expressed in terms of a dot product.
00:23:05.420 | We are not really building topological models usually.
00:23:10.780 | - I think you're absolutely right.
00:23:11.820 | Like distance is a fundamental importance in deep learning.
00:23:16.580 | I mean, it's the continuous aspect of it.
00:23:19.380 | - Yes, because everything is a vector
00:23:21.220 | and everything has to be a vector
00:23:22.500 | because everything has to be differentiable.
00:23:24.500 | If your space is discrete, it's no longer differentiable.
00:23:26.900 | You cannot do deep learning in it anymore.
00:23:29.700 | Well, you could, but you could only do it
00:23:31.420 | by embedding it in a bigger continuous space.
00:23:35.660 | So if you do topology in the context of deep learning,
00:23:39.380 | you have to do it by embedding your topology
00:23:41.140 | in a geometry.
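
To make the contrast concrete: deep learning compares concept vectors with a dot product, a geometric notion of similarity, while a topological view only records whether two nodes are connected at all. The vectors and the subway adjacency below are made up for illustration.

```python
import numpy as np

# Geometric view: concepts are vectors, similarity is a dot product.
word_vectors = {
    "train":   np.array([0.9, 0.1, 0.3]),
    "station": np.array([0.8, 0.2, 0.4]),
    "poetry":  np.array([0.1, 0.9, 0.2]),
}

def similarity(a, b):
    return float(np.dot(word_vectors[a], word_vectors[b]))

print(similarity("train", "station"))  # 0.86 -- relatively close
print(similarity("train", "poetry"))   # 0.24 -- relatively far

# Topological view: only connectivity matters; there is no notion of distance.
subway = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}

def connected(x, y):
    return y in subway.get(x, set())

print(connected("A", "B"))  # True
print(connected("A", "C"))  # False -- no direct link, though a path exists
```
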
00:23:41.980 | - Well, let me zoom out for a second.
00:23:46.260 | Let's get into your paper on the measure of intelligence
00:23:49.100 | that you put out in 2019?
00:23:52.900 | - Yes.
00:23:53.740 | - Okay.
00:23:54.560 | - November.
00:23:55.400 | - November.
00:23:56.240 | Yeah, remember 2019?
00:23:59.460 | That was a different time.
00:24:01.180 | - Yeah, I remember.
00:24:02.860 | I still remember.
00:24:03.700 | (Lex laughing)
00:24:06.580 | - It feels like a different world.
00:24:09.660 | You could travel, you could actually go outside
00:24:12.660 | and see friends.
00:24:14.100 | - Yeah.
00:24:16.300 | Let me ask the most absurd question.
00:24:18.980 | I think there's some non-zero probability
00:24:21.780 | there'll be a textbook one day,
00:24:23.380 | like 200 years from now, on artificial intelligence,
00:24:27.780 | or it'll be called just intelligence
00:24:30.700 | 'cause humans will already be gone.
00:24:32.500 | It'll be your picture with a quote.
00:24:35.740 | One of the early biological systems
00:24:39.060 | would consider the nature of intelligence,
00:24:41.660 | and there'll be a definition
00:24:43.240 | of how they thought about intelligence,
00:24:45.260 | which is one of the things you do in your paper
00:24:46.940 | on measure of intelligence is to ask,
00:24:49.740 | well, what is intelligence
00:24:52.660 | and how to test for intelligence and so on.
00:24:55.580 | So is there a spiffy quote about what is intelligence?
00:25:00.580 | What is the definition of intelligence,
00:25:03.980 | according to Francois Chollet?
00:25:06.480 | - Yeah, so do you think the super intelligent AIs
00:25:10.480 | of the future will want to remember us
00:25:13.640 | the way we remember humans from the past?
00:25:15.800 | And do you think they won't be ashamed
00:25:19.160 | of having a biological origin?
00:25:21.040 | - No, I think it would be a niche topic.
00:25:24.400 | It won't be that interesting,
00:25:25.520 | but it'll be like the people that study
00:25:29.120 | in certain contexts, like historical civilization
00:25:32.840 | that no longer exist, the Aztecs,
00:25:34.840 | and so on, that's how it'll be seen.
00:25:38.280 | And it'll be studied also in the context of social media,
00:25:42.360 | there'll be hashtags about the atrocity committed
00:25:47.080 | to human beings when the robots finally got rid of them.
00:25:52.080 | Like it was a mistake, it'll be seen as a giant mistake,
00:25:57.080 | but ultimately in the name of progress,
00:26:00.120 | and it created a better world
00:26:01.560 | because humans were over-consuming the resources,
00:26:05.240 | and they were not very rational,
00:26:07.240 | and were destructive in the end in terms of productivity,
00:26:11.080 | and putting more love in the world.
00:26:13.800 | And so within that context, there'll be a chapter
00:26:16.080 | about these biological systems.
00:26:17.480 | - Seems like you have a very detailed vision of that.
00:26:20.400 | You should write a sci-fi novel about it.
00:26:22.640 | - I'm working on a sci-fi novel currently, yes.
00:26:26.480 | - Yeah, so-- - Self-published, yeah.
00:26:29.440 | - The definition of intelligence,
00:26:30.720 | so intelligence is the efficiency
00:26:34.680 | with which you acquire new skills,
00:26:38.920 | tasks that you did not previously know about,
00:26:41.960 | that you did not prepare for, right?
00:26:44.680 | So it is not, intelligence is not skill itself,
00:26:47.760 | it's not what you know, it's not what you can do,
00:26:50.720 | it's how well and how efficiently you can learn new things.
00:26:54.600 | - New things. - Yes.
00:26:56.160 | - The idea of newness there
00:26:58.120 | seems to be fundamentally important.
00:27:01.120 | - Yes, so you would see intelligence on display,
00:27:04.240 | for instance, whenever you see a human being
00:27:08.360 | or an AI creature adapt to a new environment
00:27:12.000 | that it has not seen before,
00:27:13.800 | that its creators did not anticipate.
00:27:16.560 | When you see adaptation, when you see improvisation,
00:27:19.280 | when you see generalization, that's intelligence.
00:27:22.440 | In reverse, if you have a system
00:27:24.400 | that when you put it in a slightly new environment,
00:27:27.040 | it cannot adapt, it cannot improvise,
00:27:29.960 | it cannot deviate from what it's hard-coded to do
00:27:33.320 | or what it has been trained to do,
00:27:37.600 | that is a system that is not intelligent.
00:27:41.000 | There's actually a quote from Einstein
00:27:43.520 | that captures this idea, which is,
00:27:46.720 | "The measure of intelligence is the ability to change."
00:27:50.760 | I like that quote, I think it captures
00:27:53.160 | at least part of this idea.
00:27:54.960 | - You know, there might be something interesting
00:27:56.480 | about the difference between your definition and Einstein's.
00:27:59.520 | I mean, he's just being Einstein and clever,
00:28:03.720 | but acquisition of new ability to deal with new things
00:28:09.760 | versus ability to just change,
00:28:16.080 | what's the difference between those two things?
00:28:19.960 | So just change in itself,
00:28:22.320 | do you think there's something to that?
00:28:24.360 | Just being able to change.
00:28:26.880 | - Yes, being able to adapt.
00:28:28.440 | So not change, but certainly a change in direction,
00:28:33.200 | being able to adapt yourself to your environment.
00:28:36.520 | - Whatever the environment is.
00:28:38.760 | - That's a big part of intelligence, yes.
00:28:40.920 | And intelligence is most precisely,
00:28:43.360 | how efficiently you're able to adapt,
00:28:45.800 | how efficiently you're able to
00:28:47.560 | basically master your environment,
00:28:49.120 | how efficiently you can acquire new skills.
00:28:52.560 | And I think there's a big distinction to be drawn
00:28:55.680 | between intelligence, which is a process,
00:28:59.560 | and the output of that process, which is skill.
00:29:03.040 | So for instance, if you have a very smart human programmer
00:29:08.960 | that considers the game of chess
00:29:10.720 | and that writes down a static program that can play chess,
00:29:15.720 | then the intelligence is the process
00:29:19.120 | of developing that program.
00:29:20.600 | But the program itself is just encoding
00:29:25.600 | the output artifacts of that process.
00:29:28.040 | The program itself is not intelligent.
00:29:30.000 | And the way you tell it's not intelligent
00:29:31.840 | is that if you put it in a different context,
00:29:34.000 | you ask it to play Go or something,
00:29:36.000 | it's not gonna be able to perform well
00:29:37.760 | without human involvement.
00:29:38.880 | Because the source of intelligence,
00:29:41.080 | the entity that is capable of that process
00:29:43.120 | is the human programmer.
00:29:44.360 | So we should be able to tell the difference
00:29:47.880 | between the process and its output.
00:29:50.080 | We should not confuse the output and the process.
00:29:53.200 | It's the same as, do not confuse a road building company
00:29:58.200 | and one specific road,
00:30:00.160 | because one specific road takes you from point A to point B,
00:30:03.400 | but a road building company can take you from,
00:30:06.120 | can make a path from anywhere to anywhere else.
00:30:08.920 | - Yeah, that's beautifully put.
00:30:10.080 | But it's also, to play devil's advocate a little bit,
00:30:14.920 | it's possible that there is something more fundamental
00:30:19.640 | than us humans.
00:30:21.200 | So you kind of said the programmer creates
00:30:24.640 | the difference between the acquiring of the skill
00:30:29.440 | and the skill itself.
00:30:31.320 | There could be something,
00:30:32.520 | like you could argue the universe is more intelligent.
00:30:36.380 | Like the deep, the base intelligence
00:30:40.320 | that we should be trying to measure
00:30:43.560 | is something that created humans.
00:30:45.360 | We should be measuring God,
00:30:48.520 | or the source of the universe,
00:30:51.480 | as opposed to, like there could be a deeper intelligence.
00:30:55.080 | - Sure.
00:30:55.920 | - There's always deeper intelligence, I guess.
00:30:57.120 | - You can argue that,
00:30:58.000 | but that does not take anything away
00:31:00.080 | from the fact that humans are intelligent.
00:31:01.840 | And you can tell that
00:31:03.240 | because they are capable of adaptation and generality.
00:31:07.400 | And you see that in particular in the fact that
00:31:10.040 | humans are capable of handling situations and tasks
00:31:16.720 | that are quite different from anything
00:31:19.720 | that any of our evolutionary ancestors has ever encountered.
00:31:24.480 | So we are capable of generalizing very much
00:31:27.080 | out of distribution,
00:31:28.040 | if you consider our evolutionary history
00:31:30.240 | as being in a way our training data.
00:31:32.240 | - Of course, evolutionary biologists would argue
00:31:35.080 | that we're not going too far out of the distribution.
00:31:37.680 | We're like mapping the skills we've learned previously,
00:31:41.400 | desperately trying to like jam them
00:31:43.560 | into like these new situations.
00:31:47.080 | - I mean, there's definitely a little bit of that,
00:31:49.480 | but it's pretty clear to me that we're able to,
00:31:52.200 | you know, most of the things we do any given day
00:31:56.600 | in our modern civilization
00:31:58.080 | are things that are very, very different
00:32:00.920 | from what our ancestors a million years ago
00:32:03.920 | would have been doing in a given day.
00:32:05.920 | And our environment is very different.
00:32:07.600 | So I agree that everything we do,
00:32:12.240 | we do it with cognitive building blocks
00:32:14.280 | that we acquired over the course of evolution, right?
00:32:17.880 | And that anchors our cognition to certain contexts,
00:32:22.200 | which is the human condition very much.
00:32:25.320 | But still our mind is capable
00:32:27.560 | of a pretty remarkable degree of generality,
00:32:30.520 | far beyond anything we can create
00:32:32.720 | in artificial systems today.
00:32:34.120 | Like the degree in which the mind can generalize
00:32:37.800 | from its evolutionary history,
00:32:40.480 | can generalize away from its evolutionary history
00:32:44.000 | is much greater than the degree
00:32:46.520 | to which a deep learning system today
00:32:48.880 | can generalize away from its training data.
00:32:51.080 | - And like the key point you're making,
00:32:52.440 | which I think is quite beautiful,
00:32:53.760 | is like we shouldn't measure,
00:32:57.040 | if we're talking about measurement,
00:32:58.680 | we shouldn't measure the skill.
00:33:00.360 | We should measure like the creation of the new skill,
00:33:04.320 | the ability to create that new skill.
00:33:06.160 | - Yes.
00:33:07.000 | - But it's tempting.
00:33:08.080 | It's weird because the skill
00:33:11.840 | is a little bit of a small window
00:33:13.640 | into the system.
00:33:16.400 | So whenever you have a lot of skills,
00:33:18.320 | it's tempting to measure the skills.
00:33:21.200 | - Yes.
00:33:22.040 | I mean, the skill is the only thing
00:33:23.800 | you can objectively measure.
00:33:26.920 | But yeah, so the thing to keep in mind
00:33:29.720 | is that when you see skill in a human,
00:33:33.440 | it gives you a strong signal
00:33:37.760 | that that human is intelligent
00:33:39.240 | because you know they weren't born with that skill,
00:33:42.120 | typically.
00:33:42.960 | Like you see a very strong chess player,
00:33:45.200 | maybe you're a very strong chess player yourself.
00:33:47.560 | - I think you're saying that 'cause I'm Russian
00:33:51.000 | and now you're prejudiced.
00:33:52.920 | You assume all Russians are good at chess.
00:33:54.760 | - I'm biased, exactly.
00:33:55.600 | - I'm biased.
00:33:56.920 | Well, you're definitely.
00:33:57.760 | - Cultural bias.
00:33:58.600 | So if you see a very strong chess player,
00:34:01.880 | you know they weren't born knowing how to play chess.
00:34:05.480 | So they had to acquire that skill
00:34:07.800 | with their limited resources,
00:34:09.240 | with their limited lifetime.
00:34:10.920 | And you know, they did that
00:34:13.360 | because they are generally intelligent.
00:34:15.400 | And so they may as well have acquired any other skill.
00:34:18.960 | You know they have this potential.
00:34:21.160 | And on the other hand,
00:34:22.640 | if you see a computer playing chess,
00:34:25.680 | you cannot make the same assumptions
00:34:27.840 | because you cannot just assume
00:34:29.360 | the computer is generally intelligent.
00:34:30.840 | The computer may be born knowing how to play chess
00:34:35.280 | in the sense that it may have been programmed
00:34:37.360 | by a human that has understood chess for the computer
00:34:40.880 | and that has just encoded the output
00:34:44.160 | of that understanding in a static program.
00:34:46.000 | And that program is not intelligent.
00:34:49.440 | - So let's zoom out just for a second
00:34:51.360 | and say like, what is the goal
00:34:54.600 | of the "On the Measure of Intelligence" paper?
00:34:57.440 | Like what do you hope to achieve with it?
00:34:59.000 | - So the goal of the paper
00:35:00.480 | is to clear up some longstanding misunderstandings
00:35:04.560 | about the way we've been conceptualizing intelligence
00:35:08.400 | in the AI community.
00:35:09.920 | And in the way we've been evaluating progress in AI.
00:35:14.920 | There's been a lot of progress recently in machine learning
00:35:19.040 | and people are extrapolating from that progress
00:35:22.120 | that we are about to solve general intelligence.
00:35:26.360 | And if you want to be able to evaluate these statements,
00:35:30.480 | you need to precisely define what you're talking about
00:35:33.800 | when you're talking about general intelligence.
00:35:35.560 | And you need a formal way, a reliable way
00:35:40.080 | to measure how much intelligence,
00:35:42.840 | how much general intelligence a system possesses.
00:35:46.360 | And ideally this measure of intelligence
00:35:48.880 | should be actionable.
00:35:50.720 | So it should not just describe what intelligence is.
00:35:55.080 | It should not just be a binary indicator
00:35:57.320 | that tells you the system is intelligent or it isn't.
00:36:01.000 | It should be actionable.
00:36:03.560 | It should have explanatory power, right?
00:36:06.200 | So you could use it as a feedback signal.
00:36:09.040 | It would show you the way
00:36:11.480 | towards building more intelligent systems.
00:36:13.600 | - So at the first level,
00:36:16.000 | you draw a distinction between two divergent views
00:36:18.560 | of intelligence.
00:36:19.560 | As we just talked about,
00:36:23.360 | intelligence is a collection of task-specific skills
00:36:27.320 | and a general learning ability.
00:36:30.360 | So what's the difference between
00:36:31.880 | kind of this memorization of skills
00:36:36.080 | and a general learning ability?
00:36:38.280 | We've talked about it a little bit,
00:36:39.560 | but can you try to linger on this topic for a bit?
00:36:43.040 | - Yeah, so the first part of the paper
00:36:45.440 | is an assessment of the different ways
00:36:49.080 | we've been thinking about intelligence
00:36:50.480 | and the different ways we've been evaluating progress
00:36:53.400 | in AI.
00:36:54.520 | And the history of cognitive sciences
00:36:57.720 | has been shaped by two views of the human mind.
00:37:01.280 | And one view is the evolutionary psychology view
00:37:04.760 | in which the mind is a collection
00:37:08.920 | of fairly static, special purpose, ad hoc mechanisms
00:37:13.920 | that have been hard-coded by evolution
00:37:17.640 | over our history as a species over a very long time.
00:37:25.520 | And early AI researchers,
00:37:27.960 | people like Marvin Minsky, for instance,
00:37:30.360 | they clearly subscribed to this view.
00:37:33.320 | And they saw the mind as a kind of,
00:37:36.640 | you know, collection of static programs
00:37:38.720 | similar to the programs they would run
00:37:42.160 | on like mainframe computers.
00:37:43.600 | And in fact, I think they very much understood the mind
00:37:48.040 | through the metaphor of the mainframe computer
00:37:50.560 | because it was the tool they were working with, right?
00:37:53.600 | And so you had these static programs,
00:37:55.120 | this collection of very different static programs
00:37:57.280 | operating over a database-like memory.
00:38:00.160 | And in this picture, learning was not very important.
00:38:03.720 | Learning was considered to be just memorization.
00:38:05.760 | And in fact, learning is basically not featured
00:38:10.480 | in AI textbooks until the 1980s
00:38:14.720 | with the rise of machine learning.
00:38:17.040 | - It's kind of fun to think about
00:38:18.880 | that learning was the outcast,
00:38:20.600 | like the weird people working on learning,
00:38:24.160 | like the mainstream AI world was,
00:38:28.200 | I mean, I don't know what the best term is,
00:38:31.840 | but it's non-learning.
00:38:33.080 | It was seen as like reasoning
00:38:36.480 | would not be learning-based.
00:38:38.040 | - Yes, it was seen,
00:38:39.360 | it was considered that the mind was a collection
00:38:42.240 | of programs that were primarily logical in nature.
00:38:46.720 | And all you needed to do to create a mind
00:38:49.200 | was to write down these programs
00:38:50.960 | and they would operate over knowledge,
00:38:52.960 | which would be stored in some kind of database.
00:38:55.200 | And as long as your database would encompass
00:38:58.280 | everything about the world
00:38:59.520 | and your logical rules were comprehensive,
00:39:03.440 | then you would have a mind.
00:39:05.040 | So the other view of the mind
00:39:06.560 | is the brain as sort of blank slate, right?
00:39:11.560 | This is a very old idea.
00:39:13.280 | You find it in John Locke's writings.
00:39:16.240 | This is the tabula rasa.
00:39:17.680 | And this is this idea that the mind
00:39:21.200 | is some kind of like information sponge
00:39:23.360 | that starts empty, that starts blank,
00:39:28.280 | and that absorbs knowledge and skills from experience.
00:39:33.280 | So it's a sponge that reflects the complexity of the world,
00:39:39.560 | the complexity of your life experience, essentially.
00:39:42.680 | That everything you know and everything you can do
00:39:45.240 | is a reflection of something you found
00:39:48.600 | in the outside world, essentially.
00:39:50.480 | So this is an idea that's very old,
00:39:52.480 | that was not very popular, for instance, in the 1970s,
00:39:58.240 | but that had gained a lot of vitality recently
00:40:00.400 | with the rise of connectionism, in particular deep learning.
00:40:04.080 | And so today, deep learning is the dominant paradigm in AI.
00:40:08.280 | And I feel like lots of AI researchers
00:40:12.000 | are conceptualizing the mind via a deep learning metaphor.
00:40:16.600 | Like they see the mind as a kind of
00:40:19.120 | randomly initialized neural network
00:40:22.120 | that starts blank when you're born,
00:40:24.160 | and then that gets trained via exposure to training data
00:40:27.880 | that acquires knowledge and skills
00:40:29.520 | via exposure to training data.
00:40:31.000 | - By the way, it's a small tangent.
00:40:33.480 | I feel like people who are thinking about intelligence
00:40:38.520 | are not conceptualizing it that way.
00:40:41.440 | I actually haven't met too many people
00:40:43.480 | who believe that a neural network will be able to reason,
00:40:48.480 | who seriously think that, rigorously,
00:40:51.680 | 'cause I think it's actually an interesting worldview.
00:40:54.240 | And we'll talk about it more,
00:40:56.400 | but it's been impressive
00:40:58.120 | what neural networks have been able to accomplish.
00:41:02.080 | And to me, I don't know, you might disagree,
00:41:04.520 | but it's an open question whether like scaling size
00:41:09.840 | eventually might lead to incredible results to us,
00:41:14.040 | mere humans will appear as if it's general.
00:41:17.080 | - I mean, if you ask people
00:41:18.800 | who are seriously thinking about intelligence,
00:41:20.760 | they will definitely not say that all you need to do
00:41:23.920 | is like the mind is just a neural network.
00:41:26.520 | However, it's actually a view that's very popular,
00:41:30.440 | I think, in the deep learning community,
00:41:31.800 | that many people are kind of conceptually,
00:41:35.480 | intellectually lazy about it.
00:41:37.120 | - Right.
00:41:38.320 | - But I guess what I'm saying exactly right,
00:41:40.560 | is I haven't met many people,
00:41:44.720 | and I think it would be interesting
00:41:46.960 | to meet a person who is not intellectually lazy
00:41:48.960 | about this particular topic,
00:41:50.240 | and still believes that neural networks will go all the way.
00:41:54.480 | I think Yann LeCun is probably closest to that,
00:41:56.800 | with self-supervised learning. - There are definitely people
00:41:58.440 | who argue that current deep learning techniques
00:42:03.080 | are already the way to general artificial intelligence,
00:42:06.880 | that all you need to do is to scale it up
00:42:09.440 | to all the available training data.
00:42:12.760 | And that's, if you look at the waves
00:42:16.240 | that OpenAI's GPT-3 model has made,
00:42:19.480 | you see echoes of this idea.
00:42:22.680 | - So on that topic, GPT-3, similar to GPT-2, actually,
00:42:27.680 | have captivated some part of the imagination of the public.
00:42:33.000 | There's just a bunch of hype of different kind
00:42:35.520 | that's, I would say it's emergent,
00:42:37.920 | it's not artificially manufactured,
00:42:39.800 | it's just like people just get excited
00:42:42.600 | for some strange reason.
00:42:43.760 | And in the case of GPT-3, which is funny,
00:42:46.500 | that there's, I believe, a couple months delay
00:42:49.100 | from a release to hype.
00:42:51.580 | Maybe I'm not historically correct on that,
00:42:56.580 | but it feels like there was a little bit of a lack of hype,
00:43:01.260 | and then there's a phase shift into hype.
00:43:04.760 | But nevertheless, there's a bunch of cool applications
00:43:07.480 | that seem to captivate the imagination of the public
00:43:10.420 | about what this language model
00:43:12.160 | that's trained in unsupervised way
00:43:15.200 | without any fine tuning is able to achieve.
00:43:19.520 | So what do you make of that?
00:43:20.920 | What are your thoughts about GPT-3?
00:43:22.960 | - Yeah, so I think what's interesting about GPT-3
00:43:25.720 | is the idea that it may be able to learn new tasks
00:43:29.880 | after just being shown a few examples.
00:43:33.640 | So I think if it's actually capable of doing that,
00:43:35.640 | that's novel, and that's very interesting,
00:43:37.600 | and that's something we should investigate.
00:43:39.920 | That said, I must say, I'm not entirely convinced
00:43:43.160 | that we have shown it's capable of doing that.
00:43:47.320 | It's very likely, given the amount of data
00:43:51.000 | that the model is trained on,
00:43:52.240 | that what it's actually doing is pattern matching
00:43:55.720 | a new task you give it with a task
00:43:58.080 | that it's been exposed to in its training data.
00:44:00.120 | It's just recognizing the task
00:44:01.640 | instead of just developing a model of the task, right?
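
For readers unfamiliar with the setup, "being shown a few examples" means putting demonstrations directly in the prompt, roughly as sketched below; the task is illustrative and the completion shown is not actual GPT-3 output.

```python
# Few-shot prompting: the task is specified by examples inside the prompt itself.
prompt = """Translate English to French.
sea otter -> loutre de mer
cheese -> fromage
plush giraffe -> girafe en peluche
mind ->"""

# The model is asked to continue this text. Whether it has genuinely learned the
# task from three examples, or is pattern-matching against similar tasks seen in
# its training data, is the open question being discussed here.
```
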
00:44:05.560 | - But there's, sorry to interrupt,
00:44:07.640 | there's parallels to what you said before,
00:44:10.000 | which is it's possible to see GPT-3
00:44:13.080 | as like the prompts it's given
00:44:15.560 | as a kind of SQL query into this thing that it's learned,
00:44:19.560 | similar to what you said before,
00:44:20.840 | which is language is used to query the memory.
00:44:23.320 | - Yes.
00:44:24.160 | - So is it possible that neural network
00:44:26.920 | is a giant memorization thing,
00:44:29.320 | but then if it gets sufficiently giant,
00:44:32.240 | it'll memorize sufficiently large amounts
00:44:35.080 | of things in the world,
00:44:36.400 | or it becomes, or intelligence becomes a querying machine?
00:44:40.560 | - I think it's possible that a significant chunk
00:44:44.160 | of intelligence is this giant associative memory.
00:44:47.480 | I definitely don't believe that intelligence
00:44:51.320 | is just a giant associative memory,
00:44:53.680 | but it may well be a big component.
00:44:57.640 | - So do you think GPT-3, four, five,
00:45:02.640 | GPT-10 will eventually,
00:45:05.760 | like, what do you think, where's the ceiling?
00:45:08.360 | Do you think you'll be able to reason?
00:45:10.560 | No, that's a bad question.
00:45:13.440 | Like, what is the ceiling is the better question.
00:45:17.320 | - How well is it gonna scale?
00:45:18.520 | How good is GPT-N going to be?
00:45:21.200 | - Yeah.
00:45:22.040 | - So I believe GPT-N is gonna--
00:45:25.440 | - GPT-N?
00:45:26.880 | - Is gonna improve on the strength of GPT-2 and 3,
00:45:30.920 | which is it will be able to generate, you know,
00:45:34.000 | ever more plausible text in context.
00:45:37.640 | - Just monotonically increasing performance.
00:45:39.920 | - Yes, if you train a bigger model on more data,
00:45:44.360 | then your text will be increasingly more context aware
00:45:49.360 | and increasingly more plausible,
00:45:51.240 | in the same way that GPT-3 is much better
00:45:54.720 | at generating plausible text compared to GPT-2.
00:45:57.520 | But that said, I don't think just scaling up the model
00:46:03.400 | to more transformer layers and more training data
00:46:05.640 | is gonna address the flaws of GPT-3,
00:46:08.440 | which is that it can generate plausible text,
00:46:11.360 | but that text is not constrained by anything else
00:46:15.040 | other than plausibility.
00:46:16.680 | So in particular, it's not constrained by factualness
00:46:20.640 | or even consistency,
00:46:22.040 | which is why it's very easy to get GPT-3
00:46:24.080 | to generate statements that are factually untrue
00:46:27.920 | or to generate statements that are even self-contradictory,
00:46:31.000 | right?
00:46:32.120 | Because its only goal is plausibility
00:46:36.840 | and it has no other constraints.
00:46:39.120 | It's not constrained to be self-consistent, for instance.
00:46:42.440 | And so for this reason,
00:46:44.080 | one thing that I thought was very interesting with GPT-3
00:46:46.680 | is that you can pre-determine the answer it will give you
00:46:51.200 | by asking the question in a specific way,
00:46:53.480 | because it's very responsive to the way you ask the question
00:46:56.600 | since it has no understanding of the content of the question.
00:47:01.600 | And if you ask the same question in two different ways
00:47:07.200 | that are basically adversarially engineered
00:47:10.560 | to produce certain answer,
00:47:11.720 | you will get two different answers,
00:47:14.240 | two contradictory answers.
00:47:15.640 | - It's very susceptible to adversarial attacks, essentially.
00:47:18.200 | - Potentially, yes.
00:47:19.440 | So in general, the problem with these models,
00:47:22.320 | these generative models is that
00:47:24.320 | they're very good at generating plausible text,
00:47:27.280 | but that's just not enough, right?
00:47:30.320 | I think one avenue that would be very interesting
00:47:36.560 | to make progress is to make it possible
00:47:39.520 | to write programs over the latent space
00:47:43.920 | that these models operate on,
00:47:45.720 | that you would rely on these self-supervised models
00:47:49.520 | to generate a sort of pool of knowledge and concepts
00:47:54.360 | and common sense,
00:47:55.320 | and then you would be able to write explicit
00:47:58.320 | reasoning programs over it.
00:48:01.520 | Because the current problem with GPT-3
00:48:03.280 | is that it can be quite difficult to get it
00:48:06.960 | to do what you want to do.
00:48:09.440 | If you want to turn GPT-3 into products,
00:48:12.480 | you need to put constraints on it.
00:48:14.840 | You need to force it to obey certain rules.
00:48:19.520 | So you need a way to program it explicitly.
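To make the idea of "writing programs over the latent space" a bit more concrete, here is a minimal, hedged sketch in Python. The embed() helper is a toy stand-in (in practice it would be the representation produced by a self-supervised model), and the consistency check is only one example of an explicit, hand-written constraint layered on top of learned representations; nothing here is claimed to be how such a system would actually be built.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a self-supervised encoder. In a real system this
    would be the latent representation produced by a pretrained model."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def self_consistent(statements: list[str], threshold: float = 0.0) -> bool:
    """An explicit 'reasoning program' written over the latent space:
    reject a batch of generated statements if any pair is too dissimilar,
    i.e. impose a crude consistency constraint that plausibility-driven
    generation does not provide on its own."""
    vecs = [embed(s) for s in statements]
    return all(cosine(u, v) >= threshold
               for i, u in enumerate(vecs) for v in vecs[i + 1:])

print(self_consistent(["GPT-3 has 175 billion parameters.",
                       "GPT-3 is a large language model."]))
```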
00:48:22.520 | - Yeah, so if you look at its ability
00:48:24.200 | to do program synthesis,
00:48:26.120 | it generates, like you said, something that's plausible.
00:48:29.040 | - Yeah, so if you try to make it generate programs,
00:48:32.600 | it will perform well for any program
00:48:35.920 | that it has seen in its training data,
00:48:38.720 | but because program space is not interpolative, right?
00:48:44.320 | It's not gonna be able to generalize to problems
00:48:46.760 | it hasn't seen before.
00:48:48.760 | - Now that's currently, do you think,
00:48:51.920 | sort of an absurd, but I think useful,
00:48:56.360 | I guess, intuition builder is,
00:48:59.520 | you know, the GPT-3 has 175 billion parameters.
00:49:05.400 | The human brain has 100,
00:49:09.360 | has about a thousand times that or more
00:49:13.160 | in terms of number of synapses.
00:49:14.840 | Do you think, obviously, very different kinds of things,
00:49:21.200 | but there is some degree of similarity.
00:49:26.200 | Do you think, what do you think GPT will look like
00:49:30.720 | when it has 100 trillion parameters?
00:49:34.240 | You think our conversation might be in nature different?
00:49:38.520 | 'Cause you've criticized GPT-3 very effectively now.
00:49:43.000 | Do you think?
00:49:43.960 | - No, I don't think so.
00:49:46.960 | So to begin with the bottleneck with scaling up GPT-3,
00:49:51.080 | GPT models, generative pre-trained transformer models,
00:49:54.920 | is not gonna be the size of the model
00:49:57.680 | or how long it takes to train it.
00:49:59.640 | The bottleneck is gonna be the training data
00:50:01.920 | because OpenAI is already training GPT-3
00:50:05.600 | on a crawl of basically the entire web, right?
00:50:08.920 | And that's a lot of data.
00:50:09.880 | So you could imagine training on more data than that.
00:50:12.200 | Google could train on more data than that,
00:50:14.440 | but it would still be only incrementally more data.
00:50:17.480 | And I don't recall exactly how much more data
00:50:20.760 | GPT-3 was trained on compared to GPT-2,
00:50:22.800 | but it's probably at least like 100,
00:50:25.040 | or maybe even a thousand X.
00:50:26.600 | Don't have the exact number.
00:50:28.440 | You're not gonna be able to train a model
00:50:30.120 | on a hundred times more data than what you're already doing.
00:50:34.160 | - So that's brilliant.
00:50:35.280 | So it's not, you know,
00:50:36.400 | it's easier to think of compute as a bottleneck
00:50:38.880 | and then arguing that we can remove that bottleneck, but-
00:50:41.600 | - We can remove the compute bottleneck.
00:50:43.040 | I don't think it's a big problem.
00:50:44.560 | If you look at the pace at which we've improved
00:50:48.480 | the efficiency of deep learning models
00:50:50.880 | in the past few years,
00:50:53.800 | I'm not worried about training time bottlenecks
00:50:57.160 | or model size bottlenecks.
00:50:58.720 | The bottleneck in the case of these
00:51:02.080 | generative transformer models
00:51:03.440 | is absolutely the training data.
00:51:05.560 | - What about the quality of the data?
00:51:07.320 | So- - So, yeah.
00:51:08.440 | So the quality of the data is an interesting point.
00:51:10.880 | The thing is, if you're gonna want to use these models
00:51:14.440 | in real products,
00:51:15.800 | then you want to feed them data
00:51:20.080 | that's as high quality, as factual,
00:51:23.480 | and, I would say, as unbiased as possible,
00:51:25.600 | but you know, there's not really such a thing
00:51:27.360 | as unbiased data in the first place.
00:51:30.480 | But you probably don't want to train it on Reddit,
00:51:34.000 | for instance.
00:51:34.840 | It sounds like a bad plan.
00:51:37.040 | So from my personal experience working with
00:51:40.240 | large scale deep learning models,
00:51:42.760 | so at some point I was working on a model at Google
00:51:46.600 | that's trained on like 350 million labeled images.
00:51:51.600 | It's an image classification model.
00:51:53.680 | That's a lot of images.
00:51:54.640 | That's like probably most publicly available images
00:51:58.160 | on the web at the time.
00:51:59.360 | And it was a very noisy dataset
00:52:03.880 | because the labels were not originally annotated by hand,
00:52:07.800 | by humans.
00:52:08.640 | They were automatically derived from like tags
00:52:12.400 | on social media or just keywords in the same page
00:52:16.800 | as the image was found and so on.
00:52:18.200 | So it was very noisy.
00:52:19.080 | And it turned out that you could easily get a better model,
00:52:24.080 | not just by training,
00:52:26.480 | like if you train on more of the noisy data,
00:52:29.960 | you get an incrementally better model,
00:52:31.480 | but you very quickly hit diminishing returns.
00:52:35.480 | On the other hand,
00:52:38.360 | if you train on a smaller dataset
00:52:38.360 | with higher quality annotations,
00:52:39.960 | annotations that are actually made by humans,
00:52:44.960 | you get a better model.
00:52:47.280 | And it also takes less time to train it.
00:52:50.080 | - Yeah, that's fascinating.
00:52:51.520 | It's the self-supervised learning.
00:52:53.440 | There's a way to get better at doing the automated labeling.
00:52:58.440 | - Yeah, so you can enrich or refine your labels
00:53:03.760 | in an automated way.
00:53:05.840 | - That's correct.
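As a hedged illustration of the kind of automated label refinement being described (not a description of any specific Google pipeline), here is a toy loop with synthetic data and scikit-learn: fit a model on noisy labels, keep only the examples the model is confident about and agrees with, and retrain on that cleaner subset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: 2-class problem where 20% of the labels are randomly corrupted.
X = rng.standard_normal((2000, 10))
true_y = (X[:, 0] + X[:, 1] > 0).astype(int)
noisy_y = true_y.copy()
flip = rng.random(len(noisy_y)) < 0.2
noisy_y[flip] = 1 - noisy_y[flip]

# Step 1: fit on the noisy labels.
model = LogisticRegression().fit(X, noisy_y)

# Step 2: keep only examples where the model is confident and agrees with
# the given label -- a crude automated "refinement" of the dataset.
proba = model.predict_proba(X).max(axis=1)
agree = model.predict(X) == noisy_y
keep = (proba > 0.8) & agree

# Step 3: retrain on the smaller, cleaner subset.
refined = LogisticRegression().fit(X[keep], noisy_y[keep])
print(f"kept {keep.mean():.0%} of examples, "
      f"clean-label accuracy {refined.score(X, true_y):.3f}")
```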
00:53:07.480 | - Do you have a hope for,
00:53:08.720 | I don't know if you're familiar
00:53:09.560 | with the idea of a semantic web.
00:53:11.160 | Is a semantic web,
00:53:13.560 | just for people who are not familiar,
00:53:15.600 | and is the idea of being able to convert the internet
00:53:20.600 | or be able to attach like semantic meaning
00:53:25.680 | to the words on the internet,
00:53:27.560 | the sentences, the paragraphs,
00:53:29.760 | to be able to convert information on the internet
00:53:33.920 | or some fraction of the internet
00:53:35.680 | into something that's interpretable by machines.
00:53:38.160 | That was kind of a dream for,
00:53:43.000 | I think the semantic web papers in the 90s.
00:53:47.000 | It's kind of the dream that,
00:53:49.720 | the internet is full of rich, exciting information.
00:53:52.320 | Even just looking at Wikipedia,
00:53:54.400 | we should be able to use that as data for machines.
00:53:57.760 | And so far-- - The information
00:53:59.000 | is not really in a format that's available to machines.
00:54:01.240 | So no, I don't think the semantic web will ever work
00:54:04.520 | simply because it would be a lot of work, right?
00:54:08.000 | To make, to provide that information in structured form.
00:54:12.000 | And there is not really any incentive
00:54:13.800 | for anyone to provide that work.
00:54:16.320 | So I think the way forward to make the knowledge
00:54:21.160 | on the web available to machines
00:54:22.800 | is actually something closer to unsupervised deep learning.
00:54:26.600 | - Yeah. - The GPT-3
00:54:30.280 | is actually a bigger step in the direction
00:54:32.200 | of making the knowledge of the web available to machines
00:54:34.960 | than the semantic web was.
00:54:36.680 | - Yeah, perhaps in a human centric sense,
00:54:40.160 | it feels like GPT-3 hasn't learned anything
00:54:45.160 | that could be used to reason.
00:54:49.440 | But that might be just the early days.
00:54:52.920 | - Yeah, I think that's correct.
00:54:54.360 | I think the forms of reasoning that you see it perform
00:54:57.440 | are basically just reproducing patterns
00:55:00.720 | that it has seen in its training data.
00:55:02.440 | So of course, if you're trained on the entire web,
00:55:06.640 | then you can produce an illusion of reasoning
00:55:09.360 | in many different situations,
00:55:10.800 | but it will break down if it's presented
00:55:13.160 | with a novel situation.
00:55:15.320 | - That's the open question between the illusion of reasoning
00:55:17.720 | and actual reasoning, yeah.
00:55:18.760 | - Yes, the power to adapt to something
00:55:21.280 | that is genuinely new.
00:55:22.800 | Because the thing is, even imagine you had,
00:55:28.040 | you could train on every bit of data ever generated
00:55:32.560 | in the history of humanity.
00:55:34.000 | It remains, that model would be capable
00:55:38.560 | of anticipating many different possible situations,
00:55:43.200 | but it remains that the future
00:55:45.520 | is gonna be something different.
00:55:47.280 | Like for instance, if you train a GPT-3 model
00:55:51.800 | on data from the year 2002, for instance,
00:55:55.760 | and then use it today,
00:55:56.760 | it's gonna be missing many things.
00:55:58.280 | It's gonna be missing many common sense facts
00:56:01.240 | about the world.
00:56:02.640 | It's even gonna be missing vocabulary and so on.
00:56:05.880 | - Yeah, it's interesting that GPT-3 even doesn't have,
00:56:09.640 | I think, any information about the coronavirus.
00:56:13.560 | - Yes.
00:56:15.000 | Which is why,
00:56:17.760 | you can tell that a system is intelligent
00:56:21.280 | when it's capable of adapting.
00:56:22.880 | So intelligence is gonna require
00:56:25.640 | some amount of continuous learning.
00:56:28.200 | It's also gonna require some amount of improvisation.
00:56:30.600 | Like it's not enough to assume that
00:56:33.560 | what you're gonna be asked to do
00:56:36.120 | is something that you've seen before,
00:56:38.760 | or something that is a simple interpolation
00:56:40.560 | of things you've seen before.
00:56:42.720 | - Yeah.
00:56:43.560 | - In fact, that model breaks down for
00:56:45.480 | even tasks that look relatively simple
00:56:51.400 | from a distance, like L5 self-driving, for instance.
00:56:55.600 | Google had a paper a couple of years back
00:56:59.720 | showing that something like 30 million
00:57:03.840 | different road situations
00:57:05.480 | were actually completely insufficient
00:57:07.200 | to train a driving model.
00:57:09.840 | It wasn't even L2, right?
00:57:11.800 | And that's a lot of data.
00:57:12.800 | That's a lot more data than the 20 or 30 hours of driving
00:57:16.920 | that a human needs to learn to drive,
00:57:19.560 | given the knowledge they've already accumulated.
00:57:21.920 | - Well, let me ask you on that topic.
00:57:24.680 | Elon Musk, Tesla Autopilot,
00:57:29.560 | one of the only companies I believe
00:57:31.720 | is really pushing for a learning-based approach.
00:57:34.760 | Are you skeptical that that kind of network
00:57:37.040 | can achieve level four?
00:57:38.320 | - L4 is probably achievable.
00:57:42.720 | L5 probably not.
00:57:44.480 | - What's the distinction there?
00:57:49.440 | Is L5 completely autonomous, you can just fall asleep?
00:57:49.440 | - Yeah, L5 is basically human level.
00:57:51.160 | Well, driving, I have to be careful saying human level
00:57:53.800 | 'cause like that's the most--
00:57:54.640 | - Yeah, there are tons of drivers.
00:57:56.240 | - Yeah, that's the clearest example of like,
00:57:59.920 | you know, cars will most likely be much safer than humans
00:58:03.320 | in many situations where humans fail.
00:58:06.640 | It's the vice versa question.
00:58:08.880 | - So I'll tell you, you know,
00:58:11.440 | the thing is the amounts of trained data you would need
00:58:14.720 | to anticipate for pretty much every possible situation
00:58:17.680 | you'll encounter in the real world
00:58:20.520 | is such that it's not entirely unrealistic
00:58:23.560 | to think that at some point in the future
00:58:25.520 | we'll develop a system that's trained on enough data,
00:58:27.440 | especially provided that we can simulate
00:58:31.200 | a lot of that data.
00:58:32.400 | We don't necessarily need actual cars on the road
00:58:35.680 | for everything, but it's a massive effort.
00:58:39.880 | And it turns out you can create a system
00:58:41.760 | that's much more adaptive,
00:58:43.840 | that can generalize much better
00:58:45.200 | if you just add explicit models
00:58:51.160 | of the surroundings of the car.
00:58:53.320 | And if you use deep learning for what it's good at,
00:58:55.880 | which is to provide perceptive information.
00:58:59.400 | So in general, deep learning is a way to encode perception
00:59:03.640 | and a way to encode intuition,
00:59:05.760 | but it is not a good medium
00:59:07.600 | for any sort of explicit reasoning.
00:59:11.160 | And in AI systems today,
00:59:14.760 | strong generalization tends to come from explicit models,
00:59:20.600 | tend to come from abstractions in the human mind
00:59:24.320 | that are encoded in program form by a human engineer.
00:59:28.680 | These are the abstractions you can actually generalize,
00:59:31.280 | not the sort of weak abstraction
00:59:33.280 | that is learned by a neural network.
00:59:34.920 | - Yeah, and the question is how much reasoning,
00:59:38.520 | how much strong abstractions are required
00:59:41.920 | to solve particular tasks like driving?
00:59:44.600 | That's the question, or human life, existence.
00:59:48.800 | How much strong abstractions does existence require,
00:59:53.320 | but more specifically on driving?
00:59:55.620 | That seems to be a coupled question about intelligence
01:00:00.720 | is like how much intelligence,
01:00:04.400 | like how do you build an intelligent system?
01:00:07.200 | And the coupled problem, how hard is this problem?
01:00:11.400 | How much intelligence does this problem actually require?
01:00:14.400 | So we get to cheat, right?
01:00:18.000 | 'Cause we get to look at the problem.
01:00:19.840 | It's not like we get to close our eyes
01:00:22.840 | and be completely new to driving.
01:00:24.760 | We get to do what we do as human beings,
01:00:26.780 | which is for the majority of our life,
01:00:30.260 | before we ever learn, quote unquote, to drive,
01:00:32.440 | we get to watch other cars and other people drive.
01:00:35.480 | We get to be in cars, we get to watch,
01:00:37.520 | we get to see movies about cars,
01:00:39.480 | we get to observe all this stuff.
01:00:42.680 | And that's similar to what neural networks are doing,
01:00:45.080 | is getting a lot of data
01:00:47.120 | and the question is, yeah,
01:00:51.360 | how many leaps of reasoning genius is required
01:00:56.360 | to be able to actually effectively drive?
01:00:59.400 | - I think, for the example of driving.
01:01:01.320 | I mean, sure, you've seen a lot of cars in your life
01:01:06.200 | before you learn to drive,
01:01:07.720 | but let's say you've learned to drive in Silicon Valley
01:01:10.600 | and now you rent a car in Tokyo.
01:01:14.120 | Well, now everyone is driving on the other side of the road
01:01:16.800 | and the signs are different
01:01:18.560 | and the roads are more narrow and so on.
01:01:20.400 | So it's a very, very different environment
01:01:22.640 | and a smart human, even an average human,
01:01:26.760 | should be able to just zero shot it,
01:01:29.280 | to just be operational in this very different environment
01:01:34.200 | right away, despite having had no contact
01:01:38.680 | with the novel complexity
01:01:41.360 | that is contained in this environment, right?
01:01:44.200 | And that is novel complexity,
01:01:45.960 | it's not just interpolation over the situations
01:01:50.800 | that you've encountered previously,
01:01:52.440 | like learning to drive in the US, right?
01:01:55.120 | - I would say the reason I ask
01:01:56.880 | is one of the most interesting tests of intelligence
01:01:59.920 | we have today actively, which is driving,
01:02:03.040 | in terms of having an impact on the world.
01:02:06.400 | Like when do you think we'll pass
01:02:08.080 | that test of intelligence?
01:02:09.840 | - So I don't think driving is that much of a test
01:02:12.720 | of intelligence because again,
01:02:14.800 | there is no task for which skill at that task
01:02:18.520 | demonstrates intelligence,
01:02:20.120 | unless it's a kind of meta task
01:02:23.120 | that involves acquiring new skills.
01:02:26.520 | So I don't think, I think you can actually solve driving
01:02:29.320 | without having any real amount of intelligence.
01:02:34.320 | For instance, if you really did have infinite training data,
01:02:38.280 | you could just literally train
01:02:41.680 | an end-to-end deep learning model that does driving,
01:02:44.120 | provided infinite training data.
01:02:45.720 | The only problem with the whole idea
01:02:49.000 | is collecting a dataset that's sufficiently comprehensive
01:02:53.440 | that covers the very long tail
01:02:55.160 | of possible situations you might encounter.
01:02:57.280 | And it's really just a scale problem.
01:02:59.280 | So I think there's nothing fundamentally wrong
01:03:03.280 | with this plan, with this idea.
01:03:06.520 | It's just that it strikes me
01:03:09.560 | as a fairly inefficient thing to do
01:03:11.560 | because you run into this scaling issue
01:03:16.480 | with diminishing returns.
01:03:17.840 | Whereas if instead you took a more manual engineering
01:03:21.960 | approach where you use deep learning modules
01:03:26.960 | in combination with engineering an explicit model
01:03:32.000 | of the surrounding of the cars
01:03:33.920 | and you bridge the two in a clever way,
01:03:36.120 | your model will actually start generalizing much earlier
01:03:39.840 | and more effectively than the end-to-end deep learning model.
01:03:42.440 | So why would you not go
01:03:44.560 | with the more manual engineering oriented approach?
01:03:47.640 | Like even if you created that system,
01:03:50.040 | either the end-to-end deep learning model system
01:03:52.280 | that's running on infinite data
01:03:53.960 | or the slightly more human system,
01:03:58.560 | I don't think achieving L5 would demonstrate
01:04:01.720 | general intelligence or intelligence of any generality at all.
01:04:05.720 | Again, the only possible test of generality in AI
01:04:10.520 | would be a test that looks at skill acquisition
01:04:12.720 | over unknown tasks.
01:04:14.280 | But for instance, you could take your L5 driver
01:04:17.360 | and ask it to learn to pilot a commercial airplane
01:04:21.520 | for instance, and then you would look at
01:04:23.240 | how much human involvement is required
01:04:25.800 | and how much training data is required
01:04:28.080 | for the system to learn to pilot an airplane.
01:04:29.840 | And that gives you a measure
01:04:33.600 | of how intelligent that system really is.
01:04:35.880 | - Yeah, well, I mean, that's a big leap.
01:04:37.520 | I get you, but I'm more interested as a problem.
01:04:42.080 | I would see, to me, driving is a black box
01:04:46.600 | that can generate novel situations at some rate,
01:04:50.880 | like what people call edge cases.
01:04:53.440 | So it does have newness that keeps appearing,
01:04:56.160 | that we're confronted with, let's say, once a month.
01:04:59.440 | - It is a very long tail, yes.
01:05:01.000 | - It's a long tail.
01:05:01.840 | - But it doesn't mean you cannot solve it
01:05:03.840 | just by training a statistical model on a lot of data.
01:05:08.880 | - Huge amount of data.
01:05:09.960 | - It's really a matter of scale.
01:05:12.040 | - But I guess what I'm saying is,
01:05:14.560 | if you have a vehicle that achieves level five,
01:05:17.680 | it is going to be able to deal with new situations.
01:05:22.680 | Or, I mean, the data is so large
01:05:28.520 | that the rate of new situations is very low.
01:05:32.280 | - Yes.
01:05:33.320 | - That's not intelligence.
01:05:34.400 | So if we go back to your kind of definition of intelligence,
01:05:37.960 | it's the efficiency.
01:05:39.680 | - With which you can adapt to new situations,
01:05:42.600 | to truly new situations,
01:05:43.720 | not situations you've seen before, right?
01:05:45.880 | Not situations that could be anticipated by your creators,
01:05:48.640 | by the creators of the system, but truly new situations.
01:05:51.960 | The efficiency with which you acquire new skills.
01:05:55.160 | If you require, if in order to pick up a new skill,
01:05:58.440 | you require a very extensive training data set
01:06:03.440 | of most possible situations that can occur
01:06:07.200 | in the practice of that skill,
01:06:09.040 | then the system is not intelligent.
01:06:10.680 | It is mostly just a lookup table.
01:06:13.960 | - Yeah, well.
01:06:16.680 | - Likewise, if in order to acquire a skill,
01:06:20.160 | you need a human engineer to write down a bunch of rules
01:06:24.360 | that cover most or every possible situation.
01:06:27.000 | Likewise, the system is not intelligent.
01:06:29.760 | The system is merely the output artifact
01:06:32.920 | of a process that happens in the minds
01:06:37.920 | of the engineers that are creating it, right?
01:06:41.040 | It is encoding an abstraction
01:06:44.640 | that's produced by the human mind.
01:06:46.640 | And intelligence would actually be
01:06:48.960 | the process of producing,
01:06:53.480 | of autonomously producing this abstraction.
01:06:56.480 | - Yeah.
01:06:57.320 | - Not like, if you take an abstraction
01:06:59.360 | and you encode it on a piece of paper
01:07:01.360 | or in a computer program,
01:07:03.040 | the abstraction itself is not intelligent.
01:07:06.080 | What's intelligent is the agent
01:07:09.160 | that's capable of producing these abstractions, right?
01:07:11.960 | - Yeah, it feels like there's a little bit of a gray area.
01:07:15.320 | Like, 'cause you're basically saying
01:07:17.800 | that deep learning forms abstractions too,
01:07:20.520 | but those abstractions do not seem to be effective
01:07:25.680 | for generalizing far outside of the things
01:07:29.200 | that it's already seen.
01:07:30.200 | But generalize a little bit.
01:07:31.720 | - Yeah, absolutely.
01:07:32.640 | No, deep learning does generalize a little bit.
01:07:34.320 | Like, generalization is not binary, it's more like a spectrum.
01:07:38.240 | - Yeah, and there's a certain point,
01:07:40.120 | it's a gray area, but there's a certain point
01:07:42.400 | where there's an impressive degree
01:07:43.840 | of generalization that happens.
01:07:46.480 | No, like, I guess exactly what you were saying is
01:07:52.320 | intelligence is how efficiently you're able to generalize
01:07:57.320 | far outside of the distribution
01:08:01.880 | of things you've seen already.
01:08:03.360 | - Yes.
01:08:04.200 | - So it's both like the distance of how far you can,
01:08:06.880 | like how new, how radically new something is
01:08:10.240 | and how efficiently you're able to deal with that.
01:08:12.600 | - So you can think of intelligence as a measure
01:08:16.400 | of an information conversion ratio.
01:08:18.920 | Like imagine a space of possible situations
01:08:22.440 | and you've covered some of them.
01:08:27.760 | So you have some amount of information
01:08:29.880 | about your space of possible situations.
01:08:32.000 | That's provided by the situations you already know.
01:08:34.480 | And that's on the other hand,
01:08:35.840 | also provided by the prior knowledge
01:08:38.880 | that the system brings to the table,
01:08:41.040 | the prior knowledge that's embedded in the system.
01:08:43.560 | So the system starts with some information, right,
01:08:46.440 | about the problem, about the task.
01:08:48.880 | And it's about going from that information to a program,
01:08:53.600 | what we would call a skill program, a behavioral program
01:08:56.440 | that can cover a large area of possible situation space.
01:09:00.560 | And essentially the ratio between that area
01:09:04.120 | and the amount of information you start with is intelligence.
01:09:07.600 | So a very smart agent can make efficient uses
01:09:14.200 | of very little information about a new problem
01:09:17.560 | and very little prior knowledge as well
01:09:19.600 | to cover a very large area of potential situations
01:09:23.360 | in that problem without knowing
01:09:26.560 | what these future new situations are going to be.
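As a rough schematic of the ratio just described (not the formal definition from On the Measure of Intelligence, which is stated in algorithmic-information-theoretic terms), one could write:

$$
\text{intelligence} \;\propto\; \frac{\text{area of situation space covered by the acquired skill program}}{\text{information in priors} \;+\; \text{information in experience}}
$$

Higher intelligence then simply means getting broader coverage of future, unknown situations out of less prior knowledge and less training data.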
01:09:29.400 | - So one of the other big things you talk about in the paper,
01:09:34.560 | we've talked about a little bit already,
01:09:36.280 | but let's talk about it some more,
01:09:37.840 | is the actual tests of intelligence.
01:09:40.920 | So if we look at like human and machine intelligence,
01:09:45.960 | do you think tests of intelligence should be different
01:09:49.040 | for humans and machines,
01:09:50.320 | or how we think about testing of intelligence?
01:09:53.360 | Are these fundamentally the same kind of intelligences
01:09:58.240 | that we're after and therefore the tests should be similar?
01:10:03.680 | - So if your goal is to create AIs
01:10:07.680 | that are more human-like,
01:10:10.560 | then it will be super valuable, obviously,
01:10:12.480 | to have a test that's universal,
01:10:15.680 | that applies to both AIs and humans,
01:10:19.440 | so that you could establish a comparison
01:10:22.400 | between the two that you could tell exactly
01:10:25.080 | how intelligent, in terms of human intelligence,
01:10:29.280 | a given system is.
01:10:30.400 | So that said, the constraints
01:10:34.000 | that apply to artificial intelligence
01:10:36.400 | and to human intelligence are very different,
01:10:39.400 | and your tests should account for this difference.
01:10:43.520 | Because if you look at artificial systems,
01:10:47.200 | it's always possible for an experimenter
01:10:50.440 | to buy arbitrary levels of skill at arbitrary tasks,
01:10:55.440 | either by injecting hard-coded prior knowledge
01:11:00.680 | into the system via rules and so on
01:11:05.480 | that come from the human mind,
01:11:06.920 | from the minds of the programmers,
01:11:08.760 | and also buying higher levels of skill
01:11:13.000 | just by training on more data.
01:11:15.640 | For instance, you could generate
01:11:16.960 | an infinity of different Go games,
01:11:19.520 | and you could train a Go playing system that way,
01:11:23.560 | but you could not directly compare it
01:11:26.880 | to human Go playing skills,
01:11:28.640 | because a human that plays Go had to develop that skill
01:11:32.800 | in a very constrained environment.
01:11:34.720 | They had a limited amount of time,
01:11:36.640 | they had a limited amount of energy,
01:11:38.760 | and of course, this started from a different set of priors.
01:11:42.640 | This started from innate human priors.
01:11:47.640 | So I think if you want to compare
01:11:49.880 | the intelligence of two systems,
01:11:51.440 | like the intelligence of an AI
01:11:53.320 | and the intelligence of a human,
01:11:56.200 | you have to control for priors.
01:11:59.840 | You have to start from the same set of knowledge priors
01:12:04.520 | about the task, and you have to control for experience,
01:12:08.760 | that is to say for training data.
01:12:11.200 | - So what's priors?
01:12:15.040 | - So prior is whatever information you have
01:12:18.680 | about a given task before you start learning
01:12:21.680 | about this task.
01:12:23.240 | - And how's that different from experience?
01:12:25.840 | - Well, experience is acquired, right?
01:12:28.080 | So for instance, if you're trying to play Go,
01:12:31.080 | your experience with Go is all the Go games
01:12:33.920 | you've played or you've seen,
01:12:36.280 | or you've simulated in your mind, let's say.
01:12:38.600 | And your priors are things like,
01:12:42.120 | well, Go is a game on the 2D grid,
01:12:46.000 | and we have lots of hard-coded priors
01:12:48.840 | about the organization of 2D space.
01:12:53.320 | - And so rules of how the dynamics of this,
01:12:57.800 | the physics of this game in this 2D space.
01:13:00.040 | - Yes.
01:13:01.160 | - The idea that you have, what winning is.
01:13:04.360 | - Yes, exactly.
01:13:05.600 | And other board games can also show some similarity with Go.
01:13:09.640 | And if you've played these board games,
01:13:11.120 | then with respect to the game of Go,
01:13:13.680 | that would be part of your priors about the game.
01:13:16.320 | - Well, it's interesting to think about the game of Go
01:13:18.480 | is how many priors are actually brought to the table.
01:13:21.160 | When you look at self-play, reinforcement learning-based
01:13:27.440 | mechanisms that do learning,
01:13:28.960 | it seems like the number of priors is pretty low.
01:13:31.040 | - Yes.
01:13:31.880 | - But you're saying you should be-
01:13:32.720 | - There are 2D spatial priors in the convnet.
01:13:35.760 | - Right.
01:13:36.600 | But you should be clear at making those priors explicit.
01:13:40.560 | - Yes.
01:13:41.880 | So in particular, I think if your goal
01:13:44.040 | is to measure a human-like form of intelligence,
01:13:47.720 | then you should clearly establish
01:13:49.720 | that you want the AI you're testing
01:13:52.880 | to start from the same set of priors that humans start with.
01:13:57.600 | - Right.
01:13:58.880 | So, I mean, to me personally,
01:14:01.480 | but I think to a lot of people,
01:14:02.760 | the human side of things is very interesting.
01:14:05.360 | So testing intelligence for humans.
01:14:08.040 | What do you think is a good test of human intelligence?
01:14:12.920 | - Well, that's the question that psychometrics
01:14:17.720 | is interested in.
01:14:19.400 | - What's-
01:14:20.240 | - That's an entire subfield of psychology
01:14:22.480 | that deals with this question.
01:14:23.840 | - So what's psychometrics?
01:14:25.240 | - The psychometrics is the subfield of psychology
01:14:28.000 | that tries to measure, quantify aspects of the human mind.
01:14:33.000 | So in particular, cognitive abilities, intelligence,
01:14:37.040 | and personality traits as well.
01:14:39.720 | - So, like what are, might be a weird question,
01:14:43.640 | but what are like the first principles
01:14:45.680 | of psychometrics that operates on, you know,
01:14:52.160 | what are the priors it brings to the table?
01:14:55.400 | - So it's a field with a fairly long history.
01:14:58.720 | It's, so, you know, psychology sometimes gets
01:15:03.840 | a bad reputation for not having very reproducible results.
01:15:08.840 | And so psychometrics has actually some
01:15:11.520 | fairly solidly reproducible results.
01:15:14.120 | So the ideal goals of the field is, you know,
01:15:17.640 | a test should be reliable,
01:15:20.000 | which is a notion tied to reproducibility.
01:15:23.160 | It should be valid, meaning that it should actually
01:15:26.560 | measure what you say it measures.
01:15:29.440 | So for instance, if you're saying
01:15:32.800 | that you're measuring intelligence,
01:15:34.160 | then your test results should be correlated
01:15:36.640 | with things that you expect to be correlated
01:15:39.160 | with intelligence, like success in school,
01:15:41.480 | or success in the workplace, and so on.
01:15:43.640 | Should be standardized, meaning that you can administer
01:15:47.440 | your tests to many different people in the same conditions.
01:15:50.520 | And it should be free from bias,
01:15:52.960 | meaning that, for instance, if your test involves
01:15:57.240 | the English language, then you have to be aware
01:15:59.720 | that this creates a bias against people
01:16:02.560 | who have English as their second language,
01:16:04.400 | or people who can't speak English at all.
01:16:07.280 | So of course, these principles
01:16:09.640 | for creating psychometric tests are very much an ideal.
01:16:13.520 | I don't think every psychometric test
01:16:15.520 | is really either reliable,
01:16:18.600 | valid, or free from bias.
01:16:22.160 | But at least the field is aware of these weaknesses,
01:16:25.800 | and is trying to address them.
01:16:27.480 | - So it's kind of interesting.
01:16:28.980 | Ultimately, you're only able to measure,
01:16:31.920 | like you said previously, the skill.
01:16:34.520 | But you're trying to do a bunch of measures
01:16:36.520 | of different skills that correlate,
01:16:38.960 | as you mentioned, strongly with some general concept
01:16:41.880 | of cognitive ability.
01:16:43.440 | - Yes, yes.
01:16:44.280 | - So what's the G factor?
01:16:46.640 | - So, right, there are many different kinds
01:16:48.240 | of tests of intelligence.
01:16:50.720 | And each of them is interested
01:16:53.960 | in different aspects of intelligence.
01:16:56.240 | Some of them will deal with language,
01:16:57.600 | some of them will deal with spatial vision,
01:17:01.000 | maybe mental rotations, numbers, and so on.
01:17:04.520 | When you run these very different tests at scale,
01:17:08.640 | what you start seeing is that there are clusters
01:17:11.680 | of correlations among test results.
01:17:14.160 | So for instance, if you look at homework at school,
01:17:19.360 | you will see that people who do well at math
01:17:21.840 | are also likely, statistically, to do well in physics.
01:17:25.600 | And what's more, people who do well at math
01:17:29.560 | and physics are also statistically likely
01:17:32.040 | to do well in things that sound completely unrelated,
01:17:35.640 | like writing an English essay, for instance.
01:17:38.520 | And so when you see clusters of correlations
01:17:41.600 | in statistical terms, you would explain them
01:17:46.200 | with a latent variable.
01:17:47.680 | And the latent variable that would, for instance,
01:17:49.400 | explain the relationship between being good at math
01:17:53.040 | and being good at physics would be cognitive ability.
01:17:56.200 | And the G factor is the latent variable
01:18:00.840 | that explains the fact that, for every test of intelligence
01:18:05.600 | that you can come up with, results on these tests
01:18:09.360 | end up being correlated.
01:18:10.480 | So there is some single, unique variable
01:18:16.240 | that explains these correlations, that's the G factor.
01:18:18.880 | So it's a statistical construct.
01:18:20.360 | It's not really something you can directly measure,
01:18:23.080 | for instance, in a person.
01:18:25.600 | - But it's there.
01:18:26.600 | - But it's there, it's there, it's there at scale.
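A minimal sketch of how such a latent variable shows up only at scale: simulate test scores where one shared ability drives part of every test, then read off the dominant shared component. PCA on standardized scores is used here as a crude stand-in for the factor-analytic methods psychometricians actually use; all the numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_tests = 1000, 6

# Simulate scores where a single latent ability ("g") contributes to
# every test, plus test-specific noise.
g = rng.standard_normal(n_people)
loadings = rng.uniform(0.5, 0.9, n_tests)
scores = np.outer(g, loadings) + 0.6 * rng.standard_normal((n_people, n_tests))

# Standardize and take the first principal component of the score matrix.
z = (scores - scores.mean(0)) / scores.std(0)
_, s, vt = np.linalg.svd(z, full_matrices=False)
first_pc = z @ vt[0]
variance_share = s[0] ** 2 / (s ** 2).sum()

print(f"first component explains {variance_share:.0%} of score variance")
print(f"correlation with the simulated latent g: "
      f"{abs(np.corrcoef(first_pc, g)[0, 1]):.2f}")
```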
01:18:28.720 | And that's also one thing I want to mention
01:18:31.960 | about psychometrics.
01:18:33.520 | Like, you know, when you talk about measuring intelligence
01:18:36.640 | in humans, for instance, some people get a little bit worried
01:18:40.080 | they will say, you know, that sounds dangerous,
01:18:41.960 | maybe that sounds potentially discriminatory and so on.
01:18:44.360 | And they're not wrong.
01:18:46.560 | And the thing is, so personally,
01:18:48.120 | I'm not interested in psychometrics
01:18:50.320 | as a way to characterize one individual person.
01:18:54.800 | Like if I get your psychometric personality assessments
01:18:59.800 | or your IQ, I don't think that actually tells me much
01:19:03.080 | about you as a person.
01:19:05.040 | I think psychometrics is most useful as a statistical tool.
01:19:10.040 | So it's most useful at scale.
01:19:12.560 | It's most useful when you start getting test results
01:19:15.480 | for a large number of people
01:19:17.480 | and you start cross correlating these test results
01:19:20.640 | because that gives you information about the structure
01:19:25.360 | of the human mind,
01:19:26.520 | in particular about the structure
01:19:28.440 | of human cognitive abilities.
01:19:29.840 | So at scale, psychometrics paints a certain picture
01:19:34.840 | of the human mind, and that's interesting.
01:19:37.240 | And that's what's relevant to AI,
01:19:39.000 | the structure of human cognitive abilities.
01:19:41.200 | - Yeah, it gives you an insight into,
01:19:42.880 | I mean, to me, I remember when I learned about g-factor,
01:19:45.920 | it seemed like it would be impossible for it to be real,
01:19:50.920 | even as a statistical variable.
01:19:55.560 | Like it felt kind of like astrology.
01:19:59.080 | Like it's like wishful thinking among psychologists.
01:20:02.080 | But the more I learned, I realized that there's some,
01:20:05.760 | I mean, I'm not sure what to make about human beings,
01:20:07.680 | the fact that the g-factor is a thing.
01:20:10.280 | That there's a commonality across all of the human species,
01:20:13.320 | that there does need to be a strong correlation
01:20:15.400 | between cognitive abilities.
01:20:17.160 | That's kind of fascinating.
01:20:18.600 | - Yeah, so human cognitive abilities have a structure,
01:20:22.840 | like the most mainstream theory of the structure
01:20:25.440 | of cognitive abilities is called CHC theory.
01:20:28.840 | So Cattell, Horn, Carroll,
01:20:30.720 | it's named after the three psychologists
01:20:32.800 | who contributed key pieces of it.
01:20:35.360 | And it describes cognitive abilities as a hierarchy
01:20:40.120 | with three levels.
01:20:41.080 | And at the top, you have the g-factor,
01:20:43.160 | then you have broad cognitive abilities,
01:20:46.160 | for instance, fluid intelligence, right?
01:20:48.680 | That encompass a broad set of possible kinds of tasks
01:20:53.680 | that are all related.
01:20:57.080 | And then you have narrow cognitive abilities
01:20:59.920 | at the last level, which is closer to task specific skill.
01:21:04.320 | And there are actually different theories
01:21:08.520 | of the structure of cognitive abilities
01:21:10.000 | that just emerged from different statistical analysis
01:21:12.240 | of IQ test results.
01:21:14.360 | But they all describe a hierarchy
01:21:17.040 | with a kind of g-factor at the top.
01:21:21.120 | And you're right that the g-factor is,
01:21:23.720 | it's not quite real in the sense
01:21:25.680 | that it's not something you can observe and measure,
01:21:29.000 | like your height, for instance.
01:21:30.320 | But it's really in the sense that you see it
01:21:33.920 | in a statistical analysis of the data, right?
01:21:37.680 | One thing I want to mention is that
01:21:39.240 | the fact that there is a g-factor
01:21:40.880 | does not really mean that human intelligence
01:21:43.120 | is general in a strong sense.
01:21:45.800 | Does not mean human intelligence
01:21:47.240 | can be applied to any problem at all.
01:21:50.360 | And that someone who has a high IQ
01:21:52.200 | is gonna be able to solve any problem at all.
01:21:54.200 | That's not quite what it means.
01:21:55.320 | I think one popular analogy to understand it
01:22:00.320 | is the sports analogy.
01:22:03.400 | If you consider the concept of physical fitness,
01:22:06.760 | it's a concept that's very similar to intelligence
01:22:09.240 | because it's a useful concept.
01:22:11.440 | It's something you can intuitively understand.
01:22:14.480 | Some people are fit, maybe like you.
01:22:17.680 | Some people are not as fit, maybe like me.
01:22:20.600 | - But none of us can fly.
01:22:21.960 | - Absolutely.
01:22:23.920 | - It's a constraint to a specific set of skills.
01:22:25.240 | - Even if you're very fit,
01:22:26.560 | that doesn't mean you can do anything at all
01:22:29.960 | in any environment.
01:22:31.200 | You obviously cannot fly.
01:22:32.400 | You cannot survive at the bottom of the ocean and so on.
01:22:36.000 | And if you were a scientist
01:22:37.920 | and you wanted to precisely define
01:22:40.760 | and measure physical fitness in humans,
01:22:43.440 | then you would come up with a battery of tests.
01:22:47.200 | Like you would have running 100 meter,
01:22:50.760 | playing soccer, playing table tennis, swimming, and so on.
01:22:54.200 | And if you run these tests over many different people,
01:22:58.440 | you would start seeing correlations in test results.
01:23:01.400 | For instance, people who are good at soccer
01:23:03.040 | are also good at sprinting.
01:23:05.640 | And you would explain these correlations
01:23:08.600 | with physical abilities
01:23:10.480 | that are strictly analogous to cognitive abilities.
01:23:14.040 | And then you would start also observing correlations
01:23:17.080 | between biological characteristics,
01:23:21.240 | like maybe lung volume is correlated
01:23:23.680 | with being a fast runner, for instance.
01:23:27.120 | And in the same way that there are neurophysical correlates
01:23:31.760 | of cognitive abilities.
01:23:34.040 | And at the top of the hierarchy of physical abilities
01:23:38.840 | that you would be able to observe,
01:23:39.960 | you would have a G factor, a physical G factor,
01:23:43.080 | which would map to physical fitness.
01:23:45.800 | And as you just said,
01:23:47.520 | that doesn't mean that people
01:23:49.320 | with high physical fitness can fly.
01:23:51.320 | It doesn't mean human morphology
01:23:53.560 | and human physiology is universal.
01:23:55.680 | It's actually super specialized.
01:23:57.880 | We can only do the things that we were evolved to do.
01:24:03.680 | Like we are not appropriate to...
01:24:05.960 | You could not exist on Venus or Mars
01:24:09.960 | or in the void of space or the bottom of the ocean.
01:24:12.480 | So that said, one thing that's really striking
01:24:15.360 | and remarkable is that our morphology
01:24:20.360 | generalizes far beyond the environments that we evolved for.
01:24:27.160 | Like in a way you could say we evolved
01:24:29.360 | to run after prey in the savannah, right?
01:24:32.920 | That's very much where our human morphology comes from.
01:24:36.880 | And that said, we can do a lot of things
01:24:40.720 | that are completely unrelated to that.
01:24:42.960 | We can climb mountains, we can swim across lakes,
01:24:47.240 | we can play table tennis.
01:24:49.000 | I mean, table tennis is very different
01:24:50.720 | from what we were evolved to do, right?
01:24:53.160 | So our morphology, our bodies,
01:24:55.360 | our sense of motor affordances
01:24:57.640 | are of a degree of generality
01:24:59.520 | that is absolutely remarkable, right?
01:25:02.280 | And I think cognition is very similar to that.
01:25:05.360 | Our cognitive abilities have a degree of generality
01:25:08.280 | that goes far beyond what the mind
01:25:10.480 | was initially supposed to do,
01:25:12.400 | which is why we can play music and write novels
01:25:15.280 | and go to Mars and do all kinds of crazy things.
01:25:18.640 | But it's not universal in the same way
01:25:20.840 | that human morphology and our body
01:25:23.400 | is not appropriate for actually most
01:25:26.200 | of the universe by volume.
01:25:27.800 | In the same way you could say that the human mind
01:25:29.680 | is not really appropriate for most of problem space,
01:25:32.680 | potential problem space by volume.
01:25:35.480 | So we have very strong cognitive biases, actually,
01:25:39.720 | that mean that there are certain types of problems
01:25:42.680 | that we handle very well,
01:25:43.640 | and certain types of problem
01:25:45.400 | that we are completely inadaptive for.
01:25:48.280 | So that's really how we'd interpret the G-factor.
01:25:52.440 | It's not a sign of strong generality.
01:25:56.800 | It's really just the broadest cognitive ability.
01:26:00.240 | But our abilities,
01:26:02.560 | whether we are talking about sensory motor abilities
01:26:05.200 | or cognitive abilities,
01:26:06.280 | they still remain very specialized
01:26:09.520 | in the human condition, right?
01:26:11.440 | - Within the constraints of the human cognition,
01:26:16.400 | they're general. (laughs)
01:26:18.400 | - Yes, absolutely.
01:26:19.560 | - But the constraints, as you're saying, are very limited.
01:26:21.640 | - What's, I think what's, yeah.
01:26:23.120 | - Limiting.
01:26:23.960 | So we evolved our cognition and our body,
01:26:27.040 | evolved in very specific environments.
01:26:29.480 | Because our environment was so variable,
01:26:31.840 | fast-changing, and so unpredictable,
01:26:34.600 | part of the constraints that drove our evolution
01:26:37.680 | is generality itself.
01:26:39.600 | So we were, in a way, evolved to be able to improvise
01:26:42.800 | in all kinds of physical or cognitive environments, right?
01:26:46.280 | - Yeah.
01:26:47.640 | - And for this reason,
01:26:49.200 | it turns out that the minds and bodies
01:26:53.080 | that we ended up with
01:26:55.120 | can be applied to much, much broader scope
01:26:58.120 | than what they were evolved for, right?
01:27:00.080 | And that's truly remarkable.
01:27:01.880 | And that goes, that's a degree of generalization
01:27:03.960 | that is far beyond anything you can see
01:27:06.600 | in artificial systems today, right?
01:27:08.680 | That said, it does not mean that human intelligence
01:27:14.560 | is anywhere universal.
01:27:16.400 | - Yeah, it's not general.
01:27:17.640 | You know, it's a kind of exciting topic for people,
01:27:21.160 | even outside of artificial intelligence, is IQ tests.
01:27:24.580 | I think it's Mensa, whatever.
01:27:29.200 | There's different degrees of difficulty for questions.
01:27:32.440 | We talked about this offline a little bit, too,
01:27:34.680 | about sort of difficult questions.
01:27:37.200 | You know, what makes a question on an IQ test
01:27:40.760 | more difficult or less difficult, do you think?
01:27:43.720 | - So the thing to keep in mind is that
01:27:46.120 | there's no such thing as a question
01:27:49.000 | that's intrinsically difficult.
01:27:51.600 | It has to be difficult with respect
01:27:53.880 | to the things you already know
01:27:55.720 | and the things you can already do, right?
01:27:58.520 | So in terms of an IQ test question,
01:28:02.760 | typically it would be structured, for instance,
01:28:05.960 | as a set of demonstration input and output pairs, right?
01:28:10.960 | And then you would be given a test input, a prompt,
01:28:15.440 | and you would need to recognize or produce
01:28:18.680 | the corresponding output.
01:28:20.360 | And in that narrow context,
01:28:23.120 | you could say a difficult question
01:28:25.200 | is a question where the input prompt
01:28:31.000 | is very surprising and unexpected,
01:28:34.560 | given the training examples.
01:28:36.520 | - Just even the nature of the patterns
01:28:38.320 | that you're observing in the input prompt.
01:28:40.160 | - For instance, let's say you have a rotation problem.
01:28:43.280 | You must rotate a shape by 90 degrees.
01:28:46.720 | If I give you two examples,
01:28:48.160 | and then I give you one prompt,
01:28:50.520 | which is actually one of the two training examples,
01:28:53.040 | then there is zero generalization difficulty for the task.
01:28:56.360 | It's actually a trivial task.
01:28:57.480 | You just recognize that it's one of the training examples
01:29:00.760 | and you probably use the same answer.
01:29:02.320 | Now, if it's a more complex shape,
01:29:05.560 | there is a little bit more generalization,
01:29:07.680 | but it remains that you are still doing the same thing
01:29:11.200 | at test time as you were being demonstrated
01:29:13.880 | at training time.
01:29:15.080 | A difficult task starts to require
01:29:17.680 | some amount of test time adaptation,
01:29:21.280 | some amount of improvisation.
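A small sketch of that rotation task in the demonstration-pairs format described above (toy grids, with numpy's rot90 standing in for the hidden rule). When the prompt is literally one of the training inputs, a pure lookup suffices; when it is an unseen shape, answering correctly requires having actually abstracted the rotation rule.

```python
import numpy as np

# Demonstration pairs: each input grid maps to itself rotated by 90 degrees.
inputs = [np.array([[1, 0], [0, 0]]), np.array([[0, 2], [2, 2]])]
train = [(x, np.rot90(x)) for x in inputs]

def solve(prompt: np.ndarray) -> np.ndarray:
    # Trivial case: the prompt is one of the demonstration inputs,
    # so pure lookup works -- zero generalization difficulty.
    for x, y in train:
        if np.array_equal(prompt, x):
            return y
    # Harder case: an unseen shape, so the solver must have actually
    # inferred the underlying rotation rule from the demonstrations.
    return np.rot90(prompt)

print(solve(inputs[0]))                          # lookup is enough
print(solve(np.array([[3, 3, 0], [0, 3, 0]])))   # requires the rule
```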
01:29:25.120 | So consider, I don't know,
01:29:28.800 | you're teaching a class on quantum physics or something.
01:29:31.680 | If you wanted to test the understanding
01:29:39.080 | that students have of the material,
01:29:41.480 | you would come up with an exam
01:29:45.720 | that's very different from anything they've seen
01:29:48.800 | like on the internet when they were cramming.
01:29:51.720 | On the other hand, if you wanted to make it easy,
01:29:54.840 | you would just give them something
01:29:56.320 | that's very similar to the mock exams
01:30:00.440 | that they've taken,
01:30:02.400 | something that's just a simple interpolation
01:30:04.640 | of questions that they've already seen.
01:30:07.360 | And so that would be an easy exam.
01:30:09.280 | It's very similar to what you've been trained on.
01:30:12.040 | And a difficult exam is one
01:30:13.440 | that really probes your understanding
01:30:15.480 | because it forces you to improvise.
01:30:19.000 | It forces you to do things that are different
01:30:22.680 | from what you were exposed to before.
01:30:24.720 | So that said, it doesn't mean that the exam
01:30:28.880 | that requires improvisation is intrinsically hard, right?
01:30:32.680 | Because maybe you're a quantum physics expert.
01:30:35.840 | So when you take the exam,
01:30:37.240 | this is actually stuff that despite being new
01:30:39.560 | to the students, it's not new to you, right?
01:30:42.880 | So it can only be difficult with respect
01:30:46.040 | to what the test taker already knows
01:30:49.440 | and with respect to the information
01:30:51.800 | that the test taker has about the task.
01:30:54.600 | So that's what I mean by controlling for priors,
01:30:57.920 | what you, the information you bring to the table.
01:30:59.920 | - And the experience.
01:31:00.760 | - And the experience, which is the training data.
01:31:02.560 | So in the case of the quantum physics exam,
01:31:05.560 | that would be all the course material itself
01:31:09.720 | and all the mock exams that students might have taken online.
01:31:12.920 | - Yeah, it's interesting 'cause I've also,
01:31:15.880 | I sent you an email and asked you,
01:31:18.520 | like I've been, just this curious question of,
01:31:22.480 | you know, what's a really hard IQ test question.
01:31:27.520 | And I've been talking to also people
01:31:30.600 | who have designed IQ tests.
01:31:32.560 | There's a few folks on the internet.
01:31:33.800 | It's like a thing.
01:31:34.640 | People are really curious about it.
01:31:36.200 | First of all, most of the IQ tests they designed,
01:31:39.480 | they like religiously protect against the correct answers.
01:31:44.480 | Like you can't find the correct answers anywhere.
01:31:48.400 | In fact, the question is ruined once you know,
01:31:50.660 | even like the approach you're supposed to take.
01:31:53.760 | So they're very--
01:31:54.600 | - That's because the approach is implicit in the training examples.
01:31:58.480 | So if you release the training examples, it's over.
01:32:00.980 | - Well--
01:32:02.800 | - Which is why in ARC, for instance,
01:32:05.040 | there is a test set that is private and no one has seen it.
01:32:09.200 | - No, for really tough IQ questions, it's not obvious.
01:32:13.640 | It's not because the ambiguity.
01:32:17.160 | Like it's, I mean, we'll have to look through them,
01:32:20.820 | but like some number sequences and so on,
01:32:22.880 | it's not completely clear.
01:32:25.080 | So like you can get a sense, but there's like some,
01:32:29.400 | you know, when you look at a number sequence, I don't know,
01:32:33.600 | like your Fibonacci number sequence,
01:32:37.680 | if you look at the first few numbers,
01:32:39.600 | that sequence could be completed in a lot of different ways.
01:32:43.000 | And, you know, some are, if you think deeply,
01:32:45.660 | are more correct than others.
01:32:46.920 | Like there's a kind of intuitive simplicity
01:32:51.320 | and elegance to the correct solution.
01:32:53.040 | - Yes, I am personally not a fan of ambiguity
01:32:56.440 | in test questions actually,
01:32:58.720 | but I think you can have difficulty
01:33:01.200 | without requiring ambiguity,
01:33:03.160 | simply by making the test require a lot of extrapolation
01:33:08.160 | over the training examples.
01:33:09.520 | - But the beautiful question is difficult,
01:33:13.400 | but gives away everything when you give the training example.
01:33:17.240 | - Basically, yes.
01:33:18.520 | Meaning that, so the tests I'm interested in creating
01:33:23.520 | are not necessarily difficult for humans
01:33:27.800 | because human intelligence is the benchmark.
01:33:31.600 | They're supposed to be difficult for machines
01:33:34.440 | in ways that are easy for humans.
01:33:36.320 | Like I think an ideal test of human and machine intelligence
01:33:40.880 | is a test that is actionable,
01:33:44.440 | that highlights the need for progress,
01:33:48.320 | and that highlights the direction
01:33:50.120 | in which you should be making progress.
01:33:51.560 | - I think we'll talk about the ARC challenge
01:33:54.400 | and the test you've constructed,
01:33:55.600 | and you have these elegant examples.
01:33:58.160 | I think that highlight,
01:33:59.400 | this is really easy for us humans,
01:34:01.840 | but it's really hard for machines.
01:34:04.600 | But on the designing an IQ test
01:34:08.440 | for IQs of higher than 160 and so on,
01:34:13.440 | you have to say, you have to take that
01:34:15.240 | and put it on steroids, right?
01:34:16.520 | You have to think like, what is hard for humans?
01:34:19.600 | And that's a fascinating exercise in itself, I think.
01:34:23.940 | And it was an interesting question
01:34:27.760 | of what it takes to create a really hard question
01:34:31.440 | for humans because you again have to do the same process
01:34:36.320 | as you mentioned, which is something basically
01:34:41.320 | where the experience that you have likely
01:34:45.960 | to have encountered throughout your whole life,
01:34:48.720 | even if you've prepared for IQ tests,
01:34:51.760 | which is a big challenge,
01:34:53.340 | that this will still be novel for you.
01:34:55.800 | - Yeah, I mean, novelty is a requirement.
01:34:57.880 | You should not be able to practice for the questions
01:35:02.120 | that you're gonna be tested on, that's important.
01:35:04.760 | Because otherwise what you're doing
01:35:06.680 | is not exhibiting intelligence.
01:35:08.200 | What you're doing is just retrieving
01:35:10.720 | what you've been exposed before.
01:35:12.440 | It's the same thing as a deep learning model.
01:35:14.560 | If you train a deep learning model
01:35:15.960 | on all the possible answers, then it will ace your test.
01:35:20.160 | In the same way that a stupid student
01:35:26.000 | can still ace the test if they cram for it,
01:35:30.200 | they memorize a hundred different possible mock exams
01:35:35.040 | and then they hope that the actual exam
01:35:37.200 | will be a very simple interpolation of the mock exams.
01:35:41.200 | And that student could just be a deep learning model
01:35:43.200 | at that point, but you can actually do that
01:35:45.920 | without any understanding of the material.
01:35:48.200 | And in fact, many students pass their exams
01:35:50.600 | in exactly this way.
01:35:52.000 | And if you want to avoid that,
01:35:53.160 | you need an exam that's unlike anything they've seen
01:35:56.640 | that really probes their understanding.
01:36:00.000 | - So how do we design an IQ test for machines?
01:36:05.000 | An intelligent test for machines?
01:36:08.000 | - All right, so in the paper,
01:36:09.640 | I outline a number of requirements
01:36:12.680 | that you expect of such a test.
01:36:14.920 | And in particular, we should start by acknowledging
01:36:19.720 | the priors that we expect to be required
01:36:23.400 | in order to perform the test.
01:36:25.400 | So we should be explicit about the priors, right?
01:36:28.200 | And if the goal is to compare machine intelligence
01:36:31.920 | and human intelligence,
01:36:32.840 | then we should assume human cognitive priors, right?
01:36:37.120 | And secondly, we should make sure that we are testing
01:36:42.120 | for skill acquisition ability,
01:36:44.960 | skill acquisition efficiency in particular,
01:36:46.840 | and not for skill itself,
01:36:48.720 | meaning that every task featured in your test
01:36:52.000 | should be novel and should not be something
01:36:54.560 | that you can anticipate.
01:36:56.120 | So for instance, it should not be possible
01:36:58.080 | to brute force the space of possible questions, right?
01:37:02.960 | To pre-generate every possible question and answer.
01:37:06.040 | So it should be tasks that cannot be anticipated,
01:37:10.760 | not just by the system itself,
01:37:12.560 | but by the creators of the system, right?
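A minimal sketch of how a single task in such a test can be represented so that these requirements are checkable: a handful of demonstration pairs the solver may study, plus test inputs whose expected outputs are withheld by the organizers. The structure below mirrors the publicly released ARC tasks; the specific grids are just an illustrative mirror-the-rows rule.

```python
# One task: demonstration pairs plus a held-out test pair. The hidden rule
# in this toy example is "mirror each row"; the test output is withheld.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 2, 0], [0, 2, 0]], "output": [[0, 2, 2], [0, 2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]]},  # expected output kept private
    ],
}
```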
01:37:16.040 | - Yeah, you know what's fascinating?
01:37:17.760 | I mean, one of my favorite aspects of the paper
01:37:20.920 | and the work you do with the ARC Challenge
01:37:22.960 | is the process of making priors explicit.
01:37:27.240 | Just even that act alone is a really powerful one
01:37:33.520 | of like, what are, it's a really powerful question
01:37:38.520 | to ask of us humans.
01:37:40.560 | What are the priors that we bring to the table?
01:37:42.920 | So the next step is like, once you have those priors,
01:37:46.960 | how do you use them to solve a novel task?
01:37:50.160 | But like, just even making the priors explicit
01:37:53.000 | is a really difficult and really powerful step.
01:37:56.200 | And that's like visually beautiful
01:37:59.040 | and conceptually, philosophically beautiful part
01:38:01.440 | of the work you did with,
01:38:03.280 | and I guess continue to do probably
01:38:06.480 | with the paper and the ARC Challenge.
01:38:08.560 | Can you talk about some of the priors
01:38:10.800 | that we're talking about here?
01:38:12.480 | - Yes, so a researcher that has done a lot of work
01:38:15.440 | on what exactly are the knowledge priors
01:38:19.480 | that are innate to humans is Elizabeth Spelke
01:38:24.480 | from Harvard.
01:38:25.640 | So she developed the core knowledge theory
01:38:30.640 | which outlines four different core knowledge systems.
01:38:35.640 | So systems of knowledge that we are basically
01:38:39.240 | either born with or that we are hardwired
01:38:43.720 | to acquire very early on in our development.
01:38:47.240 | And there's no strong distinction between the two.
01:38:52.080 | Like if you are primed to acquire
01:38:57.080 | a certain type of knowledge in just a few weeks,
01:39:01.280 | you might as well just be born with it.
01:39:03.560 | It's just part of who you are.
01:39:06.520 | And so there are four different core knowledge systems.
01:39:09.560 | Like the first one is the notion of objectness
01:39:13.520 | and basic physics.
01:39:16.400 | Like you recognize that something that moves
01:39:20.760 | coherently for instance, is an object.
01:39:23.280 | So we intuitively, naturally, innately divide the world
01:39:28.280 | into objects based on this notion of coherence,
01:39:31.360 | physical coherence.
01:39:32.840 | And in terms of elementary physics,
01:39:34.760 | there's the fact that objects can bump against each other
01:39:41.680 | and the fact that they can occlude each other.
01:39:44.520 | So these are things that we are essentially born with
01:39:48.320 | or at least that we are going to be acquiring
01:39:50.800 | extremely early because we're really hardwired
01:39:54.480 | to acquire them.
01:39:55.680 | - So a bunch of points, pixels that move together.
01:40:00.000 | - Are objects.
01:40:01.120 | - Are partly the same object.
01:40:02.880 | - Yes.
01:40:03.720 | - I mean, that like, I don't smoke weed,
01:40:08.800 | but if I did, that's something I could sit like all night
01:40:13.120 | and just like think about.
01:40:14.320 | I remember when I first, in your paper, just objectness.
01:40:16.720 | I wasn't self-aware, I guess, of that particular prior
01:40:21.720 | that that's such a fascinating prior that like--
01:40:27.680 | - That's the most basic one, but actually--
01:40:30.800 | - Objectness, just identity.
01:40:32.880 | Just, yeah, objectness.
01:40:34.440 | I mean, it's very basic, I suppose, but it's so fundamental.
01:40:39.080 | - It is fundamental to human cognition.
01:40:41.400 | - Yeah.
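The objectness prior has a direct computational analogue: segmenting a grid into objects by grouping cells that cohere, for instance by color and adjacency. A minimal sketch, assuming a grid is a list of lists of color integers with 0 as background; the function name and the 4-connectivity choice are illustrative assumptions, not part of any official ARC tooling.

```python
from collections import deque

def segment_objects(grid):
    """Group same-colored, 4-connected cells into 'objects' (a crude objectness prior).

    `grid` is a list of lists of integers (colors); 0 is treated as background.
    Returns a list of objects, each a list of (row, col) cells.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or (r, c) in seen:
                continue
            # Flood-fill from this cell to collect one coherent object.
            color, cells, queue = grid[r][c], [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen and grid[nr][nc] == color):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            objects.append(cells)
    return objects

# Two separate "objects" on a tiny grid: the pair of 1s and the pair of 2s.
demo = [[1, 1, 0],
        [0, 0, 2],
        [0, 0, 2]]
print(len(segment_objects(demo)))  # -> 2
```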
01:40:42.240 | - And the second prior that's also fundamental is agentness,
01:40:46.720 | which is not a real word, but so, agentness.
01:40:50.800 | The fact that some of these objects that you segment
01:40:55.240 | your environment into, some of these objects are agents.
01:40:59.000 | So what's an agent?
01:41:00.360 | It's basically it's an object that has goals.
01:41:04.480 | So for instance-- - That has what?
01:41:06.360 | - That has goals, that is capable of pursuing goals.
01:41:09.440 | So for instance, if you see two dots moving
01:41:13.000 | in a roughly synchronized fashion,
01:41:16.320 | you will intuitively infer that one of the dots
01:41:19.800 | is pursuing the other.
01:41:21.600 | So that one of the dots is, and one of the dots is an agent,
01:41:26.600 | and its goal is to avoid the other dot.
01:41:29.440 | And one of the dots, the other dot is also an agent,
01:41:32.760 | and its goal is to catch the first dot.
01:41:35.840 | Spelke has shown that babies, as young as three months,
01:41:40.560 | identify agentness and goal-directedness
01:41:45.240 | in their environment.
01:41:46.440 | Another prior is basic geometry and topology,
01:41:51.440 | like the notion of distance,
01:41:53.680 | the ability to navigate in your environment and so on.
01:41:57.640 | This is something that is fundamentally hardwired
01:42:01.400 | into our brain.
01:42:02.720 | It's in fact backed by very specific neural mechanisms,
01:42:07.080 | like for instance, grid cells and place cells.
01:42:10.800 | So it's something that's literally hard-coded
01:42:15.240 | at the neural level in our hippocampus.
01:42:19.920 | And the last prior would be the notion of numbers.
01:42:23.560 | Like numbers are not actually a cultural construct.
01:42:26.440 | We are intuitively, innately able to do some basic counting
01:42:31.440 | and to compare quantities.
01:42:34.960 | So it doesn't mean we can do arbitrary arithmetic.
01:42:37.560 | - Counting, the act of counting.
01:42:39.960 | - Counting, like counting one, two, three-ish,
01:42:42.320 | then maybe more than three.
01:42:44.560 | You can also compare quantities.
01:42:45.920 | If I give you three dots and five dots,
01:42:49.360 | you can tell the side with five dots has more dots.
01:42:53.280 | So this is actually an innate prior.
01:42:56.400 | So that said, the list may not be exhaustive.
01:43:00.560 | So Spelke is still pursuing
01:43:04.480 | the potential existence of new knowledge systems,
01:43:09.840 | for instance, knowledge systems
01:43:12.160 | that would deal with social relationships.
01:43:15.760 | - Yeah, yeah, I mean, and there could be-
01:43:19.160 | - Which is much less relevant
01:43:22.120 | to something like ARC or IQ test in general.
01:43:24.360 | - Right, there could be stuff that's,
01:43:27.560 | like you said, rotation, symmetry.
01:43:29.680 | It's really interesting.
01:43:31.080 | - It's very likely that there is,
01:43:33.320 | speaking about rotation, that there is in the brain,
01:43:37.240 | a hard-coded system that is capable of performing rotations.
01:43:40.920 | One famous experiment that people did in the,
01:43:45.840 | I don't remember when it was exactly, but in the '70s,
01:43:51.400 | was that people found that if you asked people,
01:43:54.360 | if you give them two different shapes,
01:43:57.560 | and one of the shapes is a rotated version
01:44:01.400 | of the first shape, and you ask them,
01:44:03.320 | is that shape a rotated version of the first shape or not?
01:44:06.760 | What you see is that the time it takes people to answer
01:44:11.160 | is linearly proportional, right, to the angle of rotation.
01:44:16.160 | So it's almost like you have somewhere in your brain,
01:44:19.640 | like a turntable with a fixed speed,
01:44:24.040 | and if you want to know if two objects
01:44:26.600 | are rotated versions of each other,
01:44:29.040 | you put the object on the turntable,
01:44:31.680 | you let it move around a little bit,
01:44:34.760 | and then you stop when you have a match.
01:44:37.600 | And that's really interesting.
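The turntable analogy can be made concrete: to decide whether one shape is a rotated version of another, keep applying quarter turns and stop when the shapes line up. A minimal sketch over small grids represented as lists of lists; the helper names are illustrative.

```python
def rotate90(grid):
    """Rotate a grid (list of lists) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def rotation_match(a, b):
    """Return the number of quarter turns that maps grid `a` onto grid `b`, or None.

    Mirrors the 'turntable' idea: keep turning until the shapes line up,
    so the answer grows with the angle of rotation, like the human response times above.
    """
    current = a
    for turns in range(4):
        if current == b:
            return turns
        current = rotate90(current)
    return None

shape = [[1, 0],
         [1, 1]]
print(rotation_match(shape, [[1, 1],
                             [1, 0]]))  # -> 1 (one quarter turn)
```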
01:44:40.160 | - So what's the ARC challenge?
01:44:42.760 | - So in the paper I outlined, all these principles,
01:44:47.360 | that a good test of machine intelligence
01:44:50.160 | and human intelligence should follow.
01:44:51.960 | And the ARC challenge is one attempt
01:44:55.320 | to embody as many of these principles as possible.
01:44:58.560 | So I don't think it's anywhere near a perfect attempt,
01:45:02.600 | it does not actually follow every principle,
01:45:06.080 | but it is what I was able to do given the constraints.
01:45:10.680 | So the format of ARC is very similar to classic IQ tests,
01:45:15.560 | in particular Raven's Progressive Matrices.
01:45:18.000 | - Raven's?
01:45:19.000 | - Yeah, Raven's Progressive Matrices.
01:45:20.560 | I mean, if you've done IQ tests in the past,
01:45:22.840 | you know what it is probably, or at least you've seen it,
01:45:25.200 | even if you don't know what it's called.
01:45:27.040 | And so you have a set of tasks, that's what they're called,
01:45:32.040 | and for each task you have training data,
01:45:37.080 | which is a set of input and output pairs.
01:45:40.280 | So an input or output pair is a grid of colors, basically,
01:45:45.520 | the size of the grid is variable.
01:45:51.480 | And you're given an input and you must transform it
01:45:56.160 | into the proper output, right?
01:45:59.120 | And so you're shown a few demonstrations of a task
01:46:02.800 | in the form of existing input/output pairs,
01:46:05.120 | and then you're given a new input,
01:46:06.960 | and you must produce the correct output.
01:46:12.680 | And the assumption in ARC is that
01:46:17.680 | every task should only require core knowledge priors,
01:46:25.440 | should not require any outside knowledge.
01:46:30.360 | So for instance, no language, no English, nothing like this,
01:46:35.360 | no concepts taken from our human experience,
01:46:41.520 | like trees, dogs, cats, and so on.
01:46:44.280 | So only reasoning tasks that are built
01:46:49.280 | on top of core knowledge priors.
01:46:52.080 | And some of the tasks are actually explicitly trying
01:46:56.560 | to probe specific forms of abstraction, right?
01:47:01.560 | Part of the reason why I wanted to create ARC
01:47:05.520 | is I'm a big believer in, you know,
01:47:11.120 | when you're faced with a problem as murky
01:47:16.120 | as understanding how to autonomously generate abstraction
01:47:20.960 | in a machine, you have to co-evolve the solution
01:47:25.440 | and the problem.
01:47:27.120 | And so part of the reason why I designed ARC
01:47:29.360 | was to clarify my ideas about the nature of abstraction,
01:47:33.720 | right?
01:47:34.720 | And some of the tasks are actually designed
01:47:36.680 | to probe bits of that theory.
01:47:39.920 | And there are things that turn out to be very easy
01:47:43.240 | for humans to perform, including young kids, right?
01:47:46.760 | But turn out to be near impossible for machines.
01:47:50.520 | - So what have you learned from the nature of abstraction
01:47:53.800 | from designing that?
01:47:57.000 | Can you clarify what you mean?
01:47:59.480 | One of the things you wanted to try to understand
01:48:02.320 | was this idea of abstraction.
01:48:06.040 | - Yes, so clarifying my own ideas about abstraction
01:48:10.360 | by forcing myself to produce tasks that would require
01:48:14.800 | the ability to produce that form of abstraction
01:48:18.120 | in order to solve them.
01:48:19.840 | - Got it.
01:48:20.920 | Okay, so, and by the way, just to, I mean,
01:48:23.080 | people should check out, I'll probably overlay
01:48:24.960 | if you're watching the video part,
01:48:26.360 | but the grid input output
01:48:29.120 | with the different colors on the grid, that's it.
01:48:34.040 | That's, I mean, it's a very simple world,
01:48:36.280 | but it's kind of beautiful.
01:48:37.480 | - It's very similar to classic IQ test.
01:48:39.560 | Like it's not very original in that sense.
01:48:41.600 | The main difference with IQ tests is that
01:48:44.280 | we make the priors explicit,
01:48:46.160 | which is not usually the case in IQ test.
01:48:48.600 | So you make it explicit that everything should only be built
01:48:51.720 | on top of core knowledge priors.
01:48:53.920 | I also think it's generally more diverse
01:48:57.800 | than IQ test in general.
01:48:59.280 | And it's, it perhaps requires a bit more manual work
01:49:03.840 | to produce solutions because you have to click around
01:49:06.640 | on a grid for a while.
01:49:08.480 | Sometimes the grids can be as large as 30 by 30 cells.
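For concreteness, here is a minimal sketch of what a task of this shape can look like and how a candidate program is checked against it. The JSON layout mirrors the publicly released ARC format, with a "train" list of demonstration input/output pairs and a "test" list; the toy `solve` transformation is purely illustrative.

```python
import json

# One task in the shape described above: a few demonstration pairs plus a held-out test pair.
# Grids are lists of lists of small integers (colors); sizes can vary from pair to pair.
task_json = """
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
"""

def solve(grid):
    """A toy candidate program: mirror each row left-to-right (consistent with the demos)."""
    return [row[::-1] for row in grid]

task = json.loads(task_json)
# Check the candidate program against the demonstrations, then against the test pair.
fits_demos = all(solve(pair["input"]) == pair["output"] for pair in task["train"])
solves_test = all(solve(pair["input"]) == pair["output"] for pair in task["test"])
print(fits_demos, solves_test)  # -> True True
```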
01:49:12.000 | - So how did you come up,
01:49:13.200 | if you can reveal with the questions,
01:49:18.000 | like what's the process of the questions?
01:49:19.560 | Was it mostly you?
01:49:20.880 | - Yes.
01:49:21.720 | - That came up with the questions?
01:49:22.680 | What, how difficult is it to come up with a question?
01:49:25.480 | Like, is this scalable to a much larger number?
01:49:30.480 | If we think, you know, with IQ tests,
01:49:32.320 | you might not necessarily want it to,
01:49:34.640 | or need it to be scalable.
01:49:36.480 | With machines, it's possible you could argue
01:49:40.040 | that it needs to be scalable.
01:49:41.640 | - So there are a thousand questions,
01:49:44.240 | a thousand tasks in total.
01:49:46.560 | - Wow.
01:49:47.400 | - Including the test set and the private test set.
01:49:49.160 | I think it's fairly difficult in the sense that
01:49:51.440 | a big requirement is that every task should be novel
01:49:56.200 | and unique and unpredictable, right?
01:50:00.000 | Like you don't want to create your own little world
01:50:04.240 | that is simple enough that it would be possible
01:50:08.160 | for a human to reverse-engineer it
01:50:11.080 | and write down an algorithm that could generate
01:50:14.840 | every possible ARC task and their solutions,
01:50:17.120 | for instance, that would completely invalidate the test.
01:50:20.200 | - So you're constantly coming up with new stuff.
01:50:21.400 | - You need, yeah, you need a source of novelty,
01:50:24.920 | of unthinkable novelty.
01:50:27.960 | And one thing I found is that as a human,
01:50:32.040 | you are not a very good source of unthinkable novelty.
01:50:36.520 | And so you have to pace the creation of these tasks
01:50:40.600 | quite a bit.
01:50:41.440 | There are only so many unique tasks
01:50:43.000 | that you can do in a given day.
01:50:44.560 | - So I mean, it's coming up with truly original new ideas.
01:50:48.600 | Did psychedelics help you at all?
01:50:52.400 | No, I'm just kidding.
01:50:53.800 | But I mean, that's fascinating to think about.
01:50:55.800 | So you would be like walking or something like that.
01:50:58.640 | Are you constantly thinking of something totally new?
01:51:02.960 | - Yes.
01:51:03.800 | (laughing)
01:51:05.760 | - I mean, this is hard.
01:51:07.040 | This is hard.
01:51:07.880 | - Yeah, I mean, I'm not saying I've done anywhere
01:51:10.960 | near a perfect job at it.
01:51:12.440 | There is some amount of redundancy
01:51:14.560 | and there are many imperfections in ARC.
01:51:16.800 | So that said, you should consider ARC as a work in progress.
01:51:19.840 | It is not in its definitive state;
01:51:24.800 | the ARC tasks today
01:51:26.600 | are not the definitive state of the test.
01:51:29.360 | I want to keep refining it in the future.
01:51:32.640 | I also think it should be possible
01:51:35.360 | to open up the creation of tasks to a broad audience
01:51:38.640 | to do crowdsourcing.
01:51:39.880 | That would involve several levels of filtering, obviously.
01:51:44.160 | But I think it's possible to apply crowdsourcing
01:51:46.240 | to develop a much bigger and much more diverse ARC data set.
01:51:51.120 | That would also potentially be free of
01:51:53.680 | some of my own personal biases.
01:51:56.480 | - So does there always need to be a part of ARC
01:51:59.240 | that's the test, like it's hidden?
01:52:02.960 | - Yes, absolutely.
01:52:04.200 | It is imperative that the test set
01:52:08.560 | that you're using to actually benchmark algorithms
01:52:11.960 | is not accessible to the people developing these algorithms.
01:52:15.280 | Because otherwise what's going to happen
01:52:16.680 | is that the human engineers
01:52:18.760 | are just going to solve the tasks themselves
01:52:21.800 | and code their solution in program form.
01:52:24.840 | But then again, what you're seeing here
01:52:27.400 | is the process of intelligence
01:52:29.680 | happening in the mind of the human.
01:52:31.120 | And then you're just capturing its crystallized output.
01:52:35.440 | But that crystallized output
01:52:37.160 | is not the same thing as the process that generated it.
01:52:40.040 | It's not intelligent in itself.
01:52:41.320 | - So what, by the way,
01:52:42.600 | the idea of crowdsourcing it is fascinating.
01:52:44.920 | I think the creation of questions
01:52:49.920 | is really exciting for people.
01:52:51.440 | I think there's a lot of really brilliant people out there
01:52:54.240 | that love to create these kinds of stuff.
01:52:56.200 | - Yeah, one thing that kind of surprised me
01:52:59.000 | that I wasn't expecting is that
01:53:00.800 | lots of people seem to actually enjoy ARC as a kind of game.
01:53:05.800 | And I was really seeing it as a test,
01:53:08.600 | as a benchmark of fluid general intelligence.
01:53:13.600 | And lots of people just, including kids,
01:53:17.040 | just start enjoying it as a game.
01:53:18.760 | So I think that's encouraging.
01:53:20.920 | - Yeah, I'm fascinated by it.
01:53:22.280 | There's a world of people who create IQ questions.
01:53:24.840 | I think that's a cool activity
01:53:30.840 | for machines and for humans.
01:53:32.560 | And humans are themselves fascinated
01:53:35.400 | by taking the questions,
01:53:37.680 | like measuring their own intelligence.
01:53:42.320 | I mean, that's just really compelling.
01:53:44.400 | It's really interesting to me too.
01:53:45.720 | It helps.
01:53:46.960 | One of the cool things about arc, you said,
01:53:48.680 | it's kind of inspired by IQ tests or whatever,
01:53:51.560 | follows a similar process.
01:53:53.400 | But because of its nature,
01:53:54.980 | because of the context in which it lives,
01:53:57.140 | it immediately forces you to think about
01:54:00.360 | the nature of intelligence
01:54:01.600 | as opposed to just the test of your own.
01:54:03.880 | Like it forces you to really think.
01:54:05.960 | I don't know if it's within the question,
01:54:09.840 | inherent in the question,
01:54:10.960 | or just the fact that it lives in the test
01:54:13.280 | that's supposed to be a test of machine intelligence.
01:54:15.360 | - Absolutely.
01:54:16.200 | As you solve ARC tasks as a human,
01:54:19.680 | you will be forced to basically introspect
01:54:24.640 | how you come up with solutions.
01:54:27.080 | And that forces you to reflect
01:54:29.040 | on the human problem-solving process
01:54:33.880 | and the way your own mind generates
01:54:37.080 | abstract representations of the problems it's exposed to.
01:54:44.560 | I think it's due to the fact that
01:54:47.560 | the set of core knowledge priors
01:54:50.200 | that ARC is built upon is so small.
01:54:52.560 | It's all a recombination of a very, very small set
01:54:57.560 | of assumptions.
01:55:00.520 | - Okay, so what's the future of arc?
01:55:02.920 | So you held arc as a challenge
01:55:05.080 | as part of like a Kaggle competition.
01:55:06.720 | - Yes.
01:55:07.560 | - Kaggle competition.
01:55:08.440 | And what do you think?
01:55:12.160 | Do you think this is something that continues
01:55:13.600 | for five years, 10 years, like just continues growing?
01:55:17.880 | - Yes, absolutely.
01:55:18.960 | So ARC itself will keep evolving.
01:55:21.360 | So I've talked about crowdsourcing.
01:55:22.800 | I think that's a good avenue.
01:55:25.920 | Another thing I'm starting is I'll be collaborating
01:55:30.080 | with folks from the psychology department at NYU
01:55:34.360 | to do human testing on ARC.
01:55:36.800 | And I think there are lots of interesting questions
01:55:39.000 | you can start asking,
01:55:39.840 | especially as you start correlating machine solutions
01:55:44.840 | to ARC tasks and the characteristics of human solutions.
01:55:50.080 | Like for instance, you can try to see
01:55:52.040 | if there's a relationship
01:55:53.600 | between the human perceived difficulty of a task and--
01:55:58.600 | - Machine perceived.
01:55:59.440 | - Yes, and exactly some measure
01:56:01.480 | of machine perceived difficulty.
01:56:02.800 | - Yeah, it's a nice playground
01:56:04.680 | in which to explore this very difference.
01:56:06.320 | It's the same thing as we talked about
01:56:07.680 | with autonomous vehicles.
01:56:09.280 | The things that could be difficult for humans
01:56:10.920 | might be very different than the things that--
01:56:12.240 | - Yes, absolutely.
01:56:13.080 | - And formalizing or making explicit
01:56:16.520 | that difference in difficulty may teach us something
01:56:20.440 | fundamental about intelligence.
01:56:22.280 | - So one thing I think we did well with ARC
01:56:25.040 | is that it's proving to be a very actionable test
01:56:31.400 | in the sense that machine performance
01:56:35.120 | on ARC started at very much zero initially,
01:56:39.260 | while humans actually found the tasks very easy.
01:56:43.320 | And that alone was like a big red flashing light
01:56:47.880 | saying that something is going on
01:56:49.800 | and that we are missing something.
01:56:52.360 | And at the same time,
01:56:54.560 | machine performance did not stay at zero for very long.
01:56:57.680 | Actually within two weeks of the Kaggle competition,
01:57:00.280 | we started having a non-zero number.
01:57:03.280 | And now the state of the art is around
01:57:05.640 | 20% of the test set solved.
01:57:08.940 | And so ARC is actually a challenge
01:57:12.500 | where our capabilities start at zero,
01:57:15.940 | which indicates the need for progress.
01:57:18.260 | But it's also not an impossible challenge.
01:57:20.580 | It's not inaccessible.
01:57:21.500 | You can start making progress basically right away.
01:57:25.380 | At the same time,
01:57:27.100 | we are still very far from having solved it.
01:57:29.540 | And that's actually a very positive outcome
01:57:32.820 | of the competition
01:57:33.660 | is that the competition has proven
01:57:35.940 | that there was no obvious shortcut to solve these tasks.
01:57:40.940 | - Yeah, so the test held up.
01:57:43.220 | - Yeah, exactly.
01:57:44.060 | And that was the primary reason for the Kaggle competition,
01:57:46.940 | is to check if some clever person
01:57:51.020 | was going to hack the benchmark.
01:57:54.580 | And that did not happen.
01:57:56.180 | Like the people who were solving the tasks
01:57:57.900 | are essentially doing it.
01:58:01.140 | Well, in a way,
01:58:02.820 | they're actually exploring some flaws of ARC
01:58:05.580 | that we will need to address in the future,
01:58:07.420 | especially they're essentially anticipating
01:58:09.900 | what sort of tasks may be contained in the test set.
01:58:13.860 | - Right, which is kind of,
01:58:16.700 | yeah, that's the kind of hacking.
01:58:18.540 | It's human hacking of the test.
01:58:20.220 | - Yes, that said,
01:58:21.380 | with the state of the art,
01:58:23.380 | it's like 20%, we're still very, very far from human level,
01:58:28.220 | which is closer to 100%.
01:58:31.020 | And I do believe that it will take a while
01:58:35.580 | until we reach human parity on ARC.
01:58:40.580 | And that by the time we have human parity,
01:58:43.540 | we will have AI systems
01:58:45.700 | that are probably pretty close to human level
01:58:48.500 | in terms of general fluid intelligence,
01:58:51.540 | which is, I mean,
01:58:52.860 | they're not gonna be necessarily human-like.
01:58:54.980 | They're not necessarily,
01:58:56.180 | you would not necessarily recognize them
01:58:59.580 | as being an AGI,
01:59:01.060 | but they would be capable of a degree of generalization
01:59:05.980 | that matches the generalization
01:59:09.380 | performed by human fluid intelligence.
01:59:11.380 | - Sure.
01:59:12.220 | I mean, this is a good point
01:59:13.060 | in terms of general fluid intelligence to mention.
01:59:17.100 | In your paper,
01:59:17.920 | you described different kinds of generalizations,
01:59:20.120 | local, broad, extreme,
01:59:23.580 | and there's a kind of a hierarchy that you form.
01:59:25.620 | So when we say generalizations,
01:59:29.500 | what are we talking about?
01:59:31.820 | What kinds are there?
01:59:33.180 | - Right.
01:59:34.020 | So generalization is a very old idea.
01:59:37.020 | I mean, it's even older than machine learning.
01:59:39.500 | In the context of machine learning,
01:59:40.940 | you say a system generalizes
01:59:43.220 | if it can make sense of an input it has not yet seen.
01:59:48.220 | And that's what I would call a system-centric generalization.
01:59:54.700 | It's generalization with respect to novelty
01:59:59.700 | for the specific system you're considering.
02:00:02.900 | So I think a good test of intelligence
02:00:05.020 | should actually deal with developer-aware generalization,
02:00:09.900 | which is slightly stronger than system-centric generalization.
02:00:13.500 | So developer-aware generalization would be
02:00:16.500 | the ability to generalize to novelty or uncertainty
02:00:21.420 | that not only the system itself has not had access to,
02:00:24.980 | but the developer of the system
02:00:26.660 | could not have access to either.
02:00:29.020 | - That's a fascinating meta-definition.
02:00:32.380 | So like the system,
02:00:33.820 | it's basically the edge case thing
02:00:37.660 | we're talking about with autonomous vehicles.
02:00:39.740 | Neither the developer nor the system
02:00:41.620 | know about the edge cases they encounter.
02:00:44.380 | So it's up to,
02:00:46.060 | the system should be able to generalize to the thing
02:00:47.980 | that nobody expected,
02:00:51.620 | neither the designer of the training data
02:00:53.820 | nor obviously the contents of the training data.
02:00:59.020 | That's a fascinating definition.
02:01:00.500 | - So you can see generalization,
02:01:01.860 | degrees of generalization as a spectrum.
02:01:04.500 | And the lowest level is what machine learning
02:01:08.020 | is trying to do,
02:01:09.420 | is the assumption that any new situation
02:01:13.620 | is gonna be sampled from a static distribution
02:01:17.020 | of possible situations.
02:01:18.340 | And that you already have a representative sample
02:01:21.500 | of the distribution, that's your training data.
02:01:23.900 | And so in machine learning,
02:01:24.780 | you generalize to a new sample from a known distribution.
02:01:28.820 | And the ways in which your new sample
02:01:31.100 | will be new or different
02:01:33.780 | are ways that are already understood
02:01:36.900 | by the developers of the system.
02:01:39.340 | So you are generalizing to known unknowns
02:01:43.020 | for one specific task.
02:01:45.100 | That's what you would call robustness.
02:01:47.540 | You are robust to things like noise,
02:01:49.260 | small variations and so on.
02:01:50.740 | For one fixed known distribution
02:01:56.620 | that you know through your training data.
02:01:59.340 | And a higher degree would be flexibility
02:02:04.340 | in machine intelligence.
02:02:06.380 | So flexibility would be something like
02:02:08.860 | an L5 self-driving car,
02:02:11.060 | or maybe a robot that can,
02:02:14.220 | pass the coffee cup test,
02:02:16.500 | which is the notion that you'd be given
02:02:19.460 | a random kitchen somewhere in the country
02:02:22.420 | and you would have to,
02:02:23.540 | go make a cup of coffee in that kitchen.
02:02:26.780 | So flexibility would be the ability
02:02:30.780 | to deal with unknown unknowns.
02:02:33.260 | So things that could not,
02:02:35.300 | dimensions of variability
02:02:36.620 | that could not have been possibly foreseen
02:02:39.380 | by the creators of the system
02:02:41.060 | within one specific task.
02:02:42.860 | So generalizing to the long tail of situations
02:02:46.220 | in self-driving for instance,
02:02:47.500 | would be flexibility.
02:02:48.420 | So you have robustness, flexibility,
02:02:51.060 | and finally you would have extreme generalization,
02:02:53.700 | which is basically flexibility,
02:02:56.660 | but instead of just considering one specific domain
02:03:01.140 | like driving or domestic robotics,
02:03:03.340 | you're considering an open-ended range
02:03:06.020 | of possible domains.
02:03:07.740 | So a robot would be capable of extreme generalization
02:03:12.580 | if let's say it's designed and trained
02:03:15.620 | for cooking for instance.
02:03:18.860 | And if I buy the robots
02:03:22.580 | and if I'm able,
02:03:24.180 | if it's able to teach itself gardening
02:03:27.620 | in a couple of weeks,
02:03:28.860 | it would be capable of extreme generalization for instance.
02:03:32.300 | - So the ultimate goal is extreme generalization.
02:03:34.380 | - Yes.
02:03:35.220 | So creating a system that is so general
02:03:39.020 | that it could essentially achieve human skill parity
02:03:43.660 | over arbitrary tasks and arbitrary domains
02:03:47.900 | with the same level of improvisation
02:03:50.900 | and adaptation power as humans
02:03:52.980 | when it encounters new situations.
02:03:55.540 | And it would do so over basically the same range
02:03:59.540 | of possible domains and tasks as humans
02:04:02.820 | and using essentially the same amount
02:04:05.020 | of training experience of practice as humans would require.
02:04:07.980 | That would be human level extreme generalization.
02:04:10.980 | So I don't actually think humans are anywhere near
02:04:15.500 | the optimal intelligence bound if there is such a thing.
02:04:20.500 | So I think-
02:04:22.260 | - For humans or in general?
02:04:23.900 | - In general.
02:04:25.220 | I think it's quite likely that there is
02:04:27.820 | a hard limit to how intelligent any system can be.
02:04:32.820 | But at the same time,
02:04:34.820 | I don't think humans are anywhere near that limit.
02:04:38.260 | - Yeah, last time I think we talked,
02:04:40.820 | I think you had this idea that we're only as intelligent
02:04:44.620 | as the problems we face.
02:04:46.620 | Sort of-
02:04:47.460 | - Yes, intelligence-
02:04:49.460 | - We are upper bounded by the problem.
02:04:51.020 | - In a way, yes.
02:04:51.980 | We are bounded by our environments
02:04:55.180 | and we are bounded by the problems we try to solve.
02:04:58.220 | - Yeah, yeah.
02:04:59.740 | What do you make of Neuralink
02:05:01.100 | and outsourcing some of the brain power,
02:05:05.500 | like brain computer interfaces?
02:05:07.180 | Do you think we can expand our,
02:05:10.780 | augment our intelligence?
02:05:13.540 | - I am fairly skeptical of neural interfaces
02:05:18.340 | because they are trying to fix one specific bottleneck
02:05:23.340 | in human machine cognition,
02:05:23.340 | which is the bandwidth bottleneck,
02:05:28.820 | input and output of information in the brain.
02:05:31.900 | And my perception of the problem is that
02:05:36.380 | bandwidth is not at this time a bottleneck at all,
02:05:40.260 | meaning that we already have senses that enable us
02:05:44.420 | to take in far more information
02:05:47.900 | than what we can actually process.
02:05:50.460 | - Well, to push back on that a little bit,
02:05:53.260 | to sort of play devil's advocate a little bit,
02:05:55.460 | is if you look at the internet, Wikipedia,
02:05:57.540 | let's say Wikipedia,
02:05:59.020 | I would say that humans,
02:06:00.900 | after the advent of Wikipedia,
02:06:03.340 | are much more intelligent.
02:06:05.900 | - Yes, I think that's a good one,
02:06:07.820 | but that's also not about,
02:06:09.940 | that's about externalizing our intelligence
02:06:14.940 | via information processing systems,
02:06:18.340 | external processing systems,
02:06:19.620 | which is very different from brain computer interfaces.
02:06:23.820 | - Right, but the question is whether
02:06:26.780 | if we have direct access,
02:06:28.380 | if our brain has direct access to Wikipedia without-
02:06:31.940 | - Your brain already has direct access to Wikipedia.
02:06:34.540 | It's on your phone,
02:06:35.940 | and you have your hands and your eyes
02:06:38.460 | and your ears and so on to access that information.
02:06:42.140 | And the speed at which you can access it-
02:06:44.380 | - Is bottlenecked by the cognition.
02:06:45.660 | - I think it's already close,
02:06:48.140 | fairly close to optimal,
02:06:49.580 | which is why speed reading, for instance, does not work.
02:06:53.340 | The faster you read, the less you understand.
02:06:56.020 | - But maybe it's 'cause it uses the eyes.
02:06:58.420 | So maybe-
02:06:59.420 | - So I don't believe so.
02:07:01.420 | I think the brain is very slow.
02:07:04.260 | It typically operates,
02:07:06.340 | the fastest things that happen in the brain
02:07:08.580 | are at the level of 50 milliseconds.
02:07:10.500 | Forming a conscious thought
02:07:13.820 | can potentially take entire seconds, right?
02:07:16.660 | And you can already read pretty fast.
02:07:19.140 | So I think the speed at which you can take information in,
02:07:23.500 | and even the speed at which you can output information
02:07:26.500 | can only be very incrementally improved.
02:07:29.980 | - Maybe there's a-
02:07:30.820 | - I think that if you're a very, very fast typer,
02:07:32.740 | if you're a very trained typer,
02:07:34.460 | the speed at which you can express your thoughts
02:07:36.740 | is already the speed at which you can form your thoughts.
02:07:40.580 | - Right, so that's kind of an idea
02:07:42.100 | that there are fundamental bottlenecks to the human mind.
02:07:47.060 | But it's possible that everything we have in the human mind
02:07:51.620 | is just to be able to survive in the environment.
02:07:54.460 | And there's a lot more to expand.
02:07:58.340 | Maybe, you said, the speed of thought.
02:08:02.460 | - So I think augmenting human intelligence
02:08:06.820 | is a very valid and very powerful avenue, right?
02:08:09.940 | And that's what computers are about.
02:08:12.300 | In fact, that's what all of culture
02:08:15.020 | and civilization is about.
02:08:16.500 | They are, culture is externalised cognition,
02:08:20.660 | and we rely on culture to think constantly.
02:08:23.780 | - Yeah, I mean, that's another way, yeah.
02:08:26.700 | - Not just computers, not just phones and the internet.
02:08:29.900 | All of culture, like language, for instance,
02:08:32.500 | is a form of externalised cognition.
02:08:34.060 | Books are obviously externalised cognition.
02:08:37.540 | - Yeah, that's a great point.
02:08:38.380 | - And you can scale that externalised cognition
02:08:42.060 | far beyond the capability of the human brain.
02:08:45.220 | And you could see, civilization itself,
02:08:48.700 | it has capabilities that are far beyond
02:08:52.900 | any individual brain.
02:08:54.380 | And we'll keep scaling it,
02:08:55.340 | because it's not bound by individual brains.
02:08:59.220 | It's a different kind of system.
02:09:01.380 | - Yeah, and that system includes non-humans.
02:09:05.380 | First of all, it includes all the other biological systems,
02:09:08.700 | which are probably contributing
02:09:10.300 | to the overall intelligence of the organism.
02:09:12.980 | And then computers are part of it.
02:09:14.740 | - Non-human systems are probably not contributing much,
02:09:16.900 | but AIs are definitely contributing to that.
02:09:19.780 | Like Google Search, for instance, is a big part of it.
02:09:22.500 | - Yeah, yeah.
02:09:27.340 | A huge part.
02:09:28.380 | A part that we can't probably introspect.
02:09:31.180 | Like how the world has changed in the past 20 years,
02:09:33.820 | it's probably very difficult for us
02:09:35.260 | to be able to understand until...
02:09:38.260 | Of course, whoever created the simulation we're in
02:09:40.700 | is probably doing metrics, measuring the progress.
02:09:44.940 | There was probably a big spike in performance.
02:09:47.380 | They're enjoying this.
02:09:50.500 | So what are your thoughts on the Turing Test
02:09:58.460 | and the Loebner Prize, which is the...
02:09:58.460 | One of the most famous attempts
02:10:02.340 | at the test of human intelligence,
02:10:04.420 | sorry, of artificial intelligence,
02:10:07.140 | by doing a natural language open dialogue test
02:10:11.780 | that's judged by humans as far as how well the machine did.
02:10:16.780 | - So I'm not a fan of the Turing Test itself
02:10:22.100 | or any of its variants for two reasons.
02:10:25.900 | So first of all,
02:10:26.980 | it's really copping out of trying to define
02:10:35.820 | and measure intelligence
02:10:37.700 | because it's entirely outsourcing that
02:10:40.700 | to a panel of human judges.
02:10:43.420 | And these human judges,
02:10:44.860 | they may not themselves have any proper methodology.
02:10:49.740 | They may not themselves have any proper definition
02:10:52.700 | of intelligence.
02:10:53.660 | They may not be reliable.
02:10:54.780 | So the Turing Test is already failing
02:10:57.340 | one of the core psychometrics principles,
02:10:59.420 | which is reliability, because you have biased human judges.
02:11:04.420 | It's also violating the standardization requirement
02:11:08.020 | and the freedom from bias requirement.
02:11:10.260 | And so it's really a cop-out
02:11:12.140 | because you are outsourcing everything that matters,
02:11:15.220 | which is precisely describing intelligence
02:11:18.580 | and finding a standardized test to measure it.
02:11:22.220 | You're outsourcing everything to people.
02:11:25.340 | So it's really a cop-out.
02:11:26.420 | And by the way, we should keep in mind
02:11:28.900 | that when Turing proposed the imitation game,
02:11:33.820 | he was not meaning for the imitation game
02:11:36.820 | to be an actual goal for the field of AI,
02:11:40.740 | an actual test of intelligence.
02:11:42.540 | He was using the imitation game as a thought experiment
02:11:48.900 | in a philosophical discussion in his 1950 paper.
02:11:53.660 | He was trying to argue that theoretically
02:11:58.660 | it should be possible for something very much
02:12:03.260 | like the human mind, indistinguishable from the human mind
02:12:06.140 | to be encoded in a Turing machine.
02:12:08.180 | And at the time that was a very daring idea.
02:12:13.180 | It was stretching credulity.
02:12:16.660 | But nowadays I think it's fairly well accepted
02:12:20.220 | that the mind is an information processing system
02:12:22.740 | and that you could probably encode it into a computer.
02:12:25.500 | So another reason why I'm not a fan of this type of test
02:12:29.460 | is that the incentives that it creates
02:12:34.340 | are incentives that are not conducive
02:12:37.500 | to proper scientific research.
02:12:40.900 | If your goal is to trick,
02:12:43.620 | to convince a panel of human judges
02:12:46.700 | that they're talking to a human,
02:12:48.540 | then you have an incentive to rely on tricks
02:12:53.540 | and prestidigitation.
02:12:55.620 | In the same way that let's say you're doing physics
02:12:59.340 | and you want to solve teleportation.
02:13:01.660 | And what if the test that you set out to pass
02:13:04.740 | is you need to convince a panel of judges
02:13:07.580 | that teleportation took place
02:13:09.660 | and they're just sitting there
02:13:10.980 | and watching what you're doing.
02:13:12.740 | And that is something that you can achieve with,
02:13:16.900 | you know, David Copperfield could achieve it
02:13:19.340 | in his show at Vegas, right?
02:13:21.620 | But what he's doing is very elaborate,
02:13:25.380 | but it's not actually, it's not physics.
02:13:29.260 | It's not making any progress
02:13:30.940 | in our understanding of the universe, right?
02:13:32.700 | - To push back on that, it's possible,
02:13:34.860 | that's the hope with these kinds of subjective evaluations,
02:13:39.100 | is that it's easier to solve it generally
02:13:42.060 | than it is to come up with tricks
02:13:44.140 | that convince a large number of judges.
02:13:46.660 | That's the hope.
02:13:47.500 | - In practice, it turns out
02:13:48.940 | that it's very easy to deceive people
02:13:50.860 | in the same way that, you know,
02:13:51.980 | you can do magic in Vegas.
02:13:54.420 | You can actually very easily convince people
02:13:57.380 | that they're talking to a human
02:13:58.660 | when they're actually talking to an algorithm.
02:14:00.460 | - I just disagree.
02:14:01.820 | I disagree with that.
02:14:02.660 | I think it's easy.
02:14:03.700 | I would push, it's not easy.
02:14:06.020 | It's doable.
02:14:08.380 | - It's very easy because-
02:14:09.500 | - I wouldn't say it's very easy though.
02:14:10.820 | - We are biased.
02:14:12.100 | Like we have theory of mind.
02:14:13.980 | We are constantly projecting emotions, intentions,
02:14:18.180 | agent-ness.
02:14:21.100 | Agent-ness is one of our core innate priors, right?
02:14:24.300 | We are projecting these things on everything around us.
02:14:26.860 | Like if you paint a smiley on a rock,
02:14:31.300 | the rock becomes happy in our eyes.
02:14:33.500 | And because we have this extreme bias
02:14:36.420 | that permeates everything we see around us,
02:14:39.820 | it's actually pretty easy to trick people.
02:14:41.860 | Like it is easy to trick people.
02:14:44.340 | - I so totally disagree with that.
02:14:45.900 | You brilliantly put, there's a huge,
02:14:47.940 | the anthropomorphization that we naturally do,
02:14:51.500 | the agent-ness of that word.
02:14:53.420 | Is that a real word?
02:14:54.260 | - No, it's not a real word.
02:14:55.540 | - I like it.
02:14:56.380 | - But it's a good word.
02:14:57.220 | It's a useful word.
02:14:58.060 | - It's a useful word.
02:14:58.900 | Let's make it real.
02:14:59.740 | It's a huge help.
02:15:01.060 | But I still think it's really difficult to convince.
02:15:03.660 | If you do like the Alexa prize formulation,
02:15:07.980 | where you talk for an hour,
02:15:10.060 | like there's formulations of the test you can create
02:15:12.460 | where it's very difficult.
02:15:13.820 | - So I like the Alexa prize better
02:15:16.260 | because it's more pragmatic.
02:15:18.060 | It's more practical.
02:15:19.580 | It's actually incentivizing developers
02:15:22.060 | to create something that's useful
02:15:24.180 | as a human machine interface.
02:15:29.180 | So that's slightly better than just the imitation game.
02:15:31.740 | - So I like it.
02:15:32.580 | Your idea is like a test,
02:15:35.740 | which hopefully will help us
02:15:37.580 | in creating intelligent systems as a result.
02:15:39.620 | Like if you create a system that passes it,
02:15:41.780 | it'll be useful for creating further intelligent systems.
02:15:44.780 | - Yes, at least.
02:15:46.140 | - Yeah.
02:15:46.980 | I mean, just to kind of comment,
02:15:49.700 | I'm a little bit surprised
02:15:51.740 | how little inspiration people draw
02:15:54.820 | from the Turing test today.
02:15:56.140 | The media and the popular press
02:15:58.700 | might write about it every once in a while.
02:16:00.900 | The philosophers might talk about it.
02:16:03.540 | But like most engineers are not really inspired by it.
02:16:07.100 | And I know you don't like the Turing test,
02:16:11.380 | but we'll have this argument another time.
02:16:13.780 | There's something inspiring about it, I think.
02:16:17.700 | - As a philosophical device in a philosophical discussion,
02:16:21.740 | I think there is something very interesting about it.
02:16:23.780 | I don't think it is in practical terms.
02:16:26.180 | I don't think it's conducive to progress.
02:16:29.100 | And one of the reasons why is that,
02:16:32.380 | I think being very human-like,
02:16:34.980 | being indistinguishable from a human
02:16:36.940 | is actually the very last step
02:16:39.340 | in the creation of machine intelligence.
02:16:40.980 | That the first AI that will show strong generalization
02:16:45.540 | that will actually implement
02:16:50.860 | human-like broad cognitive abilities,
02:16:53.100 | they will not actually behave or look anything like humans.
02:16:57.260 | Human likeness is the very last step in that process.
02:17:01.620 | And so a good test is a test that points you
02:17:04.380 | towards the first step on the ladder,
02:17:07.020 | not towards the top of the ladder.
02:17:08.860 | - So to push back on that,
02:17:10.380 | so I guess I usually agree with you on most things.
02:17:13.460 | I remember you, I think at some point tweeting
02:17:15.620 | something about the Turing test
02:17:17.100 | being counterproductive or something like that.
02:17:20.260 | And I think a lot of very smart people agree with that.
02:17:23.020 | Computationally speaking, not a very smart person.
02:17:31.500 | I disagree with that 'cause I think there's some magic
02:17:33.820 | to the interactivity with other humans.
02:17:36.940 | So to play devil's advocate on your statement,
02:17:39.660 | it's possible that in order to demonstrate
02:17:42.820 | the generalization abilities of a system,
02:17:45.580 | you have to in conversation show your ability
02:17:50.580 | to adjust, adapt to the conversation,
02:17:55.460 | through not just like as a standalone system,
02:17:58.460 | but through the process of like the interaction,
02:18:01.380 | the game-theoretic aspect, where you really are changing
02:18:06.380 | the environment by your actions.
02:18:09.100 | So in the ARC challenge, for example, you're an observer.
02:18:12.780 | You can't scare the test into changing.
02:18:17.460 | You can't talk to the test.
02:18:19.420 | You can't play with it.
02:18:21.140 | So there's some aspect of that interactivity
02:18:24.300 | that becomes highly subjective,
02:18:25.780 | but it feels like it could be conducive
02:18:28.340 | to generalizability. - Yeah, I think you make
02:18:30.300 | a great point.
02:18:31.140 | The interactivity is a very good setting
02:18:33.580 | to force a system to show adaptation,
02:18:36.020 | to show generalization.
02:18:37.220 | That said, at the same time,
02:18:42.340 | it's not something very scalable
02:18:43.980 | because you rely on human judges.
02:18:46.100 | It's not something reliable
02:18:47.420 | because the human judges may not.
02:18:49.460 | - So you don't like human judges.
02:18:50.980 | - Basically, yes.
02:18:51.820 | And I think, so I love the idea of interactivity.
02:18:55.180 | I initially wanted an ARC test
02:18:59.180 | that had some amount of interactivity
02:19:01.380 | where your score on a task would not be one or zero,
02:19:04.300 | if you can solve it or not,
02:19:05.380 | but would be the number of attempts
02:19:10.380 | that you can make before you hit the right solution,
02:19:14.140 | which means that now you can start applying
02:19:16.900 | the scientific method as you solve ARC tasks,
02:19:19.860 | that you can start formulating hypotheses
02:19:22.300 | and probing the system to see whether the hypothesis,
02:19:26.540 | the observation will match the hypothesis or not.
02:19:28.660 | - It would be amazing if you could also,
02:19:30.660 | even higher level than that,
02:19:32.620 | measure the quality of your attempts,
02:19:35.580 | which of course is impossible,
02:19:36.620 | but again, that gets subjective.
02:19:38.460 | How good was your thinking?
02:19:40.060 | - Yeah, how efficient was,
02:19:43.860 | so one thing that's interesting about this notion
02:19:46.780 | of scoring you as how many attempts you need
02:19:49.700 | is that you can start producing tasks
02:19:51.860 | that are way more ambiguous, right?
02:19:54.220 | - Right.
02:19:56.460 | - Because-- - Exactly, so you can--
02:19:57.580 | - With the different attempts,
02:19:59.660 | you can actually probe that ambiguity, right?
02:20:03.340 | - Right, so that's in a sense,
02:20:05.700 | which is how good can you adapt to the uncertainty
02:20:10.700 | and reduce the uncertainty?
02:20:15.700 | - Yes, it's how fast is the efficiency
02:20:19.300 | with which you reduce uncertainty in problem space, exactly.
02:20:23.020 | - Very difficult to come up with that kind of test though.
02:20:24.940 | - Yeah, so I would love to be able
02:20:27.300 | to create something like this.
02:20:28.340 | In practice, it would be very, very difficult, but yes.
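A minimal sketch of the attempt-based scoring idea described above: instead of a binary pass/fail, count how many guesses a solver needs before it hits the correct output, up to some cap. The function name and the cap are illustrative assumptions, not part of the actual ARC evaluation.

```python
def attempts_to_solve(candidate_outputs, correct_output, max_attempts=10):
    """Return how many guesses were needed to hit `correct_output`, or None if capped out.

    `candidate_outputs` is an iterable of output grids in the order the solver proposes
    them, so a lower number means the solver reduced its uncertainty faster.
    """
    for attempt, guess in enumerate(candidate_outputs, start=1):
        if attempt > max_attempts:
            return None
        if guess == correct_output:
            return attempt
    return None

# A solver that proposes three candidate outputs, best hypothesis first.
guesses = [[[1, 2]], [[2, 1]], [[1, 1]]]
print(attempts_to_solve(guesses, [[2, 1]]))  # -> 2 (second attempt was correct)
```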
02:20:31.940 | - But I mean, what you're doing,
02:20:34.300 | what you've done with the ARC challenge is brilliant.
02:20:37.540 | I'm also not, I'm surprised that it's not more popular,
02:20:40.940 | but I think it's picking up--
02:20:41.980 | - It has its niche, it has its niche, yeah.
02:20:44.020 | - Yeah, what are your thoughts about another test
02:20:46.740 | that I talked with Marcus Hutter?
02:20:48.860 | He has the Hutter Prize for compression of human knowledge,
02:20:51.620 | and the idea is really sort of quantify,
02:20:55.100 | like reduce the test of intelligence purely
02:20:57.300 | to just the ability to compress.
02:20:59.620 | What's your thoughts about this intelligence as compression?
02:21:04.620 | - I mean, it's a very fun test
02:21:07.140 | because it's such a simple idea.
02:21:09.260 | Like you're given Wikipedia, basic English Wikipedia,
02:21:13.860 | and you must compress it.
02:21:15.540 | And so it stems from the idea that cognition is compression,
02:21:21.180 | that the brain is basically a compression algorithm.
02:21:24.020 | This is a very old idea.
02:21:25.660 | It's a very, I think, striking and beautiful idea.
02:21:30.620 | I used to believe it.
02:21:31.900 | I eventually had to realize that it was
02:21:35.740 | very much a flawed idea.
02:21:36.900 | So I no longer believe that cognition is compression.
02:21:41.460 | But I can tell you what's the difference.
02:21:44.540 | So it's very easy to believe that cognition and compression
02:21:48.780 | are the same thing because,
02:21:50.380 | so Jeff Hawkins, for instance,
02:21:52.900 | says that cognition is prediction.
02:21:54.780 | And of course, prediction is basically
02:21:57.180 | the same thing as compression, right?
02:21:58.700 | It's just including the temporal axis.
02:22:02.300 | And it's very easy to believe this
02:22:05.100 | because compression is something that we do all the time,
02:22:07.980 | very naturally.
02:22:09.060 | We are constantly compressing information.
02:22:12.020 | We are constantly trying,
02:22:15.660 | we have this bias towards simplicity.
02:22:17.940 | We're constantly trying to organize things in our mind
02:22:21.100 | and around us to be more regular, right?
02:22:24.540 | So it's a beautiful idea.
02:22:26.900 | It's very easy to believe.
02:22:28.700 | There is a big difference between what we do
02:22:31.900 | with our brains and compression.
02:22:34.020 | So compression is actually kind of a tool
02:22:38.340 | in the human cognitive toolkits
02:22:40.140 | that is used in many ways, but it's just a tool.
02:22:43.540 | It is not, it is a tool for cognition.
02:22:46.020 | It is not cognition itself.
02:22:47.740 | And the big fundamental difference
02:22:50.140 | is that cognition is about being able
02:22:54.420 | to operate in future situations
02:22:57.660 | that include fundamental uncertainty and novelty.
02:23:02.180 | So for instance, consider a child at age 10.
02:23:06.980 | And so they have 10 years of life experience.
02:23:10.180 | They've gotten pain, pleasure, rewards,
02:23:12.860 | and punishment in a period of time.
02:23:16.620 | If you were to generate the shortest behavioral program
02:23:21.620 | that would have basically run that child
02:23:25.340 | over those 10 years in an optimal way, right?
02:23:29.380 | The shortest optimal behavioral program
02:23:32.300 | given the experience of that child so far.
02:23:34.940 | Well, that program, that compressed program,
02:23:37.620 | this is what you would get if the mind of the child
02:23:40.020 | was a compression algorithm essentially,
02:23:42.860 | would be utterly unable, inappropriate
02:23:47.860 | to process the next 70 years in the life of that child.
02:23:53.220 | So in the models we build of the world,
02:23:59.100 | we are not trying to make them actually optimally compressed.
02:24:03.300 | We are using compression as a tool
02:24:06.740 | to promote simplicity and efficiency in our models,
02:24:10.140 | but they are not perfectly compressed
02:24:12.140 | because they need to include things
02:24:15.380 | that are seemingly useless today,
02:24:17.700 | that have seemingly been useless so far,
02:24:20.220 | but that may turn out to be useful in the future
02:24:24.180 | because you just don't know the future.
02:24:25.900 | And that's the fundamental principle
02:24:28.780 | that cognition, that intelligence arises from,
02:24:31.260 | is that you need to be able to run
02:24:33.820 | appropriate behavioral programs,
02:24:35.540 | except you have absolutely no idea
02:24:37.980 | what sort of context, environment, and situation
02:24:40.980 | they're going to be running in.
02:24:42.300 | And you have to deal with that uncertainty,
02:23:46.580 | with that future novelty.
02:24:46.580 | So an analogy that you can make is with investing,
02:24:51.580 | for instance, if I look at the past, you know,
02:24:57.260 | 20 years of stock market data
02:24:59.580 | and I use a compression algorithm
02:25:01.860 | to figure out the best trading strategy,
02:25:04.380 | it's going to be, you know, you buy Apple stock,
02:25:06.460 | then maybe the past few years
02:25:07.500 | you buy Tesla stock or something.
02:25:10.420 | But is that strategy still going to be true
02:25:13.500 | for the next 20 years?
02:25:14.660 | Well, actually, probably not,
02:25:17.700 | which is why if you're a smart investor,
02:25:21.100 | you're not just going to be following the strategy
02:25:25.460 | that corresponds to compression of the past.
02:25:28.980 | You're going to be following,
02:25:30.420 | you're going to have a balanced portfolio, right?
02:25:34.860 | Because you just don't know what's going to happen.
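As a toy version of the compression benchmark discussed at the start of this answer, one can measure how many bits per character a general-purpose compressor needs on a piece of text; the Hutter Prize essentially asks you to push that number as low as possible on a fixed slice of Wikipedia. A minimal sketch, assuming zlib as a crude stand-in for a real predictive compressor.

```python
import zlib

def bits_per_character(text, level=9):
    """Compressed size of `text` in bits per character, using zlib as a crude proxy.

    A better predictive model of the text yields a lower number; the Hutter Prize
    pushes this idea to the limit on a fixed slice of English Wikipedia.
    """
    compressed = zlib.compress(text.encode("utf-8"), level)
    return 8 * len(compressed) / max(len(text), 1)

sample = "the cat sat on the mat. " * 100  # highly regular text compresses very well
print(round(bits_per_character(sample), 2))
```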
02:25:38.180 | - I mean, I guess in that same sense,
02:25:40.420 | the compression is analogous to what you talked about,
02:25:43.100 | which is like local or robust generalization
02:25:45.900 | versus extreme generalization.
02:25:47.820 | It's much closer to that side of being able to generalize
02:25:52.380 | in the local sense.
02:25:53.380 | - That's why, you know, as humans,
02:25:54.900 | as when we are children, in our education,
02:25:59.900 | so a lot of it is driven by play, driven by curiosity.
02:26:03.100 | We are not efficiently compressing things.
02:26:07.860 | We're actually exploring.
02:26:09.620 | We are retaining all kinds of things
02:26:14.380 | from our environment that seem to be completely useless
02:26:19.620 | because they might turn out to be eventually useful, right?
02:26:24.380 | And that's what cognition is really about.
02:26:26.860 | And what makes it antagonistic to compression
02:26:29.220 | is that it is about hedging for future uncertainty.
02:26:34.020 | And that's antagonistic to compression.
02:26:35.860 | - Yes, efficiently hedging.
02:26:37.580 | - So cognition leverages compression
02:26:40.820 | as a tool to promote efficiency, right?
02:26:44.900 | And so in that sense, in our models.
02:26:47.380 | - It's like Einstein said, make it simpler,
02:26:50.860 | but not, however that quote goes, but not too simple.
02:26:54.940 | So you want to, compression simplifies things,
02:26:57.660 | but you don't want to make it too simple.
02:27:00.180 | - Yes, so a good model of the world
02:27:02.820 | is going to include all kinds of things
02:27:04.900 | that are completely useless, actually,
02:27:06.540 | just because, just in case.
02:27:08.500 | Because you need diversity
02:27:09.500 | in the same way that's in your portfolio.
02:27:11.140 | You need all kinds of stocks
02:27:12.220 | that may not have performed well so far,
02:27:14.500 | but you need diversity.
02:27:15.540 | And the reason you need diversity
02:27:16.620 | is because fundamentally you don't know what you're doing.
02:27:19.660 | And the same is true of the human mind
02:27:22.060 | is that it needs to behave appropriately in the future.
02:27:26.820 | And it has no idea what the future is going to be like.
02:27:29.420 | It's a bit, it's not going to be like the past.
02:27:31.460 | So compressing the past is not appropriate
02:27:33.620 | because the past is not,
02:27:35.620 | is not predictive of the future.
02:27:39.020 | - Yeah, history repeats itself, but not perfectly.
02:27:43.300 | I don't think I asked you last time
02:27:47.300 | the most inappropriately absurd question.
02:27:50.180 | We've talked a lot about intelligence,
02:27:54.500 | but the bigger question beyond intelligence is that of meaning.
02:27:59.300 | Intelligent systems are kind of goal-oriented.
02:28:02.980 | They're always optimizing for a goal.
02:28:05.420 | If you look at the Hutter Prize, actually,
02:28:07.620 | there's always a clean formulation of a goal.
02:28:10.940 | But the natural question for us humans,
02:28:14.260 | since we don't know our objective function,
02:28:16.020 | is: what is the meaning of it all?
02:28:18.380 | So the absurd question is:
02:28:21.540 | what, François Chollet, do you think is the meaning of life?
02:28:25.740 | - What's the meaning of life?
02:28:26.820 | Yeah, that's a big question.
02:28:31.620 | And I think I can give you my answer,
02:28:35.460 | or at least one of my answers.
02:28:38.060 | And so, you know, the one thing that's very important
02:28:43.060 | in understanding who we are is that
02:28:46.820 | everything that makes up who we are,
02:28:53.540 | even your most personal thoughts,
02:28:56.940 | is not actually your own, right?
02:28:59.340 | Like even your most personal thoughts
02:29:01.700 | are expressed in words that you did not invent
02:29:04.820 | and are built on concepts and images
02:29:08.740 | that you did not invent.
02:29:10.620 | We are very much cultural beings, right?
02:29:14.540 | We are made of culture.
02:29:16.820 | That's what makes us different from animals, for instance, right?
02:29:20.220 | So everything about ourselves is an echo
02:29:25.140 | of the past, an echo of people who lived before us, right?
02:29:29.860 | That's who we are.
02:29:31.380 | And in the same way, if we manage to contribute something
02:29:36.380 | to the collective edifice of culture,
02:29:40.140 | a new idea, maybe a beautiful piece of music,
02:29:44.580 | a work of art, a grand theory,
02:29:47.260 | a new word maybe,
02:29:51.220 | then that something is going to become a part
02:29:55.620 | of the minds of future humans, essentially forever.
02:30:00.300 | So everything we do creates ripples, right,
02:30:04.020 | that propagate into the future.
02:30:06.020 | And in a way, this is our path to immortality:
02:30:11.020 | as we contribute things to culture,
02:30:17.620 | culture in turn becomes future humans.
02:30:21.420 | And we keep influencing people,
02:30:24.220 | thousands of years from now.
02:30:27.660 | So our actions today create ripples.
02:30:30.740 | And these ripples, I think,
02:30:33.700 | basically sum up the meaning of life.
02:30:37.380 | Like in the same way that we are the sum
02:30:40.900 | of the interactions between many different ripples
02:30:45.380 | that came from our past,
02:30:47.220 | we are ourselves creating ripples
02:30:48.940 | that will propagate into the future.
02:30:50.780 | And that's why,
02:30:53.300 | though this seems like perhaps a naive thing to say,
02:30:56.060 | we should be kind to others during our time on earth,
02:31:01.060 | because every act of kindness creates ripples.
02:31:05.660 | And in reverse, every act of violence also creates ripples.
02:31:09.380 | And you want to carefully choose
02:31:12.220 | which kind of ripples you want to create
02:31:14.380 | and you want to propagate into the future.
02:31:16.580 | - And in your case, first of all, beautifully put,
02:31:19.100 | you're creating ripples into future humans
02:31:23.300 | and future AGI systems.
02:31:27.060 | - Yes. - It's fascinating.
02:31:29.580 | - All success.
02:31:30.700 | - I don't think there's a better way to end it, François.
02:31:35.380 | As always, for a second time,
02:31:37.220 | and I'm sure many times in the future,
02:31:39.380 | it's been a huge honor.
02:31:40.920 | You're one of the most brilliant people
02:31:43.440 | in the machine learning, computer science,
02:31:46.100 | and broader science world.
02:31:47.620 | Again, it's a huge honor.
02:31:48.740 | Thanks for talking today.
02:31:49.580 | - It's been a pleasure.
02:31:50.620 | Thanks a lot for having me.
02:31:52.060 | Really appreciate it.
02:31:54.020 | - Thanks for listening to this conversation
02:31:55.420 | with François Chollet.
02:31:56.820 | And thank you to our sponsors,
02:31:58.620 | Babbel, Masterclass, and Cash App.
02:32:01.780 | Click the sponsor links in the description
02:32:03.940 | to get a discount and to support this podcast.
02:32:06.900 | If you enjoy this thing, subscribe on YouTube,
02:32:09.180 | review it with five stars on Apple Podcast,
02:32:11.380 | follow on Spotify, support on Patreon,
02:32:14.140 | or connect with me on Twitter @LexFriedman.
02:32:16.960 | And now let me leave you with some words
02:32:19.420 | from René Descartes, written in 1637,
02:32:22.940 | an excerpt of which François includes
02:32:24.840 | in his "On the Measure of Intelligence" paper.
02:32:27.780 | "If there were machines which bore a resemblance
02:32:30.260 | "to our bodies and imitated our actions
02:32:32.840 | "as closely as possible for all practical purposes,
02:32:36.300 | "we should still have two very certain means
02:32:38.800 | "of recognizing that they were not real men.
02:32:42.120 | "The first is that they could never use words
02:32:44.580 | "or put together signs as we do
02:32:46.720 | "in order to declare our thoughts to others.
02:32:49.760 | "For we can certainly conceive of a machine so constructed
02:32:53.320 | "that it utters words and even utters words
02:32:55.780 | "that correspond to bodily actions
02:32:57.500 | "causing a change in its organs.
02:32:59.520 | "But it is not conceivable that such a machine
02:33:02.640 | "should produce different arrangements of words
02:33:05.100 | "so as to give an appropriately meaningful answer
02:33:08.020 | "to whatever is said in its presence
02:33:10.580 | "as the dullest of men can do."
02:33:12.760 | Here Descartes is anticipating the Turing test
02:33:15.460 | and the argument still continues to this day.
02:33:17.720 | "Secondly," he continues,
02:33:20.920 | "even though some machines might do some things
02:33:23.360 | "as well as we do them, or perhaps even better,
02:33:26.480 | "they would inevitably fail in others,
02:33:28.980 | "which would reveal that they're acting
02:33:30.840 | "not from understanding,
02:33:32.360 | "but only from the disposition of their organs."
02:33:35.160 | This is an incredible quote.
02:33:38.700 | "For whereas reason is a universal instrument
02:33:43.200 | "which can be used in all kinds of situations,
02:33:46.580 | "these organs need some particular disposition for each particular action.
02:33:49.080 | "Hence, it is for all practical purposes impossible
02:33:52.120 | "for a machine to have enough different organs
02:33:54.300 | "to make it act in all the contingencies of life
02:33:57.760 | "in the way in which our reason makes us act."
02:34:01.360 | That's the debate between mimicry and memorization
02:34:05.060 | versus understanding.
02:34:07.240 | So thank you for listening and hope to see you next time.
02:34:11.680 | (upbeat music)
02:34:14.260 | (upbeat music)