François Chollet: Measures of Intelligence | Lex Fridman Podcast #120
Chapters
0:00 Introduction
5:04 Early influence
6:23 Language
12:50 Thinking with mind maps
23:42 Definition of intelligence
42:24 GPT-3
53:07 Semantic web
57:22 Autonomous driving
69:30 Tests of intelligence
73:59 Tests of human intelligence
87:18 IQ tests
95:59 ARC Challenge
119:11 Generalization
129:50 Turing Test
140:44 Hutter prize
147:44 Meaning of life
00:00:00.000 |
The following is a conversation with Francois Chollet, 00:00:05.320 |
He's both a world-class engineer and a philosopher 00:00:09.600 |
in the realm of deep learning and artificial intelligence. 00:00:13.200 |
This time, we talk a lot about his paper titled 00:00:20.320 |
"On the Measure of Intelligence" that discusses how we might define and measure general intelligence in our computing machinery. 00:00:31.240 |
to get a discount and to support this podcast. 00:00:38.720 |
of artificial general intelligence is a rare thing. 00:00:43.760 |
works on very narrow AI with very narrow benchmarks. 00:00:53.200 |
On the other hand, outside the mainstream, 00:00:59.640 |
works on approaches that verge on the philosophical 00:01:03.000 |
and even the literary without big public benchmarks. 00:01:07.320 |
Walking the line between the two worlds is a rare breed, 00:01:32.460 |
If you enjoy this thing, subscribe on YouTube, 00:02:04.340 |
Go to babbel.com and use code Lex to get three months free. 00:02:15.180 |
Daily lessons are 10 to 15 minutes, super easy, 00:02:18.320 |
effective, designed by over 100 language experts. 00:02:22.200 |
Let me read a few lines from the Russian poem 00:02:27.200 |
by Alexander Blok that you'll start to understand 00:02:36.480 |
Now, I say that you'll start to understand this poem 00:02:53.980 |
Now, the latter part is definitely not endorsed 00:03:06.080 |
of late night Russian conversation over vodka. 00:03:20.960 |
Sign up at masterclass.com/lex to get a discount 00:03:35.440 |
to watch courses from, to list some of my favorites, 00:03:52.720 |
Garry Kasparov on chess, Daniel Negreanu on poker 00:03:59.680 |
and the experience of being launched into space alone 00:04:03.260 |
By the way, you can watch it on basically any device. 00:04:09.320 |
to get a discount and to support this podcast. 00:04:30.520 |
let me mention a surprising fact related to physical money. 00:04:39.260 |
The other 92% of the money only exists digitally 00:04:54.360 |
an organization that is helping to advance robotics 00:04:56.960 |
and STEM education for young people around the world. 00:05:00.480 |
And now here's my conversation with Francois Chollet. 00:05:07.320 |
had a big impact on you growing up and today? 00:05:14.800 |
when I read his books as a teenager was Jean Piaget, 00:05:21.320 |
is considered to be the father of developmental psychology. 00:05:27.000 |
about basically how intelligence develops in children. 00:05:40.840 |
It's actually superseded by many newer developments 00:05:48.800 |
very striking and actually shaped the early ways 00:05:53.800 |
and the development of intelligence as a teenager. 00:05:56.200 |
- His actual ideas or the way he thought about it 00:06:02.520 |
Jean Piaget is the author that reintroduced me 00:06:07.960 |
is something that you construct throughout your life 00:06:15.760 |
And I thought that was a very interesting idea, 00:06:22.000 |
Another book that I read around the same time 00:06:27.280 |
and there was actually a little bit of overlap 00:06:35.320 |
is Jeff Hawkins' "On Intelligence," which is a classic. 00:06:42.520 |
as a multi-scale hierarchy of temporal prediction modules. 00:07:03.960 |
And it really shaped the way I started thinking 00:07:13.720 |
Also, he's a neuroscientist, so he was thinking actual, 00:07:17.520 |
he was basically talking about how our mind works. 00:07:20.560 |
- Yeah, the notion that cognition is prediction 00:07:23.240 |
was an idea that was kind of new to me at the time 00:07:27.840 |
And yeah, and the notion that there are multiple scales 00:07:50.480 |
but of course, I think these ideas really found 00:07:52.800 |
their practical implementation in deep learning. 00:07:59.720 |
I think he was talking about knowledge representation. 00:08:06.320 |
as a kind of memory, you're memorizing things, 00:08:23.960 |
so that you can actually retrieve very precisely 00:08:30.120 |
- The retrieval aspect, you can like introspect, 00:08:38.280 |
and language is actually the tool you use to do that. 00:08:41.680 |
I think language is a kind of operating system for the mind. 00:08:46.360 |
And you use language, well, one of the uses of language 00:08:49.560 |
is as a query that you run over your own memory. 00:08:53.840 |
You use words as keys to retrieve specific experiences 00:09:10.040 |
then you would have to, you would not really have 00:09:13.520 |
a self-internally triggered way of retrieving past thoughts. 00:09:18.520 |
You would have to rely on external experiences. 00:09:24.080 |
you smell a specific smell, and that brings up memories, 00:09:26.840 |
but you would not really have a way to deliberately 00:09:49.120 |
like there's turtles, what's at the bottom of the turtles? 00:09:54.120 |
They don't go, it can't be turtles all the way down. 00:09:57.320 |
Is language at the bottom of cognition of everything? 00:10:03.800 |
aspect of like what it means to be a thinking thing? 00:10:14.600 |
- Yes, I think language is a layer on top of cognition. 00:10:17.880 |
So it is fundamental to cognition in the sense that, 00:10:24.600 |
as the operating system of the brain, of the human mind. 00:10:33.200 |
The computer exists before the operating system, 00:10:36.160 |
but the operating system is how you make it truly useful. 00:10:39.480 |
- And the operating system is most likely Windows, 00:10:55.080 |
Like we use actually sort of human interpretable language, 00:11:03.120 |
that's closer to like logical type of statements? 00:11:07.920 |
Like, yeah, what is the nature of language, do you think? 00:11:19.160 |
Is there something that doesn't require utterances 00:11:25.560 |
- Are you asking about the possibility that could exist? 00:11:29.440 |
Languages for thinking that are not made of words? 00:11:46.760 |
I think we think in terms of motion in space 00:11:56.880 |
probably express these thoughts in terms of the actions 00:12:03.720 |
And in terms of motions of objects in their environment 00:12:08.080 |
before they start thinking in terms of words. 00:12:16.840 |
So like the kind of actions and ways the babies 00:12:32.000 |
'Cause it's important for them trying to engineer it 00:12:45.440 |
And you actually see it reflected in language. 00:12:53.880 |
I consider myself very much as a visual thinker. 00:12:57.400 |
You often express these thoughts by using things 00:13:07.200 |
or like you solve problems by imagining yourself 00:13:15.080 |
I don't know if you have this sort of experience. 00:13:49.720 |
I always have to, before I jump from concept to concept, 00:13:58.080 |
I can only travel on a 2D paper, not inside my mind. 00:14:05.400 |
- But even if you're writing like a paper, for instance, 00:14:07.960 |
don't you have like a spatial representation of your paper? 00:14:11.040 |
Like you visualize where ideas lie topologically 00:14:19.040 |
kind of like a subway map of the ideas in your paper? 00:14:23.440 |
I mean, there is, in papers, I don't know about you, 00:14:39.360 |
and you're trying to kind of, it's almost like, 00:14:45.440 |
what's that called when you do a path planning search 00:14:49.040 |
from both directions, from the start and from the end? 00:14:51.740 |
And then you find, you do like shortest path, 00:14:54.800 |
but in game playing, you do this with like A star 00:15:05.760 |
I think like, first of all, just exploring from the start, 00:15:15.680 |
And then from the destination, if you start backtracking, 00:15:20.680 |
if I want to show some kind of sets of ideas, 00:15:28.360 |
But yeah, I don't think I'm doing all that in my mind, 00:15:33.240 |
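The search pattern being reached for here is usually called bidirectional search: grow one frontier from the start and one from the goal, and stop when they meet. A minimal sketch of the idea (the toy graph and names are illustrative, not anything from the conversation):

```python
def bidirectional_search(graph, start, goal):
    """Breadth-first search grown from both ends; returns True once the
    two frontiers meet. `graph` maps a node to a list of its neighbors."""
    if start == goal:
        return True
    frontier_a, frontier_b = {start}, {goal}
    visited_a, visited_b = {start}, {goal}
    while frontier_a and frontier_b:
        # Expand whichever frontier is currently smaller.
        if len(frontier_a) > len(frontier_b):
            frontier_a, frontier_b = frontier_b, frontier_a
            visited_a, visited_b = visited_b, visited_a
        next_frontier = set()
        for node in frontier_a:
            for neighbor in graph.get(node, []):
                if neighbor in visited_b:   # the two searches met in the middle
                    return True
                if neighbor not in visited_a:
                    visited_a.add(neighbor)
                    next_frontier.add(neighbor)
        frontier_a = next_frontier
    return False

# Toy example: sections of a paper connected like a small subway map.
outline = {"intro": ["definition"], "definition": ["benchmark"],
           "benchmark": ["results"], "results": ["conclusion"]}
print(bidirectional_search(outline, "intro", "conclusion"))  # True
```

A* is the weighted variant mentioned next: the same idea, except the frontier is expanded in order of path cost so far plus a heuristic estimate of the remaining distance.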
- Do you use mind maps to organize your ideas? 00:15:39.880 |
I've been so jealous of people, I haven't really tried it. 00:15:42.160 |
I've been jealous of people that seem to like, 00:15:45.560 |
they get like this fire of passion in their eyes 00:15:53.800 |
Some of the most brilliant people I know use mind maps. 00:16:18.000 |
it starts being more organized inside your own mind. 00:16:24.000 |
Like, what's the first thing you write on paper? 00:16:39.560 |
Like you would write intelligence or something. 00:16:42.160 |
And then you would start adding associative connections. 00:16:48.040 |
What do you think are the key elements of intelligence? 00:16:50.440 |
So maybe you would have language, for instance, 00:16:53.400 |
And so you would start drawing notes with these things. 00:16:56.440 |
what do you think about when you think about motion? 00:16:58.480 |
And so on, and you would go like that, like a tree. 00:17:08.000 |
And it's not limited to just writing down words. 00:17:15.960 |
And it's not supposed to be purely hierarchical, right? 00:17:19.640 |
Like you can, the point is that you can start, 00:17:24.560 |
you can start reorganizing it so that it makes more sense, 00:17:27.560 |
so that it's connected in a more effective way. 00:17:29.960 |
- See, but I'm so OCD that you just mentioned intelligence 00:17:42.040 |
Like that I would become paralyzed with the mind map 00:17:55.400 |
there's an implied hierarchy that's emerging. 00:18:10.620 |
but then you can also start getting paranoid. 00:18:23.480 |
if you think the central node is not the right node. 00:18:26.320 |
- Yeah, I suppose there's a fear of being wrong. 00:18:37.400 |
Like how do you know what you think about something 00:18:46.240 |
much more syntactic structure over your ideas, 00:18:54.200 |
more freehand way of organizing your thoughts. 00:18:59.640 |
then you can start actually voicing your thoughts 00:19:05.400 |
- It's a two dimensional aspect of layout too, right? 00:19:08.180 |
And it's a kind of flower, I guess, you start. 00:19:12.880 |
There's usually, you want to start with a central concept? 00:19:16.960 |
- Typically it ends up more like a subway map. 00:19:19.180 |
So it ends up more like a graph, a topological graph. 00:19:25.040 |
there are some nodes that are more connected than others. 00:19:27.320 |
And there are some nodes that are more important than others. 00:19:32.440 |
but it's not gonna be purely like a tree, for instance. 00:19:38.600 |
if there's something to that about the way our mind thinks. 00:19:42.440 |
By the way, I just kind of remembered obvious thing 00:19:49.060 |
in Google Doc at this point that are bullet point lists, 00:19:59.680 |
It's the same, it's a, no, it's not, it's a tree. 00:20:10.800 |
Like, I guess I'm comfortable with the structure. 00:20:23.120 |
why don't you write some kind of search engine, 00:20:30.880 |
a mind mapping software where you write down a concept 00:20:33.960 |
and then it gives you sentences or paragraphs 00:20:41.240 |
- The problem is it's so deeply, unlike mind maps, 00:20:48.460 |
So it's not semantically searchable, I would say, 00:20:57.200 |
you kind of mentioned intelligence, language, and motion. 00:21:24.200 |
So unfortunately, that's the same problem with the internet. 00:21:28.980 |
That's why the idea of semantic web is difficult to get. 00:21:31.920 |
Most language on the internet is a giant mess 00:21:37.980 |
of natural language that's hard to interpret. 00:21:40.220 |
So do you think there's something to mind maps as, 00:21:48.060 |
as we were talking about kind of cognition and language. 00:22:04.940 |
that there is some level of topological processing 00:22:09.940 |
in the brain, that the brain is very associative in nature. 00:22:46.340 |
You only have the concept of whether there is a train 00:22:54.520 |
is that we're actually dealing with geometric spaces. 00:22:57.740 |
We are dealing with concept vectors, word vectors, 00:23:05.420 |
We are not really building topological models usually. 00:23:11.820 |
Like distance is of fundamental importance in deep learning. 00:23:24.500 |
If your space is discrete, it's no longer differentiable. 00:23:31.420 |
by embedding it in a bigger continuous space. 00:23:35.660 |
So if you do topology in the context of deep learning, 00:23:46.260 |
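A minimal sketch of what "dealing with concept vectors and distances" means in practice; the embeddings below are made-up numbers for illustration, whereas real word vectors would come from a trained model such as word2vec or a transformer's embedding layer:

```python
import numpy as np

def cosine_distance(u, v):
    """Distance between two concept vectors: 0 means identical direction."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy embeddings (made-up numbers, purely illustrative).
embeddings = {
    "train":  np.array([0.9, 0.1, 0.3]),
    "subway": np.array([0.8, 0.2, 0.4]),
    "poem":   np.array([0.1, 0.9, 0.2]),
}

print(cosine_distance(embeddings["train"], embeddings["subway"]))  # small
print(cosine_distance(embeddings["train"], embeddings["poem"]))    # larger
```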
Let's get into your paper on the measure of intelligence 00:24:09.660 |
You could travel, you could actually go outside 00:24:23.380 |
like 200 years from now, on artificial intelligence, 00:24:45.260 |
which is one of the things you do in your paper 00:24:55.580 |
So is there a spiffy quote about what is intelligence? 00:25:06.480 |
- Yeah, so do you think the super intelligent AIs 00:25:29.120 |
in certain contexts, like historical civilization 00:25:38.280 |
And it'll be studied in also the context on social media, 00:25:42.360 |
there'll be hashtags about the atrocity committed 00:25:47.080 |
to human beings when the robots finally got rid of them. 00:25:52.080 |
Like it was a mistake, it'll be seen as a giant mistake, 00:26:01.560 |
because humans were over-consuming the resources, 00:26:07.240 |
and were destructive in the end in terms of productivity, 00:26:13.800 |
And so within that context, there'll be a chapter 00:26:17.480 |
- Seems you have a very detailed vision of that future. 00:26:22.640 |
- I'm working on a sci-fi novel currently, yes. 00:26:38.920 |
tasks that you did not previously know about, 00:26:44.680 |
So it is not, intelligence is not skill itself, 00:26:47.760 |
it's not what you know, it's not what you can do, 00:26:50.720 |
it's how well and how efficiently you can learn new things. 00:27:01.120 |
- Yes, so you would see intelligence on display, 00:27:16.560 |
When you see adaptation, when you see improvisation, 00:27:19.280 |
when you see generalization, that's intelligence. 00:27:24.400 |
that when you put it in a slightly new environment, 00:27:29.960 |
it cannot deviate from what it's hard-coded to do 00:27:46.720 |
"The measure of intelligence is the ability to change." 00:27:54.960 |
- You know, there might be something interesting 00:27:56.480 |
about the difference between your definition and Einstein's. 00:28:03.720 |
but acquisition of new ability to deal with new things 00:28:16.080 |
what's the difference between those two things? 00:28:28.440 |
So not change, but certainly a change in direction, 00:28:33.200 |
being able to adapt yourself to your environment. 00:28:52.560 |
And I think there's a big distinction to be drawn 00:28:59.560 |
and the output of that process, which is skill. 00:29:03.040 |
So for instance, if you have a very smart human programmer 00:29:10.720 |
and that writes down a static program that can play chess, 00:29:31.840 |
is that if you put it in a different context, 00:29:50.080 |
We should not confuse the output and the process. 00:29:53.200 |
It's the same as, do not confuse a road building company 00:30:00.160 |
because one specific road takes you from point A to point B, 00:30:03.400 |
but a road building company can take you from, 00:30:06.120 |
can make a path from anywhere to anywhere else. 00:30:10.080 |
But it's also, to play devil's advocate a little bit, 00:30:14.920 |
it's possible that there is something more fundamental 00:30:24.640 |
the difference between the acquisition of the skill 00:30:32.520 |
like you could argue the universe is more intelligent. 00:30:51.480 |
as opposed to, like there could be a deeper intelligence. 00:30:55.920 |
- There's always deeper intelligence, I guess. 00:31:03.240 |
because they are capable of adaptation and generality. 00:31:07.400 |
And you see that in particular in the fact that 00:31:10.040 |
humans are capable of handling situations and tasks 00:31:19.720 |
that any of our evolutionary ancestors has ever encountered. 00:31:32.240 |
- Of course, evolutionary biologists would argue 00:31:35.080 |
that we're not going too far out of the distribution. 00:31:37.680 |
We're like mapping the skills we've learned previously, 00:31:47.080 |
- I mean, there's definitely a little bit of that, 00:31:49.480 |
but it's pretty clear to me that we're able to, 00:31:52.200 |
you know, most of the things we do any given day 00:32:14.280 |
that we acquired over the course of evolution, right? 00:32:17.880 |
And that anchors our cognition to certain contexts, 00:32:34.120 |
Like the degree in which the mind can generalize 00:32:40.480 |
can generalize away from its evolutionary history 00:33:00.360 |
We should measure like the creation of the new skill, 00:33:39.240 |
because you know they weren't born with that skill, 00:33:45.200 |
maybe you're a very strong chess player yourself. 00:33:47.560 |
- I think you're saying that 'cause I'm Russian 00:34:01.880 |
you know they weren't born knowing how to play chess. 00:34:15.400 |
And so they may as well have acquired any other skill. 00:34:30.840 |
The computer may be born knowing how to play chess 00:34:35.280 |
in the sense that it may have been programmed 00:34:37.360 |
by a human that has understood chess for the computer 00:34:54.600 |
of the "On the Measure of Intelligence" paper? 00:35:00.480 |
is to clear up some longstanding misunderstandings 00:35:04.560 |
about the way we've been conceptualizing intelligence 00:35:09.920 |
And in the way we've been evaluating progress in AI. 00:35:14.920 |
There's been a lot of progress recently in machine learning 00:35:19.040 |
and people are extrapolating from that progress 00:35:22.120 |
that we are about to solve general intelligence. 00:35:26.360 |
And if you want to be able to evaluate these statements, 00:35:30.480 |
you need to precisely define what you're talking about 00:35:33.800 |
when you're talking about general intelligence. 00:35:42.840 |
how much general intelligence a system possesses. 00:35:50.720 |
So it should not just describe what intelligence is. 00:35:57.320 |
that tells you the system is intelligent or it isn't. 00:36:16.000 |
you draw a distinction between two divergent views 00:36:23.360 |
intelligence is a collection of task-specific skills 00:36:39.560 |
but can you try to linger on this topic for a bit? 00:36:50.480 |
and the different ways we've been evaluating progress 00:36:57.720 |
has been shaped by two views of the human mind. 00:37:01.280 |
And one view is the evolutionary psychology view 00:37:08.920 |
of fairly static, special purpose, ad hoc mechanisms 00:37:17.640 |
over our history as a species over a very long time. 00:37:43.600 |
And in fact, I think they very much understood the mind 00:37:48.040 |
through the metaphor of the mainframe computer 00:37:50.560 |
because it was the tool they were working with, right? 00:37:55.120 |
this collection of very different static programs 00:38:00.160 |
And in this picture, learning was not very important. 00:38:03.720 |
Learning was considered to be just memorization. 00:38:05.760 |
And in fact, learning is basically not featured 00:38:39.360 |
it was considered that the mind was a collection 00:38:42.240 |
of programs that were primarily logical in nature. 00:38:46.720 |
And that's all you needed to do to create a mind 00:38:52.960 |
which would be stored in some kind of database. 00:39:28.280 |
and that absorbs knowledge and skills from experience. 00:39:33.280 |
So it's a sponge that reflects the complexity of the world, 00:39:39.560 |
the complexity of your life experience, essentially. 00:39:42.680 |
That everything you know and everything you can do 00:39:52.480 |
that was not very popular, for instance, in the 1970s, 00:39:58.240 |
but that had gained a lot of vitality recently 00:40:00.400 |
with the rise of connectionism, in particular deep learning. 00:40:04.080 |
And so today, deep learning is the dominant paradigm in AI. 00:40:12.000 |
are conceptualizing the mind via a deep learning metaphor. 00:40:24.160 |
and then that gets trained via exposure to training data 00:40:33.480 |
I feel like people who are thinking about intelligence 00:40:43.480 |
who believe that a neural network will be able to reason, 00:40:51.680 |
'cause I think it's actually an interesting worldview. 00:40:58.120 |
what neural networks have been able to accomplish. 00:41:04.520 |
but it's an open question whether like scaling size 00:41:09.840 |
eventually might lead to incredible results to us, 00:41:18.800 |
who are seriously thinking about intelligence, 00:41:20.760 |
they will definitely not say that all you need to do 00:41:26.520 |
However, it's actually a view that's very popular, 00:41:46.960 |
to meet a person who is not intellectually lazy 00:41:50.240 |
and still believes that neural networks will go all the way. 00:41:54.480 |
I think Yann LeCun is probably closest to that, 00:41:56.800 |
would self-supervise. - There are definitely people 00:42:03.080 |
are already the way to general artificial intelligence, 00:42:22.680 |
- So on that topic, GPT-3, similar to GPT-2, actually, 00:42:27.680 |
have captivated some part of the imagination of the public. 00:42:33.000 |
There's just a bunch of hype of different kind 00:42:46.500 |
that there's, I believe, a couple months delay 00:42:56.580 |
but it feels like there was a little bit of a lack of hype, 00:43:04.760 |
But nevertheless, there's a bunch of cool applications 00:43:07.480 |
that seem to captivate the imagination of the public 00:43:22.960 |
- Yeah, so I think what's interesting about GPT-3 00:43:25.720 |
is the idea that it may be able to learn new tasks 00:43:33.640 |
So I think if it's actually capable of doing that, 00:43:39.920 |
That said, I must say, I'm not entirely convinced 00:43:43.160 |
that we have shown it's capable of doing that. 00:43:52.240 |
that what it's actually doing is pattern matching 00:43:58.080 |
that it's been exposed to in its training data. 00:44:01.640 |
instead of just developing a model of the task, right? 00:44:15.560 |
as a kind of SQL query into this thing that it's learned, 00:44:20.840 |
which is language is used to query the memory. 00:44:36.400 |
or it becomes, or intelligence becomes a querying machine? 00:44:40.560 |
- I think it's possible that a significant chunk 00:44:44.160 |
of intelligence is this giant associative memory. 00:45:05.760 |
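To make the "prompt as a query" idea concrete, here is roughly what a few-shot prompt looks like: the task is specified entirely inside the prompt as demonstration pairs, and the model is asked to continue the pattern. `query_language_model` is a hypothetical stand-in, not a real API call:

```python
# A sketch of the few-shot setup being discussed. All names here are
# illustrative; the model call is a placeholder.
demonstrations = [
    ("cheese", "fromage"),
    ("apple", "pomme"),
    ("book", "livre"),
]

prompt = "Translate English to French.\n"
for english, french in demonstrations:
    prompt += f"{english} -> {french}\n"
prompt += "car -> "   # the new input the model must complete

# response = query_language_model(prompt)   # hypothetical call
print(prompt)
```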
like, what do you think, where's the ceiling? 00:45:13.440 |
Like, what is the ceiling is the better question. 00:45:26.880 |
- Is gonna improve on the strength of GPT-2 and 3, 00:45:30.920 |
which is it will be able to generate, you know, 00:45:39.920 |
- Yes, if you train a bigger model on more data, 00:45:44.360 |
then your text will be increasingly more context aware 00:45:54.720 |
at generating plausible text compared to GPT-2. 00:45:57.520 |
But that said, I don't think just scaling up the model 00:46:03.400 |
to more transformer layers and more trained data 00:46:08.440 |
which is that it can generate plausible text, 00:46:11.360 |
but that text is not constrained by anything else 00:46:16.680 |
So in particular, it's not constrained by factualness 00:46:24.080 |
to generate statements that are factually untrue 00:46:27.920 |
or to generate statements that are even self-contradictory, 00:46:39.120 |
It's not constrained to be self-consistent, for instance. 00:46:44.080 |
one thing that I thought was very interesting with GPT-3 00:46:46.680 |
is that you can pre-determine the answer it will give you 00:46:53.480 |
because it's very responsive to the way you ask the question 00:46:56.600 |
since it has no understanding of the content of the question. 00:47:01.600 |
And if you ask the same question in two different ways 00:47:15.640 |
- It's very susceptible to adversarial attacks, essentially. 00:47:19.440 |
So in general, the problem with these models, 00:47:24.320 |
they're very good at generating plausible text, 00:47:30.320 |
I think one avenue that would be very interesting 00:47:45.720 |
that you would rely on these self-supervised models 00:47:49.520 |
to generate a sort of pool of knowledge and concepts 00:48:26.120 |
it generates, like you said, something that's plausible. 00:48:29.040 |
- Yeah, so if you try to make it generate programs, 00:48:38.720 |
but because program space is not interpolative, right? 00:48:44.320 |
It's not gonna be able to generalize to problems 00:48:59.520 |
you know, the GPT-3 has 175 billion parameters. 00:49:14.840 |
Do you think, obviously, very different kinds of things, 00:49:26.200 |
Do you think, what do you think GPT will look like 00:49:34.240 |
You think our conversation might be in nature different? 00:49:38.520 |
'Cause you've criticized GPT-3 very effectively now. 00:49:46.960 |
So to begin with the bottleneck with scaling up GPT-3, 00:49:51.080 |
GPT models, generative pre-trained transformer models, 00:50:05.600 |
on a crawl of basically the entire web, right? 00:50:09.880 |
So you could imagine training on more data than that. 00:50:14.440 |
but it would still be only incrementally more data. 00:50:17.480 |
And I don't recall exactly how much more data 00:50:30.120 |
on a hundred times more data than what you're already doing. 00:50:36.400 |
it's easier to think of compute as a bottleneck 00:50:38.880 |
and then arguing that we can remove that bottleneck, but- 00:50:44.560 |
If you look at the pace at which we've improved 00:51:08.440 |
So the quality of the data is an interesting point. 00:51:10.880 |
The thing is, if you're gonna want to use these models 00:51:25.600 |
but you know, there's not really such a thing 00:51:30.480 |
But you probably don't want to train it on Reddit, 00:51:42.760 |
so at some point I was working on a model at Google 00:51:46.600 |
that's trained on like 350 million labeled images. 00:51:54.640 |
That's like probably most publicly available images 00:52:03.880 |
because the labels were not originally annotated by hand, 00:52:08.640 |
They were automatically derived from like tags 00:52:12.400 |
on social media or just keywords in the same page 00:52:19.080 |
And it turned out that you could easily get a better model, 00:52:31.480 |
but you very quickly hit diminishing returns. 00:52:39.960 |
annotations that are actually made by humans, 00:52:53.440 |
There's a way to get better doing the automated labeling. 00:52:58.440 |
- Yeah, so you can enrich or refine your labels 00:53:15.600 |
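One common way to do this kind of refinement, sketched here under the assumption that a small hand-labeled subset exists: train a model on the clean subset and keep a weakly labeled example only when the model agrees with its noisy tag. The names are illustrative, not the actual pipeline mentioned above:

```python
def refine_labels(weak_examples, clean_examples, train, predict):
    """weak_examples / clean_examples: lists of (features, label) pairs.
    train: callable that fits a model on (features, label) pairs.
    predict: callable mapping (model, features) -> predicted label."""
    model = train(clean_examples)
    refined = []
    for features, noisy_label in weak_examples:
        if predict(model, features) == noisy_label:   # keep only agreements
            refined.append((features, noisy_label))
    return refined
```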
and is the idea of being able to convert the internet 00:53:29.760 |
to be able to convert information on the internet 00:53:35.680 |
into something that's interpretable by machines. 00:53:49.720 |
the internet is full of rich, exciting information. 00:53:54.400 |
we should be able to use that as data for machines. 00:53:59.000 |
is not really in a format that's available to machines. 00:54:01.240 |
So no, I don't think the semantic web will ever work 00:54:04.520 |
simply because it would be a lot of work, right? 00:54:08.000 |
To make, to provide that information in structured form. 00:54:16.320 |
So I think the way forward to make the knowledge 00:54:22.800 |
is actually something closer to unsupervised deep learning. 00:54:32.200 |
of making the knowledge of the web available to machines 00:54:54.360 |
I think the forms of reasoning that you see perform 00:55:02.440 |
So of course, if you're trained on the entire web, 00:55:06.640 |
then you can produce an illusion of reasoning 00:55:15.320 |
- That's the open question between the illusion of reasoning 00:55:28.040 |
you could train on every bit of data ever generated 00:55:38.560 |
of anticipating many different possible situations, 00:55:47.280 |
Like for instance, if you train a GPT-3 model 00:55:58.280 |
It's gonna be missing many common sense facts 00:56:02.640 |
It's even gonna be missing vocabulary and so on. 00:56:05.880 |
- Yeah, it's interesting that GPT-3 even doesn't have, 00:56:09.640 |
I think, any information about the coronavirus. 00:56:28.200 |
It's also gonna require some amount of improvisation. 00:56:51.400 |
from a distance, like L5 self-driving, for instance. 00:57:12.800 |
That's a lot more data than the 20 or 30 hours of driving 00:57:19.560 |
given the knowledge they've already accumulated. 00:57:31.720 |
is really pushing for a learning-based approach. 00:57:45.920 |
Is L5 completely autonomous, you can just fall asleep? 00:57:51.160 |
Well, driving, I have to be careful saying human level 00:57:59.920 |
you know, cars will most likely be much safer than humans 00:58:11.440 |
the thing is the amounts of trained data you would need 00:58:14.720 |
to anticipate for pretty much every possible situation 00:58:25.520 |
we'll develop a system that's trained on enough data, 00:58:32.400 |
We don't necessarily need actual cars on the road 00:58:53.320 |
And if you use deep learning for what it's good at, 00:58:59.400 |
So in general, deep learning is a way to encode perception 00:59:14.760 |
strong generalization tends to come from explicit models, 00:59:20.600 |
tend to come from abstractions in the human mind 00:59:24.320 |
that are encoded in program form by a human engineer. 00:59:28.680 |
These are the abstractions you can actually generalize, 00:59:34.920 |
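A toy contrast between an interpolative model and an explicit, abstraction-style program (illustrative only, nothing here is from the conversation): the memorization-based model only works near its training data, while the hand-written rule works everywhere.

```python
import numpy as np

train_x = np.arange(0, 11)
train_y = 2 * train_x          # the underlying rule is simply "double it"

def nearest_neighbor(x):
    """Pure memorization / interpolation: answer with the closest seen case."""
    return train_y[np.argmin(np.abs(train_x - x))]

def explicit_program(x):
    """The hand-written abstraction: works for any input."""
    return 2 * x

print(nearest_neighbor(7), explicit_program(7))        # 14 14   (inside the data)
print(nearest_neighbor(1000), explicit_program(1000))  # 20 2000 (far outside it)
```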
- Yeah, and the question is how much reasoning, 00:59:44.600 |
That's the question, or human life, existence. 00:59:48.800 |
How much strong abstractions does existence require, 00:59:55.620 |
That seems to be a coupled question about intelligence 01:00:07.200 |
And the coupled problem, how hard is this problem? 01:00:11.400 |
How much intelligence does this problem actually require? 01:00:30.260 |
before we ever learn, quote unquote, to drive, 01:00:32.440 |
we get to watch other cars and other people drive. 01:00:42.680 |
And that's similar to what neural networks are doing, 01:00:51.360 |
how many leaps of reasoning genius is required 01:01:01.320 |
I mean, sure, you've seen a lot of cars in your life 01:01:07.720 |
but let's say you've learned to drive in Silicon Valley 01:01:14.120 |
Well, now everyone is driving on the other side of the road 01:01:29.280 |
to just be operational in this very different environment 01:01:41.360 |
that is contained in this environment, right? 01:01:45.960 |
it's not just interpolation over the situations 01:01:56.880 |
is one of the most interesting tests of intelligence 01:02:09.840 |
- So I don't think driving is that much of a test 01:02:14.800 |
there is no task for which skill at that task 01:02:26.520 |
So I don't think, I think you can actually solve driving 01:02:29.320 |
without having any real amount of intelligence. 01:02:34.320 |
For instance, if you really did have infinite training data, 01:02:41.680 |
an end-to-end deep learning model that does driving, 01:02:49.000 |
is collecting a data set that's sufficiently comprehensive 01:02:59.280 |
So I think there's nothing fundamentally wrong 01:03:17.840 |
Whereas if instead you took a more manual engineering 01:03:26.960 |
in combination with engineering an explicit model 01:03:36.120 |
your model will actually start generalizing much earlier 01:03:39.840 |
and more effectively than the end-to-end deep learning model. 01:03:44.560 |
with the more manual engineering oriented approach? 01:03:50.040 |
either the end-to-end deep learning model system 01:04:01.720 |
general intelligence or intelligence of any generality at all. 01:04:05.720 |
Again, the only possible test of generality in AI 01:04:10.520 |
would be a test that looks at skill acquisition 01:04:14.280 |
But for instance, you could take your L5 driver 01:04:17.360 |
and ask it to learn to pilot a commercial airplane 01:04:28.080 |
for the system to learn to pilot an airplane. 01:04:37.520 |
I get you, but I'm more interested as a problem. 01:04:46.600 |
that can generate novel situations at some rate, 01:04:56.160 |
like we're confronted, let's say once a month. 01:05:03.840 |
just by training a statistical model on a lot of data. 01:05:14.560 |
if you have a vehicle that achieves level five, 01:05:17.680 |
it is going to be able to deal with new situations. 01:05:34.400 |
So if we go back to your kind of definition of intelligence, 01:05:39.680 |
- With which you can adapt to new situations, 01:05:45.880 |
Not situations that could be anticipated by your creators, 01:05:48.640 |
by the creators of the system, but truly new situations. 01:05:51.960 |
The efficiency with which you acquire new skills. 01:05:55.160 |
If you require, if in order to pick up a new skill, 01:05:58.440 |
you require a very extensive training data set 01:06:20.160 |
you need a human engineer to write down a bunch of rules 01:06:37.920 |
of the engineers that are creating it, right? 01:07:09.160 |
that's capable of producing these abstractions, right? 01:07:11.960 |
- Yeah, it feels like there's a little bit of a gray area. 01:07:20.520 |
but those abstractions do not seem to be effective 01:07:32.640 |
No, deep learning does generalize a little bit. 01:07:34.320 |
Like, generalization is not binary, it's more like a spectrum. 01:07:40.120 |
it's a gray area, but there's a certain point 01:07:46.480 |
No, like, I guess exactly what you were saying is 01:07:52.320 |
intelligence is how efficiently you're able to generalize 01:08:04.200 |
- So it's both like the distance of how far you can, 01:08:10.240 |
and how efficiently you're able to deal with that. 01:08:12.600 |
- So you can think of intelligence as a measure 01:08:32.000 |
That's provided by the situations you already know. 01:08:41.040 |
the prior knowledge that's embedded in the system. 01:08:43.560 |
So the system starts with some information, right, 01:08:48.880 |
And it's about going from that information to a program, 01:08:53.600 |
what we would call a skill program, a behavioral program 01:08:56.440 |
that can cover a large area of possible situation space. 01:09:04.120 |
And the ratio between that area and the amount of information you start with is intelligence. 01:09:07.600 |
So a very smart agent can make efficient uses 01:09:14.200 |
of very little information about a new problem 01:09:19.600 |
to cover a very large area of potential situations 01:09:26.560 |
what these future new situations are going to be. 01:09:29.400 |
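In the paper this conversion ratio is made precise using algorithmic information theory; as a schematic only (this deliberately omits the paper's exact terms for generalization difficulty and the weighting over tasks), it has the shape:

```latex
% Schematic paraphrase of the conversion-ratio idea, not the paper's exact definition.
\[
\text{skill-acquisition efficiency} \;\propto\;
\frac{\text{scope of situations the acquired skill program covers}}
     {\text{priors} \;+\; \text{experience}}
\]
```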
- So one of the other big things you talk about in the paper, 01:09:40.920 |
So if we look at like human and machine intelligence, 01:09:45.960 |
do you think tests of intelligence should be different 01:09:50.320 |
or how we think about testing of intelligence? 01:09:53.360 |
Are these fundamentally the same kind of intelligences 01:09:58.240 |
that we're after and therefore the tests should be similar? 01:10:25.080 |
how intelligent, in terms of human intelligence, 01:10:36.400 |
and to human intelligence are very different, 01:10:39.400 |
and your tests should account for this difference. 01:10:50.440 |
to buy arbitrary levels of skill at arbitrary tasks, 01:10:55.440 |
either by injecting hard-coded prior knowledge 01:11:19.520 |
and you could train a Go playing system that way, 01:11:28.640 |
because a human that plays Go had to develop that skill 01:11:38.760 |
and of course, this started from a different set of priors. 01:11:59.840 |
You have to start from the same set of knowledge priors 01:12:04.520 |
about the task, and you have to control for experience, 01:12:28.080 |
So for instance, if you're trying to play Go, 01:13:05.600 |
And other board games can also show some similarity with Go. 01:13:13.680 |
that would be part of your priors about the game. 01:13:16.320 |
- Well, it's interesting to think about the game of Go 01:13:18.480 |
is how many priors are actually brought to the table. 01:13:21.160 |
When you look at self-play, reinforcement learning-based 01:13:28.960 |
it seems like the number of priors is pretty low. 01:13:32.720 |
- There are 2D spatial priors in the convnet. 01:33:36.600 |
But you should be clear about making those priors explicit. 01:33:44.040 |
is to measure a human-like form of intelligence, 01:13:52.880 |
to start from the same set of priors that humans start with. 01:14:02.760 |
the human side of things is very interesting. 01:14:08.040 |
What do you think is a good test of human intelligence? 01:14:12.920 |
- Well, that's the question that psychometrics 01:14:25.240 |
- The psychometrics is the subfield of psychology 01:14:28.000 |
that tries to measure, quantify aspects of the human mind. 01:14:33.000 |
So in particular, cognitive abilities, intelligence, 01:14:39.720 |
- So, like what are, might be a weird question, 01:14:55.400 |
- So it's a field with a fairly long history. 01:14:58.720 |
It's, so, you know, psychology sometimes gets 01:15:03.840 |
a bad reputation for not having very reproducible results. 01:15:14.120 |
So the ideal goals of the field is, you know, 01:15:23.160 |
It should be valid, meaning that it should actually 01:15:43.640 |
Should be standardized, meaning that you can administer 01:15:47.440 |
your tests to many different people in some conditions. 01:15:52.960 |
meaning that, for instance, if your test involves 01:15:57.240 |
the English language, then you have to be aware 01:16:09.640 |
for creating psychometric tests are very much an ideal. 01:16:22.160 |
But at least the field is aware of these weaknesses, 01:16:38.960 |
as you mentioned, strongly with some general concept 01:17:04.520 |
When you run these very different tests at scale, 01:17:08.640 |
what you start seeing is that there are clusters 01:17:14.160 |
So for instance, if you look at homework at school, 01:17:21.840 |
are also likely, statistically, to do well in physics. 01:17:25.600 |
And what's more, there are also people who do well at math, 01:17:32.040 |
to do well in things that sound completely unrelated, 01:17:47.680 |
And the latent variable that would, for instance, 01:17:49.400 |
explain the relationship between being good at math 01:17:53.040 |
and being good at physics would be cognitive ability. 01:18:00.840 |
that explains the fact that every test of intelligence 01:18:05.600 |
that you can come up with results on this test 01:18:16.240 |
that explains these correlations, that's the G factor. 01:18:20.360 |
It's not really something you can directly measure, 01:18:26.600 |
- But it's there, it's there, it's there at scale. 01:18:33.520 |
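A minimal sketch of the kind of analysis being described, with synthetic scores: when one shared latent ability drives performance on several different tests, every pairwise correlation comes out positive, and the dominant shared factor can be summarized by the first principal component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores for 500 people on four different tests: a shared latent
# ability plus test-specific noise produces the cross-test correlations
# that psychometrics summarizes as the g factor.
g = rng.normal(size=500)
scores = np.stack([g + rng.normal(scale=0.8, size=500) for _ in range(4)], axis=1)

corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))            # all off-diagonal entries are positive

# The first principal component of the correlation matrix is one standard
# proxy for the shared factor.
eigvals, _ = np.linalg.eigh(corr)
print(eigvals[-1] / eigvals.sum())  # fraction of variance it explains
```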
Like, you know, when you talk about measuring intelligence 01:18:36.640 |
in humans, for instance, some people get a little bit worried 01:18:40.080 |
they will say, you know, that sounds dangerous, 01:18:41.960 |
maybe that sounds potentially discriminatory and so on. 01:18:50.320 |
as a way to characterize one individual person. 01:18:54.800 |
Like if I get your psychometric personality assessments 01:18:59.800 |
or your IQ, I don't think that actually tells me much 01:19:05.040 |
I think psychometrics is most useful as a statistical tool. 01:19:12.560 |
It's most useful when you start getting test results 01:19:17.480 |
and you start cross correlating these test results 01:19:20.640 |
because that gives you information about the structure 01:19:29.840 |
So at scale, psychometrics paints a certain picture 01:19:42.880 |
I mean, to me, I remember when I learned about g-factor, 01:19:45.920 |
it seemed like it would be impossible for it to be real, 01:19:59.080 |
Like it's like wishful thinking among psychologists. 01:20:02.080 |
But the more I learned, I realized that there's some, 01:20:05.760 |
I mean, I'm not sure what to make about human beings, 01:20:10.280 |
That there's a commonality across all of the human species, 01:20:13.320 |
that there does need to be a strong correlation 01:20:18.600 |
- Yeah, so human cognitive abilities have a structure, 01:20:22.840 |
like the most mainstream theory of the structure 01:20:25.440 |
of cognitive abilities is called CHC theory, Cattell-Horn-Carroll. 01:20:35.360 |
And it describes cognitive abilities as a hierarchy 01:20:48.680 |
That encompass a broad set of possible kinds of tasks 01:20:59.920 |
at the last level, which is closer to task specific skill. 01:21:10.000 |
that just emerged from different statistical analysis 01:21:25.680 |
that it's not something you can observe and measure, 01:21:33.920 |
in a statistical analysis of the data, right? 01:21:52.200 |
is gonna be able to solve any problem at all. 01:22:03.400 |
If you consider the concept of physical fitness, 01:22:06.760 |
it's a concept that's very similar to intelligence 01:22:11.440 |
It's something you can intuitively understand. 01:22:23.920 |
- It's a constraint to a specific set of skills. 01:22:32.400 |
You cannot surf over the bottom of the ocean and so on. 01:22:43.440 |
then you would come up with a battery of tests. 01:22:50.760 |
playing soccer, playing table tennis, swimming, and so on. 01:22:54.200 |
And if you run these tests over many different people, 01:22:58.440 |
you would start seeing correlations in test results. 01:23:10.480 |
that are strictly analogous to cognitive abilities. 01:23:14.040 |
And then you would start also observing correlations 01:23:27.120 |
And in the same way that there are neurophysical correlates 01:23:34.040 |
And at the top of the hierarchy of physical abilities 01:23:39.960 |
you would have a G factor, a physical G factor, 01:23:57.880 |
We can only do the things that we were evolved to do. 01:24:09.960 |
or in the void of space or the bottom of the ocean. 01:24:12.480 |
So that said, one thing that's really striking 01:24:20.360 |
generalizes far beyond the environments that we evolved for. 01:24:32.920 |
That's very much where our human morphology comes from. 01:24:42.960 |
We can climb mountains, we can swim across lakes, 01:25:02.280 |
And I think cognition is very similar to that. 01:25:05.360 |
Our cognitive abilities have a degree of generality 01:25:12.400 |
which is why we can play music and write novels 01:25:15.280 |
and go to Mars and do all kinds of crazy things. 01:25:27.800 |
In the same way you could say that the human mind 01:25:29.680 |
is not really appropriate for most of problem space, 01:25:35.480 |
So we have very strong cognitive biases, actually, 01:25:39.720 |
that mean that there are certain types of problems 01:25:48.280 |
So that's really how we'd interpret the G-factor. 01:25:56.800 |
It's really just the broadest cognitive ability. 01:26:02.560 |
whether we are talking about sensory motor abilities 01:26:11.440 |
- Within the constraints of the human cognition, 01:26:19.560 |
- But the constraints, as you're saying, are very limited. 01:26:34.600 |
part of the constraints that drove our evolution 01:26:39.600 |
So we were, in a way, evolved to be able to improvise 01:26:42.800 |
in all kinds of physical or cognitive environments, right? 01:27:01.880 |
And that goes, that's a degree of generalization 01:27:08.680 |
That said, it does not mean that human intelligence 01:27:17.640 |
You know, it's a kind of exciting topic for people, 01:27:21.160 |
even outside of artificial intelligence, is IQ tests. 01:27:29.200 |
There's different degrees of difficulty for questions. 01:27:32.440 |
We talked about this offline a little bit, too, 01:27:37.200 |
You know, what makes a question on an IQ test 01:27:40.760 |
more difficult or less difficult, do you think? 01:28:02.760 |
typically it would be structured, for instance, 01:28:05.960 |
as a set of demonstration input and output pairs, right? 01:28:10.960 |
And then you would be given a test input, a prompt, 01:28:40.160 |
- For instance, let's say you have a rotation problem. 01:28:50.520 |
which is actually one of the two training examples, 01:28:53.040 |
then there is zero generalization difficulty for the task. 01:28:57.480 |
You just recognize that it's one of the training examples 01:29:07.680 |
but it remains that you are still doing the same thing 01:29:28.800 |
you're teaching a class on quantum physics or something. 01:29:45.720 |
that's very different from anything they've seen 01:29:48.800 |
like on the internet when they were cramming. 01:29:51.720 |
On the other hand, if you wanted to make it easy, 01:30:09.280 |
It's very similar to what you've been trained on. 01:30:19.000 |
It forces you to do things that are different 01:30:28.880 |
that requires improvisation is intrinsically hard, right? 01:30:32.680 |
Because maybe you're a quantum physics expert. 01:30:37.240 |
this is actually stuff that despite being new 01:30:54.600 |
So that's what I mean by controlling for priors, 01:30:57.920 |
what you, the information you bring to the table. 01:31:00.760 |
- And the experience, which is the training data. 01:31:09.720 |
and all the mock exams that students might have taken online. 01:31:18.520 |
like I've been, just this curious question of, 01:31:22.480 |
you know, what's a really hard IQ test question. 01:31:36.200 |
First of all, most of the IQ tests they designed, 01:31:39.480 |
they like religiously protect against the correct answers. 01:31:44.480 |
Like you can't find the correct answers anywhere. 01:31:48.400 |
In fact, the question is ruined once you know, 01:31:50.660 |
even like the approach you're supposed to take. 01:31:54.600 |
- That's the approach is implicit in the training examples. 01:31:58.480 |
So if you release the training examples, it's over. 01:32:05.040 |
there is a test set that is private and no one has seen it. 01:32:09.200 |
- No, for really tough IQ questions, it's not obvious. 01:32:17.160 |
Like it's, I mean, we'll have to look through them, 01:32:25.080 |
So like you can get a sense, but there's like some, 01:32:29.400 |
you know, when you look at a number sequence, I don't know, 01:32:39.600 |
that sequence could be completed in a lot of different ways. 01:32:43.000 |
And, you know, some are, if you think deeply, 01:32:53.040 |
- Yes, I am personally not a fan of ambiguity 01:33:03.160 |
simply by making the test require a lot of extrapolation 01:33:13.400 |
but gives away everything when you give the training example. 01:33:18.520 |
Meaning that, so the tests I'm interested in creating 01:33:31.600 |
They're supposed to be difficult for machines 01:33:36.320 |
Like I think an ideal test of human and machine intelligence 01:34:16.520 |
You have to think like, what is hard for humans? 01:34:19.600 |
And that's a fascinating exercise in itself, I think. 01:34:27.760 |
of what it takes to create a really hard question 01:34:31.440 |
for humans because you again have to do the same process 01:34:36.320 |
as you mentioned, which is something basically 01:34:45.960 |
to have encountered throughout your whole life, 01:34:57.880 |
You should not be able to practice for the questions 01:35:02.120 |
that you're gonna be tested on, that's important. 01:35:12.440 |
It's the same thing as a deep learning model. 01:35:15.960 |
on all the possible answers, then it will ace your test. 01:35:30.200 |
they memorize a hundred different possible mock exams 01:35:37.200 |
will be a very simple interpolation of the mock exams. 01:35:41.200 |
And that student could just be a deep learning model 01:35:53.160 |
you need an exam that's unlike anything they've seen 01:36:00.000 |
- So how do we design an IQ test for machines? 01:36:14.920 |
And in particular, we should start by acknowledging 01:36:25.400 |
So we should be explicit about the priors, right? 01:36:28.200 |
And if the goal is to compare machine intelligence 01:36:32.840 |
then we should assume a human cognitive priors, right? 01:36:37.120 |
And secondly, we should make sure that we are testing 01:36:48.720 |
meaning that every task featured in your test 01:36:58.080 |
to brute force the space of possible questions, right? 01:37:02.960 |
To pre-generate every possible question and answer. 01:37:06.040 |
So it should be tasks that cannot be anticipated, 01:37:17.760 |
I mean, one of my favorite aspects of the paper 01:37:27.240 |
Just even that act alone is a really powerful one 01:37:33.520 |
of like, what are, it's a really powerful question 01:37:40.560 |
What are the priors that we bring to the table? 01:37:42.920 |
So the next step is like, once you have those priors, 01:37:50.160 |
But like, just even making the priors explicit 01:37:53.000 |
is a really difficult and really powerful step. 01:37:59.040 |
and conceptually, philosophically beautiful part 01:38:12.480 |
- Yes, so a researcher that has done a lot of work 01:38:19.480 |
that are innate to humans is Elizabeth Spelke 01:38:30.640 |
which outlines four different core knowledge systems. 01:38:35.640 |
So systems of knowledge that we are basically 01:38:47.240 |
And there's no strong distinction between the two. 01:38:57.080 |
as a certain type of knowledge in just a few weeks, 01:39:06.520 |
And so there are four different core knowledge systems. 01:39:09.560 |
Like the first one is the notion of objectness 01:39:23.280 |
So we intuitively, naturally, innately divide the world 01:39:28.280 |
into objects based on this notion of coherence, 01:39:34.760 |
there's the fact that objects can bump against each other 01:39:41.680 |
and the fact that they can occlude each other. 01:39:44.520 |
So these are things that we are essentially born with 01:39:48.320 |
or at least that we are going to be acquiring 01:39:50.800 |
extremely early because we're really hardwired 01:39:55.680 |
- So a bunch of points, pixels that move together. 01:40:08.800 |
but if I did, that's something I could sit like all night 01:40:14.320 |
I remember when I first, in your paper, just objectness. 01:40:16.720 |
I wasn't self-aware, I guess, of that particular prior 01:40:21.720 |
that that's such a fascinating prior that like-- 01:40:34.440 |
I mean, it's very basic, I suppose, but it's so fundamental. 01:40:42.240 |
- And the second prior that's also fundamental is agentness, 01:40:50.800 |
The fact that some of these objects that you segment 01:40:55.240 |
your environment into, some of these objects are agents. 01:41:00.360 |
It's basically it's an object that has goals. 01:41:06.360 |
- That has goals, that is capable of pursuing goals. 01:41:16.320 |
you will intuitively infer that one of the dots 01:41:21.600 |
So that one of the dots is, and one of the dots is an agent, 01:41:29.440 |
And one of the dots, the other dot is also an agent, 01:41:35.840 |
Spelke has shown that babies, as young as three months, 01:41:46.440 |
Another prior is basic geometry and topology, 01:41:53.680 |
the ability to navigate in your environment and so on. 01:41:57.640 |
This is something that is fundamentally hardwired 01:42:02.720 |
It's in fact backed by very specific neural mechanisms, 01:42:07.080 |
like for instance, grid cells and place cells. 01:42:10.800 |
So it's something that's literally hard-coded 01:42:19.920 |
And the last prior would be the notion of numbers. 01:42:23.560 |
Like numbers are not actually a cultural construct. 01:42:26.440 |
We are intuitively, innately able to do some basic counting 01:42:34.960 |
So it doesn't mean we can do arbitrary arithmetic. 01:42:39.960 |
- Counting, like counting one, two, three-ish, 01:42:49.360 |
you can tell the side with five dots has more dots. 01:42:56.400 |
So that said, the list may not be exhaustive. 01:43:04.480 |
the potential existence of new knowledge systems, 01:43:33.320 |
speaking about rotation, that there is in the brain, 01:43:37.240 |
a hard-coded system that is capable of performing rotations. 01:43:40.920 |
One famous experiment that people did in the, 01:43:45.840 |
I don't remember who it was exactly, but in the '70s, 01:43:51.400 |
was that people found that if you asked people, 01:44:03.320 |
is that shape a rotated version of the first shape or not? 01:44:06.760 |
What you see is that the time it takes people to answer 01:44:11.160 |
is linearly proportional, right, to the angle of rotation. 01:44:16.160 |
So it's almost like you have somewhere in your brain, 01:44:42.760 |
- So in the paper I outlined, all these principles, 01:44:55.320 |
to embody as many of these principles as possible. 01:44:58.560 |
So I don't think it's anywhere near a perfect attempt, 01:45:06.080 |
but it is what I was able to do given the constraints. 01:45:10.680 |
So the format of ARC is very similar to classic IQ tests, 01:45:22.840 |
you know what it is probably, or at least you've seen it, 01:45:27.040 |
And so you have a set of tasks, that's what they're called, 01:45:40.280 |
So an input or output pair is a grid of colors, basically, 01:45:51.480 |
And you're given an input and you must transform it 01:45:59.120 |
And so you're shown a few demonstrations of a task 01:46:17.680 |
every task should only require core knowledge priors, 01:46:30.360 |
So for instance, no language, no English, nothing like this, 01:46:52.080 |
And some of the tasks are actually explicitly trying 01:46:56.560 |
to probe specific forms of abstraction, right? 01:47:01.560 |
Part of the reason why I wanted to create ARC 01:47:16.120 |
as understanding how to autonomously generate abstraction 01:47:20.960 |
in a machine, you have to co-evolve the solution 01:47:29.360 |
was to clarify my ideas about the nature of abstraction, 01:47:39.920 |
And there are things that turn out to be very easy 01:47:43.240 |
for humans to perform, including young kids, right? 01:47:46.760 |
But turn out to be near impossible for machines. 01:47:50.520 |
- So what have you learned from the nature of abstraction 01:47:59.480 |
One of the things you wanted to try to understand 01:48:06.040 |
- Yes, so clarifying my own ideas about abstraction 01:48:10.360 |
by forcing myself to produce tasks that would require 01:48:14.800 |
the ability to produce that form of abstraction 01:48:23.080 |
people should check out, I'll probably overlay 01:48:29.120 |
with the different colors on the grid, that's it. 01:48:48.600 |
So you make it explicit that everything should only be built 01:48:59.280 |
And it's, it perhaps requires a bit more manual work 01:49:03.840 |
to produce solutions because you have to click around 01:49:08.480 |
Sometimes the grids can be as large as 30 by 30 cells. 01:49:22.680 |
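For reference, the publicly released ARC tasks (github.com/fchollet/ARC) are stored as JSON files, each with "train" and "test" lists of input/output grids, where a grid is a list of rows of integer color codes 0-9. A minimal sketch of reading one; the file name below is just an illustrative placeholder:

```python
import json

# Illustrative path and file name; actual task files live under
# data/training/ and data/evaluation/ in the public repository.
with open("data/training/0a1d4ef5.json") as f:
    task = json.load(f)

for pair in task["train"]:             # demonstration input/output pairs
    print(len(pair["input"]), "x", len(pair["input"][0]), "input grid")
    print(len(pair["output"]), "x", len(pair["output"][0]), "output grid")

test_input = task["test"][0]["input"]  # the grid the solver must transform
print(test_input)                      # rows of integer color codes 0-9
```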
What, how difficult is it to come up with a question? 01:49:25.480 |
Like, is this scalable to a much larger number? 01:49:47.400 |
- Including the test set and the private test set. 01:49:49.160 |
I think it's fairly difficult in the sense that 01:49:51.440 |
a big requirement is that every task should be novel 01:50:00.000 |
Like you don't want to create your own little world 01:50:04.240 |
that is simple enough that it would be possible 01:50:11.080 |
and write down an algorithm that could generate 01:50:17.120 |
for instance, that would completely invalidate the test. 01:50:20.200 |
- So you're constantly coming up with new stuff. 01:50:21.400 |
- You need, yeah, you need a source of novelty, 01:50:32.040 |
you are not a very good source of unthinkable novelty. 01:50:36.520 |
And so you have to pace the creation of these tasks 01:50:44.560 |
- So I mean, it's coming up with truly original new ideas. 01:50:53.800 |
But I mean, that's fascinating to think about. 01:50:55.800 |
So you would be like walking or something like that. 01:50:58.640 |
Are you constantly thinking of something totally new? 01:51:07.880 |
- Yeah, I mean, I'm not saying you've done anywhere 01:51:16.800 |
So that said, you should consider ARC as a work in progress. 01:51:35.360 |
to open up the creation of tasks to a broad audience 01:51:39.880 |
That would involve several levels of filtering, obviously. 01:51:44.160 |
But I think it's possible to apply crowdsourcing 01:51:46.240 |
to develop a much bigger and much more diverse ARC data set. 01:51:56.480 |
- So does there always need to be a part of ARC 01:52:08.560 |
that you're using to actually benchmark algorithms 01:52:11.960 |
is not accessible to the people developing these algorithms. 01:52:31.120 |
And then you're just capturing its crystallized output. 01:52:37.160 |
is not the same thing as the process that generated it. 01:52:51.440 |
I think there's a lot of really brilliant people out there 01:53:00.800 |
lots of people seem to actually enjoy ARC as a kind of game. 01:53:08.600 |
as a benchmark of fluid general intelligence. 01:53:22.280 |
There's a world of people who create IQ questions. 01:53:48.680 |
it's kind of inspired by IQ tests or whatever, 01:54:13.280 |
that's supposed to be a test of machine intelligence. 01:54:37.080 |
abstract representations of the problems it's exposed to. 01:54:52.560 |
It's all a recombination of a very, very small set 01:55:12.160 |
Do you think this is something that continues 01:55:13.600 |
for five years, 10 years, like just continues growing? 01:55:25.920 |
Another thing I'm starting is I'll be collaborating 01:55:30.080 |
with folks from the psychology department at NYU 01:55:36.800 |
And I think there are lots of interesting questions 01:55:39.840 |
especially as you start correlating machine solutions 01:55:44.840 |
to ARC tasks and the human characteristics of solutions. 01:55:53.600 |
between the human perceived difficulty of a task and-- 01:56:09.280 |
The things that could be difficult for humans 01:56:10.920 |
might be very different than the things that-- 01:56:16.520 |
that difference in difficulty may teach us something 01:56:25.040 |
is that it's proving to be a very actionable test 01:56:39.260 |
While humans actually found the tasks very easy. 01:56:43.320 |
And that alone was like a big red flashing light 01:56:54.560 |
machine performance did not stay at zero for very long. 01:56:57.680 |
Actually within two weeks of the Kaggle competition, 01:57:21.500 |
You can start making progress basically right away. 01:57:35.940 |
that there was no obvious shortcut to solve these tasks. 01:57:44.060 |
And that was the primary reason for the Kaggle competition 01:58:09.900 |
what sort of tasks may be contained in the test set. 01:58:23.380 |
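(To make the "quick initial progress, but no obvious shortcut" point concrete, here is a toy sketch of brute-force program search over a tiny domain-specific language of grid transformations, the general flavor of approach that gets a solver off the ground on tasks like these. The primitives and the exhaustive search are illustrative only, not any particular entry's method.)

```python
from itertools import product

# Toy brute-force program search over a tiny DSL of grid transforms.
# Real entries use far richer DSLs and smarter search; this is a sketch.

def identity(g):   return [row[:] for row in g]
def flip_h(g):     return [row[::-1] for row in g]
def flip_v(g):     return g[::-1]
def transpose(g):  return [list(col) for col in zip(*g)]

PRIMITIVES = [identity, flip_h, flip_v, transpose]

def search_program(train_pairs, max_depth=2):
    """Return a tuple of primitives that maps every training input to its output."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            ok = True
            for pair in train_pairs:
                g = pair["input"]
                for step in program:
                    g = step(g)
                if g != pair["output"]:
                    ok = False
                    break
            if ok:
                return program
    return None

train_pairs = [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
]
program = search_program(train_pairs)
print([f.__name__ for f in program] if program else "no program found")
```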
it's like 20%, we're still very, very far from even human level, 01:58:45.700 |
that are probably pretty close to human level 01:59:01.060 |
but they would be capable of a degree of generalization 01:59:13.060 |
in terms of general fluid intelligence to mention. 01:59:17.920 |
you described different kinds of generalizations, 01:59:23.580 |
and there's a kind of a hierarchy that you form. 01:59:37.020 |
I mean, it's even older than machine learning. 01:59:43.220 |
if it can make sense of an input it has not yet seen. 01:59:48.220 |
And that's what I would call a system-centric generalization. 02:00:05.020 |
should actually deal with developer-aware generalization, 02:00:09.900 |
which is slightly stronger than system-centric generalization. 02:00:16.500 |
the ability to generalize to novelty or uncertainty 02:00:21.420 |
that not only the system itself has not had access to, 02:00:37.660 |
we're talking about with autonomous vehicles. 02:00:46.060 |
the system should be able to generalize to things 02:00:53.820 |
nor obviously the contents of the training data. 02:01:04.500 |
And the lowest level is what machine learning 02:01:13.620 |
is gonna be sampled from a static distribution 02:01:18.340 |
And that you already have a representative sample 02:01:21.500 |
of the distribution, that's your training data. 02:01:24.780 |
you generalize to a new sample from a known distribution. 02:02:42.860 |
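(A toy illustration of that lowest level, and of where it stops: a model fit on samples from one distribution handles new samples from the same distribution, but degrades once the test data falls outside the training range. The regression setup below is an assumed example, not anything from the conversation.)

```python
import random

# Local generalization: a linear fit on samples from a known distribution does
# fine on fresh samples from that same distribution, but degrades once the test
# data comes from outside the training range.

random.seed(0)

def make_data(n, lo, hi):
    xs = [random.uniform(lo, hi) for _ in range(n)]
    ys = [x * x for x in xs]  # the true function is nonlinear
    return xs, ys

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def mse(a, b, xs, ys):
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

a, b = fit_line(*make_data(200, 0.0, 1.0))
print("same distribution  :", mse(a, b, *make_data(200, 0.0, 1.0)))  # small error
print("shifted distribution:", mse(a, b, *make_data(200, 3.0, 4.0)))  # much larger error
```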
So generalizing to the long tail of situations 02:02:51.060 |
and finally you would have extreme generalization, 02:02:56.660 |
but instead of just considering one specific domain 02:03:07.740 |
So a robot would be capable of extreme generalization 02:03:28.860 |
it would be capable of extreme generalization for instance. 02:03:32.300 |
- So the ultimate goal is extreme generalization. 02:03:39.020 |
that it could essentially achieve a human skill parity 02:03:55.540 |
And it would do so over basically the same range 02:04:05.020 |
of training experience of practice as humans would require. 02:04:07.980 |
That would be human level extreme generalization. 02:04:10.980 |
So I don't actually think humans are anywhere near 02:04:15.500 |
the optimal intelligence bound if there is such a thing. 02:04:27.820 |
a hard limit to how intelligent any system can be. 02:04:34.820 |
I don't think humans are anywhere near that limit. 02:04:40.820 |
I think you had this idea that we're only as intelligent 02:04:55.180 |
and we are bounded by the problems we try to solve. 02:05:18.340 |
because they are trying to fix one specific bottleneck 02:05:28.820 |
input and output of information in the brain. 02:05:36.380 |
bandwidth is not at this time a bottleneck at all, 02:05:40.260 |
meaning that we already have senses that enable us 02:05:53.260 |
to sort of play devil's advocate a little bit, 02:06:19.620 |
which is very different from brain computer interfaces. 02:06:28.380 |
if our brain has direct access to Wikipedia without- 02:06:31.940 |
- Your brain already has direct access to Wikipedia. 02:06:38.460 |
and your ears and so on to access that information. 02:06:49.580 |
which is why speed reading, for instance, does not work. 02:06:53.340 |
The faster you read, the less you understand. 02:07:19.140 |
So I think the speed at which you can take information in, 02:07:23.500 |
and even the speed at which you can output information 02:07:30.820 |
- I think that if you're a very, very fast typer, 02:07:34.460 |
the speed at which you can express your thoughts 02:07:36.740 |
is already the speed at which you can form your thoughts. 02:07:42.100 |
that there are fundamental bottlenecks to the human mind. 02:07:47.060 |
But it's possible that everything we have in the human mind 02:07:51.620 |
is just to be able to survive in the environment. 02:08:06.820 |
is a very valid and very powerful avenue, right? 02:08:26.700 |
- Not just computers, not just phones and the internet. 02:08:38.380 |
- And you can scale that externalised cognition 02:08:42.060 |
far beyond the capability of the human brain. 02:08:55.340 |
because it's not bound by individual brains. 02:09:05.380 |
First of all, it includes all the other biological systems, 02:09:14.740 |
- Non-human systems are probably not contributing much, 02:09:19.780 |
Like Google Search, for instance, is a big part of it. 02:09:31.180 |
Like how the world has changed in the past 20 years, 02:09:38.260 |
Of course, whoever created the simulation we're in 02:09:40.700 |
is probably doing metrics, measuring the progress. 02:09:44.940 |
There was probably a big spike in performance. 02:10:07.140 |
by doing a natural language open dialogue test 02:10:11.780 |
that's judged by humans as far as how well the machine did. 02:10:44.860 |
they may not themselves have any proper methodology. 02:10:49.740 |
They may not themselves have any proper definition 02:10:59.420 |
which is reliability, because you have biased human judges. 02:11:04.420 |
It's also violating the standardization requirement 02:11:12.140 |
because you are outsourcing everything that matters, 02:11:18.580 |
and finding a standardized test to measure it. 02:11:28.900 |
that when Turing proposed the imitation game, 02:11:42.540 |
He was using the imitation game as a thought experiment 02:11:48.900 |
in a philosophical discussion in his 1950 paper. 02:11:58.660 |
it should be possible for something very much 02:12:03.260 |
like the human mind, indistinguishable from the human mind 02:12:16.660 |
But nowadays I think it's fairly well accepted 02:12:20.220 |
that the mind is an information processing system 02:12:22.740 |
and that you could probably encode it into a computer. 02:12:25.500 |
So another reason why I'm not a fan of this type of test 02:12:55.620 |
In the same way that let's say you're doing physics 02:13:01.660 |
And what if the test that you set out to pass 02:13:12.740 |
And that is something that you can achieve with, 02:13:34.860 |
that's the hope with these kinds of subjective evaluations, 02:13:58.660 |
when they're actually talking to an algorithm. 02:14:13.980 |
We are constantly projecting emotions, intentions, 02:14:21.100 |
Agent-ness is one of our core innate priors, right? 02:14:24.300 |
We are projecting these things on everything around us. 02:14:47.940 |
the anthropomorphization that we naturally do, 02:15:01.060 |
But I still think it's really difficult to convince. 02:15:10.060 |
like there's formulations of the test you can create 02:15:29.180 |
So that's slightly better than just the imitation game. 02:15:41.780 |
it'll be useful for creating further intelligent systems. 02:16:03.540 |
But like most engineers are not really inspired by it. 02:16:13.780 |
There's something inspiring about it, I think. 02:16:17.700 |
- As a philosophical device in a philosophical discussion, 02:16:21.740 |
I think there is something very interesting about it. 02:16:40.980 |
That the first AI that will show strong generalization 02:16:53.100 |
they will not actually behave or look anything like humans. 02:16:57.260 |
Human likeness is the very last step in that process. 02:17:10.380 |
so I guess I usually agree with you on most things. 02:17:13.460 |
I remember you, I think at some point tweeting 02:17:17.100 |
not being counterproductive or something like that. 02:17:20.260 |
And I think a lot of very smart people agree with that. 02:17:23.020 |
Computationally speaking, not a very smart person. 02:17:31.500 |
I disagree with that 'cause I think there's some magic 02:17:36.940 |
So to play devil's advocate on your statement, 02:17:45.580 |
you have to in conversation show your ability 02:17:55.460 |
through not just like as a standalone system, 02:17:58.460 |
but through the process of like the interaction, 02:18:01.380 |
the game theoretic, where you really are changing 02:18:09.100 |
So in the ARC challenge, for example, you're an observer. 02:18:28.340 |
to generalizability. - Yeah, I think you make 02:18:51.820 |
And I think, so I love the idea of interactivity. 02:19:01.380 |
where your score on a task would not be one or zero, 02:19:10.380 |
that you can make before you hit the right solution, 02:19:16.900 |
the scientific method as you solve ARC tasks, 02:19:22.300 |
and probing the system to see whether the hypothesis, 02:19:26.540 |
the observation will match the hypothesis or not. 02:19:43.860 |
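(A minimal sketch of that attempts-based scoring idea: rather than a binary pass/fail per task, each extra guess the solver needs discounts its score. The discount schedule and the three-attempt cap are assumptions for illustration.)

```python
# Attempts-based scoring sketch: the score discounts each extra guess the
# solver needs before hitting the correct output grid.

def score_task(candidate_outputs, true_output, max_attempts=3):
    """Return a score in [0, 1]: 1.0 for a first-guess solve, less for later guesses."""
    for i, guess in enumerate(candidate_outputs[:max_attempts]):
        if guess == true_output:
            return 1.0 - i / max_attempts
    return 0.0

true_output = [[1, 0], [0, 1]]
guesses = [[[0, 1], [1, 0]], [[1, 0], [0, 1]]]  # wrong first guess, correct second
print(score_task(guesses, true_output))          # 0.666...
```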
so one thing that's interesting about this notion 02:19:59.660 |
you can actually probe that ambiguity, right? 02:20:05.700 |
which is how good can you adapt to the uncertainty 02:20:19.300 |
with which you reduce uncertainty in problem space, exactly. 02:20:23.020 |
- Very difficult to come up with that kind of test though. 02:20:28.340 |
In practice, it would be very, very difficult, but yes. 02:20:34.300 |
what you've done with the ARC challenge is brilliant. 02:20:37.540 |
I'm also surprised that it's not more popular, 02:20:41.980 |
- It has its niche, it has its niche, yeah. 02:20:44.020 |
- Yeah, what are your thoughts about another test 02:20:48.860 |
He has the Hutter Prize for compression of human knowledge, 02:20:59.620 |
What's your thoughts about this intelligence as compression? 02:21:09.260 |
Like you're given Wikipedia, basic English Wikipedia, 02:21:15.540 |
And so it stems from the idea that cognition is compression, 02:21:21.180 |
that the brain is basically a compression algorithm. 02:21:25.660 |
It's a very, I think, striking and beautiful idea. 02:21:36.900 |
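(As a rough illustration of the measurement behind the Hutter Prize style of benchmark: take some English text and see how small a compressor can make it; the prize itself scores the compressed size of a large Wikipedia dump plus the size of the decompressor. The snippet below uses zlib as a crude stand-in.)

```python
import zlib

# Crude "intelligence as compression" measurement: compare raw vs. compressed
# byte counts for a small stand-in string. The Hutter Prize does this at scale
# on a Wikipedia dump, counting the compressed file plus the decompressor.

text = ("Intelligence, measured this way, is the ability to find short "
        "descriptions of the data you have seen so far. ") * 50

raw = text.encode("utf-8")
compressed = zlib.compress(raw, 9)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {len(compressed) / len(raw):.3f}")
```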
So I no longer believe that cognition is compression. 02:21:44.540 |
So it's very easy to believe that cognition and compression 02:22:05.100 |
because compression is something that we do all the time, 02:22:17.940 |
We're constantly trying to organize things in our mind 02:22:40.140 |
that is used in many ways, but it's just a tool. 02:22:57.660 |
that include fundamental uncertainty and novelty. 02:23:06.980 |
And so they have 10 years of life experience. 02:23:16.620 |
If you were to generate the shortest behavioral program 02:23:25.340 |
over those 10 years in an optimal way, right? 02:23:37.620 |
this is what you would get if the mind of the child 02:23:47.860 |
to process the next 70 years in the life of that child. 02:23:59.100 |
we are not trying to make them actually optimally compressed. 02:24:06.740 |
to promote simplicity and efficiency in our models, 02:24:20.220 |
but that may turn out to be useful in the future 02:24:28.780 |
that cognition, that intelligence arises from, 02:24:37.980 |
what sort of context, environment, and situation 02:24:46.580 |
So an analogy that you can make is with investing, 02:24:51.580 |
for instance, if I look at the past, you know, 02:25:04.380 |
it's going to be, you know, you buy Apple stock, 02:25:21.100 |
you're not just going to be following the strategy 02:25:30.420 |
you're going to have a balanced portfolio, right? 02:25:34.860 |
Because you just don't know what's going to happen. 02:25:40.420 |
the compression is analogous to what you talked about, 02:25:47.820 |
It's much closer to that side of being able to generalize 02:25:59.900 |
so a lot of it is driven by play, driven by curiosity. 02:26:14.380 |
from our environment that seem to be completely useless 02:26:19.620 |
because they might turn out to be eventually useful, right? 02:26:26.860 |
And what makes it antagonistic to compression 02:26:29.220 |
is that it is about hedging for future uncertainty. 02:26:50.860 |
but not, however that quote goes, but not too simple. 02:26:54.940 |
So you want to, compression simplifies things, 02:27:16.620 |
is because fundamentally you don't know what you're doing. 02:27:22.060 |
is that it needs to behave appropriately in the future. 02:27:26.820 |
And it has no idea what the future is going to be like. 02:27:29.420 |
It's going to be a bit like the past, but not exactly like the past. 02:27:39.020 |
- Yeah, history repeats itself, but not perfectly. 02:27:54.500 |
but the bigger question from intelligence is of meaning. 02:27:59.300 |
Intelligence systems are kind of goal-oriented. 02:28:07.620 |
I mean, there's always a clean formulation of a goal, 02:28:21.540 |
Francois Chollet, do you think is the meaning of life? 02:28:38.060 |
And so, you know, the one thing that's very important 02:28:49.340 |
that makes up ourselves, that makes up who we are, 02:29:01.700 |
are expressed in words that you did not invent 02:29:16.820 |
What makes us different from animals, for instance, right? 02:29:20.220 |
So we are, everything about ourselves is an echo 02:29:25.140 |
of the past, an echo of people who lived before us, right? 02:29:31.380 |
And in the same way, if we manage to contribute something 02:29:40.140 |
a new idea, maybe a beautiful piece of music, 02:29:55.620 |
of the minds of future humans, essentially forever. 02:30:06.020 |
And in a way, this is our path to immortality 02:30:40.900 |
of the interactions between many different ripples 02:30:53.300 |
this seems like perhaps a naive thing to say, 02:30:56.060 |
but we should be kind to others during our time on earth 02:31:01.060 |
because every act of kindness creates ripples. 02:31:05.660 |
And in reverse, every act of violence also creates ripples. 02:31:16.580 |
- And in your case, first of all, beautifully put, 02:31:19.100 |
but in your case, creating ripples into the future human 02:31:30.700 |
- I don't think there's a better way to end it, Francois. 02:32:03.940 |
to get a discount and to support this podcast. 02:32:06.900 |
If you enjoy this thing, subscribe on YouTube, 02:32:24.840 |
in his "On the Measure of Intelligence" paper. 02:32:27.780 |
"If there were machines which bore a resemblance 02:32:32.840 |
"as closely as possible for all practical purposes, 02:32:42.120 |
"The first is that they could never use words 02:32:49.760 |
"For we can certainly conceive of a machine so constructed 02:32:59.520 |
"But it is not conceivable that such a machine 02:33:02.640 |
"should produce different arrangements of words 02:33:05.100 |
"so as to give an appropriately meaningful answer 02:33:12.760 |
Here Descartes is anticipating the Turing test 02:33:15.460 |
and the argument still continues to this day. 02:33:20.920 |
"even though some machines might do some things 02:33:23.360 |
"as well as we do them, or perhaps even better, 02:33:32.360 |
"but only from the disposition of their organs." 02:33:43.200 |
which can be used in all kinds of situations, 02:33:49.080 |
Hence, it is for all practical purposes impossible 02:33:52.120 |
for a machine to have enough different organs 02:33:54.300 |
to make it act in all the contingencies of life 02:34:01.360 |
That's the debate between mimicry memorization 02:34:07.240 |
So thank you for listening and hope to see you next time.