François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38
Chapters
0:00
11:43 Scale and Defining Intelligent Systems
15:23 Why an Intelligence Explosion Is Not Possible
29:22 Limits of Deep Learning
57:34 Can You Learn Rule-Based Models
58:06 The Field of Program Synthesis
63:07 Weight Agnostic Neural Networks
87:08 Algorithmic Bias
95:17 Turing Test
00:00:00.000 |
The following is a conversation with Francois Chollet. 00:00:03.760 |
He's the creator of Keras, which is an open source deep learning 00:00:13.600 |
It serves as an interface to several deep learning libraries, 00:00:19.040 |
And it was integrated into the TensorFlow main code 00:00:24.120 |
Meaning, if you want to create, train, and use neural networks, 00:00:37.280 |
and popular library, Francois is also a world-class AI 00:00:44.560 |
And he's definitely an outspoken, if not controversial, 00:01:01.000 |
give us five stars on iTunes, support on Patreon, 00:01:04.160 |
or simply connect with me on Twitter at Lex Fridman, 00:01:07.160 |
spelled F-R-I-D-M-A-N. And now, here's my conversation 00:01:14.880 |
You're known for not sugarcoating your opinions 00:01:22.840 |
So what's one of the more controversial ideas 00:01:26.360 |
you've expressed online and gotten some heat for? 00:01:33.880 |
Yeah, no, I think if you go through the trouble 00:01:41.880 |
Otherwise, what's even the point of having a Twitter account? 00:01:44.600 |
It's like having a nice car and just leaving it in the garage. 00:01:53.600 |
Perhaps, you know, that time I wrote something 00:02:27.480 |
that itself is a problem that could be solved by your AI. 00:02:30.520 |
And maybe it could be solved better than what humans can do. 00:02:33.760 |
So your AI could start tweaking its own algorithm, 00:02:36.840 |
could start being a better version of itself. 00:02:39.520 |
And so on, iteratively, in a recursive fashion. 00:02:55.880 |
first of all, because the notion of intelligence explosion 00:03:05.360 |
It considers intelligence as a property of a brain 00:03:19.040 |
Intelligence emerges from the interaction between a brain, 00:03:30.720 |
then you cannot really define intelligence anymore. 00:03:33.840 |
So just tweaking a brain to make it smaller and smaller 00:03:39.120 |
So first of all, you're crushing the dreams of many people. 00:03:48.720 |
people who think the universe is an information processing 00:03:54.640 |
Our brain is kind of an information processing system. 00:04:00.080 |
It doesn't make sense that there should be some-- 00:04:04.800 |
it seems naive to think that our own brain is somehow 00:04:08.160 |
the limit of the capabilities of this information. 00:04:18.080 |
able to build something that's on par with the brain, 00:04:39.080 |
so most people who are skeptical of that are kind of like, 00:04:51.360 |
the whole thing is shrouded in mystery where you can't really 00:04:57.840 |
This doesn't feel like that's how the brain works. 00:05:05.640 |
So one idea is that the brain doesn't exist alone. 00:05:18.200 |
improve the environment and the brain together, 00:05:21.080 |
almost, in order to create something that's much smarter 00:05:28.520 |
of course, we don't have a definition of intelligence. 00:05:31.920 |
I don't think-- if you look at very smart people today, 00:05:37.920 |
I don't think their brain and the performance of their brain 00:05:41.200 |
is the bottleneck to their expressed intelligence, 00:05:47.200 |
You cannot just tweak one part of this system, 00:05:53.440 |
and expect the capabilities, like what emerges out 00:05:56.640 |
of this system, to just explode exponentially. 00:06:04.120 |
of a system with many interdependencies like this, 00:06:09.560 |
And I don't think even today, for very smart people, 00:06:12.320 |
their brain is not the bottleneck to the sort 00:06:21.480 |
they're not actually solving any big scientific problems. 00:06:24.840 |
They're like Einstein, but the patent clerk days. 00:06:32.640 |
was a meeting of a genius with a big problem at the right time. 00:06:39.480 |
But maybe this meeting could have never happened. 00:06:42.520 |
And then Einstein would have just been a patent clerk. 00:06:48.000 |
like genius level smart, but you wouldn't know, 00:06:52.280 |
because they're not really expressing any of that. 00:06:55.680 |
So we can think of the world, Earth, but also the universe 00:07:02.760 |
So all of these problems and tasks are roaming it, 00:07:10.120 |
and animals and so on that are also roaming it. 00:07:17.640 |
But without that coupling, you can't demonstrate 00:07:32.240 |
All you're left with is potential intelligence, 00:07:36.240 |
or how high your IQ is, which in itself is just a number. 00:07:46.160 |
What do you think of as problem solving capacity? 00:07:55.160 |
Like what does it mean to be more or less intelligent? 00:08:00.000 |
Is it completely coupled to a particular problem 00:08:03.000 |
or is there something a little bit more universal? 00:08:09.080 |
Even human intelligence has some degree of generality. 00:08:12.200 |
Well, all intelligence systems have some degree of generality 00:08:15.360 |
but they're always specialized in one category of problems. 00:08:21.880 |
in the human experience and that shows at various levels. 00:08:25.560 |
That shows in some prior knowledge that's innate 00:08:32.040 |
Knowledge about things like agents, goal-driven behavior, 00:08:48.600 |
It's very, very easy for us to learn certain things 00:08:52.040 |
because we are basically hard-coded to learn them. 00:08:54.920 |
And we are specialized in solving certain kinds of problem 00:09:08.800 |
We have no capability of seeing the very long-term. 00:09:21.360 |
are we talking about scale of years, millennia? 00:09:24.880 |
What do you mean by long-term we're not very good? 00:09:34.240 |
Even within one lifetime, we have a very hard time 00:09:47.000 |
- We can solve only fairly narrowly scoped problems. 00:09:53.760 |
we are not actually doing it on an individual level. 00:09:59.320 |
We have this thing called civilization, right? 00:10:03.080 |
Which is itself a sort of problem solving system, 00:10:06.640 |
a sort of artificial intelligence system, right? 00:10:30.280 |
on a much greater scale than any individual human. 00:10:33.760 |
If you look at computer science, for instance, 00:10:51.360 |
as an institution is a kind of artificially intelligent 00:10:55.720 |
problem solving algorithm that is superhuman. 00:11:02.800 |
is like a theorem prover at a scale of thousands, 00:11:10.440 |
At that scale, what do you think is an intelligent agent? 00:11:14.680 |
So there's us humans at the individual level, 00:11:18.320 |
there is millions, maybe billions of bacteria in our skin. 00:11:29.200 |
as systems that behave, you can say intelligently 00:11:36.720 |
as a single organism, you can look at our galaxy 00:11:46.320 |
And we're here at Google, there is millions of devices 00:11:53.440 |
How do you think about intelligence versus scale? 00:11:55.920 |
- You can always characterize anything as a system. 00:12:03.640 |
like intelligence explosion tend to focus on one agent 00:12:07.440 |
is basically one brain, like one brain considered 00:12:10.200 |
in isolation, like a brain in a jar that's controlling 00:12:12.680 |
a body in a very like top to bottom kind of fashion. 00:12:16.320 |
And that body is pursuing goals into an environment. 00:12:20.720 |
You have the brain at the top of the pyramid, 00:12:22.880 |
then you have the body just plainly receiving orders 00:12:28.920 |
So everything is subordinate to this one thing, 00:12:39.240 |
There is no strong delimitation between the brain 00:12:50.760 |
So you have to look at an entire animal as one agent, 00:12:53.960 |
but then you start realizing as you observe an animal 00:12:57.000 |
over any length of time, that a lot of the intelligence 00:13:16.000 |
So it's externalized in books, it's externalized 00:13:49.960 |
as intelligence explosion in a specific task? 00:13:54.720 |
And then, well, yeah, do you think it's possible 00:13:58.560 |
to have a category of tasks on which you do have something 00:14:07.440 |
- I think if you consider a specific vertical, 00:14:14.640 |
I also don't think we have to speculate about it 00:14:21.640 |
of recursively self-improving intelligent systems. 00:14:26.280 |
- So for instance, science is a problem solving system, 00:14:31.960 |
like a system that experiences the world in some sense 00:14:35.640 |
and then gradually understands it and can act on it. 00:14:47.520 |
Technology can be used to build better tools, 00:14:50.160 |
better computers, better instrumentation and so on, 00:14:52.840 |
which in turn can make science faster, right? 00:14:56.680 |
So science is probably the closest thing we have today 00:15:00.520 |
to a recursively self-improving superhuman AI. 00:15:12.760 |
And you can use that as a basis to try to understand 00:15:23.280 |
What is your intuition why an intelligence explosion 00:15:33.200 |
why can't we slightly accelerate that process? 00:15:43.120 |
So recursive self-improvement is absolutely a real thing. 00:15:48.120 |
But what happens with a recursively self-improving system 00:15:58.640 |
means that suddenly another part of the system 00:16:09.040 |
scientific progress is not actually exploding. 00:16:16.480 |
that is consuming an exponentially increasing 00:16:25.960 |
And maybe that will seem like a very strong claim. 00:16:43.080 |
For instance, the number of papers being published, 00:16:53.600 |
with how many people are working on science today. 00:16:58.480 |
So it's actually an indicator of resource consumption. 00:17:03.200 |
is progress in terms of the knowledge that science generates 00:17:12.520 |
And some people have actually been trying to measure that. 00:17:23.720 |
So his approach to measure scientific progress 00:17:28.360 |
was to look at the timeline of scientific discoveries 00:17:46.760 |
And if the output of science as an institution 00:17:58.160 |
Maybe because there's a faster rate of discoveries, 00:18:00.960 |
maybe because the discoveries are increasingly 00:18:07.840 |
this temporal density of significance measured in this way, 00:18:16.600 |
across physics, biology, medicine, and so on. 00:18:19.720 |
And it actually makes a lot of sense if you think about it, 00:18:37.560 |
when we started having electricity and so on. 00:18:41.520 |
And today is also a time of very, very fast change, 00:18:50.560 |
are moving way faster than they did 50 years ago, 00:19:10.560 |
- And you can check out the paper that Michael Nielsen 00:19:30.440 |
Like the very first person to work on information theory. 00:19:37.920 |
there's a lot of low hanging fruit you can pick. 00:19:50.160 |
probably larger numbers, smaller discoveries. 00:19:57.480 |
And that's exactly the picture you're seeing with science, 00:20:00.040 |
is that the number of scientists and engineers 00:20:11.840 |
So the resource consumption of science is exponential, 00:20:23.080 |
and even though science is recursively self-improving, 00:20:42.680 |
The internet is a technology that's made possible 00:20:47.440 |
And itself, because it enables scientists to network, 00:20:52.400 |
to communicate, to exchange papers and ideas much faster, 00:21:04.080 |
to produce the same amount of problem-solving very much. 00:21:11.080 |
And certainly that holds for the deep learning community. 00:21:14.920 |
If you look at the temporal, what did you call it? 00:21:29.000 |
in deep learning, they might even be decreasing. 00:21:32.360 |
- So I do believe the per paper significance is decreasing. 00:21:45.840 |
my guess is that you would see a linear progress. 00:21:49.680 |
- If you were to sum the significance of all papers, 00:22:03.600 |
that you're seeing linear progress in science 00:22:10.280 |
is dynamically adjusting itself to maintain linear progress 00:22:15.280 |
because we as a community expect linear progress, 00:22:22.320 |
it means that suddenly there are some lower hanging fruits 00:22:25.720 |
that become available and someone's gonna step up 00:22:31.240 |
So it's very much like a market for discoveries and ideas. 00:23:07.400 |
suddenly some other part becomes a bottleneck. 00:23:10.680 |
For instance, let's say we develop some device 00:23:27.880 |
So the air around it is gonna generate friction. 00:23:34.320 |
And even if you were to consider the broader context 00:23:51.960 |
when you look at the problem-solving algorithm 00:23:55.000 |
that is being run by science as an institution, 00:24:02.080 |
despite having this recursive self-improvement component, 00:24:15.000 |
in terms of communication across researchers. 00:24:19.000 |
you were mentioning quantum mechanics, right? 00:24:23.040 |
Well, if you want to start making significant discoveries 00:24:26.960 |
today, significant progress in quantum mechanics, 00:24:44.120 |
And of course, the significant practical experiments 00:24:47.520 |
are going to require exponentially expensive equipment 00:24:52.240 |
because the easier ones have already been run, right? 00:25:01.840 |
there's no way of escaping this kind of friction 00:25:09.840 |
- Yeah, no, I think science is a very good way 00:25:14.080 |
with a superhuman recursively self-improving AI. 00:25:20.880 |
It's not like a mathematical proof of anything. 00:25:28.800 |
to question the narrative of intelligence explosion, 00:25:33.800 |
And you do get a lot of pushback if you go against it. 00:25:40.200 |
AI is not just a subfield of computer science. 00:25:44.920 |
Like this belief that the world is headed towards an event, 00:25:52.800 |
AI will become, will go exponential very much 00:26:04.840 |
because it is not really a scientific argument, 00:26:16.480 |
It's almost like saying God doesn't exist or something. 00:26:27.640 |
they might not be as eloquent or explicit as you're being, 00:26:34.040 |
anything that you could call AI, quote unquote, 00:26:39.160 |
They might not be describing in the same kind of way. 00:26:45.040 |
is from people who get attached to the narrative 00:27:04.000 |
past the singularity, that what people imagine 00:27:08.620 |
Do you have, if you were put on your psychology hat, 00:27:17.280 |
the ways that all of human civilization will be destroyed? 00:27:30.600 |
If you look at the mythology of most civilizations, 00:27:34.400 |
it's about the world being headed towards some final events 00:27:44.960 |
like the apocalypse followed by paradise probably, right? 00:27:49.480 |
It's a very appealing story on a fundamental level. 00:27:54.600 |
We all need stories to structure the way we see the world, 00:28:04.520 |
- So on a more serious, non-exponential explosion question, 00:28:14.960 |
when we'll create something like human-level intelligence 00:28:19.800 |
or intelligent systems that will make you sit back 00:28:23.800 |
and be just surprised at damn how smart this thing is? 00:28:32.120 |
but what's your sense of the timeline and so on 00:28:35.560 |
that you'll be really surprised at certain capabilities? 00:28:41.040 |
And we'll talk about limitations in deep learning. 00:28:42.520 |
So when do you, do you think in your lifetime 00:28:46.600 |
- Around 2013, 2014, I was many times surprised 00:28:51.400 |
by the capabilities of deep learning, actually. 00:28:55.840 |
what deep learning could do and could not do. 00:28:57.840 |
And it felt like a time of immense potential. 00:29:07.120 |
- Was there a moment, there must've been a day in there 00:29:10.800 |
where your surprise was almost bordering on the belief 00:29:19.040 |
Was there a moment, 'cause you've written quite eloquently 00:29:32.400 |
What was really shocking is that it's worked. 00:29:37.640 |
But there's a big jump between being able to do 00:29:41.600 |
really good computer vision and human level intelligence. 00:29:44.880 |
So I don't think at any point I was under the impression 00:29:51.280 |
meant that we were very close to human level intelligence. 00:29:54.120 |
I don't think we were very close to human level intelligence. 00:29:56.080 |
I do believe that there's no reason why we won't achieve it 00:30:01.800 |
I also believe that, you know, it's the problem 00:30:08.600 |
that implicitly you're considering like an axis 00:30:14.400 |
But that's not really how intelligence works. 00:30:22.520 |
but there's also the question of being human-like. 00:30:29.760 |
intelligent agents that are not human-like at all. 00:30:32.720 |
And you can also build very human-like agents. 00:30:35.280 |
And these are two very different things, right? 00:30:38.800 |
Let's go from the philosophical to the practical. 00:30:46.520 |
that you kind of remember in relation to Keras 00:30:48.560 |
and in general, TensorFlow, Theano, the old days. 00:30:52.080 |
Can you give a brief overview, Wikipedia style history 00:30:55.440 |
and your role in it before we return to AGI discussions? 00:31:10.240 |
So I started working on it in February, 2015. 00:31:14.840 |
And so at the time, there weren't too many people 00:31:17.280 |
working on deep learning, maybe like fewer than 10,000. 00:31:20.400 |
The software tooling was not really developed. 00:31:38.960 |
Caffe was the one library that everyone was using 00:31:43.480 |
- And computer vision was the most popular problem 00:31:47.040 |
- Absolutely, like ConvNets was like the subfield 00:31:49.920 |
of deep learning that everyone was working on. 00:31:53.160 |
So myself, so in late 2014, I was actually interested 00:32:17.600 |
I had used Caffe, and there was no good solution 00:32:26.160 |
There was no reusable open source implementation 00:32:44.480 |
that was kind of not obvious is that the models 00:32:50.520 |
which was kind of like going against the mainstream 00:32:54.560 |
at the time because Caffe, PyLearn 2, and so on, 00:32:58.160 |
like all the big libraries were actually going 00:33:00.800 |
with the approach of having static configuration files 00:33:05.760 |
So some libraries were using code to define models, 00:33:12.480 |
Lasagne was like a Theano-based, very early library 00:33:16.880 |
that was, I think, developed, I'm not sure exactly, 00:33:28.360 |
and the value proposition at the time was that 00:33:32.520 |
not only that what I think was the first reusable 00:33:44.440 |
with the same library, which was not really possible before. 00:33:53.040 |
so before I was using Theano, I was actually using Scikit-Learn 00:33:58.320 |
So I drew a lot of inspiration from Scikit-Learn 00:34:02.400 |
It's almost like Scikit-Learn for neural networks. 00:34:22.640 |
It's magical in the sense that it's delightful, right? 00:34:42.720 |
you made me realize that that was a design decision at all, 00:34:49.880 |
whether the YAML, especially since Caffe was the most popular. 00:34:53.200 |
- It was the most popular by far at the time. 00:34:56.040 |
- If I were, yeah, I didn't like the YAML thing, 00:35:02.840 |
in a configuration file the definition of a model. 00:35:27.240 |
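For readers who haven't seen the contrast, here is a minimal sketch of the code-as-model-definition approach Keras took, as opposed to a static configuration file; the syntax below is current tf.keras, not the 2015 API:

```python
import tensorflow as tf

# Defining the model directly in Python code rather than in a static
# YAML/prototxt configuration file: the architecture is just objects you compose.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # the whole definition lives in code, easy to inspect and modify
```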
Lots of people were starting to be interested in LSTM. 00:35:32.440 |
because it was offering an easy to use LSTM implementation. 00:35:35.560 |
Exactly at the time where lots of people started 00:35:37.680 |
to be intrigued by the capabilities of RNNs for NLP. 00:35:51.520 |
and that was actually completely unrelated to Keras. 00:36:00.720 |
So I was doing computer vision research at Google initially. 00:36:05.520 |
I was exposed to the early internal version of TensorFlow. 00:36:13.920 |
and it was definitely the way it was at the time, 00:36:15.720 |
is that this was an improved version of Theano. 00:36:26.800 |
And I was actually very busy as a new Googler. 00:36:34.520 |
But then in November, I think it was November 2015, 00:36:41.240 |
And it was kind of like my wake up call that, 00:36:44.720 |
hey, I had to actually go and make it happen. 00:36:47.320 |
So in December, I ported Keras to run on top of TensorFlow. 00:36:55.280 |
where I was abstracting away all the backend functionality 00:37:07.440 |
And for the next year, Theano stayed as the default option. 00:37:20.400 |
It was much faster, especially when it came to RNNs. 00:37:30.160 |
has similar architectural decisions as Theano. 00:37:52.160 |
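As a rough illustration (not the exact historical code), the multi-backend design meant user code called a single keras.backend module, and a config switch decided whether Theano or TensorFlow executed the ops underneath:

```python
# ~/.keras/keras.json selects the engine, e.g. {"backend": "tensorflow"} or "theano".
import numpy as np
from keras import backend as K  # multi-backend Keras, pre tf.keras

x = K.placeholder(shape=(None, 32))                     # symbolic input tensor
w = K.variable(np.random.uniform(-0.1, 0.1, (32, 8)))   # backend-managed variable
y = K.dot(x, w)  # the same call is dispatched to whichever backend is configured
```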
And even though it grew to have a lot of users 00:37:55.800 |
for a deep learning library at the time, like throughout 2016, 00:37:55.800 |
I think it must have been maybe October 2016. 00:38:32.880 |
And so Rajat was saying, "Hey, we saw Keras, we like it. 00:38:47.280 |
And I was like, "Yeah, that sounds like a great opportunity. 00:38:50.400 |
And so I started working on integrating the Keras API 00:38:57.320 |
So what followed up is a sort of like temporary 00:39:17.560 |
- Well, it's kind of funny that somebody like you 00:39:22.280 |
who dreams of, or at least sees the power of AI systems 00:39:27.280 |
that reason and theorem proving we'll talk about 00:39:39.000 |
that is deep learning, super accessible, super easy. 00:39:49.080 |
But so TensorFlow 2.0 is kind of, there's a sprint. 00:39:56.920 |
What do you look, what are you working on these days? 00:40:05.720 |
There's so many things that just make it a lot easier 00:40:13.560 |
What are the problems you have to kind of solve? 00:40:26.400 |
It's a delightful product compared to TensorFlow 1.0. 00:40:31.400 |
So on the Keras side, what I'm really excited about is that, 00:40:37.360 |
so, you know, previously Keras has been this very easy 00:40:42.040 |
to use high-level interface to do deep learning. 00:40:55.600 |
was probably not the optimal way to do things 00:40:58.640 |
compared to just writing everything from scratch. 00:41:01.120 |
So in some way, the framework was getting in the way. 00:41:04.920 |
And in TensorFlow 2.0, you don't have this at all, 00:41:07.760 |
actually, you have the usability of the high-level interface, 00:41:11.280 |
but you have the flexibility of this lower-level interface. 00:41:21.760 |
and flexibility trade-offs depending on your needs, right? 00:41:29.880 |
and you get a lot of help doing so by, you know, 00:41:33.200 |
subclassing models and writing some train loops 00:41:51.520 |
and, you know, are ideal for a data scientist, 00:42:05.000 |
that are more or less low-level, more or less high-level 00:42:10.440 |
profiles ranging from researchers to data scientists 00:42:21.680 |
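A minimal sketch of the two ends of that spectrum in TensorFlow 2-style code: the high-level Sequential plus compile/fit workflow, and the lower-level subclassed model with a hand-written training step. Names like TinyModel and the commented-out data are illustrative assumptions, not from the conversation:

```python
import tensorflow as tf

# High-level workflow: Sequential + compile + fit.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)  # assuming x_train / y_train exist

# Lower-level workflow: subclassed model and a custom training step.
class TinyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.dense2(self.dense1(x))

tiny = TinyModel()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = tiny(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, tiny.trainable_variables)
    optimizer.apply_gradients(zip(grads, tiny.trainable_variables))
    return loss
```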
You can go on mobile, you can go with TensorFlow Lite, 00:42:24.520 |
you can go in the cloud with serving and so on, 00:42:37.240 |
So with Google, you're now seeing sort of Keras 00:43:12.600 |
is actually discussing design discussions, right? 00:43:18.560 |
participating in design review meetings and so on. 00:43:29.280 |
that is taken in coming up with these decisions 00:43:37.080 |
because TensorFlow has this extremely diverse user base, 00:43:57.480 |
- If I just look at the standard debates of C++ or Python, 00:44:05.920 |
I mean, they're not heated in terms of emotionally, 00:44:08.000 |
but there's probably multiple ways to do it right. 00:44:10.720 |
So how do you arrive through those design meetings 00:44:15.360 |
Especially in deep learning where the field is evolving 00:44:25.240 |
- I don't know if there's magic to the process, 00:44:36.120 |
but also trying to do so in the simplest way possible, 00:44:45.000 |
So you don't want to naively satisfy the constraints 00:44:49.160 |
by just, you know, for each capability you need available, 00:44:59.560 |
and hierarchical so that they have an API surface 00:45:06.080 |
And you want this modular hierarchical architecture 00:45:19.880 |
you're reading a tutorial or some docs pages, 00:45:28.240 |
You already have like certain concepts in mind 00:45:32.360 |
and you're thinking about how they relate together. 00:45:37.240 |
you're trying to build as quickly as possible 00:45:40.360 |
a mapping between the concepts featured in your API 00:45:48.920 |
as a domain expert to the way things work in the API. 00:45:53.640 |
So you need an API and an underlying implementation 00:45:57.080 |
that are reflecting the way people think about these things. 00:46:00.120 |
- So you're minimizing the time it takes to do the mapping. 00:46:06.600 |
in ingesting this new knowledge about your API. 00:46:15.560 |
It should only be referring to domain specific concepts 00:46:24.480 |
So what's the future of Keras and TensorFlow look like? 00:46:29.640 |
- So that's kind of too far in the future for me to answer, 00:46:33.680 |
especially since I'm not even the one making these decisions. 00:46:43.200 |
among many different perspectives on the TensorFlow team, 00:46:46.040 |
I'm really excited by developing even higher level APIs, 00:46:53.600 |
I'm really excited by hyper parameter tuning, 00:47:03.200 |
defining a model like you were assembling Lego blocks 00:47:16.080 |
and optimize the objective you're after, right? 00:47:23.040 |
- Yeah, so you put the baby into a room with the problem 00:47:35.920 |
that's really good at Legos and a box of Legos 00:47:44.120 |
And I think there's a huge amount of applications 00:47:46.080 |
and revolutions to be had under the constraints 00:47:57.480 |
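One concrete way this "assemble Lego blocks and let the machine tune them" workflow can look is with the separate KerasTuner library; using that library here is my assumption for illustration, it isn't named in the conversation:

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # The search space: the tuner picks how many blocks and how big they are.
    model = tf.keras.Sequential()
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(tf.keras.layers.Dense(hp.Int(f"units_{i}", 32, 256, step=32),
                                        activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Float("lr", 1e-4, 1e-2, sampling="log")),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))  # data assumed
```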
If we look specifically at these function approximators 00:48:06.160 |
So you've talked about local versus extreme generalization. 00:48:10.160 |
You mentioned that neural networks don't generalize well, 00:48:16.280 |
So, and you've also mentioned that generalization, 00:48:19.880 |
extreme generalization requires something like reasoning 00:48:23.960 |
So how can we start trying to build systems like that? 00:48:30.600 |
Deep learning models are like huge parametric models 00:48:39.440 |
that go from an input space to an output space. 00:48:44.120 |
So they're trained pretty much point by point. 00:48:47.200 |
They're learning a continuous geometric morphing 00:48:50.520 |
from an input vector space to an output vector space. 00:49:02.200 |
of points in experience space that are very close 00:49:05.880 |
to things that it has already seen in the stream data. 00:49:08.560 |
At best, it can do interpolation across points. 00:49:17.360 |
you need a dense sampling of the input cross output space, 00:49:26.560 |
if you're dealing with complex real world problems 00:49:29.320 |
like autonomous driving, for instance, or robotics. 00:49:40.920 |
And it's only gonna be able to make sense of things 00:49:44.240 |
that are very close to what it has seen before. 00:49:50.160 |
but even if you're not looking at human intelligence, 00:49:53.200 |
you can look at very simple rules, algorithms. 00:49:58.040 |
it can actually apply to a very, very large set of inputs 00:50:04.840 |
It is not obtained by doing a point by point mapping, right? 00:50:09.840 |
For instance, if you try to learn a sorting algorithm 00:50:15.520 |
well, you're very much limited to learning point by point 00:50:18.480 |
what the sorted representation of this specific list 00:50:23.320 |
is like, but instead you could have a very, very simple 00:50:34.400 |
and it can process any list at all because it is abstract, 00:50:42.240 |
So deep learning is really like point by point 00:50:45.160 |
geometric morphings, trained with gradient descent. 00:50:48.600 |
And meanwhile, abstract rules can generalize much better. 00:50:53.600 |
And I think the future is really to combine the two. 00:50:56.680 |
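To make the contrast concrete, a toy sketch: a point-by-point mapping only covers inputs it has effectively memorized or can interpolate between, while a few lines of abstract program handle any list at all:

```python
# Point-by-point "learning": a lookup over seen examples; anything outside
# the sampled region (here, literally any unseen input) is not covered.
memorized = {(3, 1, 2): (1, 2, 3), (5, 4): (4, 5)}

def sort_by_memorization(xs):
    return memorized.get(tuple(xs))  # returns None for any unseen list

# Abstract rule: a short program that generalizes to every possible list.
def insertion_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

print(sort_by_memorization([9, 7, 8]))  # None: outside the sampled data
print(insertion_sort([9, 7, 8]))        # [7, 8, 9]: works for any input
```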
- So how do we, do you think, combine the two? 00:50:59.680 |
How do we combine good point by point functions 00:51:03.520 |
with programs, which is what the symbolic AI type systems? 00:51:12.080 |
I mean, obviously we're jumping into the realm 00:51:17.360 |
You just kind of ideas and intuitions and so on. 00:51:20.760 |
- Well, if you look at the really successful AI systems 00:51:23.520 |
today, I think they are already hybrid systems 00:51:26.320 |
that are combining symbolic AI with deep learning. 00:51:39.400 |
At the same time, they're using deep learning 00:51:43.840 |
Sometimes they're using deep learning as a way to inject 00:51:50.920 |
If you look at the system like in a self-driving car, 00:51:54.560 |
it's not just one big end-to-end neural network, 00:52:00.760 |
you would need a dense sampling of experience space 00:52:08.880 |
Instead, the self-driving car is mostly symbolic, 00:52:13.280 |
you know, it's software, it's programmed by hand. 00:52:21.640 |
in this case, mostly 3D models of the environment 00:52:25.840 |
around the car, but it's interfacing with the real world 00:52:31.240 |
- Right, so the deep learning there serves as a way 00:52:38.360 |
Okay, well, let's linger on that a little more. 00:52:59.520 |
let's talk about steering, so staying inside the lane. 00:53:05.080 |
Lane following, yeah, it's definitely a problem 00:53:07.080 |
you can solve with an end-to-end deep learning model, 00:53:11.600 |
I don't know why you're jumping from the extreme so easily, 00:53:25.840 |
I think in general, you know, there is no hard limitations 00:53:30.840 |
to what you can learn with a deep neural network, 00:53:42.240 |
this dense sampling of the input cross output space. 00:53:58.720 |
and think what kind of problems can be solved 00:54:08.000 |
So let's think about natural language dialogue, 00:54:21.120 |
- Well, the Turing test is all about tricking people 00:54:26.880 |
And I don't think that's actually very difficult 00:54:29.040 |
because it's more about exploiting human perception 00:54:39.680 |
intelligent behavior and actual intelligent behavior. 00:54:42.080 |
- So, okay, let's look at maybe the Alexa Prize and so on, 00:54:45.360 |
the different formulations of the natural language 00:54:50.520 |
and more about maintaining a fun conversation 00:54:59.080 |
but it's more about being able to carry forward 00:55:01.440 |
a conversation with all the tangents that happen 00:55:14.520 |
- So I think it would be very, very challenging 00:55:17.800 |
I don't think it's out of the question either. 00:55:26.920 |
What's your sense about the space of those problems? 00:55:36.200 |
In practice, while deep learning is a great fit 00:55:50.240 |
or rules that you can generate by exhaustive search 00:55:59.360 |
as long as you have a sufficient training data set. 00:56:21.120 |
the three-dimensional structure and relationships 00:56:25.600 |
or really that's where symbolic AI has to step in? 00:56:28.320 |
- Well, it's always possible to solve these problems 00:56:38.640 |
A model would be, an explicit rule-based abstract model 00:56:42.080 |
would be a far better, more compressed representation 00:56:48.400 |
between in this situation, this thing happens. 00:56:54.800 |
- Do you think it's possible to automatically generate 00:56:57.480 |
the programs that would require that kind of reasoning? 00:57:02.200 |
Or does it have to, so the way the expert systems fail, 00:57:08.960 |
Do you think it's possible to learn those logical statements 00:57:13.480 |
that are true about the world and their relationships? 00:57:17.280 |
Do you think, I mean, that's kind of what theorem proving 00:57:22.680 |
- Yeah, except it's much harder to formulate statements 00:57:30.320 |
Statements about the world tend to be subjective. 00:57:43.600 |
However, today we just don't really know how to do it. 00:57:48.000 |
So it's very much a graph search or tree search problem. 00:57:52.400 |
And so we are limited to the sort of tree search 00:57:56.480 |
and graph search algorithms that we have today. 00:57:58.560 |
Personally, I think genetic algorithms are very promising. 00:58:05.600 |
- Can you discuss the field of program synthesis? 00:58:08.840 |
Like how many people are working and thinking about it? 00:58:13.360 |
What, where we are in the history of program synthesis 00:58:20.760 |
- Well, if it were deep learning, this is like the '90s. 00:58:24.600 |
So meaning that we already have existing solutions. 00:58:29.160 |
We are starting to have some basic understanding 00:58:35.520 |
but it's still a field that is in its infancy. 00:58:42.840 |
So the one real-world application I'm aware of 00:58:50.800 |
It's a way to automatically learn very simple programs 00:59:00.280 |
For instance, learning a way to format a date, 00:59:04.640 |
- You know, okay, that's a fascinating topic. 00:59:06.240 |
I always wonder when I provide a few samples to Excel, 00:59:19.680 |
And it's fascinating whether that's learnable patterns. 00:59:43.880 |
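A toy sketch of what that kind of program synthesis can look like under the hood: an enumerative, discrete search over a tiny domain-specific language for a pipeline consistent with a few input/output examples. This is illustrative only, not how Flash Fill is actually implemented, and the same search could be driven by a genetic algorithm instead of exhaustive enumeration:

```python
from itertools import product

# A tiny DSL of string operations to search over.
OPS = {
    "strip": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "first_word": lambda s: s.split()[0],
    "last_word": lambda s: s.split()[-1],
}

def synthesize(examples, max_depth=3):
    """Return the shortest op pipeline consistent with all (input, output) pairs."""
    for depth in range(1, max_depth + 1):
        for pipeline in product(OPS, repeat=depth):
            def run(s, pipeline=pipeline):
                for name in pipeline:
                    s = OPS[name](s)
                return s
            if all(run(x) == y for x, y in examples):
                return pipeline
    return None

examples = [("  Ada Lovelace ", "ADA"), (" Alan Turing", "ALAN")]
print(synthesize(examples))  # e.g. ('upper', 'first_word')
```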
even though we couldn't really see its potential quite. 00:59:58.200 |
in general, discrete search over rule-based models 01:00:06.640 |
And that doesn't mean we're gonna drop deep learning. 01:00:24.800 |
given lots of data is just extremely powerful. 01:00:27.800 |
So we are still gonna be working on deep learning 01:00:30.240 |
and we're gonna be working on program synthesis. 01:00:42.280 |
About 10,000 deep learning papers have been written 01:00:45.200 |
about hard coding priors about a specific task 01:00:56.960 |
but really what they're doing is hard coding some priors 01:01:01.560 |
But which gets straight to the point is probably true. 01:01:05.960 |
So you say that you can always buy performance, 01:01:09.280 |
buy in quotes performance by either training on more data, 01:01:12.960 |
better data, or by injecting task information 01:01:19.920 |
about the generalization power of the techniques used, 01:01:24.240 |
Do you think we can go far by coming up with better methods 01:01:29.960 |
for better methods of large-scale annotation of data? 01:01:35.240 |
- If you had made it, it's not cheating anymore. 01:01:43.080 |
about something that hasn't, from my perspective, 01:01:48.280 |
been researched too much is exponential improvement 01:01:58.200 |
- I think it's actually been researched quite a bit. 01:02:07.920 |
Sometimes they're gonna release a new benchmark. 01:02:15.800 |
into data annotation and good data annotation pipelines, 01:02:22.720 |
but do you think there's innovation happening? 01:02:33.880 |
You want to generate knowledge that can be reused 01:02:38.880 |
across different datasets, across different tasks. 01:02:51.440 |
this is no more useful than training a network 01:02:55.840 |
and then saying, "Oh, I found these weight values 01:03:12.120 |
because it really illustrates the fact that an architecture, 01:03:16.360 |
even without weights, an architecture is knowledge 01:03:45.160 |
For instance, I don't know if you've looked at the bAbI dataset, 01:03:50.160 |
which is about natural language question answering, 01:04:06.200 |
you can solve this dataset with nearly 100% accuracy. 01:04:12.280 |
about how to solve question answering in general, 01:04:27.720 |
where he says, "The biggest lesson that we can read 01:04:30.360 |
"from 70 years of AI research is that general methods 01:04:46.640 |
by just having something that leverages computation 01:04:51.560 |
- Yeah, so I think Rich is making a very good point, 01:04:56.840 |
which are actually all about manually hard-coding 01:05:00.960 |
prior knowledge about a task into some system, 01:05:04.080 |
doesn't have to be deep learning architecture, 01:05:07.040 |
You know, these papers are not actually making any impact. 01:05:11.760 |
Instead, what's making really long-term impact 01:05:18.520 |
that are really agnostic to all these tricks, 01:05:23.360 |
And of course, the one general and simple thing 01:05:27.480 |
that you should focus on is that which leverages computation, 01:05:36.200 |
of large-scale computation has been increasing exponentially 01:05:40.600 |
So if your algorithm is all about exploiting this, 01:05:44.120 |
then your algorithm is suddenly exponentially improving. 01:05:52.440 |
However, you know, he's right about the past 70 years. 01:05:59.560 |
I am not sure that this assessment will still hold true 01:06:04.960 |
It might to some extent, I suspect it will not, 01:06:18.400 |
Like Moore's law might not be applicable anymore, 01:06:32.960 |
some other aspects starts becoming the bottleneck. 01:06:41.520 |
And I think we're already starting to be in a regime 01:06:49.320 |
and the quality of data and the scale of data 01:07:00.840 |
So I think we are gonna move from a focus on the scale 01:07:05.840 |
of computation to a focus on data efficiency. 01:07:10.760 |
So that's getting to the question of symbolic AI, 01:07:13.120 |
but to linger on the deep learning approaches, 01:07:16.160 |
do you have hope for either unsupervised learning 01:07:31.560 |
- So unsupervised learning and reinforcement learning 01:07:39.000 |
So usually when people say reinforcement learning, 01:07:41.200 |
what they really mean is deep reinforcement learning, 01:07:47.440 |
The question I was asking was unsupervised learning 01:07:50.920 |
with deep neural networks and deep reinforcement learning. 01:08:02.440 |
It is more efficient in terms of the number of annotations, 01:08:18.720 |
And sure, I mean, that's clearly a very good idea. 01:08:21.800 |
It's not really a topic I would be working on, 01:08:27.920 |
- So it would get us to solve some problems that- 01:08:52.800 |
- This is actually something I've briefly written about, 01:08:56.840 |
but the capabilities of deep learning technology 01:09:06.200 |
from mass surveillance with things like facial recognition, 01:09:11.920 |
in general, tracking lots of data about everyone 01:09:15.480 |
and then being able to making sense of this data 01:09:23.160 |
That's something that's being very aggressively pursued 01:09:29.960 |
One thing I am very much concerned about is that 01:09:40.680 |
are increasingly digital, made of information, 01:09:43.280 |
made of information consumption and information production 01:09:56.360 |
and you are in control of where you consume information, 01:10:07.000 |
then you can build a sort of reinforcement loop 01:10:13.880 |
You can observe the state of your mind at time T. 01:10:22.760 |
how to get you to move your mind in a certain direction. 01:10:27.080 |
And then you can feed you the specific piece of content 01:10:37.840 |
at scale in terms of doing it continuously in real time. 01:10:44.960 |
You can also do it at scale in terms of scaling this 01:11:00.720 |
all of our lives are moving to digital devices 01:11:04.160 |
and digital information consumption and creation, 01:11:22.560 |
- Let's look at the YouTube algorithm, Facebook, 01:11:26.160 |
anything that recommends content you should watch next. 01:11:29.680 |
And it's fascinating to think that there's some aspects 01:11:41.080 |
is this person hold Republican beliefs or Democratic beliefs 01:11:45.360 |
and it's a trivial, that's an objective function 01:12:02.000 |
if you look at the human mind as a kind of computer program, 01:12:13.520 |
For instance, when it comes to your political beliefs, 01:12:19.280 |
So for instance, if I'm in control of your newsfeed 01:12:26.000 |
this is actually where you're getting your news from. 01:12:29.400 |
And I can, of course I can choose to only show you news 01:12:33.680 |
that will make you see the world in a specific way, right? 01:12:44.640 |
And then when I get you to express a statement, 01:12:47.960 |
if it's a statement that me as the controller, 01:12:56.880 |
And that will reinforce the statement in your mind. 01:13:05.200 |
I can, on the other hand, show it to opponents, right? 01:13:10.560 |
And then because they attack you at the very least, 01:13:12.800 |
next time you will think twice about posting it. 01:13:18.960 |
stop believing this because you got pushback, right? 01:13:27.200 |
social media platforms can potentially control your opinions. 01:13:31.360 |
so all of these things are already being controlled 01:13:56.280 |
but also for mass opinion control and behavior control, 01:14:07.080 |
even without an explicit intent to manipulate, 01:14:11.320 |
you're already seeing very dangerous dynamics 01:14:28.640 |
Which seems fairly innocuous at first, right? 01:15:01.360 |
simply because they are not constrained to reality. 01:15:24.600 |
You can balance people's worldview with other ideas. 01:15:40.640 |
But there's also a large space that creates division 01:15:45.680 |
and destruction, civil war, a lot of bad stuff. 01:16:01.520 |
what kind of effects are going to be observed 01:16:10.240 |
But the question is, how do we get into rooms 01:16:16.280 |
So inside Google, inside Facebook, inside Twitter, 01:16:20.160 |
and think about, okay, how can we drive up engagement 01:16:34.800 |
I would feel rather uncomfortable with companies 01:16:39.480 |
that are in control of these news algorithms, 01:16:45.720 |
to manipulate people's opinions or behaviors, 01:17:00.360 |
but that's actually something I really care about. 01:17:06.320 |
to present configuration settings to their users 01:17:10.600 |
so that the users can actually make the decision 01:17:21.960 |
For instance, as a user of something like YouTube 01:17:24.840 |
or Twitter, maybe I want to maximize learning 01:17:30.360 |
So I want the algorithm to feed my curiosity, right? 01:17:35.360 |
Which is in itself a very interesting problem. 01:17:41.280 |
it will maximize how fast and how much I'm learning. 01:17:44.720 |
And it will also take into account the accuracy, 01:17:49.600 |
So yeah, the user should be able to determine exactly 01:17:55.680 |
how these algorithms are affecting their lives. 01:17:58.640 |
I don't want actually any entity making decisions 01:18:06.960 |
they're going to try to manipulate me, right? 01:18:11.760 |
So AI, these algorithms are increasingly going to be 01:18:20.080 |
And I want everyone to be in control of this interface, 01:18:25.080 |
to interface with the world on their own terms. 01:18:37.680 |
they should be able to configure these algorithms 01:18:51.120 |
which is some of the most beautiful fundamental philosophy 01:18:54.960 |
that we have before us, which is personal growth. 01:19:01.120 |
If I want to watch videos from which I can learn, 01:19:08.000 |
So if I have a checkbox that wants to emphasize learning, 01:19:11.840 |
there's still an algorithm with explicit decisions in it 01:19:19.080 |
Like, for example, I've watched a documentary 01:19:31.720 |
Not, 'cause I don't have such an allergic reaction 01:19:42.200 |
For others, they might just get turned off for that. 01:19:53.000 |
I don't think it's something that wouldn't happen, 01:20:05.560 |
- Well, it's mostly an interface design problem. 01:20:09.000 |
- The way I see it, you want to create technology 01:20:11.080 |
that's like a mentor or a coach or an assistant 01:20:31.880 |
You should be able to switch to a different algorithm. 01:20:40.160 |
I mean, that's how I see autonomous vehicles too, 01:20:46.440 |
Yeah, Adobe, I don't know if you use Adobe products 01:20:52.440 |
They're trying to see if they can inject YouTube 01:20:56.200 |
Basically allow you to show you all these videos 01:20:59.880 |
that, 'cause everybody's confused about what to do 01:21:02.840 |
with features, so basically teach people by linking to, 01:21:09.480 |
uses videos as a basic element of information. 01:21:18.320 |
to try to fight against abuses of these algorithms 01:21:27.400 |
- Honestly, it's a very, very difficult problem 01:21:30.120 |
there is very little public awareness of these issues. 01:21:33.920 |
Very few people would think there's anything wrong 01:21:39.720 |
even though there is actually something wrong already, 01:21:42.040 |
which is that it's trying to maximize engagement 01:21:44.480 |
most of the time, which has very negative side effects. 01:21:49.880 |
So ideally, so the very first thing is to stop 01:21:59.560 |
try to propagate content based on popularity, right? 01:22:16.920 |
when they look at topic recommendations on Twitter, 01:22:24.480 |
with switch recommendations, it's always the worst garbage 01:22:28.440 |
because it's content that appeals to the smallest 01:22:37.080 |
they're purely trying to optimize popularity, 01:22:39.040 |
they're purely trying to optimize engagement, 01:22:42.960 |
So they should put me in control of some setting 01:22:46.120 |
so that I define what's the objective function 01:22:54.080 |
And honestly, so this is all about interface design. 01:23:09.360 |
Like let the user tell us what they want to achieve, 01:23:13.200 |
how they want this algorithm to impact their lives. 01:23:18.720 |
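A toy sketch of what "put the user in control of the objective function" could mean in code; none of this corresponds to any real platform's API, and the attribute names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class UserObjective:
    learning: float = 1.0     # weight on informativeness / novelty
    accuracy: float = 1.0     # weight on estimated factual reliability
    engagement: float = 0.0   # time-on-site, deliberately zeroed out here

def score(item, objective: UserObjective) -> float:
    # `item` carries hypothetical model-estimated attributes in [0, 1].
    return (objective.learning * item["novelty"]
            + objective.accuracy * item["reliability"]
            + objective.engagement * item["predicted_watch_time"])

candidates = [
    {"title": "Lecture on information theory",
     "novelty": 0.9, "reliability": 0.95, "predicted_watch_time": 0.3},
    {"title": "Outrage clip",
     "novelty": 0.2, "reliability": 0.4, "predicted_watch_time": 0.9},
]
prefs = UserObjective(learning=1.0, accuracy=1.0, engagement=0.0)
ranked = sorted(candidates, key=lambda it: score(it, prefs), reverse=True)
```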
by article reward structure where you give a signal, 01:23:34.840 |
the algorithm will attempt to relate your choices 01:23:39.120 |
with the choices of everyone else, which might, you know, 01:23:43.280 |
if you have an average profile that works fine, 01:23:49.560 |
If you don't, it can be, it's not optimal at all, actually. 01:23:56.080 |
for the part of the Spotify world that represents you. 01:24:07.960 |
like what Spotify has does not give me control 01:24:10.880 |
over what the algorithm is trying to optimize for. 01:24:14.960 |
- Well, public awareness, which is what we're doing now, 01:24:21.320 |
Do you have concerns about long-term existential threats 01:24:31.000 |
our world is increasingly made of information. 01:24:33.360 |
AI algorithms are increasingly gonna be our interface 01:24:37.840 |
and somebody will be in control of these algorithms. 01:24:41.440 |
And that puts us in any kind of a bad situation, right? 01:24:46.840 |
It has risks coming from potentially large companies 01:24:57.120 |
Also from governments who might want to use these algorithms 01:25:10.280 |
So maybe you're referring to the singularity narrative 01:25:19.360 |
and I don't believe it has to be a singularity. 01:25:25.640 |
the algorithm controlling masses of populations. 01:25:31.960 |
hurt ourselves much like a nuclear war would hurt ourselves. 01:25:40.440 |
that requires a loss of control over AI algorithms. 01:25:47.960 |
Honestly, I wouldn't want to make any long-term predictions. 01:25:52.880 |
I don't think today we really have the capability 01:25:56.880 |
to see what the dangers of AI are gonna be in 50 years, 01:26:02.280 |
I do see that we are already faced with concrete 01:26:07.280 |
and present dangers surrounding the negative side effects 01:26:12.320 |
of content recommendation systems, of newsfeed algorithms, 01:26:18.440 |
So we are delegating more and more decision processes 01:26:32.680 |
Sometimes it's a good thing, sometimes not so much. 01:26:37.040 |
And there is in general very little supervision 01:26:41.640 |
So we are still in this period of very fast change, 01:26:46.040 |
even chaos, where society is restructuring itself, 01:26:53.840 |
which itself is turning into an increasingly automated 01:26:59.000 |
And well, yeah, I think the best we can do today 01:27:03.160 |
is try to raise awareness around some of these issues. 01:27:06.640 |
And I think we're actually making good progress. 01:27:08.280 |
If you look at algorithmic bias, for instance, 01:27:17.000 |
And now all the big companies are talking about it. 01:27:22.320 |
but at least it is part of the public discourse. 01:27:52.560 |
How do we have loss functions in neural networks 01:28:10.520 |
Like for now, we're just using very naive loss functions 01:28:16.600 |
what you're trying to minimize, it's everything else. 01:28:22.920 |
we're going to be focusing our human attention 01:28:30.280 |
Like what's actually driving the whole learning system, 01:28:36.880 |
loss function engineer is probably going to be 01:28:40.600 |
- And then the tooling you're creating with Keras 01:28:42.680 |
essentially takes care of all the details underneath. 01:28:47.000 |
And basically the human expert is needed for exactly that. 01:28:53.840 |
Keras is the interface between the data you're collecting 01:28:59.000 |
And your job as an engineer is going to be to express 01:29:02.440 |
your business goals and your understanding of your business 01:29:05.360 |
or your product, your system as a kind of loss function 01:29:11.760 |
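As a small illustration of "loss function engineering" with Keras: encoding an extra concern, here a made-up penalty on over-confident predictions, directly into a custom loss. The specific objective is hypothetical; the point is only that the objective is where the human expertise goes:

```python
import tensorflow as tf

def cautious_crossentropy(penalty=0.1):
    """Cross-entropy plus a hypothetical penalty on over-confident predictions."""
    def loss(y_true, y_pred):
        ce = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
        confidence = tf.reduce_max(y_pred, axis=-1)  # highest predicted probability
        return ce + penalty * confidence
    return loss

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss=cautious_crossentropy(0.1), metrics=["accuracy"])
```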
- Does the possibility of creating an AGI system 01:29:18.120 |
- So intelligence can never really be general. 01:29:22.120 |
You know, at best it can have some degree of generality 01:29:29.040 |
in the same way that human intelligence is specialized 01:29:37.280 |
I'm never quite sure if they're talking about 01:29:39.480 |
very, very smart AI, so smart that it's even smarter 01:29:44.280 |
than humans, or they're talking about human-like 01:29:47.200 |
intelligence, because these are different things. 01:29:49.720 |
- Let's say, presumably I'm impressing you today 01:30:00.760 |
I'm impressing you with natural language processing. 01:30:11.160 |
- So that's very much about building human-like AI. 01:30:28.000 |
but, you know, from an intellectual perspective, 01:30:30.880 |
I think if you could build truly human-like intelligence, 01:30:34.160 |
that means you could actually understand human intelligence, 01:30:39.880 |
Human-like intelligence is gonna require emotions, 01:30:44.400 |
which is not things that would normally be required 01:30:49.720 |
If you look at, you know, we were mentioning earlier, 01:30:51.880 |
like science as superhuman problem-solving agent or system, 01:30:56.880 |
it does not have consciousness, it doesn't have emotions. 01:31:07.680 |
It is a component of the subjective experience 01:31:12.280 |
that is meant very much to guide behavior generation, right? 01:31:20.840 |
In general, human intelligence and animal intelligence 01:31:24.560 |
has evolved for the purpose of behavior generation, right? 01:31:36.640 |
developed in a different context may well never need them, 01:31:43.120 |
- Well, on that point, I would argue it's possible 01:31:46.000 |
to imagine that there's echoes of consciousness in science 01:31:51.000 |
when viewed as an organism, that science is consciousness. 01:31:54.560 |
- So, I mean, how would you go about testing this hypothesis? 01:32:06.440 |
- Well, the point of probing any subjective experience 01:32:09.560 |
is impossible, 'cause I'm not science, I'm Lex. 01:32:22.680 |
about your subjective experience and you can answer me, 01:32:27.400 |
- Yes, but that's because we speak the same language. 01:32:31.880 |
You perhaps, we have to speak the language of science 01:32:35.920 |
I don't think consciousness, just like emotions 01:32:38.640 |
of pain and pleasure, is something that inevitably arises 01:33:15.240 |
maybe in a social context, generating behavior 01:33:20.840 |
that's not really what's happening, even though it is. 01:33:23.040 |
It is a form of artificial intelligence, 01:33:50.200 |
human-like intelligent system that has consciousness, 01:33:56.840 |
I mean, it doesn't have to be a physical body, right? 01:34:01.360 |
between a realistic simulation and the real world. 01:34:12.800 |
- In other humans, in order for you to demonstrate 01:34:16.920 |
that you have human-like intelligence essentially. 01:34:32.800 |
you've talked about in terms of theorem proving 01:34:48.960 |
I think it's related questions for human-like intelligence 01:34:56.160 |
- Right, so I mean, you're actually asking two questions, 01:34:59.440 |
which is one is about qualifying intelligence 01:35:02.680 |
and comparing the intelligence of an artificial system 01:35:15.600 |
So if you look, you mentioned earlier the Turing test. 01:35:20.160 |
- Well, I actually don't like the Turing test 01:35:23.400 |
It's all about completely bypassing the problem 01:35:37.480 |
If you want to measure how human-like an agent is, 01:35:43.360 |
I think you have to make it interact with other humans. 01:35:56.880 |
and compare it to what a human would actually have done. 01:36:11.240 |
So we're already talking about two things, right? 01:36:13.680 |
The degree, kind of like the magnitude of an intelligence 01:36:34.240 |
- So the direction, your sense, the space of directions 01:36:42.360 |
So the way you would measure the magnitude of intelligence 01:36:48.320 |
in a system in a way that also enables you to compare it 01:36:58.320 |
for intelligence today, they're all too focused on skill 01:37:04.320 |
That's skill at playing chess, skill at playing Go, 01:37:09.120 |
And I think that's not the right way to go about it 01:37:14.480 |
because you can always beat humans at one specific task. 01:37:19.360 |
The reason why our skill at playing Go or juggling 01:37:23.680 |
or anything is impressive is because we are expressing 01:37:26.160 |
this skill within a certain set of constraints. 01:37:29.440 |
If you remove the constraints, the constraints 01:37:32.080 |
that we have one lifetime, that we have this body and so on, 01:37:35.880 |
if you remove the context, if you have unlimited training data, 01:37:41.960 |
if you look at juggling, if you have no restriction 01:37:44.920 |
on the hardware, then achieving arbitrary levels of skill 01:37:48.800 |
is not very interesting and says nothing about 01:37:55.720 |
you need to rigorously define what intelligence is, 01:37:59.920 |
which in itself, it's a very challenging problem. 01:38:07.520 |
I mean, you can provide, many people have provided 01:38:13.520 |
- Where does your definition begin if it doesn't end? 01:38:16.240 |
- Well, I think intelligence is essentially the efficiency 01:38:21.240 |
with which you turn experience into generalizable programs. 01:38:32.000 |
with which you turn a sampling of experience space 01:38:51.080 |
because many different tasks can be one proxy 01:38:58.840 |
You should control for the amount of experience 01:39:03.960 |
that your system has and the priors that your system has. 01:39:08.960 |
But if you control, if you look at two agents 01:39:13.960 |
and you give them the same amount of experience, 01:39:17.200 |
there is one of the agents that is going to learn programs, 01:39:49.560 |
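As a rough schematic of that definition (my own paraphrase, not a formal measure from the conversation): intelligence as skill-acquisition efficiency, normalized by the priors and experience available to the system.

```latex
% Rough schematic only: intelligence as the efficiency of turning priors plus
% experience into programs that generalize beyond that experience.
\text{Intelligence} \;\propto\;
  \frac{\text{generalization achieved on new tasks}}
       {\text{priors} \;+\; \text{experience consumed}}
```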
because you're talking about experience space 01:39:51.920 |
and you're talking about segments of experience space. 01:40:13.720 |
the experience space, even though it's specialized. 01:40:16.240 |
There's a certain point when the experience space 01:40:18.520 |
is large enough to where it might as well be general. 01:40:44.640 |
Like, many people have worked on this problem, 01:40:52.400 |
She's worked a lot on what she calls core knowledge, 01:40:56.120 |
and it is very much about trying to determine 01:41:02.480 |
- Like language skills and so on, all that kind of stuff. 01:41:11.480 |
So we could, so I've actually been working on a benchmark 01:41:16.480 |
for the past couple of years, you know, on and off. 01:41:18.760 |
I hope to be able to release it at some point. 01:41:21.400 |
The idea is to measure the intelligence of systems 01:41:34.920 |
so that you can actually compare these scores 01:41:39.680 |
and you can actually have humans pass the same test 01:41:49.200 |
any amount of practicing does not increase your score. 01:42:06.360 |
- As a person who deeply appreciates practice, 01:42:21.800 |
so the only thing you can measure is skill at a task. 01:42:32.360 |
And then you make sure that this is the same set of priors 01:42:36.280 |
So you create a task that assumes these priors, 01:42:44.520 |
And then you generate a certain number of samples 01:42:54.880 |
assuming that the task is new for the agent passing it, 01:42:59.320 |
that's one test of this definition of intelligence 01:43:07.360 |
And now you can scale that to many different tasks, 01:43:10.960 |
each task should be new to the agent passing it, right? 01:43:17.480 |
so that you can actually have a human pass the same test, 01:43:19.960 |
and then you can compare the score of your machine 01:43:26.960 |
just as long as you start with the same set of priors. 01:43:30.600 |
humans are already trained to recognize digits, right? 01:43:39.640 |
that are not digits, some completely arbitrary patterns. 01:43:51.960 |
you would have to isolate these priors and describe them, 01:43:56.200 |
and then express them as computational rules. 01:43:58.520 |
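A minimal sketch of the evaluation protocol described above, with hypothetical agent and task interfaces (none of these names come from the conversation): every task is new to the agent, assumes only the fixed set of priors, and the score aggregates generalization across many tasks so machines and humans can be compared on the same footing.

```python
# Hypothetical interfaces; purely illustrative of the protocol described above.
def evaluate(agent_factory, tasks):
    solved = 0
    for task in tasks:                      # every task is new to the agent
        agent = agent_factory()             # fresh agent: no practice carries over
        for x, y in task.demonstrations:    # a handful of examples defining the task
            agent.observe(x, y)
        correct = all(agent.predict(x) == y for x, y in task.test_pairs)
        solved += int(correct)
    return solved / len(tasks)              # comparable across machines and humans
```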
- Having worked a lot with vision science people, 01:44:09.640 |
I mean, we're still probably far away from that perfectly, 01:44:20.880 |
objectness as one of the core knowledge priors. 01:44:38.920 |
sure, we have this pretty diverse and rich set of priors, 01:45:02.840 |
it feels to us humans that that set is not that large? 01:45:11.680 |
through all of our perception, all of our reasoning, 01:45:23.280 |
and then the human brain in order to encode those priors. 01:45:43.120 |
And DNA is a very, very low bandwidth medium. 01:46:01.640 |
the higher level of information you're trying to write, 01:46:31.160 |
that are true over very, very long periods of time, 01:46:35.400 |
For instance, we might have some visual prior 01:46:43.640 |
what's the difference between a face and an ape face? 01:46:49.800 |
Do we have any innate sense of the visual difference 01:47:01.800 |
- I would have to look back into evolutionary history 01:47:08.400 |
I mean, the faces of humans are quite different 01:47:10.440 |
from the faces of great apes, right? 01:47:17.520 |
- You couldn't tell the face of a female chimpanzee 01:47:21.440 |
from the face of a male chimpanzee, probably. 01:47:23.520 |
- Yeah, and I don't think most humans have that ability at all. 01:47:26.280 |
- So we do have innate knowledge of what makes a face, 01:47:30.880 |
but it's actually impossible for us to have any DNA-encoded 01:47:36.800 |
knowledge of the difference between a female human face and a male human face, 01:47:44.960 |
came up into the world actually very recently. 01:48:02.960 |
That naturally creates a very efficient encoding. 01:48:05.360 |
- And one important consequence of this is that, 01:48:13.760 |
sometimes a high level knowledge about the world, 01:48:24.200 |
almost all of this innate knowledge is shared 01:48:53.200 |
that are important to survival and reproduction, 01:48:56.320 |
so for which there is some evolutionary pressure, 01:49:04.960 |
And honestly, it's not that much information. 01:49:07.240 |
There's also, besides the bandwidth constraint, 01:49:18.520 |
Like DNA, the part of DNA that deals with the human brain, 01:49:23.480 |
It's like, you know, on the order of megabytes, right? 01:49:31.200 |
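A back-of-the-envelope comparison, using round order-of-magnitude figures that are assumptions rather than numbers quoted here, shows why a budget of megabytes cannot spell out the brain's wiring and can only encode compact, stable regularities:

```python
# Rough, illustrative orders of magnitude only; not measurements from the conversation.
dna_budget_bits = 8 * 10**6 * 8   # "on the order of megabytes": take ~8 MB, in bits
synapses = 10**14                  # a commonly cited order of magnitude for the brain
bits_per_synapse = 1               # even a single bit per connection

wiring_bits = synapses * bits_per_synapse
print(f"DNA budget for the brain: ~{dna_budget_bits:.1e} bits")
print(f"Explicit wiring diagram:  ~{wiring_bits:.1e} bits")
print(f"Shortfall:                ~{wiring_bits / dna_budget_bits:.0e}x")
```

Even with a generous budget and a single bit per connection, the genome falls short by roughly a factor of a million, which is consistent with the picture of DNA encoding priors as compressed generative rules rather than explicit structure.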
- That's quite brilliant, and hopeful for the benchmark 01:49:35.200 |
that you're referring to, of encoding priors. 01:49:41.680 |
whether you can do it in the next couple of years, 01:49:47.440 |
and it's not like a big breakthrough or anything. 01:49:56.360 |
- These fun side projects could launch entire groups 01:50:00.760 |
of efforts towards creating reasoning systems and so on. 01:50:06.840 |
It's trying to measure strong generalization, 01:50:09.200 |
to measure the strength of abstraction in our minds, 01:50:12.840 |
well, in our minds and in artificial intelligence. 01:50:17.080 |
- And if there's anything true about this science organism, 01:50:36.280 |
So an AI winter is something that would occur when there's a big mismatch 01:50:41.360 |
between how we are selling the capabilities of AI and the actual capabilities of AI. 01:50:47.400 |
And today, so deep learning is creating a lot of value, 01:50:50.760 |
and it will keep creating a lot of value in the sense that 01:50:54.760 |
these models are applicable to a very wide range of problems 01:51:06.360 |
So deep learning will keep creating a lot of value 01:51:10.280 |
What's concerning, however, is that there's a lot of hype 01:51:16.280 |
There are lots of people overselling the capabilities 01:51:22.880 |
but also overselling the fact that they might be 01:51:39.360 |
which, you know, it might look fast in the sense that 01:51:43.960 |
we have this exponentially increasing number of papers. 01:51:46.760 |
But again, that's just a simple consequence of the fact 01:51:51.720 |
that we have ever more people coming into the field. 01:51:54.600 |
It doesn't mean the progress is actually exponentially fast. 01:52:05.160 |
a grandiose story to investors about how deep learning 01:52:11.640 |
all these incredible problems like self-driving 01:52:15.880 |
And maybe you can tell them that the field is progressing 01:52:18.280 |
so fast and we are going to have AGI within 15 years 01:52:25.960 |
And every time you're like saying these things 01:52:30.440 |
and an investor or, you know, a decision maker believes them, 01:52:34.520 |
well, this is like the equivalent of taking on 01:52:47.800 |
this will be what enables you to raise a lot of money, 01:52:57.320 |
that's what happens with the other AI winters. 01:52:59.240 |
- So the concern is, and you actually tweet about this, 01:53:04.120 |
that almost every single company now has promised 01:53:07.320 |
that they will have full autonomous vehicles by 2021, 2022. 01:53:19.080 |
- So, because I work a lot in this area, especially recently, 01:53:25.720 |
when all of these companies, after they've invested billions, 01:53:29.400 |
have a meeting and say, how much did we actually, 01:53:31.960 |
first of all, do we have an autonomous vehicle? 01:53:37.880 |
we've invested one, two, three, $4 billion into this 01:53:43.160 |
And the reaction to that may be going very hard 01:53:53.480 |
where no one believes any of these promises anymore 01:53:59.640 |
And this will definitely happen to some extent 01:54:04.600 |
because the public and decision makers have been convinced 01:54:12.760 |
by these people who are trying to raise money 01:54:30.360 |
a full-on AI winter because we have these technologies 01:54:33.400 |
that are producing a tremendous amount of real value. 01:54:43.560 |
So, you know, some startups are trying to sell 01:54:49.800 |
And the fact that AGI is going to create infinite value, 01:54:58.920 |
that passes a certain threshold of IQ or something, 01:55:04.360 |
And well, there are actually lots of investors 01:55:11.240 |
And, you know, they will wait maybe 10, 15 years 01:55:17.240 |
And the next time around, well, maybe there will be 01:55:23.400 |
You know, human memory is fairly short after all. 01:55:26.920 |
- I don't know about you, but because I've spoken about AGI 01:55:31.640 |
sometimes poetically, like I get a lot of emails from people 01:55:35.480 |
giving me what are usually, like, large manifestos 01:55:39.960 |
where they say to me that they have created an AGI system 01:55:50.120 |
- They feel a little bit like they're generated 01:55:53.560 |
by an AI system actually, but there's usually no diagram. 01:55:57.960 |
- Maybe that's recursively self-improving AI. 01:56:06.200 |
- So the question is about, because you've been such a good, 01:56:16.680 |
How do I, so when you start to talk about AGI 01:56:22.280 |
or anything like the reasoning benchmarks and so on, 01:56:31.160 |
who's really looking at neuroscience approaches to how, 01:56:35.160 |
and there's some, there's echoes of really interesting ideas 01:56:45.000 |
Like preventing yourself from being too narrow-minded 01:56:52.680 |
It has to work on these particular benchmarks, 01:57:08.440 |
if you're not doing an improvement on some benchmark, 01:57:12.360 |
Maybe it's not something we've been looking at before, 01:57:14.600 |
but you do need a problem that you're trying to solve. 01:57:25.480 |
If you want to claim that you have an intelligence system, 01:57:35.720 |
It should show that it can create some form of value, 01:57:40.040 |
even if it's a very artificial form of value. 01:57:42.760 |
And that's also the reason why you don't actually 01:57:48.600 |
need to worry about which ideas actually have some hidden potential and which do not. 01:57:57.960 |
you know, this is going to be brought to light very quickly 01:58:02.440 |
So it's the difference between something that's ineffectual 01:58:14.920 |
maybe there are many, many people over the years 01:58:16.920 |
that have had some really interesting theories 01:58:19.880 |
of everything, but they were just completely useless. 01:58:22.840 |
And you don't actually need to tell the interesting theories 01:58:30.200 |
is this actually having an effect on something else? 01:58:38.680 |
I mean, the same applies to quantum mechanics, 01:58:41.000 |
to string theory, to the holographic principle. 01:58:43.480 |
- Like we are doing deep learning because it works. 01:58:45.480 |
You know, that's like before it started working, 01:58:53.240 |
Like, you know, no one was working on this anymore. 01:58:56.360 |
And now it's working, which is what makes it valuable. 01:59:09.240 |
while being called cranks, stuck with it, right? 01:59:21.880 |
- That's a beautiful, inspirational message to end on. 01:59:25.960 |
Francois, thank you so much for talking today.