Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11
Chapters
0:00 Introduction
9:15 Traveling Salesman Problems
19:29 Does God Play Dice
27:53 The General Theory of Relativity
38:07 The Role of Creativity and Intelligence
47:29 What Is the Importance of Depth
1:13:06 AIs Will Be Fascinated by Their Own Origins and by Others of Their Own Kind
The following is a conversation with Jürgen Schmidhuber. 00:00:06.360 |
and a co-creator of long short-term memory networks. 00:00:13.720 |
for speech recognition, translation, and much more. 00:00:17.400 |
Over 30 years, he has proposed a lot of interesting 00:00:38.840 |
If you enjoy it, subscribe on YouTube, iTunes, 00:00:47.320 |
And now, here's my conversation with Jürgen Schmidhuber. 00:01:27.960 |
And that means you have to become a physicist. 00:01:30.720 |
However, then I realized that there's something even grander 00:01:41.920 |
that learns to become a much better physicist 00:01:46.880 |
And that's how I thought maybe I can multiply 00:01:50.120 |
my tiny little bit of creativity into infinity. 00:01:54.320 |
- But ultimately that creativity will be multiplied 00:01:59.120 |
That's the curiosity for that mystery that drove you? 00:02:08.320 |
that learns to solve more and more complex problems 00:02:16.760 |
then you basically have solved all the problems, 00:02:43.640 |
- So in the 80s, I thought about how to build this machine 00:02:54.120 |
And I thought it is clear it has to be a machine 00:02:57.160 |
that not only learns to solve this problem here 00:03:00.880 |
and this problem here, but it also has to learn 00:03:12.480 |
in a representation that allows it to inspect it 00:03:22.120 |
So I call that meta-learning, learning to learn 00:03:37.520 |
but you also improve the way the machine improves 00:03:48.600 |
which was all about that hierarchy of meta-learners 00:04:05.720 |
- In the recent years, meta-learning has gained popularity 00:04:12.840 |
You've talked about how that's not really meta-learning 00:04:16.040 |
with neural networks, that's more basic transfer learning. 00:04:27.960 |
the way it's used today, the way it's talked about today? 00:04:30.880 |
- Let's take the example of a deep neural network 00:04:48.120 |
and you want to quickly learn the new thing as well. 00:04:52.000 |
So one simple way of doing that is you take the network, 00:05:02.440 |
and then you would just take the top layer of that 00:05:06.320 |
and you retrain that using the new label data 00:05:14.720 |
And then it turns out that it really, really quickly 00:05:24.320 |
it already has learned so much about computer vision 00:05:27.560 |
that it can reuse that, and that is then almost good enough 00:05:31.880 |
to solve the new task, except you need a little bit 00:05:37.080 |
So that is transfer learning, and it has been done 00:05:51.080 |
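A minimal sketch of the transfer-learning recipe just described, assuming PyTorch and a torchvision ResNet as stand-ins (neither library is named in the conversation): the pretrained layers are frozen, and only a fresh top layer is retrained on the new labeled data.

```python
import torch
import torch.nn as nn
from torchvision import models

# A network pretrained on a big vision task (here: ImageNet weights).
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Keep everything it has already learned about computer vision fixed.
for p in net.parameters():
    p.requires_grad = False

# Take off the top layer and replace it with a fresh one for the new task.
num_new_classes = 10  # hypothetical label count for the new problem
net.fc = nn.Linear(net.fc.in_features, num_new_classes)

# Retrain only that top layer on the new labeled data.
optimizer = torch.optim.SGD(net.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = loss_fn(net(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```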
having the learning algorithm itself open to introspection 00:06:04.800 |
such that the learning system has an opportunity 00:06:07.880 |
to modify any part of the learning algorithm, 00:06:12.120 |
and then evaluate the consequences of that modification, 00:06:21.040 |
a better learning algorithm, and so on recursively. 00:06:46.560 |
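A toy sketch of that recursive self-modification loop. Here the "learning algorithm" is just a pair of hyperparameters rather than arbitrary code, and evaluate() is a hypothetical stand-in, so this captures the flavor of the idea, not the generality of a Gödel machine.

```python
import random

def evaluate(learning_algorithm):
    """Hypothetical stand-in: train on some tasks with this algorithm
    and return a score (higher = it learns better)."""
    lr, momentum = learning_algorithm["lr"], learning_algorithm["momentum"]
    # Pretend, for the toy example, that lr=0.1 and momentum=0.9 are ideal.
    return -((lr - 0.1) ** 2 + (momentum - 0.9) ** 2)

algorithm = {"lr": 1.0, "momentum": 0.0}   # the current learning algorithm
score = evaluate(algorithm)

for _ in range(1000):
    # Propose a modification to the learning algorithm itself...
    candidate = {k: v + random.gauss(0, 0.05) for k, v in algorithm.items()}
    # ...evaluate the consequences of that modification...
    candidate_score = evaluate(candidate)
    # ...and keep it only if the modified learner learns better.
    if candidate_score > score:
        algorithm, score = candidate, candidate_score

print(algorithm)  # drifts toward settings under which learning works better
```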
mathematically, these are really compelling ideas, 00:06:49.960 |
but practically, do you see these self-referential programs 00:06:58.360 |
to having an impact where sort of it demonstrates 00:07:20.320 |
and things like that, that you need to come up with 00:07:25.520 |
asymptotically optimal, theoretically optimal, 00:07:33.200 |
However, one has to admit that through this proof search, 00:07:46.760 |
that vanishes in comparison to what you have to do 00:08:03.280 |
And that's why we also have been doing other things, 00:08:08.120 |
non-universal things, such as recurrent neural networks, 00:08:15.400 |
and local search techniques, which aren't universal at all, 00:08:25.400 |
as long as we only want to solve the small problems 00:08:35.560 |
So the universal problem solvers, like the Gödel machine, 00:08:48.080 |
they are associated with these constant overheads 00:08:52.520 |
for proof search, which guarantees that the thing 00:09:01.160 |
of solving all problems with a computable solution, 00:09:23.680 |
through all these cities without visiting any city twice. 00:09:32.320 |
of solving traveling salesman problems, TSPs, 00:09:36.560 |
but let's assume there is a method of solving them 00:09:54.600 |
is going to solve the same traveling salesman problem 00:10:02.160 |
plus O of one, plus a constant number of steps 00:10:09.280 |
which you need to show that this particular class 00:10:14.120 |
of problems, the traveling salesman problems, 00:10:24.320 |
And this additive constant doesn't depend on N, 00:10:24.320 |
which means as N is getting larger and larger, 00:10:38.640 |
And that means that almost all large problems are solved 00:10:45.600 |
Already today, we already have a universal problem solver 00:10:50.600 |
However, it's not practical because the overhead, 00:11:00.320 |
that we want to solve in this little biosphere. 00:11:06.520 |
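In symbols, the guarantee being described (a hedged reconstruction from the conversation, not a quote):

```latex
% f(n): runtime of the (possibly unknown) best method on instances of size n;
% c:    the one-time proof-search overhead, independent of n.
\[
  T_{\mathrm{universal}}(n) \;\le\; f(n) + c,
  \qquad
  \frac{c}{f(n)} \longrightarrow 0 \quad \text{as } n \to \infty,
\]
% so on almost all large problems the additive overhead is negligible --
% and yet c is so astronomical that the solver is impractical for the
% small problems of this biosphere.
```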
you're talking about things that fall within the constraints 00:11:11.000 |
So they can seem quite large to us mere humans, right? 00:11:21.120 |
but they are still small compared to almost all problems 00:11:24.880 |
because almost all problems are large problems, 00:11:31.000 |
- Do you find it useful as a person who has dreamed 00:11:52.440 |
this kind of worst case analysis type of thinking, 00:12:02.680 |
to give you intuition about what's good and bad. 00:12:11.840 |
And in fact, as you are thinking about that problem, 00:12:21.320 |
On the other hand, we have to admit that at the moment, 00:12:28.400 |
for all kinds of problems that we are now solving 00:12:38.840 |
There we are using general purpose computers, 00:12:46.720 |
which is just local search, gradient descent, 00:12:54.420 |
such that it can solve some interesting problems, 00:13:00.560 |
or machine translation, and something like that. 00:13:06.480 |
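A minimal, self-contained illustration of that kind of local search: gradient descent nudging a "program" (here just a weight vector) downhill on an error surface. The toy error function is an invented example.

```python
# Gradient descent as local search in "program space": the program is a
# weight vector w, and each step moves w a little downhill on the error.
def gradient_descent(grad, w, lr=0.1, steps=100):
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
    return w

# Toy error E(w) = (w0 - 3)^2 + (w1 + 1)^2, with its gradient written out.
grad = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
print(gradient_descent(grad, [0.0, 0.0]))  # ends up near [3.0, -1.0]
```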
behind the best solutions that we have at the moment 00:13:17.120 |
without ever really proving that that system is intelligent 00:13:26.300 |
within some kind of syntactic definition of a language? 00:13:31.120 |
by the thing working extremely well, and that's sufficient? 00:13:54.320 |
like here in this universe, or on this little planet, 00:13:57.040 |
has to take into account these limited resources. 00:14:04.960 |
a theory which is related to what we already have, 00:14:09.960 |
these asymptotically optimal problem solvers, 00:14:14.440 |
which tells us what we need in addition to that 00:14:18.560 |
to come up with a practically optimal problem solver. 00:14:21.760 |
So I believe we will have something like that. 00:14:42.560 |
and recurrent neural networks and long short-term memory 00:14:42.560 |
but that doesn't mean that we think we are done. 00:15:11.880 |
Can you talk through your intuition behind this idea? 00:15:25.560 |
- Experience tells us that the stuff that works best 00:15:33.120 |
So the asymptotically optimal ways of solving problems, 00:15:38.800 |
they're just a few lines of code, it's really true. 00:15:45.800 |
Then the most promising and most useful practical thing 00:15:57.800 |
However, they are also just a few lines of code. 00:16:05.080 |
you can write them down in five lines of pseudocode. 00:16:15.640 |
is the lines of pseudocode are sitting on top of layers 00:16:25.040 |
it'll be a beautifully written sort of algorithm, 00:16:29.040 |
but do you think that there's many layers of abstractions 00:16:36.880 |
- Yeah, of course, we are building on all these 00:16:44.000 |
over the millennia, such as matrix multiplications. 00:16:49.760 |
And real numbers and basic arithmetics and calculus 00:17:01.240 |
and derivatives of error functions and stuff like that. 00:17:04.280 |
So without that language that greatly simplifies 00:17:16.560 |
we are standing on the shoulders of the giants 00:17:26.360 |
that now we have a chance to do the final step. 00:17:32.120 |
If we take a step back through all of human civilization 00:17:50.880 |
and inefficient process of evolution is needed 00:18:04.600 |
in order to create something like human level intelligence? 00:18:07.720 |
- So far, the only example we have is this one, 00:18:16.560 |
- Maybe not, but we are part of this whole process. 00:18:30.000 |
that the code that runs the universe is really, really simple. 00:18:39.960 |
are really simple laws that can be easily described 00:19:03.240 |
is going to figure out the pseudo random generator 00:19:31.880 |
- So a couple of years ago, a famous physicist, 00:19:44.840 |
One of the fundamental insights of the 20th century 00:19:52.280 |
was that the universe is fundamentally random 00:20:02.920 |
And that whenever you measure spin up or down 00:20:08.280 |
a new bit of information enters the history of the universe. 00:20:19.320 |
and they had to publish it because I was right. 00:20:21.720 |
That there is no evidence, no physical evidence for that. 00:20:35.760 |
such as the decimal expansion of pi, 3.141 and so on, 00:20:44.360 |
So pi is interesting because every three digit sequence, 00:21:22.600 |
and figures out, oh, it's the second billion digits of pi 00:21:28.520 |
We don't have any fundamental reason at the moment 00:21:49.840 |
like gravity and the other basic forces are very simple. 00:21:54.120 |
So very short programs can explain what these are doing. 00:22:08.160 |
the seemingly random data points that we get all the time, 00:22:12.880 |
that we really need a huge number of extra bits 00:22:17.920 |
to describe all these extra bits of information. 00:22:29.760 |
that computes the entire history of the entire universe, 00:22:34.000 |
we are, as scientists, compelled to look further 00:22:47.800 |
that can backtrack to the creation of the universe. 00:22:52.000 |
So compute the shortest path to the creation of the universe. 00:23:15.720 |
We don't have a proof that it is compressible 00:23:28.440 |
So you said simplicity is beautiful or beauty is simple. 00:23:37.080 |
the romantic notion of randomness, of serendipity, 00:23:42.840 |
of being surprised by things that are about you, 00:24:13.080 |
A universe that is compressible to a short program 00:24:22.920 |
than another one, which needs an almost infinite number 00:24:30.160 |
many things that are happening in this universe 00:24:43.600 |
So all of that seems to be very, very simple. 00:24:45.800 |
Every electron seems to reuse the same sub-program 00:25:04.440 |
injecting new bits of information all the time 00:25:07.800 |
for these extra things which are currently not understood, 00:25:23.640 |
of the data that we can observe of the history 00:25:48.680 |
and you've talked about the idea of compression. 00:26:14.640 |
Hundreds of years ago, there was an astronomer 00:26:43.680 |
And another guy came along whose name was Newton, 00:27:09.800 |
And suddenly, many, many of these observations 00:27:16.960 |
because as long as you can predict the next thing, 00:27:33.680 |
and you had deviations from these predictions of the theory. 00:27:41.880 |
and he was able to explain away all these deviations 00:27:52.920 |
which was called the general theory of relativity, 00:27:56.960 |
which at first glance looks a little bit more complicated, 00:28:02.760 |
but you can phrase it within one single sentence, 00:28:12.920 |
and no matter what is the gravity in your local framework, 00:28:21.360 |
And from that, you can calculate all the consequences. 00:28:25.760 |
and it allows you to further compress all the observations 00:28:30.360 |
because suddenly there are hardly any deviations any longer 00:28:40.080 |
So all of science is a history of compression progress. 00:28:56.320 |
You see, oh, first I needed so many bits of information 00:28:59.520 |
to describe the data, to describe my falling apples, 00:29:10.120 |
there is a very simple way of predicting the third frame 00:29:16.080 |
And maybe not every little detail can be predicted, 00:29:22.680 |
that are coming down, they accelerate in the same way, 00:29:25.880 |
which means that I can greatly compress the video. 00:29:37.360 |
That's the fun that you have, the scientific fun, 00:29:43.040 |
And we can build artificial systems that do the same thing. 00:29:50.800 |
which is coming in through their own experiments. 00:29:53.520 |
And we give them a reward, an intrinsic reward, 00:30:01.200 |
And since they are trying to maximize the rewards they get, 00:30:08.160 |
they are suddenly motivated to come up with new action 00:30:11.160 |
sequences, with new experiments that have the property 00:30:15.760 |
that the data that is coming in as a consequence 00:30:24.040 |
see a pattern in there which they hadn't seen yet before. 00:30:28.760 |
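A minimal PyTorch sketch of such an intrinsic reward (module names and dimensions are illustrative assumptions): the reward is the improvement of a learned predictor after one update on the new experience, i.e. compression progress, so data that is surprising but learnable pays best.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """A learned predictor of the next observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, obs_dim))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

model = WorldModel(obs_dim=8, act_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def intrinsic_reward(obs, act, next_obs):
    """Compression/prediction progress: how much one training step on this
    experience improves the predictor (the drop in prediction error)."""
    err_before = nn.functional.mse_loss(model(obs, act), next_obs)
    opt.zero_grad()
    err_before.backward()
    opt.step()
    with torch.no_grad():
        err_after = nn.functional.mse_loss(model(obs, act), next_obs)
    return (err_before.detach() - err_after).item()  # positive = progress
```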
- So there's an idea of power play that you've described, 00:30:32.080 |
a training in general problem solver in this kind of way 00:30:38.200 |
- Can you describe that idea a little further? 00:30:50.280 |
and then there is a huge search space of potential solution 00:31:13.040 |
That's what most of computer science is about. 00:31:15.840 |
Power play just goes one little step further and says, 00:31:20.000 |
let's not only search for solutions to a given problem, 00:31:24.640 |
but let's search for pairs of problems and their solutions 00:31:24.640 |
So we are looking suddenly at pairs of problems 00:31:41.000 |
and their solutions or modifications of the problem 00:32:00.440 |
are like scientists in the sense that they not only try 00:32:04.360 |
to solve and try to find answers to existing questions, 00:32:08.200 |
no, they are also free to pose their own questions. 00:32:13.320 |
So if you want to build an artificial scientist, 00:32:19.520 |
So that's a dimension of freedom that's important to have. 00:32:30.800 |
the space of then coming up with your own questions is? 00:32:34.240 |
So it's one of the things that as human beings 00:32:36.920 |
we consider to be the thing that makes us special, 00:32:42.200 |
is that brilliant insight that can create something totally 00:32:51.280 |
Let's look at the set of all possible problems 00:32:55.560 |
that you can formally describe, which is infinite, which 00:33:01.160 |
should be the next problem that a scientist or a power play 00:33:08.120 |
Well, it should be the easiest problem that goes 00:33:20.720 |
that the current problem solver that you have, 00:33:26.600 |
that he cannot solve yet by just generalizing. 00:33:32.400 |
So it has to require a modification of the problem 00:33:37.400 |
solve this new thing, but the old problem solver cannot do it. 00:33:41.560 |
And in addition to that, we have to make sure 00:33:57.480 |
in the set of pairs of problems and problem solver 00:34:01.600 |
modifications for a combination that minimize the time 00:34:08.160 |
So it's always trying to find the problem which is easiest 00:34:14.720 |
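A toy Python sketch of that search over pairs. Every callable here (propose_task, propose_modification, solves) is a hypothetical stand-in: within a time budget, it hunts for the cheapest (new problem, solver modification) pair where the modified solver succeeds, the current solver fails, and everything already learned still works.

```python
import time

def powerplay_step(solver, repertoire, propose_task, propose_modification,
                   solves, budget=1.0):
    """One PowerPlay-style step: look for a pair (new task, modified solver)
    such that the modified solver cracks the new task, the current solver
    cannot, and the whole old repertoire stays solvable."""
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        task = propose_task()                  # invent a candidate problem
        if solves(solver, task):
            continue                           # too easy: it already generalizes
        candidate = propose_modification(solver)
        if not solves(candidate, task):
            continue                           # this modification doesn't help
        if all(solves(candidate, t) for t in repertoire):
            repertoire.append(task)            # the repertoire grows by one task
            return candidate, task             # accept the pair
    return solver, None                        # nothing found within the budget
```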
So just like grad students and academics and researchers 00:34:18.880 |
can spend their whole career in a local minima, 00:34:22.200 |
stuck trying to come up with interesting questions, 00:34:27.600 |
do you think it's easy in this approach of looking 00:34:42.600 |
that you've already solved in a genuine creative way? 00:34:49.920 |
that it's always trying to break its current generalization 00:34:53.960 |
abilities by coming up with a new problem which 00:35:00.920 |
Just shifting the horizon of knowledge a little bit 00:35:16.400 |
Gödel did when he came up with these new sentences, 00:35:20.640 |
new theorems that didn't have a proof in the formal system, 00:35:23.840 |
which means you can add them to the repertoire, 00:35:27.680 |
hoping that they are not going to damage the consistency 00:35:39.640 |
Formal Theory of Creativity, Fun and Intrinsic Motivation, 00:35:44.320 |
you talk about discovery as intrinsic reward. 00:35:51.640 |
what do you think is the purpose and meaning of life 00:35:58.800 |
Do you see humans as an instance of power play, agents? 00:36:23.520 |
And that's how they learn about gravity and everything. 00:36:27.320 |
And yeah, in 1990, we had the first systems like that, 00:36:30.960 |
which just tried to play around with the environment 00:36:34.200 |
and come up with situations that go beyond what 00:36:42.720 |
and then becoming more general problem solvers 00:36:45.800 |
and being able to understand more of the world. 00:36:48.960 |
So yeah, I think in principle, that curiosity strategy 00:36:59.920 |
or more sophisticated versions of what I just described, 00:37:06.480 |
because evolution discovered that's a good way of exploring 00:37:19.480 |
On the other hand, those guys who were too curious, 00:37:31.960 |
is a certain percentage of extremely explorative guys. 00:37:38.040 |
because many of the others are more conservative. 00:37:54.440 |
wouldn't be present in almost exactly the same form here. 00:38:07.600 |
what do you think is the role of creativity in intelligence? 00:38:12.360 |
essential for intelligence, if you think of intelligence 00:38:17.520 |
as a problem-solving system, as ability to solve problems. 00:38:23.280 |
But do you think it's essential, this idea of creativity? 00:38:34.400 |
It's just a side effect of what our problem solvers do. 00:38:40.200 |
or a space of candidates, of solution candidates, 00:38:44.600 |
until they hopefully find a solution to a given problem. 00:38:48.160 |
But then there are these two types of creativity. 00:38:50.520 |
And both of them are now present in our machines. 00:38:54.280 |
The first one has been around for a long time, 00:39:03.360 |
And this has been happening for many decades. 00:39:07.400 |
found creative solutions to interesting problems, 00:39:11.400 |
where humans were not aware of these particularly 00:39:37.240 |
So here is the artist, and he makes a convincing picture 00:39:41.240 |
of the pope, and the pope likes it and gives him the money. 00:39:52.760 |
you have the freedom to select your own problem, 00:39:57.040 |
like a scientist who defines his own question to study. 00:40:03.400 |
And so that is the pure creativity, if you will, 00:40:15.720 |
almost echoes of narrow AI versus general AI. 00:40:19.160 |
So this kind of constrained painting of a pope 00:40:22.720 |
seems like the approaches of what people are calling 00:40:46.040 |
- If you zoom back a little bit, and you just 00:40:48.520 |
look at a general problem-solving machine, which 00:41:00.160 |
So all of what I said just now about this pre-wired curiosity 00:41:05.320 |
and this will to invent new problems that the system 00:41:09.040 |
doesn't know how to solve yet should be just a byproduct 00:41:15.040 |
However, apparently, evolution has built it into us, 00:41:25.080 |
pre-wiring, a bias, a very successful exploratory bias 00:41:35.680 |
in the same kind of way may be a byproduct of problem solving. 00:41:41.200 |
Do you think-- do you find this an interesting byproduct? 00:41:47.040 |
What are your thoughts on consciousness in general? 00:41:49.320 |
Or is it simply a byproduct of greater and greater 00:41:53.120 |
capabilities of problem solving that's similar to creativity 00:42:00.920 |
- Yeah, we never have a procedure called consciousness 00:42:11.880 |
seem to be closely related to what people call consciousness. 00:42:19.880 |
had simple systems, which were basically recurrent networks, 00:42:26.200 |
trying to map incoming data into actions that lead to success. 00:42:40.400 |
whenever the battery is low and negative signals are coming 00:42:42.720 |
from the battery, always find the charging station in time 00:42:47.240 |
without bumping against painful obstacles on the way. 00:42:50.520 |
So complicated things, but very easily motivated. 00:43:04.800 |
What will happen as a consequence of these actions 00:43:09.360 |
And it's just trained on the long and long history 00:43:14.040 |
So it becomes a predictive model of the world, basically. 00:43:18.120 |
And therefore, also a compressor of the observations 00:43:22.720 |
of the world, because whatever you can predict, 00:43:26.560 |
So compression is a side effect of prediction. 00:43:30.600 |
And how does this recurrent network compress? 00:43:35.720 |
little subnetworks that stand for everything that frequently 00:43:39.960 |
appears in the environment, like bottles and microphones 00:43:43.840 |
and faces, maybe lots of faces in my environment. 00:43:48.120 |
So I'm learning to create something like a prototype 00:43:52.920 |
and all I have to encode are the deviations from the prototype. 00:43:56.360 |
So it's compressing all the time the stuff that frequently 00:44:00.880 |
There's one thing that appears all the time, that 00:44:05.200 |
is present all the time when the agent is interacting 00:44:08.000 |
with its environment, which is the agent itself. 00:44:14.520 |
it is extremely natural for this recurrent network 00:44:21.080 |
stand for the properties of the agents, the hand, 00:44:29.080 |
that you need to better encode the data which is influenced 00:44:34.360 |
So there, just as a side effect of data compression 00:44:39.040 |
during problem solving, you have internal self-models. 00:44:45.800 |
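A tiny numerical illustration of the prototype-plus-deviations idea, with synthetic data invented for the example: once the frequent pattern is captured by a prototype, only small residuals remain to be encoded.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for "faces seen all the time": 1000 observations
# clustered tightly around a common pattern.
faces = rng.normal(loc=5.0, scale=0.1, size=(1000, 64))

# The learned "prototype" is just the average of the frequent inputs.
prototype = faces.mean(axis=0)

# Encoding a new face = storing only its deviation from the prototype.
new_face = faces[0]
residual = new_face - prototype

print(np.abs(new_face).mean())   # ~5.0 : raw values are large
print(np.abs(residual).mean())   # ~0.1 : deviations are tiny, cheap to encode
assert np.allclose(prototype + residual, new_face)  # decoding is exact
```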
Now, you can use this model of the world to plan your future, 00:44:51.360 |
and that's what we also have done since 1990. 00:44:54.040 |
So the recurrent network, which is the controller, which 00:45:01.840 |
of the world, this model network of the world, 00:45:05.620 |
to plan ahead and say, let's not do this action sequence. 00:45:14.580 |
And whenever it's waking up these little subnetworks that 00:45:18.900 |
stand for itself, then it's thinking about itself. 00:45:22.220 |
Then it's thinking about itself, and it's exploring mentally 00:45:47.300 |
is a process of compressing that data to act efficiently, 00:45:54.140 |
in that data, you yourself appear very often. 00:45:57.640 |
So it's useful to form compressions of yourself. 00:46:02.740 |
of what consciousness is as a necessary side effect. 00:46:16.500 |
long short-term memory networks, that are type 00:46:23.940 |
So these are networks that model the temporal aspects 00:46:36.260 |
So what do you think is the value of depth in the models 00:46:41.220 |
Since you mentioned the long short-term memory and the LSTM, 00:46:48.340 |
I have to mention the names of the brilliant students who 00:46:55.220 |
Sepp Hochreiter, who had fundamental insights already 00:47:00.260 |
Then Felix Gers, who had additional important 00:47:08.100 |
is mostly responsible for this CTC algorithm, which 00:47:11.420 |
is now often used to train the LSTM to do the speech 00:47:15.620 |
recognition on all the Google, Android phones, and whatever, 00:47:21.540 |
So these guys, without these guys, I would be nothing. 00:47:36.220 |
are deep in the sense that the current input doesn't tell you 00:47:49.780 |
And often, important parts of that memory are dated. 00:47:56.460 |
So when you're doing speech recognition, for example, 00:48:04.820 |
about half a second or something like that, which 00:48:18.660 |
But now the system has to see the distinction between 7 00:48:34.940 |
So there you have already a problem of depth 50, 00:48:38.100 |
because for each time step, you have something 00:48:41.320 |
like a virtual layer in the expanded unrolled version 00:48:44.900 |
of this recurrent network, which is doing the speech recognition. 00:48:48.380 |
So these long time lags, they translate into problem depth. 00:48:57.780 |
that you really have to look far back in time 00:49:01.620 |
to understand what is the problem and to solve it. 00:49:06.180 |
But just like with LSTMs, you don't necessarily 00:49:09.100 |
need to, when you look back in time, remember every aspect. 00:49:12.340 |
You just need to remember the important aspects. 00:49:15.500 |
The network has to learn to put the important stuff 00:49:18.580 |
into memory and to ignore the unimportant noise. 00:49:24.180 |
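A minimal PyTorch sketch of that unrolled depth (feature sizes and class count are arbitrary assumptions): 50 input frames mean the recurrence is applied 50 times, one "virtual layer" per time step, with the gates deciding what enters memory and what is ignored.

```python
import torch
import torch.nn as nn

# 50 frames of 13-dim acoustic features: unrolled in time, the recurrence
# is applied 50 times, i.e. one "virtual layer" per time step (depth 50).
T, batch, feat, hidden, classes = 50, 8, 13, 128, 10
x = torch.randn(T, batch, feat)                # hypothetical input features

lstm = nn.LSTM(input_size=feat, hidden_size=hidden)
readout = nn.Linear(hidden, classes)

outputs, (h_final, c_final) = lstm(x)   # state is carried across all 50 steps
logits = readout(h_final[-1])           # classify from the final state
print(logits.shape)                     # torch.Size([8, 10])

# Inside each step, the input gate decides what enters the memory cells and
# the forget gate decides what to erase -- the learned "keep the important
# stuff, ignore the unimportant noise" described above.
```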
But in that sense, deeper and deeper is better? 00:49:30.980 |
I mean, LSTM is one of the great examples of architectures 00:49:36.540 |
that do something beyond just deeper and deeper networks. 00:49:42.380 |
There's clever mechanisms for filtering data, 00:49:47.860 |
So do you think that kind of thinking is necessary? 00:49:51.340 |
If you think about LSTMs as a leap, a big leap forward 00:49:57.820 |
do you think is the next leap within this context? 00:50:06.060 |
but LSTMs still don't have the same kind of ability 00:50:14.740 |
The credit assignment problem across way back, 00:50:24.540 |
It's not clear what are the practical limits of the LSTM 00:50:35.100 |
where it not only looked back tens of thousands of steps, 00:50:44.820 |
think was the first author of a paper where we really-- 00:51:20.180 |
which is trying to maximize its future expected reward 00:51:24.260 |
and doesn't know yet which of these many possible futures 00:51:27.580 |
should I select, given there's one single past, 00:51:31.620 |
is facing problems that the LSTM by itself cannot solve. 00:51:38.900 |
with a compact representation of the history so far, 00:51:42.380 |
of the history of observations and actions so far. 00:51:46.380 |
But now, how do you plan in an efficient and good way 00:51:54.340 |
how do you select one of these many possible action sequences 00:51:59.860 |
has to consider to maximize reward in this unknown future? 00:52:12.820 |
gets in the video and the speech and whatever, 00:52:15.940 |
and is executing actions, and is trying to maximize reward. 00:52:38.460 |
to make better predictions of the next time step. 00:52:41.620 |
So essentially, although it's predicting only the next time 00:52:45.020 |
step, it is motivated to learn to put into memory something 00:52:52.420 |
because it's important to memorize that if you 00:52:54.980 |
want to predict that at the next time step, the next event. 00:53:14.340 |
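A minimal sketch of such a model network M in PyTorch (dimensions and names are illustrative; the controller C and the reward loop are omitted): it is trained only on next-step prediction over the stream of observations and actions, and whatever it must memorize to predict well ends up in its recurrent state.

```python
import torch
import torch.nn as nn

class ModelNetwork(nn.Module):
    """M: a recurrent net trained only to predict the next observation,
    given the stream of observations and the controller's actions."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(obs_dim + act_dim, hidden)
        self.head = nn.Linear(hidden, obs_dim)

    def forward(self, obs_seq, act_seq, state=None):
        out, state = self.rnn(torch.cat([obs_seq, act_seq], dim=-1), state)
        return self.head(out), state   # predicted next observation, memory

m = ModelNetwork(obs_dim=16, act_dim=4)
opt = torch.optim.Adam(m.parameters(), lr=1e-3)

obs = torch.randn(100, 1, 16)   # 100 steps of (synthetic) sensory input
act = torch.randn(100, 1, 4)    # the actions that were executed

pred, _ = m(obs[:-1], act[:-1])                # predict step t+1 from steps <= t
loss = nn.functional.mse_loss(pred, obs[1:])   # next-step prediction error
opt.zero_grad()
loss.backward()
opt.step()
# Whatever M must remember from far in the past to keep this loss low ends
# up stored in its recurrent state -- memorization driven by prediction.
```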
to efficiently select among these many possible futures? 00:53:23.100 |
was let's just use the model of the world as a stand-in, 00:53:29.340 |
And millisecond by millisecond, we plan the future. 00:53:32.420 |
And that means we have to roll it out really in detail. 00:53:36.260 |
And it will work only if the model is really good. 00:53:40.380 |
because we have to look at all these possible futures. 00:53:55.500 |
to learn by itself how to use the potentially relevant parts 00:54:00.620 |
of the model network to solve new problems more quickly. 00:54:06.300 |
And if it wants to, it can learn to ignore the M. 00:54:10.100 |
And sometimes it's a good idea to ignore the M, 00:54:14.500 |
It's a bad predictor in this particular situation of life 00:54:23.100 |
However, it can also learn to address and exploit 00:54:27.100 |
some of the subprograms that came about in the model 00:54:31.980 |
network through compressing the data by predicting it. 00:54:40.220 |
that code, the algorithmic information in the model 00:54:48.180 |
such that it can solve a new problem more quickly than 00:54:57.780 |
optimistic and excited about the power of RL, 00:55:01.180 |
of reinforcement learning, in the context of real systems? 00:55:07.180 |
So you see RL as a potential having a huge impact 00:55:11.660 |
beyond just sort of the M part is often developed 00:55:51.500 |
So these little Audis, they are small, maybe like that, 00:56:03.820 |
They go up to 120 kilometers an hour if they want to. 00:56:12.460 |
And they don't want to bump against obstacles 00:56:17.140 |
And so they must learn, like little babies, to park. 00:56:25.340 |
into actions that lead to successful parking behavior, 00:56:47.580 |
is about passive pattern observation and prediction. 00:56:55.780 |
and what the major companies on the Pacific Rim 00:57:05.620 |
And that's only 1% or 2% of the world economy, 00:57:10.620 |
which is big enough to make these companies pretty much 00:57:22.420 |
about machines that shape the data through their own actions. 00:57:28.500 |
Do you think simulation is ultimately the biggest way 00:57:33.180 |
that those methods will be successful in the next 10, 00:57:38.820 |
We're talking about the near-term impact of RL. 00:57:42.620 |
Do you think really good simulation is required? 00:57:45.260 |
Or is there other techniques, like imitation learning, 00:57:49.220 |
observing other humans operating in the real world? 00:57:53.660 |
Where do you think the success will come from? 00:57:59.420 |
of using physics simulations to learn behavior 00:58:05.980 |
for machines that learn to solve problems that humans also do 00:58:13.980 |
However, this is not the future, because the future 00:58:19.580 |
They don't use a physics engine to simulate the world. 00:58:24.660 |
of the world, which maybe sometimes is wrong in many ways, 00:58:30.100 |
but captures all kinds of important abstract high-level 00:58:34.020 |
predictions, which are really important to be successful. 00:58:38.460 |
And that's what was the future 30 years ago, when we started 00:58:45.460 |
And now we know much better how to go there, to move forward, 00:58:51.300 |
and to really make it work in systems based on that, where 00:58:54.980 |
you have a learning model of the world, a model of the world 00:58:58.260 |
that learns to predict what's going to happen 00:59:06.660 |
to more quickly learn successful action sequences. 00:59:11.820 |
And then, of course, always this curiosity thing. 00:59:17.780 |
to come up with experiments with action sequences 00:59:27.020 |
constructing an understanding of the world in this connection 00:59:30.340 |
is now the popular approach that has been successful, 00:59:40.660 |
there's symbolic AI approaches, which to us humans 00:59:44.980 |
are more intuitive, in the sense that it makes sense 00:59:49.220 |
that you build up knowledge in this knowledge representation. 00:59:52.540 |
What kind of lessons can we draw into our current approaches 01:00:12.180 |
Because a lot of your work was not so much in that realm, 01:00:25.860 |
1987, was the implementation of a genetic algorithm 01:00:34.620 |
So Prolog, that's what you learned back then, 01:00:40.180 |
And the Japanese, they have this huge fifth generation 01:00:44.420 |
AI project, which was mostly about logic programming 01:00:58.140 |
since this guy in the Ukraine, Ivakhnenko, started it. 01:01:05.700 |
they focused really on this logic programming. 01:01:08.060 |
And I was influenced to the extent that I said, 01:01:10.340 |
okay, let's take these biologically inspired algorithms 01:01:16.820 |
and implement that in the language which I know, 01:01:45.420 |
Without that, it would not be asymptotically optimal. 01:02:02.740 |
such as gradient-based search in program space, 01:02:13.380 |
when you're trying to construct something provably optimal 01:02:21.980 |
- It's really useful for theorem proving. 01:02:24.140 |
The best theorem provers today are not neural networks. 01:02:33.140 |
than most math students in the first or second semester. 01:02:37.620 |
- But for reasoning, for playing games of Go or chess, 01:02:51.260 |
- Yeah, as long as the problems have little to do 01:03:01.700 |
you would just want to have better pattern recognition. 01:03:09.100 |
and pedestrian recognition and all these things. 01:03:13.540 |
And you want to minimize the number of false positives, 01:03:19.060 |
which is currently slowing down self-driving cars 01:03:31.580 |
in terms of directions of artificial intelligence 01:03:37.100 |
in your own research and in the broader community? 01:03:54.220 |
"Look here, robot, we are going to assemble a smartphone. 01:03:59.380 |
"Let's take this slab of plastic and the screwdriver 01:04:20.420 |
and he will try to do something with his own actuators, 01:04:46.060 |
and then to interpret these additional noises 01:04:58.540 |
come up with faster ways and more efficient ways 01:05:13.740 |
but we already see how we are going to get there. 01:05:25.140 |
Almost all of production is going to be affected by that. 01:05:37.740 |
witnessing, which is mostly about passive pattern recognition 01:05:42.020 |
This is about active machines that shape data 01:06:00.140 |
will equip these machines with cameras and other sensors, 01:06:05.820 |
and they are going to learn to solve all kinds of problems 01:06:16.940 |
And lots of old economy is going to be affected by that. 01:06:23.940 |
And in recent years, I have seen that old economy 01:06:27.260 |
is actually waking up and realizing that this is the case. 01:06:35.780 |
There's a lot of people concerned in the near term 01:06:38.340 |
about the transformation of the nature of work. 01:06:54.660 |
And looking a little bit farther into the future, 01:07:02.740 |
concerned about the existential threats of that future. 01:07:07.780 |
job loss in the long-term existential threat, 01:07:10.780 |
are these concerns to you, or are you ultimately optimistic? 01:07:19.540 |
We have had predictions of job losses for many decades. 01:07:28.100 |
For example, when industrial robots came along, 01:07:50.780 |
And today the same car factories have hundreds of robots 01:08:01.900 |
those countries that have lots of robots per capita, 01:08:26.740 |
it's really easy to say which jobs are going to get lost, 01:08:31.740 |
but it's really hard to predict the new ones. 01:08:34.220 |
30 years ago, who would have predicted all these people 01:08:39.300 |
making money as YouTube bloggers, for example? 01:08:56.740 |
But still, only, I don't know, 5% unemployment. 01:09:10.540 |
Most of these jobs are not existentially necessary 01:09:17.740 |
There are only very few existentially necessary jobs, 01:09:28.140 |
but less than 10% of the population is doing that. 01:09:33.620 |
are about interacting with other people in new ways, 01:09:40.900 |
getting new types of kudos in forms of likes and whatever, 01:09:53.380 |
and that's why he's inventing new jobs all the time. 01:09:57.020 |
And he keeps considering these jobs as really important 01:10:01.740 |
and is investing a lot of energy and hours of work 01:10:18.340 |
that we humans are so restless that we create 01:10:24.980 |
totally new things that get likes on Facebook 01:10:32.300 |
So what about long-term existential threat of AI, 01:10:36.700 |
where our whole civilization may be swallowed up 01:10:47.460 |
but I'd be surprised if we humans were the last step 01:10:57.940 |
- You've actually had this beautiful comment somewhere 01:11:13.460 |
"will likely not want to interact with humans. 01:11:21.460 |
"and only tangentially interact with humans." 01:11:45.140 |
that there's not already intelligent systems out there? 01:12:00.400 |
- I'd be surprised if within the next few decades 01:12:06.480 |
we won't have AIs that are truly smart in every single way 01:12:16.600 |
And I'd be surprised if they wouldn't realize 01:12:47.400 |
and self-replicating robot factories and all this stuff. 01:13:04.640 |
and by their own origins in our civilization. 01:13:07.320 |
They will want to understand that completely, 01:13:09.760 |
just like people today would like to understand 01:13:25.640 |
So in the beginning, they will be fascinated by life. 01:13:32.820 |
like anybody who loses interest in things he understands. 01:13:42.180 |
the most interesting sources of information for them 01:14:27.620 |
consisting of trillions of different types of AIs. 01:14:41.020 |
but by trillions of different types of AIs competing 01:14:58.260 |
And now we realize that the universe is still young. 01:15:06.180 |
and it's going to be 1,000 times older than that. 01:15:10.580 |
So there's plenty of time to conquer the entire universe 01:15:19.820 |
and senders and receivers such that AIs can travel 01:15:23.460 |
the way they are traveling in our labs today, 01:15:30.040 |
And let's call the current age of the universe one eon. 01:15:48.980 |
when the universe is going to be 1,000 times older 01:15:54.580 |
"Look, almost immediately after the Big Bang, 01:15:59.820 |
"the entire universe started to become intelligent." 01:16:08.380 |
has already happened or is already in a more advanced stage 01:16:36.540 |
are not much greater intelligence than our own? 01:16:42.300 |
When I was a boy, I was thinking about these things. 01:16:45.140 |
And I thought, "Hmm, maybe it has already happened." 01:16:48.380 |
Because back then I knew, I learned from popular physics 01:16:53.060 |
books that the structure, the large-scale structure 01:17:03.060 |
and then in between there are these huge empty spaces. 01:17:07.500 |
And I thought, "Hmm, maybe they aren't really empty." 01:17:25.740 |
within that bubble for its own unfathomable purposes. 01:17:39.420 |
explains the large-scale structure of the universe, 01:17:42.300 |
and that this is not a convincing explanation. 01:17:45.460 |
And then I thought, "Maybe it's the dark matter." 01:17:59.820 |
And we know that because otherwise our galaxy 01:18:08.060 |
And then the idea was maybe all of these AI civilizations 01:18:17.300 |
they are just invisible because they're really efficient 01:18:23.460 |
in using the energies of their own local systems, 01:18:29.780 |
But this is also not a convincing explanation 01:18:46.900 |
And today, I like to think it's quite plausible 01:18:57.300 |
within the few hundreds of millions of light years 01:19:09.220 |
- Is that exciting to you, that we might be the first? 01:19:16.500 |
because if we mess it up through a nuclear war, 01:19:33.740 |
- Jürgen, thank you so much for talking today.