Occam's Razor (Marcus Hutter) | AI Podcast Clips
Chapters
0:00 Occam's Razor
0:48 The most important principle in science
2:20 Why is Einstein so beautiful
3:43 Induction
4:29 Theory
6:31 Weighting
8:02 Compression
9:38 Kolmogorov Complexity
11:18 The Whole Universe
11:58 Noise and Chaos
13:38 Library of All Books
15:03 Game of Life
17:48 Finding Simple Programs
00:00:09.680 |
which sort of if you translate it to proper English, 00:00:12.680 |
means, and you know, in a scientific context, 00:00:15.280 |
means that if you have two theories or hypotheses 00:00:18.000 |
or models which equally well describe the phenomenon 00:00:32.280 |
Perhaps we'll kind of discuss it and think about it, 00:00:35.720 |
but what's the intuition of why the simpler answer 00:00:40.320 |
is the one that is likelier to be more correct descriptor 00:00:48.800 |
is probably the most important principle in science. 00:00:56.760 |
but science is about finding, understanding the world, 00:01:05.920 |
which explain everything but predict nothing, 00:01:08.200 |
but the simple model seem to have predictive power 00:01:18.200 |
You can just accept it, that is the principle of science, 00:01:21.440 |
and we use this principle and it seems to be successful. 00:01:25.040 |
We don't know why, but it just happens to be. 00:01:28.120 |
Or you can try, you know, find another principle 00:01:51.280 |
and we come back to that later in the case of Solomonoff induction 00:02:01.240 |
- So I apologize for the romanticized question, 00:02:03.840 |
but why do you think outside of its effectiveness, 00:02:12.120 |
Why does it just, why does E equals MC squared 00:02:22.840 |
many things can be explained by an evolutionary argument. 00:02:27.200 |
And, you know, there's some artifacts in humans 00:02:29.560 |
which are just artifacts and not evolutionary necessary. 00:02:59.080 |
and we know that humans are prone to find more patterns 00:03:13.360 |
but I mean, it's best, of course, if they are, 00:03:23.680 |
but indeed, in terms of just survival purposes, 00:03:30.880 |
for why we find the work of Einstein so beautiful. 00:03:39.040 |
Could you describe what Solomonoff induction is? 00:03:47.240 |
and Ray Solomonoff sort of claimed a long time ago 00:03:49.880 |
that this solves the big philosophical problem of induction. 00:04:04.240 |
induction can be interpreted narrowly and widely. 00:04:11.200 |
And widely means also then using these models 00:04:18.760 |
So I'm a little sloppy sort of with the terminology, 00:04:21.640 |
and maybe that comes from Ray Solomonoff being sloppy. 00:04:30.360 |
So let me explain a little bit this theory in simple terms. 00:04:43.240 |
The natural answer, I'm gonna speed up a little bit. 00:04:55.040 |
And why should it suddenly after 100 1s be different? 00:04:57.760 |
So what we're looking for is simple explanations or models 00:05:03.960 |
a model has to be presented in a certain language. 00:05:16.120 |
So abstractly on a Turing machine, for instance, 00:05:20.720 |
So, and there are, of course, lots of models. 00:05:25.120 |
and then 100 0s, and 100 1s, that's a model, right? 00:05:43.160 |
It will not stop, it will continue naturally. 00:05:48.360 |
And on the sequence of ones, it's very plausible, right? 00:05:58.320 |
The short program is again, you know, counter. 00:06:05.400 |
The extra twist is that it can also deal with noisy data. 00:06:15.600 |
then it will predict, it will learn and figure this out. 00:06:20.960 |
oh, the next coin flip will be head with probability 60%. 00:06:31.800 |
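The biased-coin idea can be sketched with a simple frequency estimator. This is my own illustration using Laplace's rule of succession, not Hutter's formalism, but it shows how a learner converges on "heads with probability 60%" from noisy data:

```python
def predict_next_head(flips):
    """Estimate P(next flip = heads) from observed flips (1 = heads)
    using Laplace's rule of succession: (heads + 1) / (n + 2)."""
    heads = sum(flips)
    return (heads + 1) / (len(flips) + 2)

# After seeing 60 heads in 100 flips, the estimate approaches 60%.
print(predict_next_head([1] * 60 + [0] * 40))  # ≈ 0.598
```

The "+1" and "+2" smoothing terms keep the estimate away from 0 and 1 on small samples, which is the same spirit as never assigning probability zero to an unrefuted hypothesis.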
Well, in Solomonoff induction, precisely what you do is, 00:06:34.440 |
so you combine, so looking for the shortest program 00:07:14.920 |
And you weigh all this hypothesis and take this mixture, 00:07:26.040 |
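The weighting scheme can be sketched on a toy hypothesis class. This is a drastic simplification of the actual Solomonoff mixture (which runs over all programs): here each hypothesis gets prior weight 2^(-description length), the weights are updated by likelihood, and predictions are mixed:

```python
# Toy Solomonoff-style mixture: hypotheses are (description_length_in_bits,
# predictor), where predictor(prefix) returns P(next bit = 1).
def mixture_predict(hypotheses, data):
    # Prior: shorter descriptions get exponentially more weight.
    weights = [2.0 ** -length for length, _ in hypotheses]
    # Bayesian update: multiply by each hypothesis's likelihood of the data.
    for i, (_, p1) in enumerate(hypotheses):
        for t, bit in enumerate(data):
            p = p1(data[:t])
            weights[i] *= p if bit == 1 else (1 - p)
    total = sum(weights)
    # Mixture probability that the next bit is 1.
    return sum(w * p1(data) for w, (_, p1) in zip(weights, hypotheses)) / total

# Hypothesis A (short, 2 bits): "almost always 1". Hypothesis B (long,
# 10 bits): fair coin. After twenty 1s, the short fitting model dominates.
hyps = [(2, lambda prefix: 0.999), (10, lambda prefix: 0.5)]
print(mixture_predict(hyps, [1] * 20))  # close to 1
```

The hypothesis class and bit lengths here are invented for illustration; the point is only that the mixture automatically favors short models that fit the data.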
That seems to me maybe a very human-centric concept, 00:07:37.320 |
You've used the term compression quite a bit. 00:07:45.320 |
and maybe science or just all of our intellectual pursuits 00:07:50.000 |
is basically the attempt to compress the complexity 00:07:55.840 |
So what does this word mean to you, compression? 00:08:21.720 |
Compression means finding short descriptions, 00:08:37.600 |
we're kind of zooming in on a particular sort of, 00:08:52.360 |
And well, there are also some other aspects of science, 00:09:01.400 |
And that is then part of the decision-making process. 00:09:08.080 |
to understand the data is essentially compression. 00:09:10.840 |
So I don't see any difference between compression, 00:09:17.240 |
- So we're jumping around topics a little bit, 00:09:22.760 |
a fascinating concept of Kolmogorov complexity. 00:09:43.400 |
And it takes the compression view to the extreme. 00:09:48.200 |
So I explained before that if you have some data sequence, 00:09:53.960 |
and best sort of, you know, just a string of bits. 00:10:01.640 |
like we compress big files into, say, zip files 00:10:05.920 |
And you can also produce self-extracting archives. 00:10:15.040 |
It's just a decompressor plus the archive together in one. 00:10:18.440 |
And now there are better and worse compressors, 00:10:21.040 |
and you can ask, what is the ultimate compressor? 00:10:23.320 |
So what is the shortest possible self-extracting archive 00:10:27.080 |
you could produce for a certain data set, yeah, 00:10:31.800 |
And the length of this is called the Kolmogorov complexity. 00:10:35.520 |
And arguably, that is the information content 00:10:40.160 |
I mean, if the data set is very redundant or very boring, 00:10:46.960 |
And, you know, it is low according to this definition. 00:10:54.280 |
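Kolmogorov complexity itself is uncomputable, but an off-the-shelf compressor gives a crude, computable upper bound in the same spirit, and makes the "redundant data has low information content" point concrete. A minimal sketch using Python's standard zlib:

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a rough, computable
    upper bound (up to a constant) on Kolmogorov complexity."""
    return len(zlib.compress(data, level=9))

boring = b"ab" * 5000           # highly redundant: low complexity
random_ish = os.urandom(10000)  # essentially incompressible: high complexity
print(compressed_size(boring), compressed_size(random_ish))
# The redundant string compresses to a tiny fraction of its size;
# the random one barely compresses at all.
```

Any real compressor only detects some regularities, so this is an upper bound, never the true Kolmogorov complexity.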
- And what's your sense of our sort of universe 00:10:58.480 |
when we think about the different objects in our universe, 00:11:03.480 |
that we try concepts or whatever at every level, 00:11:07.680 |
do they have high or low Kolmogorov complexity? 00:11:13.640 |
in being able to summarize much of our world? 00:11:25.800 |
based on the evidence we have, is very simple. 00:11:31.480 |
- Sorry, to linger on that, the whole universe, 00:11:36.260 |
Do you mean at the very basic fundamental level 00:11:54.360 |
- Is noise a problem, or is it a bug or a feature? 00:11:57.520 |
- I would say it makes our life as a scientist 00:12:25.000 |
So we can't get away with statistics even then. 00:12:33.440 |
But I mean, it's still so hard to compute the trajectory 00:12:52.180 |
then arguably you could describe the whole universe 00:12:55.420 |
with the standard model plus general relativity. 00:12:59.660 |
I mean, we don't have a theory of everything yet, 00:13:01.860 |
but sort of assuming we are close to it or have it, yeah. 00:13:11.300 |
But that's spoiled by noise or by chaotic systems 00:13:15.800 |
or by initial conditions, which may be complex. 00:13:35.280 |
but when you just take a small window, then-- 00:13:38.360 |
- It may become complex, and that may be counterintuitive, 00:13:46.520 |
So imagine you have a normal library with interesting books 00:13:54.260 |
So now I create a library which contains all possible books, 00:13:59.080 |
So the first book just has AAAA over all the pages. 00:14:01.960 |
The next book, AAAA, and ends with B, and so on. 00:14:15.140 |
and suddenly you have a lot of information in there. 00:14:22.700 |
seems to be understudied or under-talked about 00:14:27.200 |
What lessons do you draw from sort of the game of life 00:14:33.060 |
just like you're describing with the universe, 00:14:46.360 |
where, like you said, some chaotic behavior could happen, 00:14:51.640 |
it could die out in some very rigid structures? 00:15:06.480 |
is really great because the rules are so simple, 00:15:09.960 |
and even by hand, you can simulate a little bit, 00:15:16.240 |
and people have proven that it's even Turing-complete. 00:15:19.040 |
You can not only use a computer to simulate the Game of Life, 00:15:56.840 |
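The rules really are simple enough to simulate by hand, or in a few lines of code. A minimal sketch of one update step (sparse set-of-live-cells representation, my choice of encoding):

```python
from collections import Counter

def life_step(live):
    """One step of Conway's Game of Life; `live` is a set of (x, y) cells.
    A cell is alive next step if it has exactly 3 live neighbours,
    or is currently alive with exactly 2."""
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in live)}

# A "blinker" (three cells in a row) oscillates with period 2.
blinker = {(0, 0), (1, 0), (2, 0)}
print(life_step(life_step(blinker)) == blinker)  # True
```

Despite this tiny rule set, the system is Turing-complete, which is exactly the point being made.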
You asked also about whether I understand this phenomenon, 00:16:10.600 |
And I think I'm pretty used to cellular automata, 00:16:21.520 |
I didn't play too much with Conway's Game of Life, 00:16:32.240 |
And well, when the computers were really slow, 00:16:37.600 |
and programmed my own programs in assembler too. 00:16:52.560 |
and then I tried to understand what is going on, 00:17:20.360 |
And by sort of mathematically approaching this problem, 00:17:24.840 |
you slowly get a feeling of why things are like they are, 00:17:30.360 |
and that sort of is a first step to understanding 00:17:37.160 |
- Do you think it's possible, what's your intuition, 00:17:39.440 |
do you think it's possible to reverse engineer 00:17:41.160 |
and find the short program that generated these fractals 00:17:50.040 |
So, I mean, in principle, what you can do is, 00:17:54.240 |
you take any data set, you take these fractals, 00:17:56.800 |
or you take whatever your data set, whatever you have, 00:18:05.480 |
you take a program of size one, two, three, four, 00:18:07.560 |
and all these programs, run them all in parallel 00:18:13.600 |
the first one gets 50% of the resources, the second one half of that, and so on, 00:18:21.440 |
and if some of these programs produce the correct data, 00:18:24.640 |
then you stop, and then you have already some program. 00:18:26.760 |
It may be a long program, because it runs faster, 00:18:31.960 |
until you eventually find the shortest program. 00:18:37.800 |
because there could be an even shorter program, 00:18:44.440 |
But asymptotically, and actually after a finite time, 00:18:48.720 |
So this is a theoretical but completely impractical way 00:19:02.880 |
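The enumeration idea can be sketched in a toy description language of my own choosing, where a "program" is just a block repeated to cover the data. Since these toy descriptions always halt, a plain shortest-first loop stands in for the parallel resource-sharing scheme needed with real (possibly non-halting) programs:

```python
def shortest_repeating_description(data: str) -> str:
    """Toy 'shortest program' search: try descriptions in order of
    increasing length, where a description is a block repeated to
    cover the data. The first hit is the shortest, by construction."""
    for length in range(1, len(data) + 1):  # enumerate shortest-first
        block = data[:length]
        candidate = (block * (len(data) // length + 1))[:len(data)]
        if candidate == data:
            return block
    return data  # incompressible in this language

print(shortest_repeating_description("010101010101"))  # "01"
```

The real procedure searches over all Turing-machine programs, which is why it is theoretically complete but completely impractical; this toy version only finds periodic regularities.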
In practice, of course, we have to approach the problem 00:19:07.320 |
if you take resource limitations into account, 00:19:12.960 |
there's, for instance, the field of pseudo random numbers, 00:19:30.120 |
I mean, random numbers maybe not that interesting, 00:19:46.760 |
that's a big challenge for our search for simple programs 00:19:52.040 |
in the space of artificial intelligence, perhaps. 00:19:54.560 |
- Yes, it definitely is for artificial intelligence, 00:20:00.920 |
physicists worked really hard to find these theories, 00:20:04.280 |
but apparently it was possible for human minds