
Juergen Schmidhuber: Godel Machines, Meta-Learning, and LSTMs | Lex Fridman Podcast #11


Chapters

0:00 Introduction
9:15 Traveling Salesman Problems
19:29 Does God Play Dice
27:53 The General Theory of Relativity
38:07 The Role of Creativity and Intelligence
47:29 What Is the Importance of Depth
73:06 What Superintelligent AIs Will Find Interesting: Understanding Life, Its History, and Others of Their Own Kind

Whisper Transcript

00:00:00.000 | The following is a conversation with Jürgen Schmidhuber.
00:00:03.520 | He's the co-director of the Swiss AI Lab IDSIA
00:00:06.360 | and a co-creator of long short-term memory networks.
00:00:10.400 | LSTMs are used in billions of devices today
00:00:13.720 | for speech recognition, translation, and much more.
00:00:17.400 | Over 30 years, he has proposed a lot of interesting
00:00:20.800 | out-of-the-box ideas on meta-learning,
00:00:23.440 | adversarial networks, computer vision,
00:00:26.000 | and even a formal theory of quote,
00:00:28.760 | creativity, curiosity, and fun.
00:00:32.360 | This conversation is part of the MIT course
00:00:34.920 | on artificial general intelligence
00:00:36.520 | and the Artificial Intelligence Podcast.
00:00:38.840 | If you enjoy it, subscribe on YouTube, iTunes,
00:00:41.960 | or simply connect with me on Twitter
00:00:43.960 | at Lex Fridman spelled F-R-I-D.
00:00:47.320 | And now, here's my conversation with Jürgen Schmidhuber.
00:00:51.520 | Early on, you dreamed of AI systems
00:00:55.640 | that self-improve recursively.
00:00:58.720 | When was that dream born?
00:01:00.240 | - When I was a baby?
00:01:02.920 | No, that's not true.
00:01:04.000 | When I was a teenager.
00:01:05.240 | - And what was the catalyst for that birth?
00:01:09.440 | What was the thing that first inspired you?
00:01:11.600 | - When I was a boy,
00:01:13.960 | I was thinking about what to do in my life
00:01:19.960 | and then I thought the most exciting thing
00:01:23.640 | is to solve the riddles of the universe.
00:01:27.960 | And that means you have to become a physicist.
00:01:30.720 | However, then I realized that there's something even grander
00:01:35.640 | you can try to build a machine
00:01:39.680 | that isn't really a machine any longer,
00:01:41.920 | that learns to become a much better physicist
00:01:44.320 | than I could ever hope to be.
00:01:46.880 | And that's how I thought maybe I can multiply
00:01:50.120 | my tiny little bit of creativity into infinity.
00:01:54.320 | - But ultimately that creativity will be multiplied
00:01:57.040 | to understand the universe around us.
00:01:59.120 | That's the curiosity for that mystery that drove you?
00:02:04.120 | - Yes, so if you can build a machine
00:02:08.320 | that learns to solve more and more complex problems
00:02:16.760 | and to become a more and more general problem solver,
00:02:16.760 | then you basically have solved all the problems,
00:02:21.760 | at least all the solvable problems.
00:02:25.960 | - So how do you think, what is the mechanism
00:02:28.120 | for that kind of general solver look like?
00:02:31.640 | Obviously we don't quite yet have one
00:02:34.840 | or know how to build one, but we have ideas
00:02:37.040 | and you have had throughout your career
00:02:39.120 | several ideas about it.
00:02:40.800 | So how do you think about that mechanism?
00:02:43.640 | - So in the 80s, I thought about how to build this machine
00:02:48.640 | that learns to solve all these problems
00:02:51.040 | that I cannot solve myself.
00:02:54.120 | And I thought it is clear it has to be a machine
00:02:57.160 | that not only learns to solve this problem here
00:03:00.880 | and this problem here, but it also has to learn
00:03:04.160 | to improve the learning algorithm itself.
00:03:08.080 | So it has to have the learning algorithm
00:03:12.480 | in a representation that allows it to inspect it
00:03:15.720 | and modify it such that it can come up
00:03:19.240 | with a better learning algorithm.
00:03:22.120 | So I call that meta-learning, learning to learn
00:03:25.720 | and recursive self-improvement.
00:03:28.080 | That is really the pinnacle of that,
00:03:29.880 | where you then not only learn how to improve
00:03:34.880 | on that problem and on that,
00:03:37.520 | but you also improve the way the machine improves
00:03:41.120 | and you also improve the way it improves
00:03:43.200 | the way it improves itself.
00:03:44.600 | And that was my 1987 diploma thesis,
00:03:48.600 | which was all about that hierarchy of meta-learners
00:03:53.240 | that have no computational limits
00:03:57.280 | except for the well-known limits
00:03:59.960 | that Gödel identified in 1931
00:04:03.240 | and for the limits of physics.
00:04:05.720 | - In the recent years, meta-learning has gained popularity
00:04:10.120 | in a specific kind of form.
00:04:12.840 | You've talked about how that's not really meta-learning
00:04:16.040 | with neural networks, that's more basic transfer learning.
00:04:21.040 | Can you talk about the difference
00:04:22.720 | between the big general meta-learning
00:04:25.480 | and a more narrow sense of meta-learning
00:04:27.960 | the way it's used today, the way it's talked about today?
00:04:30.880 | - Let's take the example of a deep neural network
00:04:33.440 | that has learned to classify images.
00:04:37.240 | And maybe you have trained that network
00:04:40.080 | on 100 different databases of images.
00:04:45.840 | And now a new database comes along
00:04:48.120 | and you want to quickly learn the new thing as well.
00:04:52.000 | So one simple way of doing that is you take the network,
00:04:57.720 | which already knows 100 types of databases,
00:05:02.440 | and then you would just take the top layer of that
00:05:06.320 | and you retrain that using the new label data
00:05:11.320 | that you have in the new image database.
00:05:14.720 | And then it turns out that it really, really quickly
00:05:17.360 | can learn that too, one shot basically,
00:05:20.600 | because from the first 100 datasets,
00:05:24.320 | it already has learned so much about computer vision
00:05:27.560 | that it can reuse that, and that is then almost good enough
00:05:31.880 | to solve the new task, except you need a little bit
00:05:34.240 | of adjustment on the top.
00:05:37.080 | So that is transfer learning, and it has been done
00:05:42.320 | in principle for many decades.
00:05:44.520 | People have done similar things for decades.
00:05:46.720 | Meta-learning, true meta-learning is about
00:05:51.080 | having the learning algorithm itself open to introspection
00:05:56.080 | by the system that is using it,
00:06:00.440 | and also open to modification,
00:06:04.800 | such that the learning system has an opportunity
00:06:07.880 | to modify any part of the learning algorithm,
00:06:12.120 | and then evaluate the consequences of that modification,
00:06:16.760 | and then learn from that to create
00:06:21.040 | a better learning algorithm, and so on recursively.
00:06:24.880 | So that's a very different animal,
00:06:28.560 | where you are opening the space
00:06:31.160 | of possible learning algorithms
00:06:33.560 | to the learning system itself.
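
To make the distinction above concrete, here is a minimal Python sketch, not from the conversation: the frozen network, the "candidate learning rules," and all numbers are illustrative stand-ins. Transfer learning reuses frozen features and retrains only the top layer with a fixed rule; the toy "meta-learner" is allowed to modify a piece of its own learning algorithm (here just the step size), evaluate the consequence, and keep whichever variant learns best. Real meta-learning in Schmidhuber's sense opens the whole algorithm to modification, not a single hyperparameter.

```python
# Hedged sketch contrasting transfer learning with (toy) meta-learning.
import numpy as np

rng = np.random.default_rng(0)

# Frozen features: stand-in for a network already trained on 100 image databases.
W_frozen = rng.standard_normal((8, 16))

def features(x):
    return np.tanh(x @ W_frozen)               # reused, never modified

def train_top_layer(x, y, lr, steps=200):
    """Transfer learning: only the top layer adapts; the learning rule is fixed."""
    w = np.zeros(16)
    f = features(x)
    for _ in range(steps):
        w -= lr * f.T @ (f @ w - y) / len(y)   # hand-designed gradient step
    return w

def meta_learn(x, y, candidate_rules=(0.01, 0.1, 0.5)):
    """Toy meta-learning: modify part of the learning algorithm (the step size),
    evaluate the consequences, and keep the variant that learns best."""
    best = None
    for lr in candidate_rules:
        w = train_top_layer(x, y, lr)
        loss = np.mean((features(x) @ w - y) ** 2)
        if best is None or loss < best[0]:
            best = (loss, lr)
    return best

x = rng.standard_normal((64, 8))
y = x[:, 0]                                    # toy target for the "new database"
print("best (loss, learning rule):", meta_learn(x, y))
```
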
00:06:35.520 | - Right, so you've, like in the 2004 paper,
00:06:39.040 | you described Gödel machines,
00:06:41.920 | programs that rewrite themselves, right?
00:06:44.480 | Philosophically, and even in your paper,
00:06:46.560 | mathematically, these are really compelling ideas,
00:06:49.960 | but practically, do you see these self-referential programs
00:06:54.960 | being successful in the near term,
00:06:58.360 | to having an impact where sort of it demonstrates
00:07:01.960 | to the world that this direction
00:07:04.760 | is a good one to pursue in the near term?
00:07:08.640 | - Yes, we had these two different types
00:07:11.320 | of fundamental research,
00:07:13.440 | how to build a universal problem solver,
00:07:15.800 | one basically exploiting proof search,
00:07:20.320 | and things like that, that you need to come up with
00:07:25.520 | asymptotically optimal, theoretically optimal,
00:07:30.280 | self-improvers and problem solvers.
00:07:33.200 | However, one has to admit that through this proof search,
00:07:39.160 | comes in an additive constant,
00:07:43.600 | an overhead, an additive overhead,
00:07:46.760 | that vanishes in comparison to what you have to do
00:07:51.760 | to solve large problems.
00:07:55.160 | However, for many of the small problems
00:07:58.000 | that we want to solve in our everyday life,
00:08:00.880 | we cannot ignore this constant overhead.
00:08:03.280 | And that's why we also have been doing other things,
00:08:08.120 | non-universal things, such as recurrent neural networks,
00:08:12.160 | which are trained by gradient descent,
00:08:15.400 | and local search techniques, which aren't universal at all,
00:08:18.680 | which aren't provably optimal at all,
00:08:21.280 | like the other stuff that we did,
00:08:22.880 | but which are much more practical,
00:08:25.400 | as long as we only want to solve the small problems
00:08:28.760 | that we are typically trying to solve
00:08:33.320 | in this environment here.
00:08:35.560 | So the universal problem solvers, like the Gödel machine,
00:08:38.920 | but also Marcus Hutter's fastest way
00:08:42.080 | of solving all possible problems,
00:08:44.360 | which he developed around 2002 in my lab,
00:08:48.080 | they are associated with these constant overheads
00:08:52.520 | for proof search, which guarantees that the thing
00:08:55.160 | that you're doing is optimal.
00:08:56.560 | For example, there is this fastest way
00:09:01.160 | of solving all problems with a computable solution,
00:09:05.280 | which is due to Marcus Hutter.
00:09:07.280 | And to explain what's going on there,
00:09:12.240 | let's take traveling salesman problems.
00:09:14.320 | With traveling salesman problems,
00:09:17.360 | you have a number of cities, N cities,
00:09:21.320 | and you try to find the shortest path
00:09:23.680 | through all these cities without visiting any city twice.
00:09:27.840 | And nobody knows the fastest way
00:09:32.320 | of solving traveling salesman problems, TSPs,
00:09:36.560 | but let's assume there is a method of solving them
00:09:41.760 | within N to the five operations,
00:09:45.920 | where N is the number of cities.
00:09:50.240 | Then the universal method of Marcus
00:09:54.600 | is going to solve the same traveling salesman problem
00:09:58.600 | also within N to the five steps,
00:10:02.160 | plus O of one, plus a constant number of steps
00:10:06.440 | that you need for the proof searcher,
00:10:09.280 | which you need to show that this particular class
00:10:14.120 | of problems, the traveling salesman problems,
00:10:17.240 | can be solved within a certain time bound
00:10:20.320 | within order N to the five steps, basically.
00:10:24.320 | And this additive constant doesn't care for N,
00:10:28.440 | which means as N is getting larger and larger,
00:10:32.320 | as you have more and more cities,
00:10:34.800 | the constant overhead pales in comparison.
00:10:38.640 | And that means that almost all large problems are solved
00:10:43.640 | in the best possible way.
00:10:45.600 | Already today, we already have a universal problem solver
00:10:49.120 | like that.
00:10:50.600 | However, it's not practical because the overhead,
00:10:54.640 | the constant overhead is so large
00:10:57.600 | that for the small kinds of problems
00:11:00.320 | that we want to solve in this little biosphere.
00:11:04.680 | - By the way, when you say small,
00:11:06.520 | you're talking about things that fall within the constraints
00:11:09.520 | of our computational systems.
00:11:11.000 | So they can seem quite large to us mere humans, right?
00:11:14.560 | - That's right, yeah.
00:11:15.480 | So they seem large and even unsolvable
00:11:19.120 | in a practical sense today,
00:11:21.120 | but they are still small compared to almost all problems
00:11:24.880 | because almost all problems are large problems,
00:11:28.600 | which are much larger than any constant.
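
A rough numerical illustration of the point about the additive overhead (all numbers here are made up): if the best known method costs n**5 steps and the universal solver adds a fixed proof-search cost C on top, the overhead dominates for small n and vanishes as n grows.

```python
# Hedged illustration: constant proof-search overhead vs. problem size.
C = 10**12                        # illustrative fixed proof-search overhead

for n in (10, 100, 1_000, 100_000):
    direct = n**5                 # hypothetical cost of the known TSP method
    universal = n**5 + C          # same method found via the universal solver
    print(f"n={n:>7}: n**5 = {direct:.2e}, "
          f"with overhead = {universal:.2e}, "
          f"overhead share = {C / universal:.2%}")
```
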
00:11:31.000 | - Do you find it useful as a person who has dreamed
00:11:36.120 | of creating a general learning system,
00:11:38.720 | has worked on creating one,
00:11:39.960 | has done a lot of interesting ideas there,
00:11:42.240 | to think about P versus NP,
00:11:46.400 | this formalization of how hard problems are,
00:11:50.840 | how they scale,
00:11:52.440 | this kind of worst case analysis type of thinking,
00:11:55.280 | do you find that useful?
00:11:56.880 | Or is it only just a mathematical,
00:11:59.740 | it's a set of mathematical techniques
00:12:02.680 | to give you intuition about what's good and bad.
00:12:05.800 | - So P versus NP, that's super interesting
00:12:09.520 | from a theoretical point of view.
00:12:11.840 | And in fact, as you are thinking about that problem,
00:12:14.600 | you can also get inspiration
00:12:17.320 | for better practical problem solvers.
00:12:21.320 | On the other hand, we have to admit that at the moment,
00:12:24.600 | the best practical problem solvers
00:12:28.400 | for all kinds of problems that we are now solving
00:12:31.120 | through what is called AI at the moment,
00:12:33.880 | they are not of the kind
00:12:36.280 | that is inspired by these questions.
00:12:38.840 | There we are using general purpose computers,
00:12:42.720 | such as recurrent neural networks,
00:12:44.840 | but we have a search technique,
00:12:46.720 | which is just local search, gradient descent,
00:12:50.320 | to try to find a program
00:12:51.960 | that is running on these recurrent networks,
00:12:54.420 | such that it can solve some interesting problems,
00:12:58.560 | such as speech recognition,
00:13:00.560 | or machine translation, and something like that.
00:13:03.120 | And there is very little theory
00:13:06.480 | behind the best solutions that we have at the moment
00:13:09.720 | that can do that.
00:13:10.800 | - Do you think that needs to change?
00:13:12.640 | Do you think that will change?
00:13:14.080 | Or can we go,
00:13:15.120 | can we create a general intelligence system
00:13:17.120 | without ever really proving that that system is intelligent
00:13:20.600 | in some kind of mathematical way,
00:13:22.560 | solving machine translation perfectly,
00:13:24.960 | or something like that,
00:13:26.300 | within some kind of syntactic definition of a language?
00:13:29.160 | Or can we just be super impressed
00:13:31.120 | by the thing working extremely well, and that's sufficient?
00:13:35.080 | - There's an old saying,
00:13:36.720 | and I don't know who brought it up first,
00:13:39.340 | which says, "There's nothing more practical
00:13:42.440 | than a good theory."
00:13:43.720 | (laughing)
00:13:45.960 | And a good theory of problem solving
00:13:50.400 | under limited resources,
00:13:54.320 | like here in this universe, or on this little planet,
00:13:57.040 | has to take into account these limited resources.
00:14:01.800 | And so probably there is lacking
00:14:04.960 | a theory which is related to what we already have,
00:14:09.960 | these asymptotically optimal problem solvers,
00:14:14.440 | which tells us what we need in addition to that
00:14:18.560 | to come up with a practically optimal problem solver.
00:14:21.760 | So I believe we will have something like that.
00:14:27.120 | And maybe just a few little tiny twists
00:14:29.720 | are necessary to change what we already have
00:14:34.280 | to come up with that as well.
00:14:36.320 | As long as we don't have that,
00:14:37.720 | we admit that we are taking suboptimal ways
00:14:42.560 | and are using recurrent neural networks and long short-term memory,
00:14:45.960 | for example equipped with local search techniques,
00:14:50.400 | and we are happy that it works better
00:14:53.520 | than any competing method,
00:14:55.520 | but that doesn't mean that we think we are done.
00:15:00.520 | - You've said that an AGI system
00:15:02.760 | will ultimately be a simple one.
00:15:05.080 | A general intelligence system
00:15:06.240 | will ultimately be a simple one.
00:15:08.040 | Maybe a pseudocode of a few lines
00:15:10.280 | would be able to describe it.
00:15:11.880 | Can you talk through your intuition behind this idea?
00:15:16.800 | Why you feel that at its core,
00:15:21.520 | intelligence is a simple algorithm?
00:15:25.560 | - Experience tells us that the stuff that works best
00:15:31.680 | is really simple.
00:15:33.120 | So the asymptotically optimal ways of solving problems,
00:15:37.640 | if you look at them,
00:15:38.800 | they're just a few lines of code, it's really true.
00:15:41.800 | Although they have these amazing properties,
00:15:44.000 | just a few lines of code.
00:15:45.800 | Then the most promising and most useful practical things
00:15:51.600 | maybe don't have this proof of optimality
00:15:56.360 | associated with them.
00:15:57.800 | However, they are also just a few lines of code.
00:16:00.880 | The most successful recurrent neural networks,
00:16:05.080 | you can write them down in five lines of pseudocode.
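
As a rough illustration of that claim (this is a plain vanilla RNN, not the LSTM itself; the shapes and random weights are arbitrary), the core update of a recurrent network really does fit in a handful of lines:

```python
# Hedged sketch: the core of a vanilla recurrent network in a few lines.
import numpy as np

def rnn_forward(xs, W_in, W_rec, W_out):
    h = np.zeros(W_rec.shape[0])            # hidden state carried through time
    ys = []
    for x in xs:                            # one step per element of the sequence
        h = np.tanh(W_in @ x + W_rec @ h)   # recurrent state update
        ys.append(W_out @ h)                # readout at every time step
    return ys

rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 3))            # a sequence of 5 inputs of size 3
outs = rnn_forward(xs,
                   rng.standard_normal((4, 3)),   # W_in
                   rng.standard_normal((4, 4)),   # W_rec
                   rng.standard_normal((2, 4)))   # W_out
print(outs[-1])
```
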
00:16:08.520 | - That's a beautiful, almost poetic idea,
00:16:10.920 | but what you're describing there
00:16:15.640 | is the lines of pseudocode are sitting on top of layers
00:16:19.240 | and layers of abstractions in a sense.
00:16:21.440 | So you're saying at the very top,
00:16:25.040 | it'll be a beautifully written sort of algorithm,
00:16:29.040 | but do you think that there's many layers of abstractions
00:16:34.000 | we have to first learn to construct?
00:16:36.880 | - Yeah, of course, we are building on all these
00:16:40.400 | great abstractions that people have invented
00:16:44.000 | over the millennia, such as matrix multiplications.
00:16:49.760 | And real numbers and basic arithmetics and calculus
00:16:54.760 | and derivations of error functions
00:17:01.240 | and derivatives of error functions and stuff like that.
00:17:04.280 | So without that language that greatly simplifies
00:17:10.440 | our way of thinking about these problems,
00:17:13.840 | we couldn't do anything.
00:17:14.800 | So in that sense, as always,
00:17:16.560 | we are standing on the shoulders of the giants
00:17:19.600 | who in the past simplified the problem
00:17:24.240 | of problem solving so much
00:17:26.360 | that now we have a chance to do the final step.
00:17:29.960 | - So the final step will be a simple one.
00:17:32.120 | If we take a step back through all of human civilization
00:17:36.720 | and just the universe in general,
00:17:38.360 | how do you think about evolution
00:17:41.440 | and what if creating a universe is required
00:17:45.360 | to achieve this final step?
00:17:47.280 | What if going through the very painful
00:17:50.880 | and inefficient process of evolution is needed
00:17:53.800 | to come up with this set of abstractions
00:17:55.840 | that ultimately lead to intelligence?
00:17:57.760 | Do you think there's a shortcut
00:18:00.760 | or do you think we have to create
00:18:02.400 | something like our universe
00:18:04.600 | in order to create something like human level intelligence?
00:18:07.720 | - So far, the only example we have is this one,
00:18:13.120 | this universe in which we are living.
00:18:15.400 | - Do you think you can do better?
00:18:16.560 | - Maybe not, but we are part of this whole process.
00:18:25.040 | So apparently, so it might be the case
00:18:30.000 | that the code that runs the universe is really, really simple.
00:18:33.680 | Everything points to that possibility
00:18:36.680 | because gravity and other basic forces
00:18:39.960 | are really simple laws that can be easily described
00:18:44.120 | also in just a few lines of code basically.
00:18:46.480 | And then there are these other events
00:18:52.200 | that the apparently random events
00:18:55.080 | in the history of the universe,
00:18:56.560 | which as far as we know at the moment
00:18:58.800 | don't have a compact code, but who knows,
00:19:01.360 | maybe somebody in the near future
00:19:03.240 | is going to figure out the pseudo random generator
00:19:06.840 | which is computing whether the measurement
00:19:11.840 | of that spin up or down thing here
00:19:16.000 | is going to be positive or negative.
00:19:18.520 | - Underlying quantum mechanics.
00:19:20.000 | - Yes.
00:19:20.840 | - So you ultimately think quantum mechanics
00:19:23.240 | is a pseudo random number generator,
00:19:25.280 | so it's all deterministic.
00:19:27.000 | There's no randomness in our universe.
00:19:28.880 | Does God play dice?
00:19:31.880 | - So a couple of years ago, a famous physicist,
00:19:37.080 | quantum physicist, Anton Zeilinger,
00:19:39.520 | he wrote an essay in "Nature"
00:19:42.160 | and it started more or less like that.
00:19:44.840 | One of the fundamental insights of the 20th century
00:19:52.280 | was that the universe is fundamentally random
00:19:58.880 | on the quantum level.
00:20:02.920 | And that whenever you measure spin up or down
00:20:06.800 | or something like that,
00:20:08.280 | a new bit of information enters the history of the universe.
00:20:12.320 | And while I was reading that,
00:20:16.000 | I was already typing the response
00:20:19.320 | and they had to publish it because I was right.
00:20:21.720 | That there is no evidence, no physical evidence for that.
00:20:27.880 | So there's an alternative explanation
00:20:30.440 | where everything that we consider random
00:20:33.480 | is actually pseudo random,
00:20:35.760 | such as the decimal expansion of pi, 3.141 and so on,
00:20:40.760 | which looks random, but isn't.
00:20:44.360 | So pi is interesting because every three digit sequence,
00:20:49.360 | every sequence of three digits
00:20:52.240 | appears roughly one in a thousand times.
00:20:56.000 | And every five digit sequence
00:20:59.960 | appears roughly one in 10,000 times.
00:21:03.480 | What you would expect if it was random.
00:21:07.040 | But there's a very short algorithm,
00:21:09.320 | a short program that computes all of that.
00:21:11.680 | So it's extremely compressible.
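
A small illustration of that compressibility claim, using Machin's formula rather than anything discussed in the conversation: a few lines of integer arithmetic generate as many digits of pi as you like, and the three-digit statistics come out roughly uniform, as described.

```python
# Hedged sketch: a very short program that produces the digits of pi,
# plus a check that 3-digit patterns appear roughly one in a thousand times.
from collections import Counter

def arctan_inv(x, digits):
    """arctan(1/x), scaled to an integer with `digits` digits plus guard digits."""
    scale = 10 ** (digits + 10)
    term = scale // x
    total, n, sign = term, 1, 1
    while term:
        term //= x * x
        n += 2
        sign = -sign
        total += sign * term // n
    return total

def pi_digits(digits):
    """First `digits` decimal digits of pi via Machin's formula."""
    pi_scaled = 4 * (4 * arctan_inv(5, digits) - arctan_inv(239, digits))
    return str(pi_scaled // 10 ** 10)          # "314159..."

d = pi_digits(10_000)[1:]                      # digits after the leading 3
counts = Counter(d[i:i + 3] for i in range(len(d) - 2))
print("distinct 3-digit patterns:", len(counts))
print("average occurrences per pattern:", sum(counts.values()) / 1000)
```
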
00:21:13.880 | And who knows, maybe tomorrow somebody,
00:21:15.840 | some grad student at CERN
00:21:17.560 | goes back over all these data points,
00:21:20.960 | beta decay and whatever,
00:21:22.600 | and figures out, oh, it's the second billion digits of pi
00:21:27.280 | or something like that.
00:21:28.520 | We don't have any fundamental reason at the moment
00:21:31.520 | to believe that this is truly random
00:21:36.120 | and not just a deterministic video game.
00:21:39.080 | If it was a deterministic video game,
00:21:41.120 | it would be much more beautiful
00:21:43.120 | because beauty is simplicity.
00:21:46.680 | And many of the basic laws of the universe,
00:21:49.840 | like gravity and the other basic forces are very simple.
00:21:54.120 | So very short programs can explain what these are doing.
00:21:58.080 | And it would be awful and ugly.
00:22:03.360 | The universe would be ugly.
00:22:04.560 | The history of the universe would be ugly
00:22:06.760 | if for the extra things,
00:22:08.160 | the seemingly random data points that we get all the time,
00:22:12.880 | that we really need a huge number of extra bits
00:22:17.920 | to describe all these extra bits of information.
00:22:22.920 | So as long as we don't have evidence
00:22:27.400 | that there is no short program
00:22:29.760 | that computes the entire history of the entire universe,
00:22:34.000 | we are, as scientists, compelled to look further
00:22:40.360 | for that shortest program.
00:22:42.760 | - Your intuition says there exists a program
00:22:47.800 | that can backtrack to the creation of the universe.
00:22:52.000 | So compute the shortest path to the creation of the universe.
00:22:54.280 | - Yes, including all the entanglement things
00:22:58.440 | and all the spin up and down measurements
00:23:02.480 | that have been taken place
00:23:05.760 | since 13.8 billion years ago.
00:23:10.640 | And so, yeah.
00:23:11.840 | So we don't have a proof that it is random.
00:23:15.720 | We don't have a proof that it is compressible
00:23:19.600 | to a short program.
00:23:20.800 | But as long as we don't have that proof,
00:23:22.400 | we are obliged as scientists
00:23:25.000 | to keep looking for that simple explanation.
00:23:27.600 | - Absolutely.
00:23:28.440 | So you said simplicity is beautiful or beauty is simple.
00:23:31.600 | Either one works.
00:23:33.280 | But you also work on curiosity, discovery,
00:23:37.080 | the romantic notion of randomness, of serendipity,
00:23:42.840 | of being surprised by things that are about you,
00:23:47.840 | kind of in our poetic notion of reality,
00:23:53.440 | we think as humans require randomness.
00:23:56.440 | So you don't find randomness beautiful.
00:23:59.040 | You find simple determinism beautiful.
00:24:04.040 | - Yeah.
00:24:05.720 | - Okay.
00:24:07.720 | - So why?
00:24:08.600 | - Why?
00:24:09.440 | Because the explanation becomes shorter.
00:24:13.080 | A universe that is compressible to a short program
00:24:18.080 | is much more elegant and much more beautiful
00:24:22.920 | than another one, which needs an almost infinite number
00:24:26.280 | of bits to be described.
00:24:27.880 | As far as we know,
00:24:30.160 | many things that are happening in this universe
00:24:33.960 | are really simple in terms of short programs
00:24:37.520 | that compute gravity and the interaction
00:24:41.160 | between elementary particles and so on.
00:24:43.600 | So all of that seems to be very, very simple.
00:24:45.800 | Every electron seems to reuse the same sub-program
00:24:50.240 | all the time as it is interacting
00:24:52.280 | with other elementary particles.
00:24:54.600 | If we now require an extra oracle
00:25:04.440 | injecting new bits of information all the time
00:25:07.800 | for these extra things which are currently not understood,
00:25:11.960 | such as beta decay,
00:25:16.040 | then the whole description length
00:25:23.640 | of the data that we can observe of the history
00:25:26.560 | of the universe would become much longer
00:25:31.560 | and therefore uglier.
00:25:33.720 | - And uglier.
00:25:34.920 | Again, simplicity is elegant and beautiful.
00:25:38.000 | - The history of science is a history
00:25:40.840 | of compression progress.
00:25:42.720 | - Yeah, so you've described sort of
00:25:46.960 | as we build up abstractions
00:25:48.680 | and you've talked about the idea of compression.
00:25:52.120 | How do you see this, the history of science,
00:25:55.560 | the history of humanity, our civilization,
00:25:58.600 | and life on Earth as some kind of path
00:26:02.000 | towards greater and greater compression?
00:26:04.120 | What do you mean by that?
00:26:05.040 | How do you think about that?
00:26:06.960 | - Indeed, the history of science
00:26:09.240 | is a history of compression progress.
00:26:13.080 | What does that mean?
00:26:14.640 | Hundreds of years ago, there was an astronomer
00:26:18.000 | whose name was Kepler,
00:26:20.080 | and he looked at the data points that he got
00:26:22.560 | by watching planets move,
00:26:25.800 | and then he had all these data points,
00:26:27.560 | and suddenly it turned out
00:26:28.760 | that he can greatly compress the data
00:26:31.920 | by predicting it through an ellipse law.
00:26:36.920 | So it turns out that all these data points
00:26:39.840 | are more or less on ellipses around the sun.
00:26:43.680 | And another guy came along whose name was Newton,
00:26:49.600 | and before him Hooke,
00:26:51.440 | and they said the same thing
00:26:53.680 | that is making these planets move like that
00:26:58.480 | is what makes the apples fall down.
00:27:01.880 | And it also holds for stones
00:27:05.520 | and for all kinds of other objects.
00:27:09.800 | And suddenly, many, many of these observations
00:27:15.080 | became much more compressible,
00:27:16.960 | because as long as you can predict the next thing,
00:27:19.960 | given what you have seen so far,
00:27:21.720 | you can compress it,
00:27:22.640 | but you don't have to store that data extra.
00:27:25.320 | This is called predictive coding.
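
A toy version of predictive coding (the "falling apples" numbers below are simulated, not real data): if a simple predictor explains each new observation from the previous ones, only the small residuals need to be stored.

```python
# Hedged toy example of predictive coding: store only what the predictor
# cannot explain.
import numpy as np

t = np.arange(20, dtype=float)
positions = 0.5 * 9.81 * t ** 2          # "video of falling apples", one value per frame

# Predict each frame by linear extrapolation from the previous two frames
# and keep only the residual, the part the predictor could not explain.
predicted = 2 * positions[1:-1] - positions[:-2]
residuals = positions[2:] - predicted

print("largest raw value:", positions.max())
print("largest residual: ", np.abs(residuals).max())
# The residuals are all the same constant (9.81), so the whole sequence
# compresses to a simple law plus the first two frames.
```
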
00:27:29.240 | And then there was still something wrong
00:27:31.400 | with that theory of the universe,
00:27:33.680 | and you had deviations from these predictions of the theory.
00:27:37.600 | And 300 years later, another guy came along
00:27:40.240 | whose name was Einstein,
00:27:41.880 | and he was able to explain away all these deviations
00:27:46.720 | from the predictions of the old theory
00:27:51.160 | through a new theory,
00:27:52.920 | which was called the general theory of relativity,
00:27:56.960 | which at first glance looks a little bit more complicated,
00:28:00.760 | and you have to warp space and time,
00:28:02.760 | but you can phrase it within one single sentence,
00:28:05.600 | which is no matter how fast you accelerate
00:28:09.000 | and how fast or hard you decelerate,
00:28:12.920 | and no matter what is the gravity in your local framework,
00:28:17.920 | light speed always looks the same.
00:28:21.360 | And from that, you can calculate all the consequences.
00:28:24.280 | So it's a very simple thing,
00:28:25.760 | and it allows you to further compress all the observations
00:28:30.360 | because certainly there are hardly any deviations any longer
00:28:35.360 | that you can measure from the predictions
00:28:37.800 | of this new theory.
00:28:40.080 | So all of science is a history of compression progress.
00:28:44.560 | You never arrive immediately
00:28:47.000 | at the shortest explanation of the data,
00:28:50.800 | but you're making progress.
00:28:52.560 | Whenever you are making progress,
00:28:54.360 | you have an insight.
00:28:56.320 | You see, oh, first I needed so many bits of information
00:28:59.520 | to describe the data, to describe my falling apples,
00:29:02.840 | my video of falling apples.
00:29:04.200 | I need so many data.
00:29:05.800 | So many pixels have to be stored.
00:29:08.200 | But then suddenly I realized, no,
00:29:10.120 | there is a very simple way of predicting the third frame
00:29:13.920 | in the video from the first two.
00:29:16.080 | And maybe not every little detail can be predicted,
00:29:20.120 | but more or less, most of these orange blobs
00:29:22.680 | that are coming down, they accelerate in the same way,
00:29:25.880 | which means that I can greatly compress the video.
00:29:28.640 | And the amount of compression progress,
00:29:33.320 | that is the depth of the insight
00:29:35.680 | that you have at that moment.
00:29:37.360 | That's the fun that you have, the scientific fun,
00:29:40.280 | the fun in that discovery.
00:29:43.040 | And we can build artificial systems that do the same thing.
00:29:46.080 | They measure the depth of their insights
00:29:49.080 | as they are looking at the data,
00:29:50.800 | which is coming in through their own experiments.
00:29:53.520 | And we give them a reward, an intrinsic reward,
00:29:57.120 | in proportion to this depth of insight.
00:30:01.200 | And since they are trying to maximize the rewards they get,
00:30:08.160 | they are suddenly motivated to come up with new action
00:30:11.160 | sequences, with new experiments that have the property
00:30:15.760 | that the data that is coming in as a consequence
00:30:18.800 | of these experiments has the property
00:30:21.800 | that they can learn something about,
00:30:24.040 | see a pattern in there which they hadn't seen yet before.
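
A minimal sketch of that intrinsic-reward idea, with an illustrative linear world model standing in for the predictor: the curiosity reward is the drop in prediction error after learning on newly gathered data, and it shrinks as the regularity becomes familiar.

```python
# Hedged sketch of curiosity as compression/prediction progress.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                                   # tiny linear world model

def prediction_error(w, x, y):
    return float(np.mean((x @ w - y) ** 2))

def learn(w, x, y, lr=0.05, steps=50):
    for _ in range(steps):
        w = w - lr * x.T @ (x @ w - y) / len(y)
    return w

for episode in range(5):
    x = rng.standard_normal((32, 3))              # data from a new "experiment"
    y = x @ np.array([1.0, -2.0, 0.5])            # hidden regularity to discover
    before = prediction_error(w, x, y)
    w = learn(w, x, y)
    after = prediction_error(w, x, y)
    intrinsic_reward = before - after             # "depth of the insight"
    print(f"episode {episode}: intrinsic reward = {intrinsic_reward:.3f}")
# The reward shrinks over episodes: once the regularity is understood,
# it is no longer interesting.
```
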
00:30:28.760 | - So there's an idea of power play that you've described,
00:30:32.080 | a training in general problem solver in this kind of way
00:30:35.320 | of looking for the unsolved problems.
00:30:37.360 | - Yeah.
00:30:38.200 | - Can you describe that idea a little further?
00:30:40.440 | - It's another very simple idea.
00:30:42.440 | So normally what you do in computer science,
00:30:44.840 | you have some guy who gives you a problem,
00:30:50.280 | and then there is a huge search space of potential solution
00:30:55.920 | candidates.
00:30:56.880 | And you somehow try them out, and you
00:31:00.680 | have more or less sophisticated ways
00:31:02.680 | of moving around in that search space
00:31:06.960 | until you finally found a solution which
00:31:10.320 | you consider satisfactory.
00:31:13.040 | That's what most of computer science is about.
00:31:15.840 | Power play just goes one little step further and says,
00:31:20.000 | let's not only search for solutions to a given problem,
00:31:24.640 | but let's search to pairs of problems and their solutions
00:31:30.600 | where the system itself has the opportunity
00:31:33.160 | to phrase its own problem.
00:31:37.320 | So we are looking suddenly at pairs of problems
00:31:41.000 | and their solutions or modifications of the problem
00:31:45.480 | solver that is supposed to generate
00:31:47.640 | a solution to that new problem.
00:31:51.000 | And this additional degree of freedom
00:31:56.800 | allows us to build curious systems that
00:32:00.440 | are like scientists in the sense that they not only try
00:32:04.360 | to solve and try to find answers to existing questions,
00:32:08.200 | no, they are also free to pose their own questions.
00:32:13.320 | So if you want to build an artificial scientist,
00:32:15.360 | you have to give it that freedom,
00:32:16.840 | and power play is exactly doing that.
00:32:19.520 | So that's a dimension of freedom that's important to have.
00:32:23.000 | But how hard do you think that--
00:32:27.360 | how multidimensional and difficult
00:32:30.800 | the space of then coming up with your own questions is?
00:32:34.240 | So it's one of the things that as human beings
00:32:36.920 | we consider to be the thing that makes us special,
00:32:39.880 | the intelligence that makes us special,
00:32:42.200 | is that brilliant insight that can create something totally new.
00:32:48.800 | So now let's look at the extreme case.
00:32:51.280 | Let's look at the set of all possible problems
00:32:55.560 | that you can formally describe, which is infinite, which
00:33:01.160 | should be the next problem that a scientist or a power play
00:33:06.520 | is going to solve.
00:33:08.120 | Well, it should be the easiest problem that goes
00:33:14.480 | beyond what you already know.
00:33:17.560 | So it should be the simplest problem
00:33:20.720 | that the current problem solver that you have,
00:33:23.040 | which can already solve 100 problems,
00:33:26.600 | that he cannot solve yet by just generalizing.
00:33:31.080 | So it has to be new.
00:33:32.400 | So it has to require a modification of the problem
00:33:35.120 | solver such that the new problem solver can
00:33:37.400 | solve this new thing, but the old problem solver cannot do it.
00:33:41.560 | And in addition to that, we have to make sure
00:33:45.760 | that the problem solver doesn't forget
00:33:48.720 | any of the previous solutions.
00:33:50.200 | Right.
00:33:51.200 | And so by definition, power play is now
00:33:53.720 | trying always to search in this pair of--
00:33:57.480 | in the set of pairs of problems and problem solver
00:34:01.600 | modifications for a combination that minimize the time
00:34:06.600 | to achieve these criteria.
00:34:08.160 | So it's always trying to find the problem which is easiest
00:34:12.160 | to add to the repertoire.
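
A deliberately simplified schematic of that loop (the problems, the solver, and the costs are all toy stand-ins, not the actual PowerPlay formulation): search over pairs of a new problem and a solver modification, and accept the cheapest pair that solves something new without breaking anything already in the repertoire.

```python
# Hedged, highly simplified schematic of a PowerPlay-style step.
# "Problems" are target integers; the "solver" can handle any target up to
# its skill level; a "modification" raises the skill at a cost.
def solves(skill, problem):
    return problem <= skill

def power_play_step(skill, repertoire, candidate_problems):
    best = None
    for problem in candidate_problems:
        if solves(skill, problem):
            continue                              # not new: already solvable
        new_skill = problem                       # minimal modification that works
        cost = new_skill - skill                  # search/training effort
        keeps_old = all(solves(new_skill, p) for p in repertoire)
        if keeps_old and (best is None or cost < best[0]):
            best = (cost, problem, new_skill)     # easiest problem to add
    return best

skill, repertoire = 3, [1, 2, 3]
for _ in range(4):
    cost, problem, skill = power_play_step(skill, repertoire, range(1, 20))
    repertoire.append(problem)
    print(f"invented and solved problem {problem} (cost {cost})")
```
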
00:34:14.720 | So just like grad students and academics and researchers
00:34:18.880 | can spend their whole career in a local minima,
00:34:22.200 | stuck trying to come up with interesting questions,
00:34:25.860 | but ultimately doing very little,
00:34:27.600 | do you think it's easy in this approach of looking
00:34:31.880 | for the simplest unsolvable problem
00:34:33.760 | to get stuck in a local minima?
00:34:35.520 | Is never really discovering new--
00:34:40.560 | really jumping outside of the 100 problems
00:34:42.600 | that you've already solved in a genuine creative way?
00:34:47.600 | No, because that's the nature of power play,
00:34:49.920 | that it's always trying to break its current generalization
00:34:53.960 | abilities by coming up with a new problem which
00:34:58.120 | is beyond the current horizon.
00:35:00.920 | Just shifting the horizon of knowledge a little bit
00:35:04.440 | out there, breaking the existing rules,
00:35:08.040 | such that the new thing becomes solvable,
00:35:10.960 | but wasn't solvable by the old thing.
00:35:13.280 | So like adding a new axiom, like what
00:35:16.400 | Gödel did when he came up with these new sentences,
00:35:20.640 | new theorems that didn't have a proof in the formal system,
00:35:23.840 | which means you can add them to the repertoire,
00:35:27.680 | hoping that they are not going to damage the consistency
00:35:33.440 | of the whole thing.
00:35:35.880 | So in the paper with the amazing title,
00:35:39.640 | Formal Theory of Creativity, Fun and Intrinsic Motivation,
00:35:44.320 | you talk about discovery as intrinsic reward.
00:35:47.720 | So if you view humans as intelligent agents,
00:35:51.640 | what do you think is the purpose and meaning of life
00:35:54.880 | for us humans?
00:35:56.880 | You've talked about this discovery.
00:35:58.800 | Do you see humans as an instance of power play, agents?
00:36:04.200 | Yeah, so humans are curious, and that
00:36:09.160 | means they behave like scientists,
00:36:11.520 | not only the official scientists,
00:36:13.160 | but even the babies behave like scientists,
00:36:15.800 | and they play around with their toys
00:36:18.280 | to figure out how the world works
00:36:20.040 | and how it is responding to their actions.
00:36:23.520 | And that's how they learn about gravity and everything.
00:36:27.320 | And yeah, in 1990, we had the first systems like that,
00:36:30.960 | which just tried to play around with the environment
00:36:34.200 | and come up with situations that go beyond what
00:36:38.280 | they knew at that time, and then get
00:36:40.720 | a reward for creating these situations,
00:36:42.720 | and then becoming more general problem solvers
00:36:45.800 | and being able to understand more of the world.
00:36:48.960 | So yeah, I think in principle, that curiosity strategy
00:36:59.920 | or more sophisticated versions of what I just described,
00:37:03.240 | they are what we have built in as well,
00:37:06.480 | because evolution discovered that's a good way of exploring
00:37:10.200 | the unknown world.
00:37:11.440 | And a guy who explores the unknown world
00:37:13.440 | has a higher chance of solving problems
00:37:17.000 | that he needs to survive in this world.
00:37:19.480 | On the other hand, those guys who were too curious,
00:37:23.760 | they were weeded out as well.
00:37:25.360 | So you have to find this trade-off.
00:37:27.200 | Evolution found a certain trade-off.
00:37:29.240 | Apparently, in our society, there
00:37:31.960 | is a certain percentage of extremely explorative guys.
00:37:36.120 | And it doesn't matter if they die,
00:37:38.040 | because many of the others are more conservative.
00:37:42.280 | And so yeah, it would be surprising to me
00:37:46.480 | if that principle of artificial curiosity
00:37:54.440 | wouldn't be present in almost exactly the same form here.
00:37:58.400 | In our brains.
00:37:59.840 | So you're a bit of a musician and an artist.
00:38:03.080 | So continuing on this topic of creativity,
00:38:07.600 | what do you think is the role of creativity in intelligence?
00:38:10.440 | So you've kind of implied that it's
00:38:12.360 | essential for intelligence, if you think of intelligence
00:38:17.520 | as a problem-solving system, as ability to solve problems.
00:38:23.280 | But do you think it's essential, this idea of creativity?
00:38:28.920 | We never have a program, a sub-program,
00:38:31.760 | that is called creativity or something.
00:38:34.400 | It's just a side effect of what our problem solvers do.
00:38:37.960 | They are searching a space of problems,
00:38:40.200 | or a space of candidates, of solution candidates,
00:38:44.600 | until they hopefully find a solution to a given problem.
00:38:48.160 | But then there are these two types of creativity.
00:38:50.520 | And both of them are now present in our machines.
00:38:54.280 | The first one has been around for a long time,
00:38:56.480 | which is human gives problem to machine,
00:38:59.640 | machine tries to find a solution to that.
00:39:03.360 | And this has been happening for many decades.
00:39:05.920 | And for many decades, machines have
00:39:07.400 | found creative solutions to interesting problems,
00:39:11.400 | where humans were not aware of these particularly
00:39:16.560 | creative solutions, but then appreciated
00:39:18.920 | that the machine found that.
00:39:21.760 | The second is the pure creativity.
00:39:23.760 | That I would call, what I just mentioned,
00:39:25.640 | I would call the applied creativity,
00:39:29.040 | like applied art, where somebody tells you,
00:39:31.640 | now make a nice picture of this pope,
00:39:35.160 | and you will get money for that.
00:39:37.240 | So here is the artist, and he makes a convincing picture
00:39:41.240 | of the pope, and the pope likes it and gives him the money.
00:39:45.880 | And then there is the pure creativity,
00:39:48.720 | which is more like the power play
00:39:50.440 | and the artificial curiosity thing, where
00:39:52.760 | you have the freedom to select your own problem,
00:39:57.040 | like a scientist who defines his own question to study.
00:40:03.400 | And so that is the pure creativity, if you will,
00:40:07.960 | as opposed to the applied creativity,
00:40:11.400 | which serves another.
00:40:14.360 | - And in that distinction, there's
00:40:15.720 | almost echoes of narrow AI versus general AI.
00:40:19.160 | So this kind of constrained painting of a pope
00:40:22.720 | seems like the approaches of what people are calling
00:40:28.440 | narrow AI.
00:40:29.760 | And pure creativity seems to be--
00:40:33.360 | maybe I'm just biased as a human,
00:40:35.000 | but it seems to be an essential element
00:40:38.440 | of human-level intelligence.
00:40:41.120 | Is that what you're implying, to a degree?
00:40:46.040 | - If you zoom back a little bit, and you just
00:40:48.520 | look at a general problem-solving machine, which
00:40:51.480 | is trying to solve arbitrary problems,
00:40:53.600 | then this machine will figure out
00:40:56.320 | in the course of solving problems
00:40:58.240 | that it's good to be curious.
00:41:00.160 | So all of what I said just now about this pre-wired curiosity
00:41:05.320 | and this will to invent new problems that the system
00:41:09.040 | doesn't know how to solve yet should be just a byproduct
00:41:13.280 | of the general search.
00:41:15.040 | However, apparently, evolution has built it into us,
00:41:21.840 | because it turned out to be so successful,
00:41:25.080 | pre-wiring, a bias, a very successful exploratory bias
00:41:30.440 | that we are born with.
00:41:33.880 | - And you've also said that consciousness
00:41:35.680 | in the same kind of way may be a byproduct of problem solving.
00:41:41.200 | Do you think-- do you find this an interesting byproduct?
00:41:44.720 | Do you think it's a useful byproduct?
00:41:47.040 | What are your thoughts on consciousness in general?
00:41:49.320 | Or is it simply a byproduct of greater and greater
00:41:53.120 | capabilities of problem solving that's similar to creativity
00:41:59.160 | in that sense?
00:42:00.920 | - Yeah, we never have a procedure called consciousness
00:42:04.200 | in our machines.
00:42:05.320 | However, we get as side effects of what
00:42:08.720 | these machines are doing things that
00:42:11.880 | seem to be closely related to what people call consciousness.
00:42:16.720 | So for example, already in 1990, we
00:42:19.880 | had simple systems, which were basically recurrent networks,
00:42:24.160 | and therefore universal computers,
00:42:26.200 | trying to map incoming data into actions that lead to success.
00:42:33.960 | Maximizing reward in a given environment,
00:42:36.720 | always finding the charging station in time
00:42:40.400 | whenever the battery is low and negative signals are coming
00:42:42.720 | from the battery, always find the charging station in time
00:42:47.240 | without bumping against painful obstacles on the way.
00:42:50.520 | So complicated things, but very easily motivated.
00:42:54.720 | And then we give these little guys
00:42:59.200 | a separate recurrent neural network, which
00:43:01.920 | is just predicting what's happening
00:43:03.680 | if I do that and that.
00:43:04.800 | What will happen as a consequence of these actions
00:43:08.240 | that I'm executing?
00:43:09.360 | And it's just trained on the long and long history
00:43:11.600 | of interactions with the world.
00:43:14.040 | So it becomes a predictive model of the world, basically.
00:43:18.120 | And therefore, also a compressor of the observations
00:43:22.720 | of the world, because whatever you can predict,
00:43:25.280 | you don't have to store extra.
00:43:26.560 | So compression is a side effect of prediction.
00:43:30.600 | And how does this recurrent network compress?
00:43:33.240 | Well, it's inventing little subprograms,
00:43:35.720 | little subnetworks that stand for everything that frequently
00:43:39.960 | appears in the environment, like bottles and microphones
00:43:43.840 | and faces, maybe lots of faces in my environment.
00:43:48.120 | So I'm learning to create something like a prototype
00:43:51.000 | face, and a new face comes along,
00:43:52.920 | and all I have to encode are the deviations from the prototype.
00:43:56.360 | So it's compressing all the time the stuff that frequently
00:43:59.320 | appears.
00:44:00.880 | There's one thing that appears all the time, that
00:44:05.200 | is present all the time when the agent is interacting
00:44:08.000 | with its environment, which is the agent itself.
00:44:11.840 | So just for data compression reasons,
00:44:14.520 | it is extremely natural for this recurrent network
00:44:18.640 | to come up with little subnetworks that
00:44:21.080 | stand for the properties of the agents, the hand,
00:44:26.160 | the other actuators, and all the stuff
00:44:29.080 | that you need to better encode the data which is influenced
00:44:32.560 | by the actions of the agent.
00:44:34.360 | So there, just as a side effect of data compression
00:44:39.040 | during problem solving, you have internal self-models.
00:44:45.800 | Now, you can use this model of the world to plan your future,
00:44:51.360 | and that's what you also have done since 1990.
00:44:54.040 | So the recurrent network, which is the controller, which
00:44:57.880 | is trying to maximize reward, can
00:45:00.040 | use this model of the network--
00:45:01.840 | of the world, this model network of the world,
00:45:04.160 | this predictive model of the world,
00:45:05.620 | to plan ahead and say, let's not do this action sequence.
00:45:09.060 | Let's do this action sequence instead,
00:45:11.340 | because it leads to more predicted reward.
00:45:14.580 | And whenever it's waking up these little subnetworks that
00:45:18.900 | stand for itself, then it's thinking about itself.
00:45:22.220 | Then it's thinking about itself, and it's exploring mentally
00:45:27.700 | the consequences of its own actions.
00:45:30.940 | And now you tell me what is still missing.
00:45:37.220 | Missing the next-- the gap to consciousness.
00:45:39.660 | Yeah.
00:45:40.380 | There isn't.
00:45:41.060 | That's a really beautiful idea that if life
00:45:45.220 | is a collection of data, and life
00:45:47.300 | is a process of compressing that data to act efficiently,
00:45:54.140 | in that data, you yourself appear very often.
00:45:57.640 | So it's useful to form compressions of yourself.
00:46:00.900 | It's a really beautiful formulation
00:46:02.740 | of what consciousness is as a necessary side effect.
00:46:05.700 | It's actually quite compelling to me.
00:46:11.420 | You've described RNNs, developed LSTMs,
00:46:16.500 | long short-term memory networks, that are a type
00:46:20.340 | of recurrent neural networks.
00:46:22.140 | They've gotten a lot of success recently.
00:46:23.940 | So these are networks that model the temporal aspects
00:46:28.540 | in the data, temporal patterns in the data.
00:46:31.140 | And you've called them the deepest
00:46:33.860 | of the neural networks, right?
00:46:36.260 | So what do you think is the value of depth in the models
00:46:39.780 | that we use to learn?
00:46:41.220 | Since you mentioned the long short-term memory and the LSTM,
00:46:48.340 | I have to mention the names of the brilliant students who
00:46:51.500 | made that possible.
00:46:52.340 | Yes, of course.
00:46:53.380 | First of all, my first student ever,
00:46:55.220 | Sepp Hochreiter, who had fundamental insights already
00:46:58.420 | in his diploma thesis.
00:47:00.260 | Then Felix Gers, who had additional important
00:47:03.660 | contributions.
00:47:04.620 | Alex Graves is a guy from Scotland who
00:47:08.100 | is mostly responsible for this CTC algorithm, which
00:47:11.420 | is now often used to train the LSTM to do the speech
00:47:15.620 | recognition on all the Google, Android phones, and whatever,
00:47:19.300 | and Siri, and so on.
00:47:21.540 | So these guys, without these guys, I would be nothing.
00:47:26.820 | That's a lot of incredible work.
00:47:29.260 | What is now the depth?
00:47:30.540 | What is the importance of depth?
00:47:32.500 | Well, most problems in the real world
00:47:36.220 | are deep in the sense that the current input doesn't tell you
00:47:41.060 | all you need to know about the environment.
00:47:45.460 | So instead, you have to have a memory
00:47:48.460 | of what happened in the past.
00:47:49.780 | And often, important parts of that memory are dated.
00:47:54.820 | They are pretty old.
00:47:56.460 | So when you're doing speech recognition, for example,
00:48:00.340 | and somebody says 11, then that's
00:48:04.820 | about half a second or something like that, which
00:48:09.380 | means it's already 50 time steps.
00:48:12.020 | And another guy, or the same guy, says 7.
00:48:16.260 | So the ending is the same, "ven."
00:48:18.660 | But now the system has to see the distinction between 7
00:48:22.220 | and 11.
00:48:23.300 | And the only way it can see the difference
00:48:25.020 | is it has to store that 50 steps ago, there
00:48:29.980 | was an s or an l, a 7 or an 11.
00:48:34.940 | So there you have already a problem of depth 50,
00:48:38.100 | because for each time step, you have something
00:48:41.320 | like a virtual layer in the expanded unrolled version
00:48:44.900 | of this recurrent network, which is doing the speech recognition.
00:48:48.380 | So these long time lags, they translate into problem depth.
00:48:53.820 | And most problems in this world are such
00:48:57.780 | that you really have to look far back in time
00:49:01.620 | to understand what is the problem and to solve it.
00:49:06.180 | But just like with LSTMs, you don't necessarily
00:49:09.100 | need to, when you look back in time, remember every aspect.
00:49:12.340 | You just need to remember the important aspects.
00:49:14.820 | That's right.
00:49:15.500 | The network has to learn to put the important stuff
00:49:18.580 | into memory and to ignore the unimportant noise.
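
A toy version of that seven/eleven example, assuming PyTorch is available (the task, sizes, and training schedule are all illustrative): a small LSTM is trained to report, after 50 noisy steps, which token appeared at the very first step, i.e. to keep one important bit in memory and ignore the noise.

```python
# Hedged sketch: an LSTM bridging a 50-step time lag on a synthetic task.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, batch = 50, 64

def make_batch():
    x = torch.randn(batch, T, 1) * 0.1           # unimportant noise
    label = torch.randint(0, 2, (batch,))        # the one important bit
    x[:, 0, 0] = label.float() * 2 - 1           # encoded in the first step only
    return x, label

lstm = nn.LSTM(1, 16, batch_first=True)
head = nn.Linear(16, 2)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-2)

for step in range(500):
    x, label = make_batch()
    out, _ = lstm(x)
    logits = head(out[:, -1])                    # decision only at the last step
    loss = nn.functional.cross_entropy(logits, label)
    opt.zero_grad()
    loss.backward()
    opt.step()

x, label = make_batch()
pred = head(lstm(x)[0][:, -1]).argmax(dim=1)
print("accuracy:", (pred == label).float().mean().item())
```
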
00:49:24.180 | But in that sense, deeper and deeper is better?
00:49:28.540 | Or is there a limitation?
00:49:30.980 | I mean, LSTM is one of the great examples of architectures
00:49:36.540 | that do something beyond just deeper and deeper networks.
00:49:42.380 | There's clever mechanisms for filtering data,
00:49:45.500 | for remembering and forgetting.
00:49:47.860 | So do you think that kind of thinking is necessary?
00:49:51.340 | If you think about LSTMs as a leap, a big leap forward
00:49:54.500 | over traditional vanilla RNNs, what
00:49:57.820 | do you think is the next leap within this context?
00:50:02.900 | So LSTM was a very clever improvement,
00:50:06.060 | but LSTMs still don't have the same kind of ability
00:50:10.420 | to see far back in the past as us humans do.
00:50:14.740 | The credit assignment problem across way back,
00:50:19.060 | not just 50 time steps or 100 or 1,000,
00:50:21.900 | but millions and billions.
00:50:24.540 | It's not clear what are the practical limits of the LSTM
00:50:29.060 | when it comes to looking back.
00:50:31.140 | Already in 2006, I think, we had examples
00:50:35.100 | where it not only looked back tens of thousands of steps,
00:50:37.900 | but really millions of steps.
00:50:40.740 | And Juan Pérez-Ortiz in my lab, I
00:50:44.820 | think was the first author of a paper where we really--
00:50:48.660 | was it 2006 or something--
00:50:50.460 | had examples where it learned to look back
00:50:53.620 | for more than 10 million steps.
00:50:57.500 | So for most problems of speech recognition,
00:51:02.060 | it's not necessary to look that far back.
00:51:04.620 | But there are examples where it does.
00:51:06.900 | Now, the looking back thing, that's
00:51:10.340 | rather easy because there is only one past.
00:51:14.460 | But there are many possible futures.
00:51:17.780 | And so a reinforcement learning system,
00:51:20.180 | which is trying to maximize its future expected reward
00:51:24.260 | and doesn't know yet which of these many possible futures
00:51:27.580 | should I select, given there's one single past,
00:51:31.620 | is facing problems that the LSTM by itself cannot solve.
00:51:36.540 | So the LSTM is good for coming up
00:51:38.900 | with a compact representation of the history so far,
00:51:42.380 | of the history of observations and actions so far.
00:51:46.380 | But now, how do you plan in an efficient and good way
00:51:51.420 | among all these--
00:51:54.340 | how do you select one of these many possible action sequences
00:51:58.140 | that a reinforcement learning system
00:51:59.860 | has to consider to maximize reward in this unknown future?
00:52:05.700 | So again, we have this basic setup
00:52:08.700 | where you have one recurrent network, which
00:52:12.820 | gets in the video and the speech and whatever,
00:52:15.940 | and is executing actions, and is trying to maximize reward.
00:52:19.660 | So there is no teacher who tells it
00:52:22.060 | what to do at which point in time.
00:52:24.460 | And then there's the other network,
00:52:26.100 | which is just predicting what's going
00:52:30.500 | to happen if I do that and that.
00:52:32.980 | And that could be an LSTM network.
00:52:35.260 | And it learns to look back all the way
00:52:38.460 | to make better predictions of the next time step.
00:52:41.620 | So essentially, although it's predicting only the next time
00:52:45.020 | step, it is motivated to learn to put into memory something
00:52:50.540 | that happened maybe a million steps ago,
00:52:52.420 | because it's important to memorize that if you
00:52:54.980 | want to predict that at the next time step, the next event.
00:52:59.620 | Now, how can a model of the world
00:53:03.300 | like that, a predictive model of the world,
00:53:05.500 | be used by the first guy?
00:53:07.940 | Let's call it the controller and the model,
00:53:10.180 | the controller and the model.
00:53:11.500 | How can the model be used by the controller
00:53:14.340 | to efficiently select among these many possible futures?
00:53:19.580 | So the naive way we had about 30 years ago
00:53:23.100 | was let's just use the model of the world as a stand-in,
00:53:27.620 | as a simulation of the world.
00:53:29.340 | And millisecond by millisecond, we plan the future.
00:53:32.420 | And that means we have to roll it out really in detail.
00:53:36.260 | And it will work only if the model is really good.
00:53:38.500 | And it will still be inefficient,
00:53:40.380 | because we have to look at all these possible futures.
00:53:42.940 | And there are so many of them.
00:53:45.740 | So instead, what we do now, since 2015,
00:53:49.500 | in our CM systems, controller model systems,
00:53:52.180 | we give the controller the opportunity
00:53:55.500 | to learn by itself how to use the potentially relevant parts
00:54:00.620 | of the model network to solve new problems more quickly.
00:54:06.300 | And if it wants to, it can learn to ignore the M.
00:54:10.100 | And sometimes it's a good idea to ignore the M,
00:54:12.780 | because it's really bad.
00:54:14.500 | It's a bad predictor in this particular situation of life
00:54:19.220 | where the controller is currently
00:54:20.700 | trying to maximize reward.
00:54:23.100 | However, it can also learn to address and exploit
00:54:27.100 | some of the subprograms that came about in the model
00:54:31.980 | network through compressing the data by predicting it.
00:54:36.300 | So it now has an opportunity to reuse
00:54:40.220 | that code, the algorithmic information in the model
00:54:43.540 | network, to reduce its own search space,
00:54:48.180 | such that it can solve a new problem more quickly than
00:54:51.900 | without the model.
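A rough sketch of that controller-model wiring (again invented for illustration, and simplified: here "using the model" just means the controller reads the model network's hidden activations through a gate it could learn to open or close):

import numpy as np

rng = np.random.default_rng(1)
OBS, ACT, H_M, H_C = 8, 3, 16, 16

# M: recurrent world model, trained elsewhere to predict the next observation.
Wm_in   = rng.normal(0, 0.1, (H_M, OBS + ACT))
Wm_rec  = rng.normal(0, 0.1, (H_M, H_M))
Wm_pred = rng.normal(0, 0.1, (OBS, H_M))

# C: the controller. It sees the observation plus a gated view of M's hidden state,
# so it can learn to exploit M's "subprograms" or to ignore M where M predicts badly.
Wc_in  = rng.normal(0, 0.1, (H_C, OBS + H_M))
Wc_rec = rng.normal(0, 0.1, (H_C, H_C))
Wc_act = rng.normal(0, 0.1, (ACT, H_C))
gate   = np.full(H_M, 0.5)                    # fixed here; learned in a real system

def m_step(h_m, obs, act_onehot):
    h_m = np.tanh(Wm_in @ np.concatenate([obs, act_onehot]) + Wm_rec @ h_m)
    return h_m, Wm_pred @ h_m                 # prediction of the next observation

def c_step(h_c, obs, h_m):
    x = np.concatenate([obs, gate * h_m])     # gated access to the model's internal state
    h_c = np.tanh(Wc_in @ x + Wc_rec @ h_c)
    return h_c, int(np.argmax(Wc_act @ h_c))

h_m, h_c, prev_a = np.zeros(H_M), np.zeros(H_C), np.zeros(ACT)
for t in range(5):
    obs = rng.normal(size=OBS)                # stand-in sensory input
    h_m, _pred = m_step(h_m, obs, prev_a)
    h_c, a = c_step(h_c, obs, h_m)
    prev_a = np.eye(ACT)[a]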
00:54:53.820 | Compression-- so you're ultimately
00:54:57.780 | optimistic and excited about the power of RL,
00:55:01.180 | of reinforcement learning, in the context of real systems?
00:55:05.500 | Absolutely, yeah.
00:55:07.180 | So you see RL as potentially having a huge impact,
00:55:11.660 | beyond just the M part, which is often developed
00:55:15.980 | with supervised learning methods.
00:55:19.940 | You see RL as--
00:55:23.900 | for problems of self-driving cars
00:55:25.740 | or any kind of applied side of robotics--
00:55:28.980 | the correct, interesting direction
00:55:32.540 | for research, in your view?
00:55:34.660 | I do think so.
00:55:35.580 | We have a company called NNAISENSE--
00:55:37.340 | NNAISENSE.
00:55:37.940 | --which has applied reinforcement learning
00:55:41.020 | to little Audis--
00:55:44.060 | Little Audis.
00:55:45.100 | --which learn to park without a teacher.
00:55:48.180 | The same principles were used, of course.
00:55:51.500 | So these little Audis, they are small, maybe like that,
00:55:55.020 | so much smaller than the real Audis.
00:55:57.740 | But they have all the sensors that you
00:55:59.860 | find in the real Audis.
00:56:01.140 | You find the cameras, the LIDAR sensors.
00:56:03.820 | They go up to 120 kilometers an hour if they want to.
00:56:09.020 | And they have pain sensors, basically.
00:56:12.460 | And they don't want to bump against obstacles
00:56:15.220 | and other Audis.
00:56:17.140 | And so they must learn, like little babies, to park.
00:56:22.660 | Take the raw vision input and translate that
00:56:25.340 | into actions that lead to successful parking behavior,
00:56:29.500 | which is a rewarding thing.
00:56:30.740 | And yes, they learn that.
00:56:32.180 | They learn.
00:56:33.100 | So we have examples like that.
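A hypothetical version of the reward structure implied here (the numbers and the collision/parked signals are invented, not NNAISENSE's actual setup) would be sparse: pain on collisions, a bonus for a successful park, and little in between, so the cars must discover the behavior themselves.

def parking_reward(collision: bool, parked: bool, step_cost: float = 0.01) -> float:
    """Hypothetical sparse reward for the parking task."""
    if collision:
        return -1.0          # the "pain sensor" signal
    if parked:
        return +1.0          # successful parking is the rewarding event
    return -step_cost        # mild per-step pressure to finish quickly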
00:56:35.100 | And it's only in the beginning.
00:56:37.580 | This is just the tip of the iceberg.
00:56:39.140 | And I believe the next wave of AI
00:56:42.860 | is going to be all about that.
00:56:45.420 | So at the moment, the current wave of AI
00:56:47.580 | is about passive pattern observation and prediction.
00:56:52.340 | And that's what you have on your smartphone
00:56:55.780 | and what the major companies on the Pacific Rim
00:56:58.140 | are using to sell you ads, to do marketing.
00:57:02.340 | That's the current source of profit in AI.
00:57:05.620 | And that's only 1% or 2% of the world economy,
00:57:10.620 | which is big enough to make these companies pretty much
00:57:13.020 | the most valuable companies in the world.
00:57:15.500 | But there's a much, much bigger fraction
00:57:19.300 | of the economy going to be affected
00:57:20.940 | by the next wave, which is really
00:57:22.420 | about machines that shape the data through their own actions.
00:57:28.500 | Do you think simulation is ultimately the biggest way
00:57:33.180 | that those methods will be successful in the next 10,
00:57:36.180 | 20 years?
00:57:36.820 | We're not talking about 100 years from now.
00:57:38.820 | We're talking about the near-term impact of RL.
00:57:42.620 | Do you think really good simulation is required?
00:57:45.260 | Or is there other techniques, like imitation learning,
00:57:49.220 | observing other humans operating in the real world?
00:57:53.660 | Where do you think the success will come from?
00:57:57.740 | So at the moment, we have a tendency
00:57:59.420 | of using physics simulations to learn behavior
00:58:05.980 | for machines that learn to solve problems that humans also do
00:58:11.500 | not know how to solve.
00:58:13.980 | However, this is not the future, because the future
00:58:16.700 | is in what little babies do.
00:58:19.580 | They don't use a physics engine to simulate the world.
00:58:22.300 | No, they learn a predictive model
00:58:24.660 | of the world, which maybe sometimes is wrong in many ways,
00:58:30.100 | but captures all kinds of important abstract high-level
00:58:34.020 | predictions, which are really important to be successful.
00:58:38.460 | And that's what was the future 30 years ago, when we started
00:58:43.380 | that type of research.
00:58:44.340 | But it's still the future.
00:58:45.460 | And now we know much better how to go there, to move forward,
00:58:51.300 | and to really make working systems based on that, where
00:58:54.980 | you have a learning model of the world, a model of the world
00:58:58.260 | that learns to predict what's going to happen
00:59:00.420 | if I do that and that.
00:59:01.820 | And then the controller uses that model
00:59:06.660 | to more quickly learn successful action sequences.
00:59:11.820 | And then, of course, always this curiosity thing.
00:59:13.900 | In the beginning, the model is stupid.
00:59:15.480 | So the controller should be motivated
00:59:17.780 | to come up with experiments with action sequences
00:59:20.900 | that lead to data that improve the model.
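One way to make that curiosity drive concrete (a sketch of the general idea, not necessarily the exact formulation in his papers) is to pay the controller an intrinsic reward equal to how much a piece of data improves the model, so that both already-predictable and hopelessly random experiences quickly become boring.

import numpy as np

rng = np.random.default_rng(2)
W, LR = rng.normal(0, 0.1, (4, 4)), 0.05      # toy linear world model and learning rate

def model_error(obs, nxt):
    return float(np.mean((W @ obs - nxt) ** 2))

def train_model(obs, nxt):
    """One gradient step on the squared prediction error; returns the new error."""
    global W
    W = W - LR * 2 * np.outer(W @ obs - nxt, obs) / obs.size
    return model_error(obs, nxt)

def intrinsic_reward(obs, nxt):
    """Curiosity reward = learning progress of the model on this transition."""
    before = model_error(obs, nxt)
    after = train_model(obs, nxt)
    return before - after                     # added to the external reward by the controller

obs, nxt = rng.normal(size=4), rng.normal(size=4)
print(intrinsic_reward(obs, nxt))             # positive while the model is still improving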
00:59:24.260 | Do you think improving the model,
00:59:27.020 | constructing an understanding of the world
00:59:30.340 | in this connectionist way, grounded in ideas of neural networks,
00:59:34.260 | is the approach that is now popular and has been successful?
00:59:38.660 | But in the '80s, with expert systems,
00:59:40.660 | there were symbolic AI approaches, which to us humans
00:59:44.980 | are more intuitive, in the sense that it makes sense
00:59:49.220 | that you build up knowledge in this kind of knowledge representation.
00:59:52.540 | What kind of lessons can we draw into our current approaches
00:59:57.460 | from expert systems, from symbolic AI?
01:00:00.580 | So I became aware of all of that.
01:00:03.660 | In the '80s and back then, logic programming
01:00:07.940 | was a huge thing.
01:00:09.020 | Was it inspiring to you yourself?
01:00:10.860 | Did you find it compelling?
01:00:12.180 | Because a lot of your work was not so much in that realm,
01:00:16.620 | right, it was more in the learning systems.
01:00:18.580 | Yes and no, but we did all of that.
01:00:20.860 | So my first publication ever, actually,
01:00:25.860 | in 1987, was the implementation of a genetic algorithm,
01:00:31.620 | of a genetic programming system, in Prolog.
01:00:34.620 | So Prolog, that's what you learned back then,
01:00:37.820 | which is a logic programming language.
01:00:40.180 | And the Japanese, they have this huge fifth generation
01:00:44.420 | AI project, which was mostly about logic programming
01:00:48.340 | back then, although neural networks existed
01:00:51.300 | and were well known back then.
01:00:54.060 | And deep learning has existed since 1965,
01:00:58.140 | since this guy in the Ukraine, Ivakhnenko, started it.
01:01:02.260 | But the Japanese and many other people,
01:01:05.700 | they focused really on this logic programming.
01:01:08.060 | And I was influenced to the extent that I said,
01:01:10.340 | okay, let's take these biologically inspired algorithms
01:01:13.780 | like the evolution of programs,
01:01:16.820 | and implement that in the language which I know,
01:01:22.340 | which was Prolog, for example, back then.
01:01:25.100 | And then, in many ways, this came back later
01:01:29.060 | because the Godel machine, for example,
01:01:31.940 | has a proof search on board.
01:01:33.540 | And without that, it would not be optimal.
01:01:35.900 | Likewise, Markus Hutter's universal algorithm
01:01:38.300 | for solving all well-defined problems
01:01:40.620 | has a proof search on board.
01:01:42.460 | So that's very much logic programming.
01:01:45.420 | Without that, it would not be asymptotically optimal.
01:01:50.300 | But then, on the other hand,
01:01:51.220 | because we have very pragmatic guys also,
01:01:54.340 | we focused on recurrent neural networks
01:01:58.900 | and suboptimal stuff,
01:02:02.740 | such as gradient-based search and program space,
01:02:05.860 | rather than provably optimal things.
01:02:09.100 | - So logic programming certainly
01:02:11.140 | has a usefulness
01:02:13.380 | when you're trying to construct something provably optimal
01:02:16.740 | or provably good or something like that.
01:02:18.980 | But is it useful for practical problems?
01:02:21.980 | - It's really useful for theorem proving.
01:02:24.140 | The best theorem provers today are not neural networks.
01:02:28.020 | No, they are logic programming systems
01:02:30.940 | and they are much better theorem provers
01:02:33.140 | than most math students in the first or second semester.
01:02:37.620 | - But for reasoning, for playing games of Go or chess,
01:02:43.260 | or for robots, autonomous vehicles
01:02:45.700 | that operate in the real world,
01:02:46.940 | or object manipulation, you think learning?
01:02:51.260 | - Yeah, as long as the problems have little to do
01:02:54.340 | with theorem proving themselves--
01:02:58.700 | as long as that is not the case--
01:03:01.700 | you would just want to have better pattern recognition.
01:03:05.300 | So to build a self-driving car,
01:03:06.820 | you want to have better pattern recognition
01:03:09.100 | and pedestrian recognition and all these things.
01:03:13.540 | And you want to minimize the number of false positives,
01:03:19.060 | which is currently slowing down self-driving cars
01:03:21.340 | in many ways.
01:03:22.220 | And all of that has very little to do
01:03:24.980 | with logic programming, yeah.
01:03:27.540 | - What are you most excited about
01:03:31.580 | in terms of directions of artificial intelligence
01:03:34.060 | at this moment in the next few years
01:03:37.100 | in your own research and in the broader community?
01:03:40.020 | - So I think in the not so distant future,
01:03:44.260 | we will have for the first time
01:03:47.420 | little robots that learn like kids.
01:03:49.860 | And I will be able to say to the robot,
01:03:54.220 | "Look here, robot, we are going to assemble a smartphone.
01:03:59.380 | "Let's take this slab of plastic and the screwdriver
01:04:04.020 | "and let's screw in the screw like that.
01:04:07.340 | "No, not like that, like that.
01:04:10.140 | "Not like that, like that."
01:04:12.220 | Like that.
01:04:13.540 | And I don't have a data glove or something.
01:04:16.980 | He will see me and he will hear me
01:04:20.420 | and he will try to do something with his own actuators,
01:04:24.220 | which will be really different from mine,
01:04:26.260 | but he will understand the difference
01:04:28.020 | and will learn to imitate me,
01:04:31.540 | but not in the supervised way
01:04:33.820 | where a teacher is giving target signals
01:04:37.820 | for all his muscles all the time.
01:04:40.140 | No, by doing this high level imitation
01:04:43.060 | where he first has to learn to imitate me
01:04:46.060 | and then to interpret these additional noises
01:04:48.500 | coming from my mouth
01:04:51.500 | as helpful signals to do that better.
01:04:54.660 | And then it will, by itself,
01:04:58.540 | come up with faster ways and more efficient ways
01:05:01.940 | of doing the same thing.
01:05:03.660 | And finally, I stop his learning algorithm
01:05:07.900 | and make a million copies and sell it.
01:05:10.260 | And so at the moment, this is not possible,
01:05:13.740 | but we already see how we are going to get there.
01:05:17.260 | And you can imagine to the extent
01:05:19.220 | that this works economically and cheaply,
01:05:22.060 | it's going to change everything.
01:05:25.140 | Almost all of production is going to be affected by that.
01:05:29.820 | And a much bigger wave,
01:05:32.820 | a much bigger AI wave is coming
01:05:36.340 | than the one that we are currently,
01:05:37.740 | witnessing, which is mostly about passive pattern recognition
01:05:40.740 | on your smartphone.
01:05:42.020 | This is about active machines that shape data
01:05:44.900 | through the actions they are executing.
01:05:48.180 | And they learn to do that in a good way.
01:05:50.260 | So many of the traditional industries
01:05:54.980 | are going to be affected by that.
01:05:56.620 | All the companies that are building machines
01:06:00.140 | will equip these machines with cameras and other sensors,
01:06:05.820 | and they are going to learn to solve all kinds of problems
01:06:10.500 | through interaction with humans,
01:06:12.740 | but also a lot on their own
01:06:15.100 | to improve what they already can do.
01:06:16.940 | And lots of old economy is going to be affected by that.
01:06:23.940 | And in recent years, I have seen that old economy
01:06:27.260 | is actually waking up and realizing that this is the case.
01:06:32.140 | - Are you optimistic about the future?
01:06:33.980 | Are you concerned?
01:06:35.780 | There's a lot of people concerned in the near term
01:06:38.340 | about the transformation of the nature of work.
01:06:42.940 | The kind of ideas that you just suggested
01:06:45.540 | would have a significant impact
01:06:47.300 | of what kind of things could be automated.
01:06:49.260 | Are you optimistic about that future?
01:06:51.940 | Are you nervous about that future?
01:06:54.660 | And looking a little bit farther into the future,
01:06:58.300 | there's people like Elon Musk,
01:07:01.900 | Stuart Russell,
01:07:02.740 | concerned about the existential threats of that future.
01:07:06.660 | So in the near term,
01:07:07.780 | job loss in the long-term existential threat,
01:07:10.780 | are these concerns to you, or are you ultimately optimistic?
01:07:13.780 | - So let's first address the near future.
01:07:19.540 | We have had predictions of job losses for many decades.
01:07:28.100 | For example, when industrial robots came along,
01:07:31.620 | many people predicted that lots of jobs
01:07:35.900 | are going to get lost.
01:07:38.700 | And in a sense, they were right.
01:07:42.540 | Because back then there were car factories
01:07:45.980 | and hundreds of people,
01:07:47.700 | and these factories assembled cars.
01:07:50.780 | And today the same car factories have hundreds of robots
01:07:53.900 | and maybe three guys watching the robots.
01:07:57.060 | It's a very big number.
01:07:58.380 | On the other hand,
01:08:01.900 | those countries that have lots of robots per capita,
01:08:06.140 | Japan, Korea, Germany, Switzerland,
01:08:08.540 | a couple of other countries,
01:08:09.900 | they have really low unemployment rates.
01:08:14.660 | Somehow all kinds of new jobs were created.
01:08:18.220 | Back then, nobody anticipated those jobs.
01:08:24.860 | Decades ago, I always said,
01:08:26.740 | it's really easy to say which jobs are going to get lost,
01:08:31.740 | but it's really hard to predict the new ones.
01:08:34.220 | 30 years ago, who would have predicted all these people
01:08:39.300 | making money as YouTube bloggers, for example?
01:08:43.860 | 200 years ago, 60% of all people
01:08:50.220 | used to work in agriculture.
01:08:53.820 | Today, maybe 1%.
01:08:56.740 | But still, only, I don't know, 5% unemployment.
01:09:01.740 | Lots of new jobs were created.
01:09:03.740 | And Homo Ludens, the playing man,
01:09:06.820 | is inventing new jobs all the time.
01:09:10.540 | Most of these jobs are not existentially necessary
01:09:15.540 | for the survival of our species.
01:09:17.740 | There are only very few existentially necessary jobs,
01:09:22.860 | such as farming and building houses
01:09:25.820 | and warming up the houses,
01:09:28.140 | but less than 10% of the population is doing that.
01:09:31.300 | And most of these newly invented jobs
01:09:33.620 | are about interacting with other people in new ways,
01:09:38.620 | through new media and so on,
01:09:40.900 | getting new types of kudos in the form of likes and whatever,
01:09:46.220 | and even making money through that.
01:09:48.540 | So Homo Ludens, the playing man,
01:09:51.780 | doesn't want to be unemployed,
01:09:53.380 | and that's why he's inventing new jobs all the time.
01:09:57.020 | And he keeps considering these jobs as really important
01:10:01.740 | and is investing a lot of energy and hours of work
01:10:05.540 | into those new jobs.
01:10:08.340 | - That's quite beautifully put.
01:10:10.180 | We're really nervous about the future
01:10:11.980 | because we can't predict
01:10:13.260 | what kind of new jobs will be created.
01:10:15.020 | But you're ultimately optimistic
01:10:18.340 | that we humans are so restless that we create
01:10:22.380 | and give meaning to newer and newer jobs,
01:10:24.980 | totally new things that get likes on Facebook
01:10:29.980 | or whatever the social platform is.
01:10:32.300 | So what about long-term existential threat of AI,
01:10:36.700 | where our whole civilization may be swallowed up
01:10:40.980 | by this ultra super intelligent systems?
01:10:44.460 | - Maybe it's not going to be swallowed up,
01:10:47.460 | but I'd be surprised if we humans were the last step
01:10:52.460 | in the evolution of the universe.
01:10:57.940 | - You've actually had this beautiful comment somewhere
01:11:03.820 | that I've seen, quite insightful,
01:11:07.340 | saying that,
01:11:09.860 | "Artificial general intelligence systems,
01:11:12.020 | "just like us humans,
01:11:13.460 | "will likely not want to interact with humans.
01:11:16.080 | "They'll just interact amongst themselves,
01:11:17.940 | "just like ants interact amongst themselves
01:11:21.460 | "and only tangentially interact with humans."
01:11:25.420 | And it's quite an interesting idea
01:11:27.540 | that once we create AGI,
01:11:28.900 | they will lose interest in humans
01:11:31.420 | and compete for their own Facebook likes
01:11:34.500 | on their own social platforms.
01:11:36.780 | So within that quite elegant idea,
01:11:40.300 | how do we know in a hypothetical sense
01:11:45.140 | that there's not already intelligence systems out there?
01:11:48.840 | How do you think broadly
01:11:50.200 | of general intelligence greater than us?
01:11:54.360 | How would we know it's out there?
01:11:56.600 | How would we know it's around us?
01:11:59.160 | And could it already be?
01:12:00.400 | - I'd be surprised if within the next few decades
01:12:05.320 | or something like that,
01:12:06.480 | we won't have AIs that are truly smart in every single way
01:12:13.100 | and better problem solvers
01:12:14.200 | in almost every single important way.
01:12:16.600 | And I'd be surprised if they wouldn't realize
01:12:23.080 | what we have realized a long time ago,
01:12:24.880 | which is that almost all physical resources
01:12:28.560 | are not here in this biosphere,
01:12:31.000 | but further out,
01:12:32.060 | the rest of the solar system
01:12:36.760 | gets 2 billion times more solar energy
01:12:40.680 | than our little planet.
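That factor of roughly two billion follows from simple geometry: the Sun radiates into a full sphere of radius one astronomical unit, while the Earth intercepts only its own cross-section, so

\[
\frac{P_{\text{Sun}}}{P_{\text{Earth}}}
  = \frac{4\pi d^{2}}{\pi R_{\oplus}^{2}}
  = 4\left(\frac{1.496\times 10^{11}\,\text{m}}{6.371\times 10^{6}\,\text{m}}\right)^{2}
  \approx 2.2\times 10^{9}.
\]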
01:12:43.680 | There's lots of material out there
01:12:45.560 | that you can use to build robots
01:12:47.400 | and self-replicating robot factories and all this stuff.
01:12:50.900 | And they are going to do that.
01:12:53.080 | And they will be scientists and curious,
01:12:56.480 | and they will explore what they can do.
01:12:58.600 | And in the beginning,
01:13:01.100 | they will be fascinated by life
01:13:04.640 | and by their own origins in our civilization.
01:13:07.320 | They will want to understand that completely,
01:13:09.760 | just like people today would like to understand
01:13:12.700 | how life works and also the history
01:13:17.700 | of our own existence and civilization,
01:13:22.640 | but then also the physical laws
01:13:24.400 | that created all of that.
01:13:25.640 | So in the beginning, they will be fascinated by life.
01:13:30.160 | Once they understand it, they lose interest,
01:13:32.820 | like anybody who loses interest in things he understands.
01:13:38.340 | And then, as you said,
01:13:42.180 | the most interesting sources of information for them
01:13:47.180 | will be others of their own kind.
01:13:53.060 | So at least in the long run,
01:14:01.020 | there seems to be some sort of protection
01:14:04.900 | through lack of interest on the other side.
01:14:11.220 | And now it seems also clear,
01:14:14.700 | as far as we understand physics,
01:14:16.700 | you need matter and energy to compute
01:14:20.460 | and to build more robots and infrastructure
01:14:22.620 | and more AI civilization and AI ecologies
01:14:27.620 | consisting of trillions of different types of AIs.
01:14:31.780 | And so it seems inconceivable to me
01:14:34.780 | that this thing is not going to expand.
01:14:37.620 | Some AI ecology not controlled by one AI,
01:14:41.020 | but by trillions of different types of AIs competing
01:14:44.580 | in all kinds of quickly evolving
01:14:47.860 | and disappearing ecological niches
01:14:49.900 | in ways that we cannot fathom at the moment.
01:14:52.500 | But it's going to expand,
01:14:54.700 | limited by light speed and physics,
01:14:57.020 | but it's going to expand.
01:14:58.260 | And now we realize that the universe is still young.
01:15:02.980 | It's only 13.8 billion years old,
01:15:06.180 | and it's going to be 1,000 times older than that.
01:15:10.580 | So there's plenty of time to conquer the entire universe
01:15:15.580 | and to fill it with intelligence
01:15:19.820 | and senders and receivers such that AIs can travel
01:15:23.460 | the way they are traveling in our labs today,
01:15:27.300 | which is by radio from sender to receiver.
01:15:30.040 | And let's call the current age of the universe one eon.
01:15:35.940 | One eon.
01:15:39.580 | Now it will take just a few eons from now
01:15:41.940 | until the entire visible universe
01:15:43.620 | is going to be full of that stuff.
01:15:45.340 | And let's look ahead to a time
01:15:48.980 | when the universe is going to be 1,000 times older
01:15:51.300 | than it is now.
01:15:52.140 | They will look back and they will say,
01:15:54.580 | "Look, almost immediately after the Big Bang,
01:15:57.060 | "only a few eons later,
01:15:59.820 | "the entire universe started to become intelligent."
01:16:02.580 | Now to your question,
01:16:04.580 | how do we see whether anything like that
01:16:08.380 | has already happened or is already in a more advanced stage
01:16:12.580 | in some other part of the universe,
01:16:14.740 | of the visible universe?
01:16:16.660 | We are trying to look out there
01:16:17.740 | and nothing like that has happened so far.
01:16:20.660 | Or is that true?
01:16:22.460 | - Do you think we would recognize it?
01:16:24.340 | How do we know it's not among us?
01:16:25.740 | How do we know planets aren't in themselves
01:16:28.820 | intelligent beings?
01:16:30.580 | How do we know ants, seen as a collective,
01:16:36.540 | are not much greater intelligence than our own?
01:16:40.260 | These kinds of ideas.
01:16:41.380 | - Yeah.
01:16:42.300 | When I was a boy, I was thinking about these things.
01:16:45.140 | And I thought, "Hmm, maybe it has already happened."
01:16:48.380 | Because back then I knew, I learned from popular physics
01:16:53.060 | books that the structure, the large-scale structure
01:16:57.140 | of the universe is not homogeneous.
01:17:00.100 | And you have these clusters of galaxies,
01:17:03.060 | and then in between there are these huge empty spaces.
01:17:07.500 | And I thought, "Hmm, maybe they aren't really empty."
01:17:12.380 | It's just that in the middle of that,
01:17:13.980 | some AI civilization already has expanded
01:17:16.900 | and then has covered a bubble
01:17:19.740 | of a billion light-years diameter,
01:17:22.220 | and is using all the energy of all the stars
01:17:25.740 | within that bubble for its own unfathomable purposes.
01:17:29.580 | And so it already has happened,
01:17:31.540 | and we just fail to interpret the signs.
01:17:34.860 | But then I learned that gravity by itself
01:17:39.420 | explains the large-scale structure of the universe,
01:17:42.300 | and so that is not a convincing explanation.
01:17:45.460 | And then I thought, "Maybe it's the dark matter."
01:17:50.460 | Because as far as we know today,
01:17:54.820 | 80% of the measurable matter is invisible.
01:17:59.820 | And we know that because otherwise our galaxy
01:18:03.620 | or other galaxies would fall apart.
01:18:06.580 | They are rotating too quickly.
01:18:08.060 | And then the idea was maybe all of these AI civilizations
01:18:15.060 | that are already out there,
01:18:17.300 | they are just invisible because they're really efficient
01:18:23.460 | in using the energies of their own local systems,
01:18:26.580 | and that's why they appear dark to us.
01:18:29.780 | But this is also not a convincing explanation
01:18:31.700 | because then the question becomes,
01:18:34.660 | "Why are there still any visible stars left
01:18:39.660 | in our own galaxy,
01:18:42.060 | which also must have a lot of dark matter?"
01:18:44.620 | So that is also not a convincing thing.
01:18:46.900 | And today, I like to think it's quite plausible
01:18:53.380 | that maybe we are the first,
01:18:54.540 | at least in our local light cone,
01:18:57.300 | within the few hundreds of millions of light years
01:19:02.300 | that we can reliably observe.
01:19:09.220 | - Is that exciting to you, that we might be the first?
01:19:12.020 | - And it would make us much more important
01:19:16.500 | because if we mess it up through a nuclear war,
01:19:20.740 | then maybe this will have an effect
01:19:25.500 | on the development of the entire universe.
01:19:30.500 | - So let's not mess it up.
01:19:32.540 | - Let's not mess it up.
01:19:33.740 | - Jürgen, thank you so much for talking today.
01:19:35.740 | I really appreciate it.
01:19:37.220 | - It's my pleasure.
01:19:38.260 | (upbeat music)