
Stephen Wolfram: Computational Universe | MIT 6.S099: Artificial General Intelligence (AGI)


Chapters

0:00
10:57 Random Graph
22:13 Feature Space Plots
27:20 What Does the Space of All Possible Programs Look Like
27:42 Cellular Automata
27:51 Cellular Automaton
33:52 Boolean Algebra
37:17 Computational Irreducibility
37:22 Principle of Computational Equivalence
38:39 The First Axiom System That Corresponds to Boolean Algebra
39:49 Proof Graph
50:13 What Is AI Going To Allow Us To Automate
55:56 Symbolic Discourse
56:15 Smart Contracts
63:52 Key Influences
73:53 Gödel's Theorem
77:49 Algorithmic Drugs
79:43 Molecular Computing
85:19 Teach Kids To Be Useful in a World Where Everything Is Computational
100:09 Multi-Digit Arithmetic


00:00:00.000 | Welcome back to 6.S099, Artificial General Intelligence.
00:00:04.960 | Today we have Stephen Wolfram.
00:00:06.820 | (audience applauding)
00:00:12.560 | That's a first, I didn't even get started,
00:00:15.520 | you're already clapping.
00:00:16.720 | In his book, A New Kind of Science,
00:00:19.960 | he has explored and revealed the power,
00:00:21.840 | beauty and complexity of cellular automata
00:00:24.000 | as simple computational systems
00:00:27.160 | for which incredible complexity can emerge.
00:00:30.160 | It's actually one of the books that really inspired me
00:00:32.360 | to get into artificial intelligence.
00:00:34.240 | He's created the Wolfram Alpha
00:00:35.560 | computational knowledge engine,
00:00:37.200 | created Mathematica, which has now expanded
00:00:39.380 | to become Wolfram Language.
00:00:41.120 | Both he and his son were involved in helping analyze
00:00:44.240 | and create the alien language from the movie Arrival,
00:00:47.920 | for which they used the Wolfram Language.
00:00:50.040 | Please again, give Stephen a warm welcome.
00:00:53.240 | (audience applauding)
00:00:56.560 | All right, so I gather the brief here
00:00:58.680 | is to talk about how artificial general intelligence
00:01:01.240 | is going to be achieved.
00:01:02.240 | Is that the basic picture?
00:01:04.840 | So maybe I'm reminded of kind of a story
00:01:07.560 | which I don't think I've ever told in public
00:01:09.280 | but that something that happened
00:01:11.640 | just a few buildings over from here.
00:01:13.040 | So this was 2009 and Wolfram Alpha
00:01:16.320 | was about to arrive on the scene.
00:01:19.060 | I assume most of you have used Wolfram Alpha
00:01:21.120 | or seen Wolfram Alpha, yes?
00:01:23.360 | How many of you have used Wolfram Alpha?
00:01:27.560 | Okay, that's good.
00:01:28.640 | (audience laughing)
00:01:31.120 | So I had long been a friend of Marvin Minsky's
00:01:34.320 | and Marvin was a sort of pioneer of the AI world
00:01:37.880 | and I'd kind of seen for years
00:01:41.680 | question answering systems that tried to
00:01:44.080 | do sort of general intelligence question answering
00:01:48.560 | and so had Marvin.
00:01:49.960 | And so I was gonna show Marvin Wolfram Alpha.
00:01:54.440 | He looks at it and he's like, oh, okay, that's fine, whatever.
00:01:57.620 | And I said, no Marvin, this time it actually works.
00:02:01.860 | You can try real questions.
00:02:03.920 | This is actually something useful.
00:02:05.380 | This is not just a toy.
00:02:07.060 | And it was kind of interesting to see.
00:02:08.800 | It took about five minutes for Marvin to realize
00:02:11.260 | that this was finally a question answering system
00:02:13.860 | that could actually answer questions
00:02:15.300 | that were useful to people.
00:02:16.620 | And so one question is how did we achieve that?
00:02:21.120 | So you go to Wolfram Alpha and you can ask it,
00:02:23.500 | I mean, it's, I don't know what we can ask it.
00:02:25.580 | I don't know, what's the, some random question.
00:02:30.580 | What is the population of Cambridge?
00:02:33.060 | Actually, here's a question, divided by, let's try that.
00:02:36.100 | What's the population of Cambridge?
00:02:37.460 | It's probably gonna figure out.
00:02:39.020 | If we mean Cambridge, Massachusetts,
00:02:40.540 | it's gonna give us some number,
00:02:41.580 | it's gonna give us some plot.
00:02:43.140 | Actually, what I wanna know is number of students
00:02:47.500 | at MIT divided by population of Cambridge.
00:02:51.180 | See if it can figure that out.
00:02:54.740 | And, okay, it's kind of interesting.
00:02:58.260 | Oh, no, that's divided by, ah, that's interesting.
00:03:01.180 | It guessed that we were talking about Cambridge University
00:03:04.220 | as the denominator there.
00:03:05.740 | So it says the number of students at MIT
00:03:08.120 | divided by the number of students at Cambridge University.
00:03:10.300 | That's interesting, I'm actually surprised.
00:03:11.820 | Let's see what happens if I say Cambridge MA there.
00:03:14.620 | No, it'll probably fail horribly.
00:03:17.660 | No, that's good.
00:03:19.140 | Okay, so, no, that's interesting.
00:03:22.660 | That's a plot as a function of time,
00:03:24.940 | of the fraction of the, okay, so anyway.
00:03:28.980 | So I'm glad it works.
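
The same kind of query can also be issued programmatically; a minimal sketch using the Wolfram Language's WolframAlpha function, with illustrative query strings (the second form asks for just the short "Result" answer when one exists):

    WolframAlpha["number of students at MIT divided by population of Cambridge, MA"]
    WolframAlpha["population of Cambridge, MA", "Result"]
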
00:03:32.020 | So one question is how did we manage to get,
00:03:38.080 | so many things have to work
00:03:39.660 | in order to get stuff like this to work.
00:03:41.820 | You have to be able to understand the natural language,
00:03:44.780 | you have to have data sources,
00:03:46.340 | you have to be able to compute things
00:03:47.660 | from the data, and so on.
00:03:49.100 | One of the things that was a surprise to me,
00:03:51.760 | in terms of natural language understanding,
00:03:54.280 | was that the critical thing turned out to be
00:03:57.820 | just knowing a lot of stuff.
00:03:59.780 | The actual parsing of the natural language
00:04:02.660 | is kind of, I think it's kind of clever,
00:04:04.660 | and we use a bunch of ideas that came
00:04:06.220 | from my New Kind of Science project and so on.
00:04:08.860 | But I think the most important thing
00:04:10.500 | is just knowing a lot of stuff about the world
00:04:13.060 | is really important to actually being able
00:04:16.100 | to understand natural language in a useful situation.
00:04:20.140 | I think the other thing is having,
00:04:22.340 | actually having access to lots of data.
00:04:26.080 | Let me show you a typical example here
00:04:27.680 | of what is needed.
00:04:29.120 | So I ask about the ISS, and hopefully it'll wake up
00:04:32.880 | and tell us something here, come on,
00:04:34.240 | what's going on here?
00:04:35.560 | There we go, okay.
00:04:36.680 | So it figured out that we probably are talking
00:04:38.720 | about a spacecraft, not a file format,
00:04:41.000 | and now it's gonna give us a plot
00:04:42.280 | that shows us where the ISS is right now.
00:04:45.680 | So to make this work, we obviously have to have
00:04:48.280 | some feed of radar tracking data about satellites
00:04:52.680 | and so on, which we have for every satellite
00:04:54.440 | that's out there.
00:04:56.440 | But then that's not good enough to just have that feed.
00:04:59.560 | Then you also have to be able to do celestial mechanics
00:05:02.520 | to work out, well, where is the ISS actually right now
00:05:05.800 | based on the orbital elements that have been deduced
00:05:07.960 | from radar, and then if we want to know things like,
00:05:10.600 | okay, when is it going to, it's not currently visible
00:05:13.340 | from Boston, Massachusetts, it will next rise
00:05:15.920 | at 7:36 p.m. today, Monday.
00:05:20.920 | So this requires a mixture of data
00:05:26.680 | about what's going on in the world,
00:05:28.400 | together with models about how the world
00:05:31.080 | is supposed to work, being able to predict things,
00:05:33.120 | and so on.
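
A hedged sketch of the same lookup done with free-form queries from code; the query strings here are illustrative, not the ones typed in the talk:

    WolframAlpha["current position of the ISS"]
    WolframAlpha["when will the ISS next rise over Boston, Massachusetts"]
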
00:05:33.960 | And I think another thing that I kind of realized
00:05:35.680 | about AI and so on from the Wolfram Alpha effort
00:05:42.300 | has been that one of the earlier ideas
00:05:46.840 | for how one would achieve AI was let's make it work
00:05:50.560 | kind of like brains do, and let's make it figure stuff out,
00:05:52.760 | and so if it has to do physics, let's have it do physics
00:05:55.620 | by pure reasoning, like people at least used to do physics.
00:06:00.620 | But in the last 300 years, we've had a different way
00:06:03.960 | to do physics that wasn't sort of based
00:06:05.820 | on natural philosophy.
00:06:07.140 | It was instead based on things like mathematics.
00:06:09.900 | And so one of the things that we were doing
00:06:12.340 | in Wolfram Alpha was to kind of cheat relative
00:06:15.460 | to what had been done in previous AI systems,
00:06:18.340 | which was instead of using kind of reasoning-type methods,
00:06:20.980 | we're just saying, okay, we want to compute
00:06:22.620 | where the ISS is going to be,
00:06:24.300 | we've got a bunch of equations of motion
00:06:25.960 | that correspond to differential equations,
00:06:27.700 | we're just gonna solve the equations of motion
00:06:29.540 | and get an answer.
00:06:30.580 | That's kind of leveraging the last 300 years or so
00:06:33.300 | of exact science that had been done,
00:06:36.100 | rather than trying to make use
00:06:37.500 | of kind of human reasoning ideas.
00:06:39.900 | And I might say that in terms of the history
00:06:43.340 | of the Wolfram Alpha project,
00:06:45.860 | when I was a kid, a disgustingly long time ago,
00:06:49.320 | I was interested in AI kinds of things,
00:06:51.820 | and I, in fact, I was kind of upset recently
00:06:54.780 | to find a bunch of stuff I did when I was 12 years old,
00:06:56.940 | kind of trying to assemble a pre-version of Wolfram Alpha
00:07:00.380 | way back before it was technologically possible,
00:07:02.540 | but it's also a reminder that one just does the same thing
00:07:05.580 | one's whole life, so to speak, at some level.
00:07:09.180 | But what happened was when I started off working mainly
00:07:14.180 | in physics, and then I got involved
00:07:18.440 | in building computer systems to do things
00:07:20.420 | like mathematical computation and so on,
00:07:22.820 | and I then sort of got interested in, okay,
00:07:26.180 | so can we generalize this stuff,
00:07:27.700 | and can we really make systems that can answer
00:07:31.140 | sort of arbitrary questions about the world,
00:07:33.380 | and for example, sort of the promise would be
00:07:38.020 | if there's something that is systematically known
00:07:41.100 | in our civilization, make it automatic to answer questions
00:07:44.020 | on the basis of that systematic knowledge.
00:07:46.240 | And back in around late 1970s, early 1980s,
00:07:50.020 | my conclusion was if you wanted to do something like that,
00:07:52.460 | the only realistic path to being able to do it
00:07:55.060 | was to build something much like a brain.
00:07:57.240 | And so I got interested in neural nets,
00:07:59.020 | and I tried to do things with neural nets back in 1980,
00:08:01.860 | and nothing very interesting happened,
00:08:03.660 | well, I couldn't get them to do anything very interesting.
00:08:06.260 | And that, so I kind of had the idea
00:08:09.500 | that the only way to get the kind of thing
00:08:11.740 | that now exists in Wolfram Alpha, for example,
00:08:14.260 | was to build a brain-like thing.
00:08:16.960 | And then many years later, for reasons I can explain,
00:08:20.300 | I kind of came back to this and realized,
00:08:22.780 | actually, it wasn't true that you had to build
00:08:24.380 | a brain-like thing, sort of mere computation was sufficient.
00:08:27.820 | And that was kind of what got me started
00:08:29.780 | actually trying to build Wolfram Alpha.
00:08:31.660 | When we started building Wolfram Alpha,
00:08:33.460 | one of the things I did was go to a sort of a field trip
00:08:35.700 | to a big reference library, and you see all these shelves
00:08:39.380 | of books and so on, and the question is,
00:08:41.300 | can we take all of this knowledge that exists
00:08:43.260 | in all of these books and actually automate
00:08:46.180 | being able to answer questions on the basis of it?
00:08:48.400 | And I think we've pretty much done that.
00:08:49.980 | at least for the books you find
00:08:51.900 | in a typical reference library.
00:08:54.060 | So that was, it looked kind of daunting at the beginning
00:08:57.260 | because there's a lot of knowledge and information
00:08:59.980 | out there, but actually it turns out
00:09:01.860 | there are a few thousand domains,
00:09:03.620 | and we've steadily gone through and worked
00:09:05.860 | on these different domains.
00:09:07.300 | Another feature of the Wolfram Alpha project
00:09:09.120 | was that we didn't really, you know,
00:09:11.140 | I'd been involved a lot in doing basic science
00:09:13.360 | and in trying to have sort of grand theories of the world.
00:09:15.980 | One of my principles in building Wolfram Alpha
00:09:17.920 | was not to start from a grand theory of the world.
00:09:20.100 | That is, not to kind of start from some global ontology
00:09:22.980 | of the world and then try and build down
00:09:25.020 | into all these different domains,
00:09:26.220 | but instead to work up from having, you know,
00:09:28.980 | hundreds, then thousands of domains that actually work,
00:09:31.580 | whether they're, you know, information about cars
00:09:34.220 | or information about sports or information about movies
00:09:37.260 | or whatever else, have each of these domains
00:09:40.440 | sort of building up from the bottom
00:09:41.980 | in each of these domains, and then finding
00:09:44.220 | that there were common themes in these domains
00:09:46.020 | that we could then build into frameworks
00:09:48.020 | and then sort of construct the whole system
00:09:50.000 | on the basis of that, and that's kind of how it's worked,
00:09:52.740 | and I can talk about some of the actual frameworks
00:09:54.620 | that we end up using and so on.
00:09:57.040 | But maybe I should explain a little bit more.
00:10:01.440 | So one question is how does Wolfram Alpha
00:10:04.760 | actually sort of work inside?
00:10:06.960 | And the answer is it's a big program.
00:10:09.320 | The core system is about 15 million lines
00:10:12.880 | of Wolfram Language code, plus some number
00:10:15.840 | of terabytes of raw data.
00:10:17.960 | And so the way, the thing that sort of made
00:10:24.000 | building Wolfram Alpha possible was this language,
00:10:26.940 | Wolfram Language, which started with Mathematica,
00:10:30.520 | which came out in 1988, and has been sort of
00:10:33.400 | progressively growing since then.
00:10:35.360 | So maybe I should show you some things
00:10:36.880 | about Wolfram language, and it's easy.
00:10:39.800 | You can go use this.
00:10:41.280 | MIT has a site license for it.
00:10:42.920 | You can use it all over the place.
00:10:45.320 | You can find it on the web, et cetera, et cetera, et cetera.
00:10:48.800 | But, okay, the basics work.
00:10:54.440 | Let's start off with something like,
00:10:57.180 | let's make a random graph, and let's say we have
00:11:01.420 | a random graph with 200 nodes and 400 edges.
00:11:04.220 | Okay, so there's a random graph.
00:11:05.900 | The first important thing about Wolfram Language
00:11:07.780 | is that it's a symbolic language.
00:11:09.260 | So I can just pick up this graph, and I could say,
00:11:11.980 | you know, let's do some analysis of this graph.
00:11:14.500 | That graph is just a symbolic thing
00:11:16.500 | that I can just do computations on.
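
A minimal sketch of that first demo; RandomGraph[{n, m}] gives a random graph with n vertices and m edges, and the analysis functions shown here are just examples of what can be applied to the symbolic graph object:

    g = RandomGraph[{200, 400}]         (* 200 vertices, 400 edges *)
    EdgeCount[g]                        (* the graph is just an expression we can compute with *)
    Length[FindGraphCommunities[g]]     (* e.g. how many communities it splits into *)
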
00:11:19.060 | Or I could say, let's get a, another good thing
00:11:23.020 | to always do is get a current image.
00:11:25.740 | See, there we go.
00:11:27.060 | And now I could go and say something like,
00:11:30.560 | let's do some basic thing.
00:11:33.540 | Let's say, let's edge detect that image.
00:11:35.900 | Again, this image is just a thing that we can manipulate.
00:11:40.300 | We could take the image, we could make it,
00:11:42.400 | I don't know, we could take the image
00:11:45.800 | and partition it into little pieces,
00:11:47.860 | do computations on that.
00:11:49.740 | I don't know, simple.
00:11:50.860 | Let's do, let's just say, sort each row of the image,
00:11:55.100 | assemble the image again, oops.
00:11:57.100 | Assemble that image again,
00:12:02.180 | we'll get some mixed up picture there.
00:12:04.620 | If I wanted to, I could, for example,
00:12:06.140 | let's say, let's make that the current image,
00:12:08.020 | and let's say, make that dynamic.
00:12:10.940 | I can be just running that code, hopefully,
00:12:13.700 | and a little loop, and there we can make that work.
00:12:17.780 | So, one general point here is,
00:12:22.780 | this is just an image for us,
00:12:25.900 | it's just a piece of data like anything else.
00:12:28.020 | If we just have a variable, a thing called x,
00:12:30.300 | it just says, okay, that's x,
00:12:31.880 | I don't need to know a particular value,
00:12:33.580 | it's just a symbolic thing that corresponds to,
00:12:37.440 | that's a thing called x.
00:12:39.220 | Now, what gets interesting when you have
00:12:43.420 | symbolic language and so on is,
00:12:45.240 | we're interested in having it represent
00:12:46.760 | stuff about the world, as well as
00:12:48.980 | just abstract kinds of things.
00:12:50.180 | I mean, I can abstractly say,
00:12:52.780 | find some funky integral, I don't know what,
00:12:57.340 | that's then representing, using symbolic variables
00:13:01.680 | to represent algebraic kinds of things.
00:13:03.480 | But I could also just say, I don't know,
00:13:05.740 | something like Boston.
00:13:07.660 | And Boston is another kind of symbolic thing
00:13:10.500 | that has, if I say, what is it really inside?
00:13:14.020 | That's the entity, a city, Boston,
00:13:17.580 | Massachusetts, United States.
00:13:19.540 | Actually, you notice when I typed that in,
00:13:21.040 | I was using natural language to type it in,
00:13:23.800 | and it gave me a bunch of disambiguation here.
00:13:26.200 | It said, assuming Boston is a city,
00:13:29.180 | assuming Boston, Massachusetts,
00:13:30.600 | use Boston, New York, or, okay,
00:13:32.340 | let's use Boston and the Philippines,
00:13:34.960 | which I've never heard of, but let's try using that instead.
00:13:38.920 | And now, if I look at that, it'll say
00:13:41.160 | it's Boston and some province of the Philippines,
00:13:43.720 | et cetera, et cetera, et cetera.
00:13:45.000 | Now, I might ask it, of that, I could say something like,
00:13:48.880 | what's the population of that?
00:13:52.800 | And it, okay, it's a fairly small place.
00:13:57.640 | Or I could say, for example, let me do this.
00:13:59.760 | Let me say, a geolist plot from that Boston,
00:14:04.480 | let's take from that Boston, two,
00:14:09.000 | and now let's type in Boston again,
00:14:10.720 | and now let's have it use the default meaning
00:14:12.480 | of the word of Boston, and then let's join those up,
00:14:16.160 | and now this should plot, this should show me a plot.
00:14:20.600 | There we go, okay, so there's the path from the Boston
00:14:24.760 | that we picked in the Philippines to the Boston here.
00:14:27.240 | Or we could ask it, I don't know, I could just say,
00:14:29.800 | I could ask it the distance from one to another
00:14:31.640 | or something like that.
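
A sketch of the entity demo; the Massachusetts entity uses the standard canonical name, while the canonical name for the Philippine Boston is a guess (in the talk it came from the natural-language input box):

    bostonMA = Entity["City", {"Boston", "Massachusetts", "UnitedStates"}];
    bostonPH = Entity["City", {"Boston", "DavaoOriental", "Philippines"}];   (* canonical name assumed *)
    bostonMA["Population"]
    GeoListPlot[{bostonPH, bostonMA}, Joined -> True]
    GeoDistance[bostonPH, bostonMA]
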
00:14:33.160 | So, one of the things here,
00:14:35.720 | one of the things we found really, really useful, actually,
00:14:39.040 | in Wolfram Language, so first of all,
00:14:40.720 | there's a way of representing stuff about the world,
00:14:43.080 | like cities, for example.
00:14:45.080 | Or let's say I want to say, let's do this.
00:14:46.960 | Let's say, let's do something with cities.
00:14:51.120 | Let's say capital cities in South America.
00:14:53.920 | Okay, so notice, this is a piece of natural language.
00:14:56.840 | This will get interpreted into something
00:14:59.720 | which is precise, symbolic Wolfram Language code
00:15:02.640 | that we can then compute with,
00:15:05.320 | and that will give us the cities in South,
00:15:06.960 | capital cities in South America.
00:15:09.000 | I could, for example, let's say I say, find shortest tour.
00:15:12.000 | So now I'm going to use some, oops,
00:15:16.040 | no, I don't want to do that.
00:15:17.120 | What I want to do first is to say,
00:15:18.920 | show me the geopositions of all those cities
00:15:22.880 | on line 21 there.
00:15:24.440 | So now it will find the geopositions,
00:15:26.360 | and now it will say, compute the shortest tour.
00:15:29.200 | So that's saying there's a 10,000 mile
00:15:32.200 | traveling salesman tour around those cities,
00:15:34.920 | so I could take those cities that are on line 21,
00:15:37.320 | and I could say, order the cities according to this,
00:15:41.240 | and then I could make another geolist plot of that,
00:15:45.320 | join it up, and this should now show us
00:15:49.280 | a traveling salesman tour of the capital cities
00:15:52.840 | in South America.
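
A sketch of the shortest-tour demo, assuming the capitals come back as city entities (the EntityClass specification is an assumption; in the talk the list came from natural-language input):

    capitals = EntityValue[EntityClass["Country", "SouthAmerica"], "CapitalCity"];
    positions = GeoPosition /@ capitals;
    {length, order} = FindShortestTour[positions];
    GeoListPlot[capitals[[order]], Joined -> True]
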
00:15:53.820 | So, it's sort of interesting to see what's involved
00:15:58.520 | in making stuff like this work.
00:16:03.000 | One of, my goal has been to sort of automate
00:16:07.000 | as much as possible about things that have to be computed,
00:16:10.920 | and that means knowing as many algorithms as possible,
00:16:14.400 | and also knowing as much data about the world as possible.
00:16:18.200 | And I kind of view this as sort of a knowledge-based
00:16:21.040 | programming approach, where you have a typical kind of idea
00:16:25.840 | in programming languages is, you have some small
00:16:28.400 | programming languages, has a few primitives
00:16:30.480 | that are pretty much tied into what a machine
00:16:32.920 | can intrinsically do, and then maybe you'll have libraries
00:16:36.800 | that add on to that and so on.
00:16:38.500 | My kind of crazy idea of many, many years ago
00:16:42.720 | has been to build an integrated system
00:16:45.440 | where all of the stuff about different domains
00:16:48.200 | of knowledge and so on are all just built into the system
00:16:51.760 | and designed in a coherent way.
00:16:54.500 | I mean, this has been kind of the story of my life
00:16:56.400 | for the last 30 years, is trying to keep the design
00:16:58.560 | of the system coherent, even as one adds
00:17:02.600 | all sorts of different areas of capability.
00:17:07.000 | So, as, I mean, we can go and dive into all sorts
00:17:11.240 | of different kinds of things here, but maybe as an example,
00:17:15.280 | well, let's do, what could we do here?
00:17:17.120 | We could take, let's try, how about this?
00:17:20.680 | Is that a bone?
00:17:21.760 | I think so, that's a bone.
00:17:23.480 | So let's try that.
00:17:27.160 | (keyboard clicking)
00:17:29.160 | As a mesh region.
00:17:30.440 | See if that works.
00:17:32.520 | So this will now use a completely different domain
00:17:34.880 | of human endeavor.
00:17:36.880 | Okay, oops, there's two of those bones.
00:17:38.720 | Let's try, let's just try, let's try left humerus,
00:17:43.720 | and let's try that, the mesh region for that,
00:17:48.520 | and now we should have a bone here.
00:17:50.480 | Okay, there's a representation of a bone.
00:17:52.960 | Let's take that bone, and we could, for example,
00:17:55.040 | say, let's take the surface area of that,
00:17:59.320 | as in some units, or I could, let's do some much more
00:18:02.560 | outrageous thing.
00:18:03.400 | Let's say we take region distance.
00:18:06.760 | So we're going to take the distance from that bone
00:18:11.760 | to a point, let's say, zero, zero, Z,
00:18:16.760 | and let's make a plot of that distance with Z going from,
00:18:21.760 | let's say, I have no idea where this bone is,
00:18:24.960 | but let's try something like this.
00:18:26.600 | So that was really boring.
00:18:28.120 | Let's try, so what this is doing, again,
00:18:34.320 | a whole bunch of stuff has to work
00:18:35.680 | in order for this to operate.
00:18:37.560 | This has to be, this is some region in 3D space
00:18:40.840 | that's represented by some mesh.
00:18:42.680 | You have to compute, you know,
00:18:44.040 | do the computational geometry to figure out where it is.
00:18:46.320 | If I wanted to, let's try anatomy plot 3D,
00:18:51.600 | and let's say something like left hand, for example,
00:18:55.440 | and now it's going to show us probably the complete data
00:18:58.120 | that it has about the geometry of a left hand.
00:19:03.120 | There we go.
00:19:07.200 | Okay, so there's the result, and we could take that apart
00:19:09.800 | and start computing things from it and so on.
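
A sketch of the anatomy demo; AnatomyData and AnatomyPlot3D are the functions shown, while the exact property name and the plot range here are assumptions:

    bone = AnatomyData[Entity["AnatomicalStructure", "LeftHumerus"], "MeshRegion"];
    Area[bone]                                            (* surface area, if the mesh comes back as a surface *)
    Plot[RegionDistance[bone, {0, 0, z}], {z, 0, 2000}]   (* distance from the bone to points along an axis *)
    AnatomyPlot3D[Entity["AnatomicalStructure", "LeftHand"]]
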
00:19:12.080 | So what, so this is,
00:19:18.960 | so there's a lot of kind of computational knowledge
00:19:23.960 | that's built in here.
00:19:25.840 | One, let's talk a little bit about
00:19:29.640 | kind of the modern machine learning story.
00:19:31.560 | So for instance, if I say, let's get a picture here.
00:19:34.520 | Let's say, let's just say picture of,
00:19:38.560 | has anybody got a favorite kind of animal?
00:19:40.660 | What?
00:19:44.320 | Panda.
00:19:45.160 | Okay, so let's try, okay, giant panda.
00:19:50.160 | Okay, okay, there's a panda.
00:19:55.120 | Let's see what, now let's try saying,
00:19:58.160 | let's try for this panda, let's try saying image identify,
00:20:03.840 | and now here we'll be embarrassed probably,
00:20:05.640 | but let's just see, let's see what happens.
00:20:07.520 | If I say image identify that,
00:20:09.840 | and now it'll hopefully, wake up, wake up, wake up.
00:20:14.160 | This only takes a few hundred milliseconds.
00:20:15.800 | Okay, very good, giant panda.
00:20:17.560 | Let's see what the runners up were to the giant panda.
00:20:21.040 | Let's say we want to say the 10 runners up
00:20:29.240 | in all categories for that thing, okay.
00:20:32.280 | So a giant panda, a procyonid, which I've never heard of.
00:20:37.280 | Are pandas carnivorous?
00:20:39.220 | They eat bamboo shoots, okay.
00:20:42.520 | So that was so lucky it didn't get that one.
00:20:45.240 | It's really sure it's a mammal,
00:20:46.800 | and it's absolutely certain it's a vertebrate.
00:20:49.160 | Okay, so you might ask, how did it figure this out?
00:20:53.340 | And so then you can kind of look under the hood and say,
00:20:57.680 | so we have a whole framework
00:20:59.280 | for representing neural nets symbolically.
00:21:01.720 | And so this is the actual model that it's using to do this.
00:21:05.520 | So this is a, so there's a neural net,
00:21:08.120 | and it's got, we can drill down,
00:21:10.040 | and we can see there's a piece of the neural net.
00:21:12.920 | We can drill down even further to one of these,
00:21:14.760 | and we can probably see what,
00:21:15.820 | that's a batch normalization layer,
00:21:17.600 | somewhere deep, deep inside the entrails of the,
00:21:20.840 | not panda, but of this thing, okay.
00:21:24.240 | So now let's take that object,
00:21:25.760 | which is just a symbolic object,
00:21:27.240 | and let's feed it the picture of the panda.
00:21:29.800 | And we can see, and there, oops.
00:21:33.720 | I was not giving it the right thing.
00:21:36.200 | What did I just do wrong here?
00:21:37.640 | Oh, here, let's take, oh, I see what I did.
00:21:40.000 | Okay, let's take this thing
00:21:41.680 | and feed it the picture of the panda,
00:21:43.240 | and it says it's a giant panda, okay.
00:21:45.320 | How about we do something more outrageous?
00:21:47.120 | Let's take that neural net,
00:21:48.800 | and let's only use the first, let's say,
00:21:51.080 | 10 layers of the neural net.
00:21:53.000 | So let's just take out 10 layers of the neural net
00:21:55.360 | and feed it the panda.
00:21:57.000 | And now what we'll get is something
00:21:58.960 | from the insides of the neural net,
00:22:01.120 | and I could say, for example,
00:22:02.520 | let's just make those into images.
00:22:03.840 | Okay, so that's what the neural net had figured out
00:22:08.160 | about the panda after 10 layers
00:22:10.680 | of going through the neural net.
00:22:12.240 | And maybe, actually, it'd be interesting to see,
00:22:13.880 | let's do a feature space plot.
00:22:15.600 | So now we're going to, of those intermediate things
00:22:19.920 | in the brain of the neural net, so to speak,
00:22:22.920 | this is now taking, so what this is just doing
00:22:25.800 | is to do dimension reduction on this space of images,
00:22:30.800 | and so it's not very exciting.
00:22:33.080 | It's probably mostly distinguishing these
00:22:34.680 | by total gray level, but that's kind of showing us
00:22:37.600 | the space of different sort of features
00:22:41.800 | of the insides of this neural net.
00:22:43.280 | So it's also, what's interesting to see here
00:22:45.760 | is things like the symbolic representation
00:22:47.680 | of the neural net, and if you're wondering
00:22:49.240 | how does that actually work inside,
00:22:51.080 | it's underneath, it's using MXNet,
00:22:52.880 | which we happen to have contributed to a lot,
00:22:55.280 | and there's sort of a bunch of symbolic layers
00:22:57.280 | on top of that that feed into that.
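
A sketch of the look-under-the-hood part; the NetModel name is an assumption, and Take assumes the network is a plain chain of layers:

    net = NetModel["Wolfram ImageIdentify Net V1"];   (* model name assumed *)
    net[panda]
    partial = Take[net, 10];                          (* keep only the first 10 layers *)
    features = partial[panda];                        (* the internal activations after 10 layers *)
    FeatureSpacePlot[Image /@ Normal[features]]       (* dimension-reduce those feature maps as images *)
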
00:23:00.080 | And maybe I can show you here.
00:23:01.240 | Let me show you how you would train one of these neural nets.
00:23:03.400 | That's also kind of fun.
00:23:04.840 | So we have a data repository
00:23:08.040 | that has all sorts of useful data.
00:23:10.080 | One piece of data it has is a bunch
00:23:12.560 | of neural net training sets, so this is
00:23:14.840 | the standard MNIST training set of handwritten digits.
00:23:19.000 | Okay, so there's MNIST, and you notice
00:23:21.320 | that these things here, that's just an image
00:23:23.240 | which I could copy out, and I could do,
00:23:25.800 | let's say I could do color negate on that image
00:23:28.960 | 'cause it's just an image, and there's the result and so on.
00:23:33.200 | And now I could say, let's take a neural net,
00:23:36.400 | like let's take a simple neural net like LeNet,
00:23:38.560 | for example, okay, so let's take LeNet,
00:23:42.240 | and then let's take the untrained evaluation network.
00:23:47.200 | So this is now a version of LeNet,
00:23:49.200 | a simple, standard neural net that didn't get trained.
00:23:52.480 | So for example, if I take that symbolic representation
00:23:56.120 | of LeNet, and I could say net initialize,
00:23:59.200 | then it will take that, and it'll just put random weights
00:24:03.000 | into LeNet, okay, so if I take those random weights,
00:24:06.320 | and I feed it a zero here, I feed it that image of a zero,
00:24:10.320 | it will presumably produce something completely random,
00:24:12.840 | in this particular case, two, right?
00:24:15.120 | So now what I would like to do is to take this,
00:24:18.000 | so that was just randomly initializing the weights.
00:24:20.480 | So now what I'd like to do is to take
00:24:22.360 | the MNIST training set, and I'd like to actually train
00:24:26.120 | Lynette using MNIST training set, so let's take this,
00:24:31.160 | and let's take a random sample of,
00:24:35.520 | let's say, I don't know, 1,000 pieces of Lynette.
00:24:39.400 | Come on, why is it having to load it again?
00:24:42.120 | There we go, okay, so there's a random sample there,
00:24:45.520 | it was on line 21, and now let me go down here,
00:24:48.640 | and say, where was it?
00:24:51.280 | Well, we can just take this thing here,
00:24:54.280 | so this is the uninitialized version of LeNet,
00:24:58.200 | and we can say take that, and then let's say
00:25:00.680 | net train of that with the thing on line 21,
00:25:04.840 | which was that 1,000 instances.
00:25:06.520 | So now what it's doing is it's running training on,
00:25:10.560 | and that's, you see the loss going down and so on.
00:25:13.600 | It's running training with those 1,000 instances for LeNet,
00:25:18.600 | and it will, we can stop it if we want to.
00:25:22.160 | Actually, this is a new display, this is very nice.
00:25:24.520 | This is a new version of Wolfram Language,
00:25:26.440 | which is coming out next week, which I'm showing you,
00:25:29.160 | but it's quite similar to what exists today,
00:25:31.360 | but because that's one of the features
00:25:33.800 | of running a software company is that you always run
00:25:35.800 | the very latest version of things, for better or worse,
00:25:39.520 | and this is also a good way to debug it,
00:25:41.360 | 'cause it's supposed to come out next week.
00:25:42.880 | If I find some horrifying bug, maybe it will get delayed,
00:25:46.520 | but let's try, let's try this.
00:25:51.520 | Okay, now it says it's zero, okay,
00:25:54.760 | and so this is now a trained version of LeNet,
00:25:57.720 | trained with that training data.
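
A sketch of the training demo; ResourceData, NetModel, NetInitialize and NetTrain are the functions used, while the exact resource and property names are from memory:

    mnist   = ResourceData["MNIST"];                            (* a list of image -> digit rules *)
    sample  = RandomSample[mnist, 1000];
    lenet   = NetModel["LeNet", "UninitializedEvaluationNet"];  (* LeNet with no trained weights *)
    NetInitialize[lenet][sample[[1, 1]]]                        (* random weights: essentially a random guess *)
    trained = NetTrain[lenet, sample];
    trained[sample[[1, 1]]]                                     (* now it should label the digit correctly *)
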
00:26:00.560 | One of the things, so we can talk about all kinds
00:26:04.760 | of details of neural nets and so on,
00:26:06.960 | but maybe I should zoom out to talk a little bit
00:26:08.760 | about bigger picture as I see it.
00:26:10.760 | So one question is, sort of a question of what is
00:26:17.080 | in principle possible to do with computation?
00:26:21.400 | So we have, as we're building all kinds of things,
00:26:24.800 | we're making image identifiers,
00:26:26.320 | we're figuring out all kinds of things
00:26:28.480 | about where the International Space Station is and so on.
00:26:31.080 | Question is, what is in principle possible to compute?
00:26:35.560 | And so one of the places one can ask that question
00:26:39.560 | is when one looks at, for example,
00:26:41.760 | models of the natural world.
00:26:43.120 | One can say, how do we make models of the natural world?
00:26:46.120 | Kind of a traditional approach has been,
00:26:48.760 | let's use mathematical equations
00:26:50.320 | to make models of the natural world.
00:26:52.120 | A question is, if we want to kind of generalize that
00:26:55.280 | and say, well, what are all possible ways
00:26:56.920 | to make models of things,
00:26:58.560 | what can we say about that question?
00:27:00.880 | So I spent many years of my life
00:27:02.880 | trying to address that question.
00:27:04.880 | And basically what I've thought about a lot
00:27:08.040 | is that if you want to make a model of a thing,
00:27:10.680 | you have to have definite rules
00:27:12.080 | by which the thing operates.
00:27:13.560 | What's the most general way to represent possible rules?
00:27:16.920 | Well, in today's world, we think of that as a program.
00:27:19.520 | So the next question is, well,
00:27:20.600 | what does the space of all possible programs look like?
00:27:23.920 | And most of the time, we're writing programs
00:27:26.040 | like Wolfram Language is 50 million lines of code,
00:27:29.120 | and it's a big, complicated program
00:27:30.880 | that was built for a fairly specific purpose.
00:27:34.080 | But the question is, if we just look at
00:27:35.880 | sort of the space of possible programs,
00:27:38.040 | more or less at random,
00:27:38.960 | what's out there in the space of possible programs?
00:27:40.960 | So I got interested many years ago in cellular automata,
00:27:44.640 | which are a really good example
00:27:45.960 | of a very simple kind of program.
00:27:48.080 | So let me show you an example of one of these.
00:27:49.960 | So these are the rules for a typical cellular automaton.
00:27:54.200 | And this just says you have a row of black and white squares,
00:27:58.200 | and this just says you look at a square,
00:28:01.200 | say what color is that square,
00:28:02.360 | what color are its left and right neighbors,
00:28:05.020 | decide what color of the square will be on the next step
00:28:07.240 | based on that rule.
00:28:08.600 | Okay, so really simple rule.
00:28:10.360 | So now let's take a look at what actually happens
00:28:13.520 | if we use that rule a bunch of times.
00:28:15.400 | So we can take that rule,
00:28:17.320 | the 254 is just the binary digits
00:28:20.200 | that correspond to those positions in this rule.
00:28:23.160 | So now I can say this, I could say let's do 50 steps,
00:28:26.800 | let me do this.
00:28:28.560 | And now if I run according to the rule I just defined,
00:28:33.560 | it turns out to be pretty trivial.
00:28:35.480 | It's just saying, we start off with a black square,
00:28:39.080 | and then, if a square
00:28:40.640 | or any of its neighboring squares is black,
00:28:43.960 | make that square black.
00:28:44.880 | So we've used a very simple program.
00:28:47.140 | We got a very simple result out.
00:28:49.040 | Okay, let's try a different program.
00:28:50.400 | We can try changing this.
00:28:52.040 | We'll get, that's a program with one bit different.
00:28:55.960 | Now we get that kind of pattern.
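
A sketch of the cellular automaton demo; CellularAutomaton[rule, init, steps] runs an elementary rule, {{1}, 0} means one black cell on a white background, and rule 250 is my guess at the one-bit variant shown:

    RulePlot[CellularAutomaton[254]]                   (* the eight cases of the rule *)
    ArrayPlot[CellularAutomaton[254, {{1}, 0}, 50]]    (* 50 steps from a single black cell *)
    ArrayPlot[CellularAutomaton[250, {{1}, 0}, 50]]    (* a rule that differs by one bit *)
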
00:28:57.840 | So the question is, well, what happens,
00:28:59.880 | you might say, okay, if you've got such a trivial program,
00:29:03.120 | it's not surprising you're just gonna get
00:29:04.400 | trivial results out.
00:29:06.000 | So, but you can do an experiment to test that hypothesis.
00:29:09.000 | You can just say, let's take all possible programs,
00:29:11.440 | there are 256 possible programs
00:29:13.920 | that are based on these eight bits here.
00:29:16.360 | Let's just take, well, let's just, whoops.
00:29:19.420 | Let's just take, let's say the first 64 of those programs
00:29:24.420 | and let's just make a, there we go.
00:29:28.100 | Let's just make a table of the results that we get
00:29:32.020 | by running those first 64 programs here.
00:29:35.460 | So here we get the result.
00:29:37.280 | And what you see is, well, most of them are pretty trivial.
00:29:39.700 | They start off with one black cell in the middle
00:29:42.020 | and it just tools off to one side.
00:29:44.180 | Occasionally we get something more exciting happening
00:29:46.100 | like here's a nice nested pattern that we get.
00:29:48.540 | If we were to continue it longer,
00:29:49.700 | it would make more detailed nesting.
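
The enumeration experiment is essentially one line; a sketch showing the first 64 of the 256 elementary rules:

    Table[ArrayPlot[CellularAutomaton[r, {{1}, 0}, 50]], {r, 0, 63}]
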
00:29:53.560 | But then, my all time favorite science discovery,
00:29:57.700 | if you go on and just look at these,
00:29:59.620 | after a while you find this one here,
00:30:02.140 | which is rule 30 in this numbering scheme.
00:30:06.100 | And that's doing something a bit more complicated.
00:30:08.260 | You say, well, what's going on here?
00:30:10.020 | We just started off with this very simple rule.
00:30:12.260 | Let's see what happens.
00:30:13.100 | Maybe after a while, if we run rule 30
00:30:15.900 | long enough, it will resolve into something simpler.
00:30:18.940 | So let's try running it, let's say 500 steps.
00:30:21.400 | And that's the, whoops, that's the result we get.
00:30:24.820 | Let's say, let's just make it full screen.
00:30:29.820 | Okay, it's aliasing a bit on the projector there,
00:30:35.480 | but you get the basic idea.
00:30:37.720 | This is a, so this just started off
00:30:39.660 | from one black cell at the top and this is what it made.
00:30:42.820 | And that's pretty weird because all this is,
00:30:45.340 | you know, this is sort of not the way it's supposed,
00:30:48.220 | things are supposed to work.
00:30:49.340 | 'Cause what we have here is just that little program
00:30:52.480 | down there and it makes this big complicated pattern here.
00:30:55.980 | And, you know, we can see there's a certain amount
00:30:57.620 | of regularity on one side, but for example,
00:31:00.100 | the center column of this pattern is,
00:31:02.180 | for all practical purposes, completely random.
00:31:04.220 | In fact, it was, we used it as the random number generator
00:31:07.380 | in mathematical and morphem language for many years.
00:31:09.820 | It was recently retired after excellent service
00:31:12.980 | because we found a somewhat more efficient one.
00:31:15.280 | But the, so, you know, what do we learn from this?
00:31:21.320 | What we learn from this is out in the computational universe
00:31:24.660 | of possible programs, it's possible to get,
00:31:27.840 | even with very simple programs,
00:31:29.620 | very rich, complicated behavior.
00:31:31.900 | Well, that's important if you're interested
00:31:33.300 | in modeling the natural world because you might think
00:31:36.560 | that there are programs that represent systems in nature
00:31:39.600 | that might work this way and so on.
00:31:40.940 | It's also important for technology because it says,
00:31:44.300 | okay, let's say you're trying to find a,
00:31:48.220 | let's say you're trying to find a program
00:31:49.460 | that's a good random number generator.
00:31:50.740 | How are you gonna do that?
00:31:52.060 | Well, you could start thinking very hard
00:31:53.940 | and you could try and make up, you know,
00:31:55.220 | you could try and write down all kinds of flow charts
00:31:57.820 | about how this random number generator is going to work.
00:32:00.100 | Or you can say, forget that, I'm just gonna search
00:32:02.120 | the computational universe of possible programs
00:32:04.500 | and just look for one that serves
00:32:06.840 | as a good random number generator.
00:32:08.100 | In this particular case, after you've searched 30 programs,
00:32:11.260 | you'll find one that makes a good random number generator.
00:32:13.760 | Why does it work?
00:32:15.140 | That's a complicated story.
00:32:16.740 | It's not a story that I think necessarily
00:32:19.320 | we can really tell very well.
00:32:21.260 | But what's important is that this idea
00:32:24.120 | that out in the computational universe,
00:32:25.660 | there's a lot of rich, sophisticated stuff
00:32:29.140 | that can be essentially mined for our technological purposes.
00:32:32.560 | That's the important thing.
00:32:34.220 | Whether we understand how this works is a different matter.
00:32:37.540 | I mean, it's like when we look at the natural world,
00:32:39.500 | the physical world, we're used to kind of mining things.
00:32:42.220 | You know, we started using magnets to do magnetic stuff
00:32:45.860 | long before we understood the theory
00:32:48.140 | of ferromagnetism and so on.
00:32:50.020 | And so similarly here, we can sort of go out
00:32:52.880 | into the computational universe
00:32:54.260 | and find stuff that's useful for our purposes.
00:32:57.260 | Now, in fact, the world of sort of deep learning
00:33:00.820 | and neural nets and so on is a little bit like this.
00:33:02.900 | It uses the trick that there's a certain degree
00:33:05.340 | of differentiability there,
00:33:06.600 | so you can kind of home in on let's try
00:33:09.380 | and find something that's incrementally better.
00:33:11.660 | And for certain kinds of problems, that works pretty well.
00:33:14.480 | I think the thing that we've done a lot, I've done a lot,
00:33:17.280 | is just sort of exhaustive search
00:33:19.380 | in the computational universe of possible programs.
00:33:21.420 | Just search a trillion programs and try and find one
00:33:24.140 | that does something interesting and useful for you.
00:33:27.340 | There's a lot of things to say about what,
00:33:30.220 | well, actually, in the search a trillion programs
00:33:32.340 | and find one that's useful,
00:33:33.860 | let me show you another example of that.
00:33:36.500 | Let's see.
00:33:37.340 | So I was interested a while ago in,
00:33:41.140 | I have to look something up here, sorry.
00:33:44.980 | Let me see here.
00:33:48.860 | In Boolean algebra,
00:33:54.620 | and I was interested in the space
00:33:57.880 | of all possible mathematics.
00:33:59.760 | And let me just see here.
00:34:05.100 | I'm not finding what I wanted to find, sorry.
00:34:08.700 | That was a good example.
00:34:11.420 | I should have memorized this, but I haven't.
00:34:14.980 | So here we go.
00:34:17.940 | There it is.
00:34:18.780 | So I was interested in if you just look at,
00:34:25.080 | so we talked about sort of looking at
00:34:27.220 | the space of all possible,
00:34:29.100 | the space of all possible programs.
00:34:34.100 | Another thing you can do is say,
00:34:35.420 | if you're gonna invent mathematics from nothing,
00:34:37.940 | what possible axiom systems could we use in mathematics?
00:34:40.540 | So I was curious, where do,
00:34:43.380 | and that, again, might seem like
00:34:45.020 | a completely crazy thing to do,
00:34:46.860 | to just say, let's just start enumerating axiom systems
00:34:49.540 | at random and see if we find one
00:34:50.860 | that's interesting and useful.
00:34:52.340 | But it turns out, once you have this idea
00:34:56.040 | that out in the computational universe of possible programs,
00:34:59.060 | there's actually a lot of low-hanging fruit to be found,
00:35:02.020 | it turns out you can apply that idea in lots of places.
00:35:04.220 | I mean, the thing to understand is,
00:35:05.500 | why do we not see a lot of engineering structures
00:35:08.220 | that look like this?
00:35:09.060 | The reason is because our traditional model
00:35:11.900 | of engineering has been, we engineer things
00:35:14.380 | in a way where we can foresee what the outcome
00:35:16.940 | of our engineering steps are going to be.
00:35:19.660 | And when it comes to something like this,
00:35:21.300 | we can find it out in the computational universe,
00:35:23.780 | but we can't readily foresee what's going to happen.
00:35:25.740 | We can't do sort of a step-by-step design
00:35:28.380 | of this particular thing.
00:35:29.580 | And so in engineering, in human engineering,
00:35:31.780 | as it's been practiced so far,
00:35:33.860 | most of it has consisted of building things
00:35:36.660 | where we can foresee step-by-step
00:35:38.820 | what the outcome of our engineering is going to be.
00:35:40.480 | And we see that in programs,
00:35:41.980 | we see that in other kinds of engineering structures.
00:35:45.460 | And so there's sort of a different kind of engineering,
00:35:47.260 | which is about mining the computational universe
00:35:49.780 | of possible programs.
00:35:50.860 | And it's worth realizing there's a lot more
00:35:53.500 | that can be done a lot more efficiently
00:35:55.300 | by mining the computational universe of possible programs
00:35:58.220 | than by just constructing things step-by-step as a human.
00:36:00.660 | So for example, if you look for optimal algorithms
00:36:03.060 | for things, like, I don't know,
00:36:04.420 | even something like sorting networks,
00:36:06.520 | the optimal sorting networks look very complicated.
00:36:09.700 | They're not things that you would construct
00:36:12.260 | by sort of step-by-step thinking about things
00:36:15.660 | with in a kind of typical human way.
00:36:19.980 | And so this idea,
00:36:23.020 | if you're really going to have computation work efficiently,
00:36:26.040 | you are going to end up with these programs
00:36:28.320 | that are sort of just mined from the computational universe.
00:36:31.520 | And one of the issues with mining things,
00:36:33.600 | so that this makes use of computation much more efficiently
00:36:37.900 | than a typical thing that we might construct.
00:36:40.240 | Now, one feature of this is
00:36:41.640 | it's hard to understand what's going on.
00:36:43.560 | And there's actually a fundamental reason for that,
00:36:45.500 | which is in our efforts to sort of understand
00:36:47.960 | what's going on, we get to use our brains,
00:36:50.400 | our computers, our mathematics, or whatever.
00:36:52.720 | And our goal is this particular little program
00:36:56.340 | did a certain amount of computation
00:36:57.560 | to work out this pattern.
00:36:58.960 | The question is, can we kind of outrun that computation
00:37:02.220 | and say, oh, I can tell that actually
00:37:04.720 | this particular bit down here is going to be a black bit.
00:37:08.920 | You don't have to go and do all that computation.
00:37:11.320 | But it turns out that, and again,
00:37:13.640 | this will maybe is a digression,
00:37:15.560 | which there's this phenomenon I call
00:37:17.840 | computational irreducibility,
00:37:19.660 | which I think is really common.
00:37:21.200 | And it's a consequence of this thing
00:37:22.360 | I call principle of computational equivalence.
00:37:24.640 | And that principle of computational equivalence
00:37:27.100 | basically says, as soon as you have a system
00:37:29.920 | whose behavior isn't fairly easy to analyze,
00:37:33.280 | the chances are that the computation it's doing
00:37:35.840 | is essentially as sophisticated as it could be.
00:37:38.500 | And that has consequences like it implies
00:37:40.600 | that the typical thing like this
00:37:42.800 | will correspond to a universal computer
00:37:45.080 | that you can use to program anything.
00:37:47.440 | It also has the consequence of this
00:37:48.880 | computational irreducibility phenomenon
00:37:50.940 | that says you can't expect our brains
00:37:53.880 | to be able to outrun the computations
00:37:55.640 | that are going on inside the system.
00:37:57.520 | If there were computational reducibility,
00:38:00.160 | then we could expect that even though this thing went to a lot of trouble
00:38:02.920 | and did a million steps of evolution,
00:38:05.100 | just by using our brains
00:38:06.800 | we could jump ahead and see what the answer would be.
00:38:09.540 | Computational irreducibility suggests that isn't the case.
00:38:12.180 | If we're going to make the most efficient use
00:38:14.240 | of computational resources,
00:38:16.020 | we will inevitably run into computational irreducibility
00:38:18.660 | all over the place.
00:38:19.880 | It has the consequence that we get the situation
00:38:22.760 | where we can't readily sort of foresee
00:38:24.600 | and understand what's going to happen.
00:38:26.280 | So back to mathematics for a second.
00:38:28.640 | So this is just an axiom system that,
00:38:32.640 | so I looked for all possible,
00:38:34.200 | looked through sort of all possible axiom systems
00:38:36.960 | starting off with really tiny ones.
00:38:38.880 | And I asked the question,
00:38:40.000 | what's the first axiom system
00:38:42.160 | that corresponds to Boolean algebra?
00:38:44.560 | So it turns out this thing here,
00:38:46.360 | this tiny little thing here,
00:38:48.600 | generates all theorems of Boolean algebra.
00:38:50.400 | It is the simplest axiom for Boolean algebra.
00:38:53.480 | Now, something I have to show you this
00:38:55.120 | 'cause it's a new feature you see.
00:38:56.820 | If I say, find equational proof,
00:39:01.380 | let's say I want to prove commutativity
00:39:03.800 | of the NAND operation.
00:39:05.200 | I'm gonna show you something here.
00:39:06.960 | This is going to try to generate,
00:39:09.200 | let's see if this works.
00:39:11.040 | This is going to try to generate an automated proof
00:39:14.400 | based on that axiom system of that result.
00:39:16.980 | So it had 102 steps in the proof.
00:39:19.560 | And let's try and say,
00:39:21.720 | let's look at, for example, the proof network here.
00:39:24.560 | Actually, let's look at the proof dataset.
00:39:27.440 | No, that's not what I wanted.
00:39:28.920 | Oh, I should learn how to use this, shouldn't I?
00:39:32.200 | (audience laughing)
00:39:35.200 | Let's see.
00:39:37.720 | What I want is the,
00:39:40.360 | yeah, proof dataset.
00:39:42.600 | There we go.
00:39:43.440 | Very good.
00:39:44.260 | Okay, so this is,
00:39:47.160 | actually, let's say,
00:39:49.320 | first of all, let's say the proof graph.
00:39:51.760 | Okay, so this is gonna show me how that proof was done.
00:39:55.600 | So there are a bunch of lemmas that got proved,
00:39:58.560 | and from those lemmas, those lemmas were combined,
00:40:00.920 | and eventually it proved the result.
00:40:02.920 | So let's take a look at what some of those lemmas were.
00:40:06.160 | Okay, so here's the result.
00:40:10.120 | So after, so it goes through,
00:40:12.200 | and these are various lemmas it's using,
00:40:14.080 | and eventually, after many pages of nonsense,
00:40:17.360 | it will get to the result.
00:40:18.880 | Okay, each one of these,
00:40:19.800 | some of these lemmas are kind of complicated there.
00:40:22.240 | That's that lemma.
00:40:23.520 | It's a pretty complicated lemma,
00:40:25.280 | et cetera, et cetera, et cetera.
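
A hedged sketch of the proof demo; FindEquationalProof is the function used, the axiom is the single NAND axiom for Boolean algebra as usually quoted (CenterDot standing in for Nand), and the property names are from memory:

    axiom = ForAll[{p, q, r},
      CenterDot[CenterDot[CenterDot[p, q], r], CenterDot[p, CenterDot[CenterDot[p, r], p]]] == r];
    proof = FindEquationalProof[CenterDot[a, b] == CenterDot[b, a], axiom];
    proof["ProofGraph"]      (* the lemma-dependency graph *)
    proof["ProofDataset"]    (* every lemma, step by step *)
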
00:40:26.200 | So you might ask, what on earth is going on here?
00:40:28.840 | And the answer is,
00:40:29.740 | so I first generated a version of this proof 20 years ago,
00:40:32.680 | and I tried to understand what was going on,
00:40:34.300 | and I completely failed.
00:40:36.000 | And it's sort of embarrassing
00:40:37.200 | because this is supposed to be a proof.
00:40:39.360 | It's supposed to be demonstrating some results,
00:40:42.500 | and what we realize is that,
00:40:44.720 | you know, what does it mean to have a proof of something?
00:40:47.160 | What does it mean to explain how a thing is done?
00:40:50.280 | You know, what is the purpose of a proof?
00:40:52.080 | Purpose of a proof is basically
00:40:53.520 | to let humans understand why something is true.
00:40:56.760 | And so, for example, if you go to,
00:40:58.740 | let's say we go to Wolfram Alpha,
00:41:01.680 | and we do, you know, some random thing where we say,
00:41:05.520 | let's do, you know, an integral of something or another,
00:41:08.620 | it will be able to very quickly,
00:41:10.660 | in fact, it will take it only milliseconds internally
00:41:13.120 | to work out the answer to that integral, okay?
00:41:15.660 | But then somebody who wants to hand in a piece of homework
00:41:18.320 | or something like that needs to explain why is this true.
00:41:22.480 | Okay, well, we have this handy
00:41:24.960 | step-by-step solution thing here,
00:41:28.280 | which explains why it's true.
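
For comparison, the raw computation itself is a one-liner in the language; the integrand here is just an arbitrary example, not the one typed in the talk:

    Integrate[x^2 Sin[x], x]
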
00:41:32.520 | Now, the thing I should admit
00:41:33.880 | about the step-by-step solution is it's completely fake.
00:41:37.200 | That is, the steps that are described
00:41:39.120 | in the step-by-step solution
00:41:40.280 | have absolutely nothing to do with the way
00:41:41.860 | that internally that integral was computed.
00:41:44.560 | These are steps created purely for the purpose
00:41:47.360 | of telling a story to humans
00:41:49.340 | about why this integral came out the way it did.
00:41:52.120 | And now what we're seeing,
00:41:53.480 | and so that's a, so there's one thing is knowing the answer,
00:41:56.040 | the other thing is being able to tell a story
00:41:57.720 | about why the answer worked that way.
00:41:59.760 | Well, what we see here is this is a proof,
00:42:02.920 | but it was an automatically generated proof,
00:42:05.280 | and it's a really lousy story for us humans.
00:42:07.980 | I mean, if it turned out that one of these theorems here
00:42:10.640 | was one that had been proved by Gauss or something
00:42:13.480 | and appeared in all the textbooks,
00:42:15.400 | we would be much happier because then we would start
00:42:17.440 | to have a kind of human representable story
00:42:20.540 | about what was going on.
00:42:21.440 | Instead, we just get a bunch of machine-generated lemmas
00:42:24.120 | that we can't understand,
00:42:25.040 | that we can't kind of wrap our brains around.
00:42:27.360 | And it's sort of the same thing that's going on
00:42:30.240 | in when we look at one of these neural nets.
00:42:32.920 | We're seeing, you know, when we were looking
00:42:34.480 | wherever it was at the innards of that neural net,
00:42:37.400 | and we say, well, how is it figuring out
00:42:39.320 | that that's a picture of a panda?
00:42:40.920 | Well, the answer is it decided that, you know,
00:42:43.840 | if we humans were saying, how would you figure out
00:42:45.960 | if it's a picture of a panda?
00:42:46.960 | We might say, well, look and see if it has eyes.
00:42:50.040 | That's a clue for whether it's an animal.
00:42:51.560 | Look and see if it looks like it's kind of round
00:42:53.840 | and furry and things.
00:42:55.340 | That's a version of whether it's a panda
00:42:57.240 | and et cetera, et cetera, et cetera.
00:42:59.220 | But what it's doing is it learned a bunch of criteria
00:43:02.240 | for, you know, is it a panda or is it one of 10,000
00:43:04.880 | other possible things that it could have recognized?
00:43:07.160 | And it learned those criteria in a way
00:43:09.640 | that was somehow optimal based on the training
00:43:12.400 | that it got and so on.
00:43:13.560 | But it learned things which were distinctions
00:43:15.440 | which are different from the distinctions
00:43:16.960 | that we humans make in the language that we as humans use.
00:43:21.200 | And so in some sense, you know, when we start talking about,
00:43:24.620 | well, describe a picture, we have a certain human language
00:43:27.560 | for describing that picture.
00:43:28.920 | We have, you know, in our human, in typical human languages,
00:43:32.020 | we have maybe 30 to 50,000 words
00:43:34.040 | that we use to describe things.
00:43:35.840 | Those words are words that have sort of evolved
00:43:38.480 | as being useful for describing the world that we live in.
00:43:42.400 | When it comes to this neural net, it could be using,
00:43:46.040 | it could say, well, the words that it has effectively
00:43:49.720 | learned which allow it to make distinctions
00:43:52.360 | about what's going on in the analysis that it's doing,
00:43:56.500 | it has effectively invented words
00:43:59.020 | that describe distinctions, but those words have nothing
00:44:02.080 | to do with our historically invented words
00:44:04.620 | that exist in our languages.
00:44:06.160 | So it's kind of an interesting situation
00:44:07.720 | that it has its own way of thinking, so to speak.
00:44:10.800 | If you say, well, what's it thinking about?
00:44:12.300 | How do we describe what it's thinking?
00:44:14.120 | That's a tough thing to answer,
00:44:15.940 | because just like with the automated theorem,
00:44:19.220 | we're sort of stuck having to say,
00:44:22.520 | well, we can't really tell a human story
00:44:25.440 | because the things that it invented are things
00:44:27.780 | for which we don't even have words
00:44:29.060 | in our languages and so on.
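A small Wolfram Language sketch of that gap between learned features and human words, assuming the standard built-in test images are available (any handful of images would do):

    (* A few built-in test images standing in for photos to classify. *)
    imgs = ExampleData /@ {{"TestImage", "Mandrill"}, {"TestImage", "House"},
        {"TestImage", "Peppers"}, {"TestImage", "Sailboat"}};

    (* The net's best human-readable guess for one image... *)
    ImageIdentify[First[imgs]]

    (* ...versus the learned feature space it actually works in: the layout
       is driven by internal distinctions for which we have no everyday words. *)
    FeatureSpacePlot[imgs]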
00:44:31.200 | Okay, so one thing to realize is in this kind of space
00:44:36.240 | of sort of all possible computations,
00:44:38.680 | there's a lot of stuff out there that can be done.
00:44:40.880 | There's this kind of ocean of sophisticated computation.
00:44:44.880 | And then the question that we have to ask for us humans
00:44:49.680 | is, okay, how do we make use of all of that stuff?
00:44:53.200 | So what we've got kind of on the one hand
00:44:55.440 | is we've got the things we know how to think about,
00:44:58.520 | human languages, our way of describing things,
00:45:01.480 | our way of talking about stuff,
00:45:02.880 | that's the one side of things.
00:45:04.800 | The other side of things we have is this very powerful
00:45:07.360 | kind of seething ocean of computation on the other side
00:45:10.240 | where lots of things can happen.
00:45:12.000 | So the question is, how do we make use
00:45:13.800 | of this sort of ocean of computation
00:45:15.720 | in the best possible way for our human purposes
00:45:18.380 | and building technology and so on?
00:45:20.400 | And so the way I see my kind of part of what I've spent
00:45:25.400 | a very long time doing is kind of building a language
00:45:29.920 | that allows us to take human thinking on the one hand
00:45:33.440 | and describe and sort of provide
00:45:37.000 | a sort of computational communication language
00:45:39.840 | that allows us to get the benefit of what's possible
00:45:42.560 | over in the sort of ocean of computation
00:45:44.760 | in a way that's rooted in what we humans
00:45:48.080 | actually want to do.
00:45:49.480 | And so I kind of view Wolfram Language
00:45:52.040 | as being sort of an attempt to make a bridge between,
00:45:55.280 | so on the one hand, there's all possible computations.
00:45:58.760 | On the other hand, there's things we think we want to do.
00:46:01.800 | And I view Wolfram Language as being my best attempt
00:46:05.880 | right now to make a way to take our sort of human
00:46:10.480 | computational thinking and be able to actually implement it.
00:46:14.680 | So in a sense, it's a language which works on two sides.
00:46:18.760 | It's both a language where you as the machine
00:46:23.760 | can understand, okay, it's looking at this
00:46:28.320 | and that's what it's going to compute.
00:46:30.080 | But on the other hand, it's also a language for us humans
00:46:33.280 | to think about things in computational terms.
00:46:35.400 | So if I go and I, I don't know,
00:46:37.680 | one of these things that I'm doing here,
00:46:39.960 | whatever it is, this wasn't that exciting,
00:46:42.480 | but find shortest tour of the geo position
00:46:46.840 | of the capital cities in South America.
00:46:49.160 | That is a language, that's a representation
00:46:51.760 | and a precise language of something.
00:46:54.120 | And the idea is that that's a language
00:46:56.440 | which we humans can find useful
00:46:59.320 | in thinking about things in computational terms.
00:47:02.080 | It also happens to be a language
00:47:03.560 | that the machine can immediately understand and execute.
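The query typed in the demo corresponds roughly to the following Wolfram Language code (a hedged reconstruction, since the exact session isn't reproduced in the transcript; entity and property names are the standard ones but may differ slightly between versions):

    (* Capital cities of the South American countries, as canonical entities. *)
    capitals = EntityValue[EntityClass["Country", "SouthAmerica"], "CapitalCity"];

    (* Their geo positions. *)
    positions = EntityValue[capitals, "Position"];

    (* Shortest tour through those positions, then a map of the route. *)
    {dist, order} = FindShortestTour[positions];
    GeoListPlot[capitals[[order]], Joined -> True]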
00:47:06.560 | And so I think this is sort of a general,
00:47:08.840 | when I think about AI in general,
00:47:10.920 | what is the sort of what's the overall problem?
00:47:14.640 | Well, part of the overall problem is,
00:47:16.000 | so how do we tell the AIs what to do, so to speak?
00:47:19.400 | There's this very powerful,
00:47:21.400 | this sort of ocean of computation is what we get to mine
00:47:24.640 | for purposes of building AI kinds of things.
00:47:27.520 | But then the question is, how do we tell the AIs what to do?
00:47:30.800 | And what I see, what I've tried to do with Wolfram Language
00:47:35.440 | is to provide a way of kind of accessing that computation
00:47:40.440 | and sort of making use of the knowledge
00:47:43.960 | that our civilization has accumulated.
00:47:46.500 | And because that's the, you know,
00:47:49.240 | there's the general computation on this side,
00:47:52.080 | and there's the specific things
00:47:53.640 | that we humans have thought about.
00:47:55.480 | And the question is to make use of the things
00:47:58.480 | that we've thought about to do things
00:48:01.120 | that we care about doing.
00:48:01.960 | Actually, if you're interested in these kinds of things,
00:48:04.080 | I happen to just write a blog post last couple of days ago.
00:48:09.080 | It's kind of a funny blog post.
00:48:10.560 | It's about, well, you can see the title there.
00:48:13.360 | It came because a friend of mine has this crazy project
00:48:16.640 | to put little sort of disks or something
00:48:22.080 | that should represent kind of the best achievements
00:48:25.600 | of human civilization, so to speak,
00:48:27.440 | to send out, hitchhiking on various spacecraft
00:48:31.040 | that are going out into the solar system
00:48:33.880 | in the next little while.
00:48:35.040 | And the question is what to put on this little disk
00:48:37.160 | that kind of represents, you know,
00:48:39.120 | the achievements of civilization.
00:48:40.920 | It's kind of depressing when you go back
00:48:43.400 | and you look at what people have tried to do on this before
00:48:47.280 | and realizing how hard it is to tell
00:48:49.640 | even whether something is an artifact or not.
00:48:52.760 | But this was sort of a, yeah, that's a good one.
00:48:56.040 | That's from 11,000 years ago.
00:48:57.720 | The question is can you figure out what on earth it is
00:49:01.080 | and what it means?
00:49:02.080 | But so what's relevant about this is this whole question
00:49:09.840 | of there are things that are out there
00:49:12.040 | in the computational universe.
00:49:14.000 | And when we think about extraterrestrial intelligence,
00:49:17.480 | I find it kind of interesting that artificial intelligence
00:49:21.480 | is our first example of an alien intelligence.
00:49:24.520 | We don't happen to have found what we view
00:49:26.720 | as extraterrestrial intelligence right now,
00:49:28.680 | but we are in the process of building
00:49:30.560 | a pretty decent version of an alien intelligence here.
00:49:33.640 | And the question is if you ask questions like,
00:49:36.760 | well, you know, what is it thinking?
00:49:39.160 | Does it have a purpose in what it's doing and so on?
00:49:41.800 | And you're confronted with things like this.
00:49:43.200 | It's very, you can kind of do a test run
00:49:46.480 | of what's its purpose?
00:49:49.200 | What is it trying to do in a way that is very similar
00:49:52.680 | to the kinds of questions you would ask
00:49:54.080 | about extraterrestrial intelligence?
00:49:56.660 | But in any case, the main point is that I see
00:50:01.660 | this sort of ocean of computation.
00:50:05.240 | There's the let's describe what we actually want to do
00:50:07.800 | with that ocean of computation.
00:50:09.640 | And that's where, that's one of the primary problems
00:50:11.760 | we have.
00:50:12.600 | Now people talk about AI and what is AI going
00:50:15.160 | to allow us to automate?
00:50:16.880 | And my basic answer to that would be,
00:50:19.440 | we'll be able to automate everything that we can describe.
00:50:22.760 | The problem is it's not clear what we can describe.
00:50:25.680 | Or put another way, you imagine various jobs
00:50:28.480 | and people are doing things,
00:50:29.480 | they're repeated judgment jobs, things like this.
00:50:32.180 | They're where we can readily automate those things.
00:50:34.720 | But the thing that we can't really automate is saying,
00:50:37.520 | well, what are we trying to do?
00:50:39.320 | That is what are our goals?
00:50:41.120 | Because in a sense, when we see one of these systems,
00:50:44.120 | let's say it's a cellular automaton here.
00:50:48.080 | The question is, what is this cellular automaton
00:50:50.040 | trying to do?
00:50:51.520 | Maybe I'll give you another cellular automaton
00:50:53.560 | that is a little bit more exciting here.
00:50:55.760 | Let's do this one.
00:50:56.840 | So the question is, what is this cellular automaton
00:51:01.280 | trying to do?
00:51:02.720 | It's got this whole big structure here
00:51:04.760 | and things are happening with it.
00:51:05.880 | We can go, we can run it for a couple of thousand steps.
00:51:08.400 | We can ask, it's a nice example of kind of undecidability
00:51:11.360 | in action, what's gonna happen here?
00:51:13.160 | This is kind of the halting problem.
00:51:14.680 | Is this gonna halt?
00:51:15.520 | What's it gonna do?
00:51:17.280 | There's computational irreducibility,
00:51:18.880 | so we actually can't tell.
00:51:20.360 | There's a case where we know this is a universal computer,
00:51:22.480 | in fact, eventually, well, I won't even spoil it for you.
00:51:26.480 | If I went on long enough, it would go into some kind
00:51:29.920 | of cycle, but we can ask, what is this thing trying to do?
00:51:34.920 | What is it, you know, is it, what's it thinking about?
00:51:37.440 | What's its, you know, what's its goal?
00:51:39.480 | What's its purpose?
00:51:41.000 | And, you know, we get very quickly in a big mess
00:51:43.960 | thinking about those kinds of things.
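The experiment being run on screen can be reproduced with a couple of lines of Wolfram Language; rule 30 is used here only as a stand-in, since the transcript doesn't record which rule was shown:

    (* Run an elementary cellular automaton from a single black cell for a couple
       of thousand steps and plot the result: a simple rule, complicated behavior. *)
    ArrayPlot[CellularAutomaton[30, {{1}, 0}, 2000]]

    (* Whether and when a given rule settles into a cycle is, in general,
       subject to computational irreducibility: you just have to run it and see. *)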
00:51:45.280 | I've, one of the things that comes out of this principle
00:51:47.960 | of computational equivalence is thinking about
00:51:51.520 | what kinds of things have, are capable
00:51:56.000 | of sophisticated computation.
00:51:57.760 | So I mentioned a while back here,
00:52:01.600 | sort of my personal history with Wolfram Alpha
00:52:03.760 | of having thought about doing something like Wolfram Alpha
00:52:05.880 | when I was a kid and then believing that you sort of had
00:52:08.240 | to build a brain to make that possible and so on.
00:52:10.920 | And one of the things that I then thought was
00:52:14.160 | that there was some kind of bright line
00:52:16.200 | between what is intelligent
00:52:19.080 | and what is merely computational, so to speak.
00:52:22.040 | In other words, that there was something which is like,
00:52:23.920 | oh, we've got this great thing that we humans have
00:52:26.320 | that, you know, is intelligence and all these things
00:52:28.840 | in nature and so on and all the stuff that's going on there,
00:52:31.800 | it's just computation or it's just, you know,
00:52:34.320 | things operating according to rules, that's different.
00:52:36.360 | There's some bright line distinction between these things.
00:52:39.040 | Well, I think the thing that came about
00:52:41.720 | after I'd looked at all these cellular automata
00:52:44.200 | and all kinds of other things like that
00:52:46.160 | is I sort of came up with this principle
00:52:48.920 | of computational equivalence idea,
00:52:51.720 | which we've now got quite a lot of evidence for,
00:52:53.600 | which I can talk about if people are interested,
00:52:56.120 | but that basically there isn't a,
00:53:00.080 | that once you reach a certain level
00:53:01.920 | of computational sophistication, everything is equivalent.
00:53:05.560 | And that means that, that implies
00:53:08.160 | that there really isn't a bright line distinction
00:53:10.320 | between, for example, the computations going on
00:53:12.160 | in our brains and the computations going on
00:53:14.400 | in the simple cellular automata and so on.
00:53:16.280 | And that essentially philosophical point
00:53:18.600 | is what actually got me to start trying to build
00:53:20.440 | Wolfram Alpha, because I realized that, gosh, you know,
00:53:23.480 | I'd been looking for this sort of,
00:53:25.080 | the magic bullets of intelligence,
00:53:27.000 | and I just decided probably there isn't one.
00:53:29.160 | And actually it's all just computation.
00:53:31.600 | And so that means we can actually in practice
00:53:33.440 | build something that does this kind of intelligent
00:53:35.920 | like thing, and so that's what I think is the case,
00:53:39.560 | is that there really isn't sort of a bright line distinction
00:53:41.960 | and that has more extreme consequences.
00:53:44.360 | Like people will say things like, you know,
00:53:46.160 | the weather has a mind of its own, okay?
00:53:48.840 | Sounds kind of silly, sounds kind of animistic,
00:53:51.040 | primitive and so on, but in fact, the, you know,
00:53:54.240 | fluid dynamics of the weather is as computationally
00:53:57.880 | sophisticated as the stuff that goes on in our brains.
00:54:01.680 | But we can start asking, but then you say,
00:54:03.880 | but the weather doesn't have a purpose.
00:54:05.920 | You know, what's the purpose of the weather?
00:54:07.200 | Well, you know, maybe the weather is trying to equalize
00:54:10.360 | the temperature between the, you know,
00:54:11.920 | the North Pole and the tropics or something.
00:54:15.600 | And then we have to say, well, but that's not a purpose
00:54:17.880 | in the way that we think about purposes.
00:54:19.560 | That's just, you know, and we get very confused.
00:54:22.240 | And in the end, what we realize is when we're talking
00:54:24.720 | about things like purposes, we have to have this kind
00:54:27.520 | of chain of provenance that goes back to humans
00:54:31.520 | and human history and all that kind of thing.
00:54:33.760 | And I think it's the same type of thing when we talk
00:54:35.760 | about computation and AI and so on.
00:54:37.960 | The thing that we, this question of sort of purpose,
00:54:41.720 | goals, things like this, that's the thing
00:54:44.200 | which is intrinsically human and not something
00:54:47.360 | that we can ever sort of automatically generate.
00:54:49.240 | It makes no sense to talk about automatically generating it
00:54:51.880 | because these computational systems,
00:54:53.480 | they do all kinds of stuff.
00:54:55.000 | You know, we can say they've got a purpose,
00:54:56.500 | we can attribute purposes to them, et cetera, et cetera,
00:54:58.560 | et cetera, but, you know, ultimately it's sort of
00:55:00.960 | a human thread of purpose that we have to deal with.
00:55:04.000 | So that means, for example, when we talk about AIs
00:55:06.720 | and we're interested in things like, so how do we tell,
00:55:09.720 | you know, like we'd like to be able to tell,
00:55:12.040 | we talk about AI ethics, for example.
00:55:14.400 | We'd like to be able to make a statement to the AIs
00:55:17.240 | like, you know, please be nice to us humans.
00:55:20.500 | And that's a, you know, that's something,
00:55:24.560 | so one of the issues there is,
00:55:26.360 | so talking about that kind of thing,
00:55:30.140 | one of the issues is how are we going to make a statement
00:55:32.640 | like be nice to us humans?
00:55:34.820 | What's the, you know, how are we going to explain that
00:55:37.640 | to an AI?
00:55:38.960 | And this is where, again, you know, my efforts
00:55:42.960 | to build a language, a computational communication language
00:55:46.540 | that bridges the world of what we humans think about
00:55:50.500 | and the world of what is possible in computation
00:55:52.740 | are important, and so one of the things
00:55:54.500 | I've been interested in is actually building
00:55:56.640 | what I call a symbolic discourse language
00:55:58.900 | that can be a general representation
00:56:01.200 | for sort of the kinds of things that we might want to put in,
00:56:05.700 | that we might want to say in things like be nice to humans.
00:56:11.360 | So sort of a little bit of background to that.
00:56:13.740 | So, you know, in the modern world,
00:56:15.480 | people are keen on smart contracts.
00:56:17.740 | They often think of them as being deeply tied
00:56:19.540 | into blockchain, which I don't think is really quite right.
00:56:22.020 | The important thing about smart contracts
00:56:24.240 | is it's a way of having sort of an agreement
00:56:27.740 | between parties which can be executed automatically,
00:56:31.200 | and that agreement may be, you know,
00:56:33.040 | you may choose to sort of anchor that agreement
00:56:36.600 | in a blockchain, you may not,
00:56:38.240 | but the whole point is you have to,
00:56:39.760 | what you, you know, when people write legal contracts,
00:56:42.560 | they write them in an approximation to English.
00:56:44.640 | They write them in legalese typically
00:56:46.600 | 'cause they're trying to write them
00:56:47.420 | in something a little bit more precise than regular English,
00:56:50.060 | but the limiting case of that is to make
00:56:53.220 | a symbolic discourse language in which you can write
00:56:56.200 | the contract in code basically.
00:56:58.560 | And I've been very interested in using Wolfram Language
00:57:01.980 | to do that because in Wolfram Language,
00:57:03.840 | we have a language which can describe things about the world
00:57:07.460 | and we can talk about the kinds of things
00:57:10.240 | that people actually talk about in contracts and so on.
00:57:13.000 | And we're most of the way there to being able to do that.
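As a flavor of what a contract term might look like once it is written in code, here is a purely illustrative sketch: the heads DeliveryObligation, Party, and Deadline are hypothetical, not built-in Wolfram Language functions; only Quantity, Entity, and DateObject are real. In Wolfram Language an undefined head simply stays as an inert symbolic expression, which is the point.

    (* Hypothetical symbolic rendering of "Party A delivers 100 kg of cocoa
       to Party B in Boston by March 1": structure rather than legalese. *)
    term = DeliveryObligation[
       Party["A"],
       Party["B"],
       Quantity[100, "Kilograms"],
       Entity["City", {"Boston", "Massachusetts", "UnitedStates"}],
       Deadline[DateObject[{2018, 3, 1}]]
      ];

    (* Being symbolic, such terms can be queried and transformed programmatically. *)
    Cases[term, _Quantity, Infinity]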
00:57:16.200 | And then when you start thinking about that,
00:57:19.760 | you start thinking about, okay,
00:57:21.160 | so we've got this language to describe things
00:57:24.300 | that we care about in the world.
00:57:26.700 | And so when it comes to things like tell the AIs
00:57:29.420 | to be nice to the humans,
00:57:31.100 | we can imagine using Wolfram Language
00:57:33.700 | to sort of build an AI constitution that says
00:57:36.220 | this is how the AI is supposed to work.
00:57:38.320 | But when we talk about sort of just the untethered,
00:57:42.020 | you know, the untethered AI doesn't have any particular,
00:57:45.260 | it's just gonna do what it does.
00:57:47.100 | And if we want it to, you know,
00:57:48.860 | if we want to somehow align it with human purposes,
00:57:51.600 | we have to have some way to sort of talk to the AI.
00:57:54.640 | And that's, you know, I view my efforts
00:57:58.800 | to build Wolfram Language as a way to do that.
00:58:01.160 | I mean, you know, as I was showing at the beginning,
00:58:03.880 | you can use, you can take natural language
00:58:07.880 | and with natural language,
00:58:09.400 | you can build up a certain amount of,
00:58:11.760 | you can say a certain number of things in natural language.
00:58:14.080 | You can then say, well, how do we make this more precise
00:58:16.600 | in a precise symbolic language?
00:58:18.480 | If you want to build up more complicated things,
00:58:20.900 | it gets hard to do that in natural language.
00:58:23.220 | And so you have to kind of build up more serious programs
00:58:26.260 | in symbolic language.
00:58:29.340 | And I've probably been yakking a while here
00:58:33.220 | and I'm happy to, I can talk about
00:58:35.820 | all kinds of different things here,
00:58:37.060 | but maybe I've not seen as many reactions
00:58:40.260 | as I might've expected, I think.
00:58:41.780 | So I'm not sure which things people are interested
00:58:44.140 | in and which they're not.
00:58:44.980 | But so maybe I should stop here
00:58:48.020 | and we can have discussion, questions, comments.
00:58:51.560 | (audience applauding)
00:58:54.120 | - Yes, two microphones if you have questions,
00:58:56.800 | please come up.
00:58:58.200 | - So I have a quick question.
00:58:59.800 | It goes to the earlier part of your talk
00:59:01.600 | where you say you don't build a top-down ontology,
00:59:04.280 | you actually build from the bottom up
00:59:06.040 | with disparate domains.
00:59:08.120 | What do you feel are the core technologies
00:59:10.120 | of the knowledge representation
00:59:11.520 | which you use within Wolfram Alpha
00:59:13.640 | that allows you, you know, different domains
00:59:16.060 | to reason about each other, to come up with solutions?
00:59:18.400 | And is there any feeling of differentiability,
00:59:21.220 | for example, so if you were to come up with a plan
00:59:24.320 | to do something new within Wolfram Alpha language,
00:59:28.020 | you know, how would you go about doing that?
00:59:30.340 | - Okay, so we've done maybe a couple of thousand domains.
00:59:34.920 | What is actually involved in doing one of these domains?
00:59:40.020 | It's a gnarly business.
00:59:42.560 | Every domain has some crazy different thing about it.
00:59:45.620 | I tried to make up actually a while ago,
00:59:47.860 | let me show you something,
00:59:50.660 | a kind of a hierarchy of what it means to make,
00:59:54.300 | see if I can find this here,
00:59:55.940 | kind of a hierarchy of what it means
00:59:57.520 | to make a domain computable.
00:59:59.440 | Where is it?
01:00:01.900 | There we go.
01:00:02.740 | Okay, here we go.
01:00:09.940 | So this is sort of a hierarchy of levels
01:00:11.860 | of what it means to make a domain computable
01:00:13.980 | from just, you know, you've got some array of data
01:00:18.900 | that's quite structured.
01:00:19.900 | Forget, you know, the separate issue
01:00:21.740 | about extracting things from unstructured data,
01:00:24.220 | but let's imagine that you were given,
01:00:26.060 | you know, a bunch of data about landing sites
01:00:30.340 | of meteorites or something, okay?
01:00:32.500 | So you go through various levels.
01:00:33.900 | So, you know, things like, okay,
01:00:36.460 | the landing sites of the meteorites,
01:00:37.780 | are the positions just strings,
01:00:40.580 | or are they some kind of canonical representation
01:00:42.500 | of geoposition?
01:00:43.860 | Is the, you know, is the type of meteorite,
01:00:46.580 | you know, some of them are iron meteorites,
01:00:48.220 | some of them are stone meteorites.
01:00:49.700 | Have you made a canonical representation?
01:00:52.260 | Have you made some kind of way to identify what--
01:00:57.260 | - Sorry, go ahead.
01:00:59.260 | - No, no, I mean, to do that, so--
01:01:01.060 | - So my question is like, you know,
01:01:02.020 | if you did have positions as a string
01:01:03.980 | as well as a canonical representation,
01:01:05.900 | do you have redundant pieces of the same,
01:01:08.460 | redundant representations of the same information
01:01:11.380 | in the different--
01:01:13.780 | - No, I mean, our goal--
01:01:15.380 | - Is everything canonical that you have?
01:01:16.820 | Do you have a minimal representation of everything?
01:01:18.780 | - Yeah, our goal is to make everything canonical.
01:01:21.300 | Now, that's, you know, there is a lot of complexity
01:01:24.620 | in doing that.
01:01:25.460 | I mean, if you, you know, in each,
01:01:27.300 | okay, so another feature of these domains.
01:01:29.660 | Okay, so here's another thing to say.
01:01:32.020 | You know, it would be lovely
01:01:35.020 | if one could just automate everything
01:01:36.380 | and cut the humans out of the loop.
01:01:38.380 | Turns out this doesn't work.
01:01:40.540 | And in fact, whenever we do these domains,
01:01:42.860 | it's fairly critical to have expert humans
01:01:45.420 | who really understand the domain
01:01:46.740 | or you simply get it wrong.
01:01:48.500 | And it's also, having said that,
01:01:51.100 | once you've done enough domains,
01:01:52.340 | you can do a lot of cross-checking between domains
01:01:54.700 | and we are the number one reporters
01:01:57.980 | of errors in pretty much all standardized data sources
01:02:01.940 | because we can do that kind of cross-checking.
01:02:04.460 | But I think, you know, if you ask the question,
01:02:07.140 | what's involved in bringing online a new domain,
01:02:12.460 | it's, you know, those sort of hierarchy of things,
01:02:15.860 | you know, some of those take a few hours.
01:02:18.020 | You can get to the point of having,
01:02:20.300 | you know, we've got good enough tools for ingesting data,
01:02:23.300 | figuring out, oh, those are names of cities in that column.
01:02:25.980 | Let's, you know, let's canonicalize those.
01:02:28.700 | You know, some may be questions,
01:02:29.940 | but many of them we'll be able to nail down.
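The ingest-and-canonicalize step being described corresponds to built-in machinery like Interpreter and SemanticImport; a minimal sketch (the file name is hypothetical):

    (* Turn a free-form string into a canonical entity, then into a geo position. *)
    city = Interpreter["City"]["cambridge ma"]
    GeoPosition[city]

    (* SemanticImport applies the same kind of interpretation column by column,
       guessing which columns are cities, dates, quantities and so on. *)
    meteorites = SemanticImport["meteorite-landings.csv"]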
01:02:32.860 | And to get to the full level
01:02:34.900 | of you've got some complicated domain
01:02:36.900 | and it's fully computable is probably a year of work.
01:02:41.460 | And you might say, well, gosh,
01:02:43.900 | why are you wasting your time?
01:02:45.300 | You've got to be able to automate that.
01:02:46.700 | So you can probably tell we're fairly sophisticated
01:02:48.780 | about machine learning kinds of things and so on.
01:02:50.900 | And we have tried, you know, to automate as much as we can.
01:02:54.660 | And we have got a pretty efficient pipeline,
01:02:57.140 | but if you actually want to get it right,
01:02:59.100 | and you see, here's an example of what happens.
01:03:02.060 | There's a level, even going between Wolfram Alpha
01:03:04.620 | and Wolfram Language, there's a level of,
01:03:07.140 | so for example, let's say you're looking at,
01:03:09.660 | you know, lakes in Wisconsin, okay?
01:03:12.500 | So people are querying about lakes in Wisconsin
01:03:14.660 | and Wolfram Alpha, they'll name a particular lake
01:03:17.420 | and they want to know, you know, how big is the lake?
01:03:20.100 | Okay, fine.
01:03:21.220 | In Wolfram Language, they'll be doing
01:03:23.020 | a systematic computation about lakes in Wisconsin.
01:03:25.980 | So if there's a lake missing,
01:03:27.740 | you're gonna get the wrong answer.
01:03:29.380 | And so that's a kind of higher level of difficulty.
01:03:33.700 | - Okay.
01:03:34.540 | - But there's, yeah, I think you're asking
01:03:37.140 | some more technical questions about ontologies
01:03:38.900 | and I can try and answer those.
01:03:40.260 | - Actually, one quick question.
01:03:41.660 | Can you-- - Wait, wait, wait, wait, wait.
01:03:43.140 | No, there's a lot of other questions.
01:03:45.060 | - Yeah, that's fine. - Okay.
01:03:45.900 | - Thank you very much, that was a great question.
01:03:47.580 | - We'll recycle this.
01:03:48.620 | - To the left here, please.
01:03:50.180 | - I've got a simple question.
01:03:51.700 | Who or what are your key influences?
01:03:54.260 | - Oh gosh.
01:03:56.700 | In terms of language design for Wolfram Language,
01:04:00.820 | for example-- - So in the context
01:04:01.820 | of machine intelligence, if you like,
01:04:03.140 | if you want to make it tailored to this audience.
01:04:08.180 | - I don't know, I've been absorbing stuff forever.
01:04:10.660 | I think my main, in terms of language design,
01:04:14.220 | probably Lisp and APL were my sort of early influences.
01:04:19.220 | But in terms of thinking about AI, hmm.
01:04:24.140 | You know, in, I mean, I'm kind of quite knowledgeable.
01:04:30.820 | I like history of science.
01:04:33.860 | I'm pretty knowledgeable about the early history
01:04:36.540 | of kind of mathematical logic, symbolic kinds of things.
01:04:40.020 | I would say, okay, maybe I can answer that in the negative.
01:04:43.540 | I have, for example, in building Wolfram Alpha,
01:04:46.180 | I thought, gosh, let me do my homework,
01:04:49.620 | let me learn all about computational linguistics,
01:04:51.420 | let me hire some computational linguistics PhDs.
01:04:54.220 | That will be a good way to get this started.
01:04:56.260 | Turns out, we used almost nothing
01:04:58.700 | from the previous sort of history
01:05:01.220 | of computational linguistics,
01:05:02.700 | partly because what we were trying to do,
01:05:04.100 | namely short question natural language understanding,
01:05:07.100 | is different from a lot of the natural language processing,
01:05:09.540 | which has been done in the past.
01:05:11.460 | I also have made, to my disappointment,
01:05:14.780 | very little use of, you know, people like Marvin Minsky,
01:05:18.900 | for example, I really don't think,
01:05:21.020 | I mean, I knew Marvin for years,
01:05:22.540 | and in fact, some of his early work
01:05:24.380 | on simple Turing machines and things,
01:05:26.660 | those are probably more influential to me
01:05:29.100 | than his work on AI.
01:05:31.380 | And, you know, probably my mistake
01:05:34.340 | of not understanding that better,
01:05:36.100 | but really, I would say that I've been rather uninfluenced
01:05:39.340 | by sort of the traditional AI kinds of things.
01:05:42.780 | I mean, it probably hasn't helped
01:05:43.900 | that I've kind of lived through a time
01:05:45.580 | when sort of AI went from, you know,
01:05:48.380 | when I was a kid, AI was gonna solve everything in the world
01:05:50.820 | and then, you know, it kind of decayed for a while
01:05:53.180 | and then sort of come back.
01:05:54.540 | So I would say that I can describe my negative,
01:05:57.660 | my non-influences, better than my influences.
01:05:59.700 | - The impression you give is that you made it up
01:06:01.060 | out of your own head,
01:06:01.900 | and it sounds as though that's pretty much right.
01:06:04.620 | - Yeah, I mean, yes.
01:06:05.940 | I mean, insofar as there's things to,
01:06:08.500 | I mean, look, things like the, you know,
01:06:13.220 | okay, so for example, studying simple programs
01:06:16.060 | and trying to understand the universe of simple programs,
01:06:19.060 | actually, the personal history of that
01:06:20.860 | is sort of interesting.
01:06:21.700 | I mean, I, you know, I used to do particle physics
01:06:26.060 | when I was a kid, basically,
01:06:27.860 | and then I actually got interested,
01:06:31.540 | okay, so I'll tell you the history of that,
01:06:32.820 | just as an example of how sort of interesting
01:06:34.820 | as a sort of history of ideas type thing.
01:06:36.980 | So I was interested in how order arises in the universe.
01:06:40.860 | So, you know, you start off from the hot Big Bang
01:06:43.340 | and then pretty soon you end up with a bunch of humans
01:06:46.020 | and galaxies and things like this.
01:06:47.300 | How does this happen?
01:06:48.820 | So I got interested in that question.
01:06:50.580 | I was also interested in things like neural networks
01:06:54.340 | for sort of AI purposes,
01:06:56.300 | and I thought, let me make a minimal model
01:06:59.460 | that encompasses sort of how complex things arise
01:07:02.780 | from other stuff,
01:07:04.860 | and I ended up sort of making simpler and simpler
01:07:07.940 | and simpler models and eventually wound up
01:07:09.580 | with cellular automata,
01:07:11.060 | and which I didn't know were called cellular automata
01:07:13.060 | when I started looking at them
01:07:14.660 | and then found they did interesting things,
01:07:16.660 | and the two areas where cellular automata
01:07:18.500 | have been singularly unuseful in analyzing things
01:07:22.820 | are large scale structure in the universe
01:07:25.580 | and neural networks.
01:07:26.980 | So it turned out, but that, by the way,
01:07:30.140 | the fact that I kind of even imagined
01:07:32.020 | that one could just start, yeah, I should say,
01:07:34.540 | you know, I've been doing physics,
01:07:36.340 | and in physics, the kind of intellectual concept
01:07:39.460 | is you take the world as it is
01:07:41.220 | and you try and drill down and find out what,
01:07:43.460 | you know, what makes the world out of primitives and so on.
01:07:46.100 | It's kind of a, you know, reduced to find things.
01:07:48.940 | Then I built my first computer language,
01:07:51.180 | I think called SMP, which went the other way around,
01:07:53.860 | where I was just like, I'm just gonna make up
01:07:55.340 | this computer language and, you know,
01:07:57.900 | just make up what I want the primitives to be
01:07:59.700 | and then I'm gonna build stuff up from it.
01:08:01.500 | I think that the fact that I kind of had the idea
01:08:04.140 | of doing things like making up cellular automata
01:08:06.580 | as possible models for the world
01:08:08.580 | was a consequence of the fact that I worked
01:08:09.940 | on this computer language, which was a thing
01:08:11.940 | which worked the opposite way around
01:08:13.500 | from the way that one is used to doing natural science,
01:08:16.060 | which is sort of this reductionist approach.
01:08:18.420 | And that's, I mean, so that's just an example
01:08:21.020 | of, you know, I found, I happen to have spent
01:08:25.020 | a bunch of time studying, as I say, history of science.
01:08:28.180 | And one of my hobbies is sort of history of ideas.
01:08:31.660 | I even wrote this little book called Idea Makers,
01:08:33.940 | which is about biographies of a bunch of people
01:08:36.220 | who for one reason or another I've written about.
01:08:38.220 | And so I'm always curious about this thing
01:08:40.180 | about how do people actually wind up figuring out
01:08:42.260 | the things they figure out.
01:08:43.780 | And, you know, one of the conclusions of my,
01:08:47.660 | you know, investigations of many people
01:08:49.460 | is there are very rarely moments of inspiration.
01:08:53.420 | Usually it's long, multi-decade kinds of things,
01:08:56.900 | which only later get compressed into something short.
01:09:00.220 | And also the path is often much, you know,
01:09:05.220 | it's quite, what can I say, that the steps are quite small,
01:09:10.260 | and, you know, but the path is often kind of complicated.
01:09:14.260 | And that's what it's been for me.
01:09:15.820 | So I-
01:09:16.660 | - Simple question, complex answer.
01:09:17.780 | - Sorry about that.
01:09:18.620 | (laughing)
01:09:19.460 | - Go ahead, please.
01:09:20.780 | - Hello.
01:09:21.620 | So what I basically see from the Wolfram language
01:09:24.660 | is it's a way to describe all of objective reality.
01:09:27.300 | It's kind of formalizing just about the entire domain
01:09:30.580 | of discourse, to use a philosophical term.
01:09:32.300 | And you kind of hinted at this in your lecture
01:09:34.940 | where it sort of leaves off,
01:09:36.500 | is that when we start to talk about
01:09:37.820 | more esoteric philosophical concepts, purpose,
01:09:41.420 | I guess this would lead into things like epistemology,
01:09:44.020 | because essentially you only have science there.
01:09:45.620 | And as amazing as science is,
01:09:46.920 | there are other things that are talked about,
01:09:48.440 | not, you know, like idealism versus materialism, et cetera.
01:09:52.700 | Do you have an idea of how Wolfram might or might not
01:09:56.900 | be able to branch into those discourses?
01:09:58.540 | Because I'm hearing echoes in my head of that time
01:10:01.020 | Bostrom said that an AI needs a, you know,
01:10:03.560 | when you give an AI a purpose, there's like,
01:10:05.440 | I think he said philosophers are divided completely evenly
01:10:08.500 | between the top four ways to measure
01:10:10.060 | how good something should be.
01:10:11.120 | It's like utilitarianism and-
01:10:12.900 | - Sure.
01:10:13.740 | - Do you have the four minus Japanese?
01:10:14.840 | - Yeah, right.
01:10:15.680 | So the first thing is, I mean,
01:10:16.580 | this problem of making what, okay,
01:10:19.720 | about 300 years ago, people like Leibniz
01:10:21.960 | were interested in the same problem that I'm interested in,
01:10:23.980 | which is how do you formalize sort of everyday discourse?
01:10:27.440 | And Leibniz had the original idea, you know,
01:10:29.400 | he was originally trained as a lawyer,
01:10:31.260 | and he had this idea, if he could only reduce all law,
01:10:34.520 | all legal questions to matters of logic,
01:10:37.240 | he could have a machine that would basically describe,
01:10:39.320 | you know, answer every legal case, right?
01:10:43.040 | He was unfortunately a few hundred years too early,
01:10:46.260 | even though he did have, you know, he tried to,
01:10:48.320 | he tried to do all kinds of things,
01:10:49.440 | very similar to things I've tried to do,
01:10:51.200 | like he tried to get various dukes
01:10:53.000 | to assemble big libraries of data and stuff like this,
01:10:56.480 | but the point, so what he tried to do
01:10:59.800 | was to make a formalized representation of everyday discourse
01:11:04.400 | for whatever reason, for the last 300 years,
01:11:06.480 | basically people haven't tried to do that.
01:11:08.560 | There's, it's an almost completely barren landscape.
01:11:11.700 | There was this period of time in the 1600s
01:11:14.480 | when people talked about philosophical languages.
01:11:17.320 | Leibniz was one, a guy called John Wilkins was another,
01:11:20.600 | and they tried to, you know, break down human thought
01:11:23.880 | into something symbolic.
01:11:25.760 | People haven't done that for a long time.
01:11:28.560 | In terms of what can we do that with, you know,
01:11:32.100 | I've been trying to figure out
01:11:33.720 | what the best way to do it is.
01:11:34.840 | I think it's actually not as hard as one might think.
01:11:37.720 | These areas, one thing you have to understand,
01:11:39.720 | these areas like philosophy and so on,
01:11:41.960 | are, they're on the harder end.
01:11:43.800 | I mean, things like, a good example, typical example,
01:11:46.800 | you know, I want to have a piece of chocolate, okay?
01:11:49.960 | The, in Wolfram language right now,
01:11:51.860 | we have a pretty good description of pieces of chocolate.
01:11:54.560 | We know all sorts of, you know,
01:11:55.980 | we probably know 100 different kinds of chocolate.
01:11:58.360 | We know how big the pieces are, all that kind of thing.
01:12:01.380 | The "I want" part of that sentence,
01:12:03.180 | we can't do that right now,
01:12:05.260 | but I don't think that's that hard, and I'm, you know,
01:12:08.160 | that's, now if you ask, let's say we had,
01:12:12.240 | I think the different thing you're saying is,
01:12:14.360 | let's say we had the omnipotent AI, so to speak,
01:12:17.720 | that was able to, you know,
01:12:19.200 | where we turn over the control of the central bank
01:12:21.520 | to the AI, we turn over all these other things to the AI.
01:12:24.640 | Then the question is, we say to the AI,
01:12:26.840 | now do the right thing.
01:12:28.800 | And then the problem with that is,
01:12:31.600 | and this is why I talk about, you know,
01:12:33.060 | creating AI constitutions and so on,
01:12:36.000 | we have absolutely no idea
01:12:38.240 | what "do the right thing" is supposed to mean.
01:12:39.760 | And philosophers have been arguing about that,
01:12:41.520 | you know, utilitarianism is an example of that,
01:12:44.280 | of one of the answers to that,
01:12:46.280 | although it's not a complete answer by any means,
01:12:48.280 | it's not really an answer,
01:12:49.520 | it's just a way of posing the question.
01:12:51.960 | And so I think that the, you know,
01:12:53.760 | one of the features of,
01:12:56.120 | so I think it's a really hard problem to, you know,
01:13:00.240 | you think to yourself,
01:13:01.120 | what should the AI constitution actually say?
01:13:03.400 | So first thing you might think is,
01:13:05.240 | oh, there's going to be, you know,
01:13:06.480 | something like Asimov's laws of robotics.
01:13:08.200 | There's going to be one, you know, golden rule for AIs.
01:13:11.960 | And if we just follow that golden rule, all will be well.
01:13:15.320 | Okay, I think that that is absolutely impossible.
01:13:18.880 | And in fact, I think you can even sort of mathematically
01:13:20.760 | prove that that's impossible.
01:13:22.960 | Because I think as soon as you have a system that,
01:13:26.280 | you know, essentially what you're trying to do
01:13:27.800 | is you're trying to put in constraints that,
01:13:31.840 | okay, basically, as soon as you have a system
01:13:33.680 | that shows computational irreducibility,
01:13:35.840 | I think it is inevitable that you have
01:13:38.600 | unintended consequences of things,
01:13:41.920 | which means that you never get to just say,
01:13:45.000 | put everything in this one very nice box.
01:13:47.700 | You always have to say, let's put in a patch here,
01:13:49.680 | let's put in a patch there, and so on.
01:13:51.560 | A version of this, a much more abstract version of this,
01:13:54.040 | is Godel's theorem.
01:13:55.300 | So Godel's theorem is, you know,
01:13:57.760 | it starts off by taking the, you know,
01:14:01.120 | Godel's theorem is trying to talk about integers.
01:14:04.400 | It says, start off with Peano's axioms.
01:14:07.280 | Peano's axioms, you might say, as Peano thought,
01:14:10.400 | describe the integers and nothing but the integers.
01:14:13.240 | Okay, so anything that's provable from Peano's axioms
01:14:17.120 | will be true about integers and vice versa, okay?
01:14:19.760 | What Godel's theorem shows is that that will never work,
01:14:22.840 | that there are an infinite hierarchy of patches
01:14:25.520 | that you have to put on to Peano's axioms
01:14:27.800 | if you want to describe the integers
01:14:29.400 | and nothing but the integers.
01:14:30.920 | And I think the same is true if you want to have
01:14:32.980 | a legal system effectively
01:14:34.880 | that has no bizarre unintended consequences.
01:14:37.720 | So I don't think it's possible to just say, you know,
01:14:40.200 | if you, when you're describing something in the world
01:14:43.040 | that's complicated like that,
01:14:43.920 | I don't think it's possible to just have
01:14:45.840 | a small set of rules that will always do what we want,
01:14:49.840 | so to speak.
01:14:50.680 | I think it's inevitable that you have to have a long,
01:14:52.640 | essentially, code of laws, and that's what, you know,
01:14:55.960 | so my guess is that what will actually have to happen is,
01:14:58.680 | you know, as we try and describe
01:14:59.920 | what should we want the AIs to do,
01:15:02.440 | you know, I don't know the sociopolitical aspects
01:15:04.580 | of how we'll figure out whether it's one AI constitution
01:15:09.220 | or one per, you know, city or whatever.
01:15:13.860 | We can talk about that, that's a separate issue,
01:15:15.540 | but, you know, I think what will happen is
01:15:18.300 | it'll be much like human laws.
01:15:19.640 | It'll be a complicated thing
01:15:20.780 | that gets progressively patched.
01:15:22.500 | And so I think it's some, and these ideas like,
01:15:25.560 | you know, oh, we'll just make the AIs, you know,
01:15:29.160 | run the world according to, you know, Mill's,
01:15:32.780 | you know, John Stuart Mill's idea, it's not gonna work.
01:15:36.500 | Which is not surprising, 'cause philosophy
01:15:39.260 | has made the point that it's not an easy problem
01:15:42.740 | for the last 2,000 years, and they're right.
01:15:45.300 | It's not an easy problem.
01:15:47.140 | - Thank you. - Yeah.
01:15:48.140 | - Hi, you're talking about computational irreducibility
01:15:53.460 | and computational equivalence, and also that earlier on
01:15:57.100 | in your intellectual adventures,
01:15:59.320 | you're interested in particle physics and things like that.
01:16:02.640 | I've heard you make the comment before in other contexts
01:16:06.560 | that things like molecules compute,
01:16:10.660 | and I was curious to ask you exactly what you mean by that,
01:16:13.420 | in what sense does a molecule--
01:16:16.380 | - I mean, what would you like to compute, so to speak?
01:16:21.380 | I mean, in other words, what is the case is that,
01:16:26.880 | you know, one definition of your computing
01:16:29.020 | is given a particular computation, like, I don't know,
01:16:31.460 | finding square roots or something, you know,
01:16:34.020 | you can program a, you know, the surprising thing
01:16:38.500 | is that an awful lot of stuff can be programmed
01:16:41.680 | to do any computation you want.
01:16:43.900 | That's some, and, you know, when it comes to,
01:16:46.780 | I mean, I think, for example, when you look
01:16:48.020 | at nanotechnology and so on, the current,
01:16:52.180 | you know, one of the current beliefs is
01:16:54.300 | to make very small computers, you should take
01:16:57.360 | what we know about making big computers
01:16:59.740 | and just, you know, make them smaller, so to speak.
01:17:03.120 | I don't think that's the approach you have to use.
01:17:05.340 | I think you can take the components that exist
01:17:07.740 | at the level of molecules and say,
01:17:09.980 | how do we assemble those components
01:17:12.680 | to be able to do complicated computations?
01:17:14.460 | I mean, it's like the cellular automata,
01:17:16.220 | that the, you know, the underlying rule
01:17:19.820 | for the cellular automaton is very simple,
01:17:21.900 | yet when that rule is applied many times,
01:17:24.780 | it can do a sophisticated computation.
01:17:26.880 | So I think that that's the sense in which,
01:17:29.500 | what can I say, the raw material that you need
01:17:33.100 | for computation can be, you know,
01:17:36.260 | there's a great diversity in the raw material
01:17:38.140 | that you can use for computation.
01:17:39.340 | Our particular human development, you know,
01:17:41.860 | stack of technologies that we use for computation right now
01:17:46.060 | is just one particular path, and we can, you know,
01:17:48.660 | so a very practical example of this is algorithmic drugs.
01:17:52.020 | So the question is, right now, drugs pretty much work by,
01:17:54.980 | most drugs work by, you know, there is a binding site
01:17:57.660 | on a molecule, drug fits into binding site, does something.
01:18:00.820 | Question is, can you imagine having something
01:18:03.480 | where the molecule, you know, is something
01:18:05.600 | which has computations going on in it,
01:18:08.060 | where it goes around and it looks at that, you know,
01:18:11.020 | that thing it's supposed to be binding to,
01:18:12.540 | and it figures out, oh, there's this knob here
01:18:14.540 | and that knob there, it reconfigures itself,
01:18:17.260 | it's computing something, it's trying to figure out,
01:18:19.980 | you know, is this likely to be a tumor cell or whatever,
01:18:22.500 | based on some more complicated thing.
01:18:24.580 | That's the type of thing that I mean
01:18:26.060 | by computations happening at molecular scale.
01:18:28.660 | - Okay, I guess I meant to ask, if it follows from that,
01:18:32.140 | if, in your view, like the molecules in the chalkboard
01:18:36.700 | and in my face and in the table are, in any sense,
01:18:39.620 | currently doing computing.
01:18:40.980 | - Sure, I mean, the question of what computation,
01:18:42.980 | look, one of the things to realize,
01:18:44.580 | if you look at kind of the sort of past and future of things,
01:18:48.100 | okay, so here's an observation, actually,
01:18:51.940 | it was about Leibniz, actually.
01:18:54.100 | In Leibniz's time, Leibniz made a calculator-type computer
01:18:58.660 | out of brass, took him 30 years, okay?
01:19:01.620 | So in his day, there was, you know,
01:19:03.780 | at most one computer in the world,
01:19:05.580 | as far as he was concerned, right?
01:19:07.300 | Today's world, there may be 10 billion computers,
01:19:09.260 | maybe 20 billion computers, I don't know.
01:19:11.780 | The question is, what's that gonna look like in the future?
01:19:14.220 | And I think the answer is that, in time,
01:19:17.260 | probably everything we have will be made of computers,
01:19:20.740 | in the following sense, that basically, it won't be,
01:19:23.340 | you know, in today's world, things are made of, you know,
01:19:25.780 | metal, plastic, whatever else,
01:19:27.740 | but actually, that won't make it,
01:19:29.380 | there won't be any point in doing that.
01:19:30.860 | Once we know how to do, you know,
01:19:32.660 | molecular-scale manufacturing and so on,
01:19:35.140 | we might as well just make everything
01:19:36.500 | out of programmable stuff.
01:19:38.420 | And I think that's a sense in which, you know,
01:19:42.540 | and, you know, the one example we have
01:19:44.100 | in molecular computing right now is us, in biology.
01:19:47.660 | You know, biology does a reasonable job
01:19:49.820 | of specific kinds of molecular computing.
01:19:52.380 | It's kind of embarrassing, I think,
01:19:53.740 | that the only, you know, molecule we know
01:19:56.060 | that's sort of a memory molecule is DNA,
01:19:58.420 | that's kind of, you know, which is kind of the, you know,
01:20:00.780 | the particular biological solution.
01:20:02.740 | In time, we'll know lots of others.
01:20:04.500 | And, you know, I think the, sort of the end point is,
01:20:09.100 | so if you're asking, is, you know,
01:20:11.060 | is computation going on in, you know,
01:20:13.220 | in this water bottle, the answer is absolutely.
01:20:15.660 | It's probably even many aspects of that computation
01:20:17.980 | are pretty sophisticated.
01:20:18.820 | If we wanted to know what would happen
01:20:20.540 | to particular molecules here, it's gonna be hard to tell.
01:20:23.120 | There's going to be computational irreducibility and so on.
01:20:25.740 | Can we make use of that for our human purposes?
01:20:28.660 | Can we piggyback on that to achieve something technological?
01:20:31.660 | That's a different issue.
01:20:32.980 | And that's, for that, we have to build up
01:20:34.860 | this whole, sort of, chain of technology
01:20:37.140 | to be able to connect it, which is what I've kind of been,
01:20:40.420 | been keep on talking about is, how do we connect,
01:20:43.740 | sort of, what is possible computationally in the universe
01:20:46.740 | to what we humans can kind of conceptualize
01:20:49.700 | that we want to do in computation?
01:20:51.060 | And that's, you know, that's the bridge that we have to make
01:20:53.220 | and that's the hard part.
01:20:54.220 | But getting, the intrinsic getting the computation done is,
01:20:57.260 | is, you know, there's computation going on
01:21:00.100 | all over the place.
01:21:01.100 | - Maybe a couple more questions.
01:21:04.580 | I was hoping you could elaborate on
01:21:07.020 | what you were talking about earlier of, like,
01:21:09.140 | searching the entire space of possible programs.
01:21:12.440 | So that's very broad.
01:21:14.780 | So maybe, like, what kind of searching of that space
01:21:18.740 | we're good at and, like, what we're not
01:21:20.380 | and I guess what the differences are.
01:21:21.220 | - Yeah, right, so, I mean, I would say that we're
01:21:23.260 | at an early stage in knowing how to do that, okay?
01:21:26.300 | So I've done lots of these things and they are,
01:21:29.900 | the thing that I've noticed is,
01:21:32.180 | if you do an exhaustive search,
01:21:34.220 | then you don't miss even things
01:21:35.820 | that you weren't looking for.
01:21:37.740 | If you do a non-exhaustive search,
01:21:39.700 | there is a tremendous tendency to miss things
01:21:42.620 | that you weren't looking for.
01:21:44.860 | And so, you know, we've done searches;
01:21:48.500 | a bunch of function evaluation in Wolfram Language
01:21:51.060 | was done by searching for optimal approximations
01:21:54.860 | in some big space.
01:21:56.380 | A bunch of stuff with hashing is done that way.
01:21:59.420 | Bunch of image processing is done that way.
01:22:01.540 | Where we're just sort of searching this, you know,
01:22:03.560 | doing exhaustive searches in maybe trillions of programs
01:22:06.540 | to find things.
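That kind of exhaustive search can be tried at toy scale in a line or two of Wolfram Language: enumerate all 256 elementary cellular automaton rules and look at every one, so nothing unexpected gets filtered out in advance:

    (* A tiny but genuinely exhaustive sweep of a space of programs: every
       elementary rule, run for 60 steps from a single black cell. *)
    GraphicsGrid[
     Partition[
      Table[ArrayPlot[CellularAutomaton[r, {{1}, 0}, 60], ImageSize -> 50],
       {r, 0, 255}],
      16]]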
01:22:07.620 | Now, you know, there is, on the other side of that story
01:22:11.180 | is the incremental improvements story
01:22:13.780 | with deep learning and neural networks and so on,
01:22:17.420 | where because there is differentiability,
01:22:20.200 | you're able to sort of incrementally
01:22:22.300 | get to a better solution.
01:22:23.900 | Now, in fact, people are making less and less
01:22:26.580 | differentiability in deep learning neural nets.
01:22:29.500 | And so, I think eventually there's going to be
01:22:31.740 | sort of a grand unification of these kinds of approaches.
01:22:36.420 | Right now, we're still, you know, I don't really know
01:22:39.460 | what the, you know, the exhaustive search side of things,
01:22:42.680 | which you can use for all sorts of purposes.
01:22:44.820 | I mean, the reason, the surprising thing
01:22:47.120 | that makes exhaustive search not crazy
01:22:49.420 | is that there is rich, sophisticated stuff
01:22:53.300 | near at hand in the computational universe.
01:22:55.180 | If you had to go, you know, quadrillions, you know,
01:22:58.200 | through a quadrillion cases
01:22:59.540 | before you ever found anything,
01:23:01.240 | exhaustive search would be hopeless.
01:23:03.160 | But you don't in many cases.
01:23:05.480 | And, you know, I would say that we are
01:23:07.400 | in a fairly primitive stage of the science
01:23:10.180 | of how to do those searches.
01:23:11.220 | Well, my guess is that there'll be
01:23:13.620 | some sort of unification, which needless to say,
01:23:15.980 | I've thought a bunch about,
01:23:17.100 | and between kind of the neural net.
01:23:20.160 | So, you know, the trade-off typically in neural nets
01:23:22.420 | is you can have a neural net that is very good at,
01:23:25.820 | that is, you know, uses its computational resources well,
01:23:29.200 | but it's really hard to train,
01:23:30.900 | or you can have a neural net
01:23:31.900 | that doesn't use its computational resources so well,
01:23:34.360 | but it's very easy to train,
01:23:35.680 | because it's very, you know, smooth.
01:23:38.520 | And, you know, my guess is that somewhere in the,
01:23:41.260 | you know, harder to train,
01:23:43.100 | but makes use of things that are closer
01:23:45.520 | to the complete computational universe
01:23:47.960 | is where one's going to see progress.
01:23:50.340 | But it's a really interesting area,
01:23:52.860 | and, you know, I consider us only at the beginning
01:23:56.600 | of figuring that out.
01:23:57.700 | - One last question.
01:23:59.940 | - Hi.
01:24:00.780 | - Hello, keep going?
01:24:01.620 | - Yeah, okay. - All right, let's do it.
01:24:03.740 | - Thank you for your talk.
01:24:04.840 | Just to give a bit of context for my question,
01:24:06.900 | I research how we could teach AI to kids,
01:24:09.260 | and developing platforms for that,
01:24:10.780 | how we could teach artificial intelligence
01:24:12.220 | and machine learning to children,
01:24:13.280 | and I know you develop resources for that as well.
01:24:16.340 | So, I was wondering, like,
01:24:18.820 | where do you think it's problematic
01:24:20.700 | that we have computation that is very efficient,
01:24:23.020 | and can, you know, from a utilitarian
01:24:24.980 | and problem-solving perspective,
01:24:26.940 | it achieves all the goals,
01:24:28.220 | but we don't understand how it works,
01:24:30.300 | so we have to create these fake steps,
01:24:33.180 | and if you could think of scenarios
01:24:34.860 | where that could become very problematic over time,
01:24:37.180 | and why do we approach it in such a deterministic way?
01:24:40.740 | And when you mentioned that computation and intelligence
01:24:43.100 | are differentiated by this, like, very thin line,
01:24:46.500 | how does that affect the way you learn,
01:24:48.060 | and how do you think that will affect
01:24:49.180 | the way kids learn, the way we learn?
01:24:51.580 | - Right, so I mean, look, my general principle about,
01:24:54.540 | you know, future generations and what they should learn,
01:24:57.860 | first point is, you know, very obvious point,
01:25:00.820 | that for every field that people study,
01:25:04.380 | you know, archeology to zoology,
01:25:06.700 | there either is now a computational X,
01:25:09.820 | or there will be soon.
01:25:11.260 | So, you know, every field, the paradigm of computation
01:25:15.100 | is becoming important,
01:25:17.300 | perhaps the dominant paradigm in that field.
01:25:19.620 | Okay, so how do you teach kids to be useful
01:25:22.620 | in a world where everything is computational?
01:25:25.900 | I think the number one thing is to teach them
01:25:29.820 | how to think in computational terms.
01:25:32.500 | What does that mean?
01:25:33.340 | It doesn't mean writing code, necessarily.
01:25:36.340 | I mean, in other words, one of the things
01:25:38.220 | that's happening right now as a practical matter
01:25:40.500 | is, you know, there've been these waves of enthusiasm
01:25:42.500 | for teaching coding of various kinds.
01:25:44.580 | You know, we're in a, actually we're in the end
01:25:46.780 | of an uptick wave, I think.
01:25:49.060 | It's going down again.
01:25:51.100 | You know, it's been up and down for 40 years or so.
01:25:54.860 | Okay, why doesn't that work?
01:25:56.820 | Well, it doesn't work because while there are people,
01:25:58.980 | like people who are students at MIT, for example,
01:26:01.700 | for whom they really want to learn, you know,
01:26:03.900 | engineering style coding,
01:26:05.820 | and it really makes sense for them to learn that,
01:26:08.380 | the vast majority of people,
01:26:09.860 | it's just not going to be relevant
01:26:11.860 | because they're not going to write
01:26:12.980 | a low-level C program or something.
01:26:15.340 | And it's the same thing that's happened in math education,
01:26:18.100 | which has been sort of a disaster there,
01:26:20.140 | which is the number one takeaway for most people
01:26:23.300 | from the math they learn in school is, I don't like math.
01:26:27.380 | And, you know, that's not for all of them, obviously,
01:26:30.500 | but that's, you know, what you find if you ask on a general scale.
01:26:33.740 | And why is that?
01:26:35.660 | Well, part of the reason is because what's been taught
01:26:37.620 | is rather low-level and mechanical.
01:26:39.700 | It's not about mathematical thinking, particularly.
01:26:43.020 | It's mostly about, you know, what teachers can teach
01:26:45.500 | and what assessment processes can assess and so on.
01:26:47.980 | Okay, so how should one teach computational thinking?
01:26:51.100 | I mean, I'm kind of excited about what we can do
01:26:53.620 | with Wolfram Language because I think we have
01:26:55.540 | a high enough level language that people can actually write,
01:27:00.180 | you know, that, for example, I reckon by age 11 or 12,
01:27:04.420 | and I've done many experiments on this,
01:27:05.940 | so I have some, the only problem with my experiments
01:27:08.700 | is most of my experiments end up being with kids
01:27:10.780 | who are high-achieving kids.
01:27:13.020 | Despite many efforts to reach lower-achieving kids,
01:27:15.500 | it always ends up that the kids who actually do the things
01:27:18.620 | that I set up are the high-achieving kids.
01:27:20.620 | But, you know, setting that aside,
01:27:23.020 | you know, you take the typical, you know,
01:27:27.420 | 11, 12, 13-year-olds and so on,
01:27:29.620 | and they can learn how to write stuff in this language,
01:27:33.140 | and what's interesting is they learn to start thinking,
01:27:36.060 | here, I'll show you, let's be very practical.
01:27:37.740 | I can show you, I was doing, every Sunday,
01:27:39.780 | I do a little thing with some middle school kids,
01:27:42.860 | and I might even be able to find my stuff from yesterday.
01:27:45.820 | This is, okay, let's see.
01:27:48.620 | Programming Adventures, January 28th.
01:27:52.300 | Okay, let's see what I did.
01:27:53.740 | Oh, look at that.
01:27:54.580 | That was why I thought of the South America thing here,
01:27:56.860 | because I'd just done that with these kids.
01:27:59.020 | And so, what are we doing?
01:28:03.740 | We were trying to figure out this,
01:28:06.820 | trying to figure out the shortest tour thing
01:28:10.900 | that I just showed you, which is,
01:28:12.820 | this is where I got what to show you,
01:28:14.820 | is what I was doing with these kids.
01:28:16.740 | But this was my version of this,
01:28:19.060 | but the kids all had various different versions of this,
01:28:21.980 | and we had somebody suggested, let's just enumerate,
01:28:26.980 | let's just look at all possible permutations
01:28:29.620 | of these cities and figure out what their distances are.
01:28:33.540 | There's the histogram of those.
01:28:35.380 | That's what we get from those.
01:28:36.860 | Okay, how do you get the largest distance from those,
01:28:39.660 | et cetera, et cetera, et cetera.
01:28:40.940 | And this is, okay, this was my version of it,
01:28:42.820 | but the kids had similar stuff.
01:28:44.980 | And this is, I think, and it probably went off into,
01:28:49.740 | oh yeah, there we go, there's the one for the whole Earth,
01:28:52.860 | and then they wanted to know, how do you do that in 3D?
01:28:55.900 | So I was showing them how to convert
01:28:58.540 | to XYZ coordinates in 3D
01:29:01.100 | and make the corresponding thing in 3D.
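A minimal Wolfram Language sketch of the kind of session described here (the city list, the helper name, and the plots are illustrative, not the actual class notebook; the tour length here is an open path, which is good enough for the sketch):

    cities = {"Lima", "Bogota", "Santiago", "Quito", "Caracas"};   (* illustrative list *)
    coords = CityData[#, "Coordinates"] & /@ cities;               (* {lat, lon} pairs *)

    (* brute force: total geo length of every possible ordering of the cities *)
    tourLength[perm_] := Total[GeoDistance @@@ Partition[perm, 2, 1]]
    lengths = tourLength /@ Permutations[coords];

    Histogram[QuantityMagnitude[lengths, "Kilometers"]]          (* distribution of tour lengths *)
    Max[lengths]                                                 (* the longest tour *)
    FindShortestTour[coords, DistanceFunction -> GeoDistance]    (* the shortest tour *)

    (* the 3D follow-up: convert lat/lon to Cartesian XYZ and draw the tour in space *)
    xyz = First[GeoPositionXYZ[GeoPosition[#]]] & /@ coords;
    Graphics3D[Line[xyz]]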
01:29:03.260 | So what's, this maybe isn't,
01:29:07.060 | this is a random example from yesterday,
01:29:08.700 | so it's not a highly considered example,
01:29:11.340 | but what I think is interesting is that
01:29:15.780 | we seem to have finally reached the point
01:29:17.500 | where we've automated enough
01:29:19.300 | of the actual doing of the computation
01:29:21.980 | that the kids can be exposed mostly
01:29:24.980 | to the thinking about what you might want to compute.
01:29:27.860 | And part of our role in language design,
01:29:30.860 | as far as I'm concerned,
01:29:32.140 | is to get it as much as possible
01:29:34.020 | to the point where, for example,
01:29:35.340 | you can do a bunch of natural language input,
01:29:37.420 | you can do things which make it as easy as possible
01:29:40.900 | for kids to not get mixed up in the kind of what the,
01:29:44.460 | how the computation gets done,
01:29:45.900 | but rather to just think about
01:29:47.100 | how you formulate the computation.
01:29:48.460 | So for example, a typical example I've used a bunch of times
01:29:51.380 | in what does it mean to write code versus do other things?
01:29:55.500 | Like a typical sort of test example would be,
01:29:58.540 | I don't know, you ask somebody,
01:30:00.620 | you're gonna, there's a practical problem
01:30:02.780 | we had in Wolfram Alpha,
01:30:02.780 | you give a lat-long position on the Earth,
01:30:05.100 | and you say, you're gonna make a map
01:30:07.020 | of that lat-long position.
01:30:08.940 | What scale of map should you make?
01:30:11.220 | Right, so if the lat-long is in the middle of the Pacific,
01:30:13.540 | making a 10-mile radius map isn't very interesting.
01:30:17.580 | If it's in the middle of Manhattan,
01:30:18.820 | a 10-mile radius map might be quite a sensible thing to do.
01:30:22.180 | So the question is, come up with an algorithm,
01:30:23.820 | come up with even a way of thinking about that question.
01:30:26.460 | What do you do?
01:30:27.780 | How should you figure that out?
01:30:29.220 | Well, you might say,
01:30:30.860 | oh, let's look at the visual complexity of the image.
01:30:33.500 | Let's look at how far it is to another city.
01:30:37.020 | That's far, you know, there are various different things,
01:30:38.940 | but thinking about that
01:30:40.420 | as a kind of computational thinking exercise
01:30:43.140 | that is, you know, that's the kind of thing.
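One hedged way to make that exercise concrete in Wolfram Language (an illustrative heuristic, not the algorithm Wolfram Alpha actually uses): base the map radius on the distance to the nearest city, clamped to a sensible range.

    mapRadius[pos_GeoPosition] := Module[{d},
      d = GeoDistance[pos, First[GeoNearest["City", pos]]];
      Min[Max[d, Quantity[10, "Kilometers"]], Quantity[2000, "Kilometers"]]]

    pos = GeoPosition[{40.75, -74.0}];   (* mid-Manhattan gives a small radius; mid-Pacific a large one *)
    GeoGraphics[GeoMarker[pos], GeoCenter -> pos, GeoRange -> mapRadius[pos]]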
01:30:48.140 | So in terms of what one automates
01:30:50.340 | and whether people need to understand how it works inside,
01:30:53.740 | okay, main point is you'll,
01:30:59.980 | in the end, it will not be possible
01:31:01.740 | to know how it works inside.
01:31:03.540 | So you might as well stop having that be a criterion.
01:31:06.260 | I mean, that is, there are plenty of things
01:31:07.940 | that one teaches people that are,
01:31:09.820 | let's say in lots of areas of biology, medicine,
01:31:14.460 | whatever else, you know,
01:31:16.260 | maybe we'll know how it works inside one day,
01:31:18.580 | but you can still, there's an awful lot of useful stuff
01:31:20.660 | you can teach without knowing how it works inside.
01:31:23.260 | And I think also, as we get computation
01:31:25.620 | to be more efficient, inevitably,
01:31:27.020 | we will be dealing with things
01:31:28.060 | where you don't know how it works inside.
01:31:29.580 | Now, you know, we've seen this in math education
01:31:31.460 | 'cause I've happened to make tools
01:31:33.540 | that automate a bunch of things
01:31:34.980 | that people do in math education.
01:31:37.100 | And I think, well, to tell a silly story,
01:31:40.460 | I mean, my older daughter,
01:31:42.540 | who at some point in the past was doing calculus,
01:31:45.540 | you know, and learning doing integrals and things,
01:31:47.220 | and I was saying to her, you know,
01:31:49.300 | I didn't think humans still did that stuff anymore,
01:31:53.500 | which was a very unendearing comment.
01:31:55.420 | But in any case, I mean, you know,
01:31:59.140 | there's a question of whether do humans need
01:32:01.580 | to know how to do that stuff or not?
01:32:03.700 | So I haven't done an integral by hand in probably 35 years.
01:32:07.300 | That true?
01:32:09.060 | More or less true.
01:32:10.540 | But when I was using computers to do them,
01:32:13.460 | I was for a while, you know,
01:32:16.380 | when I used to do physics and so on,
01:32:17.740 | I used computers to do this stuff,
01:32:19.340 | I was a really, really good integrator,
01:32:22.420 | except that it wasn't really me,
01:32:24.380 | it was me plus the computer.
01:32:25.940 | So how did that come to be?
01:32:27.020 | Well, the answer was that because I was doing things
01:32:29.580 | by computer, I was able to try zillions of examples,
01:32:33.020 | and I got a much better intuition than most people got
01:32:36.020 | for how these things would work roughly,
01:32:37.940 | how what you did to make the thing go and so on.
01:32:41.220 | Whereas people who are like,
01:32:42.220 | I'm just working this one thing out by hand,
01:32:44.660 | you get a different, you know,
01:32:45.860 | you don't get that intuition.
01:32:47.140 | So I think, you know, two points.
01:32:48.940 | First of all, you know, this,
01:32:51.060 | how do you think about things computationally?
01:32:52.980 | How do you formulate the question computationally?
01:32:55.140 | That's really important and something that we are now
01:32:57.700 | in a position, I think, to actually teach.
01:32:59.860 | And it is not really something you teach by,
01:33:02.620 | you know, teaching, you know, traditional quotes coding,
01:33:06.020 | because a lot of that is, okay, we're gonna make a loop,
01:33:08.420 | we're gonna define variables.
01:33:10.140 | I just as a, I think I probably have a copy here, yeah.
01:33:13.300 | I wrote this book for,
01:33:14.780 | this is a book kind of for kids about Wolfram Language,
01:33:17.220 | except it seems to be useful to adults as well,
01:33:19.500 | but I wrote it for kids.
01:33:20.940 | So it's, one of the amusing things in this book
01:33:24.820 | is it doesn't talk about assigning values to variables
01:33:28.620 | until chapter 38.
01:33:30.820 | So in other words,
01:33:31.860 | that will be a thing that you would find in chapter one
01:33:34.260 | of most, you know, low level programming,
01:33:37.220 | coding type things.
01:33:38.660 | It turns out it's not that relevant to know how to do that.
01:33:41.140 | It's also kind of confusing and not necessary.
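The flavor of that approach, for what it's worth, is that a lot of real computation can be written as single expressions with no assignments at all; a couple of illustrative one-liners (these examples are mine, not taken from the book):

    ListPlot[Accumulate[RandomInteger[{-1, 1}, 100]]]                      (* a random walk *)
    WordCloud[DeleteStopwords[ExampleData[{"Text", "AliceInWonderland"}]]] (* word cloud of a book *)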
01:33:45.380 | And so, you know, in terms of the,
01:33:48.260 | you asked where will we get in trouble
01:33:49.540 | when people don't know how the stuff works inside.
01:33:52.060 | That's, I mean, you know,
01:33:55.380 | I think one just has to get used to that
01:33:57.060 | because it's like, you know, you might say,
01:33:59.340 | well, we live in the world
01:34:00.500 | and it's full of natural processes
01:34:02.300 | where we don't know how they work inside,
01:34:04.100 | but somehow we managed to survive
01:34:06.220 | and we go to a lot of effort to do natural science
01:34:08.900 | to try and figure out how stuff works inside.
01:34:11.580 | But it turns out we can still use plenty of things
01:34:13.820 | even when we don't know how they work inside.
01:34:15.700 | We don't need to know.
01:34:17.620 | And I think the, I mean, I think the main point is
01:34:20.740 | computational irreducibility guarantees
01:34:22.860 | that we will be using things where we don't know
01:34:25.140 | and can't know how they work inside.
01:34:27.900 | And, you know, I think the perhaps,
01:34:30.900 | the thing that is a little bit, you know,
01:34:34.340 | to me a little bit unfortunate as a, you know,
01:34:37.980 | as a typical human type thing,
01:34:40.620 | the fact that I can readily see that, you know,
01:34:43.380 | the AI stuff we build is sort of effectively
01:34:47.300 | creating languages and things
01:34:49.420 | that are completely outside our domain to understand.
01:34:52.260 | And where, by that I mean, you know,
01:34:55.020 | our human language with its 50,000 words or whatever
01:34:57.540 | has been developed over the last however many,
01:34:59.620 | you know, tens of thousands of years.
01:35:01.500 | And as a society, we've developed this way
01:35:03.940 | of communicating and explaining things.
01:35:05.980 | You know, the AIs are reproducing that process very quickly,
01:35:10.980 | but they're coming up with a, an ahistorical,
01:35:14.100 | you know, something, you know,
01:35:15.300 | their way of describing the world,
01:35:16.700 | but it doesn't happen to relate at all
01:35:18.180 | to our historical way of doing it.
01:35:20.100 | And that's, you know, it's a little bit disquieting
01:35:23.140 | to me as a human that, you know,
01:35:24.740 | there are things going on inside where I know it is,
01:35:26.980 | you know, in principle, I could learn that language,
01:35:29.900 | but it's, you know, not the historical one
01:35:33.420 | that we've all learned.
01:35:34.740 | And it really wouldn't make a lot of sense to do that
01:35:36.580 | 'cause you learn it for one AI
01:35:37.820 | and then another one gets trained
01:35:39.300 | and it's gonna use something different.
01:35:41.420 | So it's, but my main, I guess my main point for education,
01:35:45.820 | so another point about education I'll just make,
01:35:47.660 | which is something I haven't figured out,
01:35:48.940 | but just is, you know, when do we get to make a good model
01:35:54.580 | for the human learner using machine learning?
01:35:57.700 | So in other words, you know,
01:35:59.380 | part of what we're trying to do,
01:36:00.540 | like I've got that automated proof,
01:36:02.980 | I would really like to manage to figure out a way,
01:36:05.860 | what is the best way to present that proof
01:36:07.940 | so a human can understand it?
01:36:09.740 | And basically for that,
01:36:11.820 | we have to have a bunch of heuristics
01:36:13.460 | about how humans understand things.
01:36:14.900 | So as an example, if we're doing, let's say,
01:36:17.540 | a lot of visualization stuff in Wolfram Language, okay,
01:36:20.180 | we have tried to automate, do automated aesthetics.
01:36:23.820 | So what we're doing is, you know, we're laying out a graph,
01:36:27.500 | what way of laying out that graph
01:36:29.260 | is most likely for humans to understand it?
01:36:31.580 | And we've done that, you know,
01:36:32.580 | by building a bunch of heuristics and so on,
01:36:34.740 | but that's an example of, you know,
01:36:36.420 | if we could do that for learning,
01:36:38.500 | and we say, what's the optimal path,
01:36:40.180 | given that the person is trying to understand this proof,
01:36:42.300 | for example, what's the optimal path to lead them through
01:36:45.380 | understanding that proof?
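On the graph-layout point a few sentences back, a small sketch of the kind of choice those aesthetic heuristics are making; the layout names are standard Wolfram Language options, and the graph itself is just a random example:

    g = RandomGraph[BarabasiAlbertGraphDistribution[30, 2]];
    GraphicsRow[Table[Graph[EdgeList[g], GraphLayout -> l],
      {l, {"SpringElectricalEmbedding", "CircularEmbedding", "SpiralEmbedding"}}]]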
01:36:46.780 | I suspect we will learn a lot more
01:36:48.500 | in probably fairly small number of years about that.
01:36:51.980 | And it will be the case that, you know,
01:36:53.620 | for example, if you've got, oh, I don't know,
01:36:56.740 | you can do simple things like, you know,
01:36:58.700 | you go to Wikipedia and you look at what the path of,
01:37:01.460 | you know, how do you, if you wanna learn this concept,
01:37:03.180 | what other concepts do you have to learn?
01:37:04.740 | We have much more detailed symbolic information
01:37:07.060 | about what is actually necessary to know
01:37:09.780 | in order to understand this and so on.
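A toy version of that "what do you need to know first?" computation; the concept graph here is made up purely for illustration:

    prereqs = {"arithmetic" -> "algebra", "algebra" -> "functions",
       "algebra" -> "calculus", "functions" -> "calculus"};
    g = Graph[prereqs, VertexLabels -> "Name"];

    (* everything that feeds into a target concept, in an order you could learn it *)
    learningPath[target_] := TopologicalSort[Subgraph[g, VertexInComponent[g, target]]]
    learningPath["calculus"]   (* -> {"arithmetic", "algebra", "functions", "calculus"} *)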
01:37:12.300 | It is, I think, reasonably likely
01:37:14.780 | that we will be able to, I mean, you know,
01:37:16.860 | if I look at, I was interested recently
01:37:18.700 | in the history of math education.
01:37:20.060 | So I wanted to look at the complete sort of path
01:37:22.980 | of math textbooks, you know, for the past,
01:37:26.100 | well, basically since, like, 1200, you know, when
01:37:29.900 | Fibonacci produced one of the early math textbooks.
01:37:33.060 | So there've been these different ways of teaching math.
01:37:35.660 | And, you know, I think we've gradually evolved
01:37:38.500 | a fairly optimized way for the typical person,
01:37:41.500 | though it's probably the variation of the population
01:37:43.940 | is not well understood, for, you know,
01:37:45.860 | how to explain certain concepts.
01:37:47.900 | And we've gone through some pretty weird ways of doing it
01:37:50.460 | from the 1600s and so on,
01:37:52.380 | where which have gone out of style and possibly,
01:37:55.620 | you know, who knows whether that's because of, well,
01:37:58.860 | but anyway, so, you know, we've kind of learned this path
01:38:01.580 | of what's the optimal way to explain adding fractions
01:38:04.260 | or something for humans, for the typical human.
01:38:07.420 | But I think we'll learn a lot more about how, you know,
01:38:09.700 | by essentially making a model for the human,
01:38:11.980 | a machine model for the human,
01:38:13.500 | we'll learn more about how to, you know,
01:38:16.340 | how to optimize, how to explain stuff to humans,
01:38:19.260 | a coming attraction.
01:38:20.740 | But-- - Thanks.
01:38:22.180 | - By the way, do you think we're close to that at all?
01:38:24.580 | 'Cause you said that there's something in Wolfram Alpha
01:38:28.300 | that presents it to the human in a nice way.
01:38:31.740 | How far are we? You said coming attraction, 10 years?
01:38:34.540 | - Yeah, right, so I mean, in that explaining stuff
01:38:39.340 | to humans thing is a lot of human work right now.
01:38:43.340 | Being able to automate explaining stuff to humans.
01:38:46.740 | Okay, so some of these things, let's see.
01:38:50.980 | I mean, so an interesting question,
01:38:53.740 | actually just today I was working on something
01:38:55.380 | that's related to this.
01:38:56.660 | Yeah, it's being able to,
01:38:58.980 | the question is given a whole bunch of,
01:39:02.100 | can we, for example, train a machine learning system
01:39:05.420 | from explanations that it can see, roughly,
01:39:08.540 | can we train it to give explanations
01:39:10.340 | that are likely to be understandable?
01:39:12.180 | Maybe.
01:39:13.340 | I think the, okay, so an example that I'd like to do,
01:39:16.900 | okay, I'd like to do a debugging assistant
01:39:19.540 | where the typical thing is program runs,
01:39:21.940 | program gives wrong answer.
01:39:23.660 | Human says, why did you get the wrong,
01:39:25.580 | why did it give the wrong answer?
01:39:27.180 | Well, the first piece of information to the computer is
01:39:29.580 | that was, the human thought that was the wrong answer
01:39:32.300 | 'cause the computer just did what it was told
01:39:34.460 | and it didn't know that was supposed to be the wrong answer.
01:39:36.500 | So then the question is, can you in fact,
01:39:39.140 | in that domain, can you actually have
01:39:41.580 | a reasonable conversation in which the human
01:39:44.980 | is explaining the computer what they thought
01:39:46.540 | it was supposed to do, the computer is explaining
01:39:48.580 | what happened and why did it happen and so on.
01:39:50.900 | Same kind of thing for math tutoring.
01:39:53.700 | You know, we have a lot of, you know,
01:39:55.500 | we've got a lot of stuff, you know,
01:39:57.300 | we're sort of very widely used for people
01:39:59.380 | who want to understand the steps in math.
01:40:02.220 | You know, can we make a thing where people tell us,
01:40:04.420 | I think it's this?
01:40:05.780 | Okay, I'll tell you one little factoid,
01:40:07.460 | which I, which it did work out.
01:40:08.940 | So if you do multi-digit arithmetic,
01:40:11.540 | multi-digit addition, okay?
01:40:13.820 | Okay, so the basis of this is,
01:40:15.980 | it's kind of silly, silly thing,
01:40:18.340 | but you know, if you get the right answer
01:40:20.100 | for an addition sum, okay,
01:40:22.140 | you don't get very much information.
01:40:23.900 | The student gives the wrong answer,
01:40:25.980 | the question is, can you tell them where they went wrong?
01:40:28.860 | So let's say you have a four-digit addition sum
01:40:31.340 | and the student gives the wrong answer.
01:40:33.260 | Can you backtrace and figure out what they likely did wrong?
01:40:35.980 | And the answer is you can.
01:40:37.620 | You just make this graph of all the different things
01:40:40.260 | that can happen, you know, when did they,
01:40:42.540 | you know, there's certain things that are more common,
01:40:44.500 | transposing numbers and things,
01:40:45.900 | or you know, having a one and a seven mixed up,
01:40:48.660 | those kinds of things.
01:40:49.700 | You can, with very high probability,
01:40:51.860 | given a four-digit addition sum with the wrong answer,
01:40:54.900 | you can say this is the mistake you made,
01:40:57.380 | which is sort of interesting.
01:40:58.980 | And that's, you know, being done in a fairly symbolic way,
01:41:02.340 | whether one can do that in a, you know,
01:41:05.020 | more machine learning kind of way
01:41:06.660 | for more complicated derivations, I'm not sure.
01:41:09.340 | But that's a, you know, that's one that works.
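A rough sketch of that backtracing idea; the catalogue of error types below (adjacent-digit transposition and 1/7 confusion) is just illustrative, not the production system:

    digits[n_] := IntegerDigits[n, 10, 4]

    (* candidate misreadings of a number: the number itself, adjacent digits swapped, or 1 and 7 confused *)
    misreadings[n_] := DeleteDuplicates @ Join[{n},
       Table[FromDigits[Permute[digits[n], Cycles[{{i, i + 1}}]]], {i, 3}],
       {FromDigits[digits[n] /. {1 -> 7, 7 -> 1}]}]

    (* which misreadings of a and b are consistent with the student's wrong answer? *)
    diagnose[a_, b_, wrong_] :=
      Select[Tuples[{misreadings[a], misreadings[b]}], Total[#] == wrong &]

    diagnose[1234, 5678, 6921]   (* finds the transposed-last-digits explanations *)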
01:41:14.020 | - Hi, sir, I just had a follow-up question.
01:41:16.940 | So do you think, you know, like in the future,
01:41:20.060 | is it possible to simulate virtual environments
01:41:23.780 | which can actually understand how the human mind works
01:41:27.380 | and then build, you know, like finite state machines
01:41:30.260 | inside of this virtual environment
01:41:32.060 | to provide a better learning experience
01:41:36.060 | and a more personalized learning experience?
01:41:39.140 | - Well, I mean, so the question is,
01:41:41.140 | if you're going to, you know,
01:41:43.740 | can you optimize, if you're playing a video game
01:41:45.780 | or something and that video game
01:41:46.900 | is supposed to be educational,
01:41:48.500 | can you optimize the experience
01:41:51.220 | based on a model of you, so to speak?
01:41:54.260 | Yeah, I'm sure the answer is yes.
01:41:56.060 | And I'm sure the, you know,
01:41:57.380 | the question of how complicated the model of you will be
01:42:00.460 | is an interesting question.
01:42:01.700 | I don't know the answer to.
01:42:02.660 | I mean, I've kind of wondered a similar question.
01:42:04.740 | So I'm a kind of personal analytics enthusiast,
01:42:07.980 | so I collect tons of data about myself.
01:42:10.300 | And I mean, I do it mostly
01:42:11.980 | 'cause it's been super easy to do
01:42:13.340 | and I've done it for like 30 years.
01:42:15.260 | And I have, you know,
01:42:16.100 | every keystroke I've typed on a computer,
01:42:17.740 | like every keystroke I've typed here.
01:42:19.260 | And I, the screen of my computer,
01:42:21.060 | every 30 seconds or so, maybe 15 seconds,
01:42:24.220 | I'm not sure, there's a screenshot.
01:42:26.380 | It's a super boring movie to watch.
01:42:28.100 | But anyway, I've been collecting all this stuff.
01:42:30.540 | And so the question that I've asked is,
01:42:33.020 | is there enough data that a bot of me could be made?
01:42:36.980 | In other words, do I have enough data about,
01:42:39.860 | you know, I've got, I've written a million emails,
01:42:43.540 | I have all of those,
01:42:44.380 | I've received 3 million emails over that period of time.
01:42:48.540 | I've got, you know, endless, you know,
01:42:50.820 | things I've typed, et cetera, et cetera, et cetera.
01:42:53.460 | You know, is there enough data to reconstruct,
01:42:56.140 | you know, me basically?
01:42:59.620 | I think the answer is probably yes.
01:43:01.860 | Not sure, but I think the answer is probably yes.
01:43:04.260 | And so the question is in an environment
01:43:06.620 | where you're interacting with some video game,
01:43:08.380 | trying to learn something, whatever else,
01:43:10.220 | you know, how long is it going to be
01:43:11.540 | before it can learn enough about you
01:43:13.780 | to change that environment in a way that's useful
01:43:15.860 | for explaining the next thing to you?
01:43:18.100 | I would guess, I would guess that if done,
01:43:21.100 | that this is comparatively easy.
01:43:23.500 | I might be wrong, but, and that the,
01:43:26.540 | I mean, I think, you know, it's an interesting thing
01:43:29.060 | because, you know, one's dealing with, you know,
01:43:31.220 | there's a space of human personalities,
01:43:32.980 | there's a space of human learning styles.
01:43:35.420 | You know, I'm sort of always interested
01:43:37.060 | in the space of all possible XYZ.
01:43:39.900 | And there's, you know, there's that question
01:43:41.780 | of how do you parameterize the space
01:43:43.420 | of all possible human learning styles?
01:43:45.740 | And is there a way that we will learn, you know,
01:43:49.060 | like, can we do that symbolically
01:43:52.140 | and say these are 10 learning styles, or is it something,
01:43:54.660 | I think that's a case where it's better to use, you know,
01:43:58.220 | sort of soft machine learning type methods
01:44:02.220 | to kind of feel out that space.
01:44:04.420 | - Thank you.
01:44:07.180 | - Yeah, maybe, very last question.
01:44:10.660 | - I was just intuitively thinking
01:44:13.140 | when you spoke about an ocean,
01:44:14.540 | I thought of Isaac Newton when he said,
01:44:17.500 | you know, the famous quote, "I might not."
01:44:22.340 | And I thought instead of Newton on the beach,
01:44:25.020 | what if Franz Liszt were there?
01:44:27.540 | What question would he ask?
01:44:29.700 | What would he say?
01:44:30.700 | And I'm trying to understand your,
01:44:34.540 | the alien ocean and humans
01:44:39.020 | through maybe Franz Liszt and music.
01:44:41.340 | - Well, so, I mean, the quote from Newton is,
01:44:45.820 | it's sort of an interesting quote.
01:44:48.380 | I think it goes something like this.
01:44:49.620 | If, you know,
01:44:50.460 | people are talking about how wonderful calculus
01:44:55.380 | and all that kind of thing are,
01:44:56.980 | and Newton says, you know,
01:44:59.820 | "To others, I may seem like I've done a lot of stuff,
01:45:01.860 | but to me, I seem like a child
01:45:03.500 | who picked up a particularly elegant seashell on the beach.
01:45:08.500 | And I've been studying this seashell for a while,
01:45:11.740 | even though there's this ocean of truth out there
01:45:13.780 | waiting to be discovered."
01:45:14.660 | That's roughly the quote, okay?
01:45:17.020 | I find that quote interesting for the following reason.
01:45:20.700 | What Newton did was, you know, calculus and things like it,
01:45:25.340 | if you look at the computational universe
01:45:26.940 | of all possible programs, there is a small corner.
01:45:30.100 | Newton was exactly right in what he said.
01:45:32.060 | That is, he picked off calculus,
01:45:34.660 | which is a corner of the possible things
01:45:36.620 | that can happen in the computational universe
01:45:39.260 | that happened to be an elegant seashell, so to speak.
01:45:42.220 | They happened to be a case where you can figure out
01:45:44.380 | what's going on and so on,
01:45:46.580 | while there is still this sort of ocean
01:45:48.580 | of other sort of computational possibilities out there.
01:45:53.140 | But when it comes to, you know, you're asking about music,
01:45:55.620 | I, oh, I think my computer stopped
01:45:57.500 | being able to get anywhere,
01:45:58.580 | but sort of interesting, the,
01:46:02.020 | see if we can get to the site.
01:46:04.660 | Yeah, so this is a website that we made years ago,
01:46:09.660 | and now my computer isn't playing anything, but.
01:46:14.300 | (upbeat music)
01:46:16.900 | Let's try that.
01:46:20.500 | Okay, so these things are created
01:46:26.300 | by basically just searching computational universe
01:46:29.260 | of possible programs.
01:46:30.780 | It's sort of interesting because every one
01:46:32.980 | has kind of a story.
01:46:34.300 | Some of them look more interesting than others.
01:46:35.820 | Let's try that one.
01:46:36.780 | Anyway, the, what's interesting,
01:46:49.060 | actually, what was interesting to me about this was,
01:46:51.460 | this is a very trivial, you know,
01:46:53.340 | what this is doing is very trivial at some level.
01:46:55.780 | It's just, it actually happens to use cellular automata.
01:46:58.820 | You can even have it show you, I think, someplace here.
01:47:02.180 | Where is it?
01:47:03.020 | Somewhere there's a way of showing,
01:47:03.900 | you know, show the evolution.
01:47:05.020 | This is showing the behind the scenes
01:47:07.620 | what was actually happening,
01:47:09.660 | what it chose to use to generate that musical piece.
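A rough sketch of the idea behind that site; the rule, the scale, and the note mapping below are illustrative choices, not the actual WolframTones parameters:

    ca = CellularAutomaton[30, RandomInteger[1, 12], 24];     (* rule 30 from a random 12-cell start *)
    scale = {0, 2, 4, 7, 9, 12, 14, 16, 19, 21, 24, 26};      (* a pentatonic pitch for each column *)
    notes = Table[
       With[{on = Pick[scale, ca[[t]], 1]},                   (* cells that are on become a chord *)
         SoundNote[If[on === {}, None, on], 0.25]],           (* an all-zero row becomes a rest *)
       {t, Length[ca]}];
    Sound[notes]
    ArrayPlot[ca]                                             (* the behind-the-scenes evolution *)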
01:47:12.420 | And what I thought was interesting about this site,
01:47:16.940 | I thought, well, you know,
01:47:19.660 | how would computers be relevant to music,
01:47:21.740 | et cetera, et cetera, et cetera?
01:47:22.580 | Well, you know, what would happen is,
01:47:24.620 | a human would have an idea,
01:47:26.380 | and then the computer would kind of dress up that idea.
01:47:29.140 | And then, you know, a bunch of years go by,
01:47:31.380 | and I talk to people, you know,
01:47:33.580 | who are composers and things, and they say,
01:47:35.060 | "Oh yeah, I really like that Wolfram Tone site."
01:47:37.780 | Okay, that's nice.
01:47:39.300 | They say, "It's a very good place for me to get ideas."
01:47:42.740 | So that's sort of the opposite of what I would have expected,
01:47:46.420 | namely, what's happening is, you know,
01:47:49.140 | human comes here, you know,
01:47:51.100 | listens to some 10 second fragment,
01:47:54.740 | and they say, "Oh, that's an interesting idea."
01:47:56.940 | And then they kind of embellish it
01:47:59.260 | using kind of something that is humanly meaningful.
01:48:02.220 | But it's like, you know, you're taking a photograph,
01:48:04.740 | and you see some interesting configuration,
01:48:07.100 | and then kind of you're, you know,
01:48:08.900 | you're filling that with kind of some human sort of context.
01:48:13.660 | But so I'm not quite sure what,
01:48:20.700 | what you were asking about.
01:48:21.740 | I mean, back to the Newton quote,
01:48:24.060 | the thing that I think is another way
01:48:27.100 | to think about that quote is us humans, you know,
01:48:31.020 | with our sort of historical development of, you know,
01:48:34.900 | our intellectual history have explored
01:48:38.140 | this very small corner of what's possible
01:48:40.740 | in the computational universe.
01:48:42.460 | And everything that we care about
01:48:44.580 | is contained in the small corner.
01:48:46.740 | And that means that, you know, you could say,
01:48:49.260 | "Well, gee, you know, I want to, you know,
01:48:52.620 | what we end up wanting to talk about
01:48:56.420 | are the things that we as a society
01:48:58.180 | have decided we care about."
01:49:00.020 | And what, there's an interesting feedback loop,
01:49:02.100 | I'll just mention, it should end,
01:49:03.460 | but so you might say, so here's a funny thing.
01:49:08.420 | So let's take language, for example.
01:49:10.580 | Language evolves, we say, we make up language
01:49:13.540 | to describe what we see in the world, okay?
01:49:16.260 | Fine, that's a fine idea.
01:49:18.020 | Imagine the, you know, in Paleolithic times,
01:49:20.580 | people would make up language.
01:49:21.820 | They probably didn't have a word for table
01:49:24.020 | because they didn't have any tables.
01:49:26.300 | They probably had a word for rock.
01:49:28.300 | But then we end up as a result of the particular,
01:49:32.700 | you know, development that our civilization
01:49:35.780 | has gone through, we build tables.
01:49:38.100 | And there was sort of a synergy
01:49:40.540 | between coming up with a word for table
01:49:42.860 | and deciding tables were a thing
01:49:44.420 | and we should build a bunch of them.
01:49:46.220 | And so there's this sort of complicated interplay
01:49:48.500 | between the things that we learn how to describe
01:49:51.060 | and how to think about, the things that we build
01:49:53.340 | and put in our environment, and then the things
01:49:55.740 | that we end up wanting to talk about
01:49:58.700 | because they're things that we have experience of
01:50:00.780 | in our environment.
01:50:01.940 | And so that's the, you know, I think as we look
01:50:03.780 | at sort of the progress of civilization,
01:50:05.740 | there's, you know, there's various layers of,
01:50:08.060 | first we, you know, we invent a thing
01:50:10.580 | that we can then think about and talk about.
01:50:13.260 | Then we build an environment based on that.
01:50:16.580 | Then that allows us to do more stuff
01:50:18.780 | and we build on top of that.
01:50:19.780 | And that's why, for example, when we talk
01:50:21.180 | about computational thinking and teaching it to kids
01:50:23.140 | and so on, that's one reason that's kind of important
01:50:26.260 | because we're building a layer of things
01:50:28.780 | that people are then familiar with that's different
01:50:30.820 | from what we've had so far.
01:50:32.500 | And they give people a way to talk about things.
01:50:33.980 | I'll give you one example that,
01:50:36.180 | let's see, did I have that still up?
01:50:37.980 | The, yeah, okay, one example here.
01:50:42.580 | (keyboard clicking)
01:50:45.580 | From this blog post of mine, actually.
01:50:49.020 | So the, where is it?
01:50:53.660 | Okay, so that thing there is a nested pattern.
01:50:58.660 | You know, it's a Sapinski.
01:51:01.220 | That tile pattern was created in 1210 AD, okay?
01:51:10.180 | And it's the first example I know of a fractal pattern.
01:51:13.220 | Okay, well, the art historians wrote about these patterns.
01:51:18.220 | There are a bunch of this particular style of pattern.
01:51:20.620 | They wrote about these for years.
01:51:22.620 | They never discussed that nested pattern.
01:51:24.860 | These patterns also have, you know, pictures of lions
01:51:27.100 | and, you know, elephants and things like that in them.
01:51:29.860 | They wrote about those kinds of things,
01:51:31.620 | but they never mentioned the nested pattern
01:51:33.820 | until basically about 25 years ago
01:51:37.620 | when fractals and so on became a thing.
01:51:40.660 | And then it's, ah, I can now talk about that.
01:51:42.940 | It's a nested pattern, it's a fractal.
01:51:44.980 | And then, you know, before that time,
01:51:47.100 | the art historians were blind
01:51:49.020 | to that particular part of this pattern.
01:51:50.980 | It was just like, I don't know what that is.
01:51:52.700 | But there's no, you know, I don't have a word to describe it.
01:51:54.820 | I'm not going to, I'm not gonna talk about it.
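For reference, the nested pattern he's pointing at is the same structure one of the cellular automata discussed earlier in the talk produces; a two-line sketch:

    (* rule 90 grown from a single cell gives the nested Sierpinski pattern *)
    ArrayPlot[CellularAutomaton[90, {{1}, 0}, 64]]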
01:51:58.780 | So that's a, you know, it's part of this feedback loop
01:52:00.700 | of things that we learn to describe,
01:52:04.100 | then we build in terms of those things,
01:52:05.780 | then we build another layer.
01:52:07.380 | I think one of the things, I mean, you talk about,
01:52:09.620 | you know, just in the sort of,
01:52:11.860 | the thing, one thing I'm really interested in
01:52:14.620 | is the evolution of purposes.
01:52:16.500 | So, you know, if you look back in human history,
01:52:18.740 | there's a, you know, what was thought to be worth doing
01:52:21.740 | a thousand years ago is different
01:52:22.980 | from what's thought to be worth doing today.
01:52:25.460 | And part of that is, you know,
01:52:29.420 | good examples of things like, you know,
01:52:30.900 | walking on a treadmill or buying goods in virtual worlds.
01:52:35.220 | Both of these are hard to explain
01:52:36.900 | to somebody from a thousand years ago,
01:52:39.060 | because each one ends up being a whole sort of societal story
01:52:42.260 | about we're doing this because we do that,
01:52:43.740 | because we do that.
01:52:44.820 | And so the question is,
01:52:45.660 | how will these purposes evolve in the future?
01:52:48.460 | And I think one of the things that I view
01:52:50.100 | as a sort of sobering thought is that,
01:52:52.260 | actually one thing I found rather disappointing
01:52:56.740 | and then I became less pessimistic about it is,
01:52:59.100 | if you think about the future of the human condition,
01:53:01.340 | and, you know, we've been successful
01:53:02.780 | in making our AI systems and we can read out brains
01:53:05.540 | and we can upload consciousnesses and things like that.
01:53:07.980 | And we've eventually got this box
01:53:09.660 | with trillions of souls in it.
01:53:11.620 | And the question is, what are these souls doing?
01:53:13.900 | And to us today, it looks like they're playing video games
01:53:17.700 | for the rest of eternity, right?
01:53:19.540 | And that seems like a kind of a bad outcome.
01:53:21.700 | It's like, we've gone through all of this long history
01:53:24.380 | and what do we end up with?
01:53:25.260 | We end up with a trillion souls
01:53:27.020 | in a box playing video games, okay?
01:53:29.420 | And I thought this is a very, you know,
01:53:32.300 | depressing outcome, so to speak.
01:53:34.540 | And then I realized that actually, you know,
01:53:36.940 | if you look at the sort of arc of human history,
01:53:39.780 | people at any given time in history,
01:53:43.100 | people have been, you know, they've,
01:53:46.820 | my main conclusion is that any time in history,
01:53:51.460 | the things people do seem meaningful and purposeful
01:53:54.700 | to them at that time in history and history moves on.
01:53:59.100 | And, you know, like a thousand years ago,
01:54:00.980 | there were a lot of purposes that people had
01:54:04.460 | that, you know, were to do with weird superstitions
01:54:07.260 | and things like that that we say,
01:54:08.780 | why the heck were you doing that?
01:54:10.100 | That just doesn't make any sense, right?
01:54:12.420 | But to them at that time, it made all the sense in the world.
01:54:15.820 | And I think that, you know,
01:54:16.940 | the thing that makes me sort of less depressed
01:54:18.740 | about the future, so to speak,
01:54:20.420 | is that at any given time in history, you know,
01:54:23.580 | you can still have meaningful purposes,
01:54:26.420 | even though they may not look meaningful
01:54:28.100 | from a different point in history.
01:54:29.260 | And that there's sort of a whole theory
01:54:30.660 | you can kind of build up based on kind of the trajectories
01:54:33.900 | that you follow through the space of purposes
01:54:36.260 | and sort of interesting, if you can't jump, like you say,
01:54:39.460 | let's get cryonically frozen for, you know, 300 years,
01:54:43.020 | and then, you know, be back again.
01:54:46.100 | The interesting case is then, you know,
01:54:48.900 | all the purposes that you sort of, you know,
01:54:52.100 | that you find yourself in,
01:54:53.940 | ones that have any continuity with what we know today.
01:54:56.740 | I should stop with that.
01:54:57.580 | - That's a beautiful way to end it.
01:54:59.300 | Please give Steven a big hand.
01:55:00.940 | (audience applauding)